Hadoop file system commands

If you are working with EMR or Hadoop, the following file system commands will be handy.

List the contents of a directory

hadoop fs -ls folderPath

To list the contents of the folder /input/data

hadoop fs -ls /input/data

To list all the files recursively in all subfolders

hadoop fs -ls -R /input/data

To remove a file

hadoop fs -rm /input/data/file1.txt

To remove a folder together with all of its contents (-r deletes recursively, -f suppresses the error if the path does not exist)

hadoop fs -rm -r -f /input/data

Note that the command does not accept the switches combined as -rf:

-rm: Illegal option -rf
Usage: hadoop fs [generic options] -rm [-f] [-r|-R] [-skipTrash]  ...
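The -r -f command above deletes /input/data itself. If you only want to empty the folder and keep the folder in place, one option is to pass a glob pattern to -rm. The sketch below assumes a bash-compatible shell with the hadoop client on the PATH; the function name empty_hdfs_dir is hypothetical. The pattern is quoted so that your local shell does not expand the *, leaving hadoop fs to match it against HDFS paths.

```shell
# Sketch: delete everything inside an HDFS folder but keep the folder itself.
# empty_hdfs_dir is a hypothetical helper name, not part of Hadoop.
empty_hdfs_dir() {
  local dir="$1"   # HDFS folder to empty, e.g. /input/data
  # The quotes stop the local shell from expanding the *;
  # hadoop fs expands the glob against HDFS instead.
  # -r and -f are passed separately because -rf is rejected.
  hadoop fs -rm -r -f "$dir/*"
}
```

Usage: empty_hdfs_dir /input/data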

Copy a file from the Hadoop file system to the local file system

hadoop fs -copyToLocal /input/data/file1.txt /local/file1.txt

Copy a folder from the Hadoop file system to the local file system

hadoop fs -copyToLocal /input/data/ /local/

Copy a file from the local file system to the Hadoop file system

hadoop fs -copyFromLocal /local/file2.txt /input/data/

Some file system related exceptions you might encounter when running jobs on EMR

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs:// already exists

The job is trying to create its output folder, but a folder with the same name already exists. This usually means the same job was run previously, so its output needs to be cleaned up before rerunning. You can delete the folder with hadoop fs -rm -r -f (see the example above). Make sure to keep a copy first in case you need the data later.
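The cleanup described above (keep a copy, then delete the folder) can be scripted so the data is never removed before the backup succeeds. This is a minimal sketch, assuming a bash-compatible shell with the hadoop client on the PATH; the function name backup_and_remove_output and its two arguments are hypothetical. It also uses hadoop fs -test -d, which exits with status 0 only when the path is an existing directory.

```shell
# Sketch: back up an existing HDFS output folder to the local file system,
# then delete it so the job can be rerun.
# backup_and_remove_output is a hypothetical helper name.
backup_and_remove_output() {
  local hdfs_dir="$1"    # HDFS output folder named in the exception
  local local_dir="$2"   # local destination for the backup copy

  # Nothing to do if the output folder does not exist.
  if ! hadoop fs -test -d "$hdfs_dir"; then
    echo "No output folder at $hdfs_dir; nothing to clean up"
    return 0
  fi

  # Keep a local copy first; stop without deleting if the copy fails.
  hadoop fs -copyToLocal "$hdfs_dir" "$local_dir" || return 1

  # Pass -r and -f separately; hadoop fs rejects the combined -rf.
  hadoop fs -rm -r -f "$hdfs_dir"
}
```

Usage: backup_and_remove_output /path/to/job/output /local/backup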

  Siri April 13, 2016, 4:10 am

    Is there any way to open a file directly from the Hadoop cluster without copying it to the local file system, using vim or something similar?

