HDFS is a distributed filesystem, and just like a Unix filesystem, it allows users to manipulate it using shell commands. This recipe explains how to perform basic filesystem operations from the HDFS command line.
It is worth noting that HDFS commands have a one-to-one correspondence with Unix commands. For example, consider the following command:
>hadoop dfs -cat /data/foo.txt
This command reads the /data/foo.txt file and prints it to the screen, just like the cat command in a Unix system.
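Because these commands share the Unix shell's argument style, they are easy to drive from a script. The following Python sketch builds the argument list for an HDFS operation and shells out to the hadoop binary; the helper names are our own, not part of Hadoop, and running the commands of course requires a configured Hadoop installation on the PATH:

```python
import subprocess

def hdfs_argv(operation, *paths):
    """Build the argv list for an HDFS shell command.

    For example, hdfs_argv("cat", "/data/foo.txt") produces the same
    invocation as typing `hadoop dfs -cat /data/foo.txt` at the shell.
    """
    return ["hadoop", "dfs", "-" + operation] + list(paths)

def run_hdfs(operation, *paths):
    """Run the command and return its standard output as text.

    Requires a working Hadoop installation; raises CalledProcessError
    if the command fails.
    """
    result = subprocess.run(hdfs_argv(operation, *paths),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, run_hdfs("cat", "/data/foo.txt") would return the file's contents, mirroring the command shown above.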
From the HADOOP_HOME directory, run the following command to create a new directory called /test in HDFS:

>bin/hadoop dfs -mkdir /test

The HDFS filesystem has / as the root directory, just like the Unix filesystem. Run the following command to list the contents of the HDFS root directory:

>bin/hadoop dfs -ls /

Run the following command to copy the local README.txt file to the /test directory:

>bin/hadoop dfs -put README.txt /test

Run the following command to list the /test directory:

>bin/hadoop dfs -ls /test
Found 1 items
-rw-r--r--   1 srinath supergroup       1366 2012-04-10 07:06 /test/README.txt

Run the following command to copy /test/README.txt to the local directory:

>bin/hadoop dfs -get /test/README.txt README-NEW.txt
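The -ls output follows a fixed column order: permissions, replication factor, owner, group, size in bytes, modification date and time, and path. As an illustration, the listing line above can be split into those fields with a small Python helper (the helper is ours, not part of Hadoop, and this sketch handles file entries only, since directory entries show "-" in the replication column):

```python
def parse_ls_line(line):
    """Split one file line of `hadoop dfs -ls` output into named fields.

    Columns: permissions, replication, owner, group, size (bytes),
    modification date, modification time, and path.
    """
    perms, repl, owner, group, size, date, time, path = line.split()
    return {
        "permissions": perms,
        "replication": int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": date + " " + time,
        "path": path,
    }

entry = parse_ls_line(
    "-rw-r--r--   1 srinath supergroup       1366 2012-04-10 07:06 /test/README.txt")
```

Here entry["size"] is 1366 and entry["path"] is /test/README.txt, matching the listing shown earlier.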
When a command is issued, the client talks to the HDFS NameNode on the user's behalf and carries out the operation. Generally, we refer to a file or a folder using a path starting with / (for example, /data), and the client picks up the NameNode address from the configuration files in the HADOOP_HOME/conf directory.
However, if needed, we can use a fully qualified path to force the client to talk to a specific NameNode. For example, hdfs://bar.foo.com:9000/data asks the client to talk to the NameNode running on bar.foo.com at port 9000.
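A fully qualified HDFS path is an ordinary URI, so its parts can be pulled apart with standard tooling. A minimal Python sketch of that decomposition, using the example host and port from the text:

```python
from urllib.parse import urlparse

# Decompose a fully qualified HDFS path: the scheme selects the HDFS
# filesystem, the authority names the NameNode host and port, and the
# remainder is the path within HDFS.
uri = urlparse("hdfs://bar.foo.com:9000/data")

namenode_host = uri.hostname   # "bar.foo.com"
namenode_port = uri.port       # 9000
hdfs_path = uri.path           # "/data"
```

Omitting the scheme and authority (a bare /data) leaves the choice of NameNode to the client's configuration, as described above.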
HDFS supports most of the Unix commands, such as cp, mv, and chown, and they follow the same pattern as the commands discussed above. The document at http://hadoop.apache.org/docs/r1.0.3/file_system_shell.html provides a list of all the commands. We will use these commands throughout the recipes of this book.