Mounting HDFS (Fuse-DFS)

The Fuse-DFS project allows us to mount HDFS on Linux (and many other flavors of Unix) as a standard filesystem. This allows any program or user to access and interact with HDFS as they would with a traditional filesystem.

Getting ready

You must have the following software installed on your system:

  • Apache Ant (http://ant.apache.org/).
  • Fuse and the fuse development packages. The fuse development files can be found in the fuse-devel RPM for Red Hat/Fedora and in the libfuse-dev package for Debian/Ubuntu.

JAVA_HOME must be set to point to a JDK, not to a JRE.

You must have root privileges on the node where you plan to mount the HDFS filesystem.
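
As a quick sanity check, you can verify these prerequisites from the command line. The following commands assume that ant is on your PATH; the presence of javac under JAVA_HOME confirms that it points to a JDK rather than a JRE:

>ant -version
>$JAVA_HOME/bin/javac -version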

The following recipe assumes you already have pre-built libhdfs libraries. Hadoop ships with pre-built libhdfs libraries for the Linux x86_64 and i386 platforms. If you are using another platform, first follow the Building libhdfs subsection in the There's more... section of this recipe to build the libhdfs libraries.
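
You can check which platforms your Hadoop distribution includes pre-built native libraries for by listing the c++ directory; the exact directory names (for example, Linux-amd64-64) depend on your distribution:

>ls $HADOOP_HOME/c++/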

How to do it...

The following steps show you how to mount an HDFS filesystem as a standard file system on Linux:

  1. Go to $HADOOP_HOME and create a new directory named build.
    >cd $HADOOP_HOME
    >mkdir build
    
  2. Create a symbolic link to the libhdfs libraries inside the build directory. If you are on a different platform, adjust the Linux-amd64-64 directory name in the following command to match your architecture.
    >ln -s c++/Linux-amd64-64/lib/ build/libhdfs
    
  3. Copy the c++ directory to the build folder.
    >cp -R c++/ build/
    
  4. Build fuse-dfs by executing the following command in $HADOOP_HOME. This command will generate the fuse_dfs and fuse_dfs_wrapper.sh files in the build/contrib/fuse-dfs/ directory.
    > ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
    

    Note

    If the build fails with messages similar to undefined reference to 'fuse_get_context', then append the following to the end of the src/contrib/fuse-dfs/src/Makefile.am file:

    fuse_dfs_LDADD=-lfuse -lhdfs -ljvm -lm
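
    Once the build succeeds, you can confirm that the expected files were generated:

    >ls build/contrib/fuse-dfs/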
  5. Verify the paths in fuse_dfs_wrapper.sh and correct them if needed. You may have to change the libhdfs path in the following line:
    export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/$OS_ARCH/server:$HADOOP_HOME/build/libhdfs/:/usr/local/lib
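
    You can also confirm that the libhdfs path referenced by this line actually contains the libraries (this assumes the symbolic link created in step 2):

    >ls $HADOOP_HOME/build/libhdfs/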
    
  6. If it exists, uncomment the user_allow_other line in /etc/fuse.conf. This permits users other than the mounting user to access the mounted filesystem.
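    As a sketch, assuming the user_allow_other line is commented out with a leading #, you can uncomment it as follows (root privileges are required):

    >sed -i 's/^#user_allow_other/user_allow_other/' /etc/fuse.conf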
  7. Create a directory as the mount point:
    >mkdir /u/hdfs
    
  8. Execute the following command from the build/contrib/fuse-dfs/ directory. You have to execute this command with root privileges. Make sure that the HADOOP_HOME and JAVA_HOME environment variables are set properly in the root environment as well. The optional -d parameter enables the debug mode; it is helpful to run the command in debug mode to identify any errors the first time you run it. The rw parameter mounts the filesystem read-write (use ro for read-only). -oserver must point to the NameNode hostname, and -oport should provide the NameNode port number.
    >chmod a+x fuse_dfs_wrapper.sh
    >./fuse_dfs_wrapper.sh rw -oserver=localhost -oport=9000 /u/hdfs/ -d
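
    Once the filesystem is mounted, any standard program or command can operate on it. For a quick check (using the mount point created in step 7):

    >ls /u/hdfs/
    >df -h /u/hdfs/

    To unmount the filesystem later, use the standard FUSE unmount command:

    >fusermount -u /u/hdfs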
    

How it works...

Fuse-DFS is based on FUSE (Filesystem in Userspace). The FUSE project (http://fuse.sourceforge.net/) makes it possible to implement filesystems in user space. Fuse-DFS interacts with the HDFS filesystem using the libhdfs C API. libhdfs uses JNI to spawn a JVM that communicates with the configured HDFS NameNode.
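
You can observe this layering directly: the fuse_dfs binary is linked against the FUSE and libhdfs libraries as well as the JVM library (the -lfuse -lhdfs -ljvm flags in the Makefile.am line shown earlier). A quick check, run from the build/contrib/fuse-dfs/ directory:

>ldd fuse_dfs | grep -E 'fuse|hdfs|jvm'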

There's more...

Multiple HDFS filesystems can be mounted to different directories using Fuse-DFS, following the steps in the preceding sections.
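
For example, the following sketch mounts two separate HDFS clusters to two different directories. The hostnames, ports, and mount points here are hypothetical; substitute the values for your own clusters:

>mkdir /u/hdfs1 /u/hdfs2
>./fuse_dfs_wrapper.sh rw -oserver=namenode1 -oport=9000 /u/hdfs1/
>./fuse_dfs_wrapper.sh rw -oserver=namenode2 -oport=9000 /u/hdfs2/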

Building libhdfs

In order to build libhdfs, you must have the following software installed on your system:

  • The ant-nodeps and ant-trax packages
  • The automake package
  • The Libtool package
  • The zlib-devel package
  • JDK 1.5, which is needed at compile time by Apache Forrest
  • Apache Forrest (http://forrest.apache.org/); use the 0.8 release

Compile libhdfs by executing the following command in $HADOOP_HOME:

>ant compile-c++-libhdfs -Dislibhdfs=1

Package the distribution together with libhdfs by executing the following command. Provide the path to JDK 1.5 using the -Djava5.home property. Provide the path to the Apache Forrest installation using the -Dforrest.home property.

>ant package -Djava5.home=/u/jdk1.5 -Dforrest.home=/u/apache-forrest-0.8

Check whether the build/libhdfs directory contains the libhdfs.* files. If it doesn't, copy those files to build/libhdfs from the build/c++/<your_architecture>/lib directory.

>cp -R build/c++/<your_architecture>/lib/ build/libhdfs

See also

  • The HDFS C API recipe in this chapter.