libhdfs, a native shared library, provides a C API that enables non-Java programs to interact with HDFS. Internally, libhdfs uses JNI to invoke the HDFS Java API.
Current Hadoop distributions contain pre-compiled libhdfs libraries for 32-bit and 64-bit Linux operating systems. If your operating system is not compatible with the pre-compiled libraries, you may have to download the standard Hadoop distribution and compile the libhdfs library from the source code. Refer to the Mounting HDFS (Fuse-DFS) recipe for information on compiling the libhdfs library.
The following steps show you how to perform operations on an HDFS installation using the HDFS C API:

Replace NAMENODE_HOSTNAME and PORT in the following program with the relevant values corresponding to the NameNode of your HDFS cluster. The hdfs_cpp_demo.c source file is provided in the HDFS_C_API directory of the source code bundle for this chapter.

#include "hdfs.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("NAMENODE_HOSTNAME", PORT);
    if (!fs) {
        fprintf(stderr, "Cannot connect to HDFS.\n");
        exit(-1);
    }

    char* fileName = "demo_c.txt";
    char* message = "Welcome to HDFS C API!!!";
    int size = strlen(message);

    // hdfsExists returns 0 if the path exists
    if (hdfsExists(fs, fileName) == 0) {
        fprintf(stdout, "File %s exists!\n", fileName);
    } else {
        // Create and open the file for writing
        hdfsFile outFile = hdfsOpenFile(fs, fileName, O_WRONLY|O_CREAT, 0, 0, 0);
        if (!outFile) {
            fprintf(stderr, "Failed to open %s for writing!\n", fileName);
            exit(-2);
        }
        // Write to the file
        hdfsWrite(fs, outFile, (void*)message, size);
        hdfsCloseFile(fs, outFile);
    }

    // Open the file for reading
    hdfsFile inFile = hdfsOpenFile(fs, fileName, O_RDONLY, 0, 0, 0);
    if (!inFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", fileName);
        exit(-2);
    }
    char* data = malloc(size + 1);
    // Read from the file
    tSize readSize = hdfsRead(fs, inFile, (void*)data, size);
    data[readSize] = '\0';   // NUL-terminate before printing
    fprintf(stdout, "%s\n", data);
    free(data);

    hdfsCloseFile(fs, inFile);
    hdfsDisconnect(fs);
    return 0;
}
Compile the program using gcc as follows. When compiling, you have to link with the libhdfs and the JVM libraries, and include the JNI header files of your Java installation. An example compile command would look like the following. Replace ARCH and the architecture-dependent paths with the paths relevant to your system.

>gcc hdfs_cpp_demo.c \
    -I $HADOOP_HOME/src/c++/libhdfs \
    -I $JAVA_HOME/include \
    -I $JAVA_HOME/include/linux/ \
    -L $HADOOP_HOME/c++/ARCH/lib/ \
    -L $JAVA_HOME/jre/lib/ARCH/server \
    -lhdfs -ljvm -o hdfs_cpp_demo
Export an environment variable named CLASSPATH with the Hadoop dependencies. A safe approach is to include all the JAR files in $HADOOP_HOME and in $HADOOP_HOME/lib.

export CLASSPATH=$HADOOP_HOME/hadoop-core-xx.jar:....
Ant build script to generate the classpath

Add the following Ant target to the build file given in step 2 of the HDFS Java API recipe. The modified build.xml script is provided in the HDFS_C_API folder of the source package for this chapter.

<target name="print-cp">
    <property name="classpath" refid="hadoop-classpath"/>
    <echo message="classpath= ${classpath}"/>
</target>
Execute the Ant build using ant print-cp to generate a string with all the JARs in $HADOOP_HOME and $HADOOP_HOME/lib. Copy and export this string as the CLASSPATH environment variable.
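Instead of copying the classpath by hand, you can extract it from the Ant output with a one-line sed filter. The following is a sketch only: the [echo] prefix and the sample jar paths are assumptions about how your Ant version prints the message, so check the actual output of ant print-cp first.

```shell
# Hypothetical sample of the line printed by 'ant print-cp';
# the '[echo]' prefix and the jar paths shown are illustrative.
line='     [echo] classpath= /opt/hadoop/hadoop-core-1.0.4.jar:/opt/hadoop/lib/commons-logging-1.1.1.jar'

# Keep only the text after 'classpath= '.
cp=$(echo "$line" | sed 's/.*classpath= *//')

export CLASSPATH="$cp"
echo "$CLASSPATH"
```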
Execute the program. The LD_LIBRARY_PATH must contain the directories of the libhdfs and JVM shared libraries.

>LD_LIBRARY_PATH=$HADOOP_HOME/c++/ARCH/lib:$JAVA_HOME/jre/lib/ARCH/server ./hdfs_cpp_demo
Welcome to HDFS C API!!!
First we connect to an HDFS cluster using the hdfsConnect command by providing the hostname (or the IP address) and the port of the NameNode of the HDFS cluster. The hdfsConnectAsUser command can be used to connect to an HDFS cluster as a specific user.

hdfsFS fs = hdfsConnect("NAMENODE_HOSTNAME", PORT);
We create a new file and obtain a handle to the newly created file using the hdfsOpenFile command. The O_WRONLY|O_CREAT flags create a new file, or overwrite the existing file, and open it in write-only mode. Other supported flags are O_RDONLY and O_APPEND. The fourth, fifth, and sixth parameters of the hdfsOpenFile command are the buffer size for read/write operations, the block replication factor, and the block size for the newly created file. Specify 0 if you want to use the default values for these three parameters.

hdfsFile outFile = hdfsOpenFile(fs, fileName, flags, 0, 0, 0);
The hdfsWrite command writes the provided data into the file specified by the outFile handle. The data size needs to be specified in number of bytes.

hdfsWrite(fs, outFile, (void*)message, size);
The hdfsRead command reads data from the file specified by inFile. The size of the buffer in bytes needs to be provided as the fourth parameter. The hdfsRead command returns the actual number of bytes read, which might be less than the buffer size. If you want to ensure that a certain number of bytes are read from the file, it is advisable to call hdfsRead from inside a loop until the specified number of bytes is read.

char* data = malloc(sizeof(char) * size);
tSize readSize = hdfsRead(fs, inFile, (void*)data, size);
The HDFS C API (libhdfs) supports many more filesystem operations than the functions we have used in the preceding sample. Refer to the $HADOOP_HOME/src/c++/libhdfs/hdfs.h header file for more information.
You can also use the HDFS configuration files to point libhdfs to your HDFS NameNode, instead of specifying the NameNode hostname and the port number in the hdfsConnect command.

Set the host and port parameters of the hdfsConnect command to 'default' and 0. (Setting the host as NULL would make libhdfs use the local filesystem.)

hdfsFS fs = hdfsConnect("default", 0);

Add the conf directory of your HDFS installation to the CLASSPATH environment variable.

export CLASSPATH=$HADOOP_HOME/hadoop-core-xx.jar:....:$HADOOP_HOME/conf