Merging files in HDFS

This recipe shows how to merge files in HDFS to create a single file. This is useful when retrieving the output of a MapReduce computation with multiple reducers where each reducer produces a part of the output.

How to do it...

  1. The HDFS getmerge command copies the files under a given HDFS path into a single concatenated file in the local filesystem:
    >bin/hadoop fs -getmerge /user/foo/demofiles merged.txt
    
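The effect of getmerge is to concatenate the part files produced by the reducers, in order, into one local file. If a Hadoop cluster is not at hand, the same effect can be sketched locally with cat; the directory and file names below (parts/, part-r-00000, and so on) are illustrative stand-ins for a reducer output directory, not names from the recipe.

```shell
# Create a directory that mimics a MapReduce output directory
# (hypothetical file names and data).
mkdir -p parts
printf 'apple\t3\n'  > parts/part-r-00000
printf 'banana\t5\n' > parts/part-r-00001

# getmerge concatenates the files into one local file; for local
# files, cat over the part files in order reproduces that result.
cat parts/part-r-00000 parts/part-r-00001 > merged.txt
cat merged.txt
```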

How it works...

The getmerge command has the following syntax:

hadoop fs -getmerge <src> <localdst> [addnl]

The getmerge command takes three parameters. The first, <src>, is the HDFS path to the directory that contains the files to be concatenated. The second, <localdst>, is the path of the merged file in the local filesystem. The optional third parameter, addnl, appends a newline to the result file after the data from each merged file.
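Whether addnl matters depends on whether the part files already end with trailing newlines. A minimal local sketch of the difference, using printf and cat (the file names and contents are made up for illustration):

```shell
# Two input files whose contents do NOT end with a newline (hypothetical data).
printf 'first file'  > a.txt
printf 'second file' > b.txt

# Without addnl, the file contents run together:
cat a.txt b.txt > merged_plain.txt        # contains "first filesecond file"

# With addnl, a newline is appended after each file's data;
# locally that is equivalent to adding one ourselves:
{ cat a.txt; echo; cat b.txt; echo; } > merged_addnl.txt
cat merged_addnl.txt
```

With well-formed text part files that already end in a newline, addnl simply adds one extra blank line after each file's data.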
