This recipe shows how to merge files in HDFS to create a single file. This is useful when retrieving the output of a MapReduce computation with multiple reducers where each reducer produces a part of the output.
The getmerge command has the following syntax:
hadoop fs -getmerge <src> <localdst> [addnl]
The getmerge command takes three parameters. The first, <src>, is the HDFS path to the directory that contains the files to be concatenated. The second, <localdst>, is the local filename of the merged file. The third, addnl, is an optional parameter that adds a newline to the result file after the data from each merged file.
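To make the effect of the parameters concrete, here is a minimal sketch that imitates the merge on the local filesystem. It is not Hadoop's implementation: the function name local_getmerge, the HDFS path in the comment, and the use of sorted glob order are all assumptions made for illustration.

```shell
# On a cluster, the real command would look something like this
# (the HDFS path and output filename are hypothetical):
#   hadoop fs -getmerge /user/hypothetical/output merged.txt

# Local imitation of the merge, for illustration only.
# Usage: local_getmerge <src_dir> <dst_file> [addnl]
local_getmerge() {
    src_dir=$1
    dst_file=$2
    addnl=${3:-}

    : > "$dst_file"                      # create or truncate the destination
    for f in "$src_dir"/*; do            # the glob expands in sorted order
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        cat "$f" >> "$dst_file"          # append this part file's data
        if [ "$addnl" = "addnl" ]; then
            printf '\n' >> "$dst_file"   # newline after each merged file
        fi
    done
}
```

For a directory holding part-00000 and part-00001, calling local_getmerge with that directory and a destination filename writes the two parts back to back; passing addnl as the third argument also ends each part with a newline. The destination should be outside the source directory so that it is not picked up as an input.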