Setting HDFS block size

HDFS stores files across the cluster by breaking them down into coarse-grained, fixed-size blocks. The default HDFS block size is 64 MB. The block size of a data product can affect the performance of filesystem operations: larger block sizes are more effective if you are storing and processing very large files. The block size can also affect the performance of MapReduce computations, as the default behavior of Hadoop is to create one map task for each data block of the input files.
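
To make the one-map-task-per-block behavior concrete, the following minimal Java sketch estimates how many blocks (and so, by default, how many map tasks) a file of a given size would occupy. The class and method names are hypothetical and not part of the Hadoop API, and the calculation only illustrates the arithmetic; it does not account for input format or split settings.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
class BlockCountEstimate {

    // Number of fixed-size blocks needed to hold the file (ceiling division).
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("file size must be >= 0 and block size > 0");
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Length of the final block, which is usually shorter than the block size.
    static long lastBlockLength(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        long remainder = fileSizeBytes % blockSizeBytes;
        return remainder == 0 ? blockSizeBytes : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 215227246L;          // size of the sample file inspected with fsck in this recipe
        long size64Mb = 64L * 1024 * 1024;   // default block size
        long size128Mb = 128L * 1024 * 1024; // block size configured in this recipe

        System.out.println("64 MB blocks:  " + blockCount(fileSize, size64Mb));
        System.out.println("128 MB blocks: " + blockCount(fileSize, size128Mb));
        System.out.println("last 128 MB block length: " + lastBlockLength(fileSize, size128Mb));
    }
}
```

Under these assumptions, doubling the block size from 64 MB to 128 MB halves the number of blocks for this file from four to two, which in turn halves the default number of map tasks.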

How to do it...

  1. To set the HDFS block size using the NameNode configuration file, add or modify the following property in $HADOOP_HOME/conf/hdfs-site.xml. The block size is specified in bytes. This change does not alter the block size of files that are already in HDFS; only files copied after the change will have the new block size.
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>
    
  2. To set the HDFS block size for specific file paths, specify the block size when uploading the file from the command line as follows:
    >bin/hadoop fs -Ddfs.block.size=134217728 -put data.in /user/foo
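
Because both steps expect the block size in bytes, it can help to derive the value rather than type it by hand. The following is a minimal sketch with hypothetical class and method names; it converts megabytes to bytes and prints a property fragment in the same shape as the one in step 1. It only builds a string and does not read or write any Hadoop configuration.

```java
// Hypothetical helper for illustration only; it does not touch any Hadoop configuration.
class BlockSizeProperty {

    // Converts a size in megabytes to bytes (1 MB = 1024 * 1024 bytes).
    static long megabytesToBytes(int megabytes) {
        if (megabytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        return megabytes * 1024L * 1024L;
    }

    // Renders a property element like the one added to hdfs-site.xml in step 1.
    static String toPropertyXml(String name, long valueBytes) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + valueBytes + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        long bytes = megabytesToBytes(128);
        System.out.println(toPropertyXml("dfs.block.size", bytes));
    }
}
```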

There's more...

You can also specify the block size when creating files using the HDFS Java API:

public FSDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize)

You can use the fsck command to find the block size and block locations of a particular file path in HDFS. You can also find this information by browsing the filesystem from the HDFS monitoring console.

>bin/hadoop fsck /user/foo/data.in -blocks -files -locations
......
/user/foo/data.in 215227246 bytes, 2 block(s): ....
0. blk_6981535920477261584_1059 len=134217728 repl=1 [hostname:50010]
1. blk_-8238102374790373371_1059 len=81009518 repl=1 [hostname:50010]

......
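
If you want to check the reported block lengths programmatically, the following minimal sketch extracts the len= values from fsck output shaped like the sample above and sums them. The class name and the regular expression are assumptions based only on that sample; the fsck output format may differ between Hadoop versions, so treat this as an illustration rather than a robust parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for illustration only; the pattern is derived from the sample output above.
class FsckBlockLengths {

    // Matches a block entry such as "blk_<id>_<genstamp> len=<bytes>" and captures the length.
    private static final Pattern BLOCK_LENGTH = Pattern.compile("blk_.*?len=(\\d+)");

    // Returns the length of every block entry found in the given fsck output.
    static List<Long> blockLengths(String fsckOutput) {
        List<Long> lengths = new ArrayList<>();
        for (String line : fsckOutput.split("\\R")) {
            Matcher m = BLOCK_LENGTH.matcher(line);
            if (m.find()) {
                lengths.add(Long.parseLong(m.group(1)));
            }
        }
        return lengths;
    }

    public static void main(String[] args) {
        String sample =
              "/user/foo/data.in 215227246 bytes, 2 block(s): ....\n"
            + "0. blk_6981535920477261584_1059 len=134217728 repl=1 [hostname:50010]\n"
            + "1. blk_-8238102374790373371_1059 len=81009518 repl=1 [hostname:50010]\n";

        List<Long> lengths = blockLengths(sample);
        long total = 0;
        for (long length : lengths) {
            total += length;
        }
        System.out.println("blocks: " + lengths.size() + ", total bytes: " + total);
    }
}
```

For the sample output, the two block lengths add up to the 215227246 bytes reported for the file, with the first block filled to the configured 134217728-byte block size.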

See also

  • The Setting file replication factor recipe in this chapter.