Chapter 2. Advanced HDFS

In this chapter, we will cover:

  • Benchmarking HDFS
  • Adding a new DataNode
  • Decommissioning DataNodes
  • Using multiple disks/volumes and limiting HDFS disk usage
  • Setting HDFS block size
  • Setting the file replication factor
  • Using HDFS Java API
  • Using HDFS C API (libhdfs)
  • Mounting HDFS (Fuse-DFS)
  • Merging files in HDFS

Introduction

Hadoop Distributed File System (HDFS) is a block-structured, distributed filesystem that is designed to run on low-cost commodity hardware. HDFS supports storing massive amounts of data and provides high-throughput access to that data. HDFS stores file data across multiple nodes with redundancy to ensure fault tolerance and high aggregate bandwidth.

HDFS is the default distributed filesystem used by Hadoop MapReduce computations. Hadoop supports data-locality-aware processing of the data stored in HDFS. However, HDFS can be used as a general-purpose distributed filesystem as well. The HDFS architecture consists mainly of a centralized NameNode, which handles the filesystem metadata, and DataNodes, which store the actual data blocks. HDFS data blocks are much coarser grained than those of typical filesystems, and HDFS performs better when storing and processing large files.
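As a quick illustration of this architecture, the following minimal sketch uses the HDFS Java API (covered in detail in the Using HDFS Java API recipe later in this chapter) to ask the NameNode for the block size and replication factor of a file. The path /user/hadoop/sample.txt and the NameNode address are hypothetical placeholders; the sketch assumes a client-side Hadoop configuration on the classpath that points at your cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBlockInfo {
        public static void main(String[] args) throws Exception {
            // Loads core-site.xml/hdfs-site.xml from the classpath;
            // fs.default.name (fs.defaultFS in newer releases) should
            // point at your NameNode, for example hdfs://namenode:9000.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical file path, used purely for illustration.
            FileStatus status =
                fs.getFileStatus(new Path("/user/hadoop/sample.txt"));

            // Block size and replication factor are per-file metadata
            // kept by the NameNode; the DataNodes hold the blocks themselves.
            System.out.println("Block size  : " + status.getBlockSize() + " bytes");
            System.out.println("Replication : " + status.getReplication());
        }
    }

Note that this metadata query never touches a DataNode; it is answered entirely by the NameNode, which is why HDFS keeps the filesystem namespace on a single centralized node while spreading the block data across the cluster.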

The Setting up HDFS recipe and other related recipes in Chapter 1, Getting Hadoop Up and Running in a Cluster, show how to deploy HDFS and give an overview of its basic operation. In this chapter, you will be introduced to a selected set of advanced HDFS operations that are useful when performing large-scale data processing with Hadoop MapReduce, as well as when using HDFS as a standalone distributed filesystem for non-MapReduce use cases.
