Hadoop

Hadoop is a leading technology in the field of data warehouse systems, mainly due to its ability to handle very large amounts of data effectively. To achieve this, Hadoop fully exploits parallel computing: a central master divides all the data needed for a job into smaller chunks and sends them to two or more slaves. These slaves act as nodes within a network, each working separately on its local portion of the data. They may be physically separate machines, but they can also be cores within a single CPU (a setup usually considered pseudo-parallel).
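To make the split-and-distribute idea concrete, here is a minimal, Hadoop-free Java sketch: a "master" thread divides an array into chunks and hands each chunk to a pool of workers, which process their portions independently before the partial results are combined. The class name ChunkedSum and the choice of summation as the task are illustrative assumptions, not part of Hadoop itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of the split-and-distribute idea: a "master"
// divides the input into chunks and hands each chunk to a worker.
// This is a plain-JVM analogy, not Hadoop's actual scheduler.
public class ChunkedSum {
  public static void main(String[] args) throws Exception {
    int[] data = new int[1_000_000];
    for (int i = 0; i < data.length; i++) data[i] = 1;

    int workers = Runtime.getRuntime().availableProcessors();
    int chunk = (data.length + workers - 1) / workers;

    ExecutorService pool = Executors.newFixedThreadPool(workers);
    List<Future<Long>> partials = new ArrayList<>();

    // "Master": split the data into chunks, one task per worker.
    for (int w = 0; w < workers; w++) {
      final int start = w * chunk;
      final int end = Math.min(start + chunk, data.length);
      partials.add(pool.submit(() -> {
        long sum = 0;  // each "slave" works separately and locally
        for (int i = start; i < end; i++) sum += data[i];
        return sum;
      }));
    }

    // Combine the partial results back at the master.
    long total = 0;
    for (Future<Long> f : partials) total += f.get();
    pool.shutdown();
    System.out.println("total = " + total);
  }
}
```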

At the heart of Hadoop is the MapReduce programming model. Originally conceived at Google, this model forms Hadoop's processing layer and is responsible for moving the computation close to where the data resides. This minimizes the time and cost of data transfer and makes it possible to scale a job out to hundreds of nodes.
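The canonical example of this model is the WordCount job from the Hadoop MapReduce tutorial: the map function emits a (word, 1) pair for every token it reads, and the reduce function sums the counts for each word. The sketch below follows that tutorial example; it assumes the Hadoop client libraries (org.apache.hadoop.*) are on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: runs locally on each node, close to the data.
  // Emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: receives all counts for a given word and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures the job and submits it to the cluster.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note how the driver only describes the job; the framework takes care of splitting the input, shipping the map tasks to the nodes holding the data, and shuffling intermediate (word, count) pairs to the reducers.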
