Summary

This chapter begins with an introduction to Hadoop that gives us a deeper understanding of its distributed architecture for storage and processing, why and when to use it, how it works, and how the distributed job tracker and task trackers operate.

Following the introduction is a walkthrough of Pentaho Data Integration working with the Hortonworks Sandbox, a Hadoop distribution that is well suited to learning Hadoop. The chapter shows you how to read and write a data file to HDFS, import it into Hive, and query the data using a SQL-like language.

In the following chapters, we will discuss how to extend the use of Hadoop with the help of other Pentaho tools and present the data visually using CTools, a community-driven visualization tool.
