OpenStack data-processing service – Sahara

Data analytics relies on a number of data-processing frameworks. Apache Hadoop is one of the most popular; it includes an open source implementation of the MapReduce programming model. Hadoop runs on clusters of machines, and services such as Amazon Elastic MapReduce (EMR) provide managed Hadoop clusters. Sahara is an OpenStack project that gives users an easy way to provision and manage clusters for frameworks such as Hadoop, Spark, and Storm.

The user specifies parameters such as the cluster topology, the framework version, the node types, and so on. Once these parameters are supplied, Sahara provisions the requested cluster in a matter of minutes. With Sahara, an existing cluster can also be scaled out or scaled in by adding or removing nodes.
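
To make the idea of a cluster specification concrete, here is a minimal Python sketch of the kind of parameters described above, together with scale-out and scale-in operations. It is not Sahara's API: the `ClusterSpec` class, its fields, and the `scale` method are hypothetical names invented for this illustration, and the version string is only an example value.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Hypothetical stand-in for the parameters a user supplies.

    This class is an illustration only; it is not part of Sahara.
    """
    name: str
    framework: str            # for example "hadoop", "spark", or "storm"
    framework_version: str    # illustrative value, not a recommendation
    node_groups: dict = field(default_factory=dict)  # node type -> node count

    def scale(self, node_type: str, delta: int) -> int:
        """Scale out (delta > 0) or scale in (delta < 0) one node type."""
        new_count = self.node_groups.get(node_type, 0) + delta
        if new_count < 0:
            raise ValueError(f"cannot remove more {node_type} nodes than exist")
        self.node_groups[node_type] = new_count
        return new_count

    def total_nodes(self) -> int:
        """Return the number of nodes across all node types."""
        return sum(self.node_groups.values())


# Topology: one master node and three worker nodes (assumed example values).
spec = ClusterSpec(
    name="analytics",
    framework="hadoop",
    framework_version="2.7.1",
    node_groups={"master": 1, "worker": 3},
)

spec.scale("worker", 2)    # scale out: add two workers
spec.scale("worker", -1)   # scale in: remove one worker
print(spec.node_groups, spec.total_nodes())
```

In this sketch the topology is reduced to a mapping from node type to node count, which is enough to show that scaling is simply a change to those counts. A real deployment would carry more detail per node type.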

Sahara interacts with several other OpenStack components. The following diagram shows some of these interactions:
