Chapter 6 Introducing Spark and Kafka

OBJECTIVE


6.1 Introducing Spark

6.2 Working with Kafka

Now that we have covered the core Big Data components, such as Hadoop, MapReduce and NoSQL, it is the right time to introduce another very important part of the Hadoop ecosystem: Apache Spark. Spark is widely used across organizations to process large data sets. It is popular for its processing speed and its ability to integrate with diverse databases. Spark is often used alongside Apache Kafka, an open source distributed streaming platform that is used to stream data. Kafka was developed in Scala and Java at LinkedIn and later contributed to the Apache Software Foundation. It provides a unified, high-throughput, low-latency platform for handling real-time data feeds. We shall study the functions of Apache Kafka in this chapter.
