There's more...

Now that you understand the basics, load a large amount of text into HDFS, for example, a collection of stories.

If your files are compressed, you can load them into HDFS as they are. Both Hadoop and Spark come with decompression codecs (for example, gzip and bzip2) and pick the right one based on the file extension.
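
As a minimal sketch of reading compressed input, the following program counts the lines in a set of gzip files without any explicit decompression step. The object name CompressedInput, the helper readLines, and the default HDFS path are hypothetical and only serve as illustration; substitute the location of your own files.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CompressedInput {
  // textFile goes through Hadoop's text input format, which selects a
  // decompression codec from the file extension (.gz, .bz2, and so on).
  def readLines(sc: SparkContext, path: String): RDD[String] =
    sc.textFile(path)

  def main(args: Array[String]): Unit = {
    // Hypothetical default location; pass a real path as the first argument.
    val path = args.headOption
      .getOrElse("hdfs://localhost:9000/user/hduser/stories/*.gz")

    // Fall back to a local master only if none was supplied (assumption:
    // the program may also be started outside spark-submit).
    val conf = new SparkConf()
      .setAppName("CompressedInput")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    try {
      println(s"Number of lines: ${readLines(sc, path).count()}")
    } finally {
      sc.stop()
    }
  }
}
```

Note that a gzip file cannot be split, so each .gz file is read as a single partition; many medium-sized files usually parallelize better than one very large one.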

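When wordsFlatMap was mapped into the wordsMap RDD, the result was an RDD of key-value tuples. Key-based operations on such an RDD, for example reduceByKey, rely on an implicit conversion that turns the RDD into a pair RDD (PairRDDFunctions in the Scala API). Because the conversion is implicit, you do not have to do anything in the Spark shell. If you are writing standalone Scala code for a Spark release earlier than 1.3, add the following import statement to bring the conversion into scope (later releases no longer need it):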

import org.apache.spark.SparkContext._ 
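
To show where the implicit conversion takes effect, here is a small standalone sketch of a word count. The names wordsFlatMap and wordsMap follow the recipe; the object WordCountPairs, the helper countWords, the split pattern, and the sample sentences are assumptions made for illustration only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // needed before Spark 1.3; redundant afterwards
import org.apache.spark.rdd.RDD

object WordCountPairs {
  def countWords(lines: RDD[String]): RDD[(String, Int)] = {
    // Split every line into words and drop empty tokens.
    val wordsFlatMap = lines.flatMap(_.split("\\W+")).filter(_.nonEmpty)

    // Still an ordinary RDD, but its elements are now (word, 1) tuples.
    val wordsMap = wordsFlatMap.map(w => (w, 1))

    // reduceByKey is not a method of RDD itself. The compiler finds it on
    // PairRDDFunctions by implicitly converting the RDD of tuples.
    wordsMap.reduceByKey(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountPairs")
      .setIfMissing("spark.master", "local[*]")
    val sc = new SparkContext(conf)

    try {
      // Hypothetical sample input standing in for text loaded from HDFS.
      val lines = sc.parallelize(Seq("to be or not to be", "that is the question"))
      countWords(lines).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If the conversion is not in scope on an older release, the compiler reports that reduceByKey is not a member of the RDD type, which is usually the first hint that the import is missing.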