Shuffle and sort

Once the mappers have finished processing the input data (essentially, splitting the data and generating key/value pairs), their output has to be distributed across the cluster before the reduce tasks can begin. A reduce task therefore starts with the shuffle and sort step: it fetches the output files written by all of the mappers (as partitioned by their associated partitioners) and copies them to the local machine on which the reduce task is running. These individual pieces of data are then merged and sorted by key into one larger list of key/value pairs. The purpose of this sort is to group equivalent keys together so that their values can be iterated over easily in the reduce task. The framework handles all of this automatically, although custom code can control how the keys are sorted and grouped.
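For example, in Hadoop's Java API the sort order used during this phase can be overridden by registering a comparator on the job. The following is a minimal sketch, assuming Text keys; the class name DescendingKeyComparator is illustrative, not part of the Hadoop library:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Sorts Text keys in descending rather than the default ascending order.
    public class DescendingKeyComparator extends WritableComparator {

        protected DescendingKeyComparator() {
            super(Text.class, true);  // true: have the parent instantiate keys
        }

        @Override
        @SuppressWarnings("rawtypes")
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);  // negate the natural Text ordering
        }
    }

In the job driver, the comparator is plugged into the shuffle and sort phase with job.setSortComparatorClass(DescendingKeyComparator.class). Grouping is controlled separately: a comparator registered via job.setGroupingComparatorClass(...) decides which sorted keys are treated as equal and therefore share a single reduce() call.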
