80 | Big Data Simplified
On the cluster running the Hadoop MapReduce setup, we can consider that these key-value
pairs are sent to the machine on which the reducer process runs.
FIGURE 4.12 Sorting of key-value pairs
[Figure: three input splits (Split 1, Split 2 and Split 3), each processed by a mapper (M) that emits pairs with keys A, B, C and D.]
Now, these key-value pairs are sorted so that all values which have the same key are available
together, and they are then fed to the reducer. The sorting implies that all profile views for a particular
member, say John, are available in a group.
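The sorting and grouping step can be sketched in a few lines of Python. This is only an illustration and not the code that Hadoop itself runs. The member names other than John, the sample pairs and the helper name `sort_and_group` are all assumptions made for the example, as is the idea that each profile view is emitted as a pair of the form (member, 1).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical mapper output: one (member, 1) pair per profile view.
mapped_pairs = [
    ("John", 1), ("Asha", 1), ("John", 1),
    ("Ravi", 1), ("Asha", 1), ("John", 1),
]


def sort_and_group(pairs):
    """Sort pairs by key, then collect the values of each key into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


grouped = sort_and_group(mapped_pairs)
for member, views in grouped:
    print(member, views)
```

Note that `groupby` only merges neighbouring items, so the sort has to come first. This mirrors the reason the framework sorts before it calls the reducer.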
FIGURE 4.13 Reducer function applied to sorted results
[Figure: the three mappers (M) for Split 1, Split 2 and Split 3 feed their sorted output, grouped by key (A, B, C, D), to a single reducer (R).]
The reducer is the code that we have written to sum up the values associated with the same key.
As we can see, a lot goes on here beyond the map and reduce logic for which we
write code. The MapReduce framework handles all of these steps behind the scenes:
collecting the key-value pairs, transferring them across the network to the
cluster node on which the reduce job runs, and sorting them so that the values associated with
the same key appear together.
4.3.2 Using Multiple Reducers
Let us now consider that we want two reducers running on two different nodes. There are now
two partitions to which the keys can be sent. In this scenario, we have to figure out which
key is sent to which reducer. This process is called assigning partitions.
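One common way to assign partitions is to hash the key and take the remainder after dividing by the number of reducers. The Python sketch below illustrates that idea only. The function names `assign_partition` and `partition_pairs`, the sample pairs and the choice of `zlib.crc32` as the hash are assumptions for this example. A stable hash is used because Python's built-in `hash` for strings can differ from one run to the next.

```python
import zlib


def assign_partition(key, num_reducers):
    """Return the index of the reducer that should receive this key."""
    # The same key always produces the same checksum, hence the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Distribute (key, value) pairs into one list per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[assign_partition(key, num_reducers)].append((key, value))
    return partitions


# Hypothetical mapper output: one (member, 1) pair per profile view.
sample_pairs = [("John", 1), ("Asha", 1), ("John", 1), ("Ravi", 1), ("Asha", 1)]

partitions = partition_pairs(sample_pairs, num_reducers=2)
for index, partition in enumerate(partitions):
    print("Reducer", index, "receives", partition)
```

Whatever rule is chosen, the important property is that every pair with the same key reaches the same reducer. Otherwise the sum for that key would be split across two nodes.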