Working with Big Data inPython | 253
PHP, Scala, Perl, UNIX and many more. This utility allows us to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. A sample Hadoop Streaming invocation is given below.
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/tools/lib/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
9.4.2 Python MapReduce Code
mapper.py

#!/usr/bin/python
import sys

# Word Count Example
# input comes from standard input (STDIN)
for line in sys.stdin:
    line = line.strip()     # remove leading and trailing whitespace
    words = line.split()    # split the line into a list of words
    for word in words:
        # write the results to standard output (STDOUT)
        print '%s\t%s' % (word, 1)    # emit the word with a count of 1
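The listing above targets Python 2 (note the print statement). For readers on Python 3, the same logic can be sketched as a small helper plus a stdin driver; map_line is a name chosen here for illustration, not part of the book's listing:

```python
import sys

def map_line(line):
    """Split one input line into (word, 1) pairs, as the mapper emits."""
    return [(word, 1) for word in line.strip().split()]

if __name__ == '__main__':
    # Read lines from STDIN and emit tab-separated "word<TAB>1" records.
    for line in sys.stdin:
        for word, count in map_line(line):
            print('%s\t%s' % (word, count))
```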
reducer.py

#!/usr/bin/python
import sys
from operator import itemgetter

# using a dictionary to map words to their counts
current_count = {}

# input comes from STDIN
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)   # parse the mapper's output
    try:
        count = int(count)
    except ValueError:
        continue                        # skip malformed lines
    current_count[word] = current_count.get(word, 0) + count

# write the final (word, count) pairs to STDOUT, sorted by word
for word, count in sorted(current_count.items(), key=itemgetter(0)):
    print '%s\t%s' % (word, count)
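The reducer's core step, accumulating a total per word in a dictionary, can also be sketched in Python 3. Here reduce_counts and the sample pairs are illustrative names, not part of the book's listing:

```python
from collections import defaultdict

def reduce_counts(pairs):
    """Sum the counts for each word, mirroring the reducer's dictionary."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += int(count)   # counts arrive as strings from the mapper
    return dict(totals)

# Pairs as they would arrive from the mapper, word and the string "1":
counts = reduce_counts([('hello', '1'), ('world', '1'), ('hello', '1')])
```

On a real cluster, the shuffle phase guarantees that all pairs for a given word reach the same reducer; the dictionary above plays that grouping role when the script runs standalone.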