258 | Big Data Simplified
4. Now check the output in HDFS at the specified path, as shown below.
hadoop fs -ls /data/pythonoutput
hadoop fs -cat /data/pythonoutput/part-00000
Summary
Before starting to program in Python, we first need to import a few
basic libraries:
‘os’ for operating-system-dependent functions.
‘numpy’ for numerical operations.
‘pandas’ for extensive data manipulation functionality.
‘matplotlib’ for producing 2D plots that visualize and help explore
data sets.
‘scikit-learn’ (imported as ‘sklearn’) for machine learning
functionality.
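As a minimal sketch of the import step, the snippet below checks which of these libraries can be imported in the current environment. The helper name check_libraries and the LIBRARIES mapping are illustrative assumptions, not part of any library; the only library-specific fact relied on is that scikit-learn is imported under the module name sklearn.

```python
import importlib

# Maps the package name (as installed) to the module name (as imported).
# The mapping itself is a hypothetical helper structure for this sketch.
LIBRARIES = {
    "os": "os",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
}


def check_libraries(libraries=LIBRARIES):
    """Return {package name: True/False} depending on whether it imports."""
    status = {}
    for package, module in libraries.items():
        try:
            importlib.import_module(module)
            status[package] = True
        except ImportError:
            status[package] = False
    return status


status = check_libraries()
for package, available in status.items():
    print(f"{package}: {'available' if available else 'missing'}")
```

Once a library is reported as available, it is conventionally imported with a short alias, for example import numpy as np and import pandas as pd.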
To install a package, use ‘pip’, the package management system for
Python. ‘pip’ commands are run from the terminal (for example,
pip install numpy).
The basic matplotlib plots used for exploratory data analysis are the
box plot, histogram and scatter plot.
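The three plot types can be sketched as follows. The data is randomly generated purely for illustration, and the function name basic_eda_plots is an assumption of this example; the non-interactive "Agg" backend is chosen only so that the code runs without a display.

```python
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def basic_eda_plots(x, y):
    """Draw a box plot, a histogram and a scatter plot side by side."""
    fig, (ax_box, ax_hist, ax_scatter) = plt.subplots(1, 3, figsize=(12, 4))

    ax_box.boxplot(x)          # spread, median and outliers of x
    ax_box.set_title("Box plot")

    ax_hist.hist(x, bins=10)   # frequency distribution of x
    ax_hist.set_title("Histogram")

    ax_scatter.scatter(x, y)   # relationship between x and y
    ax_scatter.set_title("Scatter plot")

    fig.tight_layout()
    return fig


# Synthetic sample data (illustrative only).
rng = random.Random(0)
x = [rng.gauss(50, 10) for _ in range(200)]
y = [2 * value + rng.gauss(0, 5) for value in x]

fig = basic_eda_plots(x, y)
```

Calling fig.savefig with a file name writes the figure to disk; in an interactive session, plt.show() displays it instead.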
The numpy library provides an array-like object, memmap, that creates
a memory map to an array stored in a binary file on disk. A memmap
array can be used anywhere a numpy ndarray is accepted.
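A minimal sketch of this idea, assuming a small float32 array written to a temporary file (the file name values.dat and the 3 x 4 shape are arbitrary choices for illustration):

```python
import os
import tempfile

import numpy as np

# Hypothetical file location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "values.dat")

# mode="w+" creates (or overwrites) the file and maps it for read/write.
writer = np.memmap(path, dtype="float32", mode="w+", shape=(3, 4))
writer[:] = np.arange(12, dtype="float32").reshape(3, 4)
writer.flush()  # push pending changes to disk
del writer      # drop the reference to the write mapping

# mode="r" maps the same file read-only; dtype and shape must be given
# again because the raw binary file stores no metadata.
reader = np.memmap(path, dtype="float32", mode="r", shape=(3, 4))

# The memmap is passed to an ordinary numpy function like any ndarray.
row_sums = np.sum(reader, axis=1)
print(row_sums.tolist())
```

Because the data stays on disk and is paged in on demand, the same pattern applies to arrays that are too large to fit in memory.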
Message passing interface (MPI) is a standard message-passing system
used for parallel computing. MPI for Python (mpi4py) supports
convenient, pickle-based communication of generic Python objects, as
well as fast, near C-speed communication of array data held in
buffer-like objects such as numpy arrays.
The pickle library in Python serializes and deserializes Python object
structures (for example, list, dict, etc.) so that they can be saved
to disk and loaded back later.
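The round trip can be sketched as below. The record dictionary and the file name record.pkl are made up for illustration.

```python
import os
import pickle
import tempfile

# An arbitrary nested Python object (illustrative data only).
record = {"name": "numpy", "tags": ["array", "numeric"], "shape": (3, 4)}

# Hypothetical file location used only for this example.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")

# Serialize: pickle writes bytes, so the file is opened in binary mode.
with open(path, "wb") as handle:
    pickle.dump(record, handle)

# Deserialize: rebuild an equal but separate object from the file.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == record)
```

Unpickling can execute arbitrary code, so pickle.load should only be used on files from a trusted source.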
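Hadoop Streaming is a utility that ships with the Hadoop distribution.
It runs MapReduce jobs with any executable or script as the mapper and
reducer, each reading its input from standard input and writing its
output to standard output, so such jobs can be written in Python.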
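As an illustration of the streaming model, here is a word-count sketch. Word count is an assumed example task, and the function names mapper, reducer and run_locally are hypothetical. The sort between the two stages stands in for the shuffle phase that Hadoop performs, so the logic can be tried locally without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts of each word; input must be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort (shuffle) -> reduce on an in-memory input."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["big data", "big python"])
for record in result:
    print(record)
```

In an actual streaming job, the mapper and reducer would be two separate scripts that iterate over sys.stdin and print their records, passed to the streaming utility through its -mapper and -reducer options along with -input and -output paths.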
Multiple-choice Questions (1 Mark Questions)
1. Which of the following returns the type of
an object?
a. type
b. class
c. typeof
d. None of the above
2. Which of the following is a package for
data manipulation functionalities?