234 | Big Data Simplified
9.2.1 NumPy Library
NumPy is the fundamental Python package for scientific computing. It provides multidimensional
arrays, high-level mathematical functions (for example, linear algebra and Fourier transform
operations), random number generators and more.
scikit-learn, the main Python library for implementing machine learning functionality, uses the
NumPy array as its primary data structure. Therefore, as a first step, let us review the NumPy
array.
NumPy Array: The array data structure in the NumPy library stores homogeneous data, that is,
elements of the same type (for example, integers), in a structured way. An array can have any
number of dimensions, for example, one-dimensional (1D), two-dimensional (2D), three-dimensional
(3D) and so on. A dimension is termed an axis in NumPy. Hence, a 2D array has 2 axes.
Examples:
1D array
[3, 5, 16, 18]
This array has 1 axis of length 4.
2D array
[[3, 5, 16, 18],
 [45, 3, 7, 79]]
This array has 2 axes. The first axis has a length of 2 and the second axis has a length of 4.
3D array
[[[45, 7, 0],
  [34, 2, 8]],
 [[2, 22, 9],
  [4, 5, 42]]]
This array has 3 axes. The first and second axes have a length of 2 and the third axis has a length of 3.
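The three example arrays above can be checked in an interpreter. The following is a minimal sketch; the variable names a1, a2 and a3 are illustrative and do not come from the text.

```python
import numpy as np

# The 1D, 2D and 3D example arrays (names are illustrative)
a1 = np.array([3, 5, 16, 18])
a2 = np.array([[3, 5, 16, 18],
               [45, 3, 7, 79]])
a3 = np.array([[[45, 7, 0], [34, 2, 8]],
               [[2, 22, 9], [4, 5, 42]]])

# ndim gives the number of axes; shape gives the length of each axis
for arr in (a1, a2, a3):
    print(arr.ndim, arr.shape)
```

The loop should report 1, 2 and 3 axes, with shapes (4,), (2, 4) and (2, 2, 3) respectively.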
The NumPy array() function can be used to create a NumPy array. The following are a few of the
salient attributes of the resulting array object.
ndim: The number of axes of the array.
dtype: The data type of the elements in the array. It can also be passed to array() to set the type.
shape: The dimensions of the array, that is, the length of each axis.
size: The total number of elements of the array.
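As a quick illustration of the attributes listed above, the sketch below builds a small array and reads each attribute. The name m and the choice of np.float64 are assumptions made for the example only.

```python
import numpy as np

# dtype can be passed to np.array() to fix the element type
m = np.array([[10, 2, 3], [23, 45, 67]], dtype=np.float64)

print(m.ndim)   # number of axes: 2
print(m.dtype)  # element type: float64
print(m.shape)  # length of each axis: (2, 3)
print(m.size)   # total number of elements: 6

# size always equals the product of the entries of shape
assert m.size == m.shape[0] * m.shape[1]
```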
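A commonly created array is an empty array of a specific dimension, which can be used as a data
structure to hold dynamic data. Note that np.empty() does not initialize the elements, so they
hold arbitrary values until they are assigned. Empty arrays can be created in the following way.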
M09 Big Data Simplified XXXX 01.indd 234 5/10/2019 10:22:56 AM
Working with Big Data inPython | 235
# Defining 1-D array variable with data
>>> import numpy as np
>>> var2 = np.empty(4)
>>> var2[0] = 5.67
>>> var2[1] = 2
>>> var2[2] = 56
>>> var2[3] = 304
>>> print(var2)
[ 5.67 2. 56. 304. ]
>>> print(var2.shape) # Returns the shape (length of each axis) of the array
(4,)
>>> print(var2.size) # Returns the total number of elements of the array
4
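Because np.empty() only reserves memory, any element that is never assigned keeps an arbitrary value. Where that is a concern, np.zeros() gives a defined starting state. A short sketch, with illustrative names (buf, zeros):

```python
import numpy as np

# np.empty allocates without initialising: contents are arbitrary until assigned
buf = np.empty(4)
buf[:] = [5.67, 2, 56, 304]   # assign every element before reading

# np.zeros starts from a known value instead
zeros = np.zeros(4)

print(buf)
print(zeros)
```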
# Defining 2-D array variable with data
>>> var3 = np.empty((2,3))
>>> var3[0][0] = 5.67
>>> var3[0][1] = 2
>>> var3[0][2] = 56
>>> var3[1][0] = .09
>>> var3[1][1] = 132
>>> var3[1][2] = 1056
>>> print(var3)
[[ 5.67000000e+00 2.00000000e+00 5.60000000e+01]
[ 9.00000000e-02 1.32000000e+02 1.05600000e+03]]
[Note: The same result is obtained with dtype=float, which is the default for np.empty(). The
older alias np.float was removed in NumPy 1.24.]
# Collapse the 2-D array into a single dimension
>>> print(var3.flatten())
[5.670e+00 2.000e+00 5.600e+01 9.000e-02 1.320e+02 1.056e+03]
>>> print(var3.shape)
(2, 3)
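# Same declaration with dtype mentioned
>>> var3 = np.empty((2,3), dtype=int)
>>> # Assign the same six values as before (var3[0][0] = 5.67, and so on)
>>> print(var3)
[[ 5 2 56]
[ 0 132 1056]]
[Note: The float values are truncated when they are converted to integer, for example, 5.67
becomes 5 and .09 becomes 0. The alias np.int found in older code was removed in NumPy 1.24;
use the built-in int instead.]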
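Converting a float to an integer in NumPy drops the fractional part, which matches rounding down only for non-negative values. The sketch below contrasts this with np.floor() and np.rint(); the array name values and the negative sample are assumptions added for illustration.

```python
import numpy as np

values = np.array([5.67, 0.09, -5.67])

# Converting float to int drops the fractional part (truncation toward zero)
print(values.astype(int))            # -> 5, 0, -5

# np.floor rounds down instead, which differs for negative numbers
print(np.floor(values))              # -> 5.0, 0.0, -6.0

# Round to the nearest integer first if that is the intended behaviour
print(np.rint(values).astype(int))   # -> 6, 0, -6
```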
>>> print(var3[1]) # Returns a row of an array
[ 0 132 1056]
>>> print(var3[[0, 1]]) # Returns multiple rows of an array
[[ 5 2 56]
[ 0 132 1056]]
>>> print(var3[:, 2]) # Returns a column of an array
[ 56 1056]
>>> print(var3[:, [1, 2]]) # Returns multiple columns of an array
[[ 2 56]
[ 132 1056]]
>>> print(var3[1][2]) # Returns a cell value of an array
1056
>>> print(var3[1, 2]) # Returns a cell value of an array
1056
>>> print(np.transpose(var3)) # Returns transpose of an array
[[ 5 0]
[ 2 132]
[ 56 1056]]
>>> print(var3.reshape(3,2)) # Returns an array with changed dimensions
[[ 5 2]
[ 56 0]
[ 132 1056]]
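Transpose and reshape both produce a 3 x 2 result here, but they arrange the elements differently. The following sketch, using an illustrative array named grid with the same values as var3, makes the difference explicit.

```python
import numpy as np

grid = np.array([[5, 2, 56],
                 [0, 132, 1056]])

# transpose swaps the axes: element [i, j] moves to [j, i]
print(grid.T)

# reshape keeps the row-major element order and only regroups it
print(grid.reshape(3, 2))

# -1 asks NumPy to infer that dimension from the total size
print(grid.reshape(-1, 2).shape)   # (3, 2)
```

In the transposed array the first row holds the first column of grid, whereas the reshaped array simply reads the six elements in order, two at a time.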
Practice Problem: Create and concatenate arrays.
>>> import numpy as np
>>> arr1 = np.empty((2,3), dtype=int)
>>> arr1[0][0] = 5.67
>>> arr1[0][1] = 2
>>> arr1[0][2] = 56
>>> arr1[1][0] = .09
>>> arr1[1][1] = 132
>>> arr1[1][2] = 1056
>>> arr2 = np.empty((1,3), dtype=int)
>>> arr2[0][0] = 37
>>> arr2[0][1] = 2.193
>>> arr2[0][2] = 5609
>>> arr_concat = np.concatenate((arr1, arr2), axis=0)
>>> print(arr_concat)
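The axis argument decides which dimension grows, and the arrays must agree on every other axis. A minimal sketch with illustrative names (top, bottom, side) that are not part of the practice problem:

```python
import numpy as np

top = np.array([[5, 2, 56],
                [0, 132, 1056]])      # shape (2, 3)
bottom = np.array([[37, 2, 5609]])    # shape (1, 3)

# axis=0 stacks rows: the column counts must match (3 here)
stacked = np.concatenate((top, bottom), axis=0)
print(stacked.shape)   # (3, 3)

# axis=1 joins columns: the row counts must match (2 here)
side = np.array([[7], [8]])           # shape (2, 1)
widened = np.concatenate((top, side), axis=1)
print(widened.shape)   # (2, 4)
```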
NumPy also provides other functions that create arrays filled with ones, zeros, random numbers
or pre-filled values, as shown below.
# Create an array of 1s
>>> np.ones((2,3))
array([[1., 1., 1.],
       [1., 1., 1.]])
# Create an array of 0s
>>> np.zeros((2,3), dtype=int)
array([[0, 0, 0],
       [0, 0, 0]])
# Create an array with random numbers (the values differ on every run)
>>> np.random.random((2,2))
array([[0.47448072, 0.49876875],
       [0.29531478, 0.48425055]])
# Defining an array variable with pre-filled data
>>> import numpy as np
>>> var1 = np.array([[10,2,3], [23,45,67]])
>>> print(var1)
[[10 2 3]
[23 45 67]]
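Two related constructors are often useful alongside the ones above: np.full() for a constant fill value, and a seeded random Generator when the random values need to be repeatable. This is a sketch only; the names sevens, rng and sample and the seed value 42 are arbitrary choices for the example.

```python
import numpy as np

# np.full fills an array with a single chosen value
sevens = np.full((2, 3), 7)

# A seeded Generator makes the random values repeatable between runs
rng = np.random.default_rng(seed=42)   # the seed value 42 is arbitrary
sample = rng.random((2, 2))            # uniform values in [0, 1)

print(sevens)
print(sample.shape)
```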
Mathematical and Statistical Functions in NumPy: The following table summarizes the key mathematical
functions provided by NumPy.
Sr #: 1
Command: sin, cos, tan, arcsin, arccos, arctan, degrees, etc.
Purpose: Trigonometric functions.
Sample Code with Output:
>>> import numpy as np
>>> array1 = np.array([30,60,90])
>>> np.sin(array1*np.pi/180)
array([0.5      , 0.8660254, 1.       ])

Sr #: 2
Command: around, floor, ceil
Purpose: For rounding decimals to the desired precision.
Sample Code with Output:
>>> arr2 = np.array([67.07, 88.10, 34, 231.67, 0.934])
>>> print(arr2)
[ 67.07 88.1 34. 231.67 0.934]
>>> np.around(arr2)
array([ 67., 88., 34., 232., 1.])
>>> np.around(arr2, decimals = 2)
array([ 67.07, 88.1 , 34. , 231.67, 0.93])
>>> np.floor(arr2)
array([ 67., 88., 34., 231., 0.])
>>> np.ceil(arr2)
array([ 68., 89., 34., 232., 1.])

Sr #: 3
Command: add, subtract, multiply, divide, power, reciprocal, mod, etc.
Purpose: Basic mathematical operations on arrays.
Sample Code with Output:
>>> arr1 = np.arange(6, dtype=int).reshape(2,3)
>>> arr1
array([[0, 1, 2],
       [3, 4, 5]])
>>> arr2 = np.arange(4, 15, 2, dtype=int).reshape(2,3)
>>> arr2
array([[ 4, 6, 8],
       [10, 12, 14]])
>>> np.add(arr1, arr2)
array([[ 4, 7, 10],
       [13, 16, 19]])
>>> np.subtract(arr1, arr2)
array([[-4, -5, -6],
       [-7, -8, -9]])
>>> np.multiply(arr1, arr2)
array([[ 0, 6, 16],
       [30, 48, 70]])
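The table lists divide, power and mod without showing them in use, and the trigonometric functions expect radians. The sketch below covers both points under the assumption that the same arr1 and arr2 values are used; it relies on np.deg2rad(), which converts degrees to radians.

```python
import numpy as np

arr1 = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
arr2 = np.arange(4, 15, 2).reshape(2, 3)   # [[4, 6, 8], [10, 12, 14]]

# The functions have operator equivalents: +, -, *, /, ** and %
assert np.array_equal(np.add(arr1, arr2), arr1 + arr2)
assert np.array_equal(np.multiply(arr1, arr2), arr1 * arr2)

print(np.divide(arr2, 2))   # true division, result is float
print(np.power(arr1, 2))    # element-wise square
print(np.mod(arr2, 3))      # element-wise remainder

# np.deg2rad converts degrees to radians before calling trig functions
angles = np.array([30, 60, 90])
print(np.sin(np.deg2rad(angles)))
```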