234 | Big Data Simplified
9.2.1 NumPy Library
NumPy is the fundamental package for scientific computing in Python. It provides
multidimensional arrays and high-level mathematical functions, for example, linear algebra
and Fourier transform operations, random number generators, etc.
scikit-learn, the main Python library for implementing machine learning functionality, relies
heavily on the NumPy array as its primary data structure. Therefore, as the initial step, let us
first review the NumPy array.
NumPy Array: The array data structure in the NumPy library stores regular data, that is, elements
of the same type (for example, integers), in a structured way. An array can have any number of
dimensions, for example, one-dimensional (1D), two-dimensional (2D), three-dimensional (3D) and
so on. Each dimension is called an axis in NumPy. Hence, a 2D array has 2 axes.
Examples:
1D array
[3, 5, 16, 18]
This array has 1 axis of length 4.
2D array
[[3, 5, 16, 18],
 [45, 3, 7, 79]]
This array has 2 axes. The first axis has a length of 2 and the second axis has a length of 4.
3D array
[[[45, 7, 0],
  [34, 2, 8]],
 [[2, 22, 9],
  [4, 5, 42]]]
This array has 3 axes. The first and second axes each have a length of 2, and the third axis has a length of 3.
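The three examples above can be reproduced directly in NumPy as a quick check. The following is a minimal sketch; the variable names a1, a2 and a3 are chosen here for illustration and do not come from the text.

```python
import numpy as np

# 1D array: one axis of length 4
a1 = np.array([3, 5, 16, 18])

# 2D array: the first axis has length 2, the second axis has length 4
a2 = np.array([[3, 5, 16, 18],
               [45, 3, 7, 79]])

# 3D array: the first and second axes have length 2, the third has length 3
a3 = np.array([[[45, 7, 0],
                [34, 2, 8]],
               [[2, 22, 9],
                [4, 5, 42]]])

for name, arr in (("a1", a1), ("a2", a2), ("a3", a3)):
    # ndim is the number of axes; shape lists the length of each axis
    print(name, arr.ndim, arr.shape)
```

Each printed line shows the number of axes followed by the length of every axis, which should match the descriptions given with the examples.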
The NumPy array() function can be used to create a NumPy array. The following are a few of the
salient attributes of the resulting array object.
• ndim: The number of axes of the array.
• dtype: The data type of the elements in the array.
• shape: The dimensions of the array, that is, the length of each axis.
• size: The total number of elements in the array.
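As an illustration, these four attributes can be read from an array created with array(). This is only a sketch: the variable name m and the explicit dtype=np.int64 argument are assumptions made here so that the reported element type is the same on every platform.

```python
import numpy as np

# Build a 2D array from a nested list; the element type is requested
# explicitly (an assumption for this sketch, not something the text requires)
m = np.array([[3, 5, 16, 18],
              [45, 3, 7, 79]], dtype=np.int64)

print(m.ndim)   # number of axes: 2
print(m.dtype)  # data type of the elements: int64
print(m.shape)  # length of each axis: (2, 4)
print(m.size)   # total number of elements: 8
```

Note that size is simply the product of the entries in shape, here 2 × 4 = 8.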
A commonly created array is an empty array of a specific dimension, which can be used as
a data structure to hold dynamic data. Empty arrays can be created in the following way.