Chapter 3. The pandas Data Structures

This chapter is one of the most important ones in this book. We will now begin to dive into the meat and bones of pandas. We start by taking a tour of NumPy ndarrays, a data structure that belongs to NumPy rather than pandas. Knowledge of NumPy ndarrays is useful because they form the foundation for the pandas data structures. Another key benefit of NumPy arrays is that they support what are known as vectorized operations: operations that would otherwise require traversing or looping over a Python sequence are applied to the whole array at once, and run much faster.
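To make the vectorization claim concrete, here is a minimal sketch that compares an explicit Python loop with the equivalent NumPy expression. The function names loop_square and vectorized_square are hypothetical, chosen only for illustration. The printed timings depend on the machine and NumPy version, although the vectorized version is typically much faster.

```python
import timeit

import numpy as np

N = 100_000
values = list(range(N))
# Explicit int64 avoids overflow on platforms whose default integer is 32-bit
arr = np.arange(N, dtype=np.int64)

def loop_square(seq):
    # Explicit Python loop: one interpreter-level iteration per element
    return [x * x for x in seq]

def vectorized_square(a):
    # Vectorized: the element-wise multiply runs inside NumPy's compiled code
    return a * a

# Both approaches produce the same numbers
assert loop_square(values) == vectorized_square(arr).tolist()

# Absolute timings vary by machine; only the relative difference matters
loop_time = timeit.timeit(lambda: loop_square(values), number=10)
vec_time = timeit.timeit(lambda: vectorized_square(arr), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```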

The topics we will cover in this chapter include the following:

Tour of the numpy.ndarray data structure
The pandas.Series 1-dimensional (1D) pandas data structure
The pandas.DataFrame 2-dimensional (2D) pandas tabular data structure
The pandas.Panel 3-dimensional (3D) pandas data structure

In this chapter, I will present the material via numerous examples using IPython, an interactive environment (usable from a terminal or through its browser-based notebook) that lets the user type commands into the Python interpreter and see the results immediately. Instructions for installing IPython are provided in the previous chapter.

NumPy ndarrays

The NumPy library is the fundamental package for numerical computing with Python. Its primary features include the following:

The type numpy.ndarray, a homogeneous multidimensional array
Access to numerous mathematical functions – linear algebra, statistics, and so on
Ability to integrate C, C++, and Fortran code
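As a small illustration of the first two features, the following sketch shows how mixed input values are coerced into a single (homogeneous) dtype, and calls a few of NumPy's statistics and linear algebra routines. The sample numbers are made up purely for demonstration.

```python
import numpy as np

# Mixed ints and floats are upcast so that every element shares one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A few built-in statistics routines
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(data.mean())  # 5.0
print(data.std())   # 2.0 (population standard deviation, the default)

# A few linear algebra routines from numpy.linalg
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
det = np.linalg.det(matrix)                    # approximately 5.0
product = np.linalg.inv(matrix) @ matrix       # approximately the 2x2 identity
print(det)
print(product)
```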

For more information about NumPy, see http://www.numpy.org.

The primary data structure in NumPy is the array class ndarray. It is a homogeneous multidimensional (n-dimensional) table of elements, all of the same type, indexed by a tuple of non-negative integers (one per dimension). numpy.ndarray, whose instances are usually created with the numpy.array function, is different from the standard library's array.array class, which supports only one dimension and offers much less functionality. More information on the various operations is provided at http://scipy-lectures.github.io/intro/numpy/array_object.html.
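The difference from the standard library's array.array is easiest to see with arithmetic: multiplying an array.array by an integer repeats the sequence, whereas an ndarray multiplies each element. The sketch below also shows tuple-style indexing on a 2D ndarray; the variable names are illustrative only.

```python
import array

import numpy as np

# Standard library array.array: 1D only, one typecode per array
py_arr = array.array("i", [1, 2, 3])
print(py_arr * 2)  # array('i', [1, 2, 3, 1, 2, 3]): sequence repetition

# numpy.ndarray: arithmetic is applied element by element
np_arr = np.array([1, 2, 3])
print(np_arr * 2)  # [2 4 6]

# ndarrays can be multidimensional and are indexed by a tuple of integers
grid = np.array([[0, 3, 5], [2, 8, 7]])
print(grid[1, 2])  # 7 (row 1, column 2)
```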

NumPy array creation

NumPy arrays can be created in a number of ways via calls to various NumPy functions.

NumPy arrays via numpy.array

NumPy arrays can be created via the numpy.array constructor directly:

In [1]: import numpy as np
In [2]: ar1 = np.array([0, 1, 2, 3])  # 1D array
In [3]: ar2 = np.array([[0, 3, 5], [2, 8, 7]])  # 2D array
In [4]: ar1
Out[4]: array([0, 1, 2, 3])
In [5]: ar2
Out[5]: array([[0, 3, 5],
               [2, 8, 7]])
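When no dtype is given, numpy.array infers one from the input; it can also be set explicitly with the dtype parameter. The following sketch shows both cases and how rows and columns of the 2D array are selected. Note that the default integer width (for example int64 versus int32) depends on the platform.

```python
import numpy as np

ar1 = np.array([0, 1, 2, 3])
# Inferred integer dtype; the exact width is platform-dependent
print(ar1.dtype)

# The dtype can also be given explicitly
ar1_float = np.array([0, 1, 2, 3], dtype=np.float64)
print(ar1_float)  # [0. 1. 2. 3.]

# Nested lists of equal length produce a 2D array
ar2 = np.array([[0, 3, 5], [2, 8, 7]])
print(ar2[0])     # first row: [0 3 5]
print(ar2[:, 1])  # second column: [3 8]
```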

The shape of the array is given via ndarray.shape:

In [6]: ar2.shape
Out[6]: (2, 3)

The number of dimensions is obtained using ndarray.ndim:

In [7]: ar2.ndim
Out[7]: 2
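The attributes shape, ndim, and size are related: ndim is the length of the shape tuple, and size (the total number of elements) is the product of its entries. The sketch below checks this and uses reshape, which returns the same elements in row-major order arranged into a new shape (it is not a transpose).

```python
import numpy as np

ar2 = np.array([[0, 3, 5], [2, 8, 7]])

# ndim is the length of the shape tuple; size is the product of its entries
assert ar2.ndim == len(ar2.shape)
print(ar2.size)  # 6 (2 rows x 3 columns)

# reshape keeps the elements in row-major order but changes the layout
ar2_reshaped = ar2.reshape(3, 2)
print(ar2_reshaped.shape)  # (3, 2)
print(ar2_reshaped)
# [[0 3]
#  [5 2]
#  [8 7]]
```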

NumPy array via numpy.arange

numpy.arange is the NumPy version of Python's range function:

In [10]: # produces the integers from 0 to 11, not inclusive of 12
         ar3 = np.arange(12); ar3
Out[10]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [11]: # start, end (exclusive), step size
         ar4 = np.arange(3, 10, 3); ar4
Out[11]: array([3, 6, 9])
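The following sketch checks the correspondence with range directly, and shows one thing range cannot do: numpy.arange also accepts a floating-point step.

```python
import numpy as np

# np.arange mirrors range(start, stop, step): stop is excluded
assert np.arange(3, 10, 3).tolist() == list(range(3, 10, 3))  # [3, 6, 9]

# Unlike range, arange also accepts floating-point arguments
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]
```

With non-integer steps, floating-point rounding can make the number of elements hard to predict, which is why numpy.linspace is usually preferred when you know how many points you need.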

NumPy array via numpy.linspace

numpy.linspace generates a given number of evenly spaced elements between the start and end values; by default, both endpoints are included:

In [13]: # args - start element, end element, number of elements
         ar5 = np.linspace(0, 2.0/3, 4); ar5
Out[13]: array([0.        , 0.22222222, 0.44444444, 0.66666667])
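Two numpy.linspace parameters are worth knowing: retstep=True also returns the spacing between points, and endpoint=False excludes the end value. In the sketch below, 4 points over [0, 2/3] give 3 equal gaps of 2/9; the second call uses values chosen so that the result is exactly representable in binary floating point.

```python
import numpy as np

# retstep=True returns (samples, step); 4 points means 3 equal gaps
ar5, step = np.linspace(0, 2.0 / 3, 4, retstep=True)
print(step)  # approximately 0.2222 (= 2/9)

# endpoint=False excludes the end value, similar to arange
quarters = np.linspace(0, 1, 4, endpoint=False)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]
```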

NumPy array via various other functions

These functions include numpy.zeros, numpy.ones, numpy.eye, numpy.random.rand, numpy.random.randn, and numpy.empty.

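For numpy.zeros, numpy.ones, and numpy.empty, the shape argument is a tuple; for a 1D array, you can just specify the number of elements, with no need for a tuple. numpy.eye takes the number of rows (and, optionally, columns) as integers, while numpy.random.rand and numpy.random.randn take each dimension as a separate integer argument rather than a tuple.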
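The sketch below calls each of these functions once to show how their arguments differ. The random values change on every run, and numpy.empty returns uninitialized memory, so only the shapes are printed.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))     # shape given as a tuple
ones_1d = np.ones(4)            # 1D: a plain integer is enough
identity = np.eye(3)            # 3x3 identity matrix
uniform = np.random.rand(2, 2)  # dimensions as separate arguments; values in [0, 1)
normal = np.random.randn(3)     # samples from the standard normal distribution
scratch = np.empty((2, 2))      # uninitialized: contents are arbitrary

for name, arr in [("zeros", zeros_2d), ("ones", ones_1d), ("eye", identity),
                  ("rand", uniform), ("randn", normal), ("empty", scratch)]:
    print(name, arr.shape)
```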