Extracting data from pandas

Moving data between pandas and NumPy is quite easy. Because pandas is built upon NumPy, arrays can easily be extracted from DataFrame objects, and arrays can in turn be transformed into DataFrames.
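
As a minimal sketch of this two-way conversion, the following snippet builds a DataFrame from an array and then extracts the array again. The toy array and the column names a and b are illustrative assumptions, not part of the housing data used in the rest of this section:

```python
import numpy as np
import pandas as pd

# Hypothetical toy array (not the housing data)
toy_array = np.arange(6, dtype=np.float64).reshape(3, 2)

# Array -> DataFrame; the column names are illustrative only
toy_frame = pd.DataFrame(toy_array, columns=['a', 'b'])

# DataFrame -> array
round_trip = toy_frame.values

# Check that the values and the dtype survived the round trip
print(round_trip.dtype)
print(np.array_equal(toy_array, round_trip))
```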

First, let's load some data into a DataFrame. The BostonHouse example we downloaded in the previous chapter from the ML repository is suitable:

In: import pandas as pd
import numpy as np
housing_filename = 'regression-datasets-housing.csv'
housing = pd.read_csv(housing_filename, header=None)

As demonstrated in the Heterogeneous lists section, the .values attribute extracts an array whose type accommodates all the different types that are present in the DataFrame:

In: housing_array = housing.values
housing_array.dtype

Out: dtype('float64')

In this case, the selected type is float64 because the float type prevails over the int type when the columns mix the two:

In: housing.dtypes

Out: 0 float64
1 int64
2 float64
3 int64
4 float64
5 float64
6 float64
7 float64
8 int64
9 int64
10 int64
11 float64
12 float64
13 float64
dtype: object
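
The same behavior can be reproduced on a small scale. The sketch below uses a hypothetical toy DataFrame (the column names rooms, price, and town are made up for illustration) to show how mixing int and float columns yields a float64 array, while adding a text column pushes the result to the generic object type:

```python
import pandas as pd

# Hypothetical toy frame mixing an int column and a float column
mixed = pd.DataFrame({'rooms': [4, 5, 6], 'price': [21.5, 30.0, 18.2]})

# int + float columns: the extracted array is upcast to float64
numeric_dtype = mixed.values.dtype

# Adding a text column forces the catch-all object dtype
mixed['town'] = ['a', 'b', 'c']
object_dtype = mixed.values.dtype

print(numeric_dtype, object_dtype)
```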

Checking the .dtypes attribute of the DataFrame before extracting your NumPy array allows you to anticipate the dtype of the resulting array. Consequently, you can decide whether to transform or change the type of the variables in the DataFrame before proceeding (please consult the Working with categorical and textual data section of this chapter).
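
One possible way to put this into practice is sketched below on a hypothetical toy DataFrame (the column names chas and rm are placeholders). It assumes that all columns are plain numeric NumPy types, in which case np.result_type applied to the column dtypes gives the same answer as the extraction itself. It then uses astype to change the type before extracting the array:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the housing data
frame = pd.DataFrame({'chas': [0, 1, 0], 'rm': [6.5, 6.4, 7.1]})

# Anticipate the array dtype from the column dtypes
# (assumption: only plain numeric NumPy dtypes are present)
expected = np.result_type(*frame.dtypes)

# Change the type explicitly before extracting the array
compact = frame.astype('float32').values

print(expected, compact.dtype)
```

Converting to float32 here is only an example of a deliberate type change; whether the reduced precision is acceptable depends on the data.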
