By default, the describe() method restricts the stats to numeric columns whenever the DataFrame has any. Use the following to include object columns:
import pandas as pd

accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
accidents = pd.read_csv(accidents_data_file,
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=['Date'],
                        dayfirst=True,
                        tupleize_cols=False,
                        error_bad_lines=True,
                        warn_bad_lines=True,
                        skip_blank_lines=True)
Then call the describe() method of the DataFrame, and instruct it to include the object type columns:
accidents.describe(include=['object'])
Just as in the Generating summary statistics for the entire dataset recipe, we start by importing the Python libraries we need and by creating a Pandas DataFrame from the CSV file. Once we have our DataFrame, we call the describe() function, and tell it to include the columns of type 'object', which in this case are Accident_Index, Time, Local_Authority_Highway, and LSOA_of_Accident_Location.
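If we are not sure which columns are of type 'object', one way to list them is with select_dtypes(). The following is a minimal sketch on a small, made-up DataFrame (the values are hypothetical and are not taken from the Stats19 file); the object columns are created with dtype=object explicitly so that the result does not depend on how a particular pandas version infers string columns.

```python
import pandas as pd

# Hypothetical toy data standing in for the accidents DataFrame.
sample = pd.DataFrame({
    'Accident_Index': pd.Series(['A001', 'A002', 'A003'], dtype=object),
    'Time': pd.Series(['08:15', '17:40', '08:15'], dtype=object),
    'Number_of_Vehicles': [2, 1, 3],  # numeric, so not an object column
})

# Keep only the object-typed columns and collect their names.
object_columns = sample.select_dtypes(include=['object']).columns.tolist()
print(object_columns)
```

Running the same select_dtypes() call on the accidents DataFrame should list the columns that describe(include=['object']) will summarize.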
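Something important to note here is that rather than providing the typical describe() statistics, the output gives us count (the number of non-null values), unique (the number of distinct values), top (the most frequent value), and freq (how many times the top value occurs).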
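To see what these four statistics look like in practice, here is a small sketch using a made-up DataFrame rather than the accidents data. The column names are borrowed from the recipe, but the values are hypothetical, and dtype=object is set explicitly so that the columns are treated as object columns regardless of the pandas version.

```python
import pandas as pd

# Hypothetical toy data; one Time value is missing on purpose.
sample = pd.DataFrame({
    'Time': pd.Series(['08:15', '17:40', '08:15', None], dtype=object),
    'Local_Authority_Highway': pd.Series(['H1', 'H2', 'H1', 'H1'], dtype=object),
    'Number_of_Vehicles': [2, 1, 3, 1],  # numeric column, left out of the summary
})

# Summarize only the object columns.
summary = sample.describe(include=['object'])
print(summary)

# Individual statistics can be read with .loc[row, column].
print(summary.loc['top', 'Time'])   # most frequent Time value
print(summary.loc['freq', 'Time'])  # how often that value occurs
```

In this sketch the missing Time value is not counted, so count for Time is one less than the number of rows, and the numeric Number_of_Vehicles column does not appear in the summary at all.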