Generating summary statistics for object type columns

By default, when a DataFrame mixes numeric and non-numeric columns, the describe() function summarizes only the numeric ones. Use the following to include object columns as well:
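As a quick illustration of the default behavior and of the include parameter, here is a minimal sketch on a small made-up DataFrame. The values and the Speed_limit column are illustrative only, and the Time column is forced to dtype=object so it behaves like the object columns discussed in this recipe.

```python
import pandas as pd

# Small made-up DataFrame: one numeric column and one object column.
# dtype=object is set explicitly so "Time" stays object-typed even on
# pandas versions that infer a dedicated string dtype for text.
df = pd.DataFrame({
    "Speed_limit": [30, 30, 60, 70],
    "Time": pd.Series(["17:42", "08:15", "17:42", "23:05"], dtype=object),
})

# Default: only the numeric column is summarized
default_summary = df.describe()
print(default_summary.columns.tolist())   # ['Speed_limit']

# Object columns only
object_summary = df.describe(include=["object"])
print(object_summary.columns.tolist())    # ['Time']
print(object_summary.index.tolist())      # ['count', 'unique', 'top', 'freq']

# Every column; statistics that don't apply to a column's type are NaN
all_summary = df.describe(include="all")
print(all_summary)
```

Passing include="all" is handy when you want one table for the whole DataFrame, at the cost of many NaN cells.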

How to do it…

  1. To generate the summary statistics for object type columns in a Pandas DataFrame, begin by importing the library we need:
    import pandas as pd
  2. Next, import the dataset from the CSV file:
    accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
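    accidents = pd.read_csv(accidents_data_file,
                            sep=',',
                            header=0,
                            index_col=False,
                            parse_dates=['Date'],
                            dayfirst=True,
                            # Raise an error on malformed rows. on_bad_lines (pandas >= 1.3)
                            # replaces error_bad_lines/warn_bad_lines, which were removed
                            # in pandas 2.0; tupleize_cols was removed in pandas 1.0.
                            on_bad_lines='error',
                            skip_blank_lines=True
                            )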
  3. Finally, call the describe() method of the DataFrame and tell it to include the object type columns:
    accidents.describe(include=['object'])
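Step 2 relies on parse_dates and dayfirst=True to turn the Date column into real dates. The sketch below shows what those two options do on a tiny CSV held in memory; the rows are invented, and only the column names are taken from the accidents dataset.

```python
import io

import pandas as pd

# Made-up sample mimicking a few columns of the accidents file;
# the dates are written day first (DD/MM/YYYY)
csv_text = """Accident_Index,Date,Time
A001,05/01/1979,17:42
A002,13/02/1979,08:15
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=0,
    index_col=False,
    parse_dates=["Date"],   # convert this column to datetime64
    dayfirst=True,          # read 05/01/1979 as 5 January, not 1 May
)

print(sample["Date"].dt.month.tolist())   # [1, 2]
print(sample.dtypes)
```

Without dayfirst=True, an ambiguous value such as 05/01/1979 would be read month first.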

How it works…

Just as in the "Generating summary statistics for the entire dataset" recipe, we start by importing the Python library we need and creating a Pandas DataFrame from the CSV file. Once we have our DataFrame, we call its describe() method and tell it to include only columns of type 'object', which in this case are Accident_Index, Time, Local_Authority_Highway, and LSOA_of_Accident_Location.
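If you want to see which columns will be picked up before calling describe(), select_dtypes() takes a similar include argument and returns just those columns. Here is a small sketch on a made-up frame (values and the Speed_limit column are illustrative). In the real dataset, Time is an object column because it is not listed in parse_dates, so read_csv keeps it as text.

```python
import pandas as pd

# Made-up frame; text columns are forced to dtype=object so they stay
# object-typed regardless of the pandas version's string inference
df = pd.DataFrame({
    "Accident_Index": pd.Series(["A001", "A002", "A003"], dtype=object),
    "Speed_limit": [30, 60, 30],
    "Time": pd.Series(["17:42", "08:15", "17:42"], dtype=object),
})

# List the columns whose dtype is object
object_columns = df.select_dtypes(include=["object"]).columns.tolist()
print(object_columns)   # ['Accident_Index', 'Time']
```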

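Something important to note here is that, rather than the usual numeric statistics (mean, std, min, the quartiles, and max), describe() returns count (the number of non-null values), unique (the number of distinct values), top (the most frequent value), and freq (how many times the top value occurs).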
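To make those four figures concrete, this sketch recomputes them by hand for a single made-up object Series (with one missing value) and compares them with describe(). Note that when several values tie for most frequent, pandas documents that top and freq are chosen arbitrarily among the tied values.

```python
import pandas as pd

# Made-up object column with one missing value
times = pd.Series(["17:42", "08:15", None, "17:42", "23:05"], dtype=object)

summary = times.describe()

# The same four figures, computed by hand
count = times.count()                  # non-null values
unique = times.nunique()               # distinct non-null values
top = times.value_counts().idxmax()    # most frequent value
freq = times.value_counts().max()      # how often `top` occurs

print(summary)
print(count, unique, top, freq)   # 4 3 17:42 2
```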
