Harkening back to algebra class, the mode is the value that occurs most often in a set of data. Let's see how to find the mode of each column in our dataset.
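To make the definition concrete before we load the real data, here is a minimal sketch on a small, made-up Series (the values are hypothetical and not taken from the accidents file). It uses pandas' Series.mode(), which returns a Series rather than a single value because several values can tie for most frequent.

```python
import pandas as pd

# Hypothetical toy data, not from the accidents dataset
speeds = pd.Series([30, 30, 40, 60, 60, 70])

# 30 and 60 each occur twice, so both are modes; mode() returns them sorted
modes = speeds.mode()
print(modes.tolist())  # [30, 60]

# With a single most frequent value, the result holds just that one value
print(pd.Series([1, 2, 2, 3]).mode().tolist())  # [2]
```

Keep this in mind when reading the DataFrame results later: a column with ties contributes more than one mode.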
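import pandas as pd

# Path to the Stats19 accidents CSV; adjust it for your machine
accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'

# Load the CSV, parsing the Date column as day-first dates.
# error_bad_lines/warn_bad_lines were replaced by on_bad_lines in newer pandas,
# and tupleize_cols no longer exists; on_bad_lines='error' raises on malformed rows.
accidents = pd.read_csv(accidents_data_file, sep=',', header=0, index_col=False,
                        parse_dates=['Date'], dayfirst=True,
                        on_bad_lines='error', skip_blank_lines=True)

# Most frequent value(s) of every column, transposed so each column becomes a row
accidents.mode().transpose()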
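We first import pandas and create a DataFrame from our source CSV file. It's then a simple matter to call the .mode() method on the DataFrame and look at the results. It returns the most frequent value of every column; when a column has several values tied for most frequent, the result gets one extra row per additional mode, and the other columns are padded with NaN in those rows. Since we're using IPython Notebook, we also use the transpose() method we saw in the "Generating summary statistics for the entire dataset" recipe, so that each column of the dataset appears as a row, which makes the results a bit more intuitive to read.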
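The padding and transposing behaviour is easier to see on a tiny table. The sketch below uses a hypothetical two-column DataFrame; the column names only resemble the accidents data and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of an accidents table; names and values are illustrative only
df = pd.DataFrame({
    "Day_of_Week": [2, 2, 3, 5, 2],     # 2 is the single mode
    "Speed_limit": [30, 60, 30, 60, 40],  # 30 and 60 tie, so there are two modes
})

# One row per mode; Day_of_Week has only one mode, so its second row is NaN
modes = df.mode()
print(modes)

# Transposing turns each original column into a row, which reads better
# when the real DataFrame has dozens of columns
print(modes.transpose())
```

Because of the NaN padding, a column such as Day_of_Week comes back as floats in the result; that is a side effect of the padding, not a change to the underlying data.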