Getting the mode of the entire dataset

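Harkening back to algebra class, the mode is the value that occurs most often. In a DataFrame each column has its own mode, and a column can have more than one when several values tie for most frequent. Let's see how to discover the modes for our dataset.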
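To make the definition concrete, here is a minimal sketch on a small made-up Series (not the accidents data). It counts occurrences with value_counts() and compares the result with pandas' Series.mode(), including a case where two values tie.

```python
import pandas as pd

# A small, made-up sample of values (not from the accidents dataset)
values = pd.Series([3, 1, 2, 3, 2, 3, 4])

# Count how often each value occurs; the value with the highest count is the mode
counts = values.value_counts()
print(counts.idxmax())  # → 3

# pandas does the same work in one call; mode() returns a Series
# because several values can tie for most frequent
print(values.mode().tolist())  # → [3]

# With a tie, every tied value is returned, in sorted order
tied = pd.Series([1, 1, 2, 2, 5])
print(tied.mode().tolist())  # → [1, 2]
```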

How to do it…

  1. To get the mode of the entire dataset, begin by importing the library we need:
    import pandas as pd
  2. Next, import the dataset from the CSV file:
    accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
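    # tupleize_cols (removed in pandas 1.0) is dropped, and error_bad_lines/
    # warn_bad_lines (removed in pandas 2.0) become on_bad_lines (pandas 1.3+)
    accidents = pd.read_csv(accidents_data_file,
                            sep=',',
                            header=0,
                            index_col=False,
                            parse_dates=['Date'],
                            dayfirst=True,
                            on_bad_lines='error',  # stop on malformed rows
                            skip_blank_lines=True
                            )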
  3. Finally, show the mode of each column, and transpose it so we can read everything in IPython Notebook:
    accidents.mode().transpose()
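If the Stats19 file isn't at hand, the same steps can be tried on a tiny in-memory CSV. This is a minimal sketch under stated assumptions: the column names and rows below are made up for illustration rather than taken from Accidents7904.csv, and on_bad_lines assumes pandas 1.3 or later.

```python
import io

import pandas as pd

# Hypothetical stand-in for Accidents7904.csv: made-up rows with a
# day-first Date column, so the recipe runs without the real file
csv_text = """Accident_Index,Date,Day_of_Week,Number_of_Vehicles
A1,01/02/1979,5,2
A2,01/02/1979,5,1
A3,03/02/1979,7,2
A4,10/03/1979,7,2
"""

accidents = pd.read_csv(io.StringIO(csv_text),
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=['Date'],
                        dayfirst=True,
                        on_bad_lines='error',
                        skip_blank_lines=True)

# One row per tied mode, one column per original column
modes = accidents.mode()

# Transpose so each original column becomes a row, which is easier to scan
print(modes.transpose())
```

Note how the result grows with ties: Day_of_Week has two modes (5 and 7), and a column whose values are all distinct, like the made-up Accident_Index here, ties every value, so mode() returns one row per value and pads the other columns with NaN or NaT.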

How it works…

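We first import the library we need and create a DataFrame from our source CSV file. It's then a simple matter to call the .mode() method on the DataFrame and view the results. Note that .mode() returns a DataFrame rather than a single row: a column can have more than one mode when several values tie for most frequent, so the result has one row per tied mode, and columns with fewer modes are padded with NaN. Since we're using IPython Notebook, we also use the transpose() method we saw in the Generating summary statistics for the entire dataset recipe, which turns each column into a row so the results can be read top to bottom instead of scrolling sideways.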
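Because a column can have several modes, the output of .mode() often needs a little follow-up. The sketch below uses a made-up two-column DataFrame (not the accidents data) to show three common steps: keeping only the first mode per column with iloc[0], looking up how often the mode occurs with value_counts() (which .mode() itself does not report), and the dropna parameter, which by default leaves NaN out of the count.

```python
import pandas as pd

# Made-up data: column "b" has two values tied for most frequent
df = pd.DataFrame({
    'a': [1, 1, 2, 3],
    'b': ['x', 'y', 'x', 'y'],
})

modes = df.mode()  # two rows, because "b" has two modes

# Keep only the first mode per column (the smallest, since mode() sorts)
first_modes = modes.iloc[0]

# mode() doesn't say how often the mode occurs; value_counts() sorts
# counts in descending order, so its first entry is the mode's frequency
freq = {col: int(df[col].value_counts().iloc[0]) for col in df.columns}
print(freq)  # → {'a': 2, 'b': 2}

# By default NaN is ignored; dropna=False lets missing values count too
s = pd.Series([None, None, 1.0])
print(s.mode().tolist())         # → [1.0]
print(s.mode(dropna=False))      # a single NaN, the most frequent "value"
```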
