Generating summary statistics for the entire dataset

One of the first steps that business intelligence professionals perform on a new dataset is creating summary statistics. These statistics can be generated for an entire dataset or a part of it. In this recipe, you'll learn how to create summary statistics for the entire dataset.

How to do it…

  1. To generate summary statistics for the entire dataset, begin by importing the libraries that you need:
    import pandas as pd
  2. Next, import the dataset from the CSV file:
    accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
    accidents = pd.read_csv(accidents_data_file,
                            sep=',',
                            header=0,
                            index_col=False,
                            parse_dates=['Date'],
                            dayfirst=True,
                            # tupleize_cols was removed in pandas 1.0, and error_bad_lines and
                            # warn_bad_lines in 2.0; on_bad_lines (pandas 1.3+) replaces the latter.
                            on_bad_lines='error',
                            skip_blank_lines=True
                            )
  3. After that, use the describe function to generate summary stats for the entire dataset:
    accidents.describe()
  4. Finally, transpose the output of describe() to make it more readable:
    accidents.describe().transpose()
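
The steps above can be tried without the full accidents file. The following is a minimal sketch that assumes a tiny in-memory CSV; its column names and values are invented for illustration and are not taken from the real dataset. It uses only the read_csv options from step 2 that are still available in current pandas releases.

```python
import io

import pandas as pd

# Hypothetical stand-in for the accidents CSV; the column names and values
# are invented for illustration only.
csv_text = """Accident_Index,Date,Number_of_Vehicles,Number_of_Casualties
A1,18/01/1979,2,1
A2,01/02/1979,1,1
A3,13/03/1979,3,4
A4,25/12/1979,2,2
"""

sample = pd.read_csv(
    io.StringIO(csv_text),   # read_csv accepts a file-like object as well as a path
    sep=',',
    header=0,
    index_col=False,
    parse_dates=['Date'],
    dayfirst=True,           # 01/02/1979 is read as 1 February, not 2 January
    skip_blank_lines=True,
)

# One column of summary statistics per summarized column of the data.
summary = sample.describe()
print(summary)
```

Depending on the pandas version, the Date column may or may not appear in the output; the numeric columns are summarized either way.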

How it works…

We first import the Python libraries we need and create a new Pandas DataFrame from the data file.

Next, we use the describe() function provided by Pandas to show the count, mean, standard deviation (std), minimum value, maximum value, and the 25th, 50th, and 75th percentiles (the quartiles):

accidents.describe()
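
To make the rows of the describe() output concrete, the sketch below computes each statistic by hand and places it next to the value describe() reports. The series of values is hypothetical and stands in for a single numeric column.

```python
import pandas as pd

# Hypothetical values standing in for one numeric column of the dataset.
casualties = pd.Series([1, 1, 2, 2, 3, 4, 8], name='casualties')

stats = casualties.describe()

# The same eight statistics, computed one at a time.
by_hand = pd.Series({
    'count': casualties.count(),
    'mean': casualties.mean(),
    'std': casualties.std(),            # sample standard deviation (ddof=1)
    'min': casualties.min(),
    '25%': casualties.quantile(0.25),
    '50%': casualties.quantile(0.50),   # the median
    '75%': casualties.quantile(0.75),
    'max': casualties.max(),
}, name='casualties')

# Show both results side by side for comparison.
print(pd.concat([stats, by_hand], axis=1, keys=['describe', 'by_hand']))
```

Note that count only counts non-missing values, so it can differ from column to column in a dataset with gaps.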

In order to read the results of describe() a bit more easily, we use the transpose() function to convert the columns into rows and rows into columns:

accidents.describe().transpose()
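
The effect of the transpose is easiest to see in the shape of the result. The sketch below assumes a small hypothetical DataFrame with three numeric columns; the names and values are for illustration only.

```python
import pandas as pd

# Hypothetical columns for illustration only.
frame = pd.DataFrame({
    'Number_of_Vehicles': [2, 1, 3, 2],
    'Number_of_Casualties': [1, 1, 4, 2],
    'Speed_limit': [30, 30, 60, 40],
})

summary = frame.describe()
transposed = summary.transpose()   # summary.T is an equivalent shorthand

print(summary.shape)      # (8, 3): one row per statistic
print(transposed.shape)   # (3, 8): one row per column of the data
print(transposed)
```

With a wide dataset, the transposed form lists each column of the data on its own row, so the output grows downward instead of scrolling off to the right.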