Generating summary statistics for the entire dataset

One of the first steps that business intelligence professionals perform on a new dataset is creating summary statistics. These statistics can be generated for an entire dataset or a part of it. In this recipe, you'll learn how to create summary statistics for the entire dataset.

How to do it…

To generate summary statistics for the entire dataset, begin by importing the libraries that you need:
```
import pandas as pd
```

Next, import the dataset from the CSV file:

```
accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
# on_bad_lines='error' (pandas 1.3+) replaces error_bad_lines and warn_bad_lines,
# which were removed in pandas 2.0; tupleize_cols was removed in pandas 1.0.
accidents = pd.read_csv(accidents_data_file,
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=['Date'],
                        dayfirst=True,
                        on_bad_lines='error',
                        skip_blank_lines=True
                        )
```
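
To try the same parsing options without the full data file, here is a minimal sketch that reads a small in-memory CSV. The three rows and the column names other than Date are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real accidents file.
sample_csv = io.StringIO(
    "Accident_Index,Date,Number_of_Casualties\n"
    "A1,03/01/1979,1\n"
    "A2,04/01/1979,2\n"
    "A3,13/02/1979,3\n"
)

sample = pd.read_csv(sample_csv,
                     sep=',',
                     header=0,
                     index_col=False,
                     parse_dates=['Date'],
                     dayfirst=True)  # read 03/01/1979 as 3 January, not 1 March

print(sample.dtypes)                      # Date is parsed as a datetime column
print(sample['Date'].dt.month.tolist())   # [1, 1, 2]
```

Because the dates in this sample are written day first, dropping dayfirst=True would risk reading the first row as 1 March.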

After that, use the describe() function to generate summary statistics for the entire dataset:
```
accidents.describe()
```
Finally, transpose the results provided by describe() to make the results more readable:
```
accidents.describe().transpose()
```

How it works…

We first import the Python libraries we need, and create a new Pandas DataFrame from the data file.

Next, we use the describe() function provided by Pandas to show the count, mean, standard deviation (std), minimum value, maximum value, and the 25th, 50th, and 75th percentiles (the quartiles):

```
accidents.describe()
```
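
As a small illustration of those statistics, the following sketch runs describe() on a tiny DataFrame. The column name and values are hypothetical and chosen only so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical values; the column name is made up for illustration.
df = pd.DataFrame({'casualties': [1, 2, 3, 4, 5]})

summary = df.describe()
print(summary)

# Each statistic is a row label in the result.
print(summary.loc['mean', 'casualties'])  # 3.0
print(summary.loc['50%', 'casualties'])   # 3.0 (the median)
```

The row labels of the result are count, mean, std, min, 25%, 50%, 75%, and max, so individual statistics can be looked up with .loc.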

To make the results of describe() easier to read, we use the transpose() function to swap rows and columns, so that each column of the dataset becomes a row and each statistic becomes a column:

```
accidents.describe().transpose()
```
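
To see what the transpose changes, this sketch compares the shape of the summary before and after on a small frame. The two column names and their values are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame; names and values are made up.
df = pd.DataFrame({'vehicles': [1, 2, 2, 3],
                   'casualties': [1, 1, 2, 4]})

summary = df.describe()
print(summary.shape)               # (8, 2): one row per statistic

transposed = summary.transpose()   # summary.T is an equivalent shorthand
print(transposed.shape)            # (2, 8): one row per column of df
print(transposed.loc['casualties', 'max'])
```

With many columns, the transposed layout tends to be easier to scan because each column's statistics sit on a single line.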
