Generating quantiles for a single column

According to a definition provided by Google, quantiles are any set of values of a variate that divide a frequency distribution into equal groups, each containing the same fraction of the total population. Everyday examples of quantiles include the top 10 percent of a class or the bottom 5 percent of customers. We can create any quantile we want using Pandas.
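To make the definition concrete, here is a minimal sketch using a small, invented set of exam scores (the data and the variable names are assumptions for illustration, not part of the recipe's dataset). It shows how the 0.9 quantile gives the cutoff for the "top 10 percent of the class", and how quartiles split the same values into four equal groups. Pandas interpolates linearly between data points by default.

```python
import pandas as pd

# Hypothetical exam scores, invented purely for illustration
scores = pd.Series([55, 60, 65, 70, 75, 80, 85, 90, 95, 100])

# The 0.9 quantile is the score separating the top 10 percent from the rest
top_10_cutoff = scores.quantile(0.9)

# Quartiles divide the distribution into four equal-sized groups
quartiles = scores.quantile([0.25, 0.5, 0.75])

print(top_10_cutoff)
print(quartiles)
```

Passing a single fraction returns a single number, while passing a list returns a Series indexed by the requested fractions.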

How to do it…

  1. To generate quantiles for a single column in a Pandas DataFrame, begin by importing the required libraries:
    import pandas as pd
  2. Next, import the dataset from the CSV file:
    accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
    accidents = pd.read_csv(accidents_data_file,
                            sep=',',
                            header=0,
                            index_col=False,
                            parse_dates=['Date'],
                            dayfirst=True,
                            # Raise an error on malformed rows
                            # (requires pandas 1.3 or later)
                            on_bad_lines='error',
                            skip_blank_lines=True
                            )
  3. Finally, select the column you are interested in and call its quantile() method, passing a list of the quantiles you want to see:
    accidents['Number_of_Vehicles'].quantile(
    [.05, .1, .25, .5, .75, .9, .99]
    )

How it works…

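We begin by importing the Python libraries we need and by creating a DataFrame from the source data. We then select the Number_of_Vehicles column and call quantile() with a list of fractions between 0 and 1. The result is a Series with one value for each requested quantile.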
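The behavior described above can be sketched without the full accidents file. The example below assumes a tiny, made-up DataFrame that only borrows the Number_of_Vehicles column name; the values are invented. It also shows the optional interpolation parameter, which is one way to force the result to be a value that actually occurs in the data.

```python
import pandas as pd

# Made-up stand-in for the accidents data; the real CSV is not read here
sample = pd.DataFrame({'Number_of_Vehicles': [1, 1, 2, 2, 2, 3, 4]})

# Default behavior: linear interpolation between neighboring data points
result = sample['Number_of_Vehicles'].quantile([.25, .5, .75])

# interpolation='lower' picks the lower of the two neighboring observed values
observed = sample['Number_of_Vehicles'].quantile([.25, .5, .75],
                                                 interpolation='lower')

# Both results are Series indexed by the requested fractions
print(result)
print(observed)
```

For count data such as vehicles per accident, the 'lower' variant may be easier to interpret, since it never reports a fractional number of vehicles.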
