Getting a count of unique values for a single column

Pandas makes it easy to count the unique values in a single column of a DataFrame. Information like this can be used to create charts that help us better understand the data we're working with.

How to do it…

  1. To get a count of the unique values for a single column of a Pandas DataFrame, begin by importing the required libraries:
    import pandas as pd
  2. Next, import the dataset from the CSV file:
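    # Path to the author's copy of the Stats19 accidents file; change it to
    # wherever Accidents7904.csv lives on your machine.
    accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
    accidents = pd.read_csv(accidents_data_file,
                            sep=',',
                            header=0,
                            index_col=False,
                            parse_dates=['Date'],
                            dayfirst=True,
                            # tupleize_cols, error_bad_lines and warn_bad_lines were
                            # removed from recent pandas; on_bad_lines (pandas >= 1.3)
                            # replaces the last two.
                            on_bad_lines='error',
                            skip_blank_lines=True
                            )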
  3. Finally, call the value_counts() method on the single column of the DataFrame whose unique values and counts you want to see (the top-level pd.value_counts() function is deprecated in recent pandas):
    accidents['Date'].value_counts()
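If you don't have the Stats19 file at hand, the steps above can be tried on a tiny in-memory CSV. This is a sketch with made-up rows: the `Accident_Index` column and the dates are hypothetical stand-ins for the real data, and `io.StringIO` simply lets `read_csv` read from a string instead of a file.

```python
import io

import pandas as pd

# A tiny, made-up stand-in for the Stats19 accidents file (hypothetical data).
csv_text = """Accident_Index,Date
A1,01/02/2004
A2,01/02/2004
A3,15/03/2004
A4,01/02/2004
A5,15/03/2004
A6,20/04/2004
"""

# Same date handling as the recipe: parse 'Date' as day/month/year.
accidents = pd.read_csv(io.StringIO(csv_text),
                        parse_dates=['Date'],
                        dayfirst=True)

# Count how many accidents fall on each unique date.
date_counts = accidents['Date'].value_counts()
print(date_counts)
```

Here 1 February 2004 appears three times, so it comes first in the result.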

How it works…

We begin by importing the Python libraries we need and creating a DataFrame from the source data. We then call value_counts() on the column we want results for, which returns the count of each unique value. By default, Pandas excludes NA values and sorts the results by count in descending order.
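The default behavior described above can be checked on a small Series. This is a sketch with a hypothetical road-surface column containing one missing value; it also passes dropna=False to show the missing value being counted as its own entry.

```python
import numpy as np
import pandas as pd

# Hypothetical column of road-surface conditions with one missing value.
surface = pd.Series(['Dry', 'Wet', 'Dry', np.nan, 'Dry', 'Wet', 'Snow'])

# Default: NaN is excluded and counts are sorted from most to least frequent.
default_counts = surface.value_counts()
print(default_counts)

# dropna=False keeps the missing value as its own entry in the result.
with_missing = surface.value_counts(dropna=False)
print(with_missing)
```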

Additional Arguments

value_counts() has a few additional arguments you may want to use, such as:

  • normalize: When set to True, the results will contain the relative frequencies of the unique values.
  • sort: When True (the default), the results are sorted by count.
  • ascending: When True, the results are sorted in ascending order instead of the default descending order.
  • bins: When a number of bins is provided, rather than counting each unique value, Pandas groups the values into half-open bins and counts those. This is a convenience for pd.cut and only works with numeric data.
  • dropna: When True (the default), null values are excluded from the counts.
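The arguments listed above can be tried on a small numeric Series. This is a sketch using a hypothetical column of vehicles involved per accident; the values are made up, and numeric data is used so that bins also works.

```python
import pandas as pd

# Hypothetical number of vehicles involved in each accident.
vehicles = pd.Series([1, 2, 2, 3, 1, 2, 4, 1, 2, 6])

# normalize=True: relative frequencies instead of raw counts (they sum to 1).
shares = vehicles.value_counts(normalize=True)

# ascending=True: least frequent values first.
rarest_first = vehicles.value_counts(ascending=True)

# sort=False: the result is not sorted by count.
unsorted_counts = vehicles.value_counts(sort=False)

# bins=3: group the values into three equal-width, half-open bins
# (a convenience wrapper around pd.cut) and count the values in each bin.
binned = vehicles.value_counts(bins=3)

print(shares, rarest_first, binned, sep='\n\n')
```

With bins=3, seven of the ten values (the 1s and 2s) fall into the lowest bin.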