Pandas makes it very easy to get the count of unique values for a single column of a DataFrame. Information like this can easily be used to create charts that help us better understand the data we're working with.
import pandas as pd

accidents_data_file = '/Users/robertdempsey/Dropbox/private/Python Business Intelligence Cookbook/Data/Stats19-Data1979-2004/Accidents7904.csv'
# Load the accidents CSV, parsing the day-first 'Date' column as datetimes.
# on_bad_lines='error' (pandas 1.3+) raises an exception on malformed rows.
accidents = pd.read_csv(accidents_data_file,
                        sep=',',
                        header=0,
                        index_col=False,
                        parse_dates=['Date'],
                        dayfirst=True,
                        on_bad_lines='error',
                        skip_blank_lines=True)
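Then use the value_counts() function provided by Pandas, passing in the single column of the DataFrame whose unique values and counts you want to see:
pd.value_counts(accidents['Date'])
The equivalent Series method, accidents['Date'].value_counts(), returns the same result and is the preferred form in recent Pandas releases, where the top-level function is deprecated.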
We begin by importing the Python libraries we need and by creating a DataFrame from the source data. We then use the value_counts() function, specifying the column we want to see the results for, to get the count of each unique value. By default, Pandas excludes NA values and returns the results sorted by count in descending order.
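The default behavior described above can be checked without the Stats19 file. The following is a minimal sketch that assumes a small, made-up DataFrame; the name sample and the dates in it are illustrative only and are not taken from the accidents data.

```python
import pandas as pd

# Hypothetical stand-in for the accidents data: seven rows, one missing date
sample = pd.DataFrame({
    'Date': pd.to_datetime([
        '2004-01-01', '2004-01-02', '2004-01-01', None,
        '2004-01-01', '2004-01-02', '2004-01-03',
    ])
})

# Count how often each date occurs; the missing value (NaT) is skipped
# and the most frequent date comes first
date_counts = sample['Date'].value_counts()
print(date_counts)
```

Here the most frequent date appears at the top of the result, and the row with the missing date does not contribute to any count.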
value_counts() has a few additional arguments you may want to use, such as:
normalize: When set to True, the results will contain the relative frequencies of the unique values instead of raw counts.
sort: When set to True (the default), the results are sorted by count.
ascending: When set to True, the results are sorted in ascending order (the default is False).
bins: When a number of bins is provided, rather than counting values, Pandas will group the values into half-open bins. This is only a convenience for pd.cut, and only works with numeric data.
dropna: When set to True (the default), null values are not included in the counts.
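These arguments can be tried out on a small example. The following is a minimal sketch that assumes a hypothetical numeric Series (speeds is an invented name, not a column of the accidents data) and uses the Series method form of value_counts().

```python
import pandas as pd

# Hypothetical numeric values; None is stored as a missing value (NaN)
speeds = pd.Series([30, 30, 30, 40, 60, 60, None])

# Relative frequencies of the non-missing values (they sum to 1)
relative = speeds.value_counts(normalize=True)

# Least frequent value first
rarest_first = speeds.value_counts(ascending=True)

# Keep the missing value as its own entry in the result
with_missing = speeds.value_counts(dropna=False)

# Group the values into three equal-width intervals and count per interval
binned = speeds.value_counts(bins=3)

for label, result in [('normalize', relative), ('ascending', rarest_first),
                      ('dropna=False', with_missing), ('bins=3', binned)]:
    print(label)
    print(result)
```

Each call returns a new Series, so the options can be compared side by side without modifying the original data.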