How to do it...

  1. Read in the college dataset, group by state, and display the total number of groups. This should equal the number of unique states retrieved from the nunique Series method:
>>> college = pd.read_csv('data/college.csv', index_col='INSTNM')
>>> grouped = college.groupby('STABBR')
>>> grouped.ngroups
59

>>> college['STABBR'].nunique() # verifying the same number
59
  1. The grouped variable has a filter method, which accepts a custom function that determines whether a group is kept or not. The custom function gets implicitly passed a DataFrame of the current group and is required to return a boolean. Let's define a function that calculates the total percentage of minority students and returns True if this percentage is greater than a user-defined threshold:
>>> def check_minority(df, threshold):
minority_pct = 1 - df['UGDS_WHITE']
total_minority = (df['UGDS'] * minority_pct).sum()
total_ugds = df['UGDS'].sum()
total_minority_pct = total_minority / total_ugds
return total_minority_pct > threshold
  1. Use the filter method passed with the check_minority function and a threshold of 50% to find all states that have a minority majority:
>>> college_filtered = grouped.filter(check_minority, threshold=.5)
>>> college_filtered.head()
  1. Just looking at the output may not be indicative of what actually happened. The DataFrame starts with state Arizona (AZ) and not Alaska (AK) so we can visually confirm that something changed. Let's compare the shape of this filtered DataFrame with the original. Looking at the results, about 60% of the rows have been filtered, and only 20 states remain that have a minority majority:
>>> college.shape
(7535, 26)

>>> college_filtered.shape
(3028, 26)

>>> college_filtered['STABBR'].nunique()
20
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset