Read in the college dataset, and find the mean and standard deviation of the undergraduate population by state:
>>> college = pd.read_csv('data/college.csv')
>>> college.groupby('STABBR')['UGDS'].agg(['mean', 'std']) \
...        .round(0).head()
This output isn't quite what we want. We are not looking for the mean and standard deviation of each group, but for the maximum number of standard deviations that any one institution lies from its state's mean. To calculate this, we subtract the state's mean undergraduate population from each institution's undergraduate population and then divide by the state's standard deviation. This standardizes the undergraduate population within each group. We can then take the maximum of the absolute values of these scores to find the institution that is farthest from the mean. pandas does not provide a built-in aggregation function for this, so we will need to create a custom function:
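To make the arithmetic concrete before defining that function, here is a minimal sketch of the standardization for a single group. The enrollment figures are made up for illustration, and the names `ugds`, `z_scores`, and `max_dev` are assumptions; none of them come from the college dataset.

```python
import pandas as pd

# Hypothetical undergraduate counts for the institutions in one state
# (made-up values, not taken from college.csv)
ugds = pd.Series([1000.0, 2000.0, 3000.0, 10000.0])

# Standardize: subtract the group mean, then divide by the group
# standard deviation (Series.std uses the sample formula, ddof=1)
z_scores = (ugds - ugds.mean()) / ugds.std()

# The largest absolute score is the institution farthest from the mean
max_dev = z_scores.abs().max()
print(round(max_dev, 2))  # 1.47
```

Here the mean is 4,000 and the sample standard deviation is about 4,082, so the largest institution sits roughly 1.47 standard deviations above the mean. Applying the same steps to each state separately is what the custom function has to do.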