To wrap up this chapter, let's integrate all the datasets that we have processed so far. Recall that we briefly introduced the three categories of population structure (constrictive, stable, and expansive) earlier in this chapter.
In this section, we are going to implement a naive algorithm for classifying populations into one of these three categories. After that, we will explore different techniques for visualizing categorical data.
Most online references discuss only the visual classification of population pyramids (for example, https://www.populationeducation.org/content/what-are-different-types-population-pyramids). Clustering-based methods do exist (for example, Korenjak-Černe, Kejžar, Batagelj (2008). Clustering of Population Pyramids. Informatica, 32), but to date, mathematical definitions of the population categories are scarcely discussed. In the next example, we will build a naive classifier based on the ratio of the population in the "0-4" age group to the population in the "50-54" age group:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Select total population for each country in 2015
current_population = population_df[(population_df.Time == 2015) &
                                   (population_df.Sex == 'Both')]

# A list for storing the population type for each country
pop_type_list = []

# Loop through each country in the BigMac index dataset
for country in merged_df.country.unique():
    # Make sure the country also exists in the population dataset
    if country not in current_population.country.values:
        continue

    # Calculate the ratio of population between the "0-4" and "50-54"
    # age groups
    young = current_population[(current_population.country == country) &
                               (current_population.AgeGrp == "0-4")].Value
    midage = current_population[(current_population.country == country) &
                                (current_population.AgeGrp == "50-54")].Value
    # Each selection holds a single value, so extract it explicitly;
    # calling float() directly on a Series is deprecated in recent pandas
    ratio = float(young.iloc[0]) / float(midage.iloc[0])

    # Classify the populations based on arbitrary ratio thresholds
    if ratio < 0.8:
        pop_type = "constrictive"
    elif ratio < 1.2:
        pop_type = "stable"
    else:
        pop_type = "expansive"

    pop_type_list.append([country, ratio, pop_type])

# Convert the list to a Pandas DataFrame
pop_type_df = pd.DataFrame(pop_type_list,
                           columns=['country', 'ratio', 'population type'])

# Merge the BigMac index DataFrame with the population type DataFrame
merged_df2 = pd.merge(merged_df, pop_type_df, how='inner', on='country')
merged_df2.head()
The expected output is as follows:
|   | Date_x | local_price | dollar_ex | dollar_price | dollar_ppp | dollar_valuation | dollar_adj_valuation | euro_adj_valuation | sterling_adj_valuation | yen_adj_valuation | yuan_adj_valuation | country | Date_y | Value | ratio | population type |
| 0 | 2015-01-31 | 28.00 | 8.610000 | 3.252033 | 5.845511 | -32.107881 | 0.540242 | -0.804495 | -2.49468 | 34.3905 | 6.01183 | ARG | 2015-12-31 | 10501.660269 | 1.695835 | expansive |
| 1 | 2015-01-31 | 5.30 | 1.227220 | 4.318705 | 1.106472 | -9.839144 | -17.8995 | -18.9976 | -20.3778 | 9.74234 | -13.4315 | AUS | 2015-12-31 | 54688.445933 | 0.961301 | stable |
| 2 | 2015-01-31 | 13.50 | 2.592750 | 5.206827 | 2.818372 | 8.702019 | 68.4555 | 66.2024 | 63.3705 | 125.172 | 77.6231 | BRA | 2015-12-31 | 11211.891104 | 1.217728 | expansive |
| 3 | 2015-01-31 | 2.89 | 0.661594 | 4.368235 | 0.603340 | -8.805115 | 3.11257 | 1.73343 | 0 | 37.8289 | 8.72415 | GBR | 2015-12-31 | 41182.619517 | 0.872431 | stable |
| 4 | 2015-01-31 | 5.70 | 1.228550 | 4.639616 | 1.189979 | -3.139545 | -2.34134 | -3.64753 | -5.28928 | 30.5387 | 2.97343 | CAN | 2015-12-31 | 50108.065004 | 0.690253 | constrictive |
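The same thresholding can also be written without an explicit loop. The following is only a minimal sketch on made-up data: the `toy_population` DataFrame, its country codes, and its values are hypothetical, and it assumes the same `country`, `AgeGrp`, and `Value` column names used in the example above. It pivots the age groups into columns, computes the ratio for all countries at once, and bins the ratios with `pd.cut`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same long format as the population
# dataset; the country codes and values are made up for illustration
toy_population = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "AgeGrp": ["0-4", "50-54", "0-4", "50-54", "0-4", "50-54"],
    "Value": [300.0, 500.0, 400.0, 400.0, 900.0, 500.0],
})

# Reshape to one row per country and one column per age group
wide = toy_population.pivot(index="country", columns="AgeGrp", values="Value")

# Ratio of the "0-4" age group to the "50-54" age group
toy_pop_type_df = (wide["0-4"] / wide["50-54"]).rename("ratio").to_frame()

# Same arbitrary thresholds as the loop, as left-closed intervals:
# [0, 0.8) constrictive, [0.8, 1.2) stable, [1.2, inf) expansive
toy_pop_type_df["population type"] = pd.cut(
    toy_pop_type_df["ratio"],
    bins=[0, 0.8, 1.2, np.inf],
    right=False,
    labels=["constrictive", "stable", "expansive"],
)
toy_pop_type_df = toy_pop_type_df.reset_index()

print(toy_pop_type_df)
# Number of countries that fall into each category
print(toy_pop_type_df["population type"].value_counts())
```

Because `pd.cut` returns a categorical column, the category order (constrictive, stable, expansive) is kept, which can be convenient when the categories are later used as a plot axis or hue.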