Visualizing categorical data

As we approach the end of this chapter, let's integrate all the datasets we have processed so far. Earlier in this chapter, we briefly introduced the three categories of population structure: constrictive, stable, and expansive.

In this section, we are going to implement a naive algorithm for classifying populations into one of these three categories. After that, we will explore different techniques for visualizing categorical data.

Most online references discuss only the visual classification of population pyramids (for example, https://www.populationeducation.org/content/what-are-different-types-population-pyramids). Clustering-based methods do exist (for example, Korenjak-Černe, Kejžar, and Batagelj (2008), "Clustering of Population Pyramids", Informatica, 32), but to date, mathematical definitions of the population categories are rarely discussed. In the next example, we will build a naive classifier based on the ratio of the population in the "0-4" age group to the population in the "50-54" age group:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


# Select the total population (both sexes) for each country in 2015.
# population_df and merged_df are the DataFrames prepared earlier in this chapter.
current_population = population_df[(population_df.Time == 2015) &
                                   (population_df.Sex == 'Both')]

# A list for storing the population type for each country
pop_type_list = []

# Loop through each country in the Big Mac index dataset
for country in merged_df.country.unique():
    # Make sure the country also exists in the population dataset
    if country not in current_population.country.values:
        continue

    # Calculate the ratio of population between the "0-4" and "50-54"
    # age groups
    young = current_population[(current_population.country == country) &
                               (current_population.AgeGrp == "0-4")].Value

    midage = current_population[(current_population.country == country) &
                                (current_population.AgeGrp == "50-54")].Value

    # Each selection should hold exactly one row; skip the country otherwise
    if len(young) != 1 or len(midage) != 1:
        continue

    # .iloc[0] extracts the scalar value (calling float() directly on a
    # Series is deprecated in recent pandas versions)
    ratio = float(young.iloc[0]) / float(midage.iloc[0])

    # Classify the populations based on arbitrary ratio thresholds
    if ratio < 0.8:
        pop_type = "constrictive"
    elif ratio < 1.2:
        pop_type = "stable"
    else:
        pop_type = "expansive"

    pop_type_list.append([country, ratio, pop_type])

# Convert the list to a pandas DataFrame
pop_type_df = pd.DataFrame(pop_type_list, columns=['country', 'ratio', 'population type'])

# Merge the Big Mac index DataFrame with the population type DataFrame
merged_df2 = pd.merge(merged_df, pop_type_df, how='inner', on='country')
merged_df2.head()
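The loop above tests one country at a time. The same rule can also be written in a vectorized form, which may be easier to adapt if you want to experiment with other thresholds. The following is a minimal sketch, not the chapter's own method: it assumes a long-format table with the same country, AgeGrp, and Value columns as current_population, and the country codes and population figures in it are made up purely for illustration. It uses pivot_table to place the two age groups side by side and pd.cut to bin the ratio with the same thresholds (below 0.8, from 0.8 up to 1.2, and 1.2 or above).

```python
import pandas as pd

# Hypothetical toy data in the same long format as current_population;
# "AAA", "BBB", "CCC" and the values are invented for illustration only
toy = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "AgeGrp": ["0-4", "50-54"] * 3,
    "Value": [600.0, 1000.0, 1000.0, 1000.0, 1500.0, 1000.0],
})

# One row per country, one column per age group
wide = toy.pivot_table(index="country", columns="AgeGrp", values="Value")

# Ratio of the "0-4" population to the "50-54" population
ratio = wide["0-4"] / wide["50-54"]

# Same thresholds as the if/elif chain: [0, 0.8), [0.8, 1.2), [1.2, inf)
pop_type = pd.cut(ratio,
                  bins=[0, 0.8, 1.2, float("inf")],
                  labels=["constrictive", "stable", "expansive"],
                  right=False)

# Assemble a DataFrame with the same columns as pop_type_df
pop_type_df = ratio.rename("ratio").to_frame()
pop_type_df["population type"] = pop_type
pop_type_df = pop_type_df.reset_index()
print(pop_type_df)
```

With right=False, each bin includes its left edge and excludes its right edge, so a ratio of exactly 1.2 is classified as expansive, which agrees with the loop version.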

The expected output is as follows:

Date_x local_price dollar_ex dollar_price dollar_ppp dollar_valuation dollar_adj_valuation euro_adj_valuation sterling_adj_valuation yen_adj_valuation yuan_adj_valuation country Date_y Value ratio population type
0 2015-01-31 28.00 8.610000 3.252033 5.845511 -32.107881 0.540242 -0.804495 -2.49468 34.3905 6.01183 ARG 2015-12-31 10501.660269 1.695835 expansive
1 2015-01-31 5.30 1.227220 4.318705 1.106472 -9.839144 -17.8995 -18.9976 -20.3778 9.74234 -13.4315 AUS 2015-12-31 54688.445933 0.961301 stable
2 2015-01-31 13.50 2.592750 5.206827 2.818372 8.702019 68.4555 66.2024 63.3705 125.172 77.6231 BRA 2015-12-31 11211.891104 1.217728 expansive
3 2015-01-31 2.89 0.661594 4.368235 0.603340 -8.805115 3.11257 1.73343 0 37.8289 8.72415 GBR 2015-12-31 41182.619517 0.872431 stable
4 2015-01-31 5.70 1.228550 4.639616 1.189979 -3.139545 -2.34134 -3.64753 -5.28928 30.5387 2.97343 CAN 2015-12-31 50108.065004 0.690253 constrictive
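Once every country carries a label, the population type column is ordinary categorical data. Before plotting it, a quick count of the labels is a useful sanity check. The sketch below uses a handful of hypothetical labels in place of merged_df2['population type'] and converts them to an ordered pandas Categorical, so the counts come out in a fixed order and a category with no members still appears with a count of zero.

```python
import pandas as pd

# Hypothetical labels standing in for merged_df2["population type"]
labels = pd.Series(["expansive", "stable", "expansive", "stable"])

# Ordered categories: constrictive < stable < expansive
pop_cat = pd.Categorical(labels,
                         categories=["constrictive", "stable", "expansive"],
                         ordered=True)

# value_counts() on a categorical Series also reports unused categories
# (here "constrictive" with a count of 0); sort_index() restores category order
counts = pd.Series(pop_cat).value_counts().sort_index()
print(counts)
```

Keeping the categories ordered also means that later plots of this column can list the three population types in the same constrictive, stable, expansive order every time.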