Visualizing categorical data

To close this chapter, let's integrate all of the datasets that we have processed so far. Recall that we briefly introduced three categories of population structure (constrictive, stable, and expansive) earlier in this chapter.

In this section, we are going to implement a naive algorithm for classifying populations into one of the three categories. After that, we will explore different techniques of visualizing categorical data.

Most online references discuss only the visual classification of population pyramids (for example, https://www.populationeducation.org/content/what-are-different-types-population-pyramids). Clustering-based methods do exist (for example, Korenjak-Černe, Kejžar, and Batagelj (2008). Clustering of Population Pyramids. Informatica, 32.), but mathematical definitions of the population categories are rarely discussed. We will build a naive classifier based on the ratio of populations between the "0-4" and "50-54" age groups in the next example:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


# Select total population for each country in 2015
current_population = population_df[(population_df.Time == 2015) &
                                   (population_df.Sex == 'Both')]

# A list for storing the population type for each country
pop_type_list = []

# Loop through each country in the BigMac index dataset
for country in merged_df.country.unique():
    # Make sure the country also exists in the population dataset
    if country not in current_population.country.values:
        continue
 
    # Calculate the ratio of population between "0-4" and "50-54"
    # age groups
    young = current_population[(current_population.country == country) &
                               (current_population.AgeGrp == "0-4")].Value
 
    midage = current_population[(current_population.country == country) &
                                (current_population.AgeGrp == "50-54")].Value
 
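    # Each selection should hold exactly one row; .item() extracts that
    # scalar and raises an error otherwise
    ratio = young.item() / midage.item()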
 
    # Classify the populations based on arbitrary ratio thresholds
    if ratio < 0.8:
        pop_type = "constrictive"
    elif ratio < 1.2:
        pop_type = "stable"
    else:
        pop_type = "expansive"
 
    pop_type_list.append([country, ratio, pop_type])

# Convert the list to Pandas DataFrame
pop_type_df = pd.DataFrame(pop_type_list, columns=['country','ratio','population type'])

# Merge the BigMac index DataFrame with population type DataFrame
merged_df2 = pd.merge(merged_df, pop_type_df, how='inner', on='country')
merged_df2.head()

The expected output is as follows:

	Date_x	local_price	dollar_ex	dollar_price	dollar_ppp	dollar_valuation	dollar_adj_valuation	euro_adj_valuation	sterling_adj_valuation	yen_adj_valuation	yuan_adj_valuation	country	Date_y	Value	ratio	population type
0	2015-01-31	28.00	8.610000	3.252033	5.845511	-32.107881	0.540242	-0.804495	-2.49468	34.3905	6.01183	ARG	2015-12-31	10501.660269	1.695835	expansive
1	2015-01-31	5.30	1.227220	4.318705	1.106472	-9.839144	-17.8995	-18.9976	-20.3778	9.74234	-13.4315	AUS	2015-12-31	54688.445933	0.961301	stable
2	2015-01-31	13.50	2.592750	5.206827	2.818372	8.702019	68.4555	66.2024	63.3705	125.172	77.6231	BRA	2015-12-31	11211.891104	1.217728	expansive
3	2015-01-31	2.89	0.661594	4.368235	0.603340	-8.805115	3.11257	1.73343	0	37.8289	8.72415	GBR	2015-12-31	41182.619517	0.872431	stable
4	2015-01-31	5.70	1.228550	4.639616	1.189979	-3.139545	-2.34134	-3.64753	-5.28928	30.5387	2.97343	CAN	2015-12-31	50108.065004	0.690253	constrictive
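
The loop above can also be written without iterating over countries. The following is a minimal sketch of that idea, not the method used in this chapter: it assumes a small made-up DataFrame (toy_population, with hypothetical country codes and values) that has the same country, AgeGrp, and Value columns as current_population. It pivots the two age groups into columns, divides them, and bins the ratio with pd.cut using the same arbitrary thresholds of 0.8 and 1.2.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same columns as current_population;
# the country codes and values are made up for illustration only
toy_population = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "AgeGrp": ["0-4", "50-54", "0-4", "50-54", "0-4", "50-54"],
    "Value": [150.0, 100.0, 90.0, 100.0, 60.0, 100.0],
})

# Reshape to one row per country and one column per age group
wide = toy_population.pivot(index="country", columns="AgeGrp", values="Value")

# Ratio of the "0-4" population to the "50-54" population
toy_pop_type_df = (wide["0-4"] / wide["50-54"]).rename("ratio").to_frame()

# right=False makes the bins [0, 0.8), [0.8, 1.2), and [1.2, inf),
# matching the if/elif/else thresholds in the loop
toy_pop_type_df["population type"] = pd.cut(
    toy_pop_type_df["ratio"],
    bins=[0, 0.8, 1.2, np.inf],
    right=False,
    labels=["constrictive", "stable", "expansive"],
)

toy_pop_type_df = toy_pop_type_df.reset_index()
print(toy_pop_type_df)
```

With real data, the resulting DataFrame could be merged with merged_df on the country column in the same way as pop_type_df. Note that pd.cut returns a categorical column, which is convenient for the categorical plots explored later, whereas the loop stores plain strings.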
