Visualizing a bivariate distribution

We should bear in mind that the Big Mac index is not directly comparable between countries. Normally, we would expect commodities in poor countries to be cheaper than those in rich ones. To represent a fairer picture of the index, it would be better to show the relationship between Big Mac pricing and Gross Domestic Product (GDP) per capita.

We are going to acquire GDP per capita from Quandl's World Bank World Development Indicators (WWDI) dataset. Based on the previous code example of acquiring JSON data from Quandl, can you try to adapt it to download the GDP per capita dataset?

For those who are impatient, here is the full code:

import urllib
import json
import pandas as pd
import time
from urllib.request import urlopen


def get_gdp_dataset(api_key, country_code):
    """Obtain and parse a quandl GDP dataset in Pandas DataFrame format
    Quandl returns dataset in JSON format, where data is stored as a 
    list of lists in response['dataset']['data'], and column headers
    stored in response['dataset']['column_names'].
 
    Args:
        api_key: Quandl API key
        country_code: Three letter code to represent country

    Returns:
        df: Pandas DataFrame of a Quandl dataset
    """
    base_url = "https://www.quandl.com/api/v3/datasets/"
    url_suffix = ".json?api_key="
 
    # Compose the Quandl API dataset code to get GDP per capita
    # (constant 2000 US$) dataset
    gdp_code = "WWDI/" + country_code + "_NY_GDP_PCAP_KD"
 
    # Parse the JSON response from Quandl API
    # Some countries might be missing, so we need error handling code
    try:
        u = urlopen(base_url + gdp_code + url_suffix + api_key)
    except urllib.error.URLError as e:
        print(gdp_code,e)
        return None
 
    response = json.loads(u.read().decode('utf-8'))
 
    # Format the response as Pandas Dataframe
    df = pd.DataFrame(response['dataset']['data'], columns=response['dataset']['column_names'])
 
    # Add a new country code column
    df['country'] = country_code
 
    return df


api_key = "INSERT YOUR KEY HERE"
quandl_dfs = []

# Loop through all unique country code values in the BigMac index DataFrame
for country_code in bigmac_df.country.unique():
    # Fetch the GDP dataset for the corresponding country 
    df = get_gdp_dataset(api_key, country_code)
 
    # Skip if the response is empty
    if df is None:
        continue
 
    # Store in a list DataFrames
    quandl_dfs.append(df)
 
    # Prevents exceeding the API speed limit
    time.sleep(2)
 
# Concatenate the list of DataFrames into a single one 
gdp_df = pd.concat(quandl_dfs)
gdp_df.head()

The expected output:

WWDI/EUR_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/SIN_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/ROC_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/UAE_NY_GDP_PCAP_KD HTTP Error 404: Not Found

	Date	Value	country
0	2015-12-31	10501.660269	ARG
1	2014-12-31	10334.780146	ARG
2	2013-12-31	10711.229530	ARG
3	2012-12-31	10558.265365	ARG
4	2011-12-31	10780.342508	ARG

We can see that the GDP per capita dataset is not available for four geographical locations, but we can ignore that for now.

Next, we will merge the two DataFrames that contain Big Mac Index and GDP per capita respectively using pandas.merge(). The most recent record in WWDI's GDP per capita dataset was collected at the end of 2015, so let's pair that up with the corresponding Big Mac index dataset in the same year.

For those who are familiar with the SQL language, pandas.merge() supports four modes, namely left, right, inner, and outer joins. Since we are interested in rows that have matching countries in both DataFrames only, we are going to choose inner join:

merged_df = pd.merge(bigmac_df[(bigmac_df.Date == "2015-01-31")], gdp_df[(gdp_df.Date == "2015-12-31")], how='inner', on='country')
merged_df.head()

	Date_x	local_price	dollar_ex	dollar_price	dollar_ppp	dollar_valuation	dollar_adj_valuation	euro_adj_valuation	sterling_adj_valuation	yen_adj_valuation	yuan_adj_valuation	country	Date_y	Value
0	2015-01-31	28.00	8.610000	3.252033	5.845511	-32.107881	0.540242	-0.804495	-2.49468	34.3905	6.01183	ARG	2015-12-31	10501.660269
1	2015-01-31	5.30	1.227220	4.318705	1.106472	-9.839144	-17.8995	-18.9976	-20.3778	9.74234	-13.4315	AUS	2015-12-31	54688.445933
2	2015-01-31	13.50	2.592750	5.206827	2.818372	8.702019	68.4555	66.2024	63.3705	125.172	77.6231	BRA	2015-12-31	11211.891104
3	2015-01-31	2.89	0.661594	4.368235	0.603340	-8.805115	3.11257	1.73343	0	37.8289	8.72415	GBR	2015-12-31	41182.619517
4	2015-01-31	5.70	1.228550	4.639616	1.189979	-3.139545	-2.34134	-3.64753	-5.28928	30.5387	2.97343	CAN	2015-12-31	50108.065004

Table of Contents for Visualizing a bivariate distribution

Create new playlist

Sign In

Sign Up

Table of Contents for
Visualizing a bivariate distribution