Visualizing a bivariate distribution

We should bear in mind that the Big Mac index is not directly comparable between countries. Normally, we would expect commodities in poor countries to be cheaper than those in rich ones. To represent a fairer picture of the index, it would be better to show the relationship between Big Mac pricing and Gross Domestic Product (GDP) per capita.

We are going to acquire GDP per capita from Quandl's World Bank World Development Indicators (WWDI) dataset. Based on the previous code example of acquiring JSON data from Quandl, can you try to adapt it to download the GDP per capita dataset?

For those who are impatient, here is the full code:

import urllib
import json
import pandas as pd
import time
from urllib.request import urlopen


def get_gdp_dataset(api_key, country_code):
"""Obtain and parse a quandl GDP dataset in Pandas DataFrame format
Quandl returns dataset in JSON format, where data is stored as a
list of lists in response['dataset']['data'], and column headers
stored in response['dataset']['column_names'].

Args:
api_key: Quandl API key
country_code: Three letter code to represent country

Returns:
df: Pandas DataFrame of a Quandl dataset
"""
base_url = "https://www.quandl.com/api/v3/datasets/"
url_suffix = ".json?api_key="

# Compose the Quandl API dataset code to get GDP per capita
# (constant 2000 US$) dataset
gdp_code = "WWDI/" + country_code + "_NY_GDP_PCAP_KD"

# Parse the JSON response from Quandl API
# Some countries might be missing, so we need error handling code
try:
u = urlopen(base_url + gdp_code + url_suffix + api_key)
except urllib.error.URLError as e:
print(gdp_code,e)
return None

response = json.loads(u.read().decode('utf-8'))

# Format the response as Pandas Dataframe
df = pd.DataFrame(response['dataset']['data'], columns=response['dataset']['column_names'])

# Add a new country code column
df['country'] = country_code

return df


api_key = "INSERT YOUR KEY HERE"
quandl_dfs = []

# Loop through all unique country code values in the BigMac index DataFrame
for country_code in bigmac_df.country.unique():
# Fetch the GDP dataset for the corresponding country
df = get_gdp_dataset(api_key, country_code)

# Skip if the response is empty
if df is None:
continue

# Store in a list DataFrames
quandl_dfs.append(df)

# Prevents exceeding the API speed limit
time.sleep(2)

# Concatenate the list of DataFrames into a single one
gdp_df = pd.concat(quandl_dfs)
gdp_df.head()

The expected output:

WWDI/EUR_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/SIN_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/ROC_NY_GDP_PCAP_KD HTTP Error 404: Not Found
WWDI/UAE_NY_GDP_PCAP_KD HTTP Error 404: Not Found
Date Value country
0 2015-12-31 10501.660269 ARG
1 2014-12-31 10334.780146 ARG
2 2013-12-31 10711.229530 ARG
3 2012-12-31 10558.265365 ARG
4 2011-12-31 10780.342508 ARG

 

We can see that the GDP per capita dataset is not available for four geographical locations, but we can ignore that for now.

Next, we will merge the two DataFrames that contain Big Mac Index and GDP per capita respectively using pandas.merge(). The most recent record in WWDI's GDP per capita dataset was collected at the end of 2015, so let's pair that up with the corresponding Big Mac index dataset in the same year.

For those who are familiar with the SQL language, pandas.merge() supports four modes, namely left, right, inner, and outer joins. Since we are interested in rows that have matching countries in both DataFrames only, we are going to choose inner join:

merged_df = pd.merge(bigmac_df[(bigmac_df.Date == "2015-01-31")], gdp_df[(gdp_df.Date == "2015-12-31")], how='inner', on='country')
merged_df.head()
Date_x local_price dollar_ex dollar_price dollar_ppp dollar_valuation dollar_adj_valuation euro_adj_valuation sterling_adj_valuation yen_adj_valuation yuan_adj_valuation country Date_y Value
0 2015-01-31 28.00 8.610000 3.252033 5.845511 -32.107881 0.540242 -0.804495 -2.49468 34.3905 6.01183 ARG 2015-12-31 10501.660269
1 2015-01-31 5.30 1.227220 4.318705 1.106472 -9.839144 -17.8995 -18.9976 -20.3778 9.74234 -13.4315 AUS 2015-12-31 54688.445933
2 2015-01-31 13.50 2.592750 5.206827 2.818372 8.702019 68.4555 66.2024 63.3705 125.172 77.6231 BRA 2015-12-31 11211.891104
3 2015-01-31 2.89 0.661594 4.368235 0.603340 -8.805115 3.11257 1.73343 0 37.8289 8.72415 GBR 2015-12-31 41182.619517
4 2015-01-31 5.70 1.228550 4.639616 1.189979 -3.139545 -2.34134 -3.64753 -5.28928 30.5387 2.97343 CAN 2015-12-31 50108.065004
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset