In this chapter, we will also draw upon financial data from Quandl's API to create insightful visualizations. If you are not familiar with Quandl, it is a financial and economic data warehouse that stores millions of datasets from hundreds of publishers. The best thing about Quandl is that these datasets are delivered via the unified API, without worrying about the procedures to parse the data correctly. Anonymous users can get up to 50 API calls per day, and you get up to 500 free API calls if you are a registered user. Readers can sign up for a free API key at https://www.quandl.com/?modal=register.
At Quandl, every dataset is identified by a unique ID, as defined by the Quandl Code on each search result webpage. For example, the Quandl code GOOG/NASDAQ_SWTX defines the historical NASDAQ index data published by Google Finance. Every dataset is available in three formats--CSV, JSON, and XML.
Although an official Python client library is available from Quandl, we are not going to use that for the sake of demonstrating the general procedures of importing JSON data. According to Quandl's documentation, we can fetch JSON formatted data tables through the following API call:
GET https://www.quandl.com/api/v3/datasets/{Quandl code}/data.json
Let's try to get the Big Mac index data from Quandl.
from urllib.request import urlopen
import json
import time
import pandas as pd
def get_bigmac_codes():
"""Get a Pandas DataFrame of all codes in the Big Mac index dataset
The first column contains the code, while the second header
contains the description of the code.
for example,
ECONOMIST/BIGMAC_ARG,Big Mac Index - Argentina
ECONOMIST/BIGMAC_AUS,Big Mac Index - Australia
ECONOMIST/BIGMAC_BRA,Big Mac Index - Brazil
Returns:
codes: Pandas DataFrame of Quandl dataset codes"""
codes_url = "https://www.quandl.com/api/v3/databases/ECONOMIST/codes"
codes = pd.read_csv(codes_url, header=None, names=['Code', 'Description'],
compression='zip', encoding='latin_1')
return codes
def get_quandl_dataset(api_key, code):
"""Obtain and parse a quandl dataset in Pandas DataFrame format
Quandl returns dataset in JSON format, where data is stored as a
list of lists in response['dataset']['data'], and column headers
stored in response['dataset']['column_names'].
for example, {'dataset': {...,
'column_names': ['Date',
'local_price',
'dollar_ex',
'dollar_price',
'dollar_ppp',
'dollar_valuation',
'dollar_adj_valuation',
'euro_adj_valuation',
'sterling_adj_valuation',
'yen_adj_valuation',
'yuan_adj_valuation'],
'data': [['2017-01-31',
55.0,
15.8575,
3.4683903515687,
10.869565217391,
-31.454736135007,
6.2671477203176,
8.2697553162259,
29.626894343348,
32.714616745128,
13.625825886047],
['2016-07-31',
50.0,
14.935,
3.3478406427854,
9.9206349206349,
-33.574590420925,
2.0726096168216,
0.40224795003514,
17.56448458418,
19.76377270142,
11.643103380531]
],
'database_code': 'ECONOMIST',
'dataset_code': 'BIGMAC_ARG',
... }}
A custom column--country is added to denote the 3-letter country code.
Args:
api_key: Quandl API key
code: Quandl dataset code
Returns:
df: Pandas DataFrame of a Quandl dataset
"""
base_url = "https://www.quandl.com/api/v3/datasets/"
url_suffix = ".json?api_key="
# Fetch the JSON response
u = urlopen(base_url + code + url_suffix + api_key)
response = json.loads(u.read().decode('utf-8'))
# Format the response as Pandas Dataframe
df = pd.DataFrame(response['dataset']['data'], columns=response['dataset']['column_names'])
# Label the country code
df['country'] = code[-3:]
return df
quandl_dfs = []
codes = get_bigmac_codes()
# Replace this with your own API key
api_key = "INSERT YOUR KEY HERE"
for code in codes.Code:
# Get the DataFrame of a Quandl dataset
df = get_quandl_dataset(api_key, code)
# Store in a list
quandl_dfs.append(df)
# Prevents exceeding the API speed limit
time.sleep(2)
# Concatenate the list of dataframes into a single one
bigmac_df = pd.concat(quandl_dfs)
bigmac_df.head()
The expected output is as follows:
- | Date | local_price | dollar_ex | dollar_price | dollar_ppp | dollar_valuation | dollar_adj_valuation | euro_adj_valuation | sterling_adj_valuation | yen_adj_valuation | yuan_adj_valuation | country |
0 | 2017-01-31 | 55.0 | 15.85750 | 3.468390 | 10.869565 | -31.454736 | 6.26715 | 8.26976 | 29.6269 | 32.7146 | 13.6258 | ARG |
1 | 2016-07-31 | 50.0 | 14.93500 | 3.347841 | 9.920635 | -33.574590 | 2.07261 | 0.402248 | 17.5645 | 19.7638 | 11.6431 | ARG |
2 | 2016-01-31 | 33.0 | 13.80925 | 2.389703 | 6.693712 | -51.527332 | -24.8619 | -18.714 | -18.7209 | 0.40859 | -17.029 | ARG |
3 | 2015-07-31 | 28.0 | 9.13500 | 3.065134 | 5.845511 | -36.009727 | -4.7585 | -0.357918 | -6.01091 | 30.8609 | 5.02868 | ARG |
4 | 2015-01-31 | 28.0 | 8.61000 | 3.252033 | 5.845511 | -32.107881 | 0.540242 | -0.804495 | -2.49468 | 34.3905 | 6.01183 | ARG |
The Big Mac index was invented by The Economist in 1986 as a lighthearted guide to check whether currencies are at their correct level. It is based on the theory of purchasing power parity (PPP) and is considered an informal measure of currency exchange rates at PPP. It measures their value against a similar basket of goods and services, in this case, a Big Mac. Differing prices at market exchange rates would imply that one currency is undervalued or overvalued.
The code for parsing JSON from the Quandl API is a bit more complicated, and thus extra explanations might help you to understand it. The first function, get_bigmac_codes(), parses the list of all available dataset codes in the Quandl Economist database as a pandas DataFrame. Meanwhile, the second function, get_quandl_dataset(api_key, code), converts the JSON response of a Quandl dataset API query to a pandas DataFrame. All datasets obtained are concatenated using pandas.concat().