Using the World Bank's Public Database API in Python
The World Bank offers some of the most comprehensive publicly available macroeconomic datasets. In this entry, we will look at how to install the World Bank's public database API (aka WBGAPI) in Python, how to import data and, importantly, how to work with it. We will focus on a specific use case: macroeconomic data for Madagascar. Our program will include code for building a DataFrame, sorting and transposing data, renaming headers, calculating growth rates, indexing time series and charting macroeconomic indicators in several different ways. Let's get started!
Disclaimer: The following code was written and run in Visual Studio Code using the latest Anaconda distribution. All datasets and libraries used throughout this program are publicly available, and the code itself is intended solely for educational purposes.
Author: Johary Razafindratsita, 2022.
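To begin, the World Bank Database API (WBGAPI) needs to be installed with pip. Run the following command from a terminal (for example, VS Code's integrated terminal or the Anaconda Prompt), not from inside the Python interpreter: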
pip install wbgapi
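If you work with several environments (Anaconda's base environment, a virtual environment, the interpreter selected in VS Code), it can be worth confirming that wbgapi landed in the one you are actually running. Below is a minimal standard-library sketch; the helper names `is_installed` and `installed_version` are illustrative, and the check only confirms that the package can be found, not that the World Bank API is reachable.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by the current interpreter."""
    return find_spec(package) is not None


def installed_version(dist_name: str):
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if is_installed("wbgapi"):
    print("wbgapi", installed_version("wbgapi"), "is installed")
else:
    print("wbgapi not found; run `pip install wbgapi` in this environment")
```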
Now, let's import the libraries we will be using, namely pandas, Matplotlib and NumPy, along with wbgapi itself:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import wbgapi as wb
We are now ready to select a few macroeconomic variables to analyze. Let's have a look at Madagascar's GDP, GDP per Capita, Human Capital Index, Inflation Rate and Foreign Direct Investment net inflows.
To streamline our program and make it easier to navigate, let's assign the World Bank's indicator codes (mnemonics) to our own simplified variable names:
GDP = 'NY.GDP.MKTP.KD' # GDP in constant 2015 US$
HCI = 'HD.HCI.OVRL' # Human Capital Index
GDPPC = 'NY.GDP.PCAP.KD' # GDP per capita in constant 2015 US$
CPI = 'FP.CPI.TOTL.ZG' # Inflation, consumer prices (annual %)
FDI = 'BX.KLT.DINV.WD.GD.ZS' # Foreign Direct Investment, net inflows (% of GDP)
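One possible alternative is to keep the same codes in a single dictionary, so that the list passed to the download call and the mapping used later to rename columns are both derived from one place and cannot drift apart. This is a sketch, not part of wbgapi; the names `SERIES`, `CODES` and `RENAME_MAP` are illustrative.

```python
# Short names mapped to World Bank indicator codes (codes as listed above)
SERIES = {
    "GDP": "NY.GDP.MKTP.KD",          # GDP, constant 2015 US$
    "HCI": "HD.HCI.OVRL",             # Human Capital Index
    "GDPPC": "NY.GDP.PCAP.KD",        # GDP per capita, constant 2015 US$
    "CPI": "FP.CPI.TOTL.ZG",          # Inflation, consumer prices (annual %)
    "FDI": "BX.KLT.DINV.WD.GD.ZS",    # FDI, net inflows (% of GDP)
}

# List of codes, in insertion order, for the download call
CODES = list(SERIES.values())

# Inverted mapping (code -> short name), the shape DataFrame.rename(columns=...) expects
RENAME_MAP = {code: short for short, code in SERIES.items()}

print(RENAME_MAP["NY.GDP.MKTP.KD"])  # GDP
```

`CODES` could then be passed as the first argument of `wb.data.DataFrame`, and `RENAME_MAP` to `rename(columns=...)`.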
Let's build a table with historical data using the above series. To do this, we will import the data straight into a pandas DataFrame, which wb.data.DataFrame returns directly, so no separate conversion step is needed. Our table will be formatted so that each series is a column and the years are displayed as rows, covering 1960 to 2021. Because Python's range() excludes its end value, the range has to stop at 2022 for 2021 to be included:
Table_1 = wb.data.DataFrame([GDP, HCI, GDPPC, CPI, FDI], 'MDG', time=range(1960, 2022), numericTimeKeys=True, labels=True, columns='series')
Let's rename our column headers using the short names we defined previously. Because each of those variables holds the corresponding World Bank code, the variables themselves can serve as the dictionary keys:
Table_1.rename(columns = {GDP : 'GDP', HCI : 'HCI', GDPPC : 'GDPPC', CPI : 'CPI', FDI : 'FDI'}, inplace = True)
The data was downloaded with the latest years at the top, so it's now time to sort our time series and move the latest data to the bottom of our table:
Table_1 = Table_1.sort_values(by = ['Time'], ascending = True)
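Assuming the years also sit in the row index as integers (which is what numericTimeKeys=True is for), sort_index() is an alternative that does not depend on the 'Time' label column created by labels=True. A minimal sketch on a toy DataFrame with made-up numbers:

```python
import pandas as pd

# Toy data with made-up values, newest year first (mimicking the download order)
df = pd.DataFrame({"GDP": [3.0, 2.0, 1.0]}, index=[2002, 2001, 2000])

# Sort rows on the index itself, oldest year first
df = df.sort_index()
print(df.index.tolist())  # [2000, 2001, 2002]
```

Applied to our table, this would read Table_1 = Table_1.sort_index().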
Next, let's calculate growth rates for our GDP and GDP per Capita series. To do this, we will use a simple for loop that applies the pct_change method to each series, multiplies the result by 100 to express it as a percentage, and adds each new series to our table with a '_G' suffix:
for i in ['GDP', 'GDPPC']:
    Table_1[i+'_G'] = Table_1[i].pct_change().mul(100)
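Under the hood, the growth rate for year t is 100 × (x_t / x_(t−1) − 1). A plain-Python sketch of the same arithmetic, for a series without missing values (the function name `growth_rates` is illustrative, and the input numbers are made up):

```python
def growth_rates(values):
    """Return period-over-period % changes for a list of numbers.

    Mirrors Series.pct_change().mul(100) for data without gaps: the first
    entry has no predecessor, so it is None (pandas would show NaN).
    """
    rates = [None]
    for previous, current in zip(values, values[1:]):
        rates.append((current / previous - 1) * 100)
    return rates


# Made-up levels, for illustration only
print(growth_rates([100.0, 110.0, 99.0]))
```

The printed values are 10 and −10 up to floating-point rounding, which is also why the first year of each growth series in our table is empty.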
It's now time to make some charts. Let's start with our newly calculated growth rates. We will also label our lines, as well as our vertical and horizontal axes and give our chart a title:
Figure_1 = Table_1[['GDP_G', 'GDPPC_G']].plot()
plt.title('Annual % Change in GDP and GDP per Capita - Madagascar')
plt.xlabel('Time')
plt.ylabel('Growth Rate')
plt.legend(['GDP', 'GDP per Capita'])
Now, let's export, name and save our newly created chart to our local drive:
plt.savefig('MDG_GDP.png', dpi = 300)
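plt.savefig writes whatever Matplotlib considers the current figure, so it should be called before plt.show(); in a script, saving after the window has been closed can produce a blank image. A hedged sketch of saving through an explicit figure handle instead (for a chart made with pandas' .plot(), which returns an Axes, the figure is available as Figure_1.get_figure()). The data below are placeholders, and the chart is saved to an in-memory buffer only so the example runs without touching the disk:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Keep an explicit handle on the Figure instead of relying on the "current" figure
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2.0, -1.0, 0.5])  # placeholder data, not World Bank figures
ax.set_title("Example chart")

# Save into memory here; pass a filename such as 'MDG_GDP.png' to write to disk instead
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300, bbox_inches="tight")
plt.close(fig)  # free the figure once it is saved

png_bytes = buffer.getvalue()
print(png_bytes[:8] == b"\x89PNG\r\n\x1a\n")  # True: the buffer holds a PNG file
```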
Now, let's try a bar chart. Let's use Foreign Direct Investment net inflows, which we downloaded earlier and which are expressed as a percentage of GDP. Because this specific time series starts in 1970 (as opposed to 1960), we will first create a subset of our original data table to use for our chart. Since the years in our index are integers, the slice starts at 1970 rather than at the string '1970':
Table_2 = Table_1.loc[1970:]
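Rather than hard-coding the start year, pandas' first_valid_index() can locate the first non-missing observation of a series. A sketch on a toy DataFrame with made-up values:

```python
import pandas as pd

# Toy data with made-up values: FDI is missing (NaN) for the earliest years
df = pd.DataFrame(
    {"FDI": [float("nan"), float("nan"), 0.5, -0.2, 1.1]},
    index=[1968, 1969, 1970, 1971, 1972],
)

# First year with a non-missing FDI value, taken from the data instead of hard-coded
start = df["FDI"].first_valid_index()

# .loc slices by label and includes the start label itself
subset = df.loc[start:]
print(start, subset.index.tolist())  # 1970 [1970, 1971, 1972]
```

Note that this only trims leading gaps; any missing years later in the series remain in the subset.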
We can now proceed to charting our time series. Let's use a different method this time, Matplotlib's object-oriented interface via plt.subplots(), which should make things a tad more interesting. We will also add a horizontal line at 0 on the vertical axis, as one isn't drawn by default, and will save our newly created chart to our local drive:
Figure_2, ax = plt.subplots()
ax.bar(Table_2.index.values, Table_2['FDI'], color = 'green')
ax.set(xlabel = "Years", ylabel = "% of GDP", title = "Foreign Direct Investment, Net Inflows - Madagascar")
ax.axhline(y = 0, color = "black", linewidth = 0.75)
plt.savefig('MDG_FDI.png', dpi = 300)
For our final chart, let's try making a dual vertical axis graph. But first, let's spice things up a little and create an index for our constant $US GDP per Capita series. For this example, we will anchor it to a base year which we will set to 1965 (1965 = 100). Let's define it in our program:
Base_year = Table_1.loc[1965]
Now let's create and calculate our indexed GDP per Capita series:
Table_1['GDPPC_Idx'] = (Table_1['GDPPC'] *100 / Base_year['GDPPC'])
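The indexing step rescales every value by the base-year value: Index_t = 100 × GDPPC_t / GDPPC_1965, so the base year reads exactly 100 and every other year shows its level relative to it. A plain-Python sketch of the same arithmetic (the function name `rebase` is illustrative and the levels are made up):

```python
def rebase(series, base_year):
    """Rescale a {year: value} mapping so that series[base_year] becomes 100."""
    base = series[base_year]
    return {year: value * 100 / base for year, value in series.items()}


# Made-up GDP per capita levels, for illustration only
levels = {1964: 200.0, 1965: 250.0, 1966: 275.0}
print(rebase(levels, 1965))  # {1964: 80.0, 1965: 100.0, 1966: 110.0}
```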
We can now proceed to charting our new indexed series alongside CPI inflation, which we will plot on a secondary y-axis. To do this, we will use twinx() and, as usual, add a legend, chart and axis titles, and save the chart to our local drive:
Figure_3, ax = plt.subplots()
ax2 = ax.twinx()
ax.plot(Table_1.index.values, Table_1['GDPPC_Idx'], color = 'purple', label = "GDP per Capita")
ax.set(xlabel = "Years", ylabel = "GDP per Capita, Index (1965 = 100)", title = "GDP per Capita & Inflation - Madagascar")
ax2.plot(Table_1.index.values, Table_1['CPI'], color = 'black', linestyle = ":", label = "CPI")
ax2.set_ylabel('CPI, Annual % Change')
Figure_3.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax.transAxes)
plt.savefig('MDG_GDP_CPI.png', dpi = 300)
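Figure_3.legend() gathers labelled lines from every axes in the figure. An alternative, axes-level approach is to merge the handles from both axes explicitly and attach the legend to the twin axes, which is drawn on top. A sketch with placeholder data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y-axis sharing the same x-axis

# Placeholder data, not World Bank figures
ax.plot([1, 2, 3], [100, 104, 101], color="purple", label="GDP per Capita")
ax2.plot([1, 2, 3], [5.0, 12.0, 8.0], color="black", linestyle=":", label="CPI")

# Merge handles and labels from both axes into one legend.
# Attaching it to ax2, which is drawn above ax, keeps lines from either axis off it.
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
legend = ax2.legend(h1 + h2, l1 + l2, loc="upper right")

labels = [text.get_text() for text in legend.get_texts()]
print(labels)  # ['GDP per Capita', 'CPI']
plt.close(fig)
```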
To conclude this entry, let's export our data table to an Excel file:
Table_1.to_excel('WB_MDG_Data.xlsx')