Extended Case Code Sample
env: PROJ_DIR=c:\
In [102]: fig, ax = plt.subplots(1, 1)
world
26 4511488.0 Africa Central African Rep. CAF
27 33487208.0 North America Canada CAN
28 7604467.0 Europe Switzerland CHE
29 16601707.0 South America Chile CHL
.. ... ... ... ...
147 7379339.0 Europe Serbia SRB
148 481267.0 South America Suriname SUR
149 5463046.0 Europe Slovakia SVK
150 2005692.0 Europe Slovenia SVN
151 9059651.0 Europe Sweden SWE
152 1123913.0 Africa Swaziland SWZ
153 20178485.0 Asia Syria SYR
154 10329208.0 Africa Chad TCD
155 6019877.0 Africa Togo TGO
156 65905410.0 Asia Thailand THA
157 7349145.0 Asia Tajikistan TJK
158 4884887.0 Asia Turkmenistan TKM
159 1131612.0 Asia Timor-Leste TLS
160 1310000.0 North America Trinidad and Tobago TTO
161 10486339.0 Africa Tunisia TUN
162 76805524.0 Asia Turkey TUR
163 22974347.0 Asia Taiwan TWN
164 41048532.0 Africa Tanzania TZA
165 32369558.0 Africa Uganda UGA
166 45700395.0 Europe Ukraine UKR
167 3494382.0 South America Uruguay URY
168 313973000.0 North America United States USA
169 27606007.0 Asia Uzbekistan UZB
170 26814843.0 South America Venezuela VEN
171 86967524.0 Asia Vietnam VNM
172 218519.0 Oceania Vanuatu VUT
173 23822783.0 Asia Yemen YEM
174 49052489.0 Africa South Africa ZAF
175 11862740.0 Africa Zambia ZMB
176 12619600.0 Africa Zimbabwe ZWE
gdp_md_est \
0 22270.0
1 110300.0
2 21810.0
3 184300.0
4 573900.0
5 18770.0
6 760.4
7 16.0
8 800200.0
9 329500.0
10 77610.0
11 3102.0
12 389300.0
13 12830.0
14 17820.0
15 224000.0
16 93750.0
17 9093.0
18 29700.0
19 114100.0
20 2536.0
21 43270.0
22 1993000.0
23 20250.0
24 3524.0
25 27060.0
26 3198.0
27 1300000.0
28 316700.0
29 244500.0
.. ...
147 80340.0
148 4254.0
149 119500.0
150 59340.0
151 344300.0
152 5702.0
153 98830.0
154 15860.0
155 5118.0
156 547400.0
157 13160.0
158 29780.0
159 2520.0
160 29010.0
161 81710.0
162 902700.0
163 712000.0
164 54250.0
165 39380.0
166 339800.0
167 43160.0
168 15094000.0
169 71670.0
170 357400.0
171 241700.0
172 988.5
173 55280.0
174 491000.0
175 17500.0
176 9323.0
160
161 POLYGON ((9.482139926805274 30.30755605724619, 9.055602654668149 32.102691962201
162 (POLYGON ((36.91312706884216 41.33535838476431, 38.34766482926452 40.94858612727
163
164 POLYGON ((33.9037111971046 -0.9499999999999886, 34.07261999999997 -1.05981999999
165 POLYGON ((31.86617000000007 -1.027359999999931, 30.76986000000011 -1.01
166 POLYGON ((31.78599816257159 52.10167796488545, 32.15941206231267 52.061266994833
167
168 (POLYGON ((-155.54211 19.08348000000001, -155.68817 18.91619000000003, -155.9366
169 POLYGON ((66.51860680528867 37.36278432875879, 66.54615034370022 37.974684963526
170 POLYGON ((-71.3315836249503 11.77628408451581, -71.36000566271082 11.53999359786
171 POLYGON ((108.0501802917829 21.55237986906012, 106.7150679870901 20.696850694252
172
173 POLYGON ((53.10857262554751 16.65105113368895, 52.38520592632588 16.382411200419
174 POLYGON ((31.52100141777888 -29.25738697684626, 31.325561150851 -29.401977634398
175 POLYGON ((32.75937544122132 -9.23059905358906, 33.2313879737753 -9.6767216935648
176 POLYGON ((31.19140913262129 -22.2515096981724, 30.65986535006709 -22.15156747811
1.1 Introduction
Business Context. Air pollution is a serious problem facing the global population. The abundance
of air pollutants not only contributes to global warming but also causes health problems across
the population. Most nations have made numerous efforts to protect and improve air quality, yet
progress appears to be very slow. One of the main causes is that the majority of air pollutants
come from the burning of fossil fuels such as coal. Big industries, together with several other
economic and political factors, have slowed the transition to renewable energy by promoting
the use of fossil fuels. Nevertheless, if we educate the general population and create awareness of
this issue, we will be able to overcome this problem in the future.
For this case, you have been hired as a data science consultant for an important environmental
organization. In order to promote awareness of environmental and greenhouse gas issues, your
client is interested in a study of the plausible impacts of air contamination on the health of the
global population. They have gathered raw data provided by the World Health Organization, the
Institute for Health Metrics and Evaluation and the World Bank Group. Your task is to analyze
the data, look for potentially useful insights, and create visualizations that the client can use for
their campaigns and grant applications.
Analytical Context. You are given a folder named files containing raw data. The data contains
quite a large number of variables and is in a fairly disorganized state. In addition, one of the
datasets is very poorly documented and is split into several separate datasets. Your objective will
be to:
1. Extract and clean the relevant data. You will have to manipulate several datasets to obtain
useful information for the case.
2. Conduct Exploratory Data Analysis. You will have to create meaningful plots, formulate
meaningful hypotheses and study the relationship between various indicators related to air
pollution.
Additionally, the client has some broad questions they would like to answer:
1. Are we making any progress in reducing the amount of emitted pollutants across the globe?
2. Which are the critical regions where we should start environmental campaigns?
3. Are we making any progress in the prevention of deaths related to air pollution?
4. Which demographic characteristics seem to correlate with the number of health-related issues
derived from air pollution?
'2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017', '2018', '2019', 'Unnamed: 64'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 377256 entries, 0 to 377255
Data columns (total 65 columns):
Country Name 377256 non-null object
Country Code 377256 non-null object
Indicator Name 377256 non-null object
Indicator Code 377256 non-null object
1960 37395 non-null float64
1961 41211 non-null float64
1962 43413 non-null float64
1963 43324 non-null float64
1964 43861 non-null float64
1965 46306 non-null float64
1966 46087 non-null float64
1967 47840 non-null float64
1968 47422 non-null float64
1969 49112 non-null float64
1970 69736 non-null float64
1971 76073 non-null float64
1972 78854 non-null float64
1973 78402 non-null float64
1974 79804 non-null float64
1975 83728 non-null float64
1976 85833 non-null float64
1977 89303 non-null float64
1978 88911 non-null float64
1979 89707 non-null float64
1980 94479 non-null float64
1981 96363 non-null float64
1982 97575 non-null float64
1983 97385 non-null float64
1984 98228 non-null float64
1985 99450 non-null float64
1986 100294 non-null float64
1987 101654 non-null float64
1988 101307 non-null float64
1989 103060 non-null float64
1990 126117 non-null float64
1991 131212 non-null float64
1992 135229 non-null float64
1993 136645 non-null float64
1994 138646 non-null float64
1995 146560 non-null float64
1996 146450 non-null float64
1997 147530 non-null float64
1998 149527 non-null float64
1999 154659 non-null float64
2000 179600 non-null float64
2001 169874 non-null float64
2002 174693 non-null float64
2003 175686 non-null float64
2004 180936 non-null float64
2005 194452 non-null float64
2006 192699 non-null float64
2007 196798 non-null float64
2008 195843 non-null float64
2009 196888 non-null float64
2010 211863 non-null float64
2011 203080 non-null float64
2012 204810 non-null float64
2013 200522 non-null float64
2014 206201 non-null float64
2015 201043 non-null float64
2016 197174 non-null float64
2017 176112 non-null float64
2018 126115 non-null float64
2019 21481 non-null float64
Unnamed: 64 0 non-null float64
dtypes: float64(61), object(4)
memory usage: 187.1+ MB
None
19 Arab World ARB
20 Arab World ARB
21 Arab World ARB
22 Arab World ARB
23 Arab World ARB
24 Arab World ARB
25 Arab World ARB
26 Arab World ARB
27 Arab World ARB
28 Arab World ARB
29 Arab World ARB
... ... ...
377226 Zimbabwe ZWE
377227 Zimbabwe ZWE
377228 Zimbabwe ZWE
377229 Zimbabwe ZWE
377230 Zimbabwe ZWE
377231 Zimbabwe ZWE
377232 Zimbabwe ZWE
377233 Zimbabwe ZWE
377234 Zimbabwe ZWE
377235 Zimbabwe ZWE
377236 Zimbabwe ZWE
377237 Zimbabwe ZWE
377238 Zimbabwe ZWE
377239 Zimbabwe ZWE
377240 Zimbabwe ZWE
377241 Zimbabwe ZWE
377242 Zimbabwe ZWE
377243 Zimbabwe ZWE
377244 Zimbabwe ZWE
377245 Zimbabwe ZWE
377246 Zimbabwe ZWE
377247 Zimbabwe ZWE
377248 Zimbabwe ZWE
377249 Zimbabwe ZWE
377250 Zimbabwe ZWE
377251 Zimbabwe ZWE
377252 Zimbabwe ZWE
377253 Zimbabwe ZWE
377254 Zimbabwe ZWE
377255 Zimbabwe ZWE
Indicator Name \
0 2005 PPP conversion factor, GDP (LCU per inter...
1 2005 PPP conversion factor, private consumptio...
2 Access to clean fuels and technologies for coo...
3 Access to electricity (% of population)
4 Access to electricity, rural (% of rural popul...
5 Access to electricity, urban (% of urban popul...
6 Account ownership at a financial institution o...
7 Account ownership at a financial institution o...
8 Account ownership at a financial institution o...
9 Account ownership at a financial institution o...
10 Account ownership at a financial institution o...
11 Account ownership at a financial institution o...
12 Account ownership at a financial institution o...
13 Account ownership at a financial institution o...
14 Account ownership at a financial institution o...
15 Adequacy of social insurance programs (% of to...
16 Adequacy of social protection and labor progra...
17 Adequacy of social safety net programs (% of t...
18 Adequacy of unemployment benefits and ALMP (% ...
19 Adjusted net enrollment rate, primary (% of pr...
20 Adjusted net enrollment rate, primary, female ...
21 Adjusted net enrollment rate, primary, male (%...
22 Adjusted net national income (annual % growth)
23 Adjusted net national income (constant 2010 US$)
24 Adjusted net national income (current US$)
25 Adjusted net national income per capita (annua...
26 Adjusted net national income per capita (const...
27 Adjusted net national income per capita (curre...
28 Adjusted net savings, excluding particulate em...
29 Adjusted net savings, excluding particulate em...
... ...
377226 Urban population
377227 Urban population (% of total population)
377228 Urban population growth (annual %)
377229 Urban population living in areas where elevati...
377230 Urban poverty gap at national poverty lines (%)
377231 Urban poverty headcount ratio at national pove...
377232 Use of IMF credit (DOD, current US$)
377233 Use of insecticide-treated bed nets (% of unde...
377234 Value lost due to electrical outages (% of sal...
377235 Vitamin A supplementation coverage rate (% of ...
377236 Vulnerable employment, female (% of female emp...
377237 Vulnerable employment, male (% of male employm...
377238 Vulnerable employment, total (% of total emplo...
377239 Wage and salaried workers, female (% of female...
377240 Wage and salaried workers, male (% of male emp...
377241 Wage and salaried workers, total (% of total e...
377242 Wanted fertility rate (births per woman)
377243 Water productivity, total (constant 2010 US$ G...
377244 Wholesale price index (2010 = 100)
377245 Women making their own informed decisions rega...
377246 Women participating in the three decisions (ow...
377247 Women who believe a husband is justified in be...
377248 Women who believe a husband is justified in be...
377249 Women who believe a husband is justified in be...
377250 Women who believe a husband is justified in be...
377251 Women who believe a husband is justified in be...
377252 Women who believe a husband is justified in be...
377253 Women who were first married by age 15 (% of w...
377254 Women who were first married by age 18 (% of w...
377255 Women's share of population ages 15+ living wi...
377232 DT.DOD.DIMF.CD NaN NaN NaN
377233 SH.MLR.NETS.ZS NaN NaN NaN
377234 IC.FRM.OUTG.ZS NaN NaN NaN
377235 SN.ITK.VITA.ZS NaN NaN NaN
377236 SL.EMP.VULN.FE.ZS NaN NaN NaN
377237 SL.EMP.VULN.MA.ZS NaN NaN NaN
377238 SL.EMP.VULN.ZS NaN NaN NaN
377239 SL.EMP.WORK.FE.ZS NaN NaN NaN
377240 SL.EMP.WORK.MA.ZS NaN NaN NaN
377241 SL.EMP.WORK.ZS NaN NaN NaN
377242 SP.DYN.WFRT NaN NaN NaN
377243 ER.GDP.FWTL.M3.KD NaN NaN NaN
377244 FP.WPI.TOTL NaN NaN NaN
377245 SG.DMK.SRCR.FN.ZS NaN NaN NaN
377246 SG.DMK.ALLD.FN.ZS NaN NaN NaN
377247 SG.VAW.REAS.ZS NaN NaN NaN
377248 SG.VAW.ARGU.ZS NaN NaN NaN
377249 SG.VAW.BURN.ZS NaN NaN NaN
377250 SG.VAW.GOES.ZS NaN NaN NaN
377251 SG.VAW.NEGL.ZS NaN NaN NaN
377252 SG.VAW.REFU.ZS NaN NaN NaN
377253 SP.M15.2024.FE.ZS NaN NaN NaN
377254 SP.M18.2024.FE.ZS NaN NaN NaN
377255 SH.DYN.AIDS.FE.ZS NaN NaN NaN
22 NaN NaN NaN ... 1.193135e+01
23 NaN NaN NaN ... 1.903652e+12
24 NaN NaN NaN ... NaN
25 NaN NaN NaN ... 9.382820e+00
26 NaN NaN NaN ... 5.241930e+03
27 NaN NaN NaN ... 5.416624e+03
28 NaN NaN NaN ... NaN
29 NaN NaN NaN ... NaN
... ... ... ... ... ...
377226 567387.00000 609178.00000 653686.000000 ... 4.257058e+06
377227 13.57800 14.09200 14.620000 ... 3.301500e+01
377228 7.11729 7.10689 7.051661 ... 9.896455e-01
377229 NaN NaN NaN ... NaN
377230 NaN NaN NaN ... NaN
377231 NaN NaN NaN ... NaN
377232 NaN NaN NaN ... 5.270949e+08
377233 NaN NaN NaN ... 9.700000e+00
377234 NaN NaN NaN ... 8.800000e+00
377235 NaN NaN NaN ... 4.700000e+01
377236 NaN NaN NaN ... 7.552700e+01
377237 NaN NaN NaN ... 5.570100e+01
377238 NaN NaN NaN ... 6.536100e+01
377239 NaN NaN NaN ... 2.415800e+01
377240 NaN NaN NaN ... 4.363800e+01
377241 NaN NaN NaN ... 3.414700e+01
377242 NaN NaN NaN ... 3.500000e+00
377243 NaN NaN NaN ... NaN
377244 NaN NaN NaN ... NaN
377245 NaN NaN NaN ... 5.880000e+01
377246 NaN NaN NaN ... 7.450000e+01
377247 NaN NaN NaN ... 3.960000e+01
377248 NaN NaN NaN ... 1.560000e+01
377249 NaN NaN NaN ... 7.500000e+00
377250 NaN NaN NaN ... 2.230000e+01
377251 NaN NaN NaN ... 2.140000e+01
377252 NaN NaN NaN ... 1.690000e+01
377253 NaN NaN NaN ... 3.900000e+00
377254 NaN NaN NaN ... 3.050000e+01
377255 NaN NaN NaN ... 5.910000e+01
7 NaN NaN 2.207935e+01 NaN NaN
8 NaN NaN 3.779076e+01 NaN NaN
9 NaN NaN 3.421658e+01 NaN NaN
10 NaN NaN 2.277989e+01 NaN NaN
11 NaN NaN 2.127804e+01 NaN NaN
12 NaN NaN 3.523748e+01 NaN NaN
13 NaN NaN 3.890061e+01 NaN NaN
14 NaN NaN 2.125614e+01 NaN NaN
15 NaN NaN NaN NaN NaN
16 NaN NaN NaN NaN NaN
17 NaN NaN NaN NaN NaN
18 NaN NaN NaN NaN NaN
19 8.520714e+01 8.421832e+01 8.425430e+01 8.403523e+01 8.453258e+01
20 8.411878e+01 8.321839e+01 8.334494e+01 8.318996e+01 8.382028e+01
21 8.630059e+01 8.522583e+01 8.518359e+01 8.489517e+01 8.525464e+01
22 6.032670e+00 3.090463e+00 1.504003e+00 -5.557763e+00 1.480371e-01
23 2.018494e+12 2.080874e+12 2.112171e+12 1.994781e+12 1.997734e+12
24 NaN NaN NaN NaN NaN
25 3.667670e+00 8.472754e-01 -6.422262e-01 -7.494294e+00 -1.834019e+00
26 5.434187e+03 5.480229e+03 5.445034e+03 5.036967e+03 4.944588e+03
27 5.905730e+03 5.951002e+03 6.035730e+03 5.445677e+03 5.294685e+03
28 NaN NaN NaN NaN NaN
29 NaN NaN NaN NaN NaN
... ... ... ... ... ...
377226 4.306222e+06 4.359425e+06 4.416215e+06 4.473868e+06 4.531255e+06
377227 3.283400e+01 3.265400e+01 3.250400e+01 3.238500e+01 3.229600e+01
377228 1.148264e+00 1.227921e+00 1.294283e+00 1.297036e+00 1.274558e+00
377229 NaN NaN NaN NaN NaN
377230 NaN NaN NaN NaN NaN
377231 NaN NaN NaN NaN NaN
377232 5.201243e+08 5.193419e+08 4.867292e+08 4.637526e+08 4.551646e+08
377233 NaN NaN 2.680000e+01 9.000000e+00 NaN
377234 NaN NaN NaN NaN 6.100000e+00
377235 6.100000e+01 3.400000e+01 3.200000e+01 4.500000e+01 3.500000e+01
377236 7.524300e+01 7.521700e+01 7.530400e+01 7.540300e+01 7.547700e+01
377237 5.622400e+01 5.617200e+01 5.622500e+01 5.628400e+01 5.631400e+01
377238 6.550400e+01 6.546900e+01 6.554400e+01 6.562600e+01 6.567600e+01
377239 2.444600e+01 2.447500e+01 2.439000e+01 2.429100e+01 2.421800e+01
377240 4.311900e+01 4.318300e+01 4.313400e+01 4.307700e+01 4.304900e+01
377241 3.400800e+01 3.405000e+01 3.397900e+01 3.389900e+01 3.384900e+01
377242 NaN NaN NaN 3.600000e+00 NaN
377243 NaN NaN NaN NaN NaN
377244 NaN NaN NaN NaN NaN
377245 NaN NaN NaN 5.990000e+01 NaN
377246 NaN NaN NaN 7.210000e+01 NaN
377247 NaN NaN 3.740000e+01 3.870000e+01 NaN
377248 NaN NaN NaN 1.670000e+01 NaN
377249 NaN NaN NaN 8.100000e+00 NaN
377250 NaN NaN NaN 2.280000e+01 NaN
377251 NaN NaN NaN 2.140000e+01 NaN
377252 NaN NaN NaN 1.450000e+01 NaN
377253 NaN NaN NaN 3.700000e+00 NaN
377254 NaN NaN 3.350000e+01 3.240000e+01 NaN
377255 5.930000e+01 5.950000e+01 5.960000e+01 5.960000e+01 5.970000e+01
377235 4.300000e+01 NaN NaN NaN
377236 7.550500e+01 7.555400e+01 75.610003 NaN
377237 5.619900e+01 5.618400e+01 56.179000 NaN
377238 6.563600e+01 6.565200e+01 65.674000 NaN
377239 2.418900e+01 2.414100e+01 24.084000 NaN
377240 4.316200e+01 4.317700e+01 43.181999 NaN
377241 3.388800e+01 3.387200e+01 33.848999 NaN
377242 NaN NaN NaN NaN
377243 NaN NaN NaN NaN
377244 NaN NaN NaN NaN
377245 NaN NaN NaN NaN
377246 NaN NaN NaN NaN
377247 NaN NaN NaN NaN
377248 NaN NaN NaN NaN
377249 NaN NaN NaN NaN
377250 NaN NaN NaN NaN
377251 NaN NaN NaN NaN
377252 NaN NaN NaN NaN
377253 NaN NaN NaN NaN
377254 NaN NaN NaN NaN
377255 5.970000e+01 5.980000e+01 NaN NaN
The data seems to have a large number of indicators dating back to 1960. There are also columns
containing country names and codes. Notice that the first rows say Arab World, which
may indicate that the data contains broad regional data as well. We also notice that every year
column has more than 165,000 NaN entries out of 377,256 rows.
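The missing-value counts can be read off the info() listing above, but they can also be computed directly. The following is a minimal sketch on a tiny made-up stand-in for WDI_data (the rows and values are hypothetical, not taken from the real file):

```python
import pandas as pd

# Hypothetical miniature of WDI_data; the real table has 377,256 rows.
WDI_data = pd.DataFrame({
    "Country Name": ["Arab World", "Arab World", "Zimbabwe"],
    "Indicator Code": ["EG.ELC.ACCS.ZS", "NY.ADJ.NNTY.CD", "SP.URB.TOTL"],
    "1960": [None, None, 1.0],   # made-up values
    "2010": [2.0, None, 3.0],    # made-up values
})

# Treat every column whose name is all digits as a year column.
year_cols = [c for c in WDI_data.columns if c.isdigit()]

# isna().sum() counts the missing entries column by column.
missing_per_year = WDI_data[year_cols].isna().sum()
print(missing_per_year)
```

On the real table, the same call would give one count per year, which makes it easy to see which decades are sparsest.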
Since we are interested in environmental indicators, we must get rid of any rows not relevant
to our study. However, the number of indicators seems to be quite large and a manual inspection
seems impossible. Let’s load the file WDISeries.csv which seems to contain more information
about the indicators:
1 AG.CON.FERT.PT.ZS Environment: Agricultural production
2 AG.CON.FERT.ZS Environment: Agricultural production
3 AG.LND.AGRI.K2 Environment: Land use
4 AG.LND.AGRI.ZS Environment: Land use
Source \
0 Food and Agriculture Organization, electronic ...
1 Food and Agriculture Organization, electronic ...
2 Food and Agriculture Organization, electronic ...
3 Food and Agriculture Organization, electronic ...
4 Food and Agriculture Organization, electronic ...
0 Agricultural land covers more than one-third o... NaN
1 Factors such as the green revolution, has led ... NaN
2 Factors such as the green revolution, has led ... NaN
3 Agricultural land covers more than one-third o... NaN
4 Agricultural land covers more than one-third o... NaN
[5 rows x 21 columns]
Bingo! The WDI_ids DataFrame contains a column named Topic. Moreover, it seems that
Environment is listed as a key topic in the column.
1.2.1 Exercise 1:
Extract all the rows that have the topic key Environment in WDI_ids. Add to the resulting
DataFrame a new column named Subtopic which contains the corresponding subtopic of the
indicator. For example, the subtopic of Environment: Agricultural production is Agricultural
production. Which subtopics do you think are of interest to us?
Hint: Remember that you can apply string methods to a Series through the .str accessor of
pandas.
Answer.
In [7]: WDI_copy=WDI_ids.copy()
WDI_ids_sub = WDI_copy.loc[(WDI_copy['Topic'].str.contains('Environment'))]
WDI_ids_sub[['Main Topic','Subtopic']] = WDI_ids_sub.Topic.str.split(':',expand = True,
#WDI_ids_sub #Print content of dataframe.
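The printed cell above is cut off at the page margin, so the exact arguments passed to str.split are not recoverable. As a minimal sketch of one way to carry out this step, the example below uses a small made-up stand-in for WDI_ids. The Series Code column name, the sample rows, and the n=1 argument are assumptions rather than the original notebook's code:

```python
import pandas as pd

# Hypothetical stand-in for WDI_ids with only the columns needed here.
WDI_ids = pd.DataFrame({
    "Series Code": ["AG.CON.FERT.ZS", "EN.ATM.CO2E.PC", "NY.GDP.MKTP.CD"],
    "Topic": [
        "Environment: Agricultural production",
        "Environment: Emissions",
        "Economic Policy & Debt: National accounts",
    ],
})

# Keep only indicators whose topic mentions "Environment".
# .copy() avoids pandas' SettingWithCopyWarning when columns are added below.
env_mask = WDI_ids["Topic"].str.contains("Environment", na=False)
WDI_ids_sub = WDI_ids.loc[env_mask].copy()

# Split "Main topic: Subtopic" once, on the first colon (n=1 is an assumption).
parts = WDI_ids_sub["Topic"].str.split(":", n=1, expand=True)
WDI_ids_sub["Main Topic"] = parts[0].str.strip()
WDI_ids_sub["Subtopic"] = parts[1].str.strip()

print(WDI_ids_sub[["Series Code", "Main Topic", "Subtopic"]])
```

Stripping whitespace matters here because the text after the colon starts with a space, which would otherwise break exact comparisons such as Subtopic == 'Emissions'.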
1.2.2 Exercise 2:
Use the results of Exercise 1 to create a new DataFrame with the history of all emissions indicators
for countries and major regions. Call this new DataFrame Emissions_df. How many emissions
indicators are in the study?
Answer.
In [8]: WDI_data_copy = WDI_data.copy() #copying first dataframe to enable merge
WDI_data_copy.head() #printing the data frame
Emissions_df = pd.merge(WDI_data_copy,WDI_ids_sub,how='inner', left_on='Indicator Code'
#Emissions_df.head()
In [9]: Emissions_df = Emissions_df.loc[(Emissions_df['Subtopic'].str.contains('Emissions'))]
#Emissions_df
In [10]: len(Emissions_df['Indicator Code'].unique())
Out[10]: 42
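The merge call in In [8] is truncated in this printout, so its right-hand key is not visible. The sketch below shows the same merge-then-filter pattern on hypothetical miniature tables. The Series Code key and all values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical miniature versions of WDI_data and WDI_ids_sub (made-up values).
WDI_data = pd.DataFrame({
    "Country Name": ["Chile", "Chile", "World"],
    "Country Code": ["CHL", "CHL", "WLD"],
    "Indicator Code": ["EN.ATM.CO2E.PC", "AG.LND.AGRI.ZS", "EN.ATM.CO2E.PC"],
    "1960": [0.5, 18.0, 3.0],
    "1961": [0.6, 18.5, 3.1],
})
WDI_ids_sub = pd.DataFrame({
    "Series Code": ["EN.ATM.CO2E.PC", "AG.LND.AGRI.ZS"],
    "Subtopic": ["Emissions", "Land use"],
})

# An inner join keeps only rows whose indicator appears in the environment table.
# Using "Series Code" as the right-hand key is an assumption.
merged = pd.merge(WDI_data, WDI_ids_sub, how="inner",
                  left_on="Indicator Code", right_on="Series Code")

# Keep the emissions subtopic and count the distinct indicators.
Emissions_df = merged.loc[merged["Subtopic"].str.contains("Emissions", na=False)]
n_indicators = Emissions_df["Indicator Code"].nunique()
print(n_indicators)
```

nunique() is equivalent to len(...unique()) as used in In [10], except that it ignores NaN by default.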
1.2.3 Exercise 3:
The DataFrame Emissions_df has one column per year of observation. Data in this form is usually
referred to as data in wide format, as the number of columns is high. However, it would be easier
to query and filter the data if a single column held the year in which each indicator
was calculated, so that each observation is represented by a single row. Use the pandas
function melt() to reshape the Emissions_df data into long format. The resulting DataFrame should
contain a pair of new columns named Year and Indicator Value:
Answer.
In [11]: #arr=WDI_data.columns[4:65]
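The cell that performs the reshape is not shown in this printout. A minimal sketch of the wide-to-long step, assuming the four identifier columns listed in the info() output and using made-up values, could look like this (the name Emissions_melt is taken from the later cleaning cell):

```python
import pandas as pd

# Hypothetical wide-format slice of Emissions_df (values are made up).
Emissions_df = pd.DataFrame({
    "Country Name": ["Chile", "World"],
    "Country Code": ["CHL", "WLD"],
    "Indicator Name": ["CO2 emissions (metric tons per capita)"] * 2,
    "Indicator Code": ["EN.ATM.CO2E.PC"] * 2,
    "1960": [0.5, 3.0],
    "1961": [0.6, None],
    "Unnamed: 64": [None, None],
})

id_cols = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]

# Every column that is not an identifier becomes a value of "Year",
# and its cell contents move into "Indicator Value".
Emissions_melt = Emissions_df.melt(
    id_vars=id_cols,
    var_name="Year",
    value_name="Indicator Value",
)
print(Emissions_melt.shape)
```

Each input row produces one output row per non-identifier column, which is why the stray Unnamed: 64 column shows up as a value of Year after melting.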
1.2.4 Exercise 4:
The column Indicator Value of the new Emissions_df contains many NaN values. Additionally,
the Year column contains an Unnamed: 64 value. What procedure should we follow to
clean these missing values in our DataFrame? Proceed with your suggested cleaning process.
Answer. I’m glad you asked. For the Indicator Value, I would exclude NaN values. For Year, I
would interpolate because
In [13]: Emissions_melt.dropna(inplace=True)
#Emissions_melt
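The earlier info() output lists Unnamed: 64 with 0 non-null entries, so every long-format row carrying that Year has a NaN Indicator Value and is removed by the same dropna call. The sketch below illustrates this on made-up rows. The integer conversion of Year is an optional extra step that is not part of the original cell:

```python
import pandas as pd

# Hypothetical long-format rows (values are made up).
Emissions_melt = pd.DataFrame({
    "Country Code": ["CHL", "CHL", "CHL", "WLD"],
    "Year": ["1960", "1961", "Unnamed: 64", "Unnamed: 64"],
    "Indicator Value": [0.5, None, None, None],
})

# Drop rows with no measurement. This also removes every "Unnamed: 64" row,
# because that column was entirely empty in the wide table.
cleaned = Emissions_melt.dropna(subset=["Indicator Value"]).copy()

# Optional extra step (an assumption, not in the original): store years as integers.
cleaned["Year"] = cleaned["Year"].astype(int)
print(cleaned)
```

Restricting dropna to the Indicator Value column avoids discarding rows that merely have a gap in some unrelated metadata column.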
1.2.5 Exercise 5:
Split the Emissions_df into two DataFrames, one containing only countries and the other
containing only regions. Name these Emissions_C_df and Emissions_R_df respectively.
Hint: You may want to inspect the file WDICountry.csv for this task. Region country codes
may be found by looking at null values of the Region column in WDICountry.
Answer.
Index(['Country Code', 'Short Name', 'Table Name', 'Long Name', '2-alpha code',
'Currency Unit', 'Special Notes', 'Region', 'Income Group', 'WB-2 code',
'National accounts base year', 'National accounts reference year',
'SNA price valuation', 'Lending category', 'Other groups',
'System of National Accounts', 'Alternative conversion factor',
'PPP survey year', 'Balance of Payments Manual in use',
'External debt Reporting status', 'System of trade',
'Government Accounting concept', 'IMF data dissemination standard',
'Latest population census', 'Latest household survey',
'Source of most recent Income and expenditure data',
'Vital registration complete', 'Latest agricultural census',
'Latest industrial data', 'Latest trade data', 'Unnamed: 30'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 263 entries, 0 to 262
Data columns (total 31 columns):
Country Code 263 non-null object
Short Name 263 non-null object
Table Name 263 non-null object
Long Name 263 non-null object
2-alpha code 261 non-null object
Currency Unit 217 non-null object
Special Notes 93 non-null object
Region 217 non-null object
Income Group 217 non-null object
WB-2 code 262 non-null object
National accounts base year 211 non-null object
National accounts reference year 71 non-null object
SNA price valuation 209 non-null object
Lending category 143 non-null object
Other groups 59 non-null object
System of National Accounts 206 non-null object
Alternative conversion factor 47 non-null object
PPP survey year 191 non-null object
Balance of Payments Manual in use 195 non-null object
External debt Reporting status 121 non-null object
System of trade 203 non-null object
Government Accounting concept 158 non-null object
IMF data dissemination standard 186 non-null object
Latest population census 217 non-null object
Latest household survey 152 non-null object
Source of most recent Income and expenditure data 168 non-null object
Vital registration complete 118 non-null object
Latest agricultural census 128 non-null object
Latest industrial data 147 non-null float64
Latest trade data 246 non-null float64
Unnamed: 30 0 non-null float64
dtypes: float64(3), object(28)
memory usage: 63.8+ KB
None
In [15]: #Export region codes as an array. Export the Country Code for all rows that have a nul
Region_Array=[]
x=0
for z in WDI_Country["Region"]:
    # A missing Region is stored as NaN, which is a float
    if type(WDI_Country["Region"][x]) is float:
        Region_Array.append(WDI_Country['Country Code'][x])
    x+=1
#Region_Array
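The loop above collects the codes whose Region entry is missing. A vectorized sketch of the same idea, followed by the split into the two DataFrames requested in the exercise, is given below. The miniature tables and their values are hypothetical:

```python
import pandas as pd

# Hypothetical miniature of WDICountry: aggregates have no Region entry.
WDI_Country = pd.DataFrame({
    "Country Code": ["CHL", "ARB", "WLD", "ZWE"],
    "Region": ["Latin America & Caribbean", None, None, "Sub-Saharan Africa"],
})
# Hypothetical long-format emissions rows (values are made up).
Emissions_melt = pd.DataFrame({
    "Country Code": ["CHL", "ARB", "WLD", "ZWE"],
    "Year": [1960, 1960, 1960, 1960],
    "Indicator Value": [0.5, 0.6, 3.0, 0.9],
})

# Codes with a missing Region are treated as regions or other aggregates.
region_codes = WDI_Country.loc[WDI_Country["Region"].isna(), "Country Code"].tolist()

# Boolean mask: True where the row belongs to a region or aggregate.
is_region = Emissions_melt["Country Code"].isin(region_codes)
Emissions_R_df = Emissions_melt[is_region]
Emissions_C_df = Emissions_melt[~is_region]

print(Emissions_C_df["Country Code"].tolist())
print(Emissions_R_df["Country Code"].tolist())
```

Using isna() avoids relying on the fact that a missing value happens to be stored as a float, so it keeps working if the column is later read with a different dtype.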
641276 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641277 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641278 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641279 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641280 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641281 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641282 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641283 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641284 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641285 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641286 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641287 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641288 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641290 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641291 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641292 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641293 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641294 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641295 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641296 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641297 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641298 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641299 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641300 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641301 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
641302 EN.ATM.PM25.MC.ZS PM2.5 air pollution, population exposed to lev...
1090 OECD members OED 1960
1093 Post-demographic dividend PST 1960
1096 South Asia SAS 1960
1097 South Asia (IDA & IBRD) TSA 1960
1098 Sub-Saharan Africa SSF 1960
1099 Sub-Saharan Africa (excluding high income) SSA 1960
1100 Sub-Saharan Africa (IDA & IBRD countries) TSS 1960
1101 Upper middle income UMC 1960
1102 World WLD 1960
1848 Arab World ARB 1960
... ... ... ...
641272 IDA & IBRD total IBT 2017
641273 IDA blend IDB 2017
641274 IDA only IDX 2017
641275 IDA total IDA 2017
641276 Late-demographic dividend LTE 2017
641277 Latin America & Caribbean LCN 2017
641278 Latin America & Caribbean (excluding high income) LAC 2017
641279 Latin America & the Caribbean (IDA & IBRD coun... TLA 2017
641280 Least developed countries: UN classification LDC 2017
641281 Low & middle income LMY 2017
641282 Low income LIC 2017
641283 Lower middle income LMC 2017
641284 Middle East & North Africa MEA 2017
641285 Middle East & North Africa (excluding high inc... MNA 2017
641286 Middle East & North Africa (IDA & IBRD countries) TMN 2017
641287 Middle income MIC 2017
641288 North America NAC 2017
641290 OECD members OED 2017
641291 Other small states OSS 2017
641292 Pacific island small states PSS 2017
641293 Post-demographic dividend PST 2017
641294 Pre-demographic dividend PRE 2017
641295 Small states SST 2017
641296 South Asia SAS 2017
641297 South Asia (IDA & IBRD) TSA 2017
641298 Sub-Saharan Africa SSF 2017
641299 Sub-Saharan Africa (excluding high income) SSA 2017
641300 Sub-Saharan Africa (IDA & IBRD countries) TSS 2017
641301 Upper middle income UMC 2017
641302 World WLD 2017
Indicator Value
1059 0.521402
1060 0.906151
1061 3.299058
1062 3.256992
1067 0.574664
1069 0.193260
1070 0.633055
1071 1.217518
1072 1.124521
1073 0.442355
1074 0.240913
1075 0.345023
1076 1.905257
1077 0.359918
1078 0.357913
1079 0.350371
1081 1.066781
1083 0.658654
1087 1.088138
1088 0.889531
1090 0.634181
1093 0.662381
1096 0.733359
1097 0.733359
1098 0.497527
1099 0.497722
1100 0.497527
1101 1.223373
1102 0.827240
1848 59535.396567
... ...
641272 98.043929
641273 99.955056
641274 99.937059
641275 99.943178
641276 95.016903
641277 87.214912
641278 87.793750
641279 87.423722
641280 99.999947
641281 98.103124
641282 100.000000
641283 99.469737
641284 100.000000
641285 100.000000
641286 100.000000
641287 97.872134
641288 3.022545
641290 59.442776
641291 91.544650
641292 93.629896
641293 52.803165
641294 100.000000
641295 93.213857
641296 99.320900
641297 99.320900
641298 100.000000
641299 100.000000
641300 100.000000
641301 96.065069
641302 91.295708
642283 EN.ATM.PM25.MC.T3.ZS
642284 EN.ATM.PM25.MC.T3.ZS
642285 EN.ATM.PM25.MC.T3.ZS
642286 EN.ATM.PM25.MC.T3.ZS
642287 EN.ATM.PM25.MC.T3.ZS
642288 EN.ATM.PM25.MC.T3.ZS
642289 EN.ATM.PM25.MC.T3.ZS
642290 EN.ATM.PM25.MC.T3.ZS
642291 EN.ATM.PM25.MC.T3.ZS
642292 EN.ATM.PM25.MC.T3.ZS
642293 EN.ATM.PM25.MC.T3.ZS
642294 EN.ATM.PM25.MC.T3.ZS
642297 EN.ATM.PM25.MC.T3.ZS
642298 EN.ATM.PM25.MC.T3.ZS
642299 EN.ATM.PM25.MC.T3.ZS
642300 EN.ATM.PM25.MC.T3.ZS
642301 EN.ATM.PM25.MC.T3.ZS
642302 EN.ATM.PM25.MC.T3.ZS
642303 EN.ATM.PM25.MC.T3.ZS
642304 EN.ATM.PM25.MC.T3.ZS
642305 EN.ATM.PM25.MC.T3.ZS
642306 EN.ATM.PM25.MC.T3.ZS
642307 EN.ATM.PM25.MC.T3.ZS
642308 EN.ATM.PM25.MC.T3.ZS
642309 EN.ATM.PM25.MC.T3.ZS
642310 EN.ATM.PM25.MC.T3.ZS
642311 EN.ATM.PM25.MC.T3.ZS
645448 EN.ATM.CO2E.PC
Indicator Name_x \
1105 CO2 emissions (kg per 2010 US$ of GDP)
1110 CO2 emissions (kg per 2010 US$ of GDP)
1113 CO2 emissions (kg per 2010 US$ of GDP)
1114 CO2 emissions (kg per 2010 US$ of GDP)
1116 CO2 emissions (kg per 2010 US$ of GDP)
1121 CO2 emissions (kg per 2010 US$ of GDP)
1122 CO2 emissions (kg per 2010 US$ of GDP)
1123 CO2 emissions (kg per 2010 US$ of GDP)
1124 CO2 emissions (kg per 2010 US$ of GDP)
1126 CO2 emissions (kg per 2010 US$ of GDP)
1129 CO2 emissions (kg per 2010 US$ of GDP)
1133 CO2 emissions (kg per 2010 US$ of GDP)
1137 CO2 emissions (kg per 2010 US$ of GDP)
1138 CO2 emissions (kg per 2010 US$ of GDP)
1140 CO2 emissions (kg per 2010 US$ of GDP)
1141 CO2 emissions (kg per 2010 US$ of GDP)
1143 CO2 emissions (kg per 2010 US$ of GDP)
1144 CO2 emissions (kg per 2010 US$ of GDP)
1145 CO2 emissions (kg per 2010 US$ of GDP)
1147 CO2 emissions (kg per 2010 US$ of GDP)
1148 CO2 emissions (kg per 2010 US$ of GDP)
1149 CO2 emissions (kg per 2010 US$ of GDP)
1150 CO2 emissions (kg per 2010 US$ of GDP)
1156 CO2 emissions (kg per 2010 US$ of GDP)
1159 CO2 emissions (kg per 2010 US$ of GDP)
1160 CO2 emissions (kg per 2010 US$ of GDP)
1161 CO2 emissions (kg per 2010 US$ of GDP)
1169 CO2 emissions (kg per 2010 US$ of GDP)
1170 CO2 emissions (kg per 2010 US$ of GDP)
1171 CO2 emissions (kg per 2010 US$ of GDP)
... ...
642281 PM2.5 pollution, population exposed to levels ...
642282 PM2.5 pollution, population exposed to levels ...
642283 PM2.5 pollution, population exposed to levels ...
642284 PM2.5 pollution, population exposed to levels ...
642285 PM2.5 pollution, population exposed to levels ...
642286 PM2.5 pollution, population exposed to levels ...
642287 PM2.5 pollution, population exposed to levels ...
642288 PM2.5 pollution, population exposed to levels ...
642289 PM2.5 pollution, population exposed to levels ...
642290 PM2.5 pollution, population exposed to levels ...
642291 PM2.5 pollution, population exposed to levels ...
642292 PM2.5 pollution, population exposed to levels ...
642293 PM2.5 pollution, population exposed to levels ...
642294 PM2.5 pollution, population exposed to levels ...
642297 PM2.5 pollution, population exposed to levels ...
642298 PM2.5 pollution, population exposed to levels ...
642299 PM2.5 pollution, population exposed to levels ...
642300 PM2.5 pollution, population exposed to levels ...
642301 PM2.5 pollution, population exposed to levels ...
642302 PM2.5 pollution, population exposed to levels ...
642303 PM2.5 pollution, population exposed to levels ...
642304 PM2.5 pollution, population exposed to levels ...
642305 PM2.5 pollution, population exposed to levels ...
642306 PM2.5 pollution, population exposed to levels ...
642307 PM2.5 pollution, population exposed to levels ...
642308 PM2.5 pollution, population exposed to levels ...
642309 PM2.5 pollution, population exposed to levels ...
642310 PM2.5 pollution, population exposed to levels ...
642311 PM2.5 pollution, population exposed to levels ...
645448 CO2 emissions (metric tons per capita)
1114 Austria AUT 1960 0.335608
1116 Bahamas, The BHS 1960 0.211426
1121 Belgium BEL 1960 0.763531
1122 Belize BLZ 1960 0.452195
1123 Benin BEN 1960 0.127381
1124 Bermuda BMU 1960 0.127571
1126 Bolivia BOL 1960 0.273275
1129 Brazil BRA 1960 0.190172
1133 Burkina Faso BFA 1960 0.038148
1137 Cameroon CMR 1960 0.054810
1138 Canada CAN 1960 0.654772
1140 Central African Republic CAF 1960 0.097822
1141 Chad TCD 1960 0.026319
1143 Chile CHL 1960 0.459254
1144 China CHN 1960 6.086700
1145 Colombia COL 1960 0.437281
1147 Congo, Dem. Rep. COD 1960 0.146524
1148 Congo, Rep. COG 1960 0.150811
1149 Costa Rica CRI 1960 0.126841
1150 Cote d'Ivoire CIV 1960 0.107714
1156 Denmark DNK 1960 0.316624
1159 Dominican Republic DOM 1960 0.238808
1160 Ecuador ECU 1960 0.173455
1161 Egypt, Arab Rep. EGY 1960 1.042268
1169 Fiji FJI 1960 0.278461
1170 Finland FIN 1960 0.279969
1171 France FRA 1960 0.456487
... ... ... ... ...
642281 Suriname SUR 2017 99.665666
642282 Sweden SWE 2017 0.000000
642283 Switzerland CHE 2017 1.377046
642284 Syrian Arab Republic SYR 2017 100.000000
642285 Tajikistan TJK 2017 100.000000
642286 Tanzania TZA 2017 100.000000
642287 Thailand THA 2017 99.594649
642288 Timor-Leste TLS 2017 96.262700
642289 Togo TGO 2017 100.000000
642290 Tonga TON 2017 0.000000
642291 Trinidad and Tobago TTO 2017 100.000000
642292 Tunisia TUN 2017 100.000000
642293 Turkey TUR 2017 99.999170
642294 Turkmenistan TKM 2017 99.960505
642297 Uganda UGA 2017 100.000000
642298 Ukraine UKR 2017 97.898540
642299 United Arab Emirates ARE 2017 100.000000
642300 United Kingdom GBR 2017 1.214196
642301 United States USA 2017 0.277273
642302 Uruguay URY 2017 0.000000
642303 Uzbekistan UZB 2017 97.681436
642304 Vanuatu VUT 2017 0.000000
642305 Venezuela, RB VEN 2017 72.091015
642306 Vietnam VNM 2017 99.234229
642307 Virgin Islands (U.S.) VIR 2017 10.000000
642308 West Bank and Gaza PSE 2017 99.999998
642309 Yemen, Rep. YEM 2017 100.000000
642310 Zambia ZMB 2017 100.000000
642311 Zimbabwe ZWE 2017 100.000000
645448 Sudan SDN 2018 0.000000
• Total greenhouse gas emissions (kt of CO2 equivalent), EN.ATM.GHGT.KT.CE: The total
of greenhouse gas emissions includes CO2, methane, and nitrous oxide, among other pollutant
gases. Measured in kilotons.
• CO2 emissions (kt), EN.ATM.CO2E.KT: Carbon dioxide emissions are those stemming
from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide
produced during consumption of solid, liquid, and gas fuels and gas flaring.
• Other greenhouse gas emissions, HFC, PFC and SF6 (kt of CO2 equivalent),
EN.ATM.GHGO.KT.CE: Other pollutant gases.
• PM2.5 air pollution, mean annual exposure (micrograms per cubic meter),
EN.ATM.PM25.MC.M3: Population-weighted exposure to ambient PM2.5 pollution is
defined as the average level of exposure of a nation’s population to concentrations of
suspended particles measuring less than 2.5 microns in aerodynamic diameter, which are
capable of penetrating deep into the respiratory tract and causing severe health damage.
Exposure is calculated by weighting mean annual concentrations of PM2.5 by population in
both urban and rural areas.
• PM2.5 air pollution, population exposed to levels exceeding WHO guideline value (% of
total), EN.ATM.PM25.MC.ZS: Percent of population exposed to ambient concentrations of
PM2.5 that exceed the World Health Organization (WHO) guideline value.
1.3.1 Exercise 6:
For each of the emissions DataFrames, extract the rows corresponding to the above indicators of
interest. Replace the long names of the indicators by the short names Total, CO2, CH4, N2O, Other,
PM2.5, and PM2.5_WHO. (This will be helpful later when we need to label plots of our data.)
Answer.
In [18]: indicators_dict = {
"EN.ATM.GHGT.KT.CE": "Total",
"EN.ATM.CO2E.KT": "C02",
"EN.ATM.METH.KT.CE": "CH4",
"EN.ATM.NOXE.KT.CE": "N20",
"EN.ATM.GHGO.KT.CE": "Other",
"EN.ATM.PM25.MC.M3": "PM2.5",
"EN.ATM.PM25.MC.ZS": "PM2.5WHO"
}
l = [indicators_dict['EN.ATM.PM25.MC.M3']]
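The filtering and renaming step asked for in Exercise 6 can be sketched as follows. This is a minimal illustration on a made-up table, not the notebook's own cell: the names `short_names`, `toy`, and `selected` are hypothetical, the values are invented, and the short labels follow the spellings given in the exercise text (`CO2`, `PM2.5_WHO`) rather than the ones typed in the dictionary above.

```python
import pandas as pd

# Mapping from indicator codes to short labels (spellings as in the exercise text).
short_names = {
    "EN.ATM.GHGT.KT.CE": "Total",
    "EN.ATM.CO2E.KT": "CO2",
    "EN.ATM.PM25.MC.ZS": "PM2.5_WHO",
}

# Toy stand-in for one of the emissions DataFrames (all values are made up).
toy = pd.DataFrame({
    "Country Code": ["AAA", "AAA", "BBB", "BBB"],
    "Indicator Code": ["EN.ATM.CO2E.KT", "SP.POP.TOTL",
                       "EN.ATM.GHGT.KT.CE", "EN.ATM.PM25.MC.ZS"],
    "Year": ["2000", "2000", "2000", "2000"],
    "Indicator Value": [10.0, 5.0, 25.0, 80.0],
})

# Keep only rows whose code is a key of the mapping, then swap codes for labels.
selected = toy[toy["Indicator Code"].isin(list(short_names))].copy()
selected["Indicator Code"] = selected["Indicator Code"].map(short_names)
print(selected)
```

Filtering before mapping means no row ends up with a missing label, because every remaining code is guaranteed to be a key of the dictionary.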
1.4.1 Exercise 7:
Let’s first calculate some basic information about the main indicators across the globe.
7.1 Compute some basic statistics of the amount of kt of emissions for each of the four main
pollutants (CO2, CH4, N2O, Others) over the years. Use the Emissions_C_df data frame. What
trends do you see?
Answer.
2 CO2
In [22]: mainCC02 = emit_C_df[emit_C_df["Indicator Code"].isin(C02)].copy()
cc02=mainCC02["Indicator Value"].describe()
#print (np.std(mainC02["ktEmit"]))
#print (np.mean(mainC02["ktEmit"]))
#print (np.var(mainC02["ktEmit"]))
cc02
3 CH4
In [23]: mainCCH4 = emit_C_df[emit_C_df["Indicator Code"].isin(CH4)].copy()
cch4=mainCCH4["Indicator Value"].describe()
#print (np.std(mainCH4["ktEmit"]))
#print (np.mean(mainCH4["ktEmit"]))
#print (np.var(mainCH4["ktEmit"]))
cch4
25% 8.806213e+02
50% 5.457505e+03
75% 1.932534e+04
max 1.752290e+06
Name: Indicator Value, dtype: float64
4 N2O
In [24]: mainN20 = emit_C_df[emit_C_df["Indicator Code"].isin(N20)].copy()
n20=mainN20["Indicator Value"].describe()
#print (np.std(mainN20["ktEmit"]))
#print (np.mean(mainN20["ktEmit"]))
#print (np.var(mainN20["ktEmit"]))
n20
5 Other
In [25]: mainOther = emit_C_df[emit_C_df["Indicator Code"].isin(Other)].copy()
other=mainOther["Indicator Value"].describe()
#print (np.std(mainOthers["ktEmit"]))
#print (np.mean(mainOthers["ktEmit"]))
#print (np.var(mainOthers["ktEmit"]))
other
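The four per-pollutant cells above repeat the same filter-then-describe pattern. As a hedged alternative, the same summary can be produced in one call by grouping on the indicator label. The sketch below uses an invented table (`toy`) with the notebook's column names and its label spellings; it is not the notebook's data.

```python
import pandas as pd

# Toy long-format table; the labels mirror the notebook's spellings, values are made up.
toy = pd.DataFrame({
    "Indicator Code": ["C02", "C02", "C02", "CH4", "CH4", "CH4"],
    "Indicator Value": [1.0, 2.0, 3.0, 10.0, 20.0, 60.0],
})

# One row of summary statistics (count, mean, std, quartiles, max) per pollutant.
summary = toy.groupby("Indicator Code")["Indicator Value"].describe()
print(summary)
```

Having all pollutants in one table makes it easier to compare the mean with the median and the maximum, which is where heavy right tails show up.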
#pollutantsreg
#select the value column before averaging so that non-numeric columns are not aggregated
avpoll=pollutantsreg.groupby(['Year', 'Indicator Code'])['Indicator Value'].mean()
avpoll
2009 C02 148186.017030
CH4 38210.173914
N20 14944.935049
Other 32439.979269
2010 C02 155782.613970
CH4 37883.007264
N20 14832.660386
Other 38139.160139
2011 C02 161941.999808
CH4 38957.611430
N20 15155.798583
Other 44330.506322
2012 C02 162632.874078
CH4 39385.301627
N20 15300.954568
Other 46333.117526
2013 C02 163074.533966
2014 C02 165114.116327
Name: Indicator Value, Length: 184, dtype: float64
#avpoll.xticks(gr.get_xticks()[1::3])
av.set_yticklabels(av.get_yticklabels(), fontsize=12);
av.set_xticklabels(av.get_xticklabels(), fontsize=10, rotation=45);
av.set_xlabel(av.get_xlabel(), fontsize=20);
av.set_ylabel(av.get_ylabel(), fontsize=20);
plt.xticks(np.arange(5),['1960','1970','1980','1990','2000','2010'])
plt.xticks(np.arange(0,55,10))
plt.yticks(np.arange(4),['0','50000','100000','150000'])
plt.yticks(np.arange(0,170000,50000))
av
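For plotting yearly averages, it can help to reshape the grouped result so that each pollutant becomes its own column. The sketch below rebuilds a small piece of the averaged series from the 2009 and 2010 rows printed in the output above. Storing `Year` as a string and the names `records`, `long_df`, and `wide` are assumptions made for illustration.

```python
import pandas as pd

# Values copied from the 2009 and 2010 rows of the printed output above.
records = [
    ("2009", "C02", 148186.017030), ("2009", "CH4", 38210.173914),
    ("2009", "N20", 14944.935049), ("2009", "Other", 32439.979269),
    ("2010", "C02", 155782.613970), ("2010", "CH4", 37883.007264),
    ("2010", "N20", 14832.660386), ("2010", "Other", 38139.160139),
]
long_df = pd.DataFrame(records, columns=["Year", "Indicator Code", "Indicator Value"])

# Mean per (Year, pollutant); with one row per pair this reproduces the inputs.
avpoll = long_df.groupby(["Year", "Indicator Code"])["Indicator Value"].mean()

# Move the pollutant level into columns: one row per year, one column per pollutant.
wide = avpoll.unstack("Indicator Code")
print(wide)
```

A table in this shape can be passed to `DataFrame.plot()` to draw one line per pollutant with the years on the x-axis.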
7.2 What can you say about the distribution of emissions around the globe over the years? What
information can you extract from the tails of these distributions over the years?
Answer.
In [28]: #emit_C_df
In [29]: LCN=["LCN"]
SAS=["SAS"]
SSF=["SSF"]
ECS=["ECS"]
MEA=["MEA"]
EAS=["EAS"]
NAC=["NAC"]
glRegions = [LCN[0],SAS[0],SSF[0],ECS[0],MEA[0],EAS[0],NAC[0]]
6 CO2
In [30]: graphRC02i = emit_R_df[emit_R_df["Indicator Code"].isin(C02)].copy()
graphRC02 = graphRC02i[graphRC02i["Country Code"].isin(glRegions)].copy()
grC02=sns.lineplot(data=graphRC02, x='Year', y='Indicator Value',hue="Country Code")
grC02.set_xticks(grC02.get_xticks()[0::10])
grC02.set_xlabel(grC02.get_xlabel(), fontsize=24);
grC02.set_ylabel(grC02.get_ylabel(), fontsize=24);
plt.show()
7 CH4
In [31]: graphRCH4i = emit_R_df[emit_R_df["Indicator Code"].isin(CH4)].copy()
graphRCH4 = graphRCH4i[graphRCH4i["Country Code"].isin(glRegions)].copy()
grCH4=sns.lineplot(data=graphRCH4, x='Year', y='Indicator Value',hue="Country Code")
grCH4.set_xticks(grCH4.get_xticks()[0::10])
grCH4.set_xlabel(grCH4.get_xlabel(), fontsize=24);
grCH4.set_ylabel(grCH4.get_ylabel(), fontsize=24);
plt.show()
8 N2O
In [32]: graphRN20i = emit_R_df[emit_R_df["Indicator Code"].isin(N20)].copy()
graphRN20 = graphRN20i[graphRN20i["Country Code"].isin(glRegions)].copy()
grN20=sns.lineplot(data=graphRN20, x='Year', y='Indicator Value',hue="Country Code")
grN20.set_xticks(grN20.get_xticks()[0::10])
grN20.set_xlabel(grN20.get_xlabel(), fontsize=24);
grN20.set_ylabel(grN20.get_ylabel(), fontsize=24);
plt.show()
9 Other
In [33]: graphROtheri = emit_R_df[emit_R_df["Indicator Code"].isin(Other)].copy()
graphROther = graphROtheri[graphROtheri["Country Code"].isin(glRegions)].copy()
grOther=sns.lineplot(data=graphROther, x='Year', y='Indicator Value',hue="Country Code")
grOther.set_xticks(grOther.get_xticks()[0::10])
grOther.set_xlabel(grOther.get_xlabel(), fontsize=24);
grOther.set_ylabel(grOther.get_ylabel(), fontsize=24);
plt.show()
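The four regional cells above differ only in the pollutant label, so the filtering could be factored into a small helper. This is a sketch under assumptions: `regional_series`, `REGIONS`, `POLLUTANTS`, and the toy rows are hypothetical names and made-up values, and plotting is left out so the example stays self-contained.

```python
import pandas as pd

REGIONS = ["LCN", "SAS", "SSF", "ECS", "MEA", "EAS", "NAC"]
POLLUTANTS = ["C02", "CH4", "N20", "Other"]  # label spellings as used in the notebook

def regional_series(df, pollutant, regions=REGIONS):
    """Return the rows for one pollutant, restricted to the given region codes."""
    mask = (df["Indicator Code"] == pollutant) & df["Country Code"].isin(regions)
    return df.loc[mask].copy()

# Toy stand-in for the regional DataFrame; "WLD" is not one of the seven regions.
toy = pd.DataFrame({
    "Country Code": ["LCN", "NAC", "WLD", "SAS"],
    "Indicator Code": ["C02", "C02", "C02", "CH4"],
    "Year": ["2000"] * 4,
    "Indicator Value": [1.0, 2.0, 9.0, 3.0],
})

# One filtered frame per pollutant, ready to be passed to a plotting call.
by_pollutant = {p: regional_series(toy, p) for p in POLLUTANTS}
print({p: len(frame) for p, frame in by_pollutant.items()})
```

Each frame in `by_pollutant` could then be handed to the same `sns.lineplot` call used above, inside a loop over the pollutant labels.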
In [34]: #leaving this here because it's pretty and I may want to use it later.
fig = plt.figure()
ax = plt.axes(projection='3d')
7.3 Compute a plot showing the behavior of each of the four main air pollutants for each of the
main global regions in the Emissions_R_df data frame. The main regions are 'Latin America &
Caribbean', 'South Asia', 'Sub-Saharan Africa', 'Europe & Central Asia', 'Middle
East & North Africa', 'East Asia & Pacific' and 'North America'. What conclusions can
you make?
Answer.
In [35]: mainR=emit_R_df.melt(id_vars=['Indicator Code', 'Country Code'],
value_vars=WDI_data.columns[4:65],
var_name='Year',
value_name='ktEmit')
#mainR
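The `melt` call above turns one column per year into one row per year. A minimal sketch of that reshaping on an invented wide table follows. Selecting the year columns by name instead of by position, and dropping missing values afterwards, are choices made for this illustration, not steps taken from the notebook.

```python
import pandas as pd

# Toy wide table: one column per year (values are made up, one is missing).
wide = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "Indicator Code": ["C02", "C02"],
    "1960": [1.0, 2.0],
    "1961": [1.5, None],
})

# Pick the year columns by name so the code does not depend on column positions.
year_cols = [c for c in wide.columns if c.isdigit()]

# Wide to long: one row per (indicator, country, year); rows without a value are dropped.
long_df = wide.melt(
    id_vars=["Indicator Code", "Country Code"],
    value_vars=year_cols,
    var_name="Year",
    value_name="ktEmit",
).dropna(subset=["ktEmit"])
print(long_df)
```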
11 South Asia
In [37]: mainRSAS = emit_R_df[emit_R_df["Country Code"].isin(SAS)].copy()
mainRSASemit=mainRSAS[mainRSAS["Indicator Code"].isin(Pollutants)].copy()
mainRSASemit
figSAS = plt.subplots(figsize=(30, 15))
sas=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainRSASem
sas.set_xticklabels(sas.get_xticklabels(), rotation=45, fontsize=16);
sas.set_xlabel(sas.get_xlabel(), fontsize=24);
sas.set_ylabel(sas.get_ylabel(), fontsize=24);
sas.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
legsas = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
12 Sub-Saharan Africa
In [38]: mainRSSF = emit_R_df[emit_R_df["Country Code"].isin(SSF)].copy()
mainRSSFemit=mainRSSF[mainRSSF["Indicator Code"].isin(Pollutants)].copy()
mainRSSFemit
figSSF = plt.subplots(figsize=(30, 15))
ssf=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainRSSFem
ssf.set_xticklabels(ssf.get_xticklabels(), rotation=45, fontsize=16);
ssf.set_xlabel(ssf.get_xlabel(), fontsize=24);
ssf.set_ylabel(ssf.get_ylabel(), fontsize=24);
ssf.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
legssf = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
13 Europe & Central Asia
In [39]: mainRECS = emit_R_df[emit_R_df["Country Code"].isin(ECS)].copy()
mainRECSemit=mainRECS[mainRECS["Indicator Code"].isin(Pollutants)].copy()
mainRECSemit
figECS = plt.subplots(figsize=(30, 15))
ecs=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainRECSem
ecs.set_xticklabels(ecs.get_xticklabels(), rotation=45, fontsize=16);
ecs.set_xlabel(ecs.get_xlabel(), fontsize=24);
ecs.set_ylabel(ecs.get_ylabel(), fontsize=24);
ecs.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
legecs = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
14 Middle East & North Africa
In [40]: mainRMEA = emit_R_df[emit_R_df["Country Code"].isin(MEA)].copy()
mainRMEAemit=mainRMEA[mainRMEA["Indicator Code"].isin(Pollutants)].copy()
mainRMEAemit
figMEA = plt.subplots(figsize=(30, 15))
mea=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainRMEAem
mea.set_xticklabels(mea.get_xticklabels(), rotation=45, fontsize=16);
mea.set_xlabel(mea.get_xlabel(), fontsize=24);
mea.set_ylabel(mea.get_ylabel(), fontsize=24);
mea.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
legmea = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
15 East Asia & Pacific
In [41]: mainREAS = emit_R_df[emit_R_df["Country Code"].isin(EAS)].copy()
mainREASemit=mainREAS[mainREAS["Indicator Code"].isin(Pollutants)].copy()
mainREASemit
figEAS = plt.subplots(figsize=(30, 15))
eas=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainREASem
eas.set_xticklabels(eas.get_xticklabels(), rotation=45, fontsize=16);
eas.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
eas.set_xlabel(eas.get_xlabel(), fontsize=24);
eas.set_ylabel(eas.get_ylabel(), fontsize=24);
legeas = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
16 North America
In [42]: mainRNAC = emit_R_df[emit_R_df["Country Code"].isin(NAC)].copy()
mainRNACemit=mainRNAC[mainRNAC["Indicator Code"].isin(Pollutants)].copy()
mainRNACemit
figNAC = plt.subplots(figsize=(30, 15))
nac=sns.swarmplot(y="Indicator Value", x="Year", hue="Indicator Code", data=mainRNACem
nac.set_xticklabels(nac.get_xticklabels(), rotation=45, fontsize=16);
nac.set_xlabel(nac.get_xlabel(), fontsize=24);
nac.set_ylabel(nac.get_ylabel(), fontsize=24);
nac.set_ylim(0,15000000)
plt.gcf().axes[0].yaxis.get_major_formatter().set_scientific(False)
legnac = plt.legend(loc=9, ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
It seems that countries in East Asia and the Pacific are doing the worst at handling pollutant
emissions. We also see that Europe and Central Asia have been making some efforts to reduce their
emissions. Surprisingly, this is not the case for North America and Sub-Saharan Africa, whose
levels have been increasing over the years as well.
16.0.1 Exercise 8:
In Exercise 7 we discovered some interesting features of the distribution of the emissions over the
years. Let us explore these features in more detail.
8.1 Which are the top five countries that have been in the top 10 of CO2 emitters over the years?
Have any of these countries made efforts to reduce the amount of CO2 emissions over the last 10
years?
Answer.
In [43]: #mainCC02
In [46]: n = 5
CC02list=newCC02['Country Name'].value_counts()[:n].index.tolist()
CC02list
In [47]: n = 5
fiveCC02=newCC02[newCC02["Country Name"].isin(newCC02['Country Name'].value_counts()[:
#fiveCC02
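The cell that builds `newCC02` is not shown here, so the following is only a sketch of one way to obtain "the countries most often in the yearly top 10": rank emitters within each year, keep the top ranks, and count appearances. The function name `frequent_top_emitters`, its parameters, and the toy values are hypothetical.

```python
import pandas as pd

def frequent_top_emitters(df, top_k=10, n=5):
    """Count how often each country ranks in the yearly top_k; return the n most frequent."""
    # Rank within each year, largest value first; ties are broken by row order.
    ranks = df.groupby("Year")["Indicator Value"].rank(method="first", ascending=False)
    top = df[ranks <= top_k]
    return top["Country Name"].value_counts().head(n)

# Toy data: three countries over three years (values are made up).
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"] * 3,
    "Year": ["2000"] * 3 + ["2001"] * 3 + ["2002"] * 3,
    "Indicator Value": [5.0, 3.0, 1.0, 5.0, 1.0, 3.0, 4.0, 6.0, 1.0],
})

result = frequent_top_emitters(toy, top_k=2, n=2)
print(result)
```

With `top_k=10` and `n=5` the same function would correspond to the question asked in Exercise 8.1, assuming the input holds one CO2 row per country and year.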
17 Answer
The five countries that most frequently made the top 10 emitters list over the years were China, the
United States, Japan, India, and Canada. The United States decreased its CO2 emissions over
the 10-year window. Canada and Japan saw minimal change in their CO2 emissions during that
same window. India and China saw increases in their CO2 emissions during the window.
8.2 Are these five countries carrying the burden of most of the emissions emitted globally over the
years? Can we say that the rest of the world is making some effort to control its pollutant
gas emissions over the years?
Answer.
What percent of total emissions are these 5 countries responsible for? What is the change in
emissions over time for the other countries? We need to create a separate dataframe excluding
these 5 countries.
In [49]: #EN.ATM.GHGT.KT.CE
#emit_C_df
tot=['Total']
Sum of all emissions of type “Total” for all countries and years.
Out[50]: 1597525708.0684905
Sum of all emissions of type “Total” for the top 5 countries for all years.
Out[52]: 631799452.38203335
Percent of global emissions over the years for which the top 5 countries are responsible.
In [53]: burden=(fivesum/allsum)*100
print (str(burden)+"%")
39.5486250513%
Graph of total emissions over the years for all countries excluding the top 5.
Out[55]: <matplotlib.axes._subplots.AxesSubplot at 0x7f7b89db0f28>
199
2.512562814070352%
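The two percentages printed above can be reproduced directly from the totals shown in Out[50] and Out[52] and from the country count of 199. The variable names `allsum` and `fivesum` follow the notebook; `n_countries` and `country_share` are names introduced here for illustration.

```python
# Totals as printed in the notebook outputs above.
allsum = 1597525708.0684905    # Out[50]: all countries, all years
fivesum = 631799452.38203335   # Out[52]: top five countries, all years
n_countries = 199              # number of countries printed above

# Share of emissions attributable to the top five, and share of countries they represent.
burden = fivesum / allsum * 100
country_share = 5 / n_countries * 100

print(f"{burden:.2f}% of emissions come from {country_share:.2f}% of countries")
```

Putting the two shares side by side is what supports the answer that follows: a very small fraction of countries accounts for well over a third of the total.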
18 Answer
A. At first glance, I would have argued that these countries are not responsible for most of the
emissions because they account for only 39.5% of the total. However, considering
that over a third of total emissions come from less than 3% of all countries, I
would definitely say that these five countries shoulder the greatest burden of the emissions.
B. I cannot say that the rest of the world is doing much better at controlling emissions, because
there has been a consistent overall upward trend in total pollutant gases over the years.
population to concentrations of these small particles. The PM2.5_WHO measures the percentage of
the population who are exposed to ambient concentrations of these particles that exceed some
thresholds set by the World Health Organization (WHO). In particular, countries with a higher
PM2.5_WHO indicator are more likely to suffer from bad health conditions.
9.1 The client would like to know if there is any relationship between the PM2.5_WHO indicator
and the level of income of the general population, as well as how this changes over time. What
plot(s) might be helpful to solve the client’s question? What conclusion can you draw from your
plot(s) to answer their question?
Hint: The DataFrame WDI_countries contains a column named Income Group.
Answer.
In [57]: pm2=['PM2.5WHO']
pm25_C_df = emit_C_df[emit_C_df['Indicator Code'].isin(pm2)].copy()
#pm25_C_df
In [58]: WDI_PM2 = WDI_Country.copy() #copying first dataframe to enable merge
#WDI_data_copy.head() #printing the data frame
#WDI_Country.set_index(['Country Code','Country Name'])['Year'].unstack()
#WDI_Country
PM25WHO = pd.merge(WDI_PM2, pm25_C_df, how='inner', on='Country Code')
#PM25WHO
In [59]: figPM25 = plt.subplots(figsize=(30, 15))
graphpm=sns.barplot(y="Indicator Value", x="Year", hue="Income Group", data=PM25WHO)
graphpm.set_xlabel(graphpm.get_xlabel(), fontsize=24);
graphpm.set_ylabel(graphpm.get_ylabel(), fontsize=24);
legpm = plt.legend(ncol=2, shadow=True, fancybox=True, fontsize='xx-large')
graphpm
Out[59]: <matplotlib.axes._subplots.AxesSubplot at 0x7f7b89c6d710>
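A numeric companion to the bar plot can make the income comparison easier to read. The sketch below merges a country table with PM2.5_WHO values and averages by year and income group. The tables `countries` and `pm25` are invented stand-ins with made-up values; only the column names are taken from the notebook.

```python
import pandas as pd

# Toy country metadata with an income classification (made up).
countries = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC"],
    "Income Group": ["High income", "Low income", "Low income"],
})

# Toy PM2.5_WHO values: percent of population exposed above the guideline (made up).
pm25 = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "Year": ["2010", "2010", "2010", "2017", "2017", "2017"],
    "Indicator Value": [20.0, 100.0, 90.0, 5.0, 100.0, 100.0],
})

# Attach the income group to every observation, then average per year and group.
merged = pd.merge(countries, pm25, how="inner", on="Country Code")
by_income = (
    merged.groupby(["Year", "Income Group"])["Indicator Value"]
    .mean()
    .unstack("Income Group")
)
print(by_income)
```

Reading across a row compares income groups within a year, and reading down a column shows how one group changes over time.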
9.2 What do you think are the causes behind the results in Exercise 9.1?
Answer.
Air quality in lower-income communities is likely to be worse because there is less greenery to
provide fresh air and oxygen. Also, public transportation is likely to be used more heavily in
low-income areas. Lastly, high-income citizens tend to live in more spacious communities, which
decreases the density of pollutants relative to area.
10.1 Which indicators present in the file WDISeries.csv might be useful to solve the client’s
question? Explain.
Note: Naming one or two indicators is more than enough for this question.
Answer.
In [61]: WID=WDI_ids.copy()
WDI_ids_health1 = WID[(WID['Long definition'].str.contains('pollution'))]
WDI_ids_health2 = WDI_ids_health1[(WDI_ids_health1['Topic'].str.contains('Health'))]
WDI_ids_health = WDI_ids_health2[WDI_ids_health2['Series Code'].isin(apm)].copy()
WDI_ids_health
Source \
986 World Health Organization, Global Health Obser...
Statistical concept and methodology \
986 NaN
[1 rows x 21 columns]
In [62]: pd.options.display.max_colwidth=1000
print (str(WDI_ids_health['Long definition']))
#indname=[WDI_ids_health['Indicator Name']]
#[print(x) for x in indname]
#WDI_ids_sub[['Main Topic','Subtopic']] = WDI_ids_sub.Topic.str.split(':',expand = Tru
#WDI_ids_sub #Print content of dataframe.
986 Mortality rate attributed to household and ambient air pollution is the number of deaths
Name: Long definition, dtype: object
Answer.
1. Mortality rate attributed to household and ambient air pollution, age-standardized, female
(per 100,000 female population)
2. Mortality rate attributed to household and ambient air pollution, age-standardized, male
(per 100,000 male population)
3. Mortality rate attributed to household and ambient air pollution, age-standardized (per
100,000 population)
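The keyword search used to find these indicators can be sketched on a small, invented metadata table. The column names follow the notebook, but the rows and the name `series_meta` are hypothetical. Passing `na=False` is an addition made here so that rows with missing text count as non-matches instead of producing missing values in the mask.

```python
import pandas as pd

# Toy series metadata (invented rows); one Topic is missing on purpose.
series_meta = pd.DataFrame({
    "Series Code": ["X.1", "X.2", "X.3"],
    "Topic": ["Health: Mortality", "Environment: Emissions", None],
    "Long definition": [
        "Deaths attributed to air pollution ...",
        "Emissions from pollution sources ...",
        "Unrelated text",
    ],
})

# Combine both keyword conditions in a single boolean mask.
mask = (
    series_meta["Long definition"].str.contains("pollution", case=False, na=False)
    & series_meta["Topic"].str.contains("Health", na=False)
)
hits = series_meta.loc[mask, ["Series Code", "Topic"]]
print(hits)
```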
10.2 Use the indicators provided in Exercise 10.1 to give valuable information to the client.
Answer.
In [66]: mergedapm = world.set_index('iso_a3').join(mort_countries.set_index('Country Code'))
mergedapm = mergedapm.reset_index()
mergedapm = mergedapm.fillna(0)
#mergedapm
10.3 Extend the analysis above to find some countries of interest. These are defined as
• The countries that have a high mortality rate due to household and ambient air pollution,
but with low PM2.5 exposure
• The countries that have a low mortality rate due to household and ambient air pollution, but
with high PM2.5 exposure
In [68]: ePMc.rename(columns={"Indicator Value": "Indicator Value-PM2.5"})
641238 PM2.5
641241 PM2.5
641242 PM2.5
641243 PM2.5
641244 PM2.5
641245 PM2.5
641246 PM2.5
641247 PM2.5
641248 PM2.5
641249 PM2.5
641250 PM2.5
641251 PM2.5
641252 PM2.5
641253 PM2.5
641254 PM2.5
641255 PM2.5
Indicator Name_x \
341663 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341664 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341665 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341666 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341667 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341668 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341669 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341670 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341671 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341673 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341674 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341675 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341676 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341677 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341678 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341679 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341680 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341681 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341682 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341683 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341684 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341685 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341686 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341687 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341688 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341689 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341691 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341692 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341693 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341694 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
... ...
641224 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641225 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641226 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641227 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641228 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641229 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641230 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641231 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641232 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641233 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641234 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641235 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641236 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641237 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641238 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641241 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641242 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641243 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641244 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641245 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641246 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641247 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641248 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641249 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641250 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641251 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641252 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641253 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641254 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
641255 PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
341679 Barbados BRB 1990 28.212522
341680 Belarus BLR 1990 24.585848
341681 Belgium BEL 1990 17.587496
341682 Belize BLZ 1990 27.429802
341683 Benin BEN 1990 40.220136
341684 Bermuda BMU 1990 11.749759
341685 Bhutan BTN 1990 40.205969
341686 Bolivia BOL 1990 23.952679
341687 Bosnia and Herzegovina BIH 1990 30.352558
341688 Botswana BWA 1990 25.435431
341689 Brazil BRA 1990 15.143861
341691 Brunei Darussalam BRN 1990 6.732298
341692 Bulgaria BGR 1990 27.559881
341693 Burkina Faso BFA 1990 46.014661
341694 Burundi BDI 1990 41.248888
... ... ... ... ...
641224 Sudan SDN 2017 55.370834
641225 Suriname SUR 2017 24.780011
641226 Sweden SWE 2017 6.184665
641227 Switzerland CHE 2017 10.303100
641228 Syrian Arab Republic SYR 2017 43.757259
641229 Tajikistan TJK 2017 46.152185
641230 Tanzania TZA 2017 29.076641
641231 Thailand THA 2017 26.256727
641232 Timor-Leste TLS 2017 19.257209
641233 Togo TGO 2017 35.731336
641234 Tonga TON 2017 10.785479
641235 Trinidad and Tobago TTO 2017 24.108568
641236 Tunisia TUN 2017 37.655994
641237 Turkey TUR 2017 44.311526
641238 Turkmenistan TKM 2017 21.767721
641241 Uganda UGA 2017 50.494321
641242 Ukraine UKR 2017 20.309776
641243 United Arab Emirates ARE 2017 40.917510
641244 United Kingdom GBR 2017 10.472690
641245 United States USA 2017 7.409442
641246 Uruguay URY 2017 9.274883
641247 Uzbekistan UZB 2017 28.455901
641248 Vanuatu VUT 2017 11.652777
641249 Venezuela, RB VEN 2017 17.008554
641250 Vietnam VNM 2017 29.626728
641251 Virgin Islands (U.S.) VIR 2017 10.265312
641252 West Bank and Gaza PSE 2017 33.225630
641253 Yemen, Rep. YEM 2017 50.456007
641254 Zambia ZMB 2017 27.438035
641255 Zimbabwe ZWE 2017 22.251671
In [69]: mergedpm2 = pd.merge(mergedapm, ePMc, left_on='index', right_on='Country Code', how='o
mergedpm2 = mergedpm2.reset_index()
mergedpm2 = mergedpm2.fillna(0)
mergedpm2 = mergedpm2.drop(columns=['Country Name_x','Country Name_y','level_0','Count
mergedpm2 = mergedpm2.rename(columns={'index':'Country Code', 'Year_x': 'APM Year', 'Y
#mergedpm2.sample(2)
sm._A = []
cbar = fig.colorbar(sm)
cbar.ax.tick_params(labelsize=20)
mergedpm2.plot('PM2.5 Ind. Value', cmap='Blues', edgecolor='k', linewidth = 1, ax=ax)
plt.show()
21 Approach #3 - Establish medians and generate lists based on these
thresholds.
In [74]: PM2med=mergedpm2['PM2.5 Ind. Value'].median()
APMmed=mergedapm['Indicator Value'].median()
PM2med, APMmed
Out[74]: (23.756718672876701, 63.899999999999999)
In [75]: pmlowapmhigh=mergedpm2[mergedpm2['PM2.5 Ind. Value'] <= PM2med][mergedpm2['APM Ind. Va
pmhighapmlow=mergedpm2[mergedpm2['PM2.5 Ind. Value'] > PM2med][mergedpm2['APM Ind. Val
#pmlowapmhigh.head()
pmlowapmhighlist=[pmlowapmhigh['Country Code'].unique()]
pmlowapmhighlist
Out[75]: [array(['ALB', 'BLZ', 'BWA', 'CIV', 'FJI', 'GEO', 'GIN', 'GUY', 'HTI',
'IDN', 'KGZ', 'LBR', 'LKA', 'MDA', 'MDG', 'MNE', 'MOZ', 'MWI',
'PHL', 'PNG', 'SLB', 'SLE', 'SWZ', 'TKM', 'TLS', 'UKR', 'VUT', 'ZWE'], dtype=o
In [76]: pmhighapmlowlist=[pmhighapmlow['Country Code'].unique()]
pmhighapmlowlist
Out[76]: [array(['ARE', 'ARM', 'AZE', 'BGR', 'BLR', 'BOL', 'CHL', 'CUB', 'CZE',
'DZA', 'HND', 'IRN', 'ISR', 'JOR', 'KOR', 'LBN', 'MAR', 'MEX',
'OMN', 'PER', 'POL', 'PSE', 'QAT', 'SLV', 'SRB', 'SUR', 'SVK',
'THA', 'TTO', 'TUN', 'TUR', 0], dtype=object)]
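The median split above can also be written with a single combined mask per group, which avoids indexing a filtered frame with a mask built from the unfiltered one. The sketch below uses invented values. Because the comparison against the mortality median is cut off in the cells above, its direction here (high means above the median) is an assumption. The stray `0` in Out[76] is probably a side effect of the earlier `fillna(0)` on unmatched rows, which this toy table does not reproduce.

```python
import pandas as pd

# Toy merged table with the notebook's column names (values are made up).
toy = pd.DataFrame({
    "Country Code": ["AAA", "BBB", "CCC", "DDD"],
    "PM2.5 Ind. Value": [10.0, 40.0, 12.0, 45.0],
    "APM Ind. Value": [120.0, 20.0, 15.0, 150.0],
})

pm_med = toy["PM2.5 Ind. Value"].median()   # exposure threshold
apm_med = toy["APM Ind. Value"].median()    # mortality threshold

# Low exposure but high mortality.
low_pm_high_apm = toy[(toy["PM2.5 Ind. Value"] <= pm_med) & (toy["APM Ind. Value"] > apm_med)]
# High exposure but low mortality.
high_pm_low_apm = toy[(toy["PM2.5 Ind. Value"] > pm_med) & (toy["APM Ind. Value"] <= apm_med)]

print(low_pm_high_apm["Country Code"].tolist(), high_pm_low_apm["Country Code"].tolist())
```

Computing both medians on the same table also keeps the two thresholds comparable, since they are then based on the same set of countries.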
10.4 Finally, we want to look at the mortality data by income. We expect higher income countries
to have lower pollution-related mortality. Find out if this assumption holds. Calculate summary
statistics and histograms for each income category and note any trends.
Answer.
23 Low Income
In [82]: inmortplahLI.describe()
50% 170.200000 21.196673
75% 243.300000 22.625162
max 324.100000 23.565921
In [85]: inmortphalLI.describe()
Out[85]: APM Ind. Value PM2.5 Ind. Value
count 0.0 0.0
mean NaN NaN
std NaN NaN
min NaN NaN
25% NaN NaN
50% NaN NaN
75% NaN NaN
max NaN NaN
In [88]: LMI1graphPerspective=inmortplahLMI['APM Ind. Value'].hist(color='Teal').axis([0,300,0,
In [89]: inmortphalLMI.describe()
In [91]: LMI2graphPerspective=inmortphalLMI['APM Ind. Value'].hist(color='Grey').axis([0,300,0,
25 Upper Middle Income
In [92]: inmortplahUMI.describe()
In [95]: inmortphalUMI.describe()
In [97]: UMI2graphPerspective=inmortphalUMI['APM Ind. Value'].hist(color='Grey').axis([0,300,0,
26 High Income
In [98]: inmortplahHI.describe()
In [99]: inmortphalHI.describe()
In [101]: HIgraphPerspective=inmortphalHI['APM Ind. Value'].hist(color='Grey').axis([0,300,0,10
Answer
In general, the assumption is correct that higher income countries have lower mortality rates
due to ambient air pollution. However, I will note that there was an interesting and significant
spike in mortality rates from low income to lower middle income.
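The per-category histograms above can be complemented by a single table of binned counts, which puts all income groups side by side. This is a sketch on invented values: the bin edges, the labels, and the names `toy` and `table` are assumptions chosen to match the 0 to 300 axis range used in the plots.

```python
import pandas as pd

# Toy mortality values per income group (made up).
toy = pd.DataFrame({
    "Income Group": ["Low income", "Low income", "High income", "High income"],
    "APM Ind. Value": [200.0, 300.0, 10.0, 30.0],
})

# Bin the mortality rate into three ranges; each bin includes its right edge.
bins = [0, 100, 200, 300]
labels = ["0-100", "100-200", "200-300"]
toy["APM bin"] = pd.cut(toy["APM Ind. Value"], bins=bins, labels=labels).astype(str)

# Rows are income groups, columns are mortality ranges, cells are country counts.
table = pd.crosstab(toy["Income Group"], toy["APM bin"])
print(table)
```

A table like this shows at a glance whether higher income groups concentrate in the lower mortality bins, which is the assumption being tested in Exercise 10.4.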
10.5 At the start, we asked some questions. Based on your analysis, provide a short answer to
each of these:
1. Are we making any progress in reducing the amount of emitted pollutants across the globe?
2. Which are the critical regions where we should start environmental campaigns?
3. Are we making any progress in the prevention of deaths related to air pollution?
4. Which demographic characteristics seem to correlate with the number of health-related issues
derived from air pollution?
Answer.