Python - Programming Basics 139

>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>>

For climate data analysis it is important to be able to extract individual array values.
This is done using indices. See Section 4.3 for more details on indexing gridded
datasets. Selecting a larger subsection of a NumPy array is known as slicing. Both
NumPy array indexing and slicing are discussed in the remainder of this section
using examples.
The following example commands executed on the Python command prompt show
how to index one-dimensional NumPy arrays.

>>> x = np.arange(10) # create array x with 10 elements
>>> x # display variable x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> x[2] # print 3rd element of x
2
>>>
>>> x[-1] # print last element of x
9
>>>

The index -1 selects the last element of a NumPy array. This is a useful
shortcut when the length of an array is unknown.

Indices can also be used with two-dimensional arrays as shown in the following
examples.

>>> x = np.arange(10) # create array x with 10 elements
>>> x.shape = (2, 5) # convert (reshape) x from 1D to 2D array
>>> x
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>>
>>> x[0] # select 1st row of 2D array
array([0, 1, 2, 3, 4])
>>>
>>> x[0][2] # select 3rd element of 1st row
2
>>>
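
The chained form x[0][2] first extracts a row and then indexes into that row. NumPy also accepts both indices inside one pair of brackets, separated by a comma, which selects the same element in a single step. The following is a minimal sketch; the variable names a, b and c are chosen for illustration only.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)   # same 2 x 5 array as in the example above

a = x[0][2]    # chained indexing: row 0 first, then element 2 of that row
b = x[0, 2]    # one indexing operation with a (row, column) pair
c = x[-1, -1]  # negative indices work per dimension: last row, last column

print(a, b, c)
```

The comma form is the one used in the two-dimensional slicing examples later in this section.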

The following examples show how to slice one-dimensional arrays.

>>> x = np.arange(10) # create numpy array with 10 elements
>>> x[:] # select all elements of x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> x[:7] # select 1st to 7th elements of x
array([0, 1, 2, 3, 4, 5, 6])
>>>
>>> x[5:] # select 6th to 10th element of x
array([5, 6, 7, 8, 9])
>>>
>>> x[2:6] # select 3rd to 6th element of x
array([2, 3, 4, 5])
>>>

The following examples show how to slice two-dimensional arrays.


>>> x = np.arange(10) # create numpy array with 10 elements
>>> x.shape = (2, 5) # convert (reshape) x from 1D to 2D array
>>> x[:] # select all elements
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>>
>>> x[0,:] # select 1st row
array([0, 1, 2, 3, 4])
>>>
>>> x[0,2:] # select 3rd to last element of 1st row
array([2, 3, 4])
>>>

A step can be given after a second colon (start:stop:step) to select only every n-th
value of an array within one dimension. For example, x[::2] selects every other value.
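
The step syntax can be sketched as follows. The array values are the same as in the examples above and the variable names are illustrative only.

```python
import numpy as np

x = np.arange(10)

every_other = x[::2]      # start at index 0, take every 2nd element
odd_positions = x[1::2]   # start at index 1, take every 2nd element
reversed_x = x[::-1]      # a negative step walks through the array backwards

grid = np.arange(10).reshape(2, 5)
every_other_col = grid[:, ::2]   # all rows, every 2nd column

print(every_other)
print(odd_positions)
print(reversed_x)
print(every_other_col)
```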

7.3.3 Saving and Loading NumPy Variables


In some situations it is useful to save NumPy arrays for later use. For instance, if
a calculation takes a long time then it may be desirable to save the output NumPy
array variables. The variables can then be restored in a different script, for instance.
NumPy arrays can be saved and read back in using the functions np.savez() and
np.load(), respectively.

The NumPy function np.savez() creates an uncompressed NumPy-specific file with the
file extension .npz.
The following code example saves the NumPy arrays lon2d, lat2d and field in a file
named mydata.npz. The file extension .npz is added automatically to the file
name.

import numpy as np

# save NumPy variables to a file
np.savez('mydata', lon2d=lon2d, lat2d=lat2d, field=field)

In order to read the NumPy arrays saved in the file mydata.npz back in the following
commands can be used.
import numpy as np

# read in NumPy variables from file
npzfile = np.load('mydata.npz')
lon2d = npzfile['lon2d']
lat2d = npzfile['lat2d']
field = npzfile['field']
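
The two steps can be combined into a self-contained round trip. The sketch below uses small made-up arrays and writes to a temporary directory; both are assumptions made so that the example runs on its own and does not overwrite an existing file.

```python
import os
import tempfile

import numpy as np

# made-up example data (assumption: a small 2 x 3 longitude/latitude grid)
lon2d, lat2d = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([10.0, 20.0]))
field = lon2d + lat2d

# write to a temporary directory so that no existing file is overwritten
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'mydata')    # .npz is appended by np.savez()
np.savez(path, lon2d=lon2d, lat2d=lat2d, field=field)

# the with-statement closes the file once the arrays have been read
with np.load(path + '.npz') as npzfile:
    names = sorted(npzfile.files)        # names of the stored arrays
    field_restored = npzfile['field']

print(names)
print(np.array_equal(field, field_restored))
```

The files attribute lists the names under which the arrays were saved, which is useful when the content of an .npz file is not known in advance.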

7.4 Tips and Solutions

7.4.1 String Formatting of Numbers


Within the coding process it is often necessary to print or display numbers. This
may just be part of a print statement to display a number in the terminal window or
perhaps a floating or integer value is supposed to be used as part of a title or label in
a plot. For the latter the numerical value needs to be converted to a string.
While this all sounds fairly straightforward at first there are a few things to consider.
The following code saves 1 divided by 3 in the variable a and then prints a, first just
as a numerical value and then converted to a string.

a = 1/3
print(a)
print(str(a))

The output of the print statements is as follows.

0.3333333333333333
0.3333333333333333

Both print() statements display the same number with many digits after the decimal
point. So a way needs to be found to control the precision of the floating point
number.
Other situations may include very large or very small numbers to be displayed using
exponent notation or padding numbers with zeros.
In order to format numbers in Python the str.format() method can be used. The
general syntax for printing a formatted number is as follows.
print('<FORMAT>'.format(<NUMBER>))

The <FORMAT> part needs to be replaced with the desired format and the <NUMBER> part
with the number to be formatted. A list of number format examples is given in Table
7.13.1.1.

Table 7.13.1.1: Number formatting examples using the str.format() method. Reproduced with
permission from Marcus Kazmierczak’s blog Python String Format Cookbook.

Number      Format    Output       Description

3.1415926   {:.2f}    3.14         Format float 2 decimal places
3.1415926   {:+.2f}   +3.14        Format float 2 decimal places with sign
-1          {:+.2f}   -1.00        Format float 2 decimal places with sign
2.71828     {:.0f}    3            Format float with no decimal places
5           {:0>2d}   05           Pad number with zeros (left padding, width 2)
5           {:x<4d}   5xxx         Pad number with x’s (right padding, width 4)
10          {:x<4d}   10xx         Pad number with x’s (right padding, width 4)
1000000     {:,}      1,000,000    Number format with comma separator
0.25        {:.2%}    25.00%       Format percentage
1000000000  {:.2e}    1.00e+09     Exponent notation
13          {:10d}            13   Right aligned (default, width 10)
13          {:<10d}   13           Left aligned (width 10)
13          {:^10d}       13       Center aligned (width 10)

An example for the use of the str.format() method in a Python script can be found
in Code 7.5.1.1 in lines 32 and 36. Here the value in the variable wspd is formatted
to have one decimal digit which is then used to annotate a data point on the plot
(Figure 7.5.1.1).
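
A few of the formats can be tried out directly. The following sketch uses made-up values (the wind speed wspd and the day number are hypothetical) and also shows an f-string, which accepts the same format specification as str.format().

```python
a = 1 / 3
wspd = 7.26   # hypothetical wind speed in m/s
day = 5

two_decimals = '{:.2f}'.format(a)         # fixed number of decimal places
exponent = '{:.2e}'.format(1000000000)    # exponent notation
padded = '{:0>2d}'.format(day)            # zero-padded to width 2
right_aligned = '{:10d}'.format(13)       # right aligned in a field of width 10

# f-strings (Python 3.6 and later) accept the same format specification
label = f'{wspd:.1f} m/s'

print(two_decimals, exponent, padded, repr(right_aligned), label)
```

Printing the aligned value with repr() makes the padding spaces visible.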

7.4.2 Zero-padding Integer Values in Filenames


When looping over filenames that contain dates most of the time the date information
is formatted using zero-padding of day and month numbers (e.g., 2020-05-01 for
1st of May, 2020). This type of formatting makes sure that the files are listed
alphanumerically by the date.
To create leading-zero-padded string representations of integer values the str.zfill()
method can be used as demonstrated in the following code.

1 import numpy as np
2
3 days = np.linspace(1, 31, 31, dtype='int')
4
5 # loop through each element
6 for d in days:
7     print(d, str(d).zfill(2), str(d).zfill(5))

The np.linspace() function is used in line 3 to create a NumPy array of integer values
stored in the variable days ranging from 1 to 31. The loop set up in line 6 iterates over
each element of the variable days. A print() statement is executed for each iteration for
demonstration purposes. Three values are printed. First, the integer value d. Second,
the integer value d converted to a string with zfill(2) applied. Third, the integer
value d converted to a string with zfill(5) applied. Executing the above code will
give the following output.

1 01 00001
2 02 00002
3 03 00003
4 04 00004
5 05 00005
6 06 00006
7 07 00007
8 08 00008
9 09 00009
10 10 00010
11 11 00011
12 12 00012
13 13 00013
14 14 00014
15 15 00015
16 16 00016
17 17 00017
18 18 00018
19 19 00019
20 20 00020
21 21 00021
22 22 00022
23 23 00023
24 24 00024
25 25 00025
26 26 00026
27 27 00027
28 28 00028
29 29 00029
30 30 00030
31 31 00031

The zfill method takes only one value as an argument which is the total width of
the resulting string. It is the total character width after zero-padding was applied.
Applying zfill(2) will add a 0 in front of single digit values. Applying zfill(5) will
add four 0s in front of single digit values, three 0s in front of two digit values etc.
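
Applied to filenames, zero-padding could look like the following sketch. The file name pattern data_YYYY-MM-DD.nc is hypothetical and only serves as an illustration.

```python
year, month = 2020, 5

# hypothetical file name pattern: data_YYYY-MM-DD.nc
filenames = []
for day in range(1, 4):
    date_str = str(year) + '-' + str(month).zfill(2) + '-' + str(day).zfill(2)
    filenames.append('data_' + date_str + '.nc')

# the format specification {:02d} gives the same padding as zfill(2)
same = '{:02d}'.format(7) == str(7).zfill(2)

print(filenames)
print(same)
```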

7.4.3 Calculate Height From Geopotential with MetPy


Geopotential height is the approximate height above mean sea level (usually given
in metres) of a pressure level. For instance, the 500 hPa geopotential height field is
often used to identify ridges and troughs in synoptic meteorology.
Climate models often only make the geopotential on pressure levels available rather
than the geopotential height. The geopotential represents the work that would need
to be done to lift a unit mass from sea level up to the altitude at which the unit
mass is located.
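
The relationship between the two quantities can be sketched with NumPy alone. Dividing the geopotential by the standard acceleration of gravity gives the geopotential height. A geometric height that allows for the decrease of gravity with altitude can be approximated using a mean Earth radius. The constants and function names below are assumptions chosen for illustration; library functions may use slightly different constants or formulas, so their results need not match these exactly.

```python
import numpy as np

G0 = 9.80665             # standard acceleration of gravity in m/s^2
EARTH_RADIUS = 6.371e6   # assumed mean Earth radius in m


def geopotential_height(phi):
    """Geopotential height in m from geopotential phi in m^2/s^2."""
    return np.asarray(phi, dtype='float64') / G0


def geometric_height(phi):
    """Approximate geometric height in m, with gravity decreasing with height."""
    phi = np.asarray(phi, dtype='float64')
    return phi * EARTH_RADIUS / (G0 * EARTH_RADIUS - phi)


phi = np.array([0.0, 14709.975, 49033.25])   # made-up geopotential values
print(geopotential_height(phi))
print(geometric_height(phi))
```

For heights in the troposphere the two results differ by only a few metres, which is why the terms are often used interchangeably in practice.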
The MetPy function metpy.calc.geopotential_to_height()⁶ can be used to calculate
the height of a pressure level based on the pressure level’s geopotential. The example
file era5_z_bodele_20050301_1200.nc contains ERA5 geopotential values on pressure
levels for the atmospheric profile of a single location (Bodele Depression, Chad) and
a single timestep (1 March 2005 at 12 UTC). The output of the command
ncdump -h era5_z_bodele_20050301_1200.nc may look like the following.

⁶https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.geopotential_to_height.html

1 netcdf era5_z_bodele_20050301_1200 {
2 dimensions:
3 lon = 1 ;
4 lat = 1 ;
5 level = 37 ;
6 time = UNLIMITED ; // (1 currently)
7 variables:
8 double lon(lon) ;
9 lon:standard_name = "longitude" ;
10 lon:long_name = "longitude" ;
11 lon:units = "degrees_east" ;
12 lon:axis = "X" ;
13 double lat(lat) ;
14 lat:standard_name = "latitude" ;
15 lat:long_name = "latitude" ;
16 lat:units = "degrees_north" ;
17 lat:axis = "Y" ;
18 double level(level) ;
19 level:standard_name = "air_pressure" ;
20 level:long_name = "pressure_level" ;
21 level:units = "millibars" ;
22 level:positive = "down" ;
23 level:axis = "Z" ;
24 double time(time) ;
25 time:standard_name = "time" ;
26 time:long_name = "time" ;
27 time:units = "hours since 1900-1-1 00:00:00" ;
28 time:calendar = "standard" ;
29 time:axis = "T" ;
30 short z(time, level, lat, lon) ;
31 z:standard_name = "geopotential" ;
32 z:long_name = "Geopotential" ;
33 z:units = "m**2 s**-2" ;
34 z:add_offset = 235494.040103126 ;
35 z:scale_factor = 7.34166874895091 ;
36 z:_FillValue = -32767s ;
37 z:missing_value = -32767s ;
38 ...
39 }
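
The variable z is stored as 16-bit integers (short) together with the attributes scale_factor and add_offset. By netCDF convention the physical value is recovered as the packed value times scale_factor plus add_offset, and the netCDF4 package normally applies this automatically when a variable is read. The following sketch applies the conversion by hand using the attribute values from the header above; the packed values themselves are made up for illustration.

```python
import numpy as np

scale_factor = 7.34166874895091   # z:scale_factor from the header
add_offset = 235494.040103126     # z:add_offset from the header
fill_value = -32767               # z:_FillValue from the header

# made-up packed values as they might be stored in the file
packed = np.array([-32766, 0, 32767, -32767], dtype='int16')

# mask the fill value, then unpack to physical units (m^2/s^2)
masked = np.ma.masked_equal(packed, fill_value)
geopotential = masked.astype('float64') * scale_factor + add_offset

print(geopotential)
```

The last element stays masked because it equals the fill value.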

Geopotential is available for the 37 levels (line 5) of the ERA5 model and geopotential
values are given in m²/s² (line 33). The example outlined in Code 7.13.3.1 demonstrates
how the MetPy package can be used to convert geopotential to geopotential
height.
Code 7.13.3.1: Calculating height of pressure levels from ERA5 geopotential field.

1 import numpy as np
2 from netCDF4 import Dataset
3 from metpy.calc import geopotential_to_height
4 from metpy.units import units
5
6 # read netcdf file
7 f = Dataset('../data/era5_z_bodele_20050301_1200.nc', mode='r')
8 levs = f.variables['level'][:]
9 field = f.variables['z'][0,:,0,0]
10 f.close()
11 print('var type field:', type(field))
12
13 # register geopotential field with metpy conform units
14 gp = field * (units.meters**2 / units.seconds**2)
15 print('var type gp:', type(gp))
16
17 # calc height, gph is a MetPy/Pint variable with units attached
18 gph = geopotential_to_height(gp)
19 print('var type gph:',type(gph))
20
21 # unregister variable from metpy if needed; returns numpy array
22 height = gph.magnitude
23 print('var type height:', type(height))
24
25 # print variables for each level
26 for i in np.arange(len(levs)):
27     print("%6s %44s %25s" %(str(levs[i]), str(gp[i]), str(gph[i])))

The MetPy function metpy.calc.geopotential_to_height() and the unit registry
metpy.units.units are imported from the metpy package in lines 3 and 4. Reading in the
data field z from the file returns a NumPy array named field (line 9).
As a first step, the unit for the geopotential values needs to be registered with the
field variable. This is done in line 14 by multiplying the field variable with the
unit m²/s² using the syntax associated with the metpy.units.units registry (see
MetPy and Pint documentation). The new variable gp is a MetPy/Pint object which
is associated with a unit.
To demonstrate the differences in variable types (NumPy versus MetPy/Pint) several
print statements have been added to the code (lines 11, 15, 19 and 23).
In the next step geopotential height (gph) is calculated by feeding the geopotential (gp)
into the geopotential_to_height() function (line 18). The gph variable is a MetPy/Pint
object associated with the units meter.
If required, the data values can be extracted from the MetPy/Pint object gph by using
its attribute magnitude which returns a NumPy array saved in the new
variable height (line 22).
As a final test and to show which variables are MetPy/Pint objects (include units) and
which variables are NumPy arrays a loop prints the pressure level (levs), geopotential
(gp) and geopotential height (gph) in lines 26 and 27 in a formatted way ("%6s %44s
%25s"). The output of Code 7.13.3.1 is shown below.

!(code/7_python_calc_gph_metpy_output.txt)
8. Python - Creating Plots
8.1 Matplotlib
The main Python package for anything related to creating plots is Matplotlib¹. In a
script, the Matplotlib package is usually imported in the following way.

import matplotlib.pyplot as plt

While many climate-related plot examples can be found from Section 7.x onwards,
Matplotlib has a massive Gallery² with all kinds of plot and plotting related code
examples. It is worth browsing through it to get an idea of what is possible.

8.1.1 Setting up Plotting Page (Figure and Axes)


When creating plots it is important to understand the concept of figures, axes and
subplots as they can be quite confusing for beginners. A Python figure object defines
the highest layer in the plot creation hierarchy. One can think of it as a blank page.
The figure object allows control of some of the more general properties of the overall
plot including page dimensions (size) and the depth of borders from the page edge.
A figure (page) consists of subplots. Subplots are defined by creating axis objects.
Each axis object is associated with one subplot. If only a single graph is going to
be plotted on the page then only one axis (subplot) needs to be created. If multiple
graphs are to be plotted on the ‘page’ then multiple axes (subplots) need to be created.
Axis properties such as title, x-axis and y-axis range, axis labels, plot type and font
properties can be controlled for each individual subplot via its axis object.

Within the context of subplots axes are Python objects and should not be
confused with actual axes (x-axis or y-axis) of a graph.
¹https://matplotlib.org/
²https://matplotlib.org/gallery/index.html
Python - Creating Plots 150

While figures and axes can be defined explicitly in separate commands (e.g., fig =
plt.figure() and ax = fig.add_subplot()) the convenience function plt.subplots()
is used most of the time in this book. In its simplest form a figure and single axis
(subplot) object can be created using the following single command.

fig, ax = plt.subplots()

The following line of code will return the same as the code above but the number of
subplots in the horizontal and vertical direction is explicitly defined (1 by 1 is default
in above code).

fig, ax = plt.subplots(1, 1)

The first value provided to the plt.subplots() function defines the number of subplots
in the vertical direction and the second value the number of subplots in the horizontal
direction. The following command creates a figure with eight subplots on a grid with
2 plots in the vertical direction and 4 plots in the horizontal direction.

fig, axs = plt.subplots(2, 4)

The individual axes can now be referred to by using indexes as in axs[0, 0] to axs[1,
3]. The axes could also be explicitly named as in the following example.

fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8)) = plt.subplots(2, 4)
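
When many subplots receive the same treatment it can be convenient to loop over the array of axes instead of naming each one. The following is a minimal sketch; the subplot titles are made up and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 4)
grid_shape = axs.shape   # rows first, then columns

# axs.flat iterates over all subplots row by row
for number, ax in enumerate(axs.flat, start=1):
    ax.set_title('subplot ' + str(number), fontsize=8)

last_title = axs[1, 3].get_title()
print(grid_shape, last_title)

plt.close(fig)
```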

The plt.subplots() function also allows control of the page size, axis sharing and map
projection settings. In the example below taken from Code 8.12.3.1 (Figure 8.12.3.1)
a 3 by 4 grid of subplots is created (12 subplots in total) which share the x-axis and
y-axis and the Plate Carree map projection is set for all subplots.

fig, ax = plt.subplots(3, 4, sharex=True, sharey=True, figsize=(5.5, 3.98),
                       subplot_kw={'projection':ccrs.PlateCarree()})

The plt.subplots() function is not the only way to create plotting axes. One of the
advantages of using the plt.subplots() convenience function is that it makes it easy
to share plot axes (x-axis and y-axis on a graph). An example can be found in Code
8.10.1.1 line 22 and 23 (Figure 8.10.1.1). A good discussion of other ways to create
subplot axes can be found here³.

Many examples of setting up and configuring figures, axes and subplots
can be found on the Matplotlib webpage⁴.

The fig object is mostly used at the end of a plotting script to optimise the space
used by the subplots and to save the whole ‘page’ to a file as shown in the following
example.

fig.tight_layout()
fig.savefig('filename.png', format='png')

It is also good practice to close the plot properly at the end of a script using
the plt.close() command.
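
Put together, the life cycle of a figure from creation to closing might look like the following sketch. The plotted data, the output file name and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')   # non-interactive backend, no display needed
import matplotlib.pyplot as plt

# create the page (fig) with a single subplot (ax)
fig, ax = plt.subplots(figsize=(5.5, 3.98))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], color='steelblue')   # made-up data
ax.set_xlabel('x')
ax.set_ylabel('y')

# optimise the layout and save the page to a file
fig.tight_layout()
outfile = os.path.join(tempfile.mkdtemp(), 'example.png')
fig.savefig(outfile, format='png', dpi=100)

# release the figure once it has been saved
plt.close(fig)

file_written = os.path.exists(outfile)
print(file_written)
```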

8.1.2 Main Plotting Commands


Each individual subplot axis instance can now be used to plot data within the associated
subplot space and to control subplot properties. Some of the more frequently
used plotting functions used to create plots in climate research are summarised in
Table 8.4.3.1 including links to example plots in this book as well as on the Matplotlib
webpage.
Data fields need to be passed to the plotting functions as well as optional
keyword arguments. Which data fields and keyword arguments are associated with
each plotting function can be found in the respective Matplotlib documentation.

³https://towardsdatascience.com/the-many-ways-to-call-axes-in-matplotlib-2667a7b06e06
⁴https://matplotlib.org/gallery/index.html#subplots-axes-and-figures

Table 8.4.3.1: Main plotting functions used in climate research.

Function        Description                      Examples

ax.bar()        Bar plot.                        Figure 8.8.1.1, MPL⁵
ax.contour()    Line contour plots.              Figure 8.10.1.1, MPL⁶
ax.contourf()   Filled contour plots.            Figure 8.7.1.1, Figure 8.7.2.1,
                                                 Figure 8.7.3.1, Figure 8.10.1.1,
                                                 Figure 8.12.3.1, Figure 8.9.1.1,
                                                 Figure 8.10.2.1, MPL⁷
ax.imshow()     Display data as a 2D image.      MPL⁸
ax.plot()       Plot x versus y data as lines    Figure 8.5.1.1, Figure 8.5.2.1,
                and/or markers.                  Figure 8.5.4.1, Figure 8.5.5.1,
                                                 Figure 8.5.3.1, MPL⁹
ax.scatter()    Scatter plots.                   Figure 8.6.3.1, Figure 8.6.2.1,
                                                 Figure 8.6.1.1, MPL¹⁰

8.1.3 Colour Names and Colour Maps


Individual colours can be assigned to plot objects such as lines, markers, fonts and
axes. There are three types of colours: basic colours, tableau palette colours and CSS
colours. Figure 8.4.4.1 shows the colours and their associated names for all three
colour types.
There are 8 basic colours which can be referenced by small letters including b (blue),
g (green), r (red), c (cyan), m (magenta), y (yellow), k (black) and w (white).
The tableau palette colours can be referenced by the string tab: followed by the colour
name as shown in Figure 8.4.4.1.
CSS colours can be referenced by their names as shown in Figure 8.4.4.1.
⁵https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.bar.html?highlight=bar#examples-using-matplotlib-pyplot-bar
⁶https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.contour.html?highlight=contour#examples-using-matplotlib-pyplot-contour
⁷https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.contourf.html?highlight=contourf#examples-using-matplotlib-pyplot-contourf
⁸https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.imshow.html?highlight=imshow#examples-using-matplotlib-pyplot-imshow
⁹https://matplotlib.org/3.2.0/api/_as_gen/matplotlib.pyplot.plot.html#examples-using-matplotlib-pyplot-plot
¹⁰https://matplotlib.org/3.1.0/api/_as_gen/matplotlib.pyplot.scatter.html?highlight=scatter#examples-using-matplotlib-pyplot-scatter

A colour can be assigned to a specific plot feature by providing the colour name as a
string to the color keyword argument as shown in the following example.

The name of the keyword that receives the colour name may be different
for other plotting functions. For instance, for the ax.scatter() function it is
just c.

ax.plot(x, y, color='steelblue', marker='o', markersize=2)
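
Whether a string is a valid colour name, and which colour it stands for, can be checked with helper functions from matplotlib.colors. The following sketch looks up one colour of each of the three types; the invalid name is made up.

```python
import matplotlib.colors as mcolors

basic = mcolors.to_hex('k')            # basic colour: black
tableau = mcolors.to_hex('tab:blue')   # tableau palette colour
css = mcolors.to_hex('steelblue')      # CSS colour

valid = mcolors.is_color_like('steelblue')
invalid = mcolors.is_color_like('not-a-colour')   # made-up name

print(basic, tableau, css, valid, invalid)
```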


Figure 8.4.4.1: Matplotlib colour types and names (image sourced from Matplotlib).

Matplotlib colour maps are pre-defined sequences of colours that can be used when
plotting contours or markers in a plot where a distinction by plot colour is necessary.
A list of pre-defined colour maps and their names can be found on the Matplotlib
webpage¹¹.
The pre-defined colour maps are grouped into several categories such as perceptually
uniform sequential, sequential, diverging, cyclic, qualitative and miscellaneous. In
climate research sequential and divergent colour maps are most common. Divergent
colour maps are especially useful when plotting data that include negative and
positive values such as is generated in anomaly and composite difference analysis.
In the following example the colour map named seismic is imported from the
Matplotlib package into the Python object cmap.

import matplotlib.pyplot as plt


cmap = plt.cm.seismic

In some cases it is necessary to split up the pre-defined colour map into a number of
individual segments where the colours are clearly distinguishable from one another
(e.g., for contour). Basically, a colour map with continuous colours is indexed so that
colours in certain intervals are selected to create a colour map with discrete colours.
This is done by using the mcolors.BoundaryNorm() function as shown in the following
example (taken from Code 8.6.2.1).

1 import matplotlib.pyplot as plt
2 import numpy as np
3 import matplotlib.colors as mcolors
4
5 # create index for discrete colour map
6 cmap = plt.cm.seismic
7 bounds = np.linspace(-25, 25, num=11)
8 norm = mcolors.BoundaryNorm(bounds, cmap.N)
9
10 # create scatter plot.
11 scat = ax.scatter(selevs, eelevs, c=fld, s=6, marker='o', cmap=cmap, norm=norm)

In the code example above the relevant packages are imported in line 1 to 3. The
colour map plt.cm.seismic is loaded in line 6 and saved in the object cmap. A sequence
¹¹https://matplotlib.org/tutorials/colors/colormaps.html
of numbers from -25 to 25 in steps of 5 is saved in the variable bounds in line 7. Line
8 is where the colour index for the discrete colour map is created. The bounds and
the number of colours from the colour map to be used (cmap.N returns the number
of colours in cmap) are passed to the mcolors.BoundaryNorm() function. The returned
colour map index is saved in norm. In the plotting command norm and the colour
map cmap are passed to the ax.scatter() function. The field variable fld holding the data
values is passed to the colour keyword argument c. The complete code (Code 8.6.2.1)
generates Figure 8.6.2.1.
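
The effect of the norm can be inspected without creating a plot by calling it on a few data values. In the sketch below the data values are made up; values that fall between the same pair of bounds receive the same colour index, while values in different intervals receive different indices.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

cmap = plt.cm.seismic
bounds = np.linspace(-25, 25, num=11)   # -25, -20, ..., 25
norm = mcolors.BoundaryNorm(bounds, cmap.N)

values = np.array([-24.0, -21.0, -19.0, 24.0])   # made-up data values
index = norm(values)                             # colour index for each value

print(bounds)
print(index)
```

Here -24 and -21 both lie in the interval from -25 to -20 and therefore share an index, whereas -19 falls into the next interval.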

8.2 Line Plots

8.2.1 Line Plot with Labels


The Code 8.5.1.1 example shows how to create a simple line plot with markers and
labelling of data points. In this example, data from a pilot balloon track are read in
from an Excel spreadsheet. The data coordinates represent the position of the balloon
in the longitude (x) and latitude (y) direction in relation to the release point at [0, 0]
(Figure 8.5.1.1).
Code 8.5.1.1: Code example for a line plot with markers and labels.

1 import numpy as np
2 import matplotlib.pyplot as plt
3 from openpyxl import load_workbook
4 from matplotlib.ticker import MultipleLocator
5
6 # open Excel file, sheet 'P01'
7 wb = load_workbook('../data/pibal_data.xlsx', data_only=True)
8 ws = wb['P01']
9
10 # read in date, time and location cells as strings
11 d = ws.cell(row=3, column=2).value
12 t = ws.cell(row=4, column=2).value
13 loc = ws.cell(row=5, column=2).value
14
15 # create empty numpy array variables
16 wspd = x = y = np.array([], dtype='float64')
17
18 # iterate over rows 8 to 38; read wind speed and x and y distance travelled
19 for row in range(8, 39):
20     wspd = np.append(wspd, np.float64(ws.cell(row=row, column=8).value))
21     x = np.append(x, np.float64(ws.cell(row=row, column=6).value))
22     y = np.append(y, np.float64(ws.cell(row=row, column=7).value))
23
24 # set up figure
25 fig, ax = plt.subplots(figsize=(5.5, 3.98))
26
27 # plot wind profile
28 ax.plot(x, y, color='steelblue', marker='o', markersize=2)
29
30 # plot data labels
31 for i in np.arange(0, 23):
32     ax.annotate("{:.1f}".format(wspd[i]), xy=(x[i], y[i]),
33                 xytext=(x[i]-25, y[i]+25), fontsize=5, fontweight='bold',
34                 horizontalalignment='right', verticalalignment='bottom')
35 for i in np.arange(23, len(wspd)):
36     ax.annotate("{:.1f}".format(wspd[i]), xy=(x[i], y[i]),
37                 xytext=(x[i]-25, y[i]-25), fontsize=5, fontweight='bold',
38                 horizontalalignment='right', verticalalignment='top')
39
40 # add title and set tick label size
41 ax.set_title('Pibal: '+loc+' '+d+' '+t, fontsize=12)
42 ax.tick_params(labelsize=7)
43
44 # format x axis
45 ax.set_xlabel('longitude direction [m]', fontsize=9)
46 ax.axes.set_xlim(-3100, 200)
47 ax.xaxis.set_minor_locator(MultipleLocator(100))
48
49 # format y axis
50 ax.set_ylabel('latitude direction [m]', fontsize=9)
51 ax.axes.set_ylim(-7000, 1000)
52 ax.yaxis.set_minor_locator(MultipleLocator(200))
53
54 # add gridlines
55 ax.grid(which='major', axis='both', linewidth=0.5, color='black', alpha=0.5,
56 linestyle=':')
57
58 # optimise layout
59 plt.tight_layout()
60
61 # save figure to file
62 plt.savefig('../images/7_python_line_plot_labels_300dpi.png',
63 orientation='portrait', format='png', dpi=300)
64
65 plt.close()

All necessary packages and functions used in the script are imported in lines 1 to 4.
For more details on how to read in data from an Excel spreadsheet see Section 7.2.3.3.
Here the Excel spreadsheet pibal_data.xlsx is opened for reading in line 7 creating
the handle wb which is used in line 8 to select the sheet named P01 creating a new
handle ws. Date, time and location information is read in from the spreadsheet in lines
11 to 13. Three empty NumPy arrays are defined in line 16 which are filled with
wind speed (wspd), longitude distance (x) and latitude distance (y) values from the
spreadsheet in lines 19 to 22.
A figure (fig) with a single subplot (ax) is set up in line 25.
The x and y coordinates are used in line 28 to plot markers. The colour of the markers
and connecting line is set to steelblue. The marker is set to a filled circle (o) and size
2.

The data value labels associated with each marker are plotted using the ax.annotate()
function. The labels are added in two steps. The first 23 labels are added in lines 31 to
34 using horizontal alignment set to right and vertical alignment set to bottom. This
places the bottom right corner of the data value label closest to the associated marker
(Figure 8.5.1.1).
As the balloon track curves around moving towards the southeast the label position
has to be adjusted in order not to overlay the labels onto the line. The 24th to last
labels are added in lines 35 to 38. Here the horizontal alignment is set to right and the
vertical alignment is set to top. This places the top right corner of the data value label
closest to the associated marker (Figure 8.5.1.1).
The distance and position of the label in relation to the marker is controlled by the
xytext keyword. In this example, values of 25 (metres in data coordinate system) are
added or subtracted to the x and y data coordinates accordingly.

String formatting is used in this example to convert the floating point
numbers to strings with one digit precision. See Section 7.13.1 for more
details on number formatting examples.

A plot title is added in line 41 using the date, time and location details retrieved from
the spreadsheet and the tick label size is adjusted to size 7 in line 42.
Further formatting is done for the x-axis and y-axis in lines 45 to 47 and 50 to 52,
respectively. This includes adding axis labels, axis limits and the distribution of minor
ticks. Dotted black grid lines with line width 0.5 are added for both axes in line 55
and 56.
Finally, the plot is optimised, saved and closed in lines 59, 62 to 63 and 65, respectively.

Figure 8.5.1.1: Line plot with markers and labels. Hodograph (bird’s-eye view) showing the track of
a pibal balloon from the release point at [0, 0].
This code example generates the plot shown in Figure 8.5.1.1. The balloon released
at the coordinates [0, 0] first travelled west-southwest (in the trade winds). As the
balloon rises the wind speed picks up and the distance between labels increases. After
travelling almost 2.8 km in the longitudinal direction the wind direction changes and
the balloon is caught in the westerly flow.
The plot created in the following section uses the data from the same pibal balloon
track. It shows that the change in wind direction occurs at an altitude of about 2 km
(Figure 8.5.2.1).

8.2.2 Line Plot with Arrows


The Code 8.5.2.1 example plots data from the same Excel spreadsheet (pilot balloon
track) as in the previous section (Section 7.x.x). This time altitude, wind speed and
wind direction are read in from the spreadsheet and a wind speed versus altitude line
plot is created. In addition, wind arrows are plotted to indicate wind direction.
Code 8.5.2.1: Plotting Pibal track derived wind profile

1 import numpy as np
2 import matplotlib.pyplot as plt
3 from openpyxl import load_workbook
4 from matplotlib.ticker import MultipleLocator
5 import metpy.calc as mpcalc
6 from metpy.units import units
7
8 # open Excel file, sheet 'P01'
9 wb = load_workbook('../data/pibal_data.xlsx', data_only=True)
10 ws = wb['P01']
11
12 # read in date, time and location cells as strings
13 d = ws.cell(row=3, column=2).value
14 t = ws.cell(row=4, column=2).value
15 loc = ws.cell(row=5, column=2).value
16
17 # create empty numpy array variables
18 alt = wspd = wdir = np.array([], dtype='float64')
19
20 # iterate over rows 8 to 38; read altitude, wind speed and wind direction
21 for row in range(8, 39):
22     alt = np.append(alt, np.float64(ws.cell(row=row, column=2).value))
23     wspd = np.append(wspd, np.float64(ws.cell(row=row, column=8).value))
24     wdir = np.append(wdir, np.float64(ws.cell(row=row, column=12).value))
25
26 # set up figure
27 fig, ax = plt.subplots(figsize=(5.5, 3.98))
28
29 # plot wind profile
30 ax.plot(wspd, alt, color='orangered')
31
32 # add title
33 ax.set_title('Pibal: '+loc+' '+d+' '+t, fontsize=12)
34
35 # format x axis
36 ax.set_xlabel('wind speed [m/s]')
37 ax.axes.set_xlim(0, 11)
38 ax.xaxis.set_major_locator(MultipleLocator(1))
39
40 # format y axis
41 ax.set_ylabel('altitude [m]')
42 ax.axes.set_ylim(0, 3500)
43 ax.yaxis.set_minor_locator(MultipleLocator(100))
44
45 # calculate u and v wind components for quiver
46 wspdr = wspd * units('m/s')
47 wdirr = wdir * units.deg
48 u, v = mpcalc.wind_components(wspdr, wdirr)
49 x = np.full((len(alt)), 10)
50
51 # plot wind direction arrows
52 Q = plt.quiver(x, alt, u, v, pivot='mid', color='black', units='x', width=0.03,
53 scale_units='x', scale=10)
54
55 # make plot look nice
56 plt.tight_layout()
57
58 # save figure to file
59 plt.savefig('../images/7_python_line_plot_arrows_300dpi.png',
60 orientation='portrait', format='png', dpi=300)
61
62 plt.close()

The first part of the script up to line 27 is very similar to the one developed in the
previous section (Code 8.5.1.1). This is because data are read in from the same Excel
spreadsheet. The only difference is that instead of columns 8, 6 and 7 we read in
columns 2, 8 and 12 for altitude, wind speed and wind direction saved in the variables
alt, wspd and wdir, respectively.

In line 30, wind speed on the x-axis is plotted against altitude on the y-axis. The
line colour is orangered. No further keyword arguments are passed to the ax.plot()
function, resulting in an orange-red line with no markers (Figure 8.5.2.1).
A title is added in line 33 and the x and y axes are formatted in lines 36 to 38 and 41
to 43, respectively.
In addition to the vertical wind profile, wind arrows indicating the wind direction as
seen from a bird’s-eye view are plotted. This is done in two steps.
First, the U and V wind vector components need to be calculated from the wind speed
and wind direction values. This is done in lines 46 to 48 using the MetPy package.
The wind speed values saved in the variable wspd are registered with the unit
m/s using MetPy’s units registry (line 46). Similarly, the wind direction values are
registered with the unit degrees (line 47). Now the wind speed and wind direction
variables can be passed to the mpcalc.wind_components() function in line 48, which
returns the U and V wind vector components. They are saved in the variables u and v.
Second, in order to plot the wind arrows in a vertical line on the plot, a NumPy
variable x is created in line 49 which is of the same length as the alt variable. All
elements in x are set to 10. The arbitrary value of 10 represents the x coordinate for
the arrow plotting call in lines 52 and 53.
The wind direction arrows are plotted in lines 52 and 53 using the plt.quiver() function.
The function is called here with four positional input arrays. First, the x and alt variables
determine the location at which the arrows are going to be plotted. The u and v vector
components determine the arrow direction and length (associated with wind speed).
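The conversion from wind speed and direction to U and V components can also be
sketched in plain NumPy. The following is a minimal sketch, assuming the
meteorological convention in which the wind direction is the direction the wind
blows from, measured clockwise from north. The function name wind_components_np
is hypothetical and not part of MetPy, and no unit handling is included.

```python
import numpy as np


def wind_components_np(speed, direction_deg):
    """Return (u, v) for wind speed and meteorological direction in degrees.

    Hypothetical helper for illustration only; it omits MetPy's unit handling.
    """
    direction_rad = np.deg2rad(direction_deg)
    u = -speed * np.sin(direction_rad)  # eastward component
    v = -speed * np.cos(direction_rad)  # northward component
    return u, v


# a 10 m/s northerly (from 0 deg) blows towards the south, so v is negative;
# a 10 m/s westerly (from 270 deg) blows towards the east, so u is positive
u, v = wind_components_np(np.array([10.0, 10.0]), np.array([0.0, 270.0]))
```

The negative signs reflect that the arrows point in the direction the air is
moving to, which is opposite to the reported wind direction.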
Understanding the keyword arguments that control the arrow properties can be
tricky and it is highly recommended to carefully read the plt.quiver() documentation.
Setting pivot to mid makes sure the middle of the arrow is located at the
coordinates provided by the variables x and alt. The arrow colour is set to black.
The units and width keywords are used together to control the shaft width of the
arrow, with width given in multiples of the chosen unit. The unit is chosen somewhat
arbitrarily in this example. Here the x-axis units are used as the reference, so the
shaft is 0.03 x-axis units wide.

The units keyword in the plt.quiver() function does not have an impact
on the arrow length.

The scale_units and scale keywords are used here to control the arrow length. Setting
scale_units to x means that x-axis units are used to draw the arrow length. Setting
scale to 10 means that the arrow length in x-axis units is a 10th of the wind speed.
For example, if the wind speed is 5 m/s the associated arrow will be a 10th of the
distance between 0 and 5 on the x-axis, i.e. 0.5 x-axis units long.
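The scaling rule described above can be checked with a small sketch. The wind
values below are made up for illustration and are not taken from the Pibal
spreadsheet. The non-interactive Agg backend is selected only so that the
sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# illustrative values (not from the Pibal spreadsheet)
u = np.array([5.0, 0.0])     # eastward component [m/s]
v = np.array([0.0, -10.0])   # northward component [m/s]
x = np.full(2, 10.0)         # arrows are drawn at x = 10
alt = np.array([500.0, 1000.0])

scale = 10
fig, ax = plt.subplots()
ax.set_xlim(0, 11)
ax.set_ylim(0, 3500)
ax.quiver(x, alt, u, v, pivot='mid', units='x', width=0.03,
          scale_units='x', scale=scale)

# arrow length expressed in x-axis units: wind speed divided by scale
arrow_len = np.hypot(u, v) / scale
plt.close(fig)
```

With scale=10, the 5 m/s arrow spans 0.5 x-axis units and the 10 m/s arrow
spans 1.0 x-axis units, regardless of the direction in which the arrow points.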
The plot layout is optimised in line 56. Then the plot is saved in lines 59 and 60
and closed in line 62.

Figure 8.5.2.1: Line plot with arrows showing vertical wind profile and wind direction derived from
a Pibal track.

The resultant plot (Figure 8.5.2.1) shows the vertical wind profile as measured by
a pibal balloon. Analysing Figure 8.5.2.1 and Figure 8.5.1.1 together provides a
comprehensive picture of the wind within the lower 3 km of the atmosphere.

8.2.3 Multiple Lines Plot with Markers and Legend


The example Python code presented in Code 8.5.3.1 generates a plot with multiple
lines, markers and a legend (Figure 8.5.3.1). Values of long-term mean monthly rainfall
for three UK cities were obtained from the UK Met Office¹².

¹²https://www.metoffice.gov.uk/research/climate/maps-and-data/uk-climate-averages

Code 8.5.3.1: Plotting a graph with multiple lines, markers and legend.

1 import matplotlib.pyplot as plt
2 import numpy as np
3 from matplotlib.ticker import MultipleLocator
4
5 # mean precipitation [mm] 1981-2010 (Met Office)
6 glasgow = [153, 112.3, 124.8, 67.4, 65.3, 73.4, 77.7, 100.9, 123.9, 142.6,
7 131.7, 145.6]
8 oxford = [56.6, 42.5, 47.6, 49.1, 57.1, 48, 48.9, 56.5, 54.1, 69.6, 66.6, 63.1]
9 cambridge = [46.6, 34.5, 38.3, 41.2, 46, 51.5, 47.5, 50.8, 53.5, 59, 52.8, 46.4]
10
11 # set up figure and axes
12 fig, ax = plt.subplots(1, 1, figsize=(3.98, 5.5))
13
14 # plot data
15 mon = np.arange(12)+1
16 ax.plot(mon, glasgow, color='skyblue', marker='s', label='Glasgow, Bishopton')
17 ax.plot(mon, oxford, color='lightcoral', marker='o', label='Oxford')
18 ax.plot(mon, cambridge, color='seagreen', marker='v', label='Cambridge')
19
20 # format x axis
21 ax.set_xlabel('Month')
22 ax.axes.set_xlim(1, 12)
23 ax.xaxis.set_major_locator(MultipleLocator(1))
24
25 # format y axis
26 ax.set_ylabel('Rainfall [mm]')
27 ax.axes.set_ylim(30, 150)
28 ax.yaxis.set_minor_locator(MultipleLocator(5))
29
30 # add title
31 ax.set_title('Mean Rainfall (1981-2010)')
32
33 # add legend
34 ax.legend(loc='upper center', fontsize=7)
35
36 # optimise layout
37 fig.tight_layout()
38
39 # save plot
40 fig.savefig('../images/7_python_multiple_lines_legend_300dpi.png', format='png',
41 dpi=300)
42
43 # close plot
44 plt.close()

All packages needed are imported in lines 1 to 3 and rainfall data to be plotted are
entered manually into the script in lines 6 to 9. Each variable (city name) holds twelve
values, one for each month. The figure and axis are set up in line 12.
In line 15 the variable mon is created using np.arange() which generates a sequence of
12 numbers starting at 0. The value 1 is added to each element of the sequence so that
the variable mon holds a sequence of 12 numbers ranging from 1 to 12. The variable
represents the months and is used as the x-axis variable in the plotting calls in the
following three lines.
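As a brief sketch of the month sequence described above, the same values can
also be produced by passing a start and stop value to np.arange(). The variable
name mon_alt is introduced here for illustration only.

```python
import numpy as np

mon = np.arange(12) + 1      # as in the listing: 0..11 shifted by one
mon_alt = np.arange(1, 13)   # equivalent: the stop value 13 is excluded
```

Both variants yield the integers 1 to 12. Which one to use is a matter of taste.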
The rainfall data are plotted for Glasgow, Oxford and Cambridge in lines 16, 17 and
18, respectively. The colour, marker symbol and label are set accordingly for each
plotting call. The label is used to generate the legend later.
The x-axis is formatted in lines 21 to 23, adding an axis label, setting the x-axis
range from 1 to 12 (months) and setting a major tick for each month.
Similarly, the y-axis is formatted in lines 26 to 28, adding an axis label, setting the
y-axis range from 30 to 150 and adding minor ticks every 5 mm of rainfall.
In line 31 a title is added to the plot.
The legend is added in line 34. Only two keyword arguments are passed to the
ax.legend() function. The first one (loc='upper center') places the legend in the top
centre of the plotting area. The fontsize is set to 7 so that the legend fits nicely into
the plotting area.
The plot is completed by optimising the layout, saving it to a file and closing it in
lines 37, 40 to 41 and 44, respectively.

Figure 8.5.3.1: Example of multiple lines with markers and legend showing seasonal cycle of rainfall
for three cities in the UK.

Figure 8.5.3.1 shows the annual cycle of rainfall for Glasgow, Oxford and Cambridge.
Glasgow receives significantly more rainfall throughout the year than Oxford
and Cambridge and also shows a distinctly different annual cycle. Cambridge is
located further east than Oxford and receives slightly less rainfall than Oxford in
most months, because the Atlantic storm track reaches the British Isles from the west.

8.2.4 Multiple Lines Plot with two Scales


In some scenarios it is desirable to plot two different variables on the same plot to
show how they relate to one another. If the two variables have different units then two
different scales are required. This can be achieved by plotting one scale on the left
y-axis and the other on the right y-axis. An example of how to do this is given in
Code 8.5.4.1, which plots 10-minute averaged temperature and humidity over several
days (Figure 8.5.4.1).
Code 8.5.4.1: Plotting a graph with multiple lines and two different scales on the left and right y-axis
as well as time axis formatting.
1 import matplotlib.pyplot as plt
2 import numpy as np
3 from openpyxl import load_workbook
4 import datetime
5 import matplotlib.dates as mdates
6 from matplotlib.ticker import MultipleLocator, FormatStrFormatter
7
8 # read T and Hum from Excel spreadsheet
9 wb = load_workbook('../data/Tenerife_Hotel_AWS_2015.xlsx', data_only=True)
10 ws = wb['Hotel']
11 dates = []
12 t2m = np.array([], dtype='float64')
13 hum2m = np.array([], dtype='float64')
14 for row in range(3, 772):
15     # read temperature and humidity
16     t2m = np.append(t2m, np.float64(ws.cell(row=row, column=3).value))
17     hum2m = np.append(hum2m, np.float64(ws.cell(row=row, column=6).value))
18
19     # read date and time values and combine them in datetime object
20     d = ws.cell(row=row, column=1).value
21     t = ws.cell(row=row, column=2).value
22     if not isinstance(t, datetime.time):
23         t = datetime.time(0, 0, 0)
24     dt = datetime.datetime.combine(d, t)
25     dates.append(dt)
26
27 # set up figure and axes
28 fig, ax1 = plt.subplots(1, 1, figsize=(5.5, 3.98))
29
30 # plot temperature (ax1)
31 color = 'tab:red'
32 ax1.set_ylabel('Temperature [C]', color=color)
33 ax1.plot(dates, t2m, color=color, linewidth=0.5)
34 ax1.tick_params(axis='y', labelcolor=color)
35 ax1.set_title('Puerto de la Cruz, Tenerife 2015')
36
37 # create second axes that shares the same x-axis
38 ax2 = ax1.twinx()
39
40 # plot humidity (ax2)
41 color = 'tab:blue'
42 ax2.set_ylabel('Humidity [%]', color=color)
43 ax2.plot(dates, hum2m, color=color, linewidth=0.5)
44 ax2.tick_params(axis='y', labelcolor=color)
45
46 # format time axis
47 ax1.set_xlim(datetime.datetime(2015, 4, 25, 6, 0, 0),
48 datetime.datetime(2015, 4, 30, 18, 0, 0))
49 ax1.xaxis.set_major_locator(mdates.HourLocator(interval=6))
50 datefmt = mdates.DateFormatter('%Y-%m-%d %H:%M')
51 ax1.xaxis.set_major_formatter(datefmt)
52 ax1.tick_params(axis='x', which='major', labelsize=6)
53 ax1.xaxis.set_minor_locator(mdates.HourLocator(interval=3))
54 fig.autofmt_xdate()
55
56 # add gridlines
57 ax1.grid(which='major', axis='x', linewidth=0.5, color='black', alpha=0.5,
58 linestyle=':')
59
60 # optimise layout
61 fig.tight_layout()
62
63 # save plot
64 fig.savefig('../images/7_python_multiple_lines_2yscales_300dpi.png',
65 format='png', dpi=300)
66
67 # close plot
68 plt.close()

The packages and functions used in the code are imported in lines 1 to 6.
The data are read in from an Excel spreadsheet in lines 9 to 25 (see Section 7.2.3.3
for more details on how to read data from Excel spreadsheets). Temperature and
humidity values are read into the NumPy variables t2m and hum2m.
The date and time information is saved in two different Excel spreadsheet columns
(columns 1 and 2). These are read into the variables d and t in lines 20 and 21,
respectively. The date is automatically converted to a datetime.datetime object and
the time is converted to a datetime.time object. There is, however, one hiccup: when
the time in the Excel spreadsheet cell is 00:00, the time is not converted
to a datetime.time object. This is handled manually in lines 22 and 23 by an if
statement that checks the variable type. If t is not a datetime.time object, one is
created for the time 00:00:00 (midnight).
In line 24, the date and time objects are combined into a single datetime.datetime
object using the datetime.datetime.combine() function. The resulting object is saved
in a list named dates in line 25.
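The date and time handling described above can be tried out in isolation. The
following is a minimal sketch using only the standard library. The helper name
combine_date_time is hypothetical, and None is used merely as a stand-in for a
cell value that was not converted to a datetime.time object.

```python
import datetime


def combine_date_time(d, t):
    """Combine a date cell and a time cell into one datetime.datetime.

    Hypothetical helper; assumes d is a datetime.date or datetime.datetime
    and falls back to midnight if t is not a datetime.time.
    """
    if not isinstance(t, datetime.time):
        t = datetime.time(0, 0, 0)
    return datetime.datetime.combine(d, t)


d = datetime.datetime(2015, 4, 25)
print(combine_date_time(d, datetime.time(6, 10)))  # 2015-04-25 06:10:00
print(combine_date_time(d, None))                  # 2015-04-25 00:00:00
```

Note that datetime.datetime.combine() ignores any time information already
present in d and uses only its date part.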
The figure and a single axis (ax1) are set up in line 28.
In lines 31 to 35 the first variable (temperature) is plotted and the left y-axis is
formatted accordingly. The colour used for plotting temperature and the left y-axis
label and tick labels is set to tab:red. The y-axis label set in line 32 will be placed
next to the left-hand side y-axis. The temperature values (t2m) are plotted against the
dates saved in the dates list in line 33. The colour of the left y-axis tick labels is set
in line 34 and a plot title is added in line 35 (drawn in black by default).
Next, a new axis (ax2) is created in line 38 which shares the same x-axis.

The ax.twinx() function can be used to create a second y-axis. It shares the
same x-axis.

The second variable (humidity) is plotted and the right y-axis is formatted
accordingly in lines 41 to 44. The colour used for plotting humidity and the right
y-axis label and tick labels is set to tab:blue. The right y-axis (ax2) label is set in line 42
and the humidity variable hum2m is plotted against the dates list in line 43. The colour
of the right y-axis tick labels is set in line 44.
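The two-axes approach can be reduced to a few lines. The sketch below uses
synthetic hourly values instead of the weather station data, so the numbers
are illustrative only. The Agg backend is selected so that no display is needed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# synthetic data, for illustration only
hours = np.arange(24)
temp = 20 + 5 * np.sin((hours - 9) * np.pi / 12)   # [C]
hum = 70 - 15 * np.sin((hours - 9) * np.pi / 12)   # [%]

fig, ax1 = plt.subplots()
ax1.plot(hours, temp, color='tab:red')
ax1.set_ylabel('Temperature [C]', color='tab:red')
ax1.tick_params(axis='y', labelcolor='tab:red')

ax2 = ax1.twinx()  # new axes with its own y-axis on the right
ax2.plot(hours, hum, color='tab:blue')
ax2.set_ylabel('Humidity [%]', color='tab:blue')
ax2.tick_params(axis='y', labelcolor='tab:blue')

# True if both axes are linked to the same x-axis
shared = ax1.get_shared_x_axes().joined(ax1, ax2)
plt.close(fig)
```

Because both axes share the x-axis, changing the x-axis limits on ax1 also
changes them on ax2, which is why the listing formats the time axis on ax1 only.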
The time axis (x-axis) is formatted in lines 47 to 54. The x-axis range is set by
providing two datetime.datetime objects to the ax1.set_xlim() function. The x-axis
limits are from 6 UTC on 25 April 2015 to 18 UTC on 30 April 2015. Major ticks are
plotted for every 6-hour interval (line 49) and minor ticks are plotted for every 3-hour
interval (line 53). The format of the date/time label is set to '%Y-%m-%d %H:%M' in lines 50
and 51 which produces labels such as 2015-04-25 06:00. The size of the tick labels is
reduced in line 52. In line 54, the fig.autofmt_xdate() function rotates and
right-aligns the date labels on the x-axis so that they do not overlap.
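The effect of the format string can be checked without creating a plot. The
following sketch calls a DateFormatter directly on a single tick position.
The variable names tick and label are chosen here for illustration.

```python
import datetime
import matplotlib.dates as mdates

datefmt = mdates.DateFormatter('%Y-%m-%d %H:%M')

# Matplotlib stores dates as floating point numbers internally
tick = mdates.date2num(datetime.datetime(2015, 4, 25, 6, 0, 0))

# formatters are callable and return the label for a tick position
label = datefmt(tick)
print(label)  # 2015-04-25 06:00
```

Other strftime directives can be tested in the same way before they are used
in a plotting script.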
Vertical dotted grid lines are added in lines 57 and 58.
The plot layout is optimised, the figure is saved to a file and then closed in lines 61,
64 to 65 and 68, respectively.

Figure 8.5.4.1: Graph with multiple lines and two different scales on the left and right y-axis showing
temperature and humidity over 6 days at Puerto de la Cruz, Tenerife.

8.2.5 Multiple Lines Plot with Standard Deviation


Measures of variability or spread such as standard deviation, variance, range (minimum
to maximum) or interquartile range (25th to 75th percentile) are useful quantities
to include in a plot. The following example (Code 8.5.5.1) plots the annual cycle of 10 m
wind speed at the Bodele Depression (Chad) as seen in the ERA5 and ERA-Interim
reanalyses. Standard deviations were calculated for each month and added to the
plot in order to provide a measure of variability for each month.
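The measures of spread mentioned above can all be computed with NumPy. The
sketch below uses a small made-up sample rather than the reanalysis data. Note
that np.std() and np.var() return the population values (ddof=0) by default.

```python
import numpy as np

# illustrative sample, not taken from the reanalysis data
data = np.arange(1, 10, dtype='float64')   # 1, 2, ..., 9

std = np.std(data)                 # population standard deviation (ddof=0)
var = np.var(data)                 # variance
rng = data.max() - data.min()      # range (minimum to maximum)
q25, q75 = np.percentile(data, [25, 75])
iqr = q75 - q25                    # interquartile range
```

For this sample the range is 8 and the interquartile range is 4 (from 3 to 7).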

Code 8.5.5.1: Plotting a graph with multiple lines and standard deviations using the
fill_between() function.
1 import matplotlib.pyplot as plt
2 import numpy as np
3 from netCDF4 import Dataset
4 from matplotlib.ticker import MultipleLocator
5
6 # read data
7 f = Dataset('../data/era5_sfcwind_ymonmean_bodele.nc', mode='r')
8 era5mean = f.variables['si10'][:,0,0]
9 f.close()
10 f = Dataset('../data/era5_sfcwind_ymonstd_bodele.nc', mode='r')
11 era5std = f.variables['si10'][:,0,0]
12 f.close()
13 f = Dataset('../data/erai_sfcwind_ymonmean_bodele.nc', mode='r')
14 eraimean = f.variables['wspd10m'][:,0,0]
15 f.close()
16 f = Dataset('../data/erai_sfcwind_ymonstd_bodele.nc', mode='r')
17 eraistd = f.variables['wspd10m'][:,0,0]
18 f.close()
19
20 # setup figure and axes
21 fig, ax = plt.subplots(1, 1, figsize=(3.98, 5.5))
22
23 # plot data
24 mon = np.arange(12)+1
25 ax.plot(mon, era5mean, color='skyblue', label='ERA5')
26 ax.fill_between(mon, era5mean+era5std, era5mean-era5std, facecolor='skyblue',
27 alpha=0.5)
28 ax.plot(mon, eraimean, color='lightcoral', label='ERA-Interim')
29 ax.fill_between(mon, eraimean+eraistd, eraimean-eraistd, facecolor='lightcoral',
30 alpha=0.5)
31
32 # format x axis
33 ax.set_xlabel('Month')
34 ax.axes.set_xlim(1, 12)
35 ax.xaxis.set_major_locator(MultipleLocator(1))
36
37 # format y axis
38 ax.set_ylabel('Wind Speed [m/s]')
39 ax.axes.set_ylim(2, 10)
40 ax.yaxis.set_major_locator(MultipleLocator(1))
41 ax.yaxis.set_minor_locator(MultipleLocator(0.5))
42
43 # add title
44 ax.set_title('10m Wind Speed Bodele LLJ\n1979-2012')
45
46 # add legend
47 ax.legend(loc='lower left', fontsize=8)
48
49 # optimise layout
50 fig.tight_layout()
51
52 # save plot
53 fig.savefig('../images/7_python_multiple_lines_fill_between_300dpi.png',
54 format='png', dpi=300)
55
56 # close plot
57 plt.close()

All packages and functions used in the script are imported in lines 1 to 4.
The long-term monthly mean wind speed values and corresponding standard devia-
tions are read into the variables era5mean and era5std for ERA5 in lines 7 to 9 and 10
to 12, respectively. The fields are read in for ERA-Interim into the variables eraimean
and eraistd in lines 13 to 15 and 16 to 18, respectively.
A figure (fig) and axis (ax) are set up in line 21.
A sequence of numbers from 1 to 12 representing the months is saved into the variable
mon in line 24. This variable is used in plotting commands in the following lines.

ERA5 mean wind speed is plotted in line 25 with the colour skyblue and the label
ERA5 (used in the legend). In lines 26 and 27 the ax.fill_between() function is used
to plot the associated standard deviations. The first argument the function expects is
the variable holding the x-axis coordinate values (mon). The second and third arguments
hold the y-axis values that bound the area to be filled. In this example
the standard deviation values (era5std) are added to the mean (era5mean) to create
the upper boundary of the fill area and subtracted from the mean to create the lower
boundary of the fill area (compare with Figure 8.5.5.1). The fill colour is set to skyblue
(same as for the mean values line) but alpha is set to 0.5 to make the fill area
semi-transparent.
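The shading technique can be reproduced with synthetic data. In the sketch
below the monthly mean and standard deviation values are made up for
illustration and do not represent the reanalysis data.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np

# synthetic monthly mean and standard deviation, for illustration only
mon = np.arange(12) + 1
mean = 6 + 2 * np.cos((mon - 1) * np.pi / 6)
std = np.full(12, 0.8)

fig, ax = plt.subplots()
ax.plot(mon, mean, color='skyblue', label='mean')
band = ax.fill_between(mon, mean + std, mean - std, facecolor='skyblue',
                       alpha=0.5, label='mean +/- 1 std')
ax.legend()

# the filled polygon spans the envelope of mean + std and mean - std
verts = band.get_paths()[0].vertices
plt.close(fig)
```

Passing a label to ax.fill_between() is optional. It is used here so that the
shaded band also appears in the legend.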
The same plotting routines are repeated for ERA-Interim in lines 28 to 30.
The x-axis and y-axis are formatted in lines 33 to 35 and 38 to 41, respectively, and a
title is added in line 44.
A legend is placed in the lower left corner of the plotting area with a font size of 8
using the ax.legend() function in line 47. The legend will use the labels defined in the
plotting commands in lines 25 and 28.
Finally, the plot layout is optimised, the figure saved to a file and closed in lines 50,
53 to 54 and 57, respectively.

Figure 8.5.5.1: Plot of long-term mean and standard deviation of 10 m wind speed at the Bodele
Depression (Chad) as seen in ERA5 and ERA-Interim.

8.3 Scatter Plots

8.3.1 Scatter Plot with a Legend


The following example Code 8.6.1.1 shows how to create a scatter plot with a legend
that is placed outside of the plotting area (Figure 8.6.1.1). The code reads in data from
the CMIP5 dataset showing the relationship between global warming under an RCP8.5
pathway and rainfall in the tropics.
Code 8.6.1.1: Scatter plot with legend placed to the right of the plot.

1 import numpy as np
2 import matplotlib.pyplot as plt
3 from netCDF4 import Dataset
4 from itertools import cycle
5
6 # list of model names
7 modlist = ['ACCESS1-0', 'ACCESS1-3', 'bcc-csm1-1-m', 'bcc-csm1-1', 'BNU-ESM',
8 'CanESM2', 'CCSM4', 'CESM1-BGC', 'CESM1-CAM5', 'CMCC-CESM',
9 'CMCC-CM', 'CMCC-CMS', 'CNRM-CM5', 'CSIRO-Mk3-6-0', 'EC-EARTH',
10 'FGOALS-g2', 'FIO-ESM', 'GFDL-CM3', 'GFDL-ESM2G', 'GFDL-ESM2M',
11 'GISS-E2-H-CC', 'GISS-E2-H', 'GISS-E2-R-CC', 'GISS-E2-R',
12 'HadGEM2-AO', 'HadGEM2-CC', 'HadGEM2-ES', 'inmcm4', 'IPSL-CM5A-LR',
13 'IPSL-CM5A-MR', 'IPSL-CM5B-LR', 'MIROC5', 'MIROC-ESM-CHEM',
14 'MIROC-ESM', 'MPI-ESM-LR', 'MPI-ESM-MR', 'MRI-CGCM3', 'MRI-ESM1',
15 'NorESM1-ME', 'NorESM1-M']
16
17 # set up figure
18 fig, ax = plt.subplots(figsize=(5.5, 3.98))
19
20 # set up sequences of colours and symbols
21 colors = iter(plt.cm.gist_rainbow(np.linspace(0, 1, len(modlist))))
22 cycle_marker = cycle(['o', 'v', '^', '<', '>', 's'])
23
24 # loop through models
25 for i, m in enumerate(modlist):
26     # read two model files (pr and tas)
27     f = Dataset('../data/cmip5/pr_Amon_'+m+'_rcp85-hist.nc', mode='r')
28     pr = f.variables['pr'][:,0,0]
29     f.close()
30     f = Dataset('../data/cmip5/tas_Amon_'+m+'_rcp85-hist.nc', mode='r')
31     tas = f.variables['tas'][:,0,0]
32     f.close()
33
34     # scatter data
35     ax.scatter(tas, pr, c=next(colors), marker=next(cycle_marker),
36                s=10, label=m)
37
38 # format x-axis
39 ax.set_xlim(2.25, 5.25)
40 ax.set_xlabel('\u0394 Surface Temperature [\u00B0C]', fontsize=7)
41
42 # format y-axis
43 ax.set_ylim(0, 0.7)
44 ax.set_ylabel('\u0394 Precipitation [mm day$\\mathregular{^{-1}}$]', fontsize=7)
45
46 # formats for both axes
47 ax.tick_params(axis='both', which='major', labelsize=6)
48 ax.grid(linewidth=0.5, color='black', alpha=0.5, linestyle=':')
49
50 # add legend
51 plt.subplots_adjust(right=0.6)
52 ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1.0), fontsize=6, ncol=2,
53           frameon=False)
54
55 # save figure to file
56 plt.savefig('../images/7_python_scatter_plot_legend_300dpi.png', format='png',
57 dpi=300)
58
59 # close plot
60 plt.close()

In lines 7 to 15 a list called modlist is created that contains the names of 40 models.
These are part of the input filenames. In line 18 the plot is set up.
Each symbol on the scatter plot can be identified by its marker shape and colour
(Figure 8.6.1.1). A sequence of 40 unique colours is created in line 21 based on the
plt.cm.gist_rainbow colour map. The Python built-in iter() function is used here to
create an iterator named colors that yields the colours for the scatter plot
markers one at a time.
In line 22 a sequence of 6 markers is passed on to the cycle() function from the
itertools package, creating an object called cycle_marker that repeats the markers
endlessly. Both the colors iterator and the cycle_marker object are used later in
the scatter() function in line 35.
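The difference between an iterator created with iter() and a cycle object can
be illustrated with a short standard-library sketch. The names and colour
strings below are placeholders and are not taken from the listing.

```python
from itertools import cycle

names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']   # placeholder model names
colors = iter(['red', 'orange', 'gold', 'green', 'cyan', 'blue', 'purple',
               'black'])                 # one colour per name, used once each
cycle_marker = cycle(['o', 'v', '^', '<', '>', 's'])  # repeats after 6 items

pairs = [(name, next(colors), next(cycle_marker)) for name in names]
markers = [p[2] for p in pairs]
print(markers)  # ['o', 'v', '^', '<', '>', 's', 'o', 'v']
```

After the loop the colors iterator is exhausted, so a further next(colors)
would raise StopIteration, whereas next(cycle_marker) keeps returning markers.
This is why the number of colours has to match the number of models while the
number of markers does not.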
The loop set up in line 25 loops over the elements of modlist with the variable i
being a counter starting at 0 and m being the model name.
The data to be plotted are read in lines 27 to 32, yielding the NumPy arrays pr
for precipitation and tas for surface temperature. Each file has been processed using
CDO and contains a single value. The precipitation variable represents the difference
in mean tropical (15°S to 15°N) precipitation between the RCP8.5 projection (2081-2100)
and the Historical period (1986-2005). The surface temperature variable represents the
corresponding value but for global mean temperature.

Figure 8.6.1.1: Scatter plot showing CMIP5 model global mean surface temperature change between
Historical (1986-2005) and RCP8.5 (2081-2100) versus the corresponding change in mean precipita-
tion in the tropics (15°S-15°N). The legend is placed to the right of the plot.

With each loop iteration one marker is placed on the scatter plot by the ax.scatter()
function in lines 35 and 36, representing one model. The two values in the variables tas
and pr are passed on to the function first. The keyword c stands for colour and
next(colors) returns the next colour of the colors object
created in line 21. The shape of the symbol is defined by the marker keyword and here
next(cycle_marker) cycles through the markers defined in line 22. The size s is
set to 10 and the model name is passed to the label keyword.
Lines 39 to 44 deal with formatting the x-axis and y-axis of the plot, setting axis limits
and labels. Note how the Unicode character for a delta symbol (∆) is included in the
axis labels. In addition, a superscript -1 is attached to day using mathtext to
indicate mm/day. Some more formatting of tick label sizes and grid lines is done in
lines 47 and 48.
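The axis label strings can be inspected on their own. In the sketch below the
backslash of the mathtext command is doubled, which is an adjustment made here
so that Python does not interpret the backslash followed by m as an escape
sequence. The \u escapes are still converted to Unicode characters.

```python
xlabel = '\u0394 Surface Temperature [\u00B0C]'

# the backslash of the mathtext command is doubled so that Python passes
# a single literal backslash on to Matplotlib
ylabel = '\u0394 Precipitation [mm day$\\mathregular{^{-1}}$]'

print(xlabel)  # Δ Surface Temperature [°C]
```

A raw string (prefix r) is not an option for these labels because it would
also switch off the \u escapes used for the delta and degree symbols.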
Placing legends can seem quite complicated at first. A good discussion of
solutions for placing legends and the use of the bbox_to_anchor keyword can
be found at Stack Overflow¹³.

Placing a legend within the plotting area of the current axis (ax) is reasonably
straightforward and can be achieved by using the location keyword loc with one
of the 11 options described in the documentation (e.g., best, upper left or center
right). However, placing a legend outside the plotting area is more challenging.
The solution employed here works as follows.
First, space on the right-hand side of the canvas is created in line 51 by moving the
right edge of the plotting area to 0.6 of the figure width (figure coordinates) using the
plt.subplots_adjust() function. The legend position is then controlled by the loc and
bbox_to_anchor keywords in lines 52 and 53. loc='upper left' here refers to the upper
left corner of the legend bounding box (not the plot). This corner of the legend bounding
box is then placed at the axis (ax) coordinates 1.02 and 1.0 as defined by bbox_to_anchor.
As the horizontal value of 1.02 is slightly larger than 1, which marks the right edge of
the plot area (ax), the legend is placed just to the right of the plot on the canvas. The
number of legend columns (ncol) is set to 2 in order to fit all 40 labels. Drawing a
frame around the legend has been turned off (frameon=False).
Finally, the plot is saved in lines 56 and 57 and closed in line 60.

The fig.tight_layout() function overrides the settings implemented by the
plt.subplots_adjust() function which is why it is not used in this script.
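The legend placement can be tested with a handful of placeholder entries. The
sketch below uses three made-up labels instead of the 40 model names and checks
after drawing that the legend starts to the right of the plotting area.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5.5, 3.98))
for i, name in enumerate(['model A', 'model B', 'model C']):  # placeholders
    ax.scatter(i, i, s=10, label=name)

# leave the right 40 % of the figure free for the legend
fig.subplots_adjust(right=0.6)

# anchor the legend's upper left corner just right of the axes
leg = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1.0), fontsize=6,
                ncol=2, frameon=False)

fig.canvas.draw()  # positions are only final after drawing
legend_left = leg.get_window_extent().x0   # display (pixel) coordinates
axes_right = ax.get_window_extent().x1
plt.close(fig)
```

If legend_left is larger than axes_right the legend does not overlap the
plotting area. The value passed to right may need adjusting when the labels
are longer or more columns are used.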

8.3.2 Scatter Plot with Divergent Colour Bar


The following example (Code 8.6.2.1) creates a scatter plot that shows biases in
observed surface pressure compared to ERA5 surface pressure as a function of station
elevation and corresponding ERA5 elevation for SYNOP-reporting stations in Africa
(Figure 8.6.2.1). Instead of a legend a colour bar is used here.

¹³https://stackoverflow.com/questions/4700614/how-to-put-the-legend-out-of-the-plot

Code 8.6.2.1: Simple scatter plot with divergent colourbar.

1 import numpy as np
2 import matplotlib.colors as mcolors
3 import matplotlib.pyplot as plt
4
5 # read data to plot
6 npzfile = np.load('../data/sp.selevs.eelevs.emean.smean.12.JJA.npz')
7 selevs = npzfile['selevs']
8 eelevs = npzfile['eelevs']
9 smean = npzfile['smean']
10 emean = npzfile['emean']
11
12 # test
13 print('Station elevations:', selevs.shape, np.nanmin(selevs), np.nanmax(selevs))
14 print('ERAI elevations:', eelevs.shape, np.nanmin(eelevs), np.nanmax(eelevs))
15 print('Station sfc pressure:', smean.shape, np.nanmin(smean), np.nanmax(smean))
16 print('ERAI sfc pressure:', emean.shape, np.nanmin(emean), np.nanmax(emean))
17
18 # subtract observed surface pressure from ERA5 surface pressure
19 fld = emean-smean
20
21 # setup figure and axes
22 fig, ax = plt.subplots(1, 1, figsize=(5.5, 3.98))
23
24 # set up discrete color table
25 cmap = plt.cm.seismic
26 bounds = np.linspace(-25, 25, num=11)
27 norm = mcolors.BoundaryNorm(bounds, cmap.N)
28
29 # scatter plot
30 scat = ax.scatter(selevs, eelevs, c=fld, s=6, marker='o', cmap=cmap, norm=norm)
31
32 # add diagonal dotted line
33 axmax = 2500
34 ax.plot([0.0, axmax],[0.0, axmax], c='k', linewidth=0.5, linestyle=':')
35
36 # format x axis
37 ax.set_xlim(0.0, axmax)
38 ax.set_xlabel('Station Elevation [m]', fontsize=8)
39 ax.tick_params(labelsize=7)
40
41 # format y axis
42 ax.set_ylim(0.0, axmax)
43 ax.set_ylabel('ERAI Elevation [m]', fontsize=8)
44
45 # colorbar
46 cbar = fig.colorbar(scat, orientation='vertical', shrink=0.7, extend='both')
47 cbar.set_label('Surface Pressure [hPa]', rotation=90, fontsize=8)
48 cbar.ax.tick_params(labelsize=6, length=0)
49 cbar.set_ticks(bounds)
50
51 # optimise layout
52 fig.tight_layout()
53
54 # save plot
55 fig.savefig('../images/7_python_scatter_plot_divergent_colorbar_300dpi.png',
56 format='png', dpi=300)
57
58 # close plot
59 plt.close()

The data to be plotted were calculated beforehand and the variables (NumPy arrays)
were saved in uncompressed format using the np.savez() function. Here, we use the
np.load() function to read them back in (lines 6 to 10). The variables selevs and eelevs
hold station elevation data and corresponding ERA5 surface elevation data for each
station, respectively. The variables smean and emean hold the observed mean surface
pressure data and the corresponding ERA5 surface pressure data for each station,
respectively. All four variables are one-dimensional NumPy arrays with 883 elements
(stations). For testing purposes the array shapes, minimum and maximum values of each
array are printed out in lines 13 to 16.
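The save and load steps can be sketched as a round trip. The arrays below are
placeholders with made-up values, and an in-memory buffer stands in for the
.npz file on disk so that the sketch does not write any files.

```python
import io
import numpy as np

# placeholder arrays standing in for station and reanalysis elevations
selevs = np.array([12.0, 340.0, 1510.0])
eelevs = np.array([30.0, 310.0, 1620.0])

buffer = io.BytesIO()          # in-memory stand-in for a .npz file on disk
np.savez(buffer, selevs=selevs, eelevs=eelevs)
buffer.seek(0)                 # rewind before reading

npzfile = np.load(buffer)
names = sorted(npzfile.files)  # names under which the arrays were stored
restored = npzfile['selevs']
```

The keyword names used in np.savez() become the keys for retrieving the arrays
after loading. In a script a file name would be passed instead of the buffer.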
In line 19 the observed surface pressure is subtracted from the ERA5 surface pressure
and the result is saved in a new variable named fld. This variable now has positive
values for stations where observed surface pressure is lower than in ERA5 and negative
values where the observed surface pressure is larger than in ERA5.
The figure and axis are set up in line 22 and the colour map is defined in lines 25 to 27.
A divergent colour map named seismic¹⁴ ranging from blue to red is used here with
values ranging from -25 to 25 in steps of 5 (defined using the np.linspace() function).
The scatter plot itself is created in line 30. The x and y coordinates for the markers are
given by the variables selevs and eelevs, respectively. The marker colour (c keyword)
is defined by the variable fld calculated earlier in line 19. The marker size (s keyword)
is set to 6 and the marker symbol (marker keyword) is set to a circle (o). The marker
colours are defined by the cmap and norm keywords.
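How the norm turns data values into discrete colours can be examined
separately. The sketch below assumes a colour map with 256 entries and maps
one value from the centre of each 5 hPa interval to its colour index.

```python
import matplotlib.colors as mcolors
import numpy as np

bounds = np.linspace(-25, 25, num=11)       # -25, -20, ..., 25
norm = mcolors.BoundaryNorm(bounds, 256)    # 256 assumed as colour map size

centres = bounds[:-1] + 2.5                 # one value inside each interval
idx = np.asarray(norm(centres))             # colour index for each value
n_colours = len(np.unique(idx))             # number of distinct colours used
```

The eleven boundaries define ten intervals, so only ten distinct colours are
picked from the colour map. All values falling into the same interval, for
example 1.0 and 4.9, receive the same colour.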
A diagonal dotted line is plotted in lines 33 and 34 between two points defined by
the data coordinates [0, 0] and [2500, 2500]. This line represents a zero difference
between station elevation and ERA5 surface elevation.
The x-axis and y-axis are formatted in lines 37 to 39 and 42 to 43, respectively.
A vertical colour bar with extended ends is added in lines 46 to 49. The handle scat
created in line 30 is passed on to the fig.colorbar() function.
Finally, the plot layout is optimised in line 52, the figure is saved in lines 55 and 56
and the figure is closed in line 59.
¹⁴https://matplotlib.org/tutorials/colors/colormaps.html

Figure 8.6.2.1: ERA-Interim minus observed mean surface pressure for SYNOP reporting stations
in Africa with at least 100 records for JJA 12 UTC. The biases are plotted as a function of station
elevation (x-axis) and corresponding spatially interpolated ERA-Interim elevation (y-axis). Markers
left of the dotted diagonal line are associated with ERAI elevation greater than station elevation
whereas markers on the right are associated with ERAI elevation less than station elevation.

The resulting plot can be seen in Figure 8.6.2.1 above. With only a few exceptions,
stations where ERA5 elevation is greater than the actual station elevation (left of the
dotted line) show a negative surface pressure bias whereas stations where ERA5
elevation is lower (right of the dotted line) show a positive surface pressure bias.
The magnitude of the bias (positive and negative) increases with distance from the
dotted line.

8.3.3 Scatter Plot on a Map with Colour Bar and Legend


The following code example (Code 8.6.3.1) shows how to add markers to a map
using the plt.scatter() function. This is useful, for instance, when plotting data from
surface stations or data representing cities. In this example the linear correlation
coefficients of observed 12 UTC 2m air temperature with corresponding spatially
interpolated ERA5 2m air temperatures for stations in Africa are plotted on a
map with the marker colour indicating the correlation and the size of the markers
indicating the number of available observations that went into the correlation
calculation (Figure 8.6.3.1). A colour bar and legend were added to the plot. For a
general introduction to plotting maps see Section 7.x.
Code 8.6.3.1: Using the plt.scatter() function to plot markers on a map including colour bar
and legend.
1 import numpy as np
2 import matplotlib.pyplot as plt
3 import matplotlib.ticker as mticker
4 import matplotlib.colors as mcolors
5 import cartopy.crs as ccrs
6 from cartopy import feature
7 from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
8
9 # read data from a NumPy zipped archive file (NPZ)
10 npzfile = np.load('../data/t2m_AfricaStns_12_JJA.npz')
11 lons = npzfile['lons']
12 lats = npzfile['lats']
13 nobs = npzfile['nobs']
14 corr = npzfile['corr']
15
16 # calculate size of marker based on number of observations
17 # stations with 100 or fewer (or more than 3500) obs keep NaN
18 sarr = (np.arange(5) * 4) + 2
19 msize = np.zeros((len(nobs))) * np.nan
20 nobsbins = [100, 500, 1000, 1500, 2500, 3500]
21 for n in np.arange(len(nobs)):
22     for x in np.arange(5):
23         if nobsbins[x] < nobs[n] <= nobsbins[x + 1]:
24             msize[n] = sarr[x]
25
26 # set up figure and map projection
27 fig, ax = plt.subplots(figsize=(5.5, 3.98),
28                        subplot_kw={'projection': ccrs.PlateCarree()})
29
30 # set up map
31 ax.set_extent([-20, 55, -40, 40], crs=ccrs.PlateCarree())
32 ax.stock_img()
33 ax.coastlines()
34 ax.add_feature(feature.BORDERS, linestyle='-', linewidth=0.5)
35 ax.add_feature(feature.LAKES, alpha=0.5)
36 ax.add_feature(feature.RIVERS)
37
38 # set up discrete colour table
39 cmap = plt.cm.hot
40 bounds = np.arange(0, 1.1, 0.1)
41 norm = mcolors.BoundaryNorm(bounds, cmap.N - 50)
42
43 # plot symbols
44 scat = ax.scatter(lons, lats, c=corr, s=msize, cmap=cmap, norm=norm,
45                   edgecolors='none', marker='s')
46
47 # format map gridlines and labels
48 gl = ax.gridlines(draw_labels=True, linewidth=0.5, color='black', alpha=0.5,
49                   linestyle=':')
50 gl.xlabels_top = False
51 gl.xlocator = mticker.FixedLocator(np.arange(-180,180,20))
52 gl.xformatter = LONGITUDE_FORMATTER
53 gl.xlabel_style = {'size':7, 'color':'black'}
54 gl.ylabels_right = False
55 gl.ylocator = mticker.FixedLocator(np.arange(-90,90,20))
56 gl.yformatter = LATITUDE_FORMATTER
57 gl.ylabel_style = {'size':7, 'color':'black'}
58
59 # colorbar
60 cbar = fig.colorbar(scat, orientation='vertical', shrink=0.7)
61 cbar.set_label('Correlation Coefficient', rotation=90, fontsize=7)
62 cbar.ax.tick_params(labelsize=7, length=0)
63 cbar.set_ticks(np.arange(11)/10)
64
65 # add legend by scatter plotting 5 imaginary markers with labels
66 for i in np.arange(5):
67     ax.scatter([], [], c='black', s=sarr[i], edgecolors='none', marker='s',
68                label=str(nobsbins[i])+'-'+str(nobsbins[i+1]))
69 ax.legend(loc='lower left', fontsize=7)
70
71 # optimise layout
72 fig.tight_layout()
73
74 # save figure to file
75 plt.savefig('../images/7_python_map_markers_300dpi.png', format='png', dpi=300)
76
77 # close plot
78 plt.close()

All packages and functions needed are imported in lines 1 to 7.


Data variables to be plotted were pre-calculated and saved in an uncompressed
NumPy archive (.npz) using the np.savez() function. Here we read the variables back
in using the np.load() function in lines 10 to 14. The variables lons and lats hold the
longitude and latitude station coordinates, respectively. The variables nobs and corr
hold the number of available observations for each station and the linear correlation
coefficients (Pearson’s r), respectively. All variables are one-dimensional NumPy
arrays of the same length with each element corresponding to a single station.
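The save and load round trip described above can be sketched as follows. This is a minimal illustration only: the station values and the file name stations_demo.npz are made up, and the file is written to a temporary directory rather than to the ../data/ folder used in Code 8.6.3.1.

```python
import os
import tempfile

import numpy as np

# Made-up station data for illustration only (three stations)
lons = np.array([-17.5, 31.1, 36.8])
lats = np.array([14.7, -17.8, -1.3])
nobs = np.array([250, 1200, 3100])
corr = np.array([0.62, 0.81, 0.47])

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'stations_demo.npz')

# np.savez() stores each keyword argument as a named array (uncompressed)
np.savez(path, lons=lons, lats=lats, nobs=nobs, corr=corr)

# np.load() returns a dictionary-like object; each array is read on access
with np.load(path) as npzfile:
    names = sorted(npzfile.files)
    corr_back = npzfile['corr']

print(names)
print(corr_back)
```

The keyword names passed to np.savez() become the keys used when reading the arrays back, which is why Code 8.6.3.1 can access npzfile['lons'] and the other variables by name.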
In lines 18 to 24 an array is created (msize) that holds the marker size value for each
station. The variable sarr created in line 18 is a NumPy array containing marker
size values ([2, 6, 10, 14, 18]) which are used later inside the loop (line 24). The
variable msize is created in line 19 initially with NaN (not a number) values. This
NumPy array is later filled with actual numbers. The variable nobsbins (line 20) defines the
bin edges for the number of observations. The outer loop (line 21) iterates over the stations,
and the inner loop (line 22) finds the bin that contains each station's number of observations
and stores the corresponding marker size in msize. Stations that fall outside all bins keep the
value NaN.
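As an alternative to the nested loop, the same binning can be written without explicit loops using np.digitize(). This is a sketch of a swapped-in technique, not the method used in Code 8.6.3.1. The helper name marker_sizes and the demo observation counts are hypothetical, and the bin edges and marker sizes are copied from the listing.

```python
import numpy as np

nobsbins = [100, 500, 1000, 1500, 2500, 3500]   # bin edges from Code 8.6.3.1
sarr = (np.arange(5) * 4) + 2                   # marker sizes [2, 6, 10, 14, 18]


def marker_sizes(nobs):
    """Return the marker size per station, NaN where nobs lies outside all bins."""
    nobs = np.asarray(nobs)
    # right=True gives index i with nobsbins[i-1] < nobs <= nobsbins[i]
    idx = np.digitize(nobs, nobsbins, right=True)
    # idx 0 means nobs <= 100, idx 6 means nobs > 3500: both stay NaN
    inside = (idx >= 1) & (idx <= len(sarr))
    msize = np.full(nobs.shape, np.nan)
    msize[inside] = sarr[idx[inside] - 1]
    return msize


# Made-up observation counts chosen to sit on and around the bin edges
demo_nobs = [50, 100, 101, 500, 501, 1000, 2500, 3500, 3501]
print(marker_sizes(demo_nobs))
```

The argument right=True reproduces the comparison nobsbins[x] < nobs[n] <= nobsbins[x + 1] from line 23, so a station with exactly 100 observations is left as NaN and a station with exactly 500 falls into the first bin.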

The figure and axes are set up in lines 27 and 28, setting the projection to ccrs.PlateCarree().
The general map characteristics are set in lines 31 to 36. The map domain and
coordinate reference system (crs) are set in line 31. The stock background image is
set in line 32 and coastlines are added in line 33. Additional features such as country
borders, lakes and rivers are added using the ax.add_feature() function in lines 34,
35 and 36, respectively.
A discrete colour table is set up in lines 39 to 41 using the hot¹⁵ colour map.
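The short sketch below illustrates how such a discrete colour table behaves. It uses the same bounds and the same second argument (cmap.N - 50) as lines 39 to 41, and looks up the colour index for one value in the first bin and one in the last bin. The sample values 0.05 and 0.95 are chosen for illustration, and the reading that the reduced number of colours keeps the palest end of the hot colour map unused is an assumption, not something stated in the listing.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

cmap = plt.get_cmap('hot')
bounds = np.arange(0, 1.1, 0.1)      # 11 boundaries, i.e. 10 bins of width 0.1
ncolors = cmap.N - 50                # restrict the range of colour indices used
norm = mcolors.BoundaryNorm(bounds, ncolors)

lowest = norm(0.05)     # colour index for a value in the first bin
highest = norm(0.95)    # colour index for a value in the last bin
print(cmap.N, lowest, highest)

# RGBA colour that the last bin is drawn with
print(cmap(int(highest)))
```

Every value inside one bin maps to the same colour index, which is what turns the continuous colour map into discrete steps. The last bin maps to index ncolors - 1 rather than cmap.N - 1, so the top 50 entries of the colour table are never used.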
The ax.scatter() function is used to plot markers for each station on the map.
The x and y coordinates of the markers are given by the lons and lats variables,
¹⁵https://matplotlib.org/tutorials/colors/colormaps.html