For climate data analysis it is important to be able to extract individual array values.
This is done using indices. See Section 4.3 for more details on indexing gridded
datasets. Selecting larger subsections from a NumPy array is known as
slicing. Both NumPy array indexing and slicing are discussed in the remainder
of this section using examples.
The following example commands executed on the Python command prompt show
how to index one-dimensional NumPy arrays.
The index -1 selects the last element of a NumPy array. This is a useful
shortcut when the length of an array is unknown.
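As a minimal sketch of one-dimensional indexing, the commands might look as follows. The array name temps and its values are made up for illustration only.

```python
import numpy as np

# hypothetical one-dimensional array holding six values
temps = np.array([12.5, 13.1, 14.8, 15.2, 13.9, 12.0])

first = temps[0]     # first element (indices start at 0)
third = temps[2]     # third element
last = temps[-1]     # last element, whatever the length of the array
subset = temps[1:4]  # slice: elements at index 1, 2 and 3 (end index excluded)

print(first, third, last, subset)
```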
Indices can also be used with two-dimensional arrays as shown in the following
examples.
Python - Programming Basics 140
Double colons (::) followed by a step can be used to select every n-th value of an
array within one dimension; for example, ::2 selects every other value.
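A short sketch of two-dimensional indexing and stepping is given below. The array grid is a hypothetical 3 by 4 example, not one of the datasets used in this book.

```python
import numpy as np

# hypothetical 3 x 4 array holding the values 0 to 11
grid = np.arange(12).reshape(3, 4)

value = grid[1, 2]          # row index 1, column index 2
row = grid[0, :]            # complete first row
column = grid[:, -1]        # complete last column
every_other = grid[0, ::2]  # every second value of the first row

print(value, row, column, every_other)
```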
The NumPy function np.savez() creates a NumPy-specific uncompressed file with the
file extension .npz.
The following code example saves the NumPy arrays lon2d, lat2d and field in a file
named mydata.npz. The file extension .npz will be added automatically to the file
name.
1 import numpy as np
2
3 # save NumPy variables to a file
4 np.savez('mydata', lon2d=lon2d, lat2d=lat2d, field=field)
To read the NumPy arrays saved in the file mydata.npz back in, the following
commands can be used.
1 import numpy as np
2
3 # read in NumPy variables from file
4 npzfile = np.load('mydata.npz')
5 lon2d = npzfile['lon2d']
6 lat2d = npzfile['lat2d']
7 field = npzfile['field']
a = 1/3
print(a)
print(str(a))
0.3333333333333333
0.3333333333333333
Both print() statements return the same number with many digits after the decimal
point. So a way needs to be found to control the precision of the floating point
number.
Other situations may include very large or very small numbers to be displayed using
exponent notation or padding numbers with zeros.
To format numbers in Python the str.format() method can be used. The general
syntax for printing a formatted number is as follows.
print('<FORMAT>'.format(<NUMBER>))
The <FORMAT> part needs to be replaced with the desired format and the <NUMBER> part
with the number to be formatted. A list of number format examples is given in Table
7.13.1.1.
Table 7.13.1.1: Number formatting examples using the str.format() method. Reproduced with
permission from Marcus Kazmierczak’s blog Python String Format Cookbook.
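A handful of commonly used format strings are sketched below. They are illustrative examples of the str.format() syntax and are not a reproduction of the table; the variable names are chosen for this sketch only.

```python
# two decimal places
two_decimals = '{:.2f}'.format(3.1415926)

# two decimal places with an explicit sign
with_sign = '{:+.2f}'.format(3.1415926)

# pad an integer with zeros to a width of two characters
zero_padded = '{:0>2d}'.format(5)

# thousands separator
thousands = '{:,}'.format(1000000)

# percentage with two decimal places
percent = '{:.2%}'.format(0.25)

# exponent notation with two decimal places
exponent = '{:.2e}'.format(1000000000)

print(two_decimals, with_sign, zero_padded, thousands, percent, exponent)
```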
An example for the use of the str.format() method in a Python script can be found
in Code 7.5.1.1 in lines 32 and 36. Here the value in the variable wspd is formatted
to have one decimal digit which is then used to annotate a data point on the plot
(Figure 7.5.1.1).
1 import numpy as np
2
3 days = np.linspace(1, 31, 31, dtype='int')
4
5 # loop through each element
6 for d in days:
7     print(d, str(d).zfill(2), str(d).zfill(5))
The np.linspace() function is used in line 3 to create a NumPy array of integer values,
stored in the variable days, ranging from 1 to 31. The loop set up in line 6 iterates over
each element of the variable days. A print() statement is executed in each iteration for
demonstration purposes. Three values are printed: first, the integer value d; second,
the integer value d converted to a string with zfill(2) applied; third, the integer
value d converted to a string with zfill(5) applied. Executing the above code
gives the following output.
1 01 00001
2 02 00002
3 03 00003
4 04 00004
5 05 00005
6 06 00006
7 07 00007
8 08 00008
9 09 00009
10 10 00010
11 11 00011
12 12 00012
13 13 00013
14 14 00014
15 15 00015
16 16 00016
17 17 00017
18 18 00018
19 19 00019
20 20 00020
21 21 00021
22 22 00022
23 23 00023
24 24 00024
25 25 00025
26 26 00026
27 27 00027
28 28 00028
29 29 00029
30 30 00030
31 31 00031
The zfill method takes only one value as an argument which is the total width of
the resulting string. It is the total character width after zero-padding was applied.
Applying zfill(2) will add a 0 in front of single digit values. Applying zfill(5) will
add four 0s in front of single digit values, three 0s in front of two digit values etc.
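The behaviour of zfill() can be checked with a few one-liners. The sketch below also shows the equivalent str.format() call and a typical use, building a date string for a file name; the date used is only an example.

```python
# zero-padding a single digit value
two_wide = str(7).zfill(2)
five_wide = str(7).zfill(5)

# a string that is already wider than the requested width is left unchanged
unchanged = str(123).zfill(2)

# the same padding expressed with str.format()
formatted = '{:02d}'.format(7)

# example use: year, month and day joined into a date string
datestr = str(2005) + str(3).zfill(2) + str(1).zfill(2)

print(two_wide, five_wide, unchanged, formatted, datestr)
```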
⁶https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.geopotential_to_height.html
1 netcdf era5_z_bodele_20050301_1200 {
2 dimensions:
3 lon = 1 ;
4 lat = 1 ;
5 level = 37 ;
6 time = UNLIMITED ; // (1 currently)
7 variables:
8 double lon(lon) ;
9 lon:standard_name = "longitude" ;
10 lon:long_name = "longitude" ;
11 lon:units = "degrees_east" ;
12 lon:axis = "X" ;
13 double lat(lat) ;
14 lat:standard_name = "latitude" ;
15 lat:long_name = "latitude" ;
16 lat:units = "degrees_north" ;
17 lat:axis = "Y" ;
18 double level(level) ;
19 level:standard_name = "air_pressure" ;
20 level:long_name = "pressure_level" ;
21 level:units = "millibars" ;
22 level:positive = "down" ;
23 level:axis = "Z" ;
24 double time(time) ;
25 time:standard_name = "time" ;
26 time:long_name = "time" ;
27 time:units = "hours since 1900-1-1 00:00:00" ;
28 time:calendar = "standard" ;
29 time:axis = "T" ;
30 short z(time, level, lat, lon) ;
31 z:standard_name = "geopotential" ;
⁷https://unidata.github.io/MetPy/latest/api/generated/metpy.calc.geopotential_to_height.html
32 z:long_name = "Geopotential" ;
33 z:units = "m**2 s**-2" ;
34 z:add_offset = 235494.040103126 ;
35 z:scale_factor = 7.34166874895091 ;
36 z:_FillValue = -32767s ;
37 z:missing_value = -32767s ;
38 ...
39 }
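The variable z in the header above is stored as packed 16-bit integers (type short) together with the attributes add_offset and scale_factor. The netCDF4 package normally applies the conversion automatically when the variable is read, so the following is only a sketch of the arithmetic behind it. The function name unpack is hypothetical.

```python
# attribute values copied from the file header above
add_offset = 235494.040103126
scale_factor = 7.34166874895091
fill_value = -32767


def unpack(packed):
    """Convert a packed integer to geopotential in m**2 s**-2."""
    if packed == fill_value:
        return None  # missing value
    return packed * scale_factor + add_offset


print(unpack(0), unpack(1000), unpack(-32767))
```

A packed value of 0 therefore corresponds to the offset itself, and each integer step changes the geopotential by one scale_factor.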
Geopotential is available for the 37 levels (line 5) of the ERA5 model and geopotential
values are given in m²/s² (line 33). The example outlined in Code 7.13.3.1 demonstrates
how the MetPy package can be used to convert geopotential to geopotential
height.
Code 7.13.3.1: Calculating height of pressure levels from ERA5 geopotential field.
1 import numpy as np
2 from netCDF4 import Dataset
3 from metpy.calc import geopotential_to_height
4 from metpy.units import units
5
6 # read netcdf file
7 f = Dataset('../data/era5_z_bodele_20050301_1200.nc', mode='r')
8 levs = f.variables['level'][:]
9 field = f.variables['z'][0,:,0,0]
10 f.close()
11 print('var type field:', type(field))
12
13 # register geopotential field with metpy conform units
14 gp = field * (units.meters**2 / units.seconds**2)
15 print('var type gp:', type(gp))
16
17 # calc height, gph is metpy variable with units attached
18 gph = geopotential_to_height(gp)
19 print('var type gph:',type(gph))
20
21 # unregister variable from metpy if needed; returns numpy array
22 height = gph.magnitude
23 print('var type height:', type(height))
24
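For orientation, the conversion can also be sketched in plain NumPy. The first function divides geopotential by standard gravity, which is the usual definition of geopotential height. The second function is an assumption about the kind of correction MetPy applies for gravity decreasing with height; its formula and the Earth radius value should be checked against the MetPy documentation before relying on it.

```python
import numpy as np

G0 = 9.80665       # standard gravity in m s**-2
RE = 6371008.7714  # mean Earth radius in m (assumed value)


def height_simple(geopotential):
    """Geopotential divided by standard gravity."""
    return geopotential / G0


def height_corrected(geopotential):
    """Height allowing for gravity decreasing with altitude (assumed form)."""
    return geopotential * RE / (G0 * RE - geopotential)


# geopotential corresponding to a geopotential height of 1000 m
phi = np.array([G0 * 1000.0])
print(height_simple(phi), height_corrected(phi))
```

Near the surface both results are almost identical; the difference grows with altitude.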
8. Python - Creating Plots
8.1 Matplotlib
The main Python package for anything related to creating plots is Matplotlib¹. In a
script, the Matplotlib package is usually imported in the following way.
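The scripts later in this chapter import the pyplot interface under the short name plt, which is also the widely used convention:

```python
import matplotlib.pyplot as plt
```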
While many climate-related plot examples can be found from Section 7.x onwards,
Matplotlib has a massive Gallery² with all kinds of plots and plotting-related code
examples. It is worth browsing through it to get an idea of what is possible.
Within the context of subplots, axes are Python objects and should not be
confused with the actual axes (x-axis or y-axis) of a graph.
¹https://matplotlib.org/
²https://matplotlib.org/gallery/index.html
Python - Creating Plots 150
While figures and axes can be defined explicitly in separate commands (e.g., fig =
plt.figure() and ax = fig.add_subplot()) the convenience function plt.subplots()
is used most of the time in this book. In its simplest form a figure and single axis
(subplot) object can be created using the following single command.
fig, ax = plt.subplots()
The following line of code returns the same as the code above, but the number of
subplots in the horizontal and vertical direction is explicitly defined (1 by 1 is the
default in the code above).
fig, ax = plt.subplots(1, 1)
The first value provided to the plt.subplots() function defines the number of subplots
in the vertical direction and the second value the number of subplots in the horizontal
direction. The following command creates a figure with eight subplots on a grid with
2 plots in the vertical direction and 4 plots in the horizontal direction.
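A minimal sketch of such a command, using the name axs for the returned array of axes, could look like this. The Agg backend is selected here only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# 2 rows (vertical direction) and 4 columns (horizontal direction)
fig, axs = plt.subplots(2, 4)

# axs is a two-dimensional NumPy array of axes objects
print(axs.shape)

plt.close(fig)
```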
The individual axes can now be referred to by using indexes as in axs[0, 0] to axs[1,
3]. The axes could also be explicitly named as in the following example.
fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8)) = plt.subplots(2, 4)
The plt.subplots() function also allows control of the page size, axis sharing and map
projection settings. In the example below taken from Code 8.12.3.1 (Figure 8.12.3.1)
a 3 by 4 grid of subplots is created (12 subplots in total) which share the x-axis and
y-axis and the Plate Carree map projection is set for all subplots.
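A hedged sketch of such a call is shown below; it is not the listing from Code 8.12.3.1. The figsize value is made up, and the map projection is only indicated in a comment because it requires the Cartopy package.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# 3 by 4 grid of subplots sharing the x-axis and y-axis; figsize is in inches
fig, axs = plt.subplots(3, 4, figsize=(8, 6), sharex=True, sharey=True)

# With Cartopy installed (import cartopy.crs as ccrs), a map projection could be
# requested for all subplots by adding the keyword argument
# subplot_kw={'projection': ccrs.PlateCarree()}

print(axs.shape)

plt.close(fig)
```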
The plt.subplots() function is not the only way to create plotting axes. One of the
advantages of using the plt.subplots() convenience function is that it makes it easy
to share plot axes (x-axis and y-axis on a graph). An example can be found in Code
8.10.1.1 line 22 and 23 (Figure 8.10.1.1). A good discussion of other ways to create
subplot axes can be found here³.
The fig object is mostly used at the end of a plotting script to optimise the space
used by the subplots and to save the whole ‘page’ to a file as shown in the following
example.
fig.tight_layout()
fig.savefig('filename.png', format='png')
It is also good practice to close the plot properly at the end of a script using
the plt.close() command.
³https://towardsdatascience.com/the-many-ways-to-call-axes-in-matplotlib-2667a7b06e06
⁴https://matplotlib.org/gallery/index.html#subplots-axes-and-figures
The name of the keyword that receives the colour name may be different
for other plotting functions. For instance, for the ax.scatter() function it is
just c.
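The difference between the two keywords can be sketched as follows. The data values and colour names are arbitrary examples.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# arbitrary example data
x = np.arange(5)
y = x ** 2

fig, ax = plt.subplots()

# ax.plot() takes the colour name through the keyword color
line, = ax.plot(x, y, color='steelblue')

# ax.scatter() takes the colour name through the keyword c
points = ax.scatter(x, y, c='orangered')

plt.close(fig)
```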
Figure 8.4.4.1: Matplotlib colour types and names (image sourced from Matplotlib).
Matplotlib colour maps are pre-defined sequences of colours that can be used when
plotting contours or markers in a plot where a distinction by plot colour is necessary.
A list of pre-defined colour maps and their names can be found on the Matplotlib
webpage¹¹.
The pre-defined colour maps are grouped into several categories such as perceptually
uniform sequential, sequential, diverging, cyclic, qualitative and miscellaneous. In
climate research sequential and diverging colour maps are most common. Diverging
colour maps are especially useful when plotting data that include negative and
positive values, such as those generated in anomaly and composite difference analysis.
In the following example the colour map named seismic is imported from the
Matplotlib package into the Python object cmap.
In some cases it is necessary to split up the pre-defined colour map into a number of
individual segments where the colours are clearly distinguishable from one another
(e.g., for contour plots). In essence, a colour map with continuous colours is indexed so
that colours in certain intervals are selected to create a colour map with discrete colours.
This is done by using the mcolors.BoundaryNorm() function as shown in the following
example (taken from Code 8.6.2.1).
¹¹https://matplotlib.org/tutorials/colors/colormaps.html
In the code example above the relevant packages are imported in lines 1 to 3. The
colour map plt.cm.seismic is loaded in line 6 and saved in the object cmap. A sequence
of numbers from -25 to 25 in steps of 5 is saved in the variable bounds in line 7. Line
8 is where the colour index for the discrete colour map is created. The bounds and
the number of colours from the colour map to be used (cmap.N returns the number
of colours in cmap) are passed to the mcolors.BoundaryNorm() function. The returned
colour map index is saved in norm. In the plotting command norm and the colour
map cmap are passed to the ax.scatter() function. The field variable fld holding the data
values is passed to the colour keyword argument c. The complete code (Code 8.6.2.1)
generates Figure 8.6.2.1.
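A sketch consistent with this description is given below. It is not the listing from Code 8.6.2.1, so its line layout does not match the line numbers quoted above, and the variables x, y and fld hold synthetic placeholder values.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# load the colour map and define the interval boundaries
cmap = plt.cm.seismic
bounds = np.arange(-25, 30, 5)  # -25, -20, ..., 20, 25

# create the index that maps data values to discrete colours
norm = mcolors.BoundaryNorm(bounds, cmap.N)

# synthetic placeholder data
x = np.arange(10)
y = np.arange(10)
fld = np.linspace(-24, 24, 10)

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=fld, cmap=cmap, norm=norm)
plt.close(fig)
```

Note that np.arange() excludes the end value, which is why 30 rather than 25 is given as the upper limit.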
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from openpyxl import load_workbook
4 from matplotlib.ticker import MultipleLocator
5
6 # open Excel file, sheet 'P01'
7 wb = load_workbook('../data/pibal_data.xlsx', data_only=True)
8 ws = wb['P01']
9
10 # read in date, time and location cells as strings
11 d = ws.cell(row=3, column=2).value
12 t = ws.cell(row=4, column=2).value
13 loc = ws.cell(row=5, column=2).value
14
15 # create empty numpy array variables
16 wspd = x = y = np.array([], dtype='float64')
17
18 # iterate over rows 8 to 39; read wind speed and x and y distance travelled
19 for row in range(8, 39):
20     wspd = np.append(wspd, np.float64(ws.cell(row=row, column=8).value))
21     x = np.append(x, np.float64(ws.cell(row=row, column=6).value))
22     y = np.append(y, np.float64(ws.cell(row=row, column=7).value))
23
24 # set up figure
25 fig, ax = plt.subplots(figsize=(5.5, 3.98))
26
27 # plot wind profile
28 ax.plot(x, y, color='steelblue', marker='o', markersize=2)
29
30 # plot data labels
31 for i in np.arange(0, 23):
32     ax.annotate("{:.1f}".format(wspd[i]), xy=(x[i], y[i]),
33                 xytext=(x[i]-25, y[i]+25), fontsize=5, fontweight='bold',
34                 horizontalalignment='right', verticalalignment='bottom')
35 for i in np.arange(23, len(wspd)):
36     ax.annotate("{:.1f}".format(wspd[i]), xy=(x[i], y[i]),
37                 xytext=(x[i]-25, y[i]-25), fontsize=5, fontweight='bold',
38                 horizontalalignment='right', verticalalignment='top')
39
40 # add title and set tick label size
41 ax.set_title('Pibal: '+loc+' '+d+' '+t, fontsize=12)
42 ax.tick_params(labelsize=7)
43
44 # format x axis
45 ax.set_xlabel('longitude direction [m]', fontsize=9)
46 ax.axes.set_xlim(-3100, 200)
47 ax.xaxis.set_minor_locator(MultipleLocator(100))
48
49 # format y axis
50 ax.set_ylabel('latitude direction [m]', fontsize=9)
51 ax.axes.set_ylim(-7000, 1000)
52 ax.yaxis.set_minor_locator(MultipleLocator(200))
53
54 # add gridlines
55 ax.grid(which='major', axis='both', linewidth=0.5, color='black', alpha=0.5,
56 linestyle=':')
57
58 # optimise layout
59 plt.tight_layout()
60
61 # save figure to file
62 plt.savefig('../images/7_python_line_plot_labels_300dpi.png',
63 orientation='portrait', format='png', dpi=300)
64
65 plt.close()
All necessary packages and functions used in the script are imported in lines 1 to 4.
For more details on how to read in data from an Excel spreadsheet see Section 7.2.3.3.
Here the Excel spreadsheet pibal_data.xlsx is opened for reading in line 7, creating
the handle wb, which is used in line 8 to select the sheet named P01, creating a new
handle ws. Date, time and location information is read in from the spreadsheet in lines
11 to 13. Three empty NumPy arrays are defined in line 16, which are filled with
wind speed (wspd), longitude distance (x) and latitude distance (y) values from the
spreadsheet in lines 19 to 22.
A figure (fig) with a single subplot (ax) is set up in line 25.
The x and y coordinates are used in line 28 to plot markers. The colour of the markers
and connecting line is set to steelblue. The marker is set to a filled circle (o) and size
2.
The data value labels associated with each marker are plotted using the ax.annotate()
function. The labels are added in two steps. The first 23 labels are added in lines 31 to
34 using horizontal alignment set to right and vertical alignment set to bottom. This
places the bottom right corner of the data value label closest to the associated marker
(Figure 8.5.1.1).
As the balloon track curves around moving towards the southeast the label position
has to be adjusted in order not to overlay the labels onto the line. The 24th to last
labels are added in lines 35 to 38. Here the horizontal alignment is set to right and the
vertical alignment is set to top. This places the top right corner of the data value label
closest to the associated marker (Figure 8.5.1.1).
The distance and position of the label in relation to the marker is controlled by the
xytext keyword. In this example, values of 25 (metres in data coordinate system) are
A plot title is added in line 41 using the date, time and location details retrieved from
the spreadsheet and the tick label size is adjusted to size 7 in line 42.
Further formatting is done for the x-axis and y-axis in lines 45 to 47 and 50 to 52,
respectively. This includes adding axis labels, axis limits and the distribution of minor
ticks. Dotted black grid lines with line width 0.5 are added for both axes in line 55
and 56.
Finally, the plot is optimised, saved and closed in lines 59, 62 to 63 and 65, respectively.
Figure 8.5.1.1: Line plot with markers and labels. Hodograph (birds-eye view) showing the track of
a pibal balloon from the release point at [0, 0].
This code example generates the plot shown in Figure 8.5.1.1. The balloon released
at the coordinates [0, 0] first travelled west-southwest (in the trade winds). As the
balloon rose the wind speed picked up and the distance between labels increased. After
travelling almost 2.8 km in the longitudinal direction the wind direction changed and
the balloon was caught in the westerly flow.
The plot created in the following section uses the data from the same pibal balloon
track. It shows that the change in wind direction occurs at an altitude of about 2 km
(Figure 8.5.2.1).
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from openpyxl import load_workbook
4 from matplotlib.ticker import MultipleLocator
5 import metpy.calc as mpcalc
6 from metpy.units import units
7
8 # open Excel file and iterate through sheets
9 wb = load_workbook('../data/pibal_data.xlsx', data_only=True)
10 ws = wb['P01']
11
12 # read in date, time and location cells as strings
13 d = ws.cell(row=3, column=2).value
14 t = ws.cell(row=4, column=2).value
15 loc = ws.cell(row=5, column=2).value
16
17 # create empty numpy array variables
18 alt = wspd = wdir = np.array([], dtype='float64')
19
20 # iterate over rows 8 to 39; read altitude, wind speed and wind direction
61
62 plt.close()
The first part of the script up to line 29 is very similar to the one developed in the
previous section (Code 8.5.1.1). This is because data are read in from the same Excel
spreadsheet. The only difference is that instead of columns 8, 6 and 7 we read in
columns 2, 8 and 12 for altitude, wind speed and wind direction saved in the variables
alt, wspd and wdir, respectively.
In line 32, wind speed on the x-axis is plotted against altitude on the y-axis. The
line colour is orangered. No further keyword arguments are passed to the ax.plot()
function resulting in an orange coloured line with no markers (Figure 8.5.2.1).
A title is added in line 35 and the x and y axes are formatted in line 38 to 40 and 43
to 45, respectively.
In addition to the vertical wind profile, wind arrows indicating the wind direction as
seen from a bird’s-eye view are plotted. This is done in two steps.
First, the U and V wind vector components need to be calculated from the wind speed
and wind direction values. This is done in lines 48 to 50 using the MetPy package.
The wind speed values saved in the variable wspd are registered with the unit
m/s using MetPy’s units registry. Similarly, the wind direction values are registered
with the unit degrees. Now the wind speed and wind direction variables can be passed
to the mpcalc.wind_components() function in line 50 which returns the U and V wind
vector components. They are saved in the variables u and v.
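The calculation behind this step can be sketched in plain NumPy, assuming the meteorological convention in which the wind direction is the direction the wind blows from, measured clockwise from north. This is an illustration of the underlying trigonometry rather than MetPy's own code, and it carries no units.

```python
import numpy as np


def wind_components(speed, direction_deg):
    """Return the U and V components for given wind speed and direction."""
    direction = np.deg2rad(direction_deg)
    u = -speed * np.sin(direction)
    v = -speed * np.cos(direction)
    return u, v


# two example winds of 10 m/s: one from the north (0°), one from the west (270°)
u, v = wind_components(np.array([10.0, 10.0]), np.array([0.0, 270.0]))
print(u, v)
```

A northerly wind gives a negative V component (air moving southwards), and a westerly wind gives a positive U component (air moving eastwards).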
Second, in order to plot the wind arrows in a vertical line on the plot a NumPy
variable x is created in line 51 which is of the same length as the alt variable. All
elements in x are set to 10. The arbitrary value of 10 represents the x coordinate for
the arrow plotting call in line 51.
The wind direction arrows are plotted in line 51 using the plt.quiver() function. The
function requires four mandatory input values (arrays). First, the x and alt variables
determine the location at which the arrows are going to be plotted. The u and v vector
components determine the arrow direction and length (associated with wind speed).
Understanding the keyword arguments that control the arrow properties can be
tricky and it is highly recommended to carefully read the plt.quiver() documentation.
Setting pivot to mid makes sure the middle of the arrow is located at the
coordinates provided by the variables x and alt. The arrow colour is set to black.
The unit length is chosen somewhat arbitrarily in this example as it does not relate
to either of the axes. However, the x-axis units are used as a reference here too. The
units and width keywords are used together to control the shaft width of the arrow.
The units keyword in the plt.quiver() function does not have an impact
on the arrow length.
The scale_units and scale keywords are used here to control the arrow length. Setting
scale_units to x means that x-axis units are used to draw the arrow length. Setting
scale to 10 means that the arrow length is scaled to a 10th of the x-axis units. For
example, if the wind speed is 5 m/s the associated arrow will be a 10th of the distance
between 0 and 5 on the x-axis.
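The keyword arguments discussed above can be tried out with a small stand-alone sketch. The profile values, the axis limits and the width value are made up for illustration and are not taken from the pibal data or the original listing.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# hypothetical profile: altitude in m, U and V wind components in m/s
alt = np.array([500.0, 1000.0, 1500.0, 2000.0])
u = np.array([-4.0, -5.0, -2.0, 3.0])
v = np.array([-1.0, -2.0, 0.0, 1.0])

# x coordinate of the arrows: same length as alt, all elements set to 10
x = np.full(alt.shape, 10.0)

fig, ax = plt.subplots()
ax.set_xlim(0, 12)
ax.set_ylim(0, 2500)

# units and width control the shaft width; scale_units and scale the length
q = ax.quiver(x, alt, u, v, pivot='mid', color='black',
              units='x', width=0.05, scale_units='x', scale=10)

plt.close(fig)
```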
The plot layout is optimised in line 56. Then the plot is saved and closed in line 59
and 62, respectively.
Figure 8.5.2.1: Line plot with arrows showing vertical wind profile and wind direction derived from
a Pibal track.
The resultant plot (Figure 8.5.2.1) shows the vertical wind profile as measured by
a pibal balloon. Analysing Figure 8.5.2.1 and Figure 8.5.1.1 together provides a
comprehensive picture of the wind within the lower 3 km of the atmosphere.
¹²https://www.metoffice.gov.uk/research/climate/maps-and-data/uk-climate-averages
Code 8.5.3.1: Plotting a graph with multiple lines, markers and legend.
39 # save plot
40 fig.savefig('../images/7_python_multiple_lines_legend_300dpi.png', format='png',
41 dpi=300)
42
43 # close plot
44 plt.close()
All packages needed are imported in lines 1 and 2 and rainfall data to be plotted are
entered manually into the script in line 6 to 9. Each variable (city name) holds twelve
values, one for each month. The figure and axis are set up in line 12.
In line 15 the variable mon is created using np.arange() which generates a sequence of
12 numbers starting at 0. The value 1 is added to each element of the sequence so that
the variable mon holds a sequence of 12 numbers ranging from 1 to 12. The variable
represents the months and is used as the x-axis variable in the plotting calls in the
following three lines.
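The construction of the month sequence can be verified quickly. The second variable shows an equivalent call and is added here for comparison only.

```python
import numpy as np

# sequence 0 to 11 with 1 added to every element
mon = np.arange(12) + 1

# equivalent: start at 1 and stop before 13
same = np.arange(1, 13)

print(mon)
```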
The rainfall data are plotted for Glasgow, Oxford and Cambridge in lines 16, 17 and
18, respectively. The colour, marker symbol and label are set accordingly for each
plotting call. The label is used to generate the legend later.
The x-axis is formatted in lines 21 to 23, adding an axis label, setting the x-axis
range from 1 to 12 (months) and setting major ticks for each month.
Similarly, the y-axis is formatted in lines 26 to 28, adding an axis label, setting the
y-axis range from 30 to 150 and adding minor ticks every 5 mm of rainfall.
In line 31 a title is added to the plot.
The legend is added in line 34. Only two keyword arguments are passed to the
ax.legend() function. The first one (loc='upper center') places the legend in the top
centre of the plotting area. The fontsize is set to 7 so that the legend fits nicely into
the plotting area.
The plot is completed by optimising the layout, saving it to a file and closing it in
lines 37, 40 to 41 and 44, respectively.
Figure 8.5.3.1: Example of multiple lines with markers and legend showing seasonal cycle of rainfall
for three cities in the UK.
Figure 8.5.3.1 shows the annual cycle of rainfall for Glasgow, Oxford and Cambridge.
Glasgow receives significantly more rainfall throughout the year than Oxford
and Cambridge and also shows a distinctly different annual cycle. As a result of
Cambridge being located further east than Oxford it receives slightly less rainfall
throughout the year than Oxford due to Atlantic storm track hitting the British Isles
67 # close plot
68 plt.close()
The packages and functions used in the code are imported in line 1 to 6.
The data are read in from an Excel spreadsheet in line 9 to 25 (see Section 7.2.3.3
for more details on how to read data from Excel spreadsheets). Temperature and
humidity values are read into the NumPy variables t2m and hum2m.
The date and time information is saved in two different Excel spreadsheet columns
(columns 1 and 2). These are read into the variables d and t in lines 20 and 21,
respectively. The date is automatically converted to a datetime.datetime object and
the time is converted to a datetime.time object. There is, however, one hiccup: when
the time in the Excel spreadsheet cell is 00:00 the time is not converted
to a datetime.time object. This is handled manually in lines 22 and 23 by including an
if-statement that checks for the variable type. If it is not a datetime.time object then it
creates one for the time 00:00:00 (midnight).
In line 24, the date and time objects are combined into a single datetime.datetime
object using the datetime.datetime.combine() function. The resulting object is saved
in a list named dates in line 25.
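The type check and the combination step can be sketched with the standard library alone. The helper name combine_date_time and the cell values are hypothetical; in particular, None merely stands in for whatever non-time value the spreadsheet reader returns for a 00:00 cell.

```python
import datetime


def combine_date_time(d, t):
    """Return one datetime.datetime built from a date cell and a time cell."""
    # anything that is not a datetime.time object is treated as midnight
    if not isinstance(t, datetime.time):
        t = datetime.time(0, 0, 0)
    return datetime.datetime.combine(d, t)


# hypothetical cell values; the second time cell was not converted
cells = [(datetime.datetime(2015, 4, 25), datetime.time(18, 0)),
         (datetime.datetime(2015, 4, 26), None)]

dates = []
for d, t in cells:
    dates.append(combine_date_time(d, t))

print(dates)
```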
The figure and a single axis (ax1) are set up in line 28.
In lines 31 to 35 the first variable (temperature) is plotted and the left y-axis is
formatted accordingly. The colour used for plotting temperature and the left y-axis
label and tick labels is set to tab:red. The y-axis label plotted in line 32 will be placed
next to the left-hand side y-axis. The temperature values (t2m) are plotted against the
dates saved in the dates list in line 33. The y-axis colour is set in line 34 and a plot
title is added in line 35 (in black, the default colour).
Next, a new axis (ax2) is created in line 38 which shares the same x-axis.
The ax.twinx() function can be used to create a second y-axis. It shares the
same x-axis.
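A minimal sketch of a plot with two y-axes is shown below. The temperature and humidity values are synthetic placeholders, not the measurements used in this section, and plain hours are used on the x-axis instead of dates to keep the sketch short.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# synthetic placeholder values, not measurements
hours = np.arange(0, 24, 3)
t2m = 18 + 4 * np.sin((hours - 9) / 24 * 2 * np.pi)
hum2m = 70 - 15 * np.sin((hours - 9) / 24 * 2 * np.pi)

# first variable on the left y-axis
fig, ax1 = plt.subplots()
ax1.plot(hours, t2m, color='tab:red')
ax1.set_ylabel('Temperature [°C]', color='tab:red')
ax1.tick_params(axis='y', labelcolor='tab:red')

# second axes object with its own y-axis on the right, sharing the x-axis
ax2 = ax1.twinx()
ax2.plot(hours, hum2m, color='tab:blue')
ax2.set_ylabel('Relative humidity [%]', color='tab:blue')
ax2.tick_params(axis='y', labelcolor='tab:blue')

# confirm that both axes objects share the x-axis
shares_x = ax1.get_shared_x_axes().joined(ax1, ax2)

plt.close(fig)
```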
The second variable (humidity) is plotted and the right y-axis is formatted
accordingly in lines 41 to 44. The colour used for plotting humidity and the right
y-axis label and tick labels is set to tab:blue. The right y-axis (ax2) label is set in line 42
and the humidity variable hum2m is plotted against the dates list in line 43. The colour
of the right y-axis ticks is set in line 44.
The time axis (x-axis) is formatted in lines 47 to 54. The x-axis range is set by
providing two datetime.datetime objects to the ax1.set_xlim() function. The x-axis
limits are from 6 UTC on 25 April 2015 to 18 UTC on 30 April 2015. Major ticks are
plotted for every 6-hour interval (line 49) and minor ticks are plotted for every 3-hour
interval (line 53). The format of the date/time label is set to '%Y-%m-%d %H:%M' in line 50
and 51 which produces labels such as 2015-04-25 06:00. In line 54, the figure is made
aware of the fact that the x-axis represents dates by using the fig.autofmt_xdate()
function. The x-axis labels will be formatted accordingly.
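The effect of the format string can be checked with the standard datetime module, since Matplotlib's date formatter uses the same strftime-style codes. The sketch below builds four 6-hourly time stamps starting at the lower x-axis limit and formats them; the variable names are chosen for this sketch only.

```python
import datetime

# lower x-axis limit: 6 UTC on 25 April 2015
start = datetime.datetime(2015, 4, 25, 6, 0)

# four time stamps at 6-hour intervals, as used for the major ticks
ticks = [start + datetime.timedelta(hours=6 * i) for i in range(4)]

# format each time stamp with the format string used for the tick labels
labels = [t.strftime('%Y-%m-%d %H:%M') for t in ticks]

print(labels)
```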
Vertical dotted grid lines are added in line 57 and 58.
The plot layout is optimised, the figure is saved to a file and then closed in lines 61,
64 to 65 and 68, respectively.
Figure 8.5.4.1: Graph with multiple lines and two different scales on the left and right y-axis showing
temperature and humidity over 6 days at Puerto de la Cruz, Tenerife.
Code 8.5.5.1: Plotting a graph with multiple lines and standard deviations using the fill_between()
function.
1 import matplotlib.pyplot as plt
2 import numpy as np
3 from netCDF4 import Dataset
4 from matplotlib.ticker import MultipleLocator
5
6 # read data
7 f = Dataset('../data/era5_sfcwind_ymonmean_bodele.nc', mode='r')
8 era5mean = f.variables['si10'][:,0,0]
9 f.close()
10 f = Dataset('../data/era5_sfcwind_ymonstd_bodele.nc', mode='r')
11 era5std = f.variables['si10'][:,0,0]
12 f.close()
13 f = Dataset('../data/erai_sfcwind_ymonmean_bodele.nc', mode='r')
14 eraimean = f.variables['wspd10m'][:,0,0]
15 f.close()
16 f = Dataset('../data/erai_sfcwind_ymonstd_bodele.nc', mode='r')
17 eraistd = f.variables['wspd10m'][:,0,0]
18 f.close()
19
20 # setup figure and axes
21 fig, ax = plt.subplots(1, 1, figsize=(3.98, 5.5))
22
23 # plot data
24 mon = np.arange(12)+1
25 ax.plot(mon, era5mean, color='skyblue', label='ERA5')
26 ax.fill_between(mon, era5mean+era5std, era5mean-era5std, facecolor='skyblue',
27 alpha=0.5)
28 ax.plot(mon, eraimean, color='lightcoral', label='ERA-Interim')
29 ax.fill_between(mon, eraimean+eraistd, eraimean-eraistd, facecolor='lightcoral',
30 alpha=0.5)
31
32 # format x axis
33 ax.set_xlabel('Month')
34 ax.axes.set_xlim(1, 12)
35 ax.xaxis.set_major_locator(MultipleLocator(1))
36
37 # format y axis
38 ax.set_ylabel('Wind Speed [m/s]')
39 ax.axes.set_ylim(2, 10)
40 ax.yaxis.set_major_locator(MultipleLocator(1))
41 ax.yaxis.set_minor_locator(MultipleLocator(0.5))
42
43 # add title
44 ax.set_title('10m Wind Speed Bodele LLJ\n1979-2012')
45
46 # add legend
47 ax.legend(loc='lower left', fontsize=8)
48
49 # optimise layout
50 fig.tight_layout()
51
52 # save plot
53 fig.savefig('../images/7_python_multiple_lines_fill_between_300dpi.png',
54 format='png', dpi=300)
55
56 # close plot
57 plt.close()
All packages and functions used in the script are imported in lines 1 to 4.
The long-term monthly mean wind speed values and corresponding standard deviations
are read into the variables era5mean and era5std for ERA5 in lines 7 to 9 and 10
to 12, respectively. The fields are read in for ERA-Interim into the variables eraimean
and eraistd in lines 13 to 15 and 16 to 18, respectively.
A figure (fig) and an axis (ax) are set up in line 21.
A sequence of numbers from 1 to 12 representing the months is saved into the variable
mon in line 24. This variable is used in plotting commands in the following lines.
ERA5 mean wind speed is plotted in line 25 with the colour skyblue and the label
ERA5 (used in the legend). In line 26 the ax.fill_between() function is used to plot
the associated standard deviations. The first argument the function expects is the
variable holding the x-axis coordinate values (mon). The second and third arguments
hold the y-axis values and the area between them will be filled in. In this example
the standard deviation values (era5std) are added to the mean (era5mean) to create
the upper boundary of the fill area and subtracted from the mean to create the lower
boundary of the fill area (compare with Figure 8.5.5.1). The fill colour is set to skyblue
(same as for the mean values line) but alpha is set to 0.5 to make the fill area
semi-transparent.
The same plotting routines are repeated for ERA-Interim in lines 28 to 30.
The x-axis and y-axis are formatted in lines 33 to 35 and 38 to 41, respectively and a
title is added in line 44.
A legend is placed in the lower left corner of the plotting area with a font size of 8
using the ax.legend() function. The legend will use the labels defined in the plotting
commands in line 25 and 28.
Finally, the plot layout is optimised, the figure saved to a file and closed in lines 50,
53 to 54 and 57, respectively.
Figure 8.5.5.1: Plot of long-term mean and standard deviation of 10 m wind speed at the Bodele
Depression (Chad) as seen in ERA5 and ERA-Interim.
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from netCDF4 import Dataset
4 from itertools import cycle
5
6 # list of model names
7 modlist = ['ACCESS1-0', 'ACCESS1-3', 'bcc-csm1-1-m', 'bcc-csm1-1', 'BNU-ESM',
8 'CanESM2', 'CCSM4', 'CESM1-BGC', 'CESM1-CAM5', 'CMCC-CESM',
9 'CMCC-CM', 'CMCC-CMS', 'CNRM-CM5', 'CSIRO-Mk3-6-0', 'EC-EARTH',
10 'FGOALS-g2', 'FIO-ESM', 'GFDL-CM3', 'GFDL-ESM2G', 'GFDL-ESM2M',
11 'GISS-E2-H-CC', 'GISS-E2-H', 'GISS-E2-R-CC', 'GISS-E2-R',
12 'HadGEM2-AO', 'HadGEM2-CC', 'HadGEM2-ES', 'inmcm4', 'IPSL-CM5A-LR',
13 'IPSL-CM5A-MR', 'IPSL-CM5B-LR', 'MIROC5', 'MIROC-ESM-CHEM',
14 'MIROC-ESM', 'MPI-ESM-LR', 'MPI-ESM-MR', 'MRI-CGCM3', 'MRI-ESM1',
15 'NorESM1-ME', 'NorESM1-M']
16
17 # set up figure and axes
18 fig, ax = plt.subplots(figsize=(5.5, 3.98))
19
20 # set up sequences of colours and symbols
21 colors = iter(plt.cm.gist_rainbow(np.linspace(0, 1, len(modlist))))
22 cycle_marker = cycle(['o', 'v', '^', '<', '>', 's'])
23
24 # loop through models
25 for i, m in enumerate(modlist):
26 # read two model files (pr and tas)
27 f = Dataset('../data/cmip5/pr_Amon_'+m+'_rcp85-hist.nc', mode='r')
28 pr = f.variables['pr'][:,0,0]
29 f.close()
30 f = Dataset('../data/cmip5/tas_Amon_'+m+'_rcp85-hist.nc', mode='r')
31 tas = f.variables['tas'][:,0,0]
32 f.close()
33
34 # scatter data
35 ax.scatter(tas, pr, c=next(colors), marker=next(cycle_marker),
36 s=10, label=m)
37
38 # format x-axis
39 ax.set_xlim(2.25, 5.25)
40 ax.set_xlabel('\u0394 Surface Temperature [\u00B0C]', fontsize=7)
41
42 # format y-axis
43 ax.set_ylim(0, 0.7)
44 ax.set_ylabel('\u0394 Precipitation [mm day$\\mathregular{^{-1}}$]', fontsize=7)
45
46 # formats for both axes
47 ax.tick_params(axis='both', which='major', labelsize=6)
48 ax.grid(linewidth=0.5, color='black', alpha=0.5, linestyle=':')
49
50 # add legend
51 plt.subplots_adjust(right=0.6)
52 ax.legend(loc='upper left', bbox_to_anchor= (1.02, 1.0), fontsize= 6, ncol=2,
53 frameon=False)
54
55 # save figure to file
56 plt.savefig('../images/7_python_scatter_plot_legend_300dpi.png', format='png',
57 dpi=300)
58
59 # close plot
60 plt.close()
In lines 7 to 15 a list called modlist is created that contains the names of 40 models.
These are part of the input filenames. In line 18 the plot is set up.
Each symbol on the scatter plot can be identified by its marker shape and colour
(Figure 8.6.1.1). A sequence of 40 unique colours is created in line 21 based on the
plt.cm.gist_rainbow colour map. The built-in Python function iter() is used here to
create an iterator object named colors that yields the colours for the scatter plot
markers one at a time.
In line 22 a sequence of 6 markers is passed on to the cycle function from the
itertools package creating an object called cycle_marker that can be cycled over.
Both the iterable colors object and the cyclable cycle_marker object are used later in
the scatter() function in line 35.
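The difference between the two objects can be sketched with plain lists. The colour names and the three markers below are placeholders chosen for illustration and are not the values used in the script.

```python
from itertools import cycle

# a plain iterator yields each element once and is then exhausted
colors = iter(['red', 'green', 'blue'])
first_three = [next(colors) for _ in range(3)]
exhausted = next(colors, None)  # default returned instead of StopIteration

# cycle() starts again from the first element when it reaches the end
cycle_marker = cycle(['o', 'v', '^'])
five_markers = [next(cycle_marker) for _ in range(5)]

print(first_three)
print(exhausted)
print(five_markers)
```

In the script the colors iterator holds exactly as many colours as there are models, so it is never exhausted inside the loop, whereas the 6 markers have to repeat in order to cover all 40 models.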
The loop set up in line 25 iterates over the elements of modlist with the variable i
being a counter starting at 0 and m being the model name.
The data to be plotted are read in lines 27 to 32 yielding the NumPy arrays pr
for precipitation and tas for surface temperature. Each file has been processed using
CDO and contains a single value. The precipitation variable represents the difference
in mean tropical (15°S to 15°N) precipitation between the RCP8.5 projection (2081-2100)
and the Historical period (1986-2005). The surface temperature variable represents the
corresponding difference in global mean temperature.
Figure 8.6.1.1: Scatter plot showing CMIP5 model global mean surface temperature change between
Historical (1986-2005) and RCP8.5 (2081-2100) versus the corresponding change in mean
precipitation in the tropics (15°S-15°N). The legend is placed to the right of the plot.
With each loop iteration one marker is placed on the scatter plot by the ax.scatter()
function in line 35 representing one model. The two values in the variables tas and
pr are passed on to the function first. The keyword c stands for colour and the
next(colors) function will automatically move to the next colour of the colors object
created in line 21. The shape of the symbol is defined by the marker keyword and here
the next(cycle_marker) will cycle through the markers defined in line 22. The size s is
set to 10 and the model name is passed to the label keyword.
Lines 39 to 44 deal with formatting the x-axis and y-axis of the plot, setting axis limits
and labels. Note how the Unicode character for a delta symbol (∆) is included in the
axis labels. Also, a superscript of -1 is attached to day using Matplotlib's math text
notation to indicate mm/day. Some more formatting of tick label sizes and grid lines is
done in lines 47 and 48.
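The two label strings can be inspected without creating a plot. In the sketch below the backslash in front of mathregular is doubled. This is an assumption about how best to write the string: it keeps Python from treating \m as an escape sequence while the \u escapes are still interpreted.

```python
# \u0394 and \u00B0 are Unicode escapes for the delta and degree signs
xlabel = '\u0394 Surface Temperature [\u00B0C]'

# the backslash of the math text command is doubled so that Python does
# not treat \m as an escape sequence in this non-raw string
ylabel = '\u0394 Precipitation [mm day$\\mathregular{^{-1}}$]'

print(xlabel)
print(ylabel)
```

A raw string (prefix r) would also keep the backslash, but it would leave the \u0394 escape uninterpreted, which is why the doubled backslash is used here.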
Placing a legend within the plotting area of the current axis (ax) is reasonably
straightforward and can be achieved by using the location keyword loc with one
of the 11 options described in the documentation (e.g., best, upper left or center
right). However, placing a legend outside the plotting area is more challenging.
The solution employed here works as follows.
First, a space on the right-hand side of the canvas is created by reducing the plotting
area on the right to just 0.6 (figure coordinates) using the plt.subplots_adjust()
function. The legend position is then controlled by the loc and bbox_to_anchor
keywords. loc='upper left' here refers to the upper left corner of the legend bounding
box (not the plot). This corner of the legend bounding box is then placed at the axis
(ax) coordinates 1.02 and 1.0 as defined by the bbox_to_anchor. As the horizontal value
of 1.02 is slightly larger than the plot area (ax) the legend is placed just to the right of
the plot on the canvas. The number of legend columns (ncol) is set to 2 in order to fit
all 40 labels. Drawing a frame around the legend has been turned off (frameon=False).
Finally, the plot is saved in lines 56 and 57 and closed in line 60.
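The legend placement can be tried out in isolation with a few made-up points. The following is a minimal sketch, assuming four hypothetical labels in place of the 40 model names and the Agg backend so that no display is needed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5.5, 3.98))

# four made-up points standing in for the model markers
for k in range(4):
    ax.scatter(k, k * k, s=10, label='model ' + str(k))

# shrink the axes so that they end at 60 % of the figure width
plt.subplots_adjust(right=0.6)

# anchor the upper left corner of the legend just right of the axes
leg = ax.legend(loc='upper left', bbox_to_anchor=(1.02, 1.0), fontsize=6,
                ncol=2, frameon=False)

right_edge = ax.get_position().x1   # right edge of axes in figure coordinates
n_entries = len(leg.get_texts())
plt.close(fig)
```

If the legend is cut off in the saved image, a smaller value for right in plt.subplots_adjust() leaves more room on the canvas.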
¹³https://stackoverflow.com/questions/4700614/how-to-put-the-legend-out-of-the-plot
1 import numpy as np
2 import matplotlib.colors as mcolors
3 import matplotlib.pyplot as plt
4
5 # read data to plot
6 npzfile = np.load('../data/sp.selevs.eelevs.emean.smean.12.JJA.npz')
7 selevs = npzfile['selevs']
8 eelevs = npzfile['eelevs']
9 smean = npzfile['smean']
10 emean = npzfile['emean']
11
12 # test
13 print('Station elevations:', selevs.shape, np.nanmin(selevs), np.nanmax(selevs))
14 print('ERAI elevations:', eelevs.shape, np.nanmin(eelevs), np.nanmax(eelevs))
15 print('Station sfc pressure:', smean.shape, np.nanmin(smean), np.nanmax(smean))
16 print('ERAI sfc pressure:', emean.shape, np.nanmin(emean), np.nanmax(emean))
17
18 # subtract observed surface pressure from ERA-Interim surface pressure
19 fld = emean-smean
20
21 # setup figure and axes
22 fig, ax = plt.subplots(1, 1, figsize=(5.5, 3.98))
23
24 # set up discrete color table
25 cmap = plt.cm.seismic
26 bounds = np.linspace(-25, 25, num=11)
27 norm = mcolors.BoundaryNorm(bounds, cmap.N)
28
29 # scatter plot
30 scat = ax.scatter(selevs, eelevs, c=fld, s=6, marker='o', cmap=cmap, norm=norm)
31
32 # add diagonal dotted line
33 axmax = 2500
34 ax.plot([0.0, axmax],[0.0, axmax], c='k', linewidth=0.5, linestyle=':')
35
36 # format x axis
37 ax.set_xlim(0.0, axmax)
38 ax.set_xlabel('Station Elevation [m]', fontsize=8)
39 ax.tick_params(labelsize=7)
40
41 # format y axis
42 ax.set_ylim(0.0, axmax)
43 ax.set_ylabel('ERAI Elevation [m]', fontsize=8)
44
45 # colorbar
46 cbar = fig.colorbar(scat, orientation='vertical', shrink=0.7, extend='both')
47 cbar.set_label('Surface Pressure [hPa]', rotation=90, fontsize=8)
48 cbar.ax.tick_params(labelsize=6, length=0)
49 cbar.set_ticks(bounds)
50
51 # optimise layout
52 fig.tight_layout()
53
54 # save plot
55 fig.savefig('../images/7_python_scatter_plot_divergent_colorbar_300dpi.png',
56 format='png', dpi=300)
57
58 # close plot
59 plt.close()
The data to be plotted were calculated beforehand and the variables (NumPy arrays)
were saved in uncompressed format using the np.savez() function. Here, we use the
np.load() function to read them back in lines 6 to 10. The variables selevs and eelevs
hold station elevation data and corresponding ERA-Interim surface elevation data for
each station, respectively. The variables smean and emean hold the observed mean surface
pressure data and the corresponding ERA-Interim surface pressure data for each station,
respectively. All four variables are one-dimensional NumPy arrays with 883 elements (stations).
For testing purposes the array shapes, minimum and maximum values of each array
are printed out in lines 13 to 16.
In line 19 the observed surface pressure is subtracted from the ERA-Interim surface
pressure and the result is saved in a new variable named fld. This variable now has
positive values for stations where the observed surface pressure is lower than in
ERA-Interim and negative values where the observed surface pressure is higher than in
ERA-Interim.
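The sign convention can be checked with a few numbers. The pressures below are invented values for three hypothetical stations and serve only to illustrate the subtraction.

```python
import numpy as np

# made-up surface pressures in hPa for three hypothetical stations
smean = np.array([1010.0, 950.0, 880.0])   # observed
emean = np.array([1005.0, 950.0, 892.0])   # reanalysis

# reanalysis minus observation
fld = emean - smean

print(fld)
print(np.sign(fld))
```

The first station has a lower pressure in the reanalysis than observed and therefore a negative value in fld, the second shows no difference and the third a positive one.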
The figure and axes are set up in line 22 and the colour map is defined in lines 25 to 27.
A divergent colour map named seismic¹⁴ ranging from blue to red is used here with
values ranging from -25 to 25 in steps of 5 (defined using the np.linspace() function).
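How the boundaries translate into discrete colours can be sketched as follows. The test values 1.0, 4.0 and 6.0 are arbitrary and were chosen only to fall into the same or neighbouring 5 hPa bins.

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

cmap = plt.cm.seismic
bounds = np.linspace(-25, 25, num=11)   # 11 boundaries, 10 bins of width 5
norm = mcolors.BoundaryNorm(bounds, cmap.N)

# values inside the same bin are mapped to the same colour index ...
same_bin = norm(1.0) == norm(4.0)
# ... while values in neighbouring bins are not
next_bin = norm(1.0) != norm(6.0)

print(bounds)
print(same_bin, next_bin)
```

Using num=11 yields 11 boundaries and therefore 10 colour bins, 5 on either side of zero, which suits a diverging colour map.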
The scatter plot itself is created in line 30. The x and y coordinates for the markers are
given by the variables selevs and eelevs, respectively. The marker colour (c keyword)
is defined by the variable fld calculated earlier in line 19. The marker size (s keyword)
is set to 6 and the marker symbol (marker keyword) is set to a circle (o). The marker
colours are defined by the cmap and norm keywords.
A diagonal dotted line is plotted in lines 33 and 34 between two points defined by
the data coordinates [0, 0] and [2500, 2500]. This line represents a zero difference
between station elevation and ERA-Interim surface elevation.
The x-axis and y-axis are formatted in lines 37 to 39 and 42 to 43, respectively.
A vertical colour bar with extended ends is added in lines 46 to 49. The handle scat
created in line 30 is passed on to the fig.colorbar() function.
Finally, the plot layout is optimised in line 52, the figure is saved in lines 55 and 56
and the figure is closed in line 59.
¹⁴https://matplotlib.org/tutorials/colors/colormaps.html
Figure 8.6.2.1: ERA-Interim minus observed mean surface pressure for SYNOP reporting stations
in Africa with at least 100 records for JJA 12 UTC. The biases are plotted as a function of station
elevation (x-axis) and corresponding spatially interpolated ERA-Interim elevation (y-axis). Markers
left of the dotted diagonal line are associated with ERAI elevation greater than station elevation
whereas markers on the right are associated with ERAI elevation less than station elevation.
The resulting plot can be seen in Figure 8.6.2.1 above. With only a few exceptions,
stations where the ERA-Interim elevation is greater than the actual station elevation
(left of the dotted line) show a negative surface pressure bias whereas stations where
the ERA-Interim elevation is lower (right of the dotted line) show a positive surface
pressure bias. The magnitude of the bias (positive and negative) increases with distance
from the dotted line.
33 ax.coastlines()
34 ax.add_feature(feature.BORDERS, linestyle='-', linewidth=0.5)
35 ax.add_feature(feature.LAKES, alpha=0.5)
36 ax.add_feature(feature.RIVERS)
37
38 # set up discrete colour table
39 cmap = plt.cm.hot
40 bounds = np.arange(0, 1.1, 0.1)
41 norm = mcolors.BoundaryNorm(bounds, cmap.N - 50)
42
43 # plot symbols
44 scat = ax.scatter(lons, lats, c=corr, s=msize, cmap=cmap, norm=norm,
45 edgecolors='none', marker='s')
46
47 # format map gridlines and labels
48 gl = ax.gridlines(draw_labels=True, linewidth=0.5, color='black', alpha=0.5,
49 linestyle=':')
50 gl.xlabels_top = False
51 gl.xlocator = mticker.FixedLocator(np.arange(-180,180,20))
52 gl.xformatter = LONGITUDE_FORMATTER
53 gl.xlabel_style = {'size':7, 'color':'black'}
54 gl.ylabels_right = False
55 gl.ylocator = mticker.FixedLocator(np.arange(-90,90,20))
56 gl.yformatter = LATITUDE_FORMATTER
57 gl.ylabel_style = {'size':7, 'color':'black'}
58
59 # colorbar
60 cbar = fig.colorbar(scat, orientation='vertical', shrink=0.7)
61 cbar.set_label('Correlation Coefficient', rotation=90, fontsize=7)
62 cbar.ax.tick_params(labelsize=7, length=0)
63 cbar.set_ticks(np.arange(11)/10)
64
65 # add legend by scatter plotting 5 imaginary markers with labels
66 for i in np.arange(5):
67 ax.scatter([], [], c='black', s=sarr[i], edgecolors='none', marker='s',
68 label=str(nobsbins[i])+'-'+str(nobsbins[i+1]))
69 ax.legend(loc='lower left', fontsize=7)
70
71 # optimise layout
72 fig.tight_layout()
73
74 # save figure to file
75 plt.savefig('../images/7_python_map_markers_300dpi.png', format='png', dpi=300)
76
77 # close plot
78 plt.close()
The figure and axes are set up in lines 27 and 28 setting the projection to ccrs.PlateCarree().
The general map characteristics are set in lines 31 to 36. The map domain and
coordinate reference system (crs) are set in line 31. The stock background image is
set in line 32 and coastlines are added in line 33. Additional features such as country
borders, lakes and rivers are added using the ax.add_feature() function in lines 34,
35 and 36, respectively.
A discrete colour table is set up in lines 39 to 41 using the hot¹⁵ colour map.
The ax.scatter() function is used to plot markers for each station on the map.
The x and y coordinates of the markers are given by the lons and lats variables,
¹⁵https://matplotlib.org/tutorials/colors/colormaps.html