Solar Energy 199 (2020) 685–693

Contents lists available at ScienceDirect

Solar Energy
journal homepage: www.elsevier.com/locate/solener

Data Article

irradpy: Python package for MERRA-2 download, extraction and usage for clear-sky irradiance modelling
Jamie M. Bright a,*, Xinyu Bai b, Yue Zhang d, Xixi Sun c,d, Brendan Acord e, Peng Wang c,d,f,g

a Solar Energy Research Institute of Singapore (SERIS), National University of Singapore (NUS), Singapore 117574, Singapore
b School of Computer Science and Engineering, Beihang University, Beijing, China
c Laboratory of Mathematics, Information and Behavior, Beihang University, Beijing, China
d School of Mathematical Sciences, Beihang University, Beijing, China
e Syene Clean Energy, Hong Kong, China
f School of Microelectronics, Beihang University, Beijing, China
g Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China

ARTICLE INFO

Keywords: MERRA-2; Reanalysis; Clear-sky irradiance; Data article; Python; REST2; irradpy

ABSTRACT

Clear-sky irradiance modelling requires many input variables to access the best available methodologies in literature. It is the authors’ experience that there are unnecessary computational barriers to obtaining and using the data needed for worldwide clear-sky modelling. For this reason, we present a data article introducing a new Python package called irradpy that, in this first release, (i) downloads the specific variables required for worldwide clear-sky irradiance modelling from the MERRA-2 reanalysis database and reorganises them for optimised usage, (ii) provides two functions for the world leading global horizontal irradiance clear-sky models (MAC2 and REST2v5) to make use of the downloaded MERRA-2 reanalysis database, (iii) provides an example script showing how it works and can be used.
The package is available through the Python Package Index (PyPi) and can also be found in the GitHub
repository at https://github.com/BXYMartin/Python-irradpy.

* Corresponding author.
E-mail address: jamie.bright@nus.edu.sg (J.M. Bright).

https://doi.org/10.1016/j.solener.2020.02.061
Received 14 January 2020; Received in revised form 13 February 2020; Accepted 17 February 2020
Available online 26 February 2020
0038-092X/ © 2020 International Solar Energy Society. Published by Elsevier Ltd. All rights reserved.

1. Introduction

In the field of solar energy, many projects require access to accurate clear-sky irradiance estimates, from forecasting (Yang, 2018b), resource assessment (Bright, 2019), synthetic irradiance (Frimane et al., 2019; Bright et al., 2015, 2017), irradiance downscaling (Bright, 2019), clear-sky detection algorithms (Bright et al., 2020; Gueymard et al., 2019), PV shading detection (Lingfors et al., 2018), quality controlling and tuning PV profiles (Killinger et al., 2017; Bright et al., 2019), irradiance separation modelling (Bright and Engerer, 2019; Yang and Boland, 2019), etc. However, it is the authors’ experience that obtaining state-of-the-art estimates of clear-sky irradiance is not straightforward and certainly far from ‘easily accessible’; this is perhaps best evidenced by the lack of uptake of the best clear-sky models in literature, which can only be a result of difficulty obtaining appropriate input variables. This reasoning is directly stated to be a barrier by Gueymard (2019) and Bright and Engerer (2019).

Recently, Sun et al. (2019) evaluated and ranked 75 clear-sky global horizontal irradiance models using input variables from the reanalysis database called MERRA-2, or Modern-Era Retrospective analysis for Research and Applications version 2. Sun et al. (2019) made access to multiple clear-sky models easy by providing the code to all models (found at Bright and Sun (2019)).¹ Access to the input reanalysis data, however, was not provided as it was downloaded and pre-processed manually. Following a call for submissions of data articles to the Journal of Solar Energy (Yang et al., 2018), interesting opportunities for data access have emerged; notably OpenSolar (Feng et al., 2019) and SolarData (Yang, 2018a), which remove certain hurdles to data access. With a clear need for improved accessibility to clear-sky irradiance input data, a data article on the topic would be most useful.

¹ Note that this work is being extended for direct normal irradiance and diffuse (Sun et al., 2020) and all code for even more clear-sky models is scheduled for release in Python, R and Matlab.

Reanalysis data is the result of a meteorological data assimilation approach that processes historical data observations that span a substantial period of time. Reanalysis databases that are of interest must have global coverage to satisfy the paper objective of worldwide applicability. For the same reasoning provided in Sun et al. (2019), the reanalysis selected is MERRA-2. It is a global and long-term reanalysis database assimilating climate system physical processes, space-observed aerosol information, and additional ground observations (Gelaro et al., 2017). MERRA-2 is produced by NASA’s Global Modeling and Assimilation Office (GMAO) and provides 1-h resolution data from 1980 to two months ago on a 0.5° by 0.625° regular grid. MERRA-2 was found to be more accurate than its contemporaries for solar energy and aerosol estimations (Yang and Bright, 2020; Gueymard and Yang, 2019); hence, it is the most suitable for such a data article. Readers are strongly advised to read the documentation by Gelaro et al. (2017) for deeper information on MERRA-2 data.

Whilst NASA Goddard Earth Sciences (GES) Data and Information Services Center (DISC)² provides a web tool for sub-selection and downloading of the MERRA-2 database, it is a manual process and still requires the user to know how to appropriately use the data thereafter. Furthermore, the raw data from MERRA-2 is stored on a file-per-day-per-collection basis, which adds computational demand when calculating long-term clear-sky irradiance. To add to the complexity, the variables required for clear-sky irradiance inputs are stored across four different collections (detailed in Table 1). There is a clear opportunity to simplify the process of obtaining accurate clear-sky irradiance estimates. Hence, the Python package irradpy was developed.

² https://disc.gsfc.nasa.gov/.

Table 1
Detailed list of the version 5.12.4 variables extracted from the MERRA-2 reanalysis database, indicating the full variable description, the database collection where it is found, the reference variable name, as well as the post-production conversion applied and the resulting units in the local version of the database. More variables are downloaded that do not directly contribute to clear-sky irradiance modelling, though are considered useful for irradiance modelling in general; these are indicated below the divide.

Variable                                            Database   Reference  Conversion applied  Units
Ångström exponent                                   M2T1NXAER  TOTANGSTR  –                   –
Total aerosol extinction (550 nm)                   M2T1NXAER  TOTEXTTAU  –                   –
Total aerosol scattering                            M2T1NXAER  TOTSCATAU  –                   –
Surface albedo                                      M2T1NXRAD  ALBEDO     –                   frac.
Total column ozone                                  M2T1NXSLV  TO3        × 0.001             atm-cm
Surface geopotential height                         M2C0NXASM  PHIS       × 9.80665 m2s−2     m
Surface pressure                                    M2T1NXSLV  PS         × 0.01              mb
Total precipitable water vapour                     M2T1NXSLV  TQV        × 0.1               atm-cm
----------------------------------------------------------------------------------------------------
Cloud optical thickness of all clouds               M2T1NXRAD  TAUTOT     –                   frac.
Surface incoming shortwave flux                     M2T1NXRAD  SWGDN      –                   W m−2
Surface incoming shortwave flux assuming clear sky  M2T1NXRAD  SWGDNCLR   –                   W m−2
Total cloud area fraction                           M2T1NXRAD  CLDTOT     –                   frac.

This data article introduces the irradpy package. It is a recently designed Python package intended to be a home for all things irradiance related. In this first release, it facilitates downloading and accessing the MERRA-2 reanalysis database for fast and accurate clear-sky irradiance modelling worldwide.

The Python package is called irradpy and supports only Python 3.6+. The first release has three core competencies: downloading of MERRA-2 data, extraction of appropriate data from MERRA-2 and the modelling of clear-sky irradiance. The irradpy package shall be maintained and developed moving forward so that the pool of researchers with access to state-of-the-art clear-sky modelling grows.

This paper is organised into three more sections. The package is described in Section 2, detailing the download, extraction and usage components of the first release of irradpy. Additionally therein, information on how to individualise the package for advanced use is provided. Example usage and a walk-through of how to go from downloading the irradpy package to obtaining clear-sky curves or any MERRA-2 variable is provided in Section 3. Lastly, we specify terms of usage in Section 4.

We truly hope the solar community finds this work as useful as we do. The irradpy package is hosted on GitHub, where contributors can help advance the package. If you find any bugs or issues whilst using the package, visit the repository at https://github.com/BXYMartin/Python-irradpy and use the issue reporting feature (Bai et al., 2019). In the early stages of the package, there are inevitably bugs to be fixed, particularly due to the cross-platform operating system usage of the research community.

We have plans to extend the downloading capabilities to other freely available databases useful for irradiance modelling, and also to bring other modelling features such as clear-sky detection etc., so do engage with the community if you have feature requests and ideas for irradpy.

2. The irradpy package

Before we begin detailing the package, we first state some expectations and requirements.

Naturally, the download requires an internet connection. The volume of data to be downloaded can be large depending on specification (roughly 73 GB per year for the default variables). Therefore, we want to provide a note of caution: do not attempt this without some form of unlimited data plan; we are not responsible for internet costs incurred resulting from using this package. Furthermore, depending on connection speed and computer processing capabilities, the initial download could take a substantial period of time.

For example, the authors have downloaded the entire 40-year history of the MERRA-2 data using the example downloader (Section 2.1) with a relatively fast internet connection (100 mb/s), a Windows laptop with Intel i7-8565U 1.8 GHz processor and 16 GB of RAM, writing to an external HDD. The process averaged 4 h per year of downloaded data; additionally, the merging of data after download averaged 1 h per year of data. The download need only occur once; however, integrity checks for file corruption may occur if the download is interrupted at the terminal (Ctrl + C or Z), or if the log file (index.npy) is deleted; please do not delete this.

Once downloaded, the raw data is merged so that instead of three files per day, there is only one. The merged file requires less disk space due to attribute description loss and dimension reduction. A merging option combines the daily files into coarser time frames. Combining data into monthly files can significantly speed up the extraction of data as it reduces the number of files to be opened. The final storage required for a single year of data in the merged-monthly format is 26 to 27 GB per year of download (recalling that the raw data is roughly 73 GB per year).

The user of this script must first have authorised registration with
GES DISC to download the data; as this registration is on an individual basis we do not provide credentials. Instructions for creating an EarthData account in GES DISC are well documented on their website.³ Therefore, we only highlight key points here.

1. Register an EarthData account.⁴
2. Link GES DISC with your account to receive authorisation to access GES DISC data⁵ (this is vital, otherwise you cannot download MERRA-2 data).

³ https://disc.gsfc.nasa.gov/data-access.
⁴ https://wiki.earthdata.nasa.gov/display/EL/How+To+Register+For+an+EarthData+Login+Profile.
⁵ https://disc.gsfc.nasa.gov/earthdata-login.

Fundamentally, the data is still property of GES DISC and the user of this package must adhere to their user requirements. When downloading, the user is prompted to input their credentials; they can also hard-code them into the script (see Section 2.1 for information).

The download cannot be run concurrently to the same output directory, as race condition errors will occur when writing to the log simultaneously.

Our recommendation is to use the examples provided in the example directory, automatically downloaded with the package. These are discussed in Section 3.

2.1. The download

The first core competency of irradpy is to download the desired data from GES DISC.

In summary, the irradpy.downloader performs the following procedure:

1. Initiates a session with the GES DISC server with user credentials for authorisation,
2. Specifies the date range to download,
3. Specifies the collections and variables for download,
4. Defines the geographic rectangular area for download,
5. Checks local logs to see if the data already exists,
6. Downloads all missing data from GES DISC,
7. Reorganises the different collections into a single new file format.

The downloader is customised with the following parameters, described in the format: variable name: class, compulsory/optional – description including default settings.

auth: dictionary, compulsory – Dictionary containing login information in the format {“uid”: “USERNAME”, “password”: “PASSWORD”}. In the example downloader, they are securely requested at the command line, though the user may simply replace them with strings where defined.

initial_year: integer, compulsory – Initial year for the data to be downloaded; select from 1980–now, though if too recent, there will not be any MERRA-2 data. Recommended usage in the example downloader.

initial_month: integer, compulsory – Initial month for the data to be downloaded; select from 1–12. Recommended usage in the example downloader.

initial_day: integer, compulsory – Initial day for the data to be downloaded; select from 1–31. Recommended usage in the example downloader.

final_year: integer, optional – Final year for the data to be downloaded; select from 1980–now, default is today’s year. Recommended usage in the example downloader.

final_month: integer, optional – Final month for the data to be downloaded; select from 1–12, default is today’s month. Recommended usage in the example downloader.

final_day: integer, optional – Final day for the data to be downloaded, default is today’s day. Recommended usage in the example downloader.

lat_1: float, optional – Define the latitude of the bottom left corner of the rectangle region of interest; select a value from range −90–+90, default is −90 (whole world).

lon_1: float, optional – Define the longitude of the bottom left corner of the rectangle region of interest; select a value from range −180–+180, default is −180 (whole world).

lat_2: float, optional – Define the latitude of the top right corner of the rectangle region of interest; select a value from range −90–+90, default is 90 (whole world).

lon_2: float, optional – Define the longitude of the top right corner of the rectangle region of interest; select a value from range −180–+180, default is 180 (whole world).

output_dir: Union[string, Path], optional – Define the location of the output saved data. It must be a path coinciding with your operating system (recommended to build it with os.path.join). Default is a new directory called “./MERRA2_data” in the current working directory from which you executed the code.

merge_timelapse: str, optional – State the merging resolution you want; you can use (daily) to merge files per day, (monthly) to merge files per month or (yearly) to merge files per year; default is monthly. Warning, the yearly merge is only suitable for machines with considerable memory.

thread_num: integer, optional – Number of threads to use on your machine for simultaneous download of files. Theoretically, this can be any number so long as you have the computing power; however, there may be limits imposed by GES DISC. Furthermore, upon completion of all threads, the data is logged, which spikes CPU usage, and so this can be fine-tuned to meet local requirements. Default is 5 threads as this is tested as suitable for GES DISC. A slight warning is issued as the thread numbers utilise the multiprocessing package, which has various unpredictable memory errors with Windows (none detected so far on Linux and Mac OS).

merra2_var_dicts: dictionary, optional – Dictionary containing the following keys: esdt_dir, collection, merra_name and standard_name. The default is defined in variables.py and can be directly modified there. Same order as var_names shown in Section 2.3.

Once the total download has completed, the three different daily collections are merged into a single file. Storage requirements are reduced by deleting the original downloaded data, and through attribute description reduction. The process iterates through each day of the specified download period, merges the three datasets into a new file, and deletes the original data. The user is recommended to have at least 100 GB of drive space available per year of download. This daily merge process took approximately 30 min per year to complete (on the computer described earlier).

Finally, the files are merged in the temporal dimension to form a single file of all the data within the chosen time frame for this downloaded subset database. By default, the merge is monthly, though it can be turned off (i.e., daily) or set to yearly. The yearly merge is only suitable for high-performance machines that can comfortably load 25+ GB of data into working memory. This process is effective in reducing the computational time of later queries of the data. The merging process is reasonably efficient, taking on average 1 h to merge 12 months of worldwide coverage data. The resulting size of a one-year database (12 monthly merged files) is 23.6 GB. The entire database in monthly merged format from 1980-01-01 to 2020-01-01 is 1.02 TB.

Whilst downloading all the data does take a long time, once complete, it does not need to be repeated with new uses of clear-sky modelling. For our purposes, we downloaded the whole database to date, with a scheduled job to run once a week to extract the latest
MERRA-2 data. This means that the local MERRA-2 database is always up to date and can be used anywhere in the world between 1980 and two months ago.

2.2. Clear-sky modelling

The second core competency of this first release of irradpy is to extract data from the downloaded and reformatted MERRA-2 database for clear-sky modelling. The extractor is integrated within the clear-sky model for simplification so it does not need to be an additional step.

In summary, the extraction of data for clear-sky modelling follows these steps:

1. Accept a request for data based on latitude, longitude, elevation and a time span,
2. Find the nearest MERRA-2 cell,
3. Linearly interpolate the variables to the time steps requested,
4. Unit conversion and scale height correction of variables,
5. Solar geometry calculations for zenith angle and extraterrestrial irradiance,
6. Execution of the clear-sky models.

The clear-sky models as shown in the example scripts (Section 3) are customised with the following parameters, described in the format: variable name: class, compulsory/optional – description including default settings.

lats: numpy.ndarray, float, compulsory – Define the latitude(s) of the location(s) of interest, size must match lons. If the location is not within the data set, the extractor uses the nearest data point.

lons: numpy.ndarray, float, compulsory – Define the longitude(s) of the location(s) of interest, size must match lats. If the location is not within the data set, the extractor uses the nearest data point.

elevs: numpy.ndarray, float, compulsory – Define the elevation(s) of the location(s) of interest, size must match lats.

timedef: list [pandas.DatetimeIndex], optional – Use pandas.DatetimeIndex to specify the time of the location(s) of interest. This is only required if constructing a time series using model.solarGeometry.timeseries_builder(timedef, station number) to create the time variable (as per the clear-sky examples later).

time: numpy.ndarray, compulsory – Define the time series desired. If the exact time stamp is not in the data set, the extractor uses interpolation to obtain approximate data according to the variable interpolate. A unique time can be provided per site following the examples later. This can be simply constructed using model.solarGeometry.timeseries_builder(timedef, station number) if the time series desired is evenly spaced and chronological.

datadir: Union[string, Path], optional – Define the location of the dataset of downloaded and merged data. It must be a path coinciding with your operating system (recommended to build it with os.path.join).

variables: list of strings, compulsory – Define the variables of interest. If a variable is not within the data set, the extractor will throw an exception.

interpolate: boolean, optional – Define whether linear interpolation is used to obtain approximate data. Default is True, but for some special data like PHIS (see Table 1), you need to set it to False to avoid duplicate coordinates.

2.3. Individualisation

The user may use the code as they see fit; as such, there are additional opportunities for the keen user.

The databases and variables accessed are not limited to those for clear-sky irradiance modelling. The user may opt for any of the available variables among any collection.⁶ To make changes, first look at how the current set of variables is specified by collection (view in conjunction with Table 1):

⁶ Complete list can be found at: https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf, and described further in the Appendix.

Key names should represent the collection though naming is arbitrary. The esdt_dir is the folder naming convention of GES DISC, collection is the name of the MERRA-2 collection, variable_name is the internal variable name convention within each collection and standard_name is used for reporting. Should the reader desire additional variables within their local database, they should add to the var_list variable or update the variables.py dictionary with their desired collection following the above format. The main collections used are detailed in full in the appendix.

The user may also wish to download only a specified rectangle on the grid (e.g. just a country, or a single location). This is achieved by specifying the exact longitudes and latitudes of the (1) south west and (2) north east corners. Sub-setting reduces memory requirements, though each subset is treated as its own database; merging of a collection of subset databases is not currently supported. Each unique subset request creates a new sub directory in the default path “./MERRA2_Data/subset_dir” unless specified by the user. Identical requests use the same subset directory so the user can extend a pre-existing subset to additional time periods without re-downloading all pre-existing data of the same subset.

3. Examples of how to use irradpy

The previous section detailed the package itself; this section focuses on the practicalities of actually using the package, with examples. The GitHub repository contains descriptions of how to use it across different platforms. Additionally, the irradpy package comes with some default examples.

Downloading the package itself is described in Section 3.1. Downloading the data is described in Section 3.2. Generating clear-sky curves is described in Section 3.3. Extracting MERRA-2 variables is described in Section 3.4.

3.1. Download the package

The package is hosted on PyPI, the Python Package Index, where it can be installed using pip at the command line (pip install irradpy).

Equally, the reader may clone the repository directly⁷ and then install the package from the command line within the repository directory. For Windows, the user can use a Git interface (e.g. GitBash), else it can be downloaded and installed like so after navigating to the appropriate target directory:

Once downloaded with all dependencies, set up the package with the following command.

It is possible that some dependency packages will need to be installed individually using pip/anaconda.

⁷ https://github.com/BXYMartin/Python-irradpy.

3.2. Downloading MERRA-2 data

Within the example directory, the user will find the script example_downloader.py. This example file shows the simplest approach to downloading as much or as little of the MERRA-2 data as desired.

The input parameters are clearly defined, showing how to parameterise the download. It is recommended that this file be edited before executing to specify bespoke date ranges and spatial subsets; though note that the standard set examples across all example files must temporally and spatially correspond. The time span of data and the spatial domain can be specified. The output directory can also be modified, for example to an external drive.

Once parameterised, save any changes and call the script from the command line or run it from an integrated development environment (IDE). If you did not hard-code your GES DISC username and password within the script, you will be prompted at the command line to input this information as demonstrated below:

No credentials are stored. They are used only to initialise a connection to GES DISC. The password is suppressed in most terminals/consoles/IDEs, and the script will prompt the user should password suppression not be supported. We recommend using the command line equivalent on your operating system.

At this stage the download and merging of all files will begin. Depending on many factors (mainly internet speed and time span of data request), the download can take a long time. If the download is interrupted by either the user (Ctrl + Z/C) or an error is encountered that cannot be handled (e.g., power loss, internet connection issues, storage issues, etc.), once the problem has been resolved, simply restart the download. Files are not downloaded twice; instead, a log is kept (index.npy) of all the downloads. It is strongly recommended that this is not deleted, else restarting the download will require all files to undergo integrity checks, which is time consuming.

Time estimations for completion are reported to the console based on the previous iteration of the last downloaded batch. Note that this reported time is for the completion of the current collection of data, of which there are three collections.

3.3. Generate clear-sky curves

Two functions are provided that calculate clear-sky irradiance following the MAC2 (Davies and McKay, 1982) and REST2v5 (Gueymard, 2008) models, which were found to be the best performers worldwide by Sun et al. (2019). Refer to Sun et al. (2019), particularly the supplementary material, for extensive detail on the clear-sky models. Whilst both models are vastly different in both input requirements and calculation, the usage in this system is the same:

1. The user defines a set of latitudes, longitudes, elevations and time durations.
2. The appropriate data is then extracted from the local MERRA-2 database, converted and scale-height corrected.
3. The function returns corresponding global horizontal, direct normal and diffuse horizontal clear-sky irradiance time series.

The example_clearsky.py script shows an example of how one might use the previously downloaded and merged MERRA-2 database to make clear-sky irradiance time series.

Usage is relatively straightforward. If testing the package, run example_download.py first, followed by example_clearsky.py. If you have changed dates and data storage directory, you must update across both examples so that the data range, locations and local directory correspond.

First, the user must specify information about the location(s) that they desire clear-sky irradiance data for. A location is specified as 4-dimensional with latitude (deg, −90 S to 90 N), longitude (deg, −180 W to 180 E), elevation (metres above sea level) and time (UTC). The specified time series can be any temporal resolution; however, MERRA-2 variables are linearly interpolated between hours. Note that time is in UTC, so the requests for a full day at our specific locations (SERIS/Beihang, UTC+8) are adjusted accordingly. We define the input parameters for this example like so:
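The original listing is not reproduced in this text. As a rough illustration only, the sketch below shows how two sites and a full local day per site might be expressed as UTC time series using the Python standard library. All coordinates, elevations and the 10-minute step are placeholder assumptions rather than the values in example_clearsky.py, and utc_day_series is a hypothetical helper, not an irradpy function (the package's own builder is the timeseries_builder described in Section 2.2).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical site definitions: coordinates and elevations are rough
# placeholders for illustration, not the values used by the authors.
sites = [
    {"name": "SERIS", "lat": 1.30, "lon": 103.77, "elev": 30.0, "local_day": (2018, 1, 2)},
    {"name": "Beihang", "lat": 39.98, "lon": 116.35, "elev": 50.0, "local_day": (2018, 1, 3)},
]

LOCAL_TZ = timezone(timedelta(hours=8))  # both sites are at UTC+8


def utc_day_series(year, month, day, step_minutes=10):
    """Evenly spaced UTC time stamps covering one full local (UTC+8) day."""
    start_local = datetime(year, month, day, tzinfo=LOCAL_TZ)
    n_steps = 24 * 60 // step_minutes
    return [
        (start_local + timedelta(minutes=i * step_minutes)).astimezone(timezone.utc)
        for i in range(n_steps)
    ]


# One entry per site; the three location lists must have matching sizes.
lats = [s["lat"] for s in sites]
lons = [s["lon"] for s in sites]
elevs = [s["elev"] for s in sites]
times = [utc_day_series(*s["local_day"]) for s in sites]
```

Because both sites are eight hours ahead of UTC, a full local day on 2 January starts at 16:00 UTC on 1 January, which is the adjustment the text refers to.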


Fig. 1. Example plot of the data using the example_clearsky.py script. The example defines two locations where latitude, longitude and elevation are set to our two affiliations: SERIS and Beihang University. Each site has an individual time duration request for 2nd and 3rd of Jan 2018, respectively. Both the REST2 and MAC2 clear-sky outputs are shown for both days; note that they are extremely similar and are only included for completeness.

Once defined, the functions are called like so:

The example script illustrates an example figure showing the two estimates, as is shown in Fig. 1. The final part of the script is an example of how to save the data to a text file, saving each site as an individual file with all the data generated from the script.

3.4. Extract any MERRA-2 variables

The final example is how to use the package to extract any data from within the downloaded data, not just clear-sky irradiance. As per Section 2.3, we demonstrated that any MERRA-2 collection or set of variables can be downloaded. This example shows how to extract a time series of two variables from the database that were not used in clear-sky modelling.

In addition to the clear-sky variables, we also downloaded by default TAUTOT, CLDTOT, SWGDN, and SWGDNCLR (c.f. Table 1). These variables were selected as they relate well to clear-sky irradiance and solar energy applications in general. In the final example script, example_extractor.py, we demonstrate how the user may extract raw variables directly from the database.

The specification is the same as in example_clearsky.py whereby the latitudes, longitudes, elevations, time and dataset_dir must be defined. After, the data is extracted to either a pandas dataframe (pandas=True), or as numpy arrays (pandas=False) with the following commands:

Fig. 2. Example plot of the data using the example_extractor.py script. The example defines two locations where latitude, longitude and elevation are set to our two affiliations: SERIS and Beihang University. Each site has an individual time duration request for 2nd and 3rd of Jan 2018, respectively. The variables plotted are SWGDN and SWGDNCLR.


The extracted SWGDN and SWGDNCLR are illustrated in Fig. 2, and the code for producing this figure is within the script.

4. Terms of usage

The tool is free to use. The data belongs to NASA’s GES DISC; the authors accept no responsibility for any users of this tool that breach GES DISC’s terms of usage. It is the authors’ request that bugs be reported to the GitHub repository. Citations to this data article must be made for any future publication that benefited at all from its usage.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

J.M. Bright is funded by the Energy Market Authority (EMA), Energy Programme - Solar Forecasting Grant (NRF2017EWT-EP002-004). X. Bai, X. Sun, Y. Zhang and P. Wang were partially funded by National Key Research and Development Program of China (Grant No. 2017YFB0701700).

We extend our thanks to NASA GES DISC and all those responsible for designing and maintaining the MERRA-2 reanalysis dataset. It is a fantastic resource and enables many solar radiation studies.

Appendix A

The MERRA-2 database has 42 different collections available for download, and irradpy was designed to work flexibly with any of them.

Table 2
tavg1_2d_slv_Nx (M2T1NXSLV): Single-Level Diagnostics. Frequency: 1-hourly from 00:30 UTC (time-averaged). Spatial Grid:
2D, single-level, full horizontal resolution. Dimensions: longitude = 576, latitude = 361, time = 24. Granule Size: ~393 MB.
Name Description Units
CLDPRS cloud top pressure Pa
CLDTMP cloud top temperature K
DISPH zero plane displacement height m
H1000 height at 1000 mb m
H250 height at 250 hPa m
H500 height at 500 hPa m
H850 height at 850 hPa m
OMEGA500 omega at 500 hPa Pa s−1
PBLTOP pbltop pressure Pa
PS surface pressure Pa
Q250 specific humidity at 250 hPa kg kg−1
Q500 specific humidity at 500 hPa kg kg−1
Q850 specific humidity at 850 hPa kg kg−1
QV10M 10-meter specific humidity kg kg−1
QV2M 2-meter specific humidity kg kg−1
SLP sea level pressure Pa
T10M 10-meter air temperature K
T250 air temperature at 250 hPa K
T2M 2-meter air temperature K
T2MDEW dew point temperature at 2 m K
T2MWET wet bulb temperature at 2 m K
T500 air temperature at 500 hPa K
T850 air temperature at 850 hPa K
TO3 total column ozone Dobsons
TOX total column odd oxygen kg m−2
TQI total precipitable ice water kg m−2
TQL total precipitable liquid water kg m−2
TQV total precipitable water vapor kg m−2
TROPPB tropopause pressure based on blended estimate Pa
TROPPT tropopause pressure based on thermal estimate Pa
TROPPV tropopause pressure based on EPV estimate Pa
TROPQ tropopause specific humidity using blended TROPP estimate kg kg−1
TROPT tropopause temperature using blended TROPP estimate K
TS surface skin temperature K
U10M 10-meter eastward wind m s−1
U250 eastward wind at 250 hPa m s−1
U2M 2-meter eastward wind m s−1
U500 eastward wind at 500 hPa m s−1
U50M eastward wind at 50 meters m s−1
U850 eastward wind at 850 hPa m s−1
V10M 10-meter northward wind m s−1
V250 northward wind at 250 hPa m s−1
V2M 2-meter northward wind m s−1
V500 northward wind at 500 hPa m s−1
V50M northward wind at 50 meters m s−1
V850 northward wind at 850 hPa m s−1
ZLCL lifting condensation level m
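
Several of the variables in Table 2 are stored in units that differ from those commonly used as inputs to clear-sky irradiance models. The following is a minimal sketch of such conversions, assuming a model that expects total column ozone in atm-cm, precipitable water in cm and surface pressure in hPa. The function names are hypothetical and are not part of irradpy; only the conversion factors are standard.

```python
def dobson_to_atm_cm(to3_dobson):
    """Total column ozone (TO3): 1 Dobson unit equals 0.001 atm-cm."""
    return to3_dobson * 1e-3


def tqv_to_cm(tqv_kg_m2):
    """Precipitable water vapour (TQV): 1 kg m-2 of liquid water
    corresponds to a 1 mm deep layer, i.e. 0.1 cm."""
    return tqv_kg_m2 * 0.1


def pa_to_hpa(ps_pa):
    """Surface pressure (PS): 100 Pa equals 1 hPa."""
    return ps_pa / 100.0


# Illustrative inputs only, not values read from MERRA-2.
ozone_atm_cm = dobson_to_atm_cm(300.0)
water_cm = tqv_to_cm(25.0)
pressure_hpa = pa_to_hpa(101325.0)
```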


Table 3
tavg1_2d_rad_Nx (M2T1NXRAD): Radiation Diagnostics. Frequency: 1-hourly from 00:30 UTC (time-averaged). Spatial Grid: 2D,
single-level, full horizontal resolution. Dimensions: longitude = 576, latitude = 361, time = 24. Granule Size: ~209 MB.
Name Description Units
ALBEDO surface albedo 1
ALBNIRDF surface albedo for near infrared diffuse 1
ALBNIRDR surface albedo for near infrared beam 1
ALBVISDF surface albedo for visible diffuse 1
ALBVISDR surface albedo for visible beam 1
CLDHGH cloud area fraction for high clouds 1
CLDLOW cloud area fraction for low clouds 1
CLDMID cloud area fraction for middle clouds 1
CLDTOT total cloud area fraction 1
EMIS surface emissivity 1
LWGAB surface absorbed longwave radiation W m−2
LWGABCLR surface absorbed longwave radiation assuming clear sky W m−2
LWGABCLRCLN surface absorbed longwave radiation assuming clear sky and no aerosol W m−2
LWGEM longwave flux emitted from surface W m−2
LWGNT surface net downward longwave flux W m−2
LWGNTCLR surface net downward longwave flux assuming clear sky W m−2
LWGNTCLRCLN surface net downward longwave flux assuming clear sky and no aerosol W m−2
LWTUP upwelling longwave flux at toa W m−2
LWTUPCLR upwelling longwave flux at toa assuming clear sky W m−2
LWTUPCLRCLN upwelling longwave flux at toa assuming clear sky and no aerosol W m−2
SWGDN surface incoming shortwave flux W m−2
SWGDNCLR surface incoming shortwave flux assuming clear sky W m−2
SWGNT surface net downward shortwave flux W m−2
SWGNTCLN surface net downward shortwave flux assuming no aerosol W m−2
SWGNTCLR surface net downward shortwave flux assuming clear sky W m−2
SWGNTCLRCLN surface net downward shortwave flux assuming clear sky and no aerosol W m−2
SWTDN toa incoming shortwave flux W m−2
SWTNT toa net downward shortwave flux W m−2
SWTNTCLN toa net downward shortwave flux assuming no aerosol W m−2
SWTNTCLR toa net downward shortwave flux assuming clear sky W m−2
SWTNTCLRCLN toa net downward shortwave flux assuming clear sky and no aerosol W m−2
TAUHGH in cloud optical thickness of high clouds(EXPORT) 1
TAULOW in cloud optical thickness of low clouds 1
TAUMID in cloud optical thickness of middle clouds 1
TAUTOT in cloud optical thickness of all clouds 1
TS surface skin temperature K

Table 4
tavg1_2d_aer_Nx (M2T1NXAER): Aerosol Diagnostics. Frequency: 1-hourly from 00:30 UTC (time-averaged). Spatial Grid: 2D,
single-level, full horizontal resolution. Dimensions: longitude = 576, latitude = 361, time = 24. Granule Size: ~476 MB.
Name Description Units
BCANGSTR Black Carbon Angstrom parameter [470–870 nm] 1
BCCMASS Black Carbon Column Mass Density kg m−2
BCEXTTAU Black Carbon Extinction AOT [550 nm] 1
BCFLUXU Black Carbon column u-wind mass flux kg m−1 s−1
BCFLUXV Black Carbon column v-wind mass flux kg m−1 s−1
BCSCATAU Black Carbon Scattering AOT [550 nm] 1
BCSMASS Black Carbon Surface Mass Concentration kg m−3
DMSCMASS DMS Column Mass Density kg m−2
DMSSMASS DMS Surface Mass Concentration kg m−3
DUANGSTR Dust Angstrom parameter [470–870 nm] 1
DUCMASS Dust Column Mass Density kg m−2
DUCMASS25 Dust Column Mass Density - PM 2.5 kg m−2
DUEXTT25 Dust Extinction AOT [550 nm] - PM 2.5 1
DUEXTTAU Dust Extinction AOT [550 nm] 1
DUFLUXU Dust column u-wind mass flux kg m−1 s−1
DUFLUXV Dust column v-wind mass flux kg m−1 s−1
DUSCAT25 Dust Scattering AOT [550 nm] - PM 2.5 1
DUSCATAU Dust Scattering AOT [550 nm] 1
DUSMASS Dust Surface Mass Concentration kg m−3
DUSMASS25 Dust Surface Mass Concentration - PM 2.5 kg m−3
OCANGSTR Organic Carbon Angstrom parameter [470–870 nm] 1
OCCMASS Organic Carbon Column Mass Density kg m−2
OCEXTTAU Organic Carbon Extinction AOT [550 nm] 1
OCFLUXU Organic Carbon column u-wind mass flux kg m−1 s−1
OCFLUXV Organic Carbon column v-wind mass flux kg m−1 s−1
OCSCATAU Organic Carbon Scattering AOT [550 nm] 1
OCSMASS Organic Carbon Surface Mass Concentration kg m−3
SO2CMASS SO2 Column Mass Density kg m−2
SO2SMASS SO2 Surface Mass Concentration kg m−3
SO4CMASS SO4 Column Mass Density kg m−2
SO4SMASS SO4 Surface Mass Concentration kg m−3
SSANGSTR Sea Salt Angstrom parameter [470–870 nm] 1
SSCMASS Sea Salt Column Mass Density kg m−2
SSCMASS25 Sea Salt Column Mass Density - PM 2.5 kg m−2
SSEXTT25 Sea Salt Extinction AOT [550 nm] - PM 2.5 1
SSEXTTAU Sea Salt Extinction AOT [550 nm] 1
SSFLUXU Sea Salt column u-wind mass flux kg m−1 s−1
SSFLUXV Sea Salt column v-wind mass flux kg m−1 s−1
SSSCAT25 Sea Salt Scattering AOT [550 nm] - PM 2.5 1
SSSCATAU Sea Salt Scattering AOT [550 nm] 1
SSSMASS Sea Salt Surface Mass Concentration kg m−3
SSSMASS25 Sea Salt Surface Mass Concentration - PM 2.5 kg m−3
SUANGSTR SO4 Angstrom parameter [470–870 nm] 1
SUEXTTAU SO4 Extinction AOT [550 nm] 1
SUFLUXU SO4 column u-wind mass flux kg m−1 s−1
SUFLUXV SO4 column v-wind mass flux kg m−1 s−1
SUSCATAU SO4 Scattering AOT [550 nm] 1
TOTANGSTR Total Aerosol Angstrom parameter [470–870 nm] 1
TOTEXTTAU Total Aerosol Extinction AOT [550 nm] 1
TOTSCATAU Total Aerosol Scattering AOT [550 nm] 1
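
Table 4 reports the total aerosol optical thickness at 550 nm (TOTEXTTAU) together with an Angstrom parameter (TOTANGSTR). As a hedged illustration of how the two can be combined, the sketch below applies the standard Angstrom power law to scale the optical thickness to another wavelength, and evaluates it at 1000 nm to obtain the Angstrom turbidity coefficient. The function names are hypothetical and not part of irradpy, and applying an exponent defined over 470–870 nm at 1000 nm is an extrapolation assumed here for simplicity.

```python
def aod_at_wavelength(aod_550, alpha, wavelength_nm):
    """Scale the 550 nm aerosol optical thickness to another wavelength
    with the Angstrom law: tau(lam) = tau(550) * (lam / 550) ** (-alpha)."""
    return aod_550 * (wavelength_nm / 550.0) ** (-alpha)


def angstrom_beta(aod_550, alpha):
    """Angstrom turbidity coefficient, i.e. the optical thickness at 1000 nm."""
    return aod_at_wavelength(aod_550, alpha, 1000.0)


# Illustrative inputs only, not values read from MERRA-2.
tau_550 = 0.2
alpha = 1.3
beta = angstrom_beta(tau_550, alpha)
```

For a positive Angstrom parameter the optical thickness decreases with wavelength, so `beta` is smaller than `tau_550` in this example.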

The MERRA-2 collections are all well documented online (see footnote 8). However, we have replicated the tables for the collections we present as an example: Table 2 presents the single-
level diagnostics, Table 3 the radiation diagnostics, and Table 4 the aerosol diagnostics. Given the number of variables in each collection, our approach of downloading only a subset
makes working with these heavily detailed databases much more manageable. In Section 2.3 and in Table 1, we presented the particular
variables and collections that our example downloads and how they can be modified. To add further variables to the download, follow those
instructions using the appropriate name from Tables 2–4. To work with them, the extractor functions enable time-series
extraction of any variable.
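
To illustrate what time-series extraction at a site involves, the following is a minimal sketch of a nearest-neighbour lookup on the MERRA-2 grid. It assumes a regular grid starting at longitude −180° with 0.625° spacing and latitude −90° with 0.5° spacing, which is consistent with the 576 × 361 dimensions quoted in Tables 2–4, and data ordered as (time, latitude, longitude). All names are hypothetical and do not represent the irradpy extractor functions, which may interpolate rather than select the nearest node.

```python
# Hypothetical helpers for illustration; these are not irradpy functions.
N_LON, N_LAT = 576, 361              # grid dimensions quoted in Tables 2-4
LON_START, LON_STEP = -180.0, 0.625  # assumed regular longitude grid (degrees)
LAT_START, LAT_STEP = -90.0, 0.5     # assumed regular latitude grid (degrees)


def nearest_lon_index(lon):
    """Index of the nearest longitude node; the grid is periodic in longitude."""
    wrapped = (lon + 180.0) % 360.0 - 180.0
    return int(round((wrapped - LON_START) / LON_STEP)) % N_LON


def nearest_lat_index(lat):
    """Index of the nearest latitude node."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must lie in [-90, 90]")
    return int(round((lat - LAT_START) / LAT_STEP))


def extract_series(field, lat, lon):
    """Return the time series at the node nearest to (lat, lon).

    field is indexed as field[time][latitude][longitude].
    """
    j = nearest_lat_index(lat)
    i = nearest_lon_index(lon)
    return [field[t][j][i] for t in range(len(field))]


# Illustrative coordinates only (a point near Singapore).
lat_idx = nearest_lat_index(1.3)
lon_idx = nearest_lon_index(103.77)
```

With these assumptions the illustrative point maps to latitude index 183 and longitude index 454, i.e. the node at 1.5° N, 103.75° E.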

References 5419–5454.
Gueymard, C., 2008. REST2: high-performance solar radiation model for cloudless-sky
irradiance, illuminance, and photosynthetically active radiation: validation with a
Bai, X., Bright, J.M., Zhang, Y., Sun, X., 2019. Python tool for downloading clear-sky benchmark dataset. Sol. Energy 82 (3), 272–285.
irradiance reanalysis variables from merra-2. URL https://github.com/BXYMartin/ Gueymard, C.A., 2019. Clear-sky radiation models and aerosol effects. In: Solar Resources
Python-ClearSkyPy. Mapping. Springer, pp. 137–182.
Bright, J.M., 2019. Solcast: validation of a satellite-derived solar irradiance dataset. Sol. Gueymard, C.A., Bright, J.M., Lingfors, D., Habte, A., Sengupta, M., 2019. A posteriori
Energy 189, 435–449. clear-sky identification methods in solar irradiance time series: Review and pre-
Bright, J.M., 2019. The impact of globally diverse GHI training data: Evaluation through liminary validation using sky imagers. Renew. Sustain. Energy Rev. 109, 412–427.
application of a simple Markov chain downscaling methodology. J. Renew. Sustain. Gueymard, C.A., Yang, D., 2019. Worldwide validation of cams and merra-2 reanalysis
Energy 11 (2), 23703. aerosol optical depth products using 15 years of aeronet observations. Atmosp.
Bright, J.M., Babacan, O., Kleissl, J., Taylor, P.G., Crook, R., 2017. A synthetic, spatially Environ. 117216.
decorrelating solar irradiance generator and application to a lv grid model with high Killinger, S., Bright, J.M., Lingfors, D., Engerer, N.A., 2017. A tuning routine to correct
pv penetration. Sol. Energy 147, 83–98. systematic influences in reference pv systems’ power outputs. Sol. Energy 157,
Bright, J.M., Engerer, N.A., 2019. Engerer2: Global re-parameterisation, update, and 1082–1094.
validation of an irradiance separation model at different temporal resolutions. J. Lingfors, D., Killinger, S., Engerer, N.A., Widén, J., Bright, J.M., 2018. Identification of pv
Renew. Sustain. Energy 11 (3), 033701. system shading using a lidar-based solar resource assessment model: an evaluation
Bright, J.M., Sun, X., 2019. Github: A library of clear-sky irradiance models coded in R. and cross-validation. Sol. Energy 159, 157–172.
URL https://jamiembright.github.io/clear-sky-models/. Sun, X., Bright, J.M., Gueymard, C.A., Acord, B., Wang, P., Engerer, N.A., 2019.
Bright, J.M., Killinger, S., Engerer, N.I.A., 2019. Data article: Distributed PV power data Worldwide performance assessment of 75 global clear-sky irradiance models using
for three cities in Australia. J. Renew. Sustain. Energy 11 (3), 35504. https://doi.org/ principal component analysis. Renew. Sustain. Energy Rev. 111, 550–570.
10.1063/1.5094059. Sun, X., Bright, J.M., Gueymard, C.A., Acord, B., Wang, P., Engerer, N.A., 2020.
Bright, J.M., Smith, C.J., Taylor, P.G., Crook, R., 2015. Stochastic generation of synthetic Worldwide performance assessment of 95 direct and diffuse clear-sky irradiance
minutely irradiance time series derived from mean hourly weather observation data. models using Principal Component Analysis. Renew. Sustain. Energy Rev (in pre-
Sol. Energy 115, 229–242. paration).
Bright, J.M., Sun, X., Gueymard, C.A., Acord, B., Wang, P., Engerer, N.A., 2020. Bright- Yang, D., 2018a. Solardata: an r package for easy access of publicly available solar da-
Sun: A globally applicable 1-min irradiance clear-sky detection model. Renew. tasets. Sol. Energy 171, A3–A12.
Sustain. Energy Rev. 121, 109706. Yang, D., 2018b. Ultra-fast preselection in lasso-type spatio-temporal solar forecasting
Davies, J., McKay, D., 1982. Estimating solar irradiance and components. Sol. Energy 29 problems. Sol. Energy 176, 788–796.
(1), 55–64. Yang, D., Boland, J., 2019. Satellite-augmented diffuse solar radiation separation models.
Feng, C., Yang, D., Hodge, B.-M., Zhang, J., 2019. Opensolar: promoting the openness and J. Renew. Sustain. Energy 11 (2), 023705.
accessibility of diverse public solar datasets. Sol. Energy 188, 1369–1379. Yang, D., Bright, J.M., 2020. Worldwide validation of 8 satellite-derived and reanalysis
Frimane, A., Soubdhan, T., Bright, J.M., Aggour, M., 2019. Nonparametric bayesian-based solar radiation products: a preliminary evaluation and overall metrics for hourly data
recognition of solar irradiance conditions: application to the generation of high over 27 years. Sol. Energy (in this issue).
temporal resolution synthetic solar irradiance data. Sol. Energy 182, 462–479. Yang, D., Gueymard, C.A., Kleissl, J., 2018. Editorial: submission of data article is now
Gelaro, R., McCarty, W., Suárez, M.J., Todling, R., Molod, A., Takacs, L., Randles, C.A., open. Sol. Energy 171, A1–A2.
Darmenov, A., Bosilovich, M.G., Reichle, R., et al., 2017. The modern-era retro-
spective analysis for research and applications, version 2 (merra-2). J. Clim. 30 (14),

8 Complete list can be found at: https://gmao.gsfc.nasa.gov/pubs/docs/Bosilovich785.pdf.
