Rclimtool User Manual: by Lizeth Llanos Herrera, Student Statistics

RClimTool

USER MANUAL
By Lizeth Llanos Herrera, Statistics student

This tool is designed to support the processing, automation and
analysis of climatic series within the CIAT-MADR agreement. It is
not intended to compete with or supplant tools developed by other
entities. Rather, we seek collaborative and ongoing feedback
between methodologies.

www.aclimatesectoragropecuariocolombiano.org
RClimTool is designed to facilitate statistical analysis, quality
control, filling of missing data, homogeneity analysis and
calculation of indicators for daily weather series of maximum
temperature, minimum temperature and precipitation.

INSTALLING AND RUNNING R


The tool was developed in the R language, so you need to install R, specifically version 2.15.0,
which can be downloaded from the following link:
http://cran.r-project.org/bin/windows/base/old/2.15.0/

Once you have R installed, the following window will appear:

INSTALLING AND RUNNING RClimTool
To run the application interface you have to load the source code as shown in the following figure:

Once the code has been loaded successfully the subsequent GUI will appear:

The previous figure shows the main window of the tool, which is divided into modules located in the
panels on the left of the interface. Each module is described in the sections that follow.

WHAT DOES RClimTool DO?
RClimTool offers different analysis options, designed with the objective of providing an application that
brings together everything needed to perform a comprehensive study of climate data.

To illustrate the functions of each of the modules, the analysis of daily weather series for the variables
maximum temperature, minimum temperature and precipitation from 10 meteorological stations will
be demonstrated in the next chapters.

1. Data reading

In the data reader module you will find different buttons that allow you to read and load the databases
with the information of the variables of interest. Important: Do not use accents or the letter "ñ" to
name folders and files to be used with the tool, as this creates conflict when using the application.

The Change Directory button (1) lets you select the directory from which the files will be
loaded. This is also the location where all outputs of the application are saved.

Figure 1: Data reading

Part (2) of Figure 1 contains the buttons that allow you to load the information for each variable. For
example, clicking Maximum Temperature opens a popup window where you can locate
the file that contains the daily maximum temperatures of the different stations. Repeat this
procedure for every other variable that needs to be analyzed.

Figure 2: Example file selection (callout: popup window)

In this window, select the location and the file you want to load, then click OK as
shown in Figure 2. Remember to close the popup window each time a different variable is loaded.

Important: The input data format is specified in Appendix A.

2. Graphical and descriptive analysis

Once the data for all variables to be analyzed have been loaded, we proceed to the descriptive analysis of
each of them. You can specify the analysis period, which is useful if you want to analyze
only a section of the series, e.g. March-1990 to January-1991. If you want to analyze the full
data set, leave these fields empty.

Figure 3: Example descriptive analysis (callout: period of analysis)

After selecting the variable to be analyzed as shown in Figure 3, click the Descriptives
button. The results are shown in the R console (see Figure 4).

R Console

Figure 4: Descriptive analysis

For graphical analysis, you can generate different types of automatic graphics, which are generated for
all variables. If you want to work with monthly climatological information (monthly average temperature
and monthly total for precipitation) you have to select the Monthly Analysis Type option, then click any
of the buttons (Plot Charts, Graphs Scatter plots or Boxplot) and a message with the location of the
graphs generated will appear (see Figure 5).

Option to monthly
graphs

Figure 5: Automatic graphical analysis

Another option is custom graphics. Clicking the Custom Graphics button of the module opens a window
in which you specify the x and y arguments; the corresponding variables can be chosen from a
dropdown list.

Other attributes, such as title, axis labels, color, etc. can be used to customize the graph (if you require
more information on the attributes of the graph, click on the Help button). Once the variables are
selected and the attributes are modified, you can click OK and a new window will display the graph (see
Figure 6).

Figure 6: Custom graphics

3. Quality control
An important aspect to consider for the analysis of climate data is quality control. This is useful to
generate criteria and/or filters in order to identify unreasonable and/or erroneous data.

Figure 7: Quality control

Figure 7 shows the Quality Control module. It contains some editable fields that the user must fill in,
for example the number of standard deviations, a useful criterion for identifying outliers in a
series (the default is 3). The range of the variable must be specified according to the values that the
variable can plausibly take.

Clicking the Validate button opens a window indicating the status of each station with respect to the
range set for the variable. The criteria reported in the console are (see Figure 8):

• % Atypical data: The percentage of data that fall outside the range
$[\bar{x} - k s,\ \bar{x} + k s]$, where $\bar{x}$ and $s$ are the sample mean and sample standard
deviation of the variable being validated and $k$ is the number of standard deviations entered in the
module (3 by default). Note: This criterion is not suitable for the precipitation variable, which usually
has an asymmetric distribution.

• % Data out of range: The percentage of data that lie outside the limits defined for the
range of the variable. The data identified by this criterion are replaced automatically by NA's.

• % Data tmax < tmin: Calculated only for temperatures. The percentage of data for which
the maximum temperature was lower than the minimum temperature on the same date. The data
identified by this criterion are replaced automatically by NA's.

• % Data variation ≥ 10 (TM_10): Calculated only for temperature variables. The
percentage of days on which the variation from one temperature value to the next was greater than or
equal to 10°C.

• % Consecutive data: Identifies runs of identical values that last more than five consecutive days in the
analyzed time series; these values are replaced by NA's.
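
The criteria above can be illustrated with a short sketch. RClimTool itself is written in R and the manual does not show its code, so the following Python functions are only an approximation of the described rules, not the tool's implementation. All function names are hypothetical, missing values (NA) are represented as None, and TM_10 is assumed to compare each day with the previous one.

```python
from statistics import mean, stdev

def pct(count, total):
    """Percentage of count over total, rounded to two decimals."""
    return round(100.0 * count / total, 2) if total else 0.0

def atypical_pct(values, k=3):
    """% of non-missing values outside [mean - k*sd, mean + k*sd]."""
    data = [v for v in values if v is not None]
    m, s = mean(data), stdev(data)
    return pct(sum(1 for v in data if abs(v - m) > k * s), len(data))

def out_of_range_pct(values, low, high):
    """% of non-missing values outside the user-defined range [low, high]."""
    data = [v for v in values if v is not None]
    return pct(sum(1 for v in data if v < low or v > high), len(data))

def tmax_below_tmin_pct(tmax, tmin):
    """% of dates on which tmax is lower than tmin (both present)."""
    pairs = [(a, b) for a, b in zip(tmax, tmin) if a is not None and b is not None]
    return pct(sum(1 for a, b in pairs if a < b), len(pairs))

def jump_pct(values, threshold=10.0):
    """% of consecutive-day pairs whose absolute difference is >= threshold.

    Assumption: the variation is measured between one day and the next.
    """
    pairs = [(a, b) for a, b in zip(values, values[1:])
             if a is not None and b is not None]
    return pct(sum(1 for a, b in pairs if abs(b - a) >= threshold), len(pairs))

def repeated_run_indices(values, max_run=5):
    """Indices of values in runs of identical data longer than max_run days."""
    flagged, start = [], 0
    for i in range(1, len(values) + 1):
        run_ends = (i == len(values)
                    or values[i] is None
                    or values[i] != values[start])
        if run_ends:
            if values[start] is not None and i - start > max_run:
                flagged.extend(range(start, i))
            start = i
    return flagged
```

The manual does not say how runs of zero precipitation are treated by the consecutive-data criterion; this sketch flags them like any other repeated value.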

Figure 8: Criteria for the quality control

For the atypical data and TM_10 filters, a separate Excel file is created for each station. These files
list the data that were flagged, together with their dates. It is up to the
user to decide whether to replace the data identified by these filters with NA's. This has to be done
manually in the files generated in the Missing Data folder, which are available once the Quality
Control of all variables has been completed (see Figure 9).

Figure 9: Identification and replacement of unreasonable data by NA's (callouts: files in which data
identified in the Quality Control should be replaced by NA's; folders with the unreasonable and/or
erroneous data for each station)

Figure 10: Creating the preliminary report

Clicking the pre-report button automatically creates a Word file containing a preliminary
report. This report includes a preliminary descriptive analysis and the criteria generated in the
Quality Control module, supplemented with the graphics produced by the application. The pre-report is
stored in the directory listed in the popup window, as shown in Figure 10.

4. Missing data

Missing data are filled using the R package RMAWGEN, which fills the gaps based on an estimated
vector autoregressive (VAR) model. This methodology is most useful when the percentage of NA data
is low and when the information from the various stations is related and does not show much variability.

For this module it is essential that the data for maximum temperature, minimum temperature and
precipitation from the several stations cover the SAME PERIOD, because the variables interact with each
other to complete the missing data.

Figure 11: Filling missing data

Figure 11 shows the fields that must be specified to fill the missing data. Click the
complete data button to start. This process can take several minutes to finish.

Once the process is finished, a window appears indicating that it is complete. The
Missing Data folder will then contain a database for each of the variables and graphics comparing the
original series with the generated series (see Figure 12).

Figure 12: Location of the missing data files (callouts: folders with graphical outputs; data files
generated, with no missing data)

5. Homogeneity analysis of the series

In this module, several statistical tests were implemented to analyze the homogeneity of the series:

• Normality tests: These tests check whether the data of the variable under study come from a normal
distribution. If this assumption holds, parametric tests should be used; if it does not,
non-parametric tests are required.

• Stationarity (trend): Spearman's rank correlation* and the Mann-Kendall test are proposed. For future
estimates it is necessary that the stationarity assumption is met.

• Stability in variance: The F-test* is applied to subsets of the data.

• Stability in mean: Includes the t-test* and the Mann-Whitney U test as a non-parametric alternative
to the t-test, which uses the median as a more robust statistic than the mean.

Note: Tests marked with * require the normality assumption to hold.
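
As an illustration of one of the tests listed above, the following is a minimal Python sketch of a two-sided Mann-Kendall trend test. It is not RClimTool's code (the tool is written in R and the manual does not show its implementation). The function name is hypothetical, and the sketch uses the normal approximation without a correction for tied values, which full implementations usually include.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test.

    Uses the normal approximation and assumes no tied values.
    Returns (S, Z, p_value, trend_detected).
    """
    x = [v for v in series if v is not None]  # drop missing values
    n = len(x)
    # S counts concordant minus discordant pairs (later value vs earlier value)
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value, p_value < alpha
```

A p-value below the chosen significance level (5% in the example of Figure 13) would lead to rejecting the hypothesis of no trend for that station.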

Figure 13 shows some of the results obtained for this module. In this example, the variable tmax
and a significance level of 5% were used. The console displays a table for each test, which
includes the p-value and the decision at the chosen significance level for each station.

Figure 13: Homogeneity analysis

This module provides the option to generate a report that summarizes all statistical tests included in
the homogeneity analysis. To do so, click the Generate Report button.

6. Indicator calculations

The following sub-modules are available for indicator calculations:

• Annual indicators: The number of days in each year that meet the specified condition (Higher than or
Lower than) is calculated. The threshold value that defines the condition is set by the user.

• Monthly indicators: In this sub-module the monthly maximum, minimum or average of the
temperature or precipitation data is calculated.
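
The two kinds of indicators can be sketched as follows. This is a Python illustration under stated assumptions, not the tool's R code: the function names and the record layout (day, month, year, value) are hypothetical, missing values are None, and the comparisons are assumed to be strict (the manual does not say whether values equal to the threshold count).

```python
from collections import defaultdict

def annual_day_counts(records, threshold, higher=True):
    """Count, per year, the days whose value is above (or below) threshold.

    records: iterable of (day, month, year, value) tuples.
    """
    counts = defaultdict(int)
    for _day, _month, year, value in records:
        counts[year] += 0  # ensure the year is listed even with zero days
        if value is None:
            continue
        meets = value > threshold if higher else value < threshold
        if meets:
            counts[year] += 1
    return dict(counts)

def monthly_indicator(records, how="mean"):
    """Monthly max, min or mean of the non-missing values, keyed by (year, month)."""
    groups = defaultdict(list)
    for _day, month, year, value in records:
        if value is not None:
            groups[(year, month)].append(value)
    funcs = {"max": max, "min": min, "mean": lambda v: sum(v) / len(v)}
    return {key: funcs[how](vals) for key, vals in sorted(groups.items())}
```

For example, `annual_day_counts(records, 30.0)` would give, for each year, the number of days with a value higher than 30.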

To perform these calculations, first select the period and the variable to be analyzed. Then
select the indicator of interest by clicking its checkbox. Finally, Excel files with the calculated
indicators are generated in the Indicators folder (see Figure 14).

Figure 14: Calculation of annual and monthly indicators

7. ENSO condition (El Niño/Southern Oscillation)

RClimTool includes information on ENSO conditions from 1950 to 2013, available at monthly (1) or
quarterly (2) intervals (see Figure 15). After selecting the period of interest, click
the query you are interested in and the results will appear in the R console (see Figure 16).

Figure 15: ENSO condition

Figure 16: Example ENSO condition query

KNOWN ISSUES
One problem identified in this version concerns the filling of missing data: in order to carry out the data
filling, the date range of the variables has to run from January 1 of the initial year of analysis
to December 31 of the final year.
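
Before running the data filling, this requirement can be checked with a small helper. The Python function below is a hypothetical sketch written for this manual, not part of RClimTool; it only tests the first and last dates of a series.

```python
from datetime import date

def covers_full_years(first, last):
    """True if the series starts on January 1 and ends on December 31."""
    starts_ok = (first.month, first.day) == (1, 1)
    ends_ok = (last.month, last.day) == (12, 31)
    return starts_ok and ends_ok and first <= last
```

For instance, a series running from `date(1990, 3, 1)` to `date(2010, 12, 31)` would fail the check because it does not start on January 1.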

REPORT PROBLEMS
Please report any problem to Lizeth Llanos l.llanos@cgiar.org and David Arango
d.arango@cgiar.org including screenshots of error messages and data used for analysis.
Furthermore we appreciate any suggestions that contribute to the improvement of the tool.

APPENDIX A: INPUT DATA FORMAT
Files have to be in CSV format (comma delimited). You must supply a separate database (file) for each
variable, containing the stations to be analyzed. These files must meet the following requirements:

1. Columns in the following sequence: day, month, year, followed by the names of the stations.
NOTE: precipitation units = mm; temperature units = degrees Celsius

2. Missing data must be coded as NA. Data records must
be in chronological order. Missing dates are not allowed.
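
The rules above can be checked before loading a file into the tool. The following Python sketch is not part of RClimTool; the function name is hypothetical, and it assumes the file has one header row and that the first three columns hold day, month and year in that order.

```python
import csv
import io
from datetime import date, timedelta

def check_input(csv_text):
    """Return a list of problems found in a comma-delimited input file."""
    problems = []
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    if len(header) < 4:
        problems.append("expected day, month, year and at least one station column")
    previous = None
    for line_no, row in enumerate(body, start=2):
        try:
            current = date(int(row[2]), int(row[1]), int(row[0]))
        except (ValueError, IndexError):
            problems.append(f"line {line_no}: invalid date")
            continue
        # Consecutive calendar days imply chronological order and no missing dates
        if previous is not None and current != previous + timedelta(days=1):
            problems.append(f"line {line_no}: date out of order or gap after {previous}")
        previous = current
        for name, cell in zip(header[3:], row[3:]):
            if cell.strip() == "NA":
                continue  # missing value, coded as required
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_no}: bad value {cell!r} for {name}")
    return problems
```

An empty list means the file passed these checks; it does not guarantee that the tool will accept the file, since the manual may impose conditions not covered here.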

Example input data format for RClimTool:

Figure 17: Precipitation variable input format (callout: station names)

Figure 18: Maximum temperature variable input format

Figure 19: Minimum temperature variable input format
