MLRA v1.19 Manual
[Version 1.19]
July 2017
1.02, 03/01/2014 (J.P. Rivera & J. Verrelst)
The goodness-of-fit statistical measures in the Validation table have been extended according to the set of optimal statistics proposed by Richter et al. (2012):
Richter, K., Atzberger, C., Hank, T. and Mauser, W. (2012). Derivation of biophysical variables from Earth Observation data: validation and statistical measures. Journal of Applied Remote Sensing, 6(1), DOI: 10.1117/1.jrs.6.063557.
1.03, 04/04/2014 (J.P. Rivera & J. Verrelst)
Several small modifications have been introduced:
The Theil-Sen estimate has been moved to a new table with ‘advanced’ statistics within the calibration/validation tables. These advanced statistics are therefore only calculated when one or multiple SI models are selected in the calibration/validation table. This was done because their calculation is computationally expensive, i.e. it involves looping over many combinations. More advanced statistics are foreseen to be added to the table in the future.
Options to visualize residuals have been added to the calibration/validation tables. Residuals can be plotted against validation data.
It appeared that GPR was no longer providing sigmas. This has been corrected.
The sigmas as delivered by VHGPR are now also provided, as well as the associated uncertainties (SD, CV) when mapping mean estimates.
The uncertainties as provided by KRR were incorrect and have been removed.
It appeared that generated maps stored in ENVI format were wrongly projected.
The processing time for each regression model (training plus validation) has been added. The time to generate a map can also optionally be saved as a text file, which provides both the model development processing time and the mapping processing time.
In Options (within Tools), the reporting of processing speed can be deactivated.
A bug within one of the advanced PCAs has been corrected.
Because the MLRA toolbox only provides goodness-of-fit results from the validation, it is not possible to set 100% training in Settings: some portion has to be set aside for validation. To warn the user, the box turns yellow when 100 is entered.
1.04, 25/04/2014 (J.P. Rivera & J. Verrelst)
The option to save the processing time of generating a map has been moved to Options. By default it is deactivated. In Options, the path to a text file can be defined.
An option has been added to save the MLRA models in the MySQL table, so that previously generated models can be re-used.
1.05, 05/06/2014 (J.P. Rivera & J. Verrelst)
It appeared that when an error occurred, a temporary variable (dummyvar.m) was not removed and therefore caused an error in a subsequent run. Now it is first checked whether this variable has been removed.
Regarding the PCA approaches, a few small bugs have been corrected when applying the ‘Best RMSE’ option.
1.10, 15/01/2015 (J.P. Rivera & J. Verrelst)
The cross-validation module has been added. This allows more robust sampling. Statistics and plotting functions have been added along with it.
The reading of images and the assigning of the output folder are now synchronized with those of the SI toolbox.
1.11, 26/03/2015 (J.P. Rivera & J. Verrelst)
Several small bugs have been identified and corrected:
The cross-validation 1:1 graph appeared to show results reversed. Now corrected.
R2-adj can in some cases lead to errors. If an error occurs, the results are now converted to NaN and processing continues.
The 1:1 line was originally shown starting from 0. Now corrected.
1.12, 22/04/2015 (J.P. Rivera & J. Verrelst)
A few improvements have been introduced:
An option has been added to export an MLRA model as a Matlab file and to import an external MLRA model.
The reading of (User) text files has been improved. An error no longer appears when some white space remains at the end of the text file.
1.13, 16/06/2015 (J.P. Rivera & J. Verrelst)
Various improvements have been introduced:
Apart from standard ENVI images, it is now also possible to read (geo)TIFF images and write TIFF maps.
If the image is very big, it is read and processed line by line to avoid memory problems.
The option to develop MLRAs per land cover class has been revised.
The multi-output option appeared to be outdated. It has now been revised and updated similar to the single-output regression algorithms.
The window that enables reading a (User) text file has been updated: (1) a ‘transpose’ option has been added, e.g. to transpose exported text data; (2) the ‘combined’ input variables option has been moved further back.
In Tools, an option has been added to skip pixels where all bands have the same value (e.g. 0) during image processing.
The option to load an image in the Input menu was redundant and has been removed.
An option to convert the to-be-processed images has been added in the Retrieval window. It is a multiplicative conversion factor.
1.14, 21/07/2015 (J. López-Centelles, J.P. Rivera & J. Verrelst)
The following improvements have been introduced:
An error regarding the display of the mapping processing time has been resolved.
To facilitate the workflow, menu items now change color once the step is completed.
Also, the Settings step is now disabled until the Input step is completed. The same applies to Validation->New.
A sigma band analysis tool has been added. This tool iteratively removes the worst band in model development. Currently this tool only works for GPR and VH-GPR.
1.15, 25/10/2015 (J.P. Rivera, J. López-Centelles & J. Verrelst)
The following improvements have been introduced:
The GPR sigma band analysis tool has been updated with new features, such as the removal of multiple bands at the first iteration and various result visualization tools.
A progress bar has been added when analyzing multiple regression strategies.
A log window has been added. This window provides an overview of the principal steps executed within the toolbox.
Apart from processing an image, it is now also possible to process text files, e.g. with data coming from a field spectrometer.
A mapping bug in case of NaNs within the image has been corrected. A bug when selecting SCOPE RTM output data has also been corrected.
The scatterplot option in Tools has been updated with the calculation of goodness-of-fit indicators.
1.15, 25/10/2015 (Petar Dimitrov)
A section on how to deal with memory problems has been added at the end of the Manual.
1.16, 22/03/2016 (J.P. Rivera & J. Verrelst)
The following improvements have been introduced:
The MLRA toolbox has been made Linux-proof (however, there may be issues with MySQL).
The scatterplot tool has been expanded with error maps and histograms.
An option has been added to validate a previously developed regression model with new, external data. This option allows evaluating the portability of a regression model.
In RTM input data, the ‘combined’ option has been moved a bit further back, similar to USER input data.
A bug regarding the reading of retrieval TXT files has been resolved.
1.17, 08/10/2016 (J.P. Rivera & J. Verrelst)
The following improvements have been introduced:
There were some bugs in some plotting and mapping options of the GPR sigma band analysis tool (GPR-BAT). These have now been resolved.
A new active learning (AL) module has been implemented in this version.
A ‘dir’ issue specific to Matlab 2016 has been corrected, and the error message when reading User data has been improved.
1.18, 08/02/2017 (J.P. Rivera, J. Muñoz-Mari & J. Verrelst)
The following improvements have been introduced:
The Neural Network (NN) and Regression Tree (RT) methods have been updated for newer Matlab versions. NN should now run faster (with small datasets). However, RT may lead to an error on Matlab 2013 or older. Note that Matlab toolboxes (Neural Networks and Machine Learning) are required.
Some bugs have been resolved when reopening results with ‘measured vs estimated’ and in ‘retrieval->txt’.
The SVR method has been reintroduced. It now works on both 32- and 64-bit machines.
There was a problem with processing large images. All processing now occurs line by line to avoid running into out-of-memory problems. A progress bar has been added.
A mask option has been added to View Maps. In the case of GPR, it allows masking out uncertain retrievals.
When plotting the GPR sigmas, the component numbers are now given when a PCA is used. The wavelength labels have also been improved.
When retrieving an image, the output file is now editable if a single image has been selected. If multiple images are selected, only the output folder can be selected.
An imported external MLRA model (as created by ARTMO) can now be applied to the retrieval of both an image and a txt file.
The progress bar when training and validating MLRAs has been improved. It now provides an extra progress bar when sub-models are being created (e.g. when using cross-validation).
The measured-estimated figures have been synchronized for cases without and with cross-validation sampling.
1.19, 02/07/2017 (J.P. Rivera, J. Muñoz-Mari & J. Verrelst)
The following improvements have been introduced:
In the case of TIFF image processing, the geo tags are now also written for geoTIFF. However, this option only works when Matlab’s Mapping Toolbox is available.
Cross-validation estimates in the case of k-fold and LOO have been corrected. The statistics are now calculated based on the estimates of all the subsets. Afterwards, the model that is used for retrieval is trained with all data.
The neural network algorithm has been extended with advanced options. Various alternative optimization algorithms can now be selected.
The window with the overview of the conducted analyses (validation→load) has been improved. Some key properties (e.g. #bands and samples) can now be observed immediately. The same window is used when deleting or renaming analyses. In the delete window, multiple exercises can now be deleted at once.
To speed up image processing, it is now possible to process images per tile (or block). The number of lines in a tile can be adjusted to avoid out-of-memory problems with large images.
In the case of ENVI image processing, images with an extension (e.g. ‘.bsq’) can now also be processed.
The SIMFEAT dimensionality reduction module has been improved. For the kernel dimensionality reduction methods, an internal sigma optimization has been added. Additional options to control the optimization have been added. Some small bugs have also been corrected.
When a dimensionality reduction method is used, the validation table now gives the number of features (components) in addition to the number of bands.
Table of Contents
1 Revision History
2 Introduction
2.1 Ongoing development
2.2 Please cite the toolbox:
3 ARTMO’s MLRA toolbox
3.1 Installation
3.2 ARTMO
3.3 MLRA’s modular architecture
4 Input
4.1 Input from RTM model data (LUT)
4.2 Input from external User data (TXT)
4.3 Load land cover map (optional)
4.4 Inserting input data with land cover class labels
4.5 Combining RTM data with external User data
5 Settings
5.1 Single-output MLRAs
5.2 Band tools: redundancy reduction
5.3 Cross-validation module
5.4 Active learning
5.5 Configuring per land cover class
5.6 Multi-output regression algorithms
6 Validation
6.1 Validation: New
6.1.1 Graphics
6.2 Outputs GPR band analysis tool (GPR-BAT)
6.2.1 Select regression model
6.3 Active learning module
6.4 Validation: Load
7 Retrieval
7.1 Retrieval image
7.2 Retrieval: Text file
8 Tools
8.1 Save
8.2 Load
8.3 Manage tests
8.4 Options
8.5 View maps
8.6 View figure
8.7 Import model
8.8 ScatterPlot
8.9 Validation external data
9 Help
9.1 Show log
10 Error reporting
10.1 Dealing with memory problems
10.2 Error in case of unauthorized writing of temporary files
7
2 Introduction
Biophysical parameter mapping from optical remote sensing images always requires an intermediate modeling step to transform spectral observations into useful estimates. This modeling step can be approached with statistical, physical or hybrid methods. Here, the emphasis is on statistical methods, which can be categorized into parametric and nonparametric approaches.
When validation data are available, multiple MLRA strategies can be evaluated against the validation dataset using goodness-of-fit statistics. Results are stored in a relational database.
For mapping applications, the best performing strategy can be loaded and applied to an image, or a model can be developed and applied directly to an image.
In case of hyperspectral data, a dimensionality reduction method (e.g. PCA) can be applied prior to regression.
The majority of the algorithms used in the MLRA module are based on G. Camps-Valls’ regression algorithms toolbox, published in:
Lázaro-Gredilla, M., Titsias, M.K., Verrelst, J., Camps-Valls, G. (2014). Retrieval of Biophysical Parameters with Heteroscedastic Gaussian Processes. IEEE Geoscience and Remote Sensing Letters, 11(4), pp. 838-842.
The source code of the MLRAs is available at:
http://www.uv.es/gcamps/code/simpleR.html
Please also consider citing these related papers regarding the MLRA toolbox:
Verrelst, J., Dethier, S., Rivera, J.P., Munoz-Mari, J., Camps-Valls, G., Moreno, J. (2016). Active learning methods for efficient hybrid biophysical variable retrieval. IEEE Geoscience and Remote Sensing Letters, 13, pp. 1012-1016.
Verrelst, J., Rivera, J.P., Gitelson, A., Delegido, J., Moreno, J., Camps-Valls, G. (2016). Spectral band selection for vegetation properties retrieval using Gaussian processes regression. International Journal of Applied Earth Observation and Geoinformation, 52, pp. 554-567.
Verrelst, J., Rivera, J.P., Veroustraete, F., Muñoz-Marí, J., Clevers, J.G.P.W., Camps-Valls, G., Moreno, J. (2015). Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods: A comparison. ISPRS Journal of Photogrammetry and Remote Sensing, 108, pp. 260-272.
Verrelst, J., Rivera, J.P., Moreno, J., Camps-Valls, G. (2013). Gaussian processes uncertainty estimates in experimental Sentinel-2 LAI and leaf chlorophyll content retrieval. ISPRS Journal of Photogrammetry and Remote Sensing, 86, pp. 157-167.
Verrelst, J., Alonso, L., Rivera, J.P., Moreno, J., Camps-Valls, G. (2013). Gaussian process retrieval of chlorophyll content from imaging spectroscopy data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 6(2), Part 3.
Verrelst, J., Muñoz, J., Alonso, L., Delegido, J., Rivera, J.P., Camps-Valls, G., Moreno, J. (2012). Machine Learning Regression Algorithms for Biophysical Parameter Retrieval: Opportunities for Sentinel-2 and -3. Remote Sensing of Environment, 118, pp. 127-139.
Verrelst, J., Alonso, L., Camps-Valls, G., Delegido, J., Moreno, J. (2012). Retrieval of Vegetation Biophysical Parameters using Gaussian Processes Techniques. IEEE Transactions on Geoscience and Remote Sensing, 50(5), pp. 1832-1843.
3 ARTMO’s MLRA toolbox
3.1 Installation
The MLRA toolbox operates within the ARTMO environment (Figure 3-1). Please consult the MLRA Installation guide on how to install the MLRA toolbox into ARTMO.
3.2 ARTMO
Once installed, the MLRA module automatically appears in ARTMO’s top bar, under Retrieval (see Figure 3-2).
3.3 MLRA’s modular architecture
The MLRA module is itself organized in a modular way. All modules are accessible from MLRA’s main drop-down menu. A schematic overview is provided below (Figure 3-4).
The five modules of the MLRA toolbox are described in the following sections. The modules have to be used in a logical order, according to:
4 Input
The Input module is the first mandatory step to configure. Input data can come from two sources:
The Project overview window then appears, where a project can be chosen (Figure 4-1). Within a project, a single look-up table (LUT) class can then be chosen if multiple LUT classes are configured.
Note that the Project overview window is the same as the one used in ARTMO. By clicking on a project (any cell in the row of the top panel) and then on Input, the metadata of the applied models can be consulted. If multiple LUT classes are configured, they appear in the bottom panel. The appropriate LUT class can then be selected by clicking on any cell in a given row.
Figure 4-1. Project overview window to select a project and a LUT class.
Subsequently, a window appears in which the output spectra and input variables of the chosen LUT class can be selected (Figure 4-2). Depending on the complexity of the LUT class, the output from different models can be chosen (e.g. at leaf or at canopy level), the LUT can be restricted by narrowing the variable ranges, and output parameters for mapping can be selected.
Figure 4-2. Configure the parameters to be mapped and the spectral output to be used.
Two input variables can be combined. When clicking on ‘Combined’, a window appears where two variables can be selected (Figure 4-3). Their product is then calculated (e.g. LAI and leaf chlorophyll content lead to canopy chlorophyll content).
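The ‘Combined’ option amounts to a sample-by-sample product of the two selected variables. A minimal Python sketch, assuming both variables are given as equal-length lists of per-sample values; the function name `combine_variables` is hypothetical and not part of the toolbox:

```python
def combine_variables(var_a, var_b):
    """Return the sample-wise product of two input variables.

    Hypothetical helper illustrating the 'Combined' option, e.g.
    LAI x leaf chlorophyll content = canopy chlorophyll content.
    """
    if len(var_a) != len(var_b):
        raise ValueError("both variables must have the same number of samples")
    return [a * b for a, b in zip(var_a, var_b)]


lai = [0.5, 2.0, 4.0]       # leaf area index per sample
cab = [20.0, 40.0, 50.0]    # leaf chlorophyll content per sample
ccc = combine_variables(lai, cab)
print(ccc)  # [10.0, 80.0, 200.0]
```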
An important point to realize is that only variables originating from the models can be mapped. Also, when the generated model is applied to a remote sensing image, the simulated data may be too smooth compared to real observations. It may therefore be advisable to add noise in a subsequent step (see Settings).
Edit settings can be accessed if RTM input data have been configured earlier. The same window (Figure 4-2) then appears and the input settings can be modified.
It is important to realize that many MLRAs are computationally demanding. Therefore, depending on your computer, most of the MLRAs can only be fed with up to 3000-100000 samples. In a future version we aim to find a way to insert larger training datasets.
To insert your own User data, make sure that the data are prepared in a matrix format, i.e. each cell should contain a number. Please prepare the data according to the structure below (Figure 4-4).
Figure 4-4. Structure of the User data file: variables in rows, wavelengths in the first column.
An example is given below (Figure 4-5). Note that the text file can include a header (e.g. 1, 2, 3, ...), but this must then be indicated in the Input window (see Figure 4-6).
Figure 4-5. Example of an Input file with field data. The first column is a header. The following columns represent the parameters as measured in the field. Starting from row 7, the corresponding spectra are added below, with the wavelengths in the first column.
An important aspect here is that two types of input data are required within the same
file: (1) the parameters to be mapped, e.g. leaf area index, chlorophyll content; and (2)
the related spectra. These data need to be provided together in a plain text file.
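To make the layout concrete, the file can be read as a plain numeric matrix and split into its parameter block and spectral block. A minimal Python sketch, assuming whitespace-separated values, no extra header row, and that the number of parameter rows (`n_params`) is known; `read_user_file` is a hypothetical name, not the toolbox's reader:

```python
def read_user_file(text, n_params):
    """Split a User data matrix into parameters, wavelengths and spectra.

    Assumed layout (one sample per column after the first):
      first n_params rows: label in column 1, then one parameter value per sample
      remaining rows:      wavelength in column 1, then one reflectance per sample
    """
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("every row must contain the same number of cells")
    params = [r[1:] for r in rows[:n_params]]      # one list per parameter
    wavelengths = [r[0] for r in rows[n_params:]]
    # Transpose the spectral block: one reflectance list (spectrum) per sample.
    spectra = [list(col) for col in zip(*(r[1:] for r in rows[n_params:]))]
    return params, wavelengths, spectra


example = """
1    1.2   3.4
2    35.0  52.0
500  0.05  0.04
600  0.08  0.06
700  0.30  0.35
"""
params, wavelengths, spectra = read_user_file(example, n_params=2)
print(params)       # [[1.2, 3.4], [35.0, 52.0]]
print(wavelengths)  # [500.0, 600.0, 700.0]
print(spectra)      # [[0.05, 0.08, 0.3], [0.04, 0.06, 0.35]]
```

Rejecting rows of unequal length up front mirrors the manual's requirement that every cell contains a number.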
When opening the User Input file, the following steps are required:
Figure 4-6. Input window to load external User data as prepared in a text file. First rows: input parameters; first column: wavelengths. Spectra below the parameters.
In the top left bar it is also possible to save the inserted data, including the configured
input settings. A file browser allows you to save the data and a message will appear
when done (Figure 4-7, left). These data and settings can subsequently be loaded
(Figure 4-7, right). In this way, there is no need to repeat the input settings each time.
After loading, one can immediately click on Import and proceed with the MLRA settings.
Figure 4-7. Message windows confirming that settings have been saved [left] and that a selected, previously saved file has been loaded [right].
When importing data, make sure that the dataset is complete, i.e. that a value is given for each cell. Empty cells can be replaced by ‘NaN’; however, this will lead to errors in further processing. A warning message appears if inconsistencies are encountered.
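Before importing, it can help to locate incomplete cells yourself. The toolbox's own consistency check is not described here, so the following is only a hypothetical pre-check in Python: it reports cells that are missing from a row or that contain ‘NaN’ (the name `find_incomplete_cells` is an assumption):

```python
import math


def find_incomplete_cells(rows):
    """Return (row, column) positions of missing cells or 'NaN' entries.

    `rows` holds the cells of each text line as strings, e.g. line.split().
    A row shorter than the longest row is treated as having missing cells.
    """
    width = max(len(r) for r in rows)
    problems = []
    for i, row in enumerate(rows):
        for j in range(width):
            # Check for a missing cell first, then for an explicit NaN value.
            if j >= len(row) or math.isnan(float(row[j])):
                problems.append((i, j))
    return problems


lines = ["500 0.05 0.04", "600 NaN 0.06", "700 0.30"]
issues = find_incomplete_cells([line.split() for line in lines])
print(issues)  # [(1, 1), (2, 2)]
```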
When aiming to develop MLRA strategies per land cover class, the first step is to select a land cover map:
A remote sensing land cover map can then be selected through a file browser window (Figure 4-8). From v1.13 onwards, both ENVI images and (geo)TIFF files can be read. ENVI is preferred since the module makes use of the information in the .hdr file. Either an ENVI image file or its associated .hdr file can be selected. For TIFF files, only labels will be identified.
Figure 4-8. File browser to insert an ENVI remote sensing classified map header (.hdr).
Once this step is completed, the names of the different land cover classes will appear in the following steps, so that a new MLRA strategy can be developed per land cover class (see following sections).
Figure 4-9. Input window to load external User data, with a row assigned as the class ID line.
Once completed, a window appears that allows linking the land cover classes with the labels to which the input data are assigned (Figure 4-10). The user can then manually link each land cover class to an input class. A land cover class that is not assigned will not be processed.
Figure 4-10. Window to link land cover classes with input classes.
Figure 4-11. Window to match the RTM parameters with external user parameters.
Internally, the toolbox checks whether both datasets contain the same spectral bands. When the number of bands does not match, the following error message appears (Figure 4-12):
Figure 4-12. Error message when spectral bands from RTM data do not match with those of
User data.
Once the input data have been loaded, the next step is to configure the MLRA Settings for training and validation. Alternatively, the user can use all data for training, without validation, in the Retrieval module.
5 Settings
Once the Input data have been configured, the Settings module can be accessed. This module enables evaluating one or multiple MLRA scenarios prior to applying one to a remote sensing image for biophysical parameter mapping. The input data are hereby expected to be partitioned into training and validation data, since goodness-of-fit results of the validation data will be presented.
Note that it is also possible to apply an MLRA model directly to a remote sensing image. In that case the user can skip the MLRA settings and go directly to Retrieval, and no validation will be performed: in the Retrieval window a model is developed and directly applied to a remote sensing image.
Note that the structure of the more complicated MLRAs, such as neural networks, is already predesigned internally (e.g. with respect to hidden layers) to ensure ease of use. For the kernel MLRAs (KRR, SVR, GPR), internal tuning takes place. This involves adjusting the parameters of the models, carried out automatically by partitioning the training set and following a cross-validation (n-fold) strategy. For more information regarding these regression algorithms, please consult:
http://www.uv.es/gcamps/code/simpleR.html.
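For illustration only: the toolbox's internal tuning comes from the code referenced above and is not reproduced here. The sketch below shows the general idea for kernel ridge regression (KRR) with a Gaussian (RBF) kernel, choosing the kernel width sigma and the regularization lambda by n-fold cross-validation on the training set. The parameter grids, the toy data and all function names are assumptions; the code is dependency-free (a small linear solver is included) so that it runs as-is.

```python
import math
import random


def rbf(x, z, sigma):
    """Gaussian (RBF) kernel between two spectra x and z."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(a, b):
    """Solve the linear system a @ w = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def krr_fit_predict(x_tr, y_tr, x_te, sigma, lam):
    """KRR: alpha = (K + lam*I)^-1 y, prediction = sum_j alpha_j k(x, x_j)."""
    k = [[rbf(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(x_tr)] for i, a in enumerate(x_tr)]
    alpha = solve(k, y_tr)
    return [sum(al * rbf(x, xt, sigma) for al, xt in zip(alpha, x_tr))
            for x in x_te]


def tune_krr(x, y, sigmas, lams, n_folds=5, seed=0):
    """Return (rmse, sigma, lam) with the lowest n-fold cross-validated RMSE."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    best = None
    for sigma in sigmas:
        for lam in lams:
            sq_err = 0.0
            for fold in folds:
                held = set(fold)
                tr = [i for i in idx if i not in held]
                pred = krr_fit_predict([x[i] for i in tr], [y[i] for i in tr],
                                       [x[i] for i in fold], sigma, lam)
                sq_err += sum((p - y[i]) ** 2 for p, i in zip(pred, fold))
            rmse = math.sqrt(sq_err / len(x))
            if best is None or rmse < best[0]:
                best = (rmse, sigma, lam)
    return best


# Toy data: two "bands" per sample and a smooth target variable.
rng = random.Random(1)
x = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(30)]
y = [2.0 * a + math.sin(3.0 * b) for a, b in x]
rmse, sigma, lam = tune_krr(x, y, sigmas=[0.1, 0.5, 1.0], lams=[1e-3, 1e-1])
print(f"best sigma={sigma}, lambda={lam}, CV RMSE={rmse:.3f}")
```

Only the training set is used for tuning; the validation data defined in Settings stay untouched until the tuned model is evaluated.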
Figure 5-1. MLRA single-output settings window.
Various configuration options have been implemented that can lead to optimized performance:
Multiple nonparametric regression algorithms can be selected at once for
evaluation. All possible combinations with further defined settings will then be
assessed.
Gaussian noise can be added, both to the parameters and to the spectra. A range of noise levels can be configured so that multiple noise scenarios can be evaluated. Injecting noise can be important to account for environmental and instrumental uncertainties, e.g. when simulated spectra from RTMs are used for training. Noise is applied to both training and validation data.
Range. A choice can be made to insert a single value or multiple values by activating
the Range. Then, all inserted configurations will be assessed in subsequent
validation. There are three ways to enter a range:
Figure 5-3. Adding a range through a uniform distribution and a number of samples.
When both RTM input data and User data have been inserted (see section 4.5), both datasets can be combined. Both the RTM data and the USER data are then activated. In principle, a training partition can be set for each data input; the remaining part of each is then assumed to go to validation. However, by activating the boxes below ‘Only train’ or ‘Only Validation’, a dataset can be excluded from validation or from training, respectively. In this way, all kinds of partitioning combinations are possible by assigning portions of User or RTM data to either training or validation.
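The partitioning and noise options above can be pictured as follows. This Python sketch is not the toolbox's implementation: it assumes a random split by training fraction and a multiplicative Gaussian noise model expressed as a percentage, since the noise model is not specified in this section. The function name and toy data are hypothetical.

```python
import random


def split_and_add_noise(params, spectra, train_fraction, noise_pct, seed=0):
    """Randomly split samples into training/validation and add Gaussian noise.

    params: one parameter value per sample; spectra: one spectrum per sample.
    noise_pct: standard deviation of the noise as a percentage of each value
    (an assumed multiplicative noise model).
    """
    rng = random.Random(seed)
    idx = list(range(len(params)))
    rng.shuffle(idx)
    n_train = round(train_fraction * len(idx))
    sd = noise_pct / 100.0

    def noisy(v):
        return v * (1.0 + rng.gauss(0.0, sd))

    def subset(ids):
        return ([noisy(params[i]) for i in ids],
                [[noisy(r) for r in spectra[i]] for i in ids])

    # Noise is applied to both the training and the validation subset.
    return subset(idx[:n_train]), subset(idx[n_train:])


params = [1.0, 2.0, 3.0, 4.0, 5.0]
spectra = [[0.1 * k, 0.4] for k in range(1, 6)]
for noise in (0.0, 5.0, 10.0):  # a range of noise scenarios
    (p_tr, s_tr), (p_va, s_va) = split_and_add_noise(params, spectra, 0.6, noise)
    print(f"noise {noise}%: {len(p_tr)} training, {len(p_va)} validation samples")
```

Looping over noise levels corresponds to evaluating several noise scenarios, each of which would then be trained and validated separately.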
5.2 Band tools: redundancy reduction
Since each added band increases the computational load, options to identify the relevant bands have been added to overcome the Hughes phenomenon, or ‘curse of dimensionality’.
Regarding the cluster-based methods (CCA, OPLS, KCCA, KOPLS), the selected variable is first clustered internally using k-means clustering, and the DR method is then applied based on these clusters. By default, the same number of clusters as components is taken, but the user can introduce more clusters: clicking on ‘cluster’ opens a window (Figure 5-8) where more clusters can be given. For the kernelized DR methods, additional options are provided regarding the kernel type, the sigma method used, and the optimization of sigma in view of regression. By default, 10 repetitions are applied using linear regression, but this option can be deactivated or customized, e.g. by applying the optimization with the user's regression method (which can be considerably slower if an advanced regression method is chosen). By default, 5 components are given, but the user can insert any number.
Figure 5-7. SIMFEAT dimensionality reduction module with 11 DR methods and optimization
options.
Figure 5-8. Option to change the number of clusters for the cluster-based DR methods (CCA,
OPLS, KCCA, KOPLS). By default the same number of clusters as components +1 is provided.
4. GPR band analysis tool (GPR-BAT). From v1.14, a band analysis tool (BAT) has been added based on the band ranking properties of a few MLRAs. Specifically, the family of GPR methods, operating in a Bayesian framework, provides band ranking properties, i.e. the lower the sigma of a band, the more important the band. Consequently, high sigmas imply less relevant bands. Based on this property, a backward band reduction option is provided, whereby in each iteration the poorest performing band is removed and the goodness-of-fit statistics are recalculated. Eventually the best performing bands are identified, e.g. for 4, 3 and 2 bands, until finally one band is left. This approach can be of interest for finding the most sensitive bands for a variable, as well as for ascertaining the minimum number of bands needed to keep an acceptable accuracy. The method only works after GPR (or VH-GPR) has been selected first. The following window then appears (Figure 5-9):
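The backward band reduction described in item 4 can be sketched as a simple loop. This is a minimal Python illustration, not the GPR-BAT code: `fit` is a hypothetical user-supplied function that would train a GPR on the given bands and return its per-band sigmas (e.g. the length-scales of an ARD kernel) together with a goodness-of-fit score. Here it is replaced by a toy stand-in with fixed sigmas and hypothetical band wavelengths.

```python
def backward_band_elimination(bands, fit):
    """Iteratively drop the least relevant band until one band is left.

    `fit(bands)` returns (sigmas, score): a dict with one sigma per band and
    a goodness-of-fit value. A higher sigma means a less relevant band.
    Returns a list of (kept_bands, score) pairs, one per iteration.
    """
    current = list(bands)
    history = []
    while True:
        sigmas, score = fit(current)
        history.append((list(current), score))
        if len(current) == 1:
            return history
        # Drop the band with the highest sigma (least relevant band).
        worst = max(current, key=lambda b: sigmas[b])
        current.remove(worst)


# Toy stand-in for a trained GPR (hypothetical bands in nm, fixed sigmas).
toy_sigmas = {490: 5.0, 560: 2.0, 665: 0.5, 705: 1.0, 865: 3.0}


def toy_fit(bands):
    """Fixed sigmas; the score simply rises with the number of bands."""
    return {b: toy_sigmas[b] for b in bands}, 0.5 + 0.1 * len(bands)


for kept, score in backward_band_elimination(sorted(toy_sigmas), toy_fit):
    print(kept, round(score, 2))
```

Refitting at every iteration, rather than ranking all bands once, lets the sigmas of the remaining bands change after each removal, which is what makes the recalculated statistics meaningful.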
1 http://es.mathworks.com/help/stats/cvpartition.html
class proportions as in group. cvpartition treats NaNs or empty strings in group as missing
values.
c = cvpartition(n,'HoldOut',p) creates a random partition for holdout validation
on n observations. This partition divides the observations into a training set and a test (or holdout) set.
The parameter p must be a scalar. When 0 < p < 1, cvpartition randomly selects
approximately p*n observations for the test set. When p is an integer,cvpartition randomly
selects p observations for the test set. The default value of p is 1/10.
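For readers working outside Matlab, the hold-out rule quoted above can be mimicked with a short Python sketch. The function `holdout_partition` is hypothetical and only reproduces the documented behaviour of the `p` parameter; rounding `p*n` to the nearest integer is an assumption, since cvpartition only promises approximately p*n test observations. The `seed` argument plays a role similar to the Seed option in Tools→ Options.

```python
import random

def holdout_partition(n, p=0.1, seed=None):
    """Return (train, test) index lists for n observations.

    Mimics the documented cvpartition 'HoldOut' rule: if 0 < p < 1, about
    p*n observations go to the test set (rounded here, an assumption);
    if p is an integer, exactly p observations go to the test set.
    """
    if isinstance(p, int) and not isinstance(p, bool):
        n_test = p
    elif 0 < p < 1:
        n_test = round(p * n)
    else:
        raise ValueError("p must be in (0, 1) or a positive integer")
    if not 0 < n_test < n:
        raise ValueError("test set must contain between 1 and n-1 observations")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # same seed -> same partition
    return sorted(idx[n_test:]), sorted(idx[:n_test])

train, test = holdout_partition(100, 0.2, seed=1)
print(len(train), len(test))  # 80 20
```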
After clicking on Cross-validation, the following window will appear (Figure 5-10):
Figure 5-10. Single-output window with land cover class activated and Next Class button.
In accordance with the Matlab function cvpartition, the following methods are provided:
k-fold
Stratified k-fold
Hold-out
Stratified hold-out
Leave-one-out
In the case of stratified k-fold, the option is given to load a file with group data.
As an example, to enable validating MLRA methods with the complete dataset, a k-fold
cross-validation technique can be employed. In k-fold cross-validation the dataset is
randomly divided into k equal-sized sub-datasets. Of these k sub-datasets, k-1 are used
as the training dataset, and the remaining single sub-dataset is used as the validation
dataset for testing the model. The cross-validation process is then repeated k times,
with each of the k sub-datasets used in turn as the validation dataset. The results from
the k iterations are combined to produce a single estimation. In this way, all data are
used for both training and validation, and each observation is used for validation
exactly once.
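The k-fold scheme above can be sketched in a few lines of Python. This is an illustration under assumptions, not the toolbox code: `kfold_indices` is a hypothetical helper, and when n is not divisible by k the folds are allowed to differ in size by one observation.

```python
import random

def kfold_indices(n, k, seed=None):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(j for m, fold in enumerate(folds) if m != i for j in fold)
        yield train, validation

for train, validation in kfold_indices(10, 3, seed=0):
    print(len(train), len(validation))
```

Because the folds are disjoint and together cover all indices, every observation ends up in exactly one validation set, as stated above.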
Once a cross-validation data partitioning has been selected, the Settings can be completed
as usual.
5.4 Active learning
From v.1.17 onwards a new module (plug-in) named “Active Learning” has been added
to Settings (Figure 5-11).
The active learning module has been published in Verrelst et al., 2016. Active learning
(AL) methods make it possible to select the most informative samples from an additional
large data set. The AL methods sequentially search for meaningful samples within a
sampling pool (e.g. a LUT) in order to increase model performance. Six AL methods are
introduced for achieving optimized biophysical variable estimation with a manageable
training data set. The selection criteria rank the samples according to either their
uncertainty or their diversity. These criteria are sometimes used together in
classification problems and are here applied separately to regression. Selecting samples
by uncertainty picks the most uncertain samples, i.e., those with the least confidence.
Uncertainty criteria include variance-based pool of regressors (PAL), entropy query by
bagging (EQB), and residual regression AL (RSAL). Selecting samples by diversity ensures
that added samples are dissimilar from those already accounted for. Diversity criteria
include Euclidean distance based diversity (EBD), angle-based diversity (ABD), and
cluster-based diversity (CBD). The algorithms are described in Verrelst et al., 2016.
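To make the two families of criteria concrete, the sketch below implements simplified versions of one uncertainty criterion (in the spirit of PAL) and one diversity criterion (in the spirit of EBD) in Python with NumPy. It is an interpretation for illustration, not the implementation described in Verrelst et al., 2016: the committee members are ordinary least-squares regressors trained on bootstrap resamples, and all function names are hypothetical.

```python
import numpy as np

def _fit(X, y):
    # Ordinary least squares with intercept (stand-in regressor).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def _predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def pal_select(X_train, y_train, X_pool, n_models=10, seed=0):
    """Uncertainty criterion: train a committee on bootstrap resamples and
    return the index of the pool sample with the largest prediction variance."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, len(X_train), size=len(X_train))
        preds.append(_predict(_fit(X_train[boot], y_train[boot]), X_pool))
    return int(np.argmax(np.var(np.array(preds), axis=0)))

def ebd_select(X_train, X_pool):
    """Diversity criterion: return the index of the pool sample whose nearest
    training sample is farthest away in Euclidean distance."""
    d = np.linalg.norm(X_pool[:, None, :] - X_train[None, :, :], axis=2)
    return int(np.argmax(d.min(axis=1)))

X_train = np.array([[0.0, 0.0], [1.0, 0.0]])
X_pool = np.array([[0.5, 0.0], [5.0, 5.0], [1.0, 1.0]])
print(ebd_select(X_train, X_pool))  # 1
```

Uncertainty-based selection asks where the current models disagree; diversity-based selection ignores the model and only asks which candidate is least like the data already in the training set.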
When clicking on Active Learning the following window appears (Figure 5-12):
Figure 5-12. Settings of the Active Learning module.
Apart from selecting the AL methods, an obligatory step is to select the pool data (LUT
or user data). From these data the AL methods select samples and add them to the training
data of the regression method. As with the input data, the pool data can come either from
ARTMO RTM projects or from User data. The same steps have to be followed as in section 4.
When the data are selected, the following window appears (Figure 5-13), where the
originally selected variable has to be matched with the variable of the pool data (these
should be the same variable, although their names may differ).
Figure 5-13. Window to match the name of the original input variable with the name of the pool input variable.
Obviously, the variable should be the same.
5.5 Advanced options
From v. 1.19 onwards some advanced options are added. The idea is that here the
advanced user will get the possibility to tune the MLRAs. Until now that was not possible
and only default settings are applied. In this version it becomes possible to tune the
optimization of neural networks – in future versions also other MLRAs will become
tunable. When clicking on:
advanced options→ neural networks
then the following window will appear (Figure 5-14):
Figure 5-15. Single-output window with land cover class activated and Next Class button.
Figure 5-16. GUI to configure the MLRA settings for multi-output.
6 Validation
6.1 Validation: New
Finally, once the Input data have been provided and the MLRA settings (single-output or
multi-output) configured, those scenarios can be run. All combinations that were defined
in the MLRA settings are trained with the training data and evaluated with the remaining
data assigned to validation. To start the analysis the following steps have to be done:
Validation→ New
A text box will appear where a name can be filled in (Figure 6-1):
Figure 6-1. Window to provide a name for the new validation table.
Note that if no name is provided, an automatic (default) name consisting of the current
date (year, month, day, minute, second) will be used. By clicking on OK, all configured
scenarios will be analyzed one by one. Validation results are automatically saved in the
current MySQL database. This has the advantage that a large number of results can be
stored in a systematic manner and easily queried later.
A message will appear that the MLRA analysis is proceeding, and within Matlab’s
command window the process status can be followed.
Once all scenarios have been analyzed, then an overview table with best validated
results will appear (Figure 6-2). It is important to note that only validation results are
presented in the ‘MLRA test table’. Training results are not provided because they
provide biased information, i.e. they tend to be over-optimistic because they predict on
the same data they were trained with.
For validation the following goodness-of-fit measures are provided:
Table 1. Goodness-of-fit statistical measures
7 Adjusted R2
All measures indicate the degree of association between estimated and observed values
of the same variable. Apart from MAE and adjusted R2, these statistical measures are
those proposed by Richter et al. (2012), which are considered an optimal statistical set.
Richter, K., Atzberger, C., Hank, T. and Mauser, W. (2012): Derivation of biophysical variables from Earth Observation data: validation and statistical measures, Journal of Applied Remote Sensing 6 (1), DOI: 10.1117/1.jrs.6.063557.
MAE was also considered because it can be used together with RMSE to diagnose the
variation in the errors in a set of predictions. The RMSE will always be larger than or
equal to the MAE; the greater the difference between them, the greater the variance in
the individual errors in the sample. If RMSE = MAE, then all errors are of the same
magnitude (see footnote 2).
2 http://www.eumetcal.org/resources/ukmeteocal/verification/www/english/msg/ver_cont_var/uos3/uos3_ko1.htm
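The RMSE/MAE diagnostic described above can be checked with a small Python sketch using the standard definitions RMSE = sqrt(mean(e^2)) and MAE = mean(|e|), with errors e = estimated - measured. The helper name `rmse_mae` is hypothetical; the toolbox computes these statistics internally in Matlab.

```python
import math

def rmse_mae(measured, estimated):
    """Return (RMSE, MAE) between measured and estimated values."""
    errors = [e - m for m, e in zip(measured, estimated)]
    n = len(errors)
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    return rmse, mae

# Errors of equal magnitude: RMSE == MAE
print(rmse_mae([1, 2, 3, 4], [2, 1, 4, 3]))  # (1.0, 1.0)
# One large error: RMSE > MAE, signalling a high variance of the individual errors
print(rmse_mae([1, 2, 3, 4], [1, 2, 3, 8]))  # (2.0, 1.0)
```

Both examples have the same MAE, but only the second has a single large error, which the squared errors in the RMSE amplify.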
In the validation table the best performing results are shown according to the selected land
cover class (if configured), parameter and statistic (Figure 6-2).
Figure 6-2. Validation table with options to organize statistical results and plotting options. A
strategy can be chosen to apply to retrieval.
The statistics are organized with the best results per regression model (in case multiple
options are calculated). The user can choose by which statistic to sort the results.
When multiple options per regression model are calculated, these results can also be
shown by changing the top number. From v.1.19 onwards, the results of the multiple
dimensionality reduction (DR) methods of the SIMFEAT module can also be accessed by
changing the top number.
6.1.1 Graphics
In the ‘Graphics’ section, when a validated regression model (e.g. the best performing
one) is selected, its prediction performance can be visualized.
Further, various options to display the results are provided:
plotting the relevant bands of GPR (or VHGPR) through its sigmas (Figure 6-4):
A special case is the Gaussian process regression algorithm (GPR) and its “Variational
Heteroscedastic” variant (VHGPR). These provide additional outputs (that we call sigmas)
that are generated during the development of a model. These sigmas indicate the
relevance of the contributing bands: the lower the sigma, the more important the
band.
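Ranking bands by their sigmas amounts to sorting them in ascending order. In the minimal Python sketch below, the wavelengths and sigma values are purely illustrative numbers, not outputs of the toolbox, and `rank_bands_by_sigma` is a hypothetical helper.

```python
def rank_bands_by_sigma(wavelengths, sigmas, top=None):
    """Return wavelengths ordered from most relevant (lowest sigma) to least relevant."""
    ranked = [wl for _, wl in sorted(zip(sigmas, wavelengths))]
    return ranked if top is None else ranked[:top]

# Illustrative values only (not toolbox output)
wavelengths = [490, 560, 665, 705, 740, 783]
sigmas = [12.0, 3.5, 0.8, 2.1, 25.0, 4.4]
print(rank_bands_by_sigma(wavelengths, sigmas, top=3))  # [665, 705, 560]
```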
Figure 6-6. Histogram of the residuals.
Figure 6-8. 2D correlation plot with validation statistics for a GPR model based on
training/validation partition and inserted noise range.
Finally, a selected MLRA strategy is moved to the bottom panel. If strategies are
configured per land cover class, multiple strategies can appear here. When
clicking on Done, these strategies are transferred to the Retrieval window
(Figure 7-1).
From MLRA v1.10 onwards, owing to the cross-validation data partition module (see
Section 5.3), the table of validation results shows the cross-validation results, i.e.
the mean of the different cross-validation results. As such, it provides a more
robust indication of the predictive power of the different regression models. From v.
1.19 onwards the general statistics are calculated based on all estimations vs.
observations. The measured vs. estimated figure shows the scatter plot of all points
with the 1:1 line. In addition, statistics are calculated based on the subsets:
Cross-validation statistics (Figure 6-9):
Figure 6-9. General statistics of the cross-validation results from the selected model.
For the selected result, the cross-validation statistics can be provided. Because the
cross-validation method provides results for data subsets, various basic statistics can
be calculated. Inspecting, e.g., the min-max range and the standard deviation gives an
indication of the robustness of the method.
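The difference between statistics computed on all pooled estimations and statistics of the individual cross-validation subsets can be illustrated with the Python sketch below. It uses RMSE as the example statistic and the sample standard deviation for the spread; both choices, and the names `rmse` and `cv_summary`, are assumptions for illustration rather than the toolbox's exact computation.

```python
import math
import statistics

def rmse(measured, estimated):
    return math.sqrt(sum((e - m) ** 2 for m, e in zip(measured, estimated)) / len(measured))

def cv_summary(folds):
    """folds: list of (measured, estimated) pairs, one per validation subset.

    Returns the RMSE over all pooled points plus basic statistics of the
    per-fold RMSEs (sample standard deviation, an assumption).
    """
    per_fold = [rmse(m, e) for m, e in folds]
    all_measured = [x for m, _ in folds for x in m]
    all_estimated = [x for _, e in folds for x in e]
    return {
        "pooled": rmse(all_measured, all_estimated),
        "mean": statistics.mean(per_fold),
        "sd": statistics.stdev(per_fold) if len(per_fold) > 1 else 0.0,
        "min": min(per_fold),
        "max": max(per_fold),
    }

folds = [([1, 2], [1, 2]), ([1, 2], [2, 3]), ([1, 2], [3, 4])]
print(cv_summary(folds))
```

A wide min-max range or a large standard deviation of the per-fold statistic suggests that the model's accuracy depends strongly on which samples end up in training.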
From v. 1.19 onwards, regardless of the selected cross-validation method, when
proceeding to retrieval the regression model will be based on all training data.
In case the dimensionality reduction (DR) SIMFEAT module was selected, from v. 1.19
onwards the statistics of the DR methods can be inspected:
Figure 6-10. GPR sigma band analysis tool as activated in the MLRA validation table.
Figure 6-11. Goodness-of-fit validation statistics over the wavelengths, where in each iteration
the least contributing wavelength is removed. In case cross-validation is applied, the
standard deviation and min-max are also provided.
Table best band. This table provides the wavelengths of each iterative band
removal. Of particular interest are the last few, best performing wavelengths
(Figure 6-12):
Figure 6-12. Table of wavelengths for each iterative round and associated goodness-of-fit
validation statistics.
Figure frequency top ranked bands. This provides the frequencies of the top
performing bands in case a cross-validation strategy is applied. The user can
choose at which number of wavelengths the frequency ranking is applied and
how many top-performing rankings should be included (Figure 6-13).
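The frequency ranking over cross-validation folds can be sketched as a simple count. The Python example below assumes that, for each fold, the band analysis yields a list of wavelengths ordered from best to worst at the chosen number of remaining wavelengths; `top_band_frequency` and the example rankings are hypothetical.

```python
from collections import Counter

def top_band_frequency(rankings, n_top):
    """rankings: one wavelength list per cross-validation fold, ordered from
    best to worst. Count how often each wavelength is among the n_top best."""
    counts = Counter()
    for ranking in rankings:
        counts.update(ranking[:n_top])
    return counts.most_common()

# Illustrative rankings for three folds (not toolbox output)
rankings = [[665, 705, 560, 490], [705, 665, 783, 560], [665, 560, 705, 740]]
print(top_band_frequency(rankings, n_top=2))  # [(665, 3), (705, 2), (560, 1)]
```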
Figure 6-13. Frequency ranking of best ranked wavelengths in case of cross-validation
techniques.
In the ‘Select’ section (left checkmark boxes), a selected validated regression model
(e.g. the best performing one) is moved to the bottom panel and can then be used as
prediction model to retrieve biophysical parameters. See section 7
Retrieval.
It is also possible to export regression models. By clicking on Export, the option is given
to save a .mat file that includes all MLRAs and relevant information from the current
exercise. In Tools it is then possible to import the .mat file. See Tools, Import
model.
Figure 6-14. Table with statistical results of the Active Learning (AL) methods, with processing time,
added samples and iterations.
From this table more results can be visualized, i.e. (1) performance over the
iterations and (2) added samples.
An example of performance is given in Figure 6-15. It can be seen that in this case CBD
eventually performs best. It can also be seen that random sampling (RS) shows very
irregular behavior. When the performance goes down, the added samples are discarded in
the following iteration.
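The accept-or-discard behaviour described above can be illustrated with a generic greedy loop. The Python sketch below is a simplification under assumptions: `select` stands for any AL criterion, `evaluate` for any goodness-of-fit score where higher is better (e.g. R2), and a candidate that lowers the score is discarded immediately rather than in the following iteration. All names are hypothetical.

```python
def active_learning(train, pool, select, evaluate, n_iter):
    """Greedy AL loop (sketch).

    select(train, pool) -> index of the next candidate in pool
    evaluate(train)     -> score, higher is better (e.g. R2)
    A candidate is kept only if the score does not go down.
    """
    train, pool = list(train), list(pool)
    best = evaluate(train)
    history, added = [best], 0
    for _ in range(n_iter):
        if not pool:
            break
        candidate = pool.pop(select(train, pool))
        score = evaluate(train + [candidate])
        if score >= best:
            train.append(candidate)
            best, added = score, added + 1
        history.append(best)
    return train, history, added

# Toy example: "samples" are numbers and the score rewards a mean close to 5.
first = lambda train, pool: 0
closeness_to_5 = lambda train: -abs(sum(train) / len(train) - 5)
print(active_learning([1.0], [9.0, 2.0, 5.0], first, closeness_to_5, n_iter=10))
```

The returned count of kept samples corresponds to the "added samples" statistic, and the history to the performance curve over the iterations.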
Figure 6-15. AL performance for the selected AL methods along the given iterations according
to the selected statistic.
Next, the samples added to reach the best performance can be plotted for a chosen
statistic (e.g. R2) (Figure 6-16). It can again be observed that, despite 100 iterations,
adding only a few samples improved performance. Here it can be observed that EBD and
CBD needed the most samples, but at the same time these methods performed best; they
led to the most accurate regression models.
Figure 6-16. Number of added samples for the given iterations.
Then the following window will appear (Figure 6-17). From v. 1.19 onwards some
additional information about each test is provided when clicking on a test. This
information includes (1) whether the data come from an RTM or from USER data, (2) the
path of the used data, (3) the used variable, (4) the number of samples, and (5) the
number of wavelengths. That information appears in the right panel.
The validation table of the selected name will appear when clicking on ‘Select’. In this
way, previously developed models can be easily called and used for mapping
applications.
7 Retrieval
From v.1.15 onwards it is possible to retrieve biophysical parameters either from an
image, resulting in a map, OR from a text file. A text file could, for instance, contain
field spectrometer data for which no validation data are available. The retrieval module
will then process these data and deliver the targeted biophysical parameters as output.
From v. 1.13 onwards it is possible to convert the to-be-processed image (or all images)
into different units. That can be important in case the training data are given in
different units than the image; they have to match. A multiplicative conversion factor
can be entered.
By clicking on OK, the input images can be loaded and the output maps will be written.
If the input file is in ENVI format, the output will also be written as an ENVI file.
Similarly, if the input file is a TIFF file, the output is also written as a TIFF file. The
following window will appear to select the input remote sensing images (either *.hdr
or .tiff) (Figure 7-2):
Figure 7-2. Folder browser to select an Input folder where generated maps will be stored.
Finally, the output file or folder can be selected. From v. 1.18 onwards a distinction is
made between processing only one image and processing multiple images. When
processing one image, the output folder can be selected and a suggested output
name is given, which can be edited. The suggested output name consists of the input
name plus the selected variable and MLRA. See Figure 7-3.
In case multiple images are selected, only the output folder can be selected
(Figure 7-4). The output map will be written in the same format as the input image. By
default it will point to the folder of the input images. The same name will be used, with
the output variable and MLRA added to it.
Figure 7-4. Folder browser to select a folder where the output maps are located.
It is important that an image is selected with the same band settings as those
presented during the training phase. If the number of bands does not match, the
following error appears (Figure 7-5):
Figure 7-5. Error message in case the number of bands does not match the number of bands
presented during the training phase.
Note that this check only concerns the number of bands. Make sure that the
wavelengths also match.
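Since the built-in check only compares band counts, a user preparing images outside the toolbox may want to compare the wavelengths as well. The Python sketch below is a hypothetical pre-check, not part of MLRA; the 1 nm tolerance is an arbitrary assumption.

```python
def check_bands(train_wavelengths, image_wavelengths, tol_nm=1.0):
    """Raise ValueError if the band count or the band centres differ (tolerance in nm)."""
    if len(image_wavelengths) != len(train_wavelengths):
        raise ValueError(
            f"image has {len(image_wavelengths)} bands, "
            f"model expects {len(train_wavelengths)}"
        )
    mismatched = [
        (t, i)
        for t, i in zip(train_wavelengths, image_wavelengths)
        if abs(t - i) > tol_nm
    ]
    if mismatched:
        raise ValueError(f"wavelength mismatch (train, image): {mismatched}")

check_bands([490, 560, 665], [490.2, 559.9, 665.0])  # passes silently
```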
Finally, a map is created. The processing time will be displayed in the Matlab command
window. When completed, the following window appears (Figure 7-6). In this window an
output layer can be selected and visualized by clicking on VIEW.
Figure 7-6. Window to select a generated output map (through Open Map) and then to select a
layer. By clicking on Preview mapping options are provided.
From MLRA v. 1.18 onwards, it is possible to apply a mask to an output layer in case the
output file consists of multiple layers (bands), e.g. in case of GPR. By clicking on Mask
in the menu bar, the following GUI appears (Figure 7-7):
Figure 7-7. Mask option when having clicked on Mask. In this GUI the band to be used as mask
and the boundaries of the pixels to keep can be selected.
In this window, select the band to be used as mask (e.g. the GPR CV map, which gives the
relative uncertainties) and then the min and max thresholds for pixels to keep (e.g. only
pixels with uncertainties between 0 and 30%). By clicking on OK the mask will be activated.
Then, in the Output map window (Figure 7-6), when the Mask check box is activated and
View is clicked, only the pixels that fall within the thresholds are shown. The pixels
outside the thresholds are masked out (given as NaN).
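The masking rule above can be expressed in a few lines of NumPy. In this sketch, `apply_mask` is a hypothetical helper, the thresholds are treated as inclusive (an assumption), and the example arrays are illustrative values, not toolbox output.

```python
import numpy as np

def apply_mask(layer, mask_band, vmin, vmax):
    """Keep pixels whose mask value lies in [vmin, vmax]; set all others to NaN.
    NaN values in the mask band fail both comparisons and are masked out."""
    keep = (mask_band >= vmin) & (mask_band <= vmax)
    return np.where(keep, layer, np.nan)

lai = np.array([[1.0, 2.0], [3.0, 4.0]])
cv = np.array([[10.0, 40.0], [25.0, np.nan]])  # relative uncertainty in %
print(apply_mask(lai, cv, 0, 30))
```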
A figure will be provided (Figure 7-8; left). By clicking on Options in the figure top bar,
options are provided to manipulate mapping features such as the color table, axis and
titles fonts, etc. (Figure 7-8; right).
To have the map correctly oriented, make sure to click on the
orientation “ij”.
A map is visualized by clicking on Sample. It is also possible to generate the map in
other, more conventional image types (jpeg, tiff, pdf, emf, eps). Hereby, redundant white
space around the figures will be automatically removed. When clicking on View, the map
will be visualized according to the configured settings. When subsequently clicking on
Save then a file browser will appear to save the map according to the chosen format.
Figure 7-8. Generated map in Figure window [left] and mapping options such as color tables,
colorbar, legend, and exporting options [right].
The GPR deserves special attention since, apart from mean estimates, it also provides
associated uncertainty estimates, expressed as the standard deviation (SD) around the
mean estimate. This map is presented as a second layer. Further, since the magnitude of
the SD may be related to the magnitude of the mean estimate, the coefficient of variation
(CV: SD/mean estimate) is provided as a third layer. This map can be considered as a
relative uncertainty (e.g. see Figure 7-9).
Regarding the viewing maps window (Figure 7-6), note that in case of processing a TIFF
image no band names will be given, because TIFF is not associated with a header file. In
case of GPR and VH-GPR multiple outputs are provided, but they are only labeled band_1,
band_2 and band_3. They have the following meaning:
1. band_1: retrieved variable (e.g. LAI).
2. band_2: standard deviation (SD) around the mean (absolute uncertainties).
3. band_3: coefficient of variation (CV = SD/mean estimate * 100). This map can be
interpreted as relative uncertainties, expressed as a percentage.
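The three GPR output layers listed above follow directly from the mean and SD maps. The NumPy sketch below stacks them in the same order; `gpr_output_layers` is a hypothetical helper, and it illustrates why a zero mean estimate is problematic for the CV layer (the division yields inf or NaN), which is also the concern raised for the Change negative results option in Tools→ Options.

```python
import numpy as np

def gpr_output_layers(mean, sd):
    """Stack band_1 (mean), band_2 (SD) and band_3 (CV = SD/mean * 100, in %)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = sd / mean * 100.0  # a zero mean gives inf (or NaN if SD is also zero)
    return np.stack([mean, sd, cv])

mean = np.array([2.0, 4.0, 0.0])
sd = np.array([0.5, 1.0, 0.3])
print(gpr_output_layers(mean, sd)[2])  # CV layer in %
```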
From v. 1.18 it is possible to save the masked file as an ENVI file. This is done through
Options→ Save as format: ENVI.
Figure 7-10. Input data with spectral data. Although the GUI is quite similar to that of the validation
step, no validation data (field data of biophysical parameters) are required.
Once clicking on Import, the next step is to select the output file and its location. By
default the same folder as the input text file is given. Also, by default the same name
as the text file is given, but with the extension ‘_MLRAinv’ (Figure 7-11). This output
name is editable. Once completed, a message ‘OK’ will appear.
Figure 7-11. Folder browser to select the output file. By default the same location and file name
as the input file is given, with the extension of ‘_MLRAinv’.
The output file provides the retrieved data of the selected variables and some metadata on
how the text file is organized. An example is provided in Figure 7-12.
Figure 7-12. Output text file with retrieved values of selected biophysical parameters, and in
case of GPR absolute and relative uncertainties.
8 Tools
The MLRA module consists of various tools:
8.1 Save
Tools→ Save
The Save option makes it possible to save all input configurations defined in the toolbox. A file
browser will appear where the file can be saved (Figure 8-1). As such, input data and
settings can be saved to a Matlab .m file. This option can be of interest when aiming to
repeat an analysis with the same dataset.
8.2 Load
Tools→ Load
By clicking on Load, an earlier saved Matlab .m file can be loaded. A file browser will
appear (Figure 8-2). The settings are then inserted into the MLRA toolbox.
Figure 8-2. File browser to load saved general MLRA settings.
Figure 8-3. List of Validation tables in current DB [left], and text box to insert a new name [right].
Figure 8-4. List of Validation tables in current DB [left], and message that deletion is completed
[right].
8.4 Options
Tools→ Options
Finally, by clicking on Options the following options are provided (Figure 8-5):
Seed: Here the seed for generating random numbers can be changed.
Random numbers are used to distribute the samples over training and validation.
Changing the seed therefore results in a different random training and validation distribution.
Change negative results: This option converts negative values into other
values, e.g. close to zero. This is because negative values are not physically
possible and can reasonably be assumed to represent zero (non-existing).
However, in some cases, such as GPR, a zero value may not be preferred because
of further calculations (e.g., coefficient of variation: [standard deviation] / [mean
estimate]).
Skip NaN value: To speed up processing, particularly for geometrically corrected
images, it is wise to skip NaN (not a number) values. Hence only pixels
with real values are processed. The desired output value for the skipped pixels can be provided.
Skip pixels where all bands same value: Often, instead of NaN, pixels with no
physical meaning are given the same value in all bands, like 0, 255 or -9999. With this
option these pixels will be skipped in the processing.
Processing speed: By default the processing speed required to develop and
validate an MLRA model is recorded. This processing speed is then provided in the
Assessment table. However, recording can be deactivated here, since recording the
processing speed also takes processing time.
Processing mapping speed: By default the processing speed that is required to
process an image through the MLRA model is recorded. This processing speed can
also be saved in a text file when activating the check box.
By default images are processed line by line; from v. 1.19 onwards it is also
possible to process images block by block. That can be faster in some cases,
and it even becomes possible to process the complete image at once by entering
the value ‘-1’. This is only recommended for small images, in order to avoid
memory problems.
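How these options interact during mapping can be illustrated with a NumPy sketch. It is not the toolbox's Matlab code: `process_image`, `predict`, `rows_per_block`, `fill` and `neg_to` are hypothetical names. The image is processed in blocks of rows (`-1` meaning the whole image at once), pixels containing NaN or having the same value in all bands are skipped and receive the fill value, and negative estimates are optionally replaced.

```python
import numpy as np

def process_image(cube, predict, rows_per_block=1, fill=np.nan, neg_to=None):
    """cube: (rows, cols, bands) array; predict maps (n_pixels, bands) -> (n_pixels,).
    rows_per_block=-1 processes the whole image at once."""
    rows, cols, bands = cube.shape
    step = rows if rows_per_block == -1 else rows_per_block
    out = np.full((rows, cols), fill, dtype=float)
    for r0 in range(0, rows, step):
        block = cube[r0:r0 + step].reshape(-1, bands)
        valid = ~np.isnan(block).any(axis=1)              # skip NaN pixels
        valid &= ~np.all(block == block[:, :1], axis=1)   # skip pixels with all bands equal
        est = np.full(block.shape[0], fill, dtype=float)
        if valid.any():
            est[valid] = predict(block[valid])
        if neg_to is not None:
            est[est < 0] = neg_to                         # change negative results
        out[r0:r0 + step] = est.reshape(-1, cols)
    return out

cube = np.array([[[1, 2], [np.nan, 1]],
                 [[0, 0], [3, 1]],
                 [[2, 5], [4, 4]]], dtype=float)
print(process_image(cube, lambda X: X[:, 0] - X[:, 1], rows_per_block=-1, neg_to=0.0))
```

The block size only changes how many pixels are handed to the model per call, not the result, which is why processing the whole image at once is purely a speed versus memory trade-off.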
Figure 8-5. Options window with option to change Seed, to change negative results to a positive
value, to skip NaN values and to deactivate recording processing speed.
Figure 8-6. Window to select a generated output map (through Open Map) and then to select a
layer. By clicking on Preview mapping options are provided.
Tools→ View figure
With the View figure it is possible to reopen a Matlab figure (.fig) by selecting a figure
through the file browser. The Matlab figure window appears, but then with the inclusion
of the Options button in the top bar (see also: Figure 6-3, Figure 6-4, Figure 6-8, Figure
7-8).
Figure 8-7. Window to Import an earlier-exported MLRA model (.mat) with interface to display
targeted parameter, used MLRA and required wavelengths.
8.8 ScatterPlot
Tools→ScatterPlot
In all the retrieval toolboxes a Scatter plot tool has been added (Figure 8-8). With this
tool a scatter plot of two images can be made. Additionally, goodness-of-fit statistics
can be displayed. The tool requires loading an image, selecting a band and deciding
whether it should be plotted on the X-axis or the Y-axis. These criteria then have to be
added by clicking on ADD. The same steps then have to be repeated for the second image,
selecting the other axis. By activating ‘Assessment’, goodness-of-fit statistics are
displayed in addition to the scatter plot (Figure 8-9). The scatter plot is color-coded
according to the density of the data cloud.
From v. 1.16 onwards it is also possible to plot error maps, both absolute and relative (in
%). Together with these maps also histograms are plotted (Figure 8-10). The scaling of
the colormap and the number of bins of the histogram can be controlled.
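The error maps and histogram can be sketched with NumPy as below. This is an illustration under assumptions: the absolute error is taken as the signed difference estimated minus reference (in the units of the variable), the relative error as that difference in % of the reference, and `error_maps` is a hypothetical helper; the toolbox may define the absolute error map differently (e.g. as the magnitude of the difference).

```python
import numpy as np

def error_maps(estimated, reference, bins=10):
    """Return the absolute error map (estimated - reference), the relative error
    map in % of the reference, and a histogram of the finite absolute errors."""
    abs_err = estimated - reference
    with np.errstate(divide="ignore", invalid="ignore"):
        rel_err = 100.0 * abs_err / reference
    finite = abs_err[np.isfinite(abs_err)]
    counts, edges = np.histogram(finite, bins=bins)
    return abs_err, rel_err, counts, edges

est = np.array([[1.1, 2.0], [2.7, 4.0]])
ref = np.array([[1.0, 2.0], [3.0, 5.0]])
abs_err, rel_err, counts, edges = error_maps(est, ref, bins=5)
print(np.round(rel_err, 1))
```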
Figure 8-8. Scatterplot tool to display a scatterplot of two images. When activating ‘Assessment’
then also goodness-of-fit statistics are displayed.
Figure 8-10. Example of absolute error map (left) and associated histogram (right).
8.9 Validation external data
Tools→ Validation external data
From v. 1.16 onwards it is possible to apply an earlier configured and validated model to
external data for a new validation. This tool is useful to evaluate the portability of a
developed model. To activate this tool, an earlier validated model first has to be
selected. See also Chapter 6.4 to load and select an MLRA model. Otherwise a message
will appear requesting to load a retrieval model. Multiple models for different variables
can be selected.
When a retrieval model has been loaded, the Import USER data window appears
(Figure 8-11). See also section 4.2. Because the selected model is related to a specific
variable, that variable appears in the Variable window. In case multiple models are
selected, all the variables are listed in the drop-down menu. For each variable, the
associated line number in the file with external data has to be given. Further, the
starting line where the spectral data begin is also required.
Once the required info is entered, clicking on Import uses these data to validate the
performance of the selected model. The goodness-of-fit indicators as described in
section 6.1 (Figure 8-12, left) and a 1:1 scatter plot (Figure 8-12, right) will be
provided. Note that in case multiple models for different variables are selected, the
table lists the statistical indicators for each variable, and a 1:1 scatter plot is
displayed for each variable.
Figure 8-11. Input window to load external, User data as prepared in a text file. The variables of
the selected models are given in ‘Variable’ drop-down menu.
Figure 8-12. Goodness-of-fit statistics (left) and 1:1 measured vs estimated scatterplot
(right).
9 Help
In Help, the Manual (this document) (Help→ User’s manual) and the Installation Guide
(Help→ Installation guide) can be consulted. Also a Disclaimer note is included (Help→
Disclaimer).
Figure 9-1. Activated log window when clicking in Help to Show Log.
10 Error reporting
While much effort has gone into developing a bug-free toolbox, there may be situations
in which errors might still occur. Errors appear as red messages in the Matlab™ main
window. Please report any bugs to artmo.toolbox@gmail.com and we will try to resolve
them.
Figure 10-2. Finding the MySQL my.ini file and editing the ‘max_allowed_packet’ variable.
Secondly, from MATLAB 7.10 (R2010a) onwards, you can also set the Java heap size in
the preferences dialog box (File>Preferences) under the ‘Java Heap Memory’ section of
the ‘General’ tab.
For instructions on how to do this in older MATLAB versions see this site:
http://www.mathworks.com/matlabcentral/answers/92813-how-do-i-increase-the-heap-space-for-the-java-vm-in-matlab-6-0-r12-and-later-versions
To resolve this, it is required to create a folder with write permissions where the
temporary files can be written, e.g. in ‘Documents’. For instance, create the folder
‘ARTMO’.
File→ Settings
The following window will appear (Figure 10-3):
Figure 10-3. Settings menu where in ‘Local’ you can point to the newly created writable folder.
In ‘Local’, point to that newly created empty folder. You can copy and paste the
path, or click on ‘Get’ and select the right folder. In this way, temporary files
will be stored in that folder.