Theoretical Background
1. | Overview of the steps of accuracy assessment
Figure 2: The four main steps of an accuracy assessment: obtaining and finalizing the
map data, sampling design, response design and analysis. The symbols show the
software that can accomplish each step: 1 is R, 2 is QGIS, 3 is Excel, and 4 is Collect
Earth.
2. | Map data
Vector data can be produced by visual interpretation of pixel-based
satellite imagery or from segmented satellite data. If the visual
interpretation is based on pixel-based satellite imagery, the resulting
land cover map can be converted to raster data using the spatial
resolution of the original satellite data (e.g. 30 m for Landsat data).
The same methodology as for the GFC data can then be applied. The
accuracy assessment of object-based vector data is not covered by this
document.
Once the custom land cover or land cover change map is available
as raster data, the strata need to be defined and the size of each
stratum needs to be calculated. The strata must be mutually exclusive,
meaning that each pixel must be assigned to exactly one stratum. The
sum of all pixels in the strata defines the total study area. The strata
sizes can be calculated using GIS software or an R script. The process
is the same for multiple change classes, e.g. between forest, woodland
and cultivated land, as it is for changes between forest and non-forest.
However, high accuracy becomes increasingly difficult to achieve as the
number of change classes grows, and the assessment of many classes
requires a larger number of sample points.
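The stratum size calculation described above can be sketched as follows. This is a minimal illustration in Python rather than the R script mentioned in the text; the function name `strata_sizes`, the class codes, the 30 m pixel size and the use of 0 as a "no data" value are all assumptions made for the example.

```python
import numpy as np

def strata_sizes(class_map, pixel_size_m=30.0, nodata=0):
    """Return {class_code: (pixel_count, area_ha, weight)} for a classified raster.

    Assumes each pixel holds exactly one class code (mutually exclusive strata)
    and that `nodata` marks pixels outside the study area.
    """
    values = class_map[class_map != nodata]        # keep only study-area pixels
    codes, counts = np.unique(values, return_counts=True)
    total = counts.sum()                           # total study area in pixels
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0   # a 30 m pixel covers 0.09 ha
    return {
        int(code): (int(count), float(count * pixel_area_ha), float(count / total))
        for code, count in zip(codes, counts)
    }

# Toy 4 x 4 map with hypothetical codes: 1 = forest, 2 = non-forest, 3 = forest loss
toy_map = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [1, 1, 2, 0],
])
sizes = strata_sizes(toy_map)
for code, (count, area_ha, weight) in sizes.items():
    print(code, count, round(area_ha, 2), round(weight, 3))
```

The weights (stratum area divided by total area) sum to one because every study-area pixel belongs to exactly one stratum.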
3. | Sampling design
The sampling design defines how to select the subset of the map
that forms the basis for the accuracy assessment. Selecting a subset
of the map is necessary because (i) a sampling approach allows more
careful interpretation of the parameters of interest at each sample
site, thus satisfying the requirement of using ‘higher quality’ data
than those used to create the map, even if the data used to make the
map are also used in the accuracy assessment, and (ii) it is usually not
feasible to collect reference data for the whole study area. In the
sampling design, the sample size for each map category is chosen to be
large enough to produce sufficiently precise estimates of the area of
the class (GFOI, 2013).
It is critical to use a probability sampling design that incorporates
randomization in the sample selection protocol. Probability sampling
is defined in terms of inclusion probabilities, which quantify the
likelihood of a given unit being included in the sample. The inclusion
probability must be known for each unit selected in the sample, and it
must be greater than zero for all units in the area of interest.
Non-response is the situation where the inclusion probability is
unknown or zero, e.g. for inaccessible plots or data that are
unavailable due to cloud cover. The circumstances must be clearly
stated for non-response samples, for example by reporting the
proportion of the selected sample units for which cloud cover or lack
of reference imagery prevented assessment of the unit. If ground visits
are used to collect reference data, a sampling design that accounts for
non-response is advisable, such as the protocol described by Stevens
and Olsen (2004).
Commonly used probability sampling designs include simple
random, stratified random, and systematic sampling. For land cover
maps a stratified sampling approach is recommended, so only this
sampling design is addressed in this document. The sampling design
can influence the results, and information about it is necessary to
interpret the error matrix properly.
forest) or by sub-region (e.g. administrative units). The strata need to
be mutually exclusive and inclusive of the entire study area, with no
area that is in multiple strata or is omitted from the strata. The end
use of the map also needs to be considered when creating the sampling
design, for example when a national forest change map is used to
derive area estimates of forest change in different sub-national
areas. In this case, the map is stratified by map class and
administrative boundaries. The user should ensure the sampling
design captures all of the strata.
There are two main purposes for stratification. First, strata can be
of interest for reporting results, e.g. accuracy per land cover class or
sub-region. Second, stratification ensures a sufficient representation
of rare classes (i.e. classes that represent only a small proportion of
the area of interest). Land change often occupies a small fraction of
the landscape, so a change stratum (e.g. forest loss) can be identified
and the sample size allocated to that stratum can be made large enough
to produce a small standard error for the user’s accuracy estimate.
Stratification by map class improves the precision of the accuracy and
area estimates by increasing the sampling density in the change
classes. For this reason, stratification in this study is based on land
cover class and an independent sample is drawn for each land cover
class.
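Drawing an independent random sample within each stratum, with a known and non-zero inclusion probability for every pixel, might look like the sketch below. It is an illustration only: the function name `stratified_random_sample`, the class codes and the per-stratum sample sizes are hypothetical, and NumPy's random generator stands in for whatever tool is actually used to select the sample.

```python
import numpy as np

def stratified_random_sample(class_map, n_per_stratum, seed=0):
    """Draw an independent simple random sample of pixels within each stratum.

    Returns {class_code: (rows, cols, inclusion_probability)}. Within a
    stratum every pixel has the same inclusion probability n_h / N_h.
    """
    rng = np.random.default_rng(seed)
    sample = {}
    for code, n_h in n_per_stratum.items():
        rows, cols = np.nonzero(class_map == code)   # all pixels of this stratum
        if n_h > rows.size:
            raise ValueError(f"stratum {code} has only {rows.size} pixels")
        picked = rng.choice(rows.size, size=n_h, replace=False)  # without replacement
        sample[code] = (rows[picked], cols[picked], n_h / rows.size)
    return sample

# Toy 20 x 20 map with hypothetical codes: 1 = forest, 2 = non-forest, 3 = forest loss
toy_map = np.ones((20, 20), dtype=int)
toy_map[:, 10:] = 2
toy_map[0:2, 0:5] = 3          # rare change class: 10 pixels out of 400

sample = stratified_random_sample(toy_map, {1: 5, 2: 5, 3: 5})
for code, (rows, cols, prob) in sample.items():
    print(code, list(zip(rows.tolist(), cols.tolist())), round(prob, 4))
```

In this toy case the rare class receives the same number of sample units as the large classes, so its pixels have a much higher inclusion probability (0.5) than forest or non-forest pixels. That unequal probability is what the stratified estimators later account for through the stratum weights.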
When defining the strata, a feasible number of classes needs to be
chosen. For single-date land cover maps, it is usually feasible to
define a stratum for each map class (Wulder et al., 2007), but this is
more challenging for a change map, where the number of different types
of change might be too high. To reduce the number of strata, types of
change that are very unlikely to occur could be eliminated. Strata
could also be defined on the basis of generalized change categories,
such as change from forest to non-forest instead of forest to cultivated
land, forest to water, etc. The feasibility of distinguishing these
change classes in the reference data should also be taken into account.
Strahler et al. (2006) provide additional examples for aggregating
change classes. Even if a change type is not defined as a stratum in
the sampling design, accuracy and area estimates can still be derived
for that change type, but the sample size might not be large enough to
derive estimates at the desired precision.
and then sampling with a fixed distance between sampling locations.
It is often implemented for field sampling activities, such as national
forest inventories. In general, the simple random selection protocol is
the recommended option, but systematic selection is also nearly
always acceptable. With simple systematic sampling, however, it can be
difficult to capture small classes, particularly change categories.
Therefore, when conducting an accuracy assessment for land cover
change (such as for activity data) that includes the collection of
reference data, it is recommended to use a stratified approach.
(1)
The overall sample size resulting from this calculation can be
allocated among the strata in multiple ways. The samples need to be
distributed between the strata by balancing equal sample size per
stratum against proportional allocation. In proportional allocation,
the overall sample size is allocated to the strata in proportion to the
area of each stratum, so rare strata receive a small proportion of the
overall sample size. In equal allocation, the overall sample size is
distributed equally between the strata. Because stratification is used
to capture rare classes, such as change classes, it is necessary to
ensure that there is a sufficient number of samples in the rare classes.
The minimum sample size should be 20 to 100 samples per stratum
(Congalton and Green, 2008).
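The difference between proportional allocation, equal allocation and a compromise between the two can be illustrated with a short sketch. The stratum areas, the overall sample size of 500 and the minimum of 50 units per stratum are hypothetical values chosen for the example (50 lies within the 20 to 100 range quoted above). The compromise rule shown here, a fixed minimum per stratum plus a proportional share of the remainder, is one simple way to strike the balance and is not claimed to be the procedure used in this guide.

```python
def proportional_allocation(areas, n_total):
    """Allocate n_total in proportion to stratum area (areas are integer pixel counts)."""
    total_area = sum(areas.values())
    shares = {name: n_total * area // total_area for name, area in areas.items()}
    largest = max(areas, key=areas.get)
    shares[largest] += n_total - sum(shares.values())   # rounding leftover to largest stratum
    return shares

def equal_allocation(areas, n_total):
    """Allocate n_total equally; any remainder goes to the first strata listed."""
    base, extra = divmod(n_total, len(areas))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(areas)}

def compromise_allocation(areas, n_total, min_per_stratum=50):
    """Give every stratum a minimum, then split the remainder proportionally."""
    reserved = min_per_stratum * len(areas)
    if reserved > n_total:
        raise ValueError("n_total is too small for the requested minimum per stratum")
    rest = proportional_allocation(areas, n_total - reserved)
    return {name: min_per_stratum + rest[name] for name in areas}

# Hypothetical stratum sizes in pixels
areas = {"forest": 600_000, "non-forest": 380_000, "loss": 15_000, "gain": 5_000}

prop = proportional_allocation(areas, 500)
equal = equal_allocation(areas, 500)
comp = compromise_allocation(areas, 500, min_per_stratum=50)
print(prop)
print(equal)
print(comp)
```

With these numbers, proportional allocation leaves the loss and gain strata with only 7 and 2 sample units, which is far below the recommended minimum, while the compromise keeps every stratum above 50 and still assigns most units to the large strata.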
2 | Response design
elaborate on their advantages and disadvantages. Additionally,
reference data should be temporally coincident with the map being
assessed, e.g. if a land cover map of the year 2000 is being assessed,
then the reference data should be from the year 2000. If reference data
are collected for a different year than that of the map, the adjusted
areas will represent the areas at the time of the reference data.
A cost-effective tool for collecting reference data from very high,
high and medium resolution satellite imagery is Collect Earth
(http://www.openforis.org/tools/collect-earth.html). This Google Earth
plugin allows the practitioner to visually assess the land cover/use of
sample locations with the freely available data from Google Earth,
Google Earth Engine, Here maps, and Bing maps. Chapter 10 addresses
the setup of Collect Earth incorporating the major features of the
response design.
otherwise it is a misclassification. Defining agreement is more
complicated for heterogeneous assessment units or different
classification schemes. A heterogeneous assessment unit is a spatial
unit that covers more than one class, such as a pixel block that is 60%
non-forest and 40% forest. As for all steps, the rules for defining
agreement need to be clearly stated.
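One way to state such a rule explicitly is to write it down as a small function. The sketch below uses a majority rule, in which a heterogeneous unit takes the label of the class that covers more than half of it. This rule, the 0.5 threshold and the function names are assumptions made for illustration, and other rules can be equally valid as long as they are documented.

```python
from collections import Counter

def majority_label(pixel_labels, min_share=0.5):
    """Return the class covering more than min_share of the unit, else None."""
    label, count = Counter(pixel_labels).most_common(1)[0]
    return label if count / len(pixel_labels) > min_share else None

def agrees(map_label, reference_pixels, min_share=0.5):
    """Agreement rule: the map label must match the unit's majority reference label."""
    return majority_label(reference_pixels, min_share) == map_label

# The pixel block from the text: 60% non-forest and 40% forest
block = ["non-forest"] * 6 + ["forest"] * 4
print(majority_label(block))          # label assigned to the block under this rule
print(agrees("forest", block))        # a map label of "forest" counts as a misclassification
print(majority_label(["forest"] * 5 + ["non-forest"] * 5))  # tie: no class exceeds the threshold
```

Under this rule a unit with no class above the threshold receives no label, so the response design would also need to say how such units are handled.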
3 | Analysis
(2)
(3)
(4)
(5)
For all three accuracy measures, the confidence intervals need to
be derived as well. The formulas for the variances are presented in
equations 5, 6 and 7 in Olofsson et al. (2014), and the half-width of
the 95 % confidence interval is obtained by multiplying the square root
of the variance (the standard error) by 1.96.
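Since the equations themselves did not survive in this copy, the following sketch shows one common way to compute the three accuracy measures from a sample error matrix under stratified sampling. It assumes the standard stratified estimators as the editor understands them from Olofsson et al. (2014): area proportions p_ij = W_i n_ij / n_i, overall accuracy as the sum of the diagonal, user's accuracy per map class and producer's accuracy per reference class. The function name, the toy error matrix and the stratum weights are hypothetical. Only the variances of the overall and user's accuracy are included; the variance of the producer's accuracy is longer and is left to the cited paper.

```python
import numpy as np

def accuracy_estimates(counts, weights, z=1.96):
    """Stratified accuracy estimates from a sample error matrix.

    counts[i, j]: sample units mapped as class i whose reference class is j.
    weights[i]:   proportion of the total map area in map class (stratum) i.
    """
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_i = counts.sum(axis=1)                         # sample size per map class
    p = weights[:, None] * counts / n_i[:, None]     # estimated area proportions p_ij
    overall = np.trace(p)
    users = np.diag(p) / p.sum(axis=1)               # per map class (rows)
    producers = np.diag(p) / p.sum(axis=0)           # per reference class (columns)
    var_users = users * (1 - users) / (n_i - 1)
    var_overall = np.sum(weights ** 2 * var_users)
    return {
        "overall": overall,
        "users": users,
        "producers": producers,
        "overall_ci": z * np.sqrt(var_overall),      # half-width of the 95 % interval
        "users_ci": z * np.sqrt(var_users),
    }

# Hypothetical two-class example: stable (90 % of the map) and change (10 %)
counts = [[90, 10],
          [5, 45]]
weights = [0.9, 0.1]
result = accuracy_estimates(counts, weights)
for name, value in result.items():
    print(name, np.round(value, 4))
```

In this toy example the change class has a user's accuracy of 0.9 but a producer's accuracy of only 0.5, because the 10 omitted change samples found in the large stable stratum represent a large area once they are weighted by that stratum's share of the map.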
The kappa coefficient is also often reported as a measure of map
accuracy. However, its use has been questioned in many articles and it
is therefore not recommended (Pontius Jr and Millones, 2011).
(6)
4 | Interpretation of the results
classes. Furthermore, accuracy varies between landscapes. Global
products, like the GFC data, need to be assessed for each study region
instead of relying on global accuracy estimates. For example,
Potapov et al. (2014a) opted not to use the global forest change
classification model because it gave a conservative estimate of forest
loss. For a study area in Eastern Europe, Potapov et al. (2014a) report
that, for forest loss between 2000 and 2012, the GFC data have a user’s
accuracy of 65 % and a producer’s accuracy of 68 %, while their
customized classification model products had higher accuracy
measures of 94 % user’s accuracy and 88 % producer’s accuracy.
Figure 3: Graphs A and B show the area estimates from the map data alone (map), the
combination of map and reference data (adjusted), and reference data alone (reference)
for each of the four strata. Graph A shows the area estimates for the stable classes,
forest and non-forest, and graph B shows the area estimates for the non-stable classes,
forest loss and forest gain. Each of the points includes confidence intervals. The
confidence intervals are larger for the non-stable classes because they cover a smaller
area. The map data do not have confidence intervals because they represent the
entire population that is being sampled (all of the pixels in the map). The R script will
output this graph in addition to the values of the areas and confidence intervals.
4.3 Reporting results
When reporting the results of an accuracy assessment, the report
should include not only the accuracy estimates, the adjusted areas
and their respective confidence intervals, but also the assumptions
implied by several elements of the accuracy assessment. These
assumptions can influence the level of accuracy and include, but
are not limited to:
1. the minimum mapping unit and the spatial assessment unit
2. the sampling design
3. the forest definition
4. the source of reference data
5. the confidence level used for calculating the confidence intervals
(typically 95 %).