Theoretical Background

Part I
Theoretical background
1. | Overview of the steps of accuracy assessment
In an accuracy assessment of map data, the map is compared with
higher quality data. The higher quality data, called reference data, is
collected through a sample based approach, allowing for a more
careful interpretation of specific areas of the map. The reference data
is collected in a consistent manner and is harmonized with the map
data, in order to compare the two classifications. The comparison
results in accuracy measures and adjusted area estimates for each map
category. This process is broken down into four major components: (i)
a map, (ii) the sampling design, (iii) the response design, and (iv) the
analysis (see Figure 2). The sampling design specifies how to select a
subset of the map for which reference data will be collected. This is
necessary because it is usually impractical to collect reference data for
the entire study region (map area). The response design protocol
provides guidelines for collecting the reference data and includes all
steps that lead to a decision whether the map and reference data are in
agreement for the subset of the data that was sampled. The analysis
protocol determines how to derive accuracy and area estimates from
the map and reference data.
An accuracy assessment must provide full documentation of the
map and reference data. To meet this prerequisite, this document first
gives background information about the example map data used in
the practical implementation, the Global Forest Change (GFC) dataset
(Hansen et al., 2013) and about how to use other land cover data sets.
The document explains how to define strata and calculate the area
occupied by each stratum for the map data. This information is
necessary for the calculation of the sample size that forms part of the
sampling design. The response design chapter documents how the
reference data can be obtained using sample-based approaches and
the Open Foris Collect Earth application and, alternatively, how to
use other sources for reference data.
The steps of the accuracy assessment are outlined in Figure 2,
including the software applications that can be used to complete the
steps.
Figure 2: The four main steps of an accuracy assessment: obtaining and finalizing the
map data, sampling design, response design, and analysis. The symbols show the
software that can accomplish each step: 1 is R, 2 is QGIS, 3 is Excel, and 4 is Collect
Earth.
2. | Map data
Users may want to implement accuracy assessment for different
types of thematic map data. Map data can be produced from satellite
images or be freely available land cover or land use data, and can
represent a single time period or change between multiple time
periods. The characteristics of the map determine parts of the
methodology used for the accuracy assessment, e.g. the minimum
mapping unit and the classes to be assessed.
The first step is a general quality control check of the map data. The
user can do this by visually assessing the map for obvious errors. A
preliminary comparison with other similar data can help reveal
map errors. If a map of change is being assessed, one method for
checking the quality of the map is to look for impossible transitions,
e.g. water to forest. Obvious errors should be accounted for and
corrected before continuing with the accuracy assessment. The quality
control check does not need to be an independent assessment; rather, it
can be beneficial if the map producer carries out the quality control
check.
In the general preparation of the map data the user can consider
aggregation. Aggregation of the map classes can reduce the burden
during the collection of reference data and can increase the accuracy
measures. Aggregation of the spatial unit from pixels to pixel blocks
can also be considered. Aggregation should be considered on a
case-by-case basis, because the justification for aggregating map
classes and the spatial unit varies.
The accuracy of custom land cover maps can be assessed using the
steps described in this document. The appropriate methodology
depends on the type of map data that is to be used. A map can be
either raster or vector data; in the former the spatial assessment unit is
usually a pixel or pixel block and in the latter the spatial assessment
unit is a polygon.
Raster data can be either pixel-based or object-based. The example
dataset used in this document is the Global Forest Change (GFC)
dataset. If the data being assessed is pixel-based, the steps described
in the practical implementation can be applied in the same way because
the GFC data is also pixel-based. If it is object-based (or
segment-based), estimating accuracy and area is more complex and is
not covered by this document.
Vector data can be produced by visual interpretation of pixel-based
satellite imagery or from segmented satellite data. If the
visual interpretation is based on pixel-based satellite imagery, the
resulting land cover map can be converted to raster data using the
spatial resolution of the original satellite data (e.g. 30 m for Landsat
data). Then the same methodology as for the GFC data can be
applied. The accuracy assessment of object-based vector data is not
covered by this document.
Once the custom land cover or land cover change map is available
as raster data, the strata need to be defined and the size of each
stratum needs to be calculated. The strata must be mutually exclusive,
meaning that each pixel must be assigned to exactly one stratum. The sum
of all pixels in the strata defines the total study area. The calculation of
the strata size can be done using GIS software or an R script. The
process is the same for multiple change classes, e.g. between forest,
woodland and cultivated land, as it is for changes between forest and
non-forest. However, high accuracy becomes increasingly difficult to
achieve as the number of change classes grows, and the assessment of
many classes requires a larger number of sample points.
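The stratum size calculation can be illustrated with a minimal sketch. It is written in Python rather than as the R script mentioned above, and it assumes the map has already been read into a list of rows of integer class codes. The function name `stratum_sizes`, the class codes, the no-data value of 0, and the pixel area of 0.09 ha (a 30 m pixel) are all hypothetical choices for illustration.

```python
from collections import Counter


def stratum_sizes(strata_rows, pixel_area_ha, nodata=0):
    """Count pixels per stratum and derive its area and area proportion."""
    # Every valid pixel belongs to exactly one stratum, so one pass suffices.
    counts = Counter(
        value for row in strata_rows for value in row if value != nodata
    )
    total = sum(counts.values())  # total study area, in pixels
    return {
        code: {
            "pixels": n,
            "area_ha": n * pixel_area_ha,
            "weight": n / total,  # mapped proportion of area of the stratum
        }
        for code, n in sorted(counts.items())
    }


# Toy map: 1 = forest, 2 = non-forest, 3 = forest loss, 0 = no data
toy_map = [
    [1, 1, 1, 2, 2],
    [1, 1, 3, 2, 2],
    [1, 1, 2, 2, 2],
    [0, 1, 2, 2, 2],
]
sizes = stratum_sizes(toy_map, pixel_area_ha=0.09)
for code, info in sizes.items():
    print(code, info)
```

The resulting weights are the mapped area proportions that feed into the sample size calculation of the sampling design.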
3. | Sampling design
The sampling design defines how to select the subset of the map,
which forms the basis for the accuracy assessment. Selecting a subset
of the map is necessary because (i) a sampling approach allows more
careful interpretation of the parameters of interest at each sample site
thus satisfying the requirement of using ‘higher quality’ data than that
used to create the map, even if the data used to make the map is also
used in the accuracy assessment and (ii) it is usually not feasible to
collect reference data for the whole study area. In the sampling
design, the sample size for each map category is chosen to ensure that
the sample size is large enough to produce sufficiently precise
estimates of the area of the class (GFOI, 2013).
It is critical to use a probability sampling design that incorporates
randomization in the sample selection protocol. Probability sampling
is defined in terms of inclusion probabilities that quantify the
likelihood of a given unit being included in the sample. The
inclusion probability must be known for each unit selected in the
sample and it must be greater than zero for all units in the area of
interest. Non-response is the situation where the inclusion probability
is unknown or zero, e.g. inaccessible plots or unavailable data due to
cloud coverage. The circumstances must be clearly stated for
non-response samples, for example by reporting the proportion of the
selected sample units for which cloud cover or lack of reference
imagery prevented assessment of the unit. If ground visits are used to
collect reference data, a sampling design that accounts for
non-response is advisable, such as the protocol described by Stevens
and Olsen (2004).
Commonly used probability sampling designs include simple
random, stratified random, and systematic. For land cover maps it is
recommended to use a stratified sampling approach, so only this
sampling design is being addressed in this document. The sampling
design can influence the results and information about it is necessary
to properly interpret the error matrix.
1.1 Stratification of the map

Stratification is the division of the area of interest into smaller areas
(strata), in which each assessment unit is assigned to a single stratum.
Stratification could be, for example, by map class (e.g. forest and
non-forest) or by sub-region (e.g. administrative units). The strata need
to be mutually exclusive and inclusive of the entire study area, with no
area that is in multiple strata or is omitted from the strata. The
end use of the map also needs to be considered when creating the
sampling design. An example is a national forest change map that is
used to derive area estimates of forest change in different sub-national
areas. In this case, the map is stratified by map class and
administrative boundaries. The user should ensure the sampling
design captures all of the strata.
There are two main purposes for stratification. First, strata can be
of interest for reporting results, e.g. accuracy per land cover class or
sub-region. The second purpose of stratification is to ensure a
sufficient representation of rare classes (i.e. classes that represent
only a small proportion of the area of interest). Land change often
occupies a small fraction of the landscape, so a change stratum (e.g.
forest loss) can be identified and the sample size allocated to that
stratum can be large enough to produce a small standard error for the
user’s accuracy estimate. The stratification by map classes improves the
precision of the accuracy and area estimates by increasing the
sampling density in the change classes. For this reason, stratification
in this study is based on land cover class and an independent sample
is drawn for each land cover class.
When defining the strata, a feasible number of classes needs to be
chosen. For single date land cover maps, it is usually feasible to define
a stratum for each map class (Wulder et al., 2007), but it is more
challenging for a change map where the number of different types of
changes might be too high. To reduce the number of strata, types of
change that are very unlikely to occur could be eliminated. Strata
could also be defined on the basis of generalized change categories,
such as change from forest to non-forest instead of forest to cultivated
land, forest to water, etc. The feasibility of distinguishing these change
classes in the reference data should also be taken into account.
Strahler et al. (2006) provide additional examples for aggregating
change classes. Even if a change type is not defined as a stratum in the
sampling design, accuracy and area estimates can still be derived for
that change type, but the sample size might not be high enough to
derive estimates at the desired precision.
1.2 Random versus systematic sampling

The two most common protocols for selecting the assessment units
are simple random and systematic sampling. Systematic sampling is
defined as selecting a starting point at random with equal probability
and then sampling with a fixed distance between sampling locations.
It is often implemented for field sampling activities, such as national
forest inventories. In general, the simple random selection protocol is
the recommended option, but systematic selection is also nearly
always acceptable. With simple systematic sampling it can be
difficult to capture small classes, particularly change categories.
Therefore, when conducting an accuracy assessment for land cover
change (such as for activity data) that includes the collection of
reference data, it is recommended to use a stratified approach.
1.3 Determine sample size

The sample should be representative of the population (the spatial
units, e.g. pixels, in the area of interest): large enough to give
reliable estimates, but as small as possible in order to save costs.
Determining the sample size is an inexact science because it depends
on accuracy and area information that is not known prior to the
assessment. A “best guess” about the accuracy and area information
can be used for the sample size calculation. Although there are
formulas that calculate the overall sample size and its distribution
among the strata, it is up to the user to decide how best to determine
the sample size. Area information is usually based on the number of
spatial units of the map, and accuracy is generally higher for larger
classes, e.g. the expected accuracy is higher for no-change classes than
for change classes. Equation 1 (Cochran, 1977) calculates an adequate
overall sample size n for stratified random sampling that can then be
distributed among the different strata. N is the number of units in the
area of interest (the total number of pixels if the spatial unit is a
pixel), S(Ô) is the standard error of the estimated overall accuracy that
we would like to achieve, W_i is the mapped proportion of area of class
i, and S_i is the standard deviation of stratum i.
$$ n = \frac{\left(\sum_i W_i S_i\right)^2}{\left[S(\hat{O})\right]^2 + \frac{1}{N}\sum_i W_i S_i^2} \approx \left(\frac{\sum_i W_i S_i}{S(\hat{O})}\right)^2 \qquad (1) $$
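The sample size calculation of equation 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed implementation: the function name `overall_sample_size` is hypothetical, the stratum standard deviation is taken as S_i = sqrt(U_i (1 - U_i)) with U_i a best-guess user's accuracy for stratum i (the form used by Olofsson et al., 2014), and the input values are invented.

```python
import math


def overall_sample_size(weights, guessed_user_acc, target_se, n_units=None):
    """Overall sample size for stratified random sampling (Cochran, 1977).

    weights          -- mapped area proportions W_i (should sum to 1)
    guessed_user_acc -- best-guess user's accuracies U_i per stratum (assumption)
    target_se        -- desired standard error of overall accuracy, S(O)
    n_units          -- population size N; None drops the 1/N term
    """
    # Assumed form of the stratum standard deviation: S_i = sqrt(U_i (1 - U_i))
    s = [math.sqrt(u * (1.0 - u)) for u in guessed_user_acc]
    numerator = sum(w * s_i for w, s_i in zip(weights, s)) ** 2
    denominator = target_se ** 2
    if n_units is not None:
        denominator += sum(w * s_i ** 2 for w, s_i in zip(weights, s)) / n_units
    return math.ceil(numerator / denominator)


# Hypothetical inputs: two large stable strata and two rare change strata
weights = [0.60, 0.37, 0.02, 0.01]
guessed_user_acc = [0.90, 0.90, 0.70, 0.70]

n_large_population = overall_sample_size(weights, guessed_user_acc, 0.01)
n_finite = overall_sample_size(weights, guessed_user_acc, 0.01, n_units=10**6)
print(n_large_population, n_finite)
```

For a map with millions of pixels the 1/N term is negligible, which is why the approximate form on the right of equation 1 is often sufficient.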
The overall sample size resulting from this calculation can be
allocated among the strata in multiple ways. The samples need to
be distributed between the strata by balancing equal sample size
per stratum against proportional allocation. In proportional allocation,
the overall sample size is allocated to the strata in proportion to the
area of the strata, so rare strata receive a small proportion of the
overall sample size. In equal allocation, the overall sample size is
distributed equally between the strata. When stratification is used for
rare classes, such as in the assessment of change, it is necessary to
ensure that there is a sufficient number of samples in the rare classes.
The minimum sample size should be at least 20 to 100 samples per
stratum (Congalton and Green, 2008).
Different allocations favor different estimation objectives, i.e. equal
sample size favors estimation of user’s accuracy, while proportional
allocation usually results in smaller standard errors for producer’s
and overall accuracy. As a compromise, it is suggested to use a
sample allocation somewhere in between equal and proportional
allocation, taking into account a minimum sample size per stratum.
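One possible compromise allocation is sketched below. The rule is an assumption chosen for illustration and is not prescribed by this document: strata whose proportional share falls below a chosen minimum receive that minimum, and the remaining sample is shared proportionally among the other strata. The function name `allocate_sample` and the minimum of 50 are hypothetical.

```python
def allocate_sample(total_n, weights, min_per_stratum=50):
    """Proportional allocation with a guaranteed minimum per stratum."""
    k = len(weights)
    if total_n < min_per_stratum * k:
        raise ValueError("total_n is too small for the requested minimum")
    fixed = set()  # strata that are held at the minimum sample size
    while True:
        remaining = total_n - min_per_stratum * len(fixed)
        free_weight = sum(w for i, w in enumerate(weights) if i not in fixed)
        shares = {
            i: remaining * w / free_weight
            for i, w in enumerate(weights)
            if i not in fixed
        }
        too_small = {i for i, share in shares.items() if share < min_per_stratum}
        if not too_small:
            break
        fixed |= too_small
    alloc = [min_per_stratum if i in fixed else int(shares[i]) for i in range(k)]
    # Hand the units lost to rounding down to the largest stratum.
    largest = max(range(k), key=lambda i: weights[i])
    alloc[largest] += total_n - sum(alloc)
    return alloc


# Hypothetical example: 929 sample units over four strata
allocation = allocate_sample(929, [0.60, 0.37, 0.02, 0.01], min_per_stratum=50)
print(allocation)
```

Raising the minimum moves the result toward equal allocation, while lowering it moves the result toward proportional allocation.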
2 | Response design
The response design defines how to determine whether the map
and the reference data are in agreement. The response design
establishes reference data sources to be compared with map data,
assuming that the reference classification is sufficiently more accurate
than the map classification being evaluated. The four major features
in the response design are: the spatial unit, the sources of information
used to determine the reference classifications, the labeling protocol
for the reference classification, and the definition of agreement.
2.1 Spatial assessment unit

The spatial assessment unit is the unit at which the map was
sampled. This is the unit at which the reference data is collected. The
spatial unit is the basis for the location-specific comparison of the
reference classification and map classification. It can be a pixel,
polygon (segment), or pixel block. Usually, the pixel is chosen as the
spatial unit, but any of the types can be used. If a pixel block is used,
the map is coarsened to the size of the pixel block. The spatially
explicit character should be retained in the accuracy assessment;
therefore, the user should aim to have reference data with the same or
finer level of detail. The choice of the spatial assessment unit has
implications for the sampling design and analysis.
2.2 Sources of reference data
Various sources of reference data exist, ranging from ground visits
to the use of satellite imagery. The reference classification needs to be
of higher quality than the map classification, which can be
ensured in two ways: (i) the reference source has to be of higher
quality (e.g. higher spatial or radiometric resolution) than the source
for the map classification, or (ii) the process to create the reference
classification has to be more accurate if using the same source
material. For example, if Landsat imagery is the only available source
for both map classification and reference data, the process for
obtaining the reference classification has to be more accurate than the
map classification, e.g. by including visual assessment by expert
users. Olofsson et al. (2014) list various sources of reference data and
elaborate on their advantages and disadvantages. Additionally,
reference data should be temporally coincident with the map being
assessed, e.g. if a land cover map of the year 2000 is being assessed,
then the reference data should be from the year 2000. If reference data
is collected from a year different from the year of the map, then
adjusted areas will represent areas as of the time of the reference data.
A cost-effective tool for collecting reference data from very high,
high and medium resolution satellite imagery is Collect Earth (see
footnote 2). This Google Earth plugin allows the practitioner to
visually assess the land cover/use of sample locations with the freely
available data from Google Earth, Google Earth Engine, HERE Maps,
and Bing Maps. Chapter 10 addresses the setup of Collect Earth
incorporating the major features of the response design.
2.3 Reference labeling protocol

The labeling protocol defines how to convert the information
provided by the reference data into labels of the reference
classification. When reference data contains a mixture of classes, it is
especially important to have a consistent approach for reference class
labeling. The minimum mapping unit (MMU) for the reference
classification should be specified by the labeling protocol
because it has important implications for the accuracy assessment and
area estimations. The MMU is the smallest area that can receive a
classification label in the reference data. A possible MMU is the spatial
unit of the sample; however, it is not necessary that the MMU for the
reference data matches the spatial unit of the map. For example, a
smaller MMU for the reference data increases the specificity of the
classification, making it possible to distinguish smaller patches of
change in the reference classification.
Olofsson et al. (2014) provide more detailed suggestions for
dealing with mixed reference units, and the practical section on
reference data collection, section 10, addresses these suggestions with
Collect Earth.
2.4 Defining agreement

After the map and reference classification for a given spatial unit
have been obtained, rules for defining agreement need to be set up. In
the simplest case, map and reference classification have the same
classification scheme and if the labels agree, the map class is correct;
otherwise it is a misclassification. Defining agreement is more
complicated for heterogeneous assessment units or different
classification schemes. A heterogeneous assessment unit is a spatial
unit that covers more than one class, such as a pixel block that is 60%
non-forest and 40% forest. As for all steps, the rules for defining
agreement need to be clearly stated.

Footnote 2: http://www.openforis.org/tools/collect-earth.html
3 | Analysis
The analysis protocol specifies how to translate the information
contained in the comparison of map and reference data into accuracy
and area estimates, and how to quantify the uncertainty associated
with them. Most of the calculations are based on the error matrix (also
commonly called a confusion matrix), which contrasts the map and
reference classification.
This chapter gives an introduction to the error matrix, the
measures used to summarize the accuracy assessment, and the
estimation of area.
3.1 The error matrix

The error matrix is a cross-tabulation of the class labels allocated by
map and reference data. It is derived as a q x q matrix with q being the
number of classes assessed. The elements show the number of data
points which represent a map class i and reference class j (n_ij). Usually
the map classes are represented in rows and the reference classes in
columns. The diagonal of the matrix contains the correctly classified
data points, whereas the cells off the diagonal show commission and
omission errors. Commission error is the complementary measure to
user’s accuracy, calculated by subtracting the user’s accuracy from
100% for each class. Commission error, calculated for each of the
map classes, is the probability that a spatial unit classified into a
given category on the map does not represent that category in the
reference data. Omission error is the complementary measure to
producer’s accuracy, calculated by subtracting the producer’s accuracy
from 100% for each class. Omission error, calculated for each of the
reference classes, is the probability that a spatial unit belonging to a
given category in the reference data is not classified into that
category in the map data.
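The cross-tabulation can be illustrated with a short sketch, assuming that the map label and the reference label of every sample unit are available as two lists in the same order. The function name `error_matrix`, the class codes, and the six sample units are hypothetical.

```python
def error_matrix(map_labels, ref_labels, classes):
    """Cross-tabulate sample units: rows are map classes, columns reference classes."""
    index = {c: k for k, c in enumerate(classes)}
    q = len(classes)
    n = [[0] * q for _ in range(q)]
    for map_label, ref_label in zip(map_labels, ref_labels):
        n[index[map_label]][index[ref_label]] += 1  # increments n_ij
    return n


# Hypothetical sample of six units: F = forest, N = non-forest, L = forest loss
classes = ["F", "N", "L"]
map_labels = ["F", "F", "F", "N", "N", "L"]
ref_labels = ["F", "F", "N", "N", "N", "F"]

matrix = error_matrix(map_labels, ref_labels, classes)
for name, row in zip(classes, matrix):
    print(name, row)
```

Units on the diagonal are those where map and reference labels agree. An off-diagonal entry is a commission error for the class of its row and an omission error for the class of its column.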
The sample-based absolute counts, n_ij, can be converted into
estimated area proportions (see Table 1) with equation 2 when simple
random, simple systematic, or stratified random sampling with the map
classes as strata is used. Here n_i. is the total number of sample units
in map class i.
$$ \hat{p}_{ij} = W_i \frac{n_{ij}}{n_{i\cdot}} \qquad (2) $$
W_i is the proportion of area classified as class i and can be calculated
by dividing the number of pixels per stratum as derived in section 4.3
by the total number of pixels. Table 1 shows an example of an error
matrix with four classes.
Table 1: Population error matrix of four classes. Cell entries (p_ij)
represent proportion of area. Rows are map classes and columns are
reference classes.

                      Reference
Map        Class 1   Class 2   Class 3   Class 4   Total
Class 1    p11       p12       p13       p14       p1.
Class 2    p21       p22       p23       p24       p2.
Class 3    p31       p32       p33       p34       p3.
Class 4    p41       p42       p43       p44       p4.
Total      p.1       p.2       p.3       p.4       1
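The conversion from sample counts to estimated area proportions weights each row of the count matrix by the mapped area proportion of its class, W_i n_ij / n_i., where n_i. is the row total. A minimal sketch follows, assuming stratified random sampling with the map classes as strata. The function name `area_proportions` and all numbers are hypothetical.

```python
def area_proportions(counts, weights):
    """Convert sample counts n_ij into estimated area proportions p_ij."""
    proportions = []
    for row, w in zip(counts, weights):
        n_i = sum(row)  # sample units drawn in map class i
        proportions.append([w * n_ij / n_i for n_ij in row])
    return proportions


# Hypothetical two-class example: row 0 = mapped forest loss, row 1 = mapped stable
counts = [[45, 5], [10, 90]]
weights = [0.1, 0.9]  # mapped area proportions W_i

p = area_proportions(counts, weights)
row_totals = [sum(row) for row in p]                            # p_i.
col_totals = [sum(p[i][j] for i in range(2)) for j in range(2)]  # p_.j
print(p, row_totals, col_totals)
```

By construction each row total equals the mapped proportion W_i, while the column totals give the area proportions according to the reference classification.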
3.2 Estimating accuracy

The accuracy measures are derived from the error matrix and
reported with their respective confidence intervals. They include
overall accuracy, user’s accuracy and producer’s accuracy.
The overall accuracy is the proportion of area classified correctly,
and thus refers to the probability that a randomly selected location on
the map is classified correctly (see equation 3). User’s accuracy is the
proportion of the area classified as class i that is also class i in the
reference data (see equation 4). It provides users with the probability
that a particular area of the map of class i is also that class on the
ground. Producer’s accuracy is the proportion of area that is reference
class j and is also class j in the map (see equation 5). It is the
probability that class j on the ground is mapped as the same class.
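$$ \hat{O} = \sum_{j=1}^{q} \hat{p}_{jj} \qquad (3) $$
$$ \hat{U}_i = \frac{\hat{p}_{ii}}{\hat{p}_{i\cdot}} \qquad (4) $$
$$ \hat{P}_j = \frac{\hat{p}_{jj}}{\hat{p}_{\cdot j}} \qquad (5) $$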
For all three accuracy measures, the confidence intervals need to
be derived as well. The formulas for the variance are presented in
equations 5, 6 and 7 in Olofsson et al. (2014), and the 95 % confidence
interval is the estimate plus or minus 1.96 times the square root of
the variance.
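The three accuracy measures can be computed from the count matrix and the mapped area proportions as in the sketch below. It assumes stratified random sampling with the map classes as strata. The variance used for overall accuracy is the stratified form, the sum over strata of W_i^2 U_i (1 - U_i) / (n_i. - 1), which is stated here from the cited literature rather than from this document and should be checked against Olofsson et al. (2014). The function name `accuracy_measures` and the data are hypothetical.

```python
import math


def accuracy_measures(counts, weights):
    """Overall, user's and producer's accuracy from counts and area weights."""
    q = len(counts)
    row_n = [sum(row) for row in counts]
    # Estimated area proportions: p_ij = W_i * n_ij / n_i.
    p = [[weights[i] * counts[i][j] / row_n[i] for j in range(q)] for i in range(q)]
    col_p = [sum(p[i][j] for i in range(q)) for j in range(q)]
    overall = sum(p[i][i] for i in range(q))
    users = [p[i][i] / sum(p[i]) for i in range(q)]
    producers = [p[j][j] / col_p[j] for j in range(q)]
    # Assumed variance of overall accuracy for stratified random sampling
    var_overall = sum(
        weights[i] ** 2 * users[i] * (1 - users[i]) / (row_n[i] - 1)
        for i in range(q)
    )
    half_width = 1.96 * math.sqrt(var_overall)  # 95 % interval is overall +/- this
    return overall, users, producers, half_width


# Hypothetical two-class example: row 0 = mapped forest loss, row 1 = mapped stable
counts = [[45, 5], [10, 90]]
weights = [0.1, 0.9]
overall, users, producers, half_width = accuracy_measures(counts, weights)
print(overall, users, producers, half_width)
```

In this invented example the user's accuracy of the rare class is high while its producer's accuracy is much lower, because a small error rate in the large stable class corresponds to a large area of omitted forest loss.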
The kappa coefficient is also often reported as a measure of map
accuracy. However, its use has been questioned in many articles and
is therefore not recommended (Pontius Jr and Millones, 2011).
3.3 Estimating area

The error matrix provides information on the accuracy of the map.
It is also recommended to use this information for estimating the area
of classes, such as the area of deforestation, and their standard errors.
The reference data can be used to adjust the area estimate obtained
from the map. It is recommended to base that estimation on p.k, the
proportion of area derived from the reference classification, because,
in contrast to pk., the proportion mapped as class k, it should have
smaller bias. p.k is the column total of reference class k in the error
matrix (see equation 6).
$$ \hat{p}_{\cdot k} = \sum_{i=1}^{q} W_i \frac{n_{ik}}{n_{i\cdot}} \qquad (6) $$
The standard error for the stratified estimator of proportion of
area can be calculated using equations 10 and 11 in Olofsson et al.
(2014), and the 95 % confidence interval is the estimate plus or minus
1.96 times the standard error.
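The adjusted area estimate and its confidence interval can be sketched as follows. The estimated proportion is the column total of the reference class, and the standard error uses the stratified form sqrt(sum over strata of W_i^2 (n_ik / n_i.)(1 - n_ik / n_i.) / (n_i. - 1)). That form is stated here from the cited literature, not from this document, and should be checked against Olofsson et al. (2014). The function name `adjusted_area`, the total area, and the counts are hypothetical.

```python
import math


def adjusted_area(counts, weights, total_area, k):
    """Adjusted area of reference class k and the half-width of its 95 % interval."""
    p_k = 0.0
    variance = 0.0
    for row, w in zip(counts, weights):
        n_i = sum(row)
        share = row[k] / n_i  # share of stratum i labeled class k in the reference
        p_k += w * share      # column total of the area proportions
        variance += w ** 2 * share * (1 - share) / (n_i - 1)
    standard_error = math.sqrt(variance)
    return total_area * p_k, 1.96 * total_area * standard_error


# Hypothetical two-class example: row 0 = mapped forest loss, row 1 = mapped stable
counts = [[45, 5], [10, 90]]
weights = [0.1, 0.9]
total_area_ha = 10000.0

area_ha, half_width_ha = adjusted_area(counts, weights, total_area_ha, k=0)
map_area_ha = weights[0] * total_area_ha
print(map_area_ha, area_ha, half_width_ha)
```

In this invented example the adjusted area of forest loss is larger than the mapped area, because the reference data found forest loss inside the large stratum mapped as stable.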
4 | Interpretation of the results
The main purpose of the accuracy assessment is to quantify the
accuracy of the map and to generate new area estimates to correct for
bias in the map. For both accuracy and area estimates, the accuracy
assessment provides confidence intervals. This section has a closer
look at what these estimates mean, and what needs to be taken into
account when reporting the results of the accuracy assessment.
4.1 Interpretation of the accuracy estimates

There is no general rule as to which level of accuracy is good and
which is not. Judgment on the data validity depends on the purpose
of the map and thus needs to be dealt with on a case by case basis. The
steps of the accuracy assessment, described in this document, need to
be considered when assessing the accuracy value. Accordingly,
UNFCCC does not provide any thresholds for the accuracy of data
provided for the construction of forest reference levels.
For land cover change, it is necessary to look at the accuracy of
change and not at the accuracy of two single land cover maps. Even if
both land cover maps have high accuracy measures for a single point
in time, this does not provide any information about the accuracy of
the change classes. A new change analysis using remote sensing images
is necessary rather than comparing maps from different times. This is
because change usually occupies a small portion of an area and is
frequently smaller than the cumulative error of the individual map
productions (GFOI, 2013). Forest change is often less than 1% of the
total area; even if two land cover maps each have an overall accuracy
of 99%, the apparent change between them could be attributed
entirely to errors in the two maps.
The overall map accuracy is not always representative of the
accuracy of individual classes (GFOI, 2013). High overall map
accuracy does not guarantee high accuracy for forest loss. Therefore,
both producer’s and user’s accuracy for all single classes need to be
considered. A high user’s accuracy and low producer’s accuracy for
forest loss, for example, indicate that most of the forest loss in the map
was also forest loss in the reference data, but that the map missed
a fair amount of the forest loss present in the reference data.
Additionally, the total sample size, the number of strata, and the
allocation of the total sample size to the strata can favor one accuracy
measure over the other.
Accuracy is usually higher for stable classes than for change
classes. Furthermore, accuracy varies between landscapes.
Global products, like the GFC data, need to be assessed for each study
region instead of relying on global accuracy estimates. For example,
Potapov et al. (2014a) opted not to use the global forest change
classification model because it had a conservative estimate of forest
loss. For a study area in Eastern Europe, Potapov et al. (2014a) report
that, for forest loss between 2000 and 2012, the GFC data had a user’s
accuracy of 65 % and a producer’s accuracy of 68 %, while their
customized classification model products had higher accuracy
measures of 94 % user’s accuracy and 88 % producer’s accuracy.
4.2 Interpretation of the area estimates

The accuracy assessment serves to derive the uncertainty of the
map area estimates. Whereas the map provides a single area estimate
for each land cover class without a confidence interval, the accuracy
assessment adjusts this estimate and also provides confidence intervals
as estimates of uncertainty (Figure 3). The adjusted area estimates can
be considerably higher or lower than the map estimates.
Such area estimates with confidence intervals could also be derived
from the reference data alone, but the combination of map and
reference data increases the precision of the final estimate (Figure 3).
A higher precision means that the confidence intervals are narrower.
Therefore, it is highly recommended to use this combination of map
and reference data for area estimates. Even from a map with low
accuracy measures, meaningful and precise area estimates can be
derived if the reference data is collected in a thorough way. Whether
the adjusted area is greater or smaller than the map area can be
understood by examining the error matrix of sample counts. The map
area for a particular class is adjusted to a larger area when more
sample units are labeled as that class in the reference data than in
the map data. For example, if 1000 units are sampled and, within that
sample, the map data has 100 spatial units labeled as forest and the
reference data has 150 spatial units labeled as forest, the resulting
adjusted area is greater than the area derived from the map data alone.
Figure 3: Graphs A and B show the area estimates from the map data alone (map), the
combination of map and reference data (adjusted), and reference data alone (reference)
for each of the four strata. Graph A shows the area estimates for the stable classes,
forest and non-forest, and graph B shows the area estimates for the non-stable classes,
forest loss and forest gain. Each of the points includes confidence intervals. The
confidence intervals are larger for the non-stable classes because they cover a smaller
area. The map data does not have confidence intervals because it represents the
entire population that is being sampled (all of the pixels in the map). The R script will
output this graph in addition to the values of the areas and confidence intervals.
4.3 Reporting results
When reporting the results of the accuracy assessment, the report
should include not only the accuracy estimates, the adjusted areas, and
their respective confidence intervals, but also the assumptions implied
by several elements of the accuracy assessment. The assumptions can
influence the level of accuracy, and include, but are not limited to:
1. the minimum mapping unit and the spatial assessment unit
2. the sampling design
3. the forest definition
4. the source of reference data
5. the confidence level used for calculating the confidence intervals
(typically 95 %).
The estimates always need to be reported with their respective
confidence intervals. Additionally, it is recommended to present the
error matrix in terms of estimated area proportions instead of absolute
sample counts. The estimated area proportions normalize the absolute
sample counts by the map area and are used to calculate the user’s and
producer’s accuracy. Because the user’s accuracy is based on the map
classes, which here are the strata, it can be calculated using either the
estimated area proportions or the sample counts; however, the
producer’s accuracy is based on the reference classes and yields
different results depending on whether the calculation uses sample
counts or estimated area proportions. It is recommended to calculate
the accuracy measures based on the estimated area proportions and not
the absolute sample counts, because showing only the sample counts
does not explicitly demonstrate how the producer’s accuracy is
calculated.
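The effect of choosing sample counts or estimated area proportions can be checked numerically. The sketch below assumes stratified random sampling with the map classes as strata and uses invented numbers. The function name `accuracies_from_matrix` is hypothetical. In this toy case the row-based measure (user's accuracy) comes out the same either way, while the column-based measure (producer's accuracy) changes.

```python
def accuracies_from_matrix(m):
    """User's (row-based) and producer's (column-based) accuracy of a square matrix."""
    q = len(m)
    row_totals = [sum(m[i]) for i in range(q)]
    col_totals = [sum(m[i][j] for i in range(q)) for j in range(q)]
    users = [m[i][i] / row_totals[i] for i in range(q)]
    producers = [m[j][j] / col_totals[j] for j in range(q)]
    return users, producers


# Hypothetical two-class example: row 0 = mapped forest loss, row 1 = mapped stable
counts = [[45, 5], [10, 90]]
weights = [0.1, 0.9]  # mapped area proportions of the two strata

# Estimated area proportions: each row of counts scaled to its stratum weight
proportions = [[w * n / sum(row) for n in row] for row, w in zip(counts, weights)]

users_counts, producers_counts = accuracies_from_matrix(counts)
users_props, producers_props = accuracies_from_matrix(proportions)

print("user's accuracy:    ", users_counts, users_props)
print("producer's accuracy:", producers_counts, producers_props)
```

The difference arises because the rare stratum is sampled far more densely than the large one, so raw column totals over-represent the units mapped as forest loss.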