Global Developments in Environmental Earth Observation from Space

Accuracy assessment of the Portuguese CORINE Land
Cover map

M. Caetano
Instituto Geográfico Português (IGP), Rua Artilharia Um, 107, 1099-052 Lisboa, Portugal
mario.caetano@igeo.pt
F. Mata
Escola Superior Agrária de Elvas – Instituto Politécnico de Portalegre (ESA-IPP)
S. Freire
Instituto Superior de Estatística e Gestão de Informação – Universidade Nova de Lisboa (ISEGI-UNL)

Keywords: remote sensing, cartography, CORINE Land Cover 2000, land cover, accuracy
assessment

ABSTRACT: This paper presents the accuracy assessment methodology designed and implemented
to validate the Portuguese CORINE Land Cover 2000 (CLC2000) cartography. The procedure is
based on the comparison of the land cover database with reference data derived from visual
interpretation of aerial photography for sample areas. The sample unit is the land cover polygon,
organized within a systematic cluster sampling plan. Each cluster of polygons corresponds to an
aerial photograph, which reduced the number of air photos that had to be acquired
by maximizing the number of polygons inspected in each photo. A multinomial distribution was
used to estimate the number of samples. In this validation effort, we computed the overall accuracy,
producer’s accuracy, and user’s accuracy. The CLC2000 for Portugal has an overall thematic
accuracy of 82.8%, with a 95% confidence interval of 80.5-85.2%, and the majority of the CLC classes
are mapped with high accuracy.

1 INTRODUCTION

Digital data obtained from satellites are a growing source of information for the
production of land cover/land use (LCLU) maps. LCLU maps are important inputs to studies in different fields
(e.g., environment, agriculture, land management) at the global scale (e.g., land cover), regional scale
(e.g., temporal and spatial distribution of natural resources) and local scale (e.g., precision farming).
In this context, knowledge of the accuracy of maps and of how well they fit reality is a key issue,
considering their use in management and decision making. In fact, without a map quality
indicator, the risk of using a map cannot be assessed. Accuracy assessment of LCLU maps
produced from remote sensing data provides this information to managers and allows the estimation
of confidence levels for decision making. Accuracy assessment is the final quality check in
thematic cartography produced from remotely sensed data: it gives the production team an
indication of how good their work was, and gives the user an indication of the degree of confidence
that can be assigned to the cartography.
The CORINE Land Cover 2000 Project (CLC2000) in Portugal was carried out in the context
of the IMAGE and CORINE Land Cover 2000 (I&CLC2000) initiative of the European Commission
(Perdigão & Annoni, 1997; EEA, 2002). The Portuguese CLC2000 Project was undertaken between
October 2002 and February 2005, was funded by the Portuguese Environmental Institute (IA) and
by the European Commission, and was coordinated by the Institute of Statistics and Information
Management – New University of Lisbon (ISEGI-UNL) with the collaboration of the Portuguese
Geographic Institute (IGP) (Instituto do Ambiente, 2005). The main goal of this initiative is to map
the land cover of Europe in 2000 by updating the previous land cover maps. As a result of the
CLC2000 Project in Portugal, three land cover databases were produced for Continental Portugal:
(1) the CLC90-R database, which is an improvement (both geometric and thematic) of the first
CLC product of 1985/86/87, known as CLC90; (2) the CLC2000 database, for the year 2000; and (3)
CLC-changes, the database of changes that occurred in the period between the two products
(CLC90 and CLC2000). The production of the CLC datasets was based on visual interpretation of
Landsat imagery, supported by relevant ancillary information. In this paper
we present the validation procedure designed and implemented for the national CLC2000 database.
The validation of the European land cover product was carried out by the European Technical Team
(Maucha & Buttner, 2005).

2 THEORETICAL BACKGROUND ON ACCURACY ASSESSMENT

Accuracy assessment is the process used to estimate the accuracy of the classification present in a
map, by comparing the map with reference information that is assumed to be true. The final goal is
the production of an error matrix, from which statistics and indices that indicate the accuracy of
individual classes and of the whole map can be derived. In accuracy assessment, one has to define
the reference data, the type of sampling unit, the sampling design and the sampling intensity. These factors have to be
adequately balanced to allow the extrapolation of results to the whole map. Unfortunately,
there is no standard procedure for accuracy assessment, and the choice of a methodology depends
on factors such as time, money and human resources.
There are several widely used indices for accuracy assessment based on the error matrix (Congalton
& Green 1999): overall accuracy, producer’s accuracy, user’s accuracy, global kappa and conditional
kappa. Tau statistics are a variation on kappa (Ma & Redmond 1995). Other techniques that are
not based on the error matrix can be used to produce different statistics: fuzzy sets (Gopal & Woodcock
1994), variance analysis (Rosenfield 1981) and intersect sampling (Skidmore & Turner 1992).
The traditional error matrix methodology is widely used for maps produced under mutually
exclusive and totally exhaustive classification rules (Congalton 1991). Fuzzy set theory was introduced by
Gopal & Woodcock (1994) to handle the ambiguity that may be present in a classification. Variance
analysis, regression and chi-square analysis of contingency tables are inferential models that can
also be used in validation, in contrast with the inference supported by sampling
designs (Stehman 2000). However, these inferential models rest on assumptions that differ from those
of sampling designs and can be better suited to superpopulations (populations with
infinite or hardly quantifiable sampling units) (Stehman 2000).
Reference information is used to compare the classification with reality, and should have a
higher degree of accuracy than the information used for map production. Sources of reference
information include: aerial photography; satellite imagery with better resolution than that used in
map production; and field work (Biging et al. 1998; Congalton & Biging 1992). Congalton &
Biging (1992) state that only field work has the potential for complete discrimination of landscape
classes, but some difficulties can arise: access, human and material resources, cost, and time.
Reference information should refer to a date close to that of the data used in map production,
avoiding the influence of landscape change (Congalton & Green 1999), and should also be independent
from data used in the training process (Hammond & Verbyla 1996; Stehman 1999).
Sampling units are the portions of the classified map that have a probability of being selected,
and their choice is affected by map goals, map scale, resources, and reference information. Congalton
(1988) lists four options: single pixel, cluster of pixels, single polygon and cluster of polygons.
Aranoff (1985) stated that the sampling unit must have at least the area of the minimum cartographic
unit. Aranoff (1989) recommends the use of a single pixel, because with a higher level of detail we
can increase accuracy but also the occurrence of errors. Janssen & van der Wel (1994) recommend
the pixel if it is used in classification, and polygons when visits to the field are difficult.
Congalton (1988) prefers the cluster of pixels because it is easier to identify in the reference data.
Biging et al. (1998) report that maps based on polygons and maps based on pixels require different
statistical methods of validation. A map of polygons validated using pixels as the sampling unit usually
has underestimated accuracy, while the opposite procedure usually results in overestimated accuracy.
Stehman & Czaplewski (1998) stated that pixels and polygons are examples of area samples
directly associated with the thematic units of the map, but an area sample does not need to be
associated with a specific class, and can include pixels of different classes or parts of different
polygons. In this context an aerial photograph can be viewed as a cluster of polygons. These
authors also observed that, to minimize problems of generalization around polygon boundaries,
the sampled area used in validation can be reduced to the interior of the polygon, and areas
smaller than the minimum cartographic unit can be excluded from validation.
Sampling intensity (the number of samples needed for the sampling process) depends
both on the maximum width of the confidence interval and on the significance level attributed to
that interval (Hay 1979). There are two ways to compute the intensity: the binomial and
the multinomial formulation. The binomial model distinguishes only between correct and incorrect
classification, and many authors used this model to generate tables of sampling intensity (e.g., Hord
& Brooner 1976, Ginevan 1979, Hay 1979). With this model it is not necessary to take into account
the number of classes, and it yields a smaller number of samples to be
collected. The calculation is also easier than with the multinomial formulation (footnote 1). The multinomial formulation,
introduced to the remote sensing community by Rosenfield (1982), associates the
validation process with the multinomial distribution: validation is not just a question of right and
wrong, because the error is also assigned to a class. To use this formulation we need to know a priori the number of
classes and their proportion in the map (footnote 2).
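As an illustration of the binomial formulation described above, the following minimal Python sketch computes the number of samples from an a priori proportion of agreement, an absolute accuracy and a significance level. The function name and the example values (85% expected agreement, 5% absolute accuracy) are assumptions chosen for illustration only; they are not taken from the CLC2000 validation.

```python
import math
from statistics import NormalDist


def binomial_sample_size(p_hat: float, d: float, alpha: float = 0.05) -> int:
    """Number of samples n = p*q*z^2 / d^2 (binomial formulation)."""
    q_hat = 1.0 - p_hat
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up, since a fraction of a sample cannot be collected.
    return math.ceil(p_hat * q_hat * z ** 2 / d ** 2)


# Hypothetical inputs: 85% expected agreement, 5% absolute accuracy, 95% confidence.
n = binomial_sample_size(p_hat=0.85, d=0.05, alpha=0.05)
print(n)
```

Note that the number of classes does not enter the calculation, which is why this formulation tends to give a smaller sample than the multinomial one.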
The sampling design determines whether the samples can represent the entire map and
all the classes. Several issues should be considered in the sampling design: the sample must be selected
without bias; different sampling designs imply different models, and therefore different estimators in the statistical
calculations; and the sampling design affects the distribution of the sample over the area to be
validated, which in turn affects the cost of the whole process (Biging et al. 1998). Furthermore, the
sampling methodology must follow a correct probabilistic design, in which the probability of inclusion
of each sample unit is known. The main sampling designs used in validation are simple random
sampling, stratified random sampling, cluster random sampling, and systematic sampling (e.g.,
Congalton & Green 1999; Stehman & Czaplewski 1998).
Simple random sampling has good statistical properties (Congalton 1988), but is not always the
best design, because it tends to undersample the less represented classes unless the number of
samples is large enough (Stehman & Czaplewski 1998; Congalton & Green 1999). Stratified
sampling limits this problem. The important issue is the internal homogeneity of the strata, which
is required in this design, as opposed to cluster sampling, where internal heterogeneity is required
(Cochran 1977). Congalton (1988) notes that strata can be classes, administrative regions or
eco-regions. Cluster sampling uses statistical units grouped in sets called clusters. The cluster is called the
primary unit and its elements are called secondary units. A cluster is selected randomly and, in general, all
its secondary units are inspected; alternatively, only some secondary units are randomly selected.
These two options are named one-stage and two-stage cluster sampling, respectively. Cluster sampling has
the advantage of limiting the resources or the reference data needed (Stehman & Czaplewski
1998; Stehman 1999; Cochran 1977). An aerial photograph can be considered a cluster (primary
unit) and the polygons in its interior secondary units (Stehman & Czaplewski 1998).
Sampling systematization is widely used as an attempt to attain homogeneity in the distribution
of the sample throughout the area to be sampled. Systematization can be aligned or unaligned, with
the second option having the advantage of minimizing possible spatial autocorrelation of sampling
units (Congalton & Green 1999). Pure random, stratified and cluster samplings can be systematized.
Some analysis techniques (e.g., kappa and related) require a multinomial model, which is only
obtained through pure random sampling. Using different designs without the appropriate
variance estimator can lead to biased kappa values (Congalton 1991). Stehman (1992) concluded
that the effect of systematic sampling is negligible. Stehman (1996) also calculated the estimators
for kappa and its variance in stratified sampling. Stehman (1997) calculated the estimators for
kappa and its variance in cluster sampling for clusters with equal area, and states that the variance for
clusters of different size can be derived by the same technique.

Footnote 1. Equation to calculate the number of samples to be collected using the binomial formulation (Cochran, 1977):
$n = \hat{p}\,\hat{q}\,z_{\alpha/2}^{2} / d^{2}$, where $\hat{p}$ is an a priori estimate of the proportion of concordance, $\hat{q} = 1 - \hat{p}$, $d$ is the desired absolute
accuracy ($\mathrm{var}(p)$), $1 - \alpha$ is the confidence level of $\hat{p}$, and $z_{\alpha/2}$ is the $\alpha/2$ percentile of the standardized normal
distribution.

Footnote 2. Equation to calculate the sample size $n_i$ for each class $i$, with $i = 1, \dots, k$, for an absolute accuracy $d_i$
(Congalton & Green, 1999): $n_i = \hat{p}_i\,\hat{q}_i\,\chi^{2}_{(1,\,1-\alpha/k)} / d_i^{2}$. The total number of samples to be taken is the
maximum of the $n_i$, i.e. $n = \max_i\{n_i\}$. The number of samples for each class is $n/k$.

The error matrix or confusion matrix is used to compare the information obtained in the classification
process with reality, through the use of classic statistics. This matrix provides a concise method to
examine: the omission and commission errors in each class; producer and user accuracies for each
class; marginal kappa for each class; overall accuracy; and kappa statistic.
Overall accuracy is the proportion of sampling units correctly classified. Producer’s accuracy is
the proportion of sampling units of a given true (reference) class that are classified in that class, with omission error being its
complement to one. User’s accuracy is the proportion of sampling units classified in a given class that in reality belong to
that class, with commission error being its complement to one. Kappa analysis is a
multivariate discrete statistic used in accuracy assessment to statistically evaluate whether error matrices
are significantly different (Congalton 1983). The idea behind kappa is that part of the assessed
accuracy can be due to chance in the random process of sampling (Congalton 1983; Rosenfield &
Fitzpatrick-Linz 1986). The KHAT statistic (footnote 3) is a measure of agreement based on the difference
between the actual agreement in the error matrix and the agreement due to chance (Congalton 1983; Congalton
1991). Kappa can also be estimated for each class (conditional kappa), from either the user’s or the
producer’s perspective.

3 METHODOLOGY

The methodology developed for accuracy assessment of the Portuguese CLC2000 was based on the
comparison of the final map with the “ground truth” for selected sample units, from which an error
matrix was computed. Accuracy indices were then derived from this matrix. The validation method
(Table 1) was designed so that the accuracy indices obtained for the sample could be extrapolated to
the whole territory with a 95% confidence level.

Table 1. Characteristics of the accuracy assessment method of CLC2000 in Portugal.

Reference data (used to derive ground truth)    Orthophotos 1:5 000 from INGA for year 2000
Sampling unit                                   Map polygon
Sampling design                                 Unaligned systematic cluster sampling
Number of clusters                              144
Sampling intensity                              1.5%
Accuracy assessment indices                     Overall accuracy index and user’s and producer’s accuracies

The reference information used in the validation process was orthorectified aerial photography
(i.e., orthophotos). The choice was based on the availability of an aerial coverage of Portugal for the
same year as the satellite imagery used to produce the CLC2000 map. Because of the high cost of aerial
photography, it was decided to use as few photographs as possible and to use the
entire photo in order to maximize its use. The orthophotos are 4 km × 2.5 km in size, equivalent to
10 km² (1000 ha), and their distribution over the country is framed by the CLC map grid.

Footnote 3. KHAT means K hat, the estimated kappa, $\hat{K}$ (Treitz et al., 1992).

In the sampling scheme, the area covered by each orthophoto was considered a cluster of
polygons, with the polygon being the sample unit. To ensure a homogeneous coverage of
mainland Portugal, the sample was systematized based on the CLC working units (i.e., map sheets
at 1:100 000). However, some working units had little area for validation (near the coastline and
along the Spanish border), so there the sampling was proportional to the area being validated. The
sampling design can therefore be considered an unaligned systematic cluster sampling.
Opting for a systematic cluster sampling optimizes the use of the orthophotos to be acquired,
as the whole area of the orthophoto is considered in the validation process (Congalton & Green 1999;
Stehman & Czaplewski 1998; Stehman 1999). The number of orthophotos to acquire (i.e., clusters) was
determined by sampling a preliminary version of the CLC2000 (available in October 2004), since
the final version was not yet available. The number of orthophotos that would have to be acquired to derive
specific accuracy indices for the level 3 classes that are poorly represented in Portugal was extremely high,
resulting in very high costs for the project. Therefore, a decision was made to select 4 sample
clusters in each working unit, as a compromise between the cost of validation and the number of CLC
level 3 classes to validate. With this scheme, and after rejecting the orthophotos located outside
mainland Portugal, a total of 144 clusters were selected. To determine the number of polygons to
sample with statistical significance, a multinomial formulation was used, since in an error matrix it is also
relevant to distribute the error among the different classes.
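The selection of a fixed number of clusters per working unit can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the four orthophotos were positioned inside each working unit, so random selection within the unit is assumed here, and the tile identifiers, the function name and the rule for units with few candidate tiles are all hypothetical.

```python
import random
from typing import Dict, List


def select_clusters(
    tiles_by_unit: Dict[str, List[str]],
    per_unit: int = 4,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Pick up to `per_unit` orthophoto tiles at random inside each working unit.

    `tiles_by_unit` maps a working-unit id to the ids of the orthophoto tiles
    that fall on land to be validated (hypothetical identifiers).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selection = {}
    for unit, tiles in sorted(tiles_by_unit.items()):
        # Units with fewer candidate tiles (coast, border) contribute fewer clusters.
        n = min(per_unit, len(tiles))
        selection[unit] = sorted(rng.sample(tiles, n))
    return selection


# Hypothetical working units: one inland sheet and one mostly offshore sheet.
units = {
    "sheet_A": [f"A{i:02d}" for i in range(20)],
    "sheet_B": ["B00", "B01"],
}
chosen = select_clusters(units)
print({unit: len(tiles) for unit, tiles in chosen.items()})
```

Every polygon inside a selected tile would then be inspected, which corresponds to one-stage cluster sampling.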
For each orthophoto (cluster) a land cover map was produced by visual interpretation (i.e., CLC-
REF), with all incomplete polygons smaller than the MMU of CLC (25 ha) being excluded from
analysis. To produce the CLC-REF, the photo-interpreter was given a vector file of CLC2000
without information regarding the land cover class. The interpreter had to classify each polygon
with a CLC code and had to redefine the geometry of the polygons whenever their misfit to the
landscape was greater than 100 m. This photo-interpreter was not involved in the production of the
CLC databases. In the validation phase, it was not possible to distinguish with sufficient confidence
class 211 from classes 212 and 231, so they were aggregated in a mega-class, coded here as 210.
A similar problem occurred with classes 321, 322 and 323, which were grouped under a
single code, 320. Therefore the validation of CLC2000 at level 3 includes 38 classes.
After producing the CLC-REF, the minimum number of polygons that had to be sampled to
obtain statistical significance was again computed using the multinomial formulation. The absolute
accuracy was set as d = 0.06. Considering a significance level of α = 0.05 (95% confidence level), the existence of 42
classes to validate (k = 42), and the maximum proportion, that of class 243 with 13.82% of the polygons
to be sampled ($\hat{p}_i = 0.13823$ and $\hat{q}_i = 1 - 0.13823 = 0.86177$), we could anticipate a minimum
number of 9 polygons per class for validation with statistical significance.
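The figure of 9 polygons per class can be checked with a short Python sketch of the multinomial formulation, using the values quoted in the paragraph above. The function name is hypothetical, and two points are assumptions of this sketch: the chi-square quantile with one degree of freedom is obtained as the square of a standard normal quantile, and the per-class number n/k is rounded up to the next integer.

```python
import math
from statistics import NormalDist


def multinomial_sample_size(p_hat: float, d: float, alpha: float, k: int) -> float:
    """n = p*q*chi2 / d^2, with chi2 the (1 - alpha/k) quantile of chi-square, 1 d.o.f."""
    # A chi-square variable with one degree of freedom is a squared standard
    # normal, so its (1 - alpha/k) quantile is the square of z at 1 - alpha/(2k).
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * k))
    chi2 = z ** 2
    return p_hat * (1.0 - p_hat) * chi2 / d ** 2


# Values quoted in the text: d = 0.06, alpha = 0.05, k = 42, p = 0.13823 (class 243).
n_total = multinomial_sample_size(p_hat=0.13823, d=0.06, alpha=0.05, k=42)
per_class = math.ceil(n_total / 42)
print(round(n_total), per_class)
```

Under these assumptions the total comes to roughly 350 polygons, and dividing by the 42 classes and rounding up gives 9 polygons per class.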
The CLC-REF was then intersected with the CLC2000 to identify areas of agreement and
disagreement between the two maps and compute the error matrix. Indices of thematic accuracy
were derived for the three levels of CLC nomenclature. The indices computed were overall accuracy,
and for each class we estimated the producer’s accuracy and user’s accuracy. Confidence intervals
were calculated based on Cochran (1977) and Rossiter (2001).
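The construction of an area-based error matrix from the intersection of the two maps can be sketched as below. The record layout (map class, reference class, area in hectares), the function names and the sample values are assumptions for illustration; the GIS intersection itself is taken as already done, and no confidence interval is computed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def area_error_matrix(
    pieces: Iterable[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """Accumulate intersected areas into matrix[map_class][reference_class]."""
    matrix: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for map_class, ref_class, area_ha in pieces:
        matrix[map_class][ref_class] += area_ha
    return matrix


def overall_accuracy(matrix: Dict[str, Dict[str, float]]) -> float:
    """Share of the total area where map and reference classes agree."""
    total = sum(sum(row.values()) for row in matrix.values())
    agree = sum(row.get(cls, 0.0) for cls, row in matrix.items())
    return agree / total


# Hypothetical intersection pieces: (CLC2000 class, CLC-REF class, area in ha).
pieces = [
    ("312", "312", 60.0),
    ("312", "313", 25.0),
    ("313", "313", 40.0),
    ("324", "312", 10.0),
    ("324", "324", 65.0),
]
matrix = area_error_matrix(pieces)
print(overall_accuracy(matrix))
```

Working with areas rather than polygon counts means that a large misclassified polygon weighs more than a small one, which matches an error matrix expressed in hectares.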

4 RESULTS

The results of the validation process (accuracy indices) for the Portuguese CLC2000 show a rather
good overall accuracy at each of the three levels of the CLC nomenclature (Table 2).

Table 2. Overall accuracy indices of CLC2000 at the three levels of the nomenclature.

CLC level    Overall accuracy (%)    Confidence interval (95%)
1st level    97.6                    86.6-100
2nd level    90.1                    77.6-100
3rd level    82.8                    80.5-85.2

Table 3 shows the full error matrix assembled from the sample results of the accuracy assessment of the
CLC2000 at level 3. Values in the matrix refer to area in hectares. The user’s and producer’s
accuracy indices for the sample areas are also presented. The very high values for almost all the
classes indicate that in the sample units the CLC2000 database has an excellent quality at the class
level. There are only 8 level 3 classes with producer’s accuracy indices below 85%. Regarding the
user’s accuracy indices, only 9 classes have values below 85%. Most of the level 3 classes displaying
lower specific accuracy indices belong to the level 1 class 3 (Forest and semi-natural areas), and
these values arise from confusion among them. An inspection of the error matrix reveals that
significant areas classified as 312 (Coniferous forest) correspond in fact to classes 313 (Mixed
forest) and 324 (Transitional woodland). On the other hand, some areas which were in fact 324
(Transitional woodland) were classified in CLC2000 as 320 (mega-class corresponding to shrub
and herbaceous vegetation) and 312 (Coniferous forest). Some mixed forests (313) were classified
as coniferous forests (312). The confusion among forest classes was expected because of the
uncertainty felt by the image interpreters during map production (Instituto do Ambiente, 2005).
Portuguese forests have a rather open canopy and, as a consequence, the background has a strong
effect on the radiation reflected by the forest as a whole. Because the background may be very
similar in all types of forest, the forest reflectance may also be very similar. The
confusions involving class 324 (Transitional woodland) are partially a consequence of the rather
broad definition of this class, since it encompasses shrublands with a tree cover smaller than 30%,
forests recovering from fires, new forest plantations and forests recently cut. By visual interpretation
of Landsat imagery it is very difficult to establish the threshold where a young forest is no longer
a 324 but instead a mature forest, and therefore classified as 311, 312 or 313. On the other hand, it
is also difficult to differentiate a shrubland with a high tree cover (class 322 or 323) from a forest
with a large abundance of shrubs (class 311, 312 or 313 depending on forest type). The ancillary
data with information on forest distribution in Portugal used in CLC2000 map production could not
be used in many cases, because they were not in agreement (Instituto do Ambiente, 2005).
The user’s and producer’s accuracy indices presented at class level in Table 3 are just valid for
the sample units. The same indices, but now for the level 3 classes for the whole CLC2000 map are
presented in Fig. 1, together with the confidence intervals. The indices for classes with less than 9
polygons in the sampling units cannot be extrapolated to the whole map.

[Figure 1: two panels plotting UA (%) and PA (%), on a vertical axis from 0 to 1, against CLC class on the horizontal axis (112, 221, 222, 223, 241, 242, 243, 244, 311, 312, 313, 320, 324, 333).]

Figure 1. Confidence interval of the User’s Accuracy (UA) and Producer’s accuracy (PA) for level 3
classes of the CLC2000 Portugal. The blue line indicates the 85% value.

An analysis of Fig. 1 indicates that the amplitude of the confidence interval varies with land
cover class. While there are some classes with a narrow interval (e.g., 112, 211, 311) there are
others with a rather large interval (e.g., 241, 312). The large confidence intervals are an indicator
of the heterogeneity of the accuracy in the different sampling units. Regarding the user’s accuracy,
the only classes with a value lower than 85% are 241 and 312. Regarding the producer’s accuracy,
the only classes with a value below 85% are 223, 313 and 324. These results confirm the good
quality of the CLC2000 map for Portugal.

Table 3. Error matrix for the samples selected for accuracy assessment of the CLC2000, at level 3.
Rows: CLC2000 (map) classes; columns: CLC-REF (reference) classes; values in hectares.
CLC2000 111 112 121 122 123 124 131 132 133 141 142 210 213 221 222 223 241 242 243 244 311 312 313 320 324 331 332 333 334 411 421 422 423 511 512 521 522 523 Total UA (%)
111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
112 53 1540 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1593 96.7
121 0 0 444 0 0 0 0 0 27 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 494 89.9
122 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 100
123 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
131 0 0 0 0 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 193 100
132 0 0 0 0 0 0 0 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 100
133 0 0 0 0 0 0 0 0 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 100
141 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
142 0 0 0 0 0 0 0 0 0 0 178 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 178 100
210 0 0 0 0 0 0 0 0 0 0 0 18559 26 9 0 182 0 283 83 1125 19 0 0 47 889 0 0 0 0 0 0 0 0 0 0 0 0 0 21223 87.4
213 0 0 0 0 0 0 0 0 0 0 0 20 224 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 244 91.8
221 0 0 0 0 0 0 0 0 0 0 0 96 0 4040 0 41 6 78 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4269 94.6
222 0 0 0 0 0 0 0 0 0 0 0 0 0 0 353 108 0 0 0 0 108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 569 62.0
223 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 3372 0 0 0 11 8 0 0 0 25 0 0 0 0 0 0 0 0 0 0 0 0 0 3445 97.9
241 0 0 0 0 0 0 0 0 0 0 0 101 0 0 0 721 3758 509 102 79 0 0 0 46 66 0 0 0 0 0 0 0 0 0 0 0 0 0 5381 69.8
242 0 145 0 0 0 0 0 0 0 0 0 287 0 109 37 339 722 6418 117 123 12 0 0 20 19 0 0 0 0 0 0 0 0 0 0 0 0 0 8348 76.9
243 0 25 0 0 0 0 0 0 0 0 0 447 0 60 97 32 90 261 7262 68 136 0 32 22 390 0 0 10 0 0 0 0 0 0 0 0 0 0 8934 81.3
244 0 0 0 0 0 0 0 0 0 0 0 106 0 0 0 0 0 0 0 5099 153 0 0 0 365 0 0 0 0 0 0 0 0 0 0 0 0 0 5724 89.1
311 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 7 0 71 17246 179 376 44 493 0 0 0 0 0 0 0 0 0 0 0 0 0 18423 93.6
312 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 1 4 0 863 5703 1753 0 1106 0 0 0 0 0 0 0 0 0 0 0 0 0 9438 60.4
313 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 708 762 6230 0 776 0 0 0 0 0 0 0 0 0 0 0 0 0 8487 73.4
320 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 6 11 167 6 0 0 8 7813 2011 31 0 106 0 0 0 0 0 0 0 0 0 0 10162 76.9
324 0 0 44 0 0 0 0 0 0 0 0 35 0 3 0 0 0 12 5 74 1603 167 576 492 13765 0 0 0 0 0 0 0 0 0 0 0 0 0 16777 82.0
331 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 176 0 0 0 0 1 0 0 0 0 0 0 0 177 99.4
332 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 152 0 0 0 0 0 0 0 0 0 0 0 152 100
333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 391 134 0 0 1457 0 0 0 0 0 0 0 0 0 0 1988 73.3
334 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 207 0 0 0 0 0 0 0 0 0 221 93.7
411 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
421 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1064 0 0 0 0 0 0 0 1064 10 0
422 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 519 0 0 0 0 0 0 522 99.4
423 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 116 0 0 0 0 0 116 100
511 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 99 0 0 0 0 99 100
512 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 237 0 0 0 237 100
521 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 0 0 51 100
522 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 0 0 0 0 1148 0 1202 95.5
523 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 21 100
Total 53 1738 488 31 0 0 193 35 58 0 179 19659 251 4255 491 4795 4582 7592 7748 6655 20856 6816 8989 8875 20040 207 152 1572 207 0 1065 574 116 99 237 51 1148 21 129830
PA (%) 0.0 88.6 91.0 100 - - 100 100 55.2 - 99.4 94.4 89.2 94.9 71.9 70.3 82.0 84.5 93.7 76.6 82.7 83.7 69.3 88.0 68.7 85.0 100 92.7 100 - 99.9 90.4 100 100 100 100 100 100 107542

5 CONCLUSIONS

A rigorous thematic accuracy assessment procedure was developed to validate the Portuguese
CLC2000 database, rooted in a statistically sound method. It can be stated with a 95% confidence
level that this land cover map has rather high overall accuracy indices at all levels of the CLC
nomenclature, meeting the accuracy requirements set for the CLC Project. Regarding specific
accuracy indices at level 3 of CLC, it can be stated with a 95% confidence level that only classes
223, 241, 312, 313, and 324, have values below 85%.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the contributions to this validation effort made by Pedro
Marrecas (validation photo-interpreter), and by Hugo Carrão and Vasco Nunes (GIS operations).

REFERENCES

Aranoff, S. 1982. The map accuracy report: A user’s view. Photogrammetric Engineering & Remote Sensing,
48: 1039-1312.
Aranoff, S. 1985. The minimum accuracy value as an index of classification accuracy. Photogrammetric
Engineering & Remote Sensing, 51: 99-111.
Biging, G., Colby, D. E. & Congalton, R. 1998. Sampling systems for change detection accuracy assessment.
Remote sensing change detection, environmental monitoring methods and applications. Ed. Lunetta, R.
& Elvidge, C., Ann Arbor Press, Chelsea – Michigan, USA.
Cochran, W. 1977. Sampling techniques, 3rd edition. John Wiley & Sons, Inc., New York, USA.
Congalton, R. 1983. A quantitative method to test for consistency and correctness in photointerpretation.
Photogrammetric Engineering & Remote Sensing, 49: 69-74.
Congalton, R. 1988. A comparison of sampling schemes used in generating error matrices for assessing the
accuracy of maps generated from remotely sensed data. Photogrammetric Engineering & Remote Sensing,
54: 593-600.
Congalton, R. 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote
Sensing of Environment, 37: 35-46.
Congalton, R. & Biging, G. 1992. A pilot study ground reference data collection efforts for use in forest
inventory. Photogrammetric Engineering & Remote Sensing, 58: 1669-1671.
Congalton, R. & Green, K. 1999. Assessing the accuracy of remotely sensed data: principles and practices.
CRC Press, Danvers, USA.
EEA 2002. CORINE Land Cover update, I&CLC2000 project, Technical Guidelines.
Ginevan, M. 1979. Testing land-use map accuracy: another look. Photogrammetric Engineering & Remote
Sensing, 45: 1371-1377.
Gopal, S. & Woodcock, C. 1994. Theory and methods for accuracy assessment of thematic maps using fuzzy
sets. Photogrammetric Engineering & Remote Sensing. 60: 181-188.
Hammond, T. & Verbyla, D. 1996. Optimistic bias in classification accuracy assessment. International
Journal of Remote Sensing, 17: 1261-1266.
Hay, A. 1979. Sampling designs to test land-use map accuracy. Photogrammetric Engineering & Remote
Sensing, 45: 529-533.
Hord, R. & Brooner, W. 1976. Land-use map accuracy criteria. Photogrammetric Engineering & Remote
Sensing, 42: 671-677.
Instituto do Ambiente 2005. CORINE Land Cover 2000 Portugal. Technical Report.
Janssen, L. & van der Wel, F. 1994. Accuracy assessment of satellite derived land cover data: a review.
Photogrammetric Engineering & Remote Sensing, 60: 419-426.
Ma, Z. & Redmond, R. 1995. Tau coefficients for accuracy assessment of classification of remote sensing
data. Photogrammetric Engineering & Remote Sensing, 61: 435-439.
Maucha, G. & Buttner, G. 2005. Validation of the European CORINE Land Cover 2000 database. In this
book.
Perdigão, V. & Annoni, A. 1997. Technical and methodological guide for updating the CORINE Land Cover
database, JRC/EEA.
Rossiter, D. 2001. Assessing the thematic accuracy of area-class soil maps. Soil Science Division, ITC.
Enschede, Holland. Awaiting publication.
Rosenfield, G. 1981. Analysis of variance of thematic mapping experiment data. Photogrammetric Engineering
& Remote Sensing, 47: 1685-1692.
Rosenfield, G. & Fitzpatrick-Linz, K. 1986. A coefficient of agreement as a measure of thematic classification
accuracy. Photogrammetric Engineering & Remote Sensing, 52: 223-227.
Skidmore, A. & Turner, B. 1992. Map accuracy using intersect sampling. Photogrammetric Engineering &
Remote Sensing, 58: 1453-1457.
Stehman, S. 1992. Comparison of systematic and random sampling for estimating the accuracy of maps
generated from remotely sensed data. Photogrammetric Engineering & Remote Sensing, 58: 1343-1350.
Stehman, S. 1996. Estimating the kappa coefficient and its variance under stratified random sampling.
Photogrammetric Engineering & Remote Sensing, 62: 401-407.
Stehman, S. 1997. Estimating standard errors of accuracy assessment statistics under cluster sampling.
Remote Sensing of Environment, 60: 258-269.
Stehman, S. 1999. Basic probability sampling designs for thematic map accuracy assessment. International
Journal of Remote Sensing, 20: 2423-2441.
Stehman, S. 2000. Practical implications of design-based sampling inference for thematic map accuracy
assessment. Remote Sensing of Environment, 72: 35-45.
Stehman, S. & Czaplewski, R. 1998. Design and analysis for thematic map accuracy assessment: fundamental
principles. Remote Sensing of Environment, 64: 331-344.
Treitz, P., Howarth, P., Suffing, R. E. & Smith, P. 1992. Application of detailed ground information to
vegetation mapping with high spatial resolution digital imagery. Remote Sensing of Environment, 42: 65-82.
