Caetano, M., Mata, F., Freire, S. (2006). Accuracy assessment of the Portuguese CORINE Land …
Accuracy assessment of the Portuguese CORINE Land
Cover map
M. Caetano
Instituto Geográfico Português (IGP), Rua Artilharia Um, 107, 1099-052 Lisboa, Portugal
mario.caetano@igeo.pt
F. Mata
Escola Superior Agrária de Elvas – Instituto Politécnico de Portalegre (ESA-IPP)
S. Freire
Instituto Superior de Estatística e Gestão de Informação – Universidade Nova de Lisboa (ISEGI-UNL)
Keywords: remote sensing, cartography, CORINE Land Cover 2000, land cover, accuracy
assessment
ABSTRACT: This paper presents the accuracy assessment methodology designed and implemented
to validate the Portuguese CORINE Land Cover 2000 (CLC2000) cartography. The procedure is
based on the comparison of the land cover database with reference data derived from visual
interpretation of aerial photography for sample areas. The sample unit is the land cover polygon,
organized within a systematic cluster sampling plan. Each cluster of polygons corresponds to an
aerial photograph, which reduced the number of air photos that had to be acquired
by maximizing the number of polygons inspected in each photo. A multinomial distribution was
used to estimate the number of samples. In this validation effort, we computed the overall accuracy,
producer’s accuracy, and user’s accuracy. The results show that the CLC2000 for Portugal has an
overall thematic accuracy of 82.8%, with a confidence interval of 80.5-85.2%, and that the majority
of the CLC classes are mapped with high accuracy.
1 INTRODUCTION
Digital data obtained from satellites are a growing source of information for the
production of land cover/land use (LCLU) maps. LCLU maps are important inputs to different studies
(e.g., environment, agriculture, land management) at global scale (e.g., land cover), regional scale
(e.g., temporal and spatial distribution of natural resources) and local scale (e.g., precision farming).
In this context, knowledge of the accuracy of maps and of how well they fit reality is a key issue,
considering their use in management and decision making. In fact, the lack of a map quality
indicator prevents an assessment of the risk of map use. Accuracy assessment of LCLU maps
produced using remote sensing data provides this information to managers and allows the estimation
of confidence levels for decision making. Accuracy assessment is the final quality check in
thematic cartography produced using remotely sensed data: it gives the production team an
indication of how good their work was, and gives the user an indication of the degree of confidence
that can be assigned to the cartography.
The CORINE Land Cover 2000 Project (CLC2000) in Portugal was carried out in the context
of the IMAGE and CORINE Land Cover 2000 (I&CLC2000) initiative from the European Commission
(Perdigão & Annoni, 1997; EEA, 2002). The Portuguese CLC2000 Project was undertaken between
October 2002 and February 2005, was funded by the Portuguese Environmental Institute (IA) and
by the European Commission, and was coordinated by the Institute of Statistics and Information
Management – New University of Lisbon (ISEGI-UNL) with the collaboration of the Portuguese
Geographic Institute (IGP) (Instituto do Ambiente, 2005). The main goal of this initiative is to map
the land cover of Europe in 2000 by updating the previous land cover maps. As a result of the
CLC2000 Project in Portugal, three land cover databases were produced for Continental Portugal:
(1) the CLC90-R database, which is an improvement (both geometric and thematic) of the first
CLC product of 1985/86/87, known as CLC90; (2) CLC2000 database, for the year 2000; and (3)
CLC-changes, the database of changes that occurred in the period between the two products
(CLC90 and CLC2000). The production of the CLC datasets was based on visual interpretation of
Landsat imagery, supported by relevant ancillary information. In this paper
we present the validation procedure designed and implemented for the national CLC2000 database.
The validation of the European land cover product was carried out by the European Technical Team
(Maucha & Buttner, 2005).
Accuracy assessment is the process used to estimate the accuracy of the classification present in a
map, by comparing the map with reference information that is assumed to be true. The final goal is
the production of an error matrix, from which statistics and indices that indicate the accuracy of
individual classes and of the whole map can be derived. In accuracy assessment, one has to define
the reference data, the type of sampling unit, and the sampling design and intensity. These factors have to be
adequately balanced in order to allow the extrapolation of results to the whole map. Unfortunately,
there is no standard procedure for accuracy assessment, and the choice of a methodology depends
on factors such as time, money and human resources.
There are several widely used indices for accuracy assessment based on the error matrix (Congalton
& Green 1999): overall accuracy, producer’s accuracy, user’s accuracy, global kappa and conditional
kappa. Tau statistics are a refinement of kappa (Ma & Redmond 1995). Other techniques that are
not based on the error matrix can be used to produce different statistics: fuzzy sets (Gopal & Woodcock
1994), variance analysis (Rosenfield 1981) and intersect sampling (Skidmore & Turner 1992).
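As an illustration of the error-matrix indices listed above, the following minimal Python sketch computes overall accuracy, user’s and producer’s accuracies and global kappa. It assumes the common convention that rows hold the mapped class and columns the reference class; the function name and the 2 × 2 toy matrix are hypothetical and are not taken from the paper.

```python
def accuracy_indices(matrix):
    """Return (overall, users, producers, kappa) for a square error matrix.

    matrix[i][j] = amount of sample mapped as class i (row) whose
    reference class is j (column). Assumed convention, not the paper's code.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = sum(matrix[i][i] for i in range(k))

    # Overall accuracy: share of the sample on the main diagonal.
    overall = diagonal / total
    # User's accuracy: diagonal over row (map) total; None if the class is absent.
    users = [matrix[i][i] / row_totals[i] if row_totals[i] else None
             for i in range(k)]
    # Producer's accuracy: diagonal over column (reference) total.
    producers = [matrix[j][j] / col_totals[j] if col_totals[j] else None
                 for j in range(k)]
    # Global kappa: agreement corrected for chance agreement.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return overall, users, producers, kappa


# Hypothetical two-class example.
toy = [[45, 5],
       [10, 40]]
overall, users, producers, kappa = accuracy_indices(toy)
print(overall, users, producers, kappa)
```

Classes with an empty row or column yield `None` rather than a division error, which mirrors the dashes used for absent classes in an error matrix.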
The traditional error matrix methodology is widely used for maps produced under mutually
exclusive and totally exhaustive rules (Congalton 1991). Fuzzy set theory was introduced by
Gopal & Woodcock (1994) to handle the ambiguity that can be present in classification. Variance
analysis, regression and chi-square analysis of contingency tables are inferential models that can
also be used in validation, in contrast with the inference performed with the support of sampling
designs (Stehman 2000). However, these inferential models have assumptions that differ from those
of sampling designs and can be better suited to super populations (populations with
infinite or hardly quantifiable sampling units) (Stehman 2000).
Reference information is used to compare the classification with reality, and should have a
higher degree of accuracy than the information used for map production. Sources of reference
information include: aerial photography; satellite imagery with better resolution than those used in
map production; and field work (Biging et al. 1998; Congalton & Biging 1992). Congalton &
Biging (1992) state that only field work has the potential for complete discrimination of landscape
classes, but some difficulties can arise: access, human and material resources, cost, and time.
Reference information should refer to a date close to that of the data used in map production,
avoiding the influence of landscape change (Congalton & Green 1999), and should also be independent
from data used in the training process (Hammond & Verbyla 1996; Stehman 1999).
Sampling units are the fragments of the classified map that have a probability of being selected,
and their choice is affected by map goals, map scale, resources, and reference information. Congalton
(1988) lists four options: simple pixel, cluster of pixels, simple polygons and cluster of polygons.
Aranoff (1985) stated that the sampling unit must have at least the area of the minimum cartographic
unit. Aranoff (1989) recommends the use of a simple pixel, because with a higher level of detail we
can increase accuracy but also the occurrence of errors. Janssen & van der Wel (1994) recommend
the pixel if it is used in classification and the use of polygons when visits to the field are difficult.
Congalton (1988) prefers the cluster of pixels due to its easier identification in reference data.
Biging et al. (1998) report that maps based on polygons and maps based on pixels require different
statistical methods of validation. A map of polygons validated using pixels as sampling unit usually
1
Equation to calculate the number of samples to be collected using the binomial formulation (Cochran, 1977):
$n = \hat{p} \cdot \hat{q} \cdot z_{\alpha/2}^{2} / d^{2}$, where $\hat{p}$ is an a priori estimate of the proportion of concordance, $\hat{q} = 1 - \hat{p}$, $d$ is the desired absolute
accuracy ($\operatorname{var}(p)$), $1 - \alpha$ is the confidence level of $\hat{p}$ and $z_{\alpha/2}$ is the $\alpha/2$ percentile of the standardized normal
distribution.
2
Equation to calculate the sample size $n_i$ for each class $i$, with $i = 1, \ldots, k$, regarding an absolute accuracy $d_i$
(Congalton & Green, 1999): $n_i = \hat{p}_i \cdot \hat{q}_i \cdot \chi^{2}_{(1,\,1-(\alpha/k))} / d_i^{2}$. The total number of samples to be taken is the
maximum of $n_i$, or $n = \max_i \{n_i\}$. The number of samples for each class is $n/k$.
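The two sample-size equations in the footnotes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and all input values are hypothetical, results are rounded up to whole samples, and the chi-square quantile with one degree of freedom is obtained from the standard normal quantile through the identity $\chi^{2}_{(1,\,1-a)} = z_{a/2}^{2}$ so that only the standard library is needed.

```python
import math
from statistics import NormalDist


def binomial_sample_size(p_hat, d, alpha=0.05):
    """n = p*q*z^2 / d^2, with z the upper alpha/2 normal quantile."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(p_hat * (1 - p_hat) * z ** 2 / d ** 2)


def multinomial_sample_size(classes, alpha=0.05):
    """classes: list of (p_i, d_i) pairs, one per class (hypothetical inputs).

    Returns (n, n_per_class) with n = max_i n_i and n_per_class = n / k.
    """
    k = len(classes)
    # Chi-square(1 df) quantile at 1 - alpha/k, via the squared normal quantile.
    chi2 = NormalDist().inv_cdf(1 - alpha / (2 * k)) ** 2
    n = max(math.ceil(p * (1 - p) * chi2 / d ** 2) for p, d in classes)
    return n, n / k


# Hypothetical inputs: 85% expected agreement, 5% absolute accuracy.
n_binomial = binomial_sample_size(p_hat=0.85, d=0.05)

# Hypothetical three-class case; the class with p closest to 0.5 drives n.
n_total, n_per_class = multinomial_sample_size(
    [(0.5, 0.05), (0.3, 0.05), (0.2, 0.05)]
)
print(n_binomial, n_total, n_per_class)
```

Because $\alpha$ is divided by the number of classes $k$, the multinomial formulation asks for more samples than the binomial one at the same $d$.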
3 METHODOLOGY
The methodology developed for accuracy assessment of the Portuguese CLC2000 was based on the
comparison of the final map with the “ground truth” for selected sample units, from which an error
matrix was computed. Accuracy indices were then derived from this matrix. The validation method
(Table 1) was designed so that the accuracy indices obtained from the samples could be extrapolated to
the whole territory with a 95% confidence level.
Reference data (used to derive ground truth): Orthophotos 1:5 000 from INGA for year 2000
Sampling unit: Map polygon
Sampling design: Unaligned systematic cluster sampling
Number of clusters: 144
Sampling intensity: 1.5%
Accuracy assessment indices: Overall accuracy index and user’s and producer’s accuracies
The reference information used in the validation process was orthorectified aerial photography
(i.e., orthophotos). The choice was based on the availability of aerial coverage of Portugal for the
same year as the satellite imagery used to produce the CLC2000 map. Because of the high cost of aerial
photography, it was decided to use as few photographs as possible and to use each photo in its
entirety. The orthophotos are 4 km × 2.5 km in size, equivalent to
10 km² (1000 ha), and their distribution over the country is framed by the CLC map grid.
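To make the idea of unaligned systematic selection of photo-sized clusters concrete, the sketch below picks one frame at a random position inside each block of a regular grid of frames. It is a generic textbook variant and only an assumption about how such a design can be drawn; the function name, the grid dimensions and the block size are hypothetical and were chosen merely so that one frame per block gives 144 clusters.

```python
import random


def unaligned_systematic_sample(n_cols, n_rows, block, seed=0):
    """Select one frame (col, row) at random inside each block x block cell.

    The grid of frames is split into regular blocks (the systematic part);
    the position inside each block is drawn independently (the unaligned part).
    """
    rng = random.Random(seed)
    selected = []
    for block_row in range(0, n_rows, block):
        for block_col in range(0, n_cols, block):
            # Clip the block at the grid edge before drawing a position.
            col = rng.randrange(block_col, min(block_col + block, n_cols))
            row = rng.randrange(block_row, min(block_row + block, n_rows))
            selected.append((col, row))
    return selected


# Hypothetical 96 x 96 grid of frames split into 8 x 8 blocks.
frames = unaligned_systematic_sample(n_cols=96, n_rows=96, block=8)
fraction = len(frames) / (96 * 96)
print(len(frames), fraction)
```

Every block contributes exactly one frame, so the sample stays spread over the whole grid while avoiding the strictly regular pattern of aligned systematic sampling.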
3
KHAT means K hat, estimated kappa, K̂ (Treitz et al., 1992).
4 RESULTS
The results of the validation process (accuracy indices) for the Portuguese CLC2000 show a rather
good overall accuracy at each of the three levels of the CLC nomenclature (Table 2).
Table 2. Overall accuracy indices of CLC2000 at the three levels of the
nomenclature.
[Figure 1 plot: two panels, UA (%) and PA (%), each with a vertical axis from 0 to 1, for classes 112, 221, 222, 223, 241, 242, 243, 244, 311, 312, 313, 320, 324 and 333.]
Figure 1. Confidence interval of the User’s Accuracy (UA) and Producer’s accuracy (PA) for level 3
classes of the CLC2000 Portugal. The blue line indicates the 85% value.
An analysis of Fig. 1 indicates that the amplitude of the confidence interval varies with land
cover class. While there are some classes with a narrow interval (e.g., 112, 211, 311) there are
others with a rather large interval (e.g., 241, 312). The large confidence intervals are an indicator
of the heterogeneity of the accuracy in the different sampling units. Regarding the user’s accuracy,
the only classes with a value lower than 85% are 241 and 312. Regarding the producer’s accuracy,
the only classes with a value below 85% are 223, 313 and 324. These results confirm the good
quality of the CLC2000 map for Portugal.
Table 3. Error matrix for the samples selected for accuracy assessment of the CLC2000, at level 3.
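Rows: CLC2000 (mapped class). Columns: CLC-REF (reference class). UA: user’s accuracy; PA: producer’s accuracy.
CLC2000 111 112 121 122 123 124 131 132 133 141 142 210 213 221 222 223 241 242 243 244 311 312 313 320 324 331 332 333 334 411 421 422 423 511 512 521 522 523 Total UA (%)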
111 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
112 53 1540 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1593 96.7
121 0 0 444 0 0 0 0 0 27 0 0 0 0 23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 494 89.9
122 0 0 0 31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 100
123 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
124 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
131 0 0 0 0 0 0 193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 193 100
132 0 0 0 0 0 0 0 35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 100
133 0 0 0 0 0 0 0 0 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 100
141 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
142 0 0 0 0 0 0 0 0 0 0 178 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 178 100
210 0 0 0 0 0 0 0 0 0 0 0 18559 26 9 0 182 0 283 83 1125 19 0 0 47 889 0 0 0 0 0 0 0 0 0 0 0 0 0 21223 87.4
213 0 0 0 0 0 0 0 0 0 0 0 20 224 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 244 91.8
221 0 0 0 0 0 0 0 0 0 0 0 96 0 4040 0 41 6 78 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4269 94.6
222 0 0 0 0 0 0 0 0 0 0 0 0 0 0 353 108 0 0 0 0 108 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 569 62.0
223 0 28 0 0 0 0 0 0 0 0 0 0 0 0 0 3372 0 0 0 11 8 0 0 0 25 0 0 0 0 0 0 0 0 0 0 0 0 0 3445 97.9
241 0 0 0 0 0 0 0 0 0 0 0 101 0 0 0 721 3758 509 102 79 0 0 0 46 66 0 0 0 0 0 0 0 0 0 0 0 0 0 5381 69.8
242 0 145 0 0 0 0 0 0 0 0 0 287 0 109 37 339 722 6418 117 123 12 0 0 20 19 0 0 0 0 0 0 0 0 0 0 0 0 0 8348 76.9
243 0 25 0 0 0 0 0 0 0 0 0 447 0 60 97 32 90 261 7262 68 136 0 32 22 390 0 0 10 0 0 0 0 0 0 0 0 0 0 8934 81.3
244 0 0 0 0 0 0 0 0 0 0 0 106 0 0 0 0 0 0 0 5099 153 0 0 0 365 0 0 0 0 0 0 0 0 0 0 0 0 0 5724 89.1
311 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 7 0 71 17246 179 376 44 493 0 0 0 0 0 0 0 0 0 0 0 0 0 18423 93.6
312 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 1 4 0 863 5703 1753 0 1106 0 0 0 0 0 0 0 0 0 0 0 0 0 9438 60.4
313 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 708 762 6230 0 776 0 0 0 0 0 0 0 0 0 0 0 0 0 8487 73.4
320 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 6 11 167 6 0 0 8 7813 2011 31 0 106 0 0 0 0 0 0 0 0 0 0 10162 76.9
324 0 0 44 0 0 0 0 0 0 0 0 35 0 3 0 0 0 12 5 74 1603 167 576 492 13765 0 0 0 0 0 0 0 0 0 0 0 0 0 16777 82.0
331 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 176 0 0 0 0 1 0 0 0 0 0 0 0 177 99.4
332 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 152 0 0 0 0 0 0 0 0 0 0 0 152 100
333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 391 134 0 0 1457 0 0 0 0 0 0 0 0 0 0 1988 73.3
334 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 207 0 0 0 0 0 0 0 0 0 221 93.7
411 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -
421 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1064 0 0 0 0 0 0 0 1064 10 0
422 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 519 0 0 0 0 0 0 522 99.4
423 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 116 0 0 0 0 0 116 100
511 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 99 0 0 0 0 99 100
512 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 237 0 0 0 237 100
521 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 0 0 51 100
522 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 54 0 0 0 0 1148 0 1202 95.5
523 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 21 100
Total 53 1738 488 31 0 0 193 35 58 0 179 19659 251 4255 491 4795 4582 7592 7748 6655 20856 6816 8989 8875 20040 207 152 1572 207 0 1065 574 116 99 237 51 1148 21 129830
PA (%) 0.0 88.6 91.0 100 - - 100 100 55.2 - 99.4 94.4 89.2 94.9 71.9 70.3 82.0 84.5 93.7 76.6 82.7 83.7 69.3 88.0 68.7 85.0 100 92.7 100 - 99.9 90.4 100 100 100 100 100 100 107542
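As a worked check of how the indices follow from the error matrix above, the arithmetic below uses only figures printed in the table: the grand total (129830), the final entry of the PA row (107542, read here as the sum of the main diagonal, which is an interpretation), and the row total, column total and diagonal entry of class 112. The symbols OA, UA and PA are shorthand for overall, user’s and producer’s accuracy.

```latex
\begin{align*}
\mathrm{OA} &= \frac{\text{sum of diagonal}}{\text{grand total}}
             = \frac{107542}{129830} \approx 0.828 = 82.8\% \\[4pt]
\mathrm{UA}_{112} &= \frac{\text{diagonal}_{112}}{\text{row total}_{112}}
             = \frac{1540}{1593} \approx 0.967 = 96.7\% \\[4pt]
\mathrm{PA}_{112} &= \frac{\text{diagonal}_{112}}{\text{column total}_{112}}
             = \frac{1540}{1738} \approx 0.886 = 88.6\%
\end{align*}
```

These values agree with the overall accuracy reported in the abstract and with the UA and PA entries printed for class 112.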
5 CONCLUSIONS
A rigorous thematic accuracy assessment procedure was developed to validate the Portuguese
CLC2000 database, rooted in a statistically sound method. It can be stated with a 95% confidence
level that this land cover map has rather high overall accuracy indices at all levels of the CLC
nomenclature, meeting the accuracy requirements set for the CLC Project. Regarding specific
accuracy indices at level 3 of CLC, it can be stated with a 95% confidence level that only classes
223, 241, 312, 313 and 324 have values below 85%.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the contributions to this validation effort made by Pedro
Marrecas (validation photo-interpreter), and by Hugo Carrão and Vasco Nunes (GIS operations).
REFERENCES
Aranoff, S. 1982. The map accuracy report: A user’s view. Photogrammetric Engineering & Remote Sensing,
48: 1039-1312.
Aranoff, S. 1985. The minimum accuracy value as an index of classification accuracy. Photogrammetric
Engineering & Remote Sensing, 51: 99-111.
Biging, G., Colby, D. E. & Congalton, R. 1998. Sampling systems for change detection accuracy assessment.
Remote sensing change detection, environmental monitoring methods and applications. Ed. Lunetta, R.
& Elvidge, C., Ann Arbor Press, Chelsea – Michigan, USA.
Cochran, W. 1977. Sampling techniques, 3rd edition. John Wiley & Sons, Inc., New York, USA.
Congalton, R. 1983. A quantitative method to test for consistency and correctness in photointerpretation.
Photogrammetric Engineering & Remote Sensing, 49: 69-74.
Congalton, R. 1988. A comparison of sampling schemes used in generating error matrices for assessing the
accuracy of maps generated from remotely sensed data. Photogrammetric Engineering & Remote Sensing,
54: 593-600.
Congalton, R. 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote
Sensing of Environment, 37: 35-46.
Congalton, R. & Biging, G. 1992. A pilot study ground reference data collection efforts for use in forest
inventory. Photogrammetric Engineering & Remote Sensing, 58: 1669-1671.
Congalton, R. & Green, K. 1999. Assessing the accuracy of remotely sensed data: principles and practices.
CRC Press, Danvers, USA.
EEA 2002. CORINE Land Cover update, I&CLC2000 project, Technical Guidelines.
Ginevan, M. 1979. Testing land-use map accuracy: another look. Photogrammetric Engineering & Remote
Sensing, 45: 1371-1377.
Gopal, S. & Woodcock, C. 1994. Theory and methods for accuracy assessment of thematic maps using fuzzy
sets. Photogrammetric Engineering & Remote Sensing, 60: 181-188.
Hammond, T. & Verbyla, D. 1996. Optimistic bias in classification accuracy assessment. International
Journal of Remote Sensing, 17: 1261-1266.
Hay, A. 1979. Sampling designs to test land-use map accuracy. Photogrammetric Engineering & Remote
Sensing, 45: 529-533.
Hord, R. & Brooner, W. 1976. Land-use map accuracy criteria. Photogrammetric Engineering & Remote
Sensing, 42: 671-677.
Instituto do Ambiente 2005. CORINE Land Cover 2000 Portugal. Technical Report.
Janssen, L. & van der Wel, F. 1994. Accuracy assessment of satellite derived land cover data: a review.
Photogrammetric Engineering & Remote Sensing, 60: 419-426.
Ma, Z. & Redmond, R. 1995. Tau coefficients for accuracy assessment of classification of remote sensing
data. Photogrammetric Engineering & Remote Sensing, 61: 435-439.
Maucha, G. & Buttner, G. 2005. Validation of the European CORINE Land Cover 2000 database. In this
book.
Perdigão, V. & Annoni, A. 1997. Technical and methodological guide for updating the CORINE Land Cover
database, JRC/EEA.