Inspire DQ&MD Krakow 2010-06-22 Minutes
Infrastructures Unit
Participants:
1) Representatives of EU/EFTA countries nominated by the INSPIRE Contact points:
Countries: Austria, Belgium, Bulgaria, Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Latvia, Norway, Poland, Romania, Slovakia, Slovenia, Spain, Sweden, UK
Names: Georg Topf, Geraldine Nolf, Lilyana Turnalieva, Tomas Cajtham, Dorthe Drauschke, Aaro Mikkola, Gilles Troispoux, Sebastian Schmitz, Eleni Grigoriou, Tamás Palya, Saulius Urbanas, Kåre Kyrkjeeide, Marcin Grudzień, Daniela Docan, Martin Koška, Irena Ažman, Celia Sevilla-Sánchez, Dolors Barrot-Feixat, Christina Wasström, Dan Haigh
Organisations (as listed): BEV; AGIV - Agentschap voor Geografische Informatie Vlaanderen; Czech Office for Surveying, Mapping and Cadastre; National Land Survey; CERTU / Pôle Géomatique du Ministère; Institute of Geodesy, Cartography and Remote Sensing; Statkart Norway; Main Geodetic and Cartographic Documentation Centre; Directorate of Geodesy and Cartography, Slovak Environmental Agency; Instituto Geográfico Nacional de España (IGN-E); NSDI Co-ordination Unit, Lantmäteriet, Sweden; Environment Agency (England & Wales)
2) General audience from the INSPIRE conference (European Commission, UN FAO, EuroGeographics, Belgium, Czech Republic, France, Finland, Greece, Ireland, Italy, Latvia, Netherlands, Poland, Sweden, United Kingdom; around 60 persons)
3) Staff of the European Commission (INSPIRE Data Specification Support Team):
Katalin Tóth and Robert Tomas (workshop organisers)
Vanda Nunes de Lima (INSPIRE Data Specification contact point)
Agenda
Welcome, objectives of the workshop - R. Tomas
Tour de table (introduction of the participants) - All
Data Quality in INSPIRE: from Requirements to Metadata - K. Tóth
Analysis of the Member States' responses (DQ questionnaire) - R. Tomas
Status of ISO 19157 project - Johan Esko
Discussion - All
Conclusions, way forward - All
Sweden
1) The answers sent to the Commission were compiled from 3-4 replies. A face-to-face meeting is planned for September.
2) Discussion paper: the metadata part was somewhat more difficult to understand.
3) DQ requirements: start the specification process and justify DQ requirements with use cases.
5) MD: the importance is obvious; however, focus on the most important metadata. Each data theme has to select and discuss which metadata elements are important.
6) Lineage: better if structured.
8) Positional accuracy is not the most important element; sometimes data with good positional accuracy do not meet the expectations of users.
Spain
1) No, only input from Catalonia has been received so far, but more agencies will be involved in the face-to-face discussions in September.
3) Use cases are important. The description of high-level use cases should come from the interoperability target specifications; the data providers do not know the users.
5) Metadata: the Spanish profile includes more DQ elements. These elements are mainly targeted at expert users. Other users can be better informed by informal fitness-for-use descriptions.
6) Lineage is also a language issue: who is going to provide translations from Spanish for other European users?
7) The question about the conformance levels is not clear; put it in context with INSPIRE. We are in favour of introducing more conformity levels.
8) Positional accuracy is important, but should be related to scale.
Slovenia
1) No, only a few stakeholders have been consulted. The situation differs greatly from one data provider to another. Face-to-face discussion is foreseen by the end of the year.
2) The discussion paper is a good start; however, some questions were difficult to understand.
3) The use-case approach is important at international level. Within a country, data providers usually know who their users are and what their requirements are.
5) The ISO standards contain about 400 metadata elements, which is too many. The best way to learn what the data is good for is to call the data provider.
6) Lineage is important and should be structured for each data set. Each INSPIRE data specification should provide a template for that.
7) Conformance: more levels are needed, together with a structure for describing the reason for non-conformance.
8) Positional accuracy is not so important; it depends mainly on the scale and the nature of the data set.
Poland
1) No. Concerning the content of the discussion, Poland is in a learning stage. Discussion will be organised later in the year.
2) The discussion paper is a good start. Question 4 is not clear; add examples to better understand the context.
3) Use cases are a good idea and they may be important for potential data users working at international level. DQ recommendations can be inserted in the interoperability target specifications when justified by the high-level use cases of the infrastructure. With time (with more experience gained by NMCAs), recommendations could be changed to requirements.
4) We have to follow realistic objectives in data transformation. At the beginning we should focus on the quality of the process and of the data received after the transformation. Extension of metadata requirements should come later. Probably only further iterations can be based on formal quality inspections.
5) MD: the more the better. Consequently, users would like to have both a conformance statement and reported DQ values in metadata. However, please keep in mind that producing good metadata is resource-consuming, and the metadata elements that describe data quality are among the most resource-consuming. Poland will probably not be able to provide all optional DQ elements specified in INSPIRE during the first iteration of INSPIRE data set production.
6) Lineage: more standardisation might be helpful, but lineage will never fully replace other DQ elements.
7) More conformance levels are generally a good idea. However, each conformance level should be accompanied by a standardised quality evaluation procedure, whose purpose is to validate the dataset against the requirements of that conformance level.
8) Positional accuracy is important, but there are other important data quality measures which say something about the comparability of data sets.
Norway
1) No discussion has taken place due to lack of time, but it is planned.
Netherlands
1) Lack of time; further discussions will be organised.
2) The Dutch SDI approach: criteria for selecting data to be included in INSPIRE are published. This is the basis for the content (availability of features and attributes), quality discussion, frequency of update, scale, and the selection of regional or local data sets. Not all data sets are available for INSPIRE, but those provided are the best available.
Germany
1) Due to lack of time the discussion process has not been started. It can be predicted that it will be very hard to define a common position for Germany: there are many Länder (federal states) and many data providers.
2) DQ requirements should also be examined from the domain point of view.
Finland
1) No, the discussion has not taken place.
2) The paper was clear. If possible, add examples for conformance. Even though the question was clear, it is very difficult to answer what target values are appropriate.
3) Use cases are important; we support this approach.
4) Data transformation: quality control of the metadata production is also necessary.
5) The current demand in INSPIRE is not very accurate; perhaps new MD elements are needed. Use the experience of data providers and EuroGeographics.
7) Conformance: more conformance levels create more confusion; nevertheless, the two effective values are insufficient.
8) Positional accuracy depends on the scale, but give some limits of acceptable quality.
France
1) The discussion process is ongoing between users and producers.
2) The discussion paper is clear and no additional questions are needed.
3) Yes, a priori data quality requirements have to be introduced. Use cases are very important to determine criteria on data. Go further with fitness for use, which is especially important for thematic data.
4) Transformation applies to metadata, too. It is more difficult for data transformation. Lineage is certainly a solution.
5) Actual values of a priori data quality requirements shall be reported when affordable. This is the best option, but often too expensive for actual datasets. Conformance statements do not seem to be very useful and easy to obtain. A solution could be for providers to declare their datasets as following defined quality classes. Additional metadata is needed; we must work together to find new metadata around the external concept of fitness for use. This new way of approaching the problem must completely shift the perception from "how can I measure the quality of my data and let the user know" towards "what does the user need in terms of quality information, and how can I provide what they need to avoid data misuse". This approach must create a more direct link between the datasets and their uses, between the concerns of data producers and the expectations of mass users of spatial data. It could be really difficult, because researchers are still working hard to find a simple method to describe external quality.
6) Lineage is valuable to users. It gives a lot of information about the life of the data set, but can give only an idea of DQ. It does not replace any DQ element. When we look at different catalogues, the contents of this metadata element vary widely and are sometimes irrelevant and of no interest. This subject is complex; common reflection is needed on how to guide the writer of this metadata and how to describe the history of the data. Generally, it is necessary to avoid free text. From experience, open metadata is too poorly drafted and difficult to use.
7) Conformance may be a partial but cheap solution for fitness for use. However, more conformance levels can create confusion; it seems practical to limit conformance levels to five. If the data set is correctly described with good metadata, the user must be able to understand whether the data set is conformant, with no need for describing the non-conformance separately.
8) Positional accuracy is important; propose quality classes, following NATO's STANAG.
Denmark
1) The National INSPIRE advisory board will take up the discussion in the autumn.
Czech Republic
1) The first discussion forum was organised with the involvement of 2 organisations. The plan is to continue in September with other bodies.
2) Make terms clearer (e.g. target and scope) and give definitions of the non-standard quality terms.
3) User requirements have to be revised to determine DQ requirements.
5) Be aware that other tools might be required for consuming metadata. For example, metadata acquired during production processes could be generated through automation tools. The outputs should be standardised: it is not necessary to use only metadata; it is also possible to use viewing services, reports or other outputs.
6) It seems that lineage is misused in INSPIRE guidelines; it is not a black hole. One way is to change the current usage of lineage to another way of reporting in MD/DQ elements.
7) Conformance should be demonstrated by certification and accreditation.
8) Positional accuracy is of prime interest only for a couple of use cases. In the Czech Republic positional accuracy is calculated only for points, and they may or may not be aggregated for lines and polygons.
Belgium
1) No face-to-face discussion has taken place, especially not at the federated level.
2) The paper was clear, but there was too little time to review it properly.
3) Use cases, put into real practice of metadata, are a good start and serve for justifying DQ requirements. Nevertheless, keep in mind that any data is better than no data.
4) Data transformation: there is a need for objective measurement of data quality (need for certification methods/tools) at EU level.
5) Metadata: more is certainly not better. In Belgium more elements are used than in INSPIRE, each conformant to EN ISO 19115. However, experience shows that low-end users don't read metadata, especially not the quality metadata elements. They want to know if the data will fulfil their needs. In this respect the minimum metadata elements are abstract, scale, usage & constraints, and lineage. Producing MD quality elements should not be the goal; the aim has to be improving the quality of the data itself.
6) Lineage: each update brings a new lineage process step (free-text description), but it is only for history; don't misuse it!
7) Conformance is enough as it is now.
General comment: Find the best balance between necessity, inevitability, nice-to-have and overkill.
Austria
1) The discussion has taken place in the form of commenting on the discussion paper, but no face-to-face meeting has been organised. No further discussion is planned for this year.
2) The discussion paper was a very good summary of the state of the art.
3) Before introducing a priori data quality requirements in the interoperability specifications, the scope of INSPIRE needs to be defined.
5) Evaluate the need for new metadata first; "the more the better" is not adequate at this stage and will not help the user. But MD elements like CRS, encoding and fitness for use (scope) should be included in the thinking process.
6) Lineage: it should be added how quality deteriorates in the course of transformations.
7) What should be the benefit of detailed conformance levels? At the moment (MD Regulation) a dataset can be conformant, not conformant or not evaluated. Additional conformance steps do not seem to be appropriate. What conformance level should it be if a data set meets 80% of the requirements, and how useful is this information?
8) Positional accuracy can indicate the comparability of data sets, but it does not, for example, provide information about acquisition density and parameters. If positional accuracy is fixed as an a priori DQ requirement at a very high level, it can cause many data sets to fail this requirement. Inaccurate data are still better than no data.
General comment: Please do not force the Member States to implement additional data quality requirements and MD quality requirements at the moment. Data quality requirements are important, for sure, but the implementation of INSPIRE (data and services) should have priority.
Peter Semrad (JRC, Institute of Energy, Petten)
INSPIRE is not a general benchmark for data quality. The distinction between reference and thematic data is very important, as the former can carry attributes for many other themes; consequently the DQ requirements for Annex I should be higher. Positional accuracy is not so important for many themes; for example, in gas pipeline modelling the connectivity is much more important. When talking about quality, look at other aspects too.

Antti Jakobsson (EuroGeographics)
INSPIRE is getting closer and closer to this topic, which is great! The current tools in the specifications are heading in the right direction; the ESDIN project has tested them, but found some errors. The ESDIN project offers to share its results in a specific guideline. Create a platform to channel the user requirements into the process (ESTAT, EEA, GMES). INSPIRE has to introduce some minimal rules, such as minimal logical consistency. EuroRegionalMap provides a good example of how to deal with data quality and metadata when data come from disparate sources. It would be worth doing a small study on it.

Marc Leobet (INSPIRE National Contact Point, France)
It is not appropriate to be guided by use cases. The absence of quality might be costly in the near future; set realistic but ambitious targets, especially for the positional accuracy of reference data. Users use data however they want, but we have to inform them sufficiently about what the data is good for.
Conclusions:
All participants agreed that data quality in INSPIRE is an important issue and thus JRC should continue facilitating the process of finding a common position among Member States.
Due to lack of time, proper discussions in the Member States have not taken place; however, based on the outcome of the workshop (explanations of the terminology etc.), participants stated that the national discussions will take place by the end of November 2010.
It was agreed that, based on the national discussions, the Member States that have not yet sent their answers would provide them by November 2010 (and updated versions of already sent MS responses).
The JRC INSPIRE team agreed to provide the minutes of the workshop and an updated version of the discussion paper (adding the examples).
The need for a new Data Quality workshop was raised; it could take place at the beginning of next year (February 2011) to discuss the results of the Data Quality discussions in the Member States.
Outline
1. Introduction
2. The roots of the problem
3. Data quality and metadata in data production
4. Data quality and metadata in spatial data infrastructures
5. What has been done in INSPIRE so far?
6. Why do we have to go on?
7. What do we plan to do?
The wide variety of data quality elements and measures sometimes confuses rather than helps (even specification development experts).
Conformity statements based on self-declaration raise too many question marks:
  How reliable are they?
  Can they replace metadata?
  Are they useful for users who do not know the content against which the conformity statement is issued?
  Different viewpoints (against specifications or against usability?)
DQ in data production
Ex ante / Ex post
Stages: Specification development; Data & metadata production; Quality assurance; Data utilisation

Careful analysis of the use case of a specific user
The requirements derived from the use case contain DQ requirements too
The data product specification prescribes target values for selected DQ elements

Targets at achieving/fulfilling the data specification elements, comprising those on DQ
Contains DQ inspections based on appropriate sampling against the criteria of the data specification
Results:
  Pass/fail decisions
  Conformance declarations

The results of DQ inspection are published as metadata for evaluation and use
May contain additional metadata to help the users better understand the data:
  Selected data specification elements (topic categories, scope, purpose, representation type, identification information, geographical description, encoding)
  Other specific metadata elements (lineage)
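
The quality assurance step above (inspection by sampling against a target value, leading to a pass/fail decision and a conformance declaration) could be sketched as follows. This is only an illustration: the function name, the 5% maximum error rate, the sample size and the toy feature records are assumptions made for the example and are not taken from any INSPIRE data specification or ISO standard.

```python
import random

def inspect_by_sampling(features, is_erroneous, sample_size, max_error_rate, seed=0):
    """Draw a random sample and compare its error rate with a target value.

    All names and thresholds here are hypothetical, for illustration only.
    """
    if not features:
        raise ValueError("cannot inspect an empty data set")
    rng = random.Random(seed)  # fixed seed so the inspection is repeatable
    sample = rng.sample(features, min(sample_size, len(features)))
    errors = sum(1 for f in sample if is_erroneous(f))
    error_rate = errors / len(sample)
    passed = error_rate <= max_error_rate
    return {
        "sample_size": len(sample),
        "error_rate": error_rate,
        "result": "pass" if passed else "fail",
        "conformance": "conformant" if passed else "not conformant",
    }

# Toy data set: 1000 features, every 50th one lacks a mandatory attribute.
features = [
    {"id": i, "name": None if i % 50 == 0 else f"feature-{i}"}
    for i in range(1000)
]
report = inspect_by_sampling(
    features,
    is_erroneous=lambda f: f["name"] is None,
    sample_size=200,
    max_error_rate=0.05,  # assumed target value, not an INSPIRE figure
)
print(report)
```

The returned dictionary is the kind of result that would then be published as metadata for evaluation and use.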
ISO 19131 (Data Product Specifications) contains both DQ requirements and Metadata
Definition of the scope based on the high-level use cases that the infrastructure is supporting
Three levels of interoperability:
  Publish and share every existing data
  Establish interoperability
  Achieve full harmonisation

Targets at achieving/fulfilling the specification elements (comprising those on DQ) in the interoperability target specification
Contains DQ inspections based on appropriate sampling against the criteria of the interoperability specification to determine:
  how well the transformation works
  how the DQ changes in the process

The results of DQ inspection are published as metadata for evaluation and use
May contain additional metadata to help the users better understand the data:
  Selected data specification elements (topic categories, scope, purpose, representation type, identification information, geographical description, encoding)
  Other specific metadata elements (lineage)
Results:
  Pass/fail decisions
  Conformance declarations

Interoperability target specifications may follow the structure of ISO 19131. Usually they contain both DQ requirements and metadata.
Wide selection of data available
Interoperability problems in many applications
Users are generally unsatisfied with data quality

Put in DQ requirements when it is justified by the scope/typical use cases of the infrastructure

No interoperability obstacles
Only few datasets included
Low level of data sharing
A smaller group of users completely satisfied, while the rest may remain empty-handed
DQ requirements in INSPIRE
The Directive follows the interoperability approach: users should be able to combine spatial data from different sources in a consistent way without specific efforts of humans or machines.
Data shall be comparable in terms of logical and geometrical consistency, which implicitly sets requirements on data quality.
Requirements can be fulfilled by rigorous data modelling based on the Generic Conceptual Model and by direct DQ requirements.
While the data models included in the specifications provide a solid basis for providing data of appropriate quality in terms of logical consistency, other data quality requirements are addressed on a case-by-case basis.
Generic approach:
  Use recommendations rather than mandatory data quality requirements
  Fully document data quality as metadata
  When DQ requirements are introduced, they are also reported as metadata using the same DQ element
Natural diversity of the data themes: no common approach, and slightly different data quality requirements and metadata elements in the specifications despite attempts to harmonise them
Why do we have to return to this topic?
Applying (a priori) DQ requirements has been a recurring problem on which no real consensus has been achieved.
Agreement at the INSPIRE Committee meeting in December 2009: the Commission shall further investigate the question.
To initiate the exchange of views, a discussion paper has been prepared and distributed.
Implementing rules
Commission Regulation (1205/2008) as regards Metadata
Commission Decision (2009/4199) as regards monitoring and reporting
Draft Commission Regulation as regards Interoperability of spatial data sets and services (Annex I Spatial data themes)
Commission Regulation (268/2010/EC) as regards the access to spatial data sets and services
Article 6 ... Member States shall also make available, upon request, information for evaluation and use, on the mechanisms for collecting, processing, producing, quality control and obtaining access to the spatial data sets and services, where that additional information is available and it is reasonable to extract and deliver it.
Difficult to balance..
Questions
1. Is there a need to include a priori data quality targets (elements, measures, and values) in INSPIRE data specifications?
  Yes, for each dataset, addressing the same set of requirements
  Yes, but only for those datasets where achieving interoperability requires so (user requirements)
  No
2. Please indicate the theme and whether these targets should be addressed by mandatory requirements (M) or recommendations (R).
3. Please indicate the data quality elements, measures, and the target values to be used (add as many lines as needed). Please fill in a separate table for each data theme to which a priori DQ requirements / recommendations apply.
Statistics: 15 responses
  Yes, for each dataset: 3x
  Yes, but only for some..: 7x
  No: 5x
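
As a quick arithmetic check of the Question 1 statistics above, the shares can be computed from the three counts. The snippet is only a sketch; the dictionary keys are shortened labels chosen here for illustration (the second label is truncated on the slide itself).

```python
# Answer counts for Question 1 as reported above (15 responses in total).
answers = {
    "Yes, for each dataset": 3,
    "Yes, but only for some": 7,
    "No": 5,
}

total = sum(answers.values())
shares = {option: round(100 * count / total, 1) for option, count in answers.items()}

for option, share in shares.items():
    print(f"{option}: {answers[option]} of {total} ({share}%)")

# Respondents in favour of a priori DQ targets in some form.
in_favour = answers["Yes, for each dataset"] + answers["Yes, but only for some"]
print(f"In favour in some form: {in_favour} of {total}")
```

On these counts, two thirds of the respondents favoured a priori DQ targets in some form, while one third answered "No".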
General comments:
  Extra cost for data providers to implement
  The Directive does not require collection of new data
  Better to use DQ targets than requirements
  Nice to have, but difficult to implement (Annex I experience)
  Quality is unlikely to be consistent across national datasets (different sources)
  Spatial data is (has to be) produced by the public authorities only in the quality necessary for the specific use
Statistics: 10 responses
Mandatory:
  Annex I themes: Administrative Units 6x, Geographical Names 4x, Addresses 4x, Cadastral Parcels 6x, Hydrography 5x, Transport Networks 5x, Protected Sites 2x, Coordinate Reference Systems 1x
  Annex II & III themes: Elevation 5x, Orthoimagery 5x, Buildings 4x, Statistical Units 6x, Land Cover 5x, Utility and Governmental Services 2x
General comments:
  Majority of positive answers, but only based on clear user requirements
  Distinguish between reference data and thematic data
  Minimal DQ target on positional accuracy (AU, CP)
  DQ targets only as recommendations: only 2x
Statistics: 7 responses
  Not enough time to elaborate
  Should be proposed by domain experts
  Different DQ measures and values compared to Annex I data specifications
Questions
4. Do you recommend specifying mandatory metadata elements in INSPIRE when no a priori data quality requirements have been specified, or to complement those specified in the DQ section, to inform users about the fitness for purpose?
5. What is the best way to generate DQ metadata about the data that has been made conformant to the INSPIRE data specifications (i.e. after the necessary data transformations)?
  Keep the original metadata
  Generate new metadata based on calculations or quality inspection by appropriate sampling
  Keep the original metadata and describe the transformations performed, with their possible effect on data quality, as process steps in MD_lineage
6. Do you recommend introducing theme-specific conformity levels (in addition to conformant, non conformant, not evaluated) in the INSPIRE Annex II-III data specifications development?
7. Would there be value in adding theme-specific conformity levels (apart from conformant, non conformant, not evaluated) to INSPIRE Annex I data specifications?
General comments:
  Optional MD elements to report on the fitness for purpose should be introduced
  MD for interoperability should also be applied for Annex II & III
  A lowest common denominator within Europe will cause unbearable expenses
  It is more important to have technical characterisation and lineage than DQs
  The MD lineage should be structured (vs. bad use of the EN ISO 19115 MD element instead of DQs)
Statistics: 15 responses
  Keep the original MD and use the MD_lineage: 10x
  Generate new MD: 5x
General comments:
  Not ideal, but a practical approach (use of LI_ProcessStep)
  New MD based on quality inspection: a long-term target
  For Annex I reference data: new MD; for Annex II & III: MD_Lineage
  Additional MD has to be searchable via Discovery Services
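
The option favoured above (keep the original metadata and record each transformation as a lineage process step, together with its possible effect on data quality) could be represented in a structured way roughly as below. The class and field names are hypothetical and only loosely inspired by the idea of a lineage statement with process steps; they are not the ISO 19115 schema, and the dates and descriptions are invented example values.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProcessStep:
    # Hypothetical structure for one transformation applied to the data set.
    description: str
    date: str                   # ISO 8601 date string, e.g. "2010-06-01"
    dq_effect: str = "unknown"  # free-text note on the possible effect on DQ

@dataclass
class Lineage:
    statement: str
    process_steps: List[ProcessStep] = field(default_factory=list)

    def add_step(self, description, date, dq_effect="unknown"):
        self.process_steps.append(ProcessStep(description, date, dq_effect))

    def summary(self):
        lines = [self.statement]
        for i, step in enumerate(self.process_steps, start=1):
            lines.append(
                f"{i}. {step.date}: {step.description} (effect on DQ: {step.dq_effect})"
            )
        return "\n".join(lines)

# Invented example values, for illustration only.
lineage = Lineage("Original national metadata retained; transformations recorded below.")
lineage.add_step(
    "Coordinate transformation to ETRS89",
    "2010-06-01",
    "positional accuracy may degrade slightly",
)
lineage.add_step("Schema mapping to the target data model", "2010-06-02")
print(lineage.summary())
```

Keeping the effect on data quality as a separate field, rather than burying it in free text, is one way to address the repeated request that lineage be structured.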
General comments:
  Cannot be dealt with now (based on user requirements for Annex II & III)
  A consistent approach should be taken
  No dataset will be 100% conformant: how to record levels of non-conformity, or where it is not conformant?
General comments:
  More variants of "not conformant", explaining why (not complete or inadequate quality)
  Is the original (national) dataset conformant, or does it have to be transformed every time it is delivered (via services) to be conformant?
  No theme-specific conformity levels: complex to handle and hardly measurable
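
The comments above keep the three existing conformity values while asking how to record where a data set is not conformant. One possible sketch is given below: the overall declaration stays three-valued, and the unmet requirements are listed alongside it. The decision rule (any failed requirement gives "not conformant"; otherwise any untested requirement gives "not evaluated") and the requirement identifiers are assumptions made for this example, not rules from the Metadata Regulation.

```python
from enum import Enum

class Conformity(Enum):
    CONFORMANT = "conformant"
    NOT_CONFORMANT = "not conformant"
    NOT_EVALUATED = "not evaluated"

def declare(results):
    """Summarise per-requirement results.

    `results` maps a requirement id to True (met), False (not met)
    or None (not tested). The decision rule is an assumption.
    """
    values = list(results.values())
    if any(v is False for v in values):
        overall = Conformity.NOT_CONFORMANT
    elif not values or any(v is None for v in values):
        overall = Conformity.NOT_EVALUATED
    else:
        overall = Conformity.CONFORMANT
    tested = [v for v in values if v is not None]
    share_met = sum(tested) / len(tested) if tested else None
    failed = sorted(k for k, v in results.items() if v is False)
    return overall, share_met, failed

# Hypothetical data set that meets 4 of 5 requirements (80%).
results = {"req-1": True, "req-2": True, "req-3": True, "req-4": False, "req-5": True}
overall, share_met, failed = declare(results)
print(overall.value, share_met, failed)
```

In this sketch the data set meeting 80% of the requirements is still declared "not conformant", but the list of failed requirements records where it falls short without introducing additional conformity levels.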
http://inspire.jrc.ec.europa.eu