See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/306456793

Spatial data management

Conference Paper · August 2016

Geir-Harald Strand
Norwegian Institute of Bioeconomy Research

INTERACTION WITH USERS • SESSION C

Spatial data management

Geir-Harald Strand [1]
Survey and statistics division, NIBIO

A considerable amount of official statistics is based on spatially referenced primary
data. The potential of these spatial references is probably not fully utilized. A more
cognisant handling of the spatial aspect of statistics can boost the efficiency of data
collection, create new opportunities for data analysis and improve the communication of
statistics through more and better cartography. The key is to ensure that professional
spatial data management is included in the statistical information chain as described in
the GSBPM. This is partly a technological issue, but also an organizational challenge.

Key words: spatial data management, spatial information system

1. Introduction
Much of the official statistics produced today is based on primary data with direct or
indirect spatial reference. The reference can be an explicit coordinate, but is more
likely to be an address, a cadastral unit or an administrative region. References of this
kind allow the observations to be linked by location, providing for new approaches to
data collection, new opportunities for data analysis and more use of maps as a
visualization and communication tool. An example from the statistical community is the
increased use of geographical grids as a framework for spatial statistics (Strand &
Bloch 2009, Fujimoto et al. 2015).

Dedicated spatial data management through standardized and well-documented spatial
databases is the key to realizing the potential of the spatial data held by producers of
statistics. The primary data must remain inside the existing databases, but information
about the locations referenced by the primary data can be organized in dedicated spatial
data systems. The introduction of spatial data management for the statistics on land
resources (Strand 2013), agriculture and forestry (Tomter et al. 2010) in Norway has
fostered improvements throughout the entire information chain, ranging from data
collection through analysis to data delivery.

Spatially referenced data can be visualised as thematic maps, but also cross-linked by
location and analysed with respect to spatial aspects. The potential is probably not
fully utilized. This assertion can be illustrated using an example. Part of the
Norwegian economic statistics for agriculture is based on a detailed review of the
accounts from 910 farms. A possible application of these statistical data is to examine
the difference in the economic results of sheep farmers in areas with and without the
presence of large carnivores. The locations of the farms are, however, not recorded in
the material. Extracting subsets based on location is therefore no simple task.

[1] Survey and statistics division, NIBIO, Norway. Email: ghs@nibio.no

Statistics Sweden | scb.se/nsm2016 | nsm2016@scb.se

Fortunately, the cadastral property identification code for each farm has been recorded.
This is an asset, because the National Agricultural Administration (NAA) maintains a
database where the cadastral property identification code of every farm holding in
Norway is kept, together with key geographical data (including a representative point
location). The location of each farm in the survey is thus obtained by connecting the two
data sets and retrieving the locations from the NAA database. When this is done, the
survey data can be linked (by position) to a digital map of the management areas for
large carnivores. The latter data set is produced and maintained by the National
Environmental Administration and is available as part of the National Spatial Data
Infrastructure. Through this link, the survey farms can be classified and analysed
according to their location: inside or outside the management areas for large
carnivores.
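
The linkage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the farm identifiers, cadastral codes, coordinates and the single square management area are all hypothetical, and a standard ray-casting test stands in for whatever point-in-polygon routine a spatial database would provide.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_farms(
    survey: Dict[str, str],
    locations: Dict[str, Point],
    areas: List[Polygon],
) -> Dict[str, str]:
    """Join survey farms to point locations by cadastral code, then classify."""
    result = {}
    for farm_id, cadastral_code in survey.items():
        location = locations.get(cadastral_code)
        if location is None:
            result[farm_id] = "unknown"  # no match in the location database
        elif any(point_in_polygon(location, area) for area in areas):
            result[farm_id] = "inside"
        else:
            result[farm_id] = "outside"
    return result


# Hypothetical data: survey farm -> cadastral code, and code -> point location.
survey = {"farm_a": "code_1", "farm_b": "code_2", "farm_c": "code_3"}
naa_locations = {"code_1": (2.0, 2.0), "code_2": (8.0, 1.0)}
carnivore_areas = [[(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]]

classification = classify_farms(survey, naa_locations, carnivore_areas)
print(classification)
```

Farms whose cadastral code has no match are kept and flagged rather than dropped, so that the size of the unmatched group stays visible to the analyst.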

The example shows that there is a potential for a spatial application that is not yet
utilized. Such examples are probably abundant. Increased attention to management of the
spatial aspect of statistics is therefore expected to improve the efficiency of data
collection, provide new opportunities for data analysis and improve the communication of
statistics through more and better cartography.

2. Everything is somewhere
“Everything is somewhere” is the catchy title of a geography quiz book for children
(McClintock 1986). This assertion may not be quite true, but much of the primary data
used in official statistics does have a location and can be linked to a place. People
live somewhere and they work somewhere. Accidents happen somewhere. Goods are produced
somewhere and maybe sold somewhere else. They are consequently transported between those
places. Many statistics can be produced without any knowledge about these locations, but
more statistics can be created when the locations are known, possibly also contributing
new and valuable information (Goodchild 2007).

The spatial references can be direct or indirect. A direct spatial reference is an
explicit coordinate or set of coordinates. The direct spatial reference can be used to
place the observation on a map. The indirect spatial reference is a reference to another
object that possesses the required direct spatial reference. An address, a cadastral unit
or the identification code of an administrative (e.g. NUTSx) region can act as indirect
references. This information cannot, by itself, place an observation on a map, but the
coordinates of the referenced object can be used for this purpose.
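
The distinction between direct and indirect references can be made concrete with a small sketch. The class names, register names, keys and coordinates below are hypothetical and chosen only for illustration. The point is that an indirect reference needs a lookup in a second data set before the observation can be placed on a map.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

Coordinate = Tuple[float, float]


@dataclass(frozen=True)
class DirectReference:
    """An explicit coordinate attached to the observation itself."""
    coordinate: Coordinate


@dataclass(frozen=True)
class IndirectReference:
    """A key pointing to an object in another register that holds the coordinate."""
    register: str  # hypothetical register name, e.g. "address" or "region"
    key: str


SpatialReference = Union[DirectReference, IndirectReference]


def resolve(
    ref: SpatialReference,
    registers: Dict[str, Dict[str, Coordinate]],
) -> Optional[Coordinate]:
    """Return a coordinate for the reference, or None if it cannot be resolved."""
    if isinstance(ref, DirectReference):
        return ref.coordinate
    return registers.get(ref.register, {}).get(ref.key)


# Hypothetical registers mapping keys to representative coordinates.
registers = {
    "address": {"Example Street 1": (10.0, 60.0)},
    "region": {"REGION_A": (11.0, 61.0)},
}

direct = resolve(DirectReference((5.0, 58.0)), registers)
via_address = resolve(IndirectReference("address", "Example Street 1"), registers)
unresolved = resolve(IndirectReference("address", "Unknown Road 9"), registers)
```

An unresolved indirect reference returns `None` here, which mirrors the situation in which the supporting register is missing or out of date.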

Spatial references, direct or indirect, allow observations to be sorted, structured and
combined by location. This provides opportunities for new approaches to data collection,
new prospects for data analysis and more use of maps as a visualization and
communication tool. The prerequisite is obviously that spatial references are collected
and stored along with the data.

3. Spatial data in the GSBPM context
The process needed to produce official statistics is described in the Generic Statistical
Business Process Model (GSBPM). The eight phases of GSBPM are: 1) Specify needs;
2) Design; 3) Build; 4) Collect; 5) Process; 6) Analyse; 7) Disseminate; 8) Evaluate.
The process described by the GSBPM thus resembles a value-chain, perhaps more
appropriately described as an information-chain. The spatial aspects of the data
must be accounted for throughout the process, and in particular in the central part
represented by steps 4 to 7 (Figure 1).

Figure 1: GSBPM steps 4 – 7 with step 5 described as “Data management” instead of “Data processing”

Proper spatial references allow the process to collect data by linkage to other
databases, and to use spatial analysis and cartographic communication if needed. Spatial
references allow the data in a particular business process to interact, through spatial
linkages, with spatial data residing in other business processes. The multiple use of
the statistical system for farm accounts, described above, is an example of such
interaction. The prerequisite for interaction is that proper spatial references (direct
or indirect) are obtained in the data collection phase and that they are stored for
later use. Consequently, there is a strong need for (spatial) data management. This is
currently not included in the GSBPM (although it could be seen as a variant of phase 5:
Processing data).

Standardization together with good knowledge and understanding of the reference systems
is crucial to obtain proper and workable spatial references. Inadequate or faulty spatial
references can rarely be corrected later, at least not without incurring major
additional cost or loss of information. Too often, the potential of a data set is lost,
not because of missing spatial references but due to imperfect or undocumented reference
systems. The solution is to follow established national and international standards, in
close cooperation with competent national authorities.

Data management is at the core of a spatial data information chain. In principle, the
spatial reference is simply an additional characteristic of an observation. A database
containing spatial data is at first glance only an extension of any ordinary database
structure. Latitude and longitude (or any other spatial reference, e.g. NUTS code,
address, postal code or a grid identifier) can be handled as variable characteristics.
There is, however, an inherent spatial ordering implied in the reference system,
representing not only where observations are made, but also how they are related to each
other. This topological information (Egenhofer et al. 1989) must be maintained in the
database in order to make full use of the spatial information (Marceau et al. 2001),
which is why a dedicated spatial database system is needed for spatial data management.
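
A small sketch may help to show why a spatial reference is more than an ordinary variable. It assumes a hypothetical square grid in which a cell is identified by its (column, row) index. Stored as a plain label, the identifier only says where an observation is. Interpreted within the reference system, it also says which observations are adjacent. Adjacency is used here as one simple example of a topological relation. The sketch is not the topological data model of the cited literature.

```python
import math
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]


def cell_id(x: float, y: float, cell_size: float) -> Cell:
    """Grid cell (column, row) containing the coordinate (hypothetical scheme)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def neighbours(cell: Cell) -> Set[Cell]:
    """The eight cells that share an edge or a corner with the given cell."""
    col, row = cell
    return {
        (col + dc, row + dr)
        for dc in (-1, 0, 1)
        for dr in (-1, 0, 1)
        if (dc, dr) != (0, 0)
    }


# Hypothetical observations with projected coordinates in metres.
observations: Dict[str, Tuple[float, float]] = {
    "obs_1": (1500.0, 2500.0),
    "obs_2": (2100.0, 2900.0),
    "obs_3": (9000.0, 9000.0),
}

# Assign each observation to a 1 km grid cell.
cells = {name: cell_id(x, y, 1000.0) for name, (x, y) in observations.items()}

# The adjacency relation follows from the reference system, not from the labels.
obs_2_is_adjacent = cells["obs_2"] in neighbours(cells["obs_1"])
obs_3_is_adjacent = cells["obs_3"] in neighbours(cells["obs_1"])
```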

Data analysis (phase 6 of GSBPM) is where information is extracted from the database. A
database stocked with spatially referenced data allows the analyst to include the
spatial context and relationships in the analysis (e.g. Voss 2007). Survey data can also
be downscaled using small area estimation methodology (Strand & Aune-Lundberg 2012, Leyk
et al. 2013). Finally, the dissemination phase of the GSBPM is where the results are
conveyed to users. Spatial references enable the use of maps as part of the reporting.

Any information that could be drawn on a map (if the spatial reference were available)
is potentially spatial data. The spatial reference can be direct (by coordinate) or
indirect (by reference to another dataset containing coordinates). Clearly, it is
important to maintain access to key data supporting indirect spatial reference
(cadastres, address registers, NUTS data, grids and postal codes are but a few
examples).

4. Spatial data management by example
For statistical purposes, the Norwegian agricultural sector is a wilderness of
observation units. The cadastral system is built on continuous parcels of land where one
or more parcels can constitute a basic property unit. Each basic property unit has one
or more owners and no basic property unit can cross the border between two NUTS 5
regions. The cadastre is a spatial database managed by the National Mapping Authority
and maintained (online) by the local municipal authorities. Information about the basic
property units, including their location and geometry (boundary), can be retrieved from
the central database by way of remote data service requests (Strand 2001).

A basic property unit is a juridical entity, but does not correspond to the actual
farming unit, defined as an economic entity. A farm can, and frequently will, consist of
several basic property units. This is a changeable relationship, and currently not
represented in the cadastre. Instead, a centralized registry (the farm register) has
been established, connecting basic property units to the operational farm units. The
spatial references in the cadastre are explicit and direct, providing coordinates for
individual plots of land. The spatial references in the farm register are indirect,
using the basic property numbers as references. Any change in the cadastre, e.g. an
adjustment of the geometry of a parcel boundary, is thus immediately also reflected in
the registry. As a consequence of this organization, a particular application can
request the registry to return a list of all the basic property units belonging to a
particular farm unit. This list is, as a next step, used to request a longer list from
the cadastre, containing all the parcels for each basic property unit.
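
The two-step request can be sketched as follows. In the system described, both steps are remote service requests. Here, as a simplifying assumption, plain dictionaries stand in for the farm register and the cadastre, and all identifiers are hypothetical.

```python
from typing import Dict, List

# Hypothetical farm register: farm unit -> basic property units.
farm_register: Dict[str, List[str]] = {
    "farm_1": ["bpu_10", "bpu_11"],
}

# Hypothetical cadastre: basic property unit -> parcels.
cadastre: Dict[str, List[str]] = {
    "bpu_10": ["parcel_a", "parcel_b"],
    "bpu_11": ["parcel_c"],
    "bpu_99": ["parcel_z"],
}


def parcels_for_farm(
    farm_id: str,
    register: Dict[str, List[str]],
    cadastre_db: Dict[str, List[str]],
) -> List[str]:
    """Step 1: ask the register for property units. Step 2: ask the cadastre for parcels."""
    property_units = register.get(farm_id, [])
    parcels: List[str] = []
    for unit in property_units:
        parcels.extend(cadastre_db.get(unit, []))
    return parcels


farm_parcels = parcels_for_farm("farm_1", farm_register, cadastre)
```

Because the register stores only property unit identifiers and no geometry, a boundary adjustment in the cadastre needs no corresponding update in the register.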

The direct spatial reference for the farm unit consists of the combined geometries of
all these parcels retrieved from the cadastre, and can be used to calculate the area or
draw a map of the farm. Equally important, this geometry can be used to request
additional information about the farm from auxiliary databases. Examples are information
about environmentally protected locations or cultural heritage sites.
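
As an illustration of the area calculation, the sketch below sums the areas of a farm's parcels using the shoelace formula. It assumes that each parcel is a simple polygon without holes, that the parcels do not overlap, and that the coordinates are planar (projected) values in metres. The two parcels are hypothetical.

```python
from typing import List, Tuple

Polygon = List[Tuple[float, float]]


def polygon_area(polygon: Polygon) -> float:
    """Area of a simple polygon by the shoelace formula (planar coordinates)."""
    n = len(polygon)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical parcels belonging to one farm: a rectangle and a triangle.
parcels: List[Polygon] = [
    [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)],
    [(200.0, 0.0), (300.0, 0.0), (200.0, 100.0)],
]

# The farm area is the sum of the parcel areas, in square metres.
farm_area = sum(polygon_area(parcel) for parcel in parcels)
```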

5. Statistics for farms
The spatial data management described for the farm units and cadastral service provides
a basis for a broader system of farmland statistics. An example is the land resources
found on each farm. Land resource mapping for individual farms or parcels is a costly
and inefficient undertaking. It is easier to organize land resource mapping as a broad,
national wall-to-wall survey. This was done in Norway, starting in the 1960s. The
results were later digitized and the information is sustained by a continuous
maintenance program. The information is kept in a spatial database (accessible from
http://kilden.nibio.no) at the Survey and statistics division of NIBIO.

Figure 2: Combining cadastral units and land resource map units results in “atomic”
spatial elements (sometimes called Minimal Mapping Units), unique with respect to
cadastral as well as land resource information

The digital land resource “map” is a database representing a partition (in a
mathematical sense) of the land surface. Formally, the data structure is quite similar to
the parcel data held in the cadastre. Each unit in the database is an observation with an
explicit spatial reference (a geometry) and a set of attributes characterizing the area.
A national standardization program has assured that the reference geometry is compatible
with the cadastre, as well as with all other spatial information held by public
institutions in Norway. Consequently, although the actual shapes of the spatial units in
the cadastre and the land resource map are different, they can be combined by simple
geometrical operations in order to create “atomic” spatial units.

The combination of cadastral units and land resource map units results in “atomic”
spatial units. An “atomic” unit is unique with respect to its ancestors, in this case
the cadastral as well as the land resource information (Figure 2). Each “atomic” spatial
unit references a single cadastral unit as well as a single land resource unit. With
reference to the farm registry, as described above, it is now possible to assemble all
the atomic elements that belong to a particular farm unit and compute land resource
statistics (area by land resource class) for the farm unit. The example can of course be
extended to any combination of entities present in a common geographical space.
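
The construction of atomic units can be sketched with a deliberately simplified geometry. The example assumes that every cadastral unit and every land resource unit is an axis-aligned rectangle, which makes the intersection trivial to compute. Real parcels are arbitrary polygons and would require a general polygon overlay from a spatial database or GIS library. All identifiers and shapes are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


class AtomicUnit(NamedTuple):
    cadastral_id: str  # the single cadastral unit this piece belongs to
    land_unit_id: str  # the single land resource unit this piece belongs to
    shape: Rect


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None if they share no area."""
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin < xmax and ymin < ymax:
        return Rect(xmin, ymin, xmax, ymax)
    return None


def atomic_units(
    cadastral: Dict[str, Rect],
    land_resources: Dict[str, Rect],
) -> List[AtomicUnit]:
    """Intersect every cadastral unit with every land resource unit."""
    units = []
    for cadastral_id, cadastral_shape in cadastral.items():
        for land_unit_id, land_shape in land_resources.items():
            overlap = intersect(cadastral_shape, land_shape)
            if overlap is not None:
                units.append(AtomicUnit(cadastral_id, land_unit_id, overlap))
    return units


# Hypothetical input: two adjacent cadastral units and two land resource units.
cadastral = {"unit_1": Rect(0, 0, 10, 10), "unit_2": Rect(10, 0, 20, 10)}
land_resources = {"lr_1": Rect(0, 0, 15, 10), "lr_2": Rect(15, 0, 20, 10)}

units = atomic_units(cadastral, land_resources)
```

In this example `unit_2` is split into two atomic units because it straddles the boundary between `lr_1` and `lr_2`, while `unit_1` lies entirely within `lr_1` and yields a single atomic unit.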

The Norwegian system for farmland statistics uses this approach to combine spatially
referenced data from multiple sources, all using spatial data management and national
geospatial standards to maintain compatibility (Figure 3). Information is fetched
“on-the-fly” from several sources, combined as a preparation for analysis, and processed
statistically and cartographically. The resulting report, consisting of tabular and
cartographic output, is returned to the user.

The user initiates the farmland statistics by choosing a farm identification code. The
identification code is used in a request to the central farm register, which returns a
list of the basic property units that constitute the farm. This list is used in a
request to the national cadastre, which returns a list of land parcels, including
geometry. The extreme north, south, east and west coordinates are used to define a
bounding box around the farm.
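
The bounding box step is simple enough to state directly. The sketch below assumes that each parcel arrives as a list of (x, y) vertex coordinates. The function name and the example parcels are hypothetical.

```python
from typing import List, Tuple

Polygon = List[Tuple[float, float]]
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def bounding_box(parcels: List[Polygon]) -> BBox:
    """Smallest axis-aligned box that contains every vertex of every parcel."""
    xs = [x for parcel in parcels for x, _ in parcel]
    ys = [y for parcel in parcels for _, y in parcel]
    # min() and max() raise ValueError if the farm has no parcels at all.
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical parcels returned by the cadastre for one farm.
parcels: List[Polygon] = [
    [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)],
    [(200.0, 10.0), (300.0, 10.0), (200.0, 110.0)],
]

farm_bbox = bounding_box(parcels)
```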

Figure 3: The Norwegian system for farmland statistics combines data from multiple
sources in order to compile statistical information on-the-fly.

The bounding box is used in a request to the land resources database, which returns the
land resource units falling (at least partially) within the box. This is a potentially
time-consuming operation, and spatial indexing of the database is critical in order to
ensure a rapid response. The organization of the spatial database is thus also an
important aspect of the system.
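
To indicate why spatial indexing matters, the sketch below implements a uniform grid index over the bounding boxes of land resource units. This is an assumption made for brevity. The paper does not state which index structure the actual database uses, and production systems typically rely on the index built into the spatial database. The class, the identifiers and the coordinates are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Cell = Tuple[int, int]


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share at least a boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform grid index: each box is registered in every cell it touches."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[str]] = defaultdict(set)
        self.boxes: Dict[str, BBox] = {}

    def _cells_for(self, box: BBox) -> Iterator[Cell]:
        col0 = math.floor(box[0] / self.cell_size)
        col1 = math.floor(box[2] / self.cell_size)
        row0 = math.floor(box[1] / self.cell_size)
        row1 = math.floor(box[3] / self.cell_size)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                yield (col, row)

    def insert(self, unit_id: str, box: BBox) -> None:
        self.boxes[unit_id] = box
        for cell in self._cells_for(box):
            self.cells[cell].add(unit_id)

    def query(self, box: BBox) -> List[str]:
        """Units whose box overlaps the query box, checking only nearby cells."""
        candidates: Set[str] = set()
        for cell in self._cells_for(box):
            candidates |= self.cells.get(cell, set())
        return sorted(u for u in candidates if overlaps(self.boxes[u], box))


# Hypothetical land resource units, indexed by their bounding boxes.
index = GridIndex(cell_size=100.0)
index.insert("lr_1", (0.0, 0.0, 150.0, 80.0))
index.insert("lr_2", (140.0, 0.0, 400.0, 90.0))
index.insert("lr_3", (900.0, 900.0, 950.0, 950.0))

hits = index.query((100.0, 10.0, 200.0, 60.0))
```

The query inspects only the grid cells touched by the farm's bounding box, so distant units such as `lr_3` are never compared against it. The candidates still need an exact geometric test afterwards, since overlapping bounding boxes do not guarantee overlapping polygons.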

The parcel and land resource information is intersected, as described above, in order
to create atomic spatial elements. Those not belonging to the farm in question are
discarded and the rest are subject to an elementary aggregation and summary operation,
resulting in a statistical report of area per land resource class. Finally, cartographic
background information (topography, aerial photographs etc.) is fetched by remote
requests from other spatial databases (using the bounding box) and assembled into a
visual report consisting of a map and a statistical table (Figure 4). The whole
operation is done on-the-fly online, typically within 15-30 seconds. The system can be
accessed from http://gardskart.skogoglandskap.no/
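
The filtering and summary step can be sketched as a plain aggregation. The sketch assumes that the atomic units have already been created and that each carries the identifier of its basic property unit, its land resource class and its area. The field names, class labels and figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical atomic units, as produced by intersecting parcels and land resources.
atomic = [
    {"property_unit": "bpu_10", "land_class": "arable", "area": 12.5},
    {"property_unit": "bpu_10", "land_class": "forest", "area": 30.0},
    {"property_unit": "bpu_11", "land_class": "arable", "area": 7.5},
    # Falls inside the bounding box but belongs to a neighbouring farm.
    {"property_unit": "bpu_99", "land_class": "arable", "area": 50.0},
]


def area_by_class(
    atomic_units: List[dict],
    farm_property_units: Set[str],
) -> Dict[str, float]:
    """Discard units outside the farm, then sum the area per land resource class."""
    totals: Dict[str, float] = defaultdict(float)
    for unit in atomic_units:
        if unit["property_unit"] in farm_property_units:
            totals[unit["land_class"]] += unit["area"]
    return dict(totals)


report = area_by_class(atomic, {"bpu_10", "bpu_11"})
```

The discard step is needed because the bounding box request returns everything inside the box, including land belonging to neighbouring farms.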

The farmland statistics system is possible due to spatial data management. Basic
information is available and each topic is maintained by a particular institution,
without redundant copies that create uncertainty regarding data authority. Data are
reused for several purposes, and standardization ensures compatibility between systems.
Another important factor is the organization of a national spatial data infrastructure
facilitating the sharing and exchange of data between public agencies.

Figure 4: Output from the Norwegian system for farmland statistics.

6. Conclusion
Efficient and flexible use of spatial information in a statistical production process
following the GSBPM model requires systematic and well-designed data storage between
data collection and analysis. There is wide acceptance of the fact that the efficiency
of the information chain is enhanced when the data storage aspect is professionalized.
It allows better documentation and easier access to data, and also facilitates multiple
uses of the same data. This requires that data management is included as an element of
the “processing” phase in the GSBPM. We maintain that this is equally true for spatial
data. Spatial data management involves a systematic approach to including spatial data
and spatial references in the overall database management strategy of the information
chain.

The methodology as well as the technology needed to build, maintain and use spatial
data management systems are well known and thoroughly tested. The obstacle is mainly
organizational. The Survey and statistics division of NIBIO has developed its spatial
data management system over a period of 20 years. Our experience is that a number of
organizational factors represent the key to successful spatial data management:

1) Acknowledgement - accepting that spatial data management is an important
issue for the organization
2) Ownership - involvement in the issue by the top management
3) Integration (of spatial data management) in the overall data management policy
4) Availability of resources – human as well as technical
5) Prioritizing – starting in one end and leaving some tasks for later
6) Long term commitment to spatial data management

7. References
Egenhofer, M. J., Frank, A. U. and Jackson, J. P. (1989) A topological data model for
spatial databases, In Buchmann, A.P., Günther, O., Smith, T.R. and Wang, Y-F. (eds)
Design and Implementation of Large Spatial Databases, Lecture Notes in Computer
Science, 409: 271-286. Springer Berlin-Heidelberg

Fujimoto, S., Mizuno, T., Ohnishi, T., Shimizu, C. and Watanabe, T. (2015) Geographic
Dependency of Population Distribution. In: Proceedings of the International Conference
on Social Modeling and Simulation, plus Econophysics Colloquium 2014, 151-162, Springer
International Publishing.

Goodchild, M. F. (2007) The Morris Hansen Lecture 2006: Statistical Perspectives on
Spatial Social Science. Journal of Official Statistics, 23: 1-15.

Leyk, S., Buttenfield, B. P., Nagle, N. N., and Stum, A. K. (2013) Establishing
relationships between parcel data and land cover for demographic small area estimation.
Cartography and Geographic Information Science, 40: 305-315.

Marceau, D. J., Guindon, L., Bruel, M., and Marois, C. (2001) Building temporal
topology in a GIS database to study the land-use changes in a rural-urban environment.
The Professional Geographer, 53: 546-558.

McClintock, J. (1986) Everything is somewhere. The geography quiz book, William
Morrow & Co

Strand, G-H. (2001) The role of Agriculture and Forestry in a National Geospatial
Data Infrastructure, Third International Conference on Geospatial Information in
Agriculture and Forestry, Denver, Colorado 5-7 November 2001

Strand, G-H. (2013) The Norwegian area frame survey of land cover and outfield
land resources. Norsk Geografisk Tidsskrift - Norwegian Journal of Geography 67,
24-35.

Strand, G-H. and Aune-Lundberg, L. (2012) Small-area estimation of land cover
statistics by post-stratification of a national area frame survey. Applied Geography,
32(2), 546-555.

Strand, G-H. and Bloch, V.V.H. (2009) Statistical grids for Norway. Documentation
of national grids for analysis and visualization of spatial data in Norway.
Documents 2009/9, Statistics Norway, Oslo.

Tomter, S.M., Hylen, G., Nilsen, J.E. (2010) Norway. In: Tomppo, E., Gschwantner, T.,
Lawrence, M., McRoberts, R. (Eds.), National Forest Inventories, Pathways for
Common Reporting. Springer, pp. 411-424.

Voss, P. R. (2007) Demography as a spatial social science. Population Research and
Policy Review, 26: 457-476.
