The Special Nature of Spatial Data
Robert Haining

This chapter describes some of the special or distinguishing features of spatial data, opening the way to methodological issues that will be treated in more depth in later chapters. The use of the term ‘special’ should not be taken to imply that no other types of data possess these features. Spatial data analysis is a sub-branch of the more general field of quantitative data analysis and has sometimes suffered from not paying sufficient attention to that fact. Many of the data properties that will be encountered are found in other types of (non-spatial) data but, when found in spatial data, may possess a particular structure, or properties may arise in particular combinations.

The chapter will first define what is meant by spatial data and then identify properties. It will be helpful, in order to put structure on this discussion, to distinguish ‘fundamental’ properties of spatial data from properties that are due to the chosen representation of geographical space and from properties that are a consequence of measurement processes by which data are collected for the purpose of storage in the spatial data matrix (SDM). The SDM is what the analyst works with. We conclude by considering the implications of these properties for the methodology of spatial data analysis.

Geographic Information Science (GISc) is the generic label that is frequently used, particularly by geographers, to define the area of science that involves the analysis of spatially referenced data, that is, data where each case has some form of locational co-ordinate attached to it. Data are the linchpin in the process of “doing science” and it is essential that methodologies for spatial data analysis are tuned to the properties of spatial data.

The science undertaken with spatial data is usually ‘observational’ rather than ‘experimental’. This is important. Much spatial data
are not collected under controlled situations. We often cannot choose the values of independent variables in order to generate a satisfactory experimental design. There is no replication (in order, for example, to assess the effects of measurement error) and the analyst must take the world as he or she finds it. There may be further problems in specifying what the appropriate locational co-ordinate is when studying certain types of processes and outcomes. All this has implications for the quality of spatial data and for the methodologies that can be employed. We worry not only about the quality of our data but exactly what it is we are observing in any given situation. A consequence of this is that much of the data collected may be used to build a model of the situation under study which can then be used to estimate parameters and test hypotheses. We shall see that some of the fundamental properties of spatial data raise major problems in this regard.

2.1. SPATIAL DATA AND THEIR PROPERTIES

A spatial datum comprises a triple of measurements. One or more attributes (X) are measured at a set of locations (i) at time t, where t may be a point or interval of time. So, if k attributes are measured at n locations at time t, we can present the spatial data in the form:

\[ \{ x_j(i; t);\; j = 1, \ldots, k;\; i = 1, \ldots, n \}. \tag{2.1} \]

Equation (2.1) expresses in shorthand much of the content of the SDM. The record of when the observation was taken (t) may be suppressed if analysis is concerned with only a single time period but may be retained if there are to be a series of comparative studies through time or if different attributes were recorded at different times and the analyst needs to be aware of this. Such data may come from a variety of different sources including national censuses; public or private agency records (e.g., national health service, police force areas, consumer surveys); satellite imagery; environmental surveys; and primary surveys. The data may be collected from a census or from a sampling process. For the purposes of analysis, data from different sources may be required. Studies in environmental epidemiology utilise health, demographic, socio-economic and environmental data. These data may come with differing degrees of quality and may not all be collected on the same areal framework (Brindley et al., 2005).

To understand the properties of spatial data we need to understand the relationship between equation (2.1) and the ‘real world’ from which the data are taken. In order to undertake data analysis the complexity of the real world must be captured in finite form through the processes of conceptualization and representation (Goodchild, 1989; Guptill and Morrison, 1995; Longley et al., 2001). We shall focus here only on the issues associated with capturing spatial variation, but the reader should note that there are conceptualization and representation issues associated with the way attributes and time are captured as well.

The first step in this process, which ultimately leads to the construction of the SDM, involves conceptualizing the geography of the real world. There are two views of the geographical world in GISc: the field and the object views. The field view conceptualizes space as covered by surfaces with the attribute varying continuously across the space. This is particularly appropriate for many types of environmental and physical attributes. The object view conceptualizes space as populated by well-defined indivisible objects, a view that is particularly appropriate for many types of social, economic and other types of data that refer to populations. Objects are conceptualized as points, lines or polygons.

These two views constitute models of the real world. In order to reduce a field
to a finite number of bits of data then the surface may be represented using a finite number of sample points at which the attribute is recorded or it may be represented using a raster grid. Pixels are laid down independently of the underlying field and its surface variation. Alternatively, the surface may be represented by polygons that partition the space into areas with uniform characteristics (e.g., vegetation zones). How well any field is captured by these different representations will depend on the density of the points or the size of the raster in relation to surface variability. There is a large theoretical and empirical literature on the efficiencies of different spatial sampling designs, for example the properties of random, systematic and stratified random sampling given the nature of variation in the surface to be sampled (see, e.g., Cressie, 1991; Ripley, 1981). The process of discretizing in this way involves a loss of information on surface variability.

This loss of detail on variability also arises when selecting a representation based on the object view. A city may comprise many households (points) but for confidentiality reasons information about households is aggregated into spatially defined groups (polygons): output areas in the case of the 2001 UK census, enumeration districts prior to 2001 (Martin, 1998). Again aggregation into polygons involves a loss of information. There may be a further loss of information in capturing the polygon itself in the database. It may be captured using a representative point (such as its centroid) and its spatial relationship to other polygons captured using a neighbourhood weights matrix.

The conceptualization of a geographic space as a field or as an object is largely dictated by the attribute. However, representation, the process by which information about the geography of the real world is made finite using geometric constructs, involves making choices (Martin, 1999). These choices include the size and configuration of polygons, the location and density of sample points.

2.1.1. Fundamental properties

Fundamental properties are inherent to the nature of attributes as they are distributed across the earth’s surface. There is a fundamental continuity (structure) to attributes in space that derives from the underlying processes that shape the human and physical geographical world. We shall discuss examples of these processes in section 2.2.2. The geographical world would be a strange place if levels of attributes changed suddenly and randomly as we moved from one point in space to another close by. Continuity is also a fundamental property of attributes observed in time. If we know the level of an attribute at one position in space (time) we can make an informed estimate of its level at adjacent locations (points in time). The information that is carried in a piece of data about an attribute at a given location provides information on what the level of the attribute is at nearby locations. However as distance increases then the similarity of attribute values weakens and in the GISc literature this is often referred to as Tobler’s First Law of Geography (‘…near things are more related than distant things’). Although Tobler’s First Law is clearly an oversimplification, and in relation to some types of spatial variation just plain wrong, it is nonetheless a useful aphorism.

Testing for spatial autocorrelation was one of the high-profile research agendas in geography during the quantitative revolution. Geographers adapted spatial autocorrelation statistics based on the join-count statistic, the cross product statistic and the squared difference statistic that had been developed for quantifying spatial structure on regular areal frameworks (grids). These statistics were developed to test for statistically significant spatial autocorrelation on irregular areal frameworks (Cliff and Ord, 1973). The null hypothesis (no spatial autocorrelation) was assessed against a non-specific alternative hypothesis (spatial autocorrelation is present). We shall see how this argument was developed in later years with the introduction
and use by geographers of models for spatial variation.

In the earth sciences, dealing principally with point data from surfaces, the quantification of structure was based on the use of the empirical semi-variogram which uses a squared difference statistic (Isaaks and Srivastava, 1989). The advantage of the latter route was that it led naturally to model specification and model fitting using theoretical semi-variograms. Of course these quantitative measures and tests of hypothesis depend on the scale of analysis. That is, they depend on the size of the polygons in terms of which data are reported, or the inter-point distance between samples on a continuous surface. Thus the chosen representation has an important influence on the quantification of this fundamental property and hence its presence within any spatial dataset. If samples are taken at sufficient distances apart the level of spatial autocorrelation is likely to be much reduced relative to the case where samples are taken close together.

Autocorrelation statistics are also used to capture temporal structure in attribute values but there are important differences with the spatial situation. Time has a natural uni-directional flow (from past to present) whereas space has no such order. The two dimensional nature of space means that dependency structures might vary not just with distance but direction too, giving rise to anisotropic dependency structures with structure along the north–south axis differing from the east–west axis. The presence of spatial autocorrelation, that attribute values are not statistically independent, has fundamental implications for the conduct of spatial analysis.

Spatial autocorrelation, in statistical terms, is a second order property of an attribute distributed in geographic space. In addition there may be a mean or first-order component of variation represented by a linear, quadratic, cubic (etc.) trend. We can think of these as two different scales of spatial variation although the distinction may be hard to make and quantify in practice. As Cressie (1991) remarks: ‘What is one person’s (spatial) covariance may be another person’s mean structure’ (p. 25). It has often been remarked that spatial variation is heterogeneous. This type of decomposition (plus a white noise element to capture highly localized heterogeneity) is one way of formally capturing that heterogeneity using what are termed ‘global’ models. Another approach is to analyze only spatial subsets, that is, to allow model structure to vary locally.

2.1.2. Properties due to the chosen representation

We have already noted that the extent to which our data retain fundamental properties depends on the chosen representation. We now turn to look at other properties that stem directly or indirectly from the chosen representation.

Representing spatial variation using polygons is employed in many branches of science that handle spatially referenced data. Two of the generic consequences of working with data aggregates are: intra-areal unit heterogeneity and inter-areal unit heteroscedasticity.

Whether the data refer to a continuously varying phenomenon (field view) or aggregations of individuals like households (object view), bundling data into spatial aggregates has the effect of smoothing variation. In the case of environmental data and the use of pixels then the degree of smoothing will clearly depend on the size of the pixels. The larger the pixels the greater the degree of smoothing. A non-intrinsic partition, where the polygons are defined in terms of attribute variability with the aim of maximizing within unit homogeneity and maximizing between-unit heterogeneity will not produce this effect to the same extent. This second process shares common ground with the process of regionalization, to which it is sometimes compared.

Intra-unit heterogeneity is a particular problem for many types of social science data particularly in those cases where area
boundaries are chosen arbitrarily as was the case with the UK census for example prior to 2001. Attributes reported for an area may represent percentages or means of attribute values associated with the individuals (people or households) that have been aggregated and the analyst may have no information on the variability around the mean. If an ecological or contextual attribute is calculated for an area (social capital say, or area deprivation) again the calculation is conditional on the chosen representation and the scale of the partition.

One of the conclusions that might be drawn from this is that it is better to have small areal aggregates rather than large ones. Assuming spatial structure, a reasonable supposition given the discussion in section 2.1.1, then smaller areas should be more homogeneous than larger areas and their mean values should be more representative of their area’s population. But such spatial precision comes at the cost of statistical precision. Data errors or small random fluctuations in numbers of events (household burglaries; disease outcomes) will have a big effect on the calculation of rates when populations are small. Take the case of a standardized mortality ratio. If the expected count is small, for example 2.0, then the ratio itself (observed count divided by the expected count) rises or falls by 0.5 with each addition or subtraction of a single case. This will have implications for determining the statistical significance of counts, that is, whether there are significantly more cases than would be expected on the basis of chance alone. It will also have implications for determining the statistical significance of differences in counts between areas which in turn raises problems for the detection of significant crime hotspots or disease clusters.

In summary, there is a trade-off that is linked to the number of individual elements in a polygon. A polygon containing few individuals will tend to be more homogeneous but statistical quantities, such as rates and ratios, tend to be unreliable in the sense that small errors and random fluctuations can impact severely on the calculated values. Polygons containing many individuals will generate robust rates and ratios but often conceal much higher levels of internal heterogeneity.

In practice an area is sometimes partitioned into polygons of varying size and this can yield a secondary effect on data properties. A rate calculated for a polygon where the denominator attribute is small has a larger variance than a rate computed for a polygon where the denominator attribute is large. Moreover there is a mean-variance dependence in the rate statistics. Take the case where the denominator is the number of households (n(i)). Rates are observed counts of some attribute (number of burglaries) in polygon i (O(i)) divided by the number of households. It follows from the binomial model for O(i) that:

\[ E[O(i)/n(i)] = (1/n(i))\, E[O(i)] = p(i); \]
\[ \mathrm{Var}[O(i)/n(i)] = (1/n(i))^{2}\, \mathrm{Var}[O(i)] = p(i)(1 - p(i))/n(i) \tag{2.2} \]

where E[…] and Var[…] denote mean and variance and p(i) is the probability that any individual (e.g., household) in area i has the characteristic (e.g., been burgled) that is being counted. The mean and the variance in equation (2.2) are clearly not independent. It also follows from equation (2.2) that the standard error of the estimate of the rate p(i), which is:

\[ [\, p(i)(1 - p(i))/n(i) \,]^{1/2} \]

is inversely related to the number of households. It follows that any real spatial variation in rates could be confounded by variation in n(i) (the number of households) or alternatively spatial variation in rates could be an artifact of any spatial structure in
n(i) (see Gelman and Price, 1999 who give examples from disease mapping in the USA). Standardized ratios provide an estimate of the true but unknown area-specific relative risk of the selected disease under the assumption of an independent Poisson model for the observed counts. It follows from the properties of the Poisson distribution that the standard error of the standardized ratio is O(i)^{1/2}/E(i). Using a normal approximation for the sampling distribution of the standardized ratio, SR(i), approximate 95% confidence intervals can be computed:

\[ SR(i) \pm 1.96\, O(i)^{1/2}/E(i). \]

However there are problems here when making comparisons. The standard error tends to be large for areas with small populations and small for areas with large populations because of the effect of population size on E(i). So extreme ratios tend to be associated with small populations but ratios that are significantly different from 1.0 tend to be associated with areas with large populations (Mollie, 1996).

These examples are intended to illustrate the way in which data properties can be induced by the chosen representation. In certain circumstances the geographical structure of the representation (for example the geography of which areas have large and which have small denominator values) could induce a geographical structure on the statistics which when mapped could then give rise to a misleading impression about trends or patterns in the data.

2.1.3. Properties due to measurement processes

The final step in the creation of the SDM involves obtaining measurements on the attributes of interest given the chosen representation.

Data quality can be assessed in terms of four characteristics: accuracy, completeness, consistency and resolution. As noted above, a spatial datum comprises a triple of measurements: the attributes, location and time. Thus the quality of each of these three measurements needs to be assessed against the four characteristics. What is of interest here, however, is how measurement problems might introduce certain properties into the data (Guptill and Morrison, 1995).

A common assumption in error analysis is that attribute errors are independent. This is likely to hold less often in the case of spatial data. Location error may lead to overcounts in one area and undercounts in adjacent areas because the source of the overcount is the set of nearby areas that have lost cases as a result of the location error. So, count errors in adjacent areas may be negatively correlated (Haining, 2003, pp. 67–70). Location error can be introduced into a spatial data set as a result of having to put data, collected on different spatial frameworks, onto a common spatial framework. Areal interpolation methods are used but these are based on assumptions about how attributes are distributed within areal units and these assumptions often cannot be tested. The consequence is that further levels and patterns of error are introduced into the database (Cockings et al.).

In the case of remotely sensed data, the values recorded for any pixel are not in one-to-one relationship with an area of land on the ground because of the effects of light scattering. The form of this error depends on the type and age of the hardware and natural conditions such as sun angle, geographic location and season. The point spread function quantifies how adjacent pixel values record overlapping segments of the ground so that the errors in adjacent pixel values will be positively correlated (Forster, 1980). The form of the error is analogous to a weak spatial filter passed over the surface so that the structure of surface
variation, in relation to the size of the pixel unit, will influence the spatial structure of error correlation. Linear error structures also […] propagation may further complicate error properties when arithmetic or cartographic operations are carried out on the data and source errors are compounded and transformed via these operations (Haining and Arbia, 1993).

Data incompleteness may induce false patterns in spatial data. Data incompleteness refers to the situation where there are missing data points or values or where there are under or overcounts arising from the reporting process. ‘Spatially uniform’ data incompleteness raises problems for analysis but spatial variation in the level of data incompleteness with, for example, undercounting more serious in some parts of the study area than others can seriously affect comparative work and the interpretation of spatial variation. Missing or inaccurately located cases in a point pattern of events may result in failure to detect a local cluster of cases (Kulldorff, 1998).

Incompleteness in cancer data leads to forms of under or overcounting which give rise to spatial variation that is an artifact of how the data were collected. In the case of official crime statistics geographical differences between large counties in England may be due to differences in police investigative and reporting practices. On the intra-urban scale, burglaries in suburban areas will, on the whole, be well reported for insurance purposes, but in some inner city areas there may be under reporting either because there is no ‘incentive’ or because of fear of reprisals. The Census provides essential denominator data for computing small area rates. However refusals to cooperate can lead to undercounting and the 1991 Census in the UK was thought to have undercounted the population by as much as 2% because of fears that its data would be used to enforce the new local ‘poll tax’. Inner city areas show higher levels of undercounting than suburban areas where populations are easier to track. Finally, since there are 10 year gaps between successive censuses, population in- and out-flows in […] characteristics of the areas. On the other hand some areas of a city, especially inner-city areas, may experience population mobility and redevelopment which result in marked shifts that have implications for the reliability of the data in the years following the Census.

Finally, in the case of some imagery, some areas of the image may be obscured because of cloud cover. A distinction should be drawn between data that are ‘missing at random’ and data that are missing because of some reason linked to the nature of the population or the area. Weather stations temporarily out of action because of equipment failure produce data missing at random. On the other hand mountainous areas will tend to suffer from cloud cover more than adjacent plains and there will be systematic differences in land use between such areas. This distinction has implications for how successfully missing values can be estimated and whether the results of data analysis will be biased because some component of spatial variation is unobservable.

Figure 2.1 provides a summary of the points raised in this section.

2.2. IMPLICATIONS OF DATA PROPERTIES FOR THE ANALYSIS OF SPATIAL DATA

In this section we turn to a consideration of the implications of the properties of spatial data for the conduct of spatial analysis. Again we shall simply introduce ideas which will be taken up in more detail in later chapters. We divide this section into situations where spatial properties can be exploited to help solve problems and situations where spatial properties introduce complications for the conduct of data analysis.
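
Both kinds of situation turn on spatial dependence, so it may help to see how that dependence is commonly quantified. The following is a minimal Python sketch of one cross product statistic, Moran's I, computed for a single row of adjacent areas. The function names, the binary adjacency weights and the two example series are assumptions made for illustration and are not taken from the chapter.

```python
def morans_i(values, weights):
    """Moran's I: a cross-product measure of spatial autocorrelation.

    values  -- attribute value for each of the n areas
    weights -- n x n list of lists; weights[i][j] > 0 if areas i and j are neighbours
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in weights)               # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))    # weighted cross products
    return (n / s0) * cross / sum(d * d for d in z)


def adjacency_on_a_line(n):
    """Hypothetical weights: areas arranged in a row, neighbours share an edge."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


w = adjacency_on_a_line(6)
i_smooth = morans_i([1, 2, 3, 4, 5, 6], w)        # neighbours alike
i_alternating = morans_i([1, 6, 1, 6, 1, 6], w)   # neighbours unalike
print(round(i_smooth, 3), round(i_alternating, 3))
```

With these illustrative inputs the smoothly varying series gives a positive value (0.6) and the alternating series a negative one (-1.0), which is the contrast between continuity and its absence that the statistic is designed to pick up.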

Figure 2.1 Processes involved in constructing the spatial data matrix and the data
properties that are present or introduced at each stage.
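
As a worked illustration of a property introduced at the representation stage, the Python sketch below evaluates the binomial standard error of a rate from equation (2.2) and the approximate 95% interval SR(i) ± 1.96 O(i)^{1/2}/E(i) for a standardized ratio. The function names and all observed and expected counts are hypothetical values chosen only to show the arithmetic.

```python
import math


def rate_standard_error(p, n):
    """Standard error of a rate under the binomial model: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1.0 - p) / n)


def standardized_ratio_interval(observed, expected, z=1.96):
    """Return (SR, lower, upper) using the normal approximation.

    SR = O/E and, under the Poisson model, its standard error is sqrt(O)/E.
    """
    sr = observed / expected
    se = math.sqrt(observed) / expected
    return sr, sr - z * se, sr + z * se


# Same underlying rate, very different denominators (hypothetical numbers).
se_small_area = rate_standard_error(0.1, 100)      # 100 households
se_large_area = rate_standard_error(0.1, 10000)    # 10,000 households

# Small expected count: one extra case moves the ratio from 1.5 to 2.0.
step = standardized_ratio_interval(4, 2.0)[0] - standardized_ratio_interval(3, 2.0)[0]

# An extreme ratio in a small area versus a modest ratio in a large area.
small = standardized_ratio_interval(4, 2.0)        # SR = 2.0
large = standardized_ratio_interval(400, 320.0)    # SR = 1.25

print(se_small_area, se_large_area, step)
print(small)
print(large)
```

In this example the small area has the more extreme ratio (2.0) but its interval, roughly 0.04 to 3.96, includes 1.0, whereas the large area's ratio of 1.25 has an interval of roughly 1.13 to 1.37 that excludes 1.0. This is the pattern described in section 2.1.2: extreme ratios tend to come from small populations, significant ones from large populations.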

2.2.1. Taking advantage of spatial data properties to tackle problems

Consider the following problems:

• Samples of attribute values have been taken across an area. The analyst would like to construct a map to describe surface variation using the information contained in the sample. Perhaps instead the analyst just wishes to estimate the surface at a point, or set of points, where no sample has been taken and estimate the prediction error.

• A spatial database has been assembled but the database contains data that are ‘missing at random’ in the sense that there are no underlying reasons (such as suppression or confidentiality) why the particular values are missing. The analyst wants to estimate these missing values.

In both these cases we might expect to exploit some formalized version of the notion that data points near together in space carry information about each other. Both of these examples constitute a form of the spatial interpolation problem and solutions such as kriging exploit the spatial structure inherent in the surface as well as the configuration of the sample points to provide an estimate of surface values together with an estimate of the prediction error (Isaaks and Srivastava, 1989). It is intuitive that any solution that did not use the information contained in the location co-ordinates of sample data values would be considered an inefficient solution.

Consider another group of problems:

• Aggregated data are obtained on race (black/white) and voting behaviour (did vote/did not vote). Counts in the 2 × 2 table are known but the real interest lies in the voting behaviour at the constituency level.

• Unemployment estimates have been obtained from a survey for each of a number of small areas in a region. The small area estimators are unbiased but, because of small sample sizes, have low precision. Conversely the region wide estimator has high precision, but as an estimate for any of the small area levels of unemployment is biased. A similar situation arises when estimating relative risk levels across the small areas of a larger region using the standardized mortality ratio.

In both these cases there is again an opportunity to exploit some formalized version of the notion that data points nearby in space carry information about each other. One solution is to ‘borrow information’ or ‘borrow strength’ so that the low precision of small area estimates is raised by using data from nearby areas (Mollie, 1996; King, 1997).
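
The idea of borrowing strength can be sketched with a deliberately simple smoother in Python. Each raw small area rate is pulled towards the sample-size-weighted mean rate of its neighbours, and the pull is stronger when the area's own sample is small. This is an illustrative assumption, not the Bayesian models of the cited literature; the function name, the `prior_strength` parameter, the neighbour lists and all numbers are hypothetical.

```python
def borrow_strength(raw_rates, sample_sizes, neighbours, prior_strength=50.0):
    """Shrink each area's raw rate towards the mean rate of its neighbours.

    raw_rates      -- survey estimate for each small area
    sample_sizes   -- number of respondents behind each estimate
    neighbours     -- neighbours[i] is the list of areas adjacent to area i
    prior_strength -- hypothetical tuning constant: the sample size at which an
                      area's own data and its neighbours' data get equal weight
    """
    smoothed = []
    for i, (rate, n) in enumerate(zip(raw_rates, sample_sizes)):
        nbrs = neighbours[i]
        nbr_total = sum(sample_sizes[j] for j in nbrs)
        # Sample-size-weighted mean rate over the neighbouring areas.
        nbr_mean = sum(raw_rates[j] * sample_sizes[j] for j in nbrs) / nbr_total
        own_weight = n / (n + prior_strength)   # small n -> lean on neighbours
        smoothed.append(own_weight * rate + (1.0 - own_weight) * nbr_mean)
    return smoothed


# Four areas; area 0 has only 10 respondents and an extreme raw rate.
raw_rates = [0.30, 0.10, 0.12, 0.11]
sample_sizes = [10, 200, 150, 250]
neighbours = [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2]]

smoothed = borrow_strength(raw_rates, sample_sizes, neighbours)
print([round(r, 4) for r in smoothed])
```

In this example the poorly sampled area 0 moves from 0.30 to about 0.14, while the well sampled area 3 barely moves from 0.11. The design choice to weight by sample size is one of several possible; it trades a little bias for a gain in precision.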

These nearby areas provide additional data (helping to improve precision) and because they are nearby should reflect an underlying situation that is close to the small area in question so will not introduce a serious level of bias.

2.2.2. Where spatial data properties introduce complications for data analysis

Spatial analysis is often called upon to address scientific questions relating to outcomes (numbers of cases of a disease, distribution of house prices, regional economic growth rates) that are a consequence of processes that by their nature are spatial. Haining (2003) identifies four generic groups of spatial processes. A diffusion process is one where some attribute is taken up by a population so that at any point in time some individuals have the attribute (e.g., an infectious disease) and some do not. If the diffusion process operates in ways that are constrained by distance then there is likely to be spatial structure in the geography of those who do and those who do not have the attribute in question. An exchange and transfer or mixing process is one where places become similar in attribute values (per capita income; employment) as a result of flows of goods or services that bind their economic fortunes together or where patterns of movement and mixing perhaps at different scales introduce a measure of spatial homogeneity into structures. A third type of spatial process is an interaction process in which outcomes at one location (e.g., the price of a commodity) are observed and as a result of the competition effect influence outcomes (prices) at another location. Finally, there is a dispersal process in which individuals spread across space (such as the dispersal of seeds around a parent plant) so that counts reflect the geography of the dispersal mechanism.

These generic spatial processes (processes that operate in geographic space) generate data where spatial structure emerges as a fundamental property of the data. Process shapes or at least influences attribute variation and the resulting data that are collected possess dependency structures that reflect the way the process plays out across geographic space.

Not all processes of interest are ‘spatial’ in the sense described above. Many of the processes of interest to geographers play out across geographic space in response to the place-based characteristics of areas (the particular mix of attributes they possess) and the spatial relationships between those areas. Outcomes in places (whether for example economic, social, epidemiological or criminological) are not necessarily merely the consequence of the properties of those places, as places, but may also be the consequence of relational and contextual influences. The distance between places, the difference between adjacent places in terms of relevant attributes, and the overall configuration of places across a region are all facets of relation and context that may impact on outcomes and modify the role of ‘place’ in influencing outcomes. Two places may be identical in terms of their place-based characteristics but differ significantly in terms of their relational and contextual attributes with neighbouring areas and these differences may explain why (for example) two similarly affluent neighbourhoods experience quite different levels of assault and robbery; why two similarly deprived neighbourhoods experience quite different levels of health outcomes.

We now examine briefly how these features of how attribute values are generated impact on the choice of methodology for the purpose of data analysis. We distinguish between exploratory spatial data analysis and model based forms of analysis that allow hypothesis testing and parameter estimation.

Exploratory spatial data analysis

Exploratory data analysis (EDA) comprises a collection of visual and numerically resistant techniques for summarizing data properties, detecting patterns in data, identifying unusual
or interesting features in data including possible data errors and formulating hypotheses. Exploratory spatial data analysis (ESDA) undertakes these activities with respect to spatial data so that cases can be located on a map and the spatial relationships between cases assume importance because they carry information that is likely to be relevant to the analysis (Cressie, 1984; Haining et al., 1998; Fotheringham and Charlton, 1994). It is important to be able to answer questions such as: ‘Where does that subset of cases on the scatterplot, or that subset of cases on the boxplot, occur on the map?’ ‘What are the spatial patterns and spatial associations in this geographically defined subset of the map?’ In the case of regression modelling, do the large positive residuals, for example, cluster in one area of the map?

ESDA and the software that supports ESDA need to be able to handle the spatial index and the special queries that arise because of the spatial referencing of the data. Thus the map becomes an essential visualization tool (Dorling, 1992). The linkage between a map window and other graphics windows, so that cases can be simultaneously highlighted in more than one window, becomes an essential part of the conduct of ESDA (Andrienko and Andrienko, 1999; Monmonier, 1989).

Visualizing spatial data raises particular problems, in part because of some of the properties discussed in earlier parts of this chapter. We highlight two here. First, it has been noted that data values, particularly rates and ratios, may not be strictly comparable because standard errors are population size dependent. So if areas vary substantially in terms of population counts (used as the denominators for a rate) then extreme values and even patterns detected by visual inspection might be associated with that effect rather than real differences between areas. Second, areas that partition a region might be very different in physical size. This may mean that the viewer of a map has their attention drawn to certain areas of the cartographic display (those areas with physically large spatial units) whilst other areas are ignored. This may be particularly important if in fact it is the small areas that have the larger populations so that it is their rates and ratios (rather than the rates and ratios associated with the physically larger but less densely populated areas) that are the more robust. One solution to this problem is to use cartograms so that areas are transformed in physical extent to reflect some underlying attribute such as population size (Dorling, 1994). This comes at a cost because the individual areas in the resulting cartogram may be hard for the analyst to place. There may be a need for a second, conventional, map linked to the cartogram, so the analyst can highlight areas on the cartogram and see where they are on the conventional map.

Conventional visualization technology is often based on the assumption that all data values are of equal status so that the viewer can extract information from visual displays without worrying about the statistical comparability of the data values that are displayed. This assumption may break down when dealing with spatially aggregated data (Haining, 2003).

Model fitting and hypothesis testing

If n data values are spatially autocorrelated then one of the consequences of this for the application of standard statistical inference procedures is that the information content of the data set is less than would be the case if the n values were independent. This means that the degrees of freedom available for testing hypotheses are not a simple function of n. We shall take the example of testing for significant bivariate correlation between two variables to illustrate this point.

Suppose n pairs of observations, $\{(x(i), y(i))\}_i$, are drawn from a bivariate normal distribution. Pearson’s product moment correlation coefficient (r) is the statistic used to measure the association between X and Y. If the observations on the two variables are independent (there is no spatial autocorrelation in either X or Y), then if the null hypothesis is of no association

between X and Y then a test statistic is given by:

$$ (n - 2)^{1/2} \, |r| \, \left(1 - r^2\right)^{-1/2} \tag{2.3} $$

which is t distributed with (n − 2) degrees of freedom. When spatial autocorrelation is present, the variance of the sampling distribution of r, which is a function of the number of pairs of observations n, is underestimated by the conventional formula which treats the pairs of observations as if they were independent. The effect of spatial autocorrelation on tests of significance has been extensively studied (for reviews see Haining, 1990, 2003) and shown to be very severe when both X and Y have high levels of spatial autocorrelation. Clifford and Richardson (1985) obtain an adjusted value for n, denoted n′, which they call the ‘effective sample size’. This value, n′, can be interpreted as measuring the equivalent number of independent observations so that the solution to the problem lies in choosing the conventional null distribution based on n′ rather than n. An approximate expression for this quantity is:

$$ n' = 1 + n^2 \left[\operatorname{trace}\left(\mathbf{R}_x \mathbf{R}_y\right)\right]^{-1} \tag{2.4} $$

where $\mathbf{R}_x$ and $\mathbf{R}_y$ are the estimated spatial correlation matrices for X and Y respectively. (For a discussion of estimators see Haining, 1990, pp. 118–120.) The null hypothesis of no association between X and Y is rejected if:

$$ (n' - 2)^{1/2} \, |r| \, \left(1 - r^2\right)^{-1/2} \tag{2.5} $$

exceeds the critical value of the t distribution with (n′ − 2) degrees of freedom.

This illustrates a general problem. Since the n observations are positively spatially autocorrelated, the information content of the sample is over-estimated if n is used – it needs to be deflated. The sampling variances of statistics are underestimated, leading the analyst to reject the null hypothesis when no such conclusion is warranted at the chosen significance level. To handle the effects of spatial dependency on the analysis and the complications they introduce, we need to introduce models for spatial variation – or data generators for spatial variation. Such models are important. By specifying a model to represent the variation in the data (including the spatial variation), the analyst is able to construct tests of hypothesis with greater statistical power than is possible if testing is against a non-specific alternative.

There are a number of possible formal models for spatial variation of which the simultaneous spatial autoregressive (SAR), the conditional spatial autoregressive (CAR) and the moving average (MA) models are probably the best known. We will briefly look at the first two but the interested reader will need to follow up the literature to gain a fuller understanding of these models and their properties (Whittle, 1954; Besag, 1974, 1975, 1978; Ripley, 1981; Cressie, 1991; Haining, 1978, 1990).

A multivariate normal CAR model which satisfies the first order (spatial) Markov property and thus might be thought of as the simplest departure from spatial independence can be written as follows (Besag, 1974; Cressie, 1991, p. 407):

$$ E\left[ X(i) = x(i) \mid \{X(j) = x(j)\}_{j \in N(i)} \right] = \mu + \tau \sum_{j \in N(i)} w(i,j) \, [X(j) - \mu], \qquad i = 1, \ldots, n \tag{2.6} $$

and:

$$ \mathrm{Var}\left[ X(i) = x(i) \mid \{X(j) = x(j)\}_{j \in N(i)} \right] = \sigma^2, \qquad i = 1, \ldots, n $$

where E[… | .] and Var[… | .] denote conditional expectation and variance respectively, µ is a first-order parameter and τ is the spatial interaction parameter. The Markov property means observations are conditionally independent given the values at neighbouring sites. {w(i, j)} denotes the neighbourhood structure of the system of areas and w(i, j) = 1 if i and j are neighbours (j ∈ N(i)) and w(i, i) = 0 for all i. $\mathbf{W}$ is the n × n matrix of {w(i, j)}. It is a requirement that τ lies between $1/\omega_{\min}$ and $1/\omega_{\max}$ where $\omega_{\min}$ and $\omega_{\max}$ are the smallest and largest eigenvalues of $\mathbf{W}$. For a fuller introduction to the Markov property for spatial data including how to construct higher-order spatial Markov models see, for example, Haining (2003, pp. 297–299). This approach allows the construction of a hierarchy of models of increasing complexity.

As noted in Haining (2003), however, the Markov property does not have the natural appeal it has in the case of time series, because space has no natural ordering. So the neighbourhood structure can often seem rather arbitrary especially in the case of the non-regular areal frameworks used to report Census and other social and economic data.

If the analyst of regional data does not attach importance to satisfying a Markov property another option is available called the SAR model specification. A form of this model was first introduced into statistics by Whittle (1954). Let e be independent normal IN(0, $\sigma^2 \mathbf{I}$) where $\mathbf{I}$ is the identity matrix and e(i) is the variable associated with site i (i = 1, …, n). Define the expression:

$$ X(i) = \mu + \rho \sum_{j \in N(i)} w(i,j) \, [X(j) - \mu] + e(i), \qquad i = 1, \ldots, n \tag{2.7} $$

where ρ is a parameter. The bounds on ρ are set by the largest and smallest eigenvalues of $\mathbf{W}$ just as in the case of the CAR model. This is the model most often seen in the spatial analysis and regional science literature although the reason for its hegemony is far from clear and seems to be largely based on a combination of historical accident (in the sense that time series modelling preceded spatial data modelling and methods were transferred across) and subsequent ‘lock-in’.

These models can be embedded into, for example, regression models either as additional covariates (as in the case of equation (2.7)) or as models for the error structure where the errors (in practice the residuals) are tested and found to show evidence of spatial autocorrelation (Anselin, 1988; Ord, …). Fitting regression models by ordinary least squares when errors are spatially (positively) autocorrelated gives rise to some damaging consequences. First, although we shall obtain consistent estimates of the regression parameters (there may be some small sample bias), the sampling variance of these estimates may be inflated compared with methods that take account of the spatial autocorrelation in the errors. Second, if the usual least squares formula for the sampling variances of these regression estimates is applied, the variances will be seriously underestimated. The formulae are no longer valid and conventional F and t tests of hypothesis are also not valid. We shall take a very simple example to illustrate these points, where the parameter to be estimated and tests of hypothesis relate to a constant mean µ.

Suppose n independent observations {x(i)} are drawn from a $N(\mu, \sigma^2)$ distribution. The sample mean, $\bar{x}$, is an unbiased estimator for µ, and the variance of the sample mean is:

$$ \mathrm{Var}(\bar{x}) = \sigma^2 / n. \tag{2.8} $$

If $\sigma^2$ is unknown then it is estimated by:

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x(i) - \bar{x}\right)^2 \tag{2.9} $$

so that:

$$ \mathrm{Var}(\bar{x}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left(x(i) - \bar{x}\right)^2. \tag{2.10} $$

If the n observations are not independent then although the sample mean is still unbiased as an estimator of µ, assuming each x(i) has the same variance ($\sigma^2$), the variance of the sample mean is (see for example Haining, 1988, p. 575):

$$ \mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} + \frac{2}{n^2} \sum_{i} \sum_{j \, (i<j)} \mathrm{Cov}\left(x(i), x(j)\right) \tag{2.11} $$

where Cov(x(i), x(j)) denotes the spatial autocovariance between x(i) and x(j). So, if there is positive spatial dependence and $\sigma^2$ is known then $\sigma^2/n$ underestimates the true sampling variance of the sample mean. If $\sigma^2$ is unknown and is estimated by equation (2.9) then if there is positive spatial dependence the expected value of $s^2$ is (see, for example, Haining, 1988, p. 579):

$$ E\left[s^2\right] = \sigma^2 - \left[ \frac{2}{n(n-1)} \sum_{i} \sum_{j \, (i<j)} \mathrm{Cov}\left(x(i), x(j)\right) \right] \tag{2.12} $$

so that equation (2.9) is a downward biased estimate of $\sigma^2$. This further compounds the underestimation of the sampling variance.

Modified methods to take account of spatial dependence are often based on the following argument (see, for example, Haining, 1988). Assume the data $\mathbf{x}^T = (x(1), \ldots, x(n))$, where T denotes the transpose, are drawn from a multivariate normal spatial model with mean vector given by $\mu\mathbf{1}$ and n by n variance–covariance matrix $\boldsymbol{\Sigma} = \sigma^2 \mathbf{V}$ given, say, by one of the models described above. (In the case of the CAR model (2.6), $\mathbf{V} = (\mathbf{I} - \tau\mathbf{W})^{-1}$.) The log likelihood for the data is:

$$ -\frac{n}{2} \ln\left(2\pi\sigma^2\right) - \frac{1}{2} \ln|\mathbf{V}| - \frac{1}{2\sigma^2} (\mathbf{x} - \mu\mathbf{1})^T \mathbf{V}^{-1} (\mathbf{x} - \mu\mathbf{1}) \tag{2.13} $$

where $\mathbf{1}$ is a column vector of 1’s and $|\mathbf{V}|$ denotes the determinant of $\mathbf{V}$. For simplicity we assume $\mathbf{V}$ is known. The maximum likelihood estimator of µ is:

$$ \hat{\mu} = \left(\mathbf{1}^T \mathbf{V}^{-1} \mathbf{1}\right)^{-1} \left(\mathbf{1}^T \mathbf{V}^{-1} \mathbf{x}\right). \tag{2.14} $$

The estimator (2.14) is the best linear unbiased estimator (BLUE) of µ. Note that in the case of independence $\mathbf{V} = \mathbf{I}$ (the identity matrix with 1’s down the diagonal and zeros elsewhere) and equation (2.14) reduces to the sample mean. In the case $\mathbf{V} \neq \mathbf{I}$ two modifications to the sample mean are occurring. First, the denominator for positive spatial dependence will be less than n. Second, the presence of $\mathbf{V}^{-1}$ in the numerator of equation (2.14) downweights the contribution of any attribute x(i) which is highly correlated with other attribute values {x(j)} – that is, where x(i) is part of a cluster of observations.

The variance of $\hat{\mu}$ is:

$$ \mathrm{Var}[\hat{\mu}] = \sigma^2 \left(\mathbf{1}^T \mathbf{V}^{-1} \mathbf{1}\right)^{-1} \tag{2.15} $$

which reduces to $\sigma^2/n$ if $\mathbf{V} = \mathbf{I}$.

Since the sample mean is an unbiased estimator of µ, one modification is to replace equation (2.8) with equation (2.15). The term $(\mathbf{1}^T \mathbf{V}^{-1} \mathbf{1})$ is proportional to Fisher’s information measure (Haining, 1988, p. 586). It identifies the information about µ contained in an observation. Now equation (2.9) is not

the maximum likelihood estimator for $\sigma^2$. This is given by:

$$ \hat{\sigma}^2 = n^{-1} (\mathbf{x} - \hat{\mu}\mathbf{1})^T \mathbf{V}^{-1} (\mathbf{x} - \hat{\mu}\mathbf{1}). \tag{2.16} $$

A further refinement is to replace equation (2.9) with equation (2.16) substituting the sample mean for $\hat{\mu}$ in equation (2.16) where $\mathbf{V}^{-1}$ plays a role equivalent to the second term in the right-hand side of equation (2.11).

The general results given by equations (2.11) and (2.12) are why adjustments to conventional methods are needed. The evidence suggests that it is the effect of the second term on the right-hand side of equation (2.11) that is the more serious, at least in the usual situation of positive spatial dependence, and that one way to deal with this is to adjust n in equation (2.8) thereby increasing the sampling variance of the sample mean. The size of the adjustment to n will be sensitive to the estimates of the spatial autocorrelation in the data or, if a spatial model is fitted to the data, the choice of model. The problem is further complicated if, as is usually the case, $\mathbf{V}$ is not known and so must be estimated from the data.

Before leaving the normal model it is important to note that aggregated spatial data may violate another of the statistical assumptions of least squares regression. It was remarked in section 2.1 how rates and ratios based on areas with very different population counts will have different standard errors. It follows that the assumption of homoscedasticity (or constant error variance) is likely to be violated when developing models to explain how rates or ratios vary over a region. Data transformations or weighted least squares estimators are used to address these problems (Haining, 1990, pp. 49–50) but such adjustments may need to be implemented whilst also addressing the problems created by residual spatial autocorrelation (Haining, 1991). In addition to the problems created by failure to satisfy statistical assumptions, spatial data often create ‘data-related’ problems in regression modelling (Haining, 1990, pp. 332–333). For example, the fit of a trend surface model can be influenced by the configuration of the sample data points on the surface where, as a result of the particular distribution, certain values have high leverage (Unwin and Wrigley, 1987); the particular shape of the study region may also influence the trend surface model fit (Haining, 1990, p. 372). These and other issues are reviewed in Haining (1990, pp. 40–50).

We conclude this section by remarking on the implications of intra-area and inter-area spatial dependency and intra-area heterogeneity when modelling a discrete valued response variable such as the count of the number of cases of a disease across a region using the Poisson model. Spatial dependency and heterogeneity are important causes of overdispersion. For example consider a local diffusion process in which individuals are more likely to be infected if they are close to someone already infected. The result is that counts of the number of cases will reveal Poisson overdispersion because there will be areas with large counts (due to the local infection process) and areas with zero counts where the process has not yet started. These considerations require the analyst both to carry out tests for overdispersion and where necessary take appropriate action.

The effects of overdispersion in generalized linear modelling are rather similar to those described for the normal model when spatial autocorrelation is detected. If overdispersion is present, ignoring it tends to have little impact on point estimates of the regression parameters (the maximum likelihood estimator is consistent, although some small sample bias might be present). However, standard error estimates for regression parameters are underestimated. Type I errors associated with the model are underestimated which is particularly problematic in relation to predictors that are close to the significance threshold. If the objective is to build a parsimonious model, the presence of overdispersion may result in an analyst constructing a model

more complicated than necessary, and that overestimates the variance explained.

Ways of tackling this problem may depend on the reasons for the overdispersion. A conventional approach is through the use of a variance inflation factor (Dobson, 1999). Where the cause is inter-area spatial autocorrelation then a discrete valued ‘auto-model’ may be used which is analogous to equation (2.6) (see Besag, 1974). More recently attention has focused on the use of spatial random effects models using CAR models fitted using WinBUGS (Law et al., 2006). These models allow for overdispersion through the random effects term. This is an area of current research in spatial modelling since the development of good modelling tools for discrete valued response variables has rather taken a back seat whilst attention for many years has focused – perhaps disproportionately – on the normal model (Law and Haining, 2004).

2.3. DRAWING INFERENCES

One of the main purposes of undertaking spatial statistical analysis is to make population inferences on the basis of the data collected. In concluding this chapter we consider some of the inference pitfalls associated with the analysis of spatial data.

What is the population about which inferences are made in an observational science? If data are point samples from a continuous surface then the population might be the surface itself. Of course the realized surface may be thought of as only one of many possible realizations (the rest not having been observed). However, with or without the concept of a ‘superpopulation’ of surfaces, making inferences from point samples to the (realized) surface population does represent a legitimate target. This argument is less convincing when the data represent a complete census – for example the data refer to areas and a complete (or nearly complete) enumeration has been carried out. What is the population about which inferences are being made now? A frequent answer to this is that the underlying process is stochastic (chance is an inherent part of the process) so that inferences are directed at the process (its parameters and covariates) rather than the map. The problem with this is that we have access to only one realization of the process and in order to give our inferences some broader validity other assumptions need to be invoked such as that this realization is representative of the underlying process. There may be no way to test such an assumption.

The modifiable areal units problem (MAUP) reminds us that results obtained from analyzing aggregate data are dependent on the particular scale of the partition, and, at the given scale, the particular boundaries used. In general statistical relationships between attributes are stronger the larger the spatial aggregates because variances are reduced. Boundary shifts can influence whether or not disease clusters or crime hot spots are detected at any scale because if boundaries happen to cut through the middle of a cluster this may dilute the effect over two or more areas.

The analysis of aggregated data is particularly problematic and not just because of the MAUP. It is important to remember that conclusions drawn from aggregate data can only be transferred to the individual level under certain conditions. The ecological fallacy is the uncritical transfer of findings at the group level to the individual level. As the famous example cites, the suicide rate in Germany in the 17th century may have been larger in areas with higher percentages of Catholics but that does not mean Catholics were more prone to commit suicide than Protestants. Quite the reverse as individual level data revealed. Aggregation bias raises serious problems for epidemiological studies based on aggregate data and is one reason why it is considered the weakest of the different methodologies for assessing dose–response relationships – even though this may be the only realistic way of obtaining reasonably sound measures of exposure to an environmental risk factor. The problem is that

it is not difficult to construct examples where there are complete sign reversals when going from the ecological to the individual level study (Richardson, 1992).

The converse of the ecological fallacy is the atomistic (or individualistic) fallacy which assumes relationships identified at the individual level apply at the group level. There may be group level or contextual effects that need to be taken into account – as for example in the study of youth offending, where the risk of becoming an offender may not depend only on personal and household level risk factors but also neighbourhood and peer group effects. This then raises the problem of defining what the ‘neighbourhood’ is.

Figure 2.2 provides a summary of the points raised in sections 2.2 and 2.3.

2.4. CONCLUSIONS

Spatial data possess a number of distinctive properties that derive from the fundamental nature of geographic space and the way processes unfold in geographic space, the way that spatial variation is represented for the purpose of storage in a finite digital database and the way spatial data are collected and attributes measured. Many of these properties were recognized early in geography’s ‘quantitative revolution’, most notably the lack of independence in data values collected close together in space. Geographers then and since have made important contributions to the development of relevant statistical theory and practice.

Geographers continue to develop new methods for describing spatial variation and new methods for modelling processes that operate across geographical space. At present there are two strong traditions which provide focuses for research. On the one hand there are methodologies based on ‘whole map’ or global statistics that seek to capture data properties through models that are fitted to all the data. On the other hand there are methodologies based on ‘local’ statistics that process geographically defined subsets of the data and do not seek to impose a single statistic or model on the whole data set (Anselin, 1995, 1996; Getis and Ord, 1996; Fotheringham et al., 2000). They represent different ways of responding to the need to develop methodologies to meet the analytical challenges posed by the special nature of spatial data.
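
As an illustration of the effective sample size adjustment described in section 2.2 (equations (2.3) to (2.5)), the following is a minimal sketch in Python rather than a definitive implementation. The function names, the value r = 0.4, and the correlation structure (sites on a line with correlation phi^|i − j|) are assumptions made only for this example; in practice the correlation matrices for X and Y would be estimated from the data.

```python
import math


def trace_of_product(a, b):
    """Return trace(A B) for two square matrices stored as lists of rows."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def effective_sample_size(rx, ry):
    """Equation (2.4): n' = 1 + n^2 / trace(Rx Ry)."""
    n = len(rx)
    return 1.0 + n * n / trace_of_product(rx, ry)


def t_statistic(r, size):
    """Equations (2.3) and (2.5): (size - 2)^(1/2) |r| (1 - r^2)^(-1/2)."""
    return math.sqrt(size - 2.0) * abs(r) / math.sqrt(1.0 - r * r)


def line_correlation(n, phi):
    """Hypothetical correlation matrix: n sites on a line, corr = phi^|i - j|."""
    return [[phi ** abs(i - j) for j in range(n)] for i in range(n)]


n, r = 30, 0.4                       # illustrative values only
identity = line_correlation(n, 0.0)  # phi = 0 gives the identity matrix
rx = line_correlation(n, 0.8)        # assumed strong positive autocorrelation in X
ry = line_correlation(n, 0.8)        # and in Y

n_indep = effective_sample_size(identity, identity)  # formula gives n + 1 here
n_eff = effective_sample_size(rx, ry)                # far fewer "independent" pairs
t_naive = t_statistic(r, n)          # equation (2.3), treats pairs as independent
t_adjusted = t_statistic(r, n_eff)   # equation (2.5), uses n' in place of n

print(f"n' under independence: {n_indep:.2f}")
print(f"n' with autocorrelation: {n_eff:.2f}")
print(f"t (naive) = {t_naive:.3f}, t (adjusted) = {t_adjusted:.3f}")
```

The adjusted statistic would then be compared with the critical value of the t distribution with (n′ − 2) degrees of freedom, which this sketch leaves to tables or a statistics library.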

Figure 2.2 Spatial data properties and how they impact at different stages of analysis.
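
The estimator of a constant mean under spatial dependence (equations (2.8), (2.11), (2.14) and (2.15) in section 2.2) can be sketched in the same way. This is an illustrative example under stated assumptions: the data values are invented, σ² is taken as known and equal to 1, and V is assumed to be a known correlation matrix with entries phi^|i − j| (so that every x(i) has the same variance, as equation (2.11) requires). The helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def line_correlation(n, phi):
    """Hypothetical V: n sites on a line with correlation phi^|i - j|."""
    return [[phi ** abs(i - j) for j in range(n)] for i in range(n)]


def gls_mean(x, v, sigma2):
    """Equations (2.14) and (2.15) for a symmetric, known V."""
    z = solve(v, [1.0] * len(x))          # z = V^{-1} 1
    information = sum(z)                   # 1' V^{-1} 1
    mu_hat = sum(zi * xi for zi, xi in zip(z, x)) / information
    return mu_hat, sigma2 / information


def variance_of_sample_mean(v, sigma2):
    """Equation (2.11) with Cov(x(i), x(j)) = sigma2 * V[i][j]."""
    n = len(v)
    cov_sum = sum(sigma2 * v[i][j] for i in range(n) for j in range(i + 1, n))
    return sigma2 / n + 2.0 * cov_sum / n ** 2


x = [4.1, 4.4, 5.0, 5.3, 4.9, 3.8, 3.5, 3.9, 4.6, 5.2]  # invented data
sigma2 = 1.0                                             # assumed known
v = line_correlation(len(x), 0.6)

mu_blue, var_blue = gls_mean(x, v, sigma2)
var_naive = sigma2 / len(x)                  # equation (2.8), ignores dependence
var_xbar = variance_of_sample_mean(v, sigma2)

print(f"sample mean = {sum(x) / len(x):.3f}, BLUE = {mu_blue:.3f}")
print(f"naive variance = {var_naive:.4f}")
print(f"true variance of sample mean = {var_xbar:.4f}")
print(f"variance of BLUE = {var_blue:.4f}")
```

With positive dependence the naive value σ²/n understates both the true variance of the sample mean and the variance of the BLUE, which is the point made in the discussion of equations (2.11) and (2.15).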

REFERENCES

Andrienko, G.L. and Andrienko, N.V. (1999). Interactive maps for visual data exploration. International Journal of Geographical Information Science, 13: 355–374.
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic.
Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis, 27:
Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer, M, Scholten, H.J. and Unwin, D., (eds), Spatial Analytical Perspectives on GIS, pp. 111–125. London: Taylor & Francis.
Besag, J.E. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal, Royal Statistical Society, B, 36: 192–225.
Besag, J.E. (1975). Statistical analysis of non-lattice data. The Statistician, 24: 179–195.
Besag, J.E. (1978). Some methods of statistical analysis for spatial data. Bulletin of the International Statistical Institute, 47: 77–92.
Brindley, P., Wise, S.M., Maheswaran, R. and Haining, R.P. (2005). The effect of alternative representations of population location on the areal interpolation of air pollution exposure. Computers, Environment and Urban Systems, 29: 455–469.
Cerioli, A. (1997). Modified tests of independence in 2 × 2 tables with spatial data. Biometrics, 53: 619–628.
Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. London: Pion.
Clifford, P. and Richardson, S. (1985). Testing the association between two spatial processes. Statistics and Decisions, Suppl. No. 2: 155–160.
[…] 314–328.
Cressie, N. (1984). Towards resistant geostatistics. In: Verly, G., David, M., Journel, A.G. and Marechal, A., (eds), Geostatistics for Natural Resources Characterization, pp. 21–44. Dordrecht: Reidel.
Cressie, N. (1991). Statistics for Spatial Data. New York: Wiley.
Dobson, A.J. (1999). An Introduction to Generalized Linear Models. Boca Raton: Chapman & Hall.
Dorling, D. (1992). Stretching space and splicing time: from cartographic animation to interactive visualization. Cartography and Geographic Information Systems, 19: 215–227.
Dorling, D. (1994). Cartograms for visualizing human geography. Hearnshaw, H.M. and Unwin, D.J., (eds), Visualization in Geographic Information Systems, pp. 85–102. New York: J. Wiley & Sons.
Fisher, R. (1935). The Design of Experiments. Edinburgh: Oliver & Boyd.
Forster, B.C. (1980). Urban residential ground cover using LANDSAT digital data. Photogrammetric Engineering and Remote Sensing, 46: 547–558.
Fotheringham, A.S., Brunsdon, C. and Charlton, M. (2000). Quantitative Geography: Perspectives on Spatial Data Analysis. London: SAGE.
Fotheringham, A.S. and Charlton, M. (1994). GIS and exploratory spatial data analysis: an overview of some research issues. Geographical Systems, 1: 315–327.
Gelman, A. and Price, P.N. (1999). All maps of parameter estimates are misleading. Statistics in Medicine, 18: 3221–3234.
Getis, A. and Ord, J.K. (1996). Local spatial statistics: an overview. In: Longley, P. and Batty, M., (eds), Spatial Analysis: Modelling in a GIS environment, pp. 261–277.
Goodchild, M.F. (1989). Modelling error in objects and fields. In: Goodchild, M. and Gopal, S. (eds), Accuracy of Spatial Databases, pp. 107–113. London: Taylor & Francis.
Guptill, S.C. and Morrison, J.L. (1995). Elements of Spatial Data Quality. Oxford: Elsevier Science.
Haining, R.P. (1978). The moving average model for […]
[…] an application to remotely sensed data. Communications in Statistics, Theory and Methods, 17:
Haining, R.P. (1990). Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: Cambridge University Press.
Haining, R.P. (1991). Estimation with heteroscedastic and correlated errors: a spatial analysis of intra-urban mortality data. Papers in Regional Science, 70: 223–241.
Haining, R.P. (2003). Spatial Data Analysis: Theory and Practice. Cambridge: Cambridge University Press.
Haining, R.P. and Arbia, G. (1993). Error propagation through map operations. Technometrics, 35:
Haining, R.P., Wise, S.M. and Ma, J. (1998). Exploratory Spatial Data Analysis in a geographic information system environment. The Statistician, 47: 457–469.
Isaaks, E.H. and Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford: Oxford University Press.
King, G. (1997). A Solution to the Ecological Inference Problem. Princeton, New Jersey: Princeton University Press.
Kulldorff, M. (1998). Statistical methods for spatial epidemiology: tests for randomness. GIS and Health, eds Gatrell, A. and Löytönen, M. pp. 49–62. London: Taylor & Francis.
Law, J. and Haining, R.P. (2004). A Bayesian approach to modelling binary data: the case of high intensity crime areas. Geographical Analysis, 36: 197–216.
Law, J., Haining R., Maheswaran, R. and Pearson, T. (2006). Analysing the relationship between smoking and coronary heart disease at the small area level. Geographical Analysis, 38: 140–159.
Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (2001). Geographical Information Systems and Science. Chichester: Wiley.
Martin, D.J. (1998). Optimizing Census Geography: the separation of collection and output geographies. International Journal of Geographical Information Science, 12: 673–685.
Martin, D.J. (1999). Spatial representation: the social scientists’ perspective. In: Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (eds), Geographical Information Systems: Volume 1. Principles and Technical Issues, 2nd edition. pp. 71–89. New York: Wiley.
Mollie, A. (1996). Bayesian mapping of disease. Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics, pp. 359–379. London: Chapman & Hall.
Monmonier, M.S. (1989). Geographic brushing: enhancing exploratory analysis of the scatterplot matrix. Geographical Analysis, 21: 81–84.
Richardson, S. (1992). Statistical methods for geographical correlation studies. In: Elliot, P., Cuzich, J., English, D. and Stern, R., (eds), Geographical and Environmental Epidemiology: Methods for Small Area Studies, pp. 181–204. Oxford: Oxford University Press.
Ripley, B.D. (1981). Spatial Statistics. New York: Wiley.
Unwin, D.J. and Wrigley, N. (1987). Towards a general theory of control point distribution effects in trend surface models. Computers and Geosciences, 13:
Whittle, P. (1954). On stationary processes in the plane. Biometrika, 41: 434–449.

