
JOURNAL OF HOUSING ECONOMICS 7, 304–327 (1998)
ARTICLE NO. HE980236

Spatial Autocorrelation: A Primer

Robin A. Dubin
Case Western Reserve University

Received September 17, 1998

Regression error terms are likely to be spatially autocorrelated in any situation in which location matters. While both the precision of the estimates and the reliability of hypothesis testing can be improved by making a correction for spatial autocorrelation, the techniques for making such a correction are not widely understood. The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. One of the two most common methods of handling spatial autocorrelation is the weight matrix approach, in which the process generating the errors is modeled. The resulting correlation structure is then derived from the hypothesized process. The second method models the correlation structure itself, rather than the underlying process. The bulk of this paper is concerned with comparing these two methods and their resulting correlation structures. Other issues are discussed at the end of the paper. © 1998 Academic Press

1. INTRODUCTION

While autocorrelation in a time series context is well understood, and researchers routinely test and correct for this problem, the same cannot be said of autocorrelation in a cross-sectional context. The standard rule of thumb is that autocorrelation is a problem in time series data and heteroscedasticity is a problem with cross-sectional data. However, there are many instances in which an entity's location affects its behavior. Housing prices are a prime example: clearly the location of the house will have an effect on its selling price. If the location of the house influences its price, then the possibility arises that nearby houses will be affected by the same location factors. Any error in measuring these factors will cause their error terms to be correlated. Spatial autocorrelation is likely to be present in any situation in which location matters. Although spatial autocorrelation can occur in many contexts, in this paper I will focus on housing prices. In the case of housing prices, the location factors are called neighborhood effects. There are at least two reasons to suspect that neighborhood effects are measured with errors. First, neighborhood is unobservable. This means that researchers wishing to
1051-1377/98 $25.00
Copyright © 1998 by Academic Press. All rights of reproduction in any form reserved.


account for neighborhood must use proxies. Crime rates and socioeconomic characteristics of residents are examples of variables which are commonly used. Second, to make the use of proxies operational, a set of geographic boundaries must be assumed. Typically, the researcher uses the same set of boundaries as the data collector: census tracts are generally the boundaries when socioeconomic data are used, and crime reporting areas are commonly used when crime rates are needed. Of course, the geographic boundaries that should be used are the (unknown) neighborhood boundaries. To the extent that neighborhood boundaries differ from the data gathering boundaries, the proxies themselves will contain error. These two problems, unobservability and boundaries, make it virtually certain that neighborhood variables will be measured with error, with the result that the regression error terms will be autocorrelated. The consequences of spatial autocorrelation are the same as those of time series autocorrelation: the OLS estimators are unbiased but inefficient, and the estimates of the variance of the estimators are biased. Thus the precision of the estimates as well as the reliability of hypothesis testing can be improved by making a correction for autocorrelation. Once the structure of the autocorrelation has been estimated, this information can be incorporated into any predictions, thereby improving their accuracy.1 Just as with time series autocorrelation, maximum likelihood (ML) techniques are commonly used to estimate the autocorrelation parameters and the regression coefficients.2 Despite the similarities, spatial autocorrelation is conceptually more difficult to model than time series autocorrelation, because of the ordering issue. In a time series context, the researcher typically assumes that earlier observations can influence later ones, but not the reverse.
In the spatial context, an ordering assumption such as this is not possible: if A affects B, it is likely that the reverse is also true. Also, the direction of influence is not limited to one dimension as in time series, but can occur in any direction (although we generally restrict the problem, at least in the case of housing, to two dimensions). The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. I use hedonic regression as the example problem, although the techniques discussed here are applicable to a wide variety of problems. I discuss the basic issues involved in modeling the autocorrelation structure and compare and contrast the most commonly used techniques. My purpose in doing so is to
1 This technique is known as kriging in the geostatistics literature and best linear unbiased prediction (BLUP) in the econometrics literature. Dubin (1992) and Basu and Thibodeau (1998) use this technique to predict housing prices. Also, Dubin (1998) and Dubin et al. (1998) discuss the issues involved in kriging.
2 Although other techniques are used in the literature for estimating models with spatially autocorrelated error terms, ML will be the only technique discussed here.


promote a better understanding of these techniques, which I hope will encourage their use.

2. MODELS

There are two commonly used methods of modeling the autocorrelation structure. The first is to model the process itself. This approach is based on the work of geographers (Cliff and Ord, 1981) and requires the use of a weight matrix. This approach is probably the more common of the two in the real estate literature (see Can (1992) and Pace and Gilley (1998) for examples). The second approach is to model the covariance matrix of the error terms directly. This approach is based on the work of geologists (Matheron, 1963) and has also been used in the real estate literature (see Dubin (1988) and Basu and Thibodeau (1998) for examples).

2.a. First Approach: Weight Matrix

In this approach, the process generating the error terms is modeled explicitly. The model is

Y = Xβ + u,        (1.a)
u = ρWu + ε.       (1.b)
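A minimal simulation of the process in Eqs. (1.a) and (1.b) can make the model concrete. The symbols are defined in the text below; the dimensions, coefficients, and random W used here are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and coefficients.
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.uniform(size=(N, K - 1))])
beta = np.array([10.0, 2.0, -1.0])

# An arbitrary row-standardized weight matrix; in practice W comes from
# one of the schemes discussed below (nearest neighbors, distance bands, ...).
W = rng.uniform(size=(N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)  # rows sum to one

rho, sigma = 0.67, 1.0

# Eq. (1.b): u = rho W u + eps  =>  u = (I - rho W)^{-1} eps
eps = rng.normal(scale=sigma, size=N)
u = np.linalg.solve(np.eye(N) - rho * W, eps)

# Eq. (1.a): Y = X beta + u
Y = X @ beta + u
```

Because ρ scales a row-standardized W, the errors of nearby observations move together; setting ρ = 0 recovers independent errors.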

In a hedonic regression, Y is an (N × 1) vector containing the selling prices of the houses, X is an (N × K) matrix of the characteristics of the houses, u is an (N × 1) vector of the correlated error terms, and β is a (K × 1) vector of unknown regression coefficients. The process generating the correlations is shown in Eq. (1.b). Here, ε is an (N × 1) vector of normally distributed and independent error terms (with mean zero and variance σ²) and ρ is an unknown autocorrelation parameter (note that ρ is a scalar). W is the weight matrix, which represents the spatial structure of the data. By far, the most common practice is to treat W as nonstochastic; that is, the researcher takes W as known a priori, and therefore, all results are conditional upon the specification of W (see Pace et al. (1998) for an exception). Note the similarity of this model to the time series AR(1) model. Also, just as in time series, the model can be expanded by using various spatial lags (see Anselin (1988, pp. 22–24) for details). In view of its centrality in this approach, a digression on W is in order. W is an N × N matrix with zeros on its main diagonal. The off-diagonal elements, W_ij, represent the spatial relationship between observations i and j. A common method of forming W is to use nearest neighbors. Under this scheme, W_ij = 1 if i and j are such that there is no observation closer to


either i or j, and zero otherwise. This scheme can easily be extended to n nearest neighbors. Another popular approach is to set W_ij = 1 if i and j are separated by a distance less than some prespecified limit. Rather than making the elements of W binary, another approach is to set W_ij = 1/D_ij^P, where D is an N × N matrix showing the distances separating the observations, and P is a constant. All of these approaches have been used in the real estate literature; there does not appear to be any consensus regarding which scheme represents the best realization of the correlation structure appearing in the housing market. This is problematic because all of the results are conditional on the researcher's a priori specification of the spatial structure. Solving (1.b) for u gives

u = (I − ρW)⁻¹ε        (2)

and thus

V = E[uu′] = σ²(I − ρW)⁻¹(I − ρW′)⁻¹,        (3)

where V is the variance/covariance matrix of u. Note that V typically will not have a constant on the main diagonal. Thus, in this type of model, u is heteroskedastic, even though ε is not. The fact that V involves the product of two inverted matrices makes it difficult to visualize. In what follows, I show the correlations implied by the various choices of W, given a set of locations. Because housing data are not typically located on a regular grid, I use 10 observations, randomly located in a 10 × 10 square. These locations are shown in Fig. 1. Once the locations are known, the distance matrix, D, can be calculated; all of the weight matrices discussed here are based on D (see Table I). Once W is calculated, the population variance/covariance matrix is given by (3). In the illustration, I generate the correlations3 implied by the choice of W for each of the commonly used methods of specifying it: nearest neighbors, W_ij = 1 if D_ij < L, and W_ij = 1/D_ij^P. In addition to choosing the spatial weighting scheme, the researcher must also choose a parameter pertaining to it. For example, if the researcher chooses nearest neighbors, he must also decide the number of neighbors to use. For W_ij = 1 if D_ij < L, the researcher must decide the distance limit (L). And for the inverse distance weight matrix, the researcher must decide the power to which the denominator is raised (P). These choices
3 The correlations are derived from (3) as follows: Corr_ij = V_ij / √(V_ii V_jj).
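The correlations implied by Eq. (3) can be computed directly for any of the weighting schemes. The sketch below draws its own ten random locations in a 10 × 10 square, so the numbers will resemble, but not reproduce, Tables II through IV.

```python
import numpy as np

def implied_correlations(W, rho, sigma2=1.0):
    # Eq. (3): V = sigma^2 (I - rho W)^{-1} (I - rho W')^{-1},
    # then Corr_ij = V_ij / sqrt(V_ii V_jj) as in footnote 3.
    A = np.linalg.inv(np.eye(W.shape[0]) - rho * W)
    V = sigma2 * A @ A.T
    d = np.sqrt(np.diag(V))
    return V / np.outer(d, d)

rng = np.random.default_rng(1)
xy = rng.uniform(0, 10, size=(10, 2))          # 10 points in a 10 x 10 square
D = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)

# (i) one nearest neighbor
W_nn = np.zeros_like(D)
nearest = np.argmin(D + np.eye(10) * 1e9, axis=1)   # block the diagonal
W_nn[np.arange(10), nearest] = 1.0
W_nn /= W_nn.sum(axis=1, keepdims=True)             # row-standardize

# (ii) W_ij = 1 if D_ij < L
L = 3.0
W_L = ((D < L) & (D > 0)).astype(float)

# (iii) W_ij = 1 / D_ij^P
P = 2
W_P = np.where(D > 0, 1.0 / np.where(D > 0, D, 1.0) ** P, 0.0)

K = implied_correlations(W_nn, rho=0.67)       # 10 x 10 correlation matrix
```

Row-standardizing the binary schemes, as the paper notes for the nearest-neighbor case, keeps the admissible ρ in (−1, 1).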


FIG. 1. Locations.

(the form of the weight matrix and the value of the parameter) are made a priori by the researcher; the resulting weight matrix is taken as given. As the illustration below shows, these choices change the nature of the implied correlations considerably. A useful tool for representing spatial dependencies is the correlogram. The correlogram shows the correlations between points, graphed as a function of the distance separating them. Although not necessary, a nice property for correlograms to exhibit is that the correlations decline as separation distance increases. This is in accordance with Tobler's (1970) first law of geography: everything is related to everything else, but near things are more related than distant things. In the illustration, I show the correlograms for each of the spatial weighting schemes for different values of the parameters (which are normally chosen by the researcher) and for different values of ρ (the autocorrelation parameter, which is normally estimated). I use three values of the parameters and two values of ρ, which gives six correlograms for each weighting scheme. These are shown in Figs. 2 through 4. Note that these correlograms are not based on simulated data, but are the population correlograms, given the locations and the choice of W.4 I also present one weight matrix and one correlation matrix for each scheme; these are shown in Tables II through IV. Finally, the distance matrix for the data is presented in Table I.

TABLE I
Distance Matrix

        1     2     3     4     5     6     7     8     9    10
  1  0.00  2.80  5.64  3.45  3.43  3.76  1.53  3.14  5.67  4.42
  2  2.80  0.00  7.70  6.11  2.06  6.01  4.32  0.55  8.45  2.32
  3  5.64  7.70  0.00  5.82  9.00  7.26  4.61  7.71  4.14  7.82
  4  3.45  6.11  5.82  0.00  5.93  1.52  2.33  6.52  3.33  7.86
  5  3.43  2.06  9.00  5.93  0.00  5.31  4.84  2.55  8.84  4.26
  6  3.76  6.01  7.26  1.52  5.31  0.00  3.20  6.50  4.76  8.05
  7  1.53  4.32  4.61  2.33  4.84  3.20  0.00  4.62  4.14  5.72
  8  3.14  0.55  7.71  6.52  2.55  6.50  4.62  0.00  8.72  1.77
  9  5.67  8.45  4.14  3.33  8.84  4.76  4.14  8.72  0.00  9.61
 10  4.42  2.32  7.82  7.86  4.26  8.05  5.72  1.77  9.61  0.00

2.a.1. Nearest neighbors. Figure 2A shows the correlograms for three choices of the number of nearest neighbors (1, 2, or 3) when the spatial dependencies are strong (ρ = 0.67). Note that because the weight matrices are row standardized,5 ρ must lie in the range −1 to 1. Two observations can be drawn from an examination of this figure. First, while the correlations implied by this choice of W tend to fall with separation distance, the correlations do not fall monotonically. For example, in Fig. 2A1, there are zeros interspersed with positive correlations. This means that points separated by the same distances can have very different correlations. This occurs for two reasons: (a) the definition of W itself and (b) the formulation of the variance/covariance matrix as the product of two inverted matrices. The definition comes into play because W_ij = 1 only for nearest neighbors. Consider a case where points A and B are 0.5 units apart and points A and C are 0.6 units apart.
For one nearest neighbor, only A and B are neighbors, and therefore, W_AC = 0. The presence of the inverted matrices is important, because it means that the locations of all points are taken into consideration when calculating the correlations. For example, consider row 2 of Table II (this is the correlation matrix for one nearest neighbor, when ρ = 0.67). Corr_2,8 is the highest in this row because 2 and 8 are nearest neighbors. The other correlations are not zero, however. Corr_2,5 is 0.826. This is because 2 is 5's nearest neighbor.6 Also, Corr_2,10 = 0.764. This illustrates a three-way interaction: 8 is nearest neighbor to both 10 and 2; therefore 10 is correlated with 2
4 These correlograms were generated by graphing the values in the population correlation matrix (obtained from Eq. (3)) against the values in the distance matrix.
5 Row standardized means that W is transformed so that the rows sum to one.
6 Note that the reverse is not true: 8 (and not 5) is 2's nearest neighbor. Thus W is not symmetric for the nearest neighbor model.
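The nearest-neighbor weight matrix, and the asymmetry noted in footnote 6, can be verified directly from the distances of Table I.

```python
import numpy as np

# Distance matrix of Table I.
D = np.array([
    [0.00, 2.80, 5.64, 3.45, 3.43, 3.76, 1.53, 3.14, 5.67, 4.42],
    [2.80, 0.00, 7.70, 6.11, 2.06, 6.01, 4.32, 0.55, 8.45, 2.32],
    [5.64, 7.70, 0.00, 5.82, 9.00, 7.26, 4.61, 7.71, 4.14, 7.82],
    [3.45, 6.11, 5.82, 0.00, 5.93, 1.52, 2.33, 6.52, 3.33, 7.86],
    [3.43, 2.06, 9.00, 5.93, 0.00, 5.31, 4.84, 2.55, 8.84, 4.26],
    [3.76, 6.01, 7.26, 1.52, 5.31, 0.00, 3.20, 6.50, 4.76, 8.05],
    [1.53, 4.32, 4.61, 2.33, 4.84, 3.20, 0.00, 4.62, 4.14, 5.72],
    [3.14, 0.55, 7.71, 6.52, 2.55, 6.50, 4.62, 0.00, 8.72, 1.77],
    [5.67, 8.45, 4.14, 3.33, 8.84, 4.76, 4.14, 8.72, 0.00, 9.61],
    [4.42, 2.32, 7.82, 7.86, 4.26, 8.05, 5.72, 1.77, 9.61, 0.00],
])

# W_ij = 1 iff j is i's single nearest neighbor (a large value blocks the diagonal).
n = D.shape[0]
nearest = np.argmin(D + np.eye(n) * 1e9, axis=1)
W = np.zeros_like(D)
W[np.arange(n), nearest] = 1.0

print(nearest + 1)  # 1-based nearest neighbors: [7 8 9 6 2 4 1 2 4 8]
```

Point 2's nearest neighbor is 8 (and the reverse also holds), while 5's nearest neighbor is 2 but not vice versa, so W is asymmetric, exactly as footnote 6 states.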


FIG. 2A. Nearest neighbor correlations: ρ = 0.67. (A1) One nearest neighbor; (A2) two nearest neighbors; (A3) three nearest neighbors.


FIG. 2B. Nearest neighbor correlations: ρ = 0.33. (B1) One nearest neighbor; (B2) two nearest neighbors; (B3) three nearest neighbors.


FIG. 3A. Correlograms for W_ij = 1 if D_ij < L: ρ = 0.67. (A1) L = 2; (A2) L = 3; (A3) L = 4.


FIG. 3B. Correlograms for W_ij = 1 if D_ij < L: ρ = 0.33. (B1) L = 2; (B2) L = 3; (B3) L = 4.


FIG. 4A. Correlograms for W_ij = 1/D_ij^P: ρ = 0.67. (A1) P = 1; (A2) P = 2; (A3) P = 3.


FIG. 4B. Correlograms for W_ij = 1/D_ij^P: ρ = 0.33. (B1) P = 1; (B2) P = 2; (B3) P = 3.


TABLE II
One Nearest Neighbor

A. Weight Matrix

       1  2  3  4  5  6  7  8  9 10
  1    0  0  0  0  0  0  1  0  0  0
  2    0  0  0  0  0  0  0  1  0  0
  3    0  0  0  0  0  0  0  0  1  0
  4    0  0  0  0  0  1  0  0  0  0
  5    0  1  0  0  0  0  0  0  0  0
  6    0  0  0  1  0  0  0  0  0  0
  7    1  0  0  0  0  0  0  0  0  0
  8    0  1  0  0  0  0  0  0  0  0
  9    0  0  0  1  0  0  0  0  0  0
 10    0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: ρ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
  2  0.00  1.00  0.00  0.00  0.83  0.00  0.00  0.92  0.00  0.76
  3  0.00  0.00  1.00  0.63  0.00  0.58  0.00  0.00  0.76  0.00
  4  0.00  0.00  0.63  1.00  0.00  0.92  0.00  0.00  0.83  0.00
  5  0.00  0.83  0.00  0.00  1.00  0.00  0.00  0.76  0.00  0.63
  6  0.00  0.00  0.58  0.92  0.00  1.00  0.00  0.00  0.76  0.00
  7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
  8  0.00  0.92  0.00  0.00  0.76  0.00  0.00  1.00  0.00  0.83
  9  0.00  0.00  0.76  0.83  0.00  0.76  0.00  0.00  1.00  0.00
 10  0.00  0.76  0.00  0.00  0.63  0.00  0.00  0.83  0.00  1.00

because of 8. Although the correlations are not monotonic in terms of separation distance, they are with respect to the strength of the relationships. Corr_2,8 is the highest because 2 and 8 are each other's nearest neighbors. Corr_2,5 is smaller because 2 is 5's nearest neighbor, but not the reverse. Corr_2,10 is smaller yet because 10 and 2 are related only indirectly, through 8. The second observation to be drawn from Fig. 2 is that the choice of the number of nearest neighbors changes the implied correlation structure considerably. For one nearest neighbor, the correlations decline with distance. For two nearest neighbors, there are no zero correlations, because all of the points are related, either directly or indirectly. For three nearest neighbors, all of the correlations are about the same, because all of the points are related to each other (recall that there are only 10 observations). This is a potential weakness of this approach, because the researcher generally chooses (rather than estimates) the number of neighbors.

2.a.2. W_ij = 1 if D_ij < L. This spatial weighting scheme is similar in


TABLE III
W_ij = 1 if D_ij < 2

A. Weight Matrix

       1  2  3  4  5  6  7  8  9 10
  1    0  0  0  0  0  0  1  0  0  0
  2    0  0  0  0  0  0  0  1  0  0
  3    0  0  0  0  0  0  0  0  0  0
  4    0  0  0  0  0  1  0  0  0  0
  5    0  0  0  0  0  0  0  0  0  0
  6    0  0  0  1  0  0  0  0  0  0
  7    1  0  0  0  0  0  0  0  0  0
  8    0  1  0  0  0  0  0  0  0  1
  9    0  0  0  0  0  0  0  0  0  0
 10    0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: ρ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
  2  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.87  0.00  0.72
  3  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  4  0.00  0.00  0.00  1.00  0.00  0.92  0.00  0.00  0.00  0.00
  5  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00
  6  0.00  0.00  0.00  0.92  0.00  1.00  0.00  0.00  0.00  0.00
  7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
  8  0.00  0.87  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.87
  9  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00
 10  0.00  0.72  0.00  0.00  0.00  0.00  0.00  0.87  0.00  1.00

concept to nearest neighbors. The weight matrix is still binary; however, rather than specifying the number of ones in each row, this number is determined by setting a maximum distance within which points can influence each other. Unlike the nearest neighbor scheme, this W is always symmetric. Despite the similarities, the two schemes produce different correlation patterns (see Table III). When one nearest neighbor is used, each row of W contains exactly one 1. When L = 2, the number of 1s contained in each row of W is zero, one, or two, with the average being 0.75. Thus, L = 2 is somewhat comparable to one nearest neighbor. However, L = 2 gives much different correlations: there are only four off-diagonal nonzero correlations and these are very large. When L = 3, the rows of W contain between zero and four 1s each, with the average being 1.8. But again the correlations are very different from two nearest neighbors: the correlations fall off more markedly with distance and some pairs exhibit zero correlation. When L = 4, the average number of ones is 3, but again the pattern of correlations is much different from three nearest neighbors.


TABLE IV
W_ij = 1/D_ij^3

A. Weight Matrix

        1     2     3     4     5     6     7     8     9    10
  1  0.00  0.36  0.18  0.29  0.29  0.27  0.65  0.32  0.18  0.23
  2  0.36  0.00  0.13  0.16  0.48  0.17  0.23  1.81  0.12  0.43
  3  0.18  0.13  0.00  0.17  0.11  0.14  0.22  0.13  0.24  0.13
  4  0.29  0.16  0.17  0.00  0.17  0.66  0.43  0.15  0.30  0.13
  5  0.29  0.48  0.11  0.17  0.00  0.19  0.21  0.39  0.11  0.23
  6  0.27  0.17  0.14  0.66  0.19  0.00  0.31  0.15  0.21  0.12
  7  0.65  0.23  0.22  0.43  0.21  0.31  0.00  0.22  0.24  0.17
  8  0.32  1.81  0.13  0.15  0.39  0.15  0.22  0.00  0.11  0.56
  9  0.18  0.12  0.24  0.30  0.11  0.21  0.24  0.11  0.00  0.10
 10  0.23  0.43  0.13  0.13  0.23  0.12  0.17  0.56  0.10  0.00

B. Correlation Matrix: ρ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.32  0.45  0.40  0.36  0.37  0.77  0.31  0.41  0.30
  2  0.32  1.00  0.28  0.13  0.73  0.12  0.23  0.92  0.20  0.75
  3  0.45  0.28  1.00  0.42  0.29  0.39  0.48  0.28  0.54  0.27
  4  0.40  0.13  0.42  1.00  0.18  0.82  0.48  0.12  0.57  0.13
  5  0.36  0.73  0.29  0.18  1.00  0.18  0.28  0.72  0.23  0.61
  6  0.37  0.12  0.39  0.82  0.18  1.00  0.44  0.12  0.51  0.13
  7  0.77  0.23  0.48  0.48  0.28  0.44  1.00  0.22  0.47  0.23
  8  0.31  0.92  0.28  0.12  0.72  0.12  0.22  1.00  0.19  0.78
  9  0.41  0.20  0.54  0.57  0.23  0.51  0.47  0.19  1.00  0.19
 10  0.30  0.75  0.27  0.13  0.61  0.13  0.23  0.78  0.19  1.00

Here, the correlations fall off with separation distance, rather than being approximately constant, as for three nearest neighbors. However, this case is similar to nearest neighbors, in that the choice of the parameter greatly affects the correlation pattern.

2.a.3. W_ij = 1/D_ij^P. In this formulation, the elements of W are fractions. This is a departure from the earlier cases, both of which resulted in binary weight matrices. When this case is compared to the previous cases, it is important to remember that the larger P, the smaller the band of influence. Thus, P = 3 is closest to one nearest neighbor and to L = 2. The correlations for this case tend to fall with separation distance, particularly for the smaller bandwidths (see Table IV). The variation in the correlations is largest when the bandwidth is small, because this allows the indirect relations to show up. As the bandwidth increases, more of the points share neighbors, and so the correlations become more uniform. Finally, note that


the pattern of correlations produced by this scheme is markedly different from those produced by the other spatial weighting schemes.

2.a.4. Discussion. This illustration has demonstrated that different spatial weighting schemes produce markedly different implied correlation patterns. Furthermore, the choice of the parameter, which must be set once the family of weighting schemes is specified, also affects the implied correlations. This is problematic for a number of reasons. First, the weighting schemes discussed here are all plausible, and yet they imply different things for the data. Second, most tests of the presence of spatial autocorrelation are conditional on the choice of W. For example, Moran's I statistic, which is one of the most commonly used tests of spatial autocorrelation, is given by the formula

I = N(e′We) / [S(e′e)],        (4)

where N is the number of observations, e is a vector of regression residuals, S is a standardization factor, and W is the weight matrix. Clearly the results of this test will be conditional on the researcher's choice of W.7 This problem is illustrated by a recent article by Can (1992). In this paper, Can uses three weighting schemes: W1_ij = 1 if D_ij < 5 miles, W2_ij = 1/D_ij, and W3_ij = 1/D_ij^2. She also uses two functional forms of the hedonic regression: linear and semilog. This gives six combinations. Three of these combinations show significant spatial autocorrelation and three do not. Can has no way of knowing which is the correct specification and therefore whether the errors are spatially correlated or not.8 It would seem that users of this approach to modeling spatial autocorrelation should move in the direction of estimating the parameters of the weight matrix.

2.b. Second Approach: Direct Specification of the Covariance Structure

In this approach, rather than starting with the process and deriving the covariance matrix, a functional form for the covariance structure is assumed. The parameters of this function are then estimated, along with the regression coefficients, using maximum likelihood methods. Functions are chosen which cause the correlations to fall as separation distance increases. The following are all permissible functions:
7 Kelejian and Robinson (1982) provide a test of spatial autocorrelation that does not use a weight matrix.
8 It is possible that the likelihood values could give some guidance as to which model best fits the data. However, this requires that the models be nested.

Negative Exponential:

K_ij = b1 exp(−D_ij / b2)        (5)

Gaussian:

K_ij = b1 exp(−D_ij² / b2)        (6)

Spherical:

K_ij = b1 (1 − 3D_ij/(2b2) + D_ij³/(2b2³))   if 0 ≤ D_ij ≤ b2
K_ij = 0                                     if D_ij > b2        (7)
where K is the correlation matrix for the error terms (and σ²K = V). The correlograms for these models for the simulated data are shown in Figs. 5 through 7. These figures differ from those for the weight matrix method. For example, in Fig. 2A, the three panels represent different choices made by the researcher: the number of nearest neighbors to consider. In Fig. 5, the three panels represent different values of b1, where b1 is estimated. Once the researcher picks the functional form, the data determine which of the nine functions shown is best (of course, values of b1 and b2 other than those shown in the figure are possible). The three functions result in similar graphs. The Gaussian correlogram falls off faster with separation distance than does the Negative Exponential. The Gaussian also has somewhat more weight at very small separation distances. This is difficult to see from these figures, however, because of the lack of observations with small separation distances (there is only one pair separated by a distance smaller than 1.5).9 As depicted in Fig. 7, the Spherical correlogram looks very much like the Negative Exponential. In reality, the functions differ in their behavior near the origin, where the Spherical model produces higher correlations. These functions are much smoother than the implied correlograms for the various weight matrices. This is because the correlations are modeled directly, and thus, all points separated by a given distance will have the
9 This turns out to be a problem in empirical work as well. Typically, there are many pairs with large separation distances but a much smaller number with small separation distances. This can make it difficult to fit the beginning of the curve.
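Equations (5) through (7) translate directly into code. Setting the diagonal of K to one is an assumption of this sketch: with b1 below one, the functions themselves do not reach one at zero separation, and the gap plays the role of the geostatistical nugget effect.

```python
import numpy as np

def negative_exponential(D, b1, b2):
    # Eq. (5): K_ij = b1 exp(-D_ij / b2)
    return b1 * np.exp(-D / b2)

def gaussian(D, b1, b2):
    # Eq. (6): K_ij = b1 exp(-D_ij^2 / b2)
    return b1 * np.exp(-D**2 / b2)

def spherical(D, b1, b2):
    # Eq. (7): b1 (1 - 3D/(2 b2) + D^3/(2 b2^3)) for 0 <= D <= b2, else 0
    K = b1 * (1.0 - 3.0 * D / (2.0 * b2) + D**3 / (2.0 * b2**3))
    return np.where(D <= b2, K, 0.0)

def correlation_matrix(D, correlogram, b1, b2):
    # Off-diagonal correlations from the chosen function; unit diagonal.
    K = correlogram(D, b1, b2)
    np.fill_diagonal(K, 1.0)
    return K
```

Note that the spherical function is continuous at D = b2, where the bracketed term is exactly zero.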


FIG. 5. Correlograms for K_ij = b1 exp(−D_ij/b2). (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.


FIG. 6. Correlograms for K_ij = b1 exp(−D_ij²/b2). (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.


FIG. 7. Correlograms for the spherical case. (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.


same correlation, regardless of the location of other points. This is not the case for the weight matrix correlograms. For example, in the case of one nearest neighbor and ρ = 0.95, points 5 and 8 have a correlation of 0.629 and are separated by a distance of 2.549. Points 4 and 7 are closer (separation distance equals 2.327), but have a correlation of zero. This seeming anomaly occurs because point 2 provides the link between points 5 and 8 (as described earlier), while points 4 and 7 have other points which are closer to them and therefore are not nearest neighbors.

2.c. Discussion

As pointed out above, there are two main approaches to modeling spatial autocorrelation: the weight matrix approach and the direct approach. Within each approach, there are alternatives available to the researcher (i.e., the method of forming the weight matrix or the functional form for the direct approach). As shown by the figures, each alternative implies a different assumption about the spatial relationships in the data. The literature currently provides little guidance about which models work best in which situations. However, two points seem clear. First, to the extent possible, it is probably better to estimate the parameters of the model, rather than choosing them a priori. Second, any spatial modeling of the error terms, in a situation when autocorrelation is likely to be present, will dominate a model which ignores the problem completely.

3. ESTIMATION

Once a model (weight matrix or direct approach) has been chosen to represent the covariance structure of the error terms, it can be estimated via Maximum Likelihood.10 In Maximum Likelihood estimation, the following log likelihood function is maximized with respect to the unknown parameters:

ln(L) = −(n/2) ln(2π) − (1/2) ln|V| − (1/2)(Y − Xβ)′V⁻¹(Y − Xβ).        (8)

The unknowns are the regression coefficients (β), the error variance (σ²),
10 Other techniques are available. For example, in the direct approach, one technique is to fit (usually by eye) the parameters to an empirical correlogram (which is the average correlation among all points in a given separation distance range, plotted against separation distance). Once the parameters of the correlation function have been estimated, EGLS (estimated generalized least squares) can be used to obtain the regression coefficients. These techniques will not be discussed further here.


and the parameters of V (b1 and b2, or ρ, depending on which approach is used). One nice byproduct of the ML approach is that a likelihood ratio test can be used to determine the presence of spatial autocorrelation: two times the difference between the likelihood functions of the restricted and unrestricted models is distributed as a χ² random variable. Here the restricted model is OLS, i.e., restricting V to be σ² times the identity matrix. The degrees of freedom are 1 for the weight matrix approach and 2 for the direct approach.
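A sketch of the log likelihood of Eq. (8) for the direct approach, here with the Negative Exponential correlogram of Eq. (5); the parameter layout and the bounds check are illustrative choices, not the paper's prescription.

```python
import numpy as np

def negative_log_likelihood(params, Y, X, D):
    # Eq. (8): ln L = -(n/2) ln 2pi - (1/2) ln|V| - (1/2)(Y - Xb)' V^{-1} (Y - Xb),
    # with V = sigma2 * K and K the Negative Exponential correlogram of Eq. (5).
    b1, b2, sigma2 = params[:3]
    beta = np.asarray(params[3:])
    if not (0.0 < b1 < 1.0 and b2 > 0.0 and sigma2 > 0.0):
        return np.inf  # keep a numerical optimizer inside the admissible region
    K = b1 * np.exp(-D / b2)
    np.fill_diagonal(K, 1.0)  # unit correlation of each error with itself
    V = sigma2 * K
    r = Y - X @ beta
    _, logdet = np.linalg.slogdet(V)
    n = len(Y)
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(V, r))
```

Minimizing this function with a numerical optimizer gives the ML estimates; evaluating it again with K restricted to the identity matrix gives the restricted (OLS) likelihood for the likelihood ratio test described above.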

4. OTHER ISSUES

4.a. Sample Size

V is an N × N matrix, where N is the sample size. The log likelihood function contains both the determinant and the inverse of this matrix. Thus, the computational burden increases with sample size. However, since the accuracy of the estimates also increases with the sample size, it is important to use a large sample size in these problems. Pace (1997) has suggested the use of sparse matrix techniques to facilitate the use of large samples. If V is specified so that the number of nonzero elements is relatively small, these methods can reduce the computational burden considerably.

4.b. Measurement of Separation Distance

Urban areas vary in the density of development. Therefore, it is possible that neighborhood size varies with the location of the neighborhood within the city: dense areas may have neighborhoods which are more compact, while suburban areas may have geographically larger neighborhoods. Researchers may wish to account for this by using separation measures other than geographic distance. For example, Dubin (1992) measures separation distance in terms of houses.

4.c. Functional Form of the Regression

Hedonic regressions are reduced form, and economic theory has little to say about the proper functional form of such an equation. Most of this paper addresses the issue of the assumed form of the covariance structure. Clearly the functional form of the regression itself is of even greater importance: the regression residuals will not reflect the true error structure if the wrong functional form is used.
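Returning to the sparse-matrix point of Section 4.a, the idea can be sketched with scipy. The one-nearest-neighbor W below is hypothetical; the sparse LU factorization supplies both solves with (I − ρW) and its log determinant, which the weight matrix likelihood needs since ln|V| = N ln σ² − 2 ln|I − ρW| under Eq. (3).

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

rng = np.random.default_rng(3)
N = 500

# Hypothetical one-nearest-neighbor weight matrix: one nonzero per row,
# so W stores N entries instead of N^2.
rows = np.arange(N)
nn = (rows + 1 + rng.integers(0, N - 1, size=N)) % N  # some index j != i
W = sparse.csc_matrix((np.ones(N), (rows, nn)), shape=(N, N))

rho = 0.5
A = sparse.eye(N, format="csc") - rho * W

# Sparse LU factorization: L has a unit diagonal, so ln|det A| is the sum
# of the logs of |U_ii| (row/column permutations only flip the sign).
lu = splu(A)
log_abs_det = np.log(np.abs(lu.U.diagonal())).sum()

# Solves with A never form the dense inverse.
u = lu.solve(rng.normal(size=N))
```

For truly large samples this keeps both memory and computation roughly proportional to the number of nonzero weights rather than to N².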


5. FURTHER READING

Below, I provide an annotated list of sources which the interested reader may wish to consult. Some of these are cited elsewhere in this paper.

Texts

1. Anselin (1988). This book provides an extremely complete presentation of the weight matrix approach.
2. Ripley (1981). Chapter 4 of this book provides an excellent discussion of kriging (prediction incorporating the spatially autocorrelated errors).
3. Upton and Fingleton (1985). In Chapter 5, the authors discuss regression with autocorrelated errors, using the weight matrix approach. This book is particularly nice because data and solutions are provided for most of the techniques discussed.
4. Anselin and Florax (1995). This book is an edited collection of many interesting papers on spatial econometrics.

Papers

1. Dubin (1988). This is probably the first application of these techniques to estimating a hedonic regression. This paper uses the direct approach.
2. Can (1992). An example of a hedonic estimation using the weight matrix technique.
3. Pace et al. (1998). Provides an example of estimating the number of nearest neighbors in the weight matrix.
4. Basu and Thibodeau (1998). Kriges housing prices in Dallas.
5. Dubin (1998). Discusses the issues involved in kriging housing prices.
6. Pace (1997). Uses sparse matrix techniques to facilitate the estimation.

REFERENCES
Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.
Anselin, L., and Florax, R. (1995). New Directions in Spatial Econometrics. Berlin: Springer-Verlag.
Basu, S., and Thibodeau, T. (1998). Analysis of Spatial Autocorrelation in House Prices, J. Real Estate Finance Econ. 17, 61–86.
Can, A. (1992). Specification and Estimation of Hedonic Housing Price Models, Reg. Sci. Urban Econ. 22, 453–474.


Cliff, A. D., and Ord, J. K. (1981). Spatial Processes: Models and Applications. London: Pion.
Dubin, R. A. (1988). Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms, Rev. Econ. Statist., 168–173.
Dubin, R. A. (1992). Spatial Autocorrelation and Neighborhood Quality, Reg. Sci. Urban Econ. 22, 433–452.
Dubin, R. A. (1998). Predicting House Prices Using Multiple Listings Data, J. Real Estate Finance Econ. 17, 35–60.
Dubin, R. A., Pace, K., and Thibodeau, T. (forthcoming). Spatial Autoregression Techniques for Real Estate Data.
Kelejian, H. H., and Robinson, D. P. (1982). Spatial Autocorrelation: A New Computationally Simple Test with an Application to Per Capita County Policy Expenditures, Reg. Sci. Urban Econ. 22, 317–332.
Matheron, G. (1963). Principles of Geostatistics, Econ. Geol. 58, 1246–1266.
Pace, K. (1997). Performing Large Spatial Regressions and Autoregressions, Econ. Lett., 283–291.
Pace, K., and Gilley, O. (1998). Generalizing the OLS and Grid Estimators, Real Estate Econ., 331–347.
Pace, K., Barry, R., Clapp, J. M., and Rodriguez, M. (1998). Spatiotemporal Autoregressive Models of Neighborhood Effects, J. Real Estate Finance Econ., 15–34.
Ripley, B. D. (1981). Spatial Statistics. New York: Wiley.
Tobler, W. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region, Econ. Geog. Supplement 46, 234–240.
Upton, G., and Fingleton, B. (1985). Spatial Data Analysis by Example. New York: Wiley.
