Distance Analysis: Nearest Neighbor Index (Nna)


Chapter 5

Distance Analysis
In this chapter, tools that identify characteristics of the distances between points will be described. The previous chapter provided tools for describing the general spatial distribution of crime incidents, or first-order properties of the incident distribution (Bailey and Gatrell, 1995). First-order properties are global because they represent the dominant pattern of distribution: where it is centered, how far it spreads out, and whether there is any orientation or direction to its dispersion. Second-order (or local) properties, on the other hand, refer to sub-regional or ‘neighborhood’ patterns within the overall distribution. If there are distinct ‘hot spots’ where many crime incidents cluster together, their distribution is spatially related not so much to the overall global pattern as to something unique in the sub-region or neighborhood. Thus, second-order characteristics tell something about particular environments that may concentrate crime incidents. Figure 5.1 shows the distance analysis screen and the distance statistics that are calculated by CrimeStat.

Nearest Neighbor Index (Nna)

One of the oldest distance statistics is the nearest neighbor index. It is particularly useful because it is a simple tool to understand and to calculate. It was developed by two botanists in the 1950s (Clark and Evans, 1954), primarily for field work, but it has been used in many different fields for a wide variety of problems (Cressie, 1991). It has also become the basis of many other types of distance statistics, some of which are implemented in CrimeStat.

The nearest neighbor index compares the distances between nearest points and distances that would be expected on the basis of chance. It is the ratio of two summary measures. First, there is the nearest neighbor distance. For each point (or incident location) in turn, the distance to the closest other point (its nearest neighbor) is calculated, and these distances are averaged over all points.

$$ \text{Nearest Neighbor Distance} = d(NN) = \sum_{i=1}^{N} \frac{\min(d_{ij})}{N} \tag{5.1} $$

where min(d_ij) is the distance between each point and its nearest neighbor and N is the number of points in the distribution. Thus, in CrimeStat, the distance from a single point to every other point is calculated and the smallest distance (the minimum) is selected. Then, the next point is taken and the distance to all other points (including the first point measured) is calculated, with the nearest being selected and added to the first minimum distance. This process is repeated until all points have had their nearest neighbor selected. The total sum of the minimum distances is then divided by N, the sample size, to produce an average minimum distance.
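The averaging procedure just described can be sketched in a few lines of Python. This is a minimal brute-force illustration, not CrimeStat's own code: the function name is hypothetical, points are assumed to be (x, y) pairs in a projected coordinate system (not latitude/longitude), and distances are straight-line (direct) distances.

```python
import math


def mean_nearest_neighbor_distance(points):
    """d(NN) from equation 5.1: the average of each point's minimum distance.

    Brute force, O(N^2): for every point, measure the straight-line distance
    to every other point, keep the smallest, then divide the sum by N.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / n


# Corners of a unit square: every point's nearest neighbor is 1 unit away.
print(mean_nearest_neighbor_distance([(0, 0), (1, 0), (0, 1), (1, 1)]))  # 1.0
```

For large files a spatial index (such as a k-d tree) would avoid the quadratic cost; the double loop here simply mirrors the step-by-step description in the text.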

Figure 5.1: Distance Analysis Screen
The second summary measure is the expected nearest neighbor distance if the distribution of points is completely spatially random. This is the mean random distance (or the mean random nearest neighbor distance). It is defined as

$$ \text{Mean Random Distance} = d(ran) = 0.5\sqrt{\frac{A}{N}} \tag{5.2} $$

where A is the area of the region and N is the number of incidents. Since A is defined by the square of the unit of measurement (e.g., square miles, square meters, etc.), it yields a random distance measure in the same units (i.e., miles, meters, etc.).¹ If the user defines an area on the measurement parameters page, CrimeStat will use the specified area in calculating the mean random distance. If no area measurement is provided, CrimeStat will take the rectangle defined by the minimum and maximum X and Y values of the points.
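Equation 5.2, including the fallback to the bounding rectangle, can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the area must be in the square of the same unit as the coordinates.

```python
import math


def mean_random_distance(points, area=None):
    """d(ran) from equation 5.2: 0.5 * sqrt(A / N).

    If no study-area size is supplied, fall back to the rectangle spanned by
    the minimum and maximum X and Y coordinates of the points.
    """
    n = len(points)
    if area is None:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return 0.5 * math.sqrt(area / n)


# Only N matters when the area is supplied; 1,181 placeholder points stand in
# for the street robberies of Example 1 (A = 607 square miles).
print(round(mean_random_distance([(0.0, 0.0)] * 1181, area=607), 4))  # 0.3585
```

This gives about 0.3585 miles, close to the 0.35837 miles reported in Table 5.1; the County's area is stated only as "about 607 square miles", which likely accounts for the small difference.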

The nearest neighbor index is the ratio of the observed nearest neighbor distance to the mean random distance:

$$ \text{Nearest Neighbor Index} = NNI = \frac{d(NN)}{d(ran)} \tag{5.3} $$

Thus, the index compares the average distance from each point to its closest neighbor with a distance that would be expected on the basis of chance. If the observed average distance is about the same as the mean random distance, then the ratio will be about 1.0. On the other hand, if the observed average distance is smaller than the mean random distance, that is, points are actually closer together than would be expected on the basis of chance, then the nearest neighbor index will be less than 1.0. This is evidence for clustering. Conversely, if the observed average distance is greater than the mean random distance, then the index will be greater than 1.0. This would be evidence for dispersion, that is, that points are more widely dispersed than would be expected on the basis of chance.
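The ratio and its reading can be sketched as below, using the distances later reported for street robberies in Table 5.1. The function names are hypothetical; the comparison with 1.0 only gives the direction of the departure, and whether that departure matters is left to the significance test.

```python
def nearest_neighbor_index(d_nn, d_ran):
    """NNI from equation 5.3: observed over expected mean nearest neighbor distance."""
    return d_nn / d_ran


def describe_nni(nni):
    """Direction of departure from 1.0; whether it matters is a job for the Z-test."""
    if nni < 1.0:
        return "clustered"
    if nni > 1.0:
        return "dispersed"
    return "random"


# Distances reported for the street robberies in Table 5.1.
nni = nearest_neighbor_index(0.11598, 0.35837)
print(round(nni, 4), describe_nni(nni))  # 0.3236 clustered
```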

Testing the Significance of the Nearest Neighbor Index

Some differences from 1.0 in the nearest neighbor index would be expected by chance. Clark and Evans (1954) proposed a Z-test to indicate whether the observed average nearest neighbor distance was significantly different from the mean random distance (Hammond and McCullagh, 1978; Ripley, 1981). The test is between the observed nearest neighbor distance and that expected from a random distribution and is given by

$$ Z = \frac{d(NN) - d(ran)}{SE_{d(ran)}} \tag{5.4} $$

where the standard error of the mean random distance is approximately given by:

$$ SE_{d(ran)} \approx \sqrt{\frac{(4-\pi)A}{4\pi N^{2}}} \approx \frac{0.26136}{\sqrt{N^{2}/A}} \tag{5.5} $$

with A being the area of the region and N the number of points. Other tests for the nearest neighbor distance have been suggested, as well as corrections for edge effects (see below). However, equations 5.4 and 5.5 are used most frequently to test the average nearest neighbor distance. See Cressie (1991) for details of other tests.
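Equations 5.2, 5.4 and 5.5 can be combined into one small function. This is a sketch with a hypothetical name, checked against the street robbery inputs of Example 1 (N = 1181, A = 607 square miles, d(NN) = 0.11598 miles).

```python
import math


def nn_z_test(d_nn, n, area):
    """Clark-Evans Z-test (equations 5.2, 5.4 and 5.5) for an observed d(NN)."""
    d_ran = 0.5 * math.sqrt(area / n)                              # eq. 5.2
    se = math.sqrt((4 - math.pi) * area / (4 * math.pi * n ** 2))  # eq. 5.5
    z = (d_nn - d_ran) / se                                        # eq. 5.4
    return d_ran, se, z


# Street robberies in Example 1.
d_ran, se, z = nn_z_test(0.11598, 1181, 607)
print(round(se, 5), round(z, 2))  # 0.00545 -44.47
```

The two forms of equation 5.5 agree because sqrt((4 - π)/(4π)) ≈ 0.26136. The result matches Table 5.1 (standard error 0.00545 mi, Z = -44.4672) to within the rounding of the area.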

Calculating the statistics

Once nearest neighbor analysis has been selected, the user clicks on Compute to run the routine. The program outputs 10 statistics:

1. The sample size
2. The mean nearest neighbor distance
3. The standard deviation of the nearest neighbor distance
4. The minimum distance
5. The maximum distance
6. The mean random distance for both the bounding rectangle and the user input area, if provided
7. The mean dispersed distance for both the bounding rectangle and the user input area, if provided
8. The nearest neighbor index for both the bounding rectangle and the user input area, if provided
9. The standard error of the nearest neighbor index for both the maximum bounding rectangle and the user input area, if provided
10. A significance test of the nearest neighbor index (Z-test)

In addition, the output can be saved to a ‘.dbf’ file, which can then be imported into a spreadsheet or graphics program.

Example 1: The nearest neighbor index for street robberies

In 1996, there were 1181 street robberies in Baltimore County. The area of the County is about 607 square miles and is specified on the measurement parameters page. CrimeStat returns the statistics shown in Table 5.1 with the Nna routine.

CrimeStat does not provide the significance level of the test, only the Z-value. However, the significance level of the Z-value can be found in any table of standard normal deviates. In this case, a Z-value of -44.4672 is highly significant (p ≤ .001). In other words, the average nearest neighbor distance for street robberies in Baltimore County is significantly smaller than would be expected if the robberies were randomly distributed.

Table 5.1
Nearest Neighbor Statistics for
1996 Street Robberies in Baltimore County
N=1181

Mean nearest neighbor distance: 0.11598 mi
Mean random distance based on user input area: 0.35837 mi
Nearest neighbor index: 0.3236
Standard error: 0.00545 mi
Test statistic (Z): -44.4672
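Since CrimeStat reports only the Z-value, the significance level has to be looked up separately. As a sketch, Python's standard-library `statistics.NormalDist` (Python 3.8 or later) can stand in for a printed table of standard normal deviates. The function name is hypothetical, and the choice of a two-sided test is an assumption; a one-sided test for clustering would halve the value.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a standard normal deviate at least as extreme as |z|."""
    return 2 * NormalDist().cdf(-abs(z))


print(round(two_sided_p_value(-1.96), 3))  # 0.05
# For Table 5.1 the value is far smaller than double precision can hold.
print(two_sided_p_value(-44.4672))         # 0.0
```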

It should be noted that the significance test for the nearest neighbor index is not a test for complete spatial randomness, for which it is sometimes mistaken. It only tests whether the average nearest neighbor distance is significantly different from what would be expected on the basis of chance. In other words, it is a test of first-order nearest neighbor randomness.² There are also second-order, third-order, and higher-order nearest neighbor distributions that may or may not be significantly different from their corresponding orders under complete spatial randomness. A complete test would have to test for all of these effects, which are called K-order effects.

Example 2: The nearest neighbor index for residential burglaries

The nearest neighbor index and test can be very useful for understanding the degree of clustering of crime incidents in spite of their limitations. For example, in Baltimore County, the distribution of 6051 residential burglaries in 1996 yields the following nearest neighbor statistics (Table 5.2):

Table 5.2
Nearest Neighbor Statistics for
1996 Residential Burglaries in Baltimore County
N=6051

Mean nearest neighbor distance: 0.07134 mi
Mean random distance based on user input area: 0.16761 mi
Nearest neighbor index: 0.4256
Standard error: 0.00113 mi
Test statistic (Z): -85.4750

The clustering of residential burglaries is also highly significant. Now, suppose we want to compare the distribution of street robberies (Table 5.1) with that of residential burglaries (Table 5.2). The significance test is not very useful for this comparison because the sample sizes are so large (1181 v. 6051); the much larger absolute Z-value for residential burglaries mainly reflects the larger sample size. However, comparing the relative nearest neighbor indices can be meaningful:

$$ \text{Relative Nearest Neighbor Comparison} = \frac{NNI(A)}{NNI(B)} \tag{5.6} $$

where NNI(A) is the nearest neighbor index for one group (A) and NNI(B) is the nearest neighbor index for another group (B). Thus, comparing street robberies with residential burglaries, we have

$$ \frac{NNI(A)}{NNI(B)} = \frac{NNI(\text{robberies})}{NNI(\text{burglaries})} = \frac{0.3236}{0.4256} = 0.7603 $$

In other words, the distribution of street robberies relative to an expected random distribution appears to be more concentrated than that of burglaries relative to an expected random distribution. There is no simple significance test of this comparison since the standard error of the joint distributions is not known. But the relative index suggests that robberies are more concentrated than burglaries and, hence, are more likely to have ‘hot spots’ or ‘hot zones’ where they are particularly concentrated. This index, of course, does not prove that there are ‘hot spots’, but only points us towards the higher concentration of robberies relative to burglaries. In the previous chapter, it was shown that robberies had a smaller dispersion than burglaries. Here, however, the analysis is taken a step further to suggest that robberies are more concentrated than burglaries.
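Equation 5.6 is a plain ratio, sketched below with the nearest neighbor indices reported in Tables 5.1 and 5.2. The function name is hypothetical, and, as the text notes, no significance test accompanies the result.

```python
def relative_nn_comparison(nni_a, nni_b):
    """Equation 5.6: below 1.0 means group A is more concentrated, relative to
    its own random expectation, than group B is relative to its own."""
    return nni_a / nni_b


# Nearest neighbor indices reported in Tables 5.1 (robberies) and 5.2 (burglaries).
print(round(relative_nn_comparison(0.3236, 0.4256), 4))  # 0.7603
```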

K-Order Nearest Neighbors

As mentioned above, the nearest neighbor index is only an indicator of first-order spatial randomness. It compares the average distance to the nearest neighbor with an expected random distance. But what about the second nearest neighbor? Or the third nearest neighbor? Or the Kth nearest neighbor? CrimeStat constructs K-order nearest neighbor indices. On the distance analysis page, the user can specify the number of nearest neighbor indices to be calculated.

The K-order nearest neighbor routine returns four columns:

1. The order, starting from 1
2. The mean nearest neighbor distance for each order (in meters)
3. The expected nearest neighbor distance for each order (in meters)
4. The nearest neighbor index for each order

For each order, CrimeStat calculates the Kth nearest neighbor distance for each observation and then takes the average. The expected nearest neighbor distance for each order is calculated by:

$$ \text{Mean Random Distance to } K\text{th Nearest Neighbor} = d(K\,ran) = \frac{K\,(2K)!}{\left(2^{K}K!\right)^{2}\sqrt{N/A}} \tag{5.7} $$

where K is the order and ! is the factorial operation (e.g., 4! = 4 x 3 x 2 x 1; Thompson, 1956). The Kth nearest neighbor index is the ratio of the observed Kth nearest neighbor distance to the Kth mean random distance. There is not a good significance test for the Kth nearest neighbor index due to the non-independence of the different orders, though there have been attempts (see examples in Getis and Boots, 1978; Aplin, 1983). Consequently, CrimeStat does not provide a test of significance.
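The four output columns described above can be reproduced with a brute-force sketch. The function names are hypothetical, distances are straight-line distances between (x, y) points, and the factorial terms of equation 5.7 are divided as exact integers first so that large K does not overflow a floating-point number.

```python
import math


def expected_kth_nn_distance(k, n, area):
    """Equation 5.7: mean random distance to the Kth nearest neighbor."""
    # Exact integer arithmetic first, so large K does not overflow a float.
    ratio = (k * math.factorial(2 * k)) / (2 ** k * math.factorial(k)) ** 2
    return ratio / math.sqrt(n / area)


def k_order_table(points, max_k, area):
    """Rows of (K, mean observed Kth NN distance, expected distance, index)."""
    n = len(points)
    if not 1 <= max_k < n:
        raise ValueError("max_k must be between 1 and N - 1")
    # For each point, sort its distances to all other points once.
    sorted_dists = [
        sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for i, (xi, yi) in enumerate(points)
    ]
    rows = []
    for k in range(1, max_k + 1):
        observed = sum(d[k - 1] for d in sorted_dists) / n
        expected = expected_kth_nn_distance(k, n, area)
        rows.append((k, observed, expected, observed / expected))
    return rows


for row in k_order_table([(0, 0), (1, 0), (0, 1), (1, 1)], max_k=3, area=4.0):
    print(row)
```

With K = 1, equation 5.7 reduces to equation 5.2, since 1 x 2!/(2 x 1!)² = 0.5.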

There are no restrictions on the number of nearest neighbors that can be calculated. However, since the average distance increases with higher-order nearest neighbors, the potential for bias from edge effects will also increase. It is suggested that no more than 100 nearest neighbors be calculated.³

Nevertheless, the K-order nearest neighbor distance and index can be useful for understanding the overall spatial distribution. Figure 5.2 compares the K-order nearest neighbor index for street robberies with that of residential burglaries. The output was saved as a ‘.dbf’ file and then imported into a graphics program. The graph shows the nearest neighbor indices for both robberies and burglaries up to the 50th order (i.e., the 50th nearest neighbor). The nearest neighbor index ranges upward from 0 (extreme clustering); values below 1 indicate clustering and values above 1 indicate dispersion. Since a nearest neighbor index of 1 is expected under randomness, the thin straight line at 1.0 indicates the expected K-order index. As can be seen, both street robberies and residential burglaries are much more concentrated than K-order spatial randomness. Further, robberies are more concentrated than burglaries for each of the 50 nearest neighbors. Thus, the graph reinforces the analysis above: robberies are more concentrated than burglaries, and both are more concentrated than a random distribution.

In other words, even though there is not a good significance test for the K-order nearest neighbor index, a graph of the K-order indices (or the K-order distances) can give a picture of how clustered the distribution is, as well as allow comparisons in clustering between different types of crimes (or the same crime at two different time periods).

Edge Effects

It should be noted that there are potential edge effects that can bias the nearest neighbor index. An incident occurring near the border of the study area may actually have its nearest neighbor on the other side of the border. However, since there are usually no data on the distribution of incidents outside the study area, the program selects another point within the study area as the nearest neighbor of the border point. Thus, the observed nearest neighbor distance is probably greater than it should be; that is, the nearest neighbor distance tends to be overestimated. In other words, the incidents are probably more clustered than has been measured (see Cressie, 1991, for details).

Figure 5.2: K-Order Nearest Neighbor Indices, 1996 Street Robberies and Residential Burglaries
(Graph: nearest neighbor index, 0.0 to 2.0, by order of nearest neighbor; lines for street robberies, residential burglaries, and K-order spatial randomness at 1.0)


Nearest Neighbor Analysis
Man With A Gun Calls
Charlotte, N.C.: 1989

James L. LeBeau
Administration of Justice
Southern Illinois University-Carbondale

A comparison was made of Man with a Gun calls for the weekend in which Hurricane Hugo hit the North Carolina coast (September 22-24) with the following New Year’s Eve weekend (December 29-31, 1989). There were 146 Man with a Gun calls during the Hurricane Hugo weekend compared to 137 calls for New Year’s Eve.

Figure: Nearest Neighbor Index of Man With A Gun Calls
(Graph: nearest neighbor index, from clustered to dispersed, by order; lines for the New Year's Eve Weekend and the Hurricane Hugo Weekend)

The Nearest Neighbor Index in CrimeStat was used to compare the distributions. From the outset, the Hurricane Hugo Man With a Gun locations are more dispersed than those of New Year’s Eve. After the 5th nearest neighbor (Order 5), the differences become more pronounced.

Nearest Neighbor Edge Corrections

The default condition is no edge correction. However, one way that the measured distance to the nearest neighbor can be corrected for possible edge effects is to assume for each observed point that there is another point just outside the border at the closest distance. If the distance from a point to the border is shorter than to its measured nearest neighbor, then the nearer theoretical point is taken as a proxy for the nearest neighbor. Thus, for each point in the data set, the observed nearest neighbor distance is compared to the distance to the border: if the border is closer, the border distance is used; otherwise, the measured distance is kept. This correction has the effect of reducing the average nearest neighbor distance. Since it assumes that there is always another point at the border, it probably underestimates the true nearest neighbor distance. The true value is probably somewhere in between the measured and the assumed nearest neighbor distance.

CrimeStat has two different edge corrections. Because CrimeStat is not a GIS package, it cannot locate the actual border of a study area; that would require a topological GIS package in which the distance from each point to the nearest boundary is calculated. Instead, there are two different geometric models that can be applied. The first assumes that the study area is a rectangle, while the second assumes that the study area is a circle. Depending on the shape of the actual study area, one of these models may be appropriate.

Rectangular study area

In the rectangular adjustment, first, the area of the study area, A, is calculated, either from the user input on the measurement parameters tab or from the maximum bounding rectangle defined by the minimum and maximum X/Y values (see chapter 3). If the user provides an estimate of the area, the rectangle is proportionately re-scaled so that the area of the rectangle equals A. Second, for each point, the distance to the nearest other point is calculated. This is the observed nearest neighbor distance for point i.

Third, the minimum distance to the nearest edge of the rectangle is calculated and is compared to the observed nearest neighbor distance for point i. If the observed nearest neighbor distance for point i is equal to or less than the distance to the nearest border, it is retained. On the other hand, if the observed nearest neighbor distance for point i is greater than the distance to the nearest border, the distance to the border is used as a proxy for the nearest neighbor distance of point i.
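The three rectangular-adjustment steps can be sketched as below. The function name is hypothetical, and one detail is an assumption not stated in the text: "proportionately re-scaled" is taken to mean scaling the bounding rectangle about its center while keeping its proportions.

```python
import math


def rectangle_corrected_nn_distances(points, area=None):
    """Nearest neighbor distances with the rectangular edge correction.

    Each point's observed nearest neighbor distance is replaced by its
    distance to the nearest edge of the study rectangle when the edge is closer.
    """
    xs, ys = zip(*points)
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    if area is not None:
        # Assumption: rescale the bounding rectangle about its center, keeping
        # its proportions, until its area equals the user-supplied A.
        rect_area = (xmax - xmin) * (ymax - ymin)
        if rect_area <= 0:
            raise ValueError("points do not span a rectangle")
        s = math.sqrt(area / rect_area)
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        half_w, half_h = (xmax - xmin) / 2 * s, (ymax - ymin) / 2 * s
        xmin, xmax, ymin, ymax = cx - half_w, cx + half_w, cy - half_h, cy + half_h
    corrected = []
    for i, (xi, yi) in enumerate(points):
        observed = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        # Distance to the nearest edge; 0 if rescaling left the point outside.
        edge = max(0.0, min(xi - xmin, xmax - xi, yi - ymin, ymax - yi))
        corrected.append(min(observed, edge))
    return corrected


pts = [(1, 1), (1, 2), (5, 5), (9, 9)]
print(rectangle_corrected_nn_distances(pts, area=256.0))  # [1.0, 1.0, 5.0, 4.0]
```

Without a supplied area, the points that define the bounding rectangle lie on its edge, so this sketch gives them a corrected distance of 0; this illustrates why the correction probably understates the true nearest neighbor distance.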

Circular study area

In the circular adjustment, first, the area of the study area is calculated, either from the user input on the measurement parameters tab (see chapter 3) or from the maximum bounding rectangle defined by the minimum and maximum X/Y values. If the user has specified a study area on the measurement parameters page, then that value is taken for A and the radius of the circle is calculated by

$$ R = \sqrt{A/\pi} \tag{5.8} $$

If the user has not specified a study area on the measurement parameters page, then A is calculated from the minimum and maximum X and Y coordinates (the bounding rectangle) and the radius of the circle is calculated with equation 5.8.

Second, for each point, the distance to the nearest other point is calculated. This is the observed nearest neighbor distance for point i. Third, for each point, i, the distance from that point to the mean center, R_i, is calculated. Fourth, the minimum distance to the nearest edge of the circle is calculated using

$$ R_{iC} = R - R_{i} \tag{5.9} $$

Fifth, for each point, i, the observed minimum distance is compared to the distance to the nearest edge of the circle, R_iC. If the observed nearest neighbor distance for point i is equal to or less than the distance to the nearest edge, it is retained. On the other hand, if the observed nearest neighbor distance for point i is greater than the distance to the nearest edge, the distance to the border is used as a proxy for the true nearest neighbor distance of point i.
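The five circular-adjustment steps can be sketched in the same way. The function name is hypothetical, and the text does not say what happens to a point lying outside the assumed circle (where R - R_i is negative); clipping its edge distance at 0 is an assumption of this sketch.

```python
import math


def circle_corrected_nn_distances(points, area=None):
    """Nearest neighbor distances with the circular edge correction (eqs. 5.8-5.9)."""
    n = len(points)
    xs, ys = zip(*points)
    if area is None:
        # No user area: use the bounding rectangle of the points.
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    radius = math.sqrt(area / math.pi)           # eq. 5.8
    mx, my = sum(xs) / n, sum(ys) / n            # mean center
    corrected = []
    for i, (xi, yi) in enumerate(points):
        observed = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        r_i = math.hypot(xi - mx, yi - my)       # distance to the mean center
        # eq. 5.9; clipped at 0 for points outside the circle (an assumption).
        edge = max(0.0, radius - r_i)
        corrected.append(min(observed, edge))
    return corrected


cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
print(circle_corrected_nn_distances(cross, area=math.pi * 2.25))
```

Here the circle has radius 1.5 around the mean center (0, 0), so the four outer points, which are 0.5 from the edge but 1.0 from their nearest neighbor, take the edge distance, while the center point keeps its measured distance.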

For either correction

The average nearest neighbor distance is calculated and compared to the theoretical average nearest neighbor distance under random conditions. The indices and tests are the same as those described earlier in this chapter. Figure 5.3 below shows a graph of the K-order nearest neighbor index for the 50 nearest neighbors for 1996 motor vehicle thefts in police Precinct 11 of Baltimore County. The uncorrected nearest neighbor indices are compared with those corrected by a rectangle and by a circle. As can be seen, both corrections are very similar to the uncorrected index. However, they both show greater concentration than the uncorrected index. The rectangular correction shows greater concentration than the circular correction because the rectangle is less compact (i.e., the average distance from the center of the geometric object to the border is slightly larger). In general, the rectangle will lead to more correction than the circle since, on average, it substitutes the border distance for the measured nearest neighbor distance of more points (those nearer the border than to their measured nearest neighbor).

The user has to decide whether either of these corrections is meaningful. Depending on the shape of the study area, either correction may or may not be appropriate. If the study area is relatively rectangular, then the rectangular model may provide a good approximation. Similarly, if the study area is compact (circular), then the circular model may provide a good approximation. On the other hand, if the study area is of irregular shape, then either of these corrections may produce more distortion than the raw nearest neighbor index. One has to use these corrections with judgement. Also, in some cases, it may not make any sense to correct the measured nearest neighbor distances. In Honolulu, for example, one would not correct the measured nearest neighbor distances because there are no incidents outside the island’s boundary.

Figure 5.3: Correction of Nearest Neighbor Indices, Motor Vehicle Thefts in Precinct 11
(Graph: nearest neighbor index by order, with values below 1 concentrated, 1 random, and above 1 dispersed; lines for no correction, circular correction, and rectangular correction)
Linear Nearest Neighbor Index (Lnna)

The linear nearest neighbor index is a variation on the nearest neighbor routine, but one applied to a street network. All distances along this network are assumed to travel along a grid; hence, indirect distances are used. Whereas the nearest neighbor routine calculates the distance between each point and its nearest neighbor using direct distances, the linear nearest neighbor routine uses indirect (‘Manhattan’) distances (see chapter 3). Similarly, whereas the nearest neighbor routine calculates the expected distance between neighbors in a random distribution of N points using the geographical area of the study region, the linear nearest neighbor routine uses the total length of the street network.

The theory of linear nearest neighbors comes from Hammond and McCullagh (1978). The observed linear nearest neighbor distance, Ld(NN), is calculated by CrimeStat as the average of the indirect distances between each point and its nearest neighbor. The expected linear nearest neighbor distance is given by

$$ Ld(ran) = 0.5\left[\frac{L}{N-1}\right] \tag{5.10} $$

where L is the total length of the street network and N is the sample size (Hammond and McCullagh, 1978, p. 279). Consequently, the linear nearest neighbor index is defined as

$$ \text{Linear Nearest Neighbor Index} = LNNI = \frac{Ld(NN)}{Ld(ran)} \tag{5.11} $$
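Equations 5.10 and 5.11 can be sketched as below. The function names are hypothetical, and approximating network distance by the Manhattan distance between projected (x, y) points follows the grid assumption in the text; a real analysis on an irregular street network would measure distances along the network itself.

```python
def manhattan(p, q):
    """Indirect ('Manhattan') distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def expected_linear_nn_distance(network_length, n):
    """Ld(ran) from equation 5.10: 0.5 * L / (N - 1)."""
    return 0.5 * network_length / (n - 1)


def linear_nn_index(points, network_length):
    """Return (Ld(NN), Ld(ran), LNNI) using indirect distances (eq. 5.11)."""
    n = len(points)
    nearest = [
        min(manhattan(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    ld_nn = sum(nearest) / n
    ld_ran = expected_linear_nn_distance(network_length, n)
    return ld_nn, ld_ran, ld_nn / ld_ran


# Example 3 figures: 3,774 thefts on 3,333.54 miles of road.
print(round(expected_linear_nn_distance(3333.54, 3774), 2))  # 0.44
```

The last line reproduces the expected random distance of about 0.44 miles that Example 3 derives for the county's motor vehicle thefts.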

Testing the Significance of the Linear Nearest Neighbor Index

Since the theoretical standard error for the random linear nearest neighbor distance is not known, the author has constructed an approximate standard deviation for the observed linear nearest neighbor distance:

$$ S_{Ld(NN)} \approx \sqrt{\frac{\sum_{i=1}^{N}\left(\min(d_{ij}) - Ld(NN)\right)^{2}}{N-1}} \tag{5.12} $$

where min(d_ij) is the nearest neighbor distance for point i and Ld(NN) is the average linear nearest neighbor distance. This is the standard deviation of the linear nearest neighbor distances. The standard error is calculated by

$$ SE_{Ld(NN)} = \frac{S_{Ld(NN)}}{\sqrt{N}} \tag{5.13} $$

An approximate significance test can be obtained by

$$ t = \frac{Ld(NN) - Ld(ran)}{SE_{Ld(NN)}} \tag{5.14} $$

where Ld(NN) is the average linear nearest neighbor distance, Ld(ran) is the expected linear nearest neighbor distance (equation 5.10), and SE_Ld(NN) is the approximate standard error of the linear nearest neighbor distance (equation 5.13). Since the empirical standard deviation of the linear nearest neighbor distance is being used instead of a theoretical value, the test is a t-test rather than a Z-test.
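Equations 5.12 to 5.14 chain together as sketched below. The function name is hypothetical, and the input is assumed to be the list of each point's indirect nearest neighbor distance (computed as in the linear index above). Because the text does not state the degrees of freedom, the sketch stops at the t value rather than converting it to a significance level.

```python
import math


def linear_nn_t_test(nn_distances, network_length):
    """Approximate t-test for the linear nearest neighbor distance.

    nn_distances holds each point's indirect distance to its nearest neighbor.
    Returns (S, SE, t) from equations 5.12, 5.13 and 5.14.
    """
    n = len(nn_distances)
    if n < 2:
        raise ValueError("need at least two distances")
    ld_nn = sum(nn_distances) / n
    ld_ran = 0.5 * network_length / (n - 1)                                # eq. 5.10
    s = math.sqrt(sum((d - ld_nn) ** 2 for d in nn_distances) / (n - 1))  # eq. 5.12
    if s == 0:
        raise ValueError("all distances are equal; the standard error is zero")
    se = s / math.sqrt(n)                                                  # eq. 5.13
    t = (ld_nn - ld_ran) / se                                              # eq. 5.14
    return s, se, t


s, se, t = linear_nn_t_test([1.0, 2.0, 3.0, 4.0], network_length=30.0)
print(round(s, 4), round(se, 4), round(t, 4))
```

The N - 1 denominator in equation 5.12 makes S the ordinary sample standard deviation of the nearest neighbor distances.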

Calculating the statistics

On the measurement parameters page, there are two parameters that are input: the geographical area of the study region and the length of the street network. At the bottom of the page, the user must select which type of distance measurement to use, direct or indirect. If the measurement type is direct, then the nearest neighbor routine returns the standard nearest neighbor analysis (sometimes called areal nearest neighbor). On the other hand, if the measurement type is indirect, then the routine returns the linear nearest neighbor analysis. To calculate the linear nearest neighbor index, therefore, distance measurement must be specified as indirect and the length of the street network must be defined.

On ce n ea r est n eigh bor a n a lysis h a s been select ed, t h e u ser clicks on Com pute t o r u n
t h e r ou t ine. The L n n a rout ine out put s 9 stat istics:

1. The sample size
2. The mean linear nearest neighbor distance
3. The minimum linear distance between nearest neighbors
4. The maximum linear distance between nearest neighbors
5. The mean linear random distance
6. The linear nearest neighbor index
7. The standard deviation of the linear nearest neighbor distance
8. The standard error of the linear nearest neighbor distance
9. A significance test of the nearest neighbor index (t-test)
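To make the calculation concrete, here is a minimal Python sketch of the nine outputs listed above. It is not CrimeStat's implementation: the function `linear_nna` is hypothetical, the indirect (linear) distance is assumed to be the grid distance along east-west and north-south streets (the interpretation given in the K-order discussion below), and equation 5.10, which is not reproduced in this section, is assumed to take the form Ld(ran) = 0.5 × L / N, where L is the street-network length. That assumed form is consistent with the 0.44-mile figure in Example 3.

```python
import math
import statistics


def grid_distance(p, q):
    # "Indirect" distance: travel along east-west and north-south streets
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def linear_nna(points, network_length):
    """Hypothetical sketch of the nine Lnna outputs (not CrimeStat's code)."""
    n = len(points)
    # Linear nearest neighbor distance for every point (brute force, O(N^2))
    nn = [min(grid_distance(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    mean_nn = statistics.mean(nn)
    # ASSUMPTION: equation 5.10 taken as Ld(ran) = 0.5 * L / N
    expected = 0.5 * network_length / n
    sd = statistics.stdev(nn)          # empirical S_Ld(NN), sample version
    se = sd / math.sqrt(n)             # equation 5.13
    return {
        "sample_size": n,
        "mean_nn": mean_nn,
        "min_nn": min(nn),
        "max_nn": max(nn),
        "mean_random": expected,
        "index": mean_nn / expected,
        "sd_nn": sd,
        "se_nn": se,
        "t": (mean_nn - expected) / se,   # equation 5.14
    }


# Four incidents along an east-west road, with 20 miles of network
stats = linear_nna([(0, 0), (1, 0), (3, 0), (6, 0)], network_length=20)
print(stats["mean_nn"], stats["index"], stats["t"])
```

The brute-force search is O(N²), which is adequate for a few thousand incidents; using the sample (N - 1) standard deviation is also an assumption of this sketch.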

Example 3: Auto thefts along two highways

The linear nearest neighbor index is useful for analyzing the distribution of crime incidents along particular streets. For example, in Baltimore County, state highway 26 in the western part and state highway 150 in the eastern part have high concentrations of motor vehicle thefts (figure 5.4). In 1996, there were 87 vehicle thefts on highway 26 and 47 on highway 150. A GIS can be used with the linear nearest neighbor index to indicate whether these concentrations are greater than what would be expected on the basis of chance.

Figure 5.4: 1996 Auto Thefts in Baltimore County: Incident Distribution on State Highways 26 and 150 (map showing State Highway 26 and State Highway 150; scale bar in miles)

Table 5.3 presents the data. Using the GIS, we estimate that there are 3,333.54 miles of roadway segments; this number was estimated by adding up the total length of the street network in the GIS. Of all the road segments in Baltimore County, there are 241.04 miles of major arterial roads, of which state highway 26 has a total length of 10.42 miles and state highway 150 has a total road length of 7.79 miles.

In 1996, there were 3,774 motor vehicle thefts in the county. If these thefts were distributed randomly, then the random expected distance between incidents would be 0.44 miles (equation 5.10). Using this estimate, table 5.3 shows the number of incidents that would be expected on each of the two state highways if the distribution were random, and the ratio of the actual number of motor vehicle thefts to the expected number. As can be seen, the distribution of motor vehicle thefts is not random. On all major arterial roads, there are 2.2 times as many thefts as would be expected from a random spatial distribution. In fact, in 1996, of 28,551 road segments in Baltimore County, only 7,791 (27%) had one or more motor vehicle thefts occur on them; most of these are major roads. Further, on highway 26 there were 7.4 times as many thefts, and on highway 150 5.3 times as many, as would be expected if the distribution were random. Clearly, these two highways had more than their share of auto thefts in 1996.
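The "expected number if random" and "ratio of frequency" figures can be checked by allocating the 3,774 thefts to each road in proportion to its share of total road length. A small Python check using only the lengths and counts quoted above (the names are illustrative):

```python
# Road lengths (miles) and 1996 auto-theft counts as given in the text
TOTAL_LENGTH = 3333.54
TOTAL_THEFTS = 3774
ROADS = {
    "Highway 26": (10.42, 87),
    "Highway 150": (7.79, 47),
    "All major arterials": (241.04, 607),
}


def expected_if_random(length, total_length=TOTAL_LENGTH, total=TOTAL_THEFTS):
    # Under a random spread over the network, a road receives a share of
    # incidents equal to its share of total road length.
    return total * length / total_length


for name, (length, observed) in ROADS.items():
    expected = expected_if_random(length)
    print(f"{name}: expected {expected:.1f}, ratio {observed / expected:.1f}")
```

This reproduces the 11.8 and 8.8 expected thefts and the ratios of 7.4, 5.3 and 2.2. For all major arterials it gives about 272.9 expected thefts against 272.8 in Table 5.3, a last-digit difference that is presumably due to rounding or truncation.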

But what about the distribution of the incidents along each of these highways? If there were any pattern, for example, most of the incidents clustering on the western edge or in the center, then police could use that information to deploy vehicles more efficiently and respond quickly to events. On the other hand, if the distribution along these highways were no different from a random distribution, then police vehicles would best be positioned in the middle, since that would minimize the distance to all occurring incidents.

Unfortunately, the results appear to be close to a random distribution. CrimeStat calculates that for highway 26, the average linear nearest neighbor distance is 0.05 miles, which is close to the average random linear nearest neighbor distance (0.06 miles). The ratio, the linear nearest neighbor index, is 0.96 with a t-value of -0.16, which is not significantly different from chance. Similarly, for highway 150, the average linear nearest neighbor distance is 0.079 miles which, again, is almost identical to the average random linear nearest neighbor distance (0.084 miles); the nearest neighbor index is 0.94 and the t-value is -0.41 (not significant). In short, even though there was a higher concentration of vehicle thefts on these two state highways than would be expected on the basis of chance, the distribution along each highway is not very different from what would be expected on the basis of chance.⁴
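The random linear nearest neighbor distances quoted above can be approximated the same way. This sketch again assumes that equation 5.10 has the form Ld(ran) = 0.5 × L / N, applied separately to each road's length and incident count; the function name is hypothetical.

```python
# ASSUMPTION: equation 5.10 is taken here as Ld(ran) = 0.5 * L / N,
# where L is road length (miles) and N the number of incidents on it.
def expected_linear_nn(length_miles, n_incidents):
    return 0.5 * length_miles / n_incidents


for name, length, n in [("Highway 26", 10.42, 87),
                        ("Highway 150", 7.79, 47),
                        ("All major arterials", 241.04, 607),
                        ("All roads", 3333.54, 3774)]:
    print(f"{name}: Ld(ran) = {expected_linear_nn(length, n):.3f} mi")

# The index is observed / expected; with the Highway 150 figures in the text
print(round(0.079 / 0.084, 2))
```

Under this assumed form the expected distances come out at roughly 0.060, 0.083, 0.199 and 0.442 miles, in line with the 0.06, 0.20 and 0.44 miles in Table 5.3 and within about 0.001 mile of the 0.084 reported for Highway 150, so CrimeStat's exact expression may differ slightly.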

K-Order Linear Nearest Neighbors

There is also a K-order linear nearest neighbor analysis, as with the areal nearest neighbors. The user can specify how many additional nearest neighbors are to be calculated. The linear K-order nearest neighbor routine returns four columns:

1. The order, starting from 1
2. The mean linear nearest neighbor distance for each order (in meters)
3. The expected linear nearest neighbor distance for each order (in meters)
4. The linear nearest neighbor index for each order

Table 5.3
Comparison of 1996 Baltimore County Auto Thefts for Different Types of Roads
(N = 3774 Incidents)

Length of road segments:
    Highway 26               10.42 mi
    Highway 150               7.79 mi
    All major arterials     241.04 mi
    All roads              3333.54 mi

Random expected distance between incidents = 0.44 miles

                                   Proportional to network       Proportional to same road
                                   ("relative to random")        ("relative to itself")
Where incidents        Number of   Expected number   Ratio of    Average linear   Random linear   Linear NN
occurred               incidents   if random         frequency   NN distance      NN distance     index
Highway 26                    87              11.8         7.4    0.05 mi          0.06 mi         0.96
Highway 150                   47               8.8         5.3    0.08 mi          0.08 mi         0.94
All major arterials          607             272.8         2.2    0.13 mi          0.20 mi         0.64 (p ≤ .001)
All roads                   3774            3774.0         1.0    0.09 mi          0.44 mi         0.21 (p ≤ .001)

(NN = nearest neighbor)

Since the expected linear nearest neighbor distance has not been worked out for orders higher than one, the calculation produced here is a rough approximation. It applies equation 5.10, adjusting only for the decreasing sample size, N_k, which occurs as degrees of freedom are lost for each successive order. In this sense, the index is really the k-order linear nearest neighbor distance relative to the expected linear nearest neighbor distance for the first order. It is not a strict nearest neighbor index for orders above one.
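A rough sketch of the four K-order columns follows, under stated assumptions: distances are grid (Manhattan) distances, equation 5.10 is taken as 0.5 × L / N, and the reduced sample size is modelled as N_k = N - k + 1. The text does not give the actual N_k rule, so that line is a hypothetical placeholder, as is the function name.

```python
import statistics


def grid_distance(p, q):
    # Indirect (grid) distance, as assumed for the linear index
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def k_order_linear_nn(points, network_length, max_order):
    """Hypothetical sketch returning (order, mean distance, expected, index)."""
    n = len(points)
    if not 1 <= max_order < n:
        raise ValueError("max_order must be between 1 and N - 1")
    # For each point, its distances to all other points in ascending order
    ranked = [sorted(grid_distance(p, q) for j, q in enumerate(points) if j != i)
              for i, p in enumerate(points)]
    rows = []
    for k in range(1, max_order + 1):
        mean_k = statistics.mean(r[k - 1] for r in ranked)
        # ASSUMPTIONS: equation 5.10 as 0.5 * L / N, and N_k = N - k + 1 as a
        # placeholder for the reduced sample size (exact rule not given here)
        n_k = n - k + 1
        expected_k = 0.5 * network_length / n_k
        rows.append((k, mean_k, expected_k, mean_k / expected_k))
    return rows


rows = k_order_linear_nn([(0, 0), (1, 0), (3, 0), (6, 0)], 20, max_order=2)
for row in rows:
    print(row)
```

Because the expected value only shifts through N_k, the index for higher orders is mostly a rescaled mean k-th neighbor distance, which matches the caveat in the text that it is not a strict nearest neighbor index above order one.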

Nevertheless, like the areal k-order nearest neighbor index, the k-order linear nearest neighbor index can provide insights into the distribution of the points, even if the first order is random. Figure 5.5 shows a graph of 50 linear nearest neighbors for 1996 residential burglaries and street robberies for Baltimore County. As with the areal k-order nearest neighbors (see figure 5.3), both burglaries and robberies show evidence of clustering. For both, the first nearest neighbors are closer together than in a random distribution. Over the 50 orders, street robberies are also more clustered than burglaries. However, measuring distance on a grid shows that for burglaries, there is only a small amount of clustering. After the fourth order neighbor, the distribution for burglaries is more dispersed than a random distribution. An interpretation of this is that there are a small number of burglaries which are clustered, but the clusters are relatively dispersed. Street robberies, on the other hand, are highly clustered, up to over 30 nearest neighbors.

The linear k-order nearest neighbor distribution gives a slightly different perspective on the distribution than the areal one. For one thing, the index is slightly biased because the denominator, the K-order expected linear neighbor distance, is only approximated. For another thing, the index measures distance as if the streets followed a true grid, oriented in an east-west and north-south direction. In this sense, it may be unrealistic for many places, especially where streets run in diagonal patterns; in these cases, the use of indirect distance measurement will produce greater distances than actually occur on the network. Still, the linear nearest neighbor index is an attempt to approximate travel along the street network. To the extent that a particular jurisdiction's street pattern follows such a grid, it can provide useful information.

Ripley's K Statistic

Ripley's K statistic is an index of non-randomness for different scale values (Ripley, 1976; Ripley, 1981; Bailey and Gatrell, 1995; Venables and Ripley, 1997). In this sense, it is a 'super-order' nearest neighbor statistic, providing a test of randomness for every distance from the smallest up to the size of the study area. It is sometimes called the reduced second moment measure, implying that it is designed to measure second-order trends (i.e., local clustering as opposed to a general pattern over the region). However, it is also subject to first-order effects so that it is not strictly a second-order measure.

Figure 5.5: K-Order Linear Nearest Neighbor Indices, 1996 Street Robberies and Residential Burglaries (linear nearest neighbor index plotted against the order of the linear nearest neighbor for residential burglaries and street robberies, with a reference line for K-order spatial randomness)

Consider a spatially random distribution of N points. If circles of radius, d_s, are drawn around each point, where s is the order of radii from the smallest to the largest, and the number of other points that are found within the circle is counted and then summed over all points (allowing for duplication), then the expected number of points within that radius is

$$ E(\#\text{ of points within distance } d_s) = \frac{N}{A}\,K(d_s) \qquad (5.15) $$

where N is the sample size, A is the total study area, and K(d_s) is the area of a circle defined by radius, d_s. For example, if the area defined by a particular radius is one-fourth the total study area and if there is a spatially random distribution, on average approximately one-fourth of the cases will fall within any one circle (plus or minus a sampling error). More formally, with complete spatial randomness (csr), the expected number of points within distance, d_s, is

$$ E(\#\text{ under csr}) = \frac{N}{A}\,\pi d_s^2 \qquad (5.16) $$

On the other hand, if the average number of points found within a circle of a particular radius placed over each point, in turn, is greater than that found in equation 5.16, this points to clustering; that is, points are, on average, closer than would be expected on the basis of chance for that radius. Conversely, if the average number of points found within a circle of a particular radius placed over each point, in turn, is less than that found in equation 5.16, this points to dispersion; that is, points are, on average, farther apart than would be expected on the basis of chance for that radius. By counting the total number of points within a particular radius and comparing it to the number expected on the basis of complete spatial randomness, the statistic is an indicator of non-randomness.
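The comparison described above, the average count of other points within d_s against the csr expectation of equation 5.16, can be sketched as follows. The function names are illustrative, distances are straight-line, and, like equation 5.16, the expectation uses N rather than N - 1.

```python
import math


def mean_neighbor_count(points, d):
    """Average number of other points within distance d of each point."""
    n = len(points)
    total = sum(1 for i, p in enumerate(points)
                for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= d)
    return total / n


def csr_expected_count(n, area, d):
    # Equation 5.16: (N / A) * pi * d^2
    return n / area * math.pi * d ** 2


def compare_to_csr(points, area, d):
    observed = mean_neighbor_count(points, d)
    expected = csr_expected_count(len(points), area, d)
    if observed > expected:
        return "clustered"
    if observed < expected:
        return "dispersed"
    return "consistent with csr"


print(compare_to_csr([(0, 0), (0.1, 0), (0, 0.1), (10, 10)], area=100, d=0.5))  # clustered
# One-fourth example: each circle covers a quarter of a 400-unit area, N = 100
print(round(csr_expected_count(100, 400, math.sqrt(100 / math.pi)), 6))  # 25.0
```

The second call mirrors the one-fourth example in the text: with 100 random points, a circle covering a quarter of the study area should hold about 25 of them.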

In this sense, the K statistic is similar to the nearest neighbor distance in that it provides information about the average distance between points. However, it is more comprehensive than the nearest neighbor statistic for two reasons. First, it applies to all orders cumulatively, not just a single order. Second, it applies to all distances up to the limit of the study area because the count is conducted over successively increasing radii.

Under unconstrained conditions, K is defined as

$$ K(d_s) = \frac{A}{N^2} \sum_i \sum_j I(d_{ij}) \qquad (5.17) $$

where I(d_ij) counts each other point, j, found within distance, d_s, of point i (it equals 1 if d_ij ≤ d_s and 0 otherwise), and the counts are summed over all points, i. That is, a circle of radius, d_s, is placed over each point, i. Then, the number of other points, j, is counted. The circle is moved to the next i and the process is repeated. Thus, the double summation points to the count of all j's for each i, over all i's. After this process is completed, the radius of the circle is increased, and the entire process is repeated. Typically, the radii of circles are increased in small increments so that there are 50-100 intervals by which the statistic can be counted. In CrimeStat, 100 intervals (radii) are used, based on

$$ d_s = \frac{R}{100} \qquad (5.18) $$

where R is the radius of a circle whose area is equal to the study area (i.e., the area entered on the measurement parameters page).

One can graph K(d_s) against the distance, d_s, to reveal whether there is any clustering at certain distances or any dispersion at others (if there is clustering at some scales, then there must be dispersion at others). Such a plot is non-linear, however, typically rising steeply with distance (Kaluzny et al., 1998); under complete spatial randomness, for instance, K(d_s) = πd_s², which grows with the square of the distance. Consequently, K(d_s) is transformed into a square root function, L(d_s), to make it more linear. L(d_s) is defined as:

$$ L(d_s) = \sqrt{\frac{K(d_s)}{\pi}} - d_s \qquad (5.19) $$

That is, K(d_s) is divided by π and then the square root is taken. Then the distance interval (the particular radius), d_s, is subtracted from this.⁵ In practice, only the L statistic is used even though the name of the statistic, K, is based on the K derivation. Figure 5.6 shows a graph of L against distance for 1996 robberies in Baltimore County. As can be seen, L increases up to a distance of about 3 miles, whereupon it decreases again.
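Equations 5.17 to 5.19 translate directly into a short, unoptimised Python sketch. It reads equation 5.18 as the width of each distance interval, so the s-th radius is s × R / 100 with R = sqrt(A / π) (our interpretation), and it applies no edge correction. The function name is illustrative.

```python
import bisect
import math


def ripley_l(points, area, n_intervals=100):
    """Uncorrected K(d_s) and L(d_s) from equations 5.17-5.19 (no edge correction)."""
    n = len(points)
    r_max = math.sqrt(area / math.pi)      # radius of a circle with area A
    step = r_max / n_intervals             # equation 5.18, read as the interval width
    # Distances for every ordered pair i != j, sorted once for fast counting
    pair_d = sorted(math.dist(p, q) for i, p in enumerate(points)
                    for j, q in enumerate(points) if i != j)
    curve = []
    for s in range(1, n_intervals + 1):
        d = s * step
        count = bisect.bisect_right(pair_d, d)   # pairs with d_ij <= d
        k_val = area / n ** 2 * count            # equation 5.17
        l_val = math.sqrt(k_val / math.pi) - d   # equation 5.19
        curve.append((d, k_val, l_val))
    return curve


curve = ripley_l([(0, 0), (1, 0)], area=4 * math.pi, n_intervals=4)
for d, k_val, l_val in curve:
    print(f"d = {d:.2f}  K = {k_val:.3f}  L = {l_val:.3f}")
```

For a pattern close to complete spatial randomness, L(d_s) stays near zero; values above zero point to clustering at that distance and values below zero to dispersion. Sorting the pair distances once keeps each interval's count cheap, but building the pair list is still O(N²).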

Comparison to a Spatially Random Distribution

To understand whether an observed K distribution is different from chance, one typically uses a random distribution. Because the sampling distribution of L(d_s) is not known, a simulation can be conducted by randomly assigning points to the study area. Because any one simulation might produce a clustered or dispersed pattern strictly by chance, the simulation is repeated many times, typically 100 or more. Then, for each random simulation, the L statistic is calculated for each distance interval. Finally, after all simulations have been conducted, the highest and lowest L-values are taken for each distance interval. This is called an envelope. Thus, by comparing the distribution of L to the random envelope, one can assess whether the particular observed pattern is likely to be different from chance.⁶

Specifying simulations

Because simulations can take a long time, particularly if the data sets are large, the default number of simulations is 0. However, a user can conduct simulations by writing a positive number (e.g., 10, 100, 300). If simulations are selected, CrimeStat will conduct the number of simulations specified by the user and will calculate the upper and lower limits for each distance interval, as well as the 0.5%, 2.5%, 5%, 95%, 97.5% and 99% percentiles; these latter statistics only make sense if many simulation runs are conducted (e.g., 1000).
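Once the simulated L curves are available, the envelope and the percentile bands listed above are simple per-interval summaries. This sketch uses the nearest-rank percentile; CrimeStat's own percentile convention is not stated here, so treat that choice, and the function name, as assumptions.

```python
import math


def envelope(simulated_l, percentiles=(0.5, 2.5, 5, 95, 97.5, 99)):
    """Per-interval min, max and percentiles of simulated L curves.

    simulated_l holds one list of L values per simulation run, all of the
    same length (one value per distance interval).
    """
    summary = []
    for values in zip(*simulated_l):       # all runs' values for one interval
        ranked = sorted(values)
        row = {"min": ranked[0], "max": ranked[-1]}
        for pct in percentiles:
            # Nearest-rank percentile (an assumed convention)
            rank = max(1, math.ceil(pct / 100 * len(ranked)))
            row[pct] = ranked[rank - 1]
        summary.append(row)
    return summary


runs = [[1, 10], [2, 20], [3, 30], [4, 40]]    # 4 runs, 2 distance intervals
env = envelope(runs)
print(env[0]["min"], env[0]["max"], env[1][95])
```

An observed L value above the envelope's maximum at some interval is more clustered than every simulated run; with m independent simulations, the chance of that happening under randomness is 1/(m + 1), about 0.01 for m = 100.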

Figure 5.6: K Statistic for 1996 Robberies Compared to Random and Population Distributions (L(d) = Sqrt[K(d)/pi] - d plotted against distance between points in miles for robberies, the 1990 population, and an envelope of 100 random simulations)



The way CrimeStat conducts the simulation is as follows. It takes the maximum bounding rectangle of the distribution, that is, the rectangle formed by the maximum and minimum X and Y coordinates, and re-scales it (up or down) until the rectangle has an area equal to the study area (defined on the measurement parameters page). It then assigns N points, where N is the same number of points as in the incident distribution, to this rectangle using a uniform random number generator, and calculates the L statistic. It then repeats the experiment for the specified number of simulations and calculates the above statistics. For example, with 1181 robberies for 1996, the Ripley's K function calculates the empirical L statistics for 100 distance intervals and compares these to a simulation of 1181 points randomly distributed over a rectangle k times, where k is a user-defined number.
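The rectangle-rescaling step can be sketched as below. The text says only that the bounding rectangle is re-scaled to the study area; keeping its aspect ratio and its centre fixed are assumptions made here, and the helper names are hypothetical. Seeding `random.Random` makes a run repeatable.

```python
import math
import random


def rescaled_rectangle(points, area):
    """Bounding rectangle of the points, scaled so its area equals `area`.

    ASSUMPTION: scaling keeps the aspect ratio and the rectangle's centre.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    f = math.sqrt(area / (width * height))   # same factor on both axes
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    return (cx - f * width / 2, cy - f * height / 2,
            cx + f * width / 2, cy + f * height / 2)


def random_points(n, rect, rng):
    """N uniformly distributed points in rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


rect = rescaled_rectangle([(0, 0), (2, 1)], area=8)
sim = random_points(1181, rect, random.Random(1))   # one simulated run
print(rect, len(sim))
```

Each simulated set would then be passed through the same L calculation as the observed incidents, and the runs summarised into an envelope.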

In practice, the simulation test also has biases associated with edges. Unlike the theoretical L under uniform conditions of complete spatial randomness (i.e., stretching in all directions well beyond the study area), where L is a straight horizontal line, the simulated L also declines with increasing distance separation between points. This is a function of the same type of edge bias. Consequently, it is possible to compare the empirical L with the random L for even longer distance separations, since both have edge biases. There are some subtle differences between the two, however, so some care should be used. The empirical L is obtained from the points within the study area, the geography of which is usually irregular. The random L, however, is calculated from a rectangle. Thus, the difference in shape between the two areas may account for some variations.

Comparison to Baseline Populations

For most social distributions, such as crime incidents, randomness is not a very meaningful baseline. Most social characteristics are non-random. Consequently, finding that the amount of clustering is greater than what would be expected on the basis of chance is not very useful for crime analysts. However, it is possible to compare the distribution of L for crime incidents with the distribution of L for various baseline characteristics, for example, the population distribution or the distribution of employment. In almost all metropolitan areas, population is more concentrated towards the center than at the periphery; the drop-off in population density is very sharp, as was shown in the last chapter. All other things being equal, one would expect more incidents towards the metropolitan center than at the periphery; consequently, the average distance between incidents will be shorter in the center than farther out. This is nothing more than a consequence of the distribution of people. However, to say something about concentrations of incidents above and beyond that expected from population requires us to examine the pattern of population as well as of crime incidents.

CrimeStat allows the use of intensity and weighting variables in the calculation of the K statistic. The user must define an intensity or a weight (or both in special circumstances) on the primary file page. The K routine will then use the intensity (or weight) in the calculation of L. In Figure 5.6 above, there is an envelope produced from 100 random simulations as well as the L distribution from the 1990 population; the latter was obtained by taking the centroids of census block groups from the 1990 census and using population as the intensity variable. As can be seen, the amount of clustering for robberies is much greater than both the random envelope and the distribution of population. In other words, robberies are more clustered together than even what would be expected on the basis of the population distribution, and this holds for distances up to about 7 miles, whereupon the distribution of robberies is indistinguishable from a random distribution. For comparison, figure 5.7 below shows the distribution of 1996 burglaries, again compared to a random envelope and the distribution of population. We find that burglaries are more clustered than even population, but less so than robberies; the L value is higher for robberies than for burglaries at near distances. Thus, the distribution of L confirms the result that burglaries tend to be spread over a much larger geographical area in smaller clusters than street robberies, which tend to be more concentrated in large clusters. In terms of looking for 'hot spots', one would expect to find more with robberies than with burglaries.

Edge Corrections for Ripley's K

The L statistic is prone to edge effects just like the nearest neighbor statistic. That is, for points located near the boundary of the study area, the number enumerated by any circle for those points will, all other things being equal, necessarily be less than for points in the center of the study area, because points outside the boundary are not counted. Further, the greater the distance between points that are being tested (i.e., the greater the radius of the circle placed over each point), the greater the bias. Thus, a plot of L against distance will show a declining curve as distance increases, as figures 5.6 and 5.7 show.

There are various adjustments to the function to help correct the bias. One is a 'guard rail' within the study area so that points outside the guard rail, but inside the study area, can only be counted for points inside the guard rail, but cannot be used for enumerating other points within a circle placed over them (that is, they can only be j's and not i's, to use the language of equation 5.17). Such an operation, however, requires manually constructing these guard rails and enumerating whether each point can be both an enumerator and a recipient or a recipient only. For complex boundaries, such as are found in most police departments, this type of operation is extremely tedious and difficult.⁷

Alternatively, Ripley has proposed a simple weighting to account for the proportion of the circle placed over each point that is within the study area (Venables and Ripley, 1997). Thus, equation 5.17 is re-written as:

$$ K(d_s) = \frac{A}{N^2} \sum_i \sum_j W_{ij}^{-1}\, I(d_{ij}) \qquad (5.20) $$

where W_ij^-1 is the inverse of the proportion of a circle of radius, d_s, placed over each point that is within the total study area. Thus, if a point is near the study area border, it will receive a greater weight because a smaller proportion of the circle placed over it will be within the study area.
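Equation 5.20 can be sketched with the edge weight supplied as a function, since the way the inside proportion is measured is left to the correction models described next. The weight here depends only on the point at the centre of the circle, matching the description of a circle of radius d_s placed over each point; the function and parameter names are illustrative.

```python
import math


def weighted_k(points, area, d, inside_fraction):
    """K(d) with Ripley's edge weighting, equation 5.20 (sketch).

    inside_fraction(point, d) must return the share (0, 1] of a circle of
    radius d centred on `point` that lies inside the study area; how that
    share is computed is left to the caller.
    """
    n = len(points)
    total = 0.0
    for i, p in enumerate(points):
        weight = 1.0 / inside_fraction(p, d)     # W^-1 for the circle around i
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= d:
                total += weight
    return area / n ** 2 * total


pts = [(0, 0), (1, 0)]
print(weighted_k(pts, 10, 1.5, lambda p, d: 1.0))   # no edge effect
print(weighted_k(pts, 10, 1.5, lambda p, d: 0.5))   # half of every circle outside
```

With every circle fully inside, the result equals the unweighted K of equation 5.17; when half of each circle falls outside, each count is doubled.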

Figure 5.7: K Statistic for 1996 Burglaries Compared to Random and Population Distributions (L(d) = Sqrt[K(d)/pi] - d plotted against distance between points in miles for burglaries, the 1990 population, and an envelope of 100 random simulations)



Using this latter concept, two edge corrections for Ripley's K statistic are provided, also following rectangular and circular models. The logic is slightly different than with the edge corrections for the nearest neighbor index. The Ripley's K routine places a search circle of radius, R_j, over each point and the number of other points within the circle is counted. The circle is moved to the next point and the cumulative count continued. After all points are visited by the circle and a cumulative count enumerated, the count is transformed into K and then L (see chapter 4). The process then continues with a slightly larger radius, R_j + i, where i is the bin width.

Ripley's K has the same potential edge problem as the nearest neighbor index. For points located near the border of the study area, the cumulative count will frequently be smaller than for more central points, because there are no measured points that fall within the part of the circle beyond the border. Thus, counts for these points underestimate the number of points found within a certain distance. Ripley (1976) suggested that each point be weighted by the inverse of the proportion of the search circle within the study area.

Define this as an edge weight, E,

$$ E = \frac{1}{p} \qquad (5.21) $$

where p is the proportion of the search circle within the study area. If the entire search circle is within the study area, then E = 1/1 = 1. If the point is on the border of the study area, then for the rectangle only half of the search circle is within the study area and E = 1/0.5 = 2; for the circle, the share inside is slightly less than half. In between are various values of E (i.e., E varies between 1 and 2).
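For a point near a single straight border, p has an exact form: the circle loses a circular segment whose area is r² arccos(x/r) - x·sqrt(r² - x²), where x is the distance from the point to the border. The sketch below computes E from that geometry; it is a worked illustration of equation 5.21, not the approximation CrimeStat uses, and the function name is hypothetical.

```python
import math


def border_edge_weight(x, r):
    """E = 1/p (equation 5.21) for a search circle of radius r whose centre
    lies x units inside a single straight study-area border."""
    if x >= r:
        return 1.0                               # circle entirely inside
    x = max(x, 0.0)
    # Area of the circular segment cut off beyond the border
    segment = r * r * math.acos(x / r) - x * math.sqrt(r * r - x * x)
    p = 1.0 - segment / (math.pi * r * r)
    return 1.0 / p


for x in (0.0, 0.5, 1.0):
    print(x, round(border_edge_weight(x, 1.0), 3))
```

It gives E = 2 for a point on the border, about 1.24 for a point half a radius inside, and E = 1 once the circle clears the border.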

The following is an approximation of the intermediate weights (between 1 and approximately 2) using either a rectangular or circular correction.

Rectangular correction

In the rectangular correction for Ripley's K, the search circle radius, R_j, is compared to the edge of an assumed rectangle with area, A, centered at the mean center. First, the area to be analyzed is defined. If the user has specified a study area on the measurement parameters page, then that value for A is taken. The maximum bounding rectangle is taken (i.e., the rectangle defined by the minimum and maximum X/Y values) and proportionately re-scaled so that the area of the rectangle is equal to A. If the user does not specify an area on the measurement parameters page, then the maximum bounding rectangle is taken for A.

Second, for each point, the minimum distance to the nearest edge of this rectangle is calculated in both the horizontal and vertical directions, d(min R_X) and d(min R_Y). Third, each of the minimum distances is compared to the search circle radius, R_j:

1. If neither the minimum distance in the X-direction, d(min R_X), nor the minimum distance in the Y-direction, d(min R_Y), is less than the search circle radius, R_j, then the circle falls entirely within the rectangle and E = 1;

K-Function Analysis to Determine Clustering in the Police Confrontations Dataset in Buenos Aires Province, Argentina: 1999

Gastón Pezzuchi, Crime Analyst


Buenos Aires Province Police Force
Buenos Aires, Argentina

Sometimes crime analysts tend to produce beautiful hot spot maps without
any formal evidence that clustering is indeed present in the data. One excellent and
powerful tool that CrimeStat provides is the computation of the K function, which
summarizes spatial dependence over a wide range of scales, and uses the
information of all events.

We computed the K function using 1999 police confrontations data (mostly shootings) within our study area¹ and ran 100 Monte Carlo simulations in order to test for spatial randomness² (see figure below); the K function showed clustering up to about 30 km. Yet, spatial randomness is not a particularly meaningful hypothesis to test, considering that the "population at risk" is highly clustered. Hence we used police deployment data as a base population and calculated the K function for that data set. As can be seen, the amount of clustering for the confrontation dataset is much greater than both the random envelope and the distribution of police officers.

Figure: K Statistic for the 1999 Dataset (L(d) = Sqrt[K(d)/π] - d plotted against distance between points in km for the observed L(d), the base-population L(d) and the envelope of 100 simulations, L(d)_MIN and L(d)_MAX, with the CSR line for reference)


¹ A year's worth of events occurring within a 9,500 km² area around the Federal Capital (29 counties).
² Remember that Pr(L(d) > Lmax) = Pr(L(d) < Lmin) = 1/(m + 1), where m is the number of independent simulations.
2. If either the minimum distance in the X-direction, d(min RX), or the minimum distance in the Y-direction, d(min RY), but NOT BOTH, is less than the search circle radius, Rj, then part of the search circle falls outside the rectangle and an adjustment is necessary. An approximate adjustment is made that is inversely proportional to the area of the search circle within the rectangle. The values of E will vary between 1 and 2, since up to one-half of the search circle could fall outside the rectangle;

3. If both the minimum distance in the X-direction, d(min RX), and the minimum distance in the Y-direction, d(min RY), are less than the search circle radius, Rj, then a greater adjustment is required: E can vary between 1 and 4, since up to three-fourths of the search circle could fall outside the rectangle.
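The three cases of the rectangular correction can be sketched as follows. This is an illustration under stated assumptions, not CrimeStat's code: d(min RX) and d(min RY) are assumed to be the distances from the point to the nearest vertical and horizontal edges of the bounding rectangle, and the share of the search circle inside the rectangle is estimated on a fine grid (a swapped-in numerical method standing in for CrimeStat's approximate adjustment), with E = 1 / share.

```python
def rect_edge_weight(x, y, radius, xmin, ymin, xmax, ymax, n=200):
    """Approximate rectangular edge-correction weight E for a point inside
    the rectangle. Assumption: d(min RX) and d(min RY) are the distances to
    the nearest vertical and horizontal edges. The share of the search circle
    inside the rectangle is estimated on an n x n grid of cell centres."""
    d_min_rx = min(x - xmin, xmax - x)
    d_min_ry = min(y - ymin, ymax - y)
    # Case 1: neither minimum distance is less than the radius, so E = 1.
    if d_min_rx >= radius and d_min_ry >= radius:
        return 1.0
    # Cases 2 and 3: part of the circle falls outside; E = 1 / share inside.
    step = 2.0 * radius / n
    in_circle = in_rect = 0
    for i in range(n):
        px = x - radius + (i + 0.5) * step
        for j in range(n):
            py = y - radius + (j + 0.5) * step
            if (px - x) ** 2 + (py - y) ** 2 <= radius ** 2:
                in_circle += 1
                if xmin <= px <= xmax and ymin <= py <= ymax:
                    in_rect += 1
    if in_rect == 0:
        raise ValueError("point must lie inside the rectangle")
    return in_circle / in_rect
```

A point on one edge gives E close to 2 (case 2) and a point at a corner gives E close to 4 (case 3), matching the upper limits stated above.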

Circular correction

In the circular correction for Ripley’s K, the search circle radius, Rj, is compared to the edge of an assumed circle with area, A, centered at the mean center. First, the area to be analyzed is defined. If the user has specified a study area on the measurement parameters page, then that value is used for A. The radius of the circle, R, is calculated by equation 5.8 above. If the user has not specified a study area on the measurement parameters page, then A is calculated from the maximum bounding rectangle and the radius of the circle is again calculated by equation 5.8 above.

Second, for each point, the distance from that point to the mean center, Rj, is calculated. The nearest distance from the point to the circle’s edge is given by

$$R_{jC} = R - R_j \qquad (5.22)$$

Third, the search circle radius, Rj, is compared to the nearest edge of the circle, RjC:

1. If the search area radius, Rj, is less than or equal to RjC, then the entire search circle falls within the model circle and E = 1.

2. If the search area radius, Rj, is greater than RjC, then an adjustment is made for the approximate proportion of the search circle within the model circle, with E varying between 1 and 2.2.
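The circular correction can be sketched as below. Several pieces are assumptions rather than CrimeStat's documented method: the model circle radius is taken as R = sqrt(A / π), the radius of a circle of area A; the share of the search circle inside the model circle is computed exactly from the circle-circle intersection (lens) area instead of CrimeStat's approximation; and E = 1 / share. Because of that substitution, this sketch's E is not capped at 2.2.

```python
import math

def circular_edge_weight(x, y, search_radius, mean_x, mean_y, area):
    """Sketch of a circular edge-correction weight E for one point.

    Assumptions: R = sqrt(A / pi) (the radius of a circle of area A), the
    share of the search circle inside the model circle is computed exactly
    from the circle-circle intersection (lens) area, and E = 1 / share.
    """
    big_r = math.sqrt(area / math.pi)          # model circle radius, R
    r_j = math.hypot(x - mean_x, y - mean_y)   # distance to the mean center
    r_jc = big_r - r_j                         # nearest distance to edge (5.22)
    r = search_radius
    if r <= r_jc:
        return 1.0                             # search circle entirely inside
    if r_j >= r + big_r:
        raise ValueError("search circle does not overlap the model circle")
    if r_j <= r - big_r:
        return (r * r) / (big_r * big_r)       # model circle inside search circle

    def clamp(v):
        # Guard acos against tiny floating-point overshoot beyond [-1, 1].
        return max(-1.0, min(1.0, v))

    d = r_j
    lens = (r * r * math.acos(clamp((d * d + r * r - big_r * big_r) / (2 * d * r)))
            + big_r * big_r * math.acos(clamp((d * d + big_r * big_r - r * r) / (2 * d * big_r)))
            - 0.5 * math.sqrt(max(0.0, (-d + r + big_r) * (d + r - big_r)
                                  * (d - r + big_r) * (d + r + big_r))))
    return (math.pi * r * r) / lens
```

For a point lying on the model circle's edge with a small search radius, slightly less than half of the search circle is inside (the edge curves away), so E comes out just above 2.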

For either correction

During the calculation of Ripley’s K, each point is multiplied by E (aside from W or I) and the K and L statistics are calculated as before (see chapter 5). The simulation of random point distributions is treated in an analogous way. Figure 5.8 below shows a Ripley’s K distribution for 1996 Baltimore County burglaries, with and without edge corrections. As can be seen, the uncorrected L distribution (the transformation of K) decreases and falls below the theoretical random count (L = 0) after about 8 miles, whereas neither the L distribution with the rectangular correction nor the L distribution with the circular correction does so. As expected, the rectangular correction produces the most concentration.
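How an edge weight enters the K and L calculation can be sketched as follows. The estimator form is an assumption (the standard Ripley form K(d) = A/N² times the sum over ordered pairs within distance d), with L(d) = sqrt(K(d)/π) - d as in Figure 5.8; `edge_weight(i, d)` is a hypothetical hook returning E for point i at search radius d, and it defaults to 1 (no correction).

```python
import math

def ripley_k_l(points, area, distances, edge_weight=None):
    """Naive Ripley's K and L(d) = sqrt(K(d)/pi) - d.

    Assumes K(d) = A / N^2 * sum over ordered pairs (i, j), i != j, of
    E_i(d) * I(d_ij <= d). edge_weight(i, d) is a hypothetical hook
    returning E for point i at search radius d (1.0 when omitted).
    """
    n = len(points)
    results = []
    for d in distances:
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            w = 1.0 if edge_weight is None else edge_weight(i, d)
            for j, (xj, yj) in enumerate(points):
                # Count each neighbor of point i within d, weighted by E.
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    total += w
        k = area * total / (n * n)
        l = math.sqrt(k / math.pi) - d
        results.append((d, k, l))
    return results
```

For example, the four corners of a unit square (A = 1) give K(1.0) = 8/16 = 0.5, since each corner has two neighbors at distance 1. Passing one of the edge-weight functions sketched above as `edge_weight` (for instance, `lambda i, d: rect_edge_weight(*points[i], d, 0, 0, 1, 1)`) raises K near the boundary, which is why the corrected curves in Figure 5.8 stay above the uncorrected one.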

Distance Matrices

CrimeStat has the capability for outputting distance matrices. There are two types of matrices that can be output. First, the distance between every point in the primary file and every other point can be calculated in miles, nautical miles, feet, kilometers or meters. This is called the within file point-to-point matrix (Matrix). Second, if there is also a secondary file, CrimeStat can calculate the distance from every point in the primary file to every point in the secondary file, again in miles, nautical miles, feet, kilometers or meters. This is called the From all primary file points to all secondary file points matrix (Imatrix).

Both types of matrices can be displayed or saved to a text file for import into another program. Each matrix identifies incidents by the order in which they occur in the files (i.e., record number 1 is listed as '1'; record number 2 is listed as '2'; and so forth). Only a subset of each matrix is displayed on the results tab. However, there are horizontal and vertical slider bars that allow the user to scroll through the matrix. The user should first move the vertical slider bar to an approximate proportion of the matrix and click the Go button. The display will then scroll through the rows of the matrix to the place that represents the proportion indicated on the slider bar. The user can then scroll across the rows with the upper slider bar.

The matrices can be used for various purposes. The within file point-to-point matrix can be used to examine distances between particular incidents. The saved '.txt' matrix can also be imported into a network program for estimating transportation routes. The primary-to-secondary file matrix can be used in optimization routines, for example in trying to assess the optimal allocation of police cars in order to minimize response time in a police district.
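The two matrix types can be sketched as below for readers who want to build comparable tables outside CrimeStat. This is an illustrative sketch only: it assumes projected coordinates in metres and straight-line distance, does not reproduce CrimeStat's other coordinate systems or its output file layout, and the function names are hypothetical. The unit factors are the exact definitions of each unit in metres.

```python
import math

# Metres per output unit (exact definitions of each unit).
METRES_PER_UNIT = {
    "meters": 1.0,
    "kilometers": 1000.0,
    "feet": 0.3048,
    "miles": 1609.344,
    "nautical miles": 1852.0,
}

def distance_matrix(primary, secondary=None, unit="miles"):
    """Straight-line distance matrix for projected coordinates in metres.

    With only a primary file this mirrors the within file point-to-point
    matrix; with a secondary file it mirrors the primary-to-secondary
    matrix. Rows and columns follow record order (record 1 is row 0 here).
    """
    targets = primary if secondary is None else secondary
    scale = METRES_PER_UNIT[unit]
    return [[math.hypot(px - tx, py - ty) / scale for (tx, ty) in targets]
            for (px, py) in primary]

def to_text(matrix):
    """Tab-separated text, rows and columns labelled by record number from 1."""
    header = "\t" + "\t".join(str(j + 1) for j in range(len(matrix[0])))
    rows = [f"{i + 1}\t" + "\t".join(f"{v:.4f}" for v in row)
            for i, row in enumerate(matrix)]
    return "\n".join([header] + rows)
```

For instance, `distance_matrix([(0, 0), (3000, 4000)], unit="kilometers")` returns `[[0.0, 5.0], [5.0, 0.0]]`. Labelling rows from 1 follows the record-order convention described above.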

The next chapter will discuss how to identify ‘hot spots’ with CrimeStat.

Figure 5.8: "K" Statistic for 1996 Burglaries with Different Types of Corrections
[Chart: L(d) = Sqrt[K(d)/pi] - d plotted against distance between points (miles, roughly 0.1 to 13.8) for the rectangular correction, the circular correction, no correction, and complete spatial randomness.]


Endnotes for Chapter 5

1. There is also an expected mean distance for a dispersed pattern, called the mean dispersed distance (Ebdon, 1988). It is defined as

$$d(\text{dis}) = \frac{\sqrt{2}}{3^{1/4}\,\sqrt{N/A}}$$

A nearest neighbor index can be set up comparing the observed mean nearest neighbor distance with that expected for a dispersed pattern. CrimeStat only provides the traditional nearest neighbor index, but it does output the mean dispersed distance.

2. Unfortunately, the term order when used in the context of nearest neighbor analysis has a slightly different meaning than when used for first-order compared to second-order statistics. In the nearest neighbor context, order really means neighbor, whereas in the context of types of statistics, order means the scale of the statistics, global or local. The use of the terms is historical.

3. There is no hard-and-fast rule about how many K-order nearest neighbor distances may be calculated. Cressie (1991, p. 613) shows that error increases with increasing order and that the divergence from an edge-corrected measure grows as the order increases. In a test case of 584 point locations, he shows that even after only 25 nearest neighbors, the uncorrected measure yields conclusions about clustering opposite to those of the corrected measures. So, as a rough approximation, orders no greater than 2.5% of the cases should be calculated.

4. Because CrimeStat uses indirect distance for the linear nearest neighbor index (i.e., measurement only in a horizontal or vertical direction), a slight distortion can occur if the incidents are distributed diagonally, such as with State Highways 26 and 150 in Figure 5.4. The distortion is very small, however. For example, with the incidents along State Highway 26, after rotating the incident points so that they fell approximately in a horizontal orientation, the observed average linear nearest neighbor distance decreased slightly from 0.05843 miles to 0.05061 miles and the linear nearest neighbor index became 0.8354 (t = -0.91; not significant). In other words, the diagonal distribution lengthened the estimate of the average linear nearest neighbor distance by about 41 feet compared to the actual distances between incidents. For a small sample, this could be relevant, but for a larger sample it will generally be a small distortion. However, if a more precise measure is required, then the user should rotate the distribution so that the incidents have, as closely as possible, a horizontal or vertical orientation.

5. This form of the L(ds) is taken from Cressie (1991). In Ripley’s original formulation (Ripley, 1976), distance is not subtracted from the square root function. The advantage of the Cressie formulation is that a completely random distribution will be a straight line parallel to the X-axis (at L = 0).

6. Note that, since there is no formal test of significance, the comparison with an envelope produced from a number of simulations provides only approximate confidence about whether the distribution differs from chance or not. That is, one cannot say that the likelihood of obtaining this result by chance is less than 5%, for example.

7. The ‘guard rail’ concept, while frequently used, is poor methodology because it involves ignoring data near the boundary of a study area. That is, points within the guard rail are only allowed to be selected by other points and are not, in turn, allowed to select others. This has the effect of throwing out data that could be very important. It is analogous to the old, but fortunately now discarded, practice of throwing out ‘outliers’ in regression analysis because the outliers were somehow seen as ‘not typical’. The guard rail concept is also poor policing practice, since incidents occurring near a border may be very important to a police department and may require coordination with an adjacent jurisdiction. In short, use mathematical adjustments for edge corrections or, failing that, leave the data as it is.
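The mean dispersed distance of endnote 1 and the index it suggests can be sketched as below. The formula is the one given in endnote 1; the brute-force observed distance and the function names are illustrative assumptions, not CrimeStat routines. One useful check: a triangular (hexagonal) lattice with spacing s has density 2 / (√3 s²), and plugging that density into the formula gives exactly s, so an index of 1 means the points are as evenly spread as such a lattice.

```python
import math

def mean_dispersed_distance(n, area):
    """d(dis) = sqrt(2) / (3**(1/4) * sqrt(N / A)), as in endnote 1."""
    return math.sqrt(2) / (3 ** 0.25 * math.sqrt(n / area))

def observed_mean_nn_distance(points):
    """Brute-force mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def dispersed_nn_index(points, area):
    """Hypothetical index: observed mean nearest neighbor distance divided
    by the mean dispersed distance (1.0 for a perfect hexagonal layout)."""
    return observed_mean_nn_distance(points) / mean_dispersed_distance(len(points), area)
```

With a 5 x 5 triangular lattice of unit spacing and A = 25 × √3/2, the index comes out at 1.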
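The rule of thumb in endnote 3 can be applied as in this sketch, which computes mean K-order nearest neighbor distances with no edge correction (the uncorrected measure Cressie warns about). Returning an order of at least 1 for very small samples is an assumption of the sketch, and the function names are hypothetical.

```python
import math

def max_nn_order(n_cases, share=0.025):
    """Rough cap from endnote 3: orders no greater than 2.5% of the cases.
    Returning at least 1 is an assumption of this sketch."""
    return max(1, math.floor(share * n_cases))

def mean_kth_nn_distances(points, max_order):
    """Mean k-th order nearest neighbor distance for k = 1..max_order.
    No edge correction is applied."""
    if max_order >= len(points):
        raise ValueError("order must be smaller than the number of points")
    per_point = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        per_point.append(sorted(math.dist(p, q) for j, q in enumerate(points) if j != i))
    return [sum(d[k - 1] for d in per_point) / len(points)
            for k in range(1, max_order + 1)]
```

For Cressie's 584-point example, `max_nn_order(584)` gives 14, well short of the 25 neighbors at which the uncorrected measure reversed its conclusion.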
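The diagonal distortion described in endnote 4, and the rotation remedy, can be illustrated with this sketch. It computes only a mean nearest neighbor distance using indirect (horizontal plus vertical) distance, not CrimeStat's full linear nearest neighbor index, and choosing the rotation angle from the principal axis of the coordinates is an assumption of the sketch rather than a CrimeStat option.

```python
import math

def manhattan(p, q):
    """Indirect distance: horizontal plus vertical separation."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rotate(points, angle):
    """Rotate points about the origin by angle (radians, counter-clockwise)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def principal_angle(points):
    """Orientation of the major axis (radians) from the coordinate covariance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def mean_linear_nn(points):
    """Mean nearest neighbor distance measured with indirect distance."""
    return sum(min(manhattan(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)) / len(points)
```

For points spaced one unit apart along a 45-degree line, indirect distance inflates the mean nearest neighbor distance to √2 ≈ 1.414; rotating by the negative of the principal angle lays them along the X-axis and recovers 1.0. Real incident data along a highway are less regular, which is why the distortion in endnote 4 was much smaller.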
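A simulation envelope of the kind used in the Pezzuchi case study and discussed in endnote 6 can be sketched as below. Assumptions: random points are drawn uniformly within the bounding rectangle, L(d) uses the uncorrected Ripley form, and the seed and function names are hypothetical. The last function restates the case study's footnote, Pr(L(d) > Lmax) = Pr(L(d) < Lmin) = 1/(m + 1), which holds at a single distance; as endnote 6 cautions, reading the whole envelope across many distances gives only approximate confidence.

```python
import math
import random

def l_values(points, area, distances):
    """Uncorrected L(d) = sqrt(K(d)/pi) - d with K(d) = A/N^2 * (ordered pairs within d)."""
    n = len(points)
    out = []
    for d in distances:
        pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                    if i != j and math.dist(p, q) <= d)
        out.append(math.sqrt(area * pairs / (n * n) / math.pi) - d)
    return out

def csr_envelope(n, xmin, ymin, xmax, ymax, distances, m=100, seed=1):
    """Pointwise min/max of L(d) over m random (CSR) patterns of n points in a rectangle."""
    rng = random.Random(seed)
    area = (xmax - xmin) * (ymax - ymin)
    sims = []
    for _ in range(m):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
        sims.append(l_values(pts, area, distances))
    low = [min(s[k] for s in sims) for k in range(len(distances))]
    high = [max(s[k] for s in sims) for k in range(len(distances))]
    return low, high

def tail_probability(m):
    """Pr(L(d) > Lmax) = Pr(L(d) < Lmin) = 1 / (m + 1) at a single distance d."""
    return 1.0 / (m + 1)
```

With the 100 simulations used in the case study, each tail probability at a given distance is 1/101, just under 1%. The observed L(d) should be computed the same way as the simulated values (same area, same correction) for the comparison to be fair.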
