Distance Analysis: Nearest Neighbor Index (Nna)
Distance Analysis
In this chapter, tools that identify characteristics of the distances between points will be described. The previous chapter provided tools for describing the general spatial distribution of crime incidents, or first-order properties of the incident distribution (Bailey and Gatrell, 1995). First-order properties are global because they represent the dominant pattern of the distribution: where it is centered, how far it spreads out, and whether there is any orientation or direction to its dispersion. Second-order (or local) properties, on the other hand, refer to sub-regional or ‘neighborhood’ patterns within the overall distribution. If there are distinct ‘hot spots’ where many crime incidents cluster together, their distribution is spatially related not so much to the overall global pattern as to something unique in the sub-region or neighborhood. Thus, second-order characteristics tell us something about particular environments that may concentrate crime incidents.
Figure 5.1 shows the distance analysis screen and the distance statistics that are calculated by CrimeStat.
Nearest Neighbor Index (Nna)
The nearest neighbor index compares the distances between nearest points and distances that would be expected on the basis of chance. It is an index that is the ratio of two summary measures. First, there is the nearest neighbor distance. For each point (or incident location) in turn, the distance to the closest other point (nearest neighbor) is calculated and averaged over all points.
$$ \text{Nearest Neighbor Distance} = d(NN) = \sum_{i=1}^{N} \frac{\min(d_{ij})}{N} \qquad (5.1) $$
where Min(d_ij) is the distance between each point and its nearest neighbor and N is the number of points in the distribution. Thus, in CrimeStat, the distance from a single point to every other point is calculated and the smallest distance (the minimum) is selected. Then, the next point is taken and the distance to all other points (including the first point measured) is calculated, with the nearest being selected and added to the first minimum distance. This process is repeated until all points have had their nearest neighbor selected. The total sum of the minimum distances is then divided by N, the sample size, to produce an average minimum distance.
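The averaging procedure just described can be sketched in Python as follows. This is a minimal brute-force illustration, not CrimeStat's own code: it assumes points are (x, y) tuples in projected units where straight-line (direct) distance is meaningful, and the function name `mean_nearest_neighbor_distance` is hypothetical.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (equation 5.1).

    Brute-force O(N^2) search using direct (Euclidean) distances.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Min(d_ij): smallest distance from point i to any other point j
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    # Divide the summed minimum distances by the sample size N
    return total / n


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 7.0)]
print(mean_nearest_neighbor_distance(points))  # 1.5
```

Because every point is compared with every other point, run time grows with N squared; for large files a spatial index such as a k-d tree would be the usual substitute.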
Figure 5.1: Distance Analysis Screen
The second summary measure is the expected nearest neighbor distance if the distribution of points is completely spatially random. This is the mean random distance (or the mean random nearest neighbor distance). It is defined as
$$ \text{Mean Random Distance} = d(ran) = 0.5\sqrt{\frac{A}{N}} \qquad (5.2) $$
where A is the area of the study region and N is the number of points.
The nearest neighbor index is the ratio of the observed nearest neighbor distance to the mean random distance:
$$ \text{Nearest Neighbor Index} = NNI = \frac{d(NN)}{d(ran)} \qquad (5.3) $$
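Equations 5.2 and 5.3 can be sketched as two small functions. This is an illustrative sketch, not CrimeStat's implementation; the function names are hypothetical and the area is assumed to be in the squared units of the coordinates.

```python
import math


def mean_random_distance(area, n):
    """Expected nearest neighbor distance under complete spatial randomness (equation 5.2)."""
    return 0.5 * math.sqrt(area / n)


def nearest_neighbor_index(observed_mean_nn, area, n):
    """Observed mean nearest neighbor distance divided by d(ran) (equation 5.3)."""
    return observed_mean_nn / mean_random_distance(area, n)


# Hypothetical example: 100 points in a 100-square-unit study area
print(mean_random_distance(100.0, 100))          # 0.5
print(nearest_neighbor_index(0.25, 100.0, 100))  # 0.5
```

An index below 1 means points are closer to their nearest neighbors than chance would predict (clustering); an index above 1 points toward dispersion.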
Testing the Significance of the Nearest Neighbor Index
Some differences from 1.0 in the nearest neighbor index would be expected by chance. Clark and Evans (1954) proposed a Z-test to indicate whether the observed average nearest neighbor distance was significantly different from the mean random distance (Hammond and McCullagh, 1978; Ripley, 1981). The test compares the observed nearest neighbor distance with that expected from a random distribution and is given by
$$ Z = \frac{d(NN) - d(ran)}{SE_{d(ran)}} \qquad (5.4) $$
where the standard error of the mean random distance is
$$ SE_{d(ran)} = \sqrt{\frac{(4 - \pi)A}{4\pi N^2}} \approx \frac{0.26136}{\sqrt{N^2/A}} \qquad (5.5) $$
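A minimal sketch of the Z-test in equations 5.4 and 5.5, assuming the area A and sample size N are known; `clark_evans_z` is a hypothetical name. It also confirms that the constant 0.26136 is sqrt((4 - π) / (4π)).

```python
import math


def clark_evans_z(observed_mean_nn, area, n):
    """Z-test of equations 5.4 and 5.5 for the nearest neighbor index."""
    d_ran = 0.5 * math.sqrt(area / n)                                  # equation 5.2
    se = math.sqrt((4.0 - math.pi) * area / (4.0 * math.pi * n * n))  # equation 5.5
    return (observed_mean_nn - d_ran) / se


# The constant in equation 5.5 is sqrt((4 - pi) / (4 * pi))
print(round(math.sqrt((4.0 - math.pi) / (4.0 * math.pi)), 5))  # 0.26136

# Hypothetical example: 100 points, area 100, observed mean distance 0.25
print(round(clark_evans_z(0.25, 100.0, 100), 2))  # -9.57
```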
Calculating the statistics
Once nearest neighbor analysis has been selected, the user clicks on Compute to run the routine. The program outputs 10 statistics:
1. The sample size
2. The mean nearest neighbor distance
3. The standard deviation of the nearest neighbor distance
4. The minimum distance
5. The maximum distance
6. The mean random distance for both the bounding rectangle and the user input area, if provided
7. The mean dispersed distance for both the bounding rectangle and the user input area, if provided
8. The nearest neighbor index for both the bounding rectangle and the user input area, if provided
9. The standard error of the nearest neighbor index for both the maximum bounding rectangle and the user input area, if provided
10. A significance test of the nearest neighbor index (Z-test)
Example 1: The nearest neighbor index for street robberies
CrimeStat does not provide the significance level of the test, but only the Z-value. However, the significance level of the Z-value can be found in any table of standard normal deviates. In this case, a Z-value of -44.4672 is highly significant (p ≤ .001). In other words, the average nearest neighbor distance for street robberies in Baltimore County is significantly smaller than the distance expected if the robberies were randomly distributed.
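Where a printed table of normal deviates is not at hand, the same lookup can be done with the error function in Python's standard library. A minimal sketch; `normal_p_values` is a hypothetical helper, not a CrimeStat output.

```python
import math


def normal_p_values(z):
    """Return (lower-tail, two-sided) p-values for a standard normal deviate z."""
    lower_tail = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z <= z)
    two_sided = math.erfc(abs(z) / math.sqrt(2.0))     # P(|Z| >= |z|)
    return lower_tail, two_sided


lower, two = normal_p_values(-1.96)
print(round(lower, 3), round(two, 3))  # 0.025 0.05

lower, two = normal_p_values(-44.4672)  # Z-value from Example 1
print(two <= 0.001)  # True
```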
Table 5.1
Nearest Neighbor Statistics for 1996 Street Robberies in Baltimore County
N = 1181
Example 2: The nearest neighbor index for residential burglaries
Table 5.2
Nearest Neighbor Statistics for 1996 Residential Burglaries in Baltimore County
N = 6051
The nearest neighbor index for residential burglaries is also highly significant. Now, suppose we want to compare the distribution of street robberies (table 5.1) with that of residential burglaries (table 5.2). The significance test is not very useful for this comparison because the sample sizes are so large (1181 v. 6051); the much higher Z-value for residential burglaries primarily reflects the larger sample size available for the test. However, comparing the relative nearest neighbor indices can be meaningful.
$$ \text{Relative Nearest Neighbor Comparison} = \frac{NNI(A)}{NNI(B)} \qquad (5.6) $$
where NNI(A) is the nearest neighbor index for one group (A) and NNI(B) is the nearest neighbor index for another group (B). Thus, comparing street robberies with residential burglaries, we have
In other words, the distribution of street robberies relative to an expected random distribution appears to be more concentrated than that of burglaries relative to an expected random distribution. There is no simple significance test of this comparison since the standard error of the joint distributions is not known. But the relative index suggests that robberies are more concentrated than burglaries and, hence, are more likely to have ‘hot spots’ or ‘hot zones’ where they are particularly concentrated. This index, of course, does not prove that there are ‘hot spots’; it only points us toward the higher concentration of robberies relative to burglaries. In the previous chapter, it was shown that robberies had a smaller dispersion than burglaries. Here, however, the analysis is taken a step further to suggest that robberies are more concentrated than burglaries.
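Equation 5.6 is a plain ratio of two indices. A minimal sketch with hypothetical index values (not the Baltimore County results); as the text notes, no standard error accompanies this ratio, so the function returns only the point value. The name `relative_nn_comparison` is hypothetical.

```python
def relative_nn_comparison(nni_a, nni_b):
    """Relative nearest neighbor comparison of equation 5.6: NNI(A) / NNI(B)."""
    if nni_b <= 0:
        raise ValueError("NNI(B) must be positive")
    return nni_a / nni_b


# Hypothetical indices, not the Baltimore County values
print(relative_nn_comparison(0.25, 0.5))  # 0.5
```

A ratio below 1 says group A is more concentrated, relative to its own random expectation, than group B is relative to its own.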
K-Order Nearest Neighbors
As mentioned above, the nearest neighbor index is only an indicator of first-order spatial randomness. It compares the average distance to the nearest neighbor with an expected random distance. But what about the second nearest neighbor? Or the third nearest neighbor? Or the Kth nearest neighbor? CrimeStat constructs K-order nearest neighbor indices. On the distance analysis page, the user can specify the number of nearest neighbor indices to be calculated. The routine returns four columns:
1. The order, starting from 1
2. The mean nearest neighbor distance for each order (in meters)
3. The expected nearest neighbor distance for each order (in meters)
4. The nearest neighbor index for each order
For each order, CrimeStat calculates the Kth nearest neighbor distance for each observation and then takes the average. The expected nearest neighbor distance for each order is calculated by:
$$ \text{Mean Random Distance to Kth nearest neighbor} = d(K\,ran) = \frac{K\,(2K)!}{(2^K K!)^2 \sqrt{N/A}} \qquad (5.7) $$
For K = 1 this reduces to equation 5.2.
Nevertheless, the K-order nearest neighbor distance and index can be useful for understanding the overall spatial distribution. Figure 5.2 compares the K-order nearest neighbor index for street robberies with that of residential burglaries. The output was saved as a ‘.dbf’ file and then imported into a graphics program. The graph shows the nearest neighbor indices for both robberies and burglaries up to the 50th order (i.e., the 50th nearest neighbor). The nearest neighbor index runs from 0 (extreme clustering) upward, with values above 1 indicating increasing dispersion. Since a nearest neighbor index of 1 is expected under randomness, the thin straight line at 1.0 indicates the expected K-order index. As can be seen, both street robberies and residential burglaries are much more concentrated than K-order spatial randomness. Further, robberies are more concentrated than even burglaries for each of the 50 nearest neighbors. Thus, the graph reinforces the analysis above that robberies are more concentrated than burglaries, and both are more concentrated than a random distribution.
In other words, even though there is not a good significance test for the K-order nearest neighbor index, a graph of the K-order indices (or the K-order distances) can give a picture of how clustered the distribution is as well as allow comparisons in clustering between different types of crimes (or the same crime at two different time periods).
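A K-order index of the kind graphed in Figure 5.2 can be sketched as follows: the observed mean Kth nearest neighbor distance divided by the expectation in equation 5.7. This is an illustration under stated assumptions (direct distances, no edge correction, hypothetical function names), not CrimeStat's code. It uses the identity (2K)! / (2^K K!)^2 = C(2K, K) / 4^K so that large orders stay in exact integer arithmetic; `math.comb` requires Python 3.8 or later.

```python
import math


def kth_order_mean_distance(points, k):
    """Average distance from each point to its k-th nearest neighbor."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and N - 1")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += dists[k - 1]  # k-th smallest distance from point i
    return total / n


def kth_order_random_distance(area, n, k):
    """Expected k-th nearest neighbor distance under randomness (equation 5.7)."""
    return k * math.comb(2 * k, k) / 4 ** k / math.sqrt(n / area)


def kth_order_index(points, area, k):
    """Observed k-th order mean distance divided by its random expectation."""
    n = len(points)
    return kth_order_mean_distance(points, k) / kth_order_random_distance(area, n, k)


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for k in (1, 2, 3):
    print(k, round(kth_order_index(square, 1.0, k), 3))
```

Looping over k = 1 to 50 and plotting the result gives a curve of the same kind as the one in Figure 5.2.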
Edge Effects
Figure 5.2: K-order nearest neighbor index (orders 1 to 50) for street robberies and residential burglaries
James L. LeBeau
Administration of Justice
Southern Illinois University-Carbondale
A comparison was made of Man with a Gun calls for the weekend in which Hurricane Hugo hit the North Carolina coast (September 22-24) with the following New Year’s Eve weekend (December 29-31, 1989). There were 146 Man with a Gun calls during the Hurricane Hugo weekend compared to 137 calls for New Year’s Eve.
[Graph: nearest neighbor index (clustered to dispersed) by order, 0 to 25]
CrimeStat has two different edge corrections. Because CrimeStat is not a GIS package, it cannot locate the actual border of a study area; that would require a topological GIS package in which the distance from each point to the nearest boundary is calculated. Instead, there are two different geometric models that can be applied. The first assumes that the study area is a rectangle, while the second assumes that the study area is a circle. Depending on the shape of the actual study area, one or the other of these models may be appropriate.
Rectangular study area
Circular study area
$$ R = \sqrt{\frac{A}{\pi}} \qquad (5.8) $$
If the user has not specified a study area on the measurement parameters page, then A is calculated from the minimum and maximum X and Y coordinates (the bounding rectangle) and the radius of the circle is calculated with equation 5.8.
$$ R_{iC} = R - R_i \qquad (5.9) $$
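The circular model can be sketched as follows. Two assumptions are labeled here because the surviving text does not spell them out: the circle is centered on the mean center of the points, and R_i is the distance from point i to that center, so that R_iC is the distance from point i to the circle's edge (negative when the point falls outside the assumed circle). The function name is hypothetical.

```python
import math


def circle_edge_distances(points, area=None):
    """Return the radius R (equation 5.8) and R_iC = R - R_i for each point (equation 5.9).

    ASSUMPTIONS: the circle is centered on the mean center, and R_i is the
    distance from point i to that center.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if area is None:
        # No user-specified study area: use the bounding rectangle
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    radius = math.sqrt(area / math.pi)
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    return radius, [radius - math.hypot(x - cx, y - cy) for x, y in points]


radius, edges = circle_edge_distances(
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
)
print(round(radius, 3))
print([round(e, 3) for e in edges])
```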
Figure 5.3: Nearest neighbor index by order, with no correction and with rectangular correction (below 1 concentrated, 1 random, above 1 dispersed)
Linear Nearest Neighbor Index (Lnna)
The linear nearest neighbor index is a variation on the nearest neighbor routine, but one applied to a street network. All travel along this network is assumed to follow a grid; hence, indirect distances are used. Whereas the nearest neighbor routine calculates the distance between each point and its nearest neighbor using direct distances, the linear nearest neighbor routine uses indirect (‘Manhattan’) distances (see chapter 3). Similarly, whereas the nearest neighbor routine calculates the expected distance between neighbors in a random distribution of N points using the geographical area of the study region, the linear nearest neighbor routine uses the total length of the street network.
$$ Ld(ran) = 0.5\left[\frac{L}{N - 1}\right] \qquad (5.10) $$
where L is the total length of the street network and N is the sample size (Hammond and McCullagh, 1978, 279). Consequently, the linear nearest neighbor index is defined as
$$ \text{Linear Nearest Neighbor Index} = LNNI = \frac{Ld(NN)}{Ld(ran)} \qquad (5.11) $$
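A minimal sketch of equations 5.10 and 5.11, assuming planar (x, y) coordinates on a north-south/east-west grid so that the indirect distance is |dx| + |dy|. The function names are hypothetical, and the sketch measures grid distance between points, not true shortest paths on a street network.

```python
def manhattan(p, q):
    """Indirect (Manhattan) distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mean_linear_nn_distance(points):
    """Ld(NN): average Manhattan distance from each point to its nearest neighbor."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    total = sum(
        min(manhattan(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )
    return total / n


def linear_random_distance(street_length, n):
    """Ld(ran) of equation 5.10: 0.5 * L / (N - 1)."""
    return 0.5 * street_length / (n - 1)


def linear_nn_index(points, street_length):
    """LNNI of equation 5.11: Ld(NN) / Ld(ran)."""
    return mean_linear_nn_distance(points) / linear_random_distance(street_length, len(points))


# Equation 5.10 with the Example 3 road lengths (miles) and incident counts
print(round(linear_random_distance(10.42, 87), 3))  # 0.061 (highway 26)
print(round(linear_random_distance(7.79, 47), 3))   # 0.085 (highway 150)
```

These expected distances are close to the random linear nearest neighbor distances reported for the two highways in Example 3.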
Testing the Significance of the Linear Nearest Neighbor Index
Since the theoretical standard error for the random linear nearest neighbor distance is not known, the author has constructed an approximate standard deviation for the observed linear nearest neighbor distance:
where Min(d_ij) is the nearest neighbor distance for point i and Ld(NN) is the average linear nearest neighbor distance. This is the standard deviation of the linear nearest neighbor distances. The standard error is calculated by
$$ SE_{Ld(NN)} = \frac{S_{Ld(NN)}}{\sqrt{N}} \qquad (5.13) $$
An approximate significance test can be obtained by
$$ t = \frac{Ld(NN) - Ld(ran)}{SE_{Ld(NN)}} \qquad (5.14) $$
where Ld(NN) is the average linear nearest neighbor distance, Ld(ran) is the expected linear nearest neighbor distance (equation 5.10), and SE_Ld(NN) is the approximate standard error of the linear nearest neighbor distance (equation 5.13). Since the empirical standard deviation of the linear nearest neighbor distance is being used instead of a theoretical value, the test is a t-test rather than a Z-test.
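The approximate t-test of equations 5.13 and 5.14 can be sketched as follows. The standard deviation formula itself is not reproduced in this text, so the sketch assumes the usual sample standard deviation with an N - 1 denominator; the function name is hypothetical.

```python
import math
import statistics


def linear_nn_t_test(nn_distances, street_length):
    """Approximate t-test of equations 5.13 and 5.14.

    nn_distances: each point's linear nearest neighbor distance, Min(d_ij).
    ASSUMPTION: S_Ld(NN) is the sample standard deviation (N - 1 denominator).
    """
    n = len(nn_distances)
    if n < 2:
        raise ValueError("need at least two distances")
    ld_nn = statistics.fmean(nn_distances)  # Ld(NN)
    ld_ran = 0.5 * street_length / (n - 1)  # equation 5.10
    s = statistics.stdev(nn_distances)      # approximate S_Ld(NN)
    se = s / math.sqrt(n)                   # equation 5.13
    return (ld_nn - ld_ran) / se            # equation 5.14


# Hypothetical distances (not from Example 3): mean 3.0, Ld(ran) = 0.5 * 16 / 4 = 2.0
print(round(linear_nn_t_test([1.0, 2.0, 3.0, 4.0, 5.0], 16.0), 4))  # 1.4142
```

Turning t into a p-value needs the t distribution (N - 1 degrees of freedom would be a further assumption), which Python's standard library does not provide.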
Calculating the statistics
On the measurement parameters page, two parameters are input: the geographical area of the study region and the length of the street network. At the bottom of the page, the user must select which type of distance measurement to use, direct or indirect. If the measurement type is direct, then the nearest neighbor routine returns the standard nearest neighbor analysis (sometimes called areal nearest neighbor). On the other hand, if the measurement type is indirect, then the routine returns the linear nearest neighbor analysis. To calculate the linear nearest neighbor index, therefore, distance measurement must be specified as indirect and the length of the street network must be defined.
Once nearest neighbor analysis has been selected, the user clicks on Compute to run the routine. The Lnna routine outputs 9 statistics:
1. The sample size
2. The mean linear nearest neighbor distance
3. The minimum linear distance between nearest neighbors
4. The maximum linear distance between nearest neighbors
5. The mean linear random distance
6. The linear nearest neighbor index
7. The standard deviation of the linear nearest neighbor distance
8. The standard error of the linear nearest neighbor distance
9. A significance test of the nearest neighbor index (t-test)
Example 3: Auto thefts along two highways
The linear nearest neighbor index is useful for analyzing the distribution of crime incidents along particular streets. For example, in Baltimore County, state highway 26 in the western part and state highway 150 in the eastern part have high concentrations of motor vehicle thefts (figure 5.4). In 1996, there were 87 vehicle thefts on highway 26 and 47 on highway 150. A GIS can be used with the linear nearest neighbor index to indicate whether these concentrations are greater than what would be expected on the basis of chance.
Figure 5.4: Map of State Highway 26 and State Highway 150 (scale bar in miles: 0, 2, 4)
Table 5.3 presents the data. Using the GIS, we estimate that there are 3,333.54 miles of roadway segments; this number was obtained by adding up the total length of the street network in the GIS. Of all the road segments in Baltimore County, there are 241.04 miles of major arterial roads, of which state highway 26 has a total length of 10.42 miles and state highway 150 has a total length of 7.79 miles.
But what about the distribution of the incidents along each of these highways? If there were any pattern, for example, most of the incidents clustering on the western edge or in the center, then police could use that information to deploy vehicles more efficiently and respond quickly to events. On the other hand, if the distribution along these highways were no different from a random distribution, then police vehicles should be positioned in the middle, since that would minimize the distance to all occurring incidents.
Unfortunately, the results appear to be close to a random distribution. CrimeStat calculates that for highway 26, the average linear nearest neighbor distance is 0.05 miles, which is close to the average random linear nearest neighbor distance (0.06 miles). The ratio, the linear nearest neighbor index, is 0.96 with a t-value of -0.16, which is not significantly different from chance. Similarly, for highway 150, the average linear nearest neighbor distance is 0.079 miles which, again, is almost identical to the average random linear nearest neighbor distance (0.084 miles); the nearest neighbor index is 0.94 and the t-value is -0.41 (not significant). In short, even though there was a higher concentration of vehicle thefts on these two state highways than would be expected on the basis of chance, the distribution along each highway is not very different from what would be expected on the basis of chance.
K-Order Linear Nearest Neighbors
There is also a K-order linear nearest neighbor analysis, as with the areal nearest neighbors. The user can specify how many additional nearest neighbors are to be calculated. The linear K-order nearest neighbor routine returns four columns:
1. The order, starting from 1
2. The mean linear nearest neighbor distance for each order (in meters)
3. The expected linear nearest neighbor distance for each order (in meters)
4. The linear nearest neighbor index for each order

Table 5.3

Road length:
  Highway 26              10.42 mi
  Highway 150              7.79 mi
  All major arterials    241.04 mi
  All roads            3,333.54 mi
Random expected distance between incidents = 0.44 miles

Where incidents occurred | Number of incidents | Expected number if random | Ratio of frequency (“relative to random”) | Average linear nearest neighbor distance | Random linear nearest neighbor distance | Linear nearest neighbor index (“relative to itself”)
Highway 26 | 87 | 11.8 | 7.4 | 0.05 mi | 0.06 mi | 0.96
Highway 150 | 47 | 8.8 | 5.3 | 0.08 mi | 0.08 mi | 0.94
All major arterials | 607 | 272.8 | 2.2 | 0.13 mi | 0.20 mi | 0.64 (p ≤ .001)
Since the expected linear nearest neighbor distance has not been worked out for orders higher than one, the calculation produced here is a rough approximation. It applies equation 5.10, adjusting only for the decreasing sample size, N_k, which occurs as degrees of freedom are lost for each successive order. In this sense, the index is really the K-order linear nearest neighbor distance relative to the expected linear nearest neighbor distance for the first order. It is not a strict nearest neighbor index for orders above one.
Nevertheless, like the areal K-order nearest neighbor index, the K-order linear nearest neighbor index can provide insights into the distribution of the points, even if the first order is random. Figure 5.5 shows a graph of 50 linear nearest neighbors for 1996 residential burglaries and street robberies for Baltimore County. As with the areal K-order nearest neighbors (see figure 5.3), both burglaries and robberies show evidence of clustering. For both, the first nearest neighbors are closer together than in a random distribution. Similarly, over the 50 orders, street robberies are more clustered than burglaries. However, measuring distance on a grid shows that for burglaries, there is only a small amount of clustering. After the fourth-order neighbor, the distribution for burglaries is more dispersed than a random distribution. One interpretation of this is that a small number of burglaries are clustered, but the clusters are relatively dispersed. Street robberies, on the other hand, are highly clustered, up to more than 30 nearest neighbors.
The linear K-order nearest neighbor distribution gives a slightly different perspective on the distribution than the areal one. For one thing, the index is slightly biased because the denominator, the K-order expected linear neighbor distance, is only approximated. For another, the index measures distance as if the streets followed a true grid oriented in east-west and north-south directions. In this sense, it may be unrealistic for many places, especially where streets run in diagonal patterns; in these cases, indirect distance measurement will produce greater distances than what actually occur on the network. Still, the linear nearest neighbor index is an attempt to approximate travel along the street network. To the extent that a particular jurisdiction's street pattern follows such a grid, it can provide useful information.
Ripley’s K Statistic
Consider a spatially random distribution of N points. If circles of radius d_s are drawn around each point, where s is the order of radii from the smallest to the largest, and the
Figure 5.5: Linear nearest neighbor index by order (1 to 50) for 1996 residential burglaries and street robberies
$$ E(\#\text{ of points within distance } d_s) = \frac{N}{A} K(d_s) \qquad (5.15) $$
$$ E(\#\text{ under CSR}) = \frac{N}{A}\,\pi d_s^2 \qquad (5.16) $$
$$ K(d_s) = \frac{A}{N^2} \sum_{i} \sum_{j} I(d_{ij}) \qquad (5.17) $$
the radii of circles are increased in small increments so that there are 50-100 intervals by which the statistic can be counted. In CrimeStat, 100 intervals (radii) are used, based on
$$ d_s = \frac{R}{100} \qquad (5.18) $$
$$ L(d_s) = \sqrt{\frac{K(d_s)}{\pi}} - d_s \qquad (5.19) $$
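A brute-force sketch of equations 5.17 and 5.19, without any edge correction. Assumptions: I(d_ij) counts ordered pairs i ≠ j with d_ij ≤ d_s; the radii are supplied by the caller (for 100 intervals one might pass s · R / 100 for s = 1 to 100, which is one reading of equation 5.18 rather than a documented rule); the function name is hypothetical.

```python
import math


def ripley_k_and_l(points, area, radii):
    """Return (d_s, K(d_s), L(d_s)) for each radius, with no edge correction."""
    n = len(points)
    results = []
    for d in radii:
        # I(d_ij): count ordered pairs i != j no farther apart than d
        count = sum(
            1
            for i, (xi, yi) in enumerate(points)
            for j, (xj, yj) in enumerate(points)
            if i != j and math.hypot(xi - xj, yi - yj) <= d
        )
        k = area * count / (n * n)      # equation 5.17
        l = math.sqrt(k / math.pi) - d  # equation 5.19
        results.append((d, k, l))
    return results


square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for d, k, l in ripley_k_and_l(square, 1.0, [0.5, 1.0, 1.5]):
    print(d, k, round(l, 3))
```

Under complete spatial randomness K(d) is about πd², so L is about 0; values above 0 suggest clustering and values below 0 suggest dispersion, subject to the edge effects discussed later in this chapter.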
Comparison to a Spatially Random Distribution
Specifying simulations
Figure 5.6: K Statistic for 1996 Robberies Compared to Random and Population Distributions (L(d) = Sqrt[K(d)/pi] - d; series: robberies, 1990 population)
In practice, the simulation test also has biases associated with edges. Unlike the theoretical L under uniform conditions of complete spatial randomness (i.e., stretching in all directions well beyond the study area), where L is a straight horizontal line, the simulated L also declines with increasing distance separation between points. This is a function of the same type of edge bias. Consequently, it is possible to compare the empirical L with the random L even for longer distance separations, since both have edge biases. There are some subtle differences between the two, however, so some care should be used. The empirical L is obtained from the points within the study area, the geography of which is usually irregular. The random L, however, is calculated from a rectangle. Thus, differences in shape between the two areas may account for some of the variation.
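A simulation envelope of the kind described above can be sketched as follows: scatter N random points uniformly in a rectangle m times, compute L for each run, and keep the minimum and maximum at each distance. The rectangle, the number of runs m, the seed and the function names are assumptions for illustration, not CrimeStat settings. With m independent runs, the chance that L(d) falls above the maximum (or below the minimum) is 1/(m + 1), so m = 99 gives 0.01 per tail. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def l_values(points, area, radii):
    """L(d_s) = sqrt(K(d_s) / pi) - d_s for each radius, without edge correction."""
    n = len(points)
    out = []
    for d in radii:
        count = sum(
            1
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= d
        )
        out.append(math.sqrt(area * count / (n * n) / math.pi) - d)
    return out


def csr_envelope(n, width, height, radii, m=99, seed=1):
    """Min and max L over m simulated random patterns in a width x height rectangle."""
    rng = random.Random(seed)
    sims = []
    for _ in range(m):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(l_values(pts, width * height, radii))
    lower = [min(run[s] for run in sims) for s in range(len(radii))]
    upper = [max(run[s] for run in sims) for s in range(len(radii))]
    return lower, upper


lower, upper = csr_envelope(n=30, width=10.0, height=10.0, radii=[0.5, 1.0, 2.0], m=19)
print([round(v, 2) for v in lower])
print([round(v, 2) for v in upper])
```

Because neither the simulated nor the observed L is edge-corrected here, both decline with distance in the same way, which is what makes the comparison usable at longer separations.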
Comparison to Baseline Populations
For most social distributions, such as crime incidents, randomness is not a very meaningful baseline. Most social characteristics are non-random. Consequently, finding that the amount of clustering is greater than what would be expected on the basis of chance is not very useful for crime analysts. However, it is possible to compare the distribution of L for crime incidents with the distribution of L for various baseline characteristics, for example, the population distribution or the distribution of employment. In almost all metropolitan areas, population is more concentrated towards the center than at the periphery; the drop-off in population density is very sharp, as was shown in the last chapter. All other things being equal, one would expect more incidents towards the metropolitan center than at the periphery; consequently, the average distance between incidents will be shorter in the center than farther out. This is nothing more than a consequence of the distribution of people. However, to say something about concentrations of incidents above and beyond that expected by population requires us to examine the pattern of population as well as of crime incidents.
much greater than both the random envelope and the distribution of population. In other words, robberies are more clustered together than even what would be expected on the basis of the population distribution, and this holds for distances up to about 7 miles, whereupon the distribution of robberies is indistinguishable from a random distribution. For comparison, figure 5.7 below shows the distribution of 1996 burglaries, again compared to a random envelope and the distribution of population. We find that burglaries are more clustered than even population, but less so than robberies; the L value is higher for robberies than for burglaries at near distances. Thus, the distribution of L confirms the result that burglaries tend to be spread over a much larger geographical area in smaller clusters than street robberies, which tend to be more concentrated in large clusters. In terms of looking for ‘hot spots’, one would expect to find more with robberies than with burglaries.
Edge Corrections for Ripley’s K
The L statistic is prone to edge effects just like the nearest neighbor statistic. That is, for points located near the boundary of the study area, the number of points enumerated by any circle will, all other things being equal, necessarily be less than for points in the center of the study area because points outside the boundary are not counted. Further, the greater the distance between points that are being tested (i.e., the greater the radius of the circle placed over each point), the greater the bias. Thus, a plot of L against distance will show a declining curve as distance increases, as figures 5.6 and 5.7 show.
Similarly, Ripley has proposed a simple weighting to account for the proportion of the circle placed over each point that is within the study area (Venables and Ripley, 1997). Thus, equation 5.17 is re-written as:
$$ K(d_s) = \frac{A}{N^2} \sum_{i} \sum_{j} W_{ij}^{-1} I(d_{ij}) \qquad (5.20) $$
where W_ij^-1 is the inverse of the proportion of a circle of radius d_s, placed over each point, that is within the total study area. Thus, if a point is near the study area border, it will receive a greater weight because a smaller proportion of the circle placed over it will be within the study area.
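The weighting in equation 5.20 can be sketched for a rectangular study area [0, width] × [0, height]. Two assumptions are labeled: the "proportion of the circle" is taken as the share of the circle's circumference inside the area (the text does not say whether circumference or area is meant), and that share is approximated by testing evenly spaced points on the circumference. Following the text, the circle for point i has radius d_s. The function names are hypothetical.

```python
import math


def inside_fraction(x, y, r, width, height, samples=360):
    """Share of a circle's circumference (center x, y; radius r) inside [0, width] x [0, height].

    Approximated by testing evenly spaced points on the circumference.
    """
    inside = 0
    for step in range(samples):
        angle = 2.0 * math.pi * step / samples
        px, py = x + r * math.cos(angle), y + r * math.sin(angle)
        if 0.0 <= px <= width and 0.0 <= py <= height:
            inside += 1
    return inside / samples


def weighted_k(points, width, height, d):
    """K(d_s) of equation 5.20 with W_ij^-1 = 1 / (share of point i's circle inside the area)."""
    n = len(points)
    area = width * height
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        share = inside_fraction(xi, yi, d, width, height)
        if share == 0.0:
            raise ValueError("circle lies entirely outside the study area")
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                total += 1.0 / share
    return area * total / (n * n)


print(round(inside_fraction(0.0, 0.0, 1.0, 10.0, 10.0), 3))  # 0.253 (corner: about a quarter)
print(round(inside_fraction(5.0, 5.0, 1.0, 10.0, 10.0), 3))  # 1.0 (interior)
```

For points well inside the area the weight is 1 and the result equals the unweighted K of equation 5.17; near a corner the same pair counts roughly four times as much.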
Figure 5.7: K Statistic for 1996 Burglaries Compared to Random and Population Distributions (L(d) = Sqrt[K(d)/pi] - d; series: burglaries, 1990 population)
$$E = \frac{1}{p} \qquad (5.21)$$
In the rectangular correction for Ripley’s K, the search circle radius, R_j, is compared to
the edge of an assumed rectangle with area, A, centered at the mean center. First, the area to
be analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken, and the maximum bounding rectangle (i.e., the rectangle
defined by the minimum and maximum X/Y values) is proportionately re-scaled so that its area
is equal to A. If the user does not specify an area on the measurement parameters page, then
the maximum bounding rectangle is taken for A.
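The construction of the correction rectangle can be sketched as below. Two readings are assumptions: "proportionately re-scaled" is taken to mean that both sides are multiplied by the same factor (preserving the aspect ratio), and the rectangle is centered on the mean center in both cases, as the first sentence of the paragraph suggests. The function name is hypothetical.

```python
import math

def correction_rectangle(points, area=None):
    """Return (xmin, ymin, xmax, ymax) of a rectangle centred on the mean centre.

    Without `area` the rectangle has the size of the maximum bounding rectangle;
    with `area` both sides are scaled by one factor so its area equals `area`.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    if area is not None:
        scale = math.sqrt(area / (width * height))  # same factor on both sides
        width, height = width * scale, height * scale
    return (mean_x - width / 2, mean_y - height / 2,
            mean_x + width / 2, mean_y + height / 2)

pts = [(0, 0), (4, 0), (0, 2), (4, 2)]
print(correction_rectangle(pts))           # bounding rectangle, area 8
print(correction_rectangle(pts, area=32))  # sides doubled, area 32
```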
K-Function Analysis to Determine Clustering in the
Police Confrontations Dataset in
Buenos Aires Province, Argentina: 1999
Sometimes crime analysts produce beautiful hot spot maps without any formal evidence
that clustering is indeed present in the data. One excellent and powerful tool that CrimeStat
provides is the computation of the K function, which summarizes spatial dependence over a wide
range of scales and uses the information from all events.
[Figure: Observed L(d), base-population L(d), L(d) under CSR, and the simulation envelope L(d)_MIN and L(d)_MAX, plotted against Distance Between Points [km].]
1 A year’s worth of events occurring within a 9,500 km² area around the Federal Capital (29
counties).
2 Remember that $\Pr(L(d) > L_{max}) = \Pr(L(d) < L_{min}) = 1/(m + 1)$, where m is the number
of independent simulations.
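The envelope in the figure above comes from repeated simulation of complete spatial randomness (CSR). The sketch below shows one way such an envelope could be built, under assumptions that are ours rather than CrimeStat's: points are drawn uniformly in a rectangle, K is unweighted (no edge correction), and the envelope is the pointwise minimum and maximum of L(d) across the m runs. Function names are hypothetical.

```python
import math
import random

def l_function(points, d, area):
    """Unweighted L(d) = sqrt(K(d)/pi) - d, with K(d) = A/N^2 * (ordered pairs within d)."""
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= d
    )
    return math.sqrt(area * pairs / n ** 2 / math.pi) - d

def csr_envelope(n, rect, distances, m=19, seed=1):
    """Pointwise min and max of L(d) over m simulated CSR patterns of n points."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = rect
    area = (xmax - xmin) * (ymax - ymin)
    runs = []
    for _ in range(m):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
        runs.append([l_function(pts, d, area) for d in distances])
    l_min = [min(run[k] for run in runs) for k in range(len(distances))]
    l_max = [max(run[k] for run in runs) for k in range(len(distances))]
    return l_min, l_max

def tail_probability(m):
    """Footnote 2: Pr(L(d) > L_max) = Pr(L(d) < L_min) = 1 / (m + 1)."""
    return 1 / (m + 1)

rect = (0.0, 0.0, 10.0, 10.0)
l_min, l_max = csr_envelope(30, rect, [1.0, 2.0, 3.0], m=19)
print(l_min, l_max, tail_probability(19))
```

An observed L(d) above `l_max` at a given distance suggests clustering at that scale; with m = 19 simulations the footnote's one-sided probability is 1/20 = 0.05. The comparison is pointwise, one distance at a time.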
2. If either the minimum distance in the X-direction, d(minRX), or the minimum
distance in the Y-direction, d(minRY), but NOT BOTH, is less than the search
circle radius, R_j, then part of the search circle falls outside the rectangle and an
adjustment is necessary. An approximate adjustment is made that is inversely
proportional to the area of the search circle within the rectangle. The values of
E will vary between 1 and 2, since up to one-half of the search circle could fall
outside the rectangle;
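The one-edge case above, combined with E = 1/p from equation 5.21 (reading p as the proportion of the search circle's area inside the rectangle), can be sketched with the exact area of a circular segment. This is an assumption on our part: the text describes CrimeStat's adjustment only as approximate, and the sketch also assumes the point lies inside the rectangle and the circle is narrower than the rectangle. The case where both edges are crossed is deliberately left out. Names are hypothetical.

```python
import math

def segment_area(r, h):
    """Area of a circle of radius r cut off beyond a straight edge at distance h (0 <= h <= r) from its centre."""
    return r * r * math.acos(h / r) - h * math.sqrt(r * r - h * h)

def rectangular_weight(x, y, r, rect):
    """E = 1/p for a search circle of radius r on point (x, y), where p is the share
    of the circle's area inside rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    d_min_x = min(x - xmin, xmax - x)  # d(minRX): distance to the nearest vertical edge
    d_min_y = min(y - ymin, ymax - y)  # d(minRY): distance to the nearest horizontal edge
    cuts_x = d_min_x < r
    cuts_y = d_min_y < r
    if not cuts_x and not cuts_y:
        return 1.0  # the whole circle lies inside the rectangle
    if cuts_x and cuts_y:
        raise NotImplementedError("circle crosses both an X and a Y edge; not covered here")
    h = d_min_x if cuts_x else d_min_y
    p = 1.0 - segment_area(r, h) / (math.pi * r * r)
    return 1.0 / p

rect = (0.0, 0.0, 10.0, 10.0)
print(rectangular_weight(5, 5, 1, rect))    # circle fully inside
print(rectangular_weight(0.5, 5, 1, rect))  # a small segment falls outside
print(rectangular_weight(0, 5, 1, rect))    # point on the edge: half the circle outside
```

As the text states, E stays between 1 (no part outside) and 2 (point on the edge, half the circle outside).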
In the circular correction for Ripley’s K, the search circle radius, R_j, is compared to the
edge of an assumed circle with area, A, centered at the mean center. First, the area to be
analyzed is defined. If the user has specified a study area on the measurement parameters
page, then that value for A is taken, and the radius of the circle, R, is calculated by equation
5.8 above. If the user has not specified a study area on the measurement parameters page, then
A is calculated from the maximum bounding rectangle, and the radius of the circle is again
calculated by equation 5.8 above.
$$R_{jC} = R - R_j \qquad (5.22)$$
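A small sketch of the quantities in the circular correction, under one assumption: equation 5.8 is not reproduced here, so the radius is taken as that of a circle whose area is A, R = sqrt(A/π). When no area is supplied, A is the area of the maximum bounding rectangle, as the paragraph describes. Equation 5.22 is then applied exactly as written. Function names are hypothetical.

```python
import math

def study_circle_radius(points, area=None):
    """Radius R of a circle with area A (assumed R = sqrt(A / pi)).

    If no area is given, A is the area of the maximum bounding rectangle."""
    if area is None:
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return math.sqrt(area / math.pi)

def r_jc(big_r, r_j):
    """Equation 5.22 as written: R_jC = R - R_j."""
    return big_r - r_j

big_r = study_circle_radius([], area=100 * math.pi)  # R = 10 for A = 100 * pi
print(big_r, r_jc(big_r, 3.0))
```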
the rectangular correction nor the L distribution with the circular distribution do so. As
expected, the rectangular distribution produces the most concentration.
Distance Matrices
Both types of matrices can be displayed or saved to a text file for import into another
program. Each matrix defines incidents by the order in which they occur in the files (i.e.,
record number 1 is listed as ‘1’; record number 2 is listed as ‘2’; and so forth). Only a subset of
each matrix is displayed on the results tab. However, there are horizontal and vertical slider
bars that allow the user to scroll through the matrix. The user should move the vertical slide
bar first to an approximate proportion of the matrix and click the Go button. The matrix will
scroll through the rows of the matrix to a place that represents the proportion indicated in
the slide bar. The user can then scroll across the rows with the upper slide bar.
The matrices can be used for various purposes. The within-file point-to-point matrix
can be used to examine distances between particular incidents. The saved ‘.txt’ matrix can
also be imported into a network program for estimating transportation routes. The primary-
to-secondary file matrix can be used in optimization routines, for example in trying to assess
the optimal allocation of police cars in order to minimize response time in a police district.
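Both matrix types can be sketched as follows, with records labelled 1, 2, 3, ... in file order as described above. The Euclidean distance and the tab-delimited text layout are assumptions for illustration, not CrimeStat's exact distance options or file format; the coordinates and function names are hypothetical.

```python
import io
import math

def distance_matrix(rows, cols):
    """Euclidean distance from every point in `rows` to every point in `cols`."""
    return [[math.hypot(xr - xc, yr - yc) for xc, yc in cols] for xr, yr in rows]

def write_matrix(matrix, out):
    """Write a tab-delimited matrix, labelling records 1, 2, 3, ... in file order."""
    n_cols = len(matrix[0])
    out.write("\t" + "\t".join(str(j + 1) for j in range(n_cols)) + "\n")
    for i, row in enumerate(matrix, start=1):
        out.write(str(i) + "\t" + "\t".join(f"{d:.4f}" for d in row) + "\n")

incidents = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # hypothetical primary file
stations = [(0.0, 4.0), (6.0, 0.0)]               # hypothetical secondary file

within = distance_matrix(incidents, incidents)               # within-file point-to-point
primary_to_secondary = distance_matrix(incidents, stations)  # primary-to-secondary

buf = io.StringIO()
write_matrix(within, buf)
print(buf.getvalue())
```

Writing to an `io.StringIO` keeps the example self-contained; passing an open file handle instead would produce a ‘.txt’ file for import into another program.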
The next chapter will discuss how to identify ‘hot spots’ with CrimeStat.
[Figure 5.8: L(d) with rectangular correction, circular correction, and no correction.]
$$d(\mathrm{dis}) = \frac{\sqrt{2}}{3^{1/4}\sqrt{N/A}}$$
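Read as the nearest neighbor spacing of points on a triangular (hexagonal) lattice, the most dispersed arrangement possible, d(dis) can be checked numerically. The sketch compares it with the standard Clark and Evans expectation under spatial randomness, 0.5/sqrt(N/A); the lattice interpretation and the function names are our assumptions.

```python
import math

def dispersed_nn_distance(n, area):
    """d(dis) = sqrt(2) / (3 ** (1/4) * sqrt(N / A))."""
    return math.sqrt(2) / (3 ** 0.25 * math.sqrt(n / area))

def random_nn_distance(n, area):
    """Expected mean nearest neighbour distance under spatial randomness: 0.5 / sqrt(N / A)."""
    return 0.5 / math.sqrt(n / area)

n, area = 100, 25.0
print(dispersed_nn_distance(n, area) / random_nn_distance(n, area))  # about 2.149
```

The ratio does not depend on N or A, so about 2.149 is the largest nearest neighbor index a perfectly dispersed pattern can reach. A triangular lattice with unit spacing has a density of 2/sqrt(3) points per unit area, and at that density the formula returns exactly 1.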
2. Unfortunately, the term ‘order’ when used in the context of nearest neighbor analysis
has a slightly different meaning than when used for first-order compared to second-
order statistics. In the nearest neighbor context, order really means neighbor,
whereas in the context of types of statistics, order means the scale of the statistics,
global or local. The use of the terms is historical.
4. Because CrimeStat uses indirect distance for the linear nearest neighbor index (i.e.,
measurement only in a horizontal or vertical direction), a slight distortion
can occur if the incidents are distributed in a diagonal manner, such as with
State Highways 26 and 150 in Figure 5.4. The distortion is very small, however.
For example, with the incidents along State Highway 26, after rotating the incident
points so that they fell approximately in a horizontal orientation, the observed
average linear nearest neighbor distance decreased slightly from 0.05843 miles to
0.05061 miles and the linear nearest neighbor index became 0.8354 (t = -0.91; not
significant). In other words, the effect of the diagonal distribution lengthened the
estimate for the average linear nearest neighbor distance by about 41 feet compared
to the actual distances between incidents. For a small sample size, this could be
relevant, but for a larger sample it will generally be a small distortion. However, if
a more precise measure is required, then the user should rotate the distribution
so that the incidents have as closely as possible a horizontal or vertical orientation.
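The distortion described in this note can be reproduced on hypothetical data. The sketch places incidents one unit apart along a 45-degree line, measures the mean nearest neighbor distance with indirect (Manhattan) distance, and repeats the measurement after rotating the points about their mean center to a horizontal orientation. The data, the choice of the mean center as the pivot, and the function names are assumptions.

```python
import math

def rotate(points, angle):
    """Rotate points by `angle` radians about their mean centre."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    c, s = math.cos(angle), math.sin(angle)
    return [(mx + c * (x - mx) - s * (y - my), my + s * (x - mx) + c * (y - my))
            for x, y in points]

def mean_indirect_nn(points):
    """Mean nearest neighbour distance using indirect (Manhattan) distance |dx| + |dy|."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(abs(xi - xj) + abs(yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

# Hypothetical incidents spaced 1 unit apart along a 45-degree "highway".
step = 1 / math.sqrt(2)
diagonal = [(k * step, k * step) for k in range(10)]
horizontal = rotate(diagonal, -math.pi / 4)

print(mean_indirect_nn(diagonal))    # about 1.414: the diagonal inflates the distance
print(mean_indirect_nn(horizontal))  # about 1.0 once the points lie horizontally

# Arithmetic behind the 41-foot figure in the note (5,280 feet per mile).
print((0.05843 - 0.05061) * 5280)    # about 41.3
```

A perfect 45-degree line is the worst case, inflating indirect distance by a factor of sqrt(2). The highway in the note is only roughly diagonal, which is consistent with its much smaller difference.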
6. Note that, since there is no formal test of significance, the comparison with an
envelope produced from a number of simulations provides only approximate
confidence about whether the distribution differs from chance or not. That is, one
cannot say that the likelihood of obtaining this result by chance is less than 5%, for
example.
7. The ‘guard rail’ concept, while frequently used, is poor methodology because it
involves ignoring data near the boundary of a study area. That is, points within the
guard rail are only allowed to be selected by other points and are not, in turn,
allowed to select others. This has the effect of throwing out data that could be very
important. It is analogous to the old, but fortunately now discarded, practice of
throwing out ‘outliers’ in regression analysis because the outliers were somehow
seen as ‘not typical’. The guard rail concept is also poor policing practice, since
incidents occurring near a border may be very important to a police department and
may require coordination with an adjacent jurisdiction. In short, use mathematical
adjustments for edge corrections or, failing that, leave the data as they are.