Scenario 1:: Acknowlegement

Acknowlegement
Code for this PySAL analyses below was adapated from

http://pysal.org/notebooks/explore/esda/Spatial_Autocorrelation_for_Areal_Unit_Data.html (http://pysal.org/notebooks
/explore/esda/Spatial_Autocorrelation_for_Areal_Unit_Data.html)
http://pysal.org/notebooks/explore/giddy/Markov_Based_Methods.html (http://pysal.org/notebooks/explore/giddy
/Markov_Based_Methods.html) (Serge Rey sjsrey@gmail.com, Wei Kang weikang9009@gmail.com)
Contact
david.lapp@oracle.com
Import libraries needed for the 2 demo scenarios
In [33]: import cx_Oracle

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import pysal.lib as lp
from pysal.viz import mapclassify as mc
from pysal.explore import giddy
from pysal.explore import esda
from shapely.wkt import loads
import numpy as np
np.set_printoptions(precision=4)
%matplotlib inline
Scenario 1:
#### Oracle Spatial storing large multi year nationwide auto accident data
#### Spatial query calculates % of traffic accidents that involved speeding within geograpahic tiles covering the US
#### Result pulled into Python for spatial statistical analysis
Connect to Oracle Autonomous Database
In [2]: file = open('/opt/dbconfig_adw.txt', 'r')

user = file.readline().strip()
pwd = file.readline().strip()
host_port_service = file.readline().strip()
connection = cx_Oracle.connect(user, pwd, host_port_service)
cursor = connection.cursor()
Handle CLOBs
1 of 17 8/10/20, 10:33 AM
In [3]: def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
if defaultType == cx_Oracle.CLOB:
return cursor.var(cx_Oracle.LONG_STRING, arraysize = cursor.arraysize)
connection.outputtypehandler = OutputTypeHandler
Run query returning "WKT" geometry
In [4]: cursor.execute("""
SELECT sdo_util.to_wktgeometry(geometry) as geometry
FROM grid_us
""")
gdf = gpd.GeoDataFrame(cursor.fetchall(), columns = ['geometry'])
gdf['geometry'] = gpd.GeoSeries(gdf['geometry'].apply(lambda x: loads(x)))
gdf
Out[4]:
geometry
0 POLYGON ((-68.90625 46.40625, -67.50000 46.406...
1 POLYGON ((-70.31250 45.00000, -68.90625 45.000...
2 POLYGON ((-68.90625 45.00000, -67.50000 45.000...
3 POLYGON ((-101.25000 43.59375, -99.84375 43.59...
4 POLYGON ((-102.65625 43.59375, -101.25000 43.5...
... ...
473 POLYGON ((-88.59375 37.96875, -87.18750 37.968...
474 POLYGON ((-92.81250 42.18750, -91.40625 42.187...
475 POLYGON ((-94.21875 47.81250, -92.81250 47.812...
476 POLYGON ((-84.37500 46.40625, -82.96875 46.406...
477 POLYGON ((-70.31250 46.40625, -68.90625 46.406...
478 rows × 1 columns
View result
2 of 17 8/10/20, 10:33 AM
In [5]: fig, ax = plt.subplots(figsize=(10,5))
ax.set_clip_on(False)
ax.set_facecolor("lightblue")
result=gdf.plot(ax=ax,linewidth=1.5,facecolor="#cccccc",edgecolor="darkgrey",legen
d=False)
leg=ax.get_legend()
us48shp = gpd.read_file('data/us48.shp')
us48shp.plot(ax=ax, facecolor='none', edgecolor='white')
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7f63663c5b50>
Query for "% of traffic accidents that involved speeding" per tile
WITH x as (
SELECT b.grid_key as grid_key, 100*sum(a.speeding_involved)/count(speeding_invo
lved) as metric
FROM fars_pysal_mv a, grid_us b
WHERE sdo_anyinteract(a.geometry,b.geometry)='TRUE'
GROUP BY b.grid_key having count(b.grid_key)>10 )
SELECT x.grid_key, x.metric, sdo_util.to_wktgeometry(y.geometry) as geometry
FROM x, grid_us y
WHERE x.grid_key=y.grid_key
""")
gdf = gpd.GeoDataFrame(cursor.fetchall(), columns = ['grid_key','metric', 'geometr
y'])
gdf.head()
Out[6]:
grid_key metric geometry
0 253 46.341463 POLYGON ((-68.90625 46.40625, -67.50000 46.406...
1 254 47.368421 POLYGON ((-70.31250 45.00000, -68.90625 45.000...
2 255 38.461538 POLYGON ((-68.90625 45.00000, -67.50000 45.000...
3 256 9.523810 POLYGON ((-101.25000 43.59375, -99.84375 43.59...
4 257 29.411765 POLYGON ((-102.65625 43.59375, -101.25000 43.5...
3 of 17 8/10/20, 10:33 AM
View the result
In [7]: fig, ax = plt.subplots(figsize=(11,5))

result=gdf.plot(ax=ax, column='metric',cmap='OrRd',linewidth=0.3,edgecolor="lightg
rey",legend=True)
leg=ax.get_legend()
Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x7f6365b9ea10>
Analyze with PySAL (geospatial data science library) http://pysal.org/

(http://pysal.org/)
Spatial autocorrelation
Correlation of home prices and household income: 2 variables... makes sense

Correlation of home prices and location: 1 variable and itself across space... less intuitive
Spatial autocorrelation measures similarity of a variable to nearby values
Provides local and global measures
Incorporates "Spatial Lag" (weighted avg of neighbors)
Calculate global spatial sutocorrelation
4 of 17 8/10/20, 10:33 AM
In [8]: wq = lp.weights.Queen.from_dataframe(gdf)
wq.transform = 'r'
y = gdf['metric']
ylag = lp.weights.lag_spatial(wq, y)
np.random.seed(12345)
mi = esda.Moran(y, wq)
mi.I
Out[8]: 0.3521637718312477
View in comparison to spatial randomness
In [9]: import seaborn as sbn

sbn.kdeplot(mi.sim, shade=True)
plt.vlines(mi.I, 0, 1, color='r')
plt.vlines(mi.EI, 0,1)
plt.xlabel("Moran's I")
Out[9]: Text(0.5, 0, "Moran's I")
Calculate local spatial autocorrelation
In [10]: li = esda.Moran_Local(y, wq)
p_sim is probability that value is spartially random
In [11]: li.p_sim[0:20]
Out[11]: array([0.04 , 0.025, 0.021, 0.22 , 0.04 , 0.254, 0.3 , 0.482, 0.41 ,
0.07 , 0.369, 0.001, 0.01 , 0.05 , 0.002, 0.075, 0.159, 0.008,
0.001, 0.103])
q is quandrant location: 1 HH (hot spot), 2 LH, 3 LL (cold spot), 4 HL
5 of 17 8/10/20, 10:33 AM
In [12]: li.q[0:20]
Out[12]: array([1, 1, 1, 3, 3, 1, 3, 2, 4, 1, 3, 3, 1, 3, 3, 3, 1, 3, 3, 1])
Hot spots: quadrant 1 and probability of spatial randomness <5%
In [13]: sig = li.p_sim < 0.05

hotspot=sig*li.q==1
spots = ['n.sig.', 'hot spot']
labels = [spots[i] for i in hotspot*1]
from matplotlib import colors
hmap = colors.ListedColormap(['red', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(12,5))
gdf.assign(cl=labels).plot(column='cl', categorical=True, \
k=2, cmap=hmap, linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
plt.show()
Cold spots: quadrant 3 and probability of spatial randomness <5%
6 of 17 8/10/20, 10:33 AM
In [14]: coldspot=sig*li.q==3
spots = ['n.sig.', 'cold spot']
labels = [spots[i] for i in coldspot*1]
from matplotlib import colors
hmap = colors.ListedColormap(['darkblue', 'lightgrey'])
f, ax = plt.subplots(1, figsize=(12,5))
gdf.assign(cl=labels).plot(column='cl', categorical=True, \
k=2, cmap=hmap, linewidth=0.1, ax=ax, \
edgecolor='black', legend=True)
plt.show()
Scenario 2:
#### For a region of interest, how is unemployment over time related to location?
#### Determine probabilities of changes based on 'regional' unemployment
Connect to Oracle Autonomous Database
In [15]: file = open('/opt/dbconfig_adw.txt', 'r')

user = file.readline().strip()
pwd = file.readline().strip()
host_port_service = file.readline().strip()
connection = cx_Oracle.connect(user, pwd, host_port_service)
cursor = connection.cursor()
Handle CLOBs
In [16]: def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):

if defaultType == cx_Oracle.CLOB:
return cursor.var(cx_Oracle.LONG_STRING, arraysize = cursor.arraysize)
connection.outputtypehandler = OutputTypeHandler
7 of 17 8/10/20, 10:33 AM
Preview the unemployment data
SELECT statefips, countyfips, year, unemp_pct
FROM bls_unemployment
where rownum<10
""")
pd.DataFrame(cursor.fetchall(), columns = ['STATEFIPS','COUNTYFIPS','YEAR', 'UNEMP
_PCT'])
Out[17]:
STATEFIPS COUNTYFIPS YEAR UNEMP_PCT
0 13 013 2000 3.0
1 13 015 2000 3.7
2 13 017 2000 4.9
3 13 019 2000 4.0
4 13 021 2000 4.4
5 13 023 2000 4.3
6 13 025 2000 4.2
7 13 027 2000 4.7
8 13 029 2000 3.2
Create region of interest: 100 mi buffer around Kansas, Missouri, Oklahoma, Tennessee, Kentucky, Arkansas
select sdo_util.to_wktgeometry(
sdo_geom.sdo_buffer(
sdo_aggr_union(sdoaggrtype(c.geom, 0.05)), 100, 0.05, 'unit=MILE')) a
s geometry
FROM states c
WHERE state in ('Kansas','Missouri','Oklahoma','Tennessee','Kentucky','Arkansas')
"""
)
gdfAOI = gpd.GeoDataFrame(cursor.fetchall(), columns = ['geometry'])
gdfAOI['geometry'] = gpd.GeoSeries(gdfAOI['geometry'].apply(lambda x: loads(x)))
gdfAOI['geometry']
Out[18]: 0 POLYGON ((-95.58789 32.38036, -95.58620 32.377...

Name: geometry, dtype: geometry
View the region
8 of 17 8/10/20, 10:33 AM
In [19]: f, ax = plt.subplots(1, figsize=(12,5))
us48shp.plot(ax=ax, facecolor='white', edgecolor='#c0c0c0')
gdfAOI.plot(ax=ax, facecolor='none', edgecolor='darkblue', hatch='|||')
Out[19]: <matplotlib.axes._subplots.AxesSubplot at 0x7f635ec9c3d0>
Prepare unemployment data for analysis (pivot, add geometry, spatial filter)
WITH
-- pivot the unemployment data

unemp_data as (
SELECT * FROM
(select statefips, countyfips, unemp_pct, year from bls_unemployment)
PIVOT( avg(unemp_pct)
FOR year IN (1996,1997,1998,1999,2000,2001,2002,2003,
2004,2005,2006,2007,2008,2009,2010,2011,
2012,2013,2014,2015,2016,2017,2018) )
),
-- define region of interest

aoi as (
select sdo_geom.sdo_buffer(
sdo_aggr_union(sdoaggrtype(c.geom, 0.05)), 100, 0.05, 'unit=MILE') as
geom
FROM states c
WHERE state in ('Kansas','Missouri','Oklahoma','Tennessee','Kentucky','Arkansas
')
)
-- add geometry, county/state names, filter for counties in the region of interest
SELECT c.state, c.county, a.*, sdo_util.to_wktgeometry(b.geom) as geometry
FROM unemp_data a, cb_2018_us_county_500k b, fips_county c, aoi
WHERE a.statefips=b.statefp and a.countyfips=b.countyfp
AND a.statefips=c.state_fips and a.countyfips=c.county_fips
AND sdo_anyinteract(b.geom,aoi.geom)='TRUE'
""")
Out[20]: <cx_Oracle.Cursor on <cx_Oracle.Connection to demo_data@spatial2_medium>>
9 of 17 8/10/20, 10:33 AM
In [21]: gdf = gpd.GeoDataFrame(cursor.fetchall(), columns = ['STATE','COUNTY','STATEFIPS
','COUNTYFIPS','1996','1997',
'1998','1999','2000','2001','
2002','2003','2004','2005',
'2006','2010','2007','2008','
2009','2011','2012','2013',
'2014','2015','2016','2017','
2018','geometry'])
gdf.head()
Out[21]:
STATE COUNTY STATEFIPS COUNTYFIPS 1996 1997 1998 1999 2000 2001 ... 2009 2011 2012
0 Mississippi Leake 28 079 5.0 5.2 4.8 6.1 6.3 5.8 ... 9.7 9.7 9.2
1 Mississippi Lee 28 081 4.6 4.7 4.1 3.8 4.4 4.9 ... 10.4 9.6 8.3
2 Mississippi Leflore 28 083 9.2 9.3 8.9 8.4 8.8 9.0 ... 15.3 14.9 14.0
3 Mississippi Lincoln 28 085 5.4 5.5 4.9 4.8 5.4 5.4 ... 10.5 9.8 8.6
4 Mississippi Lowndes 28 087 6.5 7.3 7.5 5.8 5.5 6.4 ... 11.8 10.5 9.1
5 rows × 28 columns
View regional unemployment over time
10 of 17 8/10/20, 10:33 AM
In [22]: index_year = range(2004,2018,2)
fig, axes = plt.subplots(nrows=2, ncols=3,figsize = (15,7))
for i in range(2):
for j in range(3):
ax = axes[i,j]
gdf.plot(ax=ax, column=str(index_year[i*3+j]), cmap='OrRd', scheme='quanti
les', legend=True)
us48shp.plot(ax=ax, facecolor='none', edgecolor='#c0c0c0')
gdfAOI.plot(ax=ax, facecolor='none', edgecolor='darkblue')
ax.set_title('unemployment pct %s Quintiles'%str(index_year[i*3+j]))
ax.axis('off')
leg = ax.get_legend()
leg.set_bbox_to_anchor((0.18, 0.0, 0.16, 0.2))
plt.tight_layout()
Temporal analysis with Discrete Markov Chains (DMC)

DMC is only suitable for certain scenarios (https://en.wikipedia.org/wiki/Markov_chain (https://en.wikipedia.org
/wiki/Markov_chain))
Input: Set of observation time series
Result: Probabilities and predictions of future observation transitions
PySAL supports Classic Markov and 2 spatial variants Spatial Markov and LISA Markov
Classic Hello World Example:
|-> 80% Sunny day

Sunny day |
|-> 20% Rainy day
|-> 70% Sunny day

Rainy day |
|-> 30% Rainy day
11 of 17 8/10/20, 10:33 AM
Prepare regional unemployment data
Data needs to be perpared as "array of time series" where the values are classified unemployment (we'll use quintile) per
year
[
[county 1 unemployment time series],
[county 2 unemployment time series],
...
[county n unemployment time series]
]
For example [2, 3, 3, 1, 4, 2, ....] is a time series showing 1st year unemployment in the 2nd quintile, 2nd
year unemployment in the 3rd quintile, and so on.
Generate arrays of binned values, where each array is a year with values for each county
In [23]: binnedData = np.array([mc.Quantiles(gdf[str(y)],k=5).yb for y in range(2004,201

8)])
print(binnedData.shape); print(); print(binnedData)
(14, 1271)
[[3 2 4 ... 4 2 3]
[4 3 4 ... 4 2 2]
[4 3 4 ... 3 2 3]
...
[3 2 4 ... 4 3 4]
[3 2 4 ... 4 3 4]
[4 2 4 ... 4 4 4]]
Transpose so that each array is a county with values for each year (time series)
In [24]: binnedDataT = binnedData.transpose()

print(binnedDataT.shape); print(); print(binnedDataT)
(1271, 14)
[[3 4 4 ... 3 3 4]
[2 3 3 ... 2 2 2]
[4 4 4 ... 4 4 4]
...
[4 4 3 ... 4 4 4]
[2 2 2 ... 3 3 4]
[3 2 3 ... 4 4 4]]
Create "aspatial" Markov instance
12 of 17 8/10/20, 10:33 AM
In [25]: m5 = giddy.markov.Markov(binnedDataT)
The Markov Chain is irreducible and is composed by:

1 Recurrent class (indices):
[0 1 2 3 4]
0 Transient classes.
The Markov Chain has 0 absorbing states.
View transition counts
In [26]: m5.transitions
Out[26]: array([[2946., 463., 44., 10., 5.],

[ 483., 2057., 673., 123., 24.],
[ 42., 718., 1663., 686., 117.],
[ 14., 98., 740., 1813., 584.],
[ 3., 15., 87., 638., 2477.]])
View transition probabilities
In [27]: print(m5.p)
[[0.8495 0.1335 0.0127 0.0029 0.0014]

[0.1438 0.6122 0.2003 0.0366 0.0071]
[0.013 0.2226 0.5155 0.2126 0.0363]
[0.0043 0.0302 0.2278 0.558 0.1797]
[0.0009 0.0047 0.027 0.1981 0.7693]]
View first mean passage time
In [28]: print(giddy.ergodic.fmpt(m5.p))
[[ 4.6041 7.9945 14.8811 22.1314 35.6067]

[22.8275 4.9295 9.0061 16.3859 29.9697]
[31.2434 10.4107 5.1934 10.635 24.3351]
[35.8927 15.3302 7.5034 5.0902 17.1953]
[39.2734 18.7488 11.0187 5.9992 5.2373]]
Before starting Spatial Markov, test for spatial randomness
Global spatial autocorrelation over time

Compare to random
13 of 17 8/10/20, 10:33 AM
In [29]: wq = lp.weights.Queen.from_dataframe(gdf)
years = np.arange(2004,2019)
mitest = [esda.Moran(gdf[str(x)], wq) for x in years]
res = np.array([(mi.I, mi.EI, mi.seI_norm, mi.sim[974]) for mi in mitest])
fig, ax = plt.subplots(nrows=1, ncols=1,figsize = (10,5) )
ax.plot(years, res[:,0], label='Moran\'s I')
ax.plot(years, res[:,1]+1.96*res[:,2], label='Upper bound',linestyle='dashed')
ax.plot(years, res[:,1]-1.96*res[:,2], label='Lower bound',linestyle='dashed')
ax.set_title("Global spatial autocorrelation",fontdict={'fontsize':15})
ax.set_xlim([2004, 2018])
ax.legend()
Out[29]: <matplotlib.legend.Legend at 0x7f635f46ba10>
Spatial Markov
Markov for regional context, using classification (bins) of Spatial Lag

Spatial Markov classifies the lag and observations for us
We will use quintiles as the classification
Result is Markov for each quintile of Spatial Lag
14 of 17 8/10/20, 10:33 AM
In [30]: npData = np.array([gdf[str(y)] for y in range(2004,2018)])
sm = giddy.markov.Spatial_Markov(npData.transpose(), wq, fixed = True, k = 5, m=5,
fill_empty_classes=True)
sm.summary()
--------------------------------------------------------------
Spatial Markov Test
--------------------------------------------------------------
Number of classes: 5
Number of transitions: 16523
Number of regimes: 5
Regime names: LAG0, LAG1, LAG2, LAG3, LAG4
--------------------------------------------------------------
Test LR Chi-2
Stat. 651.303 673.460
DOF 72 72
p-value 0.000 0.000
--------------------------------------------------------------
P(H0) C0 C1 C2 C3 C4
C0 0.726 0.127 0.077 0.054 0.016
C1 0.291 0.390 0.117 0.123 0.079
C2 0.042 0.329 0.359 0.114 0.155
C3 0.039 0.053 0.303 0.417 0.188
C4 0.023 0.078 0.078 0.244 0.576
--------------------------------------------------------------
P(LAG0) C0 C1 C2 C3 C4
C0 0.770 0.120 0.066 0.034 0.011
C1 0.447 0.280 0.096 0.120 0.057
C2 0.110 0.329 0.315 0.178 0.068
C3 0.077 0.154 0.462 0.154 0.154
C4 0.000 0.000 0.000 0.000 0.000
--------------------------------------------------------------
C0 0.599 0.148 0.104 0.117 0.033
C1 0.270 0.388 0.127 0.131 0.084
C2 0.035 0.393 0.258 0.118 0.197
C3 0.051 0.082 0.378 0.306 0.184
C4 0.000 0.000 0.111 0.444 0.444
--------------------------------------------------------------
C0 0.615 0.156 0.138 0.083 0.009
C1 0.257 0.442 0.112 0.102 0.087
C2 0.040 0.349 0.363 0.099 0.148
C3 0.007 0.055 0.362 0.346 0.230
C4 0.016 0.065 0.048 0.387 0.484
--------------------------------------------------------------
C0 0.429 0.286 0.143 0.143 0.000
C1 0.170 0.491 0.104 0.179 0.057
C2 0.042 0.256 0.434 0.125 0.144
C3 0.032 0.040 0.313 0.425 0.191
C4 0.016 0.034 0.030 0.423 0.496
--------------------------------------------------------------
C0 0.000 0.000 0.000 0.000 0.000
C1 0.000 1.000 0.000 0.000 0.000
C2 0.105 0.158 0.395 0.211 0.132
C3 0.088 0.082 0.201 0.491 0.139
C4 0.025 0.089 0.090 0.197 0.598
--------------------------------------------------------------
15 of 17 8/10/20, 10:33 AM
Visualize the Spatial Markov result
In [31]: fig, axes = plt.subplots(2,3,figsize = (15,10))

for i in range(2):
for j in range(3):
ax = axes[i,j]
if i==0 and j==0:
p_temp = sm.p
im = ax.imshow(p_temp,cmap = "Reds",vmin=0, vmax=1)
ax.set_title("Pooled",fontsize=18)
else:
p_temp = sm.P[i*3+j-1]
im = ax.imshow(p_temp,cmap = "Reds",vmin=0, vmax=1)
ax.set_title("Spatial Lag %d"%(i*3+j),fontsize=18)
for x in range(len(p_temp)):
for y in range(len(p_temp)):
text = ax.text(y, x, round(p_temp[x, y], 2),
ha="center", va="center", color="black")
fig.subplots_adjust(right=0.92)
cbar_ax = fig.add_axes([0.95, 0.228, 0.01, 0.5])
fig.colorbar(im, cax=cbar_ax)
Out[31]: <matplotlib.colorbar.Colorbar at 0x7f635d9f1350>
From the summary above, for a county with unemployment in 5th quintile
the probability of remaining in 5th quintile is ~60% if its neighbors are in the 5th quintile
the probability of remaining in 5th quintile is ~50% if its neighbors are in the 4th quintile
16 of 17 8/10/20, 10:33 AM
Spatially conditional first mean passage times when neighbors are in 2nd quintile
In [32]: print(sm.F[1])
[[ 5.4579 6.369 6.1909 6.4034 10.3668]

[ 8.7361 4.9012 5.8694 6.0232 9.5586]
[11.7555 4.8698 4.9459 5.6106 8.1261]
[12.6411 6.7157 3.8864 4.4734 7.7496]
[14.264 8.1465 4.9091 2.9221 5.3472]]
From the summary above, for a county with neighbors in the 2nd quintile
it will take roughy 5 years to transition to the 5th quintile to 3nd quintile
it will take roughy 8 years to transition to the 5th quintile to 2nd quintile
17 of 17 8/10/20, 10:33 AM

Scenario 1:: Acknowlegement

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Scenario 1:: Acknowlegement

Uploaded by

Copyright:

Available Formats

Acknowlegement

Code for this PySAL analyses below was adapated from

Import libraries needed for the 2 demo scenarios

In [33]: import cx_Oracle

Connect to Oracle Autonomous Database

In [2]: file = open('/opt/dbconfig_adw.txt', 'r')

Run query returning "WKT" geometry

0 POLYGON ((-68.90625 46.40625, -67.50000 46.406...

1 POLYGON ((-70.31250 45.00000, -68.90625 45.000...

2 POLYGON ((-68.90625 45.00000, -67.50000 45.000...

3 POLYGON ((-101.25000 43.59375, -99.84375 43.59...

4 POLYGON ((-102.65625 43.59375, -101.25000 43.5...

473 POLYGON ((-88.59375 37.96875, -87.18750 37.968...

474 POLYGON ((-92.81250 42.18750, -91.40625 42.187...

475 POLYGON ((-94.21875 47.81250, -92.81250 47.812...

476 POLYGON ((-84.37500 46.40625, -82.96875 46.406...

477 POLYGON ((-70.31250 46.40625, -68.90625 46.406...

478 rows × 1 columns

Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7f63663c5b50>

0 253 46.341463 POLYGON ((-68.90625 46.40625, -67.50000 46.406...

1 254 47.368421 POLYGON ((-70.31250 45.00000, -68.90625 45.000...

2 255 38.461538 POLYGON ((-68.90625 45.00000, -67.50000 45.000...

3 256 9.523810 POLYGON ((-101.25000 43.59375, -99.84375 43.59...

4 257 29.411765 POLYGON ((-102.65625 43.59375, -101.25000 43.5...

In [7]: fig, ax = plt.subplots(figsize=(11,5))

Out[7]: <matplotlib.axes._subplots.AxesSubplot at 0x7f6365b9ea10>

Analyze with PySAL (geospatial data science library) http://pysal.org/

Correlation of home prices and household income: 2 variables... makes sense

Calculate global spatial sutocorrelation

View in comparison to spatial randomness

In [9]: import seaborn as sbn

Out[9]: Text(0.5, 0, "Moran's I")

Calculate local spatial autocorrelation

In [10]: li = esda.Moran_Local(y, wq)

p_sim is probability that value is spartially random

q is quandrant location: 1 HH (hot spot), 2 LH, 3 LL (cold spot), 4 HL

Out[12]: array([1, 1, 1, 3, 3, 1, 3, 2, 4, 1, 3, 3, 1, 3, 3, 3, 1, 3, 3, 1])

Hot spots: quadrant 1 and probability of spatial randomness <5%

In [13]: sig = li.p_sim < 0.05

Cold spots: quadrant 3 and probability of spatial randomness <5%

Connect to Oracle Autonomous Database

In [15]: file = open('/opt/dbconfig_adw.txt', 'r')

In [16]: def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):

0 13 013 2000 3.0

1 13 015 2000 3.7

2 13 017 2000 4.9

3 13 019 2000 4.0

4 13 021 2000 4.4

5 13 023 2000 4.3

6 13 025 2000 4.2

7 13 027 2000 4.7

8 13 029 2000 3.2

Out[18]: 0 POLYGON ((-95.58789 32.38036, -95.58620 32.377...

View the region

Out[19]: <matplotlib.axes._subplots.AxesSubplot at 0x7f635ec9c3d0>

-- pivot the unemployment data

-- define region of interest

Out[20]: <cx_Oracle.Cursor on <cx_Oracle.Connection to demo_data@spatial2_medium>>

View regional unemployment over time

Temporal analysis with Discrete Markov Chains (DMC)

|-> 80% Sunny day

|-> 70% Sunny day