Comparative Analysis of Coffee Franchises in The Cambridge-Boston Area
Paul T. Grogan
ptgrogan@mit.edu
Massachusetts Institute of Technology
Introduction
The placement of storefronts is a difficult question on which many corporations spend a great amount of
time, effort, and money. There is a careful interplay between environment, potential customers, other
storefronts from the same franchise, and other storefronts for competing franchises. From the customer's
perspective, the convenience of storefronts, especially for discretionary products or services, is of the
utmost importance. In fact, some franchises develop mobile phone applications to provide their customers
with an easy way to find the nearest storefront.1
This project takes an in-depth view of the storefront placements of Dunkin Donuts and Starbucks, two
competing franchises with strong presences in the Cambridge-Boston area. Both franchises purvey coffee,
coffee drinks, light meals, and pastries and cater especially well to sleep-deprived graduate students.
However, Dunkin Donuts typically puts more emphasis on take-out (convenience) customers looking to
grab a quick coffee before class whereas Starbucks provides an environment conducive to socializing,
meetings, writing theses, or studying over a longer duration. These differences in target customers may
drive differences in the distribution of storefronts in the area.
The goal of this project is to apply some of the concepts learned in ESD.86 on probabilistic modeling
to the real-world system of franchise storefronts and customers. The analysis focuses on
the convenience of accessing storefronts, determined by the distance to the nearest location from a
random customer. The nearest neighbor probabilistic model is a natural choice for application to this
problem. Under this model, the distance from a random uniformly-distributed customer to the closest
spatially Poisson distributed storefront can be expressed with a closed-form equation. Of course, in the
real-world system, there are several assumptions that must be checked.
Does the nearest-neighbor distance correlate with the actual closest storefront distance?
Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?
To answer these questions, as well as the greater question of which coffee franchise provides better
service to the residents of the Cambridge/Boston area, the project is broken down into three parts. First,
data must be gathered on the existing storefront locations within an area of interest. Fortunately, both
franchises provide store locator services from the corporate web sites. Additionally, data representing
Grogan ESD.86
the demand distribution either through population density or other relevant features are required for
constructing the customer model. Second, probabilistic distributions will be created in accordance with
the nearest neighbor model. Using the data gathered in the first phase, storefront locations will be
modeled as spatial Poisson distributions and customers will be modeled with uniform distributions.
Finally, comparative analysis will investigate the differences between the two franchises as well as the
underlying assumptions and accuracy of the probabilistic models.
Data Gathering
The data gathering portion of the project assembles the information required to build the probabilistic
models. There are two primary formats of data needed: positional data and population data. Positional
data provides coordinates for storefront locations for both franchises as well as locations of other features
that may be helpful in the analysis. Population data provides a sense of customer density that will be used
to help drive customer demand models.
Positional Coordinates
Not long ago, gathering position coordinates in a format conducive to numerical analysis would have
been an insurmountable challenge for a term project. Fortunately, with the confluence of several
technologies, it is no longer out of scope to build a very accurate representation of the real world.
The general process to gather location data is as follows:
1. Aggregate addresses using online-available services or documents
2. Process addresses into GPS coordinates using online GeoCoder tool2
3. Visualize GPS coordinates using online mapping applications such as Google Maps, iterating on
improperly-identified addresses as necessary
4. Transform GPS coordinates into Cartesian coordinates using the haversine formula3
The main innovation in the above steps is the availability of the GeoCoder tool, which allows batch
queries of addresses to either Yahoo or Google mapping applications. Though the queries are not always
correct, it dramatically reduces the time required to generate GPS coordinates (latitude and longitude)
from text-based addresses.
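Step 4 of the process above can be sketched as follows. This is a minimal illustration in Python rather than the code used in the project; the function names, the Earth radius of 3958.8 miles, and the choice of reference point are assumptions made for the example. Each point is placed on a Cartesian plane by taking haversine distances along the east-west and north-south directions from a reference origin.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def to_cartesian_miles(lat, lon, ref_lat, ref_lon):
    """Return (x, y) offsets in miles from the reference point.

    x is measured east-west along the reference latitude and
    y north-south along the reference longitude.
    """
    x = haversine_miles(ref_lat, ref_lon, ref_lat, lon)
    y = haversine_miles(ref_lat, ref_lon, lat, ref_lon)
    # Restore the sign lost by the (always non-negative) haversine distance.
    return math.copysign(x, lon - ref_lon), math.copysign(y, lat - ref_lat)


# Hypothetical usage: offset of one point relative to another.
x, y = to_cartesian_miles(42.3104, -71.1152, 42.3082, -71.1327)
euclidean = math.hypot(x, y)
manhattan = abs(x) + abs(y)
```

Over an area only a few miles across, this flat projection should differ negligibly from the true great-circle geometry.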
Franchise Storefronts
The franchise storefront addresses are readily available on both the Dunkin Donuts4 and Starbucks5
corporate websites. In both cases, the search criteria were limited to a target area within five miles of
ZIP code 02139, which resolves to a location near Central Square in Cambridge, MA. In addition, all
franchise storefront locations at Logan International Airport were removed under the assumption that
airline passengers are not locally-quantifiable customers. With these restrictions, there were a total
of 163 Dunkin Donuts and 59 Starbucks franchise storefronts identified in the target area.
MBTA Stations
As noted in one journal article, the optimal storefront placement for discretionary services may be at
intersections of high pedestrian traffic.6 In the Boston area, the MBTA public transportation system hosts
an average weekday ridership of 1.24 million customers as of April 20107 and is a prime target for
storefront location placement. In this project, MBTA stations on the red, blue, green, orange, and silver
lines were considered as inputs for a potential customer model. Also, as addresses are not widely used for
these stations, a freely distributable list of 142 stations, current through 2006 and including GPS coordinates,
was used for station location data.8
Visualizations
As an important part of gathering data, visualizations were used throughout the project to verify locations.
Figure 1 (below) shows plots of the storefront locations and MBTA stations using both GPS and
Cartesian coordinate systems. In the Cartesian coordinate system, the five-mile radius is highlighted.
Figure 1: a) Raw GPS Position Coordinates b) Cartesian Position Coordinates with 5-Mile Radius Highlighted
To improve the context of the franchise storefronts and MBTA stations, the location data was overlaid on
an area map9, as shown in Figure 2.
6 Berman, O., Larson, R., Fouska, N., Optimal Location of Discretionary Service Facilities, Transportation Science, Vol. 26, No. 3, pp. 201-211, August 1992.
7 Davey, R., MBTA Scorecard, April 2010. Retrieved 4/25/2010 from http://mbta.com/about_the_mbta/scorecard/
8 Demaine, E., Boston Subway Google Map. Retrieved 4/25/2010 from http://erikdemaine.org/maps/mbta/
9 Background map retrieved from Google Maps: http://maps.google.com
Population Density
Gathering population density data was a challenge for this project. Although population data is readily
available from decadal censuses, it is commonly aggregated by county or city, which is not conducive to
spatial analysis. Fortunately, an online Digital Atlas of Boston includes population maps based on the
1990 census, using red dots to represent 100 persons randomly distributed within a census tract.10
With some post-processing using Adobe Photoshop, the image was cropped, resized, and filtered to
display only the population information, which is readable using built-in MATLAB image processing
functions. The processed data is shown in Figure 3. Though there are some concerns over the accuracy of
the resulting population data,11 it should be internally consistent and helpful to the modeling
process.
10 Bowen, W., Boston and Vicinity: Total Population, 1997. Retrieved 4/25/2010 from http://130.166.124.2/boston/bos1.GIF
11 There is some discrepancy if a dot is one pixel or two and whether the pixels were sampled with or without
replacement. In some cases, one pixel could represent somewhere between 50 and 100 people, more if there could
be overlap, though from rough estimates, the 100 people per pixel seems to provide accurate population data.
Figure 3: a) Raw Population Data for Boston Area b) Processed Population Data of Target Area
Probabilistic Modeling
Within the topics covered in ESD.86, the discussion of spatial probability distributions involved the
nearest neighbor problem of finding the expected distance to the closest neighbor from a random point.
This problem uses a uniform distribution to select the customer and a spatial Poisson distribution for
the neighboring storefronts within a specified area.
If this type of problem is to be extended to a real-world case of storefronts, ultimately selecting whether
Dunkin Donuts or Starbucks is a closer neighbor for random customers, the distributions of both the
customers and storefronts should be investigated. A city-wide Poisson distribution of storefronts is not
likely to hold as there is clearly some location dependence in the storefront placing. On a smaller scale,
however, a spatial Poisson distribution is conceivable, as the exact placement of a storefront within a
small area may be independent of others. In a similar sense, a city-wide uniform distribution of customers
is not likely to hold as there are significantly higher concentrations of customers in the city-centers. On
smaller scales, however, uniformly-distributed customers may be a valid approximation.
To implement the concept of piecewise spatially Poisson distributed storefronts and uniformly distributed
customers, the initial 78.5 square mile target area (a circle with a 5 mile radius) was reduced to a 49
square mile region (a square with 7 miles per side). This area was then broken down into 100 square sectors, each
0.7 miles per side, or 0.49 square miles in area.
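The sector breakdown can be illustrated with a short sketch. It assumes Cartesian coordinates in miles measured from the center of the 7-mile square, which is a convention chosen for this example; the function name and the (row, column) indexing are likewise hypothetical.

```python
GRID_SIDE_MILES = 7.0
SECTORS_PER_SIDE = 10
SECTOR_SIDE_MILES = GRID_SIDE_MILES / SECTORS_PER_SIDE  # 0.7 miles
SECTOR_AREA_SQ_MILES = SECTOR_SIDE_MILES ** 2           # 0.49 square miles


def sector_index(x, y):
    """Map a point (x, y) in miles from the grid center to a (row, col) sector.

    Returns None for points outside the 7-mile square.
    """
    half = GRID_SIDE_MILES / 2
    if not (-half <= x < half and -half <= y < half):
        return None
    col = int((x + half) / GRID_SIDE_MILES * SECTORS_PER_SIDE)
    row = int((y + half) / GRID_SIDE_MILES * SECTORS_PER_SIDE)
    # Clamp to guard against floating-point rounding at the upper edge.
    return min(row, SECTORS_PER_SIDE - 1), min(col, SECTORS_PER_SIDE - 1)
```

Counting storefronts per sector then reduces to tallying the indices returned for each storefront coordinate.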
Figure 4: a) Target Area Divided into Sectors b) Sectors Highlighted by Neighborhood Assignment
With the relatively fine level of sector definition, many were not large enough to contain a storefront. In
order to determine a non-zero storefront density for each unit of analysis, sectors were grouped into seven
neighborhoods. The sizing of each neighborhood along with a description is provided in Table 1.
Table 1: Neighborhood Descriptions
Neighborhood   Number of Sectors   Description
Northwest      8                   Cambridge Highlands, Mt. Auburn, East Watertown
Cambridge      18                  Cambridge, East Cambridge
Northeast      21                  Everett, Somerville, Charlestown
Downtown       4                   Downtown Boston, North End
Back Bay       3                   Back Bay, Fenway
Southwest      17                  Brookline, Aberdeen, Brighton
Southeast      17                  Roxbury, South Boston, Harrison Lenox
The process of neighborhood definition was done by hand using approximate city or geographical
boundaries. The neighborhoods do not exactly correspond to the geographical equivalents due to the
discretization of sectors, though the labeling scheme helps infer relative location. The only requirement of
each neighborhood is that it must contain both a Dunkin Donuts and a Starbucks storefront, providing a
non-zero storefront density. In some cases, sectors did not fit into an existing neighborhood, nor did they
exhibit enough information to establish a new neighborhood, so they went unused.
Using a count of the number of storefronts within each neighborhood and the associated area, the
storefront density parameter was determined for each neighborhood, as shown in Table 2.
Table 2: Storefront Model Parameters
                            Number of Storefronts       Storefront Density (λ, 1/mi²)
Neighborhood   Area (mi²)   Starbucks   Dunkin Donuts   Starbucks   Dunkin Donuts   Either
Northwest      3.92         1           1               0.2551      0.2551          0.5102
Cambridge      8.82         11          23              1.2472      2.6077          3.8549
Northeast      10.29        2           26              0.1944      2.5267          2.7211
Downtown       1.96         13          27              6.6327      13.7755         20.4082
Back Bay       1.47         16          14              10.8844     9.5238          20.4082
Southwest      8.33         10          19              1.2005      2.2809          3.4814
Southeast      8.33         6           20              0.7203      2.4010          3.1212
Total          43.12        59          130             1.3683      3.0148          4.3831
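The density parameters in Table 2 follow from dividing each storefront count by the neighborhood area, where the area is the number of sectors times 0.49 square miles. A minimal sketch of that arithmetic is shown below; the function and variable names are illustrative only.

```python
SECTOR_AREA_SQ_MILES = 0.49

# name: (number of sectors, Starbucks count, Dunkin Donuts count), from Tables 1 and 2
NEIGHBORHOODS = {
    "Northwest": (8, 1, 1),
    "Cambridge": (18, 11, 23),
    "Northeast": (21, 2, 26),
    "Downtown": (4, 13, 27),
    "Back Bay": (3, 16, 14),
    "Southwest": (17, 10, 19),
    "Southeast": (17, 6, 20),
}


def storefront_densities(sectors, starbucks, dunkin):
    """Return (Starbucks, Dunkin Donuts, either) densities in storefronts per square mile."""
    area = sectors * SECTOR_AREA_SQ_MILES
    return starbucks / area, dunkin / area, (starbucks + dunkin) / area


densities = {name: storefront_densities(*row) for name, row in NEIGHBORHOODS.items()}
```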
Using the population density information previously gathered, the number of potential customers is
determined for each sector, which is then aggregated into neighborhoods. The probability of a customer
Neighborhood   Estimated Population   Customer Probability
Northwest      46900                  0.0558
Cambridge      180600                 0.2148
Northeast      162900                 0.1937
Downtown       40800                  0.0485
Back Bay       51700                  0.0615
Southwest      182500                 0.2170
Southeast      175500                 0.2087
Total          840900                 1.0000
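The customer probabilities above are each neighborhood's share of the total estimated population. A short sketch of that normalization, with illustrative variable names:

```python
# Estimated population by neighborhood, from the table above
population = {
    "Northwest": 46900,
    "Cambridge": 180600,
    "Northeast": 162900,
    "Downtown": 40800,
    "Back Bay": 51700,
    "Southwest": 182500,
    "Southeast": 175500,
}

total_population = sum(population.values())

# Probability that a randomly selected customer resides in each neighborhood
customer_probability = {
    name: count / total_population for name, count in population.items()
}
```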
Figure 7: Histograms Generated During Goodness-of-Fit Test for Spatial Poisson Distribution
It should also be noted that there is some flexibility for running the test with different numbers of bins and
different sized sectors. With larger sectors, there are more storefronts on average allowing more numerous
bins, but the downside is that the frequency in each bin is decreased.
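The trade-off between bin count and bin frequency can be seen in a sketch of a chi-square goodness-of-fit statistic for Poisson-distributed sector counts. This is an assumed form of the test, not necessarily the exact procedure used here: the binning scheme (one bin per count, with a final open-ended bin), the function names, and the sample counts are all hypothetical. The sketch stops at the statistic and degrees of freedom; a p-value would additionally require the chi-square distribution function.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def chi_square_poisson(counts, num_bins):
    """Chi-square statistic comparing per-sector counts with a fitted Poisson.

    Bins are 0, 1, ..., num_bins - 2, plus a final bin for counts >= num_bins - 1.
    Assumes the sample mean is positive.
    """
    n = len(counts)
    mean = sum(counts) / n  # Poisson mean estimated from the data
    observed = [0] * num_bins
    for c in counts:
        observed[min(c, num_bins - 1)] += 1
    probs = [poisson_pmf(k, mean) for k in range(num_bins - 1)]
    probs.append(1.0 - sum(probs))  # remaining mass goes to the open-ended bin
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = num_bins - 2  # one constraint for the total, one for the estimated mean
    return statistic, dof, observed, expected


# Hypothetical storefront counts for ten sectors
sample_counts = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]
statistic, dof, observed, expected = chi_square_poisson(sample_counts, num_bins=3)
```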
Comparative Analysis
Armed with the probabilistic models for both storefront and customer distributions, the next step is to
apply the models to compare the expected distances to each franchise between the neighborhoods, over
the entire city, and also to test the validity of the models by processing existing data.
Visualizations
In order to frame the resulting discussion, several visualizations of the storefront, MBTA station, and
population density are provided below. Storefronts are clearly focused in the Downtown and Back Bay
neighborhoods (where the population density is highest), though there is also significant population
density in the Cambridge and Southwest neighborhoods without similar storefront densities. Also, the
MBTA station density is higher in the Southwest neighborhood due to the numerous green line stops.
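$$f_{D_e}(r) = 2\pi\lambda r e^{-\lambda\pi r^2}$$
$$f_{D_m}(r) = 4\lambda r e^{-2\lambda r^2}$$
$$E[D_e] = \int_0^\infty r f_{D_e}(r)\,dr = \frac{1}{2\sqrt{\lambda}}$$
$$E[D_m] = \int_0^\infty r f_{D_m}(r)\,dr = \frac{1}{4}\sqrt{\frac{2\pi}{\lambda}}$$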
By applying the nearest-neighbor formulas, the expected distance to the nearest storefront of one or either
franchise can be determined for each neighborhood. When combined with the customer model,
aggregated values can be determined for the target area as a whole.
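As an illustration, the sketch below evaluates the expected nearest-neighbor distances for each neighborhood and combines them into an area-wide value. It assumes the closed forms E[De] = 1/(2√λ) and E[Dm] = (1/4)√(2π/λ) for a spatial Poisson process with density λ, and it assumes that aggregation is a customer-probability-weighted average; the names are illustrative.

```python
import math


def expected_euclidean(density):
    """Expected Euclidean distance (miles) to the nearest storefront."""
    return 1.0 / (2.0 * math.sqrt(density))


def expected_manhattan(density):
    """Expected Manhattan distance (miles) to the nearest storefront."""
    return 0.25 * math.sqrt(2.0 * math.pi / density)


# name: (customer probability, density of either franchise in 1/mi^2)
NEIGHBORHOODS = {
    "Northwest": (0.0558, 0.5102),
    "Cambridge": (0.2148, 3.8549),
    "Northeast": (0.1937, 2.7211),
    "Downtown": (0.0485, 20.4082),
    "Back Bay": (0.0615, 20.4082),
    "Southwest": (0.2170, 3.4814),
    "Southeast": (0.2087, 3.1212),
}

# Assumed aggregation: weight each neighborhood by its customer probability.
aggregated_euclidean = sum(
    p * expected_euclidean(lam) for p, lam in NEIGHBORHOODS.values()
)

downtown_feet = expected_euclidean(NEIGHBORHOODS["Downtown"][1]) * 5280
```

Substituting the Starbucks-only or Dunkin Donuts-only densities from Table 2 gives the per-franchise values in the same way.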
Table 4: Probabilistic Model Results by Neighborhood
Neighborhood   Customer Prob.
Northwest      0.0558
Cambridge      0.2148
Northeast      0.1937
Downtown       0.0485
Back Bay       0.0615
Southwest      0.2170
Southeast      0.2087
Aggregated     1.0000
The Downtown and Back Bay neighborhoods have the shortest expected distance to either storefront, with
an average of just 580 feet to the closest Dunkin Donuts or Starbucks. A few neighborhoods show
drastically different storefront placement strategies between the two franchises. Dunkin Donuts holds a
large advantage in the Northeast and a moderate advantage in the Southeast. The only neighborhood where
Starbucks holds an advantage is the Back Bay.
Neighborhood   Customer Count
Northwest      469
Cambridge      1806
Northeast      1629
Downtown       408
Back Bay       517
Southwest      1825
Southeast      1755
Aggregated     8409
Remarkably, the results closely mirror those of the model using the estimated expected distance under
uniformly distributed customers and Poisson spatially-distributed storefronts in each sector. All of the
aggregated measures are within 10% of the previous estimates. The largest neighborhood-specific
differences occurred in the Northeast and Northwest neighborhoods (with errors around 0.5 miles for the
advantage of the nearest storefront), indicating that the spatial Poisson distribution may not be a good fit
for these regions.
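The data-driven calculation can be sketched as a nearest-storefront search under each distance metric. The example below is an illustration with hypothetical coordinates and names, assuming customers and storefronts are already expressed in Cartesian miles.

```python
import math


def nearest_distances(customer, storefronts):
    """Return (Euclidean, Manhattan) distance from a customer to the nearest storefront.

    The minimum is taken separately for each metric, so the two values
    may refer to different storefronts.
    """
    cx, cy = customer
    euclidean = min(math.hypot(sx - cx, sy - cy) for sx, sy in storefronts)
    manhattan = min(abs(sx - cx) + abs(sy - cy) for sx, sy in storefronts)
    return euclidean, manhattan


def mean_nearest_distances(customers, storefronts):
    """Average the nearest-storefront distances over a set of customers."""
    pairs = [nearest_distances(c, storefronts) for c in customers]
    n = len(pairs)
    return sum(e for e, _ in pairs) / n, sum(m for _, m in pairs) / n


# Hypothetical example: the closest storefront differs between the two metrics.
storefronts = [(1.0, 0.0), (0.6, 0.6)]
euclidean, manhattan = nearest_distances((0.0, 0.0), storefronts)
```

In this example the diagonal storefront is nearest in Euclidean terms while the storefront due east is nearest in Manhattan terms.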
Figure 10: a) Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) b) Absolute Expected Distance Error
The effectiveness of the model can be statistically evaluated using a paired t-test by neighborhood. The
null hypothesis that the mean distance to the closest storefront is the same under the two approaches cannot
be rejected for Dunkin Donuts (p=0.418), Starbucks (p=0.492), or either franchise (p=0.994) using
Euclidean distance.
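For reference, the paired t statistic is the mean of the per-neighborhood differences divided by its standard error. The sketch below uses only the Python standard library and hypothetical input values; it stops at the statistic and degrees of freedom, since the p-values quoted here would additionally require the Student t distribution function.

```python
import math
import statistics


def paired_t_statistic(model_values, data_values):
    """Return (t statistic, degrees of freedom) for paired observations."""
    diffs = [m - d for m, d in zip(model_values, data_values)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (std_diff / math.sqrt(n)), n - 1


# Hypothetical expected distances (miles) for seven neighborhoods
model = [0.70, 0.31, 0.32, 0.14, 0.16, 0.33, 0.32]
data = [0.62, 0.30, 0.35, 0.13, 0.17, 0.31, 0.34]
t_stat, dof = paired_t_statistic(model, data)
```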
Comparison with Exact MBTA Station Demand
Under the theory that optimal storefronts should be placed at the intersections of high-traffic areas,
MBTA stations are a prime location for Dunkin Donuts and Starbucks. This next section investigates
how the minimum distance to the closest storefront varies if customers originate only from MBTA
stations, rather than their residences. MBTA stations for the red, green, blue, orange, and silver lines
within the neighborhoods are used.
Table 6: MBTA-based Customer Results by Neighborhood
Neighborhood   MBTA Station Count
Northwest      0
Cambridge      14
Northeast      6
Downtown       11
Back Bay       12
Southwest      45
Southeast      33
Aggregated     121
In every neighborhood except the Southwest (for Starbucks under the Manhattan distance), the mean distance
from an MBTA station to the nearest storefront is lower than the mean distance under the population distribution. In the case of
Cambridge, the mean distances were 60-75% lower, indicating a strong preference for storefront locations
near public transport stations. In the aggregate sense, the mean distances were consistently around 35%
lower than simply using population density data.
Figure 11: Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) with MBTA Customers
The difference in expected distance between the population-based and MBTA-based models can be
statistically evaluated using a paired t-test by neighborhood. The null hypothesis that the mean distance to
the closest storefront is the same under the two approaches can be rejected for Dunkin Donuts (p=0.022)
and either franchise (p=0.028), but not for Starbucks (p=0.072), using Euclidean distance.
$$E\left[\frac{D_m}{D_e}\right] = \frac{4}{\pi}$$
This formula, however, assumes that the origin and destination remain constant between the two metrics.
In the application used in this project, of finding the nearest storefront using one or the other metric, the
nearest physical location may differ between the two metrics. For example, even if storefront A is closest
under the Euclidean metric, there may be another storefront B that is closer under the Manhattan metric.
Therefore, in this application, we would expect the ratio between distance metrics to be less than 4/π.
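The 4/π value for a fixed origin-destination pair can be recovered with a short derivation. It assumes that the direction of travel, at angle θ to the street grid, is uniformly distributed, so that the Manhattan distance equals the Euclidean distance scaled by |cos θ| + |sin θ|:

$$\frac{D_m}{D_e} = |\cos\theta| + |\sin\theta|$$
$$E\left[\frac{D_m}{D_e}\right] = \frac{1}{2\pi}\int_0^{2\pi}\left(|\cos\theta| + |\sin\theta|\right)d\theta = \frac{2}{\pi}\int_0^{\pi/2}\left(\cos\theta + \sin\theta\right)d\theta = \frac{2}{\pi}(1 + 1) = \frac{4}{\pi} \approx 1.2732$$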
To investigate, the 8409 pair-wise distances for the population-based analysis can be plotted against each
other in a scatter plot. As expected, the Manhattan metric distance is always greater than the Euclidean
metric distance, but the expected ratio (highlighted in red), does not appear to be at the center of the
distribution.
Figure 12: Scatter Plot of Closest Dunkin' Donuts Storefront under Different Distance Metrics
Using a hypothesis test on the mean, the ratio of Manhattan-to-Euclidean distances is found to be 1.2617 with
a 95% confidence interval of [1.2591, 1.2642]. Of note, this interval does not include 4/π ≈ 1.2732,
meaning the expected ratio between the closest storefront distance under the Manhattan metric and the closest
storefront distance under the Euclidean metric is not 4/π in this application, though for all practical purposes,
the approximation is fine.
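A confidence interval of this kind can be sketched as the sample mean plus or minus a multiple of its standard error. The example below assumes a normal approximation with z = 1.96, which is reasonable for a sample of several thousand ratios; the function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_confidence_interval(values, z=1.96):
    """Return (mean, lower bound, upper bound) using a normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical Manhattan-to-Euclidean ratios for a handful of customers
ratios = [1.00, 1.15, 1.27, 1.35, 1.41, 1.22, 1.30]
mean_ratio, lower, upper = mean_with_confidence_interval(ratios)

# Check whether the theoretical fixed-pair ratio falls inside the interval
contains_four_over_pi = lower <= 4 / math.pi <= upper
```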
Distance Metrics vs. Google Distance
Aside from how the distance metrics relate to each other, it is of interest to see how they compare to
real distance calculations. Google Maps provides a direction generation service that provides a walking
distance between an origin and a destination point. Although it is still under development, it often
provides more realistic distance calculations based on obstructions such as waterways, highways, crooked
Boston streets, and buildings.
Since the question of distance metric accuracy does not rely on the underlying customer distribution,
customers were selected at random with a uniform distribution. In total, 80 customer origins were
generated, each being paired with the closest Dunkin Donuts (via Euclidean metric) and the GPS
coordinates were used to find the walking distance in Google Maps.
There are a few challenges to using the Google distance metric. First, the output is typically rounded to
the nearest 0.1 mile, which causes some accuracy problems. Second, there are some locations for which
walking directions do not exist (e.g. if the point falls in the middle of a highway); these points were
omitted from the analysis. Aside from these points, there were a couple of outliers in the resulting data set
where the Google distance was (much) greater than the Euclidean or Manhattan distance. These points
typically corresponded to navigating the network of roads and highways between the coastal islands near
Logan Airport.
Figure 14: Distance Ratios for a) Google vs. Euclidean b) Google vs. Manhattan
In general, the Manhattan metric outperformed the Euclidean metric when compared to the Google
distance. The Euclidean metric under-estimated the Google distances by an average of 28% (37% if
outliers are included). The Manhattan metric under-estimated the Google distance by an average of 9%
(20% if outliers are included). Note that in this case, the Manhattan-to-Euclidean metric ratio confidence
interval does include 4/π, as the origin-destination locations are invariant under the choice of metric.
Table 7: Distance Metric Ratios
         Without Outliers                 With Outliers
Ratio    95% CI LB   Mean     95% CI UB   95% CI LB   Mean     95% CI UB
Dg/De    1.3191      1.4019   1.4847      1.2955      1.6151   1.9348
Dg/Dm    1.0356      1.1031   1.1706      1.0253      1.2615   1.4977
Dm/De    1.2562      1.2821   1.3080      1.2577      1.2831   1.3084
Conclusions
In conclusion, Dunkin Donuts holds a stronger grasp on the Cambridge-Boston area coffee market. The
areas of greatest advantage for Dunkin Donuts include the Northeast and the Southeast. Only in the Back
Bay neighborhood does Starbucks hold a shorter expected distance. Contributing to this analysis, several
key assumptions have been checked and are summarized as follows.
Can the franchise storefronts be modeled with a spatial Poisson distribution?
On a city-wide scale, the spatial Poisson distribution does not accurately model the franchise storefront
locations. However, on a smaller scale such as neighborhoods, especially in regions of uniform
characteristics and low-density, spatial Poisson distributions can be used to accurately model the locations
of franchise storefronts.
Can the customers be modeled with a uniform distribution?
Similar to the storefront distribution method, although it is difficult to model an entire city with a uniform
customer distribution, on a smaller scale such as neighborhoods, piecewise-uniform customer
distributions can be used to model demands.
Does the nearest-neighbor distance correlate with the actual closest storefront distance?
The nearest-neighbor expected distance was not statistically different from the actual distances to the
closest storefront for both Dunkin Donuts and Starbucks, indicating that it seems to be an accurate
estimation of actual closest storefronts. The approximation was more accurate in neighborhoods with a
higher storefront density, such as Downtown and Back Bay, and less accurate in less well-defined
neighborhoods such as the Northeast and Northwest.
Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?
When comparing the Euclidean and Manhattan distance metrics as applied in the nearest-neighbor
problem, the differences observed were slightly less than the expected ratio due to changes in the closest
storefront when transitioning from one metric to the other. When compared to a realistic Google distance
metric (using Google Maps to calculate the walking distance), both the Euclidean and Manhattan metrics
underestimated the distances, though the Manhattan metric was generally closer (within 9% not considering
outliers). There were a few outliers observed corresponding to customers originating in difficult-to-access
spots such as between islands connected by few roads and/or highways.
Latitude
Longitude
42.3650774
-71.1032368
42.3664749
-71.0938686
42.372795
-71.093455
42.3721864
-71.115655
42.3720035
-71.1206418
42.362805
-71.084037
42.3487341
-71.096409
42.3790404
-71.0940013
42.3583
-71.1263
42.3458805
-71.1082714
42.3517146
-71.1216648
42.3745146
-71.0855527
42.373362
-71.118956
42.362548
-71.1303143
42.3831933
-71.1062606
42.3485475
-71.0864513
42.3724142
-71.0794484
42.3435354
-71.1015308
42.34649
-71.0873454
42.367101
-71.076376
42.381481
-71.0851061
42.340164
-71.105855
42.3497
-71.08
42.348296
-71.083099
42.3499
-71.1302
42.3532517
-71.1336082
42.3484649
-71.0780325
42.3423157
-71.1211249
42.3386435
-71.1068662
42.3884292
-71.1035077
42.3420175
-71.0859189
42.3484
-71.075
42.3823624
-71.0791666
42.3376055
-71.1087189
42.34719
-71.075406
42.3559771
-71.1385555
42.3885801
-71.1190066
42.353605
-71.1374228
42.334632
-71.0691073
42.3896854
-71.0883978
42.3678
-71.0649
42.3654553
-71.0611291
42.3521066
-71.0673109
42.3904617
-71.0873328
42.3341
-71.1039
42.3611697
-71.0628675
42.3643423
-71.0633287
42.360367
-71.062124
42.3577347
-71.063153
42.3353579
-71.0878575
42.3643579
-71.0661193
42.3751605
-71.0650097
42.3523815
-71.0650276
42.3643579
-71.0661193
42.3652919
-71.0609754
42.3564
-71.0618
42.3334223
-71.1188349
42.352419
-71.062727
42.3490894
-71.0601767
42.3592205
-71.0595142
42.3369897
-71.0774615
42.354485
-71.05942
42.3508657
-71.0622508
42.3557073
-71.0602716
42.3932588
-71.0832451
42.3318209
-71.1174846
42.3975134
-71.104938
42.3558
-71.149
42.3438149
-71.0660403
42.3953081
-71.1218721
42.3582252
-71.0590107
42.3578883
-71.0579421
42.3577028
-71.0593792
42.3930959
-71.0884036
42.362775
-71.059621
42.3535216
-71.0578313
42.354979
-71.0577489
42.3608194
-71.0558302
42.35672
-71.056577
42.3486807
-71.1496925
42.3348927
-71.0751815
42.3305118
-71.1238184
42.3591596
-71.0548161
42.3536133
-71.0561934
42.3294746
-71.084687
42.3863016
-71.1387706
42.365841
-71.060724
42.3543283
-71.0543216
42.356767
-71.0535218
42.3892258
-71.1429552
42.4009618
-71.1169843
42.3312853
-71.0748777
42.3490753
-71.1530339
42.3579577
-71.0508084
42.3714388
-71.157704
42.3425236
-71.0565548
42.3232
-71.1036
42.398885
-71.1321608
42.3946428
-71.1424549
42.3503648
-71.0503943
42.4053859
-71.101411
42.363237
-71.155593
42.351117
-71.049438
42.4058055
-71.0948504
42.3967042
-71.0651774
42.3358927
-71.1499417
42.3358
-71.056
42.349535
-71.040589
42.4074146
-71.0826755
42.313405
-71.0570969
42.3168307
-71.0982039
42.369
-71.0391999
42.3297
-71.0572
42.3505411
-71.1672294
42.4108632
-71.0881968
42.4110023
-71.1208938
42.3743987
-71.0395548
42.3958
-71.0526
42.4142048
-71.1106422
42.3355984
-71.0458407
42.403701
-71.057403
42.4054905
-71.0617024
42.3118183
-71.114301
42.4122525
-71.0791565
42.3399488
-71.1671857
42.3209
-71.061
42.3345983
-71.0475223
42.3892
-71.0408
42.409729
-71.1394997
42.4134649
-71.131227
42.3104
-71.1152
42.3217677
-71.0568335
42.4182977
-71.1096766
42.4026488
-71.0497033
42.4029719
-71.0576032
42.3938
-71.0386
42.3094486
-71.0825407
42.3663261
-71.1818116
42.3983547
-71.040518
42.3685455
-71.0300292
42.4120094
-71.1486312
42.4095468
-71.0532538
42.3934689
-71.0338541
42.423165
-71.0912071
42.4216695
-71.0753317
42.4211848
-71.1331323
42.3567005
-71.1875089
42.3853394
-71.1833662
42.423576
-71.0712012
42.3084253
-71.0582217
42.4050143
-71.0355293
42.4144921
-71.0474161
42.3622518
-71.1931601
42.4194124
-71.1527705
42.4239636
-71.0656994
42.4273278
-71.0740543
42.3709389
-71.1952175
42.3870472
-71.190931
42.2956999
-71.1162
42.4265519
-71.0672918
42.3982
-71.0207
42.4009
-71.0216
42.4201478
-71.0439663
Starbucks Locations
Address
Latitude
Longitude
42.3656973
-71.1041728
42.3567886
-71.0945479
42.350478
-71.1096413
42.3488
-71.0994
42.349347
-71.099656
42.362707
-71.0864538
42.3507655
-71.1138252
42.362793
-71.086199
42.3449155
-71.1009163
42.3738963
-71.1126878
42.3481
-71.0872
42.3466
-71.0878
42.3720035
-71.1206418
42.349244
-71.080974
42.346302
-71.084032
42.3508
-71.0789
42.3743901
-71.1200808
42.3673199
-71.0775368
42.346175
-71.079397
42.341792
-71.0862592
42.3488089
-71.0775803
42.3384972
-71.107699
42.3485141
-71.076156
42.3379
-71.1045
42.3402211
-71.0889516
42.3491045
-71.1299685
42.3515
-71.0729999
42.3470869
-71.1285487
42.3588955
-71.0707138
42.351456
-71.067188
42.3818755
-71.1198028
42.3600106
-71.0583794
42.3424754
-71.0744816
42.3854634
-71.1135012
42.3511502
-71.0662507
42.352269
-71.064369
42.3337518
-71.1188552
42.348839
-71.06427
42.3557941
-71.0613094
42.3593092
-71.0594142
42.3774405
-71.0647911
42.357925
-71.057883
42.3388327
-71.1367592
42.3549
-71.0569
42.355626
-71.056716
42.3531
-71.0575
42.35996
-71.05579
42.3590059
-71.0559325
42.352344
-71.056266
42.3547995
-71.0548818
42.3602
-71.0509
42.3561655
-71.0520816
42.3589
-71.1536
42.364
-71.0505
42.3955927
-71.1220376
42.363459
-71.157439
42.389376
-71.142569
42.3293
-71.0627
42.3486
-71.1596
Latitude
Longitude
Alewife Station
42.39490705
-71.14098072
Davis Station
42.39606385
-71.12205505
42.38834612
-71.1192441
42.373939
-71.119106
Central Station
42.36516345
-71.10332251
Kendall/MIT Station
42.36246023
-71.08658552
42.36127109
-71.07208014
42.35619719
-71.06229544
42.355295
-71.060788
South Station
42.35170961
-71.05499983
Broadway Station
42.3429
-71.05713
Andrew Station
42.32955
-71.05696
42.32143786
-71.05239272
42.31130702
-71.05322957
42.30026198
-71.06070757
Shawmut Station
42.29279438
-71.06578231
Ashmont Station
42.285924
-71.064219
42.27842012
-71.05974197
Butler Station
42.27211695
-71.06276751
Milton Station
42.27034655
-71.06794953
42.27001311
-71.07324958
42.26789332
-71.08306646
42.2675678
-71.08722925
Mattapan Station
42.26745665
-71.09313011
42.27481612
-71.02917552
Wollaston Station
42.26561466
-71.01940155
42.25093242
-71.00497127
42.23275157
-71.00714922
Braintree Station
42.20878042
-71.00133419
Lechmere Station
42.370582
-71.076884
42.36667752
-71.06816411
North Station
42.365512
-71.061423
Haymarket Station
42.362498
-71.058996
42.359297
-71.059895
42.35239149
-71.06487036
Arlington Station
42.351868
-71.070498
Copley Station
42.349962
-71.078089
42.348097
-71.088396
Kenmore Station
42.348797
-71.095296
42.349297
-71.100796
42.349648
-71.103825
42.34993352
-71.10618711
42.35090086
-71.1140728
42.3511308
-71.11590743
42.35134488
-71.11821413
42.35174133
-71.12126112
42.35207434
-71.12486601
42.35023483
-71.13102436
42.34871243
-71.13415718
42.34844284
-71.13778353
42.34847455
-71.14029408
42.34368509
-71.142869
42.34149641
-71.14662409
42.3403386
-71.15130186
42.33808635
-71.15334034
42.33957728
-71.15778208
42.33994208
-71.16619349
42.34613537
-71.10680938
42.34495386
-71.11101508
42.343997
-71.114596
42.34322516
-71.11734509
42.342097
-71.121396
42.34128229
-71.12461925
42.340072
-71.128526
Fairbanks Station
42.339609
-71.13134623
42.33933937
-71.13542318
42.33846702
-71.13879204
42.33770568
-71.14196777
42.33713468
-71.14512205
42.33589747
-71.1507225
Fenway Station
42.34528691
-71.10439539
Longwood Station
42.34044962
-71.11089706
42.33204296
-71.11811757
42.33121016
-71.12586379
Beaconsfield Station
42.33596092
-71.14160299
Reservoir Station
42.33493783
-71.14940286
42.32667321
-71.16551757
42.32935418
-71.1923182
42.32169964
-71.20617986
Eliot Station
42.31919287
-71.21691942
Waban Station
42.32626075
-71.23117805
Woodland Station
42.33368473
-71.24492168
Riverside Station
42.33711088
-71.2517345
Prudential Station
42.34563581
-71.08158588
Symphony Station
42.342697
-71.085095
42.34032274
-71.08889222
42.33772154
-71.09547973
42.335837
-71.100652
42.334097
-71.104996
42.33374818
-71.10558629
42.33322472
-71.1070776
Riverway Station
42.33197951
-71.11207724
42.33007596
-71.11133695
42.3287593
-71.11059666
State Station
42.358897
-71.057795
Aquarium Station
42.359456
-71.05357
Maverick Station
42.36886
-71.039926
42.37273343
-71.0351944
42.380797
-71.023394
42.386676
-71.006628
42.38840159
-71.00035787
Beachmont Station
42.39741872
-70.99219322
42.40716336
-70.99219322
Wonderland Station
42.414246
-70.992144
42.43534302
-71.07118964
42.42731334
-71.07387185
Wellington Station
42.40429559
-71.07700467
42.38575484
-71.07707977
42.38301288
-71.07710123
42.37263832
-71.07027769
Chinatown Station
42.352228
-71.062892
42.349873
-71.063795
42.34727722
-71.07603908
42.34155192
-71.08321667
Ruggles Station
42.33566748
-71.090523
42.33152742
-71.09540462
42.32273881
-71.1000824
42.31920081
-71.10282898
42.31056915
-71.10731363
42.29814321
-71.11548901
42.346377
-71.064842
42.343878
-71.066039
42.341197
-71.069795
42.338697
-71.073795
42.337456
-71.075812
42.336441
-71.077238
42.33504887
-71.07881784
42.33290747
-71.0810709
42.32889414
-71.08511567
Grogan ESD.86
30
Courthouse Station
42.35207434
-71.04530096
42.3488393
-71.04253292
42.34801465
-71.0371685
42.36628117
-71.01931572
42.34661908
-71.03523731
42.34509659
-71.03197575
42.344602
-71.028307
42.34393885
-71.02721214
42.3438992
-71.03431463
42.34468
-71.034797
42.33986278
-71.03553772
42.3381498
-71.03345633
City Point
42.3382291
-71.02935791
Population Density
The data are stored as a 254x254-pixel GIF image scaled to approximately 25.4 pixels per mile, so the
image spans roughly 10 miles on each side. Each black pixel represents 100 units of population; white
pixels represent none.
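As a rough illustration of this encoding, the MATLAB sketch below converts black pixels into a population count and mile offsets. The synthetic mask, the two sample pixel positions, and the choice of the image center as the origin are assumptions made for illustration; the source does not show how the GIF itself is read.

```matlab
% Sketch only: decode a population mask using the stated image scale.
N = 254;                    % image is 254x254 pixels
px_per_mile = 25.4;         % approximate scale stated in the text
mask = false(N);            % stand-in for the GIF (true = black pixel)
mask(100,127) = true;       % hypothetical black pixels
mask(200,50)  = true;

[row, col] = find(mask);                 % pixel indices of populated cells
x_miles = (col - N/2) / px_per_mile;     % miles east of the assumed center origin
y_miles = (N/2 - row) / px_per_mile;     % miles north (image rows increase downward)
population  = 100 * nnz(mask);           % each black pixel counts 100 units
width_miles = N / px_per_mile;           % side length of the mapped region
```

With two black pixels the sketch counts 200 units of population over a region about 10 miles wide.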
| Customer Latitude | Customer Longitude | Nearest Dunkin Donuts Latitude | Nearest Dunkin Donuts Longitude | Euclidean Distance (De, miles) | Manhattan Distance (Dm, miles) | Google Distance (Dg, miles) |
|---|---|---|---|---|---|---|
| 42.3082 | -71.1327 | 42.3104 | -71.1152 | 0.9063 | 1.0454 | 1.2 |
| 42.4231 | -71.0546 | 42.424 | -71.0657 | 0.5698 | 0.6263 | 0.7 |
| 42.3773 | -71.1012 | 42.379 | -71.094 | 0.3867 | 0.4878 | 0.4 |
| 42.3162 | -71.0922 | 42.3168 | -71.0982 | 0.3096 | 0.3501 | 0.5 |
| 42.3481 | -71.1549 | 42.3491 | -71.153 | 0.1167 | 0.1627 | 0.1 |
| 42.4119 | -71.1774 | 42.4194 | -71.1528 | 1.3603 | 1.7764 | 1.8 |
| 42.3666 | -71.0827 | 42.3628 | -71.084 | 0.2709 | 0.3305 | 0.4 |
| 42.3673 | -71.0242 | 42.3685 | -71.03 | 0.3098 | 0.3836 | 0.3 |
| 42.3539 | -71.1775 | 42.3567 | -71.1875 | 0.5464 | 0.7045 | 0.6 |
| 42.3199 | -71.1624 | 42.3359 | -71.1499 | 1.275 | 1.741 | 1.7 |
| 42.3285 | -71.1936 | 42.3399 | -71.1672 | 1.5634 | 2.1395 | 1.7 |
| 42.3332 | -71.0489 | 42.3346 | -71.0475 | 0.1195 | 0.1669 | 0.1 |
| 42.3654 | -71.1112 | 42.3651 | -71.1032 | 0.4071 | 0.4288 | 0.5 |
| 42.3694 | -71.1135 | 42.3722 | -71.1157 | 0.2217 | 0.3025 | 0.2 |
| 42.3503 | -71.0367 | 42.3495 | -71.0406 | 0.2055 | 0.2514 | 0.3 |
| 42.4149 | -71.1224 | 42.411 | -71.1209 | 0.2801 | 0.3462 | 0.4 |
| 42.3359 | -71.1003 | 42.3341 | -71.1039 | 0.2219 | 0.3082 | 0.3 |
| 42.3812 | -71.0732 | 42.3824 | -71.0792 | 0.315 | 0.3849 | 0.5 |
| 42.3337 | -71.1814 | 42.3399 | -71.1672 | 0.8444 | 1.1574 | 1.0 |
| 42.3293 | -71.1063 | 42.3341 | -71.1039 | 0.3536 | 0.4542 | 0.7 |
| 42.3598 | -71.0539 | 42.3592 | -71.0548 | 0.0644 | 0.091 | 0.08 |
| 42.4084 | -71.0839 | 42.4074 | -71.0827 | 0.0924 | 0.1306 | 0.3 |
| 42.3766 | -71.0385 | 42.3744 | -71.0396 | 0.1613 | 0.2059 | 0.2 |
| 42.3112 | -71.0867 | 42.3094 | -71.0825 | 0.2444 | 0.3334 | 0.3 |
| 42.3756 | -71.1229 | 42.3734 | -71.119 | 0.2539 | 0.356 | 0.3 |
| 42.3173 | -71.0822 | 42.3094 | -71.0825 | 0.5428 | 0.5599 | 0.6 |
| 42.3482 | -71.1098 | 42.3459 | -71.1083 | 0.1783 | 0.2383 | 0.3 |
| 42.3658 | -71.1789 | 42.3663 | -71.1818 | 0.153 | 0.185 | 0.2 |
| 42.4109 | -71.084 | 42.4109 | -71.0882 | 0.2143 | 0.2168 | 0.4 |
| 42.3824 | -71.0294 | 42.3892 | -71.0408 | 0.748 | 1.0518 | 1.0 |
| 42.3994 | -71.1121 | 42.401 | -71.117 | 0.2717 | 0.3573 | 0.3 |
| 42.3123 | -71.09 | 42.3094 | -71.0825 | 0.4288 | 0.5778 | 0.5 |
| 42.3648 | -71.092 | 42.3665 | -71.0939 | 0.15 | 0.2111 | 0.2 |
| 42.4141 | -71.1559 | 42.412 | -71.1486 | 0.3982 | 0.5155 | 0.5 |
| 42.3101 | -71.0786 | 42.3094 | -71.0825 | 0.2062 | 0.2462 | 0.3 |
| 42.3792 | -71.1679 | 42.3714 | -71.1577 | 0.7473 | 1.0568 | 0.9 |
| 42.3063 | -71.0601 | 42.3084 | -71.0582 | 0.1754 | 0.2427 | 0.2 |
| 42.3498 | -71.158 | 42.3491 | -71.153 | 0.2584 | 0.3036 | 0.3 |
| 42.3895 | -71.1657 | 42.3853 | -71.1834 | 0.9466 | 1.1894 | 1.4 |
| 42.3089 | -71.0686 | 42.3084 | -71.0582 | 0.5308 | 0.5626 | 0.7 |
| 42.3294 | -71.0452 | 42.3346 | -71.0475 | 0.3782 | 0.4777 | 2.7 |
| 42.3171 | -71.1547 | 42.3359 | -71.1499 | 1.321 | 1.5414 | 1.8 |
| 42.3781 | -71.1659 | 42.3714 | -71.1577 | 0.622 | 0.8787 | 0.7 |
| 42.3287 | -71.1906 | 42.3399 | -71.1672 | 1.4258 | 1.9726 | 1.7 |
| 42.3303 | -71.1389 | 42.3359 | -71.1499 | 0.6834 | 0.9501 | 0.8 |
| 42.4068 | -71.0983 | 42.4054 | -71.1014 | 0.1865 | 0.2565 | 0.4 |
| 42.348 | -71.0513 | 42.3504 | -71.0504 | 0.1698 | 0.2096 | 0.2 |
| 42.3095 | -71.0549 | 42.3084 | -71.0582 | 0.1851 | 0.2438 | 0.3 |
| 42.3212 | -71.191 | 42.3399 | -71.1672 | 1.7766 | 2.5112 | 2.4 |
| 42.3435 | -71.1304 | 42.3499 | -71.1302 | 0.4423 | 0.4524 | 0.8 |
| 42.4108 | -71.1094 | 42.4142 | -71.1106 | 0.2436 | 0.2987 | 0.3 |
| 42.3746 | -71.1139 | 42.3722 | -71.1157 | 0.1893 | 0.2564 | 0.2 |
| 42.3297 | -71.1879 | 42.3399 | -71.1672 | 1.2727 | 1.7656 | 1.5 |
| 42.3429 | -71.0875 | 42.342 | -71.0859 | 0.1012 | 0.1417 | 0.2 |
| 42.4125 | -71.0943 | 42.4109 | -71.0882 | 0.3315 | 0.4247 | 0.4 |
| 42.4142 | -71.1431 | 42.412 | -71.1486 | 0.3204 | 0.4337 | 0.4 |
| 42.422 | -71.1892 | 42.4194 | -71.1528 | 1.8684 | 2.0386 | 2.3 |
| 42.3311 | -71.1714 | 42.3399 | -71.1672 | 0.6481 | 0.8265 | 0.8 |
| 42.3686 | -71.1429 | 42.3632 | -71.1556 | 0.7465 | 1.0186 | 1.2 |
| 42.3192 | -71.0623 | 42.3209 | -71.061 | 0.1349 | 0.1838 | 0.2 |
| 42.3003 | -71.1411 | 42.2957 | -71.1162 | 1.3103 | 1.589 | 2.4 |
| 42.3312 | -71.0764 | 42.3313 | -71.0749 | 0.0779 | 0.0836 | 0.083 |
| 42.4268 | -71.0322 | 42.4201 | -71.044 | 0.7564 | 1.0603 | 0.9 |
| 42.3545 | -71.0299 | 42.3495 | -71.0406 | 0.6446 | 0.8887 | 8.2 |
| 42.3158 | -71.0907 | 42.3168 | -71.0982 | 0.3897 | 0.4543 | 0.6 |
| 42.4124 | -71.0296 | 42.405 | -71.0355 | 0.5933 | 0.813 | 0.8 |
| 42.2998 | -71.1247 | 42.2957 | -71.1162 | 0.5182 | 0.7172 | 1.1 |
| 42.311 | -71.1881 | 42.3399 | -71.1672 | 2.2673 | 3.0679 | 3.5 |
| 42.3287 | -71.1439 | 42.3359 | -71.1499 | 0.5849 | 0.8054 | 0.6 |
| 42.3957 | -71.1896 | 42.387 | -71.1909 | 0.6017 | 0.6658 | 1.0 |
| 42.4139 | -71.0443 | 42.4145 | -71.0474 | 0.1643 | 0.2 | 0.3 |
| 42.3251 | -71.1197 | 42.3305 | -71.1238 | 0.429 | 0.5842 | 0.8 |
| 42.3843 | -71.179 | 42.3853 | -71.1834 | 0.2342 | 0.2947 | 0.5 |
| 42.4035 | -71.1712 | 42.412 | -71.1486 | 1.2935 | 1.7401 | 1.7 |
| 42.4206 | -71.153 | 42.4194 | -71.1528 | 0.0829 | 0.0938 | 0.1 |
| 42.4108 | -71.1563 | 42.412 | -71.1486 | 0.4003 | 0.4751 | 0.8 |
| 42.3475 | -71.0438 | 42.3495 | -71.0406 | 0.216 | 0.3045 | 0.4 |
| 42.419 | -71.0619 | 42.424 | -71.0657 | 0.394 | 0.5369 | 0.6 |
| 42.3532 | -71.1351 | 42.3533 | -71.1336 | 0.0762 | 0.0797 | 0.078 |
| 42.3171 | -71.1281 | 42.3118 | -71.1143 | 0.7934 | 1.0694 | 1.2 |
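To show how the Euclidean and Manhattan columns relate to the coordinates, the sketch below recomputes both for the first customer-storefront pair listed above (42.3082, -71.1327 to 42.3104, -71.1152). The Earth radius of 3959 miles, the reference latitude of 42.36 degrees used to scale longitude, and the flat-map approximation are all assumptions; the MATLAB listing in this appendix calls a `haversine` helper that is not reproduced in this document.

```matlab
% Sketch only: approximate D_e and D_m for one customer-storefront pair.
R     = 3959;                    % assumed mean Earth radius, miles
lat0  = 42.36;                   % assumed reference latitude for longitude scaling
cust  = [42.3082 -71.1327];      % customer (latitude, longitude)
store = [42.3104 -71.1152];      % nearest storefront (latitude, longitude)

dx = R * (store(2)-cust(2)) * pi/180 * cosd(lat0);  % east-west offset, miles
dy = R * (store(1)-cust(1)) * pi/180;               % north-south offset, miles

De = hypot(dx, dy);              % Euclidean (straight-line) distance
Dm = abs(dx) + abs(dy);          % Manhattan (right-angle) distance
fprintf('De = %.4f mi, Dm = %.4f mi, Dm/De = %.3f\n', De, Dm, Dm/De);
```

Under these assumptions the results land within a few thousandths of a mile of the tabulated 0.9063 and 1.0454 miles.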
5/5/10 6:47 PM
POP_n = zeros(length(neighborhoods),1);
A_n = zeros(length(neighborhoods),1);
DD_n = zeros(length(neighborhoods),1);
SB_n = zeros(length(neighborhoods),1);
number_pop_s = reshape(number_pop',N_s^2,1);
number_dd_s = reshape(number_dd',N_s^2,1);
number_sb_s = reshape(number_sb',N_s^2,1);
for i=1:length(neighborhoods)
    A_n(i) = W_s^2*eval(['length(' neighborhoods{i} ')']);
    DD_n(i) = sum(eval(['number_dd_s(' neighborhoods{i} ')']));
    SB_n(i) = sum(eval(['number_sb_s(' neighborhoods{i} ')']));
    POP_n(i) = sum(eval(['number_pop_s(' neighborhoods{i} ')']));
end
gamma_dd = DD_n./A_n;
gamma_sb = SB_n./A_n;
p_cust = POP_n./sum(POP_n);
exp_de_dd = sqrt(1./(4*gamma_dd));
exp_de_sb = sqrt(1./(4*gamma_sb));
exp_dm_dd = sqrt(pi()./(8*gamma_dd));
exp_dm_sb = sqrt(pi()./(8*gamma_sb));
exp_de = sqrt(1./(4*(gamma_dd+gamma_sb)));
exp_dm = sqrt(pi()./(8*(gamma_dd+gamma_sb)));
total_exp_de_dd = p_cust'*exp_de_dd;
total_exp_de_sb = p_cust'*exp_de_sb;
total_exp_dm_dd = p_cust'*exp_dm_dd;
total_exp_dm_sb = p_cust'*exp_dm_sb;
total_exp_dm = p_cust'*exp_dm;
total_exp_de = p_cust'*exp_de;
%% Compare the Euclidean and Manhattan Metrics
% use population data as customers
target_pop_xy = pop_xy((pop_xy(:,1)>S_xy(1)) .* ...
(pop_xy(:,1)<S_xy(end)) .* (pop_xy(:,2)>S_xy(1)) .* ...
(pop_xy(:,2)<S_xy(end))==1,:);
% use mbta stations as customers
% target_pop_xy = mbta_xy((mbta_xy(:,1)>S_xy(1)) .* ...
%     (mbta_xy(:,1)<S_xy(end)) .* (mbta_xy(:,2)>S_xy(1)) .* ...
%     (mbta_xy(:,2)<S_xy(end))==1,:);
closest_de_dd = zeros(length(target_pop_xy),1);
closest_de_sb = zeros(length(target_pop_xy),1);
closest_dm_dd = zeros(length(target_pop_xy),1);
closest_dm_sb = zeros(length(target_pop_xy),1);
for i=1:length(target_pop_xy)
(target_pop_xy(:,1)>S_xy(mod(sector-1,10)+1)) .* ...
(target_pop_xy(:,1)<S_xy(mod(sector-1,10)+2)) .* ...
(target_pop_xy(:,2)>S_xy(end-ceil(sector/10))) .* ...
(target_pop_xy(:,2)<S_xy(end-ceil(sector/10))+1)==1,:));
end
avg_de_sb_s(isnan(avg_de_sb_s))=0;
avg_de_dd_s(isnan(avg_de_dd_s))=0;
avg_dm_sb_s(isnan(avg_dm_sb_s))=0;
avg_dm_dd_s(isnan(avg_dm_dd_s))=0;
avg_de_s(isnan(avg_de_s))=0;
avg_dm_s(isnan(avg_dm_s))=0;
avg_de_dd_n = zeros(length(neighborhoods),1);
avg_de_sb_n = zeros(length(neighborhoods),1);
avg_dm_dd_n = zeros(length(neighborhoods),1);
avg_dm_sb_n = zeros(length(neighborhoods),1);
avg_de_n = zeros(length(neighborhoods),1);
avg_dm_n = zeros(length(neighborhoods),1);
num_cust_n = zeros(length(neighborhoods),1);
for i=1:length(neighborhoods)
    sectors = eval(neighborhoods{i});
    for s=1:length(sectors)
        sector = sectors(s);
        avg_de_dd_n(i) = (avg_de_dd_n(i)*num_cust_n(i) + ...
            avg_de_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_sb_n(i) = (avg_de_sb_n(i)*num_cust_n(i) + ...
            avg_de_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_dd_n(i) = (avg_dm_dd_n(i)*num_cust_n(i) + ...
            avg_dm_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_sb_n(i) = (avg_dm_sb_n(i)*num_cust_n(i) + ...
            avg_dm_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_n(i) = (avg_de_n(i)*num_cust_n(i) + ...
            avg_de_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_n(i) = (avg_dm_n(i)*num_cust_n(i) + ...
            avg_dm_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        num_cust_n(i) = num_cust_n(i) + num_cust_s(sector);
    end
end
avg_de_dd = 0;
avg_de_sb = 0;
avg_dm_dd = 0;
avg_dm_sb = 0;
avg_de = 0;
avg_dm = 0;
num_cust = 0;
for i=1:length(neighborhoods)
    avg_de_dd = (avg_de_dd*num_cust + ...
        avg_de_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de_sb = (avg_de_sb*num_cust + ...
        avg_de_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_dd = (avg_dm_dd*num_cust + ...
        avg_dm_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_sb = (avg_dm_sb*num_cust + ...
        avg_dm_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de = (avg_de*num_cust + ...
        avg_de_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm = (avg_dm*num_cust + ...
        avg_dm_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    num_cust = num_cust + num_cust_n(i);
end
table = [vertcat(num_cust_n,num_cust) vertcat(avg_de_sb_n,avg_de_sb) ...
vertcat(avg_de_dd_n,avg_de_dd) vertcat(avg_de_n,avg_de)...
vertcat(avg_dm_sb_n,avg_dm_sb) vertcat(avg_dm_dd_n,avg_dm_dd)...
vertcat(avg_dm_n,avg_dm)];
R_de_dm = mean(closest_dm_dd./closest_de_dd);
E_de_dm = -norminv(0.05/2)*std(closest_dm_dd./closest_de_dd)/...
sqrt(length(closest_dm_dd));
CI_de_dm = R_de_dm + [-E_de_dm E_de_dm];
%% Simulated Location Pairs for Metric Comparison
% lat = [min(dd(:,1)) max(dd(:,1))];
% long = [min(dd(:,2)) max(dd(:,2))];
% new_cust = [lat(1)+(lat(2)-lat(1))*rand(20,1) ...
%     long(1)+(long(2)-long(1))*rand(20,1)];
rand_cust = [ 42.3082 -71.1327; 42.4231 -71.0546; 42.3773 -71.1012;
    42.3162 -71.0922; 42.3481 -71.1549; 42.4119 -71.1774;
    42.3666 -71.0827; 42.3673 -71.0242; 42.3539 -71.1775;
    42.3199 -71.1624; 42.3285 -71.1936; 42.3332 -71.0489;
    42.3654 -71.1112; 42.3694 -71.1135; 42.3503 -71.0367;
    42.4149 -71.1224; 42.3359 -71.1003; 42.3812 -71.0732;
    42.3337 -71.1814; 42.3293 -71.1063; 42.3598 -71.0539;
    42.4084 -71.0839; 42.3766 -71.0385; 42.3112 -71.0867;
    42.3756 -71.1229; 42.3173 -71.0822; 42.3482 -71.1098;
    42.3658 -71.1789; 42.4109 -71.0840; 42.3824 -71.0294;
    42.3994 -71.1121; 42.3123 -71.0900; 42.3648 -71.0920;
    42.4141 -71.1559; 42.3101 -71.0786; 42.3792 -71.1679;
    42.3063 -71.0601; 42.3498 -71.1580; 42.3895 -71.1657;
    42.3089 -71.0686; 42.3294 -71.0452; 42.3171 -71.1547;
    42.3781 -71.1659; 42.3287 -71.1906; 42.3303 -71.1389;
    42.4068 -71.0983; 42.3480 -71.0513; 42.3095 -71.0549;
    42.3212 -71.1910; 42.3435 -71.1304; 42.4108 -71.1094;
    42.3746 -71.1139; 42.3297 -71.1879; 42.3429 -71.0875;
    42.4125 -71.0943; 42.4142 -71.1431; 42.4220 -71.1892;
    42.3311 -71.1714; 42.3686 -71.1429; 42.3192 -71.0623;
    42.3003 -71.1411; 42.3312 -71.0764; 42.4268 -71.0322;
    42.3545 -71.0299; 42.3158 -71.0907; 42.4124 -71.0296;
    42.2998 -71.1247; 42.3110 -71.1881; 42.3287 -71.1439;
    42.3957 -71.1896; 42.4139 -71.0443; 42.3251 -71.1197;
    42.3843 -71.1790; 42.4035 -71.1712; 42.4206 -71.1530;
    42.4108 -71.1563; 42.3475 -71.0438; 42.4190 -71.0619;
    42.3532 -71.1351; 42.3171 -71.1281];
rand_cust_dist = haversine(mit(1),mit(2),rand_cust(:,1),rand_cust(:,2));
rand_cust_xy = [haversine(mit(1),mit(2),mit(1),rand_cust(:,2)).*...
sign(rand_cust(:,2)-mit(2))...
haversine(mit(1),mit(2),rand_cust(:,1),mit(2)).*...
sign(rand_cust(:,1)-mit(1))];
rand_de_dd = zeros(length(rand_cust_xy),1);
rand_dm_dd = zeros(length(rand_cust_xy),1);
rand_dd = zeros(length(rand_cust_xy),2);
for i=1:length(rand_cust_xy)
    [C,CI] = min(sqrt((rand_cust_xy(i,1)-dd_xy(:,1)).^2+ ...
        (rand_cust_xy(i,2)-dd_xy(:,2)).^2));
    rand_dd(i,:) = dd(CI,:);
    rand_de_dd(i) = C;
    rand_dm_dd(i) = abs(rand_cust_xy(i,1)-dd_xy(CI,1))+ ...
        abs(rand_cust_xy(i,2)-dd_xy(CI,2));
end
rand_dd_dist = haversine(mit(1),mit(2),rand_dd(:,1),rand_dd(:,2));
rand_dd_xy = [haversine(mit(1),mit(2),mit(1),rand_dd(:,2)).*...
sign(rand_dd(:,2)-mit(2))...
haversine(mit(1),mit(2),rand_dd(:,1),mit(2)).*...
sign(rand_dd(:,1)-mit(1))];
[rand_cust_ij(:,1) rand_cust_ij(:,2)] = ...
xy2ij(rand_cust_xy(:,1),rand_cust_xy(:,2),W_m,W_i);
[rand_dd_ij(:,1) rand_dd_ij(:,2)] = ...
xy2ij(rand_dd_xy(:,1),rand_dd_xy(:,2),W_m,W_i);
% for i=41:length(rand_cust)
%     disp(['from: ' num2str(rand_cust(i,1)) ', ' num2str(rand_cust(i,2)) ...
%         ' to: ' num2str(rand_dd(i,1)) ', ' num2str(rand_dd(i,2))])
% end
rand_dg_dd = [1.2; 0.7; 0.4; 0.5; 0.1; 1.8; 0.4; 0.3; 0.6; 1.7;
1.7; 0.1; 0.5; 0.2; 0.3; 0.4; 0.3; 0.5; 1.0; 0.7; .080; 0.3;
0.2; 0.3; 0.3; 0.6; 0.3; 0.2; 0.4; 1.0; 0.3; 0.5; 0.2; 0.5;
0.3; 0.9; 0.2; 0.3; 1.4; 0.7; 2.7; 1.8; 0.7; 1.7; 0.8; 0.4;
0.2; 0.3; 2.4; 0.8; 0.3; 0.2; 1.5; 0.2; 0.4; 0.4; 2.3; 0.8;
1.2; 0.2; 2.4; .083; 0.9; 8.2; 0.6; 0.8; 1.1; 3.5; 0.6; 1.0;
0.3; 0.8; 0.5; 1.7; 0.1; 0.8; 0.4; 0.6; .078; 1.2; ];
outliers = abs(rand_dg_dd-rand_de_dd)>3*min(rand_dg_dd,rand_de_dd);
R_de_dg_o = mean(rand_dg_dd./rand_de_dd);
E_de_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_de_dd)/...
sqrt(length(rand_dg_dd));
CI_de_dg_o = R_de_dg_o + [-E_de_dg_o E_de_dg_o];
R_de_dg_no = mean(rand_dg_dd(~outliers)./rand_de_dd(~outliers));
E_de_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
rand_de_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_de_dg_no = R_de_dg_no + [-E_de_dg_no E_de_dg_no];
R_dm_dg_o = mean(rand_dg_dd./rand_dm_dd);
E_dm_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_dm_dd)/...
sqrt(length(rand_dg_dd));
CI_dm_dg_o = R_dm_dg_o + [-E_dm_dg_o E_dm_dg_o];
R_dm_dg_no = mean(rand_dg_dd(~outliers)./rand_dm_dd(~outliers));
E_dm_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
rand_dm_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_dm_dg_no = R_dm_dg_no + [-E_dm_dg_no E_dm_dg_no];
R_de_dm_o = mean(rand_dm_dd./rand_de_dd);
E_de_dm_o = -norminv(0.05/2)*std(rand_dm_dd./rand_de_dd)/...
sqrt(length(rand_dm_dd));
CI_de_dm_o = R_de_dm_o + [-E_de_dm_o E_de_dm_o];
R_de_dm_no = mean(rand_dm_dd(~outliers)./rand_de_dd(~outliers));
E_de_dm_no = -norminv(0.05/2)*std(rand_dm_dd(~outliers)./...
rand_de_dd(~outliers))/sqrt(length(rand_dm_dd(~outliers)));
CI_de_dm_no = R_de_dm_no + [-E_de_dm_no E_de_dm_no];
ci_table = [
[CI_de_dg_no(1) R_de_dg_no CI_de_dg_no(2) ...
CI_de_dg_o(1) R_de_dg_o CI_de_dg_o(2)]
[CI_dm_dg_no(1) R_dm_dg_no CI_dm_dg_no(2) ...
CI_dm_dg_o(1) R_dm_dg_o CI_dm_dg_o(2)]
[CI_de_dm_no(1) R_de_dm_no CI_de_dm_no(2) ...
CI_de_dm_o(1) R_de_dm_o CI_de_dm_o(2)]
];
%% Plot Locations using Longitude and Latitude (GPS) Coordinates
figure(1)
plot(dd(:,2),dd(:,1),'.b',...
sb(:,2),sb(:,1),'.g',...
mbta(mbta_dist<5,2),mbta(mbta_dist<5,1),'*r')
axis equal
xlabel('Longitude (\circ)')
ylabel('Latitude (\circ)')
legend('Dunkin Donuts','Starbucks','MBTA Station')
%% Plot Locations using Cartesian Coordinates
figure(2)
plot(dd_xy(:,1),dd_xy(:,2),'.b',...
sb_xy(:,1),sb_xy(:,2),'.g',...
mbta_xy(mbta_dist<5,1),mbta_xy(mbta_dist<5,2),'*r',...
5*cos(linspace(0,2*pi(),100)),5*sin(linspace(0,2*pi(),100)),'-k')
xlabel('Distance (miles)')
ylabel('Distance (miles)')
legend('Dunkin Donuts','Starbucks','MBTA Station')
axis equal
%% Plot Locations Overlaid Map Image
figure(3)
imshow(I)
hold on
plot(dd_ij(:,1),dd_ij(:,2),'.b',...
sb_ij(:,1),sb_ij(:,2),'.g',...
mbta_ij(mbta_dist<5,1),mbta_ij(mbta_dist<5,2),'*r',...
W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
legend('Dunkin Donuts','Starbucks','MBTA Station')
axis image
%% Overlay Location Sector Sums on Location Map
figure(3)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf\color{blue}' num2str(number_dd(j,i)) ...
                '\newline\bf\color{green}' num2str(number_sb(j,i)) ...
                '\newline\bf\color{red}' num2str(number_mbta(j,i)) ],...
                'HorizontalAlignment','center',...
                'VerticalAlignment','middle')
        end
    end
end
hold off
%% Plot Population Data Overlaid on Map
figure(4)
imshow(I)
hold on
plot(pop_ij(:,1),pop_ij(:,2),'.magenta',...
W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
legend('100 People')
axis image
%% Overlay Population Sector Sums on Population Map
figure(4)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf' num2str(number_pop(j,i))],...
                'HorizontalAlignment','center')
        end
    end
end
hold off
%% Display Sector Labels Overlaid on Map
figure(5)
imshow(I)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf' num2str(10*(j-1)+i)],...
                'HorizontalAlignment','center')
        end
    end
end
hold off
axis off image
%% Overlay Neighborhood Colors on Sector Map
figure(5)
hold on
for sector=1:N_s^2
color = 'w';
if sum(cambridge==sector)>0 ...
|| sum(southeast==sector)>0
color='y';
elseif sum(northeast==sector)>0 ...
|| sum(northwest==sector)>0 ...
|| sum(back_bay==sector)>0
color='g';
colorbar
shading flat
axis off equal
title('Dunkin Donuts (per mi^2)')
figure(11)
contourf(X,Y,number_sb/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('Starbucks (per mi^2)')
figure(12)
contourf(X,Y,number_mbta/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('MBTA Stations (per mi^2)')
figure(13)
contourf(X,Y,number_pop/1000/W_s^2,120)
colorbar
shading flat
axis off equal
title('Population (thousands per mi^2)')
%% Plot Distance Metric Comparison Ratio
figure(14)
hold on
scatter(closest_de_dd,closest_dm_dd,'.k')
plot(linspace(0,2,100),linspace(0,2,100),'-k',...
linspace(0,2,100),4/pi().*linspace(0,2,100),'--r',...
linspace(0,2,100),R_de_dm.*linspace(0,2,100),'--b')
hold off
title('Closest Dunkin'' Donuts')
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','R = 1',...
'R = 4/\pi',['R = ' num2str(R_de_dm)]);
axis xy square
%% Plot Metric Comparison Ratios
figure(15)
hold on
scatter(rand_de_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_de_dg_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_de_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dg_o)...
' (w/ Outliers)'],['R=' num2str(R_de_dg_no) ' (w/o Outliers)'])
axis xy square
figure(16)
hold on
scatter(rand_dm_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_dm_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_dm_dg_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_dm_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Manhattan Distance (D_m, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_dm_dg_o)...
' (w/ Outliers)'],['R=' num2str(R_dm_dg_no) ' (w/o Outliers)'])
axis xy square
figure(17)
hold on
scatter(rand_de_dd(~outliers),rand_dm_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dm_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_de_dm_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_de_dm_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dm_o)...
' (w/ Outliers)'],['R=' num2str(R_de_dm_no) ' (w/o Outliers)'])
axis xy square
%% Display Customer-Storefront Pairs Overlaid on Map
figure(18)
imshow(I)
hold on
plot(rand_cust_ij(:,1),rand_cust_ij(:,2),'.m',...
rand_dd_ij(:,1),rand_dd_ij(:,2),'.b')
for i=1:length(rand_cust_ij)
    plot([rand_cust_ij(i,1) rand_dd_ij(i,1)],...
        [rand_cust_ij(i,2) rand_dd_ij(i,2)],'-k')
end
hold off
legend('Customer','Dunkin'' Donuts')
axis image
%%
figure(19)
hold on
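% length() is used here, as elsewhere in this script; 1:size(...) would
% use only the first dimension and could stop after one neighborhood
for i=1:length(neighborhoods)
    plot([1 2],[exp_de_dd(i) avg_de_dd_n(i)],'-b')
    plot([1 2],[exp_de_sb(i) avg_de_sb_n(i)],'-g')
end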
hold off
axis([0 3 0 1.5])
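
The expected-distance expressions in the listing (`sqrt(1./(4*gamma))` and `sqrt(pi()./(8*gamma))`) appear to match the standard nearest-neighbor results for storefronts scattered as a spatial Poisson process with density gamma per square mile. The chance that no storefront lies within Euclidean distance r is exp(-gamma*pi*r^2), giving a mean of 1/(2*sqrt(gamma)); for Manhattan distance the region within r is a diamond of area 2*r^2, giving a mean of sqrt(pi/(8*gamma)). The sketch below evaluates both for a hypothetical density; the value of `gamma` is an assumption chosen only for illustration.

```matlab
% Sketch only: closed-form expected nearest-neighbor distances.
gamma  = 4;                      % hypothetical density, storefronts per square mile
exp_de = sqrt(1/(4*gamma));      % mean Euclidean distance, 1/(2*sqrt(gamma))
exp_dm = sqrt(pi/(8*gamma));     % mean Manhattan distance
ratio  = exp_dm/exp_de;          % equals sqrt(pi/2) for any gamma
fprintf('E[De] = %.4f, E[Dm] = %.4f, ratio = %.4f\n', exp_de, exp_dm, ratio);
```

For `gamma = 4` the mean Euclidean distance is 0.25 miles, and the ratio of the two means is sqrt(pi/2), about 1.2533, regardless of density. The reference line at 4/pi (about 1.273) drawn in figure 14 is a different quantity: the mean of |cos(theta)| + |sin(theta)| over a uniformly random direction theta.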