Comparative Analysis of Coffee Franchises in The Cambridge-Boston Area

May 10, 2010

ESD.86: Models, Data, and Inference for Socio-Technical Systems
Paul T. Grogan
ptgrogan@mit.edu
Massachusetts Institute of Technology
Introduction
The placement of storefronts is a difficult question on which many corporations spend a great amount of
time, effort, and money. There is a careful interplay between environment, potential customers, other
storefronts from the same franchise, and other storefronts for competing franchises. From the customer's
perspective, the convenience of storefronts, especially for discretionary products or services, is of the
utmost importance. In fact, some franchises develop mobile phone applications to provide their customers
with an easy way to find the nearest storefront.[1]
This project takes an in-depth view of the storefront placements of Dunkin Donuts and Starbucks, two
competing franchises with strong presences in the Cambridge-Boston area. Both franchises purvey coffee,
coffee drinks, light meals, and pastries and cater especially well to sleep-deprived graduate students.
However, Dunkin Donuts typically puts more emphasis on take-out (convenience) customers looking to
grab a quick coffee before class whereas Starbucks provides an environment conducive to socializing,
meetings, writing theses, or studying over a longer duration. These differences in target customers may
drive differences in the distribution of storefronts in the area.
The goal of this project is to apply some of the concepts learned in ESD.86 on probabilistic modeling
to the real-world system of franchise storefronts and customers. The analysis focuses on
the convenience of accessing storefronts, determined by the distance to the nearest location from a
random customer. The nearest neighbor probabilistic model is a natural choice for application to this
problem. Under this model, the distance from a random uniformly-distributed customer to the closest
spatially Poisson distributed storefront can be expressed with a closed-form equation. Of course, in the
real-world system, there are several assumptions that must be checked.
Can the franchise storefronts be modeled with a spatial Poisson distribution?
Can the customers be modeled with a uniform distribution?
Does the nearest-neighbor distance correlate with the actual closest storefront distance?
Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?
To answer these questions, as well as the greater question of which coffee franchise provides better
service to the residents of the Cambridge/Boston area, the project is broken down into three parts. First,
data must be gathered on the existing storefront locations within an area of interest. Fortunately, both
franchises provide store locator services from the corporate web sites. Additionally, data representing
[1] myStarbucks App for iPhone and iPod Touch, http://www.starbucks.com/coffeehouse/mobile-apps/mystarbucks
Grogan ESD.86
the demand distribution either through population density or other relevant features are required for
constructing the customer model. Second, probabilistic distributions will be created in accordance with
the nearest neighbor model. Using the data gathered in the first phase, storefront locations will be
modeled as spatial Poisson distributions and customers will be modeled with uniform distributions.
Finally, comparative analysis will investigate the differences between the two franchises as well as the
underlying assumptions and accuracy of the probabilistic models.
Data Gathering
The data gathering portion of the project assembles the information required to build the probabilistic
models. There are two primary formats of data needed: positional data and population data. Positional
data provides coordinates for storefront locations for both franchises as well as locations of other features
that may be helpful in the analysis. Population data provides a sense of customer density that will be used
to help drive customer demand models.
Positional Coordinates
Not long ago, gathering position coordinates in a format conducive to numerical analysis would have
been an insurmountable challenge for a term project. Fortunately, with the confluence of several
technologies, it is no longer out of scope to build a very accurate representation of the real world.
The general process to gather location data is as follows:
1. Aggregate addresses using online-available services or documents
2. Process addresses into GPS coordinates using online GeoCoder tool[2]
3. Visualize GPS coordinates using online mapping applications such as Google Maps, iterating on
improperly-identified addresses as necessary
4. Transform GPS coordinates into Cartesian coordinates using the haversine formula[3]
The main innovation in the above steps is the availability of the GeoCoder tool, which allows batch
queries of addresses to either Yahoo or Google mapping applications. Though the queries are not always
correct, it dramatically reduces the time required to generate GPS coordinates (latitude and longitude)
from text-based addresses.
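Step 4 of the process can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the code used in the project: it assumes a mean Earth radius of 3958.8 miles, uses a hypothetical reference point (the first Dunkin Donuts entry in Appendix A), and measures east-west and north-south offsets separately with the haversine formula. All function names are illustrative.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def to_cartesian_mi(lat, lon, ref_lat, ref_lon):
    """Approximate local (x, y) in miles: x east and y north of the reference."""
    x = haversine_mi(ref_lat, ref_lon, ref_lat, lon)  # east-west offset
    y = haversine_mi(ref_lat, ref_lon, lat, ref_lon)  # north-south offset
    return math.copysign(x, lon - ref_lon), math.copysign(y, lat - ref_lat)


# Hypothetical origin: 616 Massachusetts Ave, Cambridge (Appendix A)
REF = (42.3650774, -71.1032368)
# 222 Broadway, Cambridge (Appendix A)
x, y = to_cartesian_mi(42.3664749, -71.0938686, *REF)
print(f"x = {x:.3f} mi east, y = {y:.3f} mi north")
```

Over a five-mile radius the curvature of the Earth is negligible, so treating the two offsets as planar coordinates should be an acceptable approximation.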
Franchise Storefronts
The franchise storefront addresses are readily available on both the Dunkin Donuts[4] and Starbucks[5]
corporate websites. In both cases, the search was limited to a target area within five miles of
ZIP code 02139, which resolves to a location near Central Square in Cambridge, MA. In addition, all
franchise storefront locations at Logan International Airport were removed under the assumption that
airline customers are not locally quantifiable customers. With these restrictions, a total
of 163 Dunkin Donuts and 59 Starbucks franchise storefronts were identified in the target area.
[2] GeoCoder tool provides search queries using Yahoo or Google: http://www.gpsvisualizer.com/geocoder/
[3] Haversine formula computes great-circle distances: http://en.wikipedia.org/wiki/Haversine_formula
[4] Dunkin Donuts store locator: https://www.dunkindonuts.com/aboutus/store/Search.aspx
[5] Starbucks store locator (legacy): http://ie.starbucks.com/en-ie/_Our+Stores/
MBTA Stations
As noted in one journal article, the optimal storefront placement for discretionary services may be at
intersections of high pedestrian traffic.[6] In the Boston area, the MBTA public transportation system had
an average weekday ridership of 1.24 million customers as of April 2010[7] and is a prime target for
storefront placement. In this project, MBTA stations on the red, blue, green, orange, and silver
lines were considered as inputs for a potential customer model. Because addresses are not widely used for
these stations, a freely distributable list of 142 stations, current through 2006 and including GPS
coordinates, was used for station location data.[8]
Visualizations
As an important part of gathering data, visualizations were used throughout the project to verify locations.
Figure 1 (below) shows plots of the storefront locations and MBTA stations using both GPS and
Cartesian coordinate systems. In the Cartesian coordinate system, the five-mile radius is highlighted.
Figure 1: a) Raw GPS Position Coordinates b) Cartesian Position Coordinates with 5-Mile Radius Highlighted
To provide context for the franchise storefronts and MBTA stations, the location data was overlaid on
an area map,[9] as shown in Figure 2.
[6] Berman, O., Larson R., Fouska N., Optimal Location of Discretionary Service Facilities, Transportation Science, Vol. 26, No. 3, pp. 201-211, August 1992.
[7] Davey R., MBTA Scorecard, April 2010. Retrieved 4/25/2010 from http://mbta.com/about_the_mbta/scorecard/
[8] Demaine, E., Boston Subway Google Map. Retrieved 4/25/2010 from http://erikdemaine.org/maps/mbta/
[9] Background map retrieved from Google Maps: http://maps.google.com
Figure 2: Location Data Overlaid on Map
Population Density
Gathering population density data was a challenge for this project. Although population data is readily
available from decadal censuses, it is commonly aggregated by county or city, which is not conducive to
spatial analysis. Fortunately, an online Digital Atlas of Boston includes population maps based on the
1990 census that use red dots to represent 100 persons randomly distributed within a census tract.[10]
With some post-processing using Adobe Photoshop, the image was cropped, resized, and filtered to
display only the population information, which is readable using built-in MATLAB image processing
functions. The processed data is shown in Figure 3. Though there are some concerns over the accuracy of
the resulting population data,[11] it should be internally consistent and helpful for the modeling
process.
[10] Bowen, W., Boston and Vicinity: Total Population, 1997. Retrieved 4/25/2010 from http://130.166.124.2/boston/bos1.GIF
[11] There is some discrepancy if a dot is one pixel or two and whether the pixels were sampled with or without replacement. In some cases, one pixel could represent somewhere between 50 and 100 people, more if there could be overlap, though from rough estimates, the 100 people per pixel seems to provide accurate population data.
Figure 3: a) Raw Population Data for Boston Area b) Processed Population Data of Target Area
Probabilistic Modeling
Within the topics covered in ESD.86, the discussion of spatial probability distributions involved the
nearest neighbor problem of finding the expected distance to the closest neighbor from a random point.
This problem uses a uniform distribution to select the customer and a spatial Poisson distribution for
the neighboring storefronts within a specified area.
If this type of problem is to be extended to a real-world case of storefronts, ultimately selecting whether
Dunkin Donuts or Starbucks is a closer neighbor for random customers, the distributions of both the
customers and storefronts should be investigated. A city-wide Poisson distribution of storefronts is not
likely to hold as there is clearly some location dependence in the storefront placing. On a smaller scale,
however, a spatial Poisson distribution is conceivable, as the exact placement of a storefront within a
small area may be independent of others. In a similar sense, a city-wide uniform distribution of customers
is not likely to hold as there are significantly higher concentrations of customers in the city-centers. On
smaller scales, however, uniformly-distributed customers may be a valid approximation.
To implement the concept of piecewise spatially Poisson distributed storefronts and uniformly distributed
customers, the initial 78.5 square mile target area (circle with 5 mile radius) was reduced to a 49
square mile area (square with 7 miles per side). This area was then broken down into 100 square sectors,
each 0.7 miles per side, or 0.49 square miles in area.
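The sector assignment can be illustrated with a short sketch. It assumes Cartesian coordinates in miles measured from the center of the 7-mile square and a row/column numbering starting at the southwest corner; both conventions are assumptions for illustration, since the report does not state the indexing that was actually used.

```python
import math

SIDE_MI = 7.0   # side length of the square target area
N_PER_SIDE = 10  # 10 x 10 grid gives 100 sectors


def sector_index(x, y, side=SIDE_MI, n=N_PER_SIDE):
    """Return (row, col) of the sector containing (x, y), or None if outside.

    x and y are miles east and north of the square's center (assumed origin).
    Row 0 is the southernmost row and column 0 the westernmost column.
    """
    half = side / 2
    if not (-half <= x < half and -half <= y < half):
        return None
    width = side / n  # 0.7 miles per sector side
    col = min(int(math.floor((x + half) / width)), n - 1)
    row = min(int(math.floor((y + half) / width)), n - 1)
    return row, col


print(sector_index(-3.4, -3.4))  # a point near the southwest corner
print(sector_index(4.0, 0.0))    # a point outside the square
```

Counting the storefronts or population dots that map to each (row, col) pair then gives the per-sector totals used in the rest of the analysis.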
Figure 4: a) Target Area Divided into Sectors b) Sectors Highlighted by Neighborhood Assignment
With the relatively fine level of sector definition, many were not large enough to contain a storefront. In
order to determine a non-zero storefront density for each unit of analysis, sectors were grouped into seven
neighborhoods. The sizing of each neighborhood along with a description is provided in Table 1.
Table 1: Neighborhood Descriptions
Neighborhood  Number of Sectors  Description
Northwest     8                  Cambridge Highlands, Mt. Auburn, East Watertown
Cambridge     18                 Cambridge, East Cambridge
Northeast     21                 Everett, Somerville, Charlestown
Downtown      4                  Downtown Boston, North End
Back Bay      3                  Back Bay, Fenway
Southwest     17                 Brookline, Aberdeen, Brighton
Southeast     17                 Roxbury, South Boston, Harrison Lenox
The process of neighborhood definition was done by hand using approximate city or geographical
boundaries. The neighborhoods do not exactly correspond to the geographical equivalents due to the
discretization of sectors, though the labeling scheme helps infer relative location. The only requirement of
each neighborhood is that it must contain both a Dunkin Donuts and a Starbucks storefront, providing a
non-zero storefront density. In some cases, sectors did not fit into an existing neighborhood, nor did they
exhibit enough information to establish a new neighborhood, so they went unused.
Storefront Distribution Model

Using the neighborhood definitions, the storefront distribution model constructs a piecewise spatial
Poisson distribution for each neighborhood. Since there is at least one storefront from each franchise in
each neighborhood, the storefront densities are non-zero in all neighborhoods.
Figure 5: Storefront Locations by Neighborhood
Using a count of the number of storefronts within each neighborhood and the associated area, the
storefront density parameter was determined for each neighborhood, as shown in Table 2.
Table 2: Storefront Model Parameters
                          Number of Storefronts     Storefront Density (λ, 1/mi²)
Neighborhood  Area (mi²)  Starbucks  Dunkin Donuts  Starbucks  Dunkin Donuts  Either
Northwest     3.92        1          1              0.2551     0.2551         0.5102
Cambridge     8.82        11         23             1.2472     2.6077         3.8549
Northeast     10.29       2          26             0.1944     2.5267         2.7211
Downtown      1.96        13         27             6.6327     13.7755        20.4082
Back Bay      1.47        16         14             10.8844    9.5238         20.4082
Southwest     8.33        10         19             1.2005     2.2809         3.4814
Southeast     8.33        6          20             0.7203     2.4010         3.1212
Total         43.12       59         130            1.3683     3.0148         4.3831
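The density parameters in Table 2 follow directly from the storefront counts and neighborhood areas. The sketch below reproduces that arithmetic; the data structure and function name are illustrative, and the sector counts and storefront counts are copied from Tables 1 and 2.

```python
SECTOR_AREA_MI2 = 0.49  # each sector is 0.7 mi x 0.7 mi

# name: (number of sectors, Starbucks count, Dunkin Donuts count)
NEIGHBORHOODS = {
    "Northwest": (8, 1, 1),
    "Cambridge": (18, 11, 23),
    "Northeast": (21, 2, 26),
    "Downtown": (4, 13, 27),
    "Back Bay": (3, 16, 14),
    "Southwest": (17, 10, 19),
    "Southeast": (17, 6, 20),
}


def storefront_densities(sectors, starbucks, dunkin):
    """Storefronts per square mile for Starbucks, Dunkin Donuts, and either."""
    area = sectors * SECTOR_AREA_MI2
    return starbucks / area, dunkin / area, (starbucks + dunkin) / area


for name, row in NEIGHBORHOODS.items():
    sb, dd, either = storefront_densities(*row)
    print(f"{name:10s} {sb:8.4f} {dd:8.4f} {either:8.4f}")
```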
Customer Distribution Model

The customer distribution model uses the same sector and neighborhood definition as used in the
storefront distribution model. Using the assumption that customers are distributed in proportion to
population, the customer distribution model uses the population data to weigh the relative probabilities of
customers emerging from each neighborhood.
Figure 6: Sectors with Estimated Population
Using the population density information previously gathered, the number of potential customers is
determined for each sector, which is then aggregated into neighborhoods. The probability of a customer
emerging from a particular neighborhood is in proportion to the neighborhood's population fraction, as
shown in Table 3.
Table 3: Customer Model Parameters
Neighborhood  Estimated Population  Customer Probability
Northwest     46900                 0.0558
Cambridge     180600                0.2148
Northeast     162900                0.1937
Downtown      40800                 0.0485
Back Bay      51700                 0.0615
Southwest     182500                0.2170
Southeast     175500                0.2087
Total         840900                1.0000
City-wide Spatial Poisson Distribution

As an aside, I thought it would be interesting to test the hypothesis that Dunkin Donuts and Starbucks
franchise storefronts follow a Poisson spatial distribution over all 100 sectors. This null hypothesis was
tested using the Pearson Goodness-of-Fit test, with the Poisson distribution parameter estimated
from the average number of storefronts per sector (1.36 Dunkin Donuts per sector, 0.59 Starbucks per
sector).
In both cases, the null hypothesis was rejected (α=0.05, p=1). To illustrate why the hypothesis
was rejected at such a high confidence level, the histograms of expected versus observed storefronts per
sector are displayed below. The Poisson spatial distribution over-estimates the low-frequency sectors and
under-estimates the high-frequency sectors, which makes sense given a high concentration of storefronts
in the city-center.
Figure 7: Histograms Generated During Goodness-of-Fit Test for Spatial Poisson Distribution
It should also be noted that there is some flexibility for running the test with different numbers of bins and
different sized sectors. With larger sectors, there are more storefronts on average allowing more numerous
bins, but the downside is that the frequency in each bin is decreased.
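The mechanics of the goodness-of-fit test, including the choice of bins, can be sketched as follows. This is an illustration only: the per-sector counts are hypothetical (the report does not list them), the function names are invented, and the code stops at the test statistic and degrees of freedom rather than reproducing the software that was actually used.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pearson_chi2_poisson(counts, max_bin):
    """Pearson statistic for per-sector counts against a fitted Poisson.

    Bins are 0, 1, ..., max_bin - 1 plus a final bin for counts >= max_bin.
    Returns (chi2, degrees_of_freedom).
    """
    n = len(counts)
    lam = sum(counts) / n  # Poisson parameter estimated from the sample mean
    observed = Counter(min(c, max_bin) for c in counts)
    expected = [n * poisson_pmf(k, lam) for k in range(max_bin)]
    expected.append(n - sum(expected))  # tail bin, P(X >= max_bin)
    chi2 = sum((observed[k] - e) ** 2 / e for k, e in enumerate(expected))
    dof = len(expected) - 1 - 1  # bins minus one, minus one fitted parameter
    return chi2, dof


# Hypothetical counts for 100 sectors: half empty, half holding two storefronts
sector_counts = [0] * 50 + [2] * 50
chi2, dof = pearson_chi2_poisson(sector_counts, max_bin=3)
print(chi2, dof)
```

The statistic is compared against a chi-square critical value for the given degrees of freedom (5.991 for two degrees of freedom at α=0.05). Changing `max_bin` or the sector size changes both the number of bins and the expected count in each, which is the trade-off described above.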
Comparative Analysis
Armed with the probabilistic models for both storefront and customer distributions, the next step is to
apply the models to compare the expected distances to each franchise between the neighborhoods, over
the entire city, and also to test the validity of the models by processing existing data.
Visualizations
In order to frame the resulting discussion, several visualizations of the storefront, MBTA station, and
population density are provided below. Storefronts are clearly focused in the Downtown and Back Bay
neighborhoods (where the population density is highest), though there is also significant population
density in the Cambridge and Southwest neighborhoods without similar storefront densities. Also, the
MBTA station density is higher in the Southwest neighborhood due to the numerous green line stops.
Figure 8: Sector-based Heat Maps of Location and Population Data
Nearest Storefront Analysis

The primary comparative analysis between Dunkin Donuts and Starbucks seeks to determine which
franchise is closer to more customers. From the customer's perspective, the closest storefront may
determine which franchise gets their business.
Model Results
Within the Poisson nearest neighbor model, the probability density function and expected value for
distance to the nearest neighbor take the form of closed-form solutions given below for both Euclidean
(De) and Manhattan (Dm) distance metrics, where r is the distance to the nearest storefront and λ is the
storefront density.
$$f_{D_e}(r) = 2\pi\lambda r e^{-\lambda\pi r^2}$$
$$f_{D_m}(r) = 4\lambda r e^{-2\lambda r^2}$$
$$E[D_e] = \int_0^\infty r f_{D_e}(r)\,dr = \frac{1}{2\sqrt{\lambda}}$$
$$E[D_m] = \int_0^\infty r f_{D_m}(r)\,dr = \frac{1}{4}\sqrt{\frac{2\pi}{\lambda}}$$
By applying the nearest-neighbor formulas, the expected distance to the nearest storefront of one or either
franchise can be determined for each neighborhood. When combined with the customer model,
aggregated values can be determined for the target area as a whole.
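As a worked illustration, the sketch below evaluates the closed-form expectations E[De] = 1/(2√λ) and E[Dm] = √(2π/λ)/4 for the Starbucks densities in Table 2 and weights them by the customer probabilities in Table 3. The function and variable names are illustrative, and because the inputs are the rounded table values, results may differ from the reported figures in the last decimal place.

```python
import math


def expected_euclidean(lam):
    """Expected Euclidean distance to the nearest point of a spatial Poisson process."""
    return 1 / (2 * math.sqrt(lam))


def expected_manhattan(lam):
    """Expected Manhattan distance to the nearest point of a spatial Poisson process."""
    return math.sqrt(2 * math.pi / lam) / 4


# name: (customer probability from Table 3, Starbucks density from Table 2)
STARBUCKS = {
    "Northwest": (0.0558, 0.2551),
    "Cambridge": (0.2148, 1.2472),
    "Northeast": (0.1937, 0.1944),
    "Downtown": (0.0485, 6.6327),
    "Back Bay": (0.0615, 10.8844),
    "Southwest": (0.2170, 1.2005),
    "Southeast": (0.2087, 0.7203),
}


def aggregate(model, distance_fn):
    """Customer-weighted expected distance over all neighborhoods."""
    return sum(p * distance_fn(lam) for p, lam in model.values())


print(f"Aggregated E[De] = {aggregate(STARBUCKS, expected_euclidean):.4f} mi")
print(f"Aggregated E[Dm] = {aggregate(STARBUCKS, expected_manhattan):.4f} mi")
```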
Table 4: Probabilistic Model Results by Neighborhood
                              Expected Min Euclidean Distance (De, mi)  Expected Min Manhattan Distance (Dm, mi)
Neighborhood  Customer Prob.  Starbucks  Dunkin Donuts  Either          Starbucks  Dunkin Donuts  Either
Northwest     0.0558          0.9899     0.9899         0.7000          1.2407     1.2407         0.8773
Cambridge     0.2148          0.4477     0.3096         0.2547          0.5611     0.3881         0.3192
Northeast     0.1937          1.1341     0.3146         0.3031          1.4214     0.3942         0.3799
Downtown      0.0485          0.1941     0.1347         0.1107          0.2433     0.1688         0.1387
Back Bay      0.0615          0.1516     0.1620         0.1107          0.1899     0.2031         0.1387
Southwest     0.2170          0.4563     0.3311         0.2680          0.5719     0.4149         0.3359
Southeast     0.2087          0.5891     0.3227         0.2830          0.7384     0.4044         0.3547
Aggregated    1.0000          0.6118     0.3383         0.2819          0.7668     0.4240         0.3533
The Downtown and Back Bay neighborhoods have the shortest expected distance to either storefront, with
an average of just 580 feet to the closest Dunkin Donuts or Starbucks. A few neighborhoods show
drastically different storefront placement strategies between the two franchises. Dunkin Donuts holds a
large advantage in the Northeast and a moderate advantage in the Southeast. The only neighborhood where
Starbucks holds an advantage is the Back Bay.
Figure 9: Expected Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks)
Comparison with Exact Population Density Demand

As the data set is relatively small, an exhaustive analysis can be performed on the raw data for a
comparison to the model. In this comparison, each population dot is used as a simulated customer,
maintaining the underlying customer probability distribution. For each customer, the actual closest
storefront is located using either the Euclidean or Manhattan distance metric. To report data consistently
with the model results, the customers and minimum distances are aggregated by sector and neighborhood.
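The exhaustive search can be sketched in a few lines. The coordinates below are hypothetical points in miles, not values from the data set, and the function names are illustrative rather than the ones used in the project.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Right-angle (grid) distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def nearest_store(customer, stores, metric):
    """Return (distance, index) of the closest store under the given metric."""
    return min((metric(customer, s), i) for i, s in enumerate(stores))


def mean_min_distance(customers, stores, metric):
    """Average distance from each customer to that customer's closest store."""
    total = sum(nearest_store(c, stores, metric)[0] for c in customers)
    return total / len(customers)


# Hypothetical coordinates in miles
stores = [(1.0, 1.0), (1.5, 0.0)]
customers = [(0.0, 0.0), (1.0, 0.5)]

print(nearest_store(customers[0], stores, euclidean))
print(nearest_store(customers[0], stores, manhattan))
print(mean_min_distance(customers, stores, manhattan))
```

For the first customer in this toy example, the closest store differs between the two metrics, so the nearest storefront has to be recomputed whenever the metric changes.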
Table 5: Population-based Customer Results by Neighborhood
                              Mean Min Euclidean Distance (De, mi)  Mean Min Manhattan Distance (Dm, mi)
Neighborhood  Customer Count  Starbucks  Dunkin Donuts  Either      Starbucks  Dunkin Donuts  Either
Northwest     469             0.8599     0.6174         0.6315      1.1724     0.7626         0.7804
Cambridge     1806            0.4329     0.2995         0.2569      0.5536     0.3753         0.3202
Northeast     1629            1.2250     0.2788         0.2730      1.6372     0.3526         0.3435
Downtown      408             0.2485     0.2248         0.1812      0.3485     0.2805         0.2228
Back Bay      517             0.1604     0.1722         0.1448      0.2125     0.2145         0.1817
Southwest     1825            0.4499     0.3207         0.2641      0.5260     0.4043         0.3305
Southeast     1755            0.7784     0.3098         0.2796      0.8902     0.3897         0.3506
Aggregated    8409            0.6602     0.3085         0.2766      0.8314     0.3873         0.3457
Remarkably, the results closely mirror those of the model, which estimates the expected distance under
uniformly distributed customers and Poisson spatially-distributed storefronts in each neighborhood. All of
the aggregated measures are within 10% of the model estimates. The largest neighborhood-specific
differences occurred in the Northeast and Northwest neighborhoods (with errors around 0.5 miles for the
advantage of the nearest storefront), indicating that the spatial Poisson distribution may not be a good fit
for these regions.
Figure 10: a) Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) b) Absolute Expected Distance Error
The effectiveness of the model can be statistically evaluated using a paired t-test by neighborhood. The
null hypothesis that the mean distance to the closest storefront does not differ between the two approaches
cannot be rejected for Dunkin Donuts (p=0.418), Starbucks (p=0.492), or either franchise (p=0.994) using
Euclidean distance.
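The paired test statistic can be reproduced from the tables. The sketch below pairs the Starbucks Euclidean columns of Table 4 (model) and Table 5 (population-based) by neighborhood; it computes only the t statistic and degrees of freedom, and the function name is illustrative. Converting the statistic to a p-value needs the Student's t distribution, for which a library routine such as `scipy.stats.ttest_rel` could be used if SciPy is available.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Starbucks, Euclidean metric, ordered Northwest through Southeast
model = [0.9899, 0.4477, 1.1341, 0.1941, 0.1516, 0.4563, 0.5891]  # Table 4
exact = [0.8599, 0.4329, 1.2250, 0.2485, 0.1604, 0.4499, 0.7784]  # Table 5

t_stat, dof = paired_t(model, exact)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A statistic this small in magnitude, with only seven pairs, is consistent with failing to reject the null hypothesis.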
Comparison with Exact MBTA Station Demand
Under the theory that optimal storefronts should be placed at the intersections of high-traffic areas,
MBTA stations are a prime location for Dunkin Donuts and Starbucks. This next section investigates
how the minimum distance to the closest storefront varies if customers originate only from MBTA
stations, rather than their residences. MBTA stations for the red, green, blue, orange, and silver lines
within the neighborhoods are used.
Table 6: MBTA-based Customer Results by Neighborhood
                                  Mean Min Euclidean Distance (De, mi)  Mean Min Manhattan Distance (Dm, mi)
Neighborhood  MBTA Station Count  Starbucks  Dunkin Donuts  Either      Starbucks  Dunkin Donuts  Either
Northwest     0                   -          -              -           -          -              -
Cambridge     14                  0.1628     0.0733         0.055       0.1707     0.0932         0.071
Northeast     6                   1.1817     0.2443         0.2681      1.5962     0.3321         0.3639
Downtown      11                  0.2182     0.1063         0.091       0.3292     0.1394         0.1204
Back Bay      12                  0.1199     0.1003         0.0643      0.1326     0.1216         0.0765
Southwest     45                  0.4340     0.2894         0.2314      0.5378     0.3517         0.2809
Southeast     33                  0.6083     0.2094         0.1799      0.7119     0.2744         0.2343
Aggregated    121                 0.4365     0.2049         0.1694      0.5361     0.2576         0.2132
In every neighborhood except the Southwest for Starbucks under the Manhattan metric, the mean distance
from an MBTA station to the nearest storefront is lower than under the population distribution. In the case
of Cambridge, the mean distances were 60-75% lower, indicating a strong preference for storefront locations
near public transport stations. In the aggregate sense, the mean distances were consistently around 35%
lower than simply using population density data.
Figure 11: Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) with MBTA Customers
The difference in expected distance between the population-based and MBTA-based models can be
statistically evaluated using a paired t-test by neighborhood. The null hypothesis that the mean distance to
the closest storefront does not differ between the two approaches can be rejected for Dunkin Donuts
(p=0.022) and either franchise (p=0.028), but not for Starbucks (p=0.072), using Euclidean distance.
Distance Metric Comparison

Euclidean vs. Manhattan Distance Metrics
In ESD.86, as a part of the spatial probability distribution lecture, we found the following relation for the
expected ratio between the Manhattan and Euclidean distance metrics.
$$E\left[\frac{D_m}{D_e}\right] = \frac{4}{\pi}$$
This formula, however, assumes that the origin and destination remain constant between the two metrics.
In the application used in this project, of finding the nearest storefront using one or the other metric, the
nearest physical location may differ between the two metrics. For example, even if storefront A is closest
under the Euclidean metric, there may be another storefront B that is closer under the Manhattan metric.
Therefore, in this application, we would expect the ratio between distance metrics to be less than 4/π.
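For reference, the fixed origin-destination value of 4/π can be recovered with a short derivation. It assumes the direction θ from origin to destination is uniformly distributed on [0, 2π), which is an assumption stated here for illustration rather than one taken from the lecture notes.

```latex
\frac{D_m}{D_e} = |\cos\theta| + |\sin\theta|, \qquad \theta \sim U[0, 2\pi)

E\left[\frac{D_m}{D_e}\right]
  = \frac{1}{2\pi}\int_0^{2\pi} \left(|\cos\theta| + |\sin\theta|\right) d\theta
  = \frac{2}{\pi}\int_0^{\pi/2} \left(\cos\theta + \sin\theta\right) d\theta
  = \frac{2}{\pi}\cdot 2
  = \frac{4}{\pi} \approx 1.2732
```

When each metric is free to pick its own nearest storefront, the Manhattan minimum can only be less than or equal to the Manhattan distance to the Euclidean-nearest storefront, which is why a mean ratio below 4/π is expected.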
To investigate, the 8409 pair-wise distances for the population-based analysis can be plotted against each
other in a scatter plot. As expected, the Manhattan metric distance is never less than the Euclidean
metric distance, but the expected ratio (highlighted in red) does not appear to be at the center of the
distribution.
Figure 12: Scatter Plot of Closest Dunkin' Donuts Storefront under Different Distance Metrics
Using a hypothesis test on the mean, the ratio of Manhattan-to-Euclidean distances is found to be 1.2617
with a 95% confidence interval of [1.2591, 1.2642]. Of note, this interval does not include 4/π ≈ 1.2732,
meaning the expected ratio between closest storefront using Euclidean distance metric and closest
storefront using Manhattan distance metric is not 4/π in this application, though for all practical purposes,
the approximation is fine.
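A confidence interval of this kind can be computed as sketched below. The sketch uses a normal approximation (multiplier 1.96), which is an assumption that seems reasonable for 8409 samples; the exact procedure used in the project is not stated, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def mean_ci95(values):
    """Return (lower, mean, upper) for a normal-approximation 95% interval."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m - half_width, m, m + half_width


def contains(interval, value):
    """True if value lies inside the (lower, mean, upper) interval."""
    lower, _, upper = interval
    return lower <= value <= upper


# Interval reported in the text for the nearest-storefront distance ratios
reported = (1.2591, 1.2617, 1.2642)
print(contains(reported, 4 / math.pi))  # prints False
```

In use, `mean_ci95` would be applied to the list of per-customer Manhattan-to-Euclidean ratios.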
Distance Metrics vs. Google Distance
Aside from how the distance metrics relate to each other, it is of interest to see how they compare to
real distance calculations. Google Maps provides a direction generation service that provides a walking
distance between an origin and a destination point. Although it is still under development, it often
provides more realistic distance calculations based on obstructions such as waterways, highways, crooked
Boston streets, and buildings.
Since the question of distance metric accuracy does not rely on the underlying customer distribution,
customers were selected at random with a uniform distribution. In total, 80 customer origins were
generated, each being paired with the closest Dunkin Donuts (via Euclidean metric) and the GPS
coordinates were used to find the walking distance in Google Maps.
Figure 13: Paired Customers-Dunkin' Donuts for Distance Comparison
There are a few challenges to using the Google distance metric. First, the output is typically rounded to
the nearest 0.1 mile, which causes some accuracy problems. Second, there are some locations for which
walking directions do not exist (e.g. if the point falls in the middle of a highway); these points were
omitted from the analysis. Aside from these points, there were a couple of outliers in the resulting data
set where the Google distance was much greater than the Euclidean or Manhattan distance. These points
typically corresponded to navigating the network of roads and highways between the coastal islands near
Logan Airport.
Figure 14: Distance Ratios for a) Google vs. Euclidean b) Google vs. Manhattan
In general, the Manhattan metric outperformed the Euclidean metric when compared to the Google
distance. The Euclidean metric under-estimated the Google distances by an average of 28% (37% if
outliers are included). The Manhattan metric under-estimated the Google distance by an average of 9%
(20% if outliers are included). Note that in this case, the Manhattan-to-Euclidean metric ratio confidence
interval does include 4/π, as the origin-destination locations are invariant under the choice of metric.
Table 7: Distance Metric Ratios
       Without Outliers               With Outliers
Ratio  95% CI LB  Mean    95% CI UB   95% CI LB  Mean    95% CI UB
Dg/De  1.3191     1.4019  1.4847      1.2955     1.6151  1.9348
Dg/Dm  1.0356     1.1031  1.1706      1.0253     1.2615  1.4977
Dm/De  1.2562     1.2821  1.3080      1.2577     1.2831  1.3084
Conclusions
In conclusion, Dunkin Donuts holds a stronger grasp on the Cambridge-Boston area coffee market. The
areas of greatest advantage for Dunkin Donuts include the Northeast and the Southeast. Only in the Back
Bay neighborhood does Starbucks hold a shorter expected distance. Contributing to this analysis, several
key assumptions have been checked and are summarized as follows.
Can the franchise storefronts be modeled with a spatial Poisson distribution?
On a city-wide scale, the spatial Poisson distribution does not accurately model the franchise storefront
locations. However, on a smaller scale such as neighborhoods, especially in regions of uniform
characteristics and low-density, spatial Poisson distributions can be used to accurately model the locations
of franchise storefronts.
Can the customers be modeled with a uniform distribution?
Similar to the storefront distribution method, although it is difficult to model an entire city with a uniform
customer distribution, on a smaller scale such as neighborhoods, piecewise-uniform customer
distributions can be used to model demands.
Does the nearest-neighbor distance correlate with the actual closest storefront distance?
The nearest-neighbor expected distance was not statistically different from the actual distances to the
closest storefront for both Dunkin Donuts and Starbucks, indicating that it seems to be an accurate
estimation of actual closest storefronts. The approximation was more accurate in neighborhoods with a
higher storefront density, such as Downtown and Back Bay, and less accurate in less well-defined
neighborhoods such as the Northeast and Northwest.
Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?
When comparing the Euclidean and Manhattan distance metrics as applied in the nearest-neighbor
problem, the differences observed were slightly less than the expected ratio due to changes in the closest
storefront when transitioning from one metric to the other. When compared to a realistic Google distance
metric (using Google Maps to calculate the walking distance), both Euclidean and Manhattan
underestimated the distances, though the Manhattan metric was generally closer (within 9% not considering
outliers). There were a few outliers observed corresponding to customers originating in difficult-to-access
spots such as between islands connected by few roads and/or highways.
Appendix A: Raw Data

Note: I cannot claim this data set is complete or accurate. I do not know how up-to-date the Dunkin
Donuts and Starbucks corporate store locator web applications are (for example, the Dunkin Donuts
location in the MIT Student Center is missing), nor can I vouch for the accuracy of the GeoCoder service.
Whenever possible, I compared the coordinates from both the Google and Yahoo services in attempt to
detect any significant deviations, though this would not catch incorrect or missing addresses.
Dunkin Donuts Locations

Address                                           Latitude    Longitude
616 Massachusetts Ave, Cambridge, MA 02139        42.3650774  -71.1032368
222 Broadway, Cambridge, MA 02142                 42.3664749  -71.0938686
1001 Cambridge St., Cambridge, MA 02141           42.372795   -71.093455
1 Bow St, Cambridge, MA 02138                     42.3721864  -71.115655
65 Jfk St, Cambridge, MA 02138                    42.3720035  -71.1206418
(address missing)                                 42.362805   -71.084037
530 Commonwealth Ave, Boston, MA 02215            42.3487341  -71.096409
282 Somerville Ave, Somerville, MA 02143          42.3790404  -71.0940013
219 Cambridge St, Allston, MA 02134               42.3583     -71.1263
1008 Beacon St, Brookline, MA 02446               42.3458805  -71.1082714
(address missing)                                 42.3517146  -71.1216648
14 Mcgrath Hwy, Somerville, MA 02143              42.3745146  -71.0855527
Harvard Square Station, Cambridge, MA 02163       42.373362   -71.118956
209 N Harvard St, Allston, MA 02134               42.362548   -71.1303143
519 Somerville Ave, Somerville, MA 02143          42.3831933  -71.1062606
333 Newbury St, Boston, MA 02115                  42.3485475  -71.0864513
5 3rd St, Cambridge, MA 02141                     42.3724142  -71.0794484
1420 Boylston St, Boston, MA 02215                42.3435354  -71.1015308
153 Massachusetts Ave, Boston, MA 02115           42.34649    -71.0873454
100 Cambridgeside Place, Cambridge, MA 02141      42.367101   -71.076376
90 Washington St, Somerville, MA 02143            42.381481   -71.0851061
330 Brookline Ave, Boston, MA 02215               42.340164   -71.105855
(address missing)                                 42.3497     -71.08
(address missing)                                 42.348296   -71.083099
210 Harvard Ave, Allston, MA 02134                42.3499     -71.1302
179 Brighton Ave, Allston, MA 02134               42.3532517  -71.1336082
53 Huntington Ave, Boston, MA 02199               42.3484649  -71.0780325
1316 Beacon St, Brookline, MA 02446               42.3423157  -71.1211249
350 Longwood Ave, Boston, MA 02115                42.3386435  -71.1068662
154 Highland Ave, Somerville, MA 02143            42.3884292  -71.1035077
(address missing)                                 42.3420175  -71.0859189
430 Stuart St, Boston, MA 02116                   42.3484     -71.075
99 Cambridge St, Charlestown, MA 02129            42.3823624  -71.0791666
457 Brookline Ave, Boston, MA 02215               42.3376055  -71.1087189
145 Dartmouth St, Boston, MA 02116                42.34719    -71.075406
60 Everett St, Allston, MA 02134                  42.3559771  -71.1385555
1 White St, Cambridge, MA 02140                   42.3885801  -71.1190066
509 Cambridge St, Allston, MA 02134               42.353605   -71.1374228
434 Mass Ave, Boston, MA 02118                    42.334632   -71.0691073
220 Broadway, Somerville, MA 02145                42.3896854  -71.0883978
125 Nashua St, Boston, MA 02114                   42.3678     -71.0649
Causeway Street, Boston, MA 02114                 42.3654553  -71.0611291
8 Park Plz, Boston, MA 02116                      42.3521066  -71.0673109
709 McGrath Hwy, Somerville, MA 02145             42.3904617  -71.0873328
1631 Tremont St, Boston, MA 02120                 42.3341     -71.1039
106 Cambridge St, Boston, MA 02114                42.3611697  -71.0628675
59 Causeway St, Boston, MA 02114                  42.3643423  -71.0633287
100 Cambridge St, Boston, MA 02114                42.360367   -71.062124
22 Beacon St, Boston, MA 02108                    42.3577347  -71.063153
(address missing)                                 42.3353579  -71.0878575
1 Legends Way, Boston, MA 02114                   42.3643579  -71.0661193
11 Austin St, Charlestown, MA 02129               42.3751605  -71.0650097
(address missing)                                 42.3523815  -71.0650276
100 Legends Way, Boston, MA 02114                 42.3643579  -71.0661193
180 Canal St, Boston, MA 02114                    42.3652919  -71.0609754
(address missing)                                 42.3564     -71.0618
8 Harvard St, Brookline, MA 02445                 42.3334223  -71.1188349
630 Washington St., Boston, MA 02111              42.352419   -71.062727
750 Washington St, Proger Bldg, Boston, MA 02111  42.3490894  -71.0601767
16 Tremont St., Boston, MA 02116                  42.3592205  -71.0595142
616 Massachusetts Ave, Boston, MA 02118           42.3369897  -71.0774615
1 Summer St, Boston, MA 02110                     42.354485   -71.05942
16 Kneeland St, Boston, MA 02111                  42.3508657  -71.0622508
417 Washington St, Boston, MA 02108               42.3557073  -71.0602716
76 Middlesex Ave, Somerville, MA 02145            42.3932588  -71.0832451
20 Boylston St, Brookline, MA 02445               42.3318209  -71.1174846
(address missing)                                 42.3975134  -71.104938
214 N. Beacon St, Brighton, MA 02135              42.3558     -71.149
42.3438149
-71.0660403
244 Elm St, Somerville, MA 02144
42.3953081
-71.1218721
100 City Hall Plaza, Boston, MA 02114
42.3582252
-71.0590107
42.3578883
-71.0579421
2 City Hall Sq, Boston, MA 02108
42.3577028
-71.0593792
498 Mystic Ave, Stop & Shop, Somerville, MA 02145
42.3930959
-71.0884036
1 Congress St, Boston, MA 02114
42.362775
-71.059621
101 Summer St., Boston, MA 02110
42.3535216
-71.0578313
10 Winthrop Sq, Boston, MA 02110
42.354979
-71.0577489
20 North St, Boston, MA 02109
42.3608194
-71.0558302
3 Post Office Square, Boston, MA 02109
42.35672
-71.056577
736 Cambridge St, Brighton, MA 02135
42.3486807
-71.1496925
850 Harrison Ave, Roxbury, MA 02118
42.3348927
-71.0751815
265 Boylston St, Brookline, MA 02445
42.3305118
-71.1238184
111 State Street, Boston, MA 02109
42.3591596
-71.0548161
176 Federal St, Boston, MA 02110
42.3536133
-71.0561934
2360 Washington St, Roxbury, MA 02119
42.3294746
-71.084687
517 Concord Ave, Cambridge, MA 02138
42.3863016
-71.1387706
One Fleet Center, Boston, MA 02114
42.365841
-71.060724
42.3543283
-71.0543216
265 Franklin St, Boston, MA 02109
42.356767
-71.0535218
201 Alewife Brook Pky, Cambridge, MA 02138
42.3892258
-71.1429552
42.4009618
-71.1169843
17 Melnea Cass Blvd, Boston, MA 02119
42.3312853
-71.0748777
350 Washington St, Brighton, MA 02135
42.3490753
-71.1530339
70 E India Row, Boston, MA 02110
42.3579577
-71.0508084
635 Mount Auburn St, Watertown, MA 02472
42.3714388
-71.157704
22 W Broadway, South Boston, MA 02127
42.3425236
-71.0565548
315 Centre St, Jamaica Plain, MA 02130
42.3232
-71.1036
42.398885
-71.1321608
5 Cambridgepark Dr, Cambridge, MA 02140
42.3946428
-71.1424549
268 Summer St, Boston, MA 02210
42.3503648
-71.0503943
275 Mystic Ave, Medford, MA 02155
42.4053859
-71.101411
485 Arsenal St., Watertown, MA 02472
42.363237
-71.155593
42.351117
-71.049438
3850 Mystic Valley Pkwy, Medford, MA 02155
42.4058055
-71.0948504
50 Broadway, Everett, MA 02149
42.3967042
-71.0651774
1955 Beacon St, Brighton, MA 02135
42.3358927
-71.1499417
75 Old Colony Ave, South Boston, MA 02127
42.3358
-71.056
200 Seaport Blvd, Boston, MA 02210
42.349535
-71.040589
620 Fellsway, Medford, MA 02155
42.4074146
-71.0826755
1100 Mass Ave, Dorchester, MA 02125
42.313405
-71.0570969
1926 Columbus Ave, Roxbury, MA 02119
42.3168307
-71.0982039
13-15 Maverick Sq, East Boston, MA 02128
42.369
-71.0391999
510 Southampton St, Boston, MA 02127
42.3297
-71.0572
42.3505411
-71.1672294
7 Commercial Street, Medford, MA 02155
42.4108632
-71.0881968
364 Boston Ave, Medford, MA 02155
42.4110023
-71.1208938
34 Central Sq., East Boston, MA 02128
42.3743987
-71.0395548
276 Beacham St, Chelsea, MA 02150
42.3958
-71.0526
154 Main St, Medford, MA 02155
42.4142048
-71.1106422
482 W Broadway, Boston, MA 02127
42.3355984
-71.0458407
42.403701
-71.057403
132 Main St, Everett, MA 02149
42.4054905
-71.0617024
42.3118183
-71.114301
283 Middlesex Ave, Medford, MA 02155
42.4122525
-71.0791565
15 Commonwealth Ave, Chestnut Hill, MA 02467
42.3399488
-71.1671857
256 Boston St, Dorchester, MA 02125
42.3209
-71.061
1 Old Harbor St, South Boston, MA 02127
42.3345983
-71.0475223
130 Broadway, Chelsea, MA 02150
42.3892
-71.0408
101 Broadway, Arlington, MA 02474
42.409729
-71.1394997
379 Alewife Brook Pky, Somerville, MA 02144
42.4134649
-71.131227
42.3104
-71.1152
847 Dorchester Ave, Dorchester, MA 02125
42.3217677
-71.0568335
2 Salem St, Medford Square, Medford, MA 02155
42.4182977
-71.1096766
1885 Revere Beach Pkwy, Everett, MA 02149
42.4026488
-71.0497033
1886 Revere Beach Pkwy, Everett, MA 02149
42.4029719
-71.0576032
83 Everett Ave, Chelsea, MA 02150
42.3938
-71.0386
456 Blue Hill Ave, Dorchester, MA 02121
42.3094486
-71.0825407
49 Mount Auburn St, Watertown, MA 02472
42.3663261
-71.1818116
234 Everett Ave, Chelsea, MA 02150
42.3983547
-71.040518
100 Service Rd, East Boston, MA 02128
42.3685455
-71.0300292
369 Massachusetts Ave, Arlington, MA 02474
42.4120094
-71.1486312
42.4095468
-71.0532538
12 Washington St, Chelsea, MA 02150
42.3934689
-71.0338541
430 Salem St, Medford, MA 02155
42.423165
-71.0912071
200 Commercial St, Malden, MA 02148
42.4216695
-71.0753317
4 Harvard Ave, Medford, MA 02155
42.4211848
-71.1331323
345 Washington St., Newton, MA 02458
42.3567005
-71.1875089
353 Trapelo Rd, Belmont, MA 02478
42.3853394
-71.1833662
99 Charles St, Malden, MA 02148
42.423576
-71.0712012
1236 Dorchester Ave, Dorchester, MA 02125
42.3084253
-71.0582217
372 Washington Ave, Chelsea, MA 02150
42.4050143
-71.0355293
321 Ferry St, Everett, MA 02149
42.4144921
-71.0474161
171 Watertown Street, Watertown, MA 02472
42.3622518
-71.1931601
21 Summer St, Arlington, MA 02474
42.4194124
-71.1527705
57 Eastern Ave, Malden, MA 02148
42.4239636
-71.0656994
245 Pleasant St, Malden, MA 02148
42.4273278
-71.0740543
448 Main St, Watertown, MA 02472
42.3709389
-71.1952175
52 Church St, Belmont, MA 02478
42.3870472
-71.190931
7 Walk Hill St, Jamaica Plain, MA 02130
42.2956999
-71.1162
424 Main St, Malden, MA 02148
42.4265519
-71.0672918
356 Eastern Ave., Chelsea, MA 02150
42.3982
-71.0207
936-942 Broadway, Chelsea, MA 02150
42.4009
-71.0216
42.4201478
-71.0439663
Starbucks Locations

Address
Latitude
Longitude
42.3656973
-71.1041728
750 Memorial Drive, Cambridge, MA 02139
42.3567886
-71.0945479
775 Commonwealth Ave., Boston, MA 02215
42.350478
-71.1096413
42.3488
-71.0994
42.349347
-71.099656
6 Cambridge Center, Cambridge, MA 02142
42.362707
-71.0864538
874 Commonwealth Ave, Brookline, MA 02215
42.3507655
-71.1138252
2 Cambridge Center, Cambridge, MA 02142
42.362793
-71.086199
142-148 Brookline Avenue, Boston, MA 02215
42.3449155
-71.1009163
42.3738963
-71.1126878
350 Newbury Street, Boston, MA 02116
42.3481
-71.0872
147-151 Massachusetts Avenue, Boston, MA 02115
42.3466
-71.0878
36 JFK Street, Cambridge, MA 02138
42.3720035
-71.1206418
755 Boylston Street, Boston, MA 02116
42.349244
-71.080974
39 Dalton Street, Boston, MA 02199
42.346302
-71.084032
165 Newbury Street, Boston, MA 02116
42.3508
-71.0789
31 Church Street, Cambridge, MA 02138
42.3743901
-71.1200808
100 Cambridgeside Place, Cambridge, MA 02141
42.3673199
-71.0775368
42.346175
-71.079397
42.341792
-71.0862592
42.3488089
-71.0775803
364 Brookline Ave., Boston, MA 02215
42.3384972
-71.107699
441 Stuart Street, Boston, MA 02116
42.3485141
-71.076156
283 Longwood Avenue, Boston, MA 02115
42.3379
-71.1045
42.3402211
-71.0889516
277 Harvard St, Brookline, MA 02146
42.3491045
-71.1299685
42.3515
-71.0729999
473 Harvard Street, Brookline, MA 02146
42.3470869
-71.1285487
97 Charles Street, Boston, MA 02114
42.3588955
-71.0707138
1 Charles Street, Boston, MA 02108
42.351456
-71.067188
42.3818755
-71.1198028
222 Cambridge Street, Boston, MA 02114
42.3600106
-71.0583794
42.3424754
-71.0744816
711-723 Somerville Ave, Somerville, MA 02143
42.3854634
-71.1135012
143 Stuart Street, Boston, MA 02116
42.3511502
-71.0662507
42.352269
-71.064369
15 Harvard Street, Brookline, MA 02445
42.3337518
-71.1188552
821 Washington Street, Boston, MA 02111
42.348839
-71.06427
12 Winter Street, Boston, MA 02106
42.3557941
-71.0613094
63-65 Court Street, Boston, MA 02108
42.3593092
-71.0594142
27 School Street, Boston, MA 02106
42.3774405
-71.0647911
240 Washington St., Boston, MA 02108
42.357925
-71.057883
1655 Beacon Street, Brookline, MA 02146
42.3388327
-71.1367592
75-101 Federal Street, Boston, MA 02110
42.3549
-71.0569
1 Federal Street, Boston, MA 02110
42.355626
-71.056716
125 Summer Street, Boston, MA 02110
42.3531
-71.0575
2-4 Faneuil Hall Market Place, Boston, MA 02101
42.35996
-71.05579
84 State Street, Boston, MA 02109
42.3590059
-71.0559325
1 Financial Center, Boston, MA 02111
42.352344
-71.056266
211 Congress Street, Boston, MA 02110
42.3547995
-71.0548818
296 State St, Boston, MA 02109
42.3602
-71.0509
1 International Place, Boston, MA 02110
42.3561655
-71.0520816
1660-1670 Soldiers Field Rd, Brighton, MA 02135
42.3589
-71.1536
2 Atlantic Avenue, Boston, MA 02110
42.364
-71.0505
260 Elm Street, Somerville, MA 02144
42.3955927
-71.1220376
550 Arsenal Street, Watertown, MA 02472
42.363459
-71.157439
222 Alewife Brook Parkway, Cambridge, MA 02138
42.389376
-71.142569
7 Allstate Rd, Boston, MA 02125
42.3293
-71.0627
42.3486
-71.1596
MBTA Station Locations

Name
Latitude
Longitude
Alewife Station
42.39490705
-71.14098072
Davis Station
42.39606385
-71.12205505
Porter Square Station
42.38834612
-71.1192441
Harvard Square Station
42.373939
-71.119106
Central Station
42.36516345
-71.10332251
Kendall/MIT Station
42.36246023
-71.08658552
Charles/Massachusetts General Hospital Station
42.36127109
-71.07208014
Park Street Station
42.35619719
-71.06229544
Downtown Crossing Station
42.355295
-71.060788
South Station
42.35170961
-71.05499983
Broadway Station
42.3429
-71.05713
Andrew Station
42.32955
-71.05696
JFK / UMass Station
42.32143786
-71.05239272
Savin Hill Station
42.31130702
-71.05322957
Fields Corner Station
42.30026198
-71.06070757
Shawmut Station
42.29279438
-71.06578231
Ashmont Station
42.285924
-71.064219
Cedar Grove Station
42.27842012
-71.05974197
Butler Station
42.27211695
-71.06276751
Milton Station
42.27034655
-71.06794953
Central Avenue Station
42.27001311
-71.07324958
Valley Road Station
42.26789332
-71.08306646
Capen Street Station
42.2675678
-71.08722925
Mattapan Station
42.26745665
-71.09313011
North Quincy Station
42.27481612
-71.02917552
Wollaston Station
42.26561466
-71.01940155
Quincy Center Station
42.25093242
-71.00497127
Quincy Adams Station
42.23275157
-71.00714922
Braintree Station
42.20878042
-71.00133419
Lechmere Station
42.370582
-71.076884
Science Park Station
42.36667752
-71.06816411
North Station
42.365512
-71.061423
Haymarket Station
42.362498
-71.058996
Government Center Station
42.359297
-71.059895
Boylston Street Station
42.35239149
-71.06487036
Arlington Station
42.351868
-71.070498
Copley Station
42.349962
-71.078089
Hynes Convention Center/ICA Station
42.348097
-71.088396
Kenmore Station
42.348797
-71.095296
Blandford Street Station
42.349297
-71.100796
Boston University East Station
42.349648
-71.103825
Boston University Central Station
42.34993352
-71.10618711
Boston University West Station
42.35090086
-71.1140728
St. Paul Street Station (B)
42.3511308
-71.11590743
Pleasant Street Station
42.35134488
-71.11821413
Babcock Street Station
42.35174133
-71.12126112
Packards Corner Station
42.35207434
-71.12486601
Harvard Avenue Station
42.35023483
-71.13102436
Griggs Street / Long Avenue Station
42.34871243
-71.13415718
Allston Street Station
42.34844284
-71.13778353
Warren Street Station
42.34847455
-71.14029408
Washington Street Station
42.34368509
-71.142869
Sutherland Street Station
42.34149641
-71.14662409
Chiswick Road Station
42.3403386
-71.15130186
Chestnut Hill Avenue Station
42.33808635
-71.15334034
South Street Station
42.33957728
-71.15778208
Boston College Station
42.33994208
-71.16619349
St. Marys Street Station
42.34613537
-71.10680938
Hawes Street Station
42.34495386
-71.11101508
Kent Street Station
42.343997
-71.114596
St. Paul Street Station (C)
42.34322516
-71.11734509
Coolidge Corner Station
42.342097
-71.121396
Winchester Street / Summit Avenue Station
42.34128229
-71.12461925
Brandon Hall Station
42.340072
-71.128526
Fairbanks Station
42.339609
-71.13134623
Washington Square Station
42.33933937
-71.13542318
Tappan Street Station
42.33846702
-71.13879204
Dean Road Station
42.33770568
-71.14196777
Englewood Avenue Station
42.33713468
-71.14512205
Cleveland Circle Station
42.33589747
-71.1507225
Fenway Station
42.34528691
-71.10439539
Longwood Station
42.34044962
-71.11089706
Brookline Village Station
42.33204296
-71.11811757
Brookline Hills Station
42.33121016
-71.12586379
Beaconsfield Station
42.33596092
-71.14160299
Reservoir Station
42.33493783
-71.14940286
Chestnut Hill Station
42.32667321
-71.16551757
Newton Centre Station
42.32935418
-71.1923182
Newton Highlands Station
42.32169964
-71.20617986
Eliot Station
42.31919287
-71.21691942
Waban Station
42.32626075
-71.23117805
Woodland Station
42.33368473
-71.24492168
Riverside Station
42.33711088
-71.2517345
Prudential Station
42.34563581
-71.08158588
Symphony Station
42.342697
-71.085095
Northeastern University Station
42.34032274
-71.08889222
Museum of Fine Arts Station
42.33772154
-71.09547973
Longwood Medical Area Station
42.335837
-71.100652
Brigham Circle Station
42.334097
-71.104996
Fenwood Street Station
42.33374818
-71.10558629
Mission Park Station
42.33322472
-71.1070776
Riverway Station
42.33197951
-71.11207724
Back of the Hill Station
42.33007596
-71.11133695
Heath Street Station
42.3287593
-71.11059666
State Station
42.358897
-71.057795
Aquarium Station
42.359456
-71.05357
Maverick Station
42.36886
-71.039926
Logan Airport Station
42.37273343
-71.0351944
Wood Island Station
42.380797
-71.023394
Orient Heights Station
42.386676
-71.006628
Suffolk Downs Station
42.38840159
-71.00035787
Beachmont Station
42.39741872
-70.99219322
Revere Beach Station
42.40716336
-70.99219322
Wonderland Station
42.414246
-70.992144
Oak Grove Station
42.43534302
-71.07118964
Malden Center Station
42.42731334
-71.07387185
Wellington Station
42.40429559
-71.07700467
Sullivan Square Station (Broadway Exit)
42.38575484
-71.07707977
Sullivan Square Station (Cambridge Street Exit)
42.38301288
-71.07710123
Community College Station
42.37263832
-71.07027769
Chinatown Station
42.352228
-71.062892
New England Medical Center Station
42.349873
-71.063795
Back Bay Station
42.34727722
-71.07603908
Massachusetts Avenue Station (Orange)
42.34155192
-71.08321667
Ruggles Station
42.33566748
-71.090523
Roxbury Crossing Station
42.33152742
-71.09540462
Jackson Square Station
42.32273881
-71.1000824
Stony Brook Station
42.31920081
-71.10282898
Green Street Station
42.31056915
-71.10731363
Forest Hills Station
42.29814321
-71.11548901
Herald Street Station
42.346377
-71.064842
East Berkeley Street Station
42.343878
-71.066039
Union Park Street Station
42.341197
-71.069795
Newton Street Station
42.338697
-71.073795
Worcester Square Station
42.337456
-71.075812
Massachusetts Avenue Station (Silver)
42.336441
-71.077238
Lenox Street Station
42.33504887
-71.07881784
Melnea Cass Boulevard Station
42.33290747
-71.0810709
Dudley Square Station
42.32889414
-71.08511567
Courthouse Station
42.35207434
-71.04530096
World Trade Center Station
42.3488393
-71.04253292
Silver Line Way Station
42.34801465
-71.0371685
Silver Line (Airport = SL1)
42.36628117
-71.01931572
Northern Avenue at Harbor Street, Boston
42.34661908
-71.03523731
Northern Avenue at Tide Street
42.34509659
-71.03197575
25 Dry Dock Avenue
42.344602
-71.028307
88 Black Falcon Avenue
42.34393885
-71.02721214
Black Falcon Avenue at Design Center Place
42.3438992
-71.03431463
Dry Dock Avenue at Design Center Place
42.34468
-71.034797
Summer Street at Power House Street
42.33986278
-71.03553772
East First Street at M Street
42.3381498
-71.03345633
City Point
42.3382291
-71.02935791
Population Density
Data is formatted as a 254x254-pixel GIF, scaled to approximately 25.4 pixels per mile. Black pixels
indicate 100 units of population, white pixels indicate 0 units of population.
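As a small illustration of how such an image translates into population counts, the sketch below uses a synthetic 4x4 mask rather than the actual GIF. The mask values and variable names are hypothetical; only the 100 units per black pixel and the 25.4 pixels-per-mile scale come from the description above.

```matlab
% Sketch with a synthetic 4x4 mask; true marks a black (populated) pixel.
black = logical([0 1 0 0; 1 1 0 0; 0 0 0 1; 0 0 0 0]);
units_per_pixel = 100;     % from the description above
pixels_per_mile = 25.4;    % approximate scale, from the description above

total_population = units_per_pixel * nnz(black);    % 4 black pixels
pixel_area = (1/pixels_per_mile)^2;                  % square miles per pixel
image_area = numel(black) * pixel_area;              % square miles covered
density = total_population / image_area;             % units per square mile
fprintf('%d units over %.4f sq. miles\n', total_population, image_area);
```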
Target Area Map

Data is formatted as a 500x500-pixel JPG, scaled to approximately 50 pixels per mile.
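The stated scale implies a map width of 500/50 = 10 miles. A pixel-to-mile conversion under that scale might look like the sketch below. It is a hypothetical stand-in, not the attached ij2xy.m: the helper name `to_xy`, the origin at the image center, and the upward-positive y axis are assumptions made for the example.

```matlab
% Hypothetical pixel-to-Cartesian conversion (not the attached ij2xy.m).
W_i = 500;                % image width (pixels), from the description above
px_per_mile = 50;         % approximate scale, from the description above
W_m = W_i/px_per_mile;    % map width (miles)

% i = column, j = row; origin assumed at the image center, y positive upward
to_xy = @(i,j) [(i - W_i/2)/px_per_mile, (W_i/2 - j)/px_per_mile];

center = to_xy(250,250);  % image center maps to the origin
corner = to_xy(500,0);    % upper-right extreme of the map
```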
Random Customers and Paired Dunkin Donuts

Customer            Paired Dunkin Donuts  Euclidean    Manhattan    Google
                                          Distance     Distance     Distance
Latitude  Longitude Latitude  Longitude   (De, miles)  (Dm, miles)  (Dg, miles)
42.3082   -71.1327  42.3104   -71.1152    0.9063       1.0454       1.2
42.4231   -71.0546  42.424    -71.0657    0.5698       0.6263       0.7
42.3773   -71.1012  42.379    -71.094     0.3867       0.4878       0.4
42.3162   -71.0922  42.3168   -71.0982    0.3096       0.3501       0.5
42.3481   -71.1549  42.3491   -71.153     0.1167       0.1627       0.1
42.4119   -71.1774  42.4194   -71.1528    1.3603       1.7764       1.8
42.3666   -71.0827  42.3628   -71.084     0.2709       0.3305       0.4
42.3673   -71.0242  42.3685   -71.03      0.3098       0.3836       0.3
42.3539   -71.1775  42.3567   -71.1875    0.5464       0.7045       0.6
42.3199   -71.1624  42.3359   -71.1499    1.275        1.741        1.7
42.3285   -71.1936  42.3399   -71.1672    1.5634       2.1395       1.7
42.3332   -71.0489  42.3346   -71.0475    0.1195       0.1669       0.1
42.3654   -71.1112  42.3651   -71.1032    0.4071       0.4288       0.5
42.3694   -71.1135  42.3722   -71.1157    0.2217       0.3025       0.2
42.3503   -71.0367  42.3495   -71.0406    0.2055       0.2514       0.3
42.4149   -71.1224  42.411    -71.1209    0.2801       0.3462       0.4
42.3359   -71.1003  42.3341   -71.1039    0.2219       0.3082       0.3
42.3812   -71.0732  42.3824   -71.0792    0.315        0.3849       0.5
42.3337   -71.1814  42.3399   -71.1672    0.8444       1.1574       1
42.3293   -71.1063  42.3341   -71.1039    0.3536       0.4542       0.7
42.3598   -71.0539  42.3592   -71.0548    0.0644       0.091        0.08
42.4084   -71.0839  42.4074   -71.0827    0.0924       0.1306       0.3
42.3766   -71.0385  42.3744   -71.0396    0.1613       0.2059       0.2
42.3112   -71.0867  42.3094   -71.0825    0.2444       0.3334       0.3
42.3756   -71.1229  42.3734   -71.119     0.2539       0.356        0.3
42.3173   -71.0822  42.3094   -71.0825    0.5428       0.5599       0.6
42.3482   -71.1098  42.3459   -71.1083    0.1783       0.2383       0.3
42.3658   -71.1789  42.3663   -71.1818    0.153        0.185        0.2
42.4109   -71.084   42.4109   -71.0882    0.2143       0.2168       0.4
42.3824   -71.0294  42.3892   -71.0408    0.748        1.0518       1
42.3994   -71.1121  42.401    -71.117     0.2717       0.3573       0.3
42.3123   -71.09    42.3094   -71.0825    0.4288       0.5778       0.5
42.3648   -71.092   42.3665   -71.0939    0.15         0.2111       0.2
42.4141   -71.1559  42.412    -71.1486    0.3982       0.5155       0.5
42.3101   -71.0786  42.3094   -71.0825    0.2062       0.2462       0.3
42.3792   -71.1679  42.3714   -71.1577    0.7473       1.0568       0.9
42.3063   -71.0601  42.3084   -71.0582    0.1754       0.2427       0.2
42.3498   -71.158   42.3491   -71.153     0.2584       0.3036       0.3
42.3895   -71.1657  42.3853   -71.1834    0.9466       1.1894       1.4
42.3089   -71.0686  42.3084   -71.0582    0.5308       0.5626       0.7
42.3294   -71.0452  42.3346   -71.0475    0.3782       0.4777       2.7
42.3171   -71.1547  42.3359   -71.1499    1.321        1.5414       1.8
42.3781   -71.1659  42.3714   -71.1577    0.622        0.8787       0.7
42.3287   -71.1906  42.3399   -71.1672    1.4258       1.9726       1.7
42.3303   -71.1389  42.3359   -71.1499    0.6834       0.9501       0.8
42.4068   -71.0983  42.4054   -71.1014    0.1865       0.2565       0.4
42.348    -71.0513  42.3504   -71.0504    0.1698       0.2096       0.2
42.3095   -71.0549  42.3084   -71.0582    0.1851       0.2438       0.3
42.3212   -71.191   42.3399   -71.1672    1.7766       2.5112       2.4
42.3435   -71.1304  42.3499   -71.1302    0.4423       0.4524       0.8
42.4108   -71.1094  42.4142   -71.1106    0.2436       0.2987       0.3
42.3746   -71.1139  42.3722   -71.1157    0.1893       0.2564       0.2
42.3297   -71.1879  42.3399   -71.1672    1.2727       1.7656       1.5
42.3429   -71.0875  42.342    -71.0859    0.1012       0.1417       0.2
42.4125   -71.0943  42.4109   -71.0882    0.3315       0.4247       0.4
42.4142   -71.1431  42.412    -71.1486    0.3204       0.4337       0.4
42.422    -71.1892  42.4194   -71.1528    1.8684       2.0386       2.3
42.3311   -71.1714  42.3399   -71.1672    0.6481       0.8265       0.8
42.3686   -71.1429  42.3632   -71.1556    0.7465       1.0186       1.2
42.3192   -71.0623  42.3209   -71.061     0.1349       0.1838       0.2
42.3003   -71.1411  42.2957   -71.1162    1.3103       1.589        2.4
42.3312   -71.0764  42.3313   -71.0749    0.0779       0.0836       0.083
42.4268   -71.0322  42.4201   -71.044     0.7564       1.0603       0.9
42.3545   -71.0299  42.3495   -71.0406    0.6446       0.8887       8.2
42.3158   -71.0907  42.3168   -71.0982    0.3897       0.4543       0.6
42.4124   -71.0296  42.405    -71.0355    0.5933       0.813        0.8
42.2998   -71.1247  42.2957   -71.1162    0.5182       0.7172       1.1
42.311    -71.1881  42.3399   -71.1672    2.2673       3.0679       3.5
42.3287   -71.1439  42.3359   -71.1499    0.5849       0.8054       0.6
42.3957   -71.1896  42.387    -71.1909    0.6017       0.6658       1
42.4139   -71.0443  42.4145   -71.0474    0.1643       0.2          0.3
42.3251   -71.1197  42.3305   -71.1238    0.429        0.5842       0.8
42.3843   -71.179   42.3853   -71.1834    0.2342       0.2947       0.5
42.4035   -71.1712  42.412    -71.1486    1.2935       1.7401       1.7
42.4206   -71.153   42.4194   -71.1528    0.0829       0.0938       0.1
42.4108   -71.1563  42.412    -71.1486    0.4003       0.4751       0.8
42.3475   -71.0438  42.3495   -71.0406    0.216        0.3045       0.4
42.419    -71.0619  42.424    -71.0657    0.394        0.5369       0.6
42.3532   -71.1351  42.3533   -71.1336    0.0762       0.0797       0.078
42.3171   -71.1281  42.3118   -71.1143    0.7934       1.0694       1.2
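The three distance measures in the table above can be compared row by row. The sketch below takes the first three rows and forms the ratios Dm/De and Dg/De; the variable names are hypothetical. For any displacement (dx, dy), the Manhattan distance |dx|+|dy| is at least the Euclidean distance and at most sqrt(2) times it, so Dm/De must fall between 1 and about 1.414, which the assertion checks.

```matlab
% First three rows of the table above: [De Dm Dg] in miles
D = [0.9063 1.0454 1.2;
     0.5698 0.6263 0.7;
     0.3867 0.4878 0.4];

ratio_dm_de = D(:,2)./D(:,1);   % Manhattan over Euclidean
ratio_dg_de = D(:,3)./D(:,1);   % Google over Euclidean

% geometric bound: 1 <= (|dx|+|dy|)/sqrt(dx^2+dy^2) <= sqrt(2)
assert(all(ratio_dm_de >= 1 & ratio_dm_de <= sqrt(2)));
disp([ratio_dm_de ratio_dg_de]);
```

The Google distance is not bound in this way: it follows the road network and is reported to coarser precision, so Dg/De can exceed sqrt(2) or occasionally fall near or below 1 through rounding.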
Appendix B: MATLAB Files

Attached files:
Processing script (project.m)
Haversine function (haversine.m)
Pixel-Cartesian transformation (ij2xy.m)
Cartesian-Pixel transformation (xy2ij.m)
5/5/10 6:47 PM
C:\Documents and Settings\Paul Grogan\My Documents\...\project.m
% ESD.86 Term Project - Paul Grogan

% April 12, 2010
clc; clear all; close all;
dd = xlsread('data2.xlsx','DD');
sb = xlsread('data2.xlsx','SB');
mbta = xlsread('data2.xlsx','MBTA');
mit = [42.363253 -71.103086];
%% Calculate Distances using "Haversine" Formula
dd_dist = haversine(mit(1),mit(2),dd(:,1),dd(:,2));
sb_dist = haversine(mit(1),mit(2),sb(:,1),sb(:,2));
mbta_dist = haversine(mit(1),mit(2),mbta(:,1),mbta(:,2));
%% Transform to Cartesian Coordinates
dd_xy = [haversine(mit(1),mit(2),mit(1),dd(:,2)).*sign(dd(:,2)-mit(2))...
haversine(mit(1),mit(2),dd(:,1),mit(2)).*sign(dd(:,1)-mit(1))];
sb_xy = [haversine(mit(1),mit(2),mit(1),sb(:,2)).*sign(sb(:,2)-mit(2))...
haversine(mit(1),mit(2),sb(:,1),mit(2)).*sign(sb(:,1)-mit(1))];
mbta_xy = [haversine(mit(1),mit(2),mit(1),mbta(:,2)).*...
sign(mbta(:,2)-mit(2))...
haversine(mit(1),mit(2),mbta(:,1),mit(2)).*sign(mbta(:,1)-mit(1))];
%% Load Map Image and Transform to Pixel Coordinates
I = imread('map.jpg');
W_i = length(I);    % image width (pixels)
W_m = 10;           % map width (miles)
dd_ij = zeros(size(dd_xy));
[dd_ij(:,1) dd_ij(:,2)] = xy2ij(dd_xy(:,1),dd_xy(:,2),W_m,W_i);
sb_ij = zeros(size(sb_xy));
[sb_ij(:,1) sb_ij(:,2)] = xy2ij(sb_xy(:,1),sb_xy(:,2),W_m,W_i);
mbta_ij = zeros(size(mbta_xy));
[mbta_ij(:,1) mbta_ij(:,2)] = xy2ij(mbta_xy(:,1),mbta_xy(:,2),W_m,W_i);
%% Load and Process Population Data
P = imread('population.gif');
W_p = length(P);
pop_xy = zeros(sum(sum(P)),2);
pop_count = 1;
for i=1:length(P)
for j=1:length(P)
if P(j,i)
[pop_xy(pop_count,1) pop_xy(pop_count,2)] = ij2xy(i,j,W_p,W_m);
pop_count = pop_count+1;
end
end
end
pop_ij = zeros(size(pop_xy));
[pop_ij(:,1) pop_ij(:,2)] = xy2ij(pop_xy(:,1),pop_xy(:,2),W_m,W_i);

%% Discretize Data into Sectors
N_s = 10;
W_s = 7/N_s;
S_xy = zeros(N_s+1,1);
for i=0:N_s
S_xy(i+1) = -N_s*W_s/2 + W_s*i;
end
S_ij = xy2ij(S_xy,S_xy,W_m,W_i);
number_dd = zeros(N_s);
number_sb = zeros(N_s);
number_mbta = zeros(N_s);
number_pop = zeros(N_s);
for i=1:N_s
for j=1:N_s
number_dd(j,i) = sum((dd_xy(:,1)>S_xy(i)) .* ...
(dd_xy(:,1)<=S_xy(i)+W_s) .* ...
(dd_xy(:,2)>S_xy(end-j)) .* ...
(dd_xy(:,2)<=S_xy(end-j)+W_s));
number_sb(j,i) = sum((sb_xy(:,1)>S_xy(i)) .* ...
(sb_xy(:,1)<=S_xy(i)+W_s) .* ...
(sb_xy(:,2)>S_xy(end-j)) .* ...
(sb_xy(:,2)<=S_xy(end-j)+W_s));
number_mbta(j,i) = sum((mbta_xy(:,1)>S_xy(i)) .* ...
(mbta_xy(:,1)<=S_xy(i)+W_s) .* ...
(mbta_xy(:,2)>S_xy(end-j)) .* ...
(mbta_xy(:,2)<=S_xy(end-j)+W_s));
number_pop(j,i) = 100*sum((pop_xy(:,1)>S_xy(i)) .* ...
(pop_xy(:,1)<=S_xy(i)+W_s) .* ...
(pop_xy(:,2)>S_xy(end-j)) .* ...
(pop_xy(:,2)<=S_xy(end-j)+W_s));
end
end
%% Define Neighborhoods (hard-coded for 10x10 grid)
neighborhoods = {'northwest'; 'cambridge'; 'northeast'; ...
'downtown'; 'back_bay'; 'southwest'; 'southeast'};
cambridge = [13:14 23:24 33:36 43:47 53:57];
northeast = [5:10 15:20 25:30 37:39];
northwest = [11:12 21:22 31:32 41:42];
downtown = [48:49 58:59];
southwest = [51:52 61:65 71:75 81:85];
back_bay = 66:68;
southeast = [69:70 76:80 86:90 96:100];
unused = [1:4 40 50 60 91:95];
%% Build Neighborhood Probabilistic Model
POP_n = zeros(length(neighborhoods),1);
A_n = zeros(length(neighborhoods),1);
DD_n = zeros(length(neighborhoods),1);
SB_n = zeros(length(neighborhoods),1);
number_pop_s = reshape(number_pop',N_s^2,1);
number_dd_s = reshape(number_dd',N_s^2,1);
number_sb_s = reshape(number_sb',N_s^2,1);
for i=1:length(neighborhoods)
A_n(i) = W_s^2*eval(['length(' neighborhoods{i} ')']);
DD_n(i) = sum(eval(['number_dd_s(' neighborhoods{i} ')']));
SB_n(i) = sum(eval(['number_sb_s(' neighborhoods{i} ')']));
POP_n(i) = sum(eval(['number_pop_s(' neighborhoods{i} ')']));
end
gamma_dd = DD_n./A_n;
gamma_sb = SB_n./A_n;
p_cust = POP_n./sum(POP_n);
%% Evaluate Model Results

exp_de_dd = sqrt(1./(4*gamma_dd));
exp_de_sb = sqrt(1./(4*gamma_sb));
exp_dm_dd = sqrt(pi()./(8*gamma_dd));
exp_dm_sb = sqrt(pi()./(8*gamma_sb));
exp_de = sqrt(1./(4*(gamma_dd+gamma_sb)));
exp_dm = sqrt(pi()./(8*(gamma_dd+gamma_sb)));
total_exp_de_dd = p_cust'*exp_de_dd;
total_exp_de_sb = p_cust'*exp_de_sb;
total_exp_dm_dd = p_cust'*exp_dm_dd;
total_exp_dm_sb = p_cust'*exp_dm_sb;
total_exp_dm = p_cust'*exp_dm;
total_exp_de = p_cust'*exp_de;
%% Compare the Euclidean and Manhattan Metrics
% use population data as customers
target_pop_xy = pop_xy((pop_xy(:,1)>S_xy(1)) .* ...
(pop_xy(:,1)<S_xy(end)) .* (pop_xy(:,2)>S_xy(1)) .* ...
(pop_xy(:,2)<S_xy(end))==1,:);
% use mbta stations as customers
% target_pop_xy = mbta_xy((mbta_xy(:,1)>S_xy(1)) .* ...
%     (mbta_xy(:,1)<S_xy(end)) .* (mbta_xy(:,2)>S_xy(1)) .* ...
%     (mbta_xy(:,2)<S_xy(end))==1,:);
closest_de_dd = zeros(length(target_pop_xy),1);
closest_de_sb = zeros(length(target_pop_xy),1);
closest_dm_dd = zeros(length(target_pop_xy),1);
closest_dm_sb = zeros(length(target_pop_xy),1);
for i=1:length(target_pop_xy)
closest_de_dd(i) = min(sqrt((target_pop_xy(i,1)-dd_xy(:,1)).^2+ ...
(target_pop_xy(i,2)-dd_xy(:,2)).^2));
closest_de_sb(i) = min(sqrt((target_pop_xy(i,1)-sb_xy(:,1)).^2+ ...
(target_pop_xy(i,2)-sb_xy(:,2)).^2));
closest_dm_dd(i) = min(abs(target_pop_xy(i,1)-dd_xy(:,1))+ ...
abs(target_pop_xy(i,2)-dd_xy(:,2)));
closest_dm_sb(i) = min(abs(target_pop_xy(i,1)-sb_xy(:,1))+ ...
abs(target_pop_xy(i,2)-sb_xy(:,2)));
end
closest_de = min(closest_de_dd,closest_de_sb);
closest_dm = min(closest_dm_dd,closest_dm_sb);
avg_de_dd_s = zeros(N_s^2,1);
avg_de_sb_s = zeros(N_s^2,1);
avg_dm_dd_s = zeros(N_s^2,1);
avg_dm_sb_s = zeros(N_s^2,1);
num_cust_s = zeros(N_s^2,1);
avg_de_s = zeros(N_s^2,1);
avg_dm_s = zeros(N_s^2,1);
for sector=1:N_s^2
    % customers inside this sector; the same mask is assumed for every
    % average below, with the upper y bound taken as the next grid line
    in_sector = (target_pop_xy(:,1)>S_xy(mod(sector-1,10)+1)) & ...
        (target_pop_xy(:,1)<S_xy(mod(sector-1,10)+2)) & ...
        (target_pop_xy(:,2)>S_xy(end-ceil(sector/10))) & ...
        (target_pop_xy(:,2)<S_xy(end-ceil(sector/10)+1));
    num_cust_s(sector) = sum(in_sector);
    % mean of an empty selection is NaN; NaNs are zeroed after the loop
    avg_de_dd_s(sector) = mean(closest_de_dd(in_sector));
    avg_de_sb_s(sector) = mean(closest_de_sb(in_sector));
    avg_dm_dd_s(sector) = mean(closest_dm_dd(in_sector));
    avg_dm_sb_s(sector) = mean(closest_dm_sb(in_sector));
    avg_de_s(sector) = mean(closest_de(in_sector));
    avg_dm_s(sector) = mean(closest_dm(in_sector));
end
avg_de_sb_s(isnan(avg_de_sb_s))=0;
avg_de_dd_s(isnan(avg_de_dd_s))=0;
avg_dm_sb_s(isnan(avg_dm_sb_s))=0;
avg_dm_dd_s(isnan(avg_dm_dd_s))=0;
avg_de_s(isnan(avg_de_s))=0;
avg_dm_s(isnan(avg_dm_s))=0;
avg_de_dd_n = zeros(length(neighborhoods),1);
avg_de_sb_n = zeros(length(neighborhoods),1);
avg_dm_dd_n = zeros(length(neighborhoods),1);
avg_dm_sb_n = zeros(length(neighborhoods),1);
avg_de_n = zeros(length(neighborhoods),1);
avg_dm_n = zeros(length(neighborhoods),1);
num_cust_n = zeros(length(neighborhoods),1);
for i=1:length(neighborhoods)
    sectors = eval(neighborhoods{i});
    for s=1:length(sectors)
        sector = sectors(s);
        % running customer-weighted mean; eps avoids 0/0 for empty sectors
        avg_de_dd_n(i) = (avg_de_dd_n(i)*num_cust_n(i) + ...
            avg_de_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_sb_n(i) = (avg_de_sb_n(i)*num_cust_n(i) + ...
            avg_de_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_dd_n(i) = (avg_dm_dd_n(i)*num_cust_n(i) + ...
            avg_dm_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_sb_n(i) = (avg_dm_sb_n(i)*num_cust_n(i) + ...
            avg_dm_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_n(i) = (avg_de_n(i)*num_cust_n(i) + ...
            avg_de_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_n(i) = (avg_dm_n(i)*num_cust_n(i) + ...
            avg_dm_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        num_cust_n(i) = num_cust_n(i) + num_cust_s(sector);
    end
end
avg_de_dd = 0;
avg_de_sb = 0;
avg_dm_dd = 0;
avg_dm_sb = 0;
avg_de = 0;
avg_dm = 0;
num_cust = 0;
for i=1:length(neighborhoods)
    % running customer-weighted mean across neighborhoods
    avg_de_dd = (avg_de_dd*num_cust + ...
        avg_de_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de_sb = (avg_de_sb*num_cust + ...
        avg_de_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_dd = (avg_dm_dd*num_cust + ...
        avg_dm_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_sb = (avg_dm_sb*num_cust + ...
        avg_dm_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de = (avg_de*num_cust + ...
        avg_de_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm = (avg_dm*num_cust + ...
        avg_dm_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    num_cust = num_cust + num_cust_n(i);
end
table = [vertcat(num_cust_n,num_cust) vertcat(avg_de_sb_n,avg_de_sb) ...
vertcat(avg_de_dd_n,avg_de_dd) vertcat(avg_de_n,avg_de)...
vertcat(avg_dm_sb_n,avg_dm_sb) vertcat(avg_dm_dd_n,avg_dm_dd)...
vertcat(avg_dm_n,avg_dm)];
R_de_dm = mean(closest_dm_dd./closest_de_dd);
E_de_dm = -norminv(0.05/2)*std(closest_dm_dd./closest_de_dd)/...
sqrt(length(closest_dm_dd));
CI_de_dm = R_de_dm + [-E_de_dm E_de_dm];
%% Simulated Location Pairs for Metric Comparison
% lat = [min(dd(:,1)) max(dd(:,1))];
% long = [min(dd(:,2)) max(dd(:,2))];
% new_cust = [lat(1)+(lat(2)-lat(1))*rand(20,1) ...
%     long(1)+(long(2)-long(1))*rand(20,1)];
rand_cust = [ 42.3082 -71.1327; 42.4231 -71.0546; 42.3773 -71.1012;
    42.3162 -71.0922; 42.3481 -71.1549; 42.4119 -71.1774;
    42.3666 -71.0827; 42.3673 -71.0242; 42.3539 -71.1775;
    42.3199 -71.1624; 42.3285 -71.1936; 42.3332 -71.0489;
    42.3654 -71.1112; 42.3694 -71.1135; 42.3503 -71.0367;
    42.4149 -71.1224; 42.3359 -71.1003; 42.3812 -71.0732;
    42.3337 -71.1814; 42.3293 -71.1063; 42.3598 -71.0539;
    42.4084 -71.0839; 42.3766 -71.0385; 42.3112 -71.0867;
    42.3756 -71.1229; 42.3173 -71.0822; 42.3482 -71.1098;
    42.3658 -71.1789; 42.4109 -71.0840; 42.3824 -71.0294;
    42.3994 -71.1121; 42.3123 -71.0900; 42.3648 -71.0920;
    42.4141 -71.1559; 42.3101 -71.0786; 42.3792 -71.1679;
    42.3063 -71.0601; 42.3498 -71.1580; 42.3895 -71.1657;
    42.3089 -71.0686; 42.3294 -71.0452; 42.3171 -71.1547;
    42.3781 -71.1659; 42.3287 -71.1906; 42.3303 -71.1389;
    42.4068 -71.0983; 42.3480 -71.0513; 42.3095 -71.0549;
    42.3212 -71.1910; 42.3435 -71.1304; 42.4108 -71.1094;
    42.3746 -71.1139; 42.3297 -71.1879; 42.3429 -71.0875;
    42.4125 -71.0943; 42.4142 -71.1431; 42.4220 -71.1892;
    42.3311 -71.1714; 42.3686 -71.1429; 42.3192 -71.0623;
    42.3003 -71.1411; 42.3312 -71.0764; 42.4268 -71.0322;
    42.3545 -71.0299; 42.3158 -71.0907; 42.4124 -71.0296;
    42.2998 -71.1247; 42.3110 -71.1881; 42.3287 -71.1439;
    42.3957 -71.1896; 42.4139 -71.0443; 42.3251 -71.1197;
    42.3843 -71.1790; 42.4035 -71.1712; 42.4206 -71.1530;
    42.4108 -71.1563; 42.3475 -71.0438; 42.4190 -71.0619;
    42.3532 -71.1351; 42.3171 -71.1281; ];
rand_cust_dist = haversine(mit(1),mit(2),rand_cust(:,1),rand_cust(:,2));
rand_cust_xy = [haversine(mit(1),mit(2),mit(1),rand_cust(:,2)).*...
sign(rand_cust(:,2)-mit(2))...
haversine(mit(1),mit(2),rand_cust(:,1),mit(2)).*...
sign(rand_cust(:,1)-mit(1))];
rand_de_dd = zeros(length(rand_cust_xy),1);
rand_dm_dd = zeros(length(rand_cust_xy),1);
rand_dd = zeros(length(rand_cust_xy),2);
for i=1:length(rand_cust_xy)
[C,CI] = min(sqrt((rand_cust_xy(i,1)-dd_xy(:,1)).^2+ ...
(rand_cust_xy(i,2)-dd_xy(:,2)).^2));
rand_dd(i,:) = dd(CI,:);
rand_de_dd(i) = C;
rand_dm_dd(i) = abs(rand_cust_xy(i,1)-dd_xy(CI,1))+ ...
abs(rand_cust_xy(i,2)-dd_xy(CI,2));
end
rand_dd_dist = haversine(mit(1),mit(2),rand_dd(:,1),rand_dd(:,2));
rand_dd_xy = [haversine(mit(1),mit(2),mit(1),rand_dd(:,2)).*...
sign(rand_dd(:,2)-mit(2))...
haversine(mit(1),mit(2),rand_dd(:,1),mit(2)).*...
sign(rand_dd(:,1)-mit(1))];
[rand_cust_ij(:,1) rand_cust_ij(:,2)] = ...
xy2ij(rand_cust_xy(:,1),rand_cust_xy(:,2),W_m,W_i);
[rand_dd_ij(:,1) rand_dd_ij(:,2)] = ...
xy2ij(rand_dd_xy(:,1),rand_dd_xy(:,2),W_m,W_i);
% for i=41:length(rand_cust)
% disp(['from: ' num2str(rand_cust(i,1)) ', ' num2str(rand_cust(i,2)) ...
% ' to: ' num2str(rand_dd(i,1)) ', ' num2str(rand_dd(i,2))])
% end
rand_dg_dd = [1.2; 0.7; 0.4; 0.5; 0.1; 1.8; 0.4; 0.3; 0.6; 1.7;
1.7; 0.1; 0.5; 0.2; 0.3; 0.4; 0.3; 0.5; 1.0; 0.7; .080; 0.3;
0.2; 0.3; 0.3; 0.6; 0.3; 0.2; 0.4; 1.0; 0.3; 0.5; 0.2; 0.5;
0.3; 0.9; 0.2; 0.3; 1.4; 0.7; 2.7; 1.8; 0.7; 1.7; 0.8; 0.4;
0.2; 0.3; 2.4; 0.8; 0.3; 0.2; 1.5; 0.2; 0.4; 0.4; 2.3; 0.8;
1.2; 0.2; 2.4; .083; 0.9; 8.2; 0.6; 0.8; 1.1; 3.5; 0.6; 1.0;
0.3; 0.8; 0.5; 1.7; 0.1; 0.8; 0.4; 0.6; .078; 1.2; ];
outliers = abs(rand_dg_dd-rand_de_dd)>3*min(rand_dg_dd,rand_de_dd);
R_de_dg_o = mean(rand_dg_dd./rand_de_dd);
E_de_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_de_dd)/...
sqrt(length(rand_dg_dd));
CI_de_dg_o = R_de_dg_o + [-E_de_dg_o E_de_dg_o];
R_de_dg_no = mean(rand_dg_dd(~outliers)./rand_de_dd(~outliers));
E_de_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
rand_de_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_de_dg_no = R_de_dg_no + [-E_de_dg_no E_de_dg_no];
R_dm_dg_o = mean(rand_dg_dd./rand_dm_dd);
E_dm_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_dm_dd)/...
sqrt(length(rand_dg_dd));
CI_dm_dg_o = R_dm_dg_o + [-E_dm_dg_o E_dm_dg_o];
R_dm_dg_no = mean(rand_dg_dd(~outliers)./rand_dm_dd(~outliers));
E_dm_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
rand_dm_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_dm_dg_no = R_dm_dg_no + [-E_dm_dg_no E_dm_dg_no];
R_de_dm_o = mean(rand_dm_dd./rand_de_dd);
E_de_dm_o = -norminv(0.05/2)*std(rand_dm_dd./rand_de_dd)/...
sqrt(length(rand_dm_dd));
CI_de_dm_o = R_de_dm_o + [-E_de_dm_o E_de_dm_o];
R_de_dm_no = mean(rand_dm_dd(~outliers)./rand_de_dd(~outliers));
E_de_dm_no = -norminv(0.05/2)*std(rand_dm_dd(~outliers)./...
rand_de_dd(~outliers))/sqrt(length(rand_dm_dd(~outliers)));
CI_de_dm_no = R_de_dm_no + [-E_de_dm_no E_de_dm_no];
ci_table = [
[CI_de_dg_no(1) R_de_dg_no CI_de_dg_no(2) ...
CI_de_dg_o(1) R_de_dg_o CI_de_dg_o(2)]
[CI_dm_dg_no(1) R_dm_dg_no CI_dm_dg_no(2) ...
CI_dm_dg_o(1) R_dm_dg_o CI_dm_dg_o(2)]
[CI_de_dm_no(1) R_de_dm_no CI_de_dm_no(2) ...
CI_de_dm_o(1) R_de_dm_o CI_de_dm_o(2)]
];
%% Plot Locations using Longitude and Latitude (GPS) Coordinates
figure(1)
plot(dd(:,2),dd(:,1),'.b',...
sb(:,2),sb(:,1),'.g',...
mbta(mbta_dist<5,2),mbta(mbta_dist<5,1),'*r')
axis equal
xlabel('Longitude (\circ)')
ylabel('Latitude (\circ)')
legend('Dunkin Donuts','Starbucks','MBTA Station')
%% Plot Locations using Cartesian Coordinates
figure(2)
plot(dd_xy(:,1),dd_xy(:,2),'.b',...
sb_xy(:,1),sb_xy(:,2),'.g',...
mbta_xy(mbta_dist<5,1),mbta_xy(mbta_dist<5,2),'*r',...
5*cos(linspace(0,2*pi(),100)),5*sin(linspace(0,2*pi(),100)),'-k')
xlabel('Distance (miles)')
ylabel('Distance (miles)')
axis equal
%% Plot Locations Overlaid Map Image
figure(3)
imshow(I)
hold on
plot(dd_ij(:,1),dd_ij(:,2),'.b',...
sb_ij(:,1),sb_ij(:,2),'.g',...
mbta_ij(mbta_dist<5,1),mbta_ij(mbta_dist<5,2),'*r',...
W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
axis image
%% Overlay Location Sector Sums on Location Map
figure(3)
hold on
for i=1:N_s+1
for j=1:N_s+1
plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
if i<=N_s && j<=N_s
text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
['\bf\color{blue}' num2str(number_dd(j,i)) ...
'\newline\bf\color{green}' num2str(number_sb(j,i)) ...
'\newline\bf\color{red}' num2str(number_mbta(j,i)) ],...
'HorizontalAlignment','center',...
'VerticalAlignment','middle')
end
end
end
hold off
%% Plot Population Data Overlaid on Map
figure(4)
imshow(I)
hold on
plot(pop_ij(:,1),pop_ij(:,2),'.magenta',...
W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
legend('100 People')
axis image
%% Overlay Population Sector Sums on Population Map
figure(4)
hold on
for i=1:N_s+1
for j=1:N_s+1
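% Sector grid lines and label position, as in the location overlay
plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
if i<=N_s && j<=N_s
text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
['\bf' num2str(number_pop(j,i))],...
'HorizontalAlignment','center')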
end
end
end
hold off
%% Display Sector Labels Overlaid on Map
figure(5)
imshow(I)
hold on
for i=1:N_s+1
for j=1:N_s+1
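% Sector grid lines and label position, as in the location overlay
plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
if i<=N_s && j<=N_s
text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
['\bf' num2str(10*(j-1)+i)],...
'HorizontalAlignment','center')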
end
end
end
hold off
axis off image
%% Overlay Neighborhood Colors on Sector Map
figure(5)
hold on
for sector=1:N_s^2
color = 'w';
if sum(cambridge==sector)>0 ...
|| sum(southeast==sector)>0
color='y';
elseif sum(northeast==sector)>0 ...
|| sum(northwest==sector)>0 ...
|| sum(back_bay==sector)>0
color='g';
elseif sum(downtown==sector)>0 ...
|| sum(southwest==sector)>0
color='r';
end
rectangle('Position',[S_ij(mod(sector-1,10)+1),...
S_ij(ceil((sector)/10)),...
S_ij(mod(sector-1,10)+2)-S_ij(mod(sector-1,10)+1),...
S_ij(ceil(sector/10)+1)-S_ij(ceil(sector/10))],'FaceColor',color);
end
hold off
%% Test Poisson Spatial Distribution - Dunkin Donuts
bins_dd = 0:1:4;
dd_exp = zeros(length(bins_dd),1);
dd_obs = zeros(length(bins_dd),1);
lambda_dd = sum(number_dd_s)/N_s^2;
for i=1:length(bins_dd)
% Bin i holds sectors with bins_dd(i) <= n < bins_dd(i+1); the last bin is
% open-ended. For integer counts, P(n < b) = poisscdf(b-1,lambda), so the
% expected and observed frequencies cover the same range of counts.
if i==1 && length(bins_dd)>1
dd_exp(i) = N_s^2*poisscdf(bins_dd(i+1)-1,lambda_dd);
dd_obs(i) = sum(number_dd_s<bins_dd(i+1));
elseif i==1 && length(bins_dd)==1
dd_exp(i) = N_s^2;
dd_obs(i) = N_s^2;
elseif i==length(bins_dd)
dd_exp(i) = N_s^2*(1-poisscdf(bins_dd(i)-1,lambda_dd));
dd_obs(i) = N_s^2 - sum(number_dd_s<bins_dd(i));
else
dd_exp(i) = N_s^2*(poisscdf(bins_dd(i+1)-1,lambda_dd) - ...
poisscdf(bins_dd(i)-1,lambda_dd));
dd_obs(i) = sum(number_dd_s<bins_dd(i+1)) - ...
sum(number_dd_s<bins_dd(i));
end
end
figure(6)
bar(bins_dd,[dd_exp dd_obs],'group')
title('Dunkin Donuts Spatial Distribution Model')
ylabel('Frequency')
xlabel('Bin Lower Bound')
legend('Expected (Poisson)','Observed')
chi2_dd = sum(sum((dd_obs-dd_exp).^2./dd_exp));
% Upper-tail chi-square test at the 5% level. Degrees of freedom: number of
% bins, minus one for the fixed total, minus one for the estimated lambda_dd.
p_dd = 1 - chi2cdf(chi2_dd,length(bins_dd)-2);
if chi2_dd > chi2inv(0.95,length(bins_dd)-2)
disp(['H0: Dunkin Donuts storefronts Poisson spatially distributed '...
'is rejected (p=' ...
num2str(p_dd) ').'])
else
disp(['H0: Dunkin Donuts storefronts Poisson spatially distributed '...
'cannot be rejected (p=' ...
num2str(p_dd) ').'])
end
%% Test Poisson Spatial Distribution - Starbucks

bins_sb = 0:1:3;
sb_exp = zeros(length(bins_sb),1);
sb_obs = zeros(length(bins_sb),1);
lambda_sb = sum(number_sb_s)/N_s^2;
for i=1:length(bins_sb)
% Bin i holds sectors with bins_sb(i) <= n < bins_sb(i+1); the last bin is
% open-ended. For integer counts, P(n < b) = poisscdf(b-1,lambda), so the
% expected and observed frequencies cover the same range of counts.
if i==1 && length(bins_sb)>1
sb_exp(i) = N_s^2*poisscdf(bins_sb(i+1)-1,lambda_sb);
sb_obs(i) = sum(number_sb_s<bins_sb(i+1));
elseif i==1 && length(bins_sb)==1
sb_exp(i) = N_s^2;
sb_obs(i) = N_s^2;
elseif i==length(bins_sb)
sb_exp(i) = N_s^2*(1-poisscdf(bins_sb(i)-1,lambda_sb));
sb_obs(i) = N_s^2 - sum(number_sb_s<bins_sb(i));
else
sb_exp(i) = N_s^2*(poisscdf(bins_sb(i+1)-1,lambda_sb) - ...
poisscdf(bins_sb(i)-1,lambda_sb));
sb_obs(i) = sum(number_sb_s<bins_sb(i+1)) - ...
sum(number_sb_s<bins_sb(i));
end
end
figure(7)
bar(bins_sb,[sb_exp sb_obs],'group')
title('Starbucks Spatial Distribution Model')
ylabel('Frequency')
xlabel('Bin Lower Bound')
legend('Expected (Poisson)','Observed')
chi2_sb = sum(sum((sb_obs-sb_exp).^2./sb_exp));
% Upper-tail chi-square test at the 5% level. Degrees of freedom: number of
% bins, minus one for the fixed total, minus one for the estimated lambda_sb.
p_sb = 1 - chi2cdf(chi2_sb,length(bins_sb)-2);
if chi2_sb > chi2inv(0.95,length(bins_sb)-2)
disp(['H0: Starbucks storefronts Poisson spatially distributed ' ...
'is rejected (p=' ...
num2str(p_sb) ').'])
else
disp(['H0: Starbucks storefronts Poisson spatially distributed ' ...
'cannot be rejected (p=' ...
num2str(p_sb) ').'])
end
%% "Heat Map" Visualizations
X = S_xy(1:end-1)+W_s/2;
Y = flipud(S_xy(1:end-1))+W_s/2;
figure(10)
colormap jet
contourf(X,Y,number_dd/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('Dunkin Donuts (per mi^2)')
figure(11)
contourf(X,Y,number_sb/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('Starbucks (per mi^2)')
figure(12)
contourf(X,Y,number_mbta/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('MBTA Stations (per mi^2)')
figure(13)
contourf(X,Y,number_pop/1000/W_s^2,120)
colorbar
shading flat
axis off equal
title('Population (thousands per mi^2)')
%% Plot Distance Metric Comparison Ratio
figure(14)
hold on
scatter(closest_de_dd,closest_dm_dd,'.k')
plot(linspace(0,2,100),linspace(0,2,100),'-k',...
linspace(0,2,100),4/pi().*linspace(0,2,100),'--r',...
linspace(0,2,100),R_de_dm.*linspace(0,2,100),'--b')
hold off
title('Closest Dunkin'' Donuts')
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','R = 1',...
'R = 4/\pi',['R = ' num2str(R_de_dm)]);
axis xy square
%% Plot Metric Comparison Ratios
figure(15)
hold on
scatter(rand_de_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_de_dg_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_de_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dg_o)...
' (w/ Outliers)'],['R=' num2str(R_de_dg_no) ' (w/o Outliers)'])
axis xy square
figure(16)
hold on
scatter(rand_dm_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_dm_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_dm_dg_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_dm_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Manhattan Distance (D_m, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_dm_dg_o)...
' (w/ Outliers)'],['R=' num2str(R_dm_dg_no) ' (w/o Outliers)'])
axis xy square
figure(17)
hold on
scatter(rand_de_dd(~outliers),rand_dm_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dm_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
linspace(0,6,100),R_de_dm_o*linspace(0,6,100),'--r',...
linspace(0,6,100),R_de_dm_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dm_o)...
' (w/ Outliers)'],['R=' num2str(R_de_dm_no) ' (w/o Outliers)'])
axis xy square
%% Display Customer-Storefront Pairs Overlaid on Map
figure(18)
imshow(I)
hold on
plot(rand_cust_ij(:,1),rand_cust_ij(:,2),'.m',...
rand_dd_ij(:,1),rand_dd_ij(:,2),'.b')
for i=1:length(rand_cust_ij)
plot([rand_cust_ij(i,1) rand_dd_ij(i,1)],...
[rand_cust_ij(i,2) rand_dd_ij(i,2)],'-k')
end
hold off
legend('Customer','Dunkin'' Donuts')
axis image
%%
figure(19)
hold on
for i=1:size(neighborhoods,1)
plot([1 2],[exp_de_dd(i) avg_de_dd_n(i)],'-b')
plot([1 2],[exp_de_sb(i) avg_de_sb_n(i)],'-g')
end
hold off
axis([0 3 0 1.5])
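The distance metric comparison plot in the script above draws a reference line at R = 4/pi. That value is the mean of |cos(theta)| + |sin(theta)| over a uniformly distributed travel direction theta, which is the expected ratio of Manhattan to Euclidean distance when trips have no preferred orientation relative to the street grid. The following is a minimal numerical check of that value; the variable names are illustrative and do not appear in the original script.

```matlab
% Numerical check of the 4/pi reference ratio (illustrative sketch).
% For a unit Euclidean step in direction theta, the Manhattan length
% is |cos(theta)| + |sin(theta)|.
theta = linspace(0, 2*pi, 100001);          % evenly spaced directions
theta = theta(1:end-1);                     % drop the duplicate endpoint 2*pi
ratio = abs(cos(theta)) + abs(sin(theta));  % D_m / D_e for each direction
R_uniform = mean(ratio);                    % average over all directions
fprintf('mean ratio = %.4f, 4/pi = %.4f\n', R_uniform, 4/pi);
```

A sample ratio above this value, as compared in the plot, would suggest that the sampled trips are on average less well aligned with the coordinate axes than a uniformly random direction.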
C:\Documents and Settings\Paul Grogan\My Documents\...\haversine.m
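function [dist] = haversine(lat1, long1, lat2, long2)
% HAVERSINE Great-circle distance (miles) between points given in degrees.
% Inputs may be scalars or arrays of matching size.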
radius = 3958.75587; % earth's radius (miles)
a = sin(deg2rad(lat2-lat1)/2).^2 + ...
cos(deg2rad(lat1)).*cos(deg2rad(lat2)).*sin(deg2rad(long2-long1)/2).^2;
c = 2*atan2(sqrt(a),sqrt(1-a));
dist = radius*c;
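As a quick sanity check of the function above, one degree of latitude along a meridian should come out to radius*pi/180, roughly 69.09 miles, and identical points should be zero miles apart. The sketch below restates the formula as anonymous functions so that it runs on its own; the names hav_a and hav and the test coordinates are illustrative assumptions, not part of the original file.

```matlab
% Stand-alone check of the haversine formula (illustrative sketch).
radius = 3958.75587;  % earth's radius (miles), same constant as haversine.m
% Haversine term a, written as an anonymous function so the block runs alone.
hav_a = @(lat1, long1, lat2, long2) sin(deg2rad(lat2-lat1)/2).^2 + ...
    cos(deg2rad(lat1)).*cos(deg2rad(lat2)).*sin(deg2rad(long2-long1)/2).^2;
% Central angle times radius gives the great-circle distance.
hav = @(lat1, long1, lat2, long2) 2*radius*atan2( ...
    sqrt(hav_a(lat1, long1, lat2, long2)), ...
    sqrt(1 - hav_a(lat1, long1, lat2, long2)));
d_lat  = hav(42, -71, 43, -71);   % one degree of latitude along a meridian
d_same = hav(42, -71, 42, -71);   % identical points
fprintf('one degree of latitude: %.2f miles\n', d_lat);
fprintf('identical points:       %.2f miles\n', d_same);
```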
C:\Documents and Settings\Paul Grogan\My Documents\Acad...\xy2ij.m
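function [i j] = xy2ij(x, y, W_xy, W_ij)
% XY2IJ Map offsets (x, y) from the map center to pixel indices (i, j),
% for a square map W_xy wide in map units and W_ij wide in pixels.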
i = round(W_ij/2+W_ij/W_xy*x);
j = round(W_ij/2-W_ij/W_xy*y);
C:\Documents and Settings\Paul Grogan\My Documents\Acad...\ij2xy.m
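function [x y] = ij2xy(i, j, W_ij, W_xy)
% IJ2XY Map pixel indices (i, j) to offsets (x, y) from the map center,
% for a square map W_ij wide in pixels and W_xy wide in map units.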
x = W_xy/W_ij*i-W_xy/2;
y = W_xy/2 - W_xy/W_ij*j;
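The two helpers above are inverses of each other up to the rounding in xy2ij: a point converted from map units to pixels and back should land within half a pixel, that is within W_xy/(2*W_ij) map units, of where it started. A minimal sketch of that round trip follows, using anonymous-function copies of the two mappings and hypothetical window sizes (a 10-mile map drawn on a 1000-pixel image); none of these values come from the original script.

```matlab
% Round-trip check of the mile <-> pixel mappings (illustrative sketch).
W_xy = 10;    % map width in miles (hypothetical value)
W_ij = 1000;  % image width in pixels (hypothetical value)
to_i = @(x) round(W_ij/2 + W_ij/W_xy*x);  % horizontal pixel index, as in xy2ij
to_j = @(y) round(W_ij/2 - W_ij/W_xy*y);  % vertical index; rows increase downward
to_x = @(i) W_xy/W_ij*i - W_xy/2;         % back to miles, as in ij2xy
to_y = @(j) W_xy/2 - W_xy/W_ij*j;
x = [-5; -1.234; 0; 2.5];   % offsets east of the map center (miles)
y = [ 5;  0.777; 0; -2.5];  % offsets north of the map center (miles)
err_x = abs(to_x(to_i(x)) - x);
err_y = abs(to_y(to_j(y)) - y);
max_err = max([err_x; err_y]);
half_pixel = W_xy/(2*W_ij);  % largest error that rounding can introduce
fprintf('max round-trip error: %.4f miles (half a pixel: %.4f)\n', ...
    max_err, half_pixel);
```

The sign flip in the vertical mapping reflects the image convention that row indices increase downward while the y offset increases to the north.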
myStarbucks App for iPhone and iPod Touch, http://www.starbucks.com/coffeehouse/mobile-apps/mystarbucks

GeoCoder tool provides search queries using Yahoo or Google: http://www.gpsvisualizer.com/geocoder/


Appendix A: Raw Data

Dunkin Donuts Locations

616 Massachusetts Ave, Cambridge, MA 02139

222 Broadway, Cambridge, MA 02142

1001 Cambridge St., Cambridge, MA 02141

1 Bow St, Cambridge, MA 02138

65 JFK St, Cambridge, MA 02138

1 Broadway, Cambridge, MA 02142

530 Commonwealth Ave, Boston, MA 02215

282 Somerville Ave, Somerville, MA 02143

219 Cambridge St, Allston, MA 02134

1008 Beacon St, Brookline, MA 02446

1020 Commonwealth Ave, Boston, MA 02215

14 McGrath Hwy, Somerville, MA 02143

Harvard Square Station, Cambridge, MA 02163

209 N Harvard St, Allston, MA 02134

519 Somerville Ave, Somerville, MA 02143

333 Newbury St, Boston, MA 02115

5 3rd St, Cambridge, MA 02141

1420 Boylston St, Boston, MA 02215

153 Massachusetts Ave, Boston, MA 02115

100 Cambridgeside Place, Cambridge, MA 02141

90 Washington St, Somerville, MA 02143

330 Brookline Ave, Boston, MA 02215

715 Boylston St, Boston, MA 02116

800 Boylston St, Boston, MA 02199

210 Harvard Ave, Allston, MA 02134

179 Brighton Ave, Allston, MA 02134

53 Huntington Ave, Boston, MA 02199

1316 Beacon St, Brookline, MA 02446

350 Longwood Ave, Boston, MA 02115