Professional Documents
Culture Documents
15 07 026 PDF
15 07 026 PDF
SFI
Working
Papers
contain
accounts
of
scienti5ic
work
of
the
author(s)
and
do
not
necessarily
represent
the
views
of
the
Santa
Fe
Institute.
We
accept
papers
intended
for
publication
in
peer-‐reviewed
journals
or
proceedings
volumes,
but
not
papers
that
have
already
appeared
in
print.
Except
for
papers
by
our
external
faculty,
papers
must
be
based
on
work
done
at
SFI,
inspired
by
an
invited
visit
to
or
collaboration
at
SFI,
or
funded
by
an
SFI
grant.
©NOTICE:
This
working
paper
is
included
by
permission
of
the
contributing
author(s)
as
a
means
to
ensure
timely
distribution
of
the
scholarly
and
technical
work
on
a
non-‐commercial
basis.
Copyright
and
all
rights
therein
are
maintained
by
the
author(s).
It
is
understood
that
all
persons
copying
this
information
will
adhere
to
the
terms
and
constraints
invoked
by
each
author's
copyright.
These
works
may
be
reposted
only
with
the
explicit
permission
of
the
copyright
holder.
www.santafe.edu
SANTA FE INSTITUTE
Linked Activity Spaces: Embedding Social Networks in Urban Space
Yaoli Wanga,b, Chaogui Kangc,d, Luis M. A. Bettencourtb, Yu Liuc and Clio Andrisb*
a. Dept. of Geography, 120K, Geog-‐Geol Building, University of Georgia, Athens, GA30602
b. Santa Fe Institute, 1399 Hyde Park Rd. Santa Fe NM 87501
c. Institute of Remote Sensing and Geographical Information Systems, Peking University, Beijing
100871, China
d. MIT Senseable City Laboratory, 9-‐209, 77 Massachusetts Ave. Cambridge MA 02139
*Corresponding author
Keywords: call data records, interpersonal relationships, mobile phones, pervasive computing,
geography, GIS, urban planning
Abstract
We examine the likelihood that a pair of sustained telephone contacts (e.g. friends, family,
professional contacts) use the city similarly. Using call data records from an undisclosed city in
China, we define a proxy for the daily activity spaces of each individual subscriber by
interpolating the points of geo-‐located cell towers he or she uses most frequently. We calculate
the overlap of pairs of linked activity spaces, (e.g. the pairs of space between two established
telephone contacts) and find that friends and second degree friends are more likely to overlap
than random pairs. We find that higher degree users and users with many network triangles
(connected groups of three nodes) tend to congregate in the central business district. We also
find that the downtown area hosts many heterogeneous modular communities of social groups
(derived from telephone calls), but that two distinct neighborhoods contain distinct social
clusters and act like a boundary around these friendships. We connect our findings with the role
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
I Introduction
In this chapter, we present a methodology that can help elucidate how groups of friends, family
and professional contacts use the city. We know that cities are comprised of two interacting
components, social networks and physical infrastructure, and that the social dynamics of
encounters in urban space form the backbone of city life (Bettencourt 2013). Yet our ability to
model social networks in urban spaces is very limited. This presents a problem because often our
behavior results from the influence of others (Salganik and Watts 2008). Similarly, our social ties
affect how we use the city: where we choose to meet, live and work.
It is paradoxical whether a citizen reaps peak benefits of the city (such as social support, quality
of life and economic opportunities) (Bettencourt et al 2007, Bettencourt 2013) by having his or
her contacts nearby or dispersed. At one extreme, dispersed contacts can expose the ego to new
neighborhoods and a variety of urban knowledge (such as finding the quickest post office, the
best doctor or an exciting new restaurant), due to their variety of experiences in diverse parts of
a city. Yet, it may be more difficult to meet with dispersed contacts, and having friends in
disparate parts of the city is likely to form a social network where one has few friends “in
common” with other friends, which can be a key strength of social networks.
At the other extreme, a socially-‐tight neighborhood forms trusted bonds through multiple
channels of social validation (Centola and Macy 2007) and through increased outdoor exposure
and neighborhood institutions such as local schools. Proximal social contacts can meet
conveniently, benefiting elderly and the mobility-‐challenged and, in some cases, poorer or
immigrant communities who likely rely on friends and family for help with amenities such as
2
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
child-‐care. Yet, in these enclosed neighborhoods, information and social capital from other parts
of the city may be less accessible (Granovetter 2005) resulting in missing or unsupportive social
Do citizens seem take advantage of the benefits, and bear the drawbacks, of spatially-‐dispersed
social contacts or spatially-‐clustered social contacts? In order to answer this question, we must
model the social network of egos within the built environment.
There is no way to extricate social life from the built environment, yet the methods available to
understand the dispersion or clustering of a set of individual’s social networks in geography are
limited, as social network and urban spatial models have matured in separate domains, and are
analysed in separate spheres (Andris 2011). Network methods are also rarely used by those who
study city form (Sevtsuk and Mekonnen 2012). Social networks represent influences and social
capital as graph configurations of nodes (agents) and links (e.g. edges) between nodes that
Systems (GIS)) models are represented in a contiguous topological plane, where adjacency and
proximity links entities, such as commercial buildings. These entities can be represented as
polygons or lines and unlike in the social network model, direct connections (e.g. links) are rarely
respectively. One example is the study of obesity, where social networks (Christakis and Fowler
3
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
2008) and city form (Papas et al 2007) are examined as causal factors. These mechanisms may
be factors separately, but it may be how social ties influence choices within the confines of the
built environment that best explain this health problem.. Similarly, research showing how
students use a college campus in space and time via WiFi usage makes strides in examining the
flexibility of meeting places due to mobile computing from an architectural standpoint, but could
be extended to assess social gatherings in time and place (Sevtsuk and Ratti 2005, Sevtsuk et al
2009). Concurrently, on the same college campus, Eagle et al (2009) show the temporal social
patterns of dyadic (pairwise) relationships in terms of calls, SMS messages and co-‐location. This
equally ground-‐breaking study alludes to the role of the campus in providing space for social
groups and pairs, but without a spatial analysis of where dyads meet, where they travel on the
This is not to say that datasets on interpersonal communication and movement have not been
embedded into geography; analyses of inter-‐place networks of social flows such as commodities,
(examples abound). Yet, these represent place-‐to-‐place aggregate flows instead of person-‐to-‐
person flows, and thus do not include the decisions of individuals. Small-‐scale examples of
Papachristos et al 2013), transportation (Frei and Axhausen 2011 and Arentze et al 2012) and
epidemiology (Emch et al 2008). We take these initiatives a step further by creating a general
method that can respond to the general nature of human fraternization in a built environment.
These studies can elucidate where and when (different types of) relationships form, and could be
4
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
used to advise architects, urban and transportation planners in creating places that support
social connectivity.
In working towards this goal, those looking to examine social/spatial problems are aided by the
recent proliferation of large datasets evidencing human social contact and movement (such as
GPS or cell tower usage records) in the city (Reades et al 2007). The integration of human
movement and activity data, such as information from GPS traces (Gao et al 2013), check-‐in data
(Cho et al 2011), online social networks (Scellato et al 2010) and photo-‐sharing sites (Crandall et
al 2011, Girardin et al 2008, Sun et al 2013), into urban models show how humans use the built
environment. Specifically, the use of mobile telephone calls to understand city usage patterns
are becoming a cornerstone of modern urban informatics, planning and transportation (Ratti et
al 2006). We take advantage of mobile telephone call data to test our research questions about
We leverage cell phone call data records (CDRs) to model friendships (i.e. interpersonal
relationships) as a social network, inferred by the frequency calls between of two agents, and
the set of locations (i.e. activity space) of each member of the social network within the city. A
pair of activity spaces of an ego and alter are called linked activity spaces (LAS) if the ego and
alter are friends (i.e. contacts) in the data set. The two activity spaces of friends are manifested
as two polygons in a Geographic Information System (GIS) context, and spatially analyzed for
similarity, via the number of ‘third places’ shared among the pair (Rosenbaum 2006). Moreover,
5
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
we analyze the social network as a whole to find whether high-‐degree egos (those with many
friends), triangles (groups of three agents) and communities use the city in significant ways.
We have four main hypotheses for the analysis of LAS. (I) We expect that friends’ activity spaces
will overlap more often than a random pair of activity spaces, indicating that friends use the city
more similarly than a random pair of people. (II) We also hypothesize that egos with high
degrees or high clustering coefficients (Jackson 2010) will use be inclined toward the city center,
as this denser environment tends to have more meeting places, diverse services, commercial
areas and nightlife. (III) In terms of city form and groups, we believe that central areas will play
an enhanced role in supporting “clique-‐like” and modular groups instead of being a mixing pot
for many groups. We expect the downtown area to host tight-‐knit social groups who do not
venture to the suburbs often. (IV) Finally, we expect that peripheral areas will be more mixed
since the automobile may be a more popular mode of transportation, and may not limit visits to
This chapter proceeds as follows. We first describe study area and the subsetting of the CDR
dataset. We then describe in the methods section how to delineate each user’s activity spaces
from these points. We then analyze how linked activity spaces (LAS) are spatially correlated in an
urban environment by shared points of interest (POIs). We conclude with a discussion of the
usefulness of this method, its drawbacks, potential applications and future work.
industrial city serves as a producer of wood pulp and newsprint and participates in the global
6
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
economy via a thriving international trade harbor. The city is nearly 18 by 10 kilometers and on
average citizens travel 1 kilometer in a day (Kang et al 2012). We focus on calls made within the
city area and exclude long-‐distance calls. We use a CDR (call data record) dataset of mobile cell
The original CDR dataset is comprised of nearly 424,000 users’ activity for 31 days. Users are
anonymized in the dataset. All users combined make an average of 1,600,000 calls daily. In the
31 day timespan, each user participates in an average of 328 calls for a total duration of 6.15
hours. Each record of a mobile phone call contains the start time, call duration and locations of
the caller and receiver. The locations are geo-‐referenced to one of 96 cell towers closest to the
mobile phone’s location (Table 1). The dataset does not include text messages (e.g. SMS).
Table 1: Call Data Record (CDR) variables with original data fields (top).A social
network and spatial data summary table are listed in the middle and bottom
rows, respectively.
Caller
ORIGINAL Receiver
Caller Receiver Location (x, Start time Duration
TABLE Location (x, y)
y)
We process the dataset into two parts, a social network of agents (social network in Table 1) and
the activity spaces of each agent (spatial patterns in Table 1). We filter the network by including
only those who use at least 3 cell towers during the study period in order to eliminate users who
7
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
may be confined to home and thus interact with the city differently than a typical mobile user.
Also, an individual may have multiple mobile phones, and a phone with fewer than three cell
towers used may represent a “secondary” or less-‐frequently-‐used device. In the social network,
the number of calls is determined between a unique pair of users and duration is the sum of call
time between the two users. We represent the network as undirected in order to reflect each
words, records showing that A calls B, or B calls A, are summed to represent a connection
between unique, undirected pair A, B. Each pair must have at least 10 mutual phone calls and 10
minutes of total call duration in the given month in order to eliminate non-‐friend calls such as
The spatial patterns table inventories the locations of each user to geo-‐locate a pair of callers in
the social network. The coordinates of the cell phone tower where a user places or receives a
call are summed and weighted by the number of calls the user places or receives at that cell
tower location. We use the resulting set of weighted locations to represent the user’s geographic
activity pattern (such as Carrasco et al 2006), which are known to capture anchor points
(Golledge 1999) such as the home and workplace (or school), as they are the most visited
locations for the average traveler and thus, frequent calling points (Schönfelder and Axhausen
2003).
Because the universal CDR dataset is very large, we draw a sample from the original dataset by
8
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
selecting a random sample of 150 “seed” users, and query for their contacts (first degree ties),
their second degree and third degree ties. The 150 seed users are connected to 248 first degree,
1,674 second degree and 6,159 third degree ties, totaling 8,231 individual users, and 47,297
friendship links between the users. On average, each user has 44 network friends (distribution in
Figure 1) and participates in a total of 328 calls for 6.15 hours during the study period.
Figure 1: The degree distribution of the universal network is right-‐skewed with mean 44.
Since the network is configured in a step-‐wise process, where 150 “seed” users are randomly
selected, the remainder of the network grows from the relationships of the 150 users. This
configuration yields a network that is focused on the social interactions of a small sample of
users. As a result, the social network does not fully represent a typical “universal” sample of a
9
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
social network in degree distribution (such as Albert and Barabási 2002), traversability or density
(Newman and Park 2003). The network has 8,197 nodes and 47,297 links. The degree values for
the core network range from 1 – 344, with a mean degree of 43.8. The average weighted degree
(by the duration of calls in seconds) is 22150, where the degree is weighted by number of
seconds of phone call activity. The diameter of the network is 12. Clustering coefficient values
range from 0 – 1, with a mean coefficient of 0.11.The network is visualized in Figure 2.
10
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 2: The core network is visualized with a force-‐directed method that places nodes in
feature space based on their density of linkages in Gephi (Bastian et al 2009). Larger and more
purple nodes denote higher degree (max degree= 287) and small orange nodes indicate smaller
11
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Spatial Distribution There are 96 cell phone towers in our dataset, most of which are clustered in the
downtown area. The spatial granularity of CDR datasets reflects the distributions of cell towers, though
these are scattered unevenly throughout the city, as we see in Figures 3A and 3B. The towers’ actual
locations are randomly offset for security and proprietary reasons. The caller’s actual location is
estimated in a larger radius around the tower so the tower’s slightly offset location does not significantly
add to the dataset’s natural spatial noise. It is also possible that a caller’s call is routed to a cell tower that
is not the closest to him or her, as the closest cell tower may be saturated with calls or out of service, as
indicated by the heterogeneity of usage statistics in the downtown (Figures 3A, 3B). This inconsistency is
not expected to detrimentally affect the data analysis as many of the heterogeneous towers are in the
downtown, where reliable nearby towers can help delineate a user’s activity space. It also becomes clear
that some towers are used by many subscribers, while others are used by few (Figure 3C).
12
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 3A: A map of calls transferred (e.g. routed) through cell towers show that some towers
Figure 3B: A map of cell towers by user popularity illustrates that some cell towers are used by
50% of all users, where others are used by less than 1% of the population. Given the clustered
spatial distribution of towers that vary in usage, many callers could be nearest low usage towers,
but these towers are administratively less adroit to host a call, and their call is routed instead to
13
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 3C: A rank-‐size distribution shows each tower’s popularity with users. Each of the 10 most
popular cell phone towers (x axis) is used by at least 20% of the population whereas the 40th-‐91st
most popular towers each is used by less than 10% of the population.
III Method
Creating Activity Spaces To represent how the user moves in the city (e.g. Axhausen et al
2002, Axhausen 2007) we build activity spaces that likely encompass a user’s home, work and
“third places” (Ahas et al 2009, Schneider et al 2013). We choose a polygon method in order to
represent the area surrounding the cell towers where the user is likely to be found, since he or
14
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
These activity spaces summarize a user’s set of frequently-‐visited points (e.g. cell towers) in an
ellipse-‐shaped polygon that encapsulates 68% (i.e. one standard deviation) of points visited by
capturing points that are concentrated in the center and neglecting sparse points in the
periphery (as in Carrasco et al 2006). Ellipses are first centered on the mean geometric center of
a user’s tower locations (mean of x coordinates and y coordinates; repeating values are allowed
if a user visits towers more than once). The value of the standard deviation is calculated for all x
coordinates to obtain an axis, and y coordinates to obtain a second, perpendicular axis. The
ellipse is tilted in a direction that captures the major axis (long edge) of the distribution (see
Mitchell 2005).
As mentioned, the ellipse does not typically encompass all visited towers and excludes those not
frequently visited, encapsulating daily activity space (Figure 4A). Instead, it captures the essence
of the central tendency, dispersion and direction of the user’s travel patterns without including
anomaly cell tower usage (such as a traveler’s phone call from the airport).
Assessing Similarly of Activity Spaces After each individual is assigned an activity space,
we quantify the similarity between an ego’s and an alter’s activity spaces. A method for finding
whether two activity spaces are similar is not straightforward. The percent overlap between two
activity spaces will not account for how much physical area two friends’ spaces share.
Additionally, using the area that two activity spaces share does not tell us how big their spaces
are, i.e. whether this shared area is actually “convenient” relative to their whole activity spaces.
Also, with these two methods, we will not be able to understand what amenities and places for
meeting each individual has in his or her activity space. We overcome these drawbacks by
15
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
creating a third layer of relevant points, as suggested by geometric probability theory (Santalo
2004).
Agents’ activity spaces are qualified by the Points of Interest (POIs) they spatially intersect
(Figure 3C). These POIs are where ‘optional’ activities are likely to occur (Gehl 1987) and include
services, transport and recreational areas as a proxy for how agents use the city. POIs are good
landmarks for social interactions, as third places and the activities performed in third places,
have been shown to be essential for relationships, social health and quality of life (Rosenbaum
2006). Friends may visit POIs to dine or to do business.
POIs are selected and digitized from Google Maps (Google 2013) with data provided by Auto-‐
Navi (2011) and analyzed in the ESRI ArcMap 10.1 environment (ESRI 2012). POIs include the
city’s recreation spots (including parks, internet cafés, personal wellness centers), commercial
centers (restaurants and bars, stores, markets and shopping centers), public services and
institutions (hospitals, post offices, police stations), transportation centers (airports, train
stations), and named villages in the exurban area. POIs of similar types (e.g. restaurants) in a 50
meter radius are grouped into a single POI to eliminate redundant information.
As mentioned, two activity spaces are considered linked if they are connected via the social
network. These friendships are embedded in geographic space through the two activity spaces.
Each unique pair of linked activity spaces is compared by the number common of POIs shared by
the two activity spaces (in absolute number and percent of the user’s total POIs). In Figure 4C,
two pairs of linked activity space ellipses and share few POIs (red pair) or many POIs (yellow
16
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
pair). We use a statistical t-‐test to compare friends’ typical number of shared POIs versus that of
a random pair of users. Our hypothesis is that friends share more POI points than random pairs.
Figure 4A (Top Left): An individual activity space is denoted as an ellipse and captures a user’s
most frequently-‐used towers. Figure 4B (Upper Right): Two single activity space ellipses intersect
underlying Points of Interest (POIs). The green ellipse intersects more POIs than the pink ellipse.
Figure 4C (Lower Left): Two pairs of linked activity spaces, in red and yellow, show that the
yellow pair intersects more common POIs. Figure 4D (Lower Right): The activity space of a focal
17
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
user (in pink) is overlaid alongside the activity spaces of her frequently-‐contacted friends (black).
The pattern shows that the focal user and friends use the downtown area.
IV Results
In this section, we respond to our initial hypotheses regarding how dyads (pairs), social network
personas, triads (groups of three), and communities use the city. First, we find that friends tend
to use the city more frequently than a random pair. Second, we also find that egos with high
degrees are inclined toward the city center, while ego’s with high clustering coefficients show no
significant spatial correlation. Third, counter to our hypothesis, central areas are found to be a
mixing pot for many groups and clusters, many of whom frequent the inner city and the suburbs.
We do, however, find two specific neighborhoods that tend to harbor enclosed (‘clique-‐like’)
social groups. Finally, also counter to our initial hypothesis, peripheral areas show less frequent
mixing of social groups and friends than any other part of the city.
Dyadic Relationships
We find, as hypothesized, that friends’ activity spaces will overlap more often than a random pair
of activity spaces, indicating that friends use the city more similarly than a random pair of
people.
The percentages of friends and of random pairs found to share no POIs are 0.50% and 0.11%,
respectively. Of pairs who share POIs, friend pairs share an average of 55.8 POIs and a random
pair shares 45.77 out of 212 total possible POIs (Figure 5A). A t-‐test using the ellipse activity-‐
18
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
spacespace method yields a t-‐value of 56.03, (degree of freedom (df)) = 44,706, p-‐value < 0.001),
and allows us to reject the null hypothesis that the mean difference between these two groups is
insignificant, indicating that friends share more POIs than random pairs.
Second degree friends also utilize urban infrastructure significantly more similarly, in terms of
shared POIs, than random users (Figure 5B). The t-‐value is 94.82 (df = 152,070, p-‐value < 0.001).
The count of POIs shared by second degree friends on average is 55.23.
Figure 5A (Left): The distribution of frequency of shared POIs in the linked activity spaces of a
random pair (in blue) and of a pair of friends (in red) depicts a similar rate or frequency of shared
POIs, though direct friends have more shared POIs on average. Friends also have higher
frequencies of sharing POIs (depicted by red bars above blue bars). We exclude pairs (random or
linked) who share no common POIs. Figure 5B: The distribution of second-‐degree linked activity
spaces shows that second degree friends share more POI’s on average than random pairs.
Additionally, it is rare that an ego’s alters will visit a POI that the ego herself does not visit. The
average proportion of total egos who have visited a specific POI (e.g. “Flower Park”) is
19
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
proportional to the average percentage of their friends who have also visited the POI. For
example, consider a group of 100 egos, each of whom has 50 alters (summing to a total of 5,000
friends). If an average of 50% of alters (25 alters) per ego have visited a certain POI, then about
50% of egos (50 egos) will have visited there as well. If the average at another park is 20% of all
alters (10 alters), then about 20% of egos (20 egos) will have visited this park as well. The
proportion of egos who have visited a POI is 1.036 times the number of total alters who have
visited the POI, with an r2 correlation of 0.9684. This means there are few, if any, POIs where
one frequents and her friends do not frequent. Conversely, there are also few POIs where one
does not frequent yet a high percentage of her friends visit often.
Social Personas
In traditional social network analysis, the role that a user plays in the network is used to define
her importance and prominence in various facets of social life, such as information about new
job opportunities. For instance, a central figure in a social network (i.e. a figure with many
through social network metrics such as betweenness centrality, degree centrality, or brokerage
statistics (Jackson 2010). It has been shown that those with higher network centrality live in
more central places on the Euclidean grid of longitude and latitude for the network (Onnela et al
2011).
Confirming our second hypothesis, we find that high degree users use the city center more often
than expected, given a random set of users. The top 1% high degree users (each with 150 or
more friends) concentrate at the urban center (Figure 6). A Fisher-‐Snedecor test (F-‐test) of
20
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
ANOVA yields a p value of 0.006 (95% confidence interval) signifying that the spatial variance
between the high-‐degree users’ activity spaces is significantly lower than the spatial variance
between activity spaces of the universal population. This result illustrates high degree agents’
proclivity towards high-‐density areas that are shown to be more innovative, dynamic and
Figure 6. Spatial distribution of activity spaces in the city shows that high degree users generally
We do not find significant spatial patterning with ego characteristics such as clustering
coefficient (Jackson 2010). Users with high centrality measures, such as betweenness centrality,
21
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
congregate in the central east side of the city. Because we initially sampled 150 seed users, the
centrality measures may privilege these egos, and would not fully characterize users.
Additionally, the relationship between degree and total call duration (Table 2) is minimal,
indicating that having more contacts does not lead to more time talking. Generally, users with
more contacts make shorter calls, while those with fewer friends speak for longer. We also find
no correlation between degree or phone activity and area of activity space.
We hypothesized that central areas play an enhanced role in supporting social communities of
friends. The results do not confirm this hypothesis; the central area of the city does not favor
We define “groups” of friends (i.e. communities) in the Gephi environment with Blondel et al’s
(2008) modularity algorithm. This process produces 58 clusters with a modularity quotient of
0.732, indicating that, on average, those assigned to a cluster call within the same cluster 73.2%
of the time.
Each network agent (node) is assigned one modularity group. These agents are denoted by their
ellipse centroid (geographic centers) (Figure 7A). Agents denoted by yellow or teal (Figure 7A)
22
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
are examples from two social network clusters with significant spatial clusters that differ from
the overall distribution, deviating westward and north-‐eastward, respectively. The yellow and
teal clusters are examples of spatially-‐embedded social groups that are significantly clustered in
a way that deviates from the expected spatial distribution of agents the city.
Including these two examples, the spatial distribution of 58 social clusters, in total, do form
statistically significant “hot spots” (significantly dense clusters) and “cold spots”: a mixture of
modular groups (Getis-‐Ord Gi* statistic (Getis and Ord 1992), red and blue areas have P values
>= .0001). Hot spots (dark red areas in Figure 7A) contain agents of the same modular group in
two major regions. Cold spots (blue areas in Figure 7A) cover the downtown, signifying that the
groups that frequent the downtown are not clustered in the downtown, but have other group
23
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 7A. A modularity algorithm is applied to the “aspatial” social network. Two clusters are
now mapped in yellow and teal to show how social network groups use the city. Tight, insular,
social groups cluster in some parts of the city (in dark red), and show no clear pattern in others
(yellow, light red). In the downtown core, covered in blue, clusters are significantly mixed,
meaning that many social groups use the downtown but are not confined to the region.
Another prominent pattern of community configuration, at a more local scale, is the prevalence
of social triangles in the network. A social triangle can be defined as a group three nodes who
connect to one another (Latapy 2008), and pragmatically, will have meeting needs that are
different and more complex than those of a dyad but perhaps not as complex as a modular
24
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
group, which can contain many nodes. In our dataset, agents with the most social triangles
cluster in the downtown area. However, this may be an artifact of the high-‐degree users
downtown, as they are likely to have more social triangles.
The distribution of high-‐triangle nodes (denoted in green and yellow in Figure 7B) follows a
series of parallel roads downtown. Those with the highest number of social triangles (green
activity space centroids) form a tight linear cluster in the core area. This pattern is statistically
significant, showing high Getis-‐Ord GI* Z scores with P values > .0001 in the area slightly east of
the downtown (in red, Figure 7B). P values are not significant in other parts of the city.
Interestingly, centroids covering this neighborhood also saw the most significant modularity
25
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 7B. A clustering of agents who are a part of many social triangles (i.e. groups of three
friends) congregate towards the urban core (green and yellow triangles). This clustering is
statistically significant in one area east of downtown (denoted by red), via statistical Gi* Z scores.
We also hypothesized that peripheral areas may be more mixed. We reject this hypothesis as the
peripheral areas seem to host more cohesive communities than the downtown area. The POIs
shared by linked activity spaces are more frequently on the periphery of the city. A POI can have
from .01%-‐2.8% of all its pairs shown to be friends. Higher values represent places where linked
activity spaces frequent, and lower values, where random pairs frequent. The POI hosting 2.8%
linked activity spaces is located near many popular hotels and the city’s largest park—a notable
26
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
tourist site in the city. This site might be a popular meeting spot for friends, but it may also be
the artifact of many local business calls to one another in nearby buildings.
More generally, this ratio increases further from the city center (Figures 8A and 8B), so that POIs
on the outskirts of the city are more likely to host pairs of contacts. This may be because many
people have activity spaces that stretch into the downtown for work, but have their social
contacts closer to their residences in another part of the city. Though agents with higher degree
tend to frequent the dense downtown, given a POI with 100 random people, one is more likely
to friend the suburban areas. In other words, although the downtown core attracts more people,
the majority seem to be mutual strangers, while in the suburban area, POIs serve as intentional
meeting points. We interpret this finding with care, as there are fewer POI points on the
periphery of the city, thus reducing the granularity and precision to capture activity spaces found
in the city’s outskirts. For instance, an activity space in the shape of a narrow line can be
captured via the dense, granular points in downtown, but such a detailed structure could not be
found in the periphery, since there are so few cell towers and POIs to delineate a more complex
polygon.
27
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 8A: As the POIs are located farther away from the city center, there is a higher ratio of
28
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Figure 8B: The ratio of POI usage for friends vs. a random pairs shows that friends are more likely
to use the same POIs if they live on the periphery of the city. These POIs are denoted in red,
where up to 2.8% of their usage is linked with friendship.
In summary, friends and even second-‐degree friends tend to use common points of interest in
the city. Additionally, high degree users (i.e. those with many friends) tend to congregate
towards the downtown (central business district), but that those with many social triangle
friendships center in a neighborhood east of the central business district. The downtown core
hosts many heterogeneous social groups instead of small social clusters.
Discussion
In summary, we leverage social-‐spatial data from call data records in a new way that emphasizes
social relationships embedded in urban physical space. We find that the density (clustering) of
social ties deviates from the density users in the city.
It is clear that we are only at the beginning of understanding how interpersonal relationships
manifest themselves in the built environment. Yet it is a phenomena we experience daily as we
meet colleagues at work, family at home, and perhaps, friends in third places. These dueling
variables also guide our decisions about raising families through the choice of neighborhoods
and school districts, as well as migrating through the choice of leaving established social circles
We know that a healthy city is built on strong networks (Gilchrist 2009), but we still do not know
what kinds of ties exist in cities and neighborhoods, and we cannot see these networks in detail.
We cannot currently use type of social network configuration (clustered, decentralized, hub-‐
spoke, etc.) as a cause or effect variable in in assessing planning choices for new or existing
neighborhoods and cities. Yet, they certainly are a cause and effect variable.
Our ability to socialize with others is affected by urban planning and government decisions
regarding low-‐income housing, immigration reform, and health codes, such as number of people
to an urban residence as well as building subdivisions versus condominiums (Farber and Li 2013)
and narrow versus wide roads (Montgomery 2013). The spatial and social clustering of ties change
with the creation and dissolution of institutions: such as firms, universities, military bases, sports
franchises and religious institutions. Less socialization may also lead to stagnated mobility, not
30
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
due to lack of accessibility, but to a reduced need for third places to socialize (Rosenbaum 2006),
Urban planners, geographers, government officials, civil engineers and transportation planners
that focus on improving social life in their city can improve residents’ quality of life (Cacioppo
and Patrick 2008), more so than traditional economic stimuli, such as an increase in income
(Montgomery 2013). Instead, planners and geographers have worked towards better urban
environments investigating residents’ accessibility to amenities (such as hospitals), travel time to
work, and social justice issues such as susceptibility to industrial and environmental hazards
(Cutter et al 2003). In addition to these variables, we should continue to gage social capital
We cannot infer these social patterns from city form, or social networks alone. There must be a
combination, as currently there is no one-‐to-‐one ratio that links a certain type of urban fabric to
a certain type of social network. Thus, our task with this chapter is to illustrate the linked activity
space method that allows for the integration of information from a social network into
geographic space, a combination that is rarely perused (Andris 2011). Although we use a call
dataset record (CDR) for our analysis, this method can be employed to any dataset that has both
evidence of social ties between agents and the geo-‐location of the agents.
We do find a number of methodological and pragmatic challenges to this type of research. Many
of these challenges stem from the nascent state of big data analysis that will perhaps become
more fluid in the future, with better collection methods. Nevertheless, there are issues with
these data that can be addressed: CDR datasets do not capture an ego with alters who do not
31
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
appear in the social network. Multiple cell phones per person and multiple people per cell
phones do not ensure that the telephone number is a proxy for an individual’s communication
patterns. Without figures on the provider’s market penetration rate is difficult to understand
friendships via calls to users who use a different provider. We also note a number of subjective
decisions in creating a meaningful sample, such as the minimum number of towers frequented in
order to be included in the dataset, the number of seed users, and number of friend “levels” to
draw from the networks. Future endeavors include accounting for evening, holiday and weekend
dynamics, where social life may be different than a traditional workday, and the use of this
method in larger and smaller cities. We hope to see more research on the integration of social
Acknowledgements
Thank you to Joe Hand for discussions and revisions. This research was partially supported by the
Army Research Office (grant no. W911NF-‐121-‐0097), John Templeton Foundation (grant no.
15705), the Bill and Melinda Gates Foundation (grant no. OPP1076282), the Rockefeller
Foundation, the James S. McDonnell Foundation (grant no. 220020195), the National Science
Foundation (grant no. 103522) a gift from the Bryan J. and June B. Zwan Foundation and the
32
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
Geographic Information System(s) (GIS)
References
1. Ahas, R., Silm, S., Saluveer, E. and Järv, O. (2009) Modelling home and work locations of
populations using passive mobile positioning data. Location Based Services and
TeleCartography II Lecture Notes in Geoinformation and Cartography, Section III Ch. 18.
2. Albert, R., and Barabási, A. L. (2002) Statistical mechanics of complex networks. Reviews
3. Andris, C. (2011) Methods and metrics for social distance. Doctoral Dissertation, MIT,
4. Arentze,T., van den Berg, P. and Timmermans, H. (2012) Modeling social networks in
geographic space: approach and empirical application" Environment and Planning A,
44(5), 1101–1120.
5. Axhausen, K. W., Zimmermann, A., Schönfelder, S., Rindsfüser, G. and Haupt, T. (2002)
Observing the rhythms of daily life: A six-‐week travel diary. Transportation, 29(2), 95-‐124.
6. Axhausen, K. W. (2007) Activity spaces, biographies, social networks and their welfare
gains and externalities: Some hypotheses and empirical results. Mobilities, 2(1), 15-‐36.
7. Bastian, M., Heymann, S. and Jacomy, M. (2009) Gephi: an open source software for
exploring and manipulating networks. In: Proceedings of the International conference on
weblogs and social media (ICWSM) 17–20 May, San Jose, California.
33
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
8. Bettencourt, L. M. A., Lobo, J., Helbing, D., Kühnert, C. and West, G. B. (2007) Growth,
innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of
9. Bettencourt, L. M. A. (2013) The Origins of Scaling in Cities. Science, 340(6139), 1438–
1441.
10. Blondel, V., Guillaume, J-‐L. Lambiotte, R. and Lefebvre, E. (2008) Fast unfolding of
communities in large networks. Journal of Statistical Mechanics: Theory and Experiment
(10), 1000.
11. Cacioppo, J. T. and Patrick, W. (2008) Loneliness: Human nature and the need for social
12. Carrasco, J.A., Miller, E. and Wellman, B. (2006) Spatial and Social Networks: The case of
travel for social activities. 11th International Conference on Travel Behavior Research. 16-‐
13. Centola, D. and Macy, M. (2007) Complex contagions and the weakness of long ties1.
14. Christakis, N. and Fowler, J. (2007) The spread of obesity in a large social network over 32
15. Cho, E. Myers, S. A. and Leskovec. J. (2011) Friendship and mobility: User movement in
location-‐based social networks. In Proceedings of the 17th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD). 21 – 24 August, San Diego,
California. 1082-‐1090.
34
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
16. Crandall, D. J., Backstrom, L., Cosley, D., Suri, S., Huttenlocher, D. and Kleinberg, J. (2010)
Inferring social ties from geographic coincidences. Proceedings of the National Academy
17. Cutter, S. L., Boruff, B. J., & Shirley, W. L. (2003). Social vulnerability to environmental
18. De Montis, A,, Chessa, A., Campagna, M., Caschili, S. and Deplano, G. (2010) Modeling
commuting systems through a complex network analysis: A study of the Italian islands of
Sardinia and Sicily. The Journal of Transport and Land Use, 2(3), 39-‐55.
19. Eagle, N., Pentland, A. S., and Lazer, D. (2009) Inferring friendship network structure by
using mobile phone data. Proceedings of the National Academy of Sciences, 106(36),
15274-‐15278.
20. Emch, M., Root, E. D., Giebultowicz, S., Ali, M., Perez-‐Heydrich, C. and Yunus, M. (2012)
Integration of spatial and social network analysis in disease transmission studies. Annals
21. Farber, S. and Li, X. (2013) Urban sprawl and social interaction potential: an empirical
analysis of large metropolitan regions in the United States. Journal of Transport
22. Frei, A. and Axhausen, K.W. (2011) Modeling spatial embedded social networks, Working
paper 685: Transport and Spatial Planning, IVT, ETH Zurich, Zurich.
23. Gao, S., Wang, Y., Gao, Y. and Liu, Y. (2013) Understanding urban traffic-‐flow
35
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
24. Gehl, J. (1987) Life between buildings: Using public space. New York: Van Nostrand
25. Getis, A. and Ord, J.K. (1992) The analysis of spatial association by use of distance
26. Gilchrist, A. (2009) The well-‐connected community: A networking approach to
27. Girardin, F., Calabrese, F., Fiore, F. D., Ratti, C., and Blat, J. (2008) Digital footprinting:
Uncovering tourists with user-‐generated content. IEEE Pervasive Computing, 7(4), 36-‐43.
28. Golledge, R. G. (1999) Human wayfinding and cognitive maps. In: Wayfinding behavior:
Cognitive mapping and other spatial processes, Ed. R. Golledge. Baltimore: Johns Hopkins
29. Granovetter, M. (1973) The strength of weak ties. American Journal of Sociology,
78(6),1360-‐1380.
30. Granovetter, M. (2005) The impact of social structure on economic outcomes. The
31. Jackson, M. O. (2010) Social and economic networks. Princeton University Press.
32. Kang, C., Ma, X., Tong, D., and Liu, Y. (2012) Intra-‐urban human mobility patterns: An
urban morphology perspective. Physica A: Statistical Mechanics and its Applications,
391(4), 1702–1717.
33. Latapy, M. (2008) Main-‐memory triangle computations for very large (sparse (power-‐law))
36
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
34. Mitchell, A. (2005) The ESRI guide to GIS analysis, Volume 2 . Redlands, California: ESRI
Press.
35. Montgomery, C. (2013) Happy city: Transforming our lives through urban design. New
36. Newman, M. E., and Park, J. (2003) Why social networks are different from other types of
37. Onnela, J. P., Arbesman, S., González, M., Barabási, A. L., and Christakis, N. (2011)
Geographic constraints on social network groups. PLoS one, 6(4), e16939.
38. Papas, M. A., Alberg, A. J., Ewing, R., Helzlsouer, K. J., Gary, T. L. and Klassen, A. C. (2007)
39. Radil, S.M. Flint, C. and Tita, G. (2010) Spatializing social networks: Using social network
analysis to investigate geographies of gang rivalry, territoriality, and violence in Los
Angeles. Annals of the American Association of Geographers, 100(2), 307-‐326.
40. Papachristos, A., Hureau, D. and Braga, A. (2013) The corner and the crew: The influence
of geography and social networks on gang violence. American Sociological Review 78(3)
417–447.
41. Ratti, C., Williams, S., Frenchman, D. and Pulselli, R. M. (2006) Mobile landscapes: using
location data from cell phones for urban analysis. Environment and Planning B: Planning
42. Reades, J., Calabrese, F., Sevtsuk, A. and Ratti, C. (2007) Cellular census: Explorations in
37
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
43. Rosenbaum, M. S. (2006) Exploring the social supportive role of third places in
44. Salganik, M. J. and Watts, D. J. (2008) Leading the herd astray: An experimental study of
self-‐fulfilling prophecies in an artificial cultural market. Social Psychology Quarterly, 71(4),
338-‐355.
45. Santaló, L. A. (2004) Integral geometry and geometric probability. Cambridge: Cambridge
University Press.
46. Scellato, S., Noulas, A., Lambiotte, R. and Mascolo, C. (2011) Socio-‐spatial properties of
online location-‐based social networks. In Proceedings of the International Conference on
47. Schneider, C. M., Belik, V., Couronné, T., Smoreda, Z. and González, M. C. (2013)
Unravelling daily human mobility motifs. Journal of the Royal Society Interface, 10(84),
20130246.
48. Schönfelder, S. and Axhausen, K. W. (2003) Activity spaces: Measures of social exclusion?
Activity spaces: Measures of social exclusion? Arbeitsbericht Verkehrs-‐ und Raumplanung,
140, Institut für Verkehrsplanung und Transportsysteme (IVT), ETH Zürich, Zürich.
49. Sevtsuk, A. and Mekonnen, M. (2012) Urban network analysis: A new toolbox for ArcGIS.
50. Sevtsuk, A., Huang, S., Calabrese, F., and Ratti, C. (2009). Mapping the MIT campus in real
time using WiFi. In: Handbook of Research on Urban Informatics: The Practice and
Promise of the Real-‐Time City. Ed. Marcus Foth Hershey, PA: IGI Global. 326-‐337.
38
Book chapter proposal submission: Computational Approaches for Urban Environments, 2013
51. Sevtsuk, A. and Ratti, C. (2005) iSpots. How Wireless Technology Is Changing Life on the
MIT Campus. Proceedings of CUPUM 05: The Ninth International Conference on
Computers in Urban Planning and Urban Management, 29 June -‐ 1 July, London.
52. Sun, Y., Fan, H., Helbich, M. and Zipf, A. (2013) Analyzing human activities through
volunteered geographic information: Using Flickr to analyze spatial and temporal pattern
Heidelberg, 57-‐69.
39