
MOBILE AND UBIQUITOUS SYSTEMS

www.computer.org/pervasive

Cellular Census: Explorations in Urban Data Collection
Jonathan Reades, Francesco Calabrese, Andres Sevtsuk, and Carlo Ratti

Vol. 6, No. 3
July–September 2007

This material is presented to ensure timely dissemination of scholarly and technical
work. Copyright and all rights therein are retained by authors or by other copyright
holders. All persons copying this information are expected to adhere to the terms
and constraints invoked by each author's copyright. In most cases, these works
may not be reposted without the explicit permission of the copyright holder.

© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

For more information, please see www.ieee.org/web/publications/rights/index.html.


URBAN COMPUTING

Cellular Census: Explorations in Urban Data Collection
Analysis of cell phone use can provide an important new way of looking
at the city as a holistic, dynamic system.

Jonathan Reades, University College London
Francesco Calabrese, Andres Sevtsuk, and Carlo Ratti, Massachusetts Institute of Technology

Much of our understanding of urban systems comes from traditional data collection methods such as surveys by person or phone. These approaches can provide detailed information about urban behaviors, but they’re hard to update and might limit results to “snapshots in time.”

In the past few years, some innovative approaches have sought to use mobile devices to collect spatiotemporal data (see the sidebar, “Urban Analysis Using Mobile-Device Data”). But little research has been done to develop and analyze the much larger samples of existing data generated daily by mobile networks.

The most common explanation for this is that the challenge of data-sharing with the telecommunications industry has hampered data access. However, in early 2006, a collaboration between Telecom Italia (TI), which serves 40 percent of the Roman market, and MIT’s SENSEable City Laboratory (http://senseable.mit.edu) allowed unprecedented access to aggregate mobile phone data from Rome. Here, we explore how researchers might be able to use data for an entire metropolitan region to analyze urban dynamics.

Urban Analysis Using Mobile-Device Data

The Massachusetts Institute of Technology’s Reality Mining project (http://reality.media.mit.edu) successfully abstracted common behavioral patterns from the activities of 70 students and faculty issued with Nokia phones carrying specially designed logging software.1,2 Rein Ahas and Ülar Mark tracked the mobile phones of 300 users for a “social positioning method” analysis.3 By combining spatiotemporal data from phones with demographic and attitudinal data from surveys, they created a map of social spaces in Estonia.

In the UK, the Cityware research group has taken a more readily scalable approach. They supplement the pedestrian flow data typically gathered as part of a space syntax analysis with data on Bluetooth devices passing through pedestrian survey “gates.”4

However, approaches such as these can suffer from important limitations: they rely on the deployment of ad hoc infrastructure or require user consent. Consequently, sample sizes are necessarily more modest and might be limited in terms of the research’s spatial extent.

REFERENCES
1. N. Eagle and A. Pentland, “Eigenbehaviors: Identifying Structure in Routine,” 2006; http://vismod.media.mit.edu//tech-reports/TR-601.pdf.
2. N. Eagle and A. Pentland, “Reality Mining: Sensing Complex Social Systems,” Personal and Ubiquitous Computing, vol. 10, no. 4, 2006, pp. 255–268.
3. R. Ahas and Ü. Mark, “Location Based Services—New Challenges for Planning and Public Administration?” Futures, vol. 37, no. 6, 2005, pp. 547–561.
4. E. O’Neill et al., “Instrumenting the City: Developing Methods for Observing and Understanding the Digital Cityscape,” UbiComp 2006: Ubiquitous Computing, LNCS 4206, Springer, 2006, pp. 315–332.

30 PERVASIVE computing Published by the IEEE Computer Society ■ 1536-1268/07/$25.00 © 2007 IEEE

The Real Time Rome platform
The TI and MIT collaboration, developed under the Real Time Rome label, was shown at the 2006 Venice Biennale. The installation incorporated both real-time and historical visualizations of mobile phone usage levels in central Rome during autumn 2006. The system architecture, including data collection, transfer, and processing, has been detailed elsewhere.1

TI supplied several different types of data, first and foremost of which was the Erlang, a measure of network bandwidth usage typically collected at the antenna level. Additionally, TI used its innovative Lochness platform to supply aggregate location and trajectory data on callers using the system for more than three minutes at a time. Two transportation companies, Atac-Rome (a public bus company) and Samarcanda (a private taxi company), also provided supplemental GPS data to MIT for further processing. However, here we focus on the Erlang data collected over four months in late 2006 and covering a region of 47 km², considering how it can help us better understand urban dynamics.

An Erlang is one person-hour of phone use per hour of observation, so 1 Erlang could represent one person talking for an hour, two people talking for a half hour each, 30 people speaking for two minutes each, and so on. Consequently, Erlang data is both aggregate and anonymous, and deducing individual identities from the data collected and stored in the system is impossible. Additionally, because Erlang data is a standard measure used by most network operators, it’s an accessible source for the analysis of typical GSM (Global System for Mobile Communication) networks. You can collect Erlang data without installing new applications or upgrading the base station controllers, both of which incur costs and operational risks for the networks. Although Erlang data can’t be linked to an individual subscriber and doesn’t offer the locational
specificity of GPS, it nonetheless remains attractive for urban research at scales where this level of resolution is unnecessary. In effect, Erlang data provides both a view of urban space as seen through network bandwidth consumption and, indirectly, insight into urban life’s spatial and temporal dynamics. This aspect makes it an excellent jumping-off point for research supporting public-transport planning, health and safety, advertising, and other types of group-directed activity.
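The Erlang arithmetic above can be checked with a short sketch. It assumes a hypothetical list of call durations, in minutes, observed at one antenna during a single observation window. Traffic intensity in Erlangs is then the total call time divided by the window length. The helper name `erlangs` is illustrative only and is not part of TI’s systems.

```python
from typing import Iterable


def erlangs(call_minutes: Iterable[float], window_minutes: float = 60.0) -> float:
    """Traffic intensity in Erlangs: total call time divided by the window length.

    Hypothetical helper; assumes every call lies entirely inside the window.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return sum(call_minutes) / window_minutes


# The three one-hour scenarios from the text.
one_caller = erlangs([60])            # one person talking for an hour
two_callers = erlangs([30, 30])       # two people, half an hour each
thirty_callers = erlangs([2] * 30)    # 30 people, two minutes each

print(one_caller, two_callers, thirty_callers)  # prints 1.0 1.0 1.0
```

The same function covers the 15-minute sampling used later in the article. For example, `erlangs([15], window_minutes=15)` also returns 1.0, because a single call occupies the whole interval.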

First visualizations and hypothesis
Figure 1 shows one of the simplest visualizations of Erlang data: a 3D plot of telecommunications activity during Madonna’s controversial 6 August 2006 performance, when more than 70,000 people converged on the Stadio Olimpico for a concert condemned by the Pope. Generic Erlang maps such as this, which was presented at the Biennale, are graphically appealing and intuitively easy to grasp. However, they’re actually quite difficult to interpret rigorously, and they provide little insight into local-area dynamics without additional processing.

Figure 1. A 3D plot of telecommunications activity during a Madonna concert in Rome.

We hypothesize that by employing various statistical techniques, we can use differences in Erlang data over time to derive clues to the types of activity in the immediate area of the mast. (A mast can carry multiple antennas, oriented in different directions or serving different frequencies.) This analysis is conceptually related to the idea of a chronotype (see the “Chronotypes and Space-Time Typologies” sidebar), except that we’re characterizing spaces by their mobile-bandwidth use over time. By analyzing the bandwidth “signature” of each antenna, we try to envision how it might correlate with urban activities in the geographical vicinity.

Chronotypes and Space-Time Typologies

To help conceptualize a city’s complex hourly, daily, weekly, monthly, and annual rhythms, Luca Bertolini and Martin Dijst put forward Roberta Bonfiglioli’s concept of the chronotype.1 The chronotype is a useful conceptual handle for thinking about how different groups occupy the same space depending on the time of day. Bertolini and Dijst offer the example of a mixed-use area inhabited by young couples without children and by families. The young couples will likely work in another part of the city, returning perhaps only in the evening to socialize in bars and restaurants. In contrast, family members will go shopping and use other services during the day in this area. The same space can thus have two or more distinct uses and populations.

As another way to conceptualize these rhythms, Robbert Zandvliet and Dijst offer space-time typologies.2 That is, they propose “a typology of urban, suburban, and rural municipalities … based on diurnal weekday variations in visitor populations”2 as a way to understand how place works in Manuel Castells’ “network society.”3

REFERENCES
1. L. Bertolini and M. Dijst, “Mobility Environments and Network Cities,” J. Urban Design, vol. 8, no. 1, 2003, pp. 27–43.
2. R. Zandvliet and M. Dijst, “Short-Term Dynamics in the Use of Places: A Space-Time Typology of Visitor Populations in the Netherlands,” Urban Studies, vol. 43, no. 7, 2006.
3. M. Castells, “Grassrooting the Space of Flows,” Cities in the Telecommunications Age, J. Wheeler, Y. Aoyama, and B. Warf, eds., Routledge, 2000, pp. 18–27.

Because Erlang data is an antenna-level measure, we needed an algorithm to spread the point data values across the
area served, accounting for distance decay in signal coverage and multiple antennas on a single mast. Carlo Ratti and his colleagues took a center-of-gravity approach,2 but to interpolate values for the entire metropolitan region, an alternative algorithm3 was used to divide Rome into “pixels” measuring 1,600 m². We used an exponential distribution function to derive an Erlang point value based on a composite signal from the surrounding masts.

We use this mathematical notation:

• $Loc$ is the set of 1,600 m² pixels.
• $T_{96}$ is the set of times when we made observations each day of the week. Because we took measurements every 15 minutes, one day comprises 96 observations.
• $Day$ is the set of {Weekday, Friday, Saturday, Sunday} (we discuss this in more detail later).
• $\mathrm{erlang}(\delta, \lambda, \tau)$ defines the Erlang value at location $\lambda \in Loc$, at time $\tau \in T_{96}$, on day type $\delta \in Day$.
• $\mathrm{mean}_{i \in I}\{a_i\}$ indicates the mean of the values $a_i$, $i \in I$.

(To preserve confidentiality, TI used a scaling factor to adjust the Erlang values transmitted to the SENSEable City Laboratory. This means that while the relative difference between any two observations is scaled consistently, the actual Erlang value at that point in time is unknown. So, it’s helpful to focus on the relationships between points over time and space rather than the specific value at any one point in time.)

Using prior knowledge of the city, we arbitrarily selected eight locations that we expected to have markedly different signatures. Following an initial visualization exercise, we selected six for analysis:

• Termini, Rome’s main passenger rail station and busiest subway station;
• Trastevere, a mixed-use area popular with Romans and tourists for its bars and restaurants;
• the Piazza Bologna, a residential area east of the city center;
• the area in front of the Pantheon (one of Rome’s premier tourist attractions), which also contains many bars and restaurants;
• the Stadio Olimpico, a sports and major concert venue northwest of central Rome; and
• Tiburtina, a smaller rail and subway interchange.

Figure 2 shows the pixels for these six locations.

Figure 2. A map of Rome indicating the six locations (“pixels”) selected for analysis of mobile phone usage.

To minimize the impact of special events on the data set, we calculated an average Erlang value for each pixel at each 15-minute interval, using a 90-day period. So, for example, the data point for 9 a.m. Monday is an average of every 9 a.m. Monday value between 1 September and 30 November 2006. We excluded civic holidays from the calculation on the basis that they would introduce unnecessary noise.

Erlang data by day of the week
Beginning with a minimal level of processing, figure 3 shows how Erlang data changes over time at each of the six selected pixels. As the graphs indicate, Monday through Friday are broadly similar, except for a more rapid decrease in activity on Friday afternoon, suggesting a transition to the weekend. Even more strikingly, Saturday and Sunday values often drop below 50 percent of the typical weekday load, but the drop’s magnitude varies dramatically from site to site. This finding indicates that weekday and weekend data should be treated separately in our analysis.

Intriguingly, areas more closely identified with Roman residents, such as the Piazza Bologna and Tiburtina, display lower levels of day-to-day Erlang variance than those associated with more transient populations of commuters or tourists, such as Termini and the Pantheon. This peculiarity suggests that the greater the flux of people (or, possibly, nonresidents) through a site, the greater the variance in the signal. Conversely, predominantly “local” areas seem to have higher levels of routine or habitual activity and thus less variation between days.
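The averaging step described earlier in this section computes one mean per pixel for each 15-minute slot of each day of the week. The means are taken over 1 September to 30 November 2006, with civic holidays excluded. That step can be sketched as follows, under assumptions that are ours and not the article’s:

- The input is a list of (timestamp, Erlang value) readings for a single pixel.
- The article does not say which civic holidays were dropped, so the holiday set is left to the caller.
- Monday through Thursday are merged into the “Weekday” member of the Day set by averaging the four daily profiles with equal weight.

All function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean


def slot_index(ts: datetime) -> int:
    """Position of a timestamp within the 96 daily 15-minute slots (0..95)."""
    return ts.hour * 4 + ts.minute // 15


def day_type(weekday: int) -> str:
    """Map datetime.weekday() (Monday == 0) onto {Weekday, Friday, Saturday, Sunday}."""
    return {4: "Friday", 5: "Saturday", 6: "Sunday"}.get(weekday, "Weekday")


def average_profile(readings, start: date, end: date, holidays: set):
    """Per-slot averages for one pixel, keyed by (day type, slot).

    readings: iterable of (datetime, erlang_value) pairs for a single pixel.
    Readings outside [start, end] or on a holiday are skipped.
    """
    by_weekday_slot = defaultdict(list)
    for ts, value in readings:
        d = ts.date()
        if start <= d <= end and d not in holidays:
            by_weekday_slot[(ts.weekday(), slot_index(ts))].append(value)

    # E.g. the 9 a.m. Monday point is the mean of every 9 a.m. Monday value.
    weekday_means = {key: fmean(vals) for key, vals in by_weekday_slot.items()}

    # Assumption: Monday-Thursday profiles are merged with equal weight.
    grouped = defaultdict(list)
    for (weekday, slot), value in weekday_means.items():
        grouped[(day_type(weekday), slot)].append(value)
    return {key: fmean(vals) for key, vals in grouped.items()}


readings = [
    (datetime(2006, 9, 4, 9, 0), 10.0),    # Monday, 9:00
    (datetime(2006, 9, 11, 9, 5), 14.0),   # Monday, 9:05 (same 9 a.m. slot)
    (datetime(2006, 9, 5, 9, 0), 18.0),    # Tuesday, 9:00
    (datetime(2006, 9, 8, 9, 0), 6.0),     # Friday, 9:00
    (datetime(2006, 12, 4, 9, 0), 99.0),   # after 30 November: ignored
]
profile = average_profile(readings, date(2006, 9, 1), date(2006, 11, 30), holidays=set())
print(profile[("Weekday", 36)], profile[("Friday", 36)])  # prints 15.0 6.0
```

Averaging each day of the week first, and only then merging Monday through Thursday, keeps a day that happens to have more readings from dominating the Weekday profile. Whether the authors merged days this way is not stated.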
However, in spite of the differences between weekdays, Fridays, and the weekend, figure 3 shows that all six locations demonstrate a broadly comparable rhythm: a rapid ramping-up of telecommunications activity between 6 and 10 a.m. on weekdays and a slower pace on weekends. Apart from the Stadio Olimpico, where the rhythms of concerts and football matches clearly show on the graph, patterns are quite uniform: a clear double peak and varying ratios of weekday-to-weekend activity. So, how can we make differences in signatures between different sites more evident?

Normalization
The magnitude of the differences between sites in figure 3 makes it hard to compare them in a more detailed way, so some type of data normalization is necessary. In figure 4, we plot the ratio of telecommunications intensity at one pixel against the average of every pixel in the system at that point in time (normalization over space). We then compare that to the daily pixel average (normalization over time). Using this approach, we can identify otherwise hidden shifts in the relative intensity of activity across Rome.

We employed these normalization steps:

1. For each day $\delta$ and time $\tau$, we calculate the mean over all pixels:
$$\mathrm{mean}_{\lambda \in Loc}\{\mathrm{erlang}(\delta, \lambda, \tau)\}$$
2. We then normalize the signature over space:
$$\mathrm{erlang}_{norm\_space}(\delta, \lambda, \tau) = \frac{\mathrm{erlang}(\delta, \lambda, \tau)}{\mathrm{mean}_{\lambda \in Loc}\{\mathrm{erlang}(\delta, \lambda, \tau)\}}$$
3. We then normalize the signature over time:
$$\forall \tau \in T_{96}: \quad \mathrm{erlang}_{norm\_space\_time}(\delta, \lambda, \tau) = \frac{\mathrm{erlang}_{norm\_space}(\delta, \lambda, \tau)}{\mathrm{mean}_{\tau \in T_{96}}\{\mathrm{erlang}_{norm\_space}(\delta, \lambda, \tau)\}}$$

The differences in figure 4 are quite visible, and the radically different signature at the Stadio Olimpico indicates that you can readily recognize certain classes of urban activity by the unusual distribution of telecommunications activity. Such events will likely place a correspondingly high load on urban infrastructure and resources, and a similar spike in Erlang data is also likely during emergencies. This suggests that the real-time recognition of unusual concentrations of telecommunications activity might have relevance for public safety planning and transport scheduling.

On weekends, the higher levels of bandwidth use between midnight and 2 a.m. near the Pantheon and in Trastevere relative to work-oriented and residentially oriented pixels strongly suggest leisure activity. This feature suggests that we can also identify cultural and leisure areas on the basis of their telecommunications signature. Another feature with implications for the understanding of urban dynamics is the high level of activity at the transit hubs on weekday mornings, compared to residential sites.

Figure 3. Erlang data for the six locations by day of week. (Panels: Stadio Olimpico, Pantheon, Piazza Bologna, Termini, Tiburtina, Trastevere; y-axis: Erlang (natural scale); x-axis: time of day; one curve per day, Monday through Sunday.)

Figure 5 shows a more diffuse pattern of spatial activity on weekends. This is
consistent with the idea that although weekday telecommunications activity at each site exhibits a more dynamic temporal pattern, weekend activity exhibits more spatial dispersal. From an urban-planning standpoint, this strongly suggests large commuter flows into the central business district during the week and more residentially oriented activity on weekends. Of course, planners are well aware of this spatial relationship, but spatial and temporal visualization of these features at this scale hasn’t been possible before.

One caveat: the levels of activity between 3 and 6 a.m. throughout the week mean that any analysis using that period would be rooted in extremely low Erlang values. So, such a comparison might erroneously indicate excessive shifts in activity from site to site. Nonetheless, from this initial analysis, it seems that through normalized signatures we can reconstruct some of the functioning of the city using the invisible fingerprints of mobile phone infrastructure.

Figure 4. Erlang data normalized over space and time by site: (a) Monday through Thursday and (b) Saturday. (Both panels: y-axis, normalized Erlang from 0 to 4; x-axis, time of day; one curve each for Termini, Trastevere, Pantheon, Piazza Bologna, Tiburtina, and Stadio Olimpico.)

Cluster analysis
So far, we’ve focused largely on individual pixels, and we’ve identified some interesting features at a fairly detailed spatial level. Our preliminary analysis indicates that residential areas, commuter hubs, nighttime hot spots, and even special-event venues demonstrate features consistent with our contextual, anecdotal knowledge of Rome. However, validating our hypotheses requires a more rigorously quantitative study. The ultimate goal is to take the derived signatures, group them by degree of similarity, and map them to urban spatiotemporal structures.

As a proof of concept, we created a simplified vector (required for computational manageability) to feed pixel data for each of Rome’s 262,144 pixels to a clustering algorithm. An examination of our six selected pixels suggested that six times in the daily cycle of Erlang activity are particularly significant: 1 a.m., 7 a.m., 11 a.m., 2 p.m., 5 p.m., and 9 p.m. Each of these points lies toward the middle of a period of rapid change or significant variation between sites: the early morning rise in activity, late morning peak period, early afternoon lull, afternoon peak, and evening drop. The six normalized Erlang values thus make up the coordinates of a vector that describes, in a limited way, each pixel’s signature.

We could use many clustering techniques to create segmentations based on the affinity between vectors. We chose a K-Means approach, such that every observation in a cluster is as similar as possible to the other members of that cluster and as different as possible from members of any other cluster. With six coordinates from each day, and separate sets of coordinates for Monday through Thursday (one set of averaged observations), Friday, Saturday, and Sunday, the K-Means algorithm used a 24-dimensional space.

We employed two clustering steps. First, for each pixel, we calculate
$$\mathrm{feature}(loc) = \{\mathrm{erlang}(\delta, \lambda, j)\}, \quad \delta \in Day$$
where $j$ = 1 a.m., 7 a.m., 11 a.m., 2 p.m., 5 p.m., and 9 p.m.
Second, the K-Means clustering algorithm partitions the pixels into mutually exclusive clusters. Each cluster is characterized by its centroid, and the algorithm aims to minimize the error function:
$$\sum_{k=1}^{nClusters} \sum_{loc_j \in Cluster_k} \mathrm{distance}(loc_j, centroid_k)^2$$
where $Cluster_k$ is the set of objects related to the cluster $k$, and $centroid_k$ is the mean of all the points in $Cluster_k$. We calculated the distance between pixels as the Euclidean distance over the 24 feature coordinates, so the error function sums squared Euclidean distances:
$$\mathrm{distance}(loc_1, loc_2) = \left( \sum_{i=1}^{24} \left( \mathrm{feature}(loc_1)_i - \mathrm{feature}(loc_2)_i \right)^2 \right)^{1/2}$$

As a result of the clustering process, we can group all pixels in the city into any arbitrary number of groups based on the affinity of their composite Erlang signature. In our tests, we found a mix of clusters that suggest a complex set of relationships between signatures. Given the sheer complexity of cities, this is hardly surprising. However, the existence of several small clusters with much stronger levels of affiliation or differentiation indicates that the overall data set includes some quite distinct signatures. These signatures will likely map to distinct types of urban activity.

For this initial research, we worked with eight clusters as a compromise between simplicity and specificity. Doing this gave us a fair cophenetic correlation value of 0.7704. Cophenetic correlation is one way to gauge the clusters’ fit to the original data set (values approaching 1.0 suggest a good fit) by comparing pairwise linkages between observations.
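As a rough, self-contained illustration, the sketch below strings together the three normalization steps, the 24-value feature vector (six key times for each of the four day types), and a K-Means step that minimizes the summed squared Euclidean distance. These are our assumptions, not the authors’ code:

- The nested-list layout is `erlang[day][pixel][slot]`, with the four day types of the Day set.
- Features are taken from the space-and-time-normalized values, following the text’s “six normalized Erlang values.”
- Initial centroids are drawn at random from the data.
- A synthetic random input stands in for the TI data.

It uses plain Python so that it runs without extra packages. It does not compute a cophenetic correlation.

```python
import random
from statistics import fmean

# Positions in T96 (15-minute slots) of 1 a.m., 7 a.m., 11 a.m., 2 p.m., 5 p.m., 9 p.m.
KEY_SLOTS = [4, 28, 44, 56, 68, 84]


def normalize(erlang):
    """erlang[day][pixel][slot] -> values normalized over space, then over time."""
    result = []
    for day in erlang:
        n_pixels, n_slots = len(day), len(day[0])
        # Steps 1-2: divide by the mean over all pixels at the same slot.
        space_mean = [fmean(day[p][t] for p in range(n_pixels)) for t in range(n_slots)]
        norm_space = [[day[p][t] / space_mean[t] for t in range(n_slots)]
                      for p in range(n_pixels)]
        # Step 3: divide by each pixel's own mean over the day's slots.
        normalized = []
        for row in norm_space:
            row_mean = fmean(row)
            normalized.append([v / row_mean for v in row])
        result.append(normalized)
    return result


def features(norm):
    """One vector per pixel: the six key slots for each day type (4 x 6 = 24)."""
    return [[norm[d][p][t] for d in range(len(norm)) for t in KEY_SLOTS]
            for p in range(len(norm[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns labels, centroids, and the summed squared error."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep an empty cluster's centroid where it was.
            updated.append([fmean(col) for col in zip(*members)] if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    error = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, error


# Synthetic stand-in: 4 day types x 60 pixels x 96 slots (not real Rome data).
gen = random.Random(42)
erlang = [[[gen.uniform(1.0, 20.0) for _ in range(96)] for _ in range(60)]
          for _ in range(4)]
vectors = features(normalize(erlang))
labels, centroids, error = kmeans(vectors, k=8)
print(len(vectors), len(vectors[0]), len(centroids))  # prints 60 24 8
```

With the real data set the same steps would run over 262,144 pixels. At that scale a vectorized library implementation would be the practical choice over these plain loops.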

Figure 5. Erlang data for Rome normalized over space and time. Intensities range from low (blue) to high (red). (Panels: Monday to Thursday and Saturday, each shown at 1 a.m., 7 a.m., 11 a.m., 2 p.m., 5 p.m., and 9 p.m.)


Figure 6. Analysis of eight clusters of Erlang data: (a) clusters 1–4; (b) a satellite view of Rome, for comparison; (c) clusters 5–8.

Projecting these clusters onto a map of Rome (see figure 6) naturally indicates that they’re closely linked to the normalized Erlang signatures. The edges of Rome’s urban core are clearly visible, as are the hot spots of urban activity straddling the Tiber River. The map suggests an overall structure to the city, with a correspondence between levels of telecommunications activity and types of human activity. At this point, however, we can’t verifiably connect cellular signatures to specific types of human activity.

We then adjusted the metric to favor the two most distinctive types of use seen in the normalized graphs: early morning use suggestive of commuting behavior and late evening use suggestive of nighttime leisure activities. For these clusters we obtained cophenetic correlations of 0.7630 and 0.8508, indicating that the clustering approach has substantial promise.

The red nighttime-leisure cluster in figure 7a shows two discrete spatial groupings that map anecdotally to known areas of evening activity: one covering Trastevere and the area ranging to the west and south of the Piazza Navona, and one in the vicinity of the Piazza Spagna. The red commuter clusters in figure 7b quite astonishingly map to the most important points of entry to the city by car and train: Termini station, Tiburtina, the end of the Corso d’Italia, the Porta Maggiore, and the Porta San Giovanni.

Figure 7. Analysis of the five clusters covering Erlang data for the two most distinctive types of cell phone use: (a) nighttime leisure, (b) early morning commuting. (Panel titles: (a) Nighttime-leisure-weighted clusters; (b) Transit-weighted clusters. Legend: clusters 1 through 5.)

Discussion
Our preliminary findings suggest that signature analysis can provide an important new way of looking at the city as a holistic, dynamic system. In particular, the mobile phone network lets us develop a real-time representation of those dynamics at the city and city-region scale. This approach can complement traditional collection techniques, which are often outdated by the time they’re available to policy makers and the general public. Of course, because our hypotheses so far are based on anecdotal evidence, our findings will require additional validation, which we outline below.

What’s most promising about this early research is the extent to which our findings seem to parallel those of other European researchers4,5 as well as more conceptual research into telecommunications’ impacts on urban behaviors.6,7 In particular, we can characterize areas on the basis of flows and dynamics rather than on the basis of comparatively static physical or demographic features.

Moreover, we’ve recently received data from Pagine Gialle (the Italian Yellow Pages) with which we intend to validate our initial findings by linking the signatures to spatial data on business types and densities. In so doing, we can build on the processing requirements we discussed earlier in this article:

1. Antenna and pixel values must be normalized over both space and time to provide a measure of relative telecommunications intensity.
2. The substantial differences between weekdays and weekends require treating them separately in a classification algorithm.
3. The key time periods intimated in this initial analysis appear to be 12 to 2 a.m., 5 to 8 a.m., 10 a.m. to 12 p.m., 2 to 6 p.m., and 8 to 11 p.m. However, as our initial cluster analysis makes clear, these aren’t the only factors.

We expect several other analytical approaches to yield insights into network usage patterns. One of the most promising approaches is Eigenbehavior analysis.8 Because we can easily map the signature to a vector representation of the sort already used in the cluster analysis, deriving the Eigenvectors should be quite straightforward. Applying other analytical techniques such as Fourier and wavelet transform plots might reveal new, distinct characteristics. Ideally, each of these analyses will eventually feed into a single categorization process that can discriminate between discrete types of behavior at the antenna and pixel levels.
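The Eigenvector step mentioned above can be sketched under one assumption: that it amounts to principal component analysis of the per-pixel signature vectors, as in the Eigenbehaviors report by Eagle and Pentland cited in the sidebar “Urban Analysis Using Mobile-Device Data.” The sketch builds the covariance matrix of the vectors and extracts its leading eigenvector by power iteration. The helper names and the toy data are hypothetical. Further components would need deflation or a full eigensolver, which this sketch omits.

```python
import math
import random


def covariance(vectors):
    """Sample covariance matrix of equal-length signature vectors (one per pixel)."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = [[v[i] - means[i] for i in range(dim)] for v in vectors]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def mat_vec(matrix, vec):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_eigenvector(matrix, iters=1000, tol=1e-12, seed=0):
    """Power iteration for the leading eigenvector/eigenvalue of a covariance matrix."""
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(len(matrix))]
    for _ in range(iters):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("matrix has no variance along the start vector")
        new = [x / norm for x in product]
        done = max(abs(a - b) for a, b in zip(new, vec)) < tol
        vec = new
        if done:
            break
    # Rayleigh quotient gives the matching eigenvalue for a unit vector.
    value = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return vec, value


# Toy signatures: pixels differ only along the direction (1, 2, 0).
signatures = [[t, 2.0 * t, 5.0] for t in range(10)]
component, variance = top_eigenvector(covariance(signatures))
print([round(x, 4) for x in component], round(variance, 3))
```

Applied to the 24-value vectors used in the cluster analysis, the leading component would be the direction along which pixel signatures vary most. Whether that direction corresponds to a meaningful urban behavior would still need the validation discussed above.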
Limitations of research
As we mentioned before, TI’s masking function meant that we weren’t able to work with true Erlang values. An additional constraint is that owing to operational requirements and planning restrictions, GSM masts are irregularly distributed and oriented. To manage the computational and mathematical complexity of calculating point values for 262,144 pixels over a three-month period, our algorithm spreads Erlang data through all 360 degrees, producing a possible skew in the overall distribution. Finally, not all masts handle both the 900- and 1,800-MHz bands used in Europe. So, some network activity might gravitate toward more physically remote base stations with the hardware to process calls in a particular band. We don’t have data that would let us compensate for these possible biases. So, without adopting an entirely different approach to data collection (one that the network operator would have been reluctant to support at this development stage), localizing phones more accurately is impossible.

Although the data to which we currently have access has clear, substantial limitations, we believe our approach represents an appropriate trade-off between locational specificity and implementational feasibility. Fortunately, analysis at the city and city-regional scale doesn’t depend on the high level of accuracy that location-based services typically require. So, we feel that there’s plenty of opportunity to gain valuable insights into urban dynamics using Erlang data at smaller scales, and we intend to move forward with it.

It would be exciting to compare the signatures collected from Rome with similar data from other major European cities such as London, Paris, or Frankfurt. For instance, it’s reasonable to expect that cities with more distinct spatial patterns of human activity might display correspondingly more distinct patterns of network use and more readily classifiable signatures. Unfortunately, at this time commercial considerations appear to preclude using data from other network operators.

This issue highlights the extent to which research using cellular networks must take nonscientific factors into account. First, a policy framework at the national or European level that encourages networks to share nonidentifiable data with planning and policy researchers would be immensely helpful. Clearly, there are important considerations from the standpoint of commercial confidentiality, personal privacy, and possibly even national security. However, in the absence of clear regulatory guidance, further research using cell phones, the most widely deployed device with locational capabilities, won’t be possible at the city or city-region scale.

Without encouragement, far more detailed data sets held by the networks will never see the light of day. For instance, paging data, generated by polling the phones in a cell to obtain a list of IMEI (International Mobile Equipment Identity) numbers at the mast level, could provide unmatched detail on travel origins and destinations, and on population densities. By scrambling handset identifiers with changing encryption schemes, reporting only partial trajectories, and never reporting on cells or paths containing fewer than an agreed minimum number of users, you would be able to perform this kind of research without compromising personal privacy. This data would also assist enormously in understanding how individual and group behavior changes over time and space. This would not only shed further light on the rhythms of urban life but also address the fact that you can’t derive metrics on activity and population densities from Erlang data alone.

The challenge is that as the data becomes more useful, it also becomes more sensitive to both operators and end users. An all-or-nothing approach to privacy has hampered this discussion.
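Two of the safeguards proposed above can be illustrated with a small sketch: scrambling handset identifiers with changing schemes, and never reporting cells with fewer than an agreed minimum number of users. The following are assumptions, not the article’s specifics:

- Keyed hashing (HMAC-SHA256 from Python’s standard `hmac` module) stands in for the “changing encryption schemes.”
- The key rotates once per reporting period.
- The threshold `K_MIN = 10` is chosen arbitrarily.

Partial trajectories are not modeled, and the function names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

K_MIN = 10  # hypothetical agreed minimum number of distinct users per cell


def pseudonym(imei: str, period_key: bytes) -> str:
    """Keyed hash of a handset identifier; rotating the key each reporting
    period stops pseudonyms from being linked across periods."""
    return hmac.new(period_key, imei.encode(), hashlib.sha256).hexdigest()[:16]


def cell_counts(sightings, period_key: bytes, k_min: int = K_MIN) -> dict:
    """Distinct pseudonymous handsets per cell, dropping cells below k_min.

    sightings: iterable of (cell_id, imei) pairs from one reporting period.
    """
    seen = {(cell, pseudonym(imei, period_key)) for cell, imei in sightings}
    counts = Counter(cell for cell, _ in seen)
    return {cell: n for cell, n in counts.items() if n >= k_min}


# Twelve handsets in cell-A (one seen twice) and a single handset in cell-B.
sightings = [("cell-A", f"35{i:013d}") for i in range(12)]
sightings += [("cell-A", "350000000000000"), ("cell-B", "351234567890123")]
print(cell_counts(sightings, period_key=b"hypothetical-key-2006-09"))
# prints {'cell-A': 12}; cell-B is suppressed
```

Counting distinct pseudonyms rather than raw sightings keeps a single handset that is polled repeatedly from inflating a cell past the threshold.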


It would be helpful to move toward a more nuanced understanding of how to preserve the network user’s reasonable expectations of privacy while creating mechanisms to permit future research.

We need to establish the extent to which certain types of data and analysis create either the perception of a privacy invasion9 or the real risk of trail reidentification,10 and to set out the trade-offs for public review and discussion.

ACKNOWLEDGMENTS

The Real Time Rome exhibit at the 10th International Architecture Exhibition at Venice, Italy, was sponsored by Telecom Italia, with technical partners Biennale di Venezia, the City of Rome, Google, Atac-Rome, Samarcanda, and Microtek. We gratefully acknowledge the significant contributions of Burak Arikan, Assaf Biderman, Filippo Dal Fiore, Saba Ghole, Daniel Gutierrez, Sonya Huang, Sriram Krishnan, Justin Moe, Francisca Rojas, and Najib Marc Terazi.

We also thank Matthew Jull, Tim Kindberg, and our anonymous reviewers for their valuable feedback; Giovanni Celentano of the University of Naples and Lucio Pinto of the Silvio Tronchetti Provera Foundation for their generous suggestions; and Peter Hall and Michael Batty of University College London and Larry Vale of the Massachusetts Institute of Technology for fostering a collaborative research environment. And a final thank-you to the Balzan Foundation, whose support for young researchers through the Balzan Prize helped enable this transatlantic collaboration.

REFERENCES

1. F. Calabrese and C. Ratti, “Real Time Rome,” Networks and Communication Studies, vol. 20, nos. 3 & 4, 2006, pp. 247–258.
2. C. Ratti et al., “Mobile Landscapes: Using Location Data from Cell-Phones for Urban Analysis,” Environment and Planning B: Planning and Design, vol. 33, no. 5, 2006, pp. 727–748.
3. F. Calabrese et al., “Real-Time Urban Monitoring Using Cellphones: A Case-Study in Rome,” SENSEable Working Paper, SENSEable City Laboratory, Massachusetts Inst. of Technology, 2007; http://senseable.mit.edu/papers/pdf/CalabreseRatti2007SCLWorkingPaper.pdf.
4. L. Bertolini and M. Dijst, “Mobility Environments and Network Cities,” J. Urban Design, vol. 8, no. 1, 2003, pp. 27–43.
5. R. Zandvliet and M. Dijst, “Short-Term Dynamics in the Use of Places: A Space-Time Typology of Visitor Populations in the Netherlands,” Urban Studies, vol. 43, no. 7, 2006.
6. S. Graham, “Cities in the Real-Time Age: The Paradigm Challenge of Telecommunications to the Conception and Planning of Urban Space,” Environment and Planning A, vol. 29, no. 1, 1997, pp. 105–127.
7. M. Sheller and J. Urry, “The New Mobilities Paradigm,” Environment and Planning A, vol. 38, no. 2, 2006, pp. 207–226.
8. N. Eagle and A. Pentland, “Eigenbehaviors: Identifying Structure in Routine,” 2006; http://vismod.media.mit.edu/tech-reports/TR-601.pdf.
9. M. Langheinrich, “Privacy Invasions in Ubiquitous Computing,” 2002; http://guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/uc2002-pws.pdf.
10. B. Malin, “Betrayed by My Shadow: Learning Data Identity via Trail Matching,” J. Privacy Technology, 2005, pp. 1–22; www.jopt.org/publications/20050609001_malin_abstract.html.

the AUTHORS

Jonathan Reades is an MPhil and a PhD candidate at the Bartlett School of Planning at University College London. His research interests are the application of mobile phone data to topics in urban planning such as business clustering and communication, and the spatiotemporal structures of European cities. He previously spent eight years in data analytics, helping telecom firms use their data for targeted marketing. His first degree was in comparative literature at Princeton University; he’s a student member of both the Royal Town Planning Institute and the Town and Country Planning Association. Contact him at the Planning Dept., 4th fl., The Bartlett School, Wates House, 22 Gordon St., London WC1H 0QB, UK; j.reades@ucl.ac.uk.

Francesco Calabrese is pursuing a PhD in informatics and automatics engineering at the University of Naples Federico II and is a visiting doctoral student at the Massachusetts Institute of Technology’s SENSEable City Laboratory. His research interests include hybrid control systems, embedded control systems, computer–numerical-control machines, finite-time stability, and real-time monitoring and analysis of urban dynamics. He received his Laurea in informatics engineering from the University of Naples Federico II. He’s a member of the IEEE and the IEEE Control Systems Society. Contact him at the SENSEable City Laboratory, MIT 10-485, 77 Massachusetts Ave., Cambridge, MA 02139; fcalabre@mit.edu.

Andres Sevtsuk is a PhD candidate in city design and development and urban information systems at the Massachusetts Institute of Technology. His research interests include accuracy and reliability in using mobile phone data for estimating the distribution of people in cities, the relationship between urban form and movement, and the study of new possibilities in street parking and one-way vehicle rentals using distributed computation and sensing. He received his MS in architecture studies from MIT. Contact him at 7 Dodge St., Cambridge, MA 02139; asevtsuk@mit.edu.

Carlo Ratti is an associate professor of the practice of urban technologies at the Massachusetts Institute of Technology, where he directs the SENSEable City Laboratory. He’s also founding partner and director of carlorattiassociati, an architectural firm. He received his PhD in architecture from the University of Cambridge. He’s a member of the Ordine degli Ingegneri di Torino and the Association des Anciens Elèves de l’École Nationale des Ponts et Chaussées, and is a UK Registered Architect. Contact him at the SENSEable City Laboratory, MIT 10-485, 77 Massachusetts Ave., Cambridge, MA 02139; ratti@media.mit.edu.
