SECURITY & PRIVACY

Location Privacy in Pervasive Computing

As location-aware applications begin to track our movements in the name of convenience, how can we protect our privacy? This article introduces the mix zone—a new construction inspired by anonymous communication techniques—together with metrics for assessing user anonymity.

Alastair R. Beresford and Frank Stajano, University of Cambridge

Many countries recognize privacy as a right and have attempted to codify it in law. The first known piece of privacy legislation was England's 1361 Justices of the Peace Act, which legislated for the arrest of eavesdroppers and stalkers. The Fourth Amendment to the US Constitution proclaims citizens' right to privacy, and in 1890 US Supreme Court Justice Louis Brandeis stated that "the right to be left alone" is one of the fundamental rights of a democracy.1 The 1948 Universal Declaration of Human Rights2 declares that everyone has a right to privacy at home, with family, and in correspondence. Other pieces of more recent legislation follow this principle. Although many people clearly consider their privacy a fundamental right, comparatively few can give a precise definition of the term. The Global Internet Liberty Campaign3 has produced an extensive report that discusses personal privacy at length and identifies four broad categories: information privacy, bodily privacy, privacy of communications, and territorial privacy.

This article concentrates on location privacy, a particular type of information privacy that we define as the ability to prevent other parties from learning one's current or past location. Until recently, the very concept of location privacy was unknown: people did not usually have access to reliable and timely information about the exact location of others, and therefore most people could see no privacy implications in revealing their location, except in special circumstances. With pervasive computing, though, the scale of the problem changes completely. You probably do not care if someone finds out where you were yesterday at 4:30 p.m., but if this someone could inspect the history of all your past movements, recorded every second with submeter accuracy, you might start to see things differently. A change of scale of several orders of magnitude is often qualitative as well as quantitative—a recurring problem in pervasive computing.4

We shall focus on the privacy aspects of using location information in pervasive computing applications. When location systems track users automatically on an ongoing basis, they generate an enormous amount of potentially sensitive information. Privacy of location information is about controlling access to this information. We do not necessarily want to stop all access—because some applications can use this information to provide useful services—but we want to be in control.

Some goals are clearly mutually exclusive and cannot be simultaneously satisfied: for example, wanting to keep our position secret and yet wanting colleagues to be able to locate us. Despite this, there is still a spectrum of useful combinations to be explored. Our approach to this tension is a privacy-protecting framework based on frequently changing pseudonyms so users avoid being identified by the locations they visit. We further develop this framework by introducing the concept of mix zones and showing

46 PERVASIVE computing Published by the IEEE CS and IEEE Communications Society ■ 1536-1268/03/$17.00 © 2003 IEEE
Related Work: Location Awareness and Privacy

The first wide-scale outdoor location system, GPS,1 lets users calculate their own position, but the flow of information is unidirectional; because there is no back-channel from the GPS receiver to the satellites, the system cannot determine the user's computed location, and does not even know whether anyone is accessing the service. At the level of raw sensing, GPS implicitly and automatically gives its users location privacy.

In contrast, the first indoor location system, the Active Badge,2 detects the location of each user and broadcasts the information to everyone in the building. The system as originally deployed assumes anyone in the building is trustworthy; it therefore provides no mechanisms to limit the dissemination of individuals' location information. Ian W. Jackson3 modified the Active Badge system to address this issue. In his version, a badge does not reveal its identity to the sensor detecting its position but only to a trusted personal computer at the network edge. The system uses encrypted and anonymized communication, so observation of the traffic does not reveal which computer a given badge trusts. The badge's owner can then use traditional access control methods to allow or disallow other entities to query the badge's location.

More recent location systems, such as Spirit4 and QoSDream,5 have provided applications with a middleware event model through which entities entering or exiting a predefined region of space generate events. Applications register their interest in a particular set of locations and locatables and receive callbacks when the corresponding events occur. Current location-aware middleware provides open access to all location events, but it would be possible to augment this architecture to let users control the dissemination of their own location information.

So far, few location systems have considered privacy as an initial design criterion. The Cricket location system6 is a notable exception: location data is delivered to a personal digital assistant under the sole control of the user.

Commercial wide-area location-based services will initially appear in mobile cellular systems such as GSM. BTexact's Erica system7 delivers sensitive customer information to third-party applications. Erica provides an API for third-party software to access customer billing information, micropayment systems, customer preferences, and location information. In this context, individual privacy becomes much more of a concern. Users of location-aware applications in this scenario could potentially have all their daily movements traced.

In a work-in-progress Internet draft that appeared after we submitted this article for publication, Jorge R. Cuellar and colleagues also explore location privacy in the context of mobile cellular systems.8 As we do, they suggest using pseudonyms to protect location privacy. Unlike us, however, they focus on policies and rules—they do not consider attacks that might break the unlinkability that pseudonyms offer.

REFERENCES

1. I. Getting, "The Global Positioning System," IEEE Spectrum, vol. 30, no. 12, Dec. 1993, pp. 36–47.
2. R. Want et al., "The Active Badge Location System," ACM Trans. Information Systems, vol. 10, no. 1, Jan. 1992, pp. 91–102.
3. I.W. Jackson, "Anonymous Addresses and Confidentiality of Location," Information Hiding: First Int'l Workshop, R. Anderson, ed., LNCS, vol. 1174, Springer-Verlag, Berlin, 1996, pp. 115–120.
4. N. Adly, P. Steggles, and A. Harter, "SPIRIT: A Resource Database for Mobile Users," Proc. ACM CHI'97 Workshop on Ubiquitous Computing, ACM Press, New York, 1997.
5. H. Naguib, G. Coulouris, and S. Mitchell, "Middleware Support for Context-Aware Multimedia Applications," Proc. 3rd Int'l Conf. Distributed Applications and Interoperable Systems, Kluwer Academic Publishers, Norwell, Mass., 2001, pp. 9–22.
6. N.B. Priyantha, A. Chakraborty, and H. Balakrishnan, "The Cricket Location-Support System," Proc. 6th Int'l Conf. Mobile Computing and Networking (Mobicom 2000), ACM Press, New York, 2000, pp. 32–43.
7. P. Smyth, Project Erica, 2002, www.btexact.com/erica.
8. J.R. Cuellar, J.B. Morris, and D.K. Mulligan, Geopriv Requirements, Internet draft, Nov. 2002, www.ietf.org/internet-drafts/draft-ietf-geopriv-reqs-01.txt.

how to map the problem of location privacy onto that of anonymous communication. This gives us access to a growing body of theoretical tools from the information-hiding community. In this context, we describe two metrics that we have developed for measuring location privacy, one based on anonymity sets and the other based on entropy. Finally, we move from theory to practice by applying our methods to a corpus of more than three million location sample points obtained from the Active Bat installation at AT&T Labs Cambridge.5

Problem, threat model, and application framework

In the pervasive computing scenario, location-based applications track people's movements so they can offer various useful services. Users who do not want such services can trivially maintain location privacy by refusing to be tracked—assuming they have the choice. This has always been the case for our Active Badge (see the "Related Work" sidebar) and Active Bat systems but might not be true for, say, a nationwide network of face-recognizing CCTV cameras—an Orwellian dystopia now dangerously close to reality. The more challenging problem we explore in this article is to develop techniques that let users benefit from location-based applications while at the same time retaining their location privacy.

To protect the privacy of our location information while taking advantage of location-aware services, we wish to hide our true identity from the applications receiving our location; at a very high level, this can be taken as a statement of our security policy.

Users of location-based services will not,




in general, want information required by one application to be revealed to another. For example, patients at an AIDS testing clinic might not want their movements (or even evidence of a visit) revealed to the location-aware applications in their workplace or bank. We can achieve this compartmentalization by letting location services register for location callbacks only at the same site where the service is provided. But what happens if the clinic and workplace talk about us behind our back? Because we do not trust these applications and have no control over them, we must assume they might collude against us to discover the information we aim to hide. We therefore regard all applications as one global hostile observer.

Some location-based services, such as "when I am inside the office building, let my colleagues find out where I am," cannot work without the user's identity. Others, such as "when I walk past a coffee shop, alert me with the price of coffee," can operate completely anonymously. Between those extremes are applications that cannot be accessed anonymously—but do not require the user's true identity either. An example of this third category might be "when I walk past a computer screen, let me teleport my desktop to it." Here, the application must know whose desktop to teleport, but it could conceivably do this using an internal pseudonym rather than the user's actual name.

Clearly, we cannot use applications requiring our true identity without violating our security policy. Therefore, in this article, we concentrate on the class of location-aware applications that accept pseudonyms, and our aim will be the anonymization of location information.

In our threat model, the individual does not trust the location-aware application but does trust the raw location system (the sensing infrastructure that can position locatables). This trust is certainly appropriate for systems such as GPS and Cricket, in which only the user can compute his or her own location. Its applicability to other systems, such as the Active Bat, depends on the trust relationships between the user and the entity providing the location service.

We assume a system with the typical architecture of modern location-based services, which are based on a shared event-driven middleware system such as Spirit or QoSDream (see the "Related Work" sidebar). In our model, the middleware, like the sensing infrastructure, is trusted and might help users hide their identity. Users register interest in particular applications with the middleware; applications receive event callbacks from the middleware when the user enters, moves within, or exits certain application-defined areas. The middleware evaluates the user's location at regular periodic intervals, called update periods, to determine whether any events have occurred, and issues callbacks to applications when appropriate.

Users cannot communicate with applications directly—otherwise they would reveal their identity straight away. We therefore require an anonymizing proxy for all communication between users and applications. The proxy lets applications receive and reply to anonymous (or, more correctly, pseudonymous) messages from the users. The middleware system is ideally placed to perform this function, passing user input and output between the application and the user. For example, to benefit from the application-level service of "when I walk past a coffee shop, alert me with the price of coffee," the user requests the service, perhaps from the coffee shops' trade-association Web site, but using her location system middleware as the intermediary so as not to reveal her identity. The coffee shop application registers with the user's location system and requests event callbacks for positive containment in the areas in front of each coffee shop. When the user steps into a coffee shop event callback area, the coffee shop application receives a callback from the user's location system on the next location update period. The coffee shop service then sends the current price of coffee as an event message to the registered user via the middleware without having to know the user's real identity or address.

Using a long-term pseudonym for each user does not provide much privacy—even if the same user gives out different pseudonyms to different applications to avoid collusion. This is because certain regions of space, such as user desk locations in an office, act as "homes" and, as such, are strongly associated with certain identities. Applications could identify users by following the "footsteps" of a pseudonym to or from such a "home" area. At an earlier stage in this research, we tried this kind of attack on real location data from the Active Bat and found we could correctly de-anonymize all users by correlating two simple checks: First, where does any given pseudonym spend most of its time? Second, who spends more time than anyone else at any given desk? Therefore, long-term pseudonyms cannot offer sufficient protection of location privacy.

Our countermeasure is to have users change pseudonyms frequently, even while they are being tracked: users adopt a series of new, unused pseudonyms for each application with which they interact. In this scenario, the purpose of using pseudonyms is not to establish and preserve reputation (if we could work completely anonymously instead, we probably would) but to provide a return address. So the problem, in this context, is not whether changing pseudonyms causes a loss of reputation but more basically whether the application will still work when the user changes under its feet. Our preliminary analysis convinced us that many existing applications could be made to work within this framework by judicious use of anonymizing proxies. However, we decided to postpone a full investigation of this issue (see the "Directions for Further Research" sidebar) and concentrate instead on whether this approach, assuming it did not break applications, could improve location privacy. If it could not, adapting the applications would serve no purpose.

Users can therefore change pseudonyms while applications track them; but then, if the system's spatial and temporal resolution were sufficiently high (as in the Active Bat), applications could easily link the old and new pseudonyms, defeating the purpose of the change. To address this point, we introduce the mix zone.

Mix zones

Most theoretical models of anonymity



Directions for Further Research

During our research on location privacy, several interesting open problems emerged.

Managing application use of pseudonyms

A user who does not wish to be tracked by an application will want to use different pseudonyms on each visit. Many applications can offer better services if they retain per-user state, such as personal preferences; but if a set of preferences were accessed by several different pseudonyms, the application would easily guess that these pseudonyms map to the same user. We therefore need to ensure that the user state for each pseudonym looks, to the application, different from that of any other pseudonym.

There are two main difficulties. First, the state for a given user (common to all that user's pseudonyms) must be stored elsewhere and then supplied to the application in an anonymized fashion. Second, the application must not be able to determine that two sets of preferences map to the same user, so we might have to add small, random variations. However, insignificant variations might be recognizable to a hostile observer, whereas significant ones could adversely affect semantics and therefore functionality.

Reacting to insufficient anonymity

You can use the methods described in this article to compute a quantitative measure of anonymity. How should a user react to a warning that measured anonymity is falling below a certain threshold? The simple-minded solution would be for the middleware to stop divulging location information, but this causes applications to become unresponsive.

An alternative might be to reduce the size or number of the application zones in which a user has registered. This is hard to automate: either users will be bombarded with warnings, or they will believe they are still receiving service when they are not. In addition, removing an application zone does not instantaneously increase user anonymity; it just increases the size or number of available mix zones. The user must still visit them and mix with others before anonymity increases.

Reconciling privacy with application functionality is still, in general, an open problem.

Improving the models

It would be interesting to define anonymity sets and entropy measurements from the perspective of what a hostile observer can see and deduce, instead of counting the users in the mix zone. The formalization, which we have attempted, is rather more complicated than what is presented in this article, but it might turn out to provide a more accurate location privacy metric.

Dummy users

We could increase location privacy by introducing dummy users, similar to the way cover traffic is used in mix networks, but there are side effects when we apply this technique to mix zones. Serving dummy users, like processing dummy messages, is a waste of resources. Although the overhead might be acceptable in the realm of bits, the cost might be too high with real-world services such as those found in a pervasive computing environment. Dummy users might have to control physical objects—opening and closing doors, for example—or purchase services with electronic cash. Furthermore, realistic dummy user movements are much more difficult to construct than dummy messages.

Granularity

The effectiveness of mixing depends not only on the user population and on the mix zone's geometry, but also on the sensing system's update rate and spatial resolution. At very low spatial resolutions, any "location" will be a region as opposed to a point and so might be seen as something similar to a mix zone. What are the requirements for various location-based applications? How does their variety affect the design of the privacy protection system? Several location-based applications would not work at all if they got updates only on a scale of hours and kilometers, as opposed to seconds and meters.

Scalability

Our results are based on experimental data from our indoor location system, covering a relatively small area and user population. It would be interesting to apply the same techniques to a wider area and larger population—for example, on the scale of a city—where we expect to find better anonymity. We are currently trying to acquire access to location data from cellular telephony operators to conduct further experiments.

and pseudonymity originate from the work of David Chaum, whose pioneering contributions include two constructions for anonymous communications (the mix network6 and the dining cryptographers algorithm7) and the notion of anonymity sets.7 In this article, we introduce a new entity for location systems, the mix zone, which is analogous to a mix node in communication systems. Using the mix zone to model our spatiotemporal problem lets us adopt useful techniques from the anonymous communications field.

Mix networks and mix nodes

A mix network is a store-and-forward network that offers anonymous communication facilities. The network contains normal message-routing nodes alongside special mix nodes. Even hostile observers who can monitor all the links in the network cannot trace a message from its source to its destination without the collusion of the mix nodes.
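The unlinking a mix node provides can be illustrated with a toy batching step: accept a batch of equal-length packets and emit them in an order unrelated to arrival order, so an observer of both links cannot match inputs to outputs. This sketch is ours, not Chaum's specification, and it omits the layered encryption that real mix nodes also perform.

```python
import random

def mix_batch(packets, order="random"):
    """Toy mix-node step: reorder a batch of equal-length packets so that
    output order carries no information about input order. Layered
    encryption, essential in a real mix, is omitted here."""
    if len({len(p) for p in packets}) > 1:
        # Unequal sizes would let an observer correlate input and output.
        raise ValueError("packets must be equal length")
    out = list(packets)
    if order == "lexicographic":
        out.sort()
    else:
        random.shuffle(out)
    return out
```

Either reordering metric works because both depend only on packet contents or chance, never on arrival order; the equal-length check matters because packet size is itself a linkable feature.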




Figure 1. A sample mix zone arrangement with three application zones. The airline agency (A) is much closer to the bank (B) than the coffee shop (C). Users leaving A and C at the same time might be distinguishable on arrival at B.

In its simplest form, a mix node collects n equal-length packets as input and reorders them by some metric (for example, lexicographically or randomly) before forwarding them, thus providing unlinkability between incoming and outgoing messages. For brevity, we must omit some essential details about layered encryption of packets; interested readers should consult Chaum's original work. The number of distinct senders in the batch provides a measure of the unlinkability between the messages coming in and going out of the mix.

Chaum later abstracted this last observation into the concept of an anonymity set. As paraphrased by Andreas Pfitzmann and Marit Köhntopp, whose recent survey generalizes and formalizes the terminology on anonymity and related topics, the anonymity set is "the set of all possible subjects who might cause an action."8 In anonymous communications, the action is usually that of sending or receiving a message. The larger the anonymity set's size, the greater the anonymity offered. Conversely, when the anonymity set reduces to a singleton—the anonymity set cannot become empty for an action that was actually performed—the subject is completely exposed and loses all anonymity.

Mix zones

We define a mix zone for a group of users as a connected spatial region of maximum size in which none of these users has registered any application callback; for a given group of users there might be several distinct mix zones. In contrast, we define an application zone as an area where a user has registered for a callback. The middleware system can define the mix zones a priori or calculate them separately for each group of users as the spatial areas currently not in any application zone.

Because applications do not receive any location information when users are in a mix zone, the identities are "mixed." Assuming users change to a new, unused pseudonym whenever they enter a mix zone, applications that see a user emerging from the mix zone cannot distinguish that user from any other who was in the mix zone at the same time and cannot link people going into the mix zone with those coming out of it.

If a mix zone has a diameter much larger than the distance the user can cover during one location update period, it might not mix users adequately. For example, Figure 1 provides a plan view of a single mix zone with three application zones around the edge: an airline agency (A), a bank (B), and a coffee shop (C). Zone A is much closer to B than C, so if two users leave A and C at the same time and a user reaches B at the next update period, an observer will know the user emerging from the mix zone at B is not the one who entered the mix zone at C. Furthermore, if nobody else was in the mix zone at the time, the user can only be the one from A. If the maximum size of the mix zone exceeds the distance a user covers in one period, mixing will be incomplete. The amount to which the mix zone anonymizes users is therefore smaller than one might believe by looking at the anonymity set size. The two "Results" sections later in this article discuss this issue in more detail.

A quantifiable anonymity measure: The anonymity set

For each mix zone that user u visits during time period t, we define the anonymity set as the group of people visiting the mix zone during the same time period.

The anonymity set's size is a first measure of the level of location privacy available in the mix zone at that time. For example, a user might decide a total anonymity set size of at least 20 people provides sufficient assurance of the unlinkability of their pseudonyms from one application zone to another. Users might refuse to provide location updates to an application until the mix zone offers a minimum level of anonymity.

Knowing the average size of a mix zone's anonymity set, as opposed to its instantaneous value, is also useful. When a user registers for a new location-aware service, the middleware can calculate from historical data the average anonymity set size of neighboring mix zones and therefore estimate the level of location privacy available. The middleware can present users with this information before they accept the services of a new location-aware application.

There is an additional subtlety here: introducing a new application could result in more users moving to the new application zone. Therefore, a new application's average anonymity set might be an underestimate of the anonymity available because users who have not previously visited the new application zone's geographical area might venture there to gain access to the service.



Figure 2. Floor plan layout of the laboratory. Combinations of the labeled areas (north zone, south zone, west zone, east zone and stairwell, corridor, and hallway) form the mix zones z1, z2, and z3, described in the main text.

Experimental data

We applied the anonymity set technique to location data collected from the Active Bat system5 installed at AT&T Labs Cambridge, where one of us was sponsored and the other employed. Our location privacy experiments predate that lab's closure, so the data we present here refers to the original Active Bat installation at AT&T. The Laboratory for Communications Engineering at the University of Cambridge, our current affiliation, has since redeployed the system.

The system locates the position of bats: small mobile devices each containing an ultrasonic transmitter and a radio transceiver. The laboratory ceilings house a matrix of ultrasonic receivers and a radio network; the timing difference between ultrasound reception at different receivers lets the system locate bats with less than 3 cm error 95 percent of the time. While in use, typical update rates per bat are between one and 10 updates per second.

Almost every researcher at AT&T wore a bat to provide position information to location-aware applications and colleagues located within the laboratory. For this article, we examined location sightings of all the researchers, recorded over a two-week period between 9 a.m. and 5 p.m.—a total of more than 3.4 million samples. All the location sightings used in the experiment come from researchers wearing bats in normal use. Figure 2 provides a floor plan view of the building's first floor. The second- and third-floor layouts are similar.

Results

Using the bat location data, we simulated longer location update periods and measured the size of the anonymity sets for three hypothetical mix zones:

• z1: first-floor hallway
• z2: first-floor hallway and main corridor
• z3: hallway, main corridor, and stairwell on all three floors

We chose to exclude from our measure anonymity set sizes of zero occupancy, for the following reason. In general, a high number of people in the mix zone means greater anonymity and a lower number means less. However, this is only true for values down to one, which is the least-anonymous case (with the user completely exposed); it is not true for zero, when no users are present. While it is sensible to average anonymity set size values from one, it is rather meaningless to average values that also include zero.

Figure 3 plots the size of the anonymity set for the hallway mix zone, z1, for update periods of one second to one hour. A location update period of at least eight minutes is required to provide an anonymity set size of two. Expanding z1 to include the corridor yields z2, but the results (not plotted) for this larger zone do not show any significant increase in the anonymity level provided.

Mix zone z3, encompassing the main corridors and hallways of all three floors as well as the stairwell, considerably improves the anonymity set's cardinality. Figure 4 plots the number of people in z3 for various location update periods. The mean anonymity set size reaches two for a location update period of 15 seconds. This value is much better than the one obtained using mix zone z1, although it is still poor in absolute terms.

The anonymity set measurements tell us that our pseudonymity technique cannot give users adequate location privacy in this particular experimental situation (because of user population, resolution of the location system, geometry of the mix zones, and so on). However, two positive results counterbalance this negative one: First, we




Figure 3. Anonymity set size for z1 (maximum, mean, and minimum cardinality of the anonymity set, for location update periods from 1 s to 1 h).

have a quantitative measurement technique that lets us perform this assessment. Second, the same protection technique that did not work in the AT&T setting might fare better in another context, such as wide-area tracking of cellular phones.

Apart from its low anonymity set size, mix zone z3 presents other difficulties. The maximum walking speed in this area, calculated from bat movement data, is 1.2 meters per second. To move from one extreme of z3 to another, users require approximately 50 seconds. For update periods shorter than this value, we cannot consider users to be "mixed." Intuitively, it seems much more likely that a user entering the mix zone from an office on the third floor will remain on the third floor rather than move to an office on the first. Analysis of the data reveals this to be the case: in more than half the movements in this mix zone, users travel less than 10 meters. This means that the users in the mix zone are not all equal from the hostile observer's point of view: If I am in the mix zone with 20 other people, I might consider myself well protected. But if all the others are on the third floor while I am on the second, when I go in and out of the mix zone the observer will strongly suspect that those lonely pseudonyms seen on the second floor one at a time actually belong to the same individual. This discovery motivated the work we describe next.

Accounting for user movement

So far we have implicitly assumed the location of entry into a mix zone is independent of the location of exit. This in general is not the case, and there is a strong correlation between ingress position and egress position, which is a function of the mix zone's geography. For example, consider two people walking into a mix zone from opposite directions: in most cases people will continue walking in the same direction.

Anonymity sets do not model user entry and exit motion. Andrei Serjantov and George Danezis9 propose an information-theoretic approach to consider the varying probabilities of users sending and receiving messages through a network of mix nodes. We apply the same principle here to mix zones.

Figure 4. Anonymity set size for z3 (maximum, mean, and minimum cardinality of the anonymity set, for location update periods from 1 s to 1 h).

The crucial observation is that the anonymity set's size is only a good measure of anonymity when all the members of the set are equally likely to be the one of interest to the observer; this, according to Shannon's definition,10 is the case of maximal entropy. For a set of size n, if all elements are equiprobable, we need log2 n bits of information to identify one of them. If they are not equiprobable, the entropy will be lower, and fewer bits will suffice. We can precisely compute how many in the following way.

52 PERVASIVE computing http://computer.org/pervasive
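The gap between the equiprobable case and a skewed one can be illustrated with a short script. This is a toy sketch of our own: the four-member anonymity set and its probabilities are made up for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy h = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equiprobable members of an anonymity set need log2(4) = 2 bits
# of information to single one of them out.
uniform = [0.25, 0.25, 0.25, 0.25]

# A skewed set of the same size: the observer is already nearly sure,
# so far fewer bits remain to be learned.
skewed = [0.97, 0.01, 0.01, 0.01]

print(entropy_bits(uniform))           # 2.0
print(round(entropy_bits(skewed), 3))  # 0.242
```

Both sets have cardinality four, yet the skewed one offers only about a tenth of the anonymity that the set size alone would suggest.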


Figure 5. The movement matrix M records the frequency of movements through the hallway mix zone. Each element represents the frequency of movements from the preceding zone p, at time t − 1, through zone z, at time t, to the subsequent zone s, at time t + 1.

                        Subsequent zone, s
Preceding zone, p    East  North  South  West  Mix zone
Mix zone               96     47     66   101       814
West                  125     13     17     1        43
South                  29     55      7    22        30
North                  39      6     64    39        66
East                   24     47     74   176       162

Consider user movements through a mix zone z. For users traveling through z at time t, we can record the preceding zone, p, visited at time t − 1, and the subsequent zone, s, visited at time t + 1. The user doesn’t necessarily change zones on every location update period: the preceding and subsequent zones might all refer to the same mix zone z. Using historical data, we can calculate, for all users visiting zone z, the relative frequency of each pair of preceding and subsequent points (p, s) and record it in a movement matrix M. M records, for all possible (p, s) pairs, the number of times a person who was in z at t was in p at t − 1 and in s at t + 1. The entries of M are proportional to the joint probabilities, which we can obtain by normalization:

P(prec = p, subs = s) = M(p, s) / Σi,j M(i, j).

The conditional probability of coming out through zone s, having gone in through zone p, follows from the product rule:

P(subs = s | prec = p) = M(p, s) / Σj M(p, j).

We can now apply Shannon’s classic measure of entropy10 to our problem:

h = − Σi pi · log pi.

This gives us the information content, in bits, associated with a set of possible outcomes with probabilities pi. The higher it is, the more uncertain a hostile observer will be about the true answer, and therefore the higher our anonymity will be.

Results
We applied this principle to the same set of Active Bat movement data described earlier. We consider the hallway mix zone, z1, and define four hypothetical application zones surrounding it: east, west, north, and south (see Figure 2). Walls prevent user movement in any other direction. Figure 5 shows M, the frequency of all possible outcomes for users entering the hallway mix zone z1, for a location update period of five seconds. From Figure 5 we observe that users do not often return to the same mix zone they came from. In other words, it is unlikely that the preceding zone, p, is the same as the subsequent zone, s, in every case except a prolonged stay in the mix zone itself. It is quite common for users to remain in z1 for more than one location update period; the presence of a public computer in the hallway might explain this.

We can use the data from Figure 5 to determine how modeling user movements affects the anonymity level that the mix zone offers. As an example, consider two users walking in opposite directions—one east to west and the other west to east—passing through the initially empty hallway mix zone z1 in the same location update period. East and west are application zones, so the hostile observer will know that two users went into z1, one from east and another from west, and that later those users came out of z1, under new pseudonyms, one of them into east and the other into west. What the observer wants is to link the pseudonyms—that is, to find out whether they both went straight ahead or each made a U-turn. Without any a priori knowledge, the information content of this alternative is one full bit. We will now show with a numerical example how the knowledge encoded in movement matrix M lets the observer “guess” the answer with a chance of success that is better than random, formalizing the intuitive notion that the U-turn is the less likely option.

We are observing two users, each moving from a preceding zone p through the mix zone z1 to a subsequent zone s. If p and s are limited to {E, W}, we can reduce M to a 2 × 2 matrix M′. There are 16 possible cases, each representable by a four-letter string of Es and Ws: EEEE, EEEW, …, WWWW.
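The observer’s inference over M′ can be sketched numerically from the Figure 5 counts. The zone labels and variable names below are our own; the counts are those of Figure 5.

```python
import math

# Movement counts M(p, s) from Figure 5 for the hallway mix zone z1,
# restricted to the east and west application zones.
counts = {
    ("East", "East"): 24,  ("East", "West"): 176,
    ("West", "East"): 125, ("West", "West"): 1,
}

# Normalize the reduced 2x2 matrix M' into joint probabilities
# P(prec = p, subs = s), keyed by two-letter strings such as "EW".
total = sum(counts.values())
P = {p[0] + s[0]: n / total for (p, s), n in counts.items()}

# The observed event: the users entered from opposite sides and left
# on opposite sides -- EEWW, EWWE, WEEW or WWEE. Each four-letter case
# is the product of the two users' independent path probabilities.
p_observed = (P["EE"] * P["WW"] + P["EW"] * P["WE"] +
              P["WE"] * P["EW"] + P["WW"] * P["EE"])

# The uturn event (EEWW or WWEE) is a subset of the observed event.
p_uturn = P["EE"] * P["WW"] + P["WW"] * P["EE"]

# Conditional probability that both users made a U-turn, given the
# observation, and the entropy of this binary choice in bits.
p_cond = p_uturn / p_observed
h = -(p_cond * math.log2(p_cond) + (1 - p_cond) * math.log2(1 - p_cond))

print(round(p_observed, 3), round(p_cond, 3), round(h, 3))
```

Under these assumptions the script reproduces the values derived in the text: P(observed) ≈ 0.414, P(uturn | observed) ≈ 0.001, and an entropy of roughly 0.012 bits for the observer’s remaining uncertainty.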


The four characters in the string represent respectively the preceding and subsequent zone of the first user and the preceding and subsequent zone of the second user.

We identify two events of interest. The first, observed, corresponds to the situation we observed: two users coming into z1 from opposite directions and then again coming out of it in opposite directions. We have

observed = EEWW ∨ EWWE ∨ WEEW ∨ WWEE,

where ∨ means logical OR. The second event, uturn, corresponds to both users doing a U-turn:

uturn = EEWW ∨ WWEE.

We see that, in fact, the first event includes the second. We can easily compute the probabilities of these two events by summing the probabilities of the respective four-letter subcases. We obtain the probability of a four-letter subcase by multiplying together the probability of the paths taken by the two users. For example, we obtain the probability of EWWE by calculating P(EW) × P(WE). We obtain these terms, in turn, from the reduced movement matrix M′ by normalization. Numerically, P(observed) = 0.414. Similarly, P(uturn) = 0.0005.

As the hostile observer, what we want to know is who was who, which is formally expressible as the conditional probability of whether each did a U-turn given what we saw: P(uturn | observed). From the product rule, this is equal to P(uturn ∧ observed) / P(observed). This, because the second event is included in the first, reduces to P(uturn) / P(observed) = 0.001.

So the observer now knows that they either did a U-turn, with probability 0.1 percent, or they went straight, with probability 99.9 percent. The U-turn is extremely unlikely, so the information content of the outcome is not one bit, as it would be if the choices were equiprobable, but much less—because even before hearing the true answer we are almost sure that it will be “they both went straight ahead.” Numerically, the entropy of this choice is 0.012 bits. This value is much lower than 1 because the choices are not equiprobable, and a hostile observer can therefore unlink pseudonyms with much greater success than the mere anonymity set size would suggest. The entropy therefore gives us a tighter and more accurate estimate of the available anonymity.9

Technologies for locating and tracking individuals are becoming increasingly commonplace, and they will become more pervasive and ubiquitous in the future. Location-aware applications will have the potential to follow your every move, from leaving your house to visiting the doctor’s office, recording everything from the shelves you view in a supermarket to the time you spend beside the coffee machine at work. We must address the issue of protecting location information before the widespread deployment of the sensing infrastructure. Applications can be built or modified to use pseudonyms rather than true user identities, and this is one route toward greater location privacy. Because different applications can collude and share information about user sightings, users should adopt different pseudonyms for different applications. Furthermore, to thwart more sophisticated attacks, users should change pseudonyms frequently, even while being tracked.

Drawing on the methods developed for anonymous communication, we use the conceptual tools of mix zones and anonymity sets to analyze location privacy. Although anonymity sets provide a first quantitative measure of location privacy, this measure is only an upper-bound estimate. A better, more accurate metric, developed using an information-theoretic approach, uses entropy, taking into account the a priori knowledge that an observer can derive from historical data. The entropy measurement shows a more pessimistic picture—in which the user has less privacy—because it models a more powerful adversary who might use historical data to de-anonymize pseudonyms more accurately.

Applying the techniques we presented in this article to data from our Active Bat system demonstrated that, because the temporal and spatial resolution of the location data generated by the bats is high, location privacy is low, even with a relatively large mix zone. However, we anticipate that the same techniques will show a much higher degree of unlinkability between pseudonyms over a larger and more populated area, such as a city center, in the context of locating people through their cellular telephones.

ACKNOWLEDGMENTS
We thank our colleagues Andrei Serjantov, Dave Scott, Mike Hazas, Tom Kelly, and Gray Girling, as well as Tim Kindberg and the other IEEE Computer Society referees, for their thoughtful advice and comments. We also thank all of our former colleagues at AT&T Laboratories Cambridge for providing their movement data, as well as EPSRC and AT&T Laboratories Cambridge for financial support.

REFERENCES

1. S. Warren and L. Brandeis, “The Right to Privacy,” Harvard Law Rev., vol. 4, no. 5, Dec. 1890, pp. 193–200.

2. United Nations, Universal Declaration of Human Rights, General Assembly Resolution 217 A (III), 1948; www.un.org/Overview.

3. D. Banisar and S. Davies, Privacy and Human Rights, www.gilc.org/privacy/survey.

4. F. Stajano, Security for Ubiquitous Computing, John Wiley & Sons, New York, 2002.

5. A. Ward, A. Jones, and A. Hopper, “A New Location Technique for the Active Office,” IEEE Personal Communications, vol. 4, no. 5, Oct. 1997, pp. 42–47.

6. D. Chaum, “Untraceable Electronic Mail, Return Addresses and Digital Pseudonyms,” Comm. ACM, vol. 24, no. 2, 1981, pp. 84–88.

7. D. Chaum, “The Dining Cryptographers Problem: Unconditional Sender and Recipient Untraceability,” J. Cryptology, vol. 1, no. 1, 1988, pp. 66–75.

8. A. Pfitzmann and M. Köhntopp, “Anonymity, Unobservability and Pseudonymity—A Proposal


for Terminology,” Designing Privacy Enhancing Technologies: Proc. Int’l Workshop Design Issues in Anonymity and Observability, LNCS, vol. 2009, Springer-Verlag, Berlin, 2000, pp. 1–9.

9. A. Serjantov and G. Danezis, “Towards an Information Theoretic Metric for Anonymity,” Proc. Workshop on Privacy Enhancing Technologies, 2002; www.cl.cam.ac.uk/users/aas23/papers_aas/set.ps.

10. C.E. Shannon, “A Mathematical Theory of Communication,” Bell System Tech. J., vol. 27, July 1948, pp. 379–423, and Oct. 1948, pp. 623–656.

the AUTHORS

Alastair Beresford is a PhD student at the University of Cambridge. His research interests include ubiquitous systems, computer security, and networking. After completing a BA in computer science at the University of Cambridge, he was a researcher at BT Labs, returning to study for a PhD in the Laboratory for Communications Engineering in October 2000. Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; arb33@cam.ac.uk.

Frank Stajano is a faculty member at the University of Cambridge’s Laboratory for Communication Engineering, where he holds the ARM Lectureship in Ubiquitous Computing. His research interests include security, mobility, ubicomp, scripting languages, and middleware. He consults for industry on computer security and he is the author of Security for Ubiquitous Computing (Wiley, 2002). Contact him at the Laboratory for Communication Engineering, Univ. of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK; fms27@cam.ac.uk; www-lce.eng.cam.ac.uk/~fms27.

For more information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.

