
Reinforcement Learning based Smart Data Agent for Location Privacy

Harkeerat Kaur, Rohit Kumar, and Isao Echizen

Abstract A “privacy by design” approach is presented for location management on a smartphone that is based on the concept of “smart data”. Extensive use of location-based services has given rise to various privacy concerns, such as the processing and selling of personal information. The proposed work addresses these concerns by developing an intelligent agent that controls the release of the user’s information in accordance with the user’s preferences, the context, and the nature of the situation. The proposed model uses reinforcement learning to provide an on-the-go smart data agent that learns the user’s privacy policy in a manner that is interactive and adaptive, enabling it to adjust itself to changes in user preferences over time. The agent aims to mimic the user, deciding when and where exact location information should be revealed, and evolving over time to become the user’s single trusted virtual proxy in cyberspace, managing its interactions with various applications in different environments.

1 Introduction

The digital world now hosts a complete palette of IoT and web-based applications, such as smart homes, smart cars, smart wearables, and smart locks, that are coupled with various online services for shopping, renting, education, insurance, etc. To avail themselves of these value-added services, users must often submit extensive personal information, including their preferences, behaviours, eating habits, and social status. Moreover, location information, another type of personal information, is increasingly, inadvertently, and sometimes covertly being captured by location-based services (LBS).

Harkeerat Kaur
Indian Institute of Technology Jammu, India, e-mail: harkeerat.kaur@iitjammu.ac.in
Rohit Kumar
Indian Institute of Technology Jammu, India, e-mail: 2017ucs0054@iitjammu.ac.in
Isao Echizen
National Institute of Informatics, Tokyo, Japan, e-mail: iechizen@nii.ac.jp


Various mobile applications for route mapping, recommendation, weather and traffic prediction, and even non-LBS applications like music and entertainment continuously capture user location data, which is then used to monitor user movement patterns and places of interest. A present-day smartphone user often faces a prompt like “allow *** [app] to access your location even when you are not using it”. Unless permission is granted, the user cannot use the app. There is thus a strong need for technological solutions that integrate “privacy by design” with IoT frameworks so that a user can enjoy the benefits of using such services while retaining control over the extent to which the information can be shared.
A model is proposed here using the concept of “smart data”, i.e., data that can “think for itself” [5, 19], to provide location data privacy. It requires the development of an intelligent web-based agent that acts as a virtual proxy for the user in cyberspace, controlling the release of data in accordance with the user’s instructions and/or preferences. Modelling the design of an agent that mimics the user’s privacy behaviour is the most important and challenging aspect of this approach. This challenge is met by the development of a reinforcement learning (RL) based adaptive and automated “smart data agent” model. In comparison with supervised and unsupervised learning, RL is based on dynamic mapping and learning by interaction. The proposed agent model interacts with the environment in order to understand how to map a situation and to decide the appropriate action to take to maximize the total possible reward. The ability to learn dynamically in a human-like way has led to RL being used to develop adaptive software programs [13]. To the best of our knowledge, the proposed approach is the first step towards an “intelligent data agent” specific to each user, supporting the user’s privacy preferences learned through reinforcement learning.
The work is organized as follows. The location privacy problem is described and existing approaches are reviewed in Section 2. The RL formulation of the proposed smart data agent model is discussed in Section 3. The simulation results are presented and discussed in Section 4. The key points are summarized in Section 5.

2 Location Privacy and Related Works

Location privacy essentially refers to a person’s control over the storing, sharing, and dissemination of their location information [1]. Several types of mechanisms have been proposed for managing users’ location information.
Centralized third-party mechanism: In this approach, a centralized anonymizer or a third party acts as a trusted agent between the user and the LBS. The anonymizer cloaks the actual location of the user within a set of locations known as a “cloaking region” [3]. In the basic K-anonymity approach, the anonymizer first encloses the actual location within a set of K − 1 similar locations and then forwards the query to the LBS [2]. Several improvements, such as combining K-anonymity with L-diversity [10], distributed K-anonymity [22], and (K,T)-anonymity [11], have been proposed. However, the major issues are the reliability of the anonymizer and its susceptibility to attack.
The need to maintain a sufficient number of users may cause expansion of the cloaking regions, which can degrade quality of service (QoS) and increase service time.
Collaborative mechanism: In this approach, a group of users collaborate to hide their location information [18, 17]. It requires peer-to-peer connections of location-aware devices connected over a wireless infrastructure (Bluetooth, Wi-Fi) that can communicate with each other. The main working principle is that if a peer in the network has location-specific information fetched from the LBS, it can share that information with any peer seeking it, so the requesting peer does not have to submit its actual location. The devices can also use these locations for collaborative k-anonymization [18], obfuscation [6, 8], etc. The major challenges with this approach are the presence of neighbours, the probability of their having similar queries, their willingness to cooperate, and network viability for such operations [15]. Geographical location and population density play a very important role here, and privacy cannot always be guaranteed.
User-centric mechanism: Here the user itself endeavours to protect its location instead of relying on peers or third parties. The location is distorted by cloaking, enlarging, or reducing the target region. A basic way to obfuscate is to distort on the basis of geometry [12]. Another way to protect the actual location is to submit fake or dummy queries to the LBS directly rather than having a centralized anonymizer server generate them. However, if the locations for the fake or dummy queries are not carefully selected, the attacker may be able to infer the actual location. Historical proximity and kernel transformation are other approaches in this direction [4, 21].
Blockchain mechanism: In this recently developed approach, a framework is created that uses multiple private blockchains to eliminate direct contact between users and an LBS. Transactions are used to record the user’s location on a blockchain, and incentivized mining is used to query an LBS and return results to users [14, 9].
While various approaches have been proposed, the most important concern remains the same, i.e., the restricted choice of using either the original or a distorted location. A map app, for example, gets permission to access the exact location for navigation, which it may misuse to record the user’s location even when the app is not in use. Furthermore, a user’s personality also affects their privacy preferences. A college-going teenager would likely be fine with revealing their exact location to a social networking LBS, whereas doing so would be a matter of great concern for a medical practitioner and unimaginable for military personnel. Even the teenager’s behaviour may change if they spend hours on that app and do not want their exact location recorded throughout.
Until now, there has been no provision for letting a user decide with whom they want to share information, when, and to what extent. This motivates the design of an intelligent model that first analyses the environment and then predicts the amount of privacy needed. It is cognizant of the user’s needs and adapts to changes accordingly over time. We recently proposed a primitive model of such a data agent in [7]. It used supervised learning on statically recorded input-output training data to predict privacy behaviour with a simple neural network architecture. However, it was cumbersome to analyse all possible states and provide a scalable solution suitable for all possible scenarios. Our newly proposed model is an advanced, reconfigurable, self-evolving, and more practical version that uses RL for interactive behaviour to mimic a proxy agent sensitive to the user’s personal choices, app behaviours, and context.

3 Proposed Smart Data Agent RL Formulation

Fig. 1 (a) Smart data agent concept; (b) states, actions, and rewards in the setup.

Smart Data: The smart data concept allows data to “think for themselves” by transforming themselves into different “active forms” in accordance with the user’s preferences, the content, the context, and the nature of the interacting environment, with the help of an agent [19, 5]. Data are instructed how to expose or transform themselves to the interacting environment. The proposed model extends the above definition of the smart data concept to facilitate the creation of smart location data. As illustrated in Fig. 1(a), the original data, i.e., the coordinates of the user’s actual location, are at the center. Instead of directly revealing themselves (coordinates x, y) to the interacting environment (apps), the data pass through a transformation/cloaking layer. This process is supported by the agent, which senses the environment and the interacting apps (e.g., Maps & Navigation, Social, Food & Health) and associates different actions (distortion ranges) with different situations.
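As a minimal sketch of this layering (all class and method names are illustrative; the paper does not prescribe an API), the agent can be viewed as a proxy that holds the true coordinates and answers every location request through the cloaking step:

# Minimal sketch of the layering in Fig. 1(a): the true coordinates stay with the
# agent, and every app request passes through a transformation/cloaking step.
# All names here are illustrative and not prescribed by the paper.
class SmartDataAgent:
    def __init__(self, true_location):
        self.true_location = true_location          # (lat, lon), never exposed directly

    def handle_location_request(self, app_category, context_word):
        action = self.decide_action(app_category, context_word)   # RL policy (Section 3)
        return self.distort(self.true_location, action)           # cloaked coordinates

    def decide_action(self, app_category, context_word):
        return 0 if context_word else 2             # placeholder for the learned policy

    def distort(self, location, action):
        lat, lon = location
        return location if action == 0 else (lat + 0.002, lon + 0.002)  # placeholder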
Problem Formulation: The goal is to learn policies by testing different actions and automatically learning which one has the most rewarding outcome for a given situation [20]. This sets up the basis for a contextual bandit RL formalism. In our contextual bandit formalism, we need to capture the user’s preferences in our agent’s decision making. Thus, there is a need for human input to evaluate the agent’s behaviour. The natural question to ask is therefore: how would a person evaluate an agent’s behaviour? A natural way to capture such knowledge is from an object’s group behaviour, the object’s specific behaviour, the object’s query goodness, and the agent’s action. Fig. 1(b) outlines the basic setup of the RL agent. The agent is a representative of the “user preferences,” and the object is an “app querying for location”. This notion of capturing user preferences and evaluation guides us in forming the elements of the problem, i.e., the states, actions, and rewards.
1. States: States are the input to the agent and should be mapped in accordance with the user’s evaluation system. Three properties describe the object’s/app’s state:
• Object’s group behaviour : app tag (category)
• Object’s specific behaviour : usage frequency level (low, medium, high)
• Object’s query goodness : context (present or absent)
Table 1 Apps, Tags and Contexts

Apps                                | Tags              | Context Words
Ola, Uber, Maps, Sygic              | Maps & Navigation | book, ride, go, look, find, way, map, guide, locate
SnapChat, Facebook, Instagram       | Social            | check-in, live, tag, locate, share
Starbucks, Baskin Robbins, Dominos  | Food & Drinks     | shop, order, book, deliver, address, locate
Urban Clap                          | Home Services     | book, order, address, locate
Chrome                              | Communication     | find, search, nearby
Book My Show                        | Entertainment     | nearby, locate, maps
PicArt, Reading                     | Others            | locate, find

a) App Tags: The agent begins by crawling the web and/or app stores to obtain initial tags for the installed apps. An online tool such as the Google Play scraper is available [16]. After scraping the Google Play store, we identified seven major categories, as shown in Table 1.
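For example, a category tag can be fetched with the google-play-scraper package cited as [16]; a brief sketch (the package ID used here is illustrative):

# Sketch: fetching an app's category ("genre") tag with the google-play-scraper
# package [16]. The package ID below is illustrative; any installed app's ID works.
from google_play_scraper import app

details = app("com.ubercab")          # query the Google Play store entry
tag = details.get("genre", "Others")  # e.g. "Maps & Navigation"
print(details["title"], "->", tag)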
b) Usage Frequency: The usage frequency (UF) corresponds to the number of times location data is accessed within a certain interval of time. It is used to determine the level of interaction and is computed for each app as:

UF_app = (No. of times location accessed in TIME_BANDWIDTH) / (TIME_BANDWIDTH in seconds).    (1)

It is updated every TIME_BANDWIDTH (e.g., every minute or every hour). Due to the tabular formulation of our problem, we discretize UF_app into three categories (low, medium, and high) using threshold values:

UF_app^category = { low,    if UF_app < LOW_UF;
                    medium, if LOW_UF ≤ UF_app < HIGH_UF;    (2)
                    high,   if UF_app ≥ HIGH_UF.
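For illustration, Eqs. (1) and (2) translate directly into a few lines of Python (a minimal sketch; the threshold values follow Algorithm 3 and are otherwise configurable):

# Sketch of Eqs. (1)-(2): compute and discretise an app's usage frequency.
TIME_BANDWIDTH = 20                     # seconds, as in Algorithm 3
THRESHOLD_LOW, THRESHOLD_MID = 0.2, 0.4 # illustrative discretisation thresholds

def usage_frequency(access_count, time_bandwidth=TIME_BANDWIDTH):
    # Eq. (1): location accesses per second within the time bandwidth
    return access_count / time_bandwidth

def uf_category(uf):
    # Eq. (2): discretise UF into low / medium / high
    if uf < THRESHOLD_LOW:
        return "low"
    elif uf < THRESHOLD_MID:
        return "medium"
    return "high"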
c) Context Words: Context words are obtained by inputting, for each app category, words that are relevant or that may require the user’s precise location (Table 1).
2. Actions: A set of valid location distortion ranges is defined as actions:
• A0 ⇒ no distortion
• A1 ⇒ distort by a random number in range (a1_start, a1_end), e.g., 0–250 mt
• A2 ⇒ distort by a random number in range (a2_start, a2_end), e.g., 250–500 mt
• A3 ⇒ distort by a random number in range (a3_start, a3_end), e.g., >500 mt
The number of actions and the range of distortion can be set by the user or defined
by default, thus giving great flexibility.
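One way the GET NEW LOCATION step of Algorithm 1 could realize these ranges is sketched below (the random-bearing offset and the metre-to-degree conversion are our assumptions, not specified by the paper):

# Sketch: apply a distortion action by moving the true location a random
# distance (within the action's range, in metres) in a random direction.
# The metre-to-degree conversion is approximate and only for illustration.
import math, random

ACTIONS = {0: (0, 0), 1: (0, 250), 2: (250, 500), 3: (500, 1000)}  # metres

def get_new_location(lat, lon, action):
    low, high = ACTIONS[action]
    distance = random.uniform(low, high)
    bearing = random.uniform(0, 2 * math.pi)
    dlat = (distance * math.cos(bearing)) / 111_320                       # metres per degree latitude
    dlon = (distance * math.sin(bearing)) / (111_320 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

Calling get_new_location(lat, lon, 0) returns the exact coordinates, matching action A0.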
3. Rewards: A non-stationary reward system is needed for the agent, with reward R ∈ {0, 1}, where 0 is negative feedback and 1 is positive feedback. It is non-stationary because human preferences may change over time.
Fig. 2 Main operations performed by the agent: sensing, prediction, and distortion.

The environment is specified by a tuple M = (S, A, R), where S stands for state, A for action, and R for reward. The first step of the algorithm requires sensing the environment by analysing the state of the interacting object (“app querying for location”). The state is defined by extracting its app category (seven here), the level of its usage frequency (three here, UF_app^category = low, medium, high), and whether query context is present or absent (Table 1). Thus, the total number of possible states S is (app categories × UF_app^category levels × context), which is 42 (7 × 3 × 2) here. Let there be four possible actions, A0 to A3. The total number of possible combinations of state-action pairs can then be defined by a matrix with dimensions (no. of states × no. of actions), i.e., 42 × 4 here. This matrix is known as the Q-matrix, and the agent maintains the value of each (state, action) tuple in it. This setup is flexible and can be similarly configured for different numbers of app categories, UF_app^category levels, and actions. Figure 2 depicts the main operations performed by the agent.
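For the default sizes above (7 categories × 3 UF levels × 2 context flags × 4 actions), the Q-matrix and reward table can be held in small arrays; a minimal sketch (array-based storage and the helper name state_index are our assumptions):

# Sketch: tabular Q-matrix and reward table for 7 app categories,
# 3 UF levels, 2 context flags, and 4 actions (42 states x 4 actions).
import numpy as np

N_CATEGORIES, N_UF_LEVELS, N_CONTEXT, N_ACTIONS = 7, 3, 2, 4

Q = np.zeros((N_CATEGORIES, N_UF_LEVELS, N_CONTEXT, N_ACTIONS))
REWARDTABLE = np.full_like(Q, np.nan)   # NaN marks "no human feedback yet"

def state_index(app_category, uf_cat, context_present):
    """Map a formatted query q' to an index into the first three Q dimensions."""
    return (app_category, uf_cat, int(context_present))

A NaN entry in REWARDTABLE plays the role of the NULL entries of Algorithm 2: encountering one triggers a request for human feedback, as in Algorithm 6.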
Algorithm 1 describes the basic flow of the various steps that the model takes while learning to record data and make decisions. A reward table with dimensions similar to those of the Q-matrix records the user feedback for each state-action pair. Algorithm 2 initializes various important parameters, including the app categories, context words, UF categories, and the number of actions and their distortion ranges. A log file (LOGFILE) is maintained to record each app’s behaviour and act as metadata for computing the current value of UF; it has a data entry for each app under the keys APP_CATEGORY, UF_COUNTS, PREV_UF, and UF_CAT.
Algorithm 1 PRIVACY PREDICTION()

1: INITIALISATION()
2: for each query q do
3:    q’ = FORMULATE STANDARD QUERY(q)
4:    action = AGENT DECISION MAKING(q’)
5:    new location = GET NEW LOCATION(action)
6:    reward = GET REWARD(q’, action)
7:    UPDATE AGENT(q’, action, reward)
8:    UPDATE LOGS(q)
9: end for
10: return new location
UF_UPDATE() in Algorithm 3 is considered to be a concurrent process that continuously updates the log file after each TIME_BANDWIDTH unit. The UF is computed as a weighted average of the app’s current usage frequency (CURRENT_UF) and its previous usage frequency (PREV_UF). A weight factor α is used to ensure that the behaviour transitions are smooth.

Algorithm 2 INITIALISATION()
1: Initialise VALID APP CATEGORIES = [‘maps&navigation’, ‘social’, ‘food&drinks’, etc.] // list of valid app categories as shown in Table 1
2: Initialise VALID CONTEXT WORDS for each category
3: Initialise No ACTIONS = 4
4: Initialise VALID ACTIONS = A0: (start = 0, end = 0), A1: (start = 0, end = 250 mt), A2: (start = 250 mt, end = 500 mt), A3: (start = 500 mt, end = 1000 mt)
5: Initialise VALID UF CATS = [’low’, ’mid’, ’high’] // a list of valid UF categories
6: Initialise matrix Q of dimensions (app categories, UF_app^category levels, context, actions) with zeros
7: Initialise matrix REWARDTABLE with the same shape as matrix Q, each entry = NULL
8: An empty LOGFILE // a log file maintains the logs necessary for calculation
9: OUR SMART DATA AGENT IS AWARE OF THE TRUE LOCATION.

Algorithm 3 UF UPDATE( )
1: Initialise
TIME BANDWIDTH = 20 // in seconds
α = 0.9 // contribution factor of current frequency in weighted average
THRESHOLD LOW = 0.2 ,THRESHOLD MID = 0.4 //thresholds to discretise the UF
2: while True do
3: logs = readLogFile(LOGFILE)
4: for each app’s Entry in logs do
5: get PREV UF // previous usage frequency; if not present then it is 0
6: UPDATE PREV UF with UF
7: get UF COUNTS // frequency counts in previous TIME BANDWIDTH period
8: CURRENT UF = UF COUNTS/ TIME BANDWIDTH
9: UF = α ∗ CURRENT UF + ( 1 - α ) ∗ PREV UF
10: UF CAT = 0 if UF < THRESHOLD LOW
11: UF CAT = 1 if THRESHOLD LOW ≤ UF < THRESHOLD MID
12: UF CAT = 2 if UF ≥ THRESHOLD MID
13: end for
14: end while

The originating query q is simple, carrying only the app name and the context words needed to make an LBS request. Algorithm 4 determines the goodness of a query (whether the calling context word is present or absent in the user-defined set) and its category of interaction using UF_CAT (low, medium, high). It outputs a formatted query q’ as input to the decision-making Algorithm 5. Algorithm 5 follows the epsilon-greedy approach to balance exploration with exploitation: the agent takes a random action with probability ε (exploration) and, with probability 1 − ε, the action with the highest reward value suggested by the REWARDTABLE and Q-matrix (exploitation).

Algorithm 4 FORMULATE STANDARD QUERY(q)

Input: original query q
Output: formatted query q’
// template for input query q
q = { appName: “string”, appCategory: “string”, contextWord: “string” }
// template for output query q’
q’ = { appName: “string”, appCategory: “numeric”, contextWord: “binary”, UF CAT: “numeric” }
1: logs = read LOGFILE
2: q’[appName] = q[appName]
3: q’[appCategory] = get index of q[appCategory] in VALID APP CATEGORIES // if appCategory not in VALID APP CATEGORIES, the index of ”other” is used
4: q’[contextWord] = 1 if q[contextWord] in VALID CONTEXT WORDS else 0
5: q’[UF CAT] = read logs for UF CAT
6: return q’

Algorithm 5 AGENT DECISION MAKING()

1: Initialize ε = 0.1
2: return random action with prob = ε
3: return greedy action w.r.t. Q values with prob = 1 − ε, with ties broken randomly
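A compact transcription of Algorithm 5 (a sketch; the NumPy-based tie-breaking is our choice and not prescribed by the paper):

# Sketch of Algorithm 5: epsilon-greedy action selection over the Q-matrix,
# with ties among equally good actions broken randomly.
import numpy as np

def agent_decision_making(Q, state, epsilon=0.1):
    q_values = Q[state]                                 # Q-values of all actions in this state
    if np.random.random() < epsilon:                    # explore
        return np.random.randint(len(q_values))
    best = np.flatnonzero(q_values == q_values.max())   # exploit, break ties randomly
    return np.random.choice(best)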

Algorithm 6 GET REWARD( q’, action)


Input: (q’, action)
Output: reward
1: Initialize ϕ = 0.01 // probability with which to ask for human feedback
2: if REWARDTABLE does not have a reward corresponding to (q’, action) then
3: fill REWARDTABLE with human feedback
4: return lookup REWARDTABLE value corresponding to ( q’, action )
5: else
6: ask human for feedback with probability ϕ
7: return lookup in REWARDTABLE with probability (1-ϕ )
8: end if

Algorithm 7 UPDATE AGENT()


Input: q’, action , reward
1: STEP SIZE = 0.9
2: Q[q’, action] = Q[q’, action] + STEP SIZE ∗ (reward − Q[q’, action])

Algorithm 8 UPDATE LOGS()


1: appName, appCategory = get from q
2: if there is no entry for appName in LOGFILE then
3: SET default values of PREV UF = 0 , UF COUNTS = 0 , UF CAT = 0, APP CATEGORY
= appCategory
4: end if
5: logs = read LOGFILE for data corresponding to appName
6: logs [ UF COUNTS] = logs[ UF COUNTS] + 1
7: logs [ APP CATEGORY ] = appCategory
8: write logs back to LOGFILE
Algorithm 6 collects user feedback for assigning rewards or penalties. Algorithms 5 and 6 together enable continuous user feedback recording and behavioural adaptation.
Algorithm 7 is the agent update mechanism, which is based on the principle that NewEstimate = OldEstimate + step_size ∗ (target − OldEstimate). Increasing or decreasing the step size ∈ [0, 1] may affect the convergence. Since our problem has a non-stationary distribution, i.e., the distortion preferences and user feedback change over time, a constant step size is used. The logs need to be updated after each action, as in Algorithm 8.
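In code, the constant-step-size update of Algorithm 7 amounts to a one-line increment (a sketch; the function name is illustrative, and Q is the array from the earlier sketch):

# Sketch of Algorithm 7: incremental update with a constant step size,
# NewEstimate = OldEstimate + step_size * (target - OldEstimate).
STEP_SIZE = 0.9   # constant step size suits the non-stationary reward distribution

def update_agent(Q, state, action, reward, step_size=STEP_SIZE):
    idx = (*state, action)
    Q[idx] += step_size * (reward - Q[idx])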

4 Simulation Results

Agent location distortion behaviour was simulated for two users (A and B) with different personalities and professional backgrounds.
1. User A is a male university graduate who is not much concerned about location sharing. He is fine with sharing his exact location with any app as long as the UF is low. He chooses four actions with distortion ranges [A0: 0–0 mt, A1: 0–250 mt, A2: 250–500 mt, A3: 500–1000 mt]. The agent initially begins exploration and records feedback whenever it encounters a state that does not have any value in the Q-matrix. Its interactions for two app categories are presented below.
A. Interaction with app belonging to category “Maps & Navigation”
Assume that the user identifies the following words as belonging to this category: contextwords = [book, ride, go, look, find, way, map, guide, locate]. The user wants the original location to be shared in the presence of context, irrespective of the UF. To simulate the app’s behaviour, we make the following settings:
Probability of non-contextual location access = 0.1
Probability of contextual location access = (1 − 0.1)
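Such an app can be simulated by a small query generator that emits a contextual query with probability 0.9 and a non-contextual one otherwise (a sketch; the word list follows Table 1, and the function and field names are illustrative):

# Sketch: simulate a "Maps & Navigation" app that issues a contextual location
# query with probability 0.9 and a non-contextual query otherwise.
import random

CONTEXT_WORDS = ["book", "ride", "go", "look", "find", "way", "map", "guide", "locate"]
P_NON_CONTEXTUAL = 0.1

def simulate_query(app_name="taxi"):
    if random.random() < P_NON_CONTEXTUAL:
        context_word = ""                       # background access, no context
    else:
        context_word = random.choice(CONTEXT_WORDS)
    return {"appName": app_name,
            "appCategory": "maps&navigation",
            "contextWord": context_word}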
The state-action mapping as explored by the agent is given below:
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: present, ‘UF’: any of [low, mid, high] ⇒ preferred Action = A0
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: absent, ‘UF’: [low] ⇒ preferred Action = A0
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: absent, ‘UF’: any of [mid, high] ⇒ preferred Action = A2/A3 (medium to high distortion)
Fig. 3 User: A, app: ‘taxi’; (a)-(b) mapping of states and actions, (c)-(d) corresponding distortion.

The agent’s interaction with a ‘taxi’ app is visualized in Fig. 3(a) and (b), where the x-axis depicts the i-th interaction. The state at each interaction is depicted by a green dot and a pink bar, for which the values are shown on the y-axis. The green dot at value 1 indicates the presence of context, and at value 0 its absence. The UF category is depicted by the pink bar, for which the value is set to 0 for ‘low’, 1 for ‘medium’, and 2 for ‘high’. The user feedback is recorded using a red star: value 1 indicates when the agent receives feedback (exploration) and value 0 indicates when it does not (exploitation). The action (A0, A1, A2, or A3) taken at a particular state is numbered above the pink bar. The distortion corresponding to each i-th interaction is plotted in Fig. 3(c) and (d).
At interaction i0, the agent encounters the first state (context present, UF low), so it explores by taking a random action A1 and records the feedback. The feedback tells the agent that the action was wrong. Thus, when the same state appears again, it takes another action, A0, and on recording the feedback finds it to be appropriate. It exploits this choice in the following interactions that have the same state. However, with probability ϕ it attempts random actions to continue exploration, at interactions i5, i10, and i15, as defined in Algorithm 6. By interaction i100, the agent has seen mid and high UF states and has learned the user’s choices: no distortion in the presence of context, otherwise distortion using A2/A3 in the range 250–500/500–1000 mt. Since these distortions apply when context is absent, the app receives distorted locations if it frequently collects location in the background.
B. Interaction with app belonging to category “Social”
Let the app name be ‘chat’, the contextwords be [check-in, live, tag, locate, share], and the state-action mapping as explored under the following settings:
Probability of non-contextual location access = 0.6
Probability of contextual location access = (1 − 0.6)
• State = ‘appCategory’: Social, ‘contextWord’: present, ‘UF’: any of [low, mid, high] ⇒ preferred Action = A0
• State = ‘appCategory’: Social, ‘contextWord’: absent, ‘UF’: low ⇒ preferred Action = A0
• State = ‘appCategory’: Social, ‘contextWord’: absent, ‘UF’: mid ⇒ preferred Action = A1 (slight distortion)
• State = ‘appCategory’: Social, ‘contextWord’: absent, ‘UF’: high ⇒ preferred Action = A2 (high distortion)

Fig. 4 User: A, app: ‘chat’; (a)-(b) mapping of states and actions, (c)-(d) corresponding distortion.

As shown in Fig. 4(a) and (c), during the initial interactions the agent learns not to distort (A0) when UF is low, whether context is present or absent. Around i18 it learns to distort by A1 (0–250 mt) if UF is mid and context is absent. Later, around i80 (Fig. 4(b) and (d)), it learns A2 (250–500 mt) when UF is high, while exploration continues.
2. User B: She is a professional working in a government agency and is highly concerned about revealing her location. She chooses six actions with distortion ranges [A0: 0–0 mt, A1: 0–300 mt, A2: 300–600 mt, A3: 600–1000 mt, A4: 1000–5000 mt, A5: ≥ 5000 mt]. She needs to use apps in the maps and navigation category and shares her original location only in the presence of context; otherwise, only a distorted location is shared, even if the UF is low. She uses her social networking app to communicate with friends and family but without revealing her exact location, even in the presence of context.
A. Interaction with app belonging to category “Maps & Navigation”
We assume that the context words and simulation probabilities are the same as for this category above. The state-action mapping as explored by the agent is:
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: present, ‘UF’: any of [low, mid, high] ⇒ preferred Action = A0
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: absent, ‘UF’: [low] ⇒ preferred Action = A1 (slight distortion)
• State = ‘appCategory’: Maps & Navigation, ‘contextWord’: absent, ‘UF’: any of [mid, high] ⇒ preferred Action = A2/A3 (higher distortion)

Fig. 5 User: B, app: ‘taxi’; (a)-(b) mapping of states and actions, (c)-(d) corresponding distortion.
As shown in Fig. 5(a) and (c), during the initial interactions the UF is low and the agent takes random actions. From user feedback, it learns not to distort (A0) when context is present. Even when context is absent, it continues exploring to learn the correct actions. At around interaction i160 (Fig. 5(b) and (d)), it learns that if UF is mid/high and context is absent, it should distort by 300–600 or 600–1000 mt (A2/A3).
B. Interaction with app belonging to category “Social”
The state-action mapping as explored by the agent is given below:
• State = ‘appCategory’: Social, ‘contextWord’: present, ‘UF’: any of [low, mid, high] ⇒ preferred Action = A1 (slight distortion)
• State = ‘appCategory’: Social, ‘contextWord’: absent, ‘UF’: any of [low, mid, high] ⇒ preferred Action = A2/A3/A4 (high distortion)
Figure 6(a) and (c) shows the initial interactions i7 to i17, when UF is low and the actions learned are A1 when context is present and A2 when context is absent. At around interaction i20, UF transitions to the mid range and actions continue to be explored. Around interactions i95–i113, the agent has learned to take action A1 in the presence of context for any UF and actions A2/A3/A4 otherwise.¹

¹ A complete video demonstration of the above behaviours can be found at https://github.com/rohitdavas/Smart-data-2-An-RL-agent.
Fig. 6 User: B, app: ‘chat’; (a)-(b) mapping of states and actions, (c)-(d) corresponding distortion.

5 Conclusions

Agent location distortion behaviour was simulated for two users (A and B) with different personalities for two types of applications. The simulation demonstrated the agent’s ability to adapt to different numbers of distortion actions, distortion ranges, and context words. After a few interactions, the agent was able to learn the desired policies through an interaction and reward mechanism. The epsilon-greedy approach and continuous feedback estimation let users change their policy settings over time. Although simulations were run for only two users, the agent’s behaviour was found to be similar for other user settings as well. Overall, the proposed model behaves as a desired cyber proxy and is a one-stop solution for managing various apps on a user’s smartphone. It enables users to decide with whom to share their location and in what manner while enjoying the value-added services provided by location-based services. The proposed model is a practical application of ‘privacy by design’ for handling location privacy and can be deployed on smartphones to make users aware of, and in control of, their location revelation.

Acknowledgements

This work was partially supported by JSPS KAKENHI Grants JP16H06302 and
JP18H04120, and by JST CREST Grants JPMJCR18A6 and JPMJCR20D3, Japan.

References

1. Bettini, C., Jajodia, S., Samarati, P., Wang, S.X.: Privacy in location-based applications: re-
search issues and emerging trends, vol. 5599. Springer Science & Business Media (2009)
2. Gedik, B., Liu, L.: Protecting location privacy with personalized k-anonymity: Architecture
and algorithms. IEEE Transactions on Mobile Computing 7(1), 1–18 (2007)
3. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and
temporal cloaking. In: First International conference on Mobile systems, applications and
services, pp. 31–42 (2003)
4. Guo, X., Wang, W., Huang, H., Li, Q., Malekian, R.: Location privacy-preserving method
based on historical proximity location. Wireless Communications and Mobile Computing
2020 (2020)
5. Harvey, I., Cavoukian, A., Tomko, G., Borrett, D., Kwan, H., Hatzinakos, D.: Smartdata.
Springer (2013)
6. Hashem, T., Kulik, L.: Safeguarding location privacy in wireless ad-hoc networks. In: Inter-
national Conference on Ubiquitous Computing, pp. 372–390. Springer (2007)
7. Kaur, H., Echizen, I., Kumar, R.: Smart data agent for preserving location privacy. In: 2020
IEEE Symposium Series on Computational Intelligence (SSCI), pp. 2567–2575. IEEE (2020)
8. Le, T., Echizen, I.: Lightweight collaborative semantic scheme for generating an obfuscated
region to ensure location privacy. In: IEEE International Conference on Systems, Man, and
Cybernetics (SMC), pp. 2844–2849. IEEE (2018)
9. Li, B., Liang, R., Zhu, D., Chen, W., Lin, Q.: Blockchain-based trust management model for
location privacy preserving in vanet. IEEE Transactions on Intelligent Transportation Systems
(2020)
10. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: Privacy be-
yond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1),
3–es (2007)
11. Masoumzadeh, A., Joshi, J.: An alternative approach to k-anonymity for location-based ser-
vices. Procedia Computer Science 5, 522–530 (2011)
12. Mokbel, M.F.: Privacy in location-based services: State-of-the-art and research directions. In:
International Conference on Mobile Data Management, pp. 228–228. IEEE (2007)
13. Palm, A., Metzger, A., Pohl, K.: Online reinforcement learning for self-adaptive information
systems. In: International Conference on Advanced Information Systems Engineering, pp.
169–184. Springer (2020)
14. Qiu, Y., Liu, Y., Li, X., Chen, J.: A novel location privacy-preserving approach based on
blockchain. Sensors 20(12), 3519 (2020)
15. Santos, F., Humbert, M., Shokri, R., Hubaux, J.P.: Collaborative location privacy with rational
users. In: International Conference on Decision and Game Theory for Security, pp. 163–181.
Springer (2011)
16. Scraper: https://pypi.org/project/google-play-scraper/ (accessed May 14, 2020)
17. Shokri, R., Papadimitratos, P., Theodorakopoulos, G., Hubaux, J.P.: Collaborative location pri-
vacy. In: 2011 IEEE Eighth International Conference on Mobile Ad-Hoc and Sensor Systems,
pp. 500–509. IEEE (2011)
18. Takabi, H., Joshi, J.B., Karimi, H.A.: A collaborative k-anonymity approach for location pri-
vacy in location-based services. In: 2009 5th International Conference on Collaborative Com-
puting: Networking, Applications and Worksharing, pp. 1–9. IEEE (2009)
19. Tomko, G.J., Borrett, D.S., Kwan, H.C., Steffan, G.: Smartdata: Make the data think for itself.
Identity in the Information Society 3(2), 343–362 (2010)
20. Wiering, M., Van Otterlo, M.: Reinforcement learning, vol. 12. Springer (2012)
21. Zhang, L., Song, G., Zhu, D., Ren, W., Xiong, P.: Location privacy preservation through kernel
transformation. Concurrency and Computation: Practice and Experience p. e6014 (2020)
22. Zhong, G., Hengartner, U.: A distributed k-anonymity protocol for location privacy. In:
IEEE International Conference on Pervasive Computing and Communications, pp. 1–10. IEEE
(2009)
