Reinforcement Learning Based Smart Data Agent For Location Privacy
1 Introduction
The digital world now hosts a complete palette of IoT and web-based applications, such as smart homes, smart cars, smart wearables, and smart locks, that are coupled with various online services for shopping, renting, education, insurance, etc. To avail themselves of these value-added services, users must often submit extensive personal information, including their preferences, behaviours, eating habits, and social status. Moreover, location information, another type of personal information, is routinely collected by such applications and services.
Harkeerat Kaur
Indian Institute of Technology Jammu, India e-mail: harkeerat.kaur@iitjammu.ac.in
Rohit Kumar
Indian Institute of Technology Jammu, India e-mail: 2017ucs0054@iitjammu.ac.in
Isao Echizen
National Institute of Informatics Tokyo, Japan e-mail: iechizen@nii.ac.jp
Location privacy essentially refers to a person's control over the storing, sharing, and dissemination of their location information [1]. Several types of mechanisms have been proposed for managing users' location information.
Centralized third-party mechanism: In this approach, a centralized anonymizer or a third party acts as a trusted agent between the user and the location-based service (LBS). The anonymizer cloaks the actual location of the user within a set of locations known as a "cloaking region" [3]. In the basic K-anonymity approach, the anonymizer first encloses the actual location within a set of K − 1 similar locations and then forwards the query to the LBS [2]. Several improvements, such as K-anonymity with L-diversity [10], distributed K-anonymity [22], and (K,T)-anonymity [11], have been proposed. However, the major issues are the reliability of the anonymizer and the risk of it being attacked.
The need to maintain a sufficient number of users may cause expansion of the cloaking regions, which can degrade quality of service (QoS) and increase service time.
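For intuition, a toy cloaking step might look like the following sketch. This is not the cited systems' algorithm; the function name, square region shape, and coordinates are illustrative assumptions:

```python
import random

def cloak(lat, lon, k=5, radius_deg=0.01, seed=None):
    """Hide the true coordinate among k-1 dummies drawn from a
    square cloaking region centred on it (toy K-anonymity sketch)."""
    rng = random.Random(seed)
    candidates = [(lat + rng.uniform(-radius_deg, radius_deg),
                   lon + rng.uniform(-radius_deg, radius_deg))
                  for _ in range(k - 1)]
    candidates.append((lat, lon))   # the true location is one of k
    rng.shuffle(candidates)         # its position in the list reveals nothing
    return candidates

region = cloak(32.73, 74.87, k=5, seed=42)
assert len(region) == 5 and (32.73, 74.87) in region
```

The QoS trade-off mentioned above shows up directly here: a larger `k` or `radius_deg` strengthens anonymity but widens the region the LBS must answer for.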
Collaborative mechanism: In this approach, a group of users collaborate to hide their location information [18, 17]. It requires peer-to-peer connections between location-aware devices on a wireless infrastructure (Bluetooth, Wi-Fi) so that they can communicate with each other. The main working principle is that if a peer in the network has location-specific information fetched from the LBS, it can share it with any peer seeking the same information, so the latter does not have to submit its actual location. The devices can also use these locations for collaborative K-anonymization [18], obfuscation [6, 8], etc. The major challenges with this approach are the presence of neighbours, the probability of their having similar queries, their willingness to cooperate, and network viability for such operations [15]. Geographical location and population density play a very important role here, and privacy cannot always be guaranteed.
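The sharing principle can be illustrated with a toy peer cache. The class, query format, and cache policy below are hypothetical illustrations, not a protocol from [18, 17]:

```python
# Toy peer-to-peer cache: a peer that already holds an LBS answer for a
# query serves it to neighbours, so they never contact the LBS directly.
class Peer:
    def __init__(self, name):
        self.name, self.cache = name, {}

    def fetch(self, query, neighbours, lbs):
        for p in neighbours:                 # ask peers first
            if query in p.cache:
                return p.cache[query], "peer"
        result = lbs(query)                  # fall back to the LBS,
        self.cache[query] = result           # exposing the peer's own location
        return result, "lbs"

lbs = lambda q: f"results for {q}"
a, b = Peer("A"), Peer("B")
a.fetch("cafes near park", [b], lbs)         # A must query the LBS itself
answer, source = b.fetch("cafes near park", [a], lbs)
assert source == "peer"                      # B is served by A's cache
```

The challenges listed above map onto this sketch directly: if no neighbour holds the answer (or refuses to serve it), the fallback to the LBS exposes the requester.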
User-centric mechanism: Here, users themselves endeavour to protect their location instead of relying on peers or third parties. The location is distorted by cloaking, enlarging, or reducing the target region. A basic way to obfuscate is to distort the location on the basis of geometry [12]. Another way to protect the actual location is to submit fake or dummy queries to the LBS directly rather than having a centralized anonymizer server generate them. However, if the locations for the fake or dummy queries are not carefully selected, an attacker may be able to infer the actual location. Historical proximity and kernel transformation are other approaches in this direction [4, 21].
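A minimal geometric distortion of this kind can be sketched as follows, assuming a flat-Earth metre-to-degree approximation that is adequate at city scale (the function and constants are illustrative, not the method of [12]):

```python
import math, random

def obfuscate(lat, lon, r_min_m, r_max_m, rng=random):
    """Shift a coordinate by a uniformly random bearing and a distance
    in [r_min_m, r_max_m) metres (simple geometric distortion)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    d = rng.uniform(r_min_m, r_max_m)
    # ~111,320 m per degree of latitude; longitude shrinks with cos(lat)
    dlat = (d * math.cos(theta)) / 111_320.0
    dlon = (d * math.sin(theta)) / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Note that a naive uniform distance is exactly the kind of pattern an attacker can average out over many queries, which is why careful selection of dummy locations matters.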
Blockchain mechanism: In this recently developed approach, a framework is created that uses multiple private blockchains to remove direct contact between users and an LBS. Transactions are used to record the user's location on a blockchain, and incentivized mining is used to query the LBS and return results to users [14, 9].
While various approaches have been proposed, the most important concern remains the same: the restricted choice of using either the original or a distorted location. A map app, for example, gets permission to access the exact location for navigation, which it may misuse to record the user's location even when the app is not in use. Furthermore, a user's personality also affects their privacy preferences. A college-going teenager would likely be fine with revealing their exact location to a social networking LBS, while doing so would be a matter of great concern to a medical practitioner, and unimaginable for military personnel. The same teenager's behaviour may change if they spend hours on that app and may not like having their exact location recorded throughout.
Until now, there has been no provision for letting a user decide with whom they want to share information, when, and to what extent. This motivates the design of an intelligent model that first analyses the environment and then predicts the amount of privacy needed. It is cognizant of the user's needs and adapts to changes accordingly over time. We recently proposed a primitive model of such a data agent in [7]. It uses supervised learning on statically recorded input-output training data to predict privacy behaviour with a simple neural network architecture. However, it was cumbersome to analyse all possible states and give a scalable solution suitable for all possible scenarios. Our newly proposed model is an advanced, reconfigurable, self-evolving, and more practical version that uses reinforcement learning (RL) for interactive behaviour to mimic a proxy agent sensitive to the user's personal choices, app behaviours, and context.
Fig. 1 (a) Smart data agent concept; (b) Depicting state, actions, and rewards in setup.
Smart Data: The smart data concept allows data to "think for themselves" by transforming themselves into different "active forms" in accordance with the user's preferences, the content, the context, and the nature of the interacting environment, with the help of an agent [19, 5]. Data are instructed how to expose or transform themselves to the interacting environment. The proposed model extends this definition of the smart data concept to facilitate the creation of smart location data. As illustrated in Fig. 1(a), the original data, i.e., the coordinates of the user's actual location, are at the center. Instead of directly revealing themselves (coordinates x, y) to the interacting environment (apps), the data pass through a transformation/cloaking layer. This process is supported by the agent, which senses the environment and the interacting apps (e.g., Maps & Navigation, Social, Food & Health) and associates different actions (distortion ranges) with different situations.
Problem Formulation: The goal is to learn policies by testing different actions and automatically learning which one has the most rewarding outcome in a given situation [20]. This sets up the basis for the contextual bandit RL formalism. In our contextual bandit formulation, we need to capture the user's preferences in the agent's decision making. Thus, there is a need for human input to evaluate the agent's behaviour. The natural question to ask, therefore, is how would a person evaluate an agent's behaviour? A natural way to capture such knowledge is from an object's group behaviour, the object's specific behaviour, the object's query goodness, and the agent's action. Fig. 1(b) outlines the basic setup of the RL agent. The agent is a representative of "user preferences," and the object is an "app querying for location". This notion of capturing user preference and evaluation guides us in forming the elements of the problem, i.e., the states, actions, and rewards.
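Under this formulation the bookkeeping is light: a state is an observable triple and each state-action cell stores a feedback-derived value, with no future-state bootstrapping as in full RL. The sketch below is illustrative; the names and the choice of storing the latest feedback directly are our assumptions, not the chapter's exact tables:

```python
from collections import defaultdict

# A state is (app_category, uf_cat, context_present); actions index the
# distortion ranges. Each state-action cell holds the last user feedback.
ACTIONS = [0, 1, 2, 3]
Q = defaultdict(lambda: [0.0] * len(ACTIONS))

def observe(app_category, uf_cat, context_present):
    return (app_category, uf_cat, context_present)

def update(state, action, reward):
    Q[state][action] = reward      # contextual bandit: no next-state term

def best_action(state):
    return max(ACTIONS, key=lambda a: Q[state][a])

s = observe("social", "low", True)
update(s, 0, +1.0)                 # user approved "no distortion"
assert best_action(s) == 0
```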
1. States: States are the input to the agent and should be mapped in accordance with the user's evaluation system. Three properties describe an object's/app's state:
• Object's group behaviour: app tag (category)
for the following keys: APP_CATEGORY, UF_COUNTS, PREV_UF, and UF_CAT. UF_UPDATE() in Algorithm 3 is considered to be a concurrent process that continuously updates the log file after each TIME_BANDWIDTH unit. The UF is computed as a weighted average of the app's current usage frequency (CURRENT_UF) and previous usage frequency (PREV_UF). A weight factor α is used to ensure that behaviour transitions are smooth.
Algorithm 2 INITIALISATION()
1: Initialise VALID_APP_CATEGORIES = ['maps&navigation', 'social', 'food&drinks', etc.] // list of valid app categories as shown in Table 1
2: Initialise VALID_CONTEXT_WORDS for each category
3: Initialise No_ACTIONS = 4
4: Initialise VALID_ACTIONS = A0: (start = 0, end = 0), A1: (start = 0, end = 250 mt), A2: (start = 250 mt, end = 500 mt), A3: (start = 500 mt, end = 1000 mt)
5: Initialise VALID_UF_CATS = ['low', 'mid', 'high'] // list of valid UF categories
6: Initialise matrix Q of dimensions (app categories, UF categories, context, actions) with zeros
7: Initialise matrix REWARD_TABLE with the same shape as matrix Q, each entry = NULL
8: Create an empty LOGFILE // the log file maintains the logs necessary for calculation
9: The smart data agent is aware of the true location.
Algorithm 3 UF_UPDATE()
1: Initialise
   TIME_BANDWIDTH = 20 // in seconds
   α = 0.9 // contribution factor of the current frequency in the weighted average
   THRESHOLD_LOW = 0.2, THRESHOLD_MID = 0.4 // thresholds to discretise the UF
2: while True do
3:   logs = readLogFile(LOGFILE)
4:   for each app's entry in logs do
5:     get PREV_UF // previous usage frequency; 0 if not present
6:     get UF_COUNTS // frequency counts in the previous TIME_BANDWIDTH period
7:     CURRENT_UF = UF_COUNTS / TIME_BANDWIDTH
8:     UF = α ∗ CURRENT_UF + (1 − α) ∗ PREV_UF
9:     update PREV_UF with UF // carried into the next period
10:    UF_CAT = 0 if UF < THRESHOLD_LOW
11:    UF_CAT = 1 if THRESHOLD_LOW ≤ UF < THRESHOLD_MID
12:    UF_CAT = 2 if UF ≥ THRESHOLD_MID
13:  end for
14: end while
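The update rule in Algorithm 3 can be written as runnable Python. The constants and key names follow the pseudocode; the dict-based log entry is an illustrative stand-in for the actual LOGFILE format:

```python
TIME_BANDWIDTH = 20          # seconds
ALPHA = 0.9                  # weight of the current frequency
THRESHOLD_LOW, THRESHOLD_MID = 0.2, 0.4

def uf_update(entry):
    """entry holds UF_COUNTS for the last window and PREV_UF (0 if new)."""
    prev_uf = entry.get("PREV_UF", 0.0)
    current_uf = entry["UF_COUNTS"] / TIME_BANDWIDTH
    uf = ALPHA * current_uf + (1 - ALPHA) * prev_uf
    entry["PREV_UF"] = uf                    # carried into the next window
    if uf < THRESHOLD_LOW:
        entry["UF_CAT"] = 0                  # low
    elif uf < THRESHOLD_MID:
        entry["UF_CAT"] = 1                  # mid
    else:
        entry["UF_CAT"] = 2                  # high
    return entry

e = uf_update({"UF_COUNTS": 10})             # 10 accesses in a 20 s window
assert abs(e["PREV_UF"] - 0.45) < 1e-9 and e["UF_CAT"] == 2
```

Because α = 0.9 weights the current window heavily, UF_CAT reacts quickly to bursts of access while the 0.1 share of PREV_UF still smooths single-window spikes.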
The originating query q is simple, carrying only the app name and the context words needed to make an LBS request. Algorithm 4 determines the goodness of a query (whether the calling context word is present or absent in the user-defined set) and its category of interaction using UF_CAT (low, mid, high). It outputs a formatted query q′ as input to decision-making Algorithm 5. Algorithm 5 follows the epsilon-greedy approach to balance exploration with exploitation. It instructs the agent to take a random action with probability ε (exploration) and the best action (the one with the highest expected reward) otherwise (exploitation).
4 Simulation Results
Agent location distortion behaviour was simulated for two users (A and B) with different personalities and professional backgrounds.
1. User A is a male university graduate who is not much concerned about location sharing. He is fine with sharing his exact location with any app as long as the UF is low. He chooses the number of actions to be 4, with distortion ranges [A0: 0–0 mt, A1: 0–250 mt, A2: 250–500 mt, A3: 500–1000 mt]. The agent initially begins with exploration and records feedback whenever it encounters a state that does not have any value in the Q matrix. Its interactions with two app categories are presented below.
A. Interaction with an app belonging to the category "Maps & Navigation"
Assume that the user identifies the following words as belonging to this category: contextwords = [book, ride, go, look, find, way, map, guide, locate]. The user wants the original location to be shared in the presence of context and irrespective of the UF. To simulate an app's behaviour, we make the following settings:
Probability of non-contextual location access = 0.1
Probability of contextual location access = (1 − 0.1)
Then the state-action mapping as explored by the agent is given below.
• State = 'appCategory': Maps & Navigation, 'contextWord': present, 'UF': any of [low, mid, high] ⇒ preferred action = A0
• State = 'appCategory': Maps & Navigation, 'contextWord': absent, 'UF': [low] ⇒ preferred action = A0
• State = 'appCategory': Maps & Navigation, 'contextWord': absent, 'UF': any of [mid, high] ⇒ preferred action = A2/A3 (medium to high distortion)
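This learned mapping amounts to a small policy table. A sketch, with user A's action ranges hard-coded purely for illustration:

```python
# User A's distortion ranges (metres), indexed by action, and the learned
# preference for "Maps & Navigation" apps written as a lookup.
RANGES = {0: (0, 0), 1: (0, 250), 2: (250, 500), 3: (500, 1000)}

def preferred_action(context_present, uf_cat):
    if context_present:
        return 0              # exact location whenever a context word is seen
    if uf_cat == "low":
        return 0
    return 2 if uf_cat == "mid" else 3   # medium to high distortion

assert preferred_action(True, "high") == 0
assert preferred_action(False, "low") == 0
assert RANGES[preferred_action(False, "mid")] == (250, 500)
```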
The agent's interaction with a 'taxi' app is visualized in Fig. 3 (a) and (b), where the x-axis depicts the ith interaction. The state at each interaction is depicted by a green dot and a pink bar, whose values are shown on the y-axis. A green dot at value 1 indicates the presence of context and at value 0 its absence. The UF category is depicted by the pink bar, whose value is set to 0 for 'low', 1 for 'medium', and 2 for 'high'. The user feedback is recorded using a red star: value 1 indicates when the agent receives feedback (exploration), and value 0 indicates when it does not (exploitation). The action (A0, A1, A2, or A3) taken in a particular state is numbered above the pink bar. The distortion corresponding to each ith interaction is plotted in Fig. 3 (c) and (d).

Fig. 3 User: A, app: 'taxi' (a)-(b) mapping of states and actions (c)-(d) corresponding distortion.
At interaction i0, the user's first state appears, with context present and UF low, so the agent explores by taking a random action A1 and records the feedback. The feedback tells the agent that the action was wrong. Thus, when the same state appears again, it takes another action, A0, and on recording feedback finds it to be appropriate. It exploits this choice in the following iterations that have the same state. However, with probability ε it attempts random actions to continue exploration, as at interactions i5, i10, and i15, as defined in Algorithm 6. By interaction i100, the agent has seen mid and high UF states and has learned the user's choices: no distortion in the presence of context, and otherwise distortion using A2/A3 in the ranges 250–500/500–1000 mt. Since these distortions apply when context is absent, they will yield distorted locations if the app frequently collects the location in the background.
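The explore/exploit dynamic described above can be reproduced in a few lines. The reward values and ε below are illustrative, not the chapter's Algorithm 6; the point is that disapproved actions are quickly weeded out and the approved action A0 comes to dominate:

```python
import random

# Epsilon-greedy over a single state: wrong actions receive reward -1 from
# simulated user feedback, the right action (A0) receives +1.
EPSILON, ACTIONS = 0.2, [0, 1, 2, 3]
q = {a: None for a in ACTIONS}               # None = never tried

def step(rng):
    untried = [a for a in ACTIONS if q[a] is None]
    if untried or rng.random() < EPSILON:    # explore
        a = rng.choice(untried or ACTIONS)
    else:                                    # exploit best known action
        a = max(ACTIONS, key=lambda x: q[x])
    q[a] = 1.0 if a == 0 else -1.0           # simulated user feedback
    return a

rng = random.Random(0)
history = [step(rng) for _ in range(50)]
assert q[0] == 1.0                           # the approved action was found
```

Keeping ε > 0 after convergence is what lets the agent notice if the user's preferences later change, at the cost of an occasional suboptimal action.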
B. Interaction with an app belonging to the category "Social"
Let the app name be 'chat', the contextwords be [check-in, live, tag, locate, share], and the state-action mapping as explored under the following settings:
Probability of non-contextual location access = 0.6
Probability of contextual location access = (1 − 0.6)
• State = 'appCategory': Social, 'contextWord': present, 'UF': any of [low, mid, high] ⇒ preferred action = A0
Fig. 4 User: A, app: 'chat' (a)-(b) mapping of states and actions (c)-(d) corresponding distortion.
In Fig. 4 (a) and (c), during the initial interactions the agent learns not to distort (A0) when the UF is low, whether context is present or absent. Around i18, it learns to distort by A1 (0–250 mt) if the UF is mid and context is absent. Later, around i80, in Fig. 4 (b) and (d), it learns to use A2 (250–500 mt) when the UF is high, while continuous exploration remains in process.
2. User B: She is a professional working in a government agency and is highly concerned about revealing her location. She chooses the number of actions to be 6, with distortion ranges [A0: 0–0 mt, A1: 0–300 mt, A2: 300–600 mt, A3: 600–1000 mt, A4: 1000–5000 mt, A5: ≥ 5000 mt]. She needs to use apps in the maps and navigation category and shares her original location only in the presence of context; otherwise, only a distorted location is shared, even if the UF is low. She uses her social networking app to communicate with friends and family but without revealing her exact location, even in the presence of context.
A. Interaction with an app belonging to the category "Maps & Navigation"
We assume that the context words and simulation probabilities are the same as for the category above. The state-action mapping as explored by the agent is
• State = 'appCategory': Maps & Navigation, 'contextWord': present, 'UF': any of [low, mid, high] ⇒ preferred action = A0
Fig. 5 User: B, app: ‘taxi’ (a)-(b) mapping of states and actions (c)-(d) corresponding distortion.
Fig. 6 User: B, app: 'chat' (a)-(b) mapping of states and actions (c)-(d) corresponding distortion.
5 Conclusions
Agent location distortion behaviour was simulated for two users (A and B) with different personalities for two types of applications. The simulation demonstrated the agent's ability to adapt to different numbers of distortion actions, distortion ranges, and context words. After a few interactions, the agent was able to learn the desired policies through an interaction and reward mechanism. The epsilon-greedy approach and continuous feedback estimation let users change their policy settings over time. Although simulations were run for only two users, the agent's behaviour was found to be similar for other user settings as well. Overall, the proposed model behaves as a desired cyber proxy and is a one-stop solution for managing various apps on a user's smartphone. It enables users to decide with whom to share their location and in what manner while still enjoying the value-added services provided by location-based services. The proposed model is a practical application of 'privacy by design' for handling location privacy and can be deployed on smartphones to make users aware of, and in control of, their location revelation.
Acknowledgements
This work was partially supported by JSPS KAKENHI Grants JP16H06302 and
JP18H04120, and by JST CREST Grants JPMJCR18A6 and JPMJCR20D3, Japan.
References
1. Bettini, C., Jajodia, S., Samarati, P., Wang, S.X.: Privacy in location-based applications: re-
search issues and emerging trends, vol. 5599. Springer Science & Business Media (2009)
2. Gedik, B., Liu, L.: Protecting location privacy with personalized k-anonymity: Architecture
and algorithms. IEEE Transactions on Mobile Computing 7(1), 1–18 (2007)
3. Gruteser, M., Grunwald, D.: Anonymous usage of location-based services through spatial and
temporal cloaking. In: First International conference on Mobile systems, applications and
services, pp. 31–42 (2003)
4. Guo, X., Wang, W., Huang, H., Li, Q., Malekian, R.: Location privacy-preserving method
based on historical proximity location. Wireless Communications and Mobile Computing
2020 (2020)
5. Harvey, I., Cavoukian, A., Tomko, G., Borrett, D., Kwan, H., Hatzinakos, D.: Smartdata.
Springer (2013)
6. Hashem, T., Kulik, L.: Safeguarding location privacy in wireless ad-hoc networks. In: Inter-
national Conference on Ubiquitous Computing, pp. 372–390. Springer (2007)
7. Kaur, H., Echizen, I., Kumar, R.: Smart data agent for preserving location privacy. In: 2020
IEEE Symposium Series on Computational Intelligence (SSCI), pp. 2567–2575. IEEE (2020)
8. Le, T., Echizen, I.: Lightweight collaborative semantic scheme for generating an obfuscated
region to ensure location privacy. In: IEEE International Conference on Systems, Man, and
Cybernetics (SMC), pp. 2844–2849. IEEE (2018)
9. Li, B., Liang, R., Zhu, D., Chen, W., Lin, Q.: Blockchain-based trust management model for
location privacy preserving in vanet. IEEE Transactions on Intelligent Transportation Systems
(2020)
10. Machanavajjhala, A., Kifer, D., Gehrke, J., Venkitasubramaniam, M.: l-diversity: Privacy be-
yond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1),
3–es (2007)
11. Masoumzadeh, A., Joshi, J.: An alternative approach to k-anonymity for location-based ser-
vices. Procedia Computer Science 5, 522–530 (2011)
12. Mokbel, M.F.: Privacy in location-based services: State-of-the-art and research directions. In:
International Conference on Mobile Data Management, pp. 228–228. IEEE (2007)
13. Palm, A., Metzger, A., Pohl, K.: Online reinforcement learning for self-adaptive information
systems. In: International Conference on Advanced Information Systems Engineering, pp.
169–184. Springer (2020)
14. Qiu, Y., Liu, Y., Li, X., Chen, J.: A novel location privacy-preserving approach based on
blockchain. Sensors 20(12), 3519 (2020)
15. Santos, F., Humbert, M., Shokri, R., Hubaux, J.P.: Collaborative location privacy with rational
users. In: International Conference on Decision and Game Theory for Security, pp. 163–181.
Springer (2011)
16. Google Play Scraper: https://pypi.org/project/google-play-scraper/ (accessed May 14, 2020)
17. Shokri, R., Papadimitratos, P., Theodorakopoulos, G., Hubaux, J.P.: Collaborative location pri-
vacy. In: 2011 IEEE Eighth International Conference on Mobile Ad-Hoc and Sensor Systems,
pp. 500–509. IEEE (2011)
18. Takabi, H., Joshi, J.B., Karimi, H.A.: A collaborative k-anonymity approach for location pri-
vacy in location-based services. In: 2009 5th International Conference on Collaborative Com-
puting: Networking, Applications and Worksharing, pp. 1–9. IEEE (2009)
19. Tomko, G.J., Borrett, D.S., Kwan, H.C., Steffan, G.: Smartdata: Make the data think for itself.
Identity in the Information Society 3(2), 343–362 (2010)
20. Wiering, M., Van Otterlo, M.: Reinforcement learning, vol. 12. Springer (2012)
21. Zhang, L., Song, G., Zhu, D., Ren, W., Xiong, P.: Location privacy preservation through kernel
transformation. Concurrency and Computation: Practice and Experience p. e6014 (2020)
22. Zhong, G., Hengartner, U.: A distributed k-anonymity protocol for location privacy. In:
IEEE International Conference on Pervasive Computing and Communications, pp. 1–10. IEEE
(2009)