User Behavior Prediction in the “Offline” Smart Home Solutions

Valery Milykh Dmitry Vavilov Ivan Platonov Alexander Anisimov


CTO Project Manager Developer Developer
Quarta Technologies T-Systems Billetkontoret A/S IT Dept. State University
Moscow, Russia St-Petersburg, Russia St-Petersburg, Russia St-Petersburg, Russia
v.milykh@quarta.ru Dmitry.Vavilov@yahoo.com ivan.platonov@ticket2travel.ru alexander.m.anisimov@gmail.com

Abstract: The spread of Smart Home products is restrained for several reasons, including poor usability and fears about personal data security [1,2].

Our research team suggested Smart Home solutions based on user behavior analysis and modeling [3]. We also studied the application of this approach to different householder needs (such as healthcare or imitation of the user’s presence [4]). We showed that the approach also critically improves the usability of the Smart Home. Finally, we described the methodology for analysis of the householder’s behavior and simulation of his or her activities.

So “offline” solutions based on this approach protect privacy and provide good usability (“offline” here means partly or completely disconnected from external operating signals and observations). At the same time, they have some disadvantages, including relatively low quality of predictions [5].

In this paper we discuss how tuning the parameters of the previously suggested methodology improves the prediction of user needs. The results of initial tests of this approach are presented and analyzed.

Index Terms: Internet of Things, Smart Home, user behavior analysis and simulation, personal data security, recommender system.

I. INTRODUCTION

At least a dozen productized Smart Home solutions were presented at the IFA 2015 Exhibition in Berlin last September. They are already available to consumers in the USA and EU countries. However, several significant obstacles slow down their adoption, and fear for personal data security remains one of the most important. According to research [1], “consumers are more worried about privacy and security issues than any other potential downsides of the Internet of Things, with 53 percent expressing concern that their data might be shared without their knowledge or approval, and 51 percent expressing concern that their data could be hacked by other users”.

For example, widespread deployment of smart devices connected to some centralized data storage has broadened the scope of people who have access to logs of activities of thousands and even millions of users. As Smart Grid analytics report, “a utility in a major metropolitan area installed uni-directional smart meters that simply watched power usage on a "real time" basis. The CEO of the utility said that even though they weren't looking for personal information, it quickly became apparent that the utility could tell when people woke up, went to work, showered, cooked breakfast or dinner, and when they were on vacation.” [2]

So the user can require that managerial inputs, the schedule for turning appliances on and off, and information from sensors not be visible to external observers.

Another barrier is that the number of Smart Home components noticeably increases each year. Manual management of the appropriate parameters becomes a time- and effort-consuming process. It requires discipline from users. In reality, an ordinary user gives up such routine activities very soon. As a result, the major part of the functionality is not exploited. Poor usability has become one more considerable obstacle to Smart Home expansion.

Our research team suggests a methodology that helps to resolve all the issues mentioned. In our previous research papers [3] we described the reference model for the software responsible for analysis of Smart Home user behavior and simulation of the user’s activities in automatic mode.

Input data for this model should be gathered from all available sources (information from sensors or the user’s explicit directions), and the results of the analysis will be used for all possible goals (including but not limited to security and healthcare topics, energy consumption optimization, and so on). The software will provide recommendations without any interaction with external resources, so both the personal data on the user’s activities and the recommendations themselves will not leave the Smart Home perimeter.

At the same time, such an “offline” or “disconnected” Smart Home has some disadvantages. For example, the quality of user-need predictions made by a standalone recommender system is known to be lower than that of collaborative ones [5].

We considered earlier how to improve the quality of the forecast of the householder’s activities by taking into account the following assumptions:

1) Cyclicality of the Smart Home related user activities (at least daily and weekly cycles);

978-1-5090-2957-0/16/$31.00 ©2016 IEEE


2) The forgetting factor in probabilistic calculations (the probability of similar user actions is higher if the time interval in days between them is shorter);

3) Duration and start time of the user’s activity as a quantitative and qualitative estimation of his or her needs and preferences.

Below, the results of initial tests of the approach are discussed. We studied how to optimize different parameters of the suggested model, for example, what kind of dependencies should be used for the calculations corresponding to the second and third assumptions (see [3] for details).

Another important parameter of the algorithm that we discuss below is the size of the “rolling window” of data (usually measured in days). Every time the predicted Smart Home equipment action is recalculated, the oldest data is removed from consideration, while the most recent data is added. Different “rolling window” sizes are recommended for various goals. In General Electric (GE) research [6], 45 days are recommended for an algorithm for automatic detection of abnormally long periods of inactivity at home (a suitable rolling window must be short enough to adapt to changes in resident behaviour, but long enough to avoid an undue level of false alerts). At the same time, for many Smart Home appliances 10 days or even fewer are enough to create the schedule of the predicted user’s activities.

Finally, we explore how often we should gather data from the sensors to provide a reliable forecast. The sensors should not react to minor motions, because the log should not contain “white noise” data. However, the time intervals between two successive records in the log should not be too long (the Smart Home should simulate the user’s activities relatively flexibly and quickly). For example, if the air conditioner automatically changes modes with a one-hour step, it may lead to constant manual corrections and a bad consumer experience.

II. TASK AND TEST DATA DESCRIPTION

We consider user behaviour prediction for the following “general” Smart Home task: proactive setup of a comfortable climate in the room before the householder arrives. Our test stand was set up in an office, so our task is more specific: proactive setup of a comfortable climate in the open space before the first employee arrives at the office.

The personnel do not have a strict schedule for visiting the office during working days. The sensors track people moving inside the open space (but do not react to minor motions, so the log does not contain “white noise” data).

The log was gathered over one month. It is necessary to predict the time when the first employee comes to the office on the next working day (so the air conditioner can be turned on beforehand, and the necessary climate is established by the time the first person comes in). We simplify the task as much as possible (for example, we do not consider different air conditioning modes that can be set by different persons on different days; we suppose that only one mode is used).

The log structure is very simple: every record includes the date, the time (hours, minutes, and seconds), and the motion indicator (0 or 1; “zero” records state a lack of motion).

It is clear that this specific “Smart Office” task completely agrees with the “general” Smart Home task. The input data differs insignificantly in these cases (for example, for the Smart Office we have data only for working days, and these observations usually complement the Smart Home signals in terms of time frames). However, the data structure and the algorithms for its processing are completely applicable to “general” Smart Home tasks, and only specific clustering should be applied beforehand.

III. TEST DATA CLUSTERING

The preliminary analysis shows that only “no motion” records are gathered for non-working days and the night time. To exclude artificial improvement of the results of the algorithm [3], we consider only records for working days from 6 AM till 8 PM. So data clustering takes place.

Weekly cyclicality of the users’ activities should improve the prediction for the corresponding days of the week (for example, we can expect that the log of Thursday actions provides a more reliable prediction for the next Thursday). However, because some non-working days fell on Monday and Tuesday (Russian national holidays), they were excluded from the clustered data, and the weekly cyclicality of the users’ activities is broken. At the same time, such a disruption of the working-day schedule is relatively common, and a relatively long log of data is necessary to disregard such interruptions. In our case the weekly cyclicality is neglected because of the short log of data.

IV. TEST RESULTS ANALYSIS

The prediction algorithm is described in detail in [3]. We divide the considered time frame (6 AM – 8 PM) into intervals of N minutes. For every interval of a specific day we set the motion indicator to 1 for the whole interval if motion is indicated at least once during this period of time (otherwise the motion indicator remains equal to 0). So we get values of the motion indicator for each interval for every day in our log (they are referred to below as “actual”).

We take the first K days of the log (here K is the size in days of the “rolling window” of data). For each interval we calculate the predicted value of the motion indicator for day K+1 (based on the steps defined in [3]) and compare it with its “actual” value.

Then we repeat this procedure to predict the motion indicator’s values for day K+2 (based on data for K days starting from the second day in the log) and again compare the predicted and actual values. We repeat the process as long as the size of the log allows it.

Finally, we calculate the percentage of coincidences for all days and all intervals for the specific set of parameters used in these calculations: the size of the time interval (N minutes), the “rolling window” size (K days), and the type of function used in the forgetting factor calculation (linear, quadratic, or cubic; it defines how significant the impact of the most recent data will be). The results are presented in the diagrams below.
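
The clustering of Section III and the assignment of motion indicators to N-minute intervals can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `bin_motion_log`, the record layout (a `datetime` plus a 0/1 flag), the treatment of Monday to Friday as working days, and the optional `holidays` set are hypothetical and are not taken from [3].

```python
from datetime import date, datetime, time


def bin_motion_log(records, n_minutes, start=time(6, 0), end=time(20, 0),
                   holidays=frozenset()):
    """Return {date: [0/1 per interval]} for working days between start and end.

    records   -- iterable of (datetime, motion) pairs, motion being 0 or 1
    n_minutes -- interval size N in minutes
    holidays  -- hypothetical set of dates to exclude (e.g. national holidays)
    """
    start_min = start.hour * 60 + start.minute
    end_min = end.hour * 60 + end.minute
    # Number of intervals covering the time frame (rounded up).
    n_bins = (end_min - start_min + n_minutes - 1) // n_minutes

    days = {}
    for stamp, motion in records:
        # Assumption: working days are Monday to Friday minus listed holidays.
        if stamp.weekday() >= 5 or stamp.date() in holidays:
            continue
        minute = stamp.hour * 60 + stamp.minute
        if not (start_min <= minute < end_min):
            continue
        bins = days.setdefault(stamp.date(), [0] * n_bins)
        if motion:
            # One motion record marks the whole interval as "motion".
            bins[(minute - start_min) // n_minutes] = 1
    return days
```

With `n_minutes=30` and the default 6 AM to 8 PM frame, every retained day is represented by 28 indicator values, which play the role of the “actual” values in the comparison described above.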

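
The rolling-window evaluation (predict day K+1 from the previous K days, shift the window by one day, count coincidences) can be sketched in the same way. The actual prediction steps are defined in [3] and are not reproduced in this paper, so the rule used here is a stand-in assumption: a recency-weighted vote in which day i of the window (oldest first) gets the weight ((i + 1) / K) ** power, with power 1, 2, or 3 standing for the linear, quadratic, and cubic forgetting factor, and a threshold of 0.5. The names `predict_interval` and `rolling_accuracy` are hypothetical.

```python
def predict_interval(history, power=1):
    """Predict the 0/1 motion indicator of one interval for the next day.

    history -- indicator values of that interval for the K window days,
               oldest first
    power   -- 1, 2 or 3 for a linear, quadratic or cubic forgetting factor
    Assumed rule (not from [3]): weighted average of past values, where more
    recent days weigh more, thresholded at 0.5.
    """
    k = len(history)
    weights = [((i + 1) / k) ** power for i in range(k)]
    score = sum(w * v for w, v in zip(weights, history)) / sum(weights)
    return 1 if score >= 0.5 else 0


def rolling_accuracy(days, k, power=1):
    """Fraction of coincidences between predicted and actual indicators.

    days -- chronological list of per-day indicator lists (equal lengths)
    k    -- "rolling window" size K in days
    """
    if len(days) <= k:
        raise ValueError("the log must contain more than k days")
    hits = total = 0
    for target in range(k, len(days)):
        window = days[target - k:target]  # the K days preceding the target
        for j, actual in enumerate(days[target]):
            predicted = predict_interval([d[j] for d in window], power)
            hits += int(predicted == actual)
            total += 1
    return hits / total
```

Calling `rolling_accuracy` for a grid of interval sizes N, window sizes K, and the three `power` values would yield the kind of comparison that the diagrams summarize, although the numbers would match the paper only if the stand-in rule happened to coincide with the steps of [3].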
Fig. 1 Prediction results for the linear forgetting factor dependency

Fig. 2 Prediction results for the quadratic forgetting factor dependency

Fig. 3 Prediction results for the cubic forgetting factor dependency

We can draw several conclusions regarding the influence of each of the parameters mentioned above. First of all, increasing the size of the time interval improves the prediction. If only this parameter is changed, the quality of prediction changes from 50% (for 5-minute intervals) to 80-85% (for a one-hour interval). This is an expected result, because a rough forecast has more chances to be successful. At the same time, both the admissible minimal size of the time intervals and the satisfactory percentage of successful predictions critically depend on the considered task. In our case a 30-minute time interval and 75% (or more) successful predictions can be considered an acceptable result (it means that approximately once per week the user will change the air conditioner mode manually). At the same time, this would not be reasonable for healthcare tasks like detecting abnormally long periods of inactivity (due to frequent false alerts).

Secondly, increasing the “rolling window” size does not necessarily improve the prediction; 5 days can be considered its optimal value. This can be easily explained by the use of the forgetting factor; however, another optimum (8-9 days) is also observed for each type of forgetting factor dependency. For household appliances 5 days seems preferable because it means that the consumer can stop regular manual setup during the second week of usage.

Finally, the type of dependency used for calculation of the forgetting factor does not significantly affect the prediction. We can suggest that the correlation of the householder’s needs for two or three successive days is very strong.

V. CONCLUSIONS AND FURTHER STEPS

In our previous publications we considered the opportunity to apply a similar methodology of user activity prediction and simulation to different areas of Smart Home functionality. Such an implementation of the Smart Home will use all available input data (signals gathered from all available sensors and the user’s explicit directions) and similar algorithms for all necessary Smart Home tasks. This Smart Home solution (partly or completely disconnected from the external environment) would allow resolving different issues, including personal data security and usability improvement.

The results of initial tests of this approach, presented and analyzed in this paper, show that it can provide acceptable results for relatively uncritical features like comfort control. At the same time, it would not be reasonable for healthcare tasks like detecting abnormally long periods of inactivity (due to frequent false alerts, etc.). So the suggested relatively simple approach should be complemented with more advanced algorithms. In other words, an all-purpose Smart Home solution based on unified data should include a set of algorithms for critical and non-critical functionality. In further research we will discuss how to combine these algorithms.

VI. ACKNOWLEDGMENT

The authors would like to acknowledge the contribution of Dr. Mikhail Makarov (CEO, EVMTech, Zurich, Switzerland) and Alexey Melezhik (Lead Specialist, Gazprom Promgaz).

REFERENCES

[1] The Internet of Things, Afinnova Research report, 2014
[2] Interview with Samuel Sciacca, IEEE Smart Grid Website, http://smartgrid.ieee.org/questions-and-answers/551-interview-with-samuel-sciacca, 2014
[3] Dmitry Vavilov, Alexey Melezhik, Ivan Platonov, “Reference Model for Smart Home User Behavior Analysis Software Module”, ICCE-Berlin 2014, Sep. 7-10
[4] Dmitry Vavilov, Alexey Melezhik, Ivan Platonov, “Imitation of Smart Home User’s Presence”, SHUR 2015, Malaga, Spain, June 22-25
[5] Alexander Felfernig, Michael Jeran, Gerald Ninaus, Florian Reinfrank, and Stefan Reiterer, “Toward the Next Generation of Recommender Systems: Application and Research Challenges”, Multimedia Services in Intelligent Environments: Recommendation Services, Springer, 25:1-18, 2013

[6] P. Cuddihy, J. Weisenberg, C. Graichen, and M. Ganesh, “Algorithm to automatically detect abnormally long periods of inactivity in a home,” in Proceedings of the 1st ACM SIGMOBILE international workshop on Systems and networking support for healthcare and assisted living environments, ser. HealthNet ’07, 2007, pp. 89–94.
