Incremental Mining of Frequent Power 2016 PDF
All content following this page was uploaded by Abdulsalam Yassine on 22 August 2018.
Abstract—The key elements for understanding the power consumption of a typical home are the activities that users perform, the times at which appliances are used, and the interdependencies with other appliances that may be used concurrently. This information can be extracted from context-rich smart meter big data. However, the main challenge is how to mine complex interdependencies among the usage of different appliances within a home where multiple concurrent data streams occur. Furthermore, the generation of energy consumption data from a smart meter is an ongoing, continuous process, and over time inter-appliance associations can change or new ones can be established. In this paper, we propose incremental mining of frequent power consumption patterns from smart meter big data. Our model exploits the benefits of the pattern growth strategy and mines in quanta of 24 hours, i.e., frequent patterns are extracted progressively from data comprising the appliance usage tuples of each 24-hour period. The details and results of evaluating the proposed mechanism on a real smart meter dataset are presented in this paper.

I. INTRODUCTION

Currently, millions of homes are being equipped with smart meters capable of generating more than 100 energy consumption data points every 15 minutes, resulting in a massive volume of data [1], i.e., smart meter big data. The generated big data has a time-series nature and typically consists of usage measurements of component appliances over a time interval [2]. Extracting appliance usage patterns from smart meter big data is extremely valuable for effective demand-side management mechanisms and energy conservation policies.

In reality, the end use of energy in residential premises is related to the activity that the user is performing, the time at which an appliance is used, and the interdependencies with other appliances that may be used concurrently. For example, a user might charge the electric vehicle while the washing machine is on, or find it convenient to watch TV while ironing clothes. These relationships are key elements for understanding the power consumption of a typical home. Demand-side management schedules that shift the operation of appliances from one time slot to another while ignoring their association and correlation relationships may not meet the user's preferences, which is not acceptable. The main challenge, however, is how to mine complex interdependencies among the usage of different appliances within a home where multiple concurrent time-series data streams occur. In addition, the generation of energy consumption data from a smart meter is an ongoing, continuous process, and over time inter-appliance associations can change or new ones can be established.

To address the above challenge, we propose frequent pattern mining as a means of extracting vital information about energy consumption behavior. We propose incremental mining of frequent power consumption patterns from smart meter big data. The model exploits the benefits of the pattern growth strategy and mines in quanta of 24 hours, i.e., frequent patterns are extracted progressively from data comprising the appliance usage tuples of each 24-hour period. The main contribution of our mechanism is that it takes behavioral variations in the order of appliance usage into consideration, by examining energy consumption patterns incrementally and progressively, to discover occupants' personal preferences. These preferences can be an essential input to energy programs that seek to win consumer confidence and achieve greater success.

We propose to use the Kulczynski measure (Kulc) [5] alongside the imbalance ratio (IR) [5] as interestingness measures to supplement the support-confidence framework, which can ensure effective elimination of uninteresting rules. The use of Kulc alongside IR ensures the selection of not only frequent but also not-so-frequent patterns of interest. For the evaluation of the proposed mechanism, we use the UK Domestic Appliance-Level Electricity (UK-DALE) dataset [7], which includes time-series power consumption data collected from 2012 to 2015 at a time resolution of 6 seconds for five houses with 109 appliances in Southern England. The details and results of evaluating the proposed mechanism are presented in this paper.

The paper is organized as follows. Section II discusses related work. Section III presents the proposed model, followed by evaluation results in Section IV. Finally, Section V concludes the paper and discusses future directions.

II. RELATED WORK

Mining appliance usage and associations in the form of frequent patterns from context-aware smart meter data can reveal surprising underlying information. In this section, we present existing studies that are directly related to frequent pattern mining of smart meter big data.
[Fig. 1. Proposed model, Phase I to Phase IV: smart meter raw data and data pre-processing, frequent pattern mining on the source data, association rules, and visualization.]

The work presented in [2] and [8] uses sequential pattern mining to understand electrical appliance usage patterns with the goal of conserving energy. Similarly, [9] uses an incremental sequential mining technique to discover correlation patterns among appliances and proposes a new algorithm that reduces memory use and improves performance. The approach in [10] uses rule mining along with the J-measure to filter strong rules. It analyzes appliance energy consumption and related behavioral characteristics to assist energy conservation, demand response, and anomaly detection. The study in [11] utilizes sequential association rule mining in conjunction with hierarchical clustering to extract appliance associations with time and occupant activities in order to forecast energy consumption.

The approaches discussed above do not take human behavioral variations into account, such as high uncertainty in the order in which appliances are used to complete an activity, or increased or reduced frequency of appliance use due to seasons. These dissimilarities have a direct influence on energy consumption patterns, resulting in a larger number of patterns to be analyzed along with different interpretations of the same event or activity on different occasions. We address these shortcomings by adapting incremental, progressive unsupervised machine learning through frequent pattern mining, translating energy consumption patterns into frequent patterns representing inter-appliance associations, as an illustration of occupant behavior.

III. PROPOSED MODEL

Figure 1 represents our proposed model with four distinct phases: data preparation, frequent pattern mining, association rule generation, and visualization. In this section, we discuss these phases and provide details of the proposed mechanism.

A. Data preparation

Smart meter time-series raw data, which has a high time resolution, is transformed into load data at 1-minute resolution and subsequently into source data at 30-minute resolution, i.e., 24 * 2 = 48 readings per day, while recording usage duration, average load, and energy consumption for each active appliance. All the appliances registered as active during a 30-minute time interval are included in the source database for frequent pattern mining. The real dataset UK-DALE [7] had over 400 million raw records from five houses at a time resolution of 6 seconds. We reduced it to 20 million during the preprocessing phase without loss of accuracy or precision.

B. Frequent Pattern Mining

Frequent patterns are patterns or itemsets that appear repeatedly in a dataset. In smart meter data, for example, an itemset comprising a laptop and a washing machine that often appear together is a frequent pattern. Hence, frequent pattern mining can help discover associations and/or correlations among appliances, which define relationships in the data that describe consumers' energy consumption behavior. Therefore, with enormous quantities of data being progressively collected from smart meters, it is of keen interest not only to energy producers and utilities but also to end-user consumers to mine such frequent patterns in support of decision-making processes such as energy saving plans, demand response optimization, and cost reduction.

C. Frequent Itemsets and Association Rules

In this subsection, we introduce preliminary background on frequent pattern mining based on [5]. Let Γ = {I_1, I_2, ..., I_k} be an itemset containing k items (appliances), referred to as a k-itemset (l_k). Let DB represent a transaction database with a set of transactions, as described in Table I, where each transaction Υ is an itemset (set of appliances) with Υ ⊆ Γ and Υ ≠ ∅. The frequency of appearance of an itemset is the number of transactions that contain the itemset, defined as the support count, or count, of the itemset. Let X and Y be sets of items such that X ⊆ Υ and Y ⊆ Υ. Itemsets X and Y, with respective supports s_X and s_Y, are considered frequent itemsets or patterns if the percentage of transactions in DB that contain them is greater than or equal to minsup, where minsup is the pre-defined minimum support threshold. Support can be viewed as the probability of the itemset in the transaction database DB. This is referred to as relative support, whereas the frequency of occurrence is known as absolute support. Hence, if the relative support of an itemset X (s_X) [or Y (s_Y)] satisfies a pre-defined minimum support threshold minsup, then the absolute support of X (or Y) satisfies the corresponding minimum support count threshold.

Association rules are the result of the second step of the frequent pattern mining process, in which the already discovered frequent patterns are processed to generate association rules.
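The distinction between absolute and relative support drawn above can be illustrated with a minimal Python sketch. The toy transaction database, the appliance names, and the function names are assumptions made for illustration only; they are not taken from the paper or from the UK-DALE dataset.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transaction database: each transaction is the set of
# appliances registered as active in one 30-minute interval.
DB: List[FrozenSet[str]] = [
    frozenset({"laptop", "washing_machine"}),
    frozenset({"laptop", "monitor", "speakers"}),
    frozenset({"laptop", "washing_machine", "monitor"}),
    frozenset({"kettle"}),
]


def support_count(itemset: Iterable[str], db: List[FrozenSet[str]]) -> int:
    """Absolute support: number of transactions containing every item."""
    items = frozenset(itemset)
    return sum(1 for transaction in db if items <= transaction)


def relative_support(itemset: Iterable[str], db: List[FrozenSet[str]]) -> float:
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset: Iterable[str], db: List[FrozenSet[str]],
                minsup: float) -> bool:
    """An itemset is frequent if its relative support is at least minsup."""
    return relative_support(itemset, db) >= minsup
```

In this toy database, the itemset {laptop, washing_machine} appears in two of four transactions, so its absolute support is 2 and its relative support is 0.5.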
2016 IEEE Electrical Power and Energy Conference (EPEC)
Rules of the form {X ⇒ Y} are generated using the support-confidence framework, where the support s_{X⇒Y} is the percentage of transactions in the transaction database DB that contain (X ∪ Y), which can also be seen as the probability P(X ∪ Y). The confidence c_{X⇒Y} is defined as the percentage of transactions in DB containing X that also contain Y, which is the conditional probability P(Y|X) [5]. Equations (1) and (2) capture these notions, respectively.

\[ \mathrm{support}(X \Rightarrow Y) = P(X \cup Y) = \mathrm{support}(X \cup Y) \tag{1} \]

\[ \mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)} \tag{2} \]

Hence, an association rule {X ⇒ Y}, where X ⊂ Γ, Y ⊂ Γ, X ∩ Y = ∅, X ≠ ∅, and Y ≠ ∅, with support s_{X⇒Y} ≥ minsup and confidence c_{X⇒Y} ≥ minconf, is classified as strong, where minconf is the pre-defined minimum confidence threshold. Additionally, the support s_{X⇒Y} of an association rule automatically satisfies the minimum support threshold, because rules are generated from frequent patterns, i.e., from itemsets (X ∪ Y) whose support is at least minsup, which in turn implies s_X, s_Y ≥ minsup. Thus, once the supports of X, Y, and (X ∪ Y) are determined, the corresponding association rules {X ⇒ Y} and {Y ⇒ X} that satisfy minsup and minconf can be extracted; i.e., association rule generation can be reduced to a two-step operation: first, frequent pattern mining, and second, generating strong association rules of interest [5].

D. Incremental Approach of Data Mining for Frequent Pattern Extraction using FP-growth: Discovering Inter-Appliance Associations

In this subsection, we discuss our proposed approach to incremental, progressive frequent pattern mining, along with additional interestingness measures for the discovery of correlation relationships.

The mining of frequent patterns is generally considered an off-line and costly process on large databases. In a real-world application, transaction data generation is a continuous process in which new transactions are generated and old transactions may become obsolete as time progresses. In such a situation, existing frequent patterns may be invalidated and/or new frequent pattern associations established. Therefore, an incremental and progressive update strategy is imperative, in which these variations are taken into account and the discovered frequent patterns are duly maintained. For example, an appliance such as a room heater will generally be used during winter, and we can expect reduced usage frequency during other seasons. In effect, a significant gain in support during winter and a decrease during other seasons will be registered. As a result, the room heater should appear higher on the list of frequent patterns and association rules during winter, but much lower during summer or spring. This objective can be achieved through progressive incremental data mining, which eliminates the need to re-mine the entire database at regular intervals. Frequent pattern mining in a large database can be accomplished through the pattern growth approach [3], [4]. We extend the pattern growth approach and present an incremental, progressive frequent pattern mining strategy, which is discussed next.

1) FP-growth: A Pattern-Growth Approach, Without Candidate Generation, for Mining Frequent Itemsets: The Apriori algorithm [6], with its candidate generation approach, suffers from two problems: it generates a large number of candidate sets, and it repeatedly scans the entire database to find the support of an itemset. To overcome these deficiencies, the work in [3] and [4] proposed the pattern growth, or FP-growth, approach, which exploits the divide-and-conquer technique. To start with, it generates a compact representation of the transaction database in the form of a frequent pattern tree, or FP-tree. The FP-tree preserves the association information derived from each individual transaction, along with the support count of each constituent item. Next, a conditional database (tree) for each frequent item is extracted from the FP-tree to mine the frequent patterns of which the item under consideration is a part. Thus, only the divided portion relevant to the item and its associated growing patterns is inspected, which addresses both shortcomings of the Apriori approach.

2) Incremental frequent pattern extraction: Our proposed technique exploits the benefits of the pattern growth strategy and extends it to achieve incremental, progressive mining of frequent patterns by mining in quanta of 24 hours, i.e., frequent patterns are extracted progressively from data comprising the appliance usage tuples of every 24-hour period. With this approach, we mine only a portion of the entire database at each iteration, thus reducing the memory overhead of the FP-growth strategy and achieving improved efficiency.

In our proposed approach, available data is mined iteratively in quanta of 24 hours, and a frequent patterns discovered database, represented in Table II, is maintained across successive mining exercises. In other words, data mining can be viewed as a process conducted at the end of each day in an incremental manner. During each consecutive mining operation, the support count and database size of existing frequent patterns are incremented, and new patterns, with the applicable support count and database size, are added to the persistent database. Moreover, we do not apply the minimum support threshold minsup at the mining stage to eliminate candidate patterns, which results in the discovery of all possible frequent patterns. This change is incorporated to avoid missing candidate patterns that can become frequent if the time quantum is increased or the complete database is mined in a single operation. At the end of the mining process, the database size is updated for all frequent patterns in the frequent patterns discovered database (Table II) to ensure correct computation of support.

The frequent patterns discovered database (Table II) can be maintained in memory using a hash table data structure or outside memory in a Database Management System (DBMS). The latter approach reduces the memory requirement at the cost of a marginal increase in processing time, whereas the former reduces processing time but requires more memory. In a smart meter environment, although quicker processing time is important, persistence of the information discovered over days, months, or years is more vital to achieving useful and usable results in the future. Therefore, we prefer permanent storage in a DBMS over volatile in-memory storage. We present the extended two-step process for frequent pattern mining using FP-growth, i.e., constructing the FP-tree and generating frequent patterns, in Algorithm 1. Algorithm 2 outlines the mechanism for persisting the frequent patterns discovered by the mining process in permanent storage such as a DBMS.

Step 1: Constructing the Frequent Pattern Tree (FP-tree): This takes two scans of the transaction database: the first to create the list of frequent 1-itemsets with their support, and the second to construct the FP-tree [3], [4]. In our setting, we do not eliminate 1-itemsets based on the minimum support threshold (minsup), for the reasons discussed earlier, which is a departure from the algorithm originally proposed by [3],

Algorithm 2 Function save_update_frequent_pattern
Require: Extracted frequent pattern FP_extracted, support count absolute_support, frequent patterns discovered database FP_DB
Ensure: Frequent pattern is added to or updated in the frequent patterns discovered database
  Search for a frequent pattern FP = FP_extracted in FP_DB
  if frequent pattern found then
    Increment support count by absolute_support.
  else
    Add a new frequent pattern with support count absolute_support and database size = 0.
  end if

TABLE II
FREQUENT PATTERNS: FREQUENT PATTERNS DISCOVERED DATABASE
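The persistence step of Algorithm 2 and the end-of-day database size update can be sketched in Python as follows. This is only an illustration under stated assumptions: SQLite (via the standard `sqlite3` module) stands in for the DBMS, the table layout `fp_db` and the helpers `open_fp_db`, `pattern_key`, `mine_day`, `process_day`, and `relative_support` are hypothetical, and a brute-force subset enumeration replaces FP-growth so that the example stays short. The database size update reflects one reading of the text, namely that every stored pattern receives the total number of transactions mined so far.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def open_fp_db(path=":memory:"):
    """Create (if needed) the frequent patterns discovered database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fp_db ("
        "pattern TEXT PRIMARY KEY, "
        "support_count INTEGER NOT NULL, "
        "database_size INTEGER NOT NULL)"
    )
    return conn


def pattern_key(items):
    """Canonical text key; assumes appliance names contain no commas."""
    return ",".join(sorted(items))


def save_update_frequent_pattern(conn, fp_extracted, absolute_support):
    """Algorithm 2: increment an existing pattern or add a new one."""
    key = pattern_key(fp_extracted)
    row = conn.execute(
        "SELECT support_count FROM fp_db WHERE pattern = ?", (key,)
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE fp_db SET support_count = support_count + ? "
            "WHERE pattern = ?",
            (absolute_support, key),
        )
    else:
        # New patterns start with database size 0; it is set at end of day.
        conn.execute(
            "INSERT INTO fp_db (pattern, support_count, database_size) "
            "VALUES (?, ?, 0)",
            (key, absolute_support),
        )


def mine_day(transactions):
    """Stand-in for FP-growth without minsup: count every non-empty subset.

    Exponential in transaction length, so suitable for toy data only.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return counts


def process_day(conn, transactions):
    """Mine one 24-hour quantum and fold the result into the database."""
    transactions = list(transactions)
    for pattern, count in mine_day(transactions).items():
        save_update_frequent_pattern(conn, pattern, count)
    (previous_size,) = conn.execute(
        "SELECT COALESCE(MAX(database_size), 0) FROM fp_db"
    ).fetchone()
    # Assumed interpretation: all patterns share the cumulative size.
    conn.execute(
        "UPDATE fp_db SET database_size = ?",
        (previous_size + len(transactions),),
    )
    conn.commit()


def relative_support(conn, items):
    """Relative support of a stored pattern, or 0.0 if it is unknown."""
    row = conn.execute(
        "SELECT support_count, database_size FROM fp_db WHERE pattern = ?",
        (pattern_key(items),),
    ).fetchone()
    if row is None or row[1] == 0:
        return 0.0
    return row[0] / row[1]
```

Calling `process_day` once per day with that day's transactions keeps the stored counts current without re-mining earlier days. Passing a file path instead of `":memory:"` to `open_fp_db` makes the store persistent, in line with the stated preference for permanent storage.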
Fig. 2. Inter-Appliance Associations extracted through frequent pattern mining with minsup ≥ 0.2
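Rule extraction of the kind summarized in Fig. 2 can be sketched as follows: given the support counts of X, Y, and (X ∪ Y), both candidate rules are tested against minsup and minconf, and Kulc and IR are attached as additional interestingness measures. The Kulc and IR formulas follow their standard definitions in [5]; the function names, the dictionary layout of the counts, and the example thresholds are assumptions made for illustration, not the authors' implementation.

```python
def kulczynski(n_x, n_y, n_xy):
    """Kulc(X, Y) = 0.5 * (P(X|Y) + P(Y|X)), computed from support counts."""
    return 0.5 * (n_xy / n_x + n_xy / n_y)


def imbalance_ratio(n_x, n_y, n_xy):
    """IR(X, Y) = |sup(X) - sup(Y)| / (sup(X) + sup(Y) - sup(X u Y))."""
    return abs(n_x - n_y) / (n_x + n_y - n_xy)


def generate_rules(counts, db_size, x, y, minsup, minconf):
    """Return the strong rules among X => Y and Y => X.

    counts maps frozenset itemsets to absolute support counts and must
    contain entries for X, Y, and their union.
    """
    fx, fy = frozenset(x), frozenset(y)
    if not fx or not fy or fx & fy:
        raise ValueError("X and Y must be non-empty and disjoint")
    n_x, n_y, n_xy = counts[fx], counts[fy], counts[fx | fy]

    rules = []
    support = n_xy / db_size
    if support < minsup:
        return rules  # the underlying pattern is not frequent

    kulc = kulczynski(n_x, n_y, n_xy)
    ir = imbalance_ratio(n_x, n_y, n_xy)
    for lhs, rhs, n_lhs in ((fx, fy, n_x), (fy, fx, n_y)):
        conf = n_xy / n_lhs  # confidence(lhs => rhs) = sup(X u Y) / sup(lhs)
        if conf >= minconf:
            rules.append({
                "lhs": set(lhs),
                "rhs": set(rhs),
                "support": support,
                "confidence": conf,
                "kulc": kulc,
                "ir": ir,
            })
    return rules
```

With hypothetical counts of 4 for {laptop}, 2 for {monitor}, and 2 for {laptop, monitor} in a database of 5 transactions, minsup = 0.2 and an assumed minconf = 0.75 keep only {monitor} ⇒ {laptop} (confidence 1.0), while {laptop} ⇒ {monitor} (confidence 0.5) is dropped. Kulc is 0.75 and IR is 0.5 for this pair, which indicates a fairly imbalanced association.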
[Figure residue: two energy consumption time-series panels; x-axis: Period (Dates), 2013-07-20 to 2013-08-29. Captions not recoverable.]

[Fig. 4. Energy consumption: Monitor. x-axis: Period (Dates), 2013-07-20 to 2013-08-29.]

[Fig. 5. Energy consumption: Speakers. x-axis: Period (Dates), 2013-07-20 to 2013-08-29; y-axis: Power Consumption (Watts).]

extend our analysis of energy consumption behavior to include prediction of multiple appliances usage to forecast energy consumption on a short-term and long-term basis.

REFERENCES

[1] T. Yu, N. Chawla, S. Simoff, Computational Intelligent Data Analysis for Sustainable Development, Chapman and Hall/CRC, Chapter 7, 2013.
[2] ... 14th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, pp. 1123-1129. London, UK: IEEE, 2015.
[3] J. Han, M. Kamber, J. Pei, Mining Frequent Patterns without Candidate Generation, In: Proc. Conf. on the Management of Data (SIGMOD'00, Dallas, TX), pp. 1-12. NY, USA: ACM Press, 2000.
[4] J. Han, M. Kamber, J. Pei, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach, Data Mining and Knowledge Discovery, 8, pp. 53-87. Netherlands: Kluwer Academic Publishers, 2004.
[5] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques (Third Edition), Chapter 6, pp. 243-278. San Francisco, USA: Morgan Kaufmann Publishers (Elsevier), 2012.
[6] R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94), Santiago, Chile. San Francisco, USA: Morgan Kaufmann Publishers (Elsevier), 1994.
[7] J. Kelly, W. Knottenbelt, The UK-DALE dataset, domestic appliance-level electricity demand and whole-house demand from five UK homes, Sci. Data 2:150007, doi: 10.1038/sdata.2015.7. London, UK: Scientific Data, 2015.
[8] M. Hassani, C. Beecks, D. Töws, T. Seidl, Mining Sequential Patterns of Event Streams in a Smart Home Application, Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany: http://ceur-ws.org, 2015.
[9] Y. Chen, H. Hung, B. Chiang, S. Peng, P. Chen, Incrementally Mining Usage Correlations among Appliances in Smart Homes, 2015 18th International Conference on Network-Based Information Systems, Taipei, Taiwan, pp. 273-279. IEEE, 2015.
[10] S. Rollins, N. Banerjee, Using Rule Mining to Understand Appliance Energy Consumption Patterns, 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom), Budapest, Hungary, pp. 29-37. IEEE, 2014.
[11] S. Rollins, N. Banerjee, Data Mining Techniques for Detecting Household Characteristics Based on Smart Meter Data, Energies 2015, 8(7), 7407-7427; doi:10.3390/en8077407. Basel, Switzerland: Energies, MDPI, 2015.