
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JIOT.2019.2903312, IEEE Internet of Things Journal

Energy Theft Detection with Energy Privacy Preservation in the Smart Grid
Donghuan Yao, Mi Wen, Member, IEEE, Xiaohui Liang, Member, IEEE, Zipeng Fu, Kai Zhang and Baojia Yang

Abstract—As a prominent early instance of the IoT in the smart grid, the advanced metering infrastructure (AMI) provides real-time information from smart meters to both grid operators and customers, exploiting the full potential of demand response. However, newly collected information without security protection can be maliciously altered, resulting in huge losses. In this paper, we propose an energy theft detection scheme with energy privacy preservation in the smart grid. Specifically, we use combined convolutional neural networks (CNN) to detect abnormal behavior in the metering data from long-period pattern observation. In addition, we employ the Paillier algorithm to protect energy privacy. In other words, users' energy data are securely protected during transmission and data disclosure is minimized. Our security analysis demonstrates that our scheme achieves both data privacy and authentication. Experimental results illustrate that our modified CNN model can effectively detect abnormal behaviors with an accuracy of up to 92.67%.

Index Terms—Energy Theft, Privacy Preserving, CNN, Smart Grid.

This work is supported by the National Natural Science Foundation of China under Grant No. 61872230, No. 61572311 and No. 61802248; by the Dawn Program of Shanghai Education Commission under grant no. 16SG47; by
Donghuan Yao, Mi Wen and Kai Zhang are with the College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201101, China (Email: dhyao1@mail.shiep.edu.cn; miwen@shiep.edu.cn; kzhang@shiep.edu.cn).
Xiaohui Liang is with the Department of Computer Science, University of Massachusetts Boston (Email: xiaohui.liang@umb.edu).
Zipeng Fu is with the Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA (Email: fuzipeng@engineering.ucla.edu).
Baojia Yang is with Microsoft Suzhou (Office) (Email: ybaojia91@gmail.com).

I. INTRODUCTION
THE INTERNET of things (IoT) and artificial intelligence (AI) are two cornerstone technologies enabling smart cities, and they have been interacting with each other to form an organic ecosystem. In the smart grid, smart meters and various sensors are widely used to increase two-way communication capability. Combined with the advanced metering infrastructure (AMI), they enable energy companies to obtain real-time voltage, current, active power, reactive power, energy usage, and other measurements from the smart meters deployed at user homes [1], [2]. Recently, smart meters have been shown to be vulnerable to cyber-physical attacks in the smart grid due to their insecure and distributed network and physical environment [3]–[5]. One serious threat is energy theft, which costs energy companies more than 25 billion dollars every year [6]. Such an attack aims to pay less by tampering with user meters so that they report less energy usage to the energy company. Another severe threat is privacy violation. As smart meters collect real-time energy usage that may reveal users' habits and behavior at home, privacy concerns arise if the collected data are not well protected [7]. For example, if a user's daily energy consumption is low, it may imply that the user is not at home [8]. Thus, such privacy-sensitive information must be protected from unauthorized access. Disclosing usage for theft detection and hiding usage for privacy preservation are conflicting goals. We aim to address both theft detection and privacy preservation in this work.

A number of works have been conducted on energy theft detection in the smart grid. Some used the classification-based support vector machine (SVM) technique to classify normal and attack samples from energy usage databases [9]–[11]. In addition, matrix decomposition [12], linear regression [13], and state estimation [14] can be used to analyze the data for energy theft detection. However, these approaches cannot be applied to cases with massive amounts of data. Zheng et al. [15] proposed a wide and deep convolutional neural network model to analyze the energy theft behavior of individual users. In our paper, we additionally study energy theft behavior from a user-group perspective, i.e., a group of users may exhibit similar energy consumption patterns due to local activities over a certain period of time. We exploit this behavioral characteristic to detect sophisticated attackers more accurately.

Most theft detection schemes require access to the original smart meter data, which are highly privacy-sensitive. Although privacy-preserving techniques have been introduced in smart grid communication [16]–[18], they are rarely proposed in the context of theft detection. One work was developed under the assumption that the normal energy output of a photovoltaic device is similar to that of other devices in the same geographical region [19]. With the homomorphic encryption technique, the distance between two vectors is calculated while the vectors (energy data) are not disclosed to unauthorized entities. However, since that work detects energy theft from the perspective of generators, it cannot handle the diversity of theft. For example, if a user's meter is tampered with through an external illegal attack, the theft cannot be detected. In addition, Salinas et al. [20] proposed a privacy-preserving state estimation scheme based on two loosely coupled filters to detect energy theft attacks and achieve privacy preservation. However, it does not conform to actual grid operation: because it protects privacy by sending residuals rather than user usage, the grid cannot be dispatched and bills cannot be settled.

In this paper, we propose an energy theft detection scheme with energy privacy preservation in the smart grid to ensure

2327-4662 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

user privacy and realize theft detection. Specifically, we employ combined convolutional neural networks (CNN) to analyze the reported usage data and detect fake data. To the best of our knowledge, only [15] and our work use CNN to detect theft. Utilizing the homomorphic encryption technique, we protect energy usage during transmission and further enable the gateway to aggregate authentic user energy usage without accessing any original usage data. In addition, the control center can only access the sum of the authentic usage data and the number of users who honestly report their usage data. The control center cannot access the original energy usage data of individual users, which are highly privacy-sensitive. The main contributions of this paper are threefold.
• First, we build a combined convolutional neural network model for detecting abnormal theft behavior based on the similarity of users' energy consumption behavior in a local user group. Using user group data helps overcome the data incompleteness problem, addresses more sophisticated theft detection problems, and eventually increases detection accuracy.
• Second, we realize smart grid dispatching while protecting users' privacy, where we utilize homomorphic encryption to achieve privacy-preserving data aggregation and efficient smart grid communications.
• Third, we provide a comprehensive security analysis to show that the proposed scheme achieves the desired security properties. In addition, we conduct extensive experiments on a massive realistic energy usage dataset. The experimental results show that our proposed combined CNN model outperforms other existing approaches in terms of accuracy.
The remainder of this article is organized as follows. After related work in Section II, we introduce the system model, system design goal, and system security requirements in Section III. In Section IV, we review the relevant background knowledge. Section V presents our proposed scheme. We give the security analysis in Section VI, while Section VII gives the experimental results. Finally, we conclude this paper.

II. RELATED WORK
This section discusses related work in two categories: energy theft detection, and privacy preservation with data aggregation.

A. Energy Theft Detection
Several works have investigated the energy theft problem in the smart grid, where existing techniques can generally be classified into three categories: state estimation, game theory, and machine learning. The classic state estimation-based solutions [14], [21]–[23] usually introduce integrated distribution state estimation techniques, while the game theory-based method is considered a new way to detect energy theft [24], [25]. Previous works have also investigated the prevention and detection of attacks using classification-based detection techniques such as SVM. In [26], Pereira et al. introduced an algorithm called social-spider optimization for feature selection purposes. Feature selection, parameter tuning, and feature selection combined with parameter tuning are chosen as model selection strategies. However, most of these studies are less accurate in energy theft detection and require manual feature extraction based on domain knowledge.

B. Privacy Preserving with Data Aggregation
Recently, a number of works have focused on data aggregation to preserve the privacy of user information in the smart grid [27], [28]. It is assumed that the aggregate usage data provide enough information to the entity without exposing individual users' private information; that is, the entity only learns the aggregate rather than the personal data. In [29], Bao et al. proposed a differentially private data aggregation scheme for aggregating smart meter measurements. Specifically, every smart meter reports encrypted data to the gateway; the gateway then aggregates all the reported data and sends the aggregated value to the control center, which decrypts it to obtain the sum of all smart meter readings.
Different from previous privacy-preserving theft detection schemes, the set aggregation in the smart grid communications of our proposed scheme enables the control center to obtain not only the total aggregated energy usage but also the number of users who honestly report their usage data. With this kind of set aggregation, the control center can carry out more accurate data analysis for monitoring and controlling the smart grid.

III. SYSTEM MODEL, SYSTEM DESIGN GOAL AND SYSTEM SECURITY REQUIREMENTS
In this section, we formalize the system model, system design goal, and system security requirements.

A. System Model
In this section, we describe how users send energy usage information to the control center/gateway and how energy theft is detected. As shown in Fig. 1, the system includes users, local area networks (LAN), the control center (CC), and a trusted third party (TTP).
• Users: Each user is equipped with a smart meter that connects to the smart devices at home to aggregate their energy consumption [16]. The smart meter then sends the usage to the energy utility via the gateway for data analysis, billing, and reasonable energy dispatching.
• LAN: A LAN is a collection of users in a certain area. The LAN is a server with memory cells, processing units, and gateways [30]. Its gateway (GW) serves as a relay and aggregator in the system. The server gateway (SG) is a detection server with processing units that is used for energy theft detection (we assume it is trusted). The LAN connects the involved users and the control center in the smart grid.
• CC: The control center is the core entity in the energy company and is responsible for processing and analyzing information from users. It treats each LAN as a unit and does not know the details of each user under the LAN.


Fig. 1. System model (entities: Users, the LAN with its GW and SG, the Control Center, and the TTP; flows: energy usage transmission, transmission of theft result, transmission of aggregation; legend: secure information flows).

• TTP: The trusted third party is a key generation party that issues keys to the other entities; it also issues a unique ID to each user, GW, and SG, and these IDs are stored in a secure place. We assume that the TTP is trusted by all entities and cannot be compromised.
We consider a smart grid that contains local area networks $LANs = \{LAN_1, LAN_2, \cdots, LAN_m\}$. These LANs engage in two-way communications with the smart meter network and perform aggregation and authentication operations to ensure data authenticity and integrity [31]. Each LAN contains users $Us = \{U_1, U_2, \cdots, U_i, \cdots, U_w\}$, where we assume $w$ is at most 100 to alleviate the load on the LAN server. This paper leverages the measurement and communication capabilities of smart meters to detect energy thieves in a privacy-preserving manner.

B. System Design Goal
Energy theft is criminal behavior in the smart grid that manipulates the output of a smart meter. If an illegal user is able to operate a meter, he can attack the meter and tamper with the amount of energy reported to the LAN. The purpose of the dishonest user is to reduce his own energy bill by under-reporting his energy usage. In this case, the illegal user may tamper with all or some of the functionalities of the home-level meter, an attack that is easy to launch and difficult to detect. Hence, the proposed system aims to detect energy theft through users' energy usage patterns on the user side; that is, energy-stealing behavior should be detected successfully and effectively, while detection is still carried out in a privacy-preserving manner.

C. System Security Requirements
In the system under consideration, the control center and the GW are honest but curious; that is, they do not change users' energy usage during communication, but they are curious about the specific electricity information of each user. However, the adversary in the region is malicious: it may actively eavesdrop on communication between different entities, modify communication information, or launch replay attacks. Therefore, our security requirements are as follows:
• Data Privacy: Users' private information is not revealed to the adversary; the CC should learn nothing about the details of individual users' usage.
• Data Information Confidentiality: Users' energy usage and bills should be protected against any adversary. Even if an adversary eavesdrops on data transmission links, no useful information can be extracted from them. Additionally, if the adversary steals data from the LANs' and/or the CC's databases, it cannot identify each user's data either.
• Data Integrity and Authentication: If an adversary tries to resend or modify data, these malicious behaviors should be detected to ensure data integrity. In addition, any unauthorized access or modification should be detected, which means that adversaries cannot invade or falsify data within the LAN. Meanwhile, only correct reports are accepted.

IV. BACKGROUND KNOWLEDGE
This section reviews the convolutional neural network technology used to detect abnormal behavior in the metering data, and the Paillier homomorphic algorithm used to protect data privacy.

A. Convolutional neural network (CNN)
In image recognition research, including the authoritative ImageNet competition, the top-ranked algorithms are all CNN-based, such as VGG and ResNet. In particular, CNNs play an important role in processing data in matrix form.
1) The structure of CNN: A CNN architecture is established by a stack of distinct layers that transform the input volume into an output volume (e.g., holding the class scores) through a differentiable function. As Fig. 2 shows, a CNN consists of an input layer, an output layer, and multiple hidden layers [32], where the hidden layers are composed of convolutional layers, pooling layers, fully connected layers, and so on.
The convolutional layer consists of a set of learnable filters or kernels, which have a small receptive field but extend through the full depth of the input volume. During the forward pass, each filter is convolved across the width and height of the


Fig. 2. CNN architecture (Input → Convolutions → Poolings → Convolutions → Full Connections → Softmax → Output).

input volume, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter. The pooling layer is a form of non-linear down-sampling: it progressively reduces the spatial size of the representation, the number of parameters, and the amount of computation in the network, and hence helps control overfitting. After several convolutional and max pooling layers, the high-level reasoning in the neural network is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer. The fully connected layer is used to generate the final output.
2) Activation function: In artificial neural networks, the activation function of a node defines the output of that node given an input or set of inputs; nonlinear activation functions allow networks to compute nontrivial problems using only a small number of nodes. The rectified linear unit (ReLU) is a common activation function [33], and we use it in our neural network framework. ReLU is defined as follows [34]:
$$f(x) = \begin{cases} 0, & x < 0, \\ x, & x \ge 0. \end{cases}$$
For classification problems, the softmax function is commonly added to the output layer to obtain the category. It squashes a $K$-dimensional vector of arbitrary real values into a $K$-dimensional vector of real values, where each entry is in the range $(0, 1]$ and all the entries add up to 1:
$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \ldots, K.$$
3) Loss function and optimizer: To train the neural network, we define a loss function and an optimizer to adjust the weights. We use categorical cross-entropy as the loss function and stochastic gradient descent (SGD) as the optimizer in our neural network framework.
The cross-entropy of distributions $u$ and $v$ over a given discrete set is defined as
$$H(u, v) = -\sum_{x} u(x) \log v(x).$$
SGD is an iterative method for optimizing a differentiable objective function; it is a stochastic approximation of gradient descent optimization [35]. The basic idea is to estimate the "gradient" from a randomly selected data point $(x_i, y_i)$ and use it to update the weight $W$ via
$$W_{t+1} \leftarrow W_t + \eta\, \theta(-y_i W_t^{T} x_i)\,(y_i x_i),$$
where $\eta$ is the learning rate and $\theta(\cdot)$ is the logistic function; this form of the update corresponds to the logistic loss with labels $y_i \in \{-1, +1\}$.

B. Paillier Homomorphic Algorithm
The Paillier cryptosystem has additive homomorphic properties, which are widely desirable in privacy-preserving applications [8]. It consists of five main parts [36].
1) Generation of homomorphic key: Select a security parameter $\kappa$ and two large primes $p$ and $q$, where $|p| = |q| = \kappa$. Compute $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, and select an element $g \in \mathbb{Z}_{n^2}^*$ such that $n$ divides the order of $g$ (equivalently, $L(g^{\lambda} \bmod n^2)$ is invertible modulo $n$); set the public key as $(n, g)$ and the private key as $\lambda$. Define the function
$$L(\phi) = \frac{\phi - 1}{n}.$$
2) Encryption: Select a random number $r_i \in \mathbb{Z}_{n^2}^*$; the encryption of the plaintext $m_i$ is
$$c_i = E(m_i) = g^{m_i} r_i^{n} \bmod n^2,$$
where $c_i$ is the ciphertext of the plaintext $m_i$.
3) Decryption: Decrypt the ciphertext $c_i$ into the plaintext $m_i$:
$$m_i = D(c_i) = \frac{L(c_i^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n,$$
where the division denotes multiplication by the modular inverse modulo $n$.
4) Aggregation: Aggregate multiple ciphertexts $c_i = E(m_i) = g^{m_i} r_i^{n} \bmod n^2$, $1 \le i \le w$, as follows:
$$c = \prod_{i=1}^{w} c_i \bmod n^2 = g^{m_1 + m_2 + \cdots + m_w} \prod_{i=1}^{w} r_i^{n} \bmod n^2.$$
5) Decrypt the aggregated ciphertext: Decrypt the aggregated ciphertext as
$$m = \frac{L(c^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n,$$
where $m = m_1 + m_2 + \cdots + m_w$, provided this sum is smaller than $n$.

V. PROPOSED SCHEME
In this section, we introduce our energy theft detection scheme with energy privacy preservation in the smart grid. To describe this scheme, we divide the process into two parts, namely energy privacy preservation and energy theft detection with the proposed combined CNN.
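Before turning to the individual parts, the building blocks reviewed in Section IV-A, on which the combined CNN of Part 2 relies, can be checked numerically. The following pure-Python sketch evaluates ReLU, softmax, categorical cross-entropy, and one SGD step. It is an illustrative sketch, not the authors' implementation: the function names are hypothetical, softmax subtracts the largest score before exponentiating (a standard stability trick that leaves the result unchanged), and the SGD step assumes labels $y_i \in \{-1, +1\}$ with $\theta(\cdot)$ taken to be the logistic function, the setting in which the update rule of Section IV-A holds.

```python
import math


def relu(x: float) -> float:
    """ReLU: returns 0 for negative inputs and x otherwise."""
    return max(0.0, x)


def softmax(z: list[float]) -> list[float]:
    """Maps K real scores to K positive values that sum to 1."""
    z_max = max(z)  # shifting by the max avoids overflow and does not change the result
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(u: list[float], v: list[float]) -> float:
    """H(u, v) = -sum_x u(x) log v(x); terms with u(x) = 0 contribute nothing."""
    return -sum(ux * math.log(vx) for ux, vx in zip(u, v) if ux > 0)


def sgd_logistic_step(w: list[float], x: list[float], y: int, eta: float) -> list[float]:
    """One update W <- W + eta * theta(-y W^T x) * (y x), with y in {-1, +1}
    and theta the logistic function theta(s) = 1 / (1 + e^{-s})."""
    margin = sum(wi * xi for wi, xi in zip(w, x))
    theta = 1.0 / (1.0 + math.exp(y * margin))  # equals theta(-y * W^T x)
    return [wi + eta * theta * y * xi for wi, xi in zip(w, x)]


# Two-class example: class 1 (abnormal) one-hot encoded as [0, 1].
probs = softmax([0.5, 2.0])
loss = cross_entropy([0.0, 1.0], probs)
w_next = sgd_logistic_step([0.0, 0.0], [1.0, 2.0], y=1, eta=0.5)  # [0.25, 0.5]
print(probs, loss, w_next)
```

With a one-hot label, the cross-entropy reduces to the negative log-probability that the softmax assigns to the true class, which is the quantity the loss above measures.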


PART 0: FEASIBILITY ANALYSIS AND PREPARATION
This part analyzes the feasibility of theft detection with the combined CNN model.

A. Data attributes
Two important attributes of users' electricity consumption behavior are considered. One is periodicity: users usually consume energy cyclically (daily or weekly) [15]. The other is group similarity: users usually follow patterns similar to those of others in the same group. For instance, users from the same community share similar energy consumption environments and may also exhibit the same behaviors, which can cause large but legitimate changes in energy usage.

B. Modeling
According to the periodicity, the obtained sequence data can be converted into matrix form. For example, 28 days of energy usage data can be formalized into a $4 \times 7$ matrix by weekly cycle. To exploit group similarity, we select reference users from the same group as the detected target user as an auxiliary input to the model. Therefore, the CNN model has two inputs: the target user's data and the reference users' data.

C. Training model
A major difficulty in detection is obtaining abnormal meter data. Because it is hard to label the data manually, we use the labeled dataset from the State Grid Corporation of China (SGCC) [37], which contains the energy usage data of 42,372 customers over 1,035 days. Additionally, we pad each user's energy usage sequence with one zero value to extend its length to 1,036, and then convert each sequence into a matrix of shape $(148 \times 7)$, so that the length of the input data is an integer multiple of the cycle length used in the CNN.

PART 1: ENERGY PRIVACY PRESERVATION
In this part, we show how to use Paillier homomorphic encryption to protect the privacy of the energy usage data along with the state information.

A. System Initialization
The system initialization takes a security parameter $1^{\lambda}$ as input and generates the public parameter $pp = (q, P, G_1, G_2, G_T, g_1, g_2, \varphi, H_{\Lambda}(\cdot), e)$, where $G_1$ and $G_2$ are two cyclic groups of the same prime order $q$, $P \in G$ is a generator, $G_T$ is a multiplicative cyclic group, $g_1$ and $g_2$ are the generators of $G_1$ and $G_2$, respectively, $\varphi$ is an isomorphic mapping with $\varphi(g_2) = g_1$, $e: G_1 \times G_2 \rightarrow G_T$ is a bilinear mapping, and $H_{\Lambda}(\cdot)$ is a keyed hash function.
The TTP selects a system master key $s \in \mathbb{Z}_p^*$ and computes the system public key $y = g_2^{s}$; it also randomly chooses $\delta, x \in \mathbb{Z}_q^*$, computes $e(P, P)^{\delta}$ and $Y = xP$, and selects two hash functions $H_1(\cdot): \{0,1\}^* \rightarrow G_1$ and $H_2(\cdot): \{0,1\}^* \rightarrow G_2$.
Upon receiving $pp$ and a security parameter $\kappa$ chosen by the TTP, the CC also initializes the Paillier encryption algorithm by selecting two large primes $p$ and $q$ with $|p| = |q| = \kappa$ and computing $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$. It selects $g \in \mathbb{Z}_{n^2}^*$ as the generator and sets the public key to $(n, g)$ and the private key to $\lambda$ in the Paillier encryption algorithm.

B. System Registration
When registering with the system, the GW of a local area network first chooses a random number $x_g \in \mathbb{Z}_q^*$ as its private key and computes the corresponding public key $Y_g = x_g P$; the SG chooses a random number $x_s \in \mathbb{Z}_q^*$ as its private key and computes the corresponding public key $Y_s = x_s P$; each user $i \in U$ of the LAN chooses a random number $x_i \in \mathbb{Z}_q^*$ as its private key and computes the corresponding public key $Y_i = x_i P$.

C. Energy Usage Transmission
Since the CC needs to make management and control decisions for the grid power system, each user should send its real-time data to the CC. Concretely, user $i$ encrypts its data $m_i$ with Paillier homomorphic encryption by choosing a random number $r_i \in \mathbb{Z}_{n^2}^*$ and computing the ciphertext
$$c_i = E(m_i) = g^{m_i} r_i^{n} \bmod n^2.$$
Then, user $i$ uses its private key $x_i$ to generate a signature $\sigma_i$ on $c_i$ as [38]
$$\sigma_i = x_i H(c_i \| LAN \| U_i \| TS),$$
where $TS$ is the current timestamp (used to resist potential replay attacks). Finally, the user sends the encrypted usage data $c_i \| LAN \| U_i \| TS \| \sigma_i$ to both the GW and the SG.

D. Recovery of the Encrypted Energy Usage
The SG verifies the validity of the signature by checking $e(P, \sigma_i) = e(Y_i, H(c_i \| LAN \| U_i \| TS))$ and recovers the corresponding usage data $m_i = D(c_i)$ from the ciphertext $c_i$.

E. Transmission of Theft Result
The theft detection state is expressed as 1 or 0, where abnormal data are labeled as 1 and normal data as 0. The SG picks $r_i \in \mathbb{Z}_{n^2}^*$ and encrypts the detection result $t_i$ into the ciphertext
$$a_i = E(t_i) = g^{t_i} r_i^{n} \bmod n^2,$$
where $t_i$ is 1 or 0. To resist potential replay attacks, it uses its private key $x_s$ and the current timestamp $TS$ to generate a signature $\beta_i = x_s H(a_i \| LAN \| SG \| TS)$. Finally, the SG sends the encrypted detection result $a_i \| LAN \| SG \| TS \| \beta_i$ to the GW.
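The encryption in C (Energy Usage Transmission) and E (Transmission of Theft Result), together with the aggregation (F) and decryption (G) described next, rely only on the additive homomorphism of Paillier encryption. The toy Python sketch below traces that arithmetic end to end: meters encrypt usage, the SG encrypts 0/1 theft flags, the GW multiplies ciphertexts, and the CC decrypts only the two sums. It is a sketch under stated assumptions, not the authors' implementation: the primes 61 and 53, the choice $g = n + 1$, the readings, and the flags are hypothetical; $r$ is drawn from $\mathbb{Z}_n^*$ as in Paillier's original formulation; and the signatures, timestamps, and pairing-based checks are omitted. Real deployments need primes of roughly 1024 bits or more.

```python
import math
import random


def L(u: int, n: int) -> int:
    """Paillier's L function, L(u) = (u - 1) / n, exact when u = 1 mod n."""
    return (u - 1) // n


def keygen(p: int, q: int):
    """CC side: public key (n, g) and private data (lambda, mu)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # assumption: a common valid choice of g in Z*_{n^2}
    # mu = L(g^lambda mod n^2)^{-1} mod n implements the division in D(c).
    mu = pow(L(pow(g, lam, n * n), n), -1, n)
    return (n, g), (lam, mu)


def encrypt(pk, m: int, rng: random.Random) -> int:
    """E(m) = g^m r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2


def decrypt(pk, sk, c: int) -> int:
    """D(c) = L(c^lambda mod n^2) / L(g^lambda mod n^2) mod n."""
    n, _ = pk
    lam, mu = sk
    return L(pow(c, lam, n * n), n) * mu % n


def aggregate(pk, ciphertexts) -> int:
    """GW side: multiplying ciphertexts mod n^2 adds the plaintexts."""
    n, _ = pk
    c = 1
    for ci in ciphertexts:
        c = c * ci % (n * n)
    return c


rng = random.Random(2019)
pk, sk = keygen(61, 53)        # toy primes; n = 3233, so every sum must stay below n
usage = [12, 7, 30, 5]         # hypothetical readings m_i of w = 4 users
flags = [0, 1, 0, 0]           # hypothetical SG detection results t_i
c_list = [encrypt(pk, m, rng) for m in usage]  # C: at each meter
a_list = [encrypt(pk, t, rng) for t in flags]  # E: at the SG
c = aggregate(pk, c_list)                      # F: at the GW
a = aggregate(pk, a_list)
total_usage = decrypt(pk, sk, c)               # G: at the CC -> 54
num_abnormal = decrypt(pk, sk, a)              # G: at the CC -> 1
print(total_usage, num_abnormal, len(usage) - num_abnormal)
```

Since every $t_i$ is 0 or 1, the decrypted sum of flags counts the abnormal meters directly, and subtracting it from the number of users gives the normal ones; this is the property that lets the CC learn both counts without seeing any single $t_i$.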


F. Aggregation
Upon receiving all $\omega$ encrypted energy usage reports $c_i \| LAN \| U_i \| TS \| \sigma_i$, for $i = 1, 2, \cdots, \omega$, the local GW first checks the timestamp $TS$ and verifies the validity of each signature $\sigma_i$ by $e(P, \sigma_i) = e(Y_i, H(c_i \| LAN \| U_i \| TS))$. To perform the verification efficiently, the GW verifies the signatures in a batch as
$$\prod_{i=1}^{\omega} e(P, \sigma_i) = \prod_{i=1}^{\omega} e(P, x_i H(c_i \| LAN \| U_i \| TS)) = \prod_{i=1}^{\omega} e(Y_i, H(c_i \| LAN \| U_i \| TS)).$$
Similarly, the GW verifies the validity of the SG's signatures $\beta_i$. Then it performs the following steps for privacy-preserving report aggregation. First, the GW aggregates the encrypted usage data $c_1, c_2, \cdots, c_{\omega}$ into $c$ as
$$c = \prod_{i=1}^{\omega} c_i \bmod n^2 = g^{m_1 + m_2 + \cdots + m_{\omega}} \Big(\prod_{i=1}^{\omega} r_i^{n}\Big) \bmod n^2.$$
Similarly, the GW aggregates the encrypted detection results $a_1, a_2, \cdots, a_{\omega}$ into $a$ as
$$a = \prod_{i=1}^{\omega} a_i \bmod n^2 = g^{t_1 + t_2 + \cdots + t_{\omega}} \Big(\prod_{i=1}^{\omega} r_i^{n}\Big) \bmod n^2,$$
where the $r_i$ here are the random numbers chosen by the SG in E.
Then, the GW uses its private key $x_g$ to produce a signature $\sigma_g = x_g H(c \| a \| LAN \| GW \| TS)$, where $TS$ is the current timestamp. Finally, the GW publishes the aggregated encrypted data $c \| a \| LAN \| GW \| TS \| \sigma_g$ to the CC.

G. Decryption of the Aggregated Ciphertext
Upon receiving $c \| a \| LAN \| GW \| TS \| \sigma_g$, the CC checks $e(P, \sigma_g) = e(Y_g, H(c \| a \| LAN \| GW \| TS))$ and decrypts the aggregated data $c$ and $a$ as
$$m = \frac{L(c^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n, \qquad t = \frac{L(a^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n,$$
where $m = m_1 + m_2 + \cdots + m_{\omega}$ and $t = t_1 + t_2 + \cdots + t_{\omega}$. Thus, the CC learns the total energy usage and the numbers of abnormal meters ($t$) and normal meters ($\omega - t$) in an area, without learning any individual user's energy usage or which meters are abnormal, and can therefore make accurate decisions for the grid.

PART 2: THEFT DETECTION WITH OUR PROPOSED COMBINED CNN
In this part, we show how to use our combined CNN model to analyze the decrypted smart meter data and send the detection results in encrypted form.

A. Data Preprocessing
The energy usage data contain missing or erroneous values. We use the forward interpolation method to recover the missing values as
$$f(x_i) = \begin{cases} 0, & x_i \in NaN,\ i = 1; \\ x_{i-1}, & x_i \in NaN,\ i > 1; \\ x_i, & x_i \notin NaN, \end{cases}$$
where $x_i$ represents the energy usage value over a period (e.g., a day). If $x_i$ is null or a non-numeric character, we treat it as a member of $NaN$ (here $NaN$ denotes a set). We obtain energy data from $m$ users over $n$ units of time. Assuming the length of a time cycle is $c$ units, we have $n = k \cdot c$ for $k$ time cycles, including the reference group users' data.
We obtain $m$ single samples in total, and each sample $s$ is a vector of length $k \cdot c + 1$, whose last value $y$ is a single label (1 or 0). Assuming the size of each group is $g$, samples can be combined into reference groups. We construct a $k \times c$ matrix $S$ from the preceding $k \cdot c$ values of each sample $s$.
A common CNN input has shape (height, width, channel), i.e., the height, width, and color channels of an image. To format our data, the height is the number of cycles ($k$), the width is the cycle length ($c$), and the channel dimension represents the number of different users. For a single target user, the channel is 1; for the reference group users, the channel is $g$. Thus, we obtain an image-like data structure for a single user,
$$\begin{bmatrix} [S_{1,1}] & \cdots & [S_{1,c}] \\ \vdots & \ddots & \vdots \\ [S_{k,1}] & \cdots & [S_{k,c}] \end{bmatrix},$$
whose shape is $(k, c, 1)$, and an image-like data structure for group users whose shape is $(k, c, g)$.
For the output data, our purpose is to detect the label value of the current data, so it is a one-dimensional vector $[s_{k \cdot c + 1}]$ of shape $(1)$. Each model we train has two datasets: a training set and a validation set. To split them, we first shuffle the dataset and then slice it into training and validation data, so the samples are randomly assigned to training or validation, as shown in Fig. 3.
Fig. 3. Split data set (the sequence $[d_1, d_2, \ldots, d_{m \cdot (n - k \cdot c)}]$ is shuffled into $[d'_1, d'_2, \ldots, d'_t, d'_{t+1}, \ldots, d'_{m \cdot (n - k \cdot c)}]$, which is then sliced into a training part and a validation part).
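The data preprocessing above, together with the padding and reshaping described in Part 0 (1,035 daily values padded to 1,036 and reshaped into a $148 \times 7$ weekly matrix), can be sketched in plain Python. This is a hedged illustration, not the authors' pipeline: all function names are hypothetical; the zero is appended at the end of the series because the text does not say where it is inserted; a missing value copies the previously filled value, so consecutive gaps propagate the last valid reading; reference users are stacked along the channel axis to give shape $(k, c, g)$; and the split ratio passed to `shuffle_split` is a free choice, since the paper does not state one.

```python
import random


def is_missing(x) -> bool:
    """None, non-numeric entries, and float NaN all count as members of the 'NaN' set."""
    return not isinstance(x, (int, float)) or x != x


def forward_fill(series):
    """Forward interpolation: a missing first value becomes 0, and any later
    missing value takes the previous (already filled) value."""
    filled = []
    for i, x in enumerate(series):
        if not is_missing(x):
            filled.append(float(x))
        elif i == 0:
            filled.append(0.0)
        else:
            filled.append(filled[-1])
    return filled


def to_matrix(series, cycle: int):
    """Zero-pad at the end to a multiple of `cycle`, then cut into rows of one cycle."""
    padded = list(series) + [0.0] * ((-len(series)) % cycle)
    return [padded[r:r + cycle] for r in range(0, len(padded), cycle)]


def stack_channels(matrices):
    """Combine g users' k x c matrices into one k x c x g image-like array."""
    k, c = len(matrices[0]), len(matrices[0][0])
    return [[[m[r][col] for m in matrices] for col in range(c)] for r in range(k)]


def shuffle_split(samples, train_fraction: float, seed: int):
    """Shuffle, then slice into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


print(forward_fill([1.2, None, float("nan"), 3.4, "err", 2.0]))
# [1.2, 1.2, 1.2, 3.4, 3.4, 2.0]
matrix = to_matrix([1.0] * 1035, 7)  # hypothetical 1,035-day series
print(len(matrix), len(matrix[0]))   # 148 7
```

Each row of the resulting matrix holds one 7-day cycle, so a convolution filter sliding down a column compares the same position in consecutive weeks.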


[Figure: Target input 148×7×1 → CL1-1 148×7×8; Reference input 148×7×10 → CL1-2 148×7×8; Merge 148×7×16 → PL1 37×6×16 → CL2 37×6×32 → PL2 9×5×32 → CL3 9×5×64 → PL3 3×2×64 → Flatten 384 → FL1 128 → FL2 32 → FL3 → Output]

Fig. 4. Our proposed combined CNN model framework
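To make the shape bookkeeping of Fig. 4 concrete, the stdlib-only sketch below traces tensor shapes through the combined CNN. It is not the authors' Keras code. The paper does not state kernel or pooling sizes, so we assume stride-1, "same"-padded convolutions (which keep height and width, as Fig. 4 shows) and pick unpadded max-pooling windows of 4×2 with strides 4×1 for PL1 and PL2 and 3×2 with strides 3×2 for PL3, purely because they reproduce the reported shapes. All function names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def conv_same(shape: Shape, filters: int) -> Shape:
    """Stride-1, 'same'-padded Conv2D: keeps height and width, sets channels.

    Under these assumptions the kernel size does not affect the output
    shape, so it is omitted.
    """
    h, w, _ = shape
    return (h, w, filters)


def max_pool_valid(shape: Shape, pool: Tuple[int, int],
                   strides: Tuple[int, int]) -> Shape:
    """Unpadded max pooling: out = (n - p) // s + 1 along each spatial axis."""
    h, w, ch = shape
    return ((h - pool[0]) // strides[0] + 1, (w - pool[1]) // strides[1] + 1, ch)


def merge(a: Shape, b: Shape) -> Shape:
    """Channel-wise concatenation of two maps with equal height and width."""
    if a[:2] != b[:2]:
        raise ValueError("merged feature maps must share height and width")
    return (a[0], a[1], a[2] + b[2])


def trace_fig4(j: int = 148, c: int = 7, g: int = 10, alpha: int = 8,
               phi: int = 128) -> List[Tuple[str, Shape]]:
    """Return (layer name, output shape) pairs for the Fig. 4 example."""
    target = conv_same((j, c, 1), alpha)     # CL1-1 on the target thread
    reference = conv_same((j, c, g), alpha)  # CL1-2 on the reference thread
    x = merge(target, reference)             # (j, c, 2 * alpha)
    trace = [("CL1-1", target), ("CL1-2", reference), ("Merge", x)]
    # ASSUMED (window, stride) pairs, chosen only to reproduce Fig. 4's shapes.
    pools = [((4, 2), (4, 1)), ((4, 2), (4, 1)), ((3, 2), (3, 2))]
    for i, (pool, strides) in enumerate(pools, start=1):
        x = max_pool_valid(x, pool, strides)
        trace.append((f"PL{i}", x))
        if i < len(pools):                   # PL1 and PL2 are followed by a conv
            x = conv_same(x, 2 * x[2])       # each conv doubles the feature count
            trace.append((f"CL{i + 1}", x))
    trace.append(("Flatten", (x[0] * x[1] * x[2],)))
    # Fully connected widths phi, 32 and 2 are read off Fig. 4.
    trace += [("FL1", (phi,)), ("FL2", (32,)), ("FL3", (2,))]
    return trace


if __name__ == "__main__":
    for name, shape in trace_fig4():
        print(f"{name:8s}{shape}")
```

In Keras, these steps would roughly correspond to `Conv2D(..., padding="same")`, `MaxPooling2D(pool_size=..., strides=...)` with its default `padding="valid"`, `Concatenate()` on the channel axis, and `Flatten()` followed by `Dense` layers; the trace shows, for instance, why flattening the 3×2×64 map yields 384 features.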

B. Our combined CNN model

We use 2D convolution layers and fully connected layers to build our proposed combined CNN framework, and use a merge layer to merge the two input threads, as shown in Fig. 4. The framework consists of three stages and a combine step.

• Individual features extracting stage: For the input layer, we have two input threads, one for the target user and one for the reference users. The shape of the target input data is (j, c, 1) since the target is a single user, while the shape of the reference input data is (j, c, g) since there are g reference users. We use a dedicated convolution layer for each of the two input threads, each with α convolution filters; after these two parallel convolution layers, both sets of data have shape (j, c, α).

• Combine: We use multiple parallel convolution layers to extract features from the two input data independently. Then, we use a merge layer to combine the features of the target data and the reference data. Assuming that after β parallel layers both sets of data have shape (j, c, α), the merged data have shape (j, c, 2α).

• Combined features extracting stage: We stack convolution and pooling layers as in common CNN models. To extract more features and reduce computation, we alternate convolution and pooling layers, one of each at a time; each convolution layer doubles the number of features and each pooling layer changes the shape. For example, if the current shape is (j, c, γ), it becomes (j/2, c − 1, γ) after a pooling layer and (j/2, c − 1, 2γ) after the following convolution layer.

• Combined features reducing stage: After several convolution and pooling layers, we flatten the data to one dimension so that fully connected layers can be stacked. Since the flattened layer is very large, we add a fully connected layer of length φ to change the shape to (φ), and stack smaller fully connected layers after it. Finally, we employ a fully connected layer with softmax as the output layer to classify the target. Since there are only two categories, theft or normal, the final output shape is (2): one value is the probability of theft, the other is the probability of normal, and the two sum to 1. If the probability of theft is greater than that of normal, we regard the metering data as abnormal, and vice versa.

The above variables are adjustable, which leaves room to optimize the expressiveness and efficiency of our combined CNN model. Fig. 4 shows an example parameter set, in which j equals 148, c equals 7, g equals 10, α equals 8 and φ equals 128. To simplify the example model, we use 1 parallel convolution layer in the individual features extracting stage, 3 pooling layers and 2 convolution layers in the combined features extracting stage, and 3 fully connected layers in the combined features reducing stage.

VI. SECURITY ANALYSIS

In this section, we analyze the security of the proposed scheme against the security requirements proposed in Section III-C.

• Fine-grained data privacy preservation
In the scheme, the energy usage m_i and the detection result t_i are encrypted, and the LAN aggregates the users' information into c and a before sending them to the CC. Moreover, the usage data sent to the CC are aggregated over a number of users, so the control center cannot recover the energy usage of any individual user. In addition, the CC learns only the number of users who honestly report their usage data, rather than the state of each individual user.

• Data confidentiality
The user sends the encrypted energy usage c_i∥LAN∥U_i∥TS∥σ_i, and the SG sends the encrypted detection result a_i∥LAN∥SG∥TS∥β_i, to the LAN. Other parties, including the LAN, learn nothing about the plaintext energy usage or detection result. The smart-meter data are encrypted with the Paillier algorithm before being sent to the LAN, and are then forwarded to the CC as homomorphically aggregated ciphertexts. Since users' data are transmitted only in ciphertext throughout this process, an attacker cannot obtain any information about the data.

• Data authentication and data integrity
In our scheme, each user's data and the aggregated data are signed with a short signature together with a timestamp, so the authenticity and integrity of the data are guaranteed. If an adversary attempts to modify the data, the LAN gateway or the CC can detect it.
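The privacy argument above rests on Paillier's additive homomorphism [36]: multiplying ciphertexts modulo n² adds the underlying plaintexts, so the CC can decrypt only the sums m and t. The following is a minimal, deliberately insecure toy sketch in plain Python, not the authors' implementation. It assumes the common generator choice g = n + 1, uses tiny hard-coded primes, omits the short signatures and timestamps, and uses hypothetical readings and flags. Decryption follows the CC's formula given earlier, with the division by L(g^λ mod n²) computed as multiplication by its modular inverse μ.

```python
import math
import random
from typing import Iterable, Tuple

PublicKey = Tuple[int, int]   # (n, g)
SecretKey = Tuple[int, int]   # (lambda, mu)


def L(x: int, n: int) -> int:
    """Paillier's L function: L(x) = (x - 1) / n (exact for valid inputs)."""
    return (x - 1) // n


def keygen(p: int, q: int) -> Tuple[PublicKey, SecretKey]:
    """Toy key generation with g = n + 1; p and q must be distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
    g = n + 1
    mu = pow(L(pow(g, lam, n * n), n), -1, n)          # 1 / L(g^lambda mod n^2) mod n
    return (n, g), (lam, mu)


def encrypt(pk: PublicKey, m: int, rng: random.Random) -> int:
    """E(m) = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pk
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def aggregate(pk: PublicKey, ciphertexts: Iterable[int]) -> int:
    """Multiply ciphertexts mod n^2 (the LAN's step); decrypts to the sum."""
    n, _ = pk
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc


def decrypt(pk: PublicKey, sk: SecretKey, c: int) -> int:
    """D(c) = L(c^lambda mod n^2) * mu mod n (the CC's step)."""
    n, _ = pk
    lam, mu = sk
    return (L(pow(c, lam, n * n), n) * mu) % n


if __name__ == "__main__":
    rng = random.Random(2019)
    pk, sk = keygen(1009, 1013)  # toy primes; real use needs a 2048-bit or larger n
    usage = [12, 7, 30, 5]       # hypothetical m_i readings of w = 4 meters
    flags = [1, 1, 0, 1]         # hypothetical t_i: 1 = meter judged normal
    c = aggregate(pk, (encrypt(pk, m, rng) for m in usage))
    a = aggregate(pk, (encrypt(pk, t, rng) for t in flags))
    print(decrypt(pk, sk, c), decrypt(pk, sk, a))  # 54 3
```

The LAN only ever handles ciphertexts in `aggregate`, and the CC's `decrypt` recovers the totals 54 and 3 without seeing any individual m_i or t_i, which is the property the privacy and confidentiality arguments rely on.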


[Fig. 5 layer stacks, one model per column of the original figure:
Combined CNN model: Input (148,7,1) and (148,7,10) → Conv2d layer 1-1 (148,7,8) and Conv2d layer 1-2 (148,7,8) → Merge layer 1 (148,7,16) → Pooling layer 1 (37,6,16) → Conv2d layer 2 (37,6,32) → Pooling layer 2 (9,5,32) → Conv2d layer 3 (9,5,64) → Pooling layer 3 (3,2,64) → FC layer 1 (128) → FC layer 2 (32) → FC layer 3 (2)
Single CNN model: Input (148,7,1) → Conv2d layer (148,7,16) → Pooling layer 1 (37,6,16) → Conv2d layer 2 (37,6,32) → Pooling layer 2 (9,5,32) → Conv2d layer 3 (9,5,64) → Pooling layer 3 (3,2,64) → FC layer 1 (128) → FC layer 2 (32) → FC layer 3 (2)
Simple DNN model: Input (1036) → FC layer 1 (128) → FC layer 2 (32) → FC layer 3 (2)]

Fig. 5. Model configurations

VII. EXPERIMENTAL RESULTS

To evaluate the proposed energy theft detection scheme with energy privacy preservation, we conduct simulations on a 64-bit computer with a dual-core Intel(R) Core(TM) i5-2410M 2.30 GHz CPU and 4 GB RAM, using Python, NumPy, Pandas, TensorFlow and Keras. The energy usage data come from SGCC [37].

A. Experimental Data

We use data from 42372 users covering two and a half years, where each value is a user's energy usage for one day, and the data exhibit similarity over a 7-day cycle [15]. Therefore, we set c to 7 and k to 148, which determines the history used to detect theft. We randomly select 80% of the 44218 samples to compose the training data set, and the remaining 20% compose the validation data set. We use Keras as the implementation tool to build and train our model.

Accuracy score: To train our model, we use categorical cross-entropy as the loss function. To evaluate the performance of the models, we use the accuracy score. Let y denote the predicted values and ŷ the true values, where y_i is the predicted value of the i-th sample and ŷ_i is the corresponding true value. All models share the following accuracy score:

$$\mathrm{accuracy}(y, \hat{y}) = \frac{1}{n_{\mathrm{samples}}} \sum_{i=1}^{n_{\mathrm{samples}}} \delta(\hat{y}_i, y_i), \qquad \delta(\hat{y}_i, y_i) = \begin{cases} 1, & \hat{y}_i = y_i, \\ 0, & \text{otherwise.} \end{cases}$$

B. Model comparison

We follow the CNN framework described above to build the proposed combined CNN model shown in Fig. 4. For comparison, we employ a single CNN model and a simple deep neural network (DNN) model. Our proposed combined CNN model includes 2 input layers, 4 convolution layers, 3 pooling layers and 3 fully connected layers. The single CNN model includes 3 convolution layers, 3 pooling layers and 3 fully connected layers. The simple DNN model includes only 3 fully connected layers.

The configurations of the models evaluated in this paper are outlined in Fig. 5, one per column. The shape of the target input data of our proposed combined CNN model is (148, 7, 1), and the shape of the reference input data is (148, 7, 10). Through Conv2d layer 1-1, the shape of the target thread data becomes (148, 7, 8); through Conv2d layer 1-2, the shape of the reference thread data becomes (148, 7, 8). The merge layer combines the target data and the reference data into one, after which the shape of the data is (148, 7, 16). The pooling layer uses max pooling, after which the shape of the data becomes (37, 6, 16). Similarly, the subsequent layers of each model produce the corresponding shapes in sequence.

As shown in Table I, we initialize the weights randomly and use 128 as the batch size. We then compile each model with the SGD optimizer and the loss function, and use the accuracy score as the metric function to evaluate its performance.

TABLE I
MODEL CONFIGURATIONS

| Model | Origin weights | Optimizer |
|---|---|---|
| Proposed combined CNN model | random | SGD |
| Single CNN model | random | SGD |
| Simple DNN model | random | SGD |

After compiling, we train each model on the input data in batches and evaluate the performance metric after every epoch, where an epoch means that every sample in the training data set has been selected once. The training results are shown in Fig. 6, where the horizontal axis indicates the number of training epochs and the vertical axis indicates the average loss and accuracy score. We observe that the average loss of our proposed combined CNN model keeps decreasing as training proceeds, and that after 100 epochs of training it achieves a higher accuracy score than the single CNN model. After 100 epochs of training, the simple DNN model achieves a lower accuracy score than both CNN models.

C. Method Comparison

To evaluate the performance of our combined CNN model, we compare its results on the given data set with those of traditional machine learning methods. Concretely, Linear SVC [39] is an implementation of support vector classification for the linear kernel case; Random Forest [40] is an averaging algorithm based on randomized decision trees; in Logistic Regression [41], the probabilities of the possible outcomes of a single trial are modeled using a logistic function. Table II gives the arguments used to train these baseline models. Using the same training and validation data sets, Table II shows that our proposed combined CNN model achieves the highest accuracy score, 0.9267.

D. Parameter Study

There are various configurable parameters which cannot be optimized by training but can affect the performance


[Figure: four panels (Training loss, Validation loss, Training accuracy, Validation accuracy), each plotting the loss or accuracy value against epochs (0 to 100) for our proposed combined CNN model, the single CNN model and the simple DNN model.]

Fig. 6. Model comparison

[Figure: four panels (Training loss, Validation loss, Training accuracy, Validation accuracy), each plotting the loss or accuracy value against epochs (0 to 100) for the combined model and the combined model (1).]

Fig. 7. Parameter study of χ

[Figure: four panels (Training loss, Validation loss, Training accuracy, Validation accuracy), each plotting the loss or accuracy value against epochs (0 to 100) for the combined model, the combined model (2) and the combined model (3).]

Fig. 8. Parameter study of ε

[Figure: four panels (Training loss, Validation loss, Training accuracy, Validation accuracy), each plotting the loss or accuracy value against epochs (0 to 100) for the combined model and the combined model (4).]

Fig. 9. Parameter study of τ

TABLE II
ALGORITHM ACCURACY SCORES

| Algorithm | Arguments | Accuracy score |
|---|---|---|
| Proposed combined CNN model | 100 epochs | 0.9267 |
| Single CNN model | 100 epochs | 0.9218 |
| Simple DNN model | 100 epochs | 0.9145 |
| Linear SVC | kernel: linear function | 0.9178 |
| Random Forest | max depth: 7 | 0.9164 |
| Logistic Regression | penalty: L2 | 0.9141 |

of the model, such as the batch size, the learning rate of the optimizer and the dropout rate. The batch size χ is the number of samples used to evaluate the loss in each optimization step; the learning rate ε scales the loss gradient applied to the model and thus determines how fast the optimization proceeds; the dropout rate τ is the fraction of signals between layers that are randomly ignored. To avoid overfitting, dropout layers are introduced into the model. We therefore analyze in depth the impact of these parameters on the performance of our proposed combined CNN model. In Fig. 6, the parameters of the proposed combined CNN model are χ = 32, ε = 0.1 and τ = 0.2. The training results of the parameter study are as follows.

1) Effect of batch size χ: Fig. 7 shows the performance of our combined model (1), with the batch size set to 128, which achieves a high maximum accuracy of 0.9268 but needs more epochs to optimize. A smaller batch size speeds up optimization within the same number of epochs, which suggests that a batch size between 32 and 128 is more acceptable.

2) Effect of learning rate ε: Fig. 8 shows the performance of our combined model (2) with a learning rate of 0.3, which achieves a lower accuracy of 0.9241 but reaches its maximum accuracy faster. In comparison, the learning rate of the


TABLE III
PROPERTIES COMPARISON

| | Proposed scheme | Zheng et al. [15] | Salinas et al. [20] | Jindal et al. [10] |
|---|---|---|---|---|
| Technique adopted | CNN | CNN | state estimation | decision tree and SVM |
| User privacy | Yes | No | Yes | No |
| Dispatching of smart grid | Yes | No | Yes | Yes |
| Massive data processing | Yes | Yes | No | No |

[Figure: four panels (Training loss, Validation loss, Training accuracy, Validation accuracy), each plotting the loss or accuracy value against epochs (0 to 100) for the proposed combined CNN model and [15].]

Fig. 10. Our proposed combined CNN model versus [15]

Combined Model (3) is 0.03, as shown in Fig. 8; this model achieves a high accuracy of 0.9263 but reaches its maximum accuracy later. Thus the learning rate affects the speed of optimization, and in our model setting a learning rate no larger than 0.1 is more acceptable.

3) Effect of dropout rate τ: The dropout rate of combined model (4) is 0.1, as shown in Fig. 9; this model achieves a high accuracy of 0.9267, but not steadily. Moreover, the performance gap between the validation set and the training set is very large, with a maximum training accuracy of 0.9607. Thus dropout reduces overfitting, and in our model introducing dropout with a rate larger than 0.1 is more acceptable.

E. Comparison with Existing Schemes

This section compares the proposed scheme with existing schemes. The comparison shows that in Zheng's [15] and Jindal's [10] schemes, users' energy usage may be leaked, so user privacy cannot be guaranteed. Salinas's scheme [20] cannot support dispatching in the smart grid, because only residuals are sent to the operator and the control center therefore cannot learn the total energy usage in the area. Furthermore, Salinas's and Jindal's schemes are unable to detect theft over massive data. Thus, as shown in Table III, the proposed scheme achieves user privacy and supports dispatching in the smart grid while detecting energy theft over massive data.

F. Proposed Combined CNN model versus [15]

As the only two schemes in Table III that use neural networks to detect energy theft, our scheme and [15] are compared directly. We build the model of [15], consisting of its wide component and deep CNN component, with parameters ω = 90, ψ = 60, ξ = 90 and ζ = 3 (ω, ψ and ξ control the numbers of neurons, and ζ is the number of convolution layers), and compare it with our proposed combined CNN model shown in Fig. 4 on the same data set.

As shown in Fig. 10, with our proposed combined CNN model the loss decreases faster and the accuracy increases faster on both the training and validation sets, and the performance improves more stably during training. Finally, the maximum accuracy we achieve, 0.9267, is higher than the 0.9254 achieved by [15]. This improvement may be attributed to the use of two input threads, which study energy theft behavior from a user-group perspective.

VIII. CONCLUSION

In this paper, we have proposed an energy theft detection scheme with energy privacy preservation in the smart grid. Energy theft detection based on our proposed combined CNN model is used to detect whether the metering data exhibit abnormal behavior. Moreover, the users' usage data and the number of users who honestly report their usage data are protected by the Paillier homomorphic algorithm. In addition, the security analysis shows that our scheme achieves confidentiality and integrity, as well as data privacy. The experimental results show that the accuracy of our anomaly detection is higher than that of the other methods compared. In future work, we intend to reduce the communication and computing overhead of our scheme.

REFERENCES

[1] Y. Sun, L. Lampe, and V. W. S. Wong, “Smart meter privacy: Exploiting the potential of household energy storage units,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–1, 2017.
[2] T. Song, R. Li, B. Mei, J. Yu, X. Xing, and X. Cheng, “A privacy preserving communication protocol for IoT applications in smart homes,” IEEE Internet of Things Journal, vol. PP, no. 99, pp. 1–1, 2017.
[3] S. Mclaughlin, D. Podkuiko, and P. Mcdaniel, “Energy theft in the advanced metering infrastructure,” in Critical Information Infrastructures Security, International Workshop, CRITIS 2009, Bonn, Germany, September 30 - October 2, 2009. Revised Papers, 2009, pp. 176–187.
[4] R. Jiang, R. Lu, Y. Wang, J. Luo, C. Shen, and X. Shen, “Energy-theft detection issues for advanced metering infrastructure in smart grid,” Tsinghua Science and Technology, vol. 19, no. 2, pp. 105–120, 2014.
[5] X. Shen, X. Liang, J. Lei, K. Zhang, R. Lu, and M. Wen, “PaRQ: A privacy-preserving range query scheme over encrypted metering data for smart grid,” IEEE Transactions on Emerging Topics in Computing, vol. 1, no. 1, pp. 178–191, 2013.
[6] T. Ahmad, D. Q. U. Hasan, and S. Zada, “Non-technical loss detection, prevention and suppression issues for AMI in smart grid,” International Journal of Scientific & Engineering Research, vol. 6, no. 3, pp. 217–228, 2015.
[7] P. Mcdaniel and S. Mclaughlin, “Security and privacy challenges in the smart grid,” IEEE Security & Privacy, vol. 7, no. 3, pp. 75–77, 2009.


[8] R. Lu, X. Liang, X. Li, X. Lin, and X. Shen, “EPPA: An efficient and privacy-preserving aggregation scheme for secure smart grid communications,” IEEE Transactions on Parallel & Distributed Systems, vol. 23, no. 9, pp. 1621–1631, 2012.
[9] S. S. S. R. Depuru, L. Wang, and V. Devabhaktuni, “Support vector machine based data classification for detection of electricity theft,” in Power Systems Conference and Exposition, 2011, pp. 1–8.
[10] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision tree and SVM-based data analytics for theft detection in smart grid,” IEEE Transactions on Industrial Informatics, vol. 12, no. 3, pp. 1005–1016, 2016.
[11] J. Nagi, K. S. Yap, S. K. Tiong, S. K. Ahmed, and A. M. Mohammad, “Detection of abnormalities and electricity theft using genetic support vector machines,” in TENCON 2008 - 2008 IEEE Region 10 Conference, 2009, pp. 1–6.
[12] S. Salinas, M. Li, and P. Li, “Privacy-preserving energy theft detection in smart grids,” in Sensor, Mesh and Ad Hoc Communications and Networks, 2012, pp. 257–267.
[13] S. C. Yip, C. K. Tan, W. N. Tan, M. T. Gan, and A. H. A. Bakar, “Energy theft and defective meters detection in AMI using linear regression,” in IEEE International Conference on Environment and Electrical Engineering and 2017 IEEE Industrial and Commercial Power Systems Europe, 2017, pp. 1–6.
[14] S. Salinas, C. Luo, W. Liao, and P. Li, “State estimation for energy theft detection in microgrids,” in International Conference on Communications and Networking in China, 2015, pp. 96–101.
[15] Z. Zheng, Y. Yang, X. Niu, H. N. Dai, and Y. Zhou, “Wide & deep convolutional neural networks for electricity-theft detection to secure smart grids,” IEEE Transactions on Industrial Informatics, vol. PP, no. 99, pp. 1–1, 2017.
[16] A. Abdallah and X. Shen, “Lightweight security and privacy preserving scheme for smart grid customer-side networks,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–1, 2017.
[17] M. Wen, X. Zhang, H. Li, and J. Li, “A data aggregation scheme with fine-grained access control for the smart grid,” in Vehicular Technology Conference, 2018, pp. 1–5.
[18] M. Wen, R. Lu, J. Lei, X. Liang, H. Li, and X. Shen, “ECQ: An efficient conjunctive query scheme over encrypted multidimensional data in smart grid,” in Global Communications Conference, 2013, pp. 796–801.
[19] C. Richardson, N. Race, and P. Smith, “A privacy preserving approach to energy theft detection in smart grids,” in Smart Cities Conference, 2016.
[20] S. A. Salinas and P. Li, “Privacy-preserving energy theft detection in microgrids: A state estimation approach,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 883–894, 2016.
[21] W. Luan, G. Wang, Y. Yu, J. Lin, W. Zhang, and Q. Liu, “Energy theft detection via integrated distribution state estimation based on AMI and SCADA measurements,” in International Conference on Electric Utility Deregulation and Restructuring and Power Technologies, 2016, pp. 751–756.
[22] C. L. Su, W. H. Lee, and C. K. Wen, “Electricity theft detection in low voltage networks with smart meters using state estimation,” in IEEE International Conference on Industrial Technology, 2016, pp. 493–498.
[23] S. C. Huang, Y. L. Lo, and C. N. Lu, “Non-technical loss detection using state estimation and analysis of variance,” IEEE Transactions on Power Systems, vol. 28, no. 3, pp. 2959–2966, 2013.
[24] S. Amin, G. A. Schwartz, A. A. Cardenas, and S. S. Sastry, “Game-theoretic models of electricity theft detection in smart utility networks: Providing new capabilities with advanced metering infrastructure,” IEEE Control Systems, vol. 35, no. 1, pp. 66–81, 2015.
[25] A. A. Cardenas, S. Amin, G. Schwartz, and R. Dong, “A game theory model for electricity theft detection and privacy-aware control in AMI systems,” in Communication, Control, and Computing, 2012, pp. 1830–1837.
[26] D. R. Pereira, M. A. Pazoti, L. Pereira, S. A. M, D. Rodrigues, C. O. Ramos, A. Souza, J. Papa, and o P, “Social-spider optimization-based support vector machines applied for energy theft detection,” Computers & Electrical Engineering, vol. 49, no. C, pp. 25–38, 2016.
[27] Z. Erkin and G. Tsudik, “Private computation of spatial and temporal power consumption with smart meters,” in International Conference on Applied Cryptography and Network Security, 2012, pp. 561–577.
[28] F. Li, B. Luo, and P. Liu, Secure and Privacy-Preserving Information Aggregation for Smart Grids. Inderscience Publishers, 2011.
[29] H. Bao and R. Lu, “A new differentially private data aggregation with fault tolerance for smart grid communications,” IEEE Internet of Things Journal, vol. 2, no. 3, pp. 248–258, 2017.
[30] P. Duplessis and P. Lescuyer, “Access network, gateway and management server for a cellular wireless communication system,” 2010.
[31] Y. H. Heo, Z. Cai, A. M. Earnshaw, S. Mcbeath, and M. H. Fong, “Reporting power headroom for aggregated carriers,” 2013.
[32] R. R. Varior, M. Haloi, and G. Wang, “Gated siamese convolutional neural network architecture for human re-identification,” in European Conference on Computer Vision, 2016, pp. 791–808.
[33] G. E. Dahl, T. N. Sainath, and G. E. Hinton, “Improving deep neural networks for LVCSR using rectified linear units and dropout,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 8609–8613.
[34] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in International Conference on Machine Learning, 2010, pp. 807–814.
[35] L. Bottou, Stochastic Gradient Descent Tricks. Springer Berlin Heidelberg, 2012.
[36] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in International Conference on Theory and Application of Cryptographic Techniques, 1999, pp. 223–238.
[37] https://www.datafountain.cn.
[38] D. Boneh, “Short signature from the Weil pairing,” Asiacrypt, 2004.
[39] H. Li and B. Liu, “Loss analysis simulation of SVC/DC deicer under SVC mode,” in Power and Energy Engineering Conference, 2016, pp. 2564–2568.
[40] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, “Random forest: a classification and regression tool for compound classification and QSAR modeling,” Journal of Chemical Information & Computer Sciences, vol. 43, no. 6, p. 1947, 2003.
[41] D. W. Hosmer, T. Hosmer, C. S. Le, and S. Lemeshow, “A comparison of goodness-of-fit tests for the logistic regression model,” Statistics in Medicine, vol. 16, no. 9, pp. 965–980, 2015.

Donghuan Yao received the Bachelor's degree from the School of Electronics and Electrical Engineering, Changsha University, China, in 2015, and the master's degree from the Department of Computer Science and Technology, Shanghai University of Electric Power, China, in 2019. Her research interests include smart grid and information security.

Mi Wen (IEEE M10) received the M.S. degree in Computer Science from the University of Electronic Science and Technology of China in 2005 and the Ph.D. degree in computer science from Shanghai Jiao Tong University, Shanghai, China, in 2008. She is currently an Associate Professor with the College of Computer Science and Technology, Shanghai University of Electric Power. From May 2012 to May 2013, she was a visiting scholar at the University of Waterloo, Canada. She serves as an Associate Editor of Peer-to-Peer Networking and Applications (Springer). Since 2012, she has served as a TPC member of flagship conferences such as IEEE INFOCOM, IEEE ICC and IEEE GLOBECOM. Her research interests include privacy preservation in wireless sensor networks and smart grid.


Xiaohui Liang received the B.Sc. degree in Computer Science and Engineering and the M.Sc. degree in Computer Software and Theory from Shanghai Jiao Tong University (SJTU), China, in 2006 and 2009, respectively. He is currently working toward a Ph.D. degree in the Department of Electrical and Computer Engineering, University of Waterloo, Canada. His research interests include applied cryptography, and security and privacy issues for e-healthcare systems, cloud computing, mobile social networks, and smart grid.

Zipeng Fu is a candidate for the B.Sc. in Computer Science and Engineering and the B.Sc. in Applied Mathematics at the University of California, Los Angeles. His research interests include multi-agent reinforcement learning and machine learning.

Kai Zhang received the Bachelor's degree from the School of Information Science and Engineering, Shandong Normal University, China, in 2012, and the Ph.D. degree from the Department of Computer Science and Technology, East China Normal University, China, in 2017. He is currently an Assistant Professor with Shanghai University of Electric Power, China. His research interests include applied cryptography and information security.

Baojia Yang received the Bachelor's degree from the Department of Computer Science and Technology, Huazhong University of Science and Technology, China, in 2014, and the Master's degree in Computer Technology from the University of Chinese Academy of Sciences, China, in 2017. He is currently a Software Engineer at Microsoft China. His research interests include machine learning and deep learning.
