
Electrical Power and Energy Systems 125 (2021) 106544

Contents lists available at ScienceDirect

International Journal of Electrical Power and Energy Systems


journal homepage: www.elsevier.com/locate/ijepes

Electricity theft detection in low-voltage stations based on similarity measure and DT-KSVM
Xiangyu Kong a, *, Xin Zhao a, Chao Liu a, Qiushuo Li b, DeLong Dong c, Ye Li c
a Key Laboratory of Smart Grid of Ministry of Education, Tianjin University, Tianjin 300072, China
b Digital Grid Research Institute, China Southern Power Grid, Guangzhou 510663, China
c Tianjin Electric Power Research Institute, State Grid Tianjin Electric Power Company, Tianjin 300384, China

ARTICLE INFO

Keywords:
Electricity theft detection
Similarity measure
WGAN
Electricity consumption behavior analysis
DT-KSVM

ABSTRACT

The theft of electricity affects power supply quality and the safety of grid operation, and non-technical losses (NTL) have become a major cause of unfair power supply and economic losses for power companies. For more effective electricity theft inspection, this paper proposes an electricity theft detection method based on a similarity measure and a decision tree combined with K-nearest neighbor and support vector machine (DT-KSVM). Firstly, a condensed feature set is devised based on a feature selection strategy, and typical power consumption characteristic curves of users are obtained with the kernel fuzzy C-means algorithm (KFCM). Next, to solve the problem of scarce stealing data and to make reasonable use of advanced metering infrastructure (AMI), one-dimensional Wasserstein generative adversarial networks (1D-WGAN) are used to generate more simulated stealing data. Then the numerical and morphological features are comprehensively considered in the similarity measurement process to conduct preliminary detection of NTL, and DT-KSVM is used to perform secondary detection and identify suspicious customers. Finally, simulation experiments verify the effectiveness of the proposed method.

* Corresponding author.
E-mail address: eekongxy@tju.edu.cn (X. Kong).

https://doi.org/10.1016/j.ijepes.2020.106544
Received 28 February 2020; Received in revised form 14 August 2020; Accepted 19 September 2020
Available online 1 October 2020
0142-0615/© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

1.1. Motivation

There are two types of losses in transmission and distribution networks: technical losses and non-technical losses (NTL). Technical losses are caused by the heating of resistive elements in lines, transformers, and other equipment. NTL is mostly caused by electricity theft, meter failures, or billing errors [1]; it accounts for 20–40% of total losses, most of which is related to energy theft [2]. According to the World Bank, electricity theft has caused electricity supply losses exceeding 25% of supply in India, 16% in Brazil, and 6% in China [3]. The impact of NTL is also significant in developed countries: electricity theft is estimated at £173 million every year in the UK, and it may be worth up to $6 billion in the USA [4].

With the construction of the smart grid and the ubiquitous electric power Internet of Things, power systems have gradually achieved digitization and interaction. Power selling companies have continuously obtained more information about customers. The introduction of advanced metering infrastructure (AMI) to monitor electrical power consumption over fine-grained time intervals has made it easier for utility companies to monitor anomalies in the network [5]. The research goal of the paper is to analyze customers' power consumption behavior based on the collected big power data and then to accurately identify abnormal power consumption behavior by using machine learning methods. The research results have been applied by some power grid companies; they could not only reduce electricity theft but also provide a new solution for detecting power meter faults. A customer flagged as illegal by the method in this article is a user with a high probability of electricity theft; a manual investigation should then be used to determine whether the customer is actually an electricity theft user.

1.2. Literature review

The use of AMI metering data for electricity theft detection mainly includes data statistics-based methods and machine learning-based methods. Methods based on data statistics analyze the energy consumption relationship between the main smart meter (called the master smart meter) and the smart meter installed at each user (home or business) in the same time interval. Rengaraju [6] compared
the difference between the master smart meter and the users' smart meters to determine whether electricity pilferage occurs in the low-voltage station. Faria L [7] proposed a method to detect abnormal power consumption behavior on a line according to sudden changes in line loss. The statistical method is relatively simple to implement, but an important drawback is that it can only judge that electricity stealing has occurred in the low-voltage station; it cannot accurately locate the illegal consumers. To determine the suspected user, all users in the area must be checked manually one by one, which is less efficient and places higher requirements on the quality of the inspection personnel.

With the increase of measurement data, the power industry has entered the era of big data. Massive measurement data also provide a broader platform for the application of machine learning and other artificial intelligence technologies [8]. The use of machine learning for NTL detection has become mainstream [9]. Angelos [10] proposed using K-means to group customers with similar profiles to create a general pattern of power consumption; customers with large Euclidean distances to the cluster centers were considered potential fraudsters. Joaquim [11] proposed the use of fuzzy Gustafson-Kessel (GK) clustering to obtain consumption patterns: the farther the data of an analyzed consumer are from the regular prototypes, the higher the likelihood that the consumer is stealing electricity. Unsupervised clustering algorithms have also been used to identify anomalies in consumers' demand patterns in AMI. While these methods are useful in identifying customers with similar load patterns, their unsupervised nature typically results in high false-positive rates if they act as the main algorithm for theft-detection applications [12].

As a data mining technique, SVM is used to classify user electricity consumption patterns or load profiles. Anish Jindal [13] proposed a scheme based on the combination of DT and SVM classifiers that performs rigorous analysis of gathered electricity consumption data to identify suspected users of theft. It can be viewed as a two-level data processing and analysis approach, since the data processed by the DT are fed as input to the SVM classifier. J. Nagi [14] used historical consumption data, along with the SVM classifier, to detect abnormal behaviors. The average daily consumptions of customers were calculated, and the long-term trend in energy consumption was used to identify fraudulent customers. Jian et al. [15] proposed an electricity theft detection scheme based on the One-class SVM classification algorithm. By learning the user's historical power consumption data, a typical power consumption model was constructed to identify abnormal power consumption. The SVM methods for electricity theft detection have the advantages of higher theft detection accuracy and of the characteristics learned during the process. But they face some problems: (1) when SVM is used for multi-class classification and an upper node fails to separate the samples correctly, the misclassified samples enter the lower nodes and the probability of error accumulation increases; (2) in the process of classification, points near the decision plane are not accurately classified by SVM.

Deep learning techniques for electricity theft detection are studied in [16], which compares different deep learning architectures such as the convolutional neural network (CNN), long short-term memory (LSTM) recurrent neural network (RNN), and stacked autoencoders. Moreover, the authors in [17] proposed a deep neural network (DNN) based customer-specific detector that can efficiently thwart such cyber attacks. In [18], a wide and deep CNN model was developed and applied to analyze electricity theft in smart grids. Deep learning methods for electricity theft detection can yield higher accuracy, but when training complex neural networks the system easily falls into a local minimum and cannot escape. It is also difficult to determine the number of convolutional layers and the network hyperparameters for deep learning due to the lack of a theoretical foundation.

At present, there are two main ideas for processing unbalanced sample sets: random oversampling (ROS) and random undersampling (RUS). RUS can lose useful information because it discards some of the majority samples. ROS uses the existing minority samples to generate data similar to the existing samples and so balances the training set; this method avoids the loss of key information. The traditional ROS [19] reduces the imbalance of the training set by copying minority samples. Xiaolong proposed a synthetic minority oversampling technique (SMOTE) [20], which randomly creates artificial samples along the line joining a minority sample and one of its nearest neighbors. SMOTE has been modified to produce Random-SMOTE [21] and Kmeans-SMOTE [22]. However, the above oversampling algorithms do not take into account the overall distribution characteristics of the data, so the improvement of model classification performance is often limited [23].

Based on the above discussion, it can be summarised that the use of SVM has become an important research direction for detecting electricity theft. However, to obtain accurate analysis results, several key points need to be resolved:

(1) The numbers of normal and abnormal samples are not in the same range. Benign samples are easily available from historical data. Theft samples, on the other hand, rarely or never exist for a given customer.
(2) Supervised and unsupervised classification techniques need to be combined to detect a synthetic consumption pattern that results from theft, achieving the best results with SVM classification.
(3) When using SVM to detect electricity theft, it is necessary to improve the accuracy of detection near the decision plane.

Keeping the above constraints in mind, a comprehensive top-down scheme is put forth in this paper to identify electricity theft. This paper proposes electricity theft detection based on a similarity measure and a decision tree combined with K-nearest neighbor and support vector machine (DT-KSVM). The proposed electricity theft detection based on the data augmentation method involves three critical steps, shown in Fig. 1.

Step 1: Determine the suspected station and use the KFCM clustering algorithm to cluster the users' historical data, obtaining each user's power consumption characteristic curve.

Step 2: Based on similarity constraints and real constraints, use 1D-WGAN to generate high-precision measurement data that match the characteristics of electricity theft.

Step 3: Comprehensively considering the numerical and morphological characteristics of the curve to be measured and of the characteristic curve, obtain the suspected users and illegal customers. Then train the DT-KSVM with a balanced data set and use the trained DT-KSVM to identify the illegal consumers.

[Figure: flowchart of Steps 1 to 3. Step 1: data preprocessing, feature selection, determination of the suspected station, KFCM, characteristic curve in station. Step 2: discriminator pre-training, alternate discriminator and generator training, generation of theft samples balanced with normal samples. Step 3: detection by similarity measures between test curve and characteristic curve, determination of the suspected users, construction, training, and evaluation of the DT-KSVM classifier.]
Fig. 1. Electricity theft detection based on data augment method.
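
As background for the oversampling methods reviewed above, the interpolation idea behind SMOTE can be sketched in a few lines. This is only an illustrative sketch of the general technique, not the implementation of [20] and not the data generation method proposed in this paper (which uses 1D-WGAN); the function name `smote_like`, the neighbor count `k`, and the toy data are assumptions made for the example.

```python
import numpy as np

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows, each on the line segment joining a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)

    # Pairwise Euclidean distances within the minority class.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, len(minority), size=n_new)           # starting samples
    pick = neighbours[base, rng.integers(0, k, size=n_new)]     # chosen neighbours
    gap = rng.random((n_new, 1))              # random position along the segment
    return minority[base] + gap * (minority[pick] - minority[base])

# Toy minority class: the four corners of the unit square (hypothetical data).
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(corners, n_new=10, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new samples stay inside the region already covered by the minority class, which is consistent with the limitation noted in [23].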


1.3. Contribution

This paper proposes an electricity theft detection method based on similarity measure and DT-KSVM, which combines unsupervised and supervised learning to detect electricity theft users. The main contributions of the paper are as follows.

(1) Using 1D-WGAN to generate electricity theft data. The deep neural network learns the objective laws of measurement data from a small amount of sample data. Based on Wasserstein distance, similarity constraints, and real constraints, high-precision measurement data that match the characteristics of electricity theft are generated. The method proposed in this paper addresses the problems of small sample size for machine learning training and imbalance between normal and abnormal samples.
(2) To increase the effectiveness of the similarity measure, numerical and morphological features are comprehensively considered in the process of similarity measurement, and dynamic time warping (DTW) is used to measure the similarity of morphological characteristics.
(3) Using DT-KSVM for electricity theft detection. Firstly, the projection vector method is used to measure the separability between classes in the training process, and a biased binary decision tree is constructed according to the value of the separability. This reduces the occurrence of error accumulation. Secondly, K-nearest neighbor is combined with SVM in the classification process, which addresses the low accuracy of SVM when classifying points near the decision plane.

1.4. Organization of the paper

The remainder of the paper is organized as follows. Section 2 determines the suspected low-voltage stations and analyzes customer behavior. Section 3 provides the method of sample data generation based on 1D-WGAN. In Section 4, preliminary electricity theft detection is carried out using customers' consumption patterns, after which DT-KSVM is used to perform secondary detection. Section 5 gives and analyzes the results. The conclusions are drawn in Section 6.

2. Detection of suspected station and analysis of customer behavior

2.1. Station detection methods

A typical low-voltage distribution network consists of a number of interconnected units. Electricity is transmitted from a power plant to substations through high-voltage lines and is then routed to industrial, commercial, and residential users. There is also information transfer between different types of users. AMI refers to the systems for measuring, collecting, storing, analyzing, and using customer energy usage. The collected data include data of various large, medium, and small-sized typical transformer users, as well as 380/220 V low-voltage residents. The collected information includes data items such as electrical energy data, event records, and other data. The collected data items are analyzed and stored to obtain the user's power consumption information and consumption behavior [24].

In the low-voltage station, energy loss is detected from an energy balance mismatch. Assuming all meters are reading normally, the low-voltage station is regarded as a node and, according to Kirchhoff's law, sub-meter readings + network loss = total meter reading. When the network loss is too large, the possibility of electricity theft is generally considered very high, and the low-voltage station is regarded as a suspected station. The total consumption is measured by transformer meters and is compared with the total amount of usage reported by the smart meters. The illegal customer in this station is then located based on similarity measure and DT-KSVM.

The theft data obtained with the AMI system involve m users over n measurement periods and are described by a matrix. The data of the same user at different periods can be denoted $x_j$ and the data of different users at a given moment $X_i$; the matrix description is as follows.

$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{bmatrix} \tag{1}$$

where $x_{i,j}$ represents the measurement value of the smart meter of the ith user during the jth measurement period.

Normalization is an essential step of data mining. Min-max normalization maps the values to [0, 1] through a linear transformation of the original data [25]. The conversion function is as follows:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$

where $x$ is the actual measurement data; $x_{\max}$ is the maximum value of the sample data; $x_{\min}$ is the minimum value of the sample data; $x^{*}$ is the normalized electricity consumption data.

2.2. Feature selection

Table 1
Comparisons of power supply capacity.

| Features | Meaning |
|---|---|
| Daily maximum load | Reflecting the user's maximum load |
| Daily minimum load | Reflecting the user's minimum load |
| Average load | Reflecting the user's average load |
| Load rate | Reflecting the load changes throughout the day |
| Peak-hour electricity consumption ratio | Reflecting the load changes in the peak period |
| Valley coefficient | Reflecting the load changes in the valley period |
| Percentage consumption in normal period | Reflecting the load changes in the normal period |
| Daily load variance | Reflecting user load fluctuations |
| Coefficient of variation | Reflecting the degree of dispersion of users around the mean value |
| The ratio between peak and valley load | Reflecting the peak regulation capability of the power grid |

Note: The peak, valley, and normal periods of different regions are not consistent. Usually, the peak period is 9:00–17:00; the valley period is from 00:00 to 06:00 and from 21:00 to 24:00; the normal period is from 06:00 to 09:00 and from 17:00 to 21:00.

Users exhibit various power consumption behaviors, and analyzing these behaviors with different feature sets gives different results. It is therefore necessary to select effective features to represent electricity consumption behavior. However, the characteristics are closely related to the user's behavior, and there is significant correlation between them. The redundant information contained in the feature space leads to poor analysis results, so removing overlapping and redundant information before analyzing the user's electricity consumption behavior helps to improve the performance of the analysis. Typical characteristics of users' electricity consumption behavior include the following aspects: statistical characteristics, including daily power consumption, annual power consumption data, seasonal power consumption data, daily maximum and minimum load, average load rate, etc.; time series characteristics, including peak-hour power consumption rate, valley power coefficient, etc.; and related features such as house area and the number of family
members, etc. [26]. A feature library is formed from common features. The specific features and their meanings are shown in Table 1.

The feature selection strategy in this paper is as follows:

(1) Set the number of features to 1 and determine the single feature with the highest clustering evaluation criterion;
(2) Set the number of features to 2: on the basis of the selected feature, select a new feature and determine the two features with the highest clustering evaluation criterion;
(3) When i features have been selected, select a new feature on the basis of the selected features and determine the i + 1 features with the highest clustering evaluation criterion;
(4) Repeat the above steps until, when n + 1 features are selected, the accuracy rate is lower than with n features; n is then taken as the optimal number of features, and the feature set with the highest clustering evaluation criterion is the selected one. More detailed criteria can be found in [27].

The deep-learning method based on feature extraction and the stacked uncorrelated autoencoder (SUAE) [28] is used in the paper. Owing to the deep architecture and powerful decorrelation ability of SUAE, features are extracted from load profiles concisely and effectively, which enables a significant improvement in final NTL detection performance. DT-SVM is applied as the classifier, which uses the features extracted by SUAE to output a judgment result.

3. Sample data generation based on 1D-WGAN

This paper uses one-dimensional Wasserstein generative adversarial networks (1D-WGAN) to generate electricity theft data. The approach adapts the objective function used in GAN image generation and integrates the advantages of Wasserstein generative adversarial networks (WGAN). The measured data are one-dimensional (1D) time series, so a GAN structure based on 1D convolution layers is designed. The deep neural network learns the objective laws of measurement data from a small amount of sample data. Based on Wasserstein distance, similarity constraints, and real constraints, high-precision measurement data that match the characteristics of electricity theft are generated. Finally, the generated samples are combined with the existing samples to obtain more samples. The overall framework adopted in this article is shown in Fig. 2.

[Figure: generator G(z) taking noise z and discriminator D(x) taking actual and generated measurement data, each drawn with layers labelled Fully Connected, DeConv2, and DeConv4 (f = (5,5), s = 2) and feature sizes (4,4,512), (16,16,128), and (64,64,32); the outputs feed the loss of similarity Ls and the loss of authenticity Lr.]
Fig. 2. Data generation framework based on 1D-WGAN.

The existing small amount of theft data is selected as the training set. The theft data follow a distribution $p_{data}(X)$ that is difficult to describe by explicit mathematical models. A set of noise variables (hidden variables) z follows a Gaussian distribution $p_Z(z)$. The mapping between the two is established through the training of the GAN. In this way, new data that meet the original data distribution can be generated by sampling from a known distribution [29].

The generator (G) is responsible for learning the regularities of the distribution of samples and generating new samples. G is a neural network whose input is a random variable z drawn from the prior distribution $P_Z$ and whose output is $G(z)$. The distribution $p_g$ of the generated data gradually fits the distribution of the sample data $p_{data}(x)$. The goal of the generator is to generate data that are as realistic as possible to confuse the discriminator, so its loss function can be defined as $\mathbb{E}_{z \sim P_Z}[-D(G(z))]$. The objective function of the generator is:

$$\min_{G} \; \mathbb{E}_{z \sim P_Z}\left[-D(G(z))\right] \tag{3}$$

The discriminator (D) is responsible for determining whether the input data are real. D is also a neural network, but its input is either actual data or data generated by the generator. The main task of the discriminator is to distinguish the two kinds of data, so its output is a scalar between 0 and 1, which is the probability that the input belongs to the actual data rather than the generated data. The loss function of D can be defined as $\mathbb{E}_{x \sim P_{data}}[D(x)] + \mathbb{E}_{z \sim P_Z}[-D(G(z))]$. The objective function is:

$$\max_{D} \; \mathbb{E}_{x \sim P_{data}}\left[D(x)\right] + \mathbb{E}_{z \sim P_Z}\left[-D(G(z))\right] \tag{4}$$

The objective function of the entire game process is:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{X \sim p_{data}(X)}\left[\log D(X)\right] + \mathbb{E}_{z \sim p_Z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{5}$$

WGAN uses the Wasserstein distance instead of the JS divergence. Training the GAN with the minimized Wasserstein distance as the target alleviates the problem of vanishing gradients during training and improves training stability [30]. The Wasserstein distance is defined as:

$$W\left(p_{data}, p_g\right) = \inf_{\gamma \in \Pi\left(p_{data}, p_g\right)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right] \tag{6}$$

where $\Pi(p_{data}, p_g)$ is the set of joint distributions $\gamma$ whose marginal distributions are $p_{data}$ and $p_g$; $W(p_{data}, p_g)$ is the infimum, over all such $\gamma(x, y)$, of the expected distance from x to y, which is the distance that must be covered to fit $p_g$ to $p_{data}$. Since it is difficult to calculate the Wasserstein distance between arbitrary distributions directly, its Kantorovich-Rubinstein dual form is adopted [30]:

$$W\left(p_{data}, p_g\right) = \frac{1}{K} \sup_{\lVert f \rVert_L \leqslant K} \mathbb{E}_{x \sim p_{data}}\left[f(x)\right] - \mathbb{E}_{x \sim p_g}\left[f(x)\right] \tag{7}$$

where $\lVert f \rVert_L \leqslant K$ indicates that the function $f(x)$ satisfies K-Lipschitz continuity, that is, the absolute value of its derivative is bounded.

After training, the WGAN can generate an unlimited number of samples that follow the distribution. In order to ensure the authenticity of the generated measurement data, both authenticity and
similarity constraints must be met [31].

The authenticity constraint is used to ensure that the generated data are close to the real situation. The loss of authenticity $L_r$ is defined as:

$$L_r = W\left(G\left(z; \theta^{(G)}\right); \theta^{(D)}\right) \tag{8}$$

where $G(z; \theta^{(G)})$ is the data generated by the generator; $W(\cdot\,; \theta^{(D)})$ represents the Wasserstein distance between the generated data and the real samples.

The generated data should be as similar to the actual data as possible, so the similarity loss $L_s$ is defined as:

$$L_s = \left\lVert G\left(z; \theta^{(G)}\right) - I \right\rVert_2 \tag{9}$$

where $I$ is the actual data, and the 2-norm is used to measure the similarity of the two matrices.

Therefore, the ultimate optimization goal of data generation is:

$$\min_{z \sim p_Z(z)} \; L_r + L_s \tag{10}$$

Taking (10) as the optimization goal, Adam [32] was used as the optimizer to optimize the latent variable z, so that the generated data were close to the actual data. The final total sample $\hat{I}$ is

$$\hat{I} = I + G\left(\hat{z}; \theta^{(G)}\right) \tag{11}$$

4. Locating electricity theft users based on similarity measure and DT-KSVM

Among all machine learning methods, SVM has been chosen as one of the classifiers in the proposed scheme due to its advantages over the existing classifiers. First, SVM has been rigorously tested in the past and shown to provide higher efficiency than other classifiers. Second, it is capable of handling the overfitting problem, that is, of handling unknown datasets appropriately to produce related outputs. In addition, SVM with different kernels can separate data that are not otherwise linearly separable [13].

The exact location of electricity theft users proposed in this paper consists of two steps: (1) determine suspected users based on similarity measures; (2) use DT-KSVM to examine the suspected users determined by the preliminary detection and output the illegal customers. As shown in Fig. 3, $D_1$ and $D_2$ are preset thresholds with $D_1 < D_2$, and $D_{whole}$ is the distance obtained by the similarity measurement. When $D_{whole} > D_2$, the user is considered highly suspicious and should be verified manually. When $D_{whole} < D_1$, the user is regarded as a normal user. When $D_1 < D_{whole} < D_2$, the user undergoes secondary detection. Different parameters need to be defined for different types of users.

[Figure: flowchart. The test curve and characteristic curve pass through the similarity measure; if D_whole > D2 the user is a suspected user; if D1 < D_whole < D2 the user is passed to DT-KSVM for classification, which is trained on the balanced sample set (determine support vectors and decision planes, then test); otherwise the next user is examined (i = i + 1). The output is the number and type of illegal consumers.]
Fig. 3. The method of location electricity theft users.

4.1. Similarity measurement

The user's characteristic curve reflects the user's electricity consumption characteristics and behavior. In the similarity measurement, the characteristic curve of the user is taken to represent normal electricity consumption. To select the characteristic curve, this paper uses the weighted-average method to obtain the characteristic curve of the user in normal power consumption mode [33].

The similarity of time series includes two aspects: value and morphology. Most research on the similarity of time series has failed to take both into account well. To account for both the morphology and the value of the curve, the Euclidean distance is used to measure the similarity of values, and DTW is used to measure the similarity of morphological features.

To describe the morphological characteristics of the curve simply and accurately, such as rise, fall, and stability in various periods, the slope of the line segment is used to represent the morphological characteristic of each period. A time series of length n is therefore reduced to a morphological sequence of length n − 1:

$$x'_i = \frac{x_{i+1} - x_i}{\Delta t}, \quad i = 1, 2, \cdots, n - 1 \tag{12}$$

Introducing morphological sequences alongside the Euclidean distance overcomes the shortcoming of relying only on the value at each time point and ignoring morphological features. However, the measurement effect depends on the choice of distance function, so the morphological sequences also need an accurate measurement method. DTW can warp the time axis to match points with points, measures time series accurately according to their morphology, and meets the measurement requirements.

Consider two separate morphological sequences $X' = (x'_1, x'_2, \cdots, x'_{n-1})$ and $Y' = (y'_1, y'_2, \cdots, y'_{m-1})$. In order to align the two sequences, a distance matrix needs to be constructed, in which each element is given by the Euclidean distance:

$$D(i, j) = \left\lVert x'_i - y'_j \right\rVert_2, \quad i = 1, 2, \cdots, n - 1; \; j = 1, 2, \cdots, m - 1 \tag{13}$$

The path of DTW is not chosen arbitrarily; it needs to satisfy the boundary, continuity, and monotonicity constraints. Many paths satisfy these three constraints, and the one that minimizes the total distance finally obtained is chosen [34].

The cumulative distance $\gamma$ is constructed by dynamic programming. The cumulative distance $\gamma(i, j)$ is the sum of the distance $D(i, j)$ of the current grid point and the smallest cumulative distance among the adjacent elements from which the point can be reached:

$$\gamma(i, j) = D(i, j) + \min\left\{\gamma(i - 1, j - 1), \; \gamma(i - 1, j), \; \gamma(i, j - 1)\right\} \tag{14}$$

The path with the smallest cumulative distance is the best path to that point.

Assume that there are two time series of curves $X = (x_1, x_2, \cdots, x_n)$ and $Y = (y_1, y_2, \cdots, y_m)$, which represent the test curve and the characteristic curve respectively, and calculate their morphological sequences $X'$ and $Y'$. The following similarity measure accounts for both numerical and morphological characteristics:

$$D_{whole}(X, Y) = \sqrt{\alpha D^{2}(X, Y) + \lambda \, \mathrm{DTW}(X', Y')} \tag{15}$$

where $\alpha$ and $\lambda$ are the weights of the value and the shape respectively, and $\alpha + \lambda = 1$.

4.2. DT-KSVM model

The concept of SVMs was introduced by Vapnik. SVMs are a set of related supervised learning methods that analyze the given data and recognize a pattern or trend of input values with respect to the output. They map the input into a high-dimensional feature space by means of some nonlinear functions and then construct in this space a hyperplane that can effectively separate the inputs according to their output values.

The SVM algorithm was originally designed for binary classification
problems. When dealing with multi-class problems, it is necessary to construct a suitable multi-class classifier. At present, there are two main methods for constructing SVM multi-class classifiers, the direct method and the indirect method; the DT-SVM algorithm needs to construct the least number of classifiers. The DT-SVM method is relatively clear, easy to train and classify, and has certain advantages in terms of training time and classification accuracy. Moreover, it has no indivisible region, and it is not necessary to traverse all classifiers during classification. Numerous experiments have shown that the DT-SVM algorithm is currently the best method for solving multi-class classification problems with SVMs [35]. Based on DT-SVM, this paper proposes DT-KSVM.

The multi-class classifier is constructed as follows:

(1) $\mathrm{SVM}_1$ separates the electricity theft features of the first class from the electricity theft features of the 2nd, 3rd, …, Nth classes and the normal samples;
(2) $\mathrm{SVM}_i$ separates the electricity theft features of the ith class from the electricity theft features of the (i + 1)th, (i + 2)th, …, Nth classes and the normal samples;
(3) $\mathrm{SVM}_N$ separates the electricity theft features of the Nth class from the normal samples.

Finally, N classifiers are constructed according to the structure of the binary tree. When performing electricity theft detection, the SVM of each layer recognizes only one type of power stealing. The remaining sample set, which is gradually reduced, is identified by the next level of SVM. The SVM of the last layer separates the last type of electricity theft from the normal samples, and the leaf nodes of the DT are the types of electricity theft.

The structure of DT-SVM is shown in Fig. 4. However, since the decision tree is constructed as a hierarchical classification model, its biggest problem is "error accumulation", which affects the accuracy of classification. If a biased binomial tree is used for classification, a decision tree with low error accumulation and high classification accuracy needs to be constructed first. In order to reduce the effect of "error accumulation", this paper adopts the projection vector method [36] to measure the degree of separation between classes and constructs a biased binomial decision tree based on this.

When a sample is far from the hyperplane, the SVM algorithm classifies it accurately. But when the sample is close to the hyperplane, the classification effect is poor, and misclassification is prone to occur. Therefore, the information provided by samples near the interface is used to improve the accuracy of classification: SVM and KNN are combined to establish a combined classifier SVM-KNN (KSVM). When classifying the sample to be identified, the distance between the sample and the classification hyperplane is calculated. If the distance is greater than the given threshold a, the SVM classification is directly applied. Otherwise, KNN is used for classification.

In KNN classification, the support vectors of each class serve as the representative points, and the distance between the sample to be recognized and each support vector is calculated in the feature space instead of the original space. The class of the sample is determined by this distance. The distance calculation formula is as follows [37]:

$$d(s, x_i^*) = \left\| \phi(s) - \phi(x_i^*) \right\|^2 = k(s, s) - 2k(s, x_i^*) + k(x_i^*, x_i^*) \tag{16}$$

$$a = \frac{1}{2\|\omega\|} \tag{17}$$

where the RBF kernel function is selected; $s$ is the support vector in the training sample; $x_i^*$ is the sample to be classified.

The flowchart of electricity theft detection based on KSVM is shown in Fig. 5.

5. Case studies

We used the smart energy data from the Irish Smart Energy Trial [38] in our tests. The dataset was released by Electric Ireland and Sustainable Energy Authority of Ireland (SEAI) in January 2012. It includes half-hourly electricity usage reports of over 5000 Irish homes and businesses during 2009 and 2010. Customers who participated in the trial had a smart meter installed in their homes and agreed to take part in the research. For each customer, there is a file containing half-hourly metering reports for 535 days. Therefore, it is a reasonable assumption that all samples belong to honest users. The large number and variety of customers, long periods of measurements, and availability to the public make this dataset an excellent source for research.

According to the actual situation of stealing electricity, this paper sets up six types of electricity theft. In the first type, all samples are multiplied by the same randomly chosen coefficient. The second type is an 'on-off' attack in which consumption is reported as zero during some intervals. The third type multiplies the consumption by a random factor that varies over time. The fourth type is a combination of the second and third types. In the fifth type, the consumption in the peak period is multiplied by the same randomly chosen coefficient. The sixth type is an 'on-off' attack in random periods whose duration is short and discontinuous, which reduces the total electricity consumption. Compared with the second type, the sixth type is more difficult to detect because of the randomness of the selected time period. An example of the weekly consumption of a customer and the corresponding attack patterns is shown in Fig. 6.

5.1. Analysis of customer behavior

This article numbers the characteristics commonly used to represent users' electricity consumption behavior, including the valley electricity coefficient and peak-time electricity usage load rate. The common

[Figure 4: tree diagram. The suspected users pass through SVM1, …, SVMi, …, SVMn in sequence; each SVM splits off one class (Class 1, …, Class i, …, Class n), and the samples remaining after SVMn are the normal samples.]
Fig. 4. The structure of DT-SVM.

[Figure 5: flowchart. Training samples are used to train the SVM classifier and determine the decision planes and support vectors. For each test sample, the distance between the test sample and the decision plane is calculated; if the distance exceeds the threshold, the sample is classified with SVM, otherwise with KNN. The output is the number of the suspected user and the type of electricity theft.]
Fig. 5. Flowchart of electricity theft detection with KSVM.
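
The decision rule of Fig. 5, together with Eqs. (16) and (17), can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the support vectors, labels, multipliers `alphas` and `bias` are taken as given from an already trained binary SVM, the RBF width `gamma` and neighbor count `k` are illustrative values, and all function names are hypothetical.

```python
import math


def rbf(u, v, gamma=0.5):
    """RBF kernel k(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel_distance(s, x, kernel=rbf):
    """Eq. (16): squared feature-space distance between support vector s and sample x."""
    return kernel(s, s) - 2.0 * kernel(s, x) + kernel(x, x)


def ksvm_predict(x, support_vectors, labels, alphas, bias, kernel=rbf, k=1):
    """Classify x with the SVM when it is far from the decision plane, else with KNN.

    support_vectors, labels (+1 / -1), alphas and bias describe an already
    trained binary SVM; they are inputs here and are not learned by this sketch.
    Returns (label, name of the classifier that made the decision).
    """
    # SVM decision value f(x) = sum_i alpha_i * y_i * k(s_i, x) + b
    f = sum(a * y * kernel(s, x)
            for a, y, s in zip(alphas, labels, support_vectors)) + bias
    # ||w||^2 = sum_ij alpha_i * alpha_j * y_i * y_j * k(s_i, s_j)
    w_norm = math.sqrt(sum(
        ai * aj * yi * yj * kernel(si, sj)
        for ai, yi, si in zip(alphas, labels, support_vectors)
        for aj, yj, sj in zip(alphas, labels, support_vectors)))
    distance = abs(f) / w_norm            # distance from x to the decision plane
    threshold = 1.0 / (2.0 * w_norm)      # Eq. (17)
    if distance > threshold:
        return (1 if f > 0 else -1), "svm"
    # Near the plane: vote among the k nearest support vectors, using Eq. (16).
    ranked = sorted(zip(support_vectors, labels),
                    key=lambda pair: kernel_distance(pair[0], x, kernel))
    vote = sum(y for _, y in ranked[:k])
    return (1 if vote > 0 else -1), "knn"
```

With the threshold of Eq. (17), the test `distance > threshold` is equivalent to |f(x)| > 1/2, so samples inside the inner half of the margin are handed to KNN. An odd `k` avoids tied votes. In a DT-KSVM cascade, one such binary classifier would sit at each level of the tree.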

[Figure 6: six panels, (a) to (f), each plotting normal consumption and the attack pattern over one week; x-axis t/30 min (0 to 350), y-axis consumption/(kW·h).]
Fig. 6. The weekly consumption of a customer and the corresponding attack patterns.
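
The six types of electricity theft set up in Section 5 can be imitated on a half-hourly consumption series roughly as follows. This is an illustrative sketch only: the coefficient range (0.1 to 0.8), the peak window (slots 34 to 41 of each day, i.e. 17:00 to 21:00), the minimum zero interval and the number and length of the short bursts are assumptions, since the paper does not state them, and the function names are hypothetical.

```python
import random


def attack_type1(x, rng):
    """Type 1: multiply every reading by one randomly chosen coefficient."""
    c = rng.uniform(0.1, 0.8)  # assumed range
    return [c * v for v in x]


def attack_type2(x, rng, min_len=4):
    """Type 2: 'on-off' attack, zero consumption during one contiguous interval."""
    start = rng.randrange(0, len(x) - min_len)
    end = rng.randrange(start + min_len, len(x) + 1)
    return [0.0 if start <= t < end else v for t, v in enumerate(x)]


def attack_type3(x, rng):
    """Type 3: multiply each reading by its own random factor (varies over time)."""
    return [rng.uniform(0.1, 0.8) * v for v in x]


def attack_type4(x, rng):
    """Type 4: combination of types 2 and 3."""
    return attack_type2(attack_type3(x, rng), rng)


def attack_type5(x, rng, peak=range(34, 42), slots_per_day=48):
    """Type 5: multiply only peak-period readings by one random coefficient."""
    c = rng.uniform(0.1, 0.8)
    return [c * v if (t % slots_per_day) in peak else v
            for t, v in enumerate(x)]


def attack_type6(x, rng, n_bursts=5, burst_len=2):
    """Type 6: short, discontinuous 'on-off' bursts at random positions."""
    y = list(x)
    for _ in range(n_bursts):
        start = rng.randrange(0, len(x) - burst_len + 1)
        for t in range(start, start + burst_len):
            y[t] = 0.0
    return y
```

Each function takes one week of readings (336 half-hourly values) and a `random.Random` instance, so a fixed seed reproduces the same tampered curve.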

features are used as a feature library representing the user's electricity consumption behavior. Features are selected from this library; the correspondence between features and numbers is shown in Table 2. The trend of the number of features and the accuracy is shown in Fig. 7.

It can be seen from Fig. 7 that the accuracy of clustering increases as the number of features increases, but decreases once the number of features exceeds 4. This article therefore uses four characteristics to represent the electricity consumption behavior. The selected characteristic indicators are load rate, valley coefficient, peak-hour electricity consumption ratio, and percentage of consumption in a normal period.

It can be seen from Fig. 8 that KFCM needs fewer iterations and obtains a smaller objective function value, that is, the sum of the distances from each point to its cluster center is the smallest, so KFCM effectively improves the classification effect and the iteration time of the algorithm compared with FCM.

Users are clustered according to the selected features. The daily load characteristic curve, obtained by the weighted average method, is shown in Fig. 9.

[Figure 7: accuracy/% (75 to 100) against the number of selected features (0 to 6).]
Fig. 7. The trend of the number of features and the accuracy.

Table 2
The correspondence between features and numbers.

Number  Feature                                    Number  Feature
1       Daily maximum load                         6       Electricity consumption ratio in peak period
2       Percentage consumption in normal period    7       Ratio between peak and valley consumption
3       Valley coefficient                         8       Daily average load
4       Daily consumption                          9       Variance
5       Load rate                                  10      Coefficient of variation
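
Several of the features listed in Table 2 can be computed from one day of half-hourly readings as sketched below. The paper does not give the formulas, so the definitions here are assumptions: load rate is taken as average load divided by maximum load, the coefficient of variation as standard deviation divided by mean, and the peak period as slots 34 to 41 (17:00 to 21:00). The function name `load_features` is hypothetical.

```python
from statistics import mean, pvariance


def load_features(day, peak=range(34, 42)):
    """Compute a few Table 2 style features from 48 half-hourly readings.

    Assumes the day has non-zero consumption (otherwise the ratios are undefined).
    """
    total = sum(day)
    avg = mean(day)
    maximum = max(day)
    var = pvariance(day)
    return {
        "daily_maximum_load": maximum,
        "daily_average_load": avg,
        "daily_consumption": total,
        "load_rate": avg / maximum,                   # assumed: average / maximum
        "variance": var,
        "coefficient_of_variation": var ** 0.5 / avg,
        # assumed peak window; share of the daily consumption used in it
        "peak_consumption_ratio": sum(day[t] for t in peak) / total,
    }
```

A flat profile gives a load rate of 1 and a coefficient of variation of 0, while a profile with one sharp spike gives a low load rate, which is why such features help separate consumption modes before clustering.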
[Figure 8: value of the objective function (6 to 15) against the number of iterations (0 to 60), for FCM and KFCM.]
Fig. 8. Comparison of the number iterations of KFCM and FCM.

[Figure 9: four panels, (a) to (d), showing the first, second, third and fourth type of typical consumption mode; x-axis Time/(30 min) (0 to 50), y-axis consumption/(kW·h).]
Fig. 9. The daily load characteristic curve.

5.2. Analysis of 1D-WGAN

Due to the imbalance between normal users and abnormal users, 1D-WGAN is used to generate electricity theft data. The real samples and the generated samples after different pieces of training are visualized in Fig. 10. Fig. 10(a) is the data generated by the generator after 192 pieces of training. It can be seen from Fig. 10(a) that the generator after 192 pieces of training has initially learned the distribution of real samples, but the distance from the real samples is still large. Fig. 10(b) is the data generated by the generator after 3840 pieces of training. It can be seen from Fig. 10(b) that the gap between the samples generated after 3840 pieces of training and the real samples is tiny. It can be seen from Fig. 10(c) that after 6400 pieces of training the samples generated by the generator can already deceive the discriminator. The comparison between the generated samples and the real samples in Fig. 10 shows that the samples generated by 1D-WGAN are not exactly the same as the original samples: they follow the same fluctuation rules but differ at specific locations, which ensures the diversity of the generated electricity theft samples. In practice, the amount of generated data should be decided according to the amount of electricity theft data available.

[Figure 10: three panels comparing the real sample and the generated sample; x-axis t/15 min (0 to 100), y-axis consumption/(kW·h) (0 to 0.7). (a) Samples generated after 192 pieces of training. (b) Samples generated after 3840 pieces of training. (c) Samples generated after 6400 pieces of training.]
Fig. 10. Comparison between generated samples and real samples.

Table 3
Comparison of different data generation algorithms.

Algorithm              No noise  40 dB noise  30 dB noise  20 dB noise
Original training set  92.5      87.83        85.96        80.63
SMOTE                  93.54     90.17        88.97        84.35
Improved SMOTE         94.12     91.27        90.07        87.56
This paper             95.6      93.17        91.89        89.08

This article compares several common data generation algorithms, and the classification accuracy obtained with each generated set is shown in Table 3. With or without noise, the samples generated by the 1D-WGAN model can effectively
improve the classification accuracy of the classifier. The results show that by learning the sample distribution, 1D-WGAN generates new samples that are similar to the original samples but not the same. The generated samples have a good effect and can solve the problem of classifier overfitting. At the same time, the 1D-WGAN model reduces the interference effect of noise in the process of adversarial learning and has strong robustness and generalization.

Table 4 shows the running time of various methods at different SR. It can be seen that ROS has the shortest running time, and the running time for WGAN to generate data at the same sampling rate is slightly higher than that of ROS, but much lower than that of the SMOTE and ADA-SYN methods. With the increase of the sampling rate, the running time of the ROS, SMOTE, and ADA-SYN methods does not change significantly, whereas that of the WGAN-based method increases significantly.

Table 4
Running time of various sampling methods.

Method    Running time (s)
          SR = 0.5    SR = 1
ROS       8.856       11.25
SMOTE     262.41      291.804
ADA-SYN   3713.802    3785.88
1D-WGAN   27.468      45.324

Note: Sampling rates (SR) is the ratio of the normal sample to the stolen sample after generating data. The unit of time is minute.

5.3. Analysis of electricity theft detection based on DTW and DT-KSVM

After several tests, the first threshold is set to D2 = 3 and the second threshold to D1 = 0.7. Thresholds in this range keep the number of normal users among the suspected users to a minimum and leave fewer illegal customers among the normal users. The values in Table 5 are calculated by (15), with the two weights taken as α = λ = 0.5. The preliminary detection can detect the 4th and 5th types of electricity theft, whose measurement values exceed D2.

Table 5
The values determined by similarity measures.

Characteristic curve and test curve    Measurement value
The first type of electricity theft    1.223
The second type of electricity theft   0.8977
The third type of electricity theft    2.3681
The fourth type of electricity theft   3.3260
The fifth type of electricity theft    3.5981
The sixth type of electricity theft    0.7843

The omission ratio and excessive detection ratio of different detection methods are given in Fig. 11, including Euclidean distance, correlation coefficient, the combination of Euclidean distance and correlation coefficient, and the proposed method. The simulation shows that the method used in this paper has the lowest omission and excessive detection ratios among these methods.

[Figure 11: excessive detection ratio/% (0 to 40) and omission ratio/% (0 to 12) for four methods: Euclidean distance, correlation coefficient, Euclidean + correlation, and the proposed method.]
Fig. 11. The probability of omission and excessive detection ratio at different detection methods.

The accuracy of SVM, DT-SVM, and the proposed method is compared as the number of normal samples and power stealing samples increases. Fig. 12 shows the accuracy of the three methods under the same number of samples. As the amount of data increases, the detection accuracy of all three methods increases, but the method proposed in this paper has the highest accuracy for every sample number, which confirms its accuracy.

[Figure 12: accuracy/% (60 to 95) against the percentage of training samples (50 to 90) for SVM, DT-SVM, and this paper.]
Fig. 12. The percentage of training samples and accuracy.

The six detection methods compared are defined as follows:

(1) no generated data, combined similarity measure and DT-KSVM;
(2) no generated data, combined similarity measure and SVM;
(3) no generated data, similarity measure;
(4) generated data and SVM and combined similarity measure and DT-KSVM;
(5) generated data, combined similarity measure and SVM;
(6) the method proposed in this paper.

[Figure 13: accuracy/% (84 to 96) and omission ratio/% (0 to 9) for detection methods 1 to 6.]
Fig. 13. Comparative analysis of different schemes with respect to accuracy and omission ratio.

Fig. 13 describes the accuracy and omission ratio under the different detection methods. It can be seen from Fig. 13 that the method used in this paper has the highest accuracy and the lowest omission ratio. At the same time, comparing the different methods, it can be seen that the problem of the imbalanced data set has a greater impact on the accuracy of detection. Fig. 14 shows the detection accuracy and omission ratio for various types of electricity theft. Although the accuracies for some types of electricity theft are not high, the whole detection accuracy is higher than that of other algorithms.
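
The preliminary screening of Section 5.3, which combines Eq. (15) with the thresholds D1 and D2, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: the value term is taken to be the Euclidean distance, the morphological sequence is assumed to be the first difference of the curve, DTW uses the classic absolute-difference cost, and samples below D1 are treated as normal, which is inferred from Fig. 3. All function names are hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length curves (assumed value term)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Classic O(n*m) dynamic time warping with absolute-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def shape_sequence(x):
    """Assumed morphological sequence: first differences of the curve."""
    return [b - a for a, b in zip(x, x[1:])]


def d_whole(test, characteristic, alpha=0.5, lam=0.5):
    """Eq. (15): combined numerical and morphological similarity measure."""
    value_term = euclidean(test, characteristic)
    shape_term = dtw(shape_sequence(test), shape_sequence(characteristic))
    return math.sqrt(alpha * value_term + lam * shape_term)


def screen(score, d1=0.7, d2=3.0):
    """Two-threshold rule of Fig. 3 with the D1 and D2 values of Section 5.3."""
    if score > d2:
        return "suspected"
    if d1 < score <= d2:
        return "send to DT-KSVM"
    return "normal"
```

Applied to the measurement values of Table 5, this rule flags the fourth and fifth types directly (3.3260 and 3.5981 exceed D2 = 3) and passes the other four types to the secondary DT-KSVM stage.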
[Figure 14: accuracy/% (88 to 98) and omission ratio/% (0 to 6) for the first type, the second type, the third type, the sixth type, and whole detection.]
Fig. 14. Detection accuracy and omission ratio of various electricity theft methods.

With the increase of measurement data, the power industry has entered the era of big data. Based on the large amount of quantitative data, the use of machine learning to detect power theft has become mainstream. This paper compares existing approaches with the proposed method. Table 6 shows the convolutional neural network (CNN) structure and parameters used for detecting electricity theft. Fig. 15 shows the AUC (Area Under Curve) performance of different classification models for different attack types, and Fig. 16 shows the comparison with different algorithms. It can be seen that the proposed method has accuracy advantages over other machine learning methods, but is slightly less effective than CNN.

[Figure 15: AUC (0.6 to 0.95) of KNN, NN, RF, SVM, CNN, and this paper for the first, second, third, and sixth attack types.]
Fig. 15. AUC performance of different classification model in different attack types.

[Figure 16: accuracy/% (75 to 95) and omission ratio/% (1 to 11) of KNN, NN, RF, SVM, CNN, and this paper.]
Fig. 16. The accuracy comparison with other algorithms.

Fig. 17 compares the performance and response time of CNN and DT-KSVM under different data sets. The lines with * compare the accuracy of the two algorithms, and the lines with ◇ compare their response time. When the amount of data is small (less than 3000), the algorithm used in this paper is more accurate than CNN, but when the amount of data is sufficient, CNN achieves better results. Apart from the above analysis, it is essential to compare the response time of the proposed scheme and CNN for theft detection. When the amount of data is small, the response times of the two are similar, but when the amount of data is large, the learning time of CNN is longer. At the same time, with traditional machine learning it is simpler to adjust hyperparameters and change model designs, because the underlying algorithms are more comprehensively understood. Deep networks, by contrast, are difficult to design due to the lack of a theoretical foundation for hyperparameter selection and network design. In this paper, the characteristics of the dataset, accuracy, false detection rate, AUC, response time, and the difficulty of parameter optimization are all considered. Under comprehensive comparison, the proposed method has better results in practical application.

[Figure 17: accuracy (%) (40 to 100) and response time (s) (0 to 3000) against the number of requests (0 to 10000) for DT-KSVM and deep learning.]
Fig. 17. The performance and response time compare CNN with the method proposed in this paper.

Table 6
Convolutional neural network structure and parameters.

The structure of network   Parameter                     Value
Convolutional layer 1      Convolution kernel structure  3
                           Step                          1
                           Number of filters             32
                           Activation function           ReLU
Convolutional layer 2      Convolution kernel structure  3
                           Step                          1
                           Number of filters             64
                           Activation function           ReLU
Convolutional layer 3      Convolution kernel structure  3
                           Step                          1
                           Number of filters             128
                           Activation function           ReLU
Fully connected layer 1    Number of neurons             120
                           Activation function           ReLU
Fully connected layer 2    Number of neurons             1
                           Activation function           Sigmoid

In the AMI measurement system, abnormal data will be collected during the collection process, but this erroneous data is not electricity stealing data. Such data affects the accuracy of theft detection. Fig. 18 shows the accuracy of the three detection methods under different proportions of erroneous data. Under the interference of erroneous data, the accuracy of all three methods gradually decreases to varying degrees, but both the rate and the degree of decrease of the recognition rate of DT-KSVM are smaller than those of SVM and DT-SVM. For the same proportion of erroneous data, the average recognition rate of DT-KSVM is higher than that of SVM and DT-SVM. This confirms that the method used in this paper has better robustness and anti-noise ability.

6. Conclusions

Aiming at the problem of electricity theft detection, this paper
proposes electricity theft detection based on similarity measure and DT-KSVM. Firstly, the low-voltage station with electricity theft is determined by ETD. Then supervised learning and unsupervised learning are combined to locate illegal consumers. Finally, MATLAB simulations show that the method achieves good results for various types of electricity theft, with a detection accuracy of more than 90%. When the power stealing type is not considered, the whole detection accuracy reaches 95.6%, and the omission detection rate is 4.3%. Compared with SVM and DT-SVM, the accuracy of the proposed method is improved by 1.5%. The method used in this paper has some advantages in terms of AUC and accuracy compared to existing methods, but is slightly less effective than CNN. At the same time, under the interference of abnormal data, the detection accuracy of this method is higher than that of the other two methods. The method provides a new approach for real-life detection. However, the test results show that the proposed method still has certain limitations. The detection accuracy for existing theft methods is relatively high, but the detection accuracy and efficiency for some unknown theft methods need to be improved. Novel NTL detection methods based on measurement data deserve attention in future research.

[Figure 18: accuracy/% of DT-KSVM, SVM, and DT-SVM under abnormal data.]
Fig. 18. Detection accuracy with abnormal data.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (51877145).

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ijepes.2020.106544.

References

[1] Zaman R, Brudermann T. Energy governance in the context of energy service security: a qualitative assessment of the electricity system in Bangladesh. Appl Energy 2018;223:443–56.
[2] Gaur V, Gupta E. The determinants of electricity theft: an empirical analysis of Indian states. Energy Policy 2016;93:127–36.
[3] Aryanezhad M. A novel approach to detection and prevention of electricity pilferage over power distribution network. Int J Electr Power Energy Syst 2019;111:191–200.
[4] Jokar P, Arianpoo N, Leung VCM. Electricity theft detection in AMI using customers' consumption patterns. IEEE Trans Smart Grid 2016;7:216–26.
[5] Yaqi S, Guoliang Z, Yongli Zu. Present status and challenges of big data processing in smart grid. Power System Technol 2013;37:927–35.
[6] Rengaraju P, Pandian SR, Lung CH. Communication networks and non-technical energy loss control system for smart grid networks. IEEE Innov Smart Grid Technol 2014:418–23.
[7] Faria L, Melo J, Padilha-Feltrin A. Spatial-temporal estimation for non-technical losses. IEEE Trans Power Deliv 2016;31:362–9.
[8] Yildiz B, Bilbao J, Dore J, Sproul A. Recent advances in the analysis of residential electricity consumption and application of smart meter data. Appl Energy 2017;208:402–27.
[9] Viegas JL, Esteves PR, Melício R, Mendes VMF, Vieira SM. Solutions for detection of non-technical losses in the electricity grid: a review. Renew Sustain Energy Rev 2017;80:1256–68.
[10] Angelos E, Saavedra O, Cortes O, Souza AD. Detection and identification of abnormalities in customer consumptions in power distribution systems. IEEE Trans Power Del 2011;26:2436–42.
[11] Joaquim L, Viegas. Clustering-based novelty detection for identification of non-technical losses. Int J Electr Power Energy Syst 2018;101:301–10.
[12] Razavi Rouzbeh, Gharipour Amin, Fleury Martin, Akpan Ikpe Justice. A practical feature-engineering framework for electricity theft detection in smart grids. Appl Energy 2019;238:481–94.
[13] Jindal Anish, Dua Amit. Decision Tree and SVM-based data analytics for theft detection in smart grid. IEEE Trans Ind Informat 2016;12(3):1005–16.
[14] Nagi J, Yap KS, Mohamad KM. Non-technical loss detection for metered customers in power utility using support vector machines. IEEE Trans Power Del 2010;25(2):1162–71.
[15] Fujun Jian, Min Cao. SVM based energy consumption abnormality detection in AMI system. Electr Meas Instrument 2014;51(06):64–9.
[16] Bhat RR, Trevizan RD, Li X, Bretas A. Identifying non-technical power loss via spatial and temporal deep learning. IEEE international conference on machine learning and applications, Anaheim, CA, USA, December 2016. 2016.
[17] Ismail M, Shahin M, Shaaban MF, Serpedin E, Qaraqe K. Efficient detection of electricity theft cyber attacks in ami networks. Proceedings of the IEEE wireless communications and networking conference, Barcelona, Spain, April 2018. 2018.
[18] Zheng Z, Yang Y, Niu X, Dai H-N, Zhou Y. Wide and deep convolutional neural networks for electricity-theft detection to secure smart grids. IEEE Trans Ind Inf 2018;14(4):1606–15.
[19] Liu Yunpeng, Xu Ziqiang, Wang Quan. Data augmentation method for power transformer fault diagnosis based on conditional Wasserstein generative adversarial network. Power Syst Technol 1-9.
[20] Xiaolong X, Wen C, Yanfei S. Over-sampling algorithm for imbalanced data classification. J Syst Eng Electron 2019;30(6):1182–91.
[21] Pan Tingting, Zhao Junhong, Wu Wei, Jie Yang. Learning imbalanced datasets based on SMOTE and Gaussian distribution. Inf Sci 2020;512:1212–33.
[22] Douzas G, Bacao F, Last F. Improving imbalanced learning through a heuristic oversampling method based on k-means and SMOTE. Inf Sci 2018;465:1–20.
[23] Li Yanxia, Chai Yi, Hu Youqiang. Review of imbalanced data classification methods. Contr Decis 2019;34(4):673–88.
[24] Scott Zuloaga, Puneet Khatavkar, Vijay Vittal. Interdependent electric and water infrastructure modelling, optimisation and control. 1st ed.2. IET Energy Systems Integration; 2020. p. 9–21.
[25] Kong Xiangyu, Hu Qian, Yi Zeng. Load data identification and correction method with improved fuzzy C-means clustering algorithm. Automat Electr Power Syst 2017;41:90–5.
[26] Guo Zhifeng, Zhou Kaile, Zhang Chi, Xinhui Lu, Chen Wen, Yang Shanlin. Residential electricity consumption behavior: Influencing factors, related theories and intervention strategies. Renew Sustain Energy Rev 2018;81(1):399–412.
[27] Gong Gangjun, Chen Zhimin. Clustering optimization strategy for electricity consumption behavior analysis in smart grid. Automat Electr Power Syst 2018;42:58–63.
[28] Hu Tianyu, Guo Qinglai, Sun Hongbin. Non-technical loss detection based on stacked uncorrelating autoencoder and support vector machine. Automat Electr Power Syst 2019;43(01):119–27.
[29] Wang Shouxiang, Chen Haiwen. A reconstruction method for missing data in power system measurement using an improved generative adversarial network. Proc CSEE 2019;39(01). pp. 56–64+320.
[30] Arjovsky M, Chintala S, Bottou L. Wasserstein GAN. arXiv preprint arXiv:1701.07875; 2017.
[31] Seeliger K, Güçlü U, Ambrogioni L, Güçlütürk Y, van Gerven MAJ. Generative adversarial networks for reconstructing natural images from brain activity. NeuroImage 2018;181:775–85.
[32] Kingma D P, Ba J. Adam: A method for stochastic optimisation. arXiv preprint arXiv:1412.6980, 2014:434-435.
[33] Kang Ningnin, Li Chuan, Zeng Hu, Li Yingna. Electric larceny detection using FCM clustering and improved SVR model. J Electron Meas Instrument 2017;31(12):2023–9.
[34] Li Z, Guo J, Li H, Wu T, Mao S, Nie F. Speed up similarity search of time series under dynamic time warping. IEEE Access 2019;7:163644–53.
[35] De Cock M, et al. Efficient and private scoring of decision trees, support vector machines and logistic regression models based on pre-computation. IEEE Trans Dependable Secure Comput 2019;16(2):217–30.
[36] Sun Fengxian. Study on multi-class classification method of support vector machine based on decision tree. Northeast Normal University 2015.
[37] Aydia Z, Gungor VC. A novel feature design and stacking approach for non-technical electricity loss detection. IEEE Innov Smart Grid Technol 2018:867–72.
[38] ISSDA. Data from the Commission for Energy Regulation – <http://www.ucd.ie/issda>.