
Electricity Theft Detection Using Machine Learning

Dr. S R Mahanty
Anshul Shrivastava (17085082)
Ayush Gupta (17084004)
Jogani Tanmay Ashokbhai (17085038)
NON-TECHNICAL LOSSES
• Non-technical losses are mainly caused by fraudulent activities deliberately performed by consumers.
• Besides the financial losses due to non-revenue energy, these frauds lead to a series of additional losses, including damage to grid infrastructure, reduced grid reliability, and possible accidents.
Unauthorized Electricity Consumption (NTL)
• Ways of unauthorized electricity consumption (energy theft):
• Tapping the distribution line directly
• Grounding the neutral wire
• Inserting a disc to stop the meter coil from rotating
• Hitting the meter to damage the rotating coil
• It is estimated that utility companies lose more than $89.3 billion every year
due to energy theft around the world [1].
Smart Energy Meter
• A smart meter is an electronic device that records consumption of electric energy
and communicates the information to the electricity supplier for monitoring and
billing.
• Smart meters typically record energy hourly or more frequently, and report at
least daily.
• Smart meters enable two-way communication between the meter and the
central system.
• Such an advanced metering infrastructure (AMI) differs from automatic meter
reading (AMR) in that it enables two-way communication between the meter and
the supplier.
• Communications from the meter to the network may be wireless, or via fixed
wired connections such as power line carrier (PLC).
Overview of Advanced Metering Infrastructure
• AMI:
• Comprised of state-of-the-art electronic/digital hardware and software.
• Enables detailed, time-based data measurements and their transmission.
• Benefits:
• System operation benefits
• Customer service benefits
• Financial benefits
Anomaly Detection
• An anomaly is a pattern that does not conform to the expected behavior.
• Also referred to as outliers, exceptions, surprises, novelties, etc.
• General steps for anomaly detection:
• Build a profile (or pattern) of the normal behavior
• Use the normal profile to detect anomalies
• Anomalies are observations whose characteristics differ significantly from the normal profile.
• There are many algorithms for anomaly detection, such as clustering methods and one-class SVMs (see the sketch below).
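As an illustration of the one-class SVM approach mentioned above, here is a minimal sketch using scikit-learn's OneClassSVM; the data, feature layout (48 half-hourly readings), and hyperparameters are illustrative assumptions, not taken from this project.

```python
# Minimal sketch: one-class SVM anomaly detection on consumption profiles.
# X_normal holds presumed-normal customer profiles (illustrative data).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(loc=1.0, scale=0.2, size=(200, 48))  # 48 half-hourly readings
X_test = np.vstack([X_normal[:5], rng.normal(3.0, 0.5, size=(5, 48))])

model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(X_normal)               # build a profile of normal behavior
print(model.predict(X_test))      # +1 = normal, -1 = anomaly
```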
State of the art: comparison
[Figure: comparison of state-of-the-art methods on Accuracy, Precision, Recall, and F1 score]
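For reference, these metrics follow their standard definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. A minimal sketch of computing them with scikit-learn (the labels are illustrative):

```python
# Minimal sketch: the four comparison metrics via scikit-learn.
# y_true / y_pred are illustrative (1 = theft, 0 = non-theft).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```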
Salient Features of datasets
• Electricity consumption data of State Grid Corporation of China
• This is a realistic electricity consumption dataset released by State Grid Corporation of China [2].
• The dataset contains the electricity consumption data of 42,372 electricity customers over 1,035 days (from Jan. 1, 2014 to Oct. 31, 2016).
• Smart energy data from the Irish Smart Energy Trial
• This smart meter dataset is provided by the Irish Social Science Data Archive (ISSDA), Ireland.
• The dataset used for the experiment is the subset of the Irish smart meter dataset from December 2010.
• The data set considered in this work is residential smart meter data. It contains the customer ID, a date/time code, and the electricity consumption for every 30 minutes (in kWh).
• The daily profile for every customer comprises 48 power consumption readings. The archive includes hourly electricity usage reports of Irish homes in 2009 and 2010.
1st Approach
Pipeline: Raw Data → Data Pre-processing → Dimensionality Reduction → Synthetic Data Generation → Model Training → Testing (on held-out test data)
Results: Accuracy: 89.30%, F1 score: 22.29%
Processes involved:
• Data Pre-processing:
• In this process, we exploit the interpolation method to recover the missing values according to the following equation:
• f(x_i) = (x_{i-1} + x_{i+1}) / 2, if x_i ∈ NaN and x_{i-1}, x_{i+1} ∉ NaN
• f(x_i) = 0, if x_i ∈ NaN and x_{i-1} or x_{i+1} ∈ NaN
• f(x_i) = x_i, if x_i ∉ NaN
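A minimal NumPy sketch of this recovery rule, assuming a 1-D array of one customer's readings with NaN marking missing values (the data is illustrative):

```python
# Minimal sketch of the missing-value recovery rule above.
import numpy as np

def recover_missing(x):
    orig = np.asarray(x, dtype=float)
    out = orig.copy()
    n = len(orig)
    for i in range(n):
        if np.isnan(orig[i]):
            prev_ok = i > 0 and not np.isnan(orig[i - 1])
            next_ok = i < n - 1 and not np.isnan(orig[i + 1])
            # both neighbors present: interpolate; otherwise fall back to 0
            out[i] = (orig[i - 1] + orig[i + 1]) / 2 if (prev_ok and next_ok) else 0.0
    return out

print(recover_missing([1.0, np.nan, 3.0, np.nan, np.nan, 2.0]))
# -> [1. 2. 3. 0. 0. 2.]
```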


• Dimensionality Reduction:
• Since the number of features is very large (1,035), we used Principal Component Analysis (PCA) to reduce the features to a low-dimensional space.
• PCA:
• PCA aims to detect the correlation between variables; if a strong correlation between variables exists, the attempt to reduce the dimensionality makes sense. It finds the directions of maximum variance in high-dimensional data and projects the data onto a smaller-dimensional subspace while retaining most of the information.
[Figure: PCA illustration]
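A minimal sketch of this step with scikit-learn's PCA; the number of components kept (50) is an assumption, as the slides do not state the value used.

```python
# Minimal sketch: reducing the 1,035 daily-consumption features with PCA.
# X is illustrative random data; n_components=50 is an assumed value.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).random((1000, 1035))  # customers x days

pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                         # (1000, 50)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained
```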
• Synthetic Data Generation & Over-sampling:
• Oversampling can be defined as adding more copies of the minority class.
• We used imblearn's SMOTE (Synthetic Minority Oversampling Technique).
• SMOTE uses a nearest-neighbors algorithm to generate new, synthetic data.
• It creates new instances of the minority class by forming convex combinations of neighboring instances: in effect, it draws lines between minority points in the feature space and samples along those lines (see the sketch below).
• This allows us to balance our dataset with less overfitting, as we create new synthetic examples rather than using duplicates.
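A minimal sketch of this balancing step with imblearn's SMOTE; the data and class ratio are illustrative.

```python
# Minimal sketch: balancing theft / non-theft labels with imblearn's SMOTE.
# X, y are illustrative; 1 = theft (minority class), 0 = non-theft.
from collections import Counter
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.random((1000, 50))
y = np.array([1] * 80 + [0] * 920)   # heavy class imbalance

X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # minority upsampled to match majority
```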
• Classifier:
• We used a random forest as the classifier.
• Random Forest:
• Random forest is a supervised learning algorithm.
• Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.
• Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features. This results in a wide diversity that generally yields a better model.
• Therefore, in random forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node.
Random Forest Pseudocode:
1. Randomly select k features from the total m features (where k << m).
2. Among the k features, calculate the node d using the best split point.
3. Split the node into daughter nodes using the best split.
4. Repeat steps 1 to 3 until l nodes have been reached.
5. Build the forest by repeating steps 1 to 4 n times to create n trees.
Now to get predictions…
• Take the test features and use the rules of each randomly created decision tree to predict the outcome; store the predicted outcome (target).
• Calculate the votes for each predicted target.
• Take the highest-voted predicted target as the final prediction from the random forest algorithm (see the sketch below).
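A minimal sketch of the training-and-voting workflow with scikit-learn's RandomForestClassifier; the data, split, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch: training a random forest and predicting on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 50))
y = rng.integers(0, 2, size=1000)   # 1 = theft, 0 = non-theft

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# max_features controls the random feature subset considered at each split
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
clf.fit(X_train, y_train)

# predict() aggregates the individual trees' predictions for each sample
print("Test accuracy:", clf.score(X_test, y_test))
```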
Conclusions
• Accuracy: 89.30%
• F1 score: 22.29%
• 589 wrong predictions were made out of 719 theft-labelled data points.
• Accuracy is not a good measure in this case due to class imbalance, which is reflected in the F1 score.
• Since the consumption data of both classes overlap for many customers, the data needs to be refined.
[Figure: electricity consumption plot for theft & non-theft consumers]
2nd Approach
Pipeline: Raw Data → Data Pre-processing → K-means Clustering & Data Thresholding → Dimensionality Reduction → Synthetic Data Generation → Model Training → Testing (on held-out test data)
Results: Accuracy: 99.83%, F1 score: 99.36%
Processes Involved
• All processes except K-means clustering and data thresholding have been described in previous slides.
• K-means clustering (see the sketch after this list):
• 1: Define K centroids randomly.
• 2: Assign every observation to the nearest centroid.
• 3: Define new centroids as the means of the clusters.
• 4: Repeat steps 2 and 3 until convergence.
• Data Thresholding:
• The cluster with the maximum number of data points is chosen.
• Find the Euclidean distance of each data point from the centroid of this cluster.
• The Euclidean distance can be defined as d(x_i, x_j) = sqrt( Σ_{k=1}^{n} (x_{ik} − x_{jk})² ), where
• n = number of features,
• x_i and x_j are the profiles of the i-th and j-th customers respectively.
• If distance < 110 and the point is labelled as non-theft, it is kept as non-theft data.
• If distance > 140 and the point is labelled as theft, it is kept as theft data.
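A minimal sketch of this clustering-and-thresholding filter with scikit-learn's KMeans; the number of clusters and the data are illustrative assumptions, while the 110/140 thresholds are the ones stated above.

```python
# Minimal sketch: K-means based data thresholding.
# X = customer profiles, y = labels (1 = theft, 0 = non-theft); illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 48)) * 200
y = rng.integers(0, 2, size=1000)

km = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)

# pick the cluster with the most points, then its centroid
largest = np.bincount(km.labels_).argmax()
centroid = km.cluster_centers_[largest]

# Euclidean distance of every profile from that centroid
dist = np.linalg.norm(X - centroid, axis=1)

# keep only points consistent with their labels under the thresholds
keep = ((dist < 110) & (y == 0)) | ((dist > 140) & (y == 1))
X_refined, y_refined = X[keep], y[keep]
print(X_refined.shape)
```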
Conclusions
• Accuracy: 99.83%
• F1 score: 99.36%
• 5 wrong predictions were made out of 394 theft-labelled data points.
[Figure: electricity consumption plot for theft & non-theft consumers]
3rd Approach
Pipeline: Raw Data → Data Pre-processing → K-means Clustering → Trustworthiness of Customers → Bogus Data Generation → Model Training → Testing
Results: Accuracy: 92.59%, F1 score: 92.78%
Processes Involved
• All processes except Trustworthiness of Customers and Bogus Data Generation have been described in previous slides.
• Trustworthiness of Customers:
• After applying K-means clustering, the cluster with the maximum number of data points is chosen.
• Data points whose Euclidean distance from the centroid is less than a threshold are assumed to be trustworthy customers. The threshold was taken equal to 90.
• Bogus Data Generation (see the sketch after this list):
• Three types of bogus data (T1, T2, T3) are generated from trustworthy customers' profiles:
• Type-1: a random value α is generated between 0 and 0.5, and each day's reading is multiplied by it:
• T1(Dt) = α · Dt ; α = Random(0, 0.5)
• Type-2: random days are taken, and their actual data values are replaced with random values between 0.1 and 0.8.
• Type-3: each day's reading is multiplied by a factor γ, the mean value of all days' readings times a random value greater than 1:
• T3(Dt) = γ · Dt ; γ = mean(Dt) × random value
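A minimal NumPy sketch of the three bogus-data types, assuming a 1-D array of one customer's daily readings; the random ranges follow the list above, while the data and the Type-3 upper bound are illustrative assumptions.

```python
# Minimal sketch: generating the three bogus (theft-like) profiles
# from a trustworthy customer's daily readings D (illustrative data).
import numpy as np

rng = np.random.default_rng(42)
D = rng.uniform(5.0, 15.0, size=30)      # 30 days of readings, kWh

# Type-1: scale every reading by a random alpha in (0, 0.5)
alpha = rng.uniform(0.0, 0.5)
T1 = alpha * D

# Type-2: replace readings on randomly chosen days with values in (0.1, 0.8)
T2 = D.copy()
days = rng.choice(len(D), size=10, replace=False)
T2[days] = rng.uniform(0.1, 0.8, size=len(days))

# Type-3: scale readings by gamma = mean of all readings times a factor > 1
gamma = D.mean() * rng.uniform(1.0, 2.0)  # the 2.0 upper bound is an assumption
T3 = gamma * D
```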
Conclusions
• Accuracy: 92.59%
• F1 score: 92.78%
• 21 wrong predictions were made out of 317 theft-labelled data points.
[Figure: electricity consumption plot for theft & non-theft consumers]
References and Links:
[1] https://www.prnewswire.com/news-releases/world-loses-893-billion-to-electricity-theft-annually-587-billion-in-emerging-markets-300006515.html
[2] http://www.sgcc.com.cn/
[3] https://github.com/anshulll/Electricity-Theft-Detection
Given a training set X = x1, ..., xn with responses Y = y1, ..., yn, bagging repeatedly (B times) selects a random sample with replacement of the training set and fits trees to these samples:
For b = 1, ..., B:
1. Sample, with replacement, n training examples from X, Y; call these Xb, Yb.
2. Train a classification or regression tree fb on Xb, Yb.

After training, predictions for unseen samples x' can be made by averaging the predictions from all the individual regression trees on x':

f̂(x') = (1/B) Σ_{b=1}^{B} f_b(x')

or by taking the majority vote (mode) in the case of classification.
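A minimal sketch of this bagging-and-voting procedure with scikit-learn decision trees; the data and B are illustrative (scikit-learn's BaggingClassifier wraps the same idea).

```python
# Minimal sketch: bagging B decision trees and predicting by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = rng.integers(0, 2, size=500)

B, n = 25, len(X)
trees = []
for b in range(B):
    idx = rng.integers(0, n, size=n)    # sample n examples with replacement
    trees.append(DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx]))

# majority vote (mode) across the B trees for a few unseen-style samples
votes = np.stack([t.predict(X[:5]) for t in trees])   # shape (B, 5)
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print(y_hat)
```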
