By
JAWAD NAGI
2009
DEDICATION
This thesis is dedicated to my father, who taught me that the best kind of
knowledge to have is that which is learned for its own sake. It is also dedicated
to my mother, who taught me that even the largest task can be accomplished if
it is done one step at a time.
ABSTRACT
ACKNOWLEDGEMENT
First and foremost, I wish to thank God for giving me the strength and courage to complete this thesis and research, and to thank all those who have assisted and inspired me throughout this research.
There are so many people to whom I am indebted for their assistance during my endeavors to complete my Master's candidature in Electrical Engineering at Universiti Tenaga Nasional (UNITEN). First and foremost, I would like to express my gratitude to my supervisor, Mr. Yap Keem Siah, whose invaluable guidance and support were very helpful throughout my research. A similar level of gratitude is due to my co-supervisor, Dr. Tiong Sieh Kiong. It is unlikely that I would have reached completion without their encouragement and support.
Besides that, I would like to thank the Non-Technical Loss (NTL) team Project
Leader from TNB Research (TNBR) Sdn. Bhd., Ir. Haji Abdul Malik Mohamad,
and also Norazlinawati Mohamad who were both very supportive and helpful in
this research project. TNBR Sdn. Bhd. is most appreciated for funding this work
under Grant RJO 10061948.
My appreciation also goes to TNB Distribution (TNBD) Sdn. Bhd. for providing
us with the customer data and other helpful information for the project. I
express my appreciation to everyone who was involved, directly or indirectly, in the success of this research. Last but not least, I thank my family for their understanding, patience, encouragement and support. Thank you for all the support, comments and guidance.
DECLARATION
This thesis may be made available within the university library and may be photocopied or loaned to other libraries for the purpose of consultation.
30 June 2009
Jawad Nagi
TABLE OF CONTENTS

DEDICATION
ABSTRACT
ACKNOWLEDGEMENT
DECLARATION
LIST OF ABBREVIATIONS

CHAPTER 1  INTRODUCTION
  1.0  Preliminaries
  1.1  Project Overview
  1.4  Research Methodology
  1.5  Thesis Overview

CHAPTER 2  LITERATURE SURVEY
  2.0  Overview
  2.1  Load Profiling
  2.2  Electricity Losses
       2.2.2.1  NTL Impacts
  2.3  Fraud Detection
  2.4  Electricity Theft
  2.6  Summary

CHAPTER 3
  3.0  Overview
       3.1.1.1  Pattern Recognition
       3.1.1.2.1  Supervised Learning
       3.1.1.2.2  Unsupervised Learning
       Hypothesis Selection
  3.1.2  AI Techniques
       Linear SVC
       3.2.2.1.1  Dual Problem
       3.2.2.1.2  Non-Separable Case
       3.2.2.1.3  Karush-Kuhn-Tucker Conditions
       Non-linear SVC
  Summary

CHAPTER 4
  4.0  Overview
  4.1  Proposed Framework
       Data Collection
       Data Preprocessing
       4.3.1.2  Consumption Transformation
       Cross-Validation (CV)
       4.3.1.4  Feature Normalization
       4.3.1.5  Feature Adjustment
       4.3.1.6  Feature File
       4.4.2.1  Weight Adjustment
       4.4.2.2  Parameter Optimization
       4.4.2.3  Probability Estimation
       4.4.2.4  SVC Training
       4.4.2.5  SVC Testing
       Data Postprocessing
       4.5.1.1  Fuzzy Sets
       4.5.1.3  Linguistic Variables
       4.5.2.2  Parameter Selection
       4.5.3.3  FIS Implementation
  Summary

CHAPTER 5
  5.0  Overview
       5.1.1.3  Execute Detection
       5.1.1.4  Detection Complete
       5.1.3.1  Detection Report
       Model Validation
  Summary

CHAPTER 6
  6.0  Overview
  Conclusion

REFERENCES
APPENDICES
  Appendix A
  Appendix B
LIST OF ABBREVIATIONS

AEIC      Association of Edison Illuminating Companies
AFDS
AI        Artificial Intelligence
AMR       Automatic Meter Reading
ANN       Artificial Neural Network
ASEAN     Association of South East Asian Nations
BP        Back-Propagation
BPNN      Back-Propagation Neural Network
BSV       Bounded Support Vector
CIBS      Customer Information and Billing System
CT        Current Transformer
CV        Cross-Validation
CWR
DA        Discriminant Analysis
DARPA     Defense Advanced Research Projects Agency
DLO
e-CIBS    enhanced Customer Information and Billing System
EA        Evolutionary Algorithm
ELM       Extreme Learning Machine
EMPD
ERM       Empirical Risk Minimization
ES        Expert System
FAM
FCM       Fuzzy C Means
FIS       Fuzzy Inference System
FL        Fuzzy Logic
FPDL
GA        Genetic Algorithm
GUI       Graphical User Interface
HR        High Risk
HRC
HV        High Voltage
IG        Information Gain
IPP       Independent Power Producer
IR        Irregularity Report
k-NN      k-Nearest Neighbor
KA        Kernel Adatron
KBS       Knowledge-Based System
KDD       Knowledge Discovery in Databases
KKT       Karush-Kuhn-Tucker
KL        Kuala Lumpur
LIBSVM    Library for Support Vector Machines
LPC       Large Power Customer
LV        Low Voltage
MF        Membership Function
MIT       Massachusetts Institute of Technology
ML-BPNN   Multi-Layer Back-Propagation Neural Network
MLP       Multi-Layer Perceptron
MS-DOS    Microsoft Disk Operating System
MSE       Mean Squared Error
MYR       Malaysian Ringgit
NN        Neural Network
NTL       Non-Technical Loss
OCR       Optical Character Recognition
OLAP      Online Analytical Processing
OMH       Optimal Margin Hyperplane
OPC       Ordinary Power Customer
OS-ELM    Online-Sequential Extreme Learning Machine
PEA       Provincial Electricity Authority
PEC
PF        Power Factor
QP        Quadratic Programming
RAM       Random Access Memory
RAN       Random Selection
RBF       Radial Basis Function
RMR       Remote Meter Reading
RLS
SA        Simulated Annealing
SCADA     Supervisory Control and Data Acquisition
SEAL      Special Enforcement Against Losses
SEC       Security and Exchange Commission
SESB      Sabah Electricity Sdn. Bhd.
SESCO     Sarawak Electricity Supply Corporation
SLFN      Single-hidden Layer Feedforward Network
SMB
SMO       Sequential Minimal Optimization
SOM       Self-Organizing Map
SQL       Structured Query Language
SRM       Structural Risk Minimization
SV        Support Vector
SVC       Support Vector Classification
SVM       Support Vector Machine
TNB       Tenaga Nasional Berhad
TNBD      TNB Distribution
TNBR      TNB Research
TOE       Theft Of Electricity
UNITEN    Universiti Tenaga Nasional
USD       United States Dollar
VC        Vapnik-Chervonenkis
CHAPTER 1
INTRODUCTION
1.0 Preliminaries
Non-Technical Losses (NTLs) originating from electricity theft and other
customer malfeasances are a problem in the electricity supply industry. Such
losses occur due to meter tampering, meter malfunction, illegal connections,
billing irregularities and unpaid bills [1]. The problem of NTLs is not only faced
by the least developed countries in the Asian and African regions, but also by
developed countries such as the United States of America and the United
Kingdom [2]. Specifically, high rates of NTL activities have been reported in the
majority of developing countries in the Association of South East Asian Nations
(ASEAN) group, which include Malaysia, Indonesia, Thailand and Vietnam [3].
As an example, in the United States NTLs have been estimated to account for between 0.5% and 3.5% of total annual revenue [1], which is relatively low compared to the NTLs faced by utilities in developing countries such as Bangladesh [2], India [3], Pakistan [4] and Lebanon [5], where NTL levels averaging between 20% and 30% have been observed. Nonetheless, in 1998 the revenue lost by power utilities in the United States was estimated at between USD 1 billion and USD 10 billion: applying the 0.5%-3.5% range to the industry's annual gross revenue of USD 280 billion gives roughly USD 1.4 billion to USD 9.8 billion [1].
Due to the problems associated with NTLs in electric utilities, various methods for the efficient management of NTLs [11] and for protecting revenue in the electricity distribution industry [12] have been proposed. The most effective method to date for reducing NTLs and commercial losses is the use of smart and intelligent electronic meters [13], which make fraudulent activities more difficult to carry out and easier to detect. However, such meters are expensive, so it is not currently feasible for power utilities to deploy them throughout the entire low voltage (LV) distribution network, i.e., in the residential and commercial sectors.
Due to the dramatic increase in fraud, which results in losses worldwide each year, several computational intelligence techniques for the detection and prevention of fraud have continually evolved and are being applied in many business fields [35]. The most popular computational intelligence branch to evolve in the field of Computer Science is Artificial Intelligence (AI), which has been used by researchers over the last three decades.
Table 1.1: Comparative classification performance (%) of popular machine learning techniques on the Reuters-21578 text categorization benchmark [29, 33]

Category        Naive Bayes   Rocchio   C4.5    k-NN    SVM
earn                95.9        96.1    96.1    97.3    98.5
acq                 91.5        92.1    85.3    92.0    95.4
money-fx            62.9        67.6    69.4    78.2    76.3
grain               72.5        79.5    89.1    82.2    93.1
crude               81.0        81.5    75.5    85.7    88.9
trade               50.0        77.4    59.2    77.4    77.8
interest            58.0        72.5    49.1    74.0    76.2
ship                78.7        83.1    80.9    79.2    85.4
wheat               60.6        79.4    85.5    76.6    85.2
corn                47.3        62.2    87.7    77.9    85.1
micro-average       72.0        79.9    79.4    82.3    86.4
Electric meters, or kWh (kilowatt-hour) meters, record readings which are used to bill customers for the amount of electricity consumed [36]. In many developing countries in the ASEAN group, electro-mechanical induction electricity meters are currently used, which can easily be tampered with. Inspection of electric meters for fraud detection and identification is currently a manual and tedious task, which requires experienced staff. Moreover, meter malfunction correction is currently limited to experienced utility staff [37], due to the complexity associated with rectifying problematic electric meters.
Tenaga Nasional Berhad (TNB) is the main electricity provider in peninsular Malaysia.
TNB Distribution (TNBD) is the transmission and distribution division of TNB Malaysia.
TNB Research (TNBR) is the research division of TNB Malaysia.
Large inspection campaigns have been carried out by TNBD SEAL teams with
little success. The current actions taken by the SEAL teams in order to address
the problem of NTLs include: (i) meter checking and premise inspection, (ii)
reporting on irregularities, and (iii) monitoring of unbilled accounts, which
have resulted in a fraud detection hitrate of 3-5%. This is because customer
installation inspections are carried out without any specific focus or direction.
In most cases, inspections are carried out at random, while some targeted raids
are undertaken based on information reported by the public or meter readers.
Figure 1.2: A wire used to slow the rotating disc in an electric meter [37]
The main objective of this research study is to identify, detect and predict customers engaged in fraud activities and abnormalities by investigating abrupt changes in their load consumption patterns.
NTLs not only affect a company's profitability and credibility, but also increase the cost of electricity to customers [16]. Therefore, the need to minimize this problem is crucial for both the utilities and their customers. The respective benefits gained by power utilities and their customers from the reduction of NTL activities are illustrated in Table 1.2.
The fraud detection system developed for the detection and identification of NTL activities will significantly benefit both the power utilities and their customers, as indicated in Table 1.2.
[Figure: The proposed fraud detection framework: input (raw customer data), data preprocessing, feature selection and extraction, data postprocessing using FIS, and the result (a list of suspicious customers)]
The SVM and FIS computational intelligence scheme proposed in this study, however, differs from the methodologies presented in [38-40], because in this approach the FIS is used as a data postprocessing scheme for the selection of suspicious customers, based on the relationships among parameters in the SVM input, the SVM output and the preprocessed customer data. This combination of SVM and FIS provides better classification and detection of NTL activities than conventional SVM.
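To illustrate the idea of FIS-based postprocessing of SVM outputs, the Python sketch below combines a classifier probability with a single fuzzy rule. The membership functions, the rule and all numbers are hypothetical, invented for illustration; they are not the FIS design developed later in this thesis.

```python
def triangular(x, a, b, c):
    # Triangular membership function with feet at a and c and peak at b.
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def suspicion_score(svm_prob, consumption_drop):
    # One hypothetical rule: a customer is suspicious IF the SVM output
    # probability is high AND the relative drop in average consumption
    # is large. Fuzzy AND is taken as the minimum operator.
    high_output = triangular(svm_prob, 0.5, 1.0, 1.5)
    large_drop = triangular(consumption_drop, 0.2, 0.8, 1.4)
    return min(high_output, large_drop)

# A customer with SVM probability 0.92 and a 65% consumption drop
# receives a suspicion score of 0.75 and would be ranked near the top.
print(suspicion_score(0.92, 0.65))
```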
The purpose of this research study is to use the knowledge gathered from customers' load profiles in order to detect significant behavioral deviations that signal NTL activities. NTLs have been observed in many countries and are a significant concern; therefore, it is important to be able to detect and identify them. The scope of this study is as follows:
1. This study focuses on the low voltage (LV) distribution network, which includes residential, commercial and light industrial customers, using monthly kWh interval data retrieved over a period of time.
2. The NTL detection technique proposed in this research study is based on a combination of data mining and AI based classification approaches, which is distinct from the other approaches that have been implemented to minimize NTL problems, as discussed in Chapter 2.
Chapter 4 presents the methodology proposed for the fraud detection system and the associated key algorithms used in NTL detection, identification and prediction. In the first sub chapter, the general project and research methodologies are introduced. Three major stages are involved in the development of the fraud detection system: (i) data preprocessing, (ii) classification engine development, and (iii) data postprocessing. The data preprocessing sub chapter illustrates the data mining techniques used for preprocessing raw customer information and billing data for feature selection and feature extraction. The classification engine development sub chapter illustrates the SVC training, parameter optimization, development of the SVC classifier, and the SVC testing and validation engine. The last sub chapter, data postprocessing, describes the development of a Fuzzy Inference System (FIS), the creation of fuzzy rules, and membership function (MF) formation for the selection of suspicious customers from the SVC results.
Chapter 5 is composed of two main sub chapters. Sub chapter 1 presents the Graphical User Interface (GUI) developed for the fraud detection system. The GUI of the developed software generates a detection report listing the suspicious customers, and an average daily consumption report for inspecting the load consumption patterns of the suspicious customers. In sub chapter 2, model validation results are presented based on: (i) the classifier, (ii) pilot testing, and (iii) comparison of the proposed model with other AI based classification techniques. The model validation results obtained are discussed and evaluated. The contribution of the FIS to hitrate improvement is also discussed, and the computational intelligence scheme of SVC and FIS is compared to standard SVC. Finally, at the end of sub chapter 2, a comparative study of the proposed SVC and FIS model is performed against two AI based classification techniques, (i) the Multi-Layer Back-Propagation Neural Network (ML-BPNN) and (ii) the Online-Sequential Extreme Learning Machine (OS-ELM), in order to evaluate the efficiency of the proposed fraud detection system. The results of the comparative study are discussed and elaborated in detail.
Chapter 6, which is the last chapter, concludes the thesis and summarizes the
research contributions made. The achievements and objectives of the research
study with respect to the project are highlighted along with the key findings of
the research. In addition, this chapter also discusses the impact and significance
of this project to TNB in Malaysia and suggests future research in the present
context that merits consideration.
CHAPTER 2
LITERATURE SURVEY
2.0 Overview
This chapter presents a literature review of electricity load profiling studies
conducted in various countries. A summary of these studies and their means of
implementation are provided in Table 2.1. In the next section, consideration is
given to background and theoretical concepts relating to power losses that
electricity distribution companys experience, which include technical losses
and NTLs. Also, some background issues concerning fraud detection techniques
used in electricity businesses, as well as in other businesses are reviewed. In
addition, various methods of electricity theft, such as meter tampering etc are
also presented. The final section provides an overview of TNB, the sole
electricity provider in peninsular Malaysia, as the case study for the present
research. Particular reference is made to the connection between load profiles,
NTLs, and fraud detection within this electricity supply utility.
In many countries, load profiles have been identified as an alternative and cost-effective approach to the interval metering solution, which is known to be expensive and impractical for small, LV commercial and domestic customers. In addition, having knowledge of customers' load profiles not only assists distribution companies in determining the demand price of electricity, but also enables better marketing strategies [42].
Over the last two decades, a number of load profile studies have been carried out to classify electric utility customers based on their load consumption behavior. These studies have been carried out in countries including Taiwan [43-47], Slovenia [7, 41], Romania [48-50], Portugal [51-53], the United Kingdom [54-57], Malaysia [58-62], Belgium [63], Spain [64] and Brazil [65]. The main objective of load profile studies in general is to extract and record information relating to customer load characteristics [46]. In all the different countries where these studies have been carried out, they have served various purposes and yielded significant benefits. The various reasons reported for conducting such studies are indicated in Table 2.1.
Table 2.1: Summary of load profile studies worldwide and their implementation

Country (utility)                                  Techniques
Taiwan (Taiwan Power Company)                      Statistical techniques
Slovenia                                           Hierarchical clustering; Fuzzy C Means (FCM) clustering
Romania                                            K-Means clustering; Hierarchical clustering
Portugal (The Portuguese Distribution Company)     K-Means clustering; Self-Organizing Map (SOM); Two-Level Approach
United Kingdom                                     Fuzzy Classification
Malaysia (Tenaga Nasional Berhad)                  Fuzzy C Means (FCM) clustering; Artificial Neural Network (ANN); Fuzzy Classification

Interval metering is when a meter records customer demand and consumption in fixed intervals of time during a single day to create a consumption usage profile.
Table 2.1 shows that the load profile studies conducted vary from one country to another. However, in the case of Malaysia, the only load profile studies ever conducted used a set of load data from 46 different feeders associated with TNB, in order to demonstrate a method of classifying the daily load curves of different consumers in a distribution network [61, 62]. Since only such limited studies of customer behavior changes in Malaysia have previously been reported, the research presented here was proposed.

The "rest curve" represents the consumption profile of the end users that are not measured on a time-interval basis and is derived on the basis of actual power consumed [66].
In the second group, various pattern recognition methods were used as load profiling tools to develop typical load profiles based on the shapes of the recorded load patterns [6, 41, 42, 49, 67].
However, these studies indicate that the load profiling procedures of both research groups are affected by limitations. The main limitation affecting the first group is that the time required for measurement is quite long. For the second group, the key limitation is that the procedure required to develop the customer characteristics is expensive and time consuming [42]. Even though it might be argued that the second approach is better than the first, there is no clear and concise way to determine the optimum pattern recognition method for representing load profiles. Hence, the research presented in this study employs SVM to solve the pattern recognition problem of classifying load profiles based on similarities of consumption behavior, in order to differentiate between fraudulent and good (normal) consumption patterns.
In recent years, clustering techniques have been widely used in many real-world applications, including document clustering [69, 70], gene expression micro-array data analysis [71, 72], and image segmentation [73]. In addition, clustering techniques have also been used in load profiling studies to group similar load profiles for different purposes. However, none of the studies conducted so far have used classification techniques to group the load patterns of individual customers based on similarities of consumption behavior in order to differentiate between normal and abnormal load patterns. Therefore, the present study uses SVM, a machine learning technique, for this classification task.
In general, clustering techniques broadly fall into two classes: (i) partitional and (ii) hierarchical techniques. In partitional clustering, K-means is widely used, while in hierarchical clustering, single linkage is most commonly used [74]. However, in [75], Han et al. expanded the classification of clustering
techniques into five major categories: (i) Partitioning methods, (ii) Hierarchical
methods as mentioned by Karypis et al. in [76], (iii) Density-based methods, (iv)
Grid-based methods, and (v) Model-based methods [77]. Among all clustering
categories, each clustering technique has its own advantages and disadvantages
depending on the problem to be addressed and the assumptions made. The
most popular clustering techniques used for classification of electricity supply
utility customers based on load profiles and reported in literature are described
briefly in the following section.
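As a concrete illustration of the partitional class, the Python sketch below groups synthetic daily load profiles with K-means, assuming scikit-learn is available. The two profile shapes and the noise levels are invented for the example; they are not TNB data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
hours = np.arange(24)
# Two hypothetical 24-hour consumption shapes: a daytime peak and an
# evening peak, each perturbed with small random noise.
day_peak = 1.0 + np.exp(-((hours - 13) ** 2) / 18.0)
night_peak = 1.0 + np.exp(-((hours - 21) ** 2) / 18.0)
profiles = np.vstack(
    [day_peak + rng.normal(0, 0.05, 24) for _ in range(20)]
    + [night_peak + rng.normal(0, 0.05, 24) for _ in range(20)]
)

# Partition the 40 profiles into two groups of similar behavior.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
print(kmeans.labels_)
```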
One of the most popular clustering techniques for determining load profiles is fuzzy clustering. From [41], it is apparent that Fuzzy C Means (FCM) can be used as an effective outlier detection tool. The only limitation of FCM is that the number of clusters needs to be specified in advance. Hierarchical clustering was employed by Gerbec [7, 41, 42, 66, 78-82] and Chicco [48-50, 83-88] to group customers based on their behavioral similarities. Unlike FCM, hierarchical clustering does not require the number of clusters to be specified in advance.
The studies discussed above indicate that load profile investigations have long been undertaken for a variety of reasons. However, there are currently no existing case studies of the customer behavior changes that are important to electricity supply utilities. Most of the studies carried out have reported clustering techniques to group customers with their load profiles appropriately, but no studies applying classification techniques to load profiling exist. Therefore, the present study focuses on employing an AI based classification technique, SVM, for grouping the load patterns of individual customers based on similarities of consumption behavior.
The total energy lost in a power system is the difference between the energy delivered and the energy sold:

$E_{Loss} = E_{Delivered} - E_{Sold}$    (2.1)

where $E_{Loss}$ is the amount of energy lost, $E_{Delivered}$ represents the amount of energy delivered, and $E_{Sold}$ represents the amount of energy recorded or sold.
Technical losses can be computed and controlled, provided the power system in question consists of known quantities of loads. Computation tools for
calculating power flow, losses, and equipment status in power systems have
been developed for some time. Improvements in information technology and
data acquisition have also made the calculation and verification of technical
losses easier. These losses are calculated based on the natural properties of
components in the power system, which include resistance, reactance,
capacitance, voltage, and current. Loads are not included in technical losses
because they are actually intended to receive as much energy as possible [94].
The corresponding revenue loss can be expressed as:

$C_{Loss} = U_{Cost} \cdot E_{Loss} + M_{Cost}$    (2.2)

where $C_{Loss}$ is the revenue loss due to technical/additional losses, $U_{Cost}$ represents the unit cost of electricity, $E_{Loss}$ represents the amount of energy lost, and $M_{Cost}$ represents the maintenance and additional operational costs.
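As a quick numeric illustration of eqs. (2.1) and (2.2), the Python snippet below uses entirely hypothetical figures for a small network segment:

```python
# Hypothetical figures, for illustration only.
E_delivered = 1_000_000.0  # kWh delivered into the network
E_sold = 920_000.0         # kWh recorded/billed

E_loss = E_delivered - E_sold  # eq. (2.1): 80,000 kWh lost

unit_cost = 0.30  # assumed unit cost of electricity (MYR/kWh)
M_cost = 5_000.0  # assumed maintenance and operational costs (MYR)

C_loss = unit_cost * E_loss + M_cost  # eq. (2.2): MYR 29,000
print(E_loss, C_loss)
```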
Two major sources contribute to technical losses: (i) load losses, consisting of the $I^2R$ and $I^2X$ loss components in the series impedances of the various system elements, and (ii) no-load losses, which are independent of the actual load served by the power system [96]. The majority of the no-load losses are due to transformer core losses resulting from excitation current flows [97].
NTLs are more difficult to measure because they are often unaccounted for by the system operators and thus have no recorded information. Two major sources contribute to NTLs: (i) component breakdowns and (ii) electricity theft. NTLs caused by equipment breakdown are quite rare; contributing factors may include equipment struck by lightning, equipment damaged over time, and neglected or unmaintained equipment. Even though equipment failure due to natural stresses like rain, snow and wind is rare, the equipment is selected and the distribution infrastructure designed in consideration of the local weather and natural phenomena [94].
The second major source is electricity theft and non-payment, which are believed to account for most, if not all, NTLs in power systems [94]. The factors contributing to NTL activities, as indicated in [1, 22, 93], can be characterized to include the following:
Other forms of NTLs may also exist, such as unanticipated increases in power system losses due to equipment deterioration over time, or system miscalculations on the part of the utilities due to accounting or other information errors. These losses have not been taken into account in the present study, due to insufficient background information [94]. In order to estimate the revenue loss due to NTLs, Davidson [89] defined a general equation, which is given below.

$C_{NTL} = C_{Total} - C_{Loss}$    (2.3)

where $C_{NTL}$ is the NTL cost component, $C_{Loss}$ represents the revenue loss due to technical/additional losses, and $C_{Total}$ represents the total revenue loss.
Although some electrical power loss is inevitable, steps can be taken to ensure that it is minimized. Several measures have been applied to this end, including those based on technology and those that rely on human effort and ingenuity [8]. The NTL components identified from these contributing factors are listed in Table 2.2 [1, 22, 93, 94].
Table 2.2: NTL components identified for power utilities and electricity customers [1, 22, 93, 94]

Meter
  Power Utilities: inadequacies and inaccuracies in meter reading; inadequate or faulty metering.
  Electricity Customers: unauthorized line tapping and diversion; arranging false readings by bribing meter readers.
Bills
  Power Utilities: inaccurate customer electricity billing; inefficiency of business and technology management systems.
  Electricity Customers: non-payment of electricity bills; arranging billing irregularities with the help of internal employees.
The current methods of minimizing NTLs impose high operational costs and require extensive use of human resources. Several methods have been developed in other countries to minimize NTL problems. Most electricity supply utilities concentrate on onsite technical inspection of customers, which has high operational costs and consumes considerable human resources and time [22]. Onsite technical inspections in electricity supply utilities such as TNB in Malaysia are carried out at random, while some targeted raids are undertaken based on information reported by the public or meter readers. This study proposes a method to overcome such limitations by monitoring and detecting deviations in customers' load profiles, as an alternative to complement the existing actions enforced by power utilities to reduce NTLs.
Fraud detection in the present context is defined as "monitoring the behavior of a user population in order to estimate, detect or avoid undesirable behavior" [100]. Fraud detection involves identifying fraud as quickly as possible once it has been perpetrated. Fraud detection methods are continuously developed in order to defend against criminals who adapt their strategies [35]. Since most criminals are unaware of the fraud detection methods that have been successful in the past, they will tend to adopt strategies that lead to identifiable frauds. Therefore, to detect fraud, earlier detection tools need to be applied alongside the latest developments [21].
Alternatively, the techniques used to detect outliers can be divided into two major categories, which are as follows:
1. Electricity businesses.
2. Other types of businesses, including credit card transactions, insurance, risk management, and telecommunications.
From the literature reviewed, it is observed that Jiang et al. [25] employed Wavelet techniques in combination with multiple classifiers to identify fraudulent customers in an electricity distribution network. The Wavelet technique was selected over conventional feature extraction methods because the localization and multi-resolution properties of wavelets help obtain results with greater accuracy. Alternatively, Rough Sets were used in [15] and Decision Trees in [18, 19] for the classification of electricity utility customers. In addition, statistical methods were used in [16, 17] to minimize NTLs in electricity distribution networks. There were also studies conducted using ANNs [20], statistical outlier detection [21], and the most recently developed ELM [8], where all studies presented different approaches employing a general framework that had customer databases as the input data source.
The literature reviewed indicates that for credit card transactions, online
merchants are very susceptible to fraud, as the purchasers are not present
during the transaction [115]. Most of the credit card companies have used ANNs
as their tools to detect fraud [27, 112], with the telecommunication businesses
and the Security and Exchange Commission (SEC) applying similar techniques.
All of the applications cited above have employed data mining techniques to
expose fraud directly from their customer databases. Alternatively, in [116], a
rule-learning based algorithm was developed in order to identify users with
fraudulent behavior from databases of cellular phone customer transactions.
The current approach used by the PEA of Thailand [117], TNB in Malaysia [118] and other power utility companies to detect the two major causes of NTLs, as mentioned above, is to perform onsite technical inspections of customers. This primarily involves field staff monitoring meters and access
points in the transmission and distribution system on a regular basis. Onsite
technical inspections in TNB Malaysia [117] are carried out at random, while
some targeted raids are undertaken based on information reported
(irregularity reports) by the public or meter readers. In addition, most power
utilities even provide specialized training to regular meter readers in order for
them to spot irregularities in consumption behavior [94].
Meter inspection is the main method of NTL detection because power utilities consider electricity theft to be the major source of NTLs, and the majority of electricity theft cases involve meter tampering or meter vandalism [94].
The principles of operation of electric Watt-hour (Wh) meters have remained virtually unchanged since the Watt-hour meter was first invented in the 1880s and 1890s [121]. The basic principle of a single-phase electrical energy measurement meter, first used commercially in 1894, is as follows. A standard Watt-hour meter consists of two coils that produce electromagnetic fluxes [94]:
1. A coil connected across the two leads that produces a flux proportional
to the voltage (potential coil) as shown in Figure 2.1 (top left).
2. A coil connected in series with one of the leads that produces a flux
proportional to the current (current coils) as shown in Figure 2.1.
Figure 2.1: Basic components of a Watt-hour meter. Clockwise from top left: the coil connections for the voltage and current sensing elements, the rotating disc that records the electricity consumption, and the basic construction [94]
In early meter designs, such as the ones shown in Figure 2.2, electricity meters were not enclosed, and all the parts, including the meter installation, were easily accessible to anyone [94]. However, as early as 1899, the Association of Edison Illuminating Companies (AEIC) indicated that electricity theft was a concern. In response to the recommendations proposed by the committee of the AEIC, the following improvements, along with other efficiency and accuracy improvements, were incorporated into electricity meters [121]:
The literature above indicates that the problem of electricity theft has been around almost as long as power systems themselves. Modern meters, such as those shown in Figure 2.3, are comparatively well enclosed and have seals that can reveal tampering [94]. However, theft can still occur. Most power utilities train their inspection teams to spot tampering; however, access to the inner mechanisms of the meter can sometimes be achieved by drilling a very tiny and fine hole at a less obvious part of the enclosure [120], which is difficult for inspection teams to identify. A detailed historical background and timeline of Watt-hour meters, from the 1880s to the present day, can be found in [122].
Shanty towns, also referred to as slums or squatter settlement camps, are settlements (sometimes illegal or unauthorized) of people who live in improvised dwellings made from scrap materials, often plywood, corrugated metal and sheets of plastic.
Other methods of electricity theft include tapping off nearby paying customers, and using magnets to reduce the speed of the rotating disc in the meter housing. Another possible method found to reduce the billed consumption at the consumer end is to use a "Tron Box" [120], which reverses the phase signal of one of the lines passing through the meter to cancel out the opposing phase in the line. This reverse-phase effect reduces the speed of the meter, indicating lower electricity consumption, and can also cause the meter dials to move backwards [94].
A current transformer is a device that outputs a current proportional to the load current being
measured, enabling the meter to measure the load without subjecting it to a large current [94].
1. Direct connections from the power line – One of the obvious methods used by perpetrators to eliminate consumption records is to bypass the meter. The major obstacle in this scenario is that most HV loads are constructed and connected at the request of customers. Since customers have knowledge of the location of the HV power lines, as they are the ones who request the connections, direct connections from the power line to the loads can be established with assistance from electricians. However, in most cases not many electricians would risk exposing themselves to HV power lines without the power utility there to assist them with safety [94].
3. Tampering with the terminal seals – The most common meter abuse method by far is tampering with the terminal seals. Since the terminal seals are in an easy-to-reach location, i.e., immediately below the meter, most perpetrators use this to their advantage. This is accomplished by breaking the terminals and connecting one of the CT wires to the ground, making it appear to the meter that one of the phases has no voltage or current [94].
4. Breaking the control wires – The control wires are the secondary wires of a CT. Meters for large loads measure high currents; therefore, step-down CTs are connected to reduce the current to a level compatible with the components in the meter. This theft method is accomplished by breaking the insulation of a control wire and connecting external taps to it in order to reduce the current going into the meter, which causes the meter to read less current than the actual amount [94].
5. Shorting the control wires – Shorting the control wires diverts the current reading in the meter. In this scenario, the current going into the meter will be zero. This effect is immediate, and obviously, with zero current, the power consumption readings will be zero and the accumulated consumption will remain stationary [94].
6. Switching the CTs – In a typical installation, phase C always has the least amount of load and the lowest power factor (PF). By switching the CTs or the control wires from the CT secondary windings, the meter's reading speed can be altered. This is accomplished by removing the CT from one of phases A or B, and replacing the removed CT on phase C to reduce the meter's power reading speed [94].
7. Breaking the voltage taps – Voltage taps, as shown in Figure 2.5, are present in the meter housing in order to allow the meter to read the voltage of the load. Perpetrators usually employ one of the following methods for electricity theft: (i) break the voltage taps, (ii) short the voltage taps to the ground, or (iii) connect the voltage taps to another line, in order to distort the reading of the meter so that it reads a lower voltage [94].
Many other methods of electricity theft have been identified. However, some of these methods are not feasible, while others require too much effort or are outright dangerous [120]. As an example, one far-fetched idea of electricity theft found in the literature was to place enormous coils around HV power lines to act as transformers with extremely large air gaps [123].
TNB has classified customers into six different categories based on different types of businesses: (i) domestic, (ii) commercial, (iii) industrial, (iv) agricultural, (v) mining, and (vi) public lighting [118]. Industrial customers, or Large Power Customers (LPCs) as they are referred to by TNB, comprise a fairly large number of TNB users and contribute the largest proportion of sales revenue. The commercial and domestic customers then follow in sequence; they comprise the largest proportion of TNB users and are referred to as Ordinary Power Customers (OPCs). The remaining agricultural and mining categories comprise a smaller proportion of customers.
Reducing NTLs is a core strategy for TNB under its 20-year strategic master plan, and since 2004 TNB has made intensive efforts to reduce power theft, a major contributor to NTLs [126]. Currently, three measures comprise the main effort by TNB to minimize and work towards preventing NTLs: (i) the installation of a Remote Meter Reading (RMR) service [127] to provide power consumption statistics and online billed data, (ii) the installation of a prepayment metering system [128], and (iii) the setting up of a Special Enforcement Against Losses (SEAL) team to investigate problems by conducting onsite customer meter installation inspections [126]. Currently, the RMR service and the prepayment meter system are targeted only towards the HV distribution network, for LPCs. For OPCs, such as commercial, domestic and light industry customers in the LV distribution network, the SEAL team provides the main means of minimizing and reducing NTLs.
The SEAL team was set up by the NTL Electricity Theft group of TNB in 2004, in order to reduce and minimize the NTL problems faced by TNB [126]. The SEAL team's activities include improving metering and billing processes, ensuring metering is accurate, and reducing the theft of electricity. In 2005, the NTL Electricity Theft group successfully reduced distribution losses by about 1%, mainly through the efforts of the SEAL team. The electricity theft rate for TNB in 2005 was reduced by almost 50%: the theft rate for LPCs was reduced from 3% to 1.5%, and likewise for OPCs from 4.1% to 2% [125].
In 2007 and 2008, large inspection campaigns were carried out by the SEAL team with little success. The reason for this is newer and improved methods of electricity theft, which are difficult to identify. The current actions taken by the SEAL team to address the problem of NTLs include: (i) meter checking and premise inspection, (ii) reporting on irregularities, and (iii) monitoring of unbilled accounts, meter reading and sales. Currently, customer installation inspections are carried out by the SEAL teams without any specific focus or direction; most inspections are carried out at random, while some targeted raids are undertaken based on information reported by the public or meter readers. Therefore, the motivation for the present research investigation is to reduce NTLs effectively in the Malaysian electricity supply industry, mainly TNB, by proposing an alternative solution to complement the currently existing approaches.
2.6 Summary
The fundamental objectives of power utilities are to maximize profit and minimize operational costs, which requires dealing with the common problem of losses. Such losses are categorized as technical losses and NTLs. The need to
minimize and reduce NTLs is critical for power utilities, as these losses
contribute to the cost of electricity, which is passed on to the utility customers.
Of all the solutions currently available, which include field investigations and Supervisory Control and Data Acquisition (SCADA) systems, the present approach uses customer behavior changes in load consumption as a means of indicating the fraud activities that contribute to NTLs. For this purpose, a fraud detection system similar to those implemented by electricity businesses and other businesses, such as credit card transactions and bank loan applications, is recommended for implementation by power utilities.
This chapter reviewed the background and literature relating to losses in power utilities, including technical losses and those due to NTL activities, with reference to the impact of NTL activities from an economic and a financial perspective. A comprehensive review of customers' load profile analyses in several countries was presented in this chapter. Also considered were the clustering techniques used in load profiling studies.
CHAPTER 3
3.0 Overview
This chapter presents the background and theoretical concepts of the artificial
intelligence (AI) techniques applied in this research study. The introduction
starts off by discussing the preliminaries of AI briefly, which include pattern
recognition and machine learning techniques, such as supervised learning and
unsupervised learning. Next, some popular AI techniques are briefly discussed,
which include: Expert System (ES), Fuzzy Logic (FL) and Artificial Neural
Networks (ANNs). In the sub chapter of SVM, the statistical learning theory is
presented, followed by the Structural Risk Minimization (SRM) principle. The
background and theoretical concepts of Support Vector Machines (SVMs) with
regards to Support Vector Classification (SVC) are discussed in detail where
derivations of the margin hyperplanes for linear and non-linear SVC are
presented, followed by the kernel methods. The last part of the chapter
presents the Sequential Minimal Optimization (SMO) algorithm used for
optimization of Quadratic Programming (QP) problems in SVMs.
The field of AI was founded on the claim that a central property of human
beings, i.e. intelligence, can be so precisely described that it can be simulated by
a machine [131]. AI first emerged in the early 1940s, when scientists began exploring new approaches to building intelligent machines, based on recent discoveries in neurology and cybernetics, the new mathematical theory of information and, most importantly, the invention of the digital computer [132-134]. After World War II, a number of people independently started to work on intelligent systems. The English mathematician Alan Turing was the first to give a lecture on AI, in 1947 [135], and decided that AI was best researched by programming computers rather than by building machines [136]. Officially, the field of AI research was founded at a conference at Dartmouth College in the summer of 1956 [132, 133, 137]. The researchers present at this conference later established AI laboratories at world-famous computer science research institutions, such as the Massachusetts Institute of Technology (MIT) and Stanford University in the United States.
By the late 1950s, there were many researchers working in the field of AI, and virtually all of their work was based on computer programming [136]. From the mid-1960s, AI research began to be funded by the U.S. Department of Defense, i.e., the Defense Advanced Research Projects Agency (DARPA). In the early 1980s, with the commercial success of expert systems and ANNs, the field of AI research reached a new horizon. By 1985, the market for AI had reached more than a billion dollars, and governments around the world funded AI research projects [132, 133, 137].
In the 1990s, AI achieved its greatest successes, when many industries replaced expensive human experts with mainstream computing systems that reduced the cost of business and employees' exposure to high risk [138]. Since then, many AI based techniques have been successfully transitioned from the research lab into real-world applications for pattern recognition, data mining, control systems and robotics [139]. In the early 21st century, many areas throughout the technology industry, such as defense, transportation, manufacturing and entertainment, commercialized applications based on AI, some of which include: face recognition, medical diagnosis of cancers and tumors, aircraft control, nuclear power systems, and intelligent systems used for optimization, monitoring, control, planning, scheduling and fault diagnosis.

In today's era, AI is already a part of human life in many countries, and it has grown into an important scientific and technological field in recent years. Currently, AI is helping people make better use of information, faster and more efficiently, in tasks that require detailed instructions, mental alertness and good decision-making capabilities. The future benefits of AI are indeed very promising, as it is currently being deployed in space exploration missions and is also being used to build robots with human-like skills and characteristics.
In pattern recognition, an object is described with a number of selected features and their values. An object can thus be described as a vector of features [144]:

$\mathbf{x} = (x_1, x_2, \ldots, x_p)^T$    (3.1)

where $p$ is the number of features or attributes.
The features or attributes together span a multi-variate space called the feature
space or measurement space. In pattern recognition, the classification task can
be seen as a two-class (binary) or multi-class problem. With respect to the focus of the present study, in a two-class problem an object is classified as belonging or not belonging to a particular class [144].
During the classification scheme, the learning algorithm takes the training data as input and selects a hypothesis from the hypothesis space that fits the data [144]. Machine learning algorithms include unsupervised learning and supervised learning; the availability or non-availability of labeled training samples determines which type of machine learning should be considered.
In supervised learning, it is necessary that the data used for training and testing
relates to the domain of interest. As an example, suppose that a bank wishes to
detect fraudulent credit card transactions. In order to accomplish this, some
basic knowledge regarding the domain of interest is required to identify factors
that are likely to be indicative of fraudulent credit card usage. These factors may
include: the frequency of usage, amount of transactions, spending patterns, the
type of business engaging in the transaction and so forth. These variables are
referred as features or independent variables. These independent variables
should be in some way related to the targets, or dependent variables [37], so
that the supervised learning can generate a model to map the input objects to
the desired outputs.
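To make this concrete, the short Python sketch below fits a classifier to a handful of invented transactions, assuming scikit-learn is available. The feature values, fraud labels and model choice are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: usage frequency, transaction amount, and a
# spending-pattern score. Labels: 1 = fraudulent, 0 = legitimate.
X = np.array([[12, 50.0, 0.1],
              [3, 900.0, 0.9],
              [10, 40.0, 0.2],
              [1, 1500.0, 0.8]])
y = np.array([0, 1, 0, 1])

# Supervised learning: generate a model mapping inputs x to targets y.
model = LogisticRegression().fit(X, y)
print(model.predict([[2, 1200.0, 0.85]]))  # classify an unseen transaction
```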
Formally, the training data consist of $n$ labeled samples

$S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$    (3.2)

where the features (inputs) are the independent variables $\mathbf{x}$ and the labels (targets) are used as the dependent variables $y$. Then, the relationship between $\mathbf{x}$ and $y$ is given by the joint probability density [37]:

$P(\mathbf{x}, y) = P(\mathbf{x})\,P(y|\mathbf{x})$    (3.3)

where $\mathbf{x}$ represents the independent variable representing the features, $y$ represents the dependent variable representing the targets, $P(\mathbf{x})$ is the prior probability, and $P(y|\mathbf{x})$ is the conditional probability of the target given the input. The goal of learning is to find the function that minimizes the expected risk [37]:

$R(\alpha) = \int L(y, f(\mathbf{x}, \alpha))\, dP(\mathbf{x}, y)$    (3.4)

where $L(y, f(\mathbf{x}, \alpha))$ is the loss between the target $y$ and the function $f(\mathbf{x}, \alpha)$, $f(\mathbf{x}, \alpha)$ predicts the targets from the inputs $\mathbf{x}$ and the model parameters $\alpha$, and $P(\mathbf{x}, y)$ is the joint probability distribution.
In binary SVM classification, the loss function $L(y, f(\mathbf{x}, \alpha))$ in eq. (3.4) can be rewritten as [152]:

$L(y, f(\mathbf{x}, \alpha)) = \begin{cases} 0 & \text{if } y = f(\mathbf{x}, \alpha) \\ 1 & \text{otherwise} \end{cases}$    (3.5)

Since the joint probability distribution $P(\mathbf{x}, y)$ of the inputs and targets is unknown, the expected risk cannot be computed directly; instead, it is approximated by the empirical risk over the $n$ training samples:

$R_{emp}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, f(\mathbf{x}_i, \alpha))$    (3.6)
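The 0-1 loss of eq. (3.5) and the empirical risk of eq. (3.6) translate directly into a few lines of Python:

```python
import numpy as np

def empirical_risk(y_true, y_pred):
    # R_emp = (1/n) * sum of 0-1 losses L(y_i, f(x_i)): each mismatch
    # contributes a loss of 1, each correct prediction a loss of 0.
    losses = (np.asarray(y_true) != np.asarray(y_pred)).astype(float)
    return losses.mean()

# One of four predictions is wrong, so the empirical risk is 0.25.
print(empirical_risk([+1, -1, +1, +1], [+1, -1, -1, +1]))
```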
The most widely used supervised learning classifiers include: ANNs, SVMs, Decision Trees, Gaussian Mixture Models, Discriminant Analysis (DA), Classification Trees, Naïve Bayes and Radial Basis Function (RBF) classifiers.
The advantage of supervised learning is that it provides confidence values for
the predictions that are important for this research study. However, the
disadvantage of supervised learning is that estimating the distributions can be
difficult and a full probabilistic model may not be required [37].
The only data that unsupervised learning methods use are the observed input patterns $\mathbf{x}$, which are often assumed to be independent samples from an underlying unknown probability distribution $P(\mathbf{x})$, together with some explicit or implicit a priori information as to what is important [153].
There are many unsupervised learning techniques, the most common of which is clustering. Other forms of unsupervised learning include Self-Organizing Maps (SOMs), Adaptive Resonance Theory (ART), Independent Component Analysis (ICA), Principal Component Analysis (PCA), and other methods such as association rules and collaborative filtering. The main advantage of unsupervised learning is that it requires no labeled training data.
Supervised learning selects the function using the training set $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$, such that the risk is minimized. In practice, the true distribution $P(\mathbf{x}, y)$ is unknown and eq. (3.4) cannot be evaluated. Instead, the empirical risk of eq. (3.6) is estimated based on the training set $S$. However, the minimizer of eq. (3.6) is not necessarily the minimizer of eq. (3.4) [37]. Trivially, the function that takes the values $f(\mathbf{x}_i) = y_i$ on the training set and is random elsewhere has zero empirical risk, which clearly does not generalize. Thus, minimizing the empirical error does not necessarily lead to a good hypothesis [152]. This phenomenon is referred to as overfitting, where the learned hypothesis has fitted both the underlying data generating process and the noise in the training set [37, 154, 155].
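A small synthetic sketch of this phenomenon: fitting a high-degree polynomial to a few noisy samples drives the empirical (training) error toward zero, while the error on fresh samples from the same process grows. All values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (3, 9):
    # Fit a polynomial hypothesis of the given capacity to the noisy data.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Degree 9 interpolates the 10 training points (near-zero training
    # error) yet generalizes worse than degree 3 on the test samples.
    print(degree, round(train_mse, 4), round(test_mse, 4))
```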
3.1.2 AI Techniques
AI has produced a number of tools and techniques since its emergence as a
discipline in the mid 1950s. These techniques are of great practical significance
in engineering to solve various complex problems normally requiring human
intelligence [156]. After years of steady progress AI tools have evolved into
modern techniques, such as: Expert System (ES), Fuzzy Logic (FL), Artificial
Neural Networks (ANNs), Support Vector Machine (SVM), and the most recently
developed Extreme Learning Machine (ELM). All these techniques are being
widely applied in a growing number of applications [37]. The following sub sections briefly discuss the basic concepts of ES, FL and ANNs. The theoretical concepts of ELM are discussed in Chapter 5, while SVMs are discussed later in this chapter.
In Boolean logic, variables may have a membership value of only 0 or 1 [37]. The notion central to fuzzy systems is that truth values (in fuzzy logic) or membership values (in fuzzy sets) can take any value between 0 and 1 inclusive, and are not constrained to the two truth values {true (1), false (0)} of classic predicate logic [158]. Conventional methodologies and theories implementing crisp definitions, such as classical set theory, arithmetic, and programming, can be fuzzified by generalizing the concept of a crisp set to a fuzzy set with blurred boundaries [159]. In addition, logical operations on fuzzy
sets are generalizations of conventional Boolean algebra. Similar to Boolean
logic, fuzzy logic has three basic operations: (i) intersection, (ii) union, and (iii)
complement [158].
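These three operations are commonly realized element-wise as minimum (intersection), maximum (union) and complement; a minimal Python sketch with arbitrary membership grades:

```python
import numpy as np

# Membership values of two fuzzy sets A and B over the same universe;
# the grades here are arbitrary illustrative values in [0, 1].
mu_a = np.array([0.2, 0.7, 1.0, 0.4])
mu_b = np.array([0.5, 0.3, 0.8, 0.9])

intersection = np.minimum(mu_a, mu_b)  # A AND B (min operator)
union = np.maximum(mu_a, mu_b)         # A OR B (max operator)
complement_a = 1.0 - mu_a              # NOT A

print(intersection, union, complement_a)
```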
Fuzzy logic systems are based on a set of rules. These rules allow the input to be fuzzy, i.e., closer to the natural way that humans express knowledge [160]. Linguistic variables are also a critical aspect of FL applications, where general terms such as "large", "medium" and "small", or "hot" and "cold", are each used to capture a range of numerical values, which might be context dependent [161].
Fuzzy logic has been used in various applications such as, automobile and
vehicle subsystems, air conditioners, cameras, image processing, elevators,
washing machines, rice cookers, dishwashers, video games, speech recognition,
and pattern recognition [159].
ANNs solve problems by learning a mathematical model for the problem [37]. In addition, ANNs can readily handle both continuous and discrete data and have good generalization capability, as with fuzzy expert systems [163].
ANNs map the inputs to outputs via weights during a training process by means
of connection and parallel distributed processing. The learned weights are used
to predict corresponding outputs for given inputs [164]. In a basic multi-layer
ANN structure, as shown in Figure 3.1, the input layer of the artificial neurons
receives information from the environment and the output layer communicates
the response. Between the input layer and the output layer of an ANN lie the
hidden layers [37]. The number of hidden layers in an ANN is variable and
may be one or more than one, depending upon the problem to be solved [155].
The hidden layers have no direct contact with the environment, and these layers
are where most of the information processing takes place in an ANN. The output
of an ANN depends on the weights of the connections between neurons in
different layers. Each weight indicates the relative importance of a particular
connection. If the total sum of all the weighted inputs received by a particular
neuron surpasses a certain threshold value, the receiving neuron will send a
signal to each neuron to which it is connected in the next layer [37]. This
standard procedure is followed by all neurons within the network, in order to
form the network output.
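The forward pass just described, weighted sums flowing through an activation function from layer to layer, can be sketched in a few lines of Python; the network shape and weights below are arbitrary illustrative values, not trained ones.

```python
import numpy as np

def sigmoid(z):
    # A common smooth activation standing in for the threshold behavior.
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.8])      # network inputs (3 features)
W1 = np.array([[0.2, -0.4, 0.1],    # input -> hidden weights (2 neurons)
               [0.7, 0.3, -0.5]])
W2 = np.array([[0.6, -0.9]])        # hidden -> output weights (1 neuron)

hidden = sigmoid(W1 @ x)   # weighted sums at the hidden layer + activation
output = sigmoid(W2 @ hidden)  # the network output
print(output)
```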
[Figure 3.1: A basic multi-layer ANN structure: network inputs enter the input layer, pass through weighted connections across hidden layers 1 and 2, and reach the output layer that produces the network outputs]
ANNs have proven to be successful in many general problem areas such as:
function approximation, regression analysis, prediction, classification, pattern
recognition, optimization, conceptualization, data processing, filtering and
clustering [37]. They have been employed in a number of applications, some of
which include: vehicle control, game-playing, decision making, radar systems,
face identification, object recognition, speech recognition, text recognition,
medical diagnosis, financial applications, data mining, visualization and spam
filtering.
The advantage of ANNs lies in their resilience against distortions in the input
data and their capability of learning. ANNs are good at solving problems that are
too complex for humans or conventional technologies [161]. However, the main disadvantage of ANNs is the problem of overfitting and underfitting of data.
SVMs were initially developed to solve classification problems and initial work
was focused on optical character recognition (OCR) applications [172, 173].
Some recent applications and extensions of SVMs include: isolated handwritten
digit recognition [174], object recognition [175], speaker identification [176],
face detection [28, 177], text categorization [29, 33] and bioinformatics and
data mining [178]. All of the above cases are examples of SVMs used for classification. In addition, SVMs have also been proposed and applied to a number of different types of problems, including regression estimation, novelty detection, density estimation and the solution of inverse problems. However, in this thesis only SVM classification is discussed, since the focus of the present study is classification.
The sections that follow give a brief description of the basic concepts of
statistical learning theory and the SRM principle, followed by an introduction to
Support Vector Classification (SVC), including the theoretical concepts and its
implementation. For additional material and a more detailed description of
SVMs, one can refer to the works of V. Vapnik [152, 169], C. Burges [179], B. Schölkopf [173, 180] and A. Smola [181, 182].
examples, such that the learned function f_α will correctly classify unseen samples (x, y). The Vapnik-Chervonenkis (VC) theory, which forms the core of statistical learning theory, shows that the set of functions F from which f_α is chosen must be restricted to one that has a capacity (VC dimension) h suitable for the amount of available training data. In order to do so, the VC theory introduces bounds on the expected risk in eq. (3.4) that depend on the empirical risk in eq. (3.6) and on the capacity of F. The minimization of these bounds leads to the SRM principle. According to this principle, for all the functions in F (i.e., for any value of α) and ℓ training samples, with a probability of at least 1 - η (0 ≤ η ≤ 1), the following bound holds:

$R(\alpha) \leq R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln(2\ell/h) + 1\right) - \ln(\eta/4)}{\ell}}$   (3.7)

where h is the VC dimension of the set of functions F and ℓ is the number of training samples.
The right hand side of eq. (3.7) is a bound on the expected risk R(α) and it holds only with a certain probability. The VC confidence term does not depend on the underlying distribution P(x, y), and if h is known, it can be easily computed. Conversely, the left hand side of eq. (3.7) is difficult to compute. Thus, given some selection of learning machines whose empirical risk is zero, and choosing a fixed, sufficiently small value of η, one must choose the learning machine whose associated set of functions has the minimal VC dimension.
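As a concrete illustration of how the VC confidence behaves, the short Python sketch below evaluates the second term of eq. (3.7) for a range of VC dimensions at a fixed number of training samples; the function name and the sample values are assumptions made for illustration only:

import math

def vc_confidence(h, l, eta=0.05):
    """VC confidence term of eq. (3.7): sqrt((h*(ln(2l/h) + 1) - ln(eta/4)) / l)."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# The confidence term grows with the VC dimension h and shrinks
# with the number of training samples l.
for h in (10, 100, 1000):
    print(h, round(vc_confidence(h, l=10000), 4))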
To summarize, given a fixed number of training samples, one can control the expected risk by controlling both the empirical risk R_emp(α) and the VC dimension h. The VC confidence term in eq. (3.7) depends on the chosen class of functions F, whereas both the empirical risk in eq. (3.6) and the expected risk in eq. (3.4) depend on the particular function chosen by the training procedure, i.e., on the parameters α. The right choice of α for controlling the empirical risk in eq. (3.6) is made by the training algorithm. In order to control the VC confidence (i.e., to choose the learning machine), one must find the subset of the chosen set of functions for which the risk bound is minimized. To do so, a structure is introduced, dividing the entire class of functions into nested subsets:
$F_1 \subset F_2 \subset \cdots \subset F_n$   (3.8)

with the corresponding VC dimensions satisfying:

$h_1 \leq h_2 \leq \cdots \leq h_n$   (3.9)
For each subset, the VC dimension h, or a bound on h itself, must be computed. The SRM principle then consists of finding that subset of functions which minimizes the bound on the expected risk. This can be done by training a series of machines, one for each subset, where for a given subset the goal of training is simply to minimize the empirical risk. The subset of functions which minimizes the bound on the expected risk is that of the trained machine in the series whose sum of empirical risk and VC confidence is minimal. For more details about the bound, the reader can refer to [152].
Consider the problem of separating a set of training samples belonging to two classes:

$D = \{(\mathbf{x}_i, y_i)\ |\ \mathbf{x}_i \in \mathbb{R}^d,\ y_i \in \{-1, +1\}\},\qquad i = 1, \ldots, \ell$   (3.10)

A separating hyperplane for this set satisfies:

$\mathbf{w} \cdot \mathbf{x} + b = 0$   (3.11)
Figure 3.2: Optimal margin hyperplane for the separable case of SVC
This learning algorithm was proposed for separable problems and is based on the fact that, among all possible separating hyperplanes, there exists a unique hyperplane with maximum margin of separation from the classes, i.e., an optimal margin hyperplane (OMH), and the capacity decreases with increasing margin, as shown in Figure 3.2. The sections below present the linearly separable case, followed by the non-linearly separable case, the implementation of the SVC, and the basics of the support vector algorithm in detail.
Finding the optimal margin hyperplane also involves finding two hyperplanes parallel to it. Suppose H1 and H2 are these hyperplanes, as shown in Figure 3.3, with equal distances to the separating hyperplane and with the condition that there are no data points between them. All the training data then satisfy the following constraints:

$\mathbf{w} \cdot \mathbf{x}_i + b \geq +1,\quad \text{for } y_i = +1$   (3.12)

$\mathbf{w} \cdot \mathbf{x}_i + b \leq -1,\quad \text{for } y_i = -1$   (3.13)

where w is the normal to the hyperplane and the perpendicular distance from the separating hyperplane to the origin is:

$\frac{|b|}{\|\mathbf{w}\|}$   (3.14)

The points for which the equality in eq. (3.12) holds lie on the hyperplane H1: w · x_i + b = +1, with normal w and perpendicular distance from the origin |1 - b| / ||w||. Similarly, the points for which the equality in eq. (3.13) holds lie on the hyperplane H2: w · x_i + b = -1, with perpendicular distance from the origin |-1 - b| / ||w||. The maximum margin (the distance from the separating hyperplane to the closest sample) is therefore 2 / ||w||, and the pair of hyperplanes that gives the maximum margin can be found by minimizing ||w||^2.
Figure 3.3: Linear separating hyperplanes for the separable case of SVC
outlining the support vectors (maximum margin approach)
The points that lie on the hyperplanes H1 and H2, and whose removal would change the solution found, are known as support vectors (SVs). So, in order to find the pair of hyperplanes that gives the maximum margin, ||w||^2 is minimized subject to the constraints in eqs. (3.12) and (3.13), which can be combined into a single set of inequalities:

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0,\quad \forall i$   (3.15)

To solve this constrained optimization problem, non-negative Lagrange multipliers α_i ≥ 0 are introduced, one for each of the inequality constraints in eq. (3.15). The constraint equations are multiplied by the Lagrange multipliers and subtracted from the objective function, to form the primal formulation of the Lagrangian11:

$L_P = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{\ell} \alpha_i y_i (\mathbf{w} \cdot \mathbf{x}_i + b) + \sum_{i=1}^{\ell} \alpha_i$   (3.16)

In eq. (3.16), L_P must be minimized with respect to w and b, and simultaneously the derivatives of L_P with respect to all the Lagrange multipliers α_i must vanish, all subject to the constraints α_i ≥ 0. For more details on Lagrange multipliers and other optimization methods, one can refer to [151, 154, 185].
Since this is a convex quadratic programming (QP) problem, the dual formulation can be used instead [185], which maximizes L_P with respect to the α_i, subject to the constraints that the gradient of L_P with respect to w and b vanishes and that α_i ≥ 0. This gives the following two conditions:

$\mathbf{w} = \sum_{i=1}^{\ell} \alpha_i y_i \mathbf{x}_i$   (3.17)

$\sum_{i=1}^{\ell} \alpha_i y_i = 0$   (3.18)

Substituting these constraints into eq. (3.16) gives the dual formulation of the Lagrangian:

$L_D = \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)$   (3.19)
11 The Lagrange function is formed by subtracting the sum of all products between constraints and corresponding Lagrange multipliers from the primal objective function (the function that is to be minimized) [185].
Support vector training for the linearly separable case therefore amounts to maximizing L_D with respect to the α_i, subject to the linear equality constraint in eq. (3.18) and the positivity of the α_i, with the solution given by eq. (3.17). There is one Lagrange multiplier α_i for every training point, and in the solution, those points for which α_i > 0 are the support vectors (SVs). All other training points have α_i = 0. For SVMs, the SVs are the critical elements of the training set because they lie closest to the decision boundary. Therefore, a new object x can be classified using:

$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{N_S} \alpha_i y_i\, (\mathbf{x}_i \cdot \mathbf{x}) + b$

$f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{N_S} \alpha_i y_i\, (\mathbf{x}_i \cdot \mathbf{x}) + b\right)$   (3.20)
In both the objective function L_D and the solution, the training vectors x_i occur only in the form of dot products, which is one of the reasons for using the Lagrangian formulation; the problem is a convex QP problem with no local extrema. Since both L_P and L_D arise from the same objective function, but with different constraints, the solution can be obtained either by minimizing the Lagrange function L_P with respect to the primal variables or by maximizing L_D with respect to the Lagrange multipliers, i.e., the dual variables [186]. In practical terms, this convex QP problem can be solved quickly using quadratic optimization methods such as Sequential Minimal Optimization (SMO) [187, 188], and a large part of the computational efficiency of the SVC formulation stems from this optimization.
For the non-separable case, the condition that there are no data points between the two hyperplanes H1 and H2 is not strictly enforced; the constraints in eq. (3.12) and eq. (3.13) are relaxed, but only when necessary. Hence, a further cost, i.e., an increase in the primal objective function in eq. (3.16), is introduced to penalize the data points that cross the hyperplane boundaries.

Figure 3.4: Linear separating hyperplanes for the non-separable case of SVC where the slack variable permits margin failures (soft margin approach)

To relax the constraints, non-negative slack variables ξ_i are introduced into the constraints [174]. The constraints in eq. (3.12) and eq. (3.13) then become:

$\mathbf{w} \cdot \mathbf{x}_i + b \geq +1 - \xi_i,\quad \text{for } y_i = +1$   (3.21)

$\mathbf{w} \cdot \mathbf{x}_i + b \leq -1 + \xi_i,\quad \text{for } y_i = -1$   (3.22)

$\xi_i \geq 0,\quad \forall i$   (3.23)

From eqs. (3.21) and (3.22) it follows that for an error to occur, the corresponding ξ_i must exceed unity, so Σ_i ξ_i is an upper bound on the number of training errors.
In order to assign an extra cost to errors, the objective function in eq. (3.16) is modified to include a penalizing term, $\frac{1}{2}\|\mathbf{w}\|^2 + C(\sum_i \xi_i)^m$, instead of just the term $\frac{1}{2}\|\mathbf{w}\|^2$. For any positive integer m this is a programming problem; for m = 1 it is also a convex QP problem, with the additional advantage that neither the ξ_i nor their Lagrange multipliers appear in the Wolfe dual problem. The penalty parameter C is a regularization parameter that trades off a wide margin against a small number of margin failures (soft margins). This parameter is finite (setting C = ∞ leads back to the original perfectly separating SVC). The dual problem is then to maximize L_D in eq. (3.19) subject to:

$0 \leq \alpha_i \leq C,\qquad \sum_{i=1}^{\ell} \alpha_i y_i = 0$   (3.24)

Thus, the only difference from the perfectly separable case, i.e., the linear case in section 3.2.2.1, is that the Lagrange multipliers α_i are now bounded above by C instead of ∞. The solution is again given by:

$\mathbf{w} = \sum_{i=1}^{N_S} \alpha_i y_i \mathbf{x}_i$   (3.25)

where N_S is the number of support vectors.
The Karush-Kuhn-Tucker (KKT) conditions for the primal problem of the separable case are:

$\frac{\partial L_P}{\partial w_\nu} = w_\nu - \sum_{i=1}^{\ell} \alpha_i y_i x_{i\nu} = 0,\quad \nu = 1, 2, \ldots, d$   (3.26)

$\frac{\partial L_P}{\partial b} = -\sum_{i=1}^{\ell} \alpha_i y_i = 0$   (3.27)

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \geq 0,\quad i = 1, 2, \ldots, \ell$   (3.28)

$\alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0$   (3.29)

$\alpha_i \geq 0$   (3.30)

where i = 1, 2, …, ℓ and ν = 1, 2, …, d. The parameter d represents the dimension of the data. Eqs. (3.26) to (3.30) are satisfied at the solution of any convex constrained optimization problem. Thus, finding a solution to the KKT conditions is equivalent to solving the SVM problem. The threshold b is found by using the KKT complementarity condition in eq. (3.29), by choosing any i for which α_i ≠ 0. However, it is numerically safer to take the mean value of b resulting from all such equations [179]. The KKT conditions for the primal problem are also used in the non-separable case, for which the primal Lagrangian becomes:

$L_P = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i - \sum_i \alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i$   (3.31)

where the μ_i are the Lagrange multipliers introduced to enforce the positivity of the slack variables ξ_i. The KKT conditions for the primal problem are therefore:

$\frac{\partial L_P}{\partial w_\nu} = w_\nu - \sum_i \alpha_i y_i x_{i\nu} = 0$   (3.32)

$\frac{\partial L_P}{\partial b} = -\sum_i \alpha_i y_i = 0$   (3.33)

$\frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \mu_i = 0$   (3.34)

$\alpha_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right] = 0$   (3.35)

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \geq 0$   (3.36)

$\alpha_i,\ \mu_i,\ \xi_i \geq 0$   (3.37)

$\mu_i \xi_i = 0$   (3.38)
where i = 1, 2, …, ℓ and ν = 1, 2, …, d. Once again, the KKT complementarity conditions in eqs. (3.35) and (3.38) can be used to determine the threshold b. It is observed that eq. (3.34) combined with eq. (3.38) shows that ξ_i = 0 if α_i < C, since then μ_i = C - α_i ≠ 0. Thus, any training point for which 0 < α_i < C, that is, a data point that is not penalised (it does not cross the boundary), can be taken to compute b. As before, the average of b over all such training points is used. Eqs. (3.35) and (3.36) show that a data point can either lie behind H1 and H2:

$\alpha_i = 0$   (3.39)

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i > 0$   (3.40)

and not participate in the derivation of the separating function, or lie on the planes H1 and H2, or cross the boundaries with α_i = C and ξ_i > 0:

$\alpha_i > 0$   (3.41)

$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i = 0$   (3.42)
In the case of linearly inseparable data, SVMs map the data into another, higher-dimensional space, known as the feature space, such that the mapped data points become linearly separable. This can be done easily, since the only way in which the data appear in the training problem is in the form of dot products. The mapping is denoted by:

$\Phi: \mathbb{R}^d \rightarrow \mathbb{R}^D$   (3.43)

where d is the dimensionality of the input space, and D is the dimensionality of the feature space.

Figure 3.5: Non-linear separating hyperplanes for the non-separable case of SVC

The training algorithm then depends on the data only through dot products in the feature space, of the form:

$\Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$   (3.44)
The dot product in the high-dimensional feature space [175] is then equivalent to a kernel function evaluated in the input space, that is:

$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$   (3.45)

so that the decision function becomes:

$f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{N_S} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$   (3.46)

According to Mercer's condition, a kernel corresponds to a dot product in some feature space, i.e., there exists an expansion:

$K(\mathbf{x}, \mathbf{y}) = \sum_i \Phi_i(\mathbf{x})\, \Phi_i(\mathbf{y})$   (3.47)

if and only if, for any function g(x) such that:

$\int g(\mathbf{x})^2\, d\mathbf{x}$   (3.48)

is finite, the following holds:

$\int K(\mathbf{x}, \mathbf{y})\, g(\mathbf{x})\, g(\mathbf{y})\, d\mathbf{x}\, d\mathbf{y} \geq 0$   (3.49)
The most commonly used kernel functions, together with their free parameters, are listed below:

Kernel Function                                                              Parameters
RBF (Gaussian):  $K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2}$    γ
Sigmoid:         $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\kappa\, \mathbf{x}_i \cdot \mathbf{x}_j - \delta)$    κ, δ
Polynomial:      $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^p$    p

The Radial Basis Function (RBF) or Gaussian kernel is the most widely used kernel. An advantage of the RBF kernel is that it adds only a single free parameter, γ > 0, which controls the width of the RBF kernel as γ = 1/(2σ²), where σ² is the variance of the resulting Gaussian. The RBF kernel has been shown to perform well in a wide variety of practical applications, such as in [192-194].
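A short Python sketch of the RBF kernel computation is given below; the function name is an illustrative assumption:

import math

def rbf_kernel(x, y, gamma=0.92):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([0.1, 0.4], [0.1, 0.4]))  # 1.0 for identical points
print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))  # smaller for distant points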
This research study only focuses on SMO, as other optimization methods are
slower [196] on large amounts of data, and also because the library used for
implementing SVC in this research study utilizes the SMO algorithm. The
following section gives a brief description of the SMO algorithm and its
implementation for SVC.
At every step of SMO, two Lagrange multipliers are selected for optimization
and after their optimal values are found given that all the other multipliers are
fixed, the SVC is updated accordingly. In SMO, the two training samples are
selected using a heuristic method, and then the two Lagrange multipliers are
solved analytically. These are the two main components of the SMO algorithm.
The main advantage of SMO is that because it uses only two training samples at
every step and avoids the computation of a kernel matrix [197], SMO requires a
smaller amount of memory and can handle very large training sets compared to
other optimization techniques [187, 188] such as the chunking algorithm
[169].
In order not to violate the bound constraints of the problem in eq. (3.24), the Lagrange multipliers must lie within a box defined by 0 ≤ α_i ≤ C, while for the linear equality constraint of the problem to hold, the multipliers must lie on a line:

$\alpha_1 y_1 + \alpha_2 y_2 = \text{constant}$   (3.50)

Thus, the constrained minimum of the objective function must lie on a diagonal line segment, as shown in Figure 3.6. The SMO algorithm first computes the second Lagrange multiplier α_2 and then uses it to obtain the first Lagrange multiplier α_1. Using the constraints of the dual problem in eq. (3.24), the following bounds apply to α_2:

$L \leq \alpha_2^{new} \leq H$   (3.51)

where, if y_1 ≠ y_2:

$L = \max(0,\ \alpha_2 - \alpha_1),\qquad H = \min(C,\ C + \alpha_2 - \alpha_1)$   (3.52)

and if y_1 = y_2:

$L = \max(0,\ \alpha_1 + \alpha_2 - C),\qquad H = \min(C,\ \alpha_1 + \alpha_2)$   (3.53)
Figure 3.6: Inequality constraints causing the Lagrange multipliers to lie within
a box and the linear equality constraint causing the Lagrange multipliers to lie
on a diagonal line [187]
The difference E_i (i = 1, 2) between the output of the current hypothesis f(x_i) and the target value y_i is computed using:

$E_i = f(\mathbf{x}_i) - y_i = \sum_{j=1}^{\ell} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_i) + b - y_i,\qquad i = 1, 2$   (3.54)

where E_i is the error on the i-th training sample, stored in the error cache E.
The second derivative of the objective function along the diagonal line can be expressed as:

$\eta = K(\mathbf{x}_1, \mathbf{x}_1) + K(\mathbf{x}_2, \mathbf{x}_2) - 2K(\mathbf{x}_1, \mathbf{x}_2) = \|\Phi(\mathbf{x}_1) - \Phi(\mathbf{x}_2)\|^2$   (3.55)

Given that the objective function is positive definite, there is a minimum along the direction of the linear constraint in eq. (3.24) and η > 0. The constrained optimum is then obtained by updating the second multiplier:

$\alpha_2^{new} = \alpha_2 + \frac{y_2(E_1 - E_2)}{\eta}$   (3.56)

and by clipping α_2^{new} to the ends of the line segment (i.e., L ≤ α_2^{new} ≤ H):

$\alpha_2^{new,clipped} = \begin{cases} H, & \text{if } \alpha_2^{new} \geq H \\ \alpha_2^{new}, & \text{if } L < \alpha_2^{new} < H \\ L, & \text{if } \alpha_2^{new} \leq L \end{cases}$   (3.57)

The value of the first multiplier α_1^{new} is then computed from the new, clipped multiplier α_2^{new,clipped} as follows:

$\alpha_1^{new} = \alpha_1 + y_1 y_2 \left( \alpha_2 - \alpha_2^{new,clipped} \right)$   (3.58)
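To make the analytic update concrete, the following Python sketch implements one SMO pair-update step exactly as given in eqs. (3.50) to (3.58); it is an illustrative sketch only, and the function and variable names are assumptions rather than part of the LIBSVM implementation:

def smo_step(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One analytic SMO update for a pair of multipliers (eqs. 3.51 to 3.58).
    Returns the new (a1, a2), or the old values if no progress is possible."""
    # Bounds on alpha_2 from eqs. (3.52) and (3.53)
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    if L == H:
        return a1, a2
    # Second derivative along the constraint line, eq. (3.55)
    eta = K11 + K22 - 2.0 * K12
    if eta <= 0:
        return a1, a2  # degenerate case; the full SMO handles this separately
    # Unconstrained optimum, eq. (3.56), clipped to [L, H], eq. (3.57)
    a2_new = min(H, max(L, a2 + y2 * (E1 - E2) / eta))
    # First multiplier from the equality constraint, eq. (3.58)
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new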
The algorithm has an outer loop that iterates over the entire training set, looking for samples that violate the KKT conditions; such samples are eligible for optimization. The first choice heuristic is used for selecting the first sample (x_1, y_1) and concentrates on the non-bound samples, i.e., those whose Lagrange multipliers are neither 0 nor C, since these samples are the most likely to violate the KKT conditions. The outer loop makes repeated passes over the non-bound subset until all of the non-bound samples satisfy the KKT conditions. SMO then iterates again over the entire training set to search for any bound samples that may now violate the KKT conditions due to the optimization of the non-bound subset. Once the entire training set obeys the KKT conditions, the algorithm terminates.
The second Lagrange multiplier is chosen so that the size of the step taken during the joint optimization is maximized, resulting in a large increase of the dual objective. Because computation of the kernel function is time consuming, SMO approximates the step size by the quantity |E_1 - E_2| and selects the second sample with the maximum error difference. To reduce the computation further, the algorithm also keeps a cached list of errors E_i for every non-bound sample in the training set. In the case where SMO makes no significant progress (i.e., there is no significant increase in the dual objective), the algorithm iterates over all the non-bound samples12. If this fails too, SMO looks once more for a suitable sample through the entire training set12.
12
Starting from a random location in order to avoid a bias towards the samples at the beginning
of the training set.
[Procedure: the SMO pair-optimization routine, which computes the bounds L and H, evaluates η from eq. (3.55), updates the second multiplier α_2^{new} using eq. (3.56), clips it to α_2^{new,clipped} using eq. (3.57), and computes α_1^{new} using eq. (3.58)]
3.3 Summary
This chapter reviewed the background and theoretical concepts of the AI techniques applied in this research study. The introduction briefly discussed the preliminaries of AI, including pattern recognition and machine learning techniques. In the next section, popular AI techniques, such as Expert Systems (ES), Fuzzy Logic (FL) and Artificial Neural Networks (ANNs), were briefly discussed. In the sub chapter on SVMs, the statistical learning theory was presented, followed by the SRM principle, both of which led to the introduction of the SVM, namely the SVC. The background and theoretical concepts of SVC were then discussed in detail, where derivations of the margin hyperplanes for linear and non-linear SVC were presented, followed by the kernel methods. In the last part of the SVM sub chapter, the SMO algorithm was presented.
From the extensive review of SVM concepts in this chapter, it is observed that SVC has a notable number of advantages compared to standard neural networks. Firstly, SVC has non-linear dividing hypersurfaces that give it high discrimination power. Secondly, it provides good generalization ability for the classification of unseen data. Lastly, SVC determines its model structure (the support vectors) automatically, without requiring the network architecture to be designed by hand. The recent success of SVMs in various real-world applications, such as face identification [28], text categorization [29] and bioinformatics [30], provides additional motivation for this research study. There is a widespread body of research papers and literature indicating that the classification accuracy of SVMs outperforms other traditional classification methods, such as ANNs [31, 32]. A comparison of SVM classification results with other techniques, as reported by T. Joachims [33] for text categorization, is presented in Table 1.1, which indicates that SVMs are able to achieve a higher classification accuracy than other techniques.
CHAPTER 4
MODEL DEVELOPMENT
4.0 Overview
This chapter provides the methodology proposed for the fraud detection framework and implements the associated key algorithms to be used for NTL detection. The first sub chapter introduces the general project and research methodologies. Three major stages are involved in the development of the fraud detection system: (i) data preprocessing, (ii) classification engine development, and (iii) data postprocessing. The data preprocessing sub chapter illustrates the data mining techniques used for preprocessing raw customer information and billing data for feature selection and feature extraction. The classification engine development sub chapter illustrates the SVC training, parameter optimization, development of the SVC classifier, and the SVC testing and validation engine. The last sub chapter, data postprocessing, describes the development of a Fuzzy Inference System (FIS), the creation of fuzzy rules and the formation of membership functions (MFs) for the selection of suspicious customers.
[Figure 4.1: Flowchart of the overall project methodology: start of project, literature review, data identification, data collection, data cleaning, preliminary testing and model analysis, data processing and model fitting, model improvement for larger samples, development of user interface and system integration, testing and validation, and end of project]
[Flowchart: start of research, data preprocessing with feature selection and extraction, SVM training and SVM parameter tuning, classification using the model (classifier), data postprocessing using the FIS, and output of the list of suspicious customers]
Figure 4.2: Flowchart of the proposed framework for detection of NTL activities
In this research study, the NTL detection framework proposed in Figure 4.2 uses
historical customer billing data of TNB customers and transforms the data into
the required format for the SVM, by data preprocessing and feature extraction.
TNB customers are represented by their consumption profiles over a period of
time. These profiles are characterized by means of patterns which significantly represent the customers' general behavior, making it possible to evaluate the similarity between each customer's profile and known consumption patterns. This creates a global similarity measure between normal and fraud customers as a whole.
The identification, detection and prediction are undertaken by the SVC, which is
the intelligent classification engine. With the help of the SVC results correlated
with the customer data, a FIS is employed to shortlist suspicious customers
from the testing data. This list of suspicious customers is used by TNBD SEAL
teams to strategize their NTL inspection activities.
The fraud detection system developed in this project forms the basis of this research study. The Graphical User Interface (GUI) software developed in this project uses Microsoft Visual Basic 6.0. SVC training and testing are implemented using LIBSVM v2.86 [199], a library for SVMs. The computer used for training the SVC model was a Dell PowerEdge 840 workstation running Windows XP, with a 2.40 GHz Intel Quad-core Xeon X3320 processor and 4 GB of RAM. The following sections further discuss the details of the processes outlined in the fraud detection framework in Figure 4.2.
The data collection for this research study was performed in two phases. In the first data collection phase, historical customer billing and consumption data was
collected for training, i.e., to make the intelligent system learn, memorize and
differentiate between normal and suspicious consumption patterns. In the
second data collection phase, similar data was collected as in the first stage, but
this data was used for testing the fraud detection system, i.e., to detect and
identify suspicious TNB customers. The data used for training the SVC was
collected from the Kuala Lumpur (KL) Barat region, in the state of Selangor in
Malaysia, while the data used for testing the system was acquired from three
cities in the state of Kelantan in Malaysia, namely: Kota Bharu, Kuala Krai and
Gua Musang. Table 4.1 illustrates the statistics of the data collected from TNBD.
Table 4.1: Customer data collected from TNBD for training and testing

Data      TNBD Station              No. of Customers    Fraud Cases
Training  Kuala Lumpur (KL) Barat   265,870             1171
Testing   Kota Bharu                76,595              101
Testing   Kuala Krai                18,880              37
Testing   Gua Musang                13,045              -
The data acquired from the TNB stations listed in Table 4.1 covers a period of 25 months, i.e., from May 2006 to May 2008. The amount of data was limited to 25 months because TNBD system administrators faced problems retrieving archived data older than 25 months from their customer billing database. The archived data stored in TNBD's customer database consists of customer billing records for the previous 10 years. The two years of customer consumption data acquired for this research study therefore contributes only about 20% of the entire customer consumption history from the 10-year billing period. This is the major limitation faced by this research study, since the data collected covers only a small portion of the customers' consumption history.
Data for all TNB stations listed in Table 4.1 was obtained in the Microsoft Office Access database format. Two types of customer data were collected: (i) Enhanced Customer Information Billing System (e-CIBS) data, and (ii) HighRisk data.

The monthly e-CIBS data is arranged into 14 data columns, as shown in Figure 4.3. Table 4.2 describes the customer billing and consumption information listed in the monthly e-CIBS data.
Table 4.2: Data columns in the monthly e-CIBS data

1. Station No.
2. Customer No.
3. Reading Unit
4. Customer Name
5. Customer Class
6. Tariff Code
7. TOE
8. CWR
9. HRC
10. Reading Date
11. Reading Type
12. Consumption
13. IR
The HighRisk data lists customers with fraud cases previously identified by the SEAL teams. The HighRisk data was additionally requested from TNBD in order to aid this research study and the development of this project. The HighRisk data is arranged into 3 data columns, as shown in Figure 4.4. Table 4.3 indicates the customer information listed in the HighRisk data.
Table 4.3: Data columns in the HighRisk data

1. Station No.
2. Customer No.
3. Detection Date
[Flowchart: start of data preprocessing, historical 25-month customer data, customer filtering and selection, consumption transformation, feature selection and extraction, feature normalization, feature adjustment, and the resulting feature file (training and testing data); end of data preprocessing]
Figure 4.5: Flowchart of the proposed framework for e-CIBS data preprocessing
As shown in Figure 4.5, six major steps are involved in preprocessing the e-CIBS
data, which are as follows:
1. Customer Filtering and Selection
2. Consumption Transformation
3. Feature Selection and Extraction
4. Feature Normalization
5. Feature Adjustment
6. Feature File
The following sections discuss in detail the six steps involved in e-CIBS data preprocessing.
After customer filtering and selection from the KL Barat station data, only 186,968 customer records remained from the initial population of 265,870 customers. Even though approximately 30% of the customers were removed by the filtering, a sufficiently large and representative population remained for the study.
Figure 4.6: Common customer records for the KL Barat station after customer
filtering and selection
The number of normal consecutive consumption readings may vary through the entire load profile for each customer. This depends entirely on the meter readers and whether they are able to collect the meter readings. If they are not able to collect the actual meter readings from the customer premises, then they estimate the current month's consumption based on the previous month's consumption of the customer.
Figure 4.8: The e-CIBS monthly consumption transformed into the normal
monthly kWh consumption
Figure 4.8 shows the e-CIBS monthly consumption values of Figure 4.7 transformed into normal kWh consumption values using statistical averaging. The data columns R1 through R24 in Figures 4.7 and 4.8 represent the month number, the rows indicate different customers, and the data value in every cell is in the format Reading_Type:Monthly_Consumption. After transformation, as indicated in Figure 4.8, all estimated (E) consumption values are converted into their respective normal (N) consumption values.
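The exact averaging rule is not reproduced here, but a minimal Python sketch of one plausible implementation is shown below, assuming that each run of estimated (E) readings and the actual (N) reading that closes it are replaced by their common average, spreading the billed energy evenly; the function name and data layout are illustrative assumptions:

def normalize_estimates(readings):
    """Replace runs of estimated ('E') readings by a statistical average.

    `readings` is a list of (reading_type, kwh) tuples for one customer,
    e.g. [('N', 100), ('E', 90), ('E', 90), ('N', 150)].
    """
    result = [kwh for _, kwh in readings]
    i = 0
    while i < len(readings):
        if readings[i][0] == 'E':
            j = i
            while j < len(readings) and readings[j][0] == 'E':
                j += 1
            if j < len(readings):            # the run is closed by an actual reading
                span = result[i:j + 1]
                avg = sum(span) / len(span)
                for k in range(i, j + 1):
                    result[k] = avg
            i = j + 1
        else:
            i += 1
    return result

# Example: two estimated months followed by a catch-up actual reading
print(normalize_estimates([('N', 100), ('E', 90), ('E', 90), ('N', 150)]))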
From the 25-month kWh consumption data, 24 daily average kWh consumption values, corresponding to the modeling features, were computed for each customer. These features were calculated using the following expression:

$x_j = \frac{C_{j+1}}{|RD_{j+1} - RD_j|},\qquad j = 1, 2, \ldots, 24$   (4.1)

where C_{j+1} represents the monthly kWh consumption of the following month, and |RD_{j+1} - RD_j| is the absolute difference in days between two consecutive meter reading dates.
Using eq. (4.1), 24 features, i.e., 24 daily average kWh consumption values, were calculated for each customer. Meter readings for each customer are recorded on different dates of the month and are not always the same for all customers, i.e., meters are not read exactly every 30/31 days, and there are longer or shorter durations in the number of days between readings. As meter reading dates affect the monthly kWh consumption recorded for each customer, the 24 daily average kWh consumption values computed using eq. (4.1) reveal a more accurate consumption history of the customers.
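A minimal Python sketch of this feature computation is given below; the function name and input layout are illustrative assumptions:

from datetime import date

def daily_average_features(consumptions, reading_dates):
    """Daily average kWh features per eq. (4.1): each monthly consumption is
    divided by the absolute day difference between its two consecutive
    meter reading dates (24 features for 25 months of data)."""
    return [c / abs((reading_dates[j + 1] - reading_dates[j]).days)
            for j, c in enumerate(consumptions)]

# Example with two illustrative reading intervals of 33 and 29 days
dates = [date(2006, 5, 10), date(2006, 6, 12), date(2006, 7, 11)]
print(daily_average_features([330, 290], dates))  # [10.0, 10.0]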
The 24 daily average kWh consumption values computed for each customer correspond to the customer's load profile. For a selected group of Q customers, each customer k is thus represented by a vector of 24 daily average kWh consumption values, and the whole set of customer load profiles is represented by $X = \{x_j^{(k)}\},\ k = 1, \ldots, Q$.
Figure 4.9 illustrates the normal monthly kWh consumption values, where C_{j} is represented by data columns C1 through C24. The meter reading dates corresponding to the consumption values in Figure 4.9 are shown in Figure 4.10, represented by data columns RD1 through RD24. The absolute differences in days between consecutive meter reading dates, |RD_{j+1} - RD_j|, for the respective consumption values of Figure 4.9 are shown in Figure 4.11, represented by data columns DD1 through DD24. The daily average kWh consumption values calculated using eq. (4.1) are obtained by dividing the respective consumption values in Figure 4.9 by the respective absolute day difference values in Figure 4.11, as shown in Figure 4.12.
Figure 4.11: The difference of days between each meter reading date
After the calculation of the daily average kWh consumption features, other candidate features were evaluated for selection based on the Cross-Validation (CV) method; section 4.3.1.3.1 gives a brief overview of k-fold CV. Three additional features considered useful to the problem of NTL detection were evaluated: (i) HRC, (ii) CWR, and (iii) IR. During feature selection, class labels {0, 1} for evaluating each sample were assigned based upon the TOE field of the respective samples, where 0 indicates good (normal) samples and 1 indicates fraud samples.
Feature selection was performed using all customer data from the KL Barat station. Utilizing the TOE information as the class label results in an unbalanced dataset, as there are only 1171 TOE cases among the total of 186,968 customers, i.e., less than 1% of the customers are fraud cases while the remaining are good. With such an unbalanced class ratio, the plain CV accuracy is no longer a meaningful measure. To overcome this problem, an objective function, eq. (4.2), was implemented to calculate the detection hitrate13 for the purpose of performance evaluation. The detection hitrate is calculated using the following expression:

$Hitrate = \frac{N_{TP}}{N_F} \times 100\%$   (4.2)

where N_{TP} represents the number of samples correctly classified as fraud cases by the SVC and labeled as fraud cases by TNBD, and N_F represents the total number of samples classified as fraud cases by the SVC.
13
Detection hitrate is the measure of accuracy in percentage for the number of samples
correctly classified by the SVC and identified as fraud by TNB, over the number of samples
classified as fraud by the SVC.
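In code, the hitrate of eq. (4.2) reduces to a simple ratio; the sketch below (with illustrative function and variable names, and fraud encoded as class 1) computes it from the predicted and true labels:

def detection_hitrate(predicted, actual):
    """Eq. (4.2): correctly classified fraud cases over all samples
    the classifier flagged as fraud, in percent."""
    flagged = [i for i, p in enumerate(predicted) if p == 1]
    if not flagged:
        return 0.0
    true_pos = sum(1 for i in flagged if actual[i] == 1)
    return 100.0 * true_pos / len(flagged)

print(detection_hitrate([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # about 66.67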
Table 4.4 indicates the hitrates obtained for the different combinations of features evaluated. The detection hitrate in Table 4.4 is based on the average of 100 trials, where on every trial 67% of the samples from every class are used for SVC training and the remaining 33% are used for testing. For every trial, the training and testing samples are selected in random order.
Table 4.4: Detection hitrates for the feature combinations evaluated

No.    Detection Hitrate
1      68.13%
2      65.40%
3      78.96%
4      72.57%
5      67.43%
6      64.65%
7      77.21%
8      61.85%
The feature selection results in Table 4.4 indicate that the best detection hitrate is obtained using a combination of: (i) the load profile (daily average kWh consumption) features, and (ii) the Credit Worthiness Rating (CWR). Additionally, based on the data analysis of fraud customers previously identified by TNBD SEAL experts, it was observed that the CWR significantly correlates with customers committing fraud activities. As the CWR is targeted at identifying customers who intentionally avoid paying bills and delay payments, and since in the majority of the fraud and theft cases detected by TNBD the customers who delay payments and ignore bills are the most likely to be involved in fraud activities such as electricity theft, the CWR is selected as another feature to be used alongside the load profile features in the SVC model.
The CWR for all customers in TNBD's billing system is generated automatically, based on the monthly payment status of the customers. The CWR takes six integer values ranging from 0 to 5, where 0 represents the minimum CWR and 5 represents the maximum CWR, as shown in Figure 4.13, where data columns C1 through C25 represent the months and the customers are indicated by the rows. Since the CWR changes from month to month based on the payment status of the customers, the CWR values averaged over the period of 25 months were used as the 25th feature in the SVC model for each customer, as shown in Figure 4.14.
4.3.1.3.1 Cross-Validation
Cross-Validation (CV), also referred to as rotation estimation [200, 201], is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. In k-fold CV, the training samples are partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k - 1 subsamples are used as training data. The CV process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data, and the results from the k folds are averaged; in this way, every sample is used for validation exactly once [193]. To reduce variability further, multiple rounds of CV can be performed using different partitions, and the validation results averaged over the rounds. In common practice, the 10-fold CV method is the most widely used procedure for feature selection and parameter tuning in SVMs [202].
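A minimal sketch of k-fold CV is shown below, assuming scikit-learn is available as the SVM implementation (the thesis itself uses LIBSVM; the parameter values here are placeholders):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ten_fold_cv_accuracy(X, y, C=1.0, gamma=0.92):
    """Average 10-fold CV accuracy of an RBF-kernel SVC on (X, y)."""
    clf = SVC(C=C, kernel='rbf', gamma=gamma)
    scores = cross_val_score(clf, X, y, cv=10)  # one score per fold
    return scores.mean()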
The selected features were then normalized so that all values fall within a common range. Each feature value is normalized as:

$\tilde{x}_j = \frac{x_j - x_{min}}{x_{max} - x_{min}}$   (4.3)

where x_{min} and x_{max} are the minimum and maximum values of the customer's feature values.
Figure 4.15 indicates the 24 daily average kWh consumption features (load profiles) from Figure 4.12, normalized by implementing eq. (4.3), where data columns V1 through V24 represent the normalized kWh consumption and the customers are indicated by the rows. Figure 4.16 indicates the normalized CWR values for the averaged CWR values in Figure 4.14.
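A sketch of this per-customer normalization, under the min-max form reconstructed in eq. (4.3) above, could look as follows (function name illustrative):

def normalize_profile(values):
    """Min-max normalize one customer's feature values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # flat profile: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(normalize_profile([10.0, 12.5, 15.0]))  # [0.0, 0.5, 1.0]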
The features must be presented in the proper format to the LIBSVM software [199]. Therefore, all normalized features were labeled properly, where the labels are represented by integer values. The normalized feature values alongside their respective label values are denoted by the matrix W, in the form:
$W = \begin{bmatrix} l_1 & 1{:}x_{1,1} & 2{:}x_{1,2} & \cdots & n{:}x_{1,n} \\ l_2 & 1{:}x_{2,1} & 2{:}x_{2,2} & \cdots & n{:}x_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_m & 1{:}x_{m,1} & 2{:}x_{m,2} & \cdots & n{:}x_{m,n} \end{bmatrix}$   (4.4)

where l_k is the integer class label of the k-th sample and x_{k,j} is its j-th normalized feature value.
The modeling features obtained after the e-CIBS data preprocessing is complete are shown in Figure 4.17. The data columns V1 through V24 represent the features in the standard format Index_Number:Feature_Value, where the customers are indicated by the rows.
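For reference, a sketch that writes such a feature file in the standard LIBSVM text format (one line per customer: the label followed by index:value pairs) is shown below; the function and file names are illustrative:

def write_libsvm_file(path, labels, features):
    """Write 'label 1:v1 2:v2 ... n:vn' lines, the input format LIBSVM expects."""
    with open(path, 'w') as f:
        for label, row in zip(labels, features):
            pairs = ' '.join(f'{j + 1}:{v:.6f}' for j, v in enumerate(row))
            f.write(f'{label} {pairs}\n')

write_libsvm_file('training.txt', [1, 2], [[0.1, 0.9], [0.8, 0.2]])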
In order to extract useful information from the HighRisk data, the type of information that can be extracted was distinguished first. From observation, two major types of information can be extracted: (i) the Detection Count, and (ii) the Last Detection Date. Table 4.5 indicates the information retrieved from the HighRisk data using the Structured Query Language (SQL).
Table 4.5: Information retrieved from the HighRisk data using SQL

1. Station No.
2. Customer No.
3. Detection Count
4. Last Detection Date
Figure 4.19 shows the information extracted from the HighRisk data using SQL. For the KL Barat station HighRisk data, after grouping all records belonging to one customer, 32,972 distinct fraud customers were identified from the 105,525 fraud cases detected by TNB in the last five years. This indicates that, on average, each detected customer was caught for fraud about 3.2 times, which indicates a high rate of repeat offending. Table 4.6 indicates the statistics of the information extracted from the HighRisk data for all TNB stations listed in Table 4.1.
Figure 4.19: Information extracted from the HighRisk data using SQL
Table 4.6: Information extracted using SQL from the HighRisk data

No.    Station       Fraud Cases    Customers Found
1      KL Barat      105,525        32,972
2      Kota Bharu    6,773          3,557
3      Kuala Krai    818            653
4      Gua Musang    1,826          953
The development of the SVC classification engine involves SVC training, parameter tuning, class weight adjustment, SVC probability estimation, and SVC testing and validation. The following sections discuss the development of the SVC engine in detail.
From the 186,968 filtered customers in the KL Barat data, only 1171 customers had been identified and detected as Theft of Electricity (TOE) cases by TNBD in the past five years. From the remaining 185,797 customers with no TOE cases, a few hundred customers were identified as normal customers, i.e., customers with no fraud activities. The customers identified with TOE and with no TOE cases form the backbone for the development of the SVC model.
Firstly, manual inspection was performed on all 1171 TOE cases to identify load profiles in which abrupt changes appear clearly, indicating fraud activities, abnormalities or other irregularities in the consumption characteristics. From the 1171 TOE cases inspected, only 53 customer load profiles exhibited abrupt or sudden drops relating to fraudulent events. These 53 TOE cases were selected as Fraud Suspects (Class 1), in order to train the SVC to identify fraud and suspicious customers. Figure 4.20 indicates the load profiles of four typical fraud customers over a period of two years from the 53 fraud cases identified. The remaining 1118 load profiles with TOE cases did not show any abnormal consumption patterns or abrupt drops relating to fraud activities; i.e., these customers committed fraud before the two-year period (before May 2006), for which there was no customer data.
Figure 4.20: Normalized load profiles of four typical fraud customers over a
period of two years
Secondly, inspection was also performed on a set of 500 load profiles with no TOE cases. From this inspection, 330 load profiles in which no abrupt changes appeared were selected as Normal Suspects (Class 2), in order to train the SVC to identify normal (good) customers, i.e., customers with no fraud activities. Load profiles of four good customers from the 330 normal cases identified are shown in Figure 4.21. In total, therefore, 383 samples from the fraud (Class 1) and normal (Class 2) classes were used to build the SVC model, as shown in Figure 4.22.
Figure 4.21: Normalized load profiles of four good customers over a period of
two years
Class weights are adjusted to compensate for the unbalanced sample ratio of the two classes. This is achieved by dividing the total number of classifier samples by the number of samples in the individual class. In addition, the class weights are multiplied by a weight factor of 100 in order to achieve satisfactory weight ratios for training. Table 4.7 indicates the weight ratios obtained after adjustment.

Table 4.7: Class weights after adjustment

Class               Training Samples    Weightage
Class 1 (Fraud)     53                  722.6415
Class 2 (Normal)    330                 109.4258
Figure 4.22: Customer data features (samples) used for SVC training
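The weight computation described above can be sketched as follows (the function name is illustrative; applied to the two classes it reproduces the Class 1 weight in Table 4.7):

def class_weights(class_counts, factor=100):
    """Weight per class = (total samples / class samples) * factor."""
    total = sum(class_counts.values())
    return {c: factor * total / n for c, n in class_counts.items()}

print(round(class_weights({1: 53, 2: 330})[1], 4))  # 722.6415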
Next, the SVC parameters (C, γ) are fine-tuned. The Grid Search method proposed by Hsu et al. in [193] is used for SVC parameter optimization. In the Grid Search method, exponentially growing sequences of the parameters (C, γ) are evaluated, and the performance of each pair was measured by training on 67% of the classifier data and testing on the remaining 33%. This procedure was repeated 100 times in 10-fold CV trials, where on every trial the data samples were selected in random order. The optimal SVC parameters were found to be C = 1 and γ = 0.92, obtained for the highest 10-fold CV accuracy of 93.71%. The detection hitrate in eq. (4.2) was evaluated at this CV accuracy, together with the training accuracy14, which is calculated as:

$Training\ accuracy = \frac{N_{correct}}{N_{tested}} \times 100\%$   (4.5)
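The grid search itself can be sketched as follows, using scikit-learn rather than the LIBSVM command-line tools the thesis employs; the parameter ranges shown are the exponentially growing sequences suggested in [193] and are assumptions here:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def grid_search(X, y):
    """Search exponentially growing (C, gamma) grids with 10-fold CV."""
    param_grid = {'C': [2 ** k for k in range(-5, 16, 2)],
                  'gamma': [2 ** k for k in range(-15, 4, 2)]}
    search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=10)
    search.fit(X, y)
    return search.best_params_, search.best_score_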
The SVC training engine proposed for parameter optimization and building the classifier is shown in Figure 4.23. As seen in Figure 4.23, all 383 samples were used for training in order to build the classifier (model file).
14 Training accuracy is the measure of the memorization or learning capability of the classifier. It is calculated as the percentage of the number of samples correctly classified by the SVC (N_correct) over the number of samples used for testing the SVC model (N_tested).
[Flowchart: start of training, KL Barat station training data (383 samples), 10-fold CV trials using 67% of the data for training and 33% for testing, evaluation of the 10-fold CV accuracy over 100 trials, reselection of the optimal SVC parameters using Grid Search when the accuracy is bad, and output of the model file (classifier) at the end of training]
Figure 4.23: The SVC training engine proposed for parameter optimization and
building the classifier (model)
LIBSVM estimates class membership probabilities by pairwise coupling, solving the optimization problem:

$\min_{p} \frac{1}{2} \sum_{i=1}^{k} \sum_{j: j \neq i} \left( r_{ji} p_i - r_{ij} p_j \right)^2 \quad \text{subject to} \quad \sum_{i=1}^{k} p_i = 1,\ p_i \geq 0$   (4.6)

where the r_{ij} are the pairwise class probability estimates.
In this research study, the pairwise probability information defined in eq. (4.6) was specified to be calculated by the LIBSVM software, in order to estimate the probabilities of the classified customers. The probability estimates (decision values) of the tested/validated data provide additional information for data postprocessing, i.e., for the selection of suspicious customers.
SVC training in LIBSVM using the input parameter string in Figure 4.25 is shown in Figure 4.26. During SVC training, the SMO training algorithm undergoes iterations, and on the last iteration a total of 160 support vectors (SVs) is obtained, as shown in Figure 4.26. The specifications of the trained classifier are shown in Table 4.8.
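The exact parameter string used is the one shown in Figure 4.25; a plausible invocation of the LIBSVM v2.86 command-line trainer with the settings described in this chapter (C-SVC, RBF kernel, C = 1, γ = 0.92, probability estimates, and the class weights of Table 4.7) would look like the following Python sketch, where the file names are illustrative assumptions:

import subprocess

# svm-train flags: -s 0 (C-SVC), -t 2 (RBF kernel), -c/-g (C, gamma),
# -b 1 (probability estimates), -wi (per-class weight on C)
subprocess.run(['svm-train', '-s', '0', '-t', '2',
                '-c', '1', '-g', '0.92', '-b', '1',
                '-w1', '722.6415', '-w2', '109.4258',
                'training.txt', 'model.txt'], check=True)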
Table 4.8: Specifications of the trained classifier

Class               Training Samples    Support Vectors    Weightage
Class 1 (Fraud)     53                  42                 722.6415
Class 2 (Normal)    330                 118                109.4258
The model file in Figure 4.27 is generated after SVC training is complete. The first few lines in the model file indicate the parameters of the model, followed by the number of SVs. The first line in Figure 4.27 indicates the type of SVC method used, which is the C-Support Vector Classification (C-SVC) method. The second line indicates the kernel type used, which is the Radial Basis Function (RBF) kernel. The third line indicates the value of γ (gamma) used for SVC training. The following line indicates the total number of classes, which is 2, i.e., Class 1 indicates fraud customers and Class 2 indicates normal customers. The total number of SVs in the classifier is 160, as indicated by the last line of Figure 4.26 and the fifth line of Figure 4.27. In the sixth line of the model file, ρ (rho) is a parameter indicating the negative value of the bias term b. The next lines in the model file indicate the labels used to represent the two classes and, in the line following the pairwise probability parameters, the number of SVs in Class 1 and Class 2 respectively. All rows after the text 'SV' in Figure 4.27 represent the values of the 160 SVs computed during SVC training.
In the classifier in Figure 4.27, the SVs are the points satisfying the constraint 0 < α_i ≤ C in eq. (3.24), with Class 1 having 42 SVs and Class 2 having 118 SVs. There were no bounded support vectors (BSVs), i.e., no SVs with α_i = C, in the classifier. The maximum value of the optimal solution of the dual problem in eq. (3.24) was calculated to be -133.5426. The ρ (rho) parameter in the model file is defined as ρ = -b in the decision function in eq. (3.46), i.e., the negative of the bias term b.
The separating boundaries between the two classes of the classifier are shown in Figure 4.28, plotted using svm-toy.exe from the LIBSVM package. As seen in Figure 4.28, the dark (blue) region represents the Class 1 boundary, while the lighter (yellow) region represents the Class 2 boundary. This indicates that the two classes are well separated during the training process, resulting in good training (learning) performance.
Figure 4.28: Separating boundaries between the two classes of the SVC model
Dark (blue) region indicates the Class 1 boundary and lighter (yellow) region
indicates the Class 2 boundary
[Flowchart: start of testing, the testing data (customers) and the model file (classifier) are fed to the classification step, producing the classification (SVC) results together with the classification accuracy and hitrate; if the results are bad, the optimal SVC parameters are reselected using Grid Search and CV and the model is retrained; end of testing]
Figure 4.29: The SVC testing engine proposed for the classification of fraud and
normal customers
In order to implement SVC testing, the LIBSVM v2.86 [199] Microsoft Disk Operating System (MS-DOS) executable svm-predict.exe is employed for testing the customer data samples, as shown in Figure 4.30. The customer data used for testing is in the exact same format as the training data shown in Figure 4.22. However, during testing the class labels of the data samples are unknown, and for simplicity all class labels in the testing data are set to Class 2, as shown in Figure 4.31. The reason for providing class labels in the testing data is the data format requirement of LIBSVM; during the testing and validation phase of the SVC, these class labels are internally ignored by the LIBSVM software when producing the predictions.
SVC testing in LIBSVM using the input parameters in Figure 4.30 is shown in Figure 4.32. The implementation of the SVC testing/validation, using the classifier model developed in Figure 4.27, is shown in Figure 4.33. During testing, 150 customers from the Kota Bharu station were tested, resulting in 127 correctly classified customers and 23 incorrectly classified customers, as indicated by Figure 4.33. The overall classification accuracy of the SVC model is calculated to be 84.67%, and the detection hitrate defined in eq. (4.2) is calculated to be 77.16%.
The output file (classification results) generated after SVC testing is shown in Figure 4.34. The first line in Figure 4.34 indicates the class labels predicted, i.e., Class 1 and Class 2. The following lines indicate the predicted results for the tested customers with respect to the customer data features in Figure 4.31. The predicted results for each customer in Figure 4.34 consist of three data columns: the first indicates the predicted class for each customer, where Class 1 represents fraud customers and Class 2 represents normal customers, while the second and third indicate the probability estimates for Class 1 and Class 2 respectively. The sum of the two probabilities (data columns 2 and 3) for each customer always equals one.
Figure 4.34: Output file (classification results) after SVC testing is complete
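When svm-predict is run with -b 1, its output file has exactly this layout (a 'labels' header line, then one line per sample with the predicted label and the per-class probabilities); a small parsing sketch, with an illustrative file name, is shown below:

def parse_predictions(path):
    """Parse a LIBSVM svm-predict output file produced with -b 1."""
    results = []
    with open(path) as f:
        header = f.readline().split()        # e.g. ['labels', '1', '2']
        class_order = [int(c) for c in header[1:]]
        for line in f:
            parts = line.split()
            predicted = int(parts[0])
            probs = dict(zip(class_order, map(float, parts[1:])))
            results.append((predicted, probs))
    return results

for predicted, probs in parse_predictions('output.txt')[:3]:
    print(predicted, probs)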
After correlation of the SVC results with the customer data, customer filtering is performed based on human expertise and knowledge, using a FIS. The FIS performs filtering and shortlists customers with suspicious consumption patterns relating to fraud activities and abnormalities. The following sections briefly present the theoretical concepts and background of fuzzy logic, followed by the development of the data postprocessing scheme for customer filtering and selection.
[Figure 4.35: Flowchart of the proposed framework for data postprocessing: the e-CIBS data, the classification (SVC) results for the tested customers and the HighRisk data feed into data parameter selection, followed by the formation of SQL statements for customer filtering and selection, transformation of the SQL into a rule base, membership function (MF) formation, implementation of the Fuzzy Inference System (FIS), and output of the list of suspicious customers]
A fuzzy set A in a universe of discourse X is defined as a set of ordered pairs:

$A = \{(x, \mu_A(x))\ |\ x \in X\}$   (4.7)

where μ_A(x) is called the Membership Function (MF) of the fuzzy set A; the MF maps each element of X to a membership grade between 0 and 1. As most fuzzy sets in use have a universe of discourse consisting of the real line, it is impractical to list all the pairs defining a MF. A more convenient way to define an MF is to express it as a mathematical formula. The most commonly used MFs in fuzzy sets are listed as follows:
A triangular MF is specified by three parameters {a, b, c}:

$trimf(x; a, b, c) = \begin{cases} 0, & x \leq a \\ \dfrac{x-a}{b-a}, & a \leq x \leq b \\ \dfrac{c-x}{c-b}, & b \leq x \leq c \\ 0, & x \geq c \end{cases}$   (4.8)

A trapezoidal MF is specified by four parameters {a, b, c, d}:

$trapmf(x; a, b, c, d) = \begin{cases} 0, & x \leq a \\ \dfrac{x-a}{b-a}, & a \leq x \leq b \\ 1, & b \leq x \leq c \\ \dfrac{d-x}{d-c}, & c \leq x \leq d \\ 0, & x \geq d \end{cases}$   (4.9)

The parameters a, b, c, d (with a < b ≤ c < d) determine the coordinates of the corners of the underlying trapezoid. A Gaussian MF is specified by two parameters {σ, c}:

$gaussmf(x; \sigma, c) = e^{-\frac{(x-c)^2}{2\sigma^2}}$   (4.10)
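A minimal Python sketch of the triangular and trapezoidal MFs of eqs. (4.8) and (4.9), corresponding to the TRIMF/TRAPMF primitives used in the postprocessing code later in this chapter, is shown below (function names are illustrative):

def trimf(x, a, b, c):
    """Triangular MF of eq. (4.8)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def trapmf(x, a, b, c, d):
    """Trapezoidal MF of eq. (4.9)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

print(trimf(0.0, -0.01, 0, 0.01), trapmf(0.8, 0.61, 0.62, 1, 1))  # 1.0 1.0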
A variable is called a linguistic variable if its values are linguistic rather than numerical, i.e., young, very young, old, very old, etc., rather than numerical values such as 20, 21, 23 or 45. Figure 4.36 illustrates the term set 'age' expressed by Gaussian MFs.

Figure 4.36: Fuzzy Membership Functions (MFs) for the term set 'age'
A fuzzy IF-THEN rule assumes the form:

If x is A then y is B   (4.11)

where A and B are linguistic variables or labels defined by fuzzy sets [205] characterized by appropriate membership functions. The expression 'x is A' is called the antecedent or premise, while 'y is B' is called the consequence or conclusion. An example of such a rule is:

If speed is low AND the distance is small, then the force on the brake is small   (4.12)
The union of two fuzzy sets A and B is the smallest fuzzy set containing both A and B, with the MF:

$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x)) = \mu_A(x) \vee \mu_B(x)$   (4.13)

The intersection of two fuzzy sets A and B, with the MF:

$\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x)) = \mu_A(x) \wedge \mu_B(x)$   (4.14)

is the largest fuzzy set which is contained in both A and B; this reduces to the ordinary intersection operation if both A and B are non-fuzzy. The complement of a fuzzy set A has the MF:

$\mu_{\bar{A}}(x) = 1 - \mu_A(x)$   (4.15)

These fuzzy set operations behave exactly as the corresponding operations for ordinary sets if the values of the MFs are restricted to either 0 or 1.
[Figure 4.37: Block diagram of a Fuzzy Inference System: a crisp input passes through the fuzzification interface, the fuzzy values are processed by the decision-making unit using the knowledge base (database and rule base), and the defuzzification interface converts the fuzzy result back into a crisp output]
In common practice, the rule base and the database in a FIS are jointly referred
to as the knowledge base, as shown in Figure 4.37. The steps of fuzzy
reasoning (operations upon fuzzy IF-THEN rules) performed by FISs are:
1. Input variables are compared with the MFs on the premise part to obtain
the membership values (or compatibility measures) of each linguistic
label. This step is also known as fuzzification.
2. The membership values on the premise part are combined through fuzzy set operations, such as min, max or multiplication, to obtain the firing strength (weight) of each rule.

Type 1: In this type of FIS, the overall output is the weighted average of each rule's crisp output, induced by the rule's firing strength (the product or minimum of the degrees of match with the premise part) and the output MFs. The output membership functions used in this scheme must be monotonic functions [211].

Type 3: This type of FIS uses Takagi and Sugeno's fuzzy IF-THEN rules [212]. The output of each rule is a linear combination of the input variables plus a constant term, and the final output is the weighted average of each rule's output.
Preliminary data postprocessing analyzes the correlated data in order to formulate SQL statements for the selection of suspicious customers, as indicated in Figure 4.35. The following sections discuss in detail the processes implemented for the purpose of preliminary data postprocessing.
Figure 4.38: Correlation of the SVC results with the customer data
The SVC results and customer data are correlated by employing data mining,
using SQL techniques. As seen in Figure 4.38, the first four data columns, i.e.
CustomerNo, CustomerName, CWR and Consumption are retrieved from the
preprocessed e-CIBS data in Figure 4.3. The data columns Predicted, Pclass1 and
Pclass2 are retrieved from the SVC output results shown in Figure 4.34, and the last two data columns in Figure 4.38, i.e., DetectCount and LastDetectDate, are retrieved from the preprocessed HighRisk data in Figure 4.19. The SVC results, e-CIBS data and HighRisk data are correlated based on the CustomerNo field, which is used as the key for matching the data together.
Table 4.9: Data parameters selected from the correlated data

1. Class
2. L23
3. L24
4. MinkWh
5. MaxkWh
6. DiffkWh
7. DetectCount
8. Probability
9. TOE
10. HRC
The three customer selection levels indicated in Table 4.10 are termed Fraud Detection Levels (FDLs): Low, Moderate and High. The FDLs utilize the data parameters from Table 4.9 to match customer load consumption patterns against previously analyzed fraud patterns, where the results are expressed as the matching percentage between the load consumption patterns. The main idea behind the SQL statements in Table 4.10 is to filter out unwanted customers using logical decisions based on the 10 parameters identified from the correlated data. The result of this filtering is a list of suspicious customers with a high possibility of fraud activities.
Abrupt drops in consumption are not always caused by fraud activities; they may also result from:
1. Replaced meters
2. Abandoned houses or premises
3. Change of tenants or residents, and
4. Faulty meter wiring
Table 4.10: SQL statements formulated for the three Fraud Detection Levels (FDLs)

Low (matches 55% to 65% of the fraud pattern):

SELECT Customers FROM Result WHERE
(((Class=1 AND L23<5.5 AND L24<5.5 AND L23>1 AND L24>1 AND
   MaxkWh>7 AND MaxkWh<26 AND MinkWh>1.2 AND MinkWh<5 AND DiffkWh>6) AND
  ((DetectCount>=1 AND Probability>0.6) OR (TOE=1 AND Probability>0.6) OR
   (HRC=1 AND Probability>0.6) OR (DetectCount=0 AND Probability>=0.6)))
 OR
 ((Class=1 AND L23>10 AND L24>10 AND MaxkWh>75 AND MinkWh>8 AND
   MinkWh<(0.2*MaxkWh)) AND
  ((DetectCount>=1 AND Probability>0.55) OR (TOE=1 AND Probability>0.55) OR
   (HRC=1 AND Probability>0.55) OR (DetectCount=0 AND Probability>=0.65))))

Moderate (matches 65% to 75% of the fraud pattern):

SELECT Customers FROM Result WHERE
(((Class=1 AND L23<4 AND L24<4 AND L23>1 AND L24>1 AND
   MaxkWh>7 AND MaxkWh<18 AND MinkWh>1.5 AND MinkWh<4 AND DiffkWh>6) AND
  ((DetectCount>=1 AND Probability>0.6) OR (TOE=1 AND Probability>0.6) OR
   (HRC=1 AND Probability>0.6) OR (DetectCount=0 AND Probability>=0.75)))
 OR
 ((Class=1 AND L23>10 AND L24>10 AND MaxkWh>75 AND MinkWh>8 AND
   MinkWh<(0.2*MaxkWh)) AND
  ((DetectCount>=1 AND Probability>0.55) OR (TOE=1 AND Probability>0.55) OR
   (HRC=1 AND Probability>0.55) OR (DetectCount=0 AND Probability>=0.68))))

High (matches 75% to 100% of the fraud pattern):

SELECT Customers FROM Result WHERE
((Class=1 AND L23<4 AND L24<4 AND L23>1 AND L24>1 AND DiffkWh>6.5 AND
  MaxkWh>7 AND MaxkWh<14 AND MinkWh>1.5 AND MinkWh<3.5) AND
 ((DetectCount>=1 AND Probability>=0.62) OR (TOE=1 AND Probability>=0.62) OR
  (HRC=1 AND Probability>=0.62) OR (DetectCount=0 AND Probability>=0.8)))
Table 4.11: Fuzzy rules transformed from the SQL statements in Table 4.10, one for each of the Low, Moderate and High levels
Figure 4.40: Fuzzy MFs used to implement the 'Low' level fuzzy rule

In Figure 4.40 (a), (b), (c) and (d), two trapezoidal MFs are used in each panel to formulate the different ranges of the parameters L23, L24, MinkWh and MaxkWh. In Figure 4.40(e), only one MF is used to represent the DiffkWh parameter. The probability MFs are indicated in Figure 4.40(f), where four trapezoidal MFs represent the four different probability thresholds (P1, P2, P3 and P4) specified in the 'Low' level SQL statement in Table 4.10. In Figure 4.40(g), DetectCount1 uses a triangular MF: since this term fires only for DetectCount = 0, a triangular MF with the parameters [-0.01, 0, 0.01] is used. Similarly, in Figures 4.40(h) and (i), triangular MFs represent the TOE and HRC terms: since these fire only for TOE = 1 and HRC = 1, triangular MFs with the parameters [0.99, 1, 1.01] are used. The MFs for the other levels, i.e., the Moderate and High fuzzy rules in Table 4.11, are formulated following exactly the same procedure as for the Low FPDL.
The following code describes the formation of the MFs using the parameter values in the SQL statements in Table 4.10 for the 'Low' FPDL. The code is in a syntax similar to Microsoft Visual Basic 6.0 and MATLAB programming.

%Probability MFs
ProbabilityMF1 = TRAPMF(Probability1, 0.61, 0.62, 1, 1)
ProbabilityMF2 = TRAPMF(Probability2, 0.64, 0.65, 1, 1)
ProbabilityMF3 = TRAPMF(Probability3, 0.61, 0.63, 1, 1)
ProbabilityMF4 = TRAPMF(Probability4, 0.66, 0.67, 1, 1)

%DetectCount MFs
DetectCountMF1 = TRIMF(DetectCount, -0.01, 0, 0.01)
DetectCountMF2 = TRAPMF(DetectCount, 1, 1, 1, 1000)

%TOE MF
TOEMF = TRIMF(TOE, 0.99, 1, 1.01)

%HRC MF
HRCMF = TRIMF(HRC, 0.99, 1, 1.01)

In the code above, TRIMF refers to the triangular MF defined in eq. (4.8), which requires three input values, and TRAPMF refers to the trapezoidal MF defined in eq. (4.9), which requires four input values.
The FIS is implemented in such a way that the 10 parameter values in Figure 4.39 for each customer are evaluated one by one using a fuzzy rule selected from Table 4.11. After evaluating the selected fuzzy rule, the output (crisp) value of the FIS for each customer, known as the 'final value', lies in the range 0 to 1. Based on simple IF-ELSE logic statements on the final value obtained, customers are marked as 'fraud' or 'normal'. The IF-ELSE logic structure, i.e., the FIS algorithm for detecting fraud customers from the correlated data, follows the sketch below.
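The original pseudocode is summarized here as a minimal Python sketch; the helper evaluate_fis(), which is assumed to return the crisp final value of the selected fuzzy rule for one customer, and the other names are illustrative:

def classify_customers(customers, rule):
    """Mark each customer as 'fraud' or 'normal' from the FIS final value."""
    suspicious = []
    for customer in customers:
        final_value = evaluate_fis(customer, rule)  # crisp output in [0, 1]
        if final_value > 0.5:
            customer['status'] = 'fraud'
            suspicious.append(customer)             # goes on the shortlist
        else:
            customer['status'] = 'normal'
    return suspicious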
The fraud customers (customers with a final value > 0.5) are accumulated into the List of Suspicious Customers, which is the output of the fraud detection system, as indicated in Figure 4.2. The list of suspicious customers is used by TNBD SEAL teams to carry out onsite inspections of customer installations for the detection of fraud activities. The shortlist produced by the fraud detection system reduces TNBD's operational cost in monitoring NTL activities and also increases the inspection hitrate by identifying 'black' areas, i.e., areas and regions where fraud customers are most likely present.
4.6 Summary
This chapter provided the methodology proposed for the fraud detection framework and implemented the associated algorithms used for NTL identification and detection. In the first sub chapter, the general project and research methodologies were introduced. Three major stages were involved in the development of the intelligent fraud detection system: (i) data preprocessing, (ii) classification engine development, and (iii) data postprocessing. The data preprocessing sub chapter illustrated the data mining, i.e., SQL, techniques used for preprocessing the raw customer information and billing data for feature selection and extraction. The classification engine development sub chapter illustrated the SVC training, parameter optimization, development of the SVC classifier, and the SVC testing and validation engine. The last sub chapter, data postprocessing, presented the development of a Fuzzy Inference System (FIS), the creation of fuzzy rules and the formation of MFs for the selection of suspicious customers.
CHAPTER 5
EXPERIMENTAL RESULTS
5.0 Overview
This chapter is composed of two main sub chapters. Sub chapter 1 presents the Graphical User Interface (GUI) developed for the fraud detection system. The GUI of the developed software generates the detection report containing the list of suspicious customers, together with the average daily consumption report. In sub chapter 2, model validation results are presented based on: (i) the classifier, (ii) pilot testing, and (iii) a comparison of the proposed model with other AI techniques. The model validation results obtained are discussed and evaluated. The contribution of the FIS to hitrate improvement is also discussed, and the computational intelligence scheme of SVC and FIS is compared to standard SVC. Finally, at the end of sub chapter 2, a comparative study of the proposed SVC and FIS model is performed against two AI based classification techniques: (i) the Multi-Layer Backpropagation Neural Network (ML-BPNN), and (ii) the Online-Sequential Extreme Learning Machine (OS-ELM), in order to evaluate the efficiency of the proposed fraud detection system.
The GUI is designed to simplify and aid the adoption of the proposed system by the user. Upon being launched, the GUI application (AFDS) requires login authentication by the user in order to access the software. The authentication screen of the AFDS is shown in Figure 5.1.
After having been authenticated, the AFDS loads into the memory of the
computer, during which the welcome screen of the software appears for a few
seconds. The welcome screen of the AFDS is shown in Figure 5.2.
[Figure: Main screen of the AFDS software, with annotated controls: buttons to select the e-CIBS and HighRisk data files, textboxes showing the locations of the selected files, the Fraud Pattern Detection Level (FPDL) selection, the detection start/stop toggle button, the overall detection progress, the progress of the current process, and the list of processes executed]
Figure 5.4: Selecting the e-CIBS data file in the AFDS software
Figure 5.5: File browser for data file selection in the AFDS software
After the file browser closes, the location of the selected file is displayed in the
textbox beside the file selection button, as shown in Figure 5.6. In order to
select the HighRisk data file, the exact same procedure is applied.
Figure 5.7: Selection of Fraud Pattern Detection Level (FPDL) in the AFDS
The three FPDLs indicated in Figure 5.7, and previously in Table 4.10, used for fraud pattern detection are briefly described as follows:
Figure 5.10: The start/stop detection toggle button in the AFDS software
Figure 5.12: Main screen of the AFDS software after detection is complete, showing the code of the station detected and the number of suspected customers found out of the total detected, along with the FPDL used
The file name of the detection report is typed in the File name textbox, as shown in Figure 5.16. The location on the computer where the detection report is to be saved can be browsed through the file browser. After the file name of the detection report is specified, the Save button in the file browser dialog is clicked to save the detection report in the Microsoft Office Excel format, as shown in Figure 5.16. Within a few seconds of clicking the Save button, a message box appears, as shown in Figure 5.17, confirming that the detection report has been saved. After the message box is dismissed by clicking the OK button, the location of the saved detection report is displayed in the textbox beside the save report button, as shown in Figure 5.18.
Figure 5.14: Viewing the list of suspicious customers and the detection summary in the AFDS software
Figure 5.16: File browser dialog to save the detection report in the AFDS
Figure 5.18: Location of the saved detection report in the second screen of the AFDS
A sample detection report for 150 customers tested from the KL Barat station is shown in Figure 5.20, where the Pattern field indicates the detected consumption pattern of the customers with respect to the three FPDLs. The ReadingUnit, CustomerName, TOE and HRC fields in the detection report are retrieved from the preprocessed e-CIBS data, while the DetectCount and LastDetectDate fields are retrieved from the preprocessed HighRisk data.
Figure 5.20: Sample detection report for customers tested in KL Barat station
Figure 5.22: Average daily consumption report for the KL Barat station (Page 1)
Figure 5.23: Average daily consumption report for the KL Barat station (Page 2)
The AFDS software installation and operation manual can be opened from within the software, as shown in Figure 5.25. The cover page of the AFDS Software Installation and Operation Manual is shown in Figure 5.26.
Figure 5.24: Inspecting load profiles of suspected customers from the daily
average consumption report
Figure 5.25: Opening the AFDS software installation and operation manual
Figure 5.26: Cover page of the AFDS software installation and operation manual
The detection summary indicates the FPDL used and the number of customers inspected. The following sections will further discuss and evaluate: the experimental results obtained for SVC training, SVC testing and validation, pilot testing, the contribution of the FIS to fraud detection hitrate improvement, and the comparison of the computational intelligence scheme of SVC and FIS with other AI based classification techniques.
SVC training aims to obtain the best SVC parameters (C, γ) for building the classifier model. The developed classifier is evaluated using testing and validation data, i.e. new and unseen data that has not been used for training. The accuracy of the classifier is evaluated using Cross-Validation (CV). The reason for using CV is to ensure that the SVC does not overfit the training data.
The Grid Search method proposed by Hsu et al. in [193] was used for SVC parameter tuning, where exponentially growing sequences of the parameters (C, γ) were used to identify the SVC parameters obtaining the best CV accuracy for the 383 classifier samples. Experimentally, 10-fold CV was used as the measure of the training accuracy, where 67% of the 383 samples were used for training and the remaining 33% were used for testing and validation. The 67% and 33% training and validation data ratios used are indicated by Mattfeldt et al. in [213] in order to achieve a satisfactory level of CV results. For each parameter set tested, the average 10-fold CV accuracy over 100 trials was computed, where on every trial the training and testing data were selected in a random order.
From the 10-fold CV trials, the best SVC parameters were found to be C = 1 and γ = 0.92, for the highest 10-fold CV accuracy of 93.71%. This CV accuracy is considered high, because approximately 94% of the tested samples are classified correctly.
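As an illustration only, the Grid Search with 10-fold CV described above can be sketched as follows using scikit-learn (an assumed library choice, not the tool used in this study); the feature matrix X, labels y and parameter ranges are placeholders, not the actual TNBD data:

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Placeholder data standing in for the 383 preprocessed classifier samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(383, 25))
y = rng.integers(0, 2, size=383)

# 67% / 33% split for training versus testing/validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.67, random_state=0)

# Exponentially growing sequences of C and gamma (assumed ranges).
param_grid = {
    "C": 2.0 ** np.arange(-5, 6),
    "gamma": 2.0 ** np.arange(-7, 4),
}

# Grid Search with 10-fold CV over the RBF-kernel SVC.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X_train, y_train)

print("Best (C, gamma):", search.best_params_)
print("Best 10-fold CV accuracy:", search.best_score_)
print("Held-out accuracy:", search.score(X_test, y_test))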
The parameter C is a regularization hyperparameter that defines the trade-off between the training error and model complexity. For the trained classifier in Figure 4.27, any SVC parameter C ≥ 1 does not affect the training accuracy of the SVC model, while C < 1 provides a significantly lower training accuracy. The reason why the parameter C ≥ 1 does not affect the training accuracy of the classifier is that there are no margin-violating training samples whose penalty would increase with C.
The parameter γ is the RBF kernel parameter used for the SVC, which controls the width of the RBF (Gaussian) kernel. Gamma (γ) is related to σ (sigma), the width of the Gaussian function, defined by the following expression:

γ = 1 / (2σ²)   (5.1)

Where σ is the width of the RBF (Gaussian) kernel.

For the optimal parameter γ = 0.92 used for training the classifier, the value of σ calculated using eq. (5.1) is found to be σ = 0.5907. The value of σ for the trained classifier is acceptable, since any value of σ below 0.01 is considered small and any value of σ above 100 is considered large. As σ acts as the width of the kernel, small σ values lead to a higher VC-dimension, meaning that too many features are used for modeling, which leads to overfitting, while large σ values lead to a lower VC-dimension, meaning that too few features are used to model the data, which leads to underfitting.
Testing and validation results obtained from the SVC and FIS computational
intelligence scheme for the testing data in Table 4.1 are tabulated in Table 5.1.
As seen from Table 5.1, the total number of customers in all of the listed TNB stations was tested. The training accuracy of the SVC in Table 5.1 is obtained from the expression defined in eq. (4.5), and the inspection hitrate15 is obtained from TNBD's feedback from manual onsite inspection of the customers shortlisted by the fraud detection system.
Table 5.1: Model testing and validation results for the fraud detection system

TNBD Station   No. of Customers Tested   SVC Training Accuracy (Memorization)   Inspection Hitrate
Kota Bharu     76,595                    81.66%                                 42.56%
Kuala Krai     18,880                    73.56%                                 38.07%
Gua Musang     13,045                    79.27%                                 41.39%
As indicated by Table 5.1, for all three cities in the Kelantan state of Malaysia, an average training accuracy of 78.16% and an average inspection hitrate of 40.67% is achieved. As an example, an inspection hitrate of 40% means that, if the AFDS shortlists 100 suspicious customers and the TNBD SEAL teams perform onsite inspection of all 100 shortlisted customers, then 40 of the 100 inspected customers are confirmed as fraud cases by the TNBD SEAL teams.
15 The inspection hitrate is the measure of accuracy, in percent, for the number of customers confirmed as fraud by the TNBD SEAL teams (during manual onsite inspection) out of the total number of customers shortlisted as suspicious by the fraud detection system.
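For clarity, the hitrate computation reduces to a simple ratio; the short sketch below is illustrative only, using the Bangi pilot figures reported in the next paragraph:

def inspection_hitrate(confirmed_fraud: int, shortlisted: int) -> float:
    """Percentage of shortlisted suspicious customers confirmed as fraud onsite."""
    return 100.0 * confirmed_fraud / shortlisted

# 43 confirmed fraud cases out of 105 shortlisted customers.
print(round(inspection_hitrate(43, 105), 2))  # -> 40.95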
In the pilot test for the city of Bangi, 105 suspicious customers were shortlisted from the entire city, after which the TNBD SEAL teams performed onsite inspection of all 105 shortlisted customers. The SEAL teams' manual inspection revealed that 43 out of the 105 customers were confirmed fraud cases, resulting in an inspection hitrate of 40.95%, which was inclusive of both fraud activities and abnormalities (as mentioned in section 4.5.2.3). Therefore, the model testing and validation results for the three cities in the state of Kelantan and the city of Bangi in the state of Selangor can be accumulated together to obtain an average inspection hitrate of 40.75%.
The approximately 40% inspection hitrate obtained from the proposed fraud detection system, as compared to TNBD's current inspection hitrate of 3% to 5%, is a major improvement in terms of the fraud detection rate. Thus, on average, the fraud detection system identifies fraud customers at a rate 35-37 percentage points higher than the TNBD SEAL teams, and it has also been shown that it can imitate the capability of the SEAL teams without physically inspecting the electricity meters. Therefore, it is reasonable to say that the proposed fraud detection system is better, in terms of the inspection hitrate, than the current actions taken by TNBD.
The total number of customers remaining in the KL Barat station data after customer filtering and selection was approximately 186,900, as indicated earlier.
The pilot testing results for the KL Barat station data indicate that the average training accuracy of the SVC and the average inspection hitrate are significantly better than the model validation and testing results for the three cities in the Kelantan state of Malaysia. The logical reasoning behind this is that the load consumption patterns which signal fraud activities do not have exactly similar trends in rural and urban areas within Malaysia. Kuala Lumpur (KL) is an urban area in Malaysia, with a higher population density and a faster pace of life, whereas Kelantan is considered a rural area in Malaysia, with a lower population and a slower pace of life.
Since the SVC model is trained and tested/validated using data from the same station (KL Barat), the pilot testing results are significantly better than the results for the three cities in the state of Kelantan. This is because training and testing the fraud detection system with data from the same city/station involves similar trends of fraud consumption patterns.
Table 5.2: Comparison of the inspection hitrate for standard SVC versus the proposed SVC and FIS scheme

TNBD Station   No. of Customers Tested   Inspection Hitrate (SVC)   Inspection Hitrate (SVC and FIS)
Kota Bharu     76,595                    34.12%                     42.56%
Kuala Krai     18,880                    29.54%                     38.07%
Gua Musang     13,045                    31.98%                     41.39%
The reason behind the improved inspection hitrate using the computational intelligence scheme of SVC and FIS is that the FIS attempts to emulate the reasoning process that a human expert undertakes in detecting fraud activities from load consumption profiles. A clear and conclusive research finding is that the FIS is able to remove the hard limiting of the parameter values in the SQL statements in Table 4.10 with the help of the MFs; with the addition of experienced human knowledge into the fraud detection system, a better scheme for selecting suspicious customers from the SVC results is developed.
The reason for the relatively low inspection hitrate is that the amount of e-CIBS data provided by TNBD was limited, i.e. only two years of data was provided, due to problems associated with retrieving the archived data. The customer data provided by TNBD was therefore not sufficient to back-track a significant amount of the customer consumption history. Two years of customer consumption data from the 10-year archive contributes only 20% of the entire customer consumption history, and this limitation is the reason a lower inspection hitrate is obtained. Nonetheless, utilizing only 20% of the historical customer load consumption patterns for SVC training and testing, the fraud detection system is able to achieve an average inspection hitrate of approximately 40%, which raises TNBD's current inspection hitrate of 3-5% by 35-37 percentage points.
In the case of the pilot testing on the KL Barat station data, as indicated in section 5.2.2.1.1 previously, the inspection hitrate is higher, i.e. approximately 48%. The reason for the higher inspection hitrate is that the same station's data is used for both training and testing the SVC. In addition, with respect to the model testing and validation results obtained for the state of Kelantan in Malaysia, it is indicated that load consumption patterns relating to fraud activities have different trends for rural and urban populations within peninsular Malaysia. Therefore, the inspection hitrate may vary for different cities within peninsular Malaysia. However, with the use of the proposed fraud detection system, an average inspection hitrate of 40% is more likely than not achievable for any city within peninsular Malaysia.
The FIS in the proposed SVC and FIS fraud detection framework improves the inspection hitrate by 8 percentage points, due to the inclusion of human knowledge and intelligence in the system. The main drawback of the proposed fraud detection system is that customers who began committing fraud activities before the two-year period for which data is available will not be detected as suspicious customers. This is because customers who commit fraud activities before the two-year period exhibit seemingly normal load consumption patterns within that period, with no noticeable abrupt drops or sudden changes for the SVC to classify them as fraud suspects.
This sub chapter presents the experimental results obtained for model testing and evaluation using the two AI based classification techniques mentioned above.
In a BPNN, an input pattern is propagated forward through the network until an output pattern is generated by the output layer. If the pattern is different from the desired output, an error is calculated in each output neuron and then propagated backwards through the network, from the output layer to the input layer. The weights of each neuron are adjusted as the error is propagated. Figure 5.27 shows a typical architecture of a BPNN [215]. The following four steps illustrate the implementation of the BP algorithm.
Step 1: Initialization
All the weights and threshold levels of the network are set to random
numbers uniformly.
Step 2: Activation
The BPNN is activated by applying the inputs x1(k), x2(k), ..., xn(k) and the desired outputs, with values normalized within the range of 0 to 1. The actual output of the neurons in the hidden layer is then calculated using the function below:

yj(k) = sigmoid[ Σ(i=1..n) xi(k) · wij(k) - θj ]   (5.2)

Where
n is the number of inputs of neuron j in the hidden layer,
θj is the threshold applied to neuron j, and
sigmoid is the sigmoid activation function.

Step 3: Weight training
The error of each output neuron is propagated backwards from the output layer to the input layer, and the weights wij and wjk are updated accordingly.

Step 4: Iteration
The value of k is increased by one, and Step 2 is repeated. The iterations continue until the error criterion is satisfied.
Figure 5.27: Typical architecture of a BPNN, showing the input layer (x1, ..., xn), the hidden layer and the output layer (y1, ..., yl), with weights wij and wjk; input signals propagate forward and error signals propagate backwards
Many kinds of activation functions have been proposed, and the BP algorithm is applicable to all of them. A differentiable activation function makes the function computed by a neural network differentiable (assuming that the integration function at each node is just the sum of the inputs), since the network itself computes only function compositions.
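As an illustrative sketch only, and not the exact network used in this study, the following minimal NumPy implementation shows the forward and backward passes of a single-hidden-layer BPNN with sigmoid activations; the layer sizes, learning rate and data are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Step 1: initialize weights to small uniform random numbers.
n_in, n_hid, n_out = 25, 10, 1
W1 = rng.uniform(-0.5, 0.5, size=(n_in, n_hid))   # wij
W2 = rng.uniform(-0.5, 0.5, size=(n_hid, n_out))  # wjk
lr = 0.1

x = rng.normal(size=n_in)        # one input pattern (placeholder)
t = np.array([1.0])              # desired output

for _ in range(100):
    # Step 2: activation (forward pass, cf. eq. (5.2); thresholds omitted).
    y_hid = sigmoid(x @ W1)
    y_out = sigmoid(y_hid @ W2)

    # Step 3: weight training (backpropagate the output error).
    delta_out = (t - y_out) * y_out * (1 - y_out)
    delta_hid = y_hid * (1 - y_hid) * (W2 @ delta_out)
    W2 += lr * np.outer(y_hid, delta_out)
    W1 += lr * np.outer(x, delta_hid)
    # Step 4: iterate until the error criterion is satisfied.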
ELM is a general learning algorithm for SLFNs that works effectively for function
approximations, classifications, and online prediction problems. Moreover, it
can generally work well for a variety of types of applications. Usually, a SLFN
has three kinds of parameters: (i) the input weights wi, (ii) the hidden neuron biases bi, and (iii) the output weights βi. While conventional learning algorithms for SLFNs have to tune all three of these parameters, ELM randomly generates the input weights wi and the hidden neuron biases bi and then analytically determines the output weights. For N distinct training samples (xj, tj), a SLFN with Ñ hidden neurons and activation function g(x) is modeled as:

Σ(i=1..Ñ) βi · g(wi · xj + bi) = tj,   j = 1, 2, ..., N   (5.3)

Where
wi is the weight vector connecting the input neurons and the ith hidden neuron,
βi is the weight vector connecting the ith hidden neuron and the output neurons, and
bi is the bias of the ith hidden neuron.

Here wi · xj denotes the inner product of wi and xj. If the SLFN can approximate the N training samples with zero error, the N equations above can be written compactly as Hβ = T, where:

H(w1, ..., wÑ, b1, ..., bÑ, x1, ..., xN) =
[ g(w1 · x1 + b1)  ...  g(wÑ · x1 + bÑ) ]
[       ...        ...        ...       ]
[ g(w1 · xN + b1)  ...  g(wÑ · xN + bÑ) ]  (N × Ñ)   (5.4)

β = [β1ᵀ; ...; βÑᵀ] (Ñ × m)   and   T = [t1ᵀ; ...; tNᵀ] (N × m)   (5.5)

H is called the hidden layer output matrix of the SLFN, where the ith column of H is the ith hidden neuron output with respect to the inputs x1, x2, ..., xN. Based on the previous work of Huang in [154], the matrix H is square and invertible only if the number of hidden neurons is equal to the number of distinct training samples, Ñ = N, indicating that SLFNs can approximate these training samples with zero error. In most cases, the number of hidden neurons is much lower than the number of distinct training samples, Ñ << N, so H is a non-square matrix and there may not exist wi, bi, β (i = 1, ..., Ñ) such that Hβ = T. Thus, one specific set of ŵi, b̂i, β̂ (i = 1, ..., Ñ) needs to be found such that:

|| H(ŵ1, ..., ŵÑ, b̂1, ..., b̂Ñ) β̂ - T || = min over wi, bi, β of || H(w1, ..., wÑ, b1, ..., bÑ) β - T ||   (5.6)

whose smallest norm least squares solution is:

β̂ = H†T   (5.7)

where H† denotes the Moore-Penrose generalized inverse of H.
Huang in [218, 222] indicates that the hidden neuron parameters need not be tuned, as the matrix H converts the data from non-linearly separable cases to high dimensional linearly separable cases. Furthermore, Huang in [220] showed that the input weights and hidden neuron or kernel parameters do not need to be tuned and can be randomly selected and then fixed. Thus, for fixed input weights and hidden layer biases or kernel parameters, training a SLFN is equivalent to finding a least squares solution of the linear system Hβ = T.
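A minimal sketch of batch ELM in NumPy follows, assuming a sigmoid activation and placeholder data; it is illustrative only and not the exact configuration evaluated in this study:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Batch ELM: random hidden parameters, least squares output weights."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_features, n_hidden))  # random input weights w_i
    b = rng.normal(size=n_hidden)                # random hidden biases b_i
    H = sigmoid(X @ W + b)                       # hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                 # beta = H† T, cf. eq. (5.7)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(383, 25))                   # placeholder samples
T = rng.integers(0, 2, size=(383, 1)).astype(float)
W, b, beta = elm_train(X, T, n_hidden=40, rng=rng)
print(elm_predict(X[:5], W, b, beta).round(2))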
In order to handle online applications, a variant of ELM referred to as the Online-Sequential Extreme Learning Machine (OS-ELM) was introduced by Liang et al. [223]. It was proposed to overcome a limitation of ELM as developed by Huang [220]: because ELM is a batch learning algorithm, it cannot be applied directly to data that arrive over time. As, in the real world, training data may arrive either chunk-by-chunk or one-by-one, online-sequential learning is most suitable to cater for such variations. The OS-ELM algorithm is designed to handle both additive neurons and RBF nodes, having originally been developed for SLFNs with additive or radial basis function (RBF) hidden nodes in a unified framework. Unlike other sequential learning algorithms that require many parameters to be tuned, OS-ELM only requires the number of hidden nodes to be specified.
The OS-ELM as proposed by Liang et al. [224] consists of two phases: (i) an initialization phase, and (ii) a sequential learning phase. In the initialization phase, the number of data samples required should be at least equal to the number of hidden nodes. This phase trains the SLFN using the primitive ELM method on an initial batch of training data, which is discarded once the process is complete. Following the initialization phase, in the sequential learning phase, OS-ELM learns the training data chunk-by-chunk, and all training data is discarded once the learning procedure involving that data is complete.
In the derivation of OS-ELM, only the specific matrix H is considered where the rank of H is equal to the number of hidden neurons, rank(H) = Ñ. Under this condition, the left pseudo-inverse of H is derived and given by H† = (HᵀH)⁻¹Hᵀ, which is called the left pseudo-inverse of H from the fact that H†H = I. The corresponding estimation of β is given by:

β̂ = (HᵀH)⁻¹HᵀT   (5.8)
The OS-ELM algorithm can then be summarized as follows.

Initialization phase: given an initial batch of N0 training samples, where N0 ≥ Ñ:
(a) Assign random input weights wi and biases bi (or centres μi and impact widths σi for RBF nodes), i = 1, 2, ..., Ñ.
(b) Calculate the initial hidden layer output matrix H0.
(c) Estimate the initial output weight β(0) = P0 H0ᵀ T0, where P0 = (H0ᵀ H0)⁻¹.
(d) Set k = 0.

Sequential learning phase: for each further observation (xi, ti), where xi ∈ Rⁿ, ti ∈ Rᵐ and i = N0 + 1, N0 + 2, N0 + 3, ..., do the following:
(a) Calculate the hidden layer output vector h(k+1) for the new observation.
(b) Update the output weight by the recursive least squares equations:

P(k+1) = P(k) - [ P(k) h(k+1) h(k+1)ᵀ P(k) ] / [ 1 + h(k+1)ᵀ P(k) h(k+1) ]
β(k+1) = β(k) + P(k+1) h(k+1) [ t(k+1)ᵀ - h(k+1)ᵀ β(k) ]

(c) Set k = k + 1.
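A minimal one-by-one OS-ELM update sketch in NumPy is shown below; it follows the recursive least squares form above, with a sigmoid activation and placeholder data assumed purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_features, n_hidden, n_init = 25, 40, 60

# Random, fixed hidden layer parameters.
W = rng.normal(size=(n_features, n_hidden))
b = rng.normal(size=n_hidden)
hidden = lambda X: sigmoid(X @ W + b)

# Initialization phase on N0 >= n_hidden samples (placeholder data).
X0 = rng.normal(size=(n_init, n_features))
T0 = rng.integers(0, 2, size=(n_init, 1)).astype(float)
H0 = hidden(X0)
P = np.linalg.inv(H0.T @ H0)
beta = P @ H0.T @ T0

# Sequential learning phase: update one observation at a time.
for _ in range(100):
    x = rng.normal(size=(1, n_features))
    t = rng.integers(0, 2, size=(1, 1)).astype(float)
    h = hidden(x).T                                   # column vector (n_hidden, 1)
    P = P - (P @ h @ h.T @ P) / (1.0 + h.T @ P @ h)
    beta = beta + P @ h @ (t - h.T @ beta)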
The ELM and OS-ELM provide a faster learning capability than conventional machine learning algorithms. Unlike many other popular learning algorithms, little human involvement is required in OS-ELM. Except for the number of hidden neurons (to which OS-ELM is relatively insensitive), no other parameters need to be tuned manually by the user, because the algorithm chooses the input weights randomly and analytically determines the output weights. Furthermore, as Huang and Chen [218] have proven, OS-ELM is in fact a learning algorithm for generalized SLFNs.
In summary, OS-ELM randomly generates the input weights and the hidden neuron biases of the SLFN and uses them to calculate the output weights without requiring further iterative learning.
In the classification results, the inspection hitrate and the training accuracy of the proposed SVC and FIS model are compared to the experimental results obtained for the ML-BPNN and OS-ELM models. The ML-BPNN and OS-ELM techniques are evaluated using the same data preprocessing framework outlined in Figure 4.5. The experimental results obtained for the proposed SVC and FIS model versus the ML-BPNN and OS-ELM models are tabulated in Table 5.3.
Table 5.3: Experimental results of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM classification techniques

Model         TNBD Station   Training Accuracy (Memorization)   Inspection Hitrate
ML-BPNN       Kota Bharu     85.23%                             28.51%
              Kuala Krai     81.08%                             23.93%
              Gua Musang     83.12%                             26.38%
OS-ELM        Kota Bharu     82.96%                             33.06%
              Kuala Krai     75.43%                             29.97%
              Gua Musang     81.71%                             30.45%
SVC and FIS   Kota Bharu     81.66%                             42.56%
              Kuala Krai     73.56%                             38.07%
              Gua Musang     79.27%                             41.39%
As indicated by Table 5.3, the highest training accuracy is achieved by the ML-BPNN, and the highest inspection hitrate is obtained by the SVC and FIS model. The training accuracy is a measure of the memorization capability of the classifier.
For the comparative study, the architecture of the BPNN is chosen to have 25 inputs, two hidden layers and one output layer. The single neuron in the output layer of the BPNN gives an output of 0 for good customers and an output of 1 for suspicious customers. All 25 features are fed as input data to the input layer of the BPNN. The actual output of the neurons in the hidden layers is calculated using the activation function defined in eq. (5.2).
In the comparative research study, the OS-ELM was implemented with the Radial Basis Function (RBF) activation function. In the OS-ELM with RBF nodes, the centres and widths of the nodes were randomly generated and fixed, and based on these, the output weights were analytically determined. The number of hidden neurons used in the OS-ELM was varied in the range from 20 to 200, following the method for searching for the optimal hidden layer size suggested by Huang et al. in [220]. Starting from an initial size of 20, the number of neurons is increased in steps of 20 until 200 neurons are reached, and the optimal number of neurons is decided based on the output performance. Finally, based on the optimal number of neurons, 100 trials are performed, after which the average accuracy is determined based on the best output.
The number of neurons in the hidden layers of the BPNN and OS-ELM are tuned using CV. In this research study, 10-fold CV is chosen, since there is sufficient training data to be divided into subsets. The reason for using CV is to ensure that the results do not overfit the training data. Table 5.4 summarizes the comparison results from Table 5.3 by computing the average training accuracy and the average inspection hitrate for the tested customers in the three cities in the state of Kelantan.
As indicated by Table 5.4, the ML-BPNN achieves the highest training accuracy and the lowest inspection hitrate. In contrast, the OS-ELM obtains a somewhat higher inspection hitrate than the ML-BPNN. With respect to the training accuracy, the proposed SVC and FIS model obtains the lowest value; however, in terms of the inspection hitrate, the proposed SVC and FIS model outperforms the other two models by far, with an average inspection hitrate of 40.75%. The increase in the inspection hitrate of the proposed model is approximately 14.5 and 9.5 percentage points as compared to the ML-BPNN and OS-ELM respectively.
Table 5.4: Comparison of the average training accuracy and inspection hitrate of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM

Model         Training Accuracy (Memorization)   Inspection Hitrate (Onsite Inspection)
ML-BPNN       83.14%                             26.27%
OS-ELM        80.03%                             31.16%
SVC and FIS   78.16%                             40.75%
The 10-fold CV method used for tuning the hidden layer neurons in the ML-BPNN and OS-ELM first partitions the training data into 10 subsamples. Of the 10 subsamples, a single subsample is retained as the validation data for testing the model, and the remaining 9 subsamples are used as the training data. The CV process is then repeated 10 times, with each of the 10 subsamples used exactly once as the validation data. The results from the 10 folds are then averaged to produce a single estimate, which is generally termed the training accuracy of the classifier, defined by eq. (4.5).
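The 10-fold procedure described above can be written compactly; the sketch below assumes scikit-learn and placeholder data, purely for illustration:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(383, 25))          # placeholder features
y = rng.integers(0, 2, size=383)        # placeholder labels

# Each of the 10 folds is used exactly once as validation data;
# the mean of the 10 fold scores is the CV (training) accuracy.
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma=0.92), X, y, cv=10)
print("10-fold CV accuracy: %.2f%%" % (100 * scores.mean()))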
As indicated by Table 5.4, all three models have a relatively good memorization capability, i.e. high training accuracies. However, the ML-BPNN obtained the highest training accuracy, followed by the OS-ELM and then the proposed SVC and FIS model. The main reason for the higher training accuracy of the BPNN is that BPNN training is a non-linear optimization problem solved using a gradient descent approach, and the major peculiarity that impacts its performance is the presence of local minima. The main drawback of the BPNN is that it can get trapped in a local minimum: the error surface is defined with respect to the Mean Square Error (MSE), and when training converges to a local minimum it generally stops there, resulting in a locally optimized answer. The local minimum phenomenon is illustrated in Figure 5.28, in which case the training must be further optimized to reach the globally optimal solution.
Figure 5.28: The phenomenon of local minimum in a BPNN (MSE plotted against a weight wj, showing a local minimum and the global minimum, i.e. the best solution)
It is also observed from the experimental results in Table 5.4 that the BPNN achieves the lowest inspection hitrate, of 26.27%. This indicates that the BPNN has a lower generalization capability than the OS-ELM and SVC. The main reason is that it is difficult to obtain the best network structure for the BPNN using a trial and error procedure, so the optimum solution cannot easily be found. In more practical terms, as the BPNN tries to minimize the training error, it ends up overfitting the training data. This does not mean that the BPNN is not a good classification algorithm in general, but as much noisy training data is present, the BPNN is not a suitable choice for this application.
The OS-ELM used for comparison purposes in this research study overcomes many issues in traditional gradient algorithms such as the BPNN, including the stopping criterion, learning rate, number of epochs and local minima. The experimental results in Table 5.4 reveal that the training accuracy of the OS-ELM is slightly higher, and the inspection hitrate of the OS-ELM significantly lower, than those of the proposed SVC and FIS model. The reason for the higher training accuracy of the OS-ELM is that the OS-ELM recursively updates the network's output weights using finite samples of the training data, which yields a higher memorization capability. For the RBF activation function, the OS-ELM randomly initializes the hidden neuron parameters (input weight vectors and neuron biases for additive hidden neurons; centres and impact factors for RBF hidden neurons) and then computes the output weight vector.
Furthermore, it was observed that, if the order of the training samples is switched or changed, the resulting training accuracy of the OS-ELM also changes. Therefore, to cater for this situation, the training accuracy was computed as an average over 100 trials, where on each trial the training samples were ordered randomly. Another noticeable observation for the OS-ELM was that, with an increasing number of neurons, the OS-ELM achieved better performance, while remaining stable over a wide range of neuron sizes. However, with an increase in the number of hidden layer neurons, the training time of the OS-ELM decreases.
The inspection hitrate achieved by the OS-ELM during testing and validation was 31.16%, which is significantly lower than that of the proposed SVC and FIS model, with its inspection hitrate of 40.75%. A few reasons can be stated that contribute to the lower generalization capability of the OS-ELM. The first reason is that the assignment of the initial weights in the OS-ELM is arbitrary, which affects the generalization performance of the classifier: as the proper selection of input weights and hidden bias values contributes to the generalization performance, their arbitrary assignment degrades it. The second reason is that the value of γ (Gamma) in the RBF activation function is set to a constant value of 1, the cause of which is unclear and is not stated by the authors in [223].
The only justification given is that, for the three classification problems conducted by Liang et al. in [223], namely (i) image segmentation, (ii) the satellite image problem, and (iii) the DNA problem, the RBF activation function with the value γ = 1 achieves the best classification performance. It is questionable, however, to set γ as a constant value, since the width of the Gaussian function depends upon the data to be classified and the amount of noise present in the data. For the three classification problems in [223], the value γ = 1 might result in the best classification accuracy; however, this might not hold for other types of classification problems with different data. Furthermore, no evidence or literature was found for the OS-ELM on how to tune the γ parameter in the RBF activation function.
The last reason contributing to the lower inspection hitrate of the OS-ELM is that, although OS-ELM only requires one parameter to be fine tuned, namely the number of neurons in the hidden layer, in reality it is relatively difficult to obtain the best network structure using a trial and error procedure, as the optimum solution cannot easily be found. The ELM and OS-ELM do suffer from a few drawbacks, which are indicated as follows:
(a) To achieve comparable results, the number of neurons in the hidden layer must be chosen larger than in standard BP algorithms. This is because the neuron weights and biases are not learned from the data.
(b) As there is only one hidden layer in the SLFN, Multi-Layer Perceptron (MLP) networks with more than one hidden layer, if trained properly, can achieve similar or even better results than ELM and OS-ELM.
(c) The solution provided by ELM and OS-ELM is not always smooth and often shows some ripple.
The method of using SVC for fraud detection is very promising, as the SVC and FIS model achieves the highest inspection hitrate for fraud customer detection, as indicated in Table 5.4. Firstly, SVC has non-linear dividing hypersurfaces that give it a high discrimination capability. Secondly, SVC provides good generalization ability for classifying unseen data. Lastly, SVC determines the optimal network structure itself, without requiring the trial-and-error structure tuning needed for the ML-BPNN and the OS-ELM. In contrast to these advantages of SVMs over neural networks, SVMs do have some drawbacks, which relate to practical aspects concerning memory limitations and real-time training. Some of the major drawbacks of SVMs are as follows:
(a) The optimization problem arising in SVMs is not easy to solve. Since the number of Lagrange multipliers is equal to the number of training samples, the training process is relatively slow. Even with the use of the SMO algorithm, real-time training is not possible for a large set of data.
In comparing SVC to the OS-ELM, the only advantage of OS-ELM over SVC is its faster training process, which improves further with an increase in the chunk size. It is well known that, with the RBF as the kernel function, SVMs suffer from tedious parameter tuning. However, even though the OS-ELM has only a single parameter to be tuned, its arbitrary assignment of initial weights requires it to search for the optimal number of neurons and to execute many trials in order to obtain an average value. Hence, in this case, the OS-ELM loses its edge over SVC. Given all these aspects, the author considers SVC the superior technique when the requirement is to solve a classification problem.
5.3 Summary
The first sub chapter presented the Graphical User Interface (GUI) developed for the fraud detection system. The GUI of the developed software generates the detection report containing the list of suspicious customers and the average daily consumption report. The second sub chapter presented the model testing results based on: (i) the classifier, (ii) pilot testing, and (iii) comparison of the proposed SVC and FIS model with other AI based classification techniques. The contribution of the FIS to hitrate improvement was also discussed, and the computational intelligence scheme of SVC and FIS was compared to standard SVC. Finally, at the end of sub chapter 2, a comparative study of the proposed SVC and FIS model was performed with two AI based classification techniques, (i) the Multi-Layer Backpropagation Neural Network (ML-BPNN) and (ii) the Online-Sequential Extreme Learning Machine (OS-ELM), in order to evaluate the efficiency and effectiveness of the proposed model.
CHAPTER 6
CONCLUSION AND FUTURE WORK
6.0 Overview
This chapter concludes the thesis and summarizes the research contributions
made. The achievements and objectives of the research study with respect to
the project are highlighted along with the key findings of the research. In
addition, this chapter also discusses the impact and significance of this project
to TNB in Malaysia and suggests future research in the present context that
merits consideration.
In the developed fraud detection system, SVC has a considerable advantage over neural networks, as it provides the use of soft margins for the purpose of separation (classification), thus allowing an improvement in generalization performance as compared to neural networks.
The FIS, as indicated by Table 5.2, gives an additional boost to the inspection hitrate through the inclusion of experienced human knowledge in the fraud detection system. The main reason behind the increased inspection hitrate using the FIS is that the FIS attempts to emulate the reasoning process that a human expert undertakes in detecting fraud activities. In addition, the FIS is able to remove the hard limiting of the parameter values from the SQL statements with the help of the MFs. Thus, with the use of the FIS, the fraud detection system combines two computational intelligence schemes, i.e. the SVC and the FIS, and achieves a higher fraud detection accuracy.
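To illustrate how a fuzzy MF can replace a hard SQL-style threshold, the sketch below shows a hand-rolled trapezoidal membership function in Python; the variable name, breakpoints and example value are hypothetical and are not those used in the actual FIS:

def consumption_drop_low(x):
    """Trapezoidal MF for a 'low consumption drop' (breakpoints assumed)."""
    if x <= 0.2:
        return 1.0
    if x >= 0.5:
        return 0.0
    return (0.5 - x) / (0.5 - 0.2)   # linear slope between the breakpoints

def consumption_drop_high(x):
    """Complementary MF for a 'high consumption drop'."""
    return 1.0 - consumption_drop_low(x)

# A hard SQL threshold (e.g. WHERE drop > 0.5) scores 0 or 1; the MF
# instead grades a borderline drop of 0.45 as partially suspicious:
print(round(consumption_drop_high(0.45), 2))   # -> 0.83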
The average pilot testing results for the proposed SVC and FIS model indicate that an inspection hitrate of 48% is achievable. However, this does not hold for all cities within peninsular Malaysia, as the average inspection hitrate obtained using testing data from different cities within peninsular Malaysia is 40.75%, as indicated in Table 5.4. This is because, in pilot testing, the data used for training and for testing/validating the SVC model come from the same station, which results in a higher inspection hitrate. Thus, with the use of the proposed fraud detection system, an average inspection hitrate of 40% is more likely than not achievable for any city within peninsular Malaysia.
Comparison of the proposed SVC and FIS model with standard SVC indicates that the SVC and FIS model obtains a higher inspection hitrate, as shown in Table 5.2. On average, using the FIS scheme, the fraud inspection hitrate increases from approximately 32% to 40%, indicating that the FIS contributes an increase of around 8 percentage points to the inspection hitrate, which is a significant increase in the detection accuracy. Thus, the fraud detection system is able to achieve an average inspection hitrate of 40%, which raises TNBD's current inspection hitrate of 3-5% by 35-37 percentage points.
Since SVMs and neural networks are both considered black box models, which share a similar structure but utilize different learning methods, a comparative study was performed using SVC and neural networks. The comparative results between the neural networks (ML-BPNN and OS-ELM) and the proposed SVC and FIS model indicate that the proposed model delivers a higher performance, i.e. a higher inspection hitrate, than the ML-BPNN and OS-ELM, as indicated by Table 5.4. This shows that the generalization performance of the SVC and FIS model is significantly better than that of the ML-BPNN and the OS-ELM.
The second objective of this research study was to identify, detect and predict customers with fraud activities, abnormalities and other irregularities by investigating and monitoring significant deviations and abrupt changes in historical customer consumption patterns using billing data. Through the implementation of this research study, the SVC model developed utilizes load consumption patterns, i.e. the load profiles of customers, in order to detect and identify customers with fraud activities and abnormalities. As this problem is purely a pattern classification task, fraud customers are identified by detecting abrupt drops or sudden changes in the load consumption patterns of customers. The experimental results obtained for model testing and evaluation clearly indicate that fraud activities relating to NTLs can be detected and identified by investigating historical load consumption patterns of customers.
In the fourth objective of the research study, a fraud detection model using a combination of two AI based computational intelligence schemes, namely the Support Vector Machine (SVM) and the Fuzzy Inference System (FIS), was proposed to be developed. Through this research study, a SVC and FIS computational intelligence model for fraud detection was developed, which is fully functional and ready to be used. In the developed SVC and FIS model, the SVC is used as the core of the pattern classification engine in order to detect and identify fraud customers based on their load consumption profiles, while the FIS is used as a data postprocessing scheme in order to select suspicious customers by correlating customer data with the SVC results.
Lastly, this research aimed to evaluate the proposed fraud detection system using customer data from TNBD for different cities within peninsular Malaysia, and to provide a comparative study using different AI based classification techniques for the purpose of benchmarking the proposed model. Based on the research study conducted, the developed SVC and FIS computational intelligence model was tested and validated using TNBD customer data from different cities within peninsular Malaysia, as indicated in section 5.2.2. In addition, a comparative study was conducted, benchmarking the proposed SVC and FIS model against two neural networks: (i) the ML-BPNN, and (ii) the OS-ELM. The results of the comparative study indicate that the proposed model is significantly better in terms of the fraud detection hitrate than the other two AI based classification techniques. The developed fraud detection system does show encouraging results; however, it also has a few limitations, which can be addressed by implementing the future work suggested in section 6.5.
Large inspection campaigns have been carried out by TNBD with little success. The current actions taken by TNBD SEAL teams to address the problem of NTLs include: (i) meter checking and premise inspection, (ii) reporting on irregularities, and (iii) monitoring of unbilled accounts, which have resulted in a detection rate of 3-5% of the total inspections carried out. This is because, at present, customer installation inspections are carried out without any specific focus or direction. Most inspections are carried out at random, while some targeted raids are undertaken based on information reported by the public or by meter readers.
4. Lastly, by using the proposed fraud detection system, substantial time savings in detecting and identifying problematic electricity meters can be achieved by power utilities, including TNB in Malaysia.
The section below discusses a scenario whereby the developed fraud detection system can help reduce TNB's operational cost from onsite inspections in monitoring NTL activities. The scenario is based on TNB's experience for industrial and commercial OPCs in the financial year 2005 [125, 198]. This scenario uses the minimum inspection hitrate achieved by the fraud detection system in model testing and evaluation, which is 38%, as indicated in Table 5.1. A simple assessment of the benefit of the developed fraud detection system is illustrated as follows:
The proposed fraud detection system can also be extended to the other power utilities in Malaysia, such as: (i) Sabah Electricity Sdn. Bhd. (SESB), and (ii) Sarawak Electricity Supply Corporation (SESCO). The fraud detection system can be implemented in the SESB and SESCO distribution networks in order to evaluate the detection accuracy and performance of the system. It is strongly believed that the fraud detection system can contribute a significant improvement to SESB's and SESCO's distribution networks for the reduction of NTL activities.
The sections below provide brief suggestions and recommendations for future work that can be carried out on the current fraud detection system in order to increase its detection accuracy.
A practical difficulty in using SVC is the selection of the parameter C and the kernel parameter γ in the RBF (Gaussian) kernel. Even with the use of k-fold CV with the Grid Search method, as used in this research study, an optimal solution might not be achieved. The parameters (C, γ) need to be set to their optimal values in order to minimize the expected testing and validation error, which requires adapting multiple parameter values at the same time.
The Genetic Algorithm (GA), with its characteristics of high efficiency and global optimization, has been widely applied to solve optimization problems in many applications. Thus, it is suggested that, in order to solve the Dual Lagrangian Optimization (DLO) problem in SVC and the selection of optimal parameters, a hybrid combination of the SVC and the GA can be implemented as future work for this research study. The hybrid SVC-GA algorithm can be developed as an alternative technique to search for the optimal SVC parameters, avoiding local optima in finding the maximum of the Lagrangian. A minimal sketch of such a GA based parameter search is given below.
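The following sketch evolves (C, γ) pairs with a simple generational GA, using 10-fold CV accuracy as the fitness; the population size, mutation scale and data are assumptions for illustration only, not a definitive SVC-GA design:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 25))           # placeholder features
y = rng.integers(0, 2, size=200)         # placeholder labels

def fitness(log2_c, log2_g):
    """10-fold CV accuracy of an RBF SVC with the given parameters."""
    clf = SVC(kernel="rbf", C=2.0 ** log2_c, gamma=2.0 ** log2_g)
    return cross_val_score(clf, X, y, cv=10).mean()

# Initial population of (log2 C, log2 gamma) pairs.
pop = rng.uniform([-5, -7], [5, 3], size=(12, 2))

for generation in range(10):
    scores = np.array([fitness(c, g) for c, g in pop])
    elite = pop[np.argsort(scores)[-6:]]            # keep the best half
    # Crossover: average random pairs of elite parents.
    parents = elite[rng.integers(0, 6, size=(6, 2))]
    children = parents.mean(axis=1)
    # Mutation: small Gaussian perturbation of the offspring.
    children += rng.normal(scale=0.5, size=children.shape)
    pop = np.vstack([elite, children])

best = pop[np.argmax([fitness(c, g) for c, g in pop])]
print("Best (C, gamma): (%.3f, %.3f)" % (2.0 ** best[0], 2.0 ** best[1]))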
In order to achieve a better kernel for SVC, one possible approach is to adjust the velocity of decrement in each range of the Euclidean distance between two points. The multi-scale kernel obtained using this method should, however, maintain the characteristics of the RBF kernel. To implement this multi-scale RBF kernel, a combination of RBF kernels at different scales is suggested as future work for this research study. To proceed, the linear combination of RBF kernels must first be shown to satisfy Mercer's condition. It is hoped that, with the use of this proposed kernel, the classification accuracy of the SVC model will improve.
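As a sketch only, a two-scale RBF kernel (a convex combination of two Gaussians, which remains a valid Mercer kernel) can be supplied to scikit-learn's SVC as a kernel callable; the scales, mixing weight and data below are assumptions for illustration:

import numpy as np
from sklearn.svm import SVC

def multiscale_rbf(A, B, g1=0.1, g2=2.0, w=0.5):
    """Convex combination of two RBF kernels at different widths."""
    # Squared Euclidean distances between every row of A and of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return w * np.exp(-g1 * d2) + (1.0 - w) * np.exp(-g2 * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 25))          # placeholder features
y = rng.integers(0, 2, size=100)        # placeholder labels

# SVC accepts a callable kernel that computes the Gram matrix.
clf = SVC(kernel=multiscale_rbf, C=1.0)
clf.fit(X, y)
print("Training accuracy: %.2f" % clf.score(X, y))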
Since the fuzzy MFs were derived directly from the SQL statements' parameter values, the fuzzy MFs are not optimized or fine tuned. Fine tuning the fuzzy MFs for the derived fuzzy rules can improve the inspection hitrate of the fraud detection system, by increasing the number of suspicious customers shortlisted and reducing the number of abnormalities in the output of the fraud detection system.
An alternative feature extraction technique can be defined as:

x(m) = |kWh(m+1) - kWh(m)| / max over m of |kWh(m+1) - kWh(m)|,   m = 1, 2, ..., 24   (6.1)

Where
|kWh(m+1) - kWh(m)| represents the absolute difference in the monthly kWh consumption between consecutive months, and the denominator normalizes the features to the range of 0 to 1.
For future work in this research study, it is suggested that the alternative feature extraction technique in eq. (6.1) be used as a measure to compare the performance of the fraud detection system against the proposed feature extraction technique in eq. (4.1). By using eq. (6.1) in the proposed SVC and FIS model, it is hoped that a higher inspection hitrate can be obtained. A sketch of this feature extraction is shown below.
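The following sketch computes the alternative features for one customer; the normalization by the maximum consecutive-month difference is an assumption consistent with eq. (6.1) as reconstructed above, and the consumption history is a placeholder:

import numpy as np

def alternative_features(monthly_kwh):
    """24 normalized absolute differences from 25 monthly kWh readings."""
    kwh = np.asarray(monthly_kwh, dtype=float)
    diffs = np.abs(np.diff(kwh))            # |kWh(m+1) - kWh(m)|, m = 1..24
    return diffs / diffs.max()              # normalize to the range [0, 1]

rng = np.random.default_rng(0)
history = rng.uniform(100, 500, size=25)    # placeholder two-year history
print(alternative_features(history).round(2))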
6.6 Conclusion
In conclusion, the overall research has shown encouraging results and a
performance that matches human intelligence in detecting fraud customers and
abnormalities in peninsular Malaysia. The main contribution of this research
study is that, the fraud detection system developed in order to assist TNBD
SEAL teams is able to achieve an average inspection hitrate of 40%, which
increases TNBDs current inspection hitrate 35-37% from their current hitrate
of 3-5%. The outcome of this research study, which is the AFDS, is currently in
use by TNB in Malaysia, for their residential, commercial and light industry
customers in the low voltage distribution network. It is expected that,
utilization the fraud detection system will benefit TNB not only in improving its
handling of NTLs, but will complement their existing on-going practices and it is
envisaged that tremendous savings will result from the use of this system.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
M. Syeda, Y.-Q. Zhang, and Y. Pan, Parallel Granular Neural Networks for
Fast Credit Card Fraud Detection, in Proc. of the 2002 IEEE International
Conference on Fuzzy Systems, Honolulu, Hawaii, May 2002.
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
Y. Chen, Support Vector Machines and Fuzzy Systems, Springer US, 2008.
Part III, pp. 205-223.
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
K. L. Lo, Z. Zakaria, and M. H. Sohod, Application of Two-Stage Fuzzy CMeans in Load Profiling, WSEAS Transactions on Information Science
and Applications, vol. 2, no. 11, Nov. 2005, pp. 1905-1912.
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
L. Bjornar and A. Chinatsu, Fast and Effective Text Mining using LinearTime Document Clustering, in Proc. of the 5th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Diego,
California, United States, Aug. 1999.
[70]
[71]
[72]
[73]
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
[85]
[86]
G. Chicco, R. Napoli, and F. Piglione, Load Pattern Clustering for ShortTerm Load Forecasting of Anomalous Days, in Proc. of the IEEE Porto
Power Tech, Porto, Sept. 2001.
[87]
[88]
[89]
[90]
[91]
[92]
[93]
[94]
[95]
[96]
[97]
[98]
[99]
[100] Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang, Survey of Fraud
Detection Techniques, in Proc. of the IEEE International Conference on
Networking, Sensing and Control, Taipei, Taiwan, Mar. 2004.
[101] V. J. Hodge and J. Austin, A Survey of Outlier Detection Methodologies,
Artificial Intelligence Review, vol. 22, no. 2, Oct. 2004, pp. 85-126.
[102] Z. Ferdousi and A. Maeda, Unsupervised Outlier Detection in Time Series
Data, in Proc. of the 22nd International Conference on Data Engineering
Workshops, Atlanta, Georgia, U.S.A., Apr. 2006.
[103] E. Lozano and E. Acufia, Parallel Algorithms for Distance-Based and
Density-Based Outliers, in Proc. of the 5th IEEE International Conference
on Data Mining, Houston, Texas, U.S.A., Nov. 2005.
[104] D. Ren, I. Rahal, and W. Perrizo, A Vertical Outlier Detection Algorithm
with Clusters as By-Product, in Proc. of the 16th IEEE International
Conference on Tools with Artificial Intelligence, 2004, Boca Raton,
Florida, U.S.A., Nov. 2004.
[105] D. Ren, B. Wang, and W. Perrizo, RDF: A Density-Based Outlier Detection
Method using Vertical Data Representation, in Proc. of the 4th IEEE
International Conference on Data Mining, Brighton, U.K., Nov. 2004.
[106] M. Markou and S. Singh, Novelty Detection: A Review Part 1: Statistical
Approaches, Signal Processing, vol. 83, no. 12, Dec. 2003, pp. 2481-2497.
[107] J. Laurikkala and M. Juhola, Hierarchical Clustering of Female Urinary
Incontinence Data Having Noise and Outliers, Berlin: Springer Berlin /
Heidelberg, 2001, pp. 161-67.
[108] M. Markou and S. Singh, Novelty Detection: A Review Part 2: Neural
Network Based Approaches, Signal Processing, vol. 83, no. 12, Dec. 2003,
pp. 2499-2521.
[109] E. H. Feroz and T. M. Kwon, Self-Organizing Fuzzy and MLP Approaches
to Detecting Fraudulent Financial Reporting, in Proc. of the IEEE/IAFE
1996 Conference on Computational Intelligence for Financial
Engineering, New York City, NY, Mar. 1996.
[110] T. M. Kwon and E. H. Feroz, A Multilayered Perceptron Approach to
Prediction of the SEC's Investigation Targets, IEEE Transactions on
Neural Networks, vol. 7, no. 5, Sept. 1996, pp. 1286-1290.
[125] TNB, Annual Report Tenaga Nasional Berhad 2005, Tenaga Nasional
Berhad, Malaysia, 2005.
[126] TNB, Annual Report Tenaga Nasional Berhad 2004, Tenaga Nasional
Berhad, Malaysia, 2004.
[127] F. E. Jin, 2004. Impianas Gets Extension for TNB Project New Straits
Times. http://www.highbeam.com/doc/1P1-96215780.html. Accessed
on March 5, 2009.
[128] R. Damodaran, 2004. Leader in Electricity Metering, Card-based Revenue Collection, New Straits Times. http://www.highbeam.com/doc/1P1-90630620.html. Accessed on March 5, 2009.
[129] S. S. A. Naser and A. Z. A. Ola, An Expert System for Diagnosing Eye Diseases using Clips, Journal of Theoretical and Applied Information Technology, vol. 4, no. 10, 2008, pp. 923-930.
[130] John McCarthy, 2007. Homepage What is Artificial Intelligence?.
http://www-formal.stanford.edu/jmc/whatisai/whatisai.html. Accessed
on March 17, 2009.
[131] J. McCarthy, M. L. Minsky, N. Rochester, C.E. Shannon, A Proposal for the
Dartmouth Summer Research Project on Artificial Intelligence, August 31,
1955.
[132] D. Crevier, AI: The Tumultuous Search for Artificial Intelligence, New
York, NY: BasicBooks, 1993.
[133] S. J. Russell, and P. Norvig, Artificial Intelligence: A Modern Approach,
2nd edition, Upper Saddle River, NJ: Prentice Hall, 2003.
[134] H. Moravec, Mind Children: The Future of Robot and Human Intelligence,
Harvard University Press, 1988.
[135] A. Hodges, Alan Turing: The Enigma, Walker and Company, New York,
2000.
[136] John McCarthy, 2007. Basic Questions What is Artificial Intelligence?.
http://www-formal.stanford.edu/jmc/whatisai/node1.html. Accessed
on March 23, 2009.
[137] National Research Council, Developments in Artificial Intelligence,
Funding a Revolution: Government Support for Computing Research,
National Academy Press, Washington D.C., 1999.
[138] N. M. Barnes, and Z. Q. Liu, Vision Guided Circumnavigating Autonomous
Robots, International Journal of Pattern Recognition and Artificial
Intelligence, vol. 14 (6), 2000, pp 689-714.
[162] S. Chen, and G. Jiang, The Prediction Model of Multiple Myeloma Based on
the BP Artificial Neural Network, in Proc. of the International Conference
on Technology and Applications in Biomedicine, 30-31 May 2008, pp.
380-382.
[163] D. T. Pham, and P. T. N. Pham, Artificial intelligence in Engineering,
International Journal of Machine Tools and Manufacture, vol. 39 no.6,
1999, pp. 937-949.
APPENDICES
APPENDIX A
LIBSVM Copyright Notice
APPENDIX B
Related List of Publications
[1]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed,
Farrukh Nagi, "A Computational Intelligence Scheme for Prediction of
the Daily Peak Load", submitted for second review to Applied Soft
Computing (ASOC) on 10 August 2010. Manuscript Reference No: ASOC-D-09-00556.
[2]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed,
Farrukh Nagi, "Improving SVM-based Nontechnical Loss Detection in
Power Utility Using Fuzzy Inference System", accepted for publication
in IEEE Transactions on Power Delivery on 22nd June 2010. Manuscript
ID: PESL-00108-2009.R2.
[3]
Mohammad Mehdi Badjian, Jawad Nagi, Sieh Kiong Tiong, Keem Siah
Yap, Siaw Paw Koh, Farrukh Nagi, Comparison of Supervised
Learning Techniques for Non-Technical Loss Detection in Power
Utility, submitted to Malaysian Journal of Computer Science (MJCS) for
first review on 9 April 2010.
[4]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed, and
Malik Mohammad, Nontechnical Loss Detection for Metered
Customers in Power Utility Using Support Vector Machines, IEEE
Transactions on Power Delivery, vol. 25, no. 2, pp. 1162-1171, Apr. 2010.
[5]
[6]
[7]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Abdul Malik Mohammad,
and Syed Khaleel Ahmed, Non-Technical Loss Analysis for Detection
of Electricity Theft using Support Vector Machines, in Proc. of the
2nd IEEE International Power and Energy Conference (PECON) 2008,
Dec. 1-3, 2008, Johor Bahru, Malaysia, pp. 907-912.
[8]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Syed Khaleel Ahmed,
Detection of Abnormalities and Electricity Theft using Genetic
Support Vector Machines, in Proc. of the IEEE Region 10 Conference
(TENCON) 2008, Nov. 18-21, 2008, Hyderabad, India, pp. 1-6.
[9]
Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Abdul Malik
Mohammad, Intelligent System for Detection of Abnormalities and
Theft of Electricity using Genetic Algorithm and Support Vector
Machines, in Proc. of the 4th International Conference on Information
Technology and Multimedia at UNITEN, Nov. 18-19, 2008, Bandar Baru
Bangi, Selangor, Malaysia, pp. 122-127.
[10] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Syed Khaleel Ahmed,
Electrical Power Load Forecasting using Hybrid Self-Organizing
Maps and Support Vector Machines, in Proc. of the 2nd International
Power Engineering and Optimization Conference, Jun. 4-5, 2008, Shah
Alam, Malaysia, pp. 51-56.