
AN INTELLIGENT SYSTEM FOR DETECTION OF NON-TECHNICAL LOSSES IN

TENAGA NASIONAL BERHAD (TNB) MALAYSIA LOW VOLTAGE


DISTRIBUTION NETWORK

By

JAWAD NAGI

A THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE


DEGREE OF MASTERS OF ELECTRICAL ENGINEERING

COLLEGE OF GRADUATE STUDIES


UNIVERSITI TENAGA NASIONAL

2009

DEDICATION

This thesis is dedicated to my father, who taught me that the best kind of
knowledge to have is that which is learned for its own sake. It is also dedicated
to my mother, who taught me that even the largest task can be accomplished if
it is done one step at a time.


ABSTRACT

Electricity consumer dishonesty is a problem faced by all power utilities
worldwide. Finding efficient measures for detecting fraudulent electricity
consumption has been an active research area in recent years. This thesis
presents a new approach towards Non-Technical Loss (NTL) detection in power
utilities using a combination of data mining and artificial intelligence (AI) based
techniques, namely the Support Vector Machine (SVM) and the Fuzzy Inference
System (FIS). The main motivation of this study is to assist Tenaga Nasional
Berhad (TNB) in peninsular Malaysia to reduce its NTLs in the distribution
sector. The intelligent system developed in this research study preselects
suspicious customers to be inspected onsite by TNBD SEAL (Special Engagement
Against Losses) teams for detection of fraud activities. This approach provides a
method of data mining which involves feature selection and extraction from
historical customer consumption data. The Support Vector Classification (SVC)
technique applied in this research study uses customer load profile information
in order to expose abnormal behavior that is known to be highly correlated
with NTL activities. The FIS is employed as a data postprocessing scheme which
uses knowledge from human expertise, combined with the results from the SVC,
in order to shortlist potential fraud suspects for onsite inspection. The proposed
SVC and FIS model is trained using TNB Distribution's (TNBD's) historical kWh
consumption data for the Kuala Lumpur (KL) Barat station, which has one of the
highest recorded rates of fraud activity in the state of Selangor in Malaysia.
Model testing and validation is performed using customer data from three cities
in the state of Kelantan in Malaysia. Feedback from TNBD on onsite inspections
indicates that the fraud detection system developed is more effective than the
current actions taken by TNBD. With the implementation of this new fraud
detection system, TNBD's average hitrate for onsite customer inspection rises to
40%, an improvement of 35-37 percentage points over the current inspection
hitrate of a mere 3-5%.

ACKNOWLEDGEMENT

First and foremost, I wish to thank God for giving me the strength and courage
to complete this thesis and research, and also all those who have assisted and
inspired me throughout this research.

There are so many people to whom I am indebted for their assistance during my
endeavors to complete my Masters candidature in Electrical Engineering at
Universiti Tenaga Nasional (UNITEN). In particular, I would like to express my
gratitude to my supervisor, Mr. Yap Keem Siah, whose invaluable guidance and
support were very helpful throughout my research. A similar level of gratitude
is due to my co-supervisor, Dr. Tiong Sieh Kiong. It is unlikely that I would have
reached completion without their encouragement and support. Besides that, I
would like to thank the Non-Technical Loss (NTL) team Project Leader from
TNB Research (TNBR) Sdn. Bhd., Ir. Haji Abdul Malik Mohamad, and also
Norazlinawati Mohamad, who were both very supportive and helpful in this
research project. TNBR Sdn. Bhd. is most appreciated for funding this work
under Grant RJO 10061948.

My appreciation also goes to TNB Distribution (TNBD) Sdn. Bhd. for providing
us with the customer data and other helpful information for the project. I also
express my appreciation to everyone who has been involved, directly or
indirectly, in the success of this research. Last but not least, I thank my family
for their understanding, patience, encouragement and support. Thank you for
all the support, comments and guidance.


DECLARATION

I hereby declare that this thesis, submitted to Universiti Tenaga Nasional in
fulfillment of the requirements for the degree of Masters of Electrical
Engineering, has not been submitted as an exercise for a similar degree at any
other university. I also certify that the work described here is entirely my own,
except for excerpts and summaries whose sources are appropriately cited in the
references.

This thesis may be made available within the university library and may be
photocopied or loaned to other libraries for the purposes of consultation.

30 June 2009

Jawad Nagi

TABLE OF CONTENTS

DEDICATION
ABSTRACT
ACKNOWLEDGEMENTS
DECLARATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS / NOTATIONS / GLOSSARY OF TERMS

CHAPTER 1 INTRODUCTION
1.0 Preliminaries
1.1 Project Overview
1.2 Research Objectives and Scope
1.3 Contributions of the Research
1.4 Research Methodology
1.5 Thesis Overview

CHAPTER 2 LITERATURE REVIEW
2.0 Overview
2.1 Load Profiling
  2.1.1 Load Profiling Approaches
  2.1.2 Load Profiling with Data Mining Techniques
2.2 Electricity Losses
  2.2.1 Technical Losses
  2.2.2 Non-Technical Losses (NTLs)
    2.2.2.1 NTL Impacts
    2.2.2.2 Current NTL Solutions
2.3 Fraud Detection
  2.3.1 Fraud Detection Techniques
  2.3.2 Proposed Fraud Detection Methods
    2.3.2.1 Fraud Detection in Electricity Business
    2.3.2.2 Fraud Detection in Other Types of Businesses
2.4 Electricity Theft
  2.4.1 Watt-Hour Meters
  2.4.2 Methods of Electricity Theft
    2.4.2.1 Low Voltage Meters
    2.4.2.2 High Voltage Meters
2.5 Overview of Tenaga Nasional Berhad (TNB)
  2.5.1 NTLs Encountered by TNB
  2.5.2 Measures Taken by TNB for NTL Reduction
2.6 Summary

CHAPTER 3 SUPPORT VECTOR MACHINE
3.0 Overview
3.1 Introduction to Artificial Intelligence (AI)
  3.1.1 Machine Learning
    3.1.1.1 Pattern Recognition
    3.1.1.2 Machine Learning Algorithms
      3.1.1.2.1 Supervised Learning
      3.1.1.2.2 Unsupervised Learning
    3.1.1.3 Hypothesis Selection
  3.1.2 AI Techniques
    3.1.2.1 Expert System (ES)
    3.1.2.2 Fuzzy Logic (FL)
    3.1.2.3 Artificial Neural Network (ANN)
3.2 Support Vector Machine (SVM)
  3.2.1 Statistical Learning Theory
    3.2.1.1 Structural Risk Minimization (SRM) Principle
  3.2.2 Support Vector Classification (SVC)
    3.2.2.1 Linear SVC
      3.2.2.1.1 Dual Problem
      3.2.2.1.2 Non-Separable Case
      3.2.2.1.3 Karush-Kuhn-Tucker Conditions
    3.2.2.2 Non-linear SVC
  3.2.3 Implementation of SVC
    3.2.3.1 Sequential Minimal Optimization (SMO)
      3.2.3.1.1 Solving Two Lagrange Multipliers
      3.2.3.1.2 Heuristics to Select Multipliers
    3.2.3.2 SMO Algorithm Review
3.3 Summary

CHAPTER 4 MODEL DEVELOPMENT
4.0 Overview
4.1 Proposed Framework
  4.1.1 Project Methodology
  4.1.2 Research Methodology
4.2 Data Collection
  4.2.1 Customer Information Billing System (e-CIBS) Data
  4.2.2 HighRisk Data
4.3 Data Preprocessing
  4.3.1 e-CIBS Data Preprocessing
    4.3.1.1 Customer Filtering and Selection
    4.3.1.2 Consumption Transformation
    4.3.1.3 Feature Selection and Extraction
      4.3.1.3.1 Cross-Validation (CV)
    4.3.1.4 Feature Normalization
    4.3.1.5 Feature Adjustment
    4.3.1.6 Feature File
  4.3.2 HighRisk Data Preprocessing
4.4 Classification Engine Development
  4.4.1 Load Profile Inspection
  4.4.2 SVC Development
    4.4.2.1 Weight Adjustment
    4.4.2.2 Parameter Optimization
    4.4.2.3 Probability Estimation
    4.4.2.4 SVC Training
    4.4.2.5 SVC Testing
4.5 Data Postprocessing
  4.5.1 Fuzzy Logic Overview
    4.5.1.1 Fuzzy Sets
    4.5.1.2 Membership Functions (MFs)
    4.5.1.3 Linguistic Variables
    4.5.1.4 Fuzzy IF-THEN Rules
    4.5.1.5 Combining Fuzzy Sets
    4.5.1.6 Fuzzy Inference System (FIS)
  4.5.2 Preliminary Data Postprocessing
    4.5.2.1 Correlation of SVC Results with Data
    4.5.2.2 Parameter Selection
    4.5.2.3 Customer Filtering and Selection Using SQL
  4.5.3 Suspicious Customer Selection Using FIS
    4.5.3.1 Transformation of SQL into Fuzzy Rules
    4.5.3.2 Membership Function Formulation
    4.5.3.3 FIS Implementation
4.6 Summary

CHAPTER 5 EXPERIMENTAL RESULTS
5.0 Overview
5.1 Graphical User Interface
  5.1.1 Main Screen
    5.1.1.1 Selecting Data Files
    5.1.1.2 Fraud Pattern Detection Level (FPDL)
    5.1.1.3 Execute Detection
    5.1.1.4 Detection Complete
  5.1.2 Second Screen
    5.1.2.1 Suspected Customer List
    5.1.2.2 Save Detection Report
  5.1.3 Detection Result
    5.1.3.1 Detection Report
    5.1.3.2 Average Daily Consumption Report
  5.1.4 AFDS Operation Manual
5.2 Model Validation
  5.2.1 Validation of Classifier
    5.2.1.2 Discussion of Training Results
  5.2.2 Validation of Classification Results
    5.2.2.1 Model Testing and Validation Results
      5.2.2.1.1 Pilot Testing Results
    5.2.2.2 Contribution of FIS for Hitrate Improvement
    5.2.2.3 Discussion of Classification Results
  5.2.3 Comparison of Model with Other AI Techniques
    5.2.3.1 Multi-Layer Backpropagation Neural Network
    5.2.3.2 Online-Sequential Extreme Learning Machine
    5.2.3.3 Classification Results of Compared Models
    5.2.3.4 Discussion of Comparison Results
5.3 Summary

CHAPTER 6 CONCLUSION AND FUTURE WORK
6.0 Overview
6.1 Benefits of the SVC and FIS in the Proposed Model
6.2 Key Findings of the Research
6.3 Achievement of the Research Objectives
6.4 Impact and Significance of the Project to TNB
6.5 Future Expansion and Recommendations
  6.5.1 SVC Parameter Tuning using Genetic Algorithm (GA)
  6.5.2 Improving SVC Kernel using Multi-Scale RBF Kernel
  6.5.3 FIS Membership Function Optimization using GA
  6.5.4 Feature Extraction Using Consumption Difference
6.6 Conclusion

REFERENCES
APPENDICES
Appendix A: LIBSVM Copyright Notice
Appendix B: Related List of Publications
BIODATA OF THE AUTHOR

LIST OF TABLES

1.1 Text categorization classification accuracy from T. Joachims [34]
1.2 Benefits gained from reduction of NTL activities
2.1 Summary of load profile studies worldwide and their implementation
2.2 Types of NTLs based on the components identified
3.1 Non-linear kernels commonly used to perform a dot product in a mapped feature space in the SVM formulation
3.2 Summarized procedure of the SMO algorithm [37]
4.1 Customer data collected from TNBD for training and testing
4.2 Customer information listed in the monthly e-CIBS data
4.3 Customer information listed in the HighRisk data
4.4 Detection hitrate for different combinations of modeling features
4.5 Information retrieved from the HighRisk data using SQL
4.6 Information extracted using SQL from the HighRisk data
4.7 Weight ratio adjustment of the SVC
4.8 Specifications of the trained classifier (model)
4.9 Parameters used for selection of suspicious customers
4.10 SQL statements implemented for suspicious customer selection
4.11 Fuzzy rules transformed from the SQL statements in Table 4.10
5.1 Model testing and validation results for the fraud detection system
5.2 Comparison of the inspection hitrate using the computational intelligence scheme of SVC and FIS versus standard SVC
5.3 Experimental results of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM classification techniques
5.4 Comparison of the average training accuracy and inspection hitrate of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM

LIST OF FIGURES

1.1 An electric wire bypassing the electricity meter [37]
1.2 A wire used to slow the rotating disc in an electric meter [37]
1.3 Proposed research methodology: computational intelligence scheme of SVC and FIS for development of the fraud detection system
2.1 Basic components of a Watt-hour meter. Clockwise from top left: the coil connections for voltage and current sensing elements, the rotating disc that records the electricity consumption, and the basic construction [94]
2.2 Earliest electricity recording meters. Left to right: Westinghouse ampere-hour meter by Shallenberger (1888-1897) and Gutmann type A (1899-1901) Watt-hour meter by Sangamo [122]
2.3 Modern electricity recording meters. Left to right: Schlumberger J5S (1984) and General Electric I-70S (1968) [122]
2.4 Parts of a single-phase Watt-hour meter where tampering frequently occurs [94]
2.5 A 3-phase Watt-hour meter connection [94]
3.1 Basic structure of a multi-layer Artificial Neural Network (ANN)
3.2 Optimal margin hyperplane for the separable case of SVC
3.3 Linear separating hyperplanes for the separable case of SVC outlining the support vectors (maximum margin approach)
3.4 Linear separating hyperplanes for the non-separable case of SVC where the slack variable ξ permits margin failures (soft margin approach)
3.5 Non-linear separating hyperplanes for the non-separable case of SVC
3.6 Inequality constraints causing the Lagrange multipliers to lie within a box and the linear equality constraint causing the Lagrange multipliers to lie on a diagonal line [187]
4.1 Flowchart of the proposed project methodology [198]
4.2 Flowchart of the proposed framework for detection of NTL activities
4.3 The e-CIBS data for the KL Barat station
4.4 The HighRisk data for the KL Barat station
4.5 Flowchart of the proposed framework for e-CIBS data preprocessing
4.6 Common customer records for the KL Barat station after customer filtering and selection
4.7 The e-CIBS monthly consumption data to be transformed
4.8 The e-CIBS monthly consumption transformed into the normal monthly kWh consumption
4.9 The normal monthly kWh consumption of the customers
4.10 The meter reading date of the customers
4.11 The difference of days between each meter reading date
4.12 The average daily kWh consumption features
4.13 The monthly CWR of the customers
4.14 The averaged CWR over a period of 25 months
4.15 The normalized average daily kWh consumption features
4.16 The normalized average CWR over a period of 25 months
4.17 The preprocessed features from the e-CIBS data
4.18 The LIBSVM feature file
4.19 Information extracted from the HighRisk data using SQL
4.20 Normalized load profiles of four typical fraud customers over a period of two years
4.21 Normalized load profiles of four good customers over a period of two years
4.22 Customer data features (samples) used for SVC training
4.23 The SVC training engine proposed for parameter optimization and building the classifier (model)
4.24 LIBSVM MS-DOS executable used for SVC training
4.25 SVC training parameters for LIBSVM
4.26 SVC training using LIBSVM
4.27 Model file (classifier) after SVC training is complete
4.28 Separating boundaries between the two classes of the SVC model: the dark (blue) region indicates the Class 1 boundary and the lighter (yellow) region indicates the Class 2 boundary
4.29 The SVC testing engine proposed for classification of fraud and normal customers
4.30 LIBSVM MS-DOS executable used for SVC testing
4.31 Customer data features used for SVC testing
4.32 SVC testing parameters for LIBSVM
4.33 SVC testing and validation using LIBSVM
4.34 Output file (classification results) after SVC testing is complete
4.35 Flowchart of the proposed framework for data postprocessing
4.36 Fuzzy Membership Functions (MFs) for the term set "age"
4.37 Flowchart of the general architecture of a FIS
4.38 Correlation of the SVC results with the customer data
4.39 Parameters selected from the correlated data
4.40 Fuzzy MFs used in order to implement the "Low" level fuzzy rule
5.1 Authentication screen of the AFDS software
5.2 Welcome screen of the AFDS software
5.3 Main screen of the AFDS software
5.4 Selecting the e-CIBS data file in the AFDS software
5.5 File browser for data file selection in the AFDS software
5.6 Location of the selected file in the AFDS software
5.7 Selection of Fraud Pattern Detection Level (FPDL) in the AFDS
5.8 Starting detection in the AFDS software
5.9 AFDS running for detecting suspicious customers
5.10 The start/stop detection toggle button in the AFDS software
5.11 Message box confirming detection is complete
5.12 Main screen of the AFDS software after detection is complete
5.13 Second screen of the AFDS software
5.14 Viewing the list of suspicious customers in the AFDS software
5.15 Saving the detection report in the AFDS software
5.16 File browser dialog to save the detection report in the AFDS
5.17 Message box confirming the detection report is saved
5.18 Location of the saved detection report in the second screen of the AFDS
5.19 Sample detection report in Microsoft Office Excel
5.20 Sample detection report for customers tested in the KL Barat station
5.21 Average daily consumption report in Microsoft Office Excel
5.22 Average daily consumption report for the KL Barat station (page 1)
5.23 Average daily consumption report for the KL Barat station (page 2)
5.24 Inspecting load profiles of suspected customers from the daily average consumption report
5.25 Opening the AFDS software installation and operation manual
5.26 Cover page of the AFDS software installation and operation manual
5.27 The network architecture of a BPNN
5.28 The phenomenon of local minimum in a BPNN

LIST OF ABBREVIATIONS / NOTATIONS / GLOSSARY OF TERMS

AEIC - Association of Edison Illuminating Companies
AFDS - Abnormality and Fraud Detection System
AI - Artificial Intelligence
AMR - Automatic Meter Reading
ANN - Artificial Neural Network
ASEAN - Association of South East Asian Nations
BP - Back-Propagation
BPNN - Backpropagation Neural Network
BSV - Bounded Support Vector
CIBS - Customer Information Billing System
CT - Current Transformer
CV - Cross-Validation
CWR - Credit Worthiness Rating
DA - Discriminant Analysis
DARPA - Defense Advanced Research Projects Agency
DLO - Dual Lagrangian Optimization
e-CIBS - Enhanced Customer Information Billing System
EA - Evolutionary Algorithm
ELM - Extreme Learning Machine
EMPD - Expanded Metal Protection Door
ERM - Empirical Risk Minimization
ES - Expert System
FAM - Fuzzy Associative Memory
FCM - Fuzzy C Means
FIS - Fuzzy Inference System
FL - Fuzzy Logic
FPDL - Fraud Pattern Detection Level
GA - Genetic Algorithm
GUI - Graphical User Interface
HR - High Risk
HRC - High Risk Customer
HV - High Voltage
IG - Information Gain
IPP - Independent Power Producer
IR - Irregularity Report
k-NN - k-Nearest Neighbor
KA - Kernel Adatron
KBS - Knowledge Based System
KDD - Knowledge Discovery in Databases
KKT - Karush-Kuhn-Tucker
KL - Kuala Lumpur
LIBSVM - Library for Support Vector Machine
LPC - Large Power Customers
LV - Low Voltage
MF - Membership Function
MIT - Massachusetts Institute of Technology
ML-BPNN - Multi-Layer Back Propagation Neural Network
MLP - Multi-Layer Perceptron
MS-DOS - Microsoft Disk Operating System
MSE - Mean Square Error
MYR - Malaysian Ringgit
NN - Neural Network
NTL - Non-Technical Loss
OCR - Optical Character Recognition
OLAP - Online Analytical Processing
OMH - Optimal Margin Hyperplane
OPC - Ordinary Power Customers
OS-ELM - Online-Sequential Extreme Learning Machine
PEA - Provincial Energy Authority
PEC - Power Engineering Centre
PF - Power Factor
QP - Quadratic Programming
RAM - Random Access Memory
RAN - Random Selection
RBF - Radial Basis Function
RMR - Remote Meter Reading
RLS - Recursive Least Squares
SA - Simulated Annealing
SCADA - Supervisory Control and Data Acquisition
SEAL - Special Enforcement Against Losses
SEC - Security and Exchange Commission
SESB - Sabah Electricity Sdn. Bhd.
SESCO - Sarawak Electricity Supply Corporation
SLFN - Single Hidden-Layer Feed-Forward Neural Network
SMB - Secure Meter Box
SMO - Sequential Minimal Optimization
SOM - Self-Organizing Map
SQL - Structured Query Language
SRM - Structural Risk Minimization
SV - Support Vector
SVC - Support Vector Classification
SVM - Support Vector Machine
TNB - Tenaga Nasional Berhad
TNBD - Tenaga Nasional Berhad Distribution
TNBR - Tenaga Nasional Berhad Research
TOE - Theft Of Electricity
UNITEN - Universiti Tenaga Nasional
USD - United States Dollar
VC - Vapnik-Chervonenkis

CHAPTER 1

INTRODUCTION

1.0 Preliminaries
Non-Technical Losses (NTLs) originating from electricity theft and other
customer malfeasances are a problem in the electricity supply industry. Such
losses occur due to meter tampering, meter malfunction, illegal connections,
billing irregularities and unpaid bills [1]. The problem of NTLs is not only faced
by the least developed countries in the Asian and African regions, but also by
developed countries such as the United States of America and the United
Kingdom [2]. Specifically, high rates of NTL activities have been reported in the
majority of developing countries in the Association of South East Asian Nations
(ASEAN) group, which include Malaysia, Indonesia, Thailand and Vietnam [3].
As an example, in the United States NTLs have been estimated to account for
between 0.5% and 3.5% of total annual revenue [1], which is relatively low
when compared to the NTLs faced by utilities in developing countries such as
Bangladesh [2], India [3], Pakistan [4] and Lebanon [5], where average NTLs of
between 20% and 30% have been observed. Nonetheless, in 1998, the revenue
lost by power utilities in the United States was estimated at between USD 1
billion and USD 10 billion, given that utility companies in the United States had
a combined annual gross revenue of USD 280 billion [1].

In deregulated markets, knowledge of electricity customers provides an
understanding of their consumption behavior, which has recently become
important in the electricity supply industry. With this knowledge, electricity
providers are able to develop new marketing strategies and offer services based
on customer demand. One of the most common methods used to acquire
knowledge of customer behavior is load profiling [6], which is defined as the
pattern of electricity consumption of a customer or group of customers over a
period of time [7]. Load profiling has been used for many years by power
utilities for tariff formulation, system planning, and devising marketing
strategies [8].

As a common practice, power utility companies record historical customer data,


such as contractual details, billing procedures, and consumption records in
various customer databases to support their billing activity [9]. However, such
information as it presently exists is often too complex to allow the human mind
to formulate efficient and strategic decisions or draw effective conclusions. In
addition, this information is often inaccessible and extremely time consuming
to retrieve, due to the problems associated with archived data in complex
database systems [9, 10].

Due to the problems associated with NTLs in electric utilities, various methods
for the efficient management of NTLs [11] and for protecting revenue in the
electricity distribution industry [12] have been proposed. The most effective
method to date for reducing NTLs and commercial losses is the use of smart and
intelligent electronic meters [13], which make fraudulent activities more
difficult and easier to detect. However, such meters are expensive. Therefore,
these types of meters are not currently feasible for deployment by power
utilities throughout the entire low voltage (LV) distribution network, i.e., in the
residential and commercial sectors.

In this research, data mining is employed to meet the above challenges. In


general, data mining is defined as a process of discovering various models,
summaries and derived values from a given collection of data [14]. In recent
years, several data mining studies on fraud identification and detection in the
electricity distribution industry have been researched, including: Rough Sets
[15], Statistical methods [16, 17], Decision Trees [18, 19], Artificial Neural
Networks (ANNs) [20], Extreme Learning Machine (ELM) [8], Statistical-based
Outlier detection [21], Knowledge Discovery in Databases (KDDs) [10, 22] and
Wavelet-based feature extraction with multiple classifiers [23]. In addition to
this, data mining techniques have also been used in the other types of
businesses including: telecommunication [24], insurance [25], risk management
[26], and credit card transaction [27]. Most of these studies have used data
mining techniques for means of detection and prediction of fraud activities.

The main motivation of this study is to investigate the capability of using the
Support Vector Machine (SVM) and the Fuzzy Inference System (FIS) for the
detection and identification of NTL activities [20], and to overcome the existing
drawbacks of ANNs. The recent success of SVMs in various real world
applications such as face identification [28], text categorization [29] and
bioinformatics [30] provides additional motivation for this research. Many
research papers indicate that the classification accuracy of SVMs outperforms
that of other traditional classification methods, such as ANNs [31, 32]. A
comparison of SVM classification results with other techniques, as reported by
T. Joachims [33] for text categorization, is shown in Table 1.1.

1.1 Project Overview


Fraud in legal terms is defined as "an intentional perversion of truth; deceitful
practice or device resorted to with intent to deprive another of property or
other right" [34]. Fraud occurs in every corner of life, for example in electronic
banking, credit card transactions, telecommunications and insurance
businesses. Due to the dramatic increase of fraud, which results in losses
worldwide each year, several computational intelligence techniques for the
detection and prevention of fraud have continually evolved and are being
applied to many business fields [35]. The most popular computational
intelligence branch to have evolved in the field of Computer Science is Artificial
Intelligence (AI), which has been used by researchers over the last three
decades.

Table 1.1: Text categorization classification accuracy from T. Joachims [34]

Data           Naive Bayes   Rocchio   C4.5   k-NN    SVM
Earn               95.9        96.1    96.1   97.3   98.5
ACQ                91.5        92.1    85.3   92.0   95.4
Money-fx           62.9        67.6    69.4   78.2   76.3
Grain              72.5        79.5    89.1   82.2   93.1
Crude              81.0        81.5    75.5   85.7   88.9
Trade              50.0        77.4    59.2   77.4   77.8
Interest           58.0        72.5    49.1   74.0   76.2
Ship               78.7        83.1    80.9   79.2   85.4
Wheat              60.6        79.4    85.5   76.6   85.2
Corn               47.3        62.2    87.7   77.9   85.1
Microaverage       72.0        79.9    79.4   82.3   86.4

Loss of revenue due to electricity fraud and meter malfunction is an important
concern for power utilities [20]. Relating the statement above to the context of
this research project, fraud in this project refers to illegal actions conducted on
the customer side to reduce the amount of electricity usage charged or billed to
them.

Electric meters or kWh (kilowatt-hour) meters record readings which are used
to bill customers for the amount of electricity consumed [36]. In many
developing countries in the ASEAN group, electro-mechanical induction
electricity meters are currently used, and these can easily be tampered with.
Inspection of electric meters for fraud detection and identification is currently a
manual and tedious task which requires experienced staff. Furthermore, meter
malfunction correction is currently limited to experienced utility staff [37], due
to the complexity associated with rectifying problematic electric meters.

This project, titled "Development of an Intelligent System for Detection of
Abnormalities and Probable Fraud by Metered Customers in TNB Distribution
Division", was initiated by Tenaga Nasional Berhad (TNB) in Malaysia in an
effort to reduce its NTLs in the LV distribution network, estimated at around
15% in peninsular Malaysia. This project is a collaborative effort between TNB
Distribution (TNBD), TNB Research (TNBR) and the Power Engineering Centre
(PEC) of Universiti Tenaga Nasional (UNITEN). Distribution losses due to fraud
activities are low throughout peninsular Malaysia; however, high NTLs have
been reported in some states such as Selangor, Penang and Johor. Two typical
meter tampering (fraud) scenarios identified by TNBD SEAL (Special
Engagement Against Losses) teams are shown in Figures 1.1 and 1.2. In Figure
1.1 [37] an electric wire bypassing the meter is identified, whereas in Figure 1.2
[37] a wire inserted into an electric meter to slow the rotating disc is identified.

Figure 1.1: An electric wire bypassing the electricity meter [37]

Tenaga Nasional Berhad (TNB) is the main electricity provider in peninsular
Malaysia. TNB Distribution (TNBD) is the transmission and distribution division
of TNB Malaysia. TNB Research (TNBR) is the research division of TNB
Malaysia.

Large inspection campaigns have been carried out by TNBD SEAL teams with
little success. The current actions taken by the SEAL teams in order to address
the problem of NTLs include: (i) meter checking and premise inspection, (ii)
reporting on irregularities, and (iii) monitoring of unbilled accounts, which
have resulted in a fraud detection hitrate of 3-5%. This is because customer
installation inspections are carried out without any specific focus or direction.
In most cases, inspections are carried out at random, while some targeted raids
are undertaken based on information reported by the public or meter readers.

Figure 1.2: A wire used to slow the rotating disc in an electric meter [37]

When customer installation inspections are guided by an intelligent detection
system toward a reduced group of likely fraudulent customers, a systematic
checking process with a higher rate of successful fraud detection can be
achieved. This can also reduce the operational costs of TNBD in monitoring NTL
activities. Therefore, an intelligent system to guide the inspections is proposed
in this research study.

1.2 Research Objectives and Scope


The present study is motivated by the need to deal more effectively with NTL
activities. The aim of the present research is to develop an intelligent fraud
detection system to aid the identification and detection of NTL activities, i.e.
fraud activities and abnormalities. More specifically, the present research study
pursues the following objectives in its development and application of load
behavior profiling for the detection of NTL activities.

The main objective of this research study is to identify, detect and predict
customers with fraud activities and abnormalities by investigating abrupt
changes in customer load consumption patterns.

To achieve the main objective, the sub-objectives below are outlined:


1. To develop a dedicated and automated strategy to preprocess customer
data for smoothening noise, feature extraction and data normalization.
2. To develop a fraud detection system using a combination of two AI based
computational intelligence schemes: (i) Support Vector Machine (SVM),
and (ii) Fuzzy Inference System (FIS).
3. To evaluate the proposed fraud detection system using customer data
from different regions within peninsular Malaysia and to provide a
comparative study using different AI based classification techniques.
4. To develop a user-friendly Graphical User Interface (GUI) for the
intelligent fraud detection system.

1.3 Contributions of the Research


The main significance and benefits of the research study reported in this thesis
are identified as follows.

1. The fraud detection system developed in this research study provides
utility information system tools to power utilities, including TNB in
Malaysia, for efficient detection and classification of NTL activities in
order to increase the effectiveness of their onsite operations.
2. With the implementation of the proposed system, the operational costs
incurred by power utilities, including TNB in Malaysia, for onsite
inspection in monitoring NTL activities will be significantly reduced. This
will also reduce the number of inspections carried out at random,
resulting in a higher fraud detection hitrate.
3. Knowledge regarding fraudulent consumption patterns and behavior is
obtained and disseminated with the use of the proposed system, which is
useful for further study and analysis by NTL experts and inspection
teams in power utilities, including TNB in Malaysia.
4. Lastly, by using the proposed fraud detection system, great time savings
in detecting and identifying problematic electric meters can be achieved
by power utilities, including TNB Malaysia.

NTLs not only affect a company's profitability and credibility, but also increase
the cost of electricity to customers [16]. Therefore, the need to minimize this
problem is crucial for both utilities and their customers. The respective benefits
gained by power utilities and their customers from the reduction of NTL
activities are illustrated in Table 1.2.

Table 1.2: Benefits gained from reduction of NTL activities

Power Utilities:
- Reduction of operational costs of onsite physical checking, as inspection teams are able to directly target suspicious NTL activities.
- Minimization of NTL problems due to the more rapid methods of detecting and predicting customer behavior.
- Increased system efficiency and reliability, as generation of electricity is based on actual economic demand.

Customers:
- Reduction in the cost of electricity, as the extent of NTL activities is decreased and the transfer of its cost impact to customers is reduced.
- Improved customer satisfaction, as the system provides customers with more reliable and efficient services.
- Strong customer relationships, as timely and reliable results assist the decision-making process.

The resulting fraud detection system developed for detection and identification
of NTL activities will significantly benefit both the power utilities and their
customers, as indicated by Table 1.2.

1.4 Research Methodology


A combination of two AI based computational intelligence schemes, the Support
Vector Machine (SVM) and the Fuzzy Inference System (FIS), is proposed in this
research study, as shown in Figure 1.3. The FIS has proven in many applications
to enhance the performance of machine learning algorithms such as SVMs
[38-40]. The SVM is a computational machine learning tool, while the FIS
utilizes human knowledge and expertise; combined in this research study, the
two techniques form an intelligent framework for the detection of NTL
activities.

Figure 1.3: Proposed research methodology. Computational intelligence scheme
of SVC and FIS for development of the fraud detection system (raw customer
data is taken as input and passed through data preprocessing, feature selection
and extraction, SVM training and testing, and data postprocessing using the FIS,
producing a list of suspicious customers as the result).

The SVM and FIS computational intelligence scheme proposed in this study,
however, is different from the methodologies presented in [38-40], because in
this approach the FIS is used as a data postprocessing scheme for the selection
of suspicious customers, based on the relationship between parameters in the
SVM input, the SVM output and the preprocessed customer data. This
combination of SVM and FIS provides better classification and detection of NTL
activities than the conventional SVM alone.
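To make this postprocessing role concrete, the short sketch below is a minimal illustration only and not the system developed in this thesis: it combines a hypothetical SVC fraud probability with a single normalized consumption input through two toy fuzzy rules. The rule set, membership breakpoints, input names and output values are assumptions chosen purely for illustration.

# Illustrative sketch (not the thesis implementation): fuzzy postprocessing of
# an SVC fraud probability to rank customers for inspection.
def tri(x, a, b, c):
    # Triangular membership function with feet at a and c, and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def suspicion_score(svc_fraud_prob, normalized_avg_daily_kwh):
    # Two toy rules:
    #   R1: IF fraud probability is High AND consumption is Low THEN suspicion is High
    #   R2: IF fraud probability is Low THEN suspicion is Low
    prob_high = tri(svc_fraud_prob, 0.5, 1.0, 1.5)
    prob_low = tri(svc_fraud_prob, -0.5, 0.0, 0.5)
    cons_low = tri(normalized_avg_daily_kwh, -0.5, 0.0, 0.5)
    r1 = min(prob_high, cons_low)      # AND operator taken as min
    r2 = prob_low
    total = r1 + r2
    # Weighted-average defuzzification towards 1.0 (suspicious) or 0.0 (normal).
    return (r1 * 1.0 + r2 * 0.0) / total if total > 0 else 0.0

customers = {"A": (0.92, 0.10), "B": (0.35, 0.70)}   # (SVC probability, consumption)
shortlist = sorted(customers, key=lambda c: suspicion_score(*customers[c]), reverse=True)
print(shortlist)   # customer A is ranked ahead of customer B

In the actual system the fuzzy rules are derived from TNBD expert knowledge and SQL-based customer filtering, as described in Chapter 4.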

The purpose of this research study is to use the knowledge gathered from
customers' load profiles in order to detect significant behavioral deviations that
signal NTL activities. NTLs have been observed in many countries and are a
significant concern. Therefore, it is important to be able to detect and identify
possible NTL activities by analyzing the customer data made available through
the Customer Information and Billing Systems (CIBSs).

Although customer data analysis can be extended to include a vast range of
research topics, the present availability of customer data and research
priorities have limited the scope of this study as follows.

1. This study focuses on the low voltage (LV) distribution network, which
includes residential, commercial and light industrial customers, by using
the monthly kWh interval data retrieved over a period of time.
2. The NTL detection technique proposed in this research study is based on
a combination of data mining and AI based classification approaches,
which is distinct from the other approaches that have been implemented
to minimize the NTL problems, as discussed in Chapter 2.

The analysis undertaken in this research study focuses on customers' behavior
changes by identifying abrupt changes in their load consumption patterns made
apparent through load profiling and data mining techniques. More specifically,
customers are represented by their load profiles or by the temporal evolution
of consumption-related variables over a period of time [37]. These profiles are
characterized by means of patterns which represent the general behavior of the
customers. Once the patterns are adjusted, the similarity measure between each
customer and each pattern, and the global similarity measure between a
customer and the patterns considered as a whole (the normality degree), are
evaluated. The global similarity measure identifies customers who do not fit any
of the patterns, while the customer-pattern similarity measure identifies
customers similar to patterns which are detected as fraudulent or anomalous
[20].
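As a minimal sketch of these two measures (an illustrative assumption, not the formulation used in this thesis), the snippet below scores one normalized load profile against a small set of reference patterns using an inverse-distance similarity and takes the best match as a crude normality degree; the reference patterns here are synthetic.

# Illustrative sketch, assuming Euclidean-distance similarity on synthetic data.
import numpy as np

def pattern_similarities(profile, patterns):
    # Similarity of one profile to each reference pattern, in (0, 1]; 1 means identical.
    distances = np.linalg.norm(patterns - profile, axis=1)
    return 1.0 / (1.0 + distances)

def normality_degree(profile, patterns):
    # Global similarity: how well the profile fits its closest pattern.
    # A low value flags a customer that fits none of the patterns.
    return float(pattern_similarities(profile, patterns).max())

rng = np.random.default_rng(0)
patterns = rng.random((3, 12))                            # three hypothetical 12-month patterns
customer = patterns[1] + 0.05 * rng.standard_normal(12)   # a profile close to pattern 2
print(pattern_similarities(customer, patterns))
print(normality_degree(customer, patterns))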

This methodology is general and is not bound to a particular set of variables or


customer types. All input information required is taken solely from the
customer consumption and billing database. The consumption patterns allow
visual representation of the major groups of homogeneous customers, which
are normal or indicate possible fraud activities [20]. The analysis and the results
of the research reported in this thesis show the effectiveness and significance of
identifying and detecting NTL activities through the proposed fraud detection
framework.

1.5 Thesis Overview


This thesis is arranged in a methodical manner. It is organized into six chapters
comprising this Introduction chapter and five further chapters, as follows.

Chapter 2 reviews the background and literature relating to losses in power
utilities, including technical losses and those due to NTL activities, referring to
the impact of NTL activities from an economic and a financial perspective. A
comprehensive review of customer load profile analysis in several countries is
presented in this chapter. Also considered are the clustering techniques that
have commonly been implemented to classify electricity customers using
customer load profiles, together with the associated cluster validity measures.
Some background issues concerning fraud detection techniques and fraud cases
in electricity businesses, as well as in other businesses including credit card
transactions, telecommunications, insurance and risk management, are also
reviewed. In addition, the operation of Watt-hour meters and various methods
of electricity theft, including meter tampering, are discussed. Finally, this
chapter provides an overview of TNB as the largest power utility in peninsular
Malaysia and outlines its needs with respect to implementing solutions to
minimize NTL activities.


In Chapter 3, the background and theoretical concepts of the AI based


techniques applied in this research study are presented. The introduction starts
off by discussing the preliminaries of AI briefly, which include pattern
recognition and machine learning techniques, such as: supervised learning and
unsupervised learning. Next, some popular AI techniques are briefly discussed,
which include: Expert System (ES), Fuzzy Logic (FL) and Artificial Neural
Networks (ANNs). In the sub chapter on SVM, statistical learning theory is
presented, followed by the Structural Risk Minimization (SRM) principle. The
background and theoretical concepts of Support Vector Machines (SVMs) with
regard to Support Vector Classification (SVC) are then discussed, where
derivations of the margin hyperplanes for linear and non-linear SVC are
presented, followed
by the kernel methods. The last part of the chapter presents the Sequential
Minimal Optimization (SMO) algorithm used for optimization of the Quadratic
Programming (QP) problems in SVMs.

Chapter 4 provides the methodology proposed for the fraud detection system
and implements the associated key algorithms to be used in NTL detection,
identification and prediction. In the first sub chapter, general project and
research methodologies are introduced. Three major stages are involved in
the development of the fraud detection system: (i) data preprocessing,
(ii) classification engine development, and (iii) data postprocessing. The data
preprocessing sub chapter illustrates data mining techniques used for
preprocessing raw customer information and billing data for feature selection
and feature extraction. The sub chapter, classification engine development
illustrates the SVC training, parameter optimization, development of the SVC
classifier and the SVC testing and validation engine. The last sub chapter, data
postprocessing describes the development of a Fuzzy Inference System (FIS),
creation of fuzzy rules and membership function (MF) formation for the
selection of suspicious customers from the SVC results.


Chapter 5 is composed of two main sub chapters. Sub chapter 1 presents the
Graphical User Interface (GUI) developed for the fraud detection system. The
GUI of the software generates a detection report listing the suspicious
customers and an average daily consumption report used to inspect the load
consumption patterns of the suspicious customers. In sub
pilot testing, and (iii) comparison of the proposed model with other AI based
classification techniques. Model validation results obtained are discussed and
evaluated. The contribution of the FIS for hitrate improvement is also discussed
and the computational intelligence scheme of SVC and FIS is compared to
standard SVC. Finally in the end of sub chapter 2, a comparative study of the
proposed SVC and FIS model is performed with two AI based classification
techniques: (i) Multi-Layer Backpropagation Neural Network (ML-BPNN), and
(ii) Online-Sequential Extreme Learning Machine (OS-ELM) in order to evaluate
the efficiency of the proposed fraud detection system. The results of the
comparative study are discussed and elaborated in detail.

Chapter 6, which is the last chapter, concludes the thesis and summarizes the
research contributions made. The achievements and objectives of the research
study with respect to the project are highlighted along with the key findings of
the research. In addition, this chapter also discusses the impact and significance
of this project to TNB in Malaysia and suggests future research in the present
context that merits consideration.


CHAPTER 2

LITERATURE SURVEY

2.0 Overview
This chapter presents a literature review of electricity load profiling studies
conducted in various countries. A summary of these studies and their means of
implementation are provided in Table 2.1. In the next section, consideration is
given to background and theoretical concepts relating to power losses that
electricity distribution companies experience, which include technical losses
and NTLs. Also, some background issues concerning fraud detection techniques
used in electricity businesses, as well as in other businesses are reviewed. In
addition, various methods of electricity theft, such as meter tampering, are
also presented. The final section provides an overview of TNB, the sole
electricity provider in peninsular Malaysia, as the case study for the present
research. Particular reference is made to the connection between load profiles,
NTLs, and fraud detection within this electricity supply utility.

2.1 Load Profiling


A load profile is defined as the pattern of electricity load demand of a customer
or a group of customers over a given period [6], where the period concerned
can be daily, weekly, monthly or yearly. This concept has long been used as an
effective tool for tariff rate formulation, system planning, load management, and
devising marketing strategies [41].


In many countries, load profiles have been identified as an alternative and
cost-effective approach to the interval metering solution (interval metering
records customer demand and consumption in fixed intervals of time during a
single day to create a consumption usage profile), which is known to be
expensive and impractical for small, LV commercial and domestic customers. In
addition, having knowledge of customers' load profiles not only assists
distribution companies in determining the demand price of electricity, but also
provides better marketing strategies [42].

Over the last two decades, a number of load profile studies have been carried
out to classify electric utility customers based on their load consumption
behavior. These studies have been carried out in countries including Taiwan
[43-47], Slovenia [7, 41], Romania [48-50], Portugal [51-53], the United
Kingdom [54-57], Malaysia [58-62], Belgium [63], Spain [64] and Brazil [65].
The main objective of load profile studies in general is to extract and record
information relating to customer load characteristics [46]. In all the countries
where these studies have been carried out, they have served various purposes
and yielded significant benefits. The various reasons reported for conducting
such studies are indicated in Table 2.1.

Table 2.1: Summary of load profile studies worldwide and their implementation

Taiwan (Taiwan Power Company)
Purpose: Since 1993, load profile research in Taiwan [43-47] has studied multiple functions of system planning, system operation and maintenance. This includes developing effective load management alternatives to reduce system peak demands, designing proper tariff rate structures according to actual power consumption, and developing marketing strategies for promoting the electricity business.
Techniques: Statistical techniques

Slovenia
Purpose: In Slovenia [7, 41], load profiles have been used as an effective tool for billing customers who have deviated from their contracted schedules. Under the Slovenian energy law, only eligible customers with rated power above 41 kW are allowed to change their suppliers or buy energy in the market individually. Therefore, it is essential to have appropriate techniques to allocate customers to representative groups.
Techniques: Hierarchical clustering; Fuzzy C Means (FCM) clustering

Romania
Purpose: In Romania [48-50], load profile studies have been conducted to formulate better tariff rates due to market deregulation challenges. These studies have been implemented in order to develop improved tariff offers and to study the margin left to a distribution company for fixing dedicated tariffs to each customer class, for developing dedicated marketing strategies.
Techniques: K-Means clustering; Hierarchical clustering

Portugal (The Portuguese Distribution Company)
Purpose: In Portugal [51-53], load profile studies were carried out on 165 LV customers in order to develop a decision-making system to support the definition of adequate contract options, as well as to develop better market strategies. Since market liberalization, utility companies have needed strategies to differentiate themselves based on the cost and value of services offered [52].
Techniques: K-Means clustering; Self-Organizing Map (SOM); Two-Level Approach

United Kingdom
Purpose: In the United Kingdom [54-57], load profile studies have had a long history in the electricity supply industry. Load profiles have been employed for many years to formulate and set retail electricity tariffs [54]. Some studies considered weather effects [56].
Techniques: Fuzzy Classification

Malaysia (Tenaga Nasional Berhad)
Purpose: In Malaysia [58-62], load profile studies have been conducted using AI-based techniques in order to identify similarities between customers and to allocate customers to one of the identified categories. The studies have used a set of load data from different feeders associated with TNB [58-60].
Techniques: Fuzzy C Means (FCM) clustering; Artificial Neural Network (ANN); Fuzzy Classification

Table 2.1 shows that the load profile studies conducted vary from one country
to another. However, in the case of Malaysia, the only load profile studies that
have been conducted used a set of load data from 46 different feeders related to
TNB in order to demonstrate a method of classifying the daily load curves of
different consumers derived from a distribution network [61, 62]. Since only
such a limited study of customer behavior changes in Malaysia has previously
been reported, the research presented here was proposed.

2.1.1 Load Profiling Approaches


Presently, two types of approaches are most commonly used to model load
profiles. The first approach is the area model and the second approach is the
category model [42, 66]. The area model, or analytical model, has a limitation in
that it assumes that all customers supplied from the same distribution station
have similar load patterns. On the other hand, the category model, or synthetic
model, based on similar customer pattern categories, has a limitation relating to
the variance distribution between the rest curve (the consumption profile of
the end users that are not measured on a time-interval basis, derived on the
basis of actual power consumed [66]) and the constructed load patterns.
Therefore, in [66], a new model for load profile based settlement purposes was
designed using the rest curve and typical load profiles. This model was designed
to take advantage of the merits of the area and category models cited above,
and it is claimed that this new model improves on the previous two approaches.

Recent advancements in technology have allowed a number of modern load
profiling approaches to arise from work done in particular countries. Two
major research groups have been identified as using different approaches for
determining typical load profiles. The first research group developed typical
load profiles using comprehensive load survey systems according to
pre-defined customer categories [45-47, 65, 66]. In the case of the second
research group, various pattern recognition methods were used as load
profiling tools to develop typical load profiles based on the shapes of the load
patterns recorded [6, 41, 42, 49, 67].

However, these studies indicate that load profiling procedures of both research
groups are affected by limitations. The main limitation affecting the first group
is that the time required for measurement is quite long. For the second group,
the key limitation is that the procedure required to develop the customer
characteristics is expensive and time consuming [42]. Even though it might be
argued that the second approach is better than the first, there is no clear and
concise way to determine the optimum pattern recognition method when
seeking to represent load profiles. Hence, the research presented in this study
employs SVM to solve the pattern recognition problem of classifying load
profiles based on similarities of consumption behavior for differentiating
between fraud and good (normal) consumption patterns.

2.1.2 Load Profiling with Data Mining Techniques


The most common data mining approach for load profiling reported in the
literature is clustering, which is based on unsupervised learning. Clustering
techniques provide one of the steps for exploratory data analysis and form part
of pattern recognition methods [41]. Cluster analyses are used for the
identification of common patterns or to group similar cases through a process
of gathering a set of objects into clusters. The outcome of clustering is that
objects in one cluster have a high degree of similarity, while being very
dissimilar to the objects in the other clusters. Clustering is a useful technique
for finding the distribution of patterns and correlations among data attributes
[68], and it can be used as an outlier detection tool (an outlier being an
observation that lies outside the overall pattern of a distribution) to identify
and detect objects that deviate from normal patterns.

In recent years, clustering techniques have been widely used in many
real-world applications, including document clustering [69, 70], gene
expression micro-array data analysis [71, 72], and image segmentation [73]. In
addition, clustering techniques have also been used in load profiling studies to
group similar load profiles for different purposes. However, none of the studies
conducted so far have used classification techniques to group the load patterns
of individual customers based on similarities of consumption behavior in order
to differentiate between normal and abnormal load patterns. Therefore, the
present study aims to use SVM, which is a novel machine learning technique for
classification.

In general, clustering techniques broadly fall into two classes: (i) partitional and
(ii) hierarchical techniques. Among partitional techniques, K-means clustering
is widely used, and in hierarchical clustering, single linkage is most commonly
used [74]. However, in [75], Han et al. expanded the classification of clustering
techniques into five major categories: (i) Partitioning methods, (ii) Hierarchical
methods as mentioned by Karypis et al. in [76], (iii) Density-based methods, (iv)
Grid-based methods, and (v) Model-based methods [77]. Among all clustering
categories, each clustering technique has its own advantages and disadvantages
depending on the problem to be addressed and the assumptions made. The
most popular clustering techniques used for classification of electricity supply
utility customers based on load profiles and reported in literature are described
briefly in the following section.

One of the most popular clustering techniques for determining load profiles is
fuzzy clustering. From [41], it is apparent that Fuzzy C Means (FCM) can be
used as an effective outlier detection tool. The only limitation of FCM is that the
number of clusters needs to be specified in advance. Hierarchical clustering was
employed by Gerbec [7, 41, 42, 66, 78-82] and Chicco [48-50, 83-88] to group
customers based on their behavioral similarities. Unlike FCM, the main
advantage of hierarchical clustering is that the number of clusters need not be
specified in advance. K-Means clustering [52] has also had widespread use,
particularly in determining load profile classes in different electricity markets
[73]. However, there are two major issues that arise in K-means clustering,
namely: (i) the need to determine the number of clusters K in advance, and (ii)
the use of the alternating minimization method to solve non-convex
optimization problems. Additionally, the Self-Organizing Map (SOM) has proven
to be an excellent tool for data mining analysis, including customer
classification in electricity markets [52, 89, 90]. In [52], SOM clustering was
used to create different customer classes in order to develop better marketing
strategies. Moreover, SOMs are faster and more convenient methods with
prominent visualization properties [91] compared to hierarchical clustering
[92].
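As a brief sketch of how one of the clustering techniques cited above can be applied to load profiles, the snippet below groups synthetic normalized profiles with K-means and flags profiles that sit unusually far from their own cluster centre; the data, cluster count and outlier threshold are assumptions for illustration only, not results from this study.

# Brief sketch: K-means grouping of normalized load profiles with a crude
# distance-to-centroid outlier check. Data and threshold are synthetic assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
base = np.stack([np.sin(np.linspace(0, np.pi, 24)), np.linspace(0.2, 1.0, 24)])
profiles = np.vstack([b + 0.05 * rng.standard_normal((30, 24)) for b in base])  # 60 customers

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)
labels = kmeans.predict(profiles)
dist_to_centroid = np.linalg.norm(profiles - kmeans.cluster_centers_[labels], axis=1)

# Profiles unusually far from their own cluster centre may indicate abnormal behavior.
threshold = dist_to_centroid.mean() + 3 * dist_to_centroid.std()
print("cluster sizes:", np.bincount(labels))
print("flagged profiles:", np.where(dist_to_centroid > threshold)[0])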

Comprehensive comparisons among all the clustering techniques cited above
have been presented in [52, 85, 89, 90, 92]. The major problem observed in
clustering is judging the number of clusters, which arises because the data can
reveal clusters of different shapes and sizes in an n-dimensional data space. The
common solution for this is to form a view of the data itself. However, in
real-world applications, it is impossible to view clusters in high dimensional
data [14]. Hence, there is no clustering technique that is universally applicable
in exposing the variety of structures present in multi-dimensional data sets.

The studies reviewed above indicate that load profile investigations have long
been undertaken for a variety of reasons. However, there are currently no
existing case studies of customer behavior changes that are important for
electricity supply utilities. Most of the studies carried out have reported
clustering techniques to group customers by their load profiles appropriately,
but no studies on applying classification techniques for load profiling are
present. Therefore, the present study focuses on employing an AI-based
classification technique, the SVM, for grouping load patterns of individual customers

based on similarities of consumption behavior for differentiating between fraud


and good (normal) load patterns.

2.2 Electricity Losses


Electricity or power losses that affect electricity utilities can be classified into
two categories: (i) technical losses, and (ii) NTLs. Electricity losses are defined
as the difference between quantities of electricity delivered and quantities of
electricity recorded as sold to customers [8].

By default, the electrical energy generated should be equal to the energy


registered as consumed. However, in reality, the situation is different because
losses occur as an integral result of energy transmission and distribution [8].
Davidson in [89] defined these energy losses in terms of the following equation.

ELoss = EDelivered − ESold        (2.1)

Where
ELoss is the amount of energy lost,
EDelivered represents the amount of energy delivered, and
ESold represents the amount of energy recorded or sold.

2.2.1 Technical Losses


Technical losses in power systems are naturally occurring losses, which are
caused by actions internal to the power system and consist mainly of power
dissipation in electrical system components such as transmission lines, power
transformers and measurement systems. Technical losses can involve degrees
of turbine efficiency in generation, together with substation, transformer, and
line related losses [94]. The most common examples of technical losses include

the power dissipated in transmission lines and transformers due to their


internal electrical resistance.

Technical losses are possible to compute and control, provided the power
system in question consists of known quantities of loads. Computation tools for
calculating power flow, losses, and equipment status in power systems have
been developed for some time. Improvements in information technology and
data acquisition have also made the calculation and verification of technical
losses easier. These losses are calculated based on the natural properties of
components in the power system, which include resistance, reactance,
capacitance, voltage, and current. Loads are not included in technical losses
because they are actually intended to receive as much energy as possible [94].

Technical losses include resistive losses in the primary feeders (I²R),


distribution transformer losses (resistive losses in windings and core losses),
resistive losses in secondary networks, resistive losses in service drops, and
losses in kWh metering [95]. Davidson in [89] specified an equation to calculate
the revenue loss due to technical losses, which is given below.

CLoss = UCost × ELoss + MCost        (2.2)

Where
CLoss is the revenue loss due to technical/additional losses,
UCost represents the unit cost of electricity,
ELoss represents the amount of energy lost, and
MCost represents the maintenance and additional operational costs.

Two major sources contribute to technical losses: (i) load losses consisting of

the I²R and I²X loss components in the series impedances of the various system
elements, and (ii) no-load losses, which are independent of the actual load
served by the power system [96]. The majority of the no-load losses are due to
transformer core losses resulting from excitation current flows [97].

2.2.2 Non-Technical Losses


Non-technical Losses (NTLs) refer to losses that occur independently of
technical losses in power systems. NTLs are caused by actions external to the
power system and also by the loads and conditions that technical losses
computations fail to take into account [94]. NTLs relate to the customer
management process and can include a number of means of consciously
defrauding the utility concerned [8]. More specifically, NTLs mainly relate to
power theft in one form or another and can also be viewed as undetected loads;
customers that the utilities don't know exist [94].

NTLs are more difficult to measure because they are often unaccounted for by the
system operators and thus have no recorded information. Two major sources
which contribute to NTLs are: (i) component breakdowns, and (ii) electricity
theft. NTLs caused by equipment breakdown are quite rare; contributing factors
may include equipment struck by lightning, equipment damaged over time, and
neglected or unmaintained equipment. Even though equipment failure due to
natural stresses such as rain, snow and wind is rare, equipment is selected and
the distribution infrastructure is designed with consideration for the local
weather and natural phenomena [94].

Reducing NTLs is crucial for distribution companies as these losses are


concentrated in the LV network, their origins are spread along the whole
system and are most critical at lower levels in residential, smaller commercial
and light industrial sectors [8]. The most prominent forms of NTLs are

electricity theft and non-payment, which are believed to account for most, if not
all, NTLs in power systems [94]. The factors contributing to NTL activities, as
indicated in [1, 22, 93], can be characterized to include the following:

1. Unauthorized line tapping; tampering with meters so that meters record


lower rates of consumption.
2. Unauthorized line diversions; stealing by bypassing meters or otherwise
making illegal connections.
3. Inadequacies and inaccuracies of meter reading.
4. Inaccurate customer electricity billing.
5. Poor revenue collection techniques.
6. Arranging billing irregularities with the help of internal employees, such
as making out lower bills and adjusting decimal point position on bills.
7. Non-payment of electricity bills.
8. Losses due to faulty meters and equipment.
9. Loss or damage of equipment/hardware e.g. protective equipment,
cables, conductors, switchgear etc.
10. Inaccurate estimation of non-metered supplies, e.g. public lighting,
agricultural consumption, rail traction etc.
11. Inefficiency of business and technology management systems.

Other forms of NTLs may also exist, such as unanticipated increases in power
system losses due to equipment deterioration over time, and system miscalculations
on the part of the utilities due to accounting errors or other information errors.
These losses have not been taken into account in the present study, due to
insufficient background information [94]. In order to estimate revenue loss due
to NTLs, Davidson [89] defined a general equation, which is given below.

CNTL = CLoss − CTLoss        (2.3)

Where
CNTL is the NTL cost component,
CLoss represents the revenue loss due to technical/additional losses, and

CTLoss represents the technical loss cost component.
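
As a simple illustration of how equations (2.1) to (2.3) fit together, the short Python sketch below evaluates them on hypothetical figures; every number and variable name is invented for illustration and does not come from TNB data.

    # Hypothetical worked example of eqs. (2.1)-(2.3); all figures are invented.
    e_delivered = 1_000_000.0   # kWh delivered into the distribution network
    e_sold      =   920_000.0   # kWh recorded as sold to customers
    u_cost      =        0.30   # assumed unit cost of electricity (currency per kWh)
    m_cost      =     5_000.0   # assumed maintenance and additional operational costs
    c_tloss     =    18_000.0   # assumed technical loss cost component

    e_loss = e_delivered - e_sold        # eq. (2.1): amount of energy lost
    c_loss = u_cost * e_loss + m_cost    # eq. (2.2): revenue loss due to the losses
    c_ntl  = c_loss - c_tloss            # eq. (2.3): cost component attributable to NTLs

    print(e_loss, c_loss, c_ntl)         # 80000.0 29000.0 11000.0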

Although some electrical power loss is inevitable, steps can be taken to ensure
that it is minimized. Several measures have been applied to this end, including
those based on technology and those that rely on human effort and ingenuity
[8]. The factors contributing to NTL activities, organized by the components
identified, are listed in Table 2.2 [1, 22, 93, 94].

Table 2.2: Types of NTLs based on the components identified

Component: Meter

  Power Utilities:
  - Inadequacies and inaccuracies in meter reading.
  - Losses due to faulty meters and equipment.
  - Inadequate or faulty metering.
  - Loss and damage of equipment/hardware, e.g. protective equipment, cables,
    conductors, switchgear etc.

  Electricity Customers:
  - Unauthorized line tapping and diversion.
  - Stealing by bypassing meters or otherwise making illegal connections.
  - Tampering with meters to ensure meters record lower rates of consumption.
  - Faulty meters not reported.

Component: Bills

  Power Utilities:
  - Inaccurate customer electricity billing.
  - Inefficiency of business and technology management systems.
  - Arranging billing irregularities with the help of internal employees.
  - Poor revenue collection techniques.
  - Inaccurate estimation of non-metered supplies, e.g. public lighting,
    agricultural consumption, rail traction etc.

  Electricity Customers:
  - Non-payment of electricity bills.
  - Arranging billing irregularities with the help of internal employees.
  - Arranging false readings by bribing meter readers.
  - Making out lower bills, adjusting the decimal point position on bills.
  - Ignoring to pay bills.


In the majority of factors contributing to NTL activities as indicated in Table 2.2,


electricity customers intentionally avoid paying their bills or are involved in
pilferage, theft, and unauthorized use [1]. Therefore, the intention of the
present study is to focus on detecting and identifying NTL activities in the LV
distribution network where deviations in customer behavior exist.

2.2.2.1 NTL Impacts


NTLs experienced by electricity supply utilities have major impacts on a
number of areas, including financial and economic outcomes and political
stability [1]. Financial impacts are the most critical for many utilities, as they
involve reduction in profits, shortage of funds for investment in improving the
power system and its capacity, and the necessity for implementing measures to
deal with the power system losses. More general economic impacts flow from
power utilities that are experiencing increasing losses associated with
corruption and forms of internal political intervention [22]. In such cases, the
costs of NTLs are passed down to the customers to cover such losses within the
utility operations. Therefore, the reduction of NTLs is critical for electricity
distribution networks as it will ensure that the costs for both the supplier and
the customers will be minimized, and the efficiency of the distribution network
will be improved [1].

2.2.2.2 Current NTL Solutions


Several methods have been recently proposed to overcome and minimize the
NTL problems in power systems. The two most common methods in use are: (i)
using hard measurement by installing electronic meters for revenue protection
[98], and (ii) applying estimation modeling [16]. The first method indicates that
installing electronic meters is beneficial, despite their high cost and necessary
network infrastructure extensions. In addition, Automatic Meter Reading
(AMR) proposed by Sridharan et al. in [99] has been used as an intelligent filter,
to provide an effective method for measuring losses and electricity theft in the

LV distribution networks [5]. The second method which involves estimation


modeling was developed by Fourie et al. in [18]. This approach applies a
statistical method to minimize electrical energy losses, particularly NTLs, in
electricity distribution networks. Estimation modeling is considered to be an
effective approach in reducing the cost of electricity to customers, which is
achieved by the technical evaluation and economical designing of electrical
distribution networks.

The current methods of minimizing NTLs impose high operational costs and
require extensive use of human resources. Several methods have been
developed in other countries to minimize NTL problems. Most electricity supply
utilities concentrate on onsite technical inspection of customers, which has high
operational costs and occupies considerable human resources and time [22]. Onsite
technical inspections in electricity supply utilities such as TNB in Malaysia are
carried out at random, while some targeted raids are undertaken based on
information reported by the public or meter readers. This study proposes a
method to overcome such limitations by monitoring and detecting deviations in
customers' load profiles, as an alternative to complement the existing
actions enforced by power utilities to reduce NTLs.

2.3 Fraud Detection


In recent years, the development of new technologies has provided further
ways for criminals to commit fraud activities. The Concise Oxford Dictionary
defines fraud as "criminal deception; the use of false representations to gain an
unjust advantage" [21]. Traditional forms of fraudulent behavior such as money
laundering have become easier to perpetrate and have been joined by new
kinds of fraud such as cellular phone telecommunication fraud, credit card
transaction fraud, and computer intrusion [35].


Fraud detection in the present context is defined as "monitoring the behavior of
a user population in order to estimate, detect or avoid undesirable behavior"
[100]. Fraud detection involves identifying fraud as quickly as possible once it
has been perpetrated. Fraud detection methods are continuously developed in
order to keep pace with criminals who adapt newer strategies [35]. Since most
criminals are not aware of the fraud detection methods that have been successful
in the past, they are likely to adopt strategies that lead to identifiable
frauds. Therefore, to detect fraud, earlier detection tools need to be applied as
well as the latest developments [21].

2.3.1 Fraud Detection Techniques


The most common fraud detection techniques reported through literature
include data mining, AI and statistical methods. Two comprehensive surveys of
fraud detection techniques have been reported in [100, 101]. In [101], Hodge et
al. presented three fundamental approaches to the problem of outlier detection,
which are as follows:

1. Unsupervised – Determining outliers with no prior knowledge of the
data using unsupervised clustering. In [102], Ferdousi et al. applied
unsupervised outlier detection to time-series financial data.

2. Semi-Supervised – Modeling only normality, or in a few cases modeling


abnormality, using semi-supervised recognition or detection.

3. Supervised – Modeling both normality and abnormality using


supervised classification with pre-labeled data.

The three broad machine learning approaches mentioned above comprise


five major outlier detection methods, namely: (i) Statistical-based methods, (ii)
Distance-based methods, (iii) Density-based methods, (iv) Clustering-based
methods, and (v) Deviation-based methods, which are as follows:

1. Statistical-based Outlier Detection – Statistical-based outlier detection


identifies outliers using a disagreement test that assumes a distribution
or probability model for the given datasets. The major drawback of this
approach is that it is only limited to one-dimensional samples, which
renders it unsuitable for use in data mining problems, as most current
databases in such applications are multi-dimensional [14, 75].

2. Distance-based Outlier Detection – Distance-based outlier detection


was introduced to overcome the drawbacks of statistical-based outlier
detection methods. In [103], Lozano et al. presented a parallel
combination of two outlier detection algorithms. The former algorithm
employed the distance-based approach, while the latter employed
the density-based approach. The distance-based algorithm was based on
nested loops, along with randomization and the use of a pruning rule,
while the density-based algorithm required one parameter, namely
the number of nearest neighbors used to define the local
neighborhood of the instance.

3. Density-based Outlier Detection – In [103–105], density-based


approaches for mining outliers with different densities and arbitrary
shapes were proposed. The results obtained revealed that the proposed
methods had significant speed improvements with comparable accuracy
over the current state of the art density-based outlier detection
approaches.

4. Clustering-based Outlier Detection – In [104], Ren et al. developed an


efficient cluster-based outlier detection method using a vertical data
model. The empirical results obtained revealed that this method
enhanced the clustering performance at five times the speed, when
compared to the current clustering-based outlier detection approaches.

5. Deviation-based Outlier Detection – Deviation-based outlier detection
is different from the density-based and clustering-based outlier
detection algorithms. Deviation-based outlier detection identifies
outliers by examining the main characteristics of objects in the group.
Two deviation-based outlier detection techniques reported in the literature
have been widely used, namely: (i) the Sequential Exception technique,
and (ii) the Online Analytical Processing (OLAP) Data Cube technique.

Alternatively, the techniques used to detect outliers can be divided into two
major categories, which are as follows:

1. Statistical Techniques – A review of novel statistical approaches used
for outlier detection is presented in [106]. In [107], Laurikkala et al.
presented an informal box plot to detect univariate outliers directly in
areas of medical science. Even though the technique employed was quite
simple, the results obtained revealed that the classification
accuracy was effectively increased.

2. Artificial Neural Network Techniques – A review of novelty detection


techniques based on the use of neural networks is presented in [108]. In
recent years, various applications have focused on using ANNs for
detecting fraud and outliers, which include fraudulent financial activities
exposed by the Securities and Exchange Commission (SEC) [109, 110],
communication network fraud [111] and fraudulent credit card
operations [27, 112, 113].

Outlier detection has been investigated as part of this research because it is
one of the most crucial approaches in data mining. Based on the
comprehensive literature reviewed in the present context, a
variety of detection techniques [102–105, 114] have been applied to detect
outliers. Outliers can originate for a number of reasons, including:
mechanical faults, changes in system behavior, fraudulent behavior, human
error, instrumentation error, noise and other inconsistencies [102].
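
As a concrete instance of the statistical techniques mentioned above, the sketch below flags monthly kWh readings that deviate strongly from a customer's own mean; the z-score rule, the threshold of 2.0 and the sample readings are arbitrary assumptions made for illustration, not a method taken from the cited studies.

    # Hypothetical z-score outlier check on one customer's monthly kWh history.
    import numpy as np

    monthly_kwh = np.array([310, 295, 320, 305, 298, 120, 315, 300,
                            290, 310, 308, 130], dtype=float)  # invented readings

    mean = monthly_kwh.mean()
    std = monthly_kwh.std()

    # Flag readings more than 2 standard deviations from the mean (arbitrary threshold).
    z_scores = (monthly_kwh - mean) / std
    outlier_months = np.where(np.abs(z_scores) > 2.0)[0]

    print(outlier_months)   # indices of abnormally low (or high) consumption months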

2.3.2 Proposed Fraud Detection Methods


Based on the broad literature examined, a review of fraud identification and
detection techniques is conducted below with businesses divided into two
major categories as follows:

1. Electricity business.
2. Other types of businesses, including: credit card transactions, insurance,
risk management, and telecommunication.

2.3.2.1 Fraud Detection in the Electricity Business


Several data mining and research studies on fraud identification and detection
in electricity business were reviewed, which include: Rough Sets [15], Statistical
methods [16, 17], Decision Trees [18, 19], Artificial Neural Networks (ANNs)
[20], Extreme Learning Machine (ELM) [8], Statistical-based Outlier detection
[21], Knowledge Discovery in Databases (KDDs) [10, 22] and Wavelet-based
feature extraction with multiple classifiers [23]. Most of the studies cited above
used data mining techniques by directly applying them to customer databases
as inputs.

From the literature reviewed, it is observed that Jiang et al. in [25] employed
Wavelet techniques with a combination of multiple classifiers to identify fraudulent
customers in an electricity distribution network. The Wavelet technique was
selected over conventional methods for feature extraction because the
localization and multi-resolution properties of wavelets help to obtain results
with greater accuracy. Alternatively, in [15] Rough Sets and in [18, 19] Decision
Trees were used for the classification of electricity utility customers. In
addition, Statistical methods in [16, 17] were also used to minimize NTLs in
electricity distribution networks. There were also studies conducted using the
ANN [20], Statistical-based Outlier detection [21], and most recently developed


ELM [8], where all studies presented different approaches employing a general
framework that had customer databases as the input data source.

2.3.2.2 Fraud Detection in Other Types of Businesses


Data mining techniques have also been used in other types of businesses,
including: telecommunication [24], insurance [25], risk management [26], and
credit card provision [27]. In each context, data mining techniques are used as a
tool to enable the detection and prediction of fraud activities.

The literature reviewed indicates that for credit card transactions, online
merchants are very susceptible to fraud, as the purchasers are not present
during the transaction [115]. Most of the credit card companies have used ANNs
as their tools to detect fraud [27, 112], with the telecommunication businesses
and the Securities and Exchange Commission (SEC) applying similar techniques.
All of the applications cited above have employed data mining techniques to
expose fraud directly from their customer databases. Alternatively, in [116], a
rule-learning based algorithm was developed in order to identify users with
fraudulent behavior from databases of cellular phone customer transactions.

2.4 Electricity Theft


Electricity theft is a problem that has long been known to power utilities, and is
defined as "a conscious attempt by a person to reduce or eliminate the amount
of money he or she will owe the utility for electric energy" [94]. The Provincial
Energy Authority (PEA) of Thailand [117], TNB in Malaysia [118], and other
distinct published sources in [119, 120], agree that the most prominent sources
of NTLs are: (i) electricity theft, and (ii) non-payment of electricity bills.
Electricity theft can range from tampering with meters to indicate false billing
consumption to performing unauthorized connections to the power grid [94].

The current approach used by the PEA of Thailand [117], TNB in Malaysia [118]
and other power utility companies, in order to detect the two major causes of
NTLs, as mentioned above, is by performing onsite technical inspection of
customers. This primarily involves field staff monitoring meters and access
points in the transmission and distribution system on a regular basis. Onsite
technical inspections in TNB Malaysia [118] are carried out at random, while
some targeted raids are undertaken based on information (irregularity reports)
reported by the public or meter readers. In addition, most power
utilities even provide specialized training to regular meter readers in order for
them to spot irregularities in consumption behavior [94].

The reason that meter inspection is the main method of NTL detection is
that power utilities consider electricity theft to be the major source of
NTLs, and the majority of electricity theft cases involve meter tampering or
meter vandalism [94].

2.4.1 Watt-Hour Meters


Electric meters or Wh (Watt-hour) meters record readings which are used to
bill customers for the amount of electricity consumed [36]. In virtually all
household, commercial and industrial sectors, kWh (kilowatt-hour) meters are
currently used to record the amount of electricity consumed for calculation of
electricity bills by utility companies.

The principles of operation of electric Watt-hour (Wh) meters have remained
virtually unchanged since the Watt-hour meter was first invented in the 1880s
and 1890s [121]. The basic principle for a single-phase electrical energy
measurement meter, first commercially used in 1894, is as follows. A standard
Watt-hour meter consists of two coils that produce electromagnetic fluxes [94]:

1. A coil connected across the two leads that produces a flux proportional
to the voltage (potential coil) as shown in Figure 2.1 (top left).
2. A coil connected in series with one of the leads that produces a flux
proportional to the current (current coils) as shown in Figure 2.1.

Figure 2.1: Basic components of a Watt-Hour Meter – Clockwise from top left:
The coil connections for voltage and current sensing elements, the rotating disc
that records the electricity consumption, and the basic construction [94]

The dot product of these two electromagnetic fluxes creates a force


proportional to the power load [94]. An illustration of the basic components of
the Watt-hour meter is given in Figure 2.1. The development of these electricity


meters, technological improvements, and alternative designs, which reflected


the growing power industry in the late 19th century, is presented in detail in
[121].

In early meter designs, such as the ones shown in Figure 2.2, electricity meters
were not enclosed and all the parts, including the meter installation, were easily
accessible to anyone [94]. As early as 1899, the Association of Edison
Illuminating Companies (AEIC) indicated that electricity theft was already a
concern. In response to the recommendations proposed by the committee of
the AEIC, the following improvements along with other efficiency and accuracy
improvements were incorporated into electricity meters [121]:

1. A dust and insect-proof cover.


2. A cover and frame so shaped and retained together, as to render
dishonest and curious tampering with the internal mechanism as
impossible as may be.
3. Means for full protection from malicious tampering the heads of all
screws in the base, which binds the damping magnets, etc. in place
without rendering them inaccessible to those authorized to reach them.

The literature above indicates that the problem of electricity theft has been
around almost as long as power systems themselves. Modern meters, such as
those shown in Figure 2.3, are comparatively well enclosed and have seals
that can reveal tampering [94]. However, theft can still occur. Most power
utilities train their inspection teams to spot tampering; however, sometimes
access to the inner mechanisms of the meter can be achieved by drilling a very
fine hole in a less obvious part of the enclosure [120], which is difficult
for inspection teams to identify. A detailed historical background and timeline
of Watt-hour meters, from the earliest days in the 1880s to modern present-day
meters, can be found in [122].


Figure 2.2: Earliest electricity recording meters – Left to right: Westinghouse


ampere-hour meter by Shallenberger (1888-1897) and Gutmann type A
(1899-1901) Watt-hour meter by Sangamo [122]

Figure 2.3: Modern electricity recording meters – Left to right: Schlumberger


J5S (1984) and General Electric I-70S (1968) [122]

2.4.2 Methods of Electricity Theft


Presently, a vast number of methods are employed by power utility customers,
in order to commit electricity theft. However, the two most common methods of

electricity theft widely practiced are: (i) directly connecting an unregistered


load to a power line, and (ii) tampering with a registered load's meter in order
to reduce the actual consumption of the load indicated by the meter [94]. The
various methods practiced to commit electricity theft for high and low voltage
electricity meters, recorded by the PEA of Thailand [117] and TNB in Malaysia
[118] are described below in detail.

2.4.2.1 Low Voltage Meters


A low voltage (220V single-phase) Watt-hour meter is based on the principles
of operation discussed earlier in section 2.4.1 of this chapter. Presently, electromechanical induction meters used by developing countries in the ASEAN group,
such as Malaysia, Indonesia, Thailand and Vietnam can be tampered with by
breaking the meter seals and gaining access to the components inside. The
various methods employed by perpetrators to steal electricity from LV meter
installations are discussed below in detail. The parts of a LV single-phase meter
where tampering mostly occurs are shown in Figure 2.4.

1. Direct connections from the power line – In 220V single-phase
systems, mainly used in residential districts and small businesses,
electricity meters and equipment operate at low voltages; therefore, direct
connections to the power lines are feasible. This can be accomplished
using a pair of rubber gloves for protection, together with a ladder,
pliers and the other necessary tools [94]. Most of the
electricity theft cases identified by the PEA of Thailand [117] and TNB in
Malaysia [118] indicate that this is by far the most common method used
by perpetrators, especially street vendors and residents of shanty towns.

Shanty towns, also referred to as slums or squatter settlement camps, are settlements
(sometimes illegal or unauthorized) of people who live in improvised dwellings made from
scrap materials, often plywood, corrugated metal and sheets of plastic.


Figure 2.4: Parts of a single-phase Watt-hour meter where tampering frequently


occurs [94]

2. Meter tampering by breaking the seal – The other most common
method for electricity theft is breaking the meter's housing seal in order to
tamper with the components inside. Once the meter's seal is broken,
access to the meter inside the housing is attained, where several actions
can be performed. One of the ways is to mechanically obstruct the
spinning disc and the axis that perform the recording. Another popular
method is to turn back the number dials in the meter, which bill collectors
read in order to calculate the electricity consumption [94].

3. Using alternate neutral lines – Typically, in a single-phase system, one
wire, known as the positive line (the line carrying the current), runs into
the house. Additionally, the neutral line is usually grounded and is
sometimes provided by the foundation of the house. This scenario
suggests that an alternative neutral line can be used to show a reduction in
the meter's unit count. This can be accomplished by using a small
transformer as the neutral; the meter that uses this
neutral source will then read the incoming voltage lower than it actually is
[94], thus reducing the units counted by the meter.


4. Phase-to-phase connection – The phase-to-phase connection is similar


to using an alternate neutral line. However, in this case the system
voltage becomes the phase-to-phase voltage at 240V or 380V [94],
depending on the country, as different countries use different main
voltages and frequencies.

Other methods of electricity theft include tapping off nearby paying customers
and using magnets to reduce the speed of the rotating disc in the meter housing.
One of the other possible methods found to reduce the billing consumption at
the consumer end is by using a "Tron Box" [120], which reverses the phase
signal of one of the lines passing through the meter to cancel out the opposing
phase in the line. This reverse phase effect reduces the speed of the meter,
indicating lower electricity consumption, and can also cause the meter dials
to move backwards [94].

2.4.2.2 High Voltage Meters


High voltage (12kV or 24kV, 3-phase, 3 or 4-wire primary) 3-phase Watt-hour
meters are installed throughout the PEA system in Thailand [117] and TNB in
Malaysia [118] to monitor loads that consume high volumes of electricity,
requiring high voltage. The 3-phase Watt-hour meters use a technique known
as the "two Watt-hour meters" connection to measure electricity consumption.
Since the load is connected to an HV line and draws high current levels,
current and voltage sensing is achieved by using current
transformers (CTs) and voltage taps respectively [94]. The various methods
applied by perpetrators to steal electricity from HV meter installations are
discussed below in detail. A schematic drawing illustrating the connections in a
3-phase Watt-hour meter is shown in Figure 2.5.

A current transformer is a device that outputs a current proportional to the load current being
measured, enabling the meter to measure the load without subjecting it to a large current [94].
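
For reference only, and not as a claim about the exact metering standard used by PEA or TNB, the standard two-wattmeter relation for the arrangement just described (CTs on phases A and B, with phase C as the common voltage reference) can be written as:

    % Two-wattmeter method with phase C as the common reference (textbook relation).
    P = W_1 + W_2 = V_{AC}\, I_A \cos\theta_1 + V_{BC}\, I_B \cos\theta_2

where θ1 and θ2 are the angles between V_AC and I_A, and between V_BC and I_B, respectively. Because the total power is the sum of only two element readings, defeating either element (for example by grounding one CT secondary, as in the tampering methods described below) removes a large share of the registered consumption.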


Figure 2.5: A 3-phase Watt-hour meter connection [94]

1. Direct connections from the power line – One of the obvious methods
used by perpetrators to eliminate consumption records is to bypass the
meter. The major obstacle in this scenario is that most HV loads are
constructed and connected at the request of customers. Since customers
have knowledge regarding the location of the HV power lines, as they are
the ones who request the connections, direct connections
from the power line to the loads can be established with assistance from
electricians. However, in most cases, not many electricians would risk
exposing themselves to HV power lines without the power utility there to
assist them with safety [94].

2. Tampering with the meter – Another common method for electricity
theft is breaking the meter's housing seal in order to tamper with the
components inside, as previously mentioned in section 2.4.2.1 of this
chapter. After gaining access to the components inside the meter's
housing, the spinning disc and the axis in the meter that perform the
recording can be disrupted. Another popular method is to turn
back the number dials in the meter that bill collectors read in order to
calculate the units of electricity consumed [94]. This method, however,
is obviously not applicable to digital display meters.

3. Tampering with the terminal seals – The most common meter abuse
method used by far is tampering with the terminal seals. Since the
terminal seals are in an easy-to-reach location, i.e., immediately below
the meter, most perpetrators use this to their advantage. This is accomplished
by breaking the terminals and connecting one of the CT wires to the
ground, making it appear to the meter that one of the phases does not
have any voltage or current [94].

4. Breaking the control wires – The control wires are the secondary
wires of a CT. Meters for large loads measure high currents; therefore,
step-down CTs are connected to reduce the current level, in order to make
the current level compatible with the components in the meter. Theft can
be accomplished by breaking the insulation of a control wire and
connecting external taps to it, in order to reduce the current going into
the meter, which will cause the meter to read less current than the actual
amount [94].

5. Shorting the control wires – Shorting the control wires in the meter
will divert the current away from the meter. In this scenario, the current
going into the meter will be zero. This effect is immediate; with zero
current, the power consumption readings will be zero and the
accumulated consumption will remain stationary [94].

6. Switching the current transformer wires – Another ingenious and
effective way used by perpetrators to steal electricity is switching the
CT wires. Generally, most 3-phase meters use only two CTs to measure
the current from phases A and B, by assuming the load is balanced
between all three phases. However, in reality, large industrial facilities
may have unbalanced loads, where the imbalance can vary between
10% and 20% of the most heavily loaded phase. In the majority of cases,
phase C always has the least amount of load and the lowest power factor
(PF). By switching the CTs or the control wires from the CT secondary
windings, the meter's reading speed can be altered. This is accomplished
by removing the CT from one of the phases A or B, and replacing the
removed CT on phase C to reduce the meter's power reading speed [94].

7. Breaking the voltage taps – Voltage taps, as shown in Figure 2.5, are
present in the meter housing in order to allow the meter to read the
voltage of the load. Perpetrators usually employ one of the following
methods for electricity theft: (i) break the voltage taps, (ii) short the
voltage taps to the ground, or (iii) connect the voltage taps to another
line, in order to distort the reading of the meter so that the meter reads
a lower voltage [94].

Many other methods for electricity theft have been identified. However, some of
the methods are not feasible enough, while others require too much effort or
are outright dangerous [120]. As an example, one far-fetched idea of
electricity theft found was to place enormous coils around HV power lines to
act as transformers with ridiculously large air gaps [123].

2.5 Overview of Tenaga Nasional Berhad


Tenaga Nasional Berhad (TNB) is the largest electricity supply company in
Malaysia, with a complete infrastructure that includes generation, transmission,
and distribution systems serving peninsular Malaysia [124]. TNB started its
business in 1946 as the Central Electricity Board in
Malaysia. At that time, it had an initial generation capacity of 46 MW, with 2,466
employees serving 45,000 customers [118]. After more than six decades of
operation, TNB has 11,200 MW of installed generation capacity, more than MYR
67 billion in assets, with 28,000 employees serving more than seven million
customers [125].

In peninsular Malaysia, TNB contributes 55% of the country's total supply
capacity through six thermal stations and three major hydroelectric schemes.
TNB also manages and operates a comprehensive transmission network, the
National Grid, that links TNB power stations and Independent Power
Producers (IPPs) to the distribution network [118]. The National Grid is also
interconnected with Singapore's transmission system in the south and
Thailand's transmission system in the north [125]. Integral to all this is TNB's
management of power distribution, through a comprehensive distribution
network, call management centers, and customer service centers.

TNB has classified customers into six different categories based on different
types of businesses, which include: (i) domestic, (ii) commercial, (iii) industrial,
(iv) agricultural, (v) mining, and (vi) public lighting [118]. Industrial customers,
or Large Power Customers (LPCs) as referred to by TNB, comprise a fairly large
number of TNB users and contribute the largest proportion of sales revenue.
The commercial and the domestic customers then follow in sequence; these
comprise the largest proportion of TNB users, referred to as Ordinary Power
Customers (OPCs). The remaining agricultural and mining categories comprise
a smaller proportion of customers.

2.5.1 NTLs Encountered by TNB


In common with other such utilities in the region, TNB experiences NTLs, most
especially due to power theft and delinquent customers [118]. In 2004, TNB
recorded revenue losses as high as RM 800 million a year as a result of
electricity theft, billing errors and faulty metering [8]. Additionally, in 2005,
TNB detected 3,000 cases of electricity theft from OPCs and 105 cases from
LPCs. This involved the recovery of MYR 23 million and MYR 11 million
respectively from these consumer sectors [125].


NTLs are considered to be a serious problem for many electricity supply


companies worldwide, as they affect a company's profit and credibility and also
increase the cost of electricity to its customers [16]. In Malaysia, the revenue
cost due to NTLs that was recovered in 2004 contributed an estimated MYR 106
million to TNB [126]. This indicates that preliminary efforts carried out by TNB
in order to detect and minimize NTLs reduced a significant amount of illegal
power use; however, power theft problems still exist in all possible forms.
Therefore, there is a crucial need to minimize NTLs for the sake of power
utilities, including TNB Malaysia (that is the focus here), and their customers.

2.5.2 Measures Taken by TNB for NTL Reduction


In Malaysia, TNB has adopted various measures in order to reduce NTLs and
prevent electricity theft, which include: (i) upgrading existing monitoring and
detection systems, (ii) performing onsite inspection of customer installations,
and (iii) physically protecting metering installations in the cases of high risk
customers (HRCs) [126].

Reducing NTLs is a core strategy for TNB under its 20-year strategic master
plan, and since 2004 TNB has made intensive efforts to reduce power theft,
as power theft is a major contributor to NTLs [126]. Currently, three
measures comprise the main effort by TNB to minimize and work towards
preventing NTLs, which include: (i) the installation of Remote Meter Reading
(RMR) service [127] to provide power consumption statistics and online billed
data, (ii) the installation of a prepayment metering system [128], and (iii) the
setting up of a Special Enforcement Against Losses (SEAL) team to investigate
problems by conducting onsite customer meter installation inspections [126].
Currently, the RMR service and the prepayment meter system are targeted only
towards the HV distribution network, for LPCs. For OPCs, such as commercial,
domestic and light industry customers in the LV distribution network, the SEAL
team provides effective solutions to minimize and reduce NTLs.

The SEAL team was set up by the NTL Electricity Theft group of TNB in 2004, in
order to reduce and minimize NTL problems faced by TNB [126]. The SEAL
team's activities include improving metering and billing processes, ensuring
metering is accurate, and reducing the theft of electricity. In 2005, the NTL
Electricity Theft group successfully reduced distribution losses by about 1%
mainly through the efforts of the SEAL team. The electricity theft rate for TNB in
2005 was reduced by almost 50%: the theft rate for LPCs was reduced
from 3% to 1.5%, and likewise for OPCs the theft rate was reduced from 4.1%
to 2% [125].

In order to increase the effectiveness of the SEAL team's operations, additional
engineers, technicians and meter readers were specially trained in 2006 to spot
billing consumption irregularities, and new equipment
and transportation were procured in order to pursue suspected cases of power
theft that were overlooked in the previous years. In 2006, the SEAL team
aggressively carried out various activities, including improvement of the
customer billing process, conducting physical meter inspections, and testing and
rectification of the metering systems for LPCs and a certain number of OPCs.
These efforts resulted in substantial amounts of back billing and collections.
Additionally, the SEAL team installed: (i) Secure Meter Boxes (SMBs) for HRCs,
and (ii) Expanded Metal Protection Doors (EMPDs) for OPCs, in order to
prevent meter tampering. TNB also expanded its Enhanced Customer
Information Billing System (e-CIBS) in 2006, in order to identify HRCs for
better security against power theft. The e-CIBS provides the SEAL team with
accurate analysis and consumption reports, which can identify consumption
patterns of repeated power theft [124].

In 2007 and 2008, large inspection campaigns were carried out by the SEAL
team with little success. The reason for this is the emergence of newer and
improved methods of electricity theft, which are difficult to identify. The
current actions taken by the

SEAL team in order to address the problem of NTLs include: (i) meter checking
and premise inspection, (ii) reporting on irregularities, and (iii) monitoring of
unbilled accounts, meter reading and sales. Currently, customer installation
inspections are carried out by the SEAL teams without any specific focus or
direction, and most inspections are carried out at random, while some targeted
raids are undertaken based on information reported by the public or meter
readers. Therefore, the motivation for the present research investigation is to
reduce NTLs effectively in the Malaysian electricity supply industry, mainly
TNB, by proposing an alternative solution to complement its currently
existing approaches.

2.6 Summary
The fundamental objectives of power utilities are to maximize profit and
minimize operational costs, which require dealing with the common problems
of losses. Such losses are categorized as technical losses and NTLs. The need to
minimize and reduce NTLs is critical for power utilities, as these losses
contribute to the cost of electricity, which is passed on to the utility customers.
Among all the current solutions available, which include field investigations and
Supervisory Control and Data Acquisition (SCADA) systems, the present
approach is to use changes in customer behavior, reflected in load consumption
variations, as a means of indicating fraud activities that contribute to NTLs. For
this purpose, a fraud detection system similar to those implemented by
electricity businesses and other businesses, such as credit card transactions and
bank loan applications is recommended for implementation by power utilities.

This chapter reviewed the background and literature relating to losses in power
utilities, including technical losses and those due to NTL activities, referring to
the impact of NTL activities from an economic and a financial perspective. A
comprehensive review of customers' load profile analysis in several countries
was presented in this chapter. Also considered were the clustering techniques

that have commonly been implemented to classify electricity customers using


the customer load profiles together with the associated cluster validity
measures. Some background issues concerning fraud detection techniques and
fraud cases in electricity businesses, as well as in other businesses including
credit card transaction, telecommunication, insurance and risk management
were also reviewed. In addition, the operation of Watt-hour meters and various
methods of electricity theft including meter tampering, direct connections and
reducing the speed of the rotating disc in electric meters were presented. Finally,
this chapter provided an overview of TNB as the largest power utility in
peninsular Malaysia and outlined its needs with respect to implementing
solutions to minimize NTL activities.


CHAPTER 3

SUPPORT VECTOR MACHINE

3.0 Overview
This chapter presents the background and theoretical concepts of the artificial
intelligence (AI) techniques applied in this research study. The introduction
starts off by discussing the preliminaries of AI briefly, which include pattern
recognition and machine learning techniques, such as supervised learning and
unsupervised learning. Next, some popular AI techniques are briefly discussed,
which include: Expert System (ES), Fuzzy Logic (FL) and Artificial Neural
Networks (ANNs). In the subchapter on the SVM, statistical learning theory is
presented, followed by the Structural Risk Minimization (SRM) principle. The
background and theoretical concepts of Support Vector Machines (SVMs) with
regards to Support Vector Classification (SVC) are discussed in detail where
derivations of the margin hyperplanes for linear and non-linear SVC are
presented, followed by the kernel methods. The last part of the chapter
presents the Sequential Minimal Optimization (SMO) algorithm used for
optimization of Quadratic Programming (QP) problems in SVMs.

3.1 Introduction to Artificial Intelligence (AI)


Artificial Intelligence (AI) is the area of computer science focusing on creating
machines that can engage in behaviors that humans consider intelligent [129].
John McCarthy, who devised the term "artificial intelligence" in 1956, defined
this field as "the science and engineering of making intelligent machines" [130].

The field of AI was founded on the claim that a central property of human
beings, i.e. intelligence, can be so precisely described that it can be simulated by
a machine [131]. AI first emerged in the early 1940s, when scientists began new
approaches to build intelligent machines based on recent discoveries in
neurology, cybernetics, the new mathematical theory of information and, most
importantly, the invention of the digital computer [132–134]. After World
War II, a number of people independently started to work on intelligent
systems. An English mathematician named Alan Turing was the first to conduct
a lecture on AI in 1947 [135], and decided that AI was best researched by
programming computers, rather than by building machines [136]. Officially, the
field of AI research was founded at a conference at Dartmouth College in the
summer of 1956 [132, 133, 137]. The researchers present at this conference
later established AI laboratories at world-famous computer science
research institutions, such as the Massachusetts Institute of Technology (MIT)
and Stanford University in the United States.

By the late 1950s, there were many researchers working in the field of AI, and in
fact all of their work was based on computer programming [136]. From the
mid-1960s, AI research started to be funded by the U.S. Department of
Defense, i.e., the Defense Advanced Research Projects Agency (DARPA). In the
early 1980s, with the commercial success of expert systems and ANNs, the field of
AI research reached a new horizon. By 1985, the market for AI had reached more
than a billion dollars and governments around the world funded AI
research projects [132, 133, 137].

In the 1990s, AI achieved its greatest successes, when many industries replaced
expensive human experts with mainstream computing systems that
reduced the cost of business and the high-risk exposure of their employees [138].
Since then, many AI based techniques have been successfully transitioned from
the research lab into real-world applications for pattern recognition, data

mining, control systems and robotics [139]. In the early 21st century, many
areas throughout the technology industry, such as defense, transportation,
manufacturing, and entertainment commercialized applications based on AI,
some of which include: face recognition, medical diagnosis of cancers and
tumors, aircraft control, nuclear power systems, and intelligent systems used
for optimization, monitoring, control, planning, scheduling and fault diagnosis.

In today's era, AI is already a part of human life in many countries, and it has
grown into an important scientific and technological field over recent years.
Currently, AI is helping people make better use of information, faster and more
efficiently, in performing tasks which require detailed instructions, mental
alertness and good decision-making capabilities. The future benefits of AI
are indeed very promising, as it is currently being deployed in space exploration
missions and is also being used to develop robots with human-like skills
and characteristics.

3.1.1 Machine Learning


Machine learning is a broad subfield of AI that is concerned with the design and
development of algorithms and techniques, which allow computers to learn,
and improve their performance over time based on data from databases and
sensors [140]. The main focus of machine learning research is to extract
information from data automatically, using computational and statistical
methods and produce models, such as rules and patterns. More specifically in
terms of pattern recognition, machine learning is an approach to search a large
space of potential hypotheses and to determine the hypothesis that will best fit
the labeled or unlabeled data [37]. Therefore, machine learning is not only
related to data mining, pattern recognition and statistics, but it is also
considered a part of theoretical computer science [141].


In recent years, machine learning has been applied to a wide variety of


applications, which include: natural language processing, search engines,
medical diagnosis, bioinformatics, detecting credit card fraud, stock market
analysis, classifying DNA sequences, speech and handwriting recognition, face
recognition, game playing and robot locomotion [142].

3.1.1.1 Pattern Recognition


Pattern recognition, also known as pattern classification, is a subtopic of
machine learning, which is defined as "the act of taking in raw data and taking
an action based on the category of the data" [143]. Pattern recognition aims at
classifying data (patterns) based either on a priori knowledge that is acquired
by human experts or on knowledge automatically learned from data [144]. The
patterns to be classified are usually groups of measurements or observations,
defining points in an appropriate multi-dimensional space [145]. A pattern
recognition system consists of three main components, as follows [146]:

1. A database of observations or a sensor that gathers the observations to


be classified or described.
2. A feature extraction mechanism that computes numeric or symbolic
information from the observations.
3. A classification or description scheme that does the actual job of
classifying or describing observations, relying on the extracted features.

Pattern recognition classifies objects into a number of classes or categories


based on the patterns that objects exhibit. The classification patterns are
recognized as a combination of features and their values. The objects are

described with a number of selected features and their values. An object x thus
can be described as a vector of features [144]:

x = [x_1, x_2, ..., x_p]^T        (3.1)

Where
p is the number of features or attributes.

The features or attributes together span a multi-variate space called the feature
space or measurement space. In pattern recognition, the classification task can
be seen as a two-class (binary) or multi-class problem. With respect to the focus
of the present study, in a two-class problem, an object is classified as belonging or
not belonging to a particular class [144].
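
To relate eq. (3.1) to the setting of this study, the short sketch below assembles a feature vector from a customer's monthly kWh readings; the 12-month window, the peak normalization and the invented values are illustrative assumptions rather than the exact feature set used later in this thesis.

    # Hypothetical construction of a feature vector x = [x_1, ..., x_p]^T from
    # monthly kWh readings; the values and the normalization are assumptions only.
    import numpy as np

    monthly_kwh = np.array([310, 295, 320, 305, 298, 120, 315, 300,
                            290, 310, 308, 130], dtype=float)

    # Scale readings to [0, 1] so customers with different consumption levels
    # become comparable; p = 12 features, one per month.
    x = monthly_kwh / monthly_kwh.max()

    print(x.shape)   # (12,): one object described by p = 12 features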

During the classification scheme, the learning algorithm takes the training data
as input and selects a hypothesis from the hypothesis space that fits the data
[144]. There are many machine learning algorithms, such as unsupervised
learning and supervised learning; however, the availability or non-availability of
training samples determines which machine learning approach should be considered.

3.1.1.2 Machine Learning Algorithms


Many machine learning algorithms have been developed for the purpose of
pattern recognition, which include: supervised learning, semi-supervised
learning, unsupervised learning, reinforcement learning, transduction and
learning to learn. There are three major factors used to determine the type of
machine learning algorithm to be considered for use: (i) the type of application,
(ii) the type of data i.e., labeled or unlabeled, and (iii) the desired outcome of the
algorithm. In this research thesis, the two major machine learning algorithms, (i)
supervised learning and (ii) unsupervised learning, are discussed and evaluated,
as the other machine learning algorithms do not contribute to the focus of the
present study.


3.1.1.2.1 Supervised Learning


Supervised learning is a machine learning approach that creates an objective
function from a set of patterns that have already been classified or
described [37]. This set of patterns is termed the training set [144]. In pattern
recognition problems, the training data consists of pairs of input observations
(numerical feature values) along with their class labels. The output of the
function consists of the predicted class labels of the input samples from the
testing data. The task of the supervised learner is to predict the value of the
function for any valid input sample after having seen a number of training
samples [37].

In supervised learning, it is necessary that the data used for training and testing
relates to the domain of interest. As an example, suppose that a bank wishes to
detect fraudulent credit card transactions. In order to accomplish this, some
basic knowledge regarding the domain of interest is required to identify factors
that are likely to be indicative of fraudulent credit card usage. These factors may
include: the frequency of usage, amount of transactions, spending patterns, the
type of business engaging in the transaction and so forth. These variables are
referred as features or independent variables. These independent variables
should be in some way related to the targets, or dependent variables [37], so
that the supervised learning can generate a model to map the input objects to
the desired outputs.

Deciding on the variables or features to use in a model is a very difficult


problem in general, and this is known as the problem of feature selection. Many
feature selection methods exist [147], such as: cross-validation (CV), random
selection (RAN), information gain (IG), genetic algorithm (GA) and simulated
annealing (SA) algorithm [148].
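
As a small, hypothetical illustration of cross-validation-driven feature selection, one of the methods listed above, the sketch below ranks each candidate feature by the cross-validated accuracy of a simple classifier trained on that feature alone; the classifier choice, the synthetic data and the scoring rule are assumptions made for illustration.

    # Hypothetical univariate feature ranking via cross-validation (CV).
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.random((200, 6))                                   # 6 candidate features, invented
    y = (X[:, 2] + 0.1 * rng.random(200) > 0.5).astype(int)    # label mostly driven by feature 2

    scores = []
    for j in range(X.shape[1]):
        clf = SVC(kernel="rbf")
        # Mean 5-fold CV accuracy using feature j on its own.
        scores.append(cross_val_score(clf, X[:, [j]], y, cv=5).mean())

    ranking = np.argsort(scores)[::-1]
    print(ranking)   # features ordered from most to least predictive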


In supervised learning, given a set of N observations [149]:

D = {(x_1, y_1), ..., (x_N, y_N)}        (3.2)

with input samples x_i ∈ R^n, for i = 1, 2, ..., N, that indicate target outputs
y_i ∈ Y = {1, 2, ..., K} [149], the goal of supervised learning is to find a function
f_α(x) in the set of functions which minimizes a loss functional on future
observations [150]. In this research study, the monthly kWh consumption is


used as the independent variables x and the cases confirmed by TNBD SEAL

teams (targets) are used as the dependent variables * . Then, the relationship
between  and * is given by the joint probability density is [37]:
9(x, * + = 9 (x+9(*|x+

(3.3)

Where
x represents the independent variable representing the features,
* represents the dependent variable representing the targets,
9(x+ is the prior probability, and

9(*|x+ is the conditional probability.

This formulation allows * to become a deterministic or stochastic function of x.

However, in reality, data is generated in the presence of noise, so the observed


values are stochastic even if the underlying function is deterministic [151]. For
the purpose of this research study, a function that minimizes the binary

classification error, where 4 = '1, +1, is to be found, more specifically, the


risk functional is to be minimized [152]:

(67 + = ; <(*, 67 (x, = ++ >?(x, *+


Where

(3.4)

< is the cost of making the prediction,

<(*, 67 (x, = ++ is the loss between the target * and function 67 (x, =+,

54

67 (x, =+ predicts the targets from a the inputs x and model parameters =,
?(x, *+ is the joint probability distribution.

In binary SVM classification, the loss function <(*, 67 (x, = ++ in eq. (3.4) can be

rewritten as [152]:

<(*, 67 (x, = ++ = @

0 if * = 67 (x, =+K
1 otherwise

(3.5)

Since the joint probability distribution ?(x, *+ of the inputs and targets of the

observations is unknown, this risk functional is estimated using observations to


create an empirical risk functional [152]. In the case of binary classification, eq.
(3.4) is discretized with the loss functional from eq. (3.5), to create the binary
classification empirical risk functional or classification error:
 L# (67 + =
N |* 67 (x , = +|

(3.6)
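The empirical risk of eq. (3.6) is simply the fraction of training samples misclassified by a candidate function. A minimal sketch, with hypothetical data and parameters, is:

import numpy as np

def f_alpha(x, w, b):
    """A candidate linear decision function returning labels in {-1, +1}."""
    return np.where(x @ w + b >= 0, 1, -1)

X = np.array([[0.2, 1.0], [1.5, 0.3], [2.0, 2.2], [0.1, 0.4]])
y = np.array([-1, 1, 1, -1])            # confirmed targets (cf. SEAL inspection labels)
w, b = np.array([1.0, -0.5]), -0.4      # hypothetical model parameters (alpha)

predictions = f_alpha(X, w, b)
# fraction of misclassified training samples; equals eq. (3.6) for +/-1 labels
R_emp = np.mean(predictions != y)
print(R_emp)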

The most widely used supervised learning classifiers include: ANNs, SVMs, Decision Trees, Gaussian Mixture Models, Discriminant Analysis (DA), Classification Trees, Naïve Bayes and Radial Basis Function (RBF) classifiers.
The advantage of supervised learning is that it provides confidence values for
the predictions that are important for this research study. However, the
disadvantage of supervised learning is that estimating the distributions can be
difficult and a full probabilistic model may not be required [37].

3.1.1.2.2 Unsupervised Learning


Unsupervised learning is a method of machine learning where a model is fitted
to the observations without referring to any predefined or learned cases. More
specifically, unsupervised learning studies the behavior of systems in order to represent particular input patterns in a way that reflects the statistical structure of the overall collection of input patterns [153]. This is accomplished by


gathering a data set of input objects, after which the learning algorithm typically
treats input objects as a set of random variables, by building a joint density
model for the data set [37].

In contrast to supervised learning, there are no explicit target outputs or


environmental evaluations associated with each input; rather the unsupervised
learner brings to bear prior biases as to what aspects of the structure of the
input should be captured in the output. The only data that unsupervised

learning methods use are the observed input patterns x , which are often
assumed to be independent samples from an underlying unknown probability
distribution 9 (x+, and some explicit or implicit a priori information as to what is

important [153].

Unsupervised learning relates to the problem of density estimation in statistics,


however, it also encompasses many other techniques that seek to summarize
and explain key features of the data [143]. The two classes of methods that are
used to perform unsupervised learning are: (i) density estimation techniques,
and (ii) feature extraction techniques. In brief, density estimation techniques
explicitly build statistical models of how underlying causes can build the input,
while feature extraction techniques try to extract statistical regularities or
sometimes irregularities directly from the input observations [153].

There are many unsupervised learning techniques, the most common of which is clustering. Other forms of unsupervised learning include: Self-Organizing Maps (SOMs), Adaptive Resonance Theory (ART), Independent Component Analysis (ICA), Principal Component Analysis (PCA) and other methods such
as association rules and collaborative filtering. The main advantage of unsupervised learning is that it is a useful technique for data compression, as all data compression algorithms either explicitly or implicitly rely on a probability distribution over a set of inputs [153].

3.1.1.3 Hypothesis Selection


As stated in section 3.1.1.2.1 of this chapter, during supervised learning a hypothesis, say $\alpha$, needs to be found based on the available training data $D = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)\}$, such that the risk $R$ is minimized. In practice, the true distribution $P(\mathbf{x}, y)$ is unknown and eq. (3.4) cannot be evaluated. Instead, the empirical risk of eq. (3.6) is estimated based on the training set $D$. However, the minimizer of eq. (3.6) is not necessarily the minimizer of eq. (3.4) [37].

Trivially, the function that takes the values $f(\mathbf{x}_i) = y_i$ on the training set and is random elsewhere has zero empirical risk, which clearly does not generalize.

Thus, minimizing the empirical error does not necessarily lead to a good
hypothesis [152]. This phenomenon is referred to as overfitting, where the
learned hypothesis has fitted both the underlying data generating process and
the noise in the training set [37, 154, 155].

This situation can be avoided if some kind of hypothesis capacity⁹ control is performed. If a low empirical risk is achieved by choosing a hypothesis from a low capacity hypothesis space, then the true risk is also likely to be low. Generally, given a consistent data set and a sufficiently rich hypothesis space, there will always be a function that gives zero empirical risk and a large true risk [37].

⁹ The capacity of a hypothesis space is a measure of the number of different labelings implementable by the functions in the hypothesis space.


3.1.2 AI Techniques
AI has produced a number of tools and techniques since its emergence as a
discipline in the mid 1950s. These techniques are of great practical significance
in engineering to solve various complex problems normally requiring human
intelligence [156]. After years of steady progress AI tools have evolved into
modern techniques, such as: Expert System (ES), Fuzzy Logic (FL), Artificial
Neural Networks (ANNs), Support Vector Machine (SVM), and the most recently
developed Extreme Learning Machine (ELM). All these techniques are being
widely applied in a growing number of applications [37]. The following subsections briefly discuss the basic concepts of ES, FL and ANNs. The theoretical concepts of ELM are discussed in Chapter 5, while SVMs are discussed later in this chapter.

3.1.2.1 Expert System


Expert systems (ES), also known as Knowledge Based Systems (KBS) are
software based computer programs that attempt to reproduce the performance
of human experts, in a narrow domain for the solution of problems related to
that domain [156]. Expert systems are considered as a traditional technique and
subfield of AI.

A wide variety of methods can be used to simulate the performance of an ES,


however, the most commonly used ES consists of two basic components: (i) a
knowledge base, and (ii) an inference mechanism [37]. The knowledge base
uses a knowledge representation formalism to extract relevant knowledge from
the Subject Matter Experts (SME) or human experts, which is often heuristic in
nature and is expressed as any combinations of: IF-THEN rules, factual
statements, frames, objects, procedures and cases. The inference mechanism
gathers knowledge from the SME and manipulates the stored knowledge
accordingly to the formalism, in order to produce solutions [156].


Many applications of expert systems have been implemented to facilitate tasks


in various fields, such as accounting, medicine, process control, financial service,
production, and human resource [156]. In addition, ESs are currently being used
in business projects, such as rating customers to see whether they are creditworthy, and also in predicting the rise and fall of shares in the stock market. The advantages of expert systems are that they are able to process information faster than most AI techniques and they do not make careless mistakes or become obsolete over time [37]. However, contrary to these advantages, the disadvantage of ESs is that they are notoriously narrow in their domain of knowledge, i.e., they are experts only in their particular field [156].

3.1.2.2 Fuzzy Logic


Fuzzy logic (FL) is a multi-valued logic derived from fuzzy set theory to deal
with the reasoning that is approximate rather than precise [157]. Specifically,
fuzzy logic is a superset of conventional Boolean logic that has been extended to
handle the concept of partial truth [158], i.e., the intermediate truth values between the conventional evaluations [157] of "completely true" and "completely false", such as "yes" or "no".

In Boolean logic variables may have a membership value of only 0 or 1 [37]. The
notion central to fuzzy systems is that the truth values in fuzzy logic or
membership values in fuzzy sets are indicated by values in the range between 0 and 1 inclusive, and are not constrained to the two truth values {true (1), false
(0)} as in classic predicate logic [158]. Conventional methodology or theory
implementing crisp definitions such as the: classical set theory, arithmetic, and
programming, can be fuzzified by generalizing the concept of a crisp set to a
fuzzy set with blurred boundaries [159]. In addition, logical operations on fuzzy
sets are generalizations of conventional Boolean algebra. Similar to Boolean
logic, fuzzy logic has three basic operations: (i) intersection, (ii) union, and (iii)
complement [158].

Fuzzy logic systems are based on a set of rules. These rules allow the input to be
fuzzy, i.e., more similar to the natural way that human express knowledge [160].
Linguistic variables are also a critical aspect of FL applications, where general
terms such as large, medium, and small or hot and cold are each used to
capture a range of numerical values, which might be context dependent [161].
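As a small illustration of such a linguistic variable, the sketch below grades a hypothetical monthly consumption value against a triangular membership function for the fuzzy set "medium"; the breakpoints are illustrative assumptions, not the membership functions developed later in this thesis:

def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# degrees of membership in "medium consumption" for some hypothetical kWh values
for kwh in (150, 300, 450, 600):
    print(kwh, round(triangular(kwh, 200, 400, 600), 2))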

Fuzzy logic has been used in various applications such as, automobile and
vehicle subsystems, air conditioners, cameras, image processing, elevators,
washing machines, rice cookers, dishwashers, video games, speech recognition,
and pattern recognition [159].

3.1.2.3 Artificial Neural Network


An artificial neural network (ANN), often referred to as a neural network (NN),
is a mathematical or computational model based on biological neural networks
[154]. An ANN consists of several simple units called neurons or artificial
neurons, which are interconnected together and operate in parallel to process
information, thus, known as parallel distributed processing systems or
connectionist systems. In most cases an ANN is an adaptive system that changes
its structure based on external or internal information that flows through the
network during the learning phase. In more practical terms, ANNs are nonlinear statistical data modeling tools, which can be used to model complex
relationships between inputs and outputs or to find patterns in data [162].

ANNs can be used to gain an understanding of biological neural networks,


however, in most cases they are employed to solve AI problems without
necessarily creating a model of a real biological system. Since ANNs are models
based on the working of the human brain, which utilize a distributed processing
approach to computation, they are capable of solving a wide range of problems by learning a mathematical model for the problem [37]. In addition, ANNs can
readily handle both continuous and discrete data and have good generalization
capability as with fuzzy expert systems [163].

Implicit knowledge is built into ANNs by learning, also referred to as training.


ANNs can be trained by typical input patterns and the corresponding expected
output patterns. In an ANN, the error between the actual and expected outputs is used to adjust the weights of the connections between the neurons, which is a supervised machine learning technique. ANNs can also be trained
using unsupervised learning techniques, where only the input patterns are
provided during training and the network automatically learns to cluster the
inputs in groups with similar features [164].

ANNs map the inputs to outputs via weights during a training process by means
of connection and parallel distributed processing. The learned weights are used
to predict corresponding outputs for given inputs [164]. In a basic multi-layer
ANN structure, as shown in Figure 3.1, the input layer of the artificial neurons
receives information from the environment and the output layer communicates
the response. Between the input layer and the output layer of an ANN lie the
hidden layers [37]. The number of hidden layers in an ANN is variable and
may be one or more than one, depending upon the problem to be solved [155].
The hidden layers have no direct contact with the environment, and these layers
are where most of the information processing takes place in an ANN. The output
of an ANN depends on the weights of the connections between neurons in
different layers. Each weight indicates the relative importance of a particular
connection. If the total sum of all the weighted inputs received by a particular
neuron surpasses a certain threshold value, the receiving neuron will send a
signal to each neuron to which it is connected in the next layer [37]. This
standard procedure is followed by all neurons within the network, in order to
form the network output.
Figure 3.1: Basic structure of a multi-layer Artificial Neural Network (ANN)
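The forward pass described above can be sketched as follows; the weights are arbitrary illustrative values rather than a trained network, and a sigmoid is used in place of a hard threshold:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.8, 0.2, 0.5])          # network inputs
W1 = np.array([[0.4, -0.6, 0.1],       # input -> hidden layer weights
               [0.3, 0.8, -0.5]])
b1 = np.array([0.0, -0.1])
W2 = np.array([[1.2, -0.7]])           # hidden -> output layer weights
b2 = np.array([0.05])

hidden = sigmoid(W1 @ x + b1)          # weighted sums passed through the activation
output = sigmoid(W2 @ hidden + b2)     # network output
print(output)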

ANNs have proven to be successful in many general problem areas such as:
function approximation, regression analysis, prediction, classification, pattern
recognition, optimization, conceptualization, data processing, filtering and
clustering [37]. They have been employed in a number of applications, some of
which include: vehicle control, game-playing, decision making, radar systems,
face identification, object recognition, speech recognition, text recognition,
medical diagnosis, financial applications, data mining, visualization and spam
filtering.

The advantage of ANNs lies in their resilience against distortions in the input
data and their capability of learning. ANNs are good at solving problems that are
too complex for humans or conventional technologies [161]. However, the main disadvantage of ANNs is the problem of overfitting and underfitting of the data associated with them, due to excessive or insufficient training respectively, which results in poor generalization performance.

3.2 Support Vector Machine


Support Vector Machines (SVMs) were developed by Vladimir Vapnik and co-workers, building on statistical learning theory work dating back to the late 1970s [169], and they are based on the Structural Risk Minimization (SRM) principle from statistical learning theory [152]. SVMs are a set of related supervised machine learning methods used for
classification, which have recently become an active area of intense research
with extensions to regression and density estimation [170]. In contrast to the
Empirical Risk Minimization (ERM) principle, which is used by neural networks
to minimize the error on the training data, the SRM minimizes a bound on the
testing error, thus allowing SVMs to generalize better than conventional neural
networks [171]. Apart from the problem of poor generalization and overfitting
in NNs, SVMs also address the problem of efficiency of training and testing, and
the parameter optimization [151] problems frequently encountered in ANNs.

SVMs were initially developed to solve classification problems and initial work
was focused on optical character recognition (OCR) applications [172, 173].
Some recent applications and extensions of SVMs include: isolated handwritten
digit recognition [174], object recognition [175], speaker identification [176],
face detection [28, 177], text categorization [29, 33] and bioinformatics and
data mining [178]. All of the above cases are examples of SVMs used for
classification. In addition, SVMs have also been proposed and applied to a
number of different types of problems including regression estimation, novelty
detection, density estimation and the solution of inverse problems. However, in this thesis only SVM classification will be discussed, since the focus of the present study is classification.


The sections that follow give a brief description of the basic concepts of
statistical learning theory and the SRM principle, followed by an introduction to
Support Vector Classification (SVC), including the theoretical concepts and its
implementation. For additional material and a more detailed description of
SVMs one can refer to the works of V. Vapnik [152, 169], C. Burges [179], B.
Schölkopf [173, 180] and A. Smola [181, 182].

3.2.1 Statistical Learning Theory


Generally, for a given learning task with a given finite amount of training data,
the best generalization performance will be achieved if the right balance is
reached between the accuracy attained on the particular training set, and the
ability of the machine to learn any training set without errors, that is, its
capacity. A trained machine with very large capacity will overfit on the
training data and will be able to identify only previously seen samples, i.e.
trained samples, while a machine with very small capacity will not be able to
correctly identify previously seen data, i.e. learning of the training data will be
incomplete. Therefore, the relation between the capacity and the performance
of a learning machine must be controlled to achieve the right balance.

3.2.1.1 Structural Risk Minimization Principle

Suppose, for a two-class pattern recognition problem, $N$ observations are given. Each observation consists of a pair: a vector $\mathbf{x}_i \in \mathbb{R}^d$, for $i = 1, 2, \ldots, N$, and the associated label $y_i \in \{-1, +1\}$, given by the source of information. Ultimately, the task of a learning machine is to estimate a function $f_\alpha: \mathbb{R}^d \to \{-1, +1\}$ using these examples, such that $f_\alpha$ will correctly classify unseen samples $(\mathbf{x}, y)$.

The VC (Vapnik-Chervonenkis) theory, which is a part of statistical learning theory, shows that the set of functions $F$ from which $f_\alpha$ is chosen must be restricted to one that has a capacity suitable for the amount of available training data. In order to do so, the VC theory introduces bounds on the expected risk $R$ in eq. (3.4) that depend on the empirical risk $R_{emp}$ in eq. (3.6) and on the capacity of $F$. The minimization of these bounds leads to the SRM principle. According to this principle, for all the functions in $F$ (i.e., for any value of $\alpha$) and for a training set of $N$ samples, the following bound holds with a probability of at least $1 - \eta$ $(0 \le \eta \le 1)$:

$$R(f_\alpha) \le R_{emp}(f_\alpha) + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}} \qquad (3.7)$$

Where
$h$ is a non-negative integer, called the VC dimension,
$N$ represents the number of training samples, and
$\eta$ is a confidence measure.

The parameter $h$ is defined as the largest number of points that can be separated in all possible ways using functions of the given set. In effect, the VC dimension provides a measure of the notion of capacity. The right hand side of eq. (3.7) is a bound on $R(f_\alpha)$ and it holds only with a certain probability. The second term of this bound is called the VC confidence, which is a monotonically increasing function of $h$ for every value of $N$. This bound is independent of $P(\mathbf{x}, y)$ and, provided $h$ is known, it can be easily computed. Conversely, the left hand side of eq. (3.7) is difficult to compute. Thus, given some selection of learning machines whose empirical risk is zero and choosing a fixed, sufficiently small value of $\eta$, one must choose the learning machine whose associated set of functions has minimal VC dimension. This leads to a better upper bound on the expected risk. In general, for non-zero empirical risk, one wants to choose the learning machine which minimizes the right hand side of eq. (3.7). This is the main idea of the SRM principle.

To summarize, given a fixed number of training samples, one can control the risk by controlling both the empirical risk $R_{emp}(f_\alpha)$ and the VC dimension $h$. The VC confidence term in eq. (3.7) depends on the chosen class of functions, whereas both the empirical risk in eq. (3.6) and the expected risk in eq. (3.4) depend on the function chosen by the training procedure, i.e., on the parameters $\alpha$. The right choice of $\alpha$ for controlling the empirical risk in eq. (3.6) is made through experimentation. For controlling $h$ (i.e. the complexity of the learning machine) one must find the subset of the chosen set of functions where the risk bound is minimized. To do so, a structure is introduced dividing the entire class of functions into nested subsets:

$$F_1 \subset F_2 \subset \cdots \subset F_n \qquad (3.8)$$

whose VC dimensions satisfy,

$$h_1 \le h_2 \le \cdots \le h_n \qquad (3.9)$$

For each subset, $h$ or a bound on $h$ itself should be computed. The SRM principle then consists of finding that subset of functions which minimizes the bound on the expected risk. This can be done by training a series of machines, one for each subset, where for a given subset the goal of training is simply to minimize the empirical risk. The subset of functions which minimizes the bound on the expected risk is the subset of the trained machine in the series whose sum of empirical risk and VC confidence is minimal. For more details about the bound, the reader can refer to [152].

3.2.2 Support Vector Classification


After introducing the basic concepts of statistical learning theory and SRM
principle, the basis needed for describing SVMs, namely the Support Vector
Classification (SVC), is set. Since SVMs are supervised learning machines, based on the binary classification case given in section 3.1.1.2.1, it is shown that one needs to find a class of functions whose capacity can be computed in order to design a learning machine. Vapnik and Chervonenkis in [183, 184] considered the class of hyperplanes:

$$\mathbf{w} \cdot \mathbf{x} + b = 0, \qquad \mathbf{w} \in \mathbb{R}^d,\; b \in \mathbb{R} \qquad (3.10)$$

corresponding to the decision function,

$$f(\mathbf{x}) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b) \qquad (3.11)$$

and described the Generalized Portrait algorithm for constructing separating hyperplanes from empirical data.

Figure 3.2: Optimal margin hyperplane for the separable case of SVC

This learning algorithm was proposed for separable problems and it is based on
the fact that among all possible separating hyperplanes, there exists a unique
hyperplane with maximum margin of separation from the classes, i.e. an
optimal margin hyperplane (OMH), and that the capacity decreases with increasing margin, as shown in Figure 3.2. The sections below present the linearly separable case, followed by the non-linearly separable case, the implementation of the SVC, and the basics of the support vector algorithm in detail.

3.2.2.1 Linear SVC


In its simplest linear form, an SVC is a hyperplane that separates the positive samples from the negative samples with a maximum margin¹⁰. Finding this hyperplane also involves finding two hyperplanes parallel to it. Suppose $H_1$ and $H_2$ are these hyperplanes, as shown in Figure 3.3, with equal distances to the separating hyperplane and with the condition that there are no data points between them. All of the training data then satisfies the following constraints:

$$\mathbf{w} \cdot \mathbf{x}_i + b \ge +1, \quad \text{for } y_i = +1 \qquad (3.12)$$
$$\mathbf{w} \cdot \mathbf{x}_i + b \le -1, \quad \text{for } y_i = -1 \qquad (3.13)$$

Where
$\mathbf{w}$ is normal to the hyperplane,
$|b| / \|\mathbf{w}\|$ is the perpendicular distance from the hyperplane to the origin, and
$\|\mathbf{w}\|$ is the Euclidean norm of $\mathbf{w}$.

The separating hyperplane is defined by the plane $\mathbf{w} \cdot \mathbf{x}_i + b = 0$, and the above constraints in eq. (3.12) and eq. (3.13) can be combined into:

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0 \quad \forall i \qquad (3.14)$$

The points for which the equality in eq. (3.12) holds lie on the hyperplane $H_1: \mathbf{w} \cdot \mathbf{x}_i + b = +1$, with normal $\mathbf{w}$ and perpendicular distance from the origin $|1 - b| / \|\mathbf{w}\|$. The points for which the equality in eq. (3.13) holds lie on the hyperplane $H_2: \mathbf{w} \cdot \mathbf{x}_i + b = -1$, with normal $\mathbf{w}$ and perpendicular distance from the origin $|-1 - b| / \|\mathbf{w}\|$. Hence the maximum margin can be simply calculated as $2 / \|\mathbf{w}\|$, and the pair of hyperplanes that gives the maximum margin can be found by minimizing $\|\mathbf{w}\|^2$, subject to the constraints in eq. (3.14).

¹⁰ The maximum margin is the distance from the separating hyperplane to the closest sample.

Figure 3.3: Linear separating hyperplanes for the separable case of SVC
outlining the support vectors (maximum margin approach)
The points that lie on the hyperplanes $H_1$ and $H_2$, and whose removal would change the found solution, are known as support vectors (SVs). So, in order to construct the optimal margin hyperplane (OMH), the optimization problem is formulated as:

$$\min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 \qquad (3.15)$$

subject to eq. (3.14). Switching to the Lagrangian formulation of the problem will allow for a generalization to the non-linear case, since the training data will then appear in the form of dot products between vectors. The constraints in eq. (3.14) will also be replaced by constraints on the Lagrange multipliers, which are easier to handle. This problem is reformulated by introducing Lagrange multipliers $\alpha_i$, for $i = 1, 2, \ldots, N$, one for each of the inequality constraints in eq. (3.14). The constraint equations are multiplied by the Lagrange multipliers and subtracted from the objective function, to form the primal formulation of the Lagrangian¹¹:

$$L_P = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i y_i (\mathbf{w} \cdot \mathbf{x}_i + b) + \sum_{i=1}^{N} \alpha_i, \qquad \alpha_i \ge 0 \qquad (3.16)$$

In eq. (3.16), $L_P$ must be minimized with respect to $\mathbf{w}$ and $b$, and simultaneously the derivatives of $L_P$ with respect to all Lagrange multipliers $\alpha_i$ must vanish, all subject to the constraints $\alpha_i \ge 0$. For more details on Lagrange multipliers and other optimization methods, one can refer to [151, 154, 185].

3.2.2.1.1 Dual Problem


The problem defined in eq. (3.16) is a convex quadratic programming (QP) problem, since the objective function is itself quadratic and the points that satisfy the constraints also form a convex set. Hence, in this case the Wolfe dual can be used instead [185], which maximizes $L_P$ with respect to $\alpha_i$, subject to the constraints that the gradient of $L_P$ with respect to $\mathbf{w}$ and $b$ vanishes, and that $\alpha_i \ge 0$. This gives the following two conditions:

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i \qquad (3.17)$$
$$\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.18)$$

Substituting these constraints into eq. (3.16) gives the dual formulation of the Lagrangian:

$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) \qquad (3.19)$$

¹¹ The Lagrange function is formed by subtracting the sum of all products between the constraints and their corresponding Lagrange multipliers from the primal objective function (the function that is to be minimized) [185].

Support vector training for the linearly separable case therefore amounts to maximizing $L_D$ with respect to $\alpha_i$, subject to the linear equality constraint in eq. (3.18) and positivity of the $\alpha_i$, with the solution given by eq. (3.17). There is a Lagrange multiplier $\alpha_i$ for every training point and, in the solution, those points for which $\alpha_i > 0$ are the support vectors (SVs). All other training points have $\alpha_i = 0$. For SVMs the SVs are critical elements of the training set because they lie closest to the decision boundary. Therefore, a new object $\mathbf{x}$ can be classified using:

$$f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b$$
$$f(\mathbf{x}) = \text{sign}\left(\sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b\right) \qquad (3.20)$$

In both the objective function $L_D$ and the solution, the training vectors $\mathbf{x}_i$ occur in the form of dot products, which is one of the reasons for using a Lagrangian formulation of the problem. There is a single, global maximum forming the solution, with no local extrema. Since both $L_P$ and $L_D$ arise from the same objective function, but with different constraints, the solution can be achieved either by minimizing the Lagrange function $L_P$ with respect to the primal variables or by maximizing it with respect to the Lagrange multipliers, i.e. the dual variables [186]. In practical terms, this convex QP problem can quickly be solved using quadratic optimization methods such as Sequential Minimal Optimization (SMO) [187, 188], and a large part of the reason for this is the great computational efficiency of the SVC formulation.
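A minimal sketch of the resulting classifier in eq. (3.20), assuming the support vectors, their labels and multipliers have already been obtained from training (all numbers hypothetical):

import numpy as np

sv_x = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 0.2]])   # support vectors
sv_y = np.array([1, 1, -1])                              # their labels
alpha = np.array([0.7, 0.3, 1.0])                        # their Lagrange multipliers
b = -0.8                                                 # threshold

def decision(x_new):
    # eq. (3.20): sign of the alpha-weighted sum of dot products plus the threshold
    return np.sign(np.sum(alpha * sv_y * (sv_x @ x_new)) + b)

print(decision(np.array([1.5, 1.8])))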

3.2.2.1.2 Non-Separable Case


There are two possible directions in which SVC can be extended in order to include the non-separable cases, by constructing a soft margin hyperplane. The first direction is to extend SVC to non-linear separating hyperplanes, as shown in Figure 3.5, which is discussed in detail in section 3.2.2.2 later in this chapter. The second direction is to allow for noise or imperfect separation, as shown in Figure 3.4. In the second case the requirement that there must be no data points between the two hyperplanes $H_1$ and $H_2$ is not strictly enforced. In order to do that, the constraints in eq. (3.12) and eq. (3.13) are relaxed, but only when necessary. Hence, a further cost, i.e., an increase in the primal objective function in eq. (3.16), is introduced to penalize the data points that cross the hyperplane boundaries.

Figure 3.4: Linear separating hyperplanes for the non-separable case of SVC where the slack variable $\xi_i$ permits margin failures (soft margin approach)

This can be achieved by introducing positive slack variables $\xi_i$, $i = 1, 2, \ldots, N$, in the constraints [174]. The constraints in eq. (3.12) and eq. (3.13) then become:

$$\mathbf{w} \cdot \mathbf{x}_i + b \ge +1 - \xi_i, \quad \text{for } y_i = +1 \qquad (3.21)$$
$$\mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i, \quad \text{for } y_i = -1 \qquad (3.22)$$
$$\xi_i \ge 0 \quad \forall i \qquad (3.23)$$

From eq. (3.21) and (3.22) it can be seen that for an error to occur the corresponding $\xi_i$ must exceed unity, so $\sum_i \xi_i$ is an upper bound on the number of training errors.

In order to assign an extra cost for errors, the objective function in eq. (3.16) is modified to include the penalizing term $\frac{1}{2}\|\mathbf{w}\|^2 + C\left(\sum_i \xi_i\right)^m$ instead of just the term $\frac{1}{2}\|\mathbf{w}\|^2$. For any positive integer $m$ this is a convex programming problem; for $m = 1$ and $m = 2$ it is also a QP problem, and $m$ is usually set to 1, which has the advantage that neither the $\xi_i$ nor their Lagrange multipliers appear in the Wolfe dual problem. The penalty parameter $C$ is a regularization parameter that trades off a wide margin against a small number of margin failures (soft margins). This parameter is finite (setting $C = \infty$ leads back to the original perfectly separating case in section 3.2.2.1). A larger value of $C$ corresponds to assigning a higher penalty to errors. From the above, the Wolfe dual problem becomes:

$$\max_{\alpha} \; L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$$
$$\text{subject to:} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.24)$$

Thus, the only difference from the perfectly separating case, i.e. the linear case in section 3.2.2.1, is that the Lagrange multipliers $\alpha_i$ are now bounded above by $C$ instead of $\infty$. The solution is again given by:

$$\mathbf{w} = \sum_{i=1}^{N_S} \alpha_i y_i \mathbf{x}_i \qquad (3.25)$$

Where
$N_S$ is the number of support vectors.

3.2.2.1.3 Karush-Kuhn-Tucker Conditions


The Karush-Kuhn-Tucker (KKT) conditions are very important for both the theory and practice of constrained minimization. For the primal problem in eq. (3.16), the KKT conditions are [185]:

$$\frac{\partial L_P}{\partial w_\nu} = w_\nu - \sum_{i=1}^{N} \alpha_i y_i x_{i\nu} = 0, \quad \nu = 1, 2, \ldots, d \qquad (3.26)$$
$$\frac{\partial L_P}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.27)$$
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0, \quad i = 1, 2, \ldots, N \qquad (3.28)$$
$$\alpha_i \left\{ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right\} = 0 \qquad (3.29)$$
$$\alpha_i \ge 0 \qquad (3.30)$$

where $i = 1, 2, \ldots, N$ and $\nu = 1, 2, \ldots, d$. The parameter $d$ represents the dimension of the data. Eqs. (3.26) to (3.30) are satisfied at the solution of any constrained optimization problem, provided that the intersection of the set of feasible directions with the set of descent directions coincides with the intersection of the set of feasible directions for linearised constraints with the set of descent directions [185, 189]. This regularity assumption holds for all SVMs, since the constraints are always linear. In addition, the problem for SVMs is convex and, for convex problems, if the regularity assumption holds, the KKT conditions are necessary and sufficient for $\mathbf{w}$, $b$ and $\alpha$ to be a solution [185].

Thus, finding a solution to the KKT conditions is equivalent to solving the SVM problem. The threshold $b$ is found by using the KKT complementary condition in eq. (3.29), by choosing any $i$ for which $\alpha_i \ne 0$. However, it is numerically safer to take the mean value of $b$ resulting from all such equations [179]. The KKT conditions for the primal problem are also used in the non-separable case, in which the primal Lagrangian becomes:

$$L_P = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i} \xi_i - \sum_{i} \alpha_i \left\{ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right\} - \sum_{i} \mu_i \xi_i \qquad (3.31)$$

where $\mu_i$ are the Lagrange multipliers introduced to enforce positivity of the slack variables $\xi_i$. The KKT conditions for this primal problem are therefore:

$$\frac{\partial L_P}{\partial w_\nu} = w_\nu - \sum_{i=1}^{N} \alpha_i y_i x_{i\nu} = 0 \qquad (3.32)$$
$$\frac{\partial L_P}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (3.33)$$
$$\frac{\partial L_P}{\partial \xi_i} = C - \alpha_i - \mu_i = 0 \qquad (3.34)$$
$$\alpha_i \left\{ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right\} = 0 \qquad (3.35)$$
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \ge 0 \qquad (3.36)$$
$$\alpha_i,\; \mu_i,\; \xi_i \ge 0 \qquad (3.37)$$
$$\mu_i \xi_i = 0 \qquad (3.38)$$

where $i = 1, 2, \ldots, N$ and $\nu = 1, 2, \ldots, d$. Once again the KKT complementary conditions in eq. (3.35) and (3.38) can be used to determine the threshold $b$. It is observed that eq. (3.34) combined with eq. (3.38) shows that $\xi_i = 0$ if $\alpha_i < C$, since $\alpha_i < C$ implies $\mu_i > 0$. Thus, any training point for which $0 < \alpha_i < C$, that is, a data point that is not penalised, i.e. it does not cross the boundary, can be taken to compute $b$. As before, the average of $b$ over all such training points is used. Eqs. (3.35) and (3.36) show that a data point can either lie behind $H_1$ and $H_2$:

$$\alpha_i = 0 \qquad (3.39)$$
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i > 0 \qquad (3.40)$$

and not participate in the derivation of the separating function, or lie either on the planes $H_1$ and $H_2$, or cross the boundaries with $\alpha_i = C$ and $\xi_i > 0$:

$$\alpha_i > 0 \qquad (3.41)$$
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i = 0 \qquad (3.42)$$

3.2.2.2 Non-Linear SVC


The SVC formulation in section 3.2.2.1 is expressed in terms of a linear separating hyperplane. When applied to linearly inseparable data, as shown in Figures 3.4 and 3.5, no feasible solution is found, and the objective function (i.e. the dual Lagrangian) grows arbitrarily large.

In the case of linearly inseparable data, SVMs map the data into another high-dimensional space, known as the feature space, such that the mapped data points will be linearly separable. This can be done easily, since the only requirement is the convolution between two input vectors in the feature space, described by the dot product in eqs. (3.16) and (3.20). The mapping between the input space and the feature space is given by:

$$\Phi: \mathbb{R}^d \to \mathbb{R}^D \qquad (3.43)$$

Where
$d$ is the dimensionality of the input space, and
$D$ is the dimensionality of the feature space.

Figure 3.5: Non-linear separating hyperplanes for the non-separable case of SVC

The input space is transformed into a higher-dimensional feature space [174, 190] in order to determine the optimum soft-margin hyperplane in the feature space, using the dual Lagrangian formulation in eq. (3.24). Since the transformation is represented by $\Phi(\cdot)$, the dual Lagrangian takes the form:

$$L_D = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \, \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \qquad (3.44)$$

The dot product in the high-dimensional feature space [175] is then equivalent to a kernel function in the input space, that is:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \qquad (3.45)$$

There is then no need to explicitly know the transformation function $\Phi(\cdot)$, as long as it is known that the kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ [190] is equivalent to the dot product in some other high-dimensional space. Hence, with this kernel function replacing the dot product, the classifier in eq. (3.20) becomes:

$$f(\mathbf{x}) = \text{sign}\left(\sum_{i=1}^{N_S} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right) \qquad (3.46)$$

In order to determine whether a function is a kernel function or not, Mercer's condition can be used [152, 191]. Mercer's condition states that there exists a mapping $\Phi$ and an expansion

$$K(\mathbf{x}, \mathbf{z}) = \sum_{i} \Phi(\mathbf{x})_i \, \Phi(\mathbf{z})_i \qquad (3.47)$$

if and only if, for any function $g(\mathbf{x})$ such that

$$\int g(\mathbf{x})^2 \, d\mathbf{x} \qquad (3.48)$$

is finite, then

$$\int K(\mathbf{x}, \mathbf{z}) \, g(\mathbf{x}) \, g(\mathbf{z}) \, d\mathbf{x} \, d\mathbf{z} \ge 0 \qquad (3.49)$$

must be satisfied for any function $g(\mathbf{x})$ with a finite $L_2$ norm.


There are many kernel functions that can be used. Three common kernels, summarized in Table 3.1, have been shown to satisfy Mercer's condition and are commonly used with SVMs. The sigmoidal kernel only satisfies Mercer's condition for particular values of its free parameters [179, 186], but has been used successfully in practice [152]. The polynomial kernel of degree $p$ is inhomogeneous in that it allows the additive constant $c$ to be larger than zero [186, 190] for additional degrees of freedom.

Table 3.1: Non-linear kernels commonly used to perform a dot product in a mapped feature space in the SVM formulation

Name                           Kernel Function                                                                          Parameters
Polynomial                     $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + c)^p$                $p$, $c$
Radial Basis Function (RBF)    $K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right)$   $\gamma$
Sigmoidal                      $K(\mathbf{x}_i, \mathbf{x}_j) = \tanh\left(\kappa\, \mathbf{x}_i \cdot \mathbf{x}_j - \delta\right)$   $\kappa$, $\delta$

The Radial Basis Function (RBF) or Gaussian kernel is the most widely used kernel. The RBF kernel is translation invariant, that is, $K(\mathbf{x}_i, \mathbf{x}_j) = K(\mathbf{x}_i - \mathbf{x}_j)$ [186], and corresponds to a feature space with an infinite number of dimensions [152]. A significant advantage of the RBF kernel is that it adds only a single free parameter $\gamma > 0$, which controls the width of the RBF kernel as $\gamma = 1/(2\sigma^2)$, where $\sigma^2$ is the variance of the resulting Gaussian. The RBF kernel has been shown to perform well in a wide variety of practical applications, such as in [192-194].
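As an illustration (not the implementation used in this thesis, which is based on LIBSVM), the sketch below trains a soft-margin SVC with the RBF kernel on a toy non-linearly separable problem; here C is the error-penalty parameter of eq. (3.24) and gamma plays the role of the kernel width parameter $\gamma$ discussed above:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = ((X[:, 0] ** 2 + X[:, 1] ** 2) > 1.0).astype(int)   # not separable by a linear hyperplane

clf = SVC(kernel="rbf", C=10.0, gamma=0.5)   # gamma corresponds to 1 / (2 * sigma^2)
clf.fit(X, y)
print(clf.score(X, y), clf.support_vectors_.shape)       # training accuracy and no. of SVs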

3.2.3 Implementation of SVC


As mentioned previously, training of SVMs requires the solution of a convex QP
optimization problem. This is a task that is very difficult for an average engineer to implement, and training algorithms that use numerical QP are slow, especially for large-size problems. In the past decade, a number of researchers have introduced new learning algorithms that use faster and simpler-to-implement methods than the algorithms previously used for solving QP problems. These methods include: the Newton method, the Quasi-Newton method, the Kernel Adatron (KA) [195], Sequential Minimal Optimization (SMO) [187, 188] and the Genetic Algorithm (GA).


This research study only focuses on SMO, as other optimization methods are
slower [196] on large amounts of data, and also because the library used for
implementing SVC in this research study utilizes the SMO algorithm. The
following section gives a brief description of the SMO algorithm and its
implementation for SVC.

3.2.3.1 Sequential Minimal Optimization


Training of SVMs under normal circumstances requires a lot of time for calculation of the kernel matrix, and the training time increases tremendously when a large number of training samples are present, resulting in a bigger kernel matrix. In order to solve this problem, SMO deals with large QP problems by breaking (decomposing) the problem into a series of smaller QP problems.

The Sequential Minimal Optimization (SMO) algorithm was first introduced by


John C. Platt [187, 188] in 1998. The main concept of SMO is to break large QP
problems into smaller QP problems. More specifically, in SMO a minimal subset
of just two training samples can be optimized on every iteration. This is because
the smallest number of multipliers that can be used for optimization at each
step, keeping the condition in eq. (3.18) of the full problem true throughout the
learning process, is two. In this way each small QP problem is solved
analytically without the need for time consuming numerical QP optimization as
part of the algorithm, which makes implementation of SMO easy and simple.

At every step of SMO, two Lagrange multipliers are selected for optimization
and after their optimal values are found given that all the other multipliers are
fixed, the SVC is updated accordingly. In SMO, the two training samples are
selected using a heuristic method, and then the two Lagrange multipliers are
solved analytically. These are the two main components of the SMO algorithm.

The main advantage of SMO is that because it uses only two training samples at
every step and avoids the computation of a kernel matrix [197], SMO requires a
smaller amount of memory and can handle very large training sets compared to
other optimization techniques [187, 188] such as the chunking algorithm
[169].

3.2.3.1.1 Solving Two Lagrange Multipliers


In order to solve for two Lagrange multipliers, SMO computes the constraints on these multipliers and solves for the constrained minimum. In this thesis, for convenience, the first multiplier is represented by subscript 1, while the second is represented by subscript 2. At every step of the algorithm, the minimum number of Lagrange multipliers that can be optimized is two since, in order to keep the linear equality constraint of the problem in eq. (3.24) true, whenever one multiplier is updated the other one needs to be adjusted.

In order not to violate the bound constraints of the problem in eq. (3.24), the Lagrange multipliers must lie within a box defined by $0 \le \alpha_1, \alpha_2 \le C$, while for the linear equality constraint of the problem to be true, the multipliers must lie on a line:

$$\alpha_1 y_1 + \alpha_2 y_2 = \text{constant} \qquad (3.50)$$

Thus, the constrained minimum of the objective function must lie on a diagonal line segment, as shown in Figure 3.6. The SMO algorithm first computes the second Lagrange multiplier $\alpha_2$ and then uses it to obtain the first Lagrange multiplier $\alpha_1$. Using the constraints of the dual problem in eq. (3.24), the following bounds apply to $\alpha_2$:

$$L \le \alpha_2 \le H \qquad (3.51)$$

Where, if $y_1 \ne y_2$:

$$L = \max(0, \alpha_2 - \alpha_1), \qquad H = \min(C, C + \alpha_2 - \alpha_1) \qquad (3.52)$$

and if $y_1 = y_2$:

$$L = \max(0, \alpha_1 + \alpha_2 - C), \qquad H = \min(C, \alpha_1 + \alpha_2) \qquad (3.53)$$

Figure 3.6: Inequality constraints causing the Lagrange multipliers to lie within
a box and the linear equality constraint causing the Lagrange multipliers to lie
on a diagonal line [187]
The difference $E_i$ ($i = 1, 2$) between the output of the current hypothesis $f(\mathbf{x})$ and the target classification of the training samples $\mathbf{x}_1$ or $\mathbf{x}_2$ can be expressed as:

$$E_i = f(\mathbf{x}_i) - y_i = \sum_{j=1}^{N} \alpha_j y_j K(\mathbf{x}_j, \mathbf{x}_i) + b - y_i, \qquad i = 1, 2 \qquad (3.54)$$

Where
$E_i$ is the error on the $i$th training sample, kept in the error-cache $E$.

The second derivative of the objective function along the diagonal line is expressed as:

$$\eta = K(\mathbf{x}_1, \mathbf{x}_1) + K(\mathbf{x}_2, \mathbf{x}_2) - 2K(\mathbf{x}_1, \mathbf{x}_2) = \left\| \Phi(\mathbf{x}_1) - \Phi(\mathbf{x}_2) \right\|^2 \qquad (3.55)$$

Given that the objective function is positive definite, there is a minimum along the direction of the linear equality constraint in eq. (3.24), and $\eta > 0$. The maximum of the objective function for the optimization problem can then be achieved by computing the new value of $\alpha_2$:

$$\alpha_2^{new} = \alpha_2 + \frac{y_2(E_1 - E_2)}{\eta} \qquad (3.56)$$

and by clipping $\alpha_2^{new}$ to the end of the line segment (i.e., $L \le \alpha_2^{new} \le H$):

$$\alpha_2^{new,clipped} = \begin{cases} H & \text{if } \alpha_2^{new} \ge H \\ \alpha_2^{new} & \text{if } L < \alpha_2^{new} < H \\ L & \text{if } \alpha_2^{new} \le L \end{cases} \qquad (3.57)$$

Then, the new value of $\alpha_1$ is obtained from the clipped value $\alpha_2^{new,clipped}$ as follows:

$$\alpha_1^{new} = \alpha_1 + y_1 y_2 \left( \alpha_2 - \alpha_2^{new,clipped} \right) \qquad (3.58)$$
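The analytic update of eqs. (3.56)-(3.58) can be sketched as a small function; the kernel values, errors, bounds and multipliers passed in are hypothetical, and computing L, H and the errors is assumed to have been done beforehand:

def smo_pair_update(alpha1, alpha2, y1, y2, E1, E2, K11, K22, K12, L, H):
    eta = K11 + K22 - 2.0 * K12                      # second derivative along the line, eq. (3.55)
    if eta <= 0:                                     # degenerate case: skip this pair
        return alpha1, alpha2
    a2_new = alpha2 + y2 * (E1 - E2) / eta           # eq. (3.56)
    a2_new = min(max(a2_new, L), H)                  # clip to the line segment, eq. (3.57)
    a1_new = alpha1 + y1 * y2 * (alpha2 - a2_new)    # eq. (3.58)
    return a1_new, a2_new

print(smo_pair_update(alpha1=0.2, alpha2=0.5, y1=1, y2=-1,
                      E1=0.3, E2=-0.4, K11=1.0, K22=1.0, K12=0.2, L=0.0, H=1.0))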

3.2.3.1.2 Heuristics to Select Multipliers


The SMO algorithm optimizes and changes two Lagrange multipliers at every step, where at least one of the two Lagrange multipliers violates the KKT conditions before the step, so that each step moves the dual objective towards its optimum and contributes towards convergence. Nevertheless, in order to speed up convergence, SMO uses heuristics to choose which two Lagrange multipliers to jointly optimize.


The algorithm has an outer loop that iterates over the entire training set,
looking for samples that violate the KKT conditions. These samples are eligible
for optimization. The first choice heuristic is used for selecting the first sample
$(\mathbf{x}_1, y_1)$ and concentrates on the non-bound samples whose Lagrange

multipliers are neither 0 nor C. These samples are most likely to violate the KKT
conditions and they are eligible for optimization. The outer loop makes repeated
passes over the non-bound subset until all of the non-bound samples satisfy the
KKT conditions. The SMO then iterates again over the entire training set to
search for any bound samples that may now cause KKT violations due to the
optimization of the non-bound subset. After the entire training set obeys the
KKT conditions, the algorithm then terminates.

The second Lagrange multiplier is chosen so that the size of the step taken
during joint optimization is maximized, resulting in a large increase of the dual
objective. Because computation of the kernel function is time consuming, SMO chooses the second sample $(\mathbf{x}_2, y_2)$ to maximize the absolute value $|E_1 - E_2|$ of the numerator in eq. (3.56). If $E_1$ is positive, SMO chooses a sample $\mathbf{x}_2$ with minimum error $E_2$; however, if $E_1$ is negative, SMO chooses a sample $\mathbf{x}_2$ with maximum error $E_2$. To reduce the computation further, the algorithm also keeps a cached list of errors for every non-bound sample in the training set.

In the case where the SMO makes no significant progress (i.e. there is no
significant increase in the dual objective), the algorithm iterates over all the
non-bound samples¹². If this fails too, SMO looks once more for a suitable sample through the entire training set¹².

¹² Starting from a random location in order to avoid a bias towards the samples at the beginning of the training set.


3.2.3.2 SMO Algorithm Overview


The SMO algorithm, described in section 3.2.3.1 above, is summarized in Table
3.2 below.

Table 3.2: Summarized procedure of the SMO algorithm [37]

Step    Procedure
1       Choose the first Lagrange multiplier $\alpha_1$ to be a KKT violator.
2       Choose the second Lagrange multiplier $\alpha_2$ using heuristics.
3       Update the second Lagrange multiplier via eq. (3.56): $\alpha_2^{new} = \alpha_2 + y_2(E_1 - E_2)/\eta$.
4       Clip the multiplier $\alpha_2^{new}$ to obtain $\alpha_2^{new,clipped}$ as in eq. (3.57). If the multiplier does not change, go back to Step 1.
5       Update the first Lagrange multiplier via eq. (3.58).
6       Update the error-cache.
7       If all Lagrange multipliers fulfill the KKT conditions, stop; else go to Step 1.

3.3 Summary
This chapter reviewed the background and theoretical concepts of the AI
techniques applied in this research study. The introduction started off by
discussing the preliminaries of AI briefly, which include pattern recognition and
machine learning techniques. In the next section, popular AI techniques such as:
Expert System (ES), Fuzzy Logic (FL) and Artificial Neural Networks (ANNs)
were briefly discussed. In the sub chapter of SVM, the statistical learning theory
was presented followed by the SRM principle, where both of these concepts
lead to the introduction of the SVM, namely SVC. The background and
theoretical concepts of SVC were then discussed in detail where derivations of
the margin hyperplanes for linear and non-linear SVC were presented, followed
by the kernel methods. In the last part of the SVM sub chapter the SMO
algorithm was presented.


From the extensive review of SVM concepts in this chapter, it is observed that SVC has a notable number of advantages as compared to standard neural networks. Firstly, SVC has non-linear dividing hypersurfaces that give it high discrimination power. Secondly, it provides good generalization ability for the classification of unseen data. Lastly, SVC determines the optimal model structure itself, without requiring the network topology to be tuned by trial and error. With the recent success of SVMs in various real-world applications such as face identification [28], text categorization [29] and bioinformatics [30], additional motivation for this research study is gained. There is widespread research literature indicating that the classification accuracy of SVMs outperforms other traditional classification methods, such as ANNs [31, 32]. A comparison of SVM classification results with other techniques, as reported by T. Joachims [33] for text categorization, is presented in Table 1.1, which indicates that SVMs have the ability to achieve a higher classification accuracy as compared to other techniques.


CHAPTER 4

MODEL DEVELOPMENT

4.0 Overview
This chapter provides the methodology proposed for the fraud detection
framework and implements the associated key algorithms to be used for NTL
detection. The first sub chapter introduces the general project and research
methodologies. The three major stages involved in the development of the fraud detection system are: (i) data preprocessing, (ii) classification engine development, and (iii) data postprocessing. The data preprocessing sub chapter illustrates the data mining techniques used for preprocessing raw customer information and billing data for feature selection and feature extraction. The classification engine development sub chapter illustrates the SVC training,
parameter optimization, development of the SVC classifier and the SVC testing
and validation engine. The last sub chapter, data postprocessing describes the
development of a Fuzzy Inference System (FIS), creation of fuzzy rules and
membership function (MF) formation for the selection of suspicious customers.

4.1 Proposed Framework


The proposed detection framework is divided into two categories: (i) project
methodology, and (ii) research methodology. The overall work completed for
this project consists of the project methodology. The following sections give a
brief introduction of the proposed framework used for the purpose of the
project and the research study.

4.1.1 Project Methodology


This project, titled "Development of an Intelligent System for Detection of Abnormalities and Probable Fraud by Metered Customers in TNB Distribution Division", was initiated by TNB in Malaysia in an effort to reduce its NTLs in the LV distribution network, which are estimated at around 15% in peninsular Malaysia. This project is a collaborative effort between TNBD, TNB Research
(TNBR) and the Power Engineering Centre (PEC) of Universiti Tenaga Nasional
(UNITEN). The overall project methodology outlined in the project proposal
[198] is given in Figure 4.1.

[Figure: flowchart with the blocks Start of Project, Literature Review, Data Identification, Data Collection, Data Cleaning, Data Processing and Model Fitting, Preliminary Testing and Model Analysis, Model Improvement for Larger Samples, Development of User Interface and System Integration, Testing and Validation, End of Project]

Figure 4.1: Flowchart of the proposed project methodology [198]

4.1.2 Research Methodology


The research methodology framework proposed in order to develop an
intelligent fraud detection system for detection, identification and prediction of
NTL activities is shown in Figure 4.2. The research methodology is embedded
within the project methodology shown in Figure 4.1.

[Figure: flowchart with the blocks Start of Research, Raw Customer Data, Data Preprocessing, Feature Selection and Extraction, SVM Training, SVM Parameter Tuning, Model (Classifier), Classification, Data Postprocessing using FIS, List of Suspicious Customers, End of Research; the SVM-related blocks form the classification engine development stage]

Figure 4.2: Flowchart of the proposed framework for detection of NTL activities

In this research study, the NTL detection framework proposed in Figure 4.2 uses
historical customer billing data of TNB customers and transforms the data into
the required format for the SVM, by data preprocessing and feature extraction.
TNB customers are represented by their consumption profiles over a period of
time. These profiles are characterized by means of patterns which significantly represent their general behavior, and it is possible to evaluate a similarity measure between each customer and their consumption patterns. This creates a global similarity measure between normal and fraudulent customers as a whole.
The identification, detection and prediction are undertaken by the SVC, which is
the intelligent classification engine. With the help of the SVC results correlated
with the customer data, a FIS is employed to shortlist suspicious customers
from the testing data. This list of suspicious customers is used by TNBD SEAL
teams to strategize their NTL inspection activities.

The fraud detection system developed in this project forms the basis of this
research study. The Graphical User Interface (GUI) software developed in this project uses Microsoft Visual Basic 6.0. SVC training and testing is implemented using LIBSVM v2.86 [199], which is a library for SVMs. The computer used for training the SVC model was a Dell PowerEdge 840 workstation running Windows XP, with a 2.40 GHz Intel quad-core Xeon X3320 processor and 4 GB of RAM. The following sections further discuss the details of the processes outlined in the fraud detection framework in Figure 4.2.
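For illustration, the sketch below writes feature vectors in LIBSVM's sparse text format (label followed by index:value pairs), which the LIBSVM command-line tools can then read; the feature values, labels and file name are hypothetical and this is not the project's actual Visual Basic pipeline:

# Hypothetical normalized consumption features for two customers
samples = [
    (+1, [0.82, 0.10, 0.05, 0.91]),   # label +1: suspected fraud profile (illustrative)
    (-1, [0.45, 0.50, 0.48, 0.47]),   # label -1: normal consumption profile (illustrative)
]

with open("training_data.txt", "w") as fh:
    for label, features in samples:
        pairs = " ".join(f"{i + 1}:{v}" for i, v in enumerate(features))
        fh.write(f"{label} {pairs}\n")   # LIBSVM sparse format: "<label> <index>:<value> ..."

# The model could then be trained with, e.g.:  svm-train -t 2 -c 10 -g 0.5 training_data.txt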

4.2 Data Collection


Data collection is one of the most important phases in this project. The
electricity consumption data to be used in this project and research study was
identified by the TNBD SEAL team experts who have inspected customer
premises and meter installations. The type of data acquired is in the form of
historical customer electricity usage and billing information.


The data collection for this research study was performed two times. In the first
data collection phase, historical customer billing and consumption data was
collected for training, i.e., to make the intelligent system learn, memorize and
differentiate between normal and suspicious consumption patterns. In the
second data collection phase, similar data was collected as in the first stage, but
this data was used for testing the fraud detection system, i.e., to detect and
identify suspicious TNB customers. The data used for training the SVC was
collected from the Kuala Lumpur (KL) Barat region, in the state of Selangor in
Malaysia, while the data used for testing the system was acquired from three
cities in the state of Kelantan in Malaysia, namely: Kota Bharu, Kuala Krai and
Gua Musang. Table 4.1 illustrates the statistics of the data collected from TNBD.

Table 4.1: Customer data collected from TNBD for training and testing

Data        TNBD Station                No. of Customers    Fraud Cases
Training    Kuala Lumpur (KL) Barat     265,870             1171
Testing     Kota Bharu                  76,595              101
Testing     Kuala Krai                  18,880              37
Testing     Gua Musang                  13,045
The data acquired for the TNB stations listed in Table 4.1 covers a period of 25 months, i.e., from May 2006 to May 2008. The amount of data was limited to 25 months because TNBD system administrators faced problems retrieving archived data older than 25 months from their customer billing database. The archived data stored in TNBD's customer database consists of customer billing records for the previous 10 years. The roughly two years of customer consumption data acquired for this research study therefore contributes only about 20% of the entire customer consumption history from the 10-year billing period. This is the major drawback faced by this research study, since the data collected covers only a small portion of the customer consumption history.


Data for all TNB stations listed in Table 4.1 was obtained in the Microsoft Office
Access Database format. Two types of customer data were collected, which are
the: (i) Enhanced Customer Information Billing System (e-CIBS) data, and (ii)
High Risk Data.

4.2.1 Customer Information Billing System Data


The customer information and billing data collected from TNBD's Enhanced Customer Information Billing System (e-CIBS), as indicated in Table 4.1, consists of 25 tables in the Microsoft Access Database format, as shown in Figure 4.3. Each of the 25 tables consists of monthly consumption and billing data for all customers within a TNBD station. The number of rows in Figure 4.3 indicates the number of customers. As customer data is confidential to the utility (TNBD in this case), in order to protect the privacy of the customers the "Customer Names" in Figure 4.3 and the succeeding figures have been blurred.

Figure 4.3: The e-CIBS data for the KL Barat Station



The monthly e-CIBS data is arranged into 14 data columns as shown in Figure
4.3. Table 4.2 describes the customer billing and consumption information
listed in the monthly e-CIBS data.

Table 4.2: Customer information listed in the monthly e-CIBS data

Column  Data             Description
1       Station No.      Number of the TNB Distribution centre.
2       Customer No.     Customer number listed in TNB's billing information.
3       Reading Unit     Number identifying the location of the customer's premises within a specific area of a locality.
4       Customer Name    Full name of the registered customer.
5       Customer Class   Indicates if the customer belongs to the corporate, government or normal sector.
6       Tariff Code      Indicates the tariff for the customer.
7       TOE              Indicates customers previously detected by TNBD for Theft of Electricity (TOE).
8       CWR              Credit Worthiness Rating (CWR), relating to the payment status of the customer.
9       HRC              Indicates if the customer is a High Risk Customer (HRC), i.e., probability of fraud.
10      Reading Date     Meter reading date.
11      Reading Type     Meter reading type: normal or estimated.
12      Consumption      Monthly kWh consumption.
13      IR               Irregularity Report (IR), indicating if the customer has irregular consumption.

4.2.2 HighRisk Data


The HighRisk data acquired from TNBD's customer database for all TNB stations in Table 4.1 consists of only one table in the Microsoft Access Database format, as shown in Figure 4.4. The HighRisk data mainly consists of fraud customers previously identified by the SEAL teams. The HighRisk data was requested from TNBD in addition to the e-CIBS data, in order to aid the development of this research project.

Figure 4.4: The HighRisk data for the KL Barat Station

The HighRisk data is arranged into 3 data columns, as shown in Figure 4.4. Table 4.3 indicates the customer information listed in the HighRisk data.

Table 4.3: Customer information listed in the HighRisk data

Column  Data            Description
1       Station No.     Number of the TNB Distribution centre.
2       Customer No.    Customer number listed in TNB's billing information.
3       Detection Date  Date when the customer was detected for fraud by TNB.


4.3 Data Preprocessing


Data preprocessing is the first major stage involved in the development of the
fraud detection system. Data preprocessing involves data mining techniques in
order to transform raw customer data into the required format, to be used by
the SVC for detection and identification of fraud load consumption patterns.
Different data mining techniques have been applied to preprocess the e-CIBS
and HighRisk data, which are discussed in the sections below.

4.3.1 e-CIBS Data Preprocessing


The e-CIBS data acquired from TNBD, as shown in Figure 4.3, is in raw format. This data was preprocessed to remove unwanted customers and to smooth out noise and other inconsistencies, in order to extract only the useful and relevant information required. Figure 4.5 illustrates the proposed framework for preprocessing the e-CIBS data.

Figure 4.5: Flowchart of the proposed framework for e-CIBS data preprocessing


As shown in Figure 4.5, six major steps are involved in preprocessing the e-CIBS
data, which are as follows:
1. Customer Filtering and Selection
2. Consumption Transformation
3. Feature Selection and Extraction
4. Feature Normalization
5. Feature Adjustment
6. Feature File

The following sections discuss in further detail the six steps involved in the e-CIBS data preprocessing.

4.3.1.1 Customer Filtering and Selection


As the e-CIBS data acquired from TNBD is in raw format, only customers with complete and useful data were selected from the KL Barat station for the SVC model development, in order to extract relevant and useful information. Since the data acquired is in the form of a database, data mining techniques using the Structured Query Language (SQL) were applied to satisfy the following four criteria (a brief sketch of this filtering is given after the list):

1. Remove repeating customers in the monthly e-CIBS data.


2. Remove customers having no consumption (i.e., 0 kWh) throughout the
entire 25 month period.
3. Remove customers not present within the entire 25 month period
(missing data).
4. Remove new customers registered after the first month in the data i.e.,
customers registered after May 2006.
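As an illustration of these four criteria, the sketch below expresses the same filtering in plain Python over an in-memory list of monthly records. It is not TNBD's actual SQL; the record layout (customer_no, month_index, kwh) is a hypothetical stand-in for the e-CIBS columns.

MONTHS = 25  # May 2006 to May 2008

def filter_customers(records):
    by_customer = {}
    seen = set()
    for cust, month, kwh in records:
        # Criterion 1: ignore repeating (duplicate) customer rows within a month.
        if (cust, month) in seen:
            continue
        seen.add((cust, month))
        by_customer.setdefault(cust, {})[month] = kwh

    selected = []
    for cust, months in by_customer.items():
        # Criteria 3 and 4: customer must be present for all 25 months, with no
        # missing months and registered from the first month (May 2006).
        if len(months) != MONTHS or 1 not in months:
            continue
        # Criterion 2: remove customers with no consumption (0 kWh) throughout.
        if all(kwh == 0 for kwh in months.values()):
            continue
        selected.append(cust)
    return selected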

After customer filtering and selection from the KL Barat station data, only 186,968 customer records remained from the initial population of 265,870 customers. Even though approximately 30% of the customers were removed after applying the four filtering conditions mentioned previously, the number of customers remaining was more than sufficient for SVC training and model development. The 186,968 common customers remaining after the filtering were transferred into a new database for further preprocessing, which is shown in Figure 4.6 for the first month, i.e., May 2006.

Figure 4.6: Common customer records for the KL Barat station after customer
filtering and selection

4.3.1.2 Consumption Transformation


Real-world datasets tend to be noisy and inconsistent. To overcome these problems, data mining techniques using statistical methods were applied to the e-CIBS data in order to remove noise and other inconsistencies. In cases where meter readers are unable to record meter readings, e.g., because customers are not present at their premises, the meter readers indicate an estimated (RType: 'E' in Figure 4.6) monthly kWh consumption for those customers in the billing information. This estimated billing consumption is based on the previous consumption recorded for the customers. Since the estimated monthly consumptions are not accurate consumption values, they were transformed into normal (RType: 'N' in Figure 4.6) consumption values in order to smooth out inconsistencies within the consumption patterns.

The estimated consumption transformation is accomplished by averaging the consecutive normal 'N' consumption values of a customer immediately preceding the estimated consumption. As an example, for an estimated consumption value, if the last two consecutive consumptions of a customer are normal, say N1 = 100 kWh and N2 = 200 kWh, then the average of N1 and N2 is 150 kWh. This value of 150 kWh replaces the estimated value as the normal (transformed) consumption value. This technique is applied over the entire 25 month period, for all customers. By applying this technique, the estimated monthly kWh consumption values are replaced with transformed normal monthly kWh consumption values. Figure 4.7 shows a sample of the e-CIBS monthly kWh consumption data to be transformed.

The number of consecutive normal consumptions may vary throughout the load profile of each customer. This depends entirely on whether the meter readers are able to collect the meter readings. If they are not able to collect the actual meter readings from the customer premises, then they estimate the current month's consumption based on the previous months' consumption of the customer.
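The following minimal Python sketch illustrates this averaging rule under the assumption that each monthly reading is a (reading_type, kwh) pair; the field layout is illustrative, not TNBD's actual schema.

def transform_estimated_readings(readings):
    """Replace estimated ('E') readings with the average of the consecutive
    normal ('N') readings that immediately precede them.

    readings: list of (reading_type, kwh) tuples in month order (25 entries).
    Returns a new list in which every reading is of type 'N'.
    """
    transformed = []
    run_of_normals = []  # consecutive 'N' kWh values seen so far
    for rtype, kwh in readings:
        if rtype == "N":
            run_of_normals.append(kwh)
            transformed.append(("N", kwh))
        else:  # estimated reading
            if run_of_normals:
                kwh = sum(run_of_normals) / len(run_of_normals)
            # if no preceding normal readings exist, the estimate is kept as-is
            transformed.append(("N", kwh))
            run_of_normals = []  # the run of consecutive normals is broken
    return transformed

# Example from the text: normal readings of 100 and 200 kWh followed by an
# estimated reading, which is replaced by their 150 kWh average.
print(transform_estimated_readings([("N", 100), ("N", 200), ("E", 0)]))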


Figure 4.7: The e-CIBS monthly consumption data to be transformed

Figure 4.8: The e-CIBS monthly consumption transformed into the normal
monthly kWh consumption

Figure 4.8 shows the consumption values of Figure 4.7 transformed into normal kWh consumption values using statistical averaging. The data columns R1 through R24 in Figures 4.7 and 4.8 represent the month number, the rows indicate different customers, and the data value in every cell is in the format Reading_Type:Monthly_Consumption. After transformation, as indicated in Figure 4.8, all estimated 'E' consumption values are converted into their respective normal 'N' consumption values.

4.3.1.3 Feature Selection and Extraction


Features were selected from the e-CIBS data in order to extract only the useful and relevant information required for training the SVC model. Since the proposed fraud detection framework applies the consumption information of customers (load profiles) for the detection of fraud activities, the load consumption features are a crucial part of the pattern recognition in this research study.

From the 25 month kWh consumption data, daily average kWh consumption values, corresponding to features, were computed for each customer. These features were calculated using the following expression:

    ADC_i = C_{i+1} / |RD_{i+1} - RD_i| ,    i = 1, 2, ..., 24        (4.1)

where C_{i+1} represents the monthly kWh consumption of the following month, and |RD_{i+1} - RD_i| represents the absolute difference in days between the meter reading dates of the following month and the current month.


Using eq. (4.1), 24 features, i.e., 24 daily average kWh consumption values, were calculated for each customer. Meter readings for each customer are recorded on different dates of the month and are not always the same for all customers, i.e., meters are not read exactly every 30/31 days (one month) and there are longer or shorter durations in the number of days. As meter reading dates affect the monthly kWh consumption recorded for each customer, the 24 daily average kWh consumption values computed using eq. (4.1) reveal a more accurate consumption history of the customers.
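A minimal sketch of this feature computation is given below, assuming the monthly consumptions and reading dates are available as Python lists; the variable names are illustrative only.

from datetime import date, timedelta

def daily_average_features(consumptions, reading_dates):
    """Compute the 24 daily average kWh consumption features of eq. (4.1).

    consumptions:  25 monthly kWh values C_1 .. C_25 (normal readings).
    reading_dates: 25 meter reading dates RD_1 .. RD_25 as datetime.date.
    """
    features = []
    for i in range(24):
        day_diff = abs((reading_dates[i + 1] - reading_dates[i]).days)
        features.append(consumptions[i + 1] / day_diff)
    return features

# Toy example: readings roughly every 30 days, 300 kWh per month -> 10 kWh/day.
dates = [date(2006, 5, 15) + timedelta(days=30 * i) for i in range(25)]
kwh = [300] * 25
print(daily_average_features(kwh, dates))  # 24 values of 10.0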

The 24 daily average kWh consumption values computed for each customer correspond to customer load profiles. For a selected group of Q customers, each customer load profile is characterized by a vector x^(m) = {ADC_i^(m)}, i = 1, ..., d, where d = 24 corresponds to the 24 time-domain intervals based on the daily average kWh consumption values. Therefore, the whole set of customer load profiles is represented by X = {x^(m)}, m = 1, ..., Q.

Figure 4.9 illustrates the normal monthly kWh consumption values, where the monthly consumption C_i is represented by data columns C1 through C24. The meter reading dates corresponding to the consumption values in Figure 4.9 are shown in Figure 4.10, represented by data columns RD1 through RD24. The absolute day differences between meter reading dates, |RD_{i+1} - RD_i|, for the respective consumption values of Figure 4.9 are shown in Figure 4.11, represented by data columns DD1 through DD24. The daily average kWh consumption values calculated using eq. (4.1) are obtained by dividing the respective consumption values in Figure 4.9 by the respective absolute day difference values in Figure 4.11, as shown in Figure 4.12.


Figure 4.9: The normal monthly kWh consumption of the customers

Figure 4.10: The meter reading date of the customers



Figure 4.11: The difference of days between each meter reading date

Figure 4.12: The average daily kWh consumption features



After calculation of the daily average kWh consumption features, other features were evaluated for selection based on the Cross-Validation (CV) method. Section 4.3.1.3.1 gives a brief overview of k-fold CV. The criterion for the selection of other features is simple: a new feature is incorporated in the SVC model only if there is a distinguishable relationship (correlation) between the new and existing features. From Table 4.2, three features considered useful to the problem of NTL detection were evaluated: (i) HRC, (ii) CWR, and (iii) IR. During feature selection, class labels {0, 1} for evaluating each sample were assigned based upon the TOE tag of the respective samples, where 0 indicates good (normal) samples and 1 indicates fraud samples.

Feature selection was performed using all customer data in the KL Barat station. Utilizing the TOE information as the class label for the features results in an unbalanced dataset, as there are only 1171 TOE cases out of the total 186,968 customers, i.e., only about 0.5% of the customers are fraud while the remaining are good. With such an unbalanced class ratio, the plain CV accuracy is no longer a meaningful measure. Therefore, to overcome this problem, an objective function, i.e., eq. (4.2), was implemented in order to calculate the detection hitrate13 for the purpose of performance evaluation. The detection hitrate is calculated using the following expression:

    Detection Hitrate = (N_CF / N_F) x 100%        (4.2)

where N_CF represents the number of samples correctly classified as fraud cases by the SVC and labeled as fraud cases by TNBD, and N_F represents the total number of samples classified as fraud cases by the SVC.
13 Detection hitrate is the measure of accuracy, in percentage, of the number of samples correctly classified by the SVC and identified as fraud by TNB, over the number of samples classified as fraud by the SVC.
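For concreteness, the hitrate of eq. (4.2) can be computed from predicted and actual labels as in the short Python sketch below (illustrative only; label 1 denotes fraud):

def detection_hitrate(predicted, actual, fraud_label=1):
    """Detection hitrate of eq. (4.2): correctly predicted fraud cases over
    all cases the classifier predicted as fraud, in percent."""
    n_f = sum(1 for p in predicted if p == fraud_label)
    n_cf = sum(1 for p, a in zip(predicted, actual)
               if p == fraud_label and a == fraud_label)
    return 100.0 * n_cf / n_f if n_f else 0.0

print(detection_hitrate([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # ~66.7 (2 of 3)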


Table 4.4 indicates the hitrates obtained for the different combinations of features evaluated. The detection hitrate in Table 4.4 is based on the average of 100 trials, where on every trial 67% of the samples from every class are used for SVC training and the remaining 33% of the samples are used for testing. For every trial, the training and testing samples are selected in random order.

Table 4.4: Detection hitrate for different combinations of modeling features

No.  Features Selected                                 Detection Hitrate
1    24 Daily average consumptions                     68.13%
2    24 Daily average consumptions + HRC               65.40%
3    24 Daily average consumptions + CWR               78.96%
4    24 Daily average consumptions + IR                72.57%
5    24 Daily average consumptions + HRC + CWR         67.43%
6    24 Daily average consumptions + HRC + IR          64.65%
7    24 Daily average consumptions + CWR + IR          77.21%
8    24 Daily average consumptions + HRC + CWR + IR    61.85%

The feature selection results in Table 4.4 indicate that the best detection hitrate is obtained using a combination of: (i) the load profile (daily average kWh consumption) features, and (ii) the Credit Worthiness Rating (CWR). Additionally, based on the data analysis of fraud customers previously identified by TNBD SEAL experts, it was observed that the CWR correlates significantly with customers committing fraud activities. The CWR is targeted at identifying customers intentionally avoiding or delaying bill payments, and in the majority of the fraud and theft cases detected by TNBD, customers who delay payments and ignore paying bills are the most likely to be involved in fraud activities such as electricity theft. Therefore, the CWR is selected as an additional feature, to be used alongside the load profile features in the SVC model.


The CWR for all customers in TNBD's billing system is generated automatically, based on the monthly payment status of the customers. The CWR takes six integer values ranging from 0 to 5, where 0 represents the minimum CWR and 5 represents the maximum CWR, as shown in Figure 4.13, where data columns C1 through C25 represent the months and the customers are indicated by the rows. Since the CWR changes from month to month based on the payment status of the customers, the CWR values averaged over the 25 month period were used as the 25th feature for each customer in the SVC model, as shown in Figure 4.14.

Figure 4.13: The monthly CWR of the customers

4.3.1.3.1 Cross-Validation
Cross-Validation (CV), also referred to as rotation estimation [200, 201], is a technique for assessing how the results of a statistical analysis will generalize to an independent dataset. CV is mainly used in settings where the goal is prediction or classification, and one wants to estimate how accurately a model will perform in practice. In addition, CV helps to prevent overfitting problems in supervised learning techniques during training.

Figure 4.14: The averaged CWR over a period of 25 months

One round of CV involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). In k-fold CV, the training samples are partitioned into k subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k - 1 subsamples are used as training data. The CV process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The results from the k folds are then averaged to produce a single estimate. The advantage of CV is that all observations are used for both training and validation, and each observation is used for validation exactly once [193]. To reduce variability further, multiple rounds of CV can be performed using different partitions, and the validation results can then be averaged over the rounds. In practice, 10-fold CV is the most commonly used procedure for feature selection and parameter tuning in SVMs [202].
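A minimal sketch of how the k folds can be formed is shown below; it only illustrates the index partitioning described above, not the LIBSVM internals used in this study.

import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k folds; each fold serves once as validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits

# Example: 10-fold CV over 383 samples gives 10 (train, validation) splits.
for train_idx, val_idx in k_fold_indices(383, k=10):
    pass  # train the classifier on train_idx, evaluate on val_idx, then average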

4.3.1.4 Feature Normalization


In order for the feature data to fit the SVC properly, all feature data (25 features) were represented on a normalized scale, in the range of 0 to 1. All data was linearly scaled using the following expression:

    x' = (x - min(x)) / (max(x) - min(x))        (4.3)

where x represents the current kWh consumption or CWR value of the customer, min(x) represents the minimum kWh consumption in the load profile of the customer (or the minimum CWR value over all customers), and max(x) represents the maximum kWh consumption in the load profile of the customer (or the maximum CWR value over all customers).

Figure 4.15 shows the 24 daily average kWh consumption features (load profiles) from Figure 4.12 normalized by applying eq. (4.3), where data columns V1 through V24 represent the normalized kWh consumption and the customers are indicated by the rows. Figure 4.16 shows the normalized CWR values for the averaged CWR values in Figure 4.14.
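The scaling of eq. (4.3) amounts to the short Python helper below (a sketch; degenerate profiles with max = min would need separate handling):

def min_max_normalize(values):
    """Linearly scale a customer's feature values into the range [0, 1] (eq. 4.3)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2.0, 5.0, 8.0]))  # [0.0, 0.5, 1.0]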

4.3.1.5 Feature Adjustment


In order for the modeling features in Figures 4.15 and 4.16 to be used for SVC training and testing, the features need to be adjusted and presented in the proper format for the LIBSVM software [199]. Therefore, all normalized features were labeled, where labels are represented by integer values. The normalized feature values alongside their respective label values are denoted by the matrix W, in the form:
    W = [ l_1:v_{1,1}   l_2:v_{1,2}   ...   l_d:v_{1,d}
          l_1:v_{2,1}   l_2:v_{2,2}   ...   l_d:v_{2,d}
          ...
          l_1:v_{Q,1}   l_2:v_{Q,2}   ...   l_d:v_{Q,d} ]        (4.4)

where l_j represents the feature label (index) of the j-th feature, v_{i,j} represents the normalized value of the j-th feature for the i-th customer, d indicates the number of features (d = 25), and Q indicates the number of customers.

Figure 4.15: The normalized average daily kWh consumption features



Figure 4.16: The normalized average CWR over a period of 25 months

The modeling features obtained after the e-CIBS data preprocessing is complete are shown in Figure 4.17. The data columns V1 through V24 represent the features in the standard format Label_Number:Feature_Value, where the customers are indicated by the rows.

4.3.1.6 Feature File


In order for the features to be used for training and testing in the LIBSVM software [199], the feature database in Figure 4.17 is transformed into the standard LIBSVM feature file (text file), as shown in Figure 4.18. The first data column of the text file in Figure 4.18 represents the SVC class labels {1, 2}, which are based on the TOE information, where 1 represents fraud customers and 2 represents normal (good) customers. The data columns following the class label represent the 25 modeling features, i.e., the 24 daily average kWh consumptions and the CWR.
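The sketch below shows the general shape of such a feature file being written from Python; the sparse index:value line layout is the standard LIBSVM convention, while the sample data is purely illustrative.

def write_libsvm_file(path, labels, feature_rows):
    """Write 'label index:value index:value ...' lines in LIBSVM format."""
    with open(path, "w") as f:
        for label, features in zip(labels, feature_rows):
            items = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
            f.write(f"{label} {items}\n")

# Two toy customers: label 1 = fraud, label 2 = normal; 25 features each
# (24 normalized daily averages plus the averaged CWR).
write_libsvm_file("samples.txt",
                  labels=[1, 2],
                  feature_rows=[[0.1] * 24 + [0.4], [0.8] * 24 + [1.0]])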

Figure 4.17: The preprocessed features from the e-CIBS data

Figure 4.18: The LIBSVM feature file



4.3.2 HighRisk Data Preprocessing


The HighRisk data contains additional information on the fraud customers previously identified by TNBD SEAL teams. The HighRisk data collected from TNBD, as shown in Figure 4.4, is in raw format. Therefore, data mining techniques need to be applied in order to extract the useful and relevant information required.

In order to extract useful information from the HighRisk data, the type of information which can be extracted was distinguished first. From observation, two major types of information can be extracted: (i) Detection Count, and (ii) Last Detection Date. Table 4.5 indicates the information retrieved from the HighRisk data using the Structured Query Language (SQL).

Table 4.5: Information retrieved from the HighRisk data using SQL

Column  Data                 Description
1       Station No.          Number of the TNB Distribution centre.
2       Customer No.         Customer number listed in TNB's billing information.
3       Detection Count      Number of times the customer has previously been
                             detected for fraud by TNBD.
4       Last Detection Date  Last date when the customer was detected for fraud.

Figure 4.19 shows the information extracted from the HighRisk data using SQL. For the KL Barat station HighRisk data, after grouping all records belonging to the same customer, 32,972 distinct fraud customers were identified from the 105,525 fraud cases detected by TNB over the last five years. This indicates that, on average, each of these customers was detected for fraud about 3.2 times, i.e., many customers commit fraud repeatedly, which indicates a high rate of repetitive fraud. Table 4.6 indicates the statistics of the information extracted from the HighRisk data for all TNB stations listed in Table 4.1.
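A minimal sketch of this grouping step is shown below in plain Python (the actual extraction in this study was done with SQL over the Access database; the record layout is illustrative):

def summarize_high_risk(records):
    """Group HighRisk rows per customer into (detection_count, last_detection_date).

    records: iterable of (customer_no, detection_date) pairs, dates as 'YYYY-MM-DD'.
    """
    summary = {}
    for customer_no, detection_date in records:
        count, last_date = summary.get(customer_no, (0, ""))
        summary[customer_no] = (count + 1, max(last_date, detection_date))
    return summary

rows = [("C001", "2004-03-10"), ("C001", "2007-11-02"), ("C002", "2006-01-20")]
print(summarize_high_risk(rows))
# {'C001': (2, '2007-11-02'), 'C002': (1, '2006-01-20')}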


Figure 4.19: Information extracted from the HighRisk data using SQL

Table 4.6: Information extracted using SQL from the HighRisk data

No.  Station      Cases Detected by TNBD  Customers Found
1    KL Barat     105,525                 32,972
2    Kota Bharu   6,773                   3,557
3    Kuala Krai   818                     653
4    Gua Musang   1,826                   953

4.4 Classification Engine Development


The development of the classification engine, namely the SVC model, is the main focus of this project and research study. Development of the classification engine involves: load profile inspection for identifying normal and fraud customers, training and development of the SVC classifier, SVC parameter tuning, class weight adjustment, SVC probability estimation, and SVC testing and validation. The following sections discuss the development of the SVC engine in detail.

4.4.1 Load Profile Inspection


Load profiles, i.e., the 24 daily average kWh consumption features of the customers, were inspected to retrieve samples with which to build and train the SVC model. As a 2-class (binary) SVC classifier is used in this study to represent two different types of customer load profiles, the load profile samples used to build the SVC classifier were extracted from the preprocessed KL Barat station data shown in Table 4.1. The inspected load profiles were extracted and classified into two different categories according to their behavior, i.e., fraud or normal consumption.

From the 186,968 filtered customers in the KL Barat data, only 1171 customers were identified and detected as Theft of Electricity (TOE) cases by TNBD in the past five years. From the remaining 185,797 customers with no TOE cases, a few hundred customers were identified as normal customers, i.e., customers with no fraud activities. The customers identified with TOE and with no TOE cases form the backbone for the development of the SVC model.

Firstly, manual inspection was performed on all 1171 TOE cases to identify load profiles in which abrupt changes appear clearly, indicating fraud activities, abnormalities and other irregularities in consumption characteristics. From the 1171 TOE cases inspected, only 53 customer load profiles were identified with the presence of abrupt or sudden drops relating to fraudulent events. These 53 TOE cases were selected as Fraud Suspects (Class 1), in order to train the SVC for identifying fraud and suspicious customers. Figure 4.20 shows the load profiles of four typical fraud customers over a period of two years, taken from the 53 fraud cases identified. The remaining 1118 load profiles with TOE cases did not show any abnormal consumption patterns or abrupt drops relating to fraud activities; i.e., these customers committed fraud before the two year period (before May 2006), for which there was no customer data.


Figure 4.20: Normalized load profiles of four typical fraud customers over a
period of two years

Secondly, inspection was also performed on a set of 500 load profiles with no
TOE cases. From the inspection, 330 load profiles in which no abrupt changes
appear were selected as Normal Suspects (Class 2), in order to train the SVC for
identifying normal (good) customers, i.e. customers with no fraud activities.
Load profiles of four good customers from the 330 normal cases identified are
indicated in Figure 4.21. Therefore, in total 383 samples from both the fraud
(Class 1) and normal (Class 2) classes were used to build the SVC model, as
shown in Figure 4.22.


Figure 4.21: Normalized load profiles of four good customers over a period of
two years

4.4.2 SVC Development


The 383 samples selected through load profile inspection are used to train an SVC in order to create the SVC model. The SVC model development consists of several stages: weight adjustment, parameter optimization, probability estimation and SVC training. The following sections discuss these development stages of the SVC model in detail.

4.4.2.1 Weight Adjustment


The ratio between the two classes of samples is unbalanced, i.e., Class 1 has 53 samples and Class 2 has 330 samples; therefore, the classifier is weighted in order to balance the sample ratio. Weights are adjusted by calculating the sample ratio for each class, which is obtained by dividing the total number of classifier samples by the number of samples in each class. In addition, the class weights are multiplied by a weight factor of 100 in order to achieve satisfactory weight ratios for training. Table 4.7 indicates the weight ratios obtained after adjustment.

Table 4.7: Weight ratio adjustment of the SVC

Class  Training Samples  Weightage
1      53                722.6415
2      330               109.4258

Figure 4.22: Customer data features (samples) used for SVC training


4.4.2.2 Parameter Optimization


The training accuracy of the SVC model is estimated by tuning the SVC kernel parameter and the error penalty parameter C. In this study the RBF kernel is used, hence the kernel parameter γ, which controls the width of the Gaussian, is to be fine-tuned. The Grid Search method proposed by Hsu et al. in [193] is used for SVC parameter optimization. In the Grid Search method, exponentially growing sequences of the parameters (C, γ) are used to identify the SVC parameters obtaining the best 10-fold CV accuracy. Sequences of 100 exponentially spaced values were used for each of C and γ, giving 100 x 100 = 10,000 parameter combinations. For each pair (C, γ), the CV performance was measured by training on 67% of the classifier data and testing on the remaining 33%. This procedure was repeated for 100 trials of 10-fold CV, where on every trial the data samples were selected in random order.
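A simplified sketch of such a grid search is given below. The exponent ranges and the cross_validation_hitrate helper are placeholders (the exact grids used in this study are not reproduced here); the point is only the exhaustive (C, γ) enumeration.

def grid_search(samples, labels, cross_validation_hitrate):
    """Exhaustively evaluate exponentially growing (C, gamma) pairs and keep
    the pair with the best cross-validation score (Grid Search, Hsu et al.)."""
    best = (None, None, -1.0)
    c_grid = [2 ** e for e in range(-5, 16, 2)]      # placeholder exponent range
    gamma_grid = [2 ** e for e in range(-15, 4, 2)]  # placeholder exponent range
    for c in c_grid:
        for gamma in gamma_grid:
            score = cross_validation_hitrate(samples, labels, c, gamma)
            if score > best[2]:
                best = (c, gamma, score)
    return best  # (best C, best gamma, best CV score)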

Experimentally, by iterating over the different parameter combinations, the optimal SVC parameters were found to be C = 1 and γ = 0.92, obtained at the highest 10-fold CV accuracy of 93.71%. The detection hitrate of eq. (4.2) at this CV accuracy was calculated to be 81.46%. The training accuracy14 of the classifier is calculated using the expression:
    Training Accuracy = (N_C / N_T) x 100%        (4.5)

where N_C represents the total number of samples correctly classified by the SVC, and N_T represents the total number of samples used for testing.

The SVC training engine proposed for parameter optimization and building the classifier is shown in Figure 4.23. As seen in Figure 4.23, all 383 samples were used for training in order to build the classifier (model file).
14 Training accuracy is a measure of the memorization or learning capability of the classifier. It is calculated as the percentage of the samples used for testing the SVC model that are correctly classified by the SVC.



Figure 4.23: The SVC training engine proposed for parameter optimization and
building the classifier (model)

4.4.2.3 Probability Estimation


Besides conducting classification, SVCs can also compute probability estimates for each class [203]. This supports the analytic concepts of generalization and certainty. Given that r_{ij} is an estimate of the pairwise probability produced by the classifier between class i and class j, i.e., r_{ij} ≈ P(y = i | y = i or j, x) with r_{ij} + r_{ji} = 1, and that p_i is the probability of the i-th class, the probability vector p = (p_1, ..., p_k) can be derived via a QP problem [204]:

    min_p  (1/2) Σ_{i=1..k} Σ_{j≠i} ( r_{ji} p_i - r_{ij} p_j )^2
    subject to  Σ_{i=1..k} p_i = 1,  p_i ≥ 0        (4.6)

In the case of this research study, the pairwise probability information defined
in eq. (4.6) was specified to be calculated by the LIBSVM software, in order to
estimate the probabilities of the classified customers. The probability estimates
(decision values) of the tested/validated data provide additional information for
data postprocessing i.e., selection of the suspicious customers.

4.4.2.4 SVC Training


After weight adjustment and parameter optimization, all 383 samples are used to train and build the SVC model (classifier). The LIBSVM v2.86 [199] Microsoft Disk Operating System (MS-DOS) executable svm-train.exe is employed for training all 383 samples, as shown in Figure 4.24. The weight ratios and optimized SVC parameters obtained in Sections 4.4.2.1 and 4.4.2.2 are used for SVC training with the LIBSVM software, as indicated in Figure 4.25, where samples.txt represents the training data samples (two classes) shown in Figure 4.22 and classifier.txt is the model file or classifier generated after the SVC training is complete.
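For reference, a LIBSVM training invocation of this kind generally has the following shape. The flags -c (cost), -g (gamma), -b (probability estimates) and -w1/-w2 (per-class weights) are standard LIBSVM options; the parameter values shown are illustrative placeholders rather than the exact string of Figure 4.25:

svm-train.exe -c 1 -g 0.92 -b 1 -w1 722.6415 -w2 109.4258 samples.txt classifier.txt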

SVC training in LIBSVM using the input parameter string in Figure 4.25 is shown
in Figure 4.26. During SVC training, the SMO training algorithm undergoes
iterations and on the last iteration, a total of 160 support vectors (SVs) are
calculated, as shown in Figure 4.26. Specifications of the trained classifier are
shown in Table 4.8.


Figure 4.24: LIBSVM MS-DOS executable used for SVC training

Figure 4.25: SVC training parameters for LIBSVM

Figure 4.26: SVC training using LIBSVM


Table 4.8: Specifications of the trained classifier (model)

Class  Training Samples  Support Vectors  Weightage
1      53                42               722.6415
2      330               118              109.4258

The model file in Figure 4.27 is generated after SVC training is complete. The first few lines in the model file indicate the parameters of the model, followed by the SVs. The first line in Figure 4.27 indicates the type of SVC method used, which is the C-Support Vector Classification (C-SVC) method. The second line indicates the kernel type used, which is the Radial Basis Function (RBF) kernel. The third line in the model file indicates the value of γ (gamma) used for SVC training. The line that follows indicates the total number of classes, which is 2, i.e., Class 1 indicates fraud customers and Class 2 indicates normal customers.

Figure 4.27: Model file (classifier) after SVC training is complete



The total number of SVs in the classifier is 160, as indicated by the last line of Figure 4.26 and the fifth line of Figure 4.27. In the sixth line of the model file, ρ (rho) is a parameter indicating the negative value of the bias term b. The next lines in the model file indicate the labels used to represent the two classes and the pairwise probability estimates, probA and probB. The number of SVs in each class, i.e., Class 1 and Class 2 respectively, is given in the line following the pairwise probability estimates. All rows after the text 'SV' in Figure 4.27 represent the values of the 160 SVs computed during SVC training.

In the classifier in Figure 4.27, the SVs are defined by the constraint 0 ≤ α_i ≤ C in eq. (3.24), with Class 1 having 42 SVs and Class 2 having 118 SVs. There were no bounded support vectors (BSVs), i.e., SVs with α_i = C, in the classifier. This is because the clusters were considered to be well separated and no outliers were considered to be included, with C = 1 used as the optimal tuning parameter of the classifier. Training of the classifier finished on the 464th iteration, as shown in Figure 4.26, where the optimal value of the objective of the dual problem in eq. (3.24) was calculated to be -133.5426. The ρ (rho) parameter in the model file is defined through the decision function in eq. (3.46) as the negative of the bias term b, as mentioned in [193]. The value of ρ on the last iteration is calculated by LIBSVM to be ρ = 0.6185. Additionally, the pairwise probability parameters of the classifier defined in eq. (4.6) were computed as probA = -2.1079 and probB = 0.6982.

The separating boundaries between the two classes of the classifier are shown in Figure 4.28, which was plotted using svm-toy.exe from the LIBSVM package. As seen in Figure 4.28, the dark region (blue in color) represents the Class 1 boundary, while the lighter region (yellow in color) represents the Class 2 boundary. This indicates that both classes are well separated during the training process, resulting in a good training (learning) performance.


Figure 4.28: Separating boundaries between the two classes of the SVC model. The dark (blue) region indicates the Class 1 boundary and the lighter (yellow) region indicates the Class 2 boundary

4.4.2.5 SVC Testing


As training is done offline, i.e., only once in order to obtain the model file (classifier), testing is performed online in order to classify customers as fraud or normal. The customer data used for testing the classifier is specified in Table 4.1, where three cities within the state of Kelantan in Malaysia, namely: (i) Kota Bharu, (ii) Gua Musang, and (iii) Kuala Krai, are used for SVC testing and validation. The proposed SVC testing and validation engine is shown in Figure 4.29 below.


Figure 4.29: The SVC testing engine proposed for the classification of fraud and
normal customers

In order to implement SVC testing, the LIBSVM v2.86 [199] Microsoft Disk Operating System (MS-DOS) executable svm-predict.exe is employed for testing the customer data samples, as shown in Figure 4.30. The customer data used for testing is in exactly the same format as the training data shown in Figure 4.22. However, during testing the class labels of the data samples are obviously unknown, so for the purpose of simplicity all class labels in the testing data are set to Class 2, as shown in Figure 4.31. The reason for including class labels in the testing data is the data format requirement of LIBSVM; during the testing and validation phase of the SVC, the class labels are ignored internally by the LIBSVM software.
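The corresponding prediction call generally takes the following shape, where -b 1 is the standard LIBSVM option requesting class probability estimates; the testing and output file names below are illustrative placeholders:

svm-predict.exe -b 1 testing.txt classifier.txt results.txt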

Figure 4.30: LIBSVM MS-DOS executable used for SVC testing

Figure 4.31: Customer data features used for SVC testing


SVC testing in LIBSVM using the input parameters in Figure 4.30 is shown in Figure 4.32. Implementation of the SVC testing/validation using the classifier model developed in Figure 4.27 is shown in Figure 4.33. During testing, 150 customers from the Kota Bharu station were tested, resulting in 127 correctly classified customers and 23 incorrectly classified customers, as indicated in Figure 4.33. The overall classification accuracy of the SVC model is calculated to be 84.67%, and the detection hitrate defined in eq. (4.2) is calculated to be 77.16%.

Figure 4.32: SVC testing parameters for LIBSVM

Figure 4.33: SVC testing and validation using LIBSVM

The output file (classification results) generated after SVC testing is shown in
Figure 4.34. The first line in Figure 4.34 indicates the class labels predicted, i.e.
Class 1 and Class 2. The lines following the first line indicate the predicted
results for the tested customers with respect to the customer data features in
Figure 4.31. The predicted results for each customer in Figure 4.34 consist of
three data columns. The first data column indicates the predicted class for each
customer, where Class 1 represents fraud customers and Class 2 represents
normal customers. The second and the third data columns indicate the
probability estimates for Class 1 and Class 2 respectively for each customer. The
sum of the two probabilities (data column 2 and column 3) for each customer is
always equal to one.

The probability estimates indicate the degree to which a customer belongs to a particular class. For a customer, if the probability of Class 1 is higher than that of Class 2, then the customer is classified as Class 1 (fraud); otherwise the customer is classified as Class 2 (normal). In other words, when the probability is split between the classes, the SVC model assigns the customer to the class with the highest probability.

Figure 4.34: Output file (classification results) after SVC testing is complete

4.5 Data Postprocessing


Data postprocessing involves correlating the SVC classification results with the e-CIBS and HighRisk data and selecting suspicious customers using a Fuzzy Inference System (FIS), as shown in Figure 4.35. The SVC results include the class labels and probability estimates of the tested customers, which are correlated with the customer data using data mining, i.e., SQL techniques. After correlation of the results with the customer data, customer filtering is performed based on human expertise and knowledge, using a FIS. The FIS performs filtering and shortlists customers based on suspicious consumption patterns relating to fraud activities and abnormalities. The following sections briefly present the theoretical concepts and background of fuzzy logic, followed by the development of the data postprocessing scheme for customer filtering and selection.


Figure 4.35: Flowchart of the proposed framework for data postprocessing


4.5.1 Fuzzy Logic Overview


Lotfi A. Zadeh was the first to publish work on fuzzy sets in 1965 [205], which led to the introduction of the theory of fuzzy logic. The basic idea of fuzzy logic is to allow not only the values 1 and 0, corresponding to true and false, but the whole interval [0, 1] as degrees of truth. This leads to a radical extension of classical logic. The following sections briefly discuss general aspects of the theory of fuzzy logic: definitions of fuzzy sets, membership functions, linguistic variables, fuzzy IF-THEN rules, combining fuzzy sets, and fuzzy inference systems (FISs).

4.5.1.1 Fuzzy Sets


A fuzzy set is a simple extension of the definition of a classical set in which the characteristic function is permitted to have any value between 0 and 1 [206]. If X is a collection of objects denoted generically by x, then a fuzzy set A in X is defined as a set of ordered pairs:

    A = { (x, μ_A(x)) | x ∈ X }        (4.7)

where μ_A(x) is called the Membership Function (MF) of the fuzzy set A. The MF maps each element of X to a membership grade (or value) between 0 and 1.

4.5.1.2 Membership Functions

As most fuzzy sets in use have a universe of discourse X consisting of the real line R, it is impractical to list all the pairs defining a MF. A more convenient way to define an MF is to express it as a mathematical formula. The most commonly used MFs in fuzzy sets are listed as follows:

1. Triangular MF - A triangular MF is specified by three parameters {a, b, c}, as follows:

       triangle(x; a, b, c) = 0,                   x ≤ a
                              (x - a) / (b - a),   a ≤ x ≤ b
                              (c - x) / (c - b),   b ≤ x ≤ c
                              0,                   c ≤ x                  (4.8)

   The parameters {a, b, c} (with a < b < c) determine the x coordinates of the three corners of the underlying triangular MF.

2. Trapezoidal MF - A trapezoidal MF is specified by four parameters {a, b, c, d}, as follows:

       trapezoid(x; a, b, c, d) = 0,                   x ≤ a
                                  (x - a) / (b - a),   a ≤ x ≤ b
                                  1,                   b ≤ x ≤ c
                                  (d - x) / (d - c),   c ≤ x ≤ d
                                  0,                   d ≤ x              (4.9)

   The parameters {a, b, c, d} (with a < b ≤ c < d) determine the x coordinates of the four corners of the underlying trapezoidal MF.

3. Gaussian MF - A Gaussian MF is specified by two parameters {c, σ}, as follows:

       gaussian(x; c, σ) = exp( -(1/2) ((x - c) / σ)^2 )                  (4.10)

   A Gaussian MF is determined completely by c and σ, where c represents the MF's centre and σ determines the MF's spread.
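As a simple illustration of eqs. (4.8) and (4.9), the triangular and trapezoidal MFs used later in this chapter can be written as the following small Python functions (a sketch only, mirroring the TRIMF/TRAPMF helpers referred to in Section 4.5.3.2):

def trimf(x, a, b, c):
    """Triangular MF of eq. (4.8): rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapmf(x, a, b, c, d):
    """Trapezoidal MF of eq. (4.9): plateau between b and c, zero outside a..d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

print(trimf(0.0, -0.01, 0, 0.01))      # 1.0 at the peak
print(trapmf(3.0, 0.8, 1, 5.5, 5.8))   # 1.0 on the plateau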

4.5.1.3 Linguistic Variables


The concept of linguistic variables was introduced by Zadeh [207] to provide a basis for approximate reasoning. A linguistic variable is defined as a variable whose values are words or sentences. For instance, age can be a linguistic variable if its values are linguistic, i.e., young, very young, old, very old, etc., rather than numerical, such as 20, 21, 23, 45, etc. Figure 4.36 illustrates the term set of age expressed by Gaussian MFs.

Figure 4.36: Fuzzy Membership Functions (MFs) for the term set age

4.5.1.4 Fuzzy IF-THEN Rules


Fuzzy rules and fuzzy reasoning are the backbone of fuzzy inference systems (FISs), which are the most important modeling tools based on fuzzy set theory. A fuzzy IF-THEN rule (fuzzy rule, fuzzy implication or fuzzy conditional statement) is expressed as follows:

    IF x is A THEN y is B        (4.11)

where A and B are linguistic variables or labels defined by fuzzy sets [205] characterized by appropriate membership functions. The expression 'x is A' is called the antecedent or premise, while 'y is B' is called the consequence or conclusion [206]. Some example fuzzy IF-THEN rules are given below:


If pressure is high, then volume is small

If velocity is high, then force = k x (velocity)^2

If speed is low AND the distance is small, then the force on the brake is small

4.5.1.5 Combining Fuzzy Sets


Fuzzy sets are combined in the application of fuzzy reasoning. The combination of fuzzy sets can be obtained using intersection (AND), union (OR) and complement (NOT) operations. The mathematical descriptions of these basic operations for fuzzy sets A and B in a universe X are presented below.

1. Subset - A fuzzy set A is contained in fuzzy set B (or, equivalently, A is a subset of B, or A is smaller than or equal to B) if and only if μ_A(x) ≤ μ_B(x) for all x in X. In symbols, this can be expressed as:

       A ⊆ B  if and only if  μ_A(x) ≤ μ_B(x)  for all x ∈ X        (4.12)

2. Union - The union or disjunction of two fuzzy sets A and B is a fuzzy set C, written as C = A ∪ B or C = A OR B, whose MF is related to those of A and B by the following expression:

       μ_C(x) = max( μ_A(x), μ_B(x) ) = μ_A(x) ∨ μ_B(x)  for all x ∈ X        (4.13)

   A more intuitive but equivalent definition of the union is the smallest fuzzy set containing both A and B. Alternatively, any fuzzy set that contains both A and B also contains A ∪ B.

3. Intersection - The intersection or conjunction of two fuzzy sets A and B is a fuzzy set C, written as C = A ∩ B or C = A AND B, whose MF is related to those of A and B by the following expression:

       μ_C(x) = min( μ_A(x), μ_B(x) ) = μ_A(x) ∧ μ_B(x)  for all x ∈ X        (4.14)

   As in the case of the union, the intersection of A and B is the largest fuzzy set which is contained in both A and B. This reduces to the ordinary intersection operation if both A and B are non-fuzzy.

4. Complement - The complement or negation of a fuzzy set A, denoted by ¬A (NOT A), is defined by the following expression:

       μ_{¬A}(x) = 1 - μ_A(x)  for all x ∈ X        (4.15)

These fuzzy set operations perform exactly as the corresponding operations for
ordinary sets, if the values of the MFs are restricted to either 0 or 1.

4.5.1.6 Fuzzy Inference System


Fuzzy inference systems (FISs) are also known as fuzzy-rule-based systems,
fuzzy models, fuzzy associative memories (FAMs) or fuzzy controllers when
used as controllers. A fuzzy inference system (FIS) is composed of five
functional blocks, as shown in Figure 4.37. The functions of the five blocks are
as follows:

1. A rule base containing a number of fuzzy if-then rules.


2. A database which defines the membership functions (MFs) of the fuzzy
sets used in the fuzzy rules.
3. A decision-making unit which performs the inference operation on the
rules.
4. A fuzzification interface which transforms the crisp inputs into degrees
of match with linguistic variables.
5. A defuzzification interface which transforms the fuzzy results of the
inference into a crisp output.



Figure 4.37: Flowchart of the general architecture of a FIS

In common practice, the rule base and the database in a FIS are jointly referred
to as the knowledge base, as shown in Figure 4.37. The steps of fuzzy
reasoning (operations upon fuzzy IF-THEN rules) performed by FISs are:

1. Input variables are compared with the MFs on the premise part to obtain
the membership values (or compatibility measures) of each linguistic
label. This step is also known as fuzzification.

2. The membership values on the premise part are combined through fuzzy
set operations such as: min, max or multiplication to get firing strength
(weight) of each rule.

3. The qualified consequent (either fuzzy or crisp) of each rule is generated


depending on the firing strength.

4. The qualified consequents are aggregated to produce a crisp output according to defined methods such as the centroid of area, bisector of area, mean of maximum, smallest of maximum and largest of maximum. This step is also known as defuzzification [208].

Several types of fuzzy reasoning [209, 210] have been proposed in the literature. Depending on the types of fuzzy reasoning and fuzzy IF-THEN rules employed, FISs can be classified into three major types, as follows:

Type 1 - In this type of FIS, the overall output is the weighted average of each rule's crisp output, induced by the rule's firing strength (the product or minimum of the degrees of match with the premise part) and the output MFs. The output membership functions used in this scheme must be monotonic functions [211].

Type 2 - In this type of FIS, the overall fuzzy output is derived by applying the max operation to the qualified fuzzy outputs (each of which is equal to the minimum of the firing strength and the output membership function of each rule). Various schemes have been proposed to choose the final crisp output based on the overall fuzzy output, such as the centroid of area, bisector of area, mean of maxima, maximum criterion, etc. [209, 210].

Type 3 - This type of FIS uses Takagi and Sugeno's fuzzy IF-THEN rules [212]. The output of each rule is a linear combination of the input variables plus a constant term, and the final output is the weighted average of each rule's output.

4.5.2 Preliminary Data Postprocessing


Useful parameters from the classification (SVC) results and the customer data (e-CIBS and HighRisk) are required in order to form the rules and MFs of the FIS. Preliminary data postprocessing involves correlating the classification results with the e-CIBS and HighRisk data and selecting the optimum parameters from the correlated data in order to formulate SQL statements for the selection of suspicious customers, as indicated in Figure 4.35. The following sections discuss in detail the processes implemented for preliminary data postprocessing.

4.5.2.1 Correlation of SVC Results with Data


The classification (SVC) results shown in Figure 4.34 are correlated with the
preprocessed e-CIBS data in Figure 4.3 and preprocessed HighRisk data in
Figure 4.19, in order to obtain the correlated data shown in Figure 4.38.

Figure 4.38: Correlation of the SVC results with the customer data

The SVC results and customer data are correlated by employing data mining using SQL techniques. As seen in Figure 4.38, the first four data columns, i.e., CustomerNo, CustomerName, CWR and Consumption, are retrieved from the preprocessed e-CIBS data in Figure 4.3. The data columns Predicted, Pclass1 and Pclass2 are retrieved from the SVC output results shown in Figure 4.34, and the last two data columns in Figure 4.38, i.e., DetectCount and LastDetectDate, are retrieved from the preprocessed HighRisk data in Figure 4.19. The SVC results, e-CIBS data and HighRisk data are correlated based on the CustomerNo, which is used as the key for matching the data together.
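A minimal sketch of this correlation (essentially a key-based join on CustomerNo) is shown below in plain Python; in the study itself it was performed with SQL over the Access tables, and the dictionary fields here are illustrative only.

def correlate(ecibs_rows, svc_rows, highrisk_rows):
    """Join e-CIBS, SVC result and HighRisk records on the CustomerNo key."""
    svc = {r["CustomerNo"]: r for r in svc_rows}
    risk = {r["CustomerNo"]: r for r in highrisk_rows}
    correlated = []
    for row in ecibs_rows:
        cust = row["CustomerNo"]
        merged = dict(row)
        merged.update(svc.get(cust, {}))
        # customers never caught before simply have no HighRisk entry
        merged.update(risk.get(cust, {"DetectCount": 0, "LastDetectDate": None}))
        correlated.append(merged)
    return correlated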

4.5.2.2 Parameter Selection


In order to retrieve useful information from the correlated data in Figure 4.38, useful parameters were determined by inspecting the customer load profiles of previously identified fraud cases with the help of TNBD SEAL team experts. From the correlated data in Figure 4.38, ten parameters were selected in order to construct SQL statements for the purpose of customer filtering and selection. Table 4.9 lists the 10 parameters used for the selection of suspicious customers. All data parameters in Table 4.9 are indicated in Figure 4.39 for the correlated data in Figure 4.38.

4.5.2.3 Customer Filtering and Selection Using SQL


The ten parameters in Table 4.9 were further studied and analyzed in order to develop SQL statements for the selection of suspicious customers from the correlated data in Figure 4.38. By inspecting the load profiles of fraud customers previously identified by TNBD, and with additional knowledge from TNBD SEAL team experts and the NTL Group of TNB, SQL statements for the selection of suspicious customers were developed. The SQL statements were established based on the characteristics distinguishing the load profiles of normal (good) customers in Figure 4.21 from the load profiles of fraud customers in Figure 4.20. Table 4.10 lists the SQL statements developed for the three different levels of customer selection.


Table 4.9: Parameters used for selection of suspicious customers

No.  Parameter    Description
1    Class        Class label of the SVC results.
2    L23          Unnormalized 23rd daily average kWh consumption in the load
                  profile of a customer.
3    L24          Unnormalized 24th daily average kWh consumption in the load
                  profile of a customer.
4    MinkWh       Minimum kWh consumption within the load profile of a customer.
5    MaxkWh       Maximum kWh consumption within the load profile of a customer.
6    DiffkWh      Difference between the maximum and minimum consumption
                  (MaxkWh - MinkWh) in the load profile of a customer.
7    DetectCount  Number of times a customer has previously been detected for
                  fraud by TNBD.
8    Probability  Probability estimate of the fraud class (Class 1) from the SVC
                  results.
9    TOE          Theft of Electricity (TOE) tag indicating if the customer has
                  previously been detected for fraud by TNBD.
10   HRC          High Risk Customer (HRC) tag indicating if the customer has a
                  high possibility of committing fraud activities.

The three customer selection levels indicated in Table 4.10 are termed Fraud Detection Levels (FDLs): Low, Moderate and High. The FDLs utilize the data parameters from Table 4.9 in order to match customer load consumption patterns with previously analyzed fraud patterns, where the results are obtained in terms of the matching percentage between the load consumption patterns. The main idea behind the SQL statements in Table 4.10 is to filter out unwanted customers using logical decisions based on the 10 parameters identified from the correlated data. The result of this filtering is a list of suspicious customers with a high possibility of fraud activities.


Figure 4.39: Parameters selected from the correlated data

In addition to detecting and identifying fraud activities, the SQL statements in Table 4.10 also identify abnormalities in load consumption patterns. Abnormalities or irregularities are defined by TNBD as sudden or abrupt drops in load consumption patterns similar to fraud activities; however, they are caused by other reasons, which are as follows:

1. Replaced meters
2. Abandoned houses or premises
3. Change of tenants or residents, and
4. Faulty meter wiring

Since abnormalities have consumption patterns similar to the fraud consumption patterns shown in Figure 4.20, abnormalities also contribute a small percentage to NTLs. Therefore, the detection and identification of abnormalities and other irregularities also plays a crucial role in NTL reduction.

Table 4.10: SQL statements implemented for suspicious customer selection

Level: Low (matches 55% to 65% of the fraud pattern)

  SELECT Customers FROM Result WHERE
  ((Class=1 AND L23<5.5 AND L24<5.5 AND L23>1 AND L24>1 AND MaxkWh>7 AND
    MaxkWh<26 AND MinkWh>1.2 AND MinkWh<5 AND DiffkWh>6) AND
   ((DetectCount>=1 AND Probability>0.6) OR (TOE=1 AND Probability>0.6) OR
    (HRC=1 AND Probability>0.6) OR (DetectCount=0 AND Probability>=0.6))
  OR
  ((Class=1 AND L23>10 AND L24>10 AND MaxkWh>75 AND MinkWh>8 AND
    MinkWh<(0.2*MaxkWh)) AND
   ((DetectCount>=1 AND Probability>0.55) OR (TOE=1 AND Probability>0.55) OR
    (HRC=1 AND Probability>0.55) OR (DetectCount=0 AND Probability>=0.65))))

Level: Moderate (matches 65% to 75% of the fraud pattern)

  SELECT Customers FROM Result WHERE
  ((Class=1 AND L23<4 AND L24<4 AND L23>1 AND L24>1 AND MaxkWh>7 AND
    MaxkWh<18 AND MinkWh>1.5 AND MinkWh<4 AND DiffkWh>6) AND
   ((DetectCount>=1 AND Probability>0.6) OR (TOE=1 AND Probability>0.6) OR
    (HRC=1 AND Probability>0.6) OR (DetectCount=0 AND Probability>=0.75))
  OR
  ((Class=1 AND L23>10 AND L24>10 AND MaxkWh>75 AND MinkWh>8 AND
    MinkWh<(0.2*MaxkWh)) AND
   ((DetectCount>=1 AND Probability>0.55) OR (TOE=1 AND Probability>0.55) OR
    (HRC=1 AND Probability>0.55) OR (DetectCount=0 AND Probability>=0.68))))

Level: High (matches 75% to 100% of the fraud pattern)

  SELECT Customers FROM Result WHERE
  (Class=1 AND L23<4 AND L24<4 AND L23>1 AND L24>1 AND DiffkWh>6.5 AND
   MaxkWh>7 AND MaxkWh<14 AND MinkWh>1.5 AND MinkWh<3.5) AND
  ((DetectCount>=1 AND Probability>=0.62) OR (TOE=1 AND Probability>=0.62) OR
   (HRC=1 AND Probability>=0.62) OR (DetectCount=0 AND Probability>=0.8))

4.5.3 Suspicious Customer Selection Using FIS


The FIS in this research study is implemented as a data postprocessing scheme, as indicated in Figure 4.35. Selection of suspicious customers using the proposed FIS involves: (i) transforming the SQL statements into fuzzy rules, (ii) formulating MFs based on the parameter values used in the SQL statements, and (iii) implementing the FIS.

4.5.3.1 Transformation of SQL into Fuzzy Rules


The SQL statements in Table 4.10 are transformed into fuzzy rules using the fuzzy set operations defined in eqs. (4.13) and (4.14). During the transformation, the fuzzy set operation max is used for union (OR) operations, while the fuzzy set operation min is used for intersection (AND) operations. Table 4.11 shows the SQL statements transformed into fuzzy rules; the Low and Moderate level fuzzy rules are identical in structure, the only difference being the parameter values used in the MFs.

Table 4.11: Fuzzy rules transformed from the SQL statements in Table 4.10

Level      Fuzzy Rule
Low        MAX{MAX[MIN(L23, L24, MaxkWh, MinkWh, DiffkWh),
           MIN(DetectCount2, Probability1), MIN(TOE, Probability1),
           MIN(HRC, Probability1), MIN(DetectCount1, Probability2)],
           MAX[MIN(L23, L24, MaxkWh, MinkWh), MIN(DetectCount, Probability3),
           MIN(TOE, Probability3), MIN(HRC, Probability3),
           MIN(DetectCount, Probability4)]}

Moderate   MAX{MAX[MIN(L23, L24, MaxkWh, MinkWh, DiffkWh),
           MIN(DetectCount2, Probability1), MIN(TOE, Probability1),
           MIN(HRC, Probability1), MIN(DetectCount1, Probability2)],
           MAX[MIN(L23, L24, MaxkWh, MinkWh), MIN(DetectCount, Probability3),
           MIN(TOE, Probability3), MIN(HRC, Probability3),
           MIN(DetectCount, Probability4)]}

High       MAX{MIN(L23, L24, MaxkWh, MinkWh, DiffkWh),
           MIN(DetectCount2, Probability1), MIN(TOE, Probability1),
           MIN(HRC, Probability1), MIN(DetectCount1, Probability2)}


4.5.3.2 Membership Function Formation


The fuzzy rules transformed from the SQL statements in Table 4.11 require MFs in order to implement the FIS. The triangular and trapezoidal MFs defined in eqs. (4.8) and (4.9) are used to implement the fuzzy rules in Table 4.11. Figure 4.40 shows the fuzzy MFs used to implement the 'Low' level fuzzy rule in Table 4.11. The parameter values used in the MFs are taken exclusively from the SQL statements in Table 4.10.


Figure 4.40: Fuzzy MFs used in order to implement the Low level fuzzy rule

In Figure 4.40(a), (b), (c) and (d), two trapezoidal MFs are used in each panel to formulate the different ranges of the parameters L23, L24, MinkWh and MaxkWh. In Figure 4.40(e), only one MF is used to represent the DiffkWh parameter. The probability MFs are shown in Figure 4.40(f), where four trapezoidal MFs are used to represent the four different probabilities (P1, P2, P3 and P4) specified in the 'Low' level SQL statement in Table 4.10. In Figure 4.40(g), DetectCount1 uses a triangular MF; since this DetectCount condition can only take the value 0, a triangular MF with the parameters [-0.01, 0, 0.01] is used. Similarly, in Figure 4.40(h) and (i), triangular MFs are used to represent TOE and HRC; since TOE and HRC can only take the value 1, triangular MFs with the parameters [0.99, 1, 1.01] are used. The MFs for the other levels, i.e., the Moderate and High fuzzy rules in Table 4.11, are formulated using exactly the same procedure as for the 'Low' level.

The following code describes the formation of the MFs using the parameter values from the SQL statements in Table 4.10 for the 'Low' level. The code is in a syntax similar to Microsoft Visual Basic 6.0 and MATLAB programming.

Membership Function Parameter Values


%L23 MFs
L23MF1 = TRAPMF(L23, 0.8, 1, 5.5, 5.8)
L23MF2 = TRAPMF(L23, 9.8, 10, 100000, 1000000)
%L24 MFs
L24MF1 = TRAPMF(L24, 0.8, 1, 5.5, 5.8)
L24MF2 = TRAPMF(L24, 9.8, 10, 100000, 1000000)
%MinkWh MFs
MinkWhMF1 = TRAPMF(MinkWh, 1, 1.2, 5, 5.2)
MinkWhMF2 = TRAPMF(MinkWh, 7.6, 8, (0.2*MaxkWh), (0.22*MaxkWh))
%MaxkWh MFs
MaxkWhMF1 = TRAPMF(MaxkWh, 6.8, 7, 26, 28)
MaxkWhMF2 = TRAPMF(MaxkWh, 73, 75, 100000, 1000000)
%DiffkWh MF
DiffkWhMF = TRAPMF(DiffkWh, 5.5, 6, 100000, 1000000)
%Probability MFs
Probability1MF = TRAPMF(Probability1, 0.61, 0.62, 1, 1)
Probability2MF = TRAPMF(Probability2, 0.64, 0.65, 1, 1)
Probability3MF = TRAPMF(Probability1, 0.61, 0.63, 1, 1)
Probability4MF = TRAPMF(Probability2, 0.67, 0.66, 1, 1)

%DetectCount MFs
DetectCountMF1 = TRIMF(DetectCount, -0.01, 0, 0.01)
DetectCountMF2 = TRAPMF(DetectCount, 1, 1, 1, 1000)
%TOE MF
TOEMF = TRIMF(TOE, 0.99, 1, 1.01)
%HRC MF
HRCMF = TRIMF(HRC, 0.99, 1, 1.01)

In the code above, TRIMF refers to the triangular MF defined in eq. (4.8), which
requires three input values and TRAPMF refers to the trapezoidal MF defined in
eq. (4.9), which requires four input values.
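For illustration only, the following Python sketch re-implements the triangular and trapezoidal MFs in generic form; the function names tri_mf and trap_mf and the example calls are assumptions and are not part of the AFDS code.

# Illustrative Python sketch (not the thesis' VB6/MATLAB code) of the
# triangular and trapezoidal MFs referred to as TRIMF and TRAPMF above.
def tri_mf(x, a, b, c):
    """Triangular MF: rises from a to the peak b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trap_mf(x, a, b, c, d):
    """Trapezoidal MF: rises over a..b, flat over b..c, falls over c..d."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

if __name__ == "__main__":
    # Membership of DetectCount = 0 in the "zero detections" set
    print(tri_mf(0.0, -0.01, 0.0, 0.01))            # -> 1.0
    # Membership of DiffkWh = 7 in the "large difference" set
    print(trap_mf(7.0, 5.5, 6.0, 100000, 1000000))  # -> 1.0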


4.5.3.3 FIS Implementation


The FIS determines whether customers are fraudulent or normal (good), based on the
parameter values in the correlated data in Figure 4.39. The FIS is implemented
using the fuzzy rules defined in Table 4.11 along with the respective MFs for the
different levels of filtering, i.e. the Low, Moderate, and High FPDLs.

The FIS is implemented in such a way that all 10 parameter values in Figure
4.39 for each customer are evaluated one-by-one using the fuzzy rule selected
from Table 4.11. After evaluating the selected fuzzy rule, the output (crisp) value
of the FIS for each customer, known as the final value, lies in the range 0 to 1.
Based on simple IF-ELSE logic on the final value obtained, customers are marked
as fraud or normal. The pseudocode of this IF-ELSE logic, i.e. the FIS algorithm
for detecting fraud customers from the correlated data, is as follows:

FIS Algorithm for Detecting Fraud Customers


For Each Customer in Correlated_Data
IF Final_Value > 0.5
Customer = Fraud
%Customer added into the "List of Suspicious Customers"
ELSE
Customer = Normal
END
Next
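To make the min/max rule evaluation and the 0.5 threshold concrete, the following Python sketch (an illustration only, not the AFDS implementation) evaluates the High level fuzzy rule from Table 4.11 on one customer; the membership degrees used in the example are hypothetical.

# Minimal sketch of how a fuzzy rule from Table 4.11 is evaluated: each operand
# is a membership degree in [0, 1], combined with min (AND) and max (OR), and
# the resulting final value is thresholded at 0.5.
def evaluate_high_rule(mu):
    """mu: dict mapping parameter names to membership degrees in [0, 1]."""
    return max(
        min(mu["L23"], mu["L24"], mu["MaxkWh"], mu["MinkWh"], mu["DiffkWh"]),
        min(mu["DetectCount2"], mu["Probability1"]),
        min(mu["TOE"], mu["Probability1"]),
        min(mu["HRC"], mu["Probability1"]),
        min(mu["DetectCount1"], mu["Probability2"]),
    )

def classify(mu, threshold=0.5):
    final_value = evaluate_high_rule(mu)
    return "Fraud" if final_value > threshold else "Normal"

if __name__ == "__main__":
    # Hypothetical membership degrees for one customer
    mu = {"L23": 0.9, "L24": 0.8, "MaxkWh": 0.7, "MinkWh": 0.6, "DiffkWh": 0.9,
          "DetectCount1": 0.0, "DetectCount2": 1.0, "Probability1": 0.75,
          "Probability2": 0.3, "TOE": 1.0, "HRC": 1.0}
    print(classify(mu))   # -> Fraud (final value 0.75 > 0.5)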

The fraud customers (customers with a final value > 0.5) are accumulated into
the List of Suspicious Customers, which is the result/output of the fraud
detection system, as indicated in Figure 4.2. The list of suspicious customers is
used by TNBD SEAL teams in order to carry out onsite inspections of customer
installations for the detection of fraud activities. The customers shortlisted by
the fraud detection system will reduce TNBD's operational cost in monitoring
NTL activities and will also increase their inspection hitrate by identifying black
areas, i.e. areas and regions where fraud customers are most likely present.


4.6 Summary
This chapter provided the methodology proposed for the fraud detection
framework and implemented the associated algorithms used for NTL
identification and detection. In the first sub chapter, the general project and
research methodology was introduced. Three major stages were involved in the
development of the intelligent fraud detection system: (i) data preprocessing,
(ii) classification engine development, and (iii) data postprocessing. The data
preprocessing sub chapter illustrated the data mining, i.e. SQL, techniques used
for preprocessing the raw customer information and billing data for feature
selection and extraction. The classification engine development sub chapter
illustrated the SVC training, parameter optimization, development of the SVC
classifier and the SVC testing and validation engine. The last sub chapter, data
postprocessing, presented the development of the Fuzzy Inference System (FIS)
and the creation of fuzzy rules and MFs for the selection of suspicious
customers.


CHAPTER 5

EXPERIMENTAL RESULTS

5.0 Overview
This chapter is composed of two main sub chapters. Sub chapter 1 presents the
Graphical User Interface (GUI) developed for the fraud detection system. The
GUI of the software developed generates the detection report of the list of
suspicious customers and the average daily consumption report. In sub chapter
2, model validation results are presented based on: (i) the classifier, (ii) pilot
testing, and (iii) comparison of the proposed model with other AI techniques.
Model validation results obtained are discussed and evaluated. The contribution
of the FIS for hitrate improvement is also discussed and the computational
intelligence scheme of SVC and FIS is compared to standard SVC. Finally, at the
end of sub chapter 2, a comparative study of the proposed SVC and FIS model is
performed with two AI based classification techniques: (i) Multi-Layer
Backpropagation Neural Network (ML-BPNN), and (ii) Online-Sequential
Extreme Learning Machine (OS-ELM), in order to evaluate the efficiency of the
proposed fraud detection system.

5.1 Graphical User Interface


A user-friendly Graphical User Interface (GUI) of the proposed fraud detection
system, Abnormality and Fraud Detection System (AFDS) was developed in
order for the NTL Group of TNB and TNBD SEAL teams to utilize the software
for detecting fraud customers. The GUI is designed to simplify and aid user
adoption of the proposed system. Upon being launched, the GUI application
(AFDS) requires login authentication by the user in order to access the
software. The authentication screen of the AFDS is
shown in Figure 5.1.

Figure 5.1: Authentication screen of the AFDS software

After having been authenticated, the AFDS loads into the memory of the
computer, during which the welcome screen of the software appears for a few
seconds. The welcome screen of the AFDS is shown in Figure 5.2.

Figure 5.2: Welcome screen of the AFDS software



5.1.1 Main Screen


After the welcome screen disappears, the main screen of the AFDS interface
appears as shown in Figure 5.3. The AFDS software has two control interfaces,
i.e., a tabbed window with two tab selections. The first control tab Detect
Abnormalities and Fraud of the AFDS software as shown in Figure 5.3, selects
the input data files, executes and stops the detection, and displays the detection
progress of the software.

[Figure 5.3 callouts: button to select the e-CIBS data file, button to select the HighRisk data file, detection start/stop toggle button, textboxes showing the locations of the e-CIBS and HighRisk data files, Fraud Pattern Detection Level (FPDL) selection, overall detection progress, progress of the current process, and the list of processes executed.]

Figure 5.3: Main screen of the AFDS software


5.1.1.1 Selecting Data Files


Two data files need to be selected and input into the AFDS in order for the
software to run. The two data files are the: (i) e-CIBS data, and (ii) HighRisk
data as shown in Figure 4.3 and Figure 4.4 respectively, which should be in the
Microsoft Access Database format. To select an e-CIBS data file, the user should
click on the Select e-CIBS File button as shown in Figure 5.4, which opens a file
browser dialog as shown in Figure 5.5. In the file browser dialog, the e-CIBS
data file is selected (highlighted) by browsing the local drive, after which the
Open button is clicked as indicated in Figure 5.5.

Figure 5.4: Selecting the e-CIBS data file in the AFDS software

Figure 5.5: File browser for data file selection in the AFDS software

After the file browser closes, the location of the selected file is displayed in the
textbox beside the file selection button, as shown in Figure 5.6. In order to
select the HighRisk data file, the exact same procedure is applied.


Figure 5.6: Location of the selected file in the AFDS software

5.1.1.2 Fraud Pattern Detection Level


The Fraud Pattern Detection Level (FPDL) is the only parameter which needs to
be selected before running the detection in the AFDS software. The FPDL
control is a drop-down list for the user to select a filtering level as shown in
Figure 5.7. The FPDL is based on three filtering levels: (i) Low, (ii) Moderate,
and (iii) High as indicated in Table 4.10 previously.


Figure 5.7: Selection of Fraud Pattern Detection Level (FPDL) in the AFDS

The three FPDLs indicated in Figure 5.7 and in Table 4.10 previously, used for
fraud pattern detection, are described briefly as follows:

1. Low Level - Introduces low level filtering, where a large number of customer
suspects are shortlisted by the AFDS software. However, due to the large
number of suspects, the fraud detection hitrate is generally low. This filtering
level is suitable for detecting abnormalities and other irregularities.

2. Moderate Level - Introduces moderate level filtering, where a sufficient
number of customers are shortlisted by the AFDS software, i.e., a reasonable
number of suspicious customers are shortlisted for onsite inspection. Using this
filtering level, the fraud detection hitrate is higher than with the Low level.

3. High Level - Introduces high level filtering, where a small number of
suspicious customers are shortlisted by the AFDS software. Using this FPDL, the
fraud detection hitrate will generally be the highest among all three FPDL
levels; however, this filtering level will not shortlist all possible fraud suspects,
i.e. some suspicious customers might be missed.

5.1.1.3 Execute Detection


After all data files and the FPDL have been selected by the user, clicking on the
Start Detection button as indicated in Figure 5.8, will start to execute the
detection in the AFDS software, as shown in Figure 5.9. The start detection
button is a start/stop detection toggle button, i.e., after the detection is started,
the button will display Stop Detection as indicated by Figure 5.9. The stop
button will cancel the current detection in progress. If detection is cancelled by
the user as shown in Figure 5.10, the data files and the FPDL can be reselected
again by the user and detection can be restarted.


Figure 5.8: Starting detection in the AFDS software

Figure 5.9: AFDS running for detecting suspicious customers

5.1.1.4 Detection Complete


Detection is only complete once the overall progress bar in Figure 5.3 indicates
100% and the message box in Figure 5.11 pops up on the screen. After clicking
the OK button, the message box disappears leaving the main screen of the
AFDS idle, as shown in Figure 5.12.

Figure 5.10: The start/stop detection toggle button in the AFDS software

Figure 5.11: Message box confirming detection is complete

5.1.2 Second Screen


After fraud detection is complete, the second control tab in the AFDS software,
View Suspected Customers, is selected, which switches to the second screen of
the AFDS, as shown in Figure 5.13. The second control tab displays the detection
summary and the list of suspicious customers, and also saves the detection
report and the daily average consumption report into a Microsoft Office Excel
spreadsheet.


[Figure 5.12 callouts: total time taken for program execution, code of the station detected, number of suspected customers found out of the total detected and the FPDL used, and the date and time.]

Figure 5.12: Main screen of the AFDS software after detection is complete

5.1.2.1 Suspected Customer List


After detection is complete, the View Suspected Customers control tab displays
the detection result: (i) the detection summary, and (ii) the list of the suspected
customers, as shown in Figure 5.14. The detection summary and the list of
suspected customers are saved in a Microsoft Office Excel report, which can be
viewed at the convenience of the user.

5.1.2.2 Save Detection Report


The detection report can be saved by clicking the Save Report button, as
shown in Figure 5.15. After the save report button is clicked, a file browser
dialog opens as shown in Figure 5.16. The name of the file to save as the
detection report is typed in the File name textbox as shown in Figure 5.16. The
location on the computer where to save the detection report can be browsed
through the file browser. After the file name of the detection report to be saved
is specified, the Save button is clicked in the file browser dialog to save the
detection report in the Microsoft Office Excel format, as shown in Figure 5.16.

[Figure 5.13 callouts: button to save the detection report, button to refresh and clear the program for the next use, textbox showing the location where the detection report is saved, and the display of the detection report summary and the list of suspicious customers.]

Figure 5.13: Second screen of the AFDS software

After the save button is clicked, within a few seconds, a message box as shown
in Figure 5.17 appears, which confirms the detection report has been saved.
After the message box disappears by clicking the OK button, the location of the
saved detection report will be displayed in the textbox beside save report
button, as shown in Figure 5.18.


Figure 5.14: Viewing the list of suspicious customers in the AFDS software

Figure 5.15: Saving the detection report in the AFDS software



Figure 5.16: File browser dialog to save the detection report in the AFDS

Figure 5.17: Message box confirming the detection report is saved


Figure 5.18: Location of saved detection report in the second screen of the AFDS

5.1.3 Detection Result


The report generated by the AFDS software is in Microsoft Office Excel and
contains two spreadsheets (reports), which are: (i) the detection report, and (ii)
the average daily consumption report. The following sections will briefly discuss
the two reports generated by the AFDS software.

5.1.3.1 Detection Report


In the Microsoft Office Excel file saved by the AFDS software, the first
spreadsheet, Detection Report as shown in Figure 5.19, shortlists the
suspicious customers detected by the AFDS software.

Figure 5.19: Sample detection report in Microsoft Office Excel

A sample detection report for 150 customers tested from the KL Barat station is
shown in Figure 5.20, where Pattern indicates the detected consumption
pattern of the customers with respect to the three FPDLs. The ReadingUnit,
CustomerName, TOE and HRC fields in the detection report are retrieved from
the preprocessed e-CIBS data and the DetectCount and LastDetectDate fields are
retrieved from the preprocessed HighRisk data.

Figure 5.20: Sample detection report for customers tested in KL Barat station

5.1.3.2 Average Daily Consumption Report


In the Microsoft Office Excel file saved by the AFDS, the second spreadsheet,
Average Daily Consumption Report as shown in Figure 5.21, indicates the
average daily consumption of the suspicious customers (in the detection report)
by displaying the load profiles of the customers. By analyzing load profiles of the
suspicious customers detected by the AFDS, TNBD SEAL teams are able to carry
out effective onsite inspections for the detection of fraud activities. A sample
average daily consumption report for 150 customers tested from the KL Barat
station is shown in Figures 5.22 and 5.23.


Figure 5.21: Average daily consumption report in Microsoft Office Excel

Figure 5.22: Average daily consumption report for the KL Barat station Page 1

Figure 5.23: Average daily consumption report for the KL Barat station Page 2

Load profiles of the suspicious customers in the average daily consumption


report are listed based on rows in the spreadsheet, as indicated by Figure 5.22
and 5.23. In the average daily consumption report, customers can be highlighted
by the user and line charts can be plotted graphically in Microsoft Office Excel in
order to view the load profiles of the suspected customers, as shown in Figure
5.24.

5.1.4 AFDS Operation Manual


An operation manual for the AFDS software was created in order to assist users
with the installation, setup and software usage. The AFDS operation manual can
be viewed by pressing CTRL+H (shortcut key) when the AFDS software is
loaded into the computer memory or by going to the File Menu bar on top of
the AFDS software and browsing into the sub menu: Help > Operation Manual,
as shown in Figure 5.25. The cover page of the AFDS Software Installation and
Operation Manual is shown in Figure 5.26.

Figure 5.24: Inspecting load profiles of suspected customers from the daily
average consumption report

Figure 5.25: Opening the AFDS software installation and operation manual


Figure 5.26: Cover page of the AFDS software installation and operation manual

5.2 Model Validation


The fraud detection system presented in this research study is tested and
validated using a Dell PowerEdge 840 workstation with Windows XP, a 2.40
GHz Intel Quad-core Xeon X3320 Processor with 4 GB of RAM. The time elapsed
for obtaining detection results from the testing data is approximately 1.8
seconds per customer, which varies based on the configuration of the computer
used and the number of customers inspected. The following sections will further
discuss and evaluate: the experimental results obtained for SVC training, SVC
testing and validation, pilot testing, contribution of the FIS for fraud detection
hitrate improvement and comparison of the computational intelligence scheme
of SVC and FIS with other AI based classification techniques.

5.2.1 Validation of Classifier

SVC training aims to obtain the best SVC parameters (C, γ) for building the
classifier model. The developed classifier is evaluated using testing and
validation data, i.e. new and unseen data that has not been used for training. The
accuracy of the classifier is evaluated using Cross-Validation (CV). The reason
for using CV is to ensure that the SVC does not overfit the training data.

The Grid Search method proposed by Hsu et al. in [193] was used for SVC
parameter tuning, where exponentially growing sequences of the parameters
(C, γ) were used to identify the SVC parameters giving the best CV accuracy for
the 383 classifier samples. Experimentally, 10-fold CV was used as the measure
of the training accuracy, where 67% of the 383 samples were used for training
and the remaining 33% were used for testing and validation. The 67% and 33%
training and validation data ratios used are indicated by Mattfeldt et al. in [213]
in order to achieve a satisfactory level of CV results. For each parameter set
tested, the average 10-fold CV accuracy over 100 trials was computed, where on
every trial the training and testing data were selected in a random order.

Experimentally, by iterating different (C, γ) parameter combinations over 100
10-fold CV trials, the best SVC parameters were found to be C = 1 and γ = 0.92,
for the highest 10-fold CV accuracy of 93.71%. This CV accuracy is considered
high, because approximately 94% of the tested samples are classified correctly.
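For illustration, a comparable grid search could be sketched in Python with scikit-learn as below; scikit-learn, the exact parameter ranges, and the helper name grid_search_svc are assumptions and were not necessarily the tools used in this study.

# Illustrative sketch of RBF-SVC parameter selection by grid search with
# 10-fold cross-validation, in the spirit of Hsu et al.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

def grid_search_svc(X, y):
    # Exponentially growing sequences of C and gamma
    param_grid = {
        "C": 2.0 ** np.arange(-5, 16, 2),
        "gamma": 2.0 ** np.arange(-15, 4, 2),
    }
    # Hold out 33% of the samples for final testing/validation
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=0, stratify=y)
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X_train, y_train)
    print("best parameters:", search.best_params_)
    print("best 10-fold CV accuracy:", search.best_score_)
    print("held-out accuracy:", search.best_estimator_.score(X_test, y_test))
    return search.best_estimator_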

5.2.1.1 Discussion of Training Results


The highest 10-fold CV accuracy for SVC training is 93.71%, for the parameters
C = 1 and γ = 0.92, as indicated in section 5.2.1. The parameter C is an SVC
hyperparameter that defines the trade-off between the training error and model
complexity. In the dual Lagrangian formulation, the parameter C defines the
upper bound of the Lagrange multipliers, 0 ≤ αᵢ ≤ C; hence, it defines the
maximal influence a sample can exert on the solution.

For the trained classifier in Figure 4.27, any SVC parameter C ≥ 1 does not
affect the training accuracy of the SVC model, while C < 1 provides a
significantly lower training accuracy. The reason why C ≥ 1 does not affect the
training accuracy of the classifier is that there are no bounded support vectors
(BSVs) in the classifier, i.e. no Lagrange multiplier reaches the bound αᵢ = C. For
this reason, the parameter is selected as C = 1, which indicates that the training
data is less noisy. Therefore, by using a lower value of the parameter C, the
results of the classification mapping are smoother with a lower noise
consideration.

The parameter γ is the RBF kernel parameter used for SVC, which controls the
width of the RBF (Gaussian) kernel. Gamma, γ, is related to σ (sigma) by the
following expression:

γ = (2σ)^(-1/2)      (5.1)

where σ is the variance of the resulting Gaussian hypersphere.

For the optimal parameter γ = 0.92 used for training the classifier, the value of
σ is calculated using eq. (5.1) as σ = 1/(2γ²) = 0.5907. The value of σ for the
trained classifier is acceptable, since any value of σ below 0.01 is considered
small and any value of σ above 100 is considered large. As σ acts as an
important hyperparameter during SVC modeling, small values of σ lead the
model close to overfitting the training data, while large values of σ tend to
over-smooth the training data. From the statistical learning theory point of
view, small σ values lead to a higher VC-dimension, meaning that too many
features are used for modeling, which leads to overfitting, while large σ values
lead to a lower VC-dimension, meaning that too few features are used to model
the classifier. Thus, the value of σ found is acceptable for modeling the
classifier in relation to the parameter γ. However, it is hard to justify the
optimal value of γ that best suits the problem.

5.2.2 Validation of Classification Results


The classification accuracy of the testing data is a gauge to evaluate the ability of
the fraud detection system in order to detect and identify fraud customers. The
testing data used to evaluate the efficiency of the proposed fraud detection
system (classifier) is taken exclusively from TNBD's customer database for
different cities within peninsular Malaysia. The following sections further
discuss and present the experimental results obtained from: SVC testing and
validation, pilot testing, the contribution of the FIS to hitrate improvement, and
a comparison of the computational intelligence scheme of SVC and FIS with
other AI based classification techniques.

5.2.2.1 Model Testing and Validation Results


The proposed fraud detection computational intelligence scheme of SVC and FIS
was tested and evaluated using customer data from three cities within the state
of Kelantan in Malaysia: (i) Kota Bharu, (ii) Gua Musang, and (iii) Kuala Krai as
indicated in Table 4.1. The proposed framework used for testing and validation
of the classifier, i.e., the SVC testing engine used for classification of fraud and
normal customers, is shown in Figure 4.29 previously.


Testing and validation results obtained from the SVC and FIS computational
intelligence scheme for the testing data in Table 4.1 are tabulated in Table 5.1.
As seen from Table 5.1, all customers in the three TNBD stations are tested. The
training accuracy of the SVC in Table 5.1 is obtained from the expression
defined in eq. (4.5), and the inspection hitrate¹⁵ is obtained from TNBD's
feedback for manual onsite inspection of the customers shortlisted by the fraud
detection system.

Table 5.1: Model testing and validation results for the fraud detection system

TNBD Station | No. of Customers Tested | SVC Training Accuracy (Memorization) | Inspection Hitrate
Kota Bharu   | 76,595 | 81.66% | 42.56%
Kuala Krai   | 18,880 | 73.56% | 38.07%
Gua Musang   | 13,045 | 79.27% | 41.39%

As indicated in Table 5.1, for all three cities in the Kelantan state of Malaysia
an average training accuracy of 78.16% and an average inspection hitrate of
40.67% is achieved. As an example, an inspection hitrate of 40% means that if
the AFDS shortlists 100 suspicious customers, the TNBD SEAL teams will
perform onsite inspections for all 100 shortlisted suspicious customers. If 40 of
the 100 inspected customers are found to be confirmed fraud cases by the TNBD
SEAL teams, then the inspection hitrate is 40%.

In addition to the model testing and validation conducted in the state of
Kelantan, the NTL Group of TNB implemented the AFDS software for the
purpose of model testing and validation in the city of Bangi in the Selangor state
of Malaysia. The AFDS results indicated that 105 suspicious customers were
shortlisted from the entire city of Bangi, after which TNBD SEAL teams
performed onsite inspections for all of the 105 shortlisted customers. Based on
the SEAL teams' manual inspections, the results revealed that 43 of the 105
customers were confirmed fraud cases, resulting in an inspection hitrate of
40.95%, which was inclusive of both fraud activities and abnormalities (as
mentioned in section 4.5.2.3). Therefore, the model testing and validation
results for the three cities in the state of Kelantan and the city of Bangi in the
state of Selangor can be combined to obtain an average inspection hitrate of
40.75%.

¹⁵ Inspection hitrate is the measure of accuracy, in percentage, of the number of customers
identified as fraud by TNBD SEAL teams (through manual onsite inspection) out of the total
number of customers shortlisted as suspicious by the fraud detection system.

The 40% inspection hitrate obtained from the proposed fraud detection system,
as compared to TNBD's current inspection hitrate of 3% to 5%, is a major
improvement in terms of the fraud detection rate. Thus, on average, the fraud
detection system identifies fraud customers at a rate 35-37 percentage points
higher than the TNBD SEAL teams, and it has also shown that it can imitate the
capability of the SEAL teams without physically inspecting the electricity
meters. Therefore, it is feasible to say that the proposed fraud detection system
is better in terms of the inspection hitrate as compared to the current actions
taken by TNBD.

5.2.2.1.1 Pilot Testing


The proposed fraud detection model was implemented for pilot testing with the
KL Barat station data. Since the SVC uses training samples from the KL Barat
station data as indicated in Table 4.1 previously, for pilot testing, the samples
used for model testing and validation were also from the KL Barat station, but
were different from the 383 training samples.

Since the total number of customers remaining in the KL Barat station data
after customer filtering and selection was approximately 186,900, as indicated
in section 4.3.1.1 previously, 10 testing and validation trials, each of 100,000
customers, were implemented. On each trial customers were selected in a
random order and the results obtained from all 10 trials were averaged. The
results indicated that the average training accuracy of the SVC obtained was
87.19% and the average inspection hitrate achieved was 48.27%.

The pilot testing results for the KL Barat station data indicate that the average
training accuracy of the SVC and the average inspection hitrate are significantly
better as compared to the model validation and testing results for the three
cities in the Kelantan state of Malaysia. The logical reasoning behind this is that
the load consumption patterns which signal fraud activities in rural and urban
areas within Malaysia do not have exactly similar trends: Kuala Lumpur (KL) is
an urban area in Malaysia, with a higher population density and a faster pace of
life, whereas Kelantan is considered a rural area in Malaysia with a lower
population and a slower pace of life.

Since the SVC model is trained and tested/validated using data from the same
station (KL Barat), the pilot testing results are significantly better as compared
to the results for the three cities in the state of Kelantan. This is because
training and testing the fraud detection system with data from the same
city/station involves similar trends of fraudulent consumption patterns.

5.2.2.2 Contribution of FIS for Hitrate Improvement


The proposed fraud detection computational intelligence scheme of SVC and FIS
was evaluated against standard SVC in order to compare the detection results, i.e.
the inspection hitrate. Table 5.2 indicates the inspection hitrate achieved for the
computational intelligence scheme of SVC and FIS versus standard SVC using the
testing data for the three cities in the state of Kelantan.
As indicated by Table 5.2, the inspection hitrate for the computational
intelligence scheme of SVC and FIS is significantly better as compared to
standard SVC. The average inspection hitrate of the computational intelligence
scheme is 40%, as indicated in section 5.2.2.1 previously, whereas the average
inspection hitrate for standard SVC, indicated by the third column in Table 5.2,
is approximately 32%. This indicates that with the implementation of the FIS,
the inspection hitrate is increased by approximately 8 percentage points, which
is a significant increase in terms of detecting fraud customers.

Table 5.2: Comparison of the inspection hitrate using the computational intelligence scheme of SVC and FIS versus standard SVC

TNBD Station | No. of Customers Tested | Inspection Hitrate (SVC) | Inspection Hitrate (SVC and FIS)
Kota Bharu   | 76,595 | 34.12% | 42.56%
Kuala Krai   | 18,880 | 29.54% | 38.07%
Gua Musang   | 13,045 | 31.98% | 41.39%

The FIS is implemented in this computational intelligence scheme since it has
the capability to deal with problems that are based on human knowledge and
experience. The FIS implemented in this research study is designed by a human
expert by analyzing load consumption patterns of fraud customers. This type of
computational intelligence scheme is no longer a purely computational system,
since the proposed scheme incorporates human knowledge; therefore, it is
referred to as a combination of two intelligent systems. In practical terms, a
combination of two intelligent systems, i.e. SVC and FIS, will perform better
than one intelligent system, i.e. standard SVC.

The reason behind the improved inspection hitrate using the computational
intelligence scheme of SVC and FIS is that the FIS attempts to emulate the
reasoning process that a human expert undertakes in detecting fraud activities
from load consumption profiles. A clear and conclusive research finding is that
the FIS is able to remove the hard limiting of the parameter values in the SQL
statements in Table 4.10 with the help of the MFs, and with the addition of
experienced human knowledge into the fraud detection system, a better scheme
for the selection of suspicious customers from the SVC results is developed.

5.2.2.3 Discussion of Classification Results


The model validation and testing performed for three cities in the state of
Kelantan in Malaysia resulted in an average inspection hitrate of 40%, as
indicated in section 5.2.2.1 previously. The detection hitrate estimated for the
proposed model in section 4.4.2.2 was calculated using eq. (4.2) to be 81.46%.
However, the average onsite inspection hitrate of 40% obtained for model
testing and validation is considerably lower than the estimated value of the
detection hitrate, i.e. 81.46%.

The reason for the considerably lower inspection hitrate is that the amount of
e-CIBS data provided by TNBD was limited, i.e., only two years of data was
provided due to the problems associated with retrieving the archived data. The
customer data provided by TNBD was therefore not sufficient to back-track a
significant amount of the customer consumption history. Since two years of
customer consumption data from the 10-year archive contributes only 20% of
the entire customer consumption history, this limitation results in a
significantly lower inspection hitrate. Nonetheless, utilizing only 20% of the
historical customer load consumption patterns for SVC training and testing, the
fraud detection system is able to achieve an average inspection hitrate of 40%,
which raises TNBD's current inspection hitrate of 3-5% by 35-37 percentage
points.


In the case of pilot testing on the KL Barat station data, as indicated in section
5.2.2.1.1 previously, the inspection hitrate is higher, i.e. approximately 48%. The
reason for the higher inspection hitrate is that the same station's data is used
for training and testing the SVC. In addition, with respect to the model testing
and validation results obtained for the state of Kelantan in Malaysia, it is
indicated that load consumption patterns of customers relating to fraud
activities have different trends for rural and urban populations within
peninsular Malaysia. Therefore, the inspection hitrate may vary for different
cities within peninsular Malaysia. However, with the use of the proposed fraud
detection system, an average inspection hitrate of 40% is more than likely
achievable for any city within peninsular Malaysia.

The FIS in the proposed SVC and FIS fraud detection framework contributes to
improving the inspection hitrate by an increase of 8 percentage points, due to
the inclusion of human knowledge and intelligence into the system. In addition,
the one and only drawback of the proposed fraud detection system is that
customers who committed fraud activities before the two-year period, for
which there is no data, will not be detected as suspicious customers by the
fraud detection system. This is because customers who committed fraud
activities before the two-year period have normal load consumption patterns
with no noticeable abrupt drops or sudden changes from which the SVC could
classify them as fraud suspects.

5.2.3 Comparison of Model with Other AI Techniques


A comparative study of the proposed SVC and FIS scheme was performed using
two AI based classification techniques: (i) Multi-Layer Backpropagation Neural
Network (ML-BPNN), and (ii) Online-Sequential Extreme Learning Machine
(OS-ELM), in order to evaluate the efficiency and effectiveness of the proposed
fraud detection system. The following sections briefly discuss the theoretical
concepts of the ML-BPNN and OS-ELM, followed by a discussion of the
experimental results obtained for model testing and evaluation using these two
AI based classification techniques.

5.2.3.1 Multi-Layer Backpropagation Neural Network


Backpropagation (BP), also referred to as propagation of error, is a common
method utilized for teaching ANNs how to perform given tasks. The term BP
is an abbreviation for backwards propagation of errors. BP was first described
by Paul Werbos in 1974, but it wasn't until 1986, that it gained recognition, and
it led to a renaissance in the field of ANN research. BP is most useful for
feedforward networks (networks that have no feedback or simply, that have no
connections that loop).

Backpropagation is a supervised learning method, which is an implementation


of the Delta rule. It requires a teacher that knows, or can calculate, the desired
output for any given input. In BP, the errors propagate backwards from the
output nodes to the inner nodes. So BP, is widely used to calculate the gradient
of the error of the network with respect to the network's modifiable weights.
The gradient is then used in a simple stochastic gradient descent algorithm to
find weights that minimize the error. BP networks are necessarily Multi-Layer
Perceptrons (MLPs), usually with one input layer, one hidden layer, and one
output layer. In order for the hidden layer to serve any useful function,
multi-layer networks must have non-linear activation functions for the multiple
layers. Commonly used non-linear activation functions include the logistic
function, the softmax function, and the Gaussian function.

In a Backpropagation Neural Network (BPNN), the learning is formulated as


follows. Firstly, a training input pattern is presented to the network input layer.
The network propagates the input pattern from layer to layer until the output
pattern is generated by the output layer. If the pattern is different from the
desired output, an error is calculated in each output neuron and then it is
propagated backwards through the network from the output layer to the input
layer. The weights of each neuron are adjusted as the error is propagated.
Figure 5.27 shows a typical architecture of a BPNN [215]. The following four
steps illustrate the implementation of the BP algorithm.

Step 1: Initialization
All the weights and threshold levels of the network are set to random
numbers uniformly.

Step 2: Activation

The BPNN is activated by applying the inputs x_1(k), x_2(k), ..., x_n(k) and the
desired outputs y_d,1(k), y_d,2(k), ..., y_d,n(k). The input values are normalized
to lie between 0 and 1. Then the actual output of the neurons in the hidden
layer is calculated using the function below:

y_j(k) = sigmoid[ Σ_{i=1}^{n} x_i(k) w_ij(k) - θ_j ]      (5.2)

where
n is the number of inputs of neuron j in the hidden layer, and
sigmoid is the sigmoid activation function.

Step 3: Weight Training


The weights in the BPNN are updated and the errors associated with output
neurons are then propagated backward.

Step 4: Iteration
The value of k is increased by one, and Step 2 is repeated again. The
iterations continue until the error becomes zero.

[Figure 5.27 depicts a feedforward BPNN with an input layer (x1, ..., xi, ..., xn), one hidden layer, and an output layer (y1, y2, ..., yk, ..., yl); wij are the weights between the input and hidden layers and wjk are the weights between the hidden and output layers. Input signals flow forward through the network while error signals propagate backwards.]

Figure 5.27: The network architecture of a BPNN

Many kinds of activation functions have been proposed and the BP algorithm is
applicable to all of them. A differentiable activation function makes the function
computed by a neural network differentiable (assuming that the integration
function at each node is just the sum of the inputs), since the network itself
computes only function compositions.
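For illustration, the forward and backward passes described above can be sketched in Python (numpy) as follows; this is a generic textbook-style implementation with assumed layer sizes and no bias terms, not the ML-BPNN configuration used later in this chapter.

# Minimal numpy sketch of one backpropagation step for a network with one
# hidden layer and sigmoid activations.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y_d, W1, W2, lr=0.1):
    """x: input vector, y_d: desired output vector, W1/W2: weight matrices."""
    # Forward pass: propagate the input pattern layer by layer
    h = sigmoid(W1 @ x)          # hidden-layer outputs
    y = sigmoid(W2 @ h)          # output-layer outputs
    # Backward pass: output error, then error propagated to the hidden layer
    delta_out = (y_d - y) * y * (1.0 - y)
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)
    # Gradient-descent weight updates
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return np.sum((y_d - y) ** 2)   # squared error for this pattern

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(4, 3))   # 3 inputs -> 4 hidden neurons
    W2 = rng.normal(scale=0.5, size=(1, 4))   # 4 hidden neurons -> 1 output
    x, y_d = np.array([0.2, 0.7, 0.1]), np.array([1.0])
    for _ in range(1000):
        err = bp_step(x, y_d, W1, W2)
    print("squared error after training:", err)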

5.2.3.2 Online-Sequential Extreme Learning Machine


The Extreme Learning Machine (ELM) was proposed by Huang in 2006 [216],
for single hidden-layer feed-forward neural networks (SLFNs), and it was
devised to produce superior performance [217-219] compared to other AI
techniques. The ELM is one algorithm amongst the supervised batch learning
algorithms that uses a finite number of input and output samples for training.
The ELM algorithm is claimed to be extremely fast in its learning speed and has
better generalization performance when compared to conventional learning
algorithms [220].

ELM is a general learning algorithm for SLFNs that works effectively for
function approximation, classification, and online prediction problems.
Moreover, it can generally work well for a variety of types of applications.
Usually, a SLFN has three kinds of parameters: (i) the input weights w_i, (ii) the
hidden neuron biases b_i, and (iii) the output weights β_i. While conventional
learning algorithms of SLFNs have to tune all three of these parameters, ELM
randomly generates the input weights w_i and the hidden neuron biases b_i and
then analytically calculates the output weights β_i. No further learning is
required for SLFNs trained using ELM.

Given N arbitrary distinct samples (x_i, t_i), where x_i = [x_i1, x_i2, ..., x_in]^T
∈ R^n and t_i = [t_i1, t_i2, ..., t_im]^T ∈ R^m, a SLFN with Ñ hidden neurons and
activation function g(x) can be mathematically modelled as:

Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) = o_j,   j = 1, 2, ..., N      (5.3)

where
w_i is the weight vector connecting the input neurons and the i-th hidden
neuron,
b_i is the threshold of the i-th hidden neuron, and
β_i is the weight vector connecting the i-th hidden neuron and the output
neurons.

Here w_i · x_j denotes the inner product of w_i and x_j. If the SLFN can
approximate these N samples with zero error, then Σ_{j=1}^{N} ||o_j - t_j|| = 0
follows; i.e., there exist β_i, w_i and b_i such that Σ_{i=1}^{Ñ} β_i g(w_i · x_j
+ b_i) = t_j, j = 1, 2, ..., N. The above equations can be written compactly as
Hβ = T, where:

H(w_1, ..., w_Ñ, b_1, ..., b_Ñ, x_1, ..., x_N) =
[ g(w_1 · x_1 + b_1)  ...  g(w_Ñ · x_1 + b_Ñ) ]
[        ...          ...         ...         ]
[ g(w_1 · x_N + b_1)  ...  g(w_Ñ · x_N + b_Ñ) ]  (N × Ñ)      (5.4)

β = [β_1^T; ...; β_Ñ^T]  (Ñ × m)   and   T = [t_1^T; ...; t_N^T]  (N × m)      (5.5)

As specified by Huang and Babri in [221], H is called the hidden-layer output
matrix of the neural network, with the i-th column of H being the i-th hidden
neuron's output with respect to the inputs x_1, x_2, ..., x_N. Based on the
previous work of Huang in [154], the matrix H is square and invertible only if
the number of hidden neurons is equal to the number of distinct training
samples, Ñ = N, indicating that SLFNs can approximate these training samples
with zero error. In most cases the number of hidden neurons is much lower
than the number of distinct training samples, Ñ << N, so H is a non-square
matrix and there may not exist w_i, b_i, β_i (i = 1, ..., Ñ) such that Hβ = T. Thus
one specific set of ŵ_i, b̂_i, β̂ (i = 1, ..., Ñ) needs to be found such that:

||H(ŵ_1, ..., ŵ_Ñ, b̂_1, ..., b̂_Ñ) β̂ - T|| = min_{w_i, b_i, β} ||H(w_1, ..., w_Ñ, b_1, ..., b_Ñ) β - T||      (5.6)

which is equivalent to minimizing the cost function

E = Σ_{j=1}^{N} || Σ_{i=1}^{Ñ} β_i g(w_i · x_j + b_i) - t_j ||²      (5.7)

Huang in [218, 222] indicates that the hidden neuron parameters need not be
tuned, as the matrix H indeed converts the data from non-linearly separable
cases to high dimensional linearly separable cases. However, Huang in [220]
showed that the input weights and hidden neuron or kernel parameters do not
necessarily need to be tuned and can be randomly selected and then fixed. Thus,
for fixed input weights and hidden layer biases or kernel parameters, training a
SLFN is equivalent to finding a least-squares solution β̂ of the linear system
Hβ = T.

In order to handle online applications, a variant of ELM referred to as the
Online-Sequential Extreme Learning Machine (OS-ELM) was introduced by
Liang et al. [223]. It was proposed to overcome the limitation of ELM as
developed by Huang [220]. As the ELM algorithm belongs among the batch
learning algorithms, this prohibits its further application. As in the real world
training data may arrive either chunk-by-chunk or one-by-one, an
online-sequential learning algorithm is most suitable to cater for such
variations. The OS-ELM algorithm is designed to handle both additive neurons
and RBF nodes. This algorithm was originally developed for SLFNs with additive
or radial basis function (RBF) hidden nodes in a unified framework. Unlike
other sequential learning algorithms that require many parameters to be tuned,
OS-ELM only requires the number of hidden nodes to be specified.

The OS-ELM as proposed by Liang et al. [224] consists of two phases: (i) an
initialization phase, and (ii) a sequential-learning phase. In the initialization
phase, the number of data required should be at least equal to the number of
hidden nodes. The boosting phase trains the SLFNs using the primitive ELM
method given some batch of training data in the initialization stage. This data is
discarded once the process is complete. Following the initialization phase, in the
learning phase, OS-ELM learns the training data chunk-by-chunk. Subsequently,
all the training data is discarded once the learning procedure involving the data
is complete.

In the derivation of OS-ELM, only the specific matrix H is considered for which
the rank of H is equal to the number of hidden neurons, rank(H) = Ñ. Under this
condition, the following implementation of the pseudo-inverse of H is readily
derived and given by H† = (H^T H)^{-1} H^T, which is often called the left
pseudo-inverse of H from the fact that H† H = I. The corresponding estimation
of β is given by:

β̂ = (H^T H)^{-1} H^T T      (5.8)

which is called the least-squares solution to Hβ = T. The sequential
implementation can be derived and is referred to as the recursive least-squares
(RLS) algorithm.

The algorithm of OS-ELM proposed by Liang et al. in [224] is as follows. Given
an activation function g or an RBF kernel, and a hidden neuron or RBF kernel
number Ñ for a specific application, the following two steps are taken:

Step 1: Boosting Phase
Given a small initial training set N_0 = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m,
i = 1, 2, ..., N_0}, the intention is to boost the learning algorithm by means of
the following procedure:

(a) Assign random input weights w_i and biases b_i (or centres and impact
widths for RBF nodes), i = 1, 2, ..., Ñ.

(b) Calculate the initial hidden-layer output matrix H_0 = [h_1, ..., h_{N_0}]^T,
where h_i = [g(w_1 · x_i + b_1), ..., g(w_Ñ · x_i + b_Ñ)]^T, i = 1, 2, ..., N_0.

(c) Estimate the initial output weight β^(0) = P_0 H_0^T T_0, where
P_0 = (H_0^T H_0)^{-1} and T_0 = [t_1, ..., t_{N_0}]^T.

(d) Set k = 0.

Step 2: Sequential-Learning Phase
For each further incoming observation (x_i, t_i), where x_i ∈ R^n, t_i ∈ R^m and
i = N_0 + 1, N_0 + 2, N_0 + 3, ..., do the following:

(a) Calculate the hidden-layer output vector, using:
h_{k+1} = [g(w_1 · x_i + b_1), ..., g(w_Ñ · x_i + b_Ñ)]^T

(b) Calculate the latest output weight β^(k+1) based on the Recursive
Least-Squares (RLS) algorithm:
P_{k+1} = P_k - (P_k h_{k+1} h_{k+1}^T P_k) / (1 + h_{k+1}^T P_k h_{k+1})
β^(k+1) = β^(k) + P_{k+1} h_{k+1} (t_i^T - h_{k+1}^T β^(k))

(c) Set k = k + 1.

The ELM and OS-ELM provide a faster learning capability as compared with
conventional machine learning algorithms. Unlike many other popular learning
algorithms, little human involvement is required in OS-ELM. Except for the
number of hidden neurons (to which the OS-ELM is insensitive), no other parameters
need to be tuned manually by users, because this algorithm chooses the input
weights randomly and analytically determines the output weights. Furthermore,
as Huang and Chen [218] have recently proven, OS-ELM is actually a learning
algorithm for generalized SLFNs.
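For illustration, the boosting and sequential-learning phases described above can be sketched in Python (numpy) as follows, using the standard one-by-one RLS update; the data, shapes and helper names are assumptions and this is not the implementation compared in this study.

# Sketch of the two OS-ELM phases: a boosting (initialization) phase using the
# batch ELM solution, followed by one-by-one recursive least-squares updates.
import numpy as np

def hidden_output(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W.T + b)))    # sigmoid hidden layer

def os_elm_init(X0, T0, n_hidden, rng):
    W = rng.normal(size=(n_hidden, X0.shape[1]))    # fixed random input weights
    b = rng.normal(size=n_hidden)                   # fixed random biases
    H0 = hidden_output(X0, W, b)
    P = np.linalg.inv(H0.T @ H0)                    # P0 = (H0^T H0)^-1
    beta = P @ H0.T @ T0                            # beta(0) = P0 H0^T T0
    return W, b, P, beta

def os_elm_update(x, t, W, b, P, beta):
    """RLS update for a single new observation (x, t)."""
    h = hidden_output(x[None, :], W, b).T           # column vector (n_hidden, 1)
    P = P - (P @ h @ h.T @ P) / (1.0 + h.T @ P @ h)
    beta = beta + P @ h @ (t[None, :] - h.T @ beta)
    return P, beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    T = (X.sum(axis=1, keepdims=True) > 0).astype(float)
    W, b, P, beta = os_elm_init(X[:50], T[:50], n_hidden=30, rng=rng)
    for x, t in zip(X[50:], T[50:]):                # sequential-learning phase
        P, beta = os_elm_update(x, t, W, b, P, beta)
    acc = np.mean((hidden_output(X, W, b) @ beta > 0.5) == T)
    print("training accuracy:", acc)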

5.2.3.3 Classification Results of Compared Models


In this research study, the Backpropagation Neural Network (BPNN) and
Extreme Learning Machine (ELM) are used to benchmark the performance of
the proposed SVC and FIS model. As neural networks have a structure similar to
that of SVMs, they are used in this research study for comparison with the
proposed modelling scheme.

During training the BPNN uses a different approach in its calculation, as it
minimizes the empirical error; the SVM, in contrast, minimizes the structural
risk. Similarly, the ELM is a SLFN. Conventional learning algorithms of SLFNs
require tuning of the network parameters; the ELM, however, randomly
generates the input weights and hidden neuron biases of the SLFN and uses
them to calculate the output weights without requiring further learning.

In the classification results, the inspection hitrate and the training accuracy of
the proposed SVC and FIS model are compared to the experimental results
obtained for the ML-BPNN and OS-ELM models. The ML-BPNN and OS-ELM
techniques are evaluated using the same data preprocessing framework
outlined in Figure 4.5. Experimental results obtained for the proposed SVC and
FIS model versus the ML-BPNN and the OS-ELM models are tabulated in Table
5.3.

Table 5.3: Experimental results of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM classification techniques

Model       | TNBD Station | Training Accuracy (Memorization) | Inspection Hitrate
ML-BPNN     | Kota Bharu   | 85.23% | 28.51%
ML-BPNN     | Kuala Krai   | 81.08% | 23.93%
ML-BPNN     | Gua Musang   | 83.12% | 26.38%
OS-ELM      | Kota Bharu   | 82.96% | 33.06%
OS-ELM      | Kuala Krai   | 75.43% | 29.97%
OS-ELM      | Gua Musang   | 81.71% | 30.45%
SVC and FIS | Kota Bharu   | 81.66% | 42.56%
SVC and FIS | Kuala Krai   | 73.56% | 38.07%
SVC and FIS | Gua Musang   | 79.27% | 41.39%

As indicated by Table 5.3, the highest training accuracy is achieved by the
ML-BPNN and the highest inspection hitrate is obtained by the SVC and FIS
model. The training accuracy is a measure of the memorization capability of the
classification system; however, the inspection hitrate is the more crucial and
important result in this research study, since it reflects the actual detection
accuracy of the model for the customers tested/validated.

For the case of the comparative study, the architecture of the BPNN is chosen to
have 25 inputs, two hidden layers and one output layer. The single neuron in the
output layer of the BPNN gives an output of 0 for good customers and an
output of 1 for suspicious customers. All 25 features are fed as the input data to
the input layer of the BPNN. The actual output of the neurons in the hidden
layers is calculated using the activation function defined in eq. (5.2).
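For reference, a comparable ML-BPNN configuration could be sketched with scikit-learn's MLPClassifier as below; the hidden-layer sizes shown are hypothetical, since they are not stated here, and this is not the software used to produce the reported results.

# Hedged sketch of a comparable ML-BPNN: 25 input features, two hidden
# layers, one logistic output neuron. Hidden-layer sizes (20, 10) are assumed.
from sklearn.neural_network import MLPClassifier

ml_bpnn = MLPClassifier(
    hidden_layer_sizes=(20, 10),   # two hidden layers (sizes assumed)
    activation="logistic",         # sigmoid activation, as in eq. (5.2)
    solver="sgd",                  # gradient-descent style training
    learning_rate_init=0.1,
    max_iter=1000,
    random_state=0,
)
# ml_bpnn.fit(X_train, y_train) would train on the 25 preprocessed features,
# with y in {0, 1} for good and suspicious customers respectively.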

In the comparative research study, the OS-ELM was implemented with the
Radial Basis Function (RBF) activation function. In the OS-ELM with the RBF
nodes, the centres and widths of the nodes were randomly generated and fixed,
and based on this, the output weights were analytically determined. The
number of hidden neurons used in the OS-ELM varies in the range from 20 to
200. The method to search for the optimal size of the hidden layer neurons in
the OS-ELM is suggested by Huang et al. in [220]. With an initial starting size of
20, the number of neurons is increased with a step of 20, until 200 neurons are
reached. Based on the output performance, the optimal size of the neurons is
decided. Finally based on the optimal size of neurons, 100 trials are performed,
after which the average accuracy is determined based on the best output.

The number of neurons in the hidden layers of the BPNN and OS-ELM is tuned
using CV. In this research study, 10-fold CV is chosen, since there is plenty of
training data that can be divided into subsets. The reason for using CV is to
ensure that the results do not overfit the training data. Table 5.4 summarizes
the comparison results of Table 5.3 by computing the average training accuracy
and the average inspection hitrate for the tested customers in the three cities in
the state of Kelantan.

As indicated by Table 5.4, the ML-BPNN achieves the highest training accuracy
and the lowest inspection hitrate. In contrast, the OS-ELM obtains a slightly
higher inspection hitrate as compared to the ML-BPNN. With respect to the
training accuracy, the proposed SVC and FIS model obtains the lowest training
accuracy. However, in terms of the inspection hitrate, the proposed SVC and FIS
model outperforms the other two models by far, with an average inspection
hitrate of 40.75%. The increase in the inspection hitrate of the proposed model
is 14.5% and 9.5% as compared to the ML-BPNN and OS-ELM respectively.

Table 5.4: Comparison of the average training accuracy and inspection hitrate of the proposed SVC and FIS scheme versus the ML-BPNN and OS-ELM

Model       | Training Accuracy (Memorization) | Inspection Hitrate (Onsite Inspection)
ML-BPNN     | 83.14% | 26.27%
OS-ELM      | 80.03% | 31.16%
SVC and FIS | 78.16% | 40.75%

5.2.3.4 Discussion of Comparison Results


First and foremost, both SVMs and neural networks are considered black box
modeling techniques. Although both algorithms share a similar structure, the
learning methods of the two algorithms are completely different. Neural
networks try to minimize the training error; in contrast, SVMs reduce capacity
using the structural risk minimization (SRM) concept.


The 10-fold CV method used for tuning the hidden layer neurons in the
ML-BPNN and OS-ELM first partitions the training data into 10 subsamples. Of
the 10 subsamples, a single subsample is retained as the validation data for
testing the model, and the remaining 9 subsamples are used as the training
data. The CV process is then repeated 10 times, with each of the 10 subsamples
used exactly once as the validation data. The results from the 10 folds are then
averaged to produce a single estimation, which is generally termed the training
accuracy of the classifier, defined by eq. (4.5).

As indicated in Table 5.4, it is observed that all three models have a relatively
good memorization capability, i.e., high learning rates (training accuracies).
However, the BPNN obtained the highest training accuracy, followed by the
OS-ELM and the proposed SVC and FIS model. The main reason for the higher
training accuracy of the BPNN is that, as the BPNN model involves non-linear
optimization using a gradient descent approach, the major peculiarity that
impacts its performance is the presence of local minima. The main drawback of
the BPNN is that it gets trapped in a local minimum. Figure 5.28 illustrates the
phenomenon of the local minimum, in which case the training has to be
optimized to achieve a globally optimal solution. The local minimum is
determined with respect to the Mean Square Error (MSE), generally referred to
simply as the error in BPNNs, plotted against the weights w_j of the network, as
indicated by Figure 5.28.

The main objective of a neural network training process is to obtain a globally
optimal solution. However, in a BPNN, in trying to reach the overall minimum of
the error function, the network corrects itself slowly along locally improving
directions and eventually ends up with a locally optimal answer. The reason for
this is that, as the BP algorithm is based on a gradient optimization method, the
network tends to descend slowly with a low learning speed, and when a flat
section (plateau) persists for a long time the BPNN training ends at that
instance, resulting in locally optimized answers. The local minimum
phenomenon is illustrated in Figure 5.28.

[Figure 5.28 plots the MSE against a network weight wj, showing two local minima (where the solution can become stuck if the local minimum is deep) and the global minimum, which is the best solution.]
Figure 5.28: The phenomenon of local minimum in a BPNN

It is also observed from the experimental results in Table 5.4 that the BPNN
achieves the lowest inspection hitrate of 26.27%. This indicates that the BPNN
has a lower generalization capability as compared to the OS-ELM and SVC. The
main reason for this is that, since it is difficult to obtain the best network
structure of the BPNN using a trial and error procedure, the optimum solution
cannot be easily found. In terms of more logical reasoning, it can be said that, as
the BPNN tries to minimize the training error, it ends up overfitting the training
data. However, this does not mean that the BPNN is not a good algorithm for
classification; rather, as there is much noisy training data present, the BPNN is
not a suitable choice for this application.

The OS-ELM used for comparison purposes in this research study overcomes
many issues found in traditional gradient-based algorithms such as the BPNN,
including the stopping criterion, learning rate, number of epochs and local
minima. The experimental results obtained in Table 5.4 reveal that the training
accuracy of the OS-ELM is slightly higher and the inspection hitrate of the
OS-ELM is significantly lower as compared to the proposed SVC and FIS model.
The reason for the higher training accuracy of the OS-ELM is that the OS-ELM
iteratively refines the network's output weights using finite samples of the
training data, which yields a higher memorization capability. For the RBF
activation function, the OS-ELM randomly initializes the hidden neuron
parameters (input weight vectors and neuron biases for additive hidden
neurons, and centres and impact factors for RBF hidden neurons) and
iteratively computes the output weight vector.

Furthermore, in the OS-ELM it was observed that, if the order of the training
samples is switched or changed, the resulting training accuracy of the OS-ELM
also changes. Therefore, in order to cater for this situation, during training the
training accuracy was computed over an average of 100 trials, where on each
trial the training samples were ordered randomly. Another noticeable
observation from the OS-ELM was that, with an increase in the number of
neurons, the OS-ELM achieved a better performance, while remaining stable
over a wide range of neuron sizes. However, with an increase in the number of
hidden layer neurons, the training time of the OS-ELM decreases.

The inspection hitrate achieved by the OS-ELM during testing and validation
was 31.16%, which is significantly lower as compared to the proposed SVC and
FIS model with its inspection hitrate of 40.75%. A few reasons can be stated
that contribute to the lower generalization capability of the OS-ELM. The first
reason is that the assignment of the initial weights in the OS-ELM is arbitrary,
which affects the generalization performance of the classifier. As the proper
selection of the input weights and hidden bias values contributes to the
generalization capability of the classifier, the initialization of arbitrary weights
may tend to decrease the generalization performance of the OS-ELM.

Another factor contributing to the lower generalization accuracy of the OS-ELM
is that the value of γ (Gamma) in the RBF activation function is set to a constant
value of 1. The cause of this is unclear and is not stated by the authors in [223].
The only justifiable reason given for this is that, for the three classification
problems conducted by Liang et al. in [223], (i) image segmentation, (ii) the
satellite image problem, and (iii) the DNA problem, the RBF activation function
using the value γ = 1 achieves the best classification performance.

As the parameter γ controls the width of the Gaussian function, it is suggested
to be selected within the range of 0 to 1. If the value of γ is increased above 1,
the generalization performance of the classifier for unseen data will decrease.
More importantly, the value of γ cannot be fixed as a constant value, since the
width of the Gaussian function depends upon the data to be classified and the
amount of noise present in the data. For the case of the three classification
problems in [223], the value γ = 1 might result in the best classification
accuracy; however, this might not be true for other types of classification
problems with different data. Furthermore, there is no evidence or literature
found for the OS-ELM on how to tune the parameter γ in the RBF activation
function. Therefore, the constant value of γ = 1 in the RBF activation function of
the OS-ELM may contribute to the lower generalization accuracy, resulting in a
lower inspection hitrate.

The last reason contributing to the lower inspection hitrate of the OS-ELM is that, although the OS-ELM only requires one parameter to be fine tuned, namely the number of hidden neurons in the hidden layer, in reality it is relatively difficult to obtain the best network structure using a trial and error procedure, as the optimum solution cannot be easily found. The ELM and OS-ELM do suffer from a few drawbacks, which are indicated as follows:

(a) For achieving comparable results, the number of neurons in the hidden layer must be chosen larger than in the standard BP algorithms. This is because the neuron weights and biases are not learned from the data.

(b) As there is only one hidden layer in the SLFN, if trained properly, Multi-Layer Perceptron (MLP) networks with more than one hidden layer can possibly achieve similar and even better results as compared to the ELM and OS-ELM.

(c) The solution provided by the ELM and OS-ELM is not always smooth, and mostly shows some ripple.

The method of using SVC for fraud detection is very promising, as the SVC and FIS model achieves the highest inspection hitrate for fraud customer detection, as indicated in Table 5.4. Firstly, SVC has non-linear dividing hypersurfaces that give it high discrimination. Secondly, SVC provides good generalization ability for unseen data classification. Lastly, SVC determines the optimal network structure itself, without requiring the fine tuning of any external parameters, as in the case of the ML-BPNN and the OS-ELM. In contrast to the advantages of SVMs over neural networks, there are however some drawbacks of SVMs. These drawbacks relate to practical aspects concerning memory limitations and real-time training. Some of the major drawbacks of SVMs are as follows:

(a) The optimization problem arising in SVMs is not easy to solve. Since the number of Lagrange multipliers is equal to the number of training samples, the training process is relatively slow. Even with the use of the SMO algorithm, real-time training is not possible for a large set of data.

(b) Another major drawback of SVMs is the requirement of storage capacity. The support vectors (SVs) represent the important training samples describing the distinguishing features of the given classes. When the optimization problem has a low separability in the space used, the number of SVs increases. These SVs have to be stored in a model file for future classification. This puts limitations on the use of SVMs for pattern recognition or classification in devices with limited storage capacity.

In comparing SVC to the OS-ELM, the only advantage of the OS-ELM over SVC is its faster training process, which improves further as the chunk size is increased. It is well known that with the RBF as the kernel function, SVMs suffer from tedious parameter tuning. The OS-ELM, even with a single parameter to be tuned, requires, due to its arbitrary assignment of initial weights, a search for the optimal number of neurons and many executions in order to obtain an average value. Hence, in this case, the OS-ELM loses its edge over SVC. Given all these aspects, the author feels that SVC is a superior technique when the requirement is to solve a classification problem.

5.3 Summary
The first sub-chapter presented the Graphical User Interface (GUI) developed for the fraud detection system. The GUI of the developed software generates the detection report containing the list of suspicious customers and the average daily consumption report. The second sub-chapter presented the model testing results based on: (i) the classifier, (ii) pilot testing, and (iii) comparison of the proposed SVC and FIS model with other AI based classification techniques. The contribution of the FIS towards hitrate improvement was also discussed, and the computational intelligence scheme of SVC and FIS was compared to standard SVC. Finally, at the end of the second sub-chapter, a comparative study of the proposed SVC and FIS model was performed with two AI based classification techniques: (i) the Multi-Layer Backpropagation Neural Network (ML-BPNN), and (ii) the Online-Sequential Extreme Learning Machine (OS-ELM), in order to evaluate the efficiency and effectiveness of the proposed model.


CHAPTER 6

CONCLUSION AND FUTURE WORK

6.0 Overview
This chapter concludes the thesis and summarizes the research contributions
made. The achievements and objectives of the research study with respect to
the project are highlighted along with the key findings of the research. In
addition, this chapter also discusses the impact and significance of this project
to TNB in Malaysia and suggests future research in the present context that
merits consideration.

6.1 Benefits of the SVC and FIS in the Proposed Model


In this research study, the SVM and FIS have been investigated and applied in
the development of the proposed fraud detection system. The main concern of
this research study is the application of SVM, namely SVC, for the classification
of patterns (load consumption profiles of customers) into two categories:
normal and fraud customers. The FIS is used as a data postprocessing scheme
for the selection of suspicious customers from the correlated customer data and
SVC output.

In the developed fraud detection system, SVC has a considerable advantage over neural networks, as it provides the use of soft margins for the purpose of separation (classification), thus allowing improvement in the generalization performance of the system. SVC has a notable number of advantages as compared to neural networks. Firstly, SVC has non-linear dividing hypersurfaces that give it high discrimination. Secondly, SVC provides good generalization ability for unseen data classification. In addition, SVC determines the optimal network structure itself, which is not the case with traditional neural networks. With the introduction of the proposed SVC technique, the developed system is able to control the balance between sensitivity and specificity, giving the fraud detection system more flexibility [224]. Thus, for this research study SVC is more practical and favorable, as this control is most needed in the frequent presence of unknown and unbalanced data sets.
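
As a hedged illustration of this point, the short Python sketch below (using scikit-learn, which is not the toolset used in this thesis) shows how the soft-margin penalty C and a class weighting can be used to shift the sensitivity/specificity balance on an unbalanced two-class problem; the synthetic data and parameter values are assumptions for illustration only.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    # Hypothetical stand-in data: 1000 customers, 24 features, fraud cases about 5%.
    X, y = make_classification(n_samples=1000, n_features=24,
                               weights=[0.95, 0.05], random_state=0)

    # C sets the softness of the margin; class_weight="balanced" re-weights the
    # minority (fraud) class so the soft margin does not simply ignore it.
    clf = SVC(kernel="rbf", C=10.0, gamma=0.1, class_weight="balanced")
    clf.fit(X, y)
    print("support vectors per class:", clf.n_support_)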

The FIS, as indicated by Table 5.2, gives an additional boost to the inspection hitrate through the inclusion of experienced human knowledge in the fraud detection system. The main reason behind the increased inspection hitrate using the FIS is that the FIS attempts to emulate the reasoning process that a human expert undertakes in detecting fraud activities. In addition, the FIS is able to remove the hard limiting of the parameter values from the SQL statements with the help of the MFs. Thus, with the use of the FIS, the fraud detection system combines two computational intelligence schemes, i.e. the SVC and FIS, and achieves a higher fraud detection accuracy.
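
As a small illustration of how an MF softens a hard SQL-style threshold, the Python sketch below replaces a crisp rule such as "flag a customer when the consumption drop exceeds 40%" with a graded degree of membership; the 20% and 60% breakpoints are hypothetical values, not those defined in Table 4.10.

    import numpy as np

    def large_drop_membership(drop_percent, lo=20.0, hi=60.0):
        # Ramp-shaped MF: 0 below lo, rising linearly to 1 at hi and above.
        return np.clip((np.asarray(drop_percent, dtype=float) - lo) / (hi - lo), 0.0, 1.0)

    drops = [10, 25, 40, 55, 70]             # percentage drops in monthly consumption
    print(large_drop_membership(drops))      # graded suspicion instead of a 0/1 flag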

6.2 Key Findings of the Research

This section highlights the key observations made from the experimental results in Chapter 5. The SVC scheme performs relatively well during the training process. As indicated in Table 5.4, the average training accuracy achieved by the SVC is 78.16%, which indicates that the memorization and learning capability of the SVC model is notably good, even in the presence of noisy data. The two main reasons that contribute to this good training accuracy are: (i) proper selection of the features for building the training model, and (ii) fine tuning of the SVC and RBF kernel parameters using the k-fold CV approach.


The average pilot testing results for the proposed SVC and FIS model indicate that an inspection hitrate of 48% is achievable. However, this is not true for all cities within peninsular Malaysia, as the average inspection hitrate obtained using testing data for different cities within peninsular Malaysia is 40.75%, as indicated in Table 5.4. This is because in pilot testing the data used for training and for testing/validating the SVC model come from the same station, which results in a higher inspection hitrate. Thus, with the use of the proposed fraud detection system, an average inspection hitrate of 40% is more than likely achievable for any city within peninsular Malaysia.

Comparison of the proposed SVC and FIS model with standard SVC indicates that the SVC and FIS model obtains a higher inspection hitrate, as shown in Table 5.2. On average, by using the FIS scheme, the fraud inspection hitrate increases from 32% to 40%. This indicates that the FIS contributes an 8% increase to the inspection hitrate, which is a significant increase in the detection accuracy. Thus, the fraud detection system is able to achieve an average inspection hitrate of 40%, an increase of 35-37 percentage points over TNBD's current inspection hitrate of 3-5%.

Since SVMs and neural networks are considered black-box models, which share the same structure but utilize different learning methods, a comparative study was performed using SVC and neural networks. The comparative results between the neural networks (ML-BPNN and OS-ELM) and the proposed SVC and FIS model indicate that the proposed model has a higher performance, i.e. inspection hitrate, compared to the ML-BPNN and OS-ELM, as indicated by Table 5.4. This indicates that the generalization performance of the SVC and FIS model is significantly better as compared to the ML-BPNN and the OS-ELM.


The performance of an SVC can, however, be problem dependent, as it is based on: the collected data set, the selected modeling features and the manner in which the data is split between the training and testing/validating sets. It is worth noting that, in some cases, SVC algorithms have shown a lower classification performance compared to neural network algorithms, as in the paper presented by Osareh et al. [225]. This shows that the suitability of a classifier for a problem is data dependent. Since the data set used in this research study is large and contains a significant amount of noise, the SVC as a classifier is found to be the more suitable technique for solving this classification problem.

6.3 Achievement of the Research Objectives


The present study has achieved the research objectives outlined in section 1.3
by the successful development of a fraud detection system, using data mining
techniques and AI based approaches. This section highlights the research
objectives fulfilled with respect to the project. The achievements of the research
study are highlighted and discussed briefly, as follows:

The main objective of this research is to develop a user-friendly Graphical User Interface (GUI) for an intelligent detection system, in order to assist TNBD SEAL teams in increasing the effectiveness of their onsite operations for the reduction of NTLs. Through this research study the desired fraud detection system was developed, and it is able to assist the NTL Group of TNB and their SEAL teams to detect and identify customers with fraud activities and abnormalities. The developed Abnormality and Fraud Detection System (AFDS) was delivered to TNBD on January 7, 2009 at their Port Dickson office, with the TNBD SEAL team being satisfied with the final outcome of this research study. The developed system is currently assisting TNB in different states of peninsular Malaysia to reduce NTL activities.


The second objective of this research study is to identify, detect and predict customers with fraud activities, abnormalities and other irregularities, by investigating and monitoring significant deviations and abrupt changes from historical customer consumption patterns with the use of billing data. Through the implementation of this research study, the developed SVC model utilizes load consumption patterns, i.e. the load profiles of customers, in order to detect and identify customers with fraud activities and abnormalities. As this problem is purely a pattern classification task, fraud customers are identified by detecting abrupt drops or sudden changes in the load consumption patterns of customers. The experimental results obtained for model testing and evaluation clearly indicate that fraud activities relating to NTLs can be detected and identified by investigating the historical load consumption patterns of customers.

The third objective of this research aims to develop an automated framework to preprocess customer data for noise smoothing, feature extraction and data normalization. The data preprocessing framework used to implement the fraud detection system is illustrated in Figure 4.5. Since SVC training is carried out offline, i.e. only once, the training (classifier) data does not require any preprocessing. However, for testing and validating the fraud detection system, preprocessing is carried out by the automated data preprocessing procedure. Thus, data preprocessing is an automated task, which has been successfully implemented in the proposed fraud detection system.

In the fourth objective of the research study, a fraud detection model using a combination of two AI based computational intelligence schemes, namely the Support Vector Machine (SVM) and the Fuzzy Inference System (FIS), was proposed. Through this research study, an SVC and FIS computational intelligence model for fraud detection was developed, which is fully functional and ready to be used. In the developed SVC and FIS model, the SVC is used as the core of the pattern classification engine in order to detect and identify fraud customers based on their load consumption profiles, while the FIS is used as a data postprocessing scheme in order to select suspicious customers based on the customer data correlated with the SVC results.

Lastly, this research aims to evaluate the proposed fraud detection system using customer data from TNBD for different cities within peninsular Malaysia and to provide a comparative study using different AI based classification techniques for the purpose of benchmarking the proposed model. Based on the research study conducted, the developed SVC and FIS computational intelligence model was tested and validated using TNBD customer data from different cities within peninsular Malaysia, as indicated in section 5.2.2. In addition, a comparative study was also conducted which benchmarks the proposed SVC and FIS model against two neural networks: (i) the ML-BPNN, and (ii) the OS-ELM. The results of the comparative study indicate that the proposed model is significantly better in terms of the fraud detection hitrate, as compared to the other two AI based classification techniques. The developed fraud detection system does show encouraging results; however, it also has a few limitations. These limitations can be eliminated by implementing the future work suggested in section 6.5.

6.4 Impact and Significance of the Project to TNB

As mentioned earlier in section 1.2, one of the key initiatives of TNB in Malaysia is to reduce its NTLs in the LV distribution network, estimated at around 15% in peninsular Malaysia. Distribution losses due to fraud activities are low throughout peninsular Malaysia; however, in some states such as Selangor, Penang and Johor, high distribution losses have been reported.

Large inspection campaigns have been carried out by TNBD with little success. The current actions taken by TNBD SEAL teams in order to address the problem of NTLs include: (i) meter checking and premise inspection, (ii) reporting on irregularities, and (iii) monitoring of unbilled accounts, which have resulted in a detection rate of 3-5% of the total inspections carried out. This is because, at present, customer installation inspections are carried out without any specific focus or direction. Most inspections are carried out at random, while some targeted raids are undertaken based on information reported by the public or meter readers.

The reason customer installation inspections are carried out at random is the unavailability of a detection system that can shortlist possible suspicious customers with fraud activities and abnormalities. With the use of a fraud detection system, when customer installation inspections are guided towards a reduced group of likely fraudulent customers, a systematic checking system with a higher rate of successful fraud detection is achieved. The main significance and impacts to power utilities, including TNB in Malaysia, of the research reported in this thesis are identified as follows:

1. The fraud detection system developed in this research study provides utility information system tools to power utilities, including TNB in Malaysia, for efficient detection and classification of NTL activities, in order to increase the effectiveness of their onsite inspection operations.
2. With the implementation of the proposed system, operational costs for power utilities, including TNB in Malaysia, due to onsite inspection in monitoring NTL activities will be significantly reduced. This will also reduce the number of inspections carried out at random, resulting in a higher fraud detection hitrate.
3. Knowledge about fraudulent consumption patterns and behavior is disseminated with the use of the proposed system, which is useful for further study and analysis by NTL experts and inspection teams in power utilities, including TNB in Malaysia.


4. Lastly, by using the proposed fraud detection system, great time savings in detecting and identifying problematic electric meters can be achieved for power utilities, including TNB in Malaysia.

The section below discusses a scenario whereby the developed fraud detection system can help to reduce TNB's operational cost due to onsite inspection in monitoring NTL activities. The scenario is based on TNB's experience for industrial and commercial OPCs in the financial year of 2005 [125, 198]. This scenario uses the minimal inspection hitrate achieved by the fraud detection system during model testing and evaluation, which is 38%, as indicated in Table 5.1. A simple assessment of the benefit of the developed fraud detection system is illustrated as follows:

Without the use of the fraud detection system
Total installations checked: 441,662
Inspection hitrate: 3-5% [118, 198]
Average installations with NTL: 17,666 (i.e. average hitrate of 3-5%)
Cost of checking a customer installation: RM 40/installation [125]
Operational expenditure cost: 441,662 × RM 40 = RM 17.6 Million/yr

With the use of the fraud detection system
Inspection hitrate: 38%
Total installations checked: 46,488 (i.e. hitrate of 38% = 17,666/46,488)
Operational expenditure cost: 46,488 × RM 40 = RM 1.86 Million/yr
Cost saved in checking installations: RM 15.74 Million/yr
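
The same arithmetic can be reproduced with the short Python sketch below; the figures are those quoted above, with a 4% hitrate assumed as the mid-point of the reported 3-5% range.

    total_checked_random = 441_662     # installations checked without the system
    hitrate_random = 0.04              # assumed mid-point of the reported 3-5%
    hitrate_system = 0.38              # minimal hitrate of the system (Table 5.1)
    cost_per_check = 40                # RM per installation [125]

    ntl_found = round(total_checked_random * hitrate_random)     # about 17,666
    cost_random = total_checked_random * cost_per_check          # about RM 17.6 million
    checked_with_system = round(ntl_found / hitrate_system)      # about 46,489
    cost_system = checked_with_system * cost_per_check           # about RM 1.86 million

    print(f"annual saving: RM {cost_random - cost_system:,}")    # roughly RM 15.8 million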

As indicated by the assessment above, with the implementation of the proposed fraud detection system, TNB in Malaysia is estimated to save approximately RM 15 Million annually on onsite inspections of customer installations. This substantial cost saving in inspecting customer installations results from the fact that the proposed fraud detection system is able to achieve an average inspection hitrate of 40%, an increase of 35-37 percentage points over TNBD's current hitrate of 3-5%. Thus, in conclusion, the fraud detection system developed for the detection and identification of NTL activities will significantly benefit power utilities, including TNB in Malaysia, and their customers.

6.5 Future Expansion and Recommendations

Even though the project has been delivered to TNB and is currently being implemented by its NTL Group and SEAL teams throughout peninsular Malaysia, research can be continued further to improve the current fraud detection system. One of the major concerns is that the developed system needs to be further tested and evaluated using more customer data from different cities within peninsular Malaysia, so as to ensure the reliability and accuracy of the fraud detection system.

This research study can be expanded by classifying load consumption patterns by their respective districts, i.e. states or regions within peninsular Malaysia. This approach is expected to be more accurate, as different states have been identified with different consumption patterns and trends. Evidence of this characteristic is supported by the fact that the pilot testing results indicated in section 5.2.2.1.1 achieved a higher inspection hitrate of 48%, as compared to the inspection hitrate for the three cities in the state of Kelantan. However, this approach requires more training data from each state, especially the load consumption patterns of fraud customers. In the future, the fraud detection system may need to cater for new training data, because the electricity consumption patterns of each state can eventually change.

It is also recommended that the developed fraud detection system be further tested and evaluated on other power distribution networks in Malaysia, such as: (i) Sabah Electricity Sdn. Bhd. (SESB), and (ii) Sarawak Electricity Supply Corporation (SESCO). The fraud detection system can be implemented in the SESB and SESCO distribution networks in order to evaluate the detection accuracy and performance of the system. It is strongly believed that the fraud detection system can contribute significant improvements to SESB's and SESCO's distribution networks for the reduction of NTL activities. The sections below provide brief suggestions and recommendations on future work which can be carried out on the current fraud detection system in order to increase its detection accuracy.

6.5.1 SVC Parameter Tuning using Genetic Algorithm

For any classification task, the performance of the classifier will decrease if the modeling hyperparameters are not selected properly. The hyperparameters are fine tuned in order to obtain the best generalization performance. Lagrange parameter selection in the case of the SVM is complex in nature, as it is difficult to solve by conventional optimization techniques.

A practical difficulty of using SVC is the selection of the parameter C and the kernel parameter γ in the RBF (Gaussian) kernel. Even with the use of k-fold CV with the Grid-Search method, as used in this research study, an optimal solution might not be achieved. The parameters (C, γ) need to be set to optimal values so that they minimize the expectation of the testing and validation error, which requires adapting multiple parameter values at the same time.
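
For reference, the k-fold cross-validated grid search over (C, γ) described above can be sketched as follows in Python with scikit-learn (not the implementation used in this thesis); the parameter grid, the synthetic data and the choice of 10 folds are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Hypothetical stand-in for the customer load-profile features and labels.
    X, y = make_classification(n_samples=500, n_features=24, random_state=0)

    # Exhaustive grid over the soft-margin penalty C and the RBF width gamma,
    # scored by k-fold cross-validation (k = 10 here).
    param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1.0]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)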

The Genetic Algorithm (GA), with its characteristics of high efficiency and global optimization, has been widely applied to solve optimization problems in many applications. Thus, it is suggested that, in order to solve the Dual Lagrangian Optimization (DLO) problem in SVC for the selection of optimal parameters, a hybrid combination of the SVC and the GA be implemented as future work for this research study. The hybrid SVC-GA algorithm can be developed as an alternative technique to search for the optimal SVC parameters, which will avoid local optima in finding the maximum of the Lagrangian.

6.5.2 Improving SVC Kernel using Multi-Scale RBF Kernel

The RBF (Gaussian) kernel is one of the well-known Mercer kernels for the SVM, and is widely used in many problems, as in the case of this research study. The RBF kernel uses the Euclidean distance between two points in the original space to find the correlation in the augmented space. Points very close to each other are strongly correlated in the augmented space, whereas points far apart are uncorrelated in the augmented space. However, there is only one parameter, γ, for adjusting the width of the RBF kernel, which is not powerful enough to cater for complex problems.

In order to achieve a better kernel for SVC, one possible way is to adjust the velocity of decrement in each range of the Euclidean distance between the two points. The multi-scale kernel obtained using this method should, however, maintain the characteristics of the RBF kernel. To implement this multi-scale RBF kernel, a combination of RBF kernels at different scales is suggested as future work for this research study. To proceed, the linear combination of RBFs must first be shown to satisfy the conditions of a Mercer kernel. It is hoped that with the use of this proposed kernel, the classification accuracy of the SVC model will improve.
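
A minimal sketch of such a multi-scale kernel is given below in Python with scikit-learn; it forms a non-negative linear combination of RBF kernels at three widths, which remains a Mercer kernel because a non-negative sum of Mercer kernels is itself a Mercer kernel. The widths and weights shown are arbitrary assumptions for illustration.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    def multi_scale_rbf(gammas, weights):
        # Kernel callable returning a weighted sum of RBF Gram matrices.
        def kernel(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
            return sum(w * np.exp(-g * d2) for g, w in zip(gammas, weights))
        return kernel

    X, y = make_classification(n_samples=300, n_features=24, random_state=0)
    clf = SVC(kernel=multi_scale_rbf([0.01, 0.1, 1.0], [0.5, 0.3, 0.2]))
    clf.fit(X, y)
    print("training accuracy:", clf.score(X, y))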

6.5.3 FIS Membership Function Optimization using GA

The MFs for the fuzzy rules in Table 4.11 were developed for the FIS using the SQL statement parameter values in Table 4.10. Since the fuzzy MFs developed use the SQL statement parameter values, the fuzzy MFs are not optimized or fine tuned. Fine tuning the fuzzy MFs for the derived fuzzy rules can improve the inspection hitrate of the fraud detection system, by increasing the number of suspicious customers shortlisted and reducing the number of abnormalities in the output of the fraud detection system.

As the GA is a well-known technique for solving optimization problems, it is suggested that a hybrid combination of the FIS and GA be implemented as future work for this research study in order to optimize the fuzzy MFs used in the FIS. This hybrid FIS-GA algorithm will optimize the experienced human knowledge and intelligence incorporated into the fraud detection system. Through the implementation of such a hybrid FIS-GA algorithm, it is hoped that significant improvements in the accuracy of the fraud detection system will result.

6.5.4 Feature Extraction Using Consumption Difference

In the current research study, load profile features were extracted using eq. (4.1). However, an alternative feature extraction method can be used, which is based on calculating the difference between the monthly kWh consumption divided by the difference of days with respect to the meter reading date. In order to implement this, the following expression can be used:

Y_i = |M_{i+1} - M_i| / |D_{i+1} - D_i|,   i = 1, 2, ..., 24        (6.1)

where |M_{i+1} - M_i| represents the absolute difference in the monthly kWh consumption between the following month and the current month, and |D_{i+1} - D_i| represents the absolute difference of days, with respect to the meter reading date, between the following month and the current month.


For future work in this research study, it is suggested that the alternative feature extraction technique in eq. (6.1) be used as a measure to compare the performance of the fraud detection system against the proposed feature extraction technique in eq. (4.1). By using eq. (6.1) in the proposed SVC and FIS model, it is hoped that a higher inspection hitrate can be obtained.
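
A minimal sketch of the feature extraction in eq. (6.1) is shown below in Python; the 25 monthly kWh readings and meter-reading dates (expressed as cumulative day counts) are hypothetical values used only to illustrate how the 24 features per customer would be computed.

    import numpy as np

    # Hypothetical 25 consecutive monthly kWh readings for one customer and the
    # corresponding meter-reading dates as day counts from the start of the record.
    kwh = np.array([410, 395, 388, 402, 371, 360, 355, 358, 361, 359, 362, 357,
                    360, 363, 358, 361, 359, 362, 150, 148, 151, 149, 152, 150, 151],
                   dtype=float)
    dates = np.cumsum([0, 30, 31, 29, 30, 31, 30, 31, 30, 29, 31, 30, 31,
                       30, 29, 31, 30, 31, 30, 29, 31, 30, 31, 30, 29]).astype(float)

    # Eq. (6.1): absolute month-to-month kWh difference divided by the number of
    # days between the two meter-reading dates, giving 24 features per customer.
    features = np.abs(np.diff(kwh)) / np.abs(np.diff(dates))
    print(features.shape, features.round(2))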

6.6 Conclusion
In conclusion, the overall research has shown encouraging results and a performance that matches human intelligence in detecting fraud customers and abnormalities in peninsular Malaysia. The main contribution of this research study is that the fraud detection system developed to assist TNBD SEAL teams is able to achieve an average inspection hitrate of 40%, an increase of 35-37 percentage points over TNBD's current hitrate of 3-5%. The outcome of this research study, the AFDS, is currently in use by TNB in Malaysia for its residential, commercial and light industry customers in the low voltage distribution network. It is expected that utilization of the fraud detection system will benefit TNB not only in improving its handling of NTLs, but will also complement its existing on-going practices, and it is envisaged that tremendous savings will result from the use of this system.

As usually happens in an area of research, many approaches can be used and developed, given the appropriate amount of time and effort. It is strongly recommended that the future work suggested and discussed above be investigated. The proper application and use of the fraud detection system will be greatly appreciated both by TNB and by those who regulate electricity in Malaysia, and it is hoped that with the use of the developed system significant savings and NTL reduction will result for the power utilities. Thus, with this research study, it is desired that more SVM applications be developed and applied for the improvement of the quality of human life.


REFERENCES

[1] T. B. Smith, Electricity Theft: A Comparative Analysis, Energy Policy, vol. 32, no. 18, Dec. 2004, pp. 2067-2076.

[2] M. S. Alam, E. Kabir, M. M. Rahman, and M. A. K. Chowdhury, Power Sector Reform in Bangladesh: Electricity Distribution System, Energy, vol. 29, no. 11, Sept. 2004, pp. 1773-1783.

[3] A. Kumar and S. D. Saxena, Decision Priorities and Scenarios for Minimizing Electrical Power Loss in an Indian Power System Network, Electric Power Components and Systems, vol. 31, no. 8, Aug. 2003, pp. 717-727.

[4] R. M. Shrestha and M. Azhar, Environmental and Utility Planning Implications of Electricity Loss Reduction in a Developing Country: A Comparative Study of Technical Options, International Journal of Energy Research, vol. 22, no. 1, Jan. 1998, pp. 47-59.

[5] R. F. Ghajar and J. Khalife, Cost/Benefit Analysis of an AMR System to Reduce Electricity Theft and Maximize Revenues for Electricite du Liban, Applied Energy (Energex 2002 - Energy Policies and Economics and Rational Use of Energy, Topics VI and VII), vol. 76, no. 1-3, Jan. 2003, pp. 25-37.

[6] S. V. Allera and A. G. Horsburgh, Load Profiling for the Energy Trading and Settlements in the UK Electricity Markets, in Proc. of the DistribuTECH Europe DA/DSM Conference, London, U.K., Oct. 1998.

[7] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, Allocation of the Load Profiles to Consumers using Probabilistic Neural Networks, IEEE Transactions on Power Systems, vol. 20, no. 2, May 2005, pp. 548-555.

[8] A. H. Nizar, Z. Y. Dong, and Y. Wang, Power Utility Nontechnical Loss Analysis with Extreme Learning Machine Method, IEEE Transactions on Power Systems, vol. 23, no. 3, Aug. 2008, pp. 946-955.

[9] M. Sforna, Data Mining in a Power Company Customer Database, Electric Power Systems Research, vol. 55, no. 3, Sept. 2000, pp. 201-209.

[10] A. H. Nizar, Z. Y. Dong, and J. H. Zhao, Load Profiling and Data Mining Techniques in Electricity Deregulated Market, presented at the IEEE Power Engineering Society (PES) General Meeting, Montreal, Quebec, Canada, June 2006.

[11] I. E. Davidson, Evaluation and Effective Management of Non-Technical Losses in Electrical Power Networks, in Proc. of the 6th IEEE AFRICON Conference in Africa (AFRICON) 2002, vol. 1, 2-4 October 2002, pp. 473-477.

[12] R. Mano, R. Cespedes, and D. Maia, Protecting Revenue in the Distribution Industry: A New Approach with the Revenue Assurance and Audit Process, in Proc. of the 2004 IEEE/PES Transmission and Distribution Conference and Exposition: Latin America, 8-11 November 2004, pp. 218-223.

[13] M. V. K. Rao and S. H. Miller, Revenue Improvement from Intelligent Metering Systems, in Proc. of the Ninth International Conference on Metering and Tariffs for Energy Supply, Birmingham, U.K., August 1999, pp. 218-222.

[14] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, Hoboken, NJ: Wiley-Interscience / IEEE Press, 2003.

[15] J. E. Cabral, J. O. P. Pinto, E. M. Gontijo, and J. R. Filho, Fraud Detection in Electrical Energy Consumers using Rough Sets, in Proc. of the 2004 IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 3625-3629.

[16] J. W. Fourie and J. E. Calmeyer, A Statistical Method to Minimize Electrical Energy Losses in a Local Electricity Distribution Network, in Proc. of the 7th IEEE AFRICON Conference in Africa: Technology Innovation, Botswana, Sept. 15-17, 2004.

[17] J. Bilbao, E. Torres, P. Egufa, J. L. Berasategui, and J. R. Saenz, Determination of Energy Losses, in Proc. of the 16th International Conference and Exhibition on Electricity Distribution (CIRED), 2001.

[18] J. R. Filho, E. M. Gontijo, A. C. Delaiba, E. Mazina, J. E. Cabral, and J. O. P. Pinto, Fraud Identification in Electricity Company Customers using Decision Trees, in Proc. of the 2004 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, 10-13 October 2004, pp. 3730-3734.

[19] A. H. Nizar, Z. Y. Dong, J. H. Zhao, and P. Zhang, A Data Mining Based NTL Analysis Method, IEEE Power Engineering Society (PES) General Meeting, 2007, pp. 1-8.

[20] J. R. Galvan, A. Elices, A. Munoz, T. Czernichow, and M. A. Sanz-Bobi, System for Detection of Abnormalities and Fraud in Customer Consumption, in Proc. of the 12th Conference on the Electric Power Supply Industry, November 2-6, 1998, Pattaya, Thailand.

[21] R. J. Bolton and D. J. Hand, Statistical Fraud Detection: A Review, Statistical Science, vol. 17, no. 3, Aug. 2002, pp. 235-255.

[22] A. H. Nizar, Z. Y. Dong, M. Jalaluddin, and M. J. Raffles, Load Profiling Non-Technical Loss Activities in a Power Utility, in Proc. of the 1st International Power and Energy Conference (PECON), Putrajaya, Malaysia, November 28-29, 2006.

[23] R. Jiang, H. Tagaris, A. Lachsz, and M. Jeffrey, Wavelet Based Feature Extraction and Multiple Classifiers for Electricity Fraud Detection, in Proc. of the IEEE/PES Transmission and Distribution Conference and Exhibition 2002: Asia Pacific, Yokohama, Japan, Oct. 2002.

[24] L. Yan, R. Wolniewicz, and R. Dodier, Customer Behavior Prediction - It's All in the Timing, IEEE Potentials, vol. 23, no. 4, Oct. 2004, pp. 20-25.

[25] B. Stefano and F. Gisella, Insurance Fraud Evaluation: A Fuzzy Expert System, in Proc. of the 10th IEEE International Conference on Fuzzy Systems, Melbourne, Australia, Dec. 2001.

[26] A. Deshmukh and T. L. N. Talluru, A Rule Based Fuzzy Reasoning System for Assessing the Risk of Management Fraud, in Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, Orlando, FL, Oct. 1997.

[27] M. Syeda, Y.-Q. Zhang, and Y. Pan, Parallel Granular Neural Networks for Fast Credit Card Fraud Detection, in Proc. of the 2002 IEEE International Conference on Fuzzy Systems, Honolulu, Hawaii, May 2002.

[28] E. Osuna, Applying SVMs to Face Detection, IEEE Intelligent Systems, vol. 13, Jul./Aug. 1998, pp. 23-26.

[29] S. Dumais, Using SVMs for Text Categorization, IEEE Intelligent Systems, vol. 13, Jul./Aug. 1998, pp. 21-23.

[30] G. Dror, R. Sorek, and S. Shamir, Accurate Identification of Alternatively Spliced Exons using Support Vector Machine, Bioinformatics, vol. 21, no. 7, Apr. 2005, pp. 897-901.

[31] R. Behroozmand and F. Almasganj, Comparison of Neural Networks and Support Vector Machines Applied to Optimized Features Extracted from Patients' Speech Signal for Classification of Vocal Fold Inflammation, in Proc. of the 5th IEEE International Symposium on Signal Processing and Information Technology, 21 Dec. 2005, pp. 844-849.

[32] E. Byvatov, U. Fechner, J. Sadowski, and G. Schneider, Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification, Journal of Chemical Information and Computer Science, vol. 43, no. 6, Sept. 2003, pp. 1882-1889.

[33] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, in Proc. of the 10th European Conference on Machine Learning, Springer Verlag, Heidelberg, DE, 1998, pp. 137-142.

[34] D. Poynter, Expert Witness Handbook: Tips and Techniques for the Litigation Consultant, Para Publishing, 2nd edition, June 1997, pp. 283.

[35] Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang, Survey of Fraud Detection Techniques, in Proc. of the 2004 IEEE International Conference on Networking, Sensing and Control, vol. 2, Sept. 2004, pp. 749-754.

[36] L. Hai-Feng and Z. Hao, Study on Risk Evasion in Electricity Market, in Proc. of the 2004 IEEE International Conference on Electric Utility Deregulation, Restructuring and Power Technologies, vol. 2, 5-8 April 2004, pp. 495-499.

[37] P. H. Loong, Application of Support Vector Machine to Detect Abnormalities and Probable Fraud of the Electric Meters Used by Customers in SESB, Master's Dissertation, Universiti Tenaga Nasional, Malaysia, 2007.

[38] Y. Chen, Support Vector Machines and Fuzzy Systems, Springer US, 2008, Part III, pp. 205-223.

[39] Y. Chen and J. Z. Wang, Support Vector Learning for Fuzzy Rule-Based Classification Systems, IEEE Transactions on Fuzzy Systems, vol. 11, no. 6, 2003, pp. 716-728.

[40] A. Chaves, M. M. Vellasco, and R. Tanscheit, Fuzzy Rules Extraction from Support Vector Machines for Multi-Class Classification, Analysis and Design of Intelligent Systems using Soft Computing Techniques, vol. 41, 2007, pp. 99-108.

[41] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, Determining the Load Profiles of Consumers Based on Fuzzy Logic and Probability Neural Networks, IEE Proceedings - Generation, Transmission and Distribution, vol. 151, no. 3, May 2004, pp. 395-400.

[42] D. Gerbec, S. Gasperic, and F. Gubina, Determination and Allocation of Typical Load Profiles to the Eligible Consumers, in Proc. of the IEEE Bologna Power Tech Conference, Bologna, Italy, June 2003.

[43] R. F. Chang and C. N. Lu, Load Profiling and its Applications in Power Market, presented at the IEEE Power Engineering Society General Meeting, Toronto, Canada, July 2003.

[44] R. F. Chang and C. N. Lu, Load Profile Assignment of Low Voltage Customers for Power Retail Market Applications, IEE Proceedings - Generation, Transmission and Distribution, vol. 150, no. 3, May 2003, pp. 263-267.

[45] C. S. Chen, J. C. Hwang, and C. W. Huang, Application of Load Survey Systems to Proper Tariff Design, IEEE Transactions on Power Systems, vol. 12, no. 4, Nov. 1997, pp. 1746-1751.

[46] C. S. Chen, J. C. Hwang, Y. M. Tzeng, C. W. Huang, and M. Y. Cho, Determination of Customer Load Characteristics by Load Survey System at Taipower, IEEE Transactions on Power Delivery, vol. 11, no. 3, Jul. 1996, pp. 1430-1436.

[47] C. S. Chen, M. S. Kang, J. C. Hwang, and C. W. Huang, Implementation of the Load Survey System in Taipower, in Proc. of the IEEE Transmission and Distribution Conference, New Orleans, Los Angeles, Apr. 1999.

[48] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu, and C. Toader, Emergent Electricity Customer Classification, IEE Proceedings - Generation, Transmission and Distribution, vol. 152, no. 2, Mar. 2005, pp. 164-172.

[49] G. Chicco, R. Napoli, P. Postolache, M. Scutariu, and C. Toader, Customer Characterization Options for Improving the Tariff Offer, IEEE Power Engineering Review, vol. 22, no. 11, Nov. 2002, pp. 60-60.

[50] G. Chicco, E. Carpaneto, R. Napoli, and M. Scutariu, Electricity Customer Classification using Frequency-Domain Load Pattern Data, International Journal of Electrical Power and Energy Systems, vol. 28, no. 1, Jan. 2006, pp. 13-20.

[51] V. Figueiredo, F. Rodrigues, Z. Vale, and J. B. Gouveia, An Electric Energy Consumer Characterization Framework Based on Data Mining Techniques, IEEE Transactions on Power Systems, vol. 20, no. 2, May 2005, pp. 596-602.

[52] F. Rodrigues, J. Duarte, V. Figueiredo, Z. Vale, and M. Cordeiro, A Comparative Analysis of Clustering Algorithms Applied to Load Profiling, in Proc. of the 3rd International Conference (MLDM), Leipzig, Germany, July 2003.

[53] F. J. Duarte, F. Rodrigues, V. Figueiredo, Z. Vale, and M. Cordeiro, Data Mining Techniques Applied to Electric Energy Consumers Characterization, in Proc. of the Artificial Intelligence and Soft Computing Conference, Banff, Alta, Canada, July 2003.

[54] A. P. Birch and C. S. Ozveren, An Adaptive Classification for Tariff Selection, in Proc. of the 7th International Conference on Metering Apparatus and Tariffs for Electricity Supply, Glasgow, U.K., Nov. 1992.

[55] A. P. Birch, C. S. Ozveren, and A. T. Sapeluk, A Generic Load Profiling Technique using Fuzzy Classification, in Proc. of the 8th International Conference on Metering and Tariffs for Energy Supply, Brighton, U.K., Jul. 1996.

[56] A. P. Birch, C. S. Ozveren, and M. M. Smith, A Review of the Electricity Supply Industry in Britain, in Proc. of the 7th Mediterranean Electrotechnical Conference, Antalya, Turkey, Apr. 1994.

[57] R. Yao and K. Steemers, A Method of Formulating Energy Load Profile for Domestic Buildings in the UK, Energy and Buildings, vol. 37, no. 6, June 2005, pp. 663-671.

[58] K. L. Lo and Z. Zakaria, Electricity Consumer Classification using Artificial Intelligence, in Proc. of the 39th International Universities Power Engineering Conference, Bristol, U.K., Sept. 2004.

[59] K. L. Lo, Z. Zakaria, and C. S. Ozveren, Fuzzy Classification and Statistical Methods for Load Profiling: A Comparison, in Proc. of the 6th International Conference on Advances in Power System Control, Operation and Management, Hong Kong, China, Nov. 2003.

[60] K. L. Lo, Z. Zakaria, and M. H. Sohod, Application of Two-Stage Fuzzy C-Means in Load Profiling, WSEAS Transactions on Information Science and Applications, vol. 2, no. 11, Nov. 2005, pp. 1905-1912.

[61] Z. Zakaria, K. L. Lo, and M. H. Sohod, Application of Fuzzy Clustering to Determine Electricity Consumers' Load Profiles, in Proc. of the IEEE International Power and Energy Conference (PECON), Putrajaya, Malaysia, Nov. 2006.

[62] Z. Zakaria and K. L. Lo, Load Profiling in the New Electricity Market, in Proc. of the Student Conference on Research and Development (SCOReD), Shah Alam, Malaysia, July 2002.

[63] M. Espinoza, C. Joye, R. Belmans, and B. DeMoor, Short-Term Load Forecasting, Profile Identification, and Customer Segmentation: A Methodology Based on Periodic Time Series, IEEE Transactions on Power Systems, vol. 20, no. 3, Oct. 2005, pp. 1622-1630.

[64] S. V. Verdu, M. O. Garcia, F. J. G. Franco, N. Encinas, A. G. Marin, A. Molina, and E. G. Lazaro, Characterization and Identification of Electrical Customers through the Use of Self-Organizing Maps and Daily Load Parameters, in Proc. of the IEEE PES Power Systems Conference and Exposition, New York, U.S.A., Oct. 2004.

[65] J. A. Jardini, C. M. V. Tahan, M. R. Gouvea, S. U. Ahn, and F. M. Figueiredo, Daily Load Profiles for Residential, Commercial and Industrial Low Voltage Consumers, IEEE Transactions on Power Delivery, vol. 15, no. 1, Jan. 2000, pp. 375-380.

[66] D. Gerbec, F. Gubina, and Z. Toro, Actual Load Profiles of Consumers without Real Time Metering, presented at the IEEE Power Engineering Society General Meeting, San Francisco, California, U.S.A., June 2005.

[67] B. D. Pitt and D. S. Kitschen, Application of Data Mining Techniques to Load Profiling, in Proc. of the 21st IEEE International Conference on Power Industry Computer Applications, Santa Clara, California, U.S.A., May 1999.

[68] M. Halkidi, Y. Batistakis, and M. Vazirgiannis, On Clustering Validation Techniques, Journal of Intelligent Information Systems, vol. 17, no. 2, Dec. 2001, pp. 107-145.

[69] L. Bjornar and A. Chinatsu, Fast and Effective Text Mining using Linear-Time Document Clustering, in Proc. of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, United States, Aug. 1999.

[70] M. Steinbach, G. Karypis, and V. Kumar, A Comparison of Document Clustering Techniques, in Proc. of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, U.S.A., Aug. 2000.

[71] J. Daxin, T. Chun, and Z. Aidong, Cluster Analysis for Gene Expression Data: A Survey, IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 11, Nov. 2004, pp. 1370-1386.

[72] P. C. H. Ma, K. C. C. Chan, Y. Xin, and D. K. Y. Chiu, An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis, IEEE Transactions on Evolutionary Computation, vol. 10, no. 3, June 2006, pp. 296-314.

[73] S. Ray and R. H. Turi, Determination of Number of Clusters in K-Means Clustering and Application in Color Image Segmentation, in Proc. of the 4th International Conference on Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), Calcutta, India, July 1999.

[74] U. Maulik and S. Bandyopadhyay, Performance Evaluation of Some Clustering Algorithms and Validity Indices, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, Dec. 2002, pp. 1650-1654.

[75] J. Han and M. Kamber, Data Mining: Concepts and Techniques, San Francisco, California: Morgan Kaufmann Publishers, 2001.

[76] G. Karypis, H. Eui-Hong, and V. Kumar, Chameleon: Hierarchical Clustering using Dynamic Modeling, Computer, vol. 32, no. 8, Aug. 1999, pp. 68-75.

[77] D. Fisher, X. Ling, J. R. Carnes, Y. Reich, J. Fenves, J. Chen, R. Shiavi, G. Biswas, and J. Weinberg, Applying AI Clustering to Engineering Tasks, IEEE Expert, vol. 8, no. 6, Dec. 1993, pp. 51-60.

[78] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, Consumers' Load Profile Determination Based on Different Classification Methods, presented at the IEEE Power Engineering Society General Meeting, Toronto, Ont., Canada, Jul. 2003.

[79] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, An Approach to Customers Daily Load Profile Determination, presented at the IEEE Power Engineering Society Summer Meeting, Chicago, IL, U.S.A., July 2002.

[80] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, Hierarchic Clustering Methods for Consumers Load Profile Determination, in Proc. of the 2nd Balkan Power Conference, Belgrade (YU), May 2002.

[81] D. Gerbec, S. Gasperic, I. Smon, and F. Gubina, A Methodology to Classify Distribution Load Profiles, in Proc. of the Asia Pacific Conference and Exhibition of the IEEE Power Engineering Society on Transmission and Distribution, Yokohama, Japan, Oct. 2002.

[82] D. Gerbec and F. Gubina, Division of the "Rest Curve" into Consumers' Load Profiles using Linear Programming, in Proc. of the International Conference on Power System Technology (PowerCon), Singapore, Oct. 2004.

[83] G. Chicco, R. Napoli, P. Postolache, M. Scutariu, and C. Toader, Electric Energy Customer Characterization for Developing Dedicated Market Strategies, in Proc. of the IEEE Porto Power Tech Conference, Porto, Sept. 2001.

[84] G. Chicco, R. Napoli, and F. Piglione, Application of Clustering Algorithms and Self Organizing Maps to Classify Electricity Customers, in Proc. of the IEEE Bologna Power Tech Conference, Bologna, Italy, June 2003.

[85] G. Chicco, R. Napoli, and F. Piglione, Comparisons Among Clustering Techniques for Electricity Customer Classification, IEEE Transactions on Power Systems, vol. 21, no. 2, May 2006, pp. 1-8.

[86] G. Chicco, R. Napoli, and F. Piglione, Load Pattern Clustering for Short-Term Load Forecasting of Anomalous Days, in Proc. of the IEEE Porto Power Tech, Porto, Sept. 2001.

[87] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu, and C. Toader, Load Pattern-Based Classification of Electricity Customers, IEEE Transactions on Power Systems, vol. 19, no. 2, May 2004, pp. 1232-1239.

[88] G. Chicco, R. Napoli, F. Piglione, P. Postolache, M. Scutariu, and C. Toader, A Review of Concepts and Techniques for Emergent Customer Categorization, in Proc. of the Telmark Discussion Forum, London, U.K., Sept. 2002.

[89] A. K. Jain, M. N. Murty, and P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys (CSUR), vol. 31, no. 3, Sept. 1999, pp. 264-323.

[90] J. Vesanto and E. Alhoniemi, Clustering of the Self-Organizing Map, IEEE Transactions on Neural Networks, vol. 11, no. 3, May 2000, pp. 586-600.

[91] D. L. Davies and D. W. Bouldin, A Cluster Separation Measure, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, Apr. 1979, pp. 224-227.

[92] P. Mangiameli, S. K. Chen, and D. West, A Comparison of SOM Neural Network and Hierarchical Clustering Methods, European Journal of Operational Research, vol. 93, no. 2, Sept. 1996, pp. 402-417.

[93] I. E. Davidson, Evaluation and Effective Management of Non-Technical Losses in Electrical Power Networks, in Proc. of the IEEE AFRICON Conference in Africa, South Africa, Oct. 2002.

[94] D. Suriyamongkol, Non-Technical Losses in Electrical Power Systems, Master's Dissertation, Ohio University, United States, 2002.

[95] M. Poveda, A New Method to Calculate Power Distribution Losses in an Environment of High Unregistered Loads, in Proc. of the IEEE Transmission and Distribution Conference, New Orleans, LA, Apr. 1999.

[96] C. A. Dortolina and R. Nadira, The Loss that is Unknown is No Loss at All: A Top-Down/Bottom-Up Approach for Estimating Distribution Losses, IEEE Transactions on Power Systems, vol. 20, no. 2, May 2005, pp. 1119-1125.

[97] R. Nadira, F. F. Wu, D. J. Maratukulam, E. P. Weber, and C. L. Thomas, Bulk Transmission System Loss Analysis, IEEE Transactions on Power Systems, vol. 8, no. 2, May 1993, pp. 405-416.

[98] W. A. Doorduin, H. T. Mouton, R. Herman, and H. J. Beukes, Feasibility Study of Electricity Theft Detection using Mobile Remote Check Meters, in Proc. of the 7th AFRICON Conference in Africa, Sept. 2004.

[99] K. Sridharan and N. N. Schulz, Outage Management Through AMR Systems using an Intelligent Data Filter, IEEE Transactions on Power Delivery, vol. 16, no. 4, Oct. 2001, pp. 669-675.

[100] Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang, Survey of Fraud Detection Techniques, in Proc. of the IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan, Mar. 2004.

[101] V. J. Hodge and J. Austin, A Survey of Outlier Detection Methodologies, Artificial Intelligence Review, vol. 22, no. 2, Oct. 2004, pp. 85-126.

[102] Z. Ferdousi and A. Maeda, Unsupervised Outlier Detection in Time Series Data, in Proc. of the 22nd International Conference on Data Engineering Workshops, Atlanta, Georgia, U.S.A., Apr. 2006.

[103] E. Lozano and E. Acuna, Parallel Algorithms for Distance-Based and Density-Based Outliers, in Proc. of the 5th IEEE International Conference on Data Mining, Houston, Texas, U.S.A., Nov. 2005.

[104] D. Ren, I. Rahal, and W. Perrizo, A Vertical Outlier Detection Algorithm with Clusters as By-Product, in Proc. of the 16th IEEE International Conference on Tools with Artificial Intelligence, Boca Raton, Florida, U.S.A., Nov. 2004.

[105] D. Ren, B. Wang, and W. Perrizo, RDF: A Density-Based Outlier Detection Method using Vertical Data Representation, in Proc. of the 4th IEEE International Conference on Data Mining, Brighton, U.K., Nov. 2004.

[106] M. Markou and S. Singh, Novelty Detection: A Review - Part 1: Statistical Approaches, Signal Processing, vol. 83, no. 12, Dec. 2003, pp. 2481-2497.

[107] J. Laurikkala and M. Juhola, Hierarchical Clustering of Female Urinary Incontinence Data Having Noise and Outliers, Berlin: Springer Berlin / Heidelberg, 2001, pp. 161-167.

[108] M. Markou and S. Singh, Novelty Detection: A Review - Part 2: Neural Network Based Approaches, Signal Processing, vol. 83, no. 12, Dec. 2003, pp. 2499-2521.

[109] E. H. Feroz and T. M. Kwon, Self-Organizing Fuzzy and MLP Approaches to Detecting Fraudulent Financial Reporting, in Proc. of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering, New York City, NY, Mar. 1996.

[110] T. M. Kwon and E. H. Feroz, A Multilayered Perceptron Approach to Prediction of the SEC's Investigation Targets, IEEE Transactions on Neural Networks, vol. 7, no. 5, Sept. 1996, pp. 1286-1290.

[111] M. Taniguchi, M. Haft, J. Hollmen, and V. Tresp, Fraud Detection in Communication Networks using Neural and Probabilistic Methods, in Proc. of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, May 1998.

[112] J. R. Dorronsoro, F. Ginel, C. Sanchez, and C. S. Cruz, Neural Fraud Detection in Credit Card Operations, IEEE Transactions on Neural Networks, vol. 8, no. 4, July 1997, pp. 827-834.

[113] S. Ghosh and D. L. Reilly, Credit Card Fraud Detection with a Neural-Network, in Proc. of the 27th Hawaii International Conference on System Sciences, Information System, Decision Support and Knowledge-Based Systems, vol. 3, Wailea, HI, Jan. 1994.

[114] J. Takeuchi and K. Yamanishi, A Unifying Framework for Detecting Outliers and Change Points from Time Series, IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 4, Apr. 2006, pp. 482-492.

[115] M. Weatherford, Mining for Fraud, IEEE Intelligent Systems, vol. 17, no. 4, Jul./Aug. 2002, pp. 4-6.

[116] T. Fawcett and F. Provost, Adaptive Fraud Detection, Data Mining and Knowledge Discovery, vol. 1, no. 3, Sept. 1997, pp. 291-316.

[117] PEA, Internal Annual Report 2001, Provincial Energy Authority (PEA) of Thailand, Department of Power Economics, pp. 24-60.

[118] TNB, Annual Report Tenaga Nasional Berhad 2007, Tenaga Nasional Berhad, Malaysia, 2007.

[119] Energy Sector Unit, Europe and Central Asia Region, Non-Payment in the Electricity Sector in Eastern Europe and the Former Soviet Union, World Bank Technical Paper No. 423, The World Bank, June 1999, pp. 3-33.

[120] International Utilities Revenue Protection Association (IURPA), Electricity Theft Related News Group Posts, vol. 1, 2002.

[121] Electricity Metering Subcommittee of the IEEE Committee on Power System Instrumentation and Measurements, Progress in the Art of Metering Electric Energy, IEEE, 3rd edition, Dec. 1969, pp. 1-5.

[122] David Dahle, 2004. An online reference on US-made Watt-hour meters, http://www.watthourmeters.com. Accessed on 19 February, 2009.

[123] A. E. Fitzgerald, Charles Kingsley, and Stephen D. Umans, Electric Machinery, 5th edition, McGraw-Hill Company, 1992, pp. 51-87.

[124] TNB, Annual Report Tenaga Nasional Berhad 2006, Tenaga Nasional Berhad, Malaysia, 2006.

[125] TNB, Annual Report Tenaga Nasional Berhad 2005, Tenaga Nasional Berhad, Malaysia, 2005.

[126] TNB, Annual Report Tenaga Nasional Berhad 2004, Tenaga Nasional Berhad, Malaysia, 2004.

[127] F. E. Jin, 2004. Impianas Gets Extension for TNB Project, New Straits Times. http://www.highbeam.com/doc/1P1-96215780.html. Accessed on March 5, 2009.

[128] R. Damodaran, 2004. Leader in Electricity Metering, Card-based Revenue Collection, New Straits Times. http://www.highbeam.com/doc/1P1-90630620.html. Accessed on March 5, 2009.

[129] S. S. A. Naser and A. Z. A. Ola, An Expert System for Diagnosing Eye Diseases using Clips, Journal of Theoretical and Applied Information Technology, vol. 4, no. 10, 2008, pp. 923-930.

[130] John McCarthy, 2007. Homepage - What is Artificial Intelligence?. http://www-formal.stanford.edu/jmc/whatisai/whatisai.html. Accessed on March 17, 2009.

[131] J. McCarthy, M. L. Minsky, N. Rochester, and C. E. Shannon, A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955.

[132] D. Crevier, AI: The Tumultuous Search for Artificial Intelligence, New York, NY: BasicBooks, 1993.

[133] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd edition, Upper Saddle River, NJ: Prentice Hall, 2003.

[134] H. Moravec, Mind Children: The Future of Robot and Human Intelligence, Harvard University Press, 1988.

[135] A. Hodges, Alan Turing: The Enigma, Walker and Company, New York, 2000.

[136] John McCarthy, 2007. Basic Questions - What is Artificial Intelligence?. http://www-formal.stanford.edu/jmc/whatisai/node1.html. Accessed on March 23, 2009.

[137] National Research Council, Developments in Artificial Intelligence, Funding a Revolution: Government Support for Computing Research, National Academy Press, Washington D.C., 1999.

[138] N. M. Barnes and Z. Q. Liu, Vision Guided Circumnavigating Autonomous Robots, International Journal of Pattern Recognition and Artificial Intelligence, vol. 14, no. 6, 2000, pp. 689-714.

[139] A. Zeichick, A. Rice, Improving Decisions and Increasing Customer


Satisfaction with Intelligent Systems, MindBox Inc., July 2000.
[140] N.J. Nilsson, 1996. Introduction to Machine Learning A rough draft of a
textbook.
http://robotics.stanford.edu/people/nilsson/mlbook.html.
Accessed on March 28, 2009.
[141] T. Mitchell, Machine Learning, McGraw Hill, New York, U.S.A., 1997.
[142] E. Alpaydin, Introduction to Machine, The MIT Press, U.S.A., 2004.
[143] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition,
Wiley, New York, 2001.
[144] M.-F. Moens, Information Extraction: Algorithms and Prospects in a
Retrieval Context, Springer, 1st edition, 2006, pp. 65-67.
[145] J. C. Isaacs, S. Y. Foo, and A. Meyer-Baese, Novel Kernels and Kernel PCA for Pattern Recognition, in Proc. of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, Jacksonville, FL, U.S.A., June 20-23, 2007, pp. 438-443.
[146] P. J. García-Laencina, J. Serrano-García, A. R. Figueiras-Vidal, and J.-L. Sancho-Gómez, Multi-task Neural Networks for Dealing with Missing Inputs, in Proc. of the 2nd International Work-Conference on the Interplay Between Natural and Artificial Computation, Spain, 18-21 June, 2007, pp. 282-291.
[147] H. Peng, F. Long, and C. Ding, Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, 2005, pp. 1226-1238.
[148] D. Morariu, L. N. Vintan, and V. Tresp, Feature Selection Methods for an
Improved SVM Classifier, International Journal of Intelligent Systems and
Technologies, vol. 1, no. 4, 2006, pp. 288-298.
[149] M. D. Boardman, Extrinsic Regularization in Parameter Optimization for
Support Vector Machines, Masters Dissertation, Dalhousie University,
Canada, 2006.
[150] O. Chapelle, V. N. Vapnik, O. Bousquet, and S. Mukherjee, Choosing
Multiple Parameters for Support Vector Machines, Machine Learning, vol.
46, no. 1-3, 2002, pp. 131-159.
[151] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector
Machines and Other Kernel-based Learning Methods, Cambridge
University Press, Cambridge, U.K., 2000.

[152] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995.
[153] R. A. Wilson, and F. C. Keil, The MIT Encyclopedia of the Cognitive Sciences, Cambridge, Massachusetts, The MIT Press, 1999.
[154] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford
University Press, Oxford, U.K., 1995.
[155] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge
University Press, Cambridge, U.K., 1996.
[156] F. Filippetti, G. Franceschini, C. Tassoni, and P. Vas, AI Techniques in Induction Machines Diagnosis Including the Speed Ripple Effect, IEEE Transactions on Industry Applications, vol. 34, no. 1, Jan./Feb. 1998, pp. 98-108.
[157] K. Khatatneh, and T. Mustafa, Software Reliability Modeling using Soft Computing Technique, European Journal of Scientific Research, vol. 26, no. 1, 2009, pp. 154-160.
[158] G. Klir, and B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall, Upper Saddle River, New Jersey, U.S.A., 1995.
[159] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems
Approach to Machine Intelligence, Englewood Cliffs, New Jersey, Prentice
Hall, 1992.
[160] M. E. H. Benbouzid, and H. Nejjari, A Simple Fuzzy Logic Approach for Induction Motors Stator Condition Monitoring, in Proc. of the IEEE International Electric Machines and Drives Conference, 2001, pp. 634-639.
[161] L.-P. Li, J. Ma, N. Zhao, Z. Zhao, J.-Z. Liu, Probabilistic Model-Based
Degradation Diagnosing of Thermal System and Simulation Test, in Proc.
of the 2005 International Conference on Machine Learning and
Cybernetics, 13-16 Aug. 2006, pp. 1483-1486.

[162] S. Chen, and G. Jiang, The Prediction Model of Multiple Myeloma Based on
the BP Artificial Neural Network, in Proc. of the International Conference
on Technology and Applications in Biomedicine, 30-31 May 2008, pp.
380-382.
[163] D. T. Pham, and P. T. N. Pham, Artificial Intelligence in Engineering, International Journal of Machine Tools and Manufacture, vol. 39, no. 6, 1999, pp. 937-949.

[164] A. Siddique, G. S. Yadava, and B. Singh, Applications of Artificial Intelligence Techniques for Induction Machine Stator Fault Diagnostics: Review, in Proc. of the 4th IEEE International Symposium on Diagnostics for Electric Machines, Power Electronics and Drives, Atlanta, GA, 24-26 August 2003, pp. 29-34.
[165] E. F. Wanner, F. G. Guimarães, R. H. C. Takahashi, and P. J. Fleming, Local Search with Quadratic Approximations into Memetic Algorithms for Optimization with Multiple Criteria, Evolutionary Computation, vol. 16, no. 2, 2008, pp. 185-224.
[166] C. Patel, and A. Kumar, Map Colour Problem using Genetic Algorithm
Approach, in Proc. of the 2nd National Conference on Challenges and
Opportunities in Information Technology, Mandi Gobindgarh, India, 29
March 2008, pp. 247-250.
[167] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine
Learning, Addison-Wesley, Reading, Massachusetts, 1989.
[168] D. W. Coit, A. E. Smith, and D. Tate, Adaptive Penalty Methods for Genetic
Optimization of Constrained Combinatorial Problems, INFORMS Journal
on Computing, vol. 8, no. 2, 1996, pp. 173-182.
[169] V. Vapnik, Estimation of Dependences Based on Empirical Data,
Springer-Verlag, New York, 1982.
[170] H. L. Pok, K. S. Yap, I. Z. Abidin, A. H. Hashim, Z. F. Hussien, and A. M.
Mohamad, Abnormalities and Fraud Electric Meter Detection using
Hybrid Support Vector Machine and Modified Genetic Algorithm, in Proc.
of the 19th International Conference on Electricity Distribution, CIRED,
Vienna, 21-24 May, 2007.
[171] S. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Image, Speech and Intelligent Systems Research Group, University of Southampton, U.K., 1997.
[172] C. Cortes, Prediction of Generalisation Ability in Learning Machines, Ph.D
Thesis, University of Rochester, U.S.A., 1995.
[173] B. Schölkopf, Support Vector Learning, R. Oldenbourg Verlag, München, 1997.
[174] C. Cortes, and V. Vapnik, Support Vector Networks, Machine Learning, vol. 20, no. 3, 1995, pp. 273-297.
[175] V. Blanz, B. Schölkopf, H. Bülthoff, C. J. C. Burges, V. Vapnik, and T. Vetter, Comparison of View-Based Object Recognition Algorithms using Realistic 3D Models, in Proc. of the International Conference on Artificial Neural Networks, Springer Verlag, Berlin, 1996.
[176] M. Schmidt, Identifying Speakers with Support Vector Machines, in Proc. of Interface, Sydney, 1996.
[177] E. Osuna, R. Freund, and F. Girosi, Training Support Vector Machines: An
Application to Face Detection, in Proc. of Computer Vision and Pattern
Recognition, 1997, pp. 130-136.
[178] G. Dror, R. Sorek, and S. Shamir, Accurate Identification of Alternatively Spliced Exons using Support Vector Machine, Bioinformatics, vol. 21, no. 7, 2005, pp. 897-901.
[179] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern
Recognition, Data Mining and Knowledge Discovery, vol. 2, no. 2, 1998,
pp. 121-167.
[180] B. Schölkopf, C. J. C. Burges, and A. J. Smola, Introduction to Support Vector Learning, in Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Massachusetts, 1999.
[181] A. J. Smola, Learning with Kernels, Ph.D Thesis, University of Berlin,
Berlin, 1998.
[182] A. J. Smola, and B. Schölkopf, On a Kernel-Based Method for Pattern Recognition, Regression, Approximation and Operator Inversion, Algorithmica, vol. 22, 1998, pp. 211-231.
[183] V. Vapnik, and A. Chervonenkis, A Note on One Class of Perceptrons, Automation and Remote Control, vol. 25, 1964.
[184] V. Vapnik, and A. Chervonenkis, Theory of Pattern Recognition [in Russian], Nauka, Moscow, 1974. (German translation: W. Wapnik and A. Tscherwonenkis, Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979.)
[185] R. Fletcher, Practical Methods of Optimization. John Wiley and Sons,
Inc., 2nd edition, 1987.
[186] A. J. Smola, and B. Schölkopf, A Tutorial on Support Vector Regression, Statistics and Computing, vol. 14, no. 3, 2004, pp. 199-222.
[187] J. C. Platt, Sequential Minimal Optimization: A Fast Algorithm for Training
Support Vector Machines, Technical Report MSR-TR-98-14, Microsoft
Research Center, Redmond, U.S.A., 1998.
[188] J. C. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization, in A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., Advances in Large Margin Classifiers, pp. 185-208, MIT Press, Cambridge, MA, U.S.A., 1999.

[189] G. P. McCormick, Nonlinear Programming: Theory, Algorithms and Applications, John Wiley and Sons, Inc., 1983.
[190] B. E. Boser, I. M. Guyon, and V. N. Vapnik, A Training Algorithm for
Optimal Margin Classifiers, in Proc. of the 5th Annual Workshop on
Computational Learning Theory, Pittsburgh, PA, July 1992, pp. 144-152.
[191] R. Courant, and D. Hilbert, Methods of Mathematical Physics, vol. 1.
Interscience Publishers, Inc., New York, 1953.
[192] S. Degroeve, K. Tanghe, B. D. Baets, M. Leman, and J.-P. Martens, A
Simulated Annealing Optimization of Audio Features for Drum
Classification, in Proc. of the 6th International Conference on Music
Information Retrieval, London, U.K., September, 2005, pp. 482-487.
[193] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, A Practical Guide to Support Vector
Classification, Technical Report, Department of Computer Science and
Information Engineering, National Taiwan University, Taipei, 2003.
[194] H.-Q. Wang, D.-S. Huang, and B. Wang, Optimisation of Radial Basis
Function Classifiers using Simulated Annealing Algorithm for Cancer
Classification, Electronics Letters, vol. 41, no. 11, 2005, pp. 630-632.
[195] T.-T. Frieß, N. Cristianini, and C. Campbell, The Kernel-Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines, in Proc. of the 15th International Conference on Machine Learning, San Francisco, California, pp. 188-196.
[196] J. C. Platt, Probabilistic Output for Support Vector Machines and Comparison to Regularized Likelihood Methods, in A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, U.S.A., 1999.
[197] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy, Improvements to Platt's SMO Algorithm for SVM Classifier Design, Neural Computation, vol. 13, no. 3, 2001, pp. 637-649.
[198] A. M. Mohamad, Development of an Intelligent System for Detection of
Abnormalities and Probable Fraud by Metered Customers in TNB
Distribution Division, R & D Project Proposal, TNB Research Sdn. Bhd.,
Malaysia, 2007.
[199] C.-C. Chang and C.-J. Lin, 2005. LIBSVM: A Library for Support Vector
Machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm. Accessed on
January 28, 2008.
[200] R. Kohavi, A Study of Cross-Validation and Bootstrap for Accuracy
Estimation and Model Selection, in Proc. of the 14th International Joint
Conference on Artificial Intelligence, 1995.
[201] P. A. Devijver, and J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[202] K. K. Kumar, and P. S. Shelokar, An SVM Method using Evolutionary
Information for the Identification of Allergenic Proteins, Bioinformation,
vol. 2, no. 6, 2008, pp. 253-256.
[203] T. F. Wu, C.-J. Lin, and R. C. Weng, Probability Estimates for Multiclass
Classification by Pairwise Coupling, Journal of Machine Learning
Research, vol. 5, 2004, pp. 975-1005.
[204] M. A. Oskoei, and H. Hu, Support Vector Machine-Based Classification
Scheme for Myoelectric Control Applied to Upper Limb, IEEE Transactions
on Biomedical Engineering, vol. 55, no. 8, 2008, pp. 1956-1965.
[205] L. A. Zadeh, Fuzzy Sets, Information and Control, vol. 8, 1965, pp. 338-353.
[206] O. Castillo, and P. Melin, Soft Computing for Control of Non-Linear Dynamical Systems, Physica-Verlag, Heidelberg, New York, 2001.
[207] L. A. Zadeh, Similarity Relations and Fuzzy Ordering, Journal of
Information Sciences, vol. 3, 1971, pp. 177-206.
[208] J.-S. R. Jang, ANFIS: Adaptive-Network-Based Fuzzy Inference System, IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, no. 3, 1993, pp. 665-683.
[209] C.-C. Lee, Fuzzy Logic in Control Systems: Fuzzy Logic Controller - Part I, IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 2, 1990, pp. 404-418.
[210] C.-C. Lee, Fuzzy Logic in Control Systems: Fuzzy Logic Controller - Part II, IEEE Transactions on Systems, Man, and Cybernetics, vol. 20, no. 2, 1990, pp. 419-435.
[211] Y. Tsukamoto, An Approach to Fuzzy Reasoning Method, in M. M. Gupta,
R. K. Ragade, and R. R. Yager, Eds., Advances in Fuzzy Set Theory and
Applications, Amsterdam: North-Holland, 1979, pp. 137-149.
[212] T. Takagi and M. Sugeno, Derivation of Fuzzy Control Rules from Human Operator's Control Actions, in Proc. of IFAC Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, July 1983, pp. 55-60.
[213] T. Mattfeldt, H.-W. Gottfried, M. Burger, and H. A. Kestler, Classification of Prostatic Cancer using Artificial Neural Networks, in G. A. Losa, D. Merlini, T. F. Nonnenmacher, and E. R. Weibel, editors, Fractals in Biology and Medicine, Volume III, pp. 101-111, Birkhäuser, Basel, 2002.
[214] R. Benbernou, and K. Warwick, A Fuzzy Multi-Criteria Decision Approach for Enhanced Auto-Tracking of Seismic Events, in Proc. of the IEEE International Conference on Signal Processing and Communications, Dubai, 24-27 Nov. 2007, pp. 1331-1334.
[215] J. Wen, J. L. Zhao, S. W. Luo and Z. Han, The Improvements of BP Neural
Network Learning Algorithm, in Proc. of the 5th International
Conference on Signal Processing Proceedings, Beijing, China, vol. 3, 2000,
pp. 1647-1649.
[216] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme Learning Machine: Theory and Applications, Neurocomputing, vol. 70, no. 1-3, May 2006, pp. 489-501.
[217] G.-B. Huang and C.-K. Siew, Extreme Learning Machine: RBF Network Case, in Proc. of the 8th International Conference on Control, Automation, Robotics and Vision, Kunming, China, Dec. 2004.
[218] G.-B. Huang, L. Chen, and C.-K. Siew, Universal Approximation using
Incremental Constructive Feedforward Networks with Random Hidden
Nodes, IEEE Transactions on Neural Networks, vol. 17, no. 4, Jul. 2006,
pp. 879-892.
[219] G.-B. Huang, Q.-Y. Zhu, K. Z. Mao, C.-K. Siew, P. Saratchandran, and N.
Sundararajan, Can Threshold Networks be Trained Directly?, IEEE
Transactions on Circuits and Systems - II: Express Briefs, vol. 53, no. 3,
Mar. 2006, pp. 187-191.
[220] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, in Proc. of the International Joint Conference on Neural Networks, Budapest, Hungary, Jul. 2004.
[221] G.-B. Huang and H. A. Babri, Upper Bounds on the Number of Hidden
Neurons in Feedforward Networks with Arbitrary Bounded Nonlinear
Activation Functions, IEEE Transactions on Neural Networks, vol. 9, no.
1, Jan. 1998, pp. 224-229.
[222] G.-B. Huang, Learning Capability and Storage Capacity of Two-Hidden-Layer Feedforward Networks, IEEE Transactions on Neural Networks, vol. 14, no. 2, Mar. 2003, pp. 274-281.
[223] N.-Y. Liang, G.-B. Huang, P. Saratchandran, and N. Sundararajan, A Fast
and Accurate Online Sequential Learning Algorithm for Feedforward
Networks, IEEE Transactions on Neural Networks, vol. 17, no. 6, Nov.
2006, pp. 1411-1423.

[224] W. H. Wolberg and O. L. Mangasarian, Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology, in Proc. of the National Academy of Sciences, U.S.A., vol. 87, Dec. 1990, pp. 9193-9196.
[225] A. Osareh, M. Mirmehdi, B. T. Thomas, and R. Markham, Comparative
Exudate Classification using Support Vector Machines and Neural
Networks, in Proc. of the 5th International Conference on Medical Image
Computing and Computer-Assisted Intervention, Tokyo, Japan, Sept.
2002, pp. 413-420.


APPENDICES


APPENDIX A
LIBSVM Copyright Notice

The intelligent fraud detection software produced in this research study, the "Abnormality and Fraud Detection System" (AFDS), was developed by the Power Engineering Centre (PEC) of Universiti Tenaga Nasional (UNITEN) for TNB Distribution (TNBD) Sdn. Bhd. and TNB Research (TNBR) Sdn. Bhd. The AFDS incorporates LIBSVM, a library for support vector machines developed by Chih-Chung Chang and Chih-Jen Lin. The LIBSVM copyright notice is reproduced at the end of this appendix.
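For illustration only, the sketch below shows one typical way LIBSVM's Python interface (svmutil) can be used to train and apply a binary support vector classifier of the kind employed for NTL detection. It is not the actual AFDS code; the file names, kernel parameters, and import path are assumptions about a standard LIBSVM installation.

# Illustrative sketch only (not the AFDS implementation): training and applying
# a binary SVC with LIBSVM's Python bindings. File names and parameters are
# hypothetical; the import path assumes the "libsvm" Python package layout.
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

# Load training and test data in LIBSVM's sparse text format
# (label index1:value1 index2:value2 ...), one customer profile per line.
y_train, x_train = svm_read_problem('train_profiles.libsvm')
y_test, x_test = svm_read_problem('test_profiles.libsvm')

# Train a C-SVC with an RBF kernel and probability estimates enabled
# (-s 0: C-SVC, -t 2: RBF kernel, -c/-g: cost and gamma, -b 1: probabilities).
model = svm_train(y_train, x_train, '-s 0 -t 2 -c 10 -g 0.5 -b 1')

# Predict labels and class probabilities for the test set; the accuracy tuple
# is computed against the supplied test labels.
labels, accuracy, probabilities = svm_predict(y_test, x_test, model, '-b 1')
print('Predicted labels:', labels[:10])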

Copyright (c) 2000-2008 Chih-Chung Chang and Chih-Jen Lin
All rights reserved.
Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following
conditions are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
3. Neither name of copyright holders nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED.
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.


APPENDIX B
Related List of Publications

[1] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed, Farrukh Nagi, "A Computational Intelligence Scheme for Prediction of the Daily Peak Load", submitted for second review to Applied Soft Computing (ASOC) on 10 August 2010. Manuscript Reference No: ASOC-D-09-00556.

[2] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed, Farrukh Nagi, "Improving SVM-based Nontechnical Loss Detection in Power Utility Using Fuzzy Inference System", accepted for publication in IEEE Transactions on Power Delivery on 22 June 2010. Manuscript ID: PESL-00108-2009.R2.

[3] Mohammad Mehdi Badjian, Jawad Nagi, Sieh Kiong Tiong, Keem Siah Yap, Siaw Paw Koh, Farrukh Nagi, "Comparison of Supervised Learning Techniques for Non-Technical Loss Detection in Power Utility", submitted to Malaysian Journal of Computer Science (MJCS) for first review on 9 April 2010.

[4] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Syed Khaleel Ahmed, and Malik Mohammad, "Nontechnical Loss Detection for Metered Customers in Power Utility Using Support Vector Machines", IEEE Transactions on Power Delivery, vol. 25, no. 2, pp. 1162-1171, Apr. 2010.

[5] Farrukh Nagi, Syed Khaleel Ahmed, Jawad Nagi, "Fuzzy Time-Optimal Controller (FTOC) for Second Order Nonlinear Systems", submitted to IEEE Transactions on Systems, Man, and Cybernetics: Part B for first review on 30 November 2009. Paper No: SMCB-E-2009-11-1046.

[6] Farrukh Nagi, Logah Perumal, and Jawad Nagi, "A New Integrated Fuzzy Bang-Bang Relay Control System", Mechatronics, vol. 19, no. 5, pp. 748-760, Aug. 2009.

[7] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, Abdul Malik Mohammad, and Syed Khaleel Ahmed, "Non-Technical Loss Analysis for Detection of Electricity Theft using Support Vector Machines", in Proc. of the 2nd IEEE International Power and Energy Conference (PECON) 2008, Dec. 1-3, 2008, Johor Bahru, Malaysia, pp. 907-912.

[8] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Syed Khaleel Ahmed, "Detection of Abnormalities and Electricity Theft using Genetic Support Vector Machines", in Proc. of the IEEE Region 10 Conference (TENCON) 2008, Nov. 18-21, 2008, Hyderabad, India, pp. 1-6.

[9] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Abdul Malik Mohammad, "Intelligent System for Detection of Abnormalities and Theft of Electricity using Genetic Algorithm and Support Vector Machines", in Proc. of the 4th International Conference on Information Technology and Multimedia at UNITEN, Nov. 18-19, 2008, Bandar Baru Bangi, Selangor, Malaysia, pp. 122-127.

[10] Jawad Nagi, Keem Siah Yap, Sieh Kiong Tiong, and Syed Khaleel Ahmed, "Electrical Power Load Forecasting using Hybrid Self-Organizing Maps and Support Vector Machines", in Proc. of the 2nd International Power Engineering and Optimization Conference, Jun. 4-5, 2008, Shah Alam, Malaysia, pp. 51-56.


BIODATA OF THE AUTHOR

Jawad Nagi was born in Karachi, Pakistan on March 23, 1985. He received his Bachelor's degree with Honors in Electrical and Electronics Engineering from Universiti Tenaga Nasional (UNITEN), Malaysia, in 2007. He has been working as a Project Engineer in the Power Engineering Centre (PEC) of UNITEN since January 2008 and is currently pursuing his Master's degree in Electrical Engineering under a collaborative project between TNB Distribution (TNBD) Sdn. Bhd., TNB Research (TNBR) Sdn. Bhd. and the PEC of UNITEN, titled "Development of an Intelligent System for Detection of Abnormalities and Probable Fraud by Metered Customers in TNB Distribution Division". His research interests include pattern recognition, image processing and recognition, load forecasting, fuzzy logic, neural networks and support vector machines.
