Power System Security Assessment: Application of Learning Algorithms
Christian Andersson
Licentiate Thesis
Department of Industrial Electrical Engineering
and Automation
Department of
Industrial Electrical Engineering and Automation
Lund University
Box 118
SE-221 00 LUND
SWEDEN
http://www.iea.lth.se
ISBN 91-88934-39-X
CODEN:LUTEDX/(TEIE-1047)/1-90/(2005)
School of Technology and Society
Malmö University
Abstract
The blackouts of recent years have indicated that the operation and control of
power systems may need to be improved. Even though a lot of data was available,
the operators at different control centers did not take the proper actions in
time to prevent the blackouts. This is partly due to the reorganization of
the control centers after the deregulation and partly due to the lack of reliable
decision support systems when the system is close to instability. Motivated by
these facts, this thesis focuses on applying statistical learning algorithms
to identify critical states in power systems. Instead of using a model of
the power system to estimate the state, measured variables are used as input
data to the algorithm. The algorithm separates secure from insecure states
of the power system using the measured variables directly. The algorithm is
trained beforehand with data from a model of the power system.
The thesis uses two techniques, principal component analysis (PCA) and
support vector machines (SVM), to classify whether the power system
can withstand an (n−1) fault under a variety of operational conditions.
The result of the classification with PCA is not satisfactory, and the technique
is not appropriate for the classification problem. The result with SVM is
much more satisfying, and the support vectors can be used on-line to
determine if the system is moving into a dangerous state; thus the operators
can be supported at an early stage and proper actions can be taken.
In this thesis it is shown that the scaling of the variables is important for
successful results. The measured data include angle differences, busbar voltages
and line currents. Due to the different units, such as kV, kA and MW, the data
must be preprocessed to obtain satisfactory classification results. A
new technique for finding the most important variables to measure or supervise
is also presented. Guided by the support vectors, the variables which have a
large influence on the classification are indicated.
Contents
Abstract iv
Preface 1
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . 1
1 Introduction 3
1.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Objectives and contributions of the thesis . . . . . . . . . . 6
1.3 Outline of the thesis . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Data set 15
3.1 Data set structure . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 The NORDEL power system . . . . . . . . . . . . . . . . . 16
3.3 NORDIC32 . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 The 2k factorial design . . . . . . . . . . . . . . . . . . . . 21
3.5 Dimension criterion . . . . . . . . . . . . . . . . . . . . . . 22
3.6 Training set . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7 Classification set . . . . . . . . . . . . . . . . . . . . . . . 24
3.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2 Number of principal components . . . . . . . . . . . . . . . 29
4.3 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 Scaling to keep the physical properties . . . . . . . . 31
4.4 Classifying power system stability with PCA . . . . . . . . 32
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6 Conclusions 75
6.1 Summary of the results . . . . . . . . . . . . . . . . . . . . 75
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Bibliography 77
Preface
During the final year of my B.Sc. studies in 2001 I was introduced to the
challenges related to planning and operation of power systems by my current
PhD advisor Bo Eliasson. I was studying computer and electrical engineering
at Campus Norrköping, Linköping Institute of Technology. When planning
my undergraduate project I contacted Daniel Karlsson at ABB Technology
Products and came to work with wide area protection systems against
voltage collapse. The project was supervised by Bo Eliasson at Malmö
University. It was during my undergraduate project that I became aware of the
special problems with multi-input multi-output (MIMO) systems; many of the
power systems of today are probably the largest MIMO systems ever built by
mankind. In the summer of 2002 I was appointed to a PhD position at Malmö
University and was accepted to the post-graduate study program at Lund
Institute of Technology.
Acknowledgement
Introduction
In August and September 2003 three major blackouts occurred in N.E. North
America, S. Scandinavia and Italy. About 110 million people were affected
by these events. The interruption time ranged from a couple of hours to days.
These blackouts occurred partly because of the lack of supportive applications
when the system is close to instability. If the power system is close to the
stability limit, actions must be taken by system operators to counteract a
prospective blackout. Today a large problem for system operators is to
identify critical states, since there are thousands of parameters which
describe and affect the state.
The deregulation of the power markets has contributed to the fact that
power systems today are frequently operated close to the stability limit. This is
due to a larger focus on economical profits instead of system security and
preventive maintenance.
Over the last three years the consumption of electrical power in the world has
increased by 3 % each year. That corresponds to investments of 60 billion
USD each year in production and transmission equipment to match the
increased consumption. If the trend continues, actions must be taken to
maintain system stability.
The latest blackouts and the deregulated power markets contribute to an
increased need for Wide Area Protection Schemes (WAPS). The WAPS
will make it possible to change the operation criterion from "The power
system should withstand the most severe credible contingency" to "The power
system should withstand the most severe credible contingency, followed by
protective actions from the WAPS."
4 Chapter 1. Introduction
This thesis deals with the problem of classification of power system stability.
An overview of power system stability and design can be found in the
standard textbooks [13] and [22]. In [23] the CIGRE Study Committee 38
and the IEEE Power System Dynamic Performance Committee give a background
and overview of power system stability with the focus on definition
and classification. A nice introduction to power system voltage stability can
be found in [34], which has a detailed description of reactive power
compensation and control. These books and reports give a good overview of,
and background to, power system design and control.
Previous work on power system stability has been focused on identification
and on finding methods to prevent instability. In [6] and [26] the problem
of voltage instability is analyzed; these two theses are part of the previous
work that inspired this thesis. Another work that has been
a great inspiration to this thesis is the research by Louis Wehenkel, which is
summarized in [39]. Wehenkel uses the same approach to the stability
problem as this thesis, but he uses decision trees instead of principal component
analysis and support vector machines.
Separation with a hyperplane has been known for quite a long time,
but the calculation of the hyperplane has caused problems.
In 1992 Support Vector Machines (SVM) were introduced by Vapnik, see
[37]; they are described in detail in [36]. The SVM calculates the hyperplane with
the maximal margin between the two classes. This is formulated as a global
optimization problem on quadratic form, which can easily be solved.
In recent years SVM has been used to solve many classification
and regression problems. In [16] and [31], SVM has been applied
to face detection with successful results. Text categorization has also been
improved with SVM, see [18]. In [19], windows are detected in pictures
of city scenes. The author has found one publication which deals with
classification of power system stability using SVM, see [30]. In that article,
classification of power system stability is performed with nonlinear SVM and
neural networks. The data set which is used has an impressive size, and the
result of the classification is very successful.
People with different backgrounds should be able to read and comprehend the
thesis. This means that some readers will consider some parts of the thesis
obvious. However, if the reader is not familiar with power system fundamentals,
an introduction can be found in [13] and [22]. The main contributions of the
thesis are listed below.
The thesis consists of two parts, the design of the data set and two different
methods applied to the classification problem.
In the first part, Chapter 2, an overview of power system stability and op-
eration is given. Furthermore, the contribution of this thesis to the power sta-
bility problem is described. In Chapter 3, a short summary of the NORDEL
power system and the operation rules is given. Thereafter, an introduction to
design of experiments with factorial design is given. Finally, the design of
the data set is described.
The second part consists of Chapter 4 and Chapter 5. In Chapter 4 the
unsupervised learning method principal component analysis is applied to the
classification problem and in Chapter 5 the supervised learning method of
calculating a hyperplane with support vector machines is applied to the clas-
sification problem. Chapter 6 summarizes the results and suggests future
work.
1.4 Publications
This chapter gives a brief overview of power system operation and the
security limits that are imposed to maintain system stability in stressed
situations.
The stability of power systems is defined as: "The ability of an electric power
system, for a given initial operating condition, to regain a state of operating
equilibrium after being subjected to a physical disturbance, with most system
variables bounded so that practically the entire system remains intact"
[23]. This means that some faults should not cause any instability problems.
Figure 2.1 shows different stability states. Due to the complexity of power
systems, the different instabilities affect each other. A blackout scenario
often starts with a single stability problem, but after the initial phase other
instability problems may occur. Therefore the actions needed to prevent a
blackout in stressed situations may differ. For a detailed description of
the different stability scenarios, see [23]. Due to mainly economical factors
there is a limit on which faults the system should withstand and remain
stable.
The synchronous NORDEL system (Sweden, Norway, Finland and Zealand)
is dimensioned to fulfill the (n−1) criterion, which means that the system should
withstand a single fault, e.g. a short circuit of a busbar or the disconnection of
a nuclear unit, see [1]. Figure 2.2 illustrates the different operative states
and how the power system's operation state can change due to a disturbance.
10 Chapter 2. Power system control
[Figure 2.1 here: classification of power system stability into short-term and long-term categories.]
In the secure state, where the operating point of a power system should
normally be, it is possible to maintain system stability after an (n − 1) fault. It is
very important to have tools to verify the operating point of the system. In
this thesis a new method to improve the classification of power system
stability is introduced. The method also makes it possible to decide how
close the system is to the dividing line between the secure and insecure state.
However, faults which are included in the (n − 1) criterion sometimes
cause blackouts. In Sweden this happened on Dec. 27, 1983, when a broken
disconnector caused a short circuit of a busbar, which led to a blackout where
60 % of the load was disconnected from the system. The time to restore
normal operation was about 7 h. In the N.E. of North America on Aug. 14,
2003, 50 million people were affected by a blackout caused by a cascade
disconnection of transmission lines. The time between the disconnection of
the first transmission line and the blackout was 2.5 hours, and the restoration
took almost 48 hours. In Italy on Sep. 28, 2003, 53 million people were
affected by a blackout which was also caused by a cascade disconnection of
transmission lines. The fault scenario was a bit more difficult to survey
than in the USA and Canada. First a transmission line was disconnected due to
a short circuit caused by the line sagging into a tree; then, after 25 minutes,
twelve other lines were disconnected within 2.5 minutes, which caused the
blackout. The restoration time was 18 hours and about 3600 switching
operations had to be made.
2.1. Power system stability 11
[Figure 2.2 here: operating-state diagram with the NORMAL (secure), ALERT (insecure), EMERGENCY and IN EXTREMIS (asecure), and RESTORATIVE states, connected by preventive control, emergency (corrective) control, protections, resynchronization and load pickup. In the preventive state the goal is to maximize economy and minimize the effect of uncertain contingencies; in the alert state there is a tradeoff between preventive and corrective control.]
Figure 2.2: The operating states and transitions for power systems. Adapted
from [12].
In Figure 2.2 the states during abnormal situations are described by the
insecure and asecure states. If a power system's operating point moves from
the secure to the insecure state, actions must be taken by system operators
to return to the secure state. The three blackouts in Sweden, N.E. North
America and Italy show that the preventive time interval can be anywhere
from 50 seconds to 2.5 hours. From a system operation point of view it is
possible to take actions before entering the insecure state. This thesis
focuses on supportive information before entering the insecure area. In the
asecure state, only automatic actions and control are realistic.
Blackouts have occurred and will happen in the future, but the objective
of the power industry is to increase the time between blackouts and to
minimize the number of people who are affected. With, e.g., supportive
applications for the operators based on new technology, the prevention
of blackouts can be improved. With increased automation, imbalances
between production and consumption can be quickly alleviated. The
black-start capability after a blackout can also be improved.
Data set
In this chapter the data set for verifying the statistical learning algorithms
applied to classification of power system stability is described. With the
algorithm it should be possible to estimate the class for all states in the power
system. It is therefore important that the data set properly describes the variations
of the system. It must be considered that if the data set does not show enough
variability, the SVM cannot be successfully applied.
To create a function for classification with a learning machine under the best
conditions, the space of operation should be covered by the data set. In, e.g.,
chemical processes, design of experiments is used to cover the operation
space with a limited number of experiments; for details see [10].
The data set used to verify the classification algorithms has not been
designed with experimental design, since it would become very large; the time
it takes to prepare the simulator and collect the data is not available within this
project. Instead, knowledge about the daily operation of the NORDEL
system and the problems that have caused stressed operation situations has
been taken into account when creating the data set.
In this thesis the data set is organized as a matrix, see Table 3.1. A row in the
matrix, called an example, is the state of the power system containing
the measured variables. The label vector consists of the binary code
for each example. If the example sustains the dimension criterion (n − 1),
see Section 3.5, it has the binary code +1, otherwise −1.
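As a sketch of this organization (a hypothetical toy example in Python with made-up numbers; the real data set has 653 variables per example):

```python
import numpy as np

# Each row is an example, i.e. one operating state of the power system;
# the columns are measured variables (busbar voltages, line currents,
# angle differences, ...). The values here are invented for illustration.
X = np.array([
    [401.2, 1.31, 12.5],   # Example 1
    [398.7, 1.52, 14.1],   # ...
    [391.0, 1.94, 19.8],   # Example l
])

# Label vector: +1 if the example sustains the (n-1) dimension
# criterion, -1 otherwise.
y = np.array([+1, +1, -1])

assert X.shape[0] == y.shape[0]   # one binary code per example
```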
16 Chapter 3. Data set
Table 3.1: Description of how the data set is organized; the label vector
consists of the binary code for the corresponding examples.

              Variables    Label
  Example 1      ...        +1
     ...         ...        ...
  Example l      ...        -1
Figure 3.1: The NORDEL main grid, 2000. The cut-sets, Snitt 1 to 4, are
marked.
3.3 NORDIC32
The NORDIC32 (N32) system is used for the study [5]. It is an equivalent, or
simplified, model of a transmission system with similarities to the Swedish
power system, see Figure 3.2. The generation consists of approximately
45 % nuclear, 45 % hydro and 10 % other thermal power. The installed
capacity is 17 GW, see Table 3.3. The system has 32 substations, 35 units
and 50 branches, and the number of measurable variables is 653. To adjust the
spinning reserve and the speed droop, the ratio between N32 and NORDEL
is set to approximately 1:4.
In order to produce the data set, the platform described below was used.
Since 1996 the simulator ARISTO (Advanced Real-time Interactive
Simulator for Training and Operation) has been used for operator training at
the Swedish National Grid (Svenska Kraftnät). The simulator has unique
features for stability analysis in real time [9]. Moreover, the simulation
proves to be extremely robust during abnormal operating conditions. Using
ARISTO, the operators obtain a better understanding of the phenomena that
may cause limitations and problems in power transmission and distribution
systems.
The load models in ARISTO cover the load dynamics (short and long
term) and load characteristics (voltage and frequency dependence) very well.
Transmission lines are represented with pi-equivalents. The generation units
are represented by synchronous generators and turbine governors, either ther-
mal or hydro. The protection equipment implemented in ARISTO are
• Transmission capability
• Consumption
and responses
• Bus voltages
• Angle differences
• Line current
• System frequency
According to the Nordic grid instructions, the Nordel system must fulfill the
(n − 1) criterion, see [1] and [2]. An example is considered acceptable if the
power system can withstand any single failure. The system has withstood a
contingency if:
The first step is to find the minimum load level for the four generation
patterns, by scaling the active power part of the load on a common system base.
To create balance in the system at the minimum level, the production of hydro
power is adjusted; as little hydro power as possible should be used to
create balance in the minimum case. In order to achieve training data with
the characteristics of the NORDEL power system, the following dimension
criteria are used for all examples.
• The voltage profile is adjusted according to [24], where the normal
operation voltage on the 400 kV level is 400-415 kV, with a minimum
accepted voltage of 395 kV and a maximum accepted voltage of 420 kV.
• The spinning reserve in the system is 750 MW, divided among three hydro
power units. It corresponds to the largest unit in N32.
• The ratio between N32 and NORDEL is approximately 1:4. Taking this
into account, the speed droop is tuned to about 1500 MW/Hz for all
models. For the Nordel system (Sweden, Finland, Norway and Zealand)
it is 6000 MW/Hz, see [2].
When the minimum load level is achieved, new models are created by increasing
the load level in 5 % increments from the base load profile. The spinning
reserve and the speed droop are tuned so that all models have similar
characteristics. For the 4 generation patterns the load level is increased by 5 %
until the maximum load level is identified. The total number of models in
the training data is 30, with a consumption between 3 280 and 13 570 MW.
This is done in order to extend the generation/transmission patterns as much
as possible with a limited number of training models. The generation and
consumption of reactive and active power for the 30 examples in the training
set are listed in Table 3.4. The labels for the examples are also listed there.
Table 3.4 shows that an increased consumption for the three different base
patterns, NN, NO and ON, does not imply a more stressed state.
Table 3.4: The system generation and consumption for the examples in the
training set, in MW and MVar.
Name    P_generation    P_load    Q_generation    Q_load    Label
NN1 7 489 7 329 1 550 3 359 -1
NN2 8 869 8 752 975 3 359 1
NN3 9 968 9 846 1 244 3 359 1
NN4 10 545 10 392 907 3 359 -1
NN5 11 136 10 940 1 019 3 359 -1
NN6 11 707 11 487 970 3 359 1
NN7 12 463 12 034 2 386 3 359 -1
NN8 12 948 12 581 1 640 3 359 1
NN9 13 545 13 128 2 019 3 359 1
NN10 14 075 13 566 2 832 3 359 -1
NO1 4 541 4 376 1 126 3 359 1
NO2 5 667 5 470 857 3 359 1
NO3 6 826 6 564 998 3 359 1
NO4 7 418 7 111 1 199 3 359 -1
NO5 8 021 7 658 1 440 3 359 -1
NO6 8 630 8 205 1 581 3 359 -1
NO7 9 266 8 752 2 408 3 359 -1
ON1 4 468 4 375 915 3 359 1
ON2 5 556 5 470 873 3 359 1
ON3 6 674 6 564 798 3 359 1
ON4 7 243 7 111 780 3 359 -1
ON5 7 819 7 657 777 3 359 -1
ON6 8 405 8 205 908 3 359 1
ON7 8 998 8 752 1 041 3 359 1
ON8 9 608 9 299 1 382 3 359 1
ON9 10 203 9 846 1 580 3 359 1
ON10 10 835 10 393 2 262 3 359 1
OO1 3 428 3 282 760 3 359 -1
OO2 4 639 4 376 1 075 3 359 -1
OO3 5 905 5 470 2 254 3 359 -1
fulfills the criterion in the training set is manipulated so that it does not fulfill
the criterion in the classification set. The binary code for the examples is
decided with the same criterion as for the training set.
3.8 Conclusions
In this chapter the data set that will be used to verify the classification
methods in Chapters 4 and 5 has been described. The methods to design the
different examples and the dimension criterion for deciding the binary code
were presented. The detailed protection equipment in ARISTO, the adjustment
of spinning reserve and speed droop, the transient analysis and the comparison
with the Nordel grid instructions improve the quality of the data set. The
ARISTO simulator is used instead of a normal load flow solver, which also
improves the quality.
Chapter 4
Principal component analysis
PCA was introduced by Pearson in 1901 [32]. With PCA the dimension of
the data set is reduced from $\mathbb{R}^d$ to $\mathbb{R}^n$, where n ≤ d. The principal
components are orthogonal in $\mathbb{R}^n$. The principal components are the directions
in the data set with the highest variance, see Figure 4.1. An introduction to
how the principal components are calculated is given below.
28 Chapter 4. Principal component analysis
Figure 4.1: The first linear principal component of a data set. The line mini-
mizes the total squared distance from each point to its orthogonal
projection onto the line.
\[ \hat{X} = [\,x_1 - \bar{x}_1, \;\ldots,\; x_d - \bar{x}_d\,], \]
\[ \Sigma = \frac{1}{l-1}\, \hat{X}^{\top} \hat{X}, \]
\[ \hat{X} = U S V^{\top}, \]
where $U_{l\times l}$ and $V_{d\times d}$ are orthogonal matrices and $S_{l\times d}$ is a diagonal matrix
containing the singular values. The first principal component, the direction in
the data set with the largest variance captured by a linear function, is the
eigenvector with the largest corresponding singular value. The SVD factorization
is usually defined with the singular values in descending order.
In [15] and [11] the calculation of principal components is denoted by
\[ \hat{X} = T P^{\top} + E, \qquad (4.1) \]
where $T_{l\times n}$ are called score vectors, $P_{d\times n}$ are called loading vectors, n is
the number of principal components and E is the residual, i.e. the variance in the
data set not described by the principal components. T and P are calculated
as
\[ I = \{i_1, i_2, \ldots, i_d\}, \qquad J = \{i_1, i_2, \ldots, i_n\}, \]
\[ P = V\{I, J\}, \qquad T = \hat{X} P, \]
where $V\{I, J\}$ denotes all the rows and the first n columns of V.
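The calculation above can be sketched as follows (a minimal NumPy illustration on random data; the variable names follow the text, not the thesis's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))           # l = 30 examples, d = 5 variables

X_hat = X - X.mean(axis=0)             # subtract the column means
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
assert np.all(np.diff(s) <= 0)         # singular values in descending order

n = 2                                  # number of principal components
P = Vt[:n].T                           # loading vectors, d x n
T = X_hat @ P                          # score vectors, l x n

# X_hat = T P^T + E, where E is the variance not described
# by the first n principal components
E = X_hat - T @ P.T
assert np.allclose(X_hat, T @ P.T + E)
```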
[Figure here: singular value plotted against principal component number.]
where $\lambda_i$ are the eigenvalues of $\hat{X}^{\top}\hat{X}$. The number of principal components
should be chosen so that $H_{value}$ and $H_{error}$ are minimized. Both functions can
be used, but it is often more appropriate to use $H_{error}$ because it is more
distinct than $H_{value}$.
4.3 Scaling
ing, e.g. the angle difference is between 0-90 degrees and the active power
production difference between 0-600 MW. In practice, the examples $x_i$ in the
data set are stored as rows in a matrix. The columns $z = (z_1, \ldots, z_l)$
of this matrix are scaled to a predefined interval. There are different methods
to scale the variables in the examples, e.g. mean-centering and univariate
scaling, see [11].
\[ z_{\mathrm{univariate}} = \frac{z - m}{\sigma}, \qquad (4.2) \]
where
\[ m = \frac{1}{l}\sum_{i=1}^{l} z_i, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{l} (z_i - m)^2}{l - 1}}. \]
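A sketch of univariate scaling (4.2) in Python (the values are illustrative; this is not code from the thesis):

```python
import numpy as np

def univariate_scale(Z):
    """Univariate scaling (4.2): zero mean and unit sample standard
    deviation per column (division by l - 1, as in the thesis)."""
    m = Z.mean(axis=0)
    sigma = Z.std(axis=0, ddof=1)
    return (Z - m) / sigma

# e.g. one column of busbar voltages [kV] and one of line currents [kA]
Z = np.array([[400.0, 1.2],
              [410.0, 1.8],
              [395.0, 1.5]])
Zs = univariate_scale(Z)
assert np.allclose(Zs.mean(axis=0), 0.0)
assert np.allclose(Zs.std(axis=0, ddof=1), 1.0)
```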
To keep the physical properties of the system a new scaling method is
introduced, in which the variables are scaled group-wise. First, the voltage V and
current I are calculated in p.u. [13] as
\[ V_{\mathrm{p.u.}} = \frac{V}{V_{\mathrm{base}}} \quad \text{and} \quad I_{\mathrm{p.u.}} = \frac{I}{I_{\mathrm{base}}}, \qquad I_{\mathrm{base}} = \frac{S_{\mathrm{base}}}{\sqrt{3}\, V_{\mathrm{base}}}. \]
Then the variables are divided into groups, e.g. busbar V, load P, line I and
angle differences. The groups are scaled with the univariate function (4.2),
but instead of dividing by the standard deviation of every column, the columns
are divided by the largest absolute value found in the group of variables:
\[ z_{\mathrm{group}} = \frac{z - m}{\mathrm{Max}}, \qquad (4.3) \]
where Max is the largest absolute value in a group of variables.
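A sketch of the group-wise scaling (4.3), again in Python. Whether Max is taken from the raw or the mean-centered values is not spelled out above; the sketch assumes the centered values:

```python
import numpy as np

def group_scale(Z, groups):
    """Group-wise scaling (4.3): mean-center every column, then divide all
    columns belonging to a group by the largest absolute value in the group.
    (Assumption: Max is taken over the centered values.)"""
    Zc = Z - Z.mean(axis=0)
    for cols in groups:
        Zc[:, cols] /= np.max(np.abs(Zc[:, cols]))
    return Zc

# toy data: columns 0-1 are busbar voltages [p.u.], column 2 a line current [p.u.]
Z = np.array([[1.000, 1.013, 0.40],
              [1.025, 0.995, 0.60],
              [0.988, 1.005, 0.50]])
Zs = group_scale(Z, groups=[[0, 1], [2]])
# every group now lies within [-1, 1]
assert np.max(np.abs(Zs)) <= 1.0 + 1e-12
```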
Figure 4.3: (a) The scree plot for the first 30 singular values of the training set.
(b) The first 30 values of H_error for the training set.
The principal components are calculated for the training set, see Section 3.6.
According to Figure 4.3 (a), the number of principal components should be
six or seven if the "knee" condition is used. According to Figure 4.3 (b), the
number of principal components should be 29 if H_error is to be minimized.
The purpose of principal components, to reduce the dimension,
is thus not fulfilled according to H_error. In Figures 4.4 to 4.7 the score plots for
the first five principal components are presented, using the two different scaling
methods from Section 4.3. Figures 4.4 to 4.7 clearly show
that it is impossible to linearly separate the examples in the training set with
different binary codes using these five principal components. Experiments
have shown that 19 principal components are needed to linearly separate the
examples.
4.5 Conclusions
Figure 4.4: Score plot for the 5 first principal components. The examples are
marked with "o" or "*", where "o" has the binary code -1 and "*"
has the binary code 1. The data set is scaled with univariate scaling, see (4.2).
examples in the training set with different binary codes. A method for
separation in higher dimensions must be chosen to succeed with the classification.
Figure 4.5: Score plot for the 5 first principal components. The examples are
marked with "o" or "*", where "o" has the binary code -1 and "*"
has the binary code 1. The data set is scaled with univariate scaling, see (4.2).
Figure 4.6: Score plot for the 5 first principal components. The examples are
marked with "o" or "*", where "o" has the binary code -1 and "*"
has the binary code 1. The data set is scaled group-wise, see (4.3).
Figure 4.7: Score plot for the 5 first principal components. The examples are
marked with "o" or "*", where "o" has the binary code -1 and "*"
has the binary code 1. The data set is scaled group-wise, see (4.3).
Chapter 5
Support vector machines
SVM were introduced by Vapnik in 1992 [36] and have received considerable
attention recently. The objective is to create a learning machine that responds
with a binary output to the test data. If the system sustains the dimension
criterion it is marked by +1, otherwise it is marked by -1. This designation is
hereafter called the binary code. In order to train the machine, the training data
must properly describe the variations of the system. It must be considered
that if the training data do not show enough variability, the SVM cannot be
successfully applied.
38 Chapter 5. Support vector machines
where $x_i \in \mathbb{R}^d$ are the input measurements and $y_i \in \{-1, 1\}$ are the
corresponding binary responses. Suppose that there exists a hyperplane,
\[ x \cdot w + b = 0, \]
with parameters (w, b), separating the positive from the negative examples. Let $d_+$
and $d_-$ denote the shortest distance from the hyperplane to the closest positive
and negative point, respectively. The goal of the training is to find the
hyperplane which best separates the training data. For a linear SVM this results in
a convex optimization problem, which has a global solution. The hyperplane
is therefore the plane in $\mathbb{R}^d$ with the largest $d_+$ and $d_-$. This is a useful
property that other classification methods, e.g. neural networks [33], do not have.
The constraints
\[ x_i \cdot w + b \geq +1, \quad \text{for } y_i = +1, \]
\[ x_i \cdot w + b \leq -1, \quad \text{for } y_i = -1, \]
may be combined into
\[ y_i (x_i \cdot w + b) \geq 1. \]
The points which lie on the hyperplane $H_1: x_i \cdot w + b = -1$ have the
perpendicular distance $|{-1} - b|/\|w\|$ from the origin, and the points on the
hyperplane $H_2: x_i \cdot w + b = 1$ have the perpendicular distance
$|1 - b|/\|w\|$ from the origin. Hence the margin is $1/\|w\|$, see Figure 5.1.
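The geometry above can be checked numerically. A minimal sketch with scikit-learn's linear SVC on made-up separable data (the thesis itself uses the OSU SVM MATLAB toolbox; this is only an illustration):

```python
import numpy as np
from sklearn.svm import SVC

# two linearly separable point clouds standing in for the two classes
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=+3.0, size=(20, 2)),
               rng.normal(loc=-3.0, size=(20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# y_i (x_i . w + b) >= 1 for all points; the support vectors lie on
# H1 and H2, so the smallest value is (numerically close to) 1
margins = y * (X @ w + b)
assert margins.min() > 0.99
assert np.isclose(margins.min(), 1.0, atol=0.01)
```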
[Figure 5.1 here: separating hyperplane with margin hyperplanes H1 and H2, normal vector w, and distances d+ and d−.]
[Figure 5.2 here: the same geometry with slack variables ξi and ξj for examples violating the margin.]
test, or a separate validation set, the value of the parameter is decided to
obtain optimal performance. In this thesis C is set to 1000. The optimization
test has not been performed to decide the value of C, due to the limited
number of training examples. For implementation in power system operation, the
performance will be improved if C has the optimal value. The corresponding
Lagrangian for the 2-norm soft margin optimization problem is
\[ L(w, b, \xi, a) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{l} \xi_i^2 - \sum_{i=1}^{l} a_i y_i (x_i \cdot w + b) + \sum_{i=1}^{l} a_i (1 - \xi_i), \qquad (5.6) \]
where a = (a1 , . . . , al ) are the Lagrange multipliers. The constraints from
the gradient of the optimization problem with the respect to w, ξ and b are
l
∂L(w, b, ξ, a) X
= w − ai yi xi = 0
∂w
i=1
∂L(w, b, ξ, a)
= Cξ − a = 0
∂ξ
l
∂L(w, b, ξ, a) X
= ai yi = 0 .
∂b
i=1
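These stationarity conditions can be verified on toy numbers. This is an illustrative sketch, not the thesis code; the multipliers ai, labels yi and points xi are invented so that Σ ai yi = 0 holds exactly.

```python
# Invented multipliers, labels and points satisfying sum(a_i * y_i) = 0.
a = [0.5, 0.25, 0.75]                      # Lagrange multipliers (invented)
y = [+1, +1, -1]                           # binary responses
X = [[1.0, 2.0], [2.0, 0.0], [2.0, 1.0]]   # input measurements
C = 1000.0

# dL/db = 0  ->  sum_i a_i y_i = 0
sum_ay = sum(ai * yi for ai, yi in zip(a, y))

# dL/dw = 0  ->  w = sum_i a_i y_i x_i
w = [sum(ai * yi * xi[k] for ai, yi, xi in zip(a, y, X)) for k in range(2)]

# dL/dxi = 0 ->  C*xi = a, so the slack variables follow directly
xi_slack = [ai / C for ai in a]

print(sum_ay)   # 0.0
print(w)        # [-0.5, 0.25]
```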
L(w, b, a) = Σ_{i=1}^{l} ai − (1/2) Σ_{i,j=1}^{l} ai aj yi yj xi · xj ,   (5.10)

which is identical with the maximal margin formulation. The only difference is the determination of b with the KKT conditions

ai (yi (xi · w + b) − 1 + ξi) = 0
ξi (ai − C) = 0 ,
5.2 Scaling
zlinear = 2(z − Min)/(Max − Min) − 1 ,   (5.11)

where
Min = the smallest value in every column,
Max = the largest value in every column.
Ibase = Sbase / (√3 · Vbase) .
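As a quick numerical check of this relation (the base values below are invented example numbers, e.g. a 400 kV, 1000 MVA base, not values from the thesis):

```python
import math

S_base = 1000e6   # VA, assumed example base power
V_base = 400e3    # V, assumed example base voltage

# I_base = S_base / (sqrt(3) * V_base)
I_base = S_base / (math.sqrt(3) * V_base)
print(round(I_base, 1))   # 1443.4 (amperes)
```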
Then the variables are divided into groups, e.g. busbar V, load P, line I and angle differences. The groups are scaled with the linear function (5.11): instead of finding Min and Max in every column, the Min and Max values are found in groups of variables, which are then scaled group-wise.
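A minimal sketch of this group-wise scaling, written here in Python rather than the Matlab used in the thesis; the column names and measurement values are invented for illustration.

```python
def scale_linear(values, vmin, vmax):
    # Map values into [-1, 1] with (5.11): z_linear = 2(z - Min)/(Max - Min) - 1.
    span = vmax - vmin
    return [2.0 * (z - vmin) / span - 1.0 for z in values]

def scale_group_wise(columns, groups):
    """columns: dict name -> list of samples; groups: dict group -> column names.
    Min and Max are taken over all columns in a group, then applied column-wise."""
    scaled = {}
    for names in groups.values():
        pooled = [z for n in names for z in columns[n]]
        gmin, gmax = min(pooled), max(pooled)
        for n in names:
            scaled[n] = scale_linear(columns[n], gmin, gmax)
    return scaled

# Invented example: two bus voltages scaled with a common group Min/Max.
cols = {"V_bus1": [395.0, 400.0, 405.0], "V_bus2": [390.0, 398.0, 410.0]}
out = scale_group_wise(cols, {"busbar V": ["V_bus1", "V_bus2"]})
print(out["V_bus1"])   # group Min = 390, Max = 410 -> [-0.5, 0.0, 0.5]
```

Scaling "separately" corresponds to calling `scale_linear` with each column's own Min and Max instead of the group's.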
5.2.3 Variables
In total 653 variables are used in an example to describe the operating point
of the power system, post-fault. The variables are
Using the OSU SVM Matlab toolbox [27], the support vectors are calculated. The number of support vectors is 20. The support vectors are used with (5.9) to calculate the hyperplane. The hyperplane is the optimal linear separation between the two classes. Using (5.4), the function value for each of the examples in the training set is computed.
In Figure 5.3 (a)-(c) the function values for the 30 examples in the training set are marked by "o" or "*" ("o" has the binary code −1 and "*" +1). The examples which are used as support vectors have the function value ±1, as seen in Figure 5.3 (a)-(c). With SVM it is possible to find a hyperplane which separates the two classes in the training set, scaled by three different methods.
In Section 3.3 the cut-set methodology, one of the traditional methods to indicate whether the (n − 1) criterion is fulfilled in the Swedish part of the Nordel system, was introduced. Figure 5.4 shows that it is difficult to use the cut-sets to classify the stability of the examples in the training set. The examples with different binary codes cannot be separated by a single threshold.
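The cut-set check reduces to summing the active power transferred over the lines crossing a cut and comparing it with a limit. A hypothetical sketch (the flows and the threshold are invented, not Nordel values):

```python
# Hypothetical cut-set classifier: sum the active power (MW) on the lines
# crossing the cut and compare with a threshold.
def cutset_value(line_flows_mw):
    return sum(line_flows_mw)

def classify_by_cutset(line_flows_mw, threshold_mw):
    # +1 (secure) at or below the threshold, -1 (insecure) above it.
    return 1 if cutset_value(line_flows_mw) <= threshold_mw else -1

print(classify_by_cutset([600.0, 450.0, 300.0], 1500.0))   # 1  (sum 1350 MW)
print(classify_by_cutset([900.0, 700.0, 500.0], 1500.0))   # -1 (sum 2100 MW)
```

The difficulty shown in Figure 5.4 is precisely that no single `threshold_mw` separates the two classes.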
As the number of available examples is small, the performance on the training set is validated by leave-one-out cross validation, see [38]. Each time, 29 of the 30 examples in the training set are used to train the SVM; the remaining example is then used for classification. This is done for all 30 examples, see Figure 5.3 (d). With the leave-one-out test, 28 of the 30 examples in the training set are correctly classified, so the robustness of the training set is reasonably good.
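The leave-one-out procedure can be sketched as follows. A simple nearest-centroid classifier stands in for the SVM so that the example is self-contained, and the data are invented; the procedure itself is the same.

```python
# Nearest-centroid classifier as a stand-in for the SVM (illustration only).
def nearest_centroid_fit(X, y):
    def centroid(label):
        pts = [x for x, yi in zip(X, y) if yi == label]
        return [sum(col) / len(pts) for col in zip(*pts)]
    return {+1: centroid(+1), -1: centroid(-1)}

def nearest_centroid_predict(model, x):
    def dist2(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 1 if dist2(model[+1]) <= dist2(model[-1]) else -1

def leave_one_out(X, y, fit, predict):
    # Train on all examples but one, classify the held-out one, repeat.
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct

X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]   # invented examples
y = [-1, -1, 1, 1]
print(leave_one_out(X, y, nearest_centroid_fit, nearest_centroid_predict))  # 4
```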
Figure 5.3: Linear SVM with soft-margin optimization, where C = 1000. Examples marked with "o" have the binary code −1 and examples marked with "*" the binary code +1. The function value, see (5.4), for the training set. All 633 feature variables are used; 20 variables are excluded due to variance equal to 0. (a) The variables are scaled linearly group-wise, and the unit stator currents are also scaled according to capacity. (b) The variables are scaled linearly group-wise. (c) The variables are scaled linearly and separately. (d) Leave-one-out test: 28 of 30 examples in the training set are correctly classified. The variables are scaled as in (b).
5.4 Classifying with a hyperplane
Figure 5.4: (a) The sum of transferred active power through Snitt2, for the
examples in the training set. (b) Snitt4, for the examples in the
training set.
sification problem can also be too large, so that the hyperplane is over-fitted, see Figure 5.6. The hyperplane is over-fitted if it is adjusted to the training set instead of to the classification problem, which hopefully is described by the training set. The linear SVM decreases the risk of over-fitting, compared to using a nonlinear function. There is a need to investigate whether the classification can be improved with fewer variables.
As with the examples in the training set, the cut-set methodology is not successful in classifying the examples in the classification set, see Figure 5.7. The examples with different binary codes cannot be separated by a threshold.
[Figure panels (a)-(c): the function values for the 16 examples in the classification set.]
Figure 5.7: (a) Snitt2, for the classification set. (b) Snitt4, for the classification
set.
5.5 Reducing variable sets
There is a need to analyze whether the original data set contains variables which cause over-fitting problems. The problem is: given the original data set in R^d, what is the best subset of feature variables in R^c, where c < d? It is not possible to find an optimal solution for this problem. First of all, we do not know the dimension of R^c, and even if we knew it, the number of possible subsets is

nc = d! / ((d − c)! c!) ,

which is very large when d is 650.
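Even for a modest subset size the count is astronomical; for example:

```python
import math

d = 650
# n_c = d! / ((d - c)! c!) is the binomial coefficient C(d, c);
# already for c = 3 there are tens of millions of candidate subsets.
print(math.comb(d, 3))   # 45559800
```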
Therefore, a new method to reduce the dimension of the data set is introduced, which reduces the dimension without removing variables that are important for the separation. First, all variables in the training data with zero variance are removed. Then a method for further reduction is introduced: the whole training data set is scaled according to (5.11), and a hyperplane is calculated with SVM. Guided by the normal w of the hyperplane, the dimension of the training set is reduced. The unit normal wn of the hyperplane is calculated as

wn = w / ||w|| .   (5.12)
Figure 5.8: Reducing variables by guidance from the normal of the hyperplane.
where the number of variables in each group is shown, ordered from left to right, in parentheses.
Variables can be of two types:

• Qualitative
• Quantitative

Qualitative variables are normally represented numerically by codes. The simplest representation is a single binary code, 0 or 1, e.g. circuit breaker status. In this study circuit breakers/disconnectors are not used in the data set, but shunt capacitors and reactors are included in the data set, and they are approximately qualitative.

Quantitative variables may assume any numerical value along a continuous scale. All variables in the data set are of this type except the shunts. It is important to be aware of the difference between the two variable types.
There is a need to investigate whether the number of variables used for classification can be reduced. Large data sets can cause over-fitting problems, and it is also interesting to minimize the cost of measuring and collecting the data set. Therefore the different variable groups are used, one at a time, to calculate the hyperplane with soft-margin optimization. The variables are scaled group-wise, and the unit stator currents are also scaled according to capacity. In Figures 5.10-5.12 the results with the different variable groups are shown; some variables give a better hyperplane for separation than others. The load and frequency groups give an unsatisfactory separation, while the line and voltage groups give a successful separation.
[Figure 5.10 panels; the panel titles visible here are Q load, I load, Q shunt and I shunt.]
Figure 5.10: The function value for the SVM training based on the different variable groups in the training set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000; for further information see Section 5.1.2. The variables are scaled group-wise, and the unit stator currents are also scaled according to capacity.
[Figure 5.11 panels; the panel titles visible here are P unit, Q unit, P line and Q line.]
Figure 5.11: The function value for the SVM training based on the different variable groups in the training set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000; for further information see Section 5.1.2. The variables are scaled group-wise, and the unit stator currents are also scaled according to capacity.
[Figure 5.12 panels: I line, P transformer, Q transformer, I transformer and Frequency.]
Figure 5.12: The function value for the SVM training based on the different variable groups in the training set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000; for further information see Section 5.1.2. The variables are scaled group-wise, and the unit stator currents are also scaled according to capacity.
Figure 5.13: Two of the variables in the training set, scaled linearly group-wise: CT22 X1 CurrentValue (qualitative) and CT22 A400 VoltageValue (quantitative, marked with "+").
Table 5.1: The 30 variables which have the largest influence on the hyperplane. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000; for further information see Section 5.1.2. The variables are scaled group-wise, and the unit stator currents are also scaled according to capacity.

Object                                      w/||w||
CT22 X1 Shunt reactor (I) 0.4675
CT32 X3 Shunt reactor (I) 0.3942
CT22 X1 Shunt reactor (Q) 0.2635
FT41 EK1 Shunt capacitor (I) 0.2276
CT32 X3 Shunt reactor (Q) 0.2243
CT31 X3 Shunt reactor (I) 0.2124
FT42 X1 Shunt reactor (I) 0.1483
FT44 X1 Shunt reactor (I) 0.1419
FT50 X1 Shunt reactor (I) 0.1410
CT11 T2 Unit (Q) 0.1364
FT41 EK1 Shunt capacitor (Q) 0.1309
CT21 X1 Shunt reactor (I) 0.1225
CT31 X3 Shunt reactor (Q) 0.1207
CT11 X1 Shunt reactor (I) 0.1100
CT32 X1 Shunt reactor (I) 0.1075
CT22 A400 Busbar (V) 0.0871
CL1 Line (I) CT11 0.0867
FT42 X1 Shunt reactor (Q) 0.0837
FT44 X1 Shunt reactor (Q) 0.0810
FT44 T11 Unit (Q) 0.0799
FT50 X1 Shunt reactor (Q) 0.0773
FT51 G2 Unit (Q) 0.0752
FT50 X2 Shunt reactor (I) 0.0743
CT72 G2 Unit (P) 0.0733
CL8 Line (Q) CT22 0.0727
CT122 A130 Busbar (V) 0.0718
CT21 X1 Shunt reactor (Q) 0.0695
FL14 Line (I) FT51 0.0690
FL15 Line (I) FT51 0.0690
CL3 Line (Q) CT22 0.0671
Due to the different variable types, different variable setups are applied to the classification with SVM. In Section 5.4 the data set was scaled with three different methods. All three are also used here, because no method could be excluded from the results in Section 3. Five variable setups are compared,

1. Group-wise
3. Separately.

The variable setups which give the best separation are 2, 4 and 5.
Figure 5.14: The function value for the SVM classification based on the dif-
ferent variable setups and different scaling methods. The soft-
margin optimization is used to calculate a hyperplane with slack
variables, where C = 1000.
Figure 5.15: The function value for the SVM classification based on the dif-
ferent variable setups and different scaling methods. The soft-
margin optimization is used to calculate a hyperplane with slack
variables, where C = 1000.
Figure 5.16: The function value for the SVM classification based on the dif-
ferent variable setups and different scaling methods. The soft-
margin optimization is used to calculate a hyperplane with slack
variables, where C = 1000.
The method to reduce variables by guidance from the normal of the hyperplane is applied to the data sets with the variable setups which give the best classification, according to Figures 5.14-5.16. The variable setups with shunts are excluded, and the linear scaling method is not used. The reduction is illustrated with a new type of plot, the reduction plot, which shows the number of correctly classified examples in the classification set. The number of variables in the data set is reduced by two different reduction methods:

2. The other is to exclude the variable which has the least influence on the hyperplane; the new data set is used to calculate the difference between the variables. Thereafter a new hyperplane is calculated and the process is repeated, see Figures 5.19-5.20.
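Reduction method 2 amounts to backward elimination guided by |wn|. The sketch below uses a class-mean-difference direction as a stand-in for the SVM-trained normal, so it only illustrates the loop; the data and the `keep` limit are invented.

```python
def train_direction(X, y):
    # w proportional to mean(+1 class) - mean(-1 class);
    # a crude stand-in for the SVM-trained normal of the hyperplane.
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == -1]
    mp = [sum(c) / len(pos) for c in zip(*pos)]
    mn = [sum(c) / len(neg) for c in zip(*neg)]
    return [a - b for a, b in zip(mp, mn)]

def reduce_variables(X, y, keep):
    active = list(range(len(X[0])))           # indices of remaining variables
    while len(active) > keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_direction(Xa, y)
        norm = sum(wi * wi for wi in w) ** 0.5
        wn = [abs(wi) / norm for wi in w]     # unit normal components, (5.12)
        # drop the variable with the least influence on the hyperplane,
        # then retrain and repeat
        del active[wn.index(min(wn))]
    return active

# Invented data: the third variable is nearly constant and goes first.
X = [[1.0, 0.0, 5.0], [0.9, 0.1, 5.0], [0.0, 1.0, 5.1], [0.1, 0.9, 5.1]]
y = [1, 1, -1, -1]
print(reduce_variables(X, y, keep=2))   # [0, 1]
```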
The variable groups that dominate in the reduction process are presented in Table 5.2. The variable setups that have varying classification results are divided into subsets, see Figures 5.17-5.20, and the dominating variable groups in these subsets are presented in Table 5.2. This is only a guideline to which variable groups are excluded; other variable groups are also represented in the subsets. In Table 5.3 the 30 variables which have the largest impact on the hyperplane are listed, for the variable setup where P, Q, transformer and shunt are reduced. These 30 variables are also pointed out in Figure 5.21.
[Figure 5.17 panels: (a) variable setup 2, scaling method 1, reduction method 1; (b) variable setup 2, scaling method 2, reduction method 1; (c) variable setup 4, scaling method 1, reduction method 1; (d) variable setup 4, scaling method 2, reduction method 1.]
Figure 5.17: The reduction plot for the different variable setups and scaling methods. The x-axis shows the number of variables used in the classification; the y-axis shows the number of correctly classified examples in the classification set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000.
[Figure 5.18 panels: (a) variable setup 5, scaling method 1, reduction method 1; (b) variable setup 5, scaling method 2, reduction method 1.]
Figure 5.18: The reduction plot for the different variable setups and scaling methods. The x-axis shows the number of variables used in the classification; the y-axis shows the number of correctly classified examples in the classification set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000.
[Figure 5.19 panels: (a) variable setup 2, scaling method 1, reduction method 2; (b) variable setup 2, scaling method 2, reduction method 2; (c) variable setup 4, scaling method 1, reduction method 2; (d) variable setup 4, scaling method 2, reduction method 2.]
Figure 5.19: The reduction plot for the different variable setups and scaling methods. The x-axis shows the number of variables used in the classification; the y-axis shows the number of correctly classified examples in the classification set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000.
[Figure 5.20 panels: (a) variable setup 5, scaling method 1, reduction method 2; (b) variable setup 5, scaling method 2, reduction method 2.]
Figure 5.20: The reduction plot for the different variable setups and scaling methods. The x-axis shows the number of variables used in the classification; the y-axis shows the number of correctly classified examples in the classification set. The soft-margin optimization is used to calculate a hyperplane with slack variables, where C = 1000.
Table 5.2: The dominating variable groups that are excluded in the reduction in Figures 5.17-5.20, listed in exclusion order 1-4.

5.17 (a)  1: Load (P, Q), Angle difference | 2: Load (I) | 3: Mixture of all variables | 4: Line (P, Q, I), Unit (P, Q, I), Busbar (V)
5.17 (b)  1: Load (P, Q), Line (P) | 2: Load (I) | 3: Mixture of all variables | 4: Line (P, Q, I), Unit (P, Q, I), Busbar (V)
5.17 (c)  1: Load (I), Busbar (f) | 2: Load (I), Unit (I) | 3: Line (I), Angle difference | 4: Line (I), Busbar (V), Angle difference
5.17 (d)  1: Load (I), Busbar (f) | 2: Line (I), Unit (I), Angle difference | 3: Busbar (V), Line (I)
5.18 (a)  1: Load (I) | 2: Line (I), Load (I), Angle difference | 3: Line (I), Angle difference, Busbar (V) | 4: Line (I), Busbar (V), Unit (I)
5.18 (b)  1: Angle difference, Line (I), Load (I) | 2: Line (I), Busbar (V), Unit (I) | 3: Line (I), Busbar (V)
5.20 (a)  1: Load (I) | 2: Load (I) | 3: Angle difference, Line (I), Unit (I) | 4: Line (I), Busbar (V)
5.20 (b)  1: Mixture of all variables | 2: Mixture of all variables | 3: Busbar (V)
Table 5.3: The 30 variables which have the largest influence on the hyperplane in Figure 5.17, variable setup 4, scaling method 2 and reduction method 1.

Object                                      w/||w||
CT11 A400 Busbar (V) 0.3245
FT44 A400 Busbar (V) 0.3212
CT22 A400 Busbar (V) 0.3100
CT111 A130 Busbar (V) 0.2843
CT122 A130 Busbar (V) 0.2272
FL2 Line (I) FT42 0.2249
FL6 Line (I) FT61 0.1983
FT41 A400 Busbar (V) 0.1708
CT21 A400 Busbar (V) 0.1678
CL1 Line (I) CT11 0.1656
CL5 Line (I) CT12 0.1560
CT12 A400 Busbar (V) 0.1474
CT72 G2 Unit (I) 0.1423
FT41 G1 Unit (I) 0.1260
CT72 G3 Unit (I) 0.1201
AT111 A130 Busbar (V) 0.1189
CL4 Line (I) CT12 0.1121
CT112 A130 Busbar (V) 0.1071
CL7 Line (I) CT71 0.1065
CL6 Line (I) CT71 0.1065
FL3 Line (I) FT43 0.1047
FT42 A400 Busbar (V) 0.0956
FT142 A130 Busbar (V) 0.0937
CT11 T1 Unit (I) 0.0913
FL10 Line (I) FT62 0.0889
CL17 Line (I) FT44 0.0878
FL15 Line (I) FT51 0.0855
FL14 Line (I) FT51 0.0855
FT50 A400 Angle FT62 A400 0.0852
RT132 G1 Unit (I) 0.0851
Figure 5.21: The NORDIC32 system with the variables from Table 5.3 marked. Bus voltages are marked with "V", unit stator currents with "U" and line currents with "—".
5.6 Implementation
5.7 Results
5.8 Conclusions
6 Conclusions
The SVM method is better suited to the classification problem in this thesis than PCA, and the result is much more satisfying. With the linear SVM, a hyperplane is calculated that separates the stable from the unstable examples in the training set with the maximum distance between them. The hyperplane gives a better indication of the security of a state than the cut-set methodology. The calculation time is also short, since the SVM training is a convex optimization problem with a global solution. In general, however, the SVM does not perform well outside the span of the training data. The training and classification sets in this thesis are too small; in the work by Wehenkel, see [39], the number of examples in the training set is 11 000. For further work the number of examples must be increased.
The three main contributions of this thesis are:

• The method to scale the variables to keep the physical properties and the capability of the units, see Section 5.2.

• The method which excludes the binary variables, due to the difficulty of classifying with SVM with a mixture of binary and continuous variables, see Section 5.5.1.

These three methods are introduced in this thesis and they contribute to improving the classification result.
The classification results in this thesis are quite successful, but the method must be verified further before a possible implementation in power system operation. The training and classification sets are too small and must be enlarged. With the ARISTO simulator it is possible to produce realistic synthetic data, thanks to the detailed models and the dynamic simulations. However, the interface of ARISTO is not developed for this type of study, and developing the data set takes a very long time; therefore it must be considered whether ARISTO should be used in the future for this project. The training set should contain at least as many examples as there are measured variables, 653.
In this thesis the linear SVM was used, and the impact of adding nonlinear functions to the hyperplane has not been investigated. Therefore an analysis of the effect of adding nonlinearities to the SVM to solve the classification problem is motivated. The nonlinear SVM lets the user choose which nonlinear kernel functions to add, and in which order. Frequently used kernel functions today are polynomial, sigmoid and Gaussian.
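For reference, the three kernels mentioned can be written as follows. This is a hedged sketch: the parameter values are illustrative defaults, not choices made in the thesis.

```python
import math

def polynomial_kernel(x, z, degree=3, c0=1.0):
    # (x·z + c0)^degree
    return (sum(a * b for a, b in zip(x, z)) + c0) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 sigma^2))
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, kappa=1.0, c0=-1.0):
    # tanh(kappa * x·z + c0)
    return math.tanh(kappa * sum(a * b for a, b in zip(x, z)) + c0)

x, z = [1.0, 0.0], [0.0, 1.0]
print(polynomial_kernel(x, z))   # (0 + 1)^3 = 1.0
print(gaussian_kernel(x, x))     # 1.0 for identical points
```

In a kernel SVM, the inner product xi · xj in the dual (5.10) is simply replaced by one of these kernel evaluations.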
The scaling function was modified in this thesis to compensate for the capability of the generators. It would be interesting to develop methods which also compensate for the transmission capability of the power system. By adding this, changes in the transmission, e.g. disconnection of lines and changes in transmission capability due to overheated lines, could more easily be observed by the separating hyperplane. The improvements from adjusting the scaling method will hopefully make the training set more robust.
In this thesis no methods have been developed for dealing with the problem of how to check whether the training set is representative of the classification
6.2 Future work
[1] Nordel grid planning rules for the Nordic transmission system. www.nordel.org (accessed 2005-02-27), 1992.
[5] CIGRE TF 38.02.08. Long term dynamics, phase II, final report. Technical report, CE/SC 38 GT/WG 02, Ref. No. 102, 1995.
[9] A. Edström and K. Walve. The training simulator ARISTO: design and experiences. In IEEE Power Engineering Society Winter Meeting, 1999.
[12] L. H. Fink and K. Carlsen. Operating under stress and strain. IEEE Spectrum, 15, 1978.
[16] K. Hotta. Support vector machines with local summation kernel for robust face recognition. In Proceedings of the 17th International Conference on Pattern Recognition, 2004.
[17] C-W. Hsu, C-C. Chang, and C-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, 2002.
[19] B. Johansson. Computer vision using rich features - geometry and sys-
tems. PhD thesis, Lund Institute of Technology, 2002.
[22] P. Kundur. Power system stability and control. McGraw-Hill, Inc, 1993.
ISBN 0-07-035958-X.
[26] P-A. Löf. On static analysis of long-term voltage stability. PhD thesis,
Royal Institute of Technology, Sweden, 1995.
[27] J. Ma, Y. Zhao, and S. Ahalt. OSU SVM Classifier Matlab Toolbox (ver 3.00), 2002. http://eewww.eng.ohio-state.edu/~maj/osu_svm.
[38] A. Webb. Statistical Pattern Recognition. John Wiley & Sons, 2002.
ISBN 0-470-84513-9.