Zheng 2015
1, JANUARY 2015
NOMENCLATURE
V , V (x)
Vci
Vr
Vco
P , P (x)
PA , PT
d(x, y)
Manuscript received April 23, 2014; revised July 02, 2014 and August
11, 2014; accepted September 03, 2014. Date of publication September 29,
2014; date of current version December 12, 2014. This work was supported
in part by the National High Technology Research and Development Program
2011AA05A112 of China, in part by the National Natural Science Foundation
of China under Grant 51190101, in part by the Science and Technology Projects
of the State Grid Corporation of China SGHN0000DKJS1300221, in part by
Hunan Electric Power Corporation, and in part by Ningxia Electric Power
Corporation. Paper no. TSTE-00173-2014.
The authors are with the State Key Lab of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, China (e-mail: zhengl07@mails.tsinghua.edu.cn; huwei@mail.tsinghua.edu.cn; minyong@mail.tsinghua.edu.cn).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSTE.2014.2355837
T
Nk (x)
Lrd(x)
LOF(x)
Ncubic , Nlinear
Ncommon
I. INTRODUCTION
1949-3029 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
TABLE I
RAW DATA CLASSIFICATION
Fig. 1. Raw scatter plot of wind farm output and wind speed.
TABLE II
SAMPLE OF INVALID WIND DATA
Fig. 2. Distribution of raw wind data: (a) period from 10/1/2010 to 1/31/2011;
(b) period from 2/1/2011 to 5/31/2011; and (c) period from 6/1/2011 to
9/30/2011.
TABLE IV
MISSING DATA PROCESSING ALGORITHM
the density in the area of the valid data. Both the unnatural and
the irrational data can be considered outliers or noise compared
to the valid data.
Therefore, outlier detection, which tries to identify exceptional cases that deviate substantially from the majority patterns
[15], can be used to exclude the unnatural and the irrational
data. Furthermore, from the simplicity point of view, as a type
of unsupervised learning, outlier detection can learn relationships and structure from the attributes of the data themselves
[16], so that the classification step in [12] and [13] is no longer
necessary.
III. PREPROCESSING METHODOLOGY
A. Wind Data Preprocessing Method
Fig. 3 shows the structure of the proposed preprocessing
method. The constant data processing block, the missing data
processing block, and the physical range check block can easily be implemented via several if-then statements, as
shown in Tables III–V. Regarding imputation of the invalid
TABLE V
EXCEEDING DATA PROCESSING ALGORITHM
data, the major imputation approaches [21] fill in or predict the missing values from nearby observed values.
However, because invalid data often persist for a relatively long time, too few neighboring observations remain to support a smooth imputation, which would only introduce more incorrect data into the database. Moreover, a large amount of data is available, so the remaining data still contain sufficiently interesting patterns, and the pattern loss caused by removing the invalid data is limited. Therefore, no approximation is performed after removing the invalid data.
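As an illustration, the if-then checks and the removal of invalid data without imputation might be sketched in Python as follows. The limits `v_max` and `p_max`, the run length `max_run`, and all function names are assumptions made for this sketch; the actual rules are those given in the tables referenced above.

```python
import math

def is_valid(v, p, v_max=40.0, p_max=1.0):
    """Missing-data and physical range check for one (wind speed, power) sample.

    v_max and p_max are assumed limits, not values taken from the paper.
    """
    if v is None or p is None:
        return False
    if math.isnan(v) or math.isnan(p):
        return False
    return 0.0 <= v <= v_max and 0.0 <= p <= p_max

def drop_constant_runs(samples, max_run=3):
    """Drop every run of more than max_run identical consecutive samples."""
    kept = []
    i = 0
    while i < len(samples):
        j = i
        while j + 1 < len(samples) and samples[j + 1] == samples[i]:
            j += 1
        if j - i + 1 <= max_run:
            kept.extend(samples[i:j + 1])
        i = j + 1
    return kept

def preprocess(samples):
    """Remove constant, missing, and out-of-range samples; nothing is imputed."""
    return [s for s in drop_constant_runs(samples) if is_valid(*s)]
```

Constant runs are detected on the raw sequence first, so that removing an invalid sample cannot make two previously separated samples look consecutive.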
Data scaling is performed by applying the following
equation:
$$ x' = \frac{x}{x_r} \tag{1} $$
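$$ \mathrm{lrd}_{MinPts}(x) = \frac{|N_{MinPts}(x)|}{\sum_{y \in N_{MinPts}(x)} \text{reach-dist}_{MinPts}(x, y)}. \tag{4} $$

$$ \mathrm{LOF}_{MinPts}(x) = \frac{\sum_{y \in N_{MinPts}(x)} \dfrac{\mathrm{lrd}_{MinPts}(y)}{\mathrm{lrd}_{MinPts}(x)}}{|N_{MinPts}(x)|} \tag{5} $$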
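The local reachability density and the LOF of (4) and (5) can be sketched with a brute-force computation. The sketch below assumes the standard LOF definitions, in which the reachability distance of x with respect to y is the larger of the MinPts-distance of y and d(x, y); the function name `lof_scores` and the optional `dist` argument are illustrative. It also assumes that no point has MinPts or more exact duplicates, since the density in (4) would then be undefined.

```python
import math

def lof_scores(points, min_pts, dist=math.dist):
    """Brute-force LOF of every point, following the form of (4) and (5)."""
    n = len(points)
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    k_dist = []   # MinPts-distance of each point
    neigh = []    # MinPts-neighborhood of each point (ties included)
    for i in range(n):
        others = sorted((d[i][j], j) for j in range(n) if j != i)
        kd = others[min_pts - 1][0]
        k_dist.append(kd)
        neigh.append([j for dij, j in others if dij <= kd])

    # Local reachability density, as in (4)
    lrd = []
    for i in range(n):
        reach_sum = sum(max(k_dist[j], d[i][j]) for j in neigh[i])
        lrd.append(len(neigh[i]) / reach_sum)

    # Local outlier factor, as in (5)
    return [sum(lrd[j] / lrd[i] for j in neigh[i]) / len(neigh[i])
            for i in range(n)]
```

Points inside a uniform cluster obtain a score close to 1, whereas an isolated point obtains a score well above 1 and can be flagged by comparing the score with a threshold. Passing a different `dist` function allows a distance other than the Euclidean one to be used.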
plot of raw wind data in Fig. 1 also shows the same shape characteristics of the power curve. Hence, the points corresponding
to the valid data are distributed near the power curve, whereas
the points corresponding to the unnatural and the irrational data
are far away. Therefore, the weight can be formulated based on
the difference between the measured and the true value of wind
power, as follows.
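1) When Vci ≤ V < Vr,

$$ \omega = \begin{cases} 1, & |P_T - P| \le 0.1 \\ |P_T - P| / 0.1, & |P_T - P| > 0.1. \end{cases} \tag{7} $$

2) When V < Vci or Vr ≤ V < Vco,

$$ \omega = \begin{cases} 1, & |P_T - P| \le 0.05 \\ |P_T - P| / 0.05, & |P_T - P| > 0.05. \end{cases} \tag{8} $$

3) When V ≥ Vco,

$$ \omega = 1 \tag{9} $$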
where Vci , Vr , and Vco represent the cut-in, rated, and cut-out
speed of the wind turbine. PT denotes the normalized true value
of wind power.
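The three weight rules (7) to (9) can be written as a short function. This is a minimal sketch that assumes the deviation is taken in absolute value; the function name `weight` and its argument names are illustrative, and the turbine speeds in the usage line are hypothetical values rather than parameters from the paper.

```python
def weight(v, p, p_true, v_ci, v_r, v_co):
    """Weight of a sample with wind speed v and normalized power p.

    p_true is the normalized true power at speed v; v_ci, v_r, and v_co are
    the cut-in, rated, and cut-out speeds of the turbine.
    """
    if v >= v_co:
        return 1.0                      # rule (9): above cut-out speed
    # Rule (7) uses a tolerance of 0.1, rule (8) a tolerance of 0.05.
    tol = 0.1 if v_ci <= v < v_r else 0.05
    dev = abs(p_true - p)
    return 1.0 if dev <= tol else dev / tol
```

For example, with hypothetical speeds of 3, 12, and 25 m/s, `weight(8.0, 0.2, 0.5, 3.0, 12.0, 25.0)` gives a weight of about 3 because the sample lies 0.3 away from the curve, three times the tolerance.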
An object is defined as being close to the power curve if it lies in the [PT − 0.1, PT + 0.1] interval when Vci ≤ V < Vr and in the [PT − 0.05, PT + 0.05] interval when V < Vci or V ≥ Vr, based on domain experience and past studies.
The weights assigned to these objects are 1, which makes the weighted distance identical to the Euclidean distance. Additionally, the weights of the data whose wind speed exceeds the cut-out speed are also equal to 1, in order to extract the natural properties from the wake-effect data. The weight of all other data is larger than 1, in proportion to the difference between the measured and the true value of the wind power. The farther an object lies from the power curve, the greater its weight and the more likely it is to be detected as an outlier. Thus, the weighted distance treats closeness to the power curve as an auxiliary indicator of validity, which is achieved by applying the following equation:
$$ d(x, y) = \sqrt{(V(x) - V(y))^2 + \omega^{T} (P(x) - P(y))^2} \tag{10} $$
where the notations are identical to those in (6). T ≥ 0 is a
tuning parameter, to be determined separately.
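A minimal sketch of the weighted distance in (10) follows. It assumes that the weight enters as a factor raised to the power T on the squared power difference, which is the reading under which T = 0 recovers the Euclidean distance. How the weight is attached to a pair of objects is not specified here, so the sketch takes it as an argument `w`; the function name is illustrative.

```python
import math

def weighted_distance(x, y, w, T):
    """Weighted distance between two (wind speed, power) objects.

    w is the weight (>= 1) and T >= 0 the tuning parameter; with w = 1 or
    T = 0 the result equals the Euclidean distance.
    """
    dv = x[0] - y[0]
    dp = x[1] - y[1]
    return math.sqrt(dv * dv + (w ** T) * dp * dp)
```

Under this reading, a larger T stretches the power axis more strongly for objects far from the power curve, which pushes them away from their neighbors and raises their outlier score.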
However, the exact wind power curve is ambiguous and
impossible to determine. Therefore, we have to use an
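$$ P_A = \begin{cases} 0, & 0 < V < V_{ci} \\ \dfrac{V^3 - V_{ci}^3}{V_r^3 - V_{ci}^3}, & V_{ci} \le V < V_r \\ 1, & V_r \le V < V_{co} \\ 0, & V_{co} \le V \end{cases} \tag{11} $$

$$ P_A = \begin{cases} 0, & 0 < V < V_{ci} \\ \dfrac{V - V_{ci}}{V_r - V_{ci}}, & V_{ci} \le V < V_r \\ 1, & V_r \le V < V_{co} \\ 0, & V_{co} \le V \end{cases} \tag{12} $$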
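The cubic and linear approximations of the normalized power curve in (11) and (12) can be sketched as two small functions. The function names are illustrative, the turbine speeds used in the usage note are hypothetical, and returning 0 for non-positive wind speeds is an assumption of this sketch.

```python
def p_cubic(v, v_ci, v_r, v_co):
    """Cubic approximation of the normalized power curve, as in (11)."""
    if v < v_ci or v >= v_co:
        return 0.0
    if v < v_r:
        return (v ** 3 - v_ci ** 3) / (v_r ** 3 - v_ci ** 3)
    return 1.0

def p_linear(v, v_ci, v_r, v_co):
    """Linear approximation of the normalized power curve, as in (12)."""
    if v < v_ci or v >= v_co:
        return 0.0
    if v < v_r:
        return (v - v_ci) / (v_r - v_ci)
    return 1.0
```

With hypothetical speeds of 3, 12, and 25 m/s, the two models agree below cut-in, at rated speed, and above cut-out, and differ most in between: at 7.5 m/s the linear model gives 0.5 while the cubic model gives roughly 0.23.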
Fig. 5. Filtered scatter plot of wind data with the Euclidean distance.
B. Parameter Selection
The method for selecting the MinPts and the LOF_threshold
can be found in [14]. In this paper, MinPts equals 300 and
LOF_threshold is 1.1. Selecting a good value for T is critical.
However, unlike supervised learning, there are no outputs by
which to supervise the learning; hence, the most common performance evaluation methods (such as cross validation) cannot
be used. Because the task is outlier detection in a 2-D space,
the simplest way to evaluate the accuracy of the algorithm is
by visual inspection. Another way is to choose the T value
that results in the lowest bias plus variance value (denoted by
bias + variance). As a general rule, as we increase the value
of T , the bias tends initially to decrease faster than the variance increases. Consequently, the expected bias + variance
declines. However, at some point, increasing the value of T
has little impact on the bias but starts to increase the variance significantly. When this happens, the bias + variance
increases.
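The selection rule described above amounts to picking the candidate T with the smallest bias + variance. A minimal sketch is given below; the function name `choose_T` and the layout of `results` are assumptions, and the numbers in the usage note are invented placeholders, not results from this paper.

```python
def choose_T(results):
    """Return the T whose bias + variance is smallest.

    results maps each candidate T to a (bias, variance) pair.
    """
    return min(results, key=lambda t: results[t][0] + results[t][1])
```

For instance, `choose_T({0.3: (1.0, 0.01), 0.5: (0.0, 0.02), 0.8: (0.0, 0.20)})` returns 0.5: the smallest T is penalized by its bias and the largest by its variance.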
According to the definition of bias in Section IV-A, bias measures the detection performance of the algorithm, so the value
of bias is defined as ∞ if the algorithm fails to detect all irrational and unnatural data and as 0 otherwise. In the
same way, variance is used to assess the differences among various power curve approximations, and the value of variance
is computed by comparing the detection results obtained with the different
approximations. In this paper, we use the two models
described in (11) and (12), i.e., the cubic model and the linear
model. The variance is low if most of the outliers detected using
different approximation formulas coincide, which is computed
by
Variance =
TABLE VI
RESULTS OF VARIOUS TUNING PARAMETERS
TABLE VII
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE CUBIC APPROXIMATION MODEL, T = 0.5
TABLE VIII
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE CUBIC APPROXIMATION MODEL, T = 0.7
TABLE IX
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE CUBIC APPROXIMATION MODEL, T = 0.8
Fig. 7. Filtered scatter plot of wind data with the weighted distance: (a) cubic
approximation model, T = 0.5; (b) linear approximation model, T = 0.5;
(c) cubic approximation model, T = 0.7; and (d) linear approximation model,
T = 0.7.
TABLE X
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE LINEAR APPROXIMATION MODEL, T = 0.5
TABLE XI
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE LINEAR APPROXIMATION MODEL, T = 0.7
TABLE XII
CONFUSION MATRIX OF THE WEIGHTED DISTANCE ALGORITHM
USING THE LINEAR APPROXIMATION MODEL, T = 0.8
Wei Hu was born in China, in 1976. He received the B.S. and Ph.D. degrees
in electrical engineering from Tsinghua University, Beijing, China, in 1998 and
2002, respectively.
Currently, he is working as an Associate Professor with the Department
of Electrical Engineering, Tsinghua University. His research interests include
power system modeling and simulation, security analysis, and smart control.
Yong Min was born in China, in 1963. He received the B.S. and Ph.D. degrees
in electrical engineering from Tsinghua University, Beijing, China, in 1984 and
1990, respectively.
He is currently a Professor with the Department of Electrical Engineering,
Tsinghua University. His research interests include power system stability and
control.
Prof. Min is a Fellow of the IET.