An Overview of State-of-the-Art Partial Discharge Analysis Techniques For Condition Monitoring


Key words: partial discharge, condition monitoring, sensor, feature extraction, pattern recognition

Min Wu, Hong Cao, Jianneng Cao, Hai-Long Nguyen, João Bártolo Gomes, and Shonali Priyadarsini Krishnaswamy
Data Analytics Department, Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way #21-01 Connexis, Singapore 138632
IEEE Electrical Insulation Magazine, November/December, Vol. 31, No. 6 (0883-7554/15/©2015/IEEE)

This article presents a comprehensive survey of the state-of-the-art knowledge on partial discharge (PD) signal sensing, feature representation for PD signals, and pattern recognition techniques for PD data analysis.

Introduction
As one step toward the future smart grid, condition monitoring is important to facilitate the reliability of grid asset operation and to save on maintenance cost [1]. Most failures of the power grid are caused by electrical insulation failure, and a key indicator of such electrical failure is the occurrence of partial discharge (PD). Therefore, one focus of condition monitoring is to detect PD, especially in the early stages, to prevent a serious power failure or outage [2].

PD is an electrical discharge that occurs across a portion of the insulation between two conductors, without completely bridging the conductors. PD is harmful because it causes progressive deterioration of power apparatus and insulation, ultimately leading to equipment failure. Therefore, detecting PD and differentiating PD from noise are important tasks of condition monitoring to facilitate grid asset operation and timely maintenance. In addition, PD can take place at both AC and DC voltages; however, because AC voltage systems are much more common and PD at DC voltage is less harmful [3], [4], this article focuses only on PD at AC voltage.

Given that PD happens in various insulation systems, various types of PD can be captured [5]–[7], such as corona, discharge in oil, surface discharge, internal discharge, slot discharge, delamination discharge, and discharge due to floating particles in gas-insulated systems. Corona discharges generally happen in open air and produce a current originating from an electrode with a high voltage. Surface discharges are produced along the surface of the insulation. Internal discharges take place inside solid insulation with gas-filled cavities or near contaminants. In addition, internal discharges are more harmful than surface discharges and corona [8]. As such, discriminating different types of PD provides an initial assessment of the severity of a deteriorated spot where PD is happening.

Nowadays, automatic detection of PD is made possible largely due to advanced sensing technology and data analytic techniques. Typically, a PD monitoring system consists of two components, i.e., a PD signal collection module and a data analysis module. PD emits energy in several ways, for example, producing a current, electromagnetic radiation, light, heat, acoustic waves, and so on. The PD signal collection module involves various sensors that are designed and deployed to capture a current signal, electromagnetic radiation, light, heat, or acoustic signal. The data analysis module is often equipped with advanced pattern recognition techniques to help differentiate PD from noise, and even to tell the specific sources of PD and locate them. Thus, diagnosis strategies, for example sending out an alert, can be made with respect to the analysis results. In addition to these two main components, feature extraction and representation for PD signals can play an important role in data analysis, and we consider it as an independent component. This survey will mainly focus on the pipeline with the three components for PD data collection, representation, and analysis as illustrated in Figure 1.

Figure 1. Flowchart of the three components in a partial discharge (PD) monitoring system.

Detection and Localization for PD Signals
Based on the different types of signals generated by PD, we summarize the PD sensing methods into three main categories: electrical, acoustic, and chemical detection [9]. In fact, PD also generates heat, and thus thermal detection of PD using temperature sensors sounds practicable. However, temperature sensors have not been used for PD detection in the current literature. Next, we briefly introduce each detection method and the sensors involved in each method. In addition, we also briefly introduce PD localization in this section.

Electrical Detection
Electrical detection of PD is based on the electrical phenomena that accompany the discharge, including electromagnetic radiation and electric current pulses.

Ultrahigh frequency (UHF) sensors or UHF antennas are able to detect electromagnetic waves generated by PD (usually with a frequency range from 300 MHz to 3 GHz). UHF sensors are often applied to detect PD in gas-insulated systems and transformers. UHF sensors offer the advantage that external noise and disturbances can be shielded effectively. Therefore, UHF sensors for PD signals have gained increasing attention recently [10], [11]. However, transformers with different internal geometry will have different impedance, and UHF sensors thus need to be recalibrated for different types of transformers.

High frequency current transformer (HFCT) sensors detect the high frequency current pulses generated by PD. The frequency band of HFCT sensors is usually from hundreds of kilohertz to dozens of megahertz, which has been proven to be capable of capturing PD signals. HFCT sensors consist of a magnetic core and screened windings, and they are clamped around the case ground of the components (e.g., switchgears and cable terminations) to measure the current pulses [12], [13]. HFCT sensors with split cores can be conveniently applied to high voltage equipment without disconnecting any part of it, whereas those with closed cores are generally for permanent monitoring and need to be installed prior to the operation.

Acoustic Detection
Acoustic sensors can be used to detect sound waves generated by PD in the sonic and ultrasonic range [14], [15]. Generally, acoustic sensors are applied to detect PD in switchgears, gas-insulated systems, and transformers. Acoustic sensors have several advantages for detecting PD signals. First, they are easily installed. For example, they can be conveniently mounted on the outside of the tank wall of the transformer while the transformer stays in full service. Second, acoustic sensors are immune to electromagnetic interference. Third, acoustic sensors are generally associated with low price, but perform well for localizing the PD sources [16]. Still, acoustic sensors suffer from their own limitations, e.g., they are less sensitive due to background acoustic noise.

Fiber-optic acoustic sensors have been developed to increase the sensitivity of acoustic signal detection. They are becoming highly attractive for PD sensing due to their advantages (e.g., high sensitivity, electrical nonconductivity, and immunity to electromagnetic interference). In [17]–[20], various fiber-optic acoustic sensor systems have been designed for detecting and localizing PD in transformers.

Chemical Detection
Chemical detection of PD is based on the fact that the discharge may generate new chemical components via reactions. Dissolved gas analysis (DGA) is the main chemical detection method [21]–[23], and it focuses on analyzing dissolved gases in the transformer oil. DGA methods include gas chromatography, hydrogen on-line monitoring, and photo-acoustic spectroscopy [23]. The distribution of the gases collected by these methods can help to identify different types of PD, based on the existing IEC standards [23], [24].

Chemical detection is usually intrusive and performed for oil-filled components, e.g., transformers. This limits its application to other components. Furthermore, chemical detection is not able to provide any information about the location of PD.

PD Source Localization
Localizing PD sources accurately is valuable for maintenance and repair purposes. Figure 2 shows an example transformer tank where a PD is inside and several sensors are placed outside. The task for PD localization here is basically to compute the Cartesian coordinates (x, y, z) of the PD. Arrival time analysis is a common method to localize the PD sources [14], [25], [26]. Next, we introduce a common approach based on arrival time analysis.

Figure 2. Multiple sensors (S1, S2, S3, and S4) mounted on a transformer tank with a partial discharge (PD) inside.

Assume that $S_1$ is a reference sensor and the arrival time of PD at $S_1$ is $T$ ($T$ here is unknown). Since we are able to collect the arrival time difference $\tau_{1i}$ between $S_1$ and $S_i$ ($i > 1$, with $\tau_{11} = 0$), we can have the following Equation 1, where $(x_{s_i}, y_{s_i}, z_{s_i})$ are the coordinates of sensor $S_i$. With four equations (i.e., $1 \le i \le 4$), we can thus calculate (x, y, z) of the PD as well as the arrival time $T$ at the reference sensor. Note that $v$ in Equation 1 is the sound velocity for acoustic measurement systems [27] and the light velocity for UHF sensor systems [25], [26]. In addition, we are able to locate the PD with only 3 equations when the arrival time $T$ is known; this scenario is known as trilateration (often loosely called triangulation), which is used in global positioning systems (GPS).

\[
(x - x_{s_i})^2 + (y - y_{s_i})^2 + (z - z_{s_i})^2 = (T + \tau_{1i})^2 \, v^2 \tag{1}
\]

As introduced above, both UHF sensors [25], [26] and acoustic sensors [14], [27] can be used for PD localization. In addition, HFCT sensors can be used for PD localization in crosslinked polyethylene cables based on the time domain reflectometry method [28]; the details of the time domain reflectometry method are not introduced here. In the reported studies, acoustic sensors achieved much better performance than UHF sensors for PD localization. For example, the location error achieved in [26] using UHF sensors was 14.33 cm, while the dimension of the transformer tank was 71.5 × 118 × 95 cm. With transformers of similar size, the location error using acoustic sensors in [14] and [19] was 5.4 and 6 cm, respectively.

Summary for PD Detection and Localization
Various sensor systems have their own advantages and limitations. For example, UHF sensors have high sensitivity for signal detection, while fiber-optic acoustic sensors have the advantages of high sensitivity, electrical nonconductivity, and immunity to electromagnetic interference. As shown in [29], the signals detected by acoustic sensors were significantly attenuated within the cable joint. Meanwhile, the detection sensitivity of radio frequency current transformer sensors (similar to HFCT sensors but with a lower frequency range) was not good, and thus the authors in [29] further applied a wavelet denoising method to increase the signal-to-noise ratio for the signals from radio frequency current transformer sensors.

Table 1. The features for various partial discharge detection methods and sensors

| Item | Ultrahigh frequency sensors | High frequency current transformer sensors | Optical acoustic sensors | Dissolved gas analysis |
|---|---|---|---|---|
| Signals | Electromagnetic waves | Electrical current pulse | Acoustic waves | NA |
| Signal frequency | 0.300–3 GHz | 3–30 MHz | 10–300 kHz | NA |
| Installation difficulty | Moderate | Easy | Easy | Moderate |
| Ability to localize partial discharge | Limited | Limited | Yes | No |

NA = not applicable.

Feature Representation for PD Signals
For PD signals from the PD detectors in Table 1, we generally represent them in two different patterns, namely, time-resolved patterns and phase-resolved patterns, as shown in Figure 3. Figure 3(A) shows a time-resolved pattern, i.e., a q-t waveform, where q is the amplitude (i.e., the apparent charge or discharge voltage) and t shows the time information. Figure 3(B) demonstrates a phase-resolved pattern, i.e., a φ-q-n pattern, where φ is the phase angle for the PD pulse, q also refers to the apparent charge or discharge voltage, and n is the number of pulses. Next, we will introduce the features extracted from both waveforms and phase-resolved patterns.

Figure 3. (A) Time-resolved pattern (waveform) and (B) phase-resolved pattern.

In addition, a common representation for phase-resolved patterns is based on the phase-window method in [6], [7], [30], [31]. The phase-window method divides the 360° power cycle into several small phase windows and then generates some features for each phase window. For example, we have 360 windows if each phase window has a size of 1°. Assume that we have three features for each phase window, namely, the number of pulses, the maximal amplitude, and the average amplitude in this phase window. We can thus represent the phase-resolved pattern with a 360 × 3 = 1,080 dimensional vector. Due to the curse of dimensionality, it is difficult to analyze, understand, and visualize such high dimensional data (e.g., 1,080 dimensional data as introduced above). Feature extraction techniques for dimensionality reduction are thus important before we can analyze the PD data. In this section, we will also introduce principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) for feature dimensionality reduction.

Pulse Characteristics
Figure 4 shows an example PD pulse and typical features to describe the pulse shape. Assuming that the maximum amplitude of the pulse is A, we can extract the following features to describe the pulse shape from the waveform, e.g., pulse rise time, decay time, and pulse width (duration) [32].

• Pulse rise time (tr): time required to increase from 0.1A to 0.9A
• Pulse decay time (td): time required to decrease from 0.9A to 0.1A
• Pulse width (tw): the time interval between 0.5A on both sides

Figure 4. Typical features to describe the shape of a partial discharge pulse. tr = pulse rise time; tw = pulse width; td = pulse decay time.

Statistical Features
Given a set of pulses, the phase-resolved pattern is a good representation for them, as mentioned above. The phase-resolved pattern is first divided into many phase windows, e.g., each phase window has 1° and we can thus have 360 phase windows. We can then extract some features in each phase window, e.g., the number of pulses, maximum amplitude, and average amplitude. The statistical moments of these features over all the phase windows can be extracted to further represent the given set of pulses [32].

Assume that we have $N$ phase windows, and $x_i$ is a specific feature value for the $i$th phase window. The mean of $x_i$ over these $N$ phase windows would be $\mu = \sum_{i=1}^{N} x_i / N$. If we consider that phase windows may have different importance [let $p(x_i)$ be the importance of the $i$th phase window], the weighted mean is computed as in Equation 2. For simplicity, the other statistics, e.g., variance ($\sigma^2$), skewness ($\gamma$), and kurtosis ($\kappa$), are all defined as follows without considering the importance of phase windows [33], [34].

\[
\text{Weighted mean } (\mu_w):\quad \mu_w = \frac{\sum_{i=1}^{N} x_i \, p(x_i)}{\sum_{i=1}^{N} p(x_i)} \tag{2}
\]

\[
\text{Variance } (\sigma^2):\quad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \tag{3}
\]

\[
\text{Skewness } (\gamma):\quad \gamma = \frac{\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3 \times N} \tag{4}
\]

\[
\text{Kurtosis } (\kappa):\quad \kappa = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4 \times N} - 3 \tag{5}
\]
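The phase-window representation and the moments in Equations 2 to 5 can be illustrated with a short sketch. It assumes NumPy, pulses given as arrays of phase angles (in degrees) and nonnegative amplitudes, and a feature distribution that is not constant (so that σ > 0). The function names `phase_window_features`, `weighted_mean`, and `moments` are illustrative only and do not come from the cited works.

```python
import numpy as np

def phase_window_features(phase_deg, amplitude, n_windows=360):
    """Per-window pulse count, maximum amplitude, and average amplitude.

    Assumes nonnegative amplitudes; empty windows get zeros.
    """
    phase_deg = np.asarray(phase_deg, dtype=float) % 360.0
    amplitude = np.asarray(amplitude, dtype=float)
    width = 360.0 / n_windows
    idx = np.minimum((phase_deg // width).astype(int), n_windows - 1)

    count = np.bincount(idx, minlength=n_windows).astype(float)
    total = np.bincount(idx, weights=amplitude, minlength=n_windows)
    peak = np.zeros(n_windows)
    np.maximum.at(peak, idx, amplitude)          # running maximum per window
    mean = np.divide(total, count, out=np.zeros(n_windows), where=count > 0)
    return np.column_stack([count, peak, mean])  # shape (n_windows, 3)

def weighted_mean(x, p):
    """Equation 2: mean of x weighted by the window importance p."""
    x, p = np.asarray(x, dtype=float), np.asarray(p, dtype=float)
    return (x * p).sum() / p.sum()

def moments(x):
    """Mean and Equations 3 to 5 (population variance, skewness, excess kurtosis)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = x.sum() / n
    var = ((x - mu) ** 2).sum() / n
    sigma = np.sqrt(var)
    skew = ((x - mu) ** 3).sum() / (sigma ** 3 * n)
    kurt = ((x - mu) ** 4).sum() / (sigma ** 4 * n) - 3.0
    return mu, var, skew, kurt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phases = rng.uniform(0.0, 360.0, size=5000)   # synthetic pulses, not real PD data
    amps = rng.exponential(1.0, size=5000)
    feats = phase_window_features(phases, amps)   # 360 x 3 matrix
    print(feats.reshape(-1).shape)                # flattened 1,080-dimensional vector
    for name, column in zip(("count", "max", "mean"), feats.T):
        print(name, moments(column))
```

Flattening the 360 × 3 matrix gives the 1,080-dimensional phase-window vector, whereas applying `moments` to each column gives the much shorter statistical description.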


In the above definitions, skewness and kurtosis are calculated with respect to a reference normal distribution. Skewness is a measure of asymmetry or degree of tilt of the data with respect to the normal distribution, which has a skewness of zero. Negative values for the skewness indicate data that are skewed left (the left tail is longer than the right tail), and positive values indicate data that are skewed right. Kurtosis is an indicator of the sharpness of a distribution. If a distribution has the same sharpness as the normal distribution, the kurtosis is zero. Negative values for kurtosis indicate a distribution flatter than the normal distribution, whereas positive values indicate a sharper distribution.

PD pulses generally occur in both positive and negative halves of the voltage cycle. Several features can be extracted to tell the differences between the two halves. Please refer to [32] for the details of such features, e.g., discharge asymmetry and cross-correlation factor.

Recall that we use a feature vector with 1,080 dimensions to represent a set of pulses based on the phase-window method, i.e., 360 phase windows × 3 features from the phase-resolved pattern. Now, we consider the three features from the phase-resolved pattern as three distributions, and we have four of the above statistical features, i.e., mean, variance, skewness, and kurtosis, which are calculated separately for these three distributions. Note that all these features are calculated for positive and negative voltage cycles. Hence, we have 3 × 4 × 2 = 24 features. In addition, we have six features from discharge asymmetry and cross-correlation factor. In total, we extract 30 features based on the statistical moments. As such, the feature dimension for a set of pulses is much reduced compared with the 1,080 features based on the phase-window method.

Signal Processing Method
Wavelet transform is a common signal processing method for various signals including audio signals, images, and so on. It is particularly suitable for analyzing irregular and nonperiodical signals. In this section, we focus on its application for extracting features for PD data.

A wavelet $\psi(t)$ is a small wave-type signal with an average value of zero (i.e., $\int \psi(t)\,dt = 0$) and a finite energy (i.e., $\int_{-\infty}^{+\infty} [\psi(t)]^2\,dt < \infty$). The continuous wavelet transform (CWT) of a signal $f(t)$ with respect to a wavelet $\psi(t)$ is defined as follows:

\[
\mathrm{CWT}(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi\!\left(\frac{t - b}{a}\right) dt, \tag{6}
\]

where $a$ is a scale factor and $b$ is a translation factor. The scale factor is for compressing and stretching the mother wavelet, whereas the translation factor is for shifting the wavelet function $\psi(t)$ along the time axis. Both $a$ and $b$ are continuous variables in CWT.

If we set $a = 2^j$ and $b = n \times a$ in Equation 6 with $n$ and $j$ being integers, we then derive a discrete wavelet transform (DWT). DWT can be implemented by a well-known pyramidal algorithm based on multiresolution analysis as shown in Figure 5. A signal is decomposed into an approximation signal and a detail signal by half-band low-pass and high-pass filters, respectively. The approximation signal can be further decomposed into two signals iteratively. As such, a k-level DWT generates (k + 1) signals $(cD_1, \cdots, cD_k, cA_k)$, where $cD_i$ is the detail signal at the $i$th level, $1 \le i \le k$, and $cA_k$ is the approximation signal at the $k$th level. Similar to DWT, wavelet packet transform (WPT) is another type of wavelet transform where every signal in each level is decomposed into two signals by two half-band filters. Thus, for k-level decomposition, WPT produces $2^k$ different signals.

Figure 5. The iterative decomposition process of discrete wavelet transform.

In [35], an original PD signal $S$ was transformed to $n$ decomposition signals $(dS_1, \cdots, dS_n)$ by DWT. A decomposition signal $dS_i$ has an energy $E_i = \sum_t dS_i(t)^2$. $S$ is then represented by a normalized energy vector $(E_1', \cdots, E_n')$, where $E_i' = E_i / \left(\sum_{j=1}^{n} E_j\right)$. Those energy coefficients for PD data were generated to discriminate different types of PD sources via clustering in [35].

k-level WPT was applied to generate $2^k$ decomposition signals for an original PD signal $S$ in [7], [36], [37]. In [36], the authors calculated energy, skewness, and kurtosis for these $2^k$ decomposition signals, as well as the $2^k - 1$ intermediate signals. However, each PD signal then has a large number of features, i.e., $3 \times (2^{k+1} - 1)$ features, and feature selection is needed for further analysis in that work. In [37], [7], each level is assigned 4 features, namely, mean ($\mu_i$), variance ($\sigma_i^2$), skewness ($\gamma_i$), and kurtosis ($\kappa_i$) for the signals in this level. Hence, original PD signals are assigned $4k$ features ($k = 9$ and thus 36 features in [37], [7]). Note that the statistical moments here were derived for waveforms from the distribution of discharge amplitude against time.

PCA
Principal component analysis is a common statistical technique for feature extraction and dimensionality reduction [38]. We can represent $m$ data instances with $n$ features as a data matrix $X \in R^{m \times n}$. Assuming that $X$ is normalized (each column with mean 0), $C = (1/m) X^T X$ is the covariance matrix of $X$. Here, $X^T$ is the transpose of $X$ and $C \in R^{n \times n}$. Subsequently, we calculate the eigenvalues and eigenvectors of $C$ based on eigenvalue decomposition. For example, $\lambda$ is an eigenvalue and $v \in R^{n \times 1}$ is its corresponding eigenvector if $\lambda v = Cv$. We then select the top-$p$ (largest) eigenvalues and their eigenvectors, and $V \in R^{n \times p}$ is the matrix consisting of these $p$ eigenvectors ($p < n$). Finally, the data from the $n$-dimensional space $X$ is mapped to the $p$-dimensional space $X_{pca}$ by $X_{pca} = XV$.

PCA for Waveform Data
In [39], each PD pulse was represented by its waveform as a 2,000-dimension vector (2,000 points during a 20-μs sampling period). PCA was then applied to reduce the feature dimension from 2,000 to 2 to visualize and group PD pulses. In [40], each PD pulse was represented by its waveform as a 5,000-dimension vector. Transformation of the PD pulses from the time domain (waveforms) to the frequency domain via Fourier transform reduced the dimension of each signal from 5,000 to 2,500. PCA further reduced the feature dimension from 2,500 to 6 for hierarchical cluster analysis. In [35], wavelet decomposition on the waveform of a PD pulse generated a number of wavelet energy coefficients (e.g., 6 energy coefficients) for this pulse. PCA further reduced the dimensionality from 6 to 3 for PD visualization and clustering.

PCA for Phase-Resolved Data
In [7], the feature vector for the phase-resolved pattern as shown in Figure 3(B) had 600 dimensions (200 phase windows × 3 features in each phase window). PCA then reduced the dimensionality from 600 to 7 for PD source classification. In [41], [6], each phase-resolved pattern had a feature vector with 1,440 dimensions (360 phase windows × 4 features in each phase window). PCA was similarly used for dimensionality reduction for PD source classification.

t-SNE
t-SNE is another dimensionality reduction technique that was recently developed for visualizing high-dimensional data [42]. t-SNE aims to project the data to a low-dimensional space while retaining their pairwise distance or similarity. For this purpose, t-SNE first constructs a probability distribution $P$ over pairs of high-dimensional objects in such a way that two similar objects have a high probability score, while dissimilar points have a low probability score. Let $p_{ij}$ denote the probability for the pair $(x_i, x_j)$ in the high-dimensional space. Similarly, t-SNE defines a probability distribution $Q$ over the points in the low-dimensional map, and $q_{ij}$ is the probability for the corresponding pair $(y_i, y_j)$ in the low-dimensional space. If the two probability distributions $P$ and $Q$ are similar or close, two close points $x_i$ and $x_j$ in the high-dimensional space are likely to be nearby in the lower-dimensional space. Hence, t-SNE uses the Kullback-Leibler (KL) divergence in Equation 7 to measure the distance between the two distributions $P$ and $Q$. By minimizing the KL divergence between $P$ and $Q$, t-SNE thus reduces the dimensionality of the original data and retains their pairwise distance or similarity.

\[
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{7}
\]

In [43], wavelet decomposition on the waveform of a PD pulse generated 10 wavelet energy coefficients for this pulse. t-SNE further reduced the dimensionality from 10 to 3 for PD pulse visualization and clustering. In [7], the feature vector for each phase-resolved pattern had 600 dimensions, and t-SNE then reduced the dimensionality from 600 to 7 for PD source classification.

Summary
Table 2 summarizes the features introduced above into two categories. For a specific feature extraction method, we also list in the table the references where it was used for PD data analysis.

The first category of features is derived directly from the waveforms or the phase-resolved patterns, and the second type of features is generated by feature dimensionality reduction techniques, e.g., PCA and t-SNE. These two types of features have their own advantages. The first type of features has clearer physical meanings and may achieve better performance for PD data analysis than the second type. For example, in [7], DWT for feature extraction from waveforms provided the best set of features. Statistical operators (e.g., skewness, kurtosis, etc.) were the second best set of features, which were better than those generated by PCA and kernel PCA. The second type of features generally has low dimensionality and is easy to visualize. However, these features are generally combinations of the original features, and thus it is hard to interpret their physical meanings.

PD Clustering
After PD signals are represented in feature space, we are able to analyze them based on both unsupervised and supervised pattern recognition techniques. In this section, we introduce various methods for clustering PD signals (unsupervised methods). In the next section, we will introduce different models for classifying PD signals (supervised methods).

Clustering algorithms divide objects into groups, such that objects in the same group are more similar to each other than to those from a different group. PD pulses from the same PD source tend to have similar characteristics (e.g., similar waveforms or phase-resolved patterns). Clustering algorithms are thus widely used for separating multiple PD sources.


Table 2. Feature extraction for partial discharge signals

Type I: Features extracted directly from the waveforms or phase-resolved (PR) patterns

| Data pattern | Method | References |
|---|---|---|
| Waveforms | Pulse shape | [44]–[48] |
| Waveforms | DWT and WPT | [7], [35]–[37] |
| Waveforms | Fourier transform | [40] |
| PR patterns | Phase window method | [6], [7], [30], [31] |
| PR patterns | Statistical features | [6], [7], [41], [44], [49], [50], [51] |

Type II: Features derived from dimensionality reduction techniques

| Technique | Data pattern | References |
|---|---|---|
| PCA | Waveforms | [35], [39], [40], [52] |
| PCA | PR patterns | [6], [7], [41], [49], [51] |
| t-SNE | Waveforms | [43], [53] |
| t-SNE | PR patterns | [7] |

DWT = discrete wavelet transform; WPT = wavelet packet transform; PCA = principal component analysis; t-SNE = t-distributed stochastic neighbor embedding.
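As a concrete instance of the two feature types in Table 2, the sketch below chains a Type I step (normalized DWT energy coefficients of a waveform) with a Type II step (PCA by eigendecomposition of the covariance matrix). It loosely mirrors the 5-level DWT, 6 energies, 3 principal components pipeline reported for [35], but it is only an illustration under stated assumptions: NumPy is used, the Haar wavelet is chosen for simplicity (the mother wavelet is not specified here), the pulses are synthetic random data, each signal has nonzero energy, and the names `haar_dwt_energies` and `pca_project` are hypothetical.

```python
import numpy as np

def haar_dwt_energies(signal, levels=5):
    """Normalized energies of (cD1, ..., cDk, cAk) from a k-level Haar DWT."""
    approx = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        if approx.size % 2:                      # pad odd lengths with the last sample
            approx = np.append(approx, approx[-1])
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)     # high-pass branch, downsampled by 2
        approx = (even + odd) / np.sqrt(2.0)     # low-pass branch, downsampled by 2
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))         # energy of the final approximation
    energies = np.array(energies)
    return energies / energies.sum()             # E_i' = E_i / sum_j E_j

def pca_project(X, p):
    """Map m x n data onto the p eigenvectors with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                      # center each column
    C = Xc.T @ Xc / Xc.shape[0]                  # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1][:p]        # indices of the top-p eigenvalues
    V = eigvecs[:, order]                        # n x p projection matrix
    return Xc @ V                                # m x p reduced data

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pulses = rng.standard_normal((20, 2000))     # 20 synthetic 2,000-sample waveforms
    feats = np.array([haar_dwt_energies(s) for s in pulses])   # 20 x 6
    reduced = pca_project(feats, 3)                              # 20 x 3
    print(feats.shape, reduced.shape)
```

The energy vector keeps a physical reading (how the pulse energy is spread over frequency bands), while the PCA output is a linear mixture of those energies, which matches the interpretability trade-off between the two feature types.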

K-Means
K-means is a widely used centroid-based clustering algorithm. K is a user-specified parameter, i.e., the number of clusters desired. It is an iterative refinement technique with the following steps. First, K initial centroids are randomly selected. Second, K clusters are formed by assigning each object to its closest centroid (assignment step). Third, the centroid of each cluster is updated (update step). K-means clustering proceeds by alternating between the assignment step and the update step until it converges. In fact, the above iterative process is a heuristic algorithm to minimize the optimization function in Equation 8. Here, $C = \{C_1, \ldots, C_k\}$ is the set of $k$ clusters and $o_i$ is the centroid of the cluster $C_i$.

\[
\arg\min_{C} \sum_{i=1}^{k} \sum_{x_j \in C_i} \left\| x_j - o_i \right\|^2 \tag{8}
\]

All the approaches in [11], [44], [45], [54], [55] used the K-means algorithm to group signals into different clusters. They differ in the features extracted for clustering. In [44], PD signals were represented by a feature vector with the statistics of the phase-resolved patterns (i.e., skewness, kurtosis, asymmetry, and cross-correlation factor). In [54], the authors proposed a normalized auto-correlation function for recorded signals, which summarizes both time and frequency domain features. In addition, the similarity between two signals is measured by the Pearson correlation coefficient between their normalized auto-correlation functions. In [45], the authors first extracted the features for the pulse shape (e.g., pulse rise and decay time, maximum amplitude, and so on) and generated clusters with K-means. Then, they merged clusters based on the distribution range of the pulse equivalent frequencies to achieve optimal clustering results. In [11], nine features were extracted for signals recorded by UHF sensors, based mainly on the amplitude and time intervals between two consecutive pulses. In [55], PD pulses were represented by their amplitudes and phase angles, and rules were applied to tell the PD source for each cluster based on its shape.

K-means is a simple and efficient clustering method that is widely used. However, K-means also has two limitations. On one hand, it may converge to a local minimum of Equation 8. On the other hand, we need to specify the number of clusters $k$, while it is often challenging to predetermine the right number of clusters since we may have no idea about the number of PD sources.

Fuzzy C-Means
Fuzzy C-means (FCM) is a soft version of K-means, where each object has a fuzzy degree of belonging to each cluster [56]. Here, "soft" clustering means that an object can be assigned to multiple clusters. In FCM, the optimization function in Equation 9 is extended from Equation 8.

\[
\arg\min_{P, C} \sum_{i=1}^{k} \sum_{j=1}^{N} p_{ij}^{m} \cdot \left\| x_j - o_i \right\|^2 \tag{9}
\]

In Equation 9, $C = \{C_1, \ldots, C_k\}$ is the set of $k$ clusters and $o_i$ is the center of the cluster $C_i$. $P = \{p_{ij}\}$ ($1 \le i \le k$, $1 \le j \le N$, where $N$ is the total number of objects) and $p_{ij}$ shows the probability (membership degree) of the object $x_j$ belonging to the cluster $C_i$. In addition, $m$ is a parameter that controls how much weight is given to the closest center. Using Lagrange multipliers, FCM computes $P$ and $C$ by iteratively updating $p_{ij}$ and $o_i$ in Equation 10 until they converge.


2 (
2/ m −1) clustered by DBSCAN and visualized in 3-D space. Similar
 
k  x j − oi to [35], each PD signal was represented by energy coefficients
(pij ) = ∑  
 2  with 10 dimensions in [43]. The authors then applied t-SNE for
l =1  x j − ol  feature dimensionality reduction (e.g., the number of features
was reduced from 10 to 3) and DBSCAN for PD data cluster-
∑ j =1x j ⋅ pijm ing. The approach [60] extended the work in [54], which applied
and oi = N
. (10) K-means for PD clustering, and adapted DBSCAN instead to
∑ j =1pijm generate overlapping PD clusters (i.e., some PD signals were
allowed to be involved in multiple clusters).
Compared with K-means, DBSCAN has two advantages.
In [51], 26 statistical features were extracted from the phase- First, it does not require one to specify the number of clusters.
resolved patterns, and 11 principle features were further ob- Second, it can find clusters with various shapes (e.g., those non-
tained by applying PCA. The authors then applied FCM and linearly separable clusters), which may not be detected by K-
its two variants, kernel fuzzy c-means and probabilistic fuzzy means.
c-means, for PD data clustering. They draw two interesting con-
clusions in this paper. First, 11 principle features identified by Hierarchical Clustering
PCA performed better than the original 26 features, i.e., feature Hierarchical clustering is a clustering method that builds a
reduction by PCA helped various FCM. Second, probabilistic hierarchy of clusters in agglomerative (“bottom up”) or divisive
fuzzy c-means achieved better results. In [57], [58], the authors (“top down”) manners. As these two manners are reverse of each
also applied FCM to PD pulses for separating their PD sources. other, next we only introduce the agglomerative hierarchical
clustering in details.
DBSCAN For agglomerative hierarchical clustering, each object is ini-
DBSCAN is a well-known density-based clustering algo- tially considered as an individual cluster. Based on various defi-
rithm [59]. It has two parameters ε and Nmin. Given a point P and nitions of the distance between two clusters as shown in Table
a distance metric d, N(P) = {Q | d(P,Q) ≤ ε} is the set of neigh- 3, two nearest clusters are then selected and merged. In Table
bors of P, and we define that P and all the points in N(P) are con- 3, Dist(A, B) is the distance between two clusters A and B, and
nected. If two different points X and Y are connected to a third dist(a, b) is the distance between two objects (e.g., Euclidean
point Z, X and Y are also connected. If N(P) | ≥ Nmin, DBSCAN distance, Manhattan distance, Mahalanobis distance, and so on).
generates a cluster from P by including all the points that are In addition, ca and cb are the centroids of A and B, respectively.
connected to P (using bread-first search or depth-first search). The merge operation is repeated until all the objects are in the
Once a cluster is identified, DBSCAN selects a new point P′ not same cluster or some stop criteria is satisfied.
involved in any clusters and repeats the above procedure. Hierarchical clustering was used for PD data clustering in
DBSCAN was exploited for clustering PD signals [35], [43], [39], [40], [49]. In [40], raw waveforms with 5,000 samples were
[60]. In [35], raw waveforms for PD signals were first converted converted to 6-dimensional feature vectors by Fourier transform
to 6 decomposition coefficients by 5-level DWT. Each PD signal followed by PCA. The authors then applied hierarchical cluster-
was then represented by a 6-dimensional vector with the energy ing to analyze PD signals. In particular, they considered 2 types
coefficients. Principal component analysis further reduced the of distance between instances/PD signals (i.e., Euclidean and
feature dimensionality from 6 to 3. Finally, PD signals were Mahalanobis distance) and 2 types of distance between clusters

Table 3. Distance between two clusters A and B, Dist(A, B)

| Name for Dist(A, B) | Formula |
|---|---|
| Single-nearest distance (single linkage) | $\min_{a \in A,\, b \in B} dist(a, b)$ |
| Complete-farthest distance (complete linkage) | $\max_{a \in A,\, b \in B} dist(a, b)$ |
| Average-average distance (average linkage or unweighted pair group method with average, i.e., UPGMA) | $\frac{1}{\lvert A \rvert \, \lvert B \rvert} \sum_{a \in A} \sum_{b \in B} dist(a, b)$ |
| Centroid distance (unweighted pair group method with centroid, i.e., UPGMC) | $dist(c_a, c_b)$ |
| Ward’s criterion | $\sum_{a \in A} \sum_{b \in B} dist(a, b)$ |
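As an illustration of the agglomerative procedure and of the cluster distances in Table 3, the following is a minimal Python sketch rather than the implementation used in any of the surveyed studies. The function names, the toy 2-D feature vectors, and the choice of Euclidean distance for dist(a, b) are assumptions made for this example. Ward's criterion is left out, and the stopping criterion is simply a target number of clusters.

```python
from itertools import combinations
from math import dist  # Euclidean dist(a, b) between two points (Python 3.8+)


def single_linkage(A, B):
    """Smallest distance between any object in A and any object in B."""
    return min(dist(a, b) for a in A for b in B)


def complete_linkage(A, B):
    """Largest distance between any object in A and any object in B."""
    return max(dist(a, b) for a in A for b in B)


def average_linkage(A, B):
    """Mean of all pairwise distances between A and B (UPGMA)."""
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_linkage(A, B):
    """Distance between the centroids of A and B (UPGMC)."""
    return dist(centroid(A), centroid(B))


def agglomerative(points, linkage, num_clusters):
    """Merge the two nearest clusters until num_clusters clusters remain."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    while len(clusters) > num_clusters:
        # Pick the pair (i, j), i < j, with the smallest cluster distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D feature vectors standing in for reduced PD features.
features = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
result = agglomerative(features, average_linkage, num_clusters=3)
```

Passing a different linkage function changes only how the nearest pair is chosen; the merge loop itself stays the same. This naive search over all cluster pairs is written for clarity, not speed.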

November/December — Vol. 31, No. 6 29
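The DBSCAN procedure described in this section (ε-neighborhoods, the $N_{min}$ threshold, and cluster growth by breadth-first search) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it follows the standard formulation in which only points with at least $N_{min}$ neighbors propagate a cluster, it uses Euclidean distance, and the names `dbscan`, `eps`, `n_min`, and `NOISE` as well as the sample points are chosen for this example only.

```python
from collections import deque
from math import dist  # Euclidean distance (Python 3.8+)

NOISE = -1  # label for points that end up in no cluster


def dbscan(points, eps, n_min):
    """Return one cluster label per point; NOISE marks unclustered points."""

    def neighbors(i):
        # N(P) = {Q | d(P, Q) <= eps}; P itself is part of its neighborhood.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbors(i)
        if len(seeds) < n_min:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster_id
        queue = deque(seeds)  # breadth-first expansion of the new cluster
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= n_min:  # only dense points keep growing the cluster
                queue.extend(nbrs)
        cluster_id += 1
    return labels


# Hypothetical 2-D points: two dense groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(sample, eps=1.5, n_min=3)
```

The number of clusters is not an input here; it follows from `eps` and `n_min`, which matches the first advantage over K-means noted in the text.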

(i.e., average linkage and Ward’s linkage) and generated 4 different sets of hierarchical clustering results. The approach in [49] extended the work in [51] (which clustered PD signals by fuzzy c-means) and applied hierarchical clustering to group PD signals using Manhattan distance and complete linkage. In [39], 3 types of cluster distances (i.e., single, average, and complete linkages) were used for hierarchical clustering to separate PD signals.

Hierarchical clustering is especially good for analyzing clusters with inherent structures (e.g., some big clusters contain subclusters). However, it is generally slow, with a time complexity of $O(n^3)$ (n is the number of instances to be clustered) to build the full hierarchy. In addition, it is challenging to determine the optimal number of clusters for hierarchical clustering. Solutions for this problem include selecting a distance threshold to cut the dendrogram [51], predefining the number of clusters, and exploiting measures for the cluster structures (e.g., the modularity in [61]).

Summary
Clustering techniques are usually applied when the labels of PD sources are not available. Currently, there are two ways of validating the clustering output. First, computational metrics were proposed for evaluation, such as intracluster and intercluster distance [39] and normalized mutual information [62]. Second, domain knowledge was manually exploited to evaluate the quality of detected clusters. For example, phase-resolved patterns were used for evaluation, as PD pulses from different sources tend to have distinct distributions in phase-resolved patterns [35], [43], [54], [60]. In the future, we are motivated to transfer such domain knowledge to intelligent algorithms that work automatically for PD data analysis and evaluation.

In summary, the above clustering methods with certain feature extraction techniques have been demonstrated to be effective for separating different PD sources. In a real application, a user can select a clustering method based on its advantages and limitations, as well as the clustering validation results. Once we have multiple sets of clusters identified by various clustering methods, we would have the opportunity to use ensemble clustering (or consensus clustering) [62], [63]. Ensemble clustering combines different clustering solutions into a consensus one. As it exploits the complementary nature of individual clustering methods by leveraging their outputs, it generally achieves better clustering results than its component clustering solutions.

PD Classification
The above PD clustering is an unsupervised process to group similar PD signals into the same cluster. Here, PD classification is a general term that refers to the supervised learning process on labeled signals. It involves various analytic tasks, which include classifying PD and noise, discriminating multiple PD sources, assessing the insulation status, diagnosing different types of transformer failure, and so on. In this section, we introduce three popular techniques for PD classification, as shown in Figure 6, namely, support vector machines, neural networks, and decision trees.

Support Vector Machines (SVM)
SVM is a state-of-the-art classification technique in machine learning [64], and it has been proven to be one of the best classifiers in many application domains, e.g., image recognition, text categorization, and so on.

In Figure 6(A), y = w · x + b is a linear hyperplane that can be used for classification, while 2/||w|| is the margin for this hyperplane. The basic principle of SVM is to find the maximum-margin hyperplane by solving the convex optimization problem in Equation 11. The first term $(1/2)\lVert w \rVert^{2}$ in the equation indicates the margin size, while the second term is the penalty for misclassifying instances ($y_i$ is the real label of the ith instance, and $w \cdot x_i + b$ is the predicted label). The optimization problem can be solved using quadratic programming [65].

$$w^{*} = \arg\min_{w \in \mathbb{R}^{n}} \left\{ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \max\{0,\, 1 - y_i \cdot (w \cdot x_i + b)\} \right\} \tag{11}$$

Figure 6(A) shows a linear case for SVM. To find nonlinear hyperplanes, various kernel functions can be applied to map the original input space to a higher dimensional space where we can find linear hyperplanes for classification [64]. Common kernel

Figure 6. (A) Linear support vector machines and the maximum margin principle; (B) an example of multilayer neural network; and
(C) a decision tree for the mammal classification problem.
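To make the two terms of Equation 11 concrete, the following minimal Python sketch evaluates the objective and minimizes it on a toy data set. The text states that the problem can be solved by quadratic programming; the sketch swaps in plain subgradient descent because it needs no solver library. The function names, learning rate, number of epochs, and data points are assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    margin_term = 0.5 * dot(w, w)
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return margin_term + C * hinge


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize the Equation 11 objective by subgradient descent (not QP)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the margin term (1/2)||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # hinge penalty is active
                for k in range(len(w)):
                    grad_w[k] -= C * yi * xi[k]
                grad_b -= C * yi
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical linearly separable data with labels +1 and -1.
X = [(1.0, 1.0), (2.0, 2.0), (2.0, 0.0), (-1.0, -1.0), (-2.0, -2.0), (-2.0, 0.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
predictions = [1 if dot(w, xi) + b > 0 else -1 for xi in X]
```

A larger C puts more weight on the misclassification penalty relative to the margin term, which is the trade-off the two terms of Equation 11 express.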

30 IEEE Electrical Insulation Magazine

functions include polynomial kernel, Gaussian kernel (radial basis function), and so on.

In [50], SVM was used to identify the insulation defect location based on the parameters or features from the PD signals. The defect location is simplified to indicate whether the defect is close to a high voltage electrode or a low voltage electrode, i.e., the defect location is a binary label. Polarity and dissymmetry features from the phase-resolved patterns were then fed into SVM for predicting the locations of the insulation defects.

In [46], contaminating particles in transformer mineral oils were separated into four classes based on their length, radius, and metal type. Features were collected from PD signals generated by the particle-electrode collision (e.g., the amplitude of the PD pulse and the time interval between PD pulses). SVM was then applied to classify the contaminating particles.

In [6], [30], [31], SVM was employed to discriminate different types of PD sources. For example, in [6], two sets of features were generated from the phase-resolved patterns. On both feature sets, SVM achieved higher accuracy for classifying PD sources than the other two classification models, back-propagation neural network and self-organizing map.

Some variants of SVM, e.g., least squares SVM [58] and fuzzy-SVM [7], are used for PD source classification. For least squares SVM, the least squares error, i.e., $(y_i - w \cdot x_i - b)^2$, is used as the penalty term in Equation 11. For fuzzy-SVM, each data point has a fuzzy membership so that different input points can make different contributions to computing the decision hyperplane.

Neural Network
Neural networks are machine learning models inspired by the central nervous systems of animals (e.g., brains). A neural network generally consists of an input layer, an output layer, and one or more hidden layers as shown in Figure 6(B) (a neural network without a hidden layer is often denoted as a “perceptron”). The nodes (called “neurons”) in the network connect to all the nodes in the preceding and following layers with different weights (called “synapses”). In addition, each neuron in the input layer is an input feature, while each neuron in the output layer is a predicted class. A neuron computes a weighted sum of the incoming signals and has a nonlinear activation function to determine whether it is active or not.

During the training phase, we generally learn a nonlinear mapping between input and output, which is determined by the weights of edges and neurons in the network. In particular, the back-propagation algorithm is a common method developed to learn these weights. The weights learned in the training phase are then used for classifying new instances.

Neural network is one of the most popular classification techniques used for PD analysis. In [66], the authors conducted a comprehensive review of neural network techniques used for PD analysis before 2003. For example, the two studies [33], [34], which used neural networks with statistical features for PD data analysis, were also reviewed in the above survey. Since then, various neural network techniques have remained widely used for analyzing PD data in [6], [7], [36], [37], [67]–[69].

In [67], the input data of the neural network was the percentages of gases (e.g., H2, C2H2, and C2H4), which were generated by DGA. Based on the input, the neural network then classified the transformer faults as overheating, discharge, and PD. As shown in their experiments, the trained neural network outperformed the IEC 60599 criteria [24], which are widely used to interpret DGA results. Moreover, with the fuzzy rules extracted from the neural network, the authors provided an improved IEC table for interpreting DGA results [70].

In [69], the authors first collected 8 types of PD data that occurred in paper–oil insulation systems impaired by aging processes. They then extracted features from the frequency and time-frequency domains for the PD signals. A neural network was finally used to classify different types of PD. In [36], WPT was first applied to generate a number of features, and 6 features were derived by a feature selection strategy. A neural network was later employed to separate corona and other types of PD. In addition, back-propagation neural network [6] and a variety of neural networks, e.g., multilayer perceptron and radial basis function network [7], were also employed for PD source classification.

Decision Tree
Decision tree techniques construct a flowchart-like structure as shown in Figure 6(C), in which each internal node represents a test on a feature, each leaf node represents a class label, and a path from root to leaf represents a classification rule [65]. A decision tree is constructed in a “top-down” manner, i.e., the root and later the internal nodes are iteratively split until certain criteria are met and the tree-growing process terminates. Impurity (formulated by entropy or Gini index) is one of the most popular measurements used for building a decision tree. The impurity difference between a node and its child nodes (i.e., information gain) is used to select the best splitting point of the node. In addition, we consider a node with impurity lower than a threshold to be a leaf, and this is a natural stopping condition for the tree construction.

Decision tree techniques are widely used for PD-related classification tasks in various scenarios. In [47], [48], the features were extracted from the shape of PD pulses, e.g., pulse rising time, decay time, pulse width, etc. A decision tree on these features was employed for classifying the sizes of voids or cavities in dielectric specimens. In [71], a decision tree was used to assess the depth and size of cavities in high voltage cables. In addition, decision tree techniques were used for classifying PD signals and noise in [72] and differentiating multiple PD sources in [73].

Unlike neural networks and SVM, which act as black boxes, a decision tree provides us with visible rules of features for classification, which are simple to understand and interpret.

Summary
A comparative study on PD source classification was presented in [7], which applied various classification models, e.g., fuzzy-SVM and a variety of neural network techniques (two-layer network, multilayer perceptron, radial basis function network, and so on). Fuzzy-SVM was demonstrated to achieve the best


performance. In addition, another study on PD source classification [6] also showed that SVM achieved better performance than back-propagation neural network and self-organizing map. Therefore, these studies provide us with good guidelines to select classification models.

Moreover, the classification techniques introduced in this survey (namely, decision tree, neural network, and SVM) are quite popular for analyzing PD data. In fact, many other classification techniques have already been used for PD data–related analysis. For example, random forest [74] and evidential reasoning approaches [75], [76] were used for transformer condition assessment. A bagging algorithm was proposed in [77] for classifying different types of PD in transformers. Furthermore, deep learning techniques [78] have been the winning algorithms in several international pattern recognition contests. Deep learning techniques for PD data analysis could be explored in the future.

Last, PD clustering and classification are different learning processes, i.e., unsupervised and supervised learning, respectively. In specific scenarios, they can work together to achieve a better understanding of PD data. Generally, an input instance for a PD classification model is an acquisition of signal with multiple PD pulses from a specific sensor. If multiple PD pulses from different PD sources are acquired, it would be hard for a PD classification model to determine the label of such an input instance, which can hinder the classification accuracy of the model. In [31], [58], different clusters are first identified to indicate different PD sources, and the PD classification model then takes each cluster as input and recognizes the PD sources in a more accurate manner.

Conclusions
PD detection and measurement provide an important tool for assessing the condition of power equipment. In this article, we have presented a comprehensive survey of the existing techniques used for PD signal sensing, PD feature representation, and PD classification.

We include introductory material and also discuss the latest developments in PD analysis, so our survey can serve as a resource on PD analysis for researchers with different backgrounds. For example, researchers with an electrical engineering background would quickly get familiar with computational models for PD data analysis, while those with a computer engineering background are able to learn the basic concepts for PD sensing and PD feature engineering. This survey also benefits practitioners, engineers, and system architects by equipping them with an overview of the processing pipeline and the various techniques used to detect the important PD signals.

In the future, various research directions can be explored for PD data analysis. First, as various types of sensors have been designed for sensing PD data, multimodal sensor systems [14], [46] would leverage the advantages of different sensors for PD data analysis. Second, PD sensors collect data continuously at high speed. Stream data mining [79] thus naturally becomes a good choice for real-time PD data analysis. Third, PD sensors are deployed at many sites of the power grid, and PD data are often distributed. Distributed data mining [80] is thus a promising solution to efficiently and scalably detect PD.

Acknowledgment
This work was supported by a research grant, which was funded by Energy Market Authority, Singapore, with grant reference: NRF2012EWT-EIRP002-044.

References
[1] Y. Han and Y. Song, “Condition monitoring techniques for electrical equipment—A literature survey,” IEEE Trans. Power Del., vol. 18, pp. 4–13, 2003.
[2] G. Montanari and A. Cavallini, “Partial discharge diagnostics: From apparatus monitoring to smart grid assessment,” IEEE Electr. Insul. Mag., vol. 29, pp. 8–17, 2013.
[3] U. Fromm, “Partial discharge and breakdown testing at high DC voltage,” PhD dissertation, Delft Univ. Technol., Delft, the Netherlands, 1995.
[4] P. H. Morshuis and J. J. Smit, “Partial discharges at DC voltage: Their mechanism, detection and analysis,” IEEE Trans. Dielectr. Electr. Insul., vol. 12, pp. 328–340, 2005.
[5] C. Hudon and M. Belec, “Partial discharge signal interpretation for generator diagnostics,” IEEE Trans. Dielectr. Electr. Insul., vol. 12, pp. 297–319, 2005.
[6] K. Lai, B. Phung, and T. Blackburn, “Application of data mining on partial discharge Part I: Predictive modelling classification,” IEEE Trans. Dielectr. Electr. Insul., vol. 17, pp. 846–854, 2010.
[7] H. Ma, J. C. Chan, T. K. Saha, and C. Ekanayake, “Pattern recognition techniques and their applications for automatic classification of artificial partial discharge sources,” IEEE Trans. Dielectr. Electr. Insul., vol. 20, pp. 468–478, 2013.
[8] G. Montanari, A. Cavallini, and F. Puletti, “A new approach to partial discharge testing of HV cable systems,” IEEE Electr. Insul. Mag., vol. 22, pp. 14–23, 2006.
[9] G. Stone, “Partial discharge diagnostics and electrical equipment insulation condition assessment,” IEEE Trans. Dielectr. Electr. Insul., vol. 12, pp. 891–904, 2005.
[10] M. D. Judd, L. Yang, and I. B. Hunter, “Partial discharge monitoring of power transformers using UHF sensors. Part I: Sensors and signal interpretation,” IEEE Electr. Insul. Mag., vol. 21, pp. 5–14, 2005.
[11] W. Gao, D. Ding, and W. Liu, “Research on the typical partial discharge using the UHF detection method for GIS,” IEEE Trans. Power Del., vol. 26, pp. 2621–2629, 2011.
[12] S. Birlasekaran and W. H. Leong, “Comparison of known PD signals with the developed and commercial HFCT sensors,” IEEE Trans. Power Del., vol. 22, pp. 1581–1590, 2007.
[13] G. Luo and D. Zhang, “Study on performance of HFCT and UHF sensors in partial discharge detection,” in IEEE Int. Power Electron. Conf. (IPEC), 2010, pp. 630–635.
[14] S. Markalous, S. Tenbohlen, and K. Feser, “Detection and location of partial discharges in power transformers using acoustic and electromagnetic signals,” IEEE Trans. Dielectr. Electr. Insul., vol. 15, pp. 1576–1583, 2008.
[15] W. Si, J. Li, D. Li, J. Yang, and Y. Li, “Investigation of a comprehensive identification method used in acoustic detection system for GIS,” IEEE Trans. Dielectr. Electr. Insul., vol. 17, pp. 721–732, 2010.
[16] P. M. Eleftherion, “Partial discharge. XXI. Acoustic emission based PD source location in transformers,” IEEE Electr. Insul. Mag., vol. 11, pp. 22–26, 1995.
[17] J. Deng, H. Xiao, W. Huo, M. Luo, R. May, A. Wang, and Y. Liu, “Optical fiber sensor-based detection of partial discharges in power transformers,” Optics Laser Technol., vol. 33, pp. 305–311, 2001.
[18] M. MacAlpine, Z. Zhiqiang, and M. S. Demokan, “Development of a fibre-optic sensor for partial discharges in oil-filled power transformers,” Electr. Power Syst. Res., vol. 63, pp. 27–36, 2002.
[19] X. Wang, B. Li, H. T. Roman, O. L. Russo, K. Chin, and K. R. Farmer, “Acousto-optical PD detection for transformers,” IEEE Trans. Power Del., vol. 21, pp. 1068–1073, 2006.


[20] J. Posada-Roman, J. A. Garcia-Souto, and J. Rubio-Serrano, “Fiber optic sensor for acoustic detection of partial discharges in oil-paper insulated electrical systems,” Sensors, vol. 12, pp. 4793–4802, 2012.
[21] Z. Wang, I. Cotton, and S. Northcote, “Dissolved gas analysis of alternative fluids for power transformers,” IEEE Electr. Insul. Mag., vol. 23, pp. 5–14, 2007.
[22] S. Singh and M. Bandyopadhyay, “Dissolved gas analysis technique for incipient fault diagnosis in power transformers: A bibliographic survey,” IEEE Electr. Insul. Mag., vol. 26, pp. 41–46, 2010.
[23] N. Bakar, A. Abu-Siada, and S. Islam, “A review of dissolved gas analysis measurement and interpretation techniques,” IEEE Electr. Insul. Mag., vol. 30, pp. 39–49, 2014.
[24] M. Duval and A. DePabla, “Interpretation of gas-in-oil analysis using new IEC publication 60599 and IEC TC 10 databases,” IEEE Electr. Insul. Mag., vol. 17, pp. 31–41, 2001.
[25] T. Pinpart and M. Judd, “Experimental comparison of UHF sensor types for PD location applications,” in IEEE Electr. Insul. Conf. (EIC), 2009, pp. 26–30.
[26] H. Sinaga, B. Phung, and T. Blackburn, “Partial discharge localization in transformers using UHF detection method,” IEEE Trans. Dielectr. Electr. Insul., vol. 19, pp. 1891–1900, 2012.
[27] S. M. Markalous, S. Tenbohlen, and K. Feser, “New robust non-iterative algorithms for acoustic PD-localization in oil/paper-insulated transformers,” in 14th Int. Symp. High Volt. Eng., 2005, pp. 29–34.
[28] W. Zheng, Y. Qian, N. Yang, C. Huang, and X. Jiang, “Research on partial discharge localization in XLPE cable accessories using multi-sensor joint detection technology,” Przegląd Elektrotechniczny, vol. 87, pp. 281–284, 2011.
[29] Y. Tian, P. Lewin, and A. Davies, “Comparison of on-line partial discharge detection methods for HV cable joints,” IEEE Trans. Dielectr. Electr. Insul., vol. 9, pp. 604–615, 2002.
[30] L. Hao, P. Lewin, and S. Dodd, “Comparison of support vector machine based partial discharge identification parameters,” in IEEE Int. Symp. Electr. Insul. (ISEI), 2006, pp. 110–113.
[31] L. Hao and P. Lewin, “Partial discharge source discrimination using a support vector machine,” IEEE Trans. Dielectr. Electr. Insul., vol. 17, pp. 189–197, 2010.
[32] N. Sahoo, M. Salama, and R. Bartnikas, “Trends in partial discharge pattern classification: A survey,” IEEE Trans. Dielectr. Electr. Insul., vol. 12, pp. 248–264, 2005.
[33] E. Gulski and A. Krivda, “Neural networks as a tool for recognition of partial discharges,” IEEE Trans. Electr. Insul., vol. 28, pp. 984–1001, 1993.
[34] R. Candela, G. Mirelli, and R. Schifani, “PD recognition by means of statistical and fractal parameters and a neural network,” IEEE Trans. Dielectr. Electr. Insul., vol. 7, pp. 87–94, 2000.
[35] L. Hao, P. Lewin, J. Hunter, D. Swaffield, A. Contin, C. Walton, and M. Michel, “Discrimination of multiple PD sources using wavelet decomposition and principal component analysis,” IEEE Trans. Dielectr. Electr. Insul., vol. 18, pp. 1702–1711, 2011.
[36] C. Chang, J. Jin, C. Chang, T. Hoshino, M. Hanai, and N. Kobayashi, “Separation of corona using wavelet packet transform and neural network for detection of partial discharge in gas-insulated substations,” IEEE Trans. Power Del., vol. 20, pp. 1363–1369, 2005.
[37] D. Evagorou, A. Kyprianou, P. Lewin, A. Stavrou, V. Efthymiou, A. Metaxas, and G. E. Georghiou, “Feature extraction of partial discharge signals using the wavelet packet transform and classification with a probabilistic neural network,” IET Sci., Meas. Technol., vol. 4, pp. 177–192, 2010.
[38] I. Jolliffe, “Principal component analysis,” in Encyclopedia of Statistics in Behavioral Science, B. S. Everitt and D. Howell, Ed. Wiley Online Library: John Wiley & Sons Ltd, 2005.
[39] R. Liao, Y. Fernandess, K. Tavernier, G. Taylor, and M. Irving, “Recognition of partial discharge patterns,” in IEEE Power Ener. Soc. Gen. Meet., 2012, pp. 1–8.
[40] T. Babnik, R. K. Aggarwal, and P. J. Moore, “Principal component and hierarchical cluster analyses as applied to transformer partial discharge data with particular reference to transformer condition monitoring,” IEEE Trans. Power Del., vol. 23, pp. 2008–2016, 2008.
[41] K. Lai, B. Phung, T. Blackburn, and N. Muhamad, “Classification of partial discharge using PCA and SOM,” in Int. Power Eng. Conf. (IPEC), 2007, pp. 1311–1316.
[42] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, pp. 2579–2605, 2008.
[43] P. Lewin, L. Petrov, and L. Hao, “A feature based method for partial discharge source classification,” in IEEE Int. Symp. Electr. Insul. (ISEI), 2012, pp. 443–448.
[44] V. Chatpattananan, N. Pattanadech, and P. Yutthagowith, “Partial discharge classification on high voltage equipment with K-means,” in 8th Int. Conf. Prop. Appl. Dielectr. Mater., 2006, pp. 191–194.
[45] Y. H. Lin, “Using K-means clustering and parameter weighting for partial-discharge noise suppression,” IEEE Trans. Power Del., vol. 26, pp. 2380–2390, 2011.
[46] R. Sharkawy, R. Mangoubi, T. Abdel-Galil, M. Salama, and R. Bartnikas, “SVM classification of contaminating particles in liquid dielectrics using higher order statistics of electrical and acoustic PD measurements,” IEEE Trans. Dielectr. Electr. Insul., vol. 14, pp. 669–678, 2007.
[47] T. Abdel-Galil, R. Sharkawy, M. Salama, and R. Bartnikas, “Partial discharge pulse pattern recognition using an inductive inference algorithm,” IEEE Trans. Dielectr. Electr. Insul., vol. 12, pp. 320–327, 2005.
[48] T. Abdel-Galil, R. Sharkawy, M. M. Salama, and R. Bartnikas, “Partial discharge pattern classification using the fuzzy decision tree approach,” IEEE Trans. Inst. Meas., vol. 54, pp. 2258–2263, 2005.
[49] R. Liao, L. Yang, J. Li, and S. Grzybowski, “Aging condition assessment of transformer oil-paper insulation model based on partial discharge analysis,” IEEE Trans. Dielectr. Electr. Insul., vol. 18, pp. 303–311, 2011.
[50] S. Poyhonen, M. Conti, A. Cavallini, G. C. Montanari, and F. Filippetti, “Insulation defect localization through partial discharge measurements and numerical classification,” in IEEE Int. Symp. Ind. Electron., 2004, pp. 417–422.
[51] J. Li, R. Liao, S. Grzybowski, and L. Yang, “Oil-paper aging evaluation by fuzzy clustering and factor analysis to statistical parameters of partial discharges,” IEEE Trans. Dielectr. Electr. Insul., vol. 17, pp. 756–763, 2010.
[52] V. P. Darabad, M. Vakilian, T. Blackburn, and B. Phung, “Data mining on partial discharge signals of power transformer’s defect models,” Int. Trans. Electr. Energy Syst., vol. 23, pp. 423–437, 2013.
[53] J. Hunter, L. Hao, P. Lewin, C. Walton, and M. Michel, “Partial discharge diagnostics of defective medium voltage three-phase PILC cables,” in IEEE Int. Symp. Electr. Insul. (ISEI), 2012, pp. 371–375.
[54] A. Contin and S. Pastore, “Classification and separation of partial discharge signals by means of their auto-correlation function evaluation,” IEEE Trans. Dielectr. Electr. Insul., vol. 16, pp. 1609–1622, 2009.
[55] X. Peng, C. Zhou, D. M. Hepburn, M. D. Judd, and W. Siew, “Application of K-Means method to pattern recognition in on-line cable partial discharge monitoring,” IEEE Trans. Dielectr. Electr. Insul., vol. 20, pp. 754–761, 2013.
[56] J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Comp. Geosci., vol. 10, pp. 191–203, 1984.
[57] A. Contin, A. Cavallini, G. Montanari, G. Pasini, and F. Puletti, “Digital detection and fuzzy classification of partial discharge signals,” IEEE Trans. Dielectr. Electr. Insul., vol. 9, pp. 335–348, 2002.
[58] W. Si, J. Li, P. Yuan, and Y. Li, “Digital detection, grouping and classification of partial discharge signals at DC voltage,” IEEE Trans. Dielectr. Electr. Insul., vol. 15, pp. 1663–1674, 2008.
[59] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in ACM Conf. Knowl. Disc. Data Mining (KDD), 1996, pp. 226–231.
[60] A. Contin and S. Pastore, “Automatic separation of multiple PD sources using an Amplitude-AutoCorrelation Relation Diagram,” in IEEE Int. Symp. Electr. Insul. (ISEI), 2012, pp. 434–438.
[61] M. E. Newman, “Modularity and community structure in networks,” Proc. Natl. Acad. Sci. (PNAS), vol. 103, pp. 8577–8582, 2006.
[62] Y. Yang and K. Chen, “Temporal data clustering via weighted clustering ensemble with different representations,” IEEE Trans. Knowl. Data Eng., vol. 23, pp. 307–320, 2011.
[63] S. Vega-Pons and J. Ruiz-Shulcloper, “A survey of clustering ensemble algorithms,” Int. J. Pattern Recog. Art. Intel., vol. 25, pp. 337–372, 2011.


[64] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York, NY: Cambridge University Press, 2000.
[65] P. N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Addison-Wesley, 2005.
[66] M. Danikas, N. Gao, and M. Aro, “Partial discharge recognition using neural networks: A review,” Electr. Eng., vol. 85, pp. 87–93, 2003.
[67] A. R. G. Castro and V. Miranda, “Knowledge discovery in neural networks with application to transformer failure diagnosis,” IEEE Trans. Power Syst., vol. 20, pp. 717–724, 2005.
[68] M. H. Wang, “Partial discharge pattern recognition of current transformers using an ENN,” IEEE Trans. Power Del., vol. 20, pp. 1984–1990,
[69] T. Boczar, S. Borucki, A. Cichon, and D. Zmarzly, “Application possibilities of artificial neural networks for recognizing partial discharges measured by the acoustic emission method,” IEEE Trans. Dielectr. Electr. Insul., vol. 16, pp. 214–223, 2009.
[70] V. Miranda and A. R. G. Castro, “Improving the IEC table for transformer failure diagnosis with knowledge extraction from neural networks,” IEEE Trans. Power Del., vol. 20, pp. 2509–2516, 2005.
[71] A. Yazdandoust, F. Haghjoo, and S. Shahrtash, “Insulation status assessment in high voltage cables based on decision tree algorithm,” in IEEE Canada Electr. Power Conf., 2008, pp. 1–5.
[72] H. Hirose, M. Hikita, S. Ohtsuka, S. Tsuru, and J. Ichimaru, “Diagnosis of electric power apparatus using the decision tree method,” IEEE Trans. Dielectr. Electr. Insul., vol. 15, pp. 1252–1260, 2008.
[73] K. Lai, B. Phung, and T. Blackburn, “Descriptive data mining of partial discharge using decision tree with genetic algorithm,” in Australasian Univ. Power Eng. Conf. (AUPEC), 2008, pp. 1–6.
[74] X. Chen, H. Cui, and L. Luo, “Fault diagnosis of transformer based on random forest,” in Int. Conf. Intel. Comp. Technol. Automat. (ICICTA), 2011, pp. 132–134.
[75] W. Tang, K. Spurgeon, Q. Wu, and Z. Richardson, “An evidential reasoning approach to transformer condition assessments,” IEEE Trans. Power Del., vol. 19, pp. 1696–1703, 2004.
[76] R. Liao, H. Zheng, S. Grzybowski, L. Yang, Y. Zhang, and Y. Liao, “An integrated decision-making model for condition assessment of power transformers using fuzzy approach and evidential reasoning,” IEEE Trans. Power Del., vol. 26, pp. 1111–1118, 2011.
[77] T. Jiang, J. Li, Y. Zheng, and C. Sun, “Improved bagging algorithm for pattern recognition in UHF signals of partial discharges,” Energies, vol. 4, pp. 1087–1101, 2011.
[78] G. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, pp. 1527–1554, 2006.
[79] M. M. Gaber, A. Zaslavsky, and S. Krishnaswamy, “Mining data streams: A review,” ACM Sigmod Rec., vol. 34, pp. 18–26, 2005.
[80] K. Shim, “MapReduce algorithms for big data analysis,” Proc. VLDB Endow., vol. 5, pp. 2016–2017, 2012.

Hong Cao (S ’08, M ’11) currently heads the analytics function in McLaren Applied Technologies, Asia Pacific. Previously, he was a research scientist and head of the distributed analytics laboratory in the Institute for Infocomm Research (I2R), a member of Singapore’s Agency for Science, Technology and Research (A*STAR). He received the first-class honors BEng, MEng, and PhD degrees from Nanyang Technological University, Singapore, in 2001, 2003, and 2010, respectively. His current research interests include health monitoring for complex systems, data forensics, mobile data analytics, and machine learning in applied context. His previous work in image forensics received the best paper award in IWDW 2010 and the honorary mention in ISCAS 2010. Recently, he also led teams to win international and local benchmarking challenges such as the Opportunity Activity Recognition Challenge 2011 and Up-Singapore Hackathon 2012. He was part of the winning team for GE FlightQuest 2013. He currently serves as vice president for IEEE signal processing society, Singapore section. More information on Cao is available at

Jianneng Cao is a research scientist in the Data Analytics Department at the Institute for Infocomm Research (I2R) under the Agency for Science, Technology and Research (A*STAR), Singapore. He obtained a PhD degree in computer science from the National University of Singapore in 2011. Before joining I2R, he worked at Purdue University as a postdoc. Jianneng’s main research interests are data privacy and stream data management.

Hai-Long Nguyen is a research scientist in the Data Analytics Department at the Institute for Infocomm Research (I2R) under the Agency for Science, Technology and Research (A*STAR), Singapore. He obtained a PhD from Nanyang Technological
University in 2013. He received the first-
class bachelor’s degree from Ho Chi Minh
City University of Technology, Vietnam,
Min Wu is currently a research scientist in 2007. His research interests include time series mining, data
in the Data Analytics Department at the stream mining, and big data.
Institute for Infocomm Research (I2R) un-
der the Agency for Science, Technology João Bártolo Gomes is a research scien-
and Research (A*STAR), Singapore. He tist at the Institute for Infocomm Research
received the BEng from the University of (I2R) under the Agency for Science, Tech-
Science and Technology of China (USTC), nology and Research (A*STAR), Singa-
China, in 2006 and his PhD degree from pore. He received his PhD in computer
Nanyang Technological University, Sin- science from the Technical University of
gapore, in 2011. His current research interest includes machine Madrid (UPM), Spain, in 2011. Previous-
learning, data mining, and bioinformatics. ly, he was a member of the research group

34 IEEE Electrical Insulation Magazine

DAME (data mining engineering) at UPM. His research inter- Engineering, Faculty of Information Technology, Monash Uni-
ests include ubiquitous knowledge discovery, machine learning versity, Melbourne, Australia, where she is currently an associ-
algorithms, data stream mining, and learning from evolving data ate professor. She has contributed to around 200 research pa-
streams. pers. Her current research interests include the areas of mobile,
ubiquitous, distributed data mining, and data stream mining. She
Shonali Krishnaswamy received the is increasingly interested in mobile crowd sensing, mobile user
bachelor of science degree in computer analytics, and mobile activity recognition. Krishnaswamy was
science from the University of Madras, the recipient of the Monash University Vice-Chancellor’s Award
Chennai, India, in 1996 and the master of for Excellence in Research by an Early Career Researcher 2008,
computing degree and the PhD degree in the IBM Innovation Award (unstructure information manage-
computer science from Monash University, ment architecture), the Faculty of Information Technology Early
Australia, in 1998 and 2003, respectively. Career Researcher Award, and an Australian Post-Doctoral Fel-
She is currently the head of the Data lowship from the Australian Research Council.
Analytics Department, Institute for In-
focomm Research (I2R), A*STAR, Singapore. She was the
director of the Centre for Distributed Systems and Software

November/December — Vol. 31, No. 6
