Abu Rub2019

Partial Discharge Localization in Gas-Insulated
Switchgear using Various Learning Algorithms

Omar Abu-Rub, Ahmad Darwish
Texas A&M University, College Station, TX, 77843, USA
omar.abu-rub@qatar.tamu.edu; a.darwish1993@tamu.edu
Abstract— Gas-insulated switchgear has become an essential part of electric power utility systems because of its reliability, compactness, and effectiveness. Different techniques have been developed to protect these devices from partial discharge. Ultra-high-frequency techniques, which detect the electromagnetic waves emitted during a partial discharge event, have shown great performance in localizing such phenomena. In this paper, three different machine learning classification models are built to localize partial discharge in a gas-insulated switchgear modeled using COMSOL Multiphysics. Results for five different sensors are obtained and processed to establish concrete attributes for machine learning purposes. Various learning algorithms are evaluated and compared, namely artificial neural networks, a support vector machine (SVM), and a subspace discriminant ensemble model, yielding a range of accuracies. An accuracy of 94.8% was obtained with the subspace discriminant ensemble model.

Keywords—Gas-Insulated Switchgear, Neural Networks, Subspace Discriminant Ensemble, Support Vector Machine.

I. INTRODUCTION

Gas-Insulated Switchgears (GIS) have recently been widely integrated into electric power utility systems because of their high reliability, small size, and ease of maintenance [1]. Partial Discharge (PD) is one of the most common factors that contribute to the failure of the electrical insulation of such devices. Thus, monitoring GIS for PD has become of paramount importance to improve power system reliability. Sulphur Hexafluoride (SF6) is one of the most common gases used as the insulating medium in GIS. SF6 has a very high dielectric strength of around 89 kV/cm, which is 3 times higher than that of air [2], [3].

A large number of techniques have been developed to detect and identify PD. Such techniques include chemical, acoustic, optical, and electrical techniques [4]. Chemical techniques rely on the collection of samples of gases and oil released during PD activity. Chemical techniques can provide an accurate indication of the condition of the insulation material. Nonetheless, these techniques require a long time as they rely on the decomposition of materials due to PD [4], [5]. Acoustic methods measure the mechanical (acoustic) waves caused by the release of energy during PD activity. This technique is capable of locating PD sources using the time of arrival and the phase delay of the waves [4]. Acoustic techniques might give some false information about the condition of the insulation because of the different noise sources [4], [5]. Optical techniques rely on the emission of light associated with the PD pulse and are extremely sensitive owing to their very high immunity to noise. Optical fiber sensors are commonly used in the detection of PD using optical techniques [4], [5]. Optical techniques are prone to fringe fading because of random polarization. Electrical techniques rely on the measurement of voltages and currents using an electric circuit (conventional method), or on the detection of the electromagnetic (EM) waves generated by the acceleration and deceleration of currents during PD activity [6].

Machine learning is concerned with the study of statistical models for prediction and decision making based on training data. A machine learning model applies specific statistical and mathematical techniques to training data to predict the output of future input data with similar attributes. Machine learning comprises three classes: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning, the primary focus of this study, trains a model on input data that carries an identified field indicating the output. When new input data is given, the model assigns an output to this data based on the previously seen training data. On the other hand, unsupervised learning trains on data without knowledge of the output of each data instance and, when using cluster analysis, partitions the data into different groups with similar attributes. When new input data is supplied, the model determines within which group of data it lies (for cluster analysis). Finally, reinforcement learning differs from supervised learning in that input/output pairs do not need to be labelled. The reinforcement environment is usually formulated as a Markov decision process. In essence, in the absence of a training dataset, the model is bound to learn from its experience [7].

It is important to note that supervised learning is further divided into regression and classification. Classification models have discrete-valued or categorical outputs. On the other hand, regression models have continuous-valued outputs. For both classification and regression models, the input variables can be either discrete or continuous valued [7].

The main contribution of the paper is concerned with void localization inside an L-structured GIS using machine learning classification models. A short dipole is modelled to represent the defect inside the GIS enclosure. The voltage amplitude and power received by each sensor are used to localize defects using
different machine learning techniques. The paper is organized as follows: Section II presents the simulation model used to inject the pulses. Section III discusses the machine learning models used for void localization, and presents, discusses, and analyses the obtained results. Finally, concluding remarks are given in Section IV.

II. SIMULATION MODEL

A detailed model of an L-type structured GIS with a void defect in the insulator has been built to validate the proposed methodology for localizing the PD source in the GIS. Fig. 1 shows the modeled L-structured GIS with its dimensions. The inner conductor radius is 80 mm whereas the outer conductor is 200 mm. The spacers' thickness is 30 mm.

Fig. 1. A 2-Dimensional View of the Proposed GIS Model (labelled parts: Sensor 1, Sensor 2 and Sensor 3, the latter two shown at the 0° and 90° positions, spacer, switch, and steel covers; dimension labels 280, 300, 400, 500, 700, and 1000 mm)

The implemented model is used for the detection and localization of PD sources inside the GIS. If the void is small enough, the streamer channel created during a PD event can be approximated by a small dipole antenna whose length cannot exceed 5 mm and whose diameter is on the order of 0.3 mm [8]. Thus, a defect is modelled by a short dipole antenna. The modelled defect has been placed at different locations inside the GIS enclosure. The Wanninger pulse
represented by Eq. (1) is used as the input to the dipole model:

$$V(t) = V_0 \, \frac{t}{\tau} \, e^{1 - t/\tau} \qquad (1)$$

where $V_0$ is the voltage amplitude applied to the dipole and $\tau$ is the rise-time of the pulse. The localization of the defect is obtained based on the time of flight as well as the intensity of the voltage and power received by each of the sensors shown in Fig. 1.
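For readers who want to experiment with the excitation, the sketch below samples a pulse of the form commonly given for the Wanninger pulse, V(t) = V0 (t/τ) exp(1 − t/τ), which peaks at V0 when t = τ. That closed form, the names `wanninger_pulse`, `v0`, and `tau_ns`, and the 1 ns rise time are assumptions made for illustration; they are not values taken from the simulation.

```python
import math

def wanninger_pulse(t_ns, v0=1.0, tau_ns=1.0):
    """Assumed Wanninger form: V(t) = V0 * (t/tau) * exp(1 - t/tau), zero for t < 0."""
    if t_ns < 0:
        return 0.0
    x = t_ns / tau_ns
    return v0 * x * math.exp(1.0 - x)

# Sample the pulse on a 0.5 ns grid over a 0 to 175 ns window
# (the grid spacing is an illustrative choice).
times = [0.5 * k for k in range(351)]
samples = [wanninger_pulse(t, v0=1.0, tau_ns=1.0) for t in times]

# The sampled maximum should sit at t = tau, where the pulse equals V0.
peak_time = times[samples.index(max(samples))]
```

With this form, the rise-time parameter directly sets when the pulse reaches its amplitude, so a shorter `tau_ns` gives a sharper pulse with more high-frequency content.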
Because the large computational and memory requirements of the finite element model limit the number of void simulations and locations, the GIS is subdivided into four different sections, shown in Fig. 2. Each section contains 3 to 4 voids with their respective raw data. Supervised machine learning classification models act as a meta-model to classify in which section of the GIS the void exists. The raw data includes recordings from 5 different sensors along the GIS, with each sensor measuring both the voltage and the power. The measured signal spans a time window from 0 ns (the instant the pulse is created) up to 175 ns after the pulse is initialized. In total, the raw signal includes 11 variables: time, 5 different voltage measurements, and 5 different power measurements from the respective 5 sensors.

Fig. 2. A 2-D Diagram of the Categorized GIS Model
A defect is placed close to sensor 2 (the 90° position) in the GIS structure (refer to Fig. 1). The obtained results, which include the voltage amplitude and the total power received by each sensor, are shown in Fig. 3 and Fig. 4. Since the defect is placed close to sensor 2, the voltage and power amplitudes received by sensor 2 are higher than those received by sensors 1 and 3. This is mainly attributed to the fact that the longer the distance between the PD source and a sensor, the higher the attenuation experienced by the EM waves [9], [10].

Fig. 3. Voltage Amplitude Received by a) Sensor 1, b) Sensor 2, c) Sensor 3

Fig. 4. Total Power Received by a) Sensor 1, b) Sensor 2, c) Sensor 3

III. PD LOCALIZATION USING MACHINE LEARNING

The main requirement for high accuracy in machine learning models is to extract meaningful information that directly influences the output prediction. Training on the raw data yielded a maximum accuracy of 50%. This is attributed to the fact that every data point of a signal was used for training, many of which are insignificant and cause overfitting. Given the randomness of the voltage signal, training on each point of the signal is inefficient and leads to low prediction accuracy. Thus, the original signal was broken down into 12 segments, where each segment includes a timeframe of 120 ns of the original signal. Furthermore, statistical time-domain features were obtained for each segment, including the variance, standard deviation, mean, absolute sum, skewness, and kurtosis. Breaking the signal into 12 segments ensures the extraction of the important information by inspecting the characteristics of the signal. Also, partitioning the original signal into segments reduces the dataset, improving the model training latency. In essence, the classification models aim to establish a strong relationship between the signals' statistical time-domain features and the location of the void. The full process of classifying the data through specific machine learning models is shown in the flow chart in Fig. 5.

The total data set contains 14 voids and 12 instances per void, thus a total of 168 instances. The training data was obtained from the dataset through 5-fold cross validation. According to various experimental studies, this value generally results in a model prediction with low bias and modest variance, particularly for small quantity samples [11].
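The segmentation and feature-extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`split_into_segments`, `segment_features`, `kfold_indices`) are hypothetical, population (divide-by-n) moments are assumed for variance, skewness, and kurtosis, the test signal is synthetic, and the round-robin fold assignment is only one of several possible ways to form the 5 folds.

```python
import math

def segment_features(segment):
    """Six time-domain statistics for one segment (population moments assumed)."""
    n = len(segment)
    mean = sum(segment) / n
    var = sum((x - mean) ** 2 for x in segment) / n
    std = math.sqrt(var)
    abs_sum = sum(abs(x) for x in segment)
    if std == 0.0:
        skew, kurt = 0.0, 0.0  # convention chosen here for a constant segment
    else:
        skew = sum((x - mean) ** 3 for x in segment) / (n * std ** 3)
        kurt = sum((x - mean) ** 4 for x in segment) / (n * std ** 4)
    return {"mean": mean, "variance": var, "std": std,
            "abs_sum": abs_sum, "skewness": skew, "kurtosis": kurt}

def split_into_segments(signal, n_segments=12):
    """Split a signal into n_segments contiguous, near-equal pieces."""
    n = len(signal)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(n_segments)]

def kfold_indices(n_instances, k=5):
    """Assign instance indices to k folds in round-robin order (an assumption)."""
    return [list(range(f, n_instances, k)) for f in range(k)]

# Hypothetical sensor trace: a decaying oscillation with 1200 samples.
signal = [math.exp(-i / 300.0) * math.sin(i / 5.0) for i in range(1200)]

# One feature row per segment; each row would become one training instance.
rows = [segment_features(s) for s in split_into_segments(signal)]

# 14 voids x 12 segments = 168 instances, split into 5 folds.
folds = kfold_indices(14 * 12, k=5)
```

In a full pipeline the six statistics would presumably be computed for each of the 5 voltage and 5 power signals and concatenated into one feature vector per segment. Because the 12 segments of a void come from the same simulation, grouping folds by void rather than by instance may give a more conservative accuracy estimate.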
Fig. 5. Flowchart Illustrating the Adapted Machine Learning Process

A. Neural Network

The first classification algorithm tested was the neural network, primarily the back-propagation neural network. Artificial neural networks (ANNs) are computing systems that learn to perform certain tasks from examples and training data. An ANN is composed of a collection of connected nodes, also known as artificial neurons, where each node can transmit a signal. Typically, an artificial neuron receives a signal, processes it, and can transmit the processed signal to further neurons connected to it [12].

An ANN can be partitioned into layers, in which artificial neurons are aggregated and through which the signal travels. Generally speaking, artificial neural networks contain 3 primary layers: the input layer, hidden layer(s), and the output layer. In an ANN, the signal is a real-valued number, and the output of each neuron is a non-linear function of the weighted sum of its inputs. In other terms, each input signal to a specific neuron is multiplied by a weight, and a bias is added, before being transformed by an activation function. The weights and biases of the network are initially randomized and then varied by a training algorithm such that each weight is adjusted based on the cost function, or the error between the training data and its corresponding output. For this particular study, the back-propagation algorithm, which uses a gradient descent approach that exploits the chain rule, was used to train the ANN. Training the neural network by updating the weights has a recursive nature; although not practically efficient, especially regarding overall training latency, it can typically yield high accuracies [22]. Among the different machine learning models tested throughout this study, neural networks yielded the lowest accuracy, with a maximum accuracy of 51%. The ANN models tested included up to 3 hidden layers with a varied number of nodes/neurons at each layer.

B. Support Vector Machine (SVM)

The support vector machine (SVM) is a supervised learning model [13]. It represents the data as points in space, with the classes separated by a clear gap that is as wide as possible. The boundary that separates different classes is known as a hyperplane, and the points closest to the hyperplane are known as support vectors. The support vector machine was first proposed by Vladimir Vapnik in 1995. SVM provides a great advantage in pattern recognition problems with small training samples, nonlinearity, and high dimensionality. Due to the size and high dimensionality of the training data described earlier, SVM is proposed as the most appropriate model for this application. Successful applications have demonstrated that SVM can perform as well as or better than neural networks in a wide variety of fields, including engineering, information retrieval, and bioinformatics [13].

A linear SVM divides the data into different regions separated by a linear hyperplane (linear boundary). A linear SVM algorithm finds the linear boundary that separates regions with the least error by maximizing the distance between the boundary and the support vectors. Linear SVMs are restricted in that they are ideal for linearly separable data and may fail to achieve high accuracy when dealing with non-linear data sets.

As seen in Fig. 6, the linear support vector machine model yielded an accuracy of around 80%, with a maximum achieved accuracy of 83.1%. Although this is a significant improvement over the neural network model, it is a considerably low accuracy given that the GIS is only categorized into 4 sections. According to the confusion matrix seen in Fig. 6, voids in section I were accurately localized. However, sections II and III obtained relatively large classification inaccuracies due to their similarities and the model's limitation in distinguishing the differences. Similarly, the primary inaccuracies in the model for section IV were caused by the similarity with voids in section III.
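The maximum-margin idea behind a linear SVM can be illustrated with a small, self-contained sketch. It is not the implementation used in this study: it trains a two-class linear SVM by batch subgradient descent on the regularized hinge loss, on hypothetical 2-D points, and the names `train_linear_svm` and `predict` as well as the values of `lam`, `lr`, and `epochs` are assumptions chosen for illustration. A four-section problem would additionally need a multi-class scheme such as one-vs-rest, which is not shown.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]      # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:                 # point lies inside the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data (two features, two classes).
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0), (-3.0, -2.0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

Only the points closest to the boundary, here (2, 2) and (-2, -2), keep contributing to the hinge-loss gradient once the others clear the margin, which is the sense in which they act as support vectors.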
Fig. 7 shows the ROC curves for the classification of each section. Although noticeable inaccuracies exist within the linear SVM model, the primary inaccuracies are caused by the incorrect prediction of adjacent sections. Fig. 7 shows that the area under the curve (AUC) is above 0.93, with sections I and IV more accurately classified. This shows that, despite the minor inaccuracies, this model has the potential to be implemented for further void classification in the GIS.

Fig. 6. Linear SVM model confusion matrix

Fig. 7. Linear SVM ROC curves: (a) Section I, (b) Section II, (c) Section III, (d) Section IV

Although the linear SVM supports the notion of using machine learning as a method to accurately localize voids in the GIS, improved feature selection and a larger dataset are required to achieve higher predictive accuracies and more precise localizations. Thus, to improve the accuracy of the SVM model, further feature analysis and improved feature extraction are essential.

C. Subspace Discriminant Ensemble

The final machine learning algorithm used for void classification and localization in this study is the subspace discriminant ensemble algorithm, which is based on the random subspace method. Ensemble models are powerful algorithms commonly used in machine learning. Ensembles combine multiple algorithms to achieve better predictive performance and accuracy than any of the constituent algorithms alone [14].

Fig. 8. Initial Subspace Discriminant Model Confusion Matrix

Fig. 9. Subspace discriminant model confusion matrix

Among the various ensemble algorithms, decision-tree ensembles, such as random forests, are fast and commonly used. A prominent issue with decision tree algorithms for classification is avoiding overfitting the training data whilst achieving high accuracy. In 1998, Tin Kam Ho proposed a method to construct a decision tree-based classifier that maintains high accuracy on training data and improves generalization accuracy as it grows in complexity. The proposed classifier consists of multiple trees constructed systematically by pseudo randomly selecting subsets of
components of the feature vector; in other words, trees constructed in randomly chosen subspaces [14].

As seen in Fig. 8, the subspace discriminant ensemble model initially yielded an accuracy of 86.7%. The primary inaccuracies in the model were caused by the incorrect prediction of section IV, misidentifying it as section II. In addition, more inaccuracies exist between adjacent sections, as seen previously in the linear SVM model. However, further signal processing and the removal of unnecessary data detrimental to the total accuracy, specifically removing the last section of the signal from the training data, yielded a total accuracy of 94.8%. This is a major improvement over the previous models, as seen in Fig. 9.

The confusion matrix in Fig. 9 shows the highly accurate classification of voids using the subspace discriminant model. The figure shows the flawless prediction of voids in section IV. The model accurately classified and localized voids in sections I and III, with only a single false prediction each, attributed to their adjacent sections. The greatest inaccuracies and false classifications exist within section II, where 6 instances in section II were falsely classified as voids existing in section I. Furthermore, the ROC curves in Fig. 10 show the ideal classification of voids in section IV, with an ideal AUC of 1.0, and the inaccuracies existing in section II, as seen from Fig. 10(b).

Fig. 10. Subspace discriminant ROC curves: (a) Section I, (b) Section II, (c) Section III, (d) Section IV

IV. CONCLUSION

In conclusion, an L-structured GIS model has been built using the Multiphysics package of COMSOL. Different machine learning classification models were trained to localize voids in the GIS system. A maximum accuracy of 94.8% was obtained by training the subspace discriminant ensemble. The results support the potential of using machine learning for void localization in systems such as the GIS with conventionally used sensors. The limitation lies in the limited number of cases, owing to the large demand on processing and memory resources, which causes each void simulation instance to take an average of a day and a half. Furthermore, with the addition of more void instances and improved feature extraction methods, the localization of voids within the GIS can be accomplished more precisely based on the results of this study.

REFERENCES
[1] H. Guo, F. Lu, and K. F. Ren, "Simulation and measurement of PD-induced electromagnetic wave leakage in GIS with metal belt," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 21, no. 4, pp. 1942-1949, 2014.
[2] Prabakaran T, John Powl S, and Uppili B, "PD test in gas insulated substation using UHF method," 2015 International Conference on Condition Assessment Techniques in Electrical Systems (CATCON), Bangalore, 2015, pp. 57-60.
[3] X. Fan, L. Li, Y. Zhou, et al., "Online detection technology for SF6 decomposition products in electrical equipment: a review," IET Science, Measurement & Technology, vol. 12, no. 6, pp. 707-711, 2018.
[4] M. Yaacob, M. Alsaedi, J. Rashed, A. Dakhil, and S. Atyah, "Review on Partial Discharge Detection Techniques Related to High Voltage Power Equipment Using Different Sensors," Photonic Sensors, vol. 4, no. 4, pp. 325-337, 2014.
[5] R. Rahmani, R. Yusof, M. S. Mahmodian, and A. A. Shojaei, "Review on PD detection and Measurement in Electrical Power Equipment," International Review on Modelling and Simulations, vol. 5, no. 1, pp. 426-433, February 2012.
[6] M. D. Judd, Li Yang, and I. B. B. Hunter, "Partial discharge monitoring of power transformers using UHF sensors. Part I: sensors and signal interpretation," IEEE Electrical Insulation Magazine, vol. 21, no. 2, pp. 5-14, March-April 2005.
[7] T. Hastie, R. Tibshirani, J. Friedman, and J. Franklin, "The elements of statistical learning: data mining, inference and prediction," The Mathematical Intelligencer, vol. 27, no. 2, pp. 83-85, 2005.
[8] Y. Qi, Y. Fan, B. Gao, Y. Mengzhuo, A. Jadoon, Y. Peng, and T. Jie, "Study on the propagation characteristics of partial discharge in switchgear based on near-field to far-field transformation," Energies, vol. 11, no. 7, p. 1619, 2018.
[9] M. Hikita, S. Ohtsuka, G. Ueta, S. Okabe, T. Hoshino, and S. Maruyama, "Influence of insulating spacer type on propagation properties of PD-induced electromagnetic wave in GIS," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 17, no. 5, pp. 1642-1648, October 2010.
[10] G. Behrmann, K. Wyss, J. Weiss, M. Schraudolph, S. Neuhold, and J. Smajic, "Signal delay effects of solid dielectrics on time-of-flight measurements in GIS," IEEE Transactions on Dielectrics and Electrical Insulation, vol. 23, no. 3, pp. 1275-1284, June 2016.
[11] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning, vol. 112. New York: Springer, 2013.
[12] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall PTR, 1994.
[13] B. Scholkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, The MIT Press, 2002.
[14] T. K. Ho, "The random subspace method for constructing decision forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832-844, 1998.
