B3-211 - 2020 Optimization of Health Indices For Power Assets in Substation Using Machine Learning Method

Paris 2020
B3-211
Optimization of Health Indices for Power Assets in Substation Using Machine Learning Method
J. R. JUNG*, H. D. SEO, K. R. HWANG (HYOSUNG Corporation, Republic of Korea)
H. R. DO, M. G. KWAK, S. B. KIM (KOREA University, Republic of Korea)
SUMMARY
This paper aims to establish and verify modules predicting the Health index (HI) of the
high-voltage transformer and GIS by applying machine learning to modules that calculate
the Health index of assets in the Asset Management Solution (AMS). The learning data of
the HI machine learning module consists of an evaluation score of each asset’s condition
assessment factors, and an optimal expert diagnosis HI score that can adequately reflect
actual asset conditions, based on the evaluation score of each factor. The two-phase
framework was constructed to improve the predicted accuracy of the HI machine
learning module. The first-phase HI-classification model predicts the HI score interval,
while the second-phase HI-regression model predicts the HI score from the HI-score
interval predicted in phase 1. To select the final algorithm, several machine learning
algorithms were applied to the classification and regression models, and their
performance was compared using classification accuracy and the Mean Absolute
Percentage Error (MAPE). The developed HI modules are now operating in smart
substation pilot projects for our steel and chemical customers, where the performance
of the algorithms under real operating conditions is being verified and the problems
found are being corrected.
KEYWORDS
Asset Management - Health Index - Machine Learning - Transformer - Ensemble Learning -
Gas insulated switchgear
jrjung@hyosung.com
1. INTRODUCTION
In recent years, many utilities have adopted the concept of asset health index as a tool in
asset management decision making. The asset health index (HI) is a score that is derived
for an asset, in order to provide an indicator of its condition. It is used in deciding the
need for maintenance or replacement within given timeframes. The existing process of
calculating the HI of power assets takes a weighted average of the assessment factor
scores, using a weight assigned to each factor[1-3]. However, a simple weighted average
has a drawback: the resulting maintenance recommendation can be inconsistent with the
actual equipment condition, because low scores on a small number of assessment factors
are difficult to reflect effectively. Even when an assessment factor carries a relatively
small weight, a corresponding level of maintenance is required once its evaluation score
falls below a certain threshold, yet the weighted averaging process rarely yields an HI
that signals this. The new HI module was therefore researched in response to our
customers' demand for improvement and the introduction of new AI technologies.
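The masking effect described above can be illustrated with a small worked example. This is only a sketch: the factor names, scores, and weights below are hypothetical and are not taken from the paper's assessment tables.

```python
# Hypothetical factor scores (0-100) and weights; not values from the paper.
scores = {"insulation": 95, "thermal": 90, "mechanical": 92, "dielectric": 20}
weights = {"insulation": 0.35, "thermal": 0.30, "mechanical": 0.25, "dielectric": 0.10}


def weighted_average_hi(scores, weights):
    """Conventional HI: weighted mean of the assessment factor scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


hi = weighted_average_hi(scores, weights)
worst_factor = min(scores, key=scores.get)

print(f"weighted-average HI = {hi:.2f}")
print(f"worst factor = {worst_factor} (score {scores[worst_factor]})")
```

With these assumed numbers the weighted average is about 85, which still looks healthy, even though one low-weight factor has a score of 20 that would by itself call for maintenance.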
In this paper, to overcome these drawbacks, an HI module based on machine learning
models is proposed for high-voltage transformers and GIS. The learning data of the HI
machine learning module consists of an evaluation score of each asset’s condition
assessment factors, and an optimal expert diagnosis HI score that can adequately reflect
actual asset conditions based on the evaluation score of each factor. The two-phase
framework was constructed to improve the predicted accuracy of the HI machine
learning module. The first-phase HI-classification model predicts the HI score interval,
while the second-phase HI-regression model predicts the HI score from the HI-score
interval predicted in phase 1. The optimal HI machine learning module was selected by
comparing performance after applying various machine learning algorithms to
classification models and regression models.
The performance of the HI machine learning module was evaluated using classification
accuracy and the Mean Absolute Percentage Error (MAPE) with 10-fold cross-validation.
Figure 1 shows the HI prediction module development process. Finally, this paper
describes how our customers use the newly developed HI module in the smart substations
of a chemical company and a steel company.
Figure 1. Process for developing HI machine learning modules.
2. LEARNING DATA OF THE HI MACHINE LEARNING MODULE
2.1 COMPOSITION OF LEARNING DATA
In this study, a machine learning-based HI prediction module was built for high-voltage
transformers and GIS. The learning data of the transformers and GIS used to build
machine learning HI modules consists of the assessment scores for each asset’s
evaluation items, and the HI scores showing the overall health status of the asset from
the individual item evaluation scores. The evaluation items of each asset were selected as
key factors for evaluating the health of the facility through literature analysis and facility
failure/degradation analysis, and assessment scores were allocated according to the
condition of each assessment item. The condition of a transformer is assessed according
to a total of 35 assessment items within six assessment groups: operating environment,
insulation degradation, dielectric risk, thermal risk, chemical risk, and mechanical
risk[2,8]. The HI of a GIS is evaluated according to a total of 28 assessment
items within six assessment groups, namely operation data, airtight performance,
insulation performance, switching performance, current carrying performance, and
others[4,7].
2.2 GENERATION OF LEARNING DATA
Building and evaluating a machine learning-based module for predicting the HI of
transformers and GIS requires a sufficient amount of learning data, that is, asset
evaluation scores paired with HI scores, and it is generally known that the larger the
amount of learning data, the better. However, the amount of learning data that can be
obtained from the actual maintenance history of transformers and GIS is limited. In
addition, it is difficult to learn from the actual data alone, because low HI scores caused
by poor health are rare and most HI scores indicate good condition. Therefore,
assessment score data reflecting real-world systems were divided into several cases, and
data were randomly generated for each case according to the data generation logic and
used to train the model. A power asset expert then evaluated the HI for each generated
set of transformer and GIS assessment scores. Figure 2 shows the process of generating
evaluation score data for each evaluation item of the transformer, and then obtaining HI
scores from the power asset expert to form the final data.
Figure 2. Process for generating evaluation score data and scoring HI.
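The paper does not spell out its data generation logic, so the following is only a minimal sketch of case-based random generation. The case names, shares, and score ranges are hypothetical; only the count of 35 transformer assessment items comes from the text. The expert HI scoring step stays manual and is not modelled here.

```python
import random

N_ITEMS = 35  # number of transformer assessment items stated in the paper

# Hypothetical generation cases; shares and item counts are illustrative only.
CASES = [
    {"name": "healthy", "share": 0.4, "degraded_items": 0},
    {"name": "few_low", "share": 0.3, "degraded_items": 2},
    {"name": "many_low", "share": 0.3, "degraded_items": 10},
]


def generate_sample(case, rng):
    """Return one list of item scores (0-100) for the given case."""
    # Start from good-condition scores (assumed range 80-100).
    scores = [rng.randint(80, 100) for _ in range(N_ITEMS)]
    # Overwrite a random subset of items with poor scores (assumed range 0-50).
    for idx in rng.sample(range(N_ITEMS), case["degraded_items"]):
        scores[idx] = rng.randint(0, 50)
    return scores


def generate_dataset(n_samples, seed=0):
    """Generate samples for every case in proportion to its share."""
    rng = random.Random(seed)
    dataset = []
    for case in CASES:
        for _ in range(round(n_samples * case["share"])):
            dataset.append({"case": case["name"], "scores": generate_sample(case, rng)})
    return dataset


dataset = generate_dataset(100)
print(len(dataset), "samples generated; each still needs an expert HI score")
```

Generating by case lets poor-condition assets, which are rare in real maintenance records, appear often enough in the learning data for a model to learn from them.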
3. HI MACHINE LEARNING MODULE
3.1 FRAMEWORK OF THE HI MACHINE LEARNING MODULE
To improve HI predictive accuracy, this study proposes a two-phase framework: a
first-phase ‘HI score interval classification model’ and a second-phase ‘HI predictive
regression model by score interval’. Figure 3 shows the structure of the framework. In
phase 1, the HI score interval classification model learns the relationship between the
assessment scores and the HI score interval of the asset, and automatically assigns the HI
score interval that is appropriate for the evaluation scores. The HI is divided into four
intervals, A, B, C, and D, which correspond to (81 – 100), (61 – 80), (41 – 60), and (0 – 40),
respectively. In phase 1 model learning, the input variables are the evaluation score data,
and the output variable is the categorical HI interval (A, B, C, or D). In phase 2, a separate
HI predictive regression model is trained for each HI score interval classified in phase 1,
so that there is one model predicting HI scores for each of intervals A, B, C, and D.
Because the HI score interval of the facility is pre-classified in phase 1, the subsequent HI
score prediction model can more closely reflect the characteristics of the assessment
score data inherent in each interval. The input variables of the phase 2 model are the
same evaluation score data as in the phase 1 model, but the output variable is the
numerical HI score.
Figure 3. HI Prediction Framework.
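The routing logic of the two-phase framework can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the class and function names are hypothetical, and the two toy estimators only stand in for real models. Any estimator with `fit` and `predict` methods whose `fit` returns the estimator itself could be plugged in instead.

```python
from collections import defaultdict

# Lower bounds of the HI intervals given in the paper: A 81-100, B 61-80, C 41-60, D 0-40.
INTERVAL_LOWER_BOUNDS = [("A", 81), ("B", 61), ("C", 41), ("D", 0)]


def hi_to_interval(hi):
    """Map an HI score (0-100) to its interval label."""
    if not 0 <= hi <= 100:
        raise ValueError(f"HI out of range: {hi}")
    for label, lower_bound in INTERVAL_LOWER_BOUNDS:
        if hi >= lower_bound:
            return label


class TwoPhaseHIModel:
    """Phase 1 classifies the HI interval; phase 2 regresses the HI within it."""

    def __init__(self, make_classifier, make_regressor):
        self.classifier = make_classifier()
        self.make_regressor = make_regressor
        self.regressors = {}

    def fit(self, X, y):
        labels = [hi_to_interval(hi) for hi in y]
        self.classifier.fit(X, labels)  # phase 1: scores -> interval
        grouped = defaultdict(lambda: ([], []))
        for row, hi, label in zip(X, y, labels):
            grouped[label][0].append(row)
            grouped[label][1].append(hi)
        for label, (rows, targets) in grouped.items():
            # phase 2: one regressor per interval, trained on that interval only
            self.regressors[label] = self.make_regressor().fit(rows, targets)
        return self

    def predict(self, X):
        labels = self.classifier.predict(X)
        return [self.regressors[label].predict([row])[0] for row, label in zip(X, labels)]


class MeanScoreClassifier:
    """Toy stand-in: labels a row by the interval of its mean item score."""

    def fit(self, X, labels):
        return self

    def predict(self, X):
        return [hi_to_interval(sum(row) / len(row)) for row in X]


class MeanRegressor:
    """Toy stand-in: always predicts the mean HI seen during training."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# Hypothetical item scores and expert HI values, for illustration only.
X = [[90, 95, 85], [70, 75, 65], [50, 45, 55], [30, 20, 25], [88, 92, 96]]
y = [88, 68, 48, 22, 90]
model = TwoPhaseHIModel(MeanScoreClassifier, MeanRegressor).fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Keeping the estimators behind factory callables means the same wrapper could be reused for each classifier and regressor pairing that is compared.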
3.2 APPLICATION OF THE MACHINE LEARNING ALGORITHM
Machine learning-based classification and regression models range from basic models
such as the Decision Tree, k-Nearest Neighbor, and the Artificial Neural Network, to
ensemble techniques that improve generalization performance by combining the
predictions of multiple models, and, in recent years, Deep Learning, which is known to
excel in a variety of domains. Among these, ensemble algorithms were selected because
they allow the importance of the input variables to be calculated: variable importance
derived from the data and the model can be extracted and compared with the weights of
the assessment items used in existing systems. However, since variable importance
essentially measures how strongly a variable affects the classification of evaluation
classes, it does not exactly match the weights of existing systems in meaning, and it
varies with the characteristics of the machine learning model. Combining the predictions
of several machine learning models can also lead to more reliable results than a single
model. Typical ensemble algorithms, namely Random Forest, Gradient Boosting, and
AdaBoost, were applied, and the algorithm showing the highest performance was
selected as the final model[10-13].
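The comparison between variable importance and existing weights can be sketched as follows. The numbers are hypothetical; only the assessment group names come from the paper. In practice the importances would come from a trained ensemble (for example, the `feature_importances_` attribute of scikit-learn tree ensembles), which is an assumption about tooling, not something the paper states.

```python
def normalize(values):
    """Scale a name -> value mapping so that the values sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Hypothetical weights of an existing weighted-average HI scheme.
existing_weights = {
    "insulation degradation": 30,
    "thermal risk": 25,
    "dielectric risk": 25,
    "mechanical risk": 20,
}
# Hypothetical importances extracted from a trained ensemble model.
model_importances = {
    "insulation degradation": 0.20,
    "thermal risk": 0.15,
    "dielectric risk": 0.45,
    "mechanical risk": 0.20,
}

weights = normalize(existing_weights)
importances = normalize(model_importances)

# Positive gap: the model relies on the group more than the existing weight suggests.
gaps = {name: importances[name] - weights[name] for name in weights}
largest_gap = max(gaps, key=lambda name: abs(gaps[name]))

for name, gap in gaps.items():
    print(f"{name:25s} weight={weights[name]:.2f} importance={importances[name]:.2f} gap={gap:+.2f}")
print("largest disagreement:", largest_gap)
```

As the text cautions, importance and weight do not mean the same thing, so a gap of this kind is a prompt for expert review and not a corrected weight.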
4. PERFORMANCE EVALUATION
4.1 METHODS FOR EVALUATING MODEL PERFORMANCE
To evaluate the performance of the developed HI predictive model, 10-fold
cross-validation is applied in this study. As shown in Figure 4, the generated evaluation
scores and the assigned HI score data are divided into 10 subsets; 9 subsets are used as
learning data and 1 subset as verification data. This process is repeated 10 times so that
each subset serves once as verification data. The advantage is that the performance
estimate generalizes better, because many experiments are performed with a limited
amount of data. Cross-validation is one of the most commonly used methodologies for
evaluating the performance of algorithms or models in the field of machine learning,
typically with 5 or 10 folds.
Figure 4. Example of a 10-fold cross-validation.
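The fold construction can be sketched in a few lines. This is a generic illustration of k-fold splitting, not the authors' code; the function name and the fixed random seed are assumptions. The sample count of 2,933 is the number of transformer observations reported in the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(2933, k=10))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} learning rows, {len(validation)} verification rows")
```

Each observation lands in the verification set exactly once, so the averaged metric uses every data point for both learning and verification.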
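Classification accuracy was used as the indicator of HI score interval accuracy, to evaluate
the performance of the first-phase HI score interval classification model. Classification
accuracy is the number of data for which the interval predicted by the model matches
the actual interval, divided by the total number of data, expressed as a percentage. The
HI score accuracy indicator for evaluating the second-phase HI predictive regression
model was the Mean Absolute Percentage Error (MAPE). MAPE is the mean absolute
prediction error relative to the actual value, expressed as a percentage. With N the total
number of data, N_correct the number of data whose predicted interval matches the
actual interval, A_i the actual HI score, and P_i the predicted HI score of the i-th data
point, the evaluation metrics are:
$$\text{Classification accuracy (\%)} = \frac{N_{\text{correct}}}{N} \times 100 \tag{1}$$
$$\text{MAPE (\%)} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{A_i - P_i}{A_i} \right| \tag{2}$$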
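The two metrics described above can be computed as in the following sketch. The function names and the sample values are illustrative assumptions; the definitions follow the standard forms of classification accuracy and MAPE.

```python
def classification_accuracy(actual_intervals, predicted_intervals):
    """Percentage of data whose predicted interval equals the actual interval."""
    if not actual_intervals or len(actual_intervals) != len(predicted_intervals):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == p for a, p in zip(actual_intervals, predicted_intervals))
    return 100.0 * matches / len(actual_intervals)


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical values for illustration only.
accuracy = classification_accuracy(["A", "B", "C", "D"], ["A", "B", "B", "D"])
error = mape([80, 50, 40], [76, 55, 40])
print(f"accuracy = {accuracy:.1f} %, MAPE = {error:.1f} %")
```

Because MAPE divides by the actual value, an actual HI of exactly 0 would make it undefined, so such data would need separate handling.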
4.2 PERFORMANCE COMPARISON
Each of the three algorithms can be applied to phase 1 and to phase 2, so a total of 9
model combinations were evaluated. Tables I and II show the results of the experiments
performed on these combinations. Overall, both the classification and regression models
performed best with Gradient Boosting, although for the GIS the Random Forest
classifier performed comparably. Overall performance depended strongly on the choice
of HI score interval classification model, and was lowest when AdaBoost was used as the
classification model.
Table I. Performance Comparison by Algorithm Applied to Transformer.
Phase 1: Classification model   Phase 2: Regression model   Classification Accuracy (%)   MAPE (%)
Gradient Boosting               Gradient Boosting           92.8                          2.2
Gradient Boosting               Random Forest               92.8                          2.7
Gradient Boosting               AdaBoost                    92.8                          3.9
Random Forest                   Gradient Boosting           88.3                          2.6
Random Forest                   Random Forest               88.4                          3.1
Random Forest                   AdaBoost                    88.4                          4.5
AdaBoost                        Gradient Boosting           75.5                          3.4
AdaBoost                        Random Forest               75.5                          4.5
AdaBoost                        AdaBoost                    75.5                          5.9
Table II. Performance Comparison by Algorithm Applied to GIS.
Phase 1: Classification model   Phase 2: Regression model   Classification Accuracy (%)   MAPE (%)
Gradient Boosting               Gradient Boosting           94.3                          1.2
Gradient Boosting               Random Forest               94.3                          1.6
Gradient Boosting               AdaBoost                    94.3                          2.8
Random Forest                   Gradient Boosting           94.4                          1.2
Random Forest                   Random Forest               94.3                          1.6
Random Forest                   AdaBoost                    94.3                          2.8
AdaBoost                        Gradient Boosting           78.4                          2.8
AdaBoost                        Random Forest               76.9                          3.6
AdaBoost                        AdaBoost                    77.7                          4.6
4.3 PERFORMANCE EVALUATION OF FINAL ALGORITHM
10-fold cross-validation was used to evaluate and verify the performance of the HI
prediction module. A total of 2,933 observations of transformers and 1,542 observations
of GIS were used; in each of the 10 repetitions, 90 % were used as learning data and 10 %
as verification data. Based on the performance comparison by algorithm, the first-phase
‘HI score interval classification model’ used the Gradient Boosting Classifier, while the
second-phase ‘HI prediction regression model by score interval’ used the Gradient
Boosting Regressor. The accuracy of transformer classification was 92.8 %, and the
predicted error rate MAPE was 2.2 %. In other words, 92.8 % of the maintenance
evaluation classes required for the facility were correctly classified, and the predicted HI
value differed from the actual value by 2.2 % on average. The transformer results are
visualized in Figure 5 as a confusion matrix and a scatterplot comparing predicted and
actual values. The accuracy of the GIS classification was 94.3 %, and the predicted error
rate MAPE was 1.2 %. In other words, 94.3 % of the maintenance evaluation classes
required for the facility were correctly classified, and the predicted HI value differed from
the actual value by 1.2 % on average. Figure 6 visualizes the GIS results in the same way.
Figure 5. Visualisation of the HI prediction results for transformer (left: confusion matrix; right: scatterplot).
Figure 6. Visualisation of the HI prediction results for GIS (left: confusion matrix; right: scatterplot).
5. APPLICATION CASE
The HI machine learning module for high-voltage transformers and GIS developed through
this study is applied in the smart substation projects of two of our customers, a steel
company and a chemical company. The HI values that the proposed module predicts from
the actual measurement data of the equipment in these customer substations are being
monitored. Through this monitoring, we plan to continuously verify the performance of
the HI machine learning module and correct any problems found.
Figure 7. The customer Smart substation AMS HI screen
6. CONCLUSION
Through this study, a machine learning framework and module were developed to predict
the HI of high-voltage transformers and GIS. To overcome the limitations of the
previously used weight-based HI calculation and evaluation process, a framework
consisting of two phases was proposed: the ‘HI score interval classification model’ and
the ‘HI score regression model by score interval’. Three ensemble algorithms were
applied as classification and regression models in a comparative experiment, and the
Gradient Boosting algorithm was finally selected. The accuracy of transformer
classification was 92.8 % and the MAPE was 2.2 %, while the accuracy of GIS
classification was 94.3 % and the MAPE was 1.2 %. That is, score intervals were classified
correctly for more than 92 % of the data, and the predicted HI scores differed from the
actual values by approximately 2 %. When the existing weighted average method was
applied to the same data, the transformer MAPE was 18.6 % and the GIS MAPE was
10.5 %. This confirms that the proposed method predicts HI scores much more accurately
than the existing weighted average method. The existing process, which uses a simple
weighted average based on pre-allocated weights, did not match maintenance to actual
facility defects; this study compensates for that shortcoming by predicting an HI close to
the HI judged by experts, thereby improving maintenance matching. In South Korea, a
steel company and a chemical company applied the AMS, including the HI prediction
module, to a 154 kV substation (S/S) in 2019.
BIBLIOGRAPHY
[1] CIGRE Technical Brochure 499, “Residual Life Concepts Applied to HV GIS”, 2012
[2] CIGRE Technical Brochure 248, “GUIDE on ECONOMICS of TRANSFORMER MANAGEMENT”,
2004
[3] CIGRE Technical Brochure 300, “Guidelines to an optimized approach to the renewal of
existing air insulated substations”, 2006
[4] CIGRE Technical Brochure 167, “USER GUIDE FOR THE APPLICATION OF MONITORING AND
DIAGNOSTIC TECHNIQUES FOR SWITCHING EQUIPMENT FOR RATED VOLTAGES OF 72.5 kV
AND ABOVE”, 2000
[5] J. R. Jung, H. D. Seo, and S. J. Kim, “Application of an Asset Health Management System for
High-Voltage Substations” (21, rue d’Artois, F-75008, 2018 PARIS)
[6] Naderian Jahromi, Ray Piercy, Stephen Cress, Jim R. R. Service, and Wang Fan, “An Approach
to Power Transformer Asset Management using Health Index” (IEEE Electrical Insulation
Magazine, March 2009)
[7] J. R. Jung, Y. M. Kim, S. W. Kim, and J. B. Kim, “Partial Discharge Diagnosis Method using Non-
phase Synchronized UHF PD Pattern based on On-site Measurement Database for Substation”
(21, rue d’Artois, F-75008, 2016 PARIS)
[8] J. R. Jung, H. D. Seo, S. J. Kim, and S. W. Kim, “Advanced Dissolved Gas Analysis(DGA)
Diagnostic Methods with Estimation of Fault Location for Power Transformer Based on Field
Database” (21, rue d’Artois, F-75008, 2016 PARIS)
[9] Dietterich, T. G., “Ensemble methods in machine learning”, in Proceedings of the First
International Workshop on Multiple Classifier Systems, pp. 1-15, 2000
[10] Merz, C. J., “Using correspondence analysis to combine classifiers”, Machine Learning,
36(1/2), pp. 33-58, 1999
[11] Liaw, A., & Wiener, M., “Classification and regression by randomForest”, R News, 2(3),
pp. 18-22, 2002
[12] Breiman, L., “Random forests”, Machine Learning, 45(1), pp. 5-32, 2001
[13] Schapire, R. E., “The boosting approach to machine learning: An overview”, in Nonlinear
Estimation and Classification, pp. 149-171, Springer, New York, NY, 2003
