Substations and Electrical Installations: Asset Health Indices For Equipment in Existing Substations

TECHNICAL BROCHURE
B3 Substations and electrical installations
Asset health indices for equipment in existing substations
Reference: 858
December 2021
WG B3.48
Members
J. BEDNAŘÍK, Convener IE
A. WILSON, Secretary GB
J. SMIT, B3 AA4 Advisor NL
G. BALZER DE
R. CLERC FR
P. CREGO ES
L. DARIAN RU
A. GOYVAERTS BE
N. KAISER DE
C. KOMIYA JP
A. LIVSHITZ US
H. MANNINEN EE
L. MCCARTNEY IE
T. MCGRAIL US
E. MORALES CRUZ US
S. NOGUCHI JP
A. PURNOMOADI ID
P. STEFFENS DE
B. VAN MAANEN NL
T. WEHRSTEDT DE
P. WERDELMANN DE
Corresponding Members
R. CORNELL US
M. VERRIER AU
A. KURZ DE
Copyright © 2021
“All rights to this Technical Brochure are retained by CIGRE. It is strictly prohibited to reproduce or provide this publication in any
form or by any means to any third party. Only CIGRE Collective Members companies are allowed to store their copy on their
internal intranet or other company network provided access is restricted to their own employees. No part of this publication may
be reproduced or utilized without permission from CIGRE”.
Disclaimer notice
“CIGRE gives no warranty or assurance about the contents of this publication, nor does it accept any responsibility, as to the
accuracy or exhaustiveness of the information. All implied warranties and conditions are excluded to the maximum extent permitted
by law”.
ISBN : 978-2-85873-563-1
TB 858 - Asset Health Indices for Equipment in Existing Substations
Executive summary
Satisfactory and reliable performance of substation equipment is critical for any utility company. During
their service life, assets transition from being new to being aged, in the sense of having one or more
developing failure modes. This may be a gradual deterioration or a step change after a
damaging incident. Eventually, failure will follow unless appropriate corrective action is taken. This
deterioration therefore presents a risk exposure affecting key business objectives unless it is identified
and managed. Each asset, or in some cases a functional group of assets, therefore requires an
individual care plan to ensure its future ability to perform.
This process is used to identify when to intervene with maintenance, refurbishment or
replacement. It focuses on estimates of a “failure-free” period for each asset. It begins
when the asset is newly commissioned and is revised as the years in service pass. In
many situations these estimates can in turn be based on periodic condition assessments. To achieve
this, the process begins by identifying the relevant failure modes, then applies the corresponding diagnostic
indicators and coordinates the outcomes within an Asset Health Indexing (AHI) methodology.
The specific aim for such work is two-fold. One is to develop processes to identify intervention
priorities applicable at the individual asset level. The second is to identify processes to aggregate
these priorities for different asset types to produce a score for a circuit end, bay or substation.
The development of this brochure draws upon related experience of working group members together
with some work undertaken mainly within CIGRE A2, A3 and B3 study committees.
1. IN-SERVICE ASSET FAILURES
In this context an in-service failure is a failure to perform a network duty; it does not necessarily
mean an event that ends the asset's life. The deteriorated condition might be rectifiable with
maintenance, refurbishment or repair. The aim is therefore to identify failure-free “life periods”,
which is not necessarily the same as re-defining asset life. In some types of asset such deterioration
is normally addressed with timely maintenance. More catastrophic damage is repaired. The end point
for asset life is when these tasks cease to be effective. This could be when the damage is too great,
when one or more fundamental functions have irreversible limiting deterioration, or when the costs of repair
or refurbishment outweigh the benefits.
Historically, HV power equipment has been specified and designed with ratings optimised for an
expected lifetime of 25 or 40 years, to match that of the civil and mechanical structures in the
substation or power station. Over time it has become apparent that in many cases assets last much
longer. Experience also shows that, even among comparable assets, rates of deterioration vary
considerably. SC A2 and A3 reliability studies have attributed the greater range in
asset performance to the quality of design, manufacture, commissioning and maintenance, and to
variations in use; see references [B1] and [B2].
One issue is then to have the capability to identify the time frames in which assets are most likely to fail,
either because they require maintenance and repair or due to irreversible causes. This range of
failure-free periods and consequential failure mechanisms requires a different management practice
from one that follows a simple time-based assignment applied to the whole asset class. One way of
managing the assets is to use condition assessment, provided it can be linked to both the range and the rate of
development of the failure modes found to occur. These are critical provisos for the approach adopted in
this document. The outcome may then be a failure-free expectancy based upon asset health indexing
systems that link the condition to a failure time frame. The process is, therefore, a move away
from broad time-based criteria towards asset-specific decisions based upon condition. It may
migrate further into decisions based upon risk.
2. THE ROLE OF ASSET HEALTH INDICES
The role of an AHI is, therefore, to divide the asset register into several categories such as the five
shown in Table 1.1. This example is the one developed within WG B3.48 for primary assets and based
upon experience of members and of publications elsewhere. The time scales need to be user defined
and the active failure modes identified. Both format and methodology will depend on the intended
application. In order to aggregate scores across the various asset classes in a substation it is
important, however, to preserve the same definitions for scores for all asset types, at least within a
company. Most assets assessed in this way will be in groups 1 and 2 and require no specific remedial
action at the time of the assessment. This is part of a condition-based regime where activities only occur
when diagnostics indicate a need. Such an approach therefore requires
an ongoing programme to capture condition data, together with an ongoing process to re-define and re-assess
activities and their time scales. Assets in groups 3, 4 and 5 would be assessed and given an
individual action plan based upon the failure mode identified, its general rate of progression, its
condition-based rate of progression, and its criticality exposure.
Table 1.1 – Example of AHI definitions for expressing remaining failure free years
AHI 1: Very good
Condition definition: Very low likelihood of failure over many years. This would be in the original factory condition or after extensive refurbishment.
Actions: Continue with inspect and test schedule.
Time scales: More than 10 years likely before additional maintenance and refurbishment is undertaken. Timing of interventions is asset specific and indicated by the inspection and test results.

AHI 2: Good
Condition definition: Low likelihood of failure over a long period. General deterioration is consistent with its time in service.
Actions: Continue with inspect and test schedule.
Time scales: 5-10 years likely before additional maintenance and refurbishment is undertaken. Timing of interventions is asset specific and indicated by the inspection and test results. Subcategory bands can be introduced based upon failure mode and rate of change in diagnostics. The impact of any life-limiting irreversible deterioration is expected to be beyond this time frame. If not, introduce extra column with 5-year replacement bands.

AHI 3: Fair
Condition definition: Low risk defect or life-limiting deterioration has been detected. Performance may be adversely affected long term unless remedial action is carried out.
Actions: Investigate the issue and plan any intervention. Continue with a revised inspect and test schedule. Revise life expectancy planning into likely 5-year bands.
Time scales: 2-5 years before interventions. Timing and scope are indicated by investigations, together with changes in inspect and test results. Subcategories introduced in yearly bands based upon failure mode and rate of change in diagnostics.

AHI 4: Poor
Condition definition: Progressive deterioration has been detected, with high likelihood of failure in the short term. The unit can remain in service, but short-term reliability is likely to be reduced. Subcategories are useful to define urgency of repair or replacement timeframes.
Actions: Remedial action to be carried out and/or increased condition monitoring implemented. De-rating and risk management zones may be needed.
Time scales: 3-24 months before interventions. Planning the action and its timing is determined by failure mode analysis and operational practicalities. This is managed using increased surveillance.

AHI 5: Critical
Condition definition: High likelihood of immediate failure exists and the unit should not remain in service.
Actions: Any exception would require intensive risk management actions. If returned to service, decision points and time frames need to be defined.
Time scales: 0-3 months determined by risk assessment.
In category 1 the term “as new condition” is avoided since some newer assets can have a higher
failure rate.
Most assets will be allocated to categories 2 and 3, so the five categories alone will not aid prioritisation
of remedial actions. It is likely that several subsets (2.1, 2.2, etc.) will be needed to provide this granularity
and allow prioritisation. It is important, however, that these subsets relate to time scales for
action consistent with their parent category (2, 3, etc.).
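
The time scales in Table 1.1 can be held as a simple lookup so that every asset type shares one definition of each score. The sketch below is illustrative only: the names `AHI_BANDS` and `intervention_window_months`, and the choice of months as the unit, are assumptions and not part of the brochure. The upper bound for category 1 is left open because the table gives only "more than 10 years".

```python
from typing import Optional, Tuple

# score -> (label, lower bound in months, upper bound in months or None if open-ended)
# Values are transcribed from Table 1.1; the data structure itself is an assumption.
AHI_BANDS = {
    1: ("Very good", 120, None),  # more than 10 years
    2: ("Good", 60, 120),         # 5-10 years
    3: ("Fair", 24, 60),          # 2-5 years
    4: ("Poor", 3, 24),           # 3-24 months
    5: ("Critical", 0, 3),        # 0-3 months
}


def intervention_window_months(ahi: int) -> Tuple[int, Optional[int]]:
    """Return the (earliest, latest) expected time to intervention in months."""
    if ahi not in AHI_BANDS:
        raise ValueError(f"AHI score must be between 1 and 5, got {ahi}")
    _, lower, upper = AHI_BANDS[ahi]
    return lower, upper
```

Sub-categories such as 2.1 or 2.2 could be added as further keys, as long as their windows stay inside the window of the parent category.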
3. CREATING THE AHI METHODOLOGY
Creating a condition-linked methodology usually starts from the asset register, which is a
simple list identifying the company assets. The register identifies the asset type, design information,
location and function. The AHI modification adds, for each asset in the register, an assessment of condition
and the implied likelihood of an in-service failure within a timescale. In this context, the modified list
is then referred to as an Asset Health Index (AHI). By associating a future time scale with the
detected deterioration, the outcome can then be used to identify the timing of
interventions that reduce the likelihood of such an in-service failure. Such interventions include asset
replacement, repairs, refurbishments and maintenance.
It is important that test data are interpreted in terms of the failure
mode. Test results are not failure modes; they always need to be interpreted in relation to
failure modes and predicted rates of progression. For example, some tests (such as Dissolved Gas
Analysis (DGA) for transformer oils) can be indicators of several failure modes, and the analyses need
to be interpreted separately for each failure mode.
The AHI is built from assessments of each of the critical failure modes. The result may be
encapsulated in the assessment of just one failure mode (the one with the poorest assessment).
The approach is illustrated for each of the many substation asset types in Chapter 4. This
worst score can cascade up through the levels of individual component, single asset, bay and substation,
with each level carrying forward the worst, most urgent score. Alternatively, some granularity can be
achieved, at least at component or asset level, either by using sub-categories based upon the
failure mode and its detected rate of progression, or by aggregating scores for all failure modes.
However, aggregation is not without problems when results from many failure modes, or from different assets in
a bay or substation, are combined. This aspect is discussed at length in Chapter 5 of the brochure,
which concludes that there is no single approach to AHI: many will work, and the approach needs
to be chosen to suit the application.
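
The worst-score cascade described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: scores are integers from 1 (very good) to 5 (critical), and the asset names, failure mode names and function names are hypothetical, not taken from the brochure.

```python
from typing import Dict

# Hypothetical nesting: bay -> asset -> failure mode -> score (1 = very good, 5 = critical).
FailureModes = Dict[str, int]
Bay = Dict[str, FailureModes]
Substation = Dict[str, Bay]


def asset_score(failure_modes: FailureModes) -> int:
    """An asset carries forward its worst (highest) failure mode score."""
    return max(failure_modes.values())


def bay_score(assets: Bay) -> int:
    """A bay carries forward the worst asset score."""
    return max(asset_score(modes) for modes in assets.values())


def substation_score(bays: Substation) -> int:
    """A substation carries forward the worst bay score."""
    return max(bay_score(assets) for assets in bays.values())


# Illustrative data only.
example: Substation = {
    "bay_1": {
        "transformer_T1": {"winding_insulation": 2, "tap_changer_wear": 4},
        "circuit_breaker_CB1": {"mechanism": 1, "interrupter": 2},
    },
    "bay_2": {
        "disconnector_D1": {"contact_corrosion": 3},
    },
}
```

Here `substation_score(example)` is driven entirely by the tap changer score of 4, which shows both the strength of the approach (the most urgent issue is never hidden) and its limitation (no granularity about the remaining assets).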
4. CONCLUSIONS
• The AHI approach is applicable where condition indicators link the symptoms of
failure modes to the timescales for the transition from being sound to being likely to cause an
in-service failure.
• Creating an AHI means producing a listing of each asset in terms of its likelihood to fail in
service in a user-selected time interval. This likelihood would be used with a criticality analysis to
form a risk assessment register.
• Any AHI process should repeat and follow each asset through its life by identifying changing
likelihoods of failure with their associated time periods and by creating an action plan for an
intervention (maintenance, repair or replacement).
• The resulting set of AHIs should be calibrated for time. The AHI must uniformly reflect the same
urgency of intervention. All assets with the same score should have the same timescale for
intervention, irrespective of failure mode or asset type; otherwise there is confusion in applying
AHIs consistently.
• A ‘poorer’ AHI should always reflect a more urgent condition. This means that where several
failure modes are being assessed and the scores aggregated, the method of aggregation should
not produce any violation of this principle.
• The AHI methodology can be used at component, asset, bay and substation levels, incorporating
a wide variety of asset types and for a range of outcomes, such as maintenance and replacement
planning. With such a range there is no single “correct” method for developing and applying the
AHI process.
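
The principle that a poorer AHI must always reflect a more urgent condition can be checked against a candidate aggregation method. The sketch below compares simple averaging with worst-case aggregation. It assumes scores from 1 (very good) to 5 (critical) and treats the asset with the worse single failure mode as the more urgent one; that definition of urgency, the function names and the example scores are all assumptions made for illustration.

```python
from statistics import mean
from typing import Callable, Sequence


def worst_case(scores: Sequence[int]) -> float:
    """Aggregate by carrying forward the worst (highest) score."""
    return float(max(scores))


def average(scores: Sequence[int]) -> float:
    """Aggregate by arithmetic mean of the failure mode scores."""
    return float(mean(scores))


def preserves_urgency_order(
    aggregate: Callable[[Sequence[int]], float],
    more_urgent: Sequence[int],
    less_urgent: Sequence[int],
) -> bool:
    """True if the aggregate gives the more urgent asset the poorer (higher) score."""
    return aggregate(more_urgent) > aggregate(less_urgent)


# Illustrative failure mode scores for two hypothetical assets.
asset_a = [5, 1, 1]  # one critical failure mode, assumed to be the more urgent asset
asset_b = [3, 3, 3]  # uniformly fair condition
```

With these values averaging ranks `asset_b` as poorer than `asset_a`, which hides the critical failure mode, whereas worst-case aggregation keeps the ordering. Other aggregation methods can be screened in the same way before being adopted.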
Contents
Executive summary
Figures and Illustrations
Tables
1. Introduction
Goal of this working group and the technical brochure
The role of health indices within asset life planning
Drivers for the development of an AHI process
Societal impact of in-service failures
Regulatory impact
AHI within an asset management process
AHI and the ageing asset base
Dealing with Unexpected Failures
Experience developing AHI
2. Processes used in Asset Health Indexing
AHI Processes described in Publications
Asset Health Index terminology
Assets and health indices
AHI Applications
Failures, reliability, probability and likelihood of failure
Diagnostic Indicators for failure modes
Failure mode susceptibility indicators
Intervention
Failure Mode, Effects and Analysis
Extent of an AHI review and restricted assessments
Level 1: Basic Strategy – based on office study
Level 2: Simple Strategy – added visual inspections
Level 3: Intermediate Strategy – added non-invasive diagnostic
Level 4: Advanced Strategy – added offline measurements and investigations
Level 5: Advanced Strategy – added continuous online monitoring
Translating into the scale code
Working with scale codes
Missing or aged data
Linear and Logarithmic summing options
Weighting of scores
Displaying aggregated condition scale codes for a single asset
Assembling the AHI
Chapter conclusions – creating an asset health index
3. The generic methodology
Step 1: Identify the assets and decide on review levels.
Failure impact assessment
Review levels
Step 2: Perform FMEA
Step 3: Assess Individual Asset Performance
Asset register data
Documentation
The original specifications
Standards
Factory Information
Financial information on the different asset classes:
Operation history on the different asset classes:
Failure information
Maintenance policy
Historic test and inspection data
Failure Susceptibility Indicators
Scoring Failure Susceptibility Indicators
Step 4: Identify the condition indicators to be used
Step 4.1: Estimate the detectability
Step 4.2: Estimate the cost of monitoring the condition indicator
Step 4.3: Decide the condition indicators to be used
Step 5: Collect inspection data
Step 6: Evaluate Current Condition relative to key failure modes
Step 6.1: Translating the condition indicator result to a condition scale code score
Step 6.2: Translating the set of condition indicator scores to a condition indicator index
Step 7: Aggregate analyses for AHI
Step 7.1: Aggregate condition scale code scores to a sub-health score and asset health score
Step 8: Identify mitigation actions
Assembling the final AHI
4. APPLIED METHODOLOGY
Steps common for all asset categories
Step 1: Identify the assets, gather asset data and decide on review levels
Step 2: Perform FMEA
Step 3: Assess Individual Asset Performance
Transformers and reactors
Step 1: Identify the assets and decide on review levels
Step 2: Perform FMEA and identify condition indicators to be used
Step 3 Assess Individual Asset Performance
Step 4: Identify diagnostic strategy
Step 5: Collect inspection data
Step 6: Evaluate Current Condition relative to key failure modes
Step 7: Aggregate analyses for AHI
Step 8: Identify mitigation actions to improve AHI
Circuit breakers
Step 1: Identify assets and decide review level
Step 2: Perform FMEA and identify condition indicators
Step 4: Identify the condition indicators to be used
Step 6: Evaluate current condition relative to key failure modes
Step 7: Aggregate analysis for AHI
Step 8: Plan Actions
Disconnectors and earthing switches
4.4.1 Step 1: Identify the Assets and Decide on Review Levels
4.4.2 Step 2: Perform FMEA and identify condition indicators to be used
4.4.3 Step 3: Assess Individual asset Performance
4.4.4 Step 4: Identify diagnostic strategy
4.4.5 Step 5: Collect inspection data
4.4.6 Step 6: Evaluate current Condition relative to key failure modes
4.4.7 Step 7: Aggregate analyses for AHI
4.4.8 Step 8 Plan actions
Instrument Transformers
Step 1: Identify the assets and decide on review levels
Step 4: Identify condition indicators to be used
Step 6: Evaluate current condition relative to key failure modes
Step 7: Aggregate analyses for AHI
Step 8: Identify mitigation actions to improve AHI
GIS
Step 1: Identify assets and decide review level
Step 3 Assess (Individual) Asset Performance
Step 4: Identify Diagnostic Strategy and condition indicators
Step 5: Collect Inspection Data
Step 6: Evaluate Current Condition relative to key failure modes and Norms Generation
Step 7: Aggregate Indicators’ analysis for Asset Health Index
Step 8: Plan Mitigation Actions
Other substation primary equipment
Step 1: Identify the assets and decide on review levels
Step 2: Perform FMEA and identify condition indicators
Step 3: Assess Individual Asset Performance
Step 4: Identify diagnostic strategy
Step 5 Collect Inspection data
Step 6: Evaluate current condition relative to key failure modes
Step 7: Aggregate Indicators for AHI
Step 8: Plan mitigating actions
Control and protection
Step 1: Identify the assets and decide on review levels
Step 2: Perform FMEA and identify condition indicators
Step 3: Assess Individual Asset Performance
Step 5: Collect inspection data
Step 7: Aggregate indicators for AHI
Step 8: Plan mitigating actions
Auxiliary systems
4.9.1 Step 1: Identify Assets
4.9.2 Step 2 Review Failure Modes
4.9.3 Step 3: Assess historic performance
4.9.4 Step 4 Identify Diagnostic Strategy
4.9.5 Step 5 Collect Inspection Data
4.9.6 Step 6 Evaluate Condition relative to Failure Mode
4.9.7 Step 7 Aggregate Indicators to AHI
4.9.8 Step 8 Plan Actions
Buildings and structures
Step 1: Identify the assets and decide on maturity levels
Step 2: Perform FMEA and identify condition indicators to be used
Step 3: Assess individual asset performance
Step 5: Collect inspection data
Step 7: Aggregate analyses for AHI
Step 8 Identify mitigation actions to improve AHI
5. Assembling sets of AHI outcomes and Displaying results ................................. 125

Issues when combining sets involving different asset types ............................................................ 125
Examples – Part 1 .................................................................................................................................. 126
Simple Substation Max and Average of available indices ................................................................ 126
Combining Asset Health Indices: Simple Approach ......................................................................... 127
Potential methods for aggregation of health scores .......................................................................... 128
Option 1 – Enumeration of single (overall) asset scores .................................................................. 128
Option 2 – Enumeration of all available condition indicator scores for all assets ............................. 129
Option 3 – Normalisation of all asset scores into one overall aggregate score ................................ 130
Option 4 – Focussed aggregation using probability of failure information ........................................ 132
Sanity checking – PoF back calculation, expected condition issues ............................................... 133
Feedback discussion ....................................................................................................................... 134
Back calculation of probability of failure ........................................................................................... 134

Conclusions relating to aggregation ................................................................................................... 135
6. Conclusion ............................................................................................................. 137
APPENDIX A. Definitions, abbreviations and symbols ................................................. 139

A.1. General terms......................................................................................................................................... 139
A.2. Specific terms ........................................................................................................................................ 139
APPENDIX B. Links and references ............................................................................... 141
APPENDIX C. Additional explanation specific to Chapter 5 ......................................... 144

C.1. Characteristics of combinable health indices ..................................................................................... 144
C.2. Mathematics of probability ................................................................................................................... 144
APPENDIX D. CIGRE PUBLICATIONS ............................................................................ 147

D.1. UK TSO ................................................................................................................................................... 147
D.2. USA TSO ................................................................................................................................................. 148
D.3. OEM – International group of transformer experts ............................................................................. 148
D.4. A2 Brochure TB 761 Condition assessment of power transformers ................................................ 149
APPENDIX E. COLLABORATIVE DEVELOPMENTS ...................................................... 150

E.1. UK DNO .................................................................................................................................................. 150
APPENDIX F. UTILITY DEVELOPMENTS ....................................................................... 151

F.1. Canadian TSO ........................................................................................................................................ 151
F.2. USA Utility .............................................................................................................................................. 151
F.3. Indian Power System ............................................................................................................................. 152
F.4. Japanese Utility ..................................................................................................................................... 153
F.5. Transmission power lines in Africa ..................................................................................................... 153
APPENDIX G. WG members experiences ...................................................................... 156

G.1. UK and USA TSO together with collaborating service provider ........................................................ 156
G.2. Belgian TSO ........................................................................................................................................... 156
G.3. Dutch TSO .............................................................................................................................................. 157
G.4. Dutch service provider .......................................................................................................................... 157
G.5. German TSO ........................................................................................................................................... 158
G.6. German DSO .......................................................................................................................................... 159
G.7. German OEM .......................................................................................................................................... 159
G.8. Estonian utility ....................................................................................................................................... 162
G.9. Japanese OEM ....................................................................................................................................... 162
G.10. Japanese Utility ..................................................................................................................................... 163
G.11. Russian service provider ...................................................................................................................... 164
G.12. Indonesian utility – AHI for a tropical climate ..................................................................................... 166
Figures and Illustrations

Figure 1.1 – Risk based decision making .............................................................................................. 17
Figure 1.2 – Outcomes of asset failure ................................................................................................. 18
Figure 1.3 – Asset investment planning ................................................................................................ 19
Figure 1.4 – Failure hazard and replacement hazard for TSO population [B1] and [B14] .................... 20
Figure 1.5 – Hoop buckling on common winding (left) and crush damage on the tertiary (right). ........ 22
Figure 2.1 – Achieving AHI with 5 identified strategies, each with staged activities ............................. 28
Figure 2.2 – Linking test data to failure modes and to a linear condition scale code ............................ 32
Figure 2.3 – Numbers and scale codes as shown in TB 761 [B3] ........................................................ 34
Figure 2.4 – Score the diagnostic indicators .......................................................................................... 36
Figure 3.1 – The steps to creating an AHI ............................................................................................. 37
Figure 3.2 – Data to be obtained and assessed.................................................................................... 39
Figure 3.3 – Failure evolution and diagnostics ...................................................................................... 43
Figure 3.4 – Cost benefit analysis for evaluating condition indicators .................................................. 44
Figure 4.2.1 – Inspection example findings – Tank rusting and a stuck WTI ........................................ 57
Figure 4.2.2 – Some site diagnostics for use without outages .............................................................. 58
Figure 4.3.1 – Live tank (left) and dead tank (right) circuit breakers ..................................................... 60
Figure 4.3.2 – Measured physical parameters in switchgear condition evaluation ............................... 63
Figure 4.3.3 – Examples of scoring condition indicators ....................................................................... 68
Figure 4.3.4 – Aggregate condition scale scores into an asset health score ........................................ 68
Figure 4.4.1 – Disconnector and earthing switch .................................................................................. 70
Figure 4.4.2 – Distribution of failed subassembly (DS; ES; DE = DS + ES) [B21] ................................ 73
Figure 4.4.3 – Distribution of failure origin (DS; ES; DE = DS + ES) [B21] ........................................... 73
Figure 4.5.1 – Translating the set of condition indicator scores to a condition indicator index ............. 85
Figure 4.6.1 – An example of a feeder bay in GIS. The components are placed inside different
enclosures of GIS .................................................................................................................................. 88
Figure 4.6.2 – The hierarchical layers in GIS [B16]............................................................................... 90
Figure 4.6.3 – Example of some failure modes of the dielectric subsystem of GIS from a case study
[B16] ...................................................................................................................................................... 90
Figure 4.6.4 – Example of some failure modes of the construction and support subsystem of GIS [B16]
............................................................................................................................................................... 91
Figure 4.6.5 – Boundary values for humidity content in the CB enclosure for GIS from a manufacturer.
The fitted distribution is the Gamma distribution. .................................................................................. 94
Figure 4.6.6 – A carbonized female-main contact in one of circuit breaker in GIS in the case study.
The measurement before opening the enclosure had shown the increase of the static contact
resistance above 20% of the value during commissioning. .................................................................. 96
Figure 4.6.7 – Failed fragments of an epoxy disconnector drive tube (left) that exploded out through
bursting disc into the bay and created a significant safety risk. A similar unit is on right. Defects in the
casting considered to be the cause. ...................................................................................................... 97
Figure 4.6.8 – The single line diagram of the GIS example from the case study ................................. 98
Figure 4.6.9 – The configuration of enclosures in three types of bays in GIS example ........................ 99
Figure 4.7.1 – Stack of capacitor rolls within a can ............................................................................. 102
Figure 4.7.2 – Failed centre phase arrester ........................................................................................ 104

Figure 4.7.3 – Failure of bottom rack capacitor ................................................................................... 105
Figure 4.7.4 – Puncture hole at end of core screen on 132 kV cable. ................................................ 107
Figure 4.7.5 – Tracking on porcelain insulator .................................................................................... 107
Figure 4.7.6 – UHF and UV Scanning to detect PD ............................................................................ 109
Figure 4.7.7 – Benchmarking DDF data with international database resource [B38] ......................... 110
Figure 4.7.8 – Bushing tap modified for PD and PF measurements, and typical results .................... 111
Figure 4.10.1 – Sample image of substation where we can name all assets under consideration .... 117
Figure 4.10.2 – High security fence ..................................................................................................... 119
Figure 5.1 – Max and Average of Asset Health Indices at a single station ......................................... 127
Figure 5.2 – Possible visualisation of asset scores ............................................................................. 129
Figure 5.3 – Possible visualization of asset scores ............................................................................. 130
Figure 5.4 – Example of a bay configuration ....................................................................................... 133
Figure 5.5 – Adjusting category PoF values ........................................................................................ 135
Figure D.1 – An OEM’s RCM approach [B43] ..................................................................................... 148

Figure D.2 – Scoring matrix from TB 761 [B3] .................................................................................... 149
Figure E.1 – DNO Methodology Derivation of PoF [B44] .................................................................... 150
Figure F.1 – Flowchart for Indian utility ............................................................................................... 152
Figure F.2 – AHI derivation.................................................................................................................. 154
Figure F.3 – AHI Distribution ............................................................................................................... 154
Figure G.1 – Example with a step change in apparent age ................................................................ 157
Figure G.2 – AHI assessments............................................................................................................ 157
Figure G.3 – Displaying the AHI result ................................................................................................ 160
Figure G.4 – Graphical representation of WPA-Method ..................................................................... 161
Figure G.5 – Health Index representation in RCAM Dynamic ............................................................. 161
Figure G.6 – RCAM Methodology overview ........................................................................................ 161
Figure G.7 – Methodology developed in Japan................................................................................... 163
Figure G.8 – Circuit breaker example from Japan .............................................................................. 163
Figure G.9 – Number of units for each health index value .................................................................. 164
Figure G.10 – Health index distribution against age ........................................................................... 164
Tables
Table 1.1 – Example of AHI definitions for expressing remaining failure free years ............................... 4
Table 2.1 – Example of AHI definitions for expressing remaining failure free years ............................. 24
Table 2.2 – Log and Linear condition scale codes ................................................................................ 30
Table 2.3 – Converting condition indicators (observations or measured values) to condition scale
codes ..................................................................................................................................................... 31
Table 2.4 – Aggregating summed scores with linear and logarithmic scoring ...................................... 33
Table 2.5 – Effect of weighting linear scores [B15] ............................................................................... 33
Table 2.6 – Aggregating scores............................................................................................................. 35
Table 3.1 – Asset data ........................................................................................................................... 40
Table 3.2 – First level assessment – example of a susceptibility review .............................................. 42
Table 3.3 – Detectability ........................................................................................................................ 43
Table 3.4 – Indication of restricted data and limited confidence ........................................................... 45
Table 3.5 – Example showing the relation between score and AHI ...................................................... 46
Table 3.6 – The compiled AHI – example based upon Log base 3 scoring .......................................... 47
Table 4.1.1 – Asset register information example ................................................................................. 48
Table 4.1.2 – Consequences of Failure ................................................................................................ 48
Table 4.1.3 – Diagnostic indicators in use and failure modes ............................................................... 49
Table 4.1.4 – Common asset data ........................................................................................................ 50
Table 4.1.5 – Scoring Historic data ....................................................................................................... 50
Table 4.1.6 – Level assessment – example of review of failure mode susceptibility factors ................ 51
Table 4.2.1 – Common faults and indicators (simplified list) ................................................................. 54
Table 4.2.2 – Scale code assignment ................................................................................................... 56
Table 4.2.3 – Visual Inspection ............................................................................................................. 56
Table 4.2.4 – Survey test results ........................................................................................................... 57
Table 4.2.5 – On-line monitoring ........................................................................................................... 58
Table 4.2.6 – Offline and investigative testing ....................................................................................... 58
Table 4.2.7 – Common faults, indicators and scoring for AHI ............................................................... 59
Table 4.3.1 – Distribution of CB failures per cause ............................................................................... 61
Table 4.3.2 – MaF modes ...................................................................................................................... 61
Table 4.3.3 – Examples of condition indicators related to components and failure modes .................. 64
Table 4.3.4 – Example of condition indicator estimation for circuit breakers ........................................ 65
Table 4.3.5 – Review Level, Grid Integrity and C/P for condition indicators ......................................... 66
Table 4.3.6 – Typical condition indicators and scoring methodologies ................................................. 67
Table 4.3.7 – Comprehension about health indices .............................................................................. 69
Table 4.4.1 – Main tasks of the equipment ........................................................................................... 71
Table 4.4.2 – Review level .................................................................................................................... 71
Table 4.4.3 – DS and ES: Failure mode of drive only by type of drive (Sum MaF + MiF) [B21] ........... 72
Table 4.4.4 – DS and ES: Failure mode excluding drive (Sum MaF + MiF) (Table 3-60; Table 3-59 in
[B21]) ..................................................................................................................................................... 72
Table 4.4.5 – Effects and root causes of several Failure Modes .......................................................... 73
Table 4.4.6 – Deciding diagnostic strategy ........................................................................................... 75
Table 4.4.7 – Example assessment and comparison of 3 different disconnectors ............................... 75
Table 4.5.1 – Component, failure mode and indicators ........................................................................ 78
Table 4.5.2 – Visual Inspection [B28] .................................................................................................... 80
Table 4.5.3 – Non-invasive in-service test results ................................................................................. 81
Table 4.5.4 – Offline and investigative testing [B28] ............................................................................. 81
Table 4.5.5 – On-line monitoring ........................................................................................................... 82
Table 4.5.6 – Detectability of diagnostics .............................................................................................. 82
Table 4.5.7 – Example oil results .......................................................................................................... 84
Table 4.5.8 – Example of translation of the C2H6 condition from DGA to a condition indicator index .. 85
Table 4.5.9 – Reduction of dielectric withstand capability ..................................................................... 85
Table 4.5.10 – Example AHI scores ...................................................................................................... 86
Table 4.6.1 – GIS components, sub group of components, subsystems, function of subsystems, and
key parts ................................................................................................................................................ 88
Table 4.6.2 – The condition indicators in subsystems of GIS ............................................................... 91
Table 4.6.3 – Summary of norm for humidity content for 150 kV GIS from a specific manufacturer as
generated from different approaches .................................................................................................... 94
Table 4.6.4 – Example of condition scores and their descriptions ........................................................ 95
Table 4.6.5 – Condition scores of primary conductor subsystem in GIS .............................................. 96
Table 4.6.6 – Example of Condition Score (CC), interpretation, and bay index ................................... 98
Table 4.6.7 – Summary of Condition Scores of Subsystems in CB (G0) from each line of GIS ........... 99
Table 4.6.8 – Summary of Bay Index of GIS example .......................................................................... 99
Table 4.6.9 – Failure susceptibility indicator index of GIS example .................................................... 100
Table 4.6.10 – Summary of Bay Health Index & Failure Susceptibility Indicator index of GIS example
before and (expected after) mitigation action ...................................................................................... 101
Table 4.7.1 – Diagnostic indicators in use and failure modes ............................................................. 107
Table 4.7.2 – Dielectric dissipation factor analysis for capacitor banks .............................................. 109
Table 4.7.3 – Capacitance analysis for capacitor banks ................................................................ 110
Table 4.7.4 – Data and scale codes .................................................................................................... 111
Table 4.7.5 – Scale code assignment ................................................................................................. 112
Table 4.8.1 – Identifying assets and diagnostics................................................................................. 113
Table 4.8.2 – Failure mode analysis ..................................................................................................... 113
Table 4.9.1 – Auxiliary Equipment and Roles ..................................................................................... 116
Table 4.9.2 – Review levels ................................................................................................................. 116
Table 4.10.1 – Components ................................................................................................................ 121
Table 4.10.2 – Failure mode detection indicators ............................................................................... 122
Table 4.10.2 – Classification rules for buildings according to their condition [B5] .............................. 124
Table 5.1 – Condition scale code examples ........................................................................................ 126
Table 5.2 – Example with alphabetical codes ..................................................................................... 127
Table 5.3 – Example with numeric codes ............................................................................................ 128
Table 5.4 – Combining AHI for 3 assets, alphanumeric codes ........................................................... 128
Table 5.5 – Second example with alphanumeric codes ...................................................................... 128

Table 5.6 – Use of colour coding and TB 761 scoring [B3] ................................................................. 129
Table 5.7 – Example of condition indicator scores: Asset 1 ................................................................ 129
Table 5.8 – Example of condition indicator scores: Asset 2 ................................................................ 129
Table 5.9 – Enumeration of Combined Asset Condition Scores for Assets 1 and 2 [B3] ................... 129
Table 5.10 – Example of a simplified bay............................................................................................ 130
Table 5.11 – Example of aggregation of scores .................................................................................. 131
Table 5.12 – Example of a log-3 based scoring system with category promotion .............................. 131
Table 5.13 – Example of a Combined Score Without Category Promotion ........................................ 132
Table 5.14 – Example of correlating scoring categories to ranges of failure probability. .................... 132
Table 5.15 – Example of calculating overall failure probability of the bay .......................................... 133
Table 5.16 – Comparing weighted with Max and field engineer assessment ..................................... 134
Table A.1 – Definition of general terms used in this TB ...................................................................... 139

Table A.2 – Definition of technical terms used in this TB .................................................................... 139
Table C.1 – Estimated probability of failure ......................................................................................... 145
Table D.1 – Asset health legend extracted from reference [B14] ....................................................... 147
Table D.2 – AHI scoring used in TB 761 [B3]...................................................................................... 149
Table F.1 – Weighting factors for diagnostics ..................................................................................... 152
Table F.2 – Example of a transmission line evaluation scoring method ............................................. 153
Table F.3 – AHI and Probability of failure ............................................................................................ 154
Table G.1 – Final ranking .................................................................................................................... 157
Table G.2 – Criteria for the condition assessment of equipment in case of a circuit-breaker ............. 158
Table G.3 – Example of a 66kV transformer assessment ................................................................... 162
Table G.4 – Health index calculation. .................................................................................................. 165
Table G.5 – Examples of measurands ................................................................................................ 166
Table G.6 – Example of condition codes ............................................................................................. 167
Table G.7 – Condition code range interpretations .............................................................................. 167
1. Introduction
Goal of this working group and the technical brochure
Working group B3.48 was created to produce a technical brochure describing a process to classify
substation assets in terms of their changing likelihood of having an in-service failure. This would be
achieved by producing guidelines for companies to build credible Asset Health Indices (AHIs). These
AHIs should be the first step towards more focused outcomes such as plans for maintenance, asset
refurbishment, asset replacement and risk management.
The role of health indices within asset life planning

An AHI is, or should be, part of a life plan for the various assets owned by a utility company. The first
step is deciding which assets need to be assessed and for what purpose. In the short term, the most
immediate priority is to identify and manage the possibility of in-service failures. In the longer term, the
AHI can feed the planning process for critical decisions on replacement or maintenance.
A regulator-driven capital planning tool might, for example, allocate asset replacements into time
bands corresponding to future regulatory review cycles. Assets in each band have been assessed on
condition and share a common remaining lifetime, thereby justifying the reinvestment plan. Similarly,
for maintenance tasks the bands relate to timescales before the onset of malfunction if corrective
work is not undertaken.
Historically the underlying basis for both maintenance and replacement has been a time-based
regime. Over the years other strategies have evolved, modifying the approach to one based upon
condition or risk. Here the AHI can act as a facilitating process by classifying assets according to
the time period within which failure or malfunction is likely, based on evidence of developing
deterioration. The need is to avoid business disruption by taking timely preventive action. The
process assesses the “health” or “resilience” of each asset and then assigns a likelihood of failure
within several forward-looking time intervals. The health assessment does not necessarily involve looking
backwards to past time in service. For assets in this category service age is a poor proxy for
estimating end of life or likelihood of failure.
Creating a failure or malfunction linked AHI methodology usually involves working from the asset
register, which is a simple list identifying the company assets. The register identifies the asset type,
design information, location and function. The AHI modification includes an assessment of condition
and implied likelihood of an in-service failure within a timescale for each asset in the register. In this
context the modified list is then referred to as an Asset Health Index (AHI). By associating a future
time scale alongside the detected deterioration, the outcome can then be used as the best tool to
identify timing for interventions to reduce the likelihood of such an in-service failure. Such interventions
include asset replacement, repairs, refurbishments and maintenance. In some cases, the AHI
development can lead to further sets of “action plan” indices, each prioritising assets in terms of the
need for one type of intervention. This is an approach also adopted in a recent A2 Brochure [B3].
However, the choice of which type of intervention to use is a further step beyond the AHI and the work
of the current B3.48, involving assessing technical feasibility, cost benefit and risk analysis.
Primary equipment is purchased based upon a design made to a specification that relates asset
lifetime to its application. The equipment invariably comes with a warranty reliant upon following a
specified maintenance regime based on time interval or duty cycle. After the warranty period many
asset owners develop their own plans for interventions, such as maintenance, refurbishment and
replacement, in order to achieve the design lifetime. Each of these plans is then applied uniformly
across the various classes of assets and related to specifications for rating and duty. Two earlier B3
groups have surveyed maintenance trends and showed a further development towards greater use of
diagnostics within a condition-based decision framework [B4], [B5]. In the case of maintenance
timing the move has usually not been to a purely condition based timing regime, but to one applying
condition assessment as a facilitating tool to enhance the application of time, reliability centred or risk-
based maintenance strategies. As described in TB 660 [B5], asset replacement decisions are
increasingly being based on an assessment of the likelihood of failure and the consequential risk
exposure.
Applying these decision processes involves using a method called Failure Mode and Effects Analysis
(FMEA) to identify relevant deterioration. Failure modes are here linked to corresponding diagnostic
indicators capable of assessing the condition and likelihood of failure within a time frame. It evolves
towards a set of plans for managing each asset, a process consistent with international asset
management practice [B6], [B7], [B8]. Within this context, company asset managers will be most
concerned with going further than only assessing condition and likelihood of failure. They will be
assessing the risk of failure where this is defined as:
$$\text{Risk of failure of an asset} = \text{likelihood of failure of the asset} \times \text{consequences of its failure}$$
Here the consequences of failure will be assessed in terms of the asset's role in the network, business
impact, safety, environmental damage etc. This creates a much more bespoke assessment between
companies depending upon their risk tolerance. Conversely the "likelihood" factor should be more
amenable to creating a common methodology across companies. For this reason, the terms of
reference for CIGRE Group B3.48 are limited to this aspect. The work will be followed by a later
group B3.61 to complete the task of building the methodology for the broader risk-based decisions.
Diagrammatically this two-stage approach is shown in Figure 1.1.
Figure 1.1 – Risk based decision making
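The risk definition above can be illustrated with a minimal sketch. The asset names, the 0 to 1 likelihood scale and the consequence units below are hypothetical values chosen for illustration; the brochure itself addresses only the likelihood factor and leaves consequence assessment to each company.

```python
from dataclasses import dataclass


@dataclass
class AssetRisk:
    name: str
    likelihood: float   # likelihood of failure in the chosen period, 0..1 (hypothetical scale)
    consequence: float  # consequence of failure, arbitrary cost units (hypothetical scale)

    @property
    def risk(self) -> float:
        # Risk of failure = likelihood of failure x consequences of failure
        return self.likelihood * self.consequence


def rank_by_risk(assets):
    """Return the assets ordered from highest to lowest risk."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


# Hypothetical assets, for illustration only
assets = [
    AssetRisk("T1", likelihood=0.10, consequence=50.0),
    AssetRisk("CB7", likelihood=0.30, consequence=10.0),
    AssetRisk("T2", likelihood=0.02, consequence=400.0),
]

for asset in rank_by_risk(assets):
    print(f"{asset.name}: risk = {asset.risk:.1f}")
```

In this sketch the asset with the lowest likelihood of failure (T2) still ranks first because of its high consequence, which is why the likelihood-based AHI is only the first of the two stages.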

In this approach the first step for each asset is to relate diagnostic indicators to appropriate failure
modes. The second step is to assign the result to one of several asset health index (AHI) categories,
each expressing the condition in terms of the possibility of failure within a time range, from the near to
the distant future. The working group has decided that five categories are appropriate, and these will
be described in later chapters. Once an asset is placed in one of the AHI categories, the assessment
may then be used:
• To identify time scales for remediation activities for each asset, activities such as
maintenance, repair, refurbishment or replacement, where appropriate. This has impact on the
capital and operational financial plans.
• To identify optimum means to monitor the changing rate of deterioration and identify risk
management planning requirements.
Drivers for the development of an AHI process

Societal impact of in-service failures
Figure 1.2 – Outcomes of asset failure

International environmental commitments will lead to significant societal changes consequent to the
reduced use of carbon-based fuels during this century. In turn this will lead to increased reliance upon
electrical power to be delivered with optimal performance and reliability. It will require the correct and
adequate infrastructure in place to deliver these changes. A major risk will be the disruption to society
caused by loss of load events following an in-service failure of major assets. These can lead to a
range of undesirable outcomes, as shown in Figure 1.2. Within this context a "failure" includes not only
a catastrophic failure but also failure of the equipment to perform its role in a network and failure to
comply with specification criteria, including environmental requirements such as continuing to contain
insulating fluids and gases.
Regulatory impact
Much of the industry has changed focus to become performance driven organisations. The most
significant facilitator has been the development of an asset manager model as a single business
function in the utility. Such a function is empowered by the company executives to implement their
asset related strategies. This involves the control of costs to achieve the stated business objectives for
network performance, risk exposures and return on investment. It has led to the need to identify what
assets exist, where they are in the network and what is their role relative to these business objectives.
Risk management is a fundamental role in an asset management company, and it is a legal
requirement in some.
Regulators have been keen to see that utilities have processes in place to manage the competing
demands of cost reduction, network performance and the range of business risks. It was this that led a
range of utility sector stakeholders to create firstly the BSI-PAS 55 document [B6], [B7] and more
recently in February 2014 the first international asset management standard, ISO 55000 [B8]. These
have been used to change organisations which had been founded as service providers into ones that
are asset focused, achieving business returns on invested capital whilst defining and managing risk
exposures. One important feature of this asset management model has been the "line of sight" which
is a direct link between the role for every asset within the system and how it meets the objectives set
by the utility executives. This means that each asset has its own life plan, which includes
ongoing investment to meet its business goals. Thus, from a completely different standpoint,
the requirement has been created to evolve an asset register to include cost-evaluated mitigation
plans to address identified asset-related risks relating to loss of load. This is what creating an asset
health index attempts to achieve when allocating each asset into a band reflecting likelihood of failure
within a time period.
AHI within an asset management process

Utilities need to balance the competing requirements of performance, cost and risk through
establishing comprehensive and fully integrated strategies to manage assets. This needs a clear
process and a culture directed at gaining greatest lifetime effectiveness, value, profitability and return
from the asset. This has led to the development of asset management systems, as outlined in PAS 55 and
ISO 55000, and away from selecting work purely based on OPEX budget levels. The role of the AHI
starts with the asset register listings and builds in each asset's history and condition performance.
Expenditure to maintain, repair or replace follows a systematic review of the available evidence and
risk assessment. The approach illustrated in Figure 1.3 was included in the Substations Green Book
[B9]. The AHI process described in this brochure follows the same philosophy shown in this figure.
Figure 1.3 – Asset investment planning
AHI and the ageing asset base

When the major transmission networks were built in the 1960s or 1970s the design intent was to have an
infrastructure including the power equipment as well as buildings, concrete construction, roads and
steelwork lasting for 40 years. For the generating stations the major facilities, such as coal handling,
boilers and buildings, were thought to have 25-year lifetimes. On this basis HV power equipment was
designed with a rating that optimised the expectation of 25- or 40-year lifetime to match its application.
It was generally believed that lifetime of an asset class would then follow a bath-tub failure pattern,
described mathematically in terms of Weibull statistics where a shape factor greater than unity would
lead to an increasing failure rate at the end of life. Included within such a perception was a predicted
“onset of unreliability” starting a few years less than the 40- or 25-year lifetimes.
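The role of the Weibull shape factor can be sketched numerically. The function below uses the standard two-parameter Weibull hazard rate; the 40-year scale parameter and the three shape values are hypothetical and are not taken from the brochure.

```python
def weibull_hazard(t, shape, scale):
    """Two-parameter Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must all be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


SCALE_YEARS = 40.0  # hypothetical characteristic life, for illustration only

# shape < 1: falling hazard (early-life failures)
# shape = 1: constant hazard (random failures)
# shape > 1: rising hazard (wear-out towards end of life)
for shape in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, shape, SCALE_YEARS) for t in (5, 20, 40)]
    print(f"shape={shape}: " + ", ".join(f"{r:.4f}" for r in rates))
```

A bath-tub curve for a whole asset class corresponds to these three regimes occurring in sequence, which presumes that one dominant failure mode governs the wear-out phase.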
Such a statistically derived "bathtub" pattern for a whole asset class might have been quite reasonable
if a single common failure mode applied. This has been a long-term presumption for both circuit
breakers and transformers. With switching apparatus, it related simply to contact erosion at a rate
dependent upon a summation over all arcs (the product of the duration of each arc and a power of
the current drawn). A comparable situation could also occur in oil-filled non-switching components if, as
commonly assumed, failure always and only followed insulating paper on conductors depolymerising
at a rate determined by time at loading temperatures, oxygen and moisture levels. This would
eventually reach a point where its structural integrity had been lost. An inter-turn failure would follow
the inability of brittle paper to withstand a subsequent short circuit. At its simplest, the 25- or 40-year
design lifetimes could be, and in many cases were, used as the perceived time to failure for whole asset classes.
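The arc summation described for switching apparatus can be written compactly as follows. The symbols are introduced here for illustration only and are not the brochure's notation: $W$ is the accumulated contact wear, $N$ the number of arcs, $t_i$ the duration of arc $i$, $I_i$ the current drawn in arc $i$, and $n$ an exponent that would have to be taken from the manufacturer's data or test results.

```latex
W = \sum_{i=1}^{N} t_i \, I_i^{\,n}
```

Under the single-failure-mode assumption, end of contact life would correspond to $W$ reaching a design limit.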
After decades of service-life, however, it has become clear that infrastructure assets can last for much
longer. How long depends firstly upon the variable extent to which individual suppliers have been able
to design and build to achieve their intent. Failure studies from both CIGRE Study Committees A2 and
A3 indicate that, whilst wear and ageing are important processes that lead to failure, asset failures at
the voltage ranges of interest are predominantly random when viewed from statistical analysis of
whole populations [B1] and [B2]. The reason is that assets in this category usually have a range of
diverse failure modes that relate both to design and random system events. The second source of
variability has been the impact from varying operational environments such as exposure to
environmental damage, system disturbances, switching duty and load levels. Failures that have been
seen tend to be those in early-life units that have design limitations, and so are unsuitable for predictive
analysis applied to the whole asset class. The recent CIGRE survey of transformer failures [B1],
for example, concluded that there was no evidence of a bathtub characteristic for transformer
population as a whole. A separate study of UK transformer and reactor failures concluded that only by
separating out the population into its design groups, each with its own dominant failure mode, could
any sign of a predictive time based failure pattern be identified [B10]. Similarly, German studies of
failures in over 4000 SF6 circuit breakers showed an infant mortality followed by random failure rate
[B11] and [B12]. Only when specific functions, such as hydraulic drives, are separated out
can a predictive wear-out pattern be seen, and then only when the data are related not to service age
but to years since an earlier intervention (repair, rebuild or maintenance). These results are not
surprising for the generation and transmission sectors in particular which have relatively few assets
with the same design, OEM and operational environment. But the converse can also be true – in
distribution or with cable systems there may well be large populations of identical equipment all with a
single dominant failure mode. Predictive failure modelling might then be more relevant [B13].
An important perspective is to be able to introduce key performance indicators subsequent to audits of
the use and role of condition based AHI when assets are removed from service (step 6 of Figure 1.3).
A2 session papers [B10] and [B14] have described the AHI process and how it was used to identify
replacement schedules over a 20 year period. The population included around 800 transformers, all
over 100 MVA at 400 kV and 275 kV and installed since 1952. Forensic examinations were made
during scrapping and the individual results compared with the AHI created when the unit was in
service. Reference [B14] describes how the correlation was generally good. The age of transformers
selected for replacement is shown in Figure 1.4 [B1] and [B14]. The figure also shows actual failures
still occurring in service (lower line). In this work failure was defined as a situation requiring complete
removal and replacement. In many cases these failures were random in time and mainly ones that
followed a system event and so not predicted to fail as a result of assessments of longer-term
deterioration based upon selected indicators. The outcome is a hazard rate for these unexpected but
“actual” in-service failures. The second important point is that where replacement was justified by
condition indicators (upper line) these indicators are more likely to worsen with age. The third point is
that investigation of real failures together with their forensics provide the basis for ongoing continuous
improvement of the methodology.
Figure 1.4 – Failure hazard and replacement hazard for TSO population [B1] and [B14]
For power transformers at least, such conclusions support the move towards condition-based AHI
and away from decision rules based upon simple time/age and duty relationships. The AHI
methodology seeks to manage within a context of recognising the more diverse range of failure modes
and applying asset-specific risk and condition-based tools. It then leads to individual asset life
plans within an asset health review that are linked not to the past time in service but look forward, towards
the time left in service before a particular failure mode becomes terminal. This could be a rolling
estimate that starts when an asset is newly commissioned and on through stages during service. It is
not to say that the number of previous service years (age), duty and exposure are not relevant. But
these are influencing factors and not failure modes themselves. Their relevance is in their possible
influence on timing of the onset and rate of progression of a particular failure mode. The tenet of a
condition-based approach is that the time/duty-based development of a particular failure mode
is identified through diagnostic indicators that relate directly to that failure
mode, its onset and rate of progression.
Dealing with Unexpected Failures

It should be expected that a condition-based AHI will provide a good predictive indication of advancing
deterioration. However, failure studies for power industry assets commonly indicate a significant
contradictory category where failures occurred in assets that had previously been judged to be in a
good condition and fit for service (e.g. the black line of Figure 1.4). This may be due to a variety of
reasons:
• The diagnostic strategy is not optimal or not being applied optimally. The interpretation of the
diagnostic data should always be within the context of a developing failure mode and not
simply upon the measurement values.
• There are situations where significant failure modes are not being assessed adequately
by the range of diagnostics being used. This may be from an error of omission or following
conscious adoption of a restricted assessment strategy. The more comprehensive or “mature”
the assessment, the greater the degree of confidence that can be placed in the derived AHI; see
the case example below and Section 2.3.
• The AHI method must be developed to have a clear link between results of condition
assessment and the failure modes. A ‘worse’ AHI score should always reflect a more urgent
condition. That is not necessarily the case in some methods currently in use. This is an
issue discussed later, in Section 2.5.
• The periodicity selected for gathering data within the diagnostic strategy being used may be
too long. A failure mode may progress from a sound condition to a faulty one too quickly
for a response. The condition is then not as good as expected from the most
recent review of data. For some failure modes the periodicity of any assessment can be
improved with use of on-line diagnostics.
• An unusual weather event, or some unrelated catastrophe or system event, may have
produced stresses beyond the specified design levels.
• The data has been entered incorrectly and led to a false assessment.
• The design variabilities had not been correctly factored into the analysis, see below.
Relating to the last point, since the time of the first installations in many networks, there have been
significant enhancements to the design and calculation tools. This in turn influences any age-related
calculations of failure rates unless design changes are factored into the analysis. These changes
allowed manufacturers to optimise costs and build to their calculated design margins. In some early
cases, however, the results were not as effective as intended, particularly in the transition period
between building based on “custom and practice” to those being “design optimised”. But over the
longer term, true performance improvements through design evolution have been achieved. Any AHI
being created needs to reflect not only the outcomes of specific design groups but also the greater
likelihood of failure of assets designed to the earlier practices. Successful examples of design evolution
include: short circuit withstand capability of power transformers improved with use of continuously
transposed conductors; lightning withstand improved with inter-leaved and inter-shielded windings; tap
changer reliability improved with silver coated contacts; and circuit breaker reliability improved
following changing designs of drive mechanism.
An example: a case study of a restricted assessment
This example has been selected because it demonstrates many aspects of the AHI approach with its
strengths and weaknesses. Several of the following chapters will refer back to this example.
Significant deterioration can be missed depending upon how diagnostics are applied. The transformer
winding shown in Figure 1.5 was from a 38-year-old unit, one of a banked pair of similar design
and age installed in an N-1 configuration. The selected diagnostic indicators appeared to show
no problems and the transformer appeared to be operating quite satisfactorily, even at the time of its
removal from service. Yet the illustration shows that the actual condition was quite different, with
significant hoop buckling of the common winding, as well as consequential damage to the tertiary. The
unit had been in this condition for several years following a system event, and the damage would have
reduced its capability to withstand a further short circuit.
Figure 1.5 – Hoop buckling on common winding (left) and crush damage on the tertiary (right).
The problem here was that the diagnostic strategy selected as optimum relied only upon the
level of each combustible gas found by dissolved gas analysis (DGA) of the oil. Yet DGA is a poor
indicator when the transformer failure mode is mechanical movement. Secondly, DGA was being
used as per common practice, relying upon an adverse laboratory report issued only when one or
more of the gases exceeded the levels stated in IEEE and CIGRE guidelines. In this case
study a close-up short circuit had taken place, but the resulting change in combustible gas levels was
not sufficient to exceed the Condition 1 levels of the IEEE C57.104 guidelines for normal units.
However, at this time the utility changed its practice with specialist engineers assessing consequences
from any damaging events. They then looked for changes in relative concentration of the key gases as
per IEEE C57.104-2008. This identified that there had been a change in percent hydrogen content
after the short circuit. This is indicative of partial discharge (PD) damage. But confirmatory evidence of
significant damage only came following out-of-service diagnostics. Since the DGA indicators had
appeared to show a unit with little increase in failure risk, it took a further 2 years for the unit to be
allocated a circuit outage to allow such investigative testing to take place. Only then did the
assessment change. Winding capacitance and sweep frequency response results both gave very
unambiguous indications of severe winding movement with hoop buckling. Internal inspection from the
top of the tank revealed a broken clamping plate. The tear down confirmed hoop buckling and showed
the hydrogen was coming from PD at crushing damage to a tertiary winding.
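The difference between checking absolute gas levels and looking at changes in relative concentration can be sketched as follows. The gas values are hypothetical and are not the readings from this case study, and no guideline limits are reproduced here. The sketch only shows how a shift in percent hydrogen can be visible even when every absolute level stays modest.

```python
# Combustible gases commonly reported by DGA (names used as dictionary keys)
COMBUSTIBLE_GASES = ("H2", "CH4", "C2H6", "C2H4", "C2H2", "CO")


def relative_concentrations(sample_ppm):
    """Return each combustible gas as a percentage of the total combustible gas."""
    total = sum(sample_ppm.get(gas, 0.0) for gas in COMBUSTIBLE_GASES)
    if total <= 0:
        raise ValueError("sample contains no combustible gas")
    return {gas: 100.0 * sample_ppm.get(gas, 0.0) / total for gas in COMBUSTIBLE_GASES}


def shift_in_percent(before_ppm, after_ppm):
    """Change in relative concentration (percentage points) between two samples."""
    before = relative_concentrations(before_ppm)
    after = relative_concentrations(after_ppm)
    return {gas: after[gas] - before[gas] for gas in COMBUSTIBLE_GASES}


# Hypothetical samples (ppm) taken before and after a system event
before = {"H2": 10, "CH4": 20, "C2H6": 15, "C2H4": 5, "C2H2": 0, "CO": 150}
after = {"H2": 40, "CH4": 22, "C2H6": 15, "C2H4": 6, "C2H2": 0, "CO": 155}

shift = shift_in_percent(before, after)
for gas in COMBUSTIBLE_GASES:
    print(f"{gas}: {shift[gas]:+.1f} percentage points")
```

In this made-up example hydrogen rises from 5% to roughly 17% of the combustible gas. A trend-based review would flag that change, whereas a fixed-limit check on each gas might not.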
Experience developing AHI

A review of published papers on the subject, together with case study experience from working group
members is included as APPENDIX D - APPENDIX G.
2. Processes used in Asset Health Indexing

AHI Processes described in Publications
There is no great library of published experience describing AHI developments. A review of available
resources is included in APPENDIX D - APPENDIX G. This is a review of published papers on the
subject, together with case study experiences from working group members.
Relatively few publications present the outcomes of the authors’ methodology, let alone evidence of any audit
of it. One exception was work in a UK TSO where the authors described both their method and its audit
covering their 25 years of experience [B10] and [B14]. Their experience in the USA was later described in
a 2018 paper [B15] where greater detail was supplied. Some papers from OEMs and service
providers present methodologies but do not provide much detail. Even with A2 brochure TB 761 [B3] it is
not clear how extensively their advocated scoring system is being used, even by the utilities
represented on the working group.
However, many of those developing AHI methods have realised some of the basic issues, including:
• To begin by deciding the purpose. It could be an internal document for prioritising tasks such
as maintenance or replacement. Similarly, it may be to indicate likelihood of in-service failure and
so to plan the means to address this, through replacement or repair. Different purposes will lead
to differently prioritised lists and different AHIs, as described in TB 761 [B3].
• To have a clear understanding about failure modes and asset life. Some have used “current
age relative to a defined asset life” as their starting position, recalibrating it by factoring in
both the presumed effect of the operational environment and the results of diagnostic tests.
Others, however, start with an FMEA approach, seeking to identify the onset of a failure
mode. It is then this that indicates the future failure-free lifetime. The need is then to identify these
modes and their causes, and to apply diagnostic indicators which assess the future time frame to
failure. Individual indicator results are not failure modes, and nor is age.
• To have a method of aggregating the results in a way that does not dilute and so hide a
bad score. This is a major issue when linear scores from individual failure modes are added
together, and is made worse when weighted scores are used; it can lead to incorrect decisions
[B15].
• The technical reasoning for an AHI outcome should be clear and reasonable. The
interpretation of data should be through published standards and guides (such as IEEE, IEC and
CIGRE) which relate asset condition to defined failure modes and identify the presence and
severity of those failure modes. The action should be clear, and the evidence provided should justify
the decisions to be made and not be lost within a convoluted and/or multifactored assessment
process. The AHI output should be time calibrated so that resulting actions can be prioritised.
• An ongoing improvement process needs to be built in. There is little evidence that assets
removed from service are then forensically examined to re-assess the approach.
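The dilution problem in the third point above can be shown with a minimal sketch. The failure-mode names, scores and weights are hypothetical, on a 1 (best) to 5 (worst) scale. Taking the worst score is shown only as one simple aggregation that cannot hide a bad result; it is an illustrative choice, not a method prescribed by the brochure.

```python
def weighted_average(scores, weights):
    """Weighted mean of per-failure-mode scores (the approach that can dilute a bad score)."""
    total_weight = sum(weights[mode] for mode in scores)
    return sum(scores[mode] * weights[mode] for mode in scores) / total_weight


def worst_case(scores):
    """Worst (highest) per-failure-mode score; a single bad score always shows through."""
    return max(scores.values())


# Hypothetical scores per failure mode: 1 = very good ... 5 = critical
scores = {"dielectric": 1, "thermal": 1, "mechanical": 5, "bushings": 2, "tap_changer": 1}
# Hypothetical weights, for illustration only
weights = {"dielectric": 3, "thermal": 3, "mechanical": 1, "bushings": 2, "tap_changer": 2}

print(f"weighted average: {weighted_average(scores, weights):.2f}")
print(f"worst case:       {worst_case(scores)}")
```

Here the weighted average is about 1.5, suggesting an asset in good health, while the mechanical failure mode on its own is critical. This mirrors the case study in Chapter 1, where severe winding movement was present in a unit whose other indicators looked satisfactory.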
Asset Health Index terminology

Assets and health indices
An asset is an item, thing or entity which has value or potential value to the owner (from ISO 55000
[B8]). Health is the state of an asset which represents its ability to perform the function for which it is
required and for the timescale defined by the user. In this context the health is a state which varies
monotonically throughout the asset lifetime and is reflected by condition as determined from indicators
corresponding to related failure modes. The Asset Health Index (AHI) is therefore a snapshot
indication of health in terms of likelihood to fail.
An AHI system usually contains a set of “failure-free” time categories into which assets are allocated
based upon an estimation of their likelihood of failure in each time period. As an example, a health
index system with a 1 to 5 AHI classification is shown below in Table 2.1. Some users may want
more or fewer categories or introduce subsets; but it is this example that will be typical of those used
throughout this brochure. The more variable feature is the time frames selected for “remaining failure-
free years” and this depends upon the purpose of the AHI. With most assets their failure is prevented,
or arguably delayed, by timely maintenance interventions. An AHI process could be used to identify
the maximum periods before maintenance is carried out to avoid an in-service failure. It assumes that
maintenance is being undertaken as indicated by time intervals derived from type approval tests, or as
part of a condition-based assessment and creation of an asset maintenance index. This example in
Table 2.1 was constructed to reflect the timescales appropriate for an intervention which would itself
be used to identify the type of action – maintain, refurbish or replace. In this way it is generic, reflecting
the needs of most asset types and their components. Decisions relating to the timing of end of life will
reflect a time when future maintenance and refurbishment interventions will no longer delay onset of
an end of life failure. Identifying a time to this end of life will depend upon the asset type, the quality of
its design and manufacture, duty cycle and operational environment. To convert the table into an asset
replacement schedule would require a modification to the time scales with many more subcategories
in each of the five categories. However, although this final column will be a variable depending upon
the purpose of the AHI, it is important that time frames are stated in an AHI.
All AHI should reflect a condition, an action and a time scale for the action.
Table 2.1 – Example of AHI definitions for expressing remaining failure free years
| AHI | CONDITION | DEFINITION | ACTIONS | TIME SCALES |
|---|---|---|---|---|
| 1 | Very good | Very low likelihood of failure over many years. This would be in the original factory condition or after extensive refurbishment. | Continue with inspect and test schedule. | More than 10 years likely before additional maintenance and refurbishment is undertaken. Timing of interventions is asset specific and indicated by the inspection and test results. |
| 2 | Good | Low likelihood of failure over a long period. General deterioration is consistent with its time in service. | Continue with inspect and test schedule. | 5-10 years likely before additional maintenance and refurbishment is undertaken. Timing of interventions is asset specific and indicated by the inspection and test results. Subcategory bands can be introduced based upon failure mode and rate of change in diagnostics. The impact of any life-limiting irreversible deterioration is expected to be beyond this time frame. If not, introduce extra column with 5-year replacement bands. |
| 3 | Fair | Low risk defect or life-limiting deterioration has been detected. Performance may be adversely affected long term unless remedial action is carried out. | Investigate the issue and plan any intervention. Continue with a revised inspect and test schedule. Revise life expectancy planning into likely 5-year bands. | 2-5 years before interventions. Timing and scope are indicated by investigations, together with changes in inspect and test results. Subcategories introduced in yearly bands based upon failure mode and rate of change in diagnostics. |
| 4 | Poor | Progressive deterioration has been detected, with high likelihood of failure in the short term. The unit can remain in service, but short-term reliability is likely to be reduced. Subcategories are useful to define urgency of repair or replacement timeframes. | Remedial action to be carried out and/or increased condition monitoring implemented. De-rating and risk management zones may be needed. | 3-24 months before interventions. Planning the action and its timing is determined by failure mode analysis and operational practicalities. This is managed using increased surveillance. |
| 5 | Critical | High likelihood of immediate failure exists and the unit should not remain in service. | Any exception would require intensive risk management actions. If returned to service decision points and time frames need to be defined. | 0-3 months determined by risk assessment. |
In category 1 the term “As new condition” is avoided since some newer assets can have a higher
failure rate and are consequently not in a “Very good” condition.
It would be normal for the user to define future time-related terms to suit their application. These times
may then be inserted into the table for use within the company. For example, the user might include in
the timescale column an estimate of the remaining time before irreversible deterioration and asset
replacement is due. This may need to align replacement timing with their regulatory review periods, as
was the case with the utility reference earlier [B10] and [B14]. Equally the timeframes for
reassessment may differ according to experience with the design and operational environment.
The actual health of an asset is at best an estimate based on selected indicators used at a certain
time. The confidence in the assigned index will improve if more comprehensive diagnostic strategies
are used. The range of possible diagnostic strategies, from basic to advanced, are here referred to as
“Review Levels”. See section 2.3 for further explanation.
AHI Applications
An AHI can be used as an indicator of likelihood to fail, as a replacement index and as a maintenance
prioritisation indicator.
A health index is a result of a condition assessment that leads to a value, whether it be a letter, a
code, a number or some other indicator, that has to be consistent in terms of timescales to make
sense. For example, if the AHI is an expression of generic likelihood of failure, as described above, it
may be turned into an AHI for replacement and maintenance actions:
• A health index for asset replacement may give results as numeric codes, say 1-5. The timescale
for action for code 3’s may be “maintain normally (as per manufacturer instructions) but asset still
has a likely need to be replaced in 5-15 years”. All code 3 assets, of any type, should be in the
plan for replacement in 5-15 years. If the index is given as a percentage then we would also
expect monotonicity – if 100% indicates a “very good condition” asset, then an asset at 60%
should always be more urgent than those at 70% in terms of action timescales.
• A maintenance index based on monitoring data, say for OLTCs or bushings, may be based upon
interventions with far shorter timescales than a replacement index for a transformer: hours to
days to weeks rather than decades. This can be confusing if a maintenance scale is 1-5 where 5
means intervene ‘immediately’, while 4 is ‘within 24 hours’, and 3 is ‘within a week’ etc. In this
case we can identify these as 5.1, 5.2, and 5.3 so the main code is consistent with the
replacement index and likelihood to fail index. The sub-code reflects the urgency within that time
period. But, remembering A2.49 advice, if a monitor indicates an urgent ‘do it now’ condition, do
not wait for a new review of the index to confirm or deny.
• Subcomponent elements of a health index, say an OLTC or a winding, may have a code of their
own; this code should also be consistent with the overall replacement/maintenance index. It
would be confusing to have a breaker mechanism needing intervention in 3-5 years, but the
asset replacement index indicating a need to replace in 2 years.
• We must be aware of time passing and when analyses are performed. If a replacement index
says ‘replace in 3-5 years’, then, in 2 years’ time, that should become ‘replace in 1-3 years’ so
we would expect the asset to be in the plan and ahead of those which had entered the ‘replace in
3-5 years’ code during those two years. That said – review and checking of condition is required
to make sure that the code still applies to the asset.
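The time-passing and sub-code points above can be illustrated with a minimal Python sketch. The function names and the sub-code wording table are hypothetical, chosen only to mirror the examples in the text (‘replace in 3-5 years’ becoming ‘replace in 1-3 years’ after two years, and 5.1, 5.2, 5.3 for urgency within code 5); they are not part of any published method.

```python
from typing import Tuple

# Hypothetical urgency wording for sub-codes of main code 5, following the text's example.
SUB_CODE_MEANING = {
    "5.1": "immediately",
    "5.2": "within 24 hours",
    "5.3": "within a week",
}


def age_window(window_years: Tuple[int, int], elapsed_years: int) -> Tuple[int, int]:
    """Shift a 'replace in lo-hi years' window by the time elapsed since the analysis."""
    lo, hi = window_years
    # A window cannot go negative: once the time has run out, the bound stays at zero.
    return (max(lo - elapsed_years, 0), max(hi - elapsed_years, 0))


def sub_code(main_code: int, urgency_rank: int) -> str:
    """Combine a main AHI code with an urgency rank, e.g. 5 and 2 -> '5.2'."""
    return f"{main_code}.{urgency_rank}"


print(age_window((3, 5), 2))                 # (1, 3)
print(SUB_CODE_MEANING[sub_code(5, 2)])      # within 24 hours
```

Ageing the window is only bookkeeping; as the text notes, the condition still has to be reviewed to confirm that the code applies.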
Failures, reliability, probability and likelihood of failure

Failure is defined as the loss of the primary function of the asset according to IEC 60050-191. With
some assets such as a circuit breaker this is universally applicable. With other asset classes this is
less helpful; for example in TB 642 a major transformer failure is one defined by being out of service
for major repairs taking more than seven days [B1]. Others may have a regulator who imposes a
definition of a failure as an event producing complete loss of the asset (and thereby a capital and not
revenue expense). Also important is the failure to meet safety and environmental regulations. Some
use a definition with the need to replace the asset within a defined regulatory interval. Users will
decide their definition appropriate to their business needs.
A Failure Mode is one of the malfunction possibilities where a failure can be the end result. Failure
Mode and Effects Analysis (FMEA) is a process that identifies and separates out:
• Each of the possible ways an asset can fail,
• The effect for each in terms of how the specific failure produces features that can be identified
and associated directly with the failure mode,
• The degree of criticality of the mode to the unit’s loss of functionality.
The criticality aspect usually involves an analysis of the consequences of this mode to its loss of
functionality and its likelihood of occurring within the time periods of interest.
Reliability is the likelihood that an asset will perform its specified function under specified conditions
for a specific period of time. This definition is in line with that of the IEC 60050-191.
Probability theory is a branch of mathematics where reliability and age-related failures of a
population of assets are expressed using terms and equations - such as hazard function, probability
density function, and survival function. It is, however, only a model and only as good a model as its
inputs and assumptions allow. The relevance of this approach to a whole asset class is debatable
since asset failures that are being experienced relate to a range of differing failure modes, the installed
population shows various internal design limitations, and installations are subject to random external
events to a greater or lesser extent. It would have greater application when asset classes are
separated into groups where some had a single failure mode and a single age-related distribution. But
even then, random system events would limit it. TB 761 [B3] devotes the whole of its Chapter 7 to a
useful discussion of the problems and of the dependency of the outcome on the significant, and often
unsubstantiated, assumptions that have to be made when assigning probability of failure rates.
Likelihood of failure is a less specific term than the above where applied to a single asset. The
likelihood is assessed in terms of its predicted failure modes, effects, condition indicators and
expected rate of deterioration. It is, therefore, “only a model” relying on how comprehensive the inputs
are and how effective is the expertise available to translate measured values to a time scale for the
rate at which the asset will deteriorate to the point of failure.
Failure free period is a useful estimate for managing possibilities of an in-service failure. It is an
outcome that could be based upon probability theory or upon an estimate of likelihood of failure.
Diagnostic Indicators for failure modes

Inspection is a term usually used to describe a non-invasive visual examination of an energised
asset. Here it follows the broader use described in IEC specification TS 63060: 2019 (Electric energy
supply networks – General aspects and methods for the maintenance of installations and equipment).
This includes any activity that obtains condition data about the asset. This can be an external
non-invasive visual inspection, a common routine activity throughout the industry. It is also used to
include other activities such as function checks, intrusive inspections, non-intrusive surveys with
diagnostic instruments, audits, gaining data from online monitoring or off-line measurements, etc.
Importantly, it does not include any consequent intervention to remedy a malfunction.
Condition Indicators are the results of an inspection as defined above. Examples are a set of dissolved
gases each expressed in ppm, electrical diagnostic tests expressed as a power factor, an SFRA
trace etc., or observations of defects such as broken porcelains, low oil or gas levels, an observed
leak, etc. Each is the effect of a developing failure mode and each needs to be interpreted in terms
of the scale codes linked to the likelihood and time scale for the mode developing.
Condition Scale Codes are AHI values assigned to these indicators. They are based upon the
interpretation placed on these condition indicators (observations, diagnostic results and
measurements etc.) in terms of each of the developing failure modes, their severity and rate of
progression. It involves expertise to enable a set of condition data to be translated into a scale code
number. Each is indicative of likelihood of failure within a timeframe from one failure mode. In this TB
the indicators are transferred into one of five scale codes and the application is described in section
2.4. The scale code sequence is linked to the definitions for the five AHI categories of Table 2.1.
Typically, each asset has several failure modes, each now with its set of scale codes. To assign a
single AHI code representing all failure modes of the asset requires care. At its simplest the AHI code
for the asset could be the worst value of codes enumerated 1-5. Alternatively, an aggregation of scale
codes may be attempted and how this is performed is critical; see section 2.5. Integral to the
aggregation is whether the scale code numbers are expressed either as part of a linear or exponential
series.
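The simplest option mentioned above, taking the worst scale code across all failure modes, can be sketched as follows. This is a minimal illustration only; the failure mode names and code values are hypothetical.

```python
from typing import Dict


def worst_case_ahi(scale_codes: Dict[str, int]) -> int:
    """Return the asset AHI code as the worst (highest) scale code of any failure mode.

    Codes are assumed to be enumerated 1-5, with 5 the most adverse.
    """
    if not scale_codes:
        raise ValueError("at least one failure mode scale code is required")
    return max(scale_codes.values())


# Hypothetical scale codes per failure mode for one transformer.
codes = {"dielectric": 2, "thermal": 2, "mechanical": 4, "oil": 1}
print(worst_case_ahi(codes))  # 4
```

With this rule a single adverse failure mode always determines the asset code, so it cannot be averaged away by the others.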
It is also worth pointing out that some condition indicators are effects produced from more than one
failure mode. In this case actual values may lead to different AHI categories. For example, dissolved
gases are indicators of both thermal and discharge/arcing faults in oil impregnated paper systems. But the
amounts and proportions of individual gases differ in the two cases. This means that it is the levels in
a gas signature, and not individual gas levels, that relate to a particular failure mode.
Failure mode susceptibility indicators

It is useful to recognise the existence of factors that could lead to some assets having a greater
likelihood to develop the onset of particular failure modes in a shorter future time scale than others
that are otherwise similar. Factors may include asset age, time in service, limitations in initial design or
manufacturing, operation in an adverse environmental location, high duty factors, regular loading above
nameplate etc. These factors are not failure modes themselves. Creating an AHI must always relate to
failure modes and the diagnostic effects. But a useful starting point when creating an AHI may then be
to look back to the historical experiences and build a preliminary step to identify such factors and
score them in terms of their likely significance to future performance of particular assets. The outcome
can be used to identify the most appropriate diagnostics and assist interpretation of timescales.
References [B14] and [B15], for example, follow this approach with a separate “liability score” based
on design performance and related known life-limiting weaknesses. Another user of this approach is a
WG member who described the method in reference [B16]; it is also described in Appendix G.12 as an
expression of performance under challenging tropical environments. Such analysis should help
construct the failure mode and diagnostic plan for the asset – but not necessarily. Their role is,
therefore, as an “advisory” notification, not an AHI outcome or expression of risk.
Intervention
These are activities to remedy a malfunction by maintenance, repair, refurbishment or replacement.
Failure Mode and Effects Analysis

For many the starting point for developing an AHI system is use of FMEA methodology [B11], [B12],
[B14], [B15]. From a top down approach, the steps are:
1. Define the functions. What functions and performance standards can be defined for the
system component? For example, in the case of a circuit breaker: switching the operating
currents off/on, interrupting short-circuit currents, and securing the on and off positions.
2. Identify what are functional failures. For the asset population under review these are factors
that prevent the asset performing one or more of the critical functions that it is meant to
perform. Usually this means the asset is not functioning within its design specification.
3. Determine Failure Causes. These are the causes leading to the loss of each critical function.
The causes of failure are the targets of the maintenance program. Causes can be risk
prioritised based on likelihood and consequences. It should reflect observability and
uncertainty. These are usually design specific and may be influenced by duty or operational
environment. Clearly with some circuit breakers the mechanism type influences reliability; but
surveys from CIGRE Study Committees A2 and A3 both show reliability and dominating failure
modes are strongly influenced by OEM design and manufacturing quality.
4. Identify and apply diagnostic indicators. These should be capable of detecting the causes
of each failure mode in terms of onset and rate of progression. The timescales should be
consistent with application, otherwise the indicator is no better than a protection trip. But the
context is also broadened beyond diagnostic sensors to include histories of load and duty cycle
obtained from substation Historian servers.
A key element of the methodology used, as with Reliability Centred Maintenance (RCM), is to
ensure all important functions are reliable and available. If one of these functions is not able to be
performed it is a failure of the whole unit; this is the case of a chain and its weakest link. If a critical
function is in a deteriorated state, targeted maintenance is required. For example, if a transformer
cannot maintain a stable output voltage within specified limits, it is a functional failure of the
transformer. It follows, therefore, that it is critical that any aggregation of indicators of failure modes
and their scale codes must not average out and thereby obscure an adverse assessment occurring on
one individual critical function and its indicators. In this respect FMEA differs from the FMECA method in
that it does not address the criticality of the failure mode to the system performance. This latter aspect
will be addressed in the work of WG B3.61.
Timescales are important when creating and using health indices. The response time to changes in
condition can vary from milliseconds up to the foreseeable future (up to 15 years). Aspects that need
to be considered include:
• Asset condition
• Failure modes
• Data and its relevance
• Condition assessments
• Intervention planning
Extent of an AHI review and restricted assessments

A starting point is to identify the cost/ value of undertaking FMEA in order to decide how
comprehensive the analysis needs to be for each asset type.
The staged approach shown in Figure 2.1 has 5 alternatives, each advancing not only the range of
diagnostics but also the resulting amount of data to be analysed. In turn it increases the confidence of
the AHI assessment. Lower-numbered reviews, 1 to 3, are “restricted strategies” that may be cost
effective for some asset types or at lower voltage levels, where a higher chance of being in error is
accepted. AHI based on more extensive reviews, such as 4 and 5, should cover more failure modes
and provide more reliable indicators of the likelihood of failure. Experience has also shown that when
creating an AHI for the first time it is more realistic to implement slowly, starting with a Level 1, 2 or 3
review and build up towards the final goal selected for the asset class. Within a single company there
could be a range of “final” degrees of investigation and analysis in use, depending on the range of
assets and their network criticality.
Figure 2.1 – Achieving AHI with 5 identified strategies, each with staged activities
The confidence level depends not only on the range of activity, but also on the extent to which
each activity is adequately comprehensive. The age of the data is also important: how often and how
recently the inspections and out-of-service testing have been carried out. This can be overcome with permanent on-line
monitoring, but presently only a restricted number of failure modes can be monitored in this way.
Level 1: Basic Strategy – based on office study

The asset register should contain nameplate data, including manufacturer, design, ratings and date of
manufacture. Historical records such as performance data for the design group and OEM, past work
and costs should be accessed. This is easier where a Computerised Maintenance Management
System (CMMS) and Activity-Based Costing (ABC) database decision support tools are in place. The
historic database should also include any earlier diagnostic tests that have been undertaken. Some
historic performance data may also be available from external sources – published papers, trade
association and even web-based chat forums. Each can provide the input for creating a FMEA by
identifying the significant failure modes. If no more is undertaken its limitation will be that the
diagnostic condition related data will be restricted and aged. It is likely to relate to fewer failure modes.
For some low voltage assets the value and the impact of failure are such that it may not be worth
investing much more effort than this Level 1 strategy alone. There may not be any diagnostic data.
However, it is commonly the case that at these lowest voltages there are sufficient numbers in each
design family to use a more statistical approach. This could group assets by design, separate each
group in terms of its failure modes, and apply Weibull statistics to identify a lifetime in terms of onset
of failure for each of these categories. Service-life may then be related to this lifetime estimate. This
is, however, quite a different assessment from a condition-based health index.
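As an illustration of the statistical approach, the sketch below uses the standard two-parameter Weibull distribution to express survival, hazard and a percentile life for one design group and failure mode. The scale parameter `eta` (characteristic life, years) and shape parameter `beta` are hypothetical values; in practice they would have to be fitted to the failure records of the group, and the choice of a 10% percentile as the "onset of failure" is an assumption made here for the example.

```python
import math


def weibull_survival(t: float, eta: float, beta: float) -> float:
    """Fraction of the group expected to survive beyond age t."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, eta: float, beta: float) -> float:
    """Instantaneous failure rate at age t; rises with age when beta > 1."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def weibull_percentile_life(p: float, eta: float, beta: float) -> float:
    """Age by which a fraction p of the group is expected to have failed."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)


# Hypothetical fitted parameters for one design group and one failure mode.
eta, beta = 40.0, 3.0
onset = weibull_percentile_life(0.10, eta, beta)
print(f"10% of units expected to have failed by {onset:.1f} years")
print(f"survival at that age: {weibull_survival(onset, eta, beta):.2f}")
```

Comparing an asset's service-life with such a percentile life gives an age-based estimate only; as noted above, it is not a condition-based health index.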
Level 2: Simple Strategy – added visual inspections

The next higher level includes a programme of site visits. Routine site patrols are normal in most
utilities at this class of substation. The concern is to ensure the data are recorded adequately and
entered into the asset database correctly.
• Confirming that the asset register data and the equipment actually in each bay are recorded
correctly. It should identify where any contain hazardous or environmentally recordable
materials.
• Undertaking an external visual assessment to identify damage and malfunctions.
• Obtaining site-specific records from counters and gauges.
• Creating an external impact assessment in terms of collateral damage, safety and
environmental damage.
Level 3: Intermediate Strategy – added non-invasive diagnostics

This is the base level appropriate to power station, transmission and sub-transmission assets.
Outages can be difficult to obtain, which makes it hard to acquire sufficient data as frequently as
required for a realistically current assessment. It is increasingly common to use non-invasive survey tools
more generally, and for high impact/cost items to have an installed on-line monitor. Here as much as
possible is undertaken non-invasively and importantly without any service interruption. It is an
assessment made when the asset is at its operating stress and temperature. Apart from any
consequent saving from the outage avoided, there are also savings by avoiding disconnections and it
allows all assets on the site to be assessed on a single occasion. This allows more frequent
assessment, made on more recent data.
TB 660 [B5] describes the cost benefit achieved through introduction of basic non-invasive diagnostics
into a site survey. Further detail for circuit breakers is included in A3 publication TB 737 [B17]. Many
will be taking oil and gas samples from transformers, for example. In addition to visual inspection and
system checks, infra-red scans can indicate overheating locations and UHF PD survey scans can
detect areas of both internal and external partial discharge. Oil/gas analyses have been widely used to
detect oil deterioration as well as overheating and partial discharge in transformers.
Level 4: Advanced Strategy – added offline measurements and investigations

Not all failure modes are amenable to assessment by online survey diagnostics and a more
comprehensive condition assessment is needed using out of service diagnostics on a routine basis. In
the industry there exists significant experience, with comprehensive databases linking normal and
abnormal results, remedial action triggers etc. This stage may also involve an internal inspection of the
asset or major accessory to investigate a perceived problem prior to any intervention. An example of
this was described in Section 1.5 of the preceding chapter.
Level 5: Advanced Strategy – added continuous online monitoring

The addition of offline diagnostics and selected online systems should produce the most
comprehensive diagnostic strategy, matching all failure modes with a diagnostic indicator, and in some
cases catering for situations where rapid changes in condition can occur.
For completeness it would also include outage testing identified in Level 4.
There has been a long history with online dissolved gas monitors and bushing power factor systems.
Partial discharge monitoring is becoming more widespread after many years use in GIS. Some see
further developments to cover more failure modes on equipment throughout a substation. Data can be
fed back and combined with operational data extracted from Historian servers to allow a dynamic
indication of AHI. Over recent years the reliability and longevity of monitoring systems have improved.
Site data management and hardware have improved, with fibre networks and IEC 61850 protocols
enabling greater access between vendor systems. Access to operational data is improving with an
asset management data file being incorporated into the Common Information Model (CIM) by
IEC TC 57.
Translating into the scale code

For each condition being assessed there is an indicator to be obtained. It might be, for example, a
measurement such as temperature in degrees. Equally it might be a subjective assessment – such as
the extent of an oil leak or of tank rusting. Others could be operational performance histories during
the service-life. Whatever the indicator shows, there needs to be a means of converting various types
of condition values or indications into "condition scale codes" which relate to failure modes and rates
of progression. These when processed will contribute to the AHI category of Table 2.1. These need to
be translated in a systematic way to the failure modes and to the likelihood of failure within a time
scale.
Here we propose a 5-set numerical base for the condition scale codes matching the 5 AHIs as shown
in Table 2.1. The numbers could continue with the same linear series, 1 – 5, as per Table 2.2. Some
users have sought to aggregate scale code values for each asset and have used an alternative
exponential series (see Table 2.2 and section 2.5).
Here the interpretation score relates to Table 2.1 with, for example, “Critical condition” meaning
degradation as identified by this indicator is such that there is a high probability of immediate asset
failure from its related failure mode.
Table 2.2 – Log and Linear condition scale codes
Possible options: Keep it simple (alphanumeric, letter) | Keep it simple (alphanumeric, number) | Use Log base 3 | Use Log base 10 | Description (each linked to descriptions in Table 2.1)
A | 1 | 1 | 1 | Very good condition
B | 2 | 3 | 10 | Good condition
C | 3 | 10 | 100 | Fair condition
D | 4 | 30 | 1,000 | Poor condition
E | 5 | 100 | 10,000 | Critical condition
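The options of Table 2.2 can be held as simple lookups keyed by the linear code. The sketch below is illustrative only; the dictionary and function names are hypothetical, while the values are those of the table.

```python
# Linear code (1-5) for each alphanumeric letter of Table 2.2.
LETTER_TO_LINEAR = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# Approximately logarithmic "base 3" series from Table 2.2.
LINEAR_TO_LOG3 = {1: 1, 2: 3, 3: 10, 4: 30, 5: 100}

# "Base 10" series from Table 2.2: 1, 10, 100, 1,000, 10,000.
LINEAR_TO_LOG10 = {code: 10 ** (code - 1) for code in range(1, 6)}

DESCRIPTION = {
    1: "Very good condition",
    2: "Good condition",
    3: "Fair condition",
    4: "Poor condition",
    5: "Critical condition",
}


def describe(letter: str) -> str:
    """Show one row of Table 2.2 for a given letter code."""
    code = LETTER_TO_LINEAR[letter]
    return (f"{letter}: linear {code}, log3 {LINEAR_TO_LOG3[code]}, "
            f"log10 {LINEAR_TO_LOG10[code]}, {DESCRIPTION[code]}")


print(describe("D"))  # D: linear 4, log3 30, log10 1000, Poor condition
```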
The starting point is the description of each of the failure modes and how the condition relating to
extent of deterioration for the mode is reflected from the measured values. This relationship is specific
to the failure mode. The user could utilise the relationships obtained from custom and practice within a
company or rely upon a set of standards. The level of expertise available to undertake this task is
important. For example, relating a dissolved gas value to a dielectric failure mode would involve measuring
specific combustible gases and relating the assessment to the value (ppm of each gas), the rate of
change, and an indication of the type of problem using tools such as the IEEE key gas method or
Duval’s triangles. The failure mode assessment could be improved with additional indicators from
directly measuring the partial discharge activity, by using a UHF probe inserted into the tank for
example. From such an assessment the numerical values may be translated into one of the 5 scale
codes. It is for SC A2 and A3 to identify the link between measured diagnostic values and condition.
SC A2 does this in the appendix for TB 761 [B3].
More difficult is where the assessment is subjective – how likely is a leak to lead to a failure, for
example. Here failure may relate to a functional failure following a low oil level alarm, but equally relate
to when pollution and its environmental impact become unacceptable. It is then for the user to define
the relationship as per Table 2.3. The answer requires past experience within or outside the company.
The key is always to relate it to the likelihood of failure, as given in the text of column 2 of Table 2.1.
For both linear and exponential scoring situations the specialist engineer has a key role ensuring the
sanity of the outcomes. At the very least the outcomes must be sufficiently transparent to be audited by an
expert, particularly when significant investment is being indicated from an automated or semi-
automated system for translating observational data into scale codes.
Table 2.3 – Converting condition indicators (observations or measured values) to condition scale codes
OBSERVATIONS – Examples
Condition Indicator 1 | T ≤ -10 °C | -10 < T ≤ -8 °C | -8 < T ≤ -6 °C | -6 < T ≤ -5 °C | T > -5 °C
(But - How to decide the link between temperature limits and failure time periods?)
Condition Indicator 2 | no leakage | very few leakages | few leakages | low medium leakages | high leakages
(But - How bad is a bad leak? How to define “few” and “low/medium/high”? How to get consistency?)
ASSIGNING CONDITION SCALE CODES
Linear numeric scale code | 1 | 2 | 3 | 4 | 5
Alternatively, Log base 3 scale code | 1 | 3 | 10 | 30 | 100
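For a measured indicator, the conversion in Table 2.3 amounts to a threshold lookup. The sketch below uses the temperature limits of Condition Indicator 1 as given in the table; the function and variable names are hypothetical, and the link between these limits and failure time periods remains the open question the table raises.

```python
from bisect import bisect_left

# Upper temperature limits (deg C) of scale codes 1 to 4, from Table 2.3.
# Any temperature above the last limit falls into code 5.
UPPER_LIMITS_C = [-10.0, -8.0, -6.0, -5.0]

# Alternative log base 3 series, indexed by linear code minus one.
LOG3_SERIES = [1, 3, 10, 30, 100]


def temperature_to_linear_code(temp_c: float) -> int:
    """Map a temperature to a linear scale code 1-5 (upper limits are inclusive)."""
    return bisect_left(UPPER_LIMITS_C, temp_c) + 1


def temperature_to_log3_code(temp_c: float) -> int:
    """Same banding, expressed in the log base 3 series."""
    return LOG3_SERIES[temperature_to_linear_code(temp_c) - 1]


for t in (-12.0, -8.0, -5.5, -4.0):
    print(t, temperature_to_linear_code(t), temperature_to_log3_code(t))
```

Keeping the limits in one visible list supports the requirement that the translation from observation to scale code can be audited by an expert.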
An important factor when deciding the indication system is how the codes from different indicators are
to be aggregated as a single number or as a summation. This is described later in Section 2.6.
It is important that the scale code selected in a scheme such as Table 2.3 is based upon the FMEA, to
indicate how the measured values can be associated with specific failure modes and their time scales.
Noteworthy here is that in this simple example (relating to the case study in Chapter 1) most of the
measured test values indicate normal deterioration when they are related to their failure modes. Each
would then be assigned to a condition code 2 (linear) or 3 in a log scale. However, when the
interpretation of gas results changed to use the IEEE Key gas method the higher hydrogen levels after
the fault indicated a dielectric fault had initiated and the assessment changed with that mode changing
its scale code to 3 (linear) or 10 (log). It was not until out of service testing was done that a higher
risk from a second failure mode (a mechanical failure in the event of a future close up short
circuit) was indicated, and this mode required an increase in condition scale code to 4 (linear) or 30 (log).
Figure 2.2 – Linking test data to failure modes and to a linear condition scale code
Working with scale codes

Missing or aged data
In an ideal world all data would be available in good quality to use the selected strategy and to be
matched to failure modes in order to create a current AHI. More realistically some data will be either
missing or aged. Any listing of AHI for an asset class would be mixed with some input data that is
timely and some incomplete or aged.
One solution could be to score the affected data with the worst possible score indicating an immediate
action is needed. However, this is impractical as it could take some years to arrange an outage “just”
to obtain new out of service test data.
One way forward would then be to make the assessment with due regard to the failure modes relevant
to that category and mark the AHI assessment as being “restricted”. The case study in Chapter 1 is a
good example. It did indeed take 2 years to get the outage to make DDF, capacitance and SFRA
measurements. But the knowledge of an increased hydrogen level in DGA following a close-up short
circuit was sufficient to mark this transformer as at risk if a further short circuit occurred. Here this was
sufficient until it could be proven.
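One possible way of carrying the “restricted” marking alongside the AHI is sketched below. This is an assumption-laden illustration: the `Indicator` record, the `assess` function and the three-year age limit are all hypothetical, and a real scheme would set age limits per diagnostic and failure mode.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Indicator:
    failure_mode: str
    scale_code: Optional[int]      # None when no data exists for this failure mode
    measured_on: Optional[date]    # None when the measurement date is unknown


def assess(indicators: List[Indicator], today: date,
           max_age_days: int = 3 * 365) -> Tuple[Optional[int], bool]:
    """Return (worst available scale code, restricted flag).

    The assessment is flagged as restricted if any indicator is missing or
    older than max_age_days, rather than forcing a worst-case score.
    """
    restricted = any(
        ind.scale_code is None
        or ind.measured_on is None
        or (today - ind.measured_on).days > max_age_days
        for ind in indicators
    )
    available = [ind.scale_code for ind in indicators if ind.scale_code is not None]
    return (max(available, default=None), restricted)


# Hypothetical case: recent DGA data, but no out of service test data yet.
result = assess(
    [Indicator("dielectric", 3, date(2021, 6, 1)),
     Indicator("mechanical", None, None)],
    today=date(2021, 12, 1),
)
print(result)  # (3, True)
```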
Another way of illustrating a result containing restricted data is to grade the intensity of the green,
yellow, orange, red and black colours.
Linear and Logarithmic summing options

An example from reference [B15] is shown in Table 2.4. Here a method that simply sums linear
scores 1, 2, 3, 4 and 5 gives similar totals for each of the three assets. But the individual
results indicate this is misleading. Trf 3 has a critical end of life score of 5 in one of the criteria assessed.
Simply summing a set of linear scores clearly hides this 5 score and gives an incorrect assessment. In
contrast if a simple rule is made that only accepts the highest score for each unit, then Trf 3 and its 5
score would be clearly and correctly identified. An algorithm or black box-based AHI system may not
recognise this.
An alternative method is to use the same data but convert to an exponential series, approximately
logarithmic. For example, with scores 1, 3, 10, 30 and 100 it requires three or more criteria having the
same score, added together, to incorrectly “promote” the total into the next higher category. With a log
base 10 it would need 10 with the same score to achieve this. The use of logarithmic scores can
achieve granularity, and end with a single number to be used in prioritisation. It has been used
successfully by utilities in UK and USA for 20 years [B14], [B15] to create a prioritised list for network
transformers, all being compared in the same way with the same scale codes.
Problems arise particularly when aggregating scores for many assets. With the log base 3 series and 9
criteria, a near perfect unit with 9 scores all of 1 would take the total to 9. If attempting to
aggregate many assets with varying numbers of scale codes in the summation, the reader would be
unsure whether this value of 9 resulted from adding 9 codes of 1 or from just three scale codes of 3, 3, 3.
With base 10, the sum would again be 9, which is less than the next scale code score
of 10. Put another way, if attempting to combine scores for a bay analysis, the number of scale codes
needs to be less than the number required to promote the sum to the next AHI band. This is most relevant
when attempting to aggregate many assets with large numbers of scale codes and varying numbers of
scale codes across the asset types. This is a topic to be pursued later in Chapter 5.
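The promotion effect can be checked numerically. The sketch below, with hypothetical function names, counts how many identical scale codes must be summed before the total reaches the next code in the series, for the two series of Table 2.2.

```python
from typing import List, Optional

LOG3_SERIES = [1, 3, 10, 30, 100]
LOG10_SERIES = [1, 10, 100, 1000, 10000]


def codes_needed_to_promote(series: List[int], code: int) -> Optional[int]:
    """Smallest number of identical codes whose sum reaches or exceeds the next code.

    Returns None for the highest code, which cannot be promoted.
    """
    idx = series.index(code)
    if idx == len(series) - 1:
        return None
    next_code = series[idx + 1]
    return -(-next_code // code)  # integer ceiling division


print([codes_needed_to_promote(LOG3_SERIES, c) for c in LOG3_SERIES])
# [3, 4, 3, 4, None]
print([codes_needed_to_promote(LOG10_SERIES, c) for c in LOG10_SERIES])
# [10, 10, 10, 10, None]

# Nine criteria all scored 1: the sum of 9 is ambiguous in the base 3 series
# (it equals 3 + 3 + 3) but stays below the next code of 10 in the base 10 series.
print(sum([1] * 9), sum([3, 3, 3]))  # 9 9
```

On this basis a summation with nine criteria stays within the band only for the base 10 series, which matches the point made in the text.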
Table 2.4 – Aggregating summed scores with linear and logarithmic scoring
Factor | Linear Trf1 | Linear Trf2 | Linear Trf3 | Log Trf1 | Log Trf2 | Log Trf3
DGA Main Tank Score | 2 | 1 | 1 | 3 | 1 | 1
Dielectric Score | 1 | 1 | 1 | 1 | 1 | 1
Thermal Score | 2 | 1 | 1 | 3 | 1 | 1
Mechanical Score | 3 | 4 | 1 | 10 | 30 | 1
Oil Score | 1 | 1 | 1 | 1 | 1 | 1
DGA LTC Tank Score | 3 | 1 | 5 | 10 | 1 | 100
Operational Score | 2 | 3 | 3 | 3 | 10 | 10
Design/manufacturer Score | 1 | 4 | 1 | 1 | 30 | 1
Subject Matter Expert Score | 3 | 1 | 2 | 10 | 1 | 3
Sum | 18 | 17 | 16 | 42 | 76 | 119
Uniform linear weighting: sums are similar; sense of urgency is lost.
Logarithmic weighting: the urgent score stands out.
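The figures of Table 2.4 can be reproduced with a short sketch that compares the three aggregation rules discussed: a linear sum, a sum of log base 3 codes, and the worst individual score. The variable names are hypothetical; the scores are those of the table, listed in its row order.

```python
# Linear scores per transformer, in the row order of Table 2.4
# (DGA main tank, dielectric, thermal, mechanical, oil, DGA LTC tank,
#  operational, design/manufacturer, subject matter expert).
LINEAR_SCORES = {
    "Trf1": [2, 1, 2, 3, 1, 3, 2, 1, 3],
    "Trf2": [1, 1, 1, 4, 1, 1, 3, 4, 1],
    "Trf3": [1, 1, 1, 1, 1, 5, 3, 1, 2],
}

# Log base 3 series of Table 2.2.
LINEAR_TO_LOG3 = {1: 1, 2: 3, 3: 10, 4: 30, 5: 100}

for name, scores in LINEAR_SCORES.items():
    linear_sum = sum(scores)
    log_sum = sum(LINEAR_TO_LOG3[s] for s in scores)
    worst = max(scores)
    print(f"{name}: linear sum {linear_sum}, log sum {log_sum}, worst score {worst}")
# Trf1: linear sum 18, log sum 42, worst score 3
# Trf2: linear sum 17, log sum 76, worst score 4
# Trf3: linear sum 16, log sum 119, worst score 5
```

The linear sum ranks Trf3 as the least urgent unit, while both the log sum and the worst-score rule place it first.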
Weighting of scores
One of the ways some have tried to make units with high risk of failure stand out when using linear
scoring has been to use a weighting factor on each of the criteria. An example is shown in Table 2.5
[B15].
Table 2.5 – Effect of weighting linear scores [B15]
Here the very methodology pre-supposes the answer as to which criteria are most likely to cause
failure, and any manipulation of scores loses the conceptual appreciation of what is going wrong.
Some readers might question why this user gave the acetylene value the lowest rating, or why ethylene
should be three times more important than the LTC oil, and so on. The assessment will be subjective and not
readily usable as a generic method: the preferred alternative is not to get involved with approaches
involving weighting of data.
Similarly, translating aggregated weighted systems into a likelihood of failure (LoF) is not simple or
direct, since higher (or lower) scores do not necessarily represent a higher LoF. This system might be relatively
easy to understand, but the dilution effects of the aggregated weighting rob the system of meaning,
and the result will not be directly relatable to LoF. Units at the top of the list with the “worst”
scores may not be the units that fail, which undermines the usefulness of this approach.
Displaying aggregated condition scale codes for a single asset

Several methods have been used for aggregating several condition scale codes into one single score
in an AHI. The merits and disadvantages of various options are shown in Table 2.6. The essential
point is that in all cases, any individual condition scale code that is a Category 5 is worthy of
immediate attention and must not be hidden within an inappropriate aggregation. The worst score
must always be visible and not lost by processing/adding or by using weighting factors. With simple
linear scores, there is no way to avoid a bad assessment being hidden when the individual scores are
added, see Table 2.4 above. The key references on AHI methodology all agree with this
conclusion [B3], [B10], [B14], and [B15].
Conclusion: The only feasible option when using linear scores is not to aggregate by simply
summing individual scores.
At this point there are two options when using linear scores. One is to use only the highest (worst)
score as the final output score. For the three examples Trf1, Trf2 and Trf3 in Table 2.4 the worst
scores to carry forward would be 3, 4 and 5 respectively.
A second option is to follow the direction of TB 761, albeit with a more complicated approach. Here the
number of individual scores with the same scale code is identified as per Table 2.4. This example
(and its colours) is copied directly from TB 761 where 5 categories of scale code are proposed [B3].
Here there were 0× black, 3× red, etc. In any tabulation this method would list the asset’s AHI as
035310. It would be listed below (better than) those starting with 04 and 05.
Figure 2.3 – Numbers and scale codes as shown in TB 761 [B3]
The problem comes when there are more than 9 values in any category. It is not clear in TB 761 [B3],
however, what experience exists in the use of its recommended methodology. This method is better
than using the highest code alone since it indicates that there are several other aspects that could
cause an early failure. Critically it also allows linear scores to be used for single assets without losing
bad results when aggregating. Again, the weakness comes when there are so many scale codes that
there could be a double-digit score in one or more category, thereby destroying the five-digit combined
score approach. As with the log score approach it is most suited to single assets with a modest
number of scoring codes.
The method suggested in TB 761 overcomes many of these limitations because it does not attempt to
sum, just to record the number in each score code. It does require confidence that the available
indicators are sufficient to cover all relevant failure modes and that all indicators are properly calibrated
with respect to each other (the same score means the same probability of failure). It must also
be assumed that all indicators are independent. If these requirements are met, then even with
different numbers of indicators the enumeration system should work properly.
There is further discussion of this scoring and aggregation topic in Chapter 5.
The conclusion there is that both the log scoring system and the TB 761 method may be useful
when creating AHIs at a single asset class level but not when trying to aggregate outcomes of
many assets with many failure modes and where the assets differ in number of failure modes
and scale codes.
Table 2.6 – Aggregating scores
A. Adding linear scores (NOT RECOMMENDED)
Method: Condition indices are in a linear set of condition scores (1-5 or 1-10) and aggregated by adding all individual scores to reach an AHI as a sum, or averaged to normalise assets with different inputs.
Comment: Badly scoring codes are averaged out. See Table 2.4.
B. Using weighting (NOT RECOMMENDED)
Method: Condition indices are in a linear set of condition scores 1-5 that are then weighted by multiplying the initial score with a weight value to emphasise the importance of some failure modes. The numbers are then added.
Comment: See Table 2.5.
C. Using worst score
Method: Condition indices have a linear score, 1-5 or 1-10 as above, but only the score of the attribute having the highest (worst) value is used as the AHI. In the example in Table 2.4 it means only the single scores of “3”, “4” and “5” are used and all with lower scores are ignored.
Comment: Will identify presence of WORST defect that could cause failure. But where several failure modes with the same score are present this will not be apparent. It will not reflect the overall condition of the asset or provide any granularity for prioritisation within the asset class. This can work and is most useful when consolidating bay or substation wide scores.
D. Adding log scores
Method: Each condition index has a logarithmic score, say 1, 3, 10, 30, and 100, and then the scores are summed. This clearly identifies the worst failure mode scores as with (C) above and also gives a numerical appreciation of other higher scores. In both references [B14] and [B15] a sum with a base 3 logarithmic score was used. It can be used in a prioritised table display for a single asset type.
Comment: Gives single number and reflects general condition. Will also identify presence of weak attribute that could cause failure. But it will have problems when aggregating both with single assets each with many modes with same score, and where there are many different assets in a bay or substation wide AHI, see Chapter 5. This will be most suitable for scoring single asset types when creating prioritised actions for that asset class.
E. TB 761 approach
Method: Use linear scores, say 1-5, and include all in each category as per A2 TB 761. The score can be used in a league table display of outcomes. This will allow prioritisation within the asset population.
Comment: Loses unique single number but will reflect the range of scale scores for an asset. Will also have problems when aggregating many assets in a bay or substation wide AHI. This also will be more suitable for scoring single asset types when creating prioritised actions for the asset class.
Assembling the AHI

In references [B14] and [B15] the 20+ years’ experience of this utility in UK and USA networks has
been to construct the AHI in the form of Figure 2.4. This real service example, extracted from Figure 8
of reference [B15], shows several relevant features. In the first step the asset nameplate and location
are listed, and to this is added as step 2 a colour coded score of the inherent life limiting factors,
which in their case is only design limitations revealed through past forensic tear downs of scrapped
transformers. This information is used to indicate relevant and likely failure modes and so indicate the
diagnostics upon which the AHI will be based. This is the scoring of the indicators as far as they relate
to failure modes such as dielectric, thermal, mechanical, oil, OLTC, bushings, etc. shown in step 3.
Several cells are white with no score; these represent missing data and are easily recognised as a
restricted analysis. Although log scoring is used here it is also colour coded to aid the user. The
individual diagnostic scores for component parts in step 3 are then summarised in step 4 as the raw
AHI and what the change in AHI could be if remediation took place. This is a good model to follow.
Figure 2.4 – Score the diagnostic indicators
Chapter conclusions – creating an asset health index

The essence of AHI creation is:
1. Creating an AHI approach is costly in time and effort. It is essential before starting out to clearly
establish the benefits and potential for cost savings and maintaining core business attributes of
safety, performance and reputation. This would identify the review level of the AHI method to
be adopted for each class of assets.
2. The aim is to produce a listing of each asset in terms of its likelihood to fail in service in a user
selected time interval. This likelihood would be used with a criticality analysis to form a risk
assessment register. It is important to ensure that there is a clear justifiable link between AHI
and likelihood of failure in a selected time scale.
3. The AHI needs to link through indicators of symptoms relating to failure modes and their
timescales.
4. Any AHI should identify changing likelihood over time periods and so support an action plan
for an intervention: maintenance, repair, or replacement.
5. Ideally the output should be clear, auditable and justifiable by those needing to make decisions
based on the output. The AHI is not just a number as an output from an automated
analysis.
There needs to be an audit process – to tear down and forensically examine any scrapped unit. This
provides experience with actual failure modes as well as auditing the AHI process. This should
indicate the existence of all active failure modes, their rates of progression and the relevance of the
assessment strategy.
3. The generic methodology

The aim in this chapter is to describe the WG’s generic methodology for any asset within a power
network. It has essentially 8 steps as shown in Figure 3.1. The chapter will then be followed by
chapter 4 which has sections illustrating the approach for specific asset classes, each following these
same 8 steps.
Figure 3.1 – The steps to creating an AHI
Step 1: Identify the assets and decide on review levels.

The aim of this step is to gather existing data relevant to creating the FMEA and eventually the AHI
analysis. In terms of process it will be similar across the range of asset classes. However, in terms of
detail it will differ between them. This point will be more obvious in Chapter 4. Here an asset is
considered to be in the state in which it was commissioned. Some assets are fairly simple with all elements
made in a single factory. Others are more complicated, with the supplier providing
components from other manufacturers. For example, here a transformer is treated as an asset able to
perform its required function as a transformer. This functional unit therefore includes not just the
windings, core and main tank but also its set of bushings, tap-changer, oil, cooling, and control
and protection systems.
Common activities include:
• Identifying each asset and its role.
• Identifying its design group and manufacturer.
• Identifying suppliers and designs of accessories.
• Establishing evidence from past failures in the design groups, their causes and their
consequences.
• Identifying cost of ownership and other factors that will determine the cost benefit for establishing
AHI activities.
• Identifying factors that could lead to a shorter asset life, such as location, duty etc.
Failure impact assessment

By assessing the impact of an in-service failure, it should be possible to identify the review level.
Factors to be considered are:
• Likelihood of causing a system outage.
• Likelihood of leading to a reportable loss of supply.
• Significance of direct and indirect costs.
• Safety and environmental impact involved.
Review levels
The review level would be decided for each asset as illustrated in Figure 2.1. At its simplest a review
could be only office based and analysing data as obtained and identified in preceding sections. More
detailed strategies will provide more comprehensive activities and results to assess the current
condition. In particular, they will relate to the capability of the chosen diagnostic strategy to identify all
significant failure modes that are developing. This point was made in Chapter 1 with an example of a
transformer failure mode requiring an out of service test to assess winding movement.
Deciding just how comprehensive the AHI process should be can be addressed through a cost benefit
analysis. The degree of rigour will vary, but inevitably will identify the cost of impact from an in-service
failure and relate it to the cost of undertaking and implementing an analysis.
The FMEA process is part of Step 2 where the aim is to identify relevant failure modes and to link
them to diagnostic indicators. This also has a requirement that all diagnostic data being used has the
same quality relative to the speed of development of each failure mode. For example, the decision
might have been to work to Level 4, but for some assets out of service testing had been delayed. The
assignment would then be judged on a restricted basis of Level 3. This could also happen if the time
from the last measurement had been too long. Restricted data means that some failure modes are not
being assessed and a decision is required as to the consequences. What should happen is that the
assessment is made against all failure modes for which the data exists. It would then be given a
restrictive marker identifying modes missing or very aged. How this is handled is described later in
Step 5.
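The restricted marker described above can be sketched in a few lines. The Python below is a hypothetical illustration, not the WG's procedure: the function name, the per-failure-mode age limits and the example values are all assumptions chosen only to show the bookkeeping.

```python
def assess_coverage(data_age_years, max_age_years):
    """Split the failure modes into assessable ones and missing or aged ones.

    data_age_years: failure mode -> age of the latest data in years (absent if never measured).
    max_age_years:  failure mode -> oldest acceptable data age (ASSUMED limits, set by the user).
    """
    assessed, missing_or_aged = [], []
    for mode, limit in max_age_years.items():
        age = data_age_years.get(mode)  # None means no measurement exists
        if age is not None and age <= limit:
            assessed.append(mode)
        else:
            missing_or_aged.append(mode)
    return {
        "assessed": assessed,
        "missing_or_aged": missing_or_aged,
        "restricted": bool(missing_or_aged),  # the restrictive marker
    }


# Hypothetical transformer: the out of service winding movement test was delayed.
limits = {"dielectric": 1.0, "thermal": 1.0, "winding movement": 6.0}
ages = {"dielectric": 0.5, "thermal": 0.8, "winding movement": 9.0}
print(assess_coverage(ages, limits))
```

In this example the unit would still be assessed for the dielectric and thermal modes, but it carries a restricted marker naming winding movement as the mode that is not covered.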
Step 2: Perform FMEA

Using a top-down approach, the FMEA steps are:
1. Perform functional decomposition of asset/components. These are the primary functions
that the asset provides in the network.
2. Define what constitutes a loss of each function identified.
3. Identify modes of failure. These are the failures leading to the loss of primary functions.
4. Identify the causes that lead to onset of a failure mode. For example, a transformer might
fail due to a winding short circuit. But the cause could be long term thermal ageing of the
winding paper or mechanical vibration.
5. Identify local and final effects of the failure modes. Split the asset into smaller sub-components
in order to evaluate the impact of a component failure on the primary function of
the asset. An example would be to consider a transformer in terms of its main tank, bushings and
tap changer and the extent to which a failure of one of these would be critical to the transformer's ability
to continue its function.
6. Identify diagnostic strategies. These would identify failure modes and their causes, together
with the onset and progression. This should link back to Figure 2.1 and the extent that failure
modes are linked to non-invasive surveys, offline measurements or continuous measurements.
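The six FMEA steps above map naturally onto a small record structure. The sketch below is one possible layout under stated assumptions: the class and field names are hypothetical, and apart from the winding short circuit causes quoted in step 4 the example entries are illustrative wording only.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    name: str                 # step 3: mode of failure
    causes: list              # step 4: causes leading to onset
    local_effect: str         # step 5: effect on the component
    final_effect: str         # step 5: effect on the primary function
    diagnostics: list = field(default_factory=list)  # step 6: diagnostic strategies


@dataclass
class Function:
    description: str          # step 1: primary function of the asset
    loss_definition: str      # step 2: what constitutes loss of the function
    failure_modes: list = field(default_factory=list)


def undiagnosed_modes(functions):
    """List failure modes that have no diagnostic strategy linked to them yet."""
    return [fm.name for f in functions for fm in f.failure_modes if not fm.diagnostics]


winding_short = FailureMode(
    name="winding short circuit",
    causes=["long term thermal ageing of winding paper", "mechanical vibration"],
    local_effect="winding insulation breakdown",      # illustrative wording
    final_effect="transformer trips out of service",  # illustrative wording
    diagnostics=["dissolved gas analysis"],           # illustrative choice
)
winding_movement = FailureMode(
    name="winding movement",
    causes=["close short circuit"],                   # illustrative wording
    local_effect="winding deformation",
    final_effect="reduced short circuit withstand",
)
transform = Function(
    description="transform voltage between two network levels",
    loss_definition="unit cannot carry load at rated voltage",
    failure_modes=[winding_short, winding_movement],
)
print(undiagnosed_modes([transform]))
```

Listing the failure modes that have no diagnostic attached is one simple way to see where the chosen review level leaves a gap.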
Step 3: Assess Individual Asset Performance

This step is an analysis of the data obtained in Step 1. Its aim is:
• To identify factors specific to a design group that could result in a shorter lifetime than similarly
specified units.
• To identify factors specific to individual assets that could similarly result in a shorter lifetime.
• To provide identification of specific failure modes for the asset design group and its utilisation.
Figure 3.2 – Data to be obtained and assessed
Asset register data

An asset register should be in existence. This lists all assets within the company, their location,
function etc. Some are important in terms of their monetary value, network role and criticality (as well
as safety and environmental implications) should there be a malfunction. Consequently, an initial
high-level cost-benefit analysis at asset class level must be done in order to evaluate how intensive the
asset health index should be (the confidence level, see Figure 2.1). Within any company, for the
range of assets, applications and their voltage levels, there is likely to be a range of answers and
consequently a range of AHI processes.
The data should include:
• Asset design, manufacture, date made.
• Any contained materials considered hazardous or requiring special treatment (e.g. PCBs).
• Locations and years in service.
• Specification at purchase and relevant standards applying.
• Asset role and circuit type.
• Total number of assets of same design group.
• Spare assets available as replacements in the event of a failure.
• Spare parts held or available for specific asset design.
• Asset technology and accessible expertise both in the company and with manufacturers.
Documentation
This information provides historic documentation held within company databases and other data
storage systems. In addition, the information relating to the performance of the particular design in
both factory evaluations and service experience should provide the basis for identifying failure modes
and their diagnostic indicators.
Table 3.1 – Asset data
Documentation from the manufacturer and held by the utility:
• Copies of manuals
• Original factory test results and standards applying
• Relevant design standards used for this build (IEC/IEEE etc.)
• Specification requirements (BIL, short circuit, etc.)
• Maintenance tasks, intervals, materials identified both when new by the manufacturer and subsequently by the company
• Failure investigation reports on this and similar design of units
• Service advisories from the manufacturer
• Details from manufacturer/repairer relating to any rebuild/refurbishment done on this specific design group
Documentation on policies in utility:
• Maintenance policies and practices now relevant – to acquire actual activities identified for each unit
• Policies and practices relevant to diagnostic testing
• Operational policies and practices, factors that affect the mode of operation of the apparatus
The original specifications

These contain the definitions relative to the user circumstances for ambient temperatures, load,
voltage, power factor, source impedance, lightning levels, short circuit withstand, acceptable losses
etc. In some cases, requirements may now have evolved into ones where these original requirements
are inadequate for the present and future operating environments.
The purchase specification is, therefore, an important document to review in light of current
manufacturing standards, and against the actual operating environment. Discussions and
documentation within the utility should indicate any significant requirement to operate the unit outside
its nameplate ratings.
Standards
The standards used at the time of manufacture, which may now be considered to have been
inadequate, are identified; examples are withstand tests such as the Basic Insulation Level (BIL).
Designs have also changed over the years in ways that improve or reduce performance and expected lifetime.
Identifying the design practice at build is therefore important to predict future performance in these
areas.
Factory Information
Original test data for the unit and accessories should be available for comparison with the specification
and also in-service results. Problems found either in manufacture or factory test with these units can
be identified and assessed against service experience. This might include poorer performance in
tests, such as for heat run results that could affect normal thermal rating and any consequence of
overloads. Details of any major rebuild should be identified, and related reports often give a good
indication of the rate of aging generally to be found.
Financial information on the different asset classes:

• CAPEX costs on the procurement and installation.
• OPEX costs for internal and external labour and materials and services.
Operation history on the different asset classes:

• Severe locations prone to higher corrosion rates, ambient temperatures or lightning
exposures, etc.
• Load levels – locations with higher overload temperatures, switching duty etc.
Failure information
This comes from the malfunction reports of the asset:
• failure effect (major, minor failure),
• failure mode failure rates.
If the failure information is not available, or its quality is not good enough, an
alternative can be to use the service experience of in-house technicians and technical engineers
to retrieve some information on the failures:
• Number and duration of forced outages caused by the asset
• Maintenance man days used and material cost for the repairs
• Disruptive consequences (failure effect) of past outages of the unit on the system
• Failure modes and investigation reports on this and similar design units
• Event records from SCADA, which may indicate frequency of system faults
Sometimes useful information can come from publications, such as the CIGRE technical brochures
TB 642 (transformers) [B1] and TB 509-513 (substation equipment) [B2], [B21], [B27], [B29]. These
provide general outcomes which are helpful but do not provide the above specific information for
particular designs. Utility trade organisations and some service providers do, however, compile design
specific failure information.
Maintenance policy
The differences between the manufacturers' original maintenance plan, what activity has actually been
done over the years, and opinion as to best practice maintenance policy and practice should be
identified. This may indicate shortcomings that could affect future reliability. A review of maintenance
work undertaken indicates the problems encountered with the unit, their extent, and cost. This can be
used to indicate integrity and likely future performance risks, as well as yielding key performance
indicators for reliability and cost.
Historic test and inspection data

For most assets covered by this brochure there will be site test data ranging from commissioning to
testing during periodic outages, as well as routine surveys involving external condition, operational
counts, liquid levels, gas pressures etc. There may be on-line survey results, gas analyses, infra-red
and UHF-RIV surveys. Whilst these results are historical, and perhaps aged, they provide
a basis for trend analysis.
Failure Susceptibility Indicators.

The aim here is to use historic data identified above that relates to specific assets and from this
consider factors that might affect early onset and predominant failure modes, adverse condition
assessments and shorter lifetimes. It should be appreciated that this does not produce an AHI but is
merely an indicator of possible factors that could be life-limiting. It is there to provide a useful input for
FMEA and Steps 2-8 of the methodology.
Figure 2.4 shows an approach that has been used successfully for 20 years. Its first two steps provide
such an indication as described here, apart from being restricted to only design and manufacturing
limitations. In this reference, side notes alongside the AHI summarised these factors: what the
likelihoods are and their likely impact in terms of failure mode and time frames. A similar system is
recommended in this brochure. The scoring is a colour code and represents a lifetime hazard, again
not an AHI, since the scoring here represents a perceived expectation of a shorter life rather than a
current assessment based on evidence:
• Green – No issues with any aspect that would produce a shorter life than others in the same
asset class; too early in product life to identify dominant failure modes.
• Orange – Some issues exist that might produce shorter lifetime; these could be performance or
application issues, or there could be design/manufacturing limitations.
• Red – Application is expected to produce shorter lifetimes. It could be there are performance
issues with design or specification. It could be that some similar units have already failed for
these reasons and it is possible to identify dominant failure modes and their impact on future
time in service. There could have been prior damaging events.
• Black – It is fairly common that a utility has a strategic policy in place to replace as soon as
practicable all items of a particular design group due to experience including a poor design and
unacceptably high failure rates.
The criteria are:

• Age – This is more important when service life is higher than the age at failure of similar and directly
comparable units, for reasons associated with design quality or application.
• Design limitations – How the particular design has performed in the general utility experience is
important. Often poor in-service performance can be related back to the design limitations described
above. Sometimes they may be due to a standard that has since required improvements. They
could be due to the purchasing specification not reflecting actual service conditions, say for
ambient temperatures, overloading or duty factors. The same points can be made about
accessories such as bushings and tap changers, which should be included in a review of a
transformer. Industry or trade association reports of failures on similar designs can be a valuable
source of information about such limitations.
• Service history/performance – The differences between the manufacturers’ original
maintenance plan, what activity has actually been done over the years, and opinion as to best
practice maintenance policy and actual work undertaken should be identified. This may indicate
shortcomings that could affect future reliability.
• Maintenance – A review of maintenance work undertaken can indicate unforeseen problems
encountered with the unit, their extent, and cost. This can be used to indicate integrity and likely
future performance risks, as well as yielding key performance indicators for reliability and cost.
• Operational environment – Lifetime is often restricted where units are heavily loaded, have
high utilisation factors, overloads, exposure to transients, harmonics, frequent switching,
pollution levels, extreme weathering, etc. Re-locations and load histories can impact future
performance in many asset types. With the latter it is not just temperature rise with load but
also the starting ambient temperatures.
• Historic events – Details of past damage, close short circuits and trips from earlier events.
Scoring Failure Susceptibility Indicators

Table 3.2 – First level assessment – example of a susceptibility review
Scoring is red/orange/green/black for the criteria a (age), b (design), c (life cycle costs), d (duty factor) and e (events).
Asset register data from Table 3.1    a    b    c    d    e    Summary score, coloured as per worst, and indicated
Unit 1                                                         a, b, c
Unit 2                                                         a, d
Unit 3                                                         b
Unit 4                                                         b
Here the five criteria are colour scored with the worst colour being carried through to the summary
column as a colour and with the criteria having this worst score indicated as shown.
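The carry-through of the worst colour can be sketched as below. This is an illustrative Python fragment under stated assumptions: the severity ordering green, orange, red, black follows the colour descriptions given earlier, while the function name and the colours assigned to the example unit are hypothetical because the cell colours of Table 3.2 are not reproduced in the text.

```python
# ASSUMPTION: severity order taken from the colour definitions, black being the worst.
SEVERITY = {"green": 0, "orange": 1, "red": 2, "black": 3}


def summarise(criteria_colours):
    """Return the worst colour and the criteria letters that carry it."""
    worst = max(criteria_colours.values(), key=SEVERITY.__getitem__)
    letters = sorted(k for k, v in criteria_colours.items() if v == worst)
    return worst, letters


# Hypothetical colours for one unit: a = age, b = design, c = life cycle costs,
# d = duty factor, e = events.
unit_1 = {"a": "red", "b": "red", "c": "red", "d": "green", "e": "orange"}
print(summarise(unit_1))
```

For this hypothetical unit the summary is red with criteria a, b and c indicated, matching the form of the Unit 1 row in Table 3.2.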
Step 4: Identify the condition indicators to be used

In this step, starting from the prioritized list of failure modes from Step 2, for each failure mode the
condition indicators capable of detecting the failure mode are identified.
The condition indicator should be able to detect the failure mode before the failure actually happens.
For each condition indicator the detectability and cost for monitoring should be estimated.
Step 4.1: Estimate the detectability

Two aspects are important when evaluating the detectability of a condition indicator:
The evolution rate of the failure mode in terms of onset and rate of progression. A good way to
estimate this is to look at the evolution of the failure rate of the failure mode with time. An illustration is
shown in Figure 3.3. How fast a failure mode develops will vary considerably from onset until failure
and this will have a significant effect on the diagnostic strategy.
Figure 3.3 – Failure evolution and diagnostics

The measurement method and the inspection frequency. If the time between two successive
diagnostic tests/inspections is longer than the time between the onset and the actual failure, the
indicator has a poor detectability. In this case the indicator is no better than a protection trip. In
addition to this, the measurement method should also be consistent and reliable. When performing
two successive measurements, the results should be comparable in order to rely on the result.
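The relationship between inspection frequency and failure development time can be expressed as a simple check. The sketch below is illustrative only; the function names and the example intervals are hypothetical, and both arguments are assumed to be in the same time unit.

```python
def can_detect_in_time(inspection_interval, onset_to_failure):
    """True if the inspection interval is no longer than the onset-to-failure time.

    When the interval is longer, the text rates the indicator as having poor
    detectability, no better than a protection trip.
    """
    return inspection_interval <= onset_to_failure


def guaranteed_inspections(inspection_interval, onset_to_failure):
    """Minimum number of periodic inspections that fall between onset and failure."""
    return int(onset_to_failure // inspection_interval)


# Hypothetical figures in months: a failure mode developing over 18 months.
print(can_detect_in_time(6, 18), guaranteed_inspections(6, 18))
print(can_detect_in_time(24, 18), guaranteed_inspections(24, 18))
```

With a 6 month interval at least three inspections fall inside the 18 month development window, which also supports trending; with a 24 month interval none is guaranteed.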
An example of detectability levels used for this purpose is shown below:
Table 3.3 – Detectability
Detection              Likelihood of detection by control                                                                   Ranking
Absolute uncertainty   Control cannot detect potential cause/mechanism and subsequent failure mode                          10
Very remote            Very remote chance the control will detect potential cause/mechanism and subsequent failure mode     9
Remote                 Remote chance the control will detect potential cause/mechanism and subsequent failure mode          8
Very low               Very low chance the control will detect potential cause/mechanism and subsequent failure mode        7
Low                    Low chance the control will detect potential cause/mechanism and subsequent failure mode             6
Moderate               Moderate chance the control will detect potential cause/mechanism and subsequent failure mode        5
Moderately high        Moderately high chance the control will detect potential cause/mechanism and subsequent failure mode 4
High                   High chance the control will detect potential cause/mechanism and subsequent failure mode            3
Very high              Very high chance the control will detect potential cause/mechanism and subsequent failure mode       2
Almost certain         The control will detect potential cause/mechanism and subsequent failure mode                        1
Step 4.2: Estimate the cost of monitoring the condition indicator

Once all the different condition indicators and their detectability for each failure mode are identified
one should evaluate the cost for monitoring the condition indicator. Monitoring a condition
indicator has a cost: labour, retrieving data, storing and analysing the information. In this step it could
be useful to look at alternatives. Example: compare the cost of performing a manual inspection at a
fixed frequency and the cost for installing an online monitoring system.
Note: when comparing two inspection methods (for example manual inspection and survey vs continuous online
monitoring), the detectability of the condition indicator on the failure mode can also
change. Take this into account in the final evaluation.
Step 4.3: Decide the condition indicators to be used

Which condition indicators are worth using in the health index model depends on three
parameters estimated in the steps above:
• Impact of the failure mode (low, medium, high): see Step 3 in section 3.3
• Detectability of the condition indicator on the failure mode (certain, maybe, absolutely
uncertain): see Step 4.1 in section 3.4.1
• Cost of monitoring the condition indicator (low, medium, high): see Step 4.2 in section
3.4.2
As a first step, the risk reduction of the failure mode achieved by monitoring a condition indicator
is estimated, based on the risk impact and the detectability. If monitoring a condition indicator reduces the risk in
one or more risk domains (primary function, health and safety, environment and costs), the
condition indicator has a high impact and should decrease the probability of failure of that failure
mode. Estimate the impact (low, medium, high) towards the risk reduction of the failure mode for each
condition indicator.
Based on the impact of the risk reduction of each condition indicator and cost of monitoring the
indicator a heat-map can be constructed visualizing the cost-benefit of monitoring the condition
indicator.
Condition indicators having:
• Medium to high impact towards the risk reduction of the failure mode and a low to medium
cost are worth monitoring and thus incorporating into the health index model.
• Medium to high impact towards the risk reduction of the failure mode and a medium to
high cost should be investigated in detail through a business case to decide whether to
monitor and incorporate the condition indicator into the health index or not.
• Low impact towards the risk reduction of the failure mode and a medium to high cost are
not worth monitoring or incorporating into the health index model.
Figure 3.4 – Cost benefit analysis for evaluating condition indicators
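The selection rules above can be sketched as a small lookup. This is a minimal illustration under stated assumptions, not the Working Group's method: the function name classify_indicator is hypothetical; a medium cost appears in both the "monitor" and the "business case" rule, so the sketch assumes a tie-break on impact; and the low impact, low cost combination, which the text does not cover, is assumed to be optional.

```python
LEVELS = ("low", "medium", "high")


def classify_indicator(impact: str, cost: str) -> str:
    """Return a monitoring recommendation for one condition indicator."""
    if impact not in LEVELS or cost not in LEVELS:
        raise ValueError("impact and cost must be 'low', 'medium' or 'high'")
    if impact == "low":
        # Low impact with low cost is not covered by the text: assumed optional.
        return "optional" if cost == "low" else "do not monitor"
    if cost == "low":
        return "monitor"
    if cost == "high":
        return "business case"
    # Medium cost falls under both rules; assumed tie-break on impact.
    return "monitor" if impact == "high" else "business case"


# Print the resulting 3 x 3 grid, one row per impact level.
for impact in LEVELS:
    print(impact, [classify_indicator(impact, cost) for cost in LEVELS])
```

A utility would replace the assumed tie-break and the "optional" cell with its own policy before using such a grid.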
Step 5: Collect inspection data

Once all desired condition indicators are defined, the next step is to start collecting data. Much of this
data will probably already be available within the organization, but the following challenges may have
to be tackled:
• Data is missing or very aged – Obtaining outages for testing is increasingly difficult in some
networks. If the assessment indicates that Stage 4 in Figure 2.1 is to be used, then there could
be critical test data missing or significantly aged.
• Data is available but not in the right format – For historical reasons, some data may only
be available on paper and not yet digitalized, or it may exist as digital data but in an
older format. In order to use this data, a digitalization of the data capturing process is necessary.
In addition, if one wants to use the historical data, a digitalization (or format conversion for
already digitised data) of the historical data may be necessary for some condition indicators.
• Data is available but of low quality – Data quality is an important point in the asset
health index process. If some data is poor in quality, a data cleaning campaign could be
necessary to improve the data quality before using it within an AHI process. Using data of poor
quality will result in a wrong asset health index.
• Data unit conversion – The same quantity can sometimes be recorded in different units
(for example pressure in MPa vs bar, SF6 dew point at nominal pressure vs at atmospheric
pressure, etc.). In order to correctly interpret the measurement results, all results must use
the same units within the whole company.
• Missing time stamps – In order to have the latest information and to decide which result
overrules another, each data point must have a valid time stamp.
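Two of the challenges above, unit conversion and time stamps, lend themselves to a short sketch. The record layout, the function names and the five-year age limit are assumptions made for illustration only; the conversion factors themselves (1 MPa = 10 bar, 1 kPa = 0.01 bar) are standard.

```python
from datetime import datetime, timedelta

# Conversion factors to bar (1 MPa = 10 bar, 1 kPa = 0.01 bar).
TO_BAR = {"bar": 1.0, "MPa": 10.0, "kPa": 0.01}


def pressure_in_bar(value: float, unit: str) -> float:
    """Convert a pressure reading to bar, rejecting unknown units."""
    if unit not in TO_BAR:
        raise ValueError(f"unknown pressure unit: {unit}")
    return value * TO_BAR[unit]


def is_usable(timestamp, now, max_age_days=5 * 365):
    """A reading needs a time stamp and must not exceed the assumed age limit."""
    if timestamp is None:
        return False
    return now - timestamp <= timedelta(days=max_age_days)


def latest_reading(readings, now):
    """Return the most recent usable reading, or None if there is none."""
    usable = [r for r in readings if is_usable(r["timestamp"], now)]
    return max(usable, key=lambda r: r["timestamp"], default=None)


# Hypothetical gas pressure readings for one asset.
readings = [
    {"value": 0.62, "unit": "MPa", "timestamp": datetime(2020, 5, 1)},
    {"value": 6.0, "unit": "bar", "timestamp": datetime(2021, 3, 1)},
    {"value": 5.8, "unit": "bar", "timestamp": None},  # no time stamp: ignored
]
latest = latest_reading(readings, now=datetime(2021, 12, 1))
print(pressure_in_bar(latest["value"], latest["unit"]), latest["timestamp"].date())
```

The reading without a time stamp is excluded rather than guessed at, which is the behaviour the list above asks for.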
There needs to be some form of marker indicated in the AHI tabulation that identifies that the AHI
includes a restriction associated with data quality – as per Table 3.4.
Table 3.4 – Indication of restricted data and limited confidence
Colour | Meaning
GREEN | Figure 2.1 is correctly assigned and all data matching failure modes exists and has been used.
ORANGE | Data as identified above is missing or poor quality. However, the effect on the AHI is not considered drastic.
RED | Data as identified above is missing or poor quality. The AHI is not reflecting all failure modes effectively.
This restriction on AHI was also considered in a more complex way, with scoring of quality levels –
see Table 4.3 of TB 761.
Before buying or developing new applications for data collection and storage, a proof of concept in a
spreadsheet for a limited number of data points is advised. This will help the utility later on in
describing the business requirements for the software required.
Step 6: Evaluate Current Condition relative to key failure modes

Step 6.1: Translating the condition indicator result to a condition scale code score
With the data retrieved and stored in a uniform way, the next step is to analyse the data and
translate the results of the different condition indicators into a condition indicator score. To do this,
a relation between an input value (measurement result) and an output score must be defined. The
user is free to choose this function, which can for example be a linear function, an exponential
function, a step function or a user-defined function.
Note: In order to compare the results of different condition indicators when using a combination of
different functions, attention should be paid to use the same output scoring scale (minimum and
maximum).
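A minimal sketch of two such functions, both returning scores on the same 1 to 5 scale so that results from different indicators remain comparable. The function names, the anchor values and the thresholds are illustrative assumptions and are not taken from any standard.

```python
def linear_score(value, good, bad, lo=1.0, hi=5.0):
    """Linear map: `good` gives lo, `bad` gives hi, clamped to [lo, hi]."""
    fraction = (value - good) / (bad - good)
    fraction = min(1.0, max(0.0, fraction))
    return lo + fraction * (hi - lo)


def step_score(value, thresholds, lo=1):
    """Step function: the score rises by one for each threshold reached."""
    return lo + sum(value >= t for t in thresholds)


# Hypothetical indicator: a measured value of 100 scored two ways.
print(linear_score(100, good=0, bad=400))      # 2.0
print(step_score(100, (50, 100, 200, 500)))    # 3
```

With four thresholds the step function spans 1 to 5, matching the range of the linear function; a different number of thresholds would break the common scale.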
Step 6.2: Translating the set of condition indicator scores to a condition indicator index
The Condition indicator Index can be used to visualise in an easier way the state of the condition
indicator.
Example: a measured value of 100 – is this good or bad? In order to achieve this interpretation and
then allocate a scale score, a relation between the condition indicator score and the Condition scale
code must be given by the user. This relation depends on the scoring method and function used and
the level of information and expert skills available. A fuller discussion was included in Section 2.4.
Step 7: Aggregate analyses for AHI

Once all the parameters worth including in the health index model are known from the FMEA
exercise, the AHI model of the asset class can be set up. An asset can consist
(depending on the complexity of the asset) of different subcomponents or sub-systems.
A transformer for example could be considered as comprising the main unit with tap changer and
bushings. Alternatively, the AHI could be computed for each category.
It is advised to set up the model by grouping the condition indicators that relate to the same
sub-component or the same failure mode into one sub-AHI. Each sub-AHI is determined by analysis of
the underlying condition indicators. These sub-AHIs will later be combined into one overall Asset
Health Index (AHI). The important advice from this Working Group is that simple linear or weighted
scores from individual failure modes or subcomponents should not be added – see Chapter 2. Adding
would average out the scores and mask the badly scoring element. (With logarithmic scores of 1, 3, 10,
30 and 100, masking when adding is far less likely.) The important step is to use a method of
aggregation capable of identifying the highest scoring failure mode and its indicators.
Step 7.1: Aggregate condition scale code scores to a sub-health score and
asset health score
Once a score has been given to all condition indicators, a consolidated score by sub-component or by
failure mode can be produced. It is advised in this step to continue working with the condition indicator
score and not with the condition indicator index in order to keep enough detailed information of the
potential problems.
This conversion from measured values to 1-5 scores for the likelihood of failure by a particular failure
mode, as indicated in Table 2.1, is done using pre-agreed functions defined by the user in order to
calculate a score by sub-component or failure mode. This is the most difficult part and is where the
subject expert has a role, as do international standards such as those of IEC and IEEE, best practice
guides, etc.
The sub-AHI and AHI can be used to visualize in an easier way the estimated condition of the sub-
component and/or the asset. In order to achieve this, a relation between the final score and the AHI
must be given by the user. This relation depends on the scoring method and function(s) used.
Step 8: Identify mitigation actions

The results can then be compiled into tables or lists, either by asset class or on a bay/substation level,
as in Table 3.5. This would be an outcome of the AHI. As here, the information for some failure modes
may be incomplete, which is perhaps a common situation. The AHI always represents an ongoing evaluation.
Table 3.5 – Example showing the relation between score and AHI
Asset number | Sub-component | Highest score | Failure mode for highest scoring element | Effects | Remediation possible | Modified score after remediation
Tr 123 | Main tank | 3 | Winding ageing | | None needed | 3
Tr 123 | OLTC | 4 | Drive failure | Not responding to controls | Refurbish | 2
Tr 123 | Bushings | 3 | Ageing | | None | 3
Here the dominant score is in the OLTC but can be improved with remediation. Nevertheless, the
reportable AHI score for this asset would be 4 should only the highest (worst) score of a linear scoring
system be used. With a base 3 log scale the total would be 10 + 30 + 10 = 50. This indicates that a
score of 30 has been given to one subcomponent, but that there are other areas with a score
indicative of significant deterioration.
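The arithmetic of this example can be reproduced in a few lines. This is only a sketch of the comparison made in the text: the dictionary layout is an assumption, while the scores come from Table 3.5 and the linear-to-log correspondence (1, 2, 3, 4, 5 to 1, 3, 10, 30, 100) is the one used in this brochure.

```python
# Linear scale code mapped to the logarithmic scale code.
LINEAR_TO_LOG = {1: 1, 2: 3, 3: 10, 4: 30, 5: 100}

# Highest linear score per sub-component of Tr 123 (Table 3.5).
scores = {"Main tank": 3, "OLTC": 4, "Bushings": 3}

mean_score = sum(scores.values()) / len(scores)   # averaging masks the OLTC
worst_component = max(scores, key=scores.get)     # sub-component to report
worst_score = scores[worst_component]
log_total = sum(LINEAR_TO_LOG[s] for s in scores.values())

print(f"mean {mean_score:.2f}, worst {worst_component} = {worst_score}, log sum {log_total}")
```

The mean of about 3.3 hides the OLTC, whereas the maximum (4) and the log sum (50) both keep it visible.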
Assembling the final AHI

The completion of the exercise is to draw all relevant information into a table such as Table 3.6. It will
include:
Column 1 Assets listed to include unit reference, asset register data from nameplates.
Column 2 Any liability score – age, design performance, duty factor, costs that may influence
performance and failure mode. This is described in Section 3.3.12 and Table 3.2.
Column 3 An indication that the assessment is restricted, and some failure modes are not
being assessed, or some data is aged. This is described in Section 3.5 and
Table 3.4.
Column 4-8 The scale code for each failure mode – scored here with a log base 3.
Column 9 The sum of all scale codes when using log base 3 scores.
Column 10 The alternative representation as per TB 761, identifying number of scale code
scores in each category 5, 4, 3, 2, 1, etc.
Column 11 The highest scale code when using 1-5 as per Table 2.1 (headed "max 1-5" in Table 3.6).
Column 12 How the AHI score in column 11 could change if identified remediation is
undertaken.
Notes Details of relevance of liability factors and restricted outcomes would be given here.
It would contain any factors the reader should know in order to make a justifiable
decision.
Table 3.6 – The compiled AHI – example based upon Log base 3 scoring
Column groups: Guidance columns (from step 1, from step 5) | AHI scoring (based only on 5 FMs in this example) | Outcomes (alternative AHI approaches)
Col 1: Asset register data | Col 2: Susceptibility score from Section 3.2 | Col 3: Restricted analysis indication from Section 3.5 | Col 4-8: FM 1 to FM 5 | Col 9: Sum log scores, all FMs | Col 10: AHI, TB 761 scoring | Col 11: AHI if scored max 1-5 | Col 12: AHI with modified 1-5
Unit 1 | a, b, c | | 3, 3, 3, 3, 10 | 22 | 00140 | 3 | 2
Unit 2 | a, d | | 3, 30, 3, 3 | 39R | 01030R | 4R | 2R
Unit 3 | b | | 3, 10, 100, 3, 3 | 119 | 10130 | 5 | 5
Unit 4 | b | | | | | |
etc.
Note: Unit 2 is an example with no data for one FM. The R in the scores of Unit 2 denotes a
restricted assessment.
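The outcome columns of Table 3.6 can be derived mechanically from the failure mode scores. The sketch below is an assumption about how such a row might be compiled, not a prescribed implementation: the function name compile_row is hypothetical, a missing failure mode is represented by None, and for Unit 2 the position of the missing failure mode is arbitrary because the table does not show it.

```python
# Logarithmic scale code mapped back to the 1-5 scale code.
LOG_TO_LINEAR = {1: 1, 3: 2, 10: 3, 30: 4, 100: 5}


def compile_row(fm_log_scores):
    """Build columns 9 to 11 from log scale codes (None where no data exists)."""
    known = [s for s in fm_log_scores if s is not None]
    if not known:
        raise ValueError("at least one failure mode score is required")
    linear = [LOG_TO_LINEAR[s] for s in known]
    # Count of scores in each category, ordered 5, 4, 3, 2, 1.
    counts = "".join(str(linear.count(code)) for code in (5, 4, 3, 2, 1))
    # "R" marks a restricted assessment (one or more failure modes unscored).
    flag = "R" if len(known) < len(fm_log_scores) else ""
    return {
        "sum_log": f"{sum(known)}{flag}",
        "tb761_counts": f"{counts}{flag}",
        "max_linear": f"{max(linear)}{flag}",
    }


units = {
    "Unit 1": [3, 3, 3, 3, 10],
    "Unit 2": [3, 30, None, 3, 3],   # position of the missing FM is assumed
    "Unit 3": [3, 10, 100, 3, 3],
}
for name, fm_scores in units.items():
    print(name, compile_row(fm_scores))
```

Column 12 is not computed here because it depends on the remediation judged possible for each failure mode.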
4. Applied methodology
This chapter describes the application of the generic methodology described in Chapter 3 to
a range of substation assets. The structure again follows the eight steps of Figure 3.1 in Chapter 3. In
each of the sections 4.1 to 4.10 the asset is considered as a functional unit. Each section has been
prepared based on experience and also to illustrate differences in scoring systems and whether
aggregation is used. Such a ranking is based on a single asset/functional unit basis and would be the source
information for prioritising maintenance, repair, refurbishment or replacement. How outcomes are
combined into AHI for bays, circuits or substations is considered within Chapter 5.
Steps common for all asset categories

Step 1: Identify the assets, gather asset data and decide on review levels
Basic information for Step 1
Table 4.1.1 – Asset register information example
Record | Information | Record | Information
Asset manufacturer | | Factory location |
Serial number | | Year of manufacture |
Asset type/style | Transformer/instrument transformer, etc. | Basic design information | Windings/fluids, live/dead tank, drive mechanism, etc.
Rating (Current/Voltage designation) | | Site and position |
Asset role and circuit type | GSU, Network, Cap Bank | Total number of assets in same design group |
Spare assets available in the event of failure | | Asset technology | Live/dead tank, air blast/SF6, drive mechanism, core, shell form, etc.
Cost implications
An early task is to decide upon the diagnostic strategy as per Figure 2.1, and this involves a
simple cost benefit assessment. The first step is to identify the consequences of an in-service failure;
typical inputs are listed in Table 4.1.2.
The next step is to review the extent to which diagnostic data is already being gathered within the
company, and the cost implication of gathering more comprehensive data. Also included is
provision to indicate the type of failure mode that may be identified.
Table 4.1.2 – Consequences of Failure
Failure type Impact scored – Low/Serious/Severe/Catastrophic

Circuit outage
Loss of supply
Direct Costs
Safety
Environment
Conclusions:
(1) Worst case impact
(2) Review level justified, reference Figure 2.1
Table 4.1.3 – Diagnostic indicators in use and failure modes
Diagnostic indicator data available in utility | Failure modes being assessed | Failure modes not being assessed
Visual surveys Y/ N
Survey diagnostics Y/ N
Oil analyses
Oil Quality
Dissolved gases
Paper ageing
OLTC oils
Gas analyses
Gas leakage
Infra-red surveys
UHF-PD surveys
UV Corona surveys
Timing
Other
Offline tests Periodicity

Dielectric dissipation factor (DDF) / Capacitance
Windings
Bushings
Sweep Frequency Response Analysis (SFRA)
Insulation Resistance
Winding Resistance
Timing
Other
On-line continuous monitoring Y/ N
Partial Discharge (PD)
DDF/ Capacitance
DGA
Relative Humidity
Gas density
Other
Decide the Review level

From assessing the impact in the manner described it should be possible to identify the review level,
as per Figure 2.1.
Indicate using Figure 2.1 1 /2 /3 /4 /5

Analyse in accordance with next chapters.

Table 4.1.4 – Common asset data
Documentation from the manufacturer and held by the utility, often at the substation Availability – Y/N
Copies of manuals
Original factory test results and standards applying
Relevant design standards used for this build (IEC/ IEEE etc)
Specification requirements (BIL, short circuit etc)
Maintenance tasks, intervals, materials identified both when new by the manufacturer and
subsequently by the company
Failure investigation reports on this and similar design units
Service Advisories from the manufacturer
Details from manufacturer/ repairer relating to any rebuild/ refurbishment done on this specific design
group
Documentation on Policies in Utility

Maintenance policies and practices now relevant – to acquire actual activities identified for the unit
Policies and practices relevant to diagnostic testing
Operational policies and practices, factors that affect the mode of operation of the apparatus
Historic Asset Performance Data Assessment- for guidance

The scoring:
• Green – No issues with any aspect that would hasten early failure, too early in product life
to identify dominant failure modes.
• Orange – Some issues that might hasten early failure, some performance issues
• Red – Application likely to hasten early failure, performance issues with design and
possible to identify dominant failure modes.
• Black – Assets where company policy is to replace all assets with this design as soon as
practicable, due to unacceptable performance.
Table 4.1.5 – Scoring Historic data
Columns: Information, Result, Score.
This is typical of the information used; not all will be available.
a – Age relative to estimate for asset lifetime
Result: % used
Score: >150% Red; 75-150% Orange; <75% Green
b – Poor performers
Information:
- Failure rates higher and shorter lives than other designs
- Limitations revealed from failures and tear downs on this design group
- Activity description of past major repair or modifications to similar designs, as well as details of
any investigation undertaken on the specific asset.
Result: To be able to identify any design limitations leading to shorter lives, and to indicate related
failure modes that are more likely with this design group.
Score: Black / Red / Orange / Green
c – Operation and maintenance records
Information:
- Number and duration of planned and forced outages for the unit, for calculation of reliability KPI.
- Maintenance history on the unit
- Maintenance man days used, and material cost spent on the unit, for calculation of cost KPI, and as
element in lifetime cost calculation
- Disruptive consequences of past outages of the unit on the system, for use in future risk assessment.
- Failure investigation reports on this and similar design units
- Activity description for any past major repair or modification
- Lifetime Cost Analyses
Result: List of malfunctions, failure modes to identify unreliable units, high ongoing cost units and
multiple repair needs.
Score: Black / Red / Orange / Green
d – Operating environment exposures
Information:
- System load and voltage levels, overload instances, load power factor
- Prior through faults
- Pollution and humidity levels
- Lightning levels
- Natural disasters – earthquakes, floods, hurricanes, etc.
- System Fault frequency/current/time levels
- Protection system
Result: Conditions that could lead to shorter lives
Score: Red / Orange / Green
e – Historic Event Data
Information:
- Event records from SCADA, which may indicate high frequency of system faults withstood by the unit.
- Known damage caused by events listed in d above
Result: Prior damage possibly giving life reduction.
Score: Red / Orange / Green
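Row a of Table 4.1.5 is the only row with numeric bands, so it is the easiest to automate. The sketch below assumes the bands are read as "above 150% is Red, 75% to 150% inclusive is Orange, below 75% is Red-free Green"; the function name and the example lifetimes are hypothetical.

```python
def age_score(age_years: float, expected_life_years: float) -> str:
    """Score row a of Table 4.1.5 from the share of estimated lifetime used."""
    if expected_life_years <= 0:
        raise ValueError("expected_life_years must be positive")
    used_percent = 100.0 * age_years / expected_life_years
    if used_percent > 150:
        return "Red"
    if used_percent >= 75:
        return "Orange"
    return "Green"


# Hypothetical units with an estimated asset lifetime of 50 years.
for age in (30, 45, 80):
    print(age, age_score(age, expected_life_years=50))
```

Rows b to e rely on engineering judgement and documents rather than thresholds, so they are not reduced to code here.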
Table 4.1.6 – Level assessment – example of review of failure mode susceptibility factors
Asset register data from Table 4.1.5 | a: Age | b: Design | c: Costs | d: Utilisation | e: Events | Example summary score
Unit 1 | | | | | | d, e
Unit 2 | | | | | | a, d
Unit 3 | | | | | | b
Columns a to e are scored black/red/orange/green.
Transformers and reactors

It should be appreciated that Study Committee A2 have recently produced a technical brochure
TB 761, [B3]. Their structure produces a series of “transformer assessment indices”. These are for
identifying the need to maintain, refurbish and replace – all in prioritised listing by condition and timing.
The difference between the two brochures is that here the focus is to produce a listing in terms of
likelihood to fail within a defined time scale. Like the current B3 work, the TB 761 methodology for AHI is
not based upon age or statistically derived lifetimes. As here, it starts with an identification of the
dominant but varied failure modes and their diagnostic indicators within a fleet, considering the main
unit and each of the accessories in so far as their failure impacts upon the whole unit. It expresses very
similar advice on scoring, aggregation and weighting of scores to that described here in Chapter 3.
Step 1: Identify the assets and decide on review levels

Within this category are the major high value assets – power transformers and reactors, and these
would be treated similarly. Also included, but with restricted Level 3 reviews, would be:
• Air cored reactors,
• Arc suppression (Petersen) coils,
• Auxiliary and earthing transformers.
For most substation transformers and reactors it would be a normal requirement to migrate quickly
away from an asset health index having a simple Level 1 confidence to one of the more
comprehensive Levels, 4 and 5 in Figure 2.1. Which level would depend on cost benefit factors
included in the following list:
• Transformer and reactor purchases in transmission and generation utilities have a big impact on
capital expenditure (CAPEX) budgets and it is important to identify likely life expectancies for the
utility stock in order to justify investment planning requirements.
• Unlike switching assets, many of the failure modes in large transformers are not repairable on
site and usually lead to asset replacement. However, operating in a colder climate a network unit
in a shared load N-1 configuration can last up to 80 years before these occur [B10].
• GSU and interconnector failures present a risk to the continuity of the grid and its income earning
capability. An unexpected early life failure can have a significant impact. For network units the
risks are less when in N-1 parallel operation.
• Some outage times can be considerable, depending upon the spares holding and logistics of
transporting the replacement to site.
• Some failure modes are catastrophic with safety and environmental implications from fires,
explosions and release of insulating fluids into the environment. This is particularly the case
where bushing failures are often catastrophic and destroy the transformer in the process. On-line
diagnostics is valuable where this is a perceived risk.
• Some important failure modes require offline diagnostics – this includes mechanical deformation
of the winding, clamping and core movement, and ageing assessed by DDF/capacitance measurements. Oil
testing to reveal bushing degradation can be effective, but some utilities have restrictions on oil
sampling of factory sealed equipment [B5]. Gaining an outage to allow oil sampling and/or offline
testing is a problem for some.
Purchase files
• The purchase documents should include design information such as the winding configuration,
materials and manufacturing processes. They should also identify the oil preservation system, tap
changer, and cooling system. The duty should be identified – network, GSU, interconnector etc.
• Performance of the unit or design group in factory acceptance tests. This would include poorer
performance in tests, such as heat run results that could affect operation both at normal
thermal ratings and under overloads.
• Original specifications define the user’s requirements for ambient temperatures, load, voltage,
load power factor, source impedance, lightning levels, short circuit withstand and acceptable
losses. Some of these factors may now have evolved such
that these requirements are inadequate for the current and future operating environments. The
purchase specification is, therefore, an important document to review in light of current
manufacturing standards, and against the actual operating environment.
• In light of these specifications, discussions and documentation within the utility should indicate
any significant requirement to operate the unit above specified conditions, exposure to unusual
levels of short circuits or switching transients, DC carry through, harmonics, extent of reverse
power flow, etc. together with historic relocations, whether system voltages are at the top of the
voltage range and over-fluxing is a possibility, and whether fault frequency or levels are higher
than specified. The latter could occur if source impedance or earth impedance have changed, or
protection is slower than specified.
• Standards used at the time of manufacture which may now be considered to have been
inadequate are identified; examples are withstand tests such as BIL. Designs have also changed
over the years: Roebel conductor transpositions, interleaving, winding in discs rather than
layers, directed oil flow and belted cores rather than core bolts are all examples of changes that
improve thermal performance and impulse withstand. Identifying the design practice at build is
important, therefore, to predict future performance in these areas.
Failure Mode Susceptibilities

The review of the foregoing data should allow an assessment of the susceptibility factors as outlined in
Table 3.2. This can then be built with the asset register to provide the basis for creating the AHI by
seeking a link between susceptibility factors, failure mode evidence from similar vintage and designs
and the diagnostics to be used.
Step 2: Perform FMEA and identify condition indicators to be used

Historically many users have based transformer and reactor lifetime upon just one dominating failure
mode, that is long term ageing of the winding insulation. It is certainly true this will be the ultimate
determinant of life of equipment having a paper covered winding. Ageing occurs by depolymerisation
brought about by high loading levels and consequentially the time exposed to elevated temperatures.
The relationship between depolymerisation and temperature was derived from laboratory studies on
small samples sealed in oil exposed to temperatures above 100 °C. The IEC 60076-7 and IEEE C57.91
standards reflect this long-held approach, and if this really were the dominant failure mode then a
bathtub wear out with an identifiable onset of unreliability would indeed be observed. But real life is
more varied with temperature rise on loading being influenced by a range of starting/ambient
temperatures, variable design quality yielding a more varied relationship between estimates of hot
spot temperatures from temperature measurements, changes in oil temperature or average winding
resistance. Depolymerisation is also significantly affected by moisture and oxygen levels in the oil. The
range of variables thus introduced contributes to the variability in the rates of paper ageing being
found.
Moreover, transformer failures that do occur come from a variety of other causes. When a range of
failure modes is found within a utility stock, this prevents any simplistic application, to a wider
population of transformers, of bath-tub failure characteristics and a predictable onset of
unreliability from this one cause of failure [B1].
It is the responsibility of SC A2 to identify detailed failure mode charts that cover the wide range of
ways that a transformer may fail and produce a detailed FMEA. They do this in TB 761 in far greater
detail than here. But in essence they should identify failures possible in basic design areas and the
range of relevant diagnostic indicators. Table 4.2.1 summarises the general areas for such an analysis
and the utility should identify the diagnostic strategy to be applied for their particular fleet.
Table 4.2.1 – Common faults and indicators (simplified list)
Failure Mode Design Area | Effect | Indicator
Tank with rusting and perforation, deteriorated gaskets and poor conservator seals/oil containment, helmet issues in bushings. Connection overheating | Oil loss, low oil levels and moisture ingress. Reduced insulation strength can lead to flashover across gaps and surfaces | Visual inspection. Infra-red. Oil tests
Oil quality issues from ageing, moisture ingress, contamination, corrosive sulphur, etc. These may be caused by oil type, defective moisture control, failures of gaskets and pumps. | Accelerated ageing, flashover across barriers, loss of insulation quality in both oil and paper caused by permeation of moisture or corrosive sulphur. Acids and sludge attack on core and windings. Destruction of critical function | Generally indicated by oil testing. Visual inspection of tank and pipes may indicate loss of oil and staining. Inspections of oil preservation system and oil escape. Dielectric off-line tests
Magnetic circuit core supports, core bolts or straps | Loss of mechanical support and movement, vibration, abrasion, overheating | DGA if sufficient for arcing or overheating. Offline diagnostics
Flux shunt issues, circulating currents at core earth and supports | Overheating and arcing | DGA and sometimes UHF-PD and infra-red surveys
Winding issues – Mechanical. Close up short circuits | Winding movement | Offline diagnostics of PF/C and SFRA
Winding issues – Dielectric. Surges and high inter-turn stresses, vibration, loosening and wear. Oil quality deterioration. Corrosive sulphur | Partial discharge. Inter turn faults. Internal flashovers. Winding insulation overheating | DGA tests from oil samples or continuous monitoring. UHF in-tank probes for PD. Offline diagnostics, DDF/C. Oil sampling and testing
Winding issues – Thermal. Hotspots, poor thermal design. Overloading. Operating at ambient temperatures higher than used for rating calculations. Malfunction of pumps, fans and control systems | Thermal degradation by depolymerisation of the insulation leading to inter turn failures. Connections | Oil sampling for DGA. Furan measurements. Off-line DDF/C. Winding resistance for connections. Infra-red scan for pumps and fans
Bushing failures due to ageing, design issues, moisture ingress. Connection failures. Loss of oil. External connection overheating | Overheating, PD, loss of insulation quality. Often explosive with safety and environmental impact. May lead to transformer fire | Visual inspection of oil level gauge for loss of oil. Broken sheds. On-line scanning UHF PD and IR. On-line monitoring for DDF/C and PD. Out of service DDF/C tests required
Tap changer issues from oil ageing, corroded contacts, stuck mechanisms | Overheating and partial discharge. Explosion risk | Position indicators, temperature sensing, DGA, UHF-PD and IR scans. Winding resistance
Protection – water ingress and corrosion, service outstanding, aged system | Malfunction and failure to perform | Visual inspection, testing
Pumps and fans not working or inefficiently | Winding overheating | Visual inspection, Infra-Red
Oil containment concrete cracks, leaks, water blockage | Environmental impact if oil spilt | Visual inspection
Step 3: Assess Individual Asset Performance

The aim is to gather performance data relating to specific assets. This may require only a restricted
activity for low cost distribution units. Much of the context has been given in Chapter 3, but in addition
the following points can be made.
The strength of a winding to withstand future faults is often reduced by relocations and prior close-up
short circuit faults. Risk assessment requires providing information on relocations, past fault
numbers/levels/durations and the current incidence of short circuit system and lightning events –
together with information on system impedance, the protection system used and status of earthing
systems. This may then be used with offline diagnostics such as sweep frequency response analysis
(SFRA) to identify any distortion that may have occurred.
Load history, including overload history, and ambient temperatures are important to the life of
insulation. This can be assessed using temperature data, and from results of any earlier internal
inspection. Included within this assessment is the effect of moisture estimated from oil testing. The first
stage is to review the load history over the life of the units. From this, representative periods are
selected and the loss of life is calculated using equations from relevant standards (e.g. IEEE C57.91)
and from temperature data. Ideally, the hotspot temperatures are available.
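As an illustration of such a calculation, the sketch below uses the ageing acceleration factor of IEEE C57.91 for thermally upgraded paper, F_AA = exp(15000/383 - 15000/(theta_H + 273)), with a reference hot spot temperature of 110 °C. The function names, the example periods and the normal insulation life of 180 000 hours are assumptions for illustration; the applicable standard and the utility's own end-of-life criterion should be checked before any real use. Moisture and oxygen effects, which the text notes are significant, are not modelled.

```python
import math


def ageing_acceleration_factor(hot_spot_c: float) -> float:
    """F_AA relative to a 110 degC reference hot spot (thermally upgraded paper)."""
    return math.exp(15000.0 / 383.0 - 15000.0 / (hot_spot_c + 273.0))


def loss_of_life_percent(periods, normal_life_hours=180_000.0):
    """periods: iterable of (hot_spot_c, duration_hours) for representative periods.

    normal_life_hours is an assumed value; use the figure adopted by the utility.
    """
    aged_hours = sum(ageing_acceleration_factor(t) * h for t, h in periods)
    return 100.0 * aged_hours / normal_life_hours


# Hypothetical representative periods: (hot spot temperature in degC, hours).
periods = [(80.0, 6000.0), (98.0, 2500.0), (120.0, 260.0)]
print(f"{loss_of_life_percent(periods):.3f} % of assumed normal insulation life")
```

At exactly 110 °C the factor is 1, so 1 800 hours at that hot spot temperature consume 1 % of the assumed normal life; cooler periods age the insulation more slowly and hotter ones faster.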
Step 4: Identify diagnostic strategy
Basic Asset health review Level 1

This would be an office-based study. All units would require this analysis. For residential transformers
it may be appropriate to go no further than a level 1 maturity level, see Figure 2.1.
Simple Asset health review Level 2

A traditional strategy would add a routine visual inspection noting condition abnormalities and basic
data such as trip records, see Table 4.2.3.
Intermediate Asset health review Level 3

For most transmission and sub-transmission units the aim would be to start with a Level 3 and apply
non-invasive diagnostics that do not require an outage. This includes at least oil testing for quality and
dissolved gases and apply to both main tank and tap changer compartments. More information
particularly relating to accessories can be gained from other surveys such as UHF-PD and Infra-Red
to detect discharge, arcing and overheating (see TB 660 [B5]).
Advanced Asset health review Level 4

Some failure modes, such as mechanical damage and related failure modes, as well as deterioration
in bushings, are identified by outage testing. Periodic outage testing would be appropriate where
these failure modes are considered a concern. It would also be appropriate after a close-up short
circuit. The AHI would therefore quickly progress to a Level 4 review.
Advanced Asset health review Level 5

Some failure modes can initiate and progress in a short time that is inconsistent with routine survey or
out of service testing. Then on-line diagnostics with permanently installed diagnostic systems is
appropriate. This is particularly the case with key transformers or where catastrophic failure could lead
to safety or environmental consequences. Monitoring would normally include partial discharge,
dissolved gases, relative humidity and bushing power factor. There is increasing interest in monitoring
transients, harmonics and power quality as more circuits include power electronic devices.

The diagnostic strategy should be identified as indicated from Step 3, and data collected. The data as
collected may be a qualitative observation (e.g. oil staining) or a measured value (e.g. 100 ppm of
hydrogen, or an inter-winding dielectric dissipation factor of 1.5%). The next
step is to interpret these data points in terms of criticality to failure modes and likelihood of failure.
Step 6: Evaluate Current Condition relative to key failure modes

Each data point needs to be scored with a scale code in either a log or a linear system. The
interpretation must always be consistent: the scale code indicates that the failure mode will induce a
transformer failure within a defined period, as indicated in Table 4.2.2. The periods shown are typical of what might
apply, but individual companies should define their own time criteria.
Table 4.2.2 – Scale code assignment
Scale code (log) | Scale code (linear) | Description | Fault free time | Remaining life
1 | 1 | Very good condition | | >25 years, or 40-80 years in network
3 | 2 | Good condition | 5-10 years before maintenance on OLTC, cooling system, oil etc. | 15-40 years
10 | 3 | Fair condition | 2-5 years before investigation. | 5-15 years
30 | 4 | Poor condition | 3-24 months before investigation. Review rating restrictions. | <5 years
100 | 5 | Critical condition | 0-3 months. Do not re-energise. Risk assess for possible restricted use to manage faulty unit. | 0-3 months
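The pairing of linear and log scale codes in Table 4.2.2 can be held as a simple lookup. The following is a minimal sketch in Python; the function names `to_log_code` and `describe` are illustrative assumptions and not part of the brochure.

```python
# Linear-to-log scale code pairs as listed in Table 4.2.2.
LINEAR_TO_LOG = {1: 1, 2: 3, 3: 10, 4: 30, 5: 100}

# Condition descriptions from the same table, keyed by linear scale code.
DESCRIPTION = {
    1: "Very good condition",
    2: "Good condition",
    3: "Fair condition",
    4: "Poor condition",
    5: "Critical condition",
}


def to_log_code(linear_code: int) -> int:
    """Return the log scale code paired with a linear scale code (1-5)."""
    if linear_code not in LINEAR_TO_LOG:
        raise ValueError(f"linear scale code must be 1-5, got {linear_code}")
    return LINEAR_TO_LOG[linear_code]


def describe(linear_code: int) -> str:
    """Return a short label combining both codes and the description."""
    log_code = to_log_code(linear_code)
    return f"{linear_code} (log {log_code}): {DESCRIPTION[linear_code]}"
```

Keeping both codes in one place helps ensure that the interpretation stays consistent whichever system a company chooses.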
Visual inspection
The first check is to confirm that the asset register data and the equipment actually in each bay are recorded correctly.
Results from an external visual assessment (see Figure 4.2.1) may be used to relate to failure modes and
relationships between observed data and risk of failure as per Table 4.2.3.
Table 4.2.3 – Visual Inspection
Feature | Step 5: Data or observations as collected | Step 6: Converted to scale code (scores 1-5 or log 1-100)
Concrete footings/oil containment walls.
Cracks or deterioration? Anchor bolts missing or rusty? Evidence of oil leak?
Earthing leads or straps
Oxidised/tight?
Tank issues
Tank rusting, overheating, missing bolts, oil leak stains, gasket deterioration,
Paint peeling or rusty tank?
Bulging of tank, indicating internal deformation or overheating?
Gasket deterioration? Oil leaks? Bushing liquid levels- levels, glass condition
Loose or missing nuts, bolts, or washers?
Liquid level in main tank, OLTC, oil expansion tanks?
Check Nitrogen system, bottles, pressure above 2 psi
Cooling and operation of radiators pumps and fans
Pumps and fans – check operation, rust, debris
Check gauges to see if cooling should be on (Top oil and hot spot),
Check which pumps and fans are on,
Note oil flow rate,
Note unusual noises
Check of radiators for correct temperature profile
Moisture control
Silica gel, refrigerants, nitrogen systems
Gauges and Controls
Viable readings, damage, wiring, terminals,
Check condition of cabinets check for water ingress, heater OK, wiring
condition, terminals tight/loose, contactors.
Check for excess heating in controls – discoloration of metals or charred
insulation, smell.
Bushings and surge arresters
Porcelains chipped and broken sheds
Oil levels
Connections sound
Surge Counts on arrester
Figure 4.2.1 – Inspection example findings – Tank rusting and a stuck WTI
Data from Non-invasive in-service periodic diagnostics

Outages can be difficult to obtain, which makes a realistic assessment hard to perform. It is increasingly common to
use non-invasive survey tools more generally, and for high impact/cost items an installed on-line
monitor. TB 660 [B5] describes the introduction of basic diagnostics into site surveys. Many utilities already
undertake routine infra-red thermography around the site and take oil and gas samples, for
example. Detection of partial discharge and arcing is less well established. A simple method
is a systematic patrol with a UHF scanner and an antenna, identifying partial discharge at higher
frequencies than those produced by corona and pollution discharge. Where a discharge is detected near a transformer it
can be investigated using either an HFCT or a UHF probe penetrating the tank through an oil valve (see
Figure 4.2.2).
Examples of techniques that can be used are shown in Figure 4.2.2. For details see reference [B8].
Table 4.2.4 – Survey test results
Test | Step 5: Data or observations, in units as collected | Step 6: Converted to scale code (scores 1-5 or log 1-100)
Oil testing – Oil quality for acidity and consequences of contamination
or deterioration. It should include testing for contaminating material such
as PCB and for corrosive sulphur.
Oil testing for paper ageing – This is a furanic compound test,
particularly looking for rate of change. Some research is indicating other
tests may be useful additionally.
Dissolved gases – Levels and rates of change for indicating active
thermal, PD or arcing fault.
OLTC selector – Oil quality, partial discharge and DGA. The diverter oil
is assessed from moisture and dielectric breakdown voltage
Infra-red surveys – A simple survey method using an infra-red camera
to detect high temperatures indicative of overheated joints, shunts, non-
operational fans, overheating in OLTC, bushing oil levels.
UHF-PD detection – A simple survey method using a UHF antenna and
scanner to identify partial discharge. By detecting above 100 MHz it
escapes the unwanted effects from corona and surface discharge.
Internal PD – If a PD source is indicated it can be further investigated
using an in tank UHF probe or an external HF CT wrapped around a
neutral strap – see Figure 4.2.2
Surge arrester – Compensated third harmonic current measurement
Oil – taking a sample of oil from a valve for laboratory analysis (top left).
Infra-red – a scan showing a cold, non-operating fan (top right).
UHF-PD – a frequency scan to detect high frequency radiation. PD is confirmed by frequency range and phase resolving.
PD within the tank – indicated using an HF CT or an in-tank UHF probe (both shown, bottom left).
Figure 4.2.2 – Some site diagnostics for use without outages
Installed monitoring
In addition, there are several permanently installed systems now available.
Table 4.2.5 – On-line monitoring
Outcomes from test methods – if used Score 1-5 or log 1-100

Online dissolved gas monitors for transformers are now common and have become more reliable.
There is even longer experience with on-line bushing monitoring of dielectric dissipation
factor.
Online monitoring of partial discharge of main windings or bushings is newer, and
interpretation is less defined.
Relative saturation sensors can give a reliable indication of moisture content in the windings
OLTC monitors for temperature and movement functionality
Vibration monitors to detect core loosening
Outage investigations
Unusual results from online survey methods are best investigated further with offline testing, targeted
at specific failure modes. Outage testing is also appropriate for detecting some failure modes.
Table 4.2.6 – Offline and investigative testing
Result of Tests – based on level and rates of change AHI Score 1-5 or log 1-100
Winding Dielectric Dissipation Factor (DDF)/capacitance detecting deteriorated primary
insulation, mechanical deformation
Bushing DDF/capacitance detecting deteriorated primary insulation
Dielectric frequency response giving some indication of moisture content
Turns ratio detecting winding conductor issues
Leakage reactance and excitation current detecting issues in the magnetic circuit
Sweep Frequency Response Analysis (SFRA) detecting winding movement
Winding DC resistance for connection issues in winding and tap changer
Core earth resistance (for core form designs) for single point or loss of earthing

Each of the failure modes listed in Table 4.2.7 should at this point have their own scale code assigned.
The next step is to aggregate to a single scale code for the complete transformer.
Table 4.2.7 – Common faults, indicators and scoring for AHI
Failure modes | Indicator | Score 1-5 or log 1-100
External – Tank/oil containment | Visual inspection |
Oil ageing, contamination | Oil testing, visual inspection |
Magnetic circuit and mechanical integrity – winding, core | Offline diagnostics |
Dielectric – Winding issues – surges and inter-turn stresses, vibration, loosening and wear, hotspots | DGA tests, UHF in-tank probes for PD and offline diagnostics |
Thermal – Flux shunt, circulating currents, winding hotspots. Malfunction of pumps, fans and control systems | DGA and infra-red and sometimes UHF-PD surveys |
Tap changer – oil ageing, corroded contacts, stuck mechanisms | Position indicators, temperature sensing, DGA, UHF-PD and infra-red scans, winding resistance |
Bushings – Ageing, design issues, moisture ingress, connection failure | Oil level gauge, UHF-PD and infra-red, out of service tests required |
Cooling – Pumps and fans faults | Visual inspection, infra-red scan |
Protection – Water ingress and corrosion, service outstanding, aged system | Visual inspection, testing |
Oil Containment – Concrete cracks, leaks, water blockage | Visual inspection, testing |
The scoring in Table 4.2.7 is used to compile the AHI Table 3.6 with the ten failure modes. Whilst
there may be six bushings each with several failure modes, it is their collective assessment that is
used. If assembled using a log score, the summation (column 9) can be displayed as a list from most
likely to least likely to fail. Displaying the results as a spreadsheet also has added value over a single number
produced by an algorithm, because the engineer can see how the AHI numbers are derived.
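The summation and ranking described above can be sketched as follows. This is a minimal illustration only: the unit names and the scale codes are invented, and the ten values per unit are assumed to follow the order of the failure modes in Table 4.2.7.

```python
# Hypothetical log scale codes (1, 3, 10, 30, 100) for the ten failure modes
# of Table 4.2.7, in table order. Unit names and values are illustrative only.
fleet = {
    "T1": [1, 3, 1, 1, 3, 10, 3, 1, 1, 1],
    "T2": [3, 10, 1, 30, 3, 3, 100, 1, 3, 1],
    "T3": [1, 1, 1, 3, 1, 3, 1, 1, 1, 1],
}


def rank_by_log_sum(scores_by_unit):
    """Return (unit, summed log score) pairs, highest sum first."""
    totals = {unit: sum(codes) for unit, codes in scores_by_unit.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_log_sum(fleet)
for unit, total in ranking:
    print(f"{unit}: {total}")
```

With a log scale a single critical code (100) dominates the sum, so the unit carrying it rises to the top of the list while the individual codes remain visible for review.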
Step 8: Identify mitigation actions to improve AHI

This would include options such as an oil change or processing, which could produce the greatest cost
benefit. Other options could include a bushing or tap changer exchange where these were creating the
poor score. Connection issues can sometimes be rectified. A mitigated score can then be calculated
(column 12 of Table 3.6) and a cost-benefit analysis undertaken to assess the value of the mitigation.
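One possible way to compare mitigation options is to recompute the summed log score with the improved codes and relate the reduction to the cost of the action. The sketch below assumes this simple ratio as the cost-benefit measure; the scores, costs and option names are invented for illustration and are not taken from the brochure.

```python
def mitigated_total(log_codes, improvements):
    """Sum log scale codes after replacing selected failure-mode codes.

    improvements maps a failure-mode index to the code expected after the action.
    """
    return sum(improvements.get(i, code) for i, code in enumerate(log_codes))


# Illustrative unit: poor oil (index 1) and a critical bushing (index 6).
current = [3, 30, 1, 3, 3, 3, 100, 1, 3, 1]

# Each option: (expected improvements, assumed cost in arbitrary currency units).
options = {
    "oil processing": ({1: 3}, 10_000),
    "bushing exchange": ({6: 1}, 60_000),
}

current_total = sum(current)
benefit_per_cost = {}
for name, (improvements, cost) in options.items():
    reduction = current_total - mitigated_total(current, improvements)
    benefit_per_cost[name] = reduction / cost

best_option = max(benefit_per_cost, key=benefit_per_cost.get)
```

A utility would normally replace the simple ratio with its own risk and cost model; the point of the sketch is only that the mitigated score is derived from the same table as the current score.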
Circuit breakers
Step 1: Identify assets and decide review level
Figure 4.3.1 – Live tank (left) and dead tank (right) circuit breakers
Figure 4.3.1 shows two of the main types of breaker in use today in air insulated substations [B9]. Either the
interrupters are at high voltage with the operating mechanism at the bottom of the insulated
stack (i.e. “live tank”), or all is at earth potential (i.e. “dead tank”). A third category is the GIS breaker,
which is fully enclosed within the GIS trunking.
Over the years the insulating medium and arc extinction principle have changed from oil and air blast to
SF6. The SF6 breaker has been the preferred choice at transmission voltages for many years, but all
types are still found in many countries. There is now increased interest in developing alternative or mixed
gases and also in using vacuum. A circuit breaker has two states: it is either fully insulating and
withstanding the system voltage across the terminals, or fully conducting and passing system current
through the contacts. The breaker needs to be able to pass from one state to the other in a few
milliseconds and to close circuits without generating excessive overvoltages on the system. Selecting
the number of circuit breakers to install at a substation is important to allow sections to be isolated with
minimal disruption.
The major application is essentially passive: the breaker is there to operate and protect primary assets (and
hence provide system security) should the system be subjected to an abnormal fault current. The
breaker then operates and the fault current in the circuit is interrupted. In contrast, there is a second
application where the breaker is operated hundreds of times a year to manage system
current and voltage by switching in reactors and capacitor banks. In the former application operation is
initiated by the protection system; in the latter it is initiated by a control command. The command to open is made
by sending tripping signals to the circuit breaker mechanism trip coils. Each interrupter consists of a
fixed part and a moving part. When the interrupter is opened, the SF6 gas is propelled through the
created gap, forcing the arc to be extinguished.
There are a number of different types of mechanism that store the energy required for driving the
moving contact. The most common today is the spring type where the opening and closing strokes of
the circuit breaker are performed by releasing charged springs. Earlier designs used hydraulic or
pneumatic drives or charged capacitors.
Since high voltage circuit breakers provide critical protection of high value primary assets such as
generators and transformers, as well as facilitating correct operation of the system, they warrant one
of the highest levels of investment in AHI analysis.
Step 2: Perform FMEA and identify condition indicators
Typical failure modes and condition indicators

This section describes performance evaluation for circuit breakers, which is an essential step to
develop a health index. Asset managers should determine which condition indicators to use for health
index calculation.
The first step is to consider the failure rate of each failure mode in order to select the critical ones. TB 510 [B2]
provides useful information about failure statistics, including typical failure modes which have a relatively
high probability of failure for circuit breakers. The condition indicators related to
these critical failure modes are therefore a sensible choice for health index calculation. TB 510
reports on the major failures of 840 circuit breakers and provides a list and distribution of 25 major
(MaF) and minor (MiF) circuit breaker failure categories (see Table 4.3.1 below).
Table 4.3.1 – Distribution of CB failures per cause
Primary cause Code MiF [%] MaF [%]
Design fault (manufacturer responsibility) 1 6,1 5,8

Engineering fault (utility responsibility) 2 1,5 0,6
Manufacturing fault (poor quality control) 3 3,2 7,6
Incorrect transport or erection 4 0,8 3,6
Inadequate instructions for transport, erection or operation 5 0,3 0,0
Other 6 1,6 2,3
Current in excess of rating 8 0,0 0,2
Voltage at power frequency in excess of rating 9 0,0 0,4
Switching overvoltage in excess of rating 10 0,0 0,2
Lightning overvoltage in excess of rating 11 0,1 1,8
Mechanical stress in excess of rating 12 0,4 1,3
Environmental stresses (other than lightning) in excess of ratings 13 0,7 0,8
Corrosion 14 12,3 4,4
Wear, ageing 15 55,9 42,3
Incorrect operation 16 0,4 1,1
Incorrect monitoring 17 0,1 1,7
Electrical failure of adjacent equipment 18 0,0 0,8
Mechanical failure of adjacent equipment 19 0,2 1,3
Human error 20 0,3 0,7
Incorrect maintenance (incl. inadequate instruction for maintenance) 21 1,1 3,3
External damage caused by animals, humans etc. 22 0,3 1,0
Other abnormal service conditions 23 0,6 2,3
Unknown other causes 25 14,1 16,5
Additionally, TB 510 provides Table 4.3.2, which identifies the MaF modes:
Table 4.3.2 – MaF modes
Failure mode for MaF

1 Does not close on command
2 Does not open on command
3 Closes without command
4 Opens without command
5 Fails to carry current
6 Breakdown to earth in closed position
7 Breakdown to earth during a closing operation
8 Breakdown to earth in open position
9 Breakdown to earth during an opening operation
10 Breakdown between poles in closed position
11 Breakdown between poles during a closing operation
12 Breakdown between poles in open position
13 Breakdown between poles during an opening operation
14 Breakdown across pole (internal) during a closing operation (does not make the current)
15 Breakdown across pole (internal) in open position
16 Breakdown across pole (internal) during an opening operation (does not break the current)
17 Breakdown across pole (external) during a closing operation
18 Breakdown across pole (external) in open position
19 Breakdown across pole (external) during an opening operation
20 Locking in open or closed position (alarm has been triggered by the control system)
21 Loss of mechanical integrity (mechanical damages of different parts like insulators, etc.)
22 Other
Both tables can assist asset managers in evaluating the performance of circuit breakers
and in identifying the condition indicators that are essential for a specific failure mode.
TB 167 [B18] also provides detailed information about the relationship between condition indicators and
components’ functions, which can be associated with failure modes. Although these
references are highly informative, asset managers still have to consider the specifics of their
own assets in order to determine suitable condition indicators.
Defects and failures for circuit breakers

For the readers’ reference, the typical defects, the failure modes and the results of condition monitoring
and testing are enumerated below. TB 510, TB 737 and TB 167 describe these matters in detail [B2],
[B17] and [B18]. The results of the survey presented in TB 510 demonstrated that:
• Live tank circuit breakers have failure frequencies approximately three times higher than
dead tank, GIS and metal enclosed circuit breakers.
• The reliability of all kinds of circuit breakers is lower at higher voltage levels.
• Circuit breakers switching shunt reactors and capacitors are the most unreliable of all kinds, and
this trend is valid for all types of circuit breakers (enclosures). As noted earlier, these breakers
are more heavily used, which is the likely explanation of the poorer failure performance.
• For major failures the total failure frequency of the operating mechanism is 0.14 MaF per
100 CB-years. Comparing this value with the failure frequency of all circuit breakers of 0.30 MaF
per 100 CB-years, it can be assumed that about half of all major failures of circuit breakers are
related to operating mechanisms.
• Hydraulic mechanisms have the highest failure frequency and spring mechanisms have the
lowest failure frequency. Pneumatic mechanisms have a slightly higher failure frequency than
spring mechanisms.
• In the case of failure modes, the dominating ones are: “does not close on command”, “does not
open on command” and “locking in open or closed position”.
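The "about half" statement for operating mechanisms follows directly from the two failure frequencies quoted from TB 510. A short check, written in Python with variable names chosen here for illustration:

```python
# Major failure (MaF) frequencies quoted from TB 510, per 100 CB-years.
maf_operating_mechanism = 0.14
maf_all_circuit_breakers = 0.30

# Share of all major failures that is related to the operating mechanism.
mechanism_share = maf_operating_mechanism / maf_all_circuit_breakers
print(f"Operating mechanism share of major failures: {mechanism_share:.0%}")
```

The ratio is a little under one half, which supports giving the operating mechanism a prominent place when condition indicators are selected.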
Failure susceptibility indicators

It is also imperative to take the so-called “Failure Susceptibility Indicators (FSI)” into account for the
condition estimation. These are factors which increase or accelerate the tendency to fail but are not
categorized as condition indicators. For example:
• Location of the equipment
• Environmental zone (temperature/humidity/earthquake/sea site, etc.)
• Poor design
• Poor quality control in manufacturing/installation stage
• Common failure mode
• Indoor vs outdoor installation
• Application (Capacitor/Shunt reactor switching vs Cable)
Table 4.3.1 shows multiple failure causes that could be attributed to FSIs and their impact on the major
and minor failures of circuit breakers.
If there is an obvious FSI influencing the performance of the circuit breakers, asset managers should
monitor it and take proper action to improve the situation. Depending on the
model selected to predict the health indices, these factors can be reflected quantitatively
in the calculation of the health index.

It should be noted that Study Committee A3 has an active Joint Working Group A3.43 “Tools for
lifecycle management of T&D switchgear based on data from condition monitoring systems”, whose
scope includes identifying the critical condition indicators of T&D switchgear and
establishing criteria for developing analytical tools to support switchgear health assessment. The
materials developed by JWG A3.43 would complement the process of developing the AHI for circuit
breakers by providing a deeper understanding of the critical condition indicators specific to the
circuit breakers’ FMEA. The analysis provided by JWG A3.43 would be applicable to Levels 4 and 5
of the circuit breaker health review. At the same time, the methodology for establishing and
aggregating the AHI developed by WG B3.48 will guide the AHI process for circuit
breakers and other T&D switchgear components which are part of the substation.
Details of FMEA are well described in several documents, such as IEEE C37.10 “IEEE Guide for
Investigation, Analysis and Reporting of Power Circuit Breaker Failures” [B19], IEEE C37.10.1 “IEEE
Guide for the Selection of Monitoring for Circuit Breakers” [B20] and CIGRE TB 737 [B17]. The latter
established the correlation between the circuit breaker components and various diagnostic methods
(see Figure 4.3.2).
Figure 4.3.2 – Measured physical parameters in switchgear condition evaluation
The IEEE C37.10 Guide includes twelve different tables, which describe the diagnostics of various types
of circuit breakers and the characteristics of these assets [B19]. Based on this information and the various
previously established failure modes, the condition indicators for each failure mode should be
determined (see Table 4.3.3 below). The table contains the typical failure modes for the
components which contribute to the primary functions and the related condition indicators. These are
mainly based on the structure presented in TB 167 and TB 737 ([B18] and [B17]).
Note that the listed condition indicators are the ones regarded as essential/high rank in the
maintenance/life cycle assessment in TB 167, or the ones described as standard practice in the field
in TB 737, respectively.
The brackets in the Condition indicators column indicate the respective component from the
Components column.
Table 4.3.3 – Examples of condition indicators related to components and failure modes
Components: Primary path, Interrupting Chamber, Mechanical Linkages (Switching, Insulation, Current carrying): Common (C), SF6 circuit breakers (G), Air circuit breakers (A), Oil circuit breakers (O), Vacuum circuit breakers (V)
Failure modes: Does not operate on command; Failure to carry current; Change in electrical functional characteristics; Loss of insulation; Gas/Air/Oil leakage; Insulation breakdown to earth or between the phases; Failure to interrupt current under normal or abnormal circuit condition
Condition indicators: Contact resistance (C); Gas (humidity) condition (G); Gas leak rate (G); Oil condition (O); Partial Discharge (C); DDF/Capacitance (C); Radiographic (C); I2T recording (C); Primary circuit insulation resistance (C); Primary path temperature (C); Make/Break/Arcing Time (C); Vacuum condition (V); Primary contacts wipe (C); Vibration (C)

Components: Mechanical drive: Common (C), Hydraulic operated (H), Pneumatic operated (P), Spring operated (S)
Failure modes: Does not operate on command; Change in mechanical functional characteristics; Mismatched (unexpected) operation; Loss of pressure in hydraulic mechanism; Fluid leakage; Hardened lubricant; Springs (close/open) not fully charged
Condition indicators: Charged/Discharged indicator (S); Main contacts velocity (C); Opening/Closing/Arcing Time (C); Number of fault operations (C); Mechanical operations counter (C); Contact travel curve (C); Coils current (C); Number of motor pump starts (H); Motor current (C); Recharging (running) time (C)

Components: Control and auxiliary circuits
Failure modes: Does not operate on command; Undesired reclosing after the fault interruption; Abnormal operation
Condition indicators: Control circuit insulation resistance (C); Close/Trip coil current (C); Control voltage drop (C); Position and condition of auxiliary switches (C)
Step 4: Identify the condition indicators to be used

Once condition indicators are identified, it is imperative to evaluate their criticality in order to determine
which indicators to use for health indices. A recommendation is to consider both “Correlation
to primary function” and “Detectability” of each indicator, which can be enabled by FMEA
consideration described above. It does not make sense to detect a critical failure mode by a condition
indicator with poor detectability or vice versa.
Table 4.3.4 shows an example of condition indicator estimation for circuit breakers. By compiling such a
quantified ranking list, utilities can review and improve their own maintenance policies,
since the list shows what present maintenance is missing in terms of the criticality
of condition indicators. This estimation is also useful to determine which indicators are worth
checking more frequently by online monitoring. In this case, a cost-benefit analysis should be done that
considers the cost of monitoring.
Table 4.3.4 lists, as an example, condition indicators for circuit breakers estimated with
regard to their criticality, i.e. correlation to primary function and detectability. TB
167 [B18] describes a “Rank” for each condition indicator with regard to life assessment, which is
informative for estimating the correlation to the circuit breakers’ primary functions of making, carrying and
breaking currents under normal and specified abnormal circuit conditions. TB 737 introduces a
“Degree of maturity” for each condition indicator, which is useful for estimating the detectability [B17].
In the example of Table 4.3.4, a detectability score of 1 indicates that a failure is easy to
detect and a score of 10 indicates that it is hard to detect.
Table 4.3.4 – Example of condition indicator estimation for circuit breakers
No. | Condition indicator | Correlation to primary function | Effectiveness or detectability of diagnostic (score 1 – 10)
1 | Make/Break/Arcing time | Switching function | 3
2 | Contact resistance of primary path | Carrying function | 5
3 | I2T recording | Switching/Carrying functions | 3
4 | Primary circuit insulation resistance | Insulating function | 4
5 | Gas leak rate | Switching function | 2
6 | Control circuit insulation resistance | Switching function | 7
7 | Main contacts velocity | Switching function | 1
8 | Primary contact wipe | Carrying function | 4
9 | Partial Discharge | Insulating function | 6
10 | Spring recharging time | Switching function | 4
11 | Vacuum condition | Switching/Insulating functions | 6
12 | Number of motor pump starts | Switching function | 3
13 | Primary path temperature | Carrying function | 7
14 | Dielectric dissipation factor/Capacitance | Insulating function | 5
15 | Radiographic | Switching function | 8
16 | Vibration | Switching function | 7
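A ranking list of this kind can be held as data and filtered by primary function. The sketch below uses the rows of Table 4.3.4 as listed; the function name `easiest_first` and the lower-case function labels are assumptions made for the example.

```python
# Rows of Table 4.3.4: (condition indicator, primary function(s), detectability).
# Detectability score 1 = easy to detect, 10 = hard to detect.
INDICATORS = [
    ("Make/Break/Arcing time", {"switching"}, 3),
    ("Contact resistance of primary path", {"carrying"}, 5),
    ("I2T recording", {"switching", "carrying"}, 3),
    ("Primary circuit insulation resistance", {"insulating"}, 4),
    ("Gas leak rate", {"switching"}, 2),
    ("Control circuit insulation resistance", {"switching"}, 7),
    ("Main contacts velocity", {"switching"}, 1),
    ("Primary contact wipe", {"carrying"}, 4),
    ("Partial Discharge", {"insulating"}, 6),
    ("Spring recharging time", {"switching"}, 4),
    ("Vacuum condition", {"switching", "insulating"}, 6),
    ("Number of motor pump starts", {"switching"}, 3),
    ("Primary path temperature", {"carrying"}, 7),
    ("Dielectric dissipation factor/Capacitance", {"insulating"}, 5),
    ("Radiographic", {"switching"}, 8),
    ("Vibration", {"switching"}, 7),
]


def easiest_first(function):
    """List (indicator, score) for one primary function, easiest to detect first."""
    rows = [(name, score) for name, funcs, score in INDICATORS if function in funcs]
    return sorted(rows, key=lambda row: row[1])


carrying = easiest_first("carrying")
```

Sorting within each primary function makes it easier to see where a critical function is covered only by indicators that are hard to detect.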

This section is mainly concerned with the ways to collect typical condition indicators and thereby to determine
the “Review Levels” for condition indicators.
Grid integrity (in/off service) in TB 737

TB 737 [B17] covers “Grid Integrity” for each condition indicator, which summarises whether or
not a certain condition indicator can be measured while the equipment is in service.
Thanks to progress in monitoring and diagnostic technology, some indicators can be taken without
an equipment outage. In this respect, the condition indicators taken in service correspond to
review Level 2 or 3, namely “Visual inspection” and “Diagnostic – in service”, respectively.
The condition indicators taken out of service correspond to review Level 4,
“Diagnostic – Out of service”.
Continuously monitored (C)/Periodically diagnosed (P) in TB 167

TB 167 classifies the condition indicators according to when they are measured, i.e. “Continuously
monitored (C)” and “Periodically diagnosed (P)”. Asset owners can determine the review level for the
condition indicators by factoring in these perspectives. It is imperative to consider the criticality of the
condition indicators in the light of the relevant failure modes and cost, as previously discussed.
Review Level for condition indicators

Table 4.3.5 shows the typical condition indicators for circuit breakers and the associated Review
Levels, the Grid Integrity and the category “Continuously monitored (C)” or “Periodically diagnosed (P)”,
as quoted from TB 167 and TB 737.
Note that although Table 4.3.5 shows the typical Review Level for the condition indicators, asset
owners should determine which level to adopt taking into account the criticality of the failure modes, i.e.
correlation to primary function and detectability, and cost, as discussed in detail in the previous
section.
Table 4.3.5 – Review Level, Grid Integrity and C/P for condition indicators
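Review Level (see Figure 2.1) | Grid Integrity | C or P | Typical condition indicators
5 Online monitoring – in service | In service | C | Primary circuit current and voltage
5 Online monitoring – in service | In service | C | Gas/Oil condition
5 Online monitoring – in service | In service | C | Contact temperature
5 Online monitoring – in service | In service | C | Partial discharge monitoring
5 Online monitoring – in service | In service | C | Make time
5 Online monitoring – in service | In service | C | Break time
5 Online monitoring – in service | In service | C | Electromagnetic emission monitoring of switching transients
5 Online monitoring – in service | In service | C | Trip and close coils current monitoring
5 Online monitoring – in service | In service | C | Acoustic and vibration monitoring of the operating mechanism
4 Diagnostic – out of service | Out of service | P | Primary path insulation resistance
4 Diagnostic – out of service | Out of service | P | Contact resistance
4 Diagnostic – out of service | Out of service | P | Opening time
4 Diagnostic – out of service | Out of service | P | Closing time
4 Diagnostic – out of service | Out of service | P | Contact travel curve
4 Diagnostic – out of service | Out of service | P | Contact velocity
4 Diagnostic – out of service | Out of service | P | Contact wipe
4 Diagnostic – out of service | Out of service | P | Contact wear
4 Diagnostic – out of service | Out of service | P | Timing of auxiliary contacts
4 Diagnostic – out of service | Out of service | P | Control wiring insulation resistance
4 Diagnostic – out of service | Out of service | P | Grading capacitor capacitance and DDF
3 Diagnostic – in service | In service | C | Cumulative current switched (I2t)
3 Diagnostic – in service | In service | C | Break time
3 Diagnostic – in service | In service | C | Make time
3 Diagnostic – in service | In service | C | Arcing time
3 Diagnostic – in service | In service | C | Detection of reignition
3 Diagnostic – in service | In service | C | Detection of prestrike
3 Diagnostic – in service | In service | P | Operating temperature (IR scanning)
3 Diagnostic – in service | In service | C | Recharging time of the operating mechanism
3 Diagnostic – in service | In service | C | Supply control voltage
2 Visual inspection | In service | P | Environmental conditions
2 Visual inspection | In service | P | Pollution
2 Visual inspection | In service | P | Insulating surfaces contamination
2 Visual inspection | In service | P | Corrosion present
2 Visual inspection | In service | P | Gas leak rate
2 Visual inspection | In service | P | Air/Gas pressure
2 Visual inspection | In service | P | Oil level
2 Visual inspection | In service | P | Number of operations
2 Visual inspection | In service | P | Condition of lubrication
2 Visual inspection | In service | P | Visible leaks
1 Office study | In service | P | Specification
1 Office study | In service | P | OEM design and insulation type
1 Office study | In service | P | Operating mechanism
1 Office study | In service | P | Nameplate information
1 Office study | In service | P | Switching application (overhead line, transformer, cable, shunt reactor, capacitor, bus coupler, others)
1 Office study | In service | P | Spare assets/parts available
1 Office study | In service | P | Maintenance records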
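The classification in Table 4.3.5 lends itself to a small catalogue that can be queried when planning a review. The sketch below holds only an excerpt of the table. It assumes that a review at a given level also uses the indicators of the lower levels, which is an interpretation made for this example, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    review_level: int   # 1-5, see Figure 2.1
    in_service: bool    # Grid Integrity: measurable without an outage
    continuous: bool    # True = "C" (continuously monitored), False = "P"


# A small excerpt of Table 4.3.5, not the complete list.
CATALOGUE = [
    Indicator("Partial discharge monitoring", 5, True, True),
    Indicator("Contact resistance", 4, False, False),
    Indicator("Contact travel curve", 4, False, False),
    Indicator("Operating temperature (IR scanning)", 3, True, False),
    Indicator("Arcing time", 3, True, True),
    Indicator("Oil level", 2, True, False),
    Indicator("Maintenance records", 1, True, False),
]


def indicators_for(max_level, outage_allowed=False):
    """Names of indicators up to a review level, optionally excluding outage tests."""
    return [
        ind.name
        for ind in CATALOGUE
        if ind.review_level <= max_level and (outage_allowed or ind.in_service)
    ]


no_outage = indicators_for(3)
with_outage = indicators_for(4, outage_allowed=True)
```

A query of this kind shows at a glance which additional indicators become available once an outage can be granted.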
Step 6: Evaluate current condition relative to key failure modes

A condition indicator itself has a measured value, e.g. 1,000 MΩ for insulation resistance, 60 V for
minimum tripping voltage, etc. This step translates these measured values into scores,
which enables health indices to be calculated. The most important point in this step is not to mask
abnormal scores by a large number of normal scores. For example, assume that normal condition indicators
are scored “1” whereas abnormal condition indicators are scored “2” or “3” depending on their
condition, and that the total score is calculated by simply summing up each score. Scores which
indicate abnormality could then be masked by a large number of normal scores of “1”.
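The masking effect can be shown with a short numerical example. The two breakers and their severities are invented; severity 0 stands for normal, 1 and 2 for increasing abnormality.

```python
def total_one_based(severities):
    """Sum of scores when normal is scored 1 and abnormal 2 or 3."""
    return sum(severity + 1 for severity in severities)


def total_zero_based(severities):
    """Sum of scores when normal is scored 0, so only abnormal values count."""
    return sum(severities)


# Breaker A: 22 indicators, all normal.
breaker_a = [0] * 22
# Breaker B: 20 indicators, one of them clearly abnormal.
breaker_b = [0] * 19 + [2]

one_based = (total_one_based(breaker_a), total_one_based(breaker_b))
zero_based = (total_zero_based(breaker_a), total_zero_based(breaker_b))
print("normal scored 1:", one_based)
print("normal scored 0:", zero_based)
```

When normal is scored 1 both breakers reach the same total, so the abnormal indicator on breaker B is hidden. When normal is scored 0 only breaker B has a non-zero total.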
Table 4.3.6 – Typical condition indicators and scoring methodologies
Intermittent scoring | Consecutive scoring
Insulation resistance | Age
Opening and closing characteristic | Operation number
Minimum control voltage | Contact resistance on main circuit
Figure 4.3.3 shows examples of scoring condition indicators. Some condition indicators are scored
intermittently because their purpose is not to track a trend but simply to indicate “good”, “bad” and so
forth at the moment the indicator was measured. Note that normal values are scored as “0” to avoid
masking abnormal indicators. Other condition indicators are scored consecutively,
because their trend is important for detecting abnormality. It is essential to follow the trace of these
condition indicators in order to catch abnormal symptoms at an early stage and take appropriate action. Typical
condition indicators are classified in these terms in Table 4.3.6.
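The two scoring styles can be sketched as two small functions. The step thresholds, the linear trend shape and the maximum score of 2 are assumptions chosen for illustration only; each utility would define its own limits and curve.

```python
def intermittent_score(value, caution_limit, warning_limit):
    """Step score for an indicator where a low value is bad.

    Example use: insulation resistance. Normal values score 0.
    """
    if value < warning_limit:
        return 2
    if value < caution_limit:
        return 1
    return 0


def consecutive_score(value, new_value, end_of_life_value, max_score=2.0):
    """Score that grows linearly as a trended value moves from new to end of life.

    Example use: operation number. The result is clamped to 0..max_score.
    """
    fraction = (value - new_value) / (end_of_life_value - new_value)
    return max_score * min(max(fraction, 0.0), 1.0)


# Hypothetical limits: caution below 100 megohm, warning below 10 megohm.
insulation = intermittent_score(1000, caution_limit=100, warning_limit=10)
# Hypothetical mechanical endurance of 10,000 operations.
operations = consecutive_score(2500, new_value=0, end_of_life_value=10_000)
```

The intermittent score changes only when a limit is crossed, while the consecutive score moves with every new reading, which is what allows a trend to be followed.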
a) Intermittent scoring
b) Consecutive scoring
Figure 4.3.3 – Examples of scoring condition indicators
Step 7: Aggregate analysis for AHI

Once scoring is completed for all the condition indicators, the AHI can be calculated by integrating these
scores following the methodology explained in the previous chapter. It is easier to calculate sub-health
scores first, which indicate the health of individual parts (e.g. interrupting chamber, hydraulics, etc.),
and then integrate them into the overall AHI. This concept is shown in Figure 4.3.4 below.
Figure 4.3.4 – Aggregate condition scale scores into an asset health score
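The two-stage aggregation can be sketched as follows. This is only an illustration: the actual aggregation rule is the one defined in the previous chapter, which is not reproduced here, so the plain summation used below, the part names and the scores are all assumptions.

```python
from collections import defaultdict

# Hypothetical indicator scores as (part, indicator, score); 0 means normal.
scores = [
    ("interrupting chamber", "contact resistance", 2),
    ("interrupting chamber", "operation number", 1),
    ("hydraulics", "pump starts", 1),
    ("hydraulics", "oil leakage", 3),
    ("control", "minimum control voltage", 0),
]


def sub_health_scores(rows):
    """Group indicator scores by part and sum them (assumed aggregation rule)."""
    totals = defaultdict(int)
    for part, _indicator, score in rows:
        totals[part] += score
    return dict(totals)


def overall_score(sub_scores):
    """Combine sub-health scores into one asset score (assumed: plain sum)."""
    return sum(sub_scores.values())


subs = sub_health_scores(scores)
worst_part = max(subs, key=subs.get)
print(subs)
print(overall_score(subs), worst_part)
```

Keeping the sub-health scores alongside the overall value shows which part drives the result, which a single aggregated number would hide.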
Thanks to the consideration in Step 3, the relations between condition indicators and key failure
modes are easy to understand.
Table 4.3.7 shows an example of how health indices are interpreted according to the AHI
definition described in Chapter 2. In this example from Japan a more intensive analysis is used,
which allows the AHI to be scored as a percentage (column 2).
Table 4.3.7 – Comprehension about health indices
AHI Health Index [%] Comprehension
1 Very good condition 0-5 Continue with inspect and test schedule.
2 Good condition 5-10 Continue with inspect and test schedule.
3 Fair condition 10-25 Intervention to be planned (maintain, refurbishment etc)
Some scores are above caution level.

4 Poor condition 25-50
Expected life short, replacement is needed
Some scores are above warning level.

5 Critical condition 50-100
Can no longer be operated, immediate action is needed
Further details of the Japanese experience are described in Appendix G.10.
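The percentage bands of Table 4.3.7 can be turned into a simple lookup, sketched below. The band edges are taken from the table; how a value exactly on a shared edge (e.g. 5 %) is classified is not stated there, so assigning it to the more severe band is an assumption of this sketch, as is the function name.

```python
import bisect

# Upper band edges in percent, taken from Table 4.3.7.
BAND_EDGES = [5, 10, 25, 50]
BAND_LABELS = [
    (1, "Very good condition"),
    (2, "Good condition"),
    (3, "Fair condition"),
    (4, "Poor condition"),
    (5, "Critical condition"),
]


def ahi_band(health_index_percent):
    """Return (AHI, label) for a health index given in percent.

    Assumption: a value exactly on a shared edge falls in the higher,
    more severe band, since the table does not say which band applies.
    """
    if not 0 <= health_index_percent <= 100:
        raise ValueError("health index must be within 0-100 %")
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, health_index_percent)]


print(ahi_band(3))    # (1, 'Very good condition')
print(ahi_band(12))   # (3, 'Fair condition')
print(ahi_band(60))   # (5, 'Critical condition')
```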
Step 8: Plan Actions
Mitigation actions include replacement, refurbishment, inspection, or doing nothing. What
counts is to predict the future AHI value under each possible action in order to make rational
decisions based on objective criteria.
Disconnectors and earthing switches
This section focuses on disconnectors and earthing switches. This equipment is used in both
Air-Insulated and Gas-Insulated Switchgear (AIS and GIS). The section describes the principles of
the methodology applied to generate an AHI. Focus is given to AIS. For GIS specifics please
refer to Chapter 4.6.
4.4.1 Step 1: Identify the Assets and Decide on Review Levels

Circuit breakers have the capability to make or break circuit current. Once a breaker has operated,
a more positive isolation between parts of the circuit is needed to permit safe working on the
isolated section. This isolation is provided by a disconnector or a combined disconnector and
earthing switch. These devices therefore have the same critical function in circuit operation as
the breaker itself. A typical traditional disconnector combined with an earthing switch
is shown in Figure 4.4.1.
Figure 4.4.1 – Disconnector and earthing switch

Definitions
• Disconnector (DS) – A mechanical switching device which provides, when in the open position,
an isolating distance in accordance with specified requirements. Disconnectors are of numerous
design types. The most common ones are centre break, double break, knee type and pantograph
[B22], [B23].
• Earthing Switch (ES) – A mechanical switching device for earthing parts of a circuit, capable of
withstanding for a specified time currents under abnormal conditions such as those of a
short-circuit, but not required to carry current under normal conditions of the circuit. Earthing
switches are generally distinguished by whether or not they are intended to perform making
operations. Earthing switches intended to perform making operations are sometimes known as
“fast-earthing switches” [B21].
• Disconnectors and earthing switches are often combined in one device (as shown in Figure
4.4.1).
An overview of the main tasks of disconnectors, earthing switches and fast earthing switches is
presented in Table 4.4.1.
Table 4.4.1 – Main tasks of the equipment
Equipment | Main Tasks
Disconnector (Isolator) | carrying current: rated current, short-circuit current; withstand voltages: rated voltage, overvoltage (switching, atmospheric); (visible) isolation gap
Earthing switch | provides earthing up to short circuit conditions; carrying short-circuit current; withstand voltages: rated voltage, overvoltages (switching, atmospheric)
Earthing switch with making ability (high speed earthing switch) | same as earthing switch; making operation under fault conditions (short-circuit)
There are two types, depending on the electrical endurance classification E1: 2×, E2: 5× [B21]
Table 4.4.2 – Review level
Level | Action | Content
1 | Office study | Type, manufacturer, year of manufacturing, etc.
2 | Visual inspection | Counters, wear, corrosion, insulators surface pollution/damage
3 | Diagnostics - in service | Hot-spots via infra-red image, PD (UV camera, UHF-PD or ultrasound locator), anti-condensation heaters
4 | Diagnostics - out of service | Supply voltages, mechanical chain (e.g. friction, alignment), operating times, power consumption of drives, operation times, interlocks
5 | Online monitoring - in service | Operation times, contact wear, power consumption of drives, operation times, temperatures (inner/outer), power consumption of auxiliaries e.g. heaters
Within the distribution network (typically Un = 110 kV) AHI Level 2 is recommended, since the
network is designed redundantly (n-1) and the centre break type is not used. For the transmission
network AHI Level 3 may be advisable (if AHI Level 2 does not apply). For certain conditions or
applications, such as Un >> 380 kV, extremely critical nodes, and power circuits for the
connection of transformers and power plants, AHI Level 3 or 4 is applied.
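The recommendation above can be written down as a small decision rule. The sketch below is a simplification under stated assumptions: the function name, the network labels and the single `special_application` flag (standing for very high rated voltage, extremely critical nodes, or circuits connecting transformers and power plants) are hypothetical, and the result is a set of candidate levels rather than one fixed answer.

```python
def recommended_review_levels(network, special_application=False):
    """Return candidate AHI review levels for a disconnector or earthing switch.

    network: "distribution" or "transmission" (hypothetical labels).
    special_application: True for the cases the text singles out, such as
    very high rated voltage, extremely critical nodes, or circuits that
    connect transformers and power plants.
    """
    if special_application:
        return (3, 4)
    if network == "distribution":
        return (2,)
    if network == "transmission":
        return (2, 3)  # Level 3 where Level 2 does not apply
    raise ValueError(f"unknown network type: {network!r}")


print(recommended_review_levels("distribution"))        # (2,)
print(recommended_review_levels("transmission"))        # (2, 3)
print(recommended_review_levels("transmission", True))  # (3, 4)
```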
4.4.2 Step 2: Perform FMEA and identify condition indicators to be used

Step 2 covers the Failure Mode and Effects Analysis (FMEA) as well as the identification of the
condition indicators used. Failure modes can be identified by analysing the existing historical data
of the equipment. Most network operators already record failures in their databases, so only a
suitable evaluation of these entries is needed to identify the relevant failure modes.
4.4.2.1 Findings and Commentaries

The quantitative results of the CIGRE Report 511 [B21] can be summarized:
• Dominating MaF modes are
  o “Does not operate on command” (70% for disconnectors and 79% for earthing switches)
  o “Loss of mechanical integrity” (14% for disconnectors and 7% for earthing switches).
• Most frequently reported MiF modes are
  o “Change in mechanical functional characteristics” (31% on disconnectors and 38% on
    earthing switches).
  o “Change in functional characteristics of control or auxiliary systems” (22% on
    disconnectors and 36% on earthing switches) [B21].
The report contains the conclusion that most of the MaF are associated with the drive and kinematic
chain.
Based on this report, the following table shows the summed values of minor and major faults
(drive only) for the three main types of drive used on DS and ES.
Table 4.4.3 – DS and ES: Failure mode of drive only by type of drive (Sum MaF + MiF) [B21]
Failure Mode of Drive only (DS and ES) | Classification | Electric Motor | Pneumatic | Manual
Does not operate on command | MaF | 37% | 47% | 31%
Air leakage in the operating mechanism | MiF | - | 19% | -
Change in mechanical functional characteristics | MiF | 16% | 7% | 27%
Change in functional characteristics of control or auxiliary systems | MiF | 34% | 16% | 22%
Other | MaF + MiF | 12% | 11% | 20%
Total of reported failures | MaF + MiF | 1441 | 1087 | 51
The evaluation leads to the conclusion that, independent of the type of drive investigated (electric
motor, pneumatic, manual), the main failure mode is "does not operate on command". It is
important to note that none of the values presented exceeds 50%.
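The conclusion drawn from Table 4.4.3 can be checked with a few lines of Python. The percentages below are copied from the table; the dictionary layout and function name are illustrative choices, and "-" entries are represented as `None`.

```python
# Percentages from Table 4.4.3 (sum of MaF + MiF, drive only); None marks "-".
DRIVES = ("electric motor", "pneumatic", "manual")
FAILURE_MODES = {
    "Does not operate on command": (37, 47, 31),
    "Air leakage in the operating mechanism": (None, 19, None),
    "Change in mechanical functional characteristics": (16, 7, 27),
    "Change in functional characteristics of control or auxiliary systems": (34, 16, 22),
    "Other": (12, 11, 20),
}


def dominant_mode(drive):
    """Return (failure mode, share in %) with the largest share for a drive type."""
    col = DRIVES.index(drive)
    shares = {
        mode: row[col] for mode, row in FAILURE_MODES.items() if row[col] is not None
    }
    mode = max(shares, key=shares.get)
    return mode, shares[mode]


for drive in DRIVES:
    print(drive, dominant_mode(drive))
```

For all three drive types the function returns "Does not operate on command", with shares of 37 %, 47 % and 31 %, none of which exceeds 50 %.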
Table 4.4.4 displays the failure mode excluding drive for different types of DS and ES. For this display
the absolute values are summed up for MiF and MaF and transformed into percentages by dividing
them by the total number of reported failures (see [B21]).
Table 4.4.4 – DS and ES: Failure mode excluding drive (Sum MaF + MiF) (Table 3-60; Table 3-59 in [B21])
Failure Mode | Classification | Centre Break DS | Double Break DS | Knee Type DS | Vertical Break DS | Semi-Pantograph DS | Pantograph DS | ES
Does not operate on command | MaF | 7% | 15% | 10% | 17% | 29% | 6% | 32%
Loss of mechanical integrity (mechanical damages of different parts like insulators, etc.) | MaF | 16% | 2% | 22% | 18% | 29% | 5% | 9%
Change in mechanical functional characteristics | MiF | 28% | 31% | 52% | 15% | 29% | 26% | 33%
Change in electrical functional characteristics | MiF | 23% | 32% | 9% | 33% | 14% | 39% | 10%
Other | MaF + MiF | 26% | 21% | 7% | 16% | 14% | 24% | 16%
Total of reported failures | MaF + MiF | 839 | 381 | 135 | 92 | 14 | 140 | 196
The failure mode with the highest relative value depends on the type. "Change in mechanical
functional characteristics" is the leading failure mode for the centre break DS, knee type DS and
ES. For the double break DS, vertical break DS and pantograph the main failure mode is
"Change in electrical functional characteristics".
Figure 4.4.2 displays the distribution by failed subassemblies, separated into MaF and MiF. In general,
the drive is the main cause of failure.
Figure 4.4.2 – Distribution of failed subassembly (DS; ES; DE = DS + ES) [B21]

Figure 4.4.3 displays the distribution by failure origin, separated into MaF and MiF. All four failure
origins are relevant to the statistics.
Figure 4.4.3 – Distribution of failure origin (DS; ES; DE = DS + ES) [B21]

In general, there is a causal connection between a failure and its effects, and hence its
indicator. Table 4.4.5 shows examples of such cases. These connections can lead to the
source of failure and are therefore important for the condition assessment.
Table 4.4.5 – Effects and root causes of several Failure Modes
Failure | Effect (failure mode) | Root cause
Insufficient drive energy | Does not operate on command | Lower pressure at the pneumatic/hydraulic drive, no voltage at the drive unit
Malfunction in the mechanical chain | Does not operate on command | Broken/rusted parts
Mechanical overload | Change in mechanical functional characteristics | Broken drive rod/partly broken insulator (by optical inspection)
Wire rupture / failed component | Change in functional characteristics of control or auxiliary systems | Open connection; auxiliary switch does not open/close accordingly
The components of disconnectors and earthing switches can be divided into the following groups:
• Current path and contact system
• Insulating system
• Operating mechanism and mechanical chain
• Control and auxiliaries
The following sections describe the typical deviations, wear and defects separately for each group.
For AIS, none of the following defects necessarily means that the switch has reached the end of its
usable life. Generally, repairs are possible and cheaper, provided that spare parts are available and
no more than one component or component unit must be replaced.
4.4.2.2 Current path including contact and insulation system

For the current path and insulating system the following failures are typical (see [B21]):
• Misalignment of contacts (cf. Chapter 5.3.9 - fig. 11 in reference [B21])
• Increased contact resistance (cf. Chapter 5.3.9 - fig. 9)
• Burnt contacts (esp. fast earthing switches) (cf. Chapter 5.3.9 - fig. 12)
• Breakdown across pole during operation
• Breakdown across pole in open position
• Breakdown between poles
• Breakdown to earth
4.4.2.3 Operating mechanism and mechanical chain

There are a number of typical defects of the mechanism of a disconnector or an earthing switch,
despite the fact that the system is very robust and the defect rate is very low. In relative terms,
failure of the electric motor unit occurs often. A closer look at reference [B21] reveals that
the specific cause can be water ingress (cf. chapter 5.3.9 - fig. 7), a defective heating system,
motor brushes in the wrong positions, or transmission damage (cf. chapter 5.3.9 - fig. 2).
Furthermore, typical symptoms responsible for the outage of a disconnector or an earthing switch
are sluggish movement of the mechanical systems and excessive play in the bearings. Plastic
deformation and metal fatigue (breakage) are also not uncommon (cf. chapter 5.3.9 - fig. 10).
4.4.2.4 Auxiliary systems

In the auxiliary systems, a defect of the anti-condensation heating is relatively common.
A further cause of breakdown is a faulty auxiliary switch. The counter also sometimes fails.
4.4.3 Step 3: Assess Individual asset Performance

The CIGRE Working Group A3.06 "International Enquiry on Reliability of High Voltage Equipment"
has already investigated individual asset performance. The results were published in the CIGRE
Report 511 (Part 3) in 2012 [B21]. The conclusion of this study is that the probability of major and
minor faults (MaF / MiF) and the type of fault depend mainly on three parameters:
• Age of the equipment
• Construction type (e.g. pantograph, knee type, double break, centre break)
• Asset technology (e.g. type of mechanism: electric / hydraulic / pneumatic drive)
4.4.4 Step 4: Identify diagnostic strategy

This step involves reviewing failure modes – both conceptually for the asset type and specifically from
company records and international data for the asset design, manufacture and environment.
According to CIGRE study [B21] it can be calculated that 30% of MaF are caused by design
(dominated by knee type, semi-pantograph and centre break) and 70% of MaF are caused by drive
(dominated by pneumatic type).
Table 4.4.6 – Deciding diagnostic strategy
Step 1 /2 Step 3a Step 3b Step 4
Relevance
Failure Mode Condition Indicators Diagnostic Strategy
(FMEA)
Increased pump-starts
DS/ES with pneumatic or Does not operate on Level 3 as standard, higher
High leakages (oil, air)
hydraulic operated drives command 4 on request
lack of aux. voltages
Visible damages
Redundancy:
Surface conditions (rust, pollution)
Level 2 as standard
Visible misalignment
Knee type, Centre break Change in Check contact alignment in open and
or Semi-pantograph mechanical functional High
design characteristics close position (visible)
No redundancy:
Check contact alignment in open and
Level 3 as standard,
close position (metrological)
higher on request
Contact resistance
Does not operate on

command Visible damages (primary, secondary
Other than above Change in High part) Level 2 as standard
mechanical functional counters
characteristics
4.4.5 Step 5: Collect inspection data

Data should now be collected and trends identified, leading to a condition assessment.
Table 4.4.7 – Example assessment and comparison of 3 different disconnectors
Indicator | DS 1 (Step 5) | DS 1 score (Step 6) | DS 1 log3 (Step 6) | DS 2 (Step 5) | DS 2 score (Step 6) | DS 2 log3 (Step 6) | DS 3 (Step 5) | DS 3 score (Step 6) | DS 3 log3 (Step 6)
Design type | Vertical break | 1 | 1 | Knee type | 3 | 10 | Semi pantograph | 3 | 10
Drive | Hand | 1 | 1 | Electrical | 2 | 3 | Pneumatic | 4 | 30
Insulators | Surface clean, no damages | 1 | 1 | Surface dusty, no damages | 2 | 3 | Surface clean, some shields broken, slight cracks in cement | 4 | 30
Contacts | Visible OK | 1 | 1 | Slight burn marks | 3 | 10 | Increased contact resistance, contacts burnt, slight misalignment | 4 | 30
Mechanical chain | Hand crank and linkage OK | 1 | 1 | Linkage OK | 1 | 1 | Linkage OK | 1 | 1
Aux. supplies | OK | 1 | 1 | OK | 1 | 1 | OK | 1 | 1
Control circuits | Anti-condensation heating OK, aux. switch in place and fixed, no loose wires found | 1 | 1 | Anti-condensation heating OK, aux. switch in place and fixed, loose wires, motor increased noise | 3 | 30 | Anti-condensation heating OK, aux. switch in place and fixed, no loose wires | 1 | 1
Corrosion protection | Some loose paint, no rust | 2 | 3 | Some loose paint, getting rusty | 3 | 10 | Lot of loose paint, rusty | 4 | 30
Step 7 | Good condition | | 10 | Fair condition | | 68 | Critical condition | | 133
Step 8 | No special action required. Next inspection acc. schedule | | | Plan maintenance/repair | | | Replace asap (In this example repair technically and economically not meaningful) | |
4.4.6 Step 6: Evaluate current Condition relative to key failure modes
The condition can now be related to failure mode.
4.4.7 Step 7: Aggregate analyses for AHI
Scores for each failure mode can be aggregated to an AHI.
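The aggregation used in Table 4.4.7 can be reproduced with a short sketch. The mapping from raw score to weighted value (1 → 1, 2 → 3, 3 → 10, 4 → 30) is read from the "log3" columns of that table; the variable and function names are illustrative. DS 2 is left out because the table lists a weighted value of 30 against a raw score of 3 for its control circuits, which does not follow that mapping.

```python
# Raw score -> weighted value, as read from the "log3" columns of Table 4.4.7.
WEIGHT = {1: 1, 2: 3, 3: 10, 4: 30}

# Raw Step 6 scores per indicator, in table order: design type, drive,
# insulators, contacts, mechanical chain, aux. supplies, control circuits,
# corrosion protection.
RAW_SCORES = {
    "DS 1": [1, 1, 1, 1, 1, 1, 1, 2],
    "DS 3": [3, 4, 4, 4, 1, 1, 1, 4],
}


def aggregate(raw_scores):
    """Sum the weighted values of all indicator scores (Step 7)."""
    return sum(WEIGHT[s] for s in raw_scores)


totals = {name: aggregate(scores) for name, scores in RAW_SCORES.items()}
print(totals)  # {'DS 1': 10, 'DS 3': 133}
```

The totals match the Step 7 values of 10 and 133 given in the table. Because the weights grow roughly geometrically, a few poor indicators dominate the sum and are not masked by many good ones.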
4.4.8 Step 8: Plan actions
The AHI should indicate time scales for action to deal with any adverse score. It may involve cost
benefit analysis to decide between repair, refurbish or replace.
Instrument Transformers
Instrument Transformers (IT) – The most common usage of instrument transformers is to access
instruments or metering from high voltage or high current circuits, safely isolating secondary control
circuitry from the high voltages or currents. The primary winding of the transformer is connected to the
high voltage or high current circuit, and the meter or relay is connected to the lower voltage secondary
circuit. Non-conventional ITs are not covered in this TB as there is not enough experience with their
failures yet.
Current Transformer (CT) – An instrument transformer intended to have its primary winding
connected in series with the conductor carrying the current to be measured or controlled.
Voltage Transformer (VT) – An instrument transformer intended to have its primary winding
connected in shunt with a power supply circuit, the voltage of which is to be measured or controlled by
a secondary winding where the signal is proportional to the actual prevailing value on the primary.
• Inductive Voltage Transformer (VT) – A voltage transformer that uses a transformer to step
down the voltage.
• Capacitor Voltage Transformer (CVT) – A voltage transformer that uses a capacitive potential
divider, inductive element, and an auxiliary transformer to step down the voltage.
Combined Current and Voltage Transformer (CCVT) – An instrument transformer that combines a
magnetic voltage transformer and a current transformer in the same device.
Failure impact assessment

Continuity of supply
• High impact
• The function of the instrument transformer is key to the good functioning of the grid: instrument
transformers give information to the protection devices, which protect the grid and trip the
circuit breaker when necessary.
Safety
• High impact
• Most of the failure modes are catastrophic, with safety and environmental implications; they can
damage surrounding parts and cause high follow-up costs.
Cost
• Low to Medium impact
• The CAPEX costs for the preventive replacement of instrument transformers are low.
• The OPEX costs for the inspection and maintenance of instrument transformers are low. Online
diagnostics on instrument transformers are not common and are quite expensive in terms of OPEX
costs.
• The repair costs after a breakdown can be quite high due to the collateral damage caused by an
exploding instrument transformer.
Decide on review level

The most common failure mode of instrument transformers according to CIGRE Report 512 (Part 4)
[B27] is the reduction of dielectric withstand capability, which can result in a catastrophic failure
(explosion). Here the main concern is the safety or environmental consequences and the cost of
power interruption, more than the cost of the actual asset. These failure modes can only be detected
at an early stage from the results of periodic outage testing or on-line monitoring.
Online diagnostics on instrument transformers are not common and are quite expensive in terms of
OPEX costs. For most instrument transformers an asset health index at Review Level 4 would
therefore be the normal requirement. Level 5 may be considered in certain scenarios where the
dominant failure mode develops rapidly.
• Advanced asset health review Level 4: The reduction of dielectric withstand capability, as well
as other failure modes, can be evaluated from the results of outage testing. Periodic outage
testing would be appropriate where these failure modes are considered a concern.
• Advanced asset health review Level 5: Failure modes involving reduction of dielectric withstand
capability can initiate and progress in a time too short to be caught by routine survey or
out-of-service testing. Online diagnostics with permanently installed diagnostic systems are then
appropriate where catastrophic failure could lead to safety or environmental consequences.

Instrument transformer problems can be characterised as those that arise from manufacturing defects,
those derived from deterioration processes, and those induced by operating conditions that exceed
the capability of the instrument transformer. These conditions may take many years to develop into a
problem or failure. However, in some cases undesirable consequences may develop rapidly.
Deterioration processes relating to aging are accelerated by mechanical, thermal, and voltage
stresses. Elevated temperature, along with oxygen content, moisture content, and other contaminants
significantly contribute to accelerated insulation degradation. The rate of deterioration may be
compounded by the presence of contaminants and by mechanical or electro-mechanical wear.
Characteristics of the deterioration processes include sludge accumulation, weakened mechanical
strength of insulation materials such as paper wrapped on conductor, and shrinkage of materials that
provide mechanical support. Overheating of insulation having a high water content can cause gas
bubbles in the insulating fluid. The bubbles can cause serious reduction in dielectric strength of the
insulating structure, which could result in an eventual dielectric failure [B26].
The CIGRE Working Group A3.06 "International Enquiry on Reliability of High Voltage Equipment"
has already investigated individual asset performance. The results were published in the CIGRE
Report 512 (Part 4) in 2012 [B27]. Table 4.5.1 shows the most common failure modes identified in
this study, together with the instrument transformer component and indicators associated with each
failure mode.
Table 4.5.1 – Component, failure mode and indicators
Component | Failure Mode | Indicator
Main internal insulation: Common (C), Oil (O), SF6 (S), Resin (R) | Reduction of dielectric withstand capability; Internal dielectric failure (explosion) | Deterioration of bellows (O); Oil leakage (O); SF6 leakage (S); Moisture content of the asset (O); Dissipation factor/capacitance of the asset (C); DGA (O); Oil DDF (O); Oil breakdown voltage (O); Oil moisture content (O); SF6 quality (S); Partial discharges (C)
Insulator (porcelain, composite, or resin) | External dielectric failure (flashover); Loss of mechanical integrity (mechanical damages of different parts like insulators, etc.) | Dissipation factor/capacitance of the asset (C); Insulator cleanliness (C); Thermal hot spots (C); Partial discharges (C)
Primary terminals | Loss of electrical connections integrity in primary | Thermal hot spots (C); Partial discharges (C)
HV tank | Sealing failure; Loss of mechanical integrity | Oil leaks (O); Corrosion level (C)
Secondary terminal board (secondary terminals and reconnection taps included) | Loss of electrical connections integrity in secondary | Voltage comparison between units (Secondary false readings) (C); Thermal hot spots (C)
Sealing (e.g. gaskets and o-rings) | Sealing failure | Deterioration of bellows (O); Oil leakage (O); SF6 leakage (S)
Capacitors in CVT; Windings (short turns) | Accuracy out of tolerances | Voltage comparison between units (Secondary false readings) (C); Winding ratio (C)
Damping circuits in VT and CVT | Damping circuit failure | Damping circuit resistance (C)
Monitoring device (SF6 density meter) | Monitoring device failure | Functioning of the SF6 monitoring device (S)

The review for all categories would begin with the asset register and include all instrument
transformers. Aspects would include:
Asset register data
• Year of manufacture.
• Type of instrument transformer, name plate ratings, substation insulation, main insulation
material, type of insulator, type of sealing, and primary arrangement.
• The manufacturer and factory location.
• Location – atmospheric pollution exposure, risks for creating environmental pollution/safety
impact towards surroundings if failure occurs.
Purchase files
• The purchase documents should include design information such as the primary arrangement,
materials, and manufacturing processes. They should also identify the instrument transformer type
of sealing.
• Performance of the unit or design group in factory acceptance tests. This would include poorer
performance in tests, such as for heat run results that could affect normal thermal rating and any
consequence of overloads.
• Specifications – User requirements defined in the original specifications for ambient
temperatures, load, voltage, power factor, lightning levels, short circuit withstand, and acceptable
losses. Some of these factors may now have evolved such that these requirements are
inadequate for the current and future operating environments. The purchase specification is,
therefore, an important document to review against current manufacturing standards and
against the actual operating environment.
• In light of these specifications, discussions and documentation within the utility should indicate
any significant requirement to operate the unit above specified conditions, exposure to unusual
levels of short circuits or switching transients, DC carry through, harmonics, extent of reverse
power flow, etc., together with historic relocations, whether system voltages are at the top of the
voltage range and over-fluxing is a possibility, and whether fault frequency or levels are higher
than specified. The latter could occur if source impedance or earth impedance have changed, or
protection is slower than specified.
• Standards used at the time of manufacture which may now be considered inadequate should be
identified; examples are withstand tests such as BIL. Designs have also changed over the years.
Identifying the design practice at build is therefore important to predict future performance in
these areas.
The aim is to gather performance data relating to specific assets. This may require only a restricted
activity for low cost units.
Service experience
How the design has performed in the general utility experience is important. Often poor in-service
performance can be related back to the design limitations described above. Sometimes it may be due
to a poor standard or to a specification that does not meet actual conditions. Industry or trade
association reports of failures on similar designs (HV tank, insulator, main internal insulation,
primary winding, capacitors in CVT, sealing, and secondary winding) are a valuable source of
information, because they make it possible to relate trends in test data for all units to the failure
and the rates of aging revealed in the failed unit.
Operating History
Differences between the manufacturer’s original maintenance plan, current opinion on best-practice
maintenance policy, and the work actually undertaken over the years should be identified. These
may indicate shortcomings that could affect future reliability.
A review of maintenance work undertaken indicates the problems encountered with the unit, their
extent, and cost. This can be used to indicate integrity and likely future performance risks, as well as
yielding key performance indicators for reliability and cost.
The dielectric withstand capability of the main internal insulation system is often affected by the
ingress of moisture due to loss of hermeticity. Risk assessment requires information on sealing
integrity, lightning events, dissipation factor and the capacitance. This may then be used with
diagnostics such as dielectric response analysis and DGA to assess the condition of the main internal
insulation.
Operating condition history, including load history and ambient temperatures, is important to the
life of insulation. Included within this assessment is the effect of moisture estimated from oil
testing. The first stage is to review the load history over the life of the units. From this,
representative periods are selected, and the loss of life is calculated using the IEEE C57.91 [B25]
equations and temperature data.
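As an illustration of this calculation, the sketch below uses the aging acceleration factor of IEEE C57.91, F_AA = exp(15000/383 − 15000/(θ_H + 273)), which is referenced to a 110 °C hot-spot temperature, together with its time-weighted average over representative periods. The hot-spot temperatures and durations are hypothetical, and how well this power-transformer relation represents a given instrument transformer design is an assumption that would need to be confirmed.

```python
import math


def aging_acceleration_factor(hot_spot_c):
    """Aging acceleration factor F_AA relative to a 110 degC hot spot."""
    return math.exp(15000.0 / 383.0 - 15000.0 / (hot_spot_c + 273.0))


def equivalent_aging_factor(periods):
    """Time-weighted mean F_AA over (hot_spot_c, hours) periods."""
    total_hours = sum(hours for _temp, hours in periods)
    weighted = sum(aging_acceleration_factor(t) * h for t, h in periods)
    return weighted / total_hours


# Hypothetical representative periods: (hot-spot temperature in degC, hours).
periods = [(80.0, 6000.0), (98.0, 2500.0), (110.0, 260.0)]

f_eqa = equivalent_aging_factor(periods)
# Hours of insulation life consumed during the 8760 hours covered above.
aged_hours = f_eqa * sum(h for _t, h in periods)
print(round(f_eqa, 3), round(aged_hours))
```

An equivalent aging factor below 1 means the insulation aged more slowly over the period than it would at a constant 110 °C hot spot.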
Operating Costs
Some assets require more ongoing activity to prevent in-service failures.
In some cases this data is captured in a CMMS (computerized maintenance management system). The
amount spent may be relevant to assessing ongoing life.
Historical test data
A review of this data will identify what is normal for the test group and how significant deviations
from it are. Data for individual units will also be reviewed for changes throughout life, and for the
current rates of change.
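Both aspects of such a review, deviation from the test group and rate of change of an individual unit, can be sketched in a few lines. The readings below are hypothetical dissipation factor values, and the use of a standard-deviation measure and a least-squares slope is an assumption of this sketch rather than a method prescribed by the brochure.

```python
from statistics import mean, stdev


def deviation_from_group(value, group_values):
    """Number of standard deviations a unit's value lies from the group mean."""
    return (value - mean(group_values)) / stdev(group_values)


def rate_of_change(history):
    """Least-squares slope of (year, value) pairs, in value units per year."""
    years = [y for y, _v in history]
    values = [v for _y, v in history]
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in history)
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


# Hypothetical dissipation factor readings in percent.
group = [0.30, 0.32, 0.28, 0.31, 0.29]
unit_history = [(2012, 0.30), (2016, 0.34), (2020, 0.38)]

z = deviation_from_group(unit_history[-1][1], group)
slope = rate_of_change(unit_history)
print(round(z, 1), round(slope, 3))
```

A unit that lies several standard deviations from its group and also shows a steady upward slope would be a natural candidate for outage investigation.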
Data from visual inspection
• Confirmation that the asset register data and the equipment in each bay are recorded correctly.
• Results from an external visual assessment, Table 4.5.2, may be used to relate to failure modes
and relationships between observed data and risk of failure.
Table 4.5.2 – Visual Inspection [B28]
Feature
Compare equipment nameplate data with drawings and specifications.
Inspect physical and mechanical condition.
Verify correct connection of transformers with system requirements.
Verify that adequate clearances exist between primary and secondary circuit wiring.
Verify the unit is clean.
Inspect bolted electrical connections.
Verify that all required earthing and shorting connections provide contact.
Verify correct primary and secondary fuse sizes for voltage transformers.
Verify insulator integrity.
Verify correct operation of gauges.
Data from non-invasive in-service periodic diagnostics

Examples of techniques that can be used are shown in Table 4.5.3.
Table 4.5.3 – Non-invasive in-service test results
Test
Infra-red surveys – a simple survey method using an infra-red camera to detect high
temperatures indicative of overheated joints.
Partial discharge (PD) detection: a simple survey method using a UHF antenna and
scanner to identify partial discharge. Detecting above 100 MHz avoids the unwanted
effects from corona and surface discharge.
Outage investigations
Unusual results from on-line survey methods are best investigated further with off-line testing targeted
at specific failure modes. Outage testing is also appropriate for detecting some failure modes. Table
4.5.4 shows offline tests applicable to instrument transformers.
Table 4.5.4 – Offline and investigative testing [B28]
Result of Tests – based on level and rates of change
Winding DC resistance measurements through bolted connections detecting connection issues
Oil testing: oil quality for acidity and consequences of contamination or deterioration. It
should include testing for contaminating material such as PCB.
Oil testing for paper ageing: This is a furanic compound test, although some research is
indicating other tests may be additionally useful.
Insulation-resistance
Polarity
Turns ratio detecting winding conductor issues
Excitation test on current transformers used for relaying applications
Current circuit burdens at transformer terminals
Dielectric withstand on the primary winding with the secondary earthed
Power-factor or dissipation-factor detecting deteriorated primary insulation
Dielectric response analysis giving some indication of moisture content
Verify that current transformer secondary circuits are earthed and have only one earthing
point
Measure capacitance of capacitor sections on CVT.
Installed monitoring
In addition, there are some permanently installed systems now available that could be applied to
instrument transformers. Table 4.5.5 shows some systems that could be applicable to instrument
transformers.
Table 4.5.5 – On-line monitoring
Outcomes from test methods – if used
On-line dissolved H2 detectors have become more reliable.
Dissipation factor (power factor)
Relative saturation sensors can give a reliable indication of moisture content in the windings
Monitors for temperature
Step 4: Identify condition indicators to be used

Once condition indicators are identified, it is imperative to evaluate their criticality to determine which
indicators to use for health indices. A recommendation is to consider both “Correlation to primary
function” and “Detectability” of each indicator, which can be enabled by FMEA consideration described
above. It does not make sense to detect a critical failure mode by a condition indicator with poor
detectability or vice versa.
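
A minimal sketch of how such a ranking list could be produced is given below. It uses three indicator rows as they appear in Table 4.5.6 and the two scales of that table (criticality: 10 = high; detectability: 1 = high). The combination rule in `priority` is an assumption made for illustration only; the brochure does not prescribe a formula, and a utility may weight the two aspects differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    criticality: int    # 10 = high, 1 = low (scale of Table 4.5.6)
    detectability: int  # 1 = high, 10 = low (scale of Table 4.5.6)


def priority(ind: Indicator) -> int:
    # Assumed rule (not from the brochure): invert the detectability scale so
    # that a larger product means "critical and well detectable".
    return ind.criticality * (11 - ind.detectability)


indicators = [
    Indicator("Insulator cleanliness", 8, 2),
    Indicator("Winding ratio", 5, 5),
    Indicator("Damping circuit resistance", 5, 6),
]

# Highest priority first.
ranked = sorted(indicators, key=priority, reverse=True)
for ind in ranked:
    print(f"{ind.name}: {priority(ind)}")
```

Indicators near the bottom of such a list combine low criticality with poor detectability and are candidates for review before they are included in a health index.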
Table 4.5.6 below shows an example of condition indicator estimation for instrument transformers. By
making such a quantified ranking list, utilities can review and improve their own maintenance policies,
because the list shows what present maintenance is missing in terms of the criticality of condition
indicators. The estimation is also useful for determining which indicators are worth checking more
frequently by online monitoring. In that case, a cost-benefit analysis should be done that considers
the cost of monitoring.
The following table shows a list of condition indicators for instrument transformers estimated with
regard to their criticality, i.e. correlation to primary function and detectability as an example.
Table 4.5.6 – Detectability of diagnostics
Criticality Detectability
Indicator Related failure mode(s) (10 = high, Inspection Method (1 = high,
1= low) 10 = low)
Sealing failure
Reduction of dielectric withstand
SF6 leakage 10 Online monitoring (alarm)
capability
Internal dielectric failure (explosion) 1
Sealing failure
Oil leakage 10 Visual inspection every 3 months
capability
Insulator Cleanliness External dielectric failure (flashover) 8 Visual inspection every 3 months 2
Deterioration of capability
10 Visual inspection every 4 years
bellows Internal dielectric failure (explosion)
Sealing failure 5
capability Oil analysis every 8 years starting
DGA 10
from an age of 25 years in service
Oil DDF 10
Oil breakdown capability Oil analysis every 8 years starting
10
voltage from an age of 25 years in service

Oil Moisture content 10
capability SF6 Quality measurement every 8
SF6 Quality 10
years
Loss of electrical connections integrity
in secondary
Thermal hot spots Loss of electrical connections integrity 8 Infrared scans every year
in primary
External dielectric failure (flashover) 4
Winding ratio Accuracy out of tolerances 5 Winding ratio testing every 8 years 5
Damping circuit Damping circuit resistance test
Damping circuit failure 5
resistance every 8 years 6
Reduction of dielectric withstand Dynamic frequency response
Moisture content of capability 10 Insulation diagnostic (DFR) every 8
the asset
Internal dielectric failure (explosion) years 3
capability
Internal dielectric failure (explosion) UHF PD sweep of the substation
Partial discharges 10
External dielectric failure (flashover) every year
in primary 3
capability
Dissipation Dynamic frequency response
factor/capacitance of Internal dielectric failure (explosion) 10 Insulation diagnostic (DFR) every 8
the asset External dielectric failure (flashover) years
Accuracy out of tolerances 4
Dissipation capability
factor/capacitance of 10 Online dissipation factor monitoring
Internal dielectric failure (explosion)
the asset
DGA capability 10 Online monitoring DGA
Voltage comparison Accuracy out of tolerances
between units Continuous online voltage
Loss of electrical connections integrity 5
(Secondary false comparison
readings) in secondary 1
Sealing failure
Corrosion level 1 Visual inspection every 3 months
Loss of mechanical integrity 1
capability
Internal dielectric failure (explosion)
Partial discharges 10 Continuous PD online monitoring
External dielectric failure (flashover)
in primary 2
in secondary
Daily Infrared scans by infrared
Thermal hot spots Loss of electrical connections integrity 8
camera's
in primary
Functioning of the
Functional test of the alarms every
SF6 monitoring Monitoring device failure 1
6 years
device 4

The diagnostic strategy should be established from the indicators selected in Step 4, and the data
collected accordingly. The data as collected may be a quantitative observation, e.g. a moisture analysis,
or an increased measured value of hydrogen or of dissipation factor and capacitance.
Example of collection of DGA results on CT and VT:
Table 4.5.7 – Example oil results
Break-down voltage
Tg delta 90°C Baur
% SATURATION
Temperature
WATER
Water
C2H4
C2H2
C2H6
Date
TAN
CO2
CH4
CO
O2
H2
N2
ID
20/07/2016 15680 34,40 5,00 3.754,25 50.000,00 82,68 1.533,02 46,63 13,14 1,00 478,68 0,01 0,08 44,00 37,52 35,00
20/07/2016 15681 25,00 5,00 500,00 50.000,00 97,46 1.950,14 51,02 14,52 1,00 489,62 0,01 0,08 45,00 38,28 35,00
20/07/2016 15682 27,20 5,00 3.617,47 50.000,00 90,29 1.419,30 40,66 11,07 1,00 387,38 0,01 0,07 42,00 35,82 35,00
28/07/2016 26891 35,10 5,00 14.787,80 50.000,00 218,02 1.664,86 9,05 1,02 1,00 5,32 0,01 0,01 28,00 45,31 20,00
28/07/2016 26889 22,00 5,00 999,02 50.000,00 178,73 1.614,78 14,56 3,27 1,00 19,11 0,01 0,02 33,00 56,24 20,00
28/07/2016 26890 26,60 5,00 2.521,11 50.000,00 156,00 2.081,02 14,09 3,16 1,00 18,38 0,01 0,03 33,00 49,61 23,00
02/08/2016 8418 35,30 15,70 20.000,00 48.966,70 304,68 1.264,20 8,48 1,00 1,00 1,00 0,01 0,00 14,00 36,68 14,80
02/08/2016 8420 41,50 25,44 20.000,00 50.000,00 421,23 1.412,29 13,12 1,00 1,00 1,00 0,01 0,00 13,00 34,06 14,80
02/08/2016 8419 40,20 17,66 20.000,00 50.000,00 365,52 1.314,38 12,11 1,00 1,00 1,00 0,01 0,01 16,00 41,92 14,80
The next step is to assess these data points in terms of criticality to failure modes and likelihood of
failure.

A condition indicator itself has a measured value, e.g. 40 ppm for C2H6 or 0.002 for oil tan delta.
This step translates these measured values into scores, from which health indices can be
calculated.
Translating the condition indicator value to a condition score
[Figure: C2H6 value (vertical axis, 0 to 200) against score (horizontal axis, 0 to 1,5), with series labelled C2H6 1 to C2H6 4]
Figure 4.5.1 – Translating the set of condition indicator scores to a condition indicator index
An example of the translation of the C2H6 (ppm) value from DGA to a condition score (0 to infinity %)
is given in Table 4.5.8.
Table 4.5.8 – Example of translation of the C2H6 condition from DGA to a condition indicator index
Condition indicator 1 2 3 4 5
C2H6 (Condition indicator score) <=50% 50%-70% 70%-90% 90%-100% > 100%
C2H6 (Condition indicator Index) Green Yellow Orange Red Black
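
The banding of Table 4.5.8 can be sketched as a simple lookup. The function name `condition_index` is hypothetical, and the treatment of a score that lies exactly on a boundary (assigned here to the lower band) is an assumption, since the table only states this explicitly for the 50 % and 100 % limits.

```python
import bisect

# Upper bounds (in %) of bands 1 to 4 from Table 4.5.8; above 100 % is band 5.
UPPER_BOUNDS = [50.0, 70.0, 90.0, 100.0]
COLOURS = ["Green", "Yellow", "Orange", "Red", "Black"]


def condition_index(score_percent: float) -> tuple[int, str]:
    """Map a condition indicator score (%) to its band number and colour."""
    # bisect_left keeps a score that sits exactly on a bound in the lower band
    # (assumption for the 70 % and 90 % boundaries).
    band = bisect.bisect_left(UPPER_BOUNDS, score_percent)
    return band + 1, COLOURS[band]


for score in (1.25, 57.14, 96.0, 103.58):
    print(score, condition_index(score))
```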

Aggregate condition scale code scores to a sub-health score and asset health score.
An example of the calculation of a sub-AHI related to the failure mode “Reduction of dielectric withstand
capability” for CTs and VTs, based on three condition indicators (DDF, C2H2, C2H6), is shown in Table
4.5.9.
Here, Sub-AHI Reduction of dielectric withstand capability = max (score DDF, score C2H2, score
C2H6)
Where:
Table 4.5.9 – Reduction of dielectric withstand capability
Condition indicator 1 2 3 4 5
DDF
C2H2
C2H6
Sub-HI Reduction of dielectric withstand capability <=50% 50%-70% 70%-90% 90%-100% > 100%
Table 4.5.10 – Example AHI scores
Equipment ID | DDF Health Index (%) | C2H2 Health Index (%) | C2H6 Health Index (%) | Sub-HI Reduction of dielectric withstand capability (%)
36278 | 57,14 % | 0,00 % | 103,58 % | 103,58 %
23231 | 19,05 % | 0,00 % | 97,20 % | 97,20 %
24117 | 29,64 % | 96,00 % | 1,25 % | 96,00 %
25027 | 93,38 % | 0,00 % | 5,00 % | 93,38 %
50272222 | 92,23 % | 0,00 % | 3,03 % | 92,23 %
42723 | 42,86 % | 92,00 % | 0,00 % | 92,00 %
28072 | 91,96 % | 0,00 % | 5,75 % | 91,96 %
Table 4.5.10 shows the Sub-HI Reduction of dielectric withstand capability based on the condition
indicators DDF, C2H2, and C2H6. As the following three examples illustrate, the dominant (worst)
indicator HI determines the overall score for the asset.
1. Equipment ID 36278: DDF HI is 57.14%, which is a good condition and indicates a low
likelihood of failure over a long period. Similarly, C2H2 HI is 0%, a very good condition
indicating a very low likelihood of failure over many years. However, the C2H6 HI is
103.58%, which is a critical condition and indicates a high likelihood of immediate failure.
This means that the unit should not remain in service. In this case, therefore, the C2H6 HI
prevails over the others and the Sub-HI reduction of dielectric withstand capability takes its
value of 103.58% as the general condition of the equipment.
2. Equipment ID 24117: DDF HI is 29.64% and C2H6 HI is 1.25%. Both are a very good
condition and indicate a very low likelihood of failure over many years. C2H2 HI is 96%, which
is a poor condition and indicates that progressive deterioration has been detected, with a high
likelihood of failure in the short term. Therefore, the C2H2 HI prevails over the others and the
Sub-HI reduction of dielectric withstand capability takes its value of 96% as the general
condition of the equipment.
3. Equipment ID 28072: DDF HI is 91.96%, which is a poor condition and indicates that
progressive deterioration has been detected, with a high likelihood of failure in the short term.
C2H2 HI is 0% and C2H6 HI is 5.75%. Both are a very good condition and indicate a very low
likelihood of failure over many years. Therefore, the DDF HI prevails over the others and the
Sub-HI reduction of dielectric withstand capability takes its value of 91.96% as the general
condition of the equipment.
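
The worst-case aggregation used in these three examples can be sketched as follows. The values are copied from Table 4.5.10; the function name `sub_hi` and the data layout are illustrative assumptions, not part of the brochure.

```python
# Indicator health indices (%) for the three units discussed above,
# copied from Table 4.5.10.
UNITS = {
    "36278": {"DDF": 57.14, "C2H2": 0.00, "C2H6": 103.58},
    "24117": {"DDF": 29.64, "C2H2": 96.00, "C2H6": 1.25},
    "28072": {"DDF": 91.96, "C2H2": 0.00, "C2H6": 5.75},
}


def sub_hi(scores: dict[str, float]) -> tuple[str, float]:
    """Return the dominant indicator and its score (max aggregation)."""
    dominant = max(scores, key=scores.get)
    return dominant, scores[dominant]


for unit_id, scores in UNITS.items():
    indicator, value = sub_hi(scores)
    print(f"{unit_id}: Sub-HI = {value:.2f} % (driven by {indicator})")
```

Taking the maximum means that a single indicator in critical condition cannot be masked by good results on the other indicators.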
Step 8: Identify mitigation actions to improve AHI

Because the CAPEX cost is relatively low, in most cases an instrument transformer with a bad health
index will be replaced, as there are only a few technical possibilities to refurbish an instrument
transformer to improve its health.
GIS
Step 1: Identify assets and decide review level
GIS installations typically house several main substation components, depending on actual
configuration encompassing busbar sections, switchgear, earthing switches, disconnectors, measuring
devices (VTs, CTs) and cable or line bushings. Furthermore, GIS may be combined with AIS
components (hybrid substations).
GIS plays a critical role in electricity transmission, as it may serve as a node in the network or as a
main substation next to a power plant. Some utilities may operate GIS from several manufacturers and
different years of manufacturing with numerous kinds of designs and technologies. As a consequence,
it might be possible that the number of condition indicators is not the same among the GIS types.
Utilities commonly apply the comprehensive Review Levels 4 and 5 (see Figure 2.1) when assessing the
health index of HV/EHV GIS.

The scope of an FMEA depends on the GIS configuration, e.g. which components/functionalities are
housed within the GIS assembly. Depending on the configuration, it may be beneficial or improve
clarity to perform the FMEA for separate sub-compartments such as:
• busbar and bay section(s) including flange connections
• disconnector(s)
• circuit breaker(s)
• voltage transformer(s)
• current transformer(s)
• surge arrester(s)
• bushings
• drive(s)
• auxiliaries
Note that overlap may occur in these analyses, as sub-components may for instance have shared gas
compartments.
Step 3: Assess (Individual) Asset Performance

GIS functions as a substation
In the transmission network, the role of a GIS substation is to act as a node that distributes electricity
within the network. The GIS should be able to energize (and de-energize) the high voltage apparatus and
to disconnect faults within the shortest possible time to maintain overall grid stability.
Functions of bays in a GIS substation
Typical arrangements of bays in a GIS substation are as follows [B9]:
• Single busbar
• Double busbar
• Double busbar with double circuit breaker
• One and a half circuit breaker scheme
• Ring busbar
Functions of enclosures in bays of a GIS substation

An enclosure contains components in GIS. The configuration of enclosures differs between GIS makes
and designs. Some components have a dedicated enclosure, like the CB, VT (Voltage Transformer),
and termination, while others share the same enclosure. The enclosure also provides dielectric and
construction support functions for the components in GIS.
[Figure: feeder bay with labelled parts: voltage transformer, CB driving mechanism, arcing contacts compartment, current transformer, CB compartment, termination ES, termination compartment, termination disconnector, cable termination, busbar disconnector, busbar compartment]
Figure 4.6.1 – An example of a feeder bay in GIS. The components are placed inside different enclosures
of GIS
Functions of components located inside the enclosures
Based on their functionality, there are seven groups of components, namely:
1. Fault and load interrupters, i.e. Circuit Breaker (CB)
2. No-load switches including limited-fault interrupter, i.e. Disconnector Switch (DS), Earthing
Switch (ES), High-Speed DS (HSDS)
3. The main path for current distributions in GIS and interconnection among GIS feeders, i.e.
Busbar, Bus Segment (BS)
4. Link the GIS with the incoming and outgoing feeders, i.e. Terminations (TE)
5. Voltage and current sensing devices, i.e. Instrument Transformers (IT), including the Current
Transformer (CT) and the Voltage Transformer (VT)
6. Transient overvoltage limiter, i.e. Surge Arrester (SA)
7. Local Control Cabinet (LCC) housing the auxiliary wiring and control and protection
functionality
Each component consists of subsystems. Table 4.6.1 gives an example of how to make divisions of a
GIS.
Table 4.6.1 – GIS components, sub group of components, subsystems, function of subsystems, and key
parts
Component | Subgroup of component | Subsystem | Function of subsystem | Key parts
Circuit Breaker (CB) | CB can be grouped based on its driving mechanism, as follows: Hydraulic CB, pneumatic CB, spring CB | Primary | Conduct the current at its rating | Main and arcing contacts, conductor
 | | Secondary | Sending a command to driving mechanism either from remote control or from local control cubicle. | Wiring, auxiliary contacts, relays
 | | Driving mechanism | Energy storage to actuate the CB after a command from secondary sub system | Spring, hydraulic and pneumatic compressions
 | | Driving mechanism | Transform the energy from the energy storage to move the main contacts | Mechanical rod/link, mechanical joints of CB driving mechanism
 | | Dielectrics | Extinguish the arcs and to insulate HV parts to the earth | SF6 gas and spacers
 | | Construction and Support | Provide mechanical strength; SF6 gas containment; Monitor gas pressure/density; Provide overpressure relief | Enclosures body, enclosure’s base, sudden pressure relief, gas pressure/gas density gauge
Switches | Switches can be grouped based on its functionality and driving mechanisms. Based on its functionality: DS, ES, HSDS. Based on its driving mechanism: Electric DS, Pneumatic DS, Spring DS | Primary | Conduct the current at its rating | Main contacts, conductor.
 | | Secondary | Sending a command to driving mechanism either from remote control or from local control cubicle. | Wiring, auxiliary contacts, relays
 | | Driving mechanism | Energy storage to actuate the switches | Spring, hydraulic and pneumatic compressions
 | | Driving mechanism | Transform the energy from the driving mechanism to move the main contacts | Mechanical rod/link, mechanical joints of DS driving mechanism
 | | Dielectrics | Extinguish the sparks and to insulate HV parts to the earth. In DS for bus-coupler and HSDS, the dielectric may also extinguish the arcs but with limited capacity. | SF6 gas and spacers
 | | Construction and Support | Provide mechanical strength; SF6 gas containment; Monitor gas pressure/density | Enclosure base, enclosure body, sudden pressure relief, gas pressure/density gauge
Busbar, Bus Segment (BS) | – | Primary | | Primary conductor, including joints of bus conductor
 | | Dielectrics | To insulate the HV parts to the earth | SF6 gas and spacers
 | | Construction and Support | Provide mechanical strength; SF6 gas containment; Monitor gas pressure/density | Enclosure base, enclosure body, sudden pressure relief, gas pressure/density gauge
Termination (TE) | Based on types of connection to GIS: SF6 – air bushing, Cable sealing end, GIL with Power Transformer/Reactor Interface | Primary | | Primary conductor of termination
 | | Dielectrics | To insulate the HV parts to the earth. | SF6 gas and spacers
 | | Construction & Support | Provide mechanical strength; SF6 gas containment; Monitor gas pressure/density | Enclosure base, enclosure body, sudden pressure relief, gas pressure/density gauge
Instrument Transformer (IT) | Based on its functionality: Current Transformer (CT), Voltage Transformer (VT) | Active Parts | Transform the current (CT) or the voltage (VT) from a higher value to a lower one. CT and VT are used for monitoring, and part of protection system. | Active parts: primary and secondary windings, dielectrics
Surge Arrester (SA) | – | Active Part | Cutting the peak of transient overvoltage according to its V-I characteristics. | Metal oxide blocks
Local Control Cabinet (LCC) | Installed on the GIS or free standing | Secondary | Connects with the rest of the control and protection | Bay control unit, contactors, auxiliary switches
GIS consists of bays, enclosures, components, and parts. Therefore, when performing FMEA on GIS,
its “hierarchical layers” should be taken into consideration. Figure 4.6.2 shows how a GIS can be seen
based on its physical layers (in the horizontal direction), and by its functionality layers (in the vertical
direction). Typically, a GIS can be divided into four layers of functionalities, namely (from top to bottom
hierarchy), substation, bay, enclosure, and component. The lower layer becomes a subsystem of the
higher layer. The component-layer consists of subsystems based on sub-functionalities. Physically, a
component consists of parts.
[Figure: GIS physical layers (substation, bays, enclosures, components, parts) set against the functional layers (substation, bay, enclosure and component functionality)]
Figure 4.6.2 – The hierarchical layers in GIS [B16]

The physical layers see a GIS based on the grouping of components, while the functional layers see
a GIS based on the division of functions.
Failure mode effect analysis can be done once the GIS layers are developed. Figure 4.6.3 gives an
example of failure modes of a GIS insulation system operating in a moist environment. A failure mode
which is reported in the CIGRE TB 513 [B29] is shown in the red box. The bubble with dotted lines
shows the Failure Susceptibility Indicators for circumstances that may increase the likelihood of a
failure mode more than usual.
Figure 4.6.3 – Example of some failure modes of the dielectric subsystem of GIS from a case study [B16]
Figure 4.6.4 – Example of some failure modes of the construction and support subsystem of GIS [B16]
Step 4: Identify Diagnostic Strategy and condition indicators

The critical failure modes can be different among utilities, as they depend on many factors, including
the environments and the operation and maintenance policy. Typically, the failure modes listed in
Figure 4.6.3 are deemed relevant for GIS bays.
After deciding on relevant failure modes, condition indicators should be defined. Table 4.6.2 provides
an example of condition indicators of various subsystems in GIS’ components.
Table 4.6.2 – The condition indicators in subsystems of GIS
GIS component | Failure mode / What to check | Condition indicator | Unit
Primary Conductor Subsystem
CB | Deterioration of main contacts in CB and switches | Cumulative short circuit current | kA2
CB | Deterioration of main contacts in CB and switches | Number of short circuit interruption | times
CB and Switches | Deterioration of main contacts in CB and switches | Static contact resistance | μΩ
CB, Switches, and Primary Conductor | Contact resistance of primary conductor joint | Hot spot in the enclosure | °C and pattern
Dielectric Subsystem
All Components | Density of SF6 | Gas Pressure (Leakage Rate) | Bar, MPa
All Components | Density of SF6 | Gas Density (Density reduction) | kg/m3
All Components | Quality of SF6 | SF6 Purity | %-SF6
All Components | Partial Discharge Activity | SO2 content | ppmV
All Components | Partial Discharge Activity | SF6 by-products other than SO2 | ppmV
All Components | Partial Discharge Activity | PD pattern & PD growth (including UHF/Acoustic PD localisation) | “Multiple indicator”
All Components | Possibility to have condensation on the surface of solid insulation | Humidity content in SF6 | ppmV
All Components | Possibility to have condensation on the surface of solid insulation | Dew point in SF6 at gas pressure | °C
Driving Mechanism Subsystem
CB and Switches | Mechanical wear | Number of mechanical operations | times
Hydraulic/Pneumatic CB | (Compression) energy storage readiness | Number of gas pressure replenishing unintentionally (if any) | times/period
CB | Mechanical integrity | Contact timing open and close | ms
CB | Mechanical integrity | Contact travel record | Contact position vs time
Electric Switches | Electric motor readiness | Motor current | A
Secondary Subsystem
CB and Switches | Any corrosion or dust deposited in wiring connections | Corrosion of wiring and aux relays | -
CB and Switches | Any corrosion or dust deposited in wiring connections | Deposited dust in wiring and aux relays | -
CB and Switches | | Hot Spot in wiring in LCC | °C and pattern
CB and Switches | Functionality of relays & remote controls | Relay & control function; Indicators check | OK/ Not OK
Construction and Support Subsystem
All | Corrosion on enclosures | Corrosion level | -
All | Corrosion on enclosures | Deposited Pollutants | -
All | Foundation integrity | Foundation integrity | -
Failure Susceptibility Indicators (FSI)

Failure Susceptibility Indicators were described in section 3.3.11 and 4.2.1.2. These are not failure
modes but factors that may or could indicate a probable onset of a failure mode. The use of the FSI is
to help distinguish that the same/similar GIS equipment may show aberrant failure behaviour due to
exceptional environmental conditions, exceptional loading conditions, maintenance regimes,
manufacturers’ concepts, etc. Note that the FSI is not a failure mode, it is also not a condition
indicator, but it makes one aware that the likelihood of a certain failure mode is becoming more
relevant. This may also aid in selecting/favouring certain condition indicators above others between
different sets of equipment, conditions, etc. However, since FSI is only an expectation, it functions as
a “warning flag” for the decision making regarding an asset. It stands as “side notes” of the AHI.
Step 5: Collect Inspection Data

Maintenance information and utilization data are needed to assess the performance of the asset.
Maintenance information includes the following:
• Overhaul information (if applicable)
• Inspection information
• Diagnostic indicator information
• Past repairs
• Availability of spare parts
• Maintainable/non-maintainable issues
Utilization data includes:
• Loading information
• Type of loads (reactive compensation, lines, cables, transformers)
• Environmental conditions
Step 6: Evaluate Current Condition relative to key failure modes and Norms Generation
The result from inspections (i.e. condition indicators) needs to be interpreted to justify the health status
of the subsystems in GIS components. It is achieved by setting the limit, or the boundary values,
known as the “norm.” The norm uses the measured values of the condition indicators to decide on
a health status which is further translated into a condition score. In case of quantitative condition
indicators (e.g. a gas pressure level) numerical values are to be derived/calculated to reflect good, fair,
poor and/or severe levels/classifications. In case of qualitative condition indicators some guidance
must be given in making proper classifications, for example for a visual inspection on the presence of
rust, presence of leakages, etc.
Several approaches to develop such norms exist, for example:
• By using statistical analysis on the condition indicators taken from field inspections on numerous
assets, to determine expected values and what deviations from expected values are deemed
fair, poor or severe deviations (this also includes trending analysis, information from failure
investigations, and comparison with sister components).
• By using recommendations from literature, such as GIS manuals, international standards, and
publications.
• By deterministic analysis, for example, from failure modes observed during a forensic
investigation, or by a laboratory test.
• By expert judgement (for example through discussion with the maintenance expert group or by a
Delphi test [B30]).
• By a combination of two or more of the above approaches.
An example of norm generation for moisture content in GIS is given below.

Approach 1: Setting the norm by using the statistics of humidity content from a population of GIS
The statistics using the distribution fitting method can be used for the definition of the boundary values
as proposed in [B31]. As an example, an estimated probability density function (PDF) is derived to
define the boundary of “Very Good,” “Deteriorate,” and “Bad” based on the three sigma (σ) limits of the
statistical distribution. Figure 4.6.5 gives the result for a distribution of moisture contents in 150 kV
GIS’ CB enclosures of a case study with service time over ten years [B16].
[Figure: fitted distribution of humidity content divided into the regions Very Good, Deteriorate, Bad and Very Bad]
Figure 4.6.5 – Boundary values for humidity content in the CB enclosure for GIS from a manufacturer.
The fitted distribution is the Gamma distribution.
From the example, the boundary values for the four condition statuses are as follows:
1. Very Good : humidity-content ≤ 135 ppmV
2. Deteriorate : 135 < humidity-content ≤ 277 ppmV
3. Bad : 277 < humidity-content ≤ 336 ppmV
4. Very Bad : > 336 ppmV
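
As a minimal sketch, the four boundary values above can be applied to a measured humidity content as follows. The function name `humidity_status` is hypothetical, and the boundaries are those of this case study (150 kV GIS, CB enclosures), so they would need to be regenerated for another population.

```python
import bisect

# Upper bounds in ppmV for the CB enclosure, from the statistical example above.
BOUNDS_PPMV = [135, 277, 336]
STATUSES = ["Very Good", "Deteriorate", "Bad", "Very Bad"]


def humidity_status(ppmv: float) -> str:
    """Classify a humidity content in ppmV into one of the four statuses."""
    # bisect_left makes each upper bound inclusive, matching the "≤" limits.
    return STATUSES[bisect.bisect_left(BOUNDS_PPMV, ppmv)]


for value in (120, 135, 200, 300, 400):
    print(value, humidity_status(value))
```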
Approach 2: Setting the norm based on recommendation from standards and manufacturer’s
recommendation.
A maximum humidity limit from the literature can usually only be interpreted as “Good” if the
measured value is below the recommended limit, and “Bad” if the measured value is above the limit.
These recommendations are as follows:
1. Maximum humidity content from a specific manufacturer’s guide:
a) CB enclosure : 350 ppmV
b) Non-CB enclosure : 840 ppmV
2. Limit from the IEC60227-1 Ed1 : 804 ppmV
3. Limit from the CIGRE TB 234 and 567 : 200 ppmV
Example:
The norms for the humidity content in GIS from a specific manufacturer, as derived from Approach 1
and Approach 2, are summarized in Table 4.6.3.
Table 4.6.3 – Summary of norm for humidity content for 150 kV GIS from a specific manufacturer as
generated from different approaches
Humidity Content (in ppmV) per Health Status; where two values are given they are CB / Non-CB enclosure
No. | Approach | Very Good | Good | Deteriorate/Moderate | Bad | Very Bad
1 | Statistics | ≤ 135 / ≤ 209 | - / - | 135-277 / 209-660 | 277-336 / 660-804 | > 336 / > 804
2 | Manufacturer | N/A | ≤ 350 / ≤ 840 | N/A | > 350 / > 840 | N/A
2 | IEC [B22] | N/A | ≤ 804 | N/A | > 804 | N/A
2 | CIGRE [B52], [B53] | N/A | ≤ 200 | N/A | > 200 | N/A
It is up to the user of the system to ultimately decide on which norm-scheme to use. The general
recommendation is to carefully consider (for example in the FMEA) what the best basis is for
the norm schemes applied. In some cases, more generalized recommendations, for instance from
international standards, may be less applicable to a specific situation, requiring a more localized
analysis. In the absence of such local information, it may be best to utilize norms derived from
standards. It is recommended to also involve the OEM.
Condition scores and their description
Condition scores are used to represent the condition status of subsystems in GIS (based on the
measured/ observed condition indicators). Table 4.6.4 gives an example of condition scores and their
definition as used in the case study. In this example a log base 3 scoring system is used.
Table 4.6.4 – Example of condition scores and their descriptions
Score | Qualitative meaning | Description | Likelihood of a failure mode to occur
1 | Very Good Condition | As good as new, no evidence of ageing or deterioration. | Very Low. GIS can continue working properly.
3 | Good Condition | Slight deterioration/ageing process is observed, but it is considered at normal stage. Minor defect may be observed, but it does not influence the GIS performance both in short and longer terms. | Low. GIS can continue working properly. It is running at normal deterioration/aging process.
10 | Moderate Condition | Deterioration/aging process has been observed beyond the normal stage. Intervention is required as deterioration/aging may interfere with the GIS performance in long-term. | Moderate. GIS can continue working but remedial action is advised, otherwise it may contribute to GIS performance in longer term.
30 | Bad Condition | Severe deterioration/aging has been observed. Intervention is required in short-term. | High. The GIS performance is possibly reduced in short-term.
100 | Very Bad Condition | Very severe deterioration/aging (i.e. at a final stage) has been observed. Emergency action is required. | Very High. GIS shutdown is required for further action to fix GIS performance.
Another example for primary conductor subsystem is shown in Table 4.6.5. Meanwhile, Figure 4.6.6
gives an example of badly deteriorated main contact of CB in GIS.
Table 4.6.5 – Condition scores of primary conductor subsystem in GIS
Condition Score
Component Condition Indicator Unit
1 3 10 30 100
Cumulative Short Circuit
20% < unit value ≤ 40%
40% < unit value ≤ 70%

CB ICUM-SC
Current
≤ 20% of design limit
100% of design limit

70% < unit value ≤
of design limit
of design limit
> design limit

Number of Short Circuit
CB NSC
Interruption
10% < Δ Rst-contact ≤ 20%

5% < Δ Rst-contact ≤ 10%
Δ Rst-contact > 20%

CB,
Δ Rst-contact ≤ 5%
Static Contact Resistance Rst-contact
Switches
N/A
Hot
No Hot Spot
All Hot Spot on the Enclosure (Pictorial) -
Spot
Found
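
The scoring of the cumulative short circuit current in Table 4.6.5 can be sketched as below. The thresholds (20 %, 40 %, 70 % and 100 % of the design limit) are read from the table; the function and parameter names are hypothetical, and the design limit itself must come from the manufacturer's data for the breaker concerned.

```python
import bisect

# Upper bounds as a fraction of the design limit, as read from Table 4.6.5.
UPPER_BOUNDS = [0.20, 0.40, 0.70, 1.00]
SCORES = [1, 3, 10, 30, 100]


def short_circuit_score(cumulative: float, design_limit: float) -> int:
    """Condition score for cumulative short circuit duty.

    Both arguments must be in the same unit (e.g. kA2, as in Table 4.6.2).
    """
    if design_limit <= 0:
        raise ValueError("design_limit must be positive")
    ratio = cumulative / design_limit
    # bisect_left keeps a value exactly at a bound in the lower score band.
    return SCORES[bisect.bisect_left(UPPER_BOUNDS, ratio)]


print(short_circuit_score(500.0, 1000.0))   # half of the design limit
print(short_circuit_score(1200.0, 1000.0))  # design limit exceeded
```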

Figure 4.6.6 – A carbonized female main contact in one of the circuit breakers in GIS in the case study.
The measurement before opening the enclosure had shown an increase of the static contact resistance
of more than 20% above the value during commissioning.
Figure 4.6.7 – Failed fragments of an epoxy disconnector drive tube (left) that exploded out through the
bursting disc into the bay and created a significant safety risk. A similar unit is on the right. Defects in
the casting were considered to be the cause.
Step 7: Aggregate Indicators’ analysis for Asset Health Index

There are several methods to aggregate the condition scores of subsystems into a single condition
score representing the overall condition of an asset. In case of GIS multiple aggregations may be
chosen, e.g. on a bay level or the complete GIS. The varying configurations among GIS makes should
be taken into consideration in the development of a GIS AHI model.
In accordance with the hierarchical layers of Figure 4.6.2, the steps are, for example, as follows:
1. The process starts at the lowest layer, where the worst score among the condition indicators
defines the condition score of a subsystem (following Table 4.6.4, a log base 3 value is
assigned to represent each condition indicator).
2. At the next layer, a component holds the sub-scores of the subsystems within it. No aggregation
takes place at this layer.
3. The process continues to the enclosure layer, where only the worst condition score of each
subsystem among the components is passed on.
4. A process similar to point 3 continues to the bay layer, where the worst condition score of each
subsystem among the enclosures is passed on.
5. A (sub) health index is generated at the bay layer in two steps:
a) All condition scores of a bay are added into a single condition score.
b) The score found in point a) is then translated into an index.
6. The worst (sub) health index among the bays in the same substation defines the health index of
the total GIS. Additional information can be given about the number of bays with that index.
For example, an index of 5 may be accompanied by a note that 2 bays in the GIS have an index of 5.
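
Steps 1 to 5a can be sketched as follows. The bay layout, the enclosure and component names and all scores are hypothetical and serve only to show the mechanics; the nested-dictionary data structure is likewise an assumption, not the brochure's data model.

```python
# Hypothetical bay: enclosure -> component -> subsystem -> indicator scores.
# Names and scores are illustrative; scores use the 1/3/10/30/100 scale.
bay = {
    "G0": {
        "CB": {
            "Primary": [1, 3],
            "Dielectric": [10, 3],
            "Driving mechanism": [3],
            "Secondary": [1],
            "Construction & Support": [1],
        },
    },
    "G1": {
        "DS": {
            "Primary": [1],
            "Dielectric": [30],
            "Driving mechanism": [10],
            "Construction & Support": [1],
        },
    },
}


def worst_subsystem_scores(bay):
    """Steps 1 to 4: keep the worst score per subsystem across the bay."""
    worst = {}
    for components in bay.values():
        for subsystems in components.values():
            for name, indicator_scores in subsystems.items():
                score = max(indicator_scores)  # step 1: worst indicator
                # Steps 3 and 4: worst across components and enclosures.
                worst[name] = max(worst.get(name, 0), score)
    return worst


def bay_condition_score(bay):
    """Step 5a: add the worst subsystem scores into one bay score."""
    return sum(worst_subsystem_scores(bay).values())


print(worst_subsystem_scores(bay))
print(bay_condition_score(bay))
```

For this hypothetical bay the worst subsystem scores are 3, 30, 10, 1 and 1, so the bay condition score is 45. Step 5b then translates that sum into an index.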
Fundamentally, the likelihood of failure of subsystems defines the GIS health index as a whole. The
example uses a hybrid coding method that combines the worst-score and the summation approaches.
Finally, Table 4.6.6 gives an example of a method for defining the health index scoring for the GIS
bay. The condition score range in the table comes from steps as explained in the previous paragraph.
Table 4.6.6 – Example of Condition Score (CC), interpretation, and bay index
CC range | Interpretation | Bay index
CC < 7 | All subsystems have condition code of 1 (very good) | 1 – Very good
7 ≤ CC < 14 | At least one subsystem has condition code of 3 (good) but none of them has code of 10 (moderate) | 2 – Good
14 ≤ CC < 34 | At least one subsystem has condition code of 10 (moderate) but none of them has code of 30 (bad) | 3 – Moderate
34 ≤ CC < 104 | At least one subsystem has condition code of 30 (bad) but none of them has code of 100 (very bad) | 4 – Bad
104 ≤ CC | At least one subsystem has condition code of 100 (very bad) or at least 3 subsystems have code of 30 (bad) each and the addition with the other two codes gives the total code above 104 | 5 – Very bad
Example:
The example is an HV GIS with a double busbar configuration. The bay configuration is shown in
Figure 4.6.8. The GIS consists of 8 bays: 4 transmission feeders, 3 transformer feeders, and 1 bus
coupler. The surge arresters are located outdoors, connected to an overhead line.
Figure 4.6.8 – The single line diagram of the GIS example from the case study
The configurations of enclosures in the three types of bays in the GIS example include:
• the line feeder,
• the transformer feeder,
• the bus-coupler.
The busbars are segmented among these configurations.
Figure 4.6.9 – The configuration of enclosures in three types of bays in GIS example
Table 4.6.7 shows the scores of the subsystems in the circuit breaker compartment (G0) of each bay.
The scores follow the definition in Table 4.6.4. Each compartment has five subsystems, and each is
shown with its worst condition indicator score. In the current example, the worst score (i.e. 100)
comes from the dielectric subsystem of Line 1B, due to its high moisture content.
Table 4.6.7 – Summary of Condition Scores of Subsystems in CB (G0) from each line of GIS
| Subsystem | Line1A | Line1B | Line2A | Line2B | Trx01 | Trx02 | Trx03 | Bus Coupler |
|---|---|---|---|---|---|---|---|---|
| Primary | 30 | 30 | 1 | 1 | 1 | 10 | 1 | 1 |
| Dielectric | 10 | 100 | 10 | 10 | 10 | 10 | 30 | 1 |
| Driving mechanism | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Secondary | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| Construction & Support | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Similar tables exist for the other compartments of the same GIS. The bay condition score comes from
the aggregation of the condition scores of the compartments (by summation of the worst subsystem
scores). The condition scores are then translated into the health index of the bay using Table 4.6.6.
Table 4.6.8 provides the summary of the bay index of the GIS in the example. Note that Table 4.6.7
only provides the condition scores of the subsystems in the CB (G0) enclosure of each line bay,
transformer bay and the bus coupler, whereas the bay index in Table 4.6.8 covers all enclosures of a
bay, i.e. not only G0 but also G1/G10, G2/G20 and G9 (see Figure 4.6.9).
Table 4.6.8 – Summary of Bay Index of GIS example
| Bay | Condition Score | Sub Health Index |
|---|---|---|
| Line Feeder 1A | 171 | 5 |
| Line Feeder 1B | 171 | 5 |
| Line Feeder 2A | 142 | 5 |
| Line Feeder 2B | 142 | 5 |
| TRX-01 | 142 | 5 |
| TRX-02 | 151 | 5 |
| TRX-03 | 142 | 5 |
| Bus Coupler | 43 | 4 |
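The translation of bay condition scores into indices, and the worst-bay rule for the complete GIS, can be reproduced with a short Python sketch. The thresholds are taken from Table 4.6.6 and the scores from Table 4.6.8; the function and variable names are assumptions made for illustration only.

```python
from bisect import bisect_right
from typing import Dict, Tuple

# Lower bounds of bay indices 2, 3, 4 and 5 (Table 4.6.6).
CC_LOWER_BOUNDS = (7, 14, 34, 104)

# Bay condition scores of the GIS example (Table 4.6.8).
BAY_SCORES = {
    "Line Feeder 1A": 171, "Line Feeder 1B": 171,
    "Line Feeder 2A": 142, "Line Feeder 2B": 142,
    "TRX-01": 142, "TRX-02": 151, "TRX-03": 142,
    "Bus Coupler": 43,
}


def bay_index(cc: int) -> int:
    """Translate a bay condition score (CC) into a bay index from 1 to 5."""
    return 1 + bisect_right(CC_LOWER_BOUNDS, cc)


def gis_health_index(bay_scores: Dict[str, int]) -> Tuple[int, int]:
    """Return the worst bay index and the number of bays sharing it."""
    indices = [bay_index(cc) for cc in bay_scores.values()]
    worst = max(indices)
    return worst, indices.count(worst)


for bay, cc in BAY_SCORES.items():
    print(f"{bay}: CC={cc}, index={bay_index(cc)}")
print(gis_health_index(BAY_SCORES))  # (5, 7)
```

Reporting the count alongside the worst index keeps the single GIS figure from hiding whether one bay or most bays are in that state.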
Besides the sub health index, the susceptibility indicators (FSI, Failure Susceptibility Indicators) have
been added for each bay. In the example, two susceptibility indicators have been added, i.e. related to
the environmental parameters and the lightning density.
Table 4.6.9 – Failure susceptibility indicator index of GIS example
| Bay | HI Bay | FSI – pollutants | FSI – lightning density |
|---|---|---|---|
| Line Feeder 1A | 5 | Low | High |
| Line Feeder 1B | 5 | Low | High |
| Line Feeder 2A | 5 | Low | High |
| Line Feeder 2B | 5 | Low | High |
| TRX-01 | 5 | Low | High |
| TRX-02 | 5 | Low | High |
| TRX-03 | 5 | Low | High |
| Bus Coupler | 4 | Low | High |
The complete result of the AHI of the GIS example is as follows:

The GIS example has 7 bays with a bay health index of 5 (i.e. “Very Bad” condition); moreover, every
bay carries a red warning flag indicating susceptibility due to lightning incidence.
Step 8: Plan Mitigation Actions

Remedial action on GIS (e.g., refurbish, replace, repair) usually takes place at the bay layer.
Although the action can be limited to a specific part or component, it still requires an outage of the
affected bay. Therefore, the overall condition of a bay is necessary for deciding the optimal mitigating
action. This is the reason why the summation process is performed only at the bay layer. Providing a
(sub) health index for every bay is useful for the asset manager, yet it is still possible to show
an index for the complete GIS substation.
To estimate the effectiveness of mitigating actions in the described GIS example, it should be
understood which condition indicators caused the increased (sub) asset health index. As shown in
Table 4.6.8, seven bays of the GIS example have a Health Index of 5, representing a “Very Bad”
condition. The condition of the dielectric subsystem of the G9 enclosures contributed substantially to
this increased value of the index (see Figure 4.6.9). A multi-year report found a high humidity
content, in the range of 2500 – 5000 ppmV, at the termination cone placed inside the G9 enclosure.
The cause was probably related to the design, as there is no absorbent in such a termination. The
moisture absorbed in the semiconducting tapes evaporates during GIS operation, which results in a
high humidity content. An SO2 content of dozens of ppmV has also been reported from all
termination enclosures, which indicates PD activity inside the termination.
The FSI Lightning Density (including surge arrester readiness), shown in Table 4.6.9, indicates that all
bays have a “HIGH” level of susceptibility. This is mainly due to the old outdoor surge arresters of the
GIS, which is located in a high lightning density area.
The following scenarios are available for mitigation actions on this GIS example:
1. Overhaul the GIS, with repair/retrofit of the GIS termination
2. Replace the surge arresters
3. Combination of points 1 and 2
4. Gas reclamation
5. Gas replacement
6. Replace the whole GIS (including the surge arresters)
Approaches such as multicriteria analysis could be used to select the optimum solution, covering
parameters like cost, downtime of the GIS, and residual risk of failure. Based on the analysis,
Option 3 is the best selection for the GIS example.
The scope of the overhaul includes replacement of seals, adjustment of the GIS driving mechanisms,
replacement of absorbent, and the retrofit of the termination design. It takes two weeks per GIS bay.
The overhaul work will improve the dielectric status, not only inside the G9 enclosure but also in
others. As a result, after the overhaul, the revised bay index is expected to be at a level of “2”. In
addition, the replacement of the surge arresters could decrease the susceptibility against a lightning stroke from
level “5” to “3” (Moderate). Table 4.6.10 summarizes the revised bay index and failure susceptibility
indicator of the GIS example, before and (expected) after the mitigation action.
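Table 4.6.10 – Summary of Bay Health Index & Failure Susceptibility Indicator index of GIS example
before and (expected) after mitigation action
| Bay | Bay Index: Before | Bay Index: Expected | FSI – Lightning Density: Before | FSI – Lightning Density: Expected |
|---|---|---|---|---|
| Line-1A | 5 | 2 | 5 | 3 |
| Line-1B | 5 | 2 | 5 | 3 |
| Line-2A | 5 | 2 | 5 | 3 |
| Line-2B | 5 | 2 | 5 | 3 |
| T1 | 5 | 2 | 5 | 3 |
| T2 | 5 | 2 | 5 | 3 |
| T3 | 5 | 2 | 5 | 3 |
| Bus Coupler | 4 | 2 | 5 | 3 |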
Other substation primary equipment

It is not possible to cover all possible asset types in this brochure. Major primary asset types have
been discussed in sections 4.2 – 4.5. Here we will include salient features of a few of the most
important “other” items. If the aim is to lead to an AHI for a complete substation then it is important to
have recognised the existence of all of these other assets and to have ensured that they are included
even if they have only been given some form of limited assessment. With the exception of bushings,
failure of these other assets is most likely to affect delivery compliance, safety and the
environment rather than cause a system disruption, but these are equally important as
business drivers.
Capacitor banks
These are sets of individual capacitor units connected in series and parallel combinations to achieve
the required voltage withstand and bank capacitance. Within each capacitor can is a stack of folded
rolls made from a high-grade polymer sandwiched between two aluminium foils, all being impregnated
and encased in an insulating fluid. Critical are the connections between individual rolls and to two
small solid bushings and the seal of the latter to the can. They are protected with fuses (internal or
external) or they can be fuse-less with relay protection. Trays of arrays are usually assembled within
their own compound to protect the workforce. Often the trays are at height. The capacitors could be
tuned with an adjacent inductor so as to create a filter. Their role is also to provide power factor
correction and voltage regulation, as well as removing harmonics.
Figure 4.7.1 – Stack of capacitor rolls within a can

Bushings
These are critical interfaces between the internals of an asset, through the earthed asset tank walls to
the high voltage circuit connection. To control the stress in the space between conductor and earth
flange where it is bolted to the asset tank it is normal to have a series of capacitive foils separated by
a layered dielectric. The latter may be a Resin Impregnated Synthetic material (RIS) or Oil
Impregnated Paper (OIP). Earlier types included a resin bonded paper (RBP) core. The core itself is
immersed in oil within a porcelain or resin housing. (Some later designs of RIS use a foam or gel in
place of the oil.) The conductor is terminated within a head having a sealed air space above the oil,
together with a sight glass to show the oil level. The outermost foil normally has a connection brought
to the surface and terminated to earth within a small metal box on the flange. The connection can be
undone to allow for periodic out of service testing of the dielectric. This thereby allows the condition of
both the bushing and the transformer to be tested without including the polluted surface condition in
the measurement circuit.
Surge arresters
These are devices that protect connected equipment against overvoltages. Their role is
to withstand the normal voltage with minimal current flow. In overvoltage conditions the arrester
impedance drops rapidly and the arrester conducts, so protecting the connected equipment. Earlier,
protection was given by arcing horns that had a lower transient flashover voltage than the
bushings fitted to the protected equipment. More usual now is a dedicated external arrester to
achieve the purpose. The arrangement is normally a series of connected metal oxide blocks.
They are sized to achieve the required voltage withstand and are contained within a porcelain or
synthetic housing. The first-generation devices consisted of an array of small sealed gas filled
spark gaps. This design was followed with one incorporating spark gaps with silicon carbide
blocks. Spark gaps were eliminated when a column of metal oxide blocks became standard.
Cable sealing ends
These components are used as interconnectors within a substation or as an incoming cable
connection. The cables typically consist of cores with an insulation made from extruded cross-linked
polyethylene (XLPE) contained within a protective sheath. Older types of cables are oil-filled.
The removal of the core screen within the termination is critical. The stress at the cable end is
capacitively controlled with a pre-formed stress relieving cone connected to the end of the core
screen. All are encased within a dielectric fluid inside a porcelain or synthetic housing. This may
then be mounted on a gantry prior to connecting to the incoming overhead line, or connected directly
into a cable box on a transformer.
Insulators
These are used throughout a substation and its incoming feeders. They may not be identified as
individual assets.
They could be solid insulators supporting open conductors, or cap and pin individual units
supporting an incoming connection at the tower and then onward along the circuit external to
the substation. Normally these are made from porcelain or glass. Porcelain is typically ten times
stronger in compression than in tension, and designs reflect this. Polymeric insulators have also
emerged in recent years, but their strength can be limited.

Data and tables identified in Section 4.1 would be completed insofar as applicable. This includes:
• Asset register information
• Asset role and circuit type
• Total number of assets in same design group
• Spare assets available in the event of failure
• Spare parts for specific assets
• Asset technology
Capacitor banks
These would be treated like any other primary asset. Typical data would include:
• Manufacturer and factory location of both the bank and individual capacitors
• Rated voltage of the bank
• Rated reactive power at nominal voltage
• Maximum short-circuit withstand
• Short circuit withstand time duration
• Neutral connection
• Earthing arrangement
• Service location – outdoor/indoor
• Cooling – air natural forced ventilation or internal
• Degree of protection of the enclosure
• Maximum sound pressure level
• Support structure material
• Portable earth lead connection points
The individual capacitor unit has a rating of 2-25 kV. It should have its own nameplate data; one
important detail is the materials of construction. Earlier capacitors contained PCB, which has
environmental significance if the can fails and there is a leak.
Failure is most likely to be on a can-by-can progression, until some critical stress on remaining units
leads to a complete failure. The time frame is likely to be fairly long and could be tracked with a Level
4 analysis.
Bushings
Bushings can be used on transformers, circuit breakers and for through wall exits. They are also
used on hybrid switchgear between the GIS and AIS parts. The bushing is a critical part of the asset
to which it is fitted and would have an analysis level determined by the primary asset. The nameplate
should contain ratings and factory test data.
This and other information includes:
• Bushing manufacturer
• Factory and date of manufacture
• Voltage and current ratings
• Standards applying at manufacture and factory test
• Design and materials used (e.g. OIP or RIS)
Surge arresters
According to IEC 60099-4 [B32], surge arresters are identified by the following nameplate information:
• Designation of arrester
• Continuous operating voltage Uc
• Rated voltage Ur
• Rated frequency fr
• Nominal discharge current In
• Rated short-circuit current Is
• Manufacturer's name or trademark, type and identification of the complete arrester
• Identification of the assembling position of the unit (for multi-unit arresters only)
• Year of manufacture
• Serial number
• Repetitive charge transfer rating Qrs
• Contamination withstand level of the enclosure (IEC TS 60815-1)
Figure 4.7.2 – Failed centre phase arrester

Their primary function is to protect primary assets in the event of an overvoltage. If they fail, then this
protection is no longer given. This implies that in the first instance their failure is not critical to
network supply. Most failures are not explosive (see the fairly localised debris in Figure 4.7.2) [B34].
It is the consequence of a failed arrester that is critical, since primary assets would be unprotected
and could then fail with a high direct and indirect cost, power interruption, and safety and
environmental concerns. This calls for a careful analysis of failure rates and determinants.
The routine diagnostic strategy would be at Level 2 or 3.
Cable sealing ends

The nameplate on the support gantry should list:
• Cable supplier
• Circuit name
• Installation date
• Voltage and current level
• Materials used in the sealing end
These systems are very reliable, and failures are usually related to manufacturing defects or incorrect
assembly on site.
Insulators
A Level 3 diagnostic system is all that is likely to be required. In exposed sites pollution levels may be
measured using a sample insulator. This may indicate a need for cleaning or adding booster sheds to
improve flashover performance.
Capacitor banks
The most common failure mode is a dielectric failure of the capacitor units. This may follow from
system transients, external faults (animal and bird impact being important), loose terminations (hot
joint) and manufacturing defects.
Failure of a can increases the current in connected units allowing other cascaded failure causes such
as overheating. Both dielectric and thermal modes are affected by leakage of the fluid. Earlier designs
used paper as the dielectric, and these had a significantly higher failure rate than the currently used
polypropylene. Other problems relate to damaged or polluted can insulators, as well as flashovers
following animal or bird contact.
Figure 4.7.3 – Failure of bottom rack capacitor

External fuses such as shown in Figure 4.7.3 (the vertical orange tubes) [B34] reduce the reliability of
the system and many now prefer internal fuses or no fuses. However, this eliminates a useful visual
diagnostic.
Surge arresters
With all designs their failures may be due to:
• Damage to the housing affecting voltage withstand
• Moisture ingress into the housing
• Overstress (voltage, current, temperature)
• Operations
• Poor manufacture, installation or selection of an inappropriate rating.
Additionally, with gap arresters a shorting out of some blocks will increase the stress on others and will
lead to progressive failure. With metal oxide material, the degree of ageing depends on the
nature/quality of the granular outer layer. Experience reported by a system operator in India identified
90% of failures due to moisture ingress, i.e. associated with poor manufacture [B33]. The remaining
10% were due to deterioration of the metal oxide blocks. Debris around the base of a failed arrester
can be seen in Figure 4.7.2.
These are problems arising from manufacture and specified acceptance tests, perhaps including
immersion. Many find that their diagnostic strategy relates more to the manufacturer and duty factors.
Survey methods for condition assessment should include:
• Visual examination to detect surface damage and/or pollution
• Surge counter data, monitoring system damage
• Gas pressure, tightness (in case of GIS)
• Arrester disconnector, functionality (if present in case of medium voltage equipment)
• Measuring compensated third harmonic resistive current and looking for increasing or elevated
levels, followed up with out of service tests measuring capacitance/power factor to identify block
deterioration and insulation resistance changes to detect moisture leakage (see [B33] and [B35]).
The failure scenario described above (low failure rate) applies to surge arresters used in transmission
and distribution networks. It is quite possible that for special applications, e.g. in industrial networks
(electric arc furnaces, capacitors), higher stresses and thus higher failure rates may occur, but these
applications are not part of this consideration.
Bushings
Failure causes may be attributed to mechanical, dielectric and thermal stresses, as well as surface
damage. A recent A2 brochure has described the various aspects covering bushing reliability [B36].
Most importantly, failures of bushings are often explosive, with the outer insulator ejected several
hundred metres. An electric arc can pass through the asset itself and, in the case of a transformer,
lead to failure of the latter and pose a fire risk and an environmental impact from lost transformer
oil. Assessing the likelihood of failure is therefore important. A common failure mode is where the paper, or synthetic
material used in RIS bushings, has deteriorated leading to PD at foil ends. Another is where dielectric
losses and conductivity have increased. This may be due to penetration of moisture, the inks used as
foils in some designs, or corrosive sulphur. Thermal degradation of the dielectric can also increase
losses. Contamination of the oil can lead to failure. This might be caused by moisture ingress,
contaminants from oil-gasket interaction and oil ageing.
Cables
Most failures are associated with sealing end issues rather than in the cable; and there they can be
explosive with debris sent over 100 m. The cable itself is extruded at the factory and tested there on a
drum. The remaining risks occur when rolling out the cable, pulling in, bending and creating the
termination. The latter involves paring back the core screen with a tool and sliding a tightly fitting pre-
formed stress cone to mate with the end of the core screen. Problems can arise if the end of the
screen is damaged, the paring creates stress raisers or the whole system is damaged during fitting.
Other installation issues can arise if there is an error in earthing the core screen. (Circulating currents
can be reduced if the cable is earthed at only one end. If the design calls for just one end and both or
neither are earthed, then discharge and overheating can occur.) Further, over time in service the
insulation medium, e.g. viscous silicone-based fluid or XLPE can itself deteriorate if moisture enters
the housing.
Figure 4.7.4 – Puncture hole at end of core screen on 132 kV cable.
Insulators
Failure causes include damage from impacts in general, or from rifle bullets in particular. Failure can
follow flashover in heavily polluted sites. In a mechanism known as dry-banding, surface currents can
lead to localized erosive discharge, as shown in Figure 4.7.5. Another mechanism follows cracking at
the cement area. With age the cement will improve its mechanical strength, but as it does so it can
expand. This expansion transfers stresses to create local tensile forces. Where moisture can enter
hollow housings, PD can develop, tracking down the inner surface.
Figure 4.7.5 – Tracking on porcelain insulator
Condition indicators
With each asset in this section performance data will be assembled, indicating past failures and
outages, identified by site and manufacturer.
Table 4.7.1 – Diagnostic indicators in use and failure modes
| Diagnostic indicator data available in utility | Plant | Failure modes being assessed | Notes |
|---|---|---|---|
| Visual surveys, energised assets | Cap bank | External fuses; can condition | Very limited access restricting effectiveness. |
| | Arrester | External damage; excess pollution | Internal deterioration not identified. |
| | Bushing | External damage; loss of oil | |
| | Cable sealing end | External damage; excess pollution | |
| | Insulator | External damage; excess pollution | |
| Survey diagnostics: oil analysis | Cable sealing end; Bushing | Flashover due to moisture ingress, overheating, PD | Specialist team needed to extract sample and replace fluid |
| Infra-red surveys | All | Overheating at connections and some internal overheating; bushing oil levels | |
| UHF-PD surveys | All | PD, all assets | With antenna and UHF-PD scanner, see Figure 4.7.6 |
| UV Corona surveys | All | External PD, all assets | See Figure 4.7.6 |
| Third harmonic compensated currents | Arresters | Moisture ingress and/or block deterioration | |
| Surge counts | Arresters | Ageing of unit | |
| Out of balance harmonics | Cap banks | Faulty units | |
| Offline tests: DDF/Capacitance | Arresters | Moisture ingress | |
| | Bushings | Moisture ingress, shorted foil sections, dielectric deterioration | |
| | Cable sealing end | Moisture ingress, dielectric deterioration | |
| | Cap banks | Faulty units | |
| Offline tests: Insulation Resistance | Arresters | Moisture ingress | |
| Offline: visual inspection and clean of capacitors | Cap banks | Detailed access looking for bulging of cans, discolorations | |
| Online continuous monitoring at bushing taps | Bushings | PD and dielectric deterioration | |

Documentation from the manufacturer and held by the utility, often at the substation, would be used to
complete Table 4.1.4, Table 4.1.5 and Table 4.1.6.
Simple asset health review levels 1 and 2

A traditional strategy would add to the basic collection of data a routine visual inspection noting
condition abnormalities, together with basic data such as trip records. Surge counts on an arrester
should be recorded.
With all items in this group a routine visual inspection is recommended, using binoculars since all
are elevated. A more thorough visual inspection is possible during an outage. Also, during an
outage, equipment such as a capacitor bank and its components (cans, fuses, etc.) can be
accessed safely, examined and cleaned. All porcelain insulators need to be inspected for cracks or
breaks. It should be possible to see bulging or paint discoloration on the items, both indicative
of overheating. Oil stains indicate leaks.
4.7.4.2 Intermediate asset health review Level 3

Above Level 2, the simple AHI can be extended by scanning with an infra-red camera to detect
overheating at connections and to check bushing oil levels. To detect PD, a frequency scan with
UHF-RFI is used, and for external PD a corona UV camera (see Figure 4.7.6). All can be carried out
non-invasively and from the ground.
Figure 4.7.6 – UHF and UV Scanning to detect PD

A further online measurement for surge arresters is to measure the third harmonic compensated
current around the earth lead, using a clip-on ammeter and an induction antenna [B33] and [B35].
If there is evidence of a series of malfunctions in bushings and cable ends it is possible to extract a
fluid sample (during an outage) and perform a materials analysis to detect chemical changes and
moisture ingress. The presence of hydrogen and carbon monoxide is detected if there has been PD.
More concerning is the presence of acetylene, indicative of higher temperature arcing. These assets
are considered hermetically sealed and any disruption has to be considered carefully.
Some utilities do not perform this level of test; others restrict it to specialist teams.
Advanced asset health review Level 4

The primary diagnostics for de-energised use look for changes in dielectric loss and for punctured
foils, using dielectric dissipation factor (DDF) and capacitance techniques. They have a role for all
assets in this list. Insulation resistance is important to identify surface leakage down the inside of an
arrester housing. Measuring capacitance changes in a capacitor bank is effective in detecting
shorted cans.
For capacitor banks, one set of criteria is given in Table 4.7.2 and Table 4.7.3 [B37]. Testing the
capacitor cans in a bank individually can be useful, but measuring the complete array is more
effective. A comprehensive record of test data and conditions (voltage, current, watt loss, power factor,
capacitance, correction factors, temperature, humidity, date and time) is essential for comparison and
trend analysis. Reference [B37] concluded that using the average of the measured values as a
benchmark would be the simplest and most effective approach when analysing the test data. This also
confirmed the limit suggested in a previous paper [B37], where it is summarized in Table 3 and 4.
Table 4.7.2 – Dielectric dissipation factor analysis for capacitor banks
| Condition | Relative limit of the average | Absolute limit | Typical problem | Action |
|---|---|---|---|---|
| Very low | 50% | | Case earth loose or bad connection | Inspection, cleaning and tightening the unit mounting |
| Lower than normal | 65% | | Case earth or bad connection | Inspection of the unit mounting |
| Normal | 100% | < 3% | | |
| Higher than normal | 150% | 3% ≤ x < 5% | Bad connection or internal deterioration | Inspecting and tightening the connection and fuse with retest |
| Very high | 200% | 5% ≤ x | Severe bad connection or internal deterioration | Inspecting and tightening the connection and fuse with retest or replace. |
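A screening of measured DDF values against Table 4.7.2 might be sketched as follows. The absolute limits are those of the table; how the relative limits bound each band is not stated there, so the band edges used here (at or below 50% and 65%, at or above 150% and 200% of the bank average) are an assumption, as are the function names and the sample readings.

```python
from statistics import mean
from typing import List, Tuple


def relative_condition(ddf: float, bank_average: float) -> str:
    """Classify a unit's DDF against the bank average (band edges assumed)."""
    ratio = ddf / bank_average
    if ratio <= 0.50:
        return "Very low"
    if ratio <= 0.65:
        return "Lower than normal"
    if ratio >= 2.00:
        return "Very high"
    if ratio >= 1.50:
        return "Higher than normal"
    return "Normal"


def absolute_condition(ddf_percent: float) -> str:
    """Classify a DDF value in percent against the absolute limits."""
    if ddf_percent >= 5.0:
        return "Very high"
    if ddf_percent >= 3.0:
        return "Higher than normal"
    return "Normal"


def screen_bank(ddf_percent: List[float]) -> List[Tuple[str, str]]:
    """Return (relative, absolute) conditions for every unit in a bank."""
    average = mean(ddf_percent)
    return [(relative_condition(x, average), absolute_condition(x))
            for x in ddf_percent]


# Hypothetical readings in percent, for illustration only.
readings = [0.20, 0.22, 0.21, 0.55]
for value, result in zip(readings, screen_bank(readings)):
    print(value, result)
```

Keeping the relative and absolute results separate leaves it to the utility to decide how the two criteria are combined.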
Table 4.7.3 – Capacitance analysis for capacitor banks
| Condition | Limit of deviation | Typical problem | Action |
|---|---|---|---|
| Very low | x ≤ -5% | Severe delamination, low fluid level or discontinuity | Replacing the unit or repair the connection |
| Lower than normal | -5% ≤ x < -3% | Delaminating, low fluid level or partial discontinuity | Inspection of the unit mounting or retesting sooner |
| Normal | -3% ≤ x < 3% | | |
| Higher than normal | 3% ≤ x < 5% | Partial short-circuited section of internal layers | Inspecting the unit and replacing if necessary |
| Very high | 5% ≤ x | Partial short-circuited section of internal layers | Replacing the unit |
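The deviation bands of Table 4.7.3 translate directly into a small classifier. The sketch below is illustrative: the function names are hypothetical, the choice of reference value (nameplate or bank average) is left to the caller, and a deviation of exactly -5%, which the table lists in two bands, is assigned here to "Very low".

```python
def deviation_percent(measured: float, reference: float) -> float:
    """Capacitance deviation from the chosen reference value, in percent."""
    return 100.0 * (measured - reference) / reference


def capacitance_condition(deviation: float) -> str:
    """Classify a capacitance deviation in percent using Table 4.7.3."""
    if deviation <= -5.0:
        return "Very low"
    if deviation < -3.0:
        return "Lower than normal"
    if deviation < 3.0:
        return "Normal"
    if deviation < 5.0:
        return "Higher than normal"
    return "Very high"


# Hypothetical deviations in percent, for illustration only.
for x in (-6.0, -4.0, 0.5, 3.5, 7.0):
    print(x, capacitance_condition(x))
```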
There is a long tradition of DDF/ capacitance testing on HV bushings, this being done in an outage
and at 10 kV. As a result, there are extensive utility shared databases detailing most major suppliers,
designs and voltage levels. Individual results can be compared against similar designs. Particularly
important are changes relative to the nameplate values. Changes used for alert/action levels depend
upon the general aspects of design, materials used, known design-specific failure modes and voltage
levels. More focussed is a dependency upon past experience with particular designs. Changes from
nameplate DDF vary due to these factors, but changes from 20 to 100% indicate a significant concern
and the need to measure more frequently. At that point, the rate of increase becomes critical.
Capacitance changes may be due to dielectric changes, but more likely are indicative of shorting of
adjacent foil sections. Concerning levels of change depend upon the number of foil sections.
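Why the concerning level of capacitance change depends on the number of foil sections can be seen from a simple idealised model, which is an assumption and not taken from the brochure: if the condenser core behaves as N equal capacitive sections in series, shorting k of them raises the total capacitance by a factor N/(N - k). The sketch below also expresses a DDF change relative to the nameplate value; the 20% alert figure is the one quoted in the text, while all names and sample values are hypothetical.

```python
def ddf_change_percent(measured_ddf: float, nameplate_ddf: float) -> float:
    """Change of DDF relative to the nameplate value, in percent."""
    return 100.0 * (measured_ddf - nameplate_ddf) / nameplate_ddf


def capacitance_rise_percent(total_sections: int, shorted_sections: int) -> float:
    """Capacitance rise when some of N equal series sections are shorted.

    Idealised model: C = Cs / N before, C' = Cs / (N - k) after,
    so the rise is k / (N - k).
    """
    if not 0 <= shorted_sections < total_sections:
        raise ValueError("shorted_sections must be in the range 0..N-1")
    return 100.0 * shorted_sections / (total_sections - shorted_sections)


# Hypothetical section counts: one shorted section shows up far more
# clearly in a bushing with few sections than in one with many.
for n in (20, 50):
    print(n, round(capacitance_rise_percent(n, 1), 2))

# Hypothetical DDF readings in percent: nameplate 0.25, measured 0.35.
change = ddf_change_percent(0.35, 0.25)
print(round(change, 1), "measure more frequently" if change >= 20.0 else "normal interval")
```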
One way of assessing change is to compare values with benchmarks of the same design group within
the company, or with an international database. In the case of the set of three bushings, H1, H2 and
H3, the company had only 100 similar bushings, whereas a better benchmark is possible using a
collaborative industry wide data set of 2700 similar bushings.
Figure 4.7.7 – Benchmarking DDF data with international database resource [B38]
4.7.4.3 Advanced asset health review Level 5

There is also a long tradition of monitoring DDF of bushings online [B38]. This avoids the need for
regular outages in order to identify failure initiating deterioration. Connections are made at the three
bushing taps, taking current and voltage measurements by recording ‘raw’ sinusoid data on each
channel. The data is used to ‘derive’ capacitance and power factor values. PD levels can be monitored
at the same tap – see Figure 4.7.8. This particular set of data relates to an issue revealed when a
step-up transformer was re-energised after an outage.
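One common way of deriving capacitance and power factor from raw sinusoid records is to extract the fundamental phasors of voltage and current and form their ratio. The sketch below illustrates that general technique on synthetic data; it is not a description of any particular monitoring product, and the sampling rate, signal values and function names are assumptions. It also assumes that a voltage reference is available and that the record spans a whole number of cycles.

```python
import cmath
import math
from typing import Sequence, Tuple


def fundamental_phasor(samples: Sequence[float], sample_rate_hz: float,
                       frequency_hz: float) -> complex:
    """Single-bin DFT: complex amplitude of the component at frequency_hz."""
    step = 2.0 * math.pi * frequency_hz / sample_rate_hz
    total = sum(x * cmath.exp(-1j * step * k) for k, x in enumerate(samples))
    return 2.0 * total / len(samples)


def capacitance_and_power_factor(voltage: Sequence[float],
                                 current: Sequence[float],
                                 sample_rate_hz: float,
                                 frequency_hz: float) -> Tuple[float, float]:
    """Return (capacitance in farads, power factor) from the admittance I/V."""
    v = fundamental_phasor(voltage, sample_rate_hz, frequency_hz)
    i = fundamental_phasor(current, sample_rate_hz, frequency_hz)
    admittance = i / v
    omega = 2.0 * math.pi * frequency_hz
    return admittance.imag / omega, admittance.real / abs(admittance)


# Synthetic record: ten full 50 Hz cycles of a lossy capacitor (hypothetical values).
F_HZ, FS_HZ, N = 50.0, 10_000.0, 2_000
C_TRUE, TAN_DELTA, V_PEAK = 500e-12, 0.005, 10_000.0
OMEGA = 2.0 * math.pi * F_HZ
times = [k / FS_HZ for k in range(N)]
voltage = [V_PEAK * math.cos(OMEGA * t) for t in times]
current = [V_PEAK * OMEGA * C_TRUE
           * (TAN_DELTA * math.cos(OMEGA * t) - math.sin(OMEGA * t))
           for t in times]

c_est, pf_est = capacitance_and_power_factor(voltage, current, FS_HZ, F_HZ)
print(c_est, pf_est)
```

For small losses the power factor and the dissipation factor are nearly equal, which is why the two terms are often used interchangeably for bushings.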
This is a technology to be used with care, with the utility having clear action plans of what to do in the
event of an alert and in what time scale.
Figure 4.7.8 – Bushing tap modified for PD and PF measurements, and typical results
Step 5 Collect Inspection data
Table 4.7.4 – Data and scale codes
| Asset | Test | Step 5: Data or observations, in units as collected | Step 6: Converted to scale code scores 1-5 or log 1-100 |
|---|---|---|---|
| Capacitor banks | Visual inspection | | |
| | Survey measurements – IR and UHF-PD | | |
| | Harmonic currents | | |
| | De-energised inspections | | |
| | DDF/ capacitance | | |
| Bushings | Visual inspection | | |
| | DDF/ Capacitance | | |
| | Oil quality for acidity and consequences of contamination or deterioration. | | |
| | Dissolved gas levels and rates of change for indicating active thermal, PD or arcing fault. | | |
| Surge arresters | Visual inspection | | |
| | Compensated third harmonic | | |
| | Insulation resistance/capacitance and DDF | | |
| Cable sealing end | Visual inspection | | |
| | Oil quality for acidity and consequences of contamination or deterioration. | | |
| | Dissolved gas levels and rates of change for indicating active thermal, PD or arcing fault. | | |
| | Insulation resistance/capacitance and DDF | | |
| Insulators | Pollution levels | | |
| Infrastructure | Concrete footings cracks or deterioration? | | |
| | Anchor bolts missing or rusty? | | |
| | Earthing leads or straps - oxidised/ tight? | | |

Each data point needs to be scored with a scale code in either a log or a linear system. The
interpretation must always be consistent: the scale code indicates that the failure mode will induce a
failure within a defined period, as indicated in Table 4.7.5. Each of these asset classes should be
capable of a significant lifetime and require minimal maintenance. The periods shown are typical of
what might apply, but individual companies should define their own time criteria.
Table 4.7.5 – Scale code assignment
| Scale code (log) | Scale code (linear) | Description | Expected fault-free life |
|---|---|---|---|
| 1 | 1 | Very good condition | >25 years with normal test and inspect schedule |
| 3 | 2 | Good condition | 15-25 years with normal test and inspect schedule |
| 10 | 3 | Fair condition | 5-15 years with more regular test and inspect schedule |
| 30 | 4 | Poor condition | <5 years. Advanced test and inspect; plan remediation or replacement. |
| 100 | 5 | Critical condition | 0-6 months. Immediate remediation or replacement. |
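As an illustration, the assignment in Table 4.7.5 can be sketched as a lookup from an estimated fault-free life to a scale code. This is a minimal sketch, not part of the methodology itself: the function name `scale_code_for_life`, the use of years as the input unit and the treatment of band boundaries (for example, exactly 15 years scored as "good") are assumptions, and each company would substitute its own time criteria.

```python
from typing import NamedTuple


class ScaleCode(NamedTuple):
    log: int          # logarithmic scale code (1, 3, 10, 30, 100)
    linear: int       # linear scale code (1 to 5)
    description: str


def scale_code_for_life(expected_life_years: float) -> ScaleCode:
    """Return the Table 4.7.5 scale code for an expected fault-free life.

    Band boundaries are an assumption: the table does not say which band
    a value of exactly 5, 15 or 25 years belongs to.
    """
    if expected_life_years < 0:
        raise ValueError("expected life cannot be negative")
    if expected_life_years <= 0.5:           # 0-6 months
        return ScaleCode(100, 5, "Critical condition")
    if expected_life_years < 5:              # <5 years
        return ScaleCode(30, 4, "Poor condition")
    if expected_life_years < 15:             # 5-15 years
        return ScaleCode(10, 3, "Fair condition")
    if expected_life_years <= 25:            # 15-25 years
        return ScaleCode(3, 2, "Good condition")
    return ScaleCode(1, 1, "Very good condition")  # >25 years


for life in (30, 20, 8, 2, 0.25):
    print(life, scale_code_for_life(life))
```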
Step 7: Aggregate Indicators for AHI

A tabulation similar to Table 3.6 as described earlier would be constructed and used.
Step 8: Plan mitigating actions

It is likely that much can be done to improve or mitigate the score short of replacing the asset or one of
its constituent parts.
Control and protection

Control and protection functionality in a substation is provided by protection relays and by substation
automation devices. These devices were initially of electromechanical design, followed by static
devices. Current designs are microprocessor-based, with some form of digital communications
between the devices.
To work, these devices need additional low voltage secondary equipment such as switches, indicator
devices, contactors and fibre optic star couplers. All of these devices are connected together by
metallic or fibre control cables via sets of terminals.
The most common reason for replacing these secondary plant items is not a condition-related
increase in the risk of failure. Rather, it is to upgrade a system to add functionality or newer, more
cost-effective technology, or because the software or hardware is no longer supported by the manufacturer.
Many utilities do not include control and protection devices in their asset register databases and
do not apply any AHI methodology to these assets. That said, it is useful to look at how a
condition-based likelihood of failure might arise. As before, the first step is to identify the assets
and decide what type of maturity analysis is appropriate to this particular class of assets. This is likely
to link to business need, particularly system reliability.
Table 4.8.1 – Identifying assets and diagnostics
| Asset | Method of confirming condition | Confidence level |
|---|---|---|
| Digital relays | Testing (functional) | 3 |
| Electromechanical relays | Testing (measuring) | 3 |
| Control terminals | Testing (functional) | 3 |
| Fault recorders | Visual inspection | 2 |
| RTU | Visual inspection | 2 |
| Condition monitoring system | Visual inspection | 2 |
| Station computer | Visual inspection | 2 |
| PMU | Visual inspection | 2 |
| Metering system | Visual inspection | 2 |
| Control cables | Testing | 3 |

This step involves reviewing failure modes- both conceptually for the asset type and specifically from
Table 4.8.2 – failure mode analysis
| Asset | Failure mode | Relevance (FMEA) | Condition indicators | Condition score |
|---|---|---|---|---|
| Digital relays | Unable to operate | | | |
| | Mis-operation | | | |
| | Power supply failure | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Internal battery failure | | | |
| | Contamination of relay | | | |
| Electromechanical relays | Unable to operate | | | |
| | Mis-operation | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Contamination of relay | | | |
| Control terminals | Unable to operate | | | |
| | Mis-operation | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Contamination | | | |
| Fault recorders | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Impaired ability to record | | | |
| RTU | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Unable to connect to network | | | |
| Condition monitoring system | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| | Reporting wrong data | | | |
| Station computer | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| PMU | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| Metering system | Unable to function | | | |
| | Mechanical damage | | | |
| | Overheating | | | |
| Control cables | Failure of insulation | | | |
| | Open wire | | | |
| | Critical attenuation (fibre) | | | |
| | Contact high resistance | | | |

Assess historic performance: test data, inspections, trips, costs, operational environment, plans for
the asset, its role in the network, etc. Relevant sources include:
• Instruction manuals from manufacturers
• Test and inspection reports (manufacturers, acceptance, periodic reports)
• Federal regulator requirements
• Operational environment
• Recall data/notifications from manufacturers
• Historical operational data

The FMEA should indicate the relevant indications of onset and progression of deterioration leading to
a failure mode. From this the diagnostic strategy can be determined.

Data can now be collected and trends identified, leading to a condition assessment.

Step 7: Aggregate indicators for AHI

Scores for each failure mode can now be aggregated to an AHI.
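A minimal sketch of this aggregation for a digital relay, using the failure modes listed in Table 4.8.2. The scores are invented for illustration, and the two rules shown (worst-case "Max" and a plain sum of logarithmic scale codes) are the options discussed in Chapter 5 rather than a rule prescribed for control and protection assets.

```python
# Hypothetical log-scale condition scores (1, 3, 10, 30, 100) per failure mode.
relay_scores = {
    "Unable to operate": 1,
    "Mis-operation": 3,
    "Power supply failure": 10,
    "Mechanical damage": 1,
    "Overheating": 3,
    "Internal battery failure": 30,
    "Contamination of relay": 1,
}


def ahi_max(scores: dict[str, int]) -> int:
    """Worst-case AHI: the asset is as bad as its worst failure mode."""
    return max(scores.values())


def ahi_sum(scores: dict[str, int]) -> int:
    """Summed AHI: on a log scale the worst scores dominate the total."""
    return sum(scores.values())


def dominant_failure_mode(scores: dict[str, int]) -> str:
    """Name of the failure mode driving the Max score."""
    return max(scores, key=scores.get)


print(ahi_max(relay_scores))                # 30
print(ahi_sum(relay_scores))                # 49
print(dominant_failure_mode(relay_scores))  # Internal battery failure
```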
Step 8: Plan mitigating actions

Auxiliary systems
This section focuses on the auxiliary systems of the substation and on applying the methodology
for generating an AHI to them.
4.9.1 Step 1: Identify Assets

The auxiliary systems of the substation fall into two categories:
• DC system
• AC system
According to TB 300 [B40], 3.12, the most important components of AC and DC station services are the
station services transformers, automatic and/or manual AC and DC transfer schemes, battery
chargers, batteries, and AC and DC distribution panels.
The main tasks of the auxiliary substation components are summarised in Table 4.9.1.
Table 4.9.1 – Auxiliary Equipment and Roles
| Equipment | Main task |
|---|---|
| Station service transformer | Provides AC power to operate equipment/control systems in the substation |
| AC/DC transfer scheme | Switches between primary and backup AC sources and the DC power system in the substation |
| Battery | Stores DC power for use by control and protection systems |
| Battery charger | Keeps DC system powered and charging; source comes from station service or backup AC power source |
| Generators | Supplies AC power during emergency situations |
| AC distribution backup | Supplies AC power during emergency situations |
4.9.2 Step 2 Review Failure Modes

Failure modes can be identified by analysing the existing historical data of the equipment. Most
network operators already record faults in their databases, so identifying the relevant failure modes
only requires a suitable evaluation of those records.
Each asset is broken down into its subcomponents for failure mode categorization.
Table 4.9.2 – Review levels
| Asset | Component | Priority | Typical analysis review level | Comments |
|---|---|---|---|---|
| AC supply | Station service transformer | 1 | 2 | Priority is very high for this asset, because it can damage the main power transformer |
| | Generators | 2 | 3 | |
| | Distribution backup | 2 | 3 | |
| DC supply | Battery | 1 | 1 | Monitoring systems commercially available |
| | Battery charger | 1 | 1 | Monitoring systems commercially available |
| AC/DC conversion | AC/DC transfer switch | 1 | 3 | |
As an example, for the battery component of the DC supply asset, the FMEA procedure leads to the
following failure modes:
• Insufficient electrolyte level
• High temperature
• Contact corrosion
• Over or under voltage
• Cell failure
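The battery failure modes listed above could be tied to condition indicators along the following lines. This is a sketch only: the field names, the limit values and the `flag_failure_modes` helper are hypothetical placeholders, not values from this brochure or from any battery standard, and would need to be replaced with manufacturer and company limits.

```python
from dataclasses import dataclass


@dataclass
class BatteryReading:
    """One inspection record for a station battery (hypothetical fields)."""
    electrolyte_level_ok: bool      # level between the min/max marks
    temperature_c: float            # cell or room temperature
    float_voltage_per_cell: float   # measured float voltage per cell
    contact_corrosion_seen: bool    # visual finding at the terminals
    failed_cells: int               # cells failing a capacity/voltage test


# Placeholder limits, for illustration only.
MAX_TEMPERATURE_C = 30.0
VOLTAGE_RANGE = (2.20, 2.30)


def flag_failure_modes(r: BatteryReading) -> list[str]:
    """Return the listed failure modes that a reading points to."""
    flags = []
    if not r.electrolyte_level_ok:
        flags.append("Insufficient electrolyte level")
    if r.temperature_c > MAX_TEMPERATURE_C:
        flags.append("High temperature")
    if r.contact_corrosion_seen:
        flags.append("Contact corrosion")
    low, high = VOLTAGE_RANGE
    if not low <= r.float_voltage_per_cell <= high:
        flags.append("Over or under voltage")
    if r.failed_cells > 0:
        flags.append("Cell failure")
    return flags


reading = BatteryReading(True, 34.5, 2.25, True, 0)
print(flag_failure_modes(reading))  # ['High temperature', 'Contact corrosion']
```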
4.9.3 Step 3: Assess historic performance

Asset performance data can be drawn from the various CIGRE studies in this area.
Utilities could use their own historic performance data to help calibrate this section.
4.9.4 Step 4 Identify Diagnostic Strategy

The FMEA (see Section 3.4) should indicate the relevant indications of onset and progression of
deterioration leading to a failure mode. From this the diagnostic strategy can be determined.
4.9.5 Step 5 Collect Inspection Data

Data can now be collected and trends identified, leading to a condition assessment.
4.9.6 Step 6 Evaluate Condition relative to Failure Mode

4.9.7 Step 7 Aggregate Indicators to AHI

Scores for each failure mode can now be aggregated to an AHI.
4.9.8 Step 8 Plan Actions

Buildings and structures

Step 1: Identify the assets and decide on maturity levels
Figure 4.10.1 – Sample image of substation where we can name all assets under consideration
Site issues
Typical components of a substation are shown in Figure 4.10.1. A condition assessment should begin
with a strategic review covering the following:
• Is the site of adequate size for the substation installed on it?
• Is there satisfactory access for loaders, cranes, vehicles, and equipment? In particular, access
for transformer floats should be level, allow unloading on site rather than in the street, and provide
an adequate road turning radius and adequate crane height.
• Have there been any changes that affect access?
• Are the ownership and tenure of the site satisfactory? Freehold title is the preferred tenure
method. Leases and easements are less satisfactory and, if used, should not be limited in
time; even 99-year leases run out.
• Is there satisfactory access for cables, duct lines, and overhead lines?
• Is there a buffer zone around the substation?
• Are security issues adequately covered?
• Are there noise issues?
• Are there issues of nuisance to neighbours?
• Are there intrusive lighting issues?
• Are the earth conditions stable?
• Are there drainage or subsidence issues?
• Is the site clean, tidy, and adequately landscaped, with grass mowed, etc.?
• Are there any risks imposed by adjacent properties, such as fuel storage, petrol stations, and
trees?
• Is the site in a flood plain and, if so, is it sufficiently elevated?
• Is the site at risk from bushfires?
Substation fences
The general condition of substation fences can be assessed based on their location and purpose.
Fences are increasingly built to more secure designs, which require more attention and
maintenance (Figure 4.10.2).
Support poles must be plumb and capable of supporting fence loads for all design conditions.
Corroded posts should be repaired and protected against further rust. Fences should be checked
regularly for excessive gaps between the bottom of the fence and finished grade. Large gaps must be
filled, and any other holes repaired, to prevent animals, unauthorized personnel and children from
gaining access to the station.
Connections to the earth grid and bonding straps should be inspected to prevent touch potential
hazards to the public. The integrity of the station fence earthing grid should also be checked
periodically. It is not unusual to find that thieves have removed entire pieces of fence earthing. One way
to mitigate this is to replace copper conductors with aluminium or aluminium-clad steel wire.
Figure 4.10.2 – High security fence
Buildings
There is a wide range of building types used in substations, from the large fully enclosed building in a
city business district housing all equipment, to the control building in an outdoor type substation. Even
in outdoor substations the building will house critical protection and control equipment, without which
the substation cannot operate. The design and condition of the building can therefore be a critical item
in the assessment of the substation (it is often overlooked).
Points to address in assessing a substation building are as follows:
• What type of building is it? Reinforced concrete, brick or block work, steel framed, steel clad,
light construction, etc.
• What is the design life of the building and its current age?
• What type of foundations are installed and what are the earth conditions? Is it built on rock or on
piers to rock or solid earth? If not, it may be subject to movement, subsidence, cracking, or even
collapse.
• Is the construction strong enough to withstand stress, e.g. from earthquake, impact, pressure
load from electrical faults and similar?
• Is the structural design such that a failure of one part of the building will not cause a cascade
failure of other parts?
• Condition of the structure. Is there evidence of subsidence, movement, cracking, damage?
• Roof. A key requirement for a substation is that the roof should prevent water entry into any
electrical area, as this could cause failure of the electrical equipment.
  ◦ Type of roof, e.g. concrete, steel sheet, fibre cement sheet, aluminium sheet, tiled.
  ◦ A concrete slab roof is sound provided it is tied into the wall structure. However, if it
  depends on a membrane for waterproofing, this is poor, as the membrane has a limited life.
  ◦ A pre-cast beam roof will usually not be tied into the wall structure and will depend even
  more on a membrane for waterproofing.
  ◦ A fibre cement sheet roof will be easily damaged by large hailstones and any damage will
  let water in.
  ◦ A tiled roof will also be damaged by large hailstones or similar objects.
  ◦ A sheet steel roof, if well constructed, will be resistant to damage and water ingress but at
  best will require replacement at about 25 years.
  ◦ Structure of roof
  ◦ A timber framed roof will be a fire risk, from internal equipment failure, from external
  sources such as a transformer fire or a bushfire, or even from wiring in the roof space. Once
  fire takes hold in the roof space it will generally burn out completely, collapse, and render
  the whole substation unusable. Fire rated ceilings provide only partial protection.
  ◦ A steel framed roof is a good option for a substation provided there is no fuel source below
  it or nearby. Exposed steel framing does not have a fire rating.
  ◦ The roof structure should be adequately tied down to resist extreme wind loadings.
• Gutters and down pipes.
  ◦ All roofs should overhang the walls. Internal gutters will invariably lead to water ingress as they
  will block up with debris or hail and flood. All gutters should be external to the walls so that
  overflow will not let water into the building.
  ◦ Similarly, all down pipes should be external. Internal down pipes will inevitably block up or leak
  and let water into the building. The most suitable arrangement is to have no roof gutters but to
  use earth level dish drains below the roof edges. This avoids the inevitable build-up of leaves
  and debris in roof gutters.
Identify the assets and decide what type of maturity analysis is appropriate to the particular class of
assets. This is likely to link to business need, particularly system reliability.
A substation may contain many types of buildings and structures, whose design varies with the
manufacturer, the voltage level, the country, etc. However, it is possible to list the most commonly
used materials and to describe the possible problems, their effects and the corresponding indicators.
There are different categories of main features, each contributing independently to the reliability of
the substation:
• Buildings
• Structures and foundations (steel and concrete)
• Substation platform, drainage, roads, earthing, fences and gates.
Assessing a substation requires technical expertise mainly in three areas:
• Quality of concrete structures and foundations
• Quality of steel structures and their assembly
• Quality of the site and the drainage system
Step 2: Perform FMEA and identify condition indicators to be used

This step involves reviewing failure modes – both conceptually for the asset type and specifically from
Because measurements and condition assessments on all assets take time and money, with a limited
budget it is recommended to focus on the assets that have the largest impact on the substation's main
function, delivering electrical energy. These are the assets for which a failure may lead to an
interruption of supply:
• Structures and foundations. The main function of these structures is to hold high voltage electrical
equipment in position, without any risk to or modification of the geometrical characteristics (position,
distances), under normal and abnormal operating stresses (loads of mechanical, electrical or
natural origin such as wind and ice), and without degradation of the mechanical strength of these
supports over time.
• Buildings. The main function of these buildings is to protect electrical, telecommunication and
control equipment in an environment that is appropriate for their operation (temperature, humidity,
cleanliness, etc.), and to prevent interventions by unauthorized personnel or third parties (malice,
vandalism), as well as intrusion of various origins (wildlife, etc.).
According to the generic methodology, the results of the FMEA should be sorted into three risk categories:
• High risk
• Medium risk
• Low risk
However, since four possible consequence scenarios can be listed for failures of buildings and
structures, the following priorities can be used instead:
• Priority 1 – The failure leaves the substation unable to function properly, or endangers the safety
of workers. For example, if the roof of the control building leaks heavily, all the equipment inside
may suffer water damage and stop functioning.
• Priority 2 – The failure leaves the related asset (primary or secondary) unable to function
properly. For example, if the support of a disconnector fails, that disconnector cannot be used.
• Priority 3 – The failure has no short-term effect on the substation but affects reliability in the
long term. For example, drainage that is not functioning properly.
• Priority 4 – The failure does not affect reliability but makes servicing the substation more
inconvenient. For example, a broken road.
Table 4.10.1 – Components
| Asset | Component | Priority |
|---|---|---|
| Building | Roof | 1 |
| | Floor | 2 |
| | Walls | 2 |
| | Doors | 3 |
| | Windows | 3 |
| | Drainage | 3 |
| | Protective equipment (air-conditioning, ventilation, fire protection, anti-intrusion) | 2 |
| | Foundation | 2 |
| Structures and foundations | Supporting structures | 1 |
| | Foundations | 1 |
| | Bolts | 2 |
| | Welds | 2 |
| Fence | Fence | 3 |
| | Gates | 3 |
| | Earthing | 3 |
| Other equipment | Drainage | 2 |
| | Roadways | 4 |
| | Lightning protection system | 2 |
| | Site cleanness and vegetation management | 2 |
| Earthing system | Earthing connections | 1 |
| | Earthing system | 1 |
Step 3: Assess individual asset performance

Historic performance should be assessed: test data, inspections, costs, operational environment, plans
for the asset and its role in the network, etc.
First, technical data about the original design of the structures and buildings, including the site
conditions, need to be gathered:
• Site conditions (temperature, elevation, keraunic level, pollution level, etc.; any criterion that
influences ageing)
• Project drawings, technical specifications, etc. (the "as-built" conditions)
• Commissioning stage: records of events
• Incidents and events that have occurred since the construction stage
• Maintenance reports and event records, to be collected and analysed
Depending on the design, there will be different areas to investigate:

• The structures and foundations
• The buildings
• The fences
• The platform, drainage, etc.

The FMEA should indicate the relevant indications of onset and progression of deterioration leading to
a failure mode. From this the diagnostic strategy can be determined.
Priority 1 assets should have the most sophisticated diagnostics (for example detailed measurements
of coating thickness and of carbonation of concrete) and Priority 4 assets should have the simplest
diagnostic methods (for example simple visual inspections). The diagnostic strategy should be chosen
according to the methods available for the asset, its priority level and the reliability of the output
data from the selected methods. The selected diagnostic strategy should cover the main failure modes
of the assets.
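One way to record this rule is a lookup from priority to diagnostic depth. The sketch below is illustrative: only the Priority 1 and Priority 4 entries come from the text above, the intermediate tiers for Priorities 2 and 3 are assumptions, and the component priorities are a subset of Table 4.10.1.

```python
# Priority -> diagnostic approach. Entries 2 and 3 are assumed intermediate tiers.
DIAGNOSTICS_BY_PRIORITY = {
    1: "detailed measurements (e.g. coating thickness, concrete carbonation)",
    2: "visual inspection plus targeted measurements (assumed)",
    3: "periodic visual inspection (assumed)",
    4: "simple visual inspection",
}

# Subset of Table 4.10.1: (asset, component) -> priority.
COMPONENT_PRIORITY = {
    ("Building", "Roof"): 1,
    ("Building", "Walls"): 2,
    ("Building", "Doors"): 3,
    ("Structures and foundations", "Supporting structures"): 1,
    ("Fence", "Gates"): 3,
    ("Other equipment", "Roadways"): 4,
}


def inspection_plan(
    components: dict[tuple[str, str], int],
) -> list[tuple[str, str, int, str]]:
    """List components in priority order with their diagnostic approach."""
    ordered = sorted(components.items(), key=lambda item: (item[1], item[0]))
    return [
        (asset, component, priority, DIAGNOSTICS_BY_PRIORITY[priority])
        for (asset, component), priority in ordered
    ]


for row in inspection_plan(COMPONENT_PRIORITY):
    print(row)
```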

Data can now be collected and trends identified, leading to a condition assessment. The following
table lists what can be observed when inspecting the substation.
Table 4.10.2 – Failure mode detection indicators
| Category | List of assets | Failure mode | Method(s) to detect | Indicator |
|---|---|---|---|---|
| Buildings | All buildings in the substation | Water is falling on control and protection devices | Visual inspection | Roof is leaking |
| | | Building is about to collapse | Visual inspection | Building is showing signs of failure |
| | | Dust, birds, etc. are contaminating indoor devices | Visual inspection | Doors and windows are not functioning |
| Steel structures | Overhead line portals, lightning protection supports, high voltage equipment supports | Overstresses | Visual inspection | Mechanical defects |
| | | Loss of mechanical strength | Visual inspection | Rusting, welding defects |
| | | | Pulling tests | Test results |
| | | | Zinc/paint coating thickness measurements | Coating thickness |
| | | | X-ray | X-ray results |
| Concrete structures | Foundations, reinforced concrete panels, high voltage equipment supports | Concrete's protective characteristics will stop protecting reinforcements from rusting | Carbonation measurements | Carbonation is present |
| | | Loss of mechanical strength | Visual inspection | Concrete is falling from reinforcements, cracks in the concrete |
| | | | Pulling tests | Test results |
| | | | Schmidt hammer test | Test results |
| | | | Reinforcement corrosion measurements | Measurement results |
| | | Reinforcements are uncovered and start to rust rapidly | Visual inspection | Rust |
| Wooden structures | Lightning towers, reactor fence | Loss of mechanical strength | Visual inspection | Wood is rotting, mechanical defects |
| Other | Drainage | Flooding | Visual inspection | Water stagnation |
| | | Danger for the workers | Visual inspection | Expert opinion |
| | | Danger for the equipment | Visual inspection | Expert opinion |
| | Roads | Unable to perform maintenance work or emergency repair | Visual inspection | Road is deteriorated |
| | Lightning protection system | Direct lightning strike to the equipment | Office study | Poor design of lightning system |
| | | Loss of lightning protection system, risk of mechanical damage | Visual inspection | Mechanical defects |
| | Site cleanness and vegetation management | Flashover | Visual inspection | Vegetation is growing in substation |
| | | | LiDAR | PointCloud analysis |
| | | Mechanical damage of the equipment | Visual inspection | Falling vegetation may reach the equipment |
| | | | LiDAR | PointCloud analysis |
| Earthing system | Earthing connections | Earthing system is not functioning | Visual inspection | No connection |
| | | High resistance | Resistance measurement | Connections deteriorated or ineffective |
| | Earthing system | High resistance | Resistance measurement | Earthing system deteriorated |
This table is not exhaustive and never will be. It simply gives an overview of how failure modes, the
methods to detect them and the condition indicators might be determined for assets that are
not directly related to the high voltage transmission system. The exact classification depends
on the individual FMEA.

The condition can now be related to failure modes.
Develop limit values or criteria for the condition indicators of the relevant failure modes. One possible
approach is to follow the overhead line methodology "Advanced condition monitoring
method for high voltage overhead lines based on visual inspection" [B39].
A simple approach is given in CIGRE TB 300 [B40]:
Structures and foundations
Substation foundations and structures should be periodically evaluated in terms of their existing
condition, current state of repair, and integrity. The following ranking of the condition of these elements
could be used:
• Good Condition – The element is intact, structurally sound, and performing its intended purpose,
and there are few or no cosmetic imperfections. There is no need for repair.
• Fair Condition – The element shows early signs of wear, failure, or deterioration, although it is
generally structurally sound and performing its intended purpose. Examples of fair condition
would be crumbling of concrete surfaces or signs of rust on steel structures.
• Poor Condition – The element no longer performs its intended purpose, is missing, or shows
signs of imminent failure or breakdown. Such a condition requires major repair or
replacement. Examples of poor condition are unusual and excessive deflection of
a steel structure, or deterioration of a foundation to the point that rebars are exposed and are
visibly rusting.
• Critical Condition – The element shows advanced deterioration that will result in its failure if not
corrected within two years, and/or the element poses a threat to the health and/or safety of the
user. An example of a critical deficiency would be tilting of foundations due to excessive loading or
poor drainage conditions.

Scores for each failure mode can now be aggregated to an AHI, as described in Chapter 3.
An example of the scoring was presented at a CIGRE meeting in Belgium in 2013 and is referenced in
TB 660 [B5].
Table 4.10.3 – Classification rules for buildings according to their condition [B5]
| | Code | Explanation | Normal maintenance | Repair or refurbish | Include in planned projects | Trigger to start a project |
|---|---|---|---|---|---|---|
| Keep building with a horizon > 2020 | GREEN | Building in good condition | yes | N/A | | |
| | YELLOW | Normal ageing, no significant problems. Normal maintenance is sufficient to preserve the building in a good condition | yes | yes | | |
| Abandon building horizon 2020 | ORANGE | Building with important defects. If a project is already planned, the replacement of the building has to be included | yes | Yes but only those needed to guarantee minimal reliability (e.g. repair instead of refurbishment) | X | |
| | RED | Building with serious problems, a reason to start a project in the next tariff period | (minimal) | – | X | X |
| Short term action | BLACK | Building in very bad condition, immediate risk for people or outage of grid element(s) | Immediate action needed | | | |
Step 8 Identify mitigation actions to improve AHI

5. Assembling sets of AHI outcomes and displaying results
A health index is a mathematical construction based on raw data and analyses, which attempts to
summarise the available data in a single code or value. The index is an estimate and is produced by
reducing what may be a large quantity of disparate data; the reduction removes significant
information that is available only in the original data. The advantages of an index are that it should be
both easily understood and easily incorporated into spreadsheets and similar tools.
Earlier chapters have focussed on the creation of AHIs for a single asset. Many of the
publications were also written for such an application, in order to produce a prioritised action
plan based upon urgency [B14], [B41] and [B42]. However, a health index system will typically
comprise a large number of different asset classes, and the asset manager typically has a broader
interest than individual assets and their respective AHI scores. Typically, the main tasks
for the asset manager are (in brief) to:
• Prioritise maintenance efforts
  ◦ Which assets?
  ◦ What maintenance content?
  ◦ When?
• Prioritise replacement investments
  ◦ What needs to be replaced?
  ◦ When?
  ◦ Is simultaneous replacement of "neighbouring" assets advantageous?
• Prioritise and specify further investigations
• Select and prioritise (further) data collection
There are multiple ways to present the final result of a health index. An asset's condition can be
represented by a health index score and/or colour. The code/score gives a quick overview of the
technical state of the asset. For a health index to be effective it should have a clear and stated aim,
an explanation of how the indices meet that aim, and an unambiguous display of the indices for
subsequent use.
Many users expect their index to produce a single number so that assets within the index can be easily
ranked. Other users prefer the output presented as a simple colour indicating the overall state of each
asset. The colour code enables quick fleet assessment and dashboard functionality. The health
score/colour code is typically converted to a parameter from which the apparent risk for each asset,
given its health status, can be derived. In this way the health index scoring may be coupled with, and
complemented by, a risk assessment to enable replacement/maintenance prioritization and cost
management. This may be achieved by deriving for each asset:
• The probability of failure (PoF) of the asset within a period, or
• A remaining life (RL) for the asset, or
• A time to act
Note that these three potential outcomes, though not strictly the same, are strongly related to each
other and may all be used similarly in further analysis. As already stated in Chapter 2, company asset
managers will typically need to go further than assessing condition and likelihood of failure. They
will be assessing the risk of failure, which additionally requires estimates of the impact of failure,
compared against the company's specified risk appetite (thresholds), more commonly referred to as Key
Performance Indicators (KPIs), which would be defined as part of the company policies and/or
aspirations.
The risk assessments for all assets may then be ranked, yielding a Risk Index (analogous to the
Health Index methodology). Together with cost and portfolio optimization tools, these results can then
be used to determine optimized replacement/maintenance programs.
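A minimal sketch of how health-derived probabilities might feed a ranked risk index. The product form (risk = probability of failure × impact) is a common convention assumed here, not a formula given in this brochure; the asset names, PoF values, impact figures and the risk threshold are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AssetRisk:
    name: str
    pof: float      # probability of failure within the chosen period (0-1)
    impact: float   # consequence of failure, in arbitrary cost units

    @property
    def risk(self) -> float:
        # Assumed convention: risk = probability x consequence.
        return self.pof * self.impact


def risk_index(assets: list[AssetRisk]) -> list[AssetRisk]:
    """Rank assets from highest to lowest risk."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


def above_threshold(assets: list[AssetRisk], threshold: float) -> list[str]:
    """Names of assets whose risk exceeds the stated risk appetite."""
    return [a.name for a in risk_index(assets) if a.risk > threshold]


fleet = [
    AssetRisk("T1 transformer", pof=0.02, impact=5000.0),
    AssetRisk("B1 circuit breaker", pof=0.10, impact=400.0),
    AssetRisk("Control building roof", pof=0.05, impact=3000.0),
]

for asset in risk_index(fleet):
    print(f"{asset.name}: risk = {asset.risk:.0f}")
print(above_threshold(fleet, threshold=90.0))
```

In this invented fleet the circuit breaker has the highest probability of failure but the lowest risk, which is why a health index alone does not settle priorities.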
Issues when combining sets involving different asset types

The remainder of this chapter looks at options for deriving a health index for a collection of assets
within a bay, circuit end or the whole substation. Some further discussion of the mathematical
concepts is included in APPENDIX C.
To begin with we must identify the characteristics of a health index which is capable of being
combined with other health indices in a meaningful manner. Then we can look at the pros and cons of
different approaches to combination.
In particular we may need to estimate the probability of failure of an asset or component in order to
perform calculations of overall probability which reflects both the individual assets and their
interconnectedness.
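For example, if a bay is modelled as a series system in which the failure of any one asset takes the bay out of service, and failures are assumed to be independent, the overall probability of failure follows from the individual values. The series and independence assumptions, the function names and the example probabilities are illustrative and not taken from this brochure; APPENDIX C remains the reference for the mathematical concepts.

```python
import math


def series_pof(pofs: list[float]) -> float:
    """PoF of a series system: it fails if any one (independent) asset fails."""
    return 1.0 - math.prod(1.0 - p for p in pofs)


def parallel_pof(pofs: list[float]) -> float:
    """PoF of a fully redundant group: it fails only if all assets fail."""
    return math.prod(pofs)


bay = [0.01, 0.02, 0.05]            # hypothetical PoF per asset in the bay
print(round(series_pof(bay), 4))    # 0.0783
print(parallel_pof([0.1, 0.1]))     # two redundant assets, each with PoF 0.1
```

The series result is always at least as large as the worst individual PoF, which is one reason the interconnection of assets matters as much as their individual scores.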
Collating health scores for individual assets allows a unified view of a collection of assets: a bay, a
substation or a circuit. The collection could be physical, such as a bay, or logical, such as oil filled
assets, or those of a particular manufacturer. Examples of such collections may be:
• Transformers > 300 kV from a specific manufacturer, manufactured after 1990
• Circuit breakers connecting lines or reactive compensation
• All 33 kV substation bays in a certain region
To build up a single score/index for a collection of assets, the following are required:
• There is an individual health index score for each asset in the aggregation.
• The score for each asset is both calibrated and monotonic, so that similar scores carry a similar
ranked 'urgency' and/or time to act. The individual health scores for all
assets have associated and uniform timescales for action; it is up to the health index system
'operator' to determine appropriate timescales for their organization. For example, see Table 5.1.
• The individual score is an indication of the probability of a need for intervention to prevent failure:
maintenance, replacement, reduced loading, etc.
• There is no requirement that the code value (1, 2, 3, etc.) relate directly to the urgency or time to
act.
Table 5.1 – Condition scale code examples
| Log base 3 | Log base 10 | Linear | Alpha | Description | Urgency | Time to act |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | A | Very good condition | Negligible | > 10 years |
| 3 | 10 | 2 | B | Good condition | Low | 5-10 years |
| 10 | 100 | 3 | C | Fair condition | Moderate | 2-5 years |
| 30 | 1,000 | 4 | D | Poor condition | High | 1-2 years |
| 100 | 10,000 | 5 | E | Critical condition | Extremely high | < 1 year |
The method used to aggregate/collate health scores for individual assets will depend on the type of
health scoring system used for those assets, which may be (refer also to Table 1.1):
• A Max scoring system, where the final (individual) asset score is defined by the worst scoring
condition indicator.
• A logarithmic scoring system (base 3 or 10, for example) or a linear system, in which the scores
are summed to give an overall score.
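The two families of scoring system can be sketched as follows, using the scale codes of Table 5.1. The indicator scores are hypothetical and the helper names are not from this brochure.

```python
# Table 5.1: linear code -> equivalent code in the other systems.
LOG3 = {1: 1, 2: 3, 3: 10, 4: 30, 5: 100}
LOG10 = {1: 1, 2: 10, 3: 100, 4: 1_000, 5: 10_000}
ALPHA = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def max_score(linear_codes: list[int]) -> int:
    """Max system: the asset score is its worst condition indicator."""
    return max(linear_codes)


def summed_score(linear_codes: list[int], scale: dict[int, int]) -> int:
    """Summed system: convert each indicator to the chosen scale and add."""
    return sum(scale[c] for c in linear_codes)


indicators = [1, 2, 2, 4]  # hypothetical linear codes for one asset

print(ALPHA[max_score(indicators)])      # D
print(summed_score(indicators, LOG3))    # 1 + 3 + 3 + 30 = 37
print(summed_score(indicators, LOG10))   # 1 + 10 + 10 + 1000 = 1021
print(sum(indicators))                   # linear sum = 9
```

On the logarithmic scales the single poor indicator dominates the total, whereas the linear sum of 9 could equally come from three indicators in fair condition.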
Examples – Part 1
Note that for any aggregation it is crucial that health index scores for assets within the aggregation are
calibrated and monotonic with respect to each other in order to create a meaningful aggregated score.
In case of an aggregated score for multiple asset types, the user must be cautious of the health
scoring methodology used for the individual assets.
Simple Substation Max and Average of available indices

Figure 5.1 shows the results of a station analysis in a practical system that uses a 1, 3, 10, etc. scale,
where 1 is good and 100 indicates a state requiring immediate attention. The maximum score of the set of
assets is 41, while the average for the station is 11.
Figure 5.1 – Max and Average of Asset Health Indices at a single station
How do we ascribe meaning to the two values? If the scores were generated by a means which
retains the urgency, then the Max will continue to indicate that urgency while the Average will not, as
we do not know, from the score, just how many assets are involved or how the urgency has been
maintained. The idea that the average score gives a ‘general indication’ of the health of the station is,
to a degree, valid – but it provides little in the way of understanding the volume of work/intervention
required, or the urgency thereof.
The overall health indices produced by Max and Average are both ‘valid’ from a mathematical point of
view, but the meanings we ascribe to them differ, and the meaning of the average may be difficult to
justify.
Combining Asset Health Indices: Simple Approach

Consider the case of two assets, T1 and B1, which have health indices defined by Table 5.2 (note that
all values/codes are examples only):
Table 5.2 – Example with alphabetical codes
Code: Alpha | Description | Urgency | Time to act
A | Very good condition | Negligible | > 10 years
B | Good condition | Low / 5-10 years | 5-10 years
C | Fair condition | Moderate | 2-5 years
D | Poor condition | High | 1-2 years
E | Critical condition | Extremely high | < 1 year
If transformer T1 has a health index code A, and Breaker B1 has a health index code C, what should
their combined asset health index be?
Their Max is code C, and their ‘average’ is somewhere between A and C. What is the urgency of the
combination, and the timescale for action? These are not well defined but are important for action
planning if using an index for the combined assets.
What if we have numeric codes, as in Table 5.3, so that asset T1 is a code 2 and asset B1 is a code 3?
Table 5.3 – Example with numeric codes
Code: Numeric | Description | Urgency | Time to act
1 | Very good condition | Negligible | > 10 years
2 | Good condition | Low / 5-10 years | 5-10 years
3 | Fair condition | Moderate | 2-5 years
4 | Poor condition | High | 1-2 years
5 | Critical condition | Extremely high | < 1 year
The Max for T1 and B1 is 3, while the average is now 2.5, but there is no Code ‘2.5’. The Max
identifies the urgency of the two assets, but the Average is a mathematical construction and again has
no physical meaning.
What if there are three assets, with overall asset health score codes as per Table 5.4:
Table 5.4 – Combining AHI for 3 assets, alphanumeric codes
Asset: Set 1 Code: Alpha Code: Numeric

T1 B 2
B1 C 3
D1 B 2
The Max is still at C or 3, but the average is somewhere between B and C, with a value of ~2.3. This
average could give false confidence in the viability of the collection. If we compare this set with the set
in Table 5.5, we can see that the new set has a Max of E or 5, which reflects the urgency of work required
on the breaker, whereas its average is, perhaps, a C. The mathematical average of ~2.3 is the same
for Set 1 and Set 2, so the average does not help to identify the urgency within the data set.
Table 5.5 – Second example with alphanumeric codes
Asset: Set 2 Code: Alpha Code: Numeric

T2 A 1
B2 E 5
D2 A 1
In summary – the use of Max identifies the urgency of the worst asset or assets but does not attempt
to combine several codes into a new single code. The average is a mathematical construction with
little meaning in terms of urgency.
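The comparison of Set 1 and Set 2 can be reproduced with a few lines of code. This is a minimal sketch using the numeric codes of Table 5.4 and Table 5.5; the function name summarise is an assumption made for illustration.

```python
from statistics import mean

# Numeric codes from Table 5.4 (Set 1) and Table 5.5 (Set 2)
set_1 = {"T1": 2, "B1": 3, "D1": 2}
set_2 = {"T2": 1, "B2": 5, "D2": 1}


def summarise(scores):
    """Return (max, mean) of the numeric codes for a group of assets."""
    values = list(scores.values())
    return max(values), mean(values)


for name, scores in (("Set 1", set_1), ("Set 2", set_2)):
    worst, average = summarise(scores)
    print(f"{name}: max={worst}, mean={average:.2f}")
# Set 1: max=3, mean=2.33
# Set 2: max=5, mean=2.33
```

The two sets are indistinguishable by their mean, while the Max separates them clearly.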
Note: if we have a range of ‘Probability of Failure’ (PoF) associated with each category then we can
calculate a range of PoF for the assets in that category.
Potential methods for aggregation of health scores

The following methods are suggested for creating a combined asset health score for an aggregate of
assets. These aggregates may consist of assets of the same or different asset types.
1. Enumeration of single (overall) asset scores
2. Enumeration of all available condition indicator scores for all assets
3. Normalization of all asset scores into one overall aggregate score
4. Focussed aggregation aimed at a functional set of assets using probability of failure
information
Option 1 – Enumeration of single (overall) asset scores

An Enumeration approach looks at the number of assets which fall into a particular category. If we
apply the approach to the assets in Set 1 and Set 2 (Table 5.4 and Table 5.5), we would produce
Table 5.6 where additional colour coding has been used to indicate the urgency.
Table 5.6 – Use of colour coding and TB 761 scoring [B3]
Asset | Code E or 5 | Code D or 4 | Code C or 3 | Code B or 2 | Code A or 1 | Enumeration
Set 1 | 0 | 0 | 1 | 2 | 0 | 00120
Set 2 | 1 | 0 | 0 | 0 | 2 | 10002
The higher the enumeration, the more urgent the intervention would be. Note that there is no need to
average or combine individual scores or PoFs, as the enumeration ‘automatically’ ranks more urgent
cases as higher numbers. The ranking does not indicate the PoF but would retain the PoF rank order.
In this example, Set 2 has a higher ranking than Set 1.
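The enumeration of Table 5.6 can be sketched as a count per category, ordered from worst to best. The names below (CODES_WORST_FIRST, enumerate_scores, as_string) are hypothetical and chosen for illustration only.

```python
from collections import Counter

CODES_WORST_FIRST = [5, 4, 3, 2, 1]  # E, D, C, B, A


def enumerate_scores(scores):
    """Count how many assets fall in each category, worst category first."""
    counts = Counter(scores)
    return tuple(counts.get(code, 0) for code in CODES_WORST_FIRST)


def as_string(counts):
    """Render the counts as the enumeration string used in Table 5.6."""
    return "".join(str(n) for n in counts)


set_1 = [2, 3, 2]  # T1, B1, D1 (Table 5.4)
set_2 = [1, 5, 1]  # T2, B2, D2 (Table 5.5)

print(as_string(enumerate_scores(set_1)))  # 00120
print(as_string(enumerate_scores(set_2)))  # 10002
print(enumerate_scores(set_2) > enumerate_scores(set_1))  # True
```

No individual score is averaged or combined; the worst category is simply compared first.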
The enumeration approach is one of the methods for collating component scores discussed in
CIGRE TB 761 “Condition Assessment of Power Transformers” (Table 2-2) [B3]. As an alternative
visualisation of the enumerated scores, a stacked (colour) bar chart can be used, as depicted for this
example in Figure 5.2.
Figure 5.2 – Possible visualisation of asset scores
Option 2 – Enumeration of all available condition indicator scores for all assets
In Option 1, only the overall asset score was considered. However, this loses information about the
number of indicators in each score category. It could be argued that an asset with three
indicators in a given score category is more likely to fail than one with a single indicator in that same
category.
Consider the same two sets of assets, but now with indication of their three individual condition
indicator scores, see Table 5.7 and Table 5.8.
Table 5.7 – Example of condition indicator scores: Asset 1
Asset Set 1 T1 B1 D1
Condition indicator 1 2 3 2
Table 5.8 – Example of condition indicator scores: Asset 2
Asset Set 2 T2 B2 D2
The enumeration of all condition indicator scores would give (Table 5.9):
Table 5.9 – Enumeration of Combined Asset Condition Scores for Assets 1 and 2 [B3]
Asset | Code E or 5 | Code D or 4 | Code C or 3 | Code B or 2 | Code A or 1 | Enumeration
Set 1 | 0 | 0 | 2 | 5 | 2 | 00252
Set 2 | 1 | 0 | 1 | 0 | 7 | 10107
Again, the higher the enumeration, the more urgent the intervention would be. Note that there is no
need to average or combine individual scores or PoFs, as the enumeration ‘automatically’ ranks more
urgent cases as higher numbers. The ranking does not indicate the PoF but would retain the PoF rank
order. In this example, Set 2 has a higher ranking than Set 1.
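When many groups are ranked, the counts can be compared directly rather than as a concatenated number. The sketch below uses the counts of Table 5.9; the function names are hypothetical. It also illustrates one practical caveat, stated here as an observation rather than as part of the method: once any category holds ten or more indicators, a plain digit string becomes ambiguous, so a separator or tuple comparison is safer.

```python
# Counts per category, worst first (E/5, D/4, C/3, B/2, A/1), from Table 5.9
indicator_counts = {
    "Set 1": (0, 0, 2, 5, 2),
    "Set 2": (1, 0, 1, 0, 7),
}


def rank_by_urgency(groups):
    """Return group names, most urgent first.

    Tuples compare element by element, starting with the worst category,
    which is the same ordering as the enumeration number.
    """
    return sorted(groups, key=lambda name: groups[name], reverse=True)


def label(counts, sep=""):
    """Render counts as an enumeration label, optionally with a separator."""
    return sep.join(str(n) for n in counts)


print(rank_by_urgency(indicator_counts))        # ['Set 2', 'Set 1']
print(label(indicator_counts["Set 1"]))         # 00252
print(label((0, 0, 12, 5, 2), sep="-"))         # 0-0-12-5-2
```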
As an alternative visualization of the enumerated scores a stacked (colour) bar chart can be used as
depicted for this example in Figure 5.3.
Figure 5.3 – Possible visualization of asset scores

Although ranking is preserved, one needs to consider the potential for skewing of visual results and its
perception in case large differences in the number of condition indicators exist between different asset
types.
Option 3 – Normalisation of all asset scores into one overall aggregate score
It can be argued that a single aggregate health score for an aggregate of assets is also a feasible way
to show an overall score, for instance by summing all condition indicator scores for all assets to
create one number that represents the entire aggregate. However, summation of the scores for multiple
assets works best when all individual assets have an equal number of assessments/indicators.
This allows a fair comparison between different asset scores within the aggregate and the creation of a
balanced final aggregate score.
An example of this is shown in Table 4.6.8 where this is applied to GIS bays each with five
subcomponents that are each scored according to a log base 3 scoring system. The total bay score is
then determined as the sum of all the subsystem scores.
Note that a clear drawback of the method shown in Table 4.6.8 is that it needs to be calibrated to the
number of assets and indicators. For instance, if the number of assets in the bay increases from 5
to 7, the condition ranges (left column) would need to be re-calibrated. A further complicating factor
would be to determine an aggregate score for a group of assets which have a different number of
condition indicators between them.
For example, consider a simplified substation bay consisting of a transformer, a circuit breaker and a
disconnector. Assume that each asset has a health score using a base 10 log scoring system, as
depicted in Table 5.10. Note that there is a difference in the number of condition indicators for the
transformer (6), circuit breaker (3) and disconnector (2). The overall score for the bay is given as the
sum of the individual scores for the assets.
Table 5.10 – Example of a simplified bay
Transformer | Score | Circuit breaker | Score | Disconnector | Score
Main tank oil DGA | 1 | Switching time | 10 | Corrosion | 10
Tap changer condition | 10 | Corrosion | 10 | Number of operations | 10
Thermal scan | 10 | Number of operations | 1 | |
Bushings – tan delta | 10 | | | |
Tank corrosion | 1 | | | |
Sound level | 10 | | | |
Sum score | 42 | | 21 | | 20
Normalised score | 14 | | 14 | | 20
Total bay score | 83 | | | |
In this case one can easily see that the overall summated score of the bay (42+21+20 = 83) is
dominated by the transformer, in this example largely due to the larger number of indicators for this
asset compared to the others. This clearly indicates that the total bay score is skewed by a difference
in number of condition indicators between assets taken together.
This could be solved using normalisation of scores, i.e. to compensate for the different number of
indicators. In the example of Table 5.10 this is done by normalising to the situation of maximum two
indicators (as is the case for the disconnector) with a maximum possible score of 20 (2 indicators with
max score 10). The normalised scores as shown in Table 5.10 show equal scores for the transformer
and circuit breaker, and the worst score for the disconnector. Since the transformer and breaker have
the same relative number of ’10’ and ‘1’ scores, this is to be expected. Similarly, both indicators of the
disconnector indicate the worst possible score, explaining why this is the asset with the overall worst
normalised score. Whether this actually proves that the disconnector is the worst asset of the
aggregate is quite questionable, since this is based on the judgement of two indicators only. In case
several other condition indicators are possible to assess but were disregarded or there simply is a lack
of data, the above example could give a false feeling of security/trust in the assessment where in fact
it is incomplete.
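The sum and normalised scores of Table 5.10 can be reproduced as follows. This is a minimal sketch; the normalisation shown (scaling each sum to a reference of two indicators) is the one described above, while the variable and function names are assumptions.

```python
# Condition indicator scores per asset, from Table 5.10 (log base 10 scale)
bay = {
    "Transformer": [1, 10, 10, 10, 1, 10],
    "Circuit breaker": [10, 10, 1],
    "Disconnector": [10, 10],
}
REFERENCE_INDICATORS = 2  # normalise to the asset with the fewest indicators


def normalised_score(scores, reference=REFERENCE_INDICATORS):
    """Scale the summed score to a common number of indicators."""
    return sum(scores) * reference / len(scores)


sums = {asset: sum(scores) for asset, scores in bay.items()}
normalised = {asset: normalised_score(scores) for asset, scores in bay.items()}
total_bay_score = sum(sums.values())

print(sums)             # {'Transformer': 42, 'Circuit breaker': 21, 'Disconnector': 20}
print(normalised)       # {'Transformer': 14.0, 'Circuit breaker': 14.0, 'Disconnector': 20.0}
print(total_bay_score)  # 83
```

The normalised value depends only on the mix of scores, not on how many indicators were assessed, which is why it says nothing about the completeness of the assessment.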
At which point do a number of ‘lower’ urgency codes add together to become a code of ‘higher’
urgency? And how would we calculate and manage such an effect? This does not affect simple
enumeration scores or simple Max scores, but does apply to those which use summation, and could
be applied to complex enumeration where multiple lower scores combine to a higher score.
Another problem with normalisation of scores is that a single “bad” score can become less visible in
case the remaining indicators score significantly better. The higher the number of indicators, the more
pressing this issue becomes. See for example Table 5.11, where asset 1 has eight condition
indicators, asset 2 has four. By normalising the scores, a skewed image of the actual situation is
created where it seems that asset 1 is in a significantly better condition which is not true.
Table 5.11 – Example of aggregation of scores
Asset 1 Score Circuit Asset 2 Score

Condition indicator 1 3 Condition indicator 1 1
Condition indicator 5 10
Individual sum score 142 103
Normalised score 71 103
The worst score 100 100
Users of log-based scoring systems must remain aware of compound effects that influence the
total scores of aggregated assets. An example is given in Table 5.12, which shows a log base 3
summation index with a maximum score of 100. The highest score among the indicators is 30
points. Note that the sum score is 117.
Table 5.12 – Example of a log-3 based scoring system with category promotion
Asset | Score | Highest score | Value | Number of condition indices with promotion
Asset 1 | 1 | 1st | 100 | 1
Asset 2 | 30 | 2nd | 30 | 0
Asset 3 | 10 | 3rd | 10 | 1
Asset 4 | 3 | 4th | 3 | 2
Asset 5 | 10 | 5th | 1 | 1
Asset 6 | 30 | | |
Asset 7 | 3 | | |
Asset 8 | 30 | | |
Sum score | 117 | | |
“Promotion” is typically inherent to the summation of scores. This is explained as follows:

With the scale of Table 5.12, four asset scores of 30 sum to 120, yielding a combined effect in the
summed score that represents a higher score category, namely 100. This can be chosen where there
is a wish to express the combined effort required to address each asset, for example in the case of
maintenance efforts where addressing more than three “small” issues is regarded as equivalent to
one “medium” issue. For aggregated health scores it is then recommended to carry forward the
“compounded” worst-case score. This can be achieved by decomposing the summed health score of
all assets taken together: divide it by the worst applicable condition indicator score (in this example
‘100’) and carry the remainder (modulus) on to each lower category in turn. For the sum score of 117
this gives 1 × 100 + 0 × 30 + 1 × 10 + 2 × 3 + 1 × 1, which results in the ranking number 10121.
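The decomposition with promotion can be sketched as repeated integer division, carrying the remainder from one category to the next. The names CATEGORIES and decompose_with_promotion are hypothetical; the scores are those of Table 5.12.

```python
CATEGORIES = [100, 30, 10, 3, 1]  # log base 3 scale, worst category first


def decompose_with_promotion(total, categories=CATEGORIES):
    """Split a summed score into counts per category, worst first.

    Each step takes as many whole units of the current category as fit
    and passes the remainder (modulus) on to the next lower category.
    """
    counts = []
    remainder = total
    for value in categories:
        count, remainder = divmod(remainder, value)
        counts.append(count)
    return counts


asset_scores = [1, 30, 10, 3, 10, 30, 3, 30]  # Table 5.12
total = sum(asset_scores)
counts = decompose_with_promotion(total)

print(total)                             # 117
print(counts)                            # [1, 0, 1, 2, 1]
print("".join(str(n) for n in counts))   # 10121
```

Although no single asset scored 100, the three scores of 30 together with the lower scores are promoted into one unit of the worst category.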
Note that, again, differences in the number of indicators for each asset will cause skewing of results.
Users must be aware of this characteristic.
In case individual condition indicators are not considered linked in any way such that promotion is not
a preferred characteristic, the summated score is only assessed for the worst score and number of
indicators with this score. Then, similarly to what was discussed in earlier sections, an enumeration
system is best used to count the number of condition score categories to preserve ranking. In the
example shown before this would yield a ranking number 03221 (see Table 5.13).
Table 5.13 – Example of a Combined Score Without Category Promotion
Highest score | Value | Number of condition indices without promotion
1st | 100 | -
2nd | 30 | 3
3rd | 10 | 2
4th | 3 | 2
5th | 1 | 1
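Without promotion, the ranking number is a plain count of scores per category. A minimal sketch using the asset scores of Table 5.12 follows; the function name is an assumption.

```python
from collections import Counter

CATEGORIES = [100, 30, 10, 3, 1]  # log base 3 scale, worst category first
asset_scores = [1, 30, 10, 3, 10, 30, 3, 30]  # Table 5.12


def count_without_promotion(scores, categories=CATEGORIES):
    """Count the scores in each category, worst first, without combining them."""
    counts = Counter(scores)
    return [counts.get(value, 0) for value in categories]


counts = count_without_promotion(asset_scores)
print(counts)                            # [0, 3, 2, 2, 1]
print("".join(str(n) for n in counts))   # 03221
```

Compared with the promoted result 10121, the worst category stays empty because no individual score reached 100.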
Option 4 – Focussed aggregation using probability of failure information

When considering the health of a functional set of assets, for instance a bay or feeder, a potential
health score for the bay also considers the impact on the functionality of the bay. This applies if we
define failure as the set of events in which the substation, bay or feeder is no longer supplying power
to a connected customer.
Here the health of the individual assets in the bay should be translated to the corresponding expected
probability of failure (range). The user of the health index should define suitable ranges to correlate a
given individual asset score to a corresponding failure probability. Although cumbersome, good
practice is to perform analysis on actual failure and survival data of assets in the network and correlate
this to corresponding condition indicators for an asset to derive suitable probability of failure numbers/
ranges. This is not trivial. Note, that these analyses should also be updated periodically to reflect
future developments, including (new) ageing phenomena, changes in maintenance schemes, changes
in the way the system is utilised, etc. An example of such a translation is shown in Table 5.14. Note
that by definition this is a different scoring table compared to the scoring tables used for the individual
assets. This happens because the aggregate will have a different failure probability (typically higher)
compared to the individual assets.
Table 5.14 – Example of correlating scoring categories to ranges of failure probability
Code | Timescale | Description | Estimated probability of failure per year, p
A | Good for > 15 years with normal maintenance | No concern | 0 < p < 1%
B | Requires attention in 5-15 years with normal maintenance | Slight concern | 1% < p < 2%
C | Requires attention in 2-5 years with normal maintenance | Moderate concern | 2% < p < 3%
D | Requires attention within 2 years with normal maintenance | Critical | 3% < p < 10%
If we consider the single-line diagram of the feeder in Figure 5.4, it is apparent that poor health of
assets which form a single point of failure poses a larger risk than poor health of assets which are in
parallel (assuming that one can take over the load of the other). Based on the substation configuration
and the individual probabilities of failure of the assets in the bay, a total PoF for failure of the aggregate
to supply power to the customer could be derived and also expressed in terms of a health score.
If we assume, for the sake of simplicity, that the busbars have a failure probability of zero, the overall
bay failure probability using standard mathematics is equal to:
$$p_{bay} = 1 - (1 - p_{DS1} \cdot p_{DS2}) \cdot (1 - p_{CB1}) \cdot (1 - p_{T1}) \cdot (1 - p_{CB2})$$
Figure 5.4 – Example of a bay configuration

An example for given failure probabilities of the assets in the bay is given in Table 5.15. Note that this
is only an example with single numbers for each asset. Similar calculations can be expanded by
applying failure probability ranges and using Monte-Carlo simulations to yield a range for the overall
failure probability of the bay. The overall probability can then be represented using for example a
colour code using the scoring table as depicted in Table 5.14.
Table 5.15 – Example of calculating overall failure probability of the bay
Component | Individual failure probability [%] | Failure probability bay [%]
DS1 | 5.1 |
DS2 | 11.0 |
CB1 | 1.4 |
T1 | 2.3 |
CB2 | 5.8 |
Bay (all components) | | 9.8
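The bay calculation can be checked numerically. The sketch below evaluates the bay formula with the values of Table 5.15, maps the result to a code of Table 5.14, and then illustrates the Monte-Carlo extension. Several points are assumptions made for illustration: the function names, the treatment of the upper bound of each PoF range as inclusive, uniform sampling within each range, and the ranges themselves, which are hypothetical.

```python
import random


def bay_pof(p_ds1, p_ds2, p_cb1, p_t1, p_cb2):
    """Bay PoF: DS1/DS2 as a redundant pair in series with CB1, T1 and CB2.

    Busbars are assumed never to fail, as in the text.
    """
    survival = (1 - p_ds1 * p_ds2) * (1 - p_cb1) * (1 - p_t1) * (1 - p_cb2)
    return 1 - survival


def classify(p):
    """Map a PoF to a Table 5.14 code (upper bounds assumed inclusive)."""
    for code, upper in (("A", 0.01), ("B", 0.02), ("C", 0.03), ("D", 0.10)):
        if p <= upper:
            return code
    return None  # Table 5.14 defines no category above 10%


def simulate(ranges, n=10_000, seed=0):
    """Sample each asset PoF uniformly in its range; return 5th/95th percentiles."""
    rng = random.Random(seed)
    samples = sorted(
        bay_pof(*(rng.uniform(lo, hi) for lo, hi in ranges)) for _ in range(n)
    )
    return samples[int(0.05 * n)], samples[int(0.95 * n)]


# Single values from Table 5.15 as fractions: DS1, DS2, CB1, T1, CB2
point = bay_pof(0.051, 0.110, 0.014, 0.023, 0.058)
print(f"bay PoF = {point:.1%}, code {classify(point)}")  # bay PoF = 9.8%, code D

# Hypothetical ranges around those values, for illustration only
ranges = [(0.04, 0.06), (0.09, 0.13), (0.01, 0.02), (0.02, 0.03), (0.05, 0.07)]
low, high = simulate(ranges)
print(f"90% of simulated bay PoF values lie between {low:.1%} and {high:.1%}")
```

Because the two disconnectors are redundant, their product contributes little; the series elements CB1, T1 and CB2 dominate the bay result.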
This can serve to compare the performance of different bays. Note that it is important to realise that
bays (or other functional aggregates for that matter) with different configurations may have different
overall failure probabilities due to the configuration itself. In some cases the user may only be
interested in ranking functional aggregates by failure probability, regardless of the
configuration; this may be a deliberate choice.
Sanity checking – PoF back calculation, expected condition issues

It is important to check the results of asset combination health scores against reality – noting that an
index is an estimate of the actual health and we may get better estimates through analysis of larger
data sets.
In reference [B14] the authors looked at closing the feedback loop – comparing health scores of
transformers based on available data with what was found when the transformer was removed from
service. This showed that the system in use was accurate in many cases, but could also be
misleading without additional information such as design elements.
Feedback discussion
In this practical example, transformer assessments were performed using analyses of components,
with scores for each component on a log scale 1, 3, 10, 30, 100. An asset score was then calculated
using either an average of component scores or with the maximum of individual component scores. As
shown below, an Average score dilutes and loses the urgency, while a Max score retains the sense of
urgency. The results, in Table 5.16, show the Average Health Index (AHI - Avg) and the Max Health
Index (AHI – Max) based on the component scores. This is listed against the ranking from the field
engineers who reviewed the assessment data (Found) without use of an index and then ranked the
assets in order of intervention priority (Field).
Table 5.16 – Comparing weighted with Max and field engineer assessment
Asset ID | AHI – Avg | AHI – Max | Field | Found
H | 2 | 30 | 1 | Main windings PF and C rise 2008-2012
C | 14 | 30 | 2 | Main windings PF rise but not C rise 2014-2017
D | 14 | 30 | 3 | Main windings large C rise but small PF rise 2015-2016
E | 11 | 30 | 4 | Main windings anomalous PF and C 2013-2017
B | 15 | 30 | 5 | Bushing bad, replace. Hope it did not go back to service.
A | 30 | 30 | 6 | Bushing bad, replace, do not re-energise
F | 10 | 10 | 7 | Bushing PF more than doubled, C1 Cap up ~2.8%
G | 10 | 10 | 8 | Bushing rise C1 PF and C1 value 2012-2013-2016
The two approaches show the effect of the different means of combining scores: with the same
approach being available to apply at a components-to-assets level or assets-to-aggregate level. What
was considered to be the most urgent issue by the field team received a very low ranking using the
AHI-Average; further, the AHI-Max does not give much detail for ranking within the code 30’s. Both
methods have clear limitations.
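The limitations of both methods can be made concrete by ordering the assets of Table 5.16 by each index and comparing with the field ranking. This is an illustrative sketch; the function name order_by is an assumption, and ties are left in table order.

```python
# (asset, AHI-Avg, AHI-Max, field priority) from Table 5.16
rows = [
    ("H", 2, 30, 1), ("C", 14, 30, 2), ("D", 14, 30, 3), ("E", 11, 30, 4),
    ("B", 15, 30, 5), ("A", 30, 30, 6), ("F", 10, 10, 7), ("G", 10, 10, 8),
]


def order_by(rows, column):
    """Asset IDs sorted by a score column, highest score first (stable for ties)."""
    return [r[0] for r in sorted(rows, key=lambda r: r[column], reverse=True)]


by_avg = order_by(rows, 1)
by_max = order_by(rows, 2)
field = [r[0] for r in sorted(rows, key=lambda r: r[3])]
distinct_max_values = sorted({r[2] for r in rows}, reverse=True)

print(field)                # ['H', 'C', 'D', 'E', 'B', 'A', 'F', 'G']
print(by_avg)               # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
print(distinct_max_values)  # [30, 10]
```

Asset H, the field team's first priority, comes last by average, while the Max index offers only two distinct values across eight assets.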
Back calculation of probability of failure

By applying/using:
 A PoF range to each code/index
 The ‘centre’ value of the PoF range, since assets may be distributed normally within it (or not; this
is an interesting detail, and a Poisson distribution may be appropriate)
 The number of assets in each category
 Historic asset failure rates based on condition
 The assumption that next year will look somewhat like this year in terms of failures
We can calculate the expected number of failures and adjust figures to reflect historic rates – or
whatever rate we choose.
An example is shown in Figure 5.5, where the PoF for different condition codes for a variety of assets
is adjusted to allow an overall target to be met.
Figure 5.5 – Adjusting category PoF values
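The back calculation can be sketched as follows. The centre PoF values are the mid-points of the ranges in Table 5.14; the asset counts, the historic target and the uniform rescaling of all categories by a single factor are hypothetical choices made only to illustrate the arithmetic, and other adjustment schemes are equally possible.

```python
# Centre of each PoF range in Table 5.14 (per year)
centre_pof = {"A": 0.005, "B": 0.015, "C": 0.025, "D": 0.065}

# Hypothetical number of assets in each category
population = {"A": 400, "B": 300, "C": 200, "D": 100}

# Hypothetical historic number of failures per year for this population
historic_failures = 12.0


def expected_failures(population, pof):
    """Expected failures per year: sum of (asset count x PoF) over categories."""
    return sum(population[code] * pof[code] for code in population)


def rescale(pof, population, target):
    """Scale every category PoF by one factor so the expectation meets the target."""
    factor = target / expected_failures(population, pof)
    return {code: p * factor for code, p in pof.items()}


before = expected_failures(population, centre_pof)
adjusted = rescale(centre_pof, population, historic_failures)
after = expected_failures(population, adjusted)

print(f"expected failures before adjustment: {before:.1f}")  # 18.0
print(f"expected failures after adjustment:  {after:.1f}")   # 12.0
```

A single scaling factor preserves the rank order of the categories, so monotonicity is retained after adjustment.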
Conclusions relating to aggregation

Proper aggregation of health scores requires the following aspects:
 The aggregated score must retain monotonicity
 The individual asset scores as well as the aggregate scores must be calibrated for time
 A correlation exists between rank order and urgency of intervention (a consequence of
monotonicity and calibration)
 The score is auditable: the root cause data that implies action can be shown, i.e. it is traceable
how the overall score was calculated
Note that a health index encapsulates data and analyses and does not provide ‘new’ information. This
is certainly also true for aggregated scores. Four different methodologies for aggregating individual
asset health scores were discussed:
1. Enumeration of single (overall) asset scores
2. Enumeration of all available condition indicator scores for all assets
3. Normalisation of all asset scores into one overall aggregate score
4. Focussed aggregation aimed at a functional set of assets using probability of failure
information
Both enumeration methods (1 and 2) are easy to implement, retain ranking and preserve scoring
information better than aggregations that aim to create a single aggregate score (3). Both
methods are recommended.
Furthermore, as was demonstrated with some examples, there are risks in creating a single overall
score by use of summation methods, especially when dealing with different asset types that have
different numbers of condition indicators. Therefore, this method is only to be used when potential
pitfalls are properly addressed.
The focussed aggregation (4) is useful from the perspective of the functionality of a network or part of
a grid, but requires more effort to derive the failure probability of equipment from condition indicators
and from failure and survival data analysis. This is not a trivial exercise, but it offers a mathematically
sound outcome that also feeds into further risk assessment. Although more complex,
this approach is recommended.
6. Conclusion
The role of this working group has been to produce a version of the Asset Health Index methodology
applicable to the various assets within a transmission substation. By defining the likelihood of failure of
each asset over various time scales it provides action plans for remediation or replacement. The
outcome leads to a list of assets such as the asset register but modified with additional data relating to
the condition of the asset in terms of possible failure modes. The second role of AHI analyses is to
combine failure likelihoods across the asset classes contained within a circuit end, bay or even a
complete substation. This is an outcome to be used for strategic planning and operational decision
making.
Important features are:
1. The use of AHIs is important to the application of condition-based decision making. As such it
is also the likelihood step within risk-based decision making. An important point is that there
should be a link between related diagnostics and failure modes. This comes through the
application of FMEA methods. The second important consideration is that the rate of change in
the indicators of deterioration is sufficient for detection, decision making and corrective actions.
2. The AHI outcomes should not be confused with age, the period of prior service. Aspects such
as inferior designs, age, duty cycle and operational environment are not failure modes. They
are relevant in this context as hazard factors that might emphasise the likelihood of particular
failure modes becoming more dominant for a specific asset.
3. The AHI approach is based upon a systematic application of all failure modes, and so has an
advantage over a simple assessment based upon just one or two diagnostics. The resulting set
of AHIs should be calibrated for time. The AHI must uniformly reflect the same urgency of
intervention. Any AHI should identify changing likelihood with time periods by creating an
action plan for an intervention – maintenance, repair, or replacement. All assets with the same
score should have the same timescale for intervention, irrespective of failure mode or asset
type, otherwise there is confusion in applying AHIs consistently. A ‘poorer’ AHI should always
reflect a more urgent condition. This means that where several failure modes are being
assessed and the scores aggregated the method of aggregation should not produce any
violation of this principle.
4. The AHI is created by assimilating assessments from a number of Failure Mode scores.
Reducing scores from different failure modes into a single unified number is attractive,
particularly when the purpose is to create a prioritised action list. One option is to use only the
worst failure mode score to apply to the whole asset or across a bay or substation covering
many assets. This approach has both attractions and limitations. In order to achieve
prioritisation within an asset class it will be necessary to introduce time limited subcategories.
An alternative approach is where an exponential based score set could be added and so reflect
the condition of all failure modes. However, such an aggregation should not dilute a dominant
score of one failure mode, nor preclude any reverse audit. Combining scores across different
asset classes to provide a bay wide score again has operational attractions but also introduces
further complications. There can be no simple adding of scores when so many assets and
failure modes are involved.
5. The methodology should allow an auditable and direct trail between the outcome and
supporting evidence. The output should be clear, auditable and justifiable by those needing to
make decisions based on it, and should not be just a number produced by an
automated analysis.
6. Creating an AHI approach is costly in time and effort. It is essential before starting out to clearly
establish the benefits and potential for cost savings and maintaining core business attributes of
safety, performance and reputation. This would identify the review level of the AHI method to
be adopted for each class of assets.
7. There needs to be an audit process – to tear down and examine forensically any unit scrapped
in order to confirm the AHI process. This should indicate the existence of all active failure
modes and their relationship to the assessment and AHIs made whilst the unit was in service.
With improvements identified it then can become part of the asset life plan within ISO 55000.
APPENDIX A. Definitions, abbreviations and symbols
A.1. General terms
Table A.1 – Definition of general terms used in this TB
Acronym | Phrase | Definition
TB | Technical Brochure | A publication produced by CIGRÉ representing the state-of-the-art guidelines and recommendations produced by a SC WG. Hardcopy TBs can be purchased, or Individual Members, or staff of a Collective Member, can download the PDF for free using their login credentials (copyright restrictions for use within their own CIGRE Membership only)
SC | Study Committee | One of the 16 technical domain groups of CIGRE
WG | Working Group | A group formed by a SC to develop a TB on a particular subject of interest
A.2. Specific terms

Table A.2 – Definition of technical terms used in this TB

Acronym | Phrase | Definition
ABC | Activity based costing | An auditing method for maintenance costs
AC | Alternating current | A sinusoidally varying current
AHI | Asset health index | A measure of condition
BIL | Basic insulation level, a required impulse withstand value | A specification impulse voltage level to demonstrate an ability to withstand system lightning.
CAPEX | Capital expenditure | A budgetary term used for investment activities
CIMS | Common Information Management system | A common protocol for managing site data
CMMS | Computerised maintenance management system | A means developed for controlling costs
CT | Current transformer | A device to permit current measurements in a HV circuit
CCVT | Combined current and voltage transformer | A device to allow both current and voltage measurements
DC | Direct current | A non-varying current and voltage
DDF | Dielectric dissipation factor | Alternative names in common use are power factor, tangent delta, and dielectric loss tangent. All relate to losses within a dielectric material. It is a dimensionless number; the different ways of measuring it all produce a very similar value, sometimes expressed as a percentage.
DGA | Dissolved gas analysis | A common technique used for oil filled assets. Gases dissolved in an oil sample are extracted and analysed. The presence of different gases indicates different failure modes.
FMEA | Failure mode and effects analysis | A systematic method for identifying failures
GSU | Generator step up transformer | Output transformer converting generation power to high voltage transmission systems
HFCT | High frequency current transformer | Usually a measurement device used around earth or bonding leads to allow measurement of PD
IEC | International Electrotechnical Commission | International standards body for the industry
IEEE | Institute of Electrical and Electronics Engineers | A professional body for working engineers, creating and disseminating best practice and standards
IT | Instrument transformer | A HV asset used to measure current and/or voltage
KPI | Key performance indicator | A measure to allow continuous improvement.
MOV | Metal oxide varistor | A type of surge arrester material
OEM | Original equipment manufacturer | Original manufacturer of the plant item
OIP | Oil impregnated paper | A common insulation system (transformers, bushings, etc.)
OLTC/LTC | On load tap changer | A mechanism allowing the number of turns in a winding to be varied, facilitating voltage stability of a power transformer
OPEX | Operating expenditure | A budgetary term describing ongoing costs required to maintain an activity.
PCB | Polychlorinated biphenyl | An insulating fluid now banned. It exists around the world as a contaminant in oils used within power equipment.
PD | Partial discharge | A small breakdown current flowing as a result of localised insulation breakdown at a weakness.
PMU | Phasor Measurement Unit | Device to measure phasors
DDF/C | Dielectric factor and capacitance measurement | A measure of insulation quality
RH | Relative humidity | A measure of moisture in a fluid or gas relative to saturation levels.
RTU | Remote Terminal Unit | Device used to connect to remote SCADA control centre
SCADA | Supervisory control and data acquisition | Control system architecture comprising computers, networked data communications and graphical user interfaces
SF6 | Sulphur hexafluoride | A high strength gas ideal for quenching arcs in circuit breakers, but with recognised environmental impact if released.
SFRA | Sweep frequency response analysis | A spectrum created by selective attenuation and resonances in a winding when a varying sine wave is injected along the winding.
UHF-PD | Ultra-high frequency measurement of radiated electromagnetic emissions | Following a partial discharge there is a current flow in connected circuits. This in turn creates high frequency radiation that can be detected using a scanning device.
UV | Ultraviolet | Part of the electromagnetic spectrum with higher frequency than visible.
VT | Voltage transformer | Device used to measure line voltage
APPENDIX B. Links and references

[B1] TB 642 (2015) “Transformer reliability survey”, WG A2
[B2] TB 510 (2012) “Final Report of the 2004 - 2007 International Enquiry on Reliability of High
Voltage Equipment; Part 2 - Reliability of High Voltage SF6 Circuit Breakers”, WG A3
[B3] TB 761 (2019) “Condition assessment of power transformers”, WG A2
[B4] TB 152 (2000) "International Survey of Maintenance Policies and Trends ", JWG 23/39
[B5] TB 660 (2016) “Saving Through Optimised Maintenance in AIS”, WG B3
[B6] PAS 55, "Specification for the optimized management of physical assets", BSI, 2004
[B7] PAS 55, "Specification for the optimized management of physical assets", BSI, 2008
[B8] ISO 55000, "Asset management – Overview, principles and terminology", ISO, 2014
[B9] CIGRE Substations Green book, Chapter 52, Springer, 2018
[B10] “Transformer life prediction using data from units removed from service and thermal
modelling”, CIGRE-Session, A2-212, Paris, 2010, P. Jarman, R. Hooton, L. Walker, Q.
Zhong, T. Ishak and Z. Wang
[B11] “Evaluation of failure data of H.V. circuit-breakers for conditioned based maintenance”,
CIGRE session, A3-305, Paris, 2004, G. Balzer, D. Drescher, F. Heil, P. Kirchesch, R.
Meister, C. Neumann
[B12] “End of life estimation and optimisation of maintenance of HV switchgear and GIS
substations”, CIGRE session, A3-202, 2012, C. Neumann, B. Rusek G. Balzer, I. Jeromin, C.
Hille, and A. Schnettler
[B13] “Risk Based Maintenance in Electric Network Organisations”, Thesis Delft University of
Technology, The Netherlands 2016, R.P.Y Mehairjan, http://repository.tudelft.nl
[B14] "Transformer Asset Health Review: Does it work?", CIGRE Session Paper A2-108, Paris,
2014, R. H. Heywood, P.N. Jarman and S. Ryder
[B15] “Developing and using justifiable Asset Health Indices for Tactical and Strategic Risk
Management”, Paper B3-201, CIGRE Session Paris, 2018, T. McGrail, S. Rhoads and J.
White
[B16] “Asset Health Index and Risk Assessment Models for High Voltage Gas-Insulated Switchgear
Operating in Tropical Environment,” PhD Dissertation, TU Delft, 2020, A.P. Purnomoadi,
http://repository.tudelft.nl
[B17] TB 737 (2018) "Non-intrusive methods for condition assessment of distribution and
transmission switchgear" WG A3
[B18] TB 167 “User guide for the application of monitoring and diagnostic techniques for switching
equipment for rated voltages of 72.5 kV and above”
[B19] IEEE C37.10 “IEEE Guide for Investigation, Analysis and Reporting of Power Circuit Breaker
Failures”
[B20] IEEE C37.10.1 “IEEE Guide for the Selection of Monitoring for Circuit Breakers”
[B21] TB 511 (2012) “Final Report of the 2004 - 2007 International Enquiry on Reliability of High
Voltage Equipment Part 3 - Disconnectors and Earthing Switches”
[B22] IEC 62271-102 “High-voltage switchgear and controlgear – Part 102: Alternating current
disconnectors and earthing switches”, 2nd Ed., 2018
[B23] IEC 60050, “International Electrotechnical Vocabulary”, IEV 441-14-05
[B24] IEC 60050, “International Electrotechnical Vocabulary”, IEV 441-14-11
[B25] IEEE Std C57.91 (2011) “IEEE Guide for Loading Mineral-Oil-Immersed Transformers and
Step-Voltage Regulators”
[B26] IEEE Std C57.143 (2012) “IEEE Guide for Application for Monitoring Equipment to Liquid-
Immersed Transformers and Components”
[B27] TB 512 “Final Report of the 2004 - 2007 International Enquiry on Reliability of High Voltage
Equipment, Part 4”, WG A3.06
[B28] "Acceptance Testing Specifications for Electrical Power Distribution Equipment and
Systems”, NETA
[B29] TB 513 (2012) “Final Report of the 2004 - 2007 International Enquiry on Reliability of High
Voltage Equipment, Part 5: Gas Insulated Switchgear (GIS)”, WG A3.06
[B30] “The Delphi Method: techniques and applications,” 1975, H.A. Linstone, M. Turoff
[B31] “Statistical Lifetime Management for Energy Network Components,” PhD Thesis, TU Delft,
the Netherlands, 2012, R. A. Jongen
[B32] IEC 60099-4 “Surge arresters - Part 4: Metal-oxide surge arresters without gaps for a.c.
systems”
[B33] “Condition Monitoring of Surge Arresters”, IndiaDoble symposium Delhi, 2005, R. K.
Tyagi
[B34] “Capacitor bank failure investigation”, Doble Engineering Conference, 2007, L. Pong and J-F
Chrétien
[B35] “Experience with leakage-current testing of 380 kV MOV surge arresters in the field,
utilizing an lcm portable instrument – section 9-3”, Proceedings of the 1994 International
Conference of Doble Clients, P Leemans and G Moulaert
[B36] TB 775 (2019) “Transformer bushing Reliability”
[B37] “Update – field testing capacitor bank with M4000 test instrument”, Doble Engineering
Conference, 2007, L Pong and D Wheat
[B38] “Managing Bushings: From Statistics to Singularities – Where to Focus?”, Transformer
Technology, Issue 8, 2020, T. McGrail,
https://www.transformer-technology.com/community-hub/technical-articles
[B39] “Advanced condition monitoring method for high voltage overhead lines based on visual
inspection”, IEEE PES General Meeting, Portland, 2018, H. Manninen, J. Kilter,
M. Landsberg
[B40] TB 300 (2006) “Guidelines to an optimized approach to the renewal of existing air insulated
substations”, WG B3.03
[B41] “Transformer Asset Management: How Well Are We Doing And Where Do We Need To Do
Better?”, International Conference of Doble Clients, Boston, 2015, Ryder S., Jarman P.,
Heywood R.
[B42] “Deriving a Useful Asset Health Index - Getting Started, Getting Value and Making
Use of Them”, Doble Client Conference, 2016, McGrail, T., Heywood, R.
[B43] “Transformer Health Index and Probability of Failure Based on Failure Mode Effects Analysis
(FMEA) of a Reliability Centered Maintenance Program (RCM)” CIGRE Session paper A2-
110, Paris, 2016, P. Lorin et al
[B44] “DNO Common Network Asset Indices Methodology”, UK Regulator, OFGEM, 2017,
www.ofgem.gov.uk/system/files/docs/2017/05/dno_common_network_asset_indices_methodology_v1.1
[B45] “Hydro One Distribution - ACA Summary Report”, Hydro One, 2005
[B46] “Aggregate Health Indices as Used for Asset Investment Decisions and Universal
Understanding”, EuroDoble, UK, 2007, Kydd T.
[B47] “Transformer Condition Assessment”, International Doble Conference, 2003, Bennett, G.
[B48] “Development of Transformer Health Index (THI) – TATA Power Experience”, 13th IndiaDoble
Power Forum, 2015, Kini M.V. et al
[B49] “Current Situation and Recent Challenges in Asset Management of Aging T&D Substation
Facilities in Japan”, CIGRE Paper B3_302_2016, Paris, 2016, Kobayashi T., et al
[B50] “Procedure for Using Condition Based Maintenance to Create the Health Indices of
Transmission Power Lines: A Case Study of the Kenyan Coast”, International Doble
Conference, 2012, Bosire E. & Yarrow A.
[B51] “Modern Insulation Condition Assessment for Instrument Transformers”, IEEE International
Conference on Condition Monitoring and Diagnosis, Bali, 2012, Stephanie Raetzke, Maik
Koch, Martin Anglhuber
[B52] TB 234 (2003) "SF6 Recycling Guide"
[B53] TB 567 (2014) "SF6 Analysis for AIS, GIS and MTS Condition Assessment”
APPENDIX C. Additional explanation specific to Chapter 5

C.1. Characteristics of combinable health indices
Generation of a health index is reliant on estimates of deterioration, intervention and timescales – if
things do not deteriorate over time, we do not need to replace/maintain them based on condition.
If we wish to combine the health indices of several components or assets, the individual health indices
need to be consistent with respect to time: any individual index value must have the same urgency or
timescale for action whenever it is used. This ‘calibration’ ensures that we retain the original urgency.
For example, if health indices are codes 1-5, then all values of 3 must have a consistent timescale
for action of, say, 2-5 years, although the actions themselves may differ. That said, there must be raw data
which is analysed to allow us to identify a timescale for intervention, both by diagnosing the
failure modes which are in operation and by establishing their prognosis.
Note that when we group assets into a category, such as having a number of Code 3 assets, there is a
natural tendency to consider all assets in that category as being similar, and those same assets as
being quite different to assets in other categories¹. How long can an asset remain as a Code 3, if the
requirement is to act within 2-5 years? Over time the asset must get closer and closer to being in the
‘act within 0-2 years’ category.
Some health index systems use weights to generate a final score as a percent value, or a number
between 0 and 10 to several decimal places. These approaches need to be examined to ensure that
the urgency inherent in the identified failure modes is not lost. If that urgency is not maintained, then
any subsequent combination of indices will give an illusion of mathematical rigor while not actually
providing realistic meaning.
Further, there need to be at least two significant checks on the value of an indexing system:
• Closing the feedback loop and checking that the scores assigned reflect what is found in practice.
This can be achieved through forensic teardown, as in the work described in [B14], since the
condition found may not be the condition expected from the assessment which led to the health
index.
• Checking that the identification of assets for replacement, say, improves the overall
population more than a placebo would. That is, the improvement in the average health index of a
population achieved by targeted replacement has to be shown to be consistently better
than that achieved by replacement at random.
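The second check can be sketched as a simple comparison. The Python sketch below is illustrative only: the ten-asset population, the 1-5 coding (5 being worst) and the assumption that a replaced asset returns to code 1 are hypothetical and are not taken from this brochure. It compares the mean condition found after replacing the assets with the worst assessed codes against the mean obtained when the same number of assets is replaced at random.

```python
import random
from statistics import mean

NEW_ASSET_CODE = 1  # assumption: a replaced asset returns to the best code


def mean_after_replacement(true_codes, replaced):
    """Mean true condition code after the assets at indices `replaced` are renewed."""
    replaced = set(replaced)
    return mean(NEW_ASSET_CODE if i in replaced else c
                for i, c in enumerate(true_codes))


def targeted_choice(assessed_codes, k):
    """Indices of the k assets with the worst (highest) assessed code."""
    order = sorted(range(len(assessed_codes)),
                   key=lambda i: assessed_codes[i], reverse=True)
    return order[:k]


def random_baseline(true_codes, k, trials=1000, seed=0):
    """Average outcome of replacing k assets chosen at random (the 'placebo')."""
    rng = random.Random(seed)
    n = len(true_codes)
    return mean(mean_after_replacement(true_codes, rng.sample(range(n), k))
                for _ in range(trials))


# Hypothetical population: assessed codes and the condition actually found.
assessed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
found = [1, 2, 3, 3, 2, 3, 4, 5, 5, 4]

targeted = mean_after_replacement(found, targeted_choice(assessed, 3))
placebo = random_baseline(found, 3)
print(f"targeted: {targeted:.2f}, random: {placebo:.2f}")
```

Keeping the assessed codes separate from the condition found allows the same comparison to be repeated with teardown results, so that the index is judged against evidence rather than against itself.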
C.2. Mathematics of probability

The mathematics of probability is well documented but not always well understood, or well applied.
The discussion here looks at combining probability of failure (PoF) values for multiple assets, whether
a physical group such as a single bay, or a more ‘logical’ grouping such as multiple assets which are
of a given type.
As a simple example: if we have two independent (unconnected) assets, each with a probability of
failure p in the next 12 months, what is the probability that at least one of them will fail in the next 12
months? (The value of p is between 0 and 1, which represents 0-100%)
The answer is:
$$2p - p^2$$
The derivation goes as follows, covering the 12-month period of interest:
Probability of failure of each asset:
$$p$$
¹ Sapolski – https://www.youtube.com/watch?v=NNnIGh9g6fA “...when you pay attention to categorical boundaries, you don't see big pictures...”
Probability of survival of each asset:
$$1 - p$$
Probability both survive, assuming they are independent:
$$(1 - p) \times (1 - p) = (1 - p)^2$$
Probability at least one fails:
$$1 - (1 - p)^2 = 2p - p^2 = p(2 - p)$$
The same approach for three independent assets, each with failure probability p, yields a probability
of:
$$1 - (1 - p)^3$$
and we can generalize for n assets as:
$$1 - (1 - p)^n$$
For n > 1 and 0 < p < 1, the final value is greater than the original value of p.
If the probabilities of failure of the n assets are $p_1, p_2, p_3, \dots, p_n$, then the probability that at least one will
fail in the 12 months is:
$$1 - (1 - p_1)(1 - p_2)(1 - p_3) \cdots (1 - p_n)$$
and the overall value is greater than any individual value $p_i$, provided at least two of the values are non-zero and none equals 1.
The final value for probability of failure of at least one asset in a collection of assets is thus greater
than the individual probability of failure of any individual asset. This has implications for assets which
are coded by category: taking the maximum (worst) individual value as the value for the group
underestimates the probability of failure of the group.
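The combination rule above can be sketched in a few lines of Python. The function name and the example bay values are assumptions made for illustration; the calculation itself is the product of survival probabilities given above, valid only for independent assets.

```python
from math import prod


def prob_at_least_one_failure(pofs):
    """Probability that at least one of several independent assets fails.

    `pofs` holds each asset's probability of failure (0..1) over the same period.
    """
    if any(not 0.0 <= p <= 1.0 for p in pofs):
        raise ValueError("each probability must lie between 0 and 1")
    return 1.0 - prod(1.0 - p for p in pofs)


# Hypothetical bay of three assets with different annual PoF values.
bay = [0.01, 0.02, 0.05]
group_pof = prob_at_least_one_failure(bay)
print(f"max individual PoF: {max(bay):.4f}")
print(f"group PoF:          {group_pof:.4f}")
```

Comparing the two printed values shows by how much the maximum individual value understates the PoF of the group.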
Let us also look at probability of failure over time – again, start with the probability of failure in a given
12-month period being p. What is the probability of failure occurring in the next n years?
The answer is:
$$1 - (1 - p)^n$$
as we treat each year independently.
In practice, the probability of failure may rise or fall with time, depending on which probability of
failure curve we think applies. For many types of transformer there is evidence of a higher
failure rate during ‘burn in’ (infant mortality), followed by a long period of random failures; other types may have a rising
rate for older units, displaying a more bathtub-like curve.
If the probability of failure varies, then we can say that the overall probability of failure within a multi-
year period is greater than the highest for any one year.
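A minimal sketch of the multi-year case with a varying annual probability follows. The yearly profile is hypothetical (an elevated early-life rate followed by a flat rate) and each year is treated independently, as in the text above.

```python
def cumulative_pof(yearly_pofs):
    """Cumulative probability of failure by the end of each year.

    `yearly_pofs[k]` is the probability of failure during year k+1, given
    survival to the start of that year; years are treated independently.
    """
    survival = 1.0
    result = []
    for p in yearly_pofs:
        survival *= 1.0 - p
        result.append(1.0 - survival)
    return result


# Hypothetical profile: elevated early-life rate, then a flat random rate.
profile = [0.03] * 2 + [0.01] * 3
for year, pof in enumerate(cumulative_pof(profile), start=1):
    print(f"year {year}: {pof:.4f}")
```

The last value printed is the probability of failure within the whole period, and it exceeds the highest single-year value in the profile.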
If, for example, we have four codes for asset health, A to D, each with an associated timescale and range
of probability of failure during that time, as per Table C.1, then we can put a range on the
probability of failure and use this in an aggregated condition code for a set of assets.
Table C.1 – Estimated probability of failure
Code | Timescale | Estimated probability of failure per year
A | Good for >15 years with normal maintenance | 0 < p < 1%
B | Replace in 5-15 years with normal maintenance | 1% < p < 2%
C | Replace in 2-5 years with normal maintenance | 2% < p < 3%
D | Replace within 2 years with normal maintenance | 3% < p < 10% (or higher)
Note, there is a point at which an organization may consider the probability and consequent risk of
failure in a 12-month period to be too high – at which point it is incumbent upon them to remove the
asset to meet their risk aversion. The level of risk aversion will vary between organizations. What is an
acceptable level of failure for an asset? For an asset group?
So – if we have a set of three independent assets of condition code B, what is the range of probability
of failure, p, for the set?
The range must be somewhere between the three assets all being at 1% and all three assets being at
2% giving a range, as long as the assets are independent:
$$1 - (1 - 0.01)^3 < p < 1 - (1 - 0.02)^3 \;\Rightarrow\; 2.97\% < p < 5.88\%$$
The higher value shows that the set of three assets could easily now be a Code D.
The problem is that manipulating codes, rather than the root probability data, means that we have lost
sight of the data which put each asset in a particular category to start with. The best approach would be to use
the original data, however it was derived or estimated, which allowed us to place the asset in a
particular category.
It may be that the placement of an asset in a category was done by pure estimate (a Delphic²
approach, for example) and the limits for the category are the best we have. In that case, the
limits for the category can still be used to provide a range for a set of assets: a range which
depends on the number of assets.
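The dependence of the range on the number of assets can be sketched as follows. The band limits are those of Table C.1 expressed as fractions, with the nominal 10% taken as the upper limit of Code D. The function names and the half-open treatment of band boundaries are assumptions made for this illustration.

```python
# Annual PoF bounds per code, as fractions, taken from Table C.1.
# The 10% upper bound for Code D is the table's nominal value.
CODE_BOUNDS = {"A": (0.00, 0.01), "B": (0.01, 0.02),
               "C": (0.02, 0.03), "D": (0.03, 0.10)}


def group_pof_range(code, n):
    """Range of PoF for n independent assets that all carry the same code."""
    low, high = CODE_BOUNDS[code]
    return 1.0 - (1.0 - low) ** n, 1.0 - (1.0 - high) ** n


def code_for(pof):
    """Code whose Table C.1 band contains `pof` (Code D if above every band)."""
    for code, (low, high) in CODE_BOUNDS.items():
        if low <= pof < high:
            return code
    return "D"


for n in (2, 3, 5):
    low, high = group_pof_range("B", n)
    print(f"{n} x Code B: {low:.2%} < p < {high:.2%} "
          f"(equivalent codes {code_for(low)} to {code_for(high)})")
```

For three Code B assets this reproduces the 2.97% to 5.88% range derived above, and the range widens as more assets are added to the set.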
The situation is compounded if the assets influence each other: for example, the probability of
failure of a transformer is decreased if it is used in combination with a surge arrester, or increased if it is used
with a breaker prone to restrike. The interconnectedness of assets may well affect their individual
PoFs.
² From the Oracle at Delphi: an approach based on experience and discussion
APPENDIX D. CIGRE PUBLICATIONS

This appendix covers several CIGRE Paris session papers and a recent technical brochure from SC A2.
D.1. UK TSO
This refers to the publication from the 2014 CIGRE Paris session [B14]. It describes a method
developed some 20 years ago by the TSO at the instigation of the UK Regulator who wanted to see
some justification for the transformer capital replacement plan. This was coupled with providing
evidence that the company was managing its supply business with due regard to the management of
risk, cost control and supply performance. This methodology has been used by the utility in its UK and
USA networks and their experience was described in this 2014 publication. It separates the population
of about 800 transformers into categories shown in Table D.1.
Table D.1 – Asset health legend extracted from reference [B14]
The essence of the methodology is:
• It is a 5-year plan to be used with the Regulator for justifying capital replacement planning in the
5-year rate review periods. It is not a prioritisation for planning maintenance, repairs etc., nor does it
directly address likelihood of failure.
• It identifies which transformers are faulty and the likely timescale for their end of life (AHI 1 and 2).
It is noteworthy that age is specifically not factored into the assessment, since for this utility the
failure rate does not increase with time. On the contrary, their data show that the failure rate in the first 10
years is 3× higher than the random failure rate over the following 40 years [B10]. However, low
asset lives can be associated with a poorly performing design. Consequently, a unit with such a
design, even with no evidence of developing deterioration, is identified in
a special category (3 in the table) and requires increased surveillance. Transformers that are
deteriorating are assessed by their condition and failure mode into categories 1 and 2.
• It allows clear identification of the dominant failure mode but, as developed, it related only to the
transformer tank, core and windings, where any deterioration would likely lead to a replacement
unit. Component parts such as cooling, oil, OLTC, bushings and control systems could be
refurbished or replaced independently of the main tank and are not included in this AHI.
• Assessment uses a base 3 logarithmic scale, with no weighting applied to the
condition assessments, in order to allow a clear identification of the weak link.
• Whenever a transformer is being scrapped (for whatever reason), the company engages
specialist design engineers and forensic scientists to study the extent of deterioration, confirming
both the selected failure modes and the AHI ascribed prior to scrapping. This allows an audit of
the process as well as ensuring it evolves with more knowledge. The results of this audit were
described in the 2014 publication [B14].
This system has been used in the UK since the mid-1990s and was initially used by specialist transformer
engineers based within the utility’s Technology Division. After the closure of this unit in 2002, this
specialist engineering work transferred to an outsourced service provider. At this point the
methodology evolved beyond life-limiting failure modes to include all failure modes where failure could
prevent the transformer from meeting its performance requirement. The outcomes reflect likelihood of
failure and the impact on the score if recommended remediation is then carried out. The
service provider has since used this method in other utilities in the USA, Africa and the Middle East.
D.2. USA TSO

To demonstrate the approach described above a further Paris paper was prepared for the 2018
session to explain the process and how the UK TSO system was applied within the USA by the same
utility and service provider [B15]. This paper provided greater detail of the methodology, particularly on
the choice of data inputs and why using a logarithmic or exponential method with no weighting was
preferred. The examples reproduced in Chapter 3 of this TB are taken from this paper. It remains
transformer specific, but one important difference from reference [B14] is that the focus is on
production of a coding that is more than a justification to a regulator for capital replacement: it has
evolved into an AHI with all categories relating to the likelihood of failure. A second difference is that all
sub-components and functions are now included. The authors stress that any code produced needs to
have a timescale for corrective action associated with it. The action in turn is required to address the
problem that the AHI is designed to solve. The timescale needs to be appropriate to the action and the
problem.
To conclude, the guiding principle identified here is that an AHI must:
• Be calibrated such that all equivalent indices have the same timescale for action and thus the same
sense of urgency, and
• Be monotonic, meaning that worse indices require more urgent action. The condition is unlikely
to improve with time.
D.3. OEM – International group of transformer experts

Several authors from this major OEM, which has locations throughout the world, presented a useful paper at
the 2016 session of CIGRE [B43]. They begin by pointing out the confusion that arises when
condition assessments, each expressed as a linear score, are added together, with or without weighting.
The resulting masking of a bad assessment result justifies their concern. In line with
references [B14] and [B15], and this TB, their advice is not to use a methodology that adds
linear scores or uses weighting.
Figure D.1 shows their preferred approach, which is traditional FMEA, linking diagnostic data to the
component and then to the function. The paper is conceptual, describing a system rather than
experience from years of application. The methodology is still based on a weighted sum of parameters,
but applied only to individual components (say, the OLTC) for the root cause of each of the different
failure modes. The score per function is then aggregated and translated into probability of failure statistics
utilizing OEM databases, specific user experience and global surveys. This system may well allow
units with a risk of failing to be correctly identified, but it is not clear how a fleet-based register
reflecting all shades of likelihood can be constructed.
Figure D.1 – An OEM’s RCM approach [B43]
D.4. A2 Brochure TB 761 Condition assessment of power transformers

This brochure was produced by SC A2 [B3]. Its aim was to develop methodologies to derive various
Transformer Assessment Indices. It notes that these differ from a ‘Health Index’ since a single list
does not prioritise particular actions, i.e. replace, repair or refurbish. In many cases some “unhealthy”
transformers can be (relatively) easily repaired and therefore do not need to be replaced. The aim is
to create a set of prioritised lists, one for each type of intervention. There are many similarities between
the TB 761 approach and that described in the current TB.
Like the work of WG B3.48, TB 761 identifies the roadmap:
1. Determine the purpose of the Transformer Assessment Score and Index.
2. Identify the failure modes to be included.
3. Determine how each failure mode will be assessed.
4. Design a calibrated system for categorising failure modes (scoring matrix).
5. Calculate a score for each transformer.
In this way it does not focus on age or statistically derived lifetimes. It starts with the
identification of the varied dominant failure modes, their onset and their diagnostic
indicators within a fleet, considering the main unit and each of the accessories in so far as their failure
impacts upon the whole unit. It expresses very similar advice on scoring, aggregation and weighting of
scores to that described in Chapter 2 of this TB and reference [B14].
They provide a detailed failure mode tabulation broken down by active part, bushings, tap changer
and cooling systems. They identify relevant diagnostics and, in the appendix, identify the relationship
between measured values and the score categories. This is based on knowledge from standards,
although it appears not to consider the importance of rates of change of measured values.
Table D.2 – AHI scoring used in TB 761 [B3]
They use a linear scoring system but recognise that simply adding scores hides damage. They prefer
to note all assessments with the number of aspects that score in each of the bands, as per Figure D.2.
Within any tabulation this score would translate to be 035310 and would be ranked higher than others
with scores 026310 or 04410 for example. This is acceptable because each category is calibrated in
terms of having the same time scales for activity to take place.
Figure D.2 – Scoring matrix from TB 761 [B3]
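The counting approach described above can be sketched as follows. This is an illustration under stated assumptions, not the TB 761 procedure: the six band labels, their ordering from most severe (left) to least severe (right) and the two example transformers are hypothetical. Ranking compares the tuples of counts position by position, which avoids the ambiguity of a concatenated string when a count reaches 10 or more.

```python
from collections import Counter

# Assumption: six bands, listed from most severe (left) to least severe (right).
BANDS = ["F", "E", "D", "C", "B", "A"]  # hypothetical labels


def band_counts(aspect_bands):
    """Tuple of how many assessed aspects fall in each band, most severe first."""
    counts = Counter(aspect_bands)
    return tuple(counts.get(b, 0) for b in BANDS)


def score_string(counts):
    """Concatenate the counts into a string such as '035310'."""
    return "".join(str(c) for c in counts)


# Two hypothetical transformers, each with twelve assessed aspects.
fleet = {
    "T1": ["E"] * 3 + ["D"] * 5 + ["C"] * 3 + ["B"],
    "T2": ["E"] * 2 + ["D"] * 6 + ["C"] * 3 + ["B"],
}
ranked = sorted(fleet, key=lambda name: band_counts(fleet[name]), reverse=True)
for name in ranked:
    print(name, score_string(band_counts(fleet[name])))
```

Under these assumptions the unit scored 035310 is listed ahead of the unit scored 026310, consistent with the ranking described above.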
APPENDIX E. COLLABORATIVE DEVELOPMENTS

E.1. UK DNO
In order to meet the licence conditions for 2015-23, the UK Regulator for Gas and Electricity Markets
(OFGEM) initiated a working group from the Distribution Network Operators (DNOs). Its aim was to
have a common framework to produce asset health indices covering all asset classes. This included
common definitions, principles and calculation methodologies for use within the six DNO groups and for
adoption across all UK Distribution Network Operators in the assessment, forecasting and regulatory
reporting of Asset Risk. After a series of drafts, version 1.1 was issued in 2017 and is freely available
as a web download [B44]. As such, this common methodology is a significant development for use
across a large network and different suppliers.
It has as its starting point an asset register that identifies a generic “expected asset lifetime” for each
design and voltage class of breakers, transformers, lines, cables, batteries etc. This is in the range
60-100 years. The next step is to identify the current age of every asset as a fraction of this expected
lifetime. This lifetime is then modified using a series of multipliers, firstly for location (distance
from saline and other pollution sources) and then for duty. The modified age is then modified again
with results from a selection of diagnostic indicators. The resulting system, shown in Figure E.1, is
particularly challenging and for this reason justifies some space in this brochure.
Figure E.1 – DNO Methodology Derivation of PoF [B44]
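The sequence of steps described above can be pictured with a deliberately simplified sketch. It does not reproduce the formulas of [B44]: the function name, the multiplier values and the way the multipliers are combined are assumptions made purely to illustrate how an age-based starting point carries through to the result.

```python
def modified_age_fraction(age_years, expected_life_years,
                          location_factor=1.0, duty_factor=1.0,
                          condition_factor=1.0):
    """Age as a fraction of expected life, scaled by hypothetical multipliers."""
    if expected_life_years <= 0:
        raise ValueError("expected life must be positive")
    base = age_years / expected_life_years
    return base * location_factor * duty_factor * condition_factor


# Two hypothetical assets with identical location and diagnostics but different ages.
young = modified_age_fraction(10, 60, location_factor=1.2, condition_factor=1.5)
old = modified_age_fraction(50, 60, location_factor=1.2, condition_factor=1.5)
print(f"young: {young:.2f}, old: {old:.2f}")
```

In this simplified form the two results differ only because of age, which illustrates how strongly an age-based starting point can shape the outcome of such a scheme.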

This OFGEM DNO methodology is substantial and may be seen by some as convoluted. The
document runs to 195 pages, with many different asset types covered. It stands in extreme contrast to
methodologies described elsewhere, including the CIGRE references [B14], [B15] and [B43]. It makes
significant use of weighted scores, with algorithms and rules embedded in its process. This
approach is unlikely to give an understanding of the meaning behind the final answer or to
identify a timescale in which to react to the possible outcomes. It does, however, provide a
way to view assets consistently across multiple organisations, and this is beneficial in terms of making
comparisons between DNOs. The complexity of the system and the over-reliance upon asset age may
make the system difficult to implement effectively and, by its very nature, make it likely that users do not
question the data they feed into the system or the outcomes. An illusion of accuracy and precision
may result. Caution is necessary when applying such systems to ensure that interpretations are valid
and meaningful. It is essentially an example of a black box approach built upon the premise that a clear
link exists between age and likelihood of failure. It also uses diagnostic data outside the
context of their relationship to failure modes. It will produce significantly different outcomes from the
methods in the CIGRE publications described in APPENDIX D.
APPENDIX F. UTILITY DEVELOPMENTS

F.1. Canadian TSO
This AHI work [B45], [B46] takes a strategic approach and, like the UK approach in reference [B14],
provides a common basis for prioritising and justifying their investment strategies for a wide range of
transmission assets. It incorporates the effect of in-service maintenance and refurbishments and has
been used since 1999. Whilst recognising that deterioration is usually associated with age, they clearly
state that this does not imply that older assets are more likely to fail in any time period. Recent data
from inspections, survey and other tests are used where possible. Each asset is categorized based on
risk of failure. Using RCM thinking it recognises that dominant failure modes should be identified for
each asset class, and perhaps also segregated in terms of OEM. Having identified these dominant
failure modes for the asset class the scores are weighted accordingly when calculating the AHI. They
note that “for example, a dominant condition factor may reduce the overall health of an asset by as
much as 50%. The condition rating numbers are multiplied by the assigned weights to compute
weighted scores for each asset. The weighted scores are then totalled for each asset” to yield the final
Health Indices. Importantly, the activity goes further to provide aggregated scores for systems
containing these assets. Again, a weighted system is used, on a 1-100 scale. It will have some of
the limitations of weighting mentioned before, partly mitigated by a clear link to the RCM-established
dominant modes specific to the asset group.
It is stated in the review that a health index should have the following properties:
• Indicative – Must indicate the overall health of the asset.
• Objective – The index must be verifiable to industry standards, observations and likelihood of failure (LoF).
• Simple – Should be easy to use.
The problem with such a weighted system is the difficulty of relating AHI and LoF. Furthermore, as
discussed earlier, a weighted aggregated AHI may actually mask assets in poor condition and
may not be a good indication of overall health.
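The masking effect can be illustrated with a short sketch. The condition factors, weights and ratings below are hypothetical and are not taken from [B45] or [B46]. The calculation follows the description above: ratings are multiplied by their weights and totalled, here normalised back to the same 1-100 scale.

```python
def weighted_health_index(ratings, weights):
    """Weighted sum of condition ratings, normalised to the rating scale."""
    total_weight = sum(weights.values())
    return sum(ratings[k] * weights[k] for k in weights) / total_weight


# Hypothetical condition factors, each rated 1 (worst) to 100 (best).
weights = {"windings": 4, "bushings": 2, "tap_changer": 2, "oil": 1, "cooling": 1}
healthy = {"windings": 90, "bushings": 85, "tap_changer": 90, "oil": 95, "cooling": 90}
one_bad = dict(healthy, bushings=10)  # one component close to failure

for name, unit in (("healthy", healthy), ("one_bad", one_bad)):
    print(f"{name}: index {weighted_health_index(unit, weights):.1f}, "
          f"worst factor {min(unit.values())}")
```

In this example the unit with a bushing rated 10 still returns an index of 74.5 against 89.5 for the healthy unit, so the urgency attached to the failing component is not visible in the aggregate.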
F.2. USA Utility

To reduce in-service failures this company began a comprehensive Transformer Condition
Assessment program in the early 2000s [B47]. Their starting
point was to identify those units which were performing worse, by comparing families and multiple
indicators. With an RCM approach and extensive inter-company discussion, it involved extensive
condition assessment based upon:
• Historical review of test data, operating conditions, etc.
• Visual inspection
• OLTC assessment (DGA/GOQ/number of operations/load/runaway/mechanism/design)
• DGA (Main Tank and OLTC compartments)
• Comprehensive oil quality tests (Main Tank and OLTC compartments)
• Total combustible gas tests of gas blanket
• Partial discharge tests of MT and LTC compartments
• Airborne corona detection (bushings)
• Ultrasonic leak detection (sealed tanks)
• IR scans for Main Tank and OLTC compartments
• Cooling system checks (automatic controls, flow, temperature differences)
• OLTC smooth-rise test
• Vibration analysis and sound level tests
Data were assessed according to standards, experience and relevance. On this basis their fleet of
nearly 600 transformers was allocated into five classes ranging from those in need of more
immediate intervention to those operating as expected. There is no asset health index with scoring: the
method simply uses diagnostic data to categorise the units into these five bands. They believe their
assessment has proven to be 96% correct. There is, however, no formal relationship identified
between coding and Likelihood of Failure (LoF).
F.3. Indian Power System

Kini et al [B48] described an interesting AHI approach for transformers in an Indian network. The
approach has clear aims, which are to manage transformers by ranking them to justify replacement,
maintenance, etc. and monitor the population over time.
A first tier of assessment is used for the whole fleet. A second tier is for those units requiring further
investigative analysis. There are over 30 different parameters of assessment, each scored 1-4 and
having a weighting as identified in Table F.1. The asset age score, which has a high weighting, is
important in this approach.
Table F.1 – Weighting factors for diagnostics
The final score is the sum of the weighted values. This is not unusual and is open to dilution effects as
described earlier. To mitigate this, although the normal scores are 1 (poor) to 4 (very good), there is a
fifth “special” category for units assessed to be “very poor” which has a score 0.01. This introduces an
element of extreme logarithmic scoring and so over-rules the scoring and weighting used in their
standard system. The AHI (or THI as authors refer to it) is developed based on a 1-100 system, similar
to others in the industry. Once sub-component weights are identified, major component analysis takes
place.
Figure F.1 is a flowchart for the THI. It shows that the process starts with the analysis of basic, or
Tier 1, factors. Manually assessed factors, described as Tier 2, are used to adjust the final THI.
Figure F.1 – Flowchart for Indian utility

Note that the Adjusting Factor includes expert assessment of operational data. The diagnostic factors
are just that: they are not directly linked to the developing effects of a particular failure mode. The
approach is similar to the DNO method described earlier.
Health indices are given on a 1-100 scale and placed into five classes. A qualitative assessment of the
PoF is also provided with an estimate of the expected life for those in the five classes. This ranges
from more than 15 years to having reached end of life.
Their process of developing the AHI is formal, but there are areas where judgment and heuristics
apply. The weighted system focuses on known failure modes and related diagnostics. The results
have estimated timescales that provide a basis for a loose LoF calculation. But the weighting
approach and the large number of parameters mean that the relation is difficult to calibrate and
evaluate, and any sense of urgency is lost.
F.4. Japanese Utility

At CIGRE Paris, August 2016, Kobayashi et al. presented a poster on the latest work on AHI in Japan
[B49]. As investments rise to cope with increasing power demands, the authors note the challenges
and benefits of an AHI as one of four key areas of focus. They also focus on:
 Consolidation of failure data
 Predicted failure rates (with respect to economic life)
 Substations as a complete system
An AHI for 66 kV transformers has been developed. It is a mix of conditions and consequences, which
makes it more suited to risk analysis. The problem is the intermingling of possible condition factors
such as rising DGA or poor tap changer condition, with criticality factors such as presence of PCBs in
oil entailing a more substantial clean up. The system does not seem to be calibrated and has different
scores possible for different contributing factors. DGA may be up to 10 points, while winding
configuration may be allowed to score up to 47 points.
The calibration of scales is not stated. The system is proposed to allow ranking for replacement, and
the index developed is an indication of where to look. At present, we believe there would be little
chance of developing a relationship between the AHI and LoF.
F.5. Transmission power lines in Africa

This paper [B49] showed the development of an AHI following a formulaic approach, a weighted set of
inputs and a 1-100 score. The authors assume a bathtub curve for asset ageing and failure rate. The
key parameters are evaluated using expert analysis or reference values and are weighted as per
Table F.2.
Table F.2 – Example of a transmission line evaluation scoring method
LoF is discussed only qualitatively and the relation between the AHI and LoF is neither described nor
identified.
The author described a case study on transmission line AHIs
focusing on assets in one district. Four categories are defined, based on assessed imminence of
failure and the AHI developed previously:
 CR1 is a condition in which there is no detectable or measurable deterioration and no increased
probability of failure.
 CR2 is where there is evidence of deterioration that is considered to be normal ageing and has
no significant effect on the probability of failure.
 CR3 is a condition where there is significant deterioration that increases the probability of failure
in the short to medium term.
 CR4 represents severe degradation and indicates an immediate, significant increase in the
probability of failure.
The AHI is derived from a number of parameters, as shown in Figure F.2.
Figure F.2 – AHI derivation

The authors also look at the distribution of AHI scores and define further categories as shown in
Figure F.3.
Figure F.3 – AHI Distribution

A qualitative indication of LoF is also given, as per Table F.3.
Table F.3 – AHI and Probability of failure
The relationship described is monotonic, indicating that the lower the AHI, the more likely a unit is to
fail. The AHI is used to generate a LoF, and the LoF is used to drive action.
The problem with weighted systems will again arise, and some units with a higher AHI may actually
have an increased likelihood of failure. In addition, the paper states that LoF will rise with age, which is
contrary to what has been noted by many other investigations and analyses. However, this may be true
depending on failure modes for lines where the condition can be improved via maintenance, e.g.
painting.
APPENDIX G. WG members experiences

Experiences of WG members are shared in this brochure and the content does not relate to existing
publications.
Many have realised some of the basic issues facing development of AHI. The first important step to be
addressed is an agreement as to its purpose. It may be for internal use to document a process for
prioritising tasks such as maintenance or replacement. Similarly, it may be to indicate likelihood of in-
service failure and so to plan the response measures, through replacement or repair, to organise
replacements or spare parts, and to set up risk management (exclusion) zones. It may be a system to
identify and justify expenditure for a government Regulator when undertaking rate reviews. Different
purposes will lead to a set of differently prioritised lists and different AHI.
Several have started by relating time in service against a perception of a generic life for the asset
class. The latter is then reduced by factors such as application and condition indicators and so, by
decreasing the effective lifetime, increasing the proportion of life used. This approach remains in the
traditional time-based mind set. Many, however, start from the opposite viewpoint, recognising that
assets fail for a variety of causes and it is better to recognise and build around failure modes as the
starting point. The onset of a failure mode re-sets the remaining lifetime. The need is then to identify
these modes, their causes and apply diagnostic indicators which then assess the future time frame to
failure. This reduces the influence of age factors and the duration of previous service-life. The latter is
the application of condition-based decision making and FMEA approaches with condition assessment
linked to each of those various failure modes. The challenge is to establish how diagnostic outcomes
relate to the likelihood of failure.
Scoring the likelihood has been an issue, particularly when it came to aggregating a set of scores for a
number of failure modes. A common trap was to aggregate in such a way that input data with a
significantly bad assessment score are averaged out. A second problem is where some include
factors such as age, design quality and application in the AHI score as if they were indicators of failure
modes themselves. They are only hazards that can reduce lifetime and increase the likelihood of a
particular failure mode dominating.
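The averaging trap can be illustrated with a short sketch. The scores are hypothetical and assume a linear scale from 1 (good) to 5 (critical); the function names are illustrative only and do not correspond to any method described by WG members.

```python
from statistics import mean

def averaged_index(scores):
    """Plain average: one bad score is diluted by many good ones."""
    return mean(scores)

def worst_case_index(scores):
    """Worst-case aggregation: the most severe score is carried forward."""
    return max(scores)

# Hypothetical failure-mode scores for one asset: 1 = good, 5 = critical.
scores = [1, 1, 1, 1, 5]

print(averaged_index(scores))    # 1.8, the critical score is hidden
print(worst_case_index(scores))  # 5, the critical score remains visible
```

With four good scores and one critical score, the average suggests a healthy asset while the worst-case value keeps the critical finding visible.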
G.1. UK and USA TSO together with collaborating service provider

Engineers involved in the development and application of methods described in references [B14] and
[B15] were involved in the group.
G.2. Belgian TSO

This AHI method is based on test results that have an impact on condition. They also have a service
level index looking at spares, knowledge etc.
As with the UK DNO approach, they start with a conceptual life duration for each asset class. The
actual age is modified by condition to derive an equivalent age.
This is expressed as:
\[ \text{effective age} = \frac{\text{equivalent age}}{\text{conceptual life}} \times 100\% \]
They use linear scoring and weighting to reach the equivalent age, as per Figure G.1, and all results
are then converted to a 5-point ranking shown in Table G.1. The data in Figure G.1 are only one example
of the use of diagnostic data. IEEE C57.104 categorises DGA data: C2H6 values of less than 50 ppm
are classed as condition 1 (the best), while a leap to 400 ppm falls in the worst category, condition 4.
The example gives a good demonstration of the logic. The 400 ppm value is indicative of some event
causing the rapid increase, and it is likely the unit now has a more immediate risk of failure than earlier.
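As a minimal sketch of the effective age expression, the function below divides an equivalent age by the conceptual life of the asset class. The ages used are hypothetical, and the way the TSO derives the equivalent age from diagnostic data is not reproduced here.

```python
def effective_age_percent(equivalent_age_years, conceptual_life_years):
    """Effective age as a percentage of the conceptual life of the asset class."""
    if conceptual_life_years <= 0:
        raise ValueError("conceptual life must be positive")
    return equivalent_age_years / conceptual_life_years * 100.0

# Hypothetical transformer: condition data give an equivalent age of 30 years
# against an assumed conceptual life of 40 years.
print(effective_age_percent(30, 40))  # 75.0
```

A step change in a diagnostic result, such as the DGA example above, would raise the equivalent age and therefore the effective age, even though the calendar age is unchanged.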
Figure G.1 – Example with a step change in apparent age

Table G.1 – Final ranking
G.3. Dutch TSO

The AHI for a system > 110 kV began with an age-related index. However, substations and tasks were
assigned inspections at 3-, 6-, 9- or 12-year intervals, and over time it has been possible to replace age
with condition data to be used for condition-based maintenance, starting with circuit breakers.
Condition is expressed relative to hazard rate or expected remaining life. The index is converted to
four categories: good (normal maintenance), sufficient (needs additional maintenance), moderate
(risk-based mitigation) and insufficient (refurbish or replace). The analysis will then be used with
risk-based assessment. At this early stage of development, the AHI is used not as an end point
but as a stepping stone to:
 Benchmarking reliability and availability analyses
 Predicting failure modes and developing strategies for maintenance
 Assisting in drafting investment plans
G.4. Dutch service provider

There is an established two-part methodology involving condition and risk, which the company has
been applying for international clients. Three approaches to assessment are made, for utilisation,
statistics and condition, and the worst-case one is carried forward, as shown in Figure G.2. The
analysis is displayed with a colour code and a shading reflecting the confidence level of the input data
or analysis.
Figure G.2 – AHI assessments
G.5. German TSO

This company evaluates both condition and importance factors within an RCM context and derives a
maintenance programme. Each asset is assessed separately and compared within the asset class.
The collection of criteria includes asset register information relating to age, technology, and location of
installation. The maintenance staff make periodic inspections, visual observations and assessments.
Furthermore, the evaluation includes the operational experience and the existing technical and
economic know-how of the system operator. The consequence is that the total criteria can be
classified in two groups:
Technical condition
 Age
 Type of equipment
 Switching operations per year
 Measurements
 Damages
 Hazard rate
 Operating loss
Strategic condition
 Service know-how
 Spare parts
 Experiences
 Number of components in the grid
While the first group describes the actual condition of the installations, which can be expressed, for
example, by the hazard rate, the second group describes a strategic condition that leads to an artificial
aging of the installation. This has no effect on the failure rate of the installation. The
consequence is that the review can be used as an investment control to replace individual devices or
groups earlier, for example to force a change in technology or implement a new overall strategy. In
this case, the assessment would "artificially" assign a poor condition to the asset so that
a replacement occurs earlier.
In the example shown in Table G.2, the final condition index c is calculated from the individual
assessments as a relative weighted sum. The following rule is applied for the
considered equipment: the larger the calculated index c, the worse the condition of the equipment. For
c = 0, the equipment is in excellent condition; for c = 100, it is in very poor condition.
Table G.2 – Criteria for the condition assessment of equipment in case of a circuit-breaker
Criteria                        Scale S    Assessment   Weighting

Age (years)                     < 20       1            5
                                20 – 25    3
                                26 – 30    5
                                31 – 35    7
                                36 – 40    9
                                > 40       10
Switching operations per year   Normal     1            3
                                Medium     6
                                High       10
Service know-how                Good       1            4
                                Medium     6
                                Poor       10
Results of measurements         Normal     1            10
                                Medium     6
                                Poor       10
Total condition c (Score)
In general, an assessment of equipment may consist of many criteria. A single poor
assessment can be fully compensated by other assessments, so that this poor rating is not noticeable
in the overall result. To solve this problem, an alert should be raised if a threshold is exceeded so that
immediate maintenance actions can be carried out.
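The weighted sum and the threshold alert can be sketched as follows. The weightings are those of Table G.2, but the assessment scores are a hypothetical circuit-breaker, and both the normalisation to a 0-100 scale (dividing by the maximum score of 10 times the total weighting) and the alert threshold are assumptions, since the exact rule is not stated here.

```python
MAX_SCORE = 10  # highest assessment value in Table G.2

def condition_index(assessments):
    """Relative weighted sum scaled towards 0..100 (assumed normalisation)."""
    total_weight = sum(weight for _, weight in assessments.values())
    weighted = sum(score * weight for score, weight in assessments.values())
    return 100.0 * weighted / (MAX_SCORE * total_weight)

def flagged_criteria(assessments, threshold=10):
    """Criteria whose individual score reaches the (assumed) alert threshold."""
    return [name for name, (score, _) in assessments.items() if score >= threshold]

# Hypothetical circuit-breaker: criterion -> (assessment score, weighting).
breaker = {
    "age": (1, 5),
    "switching operations per year": (1, 3),
    "service know-how": (1, 4),
    "results of measurements": (10, 10),
}

print(round(condition_index(breaker), 1))  # 50.9
print(flagged_criteria(breaker))           # ['results of measurements']
```

The index of about 51 looks moderate although the measurement results are rated as poor as possible, which is the compensation effect described above; the separate flag keeps that single poor rating visible.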
G.6. German DSO

The objective of the project is to enable DSOs to reduce maintenance costs and improve their grid
quality. It gives answers to several questions classified into four layers.
The project includes two main topics with three use cases. The first subproject, Diagnostics
110-kV-Transformers, can be assigned to the subject area of predictive maintenance. It contains two
use cases. The subject area of outage management will be researched in more detail in the second
subproject, Prediction of Outages in Cable Systems. This section outlines some first-stage results of
the first subproject, which was completed in 2017.
The subproject's main aim is to use load profiles and gas-in-oil analysis cycles as indicators to assess
the transformers' condition, which requires a suitable algorithm. The health index
algorithm is able to estimate the technical age of a transformer and makes it possible to prolong its
remaining useful life.
The algorithm forecasts the gas-in-oil quality for approximately 1300 transformers. The following list
highlights the most important results:
 Despite a reduced amount of data, a significant result could be obtained.
 The algorithms classify all transformers by criticality with regard to a future outage and
tolerate missing historical data.
 In contrast to today’s situation, the algorithms allow prioritisation of the most critical
transformers.
 The major advantages of the approximation are that the oil measurements of transformers can be
reduced extensively and the “biological” age can be determined accurately.
The AHI algorithm brings new benefits for the operational business. The number of oil
inspections can be reduced dramatically. The investigation of 50 % of transformers revealed the
possibility to predict the failure of 85 % of the critical transformers. It is therefore possible to reduce the
operational costs substantially. The second stage of this pilot project has already
started. Moreover, the described strategy will be continued with an expanded database (local
transformer data). The user interfaces will be further developed to enhance the usability of the
algorithms.
G.7. German OEM

There are two systems provided.
The aim is to produce a number of health indices, for maintenance, refurbishment and replacement.
 For the maintenance HI, reversible parameters are considered.
 For refurbishment, parameters that are not reversible and where the rate of aging is slow are
considered.
 For replacement, even non-technical parameters are considered, such as spares availability,
staff competences, etc.
SAFE (Standard Audit for Energy)

The first tool is static and based on an audit procedure considering results of manual inspections (guided
by questionnaires, touchpad or printed out) and results of on-site measurements.
Figure G.3 – Displaying the AHI result

The other one is a fully computerized dynamic system.
Generally, the analysis involves assessing technical condition, weight of relevance, a quality
weighting, environmental impact and a maintenance index. They do not consider that age relates
strongly to probability of failure. They use a mix of log and linear scoring, and weighting. Outputs are
printed in an automatically generated report giving status and even recommendations for service and
maintenance, and are shown in a spider diagram.
RCAM Dynamic (Reliability Centred Asset Management)
Physical values of the condition parameters selected as relevant for an asset health index are graded
on a scale from 0 to 10:
 Very good (HI 0-2)
 Good (HI > 2-4)
 Fair (HI > 4-6)
 Poor (HI > 6-8)
 Very poor (HI > 8-10)
The measured values may come from either monitoring systems or measurements.
An OEM-invented weighting system for condition parameter aggregation is in use. It is called the
weighted probability average (WPA), which in general weights worse values exponentially higher than
good ones. The exponential function is equivalent to the probability of a failure.
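The OEM's WPA formula is not given in this brochure. The sketch below only illustrates the general idea of weighting worse values exponentially higher: it assumes weights of the form exp(k × HI) with an arbitrary k = 0.5, and applies the five grades listed above. The parameter values are hypothetical.

```python
import math

def weighted_probability_average(health_indices, k=0.5):
    """Average of 0-10 health indices with assumed weights exp(k * HI).

    Worse (higher) values receive exponentially larger weights, so they
    dominate the aggregate. The functional form and k are assumptions.
    """
    weights = [math.exp(k * hi) for hi in health_indices]
    return sum(w * hi for w, hi in zip(weights, health_indices)) / sum(weights)

def grade(hi):
    """Map a 0-10 health index to the five grades listed above."""
    if hi <= 2:
        return "Very good"
    if hi <= 4:
        return "Good"
    if hi <= 6:
        return "Fair"
    if hi <= 8:
        return "Poor"
    return "Very poor"

# Hypothetical condition parameters: three good values and one very poor one.
parameters = [1, 1, 1, 9]
plain_mean = sum(parameters) / len(parameters)
wpa = weighted_probability_average(parameters)

print(grade(plain_mean), "/", grade(wpa))  # Good / Very poor
```

Under these assumptions the plain mean of 3.0 would grade the asset as good, while the exponentially weighted value of roughly 8.6 is driven by the single very poor parameter.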
Figure G.4 – Graphical representation of WPA-Method

An environmental factor is used to amplify the ageing factors, forcing a faster (>1) or
slower (<1) ageing forecast for an asset according to the environmental conditions in which it is
operated.
The result is the aggregated, current health index of an asset or a selected sample of assets.
Figure G.5 – Health Index representation in RCAM Dynamic

The RCAM methodology aims to provide actionable recommendations for:
1. Alarm-based maintenance
2. Condition-based maintenance with modified time intervals according to reliability targets
3. Forecasting strategic measures, namely refurbishment and replacement (end of life)
Figure G.6 – RCAM Methodology overview
G.8. Estonian utility

The experience relates to overhead lines in 110, 220 and 330 kV networks. Much was installed
between 1960 and 1990, with an expected service lifetime of 50 years. There are three types of
assessment:
 Visual
 Visual with climbing
 Visual with measurements
Criteria scores are in the range 1-5 and prioritised by weighting:

 Foundations – 22%
 Support – 22%
 Crossbar – 4%
 Guy-wire – 2%
 Insulation – 10%
 Conductor – 35%
 Earthing wire – 2.5%
 Earthing system – 2.5%
This is then combined with a risk analysis.

Having established that the AHI is in the range 1-5, this is related to a “new” lifetime:
\[ \text{New lifetime (years)} = 50\ \text{years} \times \frac{5 - AHI}{5} \]
Clearly, if the AHI was 3 then the new lifetime would be
\[ 50 \times \frac{2}{5} = 20\ \text{years} \]
This would prioritise the need to act in 10-year windows.
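A minimal sketch of this calculation is given below. The weightings are those listed above; combining the criteria scores as a weighted sum is an assumption about how the weighting is applied, and the example scores are hypothetical.

```python
# Weightings listed above for the overhead line criteria (they sum to 100 %).
WEIGHTS = {
    "foundations": 0.22,
    "support": 0.22,
    "crossbar": 0.04,
    "guy_wire": 0.02,
    "insulation": 0.10,
    "conductor": 0.35,
    "earthing_wire": 0.025,
    "earthing_system": 0.025,
}

def line_ahi(scores):
    """Weighted sum of criteria scores in the range 1-5 (assumed aggregation)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def new_lifetime_years(ahi, design_life_years=50):
    """New lifetime = design life x (5 - AHI) / 5, as in the expression above."""
    return design_life_years * (5 - ahi) / 5

# Hypothetical line where every criterion is scored 3.
scores = {name: 3 for name in WEIGHTS}
print(round(line_ahi(scores), 2))  # 3.0
print(new_lifetime_years(3))       # 20.0
```

Because the conductor carries 35 % of the weighting, a poor conductor score shortens the new lifetime far more than a poor guy-wire score.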
G.9. Japanese OEM

An example was described for 66 kV transformers. They used a linear score (1-10) for DGA, (1-6) for
PCB, (1-20) for oil leaks, (1-47) for defects and (0-5) for OLTC. Age was scored at 1 point per year and
so will dominate many condition-based scores. The result would then be used to evaluate future life,
refurbishment, replacement etc.
Table G.3 – Example of a 66kV transformer assessment
Index                           Points   Notes
Age                             n        1 point/operating year
Dissolved gas analysis in oil   10       Over Alarm level II
Trace PCB in insulating oil     6        Exceeding defined limits
Oil leakage from transformer    5        Leakage from one area
                                10       Leakage from several areas
                                20       High loss transformer using Hycar cork for gasket, with leakage from several areas
Specific equipment              47       Winding configuration with high probability of lightning-caused failure
                                5        Risk of core corner damage due to aging
                                10       RIP bushing of Company A (risk of insulation breakdown due to aging)
                                3        RIP bushing of Company B (risk of insulation breakdown due to aging)
LTC with no replacements        5        Transformer using LTC with no replacements
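The effect of scoring age at 1 point per year can be seen in a short sketch. It assumes that the total is a simple sum of the points in Table G.3; the two transformers are hypothetical.

```python
def transformer_points(age_years, condition_points):
    """Total points: 1 point per operating year plus the condition points (assumed sum)."""
    return age_years + sum(condition_points)

# Hypothetical 40-year-old unit with no findings.
old_healthy = transformer_points(40, [])

# Hypothetical 10-year-old unit with DGA over Alarm level II (10 points)
# and leakage from several areas (10 points).
young_defective = transformer_points(10, [10, 10])

print(old_healthy, young_defective)  # 40 30
```

In this example the older unit without any findings scores higher than the younger unit with two active findings, which shows how age can dominate the condition-based points.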
G.10. Japanese Utility

In Japan there is considerable collaboration between utilities, OEMs and universities, often taking
place within the Electric Technology Research Association. Work is underway to develop
methodologies like Figure G.7.
Figure G.7 – Methodology developed in Japan

An example of use is shown in Figure G.8.
Figure G.8 – Circuit breaker example from Japan

As an example, the methodology to calculate health indices was applied as a trial demonstration to
more than 4000 gas circuit breakers (GCBs) in Japan, rated 22-500 kV and from various OEMs.
Figure G.9 shows the number of units for each health index value. Note that the total number of units
of each age varies: certain generations have a large number of units and others do not. From
Figure G.9, the younger GCBs tend to have low health indices of around 10 %, whereas older ones
tend to have health indices of 20 % and higher. Overall, most GCBs have health indices
lower than 20 %, indicating a roughly good condition to operate. However, there are relatively young
GCBs which have high health indices, and vice versa, urging the utility to implement some interventions
to correct the situation. This kind of estimation using the health index calculation is particularly
important to establish and improve maintenance/replacement policy.
Figure G.9 – Number of units for each health index value

Figure G.10 shows the relationship between health indices and age of GCBs. Clearly, age is not
the only factor which influences the GCBs’ condition, which means a PoF calculation based only on the
age of GCBs is unlikely to identify trends. Interestingly, some old GCBs appear to be in good condition
for their age and vice versa. This figure is also useful for creating maintenance/replacement policy by
providing clear prioritization that takes into account GCBs’ performance. Furthermore, an equivalent
age can be calculated using a standard curve relating the health index to
age, which provides a simple criterion for asset management decision making.
Figure G.10 – Health index distribution against age
G.11. Russian service provider

This group has developed a draft industry standard currently under review by the Russian Ministry of
Energy. It is a procedure to evaluate the technical condition of major assets in the power system. This
includes electrical and mechanical equipment as well as lines and cables.
The results of the evaluation are both quantitative values (a health index scored from 0 to 100), and
qualitative (from "critical" to "excellent").
For each asset the diagnostic portfolio is determined together with values of acceptability. The range
of operating values is then divided into 5 sub-bands, one of which is assigned to each individual
assessment:
 "4" – There is no deviation of the measured parameters from the requirements of the normative and
technical and/or design (project) documentation; the equipment complies with the required
functions in full.
 "3" – Measured parameters are within the values defined by the normative and technical
documentation and/or design (project) documentation, but there is a tendency towards deterioration
in the values of such parameters.
 "2" – Measured parameters are within the values defined by the normative and technical
documentation and/or design (project) documentation, but there is a threat of failures
occurring.
 "1" – Measured parameters are at the level of the maximum permissible values defined by the
normative and technical documentation and/or design (project) documentation; the equipment
does not fully comply with the required functions.
 "0" – Measured parameters are outside the maximum permissible values defined by the
normative and technical documentation and/or design (project) documentation.
Each parameter is assigned to a design function (insulation, active part, magnetic, OLTC, bushings,
etc.). Each parameter is given a weighting for its importance to the assessment of the function, with
the weightings summing to 1.
The score 0-4 is expressed as a fraction of 4 and then converted to a percentage by:
\[ 100 \times \text{weighting} \times \frac{\text{score}}{4} \]
That percentage is again weighted by importance to yield a health index. This is illustrated in Table
G.4 for a transformer assessment. The total is then categorised in terms of excellent to critical, as
shown. The further step is to evaluate risk in order to identify appropriate next steps.
Table G.4 – Health index calculation.
G.12. Indonesian utility – AHI for a tropical climate

This development was made in order to manage failures in GIS chambers at 150 and 500 kV [B29].
The failure rates have been higher than those in the A3 CIGRE report [B2] and the dominant failure
modes differed. As shown in Table G.5, a significant difference is the higher proportion of dielectric
failures. The results here relate to a study of 79 locations consisting of 631 CB bays. These represented
5227 bay years at 150 kV and 730 bay years at 500 kV. Of these, the 150 kV GIS is mainly indoor and
the 500 kV GIS mainly outdoor.
Table G.5 – Examples of measurands
FAILURE RATE per 100 CB-bay years Fail to perform Dielectric breakdown
inside outside CIGRE [2] Utility CIGRE [2] Utility CIGRE [2]
150 kV 0.36 3.48 0.24 16% 63% 48% 23%
500kV 0.57 1.6 0.5 40% 30%
The difference in failure patterns is attributed to the tropical environment, with a high average annual
relative humidity of 80% at a high ambient temperature of 27 °C. The annual lightning flash density is
high at 15 per km². In the wet half year there is an average of 200 mm rainfall. Corrosion is common,
exacerbated by pollution near Jakarta. This leads to leaks and acidic attack on gaskets. The AHI
methodology attempts to rank the system in terms of the periods in which failure is likely. It introduces
a cascade in which the AHI is assessed firstly for parts, then for components, enclosures and bays, and
finally for the substation.
They consider GIS as consisting of five subsystems:
1. Primary conductor (including main contacts)
2. Dielectric
3. Driving mechanism
4. Secondary
5. Construction and support
Each is analysed using FMEA methods, identifying diagnostic indicators to be applied. An example of
one of the most important indicators (the dielectric with its scale code indices) is shown in Table G.6.
The approach is in line with this TB, using log base 3 scores (1, 3, 10, 30 and 100), which are then
added.
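The effect of adding log base 3 scores can be sketched briefly. The numbering of the scale codes from 1 (best) to 5 (worst) and the indicator codes used are assumptions for illustration; they are not taken from Table G.6.

```python
# Log base 3 scores for scale codes 1 (best) to 5 (worst).
LOG3_SCORES = (1, 3, 10, 30, 100)

def summed_score(scale_codes):
    """Add the log base 3 scores of the individual indicator scale codes."""
    return sum(LOG3_SCORES[code - 1] for code in scale_codes)

# Hypothetical indicator codes for two subsystems.
one_critical = [1, 1, 1, 1, 5]   # four very good indicators, one critical
all_moderate = [2, 2, 2, 2, 2]   # five slightly degraded indicators

print(summed_score(one_critical))  # 104
print(summed_score(all_moderate))  # 15
```

A single critical indicator outweighs any number of mildly degraded ones, so the sum does not dilute the worst finding in the way a linear average would.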
An interesting point is the experience of cascading from parts to substations (an example of which is
shown in Table G.7) for the consolidation from component to bay levels.
The other innovation is the use of “failure susceptibility indicators”. Three are used, and each is
assessed as low, medium or high in terms of its impact in creating a failure:
Environmental
 Pollutant level
 Lightning density
Operation and maintenance

 Service time
 Intensity of voltage transients
 Maintenance history
Design
 Short life failure modes
 O-ring materials
 Absorbent use
These indicators and their assessments are used, by way of side notes, as guides for those preparing
the AHI. They are important factors in creating an AHI, but they are not failure modes, nor are they
evidence of an active failure mode.
Table G.6 – Example of condition codes
Table G.7 – Condition code range interpretations