
8

Equipment Failures and System Performance

8.1 INTRODUCTION

T&D systems are an interconnection of thousands of building blocks, components arranged
and interconnected to form a working power delivery system. When the system
fails to do its job of delivering power to all of its distributed consumer points, it is usually
because one or more of these building blocks has failed to fulfill its role in the system.
Many system performance problems are due to equipment failures.
Chapter 7 covered some of the basics of capacity ratings, expected lifetimes, and failure
modes associated with individual units of equipment - the components that make up this
system. One can usefully speak about failure rates and probabilities only when referring to a
large number of units. This chapter will examine failure as seen in a large population of
equipment and the impact that varying failure rate or lifetime or other aspects of equipment
has on its reliable operation as part of the power system.
Section 8.2 begins with a look at equipment failure rates, aging, and how the two
interrelate over the large set of equipment in a typical distribution system to influence
system reliability. Section 8.3 then takes a look at failure rate and the distribution of ages in
the inventory of equipment actually in service. What does the age distribution of equipment
over the entire system look like as equipment ages, "dies," and is replaced by newer
equipment? How does that affect replacement budgets and failure rates of the whole
system? What impact do various age-management policies have? Section 8.4 concludes
with a summary of key points.
8.2 EQUIPMENT FAILURE RATE INCREASES WITH AGE
All equipment put into service will eventually fail. All of it. Some equipment will last
longer than others will, and often these differences in lifetimes will appear to be of a
random nature. Two apparently identical units, manufactured to the same design from the
same materials on the same day at the same plant, installed by the same utility crew on the
same day in what appear to be identical situations, can nonetheless provide very different

Copyright © 2004 by Marcel Dekker, Inc.



lifetimes. One might last only 11 years, while the other will still be providing service after
40 years. Minute differences in materials and construction, along with small differences in
ambient conditions, in service stress seen by the units, and in the abnormal events
experienced in service, cumulatively lead to big differences in lifetimes.
In fact, about the only time one finds a very tight grouping of lifetimes (i.e., all of the
units in a batch fail at about the same time) is when there is some common manufacturing
defect or materials problem - a flaw - leading to premature failure.
Failures, Failure Rates, and Lifetime
Table 7.3 listed four consequences of failure and the allied discussion in Chapter 7 made an
important distinction that a failure does not necessarily mean that a unit has "died," only
that it has ceased to do its function. As there, that type of failure will be called a "functional
failure." The terms "failure" and "failed" as used here will mean an event that ends the
useful life of the device.
Failure rate is the annual rate of failure - the likelihood (if predicting the future) or
actual record (if analyzing the past) of failure as a portion of population. It is usually
expressed as a percentage (e.g., 1.25%) and/or applied as a probability (1.25% probability
of failure). Expected lifetime - how long equipment will last in service - and failure rate are
clearly linked. A set of units expected to last only a short time in service must have a high
failure rate - if they do not fail at a high rate they will last a long time in service and thus
have a long expected lifetime.
However, expected lifetime and failure rate are not necessarily functionally related. For
example, a set of units that have an average expected lifetime of twenty years may not
necessarily have a failure rate of 5% annually. It may be that failure rate for the next 10
years is only 2%, and that it then rises rapidly near the end of the period. Regardless, as a
qualitative concept, one can think of failure rate and lifetime as inversely related. If set A of
equipment has an expected lifetime twice that of set B equipment, then it has a much lower
failure rate.
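
The point can be checked numerically. The following Python sketch is illustrative only: the function names, the two failure rate profiles, and the convention of counting every service year a unit starts are assumptions made for this example, not data from the text.

```python
def expected_lifetime(rate_at_age, horizon=500):
    """Mean number of service years started per unit.

    rate_at_age(age) is the probability that a unit of that age fails
    during the coming year. The horizon only needs to be long enough
    that essentially no units survive to it.
    """
    surviving = 1.0  # fraction of the original set still in service
    total_years = 0.0
    for age in range(horizon):
        total_years += surviving
        surviving *= 1.0 - rate_at_age(age)
    return total_years


def constant_rate(age):
    # Flat 5% per year at every age.
    return 0.05


def escalating_rate(age):
    # Hypothetical profile: 2% for ten years, then rising 1.5 points per year.
    if age < 10:
        return 0.02
    return min(1.0, 0.02 + 0.015 * (age - 10))


print(f"constant 5%: {expected_lifetime(constant_rate):.1f} years")
print(f"escalating:  {expected_lifetime(escalating_rate):.1f} years")
```

Under this counting convention the flat 5% profile gives 20 years (1/0.05). The escalating profile starts at less than half that failure rate, yet its later rise keeps its expected lifetime shorter than the 50 years that a constant 2% rate would imply.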
Quantitative Analysis of Equipment Failure Probabilities
The concepts of failure and expected lifetime can be applied to individual units of
equipment (Chapter 7). But failure rate really only makes sense when a large population is
involved. Application of a failure rate to a specific unit of equipment is appropriate only in
the context of its being a member of a large set. Although mathematically one can apply
probabilities to a single unit of equipment to obtain expectations, any real value comes only
when the information is applied to analyze or manage the entire population.
Figure 8.1 shows the classic "bathtub" lifetime failure rate curve. Qualitatively, this is
representative of equipment failure likelihood as a function of age for just about all types of
devices, both within the power systems industry and without. The curve shows the relative
likelihood of failure for a device in each year over a period of time.
This "bathtub curve" and its application are probabilistic. Curves such as Figure 8.1
represent the expected failure of devices as a function of age over a large population. This
curve cannot be applied to any one particular unit. It can only be applied over a large
population, for which one can predict with near certainty the overall characteristics of
failure for the group as a whole. Thus, based on Figure 8.1, which has a failure rate of 1.5%
for year 8, 15 out of every 1,000 units that have lasted 8 years will fail in that 8th year of
service. One can also see that among those units that survived 35 years of service, the
average probability of failure in the next year is 0.075%, etc.


[Plot: failure rate versus component age in years; axis ticks at 10 through 50.]

Figure 8.1 The traditional bathtub failure rate curve, with three periods as discussed in the text.

Figure 8.1 shows three different periods during the device's lifetime. These are:
1. Break-in period. Failure rate for equipment is often higher for new
equipment. Flaws in manufacturing, installation, or application design lead
to quick failure of the device.
2. Useful lifetime. Once the break-in period is past, there is a lengthy period
when the device is performing as designed and failure rate is low, during
which the design balance of all of its deterioration rates is achieving its
goal. During this period, for many types of equipment, the failure rates due
to "natural causes" (deterioration due to chronological aging and service
stress) may be so low that the major causes of failure for the equipment set
are damage from abnormal events.
3. Wear-out period. At some point, the failure rate begins to increase due to the
cumulative deterioration caused by time, service stress, and abnormal
events. From this point on failure rate increases with time, reaching the
high double-digit percentages at some point.
Often, periodic maintenance, rebuilding or refitting, filtering of oil, etc., can "extend"
lifetime and lower expected failure rate, as shown in Figure 8.2. This concept applies
only to units designed for or conducive to service (i.e., this applies to breakers, perhaps,
but not overhead conductor). Here, the unit is serviced every ten years or so. Note,
however, that failure rate still increases over time, just at a lower rate of increase, and that
these periodic rebuilds create their own temporary "infant mortality" increases in failure
rate.
Failure Rate Always Increases with Age
All available data indicate that, inevitably, failure rate increases with age in all types of
power system equipment. Figure 8.3 shows operating data on the failure rate of


[Plot: failure rate (scale 0 to 1.0) versus component age in years (0 to 40), annotated "Periodic maintenance or rebuilding/refit".]
Figure 8.2 A bathtub curve showing the impact of periodic rebuilding and refitting.

underground equipment for a utility in the northeast United States, which is qualitatively
similar to the failure performance seen on all equipment in any power system.
Failure rate escalation characteristics
Figure 8.3 shows how the rate of increase of failure rate over time for different
equipment can exhibit different characteristics. In some cases the failure rate increases
steadily over time (the plot is basically a straight line). In other cases the rate of increase
increases itself - failure rate grows exponentially with a steadily increasing slope. In yet
other cases, the rate climbs steeply for a while and then escalation in failure rate decreases
- a so-called "S" shaped curve. Regardless, the key factor is that over time, failure rate
always increases.
There are cases where failure rate does not increase over time, where it is constant
with time (it just does not increase with age) or where it may actually go down over time.
But these rare situations are seen only in other industries and with equipment far different
from power system equipment. All electrical equipment sees sufficient deterioration with
time and service that failure rate is strictly increasing over time.
Eventually the failure rates become quite high
Figure 8.3 is actual data representing a large population in an actual and very
representative electric utility. Note that the failure rates for all three types of equipment
shown eventually reach values that indicate failure within 5 years is likely (rates in the
15-20% range). In some cases failure rates reach very high levels (80%). Failure in the
next year or so is almost a certainty. To power system engineers and managers who have
not studied failure data, these values seem startlingly high. However, these are typical for
power system equipment - failure rates do reach values of 15%, 25%, and eventually
80%. But these dramatically high values are not significant in practice, as will be
discussed below, because few units "live" long enough to become that old. What really
impacts a utility is the long period at the end of useful lifetime and beginning of wear-out
period when failure rate rises to two to five times normal (useful lifetime) rates.
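
The link between an annual rate of 15-20% and the statement that failure within 5 years is likely can be made explicit. The Python sketch below assumes, for simplicity, that the annual rate stays constant over the period (in practice it would keep escalating, which makes failure even more likely); the function name is illustrative.

```python
def prob_fail_within(years, annual_rate):
    """Probability that a unit fails within `years`, at a constant annual failure rate."""
    # The unit survives the whole period only if it survives every single year.
    return 1.0 - (1.0 - annual_rate) ** years


for rate in (0.15, 0.20, 0.80):
    print(f"annual rate {rate:.0%}: "
          f"fails within 1 year {prob_fail_within(1, rate):.0%}, "
          f"within 5 years {prob_fail_within(5, rate):.1%}")
```

For annual rates of 15% and 20% this gives five-year failure probabilities of roughly 56% and 67%.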


[Three plots of failure rate versus equipment age in years (10 to 30): Cable Sections, Cable Joints, and Pad-Mounted Transformers. Curves are labeled by voltage class and insulation type (15-kV and 25-kV; paper, solid, and paper-solid).]

Figure 8.3 Data on failure rates as a function of age for various types of underground equipment
in a utility system in the northeastern US. Equipment age in this case means "time in service."
Equipment of different voltage classes can have radically different failure characteristics, but in all
cases failure rate increases with age.


Predicting Time to Failure


High failure rate and uncertainty make for a costly combination
As a conceptual learning exercise, it is worth considering how valuable exact knowledge of
when any particular device would fail would be to a power delivery utility. Suppose that it
were known with certainty that a particular device would fail at 3:13 PM on July 23rd.
Replacement could be scheduled in a low-cost, low-impact-on-consumers manner just prior
to that date. There would be no unscheduled outage and no unanticipated costs involved:
impact on both customer service quality and utility costs could be minimized.
It is the uncertainty in the failure times of power system equipment that creates the high
costs, contributes to service quality problems, and makes management of equipment failure
so challenging. The magnitude of this problem increases as the equipment ages because the
failure rates increase: there are more "unpredictable" failures occurring. The utility has a
larger problem to manage.
Failure time prediction: an inexact science
With present technologies, it does not seem possible to predict time-to-failure of an
individual equipment unit exactly, except in cases of expensive real-time monitoring such
as for power transformer DGA systems. In fact, capability in failure prediction for
equipment is about the same as it is for human beings. The following statements apply just
about as well to people as to electrical equipment:
1. Time-to-failure can be predicted accurately only over a large population (set of
units). Children born in 2003 in the United States have an expected lifetime of 77
years, with a standard deviation of 11 years. Similarly, service transformers put
into service in year 2003 at a particular utility have an average expected lifetime
of 53 years with a standard deviation of 9 years.
2. Assessment based on time-in-service can be done, but still leads to information
that is accurate only when applied to a large population. Thus, medical
demographers can determine that people who have reached age 50 in year 2003
have an expected average 31 years of life remaining. Similarly, statistical
analysis of power transformers in a system can establish that, for example, those
that have survived 30 years in service have an average 16 years of service life
remaining.
3. Condition assessment can identify different expectations based on past or existing
service conditions, but again this is only accurate for a large population. Smokers
who have reached age 50 have only a remaining 22 years of expected lifetime,
not 31. Power transformers that have seen 30 years service in high-lightning
areas have an average of only 11 years service life remaining, not 16.
4. Tests can narrow but not eliminate uncertainty in failure prediction of individual
units. All the medical testing in the world cannot predict with certainty the time
of death of an apparently healthy human being, although it can identify flaws that
might indicate a high likelihood for failure. Similarly, testing of a power
transformer will identify if it has a "fatal" flaw in it. But if a human being or a
power system unit gets a "good bill of health," it really means that there is no
clue to when the unit will fail, except that it is unlikely to be soon.
5. Time to failure of an individual unit is only easy to predict when failure is
imminent. In cases where failure is due to "natural causes" (i.e., not due to


abnormal events such as being in an auto accident or being hit by lightning),
failure can be predicted only a short time prior to failure. At this point, failure is
almost certain to be due to advanced stages of detectable deterioration in some
key component. Thus, when rich Uncle Jacob was in his 60s and apparently
healthy, neither his relatives nor his doctors knew whether it would be another
two years or two decades before he died and his will was probated. Now that he
lies on his deathbed with a detectable bad heart, failure within a matter of days
is nearly certain. (The relatives gather.)
Similarly, in the week or two leading up to catastrophic failure, a power
transformer usually will give detectable signs of impending failure: an
identifiable acoustic signature will develop, internal gassing will be high, and
perhaps detectable changes in leakage current will be present, etc. But a lack of
those signs does not indicate certainty of any long period before failure.
6. Failure prediction and mitigation thus depend on periodic testing as units get
older. Given the above facts, the only way to manage failure is to test older units
more often than younger units. Men over 50 years of age are urged to have
annual physical exams in order that possible system problems are detected early
enough to treat. Old power transformers have to be inspected periodically in
order to detect signs of impending failure in time to repair them.
7. Most electrical equipment gives some diagnosable sign of impending failure.
Temperature rise, changes in sound volume or frequency, leakage current,
changes in power factor - something - nearly always provides a sign which, if
noted, indicates failure is very near.
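
Statements 1 through 3 describe population averages conditioned on age. A minimal Python sketch of that calculation follows. The failure rate curve used here is hypothetical (flat at 1.5% for ten years, then growing 7% per year, capped at 50%), as are the function names; only the method, not the numbers, is meant to carry over.

```python
def hypothetical_rate(age):
    # Assumed curve for illustration only; not taken from any utility's data.
    return min(0.5, 0.015 * 1.07 ** max(0, age - 10))


def expected_remaining_life(age_now, rate_at_age, horizon=300):
    """Average further service years for units that have reached age_now.

    Counts each service year started; horizon is a cut-off beyond which
    the surviving fraction is negligible.
    """
    surviving = 1.0  # fraction of the age_now cohort still in service
    remaining_years = 0.0
    for age in range(age_now, age_now + horizon):
        remaining_years += surviving
        surviving *= 1.0 - rate_at_age(age)
    return remaining_years


for age in (0, 30, 50):
    years = expected_remaining_life(age, hypothetical_rate)
    print(f"units that have reached age {age:2d}: about {years:.1f} more years on average")
```

Because the assumed failure rate never decreases with age, the expected remaining life shrinks as units get older. The result is an average over the whole group and says nothing about when any single unit will fail.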
Table 8.1 summarizes the key points about failure time prediction.

Table 8.1 Realities of Power System Equipment Lifetime Prediction

1. Time to failure can be predicted accurately only over large populations and in
a probabilistic manner.
2. Past and present service conditions can be used to narrow the expected
uncertainty range, but "deterministic" models of remaining lifetime are
unrealistic and unreliable.
3. Testing provides accurate "time to failure" information only when it reveals
flaws that mean "failure in the near future is nearly certain."
4. Test data that reveal no problems do not mean the unit has a long lifetime
ahead of it. At most, they mean there is little likelihood of failure in the near
term, nothing more. At worst, the results mean only that the tests did not find
flaws that will lead to failure.
5. Periodic testing and inspection is the only way to assure accurate prediction of
time to failure.
6. "Good" test scores have no long-term meaning. All they mean is that the unit
is currently in good condition and unlikely to fail soon.
7. Testing needs to be done more often as a unit gets older.


8.3 A LOOK AT FAILURE AND AGE IN A UTILITY SYSTEM


This section will examine the effects of failure rate escalation, high failure rates, and
various replacement policies on the average age and the average failure rate of a large
population of equipment in utility service. This is done by using a computerized program,
Transformer Population Demographics Simulator, written by the author. It computes
failures and lifetimes, and can simulate replacement policies for large sets of transformers.
The example cases given below are designed to illustrate key points about the relationship
between failure rates, how many units are left, and how many must be replaced. Often the
results are interesting and counter-intuitive. For example, the extremely high failure rates
that an escalating trend eventually reaches (perhaps as high as 50% annually, see Figure
8.3) are real, but of little concern to a utility. The reason: few units get to be that age - the
vast majority fail long before they are that old.
A large electric utility may have 100,000 or more service transformers in the field and
over a thousand power transformers in operation on its system. Every year all of those units
age. Some fail and are replaced by newer units. The population's average age may increase
or decrease, depending on how many failed and at what ages. The "demographics" of this
population depend on the failure rates and the failure rate escalation curve for the
equipment.
Example 1: A Typical Failure Rate Escalation and Its
Impact on the Installed Equipment Base
Figure 8.4 illustrates a very simple example that will begin this section's quantitative
examination of failure, installed base characteristics, and overall impact on the utility. In
this example, the group of 100,000 service transformers is installed in one year, a rather
unrealistic assumption but one that has no impact on the conclusions that this example will
draw.
As a group, this set of 100,000 units has the statistical failure rate characteristic shown
in the left part of Figure 8.4. That plot gives the probability that an operating unit of any
particular age will fail in the next 12 months of service. In this case there is no high break-
in-period failure rate. The base rate during normal lifetime begins at 1.5% per year, rising to
2.5% by year 24, 6.6% by year 30, and to 9% annually by age 40. This curve is based upon
actual service transformer failure rate curves (see bottom of Figure 8.3).
The right-hand diagram in Figure 8.4 shows, as a function of age, the percent of the
100,000 units installed in year zero that can be expected to remain in service each year, as
units fail according to the expectation defined by the left-hand curve. In year 1, 1.5% of the units
fail, meaning 98.5% are in service at the beginning of year two. At the end of a decade, 85%
are still in service. The failure rate is initially 1.5%, increasing slightly above that value
each year. Despite this rise, only 15% of the units (ten times 1.5%) fail in the first decade.
The reason is that the number of units left to fail decreases each year - there are only
98,500 units left at the end of the first year, etc., so 1.5% is not exactly 1,500 in the second
year. The number of actual failures decreases slightly each year for the first decade, to a low
of only 1,440 failures in year ten, because the number of units remaining to fail drops faster
than the failure rate increases.
At the end of 20 years, 71% of the units remain, and at the end of thirty, only 53%
remain. The 50% mark is reached at 32 years. By the end of year 50, failure rate has
escalated to more than 15%, but only a paltry 10.3% of the units remain. Only 500 (.7%)
make it to year 60. Fewer than two are expected to be in service by year 70, when failure rate
has escalated to 50%. The average unit ends up providing 43.8 years of service before
failure.
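
The year-by-year arithmetic behind these survival figures can be sketched in a few lines of Python. The exact failure rate curve of Figure 8.4 is not tabulated in the text, so this sketch assumes straight-line interpolation between the rates quoted above (1.5% when new, 2.5% at year 24, 6.6% at year 30, 9% at year 40, about 15% at year 50, 50% at year 70). That interpolation and the function names are assumptions, and the output will not reproduce the text's figures exactly.

```python
# (age, annual failure rate) points quoted in the text; the straight-line
# interpolation between them is an assumption of this sketch.
ANCHORS = [(0, 0.015), (24, 0.025), (30, 0.066), (40, 0.09), (50, 0.15), (70, 0.50)]


def failure_rate(age):
    """Probability that a unit of the given age fails in the next 12 months."""
    if age >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]  # hold the last quoted rate beyond year 70
    for (a0, r0), (a1, r1) in zip(ANCHORS, ANCHORS[1:]):
        if a0 <= age < a1:
            return r0 + (r1 - r0) * (age - a0) / (a1 - a0)
    raise ValueError("age must be non-negative")


def units_remaining(initial_units, years):
    """Units still in service at the end of each year (index 0 = at installation)."""
    remaining = [float(initial_units)]
    for age in range(years):
        # Each year the survivors are thinned by the failure rate for their age.
        remaining.append(remaining[-1] * (1.0 - failure_rate(age)))
    return remaining


in_service = units_remaining(100_000, 70)
for year in (1, 10, 20, 30, 50, 70):
    print(f"end of year {year:2d}: {in_service[year]:9.0f} units in service")
```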


Failures in intermediate years are the real
cause of this system's reliability problems
As Figure 8.4 shows, every year, as the units in this example grow older, their failure rate
increases. But every year, because many have already failed in previous years, there are
fewer units remaining to potentially fail in the next year. In this example, the number of
units that can be expected to fail in any year is the failure rate for that age times the number
remaining in that year. When does that value reach a maximum?
The left side of Figure 8.5 answers this question. That plot is the year-by-year product
of failure rate (left side of Figure 8.4) times the number of units remaining (right side of
Figure 8.4). As mentioned earlier, the failure rate is initially only 1.5% and it does not
increase much in the first few years. Thus, the number of units actually failing each year
drops slightly during the first decade as the product (failure rate × number of units
remaining) decreases slightly from year to year. But then, at about ten years, the annual
number of failures begins to rise. At this point, failure rate is increasing at a rate faster than
the number of remaining units is decreasing (i.e., the proportional annual increase in the
failure rate is greater than the failure rate itself).
The number of units actually failing each year peaks in year (age) 44, with 2,576
expected failures. Thereafter, even though the failure rate keeps increasing every year, the
number of failures actually occurring decreases, because there are fewer and fewer units left
each year, so that the net number of failures (failure rate times number of units left)
decreases from there on. From here on, the proportional annual increase in the failure rate is not
greater than the failure rate itself.
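
The fall-rise-fall pattern in annual failures and the condition for the turning point can be verified with a short calculation. In the Python sketch below the failure rate curve is hypothetical (flat at 1.5% for ten years, then growing 7% per year, capped at 50%), chosen only to give the same qualitative shape. It is not the curve behind Figure 8.5, so the peak year it reports need not match year 44.

```python
def hypothetical_rate(age):
    # Assumed curve for illustration only.
    return min(0.5, 0.015 * 1.07 ** max(0, age - 10))


def failure_counts(rates, initial_units=100_000):
    """Expected failures per year: failure rate at that age times units remaining."""
    remaining = float(initial_units)
    counts = []
    for rate in rates:
        failed = rate * remaining
        counts.append(failed)
        remaining -= failed
    return counts


def count_rises(rates, n):
    """True if more failures are expected at age n+1 than at age n.

    count[n+1] > count[n] reduces to rates[n+1] * (1 - rates[n]) > rates[n],
    i.e. the fractional growth in failure rate exceeds next year's rate.
    """
    return (rates[n + 1] - rates[n]) / rates[n] > rates[n + 1]


rates = [hypothetical_rate(age) for age in range(80)]
counts = failure_counts(rates)
peak_age = max(range(len(counts)), key=counts.__getitem__)
print(f"failures at age 0: {counts[0]:.0f}, at age 10: {counts[10]:.0f}")
print(f"peak of {counts[peak_age]:.0f} failures at age {peak_age}")
```

The `count_rises` test is the algebraic form of the statement in the text: annual failures keep growing only while the year-on-year fractional increase in failure rate is larger than the failure rate itself.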
Thus, the very high failure rates that develop after five or six decades of service make
little real impact on the utility's quality of service. Units that are 70 years old have a 50
percent likelihood of failing in the next year, but as shown earlier, there are only two units
out of every 100,000 that make it to that age - essentially an anomaly in the system.
Instead, the high impact failure levels that plague a utility are caused by transformers of
intermediate age.

[Two plots against time in years (10 to 70); see caption.]

Figure 8.4 Left, failure rates as a function of age for a group of 100,000 service transformers. Right,
the percent remaining from the original group as a function of time if they fail at the failure rates
shown at the left as they age. After twenty years, 60% remain, but after fifty years only 12.5% are still
in service, and after 70 years, only 2 of the original 100,000 are expected to still be in service.


[Two plots: left, against time in years; right, against years of service (both 10 to 70); see caption.]

Figure 8.5 Left, number of failures occurring each year in Example 1's population that originally
numbered 100,000 units. The maximum is 2,576, in year 44, when the combination of escalating
failure rate and number of remaining units peaks. More than half the failures occur in the range
between 20 and 45 years in service. Right, a generalized result applicable to any population made up
of these units no matter how many and regardless of when they were installed. This curve is the
distribution of failure likelihood for one of these units, as a function of service age. It gives the
probability that a unit of this type will fail at a certain age. The area under this curve (including a small
portion of it beyond 70 years that is not shown) is 100%.

Failure-Count Diagrams
The right side of Figure 8.5 shows this same failure-count diagram with the scale changed
from a base of 100,000 units to just one unit. This diagram is basically a probability
distribution of when a unit can be expected to fail. The difference between this diagram and
a failure rate diagram (left side of Figure 8.4) is that the failure rate curve gives the
probability of failure for a unit as a function of its age, n, assuming it has lasted the
previous n-1 years. The diagram shown at the right of Figure 8.5 gives the likelihood of
failure in year n taking into account that the unit may not have lasted to year n. To
distinguish this curve from failure rate curves like that shown in Figure 8.4, it will be called
a failure count curve, even if scaled as it is in Figure 8.5 to a percentage basis.
This is an interesting and very useful diagram, because it applies to any population
made up of this same type of transformer. The curve gives, in relative terms, how much
transformers of a particular age contribute to an installed base's failure rate. For any
population of this type of unit, no matter what the mix of ages - some installed last year,
some installed long ago - it is still those units that have reached 44 years of service that
contribute most to the system's problems. Units older than that fail with a higher likelihood
(left side of Figure 8.4) but there are too few to generate as high a total count.
Information like Figure 8.5, when developed for a specific situation (this diagram is for
only one particular transformer type as applied at one specific utility), is the foundation for
studies of proposed inspection, service, and replacement policies as well as various asset
management strategies. Application to strategic and tactical planning of that type will be
discussed later in this chapter.
Example 2: A More "Real World" Case
The preceding example provided useful insight into equipment failure characteristics, and
led to the failure count contribution curve (Figure 8.5), a very useful analytical tool. But to
see the full implications of failure rate escalation on a utility and understand how certain


replacement policies might reduce failure count, it is necessary to sacrifice Example 1's
simplicity for more realism. Specifically, one needs to look at a situation where failed units
are replaced, which is the fact of life in the "real world." Thus, Example 2 builds on
Example 1, using the same type of service transformers. It:
1. Assumes, as before, 100,000 units are installed in year 0.
2. Assumes a period of 70 years.
3. But assumes that failed units are replaced immediately with new
ones, keeping the overall count at 100,000.
4. And also assumes these replacement units follow the same failure
rate curve as the original units.
5. Looks at the entire population of transformers that results from
these assumptions, at the end of 70 years.
This is a more realistic example, as it represents what a utility has to do - keep the same
number of units in service, replacing failed units when they fail.¹ It is still not completely
realistic because of the initial "build" of 100,000 units in year zero, but for the moment that
is not an issue. The important point in the overall equipment base and its interaction with
failures in this example is that replacement units can fail, too.
What does this installed equipment base look like with respect to age distribution of
units in service, average failure rate, and failure count? Figure 8.6 shows the distribution
of ages of the 100,000 units in the system after 70 years. As in Example 1, in year 1 (the
first year) 1,500 units failed, but here they were replaced with new units, which are a
year newer. Thus, at the end of the 70-year period, those "first year" replacements that
have survived are now 69 years old. But those 1,500 units did not all survive. They failed
with the same characteristic trend as the original set, meaning that 1.5%, or 22, failed in
their first year of service and were replaced in year 2. Their replacements are now 68
years old, assuming they lasted and did not fail in some interim year.
Furthermore, in year 2, 1,478 of the original 100,000 units failed (1.5% of the 98,500
original units remaining after year 1). Thus, a total of 1,500 replacement units were
installed in year 2 (1,478 + 22). Those replacements began to fail along the same trend as
the original units. Thus, in year 3 there were failures of units that had been installed in
years 0, 1, and 2, etc.
Eventually, as the population ages and its average failure rate rises, the utility sees the
annual replacement rate over the entire population rising to about 3,000 units a year, and
it finds itself replacing units of all ages. And the net result, when all of these
replacements and replacements for replacements, etc., are added up, is that the utility had
to install about 60,000 additional replacement units during the 70 year period. More than
half of the original units failed. And following the failure count contributions of Figure
8.5, most of those that failed were "middle aged transformers" - those in the 15- to 45-
year-old range.
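
The bookkeeping described above (units failing at every age, each failure replaced at once by a new unit) can be sketched as follows. This is not the author's Transformer Population Demographics Simulator; it is a minimal Python illustration that assumes straight-line interpolation between the failure rates quoted for Example 1, so its totals will not match the figures in the text.

```python
# (age, annual failure rate) points quoted for Example 1; interpolating
# linearly between them is an assumption of this sketch.
ANCHORS = [(0, 0.015), (24, 0.025), (30, 0.066), (40, 0.09), (50, 0.15), (70, 0.50)]


def failure_rate(age):
    """Probability that a unit of the given age fails in the next 12 months."""
    if age >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (a0, r0), (a1, r1) in zip(ANCHORS, ANCHORS[1:]):
        if a0 <= age < a1:
            return r0 + (r1 - r0) * (age - a0) / (a1 - a0)
    raise ValueError("age must be non-negative")


def simulate_with_replacement(total_units=100_000, years=70):
    """Return (units_by_age, replacements_per_year) after the given period.

    units_by_age[a] is the expected number of in-service units aged a years.
    Every failed unit is replaced by a new one, so the total stays constant.
    """
    units_by_age = [float(total_units)]
    replacements = []
    for _ in range(years):
        failed = [n * failure_rate(a) for a, n in enumerate(units_by_age)]
        installed = sum(failed)
        # Survivors move up one year in age; replacements enter at age 0.
        units_by_age = [installed] + [n - f for n, f in zip(units_by_age, failed)]
        replacements.append(installed)
    return units_by_age, replacements


units_by_age, replacements = simulate_with_replacement()
print(f"replacements in year 1:  {replacements[0]:.0f}")
print(f"replacements in year 70: {replacements[-1]:.0f}")
print(f"total replacements:      {sum(replacements):.0f}")
print(f"original units left:     {units_by_age[-1]:.1f}")
```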

¹ The fact that this example assumes the system is created from scratch in year zero makes no impact
on the results given here. As shown earlier, by year 70 only 2 of the original units are left. The
population consists of units that have been replaced and in some cases failed and replaced again. As
a result after 70 years there is only an insignificant "start effect" involved in the data - the model's
results in year 70 represent a fairly stable look at what year-to-year operation for the utility would be
with respect to service transformers.


[Plot: number of units versus age in years (10 to 70), annotated "Area under the curve equals 100,000 units".]
Figure 8.6 Distribution of ages of units in the Example 2 system, 100,000 units that have been
replaced as they failed over the last 70 years.

A key point: the failure contribution curve from Example 1 (Figure 8.5) applies to this
example as well. Again, as stated earlier, that curve applies to all populations made up of
this same type of unit. Readers who are uncertain of this should stop and reflect on this
fact before moving on. For any such population, this curve is a representation of the
relative contribution to failures of units as a function of their age. Thus, the failure
contribution curve is a very important tool in managing reliability and replacement
policies. It will be discussed in detail later in this section.
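
As a rough illustration of why one curve can serve every population built from the same type of unit, the sketch below computes the share of failures that occurs at each age for a single installed cohort: the failure rate at that age times the fraction of units still in service. The `failure_rate` function is a hypothetical exponential curve chosen only for illustration; it is not the curve of Figure 8.4, and the names `failure_contribution` and `MAX_AGE` are likewise assumptions.

```python
import math

MAX_AGE = 100  # last age tracked; the capped rate below reaches 1.0 before this


def failure_rate(age):
    """Hypothetical annual failure rate by age (an assumption, not Figure 8.4)."""
    return min(1.0, 0.01 * math.exp(0.05 * age))


def failure_contribution(max_age=MAX_AGE):
    """Fraction of an installed cohort that fails at each age 0..max_age."""
    surviving = 1.0
    shares = []
    for age in range(max_age + 1):
        failed = surviving * failure_rate(age)  # rate times units still in service
        shares.append(failed)
        surviving -= failed
    return shares


shares = failure_contribution()
peak_age = max(range(len(shares)), key=shares.__getitem__)
print(f"age with the largest share of failures: {peak_age}")
print(f"share of failures at ages 15 to 45: {sum(shares[15:46]):.0%}")
```

Because the result depends only on the failure rate curve and not on when units were installed, the same shares apply to any population of this unit type. The specific numbers printed reflect the assumed curve, not the book's data.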
Figure 8.6 shows the resulting equipment base's distribution of transformer ages, after
70 years for this example. It has a nearly even distribution of transformers from age 0
(new) to 30 years of age. At about 35 years of age the count takes a rapid plunge - this is
the period (starting at about 35 years in service) during which the bulk of failure counts
occur (see Figure 8.5), and thus the age when a good deal of replacements had to be made.
In this system, at the end of 70 years, the average unit in service is 22 years old.
However, due to the escalation of failure rates as units age, those older than the average
contribute a good deal more to the average failure rate. The entire population has an
average failure rate of 3.15%, or more than twice that of new units. That figure corresponds
to the failure rate of a unit that is 29 years old (see Figure 8.4, top). Thus, while the average
age of this population is 22 years old, its failure rate is equal to that of a population made
up of 29-year-old units.
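
The comparison between average age and average failure rate can be reproduced in outline with a small expected-value model. The sketch below installs 100,000 units in year zero, replaces each failure with a new unit, and after 70 years reports the average age, the population's average failure rate, and the single age whose failure rate matches that average. It assumes a hypothetical exponential `failure_rate` curve, not the curve of Figure 8.4, so its numbers will not match the 22-year, 3.15%, and 29-year figures above; the function names are also assumptions.

```python
import math


def failure_rate(age):
    """Hypothetical annual failure rate by age (an assumption, not Figure 8.4)."""
    return min(1.0, 0.01 * math.exp(0.05 * age))


def simulate(initial_units=100_000, years=70):
    """Expected unit count by age after `years` of replace-on-failure operation."""
    counts = [0.0] * (years + 1)      # counts[a] = units of age a
    counts[0] = float(initial_units)  # everything installed new in year zero
    total_replacements = 0.0
    for _ in range(years):
        failures = [n * failure_rate(a) for a, n in enumerate(counts)]
        aged = [0.0] * (years + 1)
        for a in range(years):
            aged[a + 1] = counts[a] - failures[a]  # survivors grow one year older
        aged[0] = sum(failures)                    # each failure is replaced by a new unit
        total_replacements += aged[0]
        counts = aged
    return counts, total_replacements


def summarize(counts):
    """Average age, average failure rate, and the age with an equal failure rate."""
    total = sum(counts)
    avg_age = sum(a * n for a, n in enumerate(counts)) / total
    avg_rate = sum(failure_rate(a) * n for a, n in enumerate(counts)) / total
    # Youngest age whose own failure rate reaches the population average.
    equivalent_age = next(a for a in range(len(counts)) if failure_rate(a) >= avg_rate)
    return avg_age, avg_rate, equivalent_age


counts, replacements = simulate()
avg_age, avg_rate, equivalent_age = summarize(counts)
print(f"average age {avg_age:.1f} years, average failure rate {avg_rate:.2%}")
print(f"equivalent single-age population: {equivalent_age} years old")
print(f"replacements installed over the period: {replacements:,.0f}")
```

Whenever the failure rate climbs ever more steeply with age, as this assumed curve does, the count-weighted average rate exceeds the rate at the average age, which is the effect described in the text.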
Other Example Cases
The author ran a number of other cases, increasingly realistic representations of actual utility
operations, through the simulation. However, no outstanding additional conclusions useful
for managing transformer failures and lifetime (as will be discussed in the next section)
were revealed. Conclusions of these studies and the cases studied were:
2 Figure 8.5 can also be interpreted as giving the relative age of units when they are replaced by the
utility over the course of its annual O&M in each year.


Case 3: A simulation that has the transformers being added gradually, rather than
all in one year, results in a population little different from that shown in Figure 8.6.
That plot is fairly representative despite its "all in one year" scenario. Any "end
effect" of modeling all the units as starting in one year is worked out of the model
by the end of any simulated period longer than 50 years.
Case 4: Growth of the population was modeled in several cases, where the utility
had to expand the number of transformers each year by an amount that varied
from 1 to 3% annually. Figure 8.7 shows the types of changes that result from
applying an annual growth rate to the population. Of course, the total number of
units is greater than 100,000 in these scenarios. These populations become
somewhat "younger" and there is a sloped rather than a flat distribution of age
over the 0- to 30-year-old period.
Case 5: A period of high growth, lasting ten years with a growth rate of 5%, was
modeled as having occurred in the past. The result is a "bulge" in the age
distribution around the time (years in the past) of that growth period, as shown in
Figure 8.8. In cases where the growth spurt occurred about 30-40 years ago, this
large population of "now failing units" (see Figure 8.5) results in a relatively high
failure rate for the entire population. Where it occurred more than 50 years ago,
the impact is minimal - most of the units added then have failed since then.
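
The growth cases can be sketched by adding new units each year on top of the replacements for failed ones. The example below is a minimal sketch under stated assumptions: the `failure_rate` curve is hypothetical (not Figure 8.4), growth is applied as a fraction of the population in service at the start of each year, and the 3% steady rate and the timing of the 5% ten-year spurt are illustrative choices within the ranges mentioned in Cases 4 and 5.

```python
import math


def failure_rate(age):
    """Hypothetical annual failure rate by age (an assumption, not Figure 8.4)."""
    return min(1.0, 0.01 * math.exp(0.05 * age))


def simulate(growth_by_year, initial_units=100_000):
    """Unit count by age; growth_by_year[t] is the fractional growth in year t."""
    years = len(growth_by_year)
    counts = [0.0] * (years + 1)
    counts[0] = float(initial_units)
    for growth in growth_by_year:
        total = sum(counts)
        aged = [0.0] * (years + 1)
        failed = 0.0
        for a in range(years):
            f = counts[a] * failure_rate(a)
            failed += f
            aged[a + 1] = counts[a] - f      # survivors grow one year older
        aged[0] = failed + growth * total    # replacements plus growth additions
        counts = aged
    return counts


def average_age(counts):
    return sum(a * n for a, n in enumerate(counts)) / sum(counts)


def share_in_ages(counts, lo, hi):
    """Fraction of the population aged lo..hi inclusive."""
    return sum(counts[lo:hi + 1]) / sum(counts)


YEARS = 70
no_growth = simulate([0.0] * YEARS)
steady_growth = simulate([0.03] * YEARS)
# Hypothetical spurt: 5% a year for ten years, roughly 30 to 40 years before the end.
spurt = simulate([0.05 if 30 <= t < 40 else 0.0 for t in range(YEARS)])

print(f"average age, no growth: {average_age(no_growth):.1f}")
print(f"average age, 3% growth: {average_age(steady_growth):.1f}")
print(f"share aged 30-39, no growth: {share_in_ages(no_growth, 30, 39):.1%}")
print(f"share aged 30-39, after spurt: {share_in_ages(spurt, 30, 39):.1%}")
```

Under these assumptions the steadily growing population comes out younger than the no-growth case, and the spurt leaves a larger share of units in the age band that corresponds to the growth period.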
Replacement Policy Analysis
Suppose this utility decided to replace all units as they reach fifty years of age, even if they
appear to be in good condition. Looking at the data for Case 2 (Figure 8.6), it would have to
replace only about 75 units annually - not a tremendous cost, particularly considering the
units will have to be replaced pretty soon anyway (they will most likely fail in a few years
at that age). However, the impact on the overall failure rate would be insignificant, making
no real difference in the average age or failure rates for the total population. The real
contributor to the system's annual failure count comes from units that are aged 25 to 45
years, because there are so many of them. Replacement at age 50 gets to the units after too
many have failed.
Replacement of units at age 40 has a far different effect. First, nearly 1,000 units a
year have to be replaced, so the annual cost is roughly 12 times that of a 50-year
replacement policy. But a noticeable portion of the system's unexpected failure rate is
avoided. The average failure rate drops to 2.6% (from 3.1%), a reduction in unexpected
failures of nearly 20%, wrought by replacement of only 1% of the units in the system
annually. The average age of a unit under this policy falls from the 22 years given earlier to less
than 18 years. The population's average failure rate of 2.6% under this policy is
equivalent to the failure rate of a 25-year-old unit.
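
An age-based replacement policy of the kind just described can be compared with run-to-failure in a few lines. The sketch below runs an expected-value model long enough to settle, then reports the yearly unexpected failure rate, the number of planned replacements per year, and the average age for each policy. It assumes a hypothetical `failure_rate` curve rather than the one in Figure 8.4, so it will not reproduce the 75-unit and 1,000-unit figures quoted above; `run_policy`, `MAX_AGE`, and the 400-year run length are also assumptions made for illustration.

```python
import math

MAX_AGE = 100  # oldest age tracked; the capped rate below reaches 1.0 before this


def failure_rate(age):
    """Hypothetical annual failure rate by age (an assumption, not Figure 8.4)."""
    return min(1.0, 0.01 * math.exp(0.05 * age))


def run_policy(replace_age=None, units=100_000.0, years=400):
    """Long-run yearly figures when survivors are replaced on reaching replace_age.

    replace_age=None means run to failure.
    """
    counts = [0.0] * (MAX_AGE + 1)
    counts[0] = units
    failed = planned = 0.0
    for _ in range(years):
        aged = [0.0] * (MAX_AGE + 1)
        failed = planned = 0.0
        for a, n in enumerate(counts):
            f = n * failure_rate(a)
            failed += f
            survivors = n - f
            if a == MAX_AGE or (replace_age is not None and a + 1 >= replace_age):
                planned += survivors     # retired before failing
            else:
                aged[a + 1] = survivors  # survivors grow one year older
        aged[0] = failed + planned       # every removed unit is replaced by a new one
        counts = aged
    avg_age = sum(a * n for a, n in enumerate(counts)) / units
    return {"failure_rate": failed / units,
            "planned_per_year": planned,
            "average_age": avg_age}


run_to_failure = run_policy(None)
replace_at_50 = run_policy(50)
replace_at_40 = run_policy(40)
for name, result in [("run to failure", run_to_failure),
                     ("replace at 50", replace_at_50),
                     ("replace at 40", replace_at_40)]:
    print(f"{name}: failure rate {result['failure_rate']:.2%}, "
          f"planned replacements {result['planned_per_year']:,.0f} per year, "
          f"average age {result['average_age']:.1f} years")
```

The planned replacements per year and the drop in failure rate are the two quantities a utility would weigh against each other; how they trade off depends on the actual failure rate curve of the equipment in question.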
Thus, run-to-failure usually makes economic sense
Whether a replacement policy makes economic sense or not is something for
management to weigh in its evaluation of how best to spend the (always-limited) monies
it has for system improvements. The marginal cost of replacement can be weighed
against the marginal gain in reliability obtained, as well as compared against the marginal
cost of similar reliability gains available through other avenues. What is most important
in the foregoing examples is that the analytical results of a rather simple analysis of age,
failures, and remaining units can provide the type of tool needed to support such
decision-making.


[Plot of age distributions for three cases: high growth, low growth, and no growth (Case 2); horizontal axis: Age - Years.]
Figure 8.7 Distribution of ages of units in examples that included annual growth of the utility (and
thus its population of transformers).

[Plot of age distributions for four cases: high growth 15-25 years ago, high growth 30-40 years ago, high growth 50-60 years ago, and no growth (Case 2); horizontal axis: Age - Years.]

Figure 8.8 A period of high growth, here a 5% annual addition of units for ten years, results in a
"bulge" in the age distribution around the time of the growth spurt that lasts about 40 years, failures
gradually working it out of the population. All cases shown here assume replacement of units as they
fail. Note that the 5% for 10-year growth results in a greater area under the curve. The shapes of the
base and 50-60 year ago curves are nearly identical.


Table 8.2 Key Points from Chapter 8


All equipment will eventually fail unless replaced while still having some useful lifetime left.

Failure rate increases with age: excepting a brief "infant mortality rate" when new, failure
rate is monotonically increasing over time.

The average failure rate of equipment in any population will be greater than the failure rate
for equipment whose age equals the average age of the population.

Different modes or types of failures affect most equipment: a transformer can fail due to
core, winding, bushing, or case failures. One can view deterioration in each of these areas as
being in a kind of "race" to see where failure occurs first.

A unit that is "worn out" in one failure mode area is probably near failure in other
modes, too. Sound, common sense design principles result in equipment in which all modes
reach failure at about the same time: core, windings, bushings all are designed with similar
service lives.

The very high failure rates of really old equipment actually make little impact on a utility's
service reliability, because there are very few such units left.

The slightly-higher failure rates of forty-year-old equipment typically create the greatest
reliability problem for a utility. The bulk of failures come not from very old equipment but
from "middle aged" equipment. There, failure rate times number of units of this age left in
service is usually the highest.

The bathtub curve model of failure likelihood, or a modified version of it, is useful in almost
all situations involving anticipation and management of equipment failure.

Time-to-failure can be predicted accurately only by using probabilistic methods applied over
large populations and in an expectation-of-failure manner.

Uncertainty about remaining lifetime (time to failure) is the real factor shaping both poor
service quality and increased utility costs.

Past and present service conditions can be used to narrow the expected uncertainty range,
but "deterministic" models of remaining lifetime for individual units are unrealistic and
unreliable.

Testing and condition assessment provide accurate "time to failure" information only when
they reveal flaws that in essence mean "This unit is very likely to fail in the near future."

Good test results and assessed condition have no long-term meaning. They mean only that
the unit is currently in good condition and unlikely to fail in the near future.

Periodic testing and inspection are the only way to assure accurate prediction of time to
failure.

Inspection and testing need to be done more often as a unit gets older.

Replacement policy analysis is a rather straightforward way to combine age, failures, test
results, and other data to determine if and how units should be replaced rather than left in
service.

Early replacement policy can be optimized to determine if and how equipment should be
replaced at some specific age (e.g., 40 years) or condition (e.g., meets IEEE category 4 for
transformers).

Run to failure is still the best cost-justifiable policy for most electric utilities and most
equipment, in spite of the "costs" associated with failures.


8.4 CONCLUSION AND SUMMARY


All equipment installed in an electric system will eventually fail and need to be replaced. It
is mostly the uncertainty associated with failure that creates poor service quality and raises
utility costs: if a utility could accurately predict failure, it could use all the lifetime available
in its equipment but still avoid service interruptions due to failing equipment. Inspection,
testing, and condition modeling can reduce uncertainty but not eliminate it or even produce
meaningful results on a single-unit "deterministic" basis. Early replacement policies can be
worked out for equipment, but usually "run to failure" is the most economical approach.
Table 8.2 gives a one-page summary of key points made in this chapter.

REFERENCES
P. F. Albrecht and H. E. Campbell, "Reliability Analysis of Distribution Equipment Failure Data,"
EEI T&D Committee Meeting, January 20, 1972.
R. E. Brown, Electric Power Distribution Reliability, Marcel Dekker, New York, 2002.
J. B. Bunch, H. I. Stalder, and J. T. Tengdin, "Reliability Considerations for Distribution Automation
Equipment," IEEE Transactions on Power Apparatus and Systems, PAS-102, November 1983,
pp. 2656-2664.
EEI Transmission and Distribution Committee, "Guide for Reliability Measurement and Data
Collection," October 1971, Edison Electric Institute, New York.
P. Gill, Electrical Power Equipment Maintenance and Testing, Marcel Dekker, New York, 1998.
Institute of Electrical and Electronics Engineers, Recommended Practice for Design of Reliable
Industrial and Commercial Power Systems, The Institute of Electrical and Electronics Engineers,
Inc., New York, 1990.
A. D. Patton, "Determination and Analysis of Data for Reliability Studies," IEEE Transactions on
Power Apparatus and Systems, PAS-87, January 1968.
N. S. Rau, "Probabilistic Methods Applied to Value-Based Planning," IEEE Transactions on Power
Systems, November 1994, pp. 4082-4088.
E. Santacana et al., Electric Transmission and Distribution Reference Book, fifth edition, ABB Inc.,
Raleigh, 1997.
A. J. Walker, "The Degradation of the Reliability of Transmission and Distribution Systems During
Construction Outages," Int. Conf. on Power Supply Systems. IEEE Conf. Publ. 225, January
1983, pp. 112-118.
H. B. White, "A Practical Approach to Reliability Design," IEEE Transactions on Power Apparatus
and Systems, PAS-104, November 1985, pp. 2739-2747.
