8.1 INTRODUCTION
lifetimes. One might last only 11 years, while the other will still be providing service after
40 years. Minute differences in materials and construction, along with small differences in
ambient conditions, in service stress seen by the units, and in the abnormal events
experienced in service, cumulatively lead to big differences in lifetimes.
In fact, about the only time one finds a very tight grouping of lifetimes (i.e., all of the
units in a batch fail at about the same time) is when there is some common manufacturing
defect or materials problem - a flaw - leading to premature failure.
Failures, Failure Rates, and Lifetime
Table 7.3 listed four consequences of failure and the allied discussion in Chapter 7 made an
important distinction that a failure does not necessarily mean that a unit has "died," only
that it has ceased to do its function. As there, that type of failure will be called a "functional
failure." The terms "failure" and "failed" as used here will mean an event that ends the
useful life of the device.
Failure rate is the annual rate of failure - the likelihood (if predicting the future) or
actual record (if analyzing the past) of failure as a portion of population. It is usually
expressed as a percentage (e.g., 1.25%) and/or applied as a probability (1.25% probability
of failure). Expected lifetime - how long equipment will last in service - and failure rate are
clearly linked. A set of units expected to last only a short time in service must have a high
failure rate - if they do not fail at a high rate they will last a long time in service and thus
have a long expected lifetime.
However, expected lifetime and failure rate are not necessarily functionally related. For
example, a set of units that have an average expected lifetime of twenty years may not
necessarily have a failure rate of 5% annually. It may be that failure rate for the next 10
years is only 2%, and that it then rises rapidly near the end of the period. Regardless, as a
qualitative concept, one can think of failure rate and lifetime as inversely related. If set A of
equipment has an expected lifetime twice that of set B equipment, then it has a much lower
failure rate.
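The distinction can be checked numerically. The sketch below is illustrative only: the function name expected_lifetime and the second rate schedule are assumptions, not taken from the text. It computes the mean lifetime implied by a year-by-year failure rate schedule, counting a unit that fails in year n as having lasted n years.

```python
def expected_lifetime(rates):
    """Mean lifetime in years for a schedule of annual failure rates.

    rates[k] is the probability of failing in year k+1 for a unit that has
    survived the first k years. The last entry should be 1.0 so that every
    unit eventually fails.
    """
    surviving = 1.0   # fraction of the original set still in service
    mean_years = 0.0
    for year, rate in enumerate(rates, start=1):
        failing = surviving * rate       # fraction of the set failing this year
        mean_years += year * failing
        surviving -= failing
    return mean_years


# A flat 5% annual rate gives a mean lifetime of about 20 years (1 / 0.05).
flat = [0.05] * 2000 + [1.0]

# Hypothetical schedule: 2% for ten years, then rising 1.5 points per year.
late_rise = [0.02] * 10 + [min(1.0, 0.02 + 0.015 * k) for k in range(1, 70)]

print(round(expected_lifetime(flat), 2))
print(round(expected_lifetime(late_rise), 2))
```

The two schedules have very different year-by-year rates, and the mean lifetime follows from the whole schedule rather than from any single year's rate.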
Quantitative Analysis of Equipment Failure Probabilities
The concepts of failure and expected lifetime can be applied to individual units of
equipment (Chapter 7). But failure rate really only makes sense when a large population is
involved. Application of a failure rate to a specific unit of equipment is appropriate only in
the context of its being a member of a large set. Although mathematically one can apply
probabilities to a single unit of equipment to obtain expectations, any real value comes only
when the information is applied to analyze or manage the entire population.
Figure 8.1 shows the classic "bathtub" lifetime failure rate curve. Qualitatively, this is
representative of equipment failure likelihood as a function of age for just about all types of
devices, both within the power systems industry and without. The curve shows the relative
likelihood of failure for a device in each year over a period of time.
This "bathtub curve" and its application are probabilistic. Curves such as Figure 8.1
represent the expected failure of devices as a function of age over a large population. This
curve cannot be applied to any one particular unit. It can only be applied over a large
population, for which one can predict with near certainty the overall characteristics of
failure for the group as a whole. Thus, based on Figure 8.1, which has a failure rate of 1.5%
for year 8, 15 out of every 1,000 units that have reached their 8th year of service will fail during
that year. One can also see that among those units that survived 35 years of service, the
average probability of failure in the next year is .075%, etc.
Figure 8.1 The traditional bathtub failure rate curve, with three periods as discussed in the text.
Horizontal axis: component age, 0 to 50 years.
Figure 8.1 shows three different periods during the device's lifetime. These are:
1. Break-in period. Failure rate for equipment is often higher for new
equipment. Flaws in manufacturing, installation, or application design lead
to quick failure of the device.
2. Useful lifetime. Once the break-in period is past, there is a lengthy period
when the device is performing as designed and failure rate is low, during
which the design balance of all of its deterioration rates is achieving its
goal. During this period, for many types of equipment, the failure rates due
to "natural causes" (deterioration due to chronological aging and service
stress) may be so low that the major causes of failure for the equipment set
are damage from abnormal events.
3. Wear-out period. At some point, the failure rate begins to increase due to the
cumulative deterioration caused by time, service stress, and abnormal
events. From this point on failure rate increases with time, reaching the
high double-digit percentages at some point.
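The three periods can be sketched as one failure rate function. This is only an illustration of the shape: the function name bathtub_rate, the additive form, and every parameter value are assumptions chosen for the example, not values read from Figure 8.1.

```python
import math


def bathtub_rate(age, infant=0.02, decay=2.0, base=0.01, wear=0.001, growth=0.15):
    """Hypothetical annual failure rate (0 to 1) at a given age in years.

    break_in : early-failure term that dies away over the first few years
    base     : roughly constant rate during the useful lifetime
    wear_out : term that grows with age as deterioration accumulates
    """
    break_in = infant * math.exp(-age / decay)
    wear_out = wear * math.exp(growth * age)
    return min(1.0, break_in + base + wear_out)


for age in (0, 8, 25, 40):
    print(age, round(bathtub_rate(age), 4))
```

With these assumed parameters the rate is higher for a new unit than for one eight years old, and it climbs well above both in the wear-out period.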
Often, periodic maintenance, rebuilding or refitting, filtering of oil, etc., can "extend"
lifetime and lower expected failure rate, as shown in Figure 8.2. This concept applies
only to units designed for or conducive to service (i.e., it applies to breakers, perhaps,
but not to overhead conductor). In the figure, the unit is serviced every ten years or so. Note,
however, that failure rate still increases over time, just at a lower rate of increase, and that
these periodic rebuilds create their own temporary "infant mortality" increases in failure
rate.
Figure 8.2 A bathtub curve showing the impact of periodic rebuilding and refitting. Horizontal
axis: component age, 0 to 40 years. The curve is annotated "Periodic maintenance or rebuilding/refit."
Failure Rate Always Increases with Age
All available data indicate that, inevitably, failure rate increases with age in all types of
power system equipment. Figure 8.3 shows operating data on the failure rate of
underground equipment for a utility in the northeast United States, which is qualitatively
similar to the failure performance seen on all equipment in any power system.
Failure rate escalation characteristics
Figure 8.3 shows how the rate of increase of failure rate over time for different
equipment can exhibit different characteristics. In some cases the failure rate increases
steadily over time (the plot is basically a straight line). In other cases the rate of increase
increases itself - failure rate grows exponentially with a steadily increasing slope. In yet
other cases, the rate climbs steeply for a while and then escalation in failure rate decreases
- a so-called "S" shaped curve. Regardless, the key factor is that over time, failure rate
always increases.
There are cases where failure rate does not increase over time, where it is constant
with time (it just does not increase with age) or where it may actually go down over time.
But these rare situations are seen only in other industries and with equipment far different
than power system equipment. All electrical equipment sees sufficient deterioration with
time and service that failure rate is strictly increasing over time.
Eventually the failure rates become quite high
Figure 8.3 is actual data representing a large population in an actual and very
representative electric utility. Note that the failure rates for all three types of equipment
shown eventually reach values that indicate failure within 5 years is likely (rates in the
15-20% range). In some cases failure rates reach very high levels (80%). Failure in the
next year or so is almost a certainty. To power system engineers and managers who have
not studied failure data, these values seem startlingly high. However, these are typical for
power system equipment - failure rates do reach values of 15%, 25%, and eventually
80%. But these dramatically high values are not significant in practice, as will be
discussed below, because few units "live" long enough to become that old. What really
impacts a utility is the long period at the end of useful lifetime and beginning of wear-out
period when failure rate rises to two to five times normal (useful lifetime) rates.
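The statement that rates in the 15-20% range make failure within 5 years likely can be checked with a short calculation. The sketch below assumes the annual rate stays constant over the span considered, which understates the risk when rates are escalating; the function name is illustrative.

```python
def prob_failure_within(years, annual_rate):
    """Probability of failing within `years`, assuming the annual failure
    rate stays constant over that span (a simplification)."""
    return 1.0 - (1.0 - annual_rate) ** years


for rate in (0.15, 0.20):
    print(rate, round(prob_failure_within(5, rate), 3))   # 0.556 and 0.672
print(round(prob_failure_within(2, 0.80), 3))             # 0.96
```

At a 15% annual rate the chance of failure within five years is already better than even, and at 80% failure within two years is close to certain.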
Figure 8.3 Data on failure rates as a function of age for various types of underground equipment
in a utility system in the northeastern US. Equipment age in this case means "time in service."
Equipment of different voltage classes can have radically different failure characteristics, but in all
cases failure rate increases with age. Panels include cable sections and cable joints, each plotted
against equipment age (years) from 10 to 30, with curves labeled by voltage class (15 kV, 25 kV) and
insulation type (solid, paper, paper-solid).
1. Time to failure can be predicted accurately only over large populations and in
a probabilistic manner.
2. Past and present service conditions can be used to narrow the expected
uncertainty range, but "deterministic" models of remaining lifetime are
unrealistic and unreliable.
3. Testing provides accurate "time to failure" information only when it reveals
flaws that mean "failure in the near future is nearly certain."
4. Test data that reveal no problems do not mean the unit has a long lifetime
ahead of it. At most, they mean there is little likelihood of failure in the near
term, nothing more. At worst, the results mean only that the tests did not find
flaws that will lead to failure.
5. Periodic testing and inspection is the only way to assure accurate prediction of
time to failure.
6. "Good" test scores have no long-term meaning. All they mean is that the unit
is currently in good condition and unlikely to fail soon.
7. Testing needs to be done more often as a unit gets older.
Figure 8.4 Left, failure rates as a function of age for a group of 100,000 service transformers. Right,
the percent remaining from the original group as a function of time if they fail at the failure rates
shown at the left as they age. After twenty years, 60% remain, but after fifty years only 12.5% are still
in service, and after 70 years, only 2 of the original 100,000 are expected to still be in service.
Both panels are plotted against time in years, out to 70 years.
Figure 8.5 Left, number of failures occurring each year in Example 1's population that originally
numbered 100,000 units. The maximum is 2,576, in year 44, when the combination of escalating
failure rate and number of remaining units peaks. More than half the failures occur in the range
between 20 and 45 years in service. Right, a generalized result applicable to any population made up
of these units no matter how many and regardless of when they were installed. This curve is the
distribution of failure likelihood for one of these units, as a function of service age. It gives the
probability that a unit of this type will fail at a certain age. The area under this curve (including a small
portion of it beyond 70 years that is not shown) is 100%. The left panel is plotted against time in
years and the right panel against years of service, both out to 70 years.
Failure-Count Diagrams
The right side of Figure 8.5 shows this same failure-count diagram with the scale changed
from a base of 100,000 units to just one unit. This diagram is basically a probability
distribution of when a unit can be expected to fail. The difference between this diagram and
a failure rate diagram (left side of Figure 8.4) is that the failure rate curve gives the
probability of failure for a unit as a function of its age, n, assuming it has lasted the
previous n-1 years. The diagram shown at the right of Figure 8.5 gives the likelihood of
failure in year n taking into account that the unit may not have lasted to year n. To
distinguish this curve from failure rate curves like that shown in Figure 8.4, it will be called
a failure count curve, even if scaled as it is in Figure 8.5 to a percentage basis.
This is an interesting and very useful diagram, because it applies to any population
made up of this same type of transformer. The curve gives, in relative terms, how much
transformers of a particular age contribute to an installed base's failure rate. For any
population of this type of unit, no matter what the mix of ages - some installed last year,
some installed long ago - it is still those units that have reached 44 years of service that
contribute most to the system's problems. Units older than that fail with a higher likelihood
(left side of Figure 8.4) but there are too few to generate as high a total count.
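The relationship between a failure rate curve and a failure count curve can be sketched in a few lines. The rate curve used here is hypothetical (the transformer data behind Figure 8.4 are not reproduced in the text), so the peak year it prints is not expected to match the year 44 quoted for Figure 8.5; the function and variable names are likewise illustrative.

```python
def survival_and_failure_count(rates):
    """Turn a failure rate curve into survival and failure count curves.

    rates[k] is the failure rate in year k+1 for units that lasted k years.
    Returns (remaining, count), both per unit of original population:
      remaining[k] = fraction still in service at the end of year k+1
      count[k]     = fraction of the original population failing in year k+1
    """
    remaining, count = [], []
    in_service = 1.0
    for rate in rates:
        failed = in_service * rate   # rate applies only to units still in service
        in_service -= failed
        count.append(failed)
        remaining.append(in_service)
    return remaining, count


# Hypothetical escalating curve: 1.5% when new, rising 5% per year of age.
rates = [min(1.0, 0.015 * 1.05 ** k) for k in range(70)]
remaining, count = survival_and_failure_count(rates)

# Age at which units contribute most to the failure count.
peak_year = max(range(len(count)), key=count.__getitem__) + 1
print(peak_year, round(100 * count[peak_year - 1], 2))
```

The count curve peaks where the product of the rising failure rate and the shrinking number of survivors is largest, which is why the oldest units, despite their high failure rates, contribute few failures in total.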
Information like Figure 8.5, when developed for a specific situation (this diagram is for
only one particular transformer type as applied at one specific utility), is the foundation for
studies of proposed inspection, service, and replacement policies as well as various asset
management strategies. Application to strategic and tactical planning of that type will be
discussed later in this chapter.
Example 2: A More "Real World" Case
The preceding example provided useful insight into equipment failure characteristics, and
led to the failure count contribution curve (Figure 8.5), a very useful analytical tool. But to
see the full implications of failure rate escalation on a utility and understand how certain
replacement policies might reduce failure count, it is necessary to sacrifice Example 1's
simplicity for more realism. Specifically, one needs to look at a situation where failed units
are replaced, which is the fact of life in the "real world." Thus, Example 2 builds on
Example 1, using the same type of service transformers. It:
1. Assumes, as before, 100,000 units are installed in year 0.
2. Assumes a period of 70 years.
3. But assumes that failed units are replaced immediately with new
ones, keeping the overall count at 100,000.
4. And also assumes these replacement units follow the same failure
rate curve as the original units.
5. Looks at the entire population of transformers that results from
these assumptions, at the end of 70 years.
This is a more realistic example, as it represents what a utility has to do - keep the same
number of units in service, replacing failed units when they fail.1 It is still not completely
realistic because of the initial "build" of 100,000 units in year zero, but for the moment that
is not an issue. The important point in the overall equipment base and its interaction with
failures in this example is that replacement units can fail, too.
What does this installed equipment base look like with respect to age distribution of
units in service, average failure rate, and failure count? Figure 8.6 shows the distribution
of ages of the 100,000 units in the system after 70 years. As in Example 1, in year 1 (the
first year) 1,500 units failed, but here they were replaced with new units, which are a
year newer. Thus, at the end of the 70-year period, those "first year" replacements that
have survived are now 69 years old. But those 1,500 units did not all survive. They failed
with the same characteristic trend as the original set, meaning that 1.5%, or about 22, failed in
their first year of service and were replaced in year 2. Their replacements are now 68
years old, assuming they lasted and did not fail in some interim year.
Furthermore, in year 2, 1,478 of the original 100,000 units failed (1.5% of the 98,500
original units remaining after year 1). Thus, a total of 1,500 replacement units were
installed in year 2 (1,478 + 22). Those replacements began to fail along the same trend as
the original units. Thus, in year 3 there were failures of units that had been installed in
years 0, 1, and 2, etc.
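The bookkeeping described above can be sketched as a simulation that tracks how many units there are of each age. The failure rate function below is a stand-in: it is held at 1.5% for the first two years of service to match the figures quoted above, and its escalation after that is an assumption, as are the function names.

```python
def example_rate(age):
    """Hypothetical failure rate in a unit's `age`-th year of service."""
    return min(1.0, 0.015 * 1.05 ** max(0, age - 2))


def simulate_with_replacement(rate_at_age, units=100_000.0, years=70):
    """Expected-value simulation with immediate replacement of failed units.

    Returns (ages, failures):
      ages[a]     = units that have completed a years of service at the end
      failures[y] = total failures (and so replacements) in year y+1
    """
    ages = [units]   # everything is installed new in year 0
    failures = []
    for _ in range(years):
        failed = [n * rate_at_age(a + 1) for a, n in enumerate(ages)]
        total = sum(failed)
        # Survivors grow one year older; replacements enter as new units.
        ages = [total] + [n - f for n, f in zip(ages, failed)]
        failures.append(total)
    return ages, failures


ages, failures = simulate_with_replacement(example_rate)
print(round(failures[0]))   # 1500 failures in year 1
print(round(failures[1]))   # 1500 in year 2: 1,477.5 originals plus 22.5 replacements
print(round(sum(ages)))     # 100000 units still in service
```

Because the calculation works with expected values, it yields 22.5 and 1,477.5 where the text rounds to 22 and 1,478. The later years depend entirely on the assumed escalation and are not meant to reproduce Figure 8.6.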
Eventually, as the population ages and its average failure rate rises, the utility sees the
annual replacement rate over the entire population rising to about 3,000 units a year, and
it finds itself replacing units of all ages. And the net result, when all of these
replacements and replacements for replacements, etc., are added up, is that the utility had
to install about 60,000 additional replacement units during the 70 year period. More than
half of the original units failed. And following the failure count contributions of Figure
8.5, most of those that failed were "middle aged transformers" - those in the 15- to 45-
year-old range.
1 The fact that this example assumes the system is created from scratch in year zero has no impact
on the results given here. As shown earlier, by year 70 only 2 of the original units are left. The
population consists of units that have been replaced and in some cases failed and replaced again. As
a result after 70 years there is only an insignificant "start effect" involved in the data - the model's
results in year 70 represent a fairly stable look at what year-to-year operation for the utility would be
with respect to service transformers.
Figure 8.6 Distribution of ages of units in the Example 2 system, 100,000 units that have been
replaced as they failed over the last 70 years. Horizontal axis: age in years, out to 70 years. The
area under the curve equals 100,000 units.
A key point: the failure contribution curve from Example 1 (Figure 8.5) applies to this
example as well. Again, as stated earlier, that curve applies to all populations made up of
this same type of unit. Readers who are uncertain of this should stop and reflect on this
fact before moving on. For any such population, this curve is a representation of the
relative contribution to failures of units as a function of their age. Thus, the failure
contribution curve is a very important tool in managing reliability and replacement
policies. It will be discussed in detail later in this section.
Figure 8.6 shows the resulting equipment base's distribution of transformer ages, after
70 years for this example. It has nearly an even distribution of transformers from age 0
(new) to 30 years of age. At about 35 years of age the count takes a rapid plunge - this is
the period (starting at about 35 years in service) during which the bulk of failure counts
occur (see Figure 8.5), and thus the age when a good deal of replacements had to be made.
In this system, at the end of 70 years, the average unit in service is 22 years old.
However, due to the escalation of failure rates as units age, those older than the average
contribute a good deal more to the average failure rate. The entire population has an
average failure rate of 3.15%, or more than twice that of new units. That figure corresponds
to the failure rate of a unit that is 29 years old (see Figure 8.4, left). Thus, while the average
age of this population is 22 years, its failure rate is equal to that of a population made
up of 29-year-old units.
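The gap between average age and "equivalent" age follows from weighting an escalating failure rate by the age distribution. The sketch below uses a toy population and a hypothetical quadratic rate curve purely to show the effect; none of the numbers or names come from the transformer example.

```python
def population_failure_rate(age_counts, rate_at_age):
    """Average failure rate of a mixed-age population over the coming year.

    age_counts[a] is the number of units that have completed a years of
    service, so each of them faces rate_at_age(a + 1) next.
    """
    expected = sum(n * rate_at_age(a + 1) for a, n in enumerate(age_counts))
    return expected / sum(age_counts)


def average_age(age_counts):
    return sum(a * n for a, n in enumerate(age_counts)) / sum(age_counts)


def equivalent_age(age_counts, rate_at_age, max_age=200):
    """Youngest age whose own failure rate reaches the population average."""
    target = population_failure_rate(age_counts, rate_at_age)
    for age in range(1, max_age + 1):
        if rate_at_age(age) >= target:
            return age
    return None


def toy_rate(age):
    # Hypothetical escalating curve: 1% in year 10, 9% in year 30.
    return 0.0001 * age * age


counts = [0] * 30
counts[9] = 50    # fifty units entering their 10th year
counts[29] = 50   # fifty units entering their 30th year

print(average_age(counts))                 # 19.0
print(equivalent_age(counts, toy_rate))    # 23
```

Because the rate curve bends upward, the older half of the population raises the average failure rate more than the younger half lowers it, so the equivalent age exceeds the average age.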
Other Example Cases
The author ran a number of other cases, with increasingly realistic representations of actual utility
operations, through the simulation. However, they revealed no outstanding additional conclusions
useful for managing transformer failures and lifetime (as will be discussed in the next section).
The cases studied and the conclusions drawn from them were:
Case 3: A simulation that has the transformers being added gradually, rather than
all in one year, results in a population little different than shown in Figure 8.6.
That plot is fairly representative despite its "all in one year" scenario. Any "end
effect" of modeling all the units as starting in one year is worked out of the model
by the end of any simulated period longer than 50 years.
Case 4: Growth of the population was modeled in several cases, where the utility
had to expand the number of transformers each year by an amount that varied
from 1 to 3% annually. Figure 8.7 shows the types of changes that result from
applying an annual growth rate to the population. Of course, the total number of
units is greater than 100,000 in these scenarios. These populations become
somewhat "younger" and there is a sloped rather than a flat distribution of age
over the 0- to 30-year-old period.
Case 5: A period of high growth, lasting ten years with a growth rate of 5%, was
modeled as having occurred in the past. The result is a "bulge" in the age
distribution around the time (years in the past) of that growth period, as shown in
Figure 8.8. In cases where the growth spurt occurred about 30-40 years ago, this
large population of "now failing units" (see Figure 8.5) results in a relatively high
failure rate for the entire population. Where it occurred more than 50 years ago,
the impact is minimal - most of the units added then have failed since then.
2 Figure 8.5 can also be interpreted as giving the relative age of units when they are replaced by the
utility over the course of its annual O&M in each year.
Replacement Policy Analysis
Suppose this utility decided to replace all units as they reach fifty years of age, even if they
appear to be in good condition. Looking at the data for Case 2 (Figure 8.6), it would have to
replace only about 75 units annually - not a tremendous cost, particularly considering the
units will have to be replaced pretty soon anyway (they will most likely fail in a few years
at that age). However, the impact on the overall failure rate would be insignificant, making
no real difference in the average age or failure rates for the total population. The real
contributor to the system's annual failure count comes from units that are aged 25 to 45
years, because there are so many of them. Replacement at age 50 gets to the units after too
many have failed.
Replacement of units at age 40 has a far different effect. First, nearly 1,000 units a
year have to be replaced, so the annual cost is roughly 12 times that of a 50-year
replacement policy. But a noticeable portion of the system's unexpected failure rate is
avoided. The average failure rate drops to 2.6% (from 3.1%), a reduction in unexpected
failures of nearly 20%, achieved by replacing only about 1% of the units in the system
annually. Average age of a unit under this policy falls from 22 years given earlier to less
than 18 years. The population's average failure rate of 2.6% under this policy is
equivalent to the failure rate of a 25-year-old unit.
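A comparison of age-based replacement policies can be sketched under a steady-state assumption, in which the age mix no longer changes from year to year. This is a simplification of the 70-year simulation described above, and the failure rate curve is hypothetical, so the printed figures illustrate the trade-off rather than reproduce the 75 and 1,000 units quoted here.

```python
def example_rate(age):
    """Hypothetical failure rate in a unit's `age`-th year of service."""
    return min(1.0, 0.015 * 1.05 ** max(0, age - 10))


def steady_state_policy(rate_at_age, replace_at, units=100_000.0):
    """Annual unexpected failures and planned replacements at steady state.

    Units that complete `replace_at` years of service are replaced on
    schedule; units that fail earlier are replaced as they fail. At steady
    state the number of units of each age is proportional to the chance of
    a new unit surviving to that age.
    """
    survival = [1.0]   # survival[a] = chance a new unit completes a years
    for age in range(1, replace_at + 1):
        survival.append(survival[-1] * (1.0 - rate_at_age(age)))
    installs_per_year = units / sum(survival[:replace_at])
    failures = installs_per_year * (1.0 - survival[replace_at])
    planned = installs_per_year * survival[replace_at]
    return failures, planned


for cap in (50, 40):
    failures, planned = steady_state_policy(example_rate, cap)
    print(cap, round(failures), round(planned))
```

Lowering the replacement age trades more planned replacements for fewer unexpected failures. Whether that trade is worthwhile depends on the relative costs, which this sketch does not model.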
Thus, run-to-failure usually makes economic sense
Whether a replacement policy makes economic sense or not is something for
management to weigh in its evaluation of how best to spend the (always-limited) monies
it has for system improvements. The marginal cost of replacement can be weighed
against the marginal gain in reliability obtained, as well as compared against the marginal
cost of similar reliability gains available through other avenues. What is most important
in the foregoing examples is that the analytical results of a rather simple analysis of age,
failures, and remaining units can provide the type of tool needed to support such
decision-making.
Figure 8.7 Distribution of ages of units in examples that included annual growth of the utility (and
thus its population of transformers). Curves shown: high growth, low growth, and no growth
(Case 2), plotted against age in years, out to 70 years.
Figure 8.8 A period of high growth, here a 5% annual addition of units for ten years, results in a
"bulge" in the age distribution around the time of the growth spurt that lasts about 40 years, failures
gradually working it out of the population. All cases shown here assume replacement of units as they
fail. Note that the 5% for 10-year growth results in a greater area under the curve. The shapes of the
base and 50-60 years ago curves are nearly identical. Curves shown: high growth 15-25 years ago,
high growth 30-40 years ago, high growth 50-60 years ago, and no growth (Case 2), plotted against
age in years, out to 70 years.