Safety Science 45 (2007) 61–73

www.elsevier.com/locate/ssci

Statistics of design error in the process industries


J. Robert Taylor
Rambøll Danmark A/S, Bredevej 2, DK-2830 Virum, Denmark
E-mail address: rrt@ramboll.dk

Abstract

The paper addresses questions on how frequently incidents and accidents are caused by design errors and how significant design reviews are in removing design errors before a system is put into operation. It is based on a review of earlier studies mainly from the chemical and nuclear industries. The studies report that from about 20% to 50% of the studied incidents and accidents have at least one root cause attributed to erroneous design. The number of design errors actually occurring during the design process is much higher, but 80–95% of them are removed by thorough design reviews. To improve the design process further, it is necessary to analyse the nature and causes of design errors through first hand knowledge about the design process.
© 2006 Elsevier Ltd. All rights reserved.
0925-7535/$ - see front matter. doi:10.1016/j.ssci.2006.08.013

Keywords: Safety; Incidents and accidents; Design error; Design review; Chemical industry; Nuclear industry

1. Introduction

Design error is one of the most frequent causes of system failure and of accidents in the process industries, but has nevertheless been largely overlooked in risk analysis of process systems and control systems. Evidence for this is given in this paper. It is based on a series of studies by the author as participant in the design process over a period of 35 years, covering both studies of design errors and methods for reducing the incidence of such errors.
Collection of data on design error is not straightforward. Evidence of design error appears in accident reports and in trouble reports from customers to manufacturers. As will be shown, only a small percentage of errors actually reach the stage where they cause accidents or operations problems. By far the majority of errors are removed from systems and plants before they are put into operation. For this reason, it is necessary to participate in the actual design process, in order to be able to collect the data on which improvements in the design process can be based.
Methods such as hazard and operability and functional failure analyses are partially effective in detecting design errors. The actual effectiveness is reviewed here on the basis of practical studies of large-scale systems and on experiments intended to elucidate specific problems. One of the classical methods, Hazard and Operability (Hazop) analysis, is found to give more than an 80% chance of discovering those errors which lie within its domain of application.

2. Definition of design error

A design error may be defined as a feature of a design which makes it unable to perform according to its specification. It is rare that a design fails under all circumstances, and the definition generally means that there are some circumstances, within the scope of the specification, under which the system does not match its specification.
There are some problems with this definition. For many systems, the specification is inadequate, and needs to be supplemented by general statements, such as “additionally the systems should work in a European climate” or “in addition to performing according to the specification, the system should not produce hazardous outputs”. For most systems, there are a very large number of requirements which are included in the specification by reference, or are implicit. Many of the requirements which are not stated in the design documents are nevertheless explicit in legal requirements, or in standards which are legally binding. To understand design error, and even to determine whether a design error has occurred, it is necessary to understand this implicit or indirect background. To complicate matters even further, specifications may contain errors, which lead them to diverge from the designer’s or the purchaser’s true intentions. For these reasons, a more pragmatic definition may sometimes be used (Taylor, 1975):
“During analysis of incident records, a design error is deemed to have occurred, if the design or operating procedures are changed after an incident has occurred.”
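
The pragmatic criterion can be read as a simple rule applied to each incident record. The Python sketch below is only an illustration of that rule; the record fields `design_changed` and `procedure_changed` are hypothetical names and are not taken from the original incident reports.

```python
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    """Hypothetical summary of one incident report (field names are assumptions)."""
    design_changed: bool     # was the design changed after the incident?
    procedure_changed: bool  # were operating procedures changed after the incident?


def is_design_error(record: IncidentRecord) -> bool:
    """Pragmatic criterion: any post-incident change to design or procedures."""
    return record.design_changed or record.procedure_changed


example = IncidentRecord(design_changed=False, procedure_changed=True)
print(is_design_error(example))  # True
```

Under this rule a record with a procedure change but no hardware change still counts as a design error, which is the broader reading the quoted definition implies.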

3. Statistics of design error

3.1. Accident statistics

Statistics of accident causes are important because they give us an idea of how accidents arise in practice, and help prevent us focussing on the purely anecdotal. One of the first published studies of design error is useful in this way. It was carried out on “abnormal occurrence reports” published by the US Nuclear Regulatory Commission (NRC) in the 1960s and early 1970s (Taylor, 1975, 1976). The criterion for whether a design error occurred was an objective one. If a design change was made as a result of the incident, then a design error or omission was considered to have occurred. In order to make this definition compatible with that in the previous section, we need to add the errors in procedures (10%) to the design errors (35%), making 45% design errors in total. The results of this study are given in Tables 1–3. In all, 250 reports were assessed over a 10-year period of operation.

Table 1
Incident causes for nuclear reactors
Error cause (total number of errors N = 422)    % of total errors
Design error                                    35
Component failure                               18
Operator error                                  12
Error in procedure                              10
Maintenance or installation error               12
Fabrication fault                               1
Cause unknown or unrecorded                     12

Table 2
Causes of design errors for nuclear reactors
Design error cause                              % of design errors
Component selection                             14
Oversight                                       17
Effect unknown at design time                   25
Sizing, dimensioning error                      13
Complex interactions overlooked                 7
Communications problems                         1
Cause unknown or unrecorded                     22

Table 3
Causes of errors in procedures in nuclear reactors
Cause of procedure error                                % of errors in procedures
Omission of step in procedure                           56
Omission due to effect or need unknown at design time   16
Procedure unclear or ambiguous                          7
Wrong test frequency specified                          2
Wrong procedure specified                               2
Extra checks required                                   6
Cause unknown or unrecorded                             14
Total number of procedural errors: 42

Some conclusions can be drawn from this assessment. Firstly, improving component selection could presumably reduce the total number of incidents by about 5%. This could be accomplished by having better specifications, specification checking, and application rules. Sizing and dimensioning errors could probably be reduced in a similar way by using computer aids which calculated a wider range of constraints and requirements. By far the biggest reduction would arise by preventing oversights and effects due to lack of knowledge. These causes point to some kind of checking based on actual process knowledge.
If procedural design is included in the range of problems treated, then the most obvious place for improvements is in planning and scoping procedures, since a simple oversight in the need for a particular procedure is the prime cause of error shown in Table 3.
Note that in all cases, design and procedural design errors are more significant than random component failures of the kind treated by traditional reliability analysis techniques. Note also that some of the operator errors and maintenance errors could probably be reduced by better man-machine design, and procedure design. In all, this study estimated that 70% of incidents appear to be susceptible to improvements in the design process.
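
The "about 5%" estimate for component selection appears to follow from combining Tables 1 and 2: component selection accounts for 14% of design errors, and design errors for 35% of all errors. The Python sketch below reproduces that arithmetic for each design error cause. It assumes, as the text implicitly does, that a share of errors can be read as a share of incidents; the variable names are illustrative.

```python
# Percentages copied from Table 1 (share of all errors)
# and Table 2 (share of design errors).
table1 = {"Design error": 35, "Error in procedure": 10}
table2 = {
    "Component selection": 14,
    "Oversight": 17,
    "Effect unknown at design time": 25,
    "Sizing, dimensioning error": 13,
}

design_share = table1["Design error"] / 100  # design errors as a fraction of all errors

# Express each design error cause as a percentage of all errors.
share_of_total = {cause: pct * design_share for cause, pct in table2.items()}

for cause, pct in share_of_total.items():
    print(f"{cause}: about {pct:.1f}% of all errors")

# Design errors plus errors in procedure, as used for the 45% total.
design_plus_procedure = table1["Design error"] + table1["Error in procedure"]
print(f"Design plus procedure errors: {design_plus_procedure}% of all errors")
```

On the same basis, oversights and effects unknown at design time together make up 42% of design errors, or roughly 15% of all errors, which is consistent with the remark that these offer the biggest potential reduction.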

Table 4
Incident primary causes for nuclear reactor safety related incidents from 1980s
Error cause (total number of errors N = 100)    % of total errors
Design error                                    46
Component failure                               11
Operator error                                  9
Error in procedure                              15
Maintenance or installation error               17
Fabrication fault                               1
Cause unknown or unrecorded                     8

The data from the 1975 study are necessarily out of date: they represent design practice from the 1950s. In order to test whether design practices have changed, the abnormal occurrence study was repeated for incidents on newer plant, using data from the 1980s (Taylor, 1997). The results nevertheless represent the design practices of the 1970s, since the data are for nuclear plants operating in the 1980s. One hundred incident reports were studied (Table 4). The results showed an increased percentage of incidents involving design error. This may be because other causes of failure became less frequent, since there was a significant improvement in equipment reliability resulting from more mature designs and the move to solid-state electronics over the 10–15 years between the studies.
The studies resulted in some important observations:

• The need to distinguish between system design (interconnecting a set of components) and component design (selecting specific component types and dimensioning). Many generalised analysis techniques, such as FMEA and Hazop, exist for checking system designs. Component design involves selection and calculation, which almost always depend on specific knowledge, and for which there are at best check list methods to support design review.
• The importance of lack of knowledge among designers, and the introduction of the classification “non culpable ignorance” to cover the cases where an incident provides completely new knowledge about accidents. The frequency of “new and unusual” accident phenomena has been of continuing interest since this study, because they set the limit to how well we can analyse and predict accidents, and therefore how well we can prevent them. If every accident occurred in some new way, then our risk reduction and loss prevention efforts would be useless.

A similar kind of study was carried out by Haastrup (1984) in the chemical industry, using Manufacturing Chemists Association and Loss Prevention Bulletin incident descriptions. The proportion of incidents classified as arising from design error is here about 25%. All of the above analyses are to some extent pre-selected, or are in some specialised area of engineering. The first were from the nuclear industry, while Haastrup’s set was taken from publications which focus on “interesting” accidents. A further study was made by the present author of 121 accidents reported under the major hazards scheme to the European Joint Research Centre MARS database, and published by Drogaris (1993). These are not pre-selected and should therefore, within the limits of compliance with the reporting requirements, be more representative of accident causes. The results are given in Figs. 1–3 and show that over 50% of accidents have some contribution from design error.

[Figure 1: horizontal bar chart. x-axis: % (scale 0–70). Bars: Unavoidable component failure; Maintenance error, procedure not followed; Operator error, performance error; Operator error, did not follow procedures; Inadequate codes or standards, wrong code used; Inadequate safety analysis; Inadequate lab analysis; Design; Managerial.]
Fig. 1. Causes of 121 chemical industry accidents reported to the MARS accident database.

The 50% figure is potentially higher if the use of wrong codes and inadequate safety analysis are added. This “modern” classification gives several contributing causes for most accidents (the percentages add up to more than 100), hence the exact figure for such a conclusion is hard to estimate. Management error and design error dominate to a larger extent than in the data of Tables 1–3, as might be expected from the change in expectations over a 20-year period. More significant perhaps is the light which the study threw onto management error mechanisms (Fig. 2) and design error mechanisms (Fig. 3).
A set of data from the US Risk Management Programme was also studied. These data are likewise not pre-selected and can be regarded as statistically representative (US Environmental Protection Agency, 2005). This programme requires a risk management report for all plants which have above a certain level of inventory. The reports include a five-year accident history for the plants, covering all accidents with off-site consequences, defined in terms of off-site concentrations of toxic substances from the incidents. The causes of the accidents are self-assessed by the companies (see Figs. 4–6).
As might be expected, the proportion of accidents attributed to design error is lower when a self-assessment is made, and when only one cause can be given. Indeed there is no single category of “design error” defined. Also, the reporting provides little detail about how the errors were made. However, in these figures we can assume that all “unsuitable equipment” causes, some “excessive corrosion”, many “equipment failures” and some “improper procedure” causes may be considered as design errors. Even then, as can be seen from these data, design error is assessed as a cause of only a small fraction of the accidents. A large fraction is still regarded by the engineers reporting them as in some way unavoidable for designers. Consider for example the large proportion attributed to adverse weather conditions. However, most safety engineers today would consider a plant badly designed if bad weather could cause a significant release of toxic material.

[Figure 2: horizontal bar chart. y-axis: error cause; x-axis: % (scale 0–30). Bars, all managerial: inadequate emergency preparedness; no MOC, delayed follow up; poor communications; inadequate security; poor storage procedures; understaffing; failure to respond to warnings; inadequate preparation for maintenance; inadequate permitting; inadequate inspection, integrity audit; inadequate training; poor safety culture; inadequate operation or maintenance procedures.]
Fig. 2. Causes of the managerial errors from Fig. 1.

What conclusions can we draw from these data, if we take the Risk Management Programme records of cause at face value? Design error is obviously a contributor to risk. One approach is then to take the average non-design-error accident frequency, and simply increase it by about 20% to account for design error in a risk analysis. Alternatively, one could take a base rate of “unavoidable” accidents, related to unavoidable equipment failure, and multiply by a factor of about 3 to take into account the failures which we would generally regard as avoidable.
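
Both adjustments described above amount to a simple scaling of a base frequency. The Python sketch below shows the arithmetic only; the function names and the example frequencies are hypothetical, chosen purely for illustration, and are not values from the Risk Management Programme data.

```python
def add_design_error_allowance(non_design_frequency: float,
                               allowance: float = 0.20) -> float:
    """Increase a non-design-error accident frequency by a fractional allowance."""
    return non_design_frequency * (1.0 + allowance)


def scale_unavoidable_rate(unavoidable_frequency: float,
                           factor: float = 3.0) -> float:
    """Multiply a base rate of 'unavoidable' accidents by a factor
    intended to cover failures regarded as avoidable."""
    return unavoidable_frequency * factor


# Hypothetical frequencies (per plant year), used only to show the arithmetic.
print(add_design_error_allowance(1.0e-3))
print(scale_unavoidable_rate(4.0e-4))
```

Either way, design error enters the risk estimate only as a blanket multiplier, with no information about which errors occur or why.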
This line of thinking has been used for many years to justify ignoring process plant design error in risk assessments, or at least not considering it as a special field worthy of study. Even the extensive data collections of failure rates, such as OREDA or Mil Std. 217, do not look at why failures occur (Sintef, 2002; US Department of Defence, 1995). The problem here is that the more serious accidents, more or less by definition, involve design error or management error. No one would today accept the possibility that piping carrying toxic material would simply fail as a result of a long period of corrosion. Similarly, one would expect a pressure vessel today to have a probability of failure of 10⁻⁷ per year or less, and would expect design, manufacturing procedures and non-destructive testing to ensure this. We must expect that the serious accidents in the future will be dominated by those caused by conditions over which we do not yet have full control. At present, these are maintenance error, management error, and design error, and a few types of operator error.
To see whether there is any evidence of this kind of relationship in the Risk Management Programme data, the cause classes were correlated with the size of releases. The results are shown in Figs. 7 and 8. There is a clear correlation, with the design errors (including equipment, corrosion, etc.) leading to larger releases. The results are not as convincing as the arguments above, based on the logic of our expectations of safety engineering, but this is not surprising. The data covered in the figures represent only about 1000 plant years of experience, and do not include the largest accidents which could occur. Nevertheless, it can be seen that “unsuitable equipment” is correlated with large releases, at least for the refinery units.
From all of these studies taken together we get a varying picture, but all show at least 20% of incidents having a significant causal factor in design, and most show much higher figures, around a half or even more, especially if we use the same definition as Kinnersley and Roelen (see this issue) and include inadequate procedures as design errors.

[Figure 3: horizontal bar chart. x-axis: % (scale 0–120). Bars: Lack of knowledge; Lack of knowledge, novel system; LTA feedback; LTA safety awareness; LTA MOC; Lack of qualified staff; LTA communication; LTA standard; LTA analysis; LTA design procedure; LTA specification.]
Fig. 3. Causes of design error for accidents from Fig. 1 (LTA = less than adequate, MOC = management of change).

[Figure 4: horizontal bar chart titled "Cause distribution, refinery piping". x-axis: percentage (scale 0–70). Series: alkylation piping, crude unit piping, reformer piping. Bars: Improper procedure; Hot tap; Bypass; Weather; Upset; Unsuitable equipment; Overpressure; Management; Maintenance; Operator error; Flame impingement; Equipment; Excessive corrosion.]
Fig. 4. Causes of failure (as recorded by plant management) for 58 refinery piping failures.

[Figure 5: horizontal bar chart. x-axis: % (scale 0–25). Bars: Weather; Plant upset; Unsuitable equipment; Overpressure; Management; Human error; Hot tapping; Flame impingement; Equipment; Defective pipe, LTA monitoring; Bypass.]
Fig. 5. Causes of failure (as recorded by plant management) for refinery equipment in general.

[Figure 6: horizontal bar chart titled "Causes, ammonia plant releases". x-axis: % (scale 0–45). Bars: Weather; Unsuitable equipment; Unrecorded; Purging; Overpressure; Management; Human error; Equipment.]
Fig. 6. Causes (as recorded by plant management) for 96 ammonia plant releases.

[Figure 7: chart titled "Size of releases (kg/s) vs release count for ammonia plants". Series: Equipment; Human error; Unsuitable equipment; Weather. Vertical scale 0–25; horizontal scale 1–31.]
Fig. 7. Sizes of releases recorded by class, for ammonia plants.

[Figure 8: chart titled "Refinery releases, size kg/s vs number of failures". y-axis: release, kg/s (logarithmic scale, 0.00001–1000); x-axis: number (1–86). Series: Corrosion; Equipment; Heater flame; Human error; Management; Unsuitable equipment; Weather.]
Fig. 8. Sizes of releases recorded by cause class, for refinery units.

3.2. Collection of design error data in the design office and at the plant

Using historical data from incidents and accidents to determine distributions of causes means, unavoidably, that the data reflect out-of-date design practices. Another source of data comes from design review and safety studies, which look explicitly for design error. The results of these show far more errors made and recovered before they resulted in incidents or accidents.
Three such systematic methods are available which provide significant amounts of data: hazard and operability analysis (Hazop), action error analysis and formal design review. Hazop can be regarded as an effective design error reduction method. It finds design errors in process plants at the piping and instrument diagram level. However, it does not deal with mechanical design errors, and only rarely with dimensioning errors such as pump sizing, and it does not deal very well with the kind of problems arising during start up, shut down, etc., so some design errors will be overlooked.
Results from Hazop studies were available from the design process for eight chemical plants (Taylor, 1991). The data were available from earliest design through to operation. The results give information not only on the occurrence of design errors, but also on the effectiveness of the Hazop procedure. The initial designs seem to include about one systems design error per vessel, depending somewhat on the degree of standardisation. The Hazop process is effective in eliminating between 80% and 98% of these errors, depending on how the analysis is carried out. (The most complete results were obtained by performing thorough cross check analyses, using a very powerful expert system and then following up discrepancies.) Presumably, therefore, between 0.2 and 0.02 systems design errors per vessel persist through to commissioning. Note that there may be other design errors, arising from mechanical design, electrical design, and layout, which are not amenable to Hazop. It has only been possible to gather precise information on these from one of these projects, but the results indicate about 0.5 such errors per vessel persisting through to the commissioning stage.
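
The range of 0.2 to 0.02 residual systems design errors follows directly from the figures quoted: about one error per vessel at initial design, of which Hazop removes between 80% and 98%. A minimal Python sketch of that calculation is given below; the function name is illustrative, and the result is taken to be per vessel, as the initial error rate is.

```python
def residual_errors_per_vessel(initial_errors_per_vessel: float,
                               removal_fraction: float) -> float:
    """Errors per vessel left after a review that removes the given fraction."""
    if not 0.0 <= removal_fraction <= 1.0:
        raise ValueError("removal_fraction must lie between 0 and 1")
    return initial_errors_per_vessel * (1.0 - removal_fraction)


initial = 1.0  # about one systems design error per vessel at initial design
for removal in (0.80, 0.98):
    remaining = residual_errors_per_vessel(initial, removal)
    print(f"Removal fraction {removal:.2f}: about {remaining:.2f} errors per vessel remain")
```

The mechanical, electrical and layout errors mentioned in the text (about 0.5 per vessel in the one project with precise data) come on top of this residual, since they fall outside the scope of Hazop.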
Table 5 gives error rates found by Hazop for a biological waste processing plant, for a chemical waste incinerator, for a sintering furnace, and for a nitric acid plant, all made during the design.

Table 5
Errors per vessel during design, found by Hazop study during design
Analysis                            Design errors found   Vessels   Errors per   Safety related
                                    during HAZOP                    vessel       errors per vessel
Biological waste processing plant   9                     1         9            2
Chemical waste incinerator          48                    9         5.5          5.5
Sintering furnace                   8                     1         8            6
Nitric acid plant                   84                    8         10           8.5

Not all the recommendations from the Hazops can be regarded as reflecting design errors. In many cases the designs were not wrong; they just did not match up to safety expectations. For these aspects, the Hazops should be regarded as part of the design process intended to ensure that safety expectations are met. The data have therefore been divided into two groups of findings/recommendations, those which are needed to make the design work, and those needed to ensure a high level of safety (see the last two columns in Table 5).

[Figure 9: horizontal bar chart titled "Proximate causes of design errors". x-axis: number of errors (scale 0–25). Bars: Wrong specification; Requirement not recognised; Practical constraint not recognised; Mistake; Lack of understanding; Lack of knowledge of standards; Lack of knowledge; Lack of analysis; Inadequate analysis; Drawing error; Difficulty in finding solution.]
Fig. 9. Design error causes for 52 design errors in five chemical plants.

Five plants from the first study (Taylor, 1991) were followed up in detail in post commissioning or operational audits. There were fairly complete details of design errors available, and interviews were carried out with the designers involved. Fig. 9 shows the proximate causes of the design errors. Fig. 10 gives design error root causes. The x-axis of both these figures is the absolute number of errors, not the percentage.
The number of errors persisting through to operation varied from 0.2 to 2.4 errors per vessel type. Of the 52 errors, three gave actual accidents. About 15 would have given accidents within about a year, based on calculated accident frequencies from risk analyses. A further 20 might well have caused accidents over a 10-year period. The remainder would have led to increased consequences if other accidents or disturbances arose.
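
The breakdown of the 52 surviving errors can be tabulated as follows. This Python sketch uses the counts quoted in the text; since two of them are stated only approximately ("about 15", "a further 20"), the remainder computed here is an inference rather than a figure given in the paper, and the category labels are paraphrases.

```python
total_errors = 52
outcomes = {
    "caused an actual accident": 3,
    "would have caused an accident within about a year": 15,    # "about 15" in the text
    "might have caused an accident over a 10-year period": 20,  # "a further 20" in the text
}

# The remainder would have increased the consequences of other accidents or disturbances.
outcomes["would have increased consequences of other events"] = (
    total_errors - sum(outcomes.values())
)

for outcome, count in outcomes.items():
    print(f"{outcome}: {count} of {total_errors}")
```

On these numbers, about 14 errors fall into the last group, and only 3 of the 52 errors that reached operation had actually produced an accident at the time of the audits.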
It was shown in Section 3.1 that a significant fraction of accidents arising in process plant involve design error. The percentage varies from about 20% up to over 50%, depending on the standards set for designs. The studies in this section show that the number of design errors actually occurring during the design process is much higher than the number causing accidents. Fortunately, not all design errors are transferred to the final construction; many are removed as part of the design review, Hazop analysis, and commissioning audit process. Of those design errors which do survive until the operational stage, only a fraction actually cause accidents. Lack of risk analysis, lack of knowledge and inadequate specification and design codes and procedures are the most significant proximal or underlying causes of errors.

[Figure 10: horizontal bar chart titled "Design errors classified by root cause". x-axis: number of errors (scale 0–16). Bars: Novel system; Poor feedback from operation; Poor communication; Poor safety culture; Lack of training; Lack of qualified staff; Lack of information; Inadequate standard; Inadequate specification procedure; Inadequate MOC procedure; Inadequate drafting procedure; Inadequate design procedure; Inadequate awareness of field conditions; Inadequate analysis procedures.]
Fig. 10. Design error root causes for 52 design errors in five chemical plants.

4. Conclusions

From the data presented here we can conclude that design error plays a major role in process plant risk. Current design review methods discover and remove between 80% and perhaps 95% of the errors made, but there is still a design element present in between 20% and 50% of the accidents and incidents which happen in chemical process plant. That percentage depends very much on the quality of the data reporting, the representativeness of the data analysed and the way in which that analysis is performed. In particular, the definition of what to include under the term design error is decisive, as is the way in which the analyst conceives of the responsibility of the designer.
In order to understand more about the nature and causes of design error it is necessary to dig deeper, behind the statistics and the incident analyses. The other paper by this author in this special issue will do that, based on long experience of involvement in the process of design and safety review. That discussion is therefore based on anecdote and case studies, but rooted in a systematic description of the design process.

References

Drogaris, D., 1993. Major Accidents Reporting System. Commission of the European Communities, Joint Research Centre, Ispra.
Haastrup, P., 1984. Design Error in the Chemical Industry. Report Risø-R-500, Risø National Laboratory, Roskilde.
Roelen, A., Kinnersley, S., Drogoul, this issue.
Sintef, 2002. OREDA, Offshore Reliability Database. Sintef, Trondheim.
Taylor, J.R., 1975. A Study of Abnormal Occurrence Reports. Report RISØ-M-1837, Risø National Laboratory, Roskilde.
Taylor, J.R., 1976. A Study of Abnormal Occurrence Reports. IAEA Conference on Reliability of Nuclear Power Plants, IAEA-SM-195/6, Innsbruck.
Taylor, J.R., 1991. Quality and Completeness of Risk Assessment. In: Hazard Identification, 1. Taylor Associates, Copenhagen.
Taylor, J.R., 1997. Design Error. Taylor Associates, Copenhagen.
US Department of Defence, 1995. Reliability Prediction of Electronic Equipment. Military handbook MIL-HDBK-217, Rome Laboratory, Griffiss AFB, NY.
US Environmental Protection Agency, 2005. Chemical Emergency Preparedness and Prevention. Available from: <http://yosemite.epa.gov/oswer/ceppoweb.nsf/content/index.html>.
