Adelard Safety Case Development Manual: ASCAD
/az-kad/
Adelard, 1998
Foreword
Following research for the UK HSE/NII in the 1990s, Adelard published its Safety
Case Development Manual (ASCAD) in 1998. It has since been used successfully in
many organisations worldwide.
In support of the safety community, Adelard has decided to make the manual
publicly available. It can be downloaded, after registration, from our website:
http://www.adelard.com/resources/ascad
While now available free of charge to individuals, copyright is retained by
Adelard. Conditions of use are:
The manual may only be used by the individual who downloads the
document. It may not be passed on to anyone else without permission
from Adelard. Other interested parties should download the document
from our website. Anyone who has difficulty downloading the document
should contact Adelard to discuss other options.
The manual may be used freely by registered users, both for commercial
and non-commercial use.
While Adelard believes the content to be accurate, it accepts no
responsibility for any consequence of use, either direct or indirect. Use of
the manual implies acceptance of this and all other conditions.
The content of the manual may not be reproduced in any format (other
than for backup purposes) without agreement from Adelard in writing.
The document may be used in support of both academic teaching and
research, and in both cases some of the above restrictions may be
waived. Contact <office@adelard.com> for more information.
The document is available free of charge in softcopy only. Hard copy
versions are available at a nominal reproduction charge. Contact
<office@adelard.com> for more information.
ISBN 0 9533771 0 5
Adelard
Adelard is an independent consultancy founded in 1987 by Robin Bloomfield and
Peter Froome. Adelard works on a wide spectrum of problems in the area of the
assurance and development of safety-related computer-based systems, ranging
from formal machine assisted verification to the human and social vulnerabilities
of organisations. We also apply this specialist knowledge to the development and
verification of real industrial systems.
http://www.adelard.com
Adelard of Bath
Adelard takes its name from Adelard of
Bath, a medieval mathematician and
natural philosopher, a crucial figure in the
development of early European thought,
and a major influence in the
revolutionary adoption of the Arabic
notation for numbers instead of the
intractable Roman numerals.
Adelard's most influential works were on mathematics. He translated Euclid's
Elements (still the basis of much of today's mathematics) from Arabic into Latin,
the international language of European scholarship. He was also the author of a
Latin version of a treatise on Arabic arithmetic by al-Khwarizmi, the great Saracen
mathematician whose name, corrupted to "algorism", became the European word
for the new system of numbers.
Version: 1.0
Contents
Part 1 Introduction
1 Scope
2 What is a safety case?
3 The importance of a good safety case
4 Basis of the ASCAD methodology
5 How to use the manual
6 Feedback
7 Acknowledgements
Part 2 Description of the safety case methodology
1 Introduction
2 Overview of approach
2.1 Safety case principles
2.2 Safety case structure
2.3 Types of claim
2.4 Sources of evidence
2.5 Style of argument
3 Safety case development
3.1 Safety case elements
4 Developing Preliminary safety case elements
4.1 Definition of system and project
4.1.1 Operating context
4.1.2 Identify any defined PES (Programmable Electronic System) or component safety requirements
4.1.3 Existing safety and project information
Part 1 Introduction
1 Scope
This manual defines the Adelard safety case development methodology (ASCAD),
which seeks to minimise safety and commercial risks by constructing a
demonstrable safety case. The ASCAD methodology places the main emphasis
on claims about the behaviour of the system (i.e. functional behaviour and system
attributes) and on methods for structuring safety arguments so that they are both
understandable and traceable.
The overall approach used in ASCAD is generic and applicable across a wide
range of technologies. The details of the approach are concerned with safety
cases for computer-based command, control and protection systems, such as
those found in railway signalling, nuclear reactor protection, air traffic control and
safety-critical medical devices, as well as many diverse military applications.
ASCAD can be applied both to new systems, using bespoke or COTS
components, and to the retrospective development of safety cases.
Many problems in producing an acceptable safety case arise from an
attitude that regards the safety case as a bolt-on accessory to the system
(often produced after the system has been built). At that stage it is often
discovered that retro-fitting the supporting safety case is expensive and
time-consuming, because the design does not minimise the scope of assessment
and evidence is costly to produce retrospectively. The overall ASCAD
approach can be applied to existing systems, but the safety case options are
more constrained.
The manual assumes that the reader is familiar with the concepts of safety
management systems, quality management systems and safety analysis in
general. There is already a large body of guidance in these areas and the
uniqueness of this manual is its emphasis on addressing the construction of safety
cases. We also assume a familiarity with the system safety context as elaborated
in Appendix A.
A generalised form in Def Stan 00-42 Part 2, as the software reliability case
The approach has evolved during this period, but largely through extensions to
the methodology rather than changes to earlier ideas. While the methodology is
likely to evolve further, we believe that the current ASCAD provides a good basis
for safety case development.
6 Feedback
We are keen to receive feedback on this manual. Please send comments to
ascad@adelard.co.uk, see our www page at http://www.adelard.co.uk or write
to Robin Bloomfield, Adelard, 3 Coborn Road, London E3 2DA.
7 Acknowledgements
The manual was produced by Peter Bishop, Robin Bloomfield, Luke Emmet, Claire
Jones and Peter Froome. Some of the underlying technical work was undertaken
in the CEC sponsored SHIP project (ref. EV5V 103). More recent material has come
from the Quarc project funded by the UK (Nuclear) Industrial Management
Committee (IMC) Nuclear Safety Research Programme under Scottish Nuclear
contracts 70B/0000/006384 and PP/74851/HN/MB.
2 Overview of approach
2.1 Safety case principles
We define a safety case as:
a documented body of evidence that provides a demonstrable and
valid argument that a system is adequately safe for a given application
and environment over its lifetime.
To implement a safety case we need to:
make an explicit set of claims about the system
produce the supporting evidence
provide a set of safety arguments that link the claims to the
evidence
make clear the assumptions and judgements underlying the
arguments
allow different viewpoints and levels of detail
The following sections describe how we think a safety case should be structured
to meet these goals.
A safety case consists of the following elements: a claim about a property of the
system or some subsystem; evidence which is used as the basis of the safety
argument; an argument linking the evidence to the claim, and an inference
mechanism that provides the transformational rules for the argument. This is
summarised in the figure below.
[Figure: argument structure. A claim is supported via an inference rule by evidence and by subclaims, which are themselves supported by further inference rules and evidence.]
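As an illustration, the elements of this structure can be represented as a simple tree; the following Python sketch (the class names are illustrative, not part of the methodology) shows a claim supported by arguments that cite evidence or subclaims:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    description: str                 # e.g. "statistical test results"

@dataclass
class Claim:
    statement: str                   # property claimed of the system or subsystem
    arguments: list["Argument"] = field(default_factory=list)

@dataclass
class Argument:
    inference_rule: str              # transformational rule linking support to claim
    support: list = field(default_factory=list)   # Evidence and/or subsidiary Claims

# A top-level claim supported by a subclaim, which is in turn backed by evidence.
sub = Claim("Dangerous failure rate is below target",
            [Argument("statistical inference", [Evidence("reliability test results")])])
top = Claim("The system is adequately safe for the given application",
            [Argument("all identified hazards are mitigated", [sub])])
```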
The choice of argument will depend on the available evidence and the type of
claim. For example, claims for reliability would normally be supported by statistical
arguments, while other claims (e.g. for maintainability) might rely on more
qualitative arguments such as adherence to codes of practice.
In addition, the overall argument should be robust, i.e. it should remain valid even if
there are uncertainties or errors. For example, two independent arguments could
be used to support the top-level safety claim about a given system. Alternatively,
if there are two independent systems that can assure safety, it may only be
necessary to have a single argument for each one. Typically the strength of the
argument will depend on the integrity level associated with the specific system. At
the highest integrity level (Level 4) we might expect two independent arguments
for a single system regardless of the existence of other systems, as illustrated in
the figure below.
[Figure: a claim supported by two independent arguments: Argument 1 based on Evidence A and Evidence B, and Argument 2 based on Evidence C]
assumptions, which are necessary to make the argument, but may not
always apply in the real world
sub-claims, derived from a lower-level sub-argument
fail-safety
functional correctness
accuracy
time response
robustness to overload
maintainability
modifiability
The relevant attributes should be identified and, where possible, quantified. Note
that the attributes listed are only examples and further attributes may be safety-relevant. This is elaborated later in Section 4.2.1.
the design
The choice of argument will depend in part on the availability of such evidence,
e.g. claims for reliability might be based on field experience for an established
design, and on development processes and reliability testing for a new design.
[Figure: failure behaviour state model. Fault activation takes the OK state to the error state; error correction returns it to the OK state; safe failure moves the error state to the safe state, and dangerous failure moves it to the danger state. The transitions depend on fail-safe design, partitioning and the existence of safe states.]
A particular safety argument can focus on claims about particular transition arcs.
The main approaches are listed below:
A fault elimination argument can increase the chance of being in the
perfect state and can hence reduce or eliminate the OK → erroneous
transition. This is the reasoning behind the requirement to use formal methods
(e.g. in MOD DS 00-55), which essentially supports a claim that the error
transition rate is zero because the software correctly implements the
specified logical behaviour.
A failure containment argument can strengthen the erroneous → OK or
erroneous → safe transition. An example would be a strongly fail-safe design
which quantifies the fail-safe bias. This, coupled with test evidence bounding
the error activation rate, would be sufficient to bound the dangerous failure
rate (see the sketch after this list).
A failure rate estimation argument can estimate the OK → dangerous
transition. The whole system is treated as a black box, and probabilistic
arguments are made about the observed failure rate based on past
experience or extensive reliability testing.
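As a minimal numeric sketch of the failure containment argument (the rates and bias below are illustrative assumptions, not values from the manual):

```python
# Test evidence bounds the rate at which errors are activated; the quantified
# fail-safe bias gives the fraction of activated errors driven to the safe state.
error_activation_rate = 1e-3   # activations per hour (assumed bound from testing)
fail_safe_bias = 0.999         # assumed fraction of failures that end safely

# The dangerous failure rate is then bounded by the activated errors that
# escape the fail-safe mechanism.
dangerous_failure_rate = error_activation_rate * (1.0 - fail_safe_bias)
print(f"{dangerous_failure_rate:.1e} dangerous failures per hour")  # 1.0e-06
```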
It is also possible to apply the arguments selectively to particular components or
fault classes, e.g.:
A design incorporates a safety barrier, which can limit dangerous failures
occurring in the remainder of the system. The safety argument would then
focus on the reliability of the barrier rather than the whole system.
Different countermeasures might be utilised for different classes of fault.
Each fault class then represents a separate link in the argument chain, and
all fault classes would have to be covered to complete the argument chain.
For example, design faults might be demonstrated to be absent by formal
development, while random hardware failures are covered by hardware
redundancy.
While normally applied to incorrect logical behaviour, the same approach can
be applied to many of the other safety attributes. For instance, to ensure
timeliness, timing errors could be:
the safe and hazardous plant states (or equipment states) and target
failure probabilities
hazardous / safe states of the interfaces
anticipated changes in external equipment, interfaces and operating
modes
any operational or maintenance requirements such as maintenance
levels, repair times, manning intervals
safety functions
reliability requirements
System attributes
Accuracy
Availability
Fail-safety
Logical correctness
Maintainability (e.g. MTTR)
Maximum input and output data rates
Maximum response time
Maximum storage capacity (e.g. permanent records)
Modifiability (with respect to identified functional changes)
Real-time performance
Reliability (e.g. MTTF, pfd)
Response to hardware failures
Response to internal failures
Response to overload (data rate, internal storage)
Security
Timeliness
Usability
Table 1: Computer system attributes
software or may be addressed by other parts of the system (e.g. fault tolerance
may be implemented entirely in hardware). In addition, the software
implementation must cope with the constraints imposed by the specific choice of
hardware.
Software attributes
Accuracy
Compliance with hardware constraints (e.g. memory
capacity)
Fail-safety
Fault tolerance
Logical correctness (sometimes represented by the
software integrity level)
Maintainability
Modifiability (with respect to identified functional
changes)
Reliability
Response to hardware failures
Response to internal failures
Response to overload (data rate, internal storage)
Time response
Table 2: Software attributes
[Figure: hierarchy of claims. The plant safety requirement (accident probability) is decomposed into safety functions 1 and 2; these are allocated through the system architecture to hardware functions and computer system functions (with dangerous failure rate and availability targets), and the computer system functions are in turn decomposed into software functions.]
[Table fragment: example design features supporting attributes. Modifiability: additional software to allow parameterisation. Availability: recovery routines, redundant channels, voting algorithms. Security: data encryption mechanisms, password authentication, network isolation (hardware and software measures).]
In this way layered safety cases are developed, i.e. a top-level safety case with
subsidiary traceable safety cases for subsystems.
[Table: format of the Architectural safety case tables, with columns Design Features, Assumption/Evidence and Subsystem Requirements. The Assumption/Evidence column records the evidence either needed (assumption) or used to substantiate the claim (see Section 5.2), and is used to document and trace assumptions (see Section 5.3). A following table fragment lists the argument headings Fault Avoidance, Error Tolerance and Fail-safe bias.]
[Figure: design with a pressure limit switch connected to the safety logic over a 1,000 metre link]
comparisons between sensors can identify failures and hence improve fault
diagnosis and availability. However the safety justification would be extremely
difficult without detailed analysis of the smart sensor software and hardware. In
fact it is possible to produce a simple design which meets the safety and
operational requirements without excessive reliance on computer-based
elements as shown below.
[Figure: simple design: a high-pressure pipe monitored by a limit switch and an analogue pressure sensor (4-20 mA signal), with isolated repeater signals carried 1,000 metres to the safety logic]
Having identified the risks, the options and possible trade-offs should be reviewed.
This review will include the viewpoints of the developer, operator, licenser,
purchaser and maintainer. Also, the candidate design, system requirements,
safety case evidence and arguments, and the long term support requirements,
should be agreed with these stakeholders.
Appendix F provides a checklist for safety case reviewing.
The completed implementation safety case for a subsystem will provide evidence
that:
the design features, V&V and safety analysis demonstrate that the
required attributes were implemented
all sub-contracted components have been implemented to specification,
and implement their required attributes
all deviations are documented, and their impact has been analysed and
justified
As the project evolves the results of this subsidiary safety case will be incorporated
in the higher level system safety case. The actual subsystem components would
then be integrated into the overall system according to an integration plan. As
part of this process, the safety case may require evidence from:
Attribute: Correctness
Claim: there is no logical fault in the software implementation
Argument: formal proof of specified safety properties

Attribute: Reliability
Claim: software reliability exceeds the system requirement
Argument/Assumption: reliability can be assessed under simulated operational conditions
The completed system safety case (including subsystem evidence and system-level evidence) should be reviewed to assess whether:
The Preliminary safety case element will have defined requirements in this
area and identified any operating constraints that might apply (see
Section 4)
The Architecture safety case element will have addressed the need to
design for usability, maintainability and modifiability (see Section 5)
The Implementation safety case element will have implemented these
features and assessed whether there are any new operating constraints
or procedures required as well as adding the detail now available to the
maintenance, use and support aspects (see Section 6)
The types of information that will be new to this safety case are those aspects of
operation, installation and maintenance that the developer may not be
competent to define: for example, the specific grades of staff to undertake the
different types of maintenance, training requirements for operators, the exact
user-specific permit-to-work system that should be used, the identified operating
Over the lifetime of the system, there will almost inevitably be changes to the
safety case to accommodate changes in regulations, technology and
organisations so it will be necessary to establish a safety case maintenance
infrastructure (see Section 10).
System structure
We also need to address any special considerations that apply when there are
subsidiary safety cases for components and sub-systems. The documentation
structure is addressed in Section 11.
contractual boundary is crossed the safety responsibilities are handed over via a
safety case for that stage. This is the practice for civil and military air traffic control,
where there are four-part safety cases reflecting the purchaser/developer/
operator/user/maintainer boundaries.
The following table illustrates the different safety case components for the example
of a simple command system that consists of a database and an interface. Note
that not all the project phases are shown.
Project Phase | Safety case element | Produced by
Invitation to Tender | Preliminary |
Preliminary design | Preliminary |
System Design | Architectural | Designer
Subsystem Requirements | Preliminary |
Subsystem Requirements | Preliminary, Architectural |
Database Subsystem Design | Architectural |
Subsystem Implementation | Implementation |
Subsystem Implementation | Implementation |
Systems Integration | Implementation |
Operation | Operational |
For a new system the design can take into account the need to
demonstrate safety and the safety case production can be incorporated
into the project following a design for assessment approach.
For a COTS product, the design freedom is more constrained. There is
design freedom in the choice of COTS, so that a system can be chosen
where there is sufficient generic evidence to demonstrate safety. There is
also design freedom in the way the product is configured and used in a
particular application.
For a pre-existing system, there is very little design freedom, but there
may be scope for additional testing and analysis to demonstrate safety
attributes.
whether the design features will achieve the attributes and whether a
design for assurance approach has been adopted
the project risk arising from novelty, complexity and project stress
whether the hazards identified in the design have been tracked and
controlled (e.g. by hazard elimination, protective features, or operational
procedures)
the impact of changes made during development (and whether this
affects the arguments in the Preliminary and Architectural safety cases)
whether the operational and maintenance requirements to maintain the
system and the safety case are likely to be reasonable
Over the lifetime of the system, there will almost inevitably be changes to the
safety case to accommodate changes in regulations, technology and
organisations.
Appendix F provides a generic safety case review checklist that can be used at
all project phases.
The independent assessments should also look broadly at the available evidence
to ensure that any evidence contradicting the claims is properly incorporated into
the safety arguments.
10 Long-term maintenance
An important part of many safety cases is their potential longevity. This part of the
manual looks at the issues raised by this longevity and the supporting
organisational and management processes that are needed. The maintenance
implications of the safety case have been incorporated into the overall safety
case methodology in Section 5, so that the long-term costs and risks of
maintaining the safety case can be considered at an early stage in the system
design. There is little published data on the costs of safety case maintenance. The
costs of maintaining the overall safety cases in the nuclear industry are significant,
roughly 2% of operating costs per year, so a methodology that considers support
implications could have a considerable impact on costs as well as safety.
Control and protection systems are long-lived in comparison with the lifetimes of
the implementation technologies, which are typically electronic and computer-based.
Developments in these technologies are rapid, with typical products becoming
obsolete within a few years. This has led to the special provision of spares and to
the planned refurbishment of systems, and considerable effort is expended to
address the long term operational requirements. There are however wider issues
than this to be addressed when looking at the long term maintenance of safety
cases. These include the need to maintain the safety case in the light of external
changes which may affect it, e.g.:
We also need to consider internal changes which affect the long-term integrity of
the safety case maintenance process, e.g.:
Monitor the integrity of the safety case and the support infrastructure. (Is
the safety case still valid? Have the outstanding concerns been
addressed? Can anticipated changes be implemented?)
safety functions
reliability requirements
design analyses
provide at least one safety argument for each requirement which relates
evidence from design features, subsystem requirements and
development processes to a claim about the requirement
identify all design assumptions used in the argument, e.g.:
failure modes
failure rates
fail-safe bias
reliability and availability (e.g. failures per demand, spurious trip rate)
timeliness, accuracy
design assumptions
subsystems
outstanding concerns
unresolved hazards
In order to track the evolution of the safety case, it is also desirable to record
significant events during the construction of the safety case. This would include:
justification of changes
results of QA audits
11.10 References
The document will include references to related documents. These could include:
environment descriptions
design documents
hazard log
The overall plant or system safety case makes an overall claim for safety based on
all these risk reduction approaches. Targets would be set for the tolerable
accident frequency and severity, and the top-level safety case would argue that
the implemented safety features ensured the accident frequency was within
limits. There is also a requirement to show the risk is ALARP (as low as reasonably
practicable) so further risk reduction should be implemented provided the costs
do not outweigh the gains.
The systems we are discussing fall mainly into the category of hazard control (i.e.
reducing accident frequency). They would be used to implement the basic safety
functions (e.g. preventing excess mass entry or flooding the cell). Of course there
is no actual need to use a computer-based system to implement a safety
function; other mechanisms such as mechanical interlocks, discrete logic or a
human operator could be used instead. In addition the same safety function
maintainability
modifiability
security
usability
replaceability
These tend to be treated as softer attributes, but they are necessary to maintain
the integrity of the original design against potential sources of attack (even if
these are unintentional). Essentially the attributes relate to threats from different
sources (such as maintenance staff, the operator, unauthorised personnel, or
ageing and obsolescent equipment). These might be addressed using more
qualitative arguments (e.g. number of defences or conformity to ergonomic rules
and design standards).
MOD DS 00-56
MOD DS 00-55
DIN V-19250
DIN VDE-801
ISO 9001
ISO 9000-3
[Table: integrity levels and associated probability ranges]
A similar limit scheme is used in MOD DS 00-56, but the probability ranges are not
pre-determined; they have to be defined for a specific application.
For diverse subsystems implementing the same function (or functions), MOD DS 00-56 allows the subsystem integrity level to be reduced by one level. Common faults
can limit the reliability improvement of diverse systems. The reduction in integrity
level reflects empirical experience that diversity can yield an order of magnitude
improvement. Other examples of claim limits for other design features are:
At least two different safety functions to protect against the most critical
accidents.
[Table: example hardware design features per attribute, grouped under the headings Fault Avoidance, Error Tolerance and Fail-safe bias.]

Availability: stable sensors; stable and accurate input-output system; high-reliability components; multiple channels + voting; feedback mechanisms to minimise long-term error; compliance with environmental standards (EMI, temperature, etc.)

Logical correctness: design simplicity; design diversity; formally proved hardware (e.g. VIPER); hardware watchdogs; fail-safe bias on inputs and outputs; mature hardware (stable, extensive field experience)

Maintainability: interface labelling; keyed connectors to avoid errors; simple, standard interfaces; modular design; multiple channels + voting

Response to overload: ensuring processor capacity is sufficient for maximum input-output data rates; prioritising functions so that the least important functions can be discarded

Security: locked cabinets; access indicators (e.g. light on if door open); encryption

Timeliness: time budgets assigned to functions; hardware watchdogs
[Table: example software design features per attribute, grouped under the headings Fault Avoidance, Error Tolerance and Fail-safe bias.]

Accuracy: use of floating point; integer calculations with worst-case error and overflow analysis; algorithm stability analysis; comparison against computation with a small input perturbation; alternative data sources; fail-safe response to failure conditions

Compliance with hardware constraints (e.g. memory): pre-allocation of resources; memory exhaustion checks

Fault tolerance: alternative output devices

Logical correctness: design simplicity; formal development; design diversity; isolation from failures in non-critical functions; safety kernels; assertion checks in code

Modifiability: design simplicity; information hiding; code assertions to detect errors
Response to overload: mechanisms for limiting throughput; overload detection; graceful degradation (e.g. discarding old data in a real-time system)

Time response: bounded execution time; preference given to safety-critical tasks; software timers; watchdogs
[Table: defences against operation and maintenance errors, with columns Error Avoidance and Fault Detection/Recovery.]

Operator error. Avoidance: training; procedures. Detection/recovery: status displays; capability for cancelling or returning to the original state.

Calibration error. Avoidance: training; independent checks; status recording. Detection/recovery: pre-start tests; on-line monitoring of configuration integrity.

Repair error. Avoidance: training; independent checks; status recording. Detection/recovery: pre-start tests; on-line monitoring of configuration integrity.

Update error (parameter data, redesign). Avoidance: training; independent checks; status recording. Detection/recovery: pre-start tests; on-line monitoring of configuration integrity.

Malicious damage. Avoidance: restriction of access (locked cabinets, passwords, authorisation procedures). Detection/recovery: on-line monitoring of configuration integrity.
C.1 Planning
Safety plan: a document specifying the steps to produce a structured safety case
over the system's lifetime, covering quality management, safety management
and functional and technical safety.
Other plans: the safety aspects of other plans may also be relevant, e.g. the Quality
Plan, Configuration Management Plan, Integrated Logistic Support Plan,
Operation and Maintenance Plan, V&V Plan and Overall Project Plan.
[Table: example Architectural safety case entry.]

Design Features: identification of safety-related functions; partitioning according to criticality; design simplicity
Assumption/Evidence: assumption that segregated functions cannot affect each other
Subsystem Requirements: subsystem integrity level; functional segregation requirements
Attribute: Fail-safety
Design Features: use of functional diversity; fail-safe architectures
Assumption/Evidence: system hazard analysis; fault tree analysis
Subsystem Requirements: fail-safety requirements on subsystems (response to failure conditions)
Attribute: Reliability/availability
Claims: a reliability claim based on reliability modelling and CMF assumptions, together with fault detection and repair assumptions; a reliability claim based on experience with similar systems
Design Features: architecture, levels of redundancy, segregation; fault-tolerant architectures; design simplicity
Assumption/Evidence: reliability of components and CMF assumptions; failure rate, diagnostic coverage, test intervals, repair time, chance of successful repair; prior field reliability in similar applications
Subsystem Requirements: hardware component reliability; software integrity level; component segregation requirements; fault detection and diagnostic requirements; maintenance requirements
Design Features: the design ensures the overall response time is bounded
Assumption/Evidence: assumes subsystem time budgets can be met
Subsystem Requirements: time budgets for hardware interfaces and software
Attribute: Security
Claims: a defence exists for all identified attacks; defence in depth for critical attacks
Design Features: system-level access controls; external interfaces; physical barriers
Assumption/Evidence: knowledge of the likelihood of different forms of attack; assumption that all forms of attack are identified
Subsystem Requirements: subsystem integrity checks; interface credibility checks; subsystem segregation
Attribute: Modifiability
Claim: anticipated changes do not pose a safety risk
Design Features: functional segregation and design structure; design simplicity
Assumption/Evidence: identification of features likely to change; impact assessment of incorrect modification
Subsystem Requirements: explicit identification of features likely to change in software and hardware specifications
Attribute: Maintainability
Claim: maintenance actions can be performed reliably, or are at least fail-safe (based on analysis, and on past systems with similar features)
Design Features: time to repair; limits on maintenance actions (access, calibration, repair, reconfiguration)
Assumption/Evidence: identification of possible maintenance errors; assessment of incorrect action and of its impact on dangerous failure
Subsystem Requirements: subsystem failure reporting and self-test functions
Attribute: Usability
Claim: the operator cannot affect the safety of the system
Design Features: on-line help; ergonomic design; credibility checks; limits on operator action
Assumption/Evidence: human error rates and types of error; usability tests
Subsystem Requirements: operator interface requirements
Attribute: Correctness
Claim: there is no logical fault in the software implementation
Argument: formal proof of specified safety properties

Attribute: Reliability
Claim: software reliability exceeds the system requirement
Argument/Assumption: reliability can be assessed under simulated operational conditions
Attribute: Timeliness
Evidence/Assumptions: maximum timing determined by static code analysis; dynamic tests of worst-case time response

Argument: the software design is such that memory use is bounded and statically decidable
Evidence/Assumptions: analysis of memory usage; stress testing of the system

Argument: identified hardware failures (computer interfaces and computer system) are either tolerated or result in a fail-safe response
Evidence/Assumptions: all failure modes have been identified; fault injection tests to check the response
Argument: the design can detect overload conditions and either maintain a degraded service or perform a fail-safe action
Evidence/Assumptions: there is sufficient processing power to cope with credible levels of overload; overload tests

Attribute: Maintainability
Claim: parameter adjustments can be made without affecting safety
Argument: software-imposed limits ensure parameters remain in the safe range
Evidence/Assumptions: systems-level analysis of allowable safe ranges; validation tests
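As a minimal sketch of the kind of software-imposed limit this argument relies on (the parameter name and range below are illustrative, not from the manual):

```python
# Safe ranges would come from the systems-level analysis cited as evidence.
SAFE_RANGES = {"trip_threshold_bar": (80.0, 120.0)}   # illustrative values

def set_parameter(params: dict, name: str, value: float) -> None:
    """Apply a parameter adjustment only if it stays within its safe range."""
    low, high = SAFE_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name}={value} outside safe range [{low}, {high}]")
    params[name] = value

settings = {}
set_parameter(settings, "trip_threshold_bar", 95.0)    # accepted
# set_parameter(settings, "trip_threshold_bar", 150.0) # would raise ValueError
```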
Attribute: Operability
Claims: the system is robust to faulty operator actions; the system is designed to minimise user error
Argument: the design conforms to human factors standards
Evidence/Assumptions: interface prototyping; validation tests
[Table: impact of small and larger changes on the argument types (statistical testing, deterministic arguments, experience, process); the surviving entries note that the process-based argument requires the process to be re-run or the changed/new parts to be re-developed, and that obsolescence could be a problem]
[Table: impact of small and larger changes on the argument types (statistical testing, deterministic arguments, experience, process); the surviving entry notes that process data must be collected as for the initial development]
[Table: argument types (statistical testing, deterministic arguments, experience, process)]
The extent of the change to software will obviously have a profound effect on the
changes to the safety case. The following table indicates the potential impact:
[Table: potential impact of software change on the argument types (statistical testing, deterministic arguments, experience, process)]
The issue of obsolescent tools needs to be addressed in the periodic review and
an appropriate response formulated. This might involve:
The costs and risks of the approaches need to be considered, and this would
have to include the issue of maintaining expertise in the tools.
F.2 Demonstrable
Understandable. The safety case (or a component part) has to be presented to
and understood by different audiences, such as the developer, the operator and
the regulator.
Evolutionary. The safety case has to be presented at different phases in the
system lifetime, i.e.: system concept, system development, acceptance,
operation and replacement.
F.3 Valid
Accurate. As a prerequisite for a valid argument, the evidence presented should
be accurate, i.e.:
Internally consistent.
Be available to all interested parties. We have termed these the
stakeholders; they could include the regulator, developer, subcontractor, and customer departments (e.g. engineering, health and
safety, operations and maintenance).
Be up-to-date and relate to the actual system design.
This is achieved by producing the safety case within an established safety and
quality management system which tracks the status of the various components of
the safety case and system design and controls the release of documents.
Related to safety properties. The arguments should directly support claims about
the required safety properties of the system (reliability, fail-safety, etc.). Arguments
of good practice (e.g. "we tried hard") are not sufficient.
Designed for assurance. The construction of a valid safety case may not be
feasible unless an appropriate design is used. A "design for assurance" approach
is advocated, where the system design and safety are developed in parallel to
ensure that:
KISS (keep it simple). The risk of flaws in the system design and the associated
safety case will increase with complexity. Complexity should be minimised
wherever possible (see Section 5.1.1).
Traceable. Safety properties at one level will be translated into design features at
a lower level. It should be possible to demonstrate a clear link between top-level
safety goals and the functional behaviour and attributes of implemented
subsystems.
Robust. Arguments may contain flaws. The overall claims should not be sensitive to
individual flaws.
operational requirements
The infrastructure requirements have to be feasible over the long term. This
requires an assessment in the design phase of the costs and risks of maintaining
the safety case over the long term.
Are the mechanisms for eliminating faults and dealing with failure
adequate?
Is there coverage of operations and maintenance risks and adequacy of
defences?
Does the argument conform to the design criteria?
Design rules
the system design and the associated safety argument are kept simple (see
Section 5.1.1)
Is there a scarcity of skills for performing the safety case analysis and
updates?
segregation
[Figure: field reliability data: MTTF (years, 0.1 to 10 000, with a measurement bound indicated) plotted against operational use (years, 0.1 to 100 000)]
The data relate mainly to commercial products used for real-time applications
(control, protection, telephone switching, etc.). For one of the protection systems,
the MTTF approaches 1000 years. However, MTTF may be an unsuitable measure
for such systems, as the important attribute is the probability of failure on demand,
and demands may be infrequent (e.g. less than one per year). Nevertheless, clear
trends were found in the study:
1. The reliability seems to be higher with increased operational use
2. Small software applications have higher reliability than large programs
given the same level of operational use.
Using IEC 61508 terminology, the field reliability results indicate that a Safety
Integrity Level target of SIL 2 (10 to 100 years mean time to failure) is achievable
for some commercial real-time products, but many fall below this. It also
provides some justification for treating SIL 3 and SIL 4 as very onerous requirements
requiring special measures.
Another independent study of reliability in PLC applications [4] yielded the
following results.
[Table: PLC failure data by industry sector (safety significant, production significant, minor and total failures). Years of operation: Nuclear 924.0 (16 failures), Chemical 74.5, Oil and gas 64.5, Electricity 54.4 (10 failures); totals 1117.4 years and 30 failures, of which 11 were safety significant.]
Note that all the failures observed were due to faults in the application software
rather than the underlying PLC operating system. The average failure rate of the
application software is about once in 35 years, and about once in 100 years for
safety-related failures. Again this is consistent with a SIL2 target (10 to 100 years).
Like the previous study, this study found a correlation between application
complexity and unreliability.
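The quoted averages follow directly from the table totals, as this trivial Python check shows:

```python
years_of_operation = 1117.4   # total PLC operating experience in the study
total_failures = 30
safety_related_failures = 11

print(years_of_operation / total_failures)           # ~37 years per failure
print(years_of_operation / safety_related_failures)  # ~102 years per safety-related failure
```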
the theory predicts that for a system with N faults, the achieved reliability after a
usage time T will be bounded by:
MTTF ≥ e·T / N
where MTTF is the mean time to failure, and e is the exponential constant
(2.71828…). Studies of empirical reliability seem to indicate that this result applies in
practice, and the empirical results shown in the section above are consistent with
this finding. For example [3] discusses the application to a number of data sets.
One of these is from three generations of teleswitch equipment. Most of the
detailed data are confidential, but information is available about: the number of
known faults; the software size; and the failure rate over time. Most of the reliability
growth data are based on operation in the field. One complicating factor is that
new systems were being progressively installed on different sites, each with a
different operational profile and possibly different software options, so that new
parts of the input space could be covered for each new installation. The results for
one generation of teleswitch are shown below. We have used a fault estimate
which is 50% greater than the known faults.
[Figure: teleswitch reliability growth: MTTF (years, 0.001 to 10) against usage time (years, 0.1 to 100), with the predicted bound for N=175]
[Figure: successive times to failure, TTF (cycles, 1 to 1 000 000), with the predicted lower bound for N=31]
The predicted lower bound is also plotted on the figure, assuming N=31. It can be
seen that most TTFs lie above the bound. The bound actually relates to the
average TTF, so statistically some TTFs could fall outside the limits. The one point
that falls a long way below the line is known to be a correction-induced fault, but
this has little impact on subsequent reliability growth.
To perform the calculation, we also require the overall usage time of the product
and an estimate of the number of residual faults. The usage time can be inferred
from the number of units sold, and a reasonably good estimate of residual faults
can be obtained by multiplying the software size by the expected fault density.
The fault density might be provided by the developer, or a generic figure could
be used. Relatively conservative generic values of fault density are:
1 fault per kilobyte of binary code
For example, a small PLC with 20 kilobytes of code might have 20 residual faults. If
the PLC had 10 000 years of prior usage we might expect the MTTF for operating
system faults (excluding hardware failures) to be better than:
MTTF ≥ e·T / N = (2.718 × 10 000) / 20 ≈ 1300 years
This is consistent with the empirical evidence (i.e. that no PLC operating system
failures were observed in 1000 years of operation). More complex systems will
have more faults so the expected level of reliability growth will be lower. For
example a teleswitching system might contain ten to a hundred times as many
faults, so the reliability after a similar level of usage might be one or two orders of
magnitude lower (i.e. between 10 and 100 years MTBF, which is broadly consistent
with empirical observation).
The theory and empirical results support the KISS principle (Keep It Simple). Simpler
systems should contain fewer faults and hence become reliable more rapidly
than large systems.
It also follows that rapidly evolving designs will be more unreliable than stable
designs. Some systems may be subject to continuous change to incorporate new
functions. These changes can reduce the reliability to a much lower level since
the new faults will have been exposed to relatively little usage and hence can
have much higher failure rates. Under conditions of continuous change, the
failure rates of the new faults can be the dominant factor, i.e. the limit will always
be worse than e·T/N, where N is the number of new faults introduced in the
periodic upgrades which occur after a usage time T. So for a system that
introduces 100 new faults in each upgrade, and upgrades once per year over
1000 sites (i.e. N=100, T=1000 years), the best reliability that can be expected at
the end of the year is at most:
e·T / N = (2.718 × 1000) / 100 ≈ 27 years
and in the early stages in the upgrade period the MTTF bound will be much
smaller. It would therefore be sensible not to upgrade to a new version until
extensive field experience has been gained.
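A minimal sketch of these calculations (in Python, using the illustrative fault-density and usage figures above):

```python
import math

def mttf_bound_years(usage_years: float, residual_faults: float) -> float:
    """Worst-case reliability growth bound: MTTF >= e*T/N."""
    return math.e * usage_years / residual_faults

# Small PLC: 20 kilobytes of code at ~1 fault/kilobyte, 10 000 years of prior usage.
print(mttf_bound_years(10_000, 20))   # ~1359 years (the text rounds to ~1300)

# Continuous change: 100 new faults per annual upgrade across 1000 sites.
print(mttf_bound_years(1_000, 100))   # ~27 years
```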
kloc (kilo-lines of code) and studies show the post-delivery fault density might lie
between 1 and 5 faults per kloc for conventional development processes.
For in-house software development more accurate estimates of fault density may
be feasible. For a well-established development process applied to large systems,
more precise estimates might be obtained from process profiling. This involves
estimating the fault detection profile for previous projects, and the early
developmental fault data can be scaled to derive accurate estimates of residual
faults. To illustrate, the following table shows the process profile of the PODS
software diversity experiment [1]. The results are probably not typical of larger
projects, but the example illustrates the overall approach.
[Table: PODS process profile: faults created, detected and remaining by detection method (customer requirements review, design review, code review, acceptance test)]
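A minimal sketch of the scaling idea, with hypothetical counts rather than the PODS data:

```python
# If past projects show what fraction of all faults is typically found at each
# phase, counts observed so far on a new project can be scaled to estimate the
# total faults injected, and hence the residual faults.
detection_profile = {"design review": 0.30, "code review": 0.40, "acceptance test": 0.20}
observed = {"design review": 24, "code review": 33, "acceptance test": 14}

fraction_covered = sum(detection_profile.values())           # 0.90 of faults typically found
estimated_total = sum(observed.values()) / fraction_covered  # ~79 faults injected
estimated_residual = estimated_total - sum(observed.values())
print(round(estimated_total), round(estimated_residual))     # 79 total, ~8 remaining
```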
G.6 References
[1]
[2]
[3]
[4] R.I. Wright and A.F. Pilkington, An Investigation into PLC Reliability, HSE Software Reliability Study, GNSR/CI/21, Risk Management Consultants (RMC), Report R94-1(N) Issue B, Nov. 1995
These processes should be extended to include the integrity of the safety case
infrastructure. In the case of the system modification procedure, this may require
a redefinition of a modification to include changes in the safety case
maintenance environment (i.e. people, structures, resources and procedures),
and an extension to the scope of periodic reviews and audits.
We propose two forms of assessment:
We also identify the set of documents needed to support these assessments, and
propose that a mechanism is put in place which updates the assessment
guidance in the light of practical experience and new technical knowledge.
It will also be necessary to consider what activities should be implemented at the
system level and the corporate level. It would be logical to make the long-term
monitoring of new technical knowledge a corporate function so there should be
some central corporate activity which:
collates system experience (e.g. failure data, common cause failure and
incident analyses)
monitors technical advances, standards and regulatory requirements
alerts system sites to immediate problems (e.g. to other sites with similar
systems)
analyses past experience and updates the design and safety case
assessment guidance and checklists
analyse the data and derive the lessons learned (these could be more
general than the incident itself)
review and validate the new rules
incorporate the rules in design criteria, claim limits, checklists, assessment
procedures, etc., for use on subsequent projects
verify the application of the new rules in each project
It should be noted that this long-term improvement process need not result in an
increased assessment and maintenance burden. Optimisation is part of the
process: the performance and costs of the existing rules and recommendations
should be assessed. If these prove to be ineffective or irrelevant or there are more
cost-efficient alternatives, the rules should be changed to reflect this.
Greater knowledge of the maintenance effort could have a significant impact on
the approach to the design of new safety systems (e.g. by employing simple
designs or additional defences which reduce the safety case maintenance
requirements). This could be reflected in updated design guidance and design
criteria.
Anticipated change list. This is a list of possible changes that have been allowed
for in the current system design and safety case, and may need to be
updated over time.
Current concerns list. This would include lists of safety issues that require either
resolution, monitoring or further analysis (similar in principle to the Hazard Log
defined in Defence Standard 00-56). This list can change with time (e.g.
problems can be resolved, and new concerns can be identified).
Safety case infrastructure status report. Produced at periodic reviews to assess the
adequacy of the safety case infrastructure, i.e.:
staff competencies
documentation
technical resources
I.3 Violations
Past incidents and accidents may provoke restrictions and prescriptive
procedures on the actions of users of the system. Increased maturity adds further
restrictions as time goes by, perhaps resulting in procedural over-specification to
the point where user violations are the only way to actually get the job done. An
overly-prescriptive safety case procedure set against time demands may
therefore encourage violations and result in unsafe acts. Thus it is important to
consider, in an open-minded manner, any difference between the prescribed
procedures for the process and the actual procedures followed by the users.
Many analyses of incidents (e.g. the Challenger and Herald of Free Enterprise
disasters) that have naively been attributed to human error have shown that
organisational context and culture are central in assuring the safety of a process.
An inappropriate organisational context surrounding a process may provide
latent errors that lie dormant until a particular set of coinciding events come
together to form a safety-critical incident. Furthermore, organisational, cultural
and communication structures (e.g. [6]) determine the extent to which corporate
knowledge and good practice may be reused; otherwise old problems may have
to be revisited and solved afresh each time. Organisational weaknesses may be
associated with:
I.7 References
[1]
[2]
[3]
[4]
[5]
[6]
[7]
Reason, J., Human Error. Cambridge University Press, Cambridge, UK, 1990.
[Figure: maintaining the safety case: responding to changes in the environment, equipment and technical knowledge while keeping the safety case demonstrable, consistent, valid and adaptable, supported by human resources, technical resources and documentation]
J.2.1 Demonstrable
The safety case should be demonstrable: for each stakeholder there should be
adequate human resources, documentation and technical resources to
understand and evaluate the safety case.
Adequate human resources
The safety case is not demonstrable unless there are people available who
understand the safety case and its relationship to the safety system. This
requirement applies to each stakeholder. Some of the issues involved in
maintaining safety case knowledge and skills are discussed below.
Maintaining Skills: There will be a need to identify the skills and knowledge
necessary for the stakeholders. The required skills and knowledge should be
documented, together with the staff who provide these capabilities (e.g. in a
competence matrix). This should include any key sub-contract staff who
provide maintenance support. This matrix should define the required depth of
understanding; in some cases it may only be necessary to have sufficient
knowledge to understand what others have done, while in other cases there
should be the in-depth knowledge needed to create acceptable documents or
designs.
The safety case infrastructure status assessment should:
Tacit knowledge: The safety case may rely on unexpressed knowledge and
expertise within the safety team or supporting experts. Some of this may be in the
form of implicit assumptions and background rationale for design decisions that
have been made. However, some of the deep expertise of domain experts may
be in the form of know-how that is difficult to express (see Appendix I). This form of
tacit knowledge is hard to formalise and codify, and can be a vulnerability once
key personnel retire or move on.
To address this vulnerability, the review should assess the extent of tacit
knowledge, and recommend how it may be converted to explicit knowledge, or
maintained for the future.
Adequate documentation
The safety case documentation set should meet the needs of the various
stakeholders. It should be written with a clear understanding of who the target
audience is, their likely tasks, and how the safety case documentation set is going
to support these tasks. In particular it should be:
J.2.2 Consistent
For each stakeholder (e.g. operation, regulator or safety department) the safety
case documentation set(s) should be consistent, i.e.:
the documentation should be internally consistent (in terms of cross-references and dependencies)
The available records can be reviewed and stakeholder sites can be audited to
see if the latest versions have been distributed. Even if the documents remain
unchanged, responsibilities and organisations may alter, and it may be necessary
to check that the current stakeholders have the relevant documentation.
An audit can also be performed to check whether the safety case has taken into
account any changes to the system and the operational environment, e.g.:
Any mismatches should be identified, the causes analysed and, where necessary,
changes in procedures implemented. This may require an analysis of existing
processes, and should take into account human factors aspects (see also
Appendix I). For example, a private marked-up copy may exist which reflects
the true configuration of the system. One response is to make the procedures
more strict. However a human-centred analysis might conclude that the existing
J.2.3 Valid
Issues to be addressed include:
J.2.4 Adaptable
The safety case and supporting process should be capable of responding to
anticipated changes. As part of the overall safety case methodology, a list of
anticipated changes should be identified, and the system design and safety case
should be able to accommodate those changes.
The capacity to adapt to change should be periodically assessed, e.g. by:
For any change, there should be processes in place to assess and manage the
impact of these changes on the safety case. This will include:
analysing safety significance (both for the system and more globally)
identifying what changes are required to the safety case and the system
design
assessing commercial aspects of the change: risks and costs
(implementation cost, outage delays and lifecycle support costs)
negotiating the proposed changes (e.g. to procedures, equipment, and
safety case) with appropriate stakeholders (the relevant stakeholders will
loss of skilled staff and associated tacit knowledge and know-how (see
Appendix I)
changes of responsibility and organisational restructuring
loss of ready access to key resources (e.g. documentation, technical
equipment or expertise).
A change proposal should be prepared which identifies the organisational
changes, maps out the changes in resources, and shows how the new structure is
to be aligned with the safety case maintenance tasks.
For both remedial actions and major changes there should be a process involving
the system stakeholders which can accept the proposed change and approve
the resulting implementation. This would typically be part of the normal safety
management process (e.g. involving the plant safety committee, the corporate
safety departments and the licensors).
The impact of any new knowledge on the safety case can be either positive or
negative. The information should be assessed to establish whether:
the safety case is still valid, or whether changes are required to the safety
case or the system
the safety case is too conservative (e.g. pessimistic design assumptions for
fail-safe bias, failure rates, etc.); the new information may permit stronger
claims to be made about the system
the safety case is still ALARP (e.g. are any of the new methods
reasonably practicable?)
J.5 Demonstrable
The requirement:
The safety case should be demonstrable: for each stakeholder there should be
adequate human and technical resources and documentation to understand
and evaluate the safety case.
The following sets of questions address this requirement.
Maintaining skills:
For each stakeholder identify:
Who is able to read and understand the details of the safety case?
Tacit knowledge.
Assess the extent to which the safety case relies on the tacit knowledge or
know-how of experts. Look for indications such as:
Develop a strategy to maintain and transfer this tacit knowledge in the future, for
example by:
J.5.2 Documentation
hazard analyses
Is there evidence of the use of the tools by the stakeholders for the safety case?
J.6 Consistent
The requirement:
For each stakeholder (operation, regulator, safety dept) the safety case
documentation sets should be consistent with the current configuration of
the system.
J.7 Valid
The requirement:
The safety case should remain valid.
Consider the following questions:
Is the environment (in its broadest sense) stable so that the safety case
assumptions and evidence remain valid?
What monitoring is there to detect any changes that invalidate the safety case?
Consider the following:
operational modes
interfaces
new information, e.g. from the analysis of failure data, incidents, and
periodic tests
Are concerns or caveats in the initial safety case tracked (e.g. things to fix later,
questionable assumptions, continuing investigations or supporting analyses)?
Consider:
integrity of PES
J.8 Adaptable
The requirement:
Has the need for change been addressed in the design of the safety
case?
Is there an anticipated change list? Is it reviewed and updated in the
light of operating experience, changing requirements (e.g. changed
modes of system operation, such as a change from base load operation
to load following), and developments in technology (e.g. test
methods, understanding of diversity, sensors or obsolescence)?
Is it possible to adapt the safety case to change? Review the safety case
with respect to the anticipated change list. Assess cost of different types
of change. Document the areas that are difficult to change.
Does the safety case documentation structure and architecture support
its own evolution and development?
Are there processes in place to:
analyse safety significance (both for the system and more globally)?
identify what changes are required to the safety case and the system
design, i.e. what is changed and to what extent? Consider:
When assessing a proposed organisational change, consider:
the coverage of current tasks (are all tasks covered? are some tasks
duplicated?)
the organisational fit (e.g. are tasks spread across organisational
boundaries? will the speed of response be acceptable?)
the loss of expertise and domain knowledge (will past knowledge be
diluted, or split across separate organisations? do we have a
lobotomised organisation?)
inter-group communication and number of contractual barriers (e.g.
the number of interfaces involved in implementing a given activity
or change)
access to documentation and expertise
The impact of any new knowledge on the safety case can be either positive or
negative. The information should be assessed to establish whether:
the safety case is still valid, or whether changes are required to the safety
case or the system
the safety case is too conservative (e.g. pessimistic design assumptions
for fail-safe bias, failure rates, etc.); the new information may permit
stronger claims to be made about the system
the safety case is still ALARP (e.g. are any of the new methods
reasonably practicable?)
Appendix K Reactor protection example
Trip the reactor if the temperature is too high in any gas duct
[Figure K1: Reactor protection example: system architecture. Four channels,
each comprising thermocouple inputs feeding a PAC and DCL that turn a square
wave signal into a coded output signal; the four coded outputs feed 2oo4
fail-safe guardline logic, and a monitor computer is connected to the channels
by serial lines.]
Each design feature addresses one or more of the safety requirements as
described below.
[Figure: single channel detail. The PAC takes 800 thermocouple readings
(T1, T2, ...); the DCL converts the square wave signal into the coded output
signal; a test source and test mode selector allow the channel to be exercised
in test mode.]
The monitor computer can be used for pre-start checks on the consistency of the
software configurations in the four channels (R.SEC), and for on-line diagnosis of
channel failures and failures of thermocouples (R.TST, R.FIX). By comparing outputs
from the channels it is possible to decide whether the fault resides in a channel or
the thermocouple input system. It can also be used to monitor long-term
degradation of thermocouples. If degradation is severe, availability can be
maintained by replacement or a veto.
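The discrimination between channel and thermocouple faults can be sketched as
follows. This is an illustration only, not the monitor software: the tolerance,
the 50 degree "implausibly low" threshold and the reading layout are invented
for the example.

  # Hypothetical sketch of the monitor computer's fault localisation.
  # For each input, the four channel readings are compared against the
  # median of the four. A single persistently disagreeing channel points
  # to a channel fault; four channels agreeing on an implausibly low
  # value points to the thermocouple (thermocouples tend to fail low).

  def localise_faults(readings, tolerance=2.0):
      """readings: dict input_id -> [r1, r2, r3, r4], one reading per channel."""
      disagreements = [0, 0, 0, 0]      # per-channel disagreement counts
      suspect_inputs = []
      for input_id, values in readings.items():
          centre = sum(sorted(values)[1:3]) / 2.0      # median of four
          outliers = [ch for ch, v in enumerate(values)
                      if abs(v - centre) > tolerance]
          if len(outliers) == 1:
              disagreements[outliers[0]] += 1          # one channel disagrees
          elif not outliers and centre < 50.0:         # all agree, reading low
              suspect_inputs.append(input_id)
      return disagreements, suspect_inputs

  print(localise_faults({"duct1/T1": [400.1, 399.8, 400.3, 250.0]}))

A channel that disagrees across many inputs indicates a channel fault; a single
input flagged by all four channels indicates a thermocouple fault.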
K.3.4 Simplicity
The design has no intercommunication between channels and the A/D
conversion is performed within the PAC. There is no need for interrupt handling or
buffering so the software can be implemented as a simple cyclic program. This
should be easy to test and verify (R.TRIP) and alter (R.UPD).
Since the program is simple and cyclic, the worst case response time is bounded,
and the worst case time is readily determined via timing tests or code analysis. The
time delays in the interfaces can also be measured to determine the overall
response time (R.TIM).
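For illustration, the worst-case figure used later in the safety case (a worst
measured time of 2.4 seconds against the 5 second requirement, claim C.TIM.TEST)
could be gathered by instrumenting the cycle over a long soak test. A minimal
sketch, with an empty stand-in for the real scan body:

  # Hypothetical sketch of a worst-case cycle time measurement.
  # Because the program is a simple cycle with no interrupts or
  # buffering, the maximum observed cycle time over a long soak test
  # is a credible estimate of the worst case, to be confirmed by
  # code-level timing analysis.

  import time

  def scan_cycle():
      pass    # stand-in: read inputs, vote, write outputs

  def measure_worst_case(cycles=100_000):
      worst = 0.0
      for _ in range(cycles):
          start = time.perf_counter()
          scan_cycle()
          worst = max(worst, time.perf_counter() - start)
      return worst

  print(f"worst measured cycle time: {measure_worst_case():.6f} s")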
[Table: mapping of design features to safety requirements. Requirements: PFD,
STR, TRIP, TIM, F1, F2, SEC, UPD, TST, FIX. Design features include modular
hardware replacement, mature hardware and software tools, and access
constraints.]
Requirements affected: R.PFD, R.TIM, R.FIX.
With an alternative diverse channel architecture using PLCs, we may not be able
to perform formal proofs but we might be able to claim an order of magnitude
reduction in failures per demand beyond that demonstrated in the statistical tests.
change of sensors
R.PFD
Claim: PFD < 10⁻³ pa
C.PFD.RAND
Argument: hardware reliability analysis (redundancy + monitor + self-tests); see
Probabilistic Fault Tree Analysis.
Assumptions: common mode factor; no systematic faults (sub-claim C.NO-FLT);
component failure rates; fault detection coverage and fail-safe bias of inputs;
repair times.
C.PFD.SYST.1
Claim: even if there are systematic faults, the chance of failure per demand is
less than 10⁻³.
C.PFD.SYST.2
Claim: fail-safe design will ensure that at least 90% of failures due to
systematic faults are fail-safe. (Note: design assessment criteria might impose
a claim limit of 90%.)
Arguments and assumptions:
a) Double thermocouple disconnection or veto will cause a trip. Assumption:
thermocouples fail low in 90% of cases.
b) Compiler, loader and processor flaws are protected by the reversible
computing technique. Assumption: tests indicate a 99.995% fail-safe bias.
c) ADC and application software and configuration flaws are covered by dynamic
on-line tests. Assumptions: the requirements are correct; the tests detect 90%
of systematic failures.
Sub-claims under C.NO-FLT:
C.NO-FLT.HW
Argument: established designs + system tests + reliability tests imply that
there will be no systematic hardware flaws.
A further sub-claim for the software argues that it has undergone functional
tests to reveal compiler-induced faults, assuming that the requirements are
correct and that the functional tests can reveal all compiler-induced faults.
R.TIM
Claim: time < 5 secs
C.TIM.STATIC
Argument: timing measurements, plus the argument that the execution time is
bounded and relatively constant.
Assumptions/evidence: instruction execution times are correct; ADC conversions
and output time are correct; test results.
C.TIM.TEST
Evidence: worst measured time is 2.4 seconds.
C.TIM.REV
Argument: an excessive or infinite loop will be detected by the reversible
computer implementation.
Assumption: the reversible computer implementation is OK.

R.UPD
Claim: updating the system should not introduce faults
C.UPD.DATA, C.UPD.PROG
Argument: there is sufficient protection to prevent updates of program or data
introducing dangerous faults.
Evidence: adequate support infrastructure; see Anticipated Change analysis.
Assumptions
10% of sensor failures are unrevealed
10% of buffer failures are unrevealed
Common mode failures are 10% of individual failures
10% of channel failures are unrevealed by a channel trip
10% of channel failures are unrevealed by the monitor
Channel failure rate (CPU + ADC + DCL): 1 pa
Sensor failure rate: 10⁻³ pa
Probability estimation
The system is unsafe if a dangerous fault exists but is unrevealed. Internal checks,
monitor checks and proof tests are the main methods for revealing failures.
Systematic faults are mainly deemed to be incredible (see the sub-claim C.NO-FLT).
For random failures we have to include the risk of common cause failures, and the
chance they will remain undetected until the 3-monthly proof test. Taking the
case of the sensors, the basic failure rate is estimated to be 10⁻³ per annum. We
assume that the common mode failures are 10% of this (10⁻⁴ per annum), and 10%
of these will be undetected until the 3-monthly proof test (10⁻⁵ per annum). On
average the dangerous sensor measurement failure will be unrevealed for one
and a half months (0.125 of a year), so the probability of unrevealed unavailability is
0.125 × 10⁻⁵. The unavailability of temperature measurements due to two
unrevealed random failures in one duct is negligible (around 10⁻¹⁰). Since the
demand is only made on one duct, we only need to consider the unavailability of
a single duct measurement.
A similar argument can be applied to the isolation amplifiers and buffers. The
dominant factor is again common mode failure, which is assumed to affect all
buffers simultaneously, so the calculation is identical to the one used for the
thermocouples.
For the hardware channel failures we assume the common mode failure rate is
10% of the single channel failure rate (10⁻¹ per annum). Of these, 10% are
unrevealed by a channel trip (10⁻² per annum), and 10% of the remainder are not
detected by the monitor (10⁻³ per annum). An unrevealed failure persists for an
average of 0.125 years, so the overall probability is 12.5 × 10⁻⁵.
The probability assignments for the fault tree events are summarised below,
including those which are assumed to be incredible (probability zero); a worked
calculation follows the table.
[duct-specific fault]
Event: probability (basis)
Demand(i) and 2oo2 Sensor (I) failed unrevealed: 0.125 × 10⁻⁵ (proof tests)
or 3oo4 [Buffer (A,I) and Buffer (B,I) fail unrevealed]: 0.125 × 10⁻⁵ (proof
tests, analysis)
or software reads input J instead of input I: 0 (analysis)
or unrevealed channel failures: 12.5 × 10⁻⁵ (proof test + monitor + DCL)
or high trip logic flawed: 0 (analysis + test + on-line test)
or multiplexor hardware latches past values: 0 (analysis, fault injection)
or operating on stale copy of input data: 0
or sends old copy of output data: 0
or execution time too long: 0
PFD: 12.7 × 10⁻⁵
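The arithmetic behind the table can be reproduced directly. The sketch below
simply restates the assumptions above in Python; it adds nothing beyond the
figures already given.

  # Reproduces the PFD estimate from the stated assumptions.

  EXPOSURE = 0.125   # mean unrevealed period: half the 3-month proof
                     # test interval, expressed in years

  # Sensors: 10^-3 pa base rate, 10% common mode, 10% unrevealed.
  sensors = 1e-3 * 0.1 * 0.1 * EXPOSURE        # 0.125 x 10^-5

  # Buffers: the same assumptions, hence the same figure.
  buffers = sensors                            # 0.125 x 10^-5

  # Channels: 1 pa base rate, 10% common mode, 10% unrevealed by a
  # channel trip, 10% of the remainder undetected by the monitor.
  channels = 1.0 * 0.1 * 0.1 * 0.1 * EXPOSURE  # 12.5 x 10^-5

  print(f"PFD = {sensors + buffers + channels:.3e}")   # 1.275e-04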
program which can be verified by proof testing and by testing in conjunction with
the modified DCL.
Change of computer hardware or software tools. The fail-safe integrity checks
provide protection against flaws in the new hardware and software tools. The
separate channel structure and simple input-output interfaces permit selective
upgrading on a per-channel basis (phased commissioning).
Change in functional requirements. This would require repeating the formal proof
and re-developing the formally proven software. Proof tools have to be available
(or be re-implementable on another system). Formal proof requires relatively
scarce expertise and could represent a risk in terms of greater implementation
delays and higher update costs. However, licensing risks and the associated costs
are likely to be reduced.
Change of sensors. Relatively simple technology. Changes can be
accommodated by re-scaling the buffer amplifiers or changing the scaling
constants in the software. Verifiable via proof testing, dynamic on-line tests and
the monitor output.
Regulatory changes. If the requirements for diversity become more stringent,
diversely implemented channels can be used to protect against systematic
hardware and software flaws. This is relatively simple as each channel is
independent. Diverse sensors and buffers are also feasible. Requirements for more
rigorous system testing should be feasible as each channel is a standalone unit,
and tests can be performed individually without the need to test for interaction
effects.
safety reviews
problem analysis
Special tools/skills:
DCL design
test environments
test suites
Domain knowledge:
sensor characteristics
CMF mechanisms
Anticipated changes:
trip parameters
trip logic
fault detection
number of inputs
processor hardware
interface hardware
implemented from scratch using different formal notations and support tools.
Obsolescence of the dynamic coded logic could be a problem, but the basic
structure should be re-implementable in a new technology, and the fail-safety
can be reviewed by independent specialists and tested directly by fault injection.
As a fall-back, the system could be re-implemented with diverse hardware and
software in the channels.
MTTR
The impact of these results on the safety case should be assessed. If the results
undermine the safety case, changes to the system design, operating procedures,
or monitoring systems may be necessary.
requirements for the subsystems. In the specific reactor trip example there might
be requirements for the following.
D.ARCH
D.ENV
D.POW
D.DCL
D.INP
D.ADC
D.MON
D.CPU
D.SW
Note that the subsystem requirements will include any evidence required for the
safety case (e.g. environmental test evidence, timing, fault tolerance tests, fault
injection tests, etc.). This evidence could be part of the subsystem deliverable.
As an example of how the subsystem requirements are elaborated, the
requirements for the software (D.SW) are given below. The requirements placed
on the software are based on an apportionment of the top-level safety functions
together with additional requirements imposed by lower level design decisions.
The requirements include the basic functional requirements for the software,
specific design constraints on the implementation method, and requirements for
safety case evidence.
From (R.SEC and R.UPD). Every complete scan cycle, send the software
configuration data (number of inputs, input scale factors, trip limit
values, software version number and sumchecks).
SW.TRIP
Scan the two temperature readings (Ta, Tb) from the ADC. Perform the
1oo2 voted high temperature trip (HiTrip = max(Ta, Tb) > Tlimit). (A
sketch of the scan cycle follows this list.)
SW.IO
Satisfy the specified interface requirements for the ADC, DCL, and
Monitor ports (from D.DCL, D.ADC, D.MON).
SW.CHK
SW.TIM
From R.TIM. The software scan cycle should be less than 5 seconds
including the time required for all input and output operations.
SW.FM
SW.DIV
SW.CYC
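To make the shape of the required program concrete, here is a minimal sketch of
one scan cycle. This is illustrative Python, not the formally developed
implementation: read_adc, send and the additive sumcheck are stand-ins for the
real ADC, DCL and monitor interfaces, and the trip limit is an invented value.

  # Hypothetical sketch of the cyclic scan program. Each cycle reads
  # the two temperature readings, performs the 1oo2 voted high trip
  # (SW.TRIP), and sends the software configuration data with a
  # sumcheck (from R.SEC and R.UPD).

  T_LIMIT = 650.0     # trip limit, degrees C (illustrative value)
  VERSION = "1.1"

  def read_adc():
      """Stand-in for the ADC interface: the two readings Ta, Tb."""
      return 412.0, 408.5

  def send(port, message):
      """Stand-in for the DCL and monitor output interfaces."""
      print(port, message)

  def sumcheck(data):
      """Simple additive sumcheck over the configuration data."""
      return sum(str(data).encode()) % 65536

  def scan_cycle():
      ta, tb = read_adc()
      hi_trip = max(ta, tb) > T_LIMIT     # 1oo2 voted high trip
      send("DCL", "TRIP" if hi_trip else "HEALTHY")
      config = {"inputs": 2, "scale": 1.0, "limit": T_LIMIT,
                "version": VERSION}
      send("MONITOR", (config, sumcheck(config)))

  scan_cycle()

Because the cycle is straight-line code with no interrupts or buffering, its
worst-case execution time is bounded, which is what the timing claims (SW.TIM,
C.TIM.TEST) rely on.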
SW.TRIP.CASE
SW.TIM.CASE
SW.V&V.CASE1 SW.FM.VER
SW.REV.CASE
SW.V&V.CASE2 SW.ETST.CASE
SW.DIV.CASE
SW.DES.CASE
SW.TOOL.CASE
Provide evidence for the integrity of the delivered system and the
development process: safety plan, safety audit records, quality
plan, QA records, plans, design documents, software, proof files,
V&V records.
SW.PRODUCT
Provide all necessary items for use and long-term support: design
documents, software, proof scripts, test environment, support
tools.
Appendix L Index
accident mitigation .................... 18, 61
adequate .................... 8
assumptions .................... 14
certification .................... 37
commercial risk .................... 8
conservatism .................... 49
correctness .................... 41
costs .................... 29
design options .................... 69
deterministic argument .................... 15, 86
diversity .................... 19
fault elimination .................... 18
FMEA .................... 41
Hazops .................... 39
IAEA-367 .................... 65
independent assessment .................... 49
integrity checks .................... 37
interlocking .................... 36
maintenance .................... 51
management .................... 52
modifiability .................... 62
MTTF .................... 15
MTTR .................... 22
novelty .................... 36
operator .................... 39
PES .................... 55
preliminary safety case .................... 22, 23
probabilistic arguments .................... 15
probabilistic criteria .................... 65
project lifecycle .................... 22, 45
purchaser .................... 39
qualitative argument .................... 16
robustness .................... 16
security .................... 21, 62
support tools .................... 31
teams .................... 117
tolerable .................... 61
tools .................... 89, 90
traceability .................... 28, 54
training .................... 23, 73
usability .................... 63
validation .................... 40, 56
verification .................... 40, 56
voting .................... 71
watchdogs .................... 70, 72