“From a safety viewpoint, the software architecture is where the basic safety strategy is developed in the software.”

[Figure 1. A TMR pattern and a variation: inputs feed redundant components whose outputs are combined by a voter]

However, there exists little guidance on how to develop such a basic safety strategy in software architecture design. Three key issues need to be addressed in order to improve this situation:

Design techniques. A vast number of software safety design techniques have emerged in both research and practice. Deciding upon appropriate techniques or

Safety analysis. Although a considerable number of safety analysis techniques have been proposed to aid software design, such as Software Hazard Analysis and Resolution in Design (SHARD) [12], there is little analysis work focusing on the architectural level to aid software architecture design. In particular, safety is a property of the entire system; it is almost impossible to analyse software safety effectively without considering system or platform safety. We thus need a safety analysis approach that is able to model the integration of software with hardware or other system components and to characterise the architectural elements at different architectural levels.

Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC’04)
0730-3157/04 $20.00 © 2004 IEEE

This paper describes an approach to software architecture design for safety-related systems. Our work was inspired by the notion of architectural tactics [4] proposed by the Software Engineering Institute (SEI) at Carnegie Mellon University. The principle of architectural tactics is to identify and codify the underlying primitives of patterns in order to solve the problem of the intractable number of existing patterns [3]. Clearly, there are relatively few tactics to be handled, since there are many ways in which tactics can be combined into patterns. The SEI has proposed tactics for six quality attributes (availability, modifiability, security, performance, usability, and testability) [4]. This is inadequate, however, to meet the needs of all the attribute communities. Safety stands out as an emergent property of a system that should be considered within the safety domain. Building upon the work on architectural tactics, it is natural to introduce the notion of safety tactics, which connect software safety and software architecture.

The following sections discuss our tactical approach to addressing the above three issues. This tactical approach extends the existing architectural tactic development framework proposed in [1, 2] to include consideration of safety as a quality attribute. The approach is presented in four main sections. Section 2 defines an analytic model for analysing software safety at the architectural level. Section 3 presents the development of safety tactics. Section 4 illustrates the tactic-based approach to software architecture design. Finally, Section 5 demonstrates the approach by means of an example.

2. An analytic safety-attribute model

Safety design concerns the identification and management of hazards. Hazards are caused by failures. A distinction is often made between the causes of failures in physical devices (e.g., random failures) and failures in software. Software does not fail randomly; its “failures” are due to its systematic nature (e.g., design faults). In software systems, safety is thus achieved by avoiding, or protecting against, these failures. As a result, the focus of attention in our analytic model is the relationship between the safety attribute and software architecture with respect to failures. Four key elements are identified to analyse this relationship: failure classification, failure cause, failure behaviour, and failure property. These four elements form an analytic model for the safety quality attribute and thus prompt our consideration of effective measures against these projected failures. Figure 2 shows this analytic model in a simple notation, which demonstrates how software-level failures can contribute to system-level behaviour. The model is defined in terms of the software system component and other system components. The underlying computer hardware is also distinguished from the software, since unintended behaviour of the computer hardware can lead to hazardous execution of the software. Finally, failure behaviour is analysed with respect to the interaction of (software) components.

2.1 Failure classification

Systematic failures in software can arise from any stage of the development process, which makes exhaustive failure identification almost infeasible. A better approach is therefore to analyse and classify failures at an abstract level. Several classifications of failure types exist in the literature [6, 14, 12]. We have adopted Pumfrey’s failure classification [12] as the most appropriate framework. The failure categories we have adopted are defined as follows:

Service provision: Omission (the failure of an event to occur); Commission (the spurious occurrence of an event).

Service timing: Early (the occurrence of an event before the time required); Late (the occurrence of an event after the time required).

Service value: Coarse incorrect (an incorrect but detectable value is delivered); Subtle incorrect (an incorrect and undetectable value is delivered).

2.2 Failure cause

The second stage in analysing failures concerns how they are generated among architectural elements. Again, to make the identification work feasible, we group all potential failure causes into three abstract types: the software itself, its underlying hardware, and its environment, as shown in Figure 2.

The software itself – Software, including its underlying operating system, is an abstraction. It can only contain flaws arising from incomplete or inaccurate specification, or from incorrect design and implementation. These flaws can lead to unexpected behaviour that the software exhibits within a system.

Underlying hardware – Even correct software can still fail due to unexpected behaviours of the underlying hardware.

Environment – By treating software as a black box,
we are also interested in anomalies that originate externally to the software and are fed into the software as inputs. Leveson [11] describes such anomalies as environment disturbance.

[Figure 2. The analytic safety-attribute model: “The Entire System”]

2.4 Failure property

The final consideration in our analytic model concerns the properties of failures that can be used to guide the selection of effective protection mechanisms. Two such useful properties are detectability [6] and tolerability [14].
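The failure categories of Section 2.1, the three cause groups of Section 2.2, and the detectability and tolerability properties can be combined into a small executable sketch of the analytic model; the class and field names below are illustrative assumptions, not notation from the paper:

```python
from dataclasses import dataclass
from enum import Enum, auto

class FailureType(Enum):
    # Service provision
    OMISSION = auto()          # failure of an event to occur
    COMMISSION = auto()        # spurious occurrence of an event
    # Service timing
    EARLY = auto()             # event occurs before the required time
    LATE = auto()              # event occurs after the required time
    # Service value
    COARSE_INCORRECT = auto()  # incorrect but detectable value delivered
    SUBTLE_INCORRECT = auto()  # incorrect and undetectable value delivered

class FailureCause(Enum):
    SOFTWARE = auto()     # flaws in specification, design, or implementation
    HARDWARE = auto()     # unexpected behaviour of the underlying hardware
    ENVIRONMENT = auto()  # external anomalies fed in as inputs (disturbances)

@dataclass
class ProjectedFailure:
    """One projected failure in the analytic model (cf. Figure 2)."""
    type: FailureType
    cause: FailureCause
    detectable: bool   # detectability property [6]
    tolerable: bool    # tolerability property [14]

# Example: a late brake command caused by a software flaw,
# detectable (e.g. by a timeout) but not tolerable.
f = ProjectedFailure(FailureType.LATE, FailureCause.SOFTWARE,
                     detectable=True, tolerable=False)
```

Enumerating projected failures this way makes it easy to filter, for instance, for those that are neither detectable nor tolerable, which demand the strongest protection measures.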
Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC’04)
0730-3157/04 $20.00 © 2004 IEEE
terms of failure elimination, failure detection, and failure tolerance. Safety tactics can thus be organised into three sets: tactics for failure avoidance, tactics for failure detection, and tactics for failure containment. To be consistent with the SEI’s tactics work, we have also organised the safety tactics as a hierarchy, as shown in Figure 3.

[Figure 3. The safety tactic hierarchy, rooted at “Safety”]

4. Deriving safety tactics

This section describes how safety tactics can be applied in software architecture and architectural pattern definition.

4.1 Properties of safety tactics
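The three-set organisation above can be recorded as a simple lookup table. In the minimal sketch below, the individual tactic names are all mentioned elsewhere in this paper, but their grouping into sets is our illustrative assumption rather than the full Figure 3 hierarchy:

```python
# Safety tactics organised as a three-set hierarchy (cf. Figure 3).
# The assignment of tactics to sets is an illustrative assumption.
SAFETY_TACTICS: dict[str, list[str]] = {
    "failure avoidance": ["simplicity", "substitution"],
    "failure detection": ["timeout", "timestamp", "sanity checking",
                          "condition monitoring", "comparison"],
    "failure containment": ["rollback", "degradation", "reconfiguration",
                            "voting", "barrier", "interlock"],
}

def find_tactic_set(tactic: str) -> str:
    """Return the set in the hierarchy that a given tactic belongs to."""
    for tactic_set, tactics in SAFETY_TACTICS.items():
        if tactic in tactics:
            return tactic_set
    raise KeyError(tactic)
```

A catalogue like this supports the selection step of Section 5.2, where whole sets (e.g. recovery or barrier tactics) are accepted or rejected against a safety requirement.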
Table 2. An example documented safety tactic: Rollback

Aim: To roll back in time to a previous known state upon the detection of an erroneous state, and then to retry execution.

Description: Also known as “recovery-and-retry”, this tactic simply attempts to simulate the reversal of time, in the hope that the earlier known state in the new time will not recreate the failure. The tactic is applied when a failure is detected. In practice, it can be applied to multi-version software in combination with redundancy tactics. In such situations, rollback retries the computation using an alternate version of the software that is assumed to work better than the original one.

Rationale: The purpose of this tactic is to contain failure. This forms the basis for presenting claims about the tactic.

Applicability: The tactic can be applied when the system state is recoverable. It is always combined with tactics for failure detection in order to activate the recovery procedure. This tactic is more secure in that it places no reliance on the current system state, and it is less expensive than other failure recovery tactics such as degradation and reconfiguration, and than tactics for failure masking. However, it may not be suitable for timing-critical systems, since re-execution will probably miss their deadline requirements. Furthermore, a successful rollback does not mean that the faults have been eliminated; the failure may still recur after rolling back and retrying in a new time.

Consequences: The erroneous state leading to a failure is repaired. However, the failure may not be contained successfully, and it may recur after rolling back.

Side Effects: The tactic incurs penalties both in the speed of system operation and in the expenditure of resources needed to permit state restoration.

Practical Strategies: Recovery block

Patterns: Simplex pattern

[Figure: GSN argument over the Rollback tactic: goal G1 (“The effect of a failure has been minimised”), strategy S1 (“Argument over rollback in design”), context C1 (“Design decision to use rollback”), and sub-elements G2, G3, and A1]
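The Rollback tactic, realised through its practical strategy of recovery blocks, can be sketched as follows; the checkpointing scheme, the state representation, and the acceptance test are illustrative assumptions:

```python
import copy

def recovery_block(state: dict, variants, acceptance_test):
    """Rollback tactic as a recovery block: checkpoint the state,
    try each software variant in turn, and roll back to the
    checkpoint whenever the acceptance test detects a failure."""
    checkpoint = copy.deepcopy(state)      # previous known state
    for variant in variants:
        trial = copy.deepcopy(checkpoint)  # roll back in time
        try:
            variant(trial)
        except Exception:
            continue  # failure detected: retry with the next variant
        if acceptance_test(trial):
            state.clear()
            state.update(trial)
            return True
    # Successful rollback does not mean the faults are eliminated:
    # every variant may fail, leaving only the repaired checkpoint.
    state.clear()
    state.update(checkpoint)
    return False

# Usage: a primary that corrupts the state and an alternate version
# that is assumed to work better than the original.
def primary(s):   s["speed"] = -1   # erroneous result
def alternate(s): s["speed"] = 0    # acceptable result

state = {"speed": 5}
ok = recovery_block(state, [primary, alternate], lambda s: s["speed"] >= 0)
```

Note how the sketch mirrors the table: detection (the acceptance test) activates the recovery procedure, and the multi-version combination with redundancy appears as the list of variants.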
Interlock. Correct sequencing of events should be enforced. For example, the control function must not produce the next result value without the previous …

… with the selected safety tactics.

[Figure 5. AGV collision detection mechanisms: proximity sensors, bumper microswitches, a backup battery, and the control system]
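The Interlock tactic described above can be sketched as a guard that blocks out-of-order events; the event names and the class shape are illustrative assumptions:

```python
class Interlock:
    """Interlock tactic: enforce correct sequencing of events.
    The control function must not produce the next result value
    until the preceding event in the sequence has occurred."""

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.index = 0

    def permit(self, event: str) -> bool:
        """Allow the event only if it is the next one in the cycle."""
        expected = self.sequence[self.index % len(self.sequence)]
        if event != expected:
            return False  # out-of-order event is blocked
        self.index += 1
        return True

# A control cycle: read the sensors, compute, then produce an output.
lock = Interlock(["read_input", "compute", "produce_output"])
```

With this guard in place, a request to produce an output before the compute step has run is simply refused, which is the passive, sequencing-based protection the tactic provides.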
Control/Monitor Pattern 2

In the above pattern, the responsibility for failure detection relies completely on the monitor component, and no protection is taken against its failure, which becomes a potential source of single-point failure in the system. Furthermore, there is a strong dependency between the control and monitoring functions, since both are implemented in the same “Monitor” component. Thus, we may instead allocate the safety responsibility to both the “Monitor” and “Control” channels by applying the comparison tactic to each component to form cross-checking. Whenever the two channels do not agree with each other, action should be taken by the “Output” component to ensure that the actuator is stopped in time. Figure 4 shows the Control/Monitor Patterns 1 and 2.

[Figure 4. The Control/Monitor Patterns 1 and 2: in each, Sensors feed Control and Monitor components, whose results pass through an Output component to the Actuators]

5. An example

Our example concerns the design of the control system of an Automatic Guided Vehicle (AGV). The favoured system consists of a number of vehicles that navigate using a scanning laser and a loaded map file. The movement of each vehicle is instructed by its embedded control system, which calculates its position either precisely by triangulation, by dead-reckoning, or by a combination of both. Figure 5 shows the collision detection mechanisms proposed for each vehicle. The braking mechanisms are hydraulically operated. There are two sensing systems:

A proximity sensing system, where proximity sensors signal to the control system that an object is being approached, with sufficient space to stop the vehicle before the bumpers touch the object.

Bumpers, whose pressure is detected by microswitches; these open if force is applied to the bumpers, cutting off the power to the drive motor regardless of any action taken by the control system.

5.1 Elicitation of software safety requirements

Our software safety design process always starts with the system/platform hazards identified by preliminary hazard analysis. Here we consider only one of the main hazards arising from AGV movement operation: the AGV system is faulty such that obstacle detection and/or braking does not operate correctly. Thus one of the AGV platform safety requirements is that the AGV platform shall avoid collision. Our design process proceeds by refining this platform safety requirement into the lower-level system functional safety requirement: the AGV shall detect obstacles and stop in time. Further refinement of this system safety function requires the allocation of the stopping function to hardware or software. The Substitution tactic tells us that using a simple hardware protection mechanism is safer. However, there is a domain constraint that revising the hardware design is not allowed. Therefore, a software safety requirement is elicited during the design process: the control system shall output commands in time in response to obstacle detection. The whole safety requirements and design processes are recorded in the Goal Structuring Notation (GSN) [9]. The above top-down safety requirements refinement process, with hardware/software trade-offs, is illustrated in Figures 6 and 7.

5.2 Identification of possible safety tactics

The elicited high-level software safety requirements can be further refined into lower-level software functional safety requirements and timing requirements. Let us consider the software functional safety requirement: the control system shall always output brake and stop commands in response to detection, even in the presence of credible failures. Since failures are unavoidable, tactics must be selected from the categories of failure detection and failure containment. Obviously, the time-out and timestamp tactics are irrelevant in this case. The sanity-checking tactic is also rejected, since there is nothing to act on the validity signal. Moreover, the recovery tactics set is deemed inappropriate, simply because no recovery action can immediately stop a collision. Finally, the barrier tactics set are not suitable candidates, since they are used as passive protection mechanisms. Therefore, the possible safety tactic alternatives would be:
[Figure 6. The top-down process for eliciting software safety requirements: GSN goals include PlatCollisionAvoidance (“AGV platform shall avoid collision”), AGVMovementSafety, HWCircuitStopping (“AGV shall introduce a hardware circuit to ensure that the brakes are applied immediately when a microswitch is triggered”), BumperStopping (“AGV shall revise the bumper mechanisms so that the movement of the bumpers directly operates the brakes”), EnablingCondition (“AGV movement only proceeds when enabling conditions exist and the controller has attempted to command movement”), CSFnStopping, and NoSpecificTimingConstraint (“There is no obvious timing constraint that can be specified”), together with SubstitutionArgument (“AGV shall apply brakes using hardware devices”) and CrosscheckingArgument (“The control system design shall introduce twin cross-linked channels”)]

[Figure 9. Arguments over the use of voting: HighAvailability (“Voting can lead to high availability of the control system”) versus HighCost (“There is no sufficiently high availability requirement for the AGV to justify the cost”)]

[Figure 10. Arguments over the use of enabling monitor: AvailabilityCompromised (“The availability of the AGV movement function is compromised”)]

[Figure 11. Arguments over the use of timeout monitor: DetectDiscrepancy (“Timeout can detect whether the stopping operation of the control system meets specific timing constraints”)]

[Figure: candidate solutions Sn1–Sn5 (including twin cross-linked channels, triple modular redundancy, simple software design, control + enabling, and controller + watchdog) linked to tactic arguments ST1–ST6, with GSN contexts ArgOverCrosschecking (“Argument over the use of Crosschecking tactic”), ArgOverConditionMonitoring (“Argument over the use of Condition monitoring tactic”), ArgOverVoting (“Argument over the use of Voting tactic”), and ArgOverTimeout (“Argument over the use of Timeout tactic”); a controller-with-monitor diagram shows Inputs, Controller, Outputs, Critical Outputs, Monitor, and Enable]
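The twin cross-linked channel alternative (cross-checking via the comparison tactic, with the “Output” component stopping the actuator on any disagreement, as described for Control/Monitor Pattern 2) can be sketched as follows; the command names are illustrative assumptions:

```python
def cross_checked_output(control_cmd: str, monitor_cmd: str,
                         stop_cmd: str = "BRAKE") -> str:
    """Comparison tactic across twin cross-linked channels:
    the Output component forwards a command only when both
    channels agree; on any discrepancy the actuator is stopped."""
    if control_cmd == monitor_cmd:
        return control_cmd
    return stop_cmd  # channels disagree: fail to a safe (stopped) state

# Agreement: the agreed command is forwarded to the actuator.
agreed = cross_checked_output("DRIVE", "DRIVE")
# Discrepancy: the Output component ensures the actuator is stopped.
disagreed = cross_checked_output("DRIVE", "STOP")
```

The design choice here is that disagreement is itself treated as detection: neither channel is trusted over the other, so the only safe action available to the Output component is to stop the vehicle.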