Abstract
The fault coverage of digital systems in nuclear power plants is evaluated using a simulated fault injection method. Digital systems have numerous advantages, such as the sharing of hardware elements and hardware replication for the needed number of independent channels. However, the application of digital systems to safety-critical systems in nuclear power plants has been limited due to reliability concerns. Among these reliability issues, fault coverage is one of the most important factors. In this study, we propose a method for evaluating the fault coverage of safety-critical digital systems in nuclear power plants. The system under assessment is a local coincidence logic processor for a digital plant protection system at Ulchin nuclear power plant units 5 and 6. The assessed system is simplified, and a simulated fault injection method is then applied to evaluate the fault coverage of two fault detection mechanisms. In the simulated fault injection experiment, the fault detection coverage of the watchdog timer is 44.2% and that of the read-only memory (ROM) checksum is 50.5%. Our experiments show that the fault coverage of a safety-critical digital system can be effectively quantified using the simulated fault injection method.
© 2005 Published by Elsevier Ltd.
Keywords: Digital plant protection system; Local coincidence logic processor; Fault coverage; Simulated fault injection; Heartbeat-watchdog timer; ROM checksum
techniques cannot adequately evaluate all features of digital systems. Kang and Sung found that fault coverage, common cause failures, and software reliability are the three most critical factors in the safety assessment of digital systems [3]. Among these factors, this study focuses on a method for evaluating the fault coverage of an actual nuclear power plant digital system.

The probability of a fault being properly removed from a fault-tolerant system is referred to as fault coverage. The fault coverage value crucially affects the dependability of a system; fault coverage is therefore one of the most critical factors in a PSA. There are both mathematical and qualitative expressions for fault coverage. Mathematically, the fault coverage C is defined as the conditional probability that a fault is processed correctly, given that the fault exists:

$$C = P(\text{fault processed correctly} \mid \text{fault existence}) \qquad (1)$$

Qualitatively, coverage is a measure of the system's ability to detect, locate, contain, and recover from the presence of a fault. There are four primary types of fault coverage: (1) fault detection coverage, (2) fault location coverage, (3) fault containment coverage, and (4) fault recovery coverage. Thus, the term 'fault processed correctly' refers to one or more of these four coverage types [4].

Most safety-critical systems handle faults in a fail-safe manner when they detect them. For example, the digital protection systems of the Ulchin nuclear power units generate safety signals when they detect a fault. In this case, therefore, fault detection coverage is the quantity of interest. The purpose of this study is to introduce a quantitative fault detection coverage evaluation method for a fail-safe digital system using simulated fault injection.

This paper is structured as follows. Section 2 describes the fault detection coverage evaluation method. Section 3 introduces the target system and the local coincidence logic (LCL) in the DPPS. Section 4 presents the experimental setup, Section 5 presents results from the experiment, and Section 6 concludes the paper.

2.1. Coverage evaluation method

Several studies have considered quantitative evaluation of fault detection coverage using fault injection methods. Khoche et al. proposed a deductive fault simulator for fault coverage evaluation [5]. Levendel and Menon used hardware description languages to describe small circuits and applied faults to them, such as function variables stuck at 0 or 1 and control faults [6]. Mao and Gulati proposed an RTL fault model and simulation methodology [7]. Hayne and Johnson evaluated coverage using hardware description and fault injection [8]. However, these studies concentrated only on testing a single semiconductor component during the manufacturing process. A nuclear power plant digital instrumentation and control (I&C) system is a very large system with many components, which precludes the use of conventional evaluation methods.

Therefore, in this study, we propose a method for evaluating the fault detection coverage of a system, defined as follows:

$$\text{System coverage} = \frac{\text{Total number of detected faults in the system}}{\text{Total number of faults in the system}} \qquad (2)$$

For a given time $\Delta t$, the number of faults in component $i$ is $\lambda_i \Delta t$, where $\lambda_i$ is the fault rate of component $i$ ($i = 1, 2, 3, \ldots, N$).¹ Therefore, the total number of faults in the system can be represented as

$$\text{Total number of faults in the system} = \sum_{i=1}^{N} \lambda_i \Delta t \qquad (3)$$

As shown in the previous section, fault detection coverage is the probability that an existing fault is processed correctly (here, detected). The number of detected faults in component $i$ can therefore be obtained by multiplying the fault detection coverage $C_{i,d}$ by the total number of faults in component $i$, giving $C_{i,d} \cdot \lambda_i \Delta t$. The total number of detected faults in the system is then

$$\text{Total number of detected faults in the system} = \sum_{i=1}^{N} C_{i,d} \, \lambda_i \Delta t \qquad (4)$$

Substituting (3) and (4) into (2), the common factor $\Delta t$ cancels and the system fault detection coverage becomes

$$\text{System fault detection coverage} = \frac{\sum_{i=1}^{N} C_{i,d} \, \lambda_i}{\sum_{i=1}^{N} \lambda_i} \qquad (5)$$

The fault detection coverage of each component is evaluated using simulated fault injection experiments.

There are three types of faults: permanent faults, transient faults, and intermittent faults. Permanent faults are related to irreversible physical defects in the circuit; these defects can be produced during the manufacturing process or during normal operation. Transient faults appear during the operation of a circuit and last only a very short time. Intermittent faults are similar to transient faults in being temporary, but this type of fault appears and disappears repeatedly in time, without

¹ In this paper, we treat the fault rate as the failure rate from MIL-HDBK-217F. That is, all fault occurrences in the system lead to a failure state.
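Eq. (5) makes the system-level detection coverage a fault-rate-weighted average of the per-component coverages $C_{i,d}$. The following is a minimal Python sketch of Eqs. (3) to (5); the function names (`total_faults`, `detected_faults`, `system_detection_coverage`) are hypothetical, and the example rates and coverages are illustrative, not measured values from this study.

```python
from typing import Sequence

# Hypothetical helper names; a sketch of Eqs. (3)-(5), not the authors' tool.


def total_faults(rates: Sequence[float], dt: float) -> float:
    """Eq. (3): expected number of faults over an interval dt, sum of lambda_i * dt."""
    return sum(lam * dt for lam in rates)


def detected_faults(rates: Sequence[float], coverages: Sequence[float], dt: float) -> float:
    """Eq. (4): expected number of detected faults, sum of C_{i,d} * lambda_i * dt."""
    if len(rates) != len(coverages):
        raise ValueError("rates and coverages must have the same length")
    return sum(c * lam * dt for c, lam in zip(coverages, rates))


def system_detection_coverage(rates: Sequence[float], coverages: Sequence[float]) -> float:
    """Eq. (5): dt cancels, so any positive dt gives the same ratio (1.0 is used here)."""
    if sum(rates) <= 0:
        raise ValueError("total fault rate must be positive")
    return detected_faults(rates, coverages, 1.0) / total_faults(rates, 1.0)


if __name__ == "__main__":
    # Illustrative three-component system (values are made up for the example).
    rates = [2e-6, 1e-6, 1e-6]
    coverages = [0.5, 0.2, 0.9]
    print(system_detection_coverage(rates, coverages))  # about 0.525
```

Because the weights are the fault rates, a component with a large $\lambda_i$ and poor coverage pulls the system value down more than a rarely failing component with perfect coverage can raise it.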
S.J. Kim et al. / Reliability Engineering and System Safety xx (2005) 1–10 3
2.2.3. Simulated fault injection

In this technique, the system being assessed is simulated on another computer system. Faults are induced by altering the logical values of the model elements during the simulation.

This work uses the simulated fault injection technique to evaluate fault detection coverage because hardware-implemented fault injection is difficult, requires expensive hardware, and the injected faults cannot be controlled and are limited by the complexity of the system. In addition, software-implemented fault injection concentrates on software rather than hardware. For these reasons, we applied the simulated fault injection technique to a digital system model, in which faults are simulated by modifying the hardware description code.

3. Target system

digitized trip demand signals from four bistable processors and provide binary outputs that indicate whether two or more channels are in a trip condition. The individual 2-out-of-4 outputs of an LCL processor are combined to generate an initiation signal for the particular function.

Fig. 1. DPPS trip path block diagram.

Fig. 2 shows the 2-out-of-4 coincidence logic. The 2-out-of-4 coincidence logic function is written in C, compiled with a C compiler for the 8051, and stored in ROM.

For the experiment, the LCL processor is realized in a hardware description language. The coincidence logic with self-checking is programmed and stored in the ROM. The results of the local coincidence logic operation and the error detection signals are sent to the output pins of the 8051 CPU.
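The 2-out-of-4 voting described above can be illustrated with a short sketch. This is Python, not the authors' C code for the 8051; the function names are hypothetical, and bypass handling is not modelled. It shows the function in two equivalent forms: a count of tripped channels, and the OR of the six channel-pair ANDs often drawn as gate logic.

```python
from itertools import combinations, product
from typing import Sequence

# Hypothetical names; an illustration of 2-out-of-4 voting, bypass ignored.


def two_out_of_four(trips: Sequence[bool]) -> bool:
    """Trip when at least two of the four channel inputs (A, B, C, D) are set."""
    if len(trips) != 4:
        raise ValueError("expected exactly four channel inputs (A, B, C, D)")
    return sum(1 for t in trips if t) >= 2


def two_out_of_four_pairs(trips: Sequence[bool]) -> bool:
    """Same function as an OR of the six pair ANDs: AB + AC + AD + BC + BD + CD."""
    if len(trips) != 4:
        raise ValueError("expected exactly four channel inputs (A, B, C, D)")
    return any(a and b for a, b in combinations(trips, 2))


if __name__ == "__main__":
    # Exhaustively confirm both forms agree on all 16 input patterns.
    for pattern in product((False, True), repeat=4):
        assert two_out_of_four(pattern) == two_out_of_four_pairs(pattern)
    print(two_out_of_four((True, False, True, False)))  # True
```

The exhaustive check over 16 patterns is cheap here and is the same idea a fault injection campaign scales up: enumerate every case rather than sample.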
[Figure: block diagram with channel A to D parameter trip and bypass signals, bistable processors, RAM, ROM, I/O port, watchdog timer, counter, digital output modules, and selective 2/4 logic]
faults until finally provoking permanent faults. These cases are considered to be possible defects of the LCL processor.

Stuck-at faults are circuit failures equivalent to one or several circuit nodes being fixed at logic 0 (stuck-at 0), logic 1 (stuck-at 1), or wrong data. The relative simplicity of this model has led to its wide use in industry. Fig. 8 shows an example of stuck-at fault injection. For the experiment, stuck-at fault injection is realized by data modification. In the RAM and ROM cases, one byte of the original data is modified by an AND or OR operation so that one bit is fixed to 0 or 1. In the CPU cases, one byte of the original data is replaced with wrong data.

The stuck-at fault operation is applied continuously to each address in the simple computer system components until the program is completed. This injection method provides an easy way to implement permanent stuck-at faults by modifying signals and variables. Stuck-at faults that occur in a system can cause infinite loops or instruction errors that lead to wrong results.

[Figure: stuck-at fault injection example; original data bits D7 to D0 combined by OR with reference data 00000001; simulator steps Simulate(), Locate data, Setbit(), GetBit()]

4.4. Number of experiments

In this study, many experiments are performed to evaluate the fault detection coverage. The simulations consider the input case '0000 0000' with stuck-at faults, divided into CPU, ROM, and RAM cases.

Table 2 lists the number of fault cases in the CPU, and Fig. 9 shows how stuck-at faults are injected into the CPU. Faults are injected into three CPU components, the address decoder, the instruction register, and the program counter, all of which are very sensitive to faults affecting CPU function. For an address decoder fault, a normal instruction is replaced with another instruction. The simulator has 111 instructions, each of which can be replaced by 110 possible wrong instructions, giving 111 × 110 = 12,210 possible cases for the decoder. The instruction register is a small, high-speed memory that holds the instruction currently being executed. In the 8051, the instruction register has 8 bits; therefore, the number of possible stuck-at fault cases is 256. The program counter is a register that points to the next instruction to be executed. The 8051 uses a 16-bit program counter, so the number of possible stuck-at fault cases is 65,536. Therefore, the total number of experiments on the CPU is 12,210 + 256 + 65,536 = 78,002.

Table 3 shows the number of experiments on RAM and ROM. In the RAM and ROM, both stuck-at 1 and stuck-at 0 faults are injected. The RAM has 434 bytes and the ROM has 384 bytes; therefore, the total number of experiments on them is (434 + 384) × 8 bits × 2 = 13,088. All faults are injected into the system sequentially. For instance, one permanent stuck-at 0 fault is injected into RAM and simulated in the hardware description; the result is then validated at the description's output ports, after which the fault is moved to another bit and the simulation is repeated.
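The AND/OR data modification used for the RAM and ROM cases, and the sequential one-fault-at-a-time campaign, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `inject_stuck_at`, `StuckAtMemory`, and `enumerate_faults` are hypothetical, and the memory model stands in for the hardware description rather than reproducing it. The permanent fault is modelled by re-applying the mask on every read of the faulty address.

```python
from typing import Iterator, Tuple

# Hypothetical names; a sketch of AND/OR stuck-at injection, not the authors' simulator.


def inject_stuck_at(byte: int, bit: int, stuck_value: int) -> int:
    """Force one bit of an 8-bit value: OR with a one-hot mask for stuck-at 1,
    AND with the inverted mask for stuck-at 0."""
    if not 0 <= byte <= 0xFF or not 0 <= bit <= 7:
        raise ValueError("byte must be 0..255 and bit must be 0..7")
    mask = 1 << bit
    if stuck_value == 1:
        return byte | mask
    if stuck_value == 0:
        return byte & ~mask & 0xFF
    raise ValueError("stuck_value must be 0 or 1")


class StuckAtMemory:
    """Byte-addressed memory with one permanent stuck-at fault.

    The mask is re-applied on every read of the faulty address, so the fault
    persists for the whole run even if the program writes a new value there.
    """

    def __init__(self, contents: bytes, address: int, bit: int, stuck_value: int):
        self._data = bytearray(contents)
        self._fault = (address, bit, stuck_value)

    def read(self, address: int) -> int:
        value = self._data[address]
        f_addr, f_bit, f_val = self._fault
        return inject_stuck_at(value, f_bit, f_val) if address == f_addr else value

    def write(self, address: int, value: int) -> None:
        self._data[address] = value & 0xFF


def enumerate_faults(n_bytes: int) -> Iterator[Tuple[int, int, int]]:
    """Yield every single stuck-at fault (address, bit, value), one at a time."""
    for address in range(n_bytes):
        for bit in range(8):
            for stuck_value in (0, 1):
                yield address, bit, stuck_value


if __name__ == "__main__":
    print(inject_stuck_at(0b0000_0000, 0, 1))            # 1: bit D0 forced to 1 by OR
    print(sum(1 for _ in enumerate_faults(434 + 384)))  # 13088
```

Note that a stuck-at fault which forces a bit to the value it already holds leaves the data unchanged; such cases are still counted in the campaign size.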
[Figure: (a) Fault example (address decoder): simplified system with faulted data register and heartbeat signal]
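The experiment counts in Section 4.4 follow from simple arithmetic on the quoted figures. The sketch below reproduces them; the constant and function names are hypothetical, and the counting rules are taken directly from the text (all 2^8 instruction-register values and all 2^16 program-counter values are counted).

```python
# Figures quoted in Section 4.4; names are hypothetical, grouping is illustrative.
N_INSTRUCTIONS = 111   # instructions in the simulator
IR_BITS = 8            # 8051 instruction register width
PC_BITS = 16           # 8051 program counter width
RAM_BYTES = 434
ROM_BYTES = 384


def cpu_fault_cases() -> dict:
    """Return the CPU fault-case counts by component."""
    return {
        # each of the 111 instructions replaced by one of the other 110
        "address_decoder": N_INSTRUCTIONS * (N_INSTRUCTIONS - 1),
        # every 8-bit value the instruction register can be forced to
        "instruction_register": 2 ** IR_BITS,
        # every 16-bit value the program counter can be forced to
        "program_counter": 2 ** PC_BITS,
    }


def memory_fault_cases() -> int:
    """Stuck-at 0 and stuck-at 1 on every bit of every RAM and ROM byte."""
    return (RAM_BYTES + ROM_BYTES) * 8 * 2


if __name__ == "__main__":
    cpu = cpu_fault_cases()
    print(sum(cpu.values()))       # 78002
    print(memory_fault_cases())    # 13088
```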
[Figure: bar chart of fault detection coverage, 0% to 60%]

Detection mechanism          Fault detection coverage of ROM    Fault detection coverage of simplified system
Heartbeat-watchdog timer     13.2%                              44.2%
Checksum                     50.0%                              50.5%
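The checksum row above refers to a ROM self-check whose exact algorithm the text does not specify. The sketch below assumes an 8-bit additive checksum (sum of bytes modulo 256) and uses hypothetical names (`checksum8`, `stuck_at`, `checksum_detects`, `checksum_coverage`); it shows how a per-fault detection verdict over an exhaustive single stuck-at campaign becomes a coverage fraction. It is an offline abstraction, not the authors' on-target self-check.

```python
from typing import Iterable

# Hypothetical names; assumes an 8-bit additive checksum, which the paper does not specify.


def checksum8(data: Iterable[int]) -> int:
    """Assumed checksum: sum of all bytes modulo 256."""
    return sum(data) & 0xFF


def stuck_at(byte: int, bit: int, stuck_value: int) -> int:
    """Force one bit of a byte to 0 or 1 (OR for stuck-at 1, AND for stuck-at 0)."""
    mask = 1 << bit
    return (byte | mask) if stuck_value else (byte & ~mask & 0xFF)


def checksum_detects(rom: bytes, address: int, bit: int, stuck_value: int) -> bool:
    """True if one permanent stuck-at fault makes the recomputed checksum
    differ from the checksum of the fault-free image."""
    faulty = bytearray(rom)
    faulty[address] = stuck_at(faulty[address], bit, stuck_value)
    return checksum8(faulty) != checksum8(rom)


def checksum_coverage(rom: bytes) -> float:
    """Fraction of all single-bit stuck-at faults in the image that are flagged."""
    cases = [(a, b, v) for a in range(len(rom)) for b in range(8) for v in (0, 1)]
    detected = sum(checksum_detects(rom, a, b, v) for a, b, v in cases)
    return detected / len(cases)


if __name__ == "__main__":
    image = bytes([0x12, 0x34, 0xAB, 0x00])   # hypothetical ROM contents
    print(checksum_coverage(image))            # 0.5
```

In this sketch, any stuck-at fault that actually flips a bit changes the sum by a nonzero amount modulo 256, whereas a fault that forces a bit to the value it already holds leaves the image unchanged and cannot be flagged, so the sketch returns exactly 0.5 for any image.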
coverage of digital systems can be improved by using self-checking algorithms. In this study, the ROM checksum method is used as a self-checking algorithm. If the ROM checksum method is applied, the fault detection coverage of the system increases to 50.5%.

Our results show that the proposed method can be usefully applied to the design of safety-critical digital systems used in nuclear power plants.

Acknowledgements

This work is partly supported by the Korean National Research Laboratory (NRL) Program.

References

[1] Kaufman LM, Johnson BW. Embedded digital system reliability and safety analyses. NUREG/GR-0200; 2001.
[2] National Research Council. Digital instrumentation and control systems in nuclear power plants. Washington, DC: National Academy Press; 1997.
[3] Kang HG, Sung T. An analysis of safety-critical digital systems for risk-informed design. Reliab Eng Syst Saf 2002;78(3):307–14.
[4] Dugan JB, Trivedi KS. Coverage modeling for dependability analysis of fault-tolerant systems. IEEE Trans Comput 1989;38(6):775–87.
[5] Khoche A, Sherlekar SD, Venkateshesh G, Venkateswaran R. A behavioral fault simulator for ideal. IEEE Design Test Comput 1992;9(4):14–21.
[6] Levendel YH, Menon PR. Test generation algorithms for computer hardware description languages. IEEE Trans Comput 1982;C-31:577–89.
[7] Mao W, Gulati R. Improving gate level fault coverage by RTL fault grading. In: Proceedings of the International Test Conference; 1996. p. 150–9.
[8] Hayne RJ, Johnson BW. Behavioral fault modeling in a VHDL synthesis environment. In: Proceedings of the 17th VLSI Test Symposium; 1999. p. 333–40.
[9] MIL-HDBK-217F. Reliability prediction of electronic equipment; 2 December 1991.
[10] Hsueh M, Tsai T, Iyer RK. Fault injection techniques and tools. IEEE Comput 1997;30(4):75–82.
[11] Clark JA, Pradhan DK. Fault injection: a method for validating computer-system dependability. IEEE Comput 1995;28:47–56.
[12] Technical manual for Digital Plant Protection System (DPPS) for Ulchin 5 and 6. Westinghouse Electric Company LLC; 2002.
[13] AT89 series hardware description. Atmel Corporation; 2000.
[14] Tanenbaum AS. Structured computer organization. Englewood Cliffs, NJ: Prentice-Hall International; 1984.
[15] Siewiorek DP, Swarz RS. Reliable computer systems: design and evaluation. A K Peters; 1998.