Maximum Likelihood Voting For Fault-Tolerant Software With Finite Output-Space
For a given input, each A-version outputs 0 and each B-version outputs 1; this event is denoted E. According to MV or CV, the correct result is estimated to be 1, because the 6 B-versions (of 11 total versions) give 1. However, given the 11 outputs and assuming s-independence of version failures,

$$\Pr\{\text{correct output is } 0 \mid E\} = 0.99^{5} \cdot (1-0.95)^{6} / \Pr\{E\} = 1.49 \cdot 10^{-8} / \Pr\{E\};$$

$$\Pr\{\text{correct output is } 1 \mid E\} = (1-0.99)^{5} \cdot 0.95^{6} / \Pr\{E\} = 7.35 \cdot 10^{-11} / \Pr\{E\}.$$

Since Pr{correct output is 0 | E} > Pr{correct output is 1 | E}, it is more likely that the correct output is 0, although only 5 of the 11 versions give 0.

This paper proposes:
- a MLV strategy for multiversion software with finite output space. Based on the outputs and the reliabilities of the software versions, this strategy selects the output that is most likely correct.
- two enhancements to MLV.
We analyze & compare the performance of these MLV strategies with those of MV & CV.

2. ASSUMPTIONS & NOTATION

Assumptions
1. All software versions are functionally equivalent and mutually s-independent.
2. R is finite.
3a. For 1 input: 1 of the R possible outputs is correct; R-1 of them are incorrect.
3b. A failed software version gives 1 of the R-1 incorrect outputs with equal probability.

3. MLV

The main idea of MLV is to choose, based on how reliable each software version is and how faithful its output is, the output which is most likely correct.

From assumption #3, a software version gives a particular incorrect output with probability (1 - r_i)/(R - 1).

$$\Pr\{s\} = \sum_{j=1}^{R} \Pr\{s \cap (\text{correct result is output } j)\} = \sum_{j=1}^{R} \prod_{i=1}^{N} A_j(i)$$

$$A_j(i) = \begin{cases} r_i, & \text{if version } i \text{ gives output } j, \\ (1-r_i)/(R-1), & \text{otherwise.} \end{cases}$$

x_j is found as follows:

$$x_j = \Pr\{\text{correct result is output } j \mid s\} = \frac{\Pr\{s \cap (\text{only the versions giving output } j \text{ are correct})\}}{\Pr\{s\}} = \frac{\prod_{i=1}^{N} A_j(i)}{\sum_{k=1}^{R} \prod_{i=1}^{N} A_k(i)} \quad (3)$$

Since the denominators of all x_j are the same, see (3), the voter need only find & compare the numerators of the x_j, and thus reduce the computation time.

MLV does not pose any restriction on the r_i, i ∈ [1, N]. Therefore, MLV still applies when r_i is the same constant for all i.
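The comparison of numerators described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `mlv_numerators` and `mlv_vote` and the `output_space` argument are hypothetical, and ties are simply returned to the caller rather than resolved. The sample data are the 5 A-versions (reliability 0.99) and 6 B-versions (reliability 0.95) from the introductory example.

```python
from math import prod


def mlv_numerators(outputs, reliabilities, output_space):
    """Return {j: product over versions of A_j(i)} for each candidate output j.

    A_j(i) is r_i when version i gave output j, else (1 - r_i) / (R - 1).
    Assumes R = len(output_space) >= 2.
    """
    R = len(output_space)
    return {
        j: prod(
            r if out == j else (1.0 - r) / (R - 1)
            for out, r in zip(outputs, reliabilities)
        )
        for j in output_space
    }


def mlv_vote(outputs, reliabilities, output_space):
    """Return (list of most-likely-correct outputs, numerators)."""
    nums = mlv_numerators(outputs, reliabilities, output_space)
    best = max(nums.values())
    winners = [j for j, value in nums.items() if value == best]
    return winners, nums


# Introductory example: five A-versions output 0, six B-versions output 1.
outputs = [0] * 5 + [1] * 6
reliabilities = [0.99] * 5 + [0.95] * 6
winners, nums = mlv_vote(outputs, reliabilities, [0, 1])
print(winners, nums)
```

Because only the numerators are compared, the common denominator Pr{s} is never computed, which mirrors the remark about reduced computation time.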
LEUNG: MAXIMUM LIKELIHOOD VOTING FOR FAULT-TOLERANT SOFTWARE WITH FINITE OUTPUT-SPACE 421
Example 2
(4)
Figure 2. Maximum Likelihood Voting with Enhancement 1
422 IEEE TRANSACTIONS ON RELIABILITY. VOL. 44. NO. 3. 1995 SEPTEMBER
2. FOR each incoming set of y software-version outputs (y ≥ 1 is a r.v.) DO
3a. Ω := Ω ∪ {j_k : k ∈ [1, y]};
3b. FOR k = 1 TO y DO Ω := Ω ∪ {j_k}; END-FOR
3c. x := x + y;
4. Compute all the x_i according to (4); arrange them in decreasing sequence such that there are z maxima (z ≥ 0) satisfying both conditions:
5. IF z > 0 THEN choose output j_k, k ∈ [1, z], with probability 1/z;
   /* If z = 1, then the unique maximum is chosen with probability 1 */
   GoTo EndAlgorithm; END-IF
6. IF x = R THEN Output a warning message; GoTo EndAlgorithm; END-IF
7. END-FOR
EndAlgorithm

6. PERFORMANCE ANALYSIS

Notation

Assumptions

$$\Pr\{t\} = \Pr\{t \cap (\text{only the versions giving output 1 are correct})\} = \prod_{i=1}^{N} \delta(i);$$

$$\delta(i) = \begin{cases} r_i, & \text{if version } i \text{ gives output } 1, \\ (1-r_i)/(R-1), & \text{otherwise.} \end{cases}$$

The f is found from t:
f_i := 0 for 1 ≤ i ≤ R;
FOR i = 1 TO N DO f_j := f_j + 1, where j is the output given by version i; END-FOR.

Execution Time

(For all voting except MLV-E2.)
The voter waits for all the outputs before it estimates, so the voting execution time is the largest of the N version execution times:

(voting execution time) = max{execution time of version i : i ∈ [1, N]}; (6)
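As a small illustration of the two quantities defined for the performance analysis, the sketch below computes Pr{t} (with output 1 taken as the correct output) and the vote counts f for one output vector. The function names and the sample values of t, r and R are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from itertools import product
from math import prod


def prob_of_outputs(t, r, R):
    """Pr{t}: version i gives output 1 with probability r_i, and each
    of the R - 1 incorrect outputs with probability (1 - r_i) / (R - 1)."""
    return prod(ri if ti == 1 else (1.0 - ri) / (R - 1) for ti, ri in zip(t, r))


def vote_counts(t, R):
    """f[j] = number of versions giving output j; index 0 is unused."""
    f = [0] * (R + 1)
    for ti in t:
        f[ti] += 1
    return f


t = (1, 1, 2)          # outputs of versions 1..3 (hypothetical)
r = (0.9, 0.8, 0.7)    # version reliabilities (hypothetical)
R = 3

p = prob_of_outputs(t, r, R)   # 0.9 * 0.8 * (0.3 / 2)
f = vote_counts(t, R)

# Sanity check: the probabilities of all R**N output vectors sum to 1.
total = sum(prob_of_outputs(v, r, R) for v in product(range(1, R + 1), repeat=len(r)))
print(p, f[1:], total)
```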
For MLV-E2, the voter need not wait for all outputs before it estimates. Section 7 uses computer simulation to study the mean testing time required.

6.2 CV

Notation
C      event: {output 1 is chosen}
E_i    event: {f_1 = f_{j_1} = f_{j_2} = ... = f_{j_{i-1}} > f_k for all k ≠ 1, j_1, ..., j_{i-1}}, i ∈ [1, R]
$\bar{\phi}$   opposite of any event $\phi$.

The E_i, i ∈ [1, R], are mutually exclusive. For E_1, the number of software versions giving output 1 is the largest. Therefore, consensus voting chooses output 1 with certainty.

$$\Pr\{E_1 \cap C\} = \sum_{t \in \Omega_1} \Pr\{t\};$$

$$\Omega_1 = \{t : f_1 > f_k \text{ for all } k \neq 1\}.$$

For E_2, f_1 = f_{j_1}; hence CV chooses output 1 with probability 1/2.

$$\Pr\{E_2 \cap C\} = \tfrac{1}{2} \sum_{t \in \Omega_2} \Pr\{t\};$$

$$\Omega_2 = \{t : (f_1 = f_{j_1}) \text{ and } (f_1 > f_k \text{ for } k \neq 1, j_1)\} \quad (13)$$

Pr{E_i ∩ C}, i ∈ [3, R], can be found in a similar fashion.

Since the E_i are mutually exclusive,

$$R_{CV} = \sum_{i=1}^{R} \Pr\{E_i \cap C\}$$

$$\Omega_i = \{t : (f_1 = f_{j_1} = \dots = f_{j_{i-1}}) \text{ and } (f_1 > f_k \text{ for } k \neq 1, j_1, \dots, j_{i-1})\} \quad (16)$$

6.3 MLV

Notation
F_i    event: {output 1, output j_1, ..., output j_{i-1} have the same largest probability of correctness} = {x_1 = x_{j_1} = ... = x_{j_{i-1}} > x_k for all k ≠ 1, j_1, ..., j_{i-1}}

The F_k, k ∈ [1, R], are mutually exclusive.

$$R_{MLV} = \sum_{i=1}^{R} \Pr\{F_i \cap C\} = \sum_{i=1}^{R} \frac{1}{i} \sum_{t \in \Gamma_i} \Pr\{t\}, \quad (18)$$

The MLV fails when the most likely correct result is not output 1, or the most likely correct result is: output 1, or output j_1, or ... or output j_{i-1}, but the voter does not choose output 1.

6.4 MLV-E1

This strategy fails when:
- output 1 is not the most likely correct one while the most likely correct one satisfies the requirement a*, or
- output 1, output j_1, ..., and output j_{i-1} are the most likely correct results and all of them satisfy the requirement a*, but the voter does not choose output 1.

$$F_{MLV\text{-}E1} = \sum_{t \in \Gamma} \Pr\{t\} + \sum_{i=2}^{R} \frac{i-1}{i} \sum_{t \in \Gamma_i} \Pr\{t\}, \quad (21a)$$

$$\Gamma = \{t : x_1 < \max_j\{x_j\} \text{ and } \max_j\{x_j\} \ge a^*\}, \quad (21b)$$

6.5 MLV-E2

This strategy fails when the voter has received one or more outputs, and:
- output j (j ≠ 1) is found to be the most likely correct result and it satisfies the accuracy requirement a*, or
- output 1, output j_1, ..., and output j_{i-1} are found to be the most likely correct results, all of them satisfy the requirement a*, but the voter does not choose output 1.

It is difficult to express the failure probability in closed form; it is measured by computer simulation in section 7.

[N = 5; r = {0.3, 0.4, 0.5, 0.6, 0.7}]
Figure 4. System-Reliability vs R
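The failure conditions in Sections 6.4 and 6.5 both hinge on whether the most likely correct output satisfies the requirement a*. The sketch below shows one way such a rule could be coded. It is an assumption-laden illustration, not the paper's algorithm: the names `posteriors`, `candidates` and `mlv_e2` are hypothetical, the x_j are computed over the outputs received so far using the form of equation (3), and an empty list stands in for the warning case.

```python
from math import prod


def posteriors(outputs, reliabilities, R):
    """x_j for j = 1..R, computed from the outputs received so far."""
    nums = [
        prod(
            r if out == j else (1.0 - r) / (R - 1)
            for out, r in zip(outputs, reliabilities)
        )
        for j in range(1, R + 1)
    ]
    total = sum(nums)
    return [n / total for n in nums]


def candidates(outputs, reliabilities, R, a_star):
    """Outputs that are most likely correct AND meet the requirement a*.

    Returns [] when the requirement is not met (assumed warning case).
    """
    x = posteriors(outputs, reliabilities, R)
    best = max(x)
    if best < a_star:
        return []
    return [j for j, xj in enumerate(x, start=1) if xj == best]


def mlv_e2(arrivals, R, a_star):
    """arrivals: (output, reliability) pairs in order of arrival.

    Stops at the first prefix whose best output meets a*.
    Returns (candidate outputs, number of outputs used).
    """
    outs, rels = [], []
    for out, r in arrivals:
        outs.append(out)
        rels.append(r)
        chosen = candidates(outs, rels, R, a_star)
        if chosen:
            return chosen, len(outs)
    return [], len(outs)


arrivals = [(1, 0.95), (1, 0.90), (2, 0.80)]   # hypothetical arrival order
early = mlv_e2(arrivals, R=3, a_star=0.99)
print(early)
```

In this hypothetical run the early-stopping rule decides after two outputs, whereas applying the same requirement to all three outputs (as a wait-for-all voter would) no longer reaches a* = 0.99, because the third output disagrees.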
Figure 4 shows system reliability vs R for N = 5. MLV always results in larger system reliability than CV or MV. When R increases, the probability of getting identical and incorrect outputs becomes smaller, and hence the reliability of both MLV & CV increases with R.

[R = 3; r_i is uniformly distributed between 0.8 & 1.0]
Figure 5. System-Reliability vs N
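For small N and R, the reliability sums of Sections 6.2 and 6.3 can be evaluated by brute-force enumeration of all R^N output vectors t. The sketch below does this for CV and MLV with ties broken uniformly at random, taking output 1 as the correct output. The function name `system_reliability` is hypothetical and the floating-point tie tolerance is an assumption. The sample run uses the parameters quoted for Figure 4 (N = 5, r = {0.3, ..., 0.7}) for a few small R, and makes no claim about matching the plotted values.

```python
from itertools import product
from math import isclose, prod


def system_reliability(r, R):
    """Return (R_CV, R_MLV) by enumerating every output vector t."""
    N = len(r)
    wrong = [(1.0 - ri) / (R - 1) for ri in r]
    rel_cv = rel_mlv = 0.0
    for t in product(range(1, R + 1), repeat=N):
        # x[j-1] is the numerator of x_j; x[0] is also Pr{t}.
        x = [
            prod(ri if ti == j else wi for ti, ri, wi in zip(t, r, wrong))
            for j in range(1, R + 1)
        ]
        p_t = x[0]

        # CV: output 1 wins if it has the most votes; i-way ties give 1/i.
        f = [t.count(j) for j in range(1, R + 1)]
        if f[0] == max(f):
            rel_cv += p_t / f.count(max(f))

        # MLV: output 1 wins if x_1 is the largest; i-way ties give 1/i.
        best = max(x)
        if isclose(x[0], best, rel_tol=1e-12):
            ties = sum(1 for xj in x if isclose(xj, best, rel_tol=1e-12))
            rel_mlv += p_t / ties
    return rel_cv, rel_mlv


for R in (2, 3, 4):
    print(R, system_reliability((0.3, 0.4, 0.5, 0.6, 0.7), R))
```

The enumeration costs R^N evaluations, so it is only practical for small cases; larger settings would need the simulation approach the paper describes.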
To study the effect of N on system reliability, let
- R = 3,
- the reliability of each software version be uniformly distributed between 0.8 and 1.0.
We generate 5000 random cases, calculate the system reliability for each, and then obtain the average system reliability. Figure 5 shows the average system reliability vs N. MLV provides the largest system reliability. As N increases, the multiversion software becomes more reliable, and the reliability difference among the three voting strategies becomes smaller.

We now illustrate the effects of the variation of software version reliability on the system reliability. Consider the three distributions of software version reliability, each uniformly distributed:
- between 0.5 and 0.7,
- between 0.3 and 0.9,
- between 0.2 and 1.0.
These distributions have the same mean, but the first distribution has the smallest standard deviation while the third distribution has the largest. Figures 6 - 8 show the average system
reliability vs N for the three distributions. If the variation of software version reliability is larger, MLV has better performance than either CV or MV.

[R = 3; r_i is uniformly distributed between 0.3 & 0.9]
Figure 7. System-Reliability vs N

[R = 3; r_i is uniformly distributed between 0.2 & 1.0]
Figure 8. System-Reliability vs N

Compared with the other voting strategies, MLV-E2 requires smaller mean execution time because it need not wait for all the outputs. To illustrate, we let the execution time required by each software version be s-normally distributed with mean μ and standard-deviation σ. Figure 9 shows the relative reduction in mean execution time required vs N for R = 5, μ = 1.0, and a* = 0.99. The relative reduction in mean execution time is larger:
- as N increases, because the other voting strategies require a longer time to wait for all the N outputs, or
- as the reliability of the software versions increases, because MLV-E2 needs to wait for fewer outputs to satisfy the same requirement a*, or
- as σ increases.

[R = 5; μ = 1.0, a* = 0.99; curves: r_i uniformly distributed over (0.9, 1.0), and r_i uniformly distributed over (0.7, 1.0)]
Figure 9. Relative Reduction in Mean Execution Time vs N

Figure 10 shows the relative reduction in mean execution time for three values of a* with R = 5; μ = 1.0, σ = 0.1; r_i is uniformly distributed between 0.9 & 1.0. When a* is higher, MLV-E2 must wait for more outputs in order to have
a more accurate estimation, and hence the relative reduction in mean execution time is smaller.

[R = 5; μ = 1.0, σ = 0.1; r_i is uniformly distributed between 0.9 & 1.0]
Figure 10. Relative Reduction in Mean Execution Time vs N

ability is smaller than, but close to, 1 - a*. Although F_MLV-E1

[R = 5; μ = 1.0, σ = 0.1; r_i is uniformly distributed between 0.7 & 1.0]
Figure 12. Failure Probability vs N
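The relative reduction in mean execution time discussed for MLV-E2 can be estimated with a small Monte Carlo experiment. The sketch below is one possible setup under stated assumptions, not the simulation used in the paper: version execution times are drawn from a normal distribution with mean `mu` and standard deviation `sigma`, reliabilities are drawn uniformly from (`r_lo`, `r_hi`), output 1 is the correct output, MLV-E2 stops at the first arrival for which the largest x_j reaches a*, and it falls back to waiting for all N outputs if the requirement is never met. All function and parameter names are hypothetical.

```python
import random
from math import prod


def outputs_needed(outs, rels, R, a_star):
    """Number of time-ordered outputs needed before max x_j >= a*.

    Returns len(outs) if the requirement is never met.
    """
    for n in range(1, len(outs) + 1):
        nums = [
            prod(
                r if o == j else (1.0 - r) / (R - 1)
                for o, r in zip(outs[:n], rels[:n])
            )
            for j in range(1, R + 1)
        ]
        # Equivalent to max(x_j) >= a*, without dividing by the sum.
        if max(nums) >= a_star * sum(nums):
            return n
    return len(outs)


def relative_reduction(N, R, a_star, mu, sigma, r_lo, r_hi, trials, seed=0):
    """Estimate (mean wait-for-all time - mean MLV-E2 time) / mean wait-for-all time."""
    rng = random.Random(seed)
    t_all = t_e2 = 0.0
    for _ in range(trials):
        versions = []
        for _ in range(N):
            r = rng.uniform(r_lo, r_hi)
            # Correct output (1) with probability r, else a uniform wrong output.
            out = 1 if rng.random() < r else rng.randint(2, R)
            versions.append((rng.gauss(mu, sigma), out, r))
        versions.sort()  # order of arrival
        times = [v[0] for v in versions]
        n = outputs_needed([v[1] for v in versions], [v[2] for v in versions], R, a_star)
        t_all += times[-1]    # other voters wait for the slowest version
        t_e2 += times[n - 1]  # MLV-E2 stops after the n-th arrival
    return (t_all - t_e2) / t_all


print(relative_reduction(N=7, R=5, a_star=0.99, mu=1.0, sigma=0.1,
                         r_lo=0.9, r_hi=1.0, trials=2000, seed=1))
```

Repeating the call over a range of N, a*, or reliability intervals gives curves of the same kind as those described for Figures 9 and 10, subject to the assumptions listed above.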
AUTHOR

Dr. Yiu-Wing Leung; Dept. of Computing; The Hong Kong Polytechnic University; HONG KONG.
Internet (e-mail): y.leung@ieee.org

Yiu-Wing Leung was born in Hong Kong. He received his BSc (1989) and PhD (1992) from the Chinese University of Hong Kong. He is an Assistant Professor in the Dept. of Computing, The Hong Kong Polytechnic University. His main research interests include information networks and hardware/software reliability.

Manuscript received 1995 April 1.
IEEE Log Number 94-13708