
System Software Reliability

Abir Naskar
Roll. No. 13MA60R21
Computer Science and Data Processing
Department Of Mathematics
IIT Kharagpur

Under the Guidance of

Prof. Nitin Gupta

Abstract:

Our idea is to increase system reliability while keeping the cost the same: instead of using a single component of double power, we use two components of the same power. The jobs must then be distributed so that each job is assigned to a suitable component while the load is balanced. For that we need a software that assigns each job to a particular component, and to obtain the reliability of the entire system we must calculate the reliability of this software as well.

Basic Knowledge :

1.1 INTRODUCTION :

Nowadays the consumer and capital goods industries and space research agencies such as ISRO and NASA face a serious problem of unreliability. These companies and research institutes cannot progress or succeed in their missions without a knowledge of reliability engineering. Space research projects such as MOM, the Mars rover Curiosity, and Apollo require very high reliability at every stage. In DRDO projects or any atomic project, progressing without high reliability is very dangerous. Reliability engineering is essential for electronic goods companies to estimate warranty periods and to form an idea of the wear-out period. For software, calculating the expected number of remaining bugs is very important for space and aviation agencies. Many software companies have to demonstrate the reliability of their products to stay in the market. Reliability also helps companies to optimize product cost and to improve processor speed, making the product faster at a lower cost. To tackle the growing competition, any company needs a clear idea of reliability.

1.2 DEFINITION OF RELIABILITY:

Reliability is restricted to a certain condition and a particular task. Reliability is the probability that a device will perform its task for a given period of time under certain operating conditions, i.e. that it will not break down in that period. Equivalently, it is the probability that the product will not face any failure. Reliability is therefore also called the probability of survival. The probability that a component survives until some time t is R(t) = P(X > t) = 1 - F(t), where F(t) is called the unreliability.
According to ANSI, software reliability is defined as: the probability of failure-free software operation for a specified period of time in a specified environment. Although software reliability is defined as a probabilistic function and comes with a notion of time, we must note that, unlike traditional hardware reliability, software reliability is not a direct function of time. Electronic and mechanical parts may become "old" and wear out with time and usage, but software does not rust or wear out during its life cycle. Software will not change over time unless it is intentionally changed or upgraded.

1.3 HARDWARE RELIABILITY, UNRELIABILITY, MTBF, FAILURE RATE:

The failure rate is a parameter: it is the frequency of malfunction, measured as the number of failures per unit time.
The reciprocal of the failure rate is called the MTBF, or mean time between failures. Usually we denote the failure rate by λ and the MTBF by m = 1/λ. The relation between λ and m is shown below:

fig:1

The bathtub curve is the graph of the component failure rate as a function of time. This curve is a mixture of three failure-rate curves: early-life failures, wear-out failures, and random failures.
Now, from the Poisson distribution with parameter μ, for a random variable X taking values in the enumerable set {0, 1, 2, ...}, the probability mass function is

P(X = k) = e^(-μ) μ^k / k!,  k = 0, 1, 2, ...

For the given time interval (0, t) and failure rate λ we have μ = λt, and the probability mass function becomes

P(X = k) = e^(-λt) (λt)^k / k!.

Now, if there is no failure up to time t, then the probability P(X = 0) gives the reliability at time t:

R(t) = P(X = 0) = e^(-λt) (λt)^0 / 0! = e^(-λt).

And the probability that it fails during time t, that is the unreliability, is Q(t) = 1 - e^(-λt).
We can also find this from the exponential distribution, as below.
The exponential density function is f(t) = λ e^(-λt), where λ is the constant failure rate. It is derived from the Poisson distribution, introduced by the French mathematician Poisson. We use this distribution for finding the reliability because the parameter λ (or its reciprocal m) alone completely describes the distribution, and it is independent of the age of the component as long as the constant-failure-rate condition persists.


Hence, the reliability from the exponential distribution is

R(t) = ∫_t^∞ λ e^(-λx) dx = e^(-λt),

and the unreliability, the probability that it may fail before time t, is

Q(t) = ∫_0^t λ e^(-λx) dx = 1 - e^(-λt).

And the mean time between failures (MTBF) is ∫_0^∞ R(t) dt, where R(t) is the reliability of the system.
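For a constant failure rate these quantities are easy to compute directly. The following is a minimal Python sketch; the function names and the example value λ = 0.5 are our own choices for illustration:

```python
import math

def reliability(lam, t):
    """R(t) = e^(-lam*t): probability of no failure up to time t
    under a constant failure rate lam."""
    return math.exp(-lam * t)

def unreliability(lam, t):
    """Q(t) = 1 - R(t): probability of failing before time t."""
    return 1.0 - reliability(lam, t)

def mtbf(lam):
    """MTBF m = 1/lam, the integral of R(t) over (0, infinity)."""
    return 1.0 / lam

lam = 0.5
print(reliability(lam, 1.0))  # e^(-0.5) ≈ 0.6065
print(mtbf(lam))              # 2.0
```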

1.4 SOFTWARE RELIABILITY AND HALSTEAD'S SOFTWARE METRIC:

Software reliability is defined as the probability of failure-free operation of a software system for a specified time in a specified environment.
If software reliability is projected on the same axes, the curve shown is obtained. There are two major differences between the hardware and software curves. One difference is that in the last phase the software does not have an increasing failure rate as hardware does: in this phase the software is approaching obsolescence, there is no motivation for any upgrades or changes to it, and therefore the failure rate does not change. The second difference is that in the useful-life phase the software experiences a drastic increase in failure rate each time an upgrade is made. The failure rate then levels off gradually, partly because of the defects found and fixed after the upgrades.

fig:2

Halstead complexity measures are software metrics introduced by Maurice Howard Halstead. Halstead observed that metrics of software should reflect the implementation or expression of algorithms in different languages, but be independent of their execution on a specific platform. These metrics are therefore computed statically from the code.
Halstead's goal was to identify measurable properties of software and the relations between them. This is similar to the identification of measurable properties of matter (such as the volume, mass, and pressure of a gas) and the relationships between them. Thus his metrics are not merely complexity metrics: they form the best-known technique for measuring the complexity of a software program and the amount of difficulty involved in testing and debugging it. The following notation is used:
n1 = number of unique or distinct operators appearing in a program
n2 = number of unique or distinct operands appearing in a program
N1 = total number of operator occurrences in a program
N2 = total number of operand occurrences in a program
N = length of the program
V = volume of the program
E = number of errors in the program
I = number of machine instructions
The length and the volume of the program are obtained by, respectively,

N = N1 + N2 and V = N log2(n1 + n2).

(Halstead's estimated program length is N̂ = n1 log2 n1 + n2 log2 n2.)

The difficulty of the program is

D = (n1 / 2) × (N2 / n2).

Halstead also proposed two empirical formulae to estimate the number of remaining defects E in the program from the program volume. The two formulae, Halstead empirical model 1 and Halstead empirical model 2, are, respectively,

E = V / 3000

and

E = A^(2/3) / 3000, where A = V / ( (2/n1) × (n2/N2) ).
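The quantities above are straightforward to compute from the four basic counts. Here is a small Python sketch, using empirical model 1 (E = V/3000) for the defect estimate; the function name is our own:

```python
import math

def halstead_metrics(n1, n2, N1, N2):
    """Halstead's metrics from the four basic counts:
    n1, n2 = distinct operators/operands; N1, N2 = total occurrences."""
    N = N1 + N2                     # program length
    V = N * math.log2(n1 + n2)      # program volume
    D = (n1 / 2) * (N2 / n2)        # difficulty
    E = V / 3000                    # remaining defects (empirical model 1)
    return {"N": N, "V": V, "D": D, "E": E}

# Counts of Prog. 1 from the table in Section 3.3
m = halstead_metrics(13, 7, 27, 17)
print(m)  # N = 44, V ≈ 190.16, D ≈ 15.8, E ≈ 0.06
```

Running this with the counts of Prog. 1 reproduces the values reported in the table of Section 3.3.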
1.5 SERIES SYSTEM:

In this kind of arrangement we use the components in a series like below,

fig:3

In this system, if one component fails, the entire machine goes down. The least reliability among the components used in the machine is therefore the maximum possible reliability of the machine.
That is, let the reliabilities of the n components be R 1 ,R 2 ,...,R n , and let the reliability of the system be R s .
Then, R s ≤ min{R 1 ,R 2 ,...,R n }.
To get the reliability of the system we have to find the probability that all components are working up to time t. Hence the reliability of the system is
R s = R 1 × R 2 × ... × R n = (1 − Q1 ) × (1 − Q2 ) × ... × (1 − Qn ), where we assume that the failures are mutually independent.
In other words, if A1 ,A 2 ,...,A n are the events that the corresponding components work, then the reliability can be described as R s = P(A1 ∩ A 2 ∩ ... ∩ A n ).
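The series formula is just a product over the component reliabilities; as a quick Python sketch (the example reliabilities are our own):

```python
def series_reliability(rels):
    """R_s = R_1 * R_2 * ... * R_n for independent components in series."""
    rs = 1.0
    for r in rels:
        rs *= r
    return rs

rels = [0.90, 0.95, 0.99]
rs = series_reliability(rels)
print(rs)                # ≈ 0.84645
assert rs <= min(rels)   # R_s never exceeds the weakest component
```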

1.6 PARALLEL SYSTEM:

In this kind of system we use the components in parallel mode like below.

fig:4

In this kind of system the entire machine works as long as at least one component works. Hence the maximum reliability among the components is the minimum possible reliability of the machine.
That is, let the reliabilities of the n components be R 1 ,R 2 ,...,R n , and let the reliability of the system be R s .
Then, R s ≥ max{R1 ,R 2 ,...,R n }.
For the reliability of the system we have to find the probability that at least one component works at the given time, which is P(A1 ∪ A 2 ∪ ... ∪ A n ).
Equivalently, the machine fails only if all the components fail simultaneously; that probability is Q1 × Q2 × ... × Q n . Hence the reliability of the system is
R s = 1 − Q1 × Q 2 × ... × Q n .
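Correspondingly, the parallel formula multiplies the unreliabilities; a Python sketch with the same illustrative reliabilities:

```python
def parallel_reliability(rels):
    """R_s = 1 - Q_1 * Q_2 * ... * Q_n, where Q_i = 1 - R_i,
    for independent components in parallel."""
    q = 1.0
    for r in rels:
        q *= (1.0 - r)
    return 1.0 - q

rels = [0.90, 0.95, 0.99]
rs = parallel_reliability(rels)
print(rs)                # ≈ 0.99995
assert rs >= max(rels)   # R_s is at least the best component
```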

1.7 RELIABILITY OF K OUT OF N SYSTEM:

A ‘k out of n’ system is a special case of parallel redundancy. It succeeds if at least k of the n parallel components work properly. When the component lifetimes are i.i.d., the reliability of this kind of system is derived from the binomial (n, p) distribution:

Rs(n, k) = Σ_{i=k}^{n} C(n, i) p^i q^(n−i).

Here p = R (the common component reliability) and q = 1 − R, so this becomes

Rs(n, k) = Σ_{i=k}^{n} C(n, i) R^i (1 − R)^(n−i);

this holds when all components have the same reliability.
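The binomial sum can be coded directly; a Python sketch, with sanity checks against the series and parallel special cases:

```python
from math import comb

def k_out_of_n_reliability(n, k, R):
    """Reliability of a k-out-of-n system of i.i.d. components,
    each with reliability R: at least k of the n must work."""
    return sum(comb(n, i) * R**i * (1.0 - R)**(n - i) for i in range(k, n + 1))

# k = n is a series system, k = 1 is a parallel system.
print(k_out_of_n_reliability(2, 2, 0.9))  # ≈ 0.81 (series: 0.9 * 0.9)
print(k_out_of_n_reliability(2, 1, 0.9))  # ≈ 0.99 (parallel: 1 - 0.1^2)
```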


The diagram is given below:

fig:5

1.8 SURVIVAL FUNCTION AND HAZARD RATE:

The basic quantity employed to describe a time-to-event phenomenon is the survival function S(t), defined as:
S(t) = P[T > t] = the probability that an individual survives beyond time t.
Since a unit either fails or survives, and one of these two mutually exclusive alternatives must occur, we have
S(t) = 1 − F(t), F(t) = 1 − S(t),
where F(t) is the cumulative distribution function (CDF). If T is a continuous random variable, then S(t) is a continuous, strictly decreasing function. The survival function is the integral of the probability density function (pdf) f(t),
that is,

S t =∫t f  x dx .
Thus,
−dS t 
f t= .
dt
The failure rate is defined as the instantaneous rate of failure (of experiencing the event) during the next instant of time, for the survivors up to time t. It is a rate of failure per individual per unit time. At the next instant the failure rate may change, and the individuals that have already failed play no further role, since only the survivors count.
The failure rate (or hazard rate) is denoted by h(t) and is defined by the following equation:

h(t) = lim_{Δt→0} P[t < T ≤ t + Δt | T > t] / Δt = f(t)/S(t) = the instantaneous (conditional) failure rate.
The failure rate is sometimes called a “conditional failure rate” since the denominator
S(t) (i.e., the population survivors) converts the expression into a conditional rate, given
survival past time t.
Since h(t) is also equal to the negative of the derivative of ln S(t), we have the useful identity

S(t) = exp( −∫_0^t h(u) du ).

If we let H(t) = ∫_0^t h(u) du be the cumulative hazard function, we then have S(t) = e^(−H(t)). Two other useful identities that follow from these formulas are:

h(t) = −d ln S(t)/dt,
H(t) = −ln S(t).
It is sometimes useful to define an average failure rate over an interval (T1, T2) that "averages" the failure rate over that interval. This rate, denoted by AFR(T1, T2), is a single number that can be used as a specification or target for the population failure rate over that interval. If T1 is 0, it is dropped from the expression; for example, AFR(40000) is the average failure rate for the population over the first 40000 hours of operation. The formulae for calculating AFRs are

AFR(t1, t2) = ∫_{t1}^{t2} h(t) dt / (t2 − t1) = (H(t2) − H(t1)) / (t2 − t1) = (ln S(t1) − ln S(t2)) / (t2 − t1)

and

AFR(0, T) = AFR(T) = H(T)/T = −ln S(T)/T.
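For a constant hazard the AFR over any interval is just λ itself, which gives a handy sanity check. A small Python sketch, assuming an exponential survival function with an illustrative rate of 1e-5 failures per hour:

```python
import math

def afr(S, t1, t2):
    """Average failure rate over (t1, t2):
    AFR = (ln S(t1) - ln S(t2)) / (t2 - t1)."""
    return (math.log(S(t1)) - math.log(S(t2))) / (t2 - t1)

lam = 0.00001                       # assumed constant hazard (per hour)
S = lambda t: math.exp(-lam * t)    # exponential survival function

# AFR(40000): average failure rate over the first 40000 hours
print(afr(S, 0.0, 40000.0))  # equals lam for a constant hazard
```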

Task Overview :

2.1 OUR GOAL:

We will see how the reliability changes if we use two components of the same power instead of one component of double power, controlling the work flow across the two components at the same time. We will also try to find the failure rate of the new model. In this way we can get higher reliability at the same cost.

fig:6

2.2 REQUIREMENTS:

Here we have to design such a system, as well as software to control the work flow. To calculate the system reliability we therefore have to calculate both the hardware and the software reliability, and we have to know the reliability of each component. We will use Halstead's software metrics to estimate the remaining bugs in the software.

2.3 CHALLENGES:

It is very hard to design a real machine of our own, run software on it, and observe the output. So we design our system virtually and apply Halstead's software metrics to it.

Calculations :

3.1 CALCULATION OF RELIABILITY IN BOTH SYSTEMS OF FIG 6:

Let the chance failure rate of the component of power 2p be λ1, and the failure rate of each component of power p be λ2.
Then the reliability of the component of power 2p, and hence the reliability of system 1, is e^(−λ1 t). [Here t is an arbitrary operating time.]
The reliability of each component of power p is e^(−λ2 t).
System 2 is a 1-out-of-2 system, so system 2 fails only if both components fail. The probability that both components fail is (1 − e^(−λ2 t)) × (1 − e^(−λ2 t)) = (1 − e^(−λ2 t))^2.
Hence the reliability of system 2 is 1 − (1 − e^(−λ2 t))^2.
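The two expressions can be evaluated directly; a Python sketch using the numerical values from the conclusion (λ1 = 0.50, λ2 = 0.60, t = 1):

```python
import math

def system1_reliability(lam1, t):
    """Single component of power 2p with failure rate lam1."""
    return math.exp(-lam1 * t)

def system2_reliability(lam2, t):
    """Two parallel components of power p (a 1-out-of-2 system),
    each with failure rate lam2."""
    return 1.0 - (1.0 - math.exp(-lam2 * t))**2

print(round(system1_reliability(0.50, 1.0), 4))  # 0.6065
print(round(system2_reliability(0.60, 1.0), 4))  # 0.7964
```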

3.2 FAILURE OR HAZARD RATE OF SYSTEM 2:

The most general expression for the failure rate is

λ(t) = −(1/R) (dR/dt).

λ is a function of the operating time t, and hence λ(t) = f(t)/R(t), since f(t) = −dR/dt.
If the reliabilities of the two components are R1(t) and R2(t), then the reliability of the entire system is

R(t) = 1 − P(max(X1, X2) ≤ t) = 1 − (1 − R1(t))(1 − R2(t)).

The failure density function is then

f(t) = −dR(t)/dt = (1 − R1(t)) f2(t) + (1 − R2(t)) f1(t),

and the hazard rate or failure rate is

h(t) = f(t)/R(t) = [ (1 − R1(t)) f2(t) + (1 − R2(t)) f1(t) ] / [ 1 − (1 − R1(t))(1 − R2(t)) ].

Here R1(t) = R2(t) = e^(−λ2 t), hence f1(t) = f2(t) = λ2 e^(−λ2 t), so our failure rate is

h(t) = 2 λ2 { 1 − 1 / (2 − e^(−λ2 t)) }.
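The closed form can be cross-checked numerically against the identity h(t) = −d ln R(t)/dt; a Python sketch (the finite-difference step dt is an arbitrary small value):

```python
import math

def system2_hazard(lam2, t):
    """Closed form: h(t) = 2*lam2*(1 - 1/(2 - e^(-lam2*t)))."""
    return 2.0 * lam2 * (1.0 - 1.0 / (2.0 - math.exp(-lam2 * t)))

def system2_hazard_numeric(lam2, t, dt=1e-7):
    """Finite-difference check using h(t) = -d ln R(t) / dt."""
    R = lambda u: 1.0 - (1.0 - math.exp(-lam2 * u))**2
    return -(math.log(R(t + dt)) - math.log(R(t))) / dt

lam2, t = 0.60, 1.0
print(system2_hazard(lam2, t))          # ≈ 0.3731
print(system2_hazard_numeric(lam2, t))  # agrees to several digits
```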

3.3 ESTIMATION OF SOFTWARE BUG BY SOFTWARE METRIC:

Here the important task is to write software that assigns jobs to the components and maintains the work flow in parallel. So we design two programs: one runs a simple task in parallel (Prog. 1), and the other is a scheduling program (Prog. 2) that assigns the tasks. With these programs we try to approximate the real situation.
When we apply Halstead's software metrics to these two programs individually, we get the following results:

        n1   n2   n    N1   N2   N    V       D      E
Prog. 1 13   7    20   27   17   44   190.16  15.78  0.06
Prog. 2 18   8    26   45   31   76   357.23  34.88  0.12

Final Review :

4.1 CONCLUSION:

If the values of λ1 and λ2 are close, then the reliability of system 2 is higher than the reliability of system 1. For example, if we take λ1 = 0.50 and λ2 = 0.60, then for t = 1 we get the reliabilities of system 1 and system 2 as 0.6065 and 0.7964, respectively.
So we get higher reliability from the hardware at the same cost. The software reliability is then factored in on top of this: the more reliable the software, the more reliable the system.

4.2 FUTURE WORK:

I am mainly working on frailty models. A frailty model is a random-effects model for time-to-event variables, in which the random effect (the frailty) has a multiplicative effect on the hazard. My research interest is univariate frailty, and my aim is to design a suitable frailty model for my system. The work above is some initial work related to the frailty model, which I am continuing; I hope that very soon I will be able to design a frailty model for my system.

REFERENCES :

• Kececioglu, D. B. (2002). Reliability Engineering Handbook (Vol. 1, pp. 1-41). Pennsylvania: DEStech Publications. ISBN: 1-932078-00-2.
• Trivedi, K. S. (1982). Probability and Statistics with Reliability, Queuing and Computer Science Applications (pp. 283-290, 309-324). Englewood Cliffs: Prentice-Hall. ISBN: 0-13-711564-4.
• Bazovsky, I. (2004). Reliability Theory and Practice. Mineola: Dover Publications. ISBN: 0-486-43867-8.
• Zuo, M. J., Huang, J. and Kuo, W. (2002). Multi-State k-out-of-n Systems. London: Springer-Verlag.
• Pham, H. (2002). Reliability of Systems with Multiple Failure Modes. London: Springer-Verlag.
• Hanagal, D. D. and Pandey, A. (2014). Gamma shared frailty model based on reversed hazard rate for bivariate survival data. Statistics and Probability Letters, 88, 190-196.
• Pham, H. (2007). System Software Reliability (Springer Series in Reliability Engineering). Springer. ISBN-10: 1852339500.
