Software Reliability & SMERFS^3: A Methodology and Tool for Software Reliability Assessment

Software Reliability & SMERFS^3
A Methodology and Tool for Software Reliability Assessment

March 14th, 2002
Goddard Space Flight Center, NASA

William Farr, PhD
B35, Combat Systems Branch Head
Phone: (540) 653-8388
Fax: (540) 653-8673
Email: farrwh@nswc.navy.mil
PURPOSE

To provide an overview of software reliability and to illustrate the SMERFS^3 tool for software reliability assessment.

Outline of Presentation

• Overview of Software Reliability Modeling
• History of SMERFS^3
• Sample Data Analysis
  – Data input
  – Data plotting and transformations
  – Modeling analysis
  – Using SMERFS^3 for trade-off analysis
• Lessons Learned
• Summary

Overview of Software Reliability Modeling

OBJECTIVES OF SOFTWARE RELIABILITY MODELING

• QUANTITATIVE ASSESSMENT OF RELIABILITY
• ALLOCATION OF RESOURCES
• ASSESSMENT OF TESTING EFFORT
• COMPARE RELIABILITY IN DIFFERENT ENVIRONMENTS
• TRACK RELIABILITY OVER TIME

QUESTIONS ADDRESSED BY SOFTWARE RELIABILITY MODELING

• HOW ERROR FREE IS THE SOFTWARE?
• WHEN SHOULD I STOP TESTING?
• SHOULD THE PROGRAM BE REENGINEERED?

DEFINITIONS

SOFTWARE RELIABILITY IS THE PROBABILITY THAT A GIVEN SOFTWARE PROGRAM WILL OPERATE WITHOUT FAILURE FOR A SPECIFIED TIME IN A SPECIFIED ENVIRONMENT.

SOFTWARE RELIABILITY ENGINEERING (SRE) IS THE APPLICATION OF STATISTICAL TECHNIQUES TO DATA COLLECTED DURING SYSTEM DEVELOPMENT AND OPERATION TO SPECIFY, PREDICT, ESTIMATE, AND ASSESS THE SOFTWARE RELIABILITY OF SOFTWARE-BASED SYSTEMS.

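To make the first definition concrete, here is the standard textbook formulation (a general illustration, not a SMERFS^3-specific formula). With T the time to the next failure and, in the simplest case, a constant failure rate \lambda:

    R(t) = P(T > t) = e^{-\lambda t}, \qquad \mathrm{MTBF} = 1/\lambda

More generally, with a time-varying failure intensity \lambda(\tau) (defined later in this presentation), the reliability over the next t time units after s accumulated test units is:

    R(t \mid s) = \exp\left( -\int_{s}^{s+t} \lambda(\tau)\, d\tau \right)
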
Errors, Faults, Failures

• ERROR – Human action that results in software containing a fault.
• FAULT – A defect in code that can be the cause of one or more failures (synonymous with “bug”).
• FAILURE – The inability of a system or system component to perform a required function within specified limits; a departure of program operation from program requirements.

ERROR => FAULT(S) => FAILURE(S)


OPERATIONAL PROFILE

• Operation – A major system logical task of short duration that returns control to the system when complete and whose processing is substantially different from other operations.
• Operational profile – The set of operations and their probabilities of occurrence.

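As a brief illustration (the operation names and probabilities below are hypothetical, not from this presentation), an operational profile can be written down as a set of operations with occurrence probabilities and sampled to drive profile-driven test selection. A minimal Python sketch:

import random

# Hypothetical operational profile: operations and their occurrence
# probabilities. The probabilities should sum to 1 (the profile covers
# all operations of the system).
profile = {
    "process_transaction": 0.60,
    "generate_report":     0.25,
    "update_account":      0.10,
    "system_diagnostics":  0.05,
}

def sample_operations(profile, n, seed=1):
    """Draw n operations in proportion to their profile probabilities."""
    rng = random.Random(seed)
    ops, probs = zip(*profile.items())
    return rng.choices(ops, weights=probs, k=n)

# Selecting test cases in proportion to the profile makes test exposure
# mirror expected field usage, which is what the reliability models assume.
print(sample_operations(profile, 10))
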
APPROACHES TO S/W RELIABILITY ESTIMATION

• ERROR SEEDING/TAGGING MODELS
• DATA DOMAIN
• TIME DOMAIN

APPROACH TO ESTIMATING S/W RELIABILITY IN THE TIME DOMAIN

[Figure: a program under test maps inputs to outputs; software failures are marked along a time axis divided into testing periods 1 through 5, occurring at times t1, t2, ..., tn.]

SOFTWARE RELIABILITY DATA:
• NUMBER OF FAILURES PER PERIOD (e.g., 6, 4, 2, 3, 1, ...)
OR
• TIME BETWEEN FAILURES (e.g., t1, t2, ..., tn)
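A small Python sketch of the two data forms (the failure times below are illustrative): time-between-failures data can always be rolled up into failure counts per period when only the coarser form is needed.

# Times between successive failures, in hours (illustrative values).
tbf = [5.3, 6.7, 10.5, 12.0, 3.2, 25.1, 18.4, 40.0]

# Cumulative failure times t1, t2, ..., tn.
cum = []
total = 0.0
for x in tbf:
    total += x
    cum.append(total)

def counts_per_period(cum_times, period):
    """Roll time-between-failures data up into failure counts per period."""
    n_periods = int(cum_times[-1] // period) + 1
    counts = [0] * n_periods
    for t in cum_times:
        counts[int(t // period)] += 1
    return counts

print(counts_per_period(cum, 24.0))  # e.g., failures per 24-hour test period
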
SOFTWARE RELIABILITY MODELS IN THE TIME DOMAIN

[Figure: two plots. Left, FAILURE COUNTS PLOT – fault counts per testing period. Right, FAILURE TIME PLOT – time between failures versus failure number.]

EXAMPLE SOFTWARE RELIABILITY MEASURES

ERROR COUNT MODELS:
• TOTAL NUMBER OF ERRORS
• EXPECTED NUMBER OF ERRORS IN FUTURE TESTING PERIODS
• FAILURE RATE
• TESTING TIME REQUIRED TO ELIMINATE THE NEXT K ERRORS OR TO ACHIEVE A SPECIFIED FAILURE RATE
• CURRENT PROGRAM RELIABILITY

TIME BETWEEN MODELS:
• TOTAL NUMBER OF ERRORS
• MEAN TIME TO NEXT FAILURE
• FAILURE RATE
• TESTING TIME REQUIRED TO ACHIEVE A SPECIFIED RATE
• CURRENT PROGRAM RELIABILITY

Functions Used in Reliability Modeling

M(t) = cumulative number of faults by time t

\mu(t) = E\{M(t)\} = cumulative mean value function

\lambda(t) = \frac{d}{dt} E\{M(t)\} = failure intensity function

Expected number of faults in the next \Delta t time units = \mu(t + \Delta t) - \mu(t)

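To ground these definitions, here is a Python sketch using one concrete model of the non-homogeneous Poisson process family listed earlier (the classic Goel-Okumoto form, \mu(t) = a(1 - e^{-bt}); SMERFS's exact NHPP implementation is not shown here, and the parameter values below are illustrative, not fitted):

import math

# Goel-Okumoto NHPP model: a = expected total faults, b = detection rate.
# Parameter values below are illustrative, not fitted to real data.
a, b = 120.0, 0.015

def mu(t):
    """Cumulative mean value function: expected faults found by time t."""
    return a * (1.0 - math.exp(-b * t))

def lam(t):
    """Failure intensity function: d/dt of mu(t)."""
    return a * b * math.exp(-b * t)

def expected_next(t, dt):
    """Expected faults in the next dt time units: mu(t+dt) - mu(t)."""
    return mu(t + dt) - mu(t)

t = 100.0
print(f"mu({t}) = {mu(t):.1f} faults found so far (expected)")
print(f"lambda({t}) = {lam(t):.3f} failures per unit time")
print(f"expected in next 50 units: {expected_next(t, 50.0):.1f}")
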
SAMPLE TRADE-OFF ANALYSIS

[Figure: two plots of required test time in hours. Top: test time versus target MTBF (600 to 1400). Bottom: test time versus target failure intensity (0.0008 to 0.0018).]

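For some models, curves like these can be computed in closed form. A sketch under the same illustrative Goel-Okumoto parameterization as the earlier example (an assumption, not the model behind the plots above): since \lambda(t) = ab\,e^{-bt}, a target intensity \lambda^* is reached at total test time t^* = (1/b)\ln(ab/\lambda^*), so the additional test time needed is t^* - t_now.

import math

# Illustrative Goel-Okumoto parameters (same assumption as the earlier sketch).
a, b = 120.0, 0.015
t_now = 100.0  # test time accumulated so far, in hours

def test_time_for_intensity(target):
    """Total test time at which lambda(t) = a*b*exp(-b*t) drops to target."""
    return math.log(a * b / target) / b

for target in [0.0018, 0.0014, 0.0010, 0.0008]:
    extra = max(0.0, test_time_for_intensity(target) - t_now)
    print(f"target intensity {target:.4f}: {extra:7.1f} more hours of test")
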
TIME UNITS

• “Counts” – by day, week, month, quarter, etc.
• “Time Between” – wall clock, CPU time, number of test cases, or a natural unit (e.g., # transactions, # printed pages, etc.)

Personal experience: counts by month were the easiest to implement without impacting prediction accuracy.

CURRENT STATUS OF RELIABILITY MODELING

• Currently over 75 models in the literature
• SRE newsletter, International Symposium (12 held – next in Annapolis), and numerous publications on the subject, including 4 key books (Wallace – “Practical Software Reliability Modeling”)
• Standards already exist, e.g., IEEE 982.1 and 982.2 and AIAA’s ANSI/AIAA “Recommended Practice for Software Reliability” (R-013-1992)
• Software reliability modeling is applied in a wide variety of applications, e.g., communications (AT&T best practice), the space program (Shuttle, Galileo), DOD (Trident & Tomahawk), and industry (SMERFS and SMERFS^3 distributed internationally to over 300 organizations)

HISTORY OF SMERFS^3

ORGANIZATION OBJECTIVES

TO ENSURE THE HIGHEST QUALITY IN THE DELIVERED PRODUCTS TO THE NAVY THROUGH:

A. Verification and Validation,
B. Quality Control, and
C. Configuration Management

THROUGHOUT THE LIFECYCLE OF THE PRODUCTS.

SOFTWARE RELIABILITY
MODELING AT NSWCDD
• SLBM Software Development program initiated
research in 1981
– Investigated “software reliability modeling” approaches
– Developed a useful reliability tool for QA (SMERFS)
– Applied software reliability methodology on the Trident I
(C4) and Trident II (D5) systems
• Engineering of Complex Systems (ECS)
– Sponsored by the Office of Naval Research.
– One particular task was to develop methodologies and their
implementations for the evaluation and assessment of
“complex” systems.
– Development of SMERFS^3 (Statistical Modeling and Estimation of Reliability Functions for Systems: Software, Hardware, and Systems)

SMERFS and SMERFS^3

SMERFS (1982 – 1996):
STATISTICAL MODELING AND ESTIMATION OF RELIABILITY FUNCTIONS FOR SOFTWARE

SMERFS^3 (1996 – present):
STATISTICAL MODELING AND ESTIMATION OF RELIABILITY FUNCTIONS FOR SYSTEMS^3 (HARDWARE, SOFTWARE, & SYSTEMS)

SMERFS and SMERFS^3

SMERFS
Time Between Failures Models (input: time between failures):
• Littlewood & Verrall Bayesian Model
• Musa Basic Execution Model
• Geometric Model
• Non-homogeneous Poisson Process Model
• Musa Logarithmic Poisson Model
• Jelinski-Moranda Model

Error Count Models (input: number of failures per unit time):
• Generalized Poisson Process Model
• Non-homogeneous Poisson Process Model for Counts
• Brooks and Motley’s Model
• Schneidewind Model
• S-shaped Reliability Growth Model

SMERFS^3
Software Models:
• All of the above
Hardware Models:
• Weibull
• Exponential
• Rayleigh*
• 4 Army Models*
System Models:
• Markov*
• Hyper-exponential*

* Not Implemented

Model outputs include: mean time to failure, reliability, number of errors remaining, required time to meet specified reliability objectives, and number of failures expected in a specified time period.

SMERFS^3

• FOR A PC WINDOWS ENVIRONMENT (WINDOWS, NT)
• INCORPORATES HARDWARE, SOFTWARE, AND SYSTEM RELIABILITY (SOFTWARE + HARDWARE)
• CAPTURES ALL OF THE FUNCTIONALITY OF SMERFS
• GUI INTERFACE (VISUAL C++)
• LIBRARY IN FORTRAN 90
• HARDWARE MODELS: EXPONENTIAL, RAYLEIGH, WEIBULL, CONTINUOUS AND DISCRETE RELIABILITY GROWTH PROJECTION MODELS, AND THE CONTINUOUS AND DISCRETE RELIABILITY TRACKING MODELS
• SYSTEM MODELS: HYPEREXPONENTIAL AND MARKOV MODELS
• INCORPORATED INTO ARMY’S INSIGHT TOOL

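As a sketch of the simplest hardware model named above, the exponential: its maximum-likelihood fit to time-between-failures data is just the reciprocal of the sample mean (a textbook result; SMERFS^3's actual numerics are not shown here, and the data below are illustrative):

import math

# Illustrative hardware time-between-failures data, in hours (not real data).
tbf = [210.0, 150.0, 410.0, 95.0, 300.0, 260.0]

# Exponential model MLE: lambda_hat = n / sum(t_i), equivalently MTBF = mean(t_i).
lam_hat = len(tbf) / sum(tbf)
mtbf = 1.0 / lam_hat

mission = 100.0  # mission length in hours (assumed for the example)
reliability = math.exp(-lam_hat * mission)  # R(t) = exp(-lambda * t)

print(f"lambda_hat = {lam_hat:.5f} failures/hour, MTBF = {mtbf:.1f} hours")
print(f"R({mission:.0f} h) = {reliability:.3f}")
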
SMERFS^3 OPTIONS

• DATA INPUT
  – ASCII FILE of TBF or FAULT COUNT data
• PLOTS (Raw, Fitted, Residual, etc.)
• DATA EDITING
• UNIT CONVERSION (Time – seconds, minutes, hours, etc.)
• DATA TRANSFORMATIONS
• DATA STATISTICS
• MODEL FITTING
  – Model applicability analysis
  – Model fit analysis
  – Trade-off analysis

Current & Future Efforts
• ARMY
– Last year we had an initiative to incorporate SMERFS^3 into
the Army’s Insight tool.
• Insight is a tool developed by the Army’s Software Metrics Office that
helps the user tailor and implement the issue-driven software and
systems measurement process as defined in the DOD’s Practical
Software and Systems Measurement (PSM) Guide.
• Jet Propulsion Laboratory (JPL), Naval Postgraduate School
(NPS), and the Air Force Operational Test and Evaluation
Center (AFOTEC)
– Develop a new SMERFS^3 incorporating software, hardware, and systems, as well as integrating JPL’s CASRE tool
– Additional features will include: early reliability prediction and new models
– Currently seeking funding and developing a series of papers/presentations showing the benefits of reliability assessment

HOW TO OBTAIN COPIES

• Contact me:
  – By phone: (540) 653-8388
  – By email: farrwh@nswc.navy.mil
• Indicate SMERFS or SMERFS^3
• The selected copy will be emailed as a zipped file
SMERFS^3 SAMPLE DATA ANALYSIS

FILE INPUT FORMAT (TBF)

Software (TIME BETWEEN FAILURES):
Elapsed time; “0” to denote software; a severity level of the failure (1-4 levels)*. Example:
5.3 0 1
6.7 0 2
10.5 0 1

System:
Elapsed time; “0” to denote software or “1” to denote hardware; a severity level of the failure (1-4 levels)*. Example:
5.3 0 1
4.7 1 2
6.7 0 2
8.9 1 1
10.5 0 1

* Currently not activated

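A minimal reader for this format (a sketch assuming whitespace-separated fields; this is not SMERFS^3's actual parser):

def read_tbf(path):
    """Parse TBF records: elapsed time, component flag (0 = software,
    1 = hardware), and severity level (1-4, currently not activated)."""
    records = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            elapsed, component, severity = (float(fields[0]),
                                            int(fields[1]), int(fields[2]))
            records.append((elapsed, component, severity))
    return records

# Example: keep only the software failures from a mixed system file.
# recs = read_tbf("system_tbf")   # data file shown in the slides that follow
# software_tbf = [t for (t, comp, sev) in recs if comp == 0]
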
FILE INPUT FORMAT (INTERVAL)

Software interval (count) data:
Fault count; interval length (usually normalized to 1 – 1 day, 1 month, etc.). Example:
20 1
32 1
14 1
10 .5
5 .5
2 .25
0 .25

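For interval data with unequal interval lengths, counts should be compared as rates (count divided by length); a short Python sketch using the example values above:

# (fault count, interval length) pairs from the example above.
intervals = [(20, 1.0), (32, 1.0), (14, 1.0),
             (10, 0.5), (5, 0.5), (2, 0.25), (0, 0.25)]

cumulative = 0
for i, (count, length) in enumerate(intervals, start=1):
    cumulative += count
    rate = count / length  # normalize: failures per unit interval
    print(f"interval {i}: rate = {rate:5.1f}, cumulative faults = {cumulative}")
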
Data File (soft_int)

[Screenshot: the soft_int interval-count data file]

SMERFS^3 EXECUTION

[Screenshot: SMERFS^3]

Output From SMERFS^3

[Screenshot: SMERFS^3 output]

Data File (soft_tbf)

[Screenshot: the soft_tbf time-between-failures data file]

SMERFS^3 EXECUTION

[Screenshot: SMERFS^3]

Data Keyboard Input

[Screenshot: SMERFS^3 keyboard data entry]

Data File (system_tbf)

[Screenshot: the system_tbf data file]

SMERFS^3 EXECUTION

[Screenshot: SMERFS^3]

FUTURE PLANS

• Complete the Prototype Development of SMERFS^3:
  – Finish GUI Interface in C++
  – Finish Hardware Model Coding and Testing
  – Formulate and Implement Systems Reliability
  – Develop Supporting User’s Guide
  – Distribute It Via Web, INCOSE, Mail Request
  – Apply It to a Major Program
  – Seek Commercialization

Lessons Learned

1. Keep in mind the environment you are trying to assess. The data collected for your assessment must reflect that environment.
2. Keep in mind your objectives. They will help determine what data to collect and at what level.
3. Provide feedback to those involved in the collection process on the results of the assessment.
4. Software reliability models provide one perspective on reliability, but there are others!

Lessons Learned, cont.

5. In our environment the models based upon counts did just as well as the ones based upon “time”.
6. Everyone must agree on what constitutes a software fault and where it may arise. Also, if you are considering the “severity” of a fault, a set of accepted definitions must be established.
7. If count data are used, you must capture the usage and/or effort involved in using the software during the time intervals of interest. If wall-clock TBF data are used, you must also be aware of the effort.

Do’s and Don’ts of Software Reliability Lessons Learned

Do’s
1. Consider it as a quantitative measure to be used in determining software quality.
2. Track reliability over time and/or software version to determine whether reengineering is needed.
3. Consider it as an important aid in determining resource allocation and in the testing/maintenance efforts.
4. Develop specific objectives for a software reliability effort.
5. Develop an automated data collection process and database.

Don’ts
1. Assume any one software reliability model is applicable to all situations.
2. Extrapolate results beyond the environment in which the data were generated.
3. Apply it before integrated testing in the life cycle.
4. Consider it the sole data point upon which to make judgments.

SUMMARY

• THERE IS A NEED TO QUANTIFY BOTH COMPONENT (HARDWARE & SOFTWARE) AND SYSTEM RELIABILITY
• FOR TOTAL SYSTEM RELIABILITY, THE HUMAN FACTOR MUST ALSO BE WORKED INTO THE FRAMEWORK
• RELIABILITY NEEDS TO BE CONSIDERED OVER THE ENTIRE LIFECYCLE OF THE SYSTEM
• RELIABILITY ANALYSIS CAN PROVIDE USEFUL INFORMATION TO ASSESS BOTH THE PRODUCT AND THE ORGANIZATION’S DEVELOPMENT PROCESS
• WHEN MAKING AN OVERALL PRODUCT ASSESSMENT, CONSIDER RELIABILITY MODELING AS PROVIDING ONE PERSPECTIVE – OTHERS NEED TO BE CONSIDERED AS WELL

QUESTIONS?

REFERENCES

KEY REFERENCES

• Software Reliability Engineering, by John Musa, McGraw-Hill, 1999.
• Software Reliability: Measurement, Prediction, Application, by J. Musa, A. Iannino, and K. Okumoto, McGraw-Hill, 1996.
• Software Reliability Modeling, by M. Xie, World Scientific, 1991.
• Handbook of Software Reliability Engineering, edited by Michael Lyu, McGraw-Hill, 1997.
• Recommended Practice for Software Reliability, ANSI/AIAA, R-013-1992, ISBN 1-56347-024-1.
• Center for Software Reliability – http://www.csr.city.ac.uk/
• Links to Others – http://members.aol.com/JohnDMusa/ and http://www.dacs.dtic.mil/
