Analysis of software safety and reliability methods in cyber physical systems

S. Oveisi and R. Ravanmehr
Abstract: Software-based systems alone do not pose any risk; risk arises when such systems operate in the context of larger systems where potential hazards exist. Cyber-physical systems (CPSs) are a prominent instance of software-based systems. Nowadays, the safety and reliability of cyber-physical systems are considerably important due to the increasing complexity of these systems, and risk management techniques are required to reduce risk to an acceptable level. Safety and reliability methods play an important role in the risk management process; among them, software fault tree analysis (SFTA) and software failure modes and effects analysis (SFMEA) can be utilised. The main purpose of this article is to provide a comprehensive survey and evaluation of the currently available approaches to software safety and reliability in cyber-physical systems, in order to reflect the state of the art of this active area.
1 Introduction
As observed in Figure 1, the chain begins when a fault occurs in a subsystem. The fault becomes an error, and the error propagates within the system until it reaches subsystem Y, where it leads to improper service. If subsystem Y fails, propagation of that failure can lead to failure of the entire system. A failure in a subsystem may also be identified and resolved; otherwise its effect appears as a failure in other subsystems, and in particular cases the operation of the system may lead to system failure. System failure may end in a safe or a dangerous state through safe or dangerous events. A combination of hazardous events caused by the environment, the overall system and its subsystems may endanger the whole system; such dangerous events are known as initiating events.
Software safety and reliability methods are explained in this section. These models are developed using either a bottom-up or a top-down analysis. In the bottom-up approach, the analyst constructs a model by repeatedly asking what may happen in case of a failure: the analyst views the system from a bottom-up perspective, starting from the lowest level of system detail and its behaviour and working upward. In the top-down approach, the analyst constructs a model by asking what could lead to system failure: the analyst views the system from a top-down perspective, starting from the highest level of system failures and moving downward through the system to trace the failure routes.
Generally, safety and reliability methods play a major role in the risk management process, which is required to reduce risk to an acceptable level (Jirsa and Zacek, 2010). Figure 2 shows the position of safety and reliability methods in risk management activities, and Table 1 reviews these methods against the evaluation criteria.
Failure modes and effects analysis (FMEA) is a bottom-up analysis that aims to identify, classify and evaluate hazards and the risks associated with them (Sozer et al., 2007). The analysis begins by identifying the scope and limits of the system. FMEA uses a flowchart of the process and maps of the system design. In the next step, potential failure modes are identified step by step; several worksheet formats exist for documenting an FMEA. The causes and effects of each failure mode are then determined. In the last step, the measures necessary to reduce the risks due to failure are identified (Menkhaus and Andrich, 2005).
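FMEA worksheets such as the ones described above are commonly quantified with a risk priority number, RPN = severity × occurrence × detection, so that mitigation effort targets the riskiest failure modes first. The following is a minimal sketch in Python; the failure-mode entries and ratings are hypothetical, not taken from the paper:

```python
# Hypothetical FMEA worksheet: each failure mode is rated 1-10 for
# severity (S), occurrence (O) and detection (D); RPN = S * O * D.
from dataclasses import dataclass

@dataclass
class FailureMode:
    item: str
    mode: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (always detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

worksheet = [
    FailureMode("speed sensor", "stuck-at-zero reading", 9, 3, 4),
    FailureMode("CAN bus", "message corruption", 7, 2, 3),
    FailureMode("brake controller", "delayed actuation", 10, 2, 5),
]

# Rank failure modes so mitigation measures address the highest risks first.
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.item:16s} {fm.mode:24s} RPN={fm.rpn}")
```

Ranking by RPN corresponds to the last FMEA step above: deciding which failure modes warrant risk-reduction measures.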
Computing and communication capabilities are embedded in all types of objects and
structures in the physical environment. Applications with enormous societal impact and
economic benefit are created by harnessing these capabilities across both space and time.
Such systems that bridge the cyber-world of computing and communications with the
physical world are referred to as CPSs. CPS are physical and engineered systems whose
operations are monitored, coordinated, controlled and integrated by a computing and
communication core (Jianhua et al., 2011). This intimate coupling between the cyber and
physical will be manifested from the nano-world to large-scale wide-area systems of
systems. The internet transformed how humans interact and communicate with one
another, revolutionised how and where information is accessed, and even changed how
people buy and sell products. Similarly, CPS will transform how humans interact with
and control the physical world around us.
Examples of CPS include medical devices and systems, aerospace systems,
transportation vehicles and intelligent highways, defence systems, robotic systems,
process control, factory automation, building and environmental control and smart
spaces. CPS interact with the physical world, and must operate dependably, safely,
securely, and efficiently and in real-time. CPS can be considered to be a confluence of
embedded systems, real-time systems, distributed sensor systems and controls. The
promise of CPS is pushed by several recent trends (Miclea and Sanislav, 2011): the
proliferation of low-cost and increased-capability sensors of increasingly smaller form
factor; the availability of low cost, low-power, high-capacity, small form-factor
computing devices; the wireless communication revolution; abundant internet bandwidth;
continuing improvements in energy capacity, alternative energy sources and energy
harvesting.
The need for CPS technologies is also being pulled by CPS vendors in sectors like
aerospace, building and environmental control, critical infrastructure, process control,
factory automation and healthcare, who are increasingly finding that the technology base
to build large-scale safety-critical CPS correctly, affordably, flexibly and on schedule is
seriously lacking (Wu et al., 2011).
CPS brings together the discrete and powerful logic of computing to monitor and
control the continuous dynamics of physical and engineered systems. The precision of
computing must interface with the uncertainty and the noise in the physical environment.
The lack of perfect synchrony across time and space must be dealt with. Failures in both the cyber and physical domains must be tolerated or contained. Security
and privacy requirements must be enforced. System dynamics across multiple time-scales
must be addressed. Scale and increasing complexity must be tamed.
These needs call for the creation of innovative scientific foundations and engineering
principles. Trial-and-error approaches to build computing-centric engineered systems
must be replaced by rigorous methods, certified systems, and powerful tools. Analyses
and mathematics must replace inefficient and testing-intensive techniques. Unexpected
accidents and failures must fade, and robust system design must become an established
domain. The confluence of the underlying CPS technologies enables new opportunities
and poses new research challenges.
As can be seen in Figure 3, CPS will be composed of interconnected clusters of
processing elements and large-scale wired and wireless networks that connect a variety of
smart sensors and actuators. The coupling between the cyber and physical contexts will
be driven by new demands and applications. Innovative solutions will address
unprecedented security and privacy needs. New spatial temporal constraints will be
satisfied. Novel interactions among communications, computing and control will be
understood. CPS will also interface with many non-technical users. Integration and
influence across administrative boundaries will be possible. The innovation and
development of CPS will require computer scientists and network professionals to work
with experts in various engineering disciplines including control engineering, signal
processing, civil engineering, mechanical engineering and biology. This, in turn, will
revolutionise how universities educate engineers and scientists. The size, composition
and competencies of industry teams that design, develop and deploy CPS will also
change dramatically. The global competitiveness of national economies that become
technology leaders in CPS will improve significantly (Rajkumar, 2010).
It is hard to discover defects as the complexity of a CPS increases. Software is the backbone of CPS, and the complexity of software comprising millions of lines of code makes the effects of software failure dangerous. Evaluation and verification of CPS software require ensuring stability and analysing failure modes. Potential deficiencies in the requirements, design or implementation of the software can cause adverse events at the next level of software integration. As mentioned earlier, one main challenge in CPSs lies in safety and reliability. In this section, two main approaches to safety and reliability are studied and evaluated.
Analysis at the system level determines which software products of the system could be responsible for a potential defect in the system (the top event). SFTA can be used to identify the software details embedded in a software product whose behaviour leads to the occurrence of the top event. If the top event is a critical flaw, the software details involved in it can be classified as critical software.
The events composing a tree are analysed as lower-level events, which are linked together through the logical gates defined in the methodology. When the analysis and integration of events is complete, the lowest granularity of the analysis is reached; this depends on the scope and purpose of the SFTA (Needham and Jones, 2006).
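The gate logic described above also supports quantitative evaluation: assuming the basic events are independent, an AND gate multiplies the probabilities of its children, while an OR gate combines them as 1 − ∏(1 − p). The following Python sketch evaluates a small hypothetical tree (the events and probabilities are illustrative, not from the paper):

```python
# Minimal quantitative fault tree evaluation, assuming independent events:
#   AND gate: product of child probabilities.
#   OR gate:  1 - product of (1 - child probability).
from math import prod

def and_gate(*children: float) -> float:
    return prod(children)

def or_gate(*children: float) -> float:
    return 1 - prod(1 - p for p in children)

# Hypothetical top event: "actuator commanded unsafely", which occurs if
# (sensor fault AND voter fault) OR a software command error.
p_sensor = 1e-3
p_voter = 1e-2
p_cmd_error = 1e-4

p_top = or_gate(and_gate(p_sensor, p_voter), p_cmd_error)
print(f"top event probability = {p_top:.3e}")
```

A qualitative analysis would instead enumerate the minimal cut sets (here {sensor fault, voter fault} and {command error}) without assigning probabilities.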
6 SFMEA analysis
In this section, SFMEA is studied at both system and detailed levels. In Table 4, these
two methods are evaluated based on evaluation criteria (Goddard, 2000).
Table 4 SFMEA at different levels

Application
• System level: software protection to prevent hazardous system behaviours; used to identify structural weaknesses in the software design; examines the effectiveness of the software architecture; aims to identify the major software details and functions.
• Detailed level: software protection to prevent hazardous system behaviours; checks the software to recognise the effects of errors in the unique product variables employed.

Time of application
• System level: in the software architecture design phase.
• Detailed level: when the code is fully available, in the software detailed design phase.

Runtime
• System level: much quicker than SFMEA at the detailed level.
• Detailed level: very time-consuming; used in special cases.

Output
• System level: shows the effects of failure modes on software outputs to identify any hazardous outputs; the criticality level of each detail is determined.
• Detailed level: whether protection at the high level of design is accomplished or not.

Failure modes
• System level: usual software (problems of output, input, quality, user, intermediate, etc.); embedded software (problems of control, relationship and transfer, computation, display, etc.).
• Detailed level: failure modes of the various variable types are examined: char, bool, int, float, double, etc.
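At the detailed level, Table 4 shows that SFMEA examines the failure modes of individual variables (char, bool, int, float, double, etc.). For a float variable, typical failure modes are not-a-number, overflow to infinity and out-of-range values. A minimal sketch of such per-variable checks in Python follows; the variable meaning and valid range are hypothetical:

```python
# Detailed-level SFMEA examines failure modes of individual variables.
# Sketch of runtime checks for a float variable's failure modes.
import math

def check_float(value: float, lo: float, hi: float) -> list[str]:
    """Return the failure modes exhibited by a float value."""
    modes = []
    if math.isnan(value):
        modes.append("not-a-number")
    elif math.isinf(value):
        modes.append("overflow to infinity")
    elif not (lo <= value <= hi):
        modes.append("out of valid range")
    return modes

# Example: a temperature reading expected in [-40.0, 125.0] degrees C.
print(check_float(float("nan"), -40.0, 125.0))  # ['not-a-number']
print(check_float(200.0, -40.0, 125.0))         # ['out of valid range']
print(check_float(21.5, -40.0, 125.0))          # [] -- no failure mode
```

Analogous failure-mode lists can be drawn up for int (overflow, wrap-around), bool (inverted logic) and char (invalid encoding) variables.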
7 SFTA analysis
After construction of the SFTA, the FTA can be carried out in two ways: quantitative analysis and qualitative analysis. Table 5 states the cases examined in each:
Table 5 Cases examined in qualitative and quantitative analyses
In Table 6, the advantages and disadvantages of SFTA and SFMEA are investigated (European Cooperation for Space Standardization, 2012).
Table 6 Advantages and disadvantages of the SFTA and SFMEA methods
This section explains the effective application of a set of software functions and safety methods that provide the conditions of a software safety program for many applications (for example, systems where software controls hardware, such as CPSs, in which the effect of a software failure is very serious).
These inputs, outputs, and tasks are software safety program requirements for CPSs
(Czerny et al., 2005) and are consistent with part 3 of the IEC 61508 standard that
addresses software safety. In Figure 4, software life cycle is represented.
Table 7 Relations between software development phases and software safety tasks
Table 7 shows the relationship between the software development process and the software safety procedure. Accordingly, a general overview of the system is produced during the operational design phase. Project leaders should decide whether a software safety plan is needed for product implementation or not.
These decisions are typically based on previous product knowledge or on a preliminary hazard analysis (PHA). If the PHA identifies any hazard to which a software failure could contribute, a software safety program is developed.
In the next phase, i.e., the analysis phase, the software requirements include the software safety program objectives, such as identification of the software safety requirements that eliminate, reduce or control potential hazards related to possible software failures. Software safety requirements cover applicable government regulations, international standards, and customer or internal corporate needs. A software safety requirement identification matrix may be used to trace the requirements throughout the development process.
Applied cases and procedures meet the software safety objectives as follows:
1 software hazard analysis
2 hazard testing
3 safety requirements review.
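The requirement identification matrix mentioned above can be sketched as a simple mapping from each safety requirement back to the PHA hazards it mitigates and forward to the artefacts that verify it. All identifiers below are hypothetical; the point is the traceability-gap check:

```python
# Hypothetical safety-requirement traceability matrix: each requirement
# traces back to PHA hazards and forward to verification artefacts.
trace = {
    "SSR-01": {"hazards": ["HAZ-03"], "verified_by": ["SFTA-01", "TEST-12"]},
    "SSR-02": {"hazards": ["HAZ-01", "HAZ-04"], "verified_by": ["TEST-07"]},
    "SSR-03": {"hazards": ["HAZ-02"], "verified_by": []},  # not yet verified
}

def unverified(matrix: dict) -> list[str]:
    """Requirements with no verification artefact: a traceability gap."""
    return [req for req, row in matrix.items() if not row["verified_by"]]

print(unverified(trace))  # ['SSR-03']
```

Keeping the matrix machine-checkable lets the safety review flag any requirement that no hazard analysis or test covers.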
Software hazard analysis detects the software scenarios that may lead to the potential hazards identified during the PHA. As mentioned before, a common method used to accomplish this task is SFTA. It should be noted that no software architecture or detailed design exists at this stage; therefore, software failure modes are identified in the FTA. Safety requirements are then defined by investigating the software safety requirements for each possible software failure. At the final stage, hazard testing subjects the system to a real test under probable risk. The results of this test show deviations in error response times and whether the system level is acceptable.
FTA and FMEA are performed at the system level in the software architecture design phase, which aims to identify the major software details and functions; the criticality level of each detail is determined. In the next phase, i.e., the detailed software design phase, detailed FTA is examined and the code is written safely, based on the results of the previous FMEA phase. Given that analysis at the detailed level is very time-consuming, detailed FTA and FMEA are investigated only in cases of high risk or severity according to the analysis results. Then, in secure programming, the functions necessary for coding are separated from unnecessary functions to reduce the probability that an unnecessary error leads to a probable risk; necessary and unnecessary functions are determined in the two FTA/FMEA studies at the system and detailed levels. Finally, software validation and verification are performed using unit tests, integration tests, etc. to ensure that the safety requirements are satisfied (Czerny et al., 2005).
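The final verification step can be illustrated with a unit test that checks a safety requirement directly against the code. The requirement and controller function below are hypothetical examples, not from the paper:

```python
# Hypothetical safety requirement: if the sensor reading is invalid,
# the controller must command the safe state (zero output).
def controller_output(sensor_ok: bool, demand: float) -> float:
    if not sensor_ok:
        return 0.0  # fail-safe: force the safe state
    return max(0.0, min(demand, 100.0))  # clamp demand to actuator limits

# Unit tests verifying the safety behaviour:
assert controller_output(False, 80.0) == 0.0    # invalid sensor -> safe state
assert controller_output(True, 80.0) == 80.0    # normal operation
assert controller_output(True, 150.0) == 100.0  # demand clamped to limit
print("all safety checks passed")
```

Tests of this form tie each FTA/FMEA finding to an executable check, so regressions against the safety requirements are caught automatically.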
9 Conclusions
In future research, we will utilise the results of this survey to evaluate and analyse software safety in a specific CPS. For this purpose, we will focus on software safety analysis based on the SFTA approach for an optical telescope.
References
Alur, R. (2015) Principles of Cyber-Physical Systems, MIT Press, Cambridge, MA.
Carlson, C.S. (2012) Effective FMEAs: Achieving Safe, Reliable, and Economical Products and
Processes using Failure Mode and Effects Analysis, John Wiley & Sons, Inc., Hoboken,
New Jersey.
Clemson, B. (1984) Cybernetics: A New Management Tool, Abacus Press, Tunbridge Wells, Kent,
UK.
Czerny, B.J. et al. (2005) ‘Effective application of software safety techniques for automotive
embedded control systems’, Transaction Journal of Passenger Cars: Electronic and Electrical
Systems, Vol. 114, No. 7, pp.20–33, Detroit, Michigan.
Dong, W. et al. (2009) ‘Automating software FMEA via formal analysis of dependence relations’,
IEEE 2008, 32nd Annual IEEE International Computer Software and Applications, Turku.
European Cooperation for Space Standardization (2012) Software Dependability and Safety, ECSS-
Q-HB-80-03A, Noordwijk, The Netherlands.
Goddard, P.L. (2000) ‘Software FMEA techniques’, IEEE 2000, Proceedings Annual Reliability
and Maintainability Symposium, Los Angeles, CA.
Helmer, G. et al. (2001) ‘A software fault tree approach to requirements analysis of an intrusion
detection system’, Journal of Requirements Engineering, Vol. 7, No. 4, pp.207–220.
Jianhua, S. et al. (2011) ‘A survey of cyber-physical systems’, IEEE 2011, International
Conference Wireless Communications and Signal Processing, Nanjing.
Jirsa, J. and Zacek, J. (2010) ’UML-oriented risk analysis in manufacturing systems’, Acta
Polytechnica, Vol. 50, No. 6, pp.41–48.
Lee, E.A. (2008) ‘Cyber physical systems: design challenges, center for hybrid and embedded
software systems’, IEEE 2008, 11th IEEE Symposium on Object Oriented Real-Time
Distributed Computing, Orlando, FL.
Lutz, R. and Nikora, A. (2012) Failure Assessment, in Nasa Technical Reports 2008: 1st
International Forum on Integrated System Health Engineering and Management in Aerospace,
Pasadena, CA, USA.
Menkhaus, G. and Andrich, B. (2005) ‘Metric suite for directing the failure mode analysis of
embedded software systems’, Paper Presented at the Proceedings of the Seventh International
Conference on Enterprise Information Systems, Miami, USA, pp.266–273.
Miclea, L. and Sanislav, T. (2011) ‘About dependability in cyber-physical systems’, IEEE 2011:
9th East-West Design & Test Symposium, Sevastopol.
Murali, D.V. (2013) Verification of Cyber Physical Systems, Unpublished Master of Science
Thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia.
NASA Technical Standard (2004) NASA Software Safety Guidebook, NASA-GB-8719.13.
Needham, D. and Jones, S. (2006) ‘A software fault tree metric’, IEEE 2006: 22nd International
Conference on Software Maintenance, Philadelphia, PA.
Ozarin, N. and Siracusa, M. (2003) ‘A process for failure modes and effects analysis of computer
software’, IEEE 2009: Paper Proceedings Annual Reliability and Maintainability Symposium,
USA.
Rajkumar, R. (2010) ‘Cyber-physical systems: the next computing revolution’, ACM 2010: Design
Automation Conference, California, USA.
Raspotnig, C.H. and Opdahl, A. (2013) ‘Comparing risk identification techniques for safety and
security requirement’, The Journal of Systems & Software, Vol. 86, No. 4, pp.1124–1151.
Sanislav, T. and Miclea, L. (2012) ‘Cyber-physical systems – concept, challenges and research
areas’, Journal of Control Engineering and Applied Informatics, Vol. 14, No. 2, pp.28–33.
Snooke, N. and Price, C. (2011) ‘Model-driven automated software FMEA’, IEEE 2011:
Reliability and Maintainability Symposium (RAMS), 2011 Proceedings, Annual, Lake Buena
Vista, FL.
Sozer, H., Tekinerdogan, B. and Aksit, M. (2007) ‘Extending failure modes and effects analysis
approach for reliability analysis at the software architecture design level’, Journal of
Architecting Dependable Systems, pp.409–433, Berlin.
Wu, F.J., Kao, Y.F. and Tseng, Y.C. (2011) ‘Review from wireless sensor networks towards cyber
physical systems’, Journal of Pervasive and Mobile Computing, Vol. 7, No. 4, pp.397–413.
Wu, L. and Kaiser, G. (2013) ‘FARE: a framework for benchmarking reliability of cyber-physical
systems’, IEEE 2013: Systems, Applications and Technology Conference, Long Island.