This document discusses data requirements and sources for risk analysis and hazard identification. It covers two categories of data: 1) descriptive data that describes the system and context, and 2) probabilistic data related to the likelihood of negative events. Quality issues with probabilistic data include industry applicability, data age, completeness, and statistical uncertainty. Expert judgement is also an important source of data when no other data is available. Checklists are discussed as a method for hazard identification, with checklists developed based on past experience and regularly reviewed.
Original Title
5_DATA FOR RISK ANALYSIS AND HAZARD IDENTIFY_Muh. Iqran Al Muktadir
Introduction

A wide range of information is necessary as input to any risk assessment, and the data needs for quantitative risk assessments in particular are extensive. The required data may be grouped into two broad categories:

1) Descriptive data. These data describe the study object and the context in which it is placed, covering the technical systems, the organization, the operation, inputs and outputs, and the environmental factors influencing the study object.

2) Probabilistic data. These data cause most problems for the risk analyst. They are related to how likely it is that various negative events will take place in the future. Examples are how often hazardous events will occur or how often components and systems will fail.

Quality and Applicability of Data

Probabilistic data for use in risk assessment are nearly always difficult to find. For several decades, significant effort has been dedicated to the collection and processing of reliability and accident data. Despite this, the quality of the available data is still not good enough, and may never become “good enough,” because what we are trying to do is predict the future based on history. When we have found a source that contains data relevant for our risk assessment, there are a number of issues that we need to consider before applying the data:

1. Industry and application. The first issue to consider is whether the data come from the same type of industry and the same application as we are planning to use them for. Quite a few sources are available for, say, failure of pumps, but if we are going to analyze an offshore installation and the data come from the nuclear industry, are the two sufficiently similar?
In order to evaluate this, we need to look at where the pumps have been installed (enclosed area or area exposed to weather), what service they have been used for (water or oil), and how they have been maintained (is the maintenance policy to operate until breakdown or to maintain regularly to prevent stops?). Other factors may also be relevant. Often, it is difficult to get precise answers to these questions from the data sources we are using.

2. Age of data. Data sources often provide data that are quite old, maybe going back 20–30 years or even more. In most cases, both technology and operations have developed significantly over such a long period, implying that a pump manufactured 30 years ago is likely to be quite different from a pump manufactured today. The older the data, the more critical we need to be when evaluating their potential use.

3. Present and future technology trends. Closely related to the previous item are the technology trends that we see today or that we know will be coming. Electric cars are increasingly common; can we expect that, for example, data about fires in petrol cars are applicable to electric cars? Again, we have to consider the effects of such changes on our expectations for the future compared to the past.

4. Completeness and accuracy of data. A major problem with databases and other data sources is often that the data are not complete. Underreporting has been found to be a significant problem in several cases, casting doubt on whether the reported failure rates are realistic or too low (e.g., see Hassel et al. 2011). Errors in reporting are also a common problem, potentially leading to the wrong classification of reported events. Both issues are difficult to evaluate. Even if we go into the details of how reporting and recording are done, it is still difficult to know whether underreporting is a major issue.

5. Extension of data.
Finally, it is important to consider the extent of the data that the accident and failure rates are based on. There are cases where published accident rates have been based on as little as one single accident. The statistical uncertainty in the estimates is then obviously high.

Data Sources

In this section, information about some data sources that may be useful in risk assessment is provided. More information about specific application areas can also be found in Chapter 20. At the time of writing, the sources mentioned in this chapter are considered to be among the best sources of data for risk assessment. This may obviously change in the future, although it takes time to establish good sources of data. Because several of the sources mentioned are available online only, some might be discontinued.

Expert Judgment

The use of expert judgment is essential to the whole process of risk assessment and includes judgment applied both in modeling and in data. We focus here on how expert judgment plays a role in establishing data for use in risk assessment. Broadly, we can talk about two main ways of using expert judgment:

1) When data are available but are not fully relevant, and experts are used to modify the data to be more fit for purpose.

2) When no data are available, and experts are used to provide their opinion on parameter values.

In the first case, the experts are asked to provide adjustment factors that attempt to modify existing data to better reflect the future. In the second case, the experts have to come up with completely new values, based on their experience and knowledge.

A question that is sometimes raised about expert judgment in general is whether data based on expert judgment are “valid” data and “allowed” for use in risk assessment. Use of expert judgment is a necessity in most predictions we make about complex phenomena in society.
Decisions are made every day based on predictions about how the economy will develop, what we think our income will be in the future, how the population in a country or a city will develop, how pollution will affect climate, and so on. These decisions are also made based on a mix of models, collected data, and expert judgment, in exactly the same way as risk assessment. Even if these predictions are not always correct, they help us to make better decisions and prioritize better in the long run. This is also possible to achieve with good risk assessments, based on the best available knowledge. In practice, that knowledge will always include expert judgment.

Data Dossier

When the report from a risk analysis is presented, the data that the risk analysis is based on are sometimes questioned. It is important that the choice of input data is thoroughly documented, especially the choice of reliability data. It is therefore recommended that a data dossier be set up that presents and justifies the choice of data for each component or input event in the risk analysis.

Problems

1) In Section 9.1.1, a large number of descriptive data are listed. Where can we get hold of all these types of data?

2) In Section 9.3.2, different uses of accident data are listed. Discuss how the different purposes will have an impact on what information needs to be included in a database containing these data.

3) Look at the information contained in the eMARS database (https://emars.jrc.ec.europa.eu/en/emars/content) and discuss if the information is sufficient to meet all the uses of databases listed in Section 9.3.2.

4) You are going to do a risk analysis of a building crane. List examples of technical data, operational data, reliability data, meteorological data, and exposure data that would be required to perform the analysis.

5) What are the quality criteria for reliability databases?

6) What do we need to evaluate when we have a set of data that we plan to use for risk assessment?
7) You are going to do a risk analysis of a proposed new ship concept with a new type of hybrid machinery that is a combination of a traditional diesel-driven engine and an electrical engine powered by batteries. The electrical engine will be used when there is sufficient power in the batteries, but for the rest of the time, the diesel engine will be used.

8) Failure data have been collected from a number of identical components. Assume that five failures have been observed during an accumulated time in service of 978 850 hours.

HAZARD IDENTIFICATION

INTRODUCTION

The first question in the triplet definition of risk is: What can go wrong? Answering this question implies identifying the hazards and threats and the initiating and/or hazardous events that have the potential to cause harm to one or more assets. Several methods have been developed for this purpose. These methods are called hazard identification methods.

Definition (Hazard identification): The process of identifying and describing all the significant hazards, threats, and hazardous events associated with a system (DEF-STAN 00-56 2007).

Several hazard identification methods are not limited to the identification of hazards but also cover the two other questions in the triplet definition of risk. They can therefore be regarded as “complete” risk analysis methods. Comprehensive descriptions and reviews of hazard identification methods are given in HSL (2005) and ISO 31010 (2009).

CHECKLIST METHODS

A hazard checklist is a written list of hazards or hazardous events that have been derived from past experience. The entries of the list can be examples of hazards and events, or they may be formulated as questions that are intended to help the study team consider all aspects of safety related to a study object. A checklist analysis for hazard identification is also called a process review.
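As a rough illustration of the description above, a checklist whose entries are formulated as questions can be held as a simple list and applied to a study object. The Python sketch below is hypothetical: the `ChecklistItem` class, the `apply_checklist` function, and the example questions are assumptions made for illustration and are not taken from any standard or from the source text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One checklist entry, formulated as a question (hypothetical structure)."""
    category: str
    question: str


# Illustrative entries only; a real checklist is derived from past experience.
CHECKLIST = [
    ChecklistItem("mechanical", "Can anything fall from height?"),
    ChecklistItem("electrical", "Is any live equipment accessible?"),
    ChecklistItem("chemical", "Can hazardous material be released?"),
]


def apply_checklist(checklist, answers):
    """Split the checklist into identified hazards and unanswered questions.

    `answers` maps a question to True (hazard present) or False (not present).
    Unanswered questions are returned separately so the study team can see
    which aspects have not yet been considered.
    """
    identified = [item for item in checklist if answers.get(item.question) is True]
    unanswered = [item for item in checklist if item.question not in answers]
    return identified, unanswered


# Hypothetical answers from a study team meeting.
answers = {
    "Can anything fall from height?": True,
    "Is any live equipment accessible?": False,
}
identified, unanswered = apply_checklist(CHECKLIST, answers)
for item in identified:
    print(f"Hazard to follow up ({item.category}): {item.question}")
for item in unanswered:
    print(f"Not yet considered ({item.category}): {item.question}")
```

Reporting unanswered questions separately is one possible design choice; it makes gaps in the review visible instead of silently treating them as "no hazard."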
Checklists may be based on past experience and previous hazard logs and should be made specifically for a process or an operation. Checklists should be regarded as living documents that need to be reviewed and updated regularly.

PRELIMINARY HAZARD ANALYSIS

Preliminary hazard analysis (PHA) is used to identify hazards and potential accidents in the early stages of system design and is basically a review of where energy or hazardous materials can be released in an uncontrolled manner. PHA is used not only for hazard identification but also for ranking the hazards with respect to probability and consequence. The PHA technique was developed by the US Army (MIL-STD-882E) and has been used with success in safety analysis within defense, for safety analysis of machinery, in process plants, and for a wide range of other applications. A PHA is called “preliminary” because it is usually refined through additional and more thorough studies. Many variants of PHA have been developed, and they appear under different names, such as HAZID and rapid risk ranking (RRR). The abbreviation PHA is also used to mean process hazard analysis, which is a requirement in the United States under the Occupational Safety and Health Administration (OSHA) regulations.

JOB SAFETY ANALYSIS

A job safety analysis (JSA) is a simple risk assessment method that is applied to review job procedures and practices to identify potential hazards and determine risk reduction measures. Each job is broken down into specific tasks, for which observation, experience, and checklists are used to identify hazards and associated controls and safeguards. The JSA is carried out by a team, and most of the work is done in a JSA meeting. The results from the analysis are documented in JSA worksheets, as shown in Figure 10.6. Because of its in-depth and detailed nature, the JSA can identify potential hazards that may go undetected during routine management observations or audits.
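The task breakdown described above can be sketched as a small data structure. The following Python example is a minimal sketch under stated assumptions: the `JsaTask` class, the `hazards_without_controls` helper, and the example job are hypothetical and do not reproduce the worksheet in Figure 10.6.

```python
from dataclasses import dataclass, field


@dataclass
class JsaTask:
    """One task of a job, with its hazards and controls (hypothetical layout)."""
    step: int
    description: str
    hazards: list = field(default_factory=list)
    controls: dict = field(default_factory=dict)  # hazard -> risk reduction measure


def hazards_without_controls(tasks):
    """Return (step, hazard) pairs that have no recorded risk reduction measure."""
    return [
        (task.step, hazard)
        for task in tasks
        for hazard in task.hazards
        if hazard not in task.controls
    ]


# Invented example job: replacing a ceiling lamp.
tasks = [
    JsaTask(1, "Isolate the electrical supply",
            hazards=["Electric shock"],
            controls={"Electric shock": "Switch off and verify that the circuit is dead"}),
    JsaTask(2, "Climb the ladder",
            hazards=["Fall from height", "Falling tools"],
            controls={"Fall from height": "Second person steadies the ladder"}),
]

open_items = hazards_without_controls(tasks)
for step, hazard in open_items:
    print(f"Step {step}: no control recorded for '{hazard}'")
```

Listing hazards that lack a control gives the JSA meeting a concrete agenda of open items; how controls are judged adequate is outside the scope of this sketch.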
JSA has been used for many years and in many industries and has been shown to be an effective tool for identifying hazardous conditions and unsafe acts. Other names for JSA include safe job analysis (SJA), job hazard analysis (JHA), and task hazard analysis (THA).

FMECA

Failure modes and effects analysis (FMEA) was one of the first systematic techniques for failure analysis of technical systems. The technique was developed by reliability analysts in the late 1940s to identify problems in military systems. The traditional FMEA identifies and describes the possible failure modes, failure causes, and failure effects. When we, in addition, describe or rank the severity of the various failure modes, the technique is called failure modes, effects, and criticality analysis (FMECA). The borderline between FMEA and FMECA is vague, and there is no good reason to distinguish between them. In the following, we use the term “FMECA.”

FMECA is a simple technique and does not build on any particular algorithm. The analysis is carried out by reviewing as many components, assemblies, and subsystems as possible to identify failure modes, causes, and effects of such failures. For each component, the failure modes and their resulting effects on the rest of the system are entered into a specific FMECA worksheet. Technical failures and failure modes are introduced in Chapter 2.

FMECA is mainly an effective technique for reliability engineering, but it is also often used in risk analyses. There are several types of FMECAs. In the context of risk analysis, the most relevant type is the product FMECA, which is also called a bottom-up FMECA, and this section is restricted to this type. Because FMECA was developed as a reliability technique, it will also cover failure modes that have little or no relevance for the risk related to the study object. When the objective of the FMECA is to provide input to a risk analysis, these failure modes may be omitted from the FMECA.
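To make the worksheet idea concrete, the sketch below stores FMECA rows and drops failure modes that have no relevance for risk, as described above. It is a hypothetical illustration: the `FmecaRow` fields, the 1 to 4 severity scale, the `safety_relevant` flag, and the pump entries are all assumptions, not the layout of any particular FMECA worksheet.

```python
from dataclasses import dataclass


@dataclass
class FmecaRow:
    """One worksheet row (hypothetical columns)."""
    component: str
    failure_mode: str
    failure_cause: str
    effect_on_system: str
    severity: int          # assumed scale: 1 (negligible) to 4 (catastrophic)
    safety_relevant: bool  # does this failure mode matter for the risk picture?


# Invented entries for a single pump.
worksheet = [
    FmecaRow("Pump P1", "Fails to start", "Motor winding fault",
             "No cooling flow", 3, True),
    FmecaRow("Pump P1", "External leakage", "Seal wear",
             "Fluid released to surroundings", 4, True),
    FmecaRow("Pump P1", "Noisy operation", "Bearing wear",
             "Nuisance only", 1, False),
]


def rows_for_risk_analysis(rows):
    """Keep only safety-relevant failure modes, most severe first."""
    relevant = [row for row in rows if row.safety_relevant]
    return sorted(relevant, key=lambda row: row.severity, reverse=True)


for row in rows_for_risk_analysis(worksheet):
    print(f"{row.component}: {row.failure_mode} "
          f"(severity {row.severity}) -> {row.effect_on_system}")
```

Whether a failure mode is safety relevant is a judgment made by the analyst; the flag here only records that judgment so the filtering step is explicit and traceable.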
When performing an FMECA, it is important to keep the definition of a failure mode in mind. As explained in Chapter 2, a failure mode may be regarded as a deviation from the performance criteria for the component/item.

HAZOP

A hazard and operability (HAZOP) study is a systematic hazard identification process that is carried out by a group of experts (a HAZOP team) to explore how a system or a plant may deviate from the design intent and create hazards and operability problems. The analysis is done in a series of meetings as a guided brainstorming based on a set of guidewords and process parameters. The system or plant is divided into a number of study nodes that are examined one by one. For each study node, the design intent and the normal state are defined. Guidewords and process parameters are then used in brainstorming sessions to generate proposals for possible deviations in the system. The HAZOP approach was developed by ICI Ltd in 1963 for the chemical industry (Kletz 1999). The main international standard for HAZOP is IEC 61882 (2016).

STPA

Systems-theoretic process analysis (STPA) was proposed by Leveson (2011) and is based on the systems-theoretic accident model and processes (STAMP), which was presented in Chapter 8. In addition to dealing with component failures, STPA also considers unsafe interactions between system components. These can occur even if no failures have occurred. The method can also consider the wider sociotechnical system that the technical systems are part of, enabling analysis with a wider perspective than many other methods. STPA is a rather recent method that has become increasingly popular.

STPA is based on the assumption that accidents occur due to inadequate control of the study object. STPA identifies cases of inadequate control that may lead to accidents. Control of the system can be exercised by a technical system, a person, or an organization.
Once we have identified the areas where control is inadequate, this knowledge can be used to formulate new requirements for the system to ensure that control is maintained.

SWIFT

The structured what-if technique (SWIFT) is a systematic brainstorming session where a group of experts with detailed knowledge about the study object raise what-if questions to identify possible hazardous events, their causes, consequences, and existing barriers, and then suggest alternatives for risk reduction. Estimation of the frequency and severity of the various hazardous events may, or may not, be part of the SWIFT analysis.

What-if analyses have long been used in simple risk analyses (CCPS 2008). The main difference between a SWIFT analysis and a traditional what-if analysis is that the questions in SWIFT are structured based on a checklist. The SWIFT approach was earlier called a “what-if/checklist” analysis (CCPS 2008). The borderline between SWIFT and a traditional what-if analysis is rather vague.

SWIFT has several similarities to a HAZOP study. The main differences are that SWIFT considers larger modules and that checklists and what-if questions are used instead of guidewords and process parameters. A SWIFT analysis is therefore less detailed and thorough than a HAZOP study, but easier and faster to conduct.

A study team meeting typically starts by discussing in detail the system, function, or operation under consideration. Drawings and technical descriptions are used, and the team members may need to clarify to each other how the details of the system function and may fail. The next phase of the meeting is a brainstorming session, where the team leader guides the discussion by asking questions starting with “What if?” The questions are based on checklists and cover such topics as operation errors, measurement errors, equipment malfunction, maintenance, utility failure, loss of containment, emergency operation, and external stresses.
When the ideas are exhausted, previous accident experience may be used to check for completeness.

Comparing Semiquantitative Methods

The methods that have been presented so far have many similarities, but they are developed for different purposes and approach the problem of identifying what can go wrong in different ways. In Figure 10.16, six of the methods are compared with respect to the following properties:

• Applications. The types of problems that the method is suitable for.
• System breakdown. How the system is viewed and described in the method.
• How is identification performed? Which structured methods are applied to identify hazards and hazardous events.
• What is identified? The terms used to describe what is found in the hazard identification process.
• How is risk ranking performed? Whether and how risk ranking is performed within the method.

MASTER LOGIC DIAGRAMS

A master logic diagram (MLD) is a graphical technique that can be used to identify hazards and hazard pathways that can lead to a specified TOP event (i.e., an accident) in the system. The hazards and pathways are traced down to a level of detail at which all important safety functions and barriers are taken into account. When this is accomplished, the causal events that can threaten a safety barrier or function can be listed. An MLD resembles a fault tree (see Chapter 11), but differs in that the initiators defined in an MLD are not necessarily failures or basic events. MLDs are not pursued further in this chapter. Interested readers may consult Modarres (2006) and Papazoglou and Aneziris (2003). A case study illustrating how MLD can be used to identify failure modes of an intelligent detector is presented by Brissaud et al. (2011).

CHANGE ANALYSIS

Change analysis is used to determine the potential effects of proposed modifications to a system or a process.
The analysis is carried out by comparing the new (changed) system with a basic (known) system or process. A change is often the source of deviation in the system operation and may lead to process disturbances and accidents. It is therefore important that the possible effects of changes be identified and that necessary precautions be taken. In the following, the term key difference is used to denote a difference between the new and the basic system that can lead to a hazardous event or can influence the risk related to the system. The system can be a sociotechnical system, a process, or a procedure.

HAZARD LOG

It is often beneficial to enter the results of the hazard identification process into a hazard log. The hazard log is also called a hazard register or a risk register. A hazard log is a log of hazards of all kinds that threaten a system's success in achieving its safety objectives (see also CASU 2002). It is a dynamic and living document, which is populated through the organization's risk assessment process. The log provides a structure for collating information about risk that can be used in risk analyses and in risk management of the system.

The hazard log should be established early in the design phase of a system or at the beginning of a project and be kept up to date throughout the lifecycle of the system or project. It should be updated when new hazards are discovered, when there are changes to identified hazards, or when new accident data become available.

The hazard log is usually established as a computerized database, but can also be a document. The format of the hazard log varies considerably depending on the objectives of the log and the complexity and risk level of the system, and may range from a simple table listing the main hazards that are related to the system to an extensive database with several sub-databases.

PROBLEMS

1) A generic list of hazards is provided in Table 2.5.
Compare the items in this list with the definition of a hazard in Chapter 2 and discuss whether they meet the definition. Consider in particular the “organizational hazards.”

2) Hazards may be classified according to the main contributor to an accident scenario: (i) technological hazards, (ii) natural (or environmental) hazards, (iii) organizational hazards, (iv) behavioral hazards, and (v) social hazards. Reclassify the hazards in Table 2.5 according to this classification scheme.

3) List the possible hazards you are exposed to when riding a bicycle.

4) Consider Figure 10.20 and try to identify relevant hazards. Use your imagination. Table 2.5 may also be helpful. Make assumptions about what you see as necessary. Define some accident scenarios (see Chapter 2) and put the accident scenarios in a bow-tie diagram.

5) Consider Figure 10.20 and carry out a SWIFT analysis. Use the worksheet in Figure 10.15 to report the results. Compare the results with what you found in the previous problems and identify differences and similarities.

6) Consider the list of hazards that you identified for riding a bicycle in Problem 10.3. Identify examples of active failures that can trigger unwanted events and latent conditions that may lie dormant and contribute to future accidents.

7) Assume that you are driving your car on a wet and slippery country road. A front wheel punctures and you have to change the wheel. You have a spare wheel and the original jack in the boot.
a) Break down the job you have to do into a sequence of tasks. List the tasks in the sequence you have to do them.
b) Carry out a JSA and record the results in a suitable JSA worksheet. Note the assumptions you have to make to carry out the analysis.

8) Consider the lifting operations in Example 10.6, but consider the whole operation including the worker onboard the ship, the crane operator, and the worker on the quay who positions the containers and removes the hooks.
Note the assumptions you have to make to carry out the analysis.
a) Break down the job into a sequence of tasks and list the tasks in the sequence they have to be done.
b) Carry out a JSA and record the results in a suitable JSA worksheet.

9) Consider a hot water kettle that is used to make hot water for tea. Choose a model you are familiar with and carry out an FMECA of the kettle.

10) Perform a PHA of a building crane and the operation of the crane. You may delimit the system to the crane itself (but also consider effects on other objects and persons) and consider only normal use of the crane (i.e., not erecting or dismantling the crane, no maintenance operations, etc.). Make other delimitations if required. Consider whether the system needs to be broken down, and do so if required. For the hazard identification, it is recommended to use the checklist provided in Table 10.1. In addition, it is recommended to visit a building site to observe how the operation is done. For the frequency and consequence classification, you may use the classification in Tables 6.8 and 6.9, or you may define your own classification. You may use the PHA worksheet in Figure 10.3.

11) PHA, HAZOP, FMEA, and JSA are four methods used for hazard identification purposes. For each of the following systems or operations, comment on whether these methods are suitable or not (or if you think that other methods are better suited):
• The water cooling system in a car
• A car engine
• A race track for car racing
• The electrical system of the car
• The operation of replacing the gear box of the car

12) Figure 10.21 shows a simple “process system.” The system consists of a water tank, with a water supply on the right-hand side at the top and a water outlet at the bottom. The tank is normally full with both inlet and outlet closed (i.e., the valves are normally closed, NC).
Water is released manually by pushing the pushbutton. This opens the valve on the outlet line, and water flows out. When the water level reaches the low-level controller (LC2), the outlet valve is closed. The inlet valve is also opened when the water level reaches the low-level controller (LC2). This valve remains open until the water level again reaches the high level (at LC1), when the valve closes.