

LECTURE NOTES
ON
RISK ANALYSIS & MANAGEMENT
Prof. Dr. İlhan OR
Industrial Engineering Department, Boğaziçi University,
Istanbul
Spring 2021



IE498 - RISK ANALYSIS & MANAGEMENT

PART 1
INTRODUCTION TO RISK CONCEPTS



The History of Risk Appreciation

u The concepts of Risk and Risk Assessment have a long history.


u 2400 years ago, the Athenians described their capacity for
assessing risks before making decisions:
• “We Athenians take our decisions in policy and submit them to proper
discussion. The worst thing is to rush into action before consequences
have been properly debated. This is an important aspect in which we
differ from other people”.
• “We are capable of taking risks while assessing them beforehand.
Others are brave out of ignorance; and when they stop to think, they
begin to fear. But the man who can most truly be accounted brave is he
who best knows what is sweet in life and what is terrible, and he then
goes out undeterred to meet what is to come”.



The History of Risk Appreciation

u Development of formal Risk Appreciation over time:


• Games of chance were among the first inventions of primitive man;
• 17th Century: Pascal concluded that, given the probability distribution
for God's Existence, the expected value of believing in God outweighed
the expected value of atheism;
• Graunt (1662) published Life Expectancy tables;
• Arbuthnot (1692) argued that probabilities of causal events can be
calculated;
• Hutchinson (1728) analyzed the tradeoff between probability and utility
in risky decisions;
• Laplace (1793) developed a prototype of modern day risk assessment -
An analysis of the probability of death with and without smallpox
vaccination.



The History of Risk Appreciation

u Establishment of Causal Links over time:


▪ 5th Century B.C: The association between malaria and swamps was established
even though the precise reason remained obscure;
▪ 1st Century B.C:
➢ Burn tests of Pliny to detect food adulteration;

➢ Greeks & Romans observed the adverse effect of exposure to lead;

▪ 16th – 18th Century:


➢ Agricola linked adverse health effect to various mining & metallurgical practices;

➢ Evelyn linked smoke in London to various types of acute respiratory problems;

➢ Ramazzini indicated that nuns living in Apennine monasteries appeared to have higher frequencies of breast cancer;
➢ Hill linked the usage of tobacco snuffing with cancer of the nasal passage;

➢ Hutchinson showed that occupational & medical exposure to arsenic can lead to cancer.



The History of Risk Appreciation

u Some Early Examples of Societal Risk Management:


▪ Avoiding the risk by prohibiting the use of a potentially dangerous
object or substance.
▪ Reducing adverse effects of natural hazards through construction
projects such as levees, seawalls.
▪ Reducing vulnerability by elevating buildings in flood plains, by
implementing quarantine laws.
▪ Developing response/recovery procedures by stockpiling food, by
setting up fire extinguishing services.
▪ Insurance is one of the oldest strategies for coping with risk. Its origins
trace back to setting interest rates in 3000 B.C. Mesopotamia.



The History of Risk Appreciation

u Two major obstacles impeded progress in establishing the causal links necessary for Risk Analysis & Management.
▪ The scarcity of scientific models of biological, chemical and physical
processes.
➢ Also, lack of instrumentation and the lack of rigorous observational and
experimental techniques for collecting data and testing hypotheses.
▪ The belief, rooted in ancient traditions, that most illnesses, injuries and
disasters could be best explained in social, religious, or magical terms.
➢ In 16th - 17th centuries, witch hunting resulted in death by fire for an
estimated half-million people, as the Church attempted to eliminate a
perceived source of crop failures, disease, and other ill fortune.
➢ In 1721 an influential critic of medical experimentation in Boston
insisted that smallpox is “a judgment of God on the sins of the people".



The History of Risk Appreciation
u Without a structured approach to dealing with risk (without odds
and probabilities), the natural way of dealing with risk is to
appeal to gods or to fate; risk is wholly a matter of gut.
u A key factor distinguishing the modern age is the mastery of risk.
▪ The notion that humans are not passive before nature.
▪ By understanding, measuring and managing risk, risk-taking has become
one of the prime catalysts that drives modern society; facilitating
economic growth, technological progress and improved quality of life.
◆ Without some form of risk management, engineers could never have designed the great
bridges, electrical power utilities would not exist, no airplanes would fly.
◆ Economic growth, technological progress would scarcely be what it is today without
motivated risk takers such as Marie Curie, Warren Buffett, Bill Gates, Steve Jobs (*).
◆ Without risk-taking nothing big happens (*).
▪ It is (structured) risk taking that drives innovation(*).



Risk Related Definitions
u Risk is a common term, heavily used in daily language.
But the issues/concepts it reflects may differ:
• May refer to vulnerability/exposure;
• May refer to likelihood of occurrence of the event;
• May refer to the consequences of the event;
• May refer to a measure combining the notions above.
u The beginning of Wisdom is the definition of terms (Socrates)
u Units deployed to measure Risk may also differ:
• Number of fatalities per year (fatalities/year);
• Infrastructure damage (down time);
• Business losses per year ($/year);
• Combination of the above.



Risk Related Definitions
u Hazard: An act, event or phenomenon posing potential harm
(fatalities, injuries, property/infrastructure/environmental damage,
agricultural loss, business interruption).
• The Hazard is the potential, the Disaster is the actual event;
• Based on the Arabic word al zahr (dice)
u Exposure: To be placed without protection in the area affected by
hazard.
u Disaster: An occurrence inflicting widespread destruction and
distress (from the Italian word disastro (ill-starred))
• Any occurrence which causes damage, ecological disruption, loss of human
lives, deterioration of health and health services on a scale sufficient to
warrant an extraordinary response from outside the affected community
(WHO)



Risk Related Definitions - Hazards

u Examples of Natural Hazards:
• Earthquakes;
• Tsunami;
• Hurricanes (Cyclones, Typhoons);
• Tornadoes;
• Floods (flash, coastal, urban);
• Wildfires;
• Drought/Famine;
• Thunderstorms, Lightning, Hail;
• Blizzards, Avalanches;
• Mudslides, Landslides.



Risk Related Definitions - Hazards

u Industrial Hazards are threats to people and life-support systems that arise from the production of goods and services.
u Examples of Industrial Hazards:
• Oil Spills (ships, pipelines, facilities);
• Hazardous materials incidents, toxic releases;
• Radiological incidents (reactor accident, release of radioactive materials);
• Fire and explosion (industrial, residential);
• Transportation accidents (ship, rail, road);
• IT reliability and cyber failures;
• Electrical power and/or communications loss;
• Other infrastructure failures;
• Quality failures.



Risk Related Definitions - Hazards

An Industrial Disaster - Buncefield Fire, 2005 U.K.



Risk Related Definitions - Hazards

u Characteristics of Industrial Hazards:


• Large scale technology or technical activities;
• Large scale storage or use of high energy sources and/or toxic, flammable,
explosive materials;
• Large scale transportation of toxic, flammable, explosive materials;
• Transportation, processing and storage of nuclear materials;
• Potentially large numbers of people exposed.
u Sites/Installations at which Major Hazards are Likely:
• Nuclear installations;
• Chemical processing facilities handling large quantities of flammable
and/or toxic materials;
• Offshore oil and gas installations, oil refineries, petrochemical plants;
• Water treatment plants using bulk chlorine;
• Bulk storage of flammable materials;
• Underground mines.



Risk Related Definitions - Hazards

u Other Types of Hazards:


• Terrorism;
• Civil Disturbances;
• Chronic Diseases;
• Biodiversity Loss;
• Sabotage;
• Crime and Corruption;
• Climate Change.



Risk Related Definitions

u Emergency: An unexpected situation or sudden occurrence of a serious and urgent nature that demands immediate action.
• An unexpected event which places life and/or property in danger and requires an immediate response through the use of routine community resources and procedures. (Drabek)
u Risk Environment: Environments featuring events/actions
(hazards) that may lead to potential losses of life, property or
environmental quality, depending on chance outcomes.
u Risk: The exposure to the chance of loss; the combination of
the likelihood of an undesirable event (hazard) occurring and the
significance (level) of the consequence of the event occurring.
• [Risk Level = Probability × Level of Damage]
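A minimal illustration of this measure (an added sketch, not from the original notes; the hazard names and all numbers below are hypothetical):

```python
# Risk Level = Probability x Level of Damage, computed as expected annual loss.
# All hazards, probabilities and damage figures are hypothetical.

hazards = {
    # name: (annual occurrence probability, damage if the event occurs, in $)
    "warehouse fire":   (0.02, 5_000_000),
    "data breach":      (0.10,   800_000),
    "supplier default": (0.25,   200_000),
}

risk_levels = {name: p * damage for name, (p, damage) in hazards.items()}

# Rank hazards from highest to lowest risk level.
for name, level in sorted(risk_levels.items(), key=lambda kv: -kv[1]):
    print(f"{name:>16}: expected loss ${level:,.0f}/year")
```

Note that a rare/severe hazard and a frequent/mild one can end up with the same risk level under this measure; that is why the slides that follow keep likelihood and consequence as separate notions.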



Risk Related Definitions
u Risk is the probability of an adverse outcome (Graham and Weiner 1995).
u Risk is the combination of probability of an event and its consequences (ISO
2002);
u Risk refers to uncertainty of outcome, of actions and events (Cabinet Office,
UK 2002);
u Risk equals the expected disutility (Campbell 2005);
u Risk is an uncertain consequence of an event or an activity with respect to
something that humans value (Renn 2005);
u Risk is equal to the combination of events/consequences and associated
uncertainties (Aven 2007);
u Risk is uncertainty about and severity of the consequence of an activity with
respect to something that humans value (Aven and Renn 2008);
u Risk arising from an activity is the set of all possible scenarios leading from
that activity to consequences, together with prior probabilities of scenarios
(Bedford 2008).



Risk Related Quotations
u If we don't succeed, we run the risk of failure. Bill Clinton, President.
u We are ready for an unforeseen event that may or may not occur. Al Gore,
Vice President.
u Outside of the killings, Washington has one of the lowest crime rates in the
country. Mayor Marion Barry, Washington, DC.
u Predictions are difficult, especially about the future; maybe that’s why most
of us avoid uncertainty instead. Physicist Niels Bohr
u Once you have internalized the concept that you cannot prove anything in
absolute terms, life becomes all the more about odds, chances and trade-offs.
Robert Rubin, former US Secretary of the Treasury.
u Good strategists manage uncertainty by playing the probabilities; but most of
us use wishful thinking instead. Anon.
u Democracies don’t prepare well for things that have never happened before.
Richard Clarke, White House counter-terrorism adviser.



Risk Related Definitions
A Word of Caution
u Risk does not exist out there, independent of our minds and
culture, waiting to be measured.
u Human beings have invented the concept of risk to help them
understand and cope with the dangers and uncertainties of life.
u Although these dangers are real, there is no such thing as “real
risk” or “objective risk”.
u The evaluation of risk depends on the choice of a measure.
• With the possibility that the choice may have been guided by a preference
for one outcome or another.
u Defining risk is thus an Exercise in Power.



Risk Related Definitions
u Risk Assessment: The quantitative (or semi-quantitative)
determination of the likelihood and impact level of identified potential
hazards, as well as other characteristics of these hazards that may be
deployed in risk evaluation.
• Science, medicine, technology can help in the understanding and assessment of risks.
u Risk Analysis: The deployment of obtained quantitative and
qualitative assessments (regarding the likelihood, impact level and
other characteristics of hazards) for the purpose of comparing possible
risks and making risk management decisions.
u Risk Communication: The exchange of information, concerns,
perceptions, and preferences within an organization and between an
organization and its external environment about risks, risk assessment
and risk management issues.





Risk Related Definitions: Risk Preferences
u Risk Tolerance: Risk level acceptable to individuals or groups.
u Risk Preferences: Individuals’ attitude to take Risks.
u Risk Averse – Disinclined (afraid) to take Risks
• Preferring a certain environment having a specified “damage level”, to a risky
environment having the same “risk level”.
u Risk Prone – Inclined (seeking) to take Risks
• Preferring a risky environment having a specified “risk level” to a certain environment having the same “damage level”.
u People may exhibit risk prone or risk averse
behaviour depending on the circumstances.
• Buying insurance (risk averse behaviour);
• Buying lottery tickets; extreme sports
(risk prone behaviour).



Risk Perception
u Risk is essentially a cognitive phenomenon.
u Social, cultural & professional biases may have considerable
effects on a group's or an individual's perception of risk.
• The current emphasis on security, ecological and IT risks could make
excellent research material for an anthropologist in 200 years.
u People respond to hazards that they perceive. If perceptions are
faulty, public or organizational policy will be misdirected.
u Laymen's perceptions of risk are more qualitative than experts'.
• Individuals have great difficulty in interpreting low probabilities in
making their decisions;
• There is evidence that people may not even want data on the likelihood
of an event occurring.



Risk Perception
u Perceived risk is biased by Imaginability & Memorability of hazard.
• People’s perception of risks are influenced by whether they are told of
likelihood and/or consequences of risk events or whether they personally
experience those disasters.
• If a risk event has not materialized “yet”, people underestimate its likelihood; if an undesirable
realization has occurred, they tend to overestimate it.
• Many people buy their first set of battery cables only after their car doesn’t start
and has to be towed.
• After each significant earthquake, Californians are for a while diligent in
purchasing insurance and adopting measures of protection and mitigation.
– They tie down their boiler to reduce quake damage, seal their basement doors against
floods, and maintain emergency supplies in good order. However, the memories of the
disaster dim over time, and so do worry and diligence.
• Protective actions (by individuals or governments), are usually designed to be
adequate to the worst disaster actually experienced.
– Images of a worse disaster do not come easily to mind.
– Famous comment of R. Clarke: “Democracies don’t prepare well for things that
have never happened before”.



Risk Perception

u Low Probability / High Consequence events are often perceived as impossible until they happen; and then they are seen as inevitable.
• In many cases, these events are ignored because of the underlying
assumptions we have made about our environment;
• Some were never envisioned (Bhopal-1984/ World Trade Center-2001);
• Some were envisioned but thought to be extremely unlikely (New
Orleans Flooding);
• Some were thought likely to occur but would only have minor or
controllable consequences (Exxon Valdez).
u Had they been expected, they would not have caused the
damage they did.
• It is strange to see an event happen and cause great damage precisely because it was not supposed to happen.



Risk Perception
Black Swans
The Black Swan by Nassim N. Taleb, 2007
u Not all swans are white (as the assumption goes), look out for Black Swans.
u A Black Swan is a highly improbable event with four key characteristics:
• It is Unpredictable;
• It has a (usually unnoticed) Fat Tail;
• It carries a Massive Impact;
• After the fact, we concoct an explanation that makes it appear less random and
more predictable than it was.
u Most of history is shaped by such low probability/high consequence events.
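To make the “fat tail” characteristic concrete, a small illustrative computation (an addition for clarity, not from the notes) compares tail probabilities under a thin-tailed and a heavy-tailed model:

```python
# Illustrative only: extreme outcomes are far likelier under a fat-tailed model.
from scipy.stats import norm, t

threshold = 5.0

# P(X > 5) under a standard normal (thin-tailed) model
p_thin = norm.sf(threshold)

# P(X > 5) under a Student-t with 3 degrees of freedom (fat-tailed)
p_fat = t.sf(threshold, df=3)

print(f"thin-tailed model : {p_thin:.2e}")   # ~2.9e-07
print(f"fat-tailed model  : {p_fat:.2e}")    # ~7.7e-03
print(f"ratio             : {p_fat / p_thin:,.0f}x")
```

A forecaster who fits the thin-tailed model to ordinary data will treat such extremes as essentially impossible, which is exactly Taleb's point about unnoticed fat tails.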
White Swans by Nouriel Roubini, 2010 (Crisis Economics)
u Many disasters are results of built-up vulnerabilities (such as forest fires,
hurricanes) & policy mistakes (escalating policies); thus they are predictable.



Risk Perception
White Swans
u According to Roubini, many mounting geopolitical and environmental risks
today are clearly visible to anyone who is willing to look.
• On potential disorder following American elections (February 2020)
There is growing concern that China, Russia, Iran, and North Korea are using cyber warfare
to interfere with the election and deepen America’s partisan divisions. A close outcome will
almost certainly lead to accusations (by either side) of “election-rigging,” and potentially to
civil disorder.
• On the Sino-American relations (January 2020)
Geopolitical tensions are escalating dangerously in Hong Kong, Taiwan, and the South China Sea. Even if neither China nor the US wants a military confrontation, increased brinkmanship could lead to a military accident that spins out of control. The Sino-American cold war could turn hot.
• On Climate Change (March 2019)
Environmental concerns are mounting. In East Africa, desertification has created conditions
for locust swarms that are destroying crops and livelihoods. Research suggests that crop
failures due to rising temperatures and desertification will drive millions of people from
tropical zones toward the US, Europe, and other temperate regions in the coming decades.



Risk Perception in Environments Involving
Catastrophic Risks

[Figure: Level of perceived risk vs. the real risk level over time; perceived risk spikes at the realization of a catastrophic event and decays afterwards.]



Risk Perception
u “Availability Heuristic” has a major effect on individuals’
recognition of risk.
• It substitutes one question for another: you wish to estimate the size/cost of a
category or the frequency of an event, but you report an impression of the ease
with which instances come to mind.
• Even though strokes cause almost twice as many deaths as all accidents
combined, in a study by Slovic and Fischhoff, it was found that 80% of
respondents judged accidental death to be more likely.
• In the same study, tornadoes were seen as more frequent killers than asthma,
although the latter causes 20 times more deaths.
• Death by lightning was judged more likely than death from food poisoning even
though it is 52 times less frequent.
• Death by disease is 18 times as likely as accidental death, but the two were
judged about equally likely.
• Death by accidents was judged to be more than 300 times more likely than death
by diabetes, but the true ratio is 1:4.



Risk Perception
Expert Bias due to hazard availability (frequency)
[Figure: Estimated vs. actual number of deaths per year, by cause.]



Risk Perception
u “Affect Heuristic” is also influential in recognition of risk.
• It is an instance of substitution, in which the answer to an easy question (How
do I feel about it?) serves as an answer to a much harder question (What do I
think about it?).
• In a groundbreaking study by Slovic, it was demonstrated that people who had
received a message extolling the benefits of a technology also changed their
beliefs about its risks.
– Although they had received no relevant evidence, the technology they now liked more
than before was also perceived as less risky.
– Similarly, respondents who were told only that the risks of a technology were mild
developed a more favorable view of its benefits.
• The affect heuristic simplifies our lives by creating a world that is much tidier
than reality. Good technologies have few costs in the imaginary world we
inhabit, bad technologies have no benefits, and all decisions are easy.
– In the real world, of course, we often face painful tradeoffs between benefits and
costs.



Risk Perception
u “It will happen/it won’t happen to me” belief is also important in
recognition of risk.
• Many drivers still refuse to wear seatbelts. When asked why, they
responded, “I won’t have an accident”.
u Past events always look less random than they actually were.
• This is called “Hindsight Bias”.
• After the fact, people always find logical and well fitting arguments to
explain why a past event was not as random (or even inevitable).
• People have a natural tendency to indulge in elegant, after-the-fact
rationalizations.
u People succumb to selective memory, cherishing successes, while
forgetting failures. Since they fail to acknowledge and learn from
their mistakes, they are prone to repeat them.



Risk Perception
u In summary, an average person's attitude towards risk is:
• Guided by emotions and publicity rather than by reason;
• Shaped by erroneous cause-effect judgements;
• Influenced by memory lapses;
• Easily swayed by trivial details;
• Inadequately sensitive to differences between low and negligibly low probabilities.



Qualitative Factors in Risk Perception
(Characteristics on the left are generally perceived as less risky; their counterparts on the right as more risky.)
Risk assumed voluntarily | Risk assumed involuntarily
Risk of death delayed | Risk of death immediate
Risk certain not to be fatal | Risk certain to be fatal
Risk level of exposure known | Risk level of exposure unknown
Personal risk can be controlled | Personal risk cannot be controlled
Risk is old and familiar | Risk is new and unfamiliar
Risk is chronic (one at a time) | Risk is catastrophic (many deaths)
Risk is not mediatic | Risk is mediatic
Risk to beneficiaries | Risk to non-beneficiaries
Risks to unidentified persons | Risks to identified individuals
Risks from open activities | Risks from secret activities
Source: Baruch Fischhoff



Quantitative Factors in Risk Perception

u Hazard warnings and hazard probability statements are not easy for the layman to interpret.
• You are within a 100-year flood plain;
• The probability of a hurricane within 24 hours at your location is 75%;
• The probability of occurrence of a magnitude 7.0 earthquake on the San Andreas Fault in the next 25 years is 30%;
• The probability of a major oil spill in the next year is 2 × 10⁻³;
• The probability of dying in an airplane crash is 10⁻⁹ per mile;
• The probability of dying in an automobile accident is 10⁻⁷ per mile.
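As a worked conversion (an illustration; the yearly mileages are assumptions, not from the notes), per-mile probabilities can be turned into approximate annual risks that are easier to grasp:

```python
# Converting the per-mile death probabilities quoted above into annual risks.
# The yearly mileages are hypothetical assumptions.

p_air_per_mile = 1e-9    # probability of dying per mile flown (slide figure)
p_car_per_mile = 1e-7    # probability of dying per mile driven (slide figure)

miles_flown_per_year  = 10_000   # assumed frequent flyer
miles_driven_per_year = 12_000   # assumed daily commuter

# For small p, 1 - (1 - p)**n is approximately n * p.
annual_air_risk = p_air_per_mile * miles_flown_per_year    # 1e-05 (~1 in 100,000)
annual_car_risk = p_car_per_mile * miles_driven_per_year   # 1.2e-03 (~1 in 833)

print(f"annual flying risk : {annual_air_risk:.1e}")
print(f"annual driving risk: {annual_car_risk:.1e}")
```

Framed this way (“about 1 in 800 per year” rather than “10⁻⁷ per mile”), the same number becomes far easier for a layperson to weigh.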



Quantitative Factors in Risk Perception

u Such quantifications can rarely be applied across the board.


The risk of tornadoes in the tornado-prone Midwestern USA is 1 death per 455,000
persons per year (2.2 deaths per million persons). This is much greater than the risk
across the U.S. as a whole.



Quantitative Factors in Risk Perception

Estimated Loss of Life Expectancy Due to Various Causes



Quantitative Factors in Risk Perception

u The units selected for the presentation of risk also influence perception.
u Typical quantitative expressions of risk concerning health,
safety and mortality:
• Frequency of deaths per year (absolute numbers);
• Fatal accident rate (e.g. rate per 1000 exposed);
• Probability of death per annum (e.g. 1 in 1000);
• Loss of life expectancy (e.g. in years)
u Usually, hard data regarding hazard likelihood and impact level originate from practitioners such as engineers & operators, who may include their own value judgments in their estimates.
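The first three units listed above are mechanically related; a small sketch (illustrative numbers, not from the notes) shows the conversions:

```python
# Converting between absolute deaths per year, probability of death per
# annum, and a rate per 1000 exposed. The figures are hypothetical.

deaths_per_year    = 50          # absolute number of fatalities per year
exposed_population = 2_000_000   # people exposed to the hazard

# Probability of death per annum for one exposed individual
p_annum = deaths_per_year / exposed_population       # 2.5e-05

# The same risk expressed per 1000 exposed persons per year
rate_per_1000 = p_annum * 1000                       # 0.025

print(f"probability per annum: {p_annum:.1e} (1 in {1 / p_annum:,.0f})")
print(f"rate per 1000 exposed: {rate_per_1000:.3f} per year")
```

Loss of life expectancy, by contrast, additionally depends on the age distribution of the victims and cannot be derived from these counts alone.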



Risk Perception
Impacts of Major Social, Political, Economic Issues

u Many of the major global (and local) Social, Political and Economic issues we witness are aggravated by the differing risk perceptions of stakeholders,
rather than by clashes of different technical/economic theories or beliefs, knowledge bases, or experiences.
• Deployment of Nuclear Power Generators;
• Global Warming and Climate Change;
• Extraction of Oil From Shale;
• Taking Precaution Against Earthquakes;



Risk Communication Issues
u Perceived accuracy of a Risk Communication is hampered by:
• Reputation for deceit, misrepresentation, coercion;
• Self serving framing of messages;
• Contradictory messages from other sources;
• Actual or perceived professional incompetence and impropriety.
u There may be many different reactions to these conflicting
reports. One layperson may decide that they cannot rely on the
judgment of any expert. Another individual may decide to
focus on the expert supporting his or her own view of the risk.



Risk Communication Issues
Corona Pandemic Case
Will People Trust a Covid-19 Vaccine:
u In the last two decades, anti-vaccination groups disseminating misinformation about vaccines
have proliferated online, far outstripping the reach of pro-vaccine groups.
u Viruses that had become rare (e.g. measles) have reappeared because of declining vaccination
rates. The anti-vaccination movement is poised to sabotage the Covid-19 vaccine uptake.
u How can politicians convince the public to take a vaccine once it becomes available? The
answer is simple:
Keep silent, and let the scientists and public-health experts share the facts with the people (*).
u Politicization increases public skepticism about vaccination, undermining public-health efforts
to widely distribute the vaccine when it becomes available (**).
u A Trump endorsement dampens the likelihood that individuals will vaccinate (***). A Biden
endorsement fares no better statistically.
u Despite early missteps by WHO and CDC in responding to Covid-19, endorsements by either
would be a more powerful lure for Americans than either a Trump or Biden endorsement.
u Approving a vaccine under an FDA Emergency Use Authorization (EUA) is also more likely
to raise concerns about politicization. People are more likely to infer political motives for the
emergency approval than for the regular FDA approval process (****).



Risk Communication Issues
Corona Pandemic Case
Covid-19 Vaccine Attribute Preferences
“Marginal mean willingness” for each attribute refers to subjects' likelihood of accepting a vaccine with the specific characteristic in question, averaged across all other vaccine features.
Most Effective Attributes:
• 90% efficacy;
• USA origin;
• Endorsement by CDC;
• Major side effects ≤ 0.001%;
• Endorsement by WHO;
• Full FDA approval vs. emergency approval.



Risk Communication Issues
u Other Issues:
• People simplify;
• Once peoples’ minds are made up, it is difficult to change them;
• People remember what they see;
• People cannot detect incomplete information or inconsistencies;
• People are highly influenced by the behavior and actions of others in
their immediate neighborhood;
• The more people are warned about a risk, the higher they perceive it;
• The confusion in nomenclature creates fertile ground for ambiguity &
confusion in risk communication.
u A problematic Risk Communication issue: Oversensitive/over-
regulated Automated Alarm Systems.
• Auto alarms (do they really serve a purpose or are they mostly annoyance?);
• Alarms in hospital operating rooms;
• Role of the automated alarm system in the Deepwater Horizon disaster.



Risk Communication Issues
Intentional Scaremongering
u Sometimes, the “non-accuracy” of Risk Communication is
intentional, even well-intentioned.
u The authorities/press deliberately push up their risk estimates
associated with certain potential major hazards.
• In many cases involving the estimation and communication of risks, the
(generalized) costs associated with,
– over-cautioning the public about some potential hazard and the hazard not
materializing (or at least not at the feared level)
is far less than,
– under-cautioning the public about that hazard and the hazard materializing with dire
consequences.
• In order to coerce the public into taking/abiding by precautions.
• Speculation about a potential disaster is newsworthy.
u This is called Intentional Scaremongering



Risk Communication Issues
Intentional Scaremongering
u Examples of Intentional Scaremongering:
• Meteorological forecasts of Extreme Weather Conditions (such as rain or snow storms) are often deliberately exaggerated;
• Individuals/communities get extremely upset if forecasts do not predict such extreme weather conditions and extreme weather conditions do actually materialize;
• However, individuals/communities do not get upset if forecasts do predict extreme weather conditions and the predicted conditions do not materialize.
• Warnings about the Bird-flu Epidemic (in 2007) and about the Swine-flu Epidemic (in 2009) could also have been deliberately exaggerated.
• Those two crises passed almost without any major negative impacts, and the public was mostly relieved.
• Can you imagine what the public reaction would have been if the public had been under-cautioned and the feared consequences had materialized?



Risk Communication Issues
Intentional Scaremongering
u Examples of Intentional Scaremongering:
• The high uncertainty associated with the potential dire consequences of the Swine-flu Epidemic may have forced the authorities to assume the worst (such as the virus mutating and becoming much more deadly and drug resistant);
• So, they issued risk warnings at the highest level, which may have coerced many people to get inoculations (and take other precautions), which in return reduced infections, mutations and drug resistance.
• The highly publicized Y2K Crisis at the turn of the millennium (in 2000) may have coerced many organizations to thoroughly inspect their IT systems, thereby avoiding the feared dire consequences.
• Newspapers loved speculating about the potentially disastrous effects of various Y2K and Swine-flu Epidemic scenarios.



Risk Communication Issues
Intentional Scaremongering
u Such actions, especially when the dreaded hazard does not
materialize, end up increasing the distrust of the public.
• Hürriyet writer Ahmet Hakan's comment (January 7, 2019):
“Meteorolojiyi uyarıyorum: … Eğer bir kez daha biz İstanbul ahalisini
‘İstanbul'a kar geliyor' diyerek uyarırsanız; ve bu uyarınıza rağmen,
İstanbul'a bir dirhem bile kar yağmazsa …”
(“I am warning the Meteorology Office: … If you once more warn us, the people of Istanbul, that ‘snow is coming to Istanbul,' and despite this warning not even a dram of snow falls on Istanbul …”)
u A Risk Communication/Perception message by Cui Wei, Consul General of the People's Republic of China in Istanbul (during the 2020 Corona Virus epidemic):
• Virüs hiçbir zaman korkunç değil, asıl korkunç olan rivayet ve paniktir.
Daha korkunç olan, bağımsız olarak düşünememek ve insanlar ne
dediyse ona inanmaktır.
• The Corona Virus itself is not horrible; what is horrible is the hearsay
and panic. What is even worse is not being able to think independently
and believing everything others say.



Risk Communication Issues
Availability Cascades
u An “Availability Cascade” is a self-sustaining chain of events, which may
start from media reports of a relatively minor event and lead up to public
panic and large-scale government action.
• A media story about a risk catches the attention of a segment of the public,
which becomes aroused and worried.
• This emotional reaction becomes a story in itself, prompting additional coverage
in the media, which in turn produces greater concern and involvement.
• The cycle is sped along deliberately by “availability entrepreneurs”, individuals
or organizations who work to ensure a continuous flow of worrying news.
• The danger is increasingly exaggerated as the media compete for attention-
grabbing headlines.
• Scientists and others who try to dampen the increasing fear attract little
attention, most of it hostile: anyone who claims that the danger is overstated is
suspected of association with a cover-up.
• The issue becomes politically important because it is on everyone's mind, and
the response of the government is guided by the intensity of public sentiment.



Risk Communication Issues
Availability Cascades
u Terrorists are very good at the art of inducing “Availability Cascades”.
u The number of casualties from terror attacks is very small relative to other
causes of death.
• With a few horrible exceptions, of course.
u Even in countries that have been targets of intensive terror campaigns, the
weekly number of casualties is almost never close to the number of traffic
deaths.
u The difference is in the availability of the two risks, the ease and the
frequency with which they come to mind.
u Gruesome images, endlessly repeated in the media, cause everyone to be on
edge.
• It is difficult to reason oneself into a state of complete calm.



Risk Communication Issues

[Figure: Initial reaction during a crisis, upon receiving information]
• Group A (60%): ignores or misjudges the indications;
• Group B (30%): investigates, looks for signs of something unusual;
• Group C (10%): accepts that there is a dangerous situation.


• As the signs of the crisis become stronger, Group A moves to Group B and Group
C. As soon as Group B has received several signs, it will move over to Group C.



Risk Communication Issues

[Figure: Reaction after acceptance of the situation during accidents]
• 10%: evacuate themselves;
• 5%: attack the threat;
• 10%: warn/instruct others;
• 60%: wait for others' initiatives;
• 12-14%: paralyzed;
• 1-3%: panic.



Some Risk Perception – Communication Cases
From Everyday Life
Misperceptions regarding hazard likelihood and hazard consequence; inadequate safety measures; unsafe practices; unsuccessful risk communication.
All the cases described below appeared in the media as “unfortunate accidents”. Actually, any harm (let alone death) could easily have been avoided with a little serious consideration of the risks involved.
u (At an engagement ceremony in Dudullu) A balcony collapsed when 50 people tried to squeeze themselves into the small space. Unfortunately, many died in the ensuing “accidental” balcony collapse.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication
u (Aksaray – Istanbul) During a fire fighting exercise, the basket attached to the ladder, which was
raised to a height of 10 meters, broke loose and fell with 5 students and one firefighter inside.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication
u (Ali Kırca - at his home in Kuruçeşme) Mr. Kırca walked into the void while attempting to step into the elevator (which was at another floor).
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication



Some Risk Perception – Communication Cases
From Everyday Life

Elevator Repairs at Boğaziçi University – November 2011



Some Risk Perception – Communication Cases
From Everyday Life
u (Karabük Demir Çelik Fabrikaları) A worker crawled through a 600-ton press in order to reach the 2450-degree-Celsius oven on the other side, so that he could light his cigarette in the oven. Unfortunately, he died in the ensuing “accident”.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication

u (Istanbul/Sultanbeyli) A young man sprayed “Shelltox” into his mouth in order to kill the fly that he had accidentally swallowed. Unfortunately, he was “accidentally” and fatally affected by the chemical and died.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication
u (Dilovasi Iskelesi Kocaeli) An unfortunate event on the tanker “Gaziantep”: The third officer
decided to inspect the steam boiler and went inside without notifying anybody; then someone
else saw the open boiler hatch and just closed it. The officer died as a result of a “horrible
accident” when the ship took anchor before anybody noticed his absence.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication
u (Hendek - Adapazarı) Under the influence of alcohol, five young men travelling on the TEM expressway pulled their car into the emergency lane, got out and started dancing on the expressway to the tune of the lively belly-dance music playing on the car's radio. Unfortunately, the oncoming cars did not notice them and three of them were killed in the ensuing “accidents”.
Hazard Likelihood, Hazard Consequence; Inadequate Safety Measures; Unsafe Practices; Unsuccessful Risk Communication



Some Risk Perception – Communication Cases
From Everyday Life

Medical Mishaps in Canada
A sample of adverse and possibly avoidable events reported by Manitoba hospitals from July 2012 to March 2013.



RISK ANALYSIS & MANAGEMENT

PART 2
INTRODUCTION
TO
RISK MANAGEMENT



What is Risk Management

◆ Risk Management is a set of integrated and coordinated proactive attempts in an organization to recognize and manage internal events and external threats that affect occurrence likelihood, impact level and recovery from technological, natural or other hazards. It aims at,
• Identifying as many Risks as possible (what can go wrong);
• Assessing and Analyzing the Identified Risks;
• Generating and Evaluating action Alternatives aimed at minimizing their
occurrence likelihoods and/or impacts;
• Providing contingency funds, communication, preparedness, response and
recovery plans to cover Risk Events that actually materialize.
• Implementing the developed Risk Strategy and related plans.
◆ It is imperative that Top Management understand and comprehend
the meaning of risk management.



What is Risk Management

u Risk Management addresses identification, assessment, analysis of & possible responses to potential risks in the environment, situation or project under consideration, before any hazard actually materializes.
• It is a Proactive rather than Reactive approach;
• Reduces surprises and negative consequences;
• Prepares the decision makers to take advantage of appropriate risks;
• Provides better control over the future.
u Risk Management is NOT Box Ticking.
• It is still common to find risk teams who just go through the motions, “doing risk
management” instead of actually managing risk.
• They follow risk procedures because they are required by the quality system or by a
client contract, but they show no commitment to action and no understanding that
managing risk is supposed to support better decision making in the business.
• Instead, risk management is seen as additional cost, an optional extra, and a
necessary evil to be endured and got through as quickly as possible.



What is Risk Management

Effective Risk Management Requires Investment


System monitoring capabilities must be created and maintained;
Infrastructure must support information management,
modeling, planning, and leadership;
Resources that enable plans to be executed must be provided;
Prevention and consequence management systems must be
considered;
A shared organizational culture of
reliability/prevention/mitigation must be created.



What is Risk Management

Key Features of Risk Sensitive/Aware Organizational Cultures:
All work processes clearly defined, with an eye towards risk minimization.
Taking inspections and maintenance work seriously;
Taking trainings, exercises, dry runs seriously;
Keeping spare and back-up equipment and material in mint condition;
Taking contingency plans seriously;
Keeping warning and alarm mechanisms in mint condition and taking these
systems seriously;
Having a transparent and complete reporting system regarding errors,
accidents and near accidents.



What is Risk Management

Evolution of the Safety Culture in High Risk Organizations





What is Risk Management

The Risk Management Cycle (Swiss Federal Office for Civil Protection)



Managing Risk in Complex Projects

What is Meant by “Complex Projects”?


Complexity is not just a function of scale - a project can be large but simple, or small
and complex.
Complexity arises from the structure of the project and the way its elements relate to one another.
Complexity involves unpredictability, where it is not always clear how a change in one
part might influence other parts.
• It is harder to see how variations in input might affect the overall output for a complex
project, due to the number of interconnections and dependencies within project elements.
The behaviour of complex projects is often ambiguous, which means that complex
projects are always risky.
Complex projects are subject to the same sorts of risk that are found in any project.
However, their unpredictable nature leads to more Unforeseeable risk.
• These risks are hard to identify in advance and difficult to assess accurately, and the
standard risk response strategies are often not effective in treating them.



Managing Risk in Complex Projects

The fact that some risks are unpredictable/unforeseeable means that we cannot fully use the normal proactive risk process to prepare for them, because we cannot see them coming.
• Additionally, more data in such environments may create the impression that we
are dealing with calculable risks when we actually are dealing with unknowns.
Two key strategies that are helpful against unforeseen risks:
• Flexibility – Ability to bend without breaking, to adapt easily; Capacity of
systems to survive, adapt and flourish in the face of turbulent change.
• Resilience – Capacity to maintain core purpose and carry on with integrity.
Nassim Taleb (without saying so) actually defends this concept in his book
Antifragile: Things That Gain from Disorder. He argues that we should,
• focus on non-predictive forms of decision-making;
• be looking for choices that would confer benefits (and/or reduce harm) in the
event of unpredictable extreme changes.



Managing Risk in Complex Projects

Engineering Resilience of Complex Systems


• Resilience, as utilized in engineering systems, can be defined as resisting change
from an original state while a stress is applied and returning back to that original
state after the stress subsides, regaining previous functionality and equilibrium.
This is referred to as “bouncing back”.

• The left image depicts how this definition focuses on resisting change and returning to an original equilibrium state.
• The right image shows system functionality over time, given a disturbance at time t0.
• Trajectory A is that of a more resilient system compared with trajectory B, as less time is required to return to 100% functionality.
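A toy way to score the two trajectories (an illustrative addition; the recovery profiles below are made up) is the functionality lost, integrated over time, with a smaller area meaning a more resilient system:

```python
# Toy comparison of two recovery trajectories after a disturbance at t0.
# Functionality is on a 0-1 scale; both profiles are invented for illustration.

def lost_functionality(trajectory, dt=1.0):
    """Area between full functionality (1.0) and the trajectory.
    Smaller area = faster return to 100% = more resilient."""
    return sum((1.0 - f) * dt for f in trajectory)

traj_A = [1.0, 0.6, 0.80, 0.95, 1.00, 1.00, 1.00]  # recovers quickly
traj_B = [1.0, 0.6, 0.65, 0.75, 0.85, 0.95, 1.00]  # recovers slowly

print("lost functionality, A:", round(lost_functionality(traj_A), 2))  # 0.65
print("lost functionality, B:", round(lost_functionality(traj_B), 2))  # 1.20
```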



Managing Risk in Complex Projects

• “Bouncing back” aims at only maintaining the status-quo and lacks the component of
“adaptation”, without which, the system does not learn to better manage such shocks
and remains potentially vulnerable to disaster should another disturbance occur.
• Additionally, the disturbance may have changed the system so that the status it falls
back to is either no longer possible or has other implications.
• In the left image, the system behaves like the engineering resilience definition, until a certain disturbance threshold is reached.
• Then, through adaptation, the system is pushed into a new equilibrium state residing on Line A.
• Line A demonstrates a more resilient system than Line B, which is more easily forced past the threshold and into an undesirable regime.



Managing Risk in Complex Projects

Both strategies can be applied at multiple levels to address complexity issues.
• The whole organization, at project/programme levels, in contractual and technical
areas, in the personal attitudes of key staff. Specifically:
Improved Ability to Detect Disruptions
• Better defining and being on the lookout for outliers (events warranting special
scrutiny);
• Being on the lookout for unfavorable trends in processes and searching for
possible causes;
• “Internalizing” possible disruption data. That is, absorbing the data and
communicating them internally, so that relevant parties know of the situation with
enough clarity to be able to contemplate possible actions;
• Questioning long-held assumptions about what is possible and moving
information outside the normal channels.



Managing Risk in Complex Projects

◆ Improved Ability to Respond to Disruptions


• A key to fast response is the process of escalating knowledge, including what to inform superiors about and when to do so. In other words, policies and processes regarding, i) doing nothing and waiting for additional data, ii) investigating the situation, iii) notifying superiors (“escalating” the notification), and iv) taking immediate action, should be clearly defined.
◆Improved Flexibility through Mass Customization
• This strategy allows companies to satisfy many customer segments, while taking
advantage of economies of scale in manufacturing/outsourcing in reducing costs.
• It is based on redesigning products and processes so that a core component, common to a group of product varieties, is manufactured first; the focus for the base component is low cost.
• Thus, it can be manufactured offshore and/or in long production runs to spread the
fixed costs of production over a large number of items.
• The finished products are then manufactured/customized from the base product,
according to customer orders.



Managing Risk in Complex Projects

◆ Resilience through Redundancy


• Usually, a good move to improve resilience is increased redundancy: that is,
having extra inventory, surplus capacity, alternative supply sources etc. (which
can give an organization time to organize its response and recovery).
◆Improved Flexibility through Interchangeability
• Flexibility requires having viable alternatives in any situation. Standardization of
parts, processes, and production systems, so that these elements are
interchangeable, creates options for using them where there is a shortfall.
Just “Coping” with unforeseen disruptions is sometimes named “Real-time
Resilience”, while “Adapting” to a changing environment is named
“Sustainable Resilience”.
At each level, specific actions can be taken to develop appropriate flexibility
and resilience to deal with risks that may arise.



Managing Risk in Complex Projects
The Choluteca Bridge Syndrome

◆ In the 1990s, in Honduras, a new road was planned for the city of Choluteca, so a new bridge was needed. The new 484-meter Choluteca Bridge was built by a Japanese company between 1996 and 1998.
◆ Since the region is hurricane prone, the authorities insisted on very high technical specifications.
So, the contractor built a strong, solid bridge, designed to withstand extreme weather conditions.
◆ Soon after the bridge's opening, Honduras was hit by a devastating hurricane (Mitch, 1998). The Choluteca River flooded the entire region. Many bridges were destroyed, but the Choluteca Bridge survived with minor damage.
◆ While the bridge itself was in good condition, the roads at either end of it were totally swept away. Moreover, the Choluteca River had carved itself a new channel during the flood, and it now flowed beside the bridge, not beneath it.
◆ So, while the bridge was strong enough to survive the
hurricane, it became a bridge over nothing, spanning
just dry ground. It became known as “The Bridge to
Nowhere”.



Managing Risk in Complex Projects
The Choluteca Bridge Syndrome

The lessons from the Choluteca Bridge are very much relevant to Risk Management.
i. A serial system is only as strong as its weakest component (the swept-away roads);
ii. The world may change in ways we never imagined; this bridge is an excellent metaphor for what can happen to us (our careers, our businesses, our lives) as the world around us gets transformed – adapt to change, or else …;
iii. Be careful when in your career you aim to become an expert in some specific, narrow area; that expertise might soon become redundant;
iv. We get focused on creating the best solution for a given problem, while ignoring the possibility that the problem itself might change;
v. We focus on building the strongest, most sophisticated product or service, without thinking of the possibility that the market could change and the need could vanish. Hondurans focused on the bridge and ignored the possibility that the river below could change course.
vi. “Built to Last” might have been a popular mantra, but “Built to Adapt” could be the way to go.



The Risk Management Process

Major Components of the Risk Management Process:


• Identifying Sources of Risk;
• Analyzing and Assessing Risk;
• Risk Response Development;
– Contingency Planning & Contingency Reserves;
• Risk Response Control & Implementation.



The Risk Management Process
Step 1: Risk Identification
• Analyze the environment to identify sources of risk (known risks as well as new ones).
Step 2: Risk Assessment
• Assess risks in terms of: severity of impact; likelihood of hazard occurrence; controllability.
Step 3: Risk Response Development
• Develop a strategy to reduce occurrence likelihood;
• Develop a strategy to reduce impact;
• Develop contingency plans.
(Output: the risk management plan.)
Step 4: Risk Response Control & Implementation
• Implement the risk strategy;
• Monitor & adjust plans for new risks;
• Trainings and exercises.
New risks surfacing at any step feed back into the assessment cycle, as sketched below.
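The four steps can be tied together in a minimal risk-register sketch (field names, scoring scales and entries below are illustrative assumptions, not part of the notes):

```python
# Minimal risk-register sketch following the four steps above.
# Scales, thresholds and entries are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: int    # 1 (rare) .. 5 (almost certain)
    impact: int        # 1 (minor) .. 5 (catastrophic)
    response: str = ""

    @property
    def score(self) -> int:    # Step 2: assess (likelihood x impact)
        return self.likelihood * self.impact

# Step 1: identify risks
register = [
    Risk("key supplier bankruptcy", likelihood=2, impact=4),
    Risk("design change by client", likelihood=4, impact=3),
    Risk("site flooding",           likelihood=1, impact=5),
]

# Step 3: develop responses, highest-scoring risks first
for r in sorted(register, key=lambda r: -r.score):
    r.response = "mitigate / contingency plan" if r.score >= 10 else "monitor"
    print(f"{r.name:<26} score={r.score:<3} -> {r.response}")

# Step 4: implement the strategy, then re-run the loop as monitoring
# surfaces new risks.
```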



Identifying Sources of Risk

A risk, once identified, is no longer a risk, it is a management problem.


Still, it is easier to figure out whether something is fragile than to predict the occurrence of an event that may harm it, or the level of damage.
Risk Identification begins with a list of all areas that may contain
hazards or cause failures & their respective outcomes.
• Brainstorming and Risk Profiling is the initial step;
• Things that have not been done before are potential trouble spots;
• Risk sources depending on specific type of issue at hand:
– Construction; Design; Software; Transportation.
• Risk sources originating from the organization:
– Organizational structure and culture, organizational size, resources, financial structure.
• Risk Sources external to the organization:
– Inflation; market acceptance; exchange rates; government regulations.
• It is better to start with risks associated with the whole issue at hand rather than
one specific section;
– Once macro risks are identified, specific areas can be checked.



Identifying Sources of Risk

Risk Profiling is a good tool to help identify risks.


• It is a list of questions addressing traditional areas of uncertainty;
• Good risk profiles are tailored to the type of work/environment under consideration;
• Questions are usually developed from previous, similar experiences;
• They should recognize/focus on strengths & weaknesses of the organization;
• They address both technical and management risk;
• Some management consulting firms provide/sell risk profiles.
Historical Records are also useful in risk identification.
• Transparency and full/correct reporting is a key issue.
Simulating all business processes, procedures is a good practice.
Risk Identification process should not be limited to a few persons.
• Input from employees, customers, sponsors, subcontractors and vendors should be solicited through interviews.
Risk Identification process should be repeated at regular intervals.
• The risk landscape constantly evolves and, with it, the risks.



Identifying Sources of Risk
Partial Risk Profile for Product Development



Identifying Sources of Risk
Roots of Risk Propensity in Large & Complex Systems

Activities that are performed in the system may be inherently risky (e.g.
mining, hazardous material transportation, air transportation);
The technology used may have inherent risks, or exacerbate risk in the
system (e.g. heavy equipment);
Physical environment may be inherently risky,
• Susceptibility to natural disasters;
• Proximity to and nature of populated areas and other businesses;
• Supporting infrastructure (power, telecommunications, water, transportation)
The product or service provided may have inherent risks,
• Potentially dangerous materials, and products or services;
• Demographics of customers;
• Liability for defective services and products;
• Quality issues.
Human and organizational errors can be propagated by organizations and
individuals executing/coordinating tasks, or using/coordinating technology.



Identifying Sources of Risk
Roots of Risk Propensity in Large & Complex Systems

Organizational resources and policies may be inadequate or out of phase.


• Financial, manpower, equipment, management and technology resources;
• Investments, marketing, strategies, suppliers, customers/partners/markets.
Communications channels may be inadequate or prone to failure.
• Availability level of formal and informal channels of communication;
• Procedures not clearly established nor well understood;
• Ability to receive warning signals;
• Receptivity and ability to properly interpret warning signals.
Organizational structures may enable risky practices to occur, or may
encourage workers to pursue risky courses of action. For example:
• Lack of formal safety reporting systems or departments in organizations;
• Organizational standards that are impossible to meet without taking risks.
Organizational cultures may support/encourage risk taking, or fail to
encourage risk aversion. For example:
• Cultures that encourage the belief “it can’t happen here”,
• Rewarding people for taking warranted/unwarranted risks.



Identifying Sources of Risk

Technical: New technology or materials; test failures & quality issues.
Environmental: Unforeseen weather/natural conditions.
Operational: New systems & procedures; training needs; quality issues.
Cultural: Established customs and beliefs.
Financial: Freeze on capital, bankruptcy of stakeholders; currency & interest rate fluctuation.
Legal: Local laws; lack of clarity of contract.
Commercial: Change in market conditions or customers.
Resource: Shortage of staff, operatives, materials, equipment.
Economic: Slow-down in economy, change in prices and demand.
Political: Change of government or government policy.



Identifying Sources of Risk
Some New Risk Sources

u Globalization of infrastructures increases exposure to potential harm.


u Easy accessibility of many infrastructure systems and organizations via the
internet increases their vulnerability.
u Easy accessibility of many appliances and equipment in our everyday
environment via the internet increases their vulnerability.
u Interdependencies of systems make accident consequences harder to
predict and perhaps more severe.
u Everyday tools and products that appear harmless and seem to require
little technical skill to operate, but may actually be quite hazardous,
are widely available.

Ilhan Or - Boğaziçi University 81


Identifying Sources of Risk
Global Risks

Source: World Economic Forum - Global Risk Network Report, 2012

Ilhan Or - Boğaziçi University 82


Identifying Sources of Risk
Global Risks
Catastrophic Threats to Humanity
◆Autonomous Weapons – Near Term
◆Cyber Attacks and Information Infrastructure Breakdown – Near Term
◆Data Fraud or Theft – Near Term
◆Extreme Weather – Long Term
◆Catastrophic Climate Change – Long Term
◆Biological and Chemical Warfare – Long Term
◆Artificial Intelligence – Long Term
◆Food or Water Crises – Long Term
◆Ecological Collapse – Long Term
◆Pandemics and Antimicrobial Resistance – Long Term
◆Nuclear warfare – Long Term
◆Asteroid Collision – Very Long Term
◆Super Volcanic Eruption – Very Long Term
◆The Sun Consumes Earth – Very, Very Long Term

Sources: World Economic Forum, the Intergovernmental Panel on Climate Change, the
Chicago Actuarial Association, the Global Challenges Foundation, NASA

Ilhan Or - Boğaziçi University 83


Analyzing and Assessing Risk

u At an individual level, informal risk assessment is a more-or-less
continuous cognitive process.
• When crossing the road;
• Purchasing goods;
• Deciding on mode of transport;
• Engaging in social interaction;
• Gauging job prospects.
u The aim of (formal) risk assessment is to provide information on
which decisions may be made about proposed actions, the
adequacy of risk controls and what improvements might be
required.
u Not all risks deserve attention.

Ilhan Or - Boğaziçi University 84


Quantitative Factors in Risk Perception

◆ Such quantifications can rarely be applied across the board.


The risk of tornadoes in the tornado-prone Midwestern USA is 1 death per 455,000
persons per year (2.2 deaths per million persons). This is much greater than the risk
across the U.S. as a whole.

İlhan Or - Boğaziçi University 85


Analyzing and Assessing Risk
Disasters Caused by Natural/Industrial Hazards in Europe in 1998–2009

İlhan Or - Boğaziçi University 86


Analyzing and Assessing Risk
Disasters Due to Natural Hazards in EEA Countries, 1980–2009

İlhan Or - Boğaziçi University 87


Analyzing and Assessing Risk
Global Risks Landscape, 2019

İlhan Or - Boğaziçi University 88


Analyzing and Assessing Risk
Global Risks Interaction Map, 2019

İlhan Or - Boğaziçi University 89


Analyzing and Assessing Risk

Risk Assessment focuses on selected potential foreseen risk
events exhibiting a high probability of occurrence and/or a
high consequence of loss.
• Even sophisticated risk analyses are disciplined guesswork.
Typically, such risk assessments focus on “scenarios” deemed to
be credible and significant by “experts”.
• Risk assessment for an off-shore oil and gas installation may focus on
various scenarios involving bad weather conditions, ship collision, riser
blow-out and helicopter crash.
• Risk assessment for an investment bank may focus on rogue trading,
major bad debts and multiple loss of key executives in an air crash.
• One drawback: experts following a rationalistic approach may not give
sufficient emphasis to all pertinent routes to disaster.
– Three Mile Island, World Trade Center cases.

Ilhan Or - Boğaziçi University 90


Analyzing and Assessing Risk

Two broad approaches to Risk Assessment


• Heuristic/Rule of Thumb approach, being in general, qualitative &
subjective in nature, relying on individuals’ collective judgment.
– Expert opinion or “gut feeling” estimates are the most widely used, but they carry
serious errors depending on the skill of the persons making the judgment.
– Wishful thinking leading to forecasting errors (if forecasts involve potential
personal gain) may be a problem.
• Scientific approach employing quantitative modeling and generally
requiring formal training in mathematics.
– Quantitative methods usually require serious data collection and a more
detailed analysis of the facts, while being limited in scope;
– So, they have low acceptance levels among practicing managers.

Ilhan Or - Boğaziçi University 91


Analyzing and Assessing Risk
There is a tendency to regard Heuristic Approaches as being
inherently inferior.
• This unwarranted view is based on a failure to recognise that risk assessment,
however sophisticated the math involved, is inherently value-laden.
• In real life, problems related to the estimation of probabilities rarely proceed
like computing odds on dice. Probability and/or impact level
estimations almost never present themselves as well defined mathematical
problems.
• The detailed but narrow base of technical knowledge on which many QRAs
are made may create a false, reduced picture of real world settings, in which
risk behaviour is actually much more complex.
• The most risky aspects of an organization may lie not in physical hazards but
in self reinforcing behaviour associated with organizational structure, habits
and culture.

Ilhan Or - Boğaziçi University 92


Analyzing and Assessing Risk
Main Risk Assessment Techniques
• Semi-Quantitative Approaches:
– Risk Assessment Charts;
– P-I Tables;
– Risk Assessment Forms & Matrices;
– Failure Modes and Effect Analysis (FMEA);
– Risk Scoring.
• Quantitative Modeling and Analysis:
– Statistical Analysis;
– Dynamic (Simulation) Analysis;
– Decision Trees, Event Trees, Fault Trees.
Choice depends on risk source, possible outcomes & impacts,
and management’s attitude towards risk assessment.

Ilhan Or - Boğaziçi University 93


Analyzing and Assessing Risk
Risk Assessment Charts
A Simple Risk Assessment Chart

[Chart: the assessor is asked to list the three major risks for this environment;
to rate the probability of each of these risks occurring (on a 0 to 1.0 scale,
None to Very High); to rate the impact if each of these risks does occur (on a
0 to 1.0 scale, None to Very High); and to note the resources available.]
Ilhan Or - Boğaziçi University 94


Analyzing and Assessing Risk
P-I Tables

A qualitative assessment of the probability (P) of a hazard & the
impacts (I) it would produce is made by assigning descriptions
to the occurrence probability & impact levels of each risk.
• The assessor is asked to describe the probability and impact of each risk
by selecting from a predetermined set of phrases
– (such as nil, very low, low, medium, high, very high);
• A range of values is assigned to each phrase in order to maintain
consistency between the estimates of each risk.
– These values generally are not evenly spaced; adjacent categories typically
differ by a multiplicative factor (3-10);
– Usually the same multiple is applied to the probability and impact, so that
severity scores will be more meaningful;
– The value ranges can be selected to match the specific risk environment.

Ilhan Or - Boğaziçi University 95


Analyzing and Assessing Risk
P-I Tables
A table exemplifying value ranges that could be associated with qualitative
descriptions of the probabilities and impacts in a particular risk environment
• Associated with the successful completion of a new product development project.

Category    Probability (%)   Cost       Quality
Very high   10-50             > 1000     Failure to meet acceptance criteria
High        5-10              300-1000   Failure to meet > 1 important spec.
Medium      2-5               100-300    Failure to meet an important spec.
Low         1-2               20-100     Failure to meet > 1 minor spec.
Very low    < 1               < 20       Failure to meet a minor spec.

When the definition of each phrase is made specific to a particular risk
environment, it becomes difficult to perform a combined analysis of the risks
from multiple risk environments that the organization might face.

Ilhan Or - Boğaziçi University 96


Analyzing and Assessing Risk
P-I Tables

Value ranges can also be selected to reflect the potential effects
of various hazards on the whole organization.
A table exemplifying impact descriptions more fitting to the
potential effects of various hazards on the organization as a
whole.

Category        Impact Description
Catastrophic    Jeopardizes the existence of the organization.
Major           No longer possible to achieve strategic objectives.
Moderate        Reduces the ability to reach strategic objectives.
Minor           Some short term/tactical disruptions but little effect on strategic objectives.
Insignificant   No impact on tactical operations nor on strategic objectives.

Ilhan Or - Boğaziçi University 97


Analyzing and Assessing Risk
P-I Tables

◆ An example description of “Likelihoods” in a P-I Table.

Ilhan Or - Boğaziçi University 98


Analyzing and Assessing Risk
P-I Tables

◆ An example description of “Consequence Levels” in a P-I Table.

Ilhan Or - Boğaziçi University 99


Analyzing and Assessing Risk
P-I Tables
◆ Various types of impacts of each single risk can be individually
defined and quantified.

Ilhan Or - Boğaziçi University 100


Analyzing and Assessing Risk
P-I Tables

u A P-I Table offers a quick way to visualize the relative importance of
all identified risks that pertain to a risk environment.
• In the following table, all risks are plotted, allowing easy identification of the most
threatening risks, as well as providing a general picture of the overall risks.
– Risk numbers 10, 2, 12, 8 are the most threatening in this example.

[Table “Combined Impact for Identified Risks”: the identified risks (numbered 1-12)
are plotted on a grid of Probability (V. Low to V. High) against Impact (V. Low to
V. High); risks 10 and 2 sit in the V. High impact row and risks 12 and 8 in the
High impact row, at the higher probability levels.]

Ilhan Or - Boğaziçi University 101


Analyzing and Assessing Risk
P-I Tables
P-I scores can be used to rank identified risks, by assigning a scaling factor
(such as 1 - 5) to the phrases used to describe each type of probability & impact.
• In this type of scoring, the base measure is “probability x impact”;
• However, since the categorization resembles a log scale, for consistency, severity can be
defined as “S = P+I” (which leaves severity on a log scale also).

Impact \ Probability   1. V. Low   2. Low   3. Medium   4. High   5. V. High
5. V. High                 6          7         8           9         10
4. High                    5          6         7           8          9
3. Medium                  4          5         6           7          8
2. Low                     3          4         5           6          7
1. V. Low                  2          3         4           5          6
(In the original chart, shaded bands mark the high, medium and low severity regions.)

• If a risk has “k” possible types of impact, they can all be combined into a single score as
S = \log_{10}\left[\sum_{i=1}^{k} 10^{P_i + I_i}\right]
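A minimal sketch of this scoring in Python (the risk below, with a cost impact and a
quality impact, is illustrative):

    import math

    def severity(p: int, i: int) -> int:
        # Severity on the log scale: S = P + I, with P and I on 1-5 scales.
        return p + i

    def combined_severity(pi_pairs):
        # Combine k impact types of one risk: S = log10(sum of 10^(P_i + I_i)).
        return math.log10(sum(10 ** (p + i) for p, i in pi_pairs))

    print(severity(3, 4))                                 # 7
    print(round(combined_severity([(3, 4), (2, 5)]), 2))  # 7.3

Note that combining two impacts of equal severity raises the score by about 0.3
(= log10 of 2), consistent with the log scale.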

Ilhan Or - Boğaziçi University 102


Analyzing and Assessing Risk
P-I Tables

P-I scores may also be deployed to detect trends.
• For example, the distribution of severity scores over an organization gives an
indication of the overall amount of risk exposure.
P-I scores may also be deployed to compare risk reduction alternatives
through efficiency measures such as
E = (S_old - S_new) / Inv
• In this equation “Inv” refers to the amount of investment (of people, time, money
etc.) necessary to pull the “S” score from the S_old level down to the S_new level.
– e.g. an option pulling a risk from S_old = 8 to S_new = 6 at a cost of 2 person-weeks
has an efficiency of (8 - 6)/2 = 1 severity point per person-week.
Periodically, “Inherent Risks” (i.e. risk estimates before the
execution of any risk mitigation efforts) may be plotted together with
“Residual Risks” (i.e. risk estimates after the execution of risk
mitigation efforts) in order to monitor the risk management efforts.

Ilhan Or - Boğaziçi University 103


Analyzing and Assessing Risk
Monitoring Risk Progress Through P-I Scores

[P-I plot: risks A-H plotted on a Probability (1-5) x Impact (1-4) grid, with arrows
showing each risk's movement, e.g. from its inherent (pre-mitigation) position to its
residual (post-mitigation) position.]
Ilhan Or - Boğaziçi University 104
Analyzing and Assessing Risk
P-I Table Implementation: Torino Asteroid Impact Scale
◆ It is a risk-assessment scale assigning values to celestial objects
moving near Earth (http://neo.jpl.nasa.gov/risk/).
• It takes into account the object's size and speed, as well as the probability
that it will collide with Earth.
• The scale runs from zero to 10. An object with value 0-1 has virtually no
chance of causing damage on Earth; 10 means certain global catastrophe.
• Close encounters, assigned values 2-7, could be categorized as ranging
from "events meriting concern" to "threatening events”.
• “Certain collisions” merit values 8-10, depending on whether the impact
energy is large enough to cause local, regional or global devastation.
◆ It is difficult to figure out what level of anxiety we should have
about an approaching asteroid.
• Torino scale puts in perspective whether a Near-Earth Object merits
public concern, just as the Richter Scale does with earthquakes.

Ilhan Or - Boğaziçi University 105


Analyzing and Assessing Risk
Torino Scale: P-I Scores Related to Asteroid Impact Risks

Ilhan Or - Boğaziçi University 106


Analyzing and Assessing Risk
Torino Scale: P-I Scores Related to Asteroid Impact Risks

Ilhan Or - Boğaziçi University 107


Analyzing and Assessing Risk
The Torino P-I Plot

Ilhan Or - Boğaziçi University 108


Analyzing and Assessing Risk
The Torino P-I Plot
◆ Surprising/Unintuitive Risk Assessment results:
• Class 7 includes celestial objects having the potential to cause catastrophes
of a global nature (maybe involving hundreds of thousands of people killed)
and having a significant occurrence likelihood (between 1 - 50%).
• A risk environment involving nuclear or poisonous chemical plants and
featuring similar impact/likelihood estimates would simply have been
declared “intolerable” and closed down.
◆ Why such a “relaxed” attitude towards asteroids?
• Unavoidability (at least with today's technology);
• Randomness (Unpredictability) of the Target Area;
• Time Frame (decades, maybe centuries away).

Ilhan Or - Boğaziçi University 109


Analyzing and Assessing Risk
Risk Assessment Forms and Matrices
u Rough estimates are inputted on Risk Assessment Forms/Matrices.

A Risk Assessment Form:
Risk Event             Likelihood   Impact   Detect. Diff.   When
Interface Problems         4           4          4          Conversion
System Freezing            2           5          5          Start-up
User Backlash              4           3          3          Post Installation
Hardware Malfunction       1           5          5          Installation

• Some types of Risk Assessment (Severity) Matrices used to provide a basis for
prioritizing which risks to address resemble P-I tables.
[Matrix: the four risks above (I: Interface Problems, S: System Freezing, B: User
Backlash, H: Hardware Malfunction) plotted on a 5 x 5 grid of probability against
impact.]

Ilhan Or - Boğaziçi University 110


Analyzing and Assessing Risk
Failure Mode and Effects Analysis (FMEA)

Each risk is assessed in terms of the score,
Risk Value = Impact x Probability x Detection
• Each of the three dimensions is rated along a 5 point scale.
• Detection Score (ability to discern that the risk event is imminent):
1 (spotting very easy) for anybody being able to spot the risk coming;
5 (spotting very hard) for discovering only after the fact.
• Similar anchored scales would be applied to impact severity and
occurrence probability.
• The weighting of the risks would be based on their overall score.
• This yields a broad range of numerical scores: 1 - 125.
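A minimal sketch of this scoring in Python, reusing the illustrative risks from the
earlier Risk Assessment Form:

    # FMEA-style scoring: Risk Value = Impact x Probability x Detection,
    # each dimension rated on a 1-5 anchored scale (scores range 1-125).
    risks = {
        "Interface Problems":   (4, 4, 4),  # (impact, probability, detection)
        "System Freezing":      (5, 2, 5),
        "User Backlash":        (3, 4, 3),
        "Hardware Malfunction": (5, 1, 5),
    }

    risk_values = {name: i * p * d for name, (i, p, d) in risks.items()}
    # Weight (rank) the risks by their overall score, highest first.
    for name, rv in sorted(risk_values.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rv}")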

Ilhan Or - Boğaziçi University 111


Analyzing and Assessing Risk
Risk Scoring

◆ Each risk is assessed in terms of the score,
Risk Value = Impact x Probability x Exposure
 Impact: Potential consequences of the hazard as presently controlled;
 Probability: Probability that the hazard will result in an accident;
 Exposure: Typical frequency and duration of people's exposure to the hazard.
 Each of the three dimensions is rated along a 5 point scale.

Typical Ratings for Impact, Probability and Exposure

Level   Impact                  Probability              Exposure
1       Minor Injury            Very unlikely            Rare/never
2       Serious Injury          Unlikely                 Infrequent (1-3 month)
3       Major Injury            Likely                   Frequent (weekly)
4       Multiple casualties     Very likely              High (daily)
5       At least one fatality   Inevitable (imminent)    Constant
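As an illustrative worked example using the ratings above: a hazard whose present
controls would still allow a major injury (Impact = 3), which is likely to result in
an accident (Probability = 3), and to which workers are exposed daily (Exposure = 4),
scores Risk Value = 3 x 3 x 4 = 36, out of a possible 5 x 5 x 5 = 125.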

Ilhan Or - Boğaziçi University 112


Analyzing and Assessing Risk

u Evaluate risks by comparing the assessed values


against established standards (if any) in the areas,
• Laws;
• Regulations;
• Organization policy;
• Industry best practice;
• Stakeholder concerns (limitations).

Ilhan Or - Boğaziçi University 113


Analyzing and Assessing Risk

u 10 Commandments of Risk Assessment & Analysis


• Do your homework with literature, experts and users.
• Let the problem drive the analysis.
• Make the analysis as simple as possible, but not simpler.
• Identify all significant assumptions.
• Be explicit about decision criteria and policy strategies.
• Be explicit about uncertainties.
• Perform systematic sensitivity analysis.
• Iteratively refine the problem statement and the analysis.
• Document clearly and completely.
• Expose to peer review.

Ilhan Or - Boğaziçi University 114


Risk Response Development

u The more effort given to risk response planning before an
incident or crisis occurs, the fewer the surprises, and the less the
stress and confusion when the risk event occurs.
u Reducing Risk: Reducing the likelihood and/or impact of a
hazard.
• Mitigating Risk: Modifying the levels and/or likelihoods of the impacts.
• Preventing Risk: Changing the probability of occurrence of the hazard.
• Avoiding Risk: Effecting an environment/action change to eliminate the risk.
– Use a tried and tested technology instead of a new one;
– Change the country location of a factory to avoid political instability;
– Scrap the project under consideration.
• Modifying Objectives: Reduce or raise performance targets, change
tradeoffs.

Ilhan Or - Boğaziçi University 115


Risk Response Development

u Transferring Risk: Passing the risk to another party, without changing
it; usually results in paying a premium for this exemption.
• Fixed price contracts;
• Insurance;
• Penalty clauses for exceeding agreed schedules;
• Performance guarantee of product or service.
u Sharing Risk: Allocates proportions of the risk to different parties.
• Leads to innovative continuous improvement procedures.
u A primary driving principle regarding either risk transfer or risk sharing:
• The more the ownership of risks is allocated to those who control them, the
better, up to the point where the owner could not reasonably bear the impact
while others can.
– How big is the risk?
– What are the risk drivers?
– Who is in control of the risk drivers? Who has experience to control them?
– Who can absorb the risk impacts?

Ilhan Or - Boğaziçi University 116


Risk Response Development

u Retaining Risk: It may not be feasible to transfer or reduce a risk; so, a
conscious decision is made to retain the risk of a hazard.
• Contingency Planning: Set aside resources to provide a reactive capability.
• Acceptance: Accept the risk exposure, but do nothing about it.
• Monitoring: Collect more data about the probabilities of occurrence and
anticipated impacts, in order to better understand the risk.
• Controlling: Applies to high probability, low impact risks normally
associated with repetitive actions, aiming at better management through
better internal processes.
• Remaining unaware: Ignore the possibility of risk exposure and take no
action.
• Increasing: Judging the present course of action as overly cautious and
taking actions to increase the probability of hazard occurrence or impact.

Ilhan Or - Boğaziçi University 117


Risk Response Development

[Diagram: risk responses plotted by Likelihood of Realization (low to high) against
Level of Impact (low to high). Low likelihood/low impact: Ignore (Accept Risk);
intermediate: Manage Scenario (Reduce or Transfer risk); high likelihood/high impact:
Introduce Measures To Avoid Scenario - in short, “Avoid the Unmanageable and Manage
the Unavoidable”.]

Ilhan Or - Boğaziçi University 118


Risk Response Development

Ilhan Or - Boğaziçi University 119


Risk Response Development
Contingency Planning

u Contingency Planning: Planning for an organization's reaction


to potential hazards to ensure the protection of life, safety, health
and the environment, to limit and contain damage to facilities
and equipment, to stabilize operational service and public image
impacts and to manage communications about the event.
u Contingency Plan is an alternative plan to be used if a possible
foreseen hazard becomes a reality. It identifies preventive
actions that will mitigate the negative impacts of the hazard. It
includes:
• Emergency response plan;
• Incident management plan;
• Crisis communications plan;
• Crisis management team plan.
Ilhan Or - Boğaziçi University 120
Risk Response Development
Contingency Planning
u Contingency Plans should include cost estimates, and identify and
establish availability of the necessary funding, equipment & materials.
u Contingency Plans should be transparent and conditions for activating
them should be determined and clearly documented.
• The plan and related documentation should be communicated to team members to
minimize surprise and resistance.
• All parties affected should agree to the plan and have the authority to make
commitments.
• The warning regarding some responses leading to complications featuring
multiplicative chains of unanticipated effects in complex systems also applies to
contingency planning.
u Risk events arising from sources external to the project are more
difficult to foresee & tend to cause more disruption.
• Contingency plans responding to external events may involve new team players,
unfamiliar to the project and having conflicting goals.

Ilhan Or - Boğaziçi University 121


Risk Response Development
Contingency Planning For The Notre Dame Cathedral

u The Contingency Plan implemented by the Paris Fire Department during the
2019 fire at the Notre Dame Cathedral was prepared 160 years ago (following
the original building’s demise and rebuilding after the French Revolution).
u Its key clauses regarding priorities in any fire intervention operation are:
1. First save human beings trapped.
2. Next save the artworks in the Cathedral.
3. Next save the Altar (the big Cross).
4. Next save the furniture.
5. Next try to save the building.
u Notice the highest priority is given to irreplaceable human life and artwork;
lower priority is given to the semi-replaceable Altar and furniture; while the lowest
priority is given to the building, which was considered replaceable.
u Additionally, an oak forest was planted at the Versailles Palace Gardens to be
cut and used for the church roof if it ever became necessary.
• Accordingly, the oak trees at the Versailles Gardens provided a crucial resource during the
2019 rebuilding campaign.
Ilhan Or - Boğaziçi University 122
Risk Assessment and Response Matrix
Risk Event: Late Delivery - Chance: Low; Severity: Medium; Detection Difficulty: Low;
When: Pre-Installation; Response (Transfer/Accept): better contracts with penalty
clauses, better customs agencies; Contingency Plan: having the old machine on standby
for backup; Trigger Event: delay exceeding 5 days.

Risk Event: Operators' Adaptation Problems - Chance: Low; Severity: Low; Detection
Difficulty: Medium; When: Post-Installation; Response (Reduce): on site training before
delivery, better training procedures; Contingency Plan: having experts flown in to
support the local team; Trigger Event: production rate 10% below planned after 5 days.

Risk Event: Machine not Conforming to Specifications - Chance: Low; Severity: High;
Detection Difficulty: Medium; When: Post-Installation; Response (Reduce): wide
communication with order, pre-delivery on site inspection; Contingency Plan: having the
old machine on standby for backup; Trigger Event: acceptance tests negative.

Risk Event: Financial Problems - Chance: Low; Severity: High; Detection Difficulty:
Low; When: Order Placement; Response (Reduce): influence top management priorities,
obtain self financing from manufacturer; Contingency Plan: having a leasing plan ready;
Trigger Event: order delayed by 5 days because of financing.

Risk Event: Unresolved Installation Problems - Chance: Medium; Severity: Medium;
Detection Difficulty: Medium; When: Pre-Installation; Response (Reduce): wide
communication with order; Contingency Plan: have a backup location, have the old
machine on standby; Trigger Event: acceptance tests negative.
Ilhan Or - Boğaziçi University 123


Risk Response Development
Documentation - Risk Registers
u A Risk Register is a document or database that lists each risk pertaining to
a project or organization, along with the following information that is useful
for the management of those risks:
• Date the register was last modified;
• Name and description of the risk;
• Description of why it would occur;
• Description of factors that would increase or decrease the probability of occurrence or size
of impact;
• Semi-quantitative estimates of its probability & potential impact (e.g. P-I scores);
• Name of owner of the risk (the person who will be responsible for monitoring the risk and
effecting any risk reduction strategies that have been agreed on);
• Details of risk reduction strategies that it is agreed to be taken;
• Reduced impact and/or probability of the risk, given the above agreed risk reduction
strategies have been taken;
• Action window: period during which risk reduction strategies must be put in place;
• Contingency plans (short description, person responsible, triggers, reference to details);
• Description of secondary risks that may arise as a result of adopting risk reduction strategies.
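A minimal sketch of one register entry as a data structure (field names follow the
list above; the class layout and example values are illustrative):

    from dataclasses import dataclass, field
    from datetime import date
    from typing import List, Optional

    @dataclass
    class RiskRegisterEntry:
        last_modified: date                   # date the register was last modified
        name: str
        description: str
        why_it_would_occur: str
        influencing_factors: List[str]        # raise/lower probability or impact
        probability_score: int                # e.g. P on a 1-5 scale
        impact_score: int                     # e.g. I on a 1-5 scale
        owner: str                            # person monitoring the risk
        reduction_strategies: List[str]
        residual_probability: Optional[int] = None  # after agreed strategies
        residual_impact: Optional[int] = None
        action_window: str = ""               # period for putting strategies in place
        contingency_plan: str = ""
        secondary_risks: List[str] = field(default_factory=list)

    entry = RiskRegisterEntry(
        last_modified=date(2021, 3, 1), name="Late Delivery",
        description="Machine delivered after the contract date",
        why_it_would_occur="Customs delays",
        influencing_factors=["quality of the customs agency"],
        probability_score=2, impact_score=3, owner="Project manager",
        reduction_strategies=["penalty clauses in the contract"],
    )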

Ilhan Or - Boğaziçi University 124


Risk Response Control & Implementation

u Prediction and Early Warning Systems


• Detecting the approach, or even the occurrence, of a disaster and effectively
and speedily informing the parties to be affected.
• Prediction is of little use without the ability to actually trigger an alarm
immediately prior to, or at the onset of, the event occurring.
– Need to remove the guesswork, the surprise, the shock of the event arrival;
– Need to provide real time early warning sensor and communication systems.

• Hurricane Warning Systems (1-4 days), Fire Alarms (5-30 minutes),


Earthquake Alarms (1- 5 seconds) are good examples.
– Evacuation/emergency access plans and means are important;
– Automatic ceasing/provision of utilities/services are important;
– Public awareness programs, media broadcasts may be necessary to teach/condition
communities into recognizing warning signals and speedily reacting as directed.

Ilhan Or - Boğaziçi University 125


Risk Response Control & Implementation

u Trainings & Exercises


• They are key aspects of Successful Risk Response Control and
Implementation.
• Periodically and routinely acting out an imaginary realization of the hazard,
where all participants of the related contingency plan and/or rescue teams
are asked to perform their assigned tasks.
– Real equipment and material should be deployed;
– i) response times, ii) handling/availability of equipment/supplies, iii) physical, medical and
psychological handling of the victims, iv) damage control, potential domino effects
(multiplicative chains of unanticipated effects), and evacuation procedures should all be
tested;
– One should be on the lookout for ways to increase system resilience and flexibility;
– Public awareness programs, media broadcasts should be considered.
• During the 9/11 disaster, if it were not for the many routinely conducted
previous evacuation exercises, it wouldn't have been possible to evacuate
thousands of people from those buildings in the few hours before collapse.
Ilhan Or - Boğaziçi University 126
Risk Response Control & Implementation

u Implement the Adopted Risk Strategy during the occurrence of
the dreaded hazard.
• To limit the consequences by containing damages and by means of rescue,
evacuation and aid actions.
• Implementation of contingency plans and rescue/damage control
operations including,
• Logistics (transportation of aid teams and victims; transportation, storage,
distribution of aid and supplies; transportation and operation of
equipment);
• Coordination of aid teams and other experts;
• Setting up communications and information hardware and networks;
• Information compilation, processing and distribution;
• Communications and public relations.

Ilhan Or - Boğaziçi University 127


Risk Response Control & Implementation

u Rehabilitation
• Of losses and functionalities.
u Monitor & Adjust Plans (for old and new risks) based on the
experience gained.
• Update the related risk data (incidents, risk factors etc.)
• Revise the related risk assessments;
• Documentation;
• Revise basic operational procedures, equipment, communication,
information and material needs (with more flexibility & resilience in
mind);
• Suggestions for new regulations and organizational framework;
• Reconsider strategies for preventive and mitigation measures;
• Revise contingency plans;
• Revise training and exercise programs.
u Example: Revised Fire Precautions in BU after Galatasaray fire.
Ilhan Or - Boğaziçi University 128
Risk Response Control & Implementation

u Transparency Regarding Errors, Accidents, Near Accidents


• The point of reporting/recognizing errors should not be to shame or blame;
• When things go wrong, it is typically the result of complex interaction of
factors, sometimes involving underlying flaws in the system;
• Transparency and full/correct reporting helps in identifying those flaws and
interactions (learning from mistakes);
• Also helps in drawing attention to the issue so that more pressure and effort
will be spent in identifying flaws and weak points;
• Facilitates exchange of safety/reliability information;
• Reduces suspicion (by stakeholders and or layman) facilitates better risk
communication.
u Factors Discouraging Transparency
• Fear of penalty, legal action;
• Fear of reputation loss, shame;
• Highly hierarchical systems discouraging people from speaking out.

Ilhan Or - Boğaziçi University 129


A Successful Risk Response Implementation Case
2010 Copiapo Mining Accident
u Copiapó mining accident occurred on 5.8.10, when part of the San Jose
copper mine near Copiapo, Chile collapsed, leaving 33 men trapped 700 m.
below ground. The miners survived underground for 69 days. All 33 were
rescued and brought to the surface on 13.10.10.
u The shift supervisor, recognizing the gravity of the situation and difficulty of
rescue, gathered all in a Secure room (Refuge) & organized them for long-
term survival.
• Experienced miners were sent out to assess the situation, men with important
skills were given key roles;
• Scarce food resources were rationed;
• Teams were set up to dig for underground water sources;
• Exercise, sleep, cleaning/maintenance times were scheduled for all.
u The 50 m2 Refuge (typical in such mines) was designed to provide 4 days
of air, food and water for 15 people. Communication equipment and medical
supplies were also available; it was located close to working areas.
• The miners also had 2 km of galleries in which to move around.

Ilhan Or - Boğaziçi University 130
A Successful Risk Response Implementation Case
2010 Copiapo Mining Accident

Ilhan Or - Boğaziçi University 131


A Successful Risk Response Implementation Case
2010 Copiapo Mining Accident

Ilhan Or - Boğaziçi University 132


A Successful Risk Response Implementation Case
2010 Copiapo Mining Accident

Keys to Success
Availability of the Refuge (as a part of the Contingency Plan) which
provided the trapped miners with crucial physical needs and morale.
• Food and water supplies;
• Medical Supplies;
• Communication & other equipment (fire extinguishers, drilling equipment);
• Clothing and Hygiene material.
Excellent Training of the Men and their Leader.
• Organization for extended (unknown duration) stay;
• Excellent physical conditions of men;
• Team spirit and high morale.
Immediate, decisive, professional, creative actions of the Rescue Team.
Good luck: 2 km undamaged galleries, underground water availability.

Ilhan Or - Boğaziçi University 133


A Successful Risk Response Implementation Case
2010 Copiapo Mining Accident

Aspects that could have led to Failure (if Luck was not there)
Emergency ladders to scale up the ventilation shafts were not
operational.
The ventilation of the refuge was very poor.
• If it were not for the undamaged galleries, the conditions would have been
much worse.
There were no toilets.
• If it were not for the undamaged galleries, the conditions would have been
much worse.
The maps of the mine-shafts were out of date.
• This slowed down the rescue operations considerably.
Available supplies at the Refuge could have been better.
• The miners used the batteries of the mining trucks around them to power their lamps.

Ilhan Or - Boğaziçi University 134


A Successful Risk Response Implementation Case
A Plane Crash Rescue Case: US Air Flight 1549
Keys to Success
Risk Mitigation Measures built into the Aircraft
• Body made “waterproof” at the flick of a button;
• Water impact resistant fuselage & wing design;
• Water pressure sensitive “locking” doors.
Excellent Expertise and Training of the Crew
• Professionalism, clear decision making and technical ability of the Pilots;
• Professionalism, clear decision making of the Cabin Crew.
Excellent Expertise and Training of the Rescue Operations
• Immediate, decisive, professional actions of the rescue teams;
• Immediate availability of rescue teams and equipment.
Trusting and instruction abiding Culture of the passengers
• Evacuation from the damaged plane and balanced placement on the
precarious wings proceeded in an extremely orderly fashion.

Ilhan Or - Boğaziçi University 135


A Successful Risk Response Implementation Case
A Plane Crash Rescue Case: US Air Flight 1549

Keys to Success
Availability of a sound, serious and well designed Contingency Plan.
• All craft in the immediate neighborhood of the crash landing area were directed to
participate in the rescue operations (some of the first arriving boats were tourist-
carrying ferries);
• The Port Authority had divers on call who were able to participate in the rescue
operations immediately;
• Helicopters were made immediately available to coordinate and support rescue
operations.
Good luck: good weather and water conditions; no mishap during landing.

Ilhan Or - Boğaziçi University 136


RISK ANALYSIS & MANAGEMENT

PART 3
QUANTITATIVE
RISK ASSESSMENT

Ilhan Or - Boğaziçi University 137


Quantitative Modeling & Assessment of Risk

◆ Role of Undesirable Event Data;


 Rare Events;
 Elicitation of Expert Judgment.
◆ Statistical Analysis;
– Measures of Central Tendency, Spread & Shape;
– Linear & Logistic Regression;
– Hypothesis Testing;
– Correlation Analysis;
– Statistical Inference;
– Understanding Probability and Randomness.
◆ Fault Trees;
◆ Decision Trees;
◆ Dynamic (Simulation) Analysis.

Ilhan Or - Boğaziçi University 138


Quantitative Modeling & Assessment of Risk

Role of Undesirable Event (Accident/Failure) Data


Accident/Failure data are used by industry as tools of risk measurement.
• They provide useful information on safety and risk level within relevant units;
• Prediction of future events are frequently based on historical realizations &
frequencies.
Given that predictions often fail, one might ask why bother. The answer is
that there is no other choice:
• even if we are aware of possible deviations, we cannot live without predictions.
Statistics obtained from that data can be used in several ways:
• To monitor the risk and safety level;
• To give input to risk analysis;
• To identify hazards;
• To analyze accident/failure causes;
• To evaluate the effect of risk reducing measures;
• To compare alternative areas of efforts and measures.

Ilhan Or - Boğaziçi University 139


Quantitative Modeling & Assessment of Risk

Role of Undesirable Event (Accident/Failure) Data


To many people risk assessment is closely related to statistics
derived from accident/failure data.
• Numerous reports are produced showing losses as a result of
accidents/failures;
• The data may cover/identify different consequence categories (loss of
life, property damage, environmental degradation etc.) and
accident/failure types (fire, explosion, collision etc.)
• Often data are related to time periods; also detailed information such as
occupation, age, operations, environmental conditions, maintenance
activities may be available.
• In some cases (so called) hard data regarding accidents and failures
originate from practitioners, such as engineers & operators, who may
have included their own value judgments in their reports.

Ilhan Or - Boğaziçi University 140


Quantitative Modeling & Assessment of Risk
Role of Undesirable Event (Accident/Failure) Data
◆ Although the data are historic, they usually provide a good
picture of what to expect in the future.
• The data may belong to the past but, assuming a similar future
performance of systems, they give reasonable estimates and predictions
for the future.
◆ However, Relevance, Reliability & Completeness of historic data is
always an important issue.
Have all past injuries been reported?
Are there any institutional incentives / inclinations for not reporting past
accidents or consequences?
Any changes in the environment?
◆ Statistical analysis is based on the idea of “similar” situations.
For all practical purposes, similarity is judged with respect to the historical data.

Ilhan Or - Boğaziçi University 141


Quantitative Modeling & Assessment of Risk

Case of Insufficient Accident/Failure Data – Rare Events


◆ A key question: Is the number of observations at hand meaningful
for a rigorous statistical analysis?
• The actual number of undesirable events realized in the past may not be large
(especially regarding serious undesirable events).
• More detailed analysis (identifying possible causes or keeping track of different
accidents or consequences) requires categorization/classification of past data.
– Then, the number of events in each category would be small, thereby reducing the
reliability of the statistical analysis.
• In order to increase the amount of data, near misses and/or deviations from
established procedures may be included.
– Such events give a relatively good picture of where accidents might occur, but they do not
necessarily give a good basis for quantifying risk.
– An increase in the number of near misses could be a result of a worsening of safety,
but it could also be a result of increased reporting.
– Often near miss data is incomplete and not as reliable as accident data.

Ilhan Or - Boğaziçi University 142


Quantitative Modeling & Assessment of Risk

Case of Insufficient Accident/Failure Data – Rare Events


◆ Another approach deployed in investigating rare events associated with
complex systems is to break down the system into smaller components,
• and model the rare event as a combination of a series of smaller events.
• Whereas there may be no data regarding the actual occurrence of the
original rare event, there may be some (even significant) data regarding
the occurrence of the individual smaller events in the string.
• Fault Trees and Monte-Carlo Simulation are two methodologies based
on this idea.
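A minimal sketch of this decomposition idea via Monte-Carlo simulation (the component
structure and the three probabilities below are illustrative assumptions):

    import random

    # Illustrative decomposition: the rare top event occurs only if a leak
    # occurs AND its detection fails AND the emergency shutdown fails.
    P_LEAK, P_DETECT_FAIL, P_SHUTDOWN_FAIL = 0.01, 0.05, 0.02

    def top_event_occurs() -> bool:
        # Sample each component event independently.
        return (random.random() < P_LEAK and
                random.random() < P_DETECT_FAIL and
                random.random() < P_SHUTDOWN_FAIL)

    n = 10_000_000
    hits = sum(top_event_occurs() for _ in range(n))
    print(hits / n)  # close to 0.01 * 0.05 * 0.02 = 1e-5

With fully independent components the product rule gives this number directly; the
simulation becomes valuable when the component events interact (shared causes,
sequences, repair times), which is where Fault Trees and simulation models earn
their keep.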

Ilhan Or - Boğaziçi University 143


Quantitative Modeling & Assessment of Risk

Case of Insufficient Accident/Failure Data – Rare Events


◆ Another approach in investigating rare events associated with
complex systems is to generate the related Accident/Failure data
artificially.
• Via controlled experiments, simulators, extreme condition tests,
survival tests in simulated conditions, survival tests in laboratory
environments.
• Care should be taken to make sure the experiment/test environment
closely resembles the actual working conditions in all aspects.
- Pressure, heat, vibration, lighting, noise, time window, operating speed,
utilities, cooling;
- In the Deepwater Horizon oil/gas rig accident case, the problematic
“Blowout Preventer - BOP” unit had featured a 5/11 failure rate in deep-
water conditions, while having only a 1% failure rate in above-water tests.
The latter value was the one deployed in the safety report.

Ilhan Or - Boğaziçi University 144


Quantitative Modeling & Assessment of Risk

A Warning Regarding Rare Events


◆ The occurrence likelihood and/or potential damage associated with
rare events may simply not be computable.
• We know a lot less about hundred-year floods than five-year floods.
• Modeling and estimation errors swell when it comes to small
probabilities.
• The rarer the event, the less tractable it is, and the less we know about how
frequent its occurrence or how large its potential damage may be.
◆ The height (6 m.) of the tsunami protection walls around the Fukushima
Nuclear Power Plant was designed and built based on 60-year
statistics.
• 150-year statistics, which featured a 7 m. tsunami, became available after
the plant became operational.
• It was too late; nobody took notice (until the 7 m. tsunami of 2011).

Ilhan Or - Boğaziçi University 145


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment

Case of Insufficient Data – Elicitation of Expert Judgment


◆ Another approach in investigating rare events is to generate the
related data via the extraction of the opinions of related experts.
• The structured elicitation of knowledge from domain experts;
• While ensuring a “clean” collection process, primarily regarding,
- Consistency of interpretation;
- Data gathering under same conditions.
◆ What can be asked to experts?
• Probabilities for outcomes of observable events (under general or
specific environmental conditions);
- Failure on demand; Failure before time t; Median failure time.
• Impact levels associated with realized (undesirable) events (under
general or specific environmental conditions)
- Qualitative (high, medium, low);
- Quantitative (TL, lives lost, time lost).

Ilhan Or - Boğaziçi University 146


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment

◆ Principles of Structured Expert Judgment (EJ)


• Reproducibility
- All calculations should be reproducible.
• Accountability
- Source of expert subjective evaluations should be identifiable/traceable.
• Empirical/Rationality Control
- Expert assessments should, in principle, be open to Empirical and/or
Rationality control.
• Neutrality
- Methodologies deployed should encourage experts to state their true
opinions.
• Fairness
- All experts should be treated equal a-priori.
• Clarity
- What is being asked should be precisely defined.

Ilhan Or - Boğaziçi University 147


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment

◆ Strategies for Elicitation of EJ


• Make the choices for probabilities comprehensible;
• Provide both quantitative lower and upper bounds;
• Provide a single quantitative (lower or upper) bound;
• Provide P-I Table like descriptions for quantitative categories;
• Try to field “calibration” queries (i.e. extract expert opinion on known
items/issues);
• Try to methodically catch inconsistencies (irrationalities) in expert
responses;
• Ask experts to respond by ratios, instead of absolute values;
• Provide “central tendencies” of other experts and ask if he/she would
like to reconsider;
• Point out inconsistencies (irrationalities) in his/her responses and ask if
he/she would like to reconsider;
• Use technology wherever possible in extracting information.

Ilhan Or - Boğaziçi University 148


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment

◆ Generic Issues (in Elicitation of EJ)


• Achieve consensus – or diversity of opinions?
- Experts usually disagree;
- Diversity of opinion may also be important.
• Who is an Expert?
- Different levels of expertise may be present;
- The analyst may not know how to judge who is best able to give opinion;
- Choice of experts may bias study or make study appear biased;
- Taking experts from one scientific “school” of opinion is bad for diversity;
- Sometimes no specific expertise is required; just ordinary subjects from the
investigated risk environment.
• Uneven Levels of Expertise
- High degree of specialism;
- Rating of experts may represent “average” over their field of expertise.
• Dishonesty and/or Biases
- Experts might have interest in outcome of study (financial, workload, ego)
- May have Structural, Motivational, Cognitive biases.

Ilhan Or - Boğaziçi University 149


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias” in Subjective Data

u Cognitive biases are psychological tendencies that cause the human


brain to draw incorrect conclusions.
• Such biases are a form of "cognitive shortcut", often based upon rules of thumb,
and include errors in statistical judgment, social attribution, and memory.
• They are systematic errors in thought process impacting rational thinking and the
way one lives & works.
• These biases are a common outcome of human thought, and often drastically skew
rationality and reliability of the interviewee.
• Recognizing such biases is important not only for EJ elicitation, but also for better
understanding risk perception, communication and risk management in general.
u Cognitive Social Biases
• Forer Effect: The tendency to give high accuracy ratings to descriptions of their
personality that supposedly are tailored specifically for them, but are in fact
vague and general enough to apply to a wide range of people (e.g. Horoscopes).
• Ingroup Bias: The tendency for people to give preferential treatment to others
they perceive to be members of their own groups.

Ilhan Or - Boğaziçi University 150


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias”
u Cognitive Social Biases
• Self-fulfilling Prophecy: The tendency to engage in behaviors that elicit
results which will (consciously or not) confirm existing attitudes.
• Halo Effect: The tendency for a person's positive or negative traits to
"spill over" from one area of their personality to another, in others'
perceptions of them.
• Herd Instinct: Tendency to adopt the opinions and follow the behaviors
of the majority to feel safer and to avoid conflict.
• False Consensus Effect: The tendency for people to overestimate the
degree to which others agree with them.
• Status Quo Bias: The tendency to defend and bolster the status quo.
Sometimes, existing social, economic, and political arrangements are preferred,
against any change, even at the expense of individual/collective self-interest.
• Illusion of Transparency: People overestimate others' ability to know
them, and they also overestimate their ability to know others.

Ilhan Or - Boğaziçi University 151


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias”
u Cognitive Memory Biases
• Suggestibility: A form of misattribution where ideas suggested by a
questioner are mistaken for memory.
• False Memory: A form of misattribution where a memory is mistaken for
imagination, or the confusion of true memories with false memories.
• Consistency Bias: Incorrectly remembering one's past attitudes and
behavior as resembling present attitudes and behavior.
• Rosy Retrospection: The tendency to rate past events more positively
than they had actually rated them when the event occurred.
• Self-serving Bias: Perceiving oneself responsible for desirable outcomes
but not responsible for undesirable ones.
• Hindsight Bias: Enriching memory of past events with present knowledge,
so that those events look more predictable than they actually were.
• Egocentric Bias: Recalling the past in a self-serving manner (e.g.
remembering a caught fish as being bigger than it was).

Ilhan Or - Boğaziçi University 152


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias”
u Cognitive Decision Making Biases
• Hyperbolic Discounting: The tendency for people to have a stronger
preference for more immediate payoffs relative to later payoffs.
• Irrational Escalation: The tendency to make irrational decisions based
upon rational decisions in the past or to justify actions already taken.
• Omission Bias: The tendency to judge harmful actions as worse, or less
moral, than equally harmful omissions (inactions).
• Exposure Effect: The tendency for people to express undue liking for
things merely because they are familiar with them.
• Negativity Bias: Phenomenon by which humans pay more attention to
and give more weight to negative than positive experiences.
• Normalcy Bias: The refusal to plan for, or react to, a disaster which has
never happened before.
• Semmelweis Reflex: The tendency to reject new evidence that
contradicts an established paradigm.

Ilhan Or - Boğaziçi University 153


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias”
◆ Cognitive Probability/Belief Biases
• Positive Outcome Bias: The tendency to overestimate the probability of
good things happening to them.
• Selection Bias: A distortion of evidence or data that arises from the way
that the data are collected.
• Disregard of Regression Toward the Mean: The tendency to expect
extreme performance to continue.
• Clustering Illusion: The tendency to see patterns where actually none
exist.
• Hawthorne Effect: The tendency to perform or perceive differently when
one knows they are being observed.
• Gambler's Fallacy: The tendency to think that future probabilities are
altered by past events. Results from an erroneous conceptualization of the Law
of large numbers. For example, "I've flipped heads five times consecutively, so
the chance of tails coming out on the sixth flip is much greater than heads”.

Ilhan Or - Boğaziçi University 154


Quantitative Modeling & Assessment of Risk
What is “Cognitive Bias”
◆ Biases Precipitated by the Modern World
• Automation Bias: Tendency to favor the suggestions of Automated
Systems (algorithms).
E.g. in Netflix we tend to follow the suggestions of the system, which are based
on algorithms analyzing, categorising and rating our past viewing history,
instead of exercising good, old fashioned human curiosity.
• Google Effect: Tendency to forget information that can be easily accessed
online or from our own electronic files (when we need it, we look it up).
In one experiment, participants typed trivia statements into a computer and were
later asked to recall them. Half believed the statements were saved, and half
believed the statements were erased. Participants who assumed they could look up
their statements did not make much effort to remember them.
• Ikea Effect: Tendency to attach a higher value to things we help create.
Combining the Ikea Effect with other related traits, such as our willingness to pay a
premium for customization, is a strategy employed by companies seeking to
increase the intrinsic value that we attach to their products.

Ilhan Or - Boğaziçi University 155


Quantitative Modeling & Assessment of Risk
Elicitation of Expert/Subject Judgment
Common Mistakes Made In Survey Preparation/Execution
• Having Little or No Understanding of The Target Audience
- The analyst should know as much as possible about the attitudes & beliefs of the
potential respondents.
- Focus on what information respondents can provide, rather than on what information
is sought through the survey.
• Providing Restrictive Multiple Choice Lists
- Try to include answer options such as “don't know” or “not applicable” and “other.”
- This approach will weed out respondents that don't have a clear opinion; otherwise,
the good responses might get contaminated.
- Also, respondents become frustrated when they don't see their preferred response.
• Requiring Answers To All Questions (Online Surveys)
- A few skipped responses is not going to change the results.
- Ultimately the respondents cannot be forced to answer a question; if he so desires, a
respondent can just close his browser and forget about the survey.
• Asking Many Open-ended Questions
- Too many open-ended questions (comment fields) make it appear that the analyst did
not bother to create easy-to-answer questions focused on survey objectives.

Ilhan Or - Boğaziçi University 156


Quantitative Modeling & Assessment of Risk
Elicitation of Expert/Subject Judgment
Common Mistakes Made In Survey Preparation/Execution
• Using Ranking Questions Incorrectly
- Ranking questions are often difficult for a respondent to answer (and even more
difficult to analyze and interpret).
- If there are just one or two subjects, then a ranking of all their priorities (over the
options presented) may be necessary.
- When there are many subjects, asking respondents to select their top three priorities
(or two or four) creates a natural ranking when the data is aggregated.
• Asking Too Many Questions
- There is no magic number for the right number of survey questions. Limiting factors are,
i) commitment and attention span of the target audience, and
ii) resources and time the analyst has for acting on the information received.
• Asking Two Survey Questions In One
- This is a great way to frustrate the respondents and obtain ambiguous data.
- As an example, consider the question: “Rate the technician's professionalism & knowledge”
❖ There are really 2 questions here: the technician's knowledge may be great and his professionalism lousy.

Ilhan Or - Boğaziçi University 157


Quantitative Modeling & Assessment of Risk
Elicitation of Expert/Subject Judgment
Common Mistakes Made In Survey Preparation/Execution
• Making Questions Too General
- Information obtained from a survey depends on asking focused, unambiguous
questions specific to the survey objectives.
- The problem with general questions is that two respondents can sometimes
answer the question the same, but for completely different reasons.
• Putting Too Little Time and Effort Into Survey Preparation/Execution
- It is easy to prepare a survey with a lot of questions and send it out to a broad group.
- The difficulty is getting usable information that can help with solid decision-making.
- Every question in a survey needs to be well thought out and evaluated against the
survey objectives and the target audience.
- Extra effort spent in survey preparation will pay big dividends when using the data.

Ilhan Or - Boğaziçi University 158


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment
Expert Bias due to hazard availability (frequency)
[Figure: estimated number of deaths per year plotted against the actual number of
deaths per year.]

Ilhan Or - Boğaziçi University 159


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment
◆ Combining experts’ quantitative responses.
• Arithmetic Average
- Simple averaging of the quantitative values provided
• Geometric Average
- The nth root of the product of the (n) quantitative values provided.
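A minimal sketch of the two combination rules in Python (the three expert estimates
below are illustrative):

    import math

    def arithmetic_average(estimates):
        # Simple average of the experts' quantitative values.
        return sum(estimates) / len(estimates)

    def geometric_average(estimates):
        # nth root of the product of the n values; better suited to
        # quantities spread over orders of magnitude (e.g. probabilities).
        return math.prod(estimates) ** (1 / len(estimates))

    experts = [1e-4, 1e-3, 1e-2]        # three estimates of a failure probability
    print(arithmetic_average(experts))  # ~3.7e-3, dominated by the largest value
    print(geometric_average(experts))   # 1e-3, the central order of magnitude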
◆ Well Known EJ Elicitation Methodologies
• The Delphi Method
- Experts invited to make predictions;
- Those outside central tendency asked if they want to revise their responses;
- Experts must justify staying outside central tendency.
• The Analytical Hierarchical Process (more suitable for value judgments)
- The value judgment elicitation process has a hierarchical design;
- At each level of hierarchy factors contributing to the overall value are
compared among themselves through a series of pair-wise comparisons;
- The factors at the lowest level of the hierarchy are individually queried for
their value contributions on a semi-quantitative 0-9 scale.

Ilhan Or - Boğaziçi University 160


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment
◆ The Analytical Hierarchical Process
• The product/service/event whose overall value/impact is to be assessed is
placed at the root of a “Hierarchy Tree”;
• The main components/factors contributing to the overall value/impact are
identified and placed at the second level of the hierarchy;
• Each main factor is then further decomposed into its sub-factors, thus forming
the third level of the hierarchy, and so on;
• At each level of hierarchy, (sub)factors’ weights/value contribution potentials
are determined by comparing the (sub)factors involved among themselves
through a series of pair-wise comparisons;
• The factors at the lowest level of the hierarchy are individually queried for their
value contributions on a semi-quantitative 0-9 scale.
◆ Strong Points:
• Pairwise comparisons; availability of a “consistency” index; the semi-
quantitative valuation.
◆ Weak Points:
• The levels of the remaining factors being ignored during each pairwise comparison.
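A minimal sketch of deriving weights from a single pairwise-comparison matrix (the
3-factor matrix is illustrative; the row geometric-mean method used here is a common
approximation to AHP's principal-eigenvector weights):

    import math

    # a[i][j] = how much more factor i contributes than factor j (1-9 scale).
    A = [
        [1,   3,   5],
        [1/3, 1,   3],
        [1/5, 1/3, 1],
    ]

    # Row geometric means, normalized to sum to 1, approximate the weights.
    gm = [math.prod(row) ** (1 / len(row)) for row in A]
    weights = [g / sum(gm) for g in gm]
    print([round(w, 3) for w in weights])  # roughly [0.637, 0.258, 0.105]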

Ilhan Or - Boğaziçi University 161


Quantitative Modeling & Assessment of Risk
Elicitation of Expert Judgment

An AHP Model for Assessing the “Riskiness” of a Pipeline Laying Project

Ilhan Or - Boğaziçi University 162


Quantitative Modeling & Assessment of Risk

Statistical Analysis – Based on Historic Data


Statistical Analysis should be seen more as a screening instrument for
identifying where to concentrate the follow-up when studying
numerous hazards.
• Since all decisions (in business, politics, or even one’s personal life) are
based on some idea of what the future holds, demand for statistics to foresee
the future is insatiable.
– Additionally, people want to be able to justify decisions that they would have
already made anyway for other reasons.
• Suppose we have to investigate 100 hazards. Then some kind of
identification of the key hazards would be useful and statistical testing could
be used for this purpose.

Ilhan Or - Boğaziçi University 163


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Statistical Analysis based on Historic Data


Suppose 2, 4, 3, 5, 6 accidents leading to injuries have been observed in a
company in five consecutive years. Related important statistical information
• Expected (Mean) Value: should we expect 4 accidents for the coming year?
• Variation: how close/far the actual number of annual accident realizations will
be to the mean value estimated?
• Point Estimates: what is the probability of having 5 accidents in the coming
year?
• Range Estimates: what is the probability of having 3-5 accidents in the coming
year?
• Confidence Intervals: What is the upper bound that we can (reasonably) be
sure (at 95% confidence level) that the number of accidents will not exceed?
• Best/Worst Case Prediction: should we expect at most 6 accidents for the
coming year?
• Trends: is there a negative or positive trend in the number of accidents/failures?
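A short numpy/scipy sketch of several of these quantities for the sample above; the point, range and upper-bound estimates assume that annual counts follow a Poisson distribution, an illustrative model choice rather than a given:

import numpy as np
from scipy.stats import poisson

accidents = np.array([2, 4, 3, 5, 6])                 # five consecutive years
mean = accidents.mean()                               # expected value: 4.0
var = accidents.var(ddof=1)                           # sample variance

rate = mean                                           # Poisson rate (its MLE)
p5 = poisson.pmf(5, rate)                             # P{exactly 5 accidents}
p3to5 = poisson.cdf(5, rate) - poisson.cdf(2, rate)   # P{3 to 5 accidents}
upper95 = poisson.ppf(0.95, rate)                     # 95% upper bound on the count

print(mean, var, round(p5, 3), round(p3to5, 3), upper95)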

Ilhan Or - Boğaziçi University 164


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Key Statistics based on Historic Data


◆ Measures of Central Tendency
• Expected (Mean) Value (μ): weighted average of all generated output values.
• Median Value: Value above and below which there are equal number of
realizations in the compiled data.
• Mode Value: The data value with the greatest observed frequency.
For any unimodal and positively skewed distribution, mode, median
& mean fall in that order.

Ilhan Or - Boğaziçi University 165


Quantitative Modeling & Assessment of Risk
Statistical Analysis
◆ Measures of Spread
• Variance (σ²): Average of the squared distance of all data values from the mean
– Given a set of n data points (x1, …, xn), their Variance (σ²) is defined as,
σ² = (1/n) · Σi (xi − μ)², where μ is the mean of the data
• Standard Deviation (σ): It is the square root of the Variance and has the
advantage of featuring the same units as the data to which it refers.
– For a set of n data points (x1, …, xn), the standard deviation (σ) is defined as,
σ = √[ (1/n) · Σi (xi − μ)² ]
• Mean Deviation: The average of the absolute differences between the data
points and their mean.

Ilhan Or - Boğaziçi University 166


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Some probability areas of Normal Distribution as a function of Standard Deviation

Ilhan Or - Boğaziçi University 167


Quantitative Modeling & Assessment of Risk
Statistical Analysis
u Measures of Spread
• Range: Difference between the maximum and minimum data values.
• Normalised Standard Deviation: It is Standard Deviation divided by the mean.
σN = σ/μ
Advantage: spread is measured as a fraction of the mean
• Interpercentile Range: It is the difference between two percentiles
▪ x95 – x05 giving the central 90% range;
▪ x90 – x00 giving the lower 90% range;
▪ x90 – x10 giving the central 80% range.
• Normalised Interpercentile Range: Works the same way as normalised
Standard Deviation
▪ (xB – xA) /x50, where xB > xA are percentiles like x95 and x05.
Chebyshev’s Inequality: This is a distribution-free measure of spread,
P( |X − μX| ≥ k·σX ) ≤ 1/k²
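A quick numerical check of these spread measures, and of Chebyshev's bound, on the small accident sample used earlier (a sketch, assuming numpy):

import numpy as np

x = np.array([2.0, 4.0, 3.0, 5.0, 6.0])
mu, sigma = x.mean(), x.std()

spread = {
    "range": x.max() - x.min(),
    "normalised std": sigma / mu,
    "central 80% range": np.percentile(x, 90) - np.percentile(x, 10),
    "mean deviation": np.abs(x - mu).mean(),
}

# Chebyshev: the fraction of points with |x - mu| >= k*sigma is at most 1/k^2.
k = 1.5
observed = np.mean(np.abs(x - mu) >= k * sigma)
print(spread, observed, "<=", 1 / k**2)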

Ilhan Or - Boğaziçi University 168


Quantitative Modeling & Assessment of Risk
Statistical Analysis
u Measures of Spread
• Confidence Intervals: The basic idea is to look at how likely it is to obtain
realized values (for a parameter, for a random variate) “more extreme” than
(beyond the limits of) the interval actually computed for specific parameter
values of the probability distribution assumed to describe the underlying
population.
– The degree of acceptability in a parameter estimate is conveyed by using a confidence
interval.
– If, for a given parameter value, the observed value becomes extremely unlikely, then
that parameter value may itself be considered unlikely.
– e.g. consider the estimation of probability of failure “p”, by the ratio “r/n”, given “r”
failures in “n” independent trials; then, the central 90% statistical confidence bounds
for the probability of failure are:
pl: largest p value such that P(observing r or more failures in n trials | p) ≤ 0.05
pu: least p value such that P(observing r or fewer failures in n trials | p) ≤ 0.05
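The two bounds can be found numerically as roots of the corresponding binomial tail probabilities; a sketch with scipy, assuming hypothetical counts of r = 3 failures in n = 100 trials:

from scipy.stats import binom
from scipy.optimize import brentq

r, n = 3, 100

# Lower bound: largest p with P(X >= r | p) <= 0.05;
# upper bound: least p with P(X <= r | p) <= 0.05.
p_lo = brentq(lambda p: binom.sf(r - 1, n, p) - 0.05, 1e-9, 1 - 1e-9)
p_hi = brentq(lambda p: binom.cdf(r, n, p) - 0.05, 1e-9, 1 - 1e-9)

print(f"central 90% bounds for p: ({p_lo:.4f}, {p_hi:.4f})")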

Ilhan Or - Boğaziçi University 169


Quantitative Modeling & Assessment of Risk
Statistical Analysis
u Measures of Spread
• A Confidence Interval for any estimated parameter, θ, is usually sought
as a measure of the data variation.
• An interval (θL, θH) is said to be (1-α)100% confidence interval if there is
a probability of (1-α) that the interval contains θ, that is,
P(θL< θ < θH) = 1-α
• Level (1-α) is a measure of our confidence that the interval contains θ.
• Once an appropriate probability distribution is fitted (assumed) and its
parameters estimated, the desired confidence interval can routinely be
computed.
• Interval width is a measure of data variation (in regard to the assumed
distribution), as well as an indication of the appropriateness of the
assumed distribution.
– Especially telling are the status of future realizations in regard to the
confidence interval.

Ilhan Or - Boğaziçi University 170


Quantitative Modeling & Assessment of Risk
Statistical Analysis
u Measures of Shape
• Skewness (S): It is the degree to which a distribution is “lopsided”.
– A positive (negative) skewness meaning a longer right (left) tail.
– Given a set of n data points (x1, …, xn), their skewness (S) is defined as,
S = [ (1/n) · Σi (xi − μ)³ ] / σ³
• Percentile Skewness (SP): It is similar to skewness.

– (0 < SP < 1) indicates a longer left tail; (1 < SP) indicates a longer right tail.

Ilhan Or - Boğaziçi University 171


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Skewness Examples

Ilhan Or - Boğaziçi University 172


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Reliability and Hazard Functions


Regarding Accident/Failure risk, a factor of particular interest
is time to next accident/failure.
• Reliability Function gives the probability that a particular environment
(or equipment or component) survives up to time t without failure (or
accident); i.e. Time to failure, T, exceeding t.
R(t)=P( T > t)
• Hazard Function gives the probability that a particular environment (or
equipment or component) fails in the period dt, given that it has
survived up to time t.
hT(t)·dt = P( t < T ≤ t+dt | T > t ) = P( t < T ≤ t+dt ) / P( T > t ) = fT(t)·dt / R(t)
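A minimal sketch of R(t) and hT(t) under an assumed Exponential model, with made-up failure times; for the Exponential, the hazard works out to the constant λ:

import numpy as np

times = np.array([0.8, 2.1, 0.5, 3.3, 1.7, 0.9])   # hypothetical failure times
lam = 1.0 / times.mean()                           # MLE of the failure rate

t = 2.0
R = np.exp(-lam * t)                               # reliability: P(T > t)
f = lam * np.exp(-lam * t)                         # density f_T(t)
h = f / R                                          # hazard h(t) = f(t)/R(t) = lam

print(f"lam = {lam:.3f}, R({t}) = {R:.3f}, h({t}) = {h:.3f}")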

Ilhan Or - Boğaziçi University 173


Quantitative Modeling & Assessment of Risk
Statistical Analysis

Hazard Function Patterns


• Constant failure rate over useful life.
• Gradually increasing failure rate with no identifiable wear.
• Low failure rate followed by a constant level.
• Infant mortality followed by a constant failure rate.
• Constant failure rate followed by a pronounced wear-out.
• Bath-tub curve: infant mortality followed by stable and wear-out periods.

Ilhan Or - Boğaziçi University 174


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Regression Analysis – Linear & Multiple Linear Regression


◆ The investigation of a mathematical relationship between some possible
incident (accident) causing factors and one (or more) set of accident
realizations.
• Such as traffic accidents & road congestion, highway quality, speed limits etc.
◆ Linear Regression aims at representing the relationship between a set of
independent variables (x1, …, xn) and the dependent variable y, as follows:
yi = β0 + β1x1i + β2x2i + … + βnxni + εi
βj is the regression slope for the variable xj;
β0 is the y-axis intercept;
εi is the error term of the ith trial.
◆ Key assumptions behind Linear Regression:
• Error terms are Normally distributed;
• The individual yi values observed are independent;
• The distribution of y given a value of x has equal variance for all x values.
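A small sketch of a multiple linear regression fitted by ordinary least squares; the data are synthetic, generated from known coefficients so the recovered estimates can be checked against the truth:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical accident data explained by congestion (x1), road quality (x2)
# and speed limit (x3) at 100 locations, with Normal error terms.
n = 100
X = rng.uniform(0, 1, size=(n, 3))
y = 1.0 + 2.5 * X[:, 0] - 1.2 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(0, 0.3, n)

A = np.column_stack([np.ones(n), X])          # prepend the intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # estimates of (b0, b1, b2, b3)
print("estimated coefficients:", np.round(beta, 2))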

Ilhan Or - Boğaziçi University 175


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Regression Analysis – Linear Regression


◆ Suppose there is a single independent variable; with the assumption of
normally distributed error terms, the relationship between x & y reduces to,
yi = Normal(βixi +β0 ,σ),
where, “βi” is the slope of the line;
“β0” is the y-axis intercept;
“σ” is the standard deviation of the variation of y about this line.

Ilhan Or - Boğaziçi University 176


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Regression Analysis – Choice of Independent Variables


In any regression study, proper identification of the independent variables (in
accordance with the selected dependent variable) is very important.
• Consider: traffic accidents vs. road congestion, highway quality, speed limits etc.
• Suppose you have traffic accident data regarding 100 intersections and/or road
segments, together with traffic congestion, road quality and speed limit info at
these locations.
• You are after the relationship: y = β0 + β1x1 + β2x2 + β3x3 + ε
• Data associated with each road segment/intersection, i, corresponds to,
yi = β0 + β1x1i + β2x2i + β3x3i + εi
• What other factors can you think of that may affect "the realization and impacts
of traffic accidents"?
– How about "visibility" of the road (beyond the intersection/curve ahead)?
– How about "age/gender/experience of drivers involved in accidents”?
– How about ”type, make, age & speed of vehicles involved in accidents”?

Ilhan Or - Boğaziçi University 177


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Regression Analysis – Choice of Independent Variables


• Remember that independent variables should help "explain" the value of
the dependent variable. In this context,
• High/low Visibility (at a road segment) may certainly contribute to the
accident rate/impact at that road segment.
• How can "age/gender/experience of drivers involved in an accident"
contribute to accident rate of the road segment?
– It is not a characteristic of the road segment;
– You would at least need the ages/genders/experience levels of drivers not
involved in accidents (an additional set of information that would most
possibly be unavailable);
– Then % of young, male and/or inexperienced drivers could be considered.
• What if the dependent variable was "damage level in realized accidents"?
– Then age/gender/experience of drivers involved in accidents, as well as make
and speed of vehicles involved in accidents, would make sensible independent variables.

Ilhan Or - Boğaziçi University 178
Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Regression Analysis – Logistic Regression


◆ In some cases the response variable is binary
• Only 0-1 values making sense, such as,
• Patients responding properly to a drug (success) vs less than adequate response;
• Landslide occurrence at a particular site (success) vs no landslide;
• That mean response then becomes a probability.
◆ Logistic Regression aims at representing the relationship between a set of
independent variables (x1, …, xn) and the binary dependent variable y, via the
logistic function: p = P(y=1) = e^(β0 + β1x1 + … + βnxn) / [1 + e^(β0 + β1x1 + … + βnxn)]
◆ Key characteristics of Logistic Regression:


• Error terms need not be normally distributed;
• The individual y values observed should be independent;
• Relationship between independent variable(s) and response variable not linear.
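A minimal sketch of a logistic regression fit on synthetic landslide-style data, using scikit-learn; the variables and coefficients below are made up for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# 200 hypothetical sites: slope (degrees) and water content; y = 1 if a
# landslide occurred, generated from an assumed true linear predictor.
X = np.column_stack([rng.uniform(0, 45, 200), rng.uniform(0, 1, 200)])
lp = -6.0 + 0.2 * X[:, 0] + 3.0 * X[:, 1]
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-lp))).astype(int)

model = LogisticRegression().fit(X, y)
site = np.array([[30.0, 0.8]])                 # a steep, wet site
print("P(landslide):", model.predict_proba(site)[0, 1].round(3))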

Ilhan Or - Boğaziçi University 179


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools
Regression Analysis – Logistic Regression
◆ “Logistic Function” is non-linear, “S” shaped & bounded in the 0-1 range;
◆ The regression problem becomes the problem of estimating the success
likelihood of the response variable “p”, given the levels of the independent
variables,

[Figure: the S-shaped logistic curve for p, bounded in the 0-1 range, plotted against xi]
Ilhan Or - Boğaziçi University 180
Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools
Regression Analysis – Logistic Regression
◆ The portion β0 + β1x1 + … + βnxn is called the “Linear Predictor”;
◆ The “Odds Ratio”, which is the likelihood of success divided by the
likelihood of failure, is an important concept in Logistic Regression;
odds = p / (1 − p) = e^(β0 + β1x1 + … + βnxn)
◆ The Odds Ratio is designed to determine how the “odds of success”


increases as certain changes in regressor values (independent variables)
occurs.
◆ The logarithm of the odds ratio is the linear predictor.

Ilhan Or - Boğaziçi University 181


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools
Regression Analysis – Logistic Regression
◆ In the “Landslide Likelihood Estimation of Sites” case,
• Independent variables could be, geographical characteristics (slope), soil
characteristics, water content, vegetation of various sites;
• Dependent variables could be whether a landslide was realized (y=1) or
not (y=0) in the last 50 years, at the selected specific sites;
• In the regression equation the likelihood of a landslide is set against the
values of the explanatory variables.
◆ In the “Patients’ Response to Drugs” case,
• Independent variables could be, patient characteristics (age, gender,
weight), drug dosage;
• Dependent variables could be whether the drug was successful (y=1) or
not (y=0) for the specific patients;
• In the regression equation the likelihood of drug success is set against the
values of the explanatory variables.

Ilhan Or - Boğaziçi University 182


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Hypothesis Testing
Hypothesis Testing is a key component of classical statistics.
• Test of simple hypotheses: Hypotheses in which the distribution of the random
variable is fully specified.
The standard approach in Hypothesis Testing is using Test Statistics.
• A real valued function of the data whose distribution, under the hypothesis, is
known.
• If the value of the test statistic based on the data is extreme, then the hypothesis is
rejected.
Hypothesis Testing involves the idea of Significance Level.
• A hypothesis is rejected at m% significance level, if the value of the test statistic
calculated from the data is in the upper m% tail of the distribution.
Key Problem: The exact distribution of the test statistic is usually unknown.
• Often, asymptotic (limiting) distribution of the test statistic is known, which is
reliable when number of observations is large.

Ilhan Or - Boğaziçi University 183


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Hypothesis Testing for Distribution Fitting (χ2 tests)


◆ Suppose a random quantity can take values in one of “n” distinct categories
and it is desired to test the hypothesis that “the probability of being in
category k is pk”.
• Frequently, (p1, …, pn) are obtained from the assumed (fitted) distribution.
◆ A random sample of N independent observations are taken, to form the
empirical probabilities,
sk = (number of observations in category k) / N
◆ Intuitively, if the hypothesis is true, there should not be a large deviation
between any pair sk & pk; a relevant test statistic is:
I(s,p) = Σ(i=1..n) si · log( si / pi )
◆ If the calculated value of 2N·I(s,p) is in the upper tail of the χ²(n−1)
distribution, then the hypothesis is rejected.
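The test can be run with scipy's power_divergence, whose "log-likelihood" option returns the statistic 2N·I(s,p) together with its χ² p-value; the observed counts and hypothesized probabilities below are made up:

import numpy as np
from scipy.stats import power_divergence

observed = np.array([25, 18, 10, 7])       # counts in n = 4 categories
p = np.array([0.4, 0.3, 0.2, 0.1])         # hypothesized probabilities

stat, pvalue = power_divergence(observed, f_exp=observed.sum() * p,
                                lambda_="log-likelihood")
print(f"2N·I = {stat:.3f}, p-value = {pvalue:.3f}")   # reject if p-value is small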

Ilhan Or - Boğaziçi University 184


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Hypothesis Testing for Distribution Fitting (K-S tests)


◆ Suppose it is desired to fit an appropriate cumulative distribution function F
to a data set containing n random observations.
• First, the empirical cumulative distribution Sn(x) is constructed from the
ordered observations x1 ≤ … ≤ xn:
Sn(x) = 0 for x < x1;  Sn(x) = k/n for xk ≤ x < xk+1;  Sn(x) = 1 for xn ≤ x
• The basic procedure involves the comparison between the empirical and the assumed
(fitted) cumulative distribution functions.
− If the discrepancy is large, the assumed function is rejected.
• A relevant statistic is
Dn = √n · maxx | F(x) − Sn(x) |
• If Dn is above a critical value of the K-S statistic, then the hypothesis that the true
distribution is F can be rejected.
− If Dn < Dnα , proposed distribution is acceptable at the specified significance level α.
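A short sketch of the K-S comparison with scipy; note that scipy's kstest reports max |F(x) − Sn(x)| without the √n factor, and that estimating the parameters from the same data makes the nominal p-value somewhat optimistic:

import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(2)
data = rng.normal(10.0, 2.0, size=50)      # hypothetical sample

mu, sigma = data.mean(), data.std()        # fit a Normal F by MLE
stat, pvalue = kstest(data, norm(mu, sigma).cdf)
print(f"max |F - Sn| = {stat:.3f}, p-value = {pvalue:.3f}")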

Ilhan Or - Boğaziçi University 185


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

[Figure: Kolmogorov-Smirnov distribution fitting — empirical cumulative frequency versus theoretical distribution function]

Ilhan Or - Boğaziçi University 186


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Hypothesis Testing
If the issue is important, the observed trend suspicion (statistical conjecture)
should not be just attributed to randomness and ignored even when
hypothesis testing fails.
• Suppose 2, 4, 3, 5, 6 accidents leading to injuries have been observed in a
company in five consecutive years and say we are concerned about an increasing
trend in accidents.
• This example data set would not pass a statistical significance test (in this case
regarding increasing trends).

Ilhan Or - Boğaziçi University 187


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Correlation Analysis
The investigation of dependence relationships among random factors.
• Recognition and investigation of the interdependencies between uncertain
components is crucial in most risk studies.
• Such as the possible dependence relationships among road congestion, highway
quality and speed limit in the above example.
• Such as the dependence between interest rates and mortgage rates.
– A strong positive correlation is expected: if the interest rate turns out to be at the
high end of the distribution, the mortgage rate is also expected to feature a high
value.
• If interdependency between two random variables is ignored, the joint
probabilities of these random variables will be incorrectly modeled.
– While a low interest rate in concert with a high mortgage rate is not very likely in
reality, under a false independence assumption this situation would be quite routine.

Ilhan Or - Boğaziçi University 188


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Distributions of interest rate and mortgage rate predictions

Ilhan Or - Boğaziçi University 189


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Reasons of Correlation Between Observed Data


There may be a logical relationship between the two (or more) variables.
• Such as the case between interest rates and mortgage rates.
There may be another common external factor affecting both variables.
• Prevailing weather conditions will affect both the time it takes to excavate the
site, and the time it takes to pour the foundations.
Two random phenomena might naturally feature similar behavior without
having any relationship between them.
• The number of personal computers in use in Europe and the population of Asia
will probably seem to be strongly correlated since both have increased steadily
over extended periods of time.
The observed correlation may have occurred purely by chance (and no real
correlation actually exists).

Ilhan Or - Boğaziçi University 190


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Difference Between Dependency and Correlation


A Dependency relationship is where the sampled value of one variable
(called the independent) has a statistical relationship that approximately
determines the value of the other variable (called the dependent).
• It usually features an underlying (average) relationship between the variables
around which the individual observations will be scattered.
• It presumes a causal relationship.
• Such as the mortgage rate in essence being dependent (driven by) the interest rate.
Correlation describes the degree to which one variable is related to another.
• Given random variables X & Y, the related Covariance & Correlation
Coefficient terms are defined as,
Cov(X,Y) = E(X Y) – E(X) E(Y)
ρ(X,Y) = Cov(X,Y) / (σX σY)
• Some care is necessary in interpreting covariance:
Independent variables are always uncorrelated, but uncorrelated variables are not
always independent.

Ilhan Or - Boğaziçi University 191


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Example
◆ Suppose X = Uniform(−1, 1); Y = X². Then,
Cov(X,Y) = E(X·Y) − E(X)·E(Y) = E(X³) − E(X)·E(X²) = 0,
since E(X³) = E(X) = 0.
◆ This is one reason Scatter Plots are as important as the numerical correlation
statistics.
• Independent variable is plotted on the x-axis, and the dependent on the y-axis.
• They provide a simple way of visualizing the form of a correlation or dependency.
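A quick numerical check of the example above (uncorrelated yet fully dependent variables); a sketch assuming numpy:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=100_000)
y = x**2                                        # fully determined by x

cov = np.mean(x * y) - x.mean() * y.mean()      # Cov(X,Y) = E(XY) - E(X)E(Y)
rho = np.corrcoef(x, y)[0, 1]
print(f"cov ~ {cov:.4f}, correlation ~ {rho:.4f}")   # both near 0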

Ilhan Or - Boğaziçi University 192


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Unnoticed/Ignored Dependence among random factors, whose


simultaneous occurrence/non-occurrence may trigger/lead to disasters, is a
central issue in risk assessment & management.
The Sinking of the Transatlantic Liner Titanic
• Titanic’s hull had featured 14 separate and independent waterproof
compartments and the ship would not sink as long as 9 of these
compartments remained functional and waterproof.
• The designers estimated some small probability of failure (accident
caused or otherwise) for each compartment and then, assuming
independence, simply took the 6th power of this “single compartment
failure” probability to represent the vessel’s sinking probability (which,
of course, turned out to be a very small number).
• They did not consider the fact that the failure of these compartments
could be dependent, which, unfortunately turned out to be the case when
the vessel scraped an iceberg damaging 6 compartments at the same time.

Ilhan Or - Boğaziçi University 193


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

The Major Flooding of New Orleans Caused by the Hurricane Katrina


• The severe loss of life in New Orleans due to Hurricane Katrina in 2005
was a function of interactions among a number of disparate factors:
i) a low-probability storm intensity;
ii) a low-probability storm trajectory;
iii) low-probability failures of the levee system;
iv) difficulties in achieving high evacuation rates; etc.
• Pre-disaster risk studies either did not incorporate the simultaneous
occurrence of these factors or considered their independent occurrence.
• Unfortunately, the storm intensity and trajectory pretty much led to the
failure of the levee system;
• The non-expectations of the emergency management team regarding
storm intensity and trajectory led to a half-hearted effort regarding
evacuations.

Ilhan Or - Boğaziçi University 194


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

The Earthquake Triggered Fukushima Nuclear Power Plant Accident


• The earthquake and the tsunami cut the supply of off-site power to the
plant.
• Diesel generators intended to provide back-up electricity to the critically
important cooling systems all failed in three of the six reactors at the site.
- At the Daiichi-1 reactor alone, all 13 back-up diesel generators failed;
- Emergency crews were struggling to keep the cooling systems running on
available batteries and extra batteries shipped in by helicopters;
- The chance of so many independent failures is almost non-existent;
- Presumably some (previously ignored) common cause led to the failures,
thereby making all failures dependent events;
- Such as all generators being located at underground or ground levels,
therefore all being affected by the tsunami, which immediately followed the
earthquake.

Ilhan Or - Boğaziçi University 195


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Rank Order Correlation Coefficient


When two variables X and Y are functionally related in a non-linear way
(such as Y = X¹¹), then the correlation coefficient may be small.
In such cases the Rank Order Correlation Coefficient is much more effective in
identifying correlations.
• Ranking of the data (what position/rank the data point takes in an ordered list
from the minimum to maximum values) is deployed.
• Suppose there are “n” data pairs (xi, yi); let (R(xi), R(yi)) be the relative
rankings of the components (xi, yi). Then, Spearman’s ρ is calculated as,
ρS = 1 − 6·Σi di² / [n·(n² − 1)], where di = R(xi) − R(yi)
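A small sketch contrasting the two coefficients on the Y = X¹¹ example; scipy's spearmanr computes Spearman's ρ from the ranks:

import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, size=200)
y = x**11                                   # monotonic but strongly non-linear

print("Pearson r  :", round(pearsonr(x, y)[0], 3))    # noticeably below 1
print("Spearman rho:", round(spearmanr(x, y)[0], 3))  # 1.0: the ranks coincide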

Ilhan Or - Boğaziçi University 196


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Ilhan Or - Boğaziçi University 197


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Statistical Inference
Given a set of observations regarding some random phenomena, estimation
(inference) of the probability distribution describing it.
• Usually, it is assumed that the distribution is one of a family of distributions f(t/θ)
parameterized by θ, and the assessment of the likely values of θ is endeavored.
• It is generally agreed that Statistical Inference should be based on the likelihood
of a parameter given the data.
• The Likelihood Function L(θ/x) for given data x, is equal to the Probability
Density Function of x given θ, f(x/θ).
In Risk Analysis and/or Assessment, the random phenomena of interest are
usually the occurrence of undesirable events and their impact levels.
The realizations and/or levels of factors that may be triggering undesirable
events or their consequences may also be additional random phenomena of
interest.

Ilhan Or - Boğaziçi University 198


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Bayesian Inference
The key principle behind Bayesian Inference is the revision of probabilistic
estimates based on current data.
• As such, Bayesian Inference allows the use of prior knowledge, which is very
important when there is little data (as is often the case).
• This “prior knowledge/expertise” of the modeler regarding the random
phenomena is contained in his estimate of a “prior distribution”.
The Bayesian viewpoint is popular in the risk analysis community.
• The parameter is considered to be stochastic in order to represent all the various
sources of uncertainty affecting incident occurrence.
• The process of updating a prior distribution to obtain a posterior distribution
gives an important role to the analyst/engineer.
– He/she can use their expertise to decide on the form of the prior distribution, then
experiments/observations/data can be deployed to update the prior.
• This combination of giving weight to experts, but still allowing for the scientific
evidence makes the approach popular.

Ilhan Or - Boğaziçi University 199


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

The Bayes’ Theorem


A partition is a collection of events A1, …, An ⊂ Ω, such that Ai ∩ Aj = ∅
whenever i ≠ j, and A1 ∪ … ∪ An = Ω.
Theorem: Suppose that B is an event and A1, …, An a partition. Then,
P(Ai | B) = P(B | Ai)·P(Ai) / Σj P(B | Aj)·P(Aj)
[Figure: a sample space Ω partitioned into A1, A2, A3, A4, A5, A6]

Ilhan Or - Boğaziçi University 200


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Example I Deploying Bayes’ Theorem


The 5 meter pipe segments used in a pipeline construction are either good (g) or bad
(b) (prone to leak oil under operating conditions), with probabilities P(g) = 0.999,
P(b) = 0.001. A simple inspection technique can be used to identify good and bad
segments and is rather effective in that good segments are identified as good with
probability 0.99, and bad segments are identified as bad with probability 0.99.
Suppose that a particular segment is identified as bad under this inspection. What is
the probability that it is actually bad?
The prior probability that it is bad is 0.001. We need to determine P(b | inspected b).
By Bayes’ Theorem:
P(b | inspected b) = (0.99 × 0.001) / (0.99 × 0.001 + 0.01 × 0.999) ≈ 0.09
so, despite the positive inspection, the segment is still far more likely to be good.
Ilhan Or - Boğaziçi University 201


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Example II Deploying Bayes’ Theorem


A particular disease strikes 0.1% of the population. A test of the disease carries a
rate of 5% false positives (i.e. test giving a positive result on a healthy person or
giving a negative result on a disease stricken person). People are tested at random,
regardless of whether they are suspected of having the disease. A particular
patient’s test turns out to be positive; what is the probability of that patient being
stricken with the disease?
The prior probability of being struck with disease is 0.001. We need to determine
P(disease | test positive). By Bayes’ Theorem:
P(disease | test positive) = (0.95 × 0.001) / (0.95 × 0.001 + 0.05 × 0.999) ≈ 0.0187
Ilhan Or - Boğaziçi University 202


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Classical Statistical Inference – Maximum Likelihood Estimation


The Maximum Likelihood Estimation is the most important classical
estimation method based on the Likelihood Function.
• Given a Likelihood Function L(θ/x), a Maximum Likelihood Estimator (MLE)
for θ is a value θ° such that,
L(θ°/x) = maxθ L(θ/x)
• MLE is the value of the parameter θ for which the data x observed has the
highest probability.
f(x/θ°) = L(θ°/x) ≥ L(θ/x) = f(x/θ)
• MLE approach is often much easier to use than Bayesian methods; in many
cases a simple formula can be used to calculate the MLE.
• MLE approach does not require the determination of a prior distribution.
• When there is a reasonable amount of data, both approaches converge to the true
value of the parameter.
– Bayesian methods will tend to “forget” the prior distribution as data increases.

Ilhan Or - Boğaziçi University 203


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Classical Statistical Inference – Maximum Likelihood Estimation


◆ If X is Binomial with parameters (n,p), then the MLE for “p” based on “r”
successes in “n” trials is (r/n).
◆ If T is distributed Exponentially, with failure rate λ, then the MLE
estimate of λ, based on “n” observations (t1, …, tn), is:
λ° = n / (t1 + … + tn)
◆ If T is Poisson with parameter λ, then the MLE estimate of λ, based on “n”
observations (t1, …, tn), is the average of the observations:
λ° = (t1 + … + tn) / n
Ilhan Or - Boğaziçi University 204


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Classical Statistical Inference – Maximum Likelihood Estimation


If X is Normal with parameters (μ, σ), then the MLE estimate of (μ, σ),
based on “n” observations (x1, …, xn), is:
μ° = (1/n) · Σi xi ;  (σ°)² = (1/n) · Σi (xi − μ°)²
In other words, for many of the well-known distributions, the statistics we


routinely use to represent key parameters are actually Maximum Likelihood
Estimators.
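A quick numerical check of these estimators on simulated data (sample sizes and true parameter values are arbitrary):

import numpy as np

rng = np.random.default_rng(5)

t = rng.exponential(scale=2.0, size=1000)    # true failure rate lam = 0.5
print("Exponential MLE n/sum(t):", round(len(t) / t.sum(), 3))

x = rng.normal(10.0, 2.0, size=1000)
# Normal MLEs: sample mean and the (1/n)-version standard deviation.
print("Normal MLE (mu, sigma):", round(x.mean(), 3), round(x.std(), 3))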

Ilhan Or - Boğaziçi University 205


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

[Figure: the shapes of the Normal density & cumulative functions under different (μ, σ) realizations/assumptions]

Ilhan Or - Boğaziçi University 206


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Statistical Inference – Selection of the Assumed Distribution


The analysis discussed above is actually quite straightforward once the
representative probability distribution is assumed correctly.
• To represent the unknown underlying real probability distribution of the random
event/process being studied.
This selection, though very critical, does not have a precise procedure.
• Rather it draws heavily from past experience, peer studies and some established
general guidelines;
• Closely observe the random environment and the related historic data to detect
trends and behaviours that indicate a good match to the basic defining
characteristics of the candidate probability distribution(s);
• Make sure you well understand the basic defining characteristics of the
candidate distributions;
• When in doubt, use more than one candidate distribution and select the best
fitting.

Ilhan Or - Boğaziçi University 207


Quantitative Modeling & Assessment of Risk
Key Statistical Analysis Tools

Statistical Inference – Basic Characteristics of Some Distributions;


The Normal Distribution
• Symmetric around the mean;
• Continuous realizations;
• Reasonably short tails (99.7% of all realizations are in the [μ-3σ , μ+3σ] interval);
• If the random event/process under study can be regarded as “sum of independent
random variables”, the Central Limit Theorem strongly suggests normality.
The Exponential Distribution
• The memoryless property;
• Unsymmetric around the mean;
• Very long right tail;
• Continuous realizations.
The Binomial Distribution
• Discrete realizations;
• Sum of independent Bernoulli (0/1) random variables.

Ilhan Or - Boğaziçi University 208


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Case 1: Example II of Bayes’ Theorem Implementation


A particular disease strikes 0.1% of the population. A test of the disease carries a
rate of 5% false positives (i.e. test giving a positive result on a healthy person or
giving a negative result on a disease stricken person). People are tested at random,
regardless of whether they are suspected of having the disease. A particular
patient’s test turns out to be positive; what is the probability of that patient being
stricken with the disease?
The prior probability of being struck with disease is 0.001. We need to determine
P(disease | test positive). By Bayes’ Theorem, as computed before, the answer is ≈ 0.0187.

Ilhan Or - Boğaziçi University 209


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

◆ Something very similar to such a situation occurs frequently due to


doctors’ perception of disease likelihoods.
• The problem presented in the last slide was actually from a quiz given to
doctors;
• More than 80% of the doctors who took the quiz answered 95% to the question
“A particular patient's test turns out to be positive; what is the probability of that
patient being stricken with the disease?”
• While, as we saw in the computations on the last slide, the correct answer is
0.0187.
◆How would you like your doctor prescribing you a medicine with
damaging side effects for a disease you were told you had, when
you may only have less than 2% probability of being affected by
it!
• Note that, it is not the doctors’ medical knowledge that is causing the problem,
but their perceptions and attitudes regarding probabilities.
◆ How can the testing reliability (and doctors’ perceptions) be improved?
Ilhan Or - Boğaziçi University 210
Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

A New Look into Case 1: Administer a Second Testing


A particular disease strikes 0.1% of the population. A test of the disease carries a
rate of 5% false positives. People are tested at random, regardless of whether they
are suspected of having the disease. A particular patient’s test turns out to be
positive, thereby increasing the probability of that patient being stricken with the
disease to 1.87%. Apply a second testing (which also turns out to be positive); what
is the probability of that patient being stricken with the disease, given that the
second test is also positive?
The prior probability of being struck with disease is now 0.0187. We need to
determine P(disease | second test positive). By Bayes’ Theorem:
P(disease | both tests positive) = (0.95 × 0.0187) / (0.95 × 0.0187 + 0.05 × 0.9813) ≈ 0.266
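The updating logic of both slides fits in a few lines; a sketch treating the 5% false positive/negative rates as a sensitivity and specificity of 0.95:

def posterior(prior, sensitivity, specificity):
    # P(disease | positive test), by Bayes' Theorem.
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

p = 0.001                                   # base rate of the disease
for test in (1, 2):                         # two successive positive tests
    p = posterior(p, sensitivity=0.95, specificity=0.95)
    print(f"after positive test {test}: P(disease) = {p:.4f}")
# -> about 0.0187 after the first test, about 0.27 after the second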

Ilhan Or - Boğaziçi University 211


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Case 2: Fallacies of Causal Thinking - A Cancer Study


A study of the incidence of kidney cancer in the 3,141 counties of the
U.S.A revealed a remarkable pattern. The counties in which the
incidence of kidney cancer was lowest were mostly rural, sparsely
populated, and located in the Midwest, the South, and the West.
• What do you make of this?
Now consider the counties in which the incidence of kidney cancer
was highest. These ailing counties also tended to be mostly rural,
sparsely populated, and located in the Midwest, the South, and the
West.
What is going on?

Ilhan Or - Boğaziçi University 212


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Imagine an urn filled with marbles; half red, half white. Next,
imagine blindly drawing 4 marbles from the urn, recording the
number of reds in the sample, throwing the balls back into the urn.
If such trials are repeated many times, we will find the outcome (2 red, 2 white)
occurring (almost) 6 times as often as the outcome (4 red), and likewise (4 white).
Suppose two samplings are done from the same urn. In the first, 4
marbles, in the second 7 marbles are drawn on each trial. In both
cases the occurrence of extreme samples (all white or all red) is
recorded.
If this is repeated long enough, it will be observed that extreme outcomes
occur 8 times more often in the first sampling.
• Expected percentages being 12.5% and 1.56%.
This statistical fact is relevant to the cancer example.

Ilhan Or - Boğaziçi University 213


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Imagine the population of the USA as marbles in a giant urn. Some


marbles are marked KC, for kidney cancer.
You draw samples of marbles and populate each county in turn.
Rural samples are smaller than other samples.
Just as in the marble game, extreme outcomes (very high and/or very low
cancer rates) are most likely to be found in sparsely populated
counties.
The following two statements mean exactly the same thing:
• Large samples are more precise than small samples.
• Small samples yield extreme results more often than large samples do
Various studies have shown that even sophisticated researchers have
poor intuitions and a wobbly understanding of sampling effects.

Ilhan Or - Boğaziçi University 214


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Somewhat similar to the “Cancer Study”, the issue of “more


frequent extreme outcomes” also blurred the Corona19 vaccine tests:
• As of January 25, 2021, Phase 3 tests of various Corona19 vaccines
were not yet completed (i.e the full sample was not yet processed).
• Nevertheless, some countries/authorities announced “pre-results”,
meaning results from “smaller (uncompleted) samples from individual
countries”.
• So, like the “cancer outcomes” in small counties, these small samples
from individual countries featured more frequent extreme results, such
as “very high effectiveness rates” or “very low effectiveness rates”.
◆ So came Dr. Mehmet Ceyhan’s comment on Cumhuriyet:
• “The number of subjects used in the phase-3 study is already low; if, on top of
that, you announce it country by country, in bits and pieces, the data announced
for individual countries will not be reliable.”

Ilhan Or - Boğaziçi University 215


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Case 3: Fallacies of Causal Thinking - Newborn Babies


u Our inclination for causal thinking exposes us to serious mistakes in
evaluating the randomness of truly random events.
• Take the sex of six babies born in sequence at a hospital and consider
three possible sequences:
BBBGGG ; GGGGGG ; BGBBGB
• Are the sequences equally likely?
• BGBBGB is judged much more likely than the other two sequences.
• We are pattern seekers, believers in a coherent world, in which
regularities appear not by accident but as a result of mechanical
causality or intention.

Ilhan Or - Boğaziçi University 216


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Case 4: People See Patterns Where None Exists:


u During the intensive bombing of London in WW II, it was believed
that the bombing could not be random because a map of the hits
revealed conspicuous gaps.
• Some suspected that German spies were located in the unharmed areas.
u A careful statistical analysis revealed that the distribution of hits was
typical of a random process.
• and typical in evoking an impression that it was not random.
u To the untrained eye, randomness appears as regularity or tendency
to cluster; the tendency to see patterns in randomness is great.
u We are far too willing to reject the belief that much of what we see
in life is random.

Ilhan Or - Boğaziçi University 217


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Case 5: Fallacies of Causal Thinking - Good & Bad Performers


Inclination for causal thinking exposes us to serious mistakes in
evaluating the randomness of truly random events.
• An education expert was telling a group of flight instructors about the merits of
rewarding improved performance, rather than punishing mistakes.
• One of the seasoned instructors in the group raised an objection and denied that
this approach was optimal for flight cadets; he said:
• Many times I have praised flight cadets for clean execution of some aerobatic
maneuver. The next time they try the same maneuver they usually do worse.
• On the other hand, when I punish a cadet for bad execution, in general he does
better on his next try.
• So, don’t tell us that reward works and punishment does not, because the opposite
is the case.
What is going on?

Ilhan Or - Boğaziçi University 218


Quantitative Modeling & Assessment of Risk
Understanding Probability & Randomness

Basic observations of the instructor were correct:


• Occasions on which he praised performance were likely to be followed by a
disappointing performance, and punishments were typically followed by an
improvement.
But the inferences made about the effectiveness of reward and
punishment were incorrect.
• What was observed is known as “tendency (regression) to the mean”, which is mostly due to
random fluctuations in performance quality.
• A cadet whose performance is far better than average gets a praise.
• But he is probably just lucky on that particular attempt and therefore likely to
deteriorate regardless of whether or not he is praised.
• Similarly, a cadet whose performance is far worse than average gets punished.
• But he is probably just unlucky on that attempt and likely to improve regardless of
what the instructor did.
The instructor attached a causal interpretation to normal fluctuations
of a random process.
Ilhan Or - Boğaziçi University 219
Understanding Probability & Randomness

Case 6: Role of the Difference Between Individual & Community


Variances Regarding Impacts of Risk Mitigation Measures
In many risk environments the impacts of various risk mitigation
measures on individuals (and the community) are random.
•For individuals, expected impact (of Risk Mitigation Measures) may be quite favorable
(considerably decreasing expected level and/or likelihood of negative outcomes).
•However, due to the randomness and variance of the impact, some individual
realizations might actually increase the level and/or likelihood of negative outcomes.
•For the Community as a whole, extreme outcomes (of Risk Mitigation Measures)
(highly decreasing/increasing level and/or likelihood of negative outcomes on
individuals) cancel one another, thereby greatly reducing the variance of risk
mitigation measures on the Community.
• Such Risk Mitigation Measures may be quite beneficial for the Community,
• While rejected/avoided by some risk averse individuals.

Ilhan Or - Boğaziçi University 220


Understanding Probability & Randomness

Case 6: Role of the Difference Between Individual & Community


Variances Regarding Impacts of Risk Mitigation Measures
Examples
• Vaccination Drives for some specific ailments: Featuring negligible negative
impacts for the Community as a whole, while being very beneficial for the
overall Community (by reducing the impact of the malady).
– Their possible negative impacts on individuals may be regarded as unacceptable
by some (e.g. recent resistance to smallpox vaccine by some parents).
• Hurricane/Flood Warning Induced Evacuations: They also feature negligible
negative impacts for the Community as a whole, while being highly beneficial for the
overall Community (reducing the potential damage of the expected Flood/Hurricane).
– Their possible negative impacts on individuals may be regarded as unacceptable
by some.
– Especially those whose residences are relatively safer, while the journey to the
disaster shelters and living conditions there might have its own risks.

Ilhan Or - Boğaziçi University 221


Understanding Probability & Randomness

◆ A Very Current Issue: Influenza Vaccination – Useful or Harmful?

Ilhan Or - Boğaziçi University 222


Understanding Probability & Randomness
Societal Risk versus Individual Risk in Vaccination

◆ Individual Risks in Vaccination


P{Getting Sick | Vaccination} < P{Getting Sick | No Vaccination}
P{Major Negative Impacts | Vaccination} < P{Major Negative Impacts | No Vaccination}
P{Side Effects of Vacc. | Vacc.} > P{Side Effects of Vacc. | No Vacc.}
◆ Societal Risks in Vaccination
E[Number of Cases | 65% Vaccination] << E[Number of Cases | No Vaccination]
E[Societal (−) Impacts | 65% Vaccination] << E[Societal (−) Impacts | No Vaccination]
◆Accordingly,
– Prof. Mustafa Çetinel: “We cannot come to a healthy conclusion if we approach the
issue from the individual’s perspective”
– Dr. Mehmet Bozkurt: “Getting individually vaccinated is our duty to the society”

Ilhan Or - Boğaziçi University 223


Understanding Probability & Randomness

Case 7: Survivorship Bias


In some risk environments, the damage likelihood
and impact characteristics are evaluated by
analyzing only past “surviving” samples. This is
called “Survivorship Bias”.
Naturally, this way, risk may be underestimated
and risk reduction measures may be distorted.
• In the Second World War, the Allies tried to strengthen the
planes’ bodies against anti-aircraft fire by examining the
damage location, type and size on the planes after their
return from a mission.
• Fortunately, the statistician Abraham Wald cautioned that
this approach totally ignored the damage characteristics on
the downed planes, which should be the main issue.

Ilhan Or - Boğaziçi University 224


Understanding Probability & Randomness

Case 7: Survivorship Bias


Actually “Survivorship Bias” is one form of a more general problem:
possible bias in the sample data, which is a flaw experienced in diverse
statistics implementations.
• A psychology professor doing research on the driving habits & alcohol consumption
of young people, selecting all his/her samples from his/her university students.
• Yılmaz Özdil in Sözcü, April 2020: “If you conduct heart-attack research in
Norway, you may reach the conclusion that ‘99% of heart attack victims are
blond’, and you can produce statistical data proving this research.”
• A medical researcher doing research on the effects of a certain new drug or therapy,
selecting all his/her samples from one hospital.
• Yılmaz Özdil in Sözcü, April 2020: “The most Corona cases are in Istanbul,
because more than 80% of the tests are performed in Istanbul. In Manisa, for
example, cases are few, because remarkably few tests are performed there.”

Ilhan Or - Boğaziçi University 225


Understanding Probability & Randomness

Case 7: Survivorship Bias


• A traffic study, researching the possible causes of traffic accidents on different road
segments, focusing only on the vehicles involved in accidents.
Suppose, for some reason, trucks traverse a particular road segment with much higher
frequency than other types of vehicles. Then, naturally, the number of trucks
involved in accidents will be much higher in that particular road segment. The
sample bias might then induce the traffic researcher to conclude that this particular
road segment is more accident prone for trucks than other vehicle types.

Ilhan Or - Boğaziçi University 226


Understanding Probability & Randomness

Lessons Learned
There is a strong tendency to believe that small samples resemble the
population from which they are drawn.
• We are prone to exaggerate consistency and coherence of what we see.
Statistics produce many observations that appear to beg for causal
explanations; but do not lend themselves to such explanations.
Many facts of the world are due to chance, including accidents of sampling.
Causal explanations of chance events are inevitably wrong.
Probabilistic assessments on key issues are biased regarding “cost of being
wrong” and “publicity” considerations.
Difference between Individual & Community Variances regarding impacts of
Risk Mitigation Measures might create a natural conflict of interest.
Unbiasedness of the sample data is very important.

Ilhan Or - Boğaziçi University 227


Quantitative Modeling & Assessment of Risk
Fault Trees

Fault Trees (F-T)


It is an analytical technique where an undesired state (e.g. a ship collision,
collapsed bridge) of a system is specified and the system is analyzed to find
all credible (or incredible) ways in which the undesired event can occur.
• They are useful for focusing attention on what might go wrong and why.
A Fault Tree works with “backward logic”. Given a particular failure of a
system (the undesirable event), the component failures which contribute to
system failure are sought.
• The undesirable event is placed at the root and the tree is constructed out of that
event (moving left or down) with the possible immediate events that could have
made that outcome arise, continuing further with the possible events that could
have made the first set of events arise, and so on.
• Component failures giving rise to the root event are identified through
(combinations of) Boolean operations (and, or and not).

Ilhan Or - Boğaziçi University 228


Quantitative Modeling & Assessment of Risk
Fault Trees

Fault Trees
All root, intermediary & basic events of a Fault Tree have binary character.
• The corresponding indicator variable Xi= 1 in case of success, Xi= 0 otherwise.
The system being represented must be Coherent.
• The system as a whole cannot improve when one or more
subsystems/components fail.
Hospital System Power Failure Case
G: Event of generator failing
S: Event of switch failing
B: Event of battery failing
[Figure: fault tree for the root event “System Power Failure” — an “and” gate combining “Generator Failure” with an “or” gate over “Switch Failure” and “Battery Failure”; the legend also introduces the “and” and “or” gate symbols]

Ilhan Or - Boğaziçi University 229


Quantitative Modeling & Assessment of Risk
Fault Trees

External Boundaries (limits of the system): Determination of the factors


that could be of influence in the root event. In the security system example:
• Should the oil tank supplying oil to the generator be included?
• Should the oil transporter be included?
• Should the case of both generator and switch failing because of a common
cause (struck by lightning) be considered?
Internal Boundaries (depth of the model): In how much detail should the
system be studied? In the security system example:
• Should the generator be split into its major components?
The choice of resolution depends on two factors:
i. If some design changes should be pursued as a result of the F-T analysis, the basic
components should be those which can be moved or replaced in the new design.
ii. The level of resolution should be such that good data about the reliability of the
addressed components is either available or can be compiled.
Temporal Boundaries (model is static): Systems change and/or behave
differently through the course of time or under different circumstances.

Ilhan Or - Boğaziçi University 230


Quantitative Modeling & Assessment of Risk
Fault Trees

[Figure: standard Fault Tree symbols]
• AND Gate
• OR Gate
• Intermediate Event
• Transfer Out
• Basic Event (failure at the lowest level)
• Priority AND Gate (output if and only if the inputs occur in a given order)
• Exclusive OR Gate (output only if exactly one of the inputs is active)
• m Out of n Gate (output if and only if at least m of the n inputs are active)
• Transfer In (the tree is developed further elsewhere)
• Inhibit (output occurs if and only if the single input occurs in the presence of a conditioning event)
• External Event

Ilhan Or - Boğaziçi University 231


Quantitative Modeling & Assessment of Risk
Fault Trees

Guidelines for Developing Fault Trees


Replace an abstract event with a less abstract event;
Classify an event into more elementary events;
Identify distinct causes for an event;
Couple trigger event with lack of protective action;
Try to find co-operative causes for an event;
Pinpoint a component failure;
Complete the gate (fully describe all inputs to a particular gate before
developing any of the inputs further);
No gate-to-gate connections (the input to a gate should always be properly
defined fault event and not another gate).

Ilhan Or - Boğaziçi University 232


Quantitative Modeling & Assessment of Risk
Fault Trees

Hints for Developing Fault Trees


All the more complicated gate symbols can be constructed with the basic
and, or and not symbols.
[Figure: two pairs of equivalent fault trees —
(i) “Operator fails to shut down system” decomposed over “Alarm sounds” and “Operator pushes wrong switch when alarm sounds”, drawn in two equivalent ways;
(ii) “Partial loss of power” drawn with an exclusive “or” gate over “Generator 1 fails” and “Generator 2 fails”, and equivalently as an “or” gate over (“Generator 1 fails” and “Generator 2 operational”) and (“Generator 1 operational” and “Generator 2 fails”)]

Ilhan Or - Boğaziçi University 233


Quantitative Modeling & Assessment of Risk
Fault Trees – Reactor System Example

[Figure: schematic diagram of the reactor protection system — components P1, P2, A, S1, S2, E1, E2 arranged around the reactor tank]

Ilhan Or - Boğaziçi University 234


Quantitative Modeling & Assessment of Risk
Fault Trees – Reactor System Example
[Figure: fault tree for the reactor protection system —
Root T = G1: “and” gate over “P1 does not function” (G2) and “P2 does not function” (G3);
G2: “or” gate over “P1 fails”, “E1 fails” and “no signal to P1” (G4);
G3: “or” gate over “P2 fails”, “E1 fails” and “no signal to P2” (G4);
G4: “or” gate over “A fails” and “no signal to A” (G5);
G5: “and” gate over G6 and G7;
G6: “or” gate over “S1 fails” and “E2 fails”; G7: “or” gate over “S2 fails” and “E2 fails”]
Ilhan Or - Boğaziçi University 235
Quantitative Modeling & Assessment of Risk
Fault Trees - Waterway Example

[Figure: schematic diagram of the water level measurement system — WLM and WMP units on the North and South banks, upstream and downstream of the waterway, linked through a conduit to the operation buildings North and South]

Ilhan Or - Boğaziçi University 236


Quantitative Modeling & Assessment of Risk
Fault Trees - Waterway Example
[Figure: fault tree for the root event “No difference measurement available” (G1), decomposed through gates G2-G11 (“No info at South/North op. building”, “No upstream/downstream measurement”, “No signal from North/South bank”) into the basic events NU, SU, ND, SD and the conduit C]

Ilhan Or - Boğaziçi University 237


Quantitative Modeling & Assessment of Risk
Fault Trees

Structure Functions
The Root Event of a Fault Tree can be represented by an indicator variable XR which
is a Boolean Function of the Boolean Variables X1, …, Xn describing the states of the
n events of the system.
XR = Φ(X1, …, Xn )
This function is called a Structure Function and incorporates all the causal
relationships leading to the root event.
• The Structure Function of a Series System: Φ(X1, …, Xn) = X1 · X2 · … · Xn = min(X1, …, Xn)
• The Structure Function of a Parallel System: Φ(X1, …, Xn) = 1 − (1−X1)·(1−X2)·…·(1−Xn) = max(X1, …, Xn)
Formal definition of Coherent Systems:
i. Φ(X1=1, …, Xn=1) = 1 (when all components are in success state, the system is successful);
ii. Φ(X1=0, …, Xn=0) = 0 (when all components are failed, the system is failed);
iii. Φ(X) ≥ Φ(Y) for X ≥ Y (componentwise)

Ilhan Or - Boğaziçi University 238


Quantitative Modeling & Assessment of Risk
Fault Trees

Minimal Cut and Path Sets for Coherent Systems


A Cut Set is a collection of basic events such that if these events occur together,
then the root event will certainly occur.
A Minimal Cut Set is a collection of basic events forming a cut set such that if any
one of these basic events is removed, than the remaining set is no longer a cut set.
In the Security System Fault Tree the Minimal Cut Sets are
• {G, B} and {G, S}
In the Reactor Protection System Fault Tree the Minimal Cut Sets are
• {P1, P2} , {E1}, {E2}, {A} , {S1, S2}
A Path Set is a collection of basic events such that if none of these events occur,
then the root event will certainly not occur.
A Minimal Path Set is a collection of basic events forming a path set such that if
any of these basic events is removed, then the remaining set is no longer a path set.
In the Security System Fault Tree the Minimal Path Sets are
• {G} and {S, B}
In the Reactor Protection System Fault Tree the Minimal Path Sets are
• {P1, E1, E2, A, S1} , {P1, E1, E2, A, S2} , {P2, E1, E2, A, S1} , {P2, E1, E2, A, S2}
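A minimal sketch of how minimal cut sets encode the structure function of a coherent system: the root (failure) event occurs exactly when every basic event of some minimal cut set has occurred (security system example):

from itertools import product

cut_sets = [{"G", "B"}, {"G", "S"}]     # minimal cut sets of the security system

def root_event(failed):
    # The root event occurs iff some minimal cut set is entirely failed.
    return any(cs <= failed for cs in cut_sets)

# Enumerate all 2^3 states of the basic events G, S, B.
for state in product([0, 1], repeat=3):
    failed = {name for name, bit in zip(("G", "S", "B"), state) if bit}
    print(sorted(failed) or ["none"], "->",
          "root occurs" if root_event(failed) else "no root event")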

Ilhan Or - Boğaziçi University 239


Quantitative Modeling & Assessment of Risk
Fault Trees
Boolean Algebra
A basic event in a fault tree can be represented by a Boolean (binary) variable.
There are two binary operators “and” and “or” and one unary operator “not”.

A fault tree is a pictorial representation of a Boolean expression.

◆ Key Laws in Boolean Algebra

Commutative Laws: X · Y = Y · X
X + Y = Y + X
Associative Laws: X · (Y · Z) = (X · Y) · Z
X + (Y + Z) = (X + Y) + Z
Distributive Law: X · (Y + Z) = (X · Y) + (X · Z)
Idempotent Laws: X · X = X
X + X = X
Absorption Law: X + X · Y = X
Complementation: X + X’ = Ω
(X’)’ = X
De Morgan’s Laws: (X · Y)’ = X’ + Y’
(X + Y)’ = X’ · Y’

Ilhan Or - Boğaziçi University 240


Quantitative Modeling & Assessment of Risk
Fault Trees
Boolean Algebra
In the reactor protection case, if we use each symbol (R, P1, E1, etc.) to denote the
event of failure of the associated component and Gi to denote the output at Gate i,
R = G1
= G2 · G3
= (P1 + E1 + G4) · (P2 + E1 + G4)
= (P1 + E1 + A + G5) · (P2 + E1 + A + G5)
= (P1 + E1 + A + G6 · G7) · (P2 + E1 + A + G6 · G7)
= [P1 + E1 + A + (S1+E2) · (S2+E2)] · [P2 + E1 + A + (S1+E2) · (S2+E2)]
= P1 · P2 + E1 + E2 + A + S1 · S2

[Figure: cut set fault tree representation of the reactor protection case — the root is an “or” gate over the minimal cut sets P1·P2, E1, E2, A and S1·S2]

Ilhan Or - Boğaziçi University 241


Quantitative Modeling & Assessment of Risk
Fault Trees

Computational Details in Boolean Algebra


[P1 + E1 + A + (S1+E2)·(S2+E2)]
= [P1 + E1 + A + (S1)·(S2+E2) + (E2)·(S2+E2)]
= [P1 + E1 + A + (S1·S2) + (S1·E2) + (E2·S2) + (E2·E2)]
= [P1 + E1 + A + (S1·S2) + (S1·E2) + (E2·S2) + (E2)]
= [P1 + E1 + A + (S1·S2) + (S1·E2) + (E2)]
= [P1 + E1 + A + (S1·S2) + (E2)]

Similarly, [P2 + E1 + A + (S1+E2)·(S2+E2)] = [P2 + E1 + A + (S1·S2) + (E2)]

Expanding [P1 + E1 + A + (S1·S2) + (E2)] · [P2 + E1 + A + (S1·S2) + (E2)] term by term:

(P1)·[P2+E1+A+(S1·S2)+(E2)] = (P1·P2)+(P1·E1)+(P1·A)+(P1·S1·S2)+(P1·E2)
+ (E1)·[P2+E1+A+(S1·S2)+(E2)] = (E1·P2)+(E1·E1)+(E1·A)+(E1·S1·S2)+(E1·E2)
+ (A)·[P2+E1+A+(S1·S2)+(E2)] = (A·P2)+(A·E1)+(A·A)+(A·S1·S2)+(A·E2)
+ (S1·S2)·[P2+E1+A+(S1·S2)+(E2)] = (S1·S2·P2)+(S1·S2·E1)+(S1·S2·A)+(S1·S2·S1·S2)+(S1·S2·E2)
+ (E2)·[P2+E1+A+(S1·S2)+(E2)] = (E2·P2)+(E2·E1)+(E2·A)+(E2·S1·S2)+(E2·E2)
Ilhan Or - Boğaziçi University 242
Quantitative Modeling & Assessment of Risk
Fault Trees
Computational Details in Boolean Algebra
[P1 + E1 + A + (S1 · S2) + (E2)] · [P2 + E1 + A + (S1 · S2) + (E2)]
= (P1·P2) + (P1·E1) + (P1·A) + (P1·S1·S2) + (P1·E2) +
(E1·P2) + (E1) + (E1·A) + (E1·S1·S2) + (E1·E2) +
(A·P2) + (A·E1) + (A) + (A·S1·S2) + (A·E2) +
(S1·S2·P2) + (S1·S2·E1) + (S1·S2·A) + (S1·S2) + (S1·S2·E2) +
(E2·P2) + (E2·E1) + (E2·A) + (E2·S1·S2) + (E2)
= (P1·P2) + (P1·E1) + (P1·A) + (P1·S1·S2) + (P1·E2) +
(E1·P2) + (E1) + (E1·S1·S2) + (E1·E2) +
(A·P2) + (A) + (A·S1·S2) + (A·E2) +
(S1·S2·P2) + (S1·S2·E1) + (S1·S2·A) + (S1·S2) +
(E2·P2) + (E2·E1) + (E2·A) + (E2)
= (P1·P2) + (P1·E1) + (P1·A) + (P1·S1·S2) + (P1·E2) + (E1) + (A) + (S1·S2) + (E2)
= (P1·P2) + (P1·E1) + (E1) + (P1·A) + (A) + (P1·S1·S2) + (S1·S2) + (P1·E2) + (E2)
= (P1·P2) + (E1) + (A) + (S1·S2) + (E2)   (by repeated absorption, X + X·Y = X)
Ilhan Or - Boğaziçi University 243
Quantitative Modeling & Assessment of Risk
Fault Trees
Estimating the Probability of the Root Event
Suppose R is the Root Event and that C1, …, Cn are the Minimal Cut Sets.
We know that R occurs if and only if at least one of the minimal cut sets occurs, so
P(R) = P(C1 ∪ C2 ∪ … ∪ Cn)
Upper and Lower Bounds for P(R):
max_i P(Ci) ≤ P(R) ≤ Σ_i P(Ci)
Rare Event Approximation for P(R):
P(R) ≈ Σ_i P(Ci), accurate when the individual cut set probabilities are small.
Usually it is assumed that the basic events in a Cut Set occur independently.
• The probability of a Cut Set is then the product of the probabilities of its Basic Events.
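For small trees the exact probability, the bounds and the rare event approximation can all be computed directly. The following illustrative sketch (not from the notes) uses inclusion–exclusion over the minimal cut sets, assuming independent basic events:

```python
from itertools import combinations
from math import prod

def p_cut(events, p):
    """P(all events in the set occur), assuming independence."""
    return prod(p[e] for e in events)

def p_root_exact(cut_sets, p):
    """P(C1 u C2 u ... u Cn) by inclusion-exclusion over the cut sets."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        for combo in combinations(cut_sets, k):
            joint = set().union(*combo)      # events needed by all k cut sets
            total += (-1) ** (k + 1) * p_cut(joint, p)
    return total

# Security System example: minimal cut sets {G,B} and {G,S}, each event p = 0.1
p = {"G": 0.1, "B": 0.1, "S": 0.1}
cuts = [{"G", "B"}, {"G", "S"}]
exact = p_root_exact(cuts, p)
upper = sum(p_cut(c, p) for c in cuts)       # also the rare event approximation
lower = max(p_cut(c, p) for c in cuts)
print(f"exact = {exact:.4f}, bounds = [{lower:.4f}, {upper:.4f}]")
# exact = 0.0190, bounds = [0.0100, 0.0200]
```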
Ilhan Or - Boğaziçi University 244
Quantitative Modeling & Assessment of Risk
Fault Trees
Estimating the Probability of the Root Event
◆ Consider the System Power Failure probability in the example, where the Minimal Cut Sets are: {G, B} and {G, S}
• Suppose events G, B, and S occur independently with probability 0.1
• The Rare Event Approximation gives P(R) ≈ P(G)P(B) + P(G)P(S) = 0.01 + 0.01 = 0.02
• While the exact probability is P(R) = P(G) · [P(B) + P(S) − P(B)P(S)] = 0.1 × 0.19 = 0.019
[Fault tree: System Power Failure = Generator Failure AND (Switch Failure OR Battery Failure)]
Ilhan Or - Boğaziçi University 245
Quantitative Modeling & Assessment of Risk
Fault Trees
Advantages
• Modeling via few, simple logic operations;
• Directing the analysis to ferret out failures;
• Focusing on one Root Event of interest at a time;
• Pointing out the aspects of the system important to failure;
• Providing a graphical communication tool that is easy to understand and whose analysis is transparent;
• Providing an insight into system behavior;
• Through minimal cut sets, providing a synthetic result enabling the identification of critical components.
Ilhan Or - Boğaziçi University 246
Quantitative Modeling & Assessment of Risk
Event Trees
Event Trees
Event Trees follow a “forward logic”: they begin with an initiating event (an abnormal incident) and “propagate” this event through the system under study by considering all possible ways in which it can affect the behavior of the (sub)systems.
Nodes of an Event Tree represent the possible functioning or
malfunctioning of a (sub)system.
• The intervention (or not) of protection systems which are supposed to take
action for the mitigation of the abnormal incident (accident/failure);
• The fulfillment (or not) of safety functions;
• The occurrence (or not) of physical phenomena (fires, dispersion etc.).
A path through an Event Tree resulting in an accident is called an accident
sequence.
• Accident sequences are quantified in terms of their probability of occurrence.
• Different endpoints of a tree can give the same consequences.
Ilhan Or - Boğaziçi University 247
Quantitative Modeling & Assessment of Risk
Event Trees
[Example event tree diagram] Initiating event: Tube Rupture with Release of Burnable Liquid; the mitigating systems include Flow Interception and the Cooling / Quenching Tanks, and a jet fire is among the possible outcomes. Each system branches into Success / Failure, producing the eight accident sequences IS1S2S3, IS1S2F3, IS1F2S3, IS1F2F3, IF1S2S3, IF1S2F3, IF1F2S3 and IF1F2F3.
Ilhan Or - Boğaziçi University 248
Quantitative Modeling & Assessment of Risk
Event Trees
Event Trees
Event Trees begin with a defined accident/failure (initiating) event
• There is one Event Tree for each different accident/failure (initiating) event
considered.
• Similar initiating events may be grouped and only one representative event in
each group may be investigated in detail.
Once an initiating event is defined, all the safety functions that are required
to mitigate the accident must be defined and organized according to their
time of intervention.
The logical order of the required functions must also be accounted for.
• If the successful fulfillment of a given function is dependent on the fulfillment of another one, the tree needs to be oriented such that the dependent functions follow those upon which they depend;
• System dependencies can be Functional – failure of intervention of one system renders the intervention of a successive one ineffective (or increases its failure likelihood);
• Or Structural – if the systems share some common parts or flows, so that failure of that part makes them both fail.
Ilhan Or - Boğaziçi University 249
Quantitative Modeling & Assessment of Risk
Event Trees
Event Tree Evaluation
Event Tree evaluation focuses on the computation of the conditional probabilities associated with individual branches.
• The value aimed at is the conditional probability of the occurrence of the event at a given node, given that the events which preceded it on that sequence have occurred.
• Multiplication of the conditional probabilities of the branches along a sequence gives the probability of that sequence. For the two-system tree sketched on this slide:
IS1S2: P(S2/S1,I) × P(S1/I) × P(I)
IS1F2: P(F2/S1,I) × P(S1/I) × P(I)
IF1S2: P(S2/F1,I) × P(F1/I) × P(I)
IF1F2: P(F2/F1,I) × P(F1/I) × P(I)
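The sequence arithmetic is easy to mechanize. The sketch below is illustrative only: the branch probabilities are invented numbers, and the functional dependency of System 2 on System 1 is expressed through the two conditional values.

```python
from itertools import product

P_I = 1e-3                        # P(I): probability of the initiating event (assumed)
P_S1 = 0.95                       # P(S1 | I) (assumed)
P_S2 = {"S1": 0.90, "F1": 0.60}   # P(S2 | S1, I) and P(S2 | F1, I) (assumed;
                                  # the drop after F1 models a functional dependency)

sequences = {}
for o1, o2 in product(("S1", "F1"), ("S2", "F2")):
    p1 = P_S1 if o1 == "S1" else 1.0 - P_S1
    p2 = P_S2[o1] if o2 == "S2" else 1.0 - P_S2[o1]
    sequences["I" + o1 + o2] = P_I * p1 * p2      # chain of conditional probabilities

for name, prob in sequences.items():
    print(f"{name}: {prob:.2e}")
print("total =", sum(sequences.values()))          # sums back to P(I)
```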
Ilhan Or - Boğaziçi University 250
Quantitative Modeling & Assessment of Risk
Event Trees
Event Tree Evaluation
Fault Tree modeling and evaluation can be deployed in order to determine the individual event failure probabilities.
[Same two-system event tree as on the previous slide; each branch probability can be obtained from a supporting fault tree]
Ilhan Or - Boğaziçi University 251
Quantitative Modeling & Assessment of Risk
Event Trees
Event Tree Evaluation
In case of two-way dependencies between the events in a sequence, one approach to follow is to redefine the events in such a way as to eliminate the dependency of the earlier event on the later one.
• This approach is called Event Tree with Boundary Conditions.
• Functions/actions in the second event upon which the success of the first event depends are identified and redefined as explicit events preceding the first event.
Suppose S1 and S2 are two consecutive intervention systems to some initiating event I, such that S1 needs the pumps of S2 to operate.
• Then these pumps can be redefined as a separate event, S3, while S2* refers to the remaining functions/actions in S2 (i.e. without the pumps).
[Event tree with the pump event S3 preceding S1 and S2*; for example,
IS3S1S2*: P(S2*/S1,S3,I) × P(S1/S3,I) × P(S3/I) × P(I)]
Ilhan Or - Boğaziçi University 252
Event Tree Of The Çöllolar Open Coal Mine Case
◆ Çöllolar open coal mine in the Afşin-Elbistan region is one of the largest such mines in Turkey, annually supplying around 10 million tons of lignite to the nearby thermal power plant.
◆ On February 10, 2011, a major landslide occurred on the eastern wall of the Çöllolar mine. 10 miners perished in this disaster, which occurred along the full 1150 meter length and 140 meter height of the eastern wall and carried 50 million m3 of material.
◆ Experts examining the disaster site came up with the following findings:
▪ The coal layers were not continuous and featured almost vertical discontinuities.
▪ There were soft clay layers sandwiched between coal & other layers, sloped towards the mine.
▪ The slopes of the site walls were far steeper (35 m. steps with 55° walls and narrow step widths) than those of the neighboring Kışlaköy mine (35 m. steps with 30° walls) and than the general 16° slope suggested by the geological consulting company REGmbH.
▪ The nearby Hurman stream greatly enriched the underground water stocks and found its way to the mining area.
Ilhan Or - Boğaziçi University 253
Event Tree Of The Çöllolar Open Coal Mine Case
▪ There were a number of drainage wells in the mine area to drain the underground water; but they were not well maintained.
▪ As the excavations continued, many stress cracks appeared in the high grounds around the mine; the management had the cracks filled with ash.
▪ In the days right before the landslide, the ash level in some of the cracks displayed around 20 cm. depressions.
▪ The experts believe that the above factors all contributed to the landslide disaster.
Ilhan Or - Boğaziçi University 254
Event Tree of the Çöllolar Open Coal Mine Accident
[Event tree diagram] Initiating event I: steep open mine walls. Branching factors, in order:
• Coal layer: S1 = continuous, F1 = non-continuous;
• Soft clay layers: S2 = absent, F2 = present;
• Drainage wells: S3 = effective, F3 = none/non-effective;
• Stress cracks: S4 = none, F4 = significant.
The Success/Failure combinations yield the sixteen accident sequences IS1S2S3S4 through IF1F2F3F4.
Ilhan Or - Boğaziçi University 255
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case
◆ On February 24, 2009, a Turkish Airlines Boeing 737-800 aircraft, TK1951, flying from Istanbul Atatürk Airport to Amsterdam Schiphol Airport, with 135 passengers and 8 crew members, crashed 750 meters short of the airport runway, at 11:50 local time. The aircraft split into three sections, but there was neither fire nor explosion. In the accident 9 crew members and passengers (including the three pilots) died and there were around 50 injuries, half being quite serious.
◆ The weather conditions at the time of the incident were cloudy, with negligible wind. The communication between the tower and the aircraft had proceeded normally up until the incident, which seemed to occur without any warning whatsoever. The aircraft was approaching the final stages of a routine landing and the pilots had not reported anything unusual. The tower had tightly squeezed this landing in between two other landings (thus the angle of descent was a bit steeper than usual).
◆ The survivors of the accident said that they did not notice anything unusual until the crash.
◆ Some indicated that the landing speed seemed to be faster than usual, but there was no loss of control.
◆ Some claimed that the aircraft had stalled while flying.
◆ One survivor said he felt a shaking, then a slight sense of uplifting, but then came the sudden and unexpected crash.
◆ The crash angle was very small (as if the aircraft was flying parallel to the ground at an extremely low altitude).
◆ The Dutch authorities initiated rescue efforts, including 100 emergency personnel, 60 ambulances and 5 helicopters.
Ilhan Or - Boğaziçi University 256
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case
◆ Claim 1: Pilotage error.
◆ Claim 2: Lack of experience of the rookie pilot.
◆ Claim 3: Left engine tore off.
◆ Claim 4: Both engines stalled.
◆ Claim 5: A bird was sucked into the engine.
◆ Claim 6: There was no fuel left.
◆ Claim 7: Aircraft entered an air turbulence.
◆ Claim 8: Automatic pilot was not activated.
◆ Claim 9: Icing in the tail impaired maneuvering and landing.
◆ Claim 10: Pilots sacrificed themselves by hard braking to avoid a crash landing on a highway.

◆ Civil aviation experts acknowledged a defective altimeter, and that the pilots realized this too late. Auto-pilot landing was also confirmed, with low visibility conditions most probably being the primary reason for this choice.
◆ The Dutch Department of Aviation Safety declared that the aircraft was on auto-pilot at the time of the incident and that a faulty altimeter led to loss of speed right before the incident.
◆ Deployment of the auto-pilot system at landing is quite routine in many airlines and is decided upon by the pilots.
◆ According to the black box, at a height of 1950 ft the left altimeter suddenly indicated a change in altitude (from 1950 to 8 ft) and passed this onto the auto-pilot. This change had a direct impact upon the auto-throttle system which provides engine power.
Ilhan Or - Boğaziçi University 257
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case
◆ It is not clear whether the crew knew about the left altimeter problem before. But they were notified that something was amiss via the warning signal “landing gear must go down”. This signal was not regarded to be a serious problem.
◆ Due to the left altimeter problem, the pilots transferred the auto-pilot system to the controls on the right; unfortunately, even though the pilots now referred to the right altimeter, the auto-throttle continued to rely on the left altimeter.
◆ The auto-pilot responded to the perception of being just a few meters above ground by reducing engine power. It assumed that the aircraft was in the final stages of the flight. Thus, the aircraft lost speed.
◆ Due to the low visibility conditions and their steep descent angle, the crew did not notice the flawed actions of the auto-throttle.
◆ Meanwhile, a relatively inexperienced second pilot was at the controls of the auto-pilot (since the captain did not see the altimeter defect, nor the low visibility conditions, as a serious problem).
◆ When the aircraft started stalling, the pilots finally acknowledged the gravity of the situation; the second pilot tried to apply full power to break away from this unusual and dangerous situation.
◆ Unfortunately, (for some reason) the auto-throttle was still active; so, it pushed back the controls to the IDLE position, probably preventing a last minute speed-up.
◆ Airspeed was reduced to 170 km/h (way below the minimum necessary 260 km/h) and the plane crashed.
Ilhan Or - Boğaziçi University 258
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Fault Tree Analysis: How It Should Have Been Designed
[Fault tree diagram] Root event: Loss of Power During Approach; first-level causes: Engine Failure, Deliberate Power Cut, Lack of Fuel. The Deliberate Power Cut branch develops through Autopilot Misguidance (Faulty Perception of Closeness to Runway, fed by the basic events Faulty Altimeter R., Faulty Speed Check and Faulty Altimeter L.) combined with Human Error (Failure to act upon the autopilot’s landing gear warning: failure to notice the warning, or not comprehending its implications).

Ilhan Or - Boğaziçi University 259
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Fault Tree Analysis: How It Was Probably Designed
[Fault tree diagram] Same event structure as above, but reflecting the actual architecture, in which the autopilot’s perception of closeness to the runway is driven by the left altimeter alone, without cross-checking against the right altimeter or the speed check.

Ilhan Or - Boğaziçi University 260
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
Possible Event Tree of the THY Tekirdağ Flight Accident
[Event tree diagram] Initiating event: Altimeter Malfunction. Branching systems, in order:
• System 1: flight mode – S1 = manual flight, F1 = autopilot flight;
• System 2: cross-check circuits – S2 = effective, F2 = ineffective;
• System 3: visibility – S3 = good visibility conditions, F3 = bad;
• System 4: pilot – S4 = experienced pilot in control, F4 = inexperienced.
The Success/Failure combinations yield the sixteen accident/undesirable event sequences IS1S2S3S4 through IF1F2F3F4.

Ilhan Or - Boğaziçi University 261
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
[“Manual Flight” main branch of the event tree: sequences IS1S2S3S4, IS1S2S3F4, IS1S2F3S4 and IS1S2F3F4]
◆ If the plane were on manual flight, malfunctioning of the altimeter system would not be likely to cause a serious problem.
◆ Effectiveness of the cross-check circuits would not matter;
◆ If visibility conditions were good, the pilot could observe and judge the altitude himself (and act accordingly); if the visibility conditions were bad, then the pilot would receive accurate altitude information from the tower;
◆ Success or failure of the manual flight would be slightly influenced by the pilot’s own experience.

Ilhan Or - Boğaziçi University 262
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
[“Autopilot Flight & Cross-check Circuits Effective” branch of the event tree: sequences IF1S2S3S4, IF1S2S3F4, IF1S2F3S4 and IF1S2F3F4]
◆ If the plane were on autopilot and the cross-check circuits were effective, then the autopilot would try to confirm the altitude information from the (faulty) altimeter system with other related status information from the cross-check system (such as speed of aircraft and status of the landing gear).
◆ In case of conflicting feedback, the autopilot system would warn the pilot before taking any action and suggest switching to manual flight.
◆ From that point onwards the manual flight conditions would prevail.

Ilhan Or - Boğaziçi University 263
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
[“Autopilot Flight & Cross-check Circuits Ineffective” branch of the event tree: sequences IF1F2S3S4, IF1F2S3F4, IF1F2F3S4 and IF1F2F3F4]
◆ If the plane were on autopilot & the cross-check circuits were not effective, but visibility conditions were good, then, presumably, the pilot would notice that something was amiss the instant the autopilot cut off power and attempted to reduce aircraft speed to that of immediate landing speed.
◆ Then, the pilot would have ample time to override/shut off the autopilot system properly and proceed with manual landing. The pilot’s success or failure in properly switching off the autopilot and then proceeding with manual landing would be slightly influenced by his experience.
◆ If the visibility conditions were bad, it could take some time for the pilot to notice that something is wrong. Even then, an experienced pilot might immediately and properly switch to manual flight, while an inexperienced pilot might panic and fail to properly switch to manual flight.

Ilhan Or - Boğaziçi University 264
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
The Chain of Triggering Events and Their Potential Causes
◆ The Initiating Event: Altimeter Malfunction
• Possible THY Organizational Error – Improper Maintenance: If the malfunction that led to the accident was not the first one (as claimed by some critics), then there is a strong suspicion of a maintenance lapse (why was the faulty unit not fixed or replaced?).
• Possible Boeing Design Error – Improper System Design: In the system architecture, the auto-throttle is set up to receive altitude information only from the left wing altimeter and make key operational decisions without cross-checking that information with the right wing altimeter. In a more prudent design the two altimeters would be deployed to validate one another, and in case of conflicting information the auto-pilot system would warn the pilots.
◆ The First Triggering Event: Decision to Land on Auto-pilot System
• Possible THY Procedural/Human Error: If the pilots were not informed about the faulty altimeter, or they failed to notice it during routine flight checks, this would indicate a serious procedural or human error.
• Possible THY Human Error: If the pilots decided to land on the auto-pilot system while knowing about the faulty altimeter, this would indicate a serious judgment error.
Ilhan Or - Boğaziçi University 265
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
The Chain of Triggering Events and Their Potential Causes
◆ The Second Triggering Event: Ineffective Cross-check Circuits.
• Possible Boeing Design Error – Improper System Design: In the system architecture, the auto-throttle is set up to automatically reduce power (and speed) to immediate landing conditions, once it receives altitude information indicating “immediate landing” or “landed” status.
In a more prudent design the information from the altimeter system would be cross-checked against other indicators of “immediate landing” or “landed” status (such as current speed and status of the landing gear), and in case of conflicting information, the auto-pilot system would effectively warn the pilots before taking any action and suggest manual flight.
◆ The Third Triggering Event: Cloud Cover and Low Visibility.
• Possible Chance Factor: In better visibility conditions, presumably the pilots would have easily and immediately realized that something was amiss the instant the autopilot system drastically reduced power at a high altitude. Unfortunately, the visibility conditions cannot be controlled or manipulated.
Ilhan Or - Boğaziçi University 266
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
The Chain of Triggering Events and Their Potential Causes
◆ The Fourth Triggering Event: Steep Descent Landing Conditions.
• Possible Airport Procedural Error: In normal descent landing conditions, presumably it would have been easier for the pilots to realize that something was amiss the instant the auto-pilot system drastically reduced power at a high altitude. On the other hand, the pilots could have confused the steep decline resulting from the “drastically reduced power” status brought on by the auto-pilot system with the steep descent angle instructed by the tower.
◆ The Fifth Triggering Event: Inexperienced Pilot in Control.
• Possible THY Human Error: If the captain pilot knew about the faulty altimeter, then leaving control to an inexperienced pilot could be considered a serious judgment error.
As it happened, when the deadly insufficient power/low speed condition was recognized, the inexperienced pilot in control (probably in panic) tried to increase power, but the engines did not respond since the auto-pilot system (which was not properly switched off) countermanded the pilot’s instruction.
An experienced pilot might have reacted coolly, first immediately (but properly) switching to manual flight and then increasing power.
Ilhan Or - Boğaziçi University 267
Quantitative Modeling & Assessment of Risk
Monte Carlo Simulation
u Dynamic Analysis: Simulation of a dynamic/stochastic system based on any given set of rules and/or historical data and/or expert judgment.
u Monte Carlo (MC) Simulation: Develops computer models that create different possible futures or samples using random numbers.
• It involves random sampling of each probability distribution within the model to produce thousands of computer generated realizations (iterations, trials).
• Different indicators (such as central location, spread, the distribution function) are then worked out from the samples.
u Advantages:
• The individual distributions in the model need not be approximated;
• Correlations and other interdependencies can be modeled;
• The level of math necessary to perform MC Simulation is quite basic;
• Complex mathematics (power functions, logs) can be included with ease;
• The behavior of the overall model can be investigated with ease;
• Changes to the model can be made quickly and results compared;
• MC Simulation is widely recognized as a valid technique.
Ilhan Or - Boğaziçi University 268
Quantitative Modeling & Assessment of Risk
Monte Carlo Simulation Example – Project Risks
u Example: We would like to assess and understand the random duration (and maybe cost) of a project, given,
• The set of activities comprising the project and their precedence relationships;
• The probability distribution governing the duration of each activity;
• The probability distribution governing the cost of each activity.
u “Assessing and understanding the random duration (cost)” may involve,
• Estimating the expected project duration (and expected cost);
• Measuring the variability of actual project duration (cost) around the mean;
• Likelihood of completing the project by some specific time T (and/or cost C).
u In the ensuing MC simulation,
• In each iteration, an activity duration (and cost) realization is generated for each activity;
• A project duration and total cost value is determined based on the generated
realizations (through ordinary CPM computations);
• Thousands of simulation iterations accumulate thousands of such determined
project durations (and costs) realizations;
• Various statistical studies can then be done based on the accumulated data set.
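The following minimal sketch (illustrative; the four-activity network and its triangular duration parameters are invented assumptions) carries out exactly this loop and reports the indicators listed above:

```python
import random
import statistics

# Hypothetical project: A precedes B and C; B and C both precede D.
# Durations (days) ~ triangular(min, mode, max) -- illustrative numbers.
ACTIVITIES = {"A": (2, 4, 8), "B": (3, 5, 10), "C": (4, 6, 9), "D": (1, 2, 4)}

def simulate_once():
    d = {a: random.triangular(lo, hi, mode)     # note: random.triangular(low, high, mode)
         for a, (lo, mode, hi) in ACTIVITIES.items()}
    finish_a = d["A"]
    finish_b = finish_a + d["B"]
    finish_c = finish_a + d["C"]
    return max(finish_b, finish_c) + d["D"]     # project duration via CPM logic

durations = [simulate_once() for _ in range(10_000)]
T = 16                                          # target completion time (assumed)
print("expected duration:", round(statistics.mean(durations), 2))
print("std deviation    :", round(statistics.stdev(durations), 2))
print(f"P(duration <= {T}):", sum(t <= T for t in durations) / len(durations))
```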
Ilhan Or - Boğaziçi University 269
Quantitative Modeling & Assessment of Risk
Monte Carlo Simulation Example – Oil Drilling Risks
u Example: We would like to assess time and cost overrun risks in Oil Prospecting/Drilling Operations in a certain area, given,
• Probability distribution regarding the existence of an oil reserve;
• Probability distribution regarding size and quality of the oil reserve;
• Probability distribution regarding depth of the field and soil characteristics;
• Probability distribution regarding marketability & sale price of the extracted oil.
u “Assessing time and cost overrun risks” may involve,
• Estimating drilling costs and durations;
• Estimating amount and duration of oil extraction;
• Estimating the revenue to be gained from the sales of the extracted oil.
u In the ensuing MC simulation,
• A series (set) of realizations for each of the random factors (existence, size and quality of oil, depth of oil field and soil quality, market profile and oil prices) is generated based on the assumed probability distributions (one iteration);
• Related extraction time, extraction costs and oil sales revenues are determined based on
the generated realizations;
• Thousands of simulation iterations accumulate thousands of such determined extraction
time, extraction cost and oil sales revenue realizations;
• Various statistical studies can then be done based on the accumulated data set.
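A toy version of one such iteration loop is sketched below; every distribution and economic parameter in it is an invented assumption, not data from the notes:

```python
import random

def drill_once():
    """One MC iteration of a (toy) oil prospecting model."""
    if random.random() > 0.3:                  # assumed 30% chance a reserve exists
        return {"found": False, "cost": 5.0, "revenue": 0.0}   # dry-hole cost, M$
    size = random.lognormvariate(2.0, 0.8)     # reserve size, million barrels (assumed)
    depth = random.uniform(1.0, 4.0)           # field depth, km (assumed)
    cost = 5.0 + 3.0 * depth                   # drilling cost grows with depth, M$
    price = random.normalvariate(70.0, 15.0)   # sale price, $/barrel (assumed)
    return {"found": True, "cost": cost, "revenue": size * max(price, 0.0)}

runs = [drill_once() for _ in range(10_000)]
profit = [r["revenue"] - r["cost"] for r in runs]
print("P(find oil)      :", sum(r["found"] for r in runs) / len(runs))
print("mean profit (M$) :", round(sum(profit) / len(profit), 1))
print("P(loss)          :", sum(p < 0 for p in profit) / len(profit))
```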
Ilhan Or - Boğaziçi University 270
Quantitative Modeling & Assessment of Risk
Dynamic Analysis
u Discrete Event Simulation (DES): It differs from MC Simulation mainly in that it models the evolution of a system over time.
• It does this by allowing the user to define equations for each element in the model for how it changes, moves and/or interacts with other elements.
• Then it steps the system in small time increments and keeps track of the state of all elements at any time (such as parts in a manufacturing system, patients in a hospital, ships in a harbor).
u DES allows the user to model complicated systems in a simple way, by
defining how the elements interact and then letting the model simulate what
might happen. Regarding risk assessment, it may be used to model,
• Flow/dispersion of hazardous gases & liquids to assess potential harm to
environment and/or communities;
• Flow of vessel traffic to assess maritime accident risk;
• Behavior of heavy rains and flow of rainwater to assess flooding risks;
• Behavior of sensitive equipment deployed in handling/manufacturing/storage of
flammable or explosive material to help assess fire/explosion risk.
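One common way to implement such a model in code is with an event queue. The sketch below (a toy single-berth harbor; the arrival and service rates are invented assumptions) keeps track of the state of the system, here the ships waiting for service, as events unfold:

```python
import heapq
import random

random.seed(42)
SIM_END = 1_000.0                                 # simulated hours
events = [(random.expovariate(0.5), "arrival")]   # (time, kind); ~1 ship per 2 h
queue, berth_free, waits = 0, True, []
arrival_times = []                                # ships waiting, in FIFO order

while events:
    t, kind = heapq.heappop(events)
    if t > SIM_END:
        break
    if kind == "arrival":
        arrival_times.append(t)
        queue += 1
        heapq.heappush(events, (t + random.expovariate(0.5), "arrival"))
    else:                                         # "departure": berth becomes free
        berth_free = True
    if berth_free and queue:                      # start serving the next ship in line
        queue -= 1
        berth_free = False
        waits.append(t - arrival_times.pop(0))    # time spent waiting for the berth
        heapq.heappush(events, (t + random.expovariate(0.6), "departure"))

print("ships served :", len(waits))
print("mean wait (h):", round(sum(waits) / len(waits), 2))
```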
Ilhan Or - Boğaziçi University 271
Quantitative Modeling & Assessment of Risk
THY Tekirdağ Case – Event Tree Analysis
[Extended event tree of the THY Tekirdağ accident: the branching factor “Steep Descent Conditions” is inserted between “Good Visibility Conditions” and “Experienced Pilot”, under both the manual flight and autopilot flight branches]
This slide presents an extended version of the event tree: the “steep descent landing conditions” imposed on the aircraft by the Tower’s decision to squeeze its landing between two regular landings are included. In normal descent conditions, it could have been easier for the pilots to notice that something was amiss the instant the auto-pilot system drastically reduced power at a high altitude. In this case, however, the pilots could have confused the steep decline resulting from the “drastically reduced power” status brought on by the auto-pilot system with the steep descent angle instructed by the tower.

Ilhan Or - Boğaziçi University 284