
ELECTRONICALLY REPRINTED FROM MARCH 2016

Alarm Management By the Numbers
Deeper understanding of common alarm-system metrics can improve remedial actions and result in a safer plant

Kim VanCamp
Emerson Process Management

IN BRIEF
ALARM MANAGEMENT PERFORMANCE METRICS
ALARM SYSTEM EXAMPLE METRICS
AVERAGE ALARM RATES
PEAK ALARM RATE
ALARM PRIORITY DISTRIBUTION
ALARM SOURCE CONTRIBUTION
STALE ALARMS
CLOSING REMARKS

FIGURE 1. A better understanding of alarm system metrics can lead to more focused remedial actions and help to make the plant safer

Do you routinely receive “alarm management performance” reports, or are you expected to monitor a managerial dashboard equivalent? What do you look for, and what does it mean? We all know that fewer alarms mean fewer operator interruptions and presumably fewer abnormal process or equipment conditions. But a deeper understanding of the more common alarm-management metrics can yield greater insight, leading to more focused remedial actions and ultimately to a safer, better performing plant (Figure 1).

This article reviews the now well-established benchmark metrics associated with the alarm-management discipline. Most articles previously published on alarm management cover alarm concepts (for example, defining a valid alarm), alarm-management methods (for instance, rationalization techniques), justification (such as the benefits of investing in alarm management) and tools (including dynamic alarming enablers). This article provides a different perspective. Written for process plant operation managers and others who routinely receive alarm-management performance reports, it aims to explain the most common metrics without requiring an in-depth understanding of the alarm-management discipline.

Alarm-management KPIs

The first widely circulated benchmark metrics, or key performance indicators (KPIs), for alarm management relevant to the chemical process industries (CPI) were published in the 1999 edition of the Engineering Equipment and Materials Users Association publication EEMUA-191 Alarm Systems – A Guide to Design, Management and Procurement [1]. Later works from standards organizations, such as the 2009 International Society of Automation (ISA) publication ISA-18.2 Management of Alarm Systems for the Process Industries [2] and the 2014 publication IEC62682 Management of alarm systems for the process industries [3], built upon EEMUA-191 and have furthered alarm-management thought and discipline. For example, they provide a lifecycle framework for effectively managing alarms and establish precise definitions for core concepts and terminology. Yet fifteen years later, little has changed regarding the metrics used to measure alarm-system performance. This consistency in measurement has been positive in many respects, leading to the wide availability of generally consistent commercial alarm-analytics reporting products, both from control-system vendors and from companies that specialize in alarm management. Consequently, an alarm-analysis product can be selected on the basis of factors such as ease of use, integration and migration, reporting capabilities, price and support availability, with reasonable certainty that the KPIs derived from the chosen product can be interpreted consistently and compared across sites and across differing process control, safety and other open platform communications (OPC)-capable alarm-generating sources.

In addition to defining the KPI measurements, the EEMUA-191, ISA-18.2 and IEC62682 publications also suggest performance targets, based in large part on the practical experience of the companies participating in the committees that contributed to each publication. As an example, these publications state that a long-term average rate of up to 12 new alarms per hour is the maximum manageable for an operator. Suggested performance levels such as this can provide a reasonable starting point if you are just beginning an alarm-management program. But before deciding what constitutes a reasonable set of targets for your site, you should also consider other firsthand inputs, such as surveys of your operators and in-house studies of significant process disturbances and alarm floods. Note that more research into the human factors that affect operator performance is needed to validate, and potentially improve on, the current published performance targets. Important work in this area is ongoing at the Center for Operator Performance (Dayton, Ohio; www.operatorperformance.org).

Alarm system example metrics

A typical alarm-performance report contains a table similar to Table 1, where the metrics and targets are based upon, and in many cases copied directly from, the EEMUA-191, ISA-18.2 and IEC62682 publications. It is also common to see locally specified action limits based on a site’s alarm philosophy. When a target or action limit is exceeded, it is important to ask: what problems are likely contributing to the need for action, and what are the actions? These questions are the focus of the following discussion.

TABLE 1. EXAMPLE OF TYPICAL ALARM PERFORMANCE METRICS, TARGETS AND ACTION LIMITS
Metric | Target | Action limit
Average alarm rate per operator (alarms per day) | < 288 | > 432
Average alarm rate per operator (alarms per hour) | < 12 | > 18
Average alarm rate per operator (alarms per 10 minutes) | 1–2 | > 3
Percent of 10-minute periods containing > 10 alarms | < 1% | > 5%
Maximum number of alarms in a 10-minute period | ≤ 10 | > 10
Percent of time the system is in flood | < 1% | > 5%
Annunciated priority distribution (low priority) | ~80% | < 50%
Annunciated priority distribution (medium priority) | ~15% | > 25%
Annunciated priority distribution (high priority) | ~5% | > 15%
Percent contribution of top 10 most frequent alarms | < 1% to ~5% | > 20%
Quantity of chattering and fleeting alarms | 0 | > 5
Stale alarms (number of alarms active for more than 24 hours) | < 5 on any day | > 5

Average alarm rate

The average alarm rate is a straightforward measure of the frequency with which new alarms are presented to the operator, expressed as an average count per day, per hour or per 10-minute interval. As alarm frequency increases, an operator’s ability to respond correctly, and in time to avoid the ultimate consequence of inaction, decreases. If the rate is excessively high, it is probable that some alarms will be missed altogether or that operators will ignore them, eroding their overall sense of concern and urgency. So clearly it is an important metric.

Averages can be misleading, however, because they provide no sense of the peaks in the alarm rate, making it difficult to distinguish “alarm floods” from steady-state “normal” operation. Consequently, most alarm-performance reports supplement this basic KPI value with a timeline view, or with separate calculations of the alarm rate for times when operation is normal and for times of an alarm flood. Figure 2 presents a typical example. Its average alarm rate of 16.5 alarms per hour exceeds the target KPI value of 12 from Table 1, but is slightly less than the action limit of 18 per hour. On its own, that figure might not raise concern, yet the timeline view shows significant periods of time during which performance is unacceptable.

FIGURE 2. Timeline views of the data can reveal periods where alarm performance is not acceptable
[Chart: alarms by date, 5/6/2009 to 5/31/2009, stacked by priority (Critical, Warning, Advisory). Average alarm rates for Figure 2 on a per-hour basis: overall, 16.5; during alarm floods, 100.7; excluding alarm floods, 7.9]

FIGURE 3. Pie charts can supplement alarm performance reports and give information on how much time is spent in the acceptable range
[Pie chart: New alarm activation rate distribution, with 10-minute periods categorized as Acceptable (0–1 per 10 min.), Manageable (2–4 per 10 min.), Demanding (5–9 per 10 min.) and Unacceptable (≥10 per 10 min.)]

Common contributors to an excessively high alarm rate include the following:
• The alarm system is being used to notify the operator of events that do not constitute actual alarms, such as informational (“for your information”) messages, prompts, reminders or alerts. According to ISA-18.2, an “alarm” is an indication to the operator that an equipment malfunction, process deviation or abnormal condition requiring a timely response is occurring
• Chattering or other frequently occurring nuisance alarms are present. These often originate from non-process alarm sources of marginal interest to the operator, such as field devices or system hardware diagnostics. Chattering alarms can also indicate an incorrect alarm limit or deadband
• Redundant alarms are present, where multiple alarms are presented when a single abnormal situation occurs. An example is when a pump is shut down unexpectedly, generating a pump-fail alarm in addition to alarms for low outlet flow and low discharge pressure
• A problem with the metric calculation is occurring. A correct calculation only counts new alarms presented to the particular operator or operating position for which the metric is intended, taking into consideration any by-design threshold settings or other authorized filtering mechanisms that cause fewer alarms to be presented to the operator than may be recorded in system event logs
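
The average-rate arithmetic above, together with the normal-versus-flood split that most reports add, can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular alarm-analytics product: the input is a hypothetical list of datetime stamps for new alarms, already filtered to one operator position as the last bullet requires; a flood window uses the simple definition of more than 10 new alarms in 10 minutes; and the per-hour target (12) and action limit (18) come from Table 1. All function names are invented for the example.

```python
from datetime import datetime, timedelta

# Per-operator, per-hour values taken from Table 1.
TARGET_PER_HOUR = 12.0
ACTION_LIMIT_PER_HOUR = 18.0
WINDOW = timedelta(minutes=10)


def average_rates(activations, start, end):
    """Average new-alarm rate over [start, end) per day, per hour and per 10 minutes.

    `activations` holds one datetime per new alarm presented to a single operator
    position (hypothetical input; by-design filtering is assumed already applied).
    """
    hours = (end - start).total_seconds() / 3600.0
    count = sum(1 for t in activations if start <= t < end)
    per_hour = count / hours
    return {"per_day": per_hour * 24, "per_hour": per_hour, "per_10_min": per_hour / 6}


def classify_rate(per_hour, target=TARGET_PER_HOUR, action_limit=ACTION_LIMIT_PER_HOUR):
    """Place a per-hour rate relative to the Table 1 target and action limit."""
    if per_hour < target:
        return "meets target"
    if per_hour > action_limit:
        return "exceeds action limit"
    return "above target, below action limit"


def flood_split(activations, start, end, flood_above=10):
    """Per-hour rates overall, during floods and excluding floods.

    A flood window is any 10-minute window with more than `flood_above` new alarms
    (the simple definition; the EEMUA-191 start/continue rule is not modeled here).
    """
    n_windows = int((end - start) / WINDOW)
    counts = [0] * n_windows
    for t in activations:
        if start <= t < start + n_windows * WINDOW:
            counts[int((t - start) / WINDOW)] += 1

    def per_hour(window_counts):
        # Six 10-minute windows make one hour.
        return 6.0 * sum(window_counts) / len(window_counts) if window_counts else 0.0

    flood = [c for c in counts if c > flood_above]
    normal = [c for c in counts if c <= flood_above]
    return per_hour(counts), per_hour(flood), per_hour(normal)


def implied_flood_fraction(overall, during_floods, excluding_floods):
    """Fraction of time in flood implied by three per-hour averages.

    Assumes each rate is alarms divided by hours spent in that state, so that
    overall = f * during_floods + (1 - f) * excluding_floods.
    """
    return (overall - excluding_floods) / (during_floods - excluding_floods)


if __name__ == "__main__":
    print(classify_rate(16.5))                                  # Figure 2 overall rate
    print(round(implied_flood_fraction(16.5, 100.7, 7.9), 3))
```

Applied to the Figure 2 averages, classify_rate(16.5) returns “above target, below action limit”, the reading that might not raise concern. If the three published rates are time-weighted averages (an assumption; the figure does not say how they were computed), implied_flood_fraction(16.5, 100.7, 7.9) puts the share of time in flood at about 9.3%, which would exceed the 5% action limit for percent of time in flood in Table 1.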

Peak alarm rate


Two of the metrics in Table 1, the percentage of 10-minute periods with more than 10 alarms and the percent of time spent in an “alarm flood” state, are calculated differently, but are highly similar in that they quantify how much of the operator’s time is spent within the highly stressful circumstance of receiving more alarms than can be managed effectively.

EEMUA-191 defines the start of an alarm flood as a 10-minute period with more than 10 new alarms, continuing through subsequent 10-minute intervals until reaching a 10-minute interval with fewer than five new alarms. It is equally acceptable to define a flood simply as a 10-minute period with more than 10 new alarms. Often, an alarm-performance report will supplement these two metrics with a pie chart (Figure 3) that segments the report period into 10-minute periods, each categorized into a named alarm-rate range, such as acceptable, manageable, demanding or unacceptable.

Another metric commonly included in the alarm-performance report, the peak number of alarms within a 10-minute period, is a straightforward measure of the degree of difficulty of the worst-case alarm flood for the operator. In poorly performing alarm systems, it is common to see peak alarm counts in a 10-minute period that exceed 250, a total that would overwhelm even the most highly skilled operator.

Common contributors to high peak-alarm-rate frequency and severity include the following items:
• Multiple redundant alarms for the same abnormal condition. The optimum situation is, of course, that any single abnormal event will produce just one alarm, representing the best choice in terms of operator comprehension and the quickest path to remedial action. This requires study of alarm causes and often leads to the design of conditional, first-out or other forms of advanced alarming logic
• Cascading alarms. The sudden shutdown of equipment often triggers automated actions of the control system, which, in turn, trigger more alarms
• False indications. When routine transitions between process states occur, the alarm system is not usually designed to “follow the process,” so it can produce a multitude of false indications of an abnormal condition. Likewise, logic is typically required to detect such state changes and suppress or modify alarms accordingly

Some systems provide specialized alarm views that present alarms in a graphical pattern to aid an operator’s comprehension of peak alarm events and their associated causality. These views supplement the classic alarm list and help provide a built-in layer of defense against the overwhelming effects of an alarm flood.
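
All four of these measures (percent of windows over 10, percent of time in flood, the peak 10-minute count and the Figure 3 categories) start from the same per-window alarm counts. The Python sketch below is illustrative only and rests on assumptions: counts are bucketed into consecutive 10-minute windows from a hypothetical list of activation timestamps; the EEMUA-191 rule is read as a flood that starts in a window with more than 10 new alarms and ends at the first later window with fewer than five (that window is treated as outside the flood); and the category boundaries are the ones printed in Figure 3. Function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


def window_counts(activations, start, end):
    """New-alarm count in each consecutive 10-minute window of [start, end)."""
    n = int((end - start) / WINDOW)
    counts = [0] * n
    for t in activations:
        if start <= t < start + n * WINDOW:
            counts[int((t - start) / WINDOW)] += 1
    return counts


def percent_windows_over(counts, limit=10):
    """Percent of 10-minute windows containing more than `limit` alarms."""
    return 100.0 * sum(c > limit for c in counts) / len(counts)


def eemua_flood_flags(counts, start_above=10, end_below=5):
    """True for each window that belongs to a flood under the start/continue rule."""
    flags, in_flood = [], False
    for c in counts:
        if not in_flood and c > start_above:
            in_flood = True            # a flood starts in this window
        elif in_flood and c < end_below:
            in_flood = False           # the first quiet window ends the flood
        flags.append(in_flood)
    return flags


def percent_time_in_flood(counts):
    flags = eemua_flood_flags(counts)
    return 100.0 * sum(flags) / len(flags)


def rate_category(count):
    """Figure 3 ranges, in alarms per 10 minutes."""
    if count <= 1:
        return "acceptable"
    if count <= 4:
        return "manageable"
    if count <= 9:
        return "demanding"
    return "unacceptable"


if __name__ == "__main__":
    t0 = datetime(2009, 5, 6)
    stamps = [t0 + timedelta(minutes=11, seconds=s) for s in range(12)]
    print(window_counts(stamps, t0, t0 + timedelta(hours=1)))   # [0, 12, 0, 0, 0, 0]

    counts = [0, 12, 7, 6, 3, 1]
    print(max(counts))                                  # peak 10-minute count
    print(round(percent_windows_over(counts), 1))       # simple flood definition
    print(percent_time_in_flood(counts))                # start/continue rule
    print(Counter(rate_category(c) for c in counts))    # pie-chart segments
```

With the sample counts [0, 12, 7, 6, 3, 1], the simple definition finds one flooded window in six (about 16.7%), while the start-and-continue rule keeps the following 7- and 6-alarm windows in the same flood and reports 50%, which illustrates how the two metrics can be calculated differently yet describe the same episode. Note that Figure 3 labels exactly 10 alarms per 10 minutes as unacceptable, while both flood definitions require more than 10.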

Alarm priority distribution

When faced with multiple alarms, the operator must decide which to address first. This is, or should be, the basis for assigning priority to an alarm. Most systems employ three or four priorities: low, medium, high and very high. There are a number of well-accepted methods for assigning priority, the most common being a systematic, guided (selection-based) consideration of the severity of the consequence of inaction combined with the time available for the operator to take the required action. Conventional wisdom says that the annunciated alarm-priority distribution experienced by the operator should be approximately 80% low-, 15% medium- and 5% high-priority alarms. Ultimately, however, the goal should be to guide the operator’s determination of the relative importance of one alarm compared to another, based on their importance to the business.

Figure 4 illustrates a situation where the number of high-priority (critical) alarms being presented to the operator far exceeds the number of low-priority (advisory) alarms, suggesting the need to review the consistency and methodology of the priority assignment.

FIGURE 4. When the number of high-priority alarms exceeds that of low-priority alarms, the methodology of how alarms are assigned priority should be evaluated
[Pie chart: Alarm priority distribution, with segments for Low, Medium and High priority]

Common contributors to out-of-balance alarm-priority distributions include the following:
• Alarm prioritization (a step in the rationalization process) has not been performed, and alarm priorities have been left at their default values
• Misuse of the priority-setting scheme to classify alarms for reasons other than providing the operator with a tie-breaker during alarm peaks. For example, using priority to classify alarms by impact categories, such as environmental, product quality, safety/health or economic loss
• Lack of discipline in setting priority on the basis of direct (proximate) consequences rather than ultimate (unmitigated) consequences. While it may be the case that a designed operator action could fail, followed by a protective system failure, followed by a subsequent incorrect human response, such what-if considerations are likely to lead to a vast skewing of alarm priorities toward critical
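
The priority checks described above reduce to simple proportions. The Python sketch below assumes a hypothetical list of priority labels, one per annunciated alarm, limited to the three priorities in Table 1 (a site with a fourth, very-high priority would add it and choose its own limits). It compares the shares with the approximate 80/15/5 guideline, applies the Table 1 action limits and flags the Figure 4 pattern of more high- than low-priority alarms.

```python
from collections import Counter

# Approximate guideline shares (%) and Table 1 action limits, per priority.
GUIDELINE = {"low": 80.0, "medium": 15.0, "high": 5.0}
ACTION_LIMITS = {"low": ("below", 50.0), "medium": ("above", 25.0), "high": ("above", 15.0)}


def priority_distribution(priorities):
    """Percent of annunciated alarms at each priority listed in GUIDELINE."""
    counts = Counter(priorities)
    total = sum(counts.values())
    return {p: 100.0 * counts[p] / total for p in GUIDELINE}


def gap_to_guideline(dist):
    """Percentage-point difference from the approximate 80/15/5 guideline."""
    return {p: dist[p] - GUIDELINE[p] for p in GUIDELINE}


def review_flags(dist):
    """One message per breached Table 1 action limit, plus the Figure 4 pattern."""
    flags = []
    for p, (direction, limit) in ACTION_LIMITS.items():
        share = dist[p]
        if (direction == "below" and share < limit) or (direction == "above" and share > limit):
            flags.append(f"{p}: {share:.1f}% is {direction} the {limit:.0f}% action limit")
    if dist["high"] > dist["low"]:
        flags.append("more high- than low-priority alarms: review the prioritization method")
    return flags


if __name__ == "__main__":
    balanced = ["low"] * 80 + ["medium"] * 15 + ["high"] * 5
    skewed = ["low"] * 10 + ["medium"] * 40 + ["high"] * 50
    print(review_flags(priority_distribution(balanced)))    # no flags: []
    for message in review_flags(priority_distribution(skewed)):
        print(message)
```

The skewed example breaches all three action limits and also trips the high-versus-low check, which points toward reviewing the prioritization method as a whole rather than any single alarm.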

Alarm source contribution

The percent of alarms coming from the ten most frequent alarm sources, relative to the total alarm count, is a highly useful metric for quantifying, identifying and ultimately weeding out nuisance alarms and alarm-system misuse. This is especially true if the alarm-performance report covers a period during which operations were routine, without significant process upsets or equipment failures. The top-ten alarm sources often provide “low-hanging fruit” for alarm-management performance improvement: they are a handful of alarms which, if addressed, will create a noticeable positive change for the operator.

Figure 5 shows a pattern observed in many control systems, where as few as ten alarm sources (such as a control module or transmitter) out of the many thousands of defined alarm sources collectively account for about 80% of all of the alarms presented to the operator. In this example, the first alarm source (FIST111) alone was responsible for 15% of all of the alarms presented to the operator.

FIGURE 5. A small number of alarm sources can often account for the majority of alarms
[Pareto chart: number of alarms (bars) and cumulative % (line) by alarm source: FITST111, IIUP16P1, FICUP1516, IIPX15P1, OPC_FI-N2-051, TIFH42106, FIC-1252, PICFP2043, TIFG41106, FIFC1054]

Another related metric is the count of chattering alarms, that is, alarms that repeatedly transition between the alarm state and the normal state in a short period of time. The specific criteria for identifying chattering alarms vary; the most common method is to count alarms that activate three or more times within one minute.

When the top-ten alarm sources generate over 20% of all the alarms presented to the operator, it is a strong indicator that one or both of the following is the case:
• Some of those alarms are nuisance alarms: alarms that operators have come to expect and, in most cases, ignore or consider to be informational
• The alarm system is being misused to (frequently) generate operator prompts based on routine changes in process conditions or operating states that may or may not require action

Eliminating chattering alarms is generally straightforward, using signal-conditioning features found in most control systems, such as on-delay, off-delay and hysteresis (deadband).
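
The top-ten contribution, the three-in-one-minute chattering test and the effect of signal conditioning can each be sketched in a few lines of Python. This is an illustration under assumptions, not a vendor algorithm: the input is a hypothetical list of (timestamp, source tag) activation events; a chattering source is taken to be one with three or more activations spanning at most one minute; and the conditioning function models a high alarm with an on-delay counted in samples and a deadband applied on the return to normal. Tag names and signal values are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_n_contribution(sources, n=10):
    """Percent of all alarms produced by the n most frequent sources, plus those sources."""
    top = Counter(sources).most_common(n)
    return 100.0 * sum(count for _, count in top) / len(sources), top


def chattering_sources(events, min_count=3, span=timedelta(minutes=1)):
    """Sources with at least `min_count` activations spanning no more than `span`."""
    times_by_source = {}
    for t, source in events:
        times_by_source.setdefault(source, []).append(t)
    chattering = set()
    for source, times in times_by_source.items():
        times.sort()
        # Check every run of min_count consecutive activations.
        for i in range(len(times) - min_count + 1):
            if times[i + min_count - 1] - times[i] <= span:
                chattering.add(source)
                break
    return chattering


def count_activations(samples, limit, deadband=0.0, confirm_samples=1):
    """Activations of a high alarm on a sampled signal.

    On-delay: the value must exceed `limit` for `confirm_samples` consecutive samples.
    Deadband (hysteresis): once active, the alarm clears only below `limit - deadband`.
    """
    active, run, activations = False, 0, 0
    for x in samples:
        if not active:
            run = run + 1 if x > limit else 0
            if run >= confirm_samples:
                active, activations = True, activations + 1
        elif x < limit - deadband:
            active, run = False, 0
    return activations


if __name__ == "__main__":
    t0 = datetime(2009, 5, 6, 8, 0)
    events = [(t0 + timedelta(seconds=s), "TAG-A") for s in (0, 20, 50)]
    events += [(t0 + timedelta(seconds=s), "TAG-B") for s in (0, 40, 90)]
    print(chattering_sources(events))                             # {'TAG-A'}

    noisy = [99.0, 101.0, 99.5, 100.5, 99.8, 101.0, 99.9, 100.2]
    print(count_activations(noisy, 100.0))                        # 4
    print(count_activations(noisy, 100.0, deadband=2.0))          # 1
    print(count_activations(noisy, 100.0, confirm_samples=2))     # 0
```

In the signal-conditioning example, a 2-unit deadband reduces four activations to one, and requiring two consecutive samples above the limit removes them entirely. The deadband case also shows the trade-off: the alarm never returns to normal within the sample, which is how excessive hysteresis can leave an alarm essentially latched.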

Stale alarms

A stale alarm is one that remains annunciated for an extended period of time, most often specified as 24 hours. Stale alarms are surprisingly challenging to quantify. Metrics based on event histories require the presence of both the start and ending alarm events in order to compute an alarm’s annunciated duration, and there is no event representing the attainment of a certain age by an annunciated alarm. Thus, it is common to miss counting stale alarms if their activation event or all-clear event falls outside the range of dates and times covered in the event history. Consequently, there are alternate methods for quantifying stale alarms, such as periodic sampling of the active alarm lists at each operator workstation, or simply counting the number of alarms that attained an age greater than the threshold age. Given this variation in methods, it is important to exercise caution when comparing stale-alarm metrics across different sites that may be using different alarm-analytic applications.

In addition to being hard to quantify, stale alarms can also be some of the most difficult nuisance alarms to eliminate. Thus, in some respects, the upward or downward trend in stale-alarm counts provides an informal indication of the overall, ongoing health of the alarm-management program.

Common contributors to stale alarm counts include the following:
• Routine transitions between process states, where the alarm system is not designed to adapt and therefore provides false indications of an abnormal condition
• Alarms associated with standby or idle equipment
• Alarms configured to monitor conditions that are no longer relevant or available, an indicator of poor management-of-change processes
• Alarms that are essentially latched due to excessive application of hysteresis
• Alarms that persist beyond the called-for operator action while waiting for maintenance action. This likely constitutes an incorrect use of the alarm system, using it as a recording method for outstanding maintenance actions

In conjunction with reviewing the number of stale alarms or the list of stale alarms, it is also important to review which alarms have been manually suppressed (thus removing them from the view of the operator). Suppressing an alarm will remove a stale alarm from the alarm list (effectively reducing the number of stale alarms), but will not address the underlying condition.
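
The counting pitfall described above is easy to reproduce. The Python sketch below contrasts an event-history duration method, an age-threshold method and a snapshot of an active alarm list, assuming a 24-hour threshold and a hypothetical record format of (tag, activation time, clear time), with None marking an event that falls outside the history. The tags and times are invented for illustration; real alarm-analytic applications may treat these edge cases differently.

```python
from datetime import datetime, timedelta

STALE_AGE = timedelta(hours=24)


def stale_by_duration(records):
    """Event-history method: needs both the activation and the clear event."""
    return [tag for tag, on, off in records
            if on is not None and off is not None and off - on > STALE_AGE]


def stale_by_age(records, history_end):
    """Age-threshold method: alarms still active at history_end are aged up to that time.

    An alarm whose activation precedes the history has an unknown age and is skipped.
    """
    stale = []
    for tag, on, off in records:
        if on is None:
            continue
        last_seen = off if off is not None else history_end
        if last_seen - on > STALE_AGE:
            stale.append(tag)
    return stale


def stale_by_sampling(active_list, sample_time):
    """Snapshot method: (tag, activation time) pairs read from a live active alarm list."""
    return [tag for tag, on in active_list if sample_time - on > STALE_AGE]


if __name__ == "__main__":
    d = datetime(2009, 5, 6)
    history_end = d + timedelta(days=2)
    records = [
        ("TI-101", d + timedelta(hours=1), d + timedelta(hours=30)),   # 29 h, both events
        ("PI-102", d + timedelta(hours=2), None),                      # never cleared
        ("LI-103", None, d + timedelta(hours=36)),                     # activated before history
        ("FI-104", d + timedelta(hours=10), d + timedelta(hours=11)),  # 1 h
    ]
    print(stale_by_duration(records))               # ['TI-101']
    print(stale_by_age(records, history_end))       # ['TI-101', 'PI-102']

    # The live list still knows LI-103's activation time (hypothetical value).
    snapshot = [("LI-103", d - timedelta(hours=20)),
                ("PI-102", d + timedelta(hours=2)),
                ("TI-101", d + timedelta(hours=1))]
    print(stale_by_sampling(snapshot, d + timedelta(hours=12)))   # ['LI-103']
```

Only the age-threshold method catches PI-102, and neither history-based method can age LI-103, which the snapshot of the live alarm list does flag. Three plausible methods yield three different counts from the same alarms, which is the reason for caution when comparing stale-alarm metrics.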
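
Closing remarks

This article touches on just some of the key alarm-system performance metrics and what the numbers represent, in terms of the issues that lie behind them and possible actions to address them. With this understanding, periodic reviews of alarm-performance reports should lead to more focused actions that can improve operator effectiveness and thereby reduce the risks of economic loss, environmental damage or unsafe situations. For further reading on these and other alarm-performance metrics, including suggested methods for corrective action, one outstanding resource is Ref. 4.

Edited by Scott Jenkins

References
1. EEMUA Publication 191, Alarm Systems: A Guide to Design, Management and Procurement, Third edition, Engineering Equipment and Materials Users Association, 2013.
2. ANSI/ISA-18.2-2009, Management of Alarm Systems for the Process Industries, approved June 23, 2009. ISBN: 978-1-936007-19-6.
3. IEC 62682, Management of alarm systems for the process industries, International Electrotechnical Commission, 2014.
4. International Society of Automation, Technical Report ISA-TR18.2.5, Alarm System Monitoring, Assessment and Auditing, ISA, 2012.

Author
Kim VanCamp is the DeltaV marketing product manager for alarm management at Emerson Process Management (8000 Norman Center Drive, Bloomington, MN 55437; Phone: 1-952-828-3500; Email: Kim.VanCamp@emerson.com). He joined Emerson in 1976 and has held senior assignments in manufacturing, technology, field service, customer service, service marketing and product marketing. VanCamp is a voting member of the ISA-18.2 committee on Management of Alarm Systems for the Process Industries and has published multiple papers on alarm management. He holds a bachelor’s degree in electrical engineering from the University of Nebraska.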

Posted with permission from March 2016. Chemical Engineering, Access Intelligence, Copyright 2016. All rights reserved.