Article Alarm Management by Numbers Deltav
Alarm Management By the Numbers
Deeper understanding of common alarm-system metrics can improve remedial actions and
result in a safer plant
Kim VanCamp
Emerson Process Management
IN BRIEF
ALARM MANAGEMENT PERFORMANCE METRICS
ALARM SYSTEM EXAMPLE METRICS
ALARM PRIORITY DISTRIBUTION
ALARM SOURCE CONTRIBUTION
STALE ALARMS
CLOSING REMARKS
FIGURE 1. A better understanding of alarm system metrics can lead to more focused remedial actions and help to make the
plant safer
Do you routinely receive “alarm management performance” reports, or are you expected to monitor a managerial dashboard equivalent? What do you look for and what does it mean? We all know that fewer alarms mean fewer operator interruptions and presumably fewer abnormal process or equipment conditions. But a deeper understanding of the more common alarm-management metrics can yield greater insight, leading to more focused remedial actions and ultimately to a safer, better performing plant (Figure 1).
This article reviews the now well established benchmark metrics associated with the alarm-management discipline. Most articles previously published on alarm management cover alarm concepts (for example, defining a valid alarm), alarm management methods (for instance, rationalization techniques), justification (such as the benefits of
investing in alarm management) and tools (including dynamic alarming enablers). This article provides a different perspective. Written for process plant operation managers or others who routinely receive alarm management performance reports, this article aims to explain the most common metrics, without requiring an understanding of the alarm-management discipline in depth.
Alarm-management KPIs
The first widely circulated benchmark metrics, or key performance indicators (KPIs), for alarm management relevant to the chemical process industries (CPI) were published in the 1999 edition of the Engineering Equipment and Materials Users Association publication EEMUA-191 Alarm Systems – A Guide to Design, Management and Procurement [1]. Later works from standards organizations, such as the 2009 publication International Society of Automation (ISA) 18.2 Management of Alarm Systems for the Process Industries [2] and the 2014 publication IEC62682 Management of alarm systems for the process industries [3], built upon EEMUA-191 and have furthered alarm-management thought and discipline. For example, they provide a lifecycle framework for effectively managing alarms and establish precise definitions for core concepts and terminology. Yet fifteen years later, little has changed regarding the metrics used to measure alarm-system performance. This consistency in measurement has been positive in many respects, leading to the wide availability of generally consistent commercial alarm analytic reporting products, from both control-system vendors and from companies that specialize in alarm management. Consequently, selection of an alarm-analysis product may be based on factors such as ease of use, integration and migration, reporting capabilities, price, support availability and so forth, with reasonable certainty that the KPIs derived from the chosen product can be interpreted consistently and compared across sites and across differing process control, safety and other open platform communications (OPC)-capable alarm-generating sources.
In addition to defining the KPI measurements, the EEMUA-191, ISA-18.2 and IEC62682 publications also suggest performance targets, based in large part on the practical experience of the companies participating in the committees that contributed to each publication. As an example, these publications state that a long-term average of up to 12 new alarms per hour is the maximum manageable for an operator. Suggested performance levels such as this can provide a reasonable starting point if you are just beginning an alarm-management program. But before deciding what constitutes a reasonable set of targets for your site, you should also consider other firsthand inputs, like surveying your operators and reviewing in-house studies of significant process disturbances and alarm floods. Note that more research into the human factors that affect operator performance is needed to validate and potentially improve on the current published performance targets. Important work in this area is ongoing at the Center for Operator Performance (Dayton, Ohio; www.operatorperformance.org).
Alarm system example metrics
A typical alarm-performance report contains a table similar to Table 1, where the metrics and targets are based upon, and in many cases, copied directly from, the EEMUA-191, ISA-18.2 and IEC62682 publications. It is also common to see locally specified action limits based on a site’s alarm philosophy. When a target or action limit is exceeded, it is important to ask: what problems are likely contributing to the need for action, and what are the actions? These questions are the focus of the following discussion.
TABLE 1. EXAMPLE OF TYPICAL ALARM PERFORMANCE METRICS, TARGETS AND ACTION LIMITS
Metric                                                          Target            Action limit
Average alarm rate per operator (alarms per day)                < 288             > 432
Average alarm rate per operator (alarms per hour)               < 12              > 18
Average alarm rate per operator (alarms per 10 minutes)         1–2               > 3
Percent of 10-minute periods containing > 10 alarms             < 1%              > 5%
Maximum number of alarms in a 10-minute period                  ≤ 10              > 10
Percent of time the system is in flood                          < 1%              > 5%
Annunciated priority distribution (low priority)                ~80%              < 50%
Annunciated priority distribution (medium priority)             ~15%              > 25%
Annunciated priority distribution (high priority)               ~5%               > 15%
Percent contribution of top 10 most frequent alarms             < 1% to ~5%       > 20%
Quantity of chattering and fleeting alarms                      0                 > 5
Stale alarms (number of alarms active for more than 24 hours)   < 5 on any day    > 5
Average alarm rate
The average alarm rate is a straightforward measure of the frequency with which new alarms are presented to the operator, expressed as an average count per day, per hour or per 10-minute interval. As alarm frequency increases, an operator’s ability to respond correctly and in time to avoid the ultimate consequence of inaction decreases. If the rate is excessively high, it is probable that some alarms will be missed altogether or the operators will ignore them, thus eroding their overall sense of concern and urgency. So clearly it is an important metric.
Averages can be misleading, however, because they provide no sense of the peaks in the alarm rate, making it difficult to distinguish “alarm floods” from steady-state “normal” operation. Consequently, most alarm performance reports supplement this basic KPI value with a timeline view or separate calculation of alarm rates for both the times when operation is normal and for times of an alarm flood. Figure 2 presents a typical example. The average alarm rate of 16.5 alarms per hour exceeds the target KPI value of 12 from Table 1, but is slightly less than the action limit of 18 per hour, and so might not raise concern, while the timeline view shows that there are significant periods of time where the performance is unacceptable.
FIGURE 2. Timeline views of the data can reveal periods where alarm performance is not acceptable (x-axis: Date, 5/6/2009 to 5/31/2009; series: Critical, Warning, Advisory; alarm rate for Figure 2 on a per-hour basis, overall: 16.5)
Common contributors to an excessively high alarm rate include the following:
• The alarm system is being used to notify the operator of events that do not constitute actual alarms, such as communicating “for your information” messages, prompts, reminders or alerts. According to ISA-18.2, an “alarm” is an indication to the operator that an equipment malfunction, process deviation or abnormal condition requiring a timely response is occurring
• Chattering or other frequently occurring nuisance alarms are present. These often originate from non-process alarm sources of marginal interest to the operator, such as field devices or system hardware diagnostics. Chattering alarms can also indicate an incorrect alarm limit or deadband
• Redundant alarms, where multiple alarms are presented when a single abnormal situation occurs. An example is when a pump is shut down unexpectedly, generating a pump fail alarm in addition to alarms for low outlet flow and low discharge pressure
• A problem with the metric calculation is occurring. A correct calculation only counts new alarms presented to the particular operator or operating position for which the metric is intended, taking into consideration any by-design threshold settings or other authorized filtering mechanisms that cause fewer alarms to be presented to the operator than may be recorded in system event logs
FIGURE 3. Pie charts can supplement alarm performance reports and give information on how much time is spent in the acceptable range (chart title: New alarm activation rate distribution; legend: Acceptable, 0–1 per 10 min.; Manageable, 2–4 per 10 min.; Demanding, 5–9 per 10 min.; segment values shown: 63.4%, 20.0%, 10.1%, 6.6%)
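The rate-based rows of Table 1 can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions, not the calculation used by any particular alarm-analysis product: the function name `rate_metrics`, the fixed (non-overlapping) 10-minute bins counted from the start of the report, and the sample timestamps are all hypothetical, and the input is assumed to already contain only the new alarms presented to one operator position.

```python
from collections import Counter
from datetime import datetime, timedelta

BIN = timedelta(minutes=10)
FLOOD_THRESHOLD = 10  # Table 1: more than 10 alarms in a 10-minute period

def rate_metrics(activations, start, end):
    """Summarize new-alarm activations for one operator over [start, end)."""
    n_bins = (end - start) // BIN              # whole 10-minute periods only
    counts = Counter()                         # bin index -> alarms in that bin
    for t in activations:
        if start <= t < start + n_bins * BIN:
            counts[(t - start) // BIN] += 1
    total = sum(counts.values())
    hours = n_bins * BIN / timedelta(hours=1)
    over = sum(1 for c in counts.values() if c > FLOOD_THRESHOLD)
    return {
        "alarms_per_hour": total / hours,
        "alarms_per_10_min": total / n_bins,
        "max_in_10_min": max(counts.values(), default=0),
        "pct_periods_over_10": 100.0 * over / n_bins,
    }

# Hypothetical two-hour sample: a burst of 12 alarms in the first 10 minutes,
# then one alarm in each of the remaining eleven 10-minute periods.
start = datetime(2009, 5, 6, 0, 0)
end = start + timedelta(hours=2)
burst = [start + timedelta(seconds=30 * i) for i in range(12)]
steady = [start + k * BIN + timedelta(minutes=1) for k in range(1, 12)]
m = rate_metrics(burst + steady, start, end)
print(m)
```

In this made-up sample the hourly average (11.5) stays under the Table 1 target of 12, yet one of the twelve 10-minute periods exceeds 10 alarms, which is well above the 5% action limit. That is the same effect the article describes: an average alone can hide the peaks.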
• [...] formed and alarm priorities have been left at their default values
• Misuse of the priority-setting scheme to classify alarms for reasons other than providing the operator with a tie-breaker during alarm peaks. For example, using priority to classify alarms by impact categories, such as environmental, product quality, safety/health, or economic loss
• Lack of discipline in setting priority based on consideration of direct (proximate) consequences rather than ultimate (unmitigated) consequences. While it may be the case that a designed operator action could fail, followed by a protective system failure, followed by a subsequent incorrect human response, such what-if considerations are likely to lead to a vast skewing of alarm priorities toward critical
Alarm source contribution
The percent of alarms coming from the top-ten most frequent alarm sources relative to the total alarm count is a highly useful metric for quantifying, identifying and ultimately weeding out nuisance alarms and alarm-system misuse. This is especially true if the alarm performance report covers a range of time where operations were routine and without significant process upsets or equipment failures. The top-ten alarm sources often provide “low-hanging fruit” for alarm-management performance improvement. They are a handful of alarms, which if addressed, will create a noticeable positive change for the operator.
Figure 5 shows a pattern observed in many control systems, where as few as ten alarm sources (like a control module or transmitter) out of the many thousands of defined alarm sources collectively account for about 80% of all of the alarms presented to the operator. In this example, the first [...]
FIGURE 5. A small number of alarm sources can often account for the majority of alarms (x-axis: Alarm source; series: Alarms, Cumulative %; sources shown include IIUP16P1, FICUP1516, IIPX15P1, OPC_FI-N2-051, TIFH42106, FIC-1252, PICFP2043, TIFG41106 and FIFC1054)
[...] over 20% of all the alarms presented to the operator, it is a strong indicator that one or both of the following is the case:
• Some of those alarms are nuisance alarms: alarms that operators have come to expect, and in most cases, ignore or consider to be informational
• The alarm system is being misused to (frequently) generate operator prompts based on routine changes in process conditions or operating states that may or may not require action
Eliminating chattering alarms is generally straightforward, using signal-conditioning features found in most control systems, such as on-delay, off-delay and hysteresis (deadband).
Stale alarms
A stale alarm is one that remains annunciated for an extended period of time, most often specified as 24 hours. Stale alarms are surprisingly challenging to quantify. Metrics based on event histories require the presence of both the start and ending alarm event in order to compute an alarm’s annunciated duration. There is no event representing the attainment of a certain age of an annunciated alarm. Thus, it is common to miss counting stale alarms if their activation event or all-clear event falls outside the range of dates and times covered in the event history. Consequently, there are alternate methods for quantifying stale alarms, such as periodic sampling of the active alarm lists at each operator workstation, or simply counting the number of alarms that attained an age greater than the threshold age. Given this variation in methods, it is important to exercise caution when comparing stale-alarm metrics across different sites that may be using different alarm-analytic applications.
In addition to being hard to quantify, stale alarms can also be some of the most difficult nuisance alarms to eliminate. Thus in some respects the upward or downward trend in stale alarm counts provides an informal indication of the overall ongoing health of the alarm management program.
Common contributors to stale alarm counts include the following:
• Routine transitions between process states where the alarm system is not designed to adapt and therefore provides false indications of an abnormal condition
• Alarms associated with standby or idle equipment
• Alarms configured to monitor conditions no longer relevant or available, an indicator of poor management-of-change processes
• Alarms that are essentially latched due to excessive application of hysteresis
• Alarms that persist beyond the called-for operator action, waiting for maintenance action. This likely constitutes an incorrect use of the alarm system, using it as a recording method for outstanding maintenance actions
In conjunction with reviewing the number of stale alarms or the list of stale alarms, it is also important to review what alarms have been manually suppressed (thus removing them from the view of the operator). Suppressing the alarm will remove a stale alarm from the alarm list (effectively reducing the number of stale alarms), but will not address the underlying condition.
Closing remarks
This article touches on just some of the key alarm-system performance metrics and what the numbers represent, in terms of the issues that lie behind them and possible actions to address them. With this understanding, periodic reviews of alarm-performance reports should lead to more focused actions that can improve operator effectiveness and thereby reduce the risks for economic loss, environmental damage or unsafe situations. For further reading on these and other alarm performance metrics, including suggested methods for corrective action, one outstanding resource is Ref. 4.
Edited by Scott Jenkins
References
1. EEMUA Publication 191, Alarm Systems: A Guide to Design, Management and Procurement, Third edition, Engineering Equipment and Materials Users Association, 2013.
2. ANSI/ISA–18.2–2009, Management of Alarm Systems for the Process Industries, approved June 23, 2009. ISBN: 978-1-936007-19-6.
3. IEC 62682:2014, Management of alarm systems for the process industries, International Electrotechnical Commission, 2014.
4. International Society of Automation, Technical Report ISA-TR18.2.5, Alarm System Monitoring, Assessment and Auditing, ISA, 2012.
Author
Kim VanCamp is the DeltaV marketing product manager for alarm management at Emerson Process Management (8000 Norman Center Drive, Bloomington, MN 55437; Phone: 1-952-828-3500; Email: Kim.VanCamp@emerson.com). He joined Emerson in 1976 and has held senior assignments in manufacturing, technology, field service, customer service, service marketing and product marketing. VanCamp is a voting member of the ISA-18.2 committee on Management of Alarm Systems for the Process Industries and has published multiple papers on alarm management. He holds a bachelor’s degree in electrical engineering from the University of Nebraska.
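As a worked illustration of two metrics discussed in the article, the top-ten source contribution and the stale-alarm count, the following Python sketch shows one possible calculation from an event history. It is a hedged example, not the method of any specific alarm-analytics application: the function names, the `(timestamp, source, state)` log layout with `"ACT"`/`"CLR"` states, and the tag names in the sample data are hypothetical. Alarms still active at the end of the report are aged against the report end time, which is one way to avoid missing stale alarms whose all-clear event falls outside the event history.

```python
from collections import Counter
from datetime import datetime, timedelta

def top_n_contribution(sources, n=10):
    """Percent of all alarms that come from the n most frequent sources."""
    counts = Counter(sources)
    top = sum(c for _, c in counts.most_common(n))
    return 100.0 * top / len(sources) if sources else 0.0

def stale_alarms(events, report_end, threshold=timedelta(hours=24)):
    """Return sources whose alarm stayed annunciated longer than threshold.

    events: iterable of (timestamp, source, state), state "ACT" or "CLR"
    (an assumed log layout). A "CLR" with no matching "ACT" in the log is
    skipped, so activations that precede the event history are still missed.
    """
    active = {}   # source -> activation time of the currently open alarm
    stale = []
    for ts, source, state in sorted(events):
        if state == "ACT":
            active.setdefault(source, ts)      # ignore re-activation while open
        elif state == "CLR" and source in active:
            if ts - active.pop(source) > threshold:
                stale.append(source)
    # Alarms never cleared within the report are aged against report_end.
    stale += [s for s, t0 in active.items() if report_end - t0 > threshold]
    return stale

# Hypothetical sample data
t0 = datetime(2009, 5, 6)
events = [
    (t0, "TI-101", "ACT"),
    (t0 + timedelta(hours=30), "TI-101", "CLR"),   # annunciated for 30 hours
    (t0 + timedelta(hours=1), "FI-202", "ACT"),
    (t0 + timedelta(hours=2), "FI-202", "CLR"),    # annunciated for 1 hour
    (t0 + timedelta(hours=3), "LI-303", "ACT"),    # never cleared in the log
]
stale = stale_alarms(events, report_end=t0 + timedelta(hours=48))
share = top_n_contribution(["A"] * 8 + ["B"] + ["C"], n=1)
print(stale, share)
```

Because the result depends on how uncleared alarms and missing activation events are handled, two tools applying different rules to the same log can report different stale-alarm counts, which is the caution the article raises about cross-site comparisons.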
Posted with permission from March 2016. Chemical Engineering, Access Intelligence, Copyright 2016. All rights reserved.
For more information on the use of this content, contact Wright’s Media at 877-652-5295