Professional Documents
Culture Documents
Causal Analysis For Alarm Flood Reduction
Causal Analysis For Alarm Flood Reduction
Causal Analysis For Alarm Flood Reduction
including Biosystems
June 6-8, 2016. NTNU, Trondheim, Norway
Abstract: The introduction of distributed control systems and the high level of interconnectiv-
ity of modern process plants has caused alarm flooding to become one of the main problems in
alarm management of process plants. A reduction of alarm flood periods contributes to a decrease
in plant incidents. In this work, a combination of alarm log, process data and connectivity
analysis is used to isolate consequence alarms originating from the same process abnormality
and to provide a causal alarm suggestion. The effectiveness of the method is illustrated on an
industrial case study of an ethylene plant, a typical example of a large-scale industrial system.
Keywords: alarm systems, fault detection and diagnosis, root-cause, plant-wide disturbance,
plant connectivity, approximate sequence matching, large-scale systems.
pointing them to the causal alarm (i.e. the alarm related Alarm log
to the root-cause of the alarm flood) could significantly
reduce their reaction time.
Remove cha=ering
Causal root-cause analysis for process troubleshooting is alarms
Alarm log without
the topic of several recent contributions investigating the cha=ering alarms
design of systematic methods for abnormality propagation Iden7fy alarm flood
paths determination and plant-wide disturbances causal periods
analysis. Quantitative data-driven methods detecting di- Alarm flood sequences
rectionality between measurements have emerged, such as Cluster alarm flood
time delay analysis (Fan and Deyun, 2012) or transfer sequences Alarm flood sequences
entropy, which is used to identify the causality between clustered and an
abnormality template
measurements even in absence of observable time delays for each cluster
between them (Bauer et al., 2007). A combination of quan-
titative process history methods and qualitative models Map alarms to
Topology data Process data
signals and assets
like Signed direct graphs (SDG) (Fan and Deyun, 2012)
are proposed in Thambirajah et al. (2009). Abnormality
templates and alarm,
The present work follows the line of thought in Schleburg signal and assets
et al. (2013) but aims at developing a systematic method iden7fiers mapped
Isolate the causal
for the isolation of the causal alarm in an alarm flood. The alarm
proposed approach relies on the use of multiple process
information sources: alarm logs, historical process data Abnormality templates and
and a topology model of the process. Even though the corresponding causal
data originating from different information sources differ alarms
724
IFAC DYCOPS-CAB, 2016
June 6-8, 2016. NTNU, Trondheim, Norway
that the beginning or the end of an alarm flood sequence is abnormality. Therefore, an alarm template can be built
cut out, in order not to lose part of the sequence, the time for each cluster/process abnormality. This alarm template
intervals before and after the consecutive time intervals contains the representative alarms that identify the corre-
containing more than τ alarms are also included. sponding process abnormality (Bouchair et al., 2013), see
Fig. 1. Alarms that are not related to a process abnormal-
2.3 Cluster alarm flood sequences ity can trigger during the same time interval. Obviously,
these alarms should not be included in the abnormality
Once the alarm flood time intervals are isolated and template. Filtering out these alarms is achieved based on
the alarm flood sequences are built, similar alarm flood their frequency of occurrence in the training set of alarm
sequences are clustered using sequence pattern matching. floods. The threshold on frequency of occurrence is set to
Alarm time stamps are taken into account as the proximity 50%, i.e. an alarm is kept only if it occurs in more than
in time of the alarms is more important than the exact or- 50% of the sequences belonging to a given cluster.
der of occurrence. We use the method described in Cheng
et al. (2013) based on a modified Smith-Waterman (MSW) Similarity
Clustering
Pre-filter and create
algorithm to blur the order of the alarms in the sequence indices
templates
when they occur closely in time. A similarity matrix is Set of alarm flood Reduced set of Similarity indices Alarm flood sequences
built using pairwise similarity indices obtained from the sequences combinations of alarm clustered and a fault
flood sequences template for each cluster
MSW algorithm. Finally, agglomerative hierarchical clus-
tering (AHC) with the average linkage criterion is applied
to cluster the alarm flood sequences. Fig. 2. Clustering alarm flood sequences step
One main disadvantage of the MSW algorithm is its The third step of the proposed method where alarm flood
heavy computational burden. In order to overcome this sequences are clustered and a set of templates defining
problem a prefiltering stage is added. A less precise but the process abnormalities of all clusters is generated is
less time consuming method is used (Kabir et al., 2013). summarized in Fig. 2 and described in Alg. 1.
Combinations of alarm sequences with low similarity are
filtered out and the MSW index computation is applied to
Alg. 1: Cluster alarm flood sequences
a subset of alarm sequence combinations with a significant input : Alarm flood sequences {Ai } : i = 1, 2, ..., N
similarity. A pair of alarm sequences is considered similar output: Clusters {Ck } : k = 1, 2, ..., K Ck = {Ai : i ∈ 1, 2, ..., N } and
fault templates {Tk } : k = 1, 2, ..., K Tk = {e : e ∈ Σ}
if the two alarm sequences have a high ratio of alarms
1 For each pair {Ai , Aj } : i 6= j and i, j = 1, 2, ..., N
in common. If two alarm sequences have few alarms in if(SIpref (Ai , Aj ) ≥ λpref ) add to list LM SW
common, the computation of the MSW similarity index is 2 Build an N × N similarity matrix S
discarded. A zero similarity index will be assumed. n
SIM SW (Ai , Aj ), if {Ai , Aj } ∈ LM SW
si,j =
0, otherwise
Two thresholds are used: the prefilter threshold (λpref )
and the clustering threshold (λclus ). The first threshold 3 {Ck } = AHC(S, λclus )
decides whether a pair of alarm sequences is filtered out. 4 For each Ck if (alarm type e ∈ Σ is present in more than 50% of
Ai ∈ Ck ) add e to Tk
The second threshold sets the stopping criteria of the
clustering algorithm. The values of these two thresholds
are interrelated: for a given clustering threshold, setting
the prefilter threshold too high can lead to combinations of 2.4 Map alarms to signals and assets
alarm sequences with MSW similarity index large enough
for being clustered, but with a prefilter similarity index The absence of a common naming convention for alarm,
too low to pass the prefilter. A solution to facilitate the signal and assets tags hinders a straightforward mapping
thresholds’ choice is to make the prefilter and the MSW between the different process information sources. We
similarity indices comparable. If one could insure that propose to map alarms, signals and assets based on the
SIpref ilter ≥ SIM SW , for a given clustering threshold if similarity of the symbols contained in their tag names.
the prefilter threshold is set to be equal or smaller than the For instance, the tag name of a temperature controller
clustering threshold, all pairs of alarms that have a MSW from the case study is TC11 in the P&ID, this controller
similarity index high enough to be clustered together will has a signal associated whose tag is TC11CO.XAY and
pass the prefilter. an alarm type TC11CO is assigned to this signal.
Given two alarm sequences A and B, a similarity index In this work six hashmaps {M apx→y : x 6= y, x, y =
that meet this requirement is: e, si, as} mapping alarm types (e), signals (si) and assets
(as) are generated based on the computation of a similarity
|A ∩ B| index as described in Alg. 2. Particularly, the similarity
SIpref ilter (A, B) = (1) index (SISW ) obtained from the original Smith Waterman
max(|A|, |B|) algorithm (Smith and Waterman, 1981).
The numerator represents the number of common alarm
occurrences in both sequences. And the denominator is the Alg. 2: Mapping signals and alarms
input : Sets of all signal tags Si={sitag } and all alarm types Σ
number of alarm occurrences of the longest sequence (the output: M apsi→e and M ape→si
similarity index is normalized between 0 and 1). 1 For each sitag , find emap : emap ∈ Σ, SISW (sitag , emap ) =
max({SI(sitag , e)}) e ∈ Σ, SISW (sitag , emap ) ≥ λmap
According to assumption (2) in Section 2, sequences be- 2 Add pair [sitag , emap ] to M apsi→e and [emap , sitag ] to M ape→si
longing to the same cluster stem from the same process
725
IFAC DYCOPS-CAB, 2016
June 6-8, 2016. NTNU, Trondheim, Norway
Fig. 3. P&ID of the cracking and quenching section of the ethylene plant
2.5 Isolate the causal alarm the plant is explored using a graph-search algorithm on
a graph capturing the connections between assets in the
The first alarm with respect to the time of occurrence plant. The depth first search algorithm is employed in
cannot be considered as the causal alarm since the time this work (Yang et al., 2010). The assets suggested as
at which an alarm triggers is highly dependent on the set- root-causes from the data-driven analysis are taken as
tings of the alarm limits. Moreover, the time between the starting points of the disturbance propagation path while
occurrence of a process abnormality and the corresponding the remaining assets are taken as secondary disturbed
alarm triggering is probabilistic. As the causal relation- elements. All feasible propagation paths between the sug-
ships between signals can be captured, the process data gested root-cause and each of the secondary disturbed
associated to these alarms should be analyzed instead. assets are explored. If no feasible path is found, the root-
cause hypothesis is discarded.
The method used for this purpose is transfer entropy
(Bauer et al., 2007). Transfer entropy uses transition prob- Once the spurious results of the data-driven analysis have
abilities to estimate the amount of information transferred been filtered out and one of the root-cause suggestions has
from one signal x to another signal y and reads: been validated with plant connectivity, the causal alarm is
X p(xi+h |yi , xi ) taken as the alarm associated to the root-cause asset (see
t(x|y) = p(xi+h , yi , xi ) · log (2) Alg. 3).
p(xi+h |xi )
The causality is obtained by comparing the amount of Alg. 3: Isolate the causal alarm
input : Clusters {Ck } and fault templates {Tk }
information transferred from signal x to signal y with the output: A causal alarm akc for each fautl template Tk
amount information transferred from signal y to signal x. 1 For each cluster Ck , choose a sequence Ai : Ai ∈ Ck
If tx→y is positive, more information is transferred from x 2 {sip } = M ape→si ({ep : ep ∈ Tk })
3 Perform process-data causal analysis on signals {sip } to obtain
to y, and therefore x caused y. If tx→y is negative, y caused suggested root-cause signals {siksg } : q ∈ N
x. If it has a value close to 0, then the causality cannot be 4 {ask k k
sg } = M apsi→as ({sisg,q }) and {assd } = M ape→as ({ep })
deduced. 5 If ( ∃ask k
sg,q with feasible propagation paths to all assd )
tx→y = t(y|x) − t(x|y) (3) ak k
c = M apas→e (assg,q )
726
IFAC DYCOPS-CAB, 2016
June 6-8, 2016. NTNU, Trondheim, Norway
After loading the alarm log to the software tool and per-
forming the steps Remove chattering alarms and Identify
alarm flood periods, 23 alarm flood periods are identified. Fig. 6. Time trends of the signals included in the analysis
Results obtained from the Cluster alarm flood sequences
step are depicted in Fig. 4 (a). Three clusters are obtained.
From the 23 alarm sequences analyzed, four do not belong
to any cluster and 19 are clustered under one of the three
groups. Each one of these clusters containing similar alarm
flood sequences represents a different process abnormality.
Only the process abnormality from Cluster1 is further
analyzed. The template of this abnormality is displayed
in Fig. 4 (b).
To perform the data-driven and connectivity analysis, the
signals and assets associated to the alarms included in the
template are identified. After loading the process data and
the topology model of the plant, the automatic mapping
using the Smith-Waterman algorithm is performed. The
results are displayed in Fig. 5. Tags in red represent alarms, Fig. 7. Transfer entropy values tx→y
tags in blue signals and tags in green assets. For instance,
the valve G02 has an associated signal G02.Y and two suggests that the root-cause is the signal PC11.XAY.
alarms G02/010 and G02/011. GI01.Y, PC12.XAY, PI15.Y, PI14.Y, PI13.Y, PI16.Y and
GI02.Y are considered secondary disturbed signals (see
The process signals appearing in the mapping tree of Fig. 5 Fig. 7).
are considered for the analysis. These signals are displayed
in Fig. 6. The results of the process-data analysis are Finally, the result of the process-data analysis is validated
shown in a bubble chart representing the transfer entropy using the process connectivity. The process connectivity
based causality matrix (Bauer et al., 2007). The analysis analysis found feasible propagation paths starting from
727
IFAC DYCOPS-CAB, 2016
June 6-8, 2016. NTNU, Trondheim, Norway
728