
Quickest Change Detection with Confusing Change


Y.-Z. Janice Chen∗, Jinhang Zuo∗, Venugopal V. Veeravalli‡, Don Towsley∗
∗ University of Massachusetts Amherst, {yuzhenchen, jinhangzuo, towsley}@cs.umass.edu
‡ University of Illinois at Urbana-Champaign, vvv@illinois.edu

arXiv:2405.00842v1 [math.ST] 1 May 2024

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract—In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of interest. Our objective is to detect a bad change as quickly as possible while avoiding false alarms in the pre-change stage or after a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. We propose two novel CuSum-based detection procedures, S-CuSum and J-CuSum, each leveraging two CuSum statistics, and show that they apply across all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them with numerical results. Furthermore, both procedures are computationally efficient, as they only require simple recursive updates.

Index Terms—Sequential change detection, CuSum procedure

I. INTRODUCTION

The quickest change detection (QCD) problem is of fundamental importance in sequential analysis and statistical inference; see, e.g., [1], [2], [3], [4] for books and survey papers on this topic. Moreover, the problem of sequential detection of changes or anomalies in stochastic systems arises in a variety of science and engineering domains, such as signal processing in sensor networks [5], [6] and cognitive radio [7], quality control in manufacturing [8], [9] and power delivery [10], [11], anomaly detection in social networks [12] and public health [13], and surveillance [14], where timely detection of changes or abnormalities is crucial for decision-making or system control.

In the classical formulation of the QCD problem, the objective is to detect any change in the distribution. However, in many applications there can be confusing changes that are not of primary interest, and raising an alarm for a confusing change may waste human resources on checking the system. For instance, in cybersecurity applications, natural fluctuations or diurnal variations in network traffic patterns can be confusing when it comes to identifying potentially malicious activity. In environmental or industrial process monitoring, delicate sensors or instruments can experience wear or calibration drift and therefore introduce changes in the data that do not reflect the underlying environment or process. Another example is wireless communication in a complicated environment, where physical obstacles or unrelated nearby communication systems can cause signal interference.

This study addresses a QCD problem with the aim of swiftly detecting bad changes while preventing false alarms in the pre-change stage or after confusing changes. We consider a sequence of observations $X_1, X_2, \ldots, X_t$ generated from a stochastic system. During the pre-change stage (i.e., when the system is in the in-control state), the observations follow distribution $f_0$. At an unknown deterministic time $\nu$, an event occurs and changes the distribution from which the observations arise. The event can manifest as either a confusing change with distribution $f_C$ or a bad change with distribution $f_B$. The QCD problem with confusing change raises the question of how to design a detection procedure that will not be triggered by a confusing change. Furthermore, the false alarm metric in this context differs from that in classical QCD problems, since it must account for both the pre-change stage and confusing changes.

There has been prior research extending the classical QCD framework to formulations with more intricate assumptions on the distributions before or after an event occurs, including composite pre-change distributions [15], [16], composite post-change distributions [17], post-change distribution isolation [18], [19], and transient changes [20]. Our QCD with confusing change problem differs from those problems: they still aim to raise an alarm as soon as any change happens, whereas we avoid raising an alarm if the change is a confusing change. The most relevant extension for our study is the formulation of a nuisance change studied by Lau and Tay [21]. In a similar vein, [21] considers two types of changes and aims to detect only one of them. While [21] presents a more general model for change points, allowing one type of change to occur after the other, it focuses solely on a restricted set of post-change distributions. In contrast to [21], we focus on scenarios where only one type of change occurs and devise solutions applicable to all post-change distributions. Therefore, we view our work as complementary to that of Lau and Tay [21]. From an algorithm design perspective, our proposed procedures differ from prior procedures that involve two CuSum statistics [22], [23], and they are more computationally efficient than the window-limited generalized likelihood ratio test-based procedure proposed in [21].

Our contributions are as follows. In Section II, we formulate a novel quickest change detection problem where the change can be either a bad change or a confusing change, and we propose a false alarm metric that takes into account not only the run length to false alarm in the pre-change stage but also the run length to false alarm for a confusing change. In Section III, we investigate the problem under different combinations of pre-change, bad change, and confusing change distributions, and we identify a scenario in which procedures relying on a single CuSum statistic incur short run lengths to false alarm. In Section IV, we propose two novel procedures, S-CuSum and J-CuSum, that incorporate two CuSum statistics and work under all kinds of distributions. In Section V, we first provide a universal lower bound on the detection delay of any procedure that fulfills the false alarm requirement, and then we provide false alarm guarantees (lower bounds on the run length to false alarm) and detection delay upper bounds for S-CuSum and J-CuSum. In Section VI, our simulation results under all possible scenarios corroborate our theoretical guarantees for S-CuSum and J-CuSum. Finally, in Section VII, we conclude this paper and discuss future directions.

TABLE I: Possible Drifts of $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ Statistics under Distributions $f_0$, $f_C$, and $f_B$

$$
\begin{array}{l|l|l}
 & \mathrm{CuSum}_{f_B,f_0} & \mathrm{CuSum}_{f_B,f_C} \\ \hline
\text{Under } f_0 & \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_0(X)}\right] = -D_{KL}(f_0\|f_B) < 0 & \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right] = -D_{KL}(f_0\|f_B) + D_{KL}(f_0\|f_C) \gtrless 0 \\
\text{Under } f_C & \mathbb{E}_{f_C}\left[\log\frac{f_B(X)}{f_0(X)}\right] = -D_{KL}(f_C\|f_B) + D_{KL}(f_C\|f_0) \gtrless 0 & \mathbb{E}_{f_C}\left[\log\frac{f_B(X)}{f_C(X)}\right] = -D_{KL}(f_C\|f_B) < 0 \\
\text{Under } f_B & \mathbb{E}_{f_B}\left[\log\frac{f_B(X)}{f_0(X)}\right] = D_{KL}(f_B\|f_0) > 0 & \mathbb{E}_{f_B}\left[\log\frac{f_B(X)}{f_C(X)}\right] = D_{KL}(f_B\|f_C) > 0
\end{array}
$$
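For unit-variance Gaussian models, such as those used in Section VI, every entry of Table I has a closed form, because $D_{KL}(\mathcal{N}(a,1)\|\mathcal{N}(b,1)) = (a-b)^2/2$. The short Python sketch below evaluates the six drifts and applies the sign conditions that Section III uses to define Scenarios 1 to 3. It assumes a shared, known variance, and the helper names (`kl_gauss`, `drift_table`, `scenario`) are hypothetical, not part of the paper.

```python
# Hypothetical helpers (not from the paper); assumes Gaussians with a shared variance.


def kl_gauss(mu_p: float, mu_q: float, sigma: float = 1.0) -> float:
    """D_KL(N(mu_p, sigma^2) || N(mu_q, sigma^2)) for two Gaussians with equal variance."""
    return (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)


def drift_table(mu0: float, muC: float, muB: float, sigma: float = 1.0) -> dict:
    """Expected per-sample increment of each CuSum statistic, i.e., the entries of Table I."""
    def d(p: float, q: float) -> float:
        return kl_gauss(p, q, sigma)

    return {
        ("CuSum_B0", "f0"): -d(mu0, muB),
        ("CuSum_B0", "fC"): -d(muC, muB) + d(muC, mu0),
        ("CuSum_B0", "fB"): d(muB, mu0),
        ("CuSum_BC", "f0"): -d(mu0, muB) + d(mu0, muC),
        ("CuSum_BC", "fC"): -d(muC, muB),
        ("CuSum_BC", "fB"): d(muB, muC),
    }


def scenario(mu0: float, muC: float, muB: float, sigma: float = 1.0) -> int:
    """Scenario label (1, 2, or 3) from the drift-sign conditions of Section III."""
    drifts = drift_table(mu0, muC, muB, sigma)
    if drifts[("CuSum_B0", "fC")] <= 0:
        return 1  # CuSum_{fB,f0} does not drift upward after a confusing change
    if drifts[("CuSum_BC", "f0")] <= 0:
        return 2  # CuSum_{fB,fC} does not drift upward before the change
    return 3      # each single-CuSum procedure has a problematic positive drift


# Means (mu0, muC, muB) used in Section VI, all with unit variance.
settings = {
    "Scenario 1": (0.0, -0.5, 0.5),
    "Scenario 2": (0.0, 0.7, 1.2),
    "Scenario 3": (0.0, 1.0, 0.5),
}
for name, params in settings.items():
    print(name, "->", scenario(*params), drift_table(*params))
```

With the Section VI parameters, the sketch labels the three settings 1, 2, and 3, matching the scenarios they were chosen to represent. For non-Gaussian families, `kl_gauss` would have to be replaced by the appropriate KL divergence.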

II. PROBLEM FORMULATION & PRELIMINARY

Let $\{X_t : t \in \mathbb{N}^+\}$ be a sequence of random variables whose values are observed sequentially, and let $\{\mathcal{F}_t : t \in \mathbb{N}^+\}$ be the filtration generated by the sequence, i.e., $\mathcal{F}_t := \sigma(X_i : 1 \le i \le t)$, where $\sigma(\cdot)$ denotes the generated σ-algebra. Let $f_0(X)$ be the density of the pre-change distribution, $f_C(X)$ the density of the confusing change distribution, and $f_B(X)$ the density of the bad change distribution. We assume that $f_0$, $f_C$, and $f_B$ are known, distinct, and defined with respect to a common measure on a common measurable space. We denote the unknown but deterministic change-point by $\nu$.

More specifically, we denote by $\mathbb{P}_{\nu,f_*}$, $* \in \{C, B\}$, the underlying probability measure, and by $\mathbb{E}_{\nu,f_*}$ the corresponding expectation, when the change-point is $\nu$ and the post-change distribution is $f_*$. Moreover, we denote by $\mathbb{P}_\infty$ the underlying probability measure, and by $\mathbb{E}_\infty$ the corresponding expectation, when the change never occurs, i.e., $\mathbb{P}_\infty = \mathbb{P}_{f_0}$ and $\mathbb{E}_\infty = \mathbb{E}_{f_0}$. Finally, let $T$ denote a stopping time, i.e., the time at which we stop taking observations and declare that a bad change has occurred.

False Alarm Measure. An alarm is considered a false alarm if (1) the alarm is raised in the pre-change stage, i.e., $T < \nu$, or (2) the alarm is raised for a confusing change, i.e., $T \ge \nu$ and $X_T \sim f_C(X)$. Therefore, for any stopping time $T$, we measure the false alarm performance in terms of its mean time to false alarm in both cases. Specifically, we denote by $\mathcal{C}_\gamma$ the subfamily of stopping times for which the worst-case average run length (WARL) to false alarm is at least $\gamma$ in both cases, i.e.,

$$\mathcal{C}_\gamma := \Big\{T : \mathbb{E}_\infty[T] \ge \gamma,\ \inf_{\nu \ge 1} \mathbb{E}_{\nu,f_C}[T] \ge \gamma\Big\}. \tag{1}$$

Delay Measure. We use worst-case measures for delay. Following Pollak's criterion [24], for any $T \in \mathcal{C}_\gamma$,

$$\mathrm{WADD}_{f_B}(T) := \sup_{\nu \ge 1} \mathbb{E}_{\nu,f_B}[T - \nu \mid T \ge \nu]. \tag{2}$$

Optimization Problem. The optimization problem in this work is to find a stopping rule for detecting a bad change that belongs to $\mathcal{C}_\gamma$ for every $\gamma \ge 1$ and approximates

$$\inf_{T \in \mathcal{C}_\gamma} \mathrm{WADD}_{f_B}(T) \tag{3}$$

as $\gamma \to \infty$.

Methodology. In this work, we propose Cumulative Sum (CuSum)-based procedures for the QCD with confusing change problem. Hence, we briefly review CuSum statistics and their recursion here. We define the cumulative sum of the log-likelihood ratio of two densities, e.g., $f_B$ and $f_0$, as

$$\mathrm{CuSum}_{f_B,f_0}[t] := \max_{k : 1 \le k \le t} \sum_{i=k}^{t} \log\frac{f_B(X_i)}{f_0(X_i)}. \tag{4}$$

Such a CuSum statistic can be updated recursively by

$$\mathrm{CuSum}_{f_B,f_0}[t] \equiv \left(\mathrm{CuSum}_{f_B,f_0}[t-1] + \log\frac{f_B(X_t)}{f_0(X_t)}\right)^+. \tag{5}$$

Please refer to, e.g., [1], [2], [3], [4] for more discussion on this type of procedure.

III. CHARACTERIZATION OF CHALLENGING SCENARIO

In this section, we categorize all instances of the QCD with confusing change problem into three scenarios and identify one scenario as more challenging than the other two. Specifically, we characterize the scenarios by the values of the KL divergences $D_{KL}(f_0\|f_C)$, $D_{KL}(f_0\|f_B)$, $D_{KL}(f_C\|f_0)$, $D_{KL}(f_C\|f_B)$, $D_{KL}(f_B\|f_0)$, and $D_{KL}(f_B\|f_C)$, as they govern the drifts of the CuSum statistics.

The drifts of CuSum statistics are determined by the signs of the expectations of the log-likelihood ratios. For example, when a data sample comes from distribution $f_B$, i.e., $X \sim f_B$, the expectation of the log-likelihood ratio $\log(f_B(X)/f_0(X))$ equals the KL divergence $D_{KL}(f_B\|f_0)$, as

$$D_{KL}(f_B\|f_0) := \mathbb{E}_{f_B}\left[\log\frac{f_B(X)}{f_0(X)}\right] > 0, \tag{6}$$
where $D_{KL}(f_B\|f_0) > 0$ as long as $f_B \neq f_0$ (Gibbs' inequality), in which case the value of $\mathrm{CuSum}_{f_B,f_0}$ is generally increasing, i.e., $\mathrm{CuSum}_{f_B,f_0}$ has a positive drift. Following the same reasoning, under $f_0$, $\mathrm{CuSum}_{f_B,f_0}$ has a negative drift because

$$\mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_0(X)}\right] = -D_{KL}(f_0\|f_B) < 0. \tag{7}$$

Fig. 1: Illustration of $\mathrm{CuSum}_{f_B,f_0}$ behaviors in Scenario 1: applying the standard $\mathrm{CuSum}_{f_B,f_0}$ procedure suffices to detect a bad change quickly while avoiding raising a false alarm for a confusing change. (a) Confusing Change; (b) Bad Change.

Fig. 2: Illustration of $\mathrm{CuSum}_{f_B,f_0}$ (blue lines) and $\mathrm{CuSum}_{f_B,f_C}$ (red lines) behaviors in Scenario 2: applying the standard $\mathrm{CuSum}_{f_B,f_C}$ procedure suffices to detect a bad change quickly while avoiding raising a false alarm for a confusing change. (a) Confusing Change; (b) Bad Change.
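The drift argument in (6) and (7), together with the recursion (5), can be checked with a short simulation. The Python sketch below is illustrative only: it assumes Gaussian densities $f_0 = \mathcal{N}(0,1)$ and $f_B = \mathcal{N}(0.5,1)$ (the Scenario 1 pair from Section VI), and its function names are hypothetical. It also compares the recursion with a direct evaluation of (4); started from zero, recursion (5) returns the positive part of the maximum in (4), so the direct evaluation is floored at zero.

```python
import math
import random


def gauss_logpdf(x: float, mu: float, sigma: float = 1.0) -> float:
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def llr(x: float, mu_num: float, mu_den: float, sigma: float = 1.0) -> float:
    """Log-likelihood ratio log(f_num(x) / f_den(x)) for two equal-variance Gaussians."""
    return gauss_logpdf(x, mu_num, sigma) - gauss_logpdf(x, mu_den, sigma)


def cusum_path(xs, mu_num, mu_den, sigma=1.0):
    """Recursion (5): S[t] = (S[t-1] + llr(x_t))^+ with S[0] = 0."""
    s, path = 0.0, []
    for x in xs:
        s = max(s + llr(x, mu_num, mu_den, sigma), 0.0)
        path.append(s)
    return path


def cusum_bruteforce(xs, mu_num, mu_den, sigma=1.0):
    """Definition (4) evaluated directly at every t, then floored at zero."""
    terms = [llr(x, mu_num, mu_den, sigma) for x in xs]
    out = []
    for t in range(len(terms)):
        best = max(sum(terms[k:t + 1]) for k in range(t + 1))
        out.append(max(best, 0.0))
    return out


# Scenario 1 pair from Section VI: f0 = N(0, 1), fB = N(0.5, 1).
MU0, MUB = 0.0, 0.5
rng = random.Random(1)
n = 2000
under_f0 = cusum_path([rng.gauss(MU0, 1.0) for _ in range(n)], MUB, MU0)
under_fB = cusum_path([rng.gauss(MUB, 1.0) for _ in range(n)], MUB, MU0)

# Negative drift -D_KL(f0||fB) = -0.125: the statistic keeps returning toward 0.
print("final value under f0:", round(under_f0[-1], 2))
# Positive drift D_KL(fB||f0) = 0.125: grows roughly like 0.125 * n.
print("final value under fB:", round(under_fB[-1], 2))
```

Under $f_0$ the statistic is repeatedly pulled back toward zero by the negative drift $-D_{KL}(f_0\|f_B) = -0.125$, whereas under $f_B$ it grows at roughly $D_{KL}(f_B\|f_0) = 0.125$ per sample, which is why a threshold on $\mathrm{CuSum}_{f_B,f_0}$ separates the two regimes.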
Interestingly, the drift of $\mathrm{CuSum}_{f_B,f_0}$ under $f_C$, given by

\begin{align}
\mathbb{E}_{f_C}\left[\log\frac{f_B(X)}{f_0(X)}\right]
&= \mathbb{E}_{f_C}\left[\log\left(\frac{f_B(X)}{f_0(X)}\cdot\frac{f_C(X)}{f_C(X)}\right)\right] \tag{8}\\
&= \mathbb{E}_{f_C}\left[\log\frac{f_B(X)}{f_C(X)}\right] + \mathbb{E}_{f_C}\left[\log\frac{f_C(X)}{f_0(X)}\right] \tag{9}\\
&= -D_{KL}(f_C\|f_B) + D_{KL}(f_C\|f_0), \tag{10}
\end{align}

can be positive, zero, or negative depending on the values of $-D_{KL}(f_C\|f_B)$ and $D_{KL}(f_C\|f_0)$.

As there is an extra distribution $f_C$ besides the typical distributions $f_0$ and $f_B$ in this QCD with confusing change problem, it is natural to consider an extra CuSum statistic, $\mathrm{CuSum}_{f_B,f_C}$ (defined in the same way as $\mathrm{CuSum}_{f_B,f_0}$ in (4)), in addition to the typical CuSum statistic $\mathrm{CuSum}_{f_B,f_0}$ considered in quickest change detection problems. (Further justification for considering $\mathrm{CuSum}_{f_B,f_C}$ will be provided in Section IV.) Following the same reasoning as for $\mathrm{CuSum}_{f_B,f_0}$, we find that $\mathrm{CuSum}_{f_B,f_C}$ has a positive drift under $f_B$, a negative drift under $f_C$, and a positive, zero, or negative drift under $f_0$ depending on the values of $-D_{KL}(f_0\|f_B)$ and $D_{KL}(f_0\|f_C)$. We list the possible drifts of the two CuSum statistics under the three possible distributions $f_0$, $f_B$, and $f_C$ in Table I.

According to the relations between the KL divergences among $f_0$, $f_C$, and $f_B$, which lead to different drifts of the CuSum statistics, we categorize instances of the QCD with confusing change problem into three scenarios:
1. when $\mathrm{CuSum}_{f_B,f_0}$ has a zero or negative drift under $f_C$, i.e., $-D_{KL}(f_C\|f_B) + D_{KL}(f_C\|f_0) \le 0$;
2. when $\mathrm{CuSum}_{f_B,f_0}$ has a positive drift under $f_C$ and $\mathrm{CuSum}_{f_B,f_C}$ has a zero or negative drift under $f_0$, i.e., $-D_{KL}(f_C\|f_B) + D_{KL}(f_C\|f_0) > 0$ and $-D_{KL}(f_0\|f_B) + D_{KL}(f_0\|f_C) \le 0$;
3. when $\mathrm{CuSum}_{f_B,f_0}$ has a positive drift under $f_C$ and $\mathrm{CuSum}_{f_B,f_C}$ also has a positive drift under $f_0$, i.e., $-D_{KL}(f_C\|f_B) + D_{KL}(f_C\|f_0) > 0$ and $-D_{KL}(f_0\|f_B) + D_{KL}(f_0\|f_C) > 0$.

Fig. 3: Illustration of $\mathrm{CuSum}_{f_B,f_0}$ (blue lines) and $\mathrm{CuSum}_{f_B,f_C}$ (red lines) behaviors in Scenario 3: $\mathrm{CuSum}_{f_B,f_0}$ fails to distinguish a bad change from a confusing change, and $\mathrm{CuSum}_{f_B,f_C}$ fails to avoid raising a false alarm during the pre-change stage. (a) Confusing Change; (b) Bad Change.

Scenarios 1 and 2 are trivial, since simply applying a standard CuSum procedure suffices to quickly detect a bad change while avoiding a false alarm for a confusing change. In Figure 1, we illustrate the behavior of $\mathrm{CuSum}_{f_B,f_0}$ in Scenario 1. As illustrated in Figure 1a, since in Scenario 1 $\mathrm{CuSum}_{f_B,f_0}$ has a non-positive drift under the confusing change $f_C$ (as under the pre-change $f_0$), $\mathrm{CuSum}_{f_B,f_0}$ generally remains small after a confusing change occurs and will rarely trigger a false alarm. Figure 1b illustrates that $\mathrm{CuSum}_{f_B,f_0}$ generally increases after a bad change occurs, as it has a positive drift, and quickly passes the alarm threshold. In Figure 2, we find that in Scenario 2 $\mathrm{CuSum}_{f_B,f_0}$ fails to distinguish a bad change from a confusing change because it has a positive drift under the confusing change $f_C$ (as under the bad change $f_B$). Fortunately, applying a standard $\mathrm{CuSum}_{f_B,f_C}$ procedure suffices for our goal, as it has non-positive drifts under both the pre-change and confusing change distributions and a positive drift only under the bad change distribution.

Scenario 3 poses challenges beyond the capabilities of standard single CuSum procedures. Specifically, as illustrated in Figure 3a, in Scenario 3 $\mathrm{CuSum}_{f_B,f_0}$ generally increases under a confusing change distribution and hence is likely to trigger a false alarm. Figure 3 also shows that in Scenario 3, the standard $\mathrm{CuSum}_{f_B,f_C}$ procedure is likely to raise a false
alarm in the pre-change stage, as it has a positive drift under $f_0$. Hence, neither of the standard single CuSum procedures $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ can address this problem on its own. It is worth noting that a naive combination of the standard single CuSum procedures, such as launching $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ separately and simultaneously from the beginning and stopping when both statistics pass their thresholds, does not tackle this problem either. Indeed, Figure 3a shows that if the pre-change stage is long, i.e., the change point $\nu$ is large, $\mathrm{CuSum}_{f_B,f_C}$ may be much larger than its threshold at the time a confusing change occurs, so that it would not fall below the threshold before $\mathrm{CuSum}_{f_B,f_0}$ passes its threshold, and therefore a false alarm would be triggered for a confusing change.

Algorithm 1 Successive CuSum (S-CuSum) Procedure
  Input: f0, fC, fB, b0, bC
  Initialize: t = 0, CuSum_W[t] ← 0, CuSum^S_Λ[t] ← 0
  while 1 do
    t ← t + 1
    if CuSum_W[t − 1] < b0 then
      CuSum_W[t] ← (CuSum_W[t − 1] + W[t])^+
    else
      CuSum^S_Λ[t] ← (CuSum^S_Λ[t − 1] + Λ[t])^+
      if CuSum^S_Λ[t] ≥ bC then
        T_S-CuSum ← t
        Break
      end if
    end if
  end while
  Output: stopping time T_S-CuSum

Algorithm 2 Joint CuSum (J-CuSum) Policy
  Input: f0, fC, fB, b0, bC
  Initialize: t = 0, CuSum_W[t] ← 0, CuSum^J_Λ[t] ← 0
  while 1 do
    t ← t + 1
    if CuSum_W[t − 1] < b0 then
      CuSum_W[t] ← (CuSum_W[t − 1] + W[t])^+
    end if
    if CuSum^J_Λ[t − 1] < bC then
      CuSum^J_Λ[t] ← (CuSum^J_Λ[t − 1] + Λ[t])^+
    end if
    if CuSum_W[t] ≥ b0 and CuSum^J_Λ[t] ≥ bC then
      T_J-CuSum ← t
      Break
    else if CuSum_W[t] ≤ 0 then
      CuSum_W[t] ← 0
      CuSum^J_Λ[t] ← 0
    end if
  end while
  Output: stopping time T_J-CuSum

In both algorithms, a statistic that is not updated at step t keeps its value from step t − 1, consistent with the definitions of the statistics given below.

Fig. 4: Illustration of S-CuSum behaviors in Scenarios 2 and 3: only under the bad change distribution would both $\mathrm{CuSum}_W$ and $\mathrm{CuSum}^S_\Lambda$ pass their thresholds. Note that S-CuSum also works in Scenario 1. (a) Confusing Change; (b) Bad Change.

Fig. 5: Illustration of J-CuSum behaviors in Scenario 3: the resetting of $\mathrm{CuSum}^J_\Lambda$ (red lines) prevents J-CuSum from raising a false alarm for a confusing change, and running $\mathrm{CuSum}_W$ (blue lines) and $\mathrm{CuSum}^J_\Lambda$ simultaneously shortens the detection delay. Note that J-CuSum also applies to Scenarios 1 and 2. (a) Confusing Change; (b) Bad Change.

IV. SUCCESSIVE CUSUM AND JOINT CUSUM PROCEDURES

In this section, we propose two new procedures, Successive CuSum (S-CuSum) and Joint CuSum (J-CuSum), that work in all scenarios of the QCD with confusing change problem. We begin by introducing the two hypothesis tests corresponding to this problem. One hypothesis test aims to determine whether we are in the pre-change state or in a post-change state at time $t$:

$$H_0[t]: \nu > t, \tag{11}$$

$$H_1[t]: \nu \le t, \tag{12}$$

and another hypothesis test aims to distinguish between the post-change states (confusing change or bad change):

$$H_0[t]: X_t \sim f_C(X), \tag{13}$$

$$H_1[t]: X_t \sim f_B(X). \tag{14}$$

Intuitively, the hypothesis test that determines pre-/post-change suggests testing with the log-likelihood ratio

$$W[t] := \log\frac{f_B(X_t)}{f_0(X_t)}, \tag{15}$$

and the hypothesis test that distinguishes a confusing change from a bad change suggests testing with the log-likelihood ratio

$$\Lambda[t] := \log\frac{f_B(X_t)}{f_C(X_t)}. \tag{16}$$

Since we are only interested in detecting the bad change, we should raise an alarm only when both tests favor their alternative hypotheses. This suggests that synthesizing the two tests is the key to our problem.

The core idea of the Successive CuSum (S-CuSum) procedure is to prevent the test statistic w.r.t. $\Lambda[t]$ from passing its threshold in the pre-change stage by launching it only after the test
statistic w.r.t. $W[t]$ has passed its threshold. Specifically, we let the test statistic w.r.t. $W[t]$ be

$$\mathrm{CuSum}_W[t] := \begin{cases} 0, & \text{if } t = 0,\\ \left(\mathrm{CuSum}_W[t-1] + W[t]\right)^+, & \text{if } \mathrm{CuSum}_W[t-1] < b_0,\\ \mathrm{CuSum}_W[t-1], & \text{otherwise,}\end{cases} \tag{17}$$

and let the test statistic w.r.t. $\Lambda[t]$ be

$$\mathrm{CuSum}^S_\Lambda[t] := \begin{cases} 0, & \text{if } t = 0 \text{ or } \mathrm{CuSum}_W[t] < b_0,\\ \left(\mathrm{CuSum}^S_\Lambda[t-1] + \Lambda[t]\right)^+, & \text{otherwise.}\end{cases} \tag{18}$$

That is, S-CuSum stops updating $\mathrm{CuSum}_W$ once it passes its threshold and starts updating $\mathrm{CuSum}^S_\Lambda$. An alarm is triggered when $\mathrm{CuSum}^S_\Lambda$ passes its threshold, i.e.,

\begin{align}
T_{\text{S-CuSum}} &:= \inf\{t \ge 1 : \mathrm{CuSum}^S_\Lambda[t] \ge b_C\} \tag{19}\\
&\equiv \inf\{t \ge 1 : \mathrm{CuSum}_W[t] \ge b_0,\ \mathrm{CuSum}^S_\Lambda[t] \ge b_C\}. \tag{20}
\end{align}

Pseudocode for S-CuSum is given in Algorithm 1. As illustrated in Figure 4, $\mathrm{CuSum}_W$ most likely passes its threshold after a change has occurred. If the change is a confusing change, $\mathrm{CuSum}^S_\Lambda$ always has a negative drift, since $\mathbb{E}_{f_C}[\Lambda[t]] = -D_{KL}(f_C\|f_B) < 0$, and will rarely pass its threshold; if the change is a bad change, $\mathrm{CuSum}^S_\Lambda$ generally increases and passes its threshold quickly.

While S-CuSum effectively detects the bad change and ignores the confusing change, its detection delay leaves room for improvement. Toward this end, we propose Joint CuSum (J-CuSum), which incorporates the two tests w.r.t. $W[t]$ and $\Lambda[t]$ in a more involved way. Specifically, J-CuSum uses $\mathrm{CuSum}_W$ (as defined in (17)) together with

$$\mathrm{CuSum}^J_\Lambda[t] := \begin{cases} 0, & \text{if } t = 0 \text{ or } \mathrm{CuSum}_W[t] \le 0,\\ \left(\mathrm{CuSum}^J_\Lambda[t-1] + \Lambda[t]\right)^+, & \text{if } \mathrm{CuSum}^J_\Lambda[t-1] < b_C,\\ \mathrm{CuSum}^J_\Lambda[t-1], & \text{otherwise.}\end{cases} \tag{21}$$

That is, J-CuSum prevents $\mathrm{CuSum}^J_\Lambda$ from passing its threshold in the pre-change stage by resetting it to zero whenever $\mathrm{CuSum}_W$ hits zero. J-CuSum also raises an alarm when both statistics pass their thresholds, i.e.,

$$T_{\text{J-CuSum}} := \inf\{t \ge 1 : \mathrm{CuSum}_W[t] \ge b_0,\ \mathrm{CuSum}^J_\Lambda[t] \ge b_C\}. \tag{22}$$

The pseudocode of J-CuSum is presented in Algorithm 2, and Figure 5 illustrates how J-CuSum works.

V. THEORETICAL GUARANTEES

In this section, we discuss the theoretical properties of the QCD with confusing change problem and of our proposed procedures.

We first study the universal lower bound on $\mathrm{WADD}_{f_B}$ for any procedure whose run length to false alarm is no smaller than $\gamma$.

Theorem 1 (Universal Detection Delay Lower Bound). As $\gamma \to \infty$, we have

$$\inf_{T \in \mathcal{C}_\gamma} \mathrm{WADD}_{f_B}(T) \ge \frac{\log(\gamma)\,(1 - o(1))}{\min\{D_{KL}(f_B\|f_0), D_{KL}(f_B\|f_C)\} + o(1)}. \tag{23}$$

The full proof of Theorem 1 is deferred to Appendix A. The key step in this proof is to use our false alarm requirement (1) together with the proof-by-contradiction argument in the proof of [25, Theorem 1]. Intuitively, the detection delay of a stopping rule $T \in \mathcal{C}_\gamma$ depends not only on $D_{KL}(f_B\|f_0)$ but also on $D_{KL}(f_B\|f_C)$, as $T$ is required to have a run length to false alarm of at least $\gamma$ for a confusing change.

In the following, we analyze the run length to false alarm and the detection delay of S-CuSum.

Theorem 2 (S-CuSum False Alarm Lower Bound). With $b_0 = b_C = \log\gamma$, we have

$$T_{\text{S-CuSum}} \in \mathcal{C}_\gamma, \tag{24}$$

where $\mathcal{C}_\gamma$ is defined in Eq. (1).

The full proof of Theorem 2 is given in Appendix B. Note that S-CuSum only triggers an alarm when both $\mathrm{CuSum}_W$ and $\mathrm{CuSum}^S_\Lambda$ pass their thresholds. Hence, to show that $T_{\text{S-CuSum}} \in \mathcal{C}_\gamma$, we need to show that $\mathbb{E}_\infty[T_{\mathrm{CuSum}_W}] \ge \gamma$ and that $\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}] \ge \gamma$. We establish both inequalities by relating the CuSum statistics to their corresponding Shiryaev-Roberts statistics and using the martingale properties of the Shiryaev-Roberts statistics [2].

Theorem 3 (S-CuSum Detection Delay Upper Bound). With $b_0 = b_C = \log\gamma$, as $\gamma \to \infty$, we have

\begin{align}
\mathrm{WADD}_{f_B}(T_{\text{S-CuSum}}) &\le \left(\frac{\log\gamma}{D_{KL}(f_B\|f_0)} + \frac{\log\gamma}{D_{KL}(f_B\|f_C)}\right)(1 + o(1)) \tag{25}\\
&\le \frac{2\log\gamma}{\min\{D_{KL}(f_B\|f_0), D_{KL}(f_B\|f_C)\}}(1 + o(1)). \tag{26}
\end{align}

The full proof of Theorem 3 is given in Appendix C. Because $\mathrm{CuSum}_W[t]$ and $\mathrm{CuSum}^S_\Lambda[t]$ are always non-negative and are zero when $t = 0$, the worst-case average detection delay occurs when the change point is $\nu = 1$. By the algorithmic property of S-CuSum, $\mathbb{E}_{1,f_B}[T_{\text{S-CuSum}}]$ equals the sum of $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}]$ and $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}]$. We upper bound both terms using a generalized Weak Law of Large Numbers, Lemma 1.

In the following, we analyze the run length to false alarm and the detection delay of J-CuSum.

Theorem 4 (J-CuSum False Alarm Lower Bound). With $b_0 = b_C = \log\gamma$, we have

$$T_{\text{J-CuSum}} \in \mathcal{C}_\gamma, \tag{27}$$
where $\mathcal{C}_\gamma$ is defined in Eq. (1).

The full proof of Theorem 4 is deferred to Appendix D. By the algorithmic property of J-CuSum, when a confusing change occurs and the change point lies right before $\mathrm{CuSum}^J_\Lambda$ is reset by $\mathrm{CuSum}_W$, J-CuSum has the shortest average run time for $\mathrm{CuSum}^J_\Lambda$ to pass its threshold. To analyze the run length from one reset to the next, we follow [26] and define the stopping time of $\mathrm{CuSum}_W$ in terms of the stopping times of a sequence of sequential probability ratio tests. It follows from the approximation of the stopping time of the corresponding sequential probability ratio test given in [26] that the expected value of $\mathrm{CuSum}^J_\Lambda$ right before a reset is almost zero. Therefore, we can use the results in the proof of Theorem 2 to prove that $T_{\text{J-CuSum}} \in \mathcal{C}_\gamma$.

Theorem 5 (J-CuSum Detection Delay Upper Bound). With $b_0 = b_C = \log\gamma$, as $\gamma \to \infty$, we have

\begin{align}
\mathrm{WADD}_{f_B}(T_{\text{J-CuSum}}) &\lessapprox \left(\frac{\log\gamma}{D_{KL}(f_B\|f_0)} + \frac{\log\gamma}{D_{KL}(f_B\|f_C)}\right)(1 + o(1)) \tag{28}\\
&\le \frac{2\log\gamma}{\min\{D_{KL}(f_B\|f_0), D_{KL}(f_B\|f_C)\}}(1 + o(1)). \tag{29}
\end{align}

The full proof of Theorem 5 is deferred to Appendix E. Because $\mathrm{CuSum}_W[t]$ and $\mathrm{CuSum}^J_\Lambda[t]$ are always non-negative and are zero when $t = 0$, the worst-case average detection delay occurs when the change point is $\nu = 1$. By the algorithmic property of J-CuSum, $\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}]$ can be upper bounded in terms of both $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}]$ and $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^J_\Lambda}]$. We upper bound $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}]$ using Lemma 1. As for $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^J_\Lambda}]$, we follow [26] to define the stopping time of $\mathrm{CuSum}_W$ in terms of the stopping times of a sequence of sequential probability ratio tests and approximate the run length up to the last reset.

VI. NUMERICAL RESULTS

In this section, we numerically compare S-CuSum and J-CuSum to the baselines $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$. Specifically, we conduct simulations in each of the three scenarios discussed in Section III:
• for Scenario 1, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(-0.5,1)$, and $f_B = \mathcal{N}(0.5,1)$;
• for Scenario 2, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(0.7,1)$, and $f_B = \mathcal{N}(1.2,1)$;
• for Scenario 3, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(1,1)$, and $f_B = \mathcal{N}(0.5,1)$.

For each scenario, we run all procedures under $\mathbb{P}_{1,f_B}$, $\mathbb{P}_\infty$, and $\mathbb{P}_{1,f_C}$ with varying thresholds (for S-CuSum and J-CuSum, we let $b_0 = b_C$) to estimate the procedures' detection delays, pre-change run lengths to false alarm, and run lengths to false alarm for a confusing change, respectively. For each scenario, underlying distribution, and threshold, we perform 60 independent trials and report the average.

Fig. 6: Numerical comparison between S-CuSum and J-CuSum and the baselines $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ in the three scenarios specified in Section III. (a) Scenario 1; (b) Scenario 2; (c) Scenario 3.

In Figure 6, we plot the average detection delays of the procedures against their average run lengths to false alarm for pre-change or for confusing change, whichever is smaller. This is because our false alarm requirement, Eq. (1), asks for both the run length to false alarm for pre-change and that for confusing change to be no smaller than the same threshold; hence, we plot against whichever is smaller to make sure that the requirement is fulfilled. Moreover, we plot run lengths to false alarm on a logarithmic scale and detection delay on a linear scale. Hence, straight lines in the figures indicate that detection delays grow logarithmically with the run lengths to false alarm, whereas steeply rising lines indicate that detection delays grow super-logarithmically with the run lengths to false alarm.

First, Figures 6a, 6b, and 6c respectively corroborate our discussion in Section III that $\mathrm{CuSum}_{f_B,f_0}$ suffices in Scenario 1 and $\mathrm{CuSum}_{f_B,f_C}$ suffices in Scenario 2, but neither a single $\mathrm{CuSum}_{f_B,f_0}$ nor a single $\mathrm{CuSum}_{f_B,f_C}$ suffices in Scenario 3 to detect a bad change quickly while avoiding a false alarm for a confusing change. Indeed, Figure 6c shows that both $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ incur very large average detection delays in order to achieve the same average run lengths to false alarm as S-CuSum or J-CuSum.

Second, Figures 6a, 6b, and 6c show that both S-CuSum and J-CuSum perform well in all three possible scenarios, namely under all kinds of pre-change, bad change, and confusing change distributions. Indeed, in Figure 6, the lines
of S-CuSum and J-CuSum are straight in all three graphs, meaning that their detection delays grow logarithmically with their run lengths to false alarm.

Finally, as shown in Figures 6a, 6b, and 6c, our simulations empirically support our discussion in Section IV that the detection delay of S-CuSum still has room for improvement and that J-CuSum reduces the detection delay by allowing $\mathrm{CuSum}^J_\Lambda$ to be launched earlier than $\mathrm{CuSum}^S_\Lambda$ while still avoiding false alarms.

VII. CONCLUSION AND DISCUSSION

In this paper, we investigated a quickest change detection problem where an initially in-control system can transition into an out-of-control state due to either a bad event or a confusing event. Our goal was to detect the change as soon as possible if a bad event occurs while avoiding raising an alarm if a confusing event occurs. We found that when both 1) the KL divergence between the confusing change distribution $f_C$ and the pre-change distribution $f_0$ is larger than that between $f_C$ and the bad change distribution $f_B$, and 2) the KL divergence between $f_0$ and $f_C$ is greater than that between $f_0$ and $f_B$, typical procedures based on $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ fail to achieve our objective. Hence, we proposed two new detection procedures, S-CuSum and J-CuSum, that achieve our objective in all scenarios, and we provided theoretical guarantees as well as numerical corroboration.

While the detection delay upper bound that we obtained for J-CuSum is the same as that obtained for S-CuSum, intuitively, J-CuSum should produce smaller detection delays than S-CuSum. Indeed, in all our simulations, J-CuSum has smaller detection delays than S-CuSum. We leave closing the theoretical gap between the detection delay upper bound of J-CuSum and the universal lower bound on the detection delay for future work.

ACKNOWLEDGEMENT

The authors would like to thank Lance Kaplan for invaluable insights and engaging discussions throughout the development of this paper.

REFERENCES

[1] H. V. Poor and O. Hadjiliadis, Quickest detection. Cambridge University Press, 2008.
[2] V. V. Veeravalli and T. Banerjee, "Quickest change detection," in Academic Press Library in Signal Processing. Elsevier, 2014, vol. 3, pp. 209–255.
[3] A. Tartakovsky, I. Nikiforov, and M. Basseville, Sequential analysis: Hypothesis testing and changepoint detection. CRC Press, 2014.
[4] L. Xie, S. Zou, Y. Xie, and V. V. Veeravalli, "Sequential (quickest) change detection: Classical results and new directions," IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 494–514, 2021.
[5] T. Banerjee and V. V. Veeravalli, "Data-efficient quickest change detection in sensor networks," IEEE Transactions on Signal Processing, vol. 63, no. 14, pp. 3727–3735, 2015.
[6] Z. Sun, S. Zou, R. Zhang, and Q. Li, "Quickest change detection in anonymous heterogeneous sensor networks," IEEE Transactions on Signal Processing, vol. 70, pp. 1041–1055, 2022.
[7] L. Lai, Y. Fan, and H. V. Poor, "Quickest detection in cognitive radio: A sequential change detection framework," in IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference. IEEE, 2008, pp. 1–5.
[8] T. L. Lai, "Sequential changepoint detection in quality control and dynamical systems," Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 4, pp. 613–644, 1995.
[9] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, "Using control charts to monitor process and product quality profiles," Journal of Quality Technology, vol. 36, no. 3, pp. 309–320, 2004.
[10] T. Banerjee, Y. C. Chen, A. D. Dominguez-Garcia, and V. V. Veeravalli, "Power system line outage detection and identification—a quickest change detection approach," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 3450–3454.
[11] J. Sun, M. Saeedifard, and A. S. Meliopoulos, "Backup protection of multi-terminal HVDC grids based on quickest change detection," IEEE Transactions on Power Delivery, vol. 34, no. 1, pp. 177–187, 2018.
[12] F. Ji, W. P. Tay, and L. R. Varshney, "An algorithmic framework for estimating rumor sources with different start times," IEEE Transactions on Signal Processing, vol. 65, no. 10, pp. 2517–2530, 2017.
[13] Y. Liang and V. V. Veeravalli, "Quickest change detection with leave-one-out density estimation," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
[14] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, "A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods," IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3372–3382, 2006.
[15] Y. Mei, Asymptotically optimal methods for sequential change-point detection. California Institute of Technology, 2003.
[16] Y. Mei, "Sequential change-point detection when unknown parameters are present in the pre-change distribution," 2006.
[17] G. Rovatsos, X. Jiang, A. D. Domínguez-García, and V. V. Veeravalli, "Statistical power system line outage detection under transient dynamics," IEEE Transactions on Signal Processing, vol. 65, no. 11, pp. 2787–2797, 2017.
[18] A. Warner and G. Fellouris, "Sequential change diagnosis revisited and the adaptive matrix cusum," arXiv preprint arXiv:2211.12980, 2022.
[19] X. Zhao, J. Hu, Y. Mei, and H. Yan, "Adaptive partially observed sequential change detection and isolation," Technometrics, vol. 64, no. 4, pp. 502–512, 2022.
[20] S. Zou, G. Fellouris, and V. V. Veeravalli, "Quickest change detection under transient dynamics: Theory and asymptotic analysis," IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1397–1412, 2018.
[21] T. S. Lau and W. P. Tay, "Quickest change detection in the presence of a nuisance change," IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5281–5296, 2019.
[22] V. Dragalin, "The design and analysis of 2-cusum procedure," Communications in Statistics - Simulation and Computation, vol. 26, no. 1, pp. 67–81, 1997.
[23] Y. Zhao, F. Tsung, and Z. Wang, "Dual cusum control schemes for detecting a range of mean shifts," IIE Transactions, vol. 37, no. 11, pp. 1047–1057, 2005.
[24] M. Pollak, "Optimal detection of a change in distribution," The Annals of Statistics, pp. 206–227, 1985.
[25] T. L. Lai, "Information bounds and quick detection of parameter changes in stochastic systems," IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2917–2929, 1998.
[26] D. Siegmund, Sequential analysis: tests and confidence intervals. Springer Science & Business Media, 1985.
8

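[27] G. Fellouris and A. G. Tartakovsky, “Multichannel sequential detection—part i: Non-iid data,” IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4551–4571, 2017.

APPENDIX A
PROOF OF THEOREM 1

Proof. We first recall the following generalized version of the Weak Law of Large Numbers.

Lemma 1 (Lemma A.1 in [27]). Let $\{Y_t, t\in\mathbb{N}\}$ be a sequence of random variables i.i.d. on $(\Omega,\mathcal{F},\mathbb{P})$ with $\mathbb{E}[Y_t]=\mu>0$. Then, for any $\epsilon>0$, as $n\to\infty$,
$$
\mathbb{P}\left[\frac{\max_{1\le k\le n}\sum_{t=1}^{k}Y_t}{n}-\mu>\epsilon\right]\to 0. \tag{30}
$$

Note that
\begin{align}
\mathrm{WADD}_{f_B}(T) &:= \sup_{\nu\ge 1}\mathbb{E}_{\nu,f_B}[T-\nu \mid T\ge\nu] \tag{31}\\
&\ge \mathbb{E}_{\nu,f_B}[T-\nu \mid T\ge\nu] \tag{32}\\
&\overset{(a)}{\ge} \mathbb{P}_{\nu,f_B}[T-\nu\ge\alpha_\gamma \mid T\ge\nu]\times\alpha_\gamma, \tag{33}
\end{align}
where inequality (a) is by the Markov inequality. It then suffices to show that, as $\gamma\to\infty$,
$$
\mathbb{P}_{\nu,f_B}[T-\nu\ge\alpha_\gamma \mid T\ge\nu]\to 1, \tag{34}
$$
or equivalently,
$$
\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu]\to 0. \tag{35}
$$

We first show that Eq. (35) holds when
$$
\alpha_\gamma=\frac{1}{D_{\mathrm{KL}}(f_B\|f_0)+\epsilon}\log\gamma^{(1-\epsilon)},\quad \epsilon>0, \tag{36}
$$
using a change-of-measure argument. Specifically,
\begin{align}
\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma]
&= \mathbb{E}_{f_0}\big[\mathbf{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\big] \tag{37}\\
&\overset{(a)}{=} \mathbb{E}_{\nu,f_B}\left[\mathbf{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{38}\\
&\ge \mathbb{E}_{\nu,f_B}\left[\mathbf{1}_{\left\{\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\ge -a\right\}}\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{39}\\
&\ge e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\ge -a\right] \tag{40}\\
&= e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{\nu,f_B}(H_T)}{\mathbb{P}_{f_0}(H_T)}\le a\right] \tag{41}\\
&\ge e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}\le a\right] \tag{42}\\
&\overset{(b)}{\ge} e^{-a}\,\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma]-e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}>a\right], \tag{43}
\end{align}
where $H_t=(X_1,\dots,X_t)$, $t\in\mathbb{N}$. The change-of-measure step (a) holds because $\mathbb{P}_{f_0}$ and $\mathbb{P}_{\nu,f_B}$ are measures on a common measurable space, $\mathbb{P}_{\nu,f_B}$ is $\sigma$-finite, and $\mathbb{P}_{f_0}\ll\mathbb{P}_{\nu,f_B}$. The constant $a$ will be specified later. Inequality (b) holds because, for any events $A$ and $B$, $\mathbb{P}[A\cap B]\ge\mathbb{P}[A]-\mathbb{P}[B^c]$.

The event $\{T\ge\nu\}$ depends only on $H_{\nu-1}$, which has the same distribution under $\mathbb{P}_{f_0}$ and $\mathbb{P}_{\nu,f_B}$. This implies
$$
\mathbb{P}_{f_0}[T\ge\nu]=\mathbb{P}_{\nu,f_B}[T\ge\nu]. \tag{44}
$$
By Eq. (44) and rearranging Eq. (43), it follows that
\begin{align}
\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu]
&\le e^{a}\,\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu] \notag\\
&\quad+\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}>a \,\middle|\, T\ge\nu\right]. \tag{45}
\end{align}

To show that the first term on the right-hand side of Eq. (45) converges to 0 as $\gamma\to\infty$, we can use the proof-by-contradiction argument from the proof of [25, Theorem 1]. Let $\alpha_\gamma$ be a positive integer with $\alpha_\gamma<\gamma$. For any $T\in\mathcal{C}_\gamma$, we have $\mathbb{E}_{f_0}[T]\ge\gamma$. Then, for some $\nu\ge 1$, $\mathbb{P}_{f_0}[T\ge\nu]>0$ and
$$
\mathbb{P}_{f_0}[T\le\nu+\alpha_\gamma \mid T\ge\nu]\le\frac{\alpha_\gamma}{\gamma}, \tag{46}
$$
because otherwise $\mathbb{P}_{f_0}[T\ge\nu+\alpha_\gamma \mid T\ge\nu]<1-\alpha_\gamma/\gamma$ for all $\nu\ge 1$ with $\mathbb{P}_{f_0}[T\ge\nu]>0$, which would imply $\mathbb{E}_{f_0}[T]\le\gamma$.

Let $a=\log\gamma^{(1-\epsilon)}$. Then
$$
e^{a}\,\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu]\le e^{a}\frac{\alpha_\gamma}{\gamma}=\frac{\alpha_\gamma}{\gamma^{\epsilon}}\to 0,\quad\text{as }\gamma\to\infty, \tag{47}
$$
where the limit holds because $\alpha_\gamma=O(\log\gamma)$ by Eq. (36).

We then show that the second term on the right-hand side of Eq. (45) also converges to 0 as $\gamma\to\infty$:
\begin{align}
&\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}>a \,\middle|\, T\ge\nu\right] \notag\\
&= \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)}>a \,\middle|\, T\ge\nu\right] \tag{48}\\
&\overset{(a)}{=} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)}>a\right] \tag{49}\\
&\overset{(b)}{\le} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)}>\alpha_\gamma\big(D_{\mathrm{KL}}(f_B\|f_0)+\epsilon\big)\right] \tag{50}\\
&\to 0,\quad\text{as }\gamma\to\infty. \tag{51}
\end{align}
Equality (a) holds because the event $\{T\ge\nu\}$ is independent of $X_i$ for all $i\ge\nu$. Inequality (b) holds because $a\ge\alpha_\gamma\times(D_{\mathrm{KL}}(f_B\|f_0)+\epsilon)$. The last step follows from Lemma 1.

Similarly, we can show that Eq. (35) holds when
$$
\alpha_\gamma=\frac{1}{D_{\mathrm{KL}}(f_B\|f_C)+\epsilon}\log\gamma^{(1-\epsilon)},\quad \epsilon>0, \tag{52}
$$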

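using a change-of-measure argument. Specifically,
\begin{align}
\mathbb{P}_{\nu,f_C}[\nu\le T<\nu+\alpha_\gamma]
&\overset{(a)}{=} \mathbb{E}_{\nu,f_B}\left[\mathbf{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\frac{\mathbb{P}_{\nu,f_C}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{53}\\
&\overset{(b)}{\ge} e^{-a}\,\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma]-e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{\nu,f_C}(H_j)}>a\right], \tag{54}
\end{align}
where $H_t=(X_1,\dots,X_t)$, $t\in\mathbb{N}$. The change-of-measure step (a) holds because $\mathbb{P}_{\nu,f_C}$ and $\mathbb{P}_{\nu,f_B}$ are measures on a common measurable space, $\mathbb{P}_{\nu,f_B}$ is $\sigma$-finite, and $\mathbb{P}_{\nu,f_C}\ll\mathbb{P}_{\nu,f_B}$. The constant $a$ will be specified later. Inequality (b) follows the same steps as Eqs. (39)-(43), using the fact that, for any events $A$ and $B$, $\mathbb{P}[A\cap B]\ge\mathbb{P}[A]-\mathbb{P}[B^c]$.

The event $\{T\ge\nu\}$ depends only on $H_{\nu-1}$, which has the same distribution under $\mathbb{P}_{\nu,f_C}$ and $\mathbb{P}_{\nu,f_B}$. This implies
$$
\mathbb{P}_{\nu,f_C}[T\ge\nu]=\mathbb{P}_{\nu,f_B}[T\ge\nu]. \tag{55}
$$
By Eq. (55) and rearranging Eq. (54), it follows that
\begin{align}
\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu]
&\le e^{a}\,\mathbb{P}_{\nu,f_C}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu] \notag\\
&\quad+\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{\nu,f_C}(H_j)}>a \,\middle|\, T\ge\nu\right]. \tag{56}
\end{align}

To show that the first term on the right-hand side of Eq. (56) converges to 0 as $\gamma\to\infty$, we can again use the proof-by-contradiction argument from the proof of [25, Theorem 1]. Let $\alpha_\gamma$ be a positive integer with $\alpha_\gamma<\gamma$. For any $T\in\mathcal{C}_\gamma$, we have $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T]\ge\gamma$ and $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu]>0$, and
$$
\inf_{\nu\ge 1}\mathbb{P}_{\nu,f_C}[T\le\nu+\alpha_\gamma \mid T\ge\nu]\le\frac{\alpha_\gamma}{\gamma}, \tag{57}
$$
because otherwise $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu+\alpha_\gamma \mid T\ge\nu]<1-\alpha_\gamma/\gamma$ with $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu]>0$, which would imply $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T]\le\gamma$.

Let $a=\log\gamma^{(1-\epsilon)}$. Then
$$
e^{a}\,\mathbb{P}_{\nu,f_C}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu]\le e^{a}\frac{\alpha_\gamma}{\gamma}=\frac{\alpha_\gamma}{\gamma^{\epsilon}}\to 0,\quad\text{as }\gamma\to\infty, \tag{58}
$$
where the limit holds because $\alpha_\gamma=O(\log\gamma)$ by Eq. (52).

We then show that the second term on the right-hand side of Eq. (56) also converges to 0 as $\gamma\to\infty$:
\begin{align}
&\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{\nu,f_C}(H_j)}>a \,\middle|\, T\ge\nu\right] \notag\\
&= \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_C(X_i)}>a \,\middle|\, T\ge\nu\right] \tag{59}\\
&\overset{(a)}{=} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_C(X_i)}>a\right] \tag{60}\\
&\overset{(b)}{\le} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_C(X_i)}>\alpha_\gamma\big(D_{\mathrm{KL}}(f_B\|f_C)+\epsilon\big)\right] \tag{61}\\
&\to 0,\quad\text{as }\gamma\to\infty. \tag{62}
\end{align}
Equality (a) holds because the event $\{T\ge\nu\}$ is independent of $X_i$ for all $i\ge\nu$. Inequality (b) holds because $a\ge\alpha_\gamma\times(D_{\mathrm{KL}}(f_B\|f_C)+\epsilon)$. The last step follows from Lemma 1.

By Eqs. (36)-(51), we show that
$$
\mathrm{WADD}_{f_B}(T)\ge\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)+o(1)}\big(1-o(1)\big), \tag{63}
$$
and by Eqs. (52)-(62), we show that
$$
\mathrm{WADD}_{f_B}(T)\ge\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)+o(1)}\big(1-o(1)\big). \tag{64}
$$
Therefore,
\begin{align}
\mathrm{WADD}_{f_B}(T)
&\ge \max\left\{\frac{\log(\gamma)(1-o(1))}{D_{\mathrm{KL}}(f_B\|f_0)+o(1)},\ \frac{\log(\gamma)(1-o(1))}{D_{\mathrm{KL}}(f_B\|f_C)+o(1)}\right\} \tag{65}\\
&= \frac{\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0),\,D_{\mathrm{KL}}(f_B\|f_C)\}+o(1)}\big(1-o(1)\big). \tag{66}
\end{align}

APPENDIX B
PROOF OF THEOREM 2

Proof. To assist our analysis, we first define the intermediate stopping times
\begin{align}
T_{\mathrm{CuSum}^{W}} &:= \inf\{t\ge 1: \mathrm{CuSum}^{W}[t]\ge b_0\}, \tag{67}\\
T_{\mathrm{CuSum}^{S\Lambda}} &:= \inf\{t\ge 1: \mathrm{CuSum}^{S\Lambda}[t]\ge b_C\}, \tag{68}\\
T_{\mathrm{CuSum}_{f_B,f_C}} &:= \inf\{t\ge 1: \mathrm{CuSum}_{f_B,f_C}[t]\ge b_C\}. \tag{69}
\end{align}
By the algorithmic property of S-CuSum, we have that
\begin{align}
T_{\text{S-CuSum}} &= T_{\mathrm{CuSum}^{S\Lambda}} \tag{70}\\
&= T_{\mathrm{CuSum}^{W}}+T_{\mathrm{CuSum}_{f_B,f_C}}. \tag{71}
\end{align}
Hence, if we show
\begin{align}
\mathbb{E}_\infty[T_{\text{S-CuSum}}] &\ge \mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]\ge\gamma, \tag{72}\\
\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}] &\ge \min\big\{\nu+\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}],\ \mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]\big\}\ge\gamma, \tag{73}
\end{align}
then we have $T_{\text{S-CuSum}}\in\mathcal{C}_\gamma$.

In the following, we first lower bound the average run time to false alarm for pre-change of $\mathrm{CuSum}^{W}$ by relating it to a Shiryaev–Roberts test. We define the Shiryaev–Roberts statistic corresponding to $\mathrm{CuSum}^{W}$ as
$$
R_W[t]=\sum_{i=1}^{t}\prod_{j=i}^{t}\frac{f_B(X_j)}{f_0(X_j)}, \tag{74}
$$
with recursion
\begin{align}
R_W[t+1] &= (1+R_W[t])\frac{f_B(X_{t+1})}{f_0(X_{t+1})}, \tag{75}\\
R_W[0] &= 0. \tag{76}
\end{align}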
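The Shiryaev–Roberts recursion in Eqs. (74)-(76) can be checked numerically. The Python sketch below is illustrative only. It assumes Gaussian densities $f_0=\mathcal{N}(0,1)$ and $f_B=\mathcal{N}(1,1)$ (our choice, not the paper's), takes $W[t]=\log\frac{f_B(X_t)}{f_0(X_t)}$, and assumes that $\mathrm{CuSum}^{W}$ follows Page's non-negative recursion.

On a post-change path, it checks two things:
- the domination $R_W[t]\ge\exp\big(\max_{1\le i\le t}\sum_{j=i}^{t}W[j]\big)$ used in Eq. (79);
- as a consequence, that the Shiryaev–Roberts test stops no later than $\mathrm{CuSum}^{W}$ (Eq. (78)).

Under $f_0$, it estimates $\mathbb{E}_\infty[R_W[t]]$. This estimate should be close to $t$, because $\{R_W[t]-t\}$ is a martingale started at $R_W[0]-0=0$. The helper names `w_llr`, `first_crossings`, and `mean_sr_under_f0` are hypothetical.

```python
import math
import random

# Illustrative assumption (not from the paper): f0 = N(0, 1) and fB = N(MU_B, 1).
MU_B = 1.0


def w_llr(x: float) -> float:
    """W = log fB(x)/f0(x) for the assumed unit-variance Gaussians."""
    return MU_B * x - 0.5 * MU_B ** 2


def first_crossings(xs, b0):
    """Return (T_CuSum^W, T_RW) on one sample path, checking Eq. (79) at every step.

    CuSum^W is taken to be Page's recursion max(0, CuSum^W[t-1] + W[t]);
    this is an assumption of the sketch.
    """
    cusum, r_w, best_suffix = 0.0, 0.0, -math.inf
    t_cusum = t_rw = None
    for t, x in enumerate(xs, start=1):
        w = w_llr(x)
        cusum = max(0.0, cusum + w)
        r_w = (1.0 + r_w) * math.exp(w)          # Eq. (75), starting from R_W[0] = 0
        best_suffix = max(best_suffix, 0.0) + w  # max_{1<=i<=t} sum_{j=i}^{t} W[j]
        assert r_w >= math.exp(best_suffix) * (1.0 - 1e-9)  # Eq. (79), up to rounding
        if t_cusum is None and cusum >= b0:
            t_cusum = t
        if t_rw is None and r_w >= math.exp(b0):
            t_rw = t
        if t_cusum is not None and t_rw is not None:
            break
    return t_cusum, t_rw


def mean_sr_under_f0(t, runs, rng):
    """Monte Carlo estimate of E_inf[R_W[t]]; the martingale {R_W[t] - t} predicts t."""
    total = 0.0
    for _ in range(runs):
        r_w = 0.0
        for _ in range(t):
            r_w = (1.0 + r_w) * math.exp(w_llr(rng.gauss(0.0, 1.0)))
        total += r_w
    return total / runs


rng = random.Random(0)
post_change = [rng.gauss(MU_B, 1.0) for _ in range(10_000)]
print(first_crossings(post_change, b0=math.log(100.0)))  # expect T_RW <= T_CuSum^W
print(mean_sr_under_f0(5, 100_000, rng))                  # should be close to 5
```

The Monte Carlo average is noisy because products of likelihood ratios are heavy-tailed. Increasing `runs` tightens it around $t$.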

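Let the stopping time be
$$
T_{R_W}:=\inf\{t\ge 1: R_W[t]\ge e^{b_0}\}. \tag{77}
$$
We have
$$
T_{\mathrm{CuSum}^{W}}\ge T_{R_W}, \tag{78}
$$
because
$$
R_W[t]\ge\exp\Big(\max_{1\le i\le t}\sum_{j=i}^{t}W[j]\Big). \tag{79}
$$
It then suffices to lower bound the average run time to false alarm for pre-change of $R_W$.

We have that $\{R_W[t]-t\}$ is a martingale with respect to $\sigma(X_1,\dots,X_t)$, since
\begin{align}
\mathbb{E}_\infty\big[R_W[t+1]-(t+1)\,\big|\,\sigma(X_1,\dots,X_t)\big]
&\overset{(a)}{=} \mathbb{E}_\infty\left[(1+R_W[t])\frac{f_B(X_{t+1})}{f_0(X_{t+1})}\,\middle|\,\sigma(X_1,\dots,X_t)\right]-(t+1) \tag{80}\\
&\overset{(b)}{=} (1+R_W[t])\,\mathbb{E}_\infty\left[\frac{f_B(X_{t+1})}{f_0(X_{t+1})}\right]-(t+1) \tag{81}\\
&= (1+R_W[t])\times 1-(t+1) \notag\\
&\overset{(c)}{=} R_W[t]-t. \tag{82}
\end{align}
Equality (a) is due to the recursive definition of $R_W$. Equality (b) holds because $R_W[t]$ is $\sigma(X_1,\dots,X_t)$-measurable and $X_{t+1}$ is independent of $\sigma(X_1,\dots,X_t)$. Equality (c) shows that $\{R_W[t]-t\}$ satisfies the definition of a martingale.

Since $\{R_W[t]-t\}$ is a martingale, Doob's optional stopping theorem gives
$$
\mathbb{E}_\infty\big[R_W[T_{R_W}]-T_{R_W}\big]=\mathbb{E}_\infty\big[R_W[0]\big]=0. \tag{83}
$$
Hence, by letting $b_0=\log\gamma$, we have
\begin{align}
\mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}] &\ge \mathbb{E}_\infty[T_{R_W}] \tag{84}\\
&= \mathbb{E}_\infty\big[R_W[T_{R_W}]\big]\ge e^{b_0}=\gamma. \tag{85}
\end{align}

In the following, we lower bound the shortest average run time to false alarm for confusing change of $T_{\text{S-CuSum}}$. When S-CuSum raises a false alarm for a confusing change, there are only two possible cases.

In the first case, S-CuSum starts updating $\mathrm{CuSum}^{S\Lambda}$ after the change point $\nu$, i.e., $\mathrm{CuSum}^{W}[\nu]<b_0$. In this case,
$$
\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}]\ge\nu+\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}]. \tag{86}
$$
In the second case, S-CuSum starts updating $\mathrm{CuSum}^{S\Lambda}$ before the change point $\nu$. That is, $\mathrm{CuSum}^{W}$ passes its threshold before the change point, but $\mathrm{CuSum}^{S\Lambda}$ passes its threshold after the change point $\nu$. Hence,
$$
\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}]\ge\mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]. \tag{87}
$$
Therefore, the shortest average run time to false alarm for confusing change, $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}]$, is lower bounded by $\min\{\nu+\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}],\ \mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]\}$.

In the following, we lower bound $\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}]$ using reasoning similar to that used for lower bounding $\mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]$ in Eqs. (74)-(85). Specifically, we let
\begin{align}
R_\Lambda[t] &:= \sum_{i=1}^{t}\prod_{j=i}^{t}\frac{f_B(X_j)}{f_C(X_j)}, \tag{88}\\
T_{R_\Lambda} &:= \inf\{t\ge 1: R_\Lambda[t]\ge e^{b_C}\}. \tag{89}
\end{align}
Then $\{R_\Lambda[t]-t\}$ is a martingale with respect to $\sigma(X_1,\dots,X_t)$ under $\mathbb{P}_{1,f_C}$, and Doob's optional stopping theorem gives
\begin{align}
\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}] &\ge \mathbb{E}_{1,f_C}[T_{R_\Lambda}] \tag{90}\\
&= \mathbb{E}_{1,f_C}\big[R_\Lambda[T_{R_\Lambda}]\big]\ge e^{b_C}=\gamma, \tag{91}
\end{align}
where the last step follows by letting $b_C=\log\gamma$.

By Eqs. (72), (73), (85), and (91), we show that $T_{\text{S-CuSum}}\in\mathcal{C}_\gamma$.

APPENDIX C
PROOF OF THEOREM 3

Proof. Since $\mathrm{CuSum}^{W}[t]$ and $\mathrm{CuSum}^{S\Lambda}[t]$ are always non-negative and are zero at $t=0$, the worst-case average detection delay occurs when the change point is $\nu=1$.

To assist the analysis, we use the intermediate stopping times $T_{\mathrm{CuSum}^{W}}$, $T_{\mathrm{CuSum}^{S\Lambda}}$, and $T_{\mathrm{CuSum}_{f_B,f_C}}$ defined in Eqs. (67), (68), and (69), respectively.

By the algorithmic properties of S-CuSum, we have that
$$
\mathbb{E}_{1,f_B}[T_{\text{S-CuSum}}]=\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^{W}}]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}]. \tag{92}
$$
Let $\alpha_{b_0}=b_0/D_{\mathrm{KL}}(f_B\|f_0)$ and $\alpha_{b_C}=b_C/D_{\mathrm{KL}}(f_B\|f_C)$. Then,
\begin{align}
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^{W}}/\alpha_{b_0}]
&= \sum_{\ell=0}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}^{W}}/\alpha_{b_0}>\ell] \tag{93}\\
&\le 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}^{W}}/\alpha_{b_0}>\ell] \tag{94}\\
&= 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}\big[\forall\,1\le t\le\ell\alpha_{b_0}:\ \mathrm{CuSum}^{W}[t]<b_0\big] \tag{95}\\
&\le 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}\Big[\bigcap_{k=1}^{\ell}\big\{\mathrm{CuSum}^{W}[k\alpha_{b_0}]<b_0\big\}\Big] \tag{96}\\
&\overset{(a)}{\le} 1+\sum_{\ell=1}^{\infty}\prod_{k=1}^{\ell}\mathbb{P}_{1,f_B}\Bigg[\max_{i:(k-1)\alpha_{b_0}+1\le i\le k\alpha_{b_0}}\ \sum_{j=i}^{k\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)}<b_0\Bigg], \tag{97}
\end{align}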

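and, similarly,
\begin{align}
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}/\alpha_{b_C}]
&= \sum_{\ell=0}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}/\alpha_{b_C}>\ell] \tag{98}\\
&\le 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}/\alpha_{b_C}>\ell] \tag{99}\\
&= 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}\big[\forall\,1\le t\le\ell\alpha_{b_C}:\ \mathrm{CuSum}_{f_B,f_C}[t]<b_C\big] \tag{100}\\
&\le 1+\sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}\Big[\bigcap_{k=1}^{\ell}\big\{\mathrm{CuSum}_{f_B,f_C}[k\alpha_{b_C}]<b_C\big\}\Big] \tag{101}\\
&\overset{(a)}{\le} 1+\sum_{\ell=1}^{\infty}\prod_{k=1}^{\ell}\mathbb{P}_{1,f_B}\Bigg[\max_{i:(k-1)\alpha_{b_C}+1\le i\le k\alpha_{b_C}}\ \sum_{j=i}^{k\alpha_{b_C}}\log\frac{f_B(X_j)}{f_C(X_j)}<b_C\Bigg]. \tag{102}
\end{align}
Inequalities (a) follow from the definitions of $\mathrm{CuSum}^{W}$ and $\mathrm{CuSum}_{f_B,f_C}$ and from the independence among the random variables.

It follows from Lemma 1 that
\begin{align}
\frac{1}{b_0}\max_{i:1\le i\le\alpha_{b_0}}\sum_{j=i}^{\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)} &\xrightarrow{p}\beta, \tag{103}\\
\frac{1}{b_C}\max_{i:1\le i\le\alpha_{b_C}}\sum_{j=i}^{\alpha_{b_C}}\log\frac{f_B(X_j)}{f_C(X_j)} &\xrightarrow{p}\beta, \tag{104}
\end{align}
where $\beta>1$. Therefore, as $b_0,b_C\to\infty$,
\begin{align}
\mathbb{P}_{1,f_B}\Bigg[\max_{i:1\le i\le\alpha_{b_0}}\sum_{j=i}^{\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)}<b_0\Bigg] &\to 0, \tag{105}\\
\mathbb{P}_{1,f_B}\Bigg[\max_{i:1\le i\le\alpha_{b_C}}\sum_{j=i}^{\alpha_{b_C}}\log\frac{f_B(X_j)}{f_C(X_j)}<b_C\Bigg] &\to 0. \tag{106}
\end{align}
This implies that
\begin{align}
\mathbb{P}_{1,f_B}\Bigg[\max_{i:1\le i\le\alpha_{b_0}}\sum_{j=i}^{\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)}<b_0\Bigg] &\le\delta, \tag{107}\\
\mathbb{P}_{1,f_B}\Bigg[\max_{i:1\le i\le\alpha_{b_C}}\sum_{j=i}^{\alpha_{b_C}}\log\frac{f_B(X_j)}{f_C(X_j)}<b_C\Bigg] &\le\delta, \tag{108}
\end{align}
where $\delta$ can be made arbitrarily small for large $b_0$ and $b_C$. By Eqs. (107), (108), (97), and (102), we have
\begin{align}
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^{W}}/\alpha_{b_0}] &\le 1+\sum_{\ell=1}^{\infty}\delta^{\ell} \tag{109}\\
&= \frac{1}{1-\delta}, \tag{110}\\
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}/\alpha_{b_C}] &\le 1+\sum_{\ell=1}^{\infty}\delta^{\ell} \tag{111}\\
&= \frac{1}{1-\delta}. \tag{112}
\end{align}
This implies that, as $\gamma\to\infty$,
\begin{align}
\mathbb{E}_{1,f_B}[T_{\text{S-CuSum}}]
&\le \frac{\alpha_{b_0}+\alpha_{b_C}}{1-\delta} \tag{113}\\
&\le \left(\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)}+\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)}\right)(1+o(1)),\quad\text{if } b_0=b_C=\log\gamma, \tag{114}\\
&\le \frac{2\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0),D_{\mathrm{KL}}(f_B\|f_C)\}}(1+o(1)), \notag\\
&\qquad\text{if } b_0=\frac{D_{\mathrm{KL}}(f_B\|f_0)\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0),D_{\mathrm{KL}}(f_B\|f_C)\}},\ b_C=\frac{D_{\mathrm{KL}}(f_B\|f_C)\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0),D_{\mathrm{KL}}(f_B\|f_C)\}}. \tag{115}
\end{align}

APPENDIX D
PROOF OF THEOREM 4

Proof. To assist the analysis, we use the intermediate stopping times $T_{\mathrm{CuSum}^{W}}$ and $T_{\mathrm{CuSum}_{f_B,f_C}}$ defined in Eqs. (67) and (69), respectively, and
$$
T_{\mathrm{CuSum}^{J\Lambda}}:=\inf\{t\ge 1:\mathrm{CuSum}^{J\Lambda}[t]\ge b_C\}. \tag{116}
$$
By the algorithmic property of J-CuSum, we have that
$$
T_{\text{J-CuSum}}=\max\{T_{\mathrm{CuSum}^{W}},\ T_{\mathrm{CuSum}^{J\Lambda}}\}. \tag{117}
$$
Hence, if we show
\begin{align}
\mathbb{E}_\infty[T_{\text{J-CuSum}}] &\ge \mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]\ge\gamma, \tag{118}\\
\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\text{J-CuSum}}] &\ge \inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\mathrm{CuSum}^{J\Lambda}}]\ge\gamma, \tag{119}
\end{align}
then we have $T_{\text{J-CuSum}}\in\mathcal{C}_\gamma$.

By the analysis in Appendix B, i.e., Eqs. (74)-(85), we already have
$$
\mathbb{E}_\infty[T_{\mathrm{CuSum}^{W}}]\ge\gamma. \tag{120}
$$
In the following, we lower bound the shortest average run time to false alarm for confusing change of $T_{\text{J-CuSum}}$. To assist this analysis, we express $T_{\mathrm{CuSum}^{W}}$ in terms of a sequence of sequential probability ratio tests, as in [26]. Specifically, let
\begin{align}
S[t] &:= \sum_{i=1}^{t}W[i], \tag{121}\\
N_1 &:= \inf\{t\ge 1: S[t]\notin(0,b_0)\}, \tag{122}\\
N_2 &:= \inf\{t\ge 1: S[N_1+t]-S[N_1]\notin(0,b_0)\}, \tag{123}\\
N_k &:= \inf\{t\ge 1: S[N_1+N_2+\dots+N_{k-1}+t]-S[N_1+N_2+\dots+N_{k-1}]\notin(0,b_0)\}, \tag{124}\\
M &:= \inf\{k\ge 1: S[N_1+N_2+\dots+N_k]-S[N_1+N_2+\dots+N_{k-1}]\ge b_0\}; \tag{125}
\end{align}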

in this way,
$$
T_{\mathrm{CuSum}^{W}}\equiv N_1+N_2+\dots+N_M. \tag{126}
$$
In addition, we let
\begin{align}
\alpha &:= \mathbb{P}_{f_0}\big[S[N_1]\ge b_0\big], \tag{127}\\
\beta &:= \mathbb{P}_{f_B}\big[S[N_1]\le a\big],\quad a\to 0^-. \tag{128}
\end{align}

By the algorithmic property of J-CuSum, when a confusing change occurs and $\nu\to\mathbb{E}_{f_0}[N_1]^-$, J-CuSum has the shortest average run time for $\mathrm{CuSum}^{J\Lambda}$ to pass the threshold, and
\begin{align}
\mathbb{E}_{f_0}\big[\mathrm{CuSum}^{J\Lambda}[\nu]\big]
&\overset{(a)}{=} \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\nu \tag{129}\\
&\simeq \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\mathbb{E}_{f_0}[N_1],\quad\text{as }\nu\to\mathbb{E}_{f_0}[N_1]^- \tag{130}\\
&\overset{(b)}{\simeq} \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\frac{a(e^{b_0}-1)+b_0(1-e^{a})}{-(e^{b_0}-e^{a})D_{\mathrm{KL}}(f_0\|f_B)},\quad a\to 0^- \tag{131}\\
&\simeq 0. \tag{132}
\end{align}
Equality (a) is by Wald's identity, and approximation (b) is by [26, Eq. (2.15)]. The last step holds because the numerator $a(e^{b_0}-1)+b_0(1-e^{a})$ vanishes as $a\to 0^-$. Hence,
\begin{align}
\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\mathrm{CuSum}^{J\Lambda}}]
&= \inf_{\nu\ge1}\sum_{t=0}^{\infty}\mathbb{P}_{\nu,f_C}[T_{\mathrm{CuSum}^{J\Lambda}}\ge t] \tag{133}\\
&= \inf_{\nu\ge1}\sum_{t=0}^{\infty}\mathbb{P}_{\nu,f_C}\big[\mathrm{CuSum}^{J\Lambda}[t]<b_C\big] \tag{134}\\
&\overset{(a)}{\simeq} \sum_{t=0}^{\infty}\mathbb{P}_{f_C}\big[\mathrm{CuSum}_{f_B,f_C}[t]<b_C\big] \tag{135}\\
&= \sum_{t=0}^{\infty}\mathbb{P}_{f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}\ge t] \tag{136}\\
&= \mathbb{E}_{f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{137}\\
&\overset{(b)}{\ge} \gamma. \tag{138}
\end{align}
Approximation (a) follows from Eq. (132), and inequality (b) follows from the analysis in Appendix B, i.e., Eqs. (88)-(91).

APPENDIX E
PROOF OF THEOREM 5

Proof. Since $\mathrm{CuSum}^{W}[t]$ and $\mathrm{CuSum}^{J\Lambda}[t]$ are always non-negative and are zero at $t=0$, the worst-case average detection delay occurs when the change point is $\nu=1$.

To assist the analysis, we use the intermediate stopping times $T_{\mathrm{CuSum}^{W}}$ and $T_{\mathrm{CuSum}_{f_B,f_C}}$ defined in Eqs. (67) and (69), respectively, and $T_{\mathrm{CuSum}^{J\Lambda}}$ defined in Eq. (116).

By the algorithmic property of J-CuSum, when $\mathrm{CuSum}^{J\Lambda}$ passes threshold $b_C$, there are only two possible cases:
1) $\mathrm{CuSum}^{W}$ has not yet passed threshold $b_0$ (as illustrated in Fig. 7a);
2) $\mathrm{CuSum}^{W}$ has already passed threshold $b_0$ (as illustrated in Fig. 7b).

In the following, we analyze each case separately.

Fig. 7: Illustration of possible performances of J-CuSum when $\nu=1$ (panels: (a) Case 1, (b) Case 2). In Case 1, $\mathrm{CuSum}^{J\Lambda}$ passes its threshold before $\mathrm{CuSum}^{W}$ does. In Case 2, $\mathrm{CuSum}^{J\Lambda}$ passes its threshold after $\mathrm{CuSum}^{W}$ does.

We first consider case 1, in which $\mathrm{CuSum}^{J\Lambda}$ passes threshold $b_C$ while $\mathrm{CuSum}^{W}$ has not yet passed threshold $b_0$ (as illustrated in Fig. 7a). In this case, the worst-case detection delay of J-CuSum is simply upper bounded by that of $\mathrm{CuSum}^{W}$. By the analysis in Appendix C, with $b_0=\log\gamma$ and $\gamma\to\infty$, we have in case 1:
\begin{align}
\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}] &\le \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^{W}}] \tag{139}\\
&\le \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)}(1+o(1)). \tag{140}
\end{align}

We then consider case 2, in which $\mathrm{CuSum}^{J\Lambda}$ passes threshold $b_C$ after $\mathrm{CuSum}^{W}$ has already passed threshold $b_0$ (as illustrated in Fig. 7b). To assist this analysis, we follow [26] and express $T_{\mathrm{CuSum}^{W}}$ in terms of a sequence of sequential probability ratio tests, as introduced in Eqs. (121)-(128). Then we have that, with $b_0=b_C=\log\gamma$ and $\gamma\to\infty$, in

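case 2:
\begin{align}
\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}]
&\le \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^{J\Lambda}}] \tag{141}\\
&\le \mathbb{E}_{1,f_B}\Bigg[\sum_{k=1}^{M-1}N_k\Bigg]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{142}\\
&= \mathbb{E}_{1,f_B}[(M-1)\cdot N_1]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{143}\\
&\overset{(a)}{=} \left(\frac{1}{\mathbb{P}_{1,f_B}[S[N_1]\ge b_0]}-1\right)\mathbb{E}_{1,f_B}[N_1]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{144}\\
&\overset{(b)}{\simeq} \left(\frac{1}{1-\beta}-1\right)\mathbb{E}_{1,f_B}[N_1]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{145}\\
&\overset{(c)}{\simeq} \frac{e^{a}(e^{b_0}-1)}{e^{b_0}(1-e^{a})}\,\mathbb{E}_{1,f_B}[N_1]+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}],\quad a\to 0^- \tag{146}\\
&\overset{(d)}{\simeq} \frac{e^{a}(e^{b_0}-1)\big[a e^{a}(e^{b_0}-1)+b_0 e^{b_0}(1-e^{a})\big]}{e^{b_0}(1-e^{a})(e^{b_0}-e^{a})D_{\mathrm{KL}}(f_B\|f_0)}+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}],\quad a\to 0^- \tag{147}\\
&\overset{(e)}{=} \frac{b_0-1+1/e^{b_0}}{D_{\mathrm{KL}}(f_B\|f_0)}+\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{148}\\
&\overset{(f)}{\le} \left(\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)}+\frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)}\right)(1+o(1)). \tag{149}
\end{align}
The steps are justified as follows:
- equality (a) is by Wald's identity;
- approximation (b) follows [26, Eqs. (2.52), (2.53)];
- approximation (c) follows [26, Eqs. (2.11), (2.12)];
- approximation (d) is by [26, Eq. (2.15)];
- equality (e) is by L'Hôpital's rule as $a\to 0^-$;
- inequality (f) is by the analysis in Appendix C.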
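Eqs. (114) and (149) share the first-order term $\log\gamma/D_{\mathrm{KL}}(f_B\|f_0)+\log\gamma/D_{\mathrm{KL}}(f_B\|f_C)$. As a rough illustration, the Python sketch below estimates $\mathbb{E}_{1,f_B}[T_{\text{S-CuSum}}]$ by simulation and prints it next to that term. It rests on assumptions that are ours, not the paper's:
- unit-variance Gaussians $f_0=\mathcal{N}(0,1)$, $f_B=\mathcal{N}(1,1)$, and $f_C=\mathcal{N}(2,1)$, so both divergences equal 0.5. With this choice a change to $f_C$ would also drive $\mathrm{CuSum}^{W}$ upward, which is the situation the second stage is meant to screen out;
- Page-style CuSum recursions for both stages;
- a reading of Eq. (71) in which the second-stage CuSum restarts from zero on the samples after $T_{\mathrm{CuSum}^{W}}$.

The function names are hypothetical. We simulate S-CuSum rather than J-CuSum because Eq. (71) determines its stopping rule from quantities defined in this section.

```python
import math
import random

# Illustrative assumption (not the paper's experiments): unit-variance Gaussians
# f0 = N(0, 1), fB = N(1, 1), fC = N(2, 1).
MU_0, MU_B, MU_C = 0.0, 1.0, 2.0


def gauss_llr(x, mu_num, mu_den):
    """log N(x; mu_num, 1) - log N(x; mu_den, 1)."""
    return (mu_num - mu_den) * x - 0.5 * (mu_num ** 2 - mu_den ** 2)


def kl_gauss(mu_p, mu_q):
    """D_KL(N(mu_p, 1) || N(mu_q, 1))."""
    return 0.5 * (mu_p - mu_q) ** 2


def s_cusum_stop(draw, b0, bC):
    """Stopping time of a two-stage CuSum, following our reading of Eq. (71):
    run CuSum^W (fB vs f0) until it reaches b0, then run a fresh CuSum of
    log fB/fC on the subsequent samples until it reaches bC."""
    t, stat = 0, 0.0
    while stat < b0:                                  # stage 1: T_{CuSum^W}
        t += 1
        stat = max(0.0, stat + gauss_llr(draw(), MU_B, MU_0))
    stat = 0.0
    while stat < bC:                                  # stage 2: T_{CuSum_{fB,fC}}
        t += 1
        stat = max(0.0, stat + gauss_llr(draw(), MU_B, MU_C))
    return t


def mean_stop_time_at_nu1(gamma, runs=2000, seed=0):
    """Monte Carlo estimate of E_{1,fB}[T_S-CuSum] with b0 = bC = log(gamma), as in Eq. (114)."""
    rng = random.Random(seed)
    b = math.log(gamma)

    def draw():
        return rng.gauss(MU_B, 1.0)                   # change at nu = 1: all samples from fB

    return sum(s_cusum_stop(draw, b, b) for _ in range(runs)) / runs


gamma = 1000.0
first_order = math.log(gamma) * (1.0 / kl_gauss(MU_B, MU_0) + 1.0 / kl_gauss(MU_B, MU_C))
estimate = mean_stop_time_at_nu1(gamma)
print(f"simulated: {estimate:.1f}, first-order term of Eq. (114): {first_order:.1f}")
```

Eq. (114) is a first-order asymptotic statement, so a finite-$\gamma$ estimate need not fall below the printed term. Threshold overshoot in each stage adds a lower-order excess, so the two numbers should be comparable in size rather than strictly ordered.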
