Quickest Change Detection With Confusing Change: S-Cusum
arXiv:2405.00842v1 [math.ST] 1 May 2024

Abstract: In the problem of quickest change detection (QCD), a change occurs at some unknown time in the distribution of a sequence of independent observations. This work studies a QCD problem where the change is either a bad change, which we aim to detect, or a confusing change, which is not of interest. Our objective is to detect a bad change as quickly as possible while avoiding raising a false alarm for pre-change or a confusing change. We identify a specific set of pre-change, bad change, and confusing change distributions that pose challenges beyond the capabilities of standard Cumulative Sum (CuSum) procedures. We propose two novel CuSum-based detection procedures, S-CuSum and J-CuSum, which leverage two CuSum statistics and apply to all kinds of pre-change, bad change, and confusing change distributions. For both S-CuSum and J-CuSum, we provide analytical performance guarantees and validate them by numerical results. Furthermore, both procedures are computationally efficient, as they only require simple recursive updates.

Index Terms: Sequential change detection, CuSum procedure

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

I. INTRODUCTION

The quickest change detection (QCD) problem is of fundamental importance in sequential analysis and statistical inference. See, e.g., [1], [2], [3], [4] for books and survey papers on this topic. Moreover, the problem of sequential detection of changes or anomalies in stochastic systems arises in a variety of science and engineering domains such as signal processing in sensor networks [5], [6] and cognitive radio [7], quality control in manufacturing [8], [9] and power delivery [10], [11], anomaly detection in social networks [12] and public health [13], and surveillance [14], where timely detection of changes or abnormalities is crucial for decision-making or system control.

In the classical formulation of the QCD problem, the objective is to detect any change in the distribution. However, in many applications, there can be confusing changes that are not of primary interest, and raising an alarm for a confusing change may waste human resources on checking the system. For instance, in cybersecurity applications, natural fluctuations or diurnal variations in network traffic patterns can be confusing when it comes to identifying potentially malicious activity. In environmental or industrial process monitoring, delicate sensors or instruments can experience wear or drift in calibration and, therefore, introduce changes in the data that do not reflect the underlying environment or process. Another example is wireless communication in a complicated environment, where physical obstacles or unrelated nearby communication systems can cause signal interference.

This study addresses a QCD problem with the aim of swiftly detecting bad changes while preventing false alarms for pre-change stages or confusing changes. We consider a sequence of observations $X_1, X_2, \ldots, X_t$ generated from a stochastic system. During the pre-change stage (i.e., when the system is in the in-control state), the observations follow distribution $f_0$. At an unknown deterministic time $\nu$, an event occurs and changes the distribution from which the observations arise. The event can manifest as either a confusing change with distribution $f_C$ or a bad change with distribution $f_B$. The QCD problem with confusing change raises the question of how to design a detection procedure that is not triggered by a confusing change. Furthermore, the false alarm metric in this context differs from that in classical QCD problems, since it must account for both the pre-change stage and the confusing change.

There has been prior research extending the classical QCD framework to formulations with more intricate assumptions on the distributions before or after an event occurs, including composite pre-change distribution [15], [16], composite post-change distribution [17], post-change distribution isolation [18], [19], and transient change [20]. Our QCD problem with confusing change differs from these problems: they still aim to raise an alarm as soon as any change happens, whereas we avoid raising an alarm if the change is a confusing change. The most relevant extension for our study is the formulation of nuisance change studied by Lau and Tay [21]. In a similar vein, [21] considers two types of changes and aims to detect only one of them. While [21] presents a more general model for change points, allowing one type of change to occur after the other, it focuses solely on a restricted set of post-change distributions. In contrast to [21], we focus on scenarios where one type of change occurs and devise solutions applicable to all post-change distributions. Therefore, we view our work as complementary to that of Lau and Tay [21]. From an algorithm design perspective, our proposed procedures differ from prior procedures that involve two CuSum statistics [22], [23], and our procedures are more computationally efficient than the window-limited generalized likelihood ratio test-based procedure proposed in [21].

Our contributions are as follows. In Section II, we formulate a novel quickest change detection problem where the change can be either a bad change or a confusing change, and we propose a false alarm metric that takes into account not only the run length to false alarm for pre-change but also the run length to false alarm for confusing change. In Section III, we investigate the problem under possible scenarios with different combinations of pre-change, bad change, and confusing change distributions, and we identify a scenario in which procedures depending on a single CuSum statistic incur short run lengths to false alarm. In Section IV, we propose two novel procedures, S-CuSum and J-CuSum, that incorporate two CuSum statistics and work under all kinds of distributions. In Section V, we first provide a universal lower bound on the detection delay of any procedure that fulfills the false alarm requirement, and then we provide false alarm guarantees and detection delay upper bounds for S-CuSum and J-CuSum. In Section VI, our simulation results under all possible scenarios corroborate our theoretical guarantees for S-CuSum and J-CuSum. Finally, in Section VII, we conclude this paper and discuss future directions.

TABLE I: Possible Drifts of $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ Statistics under Distributions $f_0$, $f_C$, and $f_B$

$\mathrm{CuSum}_{f_B,f_0}$:
  Under $f_0$: $\mathbb{E}_{f_0}[\log(f_B(X)/f_0(X))] = -D_{\mathrm{KL}}(f_0\|f_B) < 0$
  Under $f_C$: $\mathbb{E}_{f_C}[\log(f_B(X)/f_0(X))] = -D_{\mathrm{KL}}(f_C\|f_B) + D_{\mathrm{KL}}(f_C\|f_0) \gtrless 0$
  Under $f_B$: $\mathbb{E}_{f_B}[\log(f_B(X)/f_0(X))] = D_{\mathrm{KL}}(f_B\|f_0) > 0$
$\mathrm{CuSum}_{f_B,f_C}$:
  Under $f_0$: $\mathbb{E}_{f_0}[\log(f_B(X)/f_C(X))] = -D_{\mathrm{KL}}(f_0\|f_B) + D_{\mathrm{KL}}(f_0\|f_C) \gtrless 0$
  Under $f_C$: $\mathbb{E}_{f_C}[\log(f_B(X)/f_C(X))] = -D_{\mathrm{KL}}(f_C\|f_B) < 0$
  Under $f_B$: $\mathbb{E}_{f_B}[\log(f_B(X)/f_C(X))] = D_{\mathrm{KL}}(f_B\|f_C) > 0$
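As a quick sanity check of Table I, the sketch below evaluates every drift for unit-variance Gaussian distributions, where $D_{\mathrm{KL}}(\mathcal{N}(\mu_p,1)\|\mathcal{N}(\mu_q,1)) = (\mu_p-\mu_q)^2/2$, using the three Gaussian settings listed in Section VI. It also computes each expectation directly to confirm the KL-divergence identities in the table. The helper names (`kl_gauss`, `expected_llr`, `table1_drifts`) are hypothetical and not from the paper; the Gaussian assumption is only for illustration.

```python
def kl_gauss(mu_p: float, mu_q: float, sigma: float = 1.0) -> float:
    """D_KL(N(mu_p, sigma^2) || N(mu_q, sigma^2)) for a common variance."""
    return (mu_p - mu_q) ** 2 / (2.0 * sigma ** 2)


def expected_llr(mu_true: float, mu_num: float, mu_den: float, sigma: float = 1.0) -> float:
    """E_{N(mu_true, sigma^2)}[log(f_num(X) / f_den(X))], computed directly.

    For equal-variance Gaussians, log(f_num/f_den)(x) = ((x-mu_den)^2 - (x-mu_num)^2) / (2 sigma^2);
    after taking expectations, the variance terms cancel.
    """
    return ((mu_true - mu_den) ** 2 - (mu_true - mu_num) ** 2) / (2.0 * sigma ** 2)


def table1_drifts(mu0: float, mu_c: float, mu_b: float) -> dict:
    """Table I entries, written in terms of KL divergences (keys: (statistic, true law))."""
    d = kl_gauss
    return {
        ("fB,f0", "f0"): -d(mu0, mu_b),
        ("fB,f0", "fC"): -d(mu_c, mu_b) + d(mu_c, mu0),
        ("fB,f0", "fB"): d(mu_b, mu0),
        ("fB,fC", "f0"): -d(mu0, mu_b) + d(mu0, mu_c),
        ("fB,fC", "fC"): -d(mu_c, mu_b),
        ("fB,fC", "fB"): d(mu_b, mu_c),
    }


# (mu_0, mu_C, mu_B) of the unit-variance Gaussians used in Section VI.
SCENARIOS = {1: (0.0, -0.5, 0.5), 2: (0.0, 0.7, 1.2), 3: (0.0, 1.0, 0.5)}

if __name__ == "__main__":
    for k, (mu0, mu_c, mu_b) in SCENARIOS.items():
        drifts = table1_drifts(mu0, mu_c, mu_b)
        means = {"f0": mu0, "fC": mu_c, "fB": mu_b}
        for (stat, under), value in drifts.items():
            # Each KL expression must match the directly computed expectation.
            mu_den = mu0 if stat == "fB,f0" else mu_c
            assert abs(value - expected_llr(means[under], mu_b, mu_den)) < 1e-12
        print(f"Scenario {k}: CuSum_fB,f0 drift under fC = {drifts[('fB,f0', 'fC')]:+.3f}, "
              f"CuSum_fB,fC drift under f0 = {drifts[('fB,fC', 'f0')]:+.3f}")
```

In the Scenario 3 setting, both entries marked $\gtrless 0$ in Table I are positive. A confusing change therefore drives $\mathrm{CuSum}_{f_B,f_0}$ upward, and pre-change data drive $\mathrm{CuSum}_{f_B,f_C}$ upward, so neither single statistic avoids false alarms.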
Fig. 1: Illustration of $\mathrm{CuSum}_{f_B,f_0}$ behaviors in Scenario 1, with panels (a) Confusing Change and (b) Bad Change. Applying the standard $\mathrm{CuSum}_{f_B,f_0}$ procedure suffices to detect a bad change quickly while avoiding raising a false alarm for a confusing change.

Fig. 2: Illustration of $\mathrm{CuSum}_{f_B,f_0}$ (blue lines) and $\mathrm{CuSum}_{f_B,f_C}$ (red lines) behaviors in Scenario 2, with panels (a) Confusing Change and (b) Bad Change. Applying the standard $\mathrm{CuSum}_{f_B,f_C}$ procedure suffices to detect a bad change quickly while avoiding raising a false alarm for a confusing change.
Algorithm 1 Successive CuSum (S-CuSum) Procedure
  Input: f0, fC, fB, b0, bC
  Initialize: t = 0, CuSum_W[t] ← 0, CuSum^S_Λ[t] ← 0
  while 1 do
      t ← t + 1
      if CuSum_W[t − 1] < b0 then
          CuSum_W[t] ← (CuSum_W[t − 1] + W[t])^+
      else
          CuSum^S_Λ[t] ← (CuSum^S_Λ[t − 1] + Λ[t])^+
          if CuSum^S_Λ[t] ≥ bC then
              T_S-CuSum ← t
              Break
          end if
      end if
  end while
  Output: stopping time T_S-CuSum

Algorithm 2 Joint CuSum (J-CuSum) Policy
  Input: f0, fC, fB, b0, bC
  Initialize: t = 0, CuSum_W[t] ← 0, CuSum^J_Λ[t] ← 0
  while 1 do
      t ← t + 1
      if CuSum_W[t − 1] < b0 then
          CuSum_W[t] ← (CuSum_W[t − 1] + W[t])^+
      end if
      if CuSum^J_Λ[t − 1] < bC then
          CuSum^J_Λ[t] ← (CuSum^J_Λ[t − 1] + Λ[t])^+
      end if
      if CuSum_W[t] ≥ b0 and CuSum^J_Λ[t] ≥ bC then
          T_J-CuSum ← t
          Break
      else if CuSum_W[t] ≤ 0 then
          CuSum_W[t] ← 0
          CuSum^J_Λ[t] ← 0
      end if
  end while
  Output: stopping time T_J-CuSum
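To make the two update rules concrete, here is a minimal Python sketch of Algorithms 1 and 2, plus a single-statistic CuSum baseline. It assumes unit-variance Gaussian pre-change, confusing change, and bad change distributions, so that $W[t] = \log(f_B(X_t)/f_0(X_t))$ and $\Lambda[t] = \log(f_B(X_t)/f_C(X_t))$ have closed forms. The function names and the finite-horizon convention (returning `None` when no alarm is raised within the supplied data) are illustrative assumptions, not part of the paper. In S-CuSum, the stopping check uses the freshly updated $\mathrm{CuSum}^S_\Lambda[t]$, in line with Eq. (68) and (70).

```python
from typing import Callable, Iterable, Optional

LLR = Callable[[float], float]


def gaussian_llr(mu_num: float, mu_den: float, sigma: float = 1.0) -> LLR:
    """Return x -> log(f_num(x) / f_den(x)) for N(mu_num, sigma^2) vs N(mu_den, sigma^2)."""
    def llr(x: float) -> float:
        return ((x - mu_den) ** 2 - (x - mu_num) ** 2) / (2.0 * sigma ** 2)
    return llr


def cusum(xs: Iterable[float], llr: LLR, b: float) -> Optional[int]:
    """Standard single-statistic CuSum (baseline); returns the alarm time or None."""
    stat = 0.0
    for t, x in enumerate(xs, start=1):
        stat = max(stat + llr(x), 0.0)
        if stat >= b:
            return t
    return None


def s_cusum(xs: Iterable[float], w: LLR, lam: LLR, b0: float, b_c: float) -> Optional[int]:
    """Algorithm 1: run CuSum_W until it passes b0, then freeze it and run CuSum^S_Lambda."""
    cusum_w, cusum_s = 0.0, 0.0
    for t, x in enumerate(xs, start=1):
        if cusum_w < b0:
            cusum_w = max(cusum_w + w(x), 0.0)    # stage 1: W[t] = log(fB/f0)
        else:
            cusum_s = max(cusum_s + lam(x), 0.0)  # stage 2: Lambda[t] = log(fB/fC)
            if cusum_s >= b_c:
                return t
    return None


def j_cusum(xs: Iterable[float], w: LLR, lam: LLR, b0: float, b_c: float) -> Optional[int]:
    """Algorithm 2: update both statistics jointly; reset both when CuSum_W returns to 0."""
    cusum_w, cusum_j = 0.0, 0.0
    for t, x in enumerate(xs, start=1):
        if cusum_w < b0:                          # CuSum_W is frozen once it passes b0
            cusum_w = max(cusum_w + w(x), 0.0)
        if cusum_j < b_c:                         # CuSum^J_Lambda is frozen once it passes bC
            cusum_j = max(cusum_j + lam(x), 0.0)
        if cusum_w >= b0 and cusum_j >= b_c:
            return t
        if cusum_w <= 0.0:
            cusum_w, cusum_j = 0.0, 0.0
    return None


if __name__ == "__main__":
    import random
    # Scenario 3 of Section VI: f0 = N(0,1), fC = N(1,1), fB = N(0.5,1).
    w, lam = gaussian_llr(0.5, 0.0), gaussian_llr(0.5, 1.0)
    rng = random.Random(0)
    bad = [rng.gauss(0.5, 1.0) for _ in range(2000)]  # bad change at nu = 1
    print("S-CuSum:", s_cusum(bad, w, lam, 5.0, 5.0), "J-CuSum:", j_cusum(bad, w, lam, 5.0, 5.0))
```

With constant inputs in the Scenario 3 setting, the difference between the procedures is easy to see. If every observation equals the confusing-change mean 1.0, the baseline `cusum(xs, w, b)` alarms within a few steps, while both two-statistic procedures stay silent. If every observation equals the bad-change mean 0.5, J-CuSum stops earlier than S-CuSum, because it does not wait for $\mathrm{CuSum}_W$ to pass $b_0$ before starting the second statistic.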
statistic w.r.t. $W[t]$ has passed the threshold. Specifically, we let the test statistic w.r.t. $W[t]$ be
$$\mathrm{CuSum}_W[t] := \begin{cases} 0, & \text{if } t = 0,\\ \left(\mathrm{CuSum}_W[t-1] + W[t]\right)^+, & \text{if } \mathrm{CuSum}_W[t-1] < b_0,\\ \mathrm{CuSum}_W[t-1], & \text{otherwise}. \end{cases} \tag{17}$$
And let the test statistic w.r.t. $\Lambda[t]$ be

We first study the universal lower bound on $\mathrm{WADD}_{f_B}$ for any procedure whose run length to false alarm is no smaller than $\gamma$.

Theorem 1 (Universal Detection Delay Lower Bound). As $\gamma \to \infty$, we have
$$\inf_{T \in \mathcal{C}_{\gamma}} \mathrm{WADD}_{f_B}(T) \ge \frac{\log(\gamma)(1 - o(1))}{\min\{D_{\mathrm{KL}}(f_B\|f_0), D_{\mathrm{KL}}(f_B\|f_C)\} + o(1)}, \tag{23}$$
where $\mathcal{C}_\gamma$ is defined in Eq. (1).

The full proof of Theorem 4 is deferred to Appendix D. By the algorithmic property of J-CuSum, when a confusing change occurs and the change point is just before $\mathrm{CuSum}^J_\Lambda$ is reset by $\mathrm{CuSum}_W$, J-CuSum has the shortest average run time for $\mathrm{CuSum}^J_\Lambda$ to pass the threshold. To analyze the run length from one reset to the next, we follow [26] and express the stopping time of $\mathrm{CuSum}_W$ in terms of the stopping times of a sequence of sequential probability ratio tests. It follows from the approximation of the stopping time of the corresponding sequential probability ratio test given in [26] that the expected value of $\mathrm{CuSum}^J_\Lambda$ just before a reset is almost zero. Therefore, we can use results from the proof of Theorem 2 to prove that $T_{\text{J-CuSum}} \in \mathcal{C}_\gamma$.

Theorem 5 (J-CuSum Detection Delay Upper Bound). With $b_0 = b_C = \log\gamma$, as $\gamma \to \infty$, we have
\begin{align}
\mathrm{WADD}_{f_B}(T_{\text{J-CuSum}}) &\lessapprox \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)} + \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)}(1+o(1)) \tag{28}\\
&\le \frac{2\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0), D_{\mathrm{KL}}(f_B\|f_C)\}}(1+o(1)). \tag{29}
\end{align}

The full proof of Theorem 5 is deferred to Appendix E. Because $\mathrm{CuSum}_W[t]$ and $\mathrm{CuSum}^J_\Lambda[t]$ are always non-negative and are zero when $t = 0$, the worst-case average detection delay occurs when the change point is $\nu = 1$. By the algorithmic property of J-CuSum, depending on which of the two statistics passes its threshold first, $\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}]$ is upper bounded by either $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}]$ or $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^J_\Lambda}]$. We upper bound $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}]$ using Lemma 1. As for $\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^J_\Lambda}]$, we follow [26], express the stopping time of $\mathrm{CuSum}_W$ in terms of the stopping times of a sequence of sequential probability ratio tests, and approximate the run length up to the last reset.

VI. NUMERICAL RESULTS

In this section, we numerically compare S-CuSum and J-CuSum to the baselines $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$. Specifically, we conduct simulations in each of the three scenarios discussed in Section III:
• for Scenario 1, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(-0.5,1)$, and $f_B = \mathcal{N}(0.5,1)$;
• for Scenario 2, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(0.7,1)$, and $f_B = \mathcal{N}(1.2,1)$;
• for Scenario 3, we let $f_0 = \mathcal{N}(0,1)$, $f_C = \mathcal{N}(1,1)$, and $f_B = \mathcal{N}(0.5,1)$.

For each scenario, we run all procedures under $\mathbb{P}_{1,f_B}$, $\mathbb{P}_\infty$, and $\mathbb{P}_{1,f_C}$ with varying thresholds (for S-CuSum and J-CuSum, we let $b_0 = b_C$) to measure, respectively, the procedures' detection delays, run lengths to false alarm for pre-change, and run lengths to false alarm for confusing change. For each scenario, underlying distribution, and threshold, we perform 60 independent trials and report the average.

In Figure 6, we plot the average detection delays of the procedures against their average run lengths to false alarm for pre-change or for confusing change, whichever is smaller. This is because our false alarm requirement, Eq. (1), asks for both the run length to false alarm for pre-change and that for confusing change to be no smaller than the same threshold; plotting against the smaller of the two ensures that the requirement is fulfilled. Moreover, we plot run lengths to false alarm on a logarithmic scale and detection delays on a linear scale. Hence, straight lines in the figures indicate that detection delays grow logarithmically with respect to run lengths to false alarm, whereas steeply rising lines indicate that detection delays grow super-logarithmically with respect to run lengths to false alarm.

First, Figures 6a, 6b, and 6c respectively corroborate our discussion in Section III that $\mathrm{CuSum}_{f_B,f_0}$ suffices in Scenario 1 and $\mathrm{CuSum}_{f_B,f_C}$ suffices in Scenario 2, but neither a single $\mathrm{CuSum}_{f_B,f_0}$ nor a single $\mathrm{CuSum}_{f_B,f_C}$ suffices in Scenario 3 to detect a bad change quickly while avoiding raising a false alarm for a confusing change. Indeed, Figure 6c shows that both $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ incur very large average detection delays in order to achieve the same average run lengths to false alarm as S-CuSum or J-CuSum.

Second, Figures 6a, 6b, and 6c show that both S-CuSum and J-CuSum perform well in all three possible scenarios, namely under all kinds of pre-change, bad change, and confusing change distributions. Indeed, in Figure 6, the lines
of S-CuSum and J-CuSum are straight in all three graphs, meaning that their detection delays grow logarithmically with respect to their run lengths to false alarm.

Finally, as shown in Figures 6a, 6b, and 6c, our simulations empirically support our discussion in Section IV that the detection delay of S-CuSum still has room for improvement, and that J-CuSum reduces the detection delay by allowing $\mathrm{CuSum}^J_\Lambda$ to be launched earlier than $\mathrm{CuSum}^S_\Lambda$ while still avoiding raising false alarms.

VII. CONCLUSION AND DISCUSSION

In this paper, we investigated a quickest change detection problem where an initially in-control system can transition into an out-of-control state due to either a bad event or a confusing event. Our goal was to detect the change as soon as possible if a bad event occurs while avoiding raising an alarm if a confusing event occurs. We found that when both 1) the KL-divergence between the confusing change distribution $f_C$ and the pre-change distribution $f_0$ is larger than that between $f_C$ and the bad change distribution $f_B$, i.e., $D_{\mathrm{KL}}(f_C\|f_0) > D_{\mathrm{KL}}(f_C\|f_B)$, and 2) the KL-divergence between $f_0$ and $f_C$ is greater than that between $f_0$ and $f_B$, i.e., $D_{\mathrm{KL}}(f_0\|f_C) > D_{\mathrm{KL}}(f_0\|f_B)$, hold, typical procedures based on $\mathrm{CuSum}_{f_B,f_0}$ and $\mathrm{CuSum}_{f_B,f_C}$ fail to achieve our objective. Hence, we proposed two new detection procedures, S-CuSum and J-CuSum, that achieve our objective in all scenarios, and we provided theoretical guarantees as well as numerical corroboration.

[7] L. Lai, Y. Fan, and H. V. Poor, "Quickest detection in cognitive radio: A sequential change detection framework," in IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference. IEEE, 2008, pp. 1–5.
[8] T. L. Lai, "Sequential changepoint detection in quality control and dynamical systems," Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 4, pp. 613–644, 1995.
[9] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, "Using control charts to monitor process and product quality profiles," Journal of Quality Technology, vol. 36, no. 3, pp. 309–320, 2004.
[10] T. Banerjee, Y. C. Chen, A. D. Dominguez-Garcia, and V. V. Veeravalli, "Power system line outage detection and identification: A quickest change detection approach," in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014, pp. 3450–3454.
[11] J. Sun, M. Saeedifard, and A. S. Meliopoulos, "Backup protection of multi-terminal HVDC grids based on quickest change detection," IEEE Transactions on Power Delivery, vol. 34, no. 1, pp. 177–187, 2018.
[12] F. Ji, W. P. Tay, and L. R. Varshney, "An algorithmic framework for estimating rumor sources with different start times," IEEE Transactions on Signal Processing, vol. 65, no. 10, pp. 2517–2530, 2017.
[13] Y. Liang and V. V. Veeravalli, "Quickest change detection with leave-one-out density estimation," in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
[14] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, "A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods," IEEE Transactions on Signal Processing, vol. 54, no. 9, pp. 3372–3382, 2006.
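[27] G. Fellouris and A. G. Tartakovsky, "Multichannel sequential detection - Part I: Non-iid data," IEEE Transactions on Information Theory, vol. 63, no. 7, pp. 4551–4571, 2017.

APPENDIX A
PROOF OF THEOREM 1

Proof. We first recall the following generalized version of the Weak Law of Large Numbers.

Lemma 1 (Lemma A.1 in [27]). Let $\{Y_t, t \in \mathbb{N}\}$ be a sequence of random variables i.i.d. on $(\Omega, \mathcal{F}, \mathbb{P})$ with $\mathbb{E}[Y_t] = \mu > 0$. Then for any $\epsilon > 0$, as $n \to \infty$,
$$\mathbb{P}\left[\left|\frac{\max_{1\le k\le n}\sum_{t=1}^{k} Y_t}{n} - \mu\right| > \epsilon\right] \to 0. \tag{30}$$

Note that
\begin{align}
\mathrm{WADD}_{f_B}(T) &:= \sup_{\nu\ge 1} \mathbb{E}_{\nu,f_B}[T-\nu \mid T\ge\nu] \tag{31}\\
&\ge \mathbb{E}_{\nu,f_B}[T-\nu \mid T\ge\nu] \tag{32}\\
&\overset{(a)}{\ge} \mathbb{P}_{\nu,f_B}[T-\nu \ge \alpha_\gamma \mid T\ge\nu]\times \alpha_\gamma, \tag{33}
\end{align}
where inequality (a) is by the Markov inequality. It then suffices to show that, as $\gamma\to\infty$,
$$\mathbb{P}_{\nu,f_B}[T-\nu\ge\alpha_\gamma \mid T\ge\nu] \to 1, \tag{34}$$
or equivalently,
$$\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma \mid T\ge\nu] \to 0. \tag{35}$$

We will first show that Eq. (35) holds when
$$\alpha_\gamma = \frac{1}{D_{\mathrm{KL}}(f_B\|f_0)+\epsilon}\log\gamma^{(1-\epsilon)}, \quad \epsilon > 0, \tag{36}$$
using a change-of-measure argument. Specifically,
\begin{align}
\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma] &= \mathbb{E}_{f_0}\left[\mathbb{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\right] \tag{37}\\
&\overset{(a)}{=} \mathbb{E}_{\nu,f_B}\left[\mathbb{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{38}\\
&\ge \mathbb{E}_{\nu,f_B}\left[\mathbb{1}_{\left\{\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\ge -a\right\}}\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{39}\\
&\ge e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{f_0}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\ge -a\right] \tag{40}\\
&= e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \log\frac{\mathbb{P}_{\nu,f_B}(H_T)}{\mathbb{P}_{f_0}(H_T)}\le a\right] \tag{41}\\
&\ge e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\nu\le T<\nu+\alpha_\gamma,\ \max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}\le a\right] \tag{42}\\
&\overset{(b)}{\ge} e^{-a}\,\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma] - e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)}> a\right], \tag{43}
\end{align}
where $H_t = (X_1, \ldots, X_t)$, $t\in\mathbb{N}$; the change-of-measure step (a) holds because $\mathbb{P}_{f_0}$ and $\mathbb{P}_{\nu,f_B}$ are measures over a common measurable space, $\mathbb{P}_{\nu,f_B}$ is σ-finite, and $\mathbb{P}_{f_0} \ll \mathbb{P}_{\nu,f_B}$; $a$ will be specified later; and inequality (b) holds because, for any events $A$ and $B$, $\mathbb{P}[A\cap B] \ge \mathbb{P}[A] - \mathbb{P}[B^c]$.

The event $\{T\ge\nu\}$ only depends on $H_{\nu-1}$, which follows the same distribution under $\mathbb{P}_{f_0}$ and $\mathbb{P}_{\nu,f_B}$. This implies
$$\mathbb{P}_{f_0}[T\ge\nu] = \mathbb{P}_{\nu,f_B}[T\ge\nu]. \tag{44}$$
By Eq. (44) and reordering Eq. (43), it follows that
\begin{align}
\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma\mid T\ge\nu] \le{}& e^{a}\,\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma\mid T\ge\nu]\notag\\
&+ \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)} > a \,\middle|\, T\ge\nu\right]. \tag{45}
\end{align}

To show that the first term on the right-hand side of Eq. (45) converges to 0 as $\gamma\to\infty$, we can use the proof-by-contradiction argument in the proof of [25, Theorem 1]. Let $\alpha_\gamma$ be a positive integer with $\alpha_\gamma < \gamma$. For any $T\in\mathcal{C}_\gamma$, we have $\mathbb{E}_{f_0}[T]\ge\gamma$, and then for some $\nu\ge1$, $\mathbb{P}_{f_0}[T\ge\nu] > 0$ and
$$\mathbb{P}_{f_0}[T\le\nu+\alpha_\gamma\mid T\ge\nu] \le \frac{\alpha_\gamma}{\gamma}, \tag{46}$$
because otherwise $\mathbb{P}_{f_0}[T\ge\nu+\alpha_\gamma\mid T\ge\nu] < 1-\alpha_\gamma/\gamma$ for all $\nu\ge1$ with $\mathbb{P}_{f_0}[T\ge\nu]>0$, implying that $\mathbb{E}_{f_0}[T]\le\gamma$.

Let $a = \log\gamma^{(1-\epsilon)}$; then
$$e^{a}\,\mathbb{P}_{f_0}[\nu\le T<\nu+\alpha_\gamma\mid T\ge\nu] \le e^{a}\frac{\alpha_\gamma}{\gamma} = \frac{\alpha_\gamma}{\gamma^{\epsilon}} \to 0, \quad \text{as } \gamma\to\infty. \tag{47}$$

We then show that the second term on the right-hand side of Eq. (45) also converges to 0 as $\gamma\to\infty$:
\begin{align}
&\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{f_0}(H_j)} > a \,\middle|\, T\ge\nu\right]\notag\\
&= \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)} > a \,\middle|\, T\ge\nu\right] \tag{48}\\
&\overset{(a)}{=} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)} > a\right] \tag{49}\\
&\overset{(b)}{\le} \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\sum_{i=\nu}^{j}\log\frac{f_B(X_i)}{f_0(X_i)} > \alpha_\gamma\left(D_{\mathrm{KL}}(f_B\|f_0)+\epsilon\right)\right] \tag{50}\\
&\to 0, \quad \text{as } \gamma\to\infty, \tag{51}
\end{align}
where equality (a) holds because the event $\{T\ge\nu\}$ is independent of $X_i$, $\forall i\ge\nu$; inequality (b) holds because $a \ge \alpha_\gamma \times (D_{\mathrm{KL}}(f_B\|f_0)+\epsilon)$; and the last step follows by applying Lemma 1.

Similarly, we can show that Eq. (35) holds when
$$\alpha_\gamma = \frac{1}{D_{\mathrm{KL}}(f_B\|f_C)+\epsilon}\log\gamma^{(1-\epsilon)}, \quad \epsilon > 0, \tag{52}$$
using a change-of-measure argument. Specifically,
\begin{align}
\mathbb{P}_{\nu,f_C}[\nu\le T<\nu+\alpha_\gamma] &\overset{(a)}{=} \mathbb{E}_{\nu,f_B}\left[\mathbb{1}_{\{\nu\le T<\nu+\alpha_\gamma\}}\frac{\mathbb{P}_{\nu,f_C}(H_T)}{\mathbb{P}_{\nu,f_B}(H_T)}\right] \tag{53}\\
&\overset{(b)}{\ge} e^{-a}\,\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma] - e^{-a}\,\mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{\nu,f_C}(H_j)} > a\right], \tag{54}
\end{align}
where $H_t = (X_1, \ldots, X_t)$, $t\in\mathbb{N}$; the change-of-measure step (a) holds because $\mathbb{P}_{\nu,f_C}$ and $\mathbb{P}_{\nu,f_B}$ are measures over a common measurable space, $\mathbb{P}_{\nu,f_B}$ is σ-finite, and $\mathbb{P}_{\nu,f_C} \ll \mathbb{P}_{\nu,f_B}$; $a$ will be specified later; and inequality (b) holds because, for any events $A$ and $B$, $\mathbb{P}[A\cap B] \ge \mathbb{P}[A] - \mathbb{P}[B^c]$.

The event $\{T\ge\nu\}$ only depends on $H_{\nu-1}$, which follows the same distribution under $\mathbb{P}_{\nu,f_C}$ and $\mathbb{P}_{\nu,f_B}$. This implies
$$\mathbb{P}_{\nu,f_C}[T\ge\nu] = \mathbb{P}_{\nu,f_B}[T\ge\nu]. \tag{55}$$
By Eq. (55) and reordering Eq. (54), it follows that
\begin{align}
\mathbb{P}_{\nu,f_B}[\nu\le T<\nu+\alpha_\gamma\mid T\ge\nu] \le{}& e^{a}\,\mathbb{P}_{\nu,f_C}[\nu\le T<\nu+\alpha_\gamma\mid T\ge\nu]\notag\\
&+ \mathbb{P}_{\nu,f_B}\left[\max_{\nu\le j<\nu+\alpha_\gamma}\log\frac{\mathbb{P}_{\nu,f_B}(H_j)}{\mathbb{P}_{\nu,f_C}(H_j)} > a \,\middle|\, T\ge\nu\right]. \tag{56}
\end{align}

To show that the first term on the right-hand side of Eq. (56) converges to 0 as $\gamma\to\infty$, we can use the proof-by-contradiction argument in the proof of [25, Theorem 1]. Let $\alpha_\gamma$ be a positive integer with $\alpha_\gamma < \gamma$. For any $T\in\mathcal{C}_\gamma$, we have $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T] \ge \gamma$ and $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu] > 0$, and
$$\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\le\nu+\alpha_\gamma\mid T\ge\nu] \le \frac{\alpha_\gamma}{\gamma}, \tag{57}$$
because otherwise $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu+\alpha_\gamma\mid T\ge\nu] < 1-\alpha_\gamma/\gamma$ with $\inf_{\nu\ge1}\mathbb{P}_{\nu,f_C}[T\ge\nu] > 0$, implying that $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T] \le \gamma$.

where equality (a) holds because the event $\{T\ge\nu\}$ is independent of $X_i$, $\forall i\ge\nu$; inequality (b) holds because $a \ge \alpha_\gamma \times (D_{\mathrm{KL}}(f_B\|f_C)+\epsilon)$; and the last step follows by applying Lemma 1.

By Eq. (36)-(51), we show that
$$\mathrm{WADD}_{f_B}(T) \ge \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)+o(1)}(1-o(1)), \tag{63}$$
and by Eq. (52)-(62), we show that
$$\mathrm{WADD}_{f_B}(T) \ge \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)+o(1)}(1-o(1)). \tag{64}$$
Therefore, we have that
\begin{align}
\mathrm{WADD}_{f_B}(T) &\ge \max\left\{\frac{\log(\gamma)(1-o(1))}{D_{\mathrm{KL}}(f_B\|f_0)+o(1)}, \frac{\log(\gamma)(1-o(1))}{D_{\mathrm{KL}}(f_B\|f_C)+o(1)}\right\} \tag{65}\\
&= \frac{\log\gamma}{\min\{D_{\mathrm{KL}}(f_B\|f_0), D_{\mathrm{KL}}(f_B\|f_C)\}+o(1)}(1-o(1)). \tag{66}
\end{align}

APPENDIX B
PROOF OF THEOREM 2

Proof. To assist our analysis, we first define intermediate stopping times:
\begin{align}
T_{\mathrm{CuSum}_W} &:= \inf\{t\ge1: \mathrm{CuSum}_W[t]\ge b_0\}, \tag{67}\\
T_{\mathrm{CuSum}^S_\Lambda} &:= \inf\{t\ge1: \mathrm{CuSum}^S_\Lambda[t]\ge b_C\}, \tag{68}\\
T_{\mathrm{CuSum}_{f_B,f_C}} &:= \inf\{t\ge1: \mathrm{CuSum}_{f_B,f_C}[t]\ge b_C\}. \tag{69}
\end{align}
By the algorithmic property of S-CuSum, we have that
$$T_{\text{S-CuSum}} = T_{\mathrm{CuSum}^S_\Lambda}. \tag{70}$$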
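The comparisons between a CuSum statistic and a Shiryaev-Roberts-type sum of likelihood-ratio products, in Eqs. (77)-(79) and (88)-(91), can be checked numerically. The sketch below assumes that $R_W$ is defined like $R_\Lambda$ in Eq. (88), i.e., $R[t] = \sum_{i=1}^{t}\prod_{j=i}^{t} L_j$ with $L_j$ the per-sample likelihood ratio. It verifies three things on simulated data: the sum-of-products definition agrees with the standard recursion $R[t] = (1 + R[t-1])L_t$; $R[t] \ge \exp(\max_{1\le i\le t}\sum_{j=i}^{t}\log L_j)$ as in Eq. (79); and $\mathrm{CuSum}[t] = \max\{0, \max_{i}\sum_{j=i}^{t}\log L_j\}$. Together these explain why $R$ reaches $e^{b}$ no later than CuSum reaches $b$ (Eq. (78)). The function names and the Gaussian test data are hypothetical choices for illustration.

```python
import math
import random
from typing import List


def sr_by_definition(lrs: List[float]) -> List[float]:
    """R[t] = sum_{i=1}^{t} prod_{j=i}^{t} L_j, evaluated term by term (for checking only)."""
    out = []
    for t in range(1, len(lrs) + 1):
        total = 0.0
        for i in range(1, t + 1):
            prod = 1.0
            for j in range(i, t + 1):
                prod *= lrs[j - 1]
            total += prod
        out.append(total)
    return out


def sr_recursive(lrs: List[float]) -> List[float]:
    """The same statistic through the recursion R[t] = (1 + R[t-1]) * L_t, R[0] = 0."""
    r, out = 0.0, []
    for lr in lrs:
        r = (1.0 + r) * lr
        out.append(r)
    return out


def cusum_path(lrs: List[float]) -> List[float]:
    """CuSum[t] = max(CuSum[t-1] + log L_t, 0), CuSum[0] = 0."""
    c, out = 0.0, []
    for lr in lrs:
        c = max(c + math.log(lr), 0.0)
        out.append(c)
    return out


def max_suffix_sum(log_lrs: List[float], t: int) -> float:
    """max_{1<=i<=t} sum_{j=i}^{t} log L_j, the exponent on the right-hand side of Eq. (79)."""
    best, running = -math.inf, 0.0
    for j in range(t, 0, -1):
        running += log_lrs[j - 1]
        best = max(best, running)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: f_B = N(0.5, 1) against f_0 = N(0, 1), samples drawn from f_0,
    # so log L_j = 0.5 * X_j - 0.125.
    xs = [rng.gauss(0.0, 1.0) for _ in range(40)]
    lrs = [math.exp(0.5 * x - 0.125) for x in xs]
    logs = [math.log(lr) for lr in lrs]
    r_def, r_rec, c = sr_by_definition(lrs), sr_recursive(lrs), cusum_path(lrs)
    for t in range(1, len(lrs) + 1):
        m = max_suffix_sum(logs, t)
        assert math.isclose(r_def[t - 1], r_rec[t - 1], rel_tol=1e-9)
        assert r_rec[t - 1] >= math.exp(m) * (1 - 1e-9)            # Eq. (79)-type bound
        assert math.isclose(c[t - 1], max(0.0, m), abs_tol=1e-9)    # CuSum as max suffix sum
    print("recursion and Eq. (79)-type bound hold on", len(lrs), "samples")
```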
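Let the stopping time be
$$T_{R_W} := \inf\{t\ge1: R_W[t]\ge e^{b_0}\}. \tag{77}$$
We have
$$T_{\mathrm{CuSum}_W} \ge T_{R_W}, \tag{78}$$
because
$$R_W[t] \ge \exp\left(\max_{1\le i\le t}\sum_{j=i}^{t} W[j]\right). \tag{79}$$
It then suffices to lower bound the average run time to false alarm for pre-change of $R_W$.

We have that $\{R_W[t]-t\}$ is a martingale w.r.t. $\sigma(X_1,\ldots,X_t)$ since
$$\mathbb{E}_\infty[R_W[T_{R_W}] - T_{R_W}] = \mathbb{E}_\infty[R_W[0]] = 0. \tag{83}$$
Hence, by letting $b_0 = \log\gamma$, we have
\begin{align}
\mathbb{E}_\infty[T_{\mathrm{CuSum}_W}] &\ge \mathbb{E}_\infty[T_{R_W}] \tag{84}\\
&= \mathbb{E}_\infty[R_W[T_{R_W}]] \ge e^{b_0} = \gamma. \tag{85}
\end{align}

In the following, we lower bound the shortest average run time to false alarm for confusing change of $T_{\text{S-CuSum}}$. Note that when a false alarm for confusing change is triggered by S-CuSum, there are only two possible cases. In one case, S-CuSum starts updating $\mathrm{CuSum}^S_\Lambda$ after the change point $\nu$, i.e., $\mathrm{CuSum}_W[\nu] < b_0$; in this case,
$$\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}] \ge \nu + \mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}]. \tag{86}$$
In the other case, S-CuSum starts updating $\mathrm{CuSum}^S_\Lambda$ before the change point $\nu$, i.e., $\mathrm{CuSum}_W$ passes the threshold before the change point, but $\mathrm{CuSum}^S_\Lambda$ passes the threshold after the change point $\nu$; hence
$$\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}] \ge \mathbb{E}_\infty[T_{\mathrm{CuSum}_W}]. \tag{87}$$
Therefore, the shortest average run time to false alarm for confusing change, $\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\text{S-CuSum}}]$, is lower bounded by $\min\{\nu + \mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}], \mathbb{E}_\infty[T_{\mathrm{CuSum}_W}]\}$.

In the following, we lower bound $\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}]$ using reasoning similar to that used for lower bounding $\mathbb{E}_\infty[T_{\mathrm{CuSum}_W}]$ in Eq. (74)-(85). Specifically, we let
\begin{align}
R_\Lambda[t] &:= \sum_{i=1}^{t}\prod_{j=i}^{t}\frac{f_B(X_j)}{f_C(X_j)}, \tag{88}\\
T_{R_\Lambda} &:= \inf\left\{t\ge1: R_\Lambda[t]\ge e^{b_C}\right\}, \tag{89}
\end{align}
and we have that $\{R_\Lambda[t]-t\}$ is a martingale w.r.t. $\sigma(X_1,\ldots,X_t)$ under $\mathbb{P}_{1,f_C}$. Then, by Doob's optional stopping theorem, we have
\begin{align}
\mathbb{E}_{1,f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}] &\ge \mathbb{E}_{1,f_C}[T_{R_\Lambda}] \tag{90}\\
&= \mathbb{E}_{1,f_C}[R_\Lambda[T_{R_\Lambda}]] \ge e^{b_C} = \gamma. \tag{91}
\end{align}

By the algorithmic properties of S-CuSum, we have that
$$\mathbb{E}_{1,f_B}[T_{\text{S-CuSum}}] = \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}]. \tag{92}$$
Let $\alpha_{b_0} = b_0/D_{\mathrm{KL}}(f_B\|f_0)$ and $\alpha_{b_C} = b_C/D_{\mathrm{KL}}(f_B\|f_C)$. Then,
\begin{align}
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}/\alpha_{b_0}] &\le \sum_{\ell=0}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}_W}/\alpha_{b_0} > \ell] \tag{93}\\
&\le 1 + \sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}[T_{\mathrm{CuSum}_W}/\alpha_{b_0} > \ell] \tag{94}\\
&= 1 + \sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}[\forall\, 1\le t\le \ell\alpha_{b_0}: \mathrm{CuSum}_W[t] < b_0] \tag{95}\\
&\le 1 + \sum_{\ell=1}^{\infty}\mathbb{P}_{1,f_B}\left[\bigcap_{k=1}^{\ell}\{\mathrm{CuSum}_W[k\alpha_{b_0}] < b_0\}\right] \tag{96}\\
&\overset{(a)}{\le} 1 + \sum_{\ell=1}^{\infty}\prod_{k=1}^{\ell}\mathbb{P}_{1,f_B}\left[\max_{i:(k-1)\alpha_{b_0}+1\le i\le k\alpha_{b_0}}\sum_{j=i}^{k\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)} < b_0\right]. \tag{97}
\end{align}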
This implies that
\begin{align}
\mathbb{P}_{1,f_B}\left[\max_{i:1\le i\le\alpha_{b_0}}\sum_{j=i}^{\alpha_{b_0}}\log\frac{f_B(X_j)}{f_0(X_j)} < b_0\right] &\le \delta, \tag{107}\\
\mathbb{P}_{1,f_B}\left[\max_{i:1\le i\le\alpha_{b_C}}\sum_{j=i}^{\alpha_{b_C}}\log\frac{f_B(X_j)}{f_C(X_j)} < b_C\right] &\le \delta, \tag{108}
\end{align}
where $\delta$ can be made arbitrarily small for large $b_0$ and $b_C$.

By Eq. (107)(108) and Eq. (97)(102), we have
\begin{align}
\mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}/\alpha_{b_0}] &\le 1 + \sum_{\ell=1}^{\infty}\delta^{\ell} \tag{109}\\
&= \frac{1}{1-\delta}. \tag{110}
\end{align}

By the analysis in Appendix B, i.e., Eq. (74)-(85), we already have that
$$\mathbb{E}_\infty[T_{\mathrm{CuSum}_W}] \ge \gamma. \tag{120}$$

In the following, we lower bound the shortest average run time to false alarm for confusing change of $T_{\text{J-CuSum}}$. To assist this analysis, we express $T_{\mathrm{CuSum}_W}$ in terms of a sequence of sequential probability ratio tests as in [26]. Specifically, let
\begin{align}
S[t] &:= \sum_{i=1}^{t} W[i], \tag{121}\\
N_1 &:= \inf\{t\ge1: S[t]\notin(0,b_0)\}, \tag{122}\\
N_2 &:= \inf\{t\ge1: S[N_1+t]-S[N_1]\notin(0,b_0)\}, \tag{123}\\
N_k &:= \inf\{t\ge1: S[N_1+N_2+\cdots+N_{k-1}+t] - S[N_1+N_2+\cdots+N_{k-1}]\notin(0,b_0)\}, \tag{124}\\
M &:= \inf\{k\ge1: S[N_1+N_2+\cdots+N_k] - S[N_1+N_2+\cdots+N_{k-1}]\ge b_0\}; \tag{125}
\end{align}
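this way,
$$T_{\mathrm{CuSum}_W} \equiv N_1+N_2+\cdots+N_M; \tag{126}$$
besides, we also let
\begin{align}
\alpha &:= \mathbb{P}_{f_0}[S[N_1]\ge b_0], \tag{127}\\
\beta &:= \mathbb{P}_{f_B}[S[N_1]\le a], \quad a\to0^-. \tag{128}
\end{align}

By the algorithmic property of J-CuSum, when a confusing change occurs and $\nu\to\mathbb{E}_{f_0}[N_1]^-$, J-CuSum has the shortest average run time for $\mathrm{CuSum}^J_\Lambda$ to pass the threshold, and
\begin{align}
\mathbb{E}_{f_0}[\mathrm{CuSum}^J_\Lambda[\nu]] &\overset{(a)}{=} \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\nu \tag{129}\\
&\simeq \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\mathbb{E}_{f_0}[N_1], \quad \text{as } \nu\to\mathbb{E}_{f_0}[N_1]^- \tag{130}\\
&\overset{(b)}{\simeq} \mathbb{E}_{f_0}\left[\log\frac{f_B(X)}{f_C(X)}\right]\times\frac{a(e^{b_0}-1)+b_0(1-e^{a})}{-(e^{b_0}-e^{a})D_{\mathrm{KL}}(f_0\|f_B)}, \quad a\to0^- \tag{131}\\
&\simeq 0, \tag{132}
\end{align}
where equality (a) is by Wald's identity and approximation (b) is by [26, Eq. (2.15)]. Hence
\begin{align}
\inf_{\nu\ge1}\mathbb{E}_{\nu,f_C}[T_{\mathrm{CuSum}^J_\Lambda}] &= \inf_{\nu\ge1}\sum_{t=0}^{\infty}\mathbb{P}_{\nu,f_C}[T_{\mathrm{CuSum}^J_\Lambda}\ge t] \tag{133}\\
&= \inf_{\nu\ge1}\sum_{t=0}^{\infty}\mathbb{P}_{\nu,f_C}[\mathrm{CuSum}^J_\Lambda[t] < b_C] \tag{134}\\
&\overset{(a)}{\simeq} \sum_{t=0}^{\infty}\mathbb{P}_{f_C}[\mathrm{CuSum}_{f_B,f_C}[t] < b_C] \tag{135}\\
&= \sum_{t=0}^{\infty}\mathbb{P}_{f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}\ge t] \tag{136}\\
&= \mathbb{E}_{f_C}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{137}\\
&\overset{(b)}{\ge} \gamma, \tag{138}
\end{align}
where approximation (a) follows from Eq. (132), and inequality (b) follows from the analysis in Appendix B, i.e., Eq. (88)-(91).

Fig. 7: Illustration of possible performances of J-CuSum when $\nu = 1$, with panels (a) Case 1 and (b) Case 2. In Case 1, $\mathrm{CuSum}^J_\Lambda$ passes its threshold before $\mathrm{CuSum}_W$ does. In Case 2, $\mathrm{CuSum}^J_\Lambda$ passes its threshold after $\mathrm{CuSum}_W$ does.

APPENDIX E
PROOF OF THEOREM 5

Proof. Because $\mathrm{CuSum}_W[t]$ and $\mathrm{CuSum}^J_\Lambda[t]$ are always non-negative and are zero when $t = 0$, the worst case of the average detection delay happens when the change point is $\nu = 1$. To assist the analysis, we use the intermediate stopping times $T_{\mathrm{CuSum}_W}$ and $T_{\mathrm{CuSum}_{f_B,f_C}}$ defined in Eq. (67) and (69), respectively, and $T_{\mathrm{CuSum}^J_\Lambda}$ defined in Eq. (116).

By the algorithmic property of J-CuSum, when $\mathrm{CuSum}^J_\Lambda$ passes threshold $b_C$, there are only two possible cases: 1) $\mathrm{CuSum}_W$ has not passed threshold $b_0$ yet (as illustrated in Figure 7a), and 2) $\mathrm{CuSum}_W$ has already passed threshold $b_0$ (as illustrated in Figure 7b). In the following, we analyze each case separately.

We first consider case 1, in which $\mathrm{CuSum}^J_\Lambda$ passes threshold $b_C$ while $\mathrm{CuSum}_W$ has not passed threshold $b_0$ (as illustrated in Figure 7a). In this case, the worst-case detection delay of J-CuSum is simply upper bounded by that of $\mathrm{CuSum}_W$. By the analysis in Appendix C, with $b_0 = \log\gamma$ and $\gamma\to\infty$, we have in case 1 that
\begin{align}
\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}] &\le \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_W}] \tag{139}\\
&\le \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)}(1+o(1)). \tag{140}
\end{align}

We then consider case 2, in which $\mathrm{CuSum}^J_\Lambda$ passes threshold $b_C$ after $\mathrm{CuSum}_W$ has already passed threshold $b_0$ (as illustrated in Figure 7b). To assist this analysis, as introduced in Eq. (121)-(128), we follow [26] and express $T_{\mathrm{CuSum}_W}$ in terms of a sequence of sequential probability ratio tests. Then, with $b_0 = b_C = \log\gamma$ and $\gamma\to\infty$, we have in case 2 that
\begin{align}
\mathbb{E}_{1,f_B}[T_{\text{J-CuSum}}] &\le \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}^J_\Lambda}] \tag{141}\\
&\le \mathbb{E}_{1,f_B}\left[\sum_{k=1}^{M-1}N_k\right] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{142}\\
&= \mathbb{E}_{1,f_B}[(M-1)\cdot N_1] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{143}\\
&\overset{(a)}{=} \left(\frac{1}{\mathbb{P}_{1,f_B}[S[N_1]\ge b_0]}-1\right)\mathbb{E}_{1,f_B}[N_1] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{144}\\
&\overset{(b)}{\simeq} \left(\frac{1}{1-\beta}-1\right)\mathbb{E}_{1,f_B}[N_1] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{145}\\
&\overset{(c)}{\simeq} \frac{e^{a}(e^{b_0}-1)}{e^{b_0}(1-e^{a})}\,\mathbb{E}_{1,f_B}[N_1] + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}], \quad a\to0^- \tag{146}\\
&\overset{(d)}{\simeq} \frac{e^{a}(e^{b_0}-1)\left[a e^{a}(e^{b_0}-1) + b_0 e^{b_0}(1-e^{a})\right]}{e^{b_0}(1-e^{a})(e^{b_0}-e^{a})D_{\mathrm{KL}}(f_B\|f_0)} + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}], \quad a\to0^- \tag{147}\\
&\overset{(e)}{=} \frac{b_0-1+1/e^{b_0}}{D_{\mathrm{KL}}(f_B\|f_0)} + \mathbb{E}_{1,f_B}[T_{\mathrm{CuSum}_{f_B,f_C}}] \tag{148}\\
&\overset{(f)}{\le} \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_0)} + \frac{\log\gamma}{D_{\mathrm{KL}}(f_B\|f_C)}(1+o(1)), \tag{149}
\end{align}
where equality (a) is by Wald's identity, approximation (b) follows [26, Eq. (2.52)(2.53)], approximation (c) follows [26, Eq. (2.11)(2.12)], approximation (d) is by [26, Eq. (2.15)], equality (e) is by L'Hôpital's rule, and inequality (f) is by the analysis in Appendix C.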
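The limit taken in step (e) can be checked numerically. The sketch below evaluates the first term of Eq. (147) for several values of $a \to 0^-$ and compares it with the first term of Eq. (148). It also shows that $(b_0 - 1 + e^{-b_0})/D_{\mathrm{KL}}(f_B\|f_0) \le b_0/D_{\mathrm{KL}}(f_B\|f_0) = \log\gamma/D_{\mathrm{KL}}(f_B\|f_0)$, the first term of Eq. (149). The parameter values are illustrative assumptions, not results from the paper: $\gamma = 10^4$, and $D_{\mathrm{KL}}(f_B\|f_0) = 0.125$, which corresponds to $f_B = \mathcal{N}(0.5,1)$ and $f_0 = \mathcal{N}(0,1)$ as in Scenarios 1 and 3 of Section VI. The function names are hypothetical.

```python
import math


def first_term_147(a: float, b0: float, kl_b0: float) -> float:
    """First term of Eq. (147), i.e. (1/(1-beta) - 1) * E_{1,fB}[N_1] under Wald's approximations."""
    ea, eb = math.exp(a), math.exp(b0)
    one_minus_ea = -math.expm1(a)  # 1 - e^a, computed accurately for a close to 0
    numerator = ea * (eb - 1.0) * (a * ea * (eb - 1.0) + b0 * eb * one_minus_ea)
    denominator = eb * one_minus_ea * (eb - ea) * kl_b0
    return numerator / denominator


def limit_148(b0: float, kl_b0: float) -> float:
    """First term of Eq. (148): (b0 - 1 + e^{-b0}) / D_KL(fB || f0)."""
    return (b0 - 1.0 + math.exp(-b0)) / kl_b0


if __name__ == "__main__":
    gamma = 1e4
    b0 = math.log(gamma)  # b0 = log(gamma), as in Theorem 5
    kl = 0.125            # D_KL(N(0.5,1) || N(0,1))
    for a in (-1e-1, -1e-3, -1e-6):
        print(f"a={a:g}: Eq. (147) term = {first_term_147(a, b0, kl):.6f}")
    print(f"Eq. (148) term = {limit_148(b0, kl):.6f}, log(gamma)/D_KL = {b0 / kl:.6f}")
```

As $a$ approaches $0$ from below, the printed Eq. (147) values settle on the Eq. (148) value, which stays below $\log\gamma/D_{\mathrm{KL}}(f_B\|f_0)$.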