2019-Sampling For Remote Estimation Through Queues Age
All content following this page was uploaded by Tasmeen Zaman Ornee on 12 May 2019.
if t ∈ [D_i, D_{i+1}), i = 0, 1, 2, . . .  (9)

In causal sampling policies, each sampling time S_i is chosen by using the up-to-date information available at the sampler. To characterize this statement precisely, let us define the σ-fields

    N_t = σ(X_s, I_s : 0 ≤ s ≤ t),  N_{t+} = ∩_{s>t} N_s.  (5)

Hence, {N_{t+}, t ≥ 0} is a filtration (i.e., a non-decreasing and right-continuous family of σ-fields) of the information available at the sampler. Each sampling time S_i is a stopping time with respect to the filtration {N_{t+}, t ≥ 0} such that

    {S_i ≤ t} ∈ N_{t+},  ∀t ≥ 0.  (6)

Let π = (S_1, S_2, . . .) represent a sampling policy and Π represent the set of causal sampling policies that satisfy two conditions: (i) Each sampling policy π ∈ Π satisfies (6) for all i. (ii) The sequence of inter-sampling times {T_i = S_{i+1} − S_i, i = 0, 1, . . .} forms a regenerative process [37, Section 6.1]: there exists an increasing sequence 0 ≤ k_1 < k_2 < . . . of almost surely finite random integers such that the post-k_j process {T_{k_j+i}, i = 0, 1, . . .} has the same distribution as the post-k_0 process {T_{k_0+i}, i = 0, 1, . . .} and is independent of the pre-k_j process {T_i, i = 0, 1, . . . , k_j − 1}; further, we assume that E[k_{j+1} − k_j] < ∞, E[S_{k_1}] < ∞, and 0 < E[S_{k_{j+1}} − S_{k_j}] < ∞.

D. Problem Formulation

The goal of this paper is to find the optimal sampling policy that minimizes the mean-squared estimation error subject to an average sampling-rate constraint, which is formulated as the following problem:²

    mse_opt = min_{π∈Π} limsup_{T→∞} (1/T) E[∫_0^T (X_t − X̂_t)² dt]  (10)
    s.t.  liminf_{n→∞} (1/n) E[Σ_{i=1}^n (S_{i+1} − S_i)] ≥ 1/f_max,  (11)

² We will optimize limsup_{T→∞} E[∫_0^T (X_t − X̂_t)² dt]/T, but operationally a nicer criterion is limsup_{i→∞} E[∫_0^{D_i} (X_t − X̂_t)² dt]/E[D_i]. These criteria correspond to two definitions of "average cost per unit time" that are widely used in the literature of semi-Markov decision processes. The two criteria are equivalent if {T_1, T_2, . . .} is a regenerative process, or more generally, if {T_1, T_2, . . .} has only one ergodic class; if no such condition is imposed, however, they are different. Interested readers are referred to [38]–[42] for more discussions.
³ In [30]–[36], it was shown that when the sampler and estimator are jointly optimized, the MMSE estimator has the same expression with or without the second part of the information. We will consider the joint optimization of the sampler and estimator in our future work.
where mse_opt is the optimum value of (10) and f_max is the maximum allowed sampling rate. When f_max = ∞, this problem becomes an unconstrained problem.

III. MAIN RESULT

A. Signal-aware Sampling

Problem (10) is a constrained continuous-time MDP with a continuous state space. However, we found an exact solution to this problem. To present this solution, let us consider an OU process O_t with initial state O_0 = 0 and parameter µ = 0. According to (8), O_t can be expressed as

    O_t = (σ/√(2θ)) e^{−θt} W_{e^{2θt}−1}.  (12)

Define

    mse_{Y_i} = E[O²_{Y_i}] = (σ²/(2θ)) E[1 − e^{−2θY_i}],  (13)
    mse_∞ = E[O²_∞] = σ²/(2θ).  (14)

In the sequel, we will see that mse_{Y_i} and mse_∞ are the lower and upper bounds of mse_opt, respectively. We will also need to use the function⁴

    G(x) = (e^{x²}/x) ∫_0^x e^{−t²} dt = (√π/2)(e^{x²}/x) erf(x),  x ∈ [0, ∞),  (15)

where erf(·) is the error function, defined as

    erf(x) = (2/√π) ∫_0^x e^{−t²} dt.  (16)

Theorem 1. If the Y_i's are i.i.d. with 0 < E[Y_i] < ∞, then (S_1(β), S_2(β), . . .) with a parameter β is an optimal solution to (10), where

    S_{i+1}(β) = inf{ t ≥ D_i(β) : |X_t − X̂_t| ≥ v(β) },  (17)

D_i(β) = S_i(β) + Y_i, v(β) is defined by

    v(β) = (σ/√θ) G^{−1}( (mse_∞ − mse_{Y_i}) / (mse_∞ − β) ),  (18)

G^{−1}(·) is the inverse function of G(·) in (15), and β ≥ 0 is the root of

    β = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)]  (19)

if

    E[D_{i+1}(β) − D_i(β)] > 1/f_max;  (20)

otherwise, β is the root of

    E[D_{i+1}(β) − D_i(β)] = 1/f_max.  (21)

The optimal objective value to (10) is then given by

    mse_opt = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)].  (22)

The proof of Theorem 1 is explained in Section IV. The optimal sampling policy in Theorem 1 has a nice structure. Specifically, the (i+1)-th sample is taken at the earliest time t satisfying two conditions: (i) the i-th sample has already been delivered by time t, i.e., t ≥ D_i(β), and (ii) the estimation error |X_t − X̂_t| is no smaller than a pre-determined threshold v(β), where v(·) is a non-linear function defined in (18).

Lemma 1. In Theorem 1, it holds that mse_{Y_i} ≤ mse_opt ≤ β ≤ mse_∞.

Proof. See Appendix M.

Hence, (mse_∞ − mse_{Y_i})/(mse_∞ − β) ≥ 1. Further, because G(x) is strictly increasing on [0, ∞) and G(0) = 1, the inverse function G^{−1}(·) and the threshold v(β) are properly defined and v(β) ≥ 0. We note that the service time distribution affects the optimal sampling policy in (17) and (18) through mse_{Y_i} and β.

The calculation of β falls into two cases: In one case, β can be computed by solving (19) via the bisection search method in Algorithm 1. For this case to occur, the sampling rate constraint (11) needs to be inactive at the β obtained in Algorithm 1. Because D_i(β) = S_i(β) + Y_i, we can obtain E[D_{i+1}(β) − D_i(β)] = E[S_{i+1}(β) − S_i(β)], and hence (20) holds when the sampling rate constraint (11) is inactive. In the other case, β is selected to satisfy the sampling rate constraint (11) with equality, which is implemented by using another bisection method in Algorithm 2. The upper and lower bounds for the bisection search in Algorithms 1-2 are determined by Lemma 1. If f_max = ∞, because E[D_{i+1}(β) − D_i(β)] ≥ E[Y_i] > 0, (20) is always satisfied and only the first case can happen.

Algorithm 1 Bisection method for solving (19)
  given l = mse_{Y_i}, u = mse_∞, tolerance ε > 0.
  repeat
    β := (l + u)/2.
    o := β − E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)].
    if o ≥ 0, u := β; else, l := β.
  until u − l ≤ ε.
  return β.

Algorithm 2 Bisection method for solving (21)
  given l = mse_{Y_i}, u = mse_∞, tolerance ε > 0.
  repeat
    β := (l + u)/2.
    o := E[D_{i+1}(β) − D_i(β)].
    if o ≥ 1/f_max, u := β; else, l := β.
  until u − l ≤ ε.
  return β.

By comparing (19) and (22), it follows immediately that

Lemma 2. If the sampling rate constraint is removed, i.e., f_max = ∞, then β = mse_opt.

⁴ If x = 0, G(x) is defined as its right limit G(0) = lim_{x→0+} G(x) = 1.
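Algorithms 1-2 share the same bisection skeleton; only the quantity driven to its target differs. The following is a minimal Monte Carlo sketch of that loop, not the authors' implementation: the Euler-Maruyama step size, the exponential service times, and the helper names are illustrative assumptions.

```python
import math
import random

def ou_step(err, theta, sigma, dt, rng):
    # Euler-Maruyama step of the error process dO_t = -theta*O_t dt + sigma dW_t
    return err - theta * err * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)

def simulate_cycle(v, theta, sigma, mean_y, rng, dt=0.01):
    """One delivery-to-delivery cycle [D_i, D_{i+1}) under the threshold rule (17).
    Exponential service times are an illustrative assumption only."""
    err = 0.0
    y = rng.expovariate(1.0 / mean_y)          # service time Y_i
    for _ in range(int(y / dt)):               # error grows to O_{Y_i} at delivery D_i
        err = ou_step(err, theta, sigma, dt, rng)
    length = cost = 0.0
    while abs(err) < v:                        # wait until |X_t - Xhat_t| >= v(beta)
        cost += err * err * dt
        length += dt
        err = ou_step(err, theta, sigma, dt, rng)
    y_next = rng.expovariate(1.0 / mean_y)     # sample i+1 in service until D_{i+1}
    for _ in range(int(y_next / dt)):
        cost += err * err * dt
        length += dt
        err = ou_step(err, theta, sigma, dt, rng)
    return length, cost

def algorithm1(v_of_beta, mse_y, mse_inf, theta, sigma, mean_y,
               eps=0.05, runs=100, seed=0):
    """Bisection of Algorithm 1: drive o = beta - E[cost]/E[length] to zero,
    with both expectations replaced by Monte Carlo averages over i.i.d. cycles."""
    rng = random.Random(seed)
    l, u = mse_y, mse_inf
    while u - l > eps:
        beta = (l + u) / 2.0
        cycles = [simulate_cycle(v_of_beta(beta), theta, sigma, mean_y, rng)
                  for _ in range(runs)]
        o = beta - sum(c for _, c in cycles) / max(sum(t for t, _ in cycles), 1e-12)
        if o >= 0:
            u = beta
        else:
            l = beta
    return (l + u) / 2.0
```

The argument `v_of_beta` is the threshold map β ↦ v(β) of (18); any monotone surrogate can be plugged in to exercise the loop.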
The calculation of the expectations in Algorithms 1-2 can be greatly simplified by using the following lemma, which is obtained based on Dynkin's formula [12, Theorem 7.4.1] and the optional stopping theorem.

Lemma 3. In Theorem 1, it holds that

    E[D_{i+1}(β) − D_i(β)] = E[max{R_1(v(β)) − R_1(|O_{Y_i}|), 0}] + E[Y_i],  (23)

    E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt]
      = mse_∞ [E(Y_{i+1}) − γ] + γ E[max{v²(β), O²_{Y_i}}] + E[max{R_2(v(β)) − R_2(|O_{Y_i}|), 0}],  (24)

where the expectations on the right-hand sides are functions of β and are independent of i. One can improve the accuracy of the solution in Algorithms 1-2 by (i) reducing the tolerance ε and (ii) increasing the number of Monte Carlo realizations for computing the expectations. Such a low-complexity solution for solving a constrained continuous-time MDP with a continuous state space is rare.

1) Sampling of Wiener Processes: In the limiting case that σ = 1 and θ → 0, the OU process X_t in (1) becomes a Wiener process X_t = W_t. In this case, the MMSE estimator in (9) is given by

    X̂_t = W_{S_i}, if t ∈ [D_i, D_{i+1}).  (28)

As shown in Appendix C, v(·) defined by (18) tends to

    v(β) = √(3(β − E[Y_i])).  (29)

Theorem 2. If σ = 1, θ → 0, and the Y_i's are i.i.d. with 0 < E[Y_i] < ∞, then (S_1(β), S_2(β), . . .) is an optimal solution to (10), where

    S_{i+1}(β) = inf{ t ≥ D_i(β) : |X_t − X̂_t| ≥ √(3(β − E[Y_i])) },  (30)

D_i(β) = S_i(β) + Y_i, and β ≥ 0 is the root of

    β = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)]  (31)

if E[D_{i+1}(β) − D_i(β)] > 1/f_max; otherwise, β is the root of E[D_{i+1}(β) − D_i(β)] = 1/f_max. The optimal objective value to (10) is given by

    mse_opt = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)].  (32)

Theorem 2 is an alternative form of Theorem 1 in [3], [4]. The benefit of the new expressions in (30)-(32) is that they characterize β by the optimal objective value mse_opt and the sampling rate constraint (11), in the same way as in Theorem 1, which appears to be more fundamental than the expression in [3], [4].

B. Signal-ignorant Sampling

    E[∫_0^T (X_t − X̂_t)² dt] = E[∫_0^T p(∆_t) dt].  (34)

Hence, minimizing the mean-squared estimation error among signal-ignorant sampling policies can be formulated as the following MDP for minimizing the expected time-average of the nonlinear age function p(∆_t):

    mse_age-opt = inf_{π∈Π_signal-ignorant} limsup_{T→∞} (1/T) E[∫_0^T p(∆_t) dt]  (35)
    s.t.  liminf_{n→∞} (1/n) E[Σ_{i=1}^n (S_{i+1} − S_i)] ≥ 1/f_max,  (36)

where mse_age-opt is the optimal value of (35). By (33), p(∆_t) and mse_age-opt are bounded. Problem (35) is one instance of the problems recently solved in Corollary 3 of [13] for general strictly increasing functions p(·). From this, a solution to (35) is given by

Theorem 3. If the Y_i's are i.i.d. with 0 < E[Y_i] < ∞, then (S_1(β), S_2(β), . . .) is an optimal solution to (35), where

    S_{i+1}(β) = inf{ t ≥ D_i(β) : E[(X_{t+Y_{i+1}} − X̂_{t+Y_{i+1}})²] ≥ β },  (37)

D_i(β) = S_i(β) + Y_i, and β ≥ 0 is the root of

    β = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)]  (38)
if E[D_{i+1}(β) − D_i(β)] > 1/f_max; otherwise, β is the root of E[D_{i+1}(β) − D_i(β)] = 1/f_max. The optimal objective value to (35) is given by

    mse_age-opt = E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt] / E[D_{i+1}(β) − D_i(β)].  (39)

Because Π_signal-ignorant ⊂ Π, it follows immediately that mse_opt ≤ mse_age-opt.

C. Discussions of the Results

The difference among Theorems 1-3 is only in the expressions (17), (30), (37) of the threshold policies, whereas the parameter β is determined by the optimal objective value and the sampling rate constraint in the same manner. Later on, in (55), we show that β is exactly equal to the sum of the optimal objective value of the MDP and the optimal Lagrangian dual variable associated with the sampling rate constraint.

In the signal-aware sampling policies (17) and (30), the sampling time is determined by the instantaneous estimation error X_t − X̂_t, and the threshold v(β) is determined by the signal model. In the signal-ignorant sampling policy (37), the sampling time is determined by the expected estimation error E[(X_{t+Y_{i+1}} − X̂_{t+Y_{i+1}})²] at time t + Y_{i+1}. We note that if t = S_{i+1}(β), then t + Y_{i+1} = S_{i+1}(β) + Y_{i+1} = D_{i+1}(β) is the delivery time of the new sample. Hence, (37) requires that the expected estimation error upon the delivery of the new sample is no less than β. Finally, it is worth noting that Theorems 1-3 hold for all distributions of the service times Y_i satisfying 0 < E[Y_i] < ∞, and for both constrained and unconstrained sampling problems.

IV. PROOF OF THE MAIN RESULT

We prove Theorem 1 in four steps: (i) We first show that sampling should be suspended when the server is busy, which can be used to simplify (10). (ii) We use an extended Dinkelbach's method [46] and the Lagrangian duality method to decompose the simplified problem into a series of mutually independent per-sample MDPs. (iii) We utilize the free boundary method from optimal stopping theory [11] to solve the per-sample MDPs. (iv) Finally, we use a geometric multiplier method [47] to show that the duality gap is zero. This proof framework is an extension of that used in [3], [4], [13], where the most challenging part is finding the analytical solution of the per-sample MDP in Step (iii).

The OU process O_t in (12) with initial state O_0 = 0 and parameter µ = 0 is the solution to the SDE

    dO_t = −θ O_t dt + σ dW_t.  (40)

In addition, the infinitesimal generator of O_t is [48, Eq. A1.22]

    G = −θu ∂/∂u + (σ²/2) ∂²/∂u².  (41)

According to (8) and (9), the estimation error (X_t − X̂_t) has the same distribution as O_{t−S_i} if t ∈ [D_i, D_{i+1}). By using Dynkin's formula and the optional stopping theorem, we obtain the following lemma.

Lemma 5. Let τ ≥ 0 be a stopping time of the OU process O_t with E[τ] < ∞. Then

    E[∫_0^τ O_t² dt] = E[(σ²/(2θ)) τ − (1/(2θ)) O_τ²].  (42)

If, in addition, τ is the first exit time of a bounded set, then

    E[τ] = E[R_1(O_τ)],  (43)
    E[∫_0^τ O_t² dt] = E[R_2(O_τ)],  (44)

where R_1(·) and R_2(·) are defined in (26) and (27), respectively.

Proof. See Appendix D.

B. Suspend Sampling when the Server is Busy

By using the strong Markov property of the OU process X_t and the orthogonality principle of MMSE estimation, we obtain the following useful lemma:

Lemma 6. In (10), it is suboptimal to take a new sample before the previous sample is delivered.

Proof. See Appendix E.

A similar result was obtained in [4] for the sampling of Wiener processes. By Lemma 6, there is no loss in considering a sub-class of sampling policies Π_1 ⊂ Π such that each sample is generated and sent out after all previous samples are delivered, i.e.,

    Π_1 = {π ∈ Π : S_i = G_i ≥ D_{i−1} for all i}.

For any policy π ∈ Π_1, the information used for determining S_i includes: (i) the history of signal values (X_t : t ∈ [0, S_i]) and (ii) the service times (Y_1, . . . , Y_{i−1}) of previous samples. Let us define the σ-fields F_t = σ(X_s : s ∈ [0, t]) and F_{t+} = ∩_{r>t} F_r. Then, {F_{t+}, t ≥ 0} is the filtration (i.e., a non-decreasing and right-continuous family of σ-fields) of the OU process X_t. Given the service times (Y_1, . . . , Y_{i−1}) of previous samples, S_i is a stopping time with respect to the filtration {F_{t+}, t ≥ 0} of the OU process X_t, that is,

    [{S_i ≤ t} | Y_1, . . . , Y_{i−1}] ∈ F_{t+}.  (45)

Hence,

    Π_1 = {S_i : [{S_i ≤ t} | Y_1, . . . , Y_{i−1}] ∈ F_{t+}, T_i is a regenerative process}.  (46)

Let Z_i = S_{i+1} − D_i ≥ 0 represent the waiting time between the delivery time D_i of the i-th sample and the generation time S_{i+1} of the (i+1)-th sample. Then, S_i = Σ_{j=0}^{i−1} (Y_j + Z_j) and D_i = Σ_{j=0}^{i−1} (Y_j + Z_j) + Y_i for each i = 1, 2, . . ..
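To make this bookkeeping concrete, here is a small event-driven simulation of the Wiener-limit threshold policy of Theorem 2, in which the sampler stays idle while the server is busy (as Lemma 6 allows) and the estimator holds X̂_t = W_{S_i} as in (28). It is a sketch only: the exponential service times, the horizon, and the Euler step size are illustrative assumptions, not part of the paper.

```python
import math
import random

def simulate_wiener_policy(v, mean_y, horizon=200.0, dt=0.01, seed=0):
    """Simulate the threshold policy (30) for a Wiener process, with
    Xhat_t equal to the last *delivered* sample value per (28).
    Returns (time-average squared error, number of delivered samples)."""
    rng = random.Random(seed)
    x = 0.0            # Wiener process X_t
    xhat = 0.0         # estimate from the last delivered sample
    busy_until = 0.0   # delivery time D_i of the sample in service
    in_flight = None   # value X_{S_i} of the sample being transmitted
    t = err_int = 0.0
    n_delivered = 0
    while t < horizon:
        x += math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if in_flight is not None and t >= busy_until:   # delivery epoch D_i
            xhat, in_flight = in_flight, None
            n_delivered += 1
        if in_flight is None and abs(x - xhat) >= v:    # threshold rule (30)
            in_flight = x                               # generation epoch S_{i+1}
            busy_until = t + rng.expovariate(1.0 / mean_y)
        err_int += (x - xhat) ** 2 * dt
    return err_int / horizon, n_delivered
```

Each generation epoch here is S_{i+1} = D_i + Z_i, with Z_i the (possibly zero) time spent waiting for the error to reach the threshold after the previous delivery.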
Given (Y_0, Y_1, . . .), (S_1, S_2, . . .) is uniquely determined by (Z_0, Z_1, . . .). Hence, one can also use π = (Z_0, Z_1, . . .) to represent a sampling policy.

Because {X_t − X̂_t, t ∈ [D_i, D_{i+1})} and {O_{t−S_i}, t ∈ [D_i, D_{i+1})} have the same distribution, for each i = 1, 2, . . .,

    E[∫_{D_i}^{D_{i+1}} (X_t − X̂_t)² dt] = E[∫_{D_i}^{D_{i+1}} O²_{t−S_i} dt] = E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds].  (47)

Because T_i is a regenerative process, renewal theory [49] tells us that (1/n) E[S_n] is a convergent sequence and

    limsup_{T→∞} (1/T) E[∫_0^T (X_t − X̂_t)² dt]
      = lim_{n→∞} E[∫_0^{D_n} (X_t − X̂_t)² dt] / E[D_n]
      = lim_{n→∞} Σ_{i=1}^n E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds] / Σ_{i=1}^n E[Y_i + Z_i].  (48)

Hence, (10) can be rewritten as the following MDP:

    mse_opt = min_{π∈Π_1} lim_{n→∞} Σ_{i=1}^n E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds] / Σ_{i=1}^n E[Y_i + Z_i]  (49)
    s.t.  lim_{n→∞} (1/n) Σ_{i=1}^n E[Y_i + Z_i] ≥ 1/f_max,

where mse_opt is the optimal value of (49).

C. Reformulation of Problem (49)

In order to solve (49), let us consider the following MDP with a parameter c ≥ 0:

    h(c) = min_{π∈Π_1} lim_{n→∞} (1/n) Σ_{i=1}^n E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds − c(Y_i + Z_i)]  (50)
    s.t.  lim_{n→∞} (1/n) Σ_{i=1}^n E[Y_i + Z_i] ≥ 1/f_max,

where h(c) is the optimum value of (50). Similar to Dinkelbach's method [46] for nonlinear fractional programming, the following lemma holds for the MDP (49):

Lemma 7. [4] The following assertions are true:
(a) mse_opt > c, = c, or < c if and only if h(c) > 0, = 0, or < 0, respectively.
(b) If h(c) = 0, the solutions to (49) and (50) are identical.

Hence, the solution to (49) can be obtained by solving (50) and seeking c = mse_opt ≥ 0 such that

    h(mse_opt) = 0.  (51)

D. Lagrangian Dual Problem of (50)

Next, we use the Lagrangian dual approach to solve (50) with c = mse_opt. We define the Lagrangian associated with (50) as

    L(π; λ) = lim_{n→∞} (1/n) Σ_{i=1}^n E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds − (mse_opt + λ)(Y_i + Z_i)] + λ/f_max,  (52)

where λ ≥ 0 is the dual variable. Let

    e(λ) = inf_{π∈Π_1} L(π; λ).  (53)

Then, the dual problem of (50) is defined by

    d = max_{λ≥0} e(λ),  (54)

where d is the optimum value of (54). Weak duality [47] implies d ≤ h(mse_opt). In Section IV-F, we will establish strong duality, i.e., d = h(mse_opt). In the sequel, we solve (53). Let us define

    β = mse_opt + λ.  (55)

As shown in Appendix F, by using Lemma 5, we can obtain

    E[∫_{Y_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds − β(Y_i + Z_i)]
      = E[∫_{Y_i}^{Y_i+Z_i} (O_s² − β) ds + γ O²_{Y_i+Z_i}] + (σ²/(2θ))[E(Y_{i+1}) − γ] − β E[Y_{i+1}].  (56)

For any s ≥ 0, define the σ-fields F_t^s = σ(O_{s+r} − O_s : r ∈ [0, t]) and F_{t+}^s = ∩_{r>t} F_r^s. Then, {F_{t+}^s, t ≥ 0} is the filtration of the time-shifted OU process {O_{s+t} − O_s, t ∈ [0, ∞)}. Define M_s as the set of integrable stopping times of {O_{s+t} − O_s, t ∈ [0, ∞)}, i.e.,

    M_s = {τ ≥ 0 : {τ ≤ t} ∈ F_{t+}^s, E[τ] < ∞}.  (57)

By using a sufficient statistic of (53), we can obtain

Lemma 8. An optimal solution (Z_0, Z_1, . . .) to (53) satisfies

    min_{Z_i ∈ M_{Y_i}} E[∫_{Y_i}^{Y_i+Z_i} (O_s² − β) ds + γ O²_{Y_i+Z_i} | O_{Y_i}, Y_i],  (58)

where β ≥ 0 and γ ≥ 0 are defined in (55) and (25), respectively.

Proof. See Appendix G.

By this, (53) is decomposed into a series of per-sample MDPs (58).

E. Analytical Solution to Per-Sample MDP (58)

We solve (58) by using the free-boundary approach for optimal stopping problems [11].
Let us consider an OU process V_t with initial state V_0 = v and parameter µ = 0. Define the σ-fields F_t^V = σ(V_s : s ∈ [0, t]), F_{t+}^V = ∩_{r>t} F_r^V, and the filtration {F_{t+}^V, t ≥ 0} associated to {V_t, t ≥ 0}. Define M_V as the set of integrable stopping times of {V_t, t ∈ [0, ∞)}, i.e.,

    M_V = {τ ≥ 0 : {τ ≤ t} ∈ F_{t+}^V, E[τ] < ∞}.  (59)

Our goal is to solve the following optimal stopping problem for any given initial state v ∈ R:

    sup_{τ∈M_V} E_v[ −γV_τ² − ∫_0^τ (V_s² − β) ds ],  (60)

where E_v[·] is the conditional expectation for the given initial state V_0 = v. Hence, (58) is one instance of (60) with v = O_{Y_i}, where the supremum is taken over all stopping times τ of V_t. In order to solve (60), we first find a candidate solution to (60) by solving a free boundary problem; then we prove that the free boundary solution is indeed the value function of (60).

1) A candidate solution to (60): Now, we show how to solve (60). The general optimal stopping theory in Chapter I of [11] tells us that the following guess of the stopping time should be optimal for Problem (60):

    τ_∗ = inf{t ≥ 0 : |V_t| ≥ v_∗},  (61)

where v_∗ ≥ 0 is the optimal stopping threshold to be found. Observe that in this guess, the continuation region (−v_∗, v_∗) is assumed symmetric around zero because the OU process is symmetric, i.e., the process {−V_t, t ≥ 0} is also an OU process started at −v. Similarly, we may also argue that the value function should be even.

According to [11, Chapter 8] and [12, Chapter 10], the value function and the optimal stopping threshold v_∗ should satisfy the following free boundary problem:

    (σ²/2) H''(v) − θv H'(v) = v² − β,  v ∈ (−v_∗, v_∗),  (62)
    H(±v_∗) = −γv_∗²,  (63)
    H'(±v_∗) = ∓2γv_∗.  (64)

In Appendix H, we use the integrating factor method [50, Sec. I.5] to find the general solution to (62), given by

    H(v) = −v²/(2θ) + [1/(2θ) − β/σ²] ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) v² + C_1 erfi(√θ v/σ) + C_2,  v ∈ (−v_∗, v_∗),  (65)

where C_1 and C_2 are constants to be found for satisfying (63)-(64), and erfi(x) is the imaginary error function, i.e.,

    erfi(x) = (2/√π) ∫_0^x e^{t²} dt.  (66)

Because H(v) should be even, C_1 = 0. In order to satisfy the boundary condition (63), C_2 is chosen as

    C_2 = (1/(2θ)) E[e^{−2θY_i}] v_∗² − [1/(2θ) − β/σ²] ₂F₂(1, 1; 3/2, 2; (θ/σ²)v_∗²) v_∗²,

where we have used (25). By this, we obtain the expression of H(v) in the continuation region (−v_∗, v_∗). In the stopping region |v| ≥ v_∗, the stopping time in (61) is τ_∗ = 0 since |V_0| = |v| ≥ v_∗. Hence, if |v| ≥ v_∗, the objective value achieved by the stopping time (61) is

    E_v[ −γv² − ∫_0^0 (V_s² − β) ds ] = −γv².

By this, we obtain a candidate of the value function for (60):

    H(v) = { −v²/(2θ) + [1/(2θ) − β/σ²] ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) v² + C_2,  if |v| < v_∗,
           { −γv²,  if |v| ≥ v_∗.  (67)

Next, we find v_∗. By taking the gradient of H(v), we get

    H'(v) = −v/θ + [σ/θ^{3/2} − 2β/(σ√θ)] F(√θ v/σ),  v ∈ (−v_∗, v_∗),  (68)

where

    F(x) = e^{x²} ∫_0^x e^{−t²} dt.  (69)

The boundary condition (64) implies that v_∗ is the root of

    −v/θ + [σ/θ^{3/2} − 2β/(σ√θ)] F(√θ v/σ) = −2γv.  (70)

Substituting (13), (14), and (25) into (70) yields that v_∗ is the root of

    (mse_∞ − β) G(√θ v/σ) = mse_∞ − mse_{Y_i},  (71)

where G(·) is defined in (15). Solving (71), we obtain that v_∗ can be expressed as a function v(β) of β, where v(β) is defined by (18). By this, we obtain a candidate solution to (60).

2) Verification of the optimality of the candidate solution: Next, we use Itô's formula to verify that the above candidate solution is indeed optimal, as stated in the following theorem:

Theorem 4. For all v ∈ R, H(v) in (67) is the value function of the optimal stopping problem (60). The optimal stopping time for solving (60) is τ_∗ in (61), where v_∗ = v(β) is given by (18).

In order to prove Theorem 4, we have established the following properties of H(v):

Lemma 9. H(v) = E_v[ −γV_{τ_∗}² − ∫_0^{τ_∗} (V_s² − β) ds ].

Proof. See Appendix I.

Lemma 10. H(v) ≥ −γv² for all v ∈ R.
Proof. See Appendix J.

A function f(v) is said to be excessive for the process V_t if

    E_v[f(V_t)] ≤ f(v),  ∀t ≥ 0, v ∈ R.  (72)

By using Itô's formula in stochastic calculus, we can obtain

Lemma 11. The function H(v) is excessive for the process V_t.

[Figure: MSE versus f_max for uniform sampling, zero-wait sampling, age-optimal sampling, and MSE-optimal sampling.]

Now, we are ready to prove Theorem 4.
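Since G in (15) is strictly increasing with G(0) = 1 and is expressible through the standard error function, the threshold v(β) in (18) — equivalently the root v∗ of (71) — can be evaluated numerically. The sketch below inverts G by bisection; that inversion method is an implementation choice, not taken from the paper.

```python
import math

def G(x):
    """G(x) of (15): (sqrt(pi)/2) * e^{x^2} * erf(x) / x, with G(0) = 1 (footnote 4)."""
    if x == 0.0:
        return 1.0
    return math.sqrt(math.pi) / 2.0 * math.exp(x * x) * math.erf(x) / x

def G_inv(y, tol=1e-10):
    """Invert the strictly increasing G on [0, inf) by bisection, for y >= 1."""
    hi = 1.0
    while G(hi) < y:          # grow the bracket until it contains the root
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if G(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def v_threshold(beta, mse_y, mse_inf, sigma, theta):
    """Threshold v(beta) of (18), i.e., the root v* of (71).
    Requires mse_y <= beta < mse_inf, as guaranteed by Lemma 1."""
    return sigma / math.sqrt(theta) * G_inv((mse_inf - mse_y) / (mse_inf - beta))
```

For β ∈ [mse_{Y_i}, mse_∞) the argument of G_inv is at least 1 (Lemma 1), so the threshold is well defined and non-negative, mirroring the discussion after Lemma 1.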
According to Lemma 5,

    E[∫_{Y_i+Z_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds]
      = (σ²/(2θ)) E[Y_{i+1}] − (1/(2θ)) E[O²_{Y_i+Z_i+Y_{i+1}} − O²_{Y_i+Z_i}].  (107)

The second term in (107) can be expressed as

    E[O²_{Y_i+Z_i+Y_{i+1}} − O²_{Y_i+Z_i}]
      = E[ (O_{Y_i+Z_i} e^{−θY_{i+1}} + (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1})² − O²_{Y_i+Z_i} ]
      = E[ O²_{Y_i+Z_i}(e^{−2θY_{i+1}} − 1) + (σ²/(2θ)) e^{−2θY_{i+1}} W²_{e^{2θY_{i+1}}−1} ]
        + E[ 2 O_{Y_i+Z_i} e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} ].  (108)

Because Y_{i+1} is independent of O_{Y_i+Z_i} and W_t, we have

    E[O²_{Y_i+Z_i}(e^{−2θY_{i+1}} − 1)] = E[O²_{Y_i+Z_i}] E[e^{−2θY_{i+1}} − 1].  (109)

Evaluating the remaining terms in (108) in the same way, we obtain

    E[∫_{Y_i+Z_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds]
      = (σ²/(2θ)) E[Y_{i+1}] + γ E[O²_{Y_i+Z_i}] − (σ²/(4θ²)) E[1 − e^{−2θY_{i+1}}]
      = (σ²/(2θ))[E(Y_{i+1}) − γ] + E[O²_{Y_i+Z_i}] γ,  (114)

where γ is defined in (25). Using this, (56) can be shown readily.

APPENDIX G
PROOF OF LEMMA 8

Because the Y_i's are i.i.d., (56) is determined by the control decision Z_i and the information (O_{Y_i}, Y_i). Hence, (O_{Y_i}, Y_i) is a sufficient statistic for determining Z_i in (53). Therefore, there exists an optimal policy (Z_0, Z_1, . . .) to (53) in which Z_i is determined based on only (O_{Y_i}, Y_i). By this, (53) is decomposed into a sequence of per-sample MDPs, given by (58). This completes the proof.

APPENDIX H
PROOF OF (65)

Define S(v) = H'(v). Then, (62) becomes

    S'(v) − (2θ/σ²) v S(v) = (2/σ²)(v² − β).  (115)

Equation (115) can be solved by using the integrating factor method [50, Sec. I.5], which applies to any ODE of the form

    S'(v) + a(v) S(v) = b(v).  (116)

APPENDIX I
PROOF OF LEMMA 9

The proof of Lemma 9 consists of the following two cases:

Case 1: If |v| ≥ v_∗, (61) implies τ_∗ = 0. Hence,

    E_v[τ_∗ | |v| ≥ v_∗] = E_v[∫_0^{τ_∗} 1 ds | |v| ≥ v_∗] = 0.  (127)

APPENDIX J
PROOF OF LEMMA 10

The proof of Lemma 10 consists of the following two cases:

Case 1: If |v| ≥ v_∗, (67) tells us that

    H(v) = −γv².  (136)

Hence, Lemma 10 holds in Case 1.

Case 2: |v| < v_∗. Because H(v) is an even function and H(v) = −γv² holds at v = ±v_∗, to prove H(v) ≥ −γv² for |v| < v_∗, it is sufficient to show that for all v ∈ [0, v_∗)

    H'(v) < [−γv²]' = −2γv.  (137)

Hence, the remaining task is to prove that (137) holds for v ∈ [0, v_∗). After some manipulations, we can obtain from (71) that

    (mse_∞ − β) G(√θ v_∗/σ) = mse_∞ E(e^{−2θY_i}).  (138)

Because G(·) > 0 is an increasing function, it holds for all v ∈ [0, v_∗) that

    (mse_∞ − β) G(√θ v/σ) < (mse_∞ − β) G(√θ v_∗/σ) = mse_∞ E(e^{−2θY_i}).  (139)

One can obtain (137) from (68) and (139). Hence, Lemma 10 also holds in Case 2. This completes the proof.

APPENDIX K
PROOF OF LEMMA 11

We need the following lemma in the proof of Lemma 11:

Lemma 12. (1 − 2x²) G(x) ≤ 1 for all x ≥ 0.

Proof. Because G(0) = 1, it suffices to show that for all x > 0

    [(1 − 2x²) G(x)]' ≤ 0.  (140)

We have

    [(1 − 2x²) G(x)]' = −(1/x²) e^{x²} ∫_0^x e^{−t²} dt + 1/x − 4x² e^{x²} ∫_0^x e^{−t²} dt − 2x.  (141)

Because e^{−t²} is decreasing on t ∈ [0, ∞), for all x > 0

    ∫_0^x e^{−t²} dt ≥ ∫_0^x e^{−x²} dt = x e^{−x²}.  (142)

Hence,

    −(1/x²) e^{x²} ∫_0^x e^{−t²} dt + 1/x ≤ 0.  (143)

Substituting (143) into (141), (140) follows. This completes the proof.

Now we are ready to prove Lemma 11.

Proof of Lemma 11. The function H(v) is continuously differentiable on R. In addition, H''(v) is continuous everywhere but at v = ±v_∗. Since the Lebesgue measure of those times t for which V_t = ±v_∗ is zero, the values H''(±v_∗) can be chosen in the sequel arbitrarily. By using Itô's formula [55, Theorem 7.13], we obtain that almost surely

    H(V_t) − H(v) = ∫_0^t [ (σ²/2) H''(V_r) − θV_r H'(V_r) − (V_r² − β) ] dr + ∫_0^t σ H'(V_r) dW_r.  (144)

For all t ≥ 0 and all v ∈ R, we can show that

    E_v[∫_0^t [σ H'(V_r)]² dr] < ∞.

This and [55, Theorem 7.11] imply that ∫_0^t σ H'(V_r) dW_r is a martingale and

    E_v[∫_0^t σ H'(V_r) dW_r] = 0,  ∀t ≥ 0.  (145)

Hence,

    E_v[H(V_t) − H(v)] = E_v[∫_0^t ( (σ²/2) H''(V_r) − θV_r H'(V_r) − (V_r² − β) ) dr].  (146)

Next, we show that for all v ∈ R

    (σ²/2) H''(v) − θv H'(v) − (v² − β) ≤ 0.  (147)

Let us consider the following two cases:

Case 1: If |v| < v_∗, then (62) implies

    (σ²/2) H''(v) − θv H'(v) − (v² − β) = 0.  (148)

Case 2: |v| > v_∗. In this case, H(v) = −γv². Hence,

    (σ²/2) H''(v) − θv H'(v) = (σ²/2)(−2γ) − θv(−2γv) = −σ²γ + 2θγv² = −mse_{Y_i} + E[1 − e^{−2θY_i}] v².  (149)

Substituting (149) into (147) yields

    E[e^{−2θY_i}] v² ≥ β − mse_{Y_i}.  (150)

To prove (150), since |v| > v_∗, it suffices to show that

    E[e^{−2θY_i}] v_∗² ≥ β − mse_{Y_i},  (151)

which is equivalent to

    (v_∗²/mse_∞)(mse_∞ − mse_{Y_i}) ≥ (mse_∞ − mse_{Y_i}) − (mse_∞ − β).  (152)

By Lemma 12, we get

    (1 − (2θ/σ²) v_∗²) G(√θ v_∗/σ) ≤ 1.  (153)
Hence,

    (1 − v_∗²/mse_∞) G(√θ v_∗/σ) ≤ 1.  (154)

Substituting (71) into (154), we obtain

    (mse_∞ − mse_{Y_i}) (1 − v_∗²/mse_∞) G(√θ v_∗/σ) ≤ (mse_∞ − β) G(√θ v_∗/σ).  (155)

Because G(x) > 0 for all x > 0,

    (mse_∞ − mse_{Y_i}) (1 − v_∗²/mse_∞) ≤ mse_∞ − β,  (156)

which implies (152). Hence, (147) holds in both cases. Thus, E_v[H(V_t) − H(v)] ≤ 0 holds for all t ≥ 0 and v ∈ R. This completes the proof.

APPENDIX L
PROOF OF THEOREM 5

According to [47, Prop. 6.2.5], if we can find π^⋆ = (Z_1, Z_2, . . .) and λ^⋆ that satisfy the following conditions:

    π^⋆ ∈ Π_1,  lim_{n→∞} (1/n) Σ_{i=0}^{n−1} E[Y_i + Z_i] − 1/f_max ≥ 0,  (157)
    λ^⋆ ≥ 0,  (158)
    L(π^⋆; λ^⋆) = inf_{π∈Π_1} L(π; λ^⋆),  (159)
    λ^⋆ { lim_{n→∞} (1/n) Σ_{i=0}^{n−1} E[Y_i + Z_i] − 1/f_max } = 0,  (160)

then π^⋆ is an optimal solution to (50) and λ^⋆ is a geometric multiplier [47] for (50). Further, if we can find such π^⋆ and λ^⋆, then the duality gap between (50) and (54) must be zero, because otherwise there is no geometric multiplier [47, Prop. 6.2.3(b)]. The remaining task is to find π^⋆ and λ^⋆ that satisfy (157)-(160).

According to Theorem 4 and Corollary 1, a solution π^⋆ = (Z_0(β), Z_1(β), . . .) to (159) is given by (73), where β = mse_opt + λ^⋆. In addition, because the Y_i's are i.i.d., the Z_i(β)'s in policy π^⋆ are i.i.d. Using (157), (158), and (160), the value of λ^⋆ can be obtained by considering two cases: If λ^⋆ > 0, because the Z_i(β)'s are i.i.d., we have from (160) that

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} E[Y_i + Z_i(β)] = E[Y_i + Z_i(β)] = 1/f_max.  (161)

If λ^⋆ = 0, then (157) implies

    lim_{n→∞} (1/n) Σ_{i=0}^{n−1} E[Y_i + Z_i(β)] = E[Y_i + Z_i(β)] ≥ 1/f_max.  (162)

Next, we compute mse_opt and β = mse_opt + λ^⋆. To compute mse_opt, we substitute policy π^⋆ into (49), which yields

    mse_opt = lim_{n→∞} Σ_{i=0}^{n−1} E[∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds] / Σ_{i=0}^{n−1} E[Y_i + Z_i(β)]
            = E[∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds] / E[Y_i + Z_i(β)],  (163)

where in the last equation we have used that the Z_i(β)'s are i.i.d. Hence, the value of β = mse_opt + λ^⋆ can be obtained by considering the following two cases:

Case 1: If λ^⋆ > 0, then (163) and (161) imply that

    E[Y_i + Z_i(β)] = 1/f_max,  (164)
    β > mse_opt = E[∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds] / E[Y_i + Z_i(β)].  (165)

Case 2: If λ^⋆ = 0, then (163) and (162) imply that

    E[Y_i + Z_i(β)] ≥ 1/f_max,  (166)
    β = mse_opt = E[∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds] / E[Y_i + Z_i(β)].  (167)

Hence, if we choose π^⋆ = (Z_0(β), Z_1(β), . . .), where Z_i(β) is given by (74) and β satisfies (164)-(167), and choose λ^⋆ = β − mse_opt, then the selected π^⋆ and λ^⋆ satisfy (157)-(160). By [47, Prop. 6.2.3(b)], the duality gap between (50) and (54) is zero. A solution to (50) and (54) is π^⋆. This completes the proof.

APPENDIX M
PROOF OF LEMMA 1

We first prove mse_opt ≤ β. Using (55) and λ ≥ 0, it follows that mse_opt ≤ β.

Next, we prove β ≤ mse_∞. According to (13) and (14), we know that mse_{Y_i} ≤ mse_∞. Because G(x) ≥ 1 for all x ≥ 0, (71) implies β ≤ mse_∞.

Finally, we prove mse_{Y_i} ≤ mse_opt. We first consider the special case of f_max = ∞. In this case, (19) and (22) tell us that mse_opt = β. By using (71) and the fact that G(x) ≥ 1 for all x ≥ 0, it follows that mse_{Y_i} ≤ β = mse_opt holds when f_max = ∞. On the other hand, the set of feasible policies of Problem (10) becomes larger as f_max increases. Hence, mse_opt is decreasing in f_max. We have shown that mse_{Y_i} ≤ mse_opt holds in the special case f_max = ∞. Therefore, mse_{Y_i} ≤ mse_opt must hold for all possible positive values of f_max. This completes the proof.

APPENDIX N
PROOF OF LEMMA 3

According to (8) and (9), the estimation error (X_t − X̂_t) has the same distribution as O_{t−S_i(β)} for t ∈ [D_i(β), D_{i+1}(β)). We will use (X_t − X̂_t) and O_{t−S_i(β)} interchangeably for t ∈ [D_i(β), D_{i+1}(β)). In order to prove Lemma 3, we need to consider the following two cases:

Case 1: If |X_{D_i(β)} − X̂_{D_i(β)}| = |O_{Y_i}| ≥ v(β), then (17) tells us S_{i+1}(β) = D_i(β). Hence,

    D_i(β) = S_i(β) + Y_i,  (168)
    D_{i+1}(β) = S_{i+1}(β) + Y_{i+1} = D_i(β) + Y_{i+1}.  (169)

Using these and the fact that the Y_i's are independent of the OU process, we can obtain

    E[D_{i+1}(β) − D_i(β) | O_{Y_i}, |O_{Y_i}| ≥ v(β)] = E[Y_{i+1}],  (170)

and

    E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| ≥ v(β)]
      = E[∫_{Y_i}^{Y_i+Y_{i+1}} O_s² ds | O_{Y_i}, |O_{Y_i}| ≥ v(β)]
      =(a) (σ²/(2θ)) E[Y_{i+1}] + γ O²_{Y_i} − (σ²/(4θ²)) E[1 − e^{−2θY_{i+1}}]
      = mse_∞ [E(Y_{i+1}) − γ] + O²_{Y_i} γ,  (171)

where Step (a) follows from the proof of (114).

Case 2: If |X_{D_i(β)} − X̂_{D_i(β)}| = |O_{Y_i}| < v(β), then (17) tells us that, almost surely,

    |X_{S_{i+1}(β)} − X̂_{S_{i+1}(β)}| = v(β).  (172)

By invoking Lemma 5, we can obtain

    E[S_{i+1}(β) − S_i(β) | O_{Y_i}, |O_{Y_i}| < v(β)] = R_1(v(β)),  (173)
    E[D_i(β) − S_i(β) | O_{Y_i}, |O_{Y_i}| < v(β)] = R_1(|O_{Y_i}|).  (174)

Using (173), (174), and D_{i+1}(β) = S_{i+1}(β) + Y_{i+1}, we get

    E[D_{i+1}(β) − D_i(β) | O_{Y_i}, |O_{Y_i}| < v(β)]
      = E[(D_{i+1}(β) − S_{i+1}(β)) + (S_{i+1}(β) − S_i(β)) − (D_i(β) − S_i(β)) | O_{Y_i}, |O_{Y_i}| < v(β)]
      = E[Y_{i+1}] + R_1(v(β)) − R_1(|O_{Y_i}|).  (175)

In addition, by invoking Lemma 5 again, we can obtain

    E[∫_{S_i(β)}^{S_{i+1}(β)} O_s² ds | O_{Y_i}, |O_{Y_i}| < v(β)] = R_2(v(β)),  (176)
    E[∫_{S_i(β)}^{D_i(β)} O_s² ds | O_{Y_i}, |O_{Y_i}| < v(β)] = R_2(|O_{Y_i}|).  (177)

By using (176), (177), and (114), we have

    E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| < v(β)]
      = E[∫_{S_{i+1}(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt + ∫_{S_i(β)}^{S_{i+1}(β)} (X_t − X̂_t)² dt − ∫_{S_i(β)}^{D_i(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| < v(β)]
      = mse_∞ [E(Y_{i+1}) − γ] + v²(β) γ + R_2(v(β)) − R_2(|O_{Y_i}|).  (178)

By combining (170) and (175) of the two cases, we obtain

    E[D_{i+1}(β) − D_i(β) | O_{Y_i}] = E[Y_{i+1}] + max{R_1(v(β)) − R_1(|O_{Y_i}|), 0}.  (179)

Similarly, by combining (171) and (178) of the two cases, we obtain

    E[∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i}]
      = mse_∞ [E(Y_{i+1}) − γ] + max{v²(β), O²_{Y_i}} γ + max{R_2(v(β)) − R_2(|O_{Y_i}|), 0}.  (180)

Finally, by taking the expectation over O_{Y_i} in (179) and (180) and using the fact that R_1(·) and R_2(·) are even functions, Lemma 3 is proven.
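Lemma 12 is also easy to sanity-check numerically with the standard-library error function; below is a quick grid check (the grid itself is an arbitrary choice).

```python
import math

def G(x):
    """G(x) of (15): (sqrt(pi)/2) * e^{x^2} * erf(x) / x, with G(0) = 1 (footnote 4)."""
    if x == 0.0:
        return 1.0
    return math.sqrt(math.pi) / 2.0 * math.exp(x * x) * math.erf(x) / x

# Lemma 12 asserts (1 - 2x^2) G(x) <= 1 for all x >= 0; check it on a dense grid.
worst = max((1.0 - 2.0 * x * x) * G(x) for x in [k / 200.0 for k in range(0, 1000)])
print(worst)  # → 1.0 (the bound is attained at x = 0)
```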