


Sampling for Remote Estimation through Queues:
Age of Information and Beyond
Tasmeen Zaman Ornee and Yin Sun
Dept. of ECE, Auburn University, Auburn, AL

arXiv:1902.03552v1 [cs.IT] 10 Feb 2019

Abstract— The age of information, as a metric for evaluating information freshness, has received a lot of attention. Recently, an interesting connection between the age of information and remote estimation error was found in a sampling problem of Wiener processes: If the sampler has no knowledge of the signal being sampled, the optimal sampling strategy is to minimize the age of information; however, by exploiting causal knowledge of the signal values, it is possible to achieve a smaller estimation error. In this paper, we extend a previous study by investigating a problem of sampling a stationary Gauss-Markov process, namely the Ornstein-Uhlenbeck (OU) process. The optimal sampling problem is formulated as a constrained continuous-time Markov decision process (MDP) with an uncountable state space. We provide an exact solution to this MDP: The optimal sampling policy is a threshold policy on instantaneous estimation error, and the threshold is found. Further, if the sampler has no knowledge of the OU process, the optimal sampling problem reduces to an MDP for minimizing a nonlinear age of information metric. The age-optimal sampling policy is a threshold policy on expected estimation error, and the threshold is found. These results hold for (i) general service time distributions of the queueing server and (ii) sampling problems both with and without a sampling rate constraint. Numerical results are provided to compare different sampling policies.

Fig. 1: System model. (An observer samples the signal Xt; the sampler submits packets (Si, XSi) to a channel, which delivers them to the estimator producing X̂t; an ACK is fed back to the sampler upon each delivery.)

I. INTRODUCTION

Timely updates of the system state are of significant importance for state estimation and decision making in networked control and cyber-physical systems, such as UAV navigation, robotics control, mobility tracking, and environment monitoring systems. To evaluate the freshness of state updates, the concept of Age of Information, or simply age, was introduced to measure the timeliness of state samples received from a remote transmitter [1], [2]. Let Ut be the generation time of the freshest received state sample at time t. The age of information, as a function of t, is defined as ∆t = t − Ut, which is the time difference between the freshest samples available at the transmitter and receiver.

Recently, the age of information concept has received significant attention, because of the extensive applications of state updates among systems connected over communication networks. The states of many systems, such as UAV mobility trajectories and sensor measurements, are in the form of a signal Xt that may change slowly at some times and vary more dynamically later. Hence, the time difference described by the age ∆t = t − Ut only partially characterizes the variation Xt − XUt of the system state, and the state update policy that minimizes the age of information does not minimize the state estimation error. This result was first shown in [3], [4], where a challenging sampling problem of Wiener processes was solved and the optimal sampling policy was shown to have an intuitive structure. As the results therein hold only for non-stationary signals that can be modeled as a Wiener process, one would wonder how to, and whether it is possible to, extend [3], [4] to handle more general signal models.

In this paper, we generalize [3], [4] by exploring a problem of sampling an Ornstein-Uhlenbeck (OU) process Xt. From the obtained results, we hope to find useful structural properties of the optimal sampler design that can be potentially applied to more general signal models. The OU process Xt is the continuous-time analogue of the discrete-time AR(1) process, and is defined as the solution to the stochastic differential equation (SDE) [5], [6]

dXt = θ(µ − Xt)dt + σdWt, (1)

where µ, θ > 0, and σ > 0 are parameters and Wt represents a Wiener process. The OU process is the only nontrivial continuous-time process that is stationary, Gaussian, and Markovian [6]. It has been widely used to model financial prices such as interest rates, currency exchange rates, and commodity prices (with modifications) [7], node mobility in mobile ad hoc networks, robotic swarms, or UAV networks [8], [9], and physical processes such as fluid dynamics [10].

As shown in Fig. 1, samples of an OU process are forwarded to a remote estimator through a FIFO queue with i.i.d. service times. The service time distributions considered in this paper are quite general: they are only required to have a finite mean. This queueing model is helpful for analyzing the robustness of remote estimation systems with occasionally long communication delays. For example, UAVs flying close to WiFi access points may suffer from long communication delays and instability issues, if they receive strong interference from the WiFi access points.¹

This work was supported in part by NSF grant CCF-1813050 and ONR grant N00014-17-1-2417.
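Equation (1) admits an exact Gaussian transition over any time step, which is what makes the OU process the continuous-time analogue of the AR(1) recursion. The following minimal simulation sketch is ours, not from the paper, and the parameter values are chosen only for illustration; it checks that the simulated variance approaches the stationary value σ²/(2θ).

```python
import math
import random

def simulate_ou(mu, theta, sigma, x0, dt, n_steps, rng):
    """Simulate (1) with the exact transition: X_{t+dt} given X_t is Gaussian with
    mean mu + (X_t - mu) e^{-theta dt} and variance sigma^2 (1 - e^{-2 theta dt}) / (2 theta)."""
    a = math.exp(-theta * dt)
    sd = math.sqrt(sigma**2 * (1.0 - a * a) / (2.0 * theta))
    x = x0
    for _ in range(n_steps):
        x = mu + (x - mu) * a + sd * rng.gauss(0.0, 1.0)
    return x

# Illustrative run (our choice of parameters): after T = 3 with theta = 1,
# the variance of X_T is close to the stationary variance sigma^2/(2 theta) = 0.5.
rng = random.Random(0)
mu, theta, sigma = 0.0, 1.0, 1.0
xs = [simulate_ou(mu, theta, sigma, x0=mu, dt=0.1, n_steps=30, rng=rng) for _ in range(4000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # mean near mu = 0, variance near 0.5
```

The exact transition avoids Euler discretization bias, so the step size dt only affects how often the path is observed, not the marginal distribution.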
We remark that this paper focuses on a queueing model; whereas our analysis can be readily applied to other channel models, such as erasure channels and fading channels.

The estimator utilizes causally received samples to construct an estimate X̂t of the real-time signal value Xt. The quality of remote estimation is measured by the time-average mean-squared estimation error, i.e.,

mse = lim sup_{T→∞} (1/T) E[ ∫₀ᵀ (Xt − X̂t)² dt ]. (2)

Our goal is to find the optimal causal sampling policy that minimizes mse by choosing the sampling times, subject to a maximum sampling rate constraint. In practice, the cost (e.g., energy, CPU cycles, storage) of state updates increases with the average sampling rate. Hence, we are striving to find the optimal tradeoff between estimation error and update cost. In addition, the unconstrained problem will also be solved. The contributions of this paper are summarized as follows:

• The optimal sampling problem for minimizing the mse under a sampling rate constraint is formulated as a constrained continuous-time Markov decision process (MDP) with an uncountable state space. Because of the curse of dimensionality, such problems often lack low-complexity solutions that are arbitrarily accurate. However, we were able to solve this MDP exactly: The optimal sampling policy is proven to be a threshold policy on instantaneous estimation error, where the threshold is a non-linear function v(β) of a parameter β. The value of β is equal to the summation of the optimal objective value of the MDP and the optimal Lagrangian dual variable associated with the sampling rate constraint. If there is no sampling rate constraint, the Lagrangian dual variable is zero and hence β is exactly the optimal objective value. Among the technical tools developed to prove this result is a free boundary method [11], [12] for finding the optimal stopping time of the OU process.

• The optimal sampler design of Wiener processes in [3], [4] is a limiting case of the above result. By comparing the optimal sampling policies of OU and Wiener processes, we find that the threshold function v(β) changes according to the signal model, whereas the parameter β is determined in the same way for both signal models.

• Further, we consider a class of signal-ignorant sampling policies, where the sampling times are determined without using knowledge of the observed OU process. The optimal signal-ignorant sampling problem is equivalent to an MDP for minimizing the time-average of a nonlinear age function p(∆t), which has been solved recently in [13]. The age-optimal sampling policy is a threshold policy on expected estimation error, where the threshold function is simply v(β) = β and the parameter β is determined in the same way as above.

• The above results hold for (i) general service time distributions with a finite mean and (ii) sampling problems both with and without a sampling rate constraint. Numerical results suggest that the optimal sampling policy is better than zero-wait sampling and the classic uniform sampling.

One interesting observation from these results is that the threshold function v(β) varies with respect to the signal model and sampling problem, but the parameter β is determined in the same way.

A. Related Work

The results in this paper are tightly related to recent studies on the age of information ∆t, e.g., [1], [13]–[27], which do not involve a signal model. The average age and average peak age have been analyzed for various queueing systems in, e.g., [1], [14], [16], [17]. The optimality of the Last-Come, First-Served (LCFS) policy, or more generally the Last-Generated, First-Served (LGFS) policy, was established for various queueing system models in [20]–[24]. Optimal sampling policies for minimizing non-linear age functions were developed in [13], [15]. Age-optimal transmission scheduling of wireless networks was investigated in, e.g., [18], [19], [25]–[29].

On the other hand, this paper is also a contribution to the area of remote estimation, e.g., [30]–[36], by adding a queue between the sampler and estimator. In [32], [34], optimal sampling of Wiener processes was studied, where the transmission time from the sampler to the estimator is zero. Optimal sampling of OU processes was also considered in [32], which was solved by discretizing time and using dynamic programming to solve the resulting discrete-time optimal stopping problems. In contrast, our optimal sampler of OU processes is obtained analytically. Remote estimation over several different channel models was recently studied in, e.g., [35], [36]. In [30]–[36], the optimal sampling policies were proven to be threshold policies. Because of the queueing model, our optimal sampling policy has a different structure from those in [30]–[36]. Specifically, in our optimal sampling policy, sampling is suspended when the server is busy and is reactivated once the server becomes idle. The optimal sampling policy for Wiener processes in [3], [4] is a limiting case of ours.

II. MODEL AND FORMULATION

A. System Model

We consider the remote estimation system illustrated in Fig. 1, where an observer takes samples from an OU process Xt and forwards the samples to an estimator through a communication channel. The channel is modeled as a single-server FIFO queue with i.i.d. service times. The system starts to operate at time t = 0. The i-th sample is generated at time Si and is delivered to the estimator at time Di with a service time Yi, which satisfy Si ≤ Si+1, Si + Yi ≤ Di, Di + Yi+1 ≤ Di+1, and 0 < E[Yi] < ∞ for all i. Each sample packet (Si, XSi) contains the sampling time Si and the sample value XSi. Let Ut = max{Si : Di ≤ t} be the sampling time of the latest received sample at time t.

¹This example was suggested to the authors by Thaddeus A. Roppel.
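The timing constraints above are realized by the standard FIFO recursion in which sample i starts service once it has been generated and the server is free. The sketch below is ours (the sample values are illustrative, not from the paper); it computes delivery times and the age ∆t = t − Ut from the introduction.

```python
def delivery_times(S, Y):
    """FIFO single-server queue: sample i starts service at max(D[i-1], S[i])
    and is delivered after its service time Y[i]."""
    D = []
    free = 0.0  # time at which the server next becomes idle
    for s, y in zip(S, Y):
        start = max(free, s)
        D.append(start + y)
        free = D[-1]
    return D

def age(t, S, D):
    """Age Delta_t = t - max{S_i : D_i <= t}: time since the generation of the
    freshest sample delivered by time t."""
    delivered = [s for s, d in zip(S, D) if d <= t]
    return t - max(delivered)

S = [0.0, 1.0, 2.5]  # sampling times (illustrative)
Y = [0.5, 2.0, 1.0]  # service times (illustrative)
D = delivery_times(S, Y)
print(D)              # [0.5, 3.0, 4.0]
print(age(3.5, S, D)) # freshest delivered sample was taken at S = 1.0, so age 2.5
```

One can verify on this toy trace that Si + Yi ≤ Di and Di + Yi+1 ≤ Di+1 hold, as required by the model.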
Fig. 2: Evolution of the age ∆t over time. (The sawtooth curve increases linearly in t and drops to t − Si at each delivery time Di; the sampling and delivery times S0, D0, S1, D1, ..., Sj, Dj are marked on the time axis.)

The age of information, or simply age, at time t is defined as [1]

∆t = t − Ut = t − max{Si : Di ≤ t}, (3)

which is shown in Fig. 2. Because Di ≤ Di+1, ∆t can also be expressed as

∆t = t − Si, if t ∈ [Di, Di+1), i = 0, 1, 2, . . . (4)

The initial state of the system is assumed to satisfy S0 = 0, D0 = Y0, and X0 and ∆0 are finite constants. The parameters µ, θ, and σ in (1) are known at both the sampler and estimator.

Let It ∈ {0, 1} represent the idle/busy state of the server at time t. We assume that whenever a sample is delivered, an acknowledgement is sent back to the sampler with zero delay. By this, the idle/busy state It of the server is known at the sampler. Therefore, the information that is available at the sampler at time t can be expressed as {Xs, Is : 0 ≤ s ≤ t}.

B. Sampling Policies

In causal sampling policies, each sampling time Si is chosen by using the up-to-date information available at the sampler. To characterize this statement precisely, let us define the σ-fields

Nt = σ(Xs, Is : 0 ≤ s ≤ t), Nt+ = ∩s>t Ns. (5)

Hence, {Nt+, t ≥ 0} is a filtration (i.e., a non-decreasing and right-continuous family of σ-fields) of the information available at the sampler. Each sampling time Si is a stopping time with respect to the filtration {Nt+, t ≥ 0} such that

{Si ≤ t} ∈ Nt+, ∀t ≥ 0. (6)

Let π = (S1, S2, . . .) represent a sampling policy and Π represent the set of causal sampling policies that satisfy two conditions: (i) Each sampling policy π ∈ Π satisfies (6) for all i. (ii) The sequence of inter-sampling times {Ti = Si+1 − Si, i = 0, 1, . . .} forms a regenerative process [37, Section 6.1]: There exists an increasing sequence 0 ≤ k1 < k2 < . . . of almost surely finite random integers such that the post-kj process {Tkj+i, i = 0, 1, . . .} has the same distribution as the post-k0 process {Tk0+i, i = 0, 1, . . .} and is independent of the pre-kj process {Ti, i = 0, 1, . . . , kj − 1}; further, we assume that E[kj+1 − kj] < ∞, E[Sk1] < ∞, and 0 < E[Skj+1 − Skj] < ∞, j = 1, 2, . . .²

From this, we can obtain that Si is finite almost surely for all i. We assume that the OU process {Xt, t ≥ 0} and the service times {Yi, i = 1, 2, . . .} are mutually independent, and do not change according to the sampling policy.

A sampling policy π ∈ Π is said to be signal-ignorant (signal-aware), if π is (not necessarily) independent of {Xt, t ≥ 0}. Let Πsignal-ignorant ⊂ Π denote the set of signal-ignorant sampling policies, defined as

Πsignal-ignorant = {π ∈ Π : π is independent of {Xt, t ≥ 0}}. (7)

C. MMSE Estimator

According to (6), Si is a finite stopping time. By using [43, Eq. (3)] and the strong Markov property of the OU process [11, Eq. (4.3.27)], Xt can be expressed as

Xt = XSi e^{−θ(t−Si)} + µ[1 − e^{−θ(t−Si)}] + (σ/√(2θ)) e^{−θ(t−Si)} W_{e^{2θ(t−Si)} − 1}, if t ∈ [Si, ∞). (8)

At any time t ≥ 0, the estimator uses causally received samples to construct an estimate X̂t of the real-time signal value Xt. The information available to the estimator consists of two parts: (i) Mt = {(Si, XSi, Di) : Di ≤ t}, which contains the sampling time Si, sample value XSi, and delivery time Di of the samples that have been delivered by time t, and (ii) the fact that no sample has been received after the last delivery time max{Di : Di ≤ t}. Similar to [4], [32], [44], we assume that the estimator neglects the second part of information.³ Then, as shown in Appendix A, the minimum mean square error (MMSE) estimator is determined by

X̂t = E[Xt | Mt] = XSi e^{−θ(t−Si)} + µ[1 − e^{−θ(t−Si)}], if t ∈ [Di, Di+1), i = 0, 1, 2, . . . (9)

D. Problem Formulation

The goal of this paper is to find the optimal sampling policy that minimizes the mean-squared estimation error subject to an average sampling-rate constraint, which is formulated as the following problem:

mseopt = min_{π∈Π} lim sup_{T→∞} (1/T) E[ ∫₀ᵀ (Xt − X̂t)² dt ] (10)
s.t. lim inf_{n→∞} (1/n) E[ Σ_{i=1}^n (Si+1 − Si) ] ≥ 1/fmax, (11)

²We will optimize lim sup_{T→∞} E[∫₀ᵀ (Xt − X̂t)² dt]/T, but operationally a nicer criterion is lim sup_{i→∞} E[∫₀^{Di} (Xt − X̂t)² dt]/E[Di]. These criteria correspond to two definitions of "average cost per unit time" that are widely used in the literature of semi-Markov decision processes. The two criteria are equivalent if {T1, T2, . . .} is a regenerative process, or more generally, if {T1, T2, . . .} has only one ergodic class; if no such condition is imposed, however, they are different. The interested readers are referred to [38]–[42] for more discussions.

³In [30]–[36], it was shown that when the sampler and estimator are jointly optimized, the MMSE estimator has the same expression no matter with or without the second part of information. We will consider the joint optimization of the sampler and estimator in our future work.
Algorithm 1 Bisection method for solving (19)
given l = mseYi, u = mse∞, tolerance ǫ > 0.
repeat
    β := (l + u)/2.
    o := β − E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)].
    if o ≥ 0, u := β; else, l := β.
until u − l ≤ ǫ.
return β.

Algorithm 2 Bisection method for solving (21)
given l = mseYi, u = mse∞, tolerance ǫ > 0.
repeat
    β := (l + u)/2.
    o := E[Di+1(β) − Di(β)].
    if o ≥ 1/fmax, u := β; else, l := β.
until u − l ≤ ǫ.
return β.

where mseopt is the optimum value of (10) and fmax is the maximum allowed sampling rate. When fmax = ∞, this problem becomes an unconstrained problem.

III. MAIN RESULT

A. Signal-aware Sampling

Problem (10) is a constrained continuous-time MDP with a continuous state space. However, we found an exact solution to this problem. To present this solution, let us consider an OU process Ot with initial state O0 = 0 and parameter µ = 0. According to (8), Ot can be expressed as

Ot = (σ/√(2θ)) e^{−θt} W_{e^{2θt} − 1}. (12)

Define

mseYi = E[O²_{Yi}] = (σ²/(2θ)) E[1 − e^{−2θYi}], (13)
mse∞ = E[O²_∞] = σ²/(2θ). (14)

In the sequel, we will see that mseYi and mse∞ are the lower and upper bounds of mseopt, respectively. We will also need to use the function⁴

G(x) = (e^{x²}/x) ∫₀ˣ e^{−t²} dt = (√π/2) (e^{x²}/x) erf(x), x ∈ [0, ∞), (15)

where erf(·) is the error function, defined as

erf(x) = (2/√π) ∫₀ˣ e^{−t²} dt. (16)

Theorem 1. If the Yi's are i.i.d. with 0 < E[Yi] < ∞, then (S1(β), S2(β), . . .) with a parameter β is an optimal solution to (10), where

Si+1(β) = inf{ t ≥ Di(β) : |Xt − X̂t| ≥ v(β) }, (17)

Di(β) = Si(β) + Yi, and v(β) is defined by

v(β) = (σ/√θ) G⁻¹( (mse∞ − mseYi) / (mse∞ − β) ), (18)

where G⁻¹(·) is the inverse function of G(·) in (15) and β ≥ 0 is the root of

β = E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)] (19)

if

E[Di+1(β) − Di(β)] > 1/fmax; (20)

otherwise, β is the root of

E[Di+1(β) − Di(β)] = 1/fmax. (21)

The optimal objective value of (10) is then given by

mseopt = E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)]. (22)

The proof of Theorem 1 is explained in Section IV. The optimal sampling policy in Theorem 1 has a nice structure. Specifically, the (i+1)-th sample is taken at the earliest time t satisfying two conditions: (i) the i-th sample has already been delivered by time t, i.e., t ≥ Di(β), and (ii) the estimation error |Xt − X̂t| is no smaller than a pre-determined threshold v(β), where v(·) is a non-linear function defined in (18).

Lemma 1. In Theorem 1, it holds that mseYi ≤ mseopt ≤ β ≤ mse∞.

Proof. See Appendix M.

Hence, (mse∞ − mseYi)/(mse∞ − β) ≥ 1. Further, because G(x) is strictly increasing on [0, ∞) and G(0) = 1, the inverse function G⁻¹(·) and the threshold v(β) are properly defined, and v(β) ≥ 0. We note that the service time distribution affects the optimal sampling policy in (17) and (18) through mseYi and β.

The calculation of β falls into two cases: In one case, β can be computed by solving (19) via the bisection search method in Algorithm 1. For this case to occur, the sampling rate constraint (11) needs to be inactive at the β obtained by Algorithm 1. Because Di(β) = Si(β) + Yi, we can obtain E[Di+1(β) − Di(β)] = E[Si+1(β) − Si(β)], and hence (20) holds when the sampling rate constraint (11) is inactive. In the other case, β is selected to satisfy the sampling rate constraint (11) with equality, which is implemented by using another bisection method in Algorithm 2. The upper and lower bounds for the bisection searches in Algorithms 1-2 are determined by Lemma 1. If fmax = ∞, because E[Di+1(β) − Di(β)] ≥ E[Yi] > 0, (20) is always satisfied and only the first case can happen. By comparing (19) and (22), it follows immediately that

Lemma 2. If the sampling rate constraint is removed, i.e., fmax = ∞, then β = mseopt.

⁴If x = 0, G(x) is defined as its right limit G(0) = lim_{x→0+} G(x) = 1.
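The threshold (18) is easy to evaluate numerically: since G is strictly increasing with G(0) = 1, its inverse can be computed by bisection. The sketch below is ours; the deterministic service time Yi ≡ 1, θ = σ = 1, and β = 0.46 are illustrative choices (with β between mseYi and mse∞, as required by Lemma 1).

```python
import math

def G(x):
    """G(x) in (15), written via the error function; right limit G(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.exp(x * x) / x * (math.sqrt(math.pi) / 2.0) * math.erf(x)

def G_inv(y, hi=10.0):
    """Invert the strictly increasing G on [0, hi] by bisection."""
    lo = 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def threshold(beta, theta, sigma, mse_Y, mse_inf):
    """Threshold v(beta) in (18)."""
    return sigma / math.sqrt(theta) * G_inv((mse_inf - mse_Y) / (mse_inf - beta))

# Illustration (our numbers): deterministic service time Y_i = 1, theta = sigma = 1.
theta = sigma = 1.0
mse_inf = sigma**2 / (2 * theta)                    # (14)
mse_Y = mse_inf * (1 - math.exp(-2 * theta * 1.0))  # (13) with Y_i = 1
print(threshold(mse_Y, theta, sigma, mse_Y, mse_inf))  # ratio 1, so threshold ~0
print(threshold(0.46, theta, sigma, mse_Y, mse_inf))   # about 0.87
```

At β = mseYi the ratio in (18) equals 1 and the threshold vanishes (sample as soon as the server idles); as β grows toward mse∞ the threshold grows, consistent with Lemma 1.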
The calculation of the expectations in Algorithms 1-2 can be greatly simplified by using the following lemma, which is obtained based on Dynkin's formula [12, Theorem 7.4.1] and the optional stopping theorem.

Lemma 3. In Theorem 1, it holds that

E[Di+1(β) − Di(β)] = E[max{R1(v(β)) − R1(OYi), 0}] + E[Yi], (23)

E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] = E[max{R2(v(β)) − R2(OYi), 0}] + mse∞ [E(Yi) − γ] + E[max{v²(β), O²_{Yi}}] γ, (24)

where

γ = (1/(2θ)) E[1 − e^{−2θYi}], (25)
R1(v) = (v²/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²), (26)
R2(v) = −v²/(2θ) + (v²/(2θ)) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²), (27)

and ₂F₂ is the generalized hypergeometric function [45].

Proof. See Appendix N.

Because the Yi's are i.i.d., the expectations in (23) and (24) are functions of β and do not depend on i. One can improve the accuracy of the solution in Algorithms 1-2 by (i) reducing the tolerance ǫ and (ii) increasing the number of Monte Carlo realizations for computing the expectations. Such a low-complexity solution for solving a constrained continuous-time MDP with a continuous state space is rare.

1) Sampling of Wiener Processes: In the limiting case that σ = 1 and θ → 0, the OU process Xt in (1) becomes a Wiener process Xt = Wt. In this case, the MMSE estimator in (9) is given by

X̂t = WSi, if t ∈ [Di, Di+1). (28)

As shown in Appendix C, v(·) defined by (18) tends to

v(β) = √(3(β − E[Yi])). (29)

Theorem 2. If σ = 1, θ → 0, and the Yi's are i.i.d. with 0 < E[Yi] < ∞, then (S1(β), S2(β), . . .) is an optimal solution to (10), where

Si+1(β) = inf{ t ≥ Di(β) : |Xt − X̂t| ≥ √(3(β − E[Yi])) }, (30)

Di(β) = Si(β) + Yi, and β ≥ 0 is the root of

β = E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)] (31)

if E[Di+1(β) − Di(β)] > 1/fmax; otherwise, β is the root of E[Di+1(β) − Di(β)] = 1/fmax. The optimal objective value of (10) is given by

mseopt = E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)]. (32)

Theorem 2 is an alternative form of Theorem 1 in [3], [4]. The benefit of the new expressions (30)-(32) is that they characterize β by the optimal objective value mseopt and the sampling rate constraint (11), in the same way as in Theorem 1, which appears to be more fundamental than the expression in [3], [4].

B. Signal-ignorant Sampling

In signal-ignorant sampling policies, the sampling times Si are determined based only on the service times Yi, but not on the observed OU process {Xt, t ≥ 0}.

Lemma 4. If π ∈ Πsignal-ignorant, then the mean-squared estimation error of the OU process Xt at time t is

E[ (Xt − X̂t)² | π, Y1, Y2, . . . ] = (σ²/(2θ)) [1 − e^{−2θ∆t}], (33)

which is a strictly increasing function p(∆t) of the age ∆t.

According to Lemma 4 and Fubini's theorem, for every policy π ∈ Πsignal-ignorant,

E[ ∫₀ᵀ (Xt − X̂t)² dt ] = E[ ∫₀ᵀ p(∆t) dt ]. (34)

Hence, minimizing the mean-squared estimation error among signal-ignorant sampling policies can be formulated as the following MDP for minimizing the expected time-average of the nonlinear age function p(∆t):

mseage-opt = inf_{π∈Πsignal-ignorant} lim sup_{T→∞} (1/T) E[ ∫₀ᵀ p(∆t) dt ] (35)
s.t. lim inf_{n→∞} (1/n) E[ Σ_{i=1}^n (Si+1 − Si) ] ≥ 1/fmax, (36)

where mseage-opt is the optimal value of (35). By (33), p(∆t) and mseage-opt are bounded. Problem (35) is one instance of the problems recently solved in Corollary 3 of [13] for general strictly increasing functions p(·). From this, a solution to (35) is given by

Theorem 3. If the Yi's are i.i.d. with 0 < E[Yi] < ∞, then (S1(β), S2(β), . . .) is an optimal solution to (35), where

Si+1(β) = inf{ t ≥ Di(β) : E[(X_{t+Yi+1} − X̂_{t+Yi+1})²] ≥ β }, (37)

Di(β) = Si(β) + Yi, and β ≥ 0 is the root of

β = E[ ∫_{Di(β)}^{Di+1(β)} (Xt − X̂t)² dt ] / E[Di+1(β) − Di(β)] (38)
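The functions R1 and R2 in (26)-(27) only involve ₂F₂(1, 1; 3/2, 2; ·), which has a simple power series. The sketch below uses our own series implementation (an assumption, not code from the paper) and checks the Wiener limit θ → 0, where R1(v) → v²/σ² and R2(v) → v⁴/(6σ²); for σ = 1 these match the classical identities E[τ] = v² and E[∫₀^τ Ws² ds] = v⁴/6 for a standard Wiener process first leaving (−v, v).

```python
def hyp2f2_1_1_32_2(x, terms=80):
    """Series 2F2(1, 1; 3/2, 2; x) = sum_{n>=0} x^n / ((3/2)_n (n+1)),
    obtained from the definition with Pochhammer symbols (our derivation)."""
    total, poch, xn = 0.0, 1.0, 1.0  # poch = (3/2)_n, xn = x^n
    for n in range(terms):
        total += xn / (poch * (n + 1))
        poch *= 1.5 + n
        xn *= x
    return total

def R1(v, theta, sigma):
    """R1(v) in (26)."""
    return v**2 / sigma**2 * hyp2f2_1_1_32_2(theta * v**2 / sigma**2)

def R2(v, theta, sigma):
    """R2(v) in (27), rewritten as (v^2 / 2 theta)(2F2 - 1)."""
    f = hyp2f2_1_1_32_2(theta * v**2 / sigma**2)
    return v**2 / (2.0 * theta) * (f - 1.0)

# Wiener limit (theta -> 0, sigma = 1): R1 -> v^2 and R2 -> v^4/6.
v, theta, sigma = 1.3, 1e-8, 1.0
print(R1(v, theta, sigma))  # about v^2 = 1.69
print(R2(v, theta, sigma))  # about v^4/6
```

This limit check is also a quick sanity test that the series coefficients are consistent with the Wiener-process results of Theorem 2.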
if E[Di+1 (β) − Di (β)] > 1/fmax; otherwise, β is the root of According to (8) and (9), the estimation error (Xt − X̂t ) is
E[Di+1 (β) − Di (β)] = 1/fmax . The optimal objective value of the same distribution with Ot−Si , if t ∈ [Di , Di+1 ). By
to (35) is given by using Dynkin’s formula and the optional stopping theorem,
hR
D (β) 2
i we obtain the following lemma.
E Dii+1
(β) (Xt − X̂t ) dt
mseage-opt = . (39) Lemma 5. Let τ ≥ 0 be a stopping time of the OU process
E[Di+1 (β)−Di (β)] Ot with E [τ ] < ∞, then
Because Πsignal-ignorant ⊂ Π, it follows immediately that Z τ   2 
2 σ 1 2
mseopt ≤ mseage-opt . E Ot dt = E τ − Oτ . (42)
0 2θ 2θ
C. Discussions of the Results If, in addition, τ is the first exit time of a bounded set, then
The difference among Theorems 1-3 is only in the ex- E [τ ] = E[R1 (Oτ )], (43)
pressions (17), (30), (37) of threshold policies, whereas the Z τ 
parameter β is determined by the optimal objective value E Ot2 dt = E[R2 (Oτ )], (44)
and the sampling rate constraint in the same manner. Later 0

on in (55), we have shown that β is exactly equal to the where R1 (·) and R2 (·) are defined in (26) and (27), respec-
summation of the optimal objective value of the MDP and the tively.
optimal Lagrangian dual variable associated to the sampling
Proof. See Appendix D.
rate constraint.
In signal-aware sampling policies (17) and (30), the sam- B. Suspend Sampling when the Server is Busy
pling time is determined by the instantaneous estimation error
Xt − X̂t , and the threshold v(β) is determined by the By using the strong Markov property of the OU process
signal model. In the signal-ignorant sampling policy (37), the Xt and the orthogonality principle of MMSE estimation, we
sampling time is determined by the expected estimation error obtain the following useful lemma:
E[(Xt+Yi+1 − X̂t+Yi+1 )2 ] at time t + Yi+1 . We note that if Lemma 6. In (10), it is suboptimal to take a new sample
t = Si+1 (β), then t + Yi+1 = Si+1 (β) + Yi+1 = Di+1 (β) before the previous sample is delivered.
is the delivery time of the new sample. Hence, (37) requires
that the expected estimation error upon the delivery of the Proof. See Appendix E.
new sample is no less than β. Finally, it is worth noting that
Theorems 1-3 hold for all distributions of the service times A similar result was obtained in [4] for the sampling of
Yi satisfying 0 < E[Yi ] < ∞, and for both constrained and Wiener processes. By Lemma 6, there is no loss to consider
unconstrained sampling problems. a sub-class of sampling policies Π1 ⊂ Π such that each
sample is generated and sent out after all previous samples
IV. P ROOF OF THE M AIN R ESULT are delivered, i.e.,

We prove Theorem 1 in four steps: (i) We first show Π1 = {π ∈ Π : Si = Gi ≥ Di−1 for all i}.
that sampling should be suspended when the server is busy, For any policy π ∈ Π1 , the information used for determining
which can be used to simplify (10). (ii) We use an extended Si includes: (i) the history of signal values (Xt : t ∈ [0, Si ])
Dinkelbach’s method [46] and Lagrangian duality method to and (ii) the service times (Y1 , . . . , Yi−1 ) of previous samples.
decompose the simplified problem into a series of mutually
Let us define the σ-fields Ft+ = σ(Xs : s ∈ [0, t]) and
independent per-sample MDP. (iii) We utilize the free bound- Ft+ = ∩r>t Fr+ . Then, {Ft+ , t ≥ 0} is the filtration (i.e., a
ary method from optimal stopping theory [11] to solve the non-decreasing and right-continuous family of σ-fields) of the
per-sample MDPs. (iv) Finally, we use a geometric multiplier OU process Xt . Given the service times (Y1 , . . . , Yi−1 ) of
method [47] to show that the duality gap is zero. The above previous samples, Si is a stopping time with respect to the
proof framework is an extension to that used in [3], [4], [13], filtration {Ft+ , t ≥ 0} of the OU process Xt , that is
where the most challenging part is in finding the analytical
solution of the per-sample MDP in Step (iii). [{Si ≤ t}|Y1 , . . . , Yi−1 ] ∈ Ft+ . (45)

A. Preliminaries Hence, the policy space Π1 can be expressed as

The OU process Ot in (12) with initial state Ot = 0 and Π1 ={Si : [{Si ≤ t}|Y1 , . . . , Yi−1 ] ∈ Ft+ ,
parameter µ = 0 is the solution to the SDE Ti is a regenerative process}. (46)
dOt = −θOt dt + σdWt . (40) Let Zi = Si+1 − Di ≥ 0 represent the waiting time between
the delivery time Di of the i-th sample and the generation time
In addition, the infinitesimal generator of Ot is [48, Eq. A1.22]
Si+1 of the (i+1)-th sample.As Zi is the waiting time between
∂ σ2 ∂ 2 the delivery time Di of the i-th sample and thePgeneration time
G = −θu + . (41)
∂u 2 ∂u2 Si+1 of the (i + 1)-th sample. Then, Si = i−1 j=0 (Yj + Zj )
Pi−1
and Di = j=0 (Yj + Zj ) + Yi for each i = 1, 2, . . . D. Lagrangian Dual Problem of (50)
Given (Y0, Y1, . . .), (S1, S2, . . .) is uniquely determined by (Z0, Z1, . . .). Hence, one can also use π = (Z0, Z1, . . .) to represent a sampling policy.

Because {Xt − X̂t, t ∈ [Di, Di+1)} and {Ot−Si, t ∈ [Di, Di+1)} have the same distribution, for each i = 1, 2, . . .,

E[ ∫_{Di}^{Di+1} (Xt − X̂t)² dt ] = E[ ∫_{Di}^{Di+1} O²_{t−Si} dt ] = E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds ]. (47)

Because Ti is a regenerative process, renewal theory [49] tells us that (1/n)E[Sn] is a convergent sequence and

lim sup_{T→∞} (1/T) E[ ∫₀ᵀ (Xt − X̂t)² dt ]
= lim_{n→∞} E[ ∫₀^{Dn} (Xt − X̂t)² dt ] / E[Dn]
= lim_{n→∞} Σ_{i=1}^n E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds ] / Σ_{i=1}^n E[Yi + Zi]. (48)

Hence, (10) can be rewritten as the following MDP:

mseopt = min_{π∈Π1} lim_{n→∞} Σ_{i=1}^n E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds ] / Σ_{i=1}^n E[Yi + Zi] (49)
s.t. lim_{n→∞} (1/n) Σ_{i=1}^n E[Yi + Zi] ≥ 1/fmax,

where mseopt is the optimal value of (49).

C. Reformulation of Problem (49)

In order to solve (49), let us consider the following MDP with a parameter c ≥ 0:

h(c) = min_{π∈Π1} lim_{n→∞} (1/n) Σ_{i=1}^n E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds − c(Yi + Zi) ] (50)
s.t. lim_{n→∞} (1/n) Σ_{i=1}^n E[Yi + Zi] ≥ 1/fmax,

where h(c) is the optimum value of (50). Similar to Dinkelbach's method [46] for nonlinear fractional programming, the following lemma holds for the MDP (49):

Lemma 7. [4] The following assertions are true:
(a). mseopt is greater than, equal to, or less than c if and only if h(c) is greater than, equal to, or less than 0, respectively.
(b). If h(c) = 0, the solutions to (49) and (50) are identical.

Hence, the solution to (49) can be obtained by solving (50) and seeking c = mseopt ≥ 0 such that

h(mseopt) = 0. (51)

D. Lagrangian Dual Problem of (50)

Next, we use the Lagrangian dual approach to solve (50) with c = mseopt. We define the Lagrangian associated with (50) as

L(π; λ) = lim_{n→∞} (1/n) Σ_{i=1}^n E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds − (mseopt + λ)(Yi + Zi) ] + λ/fmax, (52)

where λ ≥ 0 is the dual variable. Let

e(λ) = inf_{π∈Π1} L(π; λ). (53)

Then, the dual problem of (50) is defined by

d = max_{λ≥0} e(λ), (54)

where d is the optimum value of (54). Weak duality [47] implies d ≤ h(mseopt). In Section IV-F, we will establish strong duality, i.e., d = h(mseopt). In the sequel, we solve (53). Let us define

β = mseopt + λ. (55)

As shown in Appendix F, by using Lemma 5, we can obtain

E[ ∫_{Yi}^{Yi+Zi+Yi+1} Os² ds − β(Yi + Zi) ]
= E[ ∫_{Yi}^{Yi+Zi} (Os² − β) ds + γ O²_{Yi+Zi} ] + (σ²/(2θ)) [E(Yi+1) − γ] − β E[Yi+1]. (56)

For any s ≥ 0, define the σ-fields F_t^s = σ(Os+r − Os : r ∈ [0, t]) and F_t^{s+} = ∩r>t F_r^s. Then, {F_t^{s+}, t ≥ 0} is the filtration of the time-shifted OU process {Os+t − Os, t ∈ [0, ∞)}. Define Ms as the set of integrable stopping times of {Os+t − Os, t ∈ [0, ∞)}, i.e.,

Ms = {τ ≥ 0 : {τ ≤ t} ∈ F_t^{s+}, E[τ] < ∞}. (57)

By using a sufficient statistic of (53), we can obtain

Lemma 8. An optimal solution (Z0, Z1, . . .) to (53) is such that each Zi solves

min_{Zi∈M_{Yi}} E[ ∫_{Yi}^{Yi+Zi} (Os² − β) ds + γ O²_{Yi+Zi} | OYi, Yi ], (58)

where β ≥ 0 and γ ≥ 0 are defined in (55) and (25), respectively.

Proof. See Appendix G.

By this, (53) is decomposed into a series of per-sample MDPs (58).

E. Analytical Solution to the Per-Sample MDP (58)

We solve (58) by using the free-boundary approach for optimal stopping problems [11].
Let us consider an OU process V_t with initial state V_0 = v and parameter µ = 0. Define the σ-fields F_t^V = σ(V_s : s ∈ [0, t]) and F_t^{V+} = ∩_{r>t} F_r^V, and the filtration {F_t^{V+}, t ≥ 0} associated to {V_t, t ≥ 0}. Define M_V as the set of integrable stopping times of {V_t, t ∈ [0, ∞)}, i.e.,

  M_V = {τ ≥ 0 : {τ ≤ t} ∈ F_t^{V+}, E[τ] < ∞}.      (59)

Our goal is to solve the following optimal stopping problem for any given initial state v ∈ R:

  sup_{τ∈M_V} E_v[ −γV_τ² − ∫_0^τ (V_s² − β) ds ],      (60)

where E_v[·] is the conditional expectation for given initial state V_0 = v, and the supremum is taken over all stopping times τ of V_t. Hence, (58) is one instance of (60) with v = O_{Y_i}. In order to solve (60), we first find a candidate solution to (60) by solving a free boundary problem; then we prove that the free boundary solution is indeed the value function of (60).

1) A candidate solution to (60): Now, we show how to solve (60). The general optimal stopping theory in Chapter I of [11] tells us that the following guess of the stopping time should be optimal for Problem (60):

  τ* = inf{t ≥ 0 : |V_t| ≥ v*},      (61)

where v* ≥ 0 is the optimal stopping threshold to be found. Observe that in this guess, the continuation region (−v*, v*) is assumed symmetric around zero, since the OU process is symmetric, i.e., the process {−V_t, t ≥ 0} is also an OU process started at −v. Similarly, we may also argue that the value function should be even.

According to [11, Chapter 8] and [12, Chapter 10], the value function and the optimal stopping threshold v* should satisfy the following free boundary problem:

  (σ²/2) H″(v) − θv H′(v) = v² − β,  v ∈ (−v*, v*),      (62)
  H(±v*) = −γv*²,      (63)
  H′(±v*) = ∓2γv*.      (64)

In Appendix H, we use the integrating factor method [50, Sec. I.5] to find the general solution to (62), given by

  H(v) = −v²/(2θ) + (1/(2θ) − β/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) v² + C₁ erfi((√θ/σ)v) + C₂,  v ∈ (−v*, v*),      (65)

where C₁ and C₂ are constants to be found for satisfying (63)-(64), and erfi(x) is the imaginary error function, i.e.,

  erfi(x) = (2/√π) ∫_0^x e^{t²} dt.      (66)

Because H(v) should be even, C₁ = 0. In order to satisfy the boundary condition (63), C₂ is chosen as

  C₂ = (1/(2θ)) E[e^{−2θY_i}] v*² − (1/(2θ) − β/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v*²) v*²,

where we have used (25). By this, we obtain the expression of H(v) in the continuation region (−v*, v*). In the stopping region |v| ≥ v*, the stopping time in (61) is τ* = 0 since |V_0| = |v| ≥ v*. Hence, if |v| ≥ v*, the objective value achieved by the stopping time (61) is

  E_v[ −γv² − ∫_0^0 (V_s² − β) ds ] = −γv².

By this, we obtain a candidate of the value function for (60):

  H(v) = −v²/(2θ) + (1/(2θ) − β/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) v² + C₂,  if |v| < v*,
  H(v) = −γv²,  if |v| ≥ v*.      (67)

Next, we find v*. By taking the gradient of H(v), we get

  H′(v) = −v/θ + ( σ/θ^{3/2} − 2β/(σ√θ) ) F((√θ/σ)v),  v ∈ (−v*, v*),      (68)

where

  F(x) = e^{x²} ∫_0^x e^{−t²} dt.      (69)

The boundary condition (64) implies that v* is the root of

  −v/θ + ( σ/θ^{3/2} − 2β/(σ√θ) ) F((√θ/σ)v) = −2γv.      (70)

Substituting (13), (14), and (25) into (70) yields that v* is the root of

  (mse_∞ − β) G((√θ/σ)v) = mse_∞ − mse_{Y_i},      (71)

where G(·) is defined in (15). Solving (71), we obtain that v* can be expressed as a function v(β) of β, where v(β) is defined in (18). By this, we obtain a candidate solution to (60).

2) Verification of the optimality of the candidate solution: Next, we use Itô's formula to verify that the above candidate solution is indeed optimal, as stated in the following theorem:

Theorem 4. For all v ∈ R, H(v) in (67) is the value function of the optimal stopping problem (60). The optimal stopping time for solving (60) is τ* in (61), where v* = v(β) is given by (18).

In order to prove Theorem 4, we have established the following properties of H(v):

Lemma 9. H(v) = E_v[ −γV_{τ*}² − ∫_0^{τ*} (V_s² − β) ds ].

Proof. See Appendix I.

Lemma 10. H(v) ≥ −γv² for all v ∈ R.
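Before turning to the proofs, the threshold equation (71) can be checked numerically. The closed forms of G in (15) and v(β) in (18) are not reproduced in this excerpt, so the sketch below assumes G(x) = e^{x²}∫_0^x e^{−t²}dt / x with G(0) = 1 (equivalently (√π/2)e^{x²}erf(x)/x), which is consistent with the Maclaurin expansion (86) and the derivative computations in Appendix K; it also uses the fact, stated in Appendix J, that G is positive and increasing. The inputs mse_∞ and mse_{Y_i} are passed in as numbers (mse_∞ = 1 and mse_{Y_i} = 0.5 for σ = 1, θ = 0.5, Y_i ~ Exp(1), as in the Fig. 3 setup), and mse_{Y_i} ≤ β < mse_∞ is assumed, in line with Lemma 1 and (78):

```python
import math

def G(x):
    # assumed closed form: G(x) = e^{x^2} * int_0^x e^{-t^2} dt / x, G(0) = 1
    # (matches the expansion (86): G(x) = 1 + 2x^2/3 + o(x^2))
    if x == 0.0:
        return 1.0
    return 0.5 * math.sqrt(math.pi) * math.exp(x * x) * math.erf(x) / x

def v_threshold(beta, sigma, theta, mse_inf, mse_y):
    """Bisection root of (mse_inf - beta) G(sqrt(theta) v / sigma) = mse_inf - mse_y,
    i.e., equation (71); requires mse_y <= beta < mse_inf so the target is >= 1."""
    target = (mse_inf - mse_y) / (mse_inf - beta)
    lo, hi = 0.0, 1.0
    while G(math.sqrt(theta) * hi / sigma) < target:   # G increases from G(0) = 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if G(math.sqrt(theta) * mid / sigma) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The returned v satisfies (71) to machine precision; as β approaches mse_{Y_i} the threshold shrinks to 0 (sample as aggressively as the rate constraint allows), and as β grows toward mse_∞ the threshold diverges.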
Proof. See Appendix J.

A function f(v) is said to be excessive for the process V_t if

  E_v f(V_t) ≤ f(v),  ∀ t ≥ 0, v ∈ R.      (72)

By using Itô's formula in stochastic calculus, we can obtain

Lemma 11. The function H(v) is excessive for the process V_t.

Proof. See Appendix K.

Now, we are ready to prove Theorem 4.

Proof of Theorem 4. In Lemmas 9-11, we have shown that H(v) = E_v[ −γV_{τ*}² − ∫_0^{τ*} (V_s² − β) ds ], H(v) ≥ −γv², and H(v) is an excessive function. Moreover, from the proof of Lemma 9, we know that E_v[τ*] < ∞ holds for all v ∈ R. Hence, P_v(τ* < ∞) = 1 for all v ∈ R. These conditions and Theorem 1.11 in [11, Section 1.2] imply that τ* is an optimal stopping time of (60). This completes the proof.

An immediate consequence of Theorem 4 is

Corollary 1. A solution to (58) is (Z_1(β), Z_2(β), ...), where

  Z_i(β) = inf{t ≥ 0 : |O_{Y_i+t}| ≥ v(β)},      (73)

and v(β) is defined in (18).

F. Zero Duality Gap between (50) and (54)

Strong duality is established in the following theorem:

Theorem 5. If the service times Y_i are i.i.d. with 0 < E[Y_i] < ∞, then the duality gap between (50) and (54) is zero. Further, (Z_0(β), Z_1(β), ...) is an optimal solution to (50) and (54), where Z_i(β) is determined by

  Z_i(β) = inf{t ≥ 0 : |O_{Y_i+t}| ≥ v(β)},      (74)

v(β) is defined in (18), and β ≥ 0 is the root of

  β = E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_t² dt ] / E[Y_i + Z_i(β)],      (75)

if E[Y_i + Z_i(β)] > 1/f_max; otherwise, β is the root of E[Y_i + Z_i(β)] = 1/f_max. Further, the optimal objective value of (49) is given by

  mse_opt = E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_t² dt ] / E[Y_i + Z_i(β)].      (76)

Proof. We use [47, Prop. 6.2.5] to find a geometric multiplier for (50). This implies that the duality gap between (50) and (54) must be zero, because otherwise there exists no geometric multiplier [47, Prop. 6.2.3(b)]. The details are provided in Appendix L.

Hence, Theorem 1 follows from Theorem 5.

[Figure 3 here.] Fig. 3: MSE vs. f_max tradeoff for i.i.d. exponential service time with E[Y_i] = 1, where the parameters of the OU process are σ = 1 and θ = 0.5. (Curves plotted for f_max ∈ [0, 1.5]: uniform sampling, zero-wait sampling, age-optimal sampling, and MSE-optimal sampling.)

V. NUMERICAL RESULTS

In this section, we evaluate the estimation error achieved by the following four sampling policies:

1. Uniform sampling: Periodic sampling with a period given by S_{i+1} − S_i = 1/f_max.
2. Zero-wait sampling [1], [15]: The sampling policy given by

  S_{i+1} = S_i + Y_i,      (77)

which is infeasible when f_max < 1/E[Y_i].
3. Age-optimal sampling [13]: The sampling policy given by Theorem 3.
4. MSE-optimal sampling: The sampling policy given by Theorem 1.

Let mse_uniform, mse_zero-wait, mse_age-opt, and mse_opt be the MSEs of uniform sampling, zero-wait sampling, age-optimal sampling, and MSE-optimal sampling, respectively. We can obtain

  mse_{Y_i} ≤ mse_opt ≤ mse_age-opt ≤ mse_uniform ≤ mse_∞,
  mse_age-opt ≤ mse_zero-wait ≤ mse_∞,      (78)

whenever zero-wait sampling is feasible, which fits with our numerical results.

Figure 3 illustrates the tradeoff between the MSE and f_max for i.i.d. exponential service times with mean E[Y_i] = 1. Because E[Y_i] = 1, the maximum throughput of the queue is 1. The parameters of the OU process are σ = 1 and θ = 0.5, and µ can be chosen arbitrarily. The lower bound mse_{Y_i} is 0.5 and the upper bound mse_∞ is 1. In fact, as Y_i is an exponential random variable with mean 1, (σ²/2θ)(1 − e^{−2θY_i}) has a uniform distribution on [0, 1]. Hence, it is natural that mse_{Y_i} = 0.5. For small values of f_max, age-optimal sampling behaves similarly to uniform sampling, and hence mse_age-opt and mse_uniform take almost the same values. However, as f_max grows, mse_uniform reaches the upper bound and remains constant for f_max ≥ 1. This is because the queue length of uniform sampling is large at high sampling frequencies. In particular, when f_max ≥ 1, the
queue length of uniform sampling is infinite. On the other hand, mse_age-opt and mse_opt decrease with respect to f_max. The reason behind this is that the set of feasible sampling policies satisfying the constraint in (10) and (35) becomes larger as f_max grows, and hence the optimal values of (10) and (35) are decreasing in f_max. As expected, mse_zero-wait is larger than mse_opt and mse_age-opt. Moreover, all of them are between the lower bound and the upper bound.

VI. CONCLUSION

In this paper, we have studied the optimal sampler design for remote estimation of OU processes through queues. We have developed optimal sampling policies that minimize the estimation error of OU processes subject to a sampling rate constraint. These optimal sampling policies have nice structures and are easy to compute. A connection between remote estimation and nonlinear age metrics has been found. The structural properties of the optimal sampling policies shed light on the possible structure of optimal sampler designs for more general signal models, such as Feller processes, which is an important future research direction.

REFERENCES

[1] S. Kaul, R. D. Yates, and M. Gruteser, "Real-time status: How often should one update?" in IEEE INFOCOM, 2012.
[2] X. Song and J. W. S. Liu, "Performance of multiversion concurrency control algorithms in maintaining temporal consistency," in Proceedings, Fourteenth Annual International Computer Software and Applications Conference, Oct. 1990, pp. 132-139.
[3] Y. Sun, Y. Polyanskiy, and E. Uysal-Biyikoglu, "Remote estimation of the Wiener process over a channel with random delay," in IEEE ISIT, 2017.
[4] ——, "Sampling of the Wiener process for remote estimation over a channel with random delay," 2017, CoRR, abs/1707.02531.
[5] G. E. Uhlenbeck and L. S. Ornstein, "On the theory of the Brownian motion," Phys. Rev., vol. 36, pp. 823-841, Sept. 1930.
[6] J. L. Doob, "The Brownian movement and stochastic equations," Annals of Mathematics, vol. 43, no. 2, pp. 351-369, 1942.
[7] L. Evans, S. Keef, and J. Okunev, "Modelling real interest rates," Journal of Banking and Finance, vol. 18, no. 1, pp. 153-165, 1994.
[8] A. Cika, M.-A. Badiu, and J. P. Coon, "Quantifying link stability in ad hoc wireless networks subject to Ornstein-Uhlenbeck mobility," 2018, CoRR, abs/1811.03410.
[9] H. Kim, J. Park, M. Bennis, and S. Kim, "Massive UAV-to-ground communication and its stable movement control: A mean-field approach," in IEEE SPAWC, June 2018, pp. 1-5.
[10] M. Wilkinson and A. Pumir, "Spherical Ornstein-Uhlenbeck processes," Journal of Statistical Physics, vol. 145, no. 1, p. 113, Sept. 2011.
[11] G. Peskir and A. N. Shiryaev, Optimal Stopping and Free-Boundary Problems. Basel, Switzerland: Birkhäuser Verlag, 2006.
[12] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 5th ed. Springer-Verlag Berlin Heidelberg, 2000.
[13] Y. Sun and B. Cyr, "Sampling for data freshness optimization: Non-linear age functions," 2018, CoRR, abs/1812.07241.
[14] C. Kam, S. Kompella, G. D. Nguyen, and A. Ephremides, "Effect of message transmission path diversity on status age," IEEE Trans. Inf. Theory, vol. 62, no. 3, pp. 1360-1374, March 2016.
[15] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, "Update or wait: How to keep your data fresh," IEEE Trans. Inf. Theory, vol. 63, no. 11, pp. 7492-7508, Nov. 2017.
[16] C. Kam, S. Kompella, G. D. Nguyen, J. E. Wieselthier, and A. Ephremides, "On the age of information with packet deadlines," IEEE Trans. Inf. Theory, vol. 64, no. 9, pp. 6419-6428, Sept. 2018.
[17] R. D. Yates and S. K. Kaul, "The age of information: Real-time status updating by multiple sources," IEEE Trans. Inf. Theory, in press, 2018.
[18] Q. He, D. Yuan, and A. Ephremides, "Optimal link scheduling for age minimization in wireless systems," IEEE Trans. Inf. Theory, vol. 64, no. 7, pp. 5381-5394, July 2018.
[19] C. Joo and A. Eryilmaz, "Wireless scheduling for information freshness and synchrony: Drift-based design and heavy-traffic analysis," IEEE/ACM Trans. Netw., vol. 26, no. 6, pp. 2556-2568, Dec. 2018.
[20] A. M. Bedewy, Y. Sun, and N. B. Shroff, "Optimizing data freshness, throughput, and delay in multi-server information-update systems," in IEEE ISIT, 2016.
[21] ——, "Age-optimal information updates in multihop networks," in IEEE ISIT, 2017.
[22] ——, "Minimizing the age of the information through queues," 2017, CoRR, abs/1709.04956.
[23] ——, "The age of information in multihop networks," 2017, CoRR, abs/1712.10061.
[24] Y. Sun, E. Uysal-Biyikoglu, and S. Kompella, "Age-optimal updates of multiple information flows," in IEEE INFOCOM AoI Workshop, 2018.
[25] I. Kadota, A. Sinha, and E. Modiano, "Optimizing age of information in wireless networks with throughput constraints," in IEEE INFOCOM, April 2018, pp. 1844-1852.
[26] R. Talak, S. Karaman, and E. Modiano, "Optimizing information freshness in wireless networks under general interference constraints," in ACM MobiHoc, 2018.
[27] N. Lu, B. Ji, and B. Li, "Age-based scheduling: Improving data freshness for wireless real-time traffic," in ACM MobiHoc, 2018.
[28] B. Zhou and W. Saad, "Joint status sampling and updating for minimizing age of information in the internet of things," 2018, CoRR, abs/1807.04356.
[29] ——, "Minimizing age of information in the internet of things with non-uniform status packet sizes," Nov. 2018.
[30] B. Hajek, K. Mitzel, and S. Yang, "Paging and registration in cellular networks: Jointly optimal policies and an iterative algorithm," IEEE Trans. Inf. Theory, vol. 54, no. 2, pp. 608-622, Feb. 2008.
[31] G. M. Lipsa and N. C. Martins, "Remote state estimation with communication costs for first-order LTI systems," IEEE Trans. Auto. Control, vol. 56, no. 9, pp. 2013-2025, Sept. 2011.
[32] M. Rabi, G. V. Moustakides, and J. S. Baras, "Adaptive sampling for linear state estimation," SIAM Journal on Control and Optimization, vol. 50, no. 2, pp. 672-702, 2012.
[33] A. Nayyar, T. Başar, D. Teneketzis, and V. V. Veeravalli, "Optimal strategies for communication and remote estimation with an energy harvesting sensor," IEEE Trans. Auto. Control, vol. 58, no. 9, pp. 2246-2260, Sept. 2013.
[34] K. Nar and T. Başar, "Sampling multidimensional Wiener processes," in IEEE CDC, Dec. 2014, pp. 3426-3431.
[35] X. Gao, E. Akyol, and T. Başar, "Optimal communication scheduling and remote estimation over an additive noise channel," Automatica, vol. 88, pp. 57-69, 2018.
[36] J. Chakravorty and A. Mahajan, "Remote estimation over a packet-drop channel with Markovian state," 2018, CoRR, abs/1807.09706.
[37] P. J. Haas, Stochastic Petri Nets: Modelling, Stability, Simulation. New York, NY: Springer New York, 2002.
[38] S. M. Ross, Applied Probability Models with Optimization Applications. San Francisco, CA: Holden-Day, 1970.
[39] H. Mine and S. Osaki, Markovian Decision Processes. New York: Elsevier, 1970.
[40] D. Heyman and M. Sobel, Stochastic Models in Operations Research, Volume II: Stochastic Optimization. New York: McGraw-Hill, 1984.
[41] E. A. Feinberg, "Constrained semi-Markov decision processes with average rewards," Zeitschrift für Operations Research, vol. 39, no. 3, pp. 257-288, 1994.
[42] D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA: Athena Scientific, 2005, vol. 1.
[43] R. A. Maller, G. Müller, and A. Szimayer, "Ornstein-Uhlenbeck processes and extensions," in Handbook of Financial Time Series, T. Mikosch, J.-P. Kreiß, R. A. Davis, and T. G. Andersen, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 421-437.
[44] T. Soleymani, S. Hirche, and J. S. Baras, "Optimal information control in cyber-physical systems," IFAC-PapersOnLine, vol. 49, no. 22, pp. 1-6, 2016.
[45] H. Exton, Handbook of Hypergeometric Integrals: Theory, Applications, Tables, Computer Programs. Halsted Press, a division of J. Wiley, 1978.
[46] W. Dinkelbach, "On nonlinear fractional programming," Management Science, vol. 13, no. 7, pp. 492-498, 1967.
[47] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar, Convex Analysis and Optimization. Belmont, MA: Athena Scientific, 2003.
[48] A. N. Borodin and P. Salminen, Handbook of Brownian Motion – Facts and Formulae. Basel, Switzerland: Birkhäuser Verlag, 1996.
[49] S. M. Ross, Stochastic Processes, 2nd ed. John Wiley & Sons, Inc., 1996.
[50] H. Amann, Ordinary Differential Equations: An Introduction to Nonlinear Analysis. Berlin: Walter De Gruyter, 1990.
[51] T. M. Liggett, Continuous Time Markov Processes: An Introduction. Providence, RI: American Mathematical Society, 2010.
[52] R. Durrett, Probability: Theory and Examples, 4th ed. Cambridge University Press, 2010.
[53] C. Jia and G. Zhao, "Moderate maximal inequalities for the Ornstein-Uhlenbeck process," 2017, CoRR, abs/1711.00902.
[54] H. V. Poor, An Introduction to Signal Detection and Estimation, 2nd ed. New York, NY: Springer-Verlag, 1994.
[55] P. Mörters and Y. Peres, Brownian Motion. Cambridge University Press, 2010.

APPENDIX A
PROOF OF (9)

The MMSE estimator X̂_t can be expressed as

  X̂_t = E[X_t | M_t] = E[X_t | {(S_j, X_{S_j}, D_j) : D_j ≤ t}].      (79)

Recall that U_t = max{S_i : D_i ≤ t} is the generation time of the latest received sample at time t. According to the strong Markov property of X_t [11, Eq. (4.3.27)] and the fact that the Y_i's are independent of {X_t, t ≥ 0}, {U_t, X_{U_t}} is a sufficient statistic for estimating X_t based on {(S_j, X_{S_j}, D_j) : D_j ≤ t}. If t ∈ [D_i, D_{i+1}), (4) suggests that U_t = S_i and X_{U_t} = X_{S_i}. This and (8) tell us that, if t ∈ [D_i, D_{i+1}), then

  X̂_t = E[X_t | {(S_i, X_{S_i}, D_i) : D_i ≤ t}] = E[X_t | S_i, X_{S_i}] = X_{S_i} e^{−θ(t−S_i)} + µ(1 − e^{−θ(t−S_i)}).      (80)

This completes the proof.

APPENDIX B
PROOF OF LEMMA 4

In any signal-ignorant policy, because the sampling times S_i and the service times Y_i are both independent of {X_t, t ≥ 0}, the delivery times D_i are also independent of {X_t, t ≥ 0}. Hence, for any t ∈ [D_i, D_{i+1}),

  E[ (X_t − X̂_t)² | S_i, D_i, D_{i+1} ]
  (a)= E[ (σ²/2θ) e^{−2θ(t−S_i)} W²_{e^{2θ(t−S_i)}−1} | S_i, D_i, D_{i+1} ]
  (b)= (σ²/2θ) (1 − e^{−2θ(t−S_i)}),      (81)

where Step (a) is due to (8)-(9) and Step (b) is due to E[W_t²] = t for all constant t ≥ 0. We note that in signal-aware sampling policies,

  (X_t − X̂_t)² = (σ²/2θ) e^{−2θ(t−S_i)} W²_{e^{2θ(t−S_i)}−1}      (82)

could be correlated with (S_i, D_i, D_{i+1}), and hence Step (b) of (81) may not hold. Substituting (4) into (81) yields that for all t ≥ 0

  E[ (X_t − X̂_t)² | π, Y_1, Y_2, ... ] = (σ²/2θ) (1 − e^{−2θ∆_t}),      (83)

which is strictly increasing in ∆_t. This completes the proof.

APPENDIX C
PROOF OF (29)

When σ = 1, (71) can be expressed as

  (1 − 2θβ) G(√θ v) = E[e^{−2θY_i}].      (84)

The error function erf(x) has a Maclaurin series representation, given by

  erf(x) = (2/√π) ( x − x³/3 + o(x³) ).      (85)

Hence, the Maclaurin series representation of G(x) in (15) is

  G(x) = 1 + (2/3)x² + o(x²).      (86)

Letting x = √θ v, we get

  G(√θ v) = 1 + (2/3)θv² + o(θ).      (87)

In addition,

  E[e^{−2θY_i}] = 1 − 2θ E[Y_i] + o(θ).      (88)

Hence, (84) can be expressed as

  (1 − 2βθ) ( 1 + (2/3)θv² + o(θ) ) = 1 − 2θ E[Y_i] + o(θ).      (89)

Expanding (89) yields

  2θ E[Y_i] − 2βθ + (2/3)θv² + o(θ) = 0.      (90)

Dividing both sides of (90) by θ and letting θ → 0 yields

  v² − 3(β − E[Y_i]) = 0.      (91)

Equation (91) has two roots, −√(3(β − E[Y_i])) and √(3(β − E[Y_i])). If v* = −√(3(β − E[Y_i])), the free boundary problem in (62)-(64) is invalid. Hence, as θ → 0 and σ = 1, the root of (18) is v* = √(3(β − E[Y_i])). This completes the proof.

APPENDIX D
PROOF OF LEMMA 5

We first prove (42). It is known that the OU process O_t is a Feller process [51, Section 5.5]. By using a property of Feller
processes in [51, Theorem 3.32], we get that

  O_t² − ∫_0^t G(O_s²) ds = O_t² − ∫_0^t (−θO_s · 2O_s + σ²) ds = O_t² − σ²t + 2θ ∫_0^t O_s² ds      (92)

is a martingale, where G is the infinitesimal generator of O_t defined in (41). According to [52], the minimum of two stopping times is a stopping time and constant times are stopping times. Hence, t∧τ is a bounded stopping time for every t ∈ [0, ∞), where x∧y = min{x, y}. Then, by Theorem 8.5.1 of [52], for every t ∈ [0, ∞),

  E[ ∫_0^{t∧τ} O_s² ds ] = (σ²/2θ) E[t∧τ] − (1/2θ) E[O²_{t∧τ}].      (93)

Because E[ ∫_0^{t∧τ} O_s² ds ] and E[t∧τ] are positive and increasing with respect to t, by using the monotone convergence theorem [52, Theorem 1.5.5], we get

  lim_{t→∞} E[ ∫_0^{t∧τ} O_s² ds ] = E[ ∫_0^τ O_s² ds ],      (94)
  lim_{t→∞} E[t∧τ] = E[τ].      (95)

In addition, according to [53, Theorem 2.2],

  E[ max_{0≤s≤τ} O_s² ] ≤ (C/θ) E[ log(1 + θτ/σ) ] ≤ (C/σ) E[τ] < ∞.      (96)

Because O²_{t∧τ} ≤ sup_{0≤s≤τ} O_s² for all t and sup_{0≤s≤τ} O_s² is integrable, by invoking the dominated convergence theorem [52, Theorem 1.5.6], we have

  lim_{t→∞} E[O²_{t∧τ}] = E[O_τ²].      (97)

Combining (94)-(97), (42) is proven.

We now prove (43) and (44). By using the proof arguments in Appendix H, one can show that R₁(v) in (26) satisfies

  (σ²/2) R₁″(v) − θv R₁′(v) = 1,      (98)

and R₂(v) in (27) satisfies

  (σ²/2) R₂″(v) − θv R₂′(v) = v².      (99)

In addition, R₁(v) and R₂(v) are twice continuously differentiable. According to Dynkin's formula in [12, Theorem 7.4.1 and the remark afterwards], because the initial value of O_t is O_0 = 0, if τ is the first exit time of a bounded set, then

  E_0[R₁(O_τ)] = R₁(0) + E_0[ ∫_0^τ 1 ds ] = R₁(0) + E_0[τ],      (100)
  E_0[R₂(O_τ)] = R₂(0) + E_0[ ∫_0^τ O_s² ds ].      (101)

Because R₁(0) = R₂(0) = 0, (43) and (44) follow. This completes the proof.

APPENDIX E
PROOF OF LEMMA 6

Suppose that in the sampling policy π, sample i is generated when the server is busy sending another sample, and hence sample i needs to wait for some time before being submitted to the server, i.e., S_i < G_i. Let us consider a virtual sampling policy π′ = {S_0, ..., S_{i−1}, G_i, S_{i+1}, ...} such that the generation time of sample i is postponed from S_i to G_i. We call policy π′ a virtual policy because it may happen that G_i > S_{i+1}. However, this will not affect our proof below. We will show that the MSE of the sampling policy π′ is smaller than that of the sampling policy π = {S_0, ..., S_{i−1}, S_i, S_{i+1}, ...}.

Note that {X_t : t ∈ [0, ∞)} does not change according to the sampling policy, and the sample delivery times {D_0, D_1, D_2, ...} remain the same in policy π and policy π′. Hence, the only difference between policies π and π′ is that the generation time of sample i is postponed from S_i to G_i. The MMSE estimator under policy π is given by (9), and the MMSE estimator under policy π′ is given by

  X̂_t′ = E[X_t | (S_j, X_{S_j}, D_j)_{j≤i−1}, (G_i, X_{G_i}, D_i)]
       = E[X_t | X_{G_i}, G_i],  t ∈ [D_i, D_{i+1});
       = E[X_t | X_{S_j}, S_j],  t ∈ [D_j, D_{j+1}), j ≠ i.      (102)

Next, we consider a third virtual sampling policy π″ in which the samples (X_{G_i}, G_i) and (X_{S_i}, S_i) are both delivered to the estimator at the same time D_i. Clearly, the estimator under policy π″ has more information than those under policies π and π′. By following the arguments in Appendix A, one can show that the MMSE estimator under policy π″ is

  X̂_t″ = E[X_t | (S_j, X_{S_j}, D_j)_{j≤i}, (G_i, X_{G_i}, D_i)]
       = E[X_t | X_{G_i}, G_i],  t ∈ [D_i, D_{i+1});
       = E[X_t | X_{S_j}, S_j],  t ∈ [D_j, D_{j+1}), j ≠ i.      (103)

Notice that, because of the strong Markov property of the OU process, the estimator under policy π″ uses the fresher sample (X_{G_i}, G_i), instead of the stale sample (X_{S_i}, S_i), to construct X̂_t″ during [D_i, D_{i+1}). Because the estimator under policy π″ has more information than that under policy π, one can imagine that policy π″ has a smaller estimation error than policy π, i.e.,

  E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t)² dt ] ≥ E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t″)² dt ].      (104)

To prove (104), we invoke the orthogonality principle of the MMSE estimator [54, Prop. V.C.2] under policy π″ and obtain

  E[ ∫_{D_i}^{D_{i+1}} 2(X_t − X̂_t″)(X̂_t″ − X̂_t) dt ] = 0,      (105)

where we have used the fact that (X_{G_i}, G_i) and (X_{S_i}, S_i) are available to the MMSE estimator under policy π″. Next, from
(105), we can get

  E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t)² dt ]
  = E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t″)² + (X̂_t″ − X̂_t)² dt ] + E[ ∫_{D_i}^{D_{i+1}} 2(X_t − X̂_t″)(X̂_t″ − X̂_t) dt ]
  = E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t″)² + (X̂_t″ − X̂_t)² dt ]
  ≥ E[ ∫_{D_i}^{D_{i+1}} (X_t − X̂_t″)² dt ].      (106)

In other words, the estimation error of policy π″ is no greater than that of policy π. Furthermore, by comparing (102) and (103), we can see that the MMSE estimators under policies π″ and π′ are exactly the same. Therefore, the estimation error of policy π′ is no greater than that of policy π.

By repeating the above arguments for all samples i satisfying S_i < G_i, one can show that the sampling policy {S_0, G_1, ..., G_{i−1}, G_i, G_{i+1}, ...} is better than the sampling policy π = {S_0, S_1, ..., S_{i−1}, S_i, S_{i+1}, ...}. This completes the proof.

APPENDIX F
PROOF OF (56)

According to Lemma 5,

  E[ ∫_{Y_i+Z_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds ] = (σ²/2θ) E[Y_{i+1}] − (1/2θ) E[ O²_{Y_i+Z_i+Y_{i+1}} − O²_{Y_i+Z_i} ].      (107)

The second term in (107) can be expressed as

  E[ O²_{Y_i+Z_i+Y_{i+1}} − O²_{Y_i+Z_i} ]
  = E[ ( O_{Y_i+Z_i} e^{−θY_{i+1}} + (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} )² − O²_{Y_i+Z_i} ]
  = E[ O²_{Y_i+Z_i} (e^{−2θY_{i+1}} − 1) + (σ²/2θ) e^{−2θY_{i+1}} W²_{e^{2θY_{i+1}}−1} ]
    + E[ 2 O_{Y_i+Z_i} e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} ].      (108)

Because Y_{i+1} is independent of O_{Y_i+Z_i} and W_t, we have

  E[ O²_{Y_i+Z_i} (e^{−2θY_{i+1}} − 1) ] = E[ O²_{Y_i+Z_i} ] E[ e^{−2θY_{i+1}} − 1 ],      (109)

and

  E[ 2 O_{Y_i+Z_i} e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} ]
  = E[2 O_{Y_i+Z_i}] E[ e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} ]
  (a)= E[2 O_{Y_i+Z_i}] E[ E[ e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} | Y_{i+1} ] ],      (110)

where Step (a) is due to the law of iterated expectations. Because E[W_t] = 0 for all constant t ≥ 0, it holds for all realizations of Y_{i+1} that

  E[ e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} | Y_{i+1} ] = 0.      (111)

Hence,

  E[ 2 O_{Y_i+Z_i} e^{−θY_{i+1}} (σ/√(2θ)) e^{−θY_{i+1}} W_{e^{2θY_{i+1}}−1} ] = 0.      (112)

In addition,

  E[ (σ²/2θ) e^{−2θY_{i+1}} W²_{e^{2θY_{i+1}}−1} ]
  (a)= (σ²/2θ) E[ E[ e^{−2θY_{i+1}} W²_{e^{2θY_{i+1}}−1} | Y_{i+1} ] ]
  (b)= (σ²/2θ) E[ 1 − e^{−2θY_{i+1}} ],      (113)

where Step (a) is due to the law of iterated expectations and Step (b) is due to E[W_t²] = t for all constant t ≥ 0. Hence,

  E[ ∫_{Y_i+Z_i}^{Y_i+Z_i+Y_{i+1}} O_s² ds ]
  = (σ²/2θ) E[Y_{i+1}] + γ E[O²_{Y_i+Z_i}] − (σ²/4θ²) E[1 − e^{−2θY_{i+1}}]
  = (σ²/2θ) [E(Y_{i+1}) − γ] + E[O²_{Y_i+Z_i}] γ,      (114)

where γ is defined in (25). Using this, (56) can be shown readily.

APPENDIX G
PROOF OF LEMMA 8

Because the Y_i's are i.i.d., (56) is determined by the control decision Z_i and the information (O_{Y_i}, Y_i). Hence, (O_{Y_i}, Y_i) is a sufficient statistic for determining Z_i in (53). Therefore, there exists an optimal policy (Z_0, Z_1, ...) to (53) in which Z_i is determined based only on (O_{Y_i}, Y_i). By this, (53) is decomposed into a sequence of per-sample MDPs, given by (58). This completes the proof.

APPENDIX H
PROOF OF (65)

Define S(v) = H′(v). Then, (62) becomes

  S′(v) − (2θ/σ²) v S(v) = (2/σ²)(v² − β).      (115)
Equation (115) can be solved by using the integrating factor method [50, Sec. I.5], which applies to any ODE of the form

  S′(v) + a(v) S(v) = b(v).      (116)

In the case of (115),

  a(v) = −(2θ/σ²) v,  b(v) = (2/σ²)(v² − β).      (117)

The integrating factor of (115) is

  M(v) = e^{∫ a(v) dv} = e^{−(θ/σ²)v²}.      (118)

Multiplying both sides of (115) by e^{−(θ/σ²)v²} and transforming the left-hand side into a total derivative yields

  [ S(v) e^{−(θ/σ²)v²} ]′ = b(v) e^{−(θ/σ²)v²}.      (119)

Integrating both sides of (119) yields

  S(v) e^{−(θ/σ²)v²} = ∫ (2/σ²)(v² − β) e^{−(θ/σ²)v²} dv = ∫ (2/σ²) v² e^{−(θ/σ²)v²} dv − ∫ (2/σ²) β e^{−(θ/σ²)v²} dv.      (120)

The indefinite integrals in (120) are given by

  ∫ (2/σ²) v² e^{−(θ/σ²)v²} dv = ( √π σ / (2θ^{3/2}) ) erf((√θ/σ)v) − (v/θ) e^{−(θ/σ²)v²} + C₁,      (121)

  ∫ (2/σ²) β e^{−(θ/σ²)v²} dv = ( √π β / (σ√θ) ) erf((√θ/σ)v) + C₂,      (122)

where erf(·) is the error function defined in (16). Combining (120)-(122) results in

  S(v) = ( √π σ / (2θ^{3/2}) − √π β / (σ√θ) ) erf((√θ/σ)v) e^{(θ/σ²)v²} − v/θ + C₃ e^{(θ/σ²)v²},      (123)

where C₃ = C₁ − C₂. We need to integrate (123) again to get H(v), which requires the following integrals:

  ∫ ( √π σ / (2θ^{3/2}) − √π β / (σ√θ) ) erf((√θ/σ)v) e^{(θ/σ²)v²} dv = ( 1/(2θ) − β/σ² ) v² ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) + C₄,      (124)

  ∫ C₃ e^{(θ/σ²)v²} dv = C₅ erfi((√θ/σ)v) + C₆,      (125)

  ∫ ( −v/θ ) dv = −v²/(2θ) + C₇,      (126)

where erfi(·) is the imaginary error function defined in (66). Hence, H(v) is given by (65). This completes the proof of (65).

APPENDIX I
PROOF OF LEMMA 9

The proof of Lemma 9 consists of the following two cases:

Case 1: If |v| ≥ v*, (61) implies τ* = 0. Hence,

  E_v[ τ* | |v| ≥ v* ] = E_v[ ∫_0^{τ*} 1 ds | |v| ≥ v* ] = 0,      (127)

and

  E_v[ ∫_0^{τ*} V_s² ds | |v| ≥ v* ] = 0.      (128)

Because V_0 = v, we have

  E_v[V_{τ*}²] = E_v[V_0²] = v².      (129)

By combining (127)-(129), we get

  E_v[ −γV_{τ*}² − ∫_0^{τ*} (V_s² − β) ds | |v| ≥ v* ] = −γv².      (130)

Case 2: If |v| < v*, (61) tells us that, almost surely,

  |V_{τ*}| = v*.      (131)

Similar to the proof of Lemma 3, we can use Lemma 5 to obtain

  E_v[ τ* | |v| < v* ] = E_v[ ∫_0^{τ*} 1 ds | |v| < v* ] = R₁(v*) − R₁(v)
  = (v*²/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v*²) − (v²/σ²) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²),      (132)

  E_v[ ∫_0^{τ*} V_s² ds | |v| < v* ] = R₂(v*) − R₂(v)
  = −v*²/(2θ) + (v*²/(2θ)) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v*²) + v²/(2θ) − (v²/(2θ)) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²),      (133)

and

  E_v[ V_{τ*}² | |v| < v* ] = v*².      (134)

Combining (132)-(134) yields

  E_v[ −γV_{τ*}² − ∫_0^{τ*} (V_s² − β) ds | |v| < v* ]
  = −v²/(2θ) + ( 1/(2θ) − β/σ² ) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v²) v²
    + (1/(2θ)) E[e^{−2θY_i}] v*² − ( 1/(2θ) − β/σ² ) ₂F₂(1, 1; 3/2, 2; (θ/σ²)v*²) v*².      (135)

By combining (130) and (135), Lemma 9 is proven.

APPENDIX J
PROOF OF LEMMA 10

The proof of Lemma 10 consists of the following two cases:
Case 1: If |v| ≥ v*, (67) tells us that

  H(v) = −γv².      (136)

Hence, Lemma 10 holds in Case 1.

Case 2: |v| < v*. Because H(v) is an even function and H(v) = −γv² holds at v = ±v*, to prove H(v) ≥ −γv² for |v| < v*, it is sufficient to show that for all v ∈ [0, v*)

  H′(v) < [−γv²]′ = −2γv.      (137)

Hence, the remaining task is to prove that (137) holds for v ∈ [0, v*). After some manipulations, we can obtain from (71) that

  (mse_∞ − β) G((√θ/σ)v*) = mse_∞ E(e^{−2θY_i}).      (138)

Because G(·) > 0 is an increasing function, it holds for all v ∈ [0, v*) that

  (mse_∞ − β) G((√θ/σ)v) < (mse_∞ − β) G((√θ/σ)v*) = mse_∞ E(e^{−2θY_i}).      (139)

One can obtain (137) from (68) and (139). Hence, Lemma 10 also holds in Case 2. This completes the proof.

APPENDIX K
PROOF OF LEMMA 11

We need the following lemma in the proof of Lemma 11:

Lemma 12. (1 − 2x²) G(x) ≤ 1 for all x ≥ 0.

Proof. Because G(0) = 1, it suffices to show that for all x > 0

  [ (1 − 2x²) G(x) ]′ ≤ 0.      (140)

We have

  [ (1 − 2x²) G(x) ]′ = −(1/x²) e^{x²} ∫_0^x e^{−t²} dt + 1/x − 4x² e^{x²} ∫_0^x e^{−t²} dt − 2x.      (141)

Because e^{−t²} is decreasing on t ∈ [0, ∞), for all x > 0

  ∫_0^x e^{−t²} dt ≥ ∫_0^x e^{−x²} dt = x e^{−x²}.      (142)

Hence,

  −(1/x²) e^{x²} ∫_0^x e^{−t²} dt + 1/x ≤ 0.      (143)

Substituting (143) into (141), (140) follows. This completes the proof.

Now we are ready to prove Lemma 11.

Proof of Lemma 11. The function H(v) is continuously differentiable on R. In addition, H″(v) is continuous everywhere but at v = ±v*. Since the Lebesgue measure of those times t for which V_t = ±v* is zero, the values H″(±v*) can be chosen in the sequel arbitrarily. By using Itô's formula [55, Theorem 7.13], we obtain that almost surely

  H(V_t) − H(v) = ∫_0^t [ (σ²/2) H″(V_r) − θV_r H′(V_r) − (V_r² − β) ] dr + ∫_0^t σ H′(V_r) dW_r.      (144)

For all t ≥ 0 and all v ∈ R, we can show that

  E_v[ ∫_0^t (σ H′(V_r))² dr ] < ∞.

This and [55, Theorem 7.11] imply that ∫_0^t σH′(V_r) dW_r is a martingale and

  E_v[ ∫_0^t σ H′(V_r) dW_r ] = 0,  ∀ t ≥ 0.      (145)

Hence,

  E_v[ H(V_t) − H(v) ] = E_v[ ∫_0^t ( (σ²/2) H″(V_r) − θV_r H′(V_r) − (V_r² − β) ) dr ].      (146)

Next, we show that for all v ∈ R

  (σ²/2) H″(v) − θv H′(v) − (v² − β) ≤ 0.      (147)

Let us consider the following two cases:

Case 1: If |v| < v*, then (62) implies

  (σ²/2) H″(v) − θv H′(v) − (v² − β) = 0.      (148)

Case 2: |v| > v*. In this case, H(v) = −γv². Hence,

  (σ²/2) H″(v) − θv H′(v) = (σ²/2)(−2γ) − θv(−2γv) = −σ²γ + 2θγv² = −mse_{Y_i} + E[1 − e^{−2θY_i}] v².      (149)

Substituting (149) into (147) yields

  E[e^{−2θY_i}] v² ≥ β − mse_{Y_i}.      (150)

To prove (150), since |v| > v*, it suffices to show that

  E[e^{−2θY_i}] v*² ≥ β − mse_{Y_i},      (151)

which is equivalent to

  (v*²/mse_∞) (mse_∞ − mse_{Y_i}) ≥ (mse_∞ − mse_{Y_i}) − (mse_∞ − β).      (152)

By Lemma 12, we get

  ( 1 − (2θ/σ²) v*² ) G((√θ/σ)v*) ≤ 1.      (153)
Now we are ready to prove Lemma 11.

Proof of Lemma 11. The function H(v) is continuously differentiable on R. In addition, H″(v) is continuous everywhere but at v = ±v_*. Since the Lebesgue measure of those times t for which V_t = ±v_* is zero, the values H″(±v_*) can be defined arbitrarily. Hence,

  E_v[H(V_t) − H(v)] = E_v[ ∫_0^t ( (σ²/2) H″(V_r) − θ V_r H′(V_r) − (V_r² − β) ) dr ].  (146)

Next, we show that for all v ∈ R

  (σ²/2) H″(v) − θ v H′(v) − (v² − β) ≤ 0.  (147)

Let us consider the following two cases:

Case 1: If |v| < v_*, then (62) implies

  (σ²/2) H″(v) − θ v H′(v) − (v² − β) = 0.  (148)

Case 2: |v| > v_*. In this case, H(v) = −γv². Hence,

  (σ²/2) H″(v) − θ v H′(v) = (σ²/2)(−2γ) − θ v (−2γv) = −σ²γ + 2θγv² = −mse_{Y_i} + E[1 − e^{−2θY_i}] v².  (149)

Substituting (149) into (147) yields

  E[e^{−2θY_i}] v² ≥ β − mse_{Y_i}.  (150)

To prove (150), since |v| > v_*, it suffices to show that

  E[e^{−2θY_i}] v_*² ≥ β − mse_{Y_i},  (151)

which is equivalent to

  (v_*²/mse_∞)(mse_∞ − mse_{Y_i}) ≥ (mse_∞ − mse_{Y_i}) − (mse_∞ − β).  (152)

By Lemma 12, we get

  (1 − (2θ/σ²) v_*²) G((√θ/σ) v_*) ≤ 1.  (153)

Hence,

  (1 − v_*²/mse_∞) G((√θ/σ) v_*) ≤ 1.  (154)

By substituting (71) into (154), we obtain

  (mse_∞ − mse_{Y_i}) (1 − v_*²/mse_∞) G((√θ/σ) v_*) ≤ (mse_∞ − β) G((√θ/σ) v_*).  (155)

Because G(x) > 0 for all x > 0,

  (mse_∞ − mse_{Y_i}) (1 − v_*²/mse_∞) ≤ mse_∞ − β,  (156)

which implies (152). Hence, (147) holds in both cases. Thus, E_v[H(V_t) − H(v)] ≤ 0 holds for all t ≥ 0 and v ∈ R. This completes the proof.

APPENDIX L
PROOF OF THEOREM 5

According to [47, Prop. 6.2.5], if we can find π⋆ = (Z_1, Z_2, . . .) and λ⋆ that satisfy the following conditions:

  π⋆ ∈ Π_1,  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} E[Y_i + Z_i] − 1/f_max ≥ 0,  (157)

  λ⋆ ≥ 0,  (158)

  L(π⋆; λ⋆) = inf_{π ∈ Π_1} L(π; λ⋆),  (159)

  λ⋆ { lim_{n→∞} (1/n) ∑_{i=0}^{n−1} E[Y_i + Z_i] − 1/f_max } = 0,  (160)

then π⋆ is an optimal solution to (50) and λ⋆ is a geometric multiplier [47] for (50). Further, if we can find such π⋆ and λ⋆, then the duality gap between (50) and (54) must be zero, because otherwise there is no geometric multiplier [47, Prop. 6.2.3(b)]. The remaining task is to find π⋆ and λ⋆ that satisfy (157)-(160).

According to Theorem 4 and Corollary 1, a solution π⋆ = (Z_0(β), Z_1(β), . . .) to (159) is given by (73), where β = mse_opt + λ⋆. In addition, because the Y_i's are i.i.d., the Z_i(β)'s in policy π⋆ are i.i.d. Using (157), (158), and (160), the value of λ⋆ can be obtained by considering two cases: If λ⋆ > 0, because the Z_i(β)'s are i.i.d., we have from (160) that

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} E[Y_i + Z_i(β)] = E[Y_i + Z_i(β)] = 1/f_max.  (161)

If λ⋆ = 0, then (157) implies

  lim_{n→∞} (1/n) ∑_{i=0}^{n−1} E[Y_i + Z_i(β)] = E[Y_i + Z_i(β)] ≥ 1/f_max.  (162)

Next, we compute mse_opt and β = mse_opt + λ⋆. To compute mse_opt, we substitute policy π⋆ into (49), which yields

  mse_opt = lim_{n→∞} ( ∑_{i=0}^{n−1} E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds ] ) / ( ∑_{i=0}^{n−1} E[Y_i + Z_i(β)] )
          = E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds ] / E[Y_i + Z_i(β)],  (163)

where in the last equation we have used that the Z_i(β)'s are i.i.d. Hence, the value of β = mse_opt + λ⋆ can be obtained by considering the following two cases:

Case 1: If λ⋆ > 0, then (163) and (161) imply that

  E[Y_i + Z_i(β)] = 1/f_max,  (164)

  β > mse_opt = E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds ] / E[Y_i + Z_i(β)].  (165)

Case 2: If λ⋆ = 0, then (163) and (162) imply that

  E[Y_i + Z_i(β)] ≥ 1/f_max,  (166)

  β = mse_opt = E[ ∫_{Y_i}^{Y_i+Z_i(β)+Y_{i+1}} O_s² ds ] / E[Y_i + Z_i(β)].  (167)

Hence, if we choose π⋆ = (Z_0(β), Z_1(β), . . .), where Z_i(β) is given by (74) and β satisfies (164)-(167), and choose λ⋆ = β − mse_opt, then the selected π⋆ and λ⋆ satisfy (157)-(160). By [47, Prop. 6.2.3(b)], the duality gap between (50) and (54) is zero. A solution to (50) and (54) is π⋆. This completes the proof.

APPENDIX M
PROOF OF LEMMA 1

We first prove mse_opt ≤ β. Using (55) and λ ≥ 0, it follows that mse_opt ≤ β.

Next, we prove β ≤ mse_∞. According to (13) and (14), we know that mse_{Y_i} ≤ mse_∞. Because G(x) ≥ 1 for all x ≥ 0, (71) implies β ≤ mse_∞.

Finally, we prove mse_{Y_i} ≤ mse_opt. We first consider the special case of f_max = ∞. In this case, (19) and (22) tell us that mse_opt = β. By using (71) and the fact that G(x) ≥ 1 for all x ≥ 0, it follows that mse_{Y_i} ≤ β = mse_opt holds when f_max = ∞. On the other hand, the set of feasible policies of Problem (10) becomes larger as f_max increases. Hence, mse_opt is decreasing in f_max. We have shown that mse_{Y_i} ≤ mse_opt holds in the special case f_max = ∞. Therefore, mse_{Y_i} ≤ mse_opt must hold for all possible positive values of f_max. This completes the proof.
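The last equality in (163) is an instance of the renewal-reward theorem: when the cycle lengths and the per-cycle costs are i.i.d. across cycles, the long-run ratio of sums converges to the ratio of per-cycle expectations. A minimal simulation of that identity with toy stand-ins (uniform cycle lengths and a synthetic per-cycle cost; these distributions are illustrative assumptions, not the ones induced by the sampling policy):

```python
import random

random.seed(7)

n = 200_000
total_cost = 0.0
total_length = 0.0
for _ in range(n):
    L = 1.0 + random.random()   # toy i.i.d. cycle length, uniform on [1, 2]
    R = L * L                   # toy i.i.d. per-cycle cost, standing in for
                                # the error integral accumulated over one cycle
    total_cost += R
    total_length += L

time_average = total_cost / total_length   # (sum of R_i) / (sum of L_i)
ratio_of_means = (7.0 / 3.0) / 1.5         # E[(1+U)^2] / E[1+U] = (7/3) / (3/2)
print(time_average, ratio_of_means)
```

Here R plays the role of the per-cycle integral of O_s² and L the role of Y_i + Z_i(β); by the strong law of large numbers the two printed values agree up to Monte Carlo error.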
APPENDIX N
PROOF OF LEMMA 3

According to (8) and (9), the estimation error (X_t − X̂_t) has the same distribution as O_{t−S_i}(β) for t ∈ [D_i(β), D_{i+1}(β)). We will use (X_t − X̂_t) and O_{t−S_i}(β) interchangeably for t ∈ [D_i(β), D_{i+1}(β)). In order to prove Lemma 3, we need to consider the following two cases:

Case 1: If |X_{D_i(β)} − X̂_{D_i(β)}| = |O_{Y_i}| ≥ v(β), then (17) tells us S_{i+1}(β) = D_i(β). Hence,

  D_i(β) = S_i(β) + Y_i,  (168)

  D_{i+1}(β) = S_{i+1}(β) + Y_{i+1} = D_i(β) + Y_{i+1}.  (169)

Using these and the fact that the Y_i's are independent of the OU process, we can obtain

  E[ D_{i+1}(β) − D_i(β) | O_{Y_i}, |O_{Y_i}| ≥ v(β) ] = E[Y_{i+1}],  (170)

and

  E[ ∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| ≥ v(β) ]
  = E[ ∫_{Y_i}^{Y_i+Y_{i+1}} O_s² ds | O_{Y_i}, |O_{Y_i}| ≥ v(β) ]
  (a) = (σ²/2θ) E[Y_{i+1}] + γ O_{Y_i}² − (σ²/4θ²) E[1 − e^{−2θY_{i+1}}]
  = mse_∞ [E(Y_{i+1}) − γ] + O_{Y_i}² γ,  (171)

where Step (a) follows from the proof of (114).

Case 2: If |X_{D_i(β)} − X̂_{D_i(β)}| = |O_{Y_i}| < v(β), then (17) tells us that, almost surely,

  |X_{S_{i+1}(β)} − X̂_{S_{i+1}(β)}| = v(β).  (172)

By invoking Lemma 5, we can obtain

  E[ S_{i+1}(β) − S_i(β) | O_{Y_i}, |O_{Y_i}| < v(β) ] = R_1(v(β)),  (173)

  E[ D_i(β) − S_i(β) | O_{Y_i}, |O_{Y_i}| < v(β) ] = R_1(|O_{Y_i}|).  (174)

Using (173), (174), and D_{i+1}(β) = S_{i+1}(β) + Y_{i+1}, we get

  E[ D_{i+1}(β) − D_i(β) | O_{Y_i}, |O_{Y_i}| < v(β) ]
  = E[ (D_{i+1}(β) − S_{i+1}(β)) + (S_{i+1}(β) − S_i(β)) − (D_i(β) − S_i(β)) | O_{Y_i}, |O_{Y_i}| < v(β) ]
  = E[Y_{i+1}] + R_1(v(β)) − R_1(|O_{Y_i}|).  (175)

In addition, by invoking Lemma 5 again, we can obtain

  E[ ∫_{S_i(β)}^{S_{i+1}(β)} O_s² ds | O_{Y_i}, |O_{Y_i}| < v(β) ] = R_2(v(β)),  (176)

  E[ ∫_{S_i(β)}^{D_i(β)} O_s² ds | O_{Y_i}, |O_{Y_i}| < v(β) ] = R_2(|O_{Y_i}|).  (177)

By using (176), (177), and (114), we have

  E[ ∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| < v(β) ]
  = E[ ∫_{S_{i+1}(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt + ∫_{S_i(β)}^{S_{i+1}(β)} (X_t − X̂_t)² dt − ∫_{S_i(β)}^{D_i(β)} (X_t − X̂_t)² dt | O_{Y_i}, |O_{Y_i}| < v(β) ]
  = mse_∞ [E(Y_{i+1}) − γ] + v²(β) γ + R_2(v(β)) − R_2(|O_{Y_i}|).  (178)

Combining (170) and (175) of the two cases yields

  E[ D_{i+1}(β) − D_i(β) | O_{Y_i} ] = E[Y_{i+1}] + max{R_1(v(β)) − R_1(|O_{Y_i}|), 0}.  (179)

Similarly, combining (171) and (178) of the two cases yields

  E[ ∫_{D_i(β)}^{D_{i+1}(β)} (X_t − X̂_t)² dt | O_{Y_i} ]
  = mse_∞ [E(Y_{i+1}) − γ] + max{v²(β), O_{Y_i}²} γ + max{R_2(v(β)) − R_2(|O_{Y_i}|), 0}.  (180)

Finally, by taking the expectation over O_{Y_i} in (179) and (180) and using the fact that R_1(·) and R_2(·) are even functions, Lemma 3 is proven.
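Step (a) of (171) rests on the second moment of the OU error process started from zero, E[O_s²] = (σ²/2θ)(1 − e^{−2θs}), so that for a deterministic horizon T, E[∫_0^T O_s² ds] = mse_∞ T − (σ²/4θ²)(1 − e^{−2θT}). A Monte Carlo sketch of this identity using the exact one-step OU transition (the parameters, step size, and trial count are illustrative choices, not from the paper):

```python
import math
import random

random.seed(1)

theta, sigma, T = 1.0, 1.0, 2.0
dt, trials = 0.002, 2000
steps = int(T / dt)

# Exact one-step OU transition starting from O_0 = 0:
# O_{t+dt} = O_t e^{-theta dt} + N(0, sigma^2 (1 - e^{-2 theta dt}) / (2 theta))
decay = math.exp(-theta * dt)
step_std = sigma * math.sqrt((1.0 - math.exp(-2.0 * theta * dt)) / (2.0 * theta))

acc = 0.0
for _ in range(trials):
    o, integral = 0.0, 0.0
    for _ in range(steps):
        integral += o * o * dt              # left Riemann sum of O_s^2
        o = o * decay + random.gauss(0.0, step_std)
    acc += integral
mc = acc / trials

mse_inf = sigma**2 / (2.0 * theta)
closed_form = mse_inf * T - sigma**2 / (4.0 * theta**2) * (1.0 - math.exp(-2.0 * theta * T))
print(mc, closed_form)
```

With a random horizon Y_{i+1} in place of the fixed T, a further expectation gives exactly the mse_∞ E[Y_{i+1}] and (σ²/4θ²) E[1 − e^{−2θY_{i+1}}] terms appearing in Step (a).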