Minimizing AoI Using Hybrid NOMA/OMA

Qian Wang^{1,2}, He Chen^2, Yonghui Li^1, Branka Vucetic^1
^1 School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia
^2 Department of Information Engineering, Chinese University of Hong Kong, Hong Kong SAR, China
^1 {qian.wang2, yonghui.li, branka.vucetic}@sydney.edu.au, ^2 he.chen@ie.cuhk.edu.hk
Abstract—This paper considers a wireless network with a base station (BS) conducting timely transmission to two clients in a slotted manner via hybrid non-orthogonal multiple access (NOMA)/orthogonal multiple access (OMA).
To proceed, we now investigate the existence of an optimal stationary and deterministic policy for Problem 2 and arrive at the following theorem.

Theorem 1. There exist a constant J^*, a bounded function h(∆1, ∆2) : S → R, and a stationary and deterministic policy π^* satisfying the average reward optimality equation

J^* + h(\Delta_1, \Delta_2) = \min_{a \in \mathcal{A}} \big( w_1 \Delta_1 + w_2 \Delta_2 + \mathbb{E}[h(\hat{\Delta}_1, \hat{\Delta}_2)] \big),  (16)

∀(∆1, ∆2) ∈ S, where π^* is the optimal policy, J^* is the optimal average reward, and (∆̂1, ∆̂2) is the next state after (∆1, ∆2) takes action a.

Proof. See Appendix A.
B. Action Elimination

In this subsection, we establish action elimination by analyzing the properties of the formulated MDP problem, which reduces the action space of each state and thereby lowers the computation complexity. According to (7) and (10), and the fact that α1 + α2 = 1, the outage probability of client 2 using NOMA, P2^N, is decreasing in α2; i.e., P2^N(a) is decreasing in the action a when max{⌈N/2⌉ + 1, ⌈(2^R − 1)N/2^R⌉} < a < N. However, the outage probability of client 1 using NOMA, P1^N, is decreasing when (2^R − 1)/2^R < α2 < 2^R/(2^R + 1) and increasing when 2^R/(2^R + 1) < α2 < 1. That is, P1^N(a) is decreasing in a when a ∈ {max{⌈N/2⌉ + 1, ⌈(2^R − 1)N/2^R⌉}, ..., ⌊2^R N/(2^R + 1)⌋} and increasing when a ∈ {⌈2^R N/(2^R + 1)⌉, ⌈2^R N/(2^R + 1)⌉ + 1, ..., N − 1}. As the BS aims to minimize the weighted sum of the expected AoI of the network, the action a = ⌊2^R N/(2^R + 1)⌋ performs better in reducing the AoI of both clients, with lower outage probabilities, than any other a ∈ {max{⌈N/2⌉ + 1, ⌈(2^R − 1)N/2^R⌉}, ..., ⌊2^R N/(2^R + 1)⌋}. Thus, the action space can be reduced to a ∈ {0, ⌊2^R N/(2^R + 1)⌋, ⌊2^R N/(2^R + 1)⌋ + 1, ..., N}.
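To make the elimination rule concrete, here is a minimal sketch that computes the reduced action space {0, ⌊2^R N/(2^R + 1)⌋, ..., N} from N and R. The mapping α2 = a/N between the action index and the power split is our assumption about the system model (not reproduced in this excerpt).

```python
import math

def reduced_action_space(N, R):
    """Reduced action space from the elimination argument above:
    a = 0 and a = N are the two OMA actions, and only NOMA actions
    a in {floor(2^R * N / (2^R + 1)), ..., N - 1} survive.

    Assumes the power split alpha_2 = a / N is indexed by the
    action a (an assumption about the system model).
    """
    a_low = math.floor(2**R * N / (2**R + 1))
    return [0] + list(range(a_low, N + 1))

# With N = 10 and R = 1: floor(2 * 10 / 3) = 6, so the space becomes
# {0, 6, 7, 8, 9, 10}, matching the action set A reported in Fig. 1.
print(reduced_action_space(10, 1))
```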
C. Structural Results on Optimal Policy

In this subsection, we derive two structural results on the optimal policy, which offer an effective way to reduce the offline computation complexity and the online implementation hardware requirement.

Theorem 2. The optimal policy π^* is of switching type. That is, denoting by c and d any actions from the action space {0, ⌊2^R N/(2^R + 1)⌋, ⌊2^R N/(2^R + 1)⌋ + 1, ..., N}:
• If π^*((∆1, ∆2)) = c, then π^*((∆1, ∆2 + z)) = d, where z is any positive integer and d ≥ c;
• If π^*((∆1, ∆2)) = c, then π^*((∆1 + z, ∆2)) = d, where z is any positive integer and d ≤ c.

Proof. See Appendix B.

Thanks to the switching-type structure, the boundaries between decision regions can be stored and the action of each state retrieved accordingly as in [4] to reduce the complexity of calculating the optimal policy.
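The following sketch illustrates one way such a switching-type policy could be stored: for each ∆1, keep only the values of ∆2 at which the chosen action changes, which Theorem 2 guarantees is a faithful summary. The dictionary layout is a hypothetical illustration, not necessarily the scheme of [4].

```python
def compress_policy(policy, m):
    """Store a switching-type policy as per-row decision boundaries.

    policy[(d1, d2)] -> action, on the truncated grid 1..m x 1..m.
    By Theorem 2 the action is monotone in d2 for fixed d1, so each
    row is fully described by the (d2, action) points where it jumps.
    """
    rows = {}
    for d1 in range(1, m + 1):
        jumps, prev = [], None
        for d2 in range(1, m + 1):
            a = policy[(d1, d2)]
            if a != prev:
                jumps.append((d2, a))
                prev = a
        rows[d1] = jumps
    return rows

def lookup(rows, d1, d2):
    """Recover the action of state (d1, d2) from the stored boundaries."""
    action = None
    for start, a in rows[d1]:
        if d2 < start:
            break
        action = a
    return action
```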
D. Suboptimal Policy

In this subsection, we propose a suboptimal policy with lower computation complexity than that of the optimal MDP policy. Inspired by the Max-weight policy in [11], the proposed suboptimal policy makes use of the transition probabilities of the MDP and minimizes the expected reward of the next stage. According to (12), given the current state s = (∆1, ∆2), the expected reward of the next stage ŝ can be expressed as

\mathbb{E}[r(\hat{s}, a)] =
\begin{cases}
1 + w_1 P_1^O \Delta_1 + w_2 \Delta_2, & \text{if } a = 0; \\
1 + w_1 \Delta_1 + w_2 P_2^O \Delta_2, & \text{if } a = N; \\
1 + w_1 P_1^N(a) \Delta_1 + w_2 P_2^N(a) \Delta_2, & \text{otherwise}.
\end{cases}  (17)

Then, the action of state s under the suboptimal policy π̄ is given by

\bar{\pi}(s) = \arg\min_{a} \mathbb{E}[r(\hat{s}, a)].  (18)

The suboptimal policy is simple and easy to implement. Moreover, as we show via the numerical results in Section IV, the suboptimal policy achieves near-optimal performance.
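The rule (17)-(18) is a one-step lookahead and is straightforward to code. In the sketch below, the OMA and NOMA outage probabilities from (7) and (10) are passed in as constants and callables, since their closed forms are not repeated in this excerpt; the interface is our assumption.

```python
def expected_next_reward(d1, d2, a, N, w1, w2,
                         p1_oma, p2_oma, p1_noma, p2_noma):
    """Expected next-stage reward, Eq. (17); assumes w1 + w2 = 1.

    p1_oma, p2_oma: OMA outage probabilities of clients 1 and 2.
    p1_noma(a), p2_noma(a): NOMA outage probabilities under action a.
    """
    if a == 0:    # OMA transmission to client 1 only
        return 1 + w1 * p1_oma * d1 + w2 * d2
    if a == N:    # OMA transmission to client 2 only
        return 1 + w1 * d1 + w2 * p2_oma * d2
    # NOMA transmission to both clients, power split indexed by a
    return 1 + w1 * p1_noma(a) * d1 + w2 * p2_noma(a) * d2

def suboptimal_action(d1, d2, actions, N, w1, w2,
                      p1_oma, p2_oma, p1_noma, p2_noma):
    """Eq. (18): choose the action minimizing the expected next reward."""
    return min(actions, key=lambda a: expected_next_reward(
        d1, d2, a, N, w1, w2, p1_oma, p2_oma, p1_noma, p2_noma))
```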
By applying the suboptimal policy, the considered MDP problem is reduced to a Markov chain. We can then approximately calculate the weighted sum of the expected AoI of the network by the following finite-state approximation:

\tilde{P}^{\bar{\pi}}_{s,\hat{s}} =
\begin{cases}
P^{\bar{\pi}}_{s,\hat{s}}, & \text{if } \hat{s} \in S_m; \\
P^{\bar{\pi}}_{s,\hat{s}} + \sum_{s' \in S - S_m} P^{\bar{\pi}}_{s,s'}, & \text{otherwise},
\end{cases}  (19)

where S_m is the set of states with ∆1 ≤ m and ∆2 ≤ m, and P^π̄_{s,ŝ} is the probability of transiting from state s to ŝ by taking action π̄(s). Accordingly, the transition matrix P^π̄ of the corresponding Markov chain can be constructed. Denoting by θ the steady-state distribution of the Markov chain, obtained by solving θ = P^π̄ θ, the approximated weighted sum of the expected AoI of the network can be expressed as

\bar{\Delta}_0(\bar{\pi}) = \sum_{s = (\Delta_1, \Delta_2) \in S_m} \theta_s (w_1 \Delta_1 + w_2 \Delta_2),  (20)

where θ_s is the element of θ denoting the steady-state probability of state s. This also serves as a lower bound on the weighted sum of the expected AoI of policy π̄, i.e., ∆̄(π̄) > ∆̄_0(π̄).
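A minimal numerical sketch of (19)-(20): enumerate the truncated grid S_m, build the transition matrix of the chain induced by π̄, fold escaping probability mass back by capping each age at m (which realizes the correction term in (19)), and solve for the steady-state distribution. The reset-or-increment age dynamic and the independence of the two clients' outcomes paraphrase our reading of (12)-(13), which are not shown in this excerpt.

```python
import numpy as np

def approx_weighted_aoi(policy, outage, m, w1=0.5, w2=0.5):
    """Approximate weighted sum of expected AoI under a stationary
    policy, Eqs. (19)-(20). outage(a) -> (q1, q2): assumed per-client
    probabilities of failing to update under action a (q = 1 for a
    client not served under OMA)."""
    states = [(d1, d2) for d1 in range(1, m + 1) for d2 in range(1, m + 1)]
    idx = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for (d1, d2) in states:
        q1, q2 = outage(policy[(d1, d2)])
        for f1, p1 in ((True, q1), (False, 1 - q1)):
            for f2, p2 in ((True, q2), (False, 1 - q2)):
                nd1 = min(d1 + 1, m) if f1 else 1  # cap at m: Eq. (19)
                nd2 = min(d2 + 1, m) if f2 else 1
                P[idx[(d1, d2)], idx[(nd1, nd2)]] += p1 * p2
    # Steady state: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    theta = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    theta /= theta.sum()
    return sum(theta[idx[s]] * (w1 * s[0] + w2 * s[1]) for s in states)
```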
IV. NUMERICAL RESULTS

This section provides numerical results to verify the analytical results of the preceding sections. We set the path loss exponent τ = 2 and the target data rate R = 1 in all simulations. The SNR in this section refers to the transmission SNR ρ. We apply Relative Value Iteration (RVI) on truncated finite states (∆i ≤ 100, ∀i) to approximate the countably infinite state space according to [21]; a sketch of the recursion is given after this paragraph. The optimal policy and the suboptimal policy are illustrated in Fig. 1, where SNR = 18 dB, d1 = 2, d2 = 4, and w1 = w2 = 0.5. The switching-type policy is verified. Besides, we can find that the proposed suboptimal policy is close to the optimal policy.
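For completeness, here is a compact sketch of the RVI recursion on the truncated grid, with state (1, 1) as the reference state. The outage(a) interface and the reset-or-increment dynamic are the same assumptions as in the earlier sketches, standing in for (7), (10), and (12)-(13).

```python
import numpy as np

def rvi_policy(actions, outage, m=100, w1=0.5, w2=0.5,
               iters=5000, tol=1e-8):
    """Relative Value Iteration on states (d1, d2) with d_i <= m.
    Returns the (approximately) age-optimal policy as a dict."""
    h = np.zeros((m + 1, m + 1))  # relative values, indices 1..m used
    policy = {}
    for _ in range(iters):
        h_new = np.zeros_like(h)
        for d1 in range(1, m + 1):
            for d2 in range(1, m + 1):
                best = None
                for a in actions:
                    q1, q2 = outage(a)
                    ev = 0.0  # expected relative value of next state
                    for f1, p1 in ((1, q1), (0, 1 - q1)):
                        for f2, p2 in ((1, q2), (0, 1 - q2)):
                            nd1 = min(d1 + 1, m) if f1 else 1
                            nd2 = min(d2 + 1, m) if f2 else 1
                            ev += p1 * p2 * h[nd1, nd2]
                    val = w1 * d1 + w2 * d2 + ev
                    if best is None or val < best:
                        best, policy[(d1, d2)] = val, a
                h_new[d1, d2] = best
        h_new -= h_new[1, 1]  # normalize by the reference state
        if np.max(np.abs(h_new[1:, 1:] - h[1:, 1:])) < tol:
            break
        h = h_new
    return policy
```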
Fig. 1: Age-optimal policy and suboptimal policy. (a) MDP optimal policy; (b) suboptimal policy. Each point represents a state s = (∆1, ∆2). The colored area indicates the action for each state, i.e., a = 0 for states in the blue area, a = 7 for states in the orange area, a = 8 for states in the purple area, a = 9 for states in the green area, and a = 10 for states in the red area, where N = 10 and A = {0, 6, 7, 8, 9, 10}.
Fig. 2 illustrates the weighted sum of the expected AoI of the two clients under the optimal policy using the adaptive NOMA/OMA scheme (optimal adaptive NOMA/OMA policy), the policy that always uses NOMA for transmission (optimal NOMA policy, with a ∈ {max{⌈N/2⌉ + 1, ⌈(2^R − 1)N/2^R⌉}, ..., N − 1}), the proposed suboptimal policy, and the optimal OMA policy in which the BS adaptively selects one client for transmission (optimal OMA policy, with a ∈ {0, N}), in two cases: 1) d1 = 2 and d2 = 4; 2) d1 = 3 and d2 = 6. The settings of the remaining system parameters are the same as in Fig. 1. We can find that the proposed suboptimal policy achieves near-optimal performance: its weighted sum of the expected AoI almost coincides with that of the optimal adaptive NOMA/OMA policy, especially when the outage probabilities of the two clients are small, as shown in Fig. 2. Specifically, the performance of the suboptimal policy is closer to that of the optimal adaptive NOMA/OMA policy when d1 = 2 and d2 = 4 than when d1 = 3 and d2 = 6, and the gap between the weighted sum of the expected AoI of the suboptimal policy and that of the optimal adaptive NOMA/OMA policy narrows as the SNR increases.

Moreover, we can see that when the SNR is small, e.g., SNR < 15 dB, the performance of the optimal adaptive NOMA/OMA scheme and that of the optimal OMA scheme are almost the same in Fig. 2. This is because the low SNR leads to a larger outage probability for both OMA and NOMA, and the situation for NOMA is even worse. As such, both the optimal adaptive NOMA/OMA policy and the suboptimal policy prefer the OMA scheme over NOMA, so the two policies have similar performance. As the SNR increases, the weighted sum of the expected AoI of the optimal OMA policy approaches 1.5 when w1 = w2 = 0.5. This is the optimal performance under the OMA scheme: the outage probability of each client is 0, and thus ages 1 and 2 alternate in the age evolution of each client.

Furthermore, we can find that the performance of the optimal adaptive NOMA/OMA policy, the suboptimal policy, and the NOMA policy are relatively close when the SNR is large, e.g., SNR ≥ 20 dB in Fig. 2. This is because both the optimal adaptive NOMA/OMA policy and the suboptimal policy are more likely to choose NOMA for transmission. When the SNR is large enough, the performance of both the optimal adaptive NOMA/OMA policy and the suboptimal policy approaches 1, as the instantaneous AoI of each client is always 1 thanks to the absence of outage for both clients under NOMA. The BS thus always chooses the NOMA scheme to conduct transmissions to both clients. In addition, NOMA is better than optimal OMA when SNR > 16 dB for d1 = 2 and d2 = 4, and when SNR > 19 dB for d1 = 3 and d2 = 6. This shows the advantage of NOMA for timely status updates when the SNR is large.

Fig. 2: Performance comparison of different policies (optimal adaptive NOMA/OMA, optimal NOMA, suboptimal adaptive NOMA/OMA, optimal OMA) versus SNR, when w1 = w2 = 0.5, for the cases d1 = 2, d2 = 4 and d1 = 3, d2 = 6.
V. CONCLUSIONS

In this paper, we considered a wireless network with a base station (BS) conducting timely transmission to two clients in a slotted manner via hybrid non-orthogonal multiple access (NOMA)/orthogonal multiple access (OMA). The BS can adaptively switch between NOMA and OMA for the downlink transmission to minimize the AoI of the network. We developed an optimal policy for the BS to decide whether to use NOMA or OMA for the downlink transmission, based on the instantaneous AoI of both clients, in order to minimize the weighted sum of the expected AoI of the network. This was achieved by formulating a Markov decision process (MDP) problem. We proved the existence of an optimal stationary and deterministic policy. Action elimination was conducted to reduce the computation complexity. The optimal policy was shown to have a switching-type structure with clear decision boundaries. A suboptimal policy with lower computation complexity was also proposed and shown to achieve near-optimal performance in the simulation results. The approximate average AoI performance of the suboptimal policy was also derived. The performance of different policies under different system settings was compared and analyzed in the numerical results to provide useful insights for practical system designs.
APPENDIX A
PROOF OF THEOREM 1

We prove this theorem by verifying that Assumptions 3.1, 3.2, and 3.3 in [22] hold. As the action space of each state is finite, Assumption 3.2 holds, and we only need to verify the following two conditions.

1) There exist positive constants β < 1, M, and m, and a measurable function ω(s) ≥ 1 on S, s = (∆1, ∆2), such that the reward function of the MDP, r(s, a) = w1∆1 + w2∆2, satisfies |r(s, a)| ≤ Mω(s) for all state-action pairs (s, a), and

\sum_{\hat{s} \in S} \omega(\hat{s}) P(\hat{s} \mid s, a) \le \beta \omega(s) + m, \quad \text{for all } (s, a).  (21)

2) There exist two value functions v1, v2 ∈ B_ω(S) and some state s0 ∈ S such that

v_1(s) \le h_\alpha(s) \le v_2(s), \quad \text{for all } s \in S \text{ and } \alpha \in (0, 1),  (22)

where h_α(s) = V_α(s) − V_α(s0), and B_ω(S) := {u : ‖u‖_ω < ∞} denotes the Banach space with the weighted supremum norm ‖u‖_ω := sup_{s∈S} ω(s)^{−1}|u(s)|.

For condition 1, we show that when ω(s) = w1∆1 + w2∆2 and m > 1, there exists

\max_a \left\{ \frac{w_1 P_1^O \Delta_1 + w_2 \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2},\ \frac{w_1 \Delta_1 + w_2 P_2^O \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2},\ \frac{w_1 P_1^N(a) \Delta_1 + w_2 P_2^N(a) \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2} \right\} \le \beta < 1,

which meets condition 1. To prove condition 2 in our problem, we show that when ω(s) = w1∆1 + w2∆2, there exists (w1∆1 + w2∆2 + 1)/(w1∆1 + w2∆2) ≤ κ < ∞ such that Σ_{ŝ∈S} ω(ŝ)P(ŝ|s, a) ≤ κω(s) for all (s, a); moreover, for any d ∈ Π^{MD}, Σ_{ŝ∈S} ω(ŝ)P_d(ŝ|s, a) ≤ ω(s) + 1 ≤ (1 + 1)ω(s), so that α^N Σ_{ŝ∈S} ω(ŝ)P_d^N(ŝ|s, a) ≤ α^N(ω(s) + N) < α^N(1 + N)ω(s). Hence, for each α with 0 ≤ α < 1, there exist η with 0 ≤ η < 1 and an integer N such that

\alpha^N \sum_{\hat{s} \in S} \omega(\hat{s}) P_\pi^N(\hat{s} \mid s, a) \le \eta \omega(s).  (23)
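The ratio bounded by β in condition 1 can be spot-checked numerically. The helper below evaluates (E[ω(ŝ)] − m)/ω(s) for a given action, using E[ω(ŝ)] = w1 q1 ∆1 + w2 q2 ∆2 + 1, which holds when w1 + w2 = 1; the outage values used in the check are hypothetical.

```python
def drift_ratio(d1, d2, q1, q2, w1=0.5, w2=0.5, m=2.0):
    """(E[w(s_hat)] - m) / w(s) for w(s) = w1*d1 + w2*d2; condition 1
    asks for this ratio to be uniformly bounded by some beta < 1."""
    omega = w1 * d1 + w2 * d2
    expected_next = w1 * q1 * d1 + w2 * q2 * d2 + 1
    return (expected_next - m) / omega

# Spot check a NOMA action with hypothetical outages q1 = q2 = 0.3:
# the ratio approaches 0.3 for large ages and never reaches 1.
worst = max(drift_ratio(d1, d2, 0.3, 0.3)
            for d1 in range(1, 200) for d2 in range(1, 200))
print(worst < 1)  # True
```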
APPENDIX B
PROOF OF THEOREM 2

To verify these conditions, we first order the states by ∆2, i.e., s^+ ≥ s^- if ∆_2^+ ≥ ∆_2^-, where s^+ = (·, ∆_2^+) and s^- = (·, ∆_2^-). The one-step reward function of the MDP is

r(s, a) = w_1 \Delta_1 + w_2 \Delta_2.  (25)

It is obvious that condition a) is satisfied. According to the transition probabilities in (12) and (13), if the current state is s = (∆1, ∆2), the next possible states are s_1 = (·, ∆2 + 1) (including (1, ∆2 + 1) and (∆1 + 1, ∆2 + 1)) and s_2 = (·, 1) (including (1, 1) and (∆1 + 1, 1)). Based on (12) and (13), we have

q(k \mid s, a = 0) = \begin{cases} 0, & \text{if } k > s_1; \\ 1, & \text{otherwise}, \end{cases}  (26)

q(k \mid s, a = i), \ 0 < i < N, = \begin{cases} 0, & \text{if } k > s_1; \\ P_2^N(i), & \text{if } s_1 \ge k > s_2; \\ 1, & \text{if } k \le s_2, \end{cases}  (27)

q(k \mid s, a = N) = \begin{cases} 0, & \text{if } k > s_1; \\ P_2^O, & \text{if } s_1 \ge k > s_2; \\ 1, & \text{if } k \le s_2. \end{cases}  (28)

Thus, condition b) is immediate. To verify the remaining two conditions, we give the definition of subadditivity in the following.

Definition 1 (Subadditive [23]). A multivariable function Q(δ, a) : N × A → R is subadditive in (δ, a) if, for all δ^+ ≥ δ^- and a^+ ≥ a^-,

Q(\delta^+, a^+) + Q(\delta^-, a^-) \le Q(\delta^+, a^-) + Q(\delta^-, a^+)  (29)

holds.

According to (25), condition c) follows. For the last condition, we verify whether