

Event-Triggered Optimal Control With Performance Guarantees Using Adaptive Dynamic Programming

Biao Luo, Member, IEEE, Yin Yang, Derong Liu, Fellow, IEEE, and Huai-Ning Wu

Abstract— This paper studies the problem of event-triggered optimal control (ETOC) for continuous-time nonlinear systems and proposes a novel event-triggering condition that enables designing ETOC methods directly based on the solution of the Hamilton–Jacobi–Bellman (HJB) equation. We provide formal performance guarantees by proving a predetermined upper bound. Moreover, we also prove the existence of a lower bound for the interexecution time. For implementation purposes, an adaptive dynamic programming (ADP) method is developed to realize the ETOC using a critic neural network (NN) to approximate the value function of the HJB equation. Subsequently, we prove that semiglobal uniform ultimate boundedness can be guaranteed for states and NN weight errors with the ADP-based ETOC. Simulation results demonstrate the effectiveness of the developed ADP-based ETOC method.

Index Terms— Adaptive dynamic programming (ADP), event-triggered, neural network (NN), optimal control, performance guarantee.

Manuscript received March 16, 2018; revised November 12, 2018 and February 12, 2019; accepted February 12, 2019. Date of publication March 15, 2019; date of current version January 3, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61873350, Grant 61503377, Grant 61721091, Grant 61625302, Grant 61533017, and Grant U1501251 and in part by the Qatar National Research Fund through the National Priority Research Project under Grant NPRP9-466-1-103. (Corresponding author: Biao Luo.)
B. Luo is with the School of Automation, Central South University, Changsha 410083, China (e-mail: biao.luo@hotmail.com).
Y. Yang is with the College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar (e-mail: yyang@qf.org.qa).
D. Liu is with the School of Automation, Guangdong University of Technology, Guangzhou 510006, China (e-mail: derongliu@foxmail.com).
H.-N. Wu is with the Science and Technology on Aircraft Control Laboratory, Beihang University, Beijing 100191, China (e-mail: whn@buaa.edu.cn).
Digital Object Identifier 10.1109/TNNLS.2019.2899594

I. INTRODUCTION

ADAPTIVE dynamic programming (ADP) is an effective technique to solve optimal control problems. Such problems often require solving a complicated nonlinear partial differential equation called the Hamilton–Jacobi–Bellman (HJB) equation [1]. However, it is usually difficult to obtain the exact solution of the HJB equation. To overcome this difficulty, a popular methodology is to solve optimal control problems approximately using ADP [2]–[18]. For instance, Vrabie and Lewis [4] proposed an integral reinforcement learning method based on policy iteration for optimal control with partially unknown system dynamics. For affine nonlinear continuous-time systems with input constraints, an approximate solution to the HJB equation can be obtained with a sequence of Lyapunov equations in [2]. This solution requires the knowledge of complete system dynamics. In [9], this requirement is avoided with an online policy iteration algorithm, which can be implemented with an actor-critic neural network (NN) structure. In [11], an off-policy learning method was developed, where system data can be generated by arbitrary control policies. In [13], an ADP method was proposed for nonaffine nonlinear systems with unknown dynamics. To relax the persistence of excitation condition, a model-based reinforcement learning method was suggested in [16], using a concurrent-learning-based system identifier to simulate experience. Note that these works are mainly time-triggered approaches, where the controller needs to stay active at all time instances, which can be expensive in terms of computational and communication overhead.

In many practical applications, it is often unnecessary and/or infeasible to update the controller at every time instance. Event-triggered control [19]–[21] is a promising methodology for reducing computational and communication costs, which has attracted considerable attention in the past few years. Event-triggered control methods only update the control action when an event-triggering condition is violated. Recently, ADP methods have been applied to solve event-triggered optimal control (ETOC) problems in [22]–[30]. Vamvoudakis [22] proposed a novel optimal adaptive event-triggered control algorithm for continuous-time nonlinear systems. In [23], an actor-critic method was employed to solve the finite-horizon optimal control problem with event-based approximation. In [28], the ETOC problem was investigated for partially unknown continuous-time systems with input constraints. In [31], the ETOC problem for discrete-time nonlinear systems was considered, and an adaptive event-triggered control method based on heuristic dynamic programming was proposed. In [30], an interesting Q-learning algorithm was developed to solve the model-free ETOC of continuous-time linear systems. For the ETOC problem for continuous-time nonlinear systems, most works derive an event-triggered HJB equation and then apply ADP methods to obtain its solution approximately. Note that the event-triggered HJB equation itself is already an approximation of the original HJB equation. Moreover, how event-triggered schemes affect the optimal performance index has rarely been studied in existing works.

In traditional optimal control with a time-triggered scheme, the optimal performance index can be attained in theory. However, due to the use of an event-triggered scheme in ETOC,

the optimal performance will degrade to some extent. Hence, ETOC essentially provides a tradeoff between performance and resource usage. A natural question, then, is how much performance degradation an ETOC method causes. In [22] and [30], the real performance index was analyzed for event-triggered control. However, the obtained real performance index contains an integral term whose boundedness was not analyzed. In this paper, an ETOC method is developed by proposing a novel event-triggering condition, which guarantees a predetermined upper bound for the performance index. Moreover, the stability and the lower bound of the interexecution times of the ETOC are analyzed theoretically. For realization purposes, ADP is employed to implement the ETOC by using a critic NN to estimate the solution of the HJB equation. It is proven that semiglobal uniform ultimate boundedness (SGUUB) can be guaranteed for states and NN weight errors with the ADP-based ETOC. Furthermore, the bounds for the performance and the interexecution time are also analyzed for the ADP-based ETOC.

The rest of this paper is arranged as follows. The problem description for ETOC is given in Section II. An ETOC method is proposed and its stability, the upper bound of the performance index, and the lower bound of the interexecution times are analyzed theoretically in Section III. An ADP-based ETOC method is developed with theoretical analysis in Section IV. Simulation results are presented in Section V, and brief conclusions are given in Section VI.

Notations: R^n is the n-dimensional Euclidean space and ‖·‖ denotes its norm. N denotes the set of nonnegative integers. The superscript T is used for the transpose of a matrix or vector. ∇_x ≜ ∂/∂x denotes the gradient operator. For a symmetric matrix M, M > (≥) 0 means that it is a positive definite (semidefinite) matrix. ‖v‖²_M ≜ vᵀMv for a real vector v and a symmetric matrix M > (≥) 0 with appropriate dimensions. X and U represent two compact sets. C¹(X) is the space of functions on X whose first derivatives are continuous. σ̄(·) and σ̲(·) denote the maximum and minimum singular values, respectively. For a vector x(t), denote x⁻(t) ≜ lim_{ε→0} x(t − ε).

II. PROBLEM DESCRIPTION

Consider the following continuous-time nonlinear system:

    ẋ(t) = f(x(t)) + g(x(t))u(t),   x(0) = x₀        (1)

where x = [x₁, ..., xₙ]ᵀ ∈ X ⊂ R^n is the state, x₀ is the initial state, and u = [u₁, ..., uₘ]ᵀ ∈ U ⊂ R^m is the control input. Assume that f(x) + g(x)u(t) is Lipschitz continuous on X, which contains the origin, that f(0) = 0, and that the system is stabilizable on X, i.e., there exists a continuous control function such that the system is asymptotically stable on X.

Consider the following infinite-horizon performance index:

    J(x₀, u) = ∫₀^∞ [Q(x(t)) + ‖u(t)‖²_R] dt        (2)

where R > 0 and Q(x) is a positive definite function, i.e., Q(x) > 0 for all x ≠ 0 and Q(0) = 0. The time-triggered optimal control problem aims to design a control u(t) = u*(x) such that the system (1) is closed-loop asymptotically stable and the performance index (2) is minimized. Then, the time-triggered optimal control is given by

    u*(x) = arg min_u J(x₀, u).        (3)

In this paper, the ETOC problem is considered, where the system information is only transmitted when the event-triggering condition is violated. The triggered instants are a monotonically increasing sequence {t_j} determined by the event-triggering condition, where t₀ = 0, t_j < t_{j+1}, j ∈ N. Thus, the interexecution time of the event-triggered control is defined as

    h_j ≜ t_{j+1} − t_j.        (4)

For the ETOC problem, the system information is transmitted to the controller only at event-triggered instants, and the control signal is held constant in a zero-order-hold scheme during the time interval [t_j, t_{j+1}).

In this paper, the study of the ETOC problem aims to achieve the following three goals. First, a predetermined upper bound can be guaranteed for the real performance index; second, there exists a lower bound for the interexecution time; and third, the stability of the closed-loop system with the ETOC can be guaranteed.

III. EVENT-TRIGGERED OPTIMAL CONTROL WITH PERFORMANCE GUARANTEE

In this section, the ETOC method is developed and its theoretical analysis is provided. First, the ETOC is designed directly based on the HJB equation and a novel event-triggering condition is proposed. Based on the developed ETOC, the stability and performance bound are proved for the closed-loop system. Subsequently, it is proven that there exists a lower bound for the interexecution times.

A. Event-Triggered Optimal Control Design

First, let us consider the time-triggered optimal control of the system (1) with the performance index (2). For an admissible control u(x), its value function is defined as

    V_u(x) = ∫_t^∞ [Q(x(τ)) + ‖u(x(τ))‖²_R] dτ        (5)

for all x(t) = x ∈ X, where V_u(x) ∈ C¹(X), V_u(x) ≥ 0, and V_u(0) = 0. For a value function V_u(x) ∈ C¹(X), its Hamiltonian is denoted as

    H(x, u(x), ∇_x V_u(x)) = [∇_x V_u(x)]ᵀ[f(x) + g(x)u(x)] + Q(x) + ‖u(x)‖²_R.        (6)

By using (6), differentiating (5) with respect to t yields

    H(x, u(x), ∇_x V_u(x)) = 0.        (7)

For the optimal control u*(x) in (3), denote its optimal value function as V*(x) ≜ V_{u*}(x). Then, it follows from (2) that the optimal performance index is given by

    J*(x₀) ≜ min_u J(x₀, u) = V*(x₀).        (8)


From the optimal control theory [1], V*(x) satisfies the following HJB equation:

    min_u H(x, u(x), ∇_x V*(x)) = 0.        (9)

Then, ∇_u H(x, u(x), ∇_x V*(x)) = 0, that is,

    u*(x) = −(1/2)R⁻¹gᵀ(x)∇_x V*(x).        (10)

Substituting (10) into (9) gives the HJB equation

    H(x, u*(x), ∇_x V*(x)) = 0        (11)

which can be rewritten as

    ∇_x^T V*(x)f(x) + Q(x) − (1/4)∇_x^T V*(x)g(x)R⁻¹gᵀ(x)∇_x V*(x) = 0        (12)

where V*(x) ∈ C¹(X), V*(x) ≥ 0, and V*(0) = 0.

Now, it is ready to give the ETOC. Based on (10) and using V*(x), the following ETOC is proposed:

    μ(x̂) = u*(x̂) = −(1/2)R⁻¹gᵀ(x̂)∇_x V*(x̂)        (13)

where x̂(t) denotes the state available to the controller as follows:

    x̂(t) = x(t) for t = t_j;  x(t_j) for t ∈ (t_j, t_{j+1})        (14)

for all j ∈ N. Note that the ETOC (13) depends on the solution V*(x) of the HJB equation (12).

Remark 1: The time-triggered optimal control u*(x) requires the system state x at all time instants. Differently, in the ETOC (13), only the system state at triggered instants is transmitted to the controller and held constant during the execution interval, which greatly reduces computational and communication resources.

The error between the available state x̂(t) and the true state x(t) is defined as

    e(t) ≜ x̂(t) − x(t).        (15)

With the ETOC (13), it follows from (1) and (15) that the closed-loop system is given by

    ẋ = f(x) + g(x)μ(x̂)
      = f(x) − (1/2)g(x)R⁻¹gᵀ(x̂)∇_x V*(x̂)
      = f(x) − (1/2)g(x)R⁻¹gᵀ(x + e)∇_x V*(x + e).        (16)

To determine the release time instant t_j, a novel event-triggering condition is proposed as follows:

    C_α(x, x̂) < 0        (17)

where

    C_α(x, x̂) ≜ (1+α)∇_x^T V*(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R        (18)

with α > 0 being a pregiven constant that will determine an upper bound for the performance index with the ETOC (13). Once the triggering condition (17) is violated, the current state x(t) is transmitted to update the controller. Thus, it follows from (17) that the next release time instant t_{j+1} is given by

    t_{j+1} = inf{t | C_α(x(t), x̂(t)) ≥ 0, t ≥ t_j}        (19)

where t₀ = 0.
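To make the triggering mechanism concrete, the following Python sketch simulates the closed loop (16) under the ETOC (13), the hold (14), and the release rule (19). It is an illustration of ours rather than the authors' code: the names simulate_etoc and grad_V are our own, and the dynamics f, g, the cost terms Q, R, and the value-function gradient grad_V are assumed to be supplied by the user.

    import numpy as np

    def simulate_etoc(f, g, grad_V, Q, R, x0, alpha, T=10.0, dt=1e-3):
        # Forward-Euler simulation of the event-triggered loop (13), (14), (17)-(19).
        Rinv = np.linalg.inv(R)
        def u_star(z):                          # time-triggered optimal control (10)
            return -0.5 * Rinv @ g(z).T @ grad_V(z)
        x = np.asarray(x0, dtype=float)
        x_hat = x.copy()                        # state held by the controller, (14)
        mu = u_star(x_hat)                      # ETOC (13) under zero-order hold
        events, cost = [0.0], 0.0
        for k in range(int(T / dt)):
            # Triggering function C_alpha(x, x_hat) from (18); an event fires
            # when the condition C_alpha < 0 in (17) is violated, cf. (19).
            C = ((1 + alpha) * grad_V(x) @ (f(x) + g(x) @ mu)
                 + Q(x) + mu @ R @ mu)
            if C >= 0.0:
                x_hat = x.copy()                # transmit the current state
                mu = u_star(x_hat)              # update and hold the control
                events.append((k + 1) * dt)
            cost += dt * (Q(x) + mu @ R @ mu)   # running performance index (2)
            x = x + dt * (f(x) + g(x) @ mu)     # one Euler step of the flow (16)
        return x, np.diff(events), cost         # state, interexecution times h_j, J

Here the release rule (19) is simply checked at every integration step; in practice it would be monitored by dedicated triggering hardware.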
B. Stability and Performance Guarantee

For the time-triggered optimal control (10), the performance index (2) can be minimized, i.e., the optimal performance index J*(x₀) will be obtained. In the ETOC (13), only states at event-triggered time instants {t_j} are available for control updates. That is to say, the performance index is bound to degrade to some extent; the optimal performance index cannot be achieved whenever any event-triggering scheme is employed. Thus, it is necessary to analyze how much the performance index will degrade for an event-triggered control method. For the proposed event-triggering condition (17), we will demonstrate in Theorem 1 that an upper bound of the performance index with the ETOC (13) can be predetermined by giving the parameter α. Moreover, the stability of the closed-loop system (16) will also be proved in Theorem 1.

Theorem 1: Consider the closed-loop system (16) with the triggering condition (17). The triggering instant sequence {t_j} is determined by (19). Then:
1) the closed-loop system (16) is asymptotically stable;
2) there exists an upper bound for the real performance index J(x₀, μ), i.e., J(x₀, μ) ≤ (1+α)J*(x₀).

Proof: 1) Choose V*(x) as the Lyapunov function. Based on (17) and (19), taking the derivative along (16) yields

    V̇*(x) = ∇_x^T V*(x)[f(x) + g(x)μ(x̂)]
          = (1/(1+α))[C_α(x, x̂) − Q(x) − ‖μ(x̂)‖²_R]
          ≤ −(1/(1+α))[Q(x) + ‖μ(x̂)‖²_R]
          ≤ 0        (20)

which means that the closed-loop system (16) is asymptotically stable.

2) According to (20), we have

    Q(x) + ‖μ(x̂)‖²_R ≤ −(1+α)V̇*(x)        (21)

for t ∈ [t_j, t_{j+1}) and all j. From 1) of Theorem 1, the closed-loop system (16) is asymptotically stable, which means that lim_{t→∞} x(t) = 0. Then, based on (2), (8), and (21), we have

    J(x₀, μ) = ∫₀^∞ [Q(x(t)) + ‖μ(x̂(t))‖²_R] dt
             ≤ −(1+α)∫₀^∞ dV*(x(t))
             = −(1+α)V*(x(t))|₀^∞
             = (1+α)V*(x₀)
             = (1+α)J*(x₀).

This completes the proof.
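As a quick numerical spot-check of item 2) (a toy example of our own, not from the paper), consider the scalar system ẋ = −x + u with Q(x) = x² and R = 1. The HJB equation (12) then reduces to p² + 2p − 1 = 0 for V*(x) = px², giving p = √2 − 1, and the simulate_etoc sketch above can verify that J(x₀, μ) stays below (1 + α)J*(x₀):

    # Spot-check of J(x0, mu) <= (1 + alpha) J*(x0) using the simulate_etoc sketch.
    p = np.sqrt(2.0) - 1.0                      # V*(x) = p x^2 solves the scalar HJB
    x0, alpha = np.array([1.0]), 0.05
    _, h_j, J_mu = simulate_etoc(f=lambda x: -x,
                                 g=lambda x: np.array([[1.0]]),
                                 grad_V=lambda x: 2.0 * p * x,
                                 Q=lambda x: float(x @ x),
                                 R=np.array([[1.0]]),
                                 x0=x0, alpha=alpha, T=20.0)
    print(J_mu, (1 + alpha) * p * float(x0 @ x0))   # left value stays below the right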
From Theorem 1, it is observed that, given the parameter α, the real performance index J(x₀, μ) is upper


bounded by the predetermined (1+α)J*(x₀). In Corollaries 1 and 2, we will analyze the influence of the parameter α in the event-triggering condition (17) on the performance.

Corollary 1: Consider the closed-loop system (16) with the triggering condition (17). The sequence of interexecution times {h_j} is implicitly determined by (19). If α = 0, then h_j = 0 for all j and J(x₀, μ) = J*(x₀).

Proof: Based on the HJB equation (12), we have

    ∇_x^T V*(x)f(x) + Q(x) = ‖u*(x)‖²_R.        (22)

With the condition α = 0, it follows from (18) and (22) that

    C_α(x, x̂) = ∇_x^T V*(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R
              = ‖u*(x)‖²_R − 2[u*(x)]ᵀRμ(x̂) + ‖μ(x̂)‖²_R
              = ‖u*(x) − μ(x̂)‖²_R ≥ 0

for all t ≥ t_j and all j. Thus, it follows from (19) that t_{j+1} = t_j, i.e., h_j = 0 for all j.

On the one hand, it follows from the condition α = 0 and 2) of Theorem 1 that J(x₀, μ) ≤ J*(x₀). On the other hand, J*(x₀) is the minimum performance, which means that J(x₀, μ) ≥ J*(x₀). Thus, we have J(x₀, μ) = J*(x₀).

Corollary 2: Let α₁ ≥ α₂ > 0. Consider the closed-loop system (16) and the triggering condition (17) with parameters α₁ and α₂. Let h_{α₁} and h_{α₂} be the interexecution times associated with the parameters α₁ and α₂, respectively. Then, h_{α₁} ≥ h_{α₂}.

Proof: For t ∈ [0, h_{α₂}], it follows from (18) and (19) that

    C_{α₁}(x, x̂) − C_{α₂}(x, x̂) = (α₁ − α₂)∇_x^T V*(x)[f(x) + g(x)μ(x̂)]
      = ((α₁ − α₂)/(1 + α₂))[C_{α₂}(x, x̂) − Q(x) − ‖μ(x̂)‖²_R]
      ≤ −((α₁ − α₂)/(1 + α₂))[Q(x) + ‖μ(x̂)‖²_R] ≤ 0

which means C_{α₁}(x, x̂) ≤ C_{α₂}(x, x̂). Thus, h_{α₁} ≥ h_{α₂}.

By using the parameter α = 0, it is observed from Corollary 1 that the ETOC degrades into a traditional time-triggered optimal control, i.e., h_j = 0, and the optimal performance J*(x) can be achieved. From Corollary 2, it is noted that a larger α will result in a larger interexecution time, and then more computational and communication resources can be saved. That is to say, α is a tuning parameter between the ETOC and the time-triggered optimal control, which achieves a tradeoff between the optimal performance index and the reduction of resources. The choice of α depends on practical requirements. If designers emphasize optimizing the performance index, a small α can be used. Otherwise, a large α can be applied if designers emphasize reducing computational and communication resources.

C. Lower Bound of Interexecution Times

In this section, it is proven that there exists a lower bound for the interexecution time h_j. Before starting, the following assumptions are required.

Assumption 1: Assume that l₁‖x‖² ≤ Q(x) ≤ l₂‖x‖² and l₃‖x‖ ≤ ‖u*(x)‖ ≤ l₄‖x‖, where l₁, l₂, l₃, l₄ > 0.

Assumption 2: Assume that u*(x) is Lipschitz continuous, i.e., for all x₁, x₂ ∈ X, there exists l_u > 0 such that ‖u*(x₁) − u*(x₂)‖ ≤ l_u‖x₁ − x₂‖.

Theorem 2: Consider the closed-loop system (16) with the triggering condition (17). The triggering instant t_j is determined by (19) and the interexecution time h_j is defined by (4). Let Assumptions 1 and 2 hold. Then, there exists a lower bound h̲ > 0 for h_j, i.e., h_j ≥ h̲ for all j.

Proof: Based on the HJB equation (12)

    ∇_x^T V*(x)f(x) = −Q(x) + ‖u*(x)‖²_R.        (23)

According to Assumption 2, it follows from (10) and (13) that

    ‖μ(x̂) − u*(x)‖ = ‖u*(x̂) − u*(x)‖ ≤ l_u‖x̂ − x‖ = l_u‖e‖.        (24)

Based on (23), (24), and Assumption 1, rewrite (18) as

    C_α(x, x̂) = (1+α)∇_x^T V*(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R
              = (1+α)[−Q(x) + ‖u*(x)‖²_R − 2μᵀ(x̂)Ru*(x)] + Q(x) + ‖μ(x̂)‖²_R
              = −αQ(x) + (1+α)‖μ(x̂) − u*(x)‖²_R − α‖μ(x̂)‖²_R
              ≤ −αQ(x) + (1+α)‖μ(x̂) − u*(x)‖²_R
              ≤ −αl₁‖x‖² + l_u²(1+α)σ̄(R)‖e‖².        (25)

Let β > t_j satisfy the following equation:

    −αl₁‖x(β)‖² + l_u²(1+α)σ̄(R)‖e(β)‖² = 0.        (26)

According to (19), (25), and (26), we have

    t_{j+1} ≥ β.        (27)

Define the notation

    s(t) ≜ ‖e(t)‖/‖x(t)‖.        (28)

By using (28), dividing both sides of (26) by ‖x(β)‖² yields the following equation:

    −αl₁ + l_u²(1+α)σ̄(R)s²(β) = 0.        (29)

The quadratic equation (29) has two different solutions, s₁(β) = λ₁ and s₂(β) = λ₂, where λ₁ = (αl₁/[l_u²(1+α)σ̄(R)])^{1/2} and λ₂ = −(αl₁/[l_u²(1+α)σ̄(R)])^{1/2}. Noting that s₁(β) > 0 and s₂(β) < 0, we take

    s(β) = λ₁.        (30)

Based on (28), let us consider

    ṡ(t) = d/dt [‖e(t)‖/‖x(t)‖]
         = (‖x(t)‖‖e(t)‖⁻¹eᵀ(t)ė(t))/‖x(t)‖² − (‖e(t)‖‖x(t)‖⁻¹xᵀ(t)ẋ(t))/‖x(t)‖²
         ≤ (‖x(t)‖‖e(t)‖⁻¹‖e(t)‖‖ė(t)‖)/‖x(t)‖² + (‖e(t)‖‖x(t)‖⁻¹‖x(t)‖‖ẋ(t)‖)/‖x(t)‖²
         = ‖ė(t)‖/‖x(t)‖ + ‖e(t)‖‖ẋ(t)‖/‖x(t)‖².        (31)


Since f(x) + g(x)μ(x + e) is Lipschitz continuous, there exists a constant L > 0 such that

    ‖ė‖ = ‖ẋ‖ = ‖f(x) + g(x)μ(x + e)‖ ≤ L‖x‖ + L‖e‖.        (32)

Based on (31) and (32), we have ṡ(t) ≤ L[1 + s(t)]². Let φ(t) be the solution of the differential equation φ̇(t) = L[1 + φ(t)]² with initial condition φ(t_j) = 0. Then, by using the comparison principle, we have s(t) ≤ φ(t), where the solution is given by

    φ(t) = L(t − t_j)/(1 − L(t − t_j))        (33)

for t ≥ t_j. For the time instant β, we have s(β) ≤ φ(β). Then, it follows from (33) that λ₁ ≤ L(β − t_j)/(1 − L(β − t_j)), that is,

    β ≥ h̲ + t_j        (34)

where h̲ = λ₁/(L(1 + λ₁)). From (27) and (34), we have t_{j+1} ≥ β ≥ h̲ + t_j. Then, h_j = t_{j+1} − t_j ≥ h̲, which means that h_j is lower bounded by h̲ for all j.
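For concreteness, the lower bound h̲ = λ₁/(L(1 + λ₁)) is easy to evaluate once the constants are known. The values below are made up purely for illustration; only the formulas for λ₁ and h̲ come from (30) and (34):

    import numpy as np

    # Hypothetical constants; only the formulas are from (30) and (34).
    alpha, l1, l_u, sigma_R, L_lip = 0.05, 1.0, 2.0, 1.0, 5.0
    lam1 = np.sqrt(alpha * l1 / (l_u**2 * (1 + alpha) * sigma_R))   # root (30) of (29)
    h_lower = lam1 / (L_lip * (1 + lam1))                           # lower bound from (34)
    print(h_lower)   # every interexecution time h_j is at least this long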
Remark 2: It is worth mentioning that some related works [24], [25], [27], [30], [32] have already been reported that solve ETOC problems for continuous-time systems based on ADP. Compared with these existing works, there are two main differences in the proposed ETOC method. First, and most important, the real performance index of event-triggered control methods was rarely analyzed in existing works. For the proposed event-triggering condition (17), it is proven that an upper bound can be guaranteed for the real performance index. Moreover, a tradeoff between the optimal performance index and the reduction of resources can be achieved by tuning the parameter α in (17). Second, in the existing works, the so-called event-triggered HJB equation is usually required. However, this means that V*(x) should satisfy both the HJB equation and the event-triggered HJB equation, which is unreasonable to some extent. The proposed ETOC method does not require the event-triggered HJB equation; it only requires solving the original HJB equation (12) directly, which is much simpler and more reasonable.

IV. REALIZATION WITH ADAPTIVE DYNAMIC PROGRAMMING

Note that the ETOC (13) depends on the solution V*(x) of the HJB equation (12). To realize the ETOC, ADP is employed to solve the HJB equation directly online, and then an ADP-based ETOC method is developed. By using a critic NN to estimate the optimal value function V*(x), its weights are tuned to minimize the squared residual error of the HJB equation. Subsequently, the stability of the closed-loop system, the bounds for the real performance index, and the interexecution time are analyzed for the developed ADP-based ETOC.

A. Adaptive Dynamic Programming-Based ETOC

To solve the HJB equation (12), a critic NN is employed to approximate its solution V*(x). Then, V*(x) can be represented as

    V*(x) = Σ_{j=1}^{L} θ*_j ψ_j(x) + ε(x)        (35)

where θ* ≜ [θ*₁, ..., θ*_L]ᵀ is the vector of the ideal critic NN weights, ψ_L(x) ≜ [ψ₁(x), ..., ψ_L(x)]ᵀ denotes the vector of critic NN activation functions, and ε(x) is the NN estimation error, which satisfies lim_{L→∞} ε(x) = 0. Although θ* provides the best approximation of V*(x), it is usually unknown and difficult to obtain. In realization, the output of the critic NN is

    V̂(x) = Σ_{j=1}^{L} θ_j ψ_j(x) = ψ_L^T(x)θ        (36)

where θ ≜ [θ₁, ..., θ_L]ᵀ is the estimate of the ideal weight vector θ*. By using (36), it follows from (13) that the ADP-based ETOC is given by

    μ(x̂) = −(1/2)R⁻¹gᵀ(x̂)∇_x̂^T ψ_L(x̂)θ̂        (37)

where

    θ̂(t) = θ(t) for t = t_j;  θ(t_j) for t ∈ (t_j, t_{j+1}).        (38)

C_α(x, x̂) in the triggering condition (17) is implemented with

    C_α(x, x̂) ≜ (1+α)∇_x^T V̂(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R.        (39)

To learn the critic NN weight vector θ, the ADP method is developed. Inspired by the works [33]–[35], the following assumption is introduced.

Assumption 3: Let P(x) ∈ C¹(X) be a Lyapunov function candidate that satisfies

    ∇_x^T P(x)[f(x) + g(x)u*(x)] ≤ 0.        (40)

Moreover, ∇_x P(x) is bounded on the compact set X, i.e., ‖∇_x P‖ ≤ P_{xM}, where P_{xM} > 0.

Due to the estimation error in the critic NN, the replacement of V*(x) and u*(x) in the HJB equation (11) with V̂(x) and μ(x) will result in a residual error, which means H(x, μ(x), ∇_x V̂(x)) ≠ 0. Thus, define the residual error between H(x, μ(x), ∇_x V̂(x)) and H(x, u*(x), ∇_x V*(x)) as follows:

    ξ(x, θ) ≜ H(x, μ(x), ∇_x V̂(x)) − H(x, u*(x), ∇_x V*(x))
            = H(x, μ(x), ∇_x V̂(x)) − 0
            = θᵀ∇_x ψ_L(x)[f(x) + g(x)μ(x)] + Q(x) + ‖μ(x)‖²_R        (41)

where ∇_x^T ψ_L(x) ≜ [∇_x ψ_L(x)]ᵀ and ∇_x ψ_L(x) denotes the Jacobian of ψ_L(x). Based on (41), the squared residual error is denoted as follows:

    E(x, θ) ≜ (1/2)ξ²(x, θ).        (42)


Then, it is desired to tune θ such that the squared residual error (42) is minimized and the system stability can be guaranteed. Therefore, the following tuning rule is developed:

    θ̇ = −βσ(x)ξ(x, θ)γ(x) + κ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x)        (43)

where β > 0 is an adaptive gain and

    γ(x) ≜ ∇_x ψ_L(x)[f(x) + g(x)μ(x)]        (44)
    σ(x) ≜ 1/(1 + γᵀ(x)γ(x))        (45)
    κ ≜ 0 if ∇_x^T P(x)[f(x) + g(x)μ(x̂)] ≤ 0;  0.5 otherwise.        (46)

The key advantage of (46) is that it avoids requiring an initial admissible control policy, and thus the initial NN weights θ(0) of (43) can be selected randomly. Similar techniques have also been applied and analyzed in related ADP works [33], [34].

Remark 3: It is necessary to give a brief description of the implementation of the ADP-based ETOC. First, the critic NN weight θ is computed with (43) using x, and the event-triggering condition (39) is verified using x and x̂. Then, when the event-triggering condition is violated, i.e., C_α(x, x̂) ≥ 0, x̂ and θ̂ are transmitted to compute the control with (37), which is held constant in a zero-order-hold scheme.
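A minimal sketch of one evaluation of the tuning rule (43) may help; it is our own illustration, with psi_grad denoting the user-supplied Jacobian of the activation vector ψ_L and grad_P the gradient of a Lyapunov candidate satisfying (40):

    import numpy as np

    def critic_update(theta, x, f, g, psi_grad, Q, R, grad_P, beta=1.0):
        # One evaluation of the weight tuning rule (43).
        Rinv = np.linalg.inv(R)
        Dpsi = psi_grad(x)                              # Jacobian of psi_L, shape (L, n)
        mu = -0.5 * Rinv @ g(x).T @ (Dpsi.T @ theta)    # control from (37), evaluated at x
        xdot = f(x) + g(x) @ mu
        gamma = Dpsi @ xdot                             # gamma(x) from (44)
        xi = theta @ gamma + Q(x) + mu @ R @ mu         # residual error (41)
        sigma = 1.0 / (1.0 + gamma @ gamma)             # normalization (45)
        kappa = 0.0 if grad_P(x) @ xdot <= 0.0 else 0.5 # switching gain (46)
        theta_dot = (-beta * sigma * xi * gamma
                     + kappa * Dpsi @ g(x) @ Rinv @ g(x).T @ grad_P(x))
        return theta_dot

Integrating theta_dot with any ODE solver, or with a simple Euler step theta = theta + dt * theta_dot, realizes the online learning phase described in Remark 3.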
B. Theoretical Analysis

In this section, some theoretical analyses are provided for the ADP-based ETOC method (37), including the stability, the performance bound, and the interexecution time bound. The proof of the system stability in Theorem 3 is in part inspired by the works [32], [34], [36]. Before starting, the definition of SGUUB and Assumption 4 are given as follows.

Definition 1 [37]: Consider the system (16). The solution x(t) is SGUUB if, for all x(t₀) = x₀ ∈ X, there exist positive constants μ₁ and T(μ₁, x₀) such that ‖x(t)‖ < μ₁ for all t > t₀ + T.

Assumption 4: Assume that:
1) θ* is bounded, i.e., ‖θ*‖ ≤ θ_M, where θ_M > 0;
2) f(x) is Lipschitz continuous, i.e., for all x₁, x₂ ∈ X, there exists l_f > 0 such that ‖f(x₁) − f(x₂)‖ ≤ l_f‖x₁ − x₂‖; f(x) and g(x) are bounded on the compact set X, i.e., ‖f‖ ≤ f_M and ‖g‖ ≤ g_M, where f_M, g_M > 0;
3) ψ_L(x) and ∇_x ψ_L(x) are bounded on the compact set X, i.e., ‖ψ_L‖ ≤ ψ_M and ‖∇_x ψ_L‖ ≤ d_M, where ψ_M, d_M > 0;
4) ∇_x ε(x) is Lipschitz continuous, i.e., for all x₁, x₂ ∈ X, there exists l_d > 0 such that ‖∇_x ε(x₁) − ∇_x ε(x₂)‖ ≤ l_d‖x₁ − x₂‖; ε(x) and ∇_x ε(x) are bounded on the compact set X, i.e., ‖ε‖ ≤ ε_M and ‖∇_x ε‖ ≤ ε_{xM}, where ε_M, ε_{xM} > 0;
5) γ(x) is bounded on the compact set X, i.e., γ_m ≤ ‖γ(x)‖ ≤ γ_M, where γ_m, γ_M > 0.

Define the error of the critic NN weights as θ̃(t) ≜ θ(t) − θ*. Then, it follows from (43) that

    θ̃̇(t) = θ̇(t).        (47)

Theorem 3: Consider the system (1) with the control (37). The triggering condition is given by (17) with (39). Let Assumptions 1–4 hold. Then, the signals x(t), x̂(t), and θ̃(t) are SGUUB.

Proof: Choose the following Lyapunov function candidate:

    L(t) = V*(x) + P(x) + V*(x̂) + V_θ̃(θ̃)        (48)

where V_θ̃(θ̃) ≜ (1/2)θ̃ᵀθ̃. First, let us consider the stability of the flow dynamics on t ∈ [t_j, t_{j+1}) for all j. Taking the derivative along (16) and (47), we have

    L̇(t) = V̇*(x) + Ṗ(x) + V̇*(x̂) + θ̃ᵀθ̃̇.        (49)

We consider each part of (49) separately as follows. From (35), we have

    V*(x) = ψ_L^T(x)θ* + ε(x) = ψ_L^T(x)(θ − θ̃) + ε(x) = V̂(x) − ψ_L^T(x)θ̃ + ε(x).        (50)

Then, by using Assumptions 1 and 4, it follows from (50) that

    V̇*(x) = ∇_x^T V*(x)[f(x) + g(x)μ(x̂)]
          = ∇_x^T V̂(x)[f(x) + g(x)μ(x̂)] − [∇_x^T ψ_L^T(x)θ̃ − ∇_x^T ε(x)][f(x) + g(x)μ(x̂)]
          ≤ −(1/(1+α))[Q(x) + ‖μ(x̂)‖²_R] − [∇_x^T ψ_L^T(x)θ̃ − ∇_x^T ε(x)][f(x) + g(x)μ(x̂)]
          ≤ −(1/(1+α))[l₁‖x‖² + l₃²‖x̂‖²] + (d_M‖θ̃‖ + ε_{xM})(f_M + g_M l₄‖x̂‖).        (51)

The derivative of P(x) is given by

    Ṗ(x) = ∇_x^T P(x)[f(x) + g(x)μ(x̂)]
         = ∇_x^T P(x)[f(x) + g(x)μ(x)] + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)].        (52)

According to the definition of x̂(t) in (14), x̂(t) remains invariant on t ∈ [t_j, t_{j+1}). Thus,

    V̇*(x̂) = 0.        (53)

Based on (43) and (47), we get

    V̇_θ̃(θ̃) = θ̃ᵀθ̃̇ = −βσ(x)ξ(x, θ)θ̃ᵀγ(x) + κθ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x).        (54)

With Assumptions 1 and 4, the first term in (54) is

    −βσ(x)ξ(x, θ)θ̃ᵀγ(x)
      = −β θ̃ᵀ[γ(x)γᵀ(x)θ + (Q(x) + ‖μ(x)‖²_R)γ(x)]/(1 + γᵀ(x)γ(x))
      = −β θ̃ᵀγ(x)γᵀ(x)(θ* + θ̃)/(1 + γᵀ(x)γ(x)) − β(Q(x) + ‖μ(x)‖²_R)θ̃ᵀγ(x)/(1 + γᵀ(x)γ(x))
      ≤ −βγ_m²‖θ̃‖(θ_M + ‖θ̃‖)/(1 + γ_M²) − βγ_m(l₁ + l₃)‖x‖²/(1 + γ_M²).        (55)

Based on the definition of κ in (46), there are two cases.


Case 1: If ∇_x^T P(x)[f(x) + g(x)μ(x̂)] ≤ 0, then κ = 0 and

    Ṗ(x) + κθ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x) = ∇_x^T P(x)[f(x) + g(x)μ(x̂)] ≤ 0.        (56)

Case 2: If ∇_x^T P(x)[f(x) + g(x)μ(x̂)] > 0, then κ = 0.5. It then follows from (52) that

    Ṗ(x) + κθ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x)
      = ∇_x^T P(x)[f(x) + g(x)μ(x)] + 0.5θ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x) + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)]
      = ∇_x^T P(x)[f(x) − 0.5g(x)R⁻¹gᵀ(x)∇_x^T ψ_L(x)(θ − θ̃)] + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)]
      = ∇_x^T P(x)[f(x) − 0.5g(x)R⁻¹gᵀ(x)∇_x^T ψ_L(x)θ*] + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)]
      = ∇_x^T P(x)[f(x) − 0.5g(x)R⁻¹gᵀ(x)∇_x(V*(x) − ε(x))] + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)]
      = ∇_x^T P(x)[f(x) + g(x)u*(x)] + 0.5∇_x^T P(x)g(x)R⁻¹gᵀ(x)∇_x ε(x) + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)].        (57)

According to (56), (57), and Assumptions 1–4, we have

    Ṗ(x) + κθ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x)
      ≤ 0.5∇_x^T P(x)g(x)R⁻¹gᵀ(x)∇_x ε(x) + ∇_x^T P(x)g(x)[μ(x̂) − μ(x)]
      ≤ k₁ + k₂(‖x̂‖ + ‖x‖)        (58)

where k₁ = 0.5P_{xM}g_M²σ̲⁻¹(R)ε_{xM} and k₂ = P_{xM}g_M l₃. Combining (56) and (58) yields

    Ṗ(x) + κθ̃ᵀ∇_x ψ_L(x)g(x)R⁻¹gᵀ(x)∇_x P(x) ≤ max{0, k₁ + k₂(‖x̂‖ + ‖x‖)} = k₁ + k₂(‖x̂‖ + ‖x‖).        (59)

Then, by using (51), (53), (55), and (59), it follows from (49) that

    L̇(t) ≤ −(1/(1+α))[l₁‖x‖² + l₃²‖x̂‖²] + (d_M‖θ̃‖ + ε_{xM})(f_M + g_M l₄‖x̂‖) + k₁ + k₂(‖x̂‖ + ‖x‖)
           − βγ_m²‖θ̃‖(θ_M + ‖θ̃‖)/(1 + γ_M²) − βγ_m(l₁ + l₃)‖x‖²/(1 + γ_M²)
         = −zᵀMz + Nᵀz + υ        (60)

where z = [‖x‖, ‖x̂‖, ‖θ̃‖]ᵀ, υ = ε_{xM}f_M + k₁

    M = [ l₁/(1+α) + βγ_m(l₁+l₃)/(1+γ_M²),  0,  0;
          0,  l₃²/(1+α),  −d_M g_M l₄/2;
          0,  −d_M g_M l₄/2,  βγ_m²/(1+γ_M²) ]

and

    N = [ k₂;  ε_{xM}g_M l₄ + k₂;  d_M f_M − βγ_m²θ_M/(1+γ_M²) ].

Let the parameters be chosen such that M > 0. Then, it follows from (60) that

    L̇(t) ≤ −σ̲(M)‖z‖² + ‖N‖‖z‖ + υ.

Then, by completing the squares in (60), L̇(t) ≤ 0 if ‖z‖ ≥ B, where

    B = (‖N‖ + √(‖N‖² + 4υσ̲(M)))/(2σ̲(M))

gives the bound for the signals x(t), x̂(t), and θ̃(t), i.e., they are SGUUB.

Second, we consider the stability of the jump dynamics at the event-triggered instant t = t_j. Consider the difference of the Lyapunov function at the event-triggered instant t = t_j, which is given by

    ΔL(t_j) = ΔV*(x(t_j)) + ΔP(x(t_j)) + ΔV*(x̂(t_j)) + ΔV_θ̃(θ̃(t_j))        (61)

where ΔV*(x(t_j)) ≜ V*(x(t_j)) − V*(x⁻(t_j)), ΔP(x(t_j)) ≜ P(x(t_j)) − P(x⁻(t_j)), ΔV*(x̂(t_j)) ≜ V*(x̂(t_j)) − V*(x̂⁻(t_j)), and ΔV_θ̃(θ̃(t_j)) = V_θ̃(θ̃(t_j)) − V_θ̃(θ̃⁻(t_j)). On the one hand, it follows from the stability of the flow dynamics that ΔV*(x(t_j)) + ΔP(x(t_j)) + ΔV_θ̃(θ̃(t_j)) ≤ 0 when ‖z‖ ≥ B. On the other hand, ΔV*(x̂(t_j)) ≤ −γ(‖e(t_j)‖) when ‖z‖ ≥ B, where γ(·) is a class-K function [32], [38]. Therefore, ΔL(t_j) < 0 when ‖z‖ ≥ B, which means that the signals x(t), x̂(t), and θ̃(t) are SGUUB at the event-triggered instant t = t_j.

Theorem 4: Consider the system (1) with the control (37). Let Assumptions 1–4 hold. Then:
1) there exists an upper bound for the real performance index Ĵ(x₀, μ);
2) there exists a lower bound h̲ > 0 for h_j, i.e., h_j ≥ h̲ for all j.

Proof: For the proof of Theorem 4, it is assumed that the ADP has achieved the desired convergence, i.e., θ* is obtained.
1) Choose V*(x) as the Lyapunov function. Then, based on (35), taking the derivative of V*(x) along the system (1) with


the control (37), we have

    V̇*(x) = V̂̇(x) + ε̇(x)
           = ∇_x^T V̂(x)[f(x) + g(x)μ(x̂)] + ε̇(x)
           = (1/(1+α))[C_α(x, x̂) − Q(x) − ‖μ(x̂)‖²_R] + ε̇(x)
           ≤ −(1/(1+α))[Q(x) + ‖μ(x̂)‖²_R] + ε̇(x)

that is,

    Q(x) + ‖μ(x̂)‖²_R ≤ −(1+α)[V̇*(x) − ε̇(x)].

According to Assumption 4, we have ε(x(t))|₀^∞ ≤ 2ε_M. Then

    Ĵ(x₀, μ) = ∫₀^∞ [Q(x(t)) + ‖μ(x̂(t))‖²_R] dt
             ≤ −(1+α)∫₀^∞ d[V*(x(t)) − ε(x(t))]
             = −(1+α)V*(x(t))|₀^∞ + (1+α)ε(x(t))|₀^∞
             ≤ (1+α)[J*(x₀) + 2ε_M].

2) According to (39), we have

    C_α(x, x̂) = (1+α)∇_x^T V̂(x)[f(x) + g(x)μ(x̂)] + Q(x) + ‖μ(x̂)‖²_R
              = (1+α)[∇_x^T V*(x) − ∇_x^T ε(x)]f(x) − 2(1+α)μᵀ(x̂)Ru*(x) + Q(x) + ‖μ(x̂)‖²_R
              = (1+α)[−Q(x) + ‖u*(x)‖²_R − ∇_x^T ε(x)f(x)] − 2(1+α)μᵀ(x̂)Ru*(x) + Q(x) + ‖μ(x̂)‖²_R
              = −αQ(x) + (1+α)‖μ(x̂) − u*(x)‖²_R − α‖μ(x̂)‖²_R − (1+α)∇_x^T ε(x)f(x)
              ≤ −αl₁‖x‖² + l_u²(1+α)σ̄(R)‖e‖² + (1+α)l_d l_f‖x‖².        (62)

For a sufficiently accurate critic NN, the NN estimation error ε(x) → 0 and l_d → 0. Then, αl₁ − (1+α)l_d l_f → αl₁ > 0. By using the same steps as (26)–(30), we take the solution s(β) = λ₁ = ([αl₁ − (1+α)l_d l_f]/[l_u²(1+α)σ̄(R)])^{1/2} of the quadratic equation −[αl₁ − (1+α)l_d l_f] + l_u²(1+α)σ̄(R)s²(β) = 0. With λ₁, the rest of the proof is the same as that of Theorem 2 and is omitted to avoid repetition.

In Theorem 3, the stability of the ADP-based event-triggered control is provided. In Theorem 4, it is proven that there exists an upper bound for the real performance index and a lower bound for the interexecution times. From the proof, when the NN estimation error ε(x) → 0, the upper bound for the real performance index and the lower bound for the interexecution times approach the ideal values.

Remark 4: In [22] and [30], some important theoretical results were given to analyze the real performance index for event-triggered control. Comparing this paper with [22] and [30], there are three main differences. First, in [22] and [30], the real performance index for the event-triggered control contains an integral term ∫₀^∞ (u*_c − u*_d)ᵀR(u*_c − u*_d) dt. However, the boundedness of this integral term was not analyzed. In this paper, an upper bound of the real performance index can be determined for a given parameter α. Moreover, the upper bound is also analyzed under the consideration of the NN estimation error in the ADP implementation. Second, in this paper, the ADP method is developed to solve the original HJB equation (12) directly. In [22], ADP was employed to solve the event-triggered HJB equation, which can be viewed as an approximation of the original HJB equation. Third, because the event-triggering condition in this paper is different from those in [22] and [30], the corresponding theoretical analysis is quite different.

V. SIMULATION STUDIES

In this section, the effectiveness of the ETOC is verified on three examples: a power system, a satellite attitude control system, and a nonlinear torsional pendulum system. Moreover, to show the influence of the parameter α in the event-triggering condition, comparative simulation studies are conducted with different α.

A. Case 1: Power System

The system dynamics of the linear power system is given as follows:

    ẋ = Ax + Bu        (63)

where the system matrices are given by

    A = [ −1/T_p,      K_p/T_p,  0;
          0,           −1/T_t,   1/T_t;
          −1/(R₀T_g),  0,        −1/T_g ],    B = [0; 0; 1/T_g]

the system state vector is x = [Δf, ΔP_g, ΔX_g]ᵀ, and u is the control input. Δf, ΔP_g, and ΔX_g denote the incremental frequency deviation, the incremental change in the generator output, and the incremental change in the governor valve position, respectively. T_g = 0.2, T_t = 5, T_p = 2, K_p = 0.5, and R₀ = 0.5 denote the governor time constant, the turbine time constant, the plant model time constant, the plant gain, and the speed regulation due to governor action, respectively. For the performance index (2), let R = 1 and Q(x) = xᵀSx with an identity matrix S. Then, the HJB equation (12) can be solved exactly and its solution is given by V*(x) = xᵀPx, where

    P = [  2.8033,  0.2913, −0.1036;
           0.2913,  2.5831,  0.0671;
          −0.1036,  0.0671,  0.0847 ].
a lower bound for the interexecution times. From the proof, −0.1036 0.0671 0.0847
when the NN estimation error (x) → 0, the upper bound By using the optimal value function V ∗ (x), the ETOC (13)
for the real performance index and the lower bound for the is applied to the power system. For the event-triggering
interexecution times will approach to the ideal values. condition (18), select the parameter α = 0.05. The simulation
Remark 4: In [22] and [30], some important theoreti- results are demonstrated in Figs. 1–4. Figs. 1–3 show the
cal results were given to analyze the real performance trajectories of the state x and x̂, the ETOC μ, and the interex-
index for event-triggered control. Comparing this paper ecution times h j , respectively. It is found in Fig. 3 that the
with [22] and [30], there are three main differences. First, minimum h j is 0.5001s, which shows that the communication
in [22] and [30], the real performance index for the event- of state information and the computation of control are greatly

triggered control contains an integral term 0 (u ∗c − u ∗d )T reduced. Moreover, the optimal performance index J ∗ (x 0 ),

R(u ∗c − u d )dt. However, the boundness of this integral term the real performance index J (x 0 , μ), and the bound (1 +
was not analyzed. In this paper, an upper bound of the real α)J ∗ (x 0 ) are shown in Fig. 4, where the upper bound of the
performance index can be determined for a given parameter α. real performance is guaranteed, i.e., J (x 0 , μ)  (1+α)J ∗ (x 0 ).


Fig. 1. For Case 1, the trajectories of x and x̂.
Fig. 2. For Case 1, the trajectory of the ETOC μ.
Fig. 3. For Case 1, the interexecution time h_j.
Fig. 4. For Case 1, the trajectories of J*(x₀), J(x₀, μ) and the bound (1+α)J*(x₀).
Fig. 5. For Case 2, the trajectories of x and x̂.
Fig. 6. For Case 2, the trajectories of x and x̂.

B. Case 2: Satellite Attitude Control System

For the satellite model considered in [39] and [40], its equations of motion are

    J₁ϑ̈₁ + d(ϑ̇₁ − ϑ̇₂) + k(ϑ₁ − ϑ₂) = T_c
    J₂ϑ̈₂ + d(ϑ̇₂ − ϑ̇₁) + k(ϑ₂ − ϑ₁) = 0        (64)

where ϑ₁ is the angle of the main satellite with respect to the star, ϑ₂ is the angle between the star sensor and the instrument package, k = 0.09 is a torque constant, d = 0.0219 is a viscous damping constant, J₁ = 1 and J₂ = 1 are inertias, and T_c is the control torque. Denoting x = [ϑ₂, ϑ̇₂, ϑ₁, ϑ̇₁]ᵀ and u = T_c, the system (64) can be rewritten as (63), where

    A = [ 0,      1,      0,      0;
          −k/J₂,  −d/J₂,  k/J₂,   d/J₂;
          0,      0,      0,      1;
          k/J₁,   d/J₁,   −k/J₁,  −d/J₁ ],    B = [0; 0; 0; 1/J₁].

For the performance index (2), let R = 1 and Q(x) = xᵀSx with an identity matrix S. Let the parameter α = 0.05 for the event-triggering condition.
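Assuming the simulate_etoc sketch from Section III, linear cases reduce to f(x) = Ax, g(x) = B, and grad_V(x) = 2Px. A hypothetical run for this satellite model (our own variable names) would look like:

    import numpy as np
    from scipy.linalg import solve_continuous_are  # as in the Case 1 sketch

    k_t, d_v, J1, J2 = 0.09, 0.0219, 1.0, 1.0
    A = np.array([[0.0,       1.0,       0.0,      0.0],
                  [-k_t/J2,  -d_v/J2,    k_t/J2,   d_v/J2],
                  [0.0,       0.0,       0.0,      1.0],
                  [k_t/J1,    d_v/J1,   -k_t/J1,  -d_v/J1]])
    B = np.array([[0.0], [0.0], [0.0], [1.0/J1]])
    P = solve_continuous_are(A, B, np.eye(4), np.array([[1.0]]))
    x_T, h_j, J_mu = simulate_etoc(f=lambda x: A @ x, g=lambda x: B,
                                   grad_V=lambda x: 2.0 * P @ x,
                                   Q=lambda x: float(x @ x), R=np.array([[1.0]]),
                                   x0=np.ones(4), alpha=0.05, T=30.0)
    print(h_j.min(), J_mu)   # minimum interexecution time and achieved cost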


Fig. 7. For Case 2, the trajectory of the ETOC μ.
Fig. 8. For Case 2, the interexecution time h_j.
Fig. 9. For Case 2, the trajectories of J*(x₀), J(x₀, μ) and the bound (1+α)J*(x₀).
Fig. 10. For Case 3, the trajectories of x and x̂.
Fig. 11. For Case 3, the trajectories of θ.
Fig. 12. For Case 3, the trajectory of the ETOC μ.

The simulation results are shown in Figs. 5–9. Figs. 5–7 show the trajectories of the states and the control. The interexecution times h_j are shown in Fig. 8, where the achieved minimum h_j is 0.1201 s. In Fig. 9, it is observed that the achieved real performance is J(x₀, μ) = 15.6072, which is below the upper bound (1+α)J*(x₀).

C. Case 3: Torsional Pendulum System

The nonlinear torsional pendulum system [41] is given by

    dϑ/dt = ω
    J dω/dt = u − Mgl sin(ϑ) − f_d ω        (65)

where u is the control input, M = 1 kg is the mass, and l = 3 m is the length of the pendulum bar. Denote the system state as x = [ϑ, ω]ᵀ, where ϑ and ω are the current angle and angular velocity with initial values given by ϑ(0) = 0.2 and ω(0) = −0.2, respectively. g = 9.8 m/s² is the gravity, J = 4Ml²/3 is the rotary inertia, and f_d is the frictional factor. For the performance index (2), R = 1 and Q(x) = xᵀSx with S being an identity matrix.

To approximate the value function of the HJB equation, select the NN activation functions as ψ_L(x) = [x₁², x₁x₂, x₂², x₁³, x₁²x₂, x₁x₂², x₂³, x₁⁴, x₁³x₂, x₁²x₂², x₁x₂³, x₂⁴]ᵀ.
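These activations and the Jacobian required by the tuning rule (43) can be hand-coded directly for the two-state case; a sketch (our own helper names psi and psi_grad):

    import numpy as np

    def psi(x):
        # Polynomial activations used for Case 3: all monomials of degrees 2-4.
        x1, x2 = x
        return np.array([x1**2, x1*x2, x2**2,
                         x1**3, x1**2*x2, x1*x2**2, x2**3,
                         x1**4, x1**3*x2, x1**2*x2**2, x1*x2**3, x2**4])

    def psi_grad(x):
        # Jacobian of psi, shape (12, 2); row j is the gradient of psi_j.
        x1, x2 = x
        return np.array([[2*x1, 0.0], [x2, x1], [0.0, 2*x2],
                         [3*x1**2, 0.0], [2*x1*x2, x1**2],
                         [x2**2, 2*x1*x2], [0.0, 3*x2**2],
                         [4*x1**3, 0.0], [3*x1**2*x2, x1**3],
                         [2*x1*x2**2, 2*x1**2*x2],
                         [x2**3, 3*x1*x2**2], [0.0, 4*x2**3]])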


Fig. 13. For Case 3, the interexecution time h_j.
Fig. 14. For Case 4, the trajectories of x₁, x₂, x̂₁ and x̂₂.
Fig. 15. For Case 4, the trajectories of x₃, x₄, x̂₃ and x̂₄.
Fig. 16. For Case 4, the trajectory of the ETOC μ.
Fig. 17. For Case 4, the interexecution time h_j.
Fig. 18. For Case 4, the trajectories of J*(x₀), J(x₀, μ) and the bound (1+α)J*(x₀).

For the event-triggering condition (18), select the parameter α = 0.8. Then, the ADP-based ETOC is applied to the torsional pendulum system. Figs. 10–13 demonstrate the trajectories of the state x and x̂, the NN weight θ, the ETOC μ, the event-triggering condition C_α(x, x̂), and the interexecution times h_j, respectively. In Fig. 13, it is found that the minimum h_j is 0.0220 s.

D. Case 4: Event-Triggering Condition With Larger α

To study the influence of the parameter α, we conduct a simulation on the satellite attitude control system (64) with a larger parameter α = 0.5; the other parameters remain the same as those for Case 2. The simulation results are demonstrated in Figs. 14–18. Comparing Fig. 8 with Fig. 17, it is found that the achieved minimum h_j is 0.3203 s for Case 4, which is larger than that for Case 2. Comparing Fig. 9 with Fig. 18, it is observed that the achieved real performance is J(x₀, μ) = 15.3893 for Case 4, which is larger than that for Case 2. That is to say, the parameter α achieves a tradeoff between the performance index and the reduction of resources, which verifies the analysis in Remark 2.


VI. CONCLUSION

For the optimal control problem of nonlinear continuous-time systems, a novel event-triggering condition is proposed for ETOC design. It is proven that the developed ETOC method can guarantee an upper bound for the real performance index. Moreover, the stability of the closed-loop system and the existence of a lower bound for interexecution times have been analyzed. To solve the HJB equation, the critic NN was used to approximate the optimal value function and the ADP-based ETOC method has been developed. The stability, the bounds for the performance index, and the interexecution time have been proved. In the simulation studies, the effectiveness of the developed ETOC method was verified on four cases.

There are some related interesting topics that will be studied in the future. First, study model-free ETOC methods with performance guarantee. Second, study event-triggering schemes to update the critic NN weights. Third, study practical applications of the developed ADP-based ETOC method.

REFERENCES

[1] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. Hoboken, NJ, USA: Wiley, 2013.
[2] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779–791, May 2005.
[3] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943–949, Aug. 2008.
[4] D. Vrabie and F. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Netw., vol. 22, no. 3, pp. 237–246, 2009.
[5] H. Zhang, Y. Luo, and D. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490–1503, Sep. 2009.
[6] T. Dierks and S. Jagannathan, "Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 7, pp. 1118–1129, Jul. 2012.
[7] A. Heydari and S. N. Balakrishnan, "Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 1, pp. 147–157, Jan. 2013.
[8] F. L. Lewis and D. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, vol. 17. Hoboken, NJ, USA: Wiley, 2013.
[9] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, pp. 1513–1525, Oct. 2013.
[10] Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 913–928, Jun. 2013.
[11] B. Luo, H.-N. Wu, T. Huang, and D. Liu, "Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design," Automatica, vol. 50, no. 12, pp. 3281–3290, 2014.
[12] Q.-Y. Fan and G.-H. Yang, "Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 1, pp. 165–177, Jan. 2016.
[13] T. Bian, Y. Jiang, and Z.-P. Jiang, "Adaptive dynamic programming and optimal control of nonlinear nonaffine systems," Automatica, vol. 50, no. 10, pp. 2624–2632, Oct. 2014.
[14] D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Programming with Applications in Optimal Control. Cham, Switzerland: Springer, 2017.
[15] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, "Optimal and autonomous control using reinforcement learning: A survey," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 6, pp. 2042–2062, Jun. 2018.
[16] R. Kamalapurkar, L. Andrews, P. Walters, and W. E. Dixon, "Model-based reinforcement learning for infinite-horizon approximate optimal tracking," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 753–758, Mar. 2017.
[17] Q. Wei, D. Liu, Y. Liu, and R. Song, "Optimal constrained self-learning battery sequential management in microgrid via adaptive dynamic programming," IEEE/CAA J. Autom. Sinica, vol. 4, no. 2, pp. 168–176, Apr. 2017.
[18] M. D. S. Aliyu, "An iterative relaxation approach to the solution of the Hamilton-Jacobi-Bellman-Isaacs equation in nonlinear optimal control," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 360–366, Jan. 2018.
[19] P. Tabuada, "Event-triggered real-time scheduling of stabilizing control tasks," IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1680–1685, Sep. 2007.
[20] W. P. M. H. Heemels, K. H. Johansson, and P. Tabuada, "An introduction to event-triggered and self-triggered control," in Proc. IEEE 51st Annu. Conf. Decision Control (CDC), Maui, HI, USA, Dec. 2012, pp. 3270–3285.
[21] M. Lemmon, Event-Triggered Feedback in Control, Estimation, and Optimization. London, U.K.: Springer, 2010, pp. 293–358.
[22] K. G. Vamvoudakis, "Event-triggered optimal adaptive control algorithm for continuous-time nonlinear systems," IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 282–293, Jul. 2014.
[23] A. Sahoo, H. Xu, and S. Jagannathan, "Near optimal event-triggered control of nonlinear discrete-time systems using neurodynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 9, pp. 1801–1815, Sep. 2015.
[24] L. Dong, X. Zhong, C. Sun, and H. He, "Event-triggered adaptive dynamic programming for continuous-time systems with control constraints," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 8, pp. 1941–1952, Aug. 2017.
[25] A. Sahoo, H. Xu, and S. Jagannathan, "Approximate optimal control of affine nonlinear continuous-time systems using event-sampled neurodynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 639–652, Mar. 2017.
[26] Q. Zhang, D. Zhao, and D. Wang, "Event-based robust control for uncertain nonlinear systems using adaptive dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 1, pp. 37–50, Jan. 2018.
[27] X. Zhong and H. He, "An event-triggered ADP control approach for continuous-time system with unknown internal states," IEEE Trans. Cybern., vol. 47, no. 3, pp. 683–694, Mar. 2017.
[28] Y. Zhu, D. Zhao, H. He, and J. Ji, "Event-triggered optimal control for partially unknown constrained-input systems via adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 64, no. 5, pp. 4101–4109, May 2017.
[29] L. Dong, X. Zhong, C. Sun, and H. He, "Adaptive event-triggered control based on heuristic dynamic programming for nonlinear discrete-time systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 7, pp. 1594–1605, Jul. 2017.
[30] K. G. Vamvoudakis and H. Ferraz, "Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance," Automatica, vol. 87, pp. 412–420, Jan. 2018.
[31] D. Wang, C. Mu, D. Liu, and H. Ma, "On mixed data and event driven design for adaptive-critic-based nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 993–1005, Apr. 2018.
[32] D. Wang, C. Mu, H. He, and D. Liu, "Event-driven adaptive robust control of nonlinear systems with uncertainties through NDP strategy," IEEE Trans. Syst., Man, Cybern., Syst., vol. 47, no. 7, pp. 1358–1370, Jul. 2017.
[33] D. Nodland, H. Zargarzadeh, and S. Jagannathan, "Neural network-based optimal adaptive output feedback control of a helicopter UAV," IEEE Trans. Neural Netw., vol. 24, no. 7, pp. 1061–1073, Jul. 2013.
[34] H. Zargarzadeh, T. Dierks, and S. Jagannathan, "Optimal control of nonlinear continuous-time systems in strict-feedback form," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 10, pp. 2535–2549, Oct. 2015.
[35] X. Yang, H. He, and D. Liu, "Event-triggered optimal neuro-controller design with reinforcement learning for unknown nonlinear systems," IEEE Trans. Syst., Man, Cybern., Syst., to be published. [Online]. Available: https://ieeexplore.ieee.org/document/8183439, doi: 10.1109/TSMC.2017.2774602.


[36] K. G. Vamvoudakis and F. L. Lewis, "Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878–888, 2010.
[37] S. S. Ge, C. C. Hang, T. H. Lee, and T. Zhang, Stable Adaptive Neural Network Control. Norwell, MA, USA: Kluwer, 2001.
[38] H. K. Khalil and J. Grizzle, Nonlinear Systems, vol. 3. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.
[39] X.-M. Zhang and Q.-L. Han, "Event-triggered dynamic output feedback control for networked control systems," IET Control Theory Appl., vol. 8, no. 4, pp. 226–234, Mar. 2014.
[40] G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems. Boston, MA, USA: Addison-Wesley, 1986.
[41] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621–634, Mar. 2014.

Biao Luo (M'15) received the Ph.D. degree from Beihang University, Beijing, China, in 2014. From 2014 to 2018, he was an Associate Professor and an Assistant Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor with the School of Automation, Central South University, Changsha, China. His current research interests include distributed parameter systems, intelligent control, reinforcement learning, deep learning, and computational intelligence. Dr. Luo was a recipient of the Chinese Association of Automation Outstanding Ph.D. Dissertation Award in 2015. He serves as an Associate Editor for the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, Artificial Intelligence Review, Neurocomputing, and the Journal of Industrial and Management Optimization. He is the Secretariat of the Adaptive Dynamic Programming and Reinforcement Learning Technical Committee, Chinese Association of Automation.

Yin Yang received the Ph.D. degree from the Hong Kong University of Science and Technology, Hong Kong, in 2009. He is currently an Assistant Professor with the Division of Information and Communication Technologies, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar. He also holds adjunct positions with the Qatar Computing Research Institute, Doha, and the Advanced Digital Sciences Center, Singapore. He has authored or co-authored extensively in top venues on differentially private data publication and analysis, and on query authentication in outsourced databases. He has also designed efficient query processing methods in various contexts, including data streams, relational keyword search, spatial databases, Web portals, and wireless sensor networks. He is currently researching actively on cloud-based big data analytics, especially real-time stream processing. His current research interests include big data analytics, cloud computing, and database security and privacy.

Derong Liu (S'91–M'94–SM'96–F'05) received the Ph.D. degree in electrical engineering from the University of Notre Dame, Notre Dame, IN, USA, in 1994. In 2006, he joined the University of Illinois at Chicago, Chicago, IL, USA, as a Full Professor of electrical and computer engineering and of computer science. In 2008, he was selected for the 100 Talents Program by the Chinese Academy of Sciences. From 2010 to 2015, he was the Associate Director of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He has authored or co-authored 18 books. Dr. Liu is a Fellow of the International Neural Network Society and the International Association of Pattern Recognition. He was the Editor-in-Chief of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS from 2010 to 2015. He is currently the Editor-in-Chief of Artificial Intelligence Review (Springer).

Huai-Ning Wu received the B.E. degree in automation from the Shandong Institute of Building Materials Industry, Jinan, China, in 1992, and the Ph.D. degree in control theory and control engineering from Xi'an Jiaotong University, Xi'an, China, in 1997. He is currently a Professor with Beihang University, Beijing, China, and a Distinguished Professor of Yangtze River Scholar with the Ministry of Education of China, Beijing. His current research interests include robust control, fault-tolerant control, distributed parameter systems, and fuzzy/neural modeling and control. Dr. Wu was a recipient of the China National Funds for Distinguished Young Scientists. He serves as an Associate Editor for the IEEE TRANSACTIONS ON FUZZY SYSTEMS and the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS.
