
Nonlinear Dynamics
Adaptive Optimal Coordination Control of perturbed Bilateral Teleoperators with variable Time Delays using Actor-Critic Reinforcement Learning
--Manuscript Draft--
Manuscript Number:
Full Title: Adaptive Optimal Coordination Control of perturbed Bilateral Teleoperators with variable Time Delays using Actor-Critic Reinforcement Learning
Article Type: Original Research
Keywords: Reinforcement learning, Approximate/adaptive dynamic programming (ADP), Actor/Critic Structure, Robust Integral of the Sign of the Error (RISE), Bilateral Teleoperators
Corresponding Author: Phuong Nam Dao, PhD
Hanoi University of Science and Technology School of Electrical Engineering
Hanoi, VIET NAM
Corresponding Author Secondary Information:
Corresponding Author's Institution: Hanoi University of Science and Technology School of Electrical Engineering
Corresponding Author's Secondary Institution:
First Author: Phuong Nam Dao, PhD
First Author Secondary Information:
Order of Authors: Phuong Nam Dao, PhD
Quang Phat Nguyen, Master
Manh Hung Vu, Master
The Anh Nguyen, Master
Order of Authors Secondary Information:
Funding Information: Ministry of Science and Technology (DTDLCN.19/23), Associate Professor Phuong Nam Dao
Abstract: In this article, we study the unification of the coordination control problem between the two sides and optimal control effectiveness for unknown Bilateral Teleoperators (BTs) under variable time delays in the communication channel and external disturbances. We propose a control frame combining an Actor/Critic strategy with the Robust Integral of the Sign of the Error (RISE), in which the synchronization effectiveness is discussed in two sections under different conditions. A sliding variable is introduced to transform the BT dynamic model into a reduced-order model, which is more favorable for control design. By fully analyzing the optimization problem in designing the training weights of the Actor/Critic structure based on the property of the Hamiltonian term, a Reinforcement Learning (RL) control scheme is proposed for each side of the BT system. Consequently, we incorporate the RISE term into the proposed control frame to mathematically prove that the tracking errors asymptotically converge to zero. Furthermore, the proposed control strategy also guarantees the convergence of the learning process. Simulation results and comparisons demonstrate the performance of the proposed control frame.
Order of Authors (with Contributor Roles): Phuong Nam Dao, PhD (Conceptualization: Lead; Investigation: Lead; Methodology: Lead; Project administration: Lead; Writing – original draft: Lead; Writing – review & editing: Lead)
Quang Phat Nguyen, Master (Data curation: Equal; Software: Equal; Validation: Equal)
Manh Hung Vu, Master (Data curation: Equal; Software: Equal; Validation: Equal; Visualization: Equal)
The Anh Nguyen, Master (Formal analysis: Equal; Validation: Supporting; Visualization: Supporting)
Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation
Adaptive Optimal Coordination Control of perturbed Bilateral Teleoperators with variable Time Delays using Actor-Critic Reinforcement Learning
Phuong Nam Dao1*, Quang Phat Nguyen1 , Manh Hung Vu1 and The Anh Nguyen1
1* School of Electrical and Electronic Engineering, Hanoi University of Science and
Technology, 01 Dai Co Viet, Hanoi, 100000, Vietnam.
*Corresponding author(s). E-mail(s): nam.daophuong@hust.edu.vn;
Abstract
In this article, we study the unification of the coordination control problem between the two sides and optimal control effectiveness for unknown Bilateral Teleoperators (BTs) under variable time delays in the communication channel and external disturbances. We propose a control frame combining an Actor/Critic strategy with the Robust Integral of the Sign of the Error (RISE), in which the synchronization effectiveness is discussed in two sections under different conditions. A sliding variable is introduced to transform the BT dynamic model into a reduced-order model, which is more favorable for control design. By fully analyzing the optimization problem in designing the training weights of the Actor/Critic structure based on the property of the Hamiltonian term, a Reinforcement Learning (RL) control scheme is proposed for each side of the BT system. Consequently, we incorporate the RISE term into the proposed control frame to mathematically prove that the tracking errors asymptotically converge to zero. Furthermore, the proposed control strategy also guarantees the convergence of the learning process. Simulation results and comparisons demonstrate the performance of the proposed control frame.
Keywords: Reinforcement learning, Approximate/adaptive dynamic programming (ADP), Actor/Critic Structure, Robust Integral of the Sign of the Error (RISE), Bilateral Teleoperators
1 Introduction

Numerous studies over the past thirty years have developed control schemes for bilateral motion synchronization in bilateral teleoperators (BTs) (see [1, 2] and the references therein). This synchronization requires tracking performance between the master and slave sides in the presence of the interaction between the environment and the slave robot, the interaction between the human operator and the master robot, time-varying delays, and actuator saturation [1, 2]. Based on the equivalence between mechanical systems and electrical circuits in terms of physical phenomena, the well-known technique of scattering/wave variables has been employed to describe the exchange of energy among the main parts of BTs and to obtain the conventional passivity-based controller [3, 4]. However, scattering theory is difficult to implement in several situations, such as time-varying delays and actuator saturation. Moreover, another technique employing an N-port structure [5, 6] was used to model the exchange of energy in BTs. In [6], 2-port network theory combined with a passivity-based controller was utilized for a hexapod robot bilateral teleoperator with two parts, body level and leg level.
However, descriptions based on scattering/wave variables, Bond graphs, and N-port structures are difficult to extend to the time-varying delay issue and input constraints in BTs [7].

To employ Lyapunov stability theory as the key tool for control design in BTs, the Euler-Lagrange (EL) dynamical equation and the state-space representation in joint and task spaces were utilized [8, 9]. In [8], the time-varying delay issue in the communication channel between the master and slave sides was solved based on an appropriate integral Lyapunov function. Moreover, several different methods have been proposed in [10–12] to handle the time-varying delay under the EL dynamical equation. An important technique, using an integral Lyapunov function whose terminal term is a function of the time-varying delay, was presented for addressing the variable time delay problem and the force-reflecting control law after employing a force observer [10]. For the case of unknown parameters in BTs, the authors in [13] proposed adaptive coordination tracking control with a backstepping procedure and a force-reflecting control scheme based on Neural Networks (NNs) to estimate the environmental torque. NN-based methods have also been developed in several other BT control systems, such as the extension to finite-time convergence [12]. However, these methods face difficulties in finding an appropriate activation function as well as in computation. Moreover, actuator saturation is a practical challenge that has not yet been solved in [10, 12]. To deal with the full-state constraint in nonlinear pure-feedback systems, in the case that the norm of the state vector satisfies a boundary condition, a log-type Barrier Lyapunov Function (BLF) was discussed in [14]. Moreover, although the authors in [15] investigated input saturation by employing an adaptive fuzzy control design, which may degrade the control performance, the input constraint issue is still a challenge for BT control design.

To handle state and input constraints in robotic systems, an optimal control strategy is capable of addressing these challenges by considering the constraint set. The authors in [16] established traditional optimal control for BTs after linearization, with a constant time delay and without considering the solution of the Riccati equation by an appropriate algorithm. It follows that the work in [16] only provides guarantees for the linearized model of BTs in the case of a particular cost function. To develop nonlinear optimal control for robotic systems with general cost functions, approximate algorithms and convergence analyses must be investigated, because the objective of the optimal control problem is to obtain, among the set of stabilizing control designs, a controller that minimizes a given cost function. For the optimal regulation of nonlinear systems, it is unavoidable to face the partial differential Hamilton-Jacobi-Bellman (HJB) equation, which is extremely hard to solve directly for the optimal controller.

To the best of our knowledge, the unification of the convergence of the RL algorithm and coordination tracking in robotic control systems has been investigated by only a few studies [17–22]. Under a time-varying desired reference, the optimal control design for robotic systems requires a time-varying optimal controller and the equivalent time-varying Bellman function, which leads to the appearance of the partial derivative term $-\partial V(x,t)/\partial t$ with respect to time in the HJB equation [18]. To solve this challenge, a direct method employing the Newton-Leibniz formula was studied in [23] to achieve the optimal control law under the influence of the partial derivative term $-\partial V(x,t)/\partial t$. To eliminate this term when solving the HJB equation by an RL algorithm in robotic control systems, a different approach was presented that transforms the time-varying closed-loop system into a time-invariant system by adding auxiliary state variables [17, 18]. Nevertheless, the convergence of the adjusting mechanism requires the Persistence of Excitation (PE) condition to be satisfied [17, 18]. With the concurrent learning technique, by inserting an additional term into the adjusting mechanism of the learning process, convergence can be achieved in the absence of the PE condition [19].

As is well known, unknown parameters and external disturbances are common in many robotic control applications, such as cooperating manipulators [17], surface vessels [18], and BTs [1]. As an extension of the RL algorithm to deal with dynamic uncertainties
and external disturbances, the Off-Policy technique can be considered a promising solution because it can simultaneously compute the approximate optimal control law and the Bellman function [24–28]. Moreover, a different approach has been presented that combines an RL algorithm with an identifier to estimate unknown dynamics [29]. On the other hand, to handle external disturbances with RL algorithms in robotic control systems, a Disturbance Observer (DO) can be added as an additional term [18]. Besides, an effective solution has been proposed by integrating the disturbance into the RL algorithm [30]. The disadvantage of the Off-Policy algorithm is that appropriate data must be collected to satisfy a full-rank property [28]. Additionally, the unification of tracking performance and training law convergence has not yet been investigated in off-policy implementations for nonlinear systems [4, 26, 27]. With regard to estimation under unknown parameters and external disturbances, the significant method of the Robust Integral of the Sign of the Error (RISE) term was introduced in [31–33]. However, the connection of this RISE term to the optimal control problem has only been solved in a particular case for traditional manipulators [31].

Inspired by the above works and analyses, ranging from traditional nonlinear control techniques to optimal control strategies, the central purpose of this article is to unify the optimal control problem with coordination control between the two sides and the estimation of dynamic uncertainties for BT systems. None of the previous works achieves these aspects simultaneously. The proposed controller is developed using the frame of the Actor/Critic strategy and the RISE design, with the main contributions described as follows:

1) Different from the traditional approaches of Lyapunov stability theory and scattering theory for BTs [1–7, 9–13, 15, 34, 35], which only study the coordination performance, a novel RISE-based Actor/Critic control design is established to achieve not only the optimal control requirement but also the coordination tracking control objective between the two sides in perturbed bilateral teleoperators in the presence of time-varying delays.
2) Compared with the optimal control for BTs in [16], which considers a particular cost function after linearizing the BT model and the optimal control law, and with the RL control schemes in [17, 18, 20, 29, 31, 36, 37] developed for different robotic systems, the proposed RISE-based Actor/Critic control strategy is constructed in the absence of an analytical solution. Moreover, the proposed RISE-based RL control framework is established for a general quadratic performance index without the connection between weight matrices described in [16, 31].
3) The convergence of the Actor/Critic algorithms, the coordination control effectiveness between the two sides, and the observer of dynamic uncertainties are established by theoretical proofs, and a series of comparative simulations is conducted to demonstrate the performance of the proposed control structure.

The remainder of this article is organized as follows. Section 2 gives the problem statement and preliminaries, with the BT model and the control objective. The main results for the control frame using the Actor/Critic structure and RISE estimation are proposed in Section 3. Section 4 shows the BT simulation studies, and a brief conclusion is drawn in Section 5.

Notation: We use the following notation throughout this paper. Let $k \in \{l, r\}$ denote the local side ($l$) or remote side ($r$) of the considered BT system. $\|x\|$ denotes the Euclidean norm of the vector $x = [x_1, \ldots, x_n]^T \in \mathbb{R}^n$, i.e., $\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$.

2 Problem Statements and Preliminaries

The Bilateral Teleoperator (BT) consists of a local robot ($l$) and a remote robot ($r$) with joint-space dynamics given as:

\[
\begin{cases}
M_l(\eta_l)\ddot{\eta}_l + C_l(\eta_l,\dot{\eta}_l)\dot{\eta}_l + G_l(\eta_l) + F_l(\dot{\eta}_l) = \tau_l + \tau_h + \omega_l \\
M_r(\eta_r)\ddot{\eta}_r + C_r(\eta_r,\dot{\eta}_r)\dot{\eta}_r + G_r(\eta_r) + F_r(\dot{\eta}_r) = \tau_r - \tau_e + \omega_r
\end{cases}
\tag{1}
\]

Although the BT structure (1) includes two sides, thanks to the similarity between them, we only need to consider the control design for robot manipulator systems formulated
by the following dynamic equation:

\[
M_k(\eta_k)\ddot{\eta}_k + C_k(\eta_k,\dot{\eta}_k)\dot{\eta}_k + G_k(\eta_k) + F_k(\dot{\eta}_k) = \tau_k(t) + \tau_k'(t)
\tag{2}
\]

with

\[
k = l \text{ or } r; \qquad \tau_l' = \tau_h + \omega_l, \qquad \tau_r' = -\tau_e + \omega_r;
\tag{3}
\]

where $\tau_h$ is the human torque that supports the motion on the local side and $\tau_e$ is the environment torque on the remote side. Furthermore, $M_k(\eta_k) \in \mathbb{R}^{n\times n}$ is the generalized inertia matrix, which is symmetric and positive definite, $C_k(\eta_k,\dot{\eta}_k) \in \mathbb{R}^{n\times n}$ is the centripetal-Coriolis matrix, and $G_k(\eta_k) \in \mathbb{R}^n$ is the gravity vector computed by $G_k(\eta_k) = \partial U_k(\eta_k)/\partial \eta_k$, where $U_k(\eta_k)$ is the potential energy. Additionally, $F_k(\dot{\eta}_k) \in \mathbb{R}^n$ and $\tau_k(t)$ are the vectors of generalized friction and control inputs, respectively. The following properties and assumptions are used for the control design in the next sections.

Property 1 The symmetric inertia matrix $M_k(\eta_k)$ is positive definite, and there exist a positive number $a \in \mathbb{R}$ and a positive function $b(\eta_k) \in \mathbb{R}$ of $\eta_k$ such that, for all $\xi \in \mathbb{R}^n$:

\[
a\|\xi\|^2 \le \xi^T M_k(\eta_k)\xi \le b(\eta_k)\|\xi\|^2,
\tag{4}
\]

\[
\xi^T\left(\dot{M}_k(\eta_k) - 2C_k(\eta_k,\dot{\eta}_k)\right)\xi = 0.
\tag{5}
\]

Assumption 1 If $\eta_k(t), \dot{\eta}_k(t) \in \mathcal{L}_\infty$, then the functions $C_k(\eta_k,\dot{\eta}_k)$, $F_k(\dot{\eta}_k)$, $G_k(\eta_k)$, the first and second partial derivatives of $M_k(\eta_k)$, $C_k(\eta_k,\dot{\eta}_k)$, $G_k(\eta_k)$ with respect to $\eta_k(t)$, and those of the elements of $C_k(\eta_k,\dot{\eta}_k)$, $F_k(\dot{\eta}_k)$ with respect to $\dot{\eta}_k(t)$ exist and are bounded.

Assumption 2 The desired trajectory $\eta_k^d(t) = \eta_k^{ref}(t)$ (Fig. 1) as well as its first, second, third and fourth time derivatives exist and are bounded. Additionally, the vectors $\eta_l$, $\eta_r$ on both sides are measurable and differentiable.

Assumption 3 The vector $\tau_k'$ in (3) and its time derivatives are bounded by known constants.

Control Objective: When the BT described by (1), (2) is subject to dynamic uncertainties, external disturbances, human and environment torques, as well as time-varying delays, the purpose is to design a control frame, including RISE and an online Actor/Critic based optimal controller, that guarantees the coordination between the two sides (6) (Fig. 1) while minimizing the cost function (16):

\[
\lim_{t\to\infty}\left(\eta_l(t) - \eta_r(t - T_r(t))\right) = 0; \qquad
\lim_{t\to\infty}\left(\eta_r(t) - \eta_l(t - T_l(t))\right) = 0;
\tag{6}
\]

where $T_r(t)$ and $T_l(t)$ denote the time-varying delays on the remote and local sides, respectively.

Remark 1 Unlike conventional BT control schemes [1–6, 13, 15, 34, 38], which only address the synchronization between the two sides, the proposed control strategy theoretically guarantees not only the convergence between the two sides described in (6) but also the minimization of the performance index in the presence of variable time delays. Moreover, the optimal control objective has rarely been addressed in BT control algorithms. The controller proposed in [16] only treats a special case of the optimal control problem without analyzing the solution in general cases. In this article, RL-based optimal control is developed together with the convergence of the training law.

3 Control Design and Analysis

In this section, we design a control structure that combines the Robust Integral of the Sign of the Error (RISE) with an Actor/Critic based optimal control scheme (see Fig. 1). The coordination tracking control, the optimal control problem, and the convergence of the RL controller are theoretically analyzed.

To reduce the order of the dynamic model (2), the following sliding variable is utilized:

\[
s_k(t) = \dot{e}_k + \lambda_k e_k
\tag{7}
\]

where

\[
e_k(t) = \eta_k^{ref} - \eta_k, \quad \lambda_k \in \mathbb{R}^{n\times n} > 0; \qquad
\eta_k^{ref} =
\begin{cases}
\eta_r(t - T_r(t)), & \text{in Local Side } (k = l) \\
\eta_l(t - T_l(t)), & \text{in Remote Side } (k = r)
\end{cases}
\tag{8}
\]
Fig. 1: The Control Structure of Master/Slave Side in BTs
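As an illustration of (7) and (8), the following Python sketch evaluates the delayed reference and the sliding variable on one side from sampled joint data. It is a minimal sketch under stated assumptions, not the implementation used in this article: the function names `delayed_sample` and `sliding_variable`, the uniform sampling step `dt`, the buffer layout, and the nearest-sample lookup of the delayed signal are all hypothetical choices, and the time derivative of the delayed reference is assumed to be supplied by the caller.

```python
from typing import List, Sequence

Vector = List[float]


def delayed_sample(history: Sequence[Vector], t: float, delay: float, dt: float) -> Vector:
    """Return the stored sample nearest to time t - delay.

    Assumption: history[i] holds the signal at time i * dt.
    The index is clamped, so requests before t = 0 return the first sample.
    """
    idx = int(round((t - delay) / dt))
    idx = max(0, min(idx, len(history) - 1))
    return list(history[idx])


def sliding_variable(eta_ref: Vector, eta_ref_dot: Vector,
                     eta: Vector, eta_dot: Vector,
                     lam: Sequence[Vector]) -> Vector:
    """Compute s_k = e_k_dot + lambda_k e_k with e_k = eta_ref - eta, as in (7)-(8)."""
    n = len(eta)
    e = [eta_ref[i] - eta[i] for i in range(n)]
    e_dot = [eta_ref_dot[i] - eta_dot[i] for i in range(n)]
    # Matrix-vector product lambda_k e_k written out explicitly.
    lam_e = [sum(lam[i][j] * e[j] for j in range(n)) for i in range(n)]
    return [e_dot[i] + lam_e[i] for i in range(n)]


if __name__ == "__main__":
    dt = 0.5
    # Hypothetical remote-side joint positions sampled every dt seconds.
    remote_history = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4], [0.3, 0.6], [0.4, 0.8]]
    # Local side (k = l): the reference is the remote position delayed by T_r(t).
    eta_ref = delayed_sample(remote_history, t=2.0, delay=1.0, dt=dt)
    s_l = sliding_variable(eta_ref, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                           [[2.0, 0.0], [0.0, 2.0]])
    print(eta_ref, s_l)
```

In a real delayed channel the reference velocity also depends on the rate of change of the delay, which this sketch leaves to the caller.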
Substituting (7) into (2), the dynamic model of the BT is rewritten as:

\[
M_k \dot{s}_k = -C_k s_k - \tau_k + f_k - \tau_k'
\tag{9}
\]

where $f_k(\eta_k, \dot{\eta}_k, \eta_k^{ref}, \dot{\eta}_k^{ref}, \ddot{\eta}_k^{ref})$ is obtained from the reference and the dynamic system:

\[
f_k = M_k\left(\ddot{\eta}_k^{ref} + \lambda_k \dot{e}_k\right) + C_k\left(\dot{\eta}_k^{ref} + \lambda_k e_k\right) + G_k + F_k
\tag{10}
\]

By virtue of [17], computing the constraint forces $\tau_h$, $\tau_e$ in (1), (2) yields a motion/force control design once the motion control scheme based on estimating the constraint forces in the original model (1), (2) has been completed. Nevertheless, in each BT, the constraint forces $\tau_h$, $\tau_e$ in (1), (2), (3) cannot be ignored because of the interaction between the two sides. This implies the requirement of estimating them by the RISE (Robust Integral of the Sign of the Error) technique in the next sections. To unify optimal control and tracking effectiveness, the proposed controller consists of two terms (Fig. 1): the online Actor/Critic RL control $\hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t)$ and the RISE-based estimator $\hat{\mu}_k(t)$ of the unknown term $f_k - \tau_k'$ in (9):

\[
\tau_k(t) = \hat{\mu}_k(t) - \hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t)
\tag{11}
\]

Then, substituting $\tau_k(t)$ from (11) into (9) yields:

\[
M_k \dot{s}_k = -C_k s_k + \hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t) - \hat{\mu}_k(t) + f_k - \tau_k'
\tag{12}
\]

Remark 2 Due to the non-integrability property of mobile robots [18, 36], a separation technique was employed to split their models into kinematic and dynamic parts. However, this method is not used for general manipulators with the integrability property [1, 5]. In this article, the sliding variable $s_k(t)$ is employed to obtain the reduced-order model (9) of second-order uncertain/perturbed BT systems. Furthermore, the sum of the term $\tau_k'$ (3) and the nonlinear function $f_k$ (10) is handled by the RISE technique (see Fig. 1) in the next section.

3.1 Actor/Critic based Reinforcement Learning Control

Based on the property of the estimate $\hat{\mu}_k(t)$, it follows from (12) that the dynamic model on each side of the BT system can be described as:

\[
M_k \dot{s}_k = -C_k s_k + \hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t) + \Delta_k(t)
\tag{13}
\]

where $\Delta_k(t) = f_k - \tau_k' - \hat{\mu}_k(t)$. The optimal control problem is formulated for the nominal dynamic model on each side by eliminating the term $\Delta_k$ in (13):

\[
M_k \dot{s}_k = -C_k s_k + \hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t)
\tag{14}
\]

In view of (7) and (14), the following time-varying model is obtained:

\[
\dot{x}_k =
\begin{bmatrix} -\lambda_k e_k + s_k \\ \kappa_k \end{bmatrix}
+
\begin{bmatrix} 0_{n\times n} \\ M_k^{-1} \end{bmatrix}
\hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t)
\tag{15}
\]

where $x_k = [e_k^T, s_k^T]^T$, $\kappa_k = -M_k^{-1}(\eta_k^{ref} - e_k)\, C_k(\eta_k^{ref} - e_k,\ \dot{\eta}_k^{ref} + \lambda_k e_k - s_k)\, s_k$, and the objective of the optimal controller is to find the control signal $u_k^*$ that minimizes the infinite horizon cost function
as:

\[
J(x_k, \hat{u}_k^{adp}) = \int_0^\infty \left( \frac{1}{2} x_k^T Q x_k + \frac{1}{2} \hat{u}_k^{adp,T} R\, \hat{u}_k^{adp} \right) dt
\tag{16}
\]

where $Q \in \mathbb{R}^{2n\times 2n}$ and $R \in \mathbb{R}^{n\times n}$ are positive definite symmetric matrices that weight the state variables and the control effort, respectively.

However, since the time-varying reference $\eta_k^{ref}(t)$ turns the problem into an optimal control problem for the time-varying system (15), a more systematic approach is to consider the equivalent autonomous affine state-space representation (17), under the condition that the time derivative of the desired trajectory $\eta_k^{ref}(t)$ is a function of the trajectory itself, $\dot{\eta}_k^{ref}(t) = f_k^{ref}(\eta_k^{ref})$:

\[
\frac{d}{dt} X_k = A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp}
\tag{17}
\]

where $X_k = [x_k^T, (\eta_k^{ref})^T, (\dot{\eta}_k^{ref})^T]^T$,

\[
A_k(X_k) =
\begin{bmatrix}
-\lambda_k e_k + s_k \\
\Theta_k \\
f_k^{ref}(\eta_k^{ref}) \\
\dot{f}_k^{ref}(\eta_k^{ref})
\end{bmatrix},
\qquad
B_k(X_k) =
\begin{bmatrix}
0_{n\times n} \\
M_k^{-1} \\
0_{2n\times n}
\end{bmatrix}
\tag{18}
\]

and $\Theta_k = -M_k^{-1}(\eta_k^{ref} - e_k)\, C_k(\eta_k^{ref} - e_k,\ \dot{\eta}_k^{ref} + \lambda_k e_k - s_k)\, s_k$. It follows that the infinite horizon cost function (16) is converted into the following performance index:

\[
J_k = \int_0^\infty r\left(X_k(\tau), \hat{u}_k^{adp}(\tau)\right) d\tau
\tag{19}
\]

where:

\[
r\left(X_k(\tau), \hat{u}_k^{adp}(\tau)\right) = \frac{1}{2} X_k^T Q_T X_k + \frac{1}{2} \hat{u}_k^{adp,T} R\, \hat{u}_k^{adp},
\qquad
Q_T =
\begin{bmatrix}
Q_{2n\times 2n} & 0_{2n\times 2n} \\
0_{2n\times 2n} & 0_{2n\times 2n}
\end{bmatrix}.
\tag{20}
\]

The matrix $Q_T$ is chosen to remove the time-varying reference $\eta_k^{ref}$ and its derivative $\frac{d}{dt}\eta_k^{ref}$ from the cost. It should be noted that the complete BT model is established from (17) and (13) as:

\[
\frac{d}{dt} X_k = A_k(X_k) + B_k(X_k)\left(\hat{u}_k^{adp} + \Delta_k(t)\right)
\tag{21}
\]

Because of the infinite upper limit of the integral (19), $J_k < \infty$ holds only for an appropriate class $\Upsilon(\Omega)$ of control inputs $\hat{u}_k^{adp}(t)$ (see [18]), which guarantee the stability of (17). Hence, it is necessary to assume the existence of an optimal state feedback control scheme $\hat{u}_k^{adp,*}(X_k)$ as well as the smoothness of the corresponding optimal performance index $V_k^*(X_k(t)) = \min_{\hat{u}_k^{adp}(X) \in \Upsilon(\Omega)} J_k(X_k(t), \hat{u}_k^{adp}(t))$, where:

\[
J_k\left(X_k(t), \hat{u}_k^{adp}(t)\right) = \int_t^\infty \left( \frac{1}{2} X_k^T Q_T X_k + \frac{1}{2} \hat{u}_k^{adp,T} R\, \hat{u}_k^{adp} \right) d\tau
\tag{22}
\]

is obtained from the cost function (19). In view of the dynamic programming (DP) principle, after taking the time derivative of the optimal performance index $V_k^*(X_k)$, the Hamiltonian function is given to find the Bellman function $V_k^*(X_k)$ from $\hat{u}_k^{adp,*}$ as:

\[
H_k\left(X_k, \hat{u}_k^{adp,*}, \frac{\partial V_k^*}{\partial X_k}\right) = r\left(X_k(\tau), \hat{u}_k^{adp,*}(\tau)\right) + \frac{\partial V_k^*}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp,*}\right) = 0
\tag{23}
\]

Furthermore, in the opposite direction, the optimal controller $\hat{u}_k^{adp,*}(X_k)$ can be solved from the optimization problem using the Bellman function $V_k^*(X_k)$:

\[
\min_{\hat{u}_k^{adp}(X_k) \in \Upsilon(\Omega)} H_k\left(X_k, \hat{u}_k^{adp}, \frac{\partial V_k^*}{\partial X_k}\right) = 0,
\quad \text{where } H_k\left(X_k, \hat{u}_k^{adp}, \frac{\partial V_k^*}{\partial X_k}\right) = r\left(X_k(\tau), \hat{u}_k^{adp}(\tau)\right) + \frac{\partial V_k^*}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp}\right)
\tag{24}
\]

It should be noted that satisfying the stability condition in $\Upsilon(\Omega)$, which contains the input signals $\hat{u}_k^{adp}$, guarantees the existence of a solution of (23). Additionally, the partial derivative $\frac{\partial V_k^*}{\partial X_k}$ in (23) makes it difficult to solve (23) analytically to find
the Bellman function $V_k^*(X_k)$ from $\hat{u}_k^{adp,*}$. Therefore, the approximate neural network (NN) method mentioned in [18] is used to develop an RL strategy for each side of the BT.

According to the results in many related publications [18, 39] on RL strategies for nonlinear systems, for each fixed number $N \in \mathbb{N}$ of neurons, there exists a Neural Network (NN) that can be employed to estimate the smooth Bellman function $V_k^*(X_k)$ over any compact set $\Omega_X \subseteq \mathbb{R}^{4n}$, and the smooth optimal state feedback control input $\hat{u}_k^{adp,*}(X_k)$ is computed from this NN:

\[
V_k^*(X_k) = W_k^T \Psi_k(X_k) + \varepsilon_k(X_k)
\tag{25}
\]

\[
\hat{u}_k^{adp,*}(X_k) = -\frac{1}{2} R_k^{-1} B_k^T \left( \frac{\partial \Psi_k^T}{\partial X_k} W_k + \frac{\partial \varepsilon_k(X_k)^T}{\partial X_k} \right)
\tag{26}
\]

where $W_k \in \mathbb{R}^N$ is an unknown ideal weight vector and $\Psi_k(X_k) \in \mathbb{R}^N$ is a smooth activation function vector in terms of $X_k$. Furthermore, in light of the Weierstrass approximation theorem mentioned in [18], the reconstruction error $\varepsilon_k(X_k) \in \mathbb{R}$ of the Bellman function $V_k^*(X_k)$ and its partial derivative with respect to $X_k$, i.e., $\varepsilon_k(X_k)$ and $\frac{\partial \varepsilon_k(X_k)^T}{\partial X_k}$ in (25), (26), converge to 0 as $N \to \infty$. However, in practical systems with an RL strategy, a finite number $N$ of neurons must be used to approximate the Bellman function and the optimal state feedback controller:

\[
\hat{V}_k(X_k) = \hat{W}_k^{\prime T} \Psi_k(X_k)
\tag{27}
\]

\[
\hat{u}_k^{adp}(X_k) = -\frac{1}{2} R_k^{-1} B_k^T(X_k) \left( \frac{\partial \Psi_k}{\partial X_k} \right)^T \hat{W}_k'
\tag{28}
\]

The activation function $\Psi_k(X_k)$ in (27), (28) can be chosen according to the particular description of the system [18]. Additionally, (23), (24) yield the training law of the weight vector $\hat{W}_k'$ by minimizing the integral $\int_0^\infty \delta_{hjbk}^2$ of the squared Bellman error, after expressing the Bellman error $\delta_{hjbk}$ as a function of $\hat{W}_k'$:

\[
\delta_{hjbk} = \hat{H}_k\left(X_k, \hat{u}_k^{adp}, \frac{\partial \hat{V}_k}{\partial X_k}\right) - H_k^*\left(X_k, \hat{u}_k^{adp,*}, \frac{\partial V_k^*}{\partial X_k}\right)
= \hat{W}_k^{\prime T} \sigma_k(X_k, \hat{u}_k^{adp}) + \frac{1}{2} X_k^T Q_T X_k + \frac{1}{2} \hat{u}_k^{adp,T} R_k\, \hat{u}_k^{adp}
\tag{29}
\]

where $\sigma_k(X_k, \hat{u}_k^{adp}) = \frac{\partial \Psi_k}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp}\right)$ is the regression vector, and the update law of $\hat{W}_k'$ in (27) is obtained as:

\[
\frac{d}{dt} \hat{W}_k' = -k_{ck} \lambda_k \frac{\sigma_k}{1 + \upsilon_k \sigma_k^T \lambda_k \sigma_k}\, \delta_{hjbk}
\tag{30}
\]

where $\upsilon_k, k_{ck} \in \mathbb{R}$ are constant positive scalars, and $\lambda_k(t) \in \mathbb{R}^{N\times N}$ is a solution of the following dynamic equation:

\[
\frac{d}{dt} \lambda_k = -k_{ck} \lambda_k \frac{\sigma_k \sigma_k^T}{1 + \upsilon_k \sigma_k^T \lambda_k \sigma_k} \lambda_k; \qquad
\lambda_k(t_s^+) = \lambda_k(0) = \varphi_{0k} I
\tag{31}
\]

with the time $t_s^+$ chosen to satisfy the eigenvalue condition $\lambda_{\min}\{\lambda_k(t)\} \le \varphi_{1k}$, $\varphi_{0k} > \varphi_{1k} > 0$, and the covariance matrix $\lambda_k(t)$ chosen as mentioned in [18]:

\[
\varphi_{1k} I \le \lambda_k(t) \le \varphi_{0k} I
\tag{32}
\]

Similarly, by minimizing the squared Bellman error $\delta_{hjbk}^2$, we obtain the update law of the actor NN as:

\[
\frac{d}{dt} \hat{W}_{ak} = -\frac{k_{a1}}{\sqrt{1 + \sigma_k^T \sigma_k}}\, \Pi_k\, \delta_{hjbk} - k_{a2}\left(\hat{W}_{ak} - \hat{W}_k'\right)
\tag{33}
\]

where:

\[
\Pi_k = \frac{\partial \Psi_k}{\partial X_k} B_k R_k^{-1} B_k^T \frac{\partial \Psi_k^T}{\partial X_k} \left(\hat{W}_{ak} - \hat{W}_k'\right)
\tag{34}
\]

The term $\delta_{hjbk}$ in (30) can be expressed in detail from (25), (26), (27), (28), (29) to analyze the convergence of the Critic weight vector (30) as:

\[
\delta_{hjbk} = \hat{W}_k^{\prime T} \sigma_k - W_k^T \frac{\partial \Psi_k}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp,*}\right) - \hat{u}_k^{adp,*,T} R_k\, \hat{u}_k^{adp,*} + \hat{u}_k^{adp,T} R_k\, \hat{u}_k^{adp} - \frac{\partial \varepsilon_k}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp,*}\right)
\tag{35}
\]

Substituting (35) into (30), the dynamics of the Critic weight error $\tilde{W}_k' = W^* - \hat{W}_k'$ can be
expressed as:

\[
\frac{d}{dt} \tilde{W}_k' = -\eta_{ck} \lambda_k \psi_k' \psi_k^{\prime T} \tilde{W}_k' + \eta_{ck} \lambda_k \frac{\sigma_k}{1 + \upsilon_k \sigma_k^T \lambda_k \sigma_k}\, \varpi_k
\tag{36}
\]

where

\[
\varpi_k = \frac{1}{4} \tilde{W}_k^{\prime T} \left(\frac{\partial \Psi_k}{\partial X_k}\right) B_k R_k^{-1} A_k^T \left(\frac{\partial \Psi_k}{\partial X_k}\right)^T \tilde{W}_{ak}
- \frac{1}{4} \left(\frac{\partial \varepsilon_k}{\partial X_k}\right) A_k R_k^{-1} A_k^T \left(\frac{\partial \varepsilon_k}{\partial X_k}\right)^T
- \frac{\partial \varepsilon_k}{\partial X_k}\left(A_k(X_k) + B_k(X_k)\, \hat{u}_k^{adp,*}\right)
\tag{37}
\]

$\tilde{W}_{ak} = W^* - \hat{W}_{ak}$, and $\psi_k'(t) = \frac{\sigma_k}{\sqrt{1 + \upsilon_k \sigma_k^T \lambda_k \sigma_k}}$ is bounded by

\[
\|\psi_k'\| \le \frac{1}{\sqrt{\upsilon_k \varphi_{1k}}}
\tag{38}
\]

The nominal system is obtained after ignoring the effect of the actor weight error in (37):

\[
\frac{d}{dt} \tilde{W}_k' = -\eta_{ck} \lambda_k \psi_k' \psi_k^{\prime T} \tilde{W}_k' = p_k \tilde{W}_k'
\tag{39}
\]

In light of [18, 39], if $\psi_k'(t)$ satisfies the persistence of excitation (PE) criterion (40), then $\tilde{W}_k'$ converges exponentially to the origin:

\[
\mu_2 I \ge \int_{t_0}^{t_0+\delta} \psi_k'(s)\, \psi_k'(s)^T\, ds \ge \mu_1 I, \quad \forall t_0 \ge 0
\tag{40}
\]

where $\mu_1$, $\mu_2$ are positive numbers.

Remark 3 The PE criterion (40) must be satisfied to obtain the convergence of the Critic training weights $\hat{W}_k'$. The concurrent learning method is an effective technique for satisfying the PE criterion (40), but it requires a history stack management algorithm, which limits its applicability in real-time situations [40]. However, thanks to the control structure (11) (Fig. 1), the tracking performance of the closed-loop system is still guaranteed in the absence of the PE criterion (40), as shown in Section 3.3. As mentioned in [31], the optimal control scheme was directly solved and developed in a particular case with a given connection between the two weight matrices in the cost function. However, because a direct solution is impossible for the general cost function given in (16), the Actor/Critic strategy is necessary for achieving the optimal control scheme, and the tracking effectiveness is guaranteed in Theorem 1.

3.2 RISE based Estimation

Consequently, the control scheme on each side of the BT (11) can be developed by estimating $\epsilon_k = f_k - \tau_k'$ based on the Robust Integral of the Sign of the Error (RISE) framework [31–33]. In view of the dynamic equation (2) and the control laws (11), (28), it follows that:

\[
\begin{aligned}
& M_k(\eta_k)\ddot{\eta}_k + C_k(\eta_k,\dot{\eta}_k)\dot{\eta}_k + G_k(\eta_k) + F_k(\dot{\eta}_k) = \hat{\mu}_k(t) - \hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t) + \tau_k'(t) \\
\Rightarrow\ & \hat{\mu}_k(t) = M_k(\eta_k)\ddot{\eta}_k + \left\{ C_k(\eta_k,\dot{\eta}_k)\dot{\eta}_k + G_k(\eta_k) + F_k(\dot{\eta}_k) + \hat{u}_k^{adp} - \tau_k'(t) \right\} \\
\Rightarrow\ & M_k(\eta_k)\ddot{\eta}_k + h_k(\eta_k,\dot{\eta}_k,t) = \hat{\mu}_k(t)
\end{aligned}
\tag{41}
\]

where $h_k(\eta_k,\dot{\eta}_k,t) = C_k(\eta_k,\dot{\eta}_k)\dot{\eta}_k + G_k(\eta_k) + F_k(\dot{\eta}_k) + \hat{u}_k^{adp} - \tau_k'(t)$. To handle the coordination tracking of $\eta_k$ under the unknown function $h_k(\eta_k,\dot{\eta}_k,t)$, the following RISE term is presented:

\[
\hat{\mu}_k(t) = (k_{sk} + 1)\left(s_k(t) - s_k(0)\right) + \int_0^t \left( (k_{sk} + 1)\,\alpha_k s_k(\tau) + \beta\, \mathrm{sgn}(s_k) \right) d\tau
\tag{42}
\]

where $k_{sk}, \beta \in \mathbb{R}$ are two positive coefficients selected as:

\[
\beta > \zeta_{k1} + \frac{1}{\alpha_k} \zeta_{k2}.
\tag{43}
\]

where $\zeta_{k1}$, $\zeta_{k2}$ denote two given bounds of $\|N_{Dk}\|$ and $\left\|\frac{d}{dt} N_{Dk}\right\|$, respectively, as described in [31] and (45). To analyze the stability of the closed-loop system under the RISE control part, the following Lemmas are presented to establish the Lyapunov candidate function, in which one positive definite term is designed based on Lemma 1 [33]:

Lemma 1 [33] Consider the auxiliary term $V_{k1}(t)$ designed as follows:

\[
V_{k1}(t) = (\dot{s}_k + \alpha_k s_k)^T \left(N_{Dk} - \beta_1\, \mathrm{sgn}(s_k)\right)
\tag{44}
\]
where

\[ N_{Dk} = \frac{d f_{dk}}{dt} - \frac{d\tau_k'}{dt}; \qquad f_{dk} = M_k(\eta_k^{ref})\ddot{\eta}_k^{ref} + C_k(\eta_k^{ref},\dot{\eta}_k^{ref})\dot{\eta}_k^{ref} + G_k(\eta_k^{ref}) + F_k(\dot{\eta}_k^{ref}), \tag{45} \]

then the following inequality can be obtained:

\[ \int_0^t V_{k1}(\tau)\,d\tau \le \beta_1 \sum_{j=1}^{n} |s_{kj}(0)| - s_k^T(0) N_{Dk}(0) \tag{46} \]

Proof See proof in [33]. □

To evaluate the coordination tracking of the closed BTs under the RISE scheme (41), Lemma 2 is employed after considering the time derivative of the Lyapunov function candidate (80).

Lemma 2 [41]: Assume that a solution of the nonlinear system $\frac{d}{dt}\xi(t) = f(\xi(t), t)$, $f: \mathbb{R}^m \times \mathbb{R}_{\ge 0} \mapsto \mathbb{R}^m$ exists. Let the region $\Lambda$ be defined as $\Lambda = \{\xi \in \mathbb{R}^m, \|\xi\| < \varepsilon\}$ and let $V: \Lambda \times \mathbb{R}_{\ge 0} \mapsto \mathbb{R}_{\ge 0}$ be a continuously differentiable function such that:

\[ W_1(\xi) \le V(\xi,t) \le W_2(\xi) \quad \text{and} \quad \frac{d}{dt}V(\xi,t) \le -W(\xi), \quad \forall t \ge 0,\ \forall \xi \in \Lambda \tag{47} \]

where $W_1(\xi), W_2(\xi)$ are continuous positive definite functions and $W(\xi)$ is a uniformly continuous positive semi-definite function. If the inequality (47) is satisfied and the initial point $\xi(0) \in \Omega$, then we obtain:

\[ W(\xi(t)) \to 0 \quad \text{as} \quad t \to \infty \tag{48} \]

where the region $\Omega$ is given by:

\[ \Omega = \{\xi \in \Lambda,\ W_2(\xi) \le \delta\}, \qquad 0 < \delta < \min_{\|\xi\| = \varepsilon} W_1(\xi) \tag{49} \]

Proof See proof in [41]. □

3.3 Convergence and Stability Analysis

This section demonstrates the efficiency of the proposed distributed control structure (Fig. 1) in two main theorems. The first theorem studies the convergence of the training weights in the Actor/Critic strategy, and both theorems discuss the tracking control problem in different situations.

Additionally, in order to consider the attraction set of the closed system under the RL controller $\hat{u}_k^{adp}(\eta_k,\dot{\eta}_k,t)$ in (11), we employ several common assumptions discussed in [18, 39]:

Assumption 4 The term $B_k(X_k)$ in each side of the BT system (21) is bounded by a known positive constant $L_k$ as follows: $0 < \|B_k(X_k)\| \le L_k$.

Assumption 5 There exists a known constant scalar $\bar{\Delta}_k$ such that the dynamic uncertainties, external disturbances, human force $\tau_h$ and environment torque $\tau_e$ of each side in the BT, lumped into the term $\Delta_k(t)$ in (13), satisfy:

\[ \|\Delta_k(t)\| \le \bar{\Delta}_k \tag{50} \]

Theorem 1 Suppose that Assumptions 4-5 hold, as well as the bound conditions in the training law of the NN using the activation function $\Phi_k(X_k)$ and its derivative [18]. Moreover, suppose the PE criteria (40) is satisfied by the term $\psi_k'(t) = \frac{\sigma_k}{\sqrt{1 + \upsilon_k \sigma_k^T \lambda_k \sigma_k}}$ in each side of the BT system (1) and the parameters mentioned in (51), (52), (33) satisfy the auxiliary condition $\frac{\gamma_{k3}}{k_{a1}} > h_1 h_2$. If the control scheme (11) with the RISE controller (42) and the Actor/Critic (AC) control law (28) using the training laws (30), (33) is applied to each side of the BTs, then the following control objectives can be obtained:
1. The AC updated weights $\hat{W}_{ak}$ and $\hat{W}_k'$ in each side of a BT system converge to the ideal weight $W^*$ such that the errors are UUB.
2. The tracking performance in each side of the BT closed control system is also UUB.

Proof Thanks to the satisfaction of PE criteria (40) and considering the nominal system (39), the existence of a time-varying function $V_k': \mathbb{R}^N \times [0,\infty) \to \mathbb{R}$ as well as four corresponding positive coefficients $\gamma_{k1}, \gamma_{k2}, \gamma_{k3}, \gamma_{k4} \in \mathbb{R}^+$ is assured such that:

\[ \begin{aligned} &\gamma_{k1}\|\tilde{W}_k'\|^2 \le V_k'(\tilde{W}_k', t) \le \gamma_{k2}\|\tilde{W}_k'\|^2 \\ &\frac{\partial V_k'}{\partial t} + \frac{\partial V_k'}{\partial \tilde{W}_k'}\,(p_k \tilde{W}_k') \le -\gamma_{k3}\|\tilde{W}_k'\|^2 \\ &\left\|\frac{\partial V_k'}{\partial \tilde{W}_k'}\right\| \le \gamma_{k4}\|\tilde{W}_k'\| \end{aligned} \tag{51} \]

Because of the time-varying property of the dynamic equation (36) and for the purpose of considering the convergence of $\tilde{W}_k'$, the time-varying function $V_k'(\tilde{W}_k', t)$ is employed
as one part of the proposed Lyapunov function can- where

didate to study the convergence of Critic weights σk′ = Ωk + ∆pk (56)
c ′ . Furthermore, the estimation in time derivative of
W T f′
k
the Lyapunov function candidate is necessary to uti- Ωk = −ηc λk ψk′ ψk′ W k (57)
lize some following inequalities in each side of a BT ∆pk =
obtained from Assumption 4 and the bound assump- σk

1 f T ∂Ψk

tions of ideal NN weights, activation function and its ηc λk W Bk Rk −1
1 + νσk T λk σk 4 ak ∂Xk
derivative in [39]: T
∂Ψk fak − 1 ∂ϵk Bk Rk −1 Bk T
Bk T

f′ W
Wk ≤ h1 , ∂Xk 4 ∂Xk

∂Ψk
T
−1 T ∂Ψk T ∂ϵ ∂ϵ
∂X k k
B R B k ( ) ≤ h2 × − uadp,∗
Ak (Xk ) + Bk (Xk )bk
k ∂Xk ∂Xk ∂Xk
1 f T ∂Ψk

∂Ψk T f (58)
Wak ( )B R−1 BkT ( ) Wak
4 ∂Xk k k ∂Xk The following equation is obtained from (23):
1 ∂ε ∂ε ∂Vk∗
− ( k )Bk Rk−1 BkT ( k )T A (X ) =
4 ∂Xk ∂Xk ∂Xk k k
∂εk ∂Vk∗

− Ak (Xk ) + Bk (Xk )buadp,∗ ≤ h3 uadp,∗ (Xk )
(59)

k (52) − B (X )b
∂Xk ∂Xk k k k
1 T ∂Ψk ∂ε

σk ( )B R−1 BkT ( k )T badp,∗,T
− XkT QT Xk − u badp,∗
(Xk )Rk u (Xk )
2 ∂Xk k k ∂Xk k k
1 ∂ε ∂ε Substituting (59) and Actor/Critic based RL con-

+ ( k )Bk Rk−1 BkT ( k )T troller (28), (33), (51) of each side in BTs into (55), it
2 ∂Xk ∂Xk
yields:
1 ∂Ψk ∂Ψk T f
+ WkT ( )B R−1 BkT ( ) Wak
2 ∂Xk k k ∂Xk d
V (X , t)
1 ∂ε ∂Ψk T f
dt k k
+ ( k )Bk Rk−1 BkT ( ) Wak ≤ h4 ,

f′
≤ −zkT Qzk − u badp,∗,T badp,∗

2 ∂Xk ∂Xk k Rk u k + h4 W k ∆pk

Since the purpose of studying the convergence of
f′
2
Actor, Critic weights (30), (33) as well as coordina- ukadp,∗,T Rk (b
+ 2b uadp,∗
k −ubadp
k ) − h3 Wk
tion tracking control problem, the Lyapunov function T f′
+ ηa2 W (Wk − W uadp,∗,T
fak ) − 2b Rk ∆k (t)
candidate contains not only Vk′ (W f ′ , t) but also the
fak
k
k
ηa1 T ∂Ψk
conventional quadratic form 12 W fT W
ak ak and optimal
f +q W B (X )R−1
∂Xk k k k
fak
function Vk∗ (Xk ), as described in the following non- 1 + σkT σk
autonomous function: T
∂Ψk fk′ − W
fk′ , t) + 1 W
Vk (Xk , t) ≜ Vk∗ (Xk ) + Vk′ (W fT W (53) × BkT (Xk ) (W fak )δhjbk
2 ak ak ∂Xk
f
(60)
Because of the description of smooth and positive
In the view of (28), (26) and inequality (52), it
definite function of the optimal function Vk∗ (Xk ), it
follows that:
implies that it is bounded by two class-K functions
β1 (.), β2 (.) as follows: uadp,∗,T
2bk uadp,∗
Rk (bk −ubk ) =
T
β1 (∥Xk ∥) ≤ Vk∗ (Xk ) ≤ β2 (∥Xk ∥)

(54) 1 T ∂Ψk ∂ϵk
Wk Bk Rk −1 Bk T
2 ∂Xk ∂Xk
According to (51), (54), it leads to the proper property T
of Lyapunov function Vk (Xk , t) (53). Implementing its 1 ∂Ψk ∂Ψk
+ Wk T Bk Rk −1 Bk T W
fak
(61)
Lie derivative along BT system (1) under the proposed 2 ∂Xk ∂Xk
controller, we can achieve the following result: T
1 ∂ϵk ∂Ψk
d ∂V ∗ ∂Vk′ + Bk Rk −1 Bk T W
fak
uadp
Vk (Xk , t) = k (Ak (Xk ) + Bk (Xk )b )+ 2 ∂Xk ∂Xk
dt ∂Xk k ∂t T
1 ∂ϵk ∂ϵk
′
∂Vk ′ T c˙ + Bk Rk −1 Bk T ≤ h4
+ Ω −W W ak 2 ∂Xk ∂Xk
f′ k
fak
∂W k and the term ∆pk is bounded as:
∂Vk∗
+ B (X )∆ (t) ∆pk ≤ √ηc φ0 h3

∂Xk k k k υk φ1
(62)
(55)
According to (52) and (35), it can be obtained the the BT as:

estimation of the remaining term in (60) as: d
V (X , t) ≤
∂Ψ T dt k k
η T ∂Ψk
q a1 W Bk (Xk )Rk−1 BkT (Xk ) k 1 2
− zkT (Q − ρI)zk + λmax (Rk )∆k
fak
1 + σkT σk ∂Xk ∂Xk
2
′ 2 2
fk′ )δhjbk = q ηa1

fak − W
(W − (1 − χ)(γk3 − ηa1 h1 h2 ) f
Wk − ηa2 f
Wak
1 + σkT σk 1
+ ηa1 h21 h2 h3 + h4 +
∂Ψ T 4χ(γk3 − ηa1 h1 h2 )
T ∂Ψk
×Wfak Bk (Xk )Rk−1 BkT (Xk ) k
(W fk′ )
fak − W γ η φ
k4 c 0
2
∂Xk ∂Xk √ h3 + ηa1 h1 h2 h3 +ηa1 h21 h2 + ηa2 h1
∂Ψ T 2 υk φ1
f′ T 1 f T ∂Ψk
× −W k σk + Wak Bk Rk−1 BkT k
W
fak (67)
4 ∂Xk ∂Xk The parameters are chosen satisfying 0 < χ <
1 ∂ϵk ∂ϵ T 1, γk3 > ηa1 h1 h2 . Define a state variables vector
− Bk Rk−1 BkT k
iT
4 ∂Xk ∂Xk
h ′
zk = XkT W fT W
k
fT
ak (68)
− (Ak (Xk ) + Bk (Xk )b uadp,∗
k ) to study not only the convergence of training laws
f′ , W
W fak but also tracking performance of each side in
′ 2 ′ ′ k
Wk + ηa1 h21 h2 f
≤ ηa1 h1 h2 f Wk + ηa1 h1 h2 h3 f
Wk a BT. Due to the positive definite property, it implies
that there exist two class-K functions β3 , β4 such that:
+ ηa1 h21 h2 h3
1
(63) β3 (∥zk ∥) ≤ zkT (Q − ρI)zk
f T (Wf′ − W 2
On the other hand, ηa2 W ak k
fak ) can be written ′ 2 2
as: + (1 − χ)(γk3 − ηa1 h1 h2 ) f
Wk + ηa2 f Wak
ηa2 W T f′
fak (Wk − Wfak ) = ηa2 W
fakT f′
(Wk − W fak ) ≤ β4 (∥zk ∥)
′ 2 (64) (69)
≤ ηa2 h1 f
Wk − ηa2 f
Wak According to indirect estimation (69), the inequality
Furthermore, the additional inequality is employed: (67) can be achieved as:
d
bkadp,∗,T Rk u
badp,∗ uadp,∗,T V (X , t) ≤
−u k − 2bk Rk ∆k (t) dt k k
2
≤ ∆k (t)T Rk ∆k (t) − β3 ∥zk ∥ + λmax (Rk )∆k + ηa1 h21 h2 h3
(65)
≤ λmax (Rk )∥∆k (t)∥2 1 (70)
+ h4 +
2 4χ(γk3 − ηa1 h1 h2 )
≤ λmax (Rk )∆k γ η φ 2
k4 c 0
√ h3 + ηa1 h1 h2 h3 +ηa1 h21 h2 + ηa2 h1
Substituting (63), (64), and (65) into (60), it can be 2 υk φ1
achieved next estimation as: It implies to the attraction region of ∥zk ∥ in each side
d of the closed BT system as:
V (X , t) ≤
dt k k

1
1 ′ 2 Ωzk ≜ zk : ∥zk ∥ ≤ β3−1
− zkT (Q − ρI)zk − (γk3 − ηa1 h1 h2 ) f Wk 4χ(γk3 − ηa1 h1 h2 )
2 γ η φ 2
k4 c 0
2 γk4 ηc φ0 √ h3 + ηa1 h1 h2 h3 + ηa1 h21 h2 + ηa2 h1

Wak + ηa1 h21 h2 h3 + h4 + √

− ηa2 f h3 2 υk φ1
2 υk φ1
2
+ ηa1 h1 2 h2 h3 + h4 + λmax (Rk )∆k

+ ηa1 h1 h2 h3 + ηa1 h21 h2 + ηa2 h1 fWk′

2 (71)
+ λmax (Rk )∆k
Therefore, two above conclusions (1, 2) in Theorem 1
(66)
can be achieved simultaneously because of UUB prop-
After utilizing the classical inequality ab ≤ γa2 +
1 2 erty in ∥zk ∥ with the attraction region (71). The proof
4γ b , the final estimation is expressed in each side of of Theorem 1 is just completed. □
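In the proof of Theorem 1, the step from (66) to (67) rests on the classical inequality $ab \le \gamma a^2 + \frac{1}{4\gamma}b^2$. As a short worked sketch, let $c_k$ denote the coefficient multiplying $\|\tilde{W}_k'\|$ in (66) and let $\kappa_k = \gamma_{k3} - \eta_{a1}h_1h_2 > 0$; both $c_k$ and $\kappa_k$ are shorthand introduced here for illustration and are not symbols of the paper. Choosing $\gamma = \chi\kappa_k$ with $0 < \chi < 1$ gives:

```latex
\begin{aligned}
ab &\le \gamma a^{2} + \frac{1}{4\gamma}\,b^{2}
   \quad\Longleftarrow\quad
   \Big(\sqrt{\gamma}\,a - \frac{b}{2\sqrt{\gamma}}\Big)^{2} \ge 0,
   \qquad \gamma > 0, \\
c_k \,\|\tilde{W}_k'\| &\le \chi\kappa_k \,\|\tilde{W}_k'\|^{2}
   + \frac{c_k^{2}}{4\chi\kappa_k}, \\
-\kappa_k \,\|\tilde{W}_k'\|^{2} + c_k \,\|\tilde{W}_k'\|
   &\le -(1-\chi)\,\kappa_k \,\|\tilde{W}_k'\|^{2}
   + \frac{c_k^{2}}{4\chi\kappa_k}.
\end{aligned}
```

The last line shows how the linear term in $\|\tilde{W}_k'\|$ is absorbed: a fraction $\chi$ of the negative quadratic term is spent, and a constant proportional to $c_k^2$ is added to the ultimate bound.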
Remark 4 The following example points out the necessity of finite-time convergence of the sliding variable $s_k(t)$ (7). Consider the system given as

\[ \frac{dx}{dt} = Ax + Bs \tag{72} \]

\[ \frac{ds}{dt} = Cx + Ds \tag{73} \]

where $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times r}$, $C \in \mathbb{R}^{r\times n}$, $D \in \mathbb{R}^{r\times r}$ and $x, s$ are the state variables. Choose $A$ as a Hurwitz matrix and $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ as a matrix that is not Hurwitz. Moreover, $s$ is also a sliding variable because if $s = 0$ then, according to (72), $x$ converges to 0. However, in the case that $s$ converges to 0 in infinite time, $x$ does not converge to 0. It is worth emphasizing that, different from general sliding mode controllers in [21, 42, 43] considering finite time convergence of sliding variables, the proposed control scheme in Theorem 1 determines the convergence of both the errors $e_k$ and the sliding variables $s_k$ without studying the finite time convergence of the sliding variables $s_k$.

The next central theorem addresses the tracking performance and the optimal control problem of the proposed controller for perturbed BTs in the presence of time-varying delays.

Theorem 2 Consider the perturbed BTs (1) subject to time-varying delays (Fig. 1) under Assumptions 1-3. The proposed control frames in both sides (11), investigated with respect to the notation (3), (8), with the RISE control term $\hat{\mu}_k(t)$ in (42) and the frame of the Actor/Critic RL strategy (27), (28) with the adaptation laws (30), (33), can guarantee not only optimal effectiveness but also the boundedness of all closed-system signals and the following tracking performance in the closed-loop BTs:

\[ \lim_{t\to\infty} \|\chi_k(t)\| = 0, \quad \forall \zeta_k = \begin{bmatrix} e_k^T & s_k^T & p_k^T \end{bmatrix}^T \in \Phi_k \tag{74} \]

where the domain $\Phi_k$ is obtained from (86). Furthermore, the estimation of the total disturbance, dynamic uncertainties and constraint forces is ensured by the RISE control part as follows:

\[ \hat{\mu}_k(t) \to f_k - \tau_k' \quad \text{as} \quad t \to \infty \tag{75} \]

Proof First, substituting the Actor/Critic control input (28) into the closed tracking error system, which is reduced in order after taking the time derivative of (9), gives:

\[ M_k p_k = -C_k s_k + f_k - \tau_k' + \alpha_k M_k s_k + \hat{W}_k' \Phi(X_k) - \hat{\mu}_k \tag{76} \]

where $p_k = \dot{s}_k + \alpha_k s_k$, $\alpha_k \in \mathbb{R}^+$. To consider the convergence of all three vectors $\{e_k, s_k, p_k\}$, based on the estimation of $\mu_k$ and the RL controller (28), and taking the time derivative of (76), the closed loop error system for $p_k(t)$ can now be represented as:

\[ M_k(t)\frac{d}{dt}p_k = -\frac{1}{2}\frac{dM_k(t)}{dt}p_k + \tilde{N}_k + N_{Dk} - s_k - (k_{sk}+1)\,p_k - \beta_k\,\mathrm{sgn}(s_k) \tag{77} \]

where $\tilde{N}_k(e_k, s_k, p_k, t)$ is computed as:

\[ \tilde{N}_k = -\dot{C}_k s_k - C_k \dot{s}_k - \frac{1}{2}\frac{d}{dt}M_k\, p_k + \frac{d}{dt}(f_k - f_{id}) + \frac{dM_k}{dt}\alpha_k s_k + \alpha_k M_k \frac{ds_k}{dt} + s_k + \frac{d\hat{W}_k'}{dt}\Phi(X_k) + \hat{W}_k' \frac{d\Phi(X_k)}{dt} \tag{78} \]

The second step is to compensate the sign term in (77) with an auxiliary positive definite term given as follows:

\[ P_k(t) = \beta_1 \sum_{j=1}^{n} |s_{kj}(0)| - s_k^T(0) N_{Dk}(0) - \int_0^t V_{k1}(\tau)\,d\tau \tag{79} \]

in the following Lyapunov function:

\[ V(\chi_k, t) = e_k^T e_k + \frac{1}{2}s_k^T s_k + \frac{1}{2}p_k^T M_k(\eta_k) p_k + P_k(t) \tag{80} \]

with $\chi_k(t) = \begin{bmatrix} e_k^T & s_k^T & p_k^T & \sqrt{P_k(t)} \end{bmatrix}^T \in \mathbb{R}^{3n+1}$. In view of Filippov's work in [33], the stability analysis can be developed based on Lemma 2 after obtaining the existence of the solution $\chi_k(t)$.

Third, the Lie derivative of the time-varying Lyapunov function candidate (80) along the closed error system (77) with the notation (79) is:

\[ \frac{d}{dt}V(\chi_k,t) = -2e_k^T \lambda_k e_k + 2 s_k^T e_k + p_k^T \tilde{N}_k(e_k,s_k,p_k,t) - (k_{sk}+1)\,p_k^T p_k - \alpha_k s_k^T s_k \tag{81} \]

According to the fact that $2 s_k^T e_k \le \|e_k\|^2 + \|s_k\|^2$, (81) yields:

\[ \frac{d}{dt}V(\chi_k,t) \le p_k^T \tilde{N}_k(e_k,s_k,p_k,t) - (k_{sk}+1)\,p_k^T p_k - \big(2\lambda_{\min}(\lambda_k) - 1\big)\|e_k\|^2 - (\alpha_k - 1)\|s_k\|^2 \tag{82} \]

In view of [31, 33], the following bound on $\tilde{N}_k(e_k,s_k,p_k,t)$ (78) holds:

\[ \|\tilde{N}_k(e_k,s_k,p_k,t)\| \le H_k(\|\zeta_k\|)\,\|\zeta_k\| \tag{83} \]

where $\zeta_k = \begin{bmatrix} e_k^T & s_k^T & p_k^T \end{bmatrix}^T$ and $H_k: \mathbb{R}^+ \mapsto \mathbb{R}^+$ is a non-decreasing map. Based on (83) and (82), the following result holds:

\[ \frac{d}{dt}V(\chi_k,t) \le -\lambda_3\|\zeta_k\|^2 - k_{sk}\|p_k\|^2 + H_k(\|\zeta_k\|)\,\|p_k\|\,\|\zeta_k\| \tag{84} \]
Combining with the traditional inequality $0 \le a^2 + ab + \frac{b^2}{4}$ $(\forall a, b \in \mathbb{R})$, estimation (84) leads to the inequality:

\[ \frac{d}{dt}V(\chi_k,t) \le -\lambda_3\|\zeta_k\|^2 + \frac{1}{4k_{sk}}H_k^2(\|\zeta_k\|)\,\|\zeta_k\|^2 \tag{85} \]

Therefore, we can obtain the following result:

\[ \frac{d}{dt}V(\chi_k,t) \le -\alpha\|\zeta_k\|^2,\ \alpha > 0, \quad \forall \chi_k \in \Omega_k = \Big\{\chi_k \in \mathbb{R}^{3n+1} \,\Big|\, \|\chi_k\| \le H_k^{-1}\big(2\sqrt{\lambda_3 k_{sk}}\big)\Big\} \tag{86} \]

Under the consideration of Property 1, it is obtained that:

\[ V_1(\chi_k) \le V(\chi_k,t) \le V_2(\chi_k) \tag{87} \]

where $V_1(\chi_k) = \frac{1}{2}\min\{1, a\}\|\chi_k\|^2$ and $V_2(\chi_k) = \max\{1, \frac{1}{2}b(\eta_k)\}\|\chi_k\|^2$.

The following implications are considered in the case that $\chi_k \in \Omega_k$. According to (87) and (86), $e_k, s_k, p_k \in L_\infty$. Combining with $p_k = \dot{s}_k + \alpha_k s_k$ and (7), it is true that $\dot{e}_k, \dot{s}_k \in L_\infty$. Moreover, based on Assumption 3, $\eta(t), \dot{\eta}(t), \ddot{\eta}(t) \in L_\infty$. Thus, combining with Assumption 2, it follows that $M_k(\eta_k), C_k(\eta_k,\dot{\eta}_k), G_k(\eta_k), f_k(\dot{\eta}_k) \in L_\infty$. On the other hand, based on the dynamic model (1) and Assumption 3, $\tau_k(t) \in L_\infty$, so the boundedness of all closed-loop system signals is verified.

Employing Lemma 2 requires finding the region $\Phi_k$ of initial states $\chi(0)$ such that $\chi_k(t) \in \Omega_k$, $\forall t \ge 0$. According to the estimation (87) and the set $\Omega_k$ in (86), the region $\Phi_k$ depends on the parameter $k_{sk}$:

\[ \Phi_k = \Big\{\chi(t) \in \Omega_k \,\Big|\, V_2(\chi_k(t)) < \lambda_k \Big(H_k^{-1}\big(2\sqrt{\lambda_3 k_{sk}}\big)\Big)^2\Big\} \tag{88} \]

Additionally, according to $p_k(t) \in L_\infty$ and (42), it follows that $\frac{d}{dt}\hat{\mu}_k \in L_\infty$. Combining with (77), we can conclude not only $\dot{e}_k(t), \dot{s}_k(t) \in L_\infty$ but also $\dot{p}_k(t) \in L_\infty$. Thus, $W(\chi_k) = \alpha\|\zeta_k\|^2$ in (86) is uniformly continuous. Finally, in the light of Lemma 2, it is concluded that $\lim_{t\to\infty}\|\chi_k(t)\| = 0$, $\forall \zeta_k \in \Phi_k$. From (12) and (42), the RISE controller can be rewritten as follows:

\[ M_k \dot{s}_k + C_k s_k + \tau_k' - \hat{u}_k^{adp}(t) - f_k = -\hat{\mu}_k(t) \tag{89} \]

Combining with $p_k = \dot{s}_k + \alpha_k s_k$, it yields that $\dot{s}_k \to 0$ as $t \to \infty$ $(\forall \zeta_k \in \Phi_k)$. Moreover, due to the infinite terminal time in the cost function (20), the convergence $\hat{u}_k^{adp}(t) \to 0$ as $t \to \infty$ is guaranteed. Therefore, the statement (75) is verified and the RISE controller $\hat{\mu}_k(t)$ is able to estimate the term $(f_k - \tau_k')$. □

Remark 5 The result (75) shows an obvious difference from [35]: the human force and the environmental force are not required to be measurable. On the other hand, due to the discontinuous function in the RISE part $\hat{\mu}_k(t)$ (42), it is necessary to utilize the Filippov framework, as shown in the proof of Theorem 2, to establish the existence of a solution, which has not been considered in [31]. Additionally, although the tracking performance is considered in both Theorem 2 and Theorem 1, a disadvantage of the RISE part $\hat{\mu}_k(t)$ (42) is that it requires an assumption on the initial states.

Remark 6 It is worth stressing that the proposed controller in Theorem 1 and Theorem 2 both handles variable time delays and eliminates the following conditions given in [1, 2] for the stability of the closed system:

\[ -\int_0^t \tau_h^T(\sigma)\,s_l(\sigma)\,d\sigma \ge -c_h, \qquad \int_0^t \tau_e^T(\sigma)\,s_r(\sigma)\,d\sigma \ge -c_e \tag{90} \]

Furthermore, according to [17], the proposed controller given in Theorem 1 can be extended to the case of a motion/force hybrid control structure in the remote side of BTs after computing the constraint factor.

4 Simulation Results

This section presents the simulation results of the actual controlled BT system, and some comparisons with the previous controller in [2] are given to demonstrate the effectiveness of the proposed control laws in Section 3, considering the dynamic model of the BT described in [2]. With a 2-DOF (degrees of freedom) manipulator in each side of the actual BT, the dynamic model (1) uses the following matrices:

\[ \begin{aligned} M_k(\eta_k) &= \begin{bmatrix} \varrho_1 + 2\varrho_2\cos\eta_{k2} & \varrho_3 + \varrho_2\cos\eta_{k2} \\ \varrho_3 + \varrho_2\cos\eta_{k2} & \varrho_3 \end{bmatrix}, \\ C_k(\eta_k,\dot{\eta}_k) &= \begin{bmatrix} -\varrho_2\sin\eta_{k2}\,\dot{\eta}_{k2} & -\varrho_2\sin\eta_{k2}\,(\dot{\eta}_{k1} + \dot{\eta}_{k2}) \\ \varrho_2\sin\eta_{k2}\,\dot{\eta}_{k1} & 0 \end{bmatrix}, \\ G(\eta_k) &= \begin{bmatrix} \varrho_4\cos\eta_{k1} + \varrho_5\cos(\eta_{k1} + \eta_{k2}) \\ \varrho_5\cos(\eta_{k1} + \eta_{k2}) \end{bmatrix}, \quad k \in \{l, r\} \end{aligned} \tag{91} \]

where $\varrho_i$, $i = 1, \dots, 5$ are constant coefficients obtained from the mechanical parameters and the gravitational acceleration. For the purpose of comparison with the related controller in [2], the
mechanical parameters in [2] are utilized as follows: $M_l = [1.4, 1.0]$ kg, $I_l = [0.22, 0.24]$ kg m$^2$, $L_l = [0.6, 0.4]$ m, $M_r = [1.8, 1.6]$ kg, $I_r = [0.35, 0.33]$ kg m$^2$, $L_r = [0.8, 0.6]$ m and $g = 9.8$ m/s$^2$. Moreover, the initial conditions of the robots are given as $\eta_l(0) = [0.5, 0.6]^T$, $\dot{\eta}_l(0) = [0, 0]^T$, $\eta_r(0) = [-0.2, 0.3]^T$ and $\dot{\eta}_r(0) = [0, 0]^T$. Therefore, the actual dynamic parameter vectors in the two sides of the BTs are obtained as $\theta_l = [0.99, 0.12, 0.28, 1.02, 0.2]^T$ and $\theta_r = [2.14, 0.38, 0.47, 2.0, 0.48]^T$. The simulation studies are implemented in the following scenario. During the first 7 seconds of the simulation, neither robot is affected by external forces; after $t = 7$ s, the operator's force comes into operation. The local robot is moved, and the trajectory of the slave robot is chosen so that it interacts with a wall surface in the robot manipulator's x direction. The operator removes the robot's trajectory from the interaction at $t = 22$ s. Then the master robot is no longer affected by the wall's surface. To check the convergence of the training law in the Neural Network, a small noise is inserted into the system for the first three seconds. Additionally, the system's varying communication delays in Fig. 2 are considered in the proposed control design as follows:

\[ \begin{aligned} T_l(t) &= 0.2 + 0.1\sin(2t) + 0.1\sin(3t), \\ T_r(t) &= 0.25 + 0.1\sin(2t) + 0.05\sin(4t) \end{aligned} \tag{92} \]

Fig. 2: Variable Time Delay between two sides of BTs
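The simulation model above can be written down compactly. The sketch below evaluates the matrices in (91) and the delays in (92) using only the Python standard library. It assumes that the vector $\theta_l$ lists $(\varrho_1, \dots, \varrho_5)$ of the local robot in that order, which the text suggests but does not state explicitly; all function names are hypothetical.

```python
import math

# Assumption: theta_l from this section is read as (rho_1, ..., rho_5) of the local robot.
RHO_LOCAL = (0.99, 0.12, 0.28, 1.02, 0.2)


def inertia_matrix(rho, eta):
    """M_k(eta_k) as printed in (91); eta = (eta_k1, eta_k2)."""
    r1, r2, r3 = rho[0], rho[1], rho[2]
    c2 = math.cos(eta[1])
    return [[r1 + 2.0 * r2 * c2, r3 + r2 * c2],
            [r3 + r2 * c2, r3]]


def coriolis_matrix(rho, eta, eta_dot):
    """C_k(eta_k, eta_dot_k) as printed in (91)."""
    r2 = rho[1]
    s2 = math.sin(eta[1])
    return [[-r2 * s2 * eta_dot[1], -r2 * s2 * (eta_dot[0] + eta_dot[1])],
            [r2 * s2 * eta_dot[0], 0.0]]


def gravity_vector(rho, eta):
    """G(eta_k) as printed in (91)."""
    r4, r5 = rho[3], rho[4]
    c12 = math.cos(eta[0] + eta[1])
    return [r4 * math.cos(eta[0]) + r5 * c12, r5 * c12]


def delay_local(t):
    """T_l(t) from (92), in seconds."""
    return 0.2 + 0.1 * math.sin(2.0 * t) + 0.1 * math.sin(3.0 * t)


def delay_remote(t):
    """T_r(t) from (92), in seconds."""
    return 0.25 + 0.1 * math.sin(2.0 * t) + 0.05 * math.sin(4.0 * t)
```

For instance, `inertia_matrix(RHO_LOCAL, (0.5, 0.6))` evaluates $M_l$ at the initial joint angles of the local robot, and sampling `delay_local` and `delay_remote` over the simulation horizon gives curves of the kind shown in Fig. 2.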
We apply the proposed control scheme of Theorem 2 with the control parameters of the Actor-Critic structure and the RISE based estimator selected as $Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $R = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ in the cost function of the two sides in BTs. Furthermore, the parameters in the Actor/Critic control structure are given as $\lambda_l = \lambda_r = \begin{bmatrix} 1 & 0 \\ 0 & 0.6 \end{bmatrix}$, $k_{cl} = k_{cr} = 2$, $k_{a1l} = k_{a1r} = 2$, $k_{a2l} = k_{a2r} = 4$, $v_l = v_r = 0.01$. The RISE based estimator's parameters are chosen as $k_{sl} = \mathrm{diag}\{35, 40\}$, $k_{sr} = \mathrm{diag}\{40, 35\}$, $\beta_l = \beta_r = \mathrm{diag}\{0.1, 0.1\}$, $\alpha_l = \alpha_r = \mathrm{diag}\{1.1, 1.1\}$. The value function and the RL controller are approximated by a neural network, and the activation functions $\Psi_k(X_k) \in \mathbb{R}^N$, $k \in \{l, r\}$ of the selected neural network are chosen with the notation $X_k \in \mathbb{R}^8$ (17) as:

\[ \Psi_k(X_k) = \big[x_{k1}^2,\ x_{k1}x_{k2},\ x_{k2}^2,\ x_{k1}^2 x_{k3}^2,\ x_{k2}^2 x_{k4}^2,\ x_{k1}^2 x_{k5}^2,\ x_{k2}^2 x_{k6}^2\big]^T \tag{93} \]

Fig. 3: Evolution of Joint Angles of two sides in the BT system under finite time control scheme [2]

Remark 7 Most references on RL algorithms have not studied a precise method of choosing activation functions [17, 37, 39] in general cases. However, the selection can be accomplished after solving the Bellman function for a special cost function (see Theorem 2 in [18]).
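The activation vector (93) is simple enough to write out directly. The sketch below evaluates $\Psi_k(X_k)$ for a state given as a sequence with at least six entries, together with a linear-in-parameters value estimate $W^T \Psi_k(X_k)$. That linear form is the usual critic parameterization and is assumed here rather than quoted from this section; the function names `activation` and `critic_value` are hypothetical.

```python
def activation(x):
    """Psi_k(X_k) from (93); x holds the state entries x_k1, x_k2, ... in order."""
    x1, x2, x3, x4, x5, x6 = x[:6]  # (93) uses only the first six entries
    return [
        x1 ** 2,
        x1 * x2,
        x2 ** 2,
        x1 ** 2 * x3 ** 2,
        x2 ** 2 * x4 ** 2,
        x1 ** 2 * x5 ** 2,
        x2 ** 2 * x6 ** 2,
    ]


def critic_value(weights, x):
    """Value estimate W^T Psi(X) (assumed linear-in-parameters critic)."""
    psi = activation(x)
    if len(weights) != len(psi):
        raise ValueError("expected %d weights, got %d" % (len(psi), len(weights)))
    return sum(w * p for w, p in zip(weights, psi))
```

With this choice the critic has $N = 7$ weights per side, one for each entry of (93).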
Fig. 4: The control signals of two sides in the BT system under finite time control scheme [2]
Fig. 5: Tracking error of two sides in the BT system under finite time control scheme [2]
Fig. 6: Sliding variables of two sides in the BT system under finite time control scheme [2]
Fig. 7: Evolution of Joint Angles of two sides in the BT system under the proposed Optimal Control Scheme
It can be seen that the coordination effectiveness between the two sides of the actual BTs given in Theorem 1 is presented in Fig. 7 and Fig. 8, with the bounded control actions on both sides displayed in Fig. 9 and Fig. 10. Moreover, the external forces are shown in Fig. 11 and the sliding variables fluctuate around zero in Fig. 12, which is better than the evolution of the sliding variables (Fig. 6) under the finite time control scheme proposed in [2]. On the other hand, the convergence of the adaptation laws in the Critic and Actor NN of the two sides proposed in Theorem 1 is depicted in Fig. 13, Fig. 14, Fig. 15 and Fig. 16 under the variable communication delays in two directions in Fig. 2. It is worth emphasizing that the simulation results in Fig. 9, Fig. 10 and Fig. 4 point out that the Actor/Critic strategy based control scheme in Theorem 2 is more effective than the controller proposed in [2] in terms of control signals. Moreover, the advantage of the RL control scheme is also shown in Figs. 7, 8 with respect to tracking effectiveness in comparison with the work in Figs. 3, 5 [2]. Finally, the proposed optimal control effectiveness is described in Fig. 17 by the comparative result of the cost functions under the finite time control in [2] and the Actor/Critic control strategy in Theorem 1.
Fig. 8: Tracking error of two sides in the BT system under the proposed Optimal Control Scheme
Fig. 9: The Control signals of Local Robot in BT system under the proposed Optimal Control Scheme
Fig. 10: The Control signals of Remote Robot in BT system under the proposed Optimal Control Scheme
Fig. 11: Human and Environment Torques in BT system under the proposed Optimal Control Scheme
Fig. 12: Sliding variables in BT system under the proposed Optimal Control Scheme
Fig. 13: Convergence of the Weights of Actor NN in Local Robot
Fig. 14: Convergence of the Weights of Actor NN in Remote Robot
Fig. 15: Convergence of the Weights of Critic NN in Local Robot

5 Conclusions

This paper has developed a novel coordination control framework combining the Actor/Critic structure and the RISE term for unknown BTs with time-varying communication delays between the two sides and external disturbance. By utilizing the sliding variable to obtain the order reduction and by fully studying the NN optimization problem in the training process, the Actor/Critic strategy is first established, followed by a strict theoretical proof of weight convergence and coordination tracking. On the other hand, the RISE term is investigated to achieve the coordination control objective and the corresponding identification capability, in which the tracking discussion is studied for both the Actor/Critic structure and the RISE design in different situations. Simulation studies demonstrate the performance of the proposed control designs in comparison with a related control algorithm for BT systems. In the future, the BT experiment system and the data driven based RL strategy will be developed.

Acknowledgments. This research was funded by Vietnam’s National project “Research, develop
an intelligent mobile robot using different types IEEE Transactions on Industrial Informat-
of sensing technology and IoT platform, AI, and ics, 2019.
implemented in radioactive environment monitor-
ing application”, code: DTDLCN.19/23 of the [2] P. N. Dao, V. T. Nguyen, and Y.-C. Liu,
CT1187 Physics development program in the “Finite-time convergence for bilateral teleop-
period 2021- 2025. eration systems with disturbance and time-
varying delays,” IET Control Theory &
Applications, vol. 15, no. 13, pp. 1736–1748,
References 2021.
[1] Y.-C. Liu, N. Dao, and K. Y. Zhao,
[3] M. Franken, S. Stramigioli, S. Misra, C. Sec-
“On robust control of nonlinearteleoperators
chi, and A. Macchelli, “Bilateral telemanipu-
under dynamic uncertainties with variable
lation with time delays: A two-layer approach
time delays and without relative velocity,”
combining passivity and transparency,” IEEE
[6] J. Li, B. You, L. Ding, J. Xu, W. Li, H. Chen,

and H. Gao, “A novel bilateral haptic teleop-
eration approach for hexapod robot walking
and manipulating with legs,” Robotics and
Autonomous Systems, vol. 108, pp. 1–12,
2018.
[7] D. Heck, A. Saccon, R. Beerens, and

H. Nijmeijer, “Direct force-reflecting two-
layer approach for passive bilateral teleoper-
ation with time delays,” IEEE Transactions
on Robotics, vol. 34, no. 1, pp. 194–206, 2018.
[8] J. Yan, X. Yang, X. Luo, and X. Guan,

“Dynamic gain control of teleoperating
Fig. 16: Convergence of the Weights of Critic NN
cyber-physical system with time-varying
in Remote Robot
delay,” Nonlinear Dynamics, vol. 95, no. 4,
pp. 3049–3062, 2019.
[9] D.-H. Zhai and Y. Xia, “A novel switching-

based control framework for improved task
performance in teleoperation system with
asymmetric time-varying delays,” IEEE
transactions on cybernetics, vol. 48, no. 2, pp.
625–638, 2017.
[10] Y. Yuan, Y. Wang, and L. Guo, “Force

reflecting control for bilateral teleopera-
tion system under time-varying delays,”
IEEE Transactions on Industrial Informat-
ics, vol. 15, no. 2, pp. 1162–1172, 2018.
[11] D.-H. Zhai and Y. Xia, “Finite-time control of

teleoperation systems with input saturation
Fig. 17: Cumulative cost function difference
and varying time delays,” IEEE Transactions
between finite time control scheme [2] and the pro-
on Systems, Man, and Cybernetics: Systems,
posed optimal control strategy
vol. 47, no. 7, pp. 1522–1534, 2016.
[12] H. Zhang, A. Song, H. Li, and S. Shen,

transactions on robotics, vol. 27, no. 4, pp.
“Novel adaptive finite-time control of tele-
741–756, 2011.
operation system with time-varying delays
[4] S.-J. Lee and H.-S. Ahn, “Controller designs and input saturation,” IEEE transactions on
for bilateral teleoperation with input satura- cybernetics, 2019.
tion,” Control Engineering Practice, vol. 33,
[13] Z. Chen, F. Huang, W. Sun, J. Gu, and
pp. 35–47, 2014.
B. Yao, “Rbf neural network based adaptive
[5] M. Shahbazi, S. F. Atashzar, M. Tavakoli, robust control for nonlinear bilateral teleop-
and R. V. Patel, “Position-force domain pas- eration manipulators with uncertainty and
sivity of the human arm in telerobotic sys- time delay,” IEEE/ASME Transactions on
tems,” IEEE/ASME Transactions on Mecha- Mechatronics, 2019.
tronics, vol. 23, no. 2, pp. 552–562, 2018.
19
[14] Y.-J. Liu and S. Tong, “Barrier lyapunov [22] C. Chen, H. Modares, K. Xie, F. L. Lewis,
functions-based adaptive control for a class Y. Wan, and S. Xie, “Reinforcement learning-
of nonlinear pure-feedback systems with full based adaptive optimal exponential track-
state constraints,” Automatica, vol. 64, pp. ing control of linear systems with unknown
70–75, 2016. dynamics,” IEEE Transactions on Automatic
Control, vol. 64, no. 11, pp. 4423–4438, 2019.
[15] D.-H. Zhai and Y. Xia, “Adaptive control
for teleoperation system with varying time [23] T. Sun and X.-M. Sun, “An adaptive dynamic
delays and input saturation constraints,” programming scheme for nonlinear optimal
IEEE Transactions on industrial electronics, control with unknown dynamics and its appli-
vol. 63, no. 11, pp. 6921–6929, 2016. cation to turbofan engines,” IEEE Transac-
tions on Industrial Informatics, 2020.
[16] S. Ganjefar, S. Najibi, and H. Momeni, “A
novel structure for the optimal control of [24] J. Y. Lee, J. B. Park, and Y. H. Choi, “Inte-
bilateral teleoperation systems with variable gral reinforcement learning with explorations
time delay,” Journal of the Franklin Institute, for continuous-time nonlinear systems,” in
vol. 348, no. 7, pp. 1537–1555, 2011. The 2012 International Joint Conference on
Neural Networks (IJCNN). IEEE, 2012, pp.
[17] P. N. Dao and Y.-C. Liu, “Adaptive reinforce- 1–6.
ment learning in control design for cooperat-
ing manipulator systems,” Asian Journal of [25] ——, “Integral reinforcement learning for
Control, vol. 24, no. 3, pp. 1088–1103, 2022. continuous-time input-affine nonlinear sys-
tems with simultaneous invariant explo-
[18] T. L. Pham, P. N. Dao et al., “Disturbance rations,” IEEE Transactions on Neural Net-
observer-based adaptive reinforcement learn- works and Learning Systems, vol. 26, no. 5,
ing for perturbed uncertain surface vessels,” pp. 916–932, 2014.
ISA transactions, 2022.
[26] Y. Zhu, D. Zhao, and X. Li, “Using reinforce-
[19] R. Kamalapurkar, L. Andrews, P. Walters, ment learning techniques to solve continuous-
and W. E. Dixon, “Model-based reinforce- time non-linear optimal tracking problem
ment learning for infinite-horizon approxi- without system dynamics,” IET Control The-
mate optimal tracking,” IEEE transactions ory & Applications, vol. 10, no. 12, pp.
on neural networks and learning systems, 1339–1347, 2016.
vol. 28, no. 3, pp. 753–758, 2016.
[27] X. Yang, H. He, D. Liu, and Y. Zhu,
[20] G. Wen, C. P. Chen, and S. S. Ge, “Sim- “Adaptive dynamic programming for robust
plified optimized backstepping control for neural control of unknown continuous-time
a class of nonlinear strict-feedback systems non-linear systems,” IET Control Theory &
with unknown dynamic functions,” IEEE Applications, vol. 11, no. 14, pp. 2307–2316,
Transactions on Cybernetics, 2020. 2017.
[21] H. Zhang, Q. Qu, G. Xiao, and Y. Cui, [28] G. Xiao, H. Zhang, Y. Luo, and H. Jiang,
“Optimal guaranteed cost sliding mode con- “Data-driven optimal tracking control for a
trol for constrained-input nonlinear systems class of affine non-linear continuous-time sys-
with matched and unmatched disturbances,” tems with completely unknown dynamics,”
IEEE transactions on neural networks and IET Control Theory & Applications, vol. 10,
learning systems, vol. 29, no. 6, pp. 2112– no. 6, pp. 700–710, 2016.
2126, 2018.
[29] X. Guo, W. Yan, and R. Cui, “Integral reinforcement learning-based adaptive NN control for continuous-time nonlinear MIMO systems with unknown control directions,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
[30] L. Zhang, J. Fan, W. Xue, V. G. Lopez, J. Li, T. Chai, and F. L. Lewis, “Data-driven H∞ optimal output feedback control for linear discrete-time systems based on off-policy Q-learning,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
[31] K. Dupree, P. M. Patre, Z. D. Wilcox, and W. E. Dixon, “Asymptotic optimal control of uncertain nonlinear Euler–Lagrange systems,” Automatica, vol. 47, no. 1, pp. 99–107, 2011.
[32] H. J. Asl, T. Narikiyo, and M. Kawanishi, “RISE-based prescribed performance control of Euler–Lagrange systems,” Journal of the Franklin Institute, vol. 356, no. 13, pp. 7144–7163, 2019.
[33] B. Xian, D. M. Dawson, M. S. de Queiroz, and J. Chen, “A continuous asymptotic tracking control strategy for uncertain nonlinear systems,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1206–1211, 2004.
[34] J. Bao, H. Wang, and P. X. Liu, “Finite-time synchronization control for bilateral teleoperation systems with asymmetric time-varying delay and input dead zone,” IEEE/ASME Transactions on Mechatronics, 2020.
[35] H. Shen and Y.-J. Pan, “Improving tracking performance of nonlinear uncertain bilateral teleoperation systems with time-varying delays and disturbances,” IEEE/ASME Transactions on Mechatronics, 2019.
[36] S. Li, L. Ding, H. Gao, Y.-J. Liu, L. Huang, and Z. Deng, “ADP-based online tracking control of partially uncertain time-delayed nonlinear system and application to wheeled mobile robots,” IEEE Transactions on Cybernetics, 2019.
[37] J. Na, Y. Lv, K. Zhang, and J. Zhao, “Adaptive identifier-critic-based optimal tracking control for nonlinear systems with experimental validation,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020.
[38] S. Zhang, S. Yuan, X. Yu, L. Kong, Q. Li, and G. Li, “Adaptive neural network fixed-time control design for bilateral teleoperation with time delay,” IEEE Transactions on Cybernetics.
[39] S. Bhasin, R. Kamalapurkar, M. Johnson, K. G. Vamvoudakis, F. L. Lewis, and W. E. Dixon, “A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems,” Automatica, vol. 49, no. 1, pp. 82–92, 2013.
[40] C. Li, F. Liu, Y. Wang, and M. Buss, “Concurrent learning-based adaptive control of an uncertain robot manipulator with guaranteed safety and performance,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 5, pp. 3299–3313, 2021.
[41] H. K. Khalil, “Nonlinear systems third edition,” Prentice Hall, vol. 115, 2002.
[42] T. Madani, B. Daachi, and K. Djouani, “Modular-controller-design-based fast terminal sliding mode for articulated exoskeleton systems,” IEEE Transactions on Control Systems Technology, vol. 25, no. 3, pp. 1133–1140, 2016.
[43] J. Huang, S. Ri, T. Fukuda, and Y. Wang, “A disturbance observer based sliding mode control for a class of underactuated robotic system with mismatched uncertainties,” IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2480–2487, 2018.
[]

\OT1/cmr/m/n/10 sam-ples must include an unam-bigu-ous state-
[]

\OT1/cmr/m/n/10 ti-fy-ing patien-t/-par-tic-i-pant infor-ma-tion, or if
[]

\OT1/cmr/m/n/10 it describes human trans-plan-ta-tion research,
[]

\OT1/cmr/m/n/10 visit ([][]$https : / / www . nature . com / nature-
[]research
/
[]

\OT1/cmr/m/n/10 ([][]$https : / / www . springer . com / gp / authors-
[]editors
/
[]
\OT1/cmr/m/n/10 journal-[]author / journal-[]author-[]helpdesk /
[]

\OT1/cmr/m/n/10 publishing-[]ethics / 14214$[][]) for Springer Nature
[]

\OT1/cmr/m/n/10 jour-nals, or ([][]$https : / / www . biomedcentral . com
/
[]

\OT1/cmr/m/n/10 getpublished / editorial-[]policies # ethics + and +
[]

[]| []
[]
[6]
Package hyperref Warning: Difference (4) between bookmark levels is

greater
(hyperref) than one, level fixed on input line 543.

\OT1/cmr/m/n/10 has accom-pa-ny-ing sup-ple-men-tary file/s please
[]
Package hyperref Warning: Difference (3) between bookmark levels is

greater
(hyperref) than one, level fixed on input line 551.
No file sn-article.bbl.
Package natbib Warning: There were undefined citations.

[]| []
[]
[7] (./sn-article.aux)
LaTeX Font Warning: Size substitutions with differences

(Font) up to 1.28pt have occurred.
Package rerunfilecheck Info: File `sn-article.out' has not changed.

(rerunfilecheck) Checksum:
B1366A264836B167BE71CB57F1A49884;3117.
)
