

Intelligent Control and Fault Diagnosis


Lecture 5: Intelligent Control (RL)

Farzaneh Abdollahi

Department of Electrical Engineering

Amirkabir University of Technology

Winter 2024



Outline

Preliminary (Optimal Control)

Example [1]


Prelim (Dynamic Programming in Continuous Time)

▶ Objective:

  J(x, u, t₀, t_f) = h(x_f, t_f) + ∫_{t₀}^{t_f} g(x(t), u(t), t) dt

  subject to ẋ = f(x(t), u(t), t)
▶ Value fcn (cost-to-go): V(x(t₀), t₀, t_f) = min_u J(x, u, t₀, t_f)
▶ Solution:
  ▶ Bellman generalized the Hamilton–Jacobi equation by considering u(t)
  ▶ Hamilton–Jacobi–Bellman (HJB) equation:

    −∂V/∂t = min_u { g(x(t), u(t), t) + (∂V/∂x)ᵀ f(x(t), u(t), t) }

    where the term in braces is the Hamiltonian H(x(t), u(t), ∂V/∂x, t).
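▶ In the linear-quadratic special case (ẋ = Ax + Bu, g = xᵀQx + uᵀRu, V*(x) = xᵀPx), the HJB reduces to the algebraic Riccati equation, which is cheap to solve. A minimal sketch of this special case; the plant matrices A, B, Q, R below are illustrative choices, not from the lecture:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy LQ problem: for xdot = Ax + Bu with cost integrand x'Qx + u'Ru,
# the HJB with V*(x) = x'Px reduces to A'P + PA - P B R^{-1} B'P + Q = 0.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)  # solves the Riccati equation above
K = np.linalg.solve(R, B.T @ P)       # optimal policy: u*(x) = -K x
print("P =\n", P, "\nK =", K)
```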


▶ It is expensive to solve the n-dimensional HJB PDE.
▶ For more details, refer to Steve Brunton's crash courses.
▶ RL can be considered a model-free framework for solving optimal control problems stated as Markov decision processes (MDPs).
▶ RL rests on two pillars of equal importance:
  1. Optimal control: the two most famous RL algorithms, TD- and Q-learning, are all about approximating the value function that is at the heart of optimal control. Similarly, actor-critic methods are based on state feedback, which is motivated by optimal control theory.
  2. Statistics and information theory, especially the topic of exploration.
▶ In control problems, RL can provide an optimal solution when there is no a priori information on the system dynamics.


Example [1]

▶ Consider a nonlinear system

  ẋ = f(x) + g(x)u    (1)

  where f : Rⁿ → Rⁿ is the unknown drift dynamics and g : Rⁿ → Rⁿˣᵐ is the control effectiveness matrix, with n ≥ m and the pseudoinverse of g(x) assumed to exist.
▶ Let x_d ∈ Rⁿ be a time-varying, continuously differentiable desired state trajectory.
▶ Objective: design an optimal tracking controller when f(x) is not known a priori.
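▶ As a running illustration for the sketches in the following slides, consider a hypothetical 2-state, single-input plant of this form; f_true stands in for the unknown drift (used only to simulate the "truth", never by the controller), and g is constant with full column rank so g⁺ exists:

```python
import numpy as np

def f_true(x):
    # hypothetical Van der Pol-like drift; unknown to the controller
    return np.array([x[1], -x[0] + 0.5 * (1.0 - x[0] ** 2) * x[1]])

def g(x):
    # control effectiveness; constant, full column rank (n = 2, m = 1)
    return np.array([[0.0], [1.0]])

def g_pinv(x):
    # g+(x) = (g(x)' g(x))^{-1} g(x)'; exists since g has full column rank
    G = g(x)
    return np.linalg.solve(G.T @ G, G.T)
```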


Design a DNN Identifier

▶ To improve performance, an indirect approach is taken: the unknown dynamics are identified by a DNN.
▶ A multi-timescale DNN-based identifier is applied:
  ▶ The output-layer weights of the DNN are adjusted in real time using adaptive update laws motivated by a Lyapunov-based stability analysis.
  ▶ Concurrent to real-time execution, data are collected, and DNN training algorithms (e.g., stochastic gradient descent) iteratively update the inner-layer DNN features.
▶ i.e., the inner-layer weights are not updated in real time; they are updated discretely and intermittently during task execution, once each round of training is complete.

DNN Identifier
▶ Using the universal approximation theorem, f can be represented as

  f(x) = θᵀ ϕ(Φ*(x)) + ϵ*_θ(x)

  ▶ θ ∈ Rʰˣⁿ: ideal output-layer weight matrix
  ▶ ϕ : Rᵖ → Rʰ: vector of activation fcns
  ▶ Φ* = V_k ϱ_k(V_{k−1} ϱ_{k−1}(V_{k−2} ϱ_{k−2}(. . . x))) : Rⁿ → Rᵖ: unknown inner-layer features of the DNN; k: number of inner layers; V_k, ϱ_k: the inner-layer weights and activation fcns
  ▶ ϵ*_θ(x): bounded function approximation error
▶ The ideal weights are unknown and must be approximated by learning.
▶ The identifier dynamics:

  x̂˙ = θ̂ᵀ ϕ(Φ̂ᵢ(x)) + g(x)u + k₀ x̃

  where x̃ = x − x̂ and k₀ > 0 is the estimator gain.
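▶ A minimal sketch of the identifier, continuing the running example and assuming hypothetical dimensions (n = 2 states, h = 16 features) with a frozen, randomly initialized layer standing in for the offline-trained features ϕ(Φ̂ᵢ(·)):

```python
n, h = 2, 16
rng = np.random.default_rng(0)
V_inner = rng.normal(size=(h, n))   # stand-in inner-layer weights (trained offline)
theta_hat = np.zeros((h, n))        # output-layer weights, adapted online
k0 = 5.0                            # estimator gain

def features(x):
    # stand-in for phi(Phi_hat_i(x)): frozen inner layer with tanh activations
    return np.tanh(V_inner @ x)

def xhat_dot(x, xhat, u):
    # identifier dynamics: theta_hat' phi(Phi_hat_i(x)) + g(x)u + k0*(x - xhat)
    return theta_hat.T @ features(x) + (g(x) @ u).ravel() + k0 * (x - xhat)
```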



DNN Identifier

▶ The proposed update law for the output-layer weights:

  θ̂˙ = Γ_θ ϕ(Φ̂ᵢ(x)) x̃ᵀ + k_θ Γ_θ ∑_{j=1}^{M} ϕ(Φ̂ᵢ(x_j)) (x̄˙_j − g_j(x_j)u_j − θ̂ᵀ ϕ(Φ̂ᵢ(x_j)))ᵀ    (2)

  where Γ_θ ∈ Rʰˣʰ and k_θ > 0 are constant adaptation gains.

▶ Assumption: a history stack of input–output data pairs {x_j, u_j}_{j=1}^{M} and a history stack of numerically computed state derivatives {x̄˙_j}_{j=1}^{M} are available a priori for each index j.
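▶ A hedged sketch of update law (2), continuing the running example: the first term is the instantaneous correction driven by x̃, and the sum replays the recorded history stack. The gains Gamma_theta and k_theta below are illustrative choices, not values from [1]:

```python
Gamma_theta = 0.1 * np.eye(h)   # adaptation gain (illustrative)
k_theta = 1.0                   # concurrent-learning gain (illustrative)

def theta_hat_dot(x, xtilde, history):
    # instantaneous term: phi(Phi_hat_i(x)) xtilde'
    inst = np.outer(features(x), xtilde)
    # replay term: sum over stored triples (x_j, u_j, xbar_dot_j)
    replay = np.zeros((h, n))
    for xj, uj, xbar_dot_j in history:
        resid = xbar_dot_j - (g(xj) @ uj).ravel() - theta_hat.T @ features(xj)
        replay += np.outer(features(xj), resid)
    return Gamma_theta @ (inst + k_theta * replay)
```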


▶ The dynamics are unknown ⇝ u_d(x_d) is not known.
▶ An approximation of the trajectory-tracking component of the controller:

  û_d(x_d, θ̂) = g⁺(x_d)(h_d(x_d) − f̂ᵢ(x_d, θ̂))

  where
  ▶ h_d(x_d) = ẋ_d is a locally Lipschitz fcn
  ▶ g⁺(x) = (gᵀ(x) g(x))⁻¹ gᵀ(x)

▶ For tracking, let us define the error dynamics:

  ξ˙ = F(ξ) + G(ξ)µ

  ▶ ξ = [e, x_d]
  ▶ µ = u − u_d(x_d)
  ▶ F = [f(e + x_d) − h_d(x_d) + g(e + x_d)u_d(x_d); h_d(x_d)]
  ▶ G = [g(e + x_d)ᵀ 0_{m×n}]ᵀ
▶ The control objective is to find a control policy u that minimizes the cost function:

  J(ξ, µ) = ∫₀^∞ r(ξ, µ) dτ,  where r(ξ, µ) = Q̄(ξ) + µᵀRµ
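▶ A sketch of the concatenated error system, continuing the running example; since the ideal u_d is unknown, û_d stands in for it when simulating:

```python
def xi_dot(e, xd, mu, t):
    # xi = [e; x_d]; mu is the transient part of the control
    x = e + xd
    ud = u_d_hat(xd, t)                      # stand-in for the ideal u_d(x_d)
    F_top = f_true(x) - h_d(xd, t) + (g(x) @ ud).ravel()
    F = np.concatenate([F_top, h_d(xd, t)])  # F(xi)
    G = np.vstack([g(x), np.zeros((n, 1))])  # G(xi) = [g(e + x_d); 0]
    return F + (G @ mu).ravel()
```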


Optimal Value Function and Policy
▶ The value fcn for the optimal solution is

  V*(ξ) = min_{µ∈U} ∫₀^∞ r(ξ, µ) dτ

  where, under the minimizing policy, r(ξ, µ) = Q(ξ) + µ*(ξ)ᵀRµ*(ξ).
▶ The optimal value fcn V* is a solution to the corresponding HJB equation

  0 = ∇_ξ V*(ξ)(F(ξ) + G(ξ)µ*(ξ)) + Q(ξ) + µ*(ξ)ᵀRµ*(ξ)    (3)

▶ with boundary condition V*(0) = 0.
▶ The optimal policy:

  µ*(ξ) = −(1/2) R⁻¹ G(ξ)ᵀ (∇_ξ V*(ξ))ᵀ
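▶ The policy formula follows from setting the derivative of the HJB minimand with respect to µ to zero. A small symbolic check of this stationarity condition (scalar case; p stands in for ∇_ξ V*):

```python
import sympy as sp

mu, G, p = sp.symbols('mu G p')          # p stands in for dV*/dxi (scalar case)
R = sp.Symbol('R', positive=True)

H_mu = p * G * mu + R * mu**2            # mu-dependent terms of the HJB minimand
mu_star = sp.solve(sp.Eq(sp.diff(H_mu, mu), 0), mu)[0]
print(mu_star)                           # -G*p/(2*R), i.e. -(1/2) R^{-1} G p
```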

▶ To solve the HJB equation, the optimal value fcn must be found. By the universal approximation property of MLPs, an NN is applied to approximate V* (the critic):

  V* = Wᵀσ(ξ) + ϵ(ξ) ≃ Ŵ_cᵀ σ(ξ)

▶ and likewise the optimal control policy µ* (the actor):

  µ* = −(1/2) R⁻¹ G(ξ)ᵀ (∇_ξσ(ξ)ᵀ W + ∇_ξϵ(ξ)ᵀ) ≃ −(1/2) R⁻¹ G(ξ)ᵀ ∇_ξσ(ξ)ᵀ Ŵ_a

▶ The control signal applied to (1) is

  u = µ̂(ξ, Ŵ_a) + û_d(x_d, θ̂)
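▶ A sketch of evaluating the actor and assembling the applied control, assuming a hypothetical quadratic critic basis σ(ξ) over ξ = [e; x_d] ∈ R⁴ ([1] does not prescribe this basis):

```python
R = np.array([[1.0]])   # control weight in r(xi, mu) (illustrative)

def sigma(xi):
    # hypothetical quadratic basis for the critic
    return np.array([xi[0]**2, xi[1]**2, xi[0]*xi[1], xi[2]*xi[0], xi[3]*xi[1]])

def grad_sigma(xi):
    # Jacobian of sigma with respect to xi (5 x 4)
    return np.array([
        [2*xi[0], 0,       0,     0],
        [0,       2*xi[1], 0,     0],
        [xi[1],   xi[0],   0,     0],
        [xi[2],   0,       xi[0], 0],
        [0,       xi[3],   0,     xi[1]],
    ])

def mu_hat(xi, W_a, e, xd):
    # actor: mu_hat = -1/2 R^{-1} G(xi)' grad_sigma(xi)' W_a
    G = np.vstack([g(e + xd), np.zeros((n, 1))])
    return -0.5 * np.linalg.solve(R, G.T @ (grad_sigma(xi).T @ W_a))

def control(e, xd, t, W_a):
    # u = mu_hat(xi, W_a) + u_d_hat(x_d, theta_hat), applied to system (1)
    xi = np.concatenate([e, xd])
    return mu_hat(xi, W_a, e, xd) + u_d_hat(xd, t)
```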


Update Laws
▶ The HJB equation in (3) equals zero under optimal conditions.
▶ Substituting the approximated fcns V̂(ξ), µ̂(ξ), and f̂ into (3) leaves a residual (Bellman error) δ̂.
▶ δ̂ is taken as the cost fcn to minimize, which defines the continuous-time least-squares-based update laws; these are designed based on the subsequent stability analysis. Here ω is the critic regressor, ρ a normalizing term, Σc, ΣΓ, Σa experience-replay (history-stack) sums, Γ̄ a saturation bound, and Gσ := ∇_ξσ G R⁻¹ Gᵀ ∇_ξσᵀ (the slide leaves these definitions implicit; they follow the usual ADP formulation):

  Ŵ˙_c = −ηc1 Γ (ω/ρ) δ̂ − ηc2 Γ Σc
  Γ˙ = (λΓ − ηc1 (Γωωᵀ Γ)/ρ² − ηc2 Γ ΣΓ Γ) · 1{‖Γ‖ ≤ Γ̄}
  Ŵ˙_a = −ηa1 (Ŵ_a − Ŵ_c) − ηa2 Ŵ_a + ηc1 ((Gσᵀ Ŵ_a ωᵀ)/(4ρ)) Ŵ_c + ηc2 Σa Ŵ_c
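▶ A simplified sketch of the critic update, continuing the running example: the experience-replay sums Σc, ΣΓ and the indicator on Γ˙ are omitted for brevity, the regressor and normalizer take the standard forms assumed above, and the gains are illustrative:

```python
eta_c1, lam = 1.0, 0.1   # adaptation and forgetting gains (illustrative)

def critic_update(W_c, Gamma, xi, xi_dot_val, Q_bar, mu):
    # Bellman error of (3) with V* ~ W_c' sigma(xi):
    # delta = W_c' grad_sigma(xi) xi_dot + Q_bar + mu' R mu
    omega = grad_sigma(xi) @ xi_dot_val          # critic regressor
    rho = 1.0 + omega @ Gamma @ omega            # normalization term
    delta = W_c @ omega + Q_bar + float(mu @ R @ mu)
    W_c_dot = -eta_c1 * Gamma @ omega * delta / rho
    Gamma_dot = lam * Gamma - eta_c1 * Gamma @ np.outer(omega, omega) @ Gamma / rho**2
    return W_c_dot, Gamma_dot
```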


Stability Analysis

▶ Uniform ultimate boundedness (UUB) of e, W̃_c, W̃_a, x̃, θ̃ is shown using the following Lyapunov candidate:

  V_L = V*(ξ) + (1/2) W̃_cᵀ Γ⁻¹ W̃_c + (1/2) W̃_aᵀ W̃_a + (1/2) x̃ᵀ x̃ + (1/2) tr(θ̃ᵀ Γ_θ⁻¹ θ̃)


References I

M. L. Greene, Z. I. Bell, S. Nivison, and W. E. Dixon, “Deep neural network-based approximate optimal tracking for unknown nonlinear systems,” IEEE Transactions on Automatic Control, vol. 68, no. 5, pp. 3171–3177, 2023.

D. Silver, Reinforcement Learning Lecture. https://www.davidsilver.uk/teaching/ (accessed Jan. 2023).

J. Klaise, Reinforcement Learning with MATLAB. http://web.khu.ac.kr/~tskim/NE%2010-3%20Reinforcement-learning-ebook.pdf (accessed Jan. 2023).

S. Meyn, Control Systems and Reinforcement Learning. Cambridge University Press, 2022.

K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, Handbook of Reinforcement Learning and Control. Springer, 2021.


References II

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2nd ed., 2018.

S. M. N. Mahmud, S. A. Nivison, Z. I. Bell, and R. Kamalapurkar, “Safe model-based reinforcement learning for systems with parametric uncertainties,” Frontiers in Robotics and AI, vol. 8, Article 733104, 2021.
