Lec ICFDC 5
Farzaneh Abdollahi
Winter 2024
Example [1]
subject to
ẋ = f (x(t), u(t), t)
▶ Value fcn (cost-to-go): $V(x(t_0), t_0, t_f) = \min_u J(x, u, t_0, t_f)$
▶ Solution:
▶ Bellman generalized the Hamilton-Jacobi equation by considering u(t)
▶ Hamilton-Jacobi-Bellman (HJB) equation:
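$$-\frac{\partial V}{\partial t} = \min_u \Big\{ \underbrace{g(x(t), u(t), t) + \frac{\partial V}{\partial x}^T f(x(t), u(t), t)}_{H(x(t),\, u(t),\, \frac{\partial V}{\partial x},\, t)\, =\, \text{Hamiltonian}} \Big\}$$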
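As a sanity check of the HJB equation, consider a scalar linear-quadratic problem that is not part of the lecture example: $\dot{x} = ax + bu$ with running cost $g = qx^2 + ru^2$ on an infinite horizon, so $\partial V/\partial t = 0$. Assuming the quadratic guess $V(x) = px^2$, the HJB equation reduces to $q + 2ap - b^2p^2/r = 0$. The sketch below (all names and numbers are illustrative assumptions) minimizes the Hamiltonian on a grid of u and compares the result with the closed-form minimizer $u^* = -(bp/r)x$.

```python
import numpy as np

def riccati_p(a, b, q, r):
    """Positive root of q + 2*a*p - b**2 * p**2 / r = 0."""
    return r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2

def hamiltonian(x, u, p, a, b, q, r):
    """H = g(x, u) + dV/dx * f(x, u) for V(x) = p * x**2."""
    dV_dx = 2.0 * p * x
    return q * x**2 + r * u**2 + dV_dx * (a * x + b * u)

# Illustrative plant and cost parameters (assumed, not from the lecture).
a, b, q, r = -1.0, 1.0, 1.0, 1.0
p = riccati_p(a, b, q, r)

x = 0.7
u_grid = np.linspace(-5.0, 5.0, 100001)    # brute-force search over u
H = hamiltonian(x, u_grid, p, a, b, q, r)

u_numeric = u_grid[np.argmin(H)]           # grid minimizer of the Hamiltonian
u_closed = -(b * p / r) * x                # closed-form minimizer
hjb_residual = H.min()                     # should vanish, since dV/dt = 0 here
```

The minimized Hamiltonian is zero up to the grid resolution, which is the stationary HJB condition for this toy problem.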
Example [1]
DNN Identifier
▶ By the universal approximation theorem, f can be represented as
$$f(x) = \theta^T \phi(\Phi^*(x)) + \epsilon_\theta^*(x)$$
▶ $\theta \in \mathbb{R}^{h \times n}$: ideal output-layer weight matrix
▶ $\phi : \mathbb{R}^p \to \mathbb{R}^h$: vector of activation fcns
▶ $\Phi^*(x) = V_k \varrho_k(V_{k-1} \varrho_{k-1}(V_{k-2} \varrho_{k-2}(\cdots x)))$, $\Phi^* : \mathbb{R}^n \to \mathbb{R}^p$: unknown inner-layer features of the DNN; k: number of inner layers; $V_k$, $\varrho_k$: the inner-layer weights and activation fcns
▶ $\epsilon_\theta^*(x)$: bounded function approximation error
▶ The ideal weights are unknown and should be approximated by
learning.
▶ The identifier dynamics:
DNN Identifier
$$\dot{\hat{\theta}} = \Gamma_\theta\, \phi(\hat{\Phi}_i(x))\, \tilde{x}^T + k_\theta \Gamma_\theta \sum_{j=1}^{M} \phi(\hat{\Phi}_i(x_j))$$
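The structure $\hat{f}(x) = \hat{\theta}^T \phi(\hat{\Phi}(x))$ and the first term of the output-layer update can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: the inner layers `V1`, `V2` are random stand-ins for $\hat{\Phi}$, `phi` is an elementwise tanh (so h = p), $\hat{\theta}$ is taken with shape h×n so that $\hat{\theta}^T\phi$ lands in $\mathbb{R}^n$, the state-estimation error `x_tilde` is supplied by the caller, and the summation term over recorded data is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2, 4            # state size and inner-feature size (illustrative)
h = p                  # elementwise activation, so phi maps R^p -> R^p

# Hypothetical inner-layer weights standing in for the learned features.
V1 = rng.normal(size=(5, n))
V2 = rng.normal(size=(p, 5))

def inner_features(x):
    """Phi_hat(x) = V2 tanh(V1 x): a two-layer stand-in for the inner DNN."""
    return V2 @ np.tanh(V1 @ x)

def phi(z):
    """Output-layer activation vector (assumed elementwise tanh)."""
    return np.tanh(z)

def f_hat(x, theta_hat):
    """Identifier output theta_hat^T phi(Phi_hat(x)), a vector in R^n."""
    return theta_hat.T @ phi(inner_features(x))

def theta_step(theta_hat, x, x_tilde, Gamma_theta, dt):
    """One Euler step of theta_hat_dot = Gamma_theta phi(Phi_hat(x)) x_tilde^T."""
    feat = phi(inner_features(x))
    theta_dot = Gamma_theta @ np.outer(feat, x_tilde)
    return theta_hat + dt * theta_dot

theta0 = np.zeros((h, n))
x = np.array([0.5, -1.0])
x_tilde = np.array([0.1, -0.2])        # assumed state-estimation error
theta1 = theta_step(theta0, x, x_tilde, np.eye(h), dt=0.01)
```

Only the output-layer weights change here; the inner-layer features stay fixed during the step.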
$$\dot{\xi} = F(\xi) + G(\xi)\mu$$
▶ $\xi = [e^T, x_d^T]^T$
▶ $\mu = u - u_d(x_d)$
▶ $F = \begin{bmatrix} f(e + x_d) - h_d(x_d) + g(e + x_d)\, u_d(x_d) \\ h_d(x_d) \end{bmatrix}$
▶ $G = [\, g(e + x_d)^T \;\; 0_{m \times n} \,]^T$
▶ The control objective is to find a control policy u that minimizes the cost function:
$$J(\xi, \mu) = \int_0^\infty \underbrace{\bar{Q}(\xi) + \mu^T R \mu}_{r(\xi, \mu)} \, d\tau$$
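The concatenated dynamics and the running cost can be assembled as in the sketch below. The plant `f`, `g`, the reference dynamics `h_d`, and all numbers are hypothetical. Two further assumptions are not stated in the slides: the steady-state input is taken as $u_d(x_d) = g^+(x_d)(h_d(x_d) - f(x_d))$ with $g^+$ the pseudoinverse, and $\bar{Q}(\xi)$ is taken as $e^T Q e$, penalizing only the tracking error.

```python
import numpy as np

n, m = 2, 1   # state and input dimensions (illustrative)

def f(x):     # hypothetical drift dynamics
    return np.array([x[1], -x[0] - 0.5 * x[1]])

def g(x):     # hypothetical input matrix, shape (n, m)
    return np.array([[0.0], [1.0]])

def h_d(x_d):  # hypothetical reference dynamics: x_d_dot = h_d(x_d)
    return np.array([x_d[1], -x_d[0]])

def u_d(x_d):  # assumed steady-state input: pinv(g) (h_d - f)
    return np.linalg.pinv(g(x_d)) @ (h_d(x_d) - f(x_d))

def F(xi):
    """Stacked drift [f(e+x_d) - h_d(x_d) + g(e+x_d) u_d(x_d); h_d(x_d)]."""
    e, x_d = xi[:n], xi[n:]
    x = e + x_d
    top = f(x) - h_d(x_d) + g(x) @ u_d(x_d)
    return np.concatenate([top, h_d(x_d)])

def G(xi):
    """Stacked input matrix [g(e+x_d)^T, 0_{m x n}]^T, shape (2n, m)."""
    e, x_d = xi[:n], xi[n:]
    return np.vstack([g(e + x_d), np.zeros((n, m))])

def running_cost(xi, mu, Q, R):
    """r(xi, mu) = e^T Q e + mu^T R mu (Q_bar assumed to act on e only)."""
    e = xi[:n]
    return float(e @ Q @ e + mu @ R @ mu)

xi = np.array([0.0, 0.0, 1.0, 2.0])   # zero tracking error, x_d = (1, 2)
xi_dot = F(xi) + G(xi) @ np.zeros(m)
```

With zero tracking error and $\mu = 0$, the error part of `xi_dot` vanishes for this toy plant, so $e = 0$ is an equilibrium of the error dynamics and the running cost there is zero.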
▶ The value fcn for the optimal solution is
$$V^*(\xi) = \min_{\mu \in U} \int_0^\infty \underbrace{r(\xi, \mu)}_{\bar{Q}(\xi) + \mu^*(\xi)^T R \mu^*(\xi)} \, d\tau$$
$$\mu^*(\xi) = -\frac{1}{2} R^{-1} G(\xi)^T (\nabla_\xi V^*(\xi))^T$$
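The closed-form policy follows from setting the gradient of the Hamiltonian with respect to $\mu$ to zero. A short derivation, assuming R is symmetric positive definite, $\nabla_\xi V^*$ is a row vector, and the minimizer lies in the interior of U:

```latex
\begin{aligned}
H(\xi, \mu, \nabla_\xi V^*) &= \bar{Q}(\xi) + \mu^T R \mu
  + \nabla_\xi V^*(\xi)\,\big(F(\xi) + G(\xi)\mu\big), \\
\frac{\partial H}{\partial \mu} &= 2R\mu + G(\xi)^T \big(\nabla_\xi V^*(\xi)\big)^T = 0
\;\Longrightarrow\;
\mu^*(\xi) = -\tfrac{1}{2} R^{-1} G(\xi)^T \big(\nabla_\xi V^*(\xi)\big)^T .
\end{aligned}
```

Since $\partial^2 H / \partial \mu^2 = 2R > 0$, the stationary point is the minimizer.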
Farzaneh Abdollahi Intelligent Control Lecture 5 11/16
▶ Solving the HJB equ yields the optimal value fcn. Since an MLP is a universal approximator, a NN (the critic) is used to approximate V*:
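$$\mu^* = -\frac{1}{2} R^{-1} G(\xi)^T \big(\nabla_\xi \sigma(\xi)^T W + \nabla_\xi \epsilon(\xi)^T\big) \simeq -\frac{1}{2} R^{-1} G(\xi)^T \nabla_\xi \sigma(\xi)^T \hat{W}_a$$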
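A minimal sketch of the critic and actor approximators, $\hat{V}(\xi) = \hat{W}_c^T \sigma(\xi)$ and $\hat{\mu}(\xi) = -\frac{1}{2} R^{-1} G(\xi)^T \nabla_\xi \sigma(\xi)^T \hat{W}_a$, with the minus sign taken from stationarity of the Hamiltonian. The quadratic basis `sigma` and the double-integrator test case are assumptions chosen because the exact value function is known there: $V^*(x) = \sqrt{3}x_1^2 + 2x_1x_2 + \sqrt{3}x_2^2$ for Q = I, R = 1.

```python
import numpy as np

def sigma(xi):
    """Hypothetical quadratic basis for a 2-D state."""
    x1, x2 = xi
    return np.array([x1**2, x1 * x2, x2**2])

def grad_sigma(xi):
    """Jacobian of sigma with respect to xi, shape (3, 2)."""
    x1, x2 = xi
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def value_hat(xi, W_c):
    """Critic: V_hat = W_c^T sigma(xi)."""
    return W_c @ sigma(xi)

def policy_hat(xi, W_a, G, R):
    """Actor: mu_hat = -1/2 R^-1 G(xi)^T grad_sigma(xi)^T W_a."""
    return -0.5 * np.linalg.solve(R, G(xi).T @ grad_sigma(xi).T @ W_a)

# Double integrator (assumed test case): input enters the second state.
G = lambda xi: np.array([[0.0], [1.0]])
R = np.eye(1)
W_ideal = np.array([np.sqrt(3.0), 2.0, np.sqrt(3.0)])   # exact weights for this case

xi = np.array([1.0, -2.0])
mu = policy_hat(xi, W_ideal, G, R)    # equals -(x1 + sqrt(3) * x2) at the ideal weights
V = value_hat(xi, W_ideal)
```

Keeping separate weight vectors for the critic and the actor mirrors the $\hat{W}_c$ and $\hat{W}_a$ notation; at the ideal weights both reproduce the LQR solution of the test case.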
▶ The control signal applied to (1) is
Update Laws
▶ The HJB equation in (3) equals zero under optimal conditions.
▶ Substituting the approximations of V*(ξ) and µ*(ξ), together with the identified dynamics f̂, leaves a residual δ̂.
▶ δ̂ is treated as the cost fcn to minimize. This gives continuous-time least-squares-based update laws, whose form is chosen based on the subsequent stability analysis:
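$$\dot{\hat{W}}_c = -\eta_{c1} \Gamma \frac{\omega}{\rho} \hat{\delta} - \eta_{c2} \Gamma \Sigma_c$$
$$\dot{\Gamma} = \Big(\lambda \Gamma - \eta_{c1} \frac{\Gamma \omega \omega^T \Gamma}{\rho^2} - \eta_{c2} \Gamma \Sigma_\Gamma \Gamma\Big) \mathbf{1}$$
$$\dot{\hat{W}}_a = -\eta_{a1} (\hat{W}_a - \hat{W}_c) - \eta_{a2} \hat{W}_a + \eta_{c1} \frac{G^T \hat{W}_a \omega^T}{4\rho} \hat{W}_c + \eta_{c2} \Sigma_a \hat{W}_c$$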
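The role of the residual can be illustrated on a scalar toy problem, $\dot{x} = ax + bu$ with $r(x, u) = qx^2 + ru^2$ and a single basis function $\sigma(x) = x^2$. The sketch below is an assumption-laden illustration, not the lecture's algorithm: it implements only the first term of the critic law, $-\eta_{c1}\Gamma(\omega/\rho)\hat{\delta}$, takes $\omega = \nabla\sigma\,(F + G\hat{\mu})$, and assumes the normalization $\rho = 1 + \nu\omega^2$; the $\Sigma_c$ term and the $\Gamma$ and actor laws are left out.

```python
import numpy as np

# Illustrative scalar plant and cost (assumed values).
a, b, q, r = -1.0, 1.0, 1.0, 1.0
p_true = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2   # exact weight: V* = p_true * x^2

def bellman_residual(x, w_c, w_a):
    """delta_hat = r(x, mu_hat) + w_c * omega, with omega = grad_sigma * x_dot."""
    mu_hat = -(b * w_a / r) * x          # actor with sigma(x) = x^2
    x_dot = a * x + b * mu_hat
    omega = 2.0 * x * x_dot              # regressor for the critic weight
    delta = q * x**2 + r * mu_hat**2 + w_c * omega
    return delta, omega

def critic_step(x, w_c, w_a, Gamma, eta_c1, nu, dt):
    """One Euler step of w_c_dot = -eta_c1 * Gamma * (omega / rho) * delta_hat."""
    delta, omega = bellman_residual(x, w_c, w_a)
    rho = 1.0 + nu * omega**2            # assumed normalization term
    w_c_dot = -eta_c1 * Gamma * omega / rho * delta
    return w_c + dt * w_c_dot

x = 0.7
delta_opt, _ = bellman_residual(x, p_true, p_true)   # residual at the exact weights
w_c0 = 1.0                                           # deliberately wrong initial critic weight
w_c1 = critic_step(x, w_c0, p_true, Gamma=1.0, eta_c1=1.0, nu=1.0, dt=0.1)
```

At the exact weights the residual vanishes, and a single step from a wrong initial weight moves the critic estimate toward `p_true`, which is the behavior the least-squares law is designed to produce.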
Stability Analysis
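$$V_L = V^*(\xi) + \frac{1}{2} \tilde{W}_c^T \Gamma^{-1} \tilde{W}_c + \frac{1}{2} \tilde{W}_a^T \tilde{W}_a + \frac{1}{2} \tilde{x}^T \tilde{x} + \frac{1}{2} \operatorname{tr}\big(\tilde{\theta}^T \Gamma_\theta^{-1} \tilde{\theta}\big)$$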
References I
References II