Article in IEEE Control Systems Letters, January 2022. DOI: 10.1109/LCSYS.2021.3077861


Imitation Learning with Stability and Safety Guarantees
He Yin, Peter Seiler, Ming Jin and Murat Arcak

arXiv:2012.09293v2 [eess.SY] 7 Apr 2021

Funded in part by the Air Force Office of Scientific Research grant FA9550-18-1-0253, the Office of Naval Research grant N00014-18-1-2209, and the U.S. National Science Foundation (NSF) grant ECCS-1906164. M. Jin acknowledges the funding from NSF EPCN-2034137.
H. Yin and M. Arcak are with the University of California, Berkeley {he yin, arcak}@berkeley.edu.
P. Seiler is with the University of Michigan, Ann Arbor pseiler@umich.edu.
M. Jin is with Virginia Tech jinming@vt.edu.

Abstract— A method is presented to learn neural network (NN) controllers with stability and safety guarantees through imitation learning (IL). Convex stability and safety conditions are derived for linear time-invariant plant dynamics with NN controllers by merging Lyapunov theory with local quadratic constraints to bound the nonlinear activation functions in the NN. These conditions are incorporated into the IL process, which simultaneously minimizes the IL loss and maximizes the volume of the region of attraction associated with the NN controller. An alternating direction method of multipliers based algorithm is proposed to solve the IL problem. The method is illustrated on an inverted pendulum system, aircraft longitudinal dynamics, and vehicle lateral dynamics.

I. INTRODUCTION

Imitation learning (IL) is a class of methods that learns a policy to attain a control goal consistent with expert demonstrations [1], [2]. Used in tandem with deep neural networks (NNs), IL presents unique advantages, including a substantial increase in sample efficiency compared to reinforcement learning (RL) [3], and wide applicability to domains where the reward model is not accessible or on-policy data is difficult or unsafe to acquire [1]. While IL is closely related to supervised learning in that it trains a mapping from observations to actions [4], a key difference is the ensuing deployment of the learned policy under dynamics, which consequently raises the issue of closed-loop stability. This problem naturally falls within the realm of robust control, which analyzes stability for uncertain linear or nonlinear systems; however, a major technical challenge is to derive nonconservative guarantees for highly nonlinear policies such as NNs that can also be tractably incorporated into the learning process.

This letter tackles this issue and presents a method to learn NN controllers with stability and safety guarantees through IL. We first derive convex stability and safety conditions for linear time-invariant (LTI) plant dynamics by merging Lyapunov theory with local sector quadratic constraints (QCs) to describe the activation functions in the NN. Then we incorporate these constraints in the IL process that minimizes the IL loss and maximizes the volume of the region of attraction associated with the NN controller. Finally, we propose an alternating direction method of multipliers (ADMM) based method to solve the IL problem.

While there are many works focusing on NN robustness analysis using QCs, e.g., NN safety verification [5], NN Lipschitz constant estimation [6], reachability analysis [7], and stability analysis [8]–[10] of NN controlled systems, there are relatively fewer results regarding NN synthesis, including NN training with bounded Lipschitz constants [11] and recurrent NN training with bounded incremental ℓ2 gains [12].

Compared to existing works, this letter makes the following contributions. First, it presents a safe IL algorithm to synthesize NN controllers, which trades off between IL accuracy and the size of the stability margin of the NN controller. To the best of our knowledge, this is the first attempt to ensure stability of IL based on deep NNs. Second, the stability condition from [8] is nonconvex and thus computationally intractable for NN control synthesis; here we convexify this constraint (using a loop transformation) for its efficient enforcement in the learning process. Third, as demonstrated by the numerical example in Section VI-B, while the proposed approach can train a policy that imitates the expert demonstrations, it can potentially improve local stability over suboptimal expert policies, thus enhancing the robustness of IL.

Other related works include approximating explicit model predictive control laws using NNs for linear parameter varying plant dynamics [13] and LTI plant dynamics [14], [15] to expedite the online evaluation of controllers. Compared with these works, the proposed method provides stability guarantees for the learned NN controller.

The stability and safety issues for learning (especially RL) based control have also been addressed by a Hamilton-Jacobi reachability-based framework in [16], control barrier function based methods in [17], [18], and a convex projection operator based method in [19].

Notation: S^n, S^n_+ and S^n_{++} denote the sets of n-by-n symmetric, positive semidefinite, and positive definite matrices, respectively. When applied to vectors, the orders >, ≤ are applied elementwise. For P ∈ S^n_{++}, define the ellipsoid

E(P) := {x ∈ R^n : x^T P x ≤ 1}.   (1)

II. PROBLEM FORMULATION

Consider the feedback system consisting of a plant G and state-feedback controller π as shown in Figure 1. We assume the plant G is a linear time-invariant (LTI) system defined by the following discrete-time model:

x(k + 1) = AG x(k) + BG u(k),   (2)

where x(k) ∈ R^{nG} is the state and u(k) ∈ R^{nu} is the control, with AG ∈ R^{nG×nG} and BG ∈ R^{nG×nu}. Finally, assume x(k) is
constrained to a set X ⊂ R^{nG}, which is referred to as the "safety condition". This state constraint set is assumed to be a polytope symmetric around the origin:

X = {x ∈ R^{nG} : −h ≤ Hx ≤ h, h ≥ 0},   (3)

where H ∈ R^{nX×nG} and h ∈ R^{nX}. The controller π : R^{nG} → R^{nu} is an ℓ-layer feedforward neural network (NN) defined as:

w^0(k) = x(k),   (4a)
w^i(k) = φ^i(W^i w^{i−1}(k) + b^i),  i = 1, …, ℓ,   (4b)
u(k) = W^{ℓ+1} w^ℓ(k) + b^{ℓ+1},   (4c)

where w^i ∈ R^{ni} are the outputs (activations) from the i-th layer and n0 = nG. The operations for each layer are defined by a weight matrix W^i ∈ R^{ni×ni−1}, bias vector b^i ∈ R^{ni}, and activation function φ^i : R^{ni} → R^{ni}. The activation function φ^i is applied element-wise, i.e.

φ^i(v) := [ϕ(v1), ⋯, ϕ(vni)]^T,   (5)

where ϕ : R → R is the (scalar) activation function selected for the NN. Common choices for the scalar activation function include tanh, ReLU, and leaky ReLU. We assume the activation ϕ is identical in all layers; this can be relaxed with minor changes to the notation. We also assume that the activation ϕ satisfies ϕ(0) = 0.

Our goal is to stabilize an equilibrium, which we assume is shifted to the origin as is standard in control design. To ensure that the equilibrium remains at the origin with the NN controller, we impose the constraint π(0) = 0. Since π(0) = 0 translates to a nonconvex constraint on (W^i, b^i), we set all the bias terms to zero: b^i = 0_{ni×1}, for i = 1, …, ℓ+1.

Remark 1: Setting the bias terms to zero is arguably an underuse of NNs. Whether π(0) = 0 can be achieved with less restrictive convex constraints is an interesting problem that deserves further research.

Fig. 1: Feedback system with plant G and NN π

If control constraint sets are hypercubes, they can be considered in the framework by applying activation functions, like tanh, elementwise to the output layer (4c).

Let χ(k; x0) denote the solution to the feedback system at time k from initial condition x(0) = x0. The region of attraction (ROA) is defined below.

Definition 1: The region of attraction (ROA) of the feedback system with plant G and NN π is defined as

R := {x0 ∈ X : lim_{k→∞} χ(k; x0) = 0_{nG×1}}.   (6)

Given state and control data pairs from the expert demonstrations, our goal is to learn a NN controller from the data to reproduce the demonstrated behavior, while guaranteeing that the system trajectories under the control of the learned NN controller satisfy the safety condition (x(k) ∈ X for all k ≥ 0) and are able to converge to the equilibrium point if they start from the ROA associated with the learned NN controller.

III. STABILITY AND SAFETY CONDITIONS FOR NN CONTROLLED SYSTEMS

In this section, we treat the parameters of the NN controller as fixed and derive the safety and local stability conditions of the NN-controlled LTI systems.

A. NN Representation: Isolation of Nonlinearities

It is useful to isolate the nonlinear activation functions from the linear operations of the NN as done in [5], [20]. Define v^i as the input to the activation function φ^i (recalling that b^i = 0_{ni×1} by Remark 1):

v^i(k) := W^i w^{i−1}(k),  i = 1, …, ℓ.   (7)

The nonlinear operation of the i-th layer (4b) is thus expressed as w^i(k) = φ^i(v^i(k)). Gather the inputs and outputs of all activation functions:

vφ := [v^1; ⋯; v^ℓ] ∈ R^{nφ}  and  wφ := [w^1; ⋯; w^ℓ] ∈ R^{nφ},   (8)

where nφ := Σ_{i=1}^ℓ ni, and define the combined nonlinearity φ : R^{nφ} → R^{nφ} by stacking the activation functions:

φ(vφ) := [φ^1(v^1); ⋯; φ^ℓ(v^ℓ)].   (9)

Thus wφ(k) = φ(vφ(k)), where the scalar activation function ϕ is applied element-wise to each entry of vφ. Finally, the NN control policy π defined in (4) can be rewritten as:

[u(k); vφ(k)] = N [x(k); wφ(k)],   (10a)
wφ(k) = φ(vφ(k)).   (10b)

The matrix N depends on the weights as follows, where the vertical and horizontal bars partition N compatibly with the inputs (x, wφ) and outputs (u, vφ):

N = [ 0     0     0    ⋯   0     W^{ℓ+1}
      W^1   0     0    ⋯   0     0
      0     W^2   0    ⋯   0     0
      ⋮     ⋮     ⋱         ⋮     ⋮
      0     0     0    ⋯   W^ℓ   0       ]  =:  [ Nux  Nuw
                                                   Nvx  Nvw ].

This decomposition of the NN, depicted in Figure 2, isolates the nonlinearities φ in preparation for the stability analysis.

Fig. 2: NN representation to isolate the nonlinearities φ.
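As a concrete check of the decomposition in (10), the sketch below evaluates a small two-hidden-layer tanh network both layer-by-layer as in (4) and through the block matrix N; the two evaluations agree. The weights and sizes are illustrative, not taken from the paper's examples.

```python
import math

# Hypothetical tanh network with zero biases: n0 = 2, n1 = 3, n2 = 2, nu = 1.
W1 = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
W2 = [[0.2, -0.1, 0.3], [0.4, 0.1, -0.2]]
W3 = [[0.5, -0.3]]          # output layer W^{l+1}
n_phi = 5                   # n1 + n2

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def nn_direct(x):
    # layer-by-layer evaluation, eq. (4) with b^i = 0
    w1 = [math.tanh(v) for v in matvec(W1, x)]
    w2 = [math.tanh(v) for v in matvec(W2, w1)]
    return matvec(W3, w2)

# Blocks of N, eq. (10a), with inputs (x, w_phi) and outputs (u, v_phi):
Nux = [[0.0, 0.0]]                                   # u has no direct x term
Nuw = [[0.0, 0.0, 0.0] + W3[0]]                      # u = W^3 w^2
Nvx = [list(r) for r in W1] + [[0.0, 0.0]] * 2       # v^1 = W^1 x
Nvw = [[0.0] * n_phi for _ in range(3)] + \
      [list(r) + [0.0, 0.0] for r in W2]             # v^2 = W^2 w^1

def nn_via_N(x):
    # w_phi = phi(Nvx x + Nvw w_phi); Nvw is strictly block lower triangular,
    # so l = 2 fixed-point sweeps starting from zero resolve it exactly.
    w_phi = [0.0] * n_phi
    for _ in range(2):
        v_phi = [a + b for a, b in zip(matvec(Nvx, x), matvec(Nvw, w_phi))]
        w_phi = [math.tanh(v) for v in v_phi]
    return [a + b for a, b in zip(matvec(Nux, x), matvec(Nuw, w_phi))]
```

The fixed-point view is only a way to exercise the (N, φ) interconnection; for a feedforward network the layers can of course be resolved in order.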
B. Quadratic Constraints: Scalar Activation Functions

The stability analysis relies on quadratic constraints (QCs) to bound the activation function. A typical constraint is the sector bound as defined next.

Definition 2: Let α ≤ β be given. The function ϕ : R → R lies in the (global) sector [α, β] if:

(ϕ(ν) − αν) · (βν − ϕ(ν)) ≥ 0  ∀ν ∈ R.   (11)

The interpretation of the sector [α, β] is that ϕ lies between lines passing through the origin with slope α and β. Many activation functions are bounded in the sector [0, 1], e.g. tanh and ReLU. Figure 3 illustrates ϕ(ν) = tanh(ν) (blue solid) and the global sector defined by [0, 1] (red solid lines).

Fig. 3: Sector constraints on tanh

The global sector is often too coarse for analysis; thus, we consider a local sector for tighter bounds.

Definition 3: Let α, β, ν̲, ν̄ ∈ R with α ≤ β and ν̲ ≤ 0 ≤ ν̄. The function ϕ : R → R satisfies the local sector [α, β] if (ϕ(ν) − αν) · (βν − ϕ(ν)) ≥ 0 ∀ν ∈ [ν̲, ν̄].

As an example, ϕ(ν) := tanh(ν) restricted to the interval [−ν̄, ν̄] satisfies the local sector bound [α, β] with α := tanh(ν̄)/ν̄ > 0 and β := 1. As shown in Figure 3 (green dashed lines), the local sector provides a tighter bound than the global sector. These bounds are valid for a symmetric interval around the origin with ν̲ = −ν̄; non-symmetric intervals (ν̲ ≠ −ν̄) can be handled similarly.

C. Quadratic Constraints: Combined Activation Functions

Local sector constraints can also be defined for the combined nonlinearity φ, given by (9). Let v̲, v̄ ∈ R^{nφ} be given with v̲ ≤ 0 ≤ v̄. Suppose that the activation input vφ ∈ R^{nφ} lies element-wise in the interval [v̲, v̄], the i-th input/output pair is wi = ϕ(vi), and the scalar activation function satisfies the local sector [αi, βi] with the input restricted to vi ∈ [v̲i, v̄i] for i = 1, …, nφ. The local sector bounds can be computed for ϕ on the given interval analytically (as above for tanh). These local sectors can be stacked into vectors αφ, βφ ∈ R^{nφ} that provide QCs satisfied by the combined nonlinearity φ.

Lemma 1: Let αφ, βφ, v̲, v̄ ∈ R^{nφ} be given with αφ ≤ βφ and v̲ ≤ 0 ≤ v̄. Suppose that φ satisfies the local sector [αφ, βφ] element-wise for all vφ ∈ [v̲, v̄]. For any λ ∈ R^{nφ} with λ ≥ 0, and for all vφ ∈ [v̲, v̄], wφ = φ(vφ), it holds that

[vφ; wφ]^T [−2AφBφΛ, (Aφ + Bφ)Λ; (Aφ + Bφ)Λ, −2Λ] [vφ; wφ] ≥ 0,   (12)

where Aφ = diag(αφ), Bφ = diag(βφ), and Λ = diag(λ).
Proof: The proof can be found in [8, Lemma 1].

In order to apply the local sector bounds in the stability analysis, we must first compute the bounds v̲, v̄ ∈ R^{nφ} on the activation input vφ. The process to compute the bounds is briefly discussed here with more details provided in [8], [21]. The first step is to find the smallest hypercube that bounds the state constraint set: X ⊆ {x : x̲ ≤ x ≤ x̄}. Therefore, w^0 (defined in (4a)) is bounded by w̲^0 = x̲ and w̄^0 = x̄. Define c = (1/2)(w̲^0 + w̄^0), r = (1/2)(w̄^0 − w̲^0), and denote y^T as the i-th row of W^1. Then the first activation input v^1 = W^1 w^0 is bounded by [v̲^1, v̄^1], where

v̄^1_i = y^T c + Σ_{j=1}^{n0} |yj rj|,  and  v̲^1_i = y^T c − Σ_{j=1}^{n0} |yj rj|.

If the activation functions φ^1 are monotonically non-decreasing, then the first activation output w^1 is bounded by w̄^1 = φ^1(v̄^1) and w̲^1 = φ^1(v̲^1). This process can be propagated through all layers of the NN to obtain the bounds v̲, v̄ ∈ R^{nφ} for the activation input vφ. The remainder of the paper will assume the local sector bounds have been computed as briefly summarized in the following property.

Property 1: Let the state constraint set X and its corresponding activation input bounds v̲, v̄ be given. There exist αφ, βφ such that φ satisfies the local sector for all vφ ∈ [v̲, v̄].

D. Lyapunov Condition

This section uses a Lyapunov function and the local sector to compute an inner approximation for the ROA of the feedback system of G and π.

Theorem 1: ([8]) Consider the feedback system of plant G in (2) and NN π in (4) with equilibrium point x* = 0_{nG×1}, and the state constraint set X. Let v̄, v̲, αφ, βφ ∈ R^{nφ} be given vectors satisfying Property 1 for the NN and the set X. Denote the i-th row of the matrix H by Hi^T and define

RV := [I_{nG}, 0_{nG×nφ}; Nux, Nuw],  and  Rφ := [Nvx, Nvw; 0_{nφ×nG}, I_{nφ}].

If there exists a matrix P ∈ S^{nG}_{++} and a vector λ ∈ R^{nφ} with λ ≥ 0 such that Λ = diag(λ) satisfies

RV^T [AG^T P AG − P, AG^T P BG; BG^T P AG, BG^T P BG] RV
  + Rφ^T [−2AφBφΛ, (Aφ + Bφ)Λ; (Aφ + Bφ)Λ, −2Λ] Rφ < 0,   (13a)

[hi^2, Hi^T; Hi, P] ≥ 0,  i = 1, ⋯, nX,   (13b)

then: (i) the feedback system consisting of G and π is locally asymptotically stable around x*, and (ii) the set E(P), defined by (1), is an inner-approximation to the ROA.
Proof: The proof can be found in [8, Theorem 1].

The Lyapunov condition (13a) is convex in P and λ if the weight matrix N is given, and thus we can efficiently compute the ROA inner-estimates. However, this condition is computationally intractable for NN controller synthesis, as it is nonconvex if we search over N, P, and λ simultaneously.

IV. CONVEX STABILITY AND SAFETY CONDITIONS

In [11], [12], αφ is set to zero to formulate convex constraints. However, this restriction is too coarse for stability analysis of NN controlled systems. Instead, we perform a loop transformation as shown in Fig. 4 to convexify the stability condition without having restrictions on αφ and βφ.

Fig. 4: Loop transformation.

Matching (20) and (21) with (14a) we get

Ñ = [Nux + C2(I − C4)^{−1}Nvx, C1 + C2(I − C4)^{−1}C3; (I − C4)^{−1}Nvx, (I − C4)^{−1}C3].

Thus, Ñ is a function of N denoted concisely as Ñ = f(N). It is important to note that Ñ depends on N both directly, and also indirectly through its dependence on (Aφ, Bφ).
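The dependence on (Aφ, Bφ) runs through the interval bound propagation of Section III-C. A minimal sketch of one layer of that propagation in pure Python, with illustrative weights and state bounds that are not from the paper's examples:

```python
import math

# One layer of interval propagation, Section III-C:
# v_hi_i = y^T c + sum_j |y_j r_j|,  v_lo_i = y^T c - sum_j |y_j r_j|,
# where y^T is a row of W^1. Weights and state bounds are illustrative.
W1 = [[0.8, -0.5],
      [0.3, 0.9],
      [-0.6, 0.4]]
x_lo, x_hi = [-1.0, -0.5], [1.0, 0.5]       # hypercube bound on the state

c = [(lo + hi) / 2 for lo, hi in zip(x_lo, x_hi)]   # center of [x_lo, x_hi]
r = [(hi - lo) / 2 for lo, hi in zip(x_lo, x_hi)]   # half-widths

v_lo, v_hi = [], []
for y in W1:                                 # y^T is the i-th row of W^1
    mid = sum(yj * cj for yj, cj in zip(y, c))
    rad = sum(abs(yj * rj) for yj, rj in zip(y, r))
    v_lo.append(mid - rad)
    v_hi.append(mid + rad)

# tanh is monotonically non-decreasing, so the layer outputs are bounded by
w_lo = [math.tanh(v) for v in v_lo]
w_hi = [math.tanh(v) for v in v_hi]
# Repeating this with [w_lo, w_hi] as the next layer's input bounds
# propagates the intervals through all layers of the NN.
```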
Specifically, suppose both N and a hypercube state bound [x̲, x̄] are given. Then Ñ is constructed by: (i) propagating [x̲, x̄] through the NN to compute bounds [v̲, v̄] on the activation inputs, (ii) computing local sector bounds (Aφ, Bφ) consistent with [v̲, v̄], and (iii) performing the steps in this
2 section to compute Ñ from (N, Aφ , Bφ ).
<latexit sha1_base64="w95mmOlO4j8agYgYwLtj9DUu3Ro=">AAAB83icbVBNS8NAEJ3Ur1q/qh69LBbBU0lE0GPRi8cKthaaUDabTbt0kyy7E6GE/g0vHhTx6p/x5r9x2+agrQ8GHu/NMDMvVFIYdN1vp7K2vrG5Vd2u7ezu7R/UD4+6Jss14x2WyUz3Qmq4FCnvoEDJe0pzmoSSP4bj25n/+MS1EVn6gBPFg4QOUxELRtFKvo9CRrzw1UhMB/WG23TnIKvEK0kDSrQH9S8/ylie8BSZpMb0PVdhUFCNgkk+rfm54YqyMR3yvqUpTbgJivnNU3JmlYjEmbaVIpmrvycKmhgzSULbmVAcmWVvJv7n9XOMr4NCpCpHnrLFojiXBDMyC4BEQnOGcmIJZVrYWwkbUU0Z2phqNgRv+eVV0r1oem7Tu79stG7KOKpwAqdwDh5cQQvuoA0dYKDgGV7hzcmdF+fd+Vi0Vpxy5hj+wPn8AXqtkfU=</latexit>

B. Stability condition after loop transformation


Fig. 4: Loop transformation. If φ is in the sector [αφ , βφ ],
then φ̃ is in the sector [−1nφ ×1 , 1nφ ×1 ]. Similar to the original Lyapunov condition (13a), the new
Lyapunov condition for the feedback system of G in (2) and
NN in (14) can be written as
A. Loop transformation h >
A P AG −P A>
i
G P BG
R̃V> G > Λ 0
 
> > R̃V + R̃ φ 0 −Λ R̃φ < 0, (22)
Through loop transformation, we obtain a new represen- BG P AG BG P BG
   
tation of the NN controller, which is equivalent to (10), InG 0 Ñ Ñ
where R̃V = , and R̃φ = 0vx I vz . (23)
    Ñux Ñuz nφ
u(k) x(k)
= Ñ , (14a)
vφ (k) zφ (k) Lemma 2: Consider the feedback system of G in (2) and
zφ (k) = φ̃(vφ (k)), (14b) NN in (10) with the state constraint set X. If there exist a
matrix P ∈ Sn++ G
, and vector λ ∈ Rnφ with λ ≥ 0 such
where Ñ and φ̃ are defined in Fig. 4. Here, we also partition that (22) (where Ñ = f (N )) and (13b) hold, then: (i) the
Ñ compatibly with the inputs (x, zφ ) and outputs (u, vφ ) feedback system consisting of G in (2) and NN in (10) is

Ñux Ñuz
 locally asymptotically stable around x∗ , and (ii) the set E(P )
Ñ = . (15) is a ROA inner-approximation for it.
Ñvx Ñvz
Proof: It follows from the assumption that (22) and
The loop transformation normalizes the nonlinearity φ̃ to (13b) hold that the feedback system of G in (2) and NN in
lie in the sector [−1nφ ×1 , 1nφ ×1 ]. As a result, φ̃ satisfies the (14) is locally asymptotically stable around x∗ , and E(P )
sector QC for any Λ = diag(λ) with λ ≥ 0: is its ROA inner-approximation. Since the representations
 >    (10) and (14) of NN are equivalent, the feedback system
vφ Λ 0 vφ
of G in (2) and NN in (10) is identical to the feedback
z 0 −Λ z ≥ 0, ∀vφ ∈ [v, v]. (16)
φ φ
system of G in (2) and NN in (14). As a result, the feedback
The input to N is transformed by the following equation: system consisting of G in (2) and NN in (10) is locally
Bφ − Aφ Aφ + B φ asymptotically stable around x∗ , and the set E(P ) is a ROA
wφ (k) = zφ (k) + vφ (k). (17) inner-approximation for it.
2 2
The new Lyapunov condition (22) is convex in P and Λ
The transformed matrix Ñ can be computed by combining using Ñ = f (N ), where N is given. To incorporate the
this relation with (10a). Substituting (17) into (10a) we obtain stability condition in the IL process, we will proceed by
treating Ñ ∈ R(nu +nφ )×(nG +nφ ) as a decision variable along
u(k) = Nux x(k) + C1 zφ (k) + C2 vφ (k), (18) with P and Λ, and try to derive a stability condition that is
vφ (k) = Nvx x(k) + C3 zφ (k) + C4 vφ (k), (19) convex in (P, Λ, Ñ ). Substitute (23) into (22) to obtain
B φ − Aφ Aφ + Bφ  >     
where C1 = Nuw , C2 = Nuw , ?
P 0 AG + BG Ñux BG Ñuz

P 0
< 0.
2 2 0 Λ Ñvx Ñvz 0 Λ
B φ − Aφ Aφ + B φ
C3 = Nvw , C4 = Nvw .
2 2 Applying Schur complements yields the equivalent condition
The expression for vφ (k) can be solved from (19):
A> > > >
 
P 0 G + Ñux BG Ñvx
> > >
vφ (k) = (I − C4 )−1 Nvx x(k) + (I − C4 )−1 C3 zφ (k). (20) 
 0 Λ Ñuz BG Ñvz  > 0, (24)
AG + BG Ñux BG Ñuz P −1 0 
Substituting (20) into (18) yields Ñvx Ñvz 0 Λ−1
u(k) = (Nux + C2 (I − C4 )−1 Nvx )x(k)
and P > 0, Λ > 0. Now (24) is linear in Ñ , but still
+ (C1 + C2 (I − C4 )−1 C3 )zφ (k). (21) nonconvex in P and Λ. Multiplying (24) on the left and
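The equivalence produced by the Schur complement step can be spot-checked numerically. The sketch below uses small hypothetical matrices (illustrative values only, not the plants of Section VI) and confirms that the pre-Schur condition and the block matrix of the form (24) certify the same thing:

```python
import numpy as np

# Hypothetical data: a Schur-stable plant and small transformed-NN blocks.
A_G = 0.5 * np.eye(2)
B_G = np.array([[0.1], [0.1]])
N_ux = np.array([[0.1, 0.1]])              # blocks of the transformed matrix Ñ
N_uz = np.array([[0.05, 0.05]])
N_vx = 0.1 * np.eye(2)
N_vz = 0.05 * np.eye(2)
P, Lam = np.eye(2), np.eye(2)              # Lyapunov matrix, sector multiplier

S = np.block([[P, np.zeros((2, 2))], [np.zeros((2, 2)), Lam]])
M = np.block([[A_G + B_G @ N_ux, B_G @ N_uz], [N_vx, N_vz]])

# Pre-Schur form: M^T diag(P, Lam) M - diag(P, Lam) < 0
pre = M.T @ S @ M - S
# Post-Schur form (24): [[diag(P, Lam), M^T], [M, diag(P^-1, Lam^-1)]] > 0
big = np.block([[S, M.T], [M, np.linalg.inv(S)]])

neg_def = np.max(np.linalg.eigvalsh(pre)) < 0
pos_def = np.min(np.linalg.eigvalsh(big)) > 0
assert neg_def and pos_def                 # both certificates agree here
```

For this contractive example both checks pass; scaling `M` up until its largest singular value exceeds one makes both fail together, which is exactly the equivalence the Schur complement argument asserts.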
Multiplying (24) on the left and right by diag(P⁻¹, Λ⁻¹, I_{n_G}, I_{n_φ}), we obtain

\[
\begin{bmatrix} Q_1 & 0 & Q_1 A_G^\top + L_1^\top B_G^\top & L_3^\top \\ 0 & Q_2 & L_2^\top B_G^\top & L_4^\top \\ A_G Q_1 + B_G L_1 & B_G L_2 & Q_1 & 0 \\ L_3 & L_4 & 0 & Q_2 \end{bmatrix} > 0, \tag{25}
\]

where Q₁ = P⁻¹ > 0, Q₂ = Λ⁻¹ > 0, L₁ = Ñ_{ux}Q₁, L₂ = Ñ_{uz}Q₂, L₃ = Ñ_{vx}Q₁, and L₄ = Ñ_{vz}Q₂.

The stability condition (25) is convex in the decision variables (Q₁, Q₂, L₁, …, L₄), where Q₁ ∈ S^{n_G}_{++}, Q₂ ∈ S^{n_φ}_{++} with Q₂ a diagonal matrix, L₁ ∈ R^{n_u×n_G}, L₂ ∈ R^{n_u×n_φ}, L₃ ∈ R^{n_φ×n_G}, and L₄ ∈ R^{n_φ×n_φ}. Variables (P, Λ, Ñ) that satisfy the Lyapunov condition (22) can be recovered from the computed (Q₁, Q₂, L₁, …, L₄) through the following equations: P = Q₁⁻¹, Λ = Q₂⁻¹, and

\[
\tilde{N} = L Q^{-1}, \tag{26}
\]

where

\[
Q := \begin{bmatrix} Q_1 & 0 \\ 0 & Q_2 \end{bmatrix}, \quad L := \begin{bmatrix} L_1 & L_2 \\ L_3 & L_4 \end{bmatrix}. \tag{27}
\]

Thus, the convex stability condition (25) allows us to search over P, Λ, and Ñ simultaneously.

Moreover, to enforce the safety condition (x(k) ∈ X for all k ≥ 0) of the system, convex constraints on Q₁ are imposed:

\[
H_i^\top Q_1 H_i \le h_i^2, \quad i = 1, \dots, n_X, \tag{28}
\]

which is derived directly from (13b) by Schur complements and using Q₁ = P⁻¹. Again, this ensures E(Q₁⁻¹) ⊆ X. Denote the LMIs (25), (28) together with Q₁ > 0 and Q₂ > 0 as LMI(Q, L) > 0; this condition will later be incorporated in the IL process to learn robust NN controllers.

Remark 2: Note that model uncertainties are not considered in this paper. They can be incorporated in (13a) or (22) using integral quadratic constraints (IQCs) as in [8]. However, to derive convex conditions like (25), only limited types of uncertainties may be incorporated. This in turn will allow for Zames-Falb IQCs to refine the description of activation functions by capturing their slope restrictions.

V. SAFE IMITATION LEARNING ALGORITHM

Given state and control data pairs from the expert demonstrations, we use a loss function L(N) to train NN controllers with weights N to match the data. Common choices of the loss function include mean squared error, absolute error, and cross-entropy. In general, L(N) is non-convex in N. We propose the following optimization to solve the safe IL problem:

\[
\min_{N, Q, L} \; \eta_1 L(N) - \eta_2 \log\det(Q_1) \tag{29a}
\]
\[
\text{s.t.} \quad \mathrm{LMI}(Q, L) > 0, \tag{29b}
\]
\[
\phantom{\text{s.t.} \quad} f(N) Q = L, \tag{29c}
\]

where Q and L are defined in (27). The optimization has separate objectives. The cost function (29a) combines the IL loss function with a term that attempts to increase the volume of E(Q₁⁻¹) (which is proportional to det(Q₁)). The parameters η₁, η₂ > 0 reflect the relative importance between imitation learning accuracy and the size of the robustness margin. The optimization has two sets of decision variables: N and (Q, L). The former is involved in mimicking the expert behaviour, and the latter are involved in the stability and safety constraints (29b). The two sets of variables are connected through the equality constraint (29c). Note that (29c) is equivalent to f(N) = LQ⁻¹, and the term on the right-hand side equals Ñ from (26). Therefore, (29c) essentially means that the first set of decision variables N, after being transformed by the nonlinear function f, satisfies the stability and safety constraints.

Similar to [11], we use the alternating direction method of multipliers (ADMM) algorithm to solve this constrained learning problem. We first define an augmented loss function

\[
L_a(N, Q, L, Y) = \eta_1 L(N) - \eta_2 \log\det(Q_1) + \mathrm{tr}\big(Y^\top (f(N)Q - L)\big) + \frac{\rho}{2} \left\| f(N)Q - L \right\|_F^2, \tag{30}
\]

where ‖·‖_F is the Frobenius norm, Y ∈ R^{(n_u+n_φ)×(n_G+n_φ)} is the Lagrange multiplier, and ρ > 0 is the regularization parameter, which typically affects the convergence rate of ADMM. The ADMM algorithm takes the following form:

1. N-update: N^{k+1} = arg min_N L_a(N, Q^k, L^k, Y^k).
2. (Q, L)-update: (Q, L)^{k+1} = arg min_{Q,L} L_a(N^{k+1}, Q, L, Y^k) s.t. LMI(Q, L) > 0.
3. Y-update: If ‖f(N^{k+1})Q^{k+1} − L^{k+1}‖_F ≤ σ, where σ > 0 is the stopping tolerance, then the algorithm has converged and we have found a solution to (29), so terminate. Otherwise, update Y and return to Step 1:
Y^{k+1} = Y^k + ρ(f(N^{k+1})Q^{k+1} − L^{k+1}).

The unconstrained optimization in Step 1 can be solved using a gradient-based algorithm. The optimization in Step 2 is convex and can be solved effectively using SDP solvers. The variable Y in Step 3 accumulates the deviation from the constraint (29c). The individual steps are well-behaved: gradient descent in Step 1 almost surely converges to a local optimum [22], and Step 2 always obtains a global optimum. However, the loss L(N) and the constraint (29c) are nonconvex, and hence the proposed ADMM algorithm does not guarantee convergence to a global optimum. Convergence properties of ADMM without convexity are still subject to ongoing research [23]. Nevertheless, any converged solution provides a NN controller with stability and safety guarantees.

In Step 2, Q and L introduce O(n_G + n_φ) and O((n_G + n_φ) × (n_u + n_φ)) decision variables, respectively. The complexity of primal/dual SDP solvers scales cubically with the number of decision variables. Hence, Step 2 will be computationally expensive if the number of activation functions n_φ is large.

VI. EXAMPLES

In the following examples, Step 1 in the ADMM algorithm is implemented in Tensorflow and solved by ADAM [24]. Step 2 is formulated using CVX and solved by MOSEK. The mean squared error is chosen as the loss function L(N). The code is available at https://github.com/heyinUCB/IQCbased_ImitationLearning
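The mechanics of the three ADMM updates can be illustrated on a deliberately tiny scalar analogue. Everything in the sketch below is hypothetical: the map `f`, the target loss, the gains, and the box constraints standing in for LMI(Q, L) > 0, with Step 2 done by brute-force grid search rather than an SDP:

```python
import numpy as np

eta1, eta2, rho = 1.0, 0.1, 10.0
f = lambda N: np.tanh(N)                 # toy stand-in for the map N -> Ñ
il_loss = lambda N: eta1 * (N - 3.0) ** 2  # toy stand-in for the IL loss L(N)

def La(N, Q, L, Y):
    # Scalar version of the augmented loss (30)
    r = f(N) * Q - L
    return il_loss(N) - eta2 * np.log(Q) + Y * r + 0.5 * rho * r ** 2

N, Q, L, Y = 0.0, 1.0, 0.0, 0.0
Qgrid = np.linspace(0.5, 2.0, 301)       # "LMI" stand-in: Q in [0.5, 2]
Lgrid = np.linspace(-1.0, 1.0, 301)      #                 |L| <= 1
QQ, LL = np.meshgrid(Qgrid, Lgrid)

for k in range(50):
    # Step 1: N-update by gradient descent on the augmented loss
    for _ in range(200):
        g = (La(N + 1e-6, Q, L, Y) - La(N - 1e-6, Q, L, Y)) / 2e-6
        N -= 0.01 * g
    # Step 2: (Q, L)-update by exhaustive search over the constraint box
    vals = La(N, QQ, LL, Y)
    i = np.unravel_index(np.argmin(vals), vals.shape)
    Q, L = float(QQ[i]), float(LL[i])
    # Step 3: stop if the coupling residual is small, else dual ascent on Y
    r = f(N) * Q - L
    if abs(r) <= 1e-3:
        break
    Y += rho * r

assert 0.5 <= Q <= 2.0 and abs(L) <= 1.0   # constraints hold by construction
assert abs(f(N) * Q - L) < 0.1             # coupling f(N)Q = L approximately met
```

The loop mirrors the structure above: the nonconvex variable is updated by unconstrained descent, the constrained convex block is solved exactly (here trivially, by search), and the multiplier Y accumulates the violation of the coupling constraint.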
A. Inverted pendulum

Consider an inverted pendulum with mass m = 0.15 kg, length l = 0.5 m, and friction coefficient μ = 0.5 Nms/rad. The discretized and linearized dynamics are:

\[
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 1 & \delta \\ \frac{g\delta}{l} & 1 - \frac{\mu\delta}{ml^2} \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} + \begin{bmatrix} 0 \\ \frac{\delta}{ml^2} \end{bmatrix} u(k),
\]

where the states x₁, x₂ represent the angular position (rad) and velocity (rad/s), u is the control input (Nm), and δ = 0.02 s is the sampling time. The state constraint set is X = [−2.5, 2.5] × [−6, 6]. To generate state and control data pairs for IL, we design an explicit model predictive controller (MPC) to serve as the expert. By fitting a NN controller to the explicit MPC controller, we can expedite the evaluation of controllers during run-time [13]–[15]. In this example, besides a NN controller, we will also provide its associated ROA inner-approximation that guarantees stability and safety. The NN controller is parameterized by a 2-layer, feedforward NN with n₁ = n₂ = 10 and tanh as the activation function for both layers. Take ρ = 1, η₁ = 100, and η₂ = 5. The ADMM algorithm is terminated after 16 iterations, and ‖f(N) − LQ⁻¹‖_F = 0.17.

In Fig. 5, the plot on the left shows the learned NN controller with a blue surface, and state and control data pairs from expert demonstrations with orange dots; the plot on the right shows the ROAs of the MPC controller and the NN controller with orange dots and a blue ellipsoid, respectively. Notice that the ROA of the NN controller is tightly contained in the state constraint set X (shown with a gray rectangle), which guarantees the safety of the system.

Fig. 5: Left: NN controller vs. expert data from demonstrations; Right: ROAs of MPC controller and NN controller, and state constraint set X of the inverted pendulum

B. Generic Transport Model

The Generic Transport Model (GTM) is a 5.5% scale commercial aircraft. Linearizing and discretizing the longitudinal dynamics given in [25] with sampling time δ = 0.02 s yields:

\[
\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.935 & 0.019 \\ -0.907 & 0.913 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} + \begin{bmatrix} -0.006 \\ -1.120 \end{bmatrix} u(k),
\]

where the states x₁, x₂ represent the angle of attack (rad) and pitch rate (rad/s), and the control u represents the elevator deflection (rad). Take the state constraint set as X = [−2, 2] × [−3, 3]. In this example, we design an LQR controller to produce expert data. The NN controller is again parameterized by a 2-layer, feedforward NN with n₁ = n₂ = 16 and tanh as the activation function for both layers. In this example, we show how the parameter η₂ affects the result. To do so, two experiments are carried out using two sets of parameters (ρ = 1, η₁ = 100, η₂ = 5) and (ρ = 1, η₁ = 100, η₂ = 20), meaning that we care more about the size of the ROA inner-approximation, and less about the IL accuracy, in the second experiment than in the first. In both experiments, the ADMM algorithm is terminated in 20 iterations.

The ROA inner-approximations of the NN controllers from the two experiments are shown in Figure 6. The one computed with η₂ = 5 is shown with a magenta ellipsoid, and the one computed with η₂ = 20 is shown with a blue ellipsoid. First, it is important to notice that both NN controllers' ROA inner-approximations are larger than that of the expert's LQR controller (shown with a dashed gray ellipsoid), thanks to the second term in the cost function (29a), which enhances the robustness of IL. Also, as expected, the ROA inner-approximation of the NN controller with η₂ = 20 is larger than that with η₂ = 5, since a larger η₂ leads to a larger ROA inner-approximation. However, the larger ROA inner-approximation comes at the cost of less accurate regression to the expert data. As shown in Figure 7, the mesh plot of the NN controller with η₂ = 20 (shown with a blue surface) is farther from the expert data (shown with orange stars) than that with η₂ = 5 (shown with a magenta surface).

Fig. 6: ROAs and state constraint set X of GTM

Fig. 7: NN controllers vs. expert data of GTM
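A closed-loop simulation of the GTM dynamics above can be sketched in a few lines. The feedback gain K below is a hypothetical stabilizing gain chosen only for illustration; it is not the LQR expert or the trained NN controller of this example:

```python
import numpy as np

# Discretized GTM longitudinal dynamics from the text (delta = 0.02 s).
A = np.array([[0.935, 0.019], [-0.907, 0.913]])
B = np.array([[-0.006], [-1.120]])
K = np.array([[-0.3, 0.4]])                # illustrative gain, u = K x

# Spectral radius of the closed loop must be < 1 for asymptotic stability.
spec_rad = np.max(np.abs(np.linalg.eigvals(A + B @ K)))
assert spec_rad < 1.0

x = np.array([[1.0], [1.5]])               # start inside X = [-2,2] x [-3,3]
for _ in range(500):
    x = A @ x + B @ (K @ x)
    # safety-style check: the trajectory never leaves the constraint set X
    assert abs(x[0, 0]) <= 2.0 and abs(x[1, 0]) <= 3.0

assert np.linalg.norm(x) < 1e-6            # state has decayed to the origin
```

This is the kind of trajectory-level check that the ellipsoidal certificate E(P) replaces: the LMI conditions guarantee containment in X for every initial condition in E(P), not just for sampled trajectories.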
C. Vehicle Lateral Control

Consider discretized vehicle lateral dynamics from [8, Eqn 34] with sampling time δ = 0.02 s and a constant curvature c ≡ 0. Let x = [e, ė, e_θ, ė_θ] denote the plant state, where e is the perpendicular distance to the lane edge (m), and e_θ is the angle between the tangent to the straight section of the road and the projection of the vehicle's longitudinal axis (rad). The control u is the steering angle of the front wheel (rad). The state constraint set is X = [−2, 2] × [−5, 5] × [−1, 1] × [−5, 5]. We design an explicit MPC law to serve as the expert, which is computed with an input constraint u(k) ∈ [−π/6, π/6] and a 5-step prediction horizon. The NN controller is parameterized by a 2-layer NN with n₁ = n₂ = 10 and tanh as the activation function for both layers. We perform two experiments using two sets of parameters (ρ = 1000, η₁ = 100, η₂ = 100) and (ρ = 1000, η₁ = 100, η₂ = 500). In both experiments, the ADMM algorithm is terminated in 20 iterations, and the achieved ‖f(N) − LQ⁻¹‖_F are 0.14 and 0.08, respectively.

As shown in Fig. 8, the ROAs of the NN controllers with η₂ = 100 and η₂ = 500 are contained in the state constraint set X (shown with maroon boxes), guaranteeing the safety of the system. As expected, the volume of the ROA with η₂ = 500 is larger than that with η₂ = 100; the achieved det(Q₁) for η₂ = 100 and η₂ = 500 are 543 and 663, respectively. However, the larger ROA comes at the cost of less accurate regression to the expert data. As shown in Fig. 9, the simulated state and control signals with η₂ = 100 are closer to those of the MPC law than those with η₂ = 500. It takes 0.011 and 0.012 seconds to run the simulations of the NN controllers for 500 time steps with η₂ = 100 and η₂ = 500 on a laptop with an Intel Core i5 processor, while it takes 0.38 seconds for the explicit MPC law, which demonstrates the run-time advantage of NNs over explicit MPC laws.

Fig. 8: ROAs of the explicit MPC law (red crosses), and the NN controllers with η₂ = 100 (cyan curves) and η₂ = 500 (blue curves)

Fig. 9: Simulated signals using the explicit MPC law (red curves), and the NN controllers with η₂ = 100 (cyan curves) and η₂ = 500 (blue curves)

VII. CONCLUSIONS

In this paper, we present an IL algorithm with stability and safety guarantees. First, we derive convex stability and safety conditions for NN controlled systems. Then, we incorporate these conditions in the IL process, which trades off between the IL accuracy and the size of the ROA. Finally, we propose an ADMM-based algorithm to solve the safe IL problem.

REFERENCES

[1] T. Osa, J. Pajarinen, G. Neumann, J. Bagnell, P. Abbeel, and J. Peters, "An algorithmic perspective on imitation learning," Foundations and Trends in Robotics, vol. 7, no. 1-2, pp. 1–179, 2018.
[2] D. A. Pomerleau, ALVINN: An Autonomous Land Vehicle in a Neural Network. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1989, pp. 305–313.
[3] W. Sun, A. Venkatraman, G. J. Gordon, B. Boots, and J. A. Bagnell, "Deeply AggreVaTeD: Differentiable imitation learning for sequential prediction," in International Conference on Machine Learning, 2017.
[4] H. Daumé, J. Langford, and D. Marcu, "Search-based structured prediction," Machine Learning, vol. 75, no. 3, pp. 297–325, 2009.
[5] M. Fazlyab, M. Morari, and G. J. Pappas, "Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite programming," arXiv preprint arXiv:1903.01287, 2019.
[6] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, "Efficient and accurate estimation of Lipschitz constants for deep neural networks," in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 11427–11438.
[7] H. Hu, M. Fazlyab, M. Morari, and G. J. Pappas, "Reach-SDP: Reachability analysis of closed-loop systems with neural network controllers via semidefinite programming," arXiv preprint arXiv:2004.07876, 2020.
[8] H. Yin, P. Seiler, and M. Arcak, "Stability analysis using quadratic constraints for systems with neural network controllers," arXiv preprint arXiv:2006.07579, 2020.
[9] P. Pauli, J. Köhler, J. Berberich, A. Koch, and F. Allgöwer, "Offset-free setpoint tracking using neural network controllers," arXiv preprint arXiv:2011.14006, 2020.
[10] M. Jin and J. Lavaei, "Stability-certified reinforcement learning: A control-theoretic perspective," arXiv preprint arXiv:1810.11505, 2018.
[11] P. Pauli, A. Koch, J. Berberich, and F. Allgöwer, "Training robust neural networks using Lipschitz bounds," arXiv preprint arXiv:2005.02929, 2020.
[12] M. Revay, R. Wang, and I. R. Manchester, "Convex sets of robust recurrent neural networks," arXiv preprint arXiv:2004.05290, 2020.
[13] X. Zhang, M. Bujarbaruah, and F. Borrelli, "Safe and near-optimal policy learning for model predictive control using primal-dual neural networks," in 2019 American Control Conference (ACC), 2019.
[14] B. Karg and S. Lucia, "Efficient representation and approximation of model predictive control laws via deep learning," IEEE Transactions on Cybernetics, vol. 50, no. 9, pp. 3866–3878, 2020.
[15] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, "Approximating explicit model predictive control using constrained neural networks," in 2018 Annual American Control Conference (ACC), 2018, pp. 1520–1527.
[16] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula,
and C. J. Tomlin, “A general safety framework for learning-based
control in uncertain robotic systems,” IEEE Transactions on Automatic
Control, vol. 64, no. 7, pp. 2737–2752, 2018.
[17] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames, “A control barrier
perspective on episodic learning via projection-to-state safety,” arXiv
preprint arXiv:2003.08028, 2020.
[18] L. Lindemann, H. Hu, A. Robey, H. Zhang, D. V. Dimarogonas, S. Tu,
and N. Matni, “Learning hybrid control barrier functions from data,”
arXiv preprint arXiv:2011.04112, 2020.
[19] P. L. Donti, M. Roderick, M. Fazlyab, and J. Z. Kolter, “Enforcing ro-
bust control guarantees within neural network policies,” arXiv preprint
arXiv:2011.08105, 2020.
[20] K. K. Kim, E. R. Patrón, and R. D. Braatz, “Standard representation
and unified stability analysis for dynamic artificial neural network
models,” Neural Networks, vol. 98, pp. 251–262, 2018.
[21] S. Gowal, K. Dvijotham, R. Stanforth, R. Bunel, C. Qin, J. Uesato,
R. Arandjelovic, T. Mann, and P. Kohli, “On the Effectiveness of
Interval Bound Propagation for Training Verifiably Robust Models,”
arXiv preprint arXiv:1810.12715, 2018.
[22] J. D. Lee, M. Simchowitz, M. I. Jordan, and B. Recht, “Gradient
descent only converges to minimizers,” in 29th Annual Conference on
Learning Theory, vol. 49. PMLR, 23–26 Jun 2016, pp. 1246–1257.
[23] Y. Wang, W. Yin, and J. Zeng, “Global convergence of ADMM in
nonconvex nonsmooth optimization,” Journal of Scientific Computing,
vol. 78, no. 1, pp. 29–63, 2019.
[24] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimiza-
tion,” arXiv preprint arXiv:1412.6980, 2014.
[25] A. Chakraborty, P. Seiler, and G. J. Balas, “Nonlinear region of at-
traction analysis for flight control verification and validation,” Control
Engineering Practice, vol. 19, no. 4, pp. 335–345, 2011.
