Deep Neural Network based Model Reference Adaptive Control (DMRAC)
arXiv:1909.08602v1 [cs.LG] 18 Sep 2019

Abstract— We present a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). Our architecture utilizes the power of deep neural network representations for modeling significant nonlinearities while marrying them with the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC. This makes DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties.

I. INTRODUCTION

Deep Neural Networks (DNNs) have lately shown tremendous empirical performance in many applications and fields such as computer vision, speech recognition, translation, natural language processing, robotics, autonomous driving, and many more [1]. Unlike their counterparts, such as shallow networks with Radial Basis Function features [2], [3], deep networks learn features by learning the weights of nonlinear compositions of weighted features arranged in a directed acyclic graph [4]. It is now clear that deep neural networks are outperforming other classical machine-learning techniques [5]. Leveraging these successes, there have been many exciting new claims regarding the control of complex dynamical systems in simulation using deep reinforcement learning [6]. However, Deep Reinforcement Learning (D-RL) methods typically do not guarantee stability or even boundedness of the system during the learning transient. Hence, despite significant simulation success, D-RL has seldom been used in safety-critical applications. D-RL methods often make the ergodicity assumption, requiring that there is a nonzero probability of the system states returning to the origin. In practice, such a condition is typically enforced by resetting the simulation when a failure occurs. Unfortunately, real-world systems do not have this reset option. Unlike in D-RL, much effort has been devoted in the field of adaptive control to ensuring that the system stays stable during learning.

Model Reference Adaptive Control (MRAC) is one such leading method for adaptive control that seeks to learn a high-performance control policy in the presence of significant model uncertainties [7]–[9]. The key idea in MRAC is to find an update law for a parametric model of the uncertainty that ensures that a candidate Lyapunov function is non-increasing. Many update laws have been proposed and analyzed, including but not limited to σ-modification [10], e-modification [11], and projection-based updates [9]. More modern laws extending the classical parametric setting, such as ℓ1-adaptive control [12] and concurrent learning [13], have also been studied.

A more recent approach introduced by the authors is Gaussian Process Model Reference Adaptive Control (GP-MRAC), which utilizes a GP as the model of the uncertainty. A GP is a Bayesian nonparametric adaptive element that can adapt both its weights and the structure of the model in response to the data. The authors and others have shown that GP-MRAC has strong long-term learning properties as well as high control performance [14], [15]. However, GPs are "shallow" machine learning models and do not utilize the power of learning complex features through compositions as deep networks do (see Section II-A). Hence, one wonders whether the power of deep learning could lead to even more powerful learning-based MRAC architectures than those utilizing GPs.

In this paper, we address this critical question: how can MRAC utilize deep networks while guaranteeing stability? Towards that goal, our contributions are as follows: a) we develop an MRAC architecture that utilizes DNNs as the adaptive element; b) we propose an algorithm for the online update of the weights of the DNN using a dual time-scale adaptation scheme, in which the weights of the outermost layer are adapted in real time while the weights of the inner layers are adapted using batch updates; c) we develop theory to guarantee Uniform Ultimate Boundedness (UUB) of the entire DMRAC controller; d) we demonstrate through simulation results that this architecture has desirable long-term learning properties.

We demonstrate how DNNs can be utilized in stable learning schemes for adaptive control of safety-critical systems. This provides an alternative to deep reinforcement learning for adaptive control applications requiring stability guarantees. Furthermore, the dual time-scale analysis scheme used here should be generalizable to other DNN-based learning architectures, including reinforcement learning.
*Supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, IL, USA. girishj2@illinois.edu, girishc@illinois.edu

II. BACKGROUND

A. Deep Networks and Feature spaces in machine learning

The key idea in machine learning is that a given function can be encoded with weighted combinations of a feature vector Φ ∈ F, s.t. Φ(x) = [φ_1(x), φ_2(x), ..., φ_k(x)]^T ∈ R^k, and a vector of 'ideal' weights W* ∈ R^{k×m} s.t. ‖y(x) − W*^T Φ(x)‖_∞ < ε(x). Instead of hand-picking features, or relying on polynomials, Fourier basis functions, the comparison-type features used in support vector machines [16], [17], or Gaussian Processes [18], DNNs utilize composite functions of features arranged in a directed acyclic graph, i.e. Φ(x) = φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...))), where the θ_i are the layer weights. The universal approximation property of DNNs with commonly used activation functions such as sigmoid, tanh, and ReLU was established by Hornik et al. [19] and has been shown empirically to hold by recent results [20]–[22]. Hornik et al. argued that a network with at least one hidden layer (also called a Single Hidden Layer (SHL) network) is a universal approximator. However, empirical results show that networks with more hidden layers exhibit better generalization capability in approximating complex functions. While the theoretical reasons behind the better generalization ability of DNNs are still being investigated [23], for the purpose of this paper we will assume that it is indeed true, and focus our efforts on designing a practical and stable control scheme using DNNs.
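For illustration only, the following minimal sketch shows a composite feature map of the form Φ(x) = φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, ...)) and an output formed as W^T Φ(x); the layer sizes, tanh activations, and random weights are assumptions made for the example, not settings used in this paper.

```python
import numpy as np

def feature_map(x, inner_weights, activation=np.tanh):
    """Composite feature Phi(x) = phi_n(theta_{n-1}, phi_{n-1}(...)):
    repeated application of a weighted nonlinear layer to the input."""
    z = x
    for theta in inner_weights:          # inner-layer weights theta_1, ..., theta_{n-1}
        z = activation(theta @ z)
    return z                             # Phi(x) in R^k

rng = np.random.default_rng(0)
n, k, m = 4, 10, 2                       # input, feature, and output dimensions (assumed)
inner_weights = [rng.standard_normal((8, n)),
                 rng.standard_normal((k, 8))]
W = rng.standard_normal((k, m))          # outer-layer weights

x = rng.standard_normal(n)
Phi = feature_map(x, inner_weights)
y_hat = W.T @ Phi                        # approximation y(x) ~ W^T Phi(x)
print(Phi.shape, y_hat.shape)            # (10,) (2,)
```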
B. Neuro-adaptive control

Neural networks in adaptive control have been studied for a very long time. The seminal paper by Lewis [24] utilized Taylor series approximations to demonstrate uniform ultimate boundedness with a single hidden layer neural network. SHL networks are nonlinear in the parameters; hence, the analysis previously introduced for linear-in-parameters radial basis function neural networks by Sanner and Slotine does not directly apply [2]. The back-propagation-type scheme with a non-increasing Lyapunov candidate as a constraint, introduced in Lewis' work, has been widely used in neuro-adaptive MRAC. Concurrent Learning MRAC (CL-MRAC) is a method for learning-based neuro-adaptive control developed by the authors to improve the learning properties and provide exponential tracking and weight error convergence guarantees. However, similar guarantees have not been available for SHL networks. There has been much work towards including deeper neural networks in control; however, strong guarantees like those in MRAC on closed-loop stability during online learning are not available. In this paper, we propose a dual time-scale learning approach which ensures such guarantees. Our approach should be generalizable to other applications of deep neural networks, including policy gradient Reinforcement Learning (RL) [25], which is very close to adaptive control in its formulation, and also to more recent work in RL for control [26].
C. Stochastic Gradient Descent and Batch Training

We consider a deep network model with parameters θ and consider the problem of optimizing a non-convex loss function L(Z, θ) with respect to θ. Let L(Z, θ) be defined as the average loss over M training sample data points,

L(Z, θ) = (1/M) Σ_{i=1}^{M} ℓ(Z_i, θ)    (1)

where M denotes the size of the training sample set. For each sample size M, the training data are in the form of an M-tuple Z^M = (Z_1, Z_2, ..., Z_M) of Z-valued random variables drawn according to some unknown distribution P ∈ P, where each Z_i = {x_i, y_i} is a labelled pair of input and target values. For each P, the expected loss can be computed as E_P(ℓ(Z, θ)). The empirical loss (1) is used as a proxy for the expected value of the loss with respect to the true data-generating distribution.

Optimization based on the Stochastic Gradient Descent (SGD) algorithm uses a stochastic approximation of the gradient of the loss L(Z, θ) obtained over a mini-batch of M training examples drawn from a buffer B. The resulting SGD weight update rule is

θ_{k+1} = θ_k − (η/M) Σ_{i=1}^{M} ∇_θ ℓ(Z_i, θ_k)    (2)

where η is the learning rate. Further details on generating i.i.d. samples for DNN learning and the training details of the network are provided in Section IV.
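As a concrete, purely illustrative instance of the empirical loss (1) and the mini-batch SGD step (2), the sketch below fits a linear model with a squared-error loss; the loss choice, learning rate, and data are assumptions, not the training setup used later in the paper.

```python
import numpy as np

def sgd_step(theta, minibatch, grad_loss, eta=0.01):
    """One SGD update: theta_{k+1} = theta_k - (eta/M) * sum_i grad loss(Z_i, theta_k)."""
    M = len(minibatch)
    g = sum(grad_loss(Z, theta) for Z in minibatch) / M
    return theta - eta * g

# Illustrative linear model with squared-error loss l(Z, theta) = 0.5*(y - theta.x)^2
def grad_loss(Z, theta):
    x, y = Z
    return -(y - theta @ x) * x

rng = np.random.default_rng(1)
theta = np.zeros(3)
buffer_B = [(x, x @ np.array([1.0, -2.0, 0.5])) for x in rng.standard_normal((200, 3))]

for _ in range(500):
    idx = rng.choice(len(buffer_B), size=32, replace=False)   # mini-batch of M = 32
    theta = sgd_step(theta, [buffer_B[i] for i in idx], grad_loss, eta=0.05)
print(theta)   # approaches [1.0, -2.0, 0.5]
```

In the DMRAC setting the mini-batches are drawn from the recorded buffer B rather than a fixed dataset, as discussed in Section IV.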
III. SYSTEM DESCRIPTION

This section discusses the formulation of model reference adaptive control (see e.g. [7]). We consider the following system with uncertainty ∆(x):

ẋ(t) = A x(t) + B(u(t) + ∆(x))    (3)

where x(t) ∈ R^n, t > 0 is the state vector, u(t) ∈ R^m, t > 0 is the control input, and A ∈ R^{n×n}, B ∈ R^{n×m} are known system matrices; we assume the pair (A, B) is controllable. The term ∆(x) : R^n → R^m is the matched system uncertainty and is Lipschitz continuous in x(t) ∈ D_x. Let D_x ⊂ R^n be a compact set, and let the control u(t) belong to a set of admissible control inputs of measurable and bounded functions, ensuring the existence and uniqueness of the solution to (3).

The reference model is assumed to be linear, and therefore the desired transient and steady-state performance is defined by selecting the system eigenvalues in the open left half plane. The desired closed-loop response of the reference system is given by

ẋ_rm(t) = A_rm x_rm(t) + B_rm r(t)    (4)

where x_rm(t) ∈ D_x ⊂ R^n, A_rm ∈ R^{n×n} is Hurwitz, and B_rm ∈ R^{n×r}. Furthermore, the command r(t) ∈ R^r denotes a bounded, piecewise continuous reference signal, and we assume the reference model (4) is bounded-input bounded-output (BIBO) stable [7].

The true uncertainty ∆(x) is unknown, but it is assumed to be continuous over a compact domain D_x ⊂ R^n. Deep Neural Networks (DNNs) have been widely used to represent a function when the basis vector is not known. Using DNNs, a nonlinearly parameterized network estimate of the uncertainty can be written as ∆̂(x) ≜ θ_n^T Φ(x), where θ_n ∈ R^{k×m} are the network weights for the final layer and Φ(x) = φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...))) is a k-dimensional feature vector which is a function of the inner-layer weights, activations, and inputs. The basis vector Φ(x) ∈ F : R^n → R^k is considered to be Lipschitz continuous to ensure the existence and uniqueness of the solution to (3).
A. Total Controller

The aim is to construct a feedback law u(t), t > 0, such that the state of the uncertain dynamical system (3) asymptotically tracks the state of the reference model (4) despite the presence of matched uncertainty.

A tracking control law consisting of a linear feedback term u_pd = Kx(t), a linear feed-forward term u_crm = K_r r(t), and an adaptive term ν_ad(t) forms the total controller

u = u_pd + u_crm − ν_ad    (5)

The baseline full-state feedback and feed-forward controller is designed to satisfy the matching conditions such that A_rm = A − BK and B_rm = BK_r. For the adaptive controller, ideally we want ν_ad(t) = ∆(x(t)). Since we do not have the true uncertainty information, we use a DNN estimate of the system uncertainties in the controller as ν_ad(t) = ∆̂(x(t)).
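A minimal sketch of assembling the total control law (5) for an illustrative double-integrator plant is given below; the system matrices, gains, and the placeholder adaptive estimate are assumed values, and the sign conventions follow the equations stated above.

```python
import numpy as np

# Illustrative open-loop system (assumed values, not the paper's quadrotor model)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

K  = np.array([[16.0, 4.0]])   # linear feedback gain (assumed)
Kr = np.array([[16.0]])        # feed-forward gain (assumed)

# Matching conditions as stated in the text: A_rm = A - B K, B_rm = B K_r
A_rm = A - B @ K
B_rm = B @ Kr
assert np.all(np.linalg.eigvals(A_rm).real < 0)   # A_rm is Hurwitz

def total_controller(x, r, nu_ad):
    """u = u_pd + u_crm - nu_ad, with u_pd = K x and u_crm = K_r r (eq. 5)."""
    return K @ x + Kr @ r - nu_ad

x = np.array([0.5, -0.1])
r = np.array([1.0])
nu_ad = np.array([0.2])       # placeholder for the DNN estimate of Delta(x)
print(total_controller(x, r, nu_ad))
```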
B. Deep Model Reference Generative Network (D-MRGeN) for uncertainty estimation

Unlike the traditional MRAC or SHL-MRAC weight update rule, where the weights are moved in the direction of diminishing tracking error, training a deep neural network is much more involved. Feed-forward networks like DNNs are trained in a supervised manner over a batch of i.i.d. data. Deep learning optimization is based on Stochastic Gradient Descent (SGD) or its variants. The SGD update rule relies on a stochastic approximation of the expected value of the gradient of the loss function over a training set or mini-batches.

To train a deep network to estimate the system uncertainties, unlike in MRAC, we need i.i.d. samples of labeled pairs of states and true uncertainties {x(t), ∆(x(t))}. Since we do not have access to the true uncertainties ∆(x), we use a generative network to generate estimates of ∆(x) to create the labeled targets for deep network training. For details of the generative network architecture in the adaptive controller, please see [15]. This generative network is derived by separating the DNN into the inner feature layers and the final output layer of the network. We also separate in time-scale the weight updates of these two parts of the DNN. The temporally separated weight update algorithm for the DNN approximating the system uncertainty is presented in more detail in the following sections.
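The following schematic sketch illustrates the temporal separation described above: the outer-layer weights W are adapted in real time, while (x, ∆′(x) = W^T Φ(x)) pairs are recorded into a buffer that the slower loop can later use as labelled targets for batch training of the inner layers. The buffer size, the record/sample interface, and the placeholder feature map are assumptions for illustration.

```python
import numpy as np
from collections import deque

class DMRGeNBuffer:
    """Stores (x, Delta'(x)) pairs generated by the fast real-time loop so that
    the slow loop can batch-train the DNN inner layers on them later."""
    def __init__(self, max_size=250):
        self.data = deque(maxlen=max_size)

    def record(self, x, W, Phi):
        target = W.T @ Phi(x)            # D-MRGeN estimate Delta'(x) = W^T Phi(x)
        self.data.append((x.copy(), target.copy()))

    def sample(self, batch_size, rng):
        idx = rng.choice(len(self.data), size=min(batch_size, len(self.data)),
                         replace=False)
        return [self.data[i] for i in idx]

# Usage sketch: the fast loop records targets; the slow loop would fit the inner layers.
rng = np.random.default_rng(2)
Phi = lambda x: np.tanh(x)               # placeholder feature map
W = rng.standard_normal((3, 1))          # outer-layer weights adapted in real time
buf = DMRGeNBuffer()
for _ in range(300):
    buf.record(rng.standard_normal(3), W, Phi)
batch = buf.sample(32, rng)              # labelled pairs for DNN batch training
print(len(buf.data), len(batch))         # 250 32
```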
C. Online Parameter Estimation law

The last layer of the DNN, with learned features from the inner layers, forms the Deep Model Reference Generative Network (D-MRGeN). We use the MRAC learning rule to update, pointwise in time, the weights of the D-MRGeN in the direction of achieving asymptotic tracking of the reference model by the actual system.

Since we use the D-MRGeN estimates to train the DNN model, we first study the admissibility and stability characteristics of the generative model estimate ∆′(x) in the controller (5). To achieve asymptotic convergence of the reference model tracking error to zero, we use the D-MRGeN estimate in the controller (5) as ν_ad = ∆′(x):

ν_ad(t) = W^T φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...)))    (6)

To differentiate the weights of the D-MRGeN from the last-layer weights of the DNN "θ_n", we denote the D-MRGeN weights as "W".

Assumption 1: Appealing to the universal approximation property of neural networks [27], we have that for every given basis Φ(x) ∈ F there exist unique ideal weights W* ∈ R^{k×m} and ε_1(x) ∈ R^m such that the following approximation holds:

∆(x) = W*^T Φ(x) + ε_1(x), ∀x(t) ∈ D_x ⊂ R^n    (7)

Fact 1: The network approximation error ε_1(x) is upper bounded, s.t. ε̄_1 = sup_{x∈D_x} ‖ε_1(x)‖, and can be made arbitrarily small given a sufficiently large number of basis functions.

The reference model tracking error is defined as e(t) = x_rm(t) − x(t). Using (3) and (4) and the controller of the form (5) with adaptation term ν_ad, the tracking error dynamics can be written as

ė(t) = ẋ_rm(t) − ẋ(t)    (8)

ė(t) = A_rm e(t) + W̃^T Φ(x) + ε_1(x)    (9)

where W̃ = W* − W is the parameter estimation error.

The estimate of the unknown true network parameters W* is calculated on-line using the weight update rule (10), correcting the weight estimates in the direction of minimizing the instantaneous tracking error e(t). The resulting update rule for the network weights estimating the total uncertainty in the system is

Ẇ = Γ proj(W, Φ(x)e(t)^T P),  W(0) = W_0    (10)

where Γ ∈ R^{k×k} is the learning rate and P ∈ R^{n×n} is a positive definite matrix. For a given Hurwitz A_rm, the matrix P ∈ R^{n×n} is the positive definite solution of the Lyapunov equation A_rm^T P + P A_rm + Q = 0 for a given Q > 0.

Assumption 2: For the uncertainty parameterized by the unknown true weights W* ∈ R^{k×m} and known nonlinear basis Φ(x), the ideal weight matrix is assumed to be upper bounded s.t. ‖W*‖ ≤ W_b. This is not a restrictive assumption.
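For intuition, a simple Euler-discretized sketch of the update law (10) is shown below. The norm-ball rescaling is a crude stand-in for the projection operator, and the trailing B factor in the regressor is our assumption so that the matrix dimensions close; the gains, step size, and bound W_b are likewise assumed.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# P solves A_rm^T P + P A_rm + Q = 0 for a chosen Q > 0 (values assumed)
A_rm = np.array([[0.0, 1.0], [-16.0, -4.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
P = solve_continuous_lyapunov(A_rm.T, -Q)

def project_norm_ball(W, Wb=10.0):
    """Crude stand-in for the projection operator: rescale W onto ||W|| <= Wb."""
    nrm = np.linalg.norm(W)
    return W if nrm <= Wb else W * (Wb / nrm)

def update_W(W, Phi_x, e, Gamma, dt=0.005):
    """Euler step of W_dot = Gamma * proj(W, Phi(x) e(t)^T P) (cf. eq. 10).
    B is included here only so that the shapes work out to k x m."""
    W_dot = Gamma @ np.outer(Phi_x, e @ P @ B)
    return project_norm_ball(W + dt * W_dot)

k = 10
Gamma = 0.5 * np.eye(k)                            # learning rate (assumed)
W = np.zeros((k, 1))
Phi_x = np.tanh(np.linspace(-1.0, 1.0, k))         # placeholder feature vector
e = np.array([0.1, -0.05])                         # current tracking error
W = update_W(W, Phi_x, e, Gamma)
print(W.shape)                                     # (10, 1)
```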
1) Lyapunov Analysis: The on-line adaptive identification law (10) guarantees the asymptotic convergence of the tracking error e(t) and the parameter error W̃(t) under the condition of persistency of excitation [7], [28] for structured uncertainty. Similar to the results by Lewis for SHL networks [29], we show here that, under the assumption of unstructured uncertainty represented by a deep neural network, the tracking error is uniformly ultimately bounded (UUB). We will prove the following theorem under a switching feature vector assumption.

Theorem 1: Consider the actual and reference plant models (3) and (4). If the weights parameterizing the total uncertainty in the system are updated according to the identification law (10), then the tracking error ‖e‖ and the error in the network weights ‖W̃‖ are bounded for all Φ ∈ F.

Proof: The feature vectors belong to a function class characterized by the inner-layer network weights θ_i, s.t. Φ ∈ F. We will prove Lyapunov stability under the assumption that the inner layers of the DNN present a feature which results in the worst possible approximation error compared to the network with the features before the switch. For the purpose of this proof, let Φ(x) denote the feature before the switch and Φ̄(x) the feature after the switch. We define the error ε_2(x) as

ε_2(x) = sup_{Φ̄∈F} ‖W^T Φ̄(x) − W^T Φ(x)‖    (11)

ẋ(t) = A_rm x(t) + B_rm r(t) + B(∆(x) − f_θ(x(t)))    (22)

Adding and subtracting the term ∆′(x) in the above expression and using the training and generalization error definitions, we can write

ẋ(t) = A_rm x(t) + B_rm r(t) + B(∆(x) − ∆′(x(t)) + ∆′(x(t)) − f_θ(x(t)))    (23)

The term (∆(x) − ∆′(x(t))) is the D-MRGeN training error and (∆′(x(t)) − f_θ(x(t))) is the generalization error of the DMRAC DNN network. For simplicity of the analysis we assume the training error is zero; this assumption is not very restrictive, since the training error can be made arbitrarily small.

P^m{|L(Z, θ) − E_P(ℓ(Z, θ))| ≥ ε}    (28)
= P^m{|Σ_{i=1}^{m} ℓ(Z_i, θ) − m E_P(ℓ(Z, θ))| ≥ mε}    (29)
≤ 2e^{−ε²m/2}    (30)

Hence

P^m{∀f_θ ∈ H, |L(Z, θ) − E_P(ℓ(Z, θ))| ≥ ε} ≤ 2|H|e^{−ε²m/2} = δ    (31)

We note that the total number of possible states that can be assigned to the N weights is 2^{kN}, since there are 2^k possibilities for each weight. Therefore H is finite and |H| ≤ 2^{kN}. The result follows immediately from simplifying Eq. (31).
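Solving (31) for the mini-batch size gives m ≥ (2/ε²) ln(2|H|/δ). The small sketch below evaluates this bound for assumed values of ε, δ, and the hypothesis-class size; the numbers are purely illustrative.

```python
import math

def min_samples(eps, delta, log2_H):
    """m >= (2/eps^2) * ln(2|H|/delta), with |H| = 2**log2_H (from eq. 31)."""
    ln_H = log2_H * math.log(2.0)
    return math.ceil((2.0 / eps**2) * (math.log(2.0 / delta) + ln_H))

# Assumed numbers purely for illustration: k = 16-bit weights, N = 1000 weights
print(min_samples(eps=0.1, delta=0.05, log2_H=16 * 1000))
```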
VI. SIMULATIONS

In this section, we evaluate the presented DMRAC adaptive controller using a 6-DOF quadrotor model for the reference trajectory tracking problem. The quadrotor model is completely described by twelve states: three positions and velocities in the North-East-Down reference frame, and three body angles and angular velocities. The full description of the dynamic behavior of a quadrotor is beyond the scope of this paper; interested readers can refer to [37] and the references therein.

The control law is designed to treat the moments and forces on the vehicle due to the unknown true inertia/mass of the vehicle, and the moments due to aerodynamic forces of the crosswind, as unmodeled uncertainty terms; these are captured online through the DNN adaptive element. The outer-loop control of the quadrotor is achieved through a Dynamic Inversion (DI) controller, and we use DMRAC for the inner-loop attitude control. A simple wind model with a boundary-layer effect is used to simulate the effect of crosswind on the vehicle.

A second-order reference model with natural frequency 4 rad/s and damping ratio 0.5 is used. Further stochasticity is added to the system by adding Gaussian white noise to the states with a variance of ω_n = 0.01.
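A minimal sketch of the second-order reference model and additive state noise described above is given below; the scalar (per-axis) form, the Euler discretization, and the square-wave command are our assumptions for illustration.

```python
import numpy as np

wn, zeta = 4.0, 0.5                 # natural frequency [rad/s], damping ratio
A_rm = np.array([[0.0, 1.0],
                 [-wn**2, -2.0 * zeta * wn]])
B_rm = np.array([[0.0],
                 [wn**2]])

dt, T = 0.05, 150.0                 # time step and simulation horizon [s]
steps = int(T / dt)
rng = np.random.default_rng(3)

x_rm = np.zeros(2)
history = np.zeros((steps, 2))
for k in range(steps):
    r = 1.0 if (k * dt) % 20 < 10 else -1.0               # illustrative square-wave command
    x_rm = x_rm + dt * (A_rm @ x_rm + B_rm @ np.array([r]))
    x_rm = x_rm + rng.normal(0.0, np.sqrt(0.01), size=2)   # Gaussian state noise, variance 0.01
    history[k] = x_rm
print(history.shape)                # (3000, 2)
```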
The simulation runs for 150 s and uses a time step of 0.05 s. The maximum number of points (p_max) to be stored in the buffer B is arbitrarily set to 250, and an SVD-maximization algorithm is used to cyclically update B when the budget is reached; for details refer to [35].

The controller is designed to track stable reference commands r(t). The goal of the experiment is to evaluate the tracking performance of the proposed DMRAC controller on the system with uncertainties over an unknown domain of operation. The learning rates for the D-MRGeN network and the DMRAC DNN are chosen to be Γ = 0.5 I_{6×6} and η = 0.01, respectively. The DNN is composed of 2 hidden layers with 200 and 100 neurons and tan-sigmoid activations, and an output layer with linear activation. We use Levenberg-Marquardt backpropagation [38] for updating the DNN weights over 100 epochs. The tolerance threshold for the kernel independence test is selected to be ζ_tol = 0.2 for updating the buffer B.

Figure 2a and Fig. 2b show the closed-loop system performance in tracking the reference signal for the DMRAC controller, and the learning retention when it is used as a feed-forward network on a similar trajectory (circular) with no learning. We demonstrate that the proposed DMRAC controller, under uncertainty and without domain information, is successful in producing the desired reference tracking. Since DMRAC, unlike traditional MRAC, uses a DNN for uncertainty estimation, it is capable of retaining past learning and can therefore be used in tasks with similar features without active online adaptation (Fig. 2b), whereas traditional MRAC is a "pointwise in time" learning algorithm and cannot generalize across tasks. The presented controller achieves tighter tracking with smaller tracking error in both outer- and inner-loop states, as shown in Fig. 2b and Fig. 3a, both with adaptation and as a feed-forward adaptive network without adaptation. Figure 3b demonstrates the DNN learning performance vs. epochs. The training, testing, and validation error over the data buffer for the DNN demonstrate the network's performance in learning a model of the system uncertainties and its generalization capabilities over unseen test data.
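For reference, the sketch below reproduces the forward pass of the DNN architecture described above (two hidden layers of 200 and 100 neurons with tan-sigmoid activations and a linear output layer); the input/output dimensions and weight values are assumptions, and the paper trains this network with Levenberg-Marquardt backpropagation [38] rather than the plain gradient steps sketched earlier.

```python
import numpy as np

rng = np.random.default_rng(4)
layer_sizes = [6, 200, 100, 6]           # input/output widths assumed from Gamma = 0.5*I_6
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]

def dnn_forward(x, weights):
    """Forward pass: tan-sigmoid hidden layers, linear output layer."""
    z = x
    for Wl in weights[:-1]:
        z = np.tanh(Wl @ z)              # hidden layers
    return weights[-1] @ z               # linear output layer

print(dnn_forward(rng.standard_normal(6), weights).shape)   # (6,)
```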
Fig. 2: DMRAC controller evaluation on the 6-DOF quadrotor dynamics model. (a) DMRAC vs. MRAC vs. GP-MRAC controllers on quadrotor trajectory tracking with active learning, and DMRAC as a frozen feed-forward network (circular trajectory) to test network generalization. (b) Closed-loop system response in roll rate φ(t) and pitch θ(t).

Fig. 3: (a) Position tracking performance of the DMRAC vs. MRAC vs. GP-MRAC controllers with active learning, and the learning-retention test over a circular trajectory for DMRAC. (b) DNN training, test, and validation performance.

VII. CONCLUSION

In this paper, we presented the DMRAC adaptive controller, which uses a model reference generative network architecture to address the issue of feature design under unstructured uncertainty. The proposed controller uses a DNN to model significant uncertainties without knowledge of the system's domain of operation. We provide theoretical proofs of the controller's generalization capability over unseen data points and of the boundedness properties of the tracking error. Numerical simulations with a 6-DOF quadrotor model demonstrate the controller's performance in achieving reference model tracking in the presence of significant matched uncertainties, and also its learning retention when used as a feed-forward adaptive network on similar but unseen new tasks. We therefore claim that DMRAC is a powerful architecture for high-performance control of nonlinear systems with robustness and long-term learning properties.
REFERENCES

[1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.
[2] R. M. Sanner and J.-J. E. Slotine. Gaussian networks for direct adaptive control. IEEE Transactions on Neural Networks, 3(6):837–863, 1992.
[3] Miao Liu, Girish Chowdhary, Bruno Castro da Silva, Shih-Yuan Liu, and Jonathan P. How. Gaussian processes for learning and control: A tutorial with examples. IEEE Control Systems Magazine, 38(5):53–86, 2018.
[4] Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, and Frank Seide. Feature learning in deep neural networks - studies on speech recognition tasks. arXiv e-prints, arXiv:1301.3605, Jan 2013.
[5] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
[6] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[7] P. Ioannou and J. Sun. Theory and design of robust direct and indirect adaptive-control schemes. International Journal of Control, 47(3):775–813, 1988.
[8] Gang Tao. Adaptive control design and analysis, volume 37. John Wiley & Sons, 2003.
[9] J.-B. Pomet and L. Praly. Adaptive nonlinear regulation: estimation from the Lyapunov equation. IEEE Transactions on Automatic Control, 37(6):729–740, 1992.
[10] Petros A. Ioannou and Jing Sun. Robust adaptive control, volume 1. PTR Prentice-Hall, Upper Saddle River, NJ, 1996.
[11] Anuradha M. Annaswamy and Kumpati S. Narendra. Adaptive control of simple time-varying systems. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 1014–1018, vol. 2, 1989.
[12] Naira Hovakimyan and Chengyu Cao. L1 Adaptive Control Theory: Guaranteed Robustness with Fast Adaptation. SIAM, 2010.
[13] Girish Chowdhary, Tansel Yucelen, Maximillian Mühlegg, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. International Journal of Adaptive Control and Signal Processing, 27(4):280–301, 2013.
[14] Girish Chowdhary, Hassan A. Kingravi, Jonathan P. How, and Patricio A. Vela. Bayesian nonparametric adaptive control using Gaussian processes. IEEE Transactions on Neural Networks and Learning Systems, 26(3):537–550, 2015.
[15] Girish Joshi and Girish Chowdhary. Adaptive control using Gaussian-process with model reference generative network. In 2018 IEEE Conference on Decision and Control (CDC), pages 237–243. IEEE, 2018.
[16] Bernhard Scholkopf, Ralf Herbrich, and Alex Smola. A generalized representer theorem. In David Helmbold and Bob Williamson, editors, Computational Learning Theory, volume 2111 of Lecture Notes in Computer Science, pages 416–426. Springer Berlin / Heidelberg, 2001.
[17] Bernhard Schölkopf and Alexander J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, 2002.
[18] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. MIT Press, 2006.
[19] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[20] Hrushikesh Mhaskar, Qianli Liao, and Tomaso Poggio. Learning functions: When is deep better than shallow. arXiv e-prints, arXiv:1603.00988, Mar 2016.
[21] Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, and Qianli Liao. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. International Journal of Automation and Computing, 14(5):503–519, Oct 2017.
[22] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv e-prints, arXiv:1611.03530, Nov 2016.
[23] Matus Telgarsky. Benefits of depth in neural networks. arXiv e-prints, arXiv:1602.04485, Feb 2016.
[24] F. L. Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1:205–228, 1999.
[25] Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams. Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12(2):19–22, 1992.
[26] Hamidreza Modares, Frank L. Lewis, and Mohammad-Bagher Naghibi-Sistani. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica, 50(1):193–202, 2014.
[27] Jooyoung Park and Irwin W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246–257, 1991.
[28] Karl J. Åström and Björn Wittenmark. Adaptive control. Courier Corporation, 2013.
[29] F. L. Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1(4):205–228, 1999.
[30] Gregory Larchev, Stefan Campbell, and John Kaneshige. Projection operator: A step toward certification of adaptive controllers. In AIAA Infotech@Aerospace 2010, page 3366, 2010.
[31] Kumpati S. Narendra and Anuradha M. Annaswamy. Stable adaptive systems. Courier Corporation, 2012.
[32] Thomas Kailath. Linear systems, volume 156. Prentice-Hall, Englewood Cliffs, NJ, 1980.
[33] Huan Xu and Shie Mannor. Robustness and generalization. Machine Learning, 86(3):391–423, 2012.
[34] Sara A. van de Geer and Peter Bühlmann. On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3:1360–1392, 2009.
[35] G. Chowdhary and E. Johnson. A singular value maximizing data recording algorithm for concurrent learning. In Proceedings of the 2011 American Control Conference, pages 3547–3552, June 2011.
[36] Daniel Jakubovitz, Raja Giryes, and Miguel R. D. Rodrigues. Generalization error in deep learning. arXiv e-prints, arXiv:1808.01174, Aug 2018.
[37] Girish Joshi and Radhakant Padhi. Robust control of quadrotors using neuro-adaptive control augmented with state estimation. In AIAA Guidance, Navigation, and Control Conference, page 1526, 2017.
[38] Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5(12):1, 2011.