
Deep Model Reference Adaptive Control

Girish Joshi and Girish Chowdhary

*Supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, IL, USA. girishj2@illinois.edu, girishc@illinois.edu

Abstract— We present a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). Our architecture utilizes the power of deep neural network representations for modeling significant nonlinearities while marrying it with the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC. This makes DMRAC a highly powerful architecture for high-performance control of nonlinear systems with long-term learning properties.

I. INTRODUCTION

Deep Neural Networks (DNNs) have lately shown tremendous empirical performance in many applications and fields, such as computer vision, speech recognition, translation, natural language processing, robotics, and autonomous driving [1]. Unlike their counterparts, such as shallow networks with Radial Basis Function features [2], [3], deep networks learn features by learning the weights of nonlinear compositions of weighted features arranged in a directed acyclic graph [4]. It is by now well established that deep neural networks outperform classical machine-learning techniques on many tasks [5]. Leveraging these successes, there have been many exciting new claims regarding the control of complex dynamical systems in simulation using deep reinforcement learning [6]. However, Deep Reinforcement Learning (D-RL) methods typically do not guarantee stability, or even boundedness of the system, during the learning transient. Hence, despite significant simulation success, D-RL has seldom been used in safety-critical applications. D-RL methods often make an ergodicity assumption, requiring that there is a nonzero probability of the system states returning to the origin. In practice, such a condition is typically enforced by resetting the simulation when a failure occurs; real-world systems, however, do not have this reset option. Unlike in D-RL, much effort has been devoted in the field of adaptive control to ensuring that the system stays stable during learning.

Model Reference Adaptive Control (MRAC) is one such leading method for adaptive control that seeks to learn a high-performance control policy in the presence of significant model uncertainties [7]–[9]. The key idea in MRAC is to find an update law for a parametric model of the uncertainty that ensures that a candidate Lyapunov function is non-increasing. Many update laws have been proposed and analyzed, including but not limited to σ-modification [10], e-modification [11], and projection-based updates [9]. More modern laws extending the classical parametric setting, such as ℓ1-adaptive control [12] and concurrent learning [13], have also been studied.

A more recent line of work by the authors introduced Gaussian Process Model Reference Adaptive Control (GP-MRAC), which utilizes a GP as the model of the uncertainty. A GP is a Bayesian nonparametric adaptive element that can adapt both its weights and the structure of the model in response to the data. The authors and others have shown that GP-MRAC has strong long-term learning properties as well as high control performance [14], [15]. However, GPs are "shallow" machine learning models that do not utilize the power of learning complex features through compositions as deep networks do (see Section II-A). Hence, one wonders whether the power of deep learning could lead to even more powerful learning-based MRAC architectures than those utilizing GPs.

In this paper, we address this critical question: how can MRAC utilize deep networks while guaranteeing stability? Towards that goal, our contributions are as follows: a) we develop an MRAC architecture that utilizes DNNs as the adaptive element; b) we propose an algorithm for the online update of the weights of the DNN by utilizing a dual time-scale adaptation scheme, in which the weights of the outermost layer are adapted in real time while the weights of the inner layers are adapted using batch updates; c) we develop theory to guarantee Uniform Ultimate Boundedness (UUB) of the entire DMRAC controller; and d) we demonstrate through simulation results that this architecture has desirable long-term learning properties.

In doing so, we demonstrate how DNNs can be utilized in stable learning schemes for adaptive control of safety-critical systems. This provides an alternative to deep reinforcement learning for adaptive control applications requiring stability guarantees. Furthermore, the dual time-scale analysis scheme used here should be generalizable to other DNN-based learning architectures, including reinforcement learning.
II. BACKGROUND

A. Deep Networks and Feature spaces in machine learning

The key idea in machine learning is that a given function can be encoded with weighted combinations of a feature vector Φ ∈ F, s.t. Φ(x) = [φ_1(x), φ_2(x), ..., φ_k(x)]^T ∈ R^k, and a matrix W* ∈ R^{k×m} of "ideal" weights, s.t. ‖y(x) − W*^T Φ(x)‖_∞ < ε(x). Instead of hand-picking features, or relying on polynomials, Fourier basis functions, comparison-type features used in support vector machines [16], [17], or Gaussian Processes [18], DNNs utilize composite functions of features arranged in a directed acyclic graph, i.e. Φ(x) = φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...))), where the θ_i are the layer weights. The universal approximation property of the DNN with commonly used feature functions such as sigmoid, tanh, and ReLU is proved in the work by Hornik et al. [19] and shown empirically to hold by recent results [20]–[22]. Hornik et al. argued that a network with at least one hidden layer (also called a Single Hidden Layer (SHL) network) is a universal approximator. However, empirical results show that networks with more hidden layers exhibit better generalization capability in approximating complex functions. While the theoretical reasons behind this better generalization ability of DNNs are still being investigated [23], for the purposes of this paper we will assume that it indeed holds, and focus our efforts on designing a practical and stable control scheme using DNNs.
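To make the composition concrete, the following minimal sketch (our illustration; the layer sizes, random weights, and tanh activation are placeholder assumptions, not quantities from the paper) evaluates a feature vector Φ(x) as nested affine maps followed by nonlinearities:

```python
import numpy as np

def phi(x, layer_weights, act=np.tanh):
    """Minimal composite feature map: Phi(x) = phi_n(theta_{n-1}, phi_{n-1}(...)).

    Each hidden layer applies an affine map (bias appended to the input)
    followed by a nonlinearity; the last hidden output is Phi(x)."""
    a = x
    for theta in layer_weights:              # theta: (in_dim + 1, out_dim)
        a = act(np.append(a, 1.0) @ theta)
    return a                                 # Phi(x) in R^k

# Example: two hidden layers mapping R^2 -> R^8 -> R^5 features
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 8)), rng.standard_normal((9, 5))]
features = phi(np.array([0.1, -0.4]), weights)   # k = 5 dimensional Phi(x)
```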
B. Neuro-adaptive control

Neural networks in adaptive control have been studied for a very long time. The seminal paper by Lewis [24] utilized Taylor series approximations to demonstrate uniform ultimate boundedness with a single hidden layer neural network. SHL networks are nonlinear in the parameters; hence, the analysis previously introduced by Sanner and Slotine for linear-in-parameters radial basis function neural networks does not directly apply [2]. The back-propagation-type scheme with a non-increasing Lyapunov candidate as a constraint, introduced in Lewis' work, has been widely used in neuro-adaptive MRAC. Concurrent Learning MRAC (CL-MRAC) is a method for learning-based neuro-adaptive control developed by the authors to improve the learning properties and provide exponential tracking and weight-error convergence guarantees. However, similar guarantees have not been available for SHL networks. There has been much work towards including deeper neural networks in control; however, strong guarantees like those in MRAC on closed-loop stability during online learning are not available. In this paper, we propose a dual time-scale learning approach which ensures such guarantees. Our approach should be generalizable to other applications of deep neural networks, including policy-gradient Reinforcement Learning (RL) [25], which is very close to adaptive control in its formulation, and also to more recent work in RL for control [26].
C. Stochastic Gradient Descent and Batch Training

We consider a deep network model with parameters θ and the problem of optimizing a non-convex loss function L(Z, θ) with respect to θ. Let L(Z, θ) be defined as the average loss over M training sample data points,

L(Z, θ) = (1/M) Σ_{i=1}^{M} ℓ(Z_i, θ)    (1)

where M denotes the size of the training sample set. For each sample size M, the training data are in the form of an M-tuple Z^M = (Z_1, Z_2, ..., Z_M) of Z-valued random variables drawn according to some unknown distribution P ∈ 𝒫, where each Z_i = {x_i, y_i} is a labelled pair of input and target values. For each P, the expected loss can be computed as E_P(ℓ(Z, θ)). The empirical loss (1) is used as a proxy for the expected loss with respect to the true data-generating distribution.

Optimization based on the Stochastic Gradient Descent (SGD) algorithm uses a stochastic approximation of the gradient of the loss L(Z, θ) obtained over a mini-batch of M training examples drawn from a buffer B. The resulting SGD weight update rule is

θ_{k+1} = θ_k − (η/M) Σ_{i=1}^{M} ∇_θ ℓ(Z_i, θ_k)    (2)

where η is the learning rate. Further details on generating i.i.d. samples for DNN learning and on the training of the network are provided in Section IV.
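As a minimal illustration of update (2), the sketch below applies the mini-batch gradient step to a hypothetical scalar least-squares loss standing in for the DNN loss; the data and learning rate are arbitrary placeholders:

```python
import numpy as np

def sgd_step(theta, grad_fn, batch, eta):
    """One SGD step per Eq. (2): theta <- theta - (eta/M) * sum_i grad l(Z_i, theta).

    grad_fn(Z, theta) returns the gradient of the per-sample loss l(Z, theta)."""
    M = len(batch)
    g = sum(grad_fn(Z, theta) for Z in batch) / M   # mini-batch gradient estimate
    return theta - eta * g

# Example with a scalar least-squares loss l(Z, theta) = 0.5*(y - theta*x)^2
grad = lambda Z, th: -(Z[1] - th * Z[0]) * Z[0]     # d l / d theta
batch = [(1.0, 2.1), (2.0, 3.9), (0.5, 1.0)]        # mini-batch drawn from buffer B
theta = 0.0
for _ in range(200):
    theta = sgd_step(theta, grad, batch, eta=0.05)  # converges near theta ~ 2
```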
III. SYSTEM DESCRIPTION

This section discusses the formulation of model reference adaptive control (see e.g. [7]). We consider the following system with uncertainty ∆(x):

ẋ(t) = Ax(t) + B(u(t) + ∆(x))    (3)

where x(t) ∈ R^n, t > 0, is the state vector, u(t) ∈ R^m, t > 0, is the control input, and A ∈ R^{n×n}, B ∈ R^{n×m} are known system matrices; we assume the pair (A, B) is controllable. The term ∆(x) : R^n → R^m is the matched system uncertainty and is assumed to be Lipschitz continuous in x(t) ∈ D_x. Let D_x ⊂ R^n be a compact set, and let the control u(t) belong to a set of admissible control inputs of measurable and bounded functions, ensuring the existence and uniqueness of the solution to (3).

The reference model is assumed to be linear, so the desired transient and steady-state performance is defined by selecting the system eigenvalues in the open left half plane. The desired closed-loop response of the reference system is given by

ẋ_rm(t) = A_rm x_rm(t) + B_rm r(t)    (4)

where x_rm(t) ∈ D_x ⊂ R^n, A_rm ∈ R^{n×n} is Hurwitz, and B_rm ∈ R^{n×r}. Furthermore, the command r(t) ∈ R^r denotes a bounded, piecewise continuous reference signal, and we assume the reference model (4) is bounded-input bounded-output (BIBO) stable [7].

The true uncertainty ∆(x) is unknown, but it is assumed to be continuous over a compact domain D_x ⊂ R^n. Deep Neural Networks (DNNs) have been widely used to represent a function when the basis vector is not known. Using DNNs, a nonlinearly parameterized network estimate of the uncertainty can be written as ∆̂(x) ≜ θ_n^T Φ(x), where θ_n ∈ R^{k×m} are the network weights of the final layer and Φ(x) = φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...))) is a k-dimensional feature vector which is a function of the inner-layer weights, activations, and inputs. The basis vector Φ(x) ∈ F : R^n → R^k is assumed to be Lipschitz continuous to ensure the existence and uniqueness of the solution to (3).

A. Total Controller

The aim is to construct a feedback law u(t), t > 0, such that the state of the uncertain dynamical system (3) asymptotically tracks the state of the reference model (4) despite the presence of matched uncertainty.

A tracking control law consisting of a linear feedback term u_pd = Kx(t), a linear feed-forward term u_crm = K_r r(t), and an adaptive term ν_ad(t) forms the total controller

u = u_pd + u_crm − ν_ad    (5)

The baseline full-state feedback and feed-forward controller is designed to satisfy the matching conditions A_rm = A − BK and B_rm = BK_r. For the adaptive controller we ideally want ν_ad(t) = ∆(x(t)). Since we do not have the true uncertainty information, we use a DNN estimate of the system uncertainties in the controller, ν_ad(t) = ∆̂(x(t)).
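The control law (5) is a one-line computation once the gains and the adaptive element are available. The sketch below is our illustration; the sign convention is chosen so that the matching conditions A_rm = A − BK and B_rm = BK_r hold in closed loop:

```python
import numpy as np

def dmrac_control(x, r, K, Kr, W, phi):
    """Total controller of Eq. (5): u = u_pd + u_crm - nu_ad.

    Signs chosen so the closed loop matches A_rm = A - B K, B_rm = B Kr.
    W and phi together form the adaptive element nu_ad = W^T Phi(x)."""
    u_pd = -K @ x                 # linear state feedback
    u_crm = Kr @ r                # feed-forward of the reference command
    nu_ad = W.T @ phi(x)          # DNN / D-MRGeN estimate of Delta(x)
    return u_pd + u_crm - nu_ad
```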
B. Deep Model Reference Generative Network (D-MRGeN) for uncertainty estimation

Unlike the traditional MRAC or SHL-MRAC weight update rule, where the weights are moved in the direction of diminishing tracking error, training a deep neural network is much more involved. Feed-forward networks like DNNs are trained in a supervised manner over a batch of i.i.d. data. Deep learning optimization is based on Stochastic Gradient Descent (SGD) or its variants. The SGD update rule relies on a stochastic approximation of the expected value of the gradient of the loss function over a training set or mini-batches.

To train a deep network to estimate the system uncertainties, unlike in MRAC we need i.i.d. samples of labeled state/true-uncertainty pairs {x(t), ∆(x(t))}. Since we do not have access to the true uncertainties ∆(x), we use a generative network to generate estimates of ∆(x) to create the labeled targets for deep network training. For details of the generative network architecture in the adaptive controller, please see [15]. This generative network is derived by separating the DNN into the inner feature layers and the final output layer of the network. We also separate the weight updates of these two parts of the DNN in time scale. The temporally separated weight update algorithm for the DNN approximating the system uncertainty is presented in more detail in the following sections.
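A minimal sketch of this separation (our illustration, not the implementation from [15]): the inner layers act as a feature extractor Φ(x) that is retrained in batch, while the last-layer weights W are the fast, pointwise-updated D-MRGeN parameters:

```python
import numpy as np

class GenerativeNetwork:
    """Sketch of the D-MRGeN split: inner layers produce Phi(x); the last
    layer W is updated pointwise in time by the MRAC rule, while the inner
    weights are retrained in batch from buffered data."""

    def __init__(self, inner_weights, W):
        self.inner_weights = inner_weights   # feature-layer weights (slow, batch updates)
        self.W = W                           # outer-layer weights (fast, MRAC update)

    def features(self, x):                   # Phi(x): forward pass through inner layers
        a = x
        for theta in self.inner_weights:
            a = np.tanh(np.append(a, 1.0) @ theta)
        return a

    def estimate(self, x):                   # Delta'(x) = W^T Phi(x): the training target
        return self.W.T @ self.features(x)
```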
C. Online Parameter Estimation law

The last layer of the DNN, with learned features from the inner layers, forms the Deep Model Reference Generative Network (D-MRGeN). We use the MRAC learning rule to update, pointwise in time, the weights of the D-MRGeN in the direction of achieving asymptotic tracking of the reference model by the actual system.

Since we use the D-MRGeN estimates to train the DNN model, we first study the admissibility and stability characteristics of the generative model estimate ∆′(x) in the controller (5). To achieve asymptotic convergence of the reference model tracking error to zero, we use the D-MRGeN estimate in the controller (5) as ν_ad = ∆′(x):

ν_ad(t) = W^T φ_n(θ_{n−1}, φ_{n−1}(θ_{n−2}, φ_{n−2}(...)))    (6)

To differentiate the weights of the D-MRGeN from the last-layer weights of the DNN, θ_n, we denote the D-MRGeN weights by W.

Assumption 1: Appealing to the universal approximation property of neural networks [27], we have that for every given basis function Φ(x) ∈ F there exist unique ideal weights W* ∈ R^{k×m} and ε_1(x) ∈ R^m such that the following approximation holds:

∆(x) = W*^T Φ(x) + ε_1(x), ∀x(t) ∈ D_x ⊂ R^n    (7)

Fact 1: The network approximation error ε_1(x) is upper bounded, s.t. ε̄_1 = sup_{x∈D_x} ‖ε_1(x)‖, and can be made arbitrarily small given a sufficiently large number of basis functions.

The reference model tracking error is defined as e(t) = x_rm(t) − x(t). Using (3) and (4) and the controller of form (5) with adaptation term ν_ad, the tracking error dynamics can be written as

ė(t) = ẋ_rm(t) − ẋ(t)    (8)

ė(t) = A_rm e(t) + W̃^T Φ(x) + ε_1(x)    (9)

where W̃ = W* − W is the parameter error.

The estimate of the unknown true network parameters W* is calculated online using the weight update rule (10), correcting the weight estimates in the direction of minimizing the instantaneous tracking error e(t). The resulting update rule for the network weights estimating the total uncertainty in the system is

Ẇ = Γ proj(W, Φ(x)e(t)^T P), W(0) = W_0    (10)

where Γ ∈ R^{k×k} is the learning rate and P ∈ R^{n×n} is a positive definite matrix. For a given Hurwitz A_rm, the matrix P ∈ R^{n×n} is the positive definite solution of the Lyapunov equation A_rm^T P + P A_rm + Q = 0 for a given Q > 0.

Assumption 2: For the uncertainty parameterized by the unknown true weights W* ∈ R^{k×m} and a known nonlinear basis Φ(x), the ideal weight matrix is assumed to be upper bounded, s.t. ‖W*‖ ≤ W_b. This is not a restrictive assumption.

1) Lyapunov Analysis: The online adaptive identification law (10) guarantees asymptotic convergence of the tracking error e(t) and parameter error W̃(t) under the condition of persistency of excitation [7], [28] for structured uncertainty. Similar to the results by Lewis for SHL networks [29], we show here that, under the assumption of unstructured uncertainty represented by a deep neural network, the tracking error is uniformly ultimately bounded (UUB). We will prove the following theorem under a switching feature vector assumption.

Theorem 1: Consider the actual and reference plant models (3) and (4). If the weights parameterizing the total uncertainty in the system are updated according to the identification law (10), then the tracking error ‖e‖ and the error in network weights ‖W̃‖ are bounded for all Φ ∈ F.
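Before the proof, a discretized sketch of update law (10) is given below (our illustration). Two caveats: the update is post-multiplied here by B so that it has the k × m shape of W, a common form for matched uncertainty that we adopt as an assumption; and the smooth projection operator of [30] is replaced by a crude norm clip that merely preserves the boundedness it provides.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def lyapunov_P(A_rm, Q):
    """P > 0 solving A_rm^T P + P A_rm + Q = 0 for a Hurwitz A_rm."""
    return solve_continuous_lyapunov(A_rm.T, -Q)

def mrac_weight_step(W, phi_x, e, P, B, Gamma, dt, W_b=1e3):
    """Euler step of Eq. (10): Wdot = Gamma * proj(W, Phi(x) e(t)^T P).

    Post-multiplying by B (assumption, see lead-in) maps the n-dimensional
    tracking error into the m input channels so that Wdot is k x m."""
    W_dot = Gamma @ np.outer(phi_x, e) @ P @ B      # error-driven adaptation
    W = W + dt * W_dot
    n = np.linalg.norm(W)
    return W if n <= W_b else W * (W_b / n)         # projection surrogate: ||W|| <= W_b
```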
Proof: The feature vectors belong to a function class characterized by the inner-layer network weights θ_i, s.t. Φ ∈ F. We will prove Lyapunov stability under the assumption that the inner layers of the DNN present us with the feature that results in the worst possible approximation error compared with the network features before the switch.

For the purpose of this proof, let Φ(x) denote the feature before the switch and Φ̄(x) the feature after the switch. We define the error ε_2(x) as

ε_2(x) = sup_{Φ̄∈F} (W^T Φ̄(x) − W^T Φ(x))    (11)

Similar to Fact 1, we can upper bound the error ε_2(x) as ε̄_2 = sup_{x∈D_x} ‖ε_2(x)‖. By adding and subtracting the term W^T Φ̄(x), we can rewrite the error dynamics (9) with the switched basis as

ė(t) = A_rm e(t) + W*^T Φ(x) − W^T Φ(x) + W^T Φ̄(x) − W^T Φ̄(x) + ε_1(x)    (12)

From Assumption 1 we know that there exists a W* for every Φ ∈ F. Therefore, we can replace W*^T Φ(x) by W*^T Φ̄(x) and rewrite Eq. (12) as

ė(t) = A_rm e(t) + W̃^T Φ̄(x) + W^T (Φ̄(x) − Φ(x)) + ε_1(x)    (13)
For arbitrary switching, i.e. for any Φ̄(x) ∈ F, we can prove boundedness by considering the worst possible approximation error, and can therefore write

ė(t) = A_rm e(t) + W̃^T Φ̄(x) + ε_2(x) + ε_1(x)    (14)

Now let V(e, W̃) > 0 be a differentiable, positive definite, radially unbounded Lyapunov candidate function,

V(e, W̃) = e^T P e + (W̃^T Γ^{−1} W̃)/2    (15)

The time derivative of the Lyapunov function (15) along the trajectories of (14) can be evaluated as

V̇(e, W̃) = ė^T P e + e^T P ė − W̃^T Γ^{−1} Ẇ    (16)

Using (14) and (10) in (16), the time derivative of the Lyapunov function reduces to

V̇(e, W̃) = −e^T Q e + 2e^T P ε(x)    (17)

where ε(x) = ε_1(x) + ε_2(x) and ε̄ = ε̄_1 + ε̄_2.

Hence V̇(e, W̃) ≤ 0 outside a compact neighborhood of the origin e = 0, for some sufficiently large λ_min(Q):

‖e(t)‖ ≥ 2λ_max(P)ε̄ / λ_min(Q)    (18)

Using the BIBO assumption, x_rm(t) is bounded for a bounded reference signal r(t), and thereby x(t) remains bounded. Since V(e, W̃) is radially unbounded, the result holds for all x(0) ∈ D_x. Using the fact that the error in parameters W̃ is bounded through the projection operator [30], and further using Lyapunov theory and Barbalat's lemma [31], we can show that e(t) is uniformly ultimately bounded in the vicinity of the zero solution. ∎

From Theorem 1, (9), and systems theory [32], we can infer that as e(t) → 0, ∆′(x) → ∆(x) in a pointwise sense. Hence the D-MRGeN estimates y_τ = ∆′(x_τ) are admissible target values for training the DNN features over the data Z^M = {x_τ, y_τ}_{τ=1}^M.

The details of DNN training and the implementation details of the DMRAC controller are presented in the following section.

Fig. 1: DMRAC training and controller details

IV. ADAPTIVE CONTROL USING DEEP NETS (DMRAC)

The DNN architecture for MRAC is trained in two steps. We separate the DNN into two networks, as shown in Fig. 1: a faster-learning outer adaptive network and a slower deep feature network. DMRAC learns the underlying deep feature vector of the system uncertainty using locally exciting uncertainty estimates obtained from a generative network. Between successive updates of the inner-layer weights, the feature provided by the inner layers of the deep network is used as the fixed feature vector for the outer-layer adaptive network update and evaluation. The algorithm for DNN learning and the DMRAC controller is provided in Algorithm 1. Through this architecture, which mixes two time-scale learning, we fuse the benefits of DNN memory, through the retention of relevant, exciting features, with the robustness and boundedness guarantees in reference tracking. This key feature of the presented framework ensures robustness while guaranteeing long-term learning and memory in the adaptive network.

Also, as indicated in the controller architecture of Fig. 1, we can use contextual states c_i other than the system state x(t) to extract relevant features. These contextual states could be relevant model information not captured in the system states; for example, for an aircraft system, vehicle parameters like the pitot tube measurement, the angle of attack, engine thrust, and so on. These contextual states can help extract features which aid decision making in case of faults. The work on DMRAC with contextual states will be dealt with in follow-on work.
The DNN in the DMRAC controller is trained over a training dataset Z^M = {x_i, ∆′(x_i)}_{i=1}^M, where the ∆′(x_i) are D-MRGeN estimates of the uncertainty. The training dataset Z^M is randomly drawn from a larger data buffer B. Not every pair of data {x_i, ∆′(x_i)} from the D-MRGeN is added to the training buffer B. We qualify each input-target pair with a kernel independence test to ensure that we collect locally exciting, independent information which provides a sufficiently rich representation of the operating domain. Since the state-uncertainty data are the realization of a Markov process, such a method for qualifying data as sufficiently independent of previous data points is necessary. The details of the algorithm to qualify and add a data point to the buffer are provided in subsection IV-B.
A. Details of Deep Feature Training using D-MRGeN

This section provides the details of the DNN training over data samples observed over the n-dimensional input subspace x(t) ∈ X ⊂ R^n and the m-dimensional target subspace y ∈ Y ⊂ R^m. The sample set is denoted as Z, where Z ∈ X × Y.

We are interested in function approximation tasks for the DNN. The function f_θ is the learned approximation to the model uncertainty with parameters θ ∈ Θ, where Θ is the space of parameters, i.e. f_θ : R^n → R^m. We assume the training data buffer B has p_max training examples, such that the set Z^{p_max} = {Z_i | Z_i ∈ Z}_{i=1}^{p_max} = {(x_i, y_i) ∈ X × Y}_{i=1}^{p_max}. The samples are independently drawn from the buffer B over a probability distribution P. The hypothesis set, which consists of all possible functions f_θ, is denoted as H. A learning algorithm A (in our case SGD) is therefore a mapping A : Z^{p_max} → H.

The loss function, which measures the discrepancy between the true target y and the algorithm's estimated target function value f_θ, is denoted by L(y, f_θ(x)). Specific to the work presented in this paper, we use the ℓ2-norm between values, i.e. E_P(ℓ(y, f_θ(x))) = E_P(‖y_i − f_θ(x_i)‖_2), as the loss function for DNN training. The empirical loss (1) is used to approximate this loss function, since the distribution P is unknown to the learning algorithm. The weights are updated using SGD in the direction of the negative gradient of the loss function, as given in (2).
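For concreteness, the empirical ℓ2 risk used here can be sketched as follows (our illustration; `net_forward` is a hypothetical forward-pass function for f_θ):

```python
import numpy as np

def empirical_loss(net_forward, data):
    """Empirical form of loss (1) with the l2 per-sample loss used for DMRAC:
    L(Z, theta) = (1/M) * sum_i ||y_i - f_theta(x_i)||_2."""
    return np.mean([np.linalg.norm(y - net_forward(x)) for x, y in data])
```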
Unlike conventional DNN training, where the true target values y ∈ Y are available for every input x ∈ X, in DMRAC the true system uncertainties are not available as labeled targets for network training. We use part of the network itself (the last layer), with its weights updated pointwise in time according to the MRAC rule, as the generative model for the data. The D-MRGeN uncertainty estimates y = W^T Φ(x, θ_1, θ_2, ..., θ_{n−1}) = ∆′(x), along with the inputs x_i, make up the training dataset Z^{p_max} = {x_i, ∆′(x_i)}_{i=1}^{p_max}. Note that we use x_i and x(t) interchangeably as the discrete representation of the continuous state vector for DNN training.

The main purpose of the DNN in the adaptive network is to extract relevant features of the system uncertainties, which otherwise are very tedious to obtain without limits on the domain of operation. We also demonstrate empirically that the DNN features trained over past i.i.d. representative data retain the memory of past instances and can be used as a frozen feed-forward network over similar reference tracking tasks without loss of the guaranteed tracking performance.

B. Method for Recording Data using MRGeN for DNN Training

In statistical inference, one always assumes, implicitly or explicitly, that the training set Z^M = {x_i, y_i}_{i=1}^M is composed of M input-target tuples that are independently drawn from the buffer B over the same joint distribution P(x, y). The i.i.d. assumption on the data is required for robustness and consistency of the network training and for bounds on the generalization error [33], [34]. In classical generalization proofs, one such condition is that (1/p_max) X^T X → γ as p_max → ∞, where X denotes the design matrix with rows Φ_i^T. The i.i.d. assumption implies that this condition is fulfilled, and hence it is a sufficient, but not necessary, condition for consistency and error bounds in generative modeling.

The key capability brought about by DMRAC is relevant feature extraction from the data. Feature extraction in the DNN is achieved by using recorded data concurrently with current data. The recorded data include the state x_i, the feature vector Φ(x_i), and the associated D-MRGeN estimate of the uncertainty ∆′(x_i). For a given ζ_tol ∈ R^+, a simple way to select an instantaneous data point {x_i, ∆′(x_i)} for recording is to require it to satisfy the condition

γ_i = ‖Φ(x_i) − Φ_p‖^2 / ‖Φ(x_i)‖ ≥ ζ_tol    (19)

where the index p runs over the data points in the buffer B. The above method ensures that only those data points are selected for recording that are sufficiently different from all previously recorded data points in the buffer. Since the buffer B is of finite dimension, the data are stored in a cyclic manner. Once the number of data points reaches the buffer budget, a new data point is added only after one existing data point is removed, such that the singular value of the buffer is maximized. The singular value maximization approach for the training data buffer update is provided in [35].
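A sketch of the recording test (19) follows (our illustration); since the test must hold against every stored point Φ_p, we evaluate the minimum over the buffer, which is our interpretation of the index p:

```python
import numpy as np

def qualify(phi_x, stored_features, zeta_tol):
    """Kernel-independence style test of Eq. (19): record x_i only if
    gamma_i = min_p ||Phi(x_i) - Phi_p||^2 / ||Phi(x_i)|| >= zeta_tol."""
    if not stored_features:                     # empty buffer: always record
        return True
    gamma = min(np.linalg.norm(phi_x - phi_p) ** 2 for phi_p in stored_features)
    return gamma / np.linalg.norm(phi_x) >= zeta_tol
```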
V. SAMPLE COMPLEXITY AND STABILITY ANALYSIS FOR DMRAC

In this section, we present the sample complexity results, generalization error bounds, and stability guarantee proof for DMRAC. We show that the DMRAC controller is characterized by the memory of the features learned over previously observed training data. We further demonstrate in simulation that a trained DMRAC, used as a feed-forward network with frozen weights, can still produce bounded tracking performance on reference tracking tasks that are related to, but reasonably different from, those seen during network training. We ascribe this property of DMRAC to the very low generalization error bounds of the DNN. We will prove this property in two steps. First, we prove a bound on the generalization error of the DNN, using Lyapunov theory, under which we achieve asymptotic convergence of the tracking error. Then we show, information-theoretically, a lower bound on the number of independent samples we need to train over before we can claim that the DNN generalization error is below the level determined by the Lyapunov analysis.
Algorithm 1 DMRAC Controller Training
1: Input: Γ, η, ζ_tol, p_max
2: while new measurements are available do
3:   Update the D-MRGeN weights W using Eq. (10)
4:   Compute y_{τ+1} = Ŵ^T Φ(x_{τ+1})
5:   Given x_{τ+1}, compute γ_{τ+1} by Eq. (19)
6:   if γ_{τ+1} > ζ_tol then
7:     Update B: Z(:) = {x_{τ+1}, y_{τ+1}} and X: Φ(x_{τ+1})
8:     if |B| > p_max then
9:       Delete an element of B by SVD maximization [35]
10:    end if
11:  end if
12:  if |B| ≥ M then
13:    Sample a mini-batch of data Z^M ⊂ B
14:    Train the DNN over the mini-batch data using Eq. (2)
15:    Update the feature vector Φ for the D-MRGeN network
16:  end if
17: end while
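In code, one pass of Algorithm 1 can be sketched as below (our illustration). The environment object `env` and the helpers `svd_prune` (the SVD-maximizing buffer update of [35]) and `sgd_epoch` (a pass of mini-batch update (2) over the inner-layer weights) are hypothetical stand-ins, not a published API; `dmrac_control`, `mrac_weight_step`, and `qualify` are the sketches given earlier.

```python
def dmrac_training_loop(env, net, K, Kr, P, B_mat, Gamma, eta, zeta_tol, p_max, M, dt):
    """Sketch of Algorithm 1: fast MRAC update of W, slow batch update of features."""
    buf_x, buf_y, buf_phi = [], [], []                  # data buffer B and feature store
    while env.has_measurements():
        u = dmrac_control(env.x, env.r, K, Kr, net.W, net.features)
        x, r, e = env.step(u)                           # plant + reference model step
        phi_x = net.features(x)
        net.W = mrac_weight_step(net.W, phi_x, e, P, B_mat, Gamma, dt)  # Eq. (10)
        y = net.W.T @ phi_x                             # D-MRGeN target y_tau
        if qualify(phi_x, buf_phi, zeta_tol):           # independence test, Eq. (19)
            buf_x.append(x); buf_y.append(y); buf_phi.append(phi_x)
            if len(buf_x) > p_max:                      # cyclic buffer: SVD-max pruning [35]
                buf_x, buf_y, buf_phi = svd_prune(buf_x, buf_y, buf_phi)
        if len(buf_x) >= M:                             # batch update of inner layers
            sgd_epoch(net, buf_x, buf_y, eta)           # refreshes Phi for the D-MRGeN
    return net
```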
A. Stability Analysis

The generalization error of a machine learning model is defined as the difference between the empirical loss on the training set and the expected loss on the test set [36]. This measure represents the ability of the trained model to generalize from the learning data to new unseen data, and thereby to extrapolate from the training data to new test data. Hence the generalization error can be bounded as

‖∆̂(x) − f_θ(x)‖ ≤ ε    (20)

Using the DMRAC controller (as a frozen network) in (5) together with system (3), we can write the system dynamics as

ẋ(t) = Ax(t) + B(−Kx(t) + K_r r(t) − f_θ(x(t)) + ∆(x))    (21)

We can simplify the above equation as

ẋ(t) = A_rm x(t) + B_rm r(t) + B(∆(x) − f_θ(x(t)))    (22)

Adding and subtracting the term ∆′(x) in the above expression and using the training and generalization error definitions, we can write

ẋ(t) = A_rm x(t) + B_rm r(t) + B(∆(x) − ∆′(x(t)) + ∆′(x(t)) − f_θ(x(t)))    (23)

The term (∆(x) − ∆′(x(t))) is the D-MRGeN training error and (∆′(x(t)) − f_θ(x(t))) is the generalization error of the DMRAC DNN network. For simplicity of analysis, we assume the training error is zero; this assumption is not very restrictive, since the training error can be made arbitrarily small by tuning the network architecture and training epochs. The reference tracking error dynamics can then be written as

ė(t) = A_rm e(t) + ε    (24)

To analyze the asymptotic tracking performance of the error dynamics under the DMRAC controller, we define a Lyapunov candidate function V(e) = e^T P e, whose time derivative along the error dynamics (24) can be written as

V̇(e) = −e^T Q e + 2e^T P ε    (25)

where Q is the solution of the Lyapunov equation A_rm^T P + P A_rm = −Q. To satisfy the condition V̇(e) < 0, we obtain the following upper bound on the generalization error:

‖ε‖ < λ_max(Q)‖e‖ / λ_min(P)    (26)

The idea is that if the DNN produces a generalization error lower than the specified bound (26), then we can claim Lyapunov stability of the system under the DMRAC controller.

B. Sample Complexity of DMRAC

In this section, we study sample complexity results from computational learning theory and show that, when applied to a network learning real-valued functions, the number of training samples grows at least linearly with the number of tunable parameters needed to achieve a specified generalization error.

Theorem 2: Suppose a neural network with arbitrary activation functions and an output that takes values in [−1, 1]. Let H be the hypothesis class characterized by N weights, each weight represented using k bits. Then any squared error minimization (SEM) algorithm A over H, to achieve the generalization error (26), admits a sample complexity bounded as follows:

m_A(ε, δ) ≤ (2/ε^2)(kN ln 2 + ln(2/δ))    (27)

where N is the total number of tunable weights in the DNN.

Proof: Let H be a finite hypothesis class of functions mapping s.t. H : X → [−1, 1] ⊂ R^m, and let A be a SEM algorithm for H. Then by Hoeffding's inequality, for any fixed f_θ ∈ H, the following event holds with small probability δ:

P^m{|L(Z, θ) − E_P(ℓ(Z, θ))| ≥ ε}    (28)

= P^m{|Σ_{i=1}^m ℓ(Z_i, θ) − m E_P(ℓ(Z, θ))| ≥ mε}    (29)

≤ 2e^{−ε^2 m/2}    (30)

Hence, by the union bound,

P^m{∃f_θ ∈ H : |L(Z, θ) − E_P(ℓ(Z, θ))| ≥ ε} ≤ 2|H|e^{−ε^2 m/2} = δ    (31)

We note that the total number of possible states that can be assigned to the weights is (2^k)^N, since there are 2^k possibilities for each weight. Therefore H is finite, with |H| ≤ 2^{kN}. The result follows immediately by solving Eq. (31) for m. ∎
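Bound (27) is easy to evaluate numerically; the sketch below plugs in illustrative (hypothetical) values for the precision, confidence, and network size:

```python
import numpy as np

def sample_complexity(eps, delta, k_bits, n_weights):
    """Sample complexity bound of Eq. (27):
    m_A(eps, delta) <= (2 / eps^2) * (k * N * ln 2 + ln(2 / delta))."""
    return (2.0 / eps**2) * (k_bits * n_weights * np.log(2.0) + np.log(2.0 / delta))

# Illustrative numbers: 16-bit weights, N = 25,000 tunable parameters
m = sample_complexity(eps=0.1, delta=0.05, k_bits=16, n_weights=25_000)
# m ~ 5.5e7 samples; the bound grows linearly in the number of weights N
```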
VI. SIMULATIONS

In this section, we evaluate the presented DMRAC adaptive controller using a 6-DOF quadrotor model on a reference trajectory tracking problem. The quadrotor model is completely described by 12 states: three positions and velocities in the North-East-Down reference frame, and three body angles and angular velocities. The full description of the dynamic behavior of a quadrotor is beyond the scope of this paper; interested readers can refer to [37] and the references therein.

The control law is designed to treat the moments and forces on the vehicle due to the unknown true inertia/mass of the vehicle, and the moments due to the aerodynamic forces of the crosswind, as unmodeled uncertainty terms that are captured online through the DNN adaptive element. The outer-loop control of the quadrotor is achieved through a Dynamic Inversion (DI) controller, and we use DMRAC for the inner-loop attitude control. A simple wind model with a boundary layer effect is used to simulate the effect of crosswind on the vehicle.

A second-order reference model with natural frequency 4 rad/s and damping ratio 0.5 is used. Further stochasticity is added to the system through Gaussian white noise on the states with variance ω_n = 0.01. The simulation runs for 150 s with a time step of 0.05 s. The maximum number of points (p_max) to be stored in the buffer B is arbitrarily set to 250, and the SVD maximization algorithm is used to cyclically update B when the budget is reached; for details, refer to [35].

The controller is designed to track stable reference commands r(t). The goal of the experiment is to evaluate the tracking performance of the proposed DMRAC controller on a system with uncertainties over an unknown domain of operation. The learning rates for the D-MRGeN network and the DMRAC-DNN network are chosen to be Γ = 0.5 I_{6×6} and η = 0.01. The DNN is composed of 2 hidden layers with 200 and 100 neurons and tan-sigmoid activations, and an output layer with linear activation. We use Levenberg-Marquardt backpropagation [38] for updating the DNN weights over 100 epochs. The tolerance threshold for the kernel independence test is selected to be ζ_tol = 0.2 for updating the buffer B.
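A sketch of the network described above (two hidden layers of 200 and 100 tan-sigmoid units and a linear output layer) is given below; the input and output dimensions are our assumption for the quadrotor attitude loop, and Levenberg-Marquardt training [38] is not reproduced here, any optimizer over loss (1) can stand in for it:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_dnn(n_in, n_out, hidden=(200, 100)):
    """Layer weights (each with a bias row) for a 200/100 tan-sigmoid network."""
    dims = (n_in,) + hidden + (n_out,)
    return [0.1 * rng.standard_normal((a + 1, b)) for a, b in zip(dims, dims[1:])]

def forward(layers, x):
    a = x
    for theta in layers[:-1]:
        a = np.tanh(np.append(a, 1.0) @ theta)   # tan-sigmoid hidden activations
    return np.append(a, 1.0) @ layers[-1]        # linear output layer

layers = make_dnn(n_in=12, n_out=3)   # assumed: 12 quadrotor states -> 3 attitude channels
```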
Figures 2a and 2b show the closed-loop system performance in tracking the reference signal for the DMRAC controller, and the learning retention when it is used as a feed-forward network on a similar (circular) trajectory with no learning. We demonstrate that the proposed DMRAC controller, operating under uncertainty and without domain information, succeeds in producing the desired reference tracking. Since DMRAC, unlike traditional MRAC, uses a DNN for uncertainty estimation, it is capable of retaining past learning and can thereby be used on tasks with similar features without active online adaptation (Fig. 2b), whereas traditional MRAC is a "pointwise in time" learning algorithm and cannot generalize across tasks. The presented controller achieves tighter tracking, with smaller tracking error in both outer- and inner-loop states, as shown in Fig. 2b and Fig. 3a, both with adaptation and as a feed-forward adaptive network without adaptation. Figure 3b shows the DNN learning performance versus training epochs. The training, testing, and validation errors over the data buffer demonstrate the network's performance in learning a model of the system uncertainties and its generalization capability over unseen test data.

VII. CONCLUSION

In this paper, we presented the DMRAC adaptive controller, which uses a model reference generative network architecture to address the issue of feature design for unstructured uncertainty. The proposed controller uses a DNN to model significant uncertainties without knowledge of the system's domain of operation. We provide theoretical proofs of the controller's generalization capability over unseen data points and of the boundedness properties of the tracking error. Numerical simulations with a 6-DOF quadrotor model demonstrate the controller's performance in achieving reference model tracking in the presence of significant matched uncertainties, as well as its learning retention when used as a feed-forward adaptive network on similar but unseen new tasks. We thereby claim that DMRAC is a highly powerful architecture for high-performance control of nonlinear systems with robustness and long-term learning properties.
Fig. 2: DMRAC controller evaluation on the 6-DOF quadrotor dynamics model. (a) DMRAC vs MRAC vs GP-MRAC controllers on quadrotor trajectory tracking with active learning, and DMRAC as a frozen feed-forward network (circular trajectory) to test network generalization. (b) Closed-loop system response in roll rate φ(t) and pitch θ(t).

Fig. 3: (a) Position tracking performance of the DMRAC vs MRAC vs GP-MRAC controllers with active learning, and learning retention test over a circular trajectory for DMRAC. (b) DNN training, test, and validation performance.

REFERENCES

[1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
[2] R. M. Sanner and J.-J. E. Slotine. Gaussian networks for direct adaptive control. IEEE Transactions on Neural Networks, 3(6):837–863, 1992.
[3] Miao Liu, Girish Chowdhary, Bruno Castra da Silva, Shih-Yuan Liu, and Jonathan P. How. Gaussian processes for learning and control: A tutorial with examples. IEEE Control Systems Magazine, 38(5):53–86, 2018.
[4] Dong Yu, Michael L. Seltzer, Jinyu Li, Jui-Ting Huang, and Frank Seide. Feature learning in deep neural networks - studies on speech recognition tasks. arXiv e-prints, arXiv:1301.3605, Jan 2013.
[5] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29, 2012.
[6] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[7] P. Ioannou and J. Sun. Theory and design of robust direct and indirect adaptive-control schemes. International Journal of Control, 47(3):775–813, 1988.
[8] Gang Tao. Adaptive Control Design and Analysis, volume 37. John Wiley & Sons, 2003.
[9] J.-B. Pomet and L. Praly. Adaptive nonlinear regulation: estimation from the Lyapunov equation. IEEE Transactions on Automatic Control, 37(6):729–740, 1992.
[10] Petros A. Ioannou and Jing Sun. Robust Adaptive Control, volume 1. PTR Prentice-Hall, Upper Saddle River, NJ, 1996.
[11] Anuradha M. Annaswamy and Kumpati S. Narendra. Adaptive control of simple time-varying systems. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 1014–1018 vol. 2, December 1989.
[12] Naira Hovakimyan and Chengyu Cao. L1 Adaptive Control Theory: Guaranteed Robustness with Fast Adaptation. SIAM, 2010.
[13] Girish Chowdhary, Tansel Yucelen, Maximillian Mühlegg, and Eric N. Johnson. Concurrent learning adaptive control of linear systems with exponentially convergent bounds. International Journal of Adaptive Control and Signal Processing, 27(4):280–301, 2013.
[14] Girish Chowdhary, Hassan A. Kingravi, Jonathan P. How, and Patricio A. Vela. Bayesian nonparametric adaptive control using Gaussian processes. IEEE Transactions on Neural Networks and Learning Systems, 26(3):537–550, 2015.
[15] Girish Joshi and Girish Chowdhary. Adaptive control using Gaussian-process with model reference generative network. In 2018 IEEE Conference on Decision and Control (CDC), pages 237–243. IEEE, 2018.
[16] Bernhard Scholkopf, Ralf Herbrich, and Alex Smola. A generalized representer theorem. In David Helmbold and Bob Williamson, editors, Computational Learning Theory, volume 2111 of Lecture Notes in Computer Science, pages 416–426. Springer Berlin / Heidelberg, 2001.
[17] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[18] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[19] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359–366, 1989.
[20] Hrushikesh Mhaskar, Qianli Liao, and Tomaso Poggio. Learning functions: When is deep better than shallow. arXiv e-prints, arXiv:1603.00988, Mar 2016.
[21] Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, and Qianli Liao. Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review. International Journal of Automation and Computing, 14(5):503–519, Oct 2017.
[22] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv e-prints, arXiv:1611.03530, Nov 2016.
[23] Matus Telgarsky. Benefits of depth in neural networks. arXiv e-prints, arXiv:1602.04485, Feb 2016.
[24] F. L. Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1:205–228, 1999.
[25] Richard S. Sutton, Andrew G. Barto, and Ronald J. Williams. Reinforcement learning is direct adaptive optimal control. IEEE Control Systems Magazine, 12(2):19–22, 1992.
[26] Hamidreza Modares, Frank L. Lewis, and Mohammad-Bagher Naghibi-Sistani. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica, 50(1):193–202, 2014.
[27] Jooyoung Park and Irwin W. Sandberg. Universal approximation using radial-basis-function networks. Neural Computation, 3(2):246–257, 1991.
[28] Karl J. Åström and Björn Wittenmark. Adaptive Control. Courier Corporation, 2013.
[29] F. L. Lewis. Nonlinear network structures for feedback control. Asian Journal of Control, 1(4):205–228, 1999.
[30] Gregory Larchev, Stefan Campbell, and John Kaneshige. Projection operator: A step toward certification of adaptive controllers. In AIAA Infotech@Aerospace 2010, page 3366. 2010.
[31] Kumpati S. Narendra and Anuradha M. Annaswamy. Stable Adaptive Systems. Courier Corporation, 2012.
[32] Thomas Kailath. Linear Systems, volume 156. Prentice-Hall, Englewood Cliffs, NJ, 1980.
[33] Huan Xu and Shie Mannor. Robustness and generalization. Machine Learning, 86(3):391–423, 2012.
[34] Sara A. van de Geer and Peter Bühlmann. On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3:1360–1392, 2009.
[35] G. Chowdhary and E. Johnson. A singular value maximizing data recording algorithm for concurrent learning. In Proceedings of the 2011 American Control Conference, pages 3547–3552, June 2011.
[36] Daniel Jakubovitz, Raja Giryes, and Miguel R. D. Rodrigues. Generalization error in deep learning. arXiv e-prints, arXiv:1808.01174, Aug 2018.
[37] Girish Joshi and Radhakant Padhi. Robust control of quadrotors using neuro-adaptive control augmented with state estimation. In AIAA Guidance, Navigation, and Control Conference, page 1526, 2017.
[38] Hao Yu and Bogdan M. Wilamowski. Levenberg-Marquardt training. Industrial Electronics Handbook, 5(12):1, 2011.
