
Safe Risk-averse Bayesian Optimization for Controller Tuning


Christopher König1, Miks Ozols2, Anastasia Makarova3, Efe C. Balta2, Andreas Krause3, Alisa Rupenyan1,2

This project was funded by the Swiss Innovation Agency, grant Nr. 46716, and by the Swiss National Science Foundation under NCCR Automation, grant Nr. 180545.
1 Inspire AG, Zurich, Switzerland
2 Automatic Control Laboratory, ETH Zurich, Switzerland
3 Learning & Adaptive Systems group, ETH Zurich, Switzerland

Abstract— Controller tuning and parameter optimization are crucial in system design to improve both the controller and the underlying system performance. Bayesian optimization has been established as an efficient model-free method for controller tuning and adaptation. Standard methods, however, are not enough for high-precision systems, which need to be robust with respect to unknown input-dependent noise and stable under safety constraints. In this work, we present a novel data-driven approach, RAGoOSE, for safe controller tuning in the presence of heteroscedastic noise, combining safe learning with risk-averse Bayesian optimization. We demonstrate the method on a synthetic benchmark and compare its performance to established BO-based tuning methods. We further evaluate RAGoOSE performance on a real precision-motion system utilized in semiconductor industry applications and compare it to the built-in auto-tuning routine.

I. INTRODUCTION

High-precision, high-stakes system operation often requires risk-averse parameter selection, even if this results in a trade-off with the best expected performance of the system. The standard approach in tuning the control parameters of such systems assumes constant observation noise, and tuning is often based on a model-based or rule-based procedure, thus limiting the system's performance. The goal of this work is to demonstrate a novel, flexible, risk-averse approach to find optimal control parameters under input-dependent noise, based only on observations of the system performance.

Optimizing performance indicators derived from system data for the tuning of control parameters has been explored through various approaches, e.g., iterative feedforward tuning [1], variable gain selection [2], and data-driven feedforward learning [3]. The performance criteria can be represented by features in the data measured during system operation at different values of the controller parameters [4]. Then, the low-level controller parameters can be optimized to fulfill the desired performance criteria. Bayesian optimization (BO) [5] denotes a class of sample-efficient, black-box optimization algorithms that have been used to address a wide range of problems; see [6] for a review. BO has been successfully demonstrated in controller tuning with custom unknown objectives and constraints in the existing literature [7], [8].

An important aspect for the practical applicability of BO is safety. In this work, we follow the definition of safety from [9] as optimization under constraints, separate from the objective. BO has been combined with heteroscedastic noise for controller optimization [10] to find optimal hyperparameters of a stochastic model predictive controller; however, the safety aspect of the approach was not studied. Our approach considers the safety of the underlying system and combines aspects of safe BO and risk-averse BO. This ensures that there are no constraint violations during the optimization, and enables the application of the approach to continuous optimization, e.g., [11], [12], concerning controller parameter adaptation for systems operating under changing conditions.

Contribution. In this work, we make the following contributions: (1) We propose a Bayesian optimization algorithm for safe policy search in the presence of heteroscedastic noise. The noise affects the surrogate models used in the data-driven constrained optimization procedure, which are built using features from the underlying noisy data, and the exploration-exploitation strategy. (2) We demonstrate the approach on a benchmark problem with safety constraints and compare the achieved performance to that achieved without accounting for the noise, based on [11], and to that achieved with another constrained BO approach [13]. (3) We demonstrate the approach on a numerical simulation of a high-precision motion system, and apply it in practice to tune the controller on the real system (Fig. 3), used in semiconductor manufacturing, in comparison with the industry-standard tuning approach via the built-in auto-tuner in the system.

II. RELATED WORK

BO for controller tuning. BO has been applied to the tuning of various types of controllers for fast motion systems. BO-based tuning for enhanced performance has been demonstrated for cascade controllers of linear axis drives, where data-driven performance metrics have been used to simultaneously improve the traversal time and the tracking accuracy while reducing vibrations [14]. In model predictive control (MPC), instead of adapting the controller for the worst-case scenarios, the prediction model can be selected to provide the best closed-loop performance by tuning the parameters in the MPC optimization objective for maximum performance [15]-[17].

Safe BO. To ensure optimal control performance for time-varying systems, controllers should adapt their parameters over time, while avoiding unsafe or unstable parameters. Especially in high-precision motion systems, even one iteration with excessive vibrations is not allowed during operation. Thus, learning the parameters of cascaded controllers with safety and stability constraints is needed to ensure that only safe parameters are explored [11], [12]. The SafeOpt algorithm [18] achieves safety, but is inefficient due to its
exploration strategy [9]. This has been addressed in [19], where the safe set is not actively expanded, which may compromise optimality but works well for the considered application. Another approach applied to controller tuning is to add a safety-related log-barrier to the cost, as demonstrated in [20], to reduce constraint violations as compared to an implementation based on BO with inequality constraints [13]. An efficient way to enable safe exploration is proposed by GoOSE [21], ensuring that every input to the system satisfies an unknown, but observable, constraint. For controller tuning, GoOSE unifies time-varying Gaussian process bandit optimization [22] with multi-task Gaussian processes [23], and with an efficient safe set search based on particle swarm optimization, enabling its application to the adaptive tuning of motion systems [11].

Tuning in the presence of heteroscedastic noise. Input-dependent noise and lack of repeatability are common in robotic and motion systems. The main challenge is that both the objective and the noise are unknown. Taking the noise into account in controller design leads to performance improvements, as demonstrated in the MPC-based controller design of robotic systems. For example, [10] and [24] account for heteroscedastic noise via an additional flexible parametric noise model. These approaches, however, utilize a restrictive model of the noise variance. Alternatively, [25] focuses on the risk associated with querying noisy points and proposes RAHBO, a flexible approach enjoying theoretical convergence guarantees. In contrast to the standard BO paradigm, which optimizes the expected value only and might fail in the risk-averse setting, RAHBO trades off the mean and the input-dependent variance of the objective, while learning both the objective and the noise distribution on the fly. To this end, RAHBO extends the concentration inequalities for heteroscedastic noise [26] to the setting of unknown noise, following the optimism-under-uncertainty principle [27].
III. PRELIMINARIES AND PROBLEM STATEMENT

This section introduces the optimization problem of interest and related preliminaries.

Classical optimization objective. Consider a problem of sequentially optimizing a fixed and unknown objective f : X → R over the compact set of inputs X ⊂ R^d:

min_{x∈X} f(x).  (1)

At every iteration t, for a chosen input x_t ∈ X, one observes a noise-perturbed evaluation y_t, resulting in the dataset D_t = {(x_i, y_i)}_{i=1}^t, with:

y_t = f(x_t) + ε(x_t),  (2)

where ε(x_t) is zero-mean noise, independent across different time steps t. In the standard setup, the noise is assumed to be identically distributed over the whole domain X, i.e., ε_t ∼ N(0, ρ²). This, however, might be restrictive in practice, leading to sub-optimal solutions [25]. Here we present the generalized version of the optimization with ε_t ∼ N(0, ρ²(x_t)) and later describe why it is important.

Bayesian optimization employs two main ingredients: a probabilistic model (for uncertainty quantification) and an acquisition function (for the sampling policy of the next inputs).

Probabilistic model. Gaussian processes (GPs) [28] provide a distribution over the space of functions commonly used in non-parametric Bayesian regression. The posterior GP mean and variance, denoted by µ_t(·) and σ_t²(·), respectively, are computed based on the previous measurements y_{1:t} = [y_1, ..., y_t]^⊤ and a given kernel κ(·,·):

µ_t(x) = κ_t(x)^⊤ (K_t + Σ_t)^{-1} y_{1:t},  (3)
σ_t²(x) = κ(x, x) − κ_t(x)^⊤ (K_t + Σ_t)^{-1} κ_t(x),  (4)

where Σ_t = diag(ρ²(x_1), ..., ρ²(x_t)), (K_t)_{i,j} = κ(x_i, x_j), κ_t(·)^⊤ = [κ(x_1, ·), ..., κ(x_t, ·)], and f ∼ GP(µ_0, λ^{-1}κ) is the prior.
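To make Eqs. (3) and (4) concrete, the sketch below computes the heteroscedastic GP posterior on a set of query points. It is a minimal illustration, not the implementation used in the paper; the RBF kernel form matches the one stated in Section V, and all function and variable names are our own.

```python
import numpy as np

def rbf_kernel(A, B, sigma_n=1.0, l=1.0):
    # kappa(x, x') = sigma_n^2 * exp(-(x - x')^2 / (2 l^2)); A: (n, d), B: (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_n**2 * np.exp(-d2 / (2 * l**2))

def gp_posterior(X, y, rho2, Xq):
    """Heteroscedastic GP posterior, Eqs. (3)-(4).

    X:    (t, d) observed inputs,   y:  (t,) observations,
    rho2: (t,) noise variances rho^2(x_i),   Xq: (m, d) query points.
    """
    K = rbf_kernel(X, X) + np.diag(rho2)          # K_t + Sigma_t
    k = rbf_kernel(Xq, X)                         # rows are kappa_t(x)^T
    alpha = np.linalg.solve(K, y)                 # (K_t + Sigma_t)^{-1} y_{1:t}
    mu = k @ alpha                                # Eq. (3)
    var = rbf_kernel(Xq, Xq).diagonal() - np.einsum(
        "ij,ij->i", k, np.linalg.solve(K, k.T).T)  # Eq. (4)
    return mu, var
```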

As long as the GP-based model of the objective f is well-calibrated, we can use it to construct high-probability confidence bounds for f [27]. In particular, as long as f has a bounded norm in the reproducing kernel Hilbert space (RKHS) associated with the covariance function κ used in the GP, f is bounded by the lower and upper confidence bounds lcb_t^f(x) and ucb_t^f(x), respectively:

lcb_t^f(x) = µ_t(x) − β_t^f σ_t^f(x),
ucb_t^f(x) = µ_t(x) + β_t^f σ_t^f(x),  (5)

where β_t^f > 0 is a parameter that ensures the validity of the confidence bounds [26].

Acquisition function. acq_t : X → R expresses the informativeness of an input x about the location of the objective optimum, given the probabilistic model of f. The goal of the acquisition function is to trade off exploration, i.e., learning about uncertain inputs, and exploitation, i.e., choosing inputs that lead to low cost function values. Thus, BO reduces the original black-box optimization problem to the cheaper problems x_t = argmin_{x∈X} acq_t(x). One of the most popular acquisition functions, GP-LCB [27], uses the so-called principle of optimism in the face of uncertainty. The idea is to rely on an optimistic guess of f(x) via the lower confidence bound in Eq. (5) and choose the x_t with the lowest guess. Algorithm 1 provides the standard BO loop.

Algorithm 1 Bayesian Optimization loop
1: Prior f ∼ GP(0, κ)
2: for t = 1, ..., T do
3:   x_t ← argmin_{x∈X} acq_t(x)
4:   Observe y_t ← f(x_t) + ε(x_t)
5:   Update GP posterior with y_t
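Algorithm 1 with the GP-LCB acquisition can be sketched as follows, assuming a finite candidate grid and a constant noise level; `gp_posterior` is the sketch above, and the remaining names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def bo_loop(f, noise_std, Xgrid, T=50, beta=3.0, seed=0):
    """Standard BO loop (Algorithm 1) with GP-LCB on a grid Xgrid of shape (m, d).
    f maps a (d,) point to a scalar; the noise is homoscedastic here."""
    rng = np.random.default_rng(seed)
    X = [Xgrid[rng.integers(len(Xgrid))]]              # random initial input
    y = [f(X[0]) + noise_std * rng.standard_normal()]
    for t in range(T):
        rho2 = np.full(len(X), noise_std**2)           # constant noise variances
        mu, var = gp_posterior(np.array(X), np.array(y), rho2, Xgrid)
        lcb = mu - beta * np.sqrt(np.maximum(var, 0.0))    # Eq. (5)
        x_next = Xgrid[np.argmin(lcb)]                 # optimistic guess (GP-LCB)
        X.append(x_next)
        y.append(f(x_next) + noise_std * rng.standard_normal())
    return np.array(X), np.array(y)
```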
A. Safe optimization under constraints

Consider a safety metric q : X → R, which should stay bounded within a predefined region with high probability. Formally, the optimization problem is then

min_{x∈X} {f(x) | q(x) ≤ c},  (6)

under the feasibility assumption {x | q(x) ≤ c} ≠ ∅, for a given c ∈ R. The safety metric q(x) here is an unknown,
expensive-to-evaluate function that should be learned on the fly. GoOSE extends any standard BO algorithm, providing high-probability safety guarantees in such a setup [11]. GoOSE keeps track of two subsets of the optimization domain: the safe set X_t and the optimistic safe set X_t^opt. X_t contains the set of points classified as safe, while X_t^opt contains the points that could potentially be safe, i.e.,

X_t = {x ∈ X | ucb_t^q(x) ≤ c},  (7)
X_t^opt = {x ∈ X | ∃x̄ ∈ W_t, s.t. g_t^ϵ(x̄, x) > 0}.  (8)

Here, the subscript t indicates the iteration of the Bayesian optimization, and the superscript q marks the constraint q(x). W_t ⊂ X_t is the set of expanders: the periphery of the safe set, such that for all x ∈ W_t it holds that |ucb_t^q(x) − lcb_t^q(x)| > ϵ, and g_t^ϵ(x̄, x) is the noisy expansion operator taking values of 0 or 1, defined as:

g_t^ϵ(x̄, x) = I[ lcb_t^q(x̄) + ∥µ_t^∇(x̄)∥_∞ d(x̄, x) + ϵ ≤ c ],  (9)

for some uncertainty threshold ϵ > 0 relating to the observation uncertainty of q(x) (see also [11]). µ_t^∇(x̄) is the mean of the posterior over the gradient of the constraint function, and d(·,·) is the distance metric associated with the Lipschitz continuity of q(x). Prior to the initialization of BO, GoOSE assumes a known non-empty safe seed X0, which does not need to be large and can be constructed from previous measurements.

The acquired points belong to the joint domain X_t ∪ X_t^opt:

x_t = argmin_{x ∈ X_t ∪ X_t^opt} acq_t(x).  (10)

If x_t ∈ X_t, f(x_t) and q(x_t) are evaluated via available observations. Otherwise, if x_t ∈ X_t^opt, an expander of the safe set is evaluated. This strategy allows both to query points satisfying the constraints and to strategically expand the safe set in the direction of a promising input.
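The following sketch illustrates how the sets of Eqs. (7)-(9) could be classified on a discretized domain. It simplifies W_t to the uncertain safe points (the periphery condition is omitted for brevity) and assumes the caller supplies the constraint confidence bounds and posterior gradient norms; all names are our own.

```python
import numpy as np

def classify_safe_and_expanders(Xgrid, lcb_q, ucb_q, grad_norm, c, eps):
    """Safe set, expanders, and optimistic safe set on a grid, Eqs. (7)-(9).

    lcb_q, ucb_q: (m,) confidence bounds of the constraint GP at Xgrid,
    grad_norm:    (m,) infinity-norm of the posterior gradient mean at Xgrid.
    """
    safe = ucb_q <= c                              # Eq. (7): X_t
    # Simplified W_t: safe points whose constraint value is still uncertain.
    expanders = safe & (ucb_q - lcb_q > eps)
    # Eq. (8): x is optimistically safe if some expander x_bar certifies it
    # through the noisy expansion operator g of Eq. (9).
    opt_safe = np.zeros(len(Xgrid), dtype=bool)
    for j in np.flatnonzero(expanders):
        d = np.abs(Xgrid - Xgrid[j]).max(axis=1)   # distance metric d(x_bar, x)
        opt_safe |= (lcb_q[j] + grad_norm[j] * d + eps <= c)   # Eq. (9)
    return safe, expanders, opt_safe
```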
B. Risk-averse optimization

Consider two different solutions that have similar expected function values, but one produces noisier realizations (see Section V for a toy example). This is important when it comes to the actual deployment of the solutions in a real system, where one might prefer more stable solutions. While standard risk-neutral BO methods, such as GP-LCB, provide guarantees in the risk-neutral (homoscedastic/heteroscedastic) BO setting [27], they might fail in the risk-averse setting. To this end, [25] proposes a practical risk-averse algorithm with theoretical guarantees.

Mean-variance objective is a simple and frequently used way to incorporate risk:

min_{x∈X} {MV(x) = f(x) + αρ²(x)},  (11)

where α ≥ 0 is the coefficient of absolute risk tolerance. Intuitively, the noise variance ρ²(x) acts as a penalizing term on the objective f(x), encouraging the acquisition of points with less noise. The coefficient α allows accounting for various degrees of risk tolerance, with α = 0 corresponding to the risk-neutral case of using GP-LCB.

Acquisition function. Since the noise variance is unknown in practice and both f(x) and ρ(x) are learned during the optimization, RAHBO proposes an acquisition strategy trading off not only exploration-exploitation but also risk:

acq_t(x) = lcb_t^f(x) + α lcb_t^var(x),  (12)

where lcb_t^var denotes the lower confidence bound of Eq. (5) constructed for the variance ρ²(x). For the latter, one needs to observe evaluations s_t² of the noise variance, which is done by querying k observations {y_t^i}_{i=1}^k of f(x):

ȳ_t = (1/k) Σ_{i=1}^k y_t^i,   s_t² = (1/(k−1)) Σ_{i=1}^k [ȳ_t − y_t^i]².  (13)

For a complete introduction and treatment of RAHBO, we refer the reader to [25]. Having introduced RAHBO, which constitutes an important part of the framework presented in this work, we are ready to present RAGoOSE, our safe risk-averse Bayesian optimization method.
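The evaluations in Eq. (13) are obtained by querying the system k times at the same input; a minimal sketch (the objective `f` and noise function `rho` are stand-ins):

```python
import numpy as np

def evaluate_with_repetitions(f, rho, x, k=10, seed=None):
    """Query f at x k times and form the evaluations of Eq. (13)."""
    rng = np.random.default_rng(seed)
    ys = f(x) + rho(x) * rng.standard_normal(k)   # k noisy observations y_t^i
    return ys.mean(), ys.var(ddof=1)              # sample mean, unbiased variance
```

The sample mean feeds the objective model and the sample variance feeds the noise-variance model, as described in Section IV.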
IV. RAGoOSE: SAFE RISK-AVERSE BO

We aim to minimize the risk-averse mean-variance objective subject to an unknown safety constraint function. Similar techniques could be used to capture heteroscedastic noise in the constraint measurements. Formally,

min_{x∈X} {MV(x) | q(x) ≤ c},  (14)

for a given c ∈ R. As before, we get noise-perturbed evaluations as follows: m(x) = q(x) + ε_q with ε_q ∼ N(0, σ_q²), σ_q² > 0; y(x) = f(x) + ε(x) with ε(x) ∼ N(0, ρ²(x)); and similarly s²(x) = ρ²(x) + ε_var with ε_var ∼ N(0, σ_var²). The latter are observed through the sample mean ȳ_t and variance s_t² defined in Eq. (13).

A. Building blocks of RAGoOSE

The main elements enabling the safe and risk-aware optimization of RAGoOSE rely on RAHBO and GoOSE.

RAGoOSE probabilistic models. To reason under uncertainty about the unknown objective f(x), noise variance ρ²(x), and safety metric q(x), we model each with a GP: GP^f, GP^var, and GP^q, respectively, characterised by the kernel functions κ_f, κ_var, and κ_q, respectively.

The posteriors of GP_t^var and GP_t^q can be simplified, since the variance ρ²(x) and the safety metric q(x) are assumed to be perturbed by homoscedastic noise:

µ_t^q(x) = κ_t^q(x)^⊤ (K_t^q + σ_q² I)^{-1} y_{1:t},
(σ_t^q(x))² = κ^q(x, x) − κ_t^q(x)^⊤ (K_t^q + σ_q² I)^{-1} κ_t^q(x),

where I ∈ R^{t×t} is the identity matrix, and the rest corresponds to the notation in Eqs. (3) and (4). The posterior mean and variance equations for GP_t^var are built analogously.

The posterior GP_t^f in Eqs. (3) and (4), however, relies on the unknown noise variance ρ²(x) in Σ_t. To this end, [25] proposes to use instead Σ̂_t with the upper confidence bound ucb_t^var(·):

Σ̂_t := (1/k) diag(ucb_t^var(x_1), ..., ucb_t^var(x_t)),  (15)
where Σ̂_t is corrected by k due to the sample mean. This correction guarantees correct, though conservative, confidence bounds for the objective f(x), thus allowing for effective optimization (see [25] for more details).
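In code, the substitution of Eq. (15) amounts to replacing the unknown noise diagonal Σ_t by the upper confidence bound of GP^var, scaled by 1/k because GP^f is fit to k-sample means. A sketch reusing `gp_posterior` from Section III (names assumed):

```python
import numpy as np

def rahbo_objective_posterior(X, y_bar, ucb_var, k, Xq):
    """Posterior of GP^f with the conservative noise proxy of Eq. (15).

    y_bar:   (t,) sample means from Eq. (13),
    ucb_var: (t,) upper confidence bound of GP^var at the queried inputs.
    """
    sigma_hat = ucb_var / k                        # diagonal of Sigma_hat_t
    return gp_posterior(X, y_bar, sigma_hat, Xq)   # Eqs. (3)-(4) with Sigma_hat
```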
We assume the functions f(x), q(x), and ρ²(x) to belong to the RKHS associated with their respective kernels and to have bounded RKHS norms; other assumptions follow [11], [25]. Under these assumptions, the lower and upper confidence bounds for the functions are defined according to Eq. (5) and denoted as lcb_t^q(x), ucb_t^q(x) for q(x) and lcb_t^var(x), ucb_t^var(x) for ρ²(x). The corresponding parameters β^f, β^q, and β^var are individually adjusted to ensure the validity of the confidence bounds and to balance exploration vs. exploitation (further referenced collectively as the β parameters).

RAGoOSE acquisition strategy. With the confidence bounds defined, RAGoOSE acquires a point x_t at each iteration t following the acquisition strategy in Eq. (12), conditioned on the safety of the inputs. Particularly, x_t is obtained by:

x_t = argmin_{x ∈ X_t ∪ X_t^opt} {lcb_t^f(x) + α lcb_t^var(x)}.  (16)

As described in [25], the framework allows for fine-tuning the trade-off between exploration, exploitation, and risk. This is done via the parameter β^f, which adjusts the exploration vs. exploitation trade-off of the objective, and α, which adjusts the trade-off between the optimum value of the expected objective and the associated observation variance at this point. The parameters β^var and β^q act as additional tuning knobs to control exploration. As acq_t(x) uses lcb_t^var, an increasing β^var corresponds to a more optimistic belief on the observation noise variance of f(x), whereas an increasing β^q corresponds to a smaller safe set X_t and, hence, to a more conservative estimate of the true safe set X_safe via X_t, where X_safe is defined as X_safe = {x ∈ X | q(x) ≤ c}. RAGoOSE keeps track of its own estimate of the safe set X_t and the optimistic safe set X_t^opt, defined in the same way as described in Section III.
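Restricted to the safe candidates, the acquisition in Eq. (16) is a risk-penalized lower confidence bound; a grid-based sketch with assumed inputs:

```python
import numpy as np

def ragoose_acquisition(mu_f, sd_f, mu_var, sd_var, candidate_mask,
                        alpha=10.0, beta_f=3.0, beta_var=3.0):
    """Eq. (16): argmin of lcb^f + alpha * lcb^var over X_t union X_t^opt.

    mu_f, sd_f:     posterior mean/std of GP^f on the grid,
    mu_var, sd_var: posterior mean/std of GP^var on the grid,
    candidate_mask: boolean mask of X_t union X_t^opt on the grid.
    """
    lcb_f = mu_f - beta_f * sd_f               # Eq. (5) for the objective
    lcb_var = mu_var - beta_var * sd_var       # Eq. (5) for the noise variance
    acq = lcb_f + alpha * lcb_var              # Eq. (12)
    acq = np.where(candidate_mask, acq, np.inf)  # only safe candidates
    return int(np.argmin(acq))
```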
B. The Algorithm

Initialization. We assume the compact optimization domain X ⊂ R^d to be defined and known, and an initial set of n0 observations Df = {y_i}_{i=1}^{n0}, Dq = {m_i}_{i=1}^{n0}, and Dvar = {s_i²}_{i=1}^{n0} for f(x), q(x), and ρ²(x) at x ∈ X0 ⊆ X_safe ⊆ X must be available. We use X0 as the safe-seed set required for RAGoOSE initialization, and as a fallback input in case no safe inputs are found through RAGoOSE iterations. Given the initialization sets, we initialize the three Gaussian processes GP^f, GP^var, and GP^q with pre-selected kernels and tune their hyperparameters.

We then configure the RAGoOSE optimization with our selected values of α and β, defining the exploration-exploitation-risk-aversion setting. Additionally, we select an appropriate uncertainty threshold parameter ϵ, governing the augmentation of the set of expanders via the noisy expansion operator g_t^ϵ. We also set the number of evaluations per iteration k, and the total number of iteration loops of RAGoOSE, T.

Optimization loop. With the configuration set and all parameters and input sets initialized, we present the main algorithm of our work, representing an end-to-end execution of RAGoOSE, in Algorithm 2. We note that explicit modeling of the variance in the constraint evaluations is not included, though it is a possible extension of the algorithm. Furthermore, the expansion strategy (see Algorithm 2, lines 12-15) is not risk-averse.

Algorithm 2 RAGoOSE optimization loop
1: Input: Parameters α, β, k, ϵ; initial data X0, Df, Dvar, Dq; kernel functions κ_f, κ_q, κ_var; prior means µ0^f, µ0^q, µ0^var
2: for t = 1, 2, ... do
3:   update GP^f, GP^var, GP^q
4:   X_t ← {x ∈ X : ucb_t^q(x) ≤ c}
5:   L_t ← {x ∈ X_t : ∃z ∉ X_t with d(x, z) ≤ ∆x}
6:   W_t ← {x ∈ L_t : ucb_t^q(x) − lcb_t^q(x) ≥ ϵ}
7:   if x_{t−1} = x_opt then
8:     x_opt ← argmin_{x ∈ X_t ∪ X_t^opt} {lcb_t^f(x) + α lcb_t^var(x)}
9:   if ucb_t^q(x_opt) ≤ c then
10:    x_t ← x_opt
11:    evaluate ȳ_t, m_t, s_t²
12:  else
13:    x_exp ← argmin_{x ∈ W_t} d(x, x_opt) s.t. g_t^ϵ(x, x_opt) ≠ 0
14:    x_t ← x_exp
15:    evaluate ȳ_t, m_t, s_t²
16: Output: x*_final ← argmin_{x ∈ X_t} {ucb_t^f(x) + α ucb_t^var(x)}

C. Implementation

When optimizing the acquisition function, the current implementation of RAGoOSE uses particle swarm optimization (PSO) [29], which initializes a set of particles across X_t and, through its update rules, also allows the particles to expand into the optimistic safe set X_t^opt, avoiding calculating the optimistic safe set explicitly [11]. Another important aspect affecting the performance of the algorithm is the chosen discretization size ∆x: a finer discretization allows a better approximation of X_t^opt, due to a finer grid of expanders, while increasing the computation time of the algorithm. We follow the rule of thumb for the selection of the discretization step from [7], based on the lengthscale parameter ℓ_q of the GP^q kernel κ_q, with ∆x s.t. κ_q(x, x + ∆x) σ_q^{-2} = 0.95.
Gaussian processes GP f , GP var and GP q with pre-selected The parameter β = 3 was fixed for every GP of the
kernels for the GPs and tune their hyperparameters. implementation, resulting in a 99% confidence interval. The
We then configure the RAGoOSE optimization with exploration threshold was set to ϵ = 6σq for all experiments
our selected values of α and β, defining the exploration- (see also Algorithm 2 line 6).
exploitation-risk aversion setting. Additionally, we select
appropriate uncertainty threshold parameter ϵ governing V. C ASE S TUDY
the augmentation of the set of expanders, via the noisy We now apply the proposed approach, first on an il-
expansion operator gϵt . We also set the number of evaluations lustrative example (sinusoidal function with varying noise
per iteration k, and the total number of iteration loops of levels), then on a numerical simulation of a precision motion
RAGoOSE, T . system, and finally we run experiments on the real system.
In the synthetic example, the hyperparameters were tuned at every iteration by minimizing the negative log marginal likelihood. The hyperparameters used for the GPs in the other experiments are listed in Table I.

Numeric experiment            f             q             ρ²
l = [Pkp, Vki]^⊤              [50, 200]^⊤   [50, 200]^⊤   [50, 200]^⊤
(σn, µ0, λ)                   (2, 6, 0)     (4, 6, 1e-2)  (1e-2, 0.3, 1e-6)

Real experiment               f                      q                      ρ²
l = [Pkp, Vkp, Vki, Aff]^⊤    [50, 100, 200, 0.5]^⊤  [50, 100, 200, 0.5]^⊤  [50, 100, 200, 0.5]^⊤
(σn, µ0, λ)                   (1, 4.61, 0)           (1e-2, 0.192, 9e-4)    (1e-3, 1e-2, 1e-6)

TABLE I: GP hyperparameters for the experiments in Section V.

All experiments use an RBF kernel κ(x, x′) = σn² exp(−(x − x′)²/(2l²)) for all GPs. All priors are set to a constant µ0. For the cost function GP^f, the noise is modeled by GP^var.
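For illustration, the objective surrogate of the numerical experiment could be instantiated with the Table I values via an anisotropic (per-dimension lengthscale) RBF kernel; this is an assumed reconstruction, not the authors' code:

```python
import numpy as np

def rbf_ard(A, B, lengthscales, sigma_n):
    # Anisotropic RBF: each input dimension is scaled by its own lengthscale.
    d2 = (((A[:, None, :] - B[None, :, :]) / lengthscales) ** 2).sum(-1)
    return sigma_n**2 * np.exp(-0.5 * d2)

# GP^f for the numerical experiment (Table I): l = [50, 200] for [Pkp, Vki],
# output scale sigma_n = 2, constant prior mean mu_0 = 6.
kappa_f = lambda A, B: rbf_ard(A, B, np.array([50.0, 200.0]), 2.0)
mu0_f = 6.0  # regress on (y - mu0_f) and add mu0_f back to the posterior mean
```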
Fig. 1: Illustrative example in the case study. Top: Objective f(x) with 3 global optima marked as (x_CV, x_opt, x_HV) and constraint function q(x) over the same domain, showing that only two optima (x_opt, x_HV) are safe. Bottom: Heteroscedastic noise variance ρ²(x) over the same domain: the noise level at (x_CV, x_opt, x_HV) varies according to the sigmoid function.

A. Synthetic problem

We study RAGoOSE on a synthetic optimization problem, and compare it with two baselines: GoOSE [11] and Constrained Bayesian Optimization (CBO) [13].

Problem Setup. We use a sinusoidal cost function f(x) and observe its noisy evaluations y(x), with the cost observation noise variance function ρ²(x) as shown in Fig. 1. Similarly, we observe the constraint function q(x) via its noisy observations m(x), with a homoscedastic noise variance of σ_q = 0.1. Performance was assessed by repeating the optimization 30 times, with T = 200 BO iterations each, and k = 10 evaluations of the cost function. We repeat the optimization routine with RAGoOSE for all α ∈ {0, 10, 30}. As Fig. 1 illustrates, the set-up gives rise to three optimizers: x*_CV, which violates the constraint; x*_opt, located in the low observation noise variance region of the domain; and x*_HV, located in the high-variance region. All GPs were initialized with five a priori known points within the safe set interval x ∈ [0, 1]. In Fig. 2 we report the mean cost function f(x_t) and observation noise variance ρ²(x_t) for the best performing RAGoOSE configuration and the benchmarks, and the intermediate regret r_t with varying α (with r_t defined in Eq. (17)), with ±2 standard errors (SE) across the n = 30 performed repetitions of the optimization routine:

r_t = [f(x*_opt) + 50ρ²(x*_opt)] − [f(x_t) + 50ρ²(x_t)].  (17)

We report a constraint violation when the observed value m_t exceeds the constraint upper bound c = 3.

Results. All methods were able to converge to the optimum of the cost function successfully, as shown in Fig. 2. While RAGoOSE requires more iterations to converge, it shows superior performance in observation noise variance, converging to x*_opt and yielding a 41% and 31% decrease in observed noise variance compared with GoOSE and CBO, respectively. We observe a closer convergence to the low-variance optimum with increasing α, as Fig. 2c reports, with α = 50 yielding the lowest regret at the terminal iteration. As expected, the risk-neutral case of α = 0 fails to converge to the risk-averse optimum x*_opt, yielding the highest intermediate regret r_t. RAGoOSE outperforms the other benchmarks in the number of constraint violations.

The benchmark study results are summarized in Table II. The additional computation due to modeling the noise variance slows down the optimization time of RAGoOSE, but increases the performance and reduces constraint violations. Furthermore, RAGoOSE shows slightly better prediction of the final optimum mean value, within one standard deviation of the noise.

Metric/Parameter              RAGoOSE   GoOSE    CBO
f(x*)                         -0.999    -0.987   -0.999
ρ²(x*)                        0.011     0.018    0.016
Constraint violations         0.03%     0.07%    7.90%
Mean optimization time [s]    68.6      28.4     21.4

TABLE II: Mean optimized performance metric (cost) f(x*), mean noise parameter ρ²(x*), constraint violations, and optimization time for the different BO-based tuning methods applied to the synthetic problem, across the 30 optimization experiments.

B. Controller Tuning for Precision Motion

System and controller. The system of interest is a linear axis of a high-precision positioning system by Schneeberger Linear Technology, shown in Fig. 3. The axis is driven by a permanent magnet AC motor equipped with nanometer-precision encoders for position and speed tracking. The axis is guided on two rails, reaching positioning accuracy of ≤ 10 µm, repeatability of ≤ 0.7 µm, and 3σ stability of ≤ 1 nm. The sampling rate of the controller and the data acquisition of the system is fs = 20 kHz.
[Figure 2 panels: (a) Cost function f(x_t) ± 2SE; (b) Noise variance ρ²(x_t) ± 2SE; (c) Regret r_t]

Fig. 2: Numerical study results for the synthetic function (see Fig. 1), number of iterations t = 100, and n_rep = 30 repetitions for each experiment, with reported average and two standard errors (SE). RAGoOSE successfully converges to the minimum of the cost function (a), while obtaining a lower observation noise at the solution (b) compared to the benchmarks. (c) Regret convergence as a function of the α parameter.

Fig. 4: Left panel: S-curve position reference function indicating move and settle time. Right panel: position error pe(ti) example (highlighted in black) and ξ(i, ns) filter signals.

Fig. 3: Top panel: Simplified controller architecture in the experimental study. Bottom panel: Positioning system by Schneeberger Linear Technology AG.

Such systems are routinely used for production and quality control in the semiconductor industry, in biomedical engineering, and in photonics and solar technologies. For example, in semiconductor manufacturing applications, considering risk or safety alone is not enough due to stringent production requirements. Therefore, a tuning method that considers both process variance and safety, e.g., RAGoOSE, is beneficial.

The system is controlled by a three-level cascade controller, shown in Fig. 3 (top panel). The outermost loop controls the position with a P-controller Cp(s) = Pkp, and the middle loop controls the velocity with a PI-controller Cv(s) = Vkp + Vki/s. The inner loop, which controls the current of the drive, is well-tuned and not subject to tuning. Feedforward structures are used to accelerate the response of the system. The gain of the velocity feedforward Vff is well-tuned and not modified during the retuning procedure, while the acceleration feedforward gain Aff is retuned by the algorithm during the experiments on the real system and fixed to Aff = 0 for the simulation experiments.

Experimental setup. We optimize the controller parameters providing the best tracking performance of the system, while avoiding instabilities of the controller in the presence of heteroscedastic noise. We estimate both the tracking performance and potential instabilities via the corresponding data-driven features ϕ(x) and q(x) extracted from the system. The optimization problem can be written as follows:

min_{x∈X} {ϕ(x) | q(x) > c}.

Specifically, the performance tracking metric is given by

ϕ(x) = (1/(nP − ns)) Σ_{i=ns}^{nP} |ξ(i, ns) pe(ti)| + δ(x).

Here, pe(ti) indicates the deviation from the reference position at each time instance ti, and δ indicates additional noise added to the metric (only used in the numerical study). The stability (safety) constraint q(x) is calculated from the Fourier transform of the filtered velocity error F[ve(ti)] and limited by a threshold c. The threshold is determined by a small number of prior experiments. Furthermore, in the experimental study, both the performance and the stability metrics are filtered by a right-sided sigmoid function ξ(i, ns), centered at 150 ms after the start of the settling phase, for robustness of the metric. The settling phase extends from time instance ns to time instance nP. The optimization is over the controller gains Pkp, Vkp, Vki and the feedforward gain Aff.
trols the position with a P-controller Cp (s) = Pkp , and
gains Pkp , Vkp , Vki and the feedforward gain Aff .
the middle loop controls the velocity with a PI-controller
We solve the optimization problem using RAGoOSE,
Cv (s) = Vkp + Vki /s. The inner loop, which controls the
following (14) and Algorithm 2. As our system exhibits
current of the drive, is well-tuned and not subject to tuning.
heteroscedastic noise which is included in ϕ(x), to provide
Feedforward structures are used to accelerate the response
the model for its variance i.e. ρ2 (x), we need to acquire
of the system. The gain of the velocity feedforward Vff is
multiple repetitions at each candidate gain combination
well-tuned and not modified during the retuning procedure,
x. For observations of f (x), we use the mean of the
while the acceleration feedforward gain Aff is retuned by
observations over the n repetitions, while we use the variance
the algorithm during the experiments on the real system and
for ρ2 (x). The optimum of the RAGoOSE experiments is
fixed to Aff = 0 for the simulation experiments.
found using the pessimistic prediction given by:
Experimental setup. We optimize the controller parame- x∗ = argmin{ucbft (x) + α ucbvar
t (x)}. (18)
ters providing the best tracking performance of the system, x∈Xt

while avoiding instabilities of the controller in the presence Each experiment has ten repetitions for each candidate
of heteroscedastic noise. We estimate both the tracking combination of parameters x, to estimate the noise variance
performance and potential instabilities via the corresponding ρ(x). Furthermore, we repeat the RAGoOSE optimization
data-driven features ϕ(x) and q(x) extracted from the system ten times. Figure 5 shows the range of values for f (x) and
ρ(x), obtained for each iteration during the ten repeated optimizations. The solid lines show the median over the repeated optimizations.

Fig. 5: Optimization results (numerical) for controller tuning comparing RAGoOSE using α = 0 with α = 100 for 10 different experiments. (a) Median and min-max interval of f(x) over the 10 RAGoOSE optimizations for each iteration. (b) Median standard deviation of the cost ρ(x) over the 10 repeated RAGoOSE optimizations for each iteration, together with its min-max interval.

Numerical study. We perform the RAGoOSE optimization on a simulation of a single axis of Argus, comparing the effect of α = 0 and α = 100 with T = 200 Bayesian optimization iterations and n = 10 evaluations of the cost function at each iteration. For simplicity, we only optimize the gains x = [Pkp, Vki] in the simulation experiment. All RAGoOSE experiments are initialized using 3 initial samples, x0 = [[200, 1000], [300, 1000], [200, 1500]]^⊤. To simulate the heteroscedasticity of the observations, non-uniform noise is added to the cost function, dependent on one of the two parameters, e.g., for x1 = Vki:

δ(x1) = N(0, 1e-7) if x1 ≤ 1200,  δ(x1) = N(0, 1e-5) if x1 > 1200.  (19)

During the simulation experiment of the Argus linear axis, RAGoOSE using α = 0 and α = 100 shows significant differences in performance and observation noise (see Figure 5). While both RAGoOSE configurations manage to quickly find a parameter combination resulting in good performance but high risk, RAGoOSE using α = 100 manages to further explore and find a combination resulting in lower risk and thus also better performance, due to the effect of the noise on the cost observation.
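The piecewise noise model of Eq. (19) is straightforward to reproduce in simulation; a sketch with assumed names:

```python
import numpy as np

def delta(v_ki, rng=None):
    """Input-dependent observation noise of Eq. (19), keyed on x1 = Vki."""
    rng = rng or np.random.default_rng()
    var = 1e-7 if v_ki <= 1200 else 1e-5      # piecewise noise variance
    return rng.normal(0.0, np.sqrt(var))
```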
Experimental Study. We perform the optimization with RAGoOSE using α = 0 and α = 100, with T = 100 Bayesian optimization iterations and n = 10 evaluations of the cost function at each iteration. We show the distribution of the observations of the mean f(x_t) and the observations of the variance ρ²(x_t) of ϕ(x) in Figure 6.

Fig. 6: Optimization results for controller tuning using RAGoOSE with α = 0 (left) and α = 100 (right). The mean observation over the 10 repetitions f(x) (x-axis) is plotted against the noise observation ρ²(x) (y-axis), for one RAGoOSE optimization. The resulting data points are color-coded according to the values of one of the optimization variables, Aff, to visualize the heteroscedasticity of the noise, which induces the particular shape of the obtained distribution, with distinct "branches" corresponding to high or low noise.

We further compare the performance of RAGoOSE using α = 0 and α = 100 for tuning the three controller gains (Pkp, Vkp, Vki) and the acceleration feedforward parameter (Aff) with the performance of the auto-tuning routine of the built-in hardware controller, which balances bandwidth with a phase and gain margin method, on the real Argus linear axis. Both RAGoOSE optimizations were stopped after 100 iterations. The first point of each of the two data-driven optimizations corresponds to the controller parameters found by manual tuning, with x0 = [200, 600, 1000, 0.0]. Figure 6 shows the distribution of the cost mean observations f(x_t) with the corresponding variance observations ρ²(x_t). Clearly, α = 100 avoids high-risk parameter combinations compared to α = 0, and the preferred candidate controllers lie along low-variance regions, corresponding to lower values of the Aff parameter, which is mostly associated with heteroscedastic noise.

                 RAGoOSE α = 0   RAGoOSE α = 100   Auto-tuning
f(x*) [nm]       2.459           2.543             4.987
ρ²(x*) [nm²]     9.187e-3        8.347e-3          5.832e-3
Pkp*             330.4           319.8             350.0
Vkp*             731.6           862.4             600.0
Vki*             1449.3          1174.6            2000.0
Aff*             0.83            0.815             0.0

TABLE III: Optimized performance metric (cost), noise parameter, and corresponding controller parameters for different values of α, compared to the controller's built-in auto-tuning in the experimental study.

The performance of the controllers corresponding to the optima found by RAGoOSE using α = 0 and α = 100 (see Figure 6) after t = 100
iterations is similar in tracking performance (cost) and in variance. This is a result of the specific heteroscedasticity of the system's noise. Due to the distribution of the noise, the parameter regions of optimal performance correspond to the parameter regions of low risk, situated in the bottom-left corner of the graphs in Figure 6. Thus, the final result of the RAGoOSE optimization is not strongly dependent on the choice of α. Both methods show significant improvement compared to the controllers tuned using the built-in auto-tuning routine (see Table III), due to the contribution of the noise model in the performance modelling. There were no constraint violations observed during the optimization in either RAGoOSE experiment. In terms of duration of the tuning, RAGoOSE needs ten times more evaluations to estimate the noise variance, which slows it down by one order of magnitude compared to methods not considering heteroscedastic noise. In addition, as shown in Table II, it is slower due to the additional GP model of the noise. Thus, one iteration of RAGoOSE for tuning the precision motion system takes 32 s, compared to 1.6 s achieved with the method from [11].

VI. CONCLUDING REMARKS

We presented a risk-averse Bayesian optimization framework that performs constrained optimization with safe evaluations, accounting for the solution variance. To this end, we considered heteroscedastic noise in the process and extended the RAHBO method to consider the safety of evaluations in terms of constraint violations. We present results in both numerical benchmark studies and on a real high-precision motion system to show the improved performance over state-of-the-art methods. We demonstrate the proposed framework on a controller tuning application where safe evaluations are crucial to ensure physical safety. Future work will consider extending RAGoOSE to include the variance from the measurements of the constraints in the optimization. From the application perspective, the next challenge is applying the method in continuous optimization scenarios, without disrupting the operation of the physical system.

REFERENCES

[1] X. Li, S.-L. Chen, J. Ma, C. S. Teo, and K. K. Tan, "Data-driven model-free iterative tuning approach for smooth and accurate tracking," in 2018 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2018, pp. 593–598.
[2] M. Li, Y. Zhu, K. Yang, and C. Hu, "A data-driven variable-gain control strategy for an ultra-precision wafer stage with accelerated iterative parameter tuning," IEEE Transactions on Industrial Informatics, vol. 11, no. 5, pp. 1179–1189, 2015.
[3] F. Song, Y. Liu, W. Jin, J. Tan, and W. He, "Data-driven feedforward learning with force ripple compensation for wafer stages: A variable-gain robust approach," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2020.
[4] M. Khosravi, V. Behrunani, R. S. Smith, A. Rupenyan, and J. Lygeros, "Cascade control: Data-driven tuning approach based on Bayesian optimization," IFAC-PapersOnLine, vol. 53, no. 2, pp. 382–387, 2020.
[5] J. Mockus, Bayesian Approach to Global Optimization: Theory and Applications, ser. Mathematics and its Applications. Springer Netherlands, 2012.
[6] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, "Taking the human out of the loop: A review of Bayesian optimization," Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, 2015.
[7] R. R. Duivenvoorden, F. Berkenkamp, N. Carion, A. Krause, and A. P. Schoellig, "Constrained Bayesian optimization with particle swarms for safe adaptive controller tuning," IFAC-PapersOnLine, vol. 50, no. 1, pp. 11800–11807, 2017, 20th IFAC World Congress.
[8] A. Marco, P. Hennig, J. Bohg, S. Schaal, and S. Trimpe, "Automatic LQR tuning based on Gaussian process global optimization," in 2016 IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 270–277.
[9] F. Berkenkamp, A. Krause, and A. P. Schoellig, "Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics," Machine Learning, pp. 1–35, 2021.
[10] R. Guzman, R. Oliveira, and F. Ramos, "Heteroscedastic Bayesian optimisation for stochastic model predictive control," IEEE Robotics and Automation Letters, vol. 6, no. 1, pp. 56–63, 2020.
[11] C. König, M. Turchetta, J. Lygeros, A. Rupenyan, and A. Krause, "Safe and efficient model-free adaptive control via Bayesian optimization," in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 9782–9788.
[12] P. Brunzema, A. von Rohr, and S. Trimpe, "On controller tuning with time-varying Bayesian optimization," arXiv preprint arXiv:2207.11120, 2022.
[13] J. Gardner, M. Kusner, Zhixiang, K. Weinberger, and J. Cunningham, "Bayesian optimization with inequality constraints," in Proceedings of the International Conference on Machine Learning (ICML), ser. Proceedings of Machine Learning Research, vol. 32, no. 2. Beijing, China: PMLR, 22–24 Jun 2014, pp. 937–945.
[14] M. Khosravi, V. Behrunani, P. Myszkorowski, R. S. Smith, A. Rupenyan, and J. Lygeros, "Performance-driven cascade controller tuning with Bayesian optimization," IEEE Transactions on Industrial Electronics, pp. 1–1, 2021.
[15] D. Piga, S. Formentin, and A. Bemporad, "Direct data-driven control of constrained systems," IEEE Transactions on Control Systems Technology, vol. 26, no. 4, pp. 1422–1429, Jul 2018.
[16] F. Sorourifar, G. Makrygirgos, A. Mesbah, and J. A. Paulson, "A data-driven automatic tuning method for MPC under uncertainty using constrained Bayesian optimization," arXiv:2011.11841, 2020.
[17] A. Rupenyan, M. Khosravi, and J. Lygeros, "Performance-based trajectory optimization for path following control using Bayesian optimization," in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 2116–2121.
[18] Y. Sui, A. Gotovos, J. W. Burdick, and A. Krause, "Safe exploration for optimization with Gaussian processes," in Proceedings of the International Conference on Machine Learning (ICML), 2015.
[19] M. Fiducioso, S. Curi, B. Schumacher, M. Gwerder, and A. Krause, "Safe contextual Bayesian optimization for sustainable room temperature PID control tuning," in Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, December 2019, pp. 5850–5856.
[20] M. Khosravi, C. Koenig, M. Maier, R. S. Smith, J. Lygeros, and A. Rupenyan, "Safety-aware cascade controller tuning using constrained Bayesian optimization," IEEE Transactions on Industrial Electronics, 2022.
[21] M. Turchetta, F. Berkenkamp, and A. Krause, "Safe exploration for interactive machine learning," in Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 2891–2901.
[22] I. Bogunovic, J. Scarlett, and V. Cevher, "Time-varying Gaussian process bandit optimization," in Artificial Intelligence and Statistics. PMLR, 2016, pp. 314–323.
[23] K. Swersky, J. Snoek, and R. P. Adams, "Multi-task Bayesian optimization," Advances in Neural Information Processing Systems, vol. 26, 2013.
[24] R. Guzman, R. Oliveira, and F. Ramos, "Bayesian optimisation for robust model predictive control under model parameter uncertainty," arXiv preprint arXiv:2203.00551, 2022.
[25] A. Makarova, I. Usmanova, I. Bogunovic, and A. Krause, "Risk-averse heteroscedastic Bayesian optimization," Advances in Neural Information Processing Systems, vol. 34, pp. 17235–17245, 2021.
[26] J. Kirschner and A. Krause, "Information directed sampling and bandits with heteroscedastic noise," in Conference On Learning Theory (COLT), 2018.
[27] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, "Information-theoretic regret bounds for Gaussian process optimization in the bandit setting," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3250–3265, May 2012. [Online]. Available: https://doi.org/10.1109%2Ftit.2011.2182033
[28] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, ser. Adaptive Computation and Machine Learning. Cambridge, MA, USA: MIT Press, Jan. 2006.
[29] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proceedings of ICNN'95 - International Conference on Neural Networks, vol. 4, 1995, pp. 1942–1948.