Diss. ETH No. 21881

Optimality-Based Trajectory Generation
and Periodic Learning Control
for Quadrocopters

presented by
Markus Hehn
Diplom-Ingenieur, Technische Universität Darmstadt

2014
Abstract

This thesis considers the design of algorithms for the autonomous flight of quadrocopters and related multi-rotor vehicles. These flying vehicles have proven popular as small-scale aerial robotics platforms because they combine mechanical simplicity and robustness, the ability to hover, and exceptional agility.

The algorithms proposed herein perform two tasks: 1) they generate high-performance flight trajectories that satisfy the dynamic and input constraints of quadrotor vehicles; and 2) they track a given, periodic trajectory with high accuracy by using past data.

The first part of this thesis presents several algorithms centered around the problem of generating flight trajectories for quadrocopters. The first algorithm presented here is a benchmarking tool (Paper I). Its aim is to quantitatively answer questions surrounding achievable performance, the influence of design parameters, and the performance assessment of control strategies in the design process of quadrotor systems. The algorithm allows the computation of quadrotor maneuvers that satisfy Pontryagin's minimum principle with respect to time-optimality. Such maneuvers provide a useful lower bound on the duration of maneuvers, which can be used to assess the performance of controllers and vehicle design parameters. Computations are based on a two-dimensional first-principles quadrotor model. The minimum principle is applied to this model to find that time-optimal trajectories are bang-bang in the thrust command, and bang-singular in the rotational rate command. The algorithm allows the computation of time-optimal maneuvers for arbitrary initial and final states by solving the boundary value problem induced by the minimum principle.

While the benchmarking algorithm provides valuable insight in the design stage, its computational complexity and use of a two-dimensional model disqualify it from planning flight trajectories dynamically. Dynamic trajectory planning is required for tasks subject to dynamic changes of the objective, where significant disturbances in the flight path may occur, or where knowledge of the environment becomes available only mid-flight. In such circumstances, it is necessary to quickly and robustly plan a new trajectory whenever new information becomes available. The second algorithm in this thesis is designed to satisfy such real-time requirements (Paper II). Its aim is to combine computational efficiency with the ability to plan fast motions from a large range of initial conditions to a target point that will be reached at rest. The approach consists of planning separate trajectories in each of the three translational degrees of freedom, and to ensure feasibility
Kurzfassung
This dissertation presents algorithms for the autonomous flight of quadrocopters and similar multi-rotor aerial vehicles. Owing to their simple and robust mechanical design, their ability to hover, and their exceptional agility, such vehicles have become popular platforms in aerial robotics, and are particularly prevalent in applications with relatively low payloads.

The algorithms presented in this work fulfill two tasks: 1) the computation of flight trajectories that account for the dynamics and actuator limitations of quadrocopters, and 2) the accurate tracking of a given periodic trajectory by using data from previous executions.

The first part of the dissertation describes several algorithms for the computation of flight trajectories for quadrocopters. A first algorithm in this category serves to benchmark the performance of flight systems (Paper I). The goal here is to compute quantitative answers to questions about the achievable performance of flight systems and the influence of system parameters and architecture on that performance. The algorithm allows the computation of flight maneuvers that satisfy the optimality conditions of Pontryagin's minimum principle with respect to the minimization of the maneuver duration. The maneuver duration computed in this way represents a lower bound on the duration of the maneuver, and can be used as a reference value for the performance of the system. The maneuvers are computed using a two-dimensional quadrocopter model. Based on Pontryagin's minimum principle, it is shown that the optimal control signals for the total thrust are always maximal or minimal, while those for the rotational rate are maximal, minimal, or given by a singularity condition. The algorithm allows the computation of time-optimal flight maneuvers for arbitrary initial and final states by solving the boundary value problem induced by the minimum principle.

The above algorithm provides valuable information for the design of flight systems, but its high computational cost and its restriction to a two-dimensional model make it unsuitable for the dynamic planning of flight paths. Such dynamic planning is necessary when, during the execution of a maneuver, the target point changes, significant disturbances occur, or new information about obstacles in the environment becomes available. In such scenarios it is necessary to plan new flight paths quickly and reliably as soon as new information is available. The second algorithm presented in this work (Paper II) was developed with such real-time requirements in mind. The goal in this context is to plan, at low computational cost, fast flight trajectories from a large range of initial states to a target point that is reached at rest. The approach used here decouples the three translational degrees of freedom, with constraints on the states and control inputs of the individual degrees of freedom accounting for the dynamic properties of the quadrocopter. A feasible flight trajectory can thus be computed on a laptop within a few microseconds, and any additional available computation time can be used to iteratively improve the trajectory. Replanning the trajectory at a high rate allows the trajectory generation to be used as an implicit feedback law, in which case it operates similarly to model predictive control. The applicability of the decoupling approach to other trajectory generation tasks is demonstrated in an extension of the algorithm to interception trajectories (Paper III).

The second part of this dissertation presents research on algorithms that use data from previous experiments to reduce the error when following a periodic reference trajectory. Flight controllers are typically designed using simplified models of the complex flight dynamics of quadrocopters, models that are often subject to significant parameter uncertainties. This can lead to large tracking errors when flying high-performance trajectories, errors that are largely repeatable across experiments. This part of the work presents algorithms that allow the non-causal correction of such repeatable tracking errors (Papers IV and V). To this end, conventional feedback controllers are augmented by an additional feedforward input that uses measurement data from previous executions. The feedforward signal is learnt in the frequency domain, based on a Fourier decomposition of the input and output signals of the system. In addition, a time-scaling method was developed that allows the learning process to begin during slow executions of the maneuver, with the learnt correction signal then transferred to higher maneuver speeds.

All algorithms presented in this dissertation were validated experimentally with quadrocopters in the Flying Machine Arena. These tests demonstrate their performance and confirm the assumptions made in the development of the algorithms. The learning algorithms of the second part were further validated in an experiment in which a quadrocopter balances an inverted pendulum (Paper VI). These experiments serve to confirm the applicability of the methods to more complex flight applications.

In addition, the algorithms were presented in demonstrations both at ETH Zurich and at international events. More than 150 live demonstrations confirm the robustness and reliability of the presented algorithms.
Acknowledgments
Many people have contributed directly and indirectly to the research presented in this
thesis, and this thesis would not have been possible without them. I would like to
use this opportunity to thank some of them, knowing full well that this list cannot be
comprehensive.
My sincerest gratitude goes to my doctoral advisor Professor Raffaello D’Andrea. He
put his trust in me by accepting me as a doctoral candidate as well as giving me the
freedom to work on the topics I considered interesting. I have benefited immensely from
our numerous discussions and his guidance, insight, and inspiration. For this, I wish to
say a heartfelt thank you to him. Raff has created a truly exceptional environment at
the Institute for Dynamic Systems and Control (IDSC), one in which working felt more
like fun than work, and where there were always more ideas than time.
I also sincerely thank Professor Vijay Kumar and Professor Roy Smith for agreeing to
be members of my doctoral committee and for their willingness to spend considerable
time reviewing this thesis. I greatly appreciated their insightful questions and comments.
My time at the IDSC was spent in the company of many friendly, helpful, and ex-
tremely intelligent colleagues. Our countless discussions were always interesting, and
often helped a great deal in progressing our individual or joint research projects. I have
learnt a lot from these colleagues, and now consider many of them to be good friends.
My closest collaborators throughout the doctorate were the other doctoral candidates
involved in the Flying Machine Arena project: Sergei Lupashin, Angela Schöllig, Mark
Müller, Robin Ritz, Federico Augugliaro, and Dario Brescianini. To work in a group of
highly motivated, skillful, and talented people has been a great pleasure and a hugely
educational and rewarding experience. I feel that our work often showed that the whole
really can be greater than the sum of its parts.
While we did not work on the same project, my office mate Philipp Reist and I shared
teaching assistant duties and, by virtue of starting at the IDSC on the same day, the
progress and struggles of our doctorates. I truly appreciated working with him, as well as
our discussions and out-of-work projects. I also thoroughly enjoyed sharing teaching tasks
with Sebastian Trimpe, whose methodical, well-thought-out, and structured approach
to problems made him a role model to me throughout this doctorate.
The IDSC supports its doctoral candidates with its excellent support staff. I would like
to thank Katharina Munz for her organizational work, Markus Waibel for his management
of numerous demos and his help in disseminating our research results, Hallie Siegel for
proof-reading and improving our publications, Carolina Flores for her design and media
work, and Igor Thommen, Marc-André Corzillius, and Hans Ulrich Honegger for their
technical support in the Flying Machine Arena.
I am greatly indebted to the talented students I had the pleasure of working with over
the past four years. Many of them contributed significantly to the results of this thesis,
and I am happy to see that many now pursue a doctorate themselves. I would like to
particularly thank Luzius Brodbeck, Robin Ritz, and Dario Brescianini for their excellent
work that directly improved and extended this research.
A great number of people have supported and guided me throughout the years, and
the influence of many has shaped my life. Of all of them, the most important ones were
certainly my parents Silvia and Wolfgang, who always supported and encouraged me.
Finally, I owe a huge thank you to Anna Hänsli, who has accompanied me during the
past five years. She has always stood by my side unconditionally, and gave me whatever
support I needed.
Financial Support
The author gratefully acknowledges partial funding of this research by the Swiss Na-
tional Science Foundation (SNSF) under the grant “High-Performance Maneuvers and
Trajectory Generation for Quadrotor Flying Vehicles” and the equipment grant “Optical
Motion Capture System for Robot Experiments in Real World Environments”.
Contents
Preface
1. Introduction
   1.1 Part A: Trajectory Generation
   1.2 Part B: Iterative Learning of Periodic Motions
2. Contributions
   2.1 Part A. Trajectory Generation
   2.2 Part B. Iterative Learning of Periodic Motions
   2.3 List of Publications
   2.4 List of Supervised Student Projects
   2.5 Outreach
3. Future Directions
   3.1 Part A. Trajectory Generation
   3.2 Part B. Iterative Learning of Periodic Motions
References
A. TRAJECTORY GENERATION

Paper I. Performance Benchmarking of Quadrotor Systems Using Time-Optimal Control
   1. Introduction
   2. Modeling of Vehicle Dynamics
   3. Minimum Principle for Time-Optimal Maneuvers
   4. Algorithm for Calculation of Time-Optimal Maneuvers
   5. Numerical Results
   6. Benchmarking of Quadrotor Designs and Controllers
   7. Experimental Results
   8. Conclusion and Future Work
   A. Algorithm for Calculation of Time-Optimal Maneuvers
   References
Paper V. An Iterative Learning Scheme for High Performance, Periodic Quadrocopter Trajectories
   1. Introduction
   2. The Flying Inverted Pendulum Experiment
   3. Learning Algorithm
   4. Experimental results
   5. Conclusion and Outlook
   References

Paper VI. A Flying Inverted Pendulum
   1. Introduction
   2. Dynamics
   3. Nominal trajectories
   4. Dynamics about nominal trajectories
   5. Controller Design
   6. Experimental results
   7. Conclusion and Outlook
   References

Curriculum Vitae
Preface
This thesis documents the research carried out by the author during his doctoral stud-
ies under the supervision of Professor Raffaello D’Andrea at the Institute for Dynamic
Systems and Control (IDSC) at ETH Zurich between September 2009 and April 2014.
The thesis is presented in the form of a cumulative dissertation: its main content
consists of six self-contained research articles (of which three are journal articles and
three are conference contributions) that have been published or submitted for publication
during the doctoral studies. The research is structured into two classes of problems: The
generation of flight trajectories for quadrocopters (Part A), and the precise tracking of
periodic trajectories through learning methods (Part B). The articles are put into context
by three introductory chapters, which are structured as follows:
Chapter 1 sketches out the motivation for this work. It also describes the problems
considered in this thesis, related work, and the approaches used. Chapter 2 describes the
key contributions of the research papers included in this thesis and how the individual
papers relate to each other. This chapter also includes references to related papers,
created in the context of the author’s doctoral studies, that are not included in this
thesis. Chapter 3 then provides a discussion of potential extensions and new directions
of this research.
1. Introduction
This thesis presents algorithms for the autonomous high-performance flight of quadro-
copters. The algorithms provide means to plan flight paths and to accurately track
periodic motions, with a particular focus on the combination of computational efficiency
and high flight performance.
Quadrocopters are robotic platforms that are frequently used in applications where
the ability to move freely in three-dimensional space and to overcome obstacles on the
ground are crucial [32]. Their high mechanical robustness (due to few moving parts) [27],
their safety (due to comparatively small rotor sizes) [15], and their ability to engage in
high-performance maneuvers and carry large payloads (due to their typically high thrust-
to-weight ratio) [18] are significant advantages over other small robotic platforms.
To further increase payload capabilities without increasing the rotor size, and to
increase robustness to actuator failures, other multi-rotor configurations are also being
designed and used [18]. Popular designs include hexacopters (with six propellers) and
octocopters (eight propellers). While this thesis is focused on quadrocopter applications,
the algorithms also apply directly to configurations with more rotors mounted in parallel,
in particular typical hexacopters and octocopters [1].
This thesis considers the trajectory generation and control of quadrocopters during
high-performance flight. In this context, high performance is used to highlight that the
algorithms are designed to allow full use of a vehicle’s capabilities. While this is desirable
for many applications (for example, it may allow faster execution of a given task or higher
accuracy), the ability to exploit the dynamic potential of the vehicle can also be seen as
a safety feature: even if the planned motion does not require the use of a vehicle’s full
dynamic capabilities, their efficient use may be crucial in recovering from unexpected
disturbances or avoiding collisions when obstacles are detected mid-flight.
From a controls perspective, high-performance quadrocopter flight is a challenging
proposition for several reasons:
− The inherently unstable and nonlinear dynamics typically limit the use of
linear control laws to flight near hover or near a trajectory planned ahead of time
(see e.g. [17], and references therein).
mounting of the propellers. Flying fast maneuvers and effectively suppressing dis-
turbances thus requires fast control loops and therefore limits the amount of com-
putation that can be carried out.
− Significant model uncertainties arise when the maneuvering speed increases, for
example due to the varying drag and lift behaviour of the propellers under unsteady
inflow conditions [6], the aerodynamic effects of a vehicle moving through the tur-
bulent wake of its propellers [2], and external influences such as wind or ground and
wall effects when operating in proximity to the environment [28]. These uncertain-
ties diminish the effectiveness of model-based control strategies.
The research presented in this thesis addresses two problems related to the control
of quadrocopters. The key objective of the research presented in part A of this thesis is
the generation of dynamic trajectories for quadrotor vehicles. Part B then focuses on the
design of algorithms that can be used to track a predefined, periodic trajectory with high
accuracy by using data from past executions.
A central part of this research project is the experimental validation of the developed
algorithms in order to demonstrate their effectiveness under real-world conditions and
provide insight into their characteristics. Each of the algorithms presented in this thesis
was applied to small quadrocopters (see Figure 1.1), and experimental results are pre-
sented for each of the research contributions. The experiments were carried out in the
Flying Machine Arena [20], a platform for aerial robotics research that was co-developed
by the author throughout the course of his doctoral studies.
The following sections specify the objectives and context of each of the two parts of the
research in more detail, and discuss the different approaches presented in the individual
contributions.
Photo by Raymond Oung
Figure 1.1 Quadrocopters used for the experimental validation of algorithms. These vehicles
measure approximately 35 cm from propeller center to propeller center, and weigh about 500 g.
1.1 Part A: Trajectory Generation
Trajectory generation algorithms form a core component in unmanned aerial vehicle flight
systems. They are closely interlinked with the control of the vehicle, and provide crucial
information (such as arrival time at a specified point or flight paths) to high-level planning
systems, e.g. mission planners or collision avoidance systems. Their impact on overall
system performance and efficiency is significant, as they typically determine the speed
of task execution. This is made even more important by severe limitations in achievable
flight time of state-of-the-art quadrocopters.
The problem can be stated as follows: Given the current state x0, plan a trajectory
to a target state xT. The trajectory must be feasible with respect to the dynamics of the
quadrocopter. A multitude of trajectory design criteria may be desirable, e.g. reaching
the target state at a specified time, minimizing energy consumption, avoiding a certain
state space region, or arriving at the target state as quickly as possible. In its general
form, the trajectory generation problem is the search for the control input trajectory
that minimizes the performance metric, subject to the constraints of the system dynamics
and possibly other design constraints, and that terminates at the specified terminal state.
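In generic notation, the problem reads as the optimal control problem below. The symbols used here (f for the vehicle dynamics, J for the design criterion, and the input constraint set U) are placeholders of this sketch, not the notation of the included papers:

```latex
\begin{aligned}
\min_{u(\cdot),\;T} \quad & J\bigl(x(\cdot),\,u(\cdot),\,T\bigr) \\
\text{subject to} \quad & \dot{x}(t) = f\bigl(x(t),\,u(t)\bigr), \quad t \in [0,\,T], \\
& u(t) \in \mathcal{U}, \\
& x(0) = x_0, \qquad x(T) = x_T,
\end{aligned}
```

where the minimum-time criterion discussed below corresponds to J = T.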
Related Work
The problem of quadrocopter trajectory generation has received significant attention in
recent years, and a number of algorithms have been presented. Broadly speaking, a
possible categorization of the algorithms is as follows:
A first group of algorithms can be considered as primarily geometric. The trajectory
generation process consists of first generating a path in space from a class of path prim-
itives, and thereafter parameterizing the generated path in time such that the dynamic
constraints of the quadrocopter are enforced. Examples of such algorithms have been
presented using path primitives such as lines [16], polynomials [9], or splines [5].
A second group of algorithms is based on the design of trajectories that minimize a
derivative of the position trajectory (or combinations thereof). Because these derivatives
can be related to the control input constraints of the quadrocopter through its differential
flatness property, the feasibility of the trajectory depends on these derivatives. Examples
of such methods include minimum snap trajectory generation [21] and the minimization
of a weighted sum of derivatives [30]. More real-time-focused implementations are based
on model predictive control methods with learnt linear dynamics [4] or linear dynamics
based on a decoupling of the system [24].
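As an illustration of this second group, the minimum-snap criterion of [21] and the weighted-derivative criterion of [30] can be sketched as costs over the position trajectory p(t); this compact notation, with user-chosen weights c_k, is a simplification of the cited formulations:

```latex
J_{\mathrm{snap}} = \int_0^T \biggl\| \frac{\mathrm{d}^4 p}{\mathrm{d}t^4}(t) \biggr\|^2 \mathrm{d}t,
\qquad
J_{\mathrm{weighted}} = \int_0^T \sum_k c_k \biggl\| \frac{\mathrm{d}^k p}{\mathrm{d}t^k}(t) \biggr\|^2 \mathrm{d}t.
```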
Research Directions
The specific trajectory design criterion that this thesis focuses on is minimum time:
the design objective is to minimize the maneuver's duration until the target state is
reached. For high-performance flight, minimum-time maneuvers
are of particular interest because they determine a fundamental performance limit of
the system considered. They can provide definitive answers to questions such as “If a
maneuver must start at the state x0 and end at the state xT , what is the minimum time
required to execute it?” and “Is it possible to reach state xT from the initial state x0
within some given duration?” [14].
The first direction of research presented herein is the design of algorithms that com-
pute time-optimal flight maneuvers for quadrocopters. The computed trajectories can
be used not only to execute maneuvers, but also as a benchmarking tool that provides
valuable insight into the capabilities of the quadrotor system. Time-optimal maneuvers
can highlight the influence on performance of such varied design parameters as the vehicle
weight, the measurement range of the internal sensors, or the feedback control law. The
application of state-of-the-art optimal control techniques (e.g. [3, 11]) to compute these
time-optimal maneuvers is made difficult by the nonlinearity and high order of the quadro-
tor dynamics. A key challenge when designing algorithms that are capable of computing
these maneuvers is thus to find suitable combinations of problem class restrictions (to
reduce complexity) and numerical methods (to improve computational tractability). For
these results to maintain their value as a benchmarking tool, it is particularly important
that the generated trajectories are indeed time-optimal.
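The structure of these time-optimal maneuvers, established in Paper I with a two-dimensional model, can be sketched as follows; the compact notation (thrust F, pitch θ, rotational rate ω) is a simplified rendering and may differ in detail from the paper's:

```latex
\ddot{x} = F\sin\theta, \qquad
\ddot{z} = F\cos\theta - g, \qquad
\dot{\theta} = \omega,
\qquad F \in [F_{\min},\,F_{\max}], \quad |\omega| \le \omega_{\max}.
```

Because the minimum-time Hamiltonian is affine in both inputs, Pontryagin's minimum principle forces F to one of its limits at all times (bang-bang), while ω is either at a limit or determined by a singular arc whenever its switching function vanishes; this is the bang-singular structure referred to in the abstract.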
The second focus of this research is the design of algorithms that can generate tra-
jectories quickly enough to be deployed in real-time scenarios. This allows a system
to cope with changing requirements, which in the case of quadrotor vehicles may be
caused by rapidly changing destinations, suddenly appearing obstacles, or the recovery
from arbitrarily perturbed states (caused, for example, by large wind gusts). Running
the trajectory generation algorithm at every controller update allows its use as an
implicit feedback law: closed-loop control is achieved by re-generating a trajectory at
each controller update and applying its first control inputs, in the manner of model
predictive control [10]. The combination of long, variable planning horizons, nonlinear
dynamics, and short available calculation times makes this a challenging optimization
problem. Compared to the benchmarking problem, the focus here is shifted away from
finding truly time-optimal maneuvers, towards finding maneuvers of relatively short
duration under stringent real-time requirements. In this context, it is acceptable for the
generated trajectories not to satisfy any optimality conditions; in real-time scenarios, the
absolute performance of planned trajectories is secondary to the guarantee of finding a
feasible trajectory of reasonable performance within the given computation
time constraints.
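The replanning loop described above can be sketched in a few lines. A one-dimensional double integrator stands in for a single translational axis, and `first_input` is an illustrative minimum-effort planner in closed form; both are assumptions of this sketch, not the algorithm of Paper II:

```python
# Sketch: trajectory generation as an implicit feedback law (an MPC-like scheme).
# At every controller update, a fresh trajectory is planned from the *measured*
# state to the target, and only its first input is applied.

def first_input(p, v, p_target, T):
    """First acceleration of the minimum-integral-squared-acceleration
    trajectory driving a double integrator from (p, v) to (p_target, 0)
    in time T; closed form a(0) = (6*dp - 4*T*v) / T**2."""
    dp = p_target - p
    return (6.0 * dp - 4.0 * T * v) / T ** 2

def fly_to(p_target, horizon=2.0, dt=0.02, steps=125):
    """Replan at every step with a shrinking (but lower-bounded) horizon,
    integrate the vehicle one step, and repeat: closed-loop behaviour
    emerges purely from the replanning."""
    p, v = 0.0, 0.0
    for k in range(steps):
        T = max(horizon - k * dt, 0.1)   # keep the horizon bounded away from 0
        a = first_input(p, v, p_target, T)
        p += v * dt + 0.5 * a * dt ** 2  # simple forward integration
        v += a * dt
    return p, v

p_end, v_end = fly_to(1.0)  # arrives approximately at rest at the target
```

Because the planner is rerun from the measured state, disturbances are rejected without an explicit feedback law; this is the sense in which repeated trajectory generation acts like model predictive control.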
1.2 Part B: Iterative Learning of Periodic Motions

The second part of this thesis considers problems where the flight trajectory is defined
ahead of flight, and the objective is to track this trajectory as accurately as possible.
Accurate trajectory following is particularly relevant for operations where the tracking
performance of the system determines the quality of the task. Examples include carrying
a hanging payload with one or multiple quadrocopters, inspecting an object (like a bridge)
along a pre-computed path, painting surfaces that are difficult to access, or filming a scene
with a camera mounted on the vehicle.
The tracking problem complements the research projects presented in Part A: whereas
real-time trajectory generation algorithms are useful for dynamically changing tasks or
environments, we now consider problems where the trajectory is fully defined ahead of
flight. The large model uncertainties, in particular at high maneuvering speeds, make the
accurate tracking of high-performance trajectories challenging because they effectively act
as disturbances to the closed-loop trajectory tracking system.
In the research presented herein, we consider the specific case where the same trajec-
tory can be flown repeatedly. This offers an opportunity to improve tracking performance
because many of the disturbances that degrade tracking performance will be similar each
time the vehicle performs the motion. It is thus possible to non-causally correct for these
repeatable disturbances: by using data from previous executions to characterize them,
model-based correction inputs can be computed before executing the motion again. Ide-
ally, these correction inputs are able to fully compensate for the repeatable disturbances,
such that the feedback controller is required only to compensate for non-repeatable dis-
turbances.
Related Work
While the application of learning algorithms (and specifically non-causal strategies) to
stationary systems is well-established, its use for flying vehicles is less mature and has
only been actively researched in recent years. Several high-performance maneuvers for
multi-rotor vehicles have been demonstrated with the use of learning algorithms. Broadly
speaking, these learning approaches can be categorized in two groups:
The first group is characterized by its ability to learn motions that are parameterized.
The motion is thus described by a (finite) set of design parameters chosen by the user.
After the execution of the motion, these parameters are adapted to compensate for ob-
served disturbances. The direction and magnitude of the correction may be model-based,
or based on the user’s intuition. A discussion on the importance of choosing ‘good’ design
parameters may be found in [19], where a learning algorithm for this kind of parameter-
ized motion is demonstrated for multiple flips and fast translations with quadrocopters. A
further demonstration of this class of learning algorithms is provided in [22]. The ability
to shape the tracking performance strongly depends on the number of parameters that
are optimized; in the above examples, the objective is to minimize the error at specific
time instants (‘key frames’), and a relatively small number of parameters is sufficient to
do so. This makes the methods computationally lightweight.
The second group of learning approaches considers more generic motions that need
not be specified by parameters. The system dynamics are considered in discrete time,
and the correction consists of correction values (typically control inputs or set points)
for each discrete time step. After executing the motion, the correction values are nu-
merically optimized in order to minimize a performance metric related to the tracking
error. In this optimization, a model of the system dynamics provides the mapping from
Figure 1.2 Architecture of a learning algorithm augmenting feedback control. The reference
to the feedback controller is augmented by adding the learning control input to it. The feedback
control runs in real-time, while the iterative learning applies feedback based on past executions.
In this configuration, the iterative learning component can non-causally compensate for repeatable
disturbances, while the feedback control compensates for non-repeatable disturbances.
corrections to the tracking error. This approach is commonly known as a form of iterative
learning control [7], and its application to high performance quadrocopter flight has been
demonstrated by several researchers [23, 25, 29, 31].
The distinction between the two groups is not strict. Indeed, the second group of
learning approaches could be seen as using a very large number of values to parameterize
the correction.
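The update of this second group can be made concrete with a small lifted-system sketch. The integrator map, the disturbance signal, and the inversion-based learning gain below are illustrative assumptions for a one-dimensional toy system, not the schemes of [23, 25, 29, 31]:

```python
import numpy as np

# Toy lifted-system model of one execution: e = e0 - G @ u, where G maps the
# stacked per-time-step corrections u to the stacked tracking error e, and e0
# is the repeatable error of the uncorrected run. G here is the lifted map of
# a discrete integrator -- a placeholder, not a quadrotor model.
n, dt = 20, 0.05
G = np.tril(np.ones((n, n))) * dt              # causal: u[k] affects e[k], e[k+1], ...
e0 = np.sin(np.linspace(0.0, 2.0 * np.pi, n))  # repeatable disturbance over the maneuver

L = 0.5 * np.linalg.inv(G)   # model-inversion learning gain, scaled down for robustness
u = np.zeros(n)              # correction values, one per discrete time step
for _ in range(25):          # each pass corresponds to one execution of the motion
    e = e0 - G @ u           # "fly" the maneuver and record the tracking error
    u = u + L @ e            # numerically update the corrections between executions

print(np.linalg.norm(e0), np.linalg.norm(e0 - G @ u))  # error shrinks across executions
```

With an exact model, the full inverse gain would cancel the error in a single iteration; the factor 0.5 mimics the cautious updates used when the model is only approximate.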
Research Directions
The research in this thesis is centered on the development of algorithms for the learning
of non-causal correction inputs for maneuvers that are periodic. It explores how the
limitation to periodic maneuvers can be exploited in a learning control context, and how
computationally inexpensive algorithms can be designed for this class of problems. To
combine the strengths of feedback control and learning control, the learning control is
used to augment the feedback controller as seen in Figure 1.2.
A particular focus of the algorithms in this thesis is the application of learning control
to high-performance maneuvers. In this context, traditional learning control strategies
may encounter a problem: in order to learn from past executions, the system must be
able to execute the maneuver before learning control is applied. Due to the unstable
dynamics of quadrocopters, and to physical constraints such as limited operating spaces,
high-performance maneuvers may not be executable without the appropriate correction
inputs from the learning control algorithm. This initialization problem requires a novel
approach that provides a sufficiently good initial guess of the correction inputs.
2 Contributions
This chapter briefly summarizes the key contributions of the papers included in this
thesis and explains the relationship between the individual results. Many of the results
were obtained in close collaboration with other researchers at the Institute for Dynamic
Systems and Control, as indicated below.
2.1 Part A. Trajectory Generation
Paper I
[P1] Markus Hehn, Robin Ritz, and Raffaello D’Andrea. Performance Benchmark-
ing of Quadrotor Systems Using Time-Optimal Control. Autonomous Robots,
Volume 33, Issue 1-2, pp 69-88, 2012.
Contribution The paper presents the structure of time-optimal maneuvers that arises
from the necessary optimality conditions given by Pontryagin’s minimum principle [3]. It
shows that time-optimal maneuvers are bang-singular: the thrust control is always maxi-
mal or minimal, and the rotational rate control input is maximal, minimal, or defined by
a singular arc constraint. Based on this structure, the paper introduces an algorithm for
computing the time-optimal maneuver for arbitrary initial and final states. To show that
the approach yields results that are meaningful for real-life quadrocopters, the computed
maneuvers are validated experimentally. The method’s use as a benchmarking tool is
demonstrated through the analysis of the performance implications of using a cascaded
linear controller, and the influence of the maximum thrust the vehicle can produce.
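The flavor of this bang-bang structure can be illustrated on a double integrator, used here as a crude stand-in for a single translational degree of freedom (the quadrotor model in the paper, with its singular arcs, is considerably richer). For a rest-to-rest translation, the time-optimal input saturates at one bound and switches once to the other:

```python
import math

# Time-optimal rest-to-rest motion of a double integrator x'' = u, |u| <= u_max:
# an illustrative one-axis analogue, not the model used in the paper.
def time_optimal_rest_to_rest(d, u_max):
    """Return total duration and switching time for moving a distance d > 0."""
    T = 2.0 * math.sqrt(d / u_max)   # accelerate at +u_max, then brake at -u_max
    return T, T / 2.0                # single switch at mid-maneuver

def simulate(d, u_max, dt=1e-4):
    """Integrate the bang-bang input; return the final position and velocity."""
    T, t_sw = time_optimal_rest_to_rest(d, u_max)
    x = v = t = 0.0
    while t < T:
        u = u_max if t < t_sw else -u_max
        v += u * dt
        x += v * dt
        t += dt
    return x, v

x_f, v_f = simulate(d=5.0, u_max=2.0)
print(x_f, v_f)  # approximately (5, 0): the target is reached at rest
```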
Related Publications
[R5] Robin Ritz, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Quadro-
copter Performance Benchmarking Using Optimal Control. Proceedings of the
2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 2011.
This paper presents an abridged version of the optimality conditions and trajec-
tory generation algorithm, including preliminary results.
Paper II
[P2] Markus Hehn and Raffaello D’Andrea. Real-Time Trajectory Generation for
Quadrocopters. IEEE Transactions on Robotics (under review).
Context This paper considers the problem of generating high-performance flight tra-
jectories for quadrocopters under hard real-time constraints. As discussed in Chapter 1,
such computational constraints typically arise when quadrocopters are flown in dynam-
ically changing environments. The objective of the trajectory generation considered in
this paper is to plan trajectories from a large range of initial conditions reaching a target
point at rest, and for these trajectories to reach this target point as quickly as possible.
As opposed to [P1], the full three-dimensional planning problem is considered, and the
design objective is to use the trajectory generator as a feedback control law by executing
it repeatedly in the fashion of model predictive control.
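A one-dimensional sketch of this replanning-as-feedback idea: at every controller step, a plan is computed from the measured state and only its first input is applied. The closed-form switching-curve "planner" below is the classical double-integrator construction, used purely for illustration and not the algorithm of [P2]:

```python
# First input of the time-optimal plan for the double integrator x'' = u,
# |u| <= u_max, driving the measured state (x, v) to the origin
# (classical switching-curve construction).
def plan_first_input(x, v, u_max):
    s = x + v * abs(v) / (2.0 * u_max)   # switching function
    if s > 1e-9:
        return -u_max
    if s < -1e-9:
        return u_max
    return -u_max if v > 0 else (u_max if v < 0 else 0.0)

# Closed loop: replan at every step, apply only the first input (MPC flavor).
x, v, u_max, dt = 4.0, -1.0, 2.0, 1e-3
for _ in range(5000):
    u = plan_first_input(x, v, u_max)
    v += u * dt
    x += v * dt
print(x, v)  # the state ends in a small neighborhood of the origin
```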
microseconds on a laptop computer, and remaining computing time can be used to iter-
atively improve the trajectory. The paper includes the full description of the trajectory
generation algorithm, an analysis of its computational and flight performance, and an
experimental validation.
Related Publications
[R2] Angela Schoellig, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Feasi-
bility of Motion Primitives for Choreographed Quadrocopter Flight. Proceedings
of the 2011 American Control Conference (ACC), 2011.
This paper presents the derivation of fundamental feasibility conditions for trans-
lational trajectories from the dynamics and input constraints of quadrocopters.
[R3] Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and
Control. Proceedings of the 2011 IFAC World Congress, 2011.
This paper presents the decoupling methodology used in this trajectory genera-
tion approach, and presents preliminary results on the use of the basic decoupled
trajectory generation mechanism.
[R7] Dario Brescianini, Markus Hehn, and Raffaello D’Andrea. Quadrocopter Pole
Acrobatics. Proceedings of the 2013 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2013.
This paper presents a system that allows quadrocopters to balance an inverted
pendulum, throw it into the air, and catch and balance it again on a second
vehicle. The catching maneuver is planned in real time using the presented
decoupling approach, and part of the motion is planned using the decoupled
trajectory generator.
[R11] Federico Augugliaro, Sergei Lupashin, Michael Hamer, Cason Male, Markus
Hehn, Mark W. Mueller, Jan Willmann, Fabio Gramazio, Matthias Kohler, and
Raffaello D’Andrea. The Flight Assembled Architecture Installation: Coopera-
tive construction with flying machines. IEEE Control Systems Magazine (under
review).
This paper presents the design and realization of a system that autonomously
assembled a six-meter-tall tower consisting of 1500 foam modules with quadro-
copters over six days in front of a live audience. The trajectory generation
algorithm used in this system is an adapted version of the one presented in [P2].
Paper III
[P3] Markus Hehn and Raffaello D’Andrea. Real-Time Trajectory Generation for In-
terception Maneuvers with Quadrocopters. Proceedings of the 2012 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), 2012.
Context This paper represents a continuation of the work in [P2]. While the results
in [P2] focus on trajectory generation problems where the objective is to reach a target
point at rest as quickly as possible, this work considers the problem of intercepting a
position, i.e. planning a trajectory such that the vehicle is at a given position at a
specified time. In addition to the interception constraint, the planned trajectories are
designed to come to rest as quickly as possible after interception.
Contribution The main contribution of this paper is the adaptation of the trajectory
generation algorithm presented in [P2] to the interception problem. Using the same de-
coupling approach, Pontryagin’s minimum principle is used to show that the structure
of an interception maneuver with minimal time to rest after the interception is identical
to the structure of a maneuver that guides the vehicle to a target point. Thus, the tra-
jectory generation algorithm from [P2] can be adapted to interception problems without
increasing its computational complexity. The algorithm is validated experimentally by
intercepting a ball thrown in the air.
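In one dimension, the interception constraint and the subsequent braking phase can be sketched as follows; the constant-acceleration interception plan is a deliberate simplification for illustration (the paper instead derives the time-optimal, bang-bang structure):

```python
# 1D sketch of the interception idea: be at position x_T exactly at time T,
# then brake to rest as quickly as possible afterwards. A constant-acceleration
# plan is used here purely for illustration.
def intercept_then_rest(x0, v0, x_T, T, u_max):
    a = 2.0 * (x_T - x0 - v0 * T) / T**2   # meets x(T) = x_T exactly
    assert abs(a) <= u_max, "interception infeasible for this input bound"
    v_T = v0 + a * T                        # velocity at the interception instant
    t_rest = abs(v_T) / u_max               # brake at the full input bound afterwards
    return a, v_T, t_rest

a, v_T, t_rest = intercept_then_rest(x0=0.0, v0=0.0, x_T=2.0, T=2.0, u_max=5.0)
print(a, v_T, t_rest)  # a = 1.0, v_T = 2.0, t_rest = 0.4
```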
Related Publications
[R6] Robin Ritz, Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. Coop-
erative Quadrocopter Ball Throwing and Catching. Proceedings of the 2012
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
2012.
This paper presents the system design of an experiment in which multiple
quadrocopters are attached to an elastic net, cooperatively launch a ball into
the air by stretching the net, and catch it again. The real-time trajectory gen-
eration problem of intercepting the flying ball with the net is solved by applying
a one-dimensional interception algorithm that is based on the same decoupling
approach and shares the same maneuver structure.
2.2 Part B. Iterative Learning of Periodic Motions
Paper IV
[P4] Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Learning
Algorithm for High-Performance, Periodic Quadrocopter Maneuvers. Mecha-
tronics (under review).
and the learnt corrections from the reduced speed are used to initialize the learning at
higher speeds. Experimental results show that the approach can reduce tracking errors
nearly to the repeatability limit of the system and that it can be applied to more complex
systems (as demonstrated by the flying inverted pendulum experiment in particular).
Related Publications
[R8] Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Feed-
Forward Learning Scheme for High-Performance Periodic Quadrocopter Maneu-
vers. Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), 2013.
This paper presents an abridged version of the learning scheme including pre-
liminary results. As opposed to [P4], the convergence properties are derived in
a deterministic context.
[R10] Sergei Lupashin, Markus Hehn, Mark W. Mueller, Angela P. Schoellig, Michael
Sherback, and Raffaello D’Andrea. A Platform for Aerial Robotics Research and
Demonstration: The Flying Machine Arena. Mechatronics, Volume 24, Issue 1,
pp 41-54, 2014.
This paper includes a more detailed presentation and analysis of the trajectory
tracking controller that is used in the learning process, as well as describing the
overall hardware and software infrastructure that is used for the experiments in
the Flying Machine Arena.
Paper V
[P5] Markus Hehn and Raffaello D’Andrea. An Iterative Learning Scheme for High
Performance, Periodic Quadrocopter Trajectories. Proceedings of the 2013 Eu-
ropean Control Conference (ECC), 2013.
Context This paper considers a similar problem setup to [P4]. The architecture of
the learning system differs in that the learnt corrections are added directly to the control
inputs instead of changing the reference signal to the feedback controller. A similar fre-
quency domain learning approach is used, but the problem is formulated in a deterministic
setting.
Contribution This paper extends [P4] to focus on the convergence of learning systems
where the dimension of the tracking error and the correction input differ. The algorithm
is designed to minimize an error output that consists of linear combinations of the system
states, and error convergence conditions are derived. For systems where the number of
correction inputs differs from the number of error outputs, it is shown that the correction
Paper VI
[P6] Markus Hehn and Raffaello D’Andrea. A Flying Inverted Pendulum. Proceed-
ings of the 2011 IEEE International Conference on Robotics and Automation
(ICRA), 2011.
Context The flying inverted pendulum is an experimental test bed that consists of a
quadrocopter balancing an inverted pendulum. The feedback controller used to stabilize
the inverted pendulum on the quadrocopter is an example of a more complex, task-specific
quadrotor control law. This makes it an interesting experiment to verify the applicability
of learning algorithms to more complex tasks. The experiment highlights the limitations
of model-based control that can be overcome by learning control: the system is subject
to large, repeatable disturbances caused by unmodeled dynamics (such as aerodynamic
effects and the pendulum attachment point not coinciding with the quadrocopter’s center
of mass) and parameter uncertainties (such as the length of the pendulum). Both [P4]
and [P5] present experimental results demonstrating the ability of the learning algorithms
to significantly improve tracking performance of the flying inverted pendulum.
Contribution This paper details the system design of the flying inverted pendulum
experiment. It contains the derivation of the combined dynamics of quadrocopter and
pendulum, the derivation of equilibrium conditions and linearized dynamics for stationary
and circular flight, and the design of LQR controllers. The system design is verified
experimentally.
Related Publications
[R7] Dario Brescianini, Markus Hehn, and Raffaello D’Andrea. Quadrocopter Pole
Acrobatics. Proceedings of the 2013 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2013.
This paper extends the flying inverted pendulum experiment to a system that
allows quadrocopters to balance an inverted pendulum, throw it into the air,
and catch and balance it again on a second vehicle.
2.3 List of Publications
This section lists all peer-reviewed journal articles and conference contributions that
were published or submitted by the author during his doctoral studies. The publica-
tions [P1]-[P6] are featured in this thesis. The related publications [R1]-[R11] present
preliminary, related, or additional research results from the author’s doctoral studies.
Where applicable, their respective connection to the publications [P1]-[P6] is highlighted
in the descriptions of the contributions of individual papers (Sections 2.1 and 2.2).
The journal manuscripts [P2], [P4], and [R11] are currently under review. Preliminary
results of [P2] and [P4] have been published in [R3] and [R8], respectively.
[P1] Markus Hehn, Robin Ritz, and Raffaello D’Andrea. Performance Benchmark-
ing of Quadrotor Systems Using Time-Optimal Control. Autonomous Robots,
Volume 33, Issue 1-2, pp 69-88, 2012.
[P2] Markus Hehn and Raffaello D’Andrea. Real-Time Trajectory Generation for
Quadrocopters. IEEE Transactions on Robotics (under review).
[P3] Markus Hehn and Raffaello D’Andrea. Real-Time Trajectory Generation for In-
terception Maneuvers with Quadrocopters. Proceedings of the 2012 IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS), 2012.
[P4] Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Learning
Algorithm for High-Performance, Periodic Quadrocopter Maneuvers. Mecha-
tronics (under review).
[P5] Markus Hehn and Raffaello D’Andrea. An Iterative Learning Scheme for High
Performance, Periodic Quadrocopter Trajectories. Proceedings of the 2013 Eu-
ropean Control Conference (ECC), 2013.
[P6] Markus Hehn and Raffaello D’Andrea. A Flying Inverted Pendulum. Proceed-
ings of the 2011 IEEE International Conference on Robotics and Automation
(ICRA), 2011.
Related publications
(listed in chronological order)
[R1] Sergei Lupashin, Angela P. Schoellig, Markus Hehn, and Raffaello D’Andrea.
The Flying Machine Arena as of 2010. Proceedings of the 2011 IEEE Inter-
national Conference on Robotics and Automation (ICRA) - Video Submission,
2011.
[R2] Angela Schoellig, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Feasi-
bility of Motion Primitives for Choreographed Quadrocopter Flight. Proceedings
of the 2011 American Control Conference (ACC), 2011.
[R3] Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and
Control. Proceedings of the 2011 IFAC World Congress, 2011.
[R4] Stefania Tonetti, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Dis-
tributed Control of Antenna Array with Formation of UAVs. Proceedings of the
2011 IFAC World Congress, 2011.
[R5] Robin Ritz, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Quadro-
copter Performance Benchmarking Using Optimal Control. Proceedings of the
2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 2011.
[R6] Robin Ritz, Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. Coop-
erative Quadrocopter Ball Throwing and Catching. Proceedings of the 2012
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
2012.
[R7] Dario Brescianini, Markus Hehn, and Raffaello D’Andrea. Quadrocopter Pole
Acrobatics. Proceedings of the 2013 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), 2013.
[R8] Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Feed-
Forward Learning Scheme for High-Performance Periodic Quadrocopter Maneu-
vers. Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), 2013.
[R9] Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. A Computationally
Efficient Algorithm for State-to-State Quadrocopter Trajectory Generation and
Feasibility Verification. Proceedings of the 2013 IEEE/RSJ International Con-
ference on Intelligent Robots and Systems (IROS), 2013.
[R10] Sergei Lupashin, Markus Hehn, Mark W. Mueller, Angela P. Schoellig, Michael
Sherback, and Raffaello D’Andrea. A Platform for Aerial Robotics Research and
Demonstration: The Flying Machine Arena. Mechatronics, Volume 24, Issue 1,
pp 41-54, 2014.
[R11] Federico Augugliaro, Sergei Lupashin, Michael Hamer, Cason Male, Markus
Hehn, Mark W. Mueller, Jan Willmann, Fabio Gramazio, Matthias Kohler, and
Raffaello D’Andrea. The Flight Assembled Architecture Installation: Coopera-
tive construction with flying machines. IEEE Control Systems Magazine (under
review).
The following is a list of student projects that the author supervised at ETH Zurich as
part of his doctoral studies.
Master’s thesis
The master’s thesis is a six-month, full-time project.
Semester project
The semester project is a semester-long, part-time project.
[S1] Mark Müller, Automatic Tuning of PID Controllers for Flight Control, Fall 2009.
(co-supervised with Guillaume Ducard)
[S2] Robin Ritz, Time-Optimal Quadrotor Control, Spring 2010.
[S3] Michael Blösch, Quadrotor Ball Launching, Spring 2010.
[S4] Dario Brescianini, Nonlinear Quadrocopter Attitude Control, Fall 2011.
[S5] Sara Spedicato, Quadrotor Attitude Control, Spring 2012.
[S6] Simon Berger, UAV Compliance Monitoring, Fall 2012.
(co-supervised with Mark Müller)
Bachelor’s thesis
The bachelor’s thesis is a three-month, full-time project.
2.5 Outreach
Throughout the duration of the author’s doctoral studies, the results of the research were
presented both to visitors at ETH Zurich and at various events. Some of the research
resulted in stand-alone demonstrations, while other algorithms were used as components
in more complex systems.
Lab Demonstrations
The algorithms developed during the author’s doctoral studies have been demonstrated to
more than 3000 visitors from academia, industry, governments, schools, and the general
public at over 150 separate demonstrations of the Flying Machine Arena at ETH Zurich.
Exhibitions
The technology developed in the Flying Machine Arena was also exhibited at several
events held outside ETH Zurich (as an example, see Figure 2.1). Further details about
the requirements and execution of external exhibitions may be found in [20]. All of the
following exhibitions included algorithms that are part of the research presented herein:
Figure 2.1 Exhibition of the Flying Machine Arena at Hannover Messe 2012, the largest
industrial fair in the world. The demonstrations presented to the audience included algorithms
presented in this thesis as well as in the related publications listed in Section 2.3.
3 Future Directions
This chapter discusses the current state of the research projects and potential future
research directions.
3.1 Part A. Trajectory Generation
The first part of the trajectory generation research focused on the development of a
benchmarking tool to determine the performance potential of quadrotor systems. The
existing tool can be used to compute two-dimensional maneuvers between arbitrary states.
It does, however, require computation times on the order of hours for a single maneuver,
and the user must provide a reasonable initial guess for the method to converge.
The second part of this research focused on the design of efficient real-time trajec-
tory generation methods. The presented algorithms have been implemented and used
extensively in the Flying Machine Arena project, both in individual experiments and as
building blocks in larger systems. They form a solid and proven base for further research
and development.
3.2 Part B. Iterative Learning of Periodic Motions
Choice of correction signal Fourier order: The order of the Fourier series used to
represent the tracking error and controller set point corrections is currently chosen by the
user. An interesting extension would be to automatically analyze the reference trajectory
in order to determine (in combination with the known closed-loop system dynamics) the
required order of the Fourier series.
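The trade-off behind this choice can be sketched numerically: truncating the Fourier series of a periodic error signal after m harmonics leaves exactly the energy of the discarded harmonics as residual. The signal and its harmonic content below are synthetic, invented for illustration:

```python
import numpy as np

# A synthetic periodic tracking error with harmonics 1, 3, and 9.
N = 256                      # samples per period
t = np.arange(N) / N
err = (0.8 * np.sin(2 * np.pi * t)
       + 0.3 * np.sin(2 * np.pi * 3 * t + 0.5)
       + 0.05 * np.sin(2 * np.pi * 9 * t))

def truncate(signal, m):
    """Keep only harmonics 0..m of a periodic signal (via the FFT)."""
    c = np.fft.rfft(signal)
    c[m + 1:] = 0.0
    return np.fft.irfft(c, n=len(signal))

# Relative residual error as a function of the Fourier order m.
for m in (1, 3, 9):
    resid = np.linalg.norm(err - truncate(err, m)) / np.linalg.norm(err)
    print(m, round(float(resid), 3))
```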
Transfer to similar maneuvers: The focus of the learning algorithms in this re-
search was to improve the performance of one maneuver based on previous executions
of the same maneuver. A change of the maneuver would then require re-starting the
learning process entirely. One would however expect that similar maneuvers require sim-
ilar corrections, and that the data of past executions of one maneuver could serve as a
starting point to learn a similar maneuver. For more traditional time-domain iterative
learning control methods, recent results of such knowledge transfer algorithms applied to
quadrocopters have been promising [13].
Transfer to higher execution speeds: The current method of initializing the learn-
ing at higher speeds from correction data of lower speeds is based on a first-order extrap-
olation of the past corrections and the known closed-loop system transfer function. It
would be interesting to explore other mechanisms to predict the corrections at higher
speeds from low-speed measurement data, perhaps by including a disturbance model to
predict speed-dependent disturbance properties.
References
[1] Markus W. Achtelik, Simon Lynen, Margarita Chli, and Roland Siegwart. Inversion
Based Direct Position Control and Trajectory Following for Micro Aerial Vehicles.
In International Conference on Intelligent Robots and Systems, 2013.
[2] Moses Bangura and Robert Mahony. Nonlinear Dynamic Modeling for High Perfor-
mance Control of a Quadrotor. In Australasian Conference on Robotics and Automa-
tion, 2012.
[3] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Vol. I. Athena
Scientific, third edition, 2005.
[4] Patrick Bouffard, Anil Aswani, and Claire J. Tomlin. Learning-Based Model
Predictive Control on a Quadrotor: Onboard Implementation and Experimental
Results. In International Conference on Robotics and Automation, 2012.
[5] Y. Bouktir, M. Haddad, and T. Chettibi. Trajectory Planning for a Quadrotor
Helicopter. In Mediterranean Conference on Control and Automation, 2008.
[6] Pierre-Jean Bristeau, Philippe Martin, Erwan Salaün, and Nicolas Petit. The Role
of Propeller Aerodynamics in the Model of a Quadrotor UAV. In European Control
Conference, 2009.
[7] Douglas A. Bristow, Marina Tharayil, and Andrew G. Alleyne. A Survey of Iterative
Learning Control. IEEE Control Systems Magazine, 26(3):96–114, 2006.
[8] Bruce A. Conway. A Survey of Methods Available for the Numerical Optimization
of Continuous Dynamic Systems. Journal of Optimization Theory and Applications,
152(2):271–306, 2011.
[9] Ian D. Cowling, Oleg A. Yakimenko, and James F. Whidborne. A Prototype of
an Autonomous Controller for a Quadrotor UAV. In European Control Conference,
2007.
[10] Carlos E. García, David M. Prett, and Manfred Morari. Model Predictive Control:
Theory and Practice - A Survey. Automatica, 25(3):335–348, 1989.
[11] Hans Peter Geering. Optimal Control with Engineering Applications. Springer, 2007.
[12] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization.
Academic Press, October 2003.
[13] Michael Hamer, Markus Waibel, and Raffaello D’Andrea. Knowledge Transfer
for High-Performance Quadrocopter Maneuvers. In International Conference on
Intelligent Robots and Systems, 2013.
[14] Henry Hermes and Joseph P. LaSalle. Functional Analysis and Time Optimal Control.
Mathematics in Science and Engineering, 56, 1969.
[15] Gabriel M. Hoffmann, Hao Huang, Steven L. Waslander, and Claire J. Tomlin.
Quadrotor Helicopter Flight Dynamics and Control: Theory and Experiment. In
AIAA Guidance, Navigation and Control Conference, 2007.
[16] Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin. Quadrotor
Helicopter Trajectory Tracking Control. In Conference on Decision and Control,
2008.
[17] Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian.
Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Systems
Magazine, 28(2):51–64, 2008.
[18] Qimi Jiang, Daniel Mellinger, Christine Kappeyne, and Vijay Kumar. Analysis
and Synthesis of Multi-Rotor Aerial Vehicles. In International Design Engineering
Technical Conference, 2011.
[19] Sergei Lupashin and Raffaello D’Andrea. Adaptive Fast Open-Loop Maneuvers for
Quadrocopters. Autonomous Robots, 33(1-2):89–102, 2012.
[20] Sergei Lupashin, Markus Hehn, Mark W. Mueller, Angela P. Schoellig, Michael
Sherback, and Raffaello D’Andrea. A Platform for Aerial Robotics Research and
Demonstration: The Flying Machine Arena. Mechatronics, 24(1):41–54, 2014.
[21] Daniel Mellinger and Vijay Kumar. Minimum Snap Trajectory Generation and
Control for Quadrotors. In International Conference on Robotics and Automation,
2011.
[22] Daniel Mellinger, Nathan Michael, and Vijay Kumar. Trajectory Generation
and Control for Precise Aggressive Maneuvers with Quadrotors. In International
Symposium on Experimental Robotics, 2010.
[23] Fabian L. Mueller, Angela P. Schoellig, and Raffaello D’Andrea. Iterative Learn-
ing of Feed-Forward Corrections for High-Performance Tracking. In International
Conference on Intelligent Robots and Systems, 2012.
[24] Mark W. Mueller and Raffaello D’Andrea. A Model Predictive Controller for
Quadrocopter State Interception. In European Control Conference, 2013.
[25] Pong-in Pipatpaibul and P. R. Ouyang. Application of Online Iterative Learning
Tracking Control for Quadrotor UAVs. ISRN Robotics, 2013:Article ID 476153, 2013.
[26] Paul Pounds, Robert Mahony, and Peter Corke. Modelling and Control of a Quad-
Rotor Robot. In Australasian Conference on Robotics and Automation, 2006.
[27] Paul Pounds, Robert Mahony, and Peter Corke. Modelling and control of a large
quadrotor robot. Control Engineering Practice, 18(7):691–699, July 2010.
[28] Caitlin Powers, Daniel Mellinger, Aleksandr Kushleyev, and Bruce Kothmann. In-
fluence of Aerodynamics and Proximity Effects in Quadrotor Flight. In International
Symposium on Experimental Robotics, 2012.
[29] Oliver Purwin and Raffaello D’Andrea. Performing and Extending Aggressive
Maneuvers Using Iterative Learning Control. Robotics and Autonomous Systems,
59(1):1–11, 2011.
[30] Charles Richter, Adam Bry, and Nicholas Roy. Polynomial Trajectory Planning for
Quadrotor Flight. In International Conference on Robotics and Automation, 2013.
[31] Angela P. Schoellig, Fabian L. Mueller, and Raffaello D'Andrea. Optimization-
Based Iterative Learning for Precise Quadrocopter Trajectory Tracking. Autonomous
Robots, 33(1-2):103–127, 2012.
[32] Roland Siegwart, Illah R. Nourbakhsh, and Davide Scaramuzza. Autonomous Mobile
Robots. The MIT Press, second edition, 2011.
Part A
TRAJECTORY GENERATION
Paper I
Abstract
Frequently hailed for their dynamical capabilities, quadrotor vehicles are often
employed as experimental platforms. However, questions surrounding achievable
performance, influence of design parameters, and performance assessment of control
strategies have remained largely unanswered. This paper presents an algorithm that
allows the computation of quadrotor maneuvers that satisfy Pontryagin’s minimum
principle with respect to time-optimality. Such maneuvers provide a useful lower
bound on the duration of maneuvers, which can be used to assess performance
of controllers and vehicle design parameters. Computations are based on a two-
dimensional first-principles quadrotor model. The minimum principle is applied
to this model to find that time-optimal trajectories are bang-bang in the thrust
command, and bang-singular in the rotational rate control. This paper presents a
procedure allowing the computation of time-optimal maneuvers for arbitrary initial
and final states by solving the boundary value problem induced by the minimum
principle. The usage of the computed maneuvers as a benchmark is demonstrated
by evaluating quadrotor design parameters, and a linear feedback control law as an
example of a control strategy. Computed maneuvers are verified experimentally by
applying them to quadrocopters in the ETH Zurich Flying Machine Arena testbed.
Published in Autonomous Robots Volume 33, Issue 1-2. The publication includes two
electronic attachments available at www.hehn.be and www.springerlink.com.
DOI: 10.1007/s10514-012-9282-3
© 2012 Springer Science+Business Media, LLC. The final publication is available at springerlink.com.
1. Introduction
Quadrotor vehicles are an increasingly popular aerial vehicle platform. Advantages com-
pared to other micro unmanned aerial vehicle (UAV) platforms are the ability to hover,
robustness and straightforwardness of design due to mechanical simplicity [20], and safety
benefits due to smaller rotor sizes compared to most other hover-capable UAVs [10].
Most early research on quadrocopter control focused on near-hover operation. This
operation was commonly achieved using linear controllers, which were tuned through
various strategies (see, for example, [3], and references therein). Trajectories are typically
generated by connecting waypoints with lines or circles, and choosing flight velocities
sufficiently low to satisfy near-hover assumptions (e.g. [12]).
With the above advantages making quadrocopters attractive platforms for numerous
research fields, another key strength of these vehicles is increasingly being exploited: They
offer exceptional agility in the rotational degrees of freedom due to the off-center mounting
of the propellers, which allows the generation of large torques. Most platforms also
provide high thrust-to-weight ratios, allowing fast translational dynamics. This has led
to numerous more complex control strategies that leverage these capabilities by explicitly
considering the feasibility of trajectories. Examples include fast translations [9, 18, 21],
flips [17], and dancing motions [23]. Several authors have also introduced algorithms that
generate trajectories from a class of motion primitives (lines, polynomials, or splines)
and respect the dynamic constraints of quadrocopters [4, 5, 11]. These algorithms enforce
dynamic feasibility by designing trajectories in a two-step process, first determining the
shape of the trajectory and subsequently determining an appropriate speed profile such
that feasibility constraints are not violated.
While this work has led to impressive results, it has not answered one key question:
Given the specifications of a quadrotor vehicle, how fast can a certain maneuver be
completed? More formally, this can be stated as: What is the time-optimal trajectory for
given initial and final states? The ability to compute accurate answers to this question
offers insight that may be helpful for many performance-related topics: How much of
the theoretical speed potential does a certain control strategy utilize? Are there specific
maneuvers for which there exists a large improvement potential for a given controller?
How do the physical parameters of the vehicle, such as mass or maximum thrust, influence
performance?
The computation of time-optimal trajectories provides a useful tool to answer these
questions: The duration of such trajectories provides an absolute lower bound, which
may be used for comparisons. Several approaches have been proposed to compute time-
optimal trajectories: [14] proposed a direct method using control inputs discretized in
time, and solved the resulting optimization problem using genetic algorithms and non-
linear programming. [4] presented a different direct method, exploiting the differential
flatness of the system dynamics and parameterizing the output trajectory by a set of
control points, which are then connected by B-spline functions. While both methods
have been successfully demonstrated, the optimality of the resulting solutions remained an open question.
In this paper, we present an algorithm that allows the computation of state and
input trajectories between two states. Pontryagin’s minimum principle is employed to
determine the structure of time-optimal trajectories, and computed trajectories can be
verified to satisfy the minimum principle conditions. While these conditions are necessary,
but not sufficient, for optimality, they provide a strong argument for the optimality of
the computed trajectories.
The computation of maneuvers that satisfy the minimum principle using the algorithm
presented herein requires considerable computational effort. This makes it an unattrac-
tive proposition in real-time scenarios. However, the off-line computation of a selection
of maneuvers, chosen to be representative for the motion of the vehicle in a specific ap-
plication, provides a lower bound on the maneuver duration, to which real-time capable
controllers may be compared as a means of benchmarking their performance. If certain
motions are expected to be carried out repeatedly, this method may be used to design
nominal trajectories, for which stabilizing controllers could then be designed.
We base our calculations on a two-dimensional first-principles model of the quadrotor
vehicle. Because the algorithm presented herein is meant as a benchmarking tool rather
than a control algorithm, the choice of a two-dimensional model provides a reasonable
trade-off: The omission of the third dimension makes the problem tractable by reducing
its dimensionality. At the same time, two-dimensional problems cover a large class of ma-
neuvers that are interesting to benchmark. Through rotations of the coordinate system,
all maneuvers that start and end at rest may be treated as two-dimensional problems,
as well as maneuvers where initial and final velocities and rotations are aligned with the
rotated coordinate system. This provides a large number of control scenarios that may
be benchmarked, such that the benchmarking results can be considered representative of
the performance.
The quadrotor model is presented in Section 2. The structure of time-optimal trajec-
tories is determined in Section 3, and the algorithm used to solve the induced boundary
value problem is presented in Section 4. Numerous sample maneuvers have been com-
puted with this algorithm, a selection of which are shown in Section 5. The applica-
tion of time-optimal maneuvers as a benchmarking tool is demonstrated by computing
the influence of model parameters and evaluating the performance of a linear controller
(Section 6). The computed trajectories are validated in actual flight tests, with the
experimental results (Section 7) demonstrating their validity.
2. Modeling of Vehicle Dynamics
This section introduces the two-dimensional first-principles model, on which we base our
calculations. Furthermore, a non-dimensionalizing coordinate transformation is employed
to reduce the number of model parameters to two.
|ω| ≤ ω̄, (1)
F̲T ≤ FT ≤ F̄T. (2)
The pitch rate is limited by the range of the gyroscopic on-board sensors, and the maxi-
mum collective thrust is determined by the maximum thrust each propeller can produce.
Because commonly available motor drivers do not allow changes of the direction of rota-
tion mid-flight, and because the propellers are of fixed-pitch type, it is assumed that the
collective thrust is always positive: FT > 0.
The equations of motion are
FT
ẍ = sin θ, (3)
m
FT
z̈ = cos θ − g, (4)
m
θ̇ = ω, (5)
where g denotes the gravitational acceleration and m is the mass of the quadrocopter.
We assume that the angular velocity θ̇ can be set directly without dynamics and
delay. This is motivated by the very high angular accelerations that quadrotors can
reach (typically on the order of several hundred rad s−2 ), while the angular velocity is
usually limited by the gyroscopic sensors used for feedback control on the vehicle [17].
This means that aggressive maneuvering is often more severely limited by the achievable
rotational rate than the available angular acceleration. The simplification of the model to
a constrained rotational rate makes the derivations and numerical computations in this
paper tractable.
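A minimal numeric sketch of the model (3)–(5) in plain Python (forward-Euler integration for illustration only; mass, thrust, and step size are placeholder values, not the vehicle parameters):

```python
import math

def f(state, u_thrust, u_rate, m=0.5, g=9.81):
    """Two-dimensional quadrotor dynamics, Eqs. (3)-(5).

    state = (x, x_dot, z, z_dot, theta); u_thrust is the collective
    thrust F_T, u_rate the directly commanded pitch rate omega.
    """
    x, x_dot, z, z_dot, theta = state
    return (x_dot,
            u_thrust / m * math.sin(theta),      # Eq. (3)
            z_dot,
            u_thrust / m * math.cos(theta) - g,  # Eq. (4)
            u_rate)                              # Eq. (5)

def euler_step(state, u_thrust, u_rate, dt=1e-3):
    """One explicit Euler step; adequate for illustration, not accuracy."""
    d = f(state, u_thrust, u_rate)
    return tuple(s + dt * ds for s, ds in zip(state, d))

# Sanity check: thrust m*g at theta = 0 keeps the vehicle hovering at rest.
state = (0.0, 0.0, 0.0, 0.0, 0.0)
for _ in range(1000):
    state = euler_step(state, u_thrust=0.5 * 9.81, u_rate=0.0)
```

The hover check mirrors the structure of (3)–(5): at θ = 0 the thrust acts purely vertically and exactly cancels gravity.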
It should be noted that the first-principles model presented above not only assumes
there is direct control of the rotational rate, but also neglects numerous dynamic effects,
such as propeller speed change dynamics [17], blade flapping [20], changes in the angle of
attack of the propellers [13], and drag forces. Considering that we seek to compute very
fast maneuvers, it is to be expected that such effects are significant. However, the purpose
of the method presented herein is to allow the benchmarking of quadrotor performance.
Because such benchmarks typically serve comparative purposes, modeling inaccuracies
play a less significant role than in other applications: As long as the model captures
Figure 1. Schematic of the two-dimensional quadrotor model, with pitch angle θ, vertical position z, collective thrust FT, and rotational rate ω.
the dominant dynamics, one can expect relative comparisons based on the model to
provide meaningful answers. Differences of results based on different models, for example
between simulative and experimental results, highlight that the model does not capture
all effects; however, when comparing results obtained using the same dynamics, one can
expect the neglected effects to, for the most part, cancel out. We therefore believe the
trade-off between modeling complexity (and therefore tractability of the optimal control
problem) and accuracy to be reasonable. This is further supported by the comparison
of numerical and experimental results, which show a reasonable qualitative match with
some quantitative discrepancies between simulations and reality (see Section 7).
The following non-dimensionalizing transformation of time and position is applied:
t̂ := ω̄t, (6)
x̂ := ω̄²x/g, (7)
ẑ := ω̄²z/g. (8)
In the transformed coordinates and time, the gravitational acceleration is unity. Defining
the state vector
x := (x̂, x̂˙, ẑ, ẑ˙, θ), (9)
and the control input vector u := (uR, uT), (10)
the dynamics in the transformed coordinates become
ẋ = f(x, u) = (x̂˙, uT sin θ, ẑ˙, uT cos θ − 1, uR). (11)
It is important to note that in the above equations, the derivative ẋ is taken with respect
to the transformed time t̂.
The transformed control inputs are bounded by the minimum and maximum thrust
in units of gravitational acceleration, and by the unit allowable rotational rate:
u̲T = F̲T/(mg) ≤ uT ≤ ūT = F̄T/(mg),
|uR| ≤ 1. (12)
The dimensionless model contains two model parameters: the lower and upper limits
of the collective thrust input (u̲T and ūT).
For the remainder of the paper, the hat notation will be omitted. It is understood that
calculations are carried out using the dimensionless coordinates, but could equivalently
be done in the original system.
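The transformation (6)–(8) and the input scaling (12) can be sketched as follows (the thrust limits and mass below are illustrative placeholders):

```python
def to_dimensionless(t, x, z, omega_max, g=9.81):
    """Coordinate transformation of Eqs. (6)-(8)."""
    t_hat = omega_max * t            # Eq. (6)
    x_hat = omega_max ** 2 * x / g   # Eq. (7)
    z_hat = omega_max ** 2 * z / g   # Eq. (8)
    return t_hat, x_hat, z_hat

def thrust_limits_dimensionless(FT_min, FT_max, m, g=9.81):
    """Input scaling of Eq. (12): thrust expressed in units of g."""
    return FT_min / (m * g), FT_max / (m * g)

# Example with placeholder limits; hover corresponds to u_T = 1.
u_min, u_max = thrust_limits_dimensionless(FT_min=0.5, FT_max=10.0, m=0.5)
```

Note that the two returned bounds are exactly the two remaining model parameters of the dimensionless system.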
3. Minimum Principle for Time-Optimal Maneuvers
problem
minimize T,  subject to u ∈ U,
where pi denotes the elements of the costate vector p. Note that, because the terminal
time of the maneuver is free, the Hamiltonian is always zero along an optimal trajec-
tory [2]:
H(x∗ , u∗ , p) ≡ 0. (15)
ṗ1 = 0  ⇒  p1 = c1,
ṗ2 = −p1  ⇒  p2 = c2 − c1 t,
ṗ3 = 0  ⇒  p3 = c3,
ṗ4 = −p3  ⇒  p4 = c4 − c3 t, (17)
ṗ5 = (c1 t − c2)uT cos θ + (c4 − c3 t)uT sin θ. (18)
The above equation depends on the control input uT , the trajectory of which is not known
a priori. It is therefore not easily possible to express p5 explicitly.
Optimal Control Input u∗R  For the rotational control input uR, minimizing (14)
results in
u∗R = argmin {p5 uR},  uR ∈ [−1, +1]. (19)
We refer to
ΦR := p5 (20)
as the switching function of u∗R . If ΦR is zero for a nontrivial interval of time, then
the minimum condition (19) is insufficient to determine u∗R . In these intervals, which
are called singular arcs, u∗R is determined using the condition that ΦR remains zero: It
follows that Φ̇R vanishes, which results in the condition
p4 u∗T sin θ∗ − p2 u∗T cos θ∗ = 0. (21)
Solving for θ∗ using u∗T > 0 (as discussed in Section 2.1) and the costate equations (17)
yields
θ∗ = arctan(p2/p4) = arctan((c2 − c1 t)/(c4 − c3 t)). (22)
Differentiating (22) with respect to time gives the trajectory of the control input u∗R = θ̇∗
in a singular arc:
u∗R,sing = (c2 c3 − c1 c4) / ((c1² + c3²)t² − 2(c1 c2 + c3 c4)t + c2² + c4²). (23)
The rotational control input u∗R of a time-optimal maneuver of the quadrocopter can
be written as:
u∗R = { +1 if ΦR < 0;  u∗R,sing if ΦR = 0;  −1 if ΦR > 0 }. (24)
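The bang-singular law (24), together with the singular-arc input (23), might be implemented as follows; the zero-tolerance on ΦR is a choice of this sketch, not of the paper:

```python
def u_rate_singular(t, c):
    """Singular-arc rotational input, Eq. (23). c = (c1, c2, c3, c4)."""
    c1, c2, c3, c4 = c
    num = c2 * c3 - c1 * c4
    den = (c1**2 + c3**2) * t**2 - 2 * (c1 * c2 + c3 * c4) * t + c2**2 + c4**2
    return num / den

def u_rate_optimal(t, phi_R, c, tol=1e-9):
    """Bang-singular control law, Eq. (24).

    tol is a numeric tolerance for deciding Phi_R == 0 (an assumption
    of this sketch; the paper treats the singular arc analytically).
    """
    if phi_R < -tol:
        return 1.0
    if phi_R > tol:
        return -1.0
    return u_rate_singular(t, c)   # singular arc: Phi_R vanishes
```

Note that the denominator of (23) equals p2² + p4², so it is positive whenever c is nontrivial.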
Optimal Control Input u∗T To compute the optimal control trajectory for the thrust
input u∗T, the sum of all terms of the Hamiltonian containing uT must be minimized:
u∗T = argmin {uT (p2 sin θ + p4 cos θ)},  uT ∈ [u̲T, ūT], (25)
with the switching function
ΦT := p2 sin θ + p4 cos θ. (26)
For a singular arc to exist, ΦT must be zero for a nontrivial interval of time. Setting ΦT
to zero and solving for θ∗ yields
θ∗ = arctan(−p4/p2) = arctan((c3 t − c4)/(c2 − c1 t)). (27)
The pitch angle θ∗ is determined by the rotational control input u∗R. If u∗R is in a regular
interval, then the pitch angle is an affine function of time, and Equation (27) cannot be
satisfied over a nontrivial time interval. It can therefore be concluded that u∗T cannot be
in a singular arc when u∗R is regular. If u∗R is singular, θ∗ is given by (22). It follows that,
for a singular arc of u∗T to exist, the pitch angle trajectories defined by (22) and by (27)
must be identical:
arctan((c2 − c1 t)/(c4 − c3 t)) = arctan((c3 t − c4)/(c2 − c1 t)). (28)
Taking the tangent of both sides and multiplying the constraint out yields
(c2 − c1 t)² + (c4 − c3 t)² = 0. (29)
Neglecting the trivial case c = (0, 0, 0, 0), condition (29) cannot hold for a nontrivial
interval of time. The thrust control input u∗T therefore does not contain singular arcs and
can be written as
u∗T = { ūT if ΦT ≤ 0;  u̲T if ΦT > 0 }. (30)
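The bang-bang thrust law (30) is a one-line rule: minimizing uT ΦT picks the maximum thrust when ΦT ≤ 0 and the minimum thrust otherwise. A sketch with illustrative argument names:

```python
def u_thrust_optimal(phi_T, u_T_min, u_T_max):
    """Bang-bang thrust law, Eq. (30): minimizing u_T * Phi_T selects
    the maximum thrust for Phi_T <= 0 and the minimum thrust otherwise."""
    return u_T_max if phi_T <= 0 else u_T_min
```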
Augmenting the state with the switching function, xa := (x, ΦR), the dynamics become
ẋa = fa(t, xa) = (ẋ∗, u∗T sin θ∗, ż∗, u∗T cos θ∗ − 1, u∗R, (c1 t − c2)u∗T cos θ∗ + (c4 − c3 t)u∗T sin θ∗), (31)
where the control inputs u∗R and u∗T are given by the control laws (24) and (30). A quad-
rotor maneuver from x0 to xT that satisfies the minimum principle solves the boundary
value problem (BVP)
ẋa = fa (t, xa ),
x∗ (0) = x0 , (32)
x∗ (T ) = xT .
The BVP (32) contains six unknowns: the final time T, the four unknown constants c,
and the initial value of the switching function ΦR(t = 0). However, the initial value of
the switching function may not be chosen freely, but must satisfy Equation (15).¹
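A single-shooting evaluation of BVP (32) can be sketched as follows: integrate the augmented dynamics (31) under the control laws (24) and (30) for a guess of the unknowns (T, c, ΦR(0)), and return the terminal mismatch. The forward-Euler integrator, the tolerance on ΦR, and the thrust limits below are placeholder choices of this sketch:

```python
import math

def augmented_rhs(t, xa, c, u_T_min, u_T_max):
    """Augmented dynamics (31); xa = (x, x_dot, z, z_dot, theta, phi_R)."""
    c1, c2, c3, c4 = c
    x, x_dot, z, z_dot, theta, phi_R = xa
    p2, p4 = c2 - c1 * t, c4 - c3 * t
    phi_T = p2 * math.sin(theta) + p4 * math.cos(theta)
    u_T = u_T_max if phi_T <= 0 else u_T_min          # Eq. (30)
    if abs(phi_R) > 1e-9:
        u_R = 1.0 if phi_R < 0 else -1.0              # Eq. (24), regular arc
    else:
        u_R = (c2 * c3 - c1 * c4) / (p2**2 + p4**2)   # Eq. (23), singular arc
    phi_R_dot = ((c1 * t - c2) * u_T * math.cos(theta)
                 + (c4 - c3 * t) * u_T * math.sin(theta))
    return (x_dot, u_T * math.sin(theta), z_dot,
            u_T * math.cos(theta) - 1.0, u_R, phi_R_dot)

def shooting_residual(T, c, phi_R0, x0, xT, u_T_min=0.1, u_T_max=2.0, n=2000):
    """Integrate (31) from x0 and return the terminal mismatch x(T) - xT."""
    dt = T / n
    xa = tuple(x0) + (phi_R0,)
    for i in range(n):
        d = augmented_rhs(i * dt, xa, c, u_T_min, u_T_max)
        xa = tuple(s + dt * ds for s, ds in zip(xa, d))
    return tuple(a - b for a, b in zip(xa[:5], xT))
```

A BVP solver then drives this residual to zero over the six unknowns; the sensitivity near the singular arcs discussed in Section 4.4 is what makes a good initial guess essential.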
1. Apply u = (−1, u̲T) or u = (+1, u̲T) until the pitch angle θ is zero. Thrust
commands now influence only the vertical dynamics.
¹ It can be seen from Equation (15) that, depending on the initial state x0 and the constants c, either
ΦR(t = 0) = 0 is the only allowable initial value of the switching function (u∗R(t = 0) is then in a singular
arc), or there are two allowable values that only differ in sign (u∗R(t = 0) is then in a regular arc, and its
sign is dictated by the sign of ΦR(t = 0)).
2. Use the commands u = (0, ūT) and u = (0, u̲T) to drive the states z and ż to zero
using a bang-bang strategy.
3. Apply rotational rate commands to drive the horizontal degree of freedom to zero,
while adjusting the thrust such that the vertical degree of freedom remains at rest.
The corresponding control inputs will be u = (+1, g/cos θ), u = (−1, g/cos θ),
and u = (0, g/cos θ). The allowable pitch angle during this step is limited by the
available thrust, as g/cos θ ≤ ūT must hold.
As shown in [22], the existence of the minimum follows from the existence of an
arbitrary trajectory to the target state.
4. Algorithm for Calculation of Time-Optimal Maneuvers
This section introduces a numerical algorithm that solves the boundary value problem
for the augmented system (BVP (32)) between arbitrary initial and final states. The
resulting maneuvers satisfy the minimum principle with respect to time-optimality. An
implementation of the algorithm in Matlab for free use is available on the first author’s
website, and is submitted along with this article. This section aims to provide the reader
with a high-level overview of the algorithm used. A more detailed discussion of the
individual steps may be found in Appendix A.
Finding a solution to the boundary value problem (32) is generally difficult: The
state at the end of the maneuver is a non-convex function of the six unknowns, with
many local minima and strongly varying sensitivities. With common boundary value
problem solvers providing only local convergence under these conditions, it is necessary
to provide a good initial guess for the unknowns or the solution trajectory xa . However,
with no straightforward physical interpretation of the constant vector c and the switching
function ΦR , it is difficult to provide such an initial guess. The application of BVP solvers
showed that convergence to the correct solution could only be achieved from initial guesses
very close to the correct values, making it almost impossible to initialize the algorithm
correctly. The problem is further aggravated by the fact that the numerical integration
is highly sensitive to numerical errors when entering or leaving singular arcs, as will be
discussed in Section 4.4.
The algorithm presented herein relaxes these problems by using more robust optimiza-
tion methods to produce a good initial guess for the BVP solver. For this, we exploit the
known bang-singular structure of maneuvers and parametrize a maneuver by the times
at which the control inputs switch, and by the terminal time. This approach, commonly
referred to as switching time optimization (STO, [24]), provides a significant advantage:
instead of an initial guess of xa or c, it requires only a guess of the switching times,
which is easier to obtain, and it converges from a much larger range of initial guesses.
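The switching-time parametrization can be sketched as a map from sorted switching times to a piecewise-constant control signal. The sign-flip convention below applies to the normalized rotational input uR ∈ {−1, +1}; the function name is illustrative:

```python
import bisect

def control_from_switching_times(t, u_initial, switch_times):
    """Piecewise-constant bang-bang input: the control starts at u_initial
    and flips sign at each element of the sorted list switch_times."""
    n_switches = bisect.bisect_right(switch_times, t)
    return u_initial if n_switches % 2 == 0 else -u_initial
```

The STO then optimizes the entries of `switch_times` (and the terminal time) so that integrating the dynamics under this input reaches the target state.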
Finding a solution to the STO problem does not necessarily imply satisfaction of
the conditions derived from the minimum principle. Therefore, the STO is used only as
the first step of the algorithm. In a second step, its result is used to extract an initial
guess of c, which typically lies close enough to the correct values to allow a BVP solver
to solve the boundary value problem in a third step, and thereby compute a maneuver
that satisfies the minimum principle.
If it is assumed that the maneuver has no singular arcs, then the algorithm is less
complex and more intuitive. Therefore, an algorithm assuming maneuvers with pure
bang-bang behavior is introduced first, and then the modifications necessary for the
computation of maneuvers with singular arcs are presented.
found maneuver does not satisfy the minimum principle. Reasons for this may be an
inappropriate choice of the number of switches in the control inputs, failure of the STO
to converge to the correct maneuver, or the existence of singular arcs in the optimal
maneuver.
² We conjecture that the assumption that uT does not switch at the edges of the singular arcs is valid
for almost all initial and final conditions, with an appropriately defined measure. For all maneuvers
considered here, results have shown that this condition is fulfilled.
5. Numerical Results
In this section, we present a selection of quadrotor maneuvers that were computed using
the algorithm introduced in Section 4. While the algorithm allows the computation of
motions between arbitrary states, the results presented herein focus on position changes
where the quadrocopter is at rest at the beginning and at the end of the maneuver.
The quadrotor parameters for which the maneuvers have been computed are based on
the ETH Zurich Flying Machine Arena vehicles, as described in [17]. Table 1 shows the
used numerical model parameters in the dimensional form (F̲T/m, F̄T/m, ω̄). The non-
dimensional parameters (u̲T, ūT) can easily be calculated from these using the control
input transformation (12). While computations were carried out in the dimensionless
coordinate system, the maneuvers presented herein have been transformed back to the
state variables representing physical dimensions, allowing a more intuitive interpretation.
At the end of the vertical displacement maneuver, the quadrotor is at rest again, with no
overall horizontal displacement, and a final pitch angle that is a multiple of 2π:
Figure 2. Maneuver duration T as function of the final vertical displacement zT for rest-to-
rest maneuvers with no horizontal displacement. Additionally, the switching times of uR are
drawn in the plot on the top, and the switching times of uT in the plot on the bottom. The
vertical solid line denotes where the structure of the minimum principle solution changes: on the
left the solution without a flip, which is faster for small zT , and on the right the solution where
the quadrocopter performs a flip, which is faster for large zT . The line with the arrows denotes
the example maneuver of Figure 4.
displacement zT . Furthermore, it shows the switching times for each maneuver. For a
particular displacement zT , the maneuver starts at the bottom of the graph (t = 0 s) and,
as time passes, moves up in the positive direction of the t-axis. Every time a switching
line is crossed, the corresponding control input switches to the value specified in the
diagram. The maneuver is finished when the T -curve is reached.
If the desired vertical displacement is small, i.e. for
zT ≤ 2.4 m, the quadrocopter is within a singular arc during the entire maneuver. The
pitch angle remains at exactly θ = 0. The thrust is at its maximum at the beginning
and switches to its minimum at a time such that the quadrocopter comes to rest due to
gravity at the desired height zT . For zT ≥ 2.5 m, it is beneficial to perform a flip and to
make use of the thrust for braking while the pitch is around θ ≈ ±π. For zT ≥ 6.3 m, a
singular arc (which keeps the pitch near θ ≈ ±π for a particular time) appears, as can
be seen in Figure 2. Thus, the flip is stopped for an interval of deceleration. A selection
of maneuvers is depicted in Figure 3, showing the different maneuver shapes for varying
displacements.
The arrow line in the plots of Figure 2 denotes an example maneuver with a vertical
displacement of zT = 5 m. The state, input and switching function trajectories of this
maneuver are shown in Figure 4. The switches in the control input trajectories in Figure 4
can be traced by following the arrow line from t = 0 s towards t = T in Figure 2.
Considerations of symmetry lead to the conclusion that, for purely horizontal dis-
placements, the switching times are symmetric about the half-time of the maneuver. This
consideration is not proven here, but has held true for all computed
horizontal maneuvers, with the resulting symmetric maneuvers satisfying the minimum
principle. Using this assumption during the STO, the number of optimization variables
can be reduced and the computation of horizontal displacements becomes particularly
simple.
Maneuvers have been computed for a displacement xT ranging from 0.1 m to 15 m,
with a step size of 0.1 m. Figure 5 shows the maneuver duration T and the switching
times as a function of xT .
For xT ≤ 1.5 m, the maneuver is bang-bang with no singular arcs. At the beginning
the quadrotor turns at maximum rate, and around the maximum pitch angle the thrust
is switched to its maximum for acceleration. Then it turns in the negative direction to
decelerate around the minimum peak of θ, before it goes back to θ = 0. At xT = 1.6 m,
two singular arcs appear. Roughly speaking, the pitch angle is kept at θ ≈ ±π/2 for
acceleration and deceleration, respectively. Because a trade-off between fast acceleration
in x and maintaining altitude in z is necessary, the pitch angle is not exactly θ = ±π/2
within the singular arcs, and is not constant. For xT ≥ 7.9 m, the two singular arcs merge:
The quadrotor turns smoothly to a negative θ for deceleration, instead of a sharp turn in
Figure 5. Maneuver duration T as function of the final displacement xT for purely horizontal
rest-to-rest maneuvers. Additionally, the switching times of uR are drawn in the plot on the top,
and the switching times of uT in the plot on the bottom.
the middle of the maneuver. For maneuvers with xT ≥ 2.4 m, the thrust control input is
always at its maximum value. Figure 6 shows an illustration of some selected maneuvers.
Since the final state of a general displacement contains two variables (xT and zT ), the
maneuver duration T cannot be easily plotted in a two-dimensional figure. As an exam-
ple, a maneuver with a displacement of 5 m in horizontal and vertical direction, i.e. a
maneuver with xT = 5 m and zT = 5 m, is illustrated here. Figure 7 shows the resulting
input, state, and switching function trajectories of this example maneuver. Note that the
control inputs and the switching functions indeed fulfill the control laws (24) and (30).
This implies that the minimum principle for time-optimality is satisfied.
6. Benchmarking of Quadrotor Designs and Controllers
For a varying maximum pitch rate ω̄, the adjusted quadrotor performance can be
obtained very easily: As the computations are done in the dimensionless coordinate
system where the maximum pitch rate is normalized to unity, a changing ω̄ impacts only
the back-transformation to the dimensional coordinates. Consequently, no recomputation
of maneuvers is necessary and the structure of the switching times, as plotted in Figure 2
and Figure 5, does not change.
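The back-transformation can be made concrete: inverting (6) and (7), a dimensionless duration T̂ maps to T = T̂/ω̄ and a dimensionless position x̂ to x = g x̂/ω̄², so a larger pitch-rate limit uniformly shortens maneuvers without recomputation. A minimal sketch:

```python
def dimensional_duration(T_hat, omega_max):
    """Invert Eq. (6): t = t_hat / omega_max."""
    return T_hat / omega_max

def dimensional_position(x_hat, omega_max, g=9.81):
    """Invert Eq. (7): x = g * x_hat / omega_max**2."""
    return g * x_hat / omega_max ** 2
```

For a fixed dimensional displacement, a change of ω̄ also rescales which dimensionless maneuver corresponds to it, but the family of precomputed dimensionless maneuvers remains valid.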
For varying thrust limits F̲T and F̄T (or equivalently u̲T and ūT), the impact of the
changing parameters is not straightforward. Since u̲T and ūT are used during the com-
putation of the maneuvers, a complete recalculation is required and the structure of the
switching time evolution in Figure 2 and Figure 5 may change. As an example, maneu-
vers for a horizontal displacement of xT = 5 m have been computed with a maximum
mass-normalized thrust F̄T/m between 10 m s−2 and 30 m s−2 at a step size of 0.5 m s−2,
while the minimum thrust F̲T/m was kept constant at 1 m s−2. The resulting maneuver
duration T and the switching times are shown in Figure 8.
The gravitational acceleration g poses a lower limit: If the mass-normalized thrust F̄T/m
approaches g, the maneuver duration approaches infinity because the quadrotor needs
all the available thrust force to maintain hover, and no horizontal displacement can be
achieved without height loss. As F̄T/m increases, the slope of the T-curve decreases,
which means that the performance gain per additional thrust becomes smaller. It follows
that, for large thrust-to-weight ratios, an increase of the available thrust force does not
lead to significantly better maneuver-duration performance for horizontal displacements.
To achieve faster displacements, the maximum pitch rate ω̄ would have to be increased
as well.
The benchmark controller is designed based on the dynamics linearized about hover,
ẍ = gθ, (37)
z̈ = FT/m − g, (38)
θ̇ = ω, (39)
with the attitude control loop approximated by the first-order dynamics
ω = (1/τθ)(θd − θ), (40)
where θd is the desired pitch angle that is computed by the position control loop, and τθ
is the time constant of the attitude control loop.
The two translational degrees of freedom decouple entirely in the linearization, allow-
ing straightforward controller designs for each of them:
FT = m(−2(ζz/τz)ż − (1/τz²)(z − zd)), (41)
θd = (1/g)(−2(ζx/τx)ẋ − (1/τx²)(x − xd)). (42)
In the above equations, τx and τz are the respective closed loop time constants, and ζx
and ζz are the respective damping ratios.
The saturations of the control inputs ω and FT are applied with the same values as
for the computation of time-optimal maneuvers. Additionally, the desired pitch angle θd
is limited to |θd | ≤ π/2.
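The benchmark controller (40)–(42) with the stated saturations might look as follows. All numeric parameters are placeholders (not the Table 2 values), and the hover feedforward m·g added to the thrust command is an assumption of this sketch (Eq. (41) may equivalently be read as commanding a deviation from hover thrust):

```python
import math

def saturate(v, lo, hi):
    """Clip v to the interval [lo, hi]."""
    return max(lo, min(hi, v))

def linear_controller(state, x_d, z_d, m=0.5, g=9.81,
                      tau_x=0.6, zeta_x=0.7, tau_z=0.6, zeta_z=0.7,
                      tau_theta=0.15, FT_min=0.5, FT_max=10.0,
                      omega_max=15.0):
    """One evaluation of the benchmark linear controller.

    Thrust per Eq. (41) plus an assumed hover feedforward m*g, desired
    pitch per Eq. (42), attitude loop per Eq. (40), with the saturations
    described in the text (|theta_d| <= pi/2, thrust and rate limits).
    """
    x, x_dot, z, z_dot, theta = state
    FT = m * (g - 2.0 * zeta_z / tau_z * z_dot - (z - z_d) / tau_z ** 2)
    FT = saturate(FT, FT_min, FT_max)
    theta_d = (-2.0 * zeta_x / tau_x * x_dot - (x - x_d) / tau_x ** 2) / g
    theta_d = saturate(theta_d, -math.pi / 2, math.pi / 2)
    omega = saturate((theta_d - theta) / tau_theta, -omega_max, omega_max)
    return FT, omega
```

At the setpoint and at rest, the sketch commands hover thrust and zero pitch rate; a positive position error tilts the vehicle back toward the setpoint.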
It is important to note that this controller is not designed for optimal performance,
and its performance cannot be expected to match that of the many more sophisticated
controllers that have been presented (see, for example, [4, 5, 11, 17, 18, 21]). It does,
however, provide a useful demonstration of benchmarking a common ‘everyday’
controller.
The parameters of the controller are shown in Table 2, and are based on the parameters
of controllers used in the Flying Machine Arena test bed. These parameters are based
on manual tuning for all-round usability in the testbed, rather than optimizing simulated
performance.
We compare the performance of the linear controller to the achievable performance
when using the model dynamics (3)-(5) and performing translations that start and end
at rest. To evaluate the performance of the linear controller, the target translation is
provided as a setpoint xd and zd , and the closed-loop dynamics are simulated until the
state settles. The duration of the maneuver is taken to be the time until the position
error remains within 1% of the translation distance.
Figure 9 shows the duration of purely horizontal maneuvers as a function of the
translation distance. The results show that this linear controller achieves maneuver times
that, depending on the translation distance, are between approximately 175% and 880%
of the minimal achievable time. As expected, the maneuver duration is approximately
constant for small translations, but varies significantly for large translations where model
nonlinearities become more dominant. The non-monotonicity of the maneuver duration
is caused by position oscillations that either lie within the 1% band defining the end of
the maneuver, or slightly exceed it.
Figure 10 demonstrates the same comparison between the linear controller and time-
optimal maneuvers, this time demonstrating the performance for purely vertical maneu-
vers. When using the linear controller, the pitch angle θ always remains at 0. The
system dynamics are therefore fully linear, except for control input saturations. This
is represented in the three distinct regions in the plot: Initially, the maneuver duration
is constant. In a second region, the upper thrust constraint is reached, increasing the
maneuver duration. In the third section, both upper and lower constraints are active
during the maneuver, and the maneuver duration rises faster. The maneuver duration
using the linear controller is between 155% and 1100% of the minimal time.
Figure 9. Comparison of maneuver times for purely horizontal maneuvers. The maneuver
using the linear controller is considered finished when the position error remains within 1% of
the desired translation.
Figure 10. Comparison of maneuver times for purely vertical maneuvers. The maneuver
using the linear controller is considered finished when the position error remains within 1% of
the desired translation.
The above results show that the linear controller with saturations leaves considerable
room for improvement. The highest performance gains are achievable for very small
translations, which take approximately constant time with the linear controller.
7. Experimental Results
Figure 11. Measured state trajectories for two example maneuvers: Final state xT = 0 and
zT = 5 m (top), xT = 5 m and zT = 5 m (bottom). The numerical results for these maneuvers
are shown in Figures 4 and 7, respectively. It can be seen that the measured trajectories are
similar to the numerical ones, with unmodeled dynamics apparent at high speeds and around
switching times.
Inspecting the velocity trajectories more closely, it also becomes apparent that sudden
changes of acceleration, such as at the beginning of both maneuvers, are not achieved in
the experiment. This behavior can be explained by the underlying propeller dynamics:
a sudden thrust change would require an instantaneous change of the propeller speed,
whereas the true propeller speed change dynamics are dictated by the available motor
currents and by the motor controllers.
The differences between the trajectories are further highlighted in Figure 12, where
experimental and simulative results are superimposed. While the direct comparison high-
lights the longer duration of the maneuver, it is also clearly visible that the general shape
of the simulation results is matched well by the experiments. With the ability to trans-
fer maneuvers from simulation to the experimental platform, it is possible to perform
comparative studies not only in simulation, but also in reality. For accurate benchmarks,
care should be taken to compare results from similar sources (for example, obtained using
the same simulation, as shown in Section 6). If, when assessing performance, numerical
results for time-optimal maneuvers are to be compared to experimental results, it may
be important to account for differences between experimental and simulative results.
Figure 12. Overlay of trajectories from simulation and experiment: Final state xT = 5 m and
zT = 5 m. The direct comparison highlights the longer duration of the experimental trajectory,
with peak velocities about 16% and 40% lower in x and z, respectively.
transformed to compute all maneuvers starting and ending at standstill using the two-
dimensional model, additional insight may be gained through the ability to compute
arbitrary maneuvers.
A further possible extension of the work herein could be the development of algo-
rithms that permit significantly faster computation of time-optimal maneuvers, allowing
such maneuvers to be computed at timescales that make them useful in on-line planning
scenarios.
A. Algorithm for Calculation of Time-Optimal Maneuvers
This appendix discusses the numerical algorithm presented in Section 4 in more detail,
with a focus on how the individual steps were implemented. This Matlab implementation of the algorithm is available for free use on the first author's website, and is submitted along with this article.
This appendix follows the outline of Section 4, first introducing maneuvers containing
no singular arcs in Sections A.1 - A.3, and then showing modifications for bang-singular
maneuvers in Section A.4.
Figure 13 shows a flowchart diagram of the algorithm for bang-bang maneuvers, and
in the following, the three steps are introduced in detail.
NR and NT are the number of switches of the rotational control input and the thrust
input, respectively. The principle of the switching time optimization (STO) is to choose NR and NT, and then to improve an initial choice of the switching times {TuR}ini and {TuT}ini until a control trajectory is found that guides the quadrotor from x0 to xT with acceptable accuracy. The final
state error is measured using the scalar final state residual function
where the matrix W = diag(w1 , w2 , w3 , w4 , w5 ) contains the weights of the different state
errors. The final state x(T ) resulting from the chosen switching times can be obtained by
numerically integrating the system dynamics f (x, u) over the interval [0, T ], where u is
defined by the initial control inputs u(t = 0) and the switching times {TuR } and {TuT }.
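The evaluation of Pres can be sketched as follows in Python (the thesis implementation is in Matlab). The planar model with normalized gravity (ẍ = uT sin θ, z̈ = uT cos θ − 1, θ̇ = uR, matching the terms appearing in (49)) is integrated with fixed-step RK4; the quadratic weighting of the final-state error is an assumption, since the exact definition of Pres is not reproduced above, and all function names are hypothetical.

```python
import numpy as np

def bang_input(t, u0, switch_times):
    """Piecewise-constant bang-bang input: start at u0 and flip sign at each switch."""
    return u0 * (-1) ** int(np.searchsorted(switch_times, t))

def final_state_residual(T, uR0, uT_hi, uT_lo, TuR, TuT, x0, xT, W, n_steps=2000):
    """Integrate the planar model s = (x, xd, z, zd, theta) over [0, T] under the
    given switching times and return a weighted final-state residual."""
    def f(t, s):
        _, xd, _, zd, th = s
        uR = bang_input(t, uR0, TuR)
        uT = (uT_hi, uT_lo)[int(np.searchsorted(TuT, t)) % 2]  # thrust alternates
        return np.array([xd, uT * np.sin(th), zd, uT * np.cos(th) - 1.0, uR])
    s = np.asarray(x0, dtype=float)
    h = T / n_steps
    for i in range(n_steps):  # fixed-step RK4
        t = i * h
        k1 = f(t, s)
        k2 = f(t + h / 2, s + h / 2 * k1)
        k3 = f(t + h / 2, s + h / 2 * k2)
        k4 = f(t + h, s + h * k3)
        s = s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    err = s - np.asarray(xT, dtype=float)
    return float(err @ W @ err)
```

With uT = 1 the normalized thrust exactly balances gravity, so a hover maneuver yields a vanishing residual.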
Figure 13. Flowchart diagram of the algorithm that computes bang-bang maneuvers satisfying
the minimum principle. The three steps are presented in detail in Sections A.1-A.3. In this graph,
≈ is used to denote that the equation must be solved to acceptable accuracy.
The maneuver duration T is not known a priori, and we seek the minimum T for which Pres = 0 can be obtained. The problem can be written as

T* = min {T}ach , (45)
where {T }ach is the set of all T for which Pres = 0 is achievable, implying that the
maneuver to be found is the one with the shortest possible duration.
The solution of (45) is computed by a two-step algorithm: For an initially small, fixed maneuver duration T, the state residual Pres is minimized by varying the switching times {TuR} and {TuT} using a simplex search method (this choice is based on the observation that derivative-free optimization algorithms perform significantly better on this problem). After the minimization, T is increased using the secant method
Ti+1 = Ti + (Ti − Ti−1) / ((Pres,i−1 / Pres,i) − 1), (46)
or by a constant value if convergence of the secant method cannot be assumed, see [6]. These
two steps are repeated until Pres = 0 is achieved. Since the initial value of T is chosen
to be too small to complete the maneuver, and since T is successively increased, the
algorithm delivers a value close to the smallest T for which Pres = 0 is achievable.
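The outer loop over the maneuver duration described above can be sketched as follows; `pres_min` stands in for the inner simplex minimization over the switching times, the secant step implements Equation (46), and the fallback mirrors the constant increase mentioned above. Names and step sizes are illustrative.

```python
def minimal_duration(pres_min, T0, dT0=0.25, tol=1e-6, max_iter=100):
    """Starting from a too-short duration T0, increase T until the minimized
    state residual P_res (returned by pres_min) reaches zero."""
    T_prev, p_prev = T0, pres_min(T0)
    T = T0 + dT0
    for _ in range(max_iter):
        p = pres_min(T)
        if p < tol:
            return T
        if 0.0 < p < p_prev:
            # secant step toward P_res = 0, Eq. (46)
            T_next = T + (T - T_prev) / (p_prev / p - 1.0)
        else:
            T_next = T + dT0          # fallback: constant increase
        # T is only ever increased, as in the algorithm described above
        T_prev, p_prev, T = T, p, max(T_next, T + 1e-12)
    return T
```

On a residual that decreases linearly to zero, the secant step lands on the minimal duration in one iteration.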
The choice of the number of switches is based on the user’s intuition and experience
from the computation of other maneuvers. If the number is chosen too high, the algorithm can still converge to the correct result by producing dispensable switching times, as discussed
below. The initial guess for the duration of the maneuver T must be chosen to be too
short to compute the maneuver, and can be obtained from a guess based on the vehicle’s
translational acceleration capabilities, or on similar maneuvers.
As shown in Section 3, only the derivative Φ̇R of the switching function is known a priori.
However, once the state trajectories are known from the STO, the condition H ≡ 0 (which
must hold if the maneuver is time-optimal) can be used to compute ΦR . Recalling the
Hamiltonian (14) and using the definition ΦR = p5 yields
As shown in (17), the first four costates pi are all linear in c. The above equation can
therefore be written as a linear function of c:
ΦR = (1/uR) (−1 + c1 (−ẋ + t uT sin θ) + c2 (−uT sin θ) + c3 (−ż + t(uT cos θ − 1)) + c4 (−uT cos θ + 1)) . (49)
Given the linear form of ΦR , Equation (47) states NR linear conditions on the constant
vector c.
The derivative Φ̇R is given by Equation (31). For a trajectory that satisfies the
minimum principle, the integral of Φ̇R must coincide with the trajectory of ΦR given
by (49). Hence, for an arbitrary interval [t1 , t2 ] ∈ [0, T ],
ΦR(t2) − ΦR(t1) = ∫_{t1}^{t2} Φ̇R dt = ∫_{t1}^{t2} (−p2 uT cos θ + p4 uT sin θ) dt (50)
must hold, where the left side of the equation is computed using H ≡ 0, i.e. by (49). The
costates p2 and p4 are linear functions of c, and the above equation can be written as
ΦR(t2) − ΦR(t1) = c1 ∫_{t1}^{t2} t uT cos θ dt − c2 ∫_{t1}^{t2} uT cos θ dt − c3 ∫_{t1}^{t2} t uT sin θ dt + c4 ∫_{t1}^{t2} uT sin θ dt. (51)
To set up conditions on c based on (51), the maneuver interval [0, T ] is divided into
NR + 1 subintervals that are separated by the switching times {TuR }, i.e.
[0, T] = ⋃ { [0, TuR^1], [TuR^1, TuR^2], . . . , [TuR^(NR), T] } . (52)
This choice is beneficial with respect to the computational effort, because the switching
function ΦR must vanish at the switching times; the left side of (51) can be set to zero for
all intervals, except for the first and the last one. The NR + 1 intervals describe NR + 1
additional linear conditions on the constant vector c.
must be satisfied, where the set {TuT} is given by the STO. The thrust switching function (26) is a linear function of the costates p2 and p4, and therefore again linear in c. This linear form of the thrust switching function ΦT allows one to define NT additional linear conditions on the elements of the constant vector c, based on the conditions from (53).
The matrix A is of size (Nc × 4) and the vector r has the length Nc , where Nc is the total
number of linear conditions:
Nc = 2NR + NT + 1 (56)
For all maneuvers considered here, the system of equations (55) is overdetermined, permitting no exact solution. Therefore, the least-squares solution of (55) is computed [1], which is given by

c* = (AᵀA)⁻¹ Aᵀ r. (57)
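Equation (57) can be evaluated directly, although in practice a QR/SVD-based least-squares routine is numerically preferable to forming AᵀA explicitly. A minimal sketch in Python (function name hypothetical):

```python
import numpy as np

def extract_constants(A, r):
    """Least-squares solution of the overdetermined condition matrix equation
    A c = r, i.e. c* = (A^T A)^(-1) A^T r from Eq. (57)."""
    # lstsq solves via SVD, avoiding the explicit normal equations A^T A c = A^T r.
    c, *_ = np.linalg.lstsq(A, r, rcond=None)
    return c
```

For a well-conditioned condition matrix, both routes agree to machine precision.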
1. Applying STO, a maneuver that brings the quadrotor to the desired final state is
found, and in parallel, a constant vector c that fulfills the condition matrix equation
resulting from the parameter extraction is computed.
2. Having a reasonable initial guess of the switching times, of the maneuver duration T ,
and of the constant vector c, a BVP solver that computes a solution to BVP (32)
is applied.
Figure 14. Flowchart diagram of the algorithm to compute bang-singular maneuvers that
satisfy the minimum principle. The symbol ≈ is used to denote that the equation must be solved
to acceptable accuracy.
Within a singular arc, uR is given by Equation (23) and its trajectory depends on the constants c.⁴

⁴ It is necessary to additionally specify whether uR switches to −1 or +1 at the end of the singular arc. We employ the convention that uR switches to the value opposing the one before the singular arc. A singular arc where uR returns to the same value after the singular arc can be modeled by an additional switch at the end of the singular arc, with the corresponding duration of the additional singular arc being zero.

The final state residual Pres is therefore not only a function of the maneuver
duration T and of the sets of the switching times, but also of the constant vector c.
Accordingly, the state residual may be written as
The new parameter set {Ds,uR } and the constant vector c are additional optimization
variables during the STO.
If the solution is to satisfy the minimum principle, the optimization variables over-
constrain the problem: For the solution to satisfy the optimality conditions, the control
inputs must be the optimal control inputs, as specified by Equations (24) and (30).
These optimal inputs could be found by using c to compute the switching functions. This is avoided, however, because the separate optimization of the switching times and c has proven to be more robust.
Because only constants c that satisfy the condition matrix equation Ac = r from the
parameter extraction are a valid choice, we define the condition residual to be
where Wc is a diagonal matrix containing the weights of the different linear conditions.
It is important to note that the matrix A and the vector r are functions of the switching
times {TuR } and {TuT }, of the singular arc durations {Ds,uR }, of the maneuver duration
T , and of the constants c. For a maneuver that satisfies the minimum principle, the
condition residual Cres must vanish. Consequently, the STO problem for bang-singular
maneuvers can be written as
where {T }ach denotes the set of all T for which Pres = 0 and Cres = 0 is achievable.
For bang-singular maneuvers, the sum of the state and the condition residual Pres + Cres
is minimized during the STO. For the computation of Cres , the matrix A and the vector
r are required: The parameter extraction is no longer an isolated step, but needs to be
performed for each evaluation of Cres within the STO minimization. The parameter ex-
traction is not used to compute the constants c (which are optimization variables), but
to compute A and r.
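The combined objective evaluated at every STO iteration can be sketched as follows. The weighted quadratic form of Cres is an assumption (its exact definition is not reproduced above), and the two helper callables are placeholders for the simulation and the parameter extraction.

```python
import numpy as np

def combined_residual(params, state_residual, build_conditions, Wc):
    """STO objective for bang-singular maneuvers: P_res + C_res. The condition
    matrix A and vector r depend on the switching times, singular-arc durations,
    duration T, and constants c, so they are rebuilt on every evaluation."""
    P = state_residual(params)          # final-state residual P_res
    A, r = build_conditions(params)     # parameter-extraction step
    e = A @ params["c"] - r
    C = float(e @ Wc @ e)               # condition residual C_res (assumed quadratic form)
    return P + C
```

When the conditions Ac = r are satisfied exactly, the objective reduces to the state residual alone.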
ΦR(TuR^i) = 0 for i = 1, 2, . . . , NR ,
ΦR(TuR^i + Ds,uR^i) = 0 for i = 1, 2, . . . , NR . (63)
Because these conditions do not imply that ΦR is zero during the entire singular arc, it
is necessary to verify the trajectory of ΦR after the computation. If a switch TuR^i has no singular arc, i.e. if Ds,uR^i = 0, then the corresponding two conditions in (63) are identical.
From this it follows that one additional condition results for each singular arc. We denote
the number of singular arcs as Ns , hence (63) describes NR + Ns conditions. This means
that Ns additional conditions have been identified, compared to the bang-bang case. As
derived in Section A.2, these conditions are linear with respect to c.
As the derivative of the rotational switching function Φ̇R is known explicitly, we
demand that the integration value of Φ̇R between two switches of uR is zero for bang-
bang maneuvers. For bang-singular maneuvers, we pose similar conditions, but extra
time intervals over the singular arcs are created. An integration value of zero does not
imply that ΦR stays at zero during the whole singular arc, but constant drifts of ΦR are
penalized. Hence, the intervals over which Φ̇R is integrated are
[0, T] = ⋃ { [0, TuR^1], [TuR^1, Ts,uR^1], [Ts,uR^1, TuR^2], . . . , [Ts,uR^(NR−1), TuR^(NR)], [TuR^(NR), Ts,uR^(NR)], [Ts,uR^(NR), T] } , (64)

where Ts,uR^i = TuR^i + Ds,uR^i is used for a more compact notation. Analogously to the bang-bang case, a linear condition for each of these intervals can be constructed using Equation (51). If a switch has no singular arc, then Ds,uR^i = 0 and the corresponding
interval vanishes. Hence, for bang-singular maneuvers, NR + Ns + 1 linear conditions
on the constant vector c result. Compared to a bang-bang maneuver, Ns additional
conditions are introduced.
Assuming that the thrust input uT does not switch at the edges of the singular in-
tervals, Φ̇R is continuous over the border of the singular arcs, as can be seen from (21).
Consequently, the switching function ΦR enters and leaves a singular arc tangentially.
We therefore impose the conditions that the derivative Φ̇R is zero at the edges of every
singular arc. For each singular arc, i.e. for each Ds,uR^i > 0, two additional conditions result:

Φ̇R(TuR^i) = 0 for i = 1, 2, . . . , Ns ,
Φ̇R(TuR^i + Ds,uR^i) = 0 for i = 1, 2, . . . , Ns . (65)
The derivative of the rotational switching function is given by

Φ̇R = −p2 uT cos θ + p4 uT sin θ, (66)

which has been derived in Section 3. This is a linear function of the constants c, and yields 2Ns additional conditions.
General Condition Matrix Equation: In total, 4Ns additional conditions have been identified. It follows that, in the case of a bang-singular maneuver, the condition matrix equation

Ac = r (67)

contains these additional rows. The condition matrix equation is overdetermined as soon as the maneuver has at least one singular arc.
BVP Solver for Bang-Singular Maneuvers Similar to the algorithm for bang-
bang maneuvers, the final step is the reduction of errors through the application of a
BVP solver. If the maneuver contains singular arcs, ΦR stays at zero for a nontrivial
interval of time. Since the system is integrated numerically, ΦR is near zero during the
singular arcs, but does not vanish completely due to numerical inaccuracies. As ΦR enters
and leaves the singular arcs tangentially, defining a threshold value below which ΦR is
considered to be zero is not a straightforward task. For this reason, the rotational control
trajectory uR is not determined using the optimal control law (i.e. based on its switching
function ΦR ), but is based on the sets {TuR } and {Ds,uR }. Consequently, {TuR } and
{Ds,uR} are optimization variables during the BVP minimization, because they impact the
control trajectory u. Further, since the switching times of uR are not determined based
on the constants c, the optimal control laws are not implicitly satisfied. One must thus
ensure that the condition matrix equation is fulfilled, which is the case if Cres vanishes.
Thus, as during the switching time optimization, the sum of the state residual Pres and
the condition residual Cres is minimized. The BVP solver problem for bang-singular
maneuvers becomes
minimize Pres + Cres
subject to ẋ = f (x, u), (69)
x(0) = x0 ,
where the control trajectory uR is computed according to the switching times and singular
arc durations, and uT according to the optimal control law (30). Note that the arguments
({TuR }, {Ds,uR }, c, T ) of Pres and Cres have been omitted for reasons of clarity.
In the BVP Solver step, the NT linear conditions resulting from the thrust input are trivially satisfied, because uT is computed based on its switching function ΦT. Hence, when the condition matrix equation is computed for the evaluation of Cres during the BVP minimization, it has only the remaining, non-trivial conditions as rows.
References
[2] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Vol. I. Athena
Scientific, third edition, 2005.
[6] Germund Dahlquist and Åke Björck. Numerical Methods. Dover Publications, 2003.
[7] Hans Peter Geering. Optimal Control with Engineering Applications. Springer, 2007.
[8] Daniel Gurdan, Jan Stumpf, Michael Achtelik, Klaus-Michael Doth, Gerd Hirzinger,
and Daniela Rus. Energy-Efficient Autonomous Four-Rotor Flying Robot Controlled
at 1 kHz. In International Conference on Robotics and Automation, 2007.
[9] Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and
Control. In IFAC World Congress, 2011.
[10] Gabriel M. Hoffmann, Hao Huang, Steven L. Waslander, and Claire J. Tomlin.
Quadrotor Helicopter Flight Dynamics and Control: Theory and Experiment. In
AIAA Guidance, Navigation and Control Conference, 2007.
[11] Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin. Quadrotor
Helicopter Trajectory Tracking Control. In Conference on Decision and Control,
2008.
[12] Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian.
Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Systems
Magazine, 28(2):51–64, 2008.
[13] Haomiao Huang, Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin.
Aerodynamics and Control of Autonomous Quadrotor Helicopters in Aggressive
Maneuvering. In International Conference on Robotics and Automation, 2009.
[14] Li-Chun Lai, Chi-Ching Yang, and Chia-Ju Wu. Time-Optimal Control of a Hovering
Quad-Rotor Helicopter. Journal of Intelligent and Robotic Systems, 45(2):115–135,
June 2006.
[15] Urszula Ledzewicz, Helmut Maurer, and Heinz Schättler. Bang-Bang and Singular
Controls in a Mathematical Model for Combined Anti-Angiogenic and Chemotherapy
Treatments. In Conference on Decision and Control, December 2009.
[16] Sergei Lupashin and Raffaello D’Andrea. Adaptive Open-Loop Aerobatic Maneuvers
for Quadrocopters. In IFAC World Congress, 2011.
[17] Sergei Lupashin, Angela Schöllig, Michael Sherback, and Raffaello D’Andrea. A
Simple Learning Strategy for High-Speed Quadrocopter Multi-Flips. In International
Conference on Robotics and Automation, 2010.
[18] Daniel Mellinger, Nathan Michael, and Vijay Kumar. Trajectory Generation
and Control for Precise Aggressive Maneuvers with Quadrotors. In International
Symposium on Experimental Robotics, 2010.
[19] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The GRASP
Multiple Micro UAV Testbed. IEEE Robotics and Automation Magazine, 17(3):56–
65, 2010.
[20] Paul Pounds, Robert Mahony, and Peter Corke. Modelling and Control of a Quad-
Rotor Robot. In Australasian Conference on Robotics and Automation, 2006.
[21] Oliver Purwin and Raffaello D’Andrea. Performing and Extending Aggressive
Maneuvers Using Iterative Learning Control. Robotics and Autonomous Systems,
59(1):1–11, 2011.
[22] Emilio Roxin. The Existence of Optimal Controls. The Michigan Mathematical
Journal, 9(2):109–119, 1962.
[23] Angela Schoellig, Markus Hehn, Sergei Lupashin, and Raffaello D'Andrea. Feasibility
of Motion Primitives for Choreographed Quadrocopter Flight. In American Control
Conference, 2011.
[24] M. Zandvliet, O. Bosgra, J. Jansen, P. Van den Hof, and J. Kraaijevanger. Bang-Bang
Control and Singular Arcs in Reservoir Flooding. Journal of Petroleum Science and
Engineering, 58(1-2):186–200, August 2007.
Paper II
Abstract
This paper presents a trajectory generation algorithm that efficiently computes
high-performance flight trajectories capable of moving a quadrocopter from a large
class of initial states to a given target point that will be reached at rest. The ap-
proach consists of planning separate trajectories in each of the three translational
degrees of freedom, and ensuring feasibility by deriving decoupled constraints for
each degree of freedom through approximations that preserve feasibility. The pre-
sented algorithm can compute a feasible trajectory within tens of microseconds on
a laptop computer; remaining computation time can be used to iteratively improve
the trajectory. By replanning the trajectory at a high rate, the trajectory generator
can be used as an implicit feedback law similar to model predictive control. The so-
lutions generated by the algorithm are analyzed by comparing them to time-optimal
motions, and experimental results validate the approach.
Paper II. Real-Time Trajectory Generation for Quadrocopters
1. Introduction
Quadrocopters are popular aerial robotic platforms for applications where the ability to hover and move freely in three-dimensional space is important [39]. Trajectory generation remains challenging, however, because flight paths must be computed that are feasible under the vehicles' complex dynamic and input constraints.
algorithms, see e.g. [24]), also making short computation times desirable.
1.3 Contributions
This paper presents and analyzes a trajectory generation algorithm that allows the fast
computation of high-performance flight trajectories to move the quadrocopter from a
large class of initial states to a given target location at rest. The specific performance
criterion considered herein is the duration of the flight maneuver until the vehicle reaches
the target.
The algorithm is designed to be computationally efficient in order to provide a prov-
ably feasible trajectory quickly enough for its use in real-time settings. The design is
further targeted at flight in relatively small spaces, where peak velocities remain suffi-
ciently small that aerodynamic effects (such as drag) do not become dominant. It is
also assumed that an underlying body rate control loop is available to accurately track
commanded rotational rates and therefore allow the rotational rates to be modeled as
control inputs.
In order to efficiently generate trajectories under these assumptions, a decoupling approach is used to reformulate the quadrocopter trajectory generation problem as the computation of time-optimal trajectories for acceleration- and jerk-limited triple integrators. The fact
that time-optimal trajectories for input-affine systems are bang-singular is well known
from optimal control theory [18], [6]. In the context of trajectory generation for robotic
applications, the bang-singular structure is attractive because these trajectories often
provide a relatively simple parameterization and the possibility of computing the corre-
sponding parameters efficiently. Similar approaches of bang-singular trajectory genera-
tion have been demonstrated for robotic arms with many joints [26] and for generic tra-
jectories where the derivatives of the position are constrained [12]. A similar decoupling
approach with time-optimal bang-singular trajectories has been used for omnidirectional
ground robots in the RoboCup competition [36].
Furthermore, the algorithm provides means to trade off computational complexity
and the performance of generated trajectories: after a first trajectory has been com-
puted, remaining computation time can be used to iteratively increase the performance
(i.e., reduce the duration) of the trajectory. The computational effort of this iterative
optimization can be determined a priori, but it is also possible to adapt to changing
availability of computational resources: the iteration process can be aborted at any time,
and the best trajectory generated up until that time can be used as a solution while still
maintaining feasibility.
Preliminary results of the decoupling approach were presented in a previous conference paper [16]. This paper extends these results by: 1) introducing an iterative method for optimally choosing the decoupling parameters, 2) presenting a rigorous way to account for a large class of initial conditions without violating the dynamic constraints of the vehicle, and 3) evaluating the computational requirements and performance of the algorithm.
1.4 Outline
The remainder of this paper is organized as follows: Section 2 introduces the model of
the quadrocopter dynamics that is used throughout the paper. The trajectory generation
problem is then formally presented in Section 3. Section 4 derives the feasibility conditions
from the dynamic model and the decoupling used to simplify the trajectory generation
problem while guaranteeing feasibility. Section 5 introduces the basic proposed trajectory
generation algorithm that satisfies the trajectory generation problem. Section 6 presents
the iterative optimization of the decoupling parameters to improve performance when
more computation time is available. The use of the trajectory generation algorithm as
an implicit feedback law is discussed in Section 7. Section 8 presents results on the
computational performance of the method and characterizes the properties of planned
trajectories in comparison to time-optimal trajectories. An experimental validation of
the method is presented in Section 9, and conclusions as well as an outlook are given in
Section 10.
2. Dynamic Model
Figure 1. The inertial coordinate system O, the body-fixed coordinate system V, and the
control inputs of the quadrocopter. The rotational rates ωx , ωy , and ωz are tracked by an on-
board controller, using gyroscope feedback. The collective mass-normalized thrust a acts along
the third body axis of the vehicle.
Ov = OVR Vv. (1)
tracking controllers):
where amin > 0. This limitation is motivated by the fact that typical quadrocopters have
fixed-pitch propellers, the direction of rotation of which cannot be reversed during flight.
(ẍ, ÿ, z̈)ᵀ = OVR (0, 0, a)ᵀ − (0, 0, g)ᵀ (6)
The change of vehicle attitude is related to the rotational control inputs through the
kinematic relationship [22]
OVṘ = OVR [ 0, −ωz, ωy ; ωz, 0, −ωx ; −ωy, ωx, 0 ] . (7)
Note that the above equations of motion neglect well-known aerodynamic effects that act on quadrocopters, such as rotor damping [34] and drag-like effects [1]. While such effects can be significant at high flight speeds, the algorithm presented herein is mainly intended for navigating relatively small spaces where flight speeds remain limited. Omitting these effects eliminates the need to quantify them over a large range of flight regimes and greatly simplifies the problem, which motivates their omission in this paper. These simplifying assumptions can be considered similar to model reduction approaches for model predictive control (see e.g. [14]) in that model accuracy is traded off for computational performance.
3. Problem Statement
The trajectory generation problem considered herein can be described as follows: Let
the quadrocopter have a given initial state (consisting of initial vehicle position x0 , y0 , z0 ,
velocity ẋ0 , ẏ0 , ż0 , and attitude OV R0 ). Generate a trajectory that satisfies the initial state
constraints and drives the vehicle to a target point (taken to be, without loss of generality,
the origin), with the vehicle reaching the target point at rest. The generated trajectory
must be feasible with respect to the quadrotor dynamics (6)-(7) and the control input
constraints (2)-(5). It should also reach the target point as quickly as possible, and its
generation should be computationally inexpensive.
4. Feasibility Conditions and Decoupling
This section describes how the trajectory generation problem statement from the previous section is converted into three separate, more tractable trajectory generation problems.
The derivation of the control inputs for an arbitrary vehicle trajectory (x(t), y(t), z(t))
is presented first (Section 4.1). Using this derivation, the general conditions under which
trajectories are feasible are then obtained (Section 4.2). From these general trajectory
constraints, a set of decoupled trajectory constraints are then generated through approx-
imations that preserve feasibility. These decoupled constraints differ from the general
trajectory constraints in that the motion constraints of one degree of freedom (e.g., x(t))
do not depend on the other two degrees of freedom (e.g., y(t) and z(t)).
f := (ẍ, ÿ, z̈)ᵀ + (0, 0, g)ᵀ = OVR (0, 0, a)ᵀ . (8)
Using the two-norm (denoted by ‖·‖), the thrust a required to follow the trajectory can be calculated by using the property that a pure rotation matrix does not change the norm of a vector:

a = ‖f‖. (9)

The normalized thrust direction is

f̄ := f / ‖f‖ (10)
and can be seen to define the third column of the rotation matrix by substituting f¯ back
into Equation (8):
OVR (0, 0, 1)ᵀ = f̄ . (11)
Taking the derivative of the above equation and combining it with Equation (7) gives
two of the vehicle body rates as functions of the current attitude and f¯˙:
(ωy, −ωx, 0)ᵀ = VOR f̄˙ . (12)
Equation (9) and the first two rows of Equation (12) provide three equations for four
unknown control inputs. To fully constrain the control input trajectories for the given
position trajectory, a fourth constraint must be additionally specified, which can be taken
to be a user-defined constraint on ωz (ωz = 0, for example).
Collective Thrust The collective thrust calculated from Equation (9) must lie be-
tween the minimum and maximum thrust defined by Equation (5), i.e.
‖f‖ = √(ẍ² + ÿ² + (z̈ + g)²) ≥ amin , (13)
‖f‖ ≤ amax . (14)
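Conditions (13)-(14) translate directly into a feasibility check over a sampled acceleration trajectory; a minimal sketch (function name hypothetical):

```python
import numpy as np

def thrust_feasible(xdd, ydd, zdd, a_min, a_max, g=9.81):
    """Check the collective-thrust conditions (13)-(14) on sampled accelerations:
    a_min <= ||f|| <= a_max with f = (x'', y'', z'' + g)."""
    norm_f = np.sqrt(np.asarray(xdd) ** 2 + np.asarray(ydd) ** 2
                     + (np.asarray(zdd) + g) ** 2)
    return bool(np.all(norm_f >= a_min) and np.all(norm_f <= a_max))
```

Hover satisfies the conditions (the required thrust equals g), while free fall violates the minimum thrust bound.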
Rotational Rates The actual body rate control inputs ωx and ωy can only be com-
Note that the above constraint is independent of the specification of ωz , and therefore
holds for a translational trajectory irrespective of the rotational motion defined through
ωz .
Initial State The trajectory must satisfy the constraints arising from the initial state
of the vehicle. Specifically, the position and velocity must coincide with the initial values
and the acceleration at the start of the trajectory must be such that Equation (11) is
satisfied with the given attitude, that is
OVR0 (0, 0, 1)ᵀ = f̄(t = 0) . (19)
Minimum Thrust Constraint In order to satisfy the minimum thrust constraint (13), the vertical acceleration is bounded from below:
z̈ ≥ z̈min (20)
It is then straightforward to verify that Equation (13) holds independently of the trajec-
tories in x and y because the thrust is no smaller than amin . While, at this point, it may
seem beneficial to choose the smallest possible value of z̈min , the tradeoffs involved in the
choice of this parameter will become obvious in the following section.
Rotational Rate Constraint The rotational rate control inputs nonlinearly couple
the jerk and acceleration of the three degrees of freedom according to Equation (12), and
their limitation (2)-(3) therefore couples the permissible values of the trajectory jerks and
accelerations.
The trajectory jerk f˙ is related to the actual control inputs through (12) by explicitly
calculating f¯˙:
(ωy, −ωx, 0)ᵀ = VOR ( f˙/‖f‖ − f fᵀf˙/‖f‖³ ) . (22)

Using the triangle inequality and the Cauchy-Schwarz inequality [2], the magnitude of the above expression can be bounded:

|ωx,y| ≤ ‖ f˙/‖f‖ − f fᵀf˙/‖f‖³ ‖ (23)
|ωx,y| ≤ ‖f˙‖/‖f‖ + ‖f fᵀf˙‖/‖f‖³ (24)
|ωx,y| ≤ ‖f˙‖/‖f‖ + ‖f‖² ‖f˙‖/‖f‖³ (25)
|ωx,y| ≤ 2 ‖f˙‖/‖f‖ . (26)
Because the minimum vertical acceleration constraint (20) provides a lower bound for
‖f‖, feasibility with respect to the body rate input constraints (2)-(3) is guaranteed if

‖f˙‖ ≤ ωxy,max (z̈min + g) / 2 (27)
holds. For the decoupled system, this can be ensured by limiting the permissible per-axis jerk of a planned trajectory (Equation (28)) and choosing the limiting values such that Equation (27) is satisfied (Equation (29)):
It can now be seen that the lower bound of the vertical acceleration (z̈min ) can be used
as a design parameter in order to trade off higher allowable jerk magnitudes against
achievable negative accelerations in the vertical (Equations (20) and (29)).
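One simple way to realize this tradeoff is to split the bound (27) evenly over the three axes: if each axis keeps its jerk magnitude below J, then ‖f˙‖ ≤ √3 J. This per-axis allocation is an assumption made for illustration; the thesis's own allocation in Equations (28)-(29) may differ.

```python
import numpy as np

def per_axis_jerk_limit(omega_xy_max, zdd_min, g=9.81):
    """One sufficient per-axis jerk bound: if each axis satisfies |jerk| <= J with
    J = omega_xy_max * (zdd_min + g) / (2 * sqrt(3)), then ||f_dot|| <= sqrt(3) * J
    and the coupled body-rate condition (27) is guaranteed."""
    assert zdd_min + g > 0.0, "the minimum thrust bound must remain positive"
    return omega_xy_max * (zdd_min + g) / (2.0 * np.sqrt(3.0))
```

The tradeoff is visible directly: a less negative z̈min permits a larger per-axis jerk.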
For the purpose of this paper, the jerk limit is chosen to be identical for the three degrees of freedom, i.e.

x⃛max = y⃛max = z⃛max =: w⃛max . (30)
Maximum Thrust Constraint The maximum allowable thrust (14) imposes a constraint on the two-norm of the accelerations in the three degrees of freedom. This constraint is satisfied if

|ẍ| ≤ ẍmax , |ÿ| ≤ ÿmax , z̈ ≤ z̈max with ẍmax² + ÿmax² + (z̈max + g)² ≤ amax² , (31)

where z̈ is already lower-bounded by Equation (20). The acceleration bounds of the three
degrees of freedom are parameterized as functions of the maximum thrust constraint (5)
using the maximum acceleration tradeoff parameters αx, αz ∈ (0, 1):

αz := z̈max / (amax − g) , (32)
αx := ẍmax / √(amax² − (z̈max + g)²) . (33)
The two parameters specify the maximum accelerations ẍmax and z̈max. The permissible acceleration ÿmax is then chosen to be as large as possible while fulfilling constraint (14):

ÿmax = √(amax² − ẍmax² − (z̈max + g)²) . (34)
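The mapping from the tradeoff parameters to the per-axis limits, Equations (32)-(34), can be sketched as follows (function name hypothetical):

```python
import numpy as np

def acceleration_bounds(alpha_z, alpha_x, a_max, g=9.81):
    """Map the tradeoff parameters alpha_z, alpha_x in (0, 1) to the per-axis
    acceleration limits via Eqs. (32)-(34)."""
    zdd_max = alpha_z * (a_max - g)                              # Eq. (32)
    xdd_max = alpha_x * np.sqrt(a_max**2 - (zdd_max + g)**2)     # Eq. (33)
    ydd_max = np.sqrt(a_max**2 - xdd_max**2 - (zdd_max + g)**2)  # Eq. (34)
    return xdd_max, ydd_max, zdd_max
```

By construction of Equation (34), the resulting bounds saturate the maximum-thrust constraint (14) exactly when all three axes accelerate at their limits simultaneously.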
(ẍ0, ÿ0, z̈0)ᵀ = OVR0 (0, 0, a0)ᵀ − (0, 0, g)ᵀ , (35)
with the initial thrust control input a0 remaining a free degree of freedom. For appli-
cations where the trajectory generation algorithm is applied recursively (e.g. when it
is used as a feedback law), the last value of the thrust a is typically available and can
readily be used to compute the initial acceleration; in other usage scenarios, the initial
value a0 may be considered a design variable.
It is assumed that the initial state z̈0 defined by Equation (35) satisfies the minimum
acceleration constraint (20), i.e. the initial acceleration must satisfy z̈0 ≥ z̈min for the
chosen value a0 . This condition limits the range of permissible initial attitudes as can be
seen in Equation (35).
Because the allowable acceleration magnitudes (ẍmax , ÿmax , z̈max ) can be traded off
freely through Equations (32) and (33), the initial accelerations (ẍ0 , ÿ0 , z̈0 ) may not satisfy
the constraints (31). For nonzero initial accelerations, the acceleration constraints (31)
are therefore replaced by the time-varying constraints (36).
These time-varying constraints are constructed as follows, here explained for the example of the first degree of freedom: the maximum allowable acceleration magnitude ẍmax (t) starts at the magnitude of the initial acceleration, |ẍ0 |. It then changes with constant slope cx until it reaches the design value ẍmax :

ẍmax (t) = |ẍ0 | + cx t for 0 ≤ t < (ẍmax − |ẍ0 |)/cx , and
ẍmax (t) = ẍmax for t ≥ (ẍmax − |ẍ0 |)/cx . (37)
It now remains to choose the slopes cx , cy , cz such that the maximum thrust constraint (14)
remains satisfied for all times given the initial acceleration (ẍ0 , ÿ0 , z̈0 ). This is done
by differentiating between degrees of freedom where the constraints are increasing (i.e.,
cx > 0) and those where they are decreasing (cx < 0).
For degrees of freedom where the initial acceleration exceeds the allowable acceleration
(e.g. |ẍ0 | > ẍmax ), the slope is chosen to be the negative maximum jerk cx = −\dddot{w}_max as given in Equation (30). This corresponds to the fastest possible transition into the design maximum magnitude that is permissible under the jerk constraint (28).
Degrees of freedom where the initial acceleration is smaller than the allowable accel-
eration (e.g. |ẍ0 | < ẍmax ) cannot increase their acceleration at the maximum permissible
jerk value without violating the maximum thrust constraint (14). Intuitively speaking,
they must ‘wait’ for other degrees of freedom to reduce their respective acceleration before
using the full acceleration. Given that the constraints are always reduced in magnitude
at the maximum rate, the longest duration for any degree of freedom to reduce to zero
acceleration is
∆T0 := max( |ẍ0 | / \dddot{w}_max , |ÿ0 | / \dddot{w}_max , |z̈0 | / \dddot{w}_max ). (38)
The slopes for acceleration bounds that exceed the initial acceleration magnitude (i.e., increasing bounds) are then chosen such that they reach their allowable acceleration after ∆T0 , given here for the example of the x degree of freedom:
cx = ( ẍmax − |ẍ0 | ) / ∆T0 . (39)
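The construction of the time-varying bounds (37)-(39) can be sketched as follows. The numeric limits are illustrative assumptions, and `slopes` and `bound` are hypothetical helper names:

```python
# Sketch of the time-varying acceleration bounds (Eqs. (37)-(39)).
def slopes(acc0, acc_des, jerk_max):
    """Slopes c_i of the time-varying bounds.
    Bounds that must shrink do so at the maximum jerk; bounds that may grow
    are slowed down so that they reach their design value only after the
    slowest degree of freedom has reduced its acceleration."""
    delta_T0 = max(abs(a0) / jerk_max for a0 in acc0.values())   # Eq. (38)
    c = {}
    for axis, a0 in acc0.items():
        if abs(a0) > acc_des[axis]:
            c[axis] = -jerk_max                                  # fastest shrink
        else:
            c[axis] = (acc_des[axis] - abs(a0)) / delta_T0       # Eq. (39)
    return c, delta_T0

def bound(t, a0, a_des, c):
    """Time-varying acceleration bound of one axis, Eq. (37)."""
    if c == 0.0:
        return a_des                       # bound already at its design value
    t_reach = (a_des - abs(a0)) / c        # positive for both signs of c
    return abs(a0) + c * t if t < t_reach else a_des

c, delta_T0 = slopes({'x': 8.0, 'y': 2.0, 'z': 1.0},
                     {'x': 6.0, 'y': 10.0, 'z': 5.0}, jerk_max=40.0)
assert c['x'] == -40.0                     # x starts above its bound: shrink
assert abs(bound(1.0, 8.0, 6.0, c['x']) - 6.0) < 1e-12
```

The same `bound` expression covers both the shrinking case (negative slope) and the growing case (positive slope), since the reach time is positive in either case.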
The proof that this choice of slopes cx , cy , cz satisfies the maximum thrust con-
straint (14) is omitted here but may be found in Appendix A.
Overview This completes the decoupling of the system. In conclusion, if the following
conditions are satisfied:
1. the initial attitude is converted to an initial acceleration according to Equation (35)
and satisfies the minimum vertical acceleration constraint (20);
2. the jerk in each axis is constrained in magnitude according to Equation (28); and
3. the acceleration of the individual degrees of freedom is constrained by Equations (20)
and (36);
then the three-dimensional overall trajectory will be feasible with respect to the system
dynamics (6)-(7), the input constraints (2)-(5), and the initial conditions. It is then pos-
sible to design the trajectory of the three translational degrees of freedom independently.
The parameters used in the decoupling of the three degrees of freedom are the maxi-
mum acceleration tradeoff parameters αx and αz , the minimum vertical acceleration z̈min ,
and the initial thrust a0 .
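Conditions 2 and 3, together with the thrust norm they imply, can be spot-checked on a sampled candidate trajectory. The limits below are assumed values and `check_feasibility` is a hypothetical helper:

```python
import numpy as np

# Spot-check of the decoupled feasibility conditions on a sampled candidate
# trajectory. Limits and the helper name are illustrative assumptions.
def check_feasibility(acc, dt, acc_lo, acc_hi, jerk_max, a_max, g=9.81):
    """acc: (N, 3) array of sampled accelerations (x, y, z) in m/s^2."""
    jerk = np.diff(acc, axis=0) / dt                              # finite-difference jerk
    ok_jerk = bool(np.all(np.abs(jerk) <= jerk_max + 1e-9))       # condition 2
    ok_acc = bool(np.all((acc >= acc_lo) & (acc <= acc_hi)))      # condition 3
    thrust = np.sqrt(acc[:, 0]**2 + acc[:, 1]**2 + (acc[:, 2] + g)**2)
    ok_thrust = bool(np.all(thrust <= a_max + 1e-9))              # thrust norm
    return ok_jerk and ok_acc and ok_thrust

# A hover trajectory (zero acceleration) trivially satisfies all conditions:
hover = np.zeros((50, 3))
assert check_feasibility(hover, dt=0.01,
                         acc_lo=np.array([-6.0, -6.0, -5.0]),
                         acc_hi=np.array([6.0, 6.0, 5.0]),
                         jerk_max=40.0, a_max=20.0)
```

Because the per-axis bounds are constructed from the thrust constraint, the thrust-norm check is redundant for trajectories planned with the decoupled bounds; it is included here as an independent sanity check.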
5. Decoupled Trajectory Planning

The basic trajectory generation algorithm will be introduced in this chapter. The decoupled feasibility constraints (Section 4.3) allow the planning to be carried out independently for each degree of freedom while guaranteeing feasibility. This makes the
trajectory generation problem significantly more tractable and allows it to be solved in a
computationally efficient way.
Problem Statement Let s = (s1 , s2 , s3 ) = (w, ẇ, ẅ) be the state. The time-optimal planning problem can then be stated as: find the planning input u = \dddot{w} minimizing the final time tf,w , subject to
ṡ1 = s2 (41)
ṡ2 = s3 (42)
ṡ3 = u, (43)
s1 (t = 0) = w0 (44)
s2 (t = 0) = ẇ0 (45)
s3 (t = 0) = ẅ0 (46)
s(t = tf,w ) = 0 (47)
ẅmin ≤ s3 ≤ ẅmax (48)
|u| ≤ \dddot{w}_max . (49)
In this formulation, the initial vehicle attitude (35) is enforced by the initial con-
dition (46), the jerk constraint (28) by the input constraint (49), and the acceleration
constraints (20) and (36) by the state constraint (48). Because the translational dynam-
ics (6) contain no velocity-dependent aerodynamic effects, no velocity constraints are
considered.
The Hamiltonian of this problem is

H(s, u, λ, η) = λ1 s2 + λ2 s3 + λ3 u + η1 (s3 − ẅmax ) + η2 (ẅmin − s3 ) , (50)

where λ are the adjoint variables and η are state constraint multipliers that fulfill

η ≥ 0 (51)
η1 = 0 if s3 < ẅmax (52)
η2 = 0 if s3 > ẅmin . (53)
The adjoint dynamics λ̇ = −∂H/∂s (54) then result in

λ̇1 = 0 (55)
λ̇2 = −λ1 (56)
λ̇3 = −λ2 − η1 + η2 . (57)
The optimal control u∗ is the control input that minimizes the Hamiltonian function:

u∗ = arg min_{|u| ≤ \dddot{w}_max} H(s, u, λ, η) . (58)
For this problem structure, it can be shown that the adjoint variables λ are continu-
ous [15]. Furthermore, λ3 = 0 must hold when a state constraint is active. The optimal
control input u∗ consists of interior arcs and boundary arcs. An interior arc is characterized by η1 = η2 = 0 (i.e., the state constraint (48) is not active) and λ3 ≠ 0, and therefore
[Figure 2: Example solution of the decoupled optimal control problem. Top: jerk input u∗ , switching between +\dddot{w}_max , 0, and −\dddot{w}_max . Middle: acceleration ẅ, starting at ẅ0 and bounded between ẅmin and ẅmax . Bottom: position w, starting at w0 and reaching 0. The switching times t1 , t2 , t3 , t4 and the final time tf separate the five arcs over time t.]
u∗ = ±\dddot{w}_max . On boundary arcs, the state constraint is active (i.e., s3 = ẅmax or s3 = ẅmin ), it follows that λ3 = 0, and u∗ is chosen such that the state constraint remains active.
Furthermore, each boundary arc induces one additional constraint, in that one or the
other of the following conditions must hold:
• If the duration of the boundary arc is nonzero (t2 −t1 > 0 or t4 −t3 > 0, respectively),
then s3 must coincide with the upper or lower acceleration constraint at the start
point of the boundary arc (t1 or t3 ).
• If the above condition does not hold, the corresponding boundary arc must vanish
(t2 = t1 or t4 = t3 , respectively).
In the following, the solution to the optimal control problem (40)-(49) will be denoted
by w∗ (t). An example of the solution of the optimal control problem is illustrated in
Figure 2. This example depicts the case where all five sections are of non-zero duration.
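For the special case of a rest-to-rest maneuver in which the acceleration constraint (48) never becomes active, the bang-singular solution reduces to a pure bang-bang jerk profile. The following sketch integrates such a profile numerically; the phase pattern and switching time are derived for this special case only, and the numeric limits are assumed:

```python
# Rest-to-rest solution of the triple integrator (41)-(43) when the
# acceleration bound never activates: bang-bang jerk with the pattern
# (+j, -j, +j) and phase durations (t1, 2*t1, t1). The total displacement
# is then d = 2 * j * t1**3, which fixes the switching time t1.
j = 40.0   # jerk limit (m/s^3), assumed
d = 5.0    # displacement (m), assumed

t1 = (d / (2.0 * j)) ** (1.0 / 3.0)

def integrate(phases, dt=1e-5):
    """Integrate (position, velocity, acceleration) under piecewise-constant jerk."""
    pos, vel, acc = 0.0, 0.0, 0.0
    for u, duration in phases:
        t = 0.0
        while t < duration:
            acc += u * dt
            vel += acc * dt
            pos += vel * dt
            t += dt
    return pos, vel, acc

pos, vel, acc = integrate([(j, t1), (-j, 2.0 * t1), (j, t1)])
# The trajectory reaches the target displacement and comes to rest:
assert abs(pos - d) < 1e-2 and abs(vel) < 1e-2 and abs(acc) < 1e-2
```

In the general case the acceleration bound does become active and the jerk profile acquires the singular arcs (u∗ = 0) of Figure 2; the switching times then follow from the boundary conditions rather than from the closed form above.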
5.3 Overview
The basic trajectory generation algorithm has herewith been described. The total three-
dimensional maneuver is given by
x(t) = x∗(t) for 0 ≤ t ≤ tf,x , and x(t) = 0 for t > tf,x (60)
y(t) = y∗(t) for 0 ≤ t ≤ tf,y , and y(t) = 0 for t > tf,y (61)
z(t) = z∗(t) for 0 ≤ t ≤ tf,z , and z(t) = 0 for t > tf,z . (62)
The maneuver satisfies the initial conditions, the quadrocopter dynamics, and the input constraints.

6. Choice of Design Parameters

The algorithm presented in the previous chapter computes a feasible trajectory that
satisfies the trajectory planning problem. However, the decoupling approach involved the
choice of the design parameters αx , αz , z̈min , a0 in order to allow the separate planning in
each degree of freedom. This chapter presents and discusses the properties and influence
of the individual parameters, and proposes possible iterative improvement schemes that
permit the computation of better trajectories6 through their variation.
For any choice of the decoupling parameters satisfying the conditions outlined in
Section 4.3, the solution of the decoupled trajectory planning (60)-(62) is a feasible solu-
tion to the trajectory generation problem. It is therefore possible to apply the iterative
improvement schemes to only a subset of parameters, or to not apply them at all.
A consequence of this property is that, as long as the basic trajectory generation
algorithm from the previous chapter has been executed at least once, a feasible solution
is available at any time, and further executions then improve on previous ones. This
implies that computational constraints can always be met by aborting the computation when the permissible computation duration is reached. If the computation time is not
sufficient for the iteration schemes to converge to a user-defined level of accuracy, then
the available solution differs from the converged solution in that the maneuver duration
may be longer, but is still guaranteed to be feasible with respect to the dynamics and
constraints of the problem.
If one degree of freedom has a longer execution time than the others, it is allocated more acceleration in order to reduce its execution time.
An important property of the decoupling parameters αx and αz is that the dura-
tion of the time-optimal maneuver of each single degree of freedom depends on them
monotonically. This can be shown as follows:
The acceleration boundaries (ẍmax , ÿmax , z̈max ) are strictly monotonic with respect to
the control effort tradeoff parameters (32)-(33). By construction, the time-varying acceleration constraints (37) monotonically depend on the respective acceleration boundary ẅmax . That is, if ẅmax,1 ≥ ẅmax,2 , then the corresponding time-varying bounds satisfy ẅmax,1 (t) ≥ ẅmax,2 (t) for all t ≥ 0.
Because the computed trajectories are optimal with respect to the maneuver duration,
an increase of the constraints cannot increase the terminal time of the maneuver [3] and
the maneuver duration is therefore monotonic with respect to the decoupling parameters
αx and αz .
From this property, it follows that a maneuver synchronizing the maneuver durations
of the three degrees of freedom minimizes tf . This can be shown as follows: Assume
that the three maneuver durations are synchronized. Because the maneuver durations
of the three degrees of freedom are monotonic with respect to the control effort tradeoff
parameters, no change of the parameters can reduce all three maneuver durations (and
thereby the total maneuver duration tf as given by Equation (63)).
The monotonicity of the maneuver durations with respect to the optimization param-
eters allows the use of a large number of optimization algorithms to find their optimal
value. A straightforward implementation consists of two loops: an inner loop that syn-
chronizes the two horizontal degrees of freedom (i.e., tf,x and tf,y ) by varying αx , and an
outer loop that synchronizes the vertical motion (i.e., tf,z ) to the two horizontal motions
by varying αz . Each of these loops can be implemented in a simple manner by a bisection
algorithm, which assures linear convergence. Specifically, this implies that the complexity
of the two nested bisections finding the optimal values of αx and αz to within a tolerance εxz is [7]

O( log₂²(εxz ) ) . (65)
In addition, the computation of the inner loop can be aborted early if it is found that
the maneuver duration of both horizontal degrees of freedom is either shorter or longer
than the duration of the vertical degree of freedom. Because the two horizontal maneuver durations depend on αx with opposite monotonicity, varying αx cannot increase or decrease both of them simultaneously, and synchronization with the vertical degree of freedom is therefore not possible for the current value of αz . By using this property, it is not necessary to
perform the bisection of αx to full accuracy for each bisection step in αz .
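The nested-loop synchronization can be illustrated with a toy model: here a hypothetical closed-form duration function (rest-to-rest bang-bang acceleration for a double integrator) stands in for the 1D planner, and a simple linear split of the acceleration budget stands in for the actual parameterization (32)-(34):

```python
# Toy illustration of duration synchronization by bisection. The duration
# model and the linear acceleration split are simplifying assumptions.
def duration(dist, acc):
    """Stand-in for the 1D planner: rest-to-rest bang-bang acceleration
    time for distance `dist` under the bound `acc` (monotonic in `acc`)."""
    return 2.0 * (dist / acc) ** 0.5

def synchronize(dist_x, dist_y, acc_total, tol=1e-6):
    """Bisect alpha in (0, 1), splitting `acc_total` between x and y,
    until the two single-axis durations match."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        alpha = 0.5 * (lo + hi)
        t_x = duration(dist_x, alpha * acc_total)
        t_y = duration(dist_y, (1.0 - alpha) * acc_total)
        if abs(t_x - t_y) < tol:
            break
        if t_x > t_y:
            lo = alpha   # x is slower: give it more acceleration
        else:
            hi = alpha
    return alpha, t_x, t_y

alpha, t_x, t_y = synchronize(8.0, 2.0, 10.0)
assert abs(t_x - t_y) < 1e-3   # durations are synchronized
```

The early-abort criterion described above corresponds to terminating this inner loop as soon as both t_x and t_y fall on the same side of the vertical duration.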
Figure 3. Influence of the design parameter z̈min on the maneuver duration tf with optimized
tradeoff parameters αx , αz for three different maneuvers (solid, dashed, and dotted). The op-
timization presented in Section 6.2 finds the minimum of this function through an exhaustive
search.
The two nested bisection algorithms provide a straightforward way to compute the
optimal values of the two control tradeoff parameters αx and αz . Furthermore, the
maneuver duration monotonicity and linear convergence of the bisection method allow
the a priori determination of hard constraints on the number of required executions.
However, more sophisticated and multivariate optimization methods (see e.g. [11]) could provide higher computational efficiency in finding the optimal values.
7. Use as a Feedback Law
guaranteed.
6.3 Overview
This chapter presented schemes for finding the optimal values of the design parameters
αx , αz , z̈min such that the maneuver duration is minimized. The application of these
schemes is optional since the feasibility of the trajectory generation algorithm does not
depend on them. Depending on the computational constraints and performance require-
ments, one may choose to optimize all design parameters, or to fix some or all of them
ahead of time.
The schemes presented herein are focused on simplicity and assured convergence, and
optimize each of the design parameters individually in nested loops. Through the use
of more advanced optimization algorithms, and by directly considering the multivariate
optimization problem, the computation time may be reduced further.
The decoupling process also involved the choice of the initial thrust a0 in Equa-
tion (35). While this can be considered a design parameter and therefore optimized, it
is assumed herein that it is chosen based on the thrust applied previously. This results
in continuous accelerations and therefore continuous thrust commands (as computed by
Equation (9)), a favourable property with respect to the underlying thrust dynamics.
8. Performance Evaluation
The overall trajectory generation algorithm, consisting of the basic decoupled trajec-
tory generator (Section 5) and the iterative improvement schemes (Section 6), has now
been described. Generated trajectories satisfy the feasibility constraints imposed by the
dynamics of the quadrocopter by construction. This section will characterize the perfor-
mance of the presented algorithm with respect to the design objectives of achieving high
flight performance (short maneuver durations) and low computational complexity (short
computation times).
[Figure 4: vertical position over horizontal position (m) for the purely horizontal maneuver. Figure 5: inputs ωx (rad s−1 ) and a (m s−2 ) over time t (s).]
Figure 5. Control inputs corresponding to the comparison of the purely horizontal maneuver
shown in Figure 4. The solid lines represent the thrust input a, and the dotted lines the rotational
rate input ωx . Black lines correspond to the time-optimal translation [17], and gray lines are the
inputs of the trajectory generated by the algorithm presented herein. Note that the decoupling
yields non-maximal thrust inputs when the vehicle rotates (beginning, center, and end of the
maneuver), and the conservatism introduced in the decoupled jerk feasibility constraints leads
to a less efficient use of the rotational rate control input.
The trajectory generated with the algorithm presented herein takes 2.01 s, which amounts to
an increase of 29 %. Considering the shape of the generated trajectories, two significant
factors contributing to the increased duration can be identified:
1) The decoupling assumption, along with identical initial and final altitudes and the
maneuver starting at rest, leads to the generated trajectory being entirely at constant
altitude. In comparison, the time-optimal maneuver exploits any available thrust during
attitude changes to accelerate upwards, thereby allowing the vehicle to tilt further and
increase its horizontal acceleration. This effect is further enabled because the time-optimal maneuver is planned using the true control inputs (a, ωx ), enabling instantaneous
changes in thrust. The decoupling presented herein results in the trajectory being planned
in jerk, meaning that the thrust (as computed by Equation (9)) is always continuous
throughout the maneuver. Although this lack of discontinuities in the thrust control
input results in lower performance, it may be desirable due to the underlying actuator
dynamics of the quadrocopter.
2) A further noticeable effect is highlighted in the control inputs corresponding to the
two maneuvers (Figure 5): while the time-optimal maneuver uses the maximum rotational
rate input for significant durations at the beginning and end of the maneuver, the motion
generated using the algorithm presented herein results in lower rotational rate control
inputs. This is mostly caused by the conservative approximations used to decouple the
rotational rate feasibility conditions (30).
As a second example, consider a translation of 8 m in the horizontal and 8 m in the
vertical. The time-optimal trajectory and the generated trajectory are shown in Figure 6.
In this case, the duration of the generated trajectory is 58 % longer than the time-optimal
trajectory (2.78 s vs. 1.76 s). Two contributing factors can be identified:
1) The arguments about the use of jerk as a control input also apply in this case,
leading for example to a slower initial acceleration than the time-optimal motion provides.
2) A significant difference can be seen towards the end of the motion: during the final
deceleration phase, the generated trajectory results in the vehicle pitching to an angle
of approximately 50◦ , applying the optimized minimal vertical acceleration z̈min and the
minimal horizontal acceleration −ẍmax . The time-optimal trajectory results in the vehicle
being pitched to an upside-down attitude at an angle of approximately 140◦ , applying
full thrust and thereby achieving significantly lower vertical accelerations. This differ-
ence is caused by the decoupling of the three degrees of freedom, wherein the minimum
vertical acceleration constraint (20) was introduced. The value of the constraint was
then optimized to trade off the permissible jerk against allowable vertical deceleration
(Section 6.2).
The iterative schemes presented in Section 6 can be used to improve the trajectory performance. For example, it can be shown
that the optimization of the acceleration tradeoff parameters αx and αz to a tolerance
of 1 % requires at most 171 calls of the one-dimensional decoupled trajectory generator,
therefore requiring approximately 1.4 ms of computation time. For a tolerance of 10 % in
the tradeoff parameters, a maximum of 78 calls suffice (resulting in approximately 0.6 ms
computation time). This optimization can then be repeated for varying values of the
minimum vertical acceleration z̈min , where the number of evaluated points may be chosen
based on computational constraints.
9. Experimental Validation
[Figure 7: constant-altitude experiment; flown trajectory in the horizontal plane, position y (m) over position x (m).]
Note that the implicit control law does not track a previously planned trajectory,
but a new trajectory to the target point is computed at every controller update. This
replanning behaviour can be observed particularly well when significant deviations from a
previously planned trajectory occur, for example during the initial deceleration phase and
when the vehicle changes direction after the target point is changed. While the initially
planned trajectory contains a significant amount of overshoot, the actual overshoot during
execution of the maneuver is smaller. Potential explanations for this behaviour are the
unmodeled effects of drag (which can cause more deceleration) [1] and translational lift
(which can cause increased propeller efficiency) [21]; both of these are known to have
significant influence at the maneuvering speeds encountered in this experiment. Because
the higher-than-planned deceleration allows for a more direct flight path, the trajectories
planned later on contain almost no overshoot.
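The use of a trajectory generator as an implicit feedback law can be sketched as follows. The planner here is a deliberately simple stand-in (a time-optimal bang-bang acceleration law for a double integrator rather than the paper's jerk-limited planner), and the drag term is an assumed unmodeled disturbance; the point is the structure of replanning from the measured state at every controller update:

```python
# Replanning as feedback: at each update the maneuver to the target is
# re-planned from the current state and only the first input is applied.
def plan_first_input(pos, vel, target, a_max):
    """Stand-in planner: time-optimal switching-curve law for a double
    integrator with |a| <= a_max."""
    e = pos - target
    # Switching function: zero on the time-optimal deceleration parabola.
    sigma = e + vel * abs(vel) / (2.0 * a_max)
    if sigma > 0.0:
        return -a_max
    if sigma < 0.0:
        return a_max
    return -a_max if vel > 0 else (a_max if vel < 0 else 0.0)

dt, a_max, target = 0.005, 5.0, 4.0
pos, vel = 0.0, 0.0
for _ in range(2000):                 # 10 s of simulated flight
    a = plan_first_input(pos, vel, target, a_max)
    vel += (a - 0.2 * vel) * dt       # drag acts as an unmodeled disturbance
    pos += vel * dt
assert abs(pos - target) < 0.1
```

Because the plan is recomputed from the measured state, the unmodeled drag is absorbed automatically, mirroring the replanning behaviour observed in the experiments.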
Figure 8. Three-dimensional motion experiment. The vehicle starts at the top-right (green
circle) with the target point being at the bottom left (left red cross). As the vehicle approaches
the target point, the target is switched to the center-right target point (right red cross). The
solid red line shows the flown trajectory, and the dotted blue lines (starting at the green circles)
show some of the planned trajectories.
the target point is changed when the vehicle is within 1 m of the target point, and the
new target point is then (x = 0 m, y = 2.5 m, z = 4 m). The vehicle starts at a speed of
3.6 m s−1 , and reaches a peak speed of 7.2 m s−1 .
Figure 9 shows the flight trajectory over time along with the same planned trajectories
that are seen in Figure 8. Note that, particularly during high-speed flight phases, the
vehicle lags behind the planned trajectories. This behaviour is again consistent with aerodynamic effects [35], and is exacerbated (in comparison to the constant-altitude experiment) by the vertical speed of the vehicle ranging from −3.8 m s−1 to 2.8 m s−1 . The propeller inflow velocity thus changes considerably.
Figure 9. Flown (solid red) and a selection of planned (dashed blue, starting at green cir-
cles) trajectories over time for the experiment shown in Figure 8. The set point switches from
(x = 3 m, y = −2.5 m, z = 1.5 m) to (x = 0 m, y = 2.5 m, z = 4 m) at t = 1.94 s.
(inspired by [37]) in order to ensure that no collisions occur. Over the duration of the
project, tens of thousands of trajectories were generated.
10. Conclusion
This paper presented and analyzed a method for generating quadrocopter flight trajec-
tories. The method efficiently computes trajectories that both allow for fast motion and
are guaranteed to be feasible.
The decoupling of the trajectory generation problem under conservative approxima-
tion of the feasibility constraints allows the problem to be simplified considerably, and the
computation of time-optimal trajectories for the decoupled system reduces to determining
the switching times of the bang-singular solution trajectory. A particular strength of the
method is that it very quickly allows the computation of a provably feasible trajectory
(that is, within few tens of microseconds on a laptop computer), but can also iteratively
provide higher-quality solutions if more computation time is available. The iterative im-
provement allows optimal design parameters to be determined and can be carried out
until convergence criteria are met, or aborted at any time if no more computation time
is available.
Analysis of the computed trajectories showed structural differences when compared to time-optimal trajectories. These differences are mostly caused by the decoupling approach.

Figure 10. Tower made of foam bricks, assembled by quadrocopters during the Flight Assembled Architecture project [41]. Two of the quadrocopters can be seen above the tower to the left and right; a third is seen resting on a charging station on the wall to the left. The flight paths used by the quadrocopters for this project were generated using the trajectory generation algorithm presented in this paper.

The trajectory generation algorithm presented herein trades off the full use of
the dynamic capabilities of quadrocopters for the ability to compute tens of thousands of
trajectories every second. This makes it particularly suitable to applications that require
the planning of large numbers of trajectory candidates (such as, for example, sampling-
based path planners) or fast re-planning due to rapidly changing or a priori unknown
environments or task objectives.
The approach was experimentally verified for quadrocopters maneuvering at moder-
ately high speeds (up to 7 m s−1 ), and was shown to cope well with disturbances due to
unmodeled dynamic effects and changing target points. Its robustness and applicability as a building block in more complex systems have been demonstrated by its use in the Flight Assembled Architecture project [41].
A. Feasibility of Time-Varying Acceleration Constraints
This appendix contains the proof that the time-varying acceleration constraints (36) (used
to account for non-zero initial accelerations) satisfy the maximum thrust constraint (14).
For the purpose of simplicity, the proof will first be presented for a related set of time-
varying constraints (referred to as constant time bound change). The findings of this
related acceleration change will then be applied to the actual time-varying acceleration
constraints (36).
Constant Time Bound Change Consider acceleration bounds that change linearly from the initial magnitudes (|ẍ0 |, |ÿ0 |, |z̈0 |) such that they reach their respective bounds (ẍmax , ÿmax , z̈max ) in constant time ∆T , i.e.

ẍct (t) = |ẍ0 | + (cx /∆T ) t with cx := ẍmax − |ẍ0 | , (66)
ÿct (t) = |ÿ0 | + (cy /∆T ) t with cy := ÿmax − |ÿ0 | , (67)
z̈ct (t) = |z̈0 | + (cz /∆T ) t with cz := z̈max − |z̈0 | , (68)

with ∆T0 defined in Equation (38) being one such choice of ∆T . Then the maximum acceleration bound (14) remains satisfied for all t ∈ (0, ∆T ):

ẍct (t)² + ÿct (t)² + (z̈ct (t) + g)² = (|ẍ0 | + (cx /∆T ) t)² + (|ÿ0 | + (cy /∆T ) t)² + (|z̈0 | + g + (cz /∆T ) t)² ≤ amax² . (69)
Proof: Without loss of generality, let the normalized time be t̃ such that ∆T̃ = 1. Rewriting (69) in terms of (ẍmax , ÿmax , z̈max ) using Equations (66)-(68) results in the condition

( ẍmax + cx (t̃ − 1) )² + ( ÿmax + cy (t̃ − 1) )² + ( z̈max + g + cz (t̃ − 1) )² ≤ amax² . (70)
Because 0 ≤ 1 − t̃ ≤ 1 a sufficient condition for the inequality to hold is
The initial acceleration (35) can be rewritten using Equations (66)-(68) to yield
with a0 ≤ amax by design. Expanding the above yields the required inequality (72).
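The lemma can also be spot-checked numerically: the thrust norm is convex along the linear bound change, so if both endpoints satisfy the thrust constraint, every intermediate time does as well. The numeric values below are illustrative assumptions:

```python
import math
import random

# Numerical spot-check of the constant time bound change inequality (69):
# if both the initial magnitudes and the design bounds satisfy the thrust
# constraint, the linearly interpolated bounds satisfy it at all sampled times.
random.seed(1)
a_max, g = 20.0, 9.81   # assumed limits

def thrust(ax, ay, az):
    """Thrust norm corresponding to accelerations (ax, ay, az)."""
    return math.sqrt(ax**2 + ay**2 + (az + g)**2)

for _ in range(1000):
    # Draw design bounds and initial magnitudes inside the thrust ball.
    while True:
        bx, by, bz = (random.uniform(0, 8) for _ in range(3))
        x0, y0, z0 = (random.uniform(0, 8) for _ in range(3))
        if thrust(bx, by, bz) <= a_max and thrust(x0, y0, z0) <= a_max:
            break
    for k in range(101):
        s = k / 100.0                      # normalized time t / Delta_T
        value = thrust(x0 + s * (bx - x0),
                       y0 + s * (by - y0),
                       z0 + s * (bz - z0))
        assert value <= a_max + 1e-9       # Eq. (69) holds along the change
```

The check passes because a norm is convex along any line segment, which is exactly the structure exploited by the sufficient-condition step (0 ≤ 1 − t̃ ≤ 1) of the proof.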
It remains to consider the case where the initial acceleration exceeds the allowed acceleration magnitude (e.g. |ẍ0 | > ẍmax ). In this case, the actual time-varying bounds decrease at the maximum jerk −\dddot{w}_max until the allowable acceleration (e.g. ẍmax ) is reached. To show that the inequality (69) remains satisfied, it suffices to note that, through the construction of cx , cy , cz in Equations (66)-(68), the actual time-varying bound never exceeds the corresponding constant time bound change, and this holds for all times t. The time-varying acceleration constraints must therefore be feasible
with respect to the maximum acceleration bound (14) since the constant time bound
change is feasible.
References
[1] Moses Bangura and Robert Mahony. Nonlinear Dynamic Modeling for High Perfor-
mance Control of a Quadrotor. In Australasian Conference on Robotics and Automa-
tion, 2012.
[2] Dennis S. Bernstein. Matrix Mathematics. Princeton University Press, 2005.
[3] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Vol. I. Athena
Scientific, third edition, 2005.
[4] Patrick Bouffard, Anil Aswani, and Claire J Tomlin. Learning-Based Model
Predictive Control on a Quadrotor: Onboard Implementation and Experimental
Results. In International Conference on Robotics and Automation, 2012.
[5] Y. Bouktir, M. Haddad, and T. Chettibi. Trajectory Planning for a Quadrotor
Helicopter. In Mediterranean Conference on Control and Automation, 2008.
[6] Arthur Earl Bryson and Yu-Chi Ho. Applied Optimal Control. Taylor & Francis,
1975.
[7] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Brooks/Cole, ninth
edition, 2011.
[8] Ian D. Cowling, Oleg A. Yakimenko, and James F. Whidborne. A Prototype of
an Autonomous Controller for a Quadrotor UAV. In European Control Conference,
2007.
[9] Shay Gal-on and Markus Levy. Exploring CoreMark - A Benchmark Maximizing
Simplicity and Efficacy. The Embedded Microprocessor Benchmark Consortium
(EEMBC), 2012.
[10] Carlos E. García, David M. Prett, and Manfred Morari. Model Predictive Control:
Theory and Practice - A Survey. Automatica, 25(3):335–348, 1989.
[11] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization.
Academic Press, October 2003.
[12] Corrado Guarino Lo Bianco and Fabio Ghilardelli. Third Order System for the
Generation of Minimum-Time Trajectories with Asymmetric Bounds on Velocity,
Acceleration, and Jerk. In International Conference on Intelligent Robots and
Systems: Workshop on Robot Motion Planning: Online, Reactive, and in Real-time,
2012.
[13] Daniel Gurdan, Jan Stumpf, Michael Achtelik, Klaus-Michael Doth, Gerd Hirzinger,
and Daniela Rus. Energy-Efficient Autonomous Four-Rotor Flying Robot Controlled
at 1 kHz. In International Conference on Robotics and Automation, 2007.
[14] Juergen Hahn and Thomas F. Edgar. An Improved Method for Nonlinear Model
Reduction Using Balancing of Empirical Gramians. Computers & Chemical Engi-
neering, 26(10):1379–1397, 2002.
[15] Richard F. Hartl, Suresh P. Sethi, and Raymond G. Vickson. A Survey of the
Maximum Principles for Optimal Control Problems with State Constraints. SIAM
Review, 37(2):181–218, 1995.
[16] Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and
Control. In IFAC World Congress, 2011.
[17] Markus Hehn, Robin Ritz, and Raffaello D’Andrea. Performance Benchmarking of
Quadrotor Systems Using Time-Optimal Control. Autonomous Robots, 33(1-2):69–
88, 2012.
[18] H. Hermes and G. Haynes. On the Nonlinear Control Problem with Control
Appearing Linearly. Journal of the Society for Industrial and Applied Mathematics
Series A Control, 1(2):85–108, 1963.
[19] Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin. Quadrotor
Helicopter Trajectory Tracking Control. In Conference on Decision and Control,
2008.
[20] Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian.
Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Systems
Magazine, 28(2):51–64, 2008.
[21] Haomiao Huang, Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin.
Aerodynamics and Control of Autonomous Quadrotor Helicopters in Aggressive
Maneuvering. In International Conference on Robotics and Automation, 2009.
[22] Peter C. Hughes. Spacecraft Attitude Dynamics. John Wiley & Sons, 1986.
[23] Qimi Jiang, Daniel Mellinger, Christine Kappeyne, and Vijay Kumar. Analysis
and Synthesis of Multi-Rotor Aerial Vehicles. In International Design Engineering
Technical Conference, 2011.
[24] Sertac Karaman and Emilio Frazzoli. Sampling-Based Algorithms for Optimal
Motion Planning. International Journal of Robotics Research, 30(7):846–894, 2011.
[25] Alonzo Kelly and Bryan Nagy. Reactive Nonholonomic Trajectory Generation via
Parametric Optimal Control. The International Journal of Robotics Research, 22(7-
8):583–601, 2003.
[26] Torsten Kroger and Friedrich M. Wahl. Online Trajectory Generation: Basic
Concepts for Instantaneous Reactions to Unforeseen Events. IEEE Transactions on
Robotics, 26(1):94–111, 2010.
[27] Sergei Lupashin, Markus Hehn, Mark W. Mueller, Angela P. Schoellig, Michael
Sherback, and Raffaello D’Andrea. A Platform for Aerial Robotics Research and
Demonstration: The Flying Machine Arena. Mechatronics, 24(1):41–54, 2014.
[28] Lorenz Meier, Petri Tanskanen, Lionel Heng, Gim Hee Lee, Friedrich Frandorfer, and
Marc Pollefeys. PIXHAWK: A Micro Aerial Vehicle Design for Autonomous Flight
Using Onboard Computer Vision. Autonomous Robots, 33(1-2):21–39, 2012.
[29] Daniel Mellinger and Vijay Kumar. Minimum Snap Trajectory Generation and
Control for Quadrotors. In International Conference on Robotics and Automation,
2011.
[30] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The GRASP
Multiple Micro UAV Testbed. IEEE Robotics and Automation Magazine, 17(3):56–
65, 2010.
[31] Mark B. Milam, Kudah Mushambi, and Richard M. Murray. A New Computational
Approach to Real-Time Trajectory Generation for Constrained Mechanical Systems.
In Conference on Decision and Control, 2000.
[32] Mark W. Mueller and Raffaello D’Andrea. A Model Predictive Controller for
Quadrocopter State Interception. In European Control Conference, 2013.
[33] Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. A Computationally
Efficient Algorithm for State-to-State Quadrocopter Trajectory Generation and
Feasibility Verification. In International Conference on Intelligent Robots and
Systems, 2013.
[34] Paul Pounds, Robert Mahony, and Peter Corke. Modelling and Control of a Quad-
Rotor Robot. In Australasian Conference on Robotics and Automation, 2006.
[35] Caitlin Powers, Daniel Mellinger, Aleksandr Kushleyev, and Bruce Kothmann. In-
fluence of Aerodynamics and Proximity Effects in Quadrotor Flight. In International
Symposium on Experimental Robotics, 2012.
[36] Oliver Purwin and Raffaello D’Andrea. Trajectory Generation and Control for Four
Wheeled Omnidirectional Vehicles. Robotics and Autonomous Systems, 54:13–22,
2006.
[37] Oliver Purwin, Raffaello D’Andrea, and Jin-Woo Lee. Theory and Implementation of
Path Planning by Negotiation for Decentralized Agents. Robotics and Autonomous
Systems, 56(5):422–436, 2008.
[38] Charles Richter, Adam Bry, and Nicholas Roy. Polynomial Trajectory Planning for
Quadrotor Flight. In International Conference on Robotics and Automation, 2013.
[39] Roland Siegwart, Illah R. Nourbakhsh, and Davide Scaramuzza. Autonomous Mobile
Robots. The MIT Press, second edition, 2011.
[40] Wannes Van Loock, Goele Pipeleers, and Jan Swevers. Time-Optimal Quadrotor
Flight. In European Control Conference, 2013.
[41] J. Willmann, F. Augugliaro, T. Cadalbert, Raffaello D'Andrea, Fabio Gramazio, and
Matthias Kohler. Aerial Robotic Construction Towards a New Field of Architectural
Research. International Journal of Architectural Computing, 10:439–460, 2012.
Paper III
Abstract
This paper presents an algorithm that permits the calculation of interception
maneuvers for quadrocopters. The translational degrees of freedom of the quadro-
copter are decoupled. Pontryagin’s minimum principle is used to show that the
interception maneuver that minimizes the time to rest after the interception is
identical to the time-optimal maneuver that drives the vehicle to the position at
which it comes to rest after the interception. This fact is leveraged to apply previ-
ously developed, computationally efficient methods for the computation of intercep-
tion maneuvers. The resulting trajectory generation algorithm is computationally
lightweight, permitting its use as an implicit feedback law by replanning the trajectory at each controller update. The validity and performance of the approach are demonstrated experimentally by intercepting balls mid-flight. The real-time trajectory generation makes it possible to account for changes in the predicted ball flight path at each controller update.
Paper III. Real-Time Trajectory Generation for Interception Maneuvers
1. Introduction
Quadrocopters have been widely adopted as experimental platforms for research in flying
robotics (see, for example, the testbeds [15, 20]). Reasons for the popularity of these
vehicles include the ability to hover, mechanical simplicity and robustness, and their
exceptional maneuverability due to typically high thrust-to-weight ratios combined with
the off-center mounting of the propellers.
From a controls perspective, a recent focus has been the planning and following of
trajectories that exploit the dynamical capabilities of these vehicles. Results include
algorithms that plan trajectories from classes of motion primitives, such as lines [14] or
polynomials [4, 7], while others solve an optimal control problem for approximate or full
vehicle dynamics (e.g. for minimum snap [19] or minimum time [12]).
In this paper, we consider the problem of using a quadrocopter for the purpose of
interception. The general interception problem has been studied in an optimal control
context for several decades (see, for example, [5] and references therein). Variations of
the problem have also been studied in robotics (e.g. for robotic arms [1] and ground
robots [22]). For quadrotor applications, interception problems have been treated in a
number of scenarios, including ball juggling, where the interception was more strongly
constrained to occur at a specified velocity and attitude [21]. In [3], a ball flight path was
intercepted on a given plane by setting the controller reference position to the interception
point.
The method we present herein permits the computation of interception maneuvers for
quadrocopters in real time. These maneuvers are optimal in that they minimize the time
to rest after the interception event, when decoupled dynamics are assumed. This paper
shows that the resulting maneuver structure is identical to the time-optimal maneuver
that brings the vehicle to the position at which it comes to rest after the interception.
This allows us to apply the efficient trajectory planning algorithm that was introduced
in [11]. Because the entire trajectory (consisting of the interception and to-rest motion
thereafter) is planned, it is easy to verify additional constraints such as, for example,
maximum allowable positions.
The trajectory generation algorithm is computationally lightweight, permitting us
to recompute trajectories in real time at update rates on the order of tens to hundreds
of replannings per second. This also permits us to use the trajectory generation as an
implicit feedback law by applying the control inputs of the first section of the planned tra-
jectory (similar to model predictive control [8]) at each controller update. Furthermore,
trajectories can be planned from arbitrary initial states, and the generated trajectory
is guaranteed to be feasible with respect to the dynamic and input constraints of the
vehicle.
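The implicit-feedback idea described above can be sketched as follows. This is an illustrative toy on a one-dimensional triple integrator; `toy_plan` is an invented stand-in for the paper's optimal planner, not the authors' implementation.

```python
# Conceptual sketch of replanning-as-feedback on a 1-D triple integrator.
# Each update, a trajectory is (re)planned from the current state and only
# its first jerk sample is applied, as in model predictive control.
# `toy_plan` is a made-up placeholder, not the paper's optimal planner.

def simulate_implicit_feedback(plan, s0, dt, steps):
    s = list(s0)  # state: [position, velocity, acceleration]
    for _ in range(steps):
        v = plan(s)[0]  # apply only the first input, then replan
        s[0] += s[1] * dt + 0.5 * s[2] * dt**2
        s[1] += s[2] * dt + 0.5 * v * dt**2
        s[2] += v * dt
    return s

def toy_plan(s):
    # Stand-in planner: a jerk that damps velocity and acceleration.
    return [-3.0 * s[1] - 3.0 * s[2]]

# Starting at rest position with 1 m/s velocity, the replanning loop
# brings the vehicle back toward rest.
final = simulate_implicit_feedback(toy_plan, [0.0, 1.0, 0.0], 0.01, 500)
```

The point of the sketch is the structure of the loop, not the placeholder planner: any planner that returns a feasible input sequence from the current state can be used this way.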
The remainder of this paper is structured as follows: In Section 2, we introduce the
dynamic model of the quadrocopter used in the trajectory generation. In Section 3,
we provide a brief overview of the previously presented trajectory generation algorithm
for time-optimal maneuvers to a specified position. Section 4 presents the interception
Figure 1. The four control inputs of the quadrotor vehicle. The rotational rates ωx , ωy ,
and ωz are assumed to follow commands without dynamics or delay. This is motivated by a
high-bandwidth on-board controller.
problem and includes the derivation of optimality conditions for interception trajectories.
In Section 5, we discuss the implementation of an algorithm computing such interception
maneuvers. Section 6 describes the experimental setup we use to intercept balls mid-
flight and presents experimental results, while Section 7 draws conclusions, highlighting
directions for future research.
2. Vehicle Dynamics
where amin > 0. This positive lower bound is motivated by the fact that typical quad-
rotor vehicles have propellers of fixed-pitch type, and are not able to stop or reverse the
propellers’ direction of rotation in flight.
$$ \begin{pmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{pmatrix} = {}^O_V R \begin{pmatrix} 0 \\ 0 \\ a \end{pmatrix} - \begin{pmatrix} 0 \\ 0 \\ g \end{pmatrix} , \qquad (2) $$
$$ \dot{R} = {}^O_V R \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} . \qquad (3) $$
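As an aside, the attitude kinematics (3) can be integrated numerically in a straightforward way. The sketch below (a simple Euler step in plain Python, illustrative only and not part of the paper) rotates the identity attitude about the body x-axis:

```python
# Illustrative sketch: Euler integration of the attitude kinematics (3),
# Rdot = R * skew(omega), for constant body rates. Plain 3x3 list matrices.
import math

def skew(wx, wy, wz):
    # Skew-symmetric matrix as in equation (3).
    return [[0.0, -wz,  wy],
            [ wz, 0.0, -wx],
            [-wy,  wx, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def integrate_attitude(R, omega, dt, steps):
    """Euler-integrate Rdot = R * skew(omega) over `steps` intervals."""
    S = skew(*omega)
    for _ in range(steps):
        Rdot = matmul(R, S)
        R = [[R[i][j] + dt * Rdot[i][j] for j in range(3)] for i in range(3)]
    return R

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Rotate about the body x-axis at 1 rad/s for roughly pi/2 seconds.
R = integrate_attitude(I3, (1.0, 0.0, 0.0), 1e-4, int(math.pi / 2 / 1e-4))
```

With the small step size used here, the result is close to the exact rotation by 90 degrees about x; a production implementation would use an exact rotation update rather than an Euler step, which slowly inflates the matrix norm.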
3. Trajectory Generation Algorithm
In this section, we provide an overview of the method used to generate trajectories from
arbitrary initial conditions to a target point. This approach was introduced in [11], and
is described here briefly because it will be used later.
The trajectory generation problem is simplified by approximating the quadrotor
dynamics with three triple integrators:
$$ \begin{pmatrix} \dddot{x} \\ \dddot{y} \\ \dddot{z} \end{pmatrix} = \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix} , \qquad (4) $$
and planning the trajectory in the jerks (v_x(t), v_y(t), v_z(t)). The control inputs along
a trajectory (x(t), y(t), z(t)) are
$$ f(t) := \begin{pmatrix} \ddot{x}(t) \\ \ddot{y}(t) \\ \ddot{z}(t) \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ g \end{pmatrix} , \qquad (5) $$
$$ \bar{f}(t) := \frac{f(t)}{\lVert f(t) \rVert} , \qquad (6) $$
$$ a(t) = \lVert f(t) \rVert , \qquad (7) $$
$$ \begin{pmatrix} \omega_y(t) \\ -\omega_x(t) \end{pmatrix} = {}^O_V R(t) \, \dot{\bar{f}}(t) . \qquad (8) $$
For a trajectory to be feasible, the resulting control inputs a, ωx , and ωy must lie within
their allowable sets. In order to satisfy the thrust constraint (1), constant acceleration
bounds for each degree of freedom are introduced. Furthermore, it was shown that it is
difficult to choose allowable jerk values such that the rotational rate control inputs (8) do
not exceed the vehicle body rate limitations. It is however straightforward to compute
the rotational rates along a planned trajectory. If these exceed limitations, it was shown
that a feasible trajectory can always be found by reducing the allowable jerk values.
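A rough sketch of this kind of feasibility check, using (5)-(7) to recover the thrust input along a sampled trajectory; the trajectory samples and bounds below are invented for illustration, not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation): recover the thrust
# input along a sampled trajectory via equations (5)-(7) and check it
# against assumed bounds a_min, a_max. Samples and bounds are made up.
import math

G = 9.81  # gravitational acceleration, m/s^2

def thrust_input(acc):
    """Equations (5), (7): f = acc + (0, 0, g), a = ||f||."""
    f = (acc[0], acc[1], acc[2] + G)
    return math.sqrt(f[0]**2 + f[1]**2 + f[2]**2)

def thrust_feasible(accelerations, a_min, a_max):
    """Check the thrust constraint along sampled accelerations (x, y, z)."""
    return all(a_min <= thrust_input(acc) <= a_max for acc in accelerations)

# Hover plus a mild lateral acceleration stays within the assumed bounds.
samples = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0)]
ok = thrust_feasible(samples, a_min=1.0, a_max=20.0)
```

The analogous rate check would evaluate (8) along the sampled trajectory and compare against the body rate limits.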
A trajectory to the target position can be computed for each degree of freedom of the
approximate dynamics (4), with given acceleration and jerk constraints. Time-optimal
trajectories from arbitrary initial conditions to a target position (to be reached at rest)
were presented. Using Pontryagin’s minimum principle, it follows that the switching
function is of parabolic shape with additional intervals at the zero crossings in which it
remains zero. The optimal control input v ∗ is bang-singular, consisting of at most five
distinct regions:
• [0, t1): v∗ = ±vmax,
• [t1, t2): v∗ = 0,
• [t2, t3): v∗ = ∓vmax,
• [t3, t4): v∗ = 0,
• [t4, tf]: v∗ = ±vmax,
where vmax denotes the maximum allowable jerk. In this solution, the intervals [t1, t2) and
[t3, t4) represent singular arcs, in which the control input is determined by the acceleration
remaining constant on its boundary. The initial control input and the five times t1, . . . , tf
fully define the maneuver.
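The five-region bang-singular structure can be written down directly; a minimal sketch, assuming given switching times and an initial input sign:

```python
# Evaluate the bang-singular jerk profile described above, given the
# switching times t1..t4, the final time tf, and the initial input sign.
# The singular arcs carry v* = 0 (acceleration held on its bound).

def jerk_profile(t, times, v_max, first_sign=1):
    """Piecewise jerk: +-v_max, 0, -+v_max, 0, +-v_max on [0, tf]."""
    t1, t2, t3, t4, tf = times
    if t < t1:
        return first_sign * v_max
    if t < t2:
        return 0.0          # first singular arc
    if t < t3:
        return -first_sign * v_max
    if t < t4:
        return 0.0          # second singular arc
    if t <= tf:
        return first_sign * v_max
    return 0.0

# Sample one point inside each of the five regions (times are arbitrary).
samples = [jerk_profile(t, (1.0, 2.0, 3.0, 4.0, 5.0), 2.0)
           for t in (0.5, 1.5, 2.5, 3.5, 4.5)]
```

The switching times and sign themselves come from solving the boundary value problem; the sketch only illustrates how the five regions tile the maneuver.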
The performance of this trajectory generation algorithm has been demonstrated in a
number of experiments [11]. It was shown that it can be used as an implicit feedback law
by re-planning a trajectory for each controller update. The computational load caused by
this is minor, with computation taking no longer than 0.2 ms on a conventional desktop
computer.
We will now adapt the trajectory generation algorithm, not to reach a target position
as quickly as possible, but to cross a specified position at a specified time.
4. The Interception Maneuver
While the time-optimal to-rest maneuvers presented above are very useful in many
applications, situations may arise in which a specified position must be reached more quickly
than it is possible to reach at rest. Using the same coordinate decoupling (4) as before, we
will now derive trajectories for a given interception point (x̂, ŷ, ẑ), which must be reached
at a specified interception time t̂.
Because the interception constraint is not sufficient to uniquely define the trajectory,
we further require the planned trajectories to bring the vehicle to rest as quickly as
possible after the interception. This choice provides two advantages:
• The problem statement now includes not only the motion to intercept the position
at the right time, but also the motion to bring the vehicle back to rest after the
interception. This is desirable because it permits easy verification of constraints such
as a maximum desirable displacement.
• The choice of the time to rest as a cost function forces aggressive deceleration after the
time of interception, avoiding excessive overshoot.
We will now formally state the trajectory generation problem described above, allow-
ing the optimality conditions to be applied in order to find solutions to it.
ṡ1 = s2 , (9)
ṡ2 = s3 , (10)
ṡ3 = v , (11)
s(t = 0) = s0 , (12)
s1 (t = t̂) = q̂ , (13)
s2 (t = tf ) = s3 (t = tf ) = 0 , (14)
where t̂ is the interception time and q̂ is the interception position. Furthermore, the input
and state constraints
$$ v \in [-v_{\max}, v_{\max}] , \qquad (15) $$
$$ \ddot{q}_{\min} \leq s_3 \leq \ddot{q}_{\max} \qquad (16) $$
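Because the optimal input is piecewise constant, the triple-integrator state (9)-(11) can be propagated exactly over each interval; a minimal sketch:

```python
# Exact propagation of the triple integrator (9)-(11) over an interval dt
# with constant jerk v (exact because the optimal input is piecewise
# constant along the maneuver).

def propagate(s, v, dt):
    """s = (s1, s2, s3): position, velocity, acceleration."""
    s1, s2, s3 = s
    return (s1 + s2 * dt + 0.5 * s3 * dt**2 + v * dt**3 / 6.0,
            s2 + s3 * dt + 0.5 * v * dt**2,
            s3 + v * dt)

# One second from rest under constant jerk v = 6.
state = propagate((0.0, 0.0, 0.0), 6.0, 1.0)
```

Chaining `propagate` across the five regions of the bang-singular profile reproduces the full maneuver trajectory, which makes checking boundary conditions such as (13) and (14) straightforward.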
must be satisfied.
We begin by noting that the algorithm presented in Section 3 already solves the above
problem if it is possible to reach q̂ such that tf is smaller than t̂: Because the computed
trajectory reaches q̂ at rest, the interception constraint (13) will be satisfied by this time-
optimal motion followed by the vehicle remaining at rest until the interception time (i.e.
v ∗ = 0 for tf < t ≤ t̂). From here on, we will assume that t̂ is smaller than tf , and will
now derive the optimality conditions and resulting structure of optimal maneuvers.
$$ H(s, v, \lambda, \eta) = 1 + \lambda_1 s_2 + \lambda_2 s_3 + \lambda_3 v + \eta_1 (-\ddot{q}_{\min} + s_3) + \eta_2 (\ddot{q}_{\max} - s_3) , \qquad (17) $$
where λ are the adjoint variables and η are state constraint multipliers that fulfill the
constraints
η ≥0, (18)
η1 = 0 if s3 > q̈min , (19)
η2 = 0 if s3 < q̈max . (20)
λ̇1 = 0 , (22)
λ̇2 = λ1 , (23)
λ̇3 = λ2 + η1 − η2 . (24)
The optimal control v∗ is the control input that minimizes the Hamiltonian function:
$$ v^* = \arg\min_{v \in [-v_{\max}, v_{\max}]} H(s, v, \lambda, \eta) . \qquad (25) $$
Figure 2. Sketch of the adjoint variable trajectory λ3 for an example maneuver. At the
interception time t̂, the shape switches from parabolic to constant slope as λ1 jumps to zero. The
intervals [t1 t2 ) and [t3 t4 ) represent singular arcs, in which one of the acceleration constraints (16)
is active.
At the interception time t̂, the discontinuity in the first adjoint variable [5] is
$$ \lambda_1(\hat{t}^-) = \lambda_1(\hat{t}^+) + \nu , \qquad (26) $$
where t̂− and t̂+ signify just before and just after t̂, respectively, and ν is a constant
Lagrange multiplier. The other two adjoint variables λ2 , λ3 remain continuous over the
interception time. Because the final state s1 (tf ) is free, the costate constraint at the final
time is [2]
λ1 (tf ) = 0 . (27)
It was shown in [18] that, for problems of this form, the adjoint variables λ are
continuous when the acceleration constraint (16) becomes active or stops being active.
Furthermore, λ3 = 0 must hold over the duration in which a state constraint is active.
The costate dynamics (22)-(24) form a double integrator with respect to λ3 . With λ1
being zero from t̂+ onwards, and with the intersection time condition (26), it follows that
the trajectory of λ3 has the following shape, which is sketched in Figure 2:
• In the time interval [0, t̂−), λ3 has a parabolic shape because λ̈3 = λ1 is constant when
by the five times t1 , t2 , t3 , t4 , and tf and the initial control input. For the interception
trajectory, an additional constraint arises because the switching function λ3 is linear after
the interception time: The remaining trajectory can only contain two regular arcs and
one singular arc.
Now, assume that we have computed an interception maneuver that satisfies the
above optimality conditions, implying that it minimizes the to-rest duration after the
interception. The maneuver terminates at some position qf (by definition of the maneuver
this position is reached at rest). Then this maneuver is identical to the time-optimal
maneuver from the same initial conditions to the position qf as described in Section 3:
Because both the time-optimal maneuver to qf and the interception maneuver are fully
defined by the switching times and the initial control input, and all boundary constraints
are satisfied for both maneuvers, this must be the case.
Conversely, assume that we have computed a time-optimal maneuver to the position
qf . Then for all intermediate positions occurring after the time t2 , this is the interception
maneuver that is optimal with respect to the above optimality conditions (we must limit
the intermediate positions to ones occurring after t2 due to the constant slope of λ3
permitting only one zero crossing after t̂).
The above equivalence of interception maneuvers and time-optimal maneuvers allows
us to leverage the trajectory generation algorithm that we have developed for time-
optimal maneuvers, as will be seen in Section 5.
In this section, we discuss how the properties of optimal interception trajectories can be
used to efficiently compute interception maneuvers.
where cx and cz are parameters. It is straightforward to verify that for all values of
cx ∈ [0, 1], cz ∈ [g/amax , 1], the acceleration constraint (1) is satisfied.
It then remains to find parameters cx and cz such that each degree of freedom is able
to reach the interception point. We propose a two-step strategy:
1. Starting from cz = 1, find the lowest value cz for which the vertical degree of freedom
is able to both satisfy the interception constraint and not violate space restrictions,
and then
2. vary cx until both horizontal degrees of freedom reach the interception point and remain
within space constraints.
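A hypothetical sketch of this two-step search; `vertical_feasible` and `horizontal_feasible` stand in for the per-axis interception and space checks, which are not specified here, and the step size is arbitrary.

```python
# Hypothetical sketch of the two-step parameter search described above.
# `vertical_feasible` and `horizontal_feasible` are assumed placeholders
# for the per-axis interception/space checks; they are not defined in the
# paper, and the grid step is an arbitrary choice.

def find_parameters(vertical_feasible, horizontal_feasible,
                    cz_min, step=0.01):
    # Step 1: starting from cz = 1, find the lowest feasible value of cz.
    cz = 1.0
    while cz - step >= cz_min and vertical_feasible(cz - step):
        cz -= step
    # Step 2: vary cx over [0, 1] until the horizontal axes are feasible.
    cx = 0.0
    while cx <= 1.0:
        if horizontal_feasible(cx, cz):
            return cx, cz
        cx += step
    return None  # no feasible split found

# Toy feasibility checks standing in for the real per-axis tests:
params = find_parameters(lambda cz: cz >= 0.6,
                         lambda cx, cz: cx >= 0.3,
                         cz_min=0.5)
```

In practice the feasibility checks would solve the per-axis interception problem under the scaled acceleration bounds and verify the space restrictions.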
The above method finds a three-dimensional interception trajectory that satisfies the
interception constraint and possible space restrictions. It may however be beneficial to
additionally specify a performance measure for the choice of cx and cz . We intend to
investigate this in further research.
The above steps complete the description of an algorithm permitting the computation
of a three-dimensional interception maneuver that is feasible with respect to the dynamics
and constraints of the quadrocopter.
6. Experimental Results
The interception strategy presented herein has been tested in the Flying Machine Arena,
an aerial vehicle development platform at ETH Zurich [17]. To demonstrate its perfor-
mance, we apply it to the problem of “hitting” a ball mid-flight with a quadrotor vehicle.
[Plot: candidate trajectories; axes: position x (m), position y (m).]
Figure 3. A set of candidate trajectories planned when a ball is first detected. The red cross
denotes the initial position of the vehicle. The blue dots are the ball positions predicted at
various instances in time. The dotted black lines show the planned flight path for each of the
predicted ball positions. The chosen interception time lies in the middle of these points.
7. Conclusion & Outlook
[Plot: planned trajectories; axes: position x (m), position y (m), position z (m).]
Figure 4. Trajectories planned as the ball state and drag estimates converge. A selection of
the planned trajectories is shown here (dotted black lines). The predicted interception location
(blue points) changes by about 0.55 m. The updated interception location is accounted for at
every re-planning of the trajectory. The solid red line shows the trajectory that was actually
flown. The vehicle intercepted the ball at the end of the red line.
This paper introduced an interception trajectory generation algorithm for quadrotor vehi-
cles. It was shown that the problem can be reformulated such that its structure is identical
to the trajectory generation for time-optimal trajectories. For this class of trajectory gen-
eration problems, an efficient computational method has been developed previously and
could be readily adapted. The successful implementation of this interception strategy
has been verified experimentally by intercepting balls mid-flight.
In the trajectory design approach presented herein, a number of design parameters
remained to be chosen, for example the control effort tradeoff between the three decoupled
degrees of freedom. We are planning to investigate ways to systematically choose these
such that the best possible performance is achieved.
Furthermore, we intend to investigate strategies for determining the optimal inter-
ception time for objects flying along trajectories, where optimality could be defined by
criteria such as minimal vehicle velocity at interception time, the shortest time to inter-
ception, or other objectives.
An interesting extension of this work would be to include multiple interception con-
straints. Because the current interception method is computationally very lightweight,
we believe that it should be possible to plan such multiple interception trajectories while
maintaining real-time capabilities.
References
[1] Berthold Bäuml, Thomas Wimböck, and Gerd Hirzinger. Kinematically Optimal
Catching a Flying Ball with a Hand-Arm-System. In International Conference on
Intelligent Robots and Systems, 2010.
[2] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Vol. I. Athena
Scientific, third edition, 2005.
[3] Patrick Bouffard, Anil Aswani, and Claire J. Tomlin. Learning-Based Model
Predictive Control on a Quadrotor: Onboard Implementation and Experimental
Results. In International Conference on Robotics and Automation, 2012.
[4] Y. Bouktir, M. Haddad, and T. Chettibi. Trajectory Planning for a Quadrotor
Helicopter. In Mediterranean Conference on Control and Automation, 2008.
[5] Arthur Earl Bryson and Yu-Chi Ho. Applied Optimal Control. Taylor & Francis,
1975.
[6] S. Chang. Minimal Time Control With Multiple Saturation Limits. IEEE Transac-
tions on Automatic Control, 8(1):35–42, January 1963.
[7] Ian D. Cowling, Oleg A. Yakimenko, and James F. Whidborne. A Prototype of
an Autonomous Controller for a Quadrotor UAV. In European Control Conference,
2007.
[8] Carlos E. Garcı́a, David M. Prett, and Manfred Morari. Model Predictive Control:
Theory and Practice - A Survey. Automatica, 25(3):335–348, 1989.
[9] Daniel Gurdan, Jan Stumpf, Michael Achtelik, Klaus-Michael Doth, Gerd Hirzinger,
and Daniela Rus. Energy-Efficient Autonomous Four-Rotor Flying Robot Controlled
at 1 kHz. In International Conference on Robotics and Automation, 2007.
[10] Richard F. Hartl, Suresh P. Sethi, and Raymond G. Vickson. A Survey of the
Maximum Principles for Optimal Control Problems with State Constraints. SIAM
Review, 37(2):181–218, 1995.
[11] Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and
Control. In IFAC World Congress, 2011.
[12] Markus Hehn, Robin Ritz, and Raffaello D’Andrea. Performance Benchmarking of
Quadrotor Systems Using Time-Optimal Control. Autonomous Robots, 33(1-2):69–
88, 2012.
[13] Henry Hermes and Joseph P. LaSalle. Functional Analysis and Time Optimal Control.
Mathematics in Science and Engineering, 56, 1969.
[14] Gabriel M. Hoffmann, Steven L. Waslander, and Claire J. Tomlin. Quadrotor
Helicopter Trajectory Tracking Control. In Conference on Decision and Control,
2008.
[15] Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian.
Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Systems
Magazine, 28(2):51–64, 2008.
[16] Peter C. Hughes. Spacecraft Attitude Dynamics. John Wiley & Sons, 1986.
[17] Sergei Lupashin, Angela Schöllig, Michael Sherback, and Raffaello D’Andrea. A
Simple Learning Strategy for High-Speed Quadrocopter Multi-Flips. In International
Conference on Robotics and Automation, 2010.
[18] H. Maurer. On Optimal Control Problems with Bounded State Variables and Control
Appearing Linearly. SIAM Journal Control and Optimization, 15(3):345–362, 1977.
[19] D. Mellinger, N. Michael, and V. Kumar. Trajectory Generation and Control
for Precise Aggressive Maneuvers with Quadrotors. The International Journal of
Robotics Research, 31(5):664–674, 2012.
[20] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The GRASP
Multiple Micro UAV Testbed. IEEE Robotics and Automation Magazine, 17(3):56–
65, 2010.
[21] Mark Muller, Sergei Lupashin, and Raffaello D’Andrea. Quadrocopter Ball Juggling.
In International Conference on Intelligent Robots and Systems, 2011.
[22] Frieder Stolzenburg, Oliver Obst, and Jan Murray. Qualitative Velocity and Ball
Interception. Lecture Notes in Computer Science - KI2002: Advances in Artificial
Intelligence, 2479:95–99, 2002.
Part B
ITERATIVE LEARNING OF
PERIODIC MOTIONS
Paper IV
Abstract
Quadrocopters offer an attractive platform for aerial robotic applications due to,
amongst other factors, their hovering capability and large dynamic potential. Their
high-speed flight dynamics are complex, however, and the modeling thereof has
proven difficult. Feedback control algorithms typically rely on simplified models,
with feedback corrections compensating for unmodeled effects. This can lead to
significant tracking errors during high-performance flight, and repeated execution
typically leads to a large part of the tracking errors being repeated. This paper
introduces an iterative learning scheme that non-causally compensates repeatable
trajectory tracking errors during the repeated execution of periodic flight maneu-
vers. An underlying feedback control loop is leveraged by using its set point as
a learning input, increasing repeatability and simplifying the dynamics considered
in the learning algorithm. The learning is carried out in the frequency domain,
and is based on a Fourier series decomposition of the input and output signals.
The resulting algorithm requires little computational power and memory, and its
convergence properties under process and measurement noise are shown. Further-
more, a time scaling method allows the transfer of learnt maneuvers to different
execution speeds through a prediction of the disturbance change. This allows the
initial learning to occur at reduced speeds, and thereby extends the applicability
of the algorithm for high-performance maneuvers. The presented methods are val-
idated in experiments, with a quadrocopter flying a figure-eight maneuver at high
speed. The experimental results highlight the effectiveness of the approach, with
the tracking errors after learning being similar in magnitude to the repeatability
of the system. The applicability to more complex tasks is demonstrated experi-
mentally by learning a maneuver wherein an inverted pendulum is balanced on the
quadrocopter, while a trajectory is tracked.
Paper IV. Frequency Domain Learning for Periodic Maneuvers
1. Introduction
Aerial robots serve as platforms for robotic applications that provide numerous benefits,
including the ability to move freely in three-dimensional space, and the significantly
increased ability to overcome obstacles due to not being limited to motion on the ground.
For relatively small platforms that require hovering capabilities, multi-rotor vehicles such
as quadrocopters are often the vehicle of choice [53]. Compared to other such platforms,
quadrocopters profit from high mechanical robustness due to a minimal number of moving
parts [45], safety due to comparatively small rotor size, and high thrust-to-weight ratios
allowing high-performance maneuvers as well as the transport of large payloads.
While the use of quadrocopters as robotic platforms was largely confined to research
institutions in the past, a growing number of industrial applications are now in the process
of being developed and deployed. Examples include aerial imaging for photogrammetry,
motion picture production, and journalism [19], environmental monitoring and inspection
tasks of hard-to-reach objects such as pipelines, dams, and power lines [54], the creation
of ad-hoc antenna networks or arrays [56], as well as disaster coordination [1].
The capability of quadrocopters to perform highly dynamic, complex, and precise
motions has been demonstrated repeatedly in recent years (see, for example, [34, 37, 39,
48]). In order to execute such high-performance motions, the commonly used approach
consists of using a first-principles model of the quadrotor dynamics to design the nominal
maneuver, and a model-based feedback control law to ensure tracking of the nominal
trajectory.
Such traditional feedback controllers however have important limitations in high-
performance quadrotor applications. While the first-principles models used to design
the controllers capture the near-hover behaviour of quadrocopters well, secondary effects
become increasingly important when maneuvering speed increases. Examples of such
effects are the complex drag and lift behaviour of rotary wings under unsteady inflow
conditions [11], the aerodynamic effects of a vehicle moving through the turbulent wake
of its propellers [6], and external influences such as wind or ground and wall effects when
operating in proximity to the environment [46]. Such effects are not typically accounted
for in the maneuver and controller synthesis stage in order to make the design process
tractable. The execution then heavily relies on the feedback controller to compensate for
potentially significant effects not captured by the nominal dynamics.
In order to improve the tracking performance of quadrocopters under feedback control,
a number of researchers have proposed learning schemes. Examples of such schemes in-
clude those based on reinforcement learning techniques [8,60] and neural networks [16,17],
which are designed to automatically find well-performing control policies, and adaptive
control methods [4, 40, 42] that adapt parameters based on modeled disturbances such as
payloads, center of mass shifts, and external disturbances.
When the motion that we are concerned with is to be executed repeatedly, a further
opportunity to improve tracking performance may arise: Many of the disturbances that
degrade tracking performance will be similar each time the vehicle performs the motion.
These disturbances can then be compensated for non-causally using data from past ex-
ecutions. Control strategies that exploit available data from past executions in order to
improve tracking performance were first proposed in the late 1970s and early 1980s [5,27]
for applications in motion control and power supply control. Since then, active research in
this field, covering numerous applications and problem formulations (see e.g. [12, 15, 59],
and references therein), has shown it to be a powerful approach for high-performance
reference tracking. In extensions to these learning methods, several authors have shown
the application of learning control methods to systems with underlying feedback control
loops (e.g. [9, 14]). In such scenarios, the powerful capability of learning control to non-
causally compensate repeatable disturbances is combined with real-time feedback control
to correct for non-repetitive noise.
While the application of learning algorithms, and specifically non-causal strategies,
to stationary systems is well-established, its use for flying vehicles is less mature and
has been actively researched during recent years. Several high-performance maneuvers
for multi-rotor vehicles have been demonstrated with the use of learning algorithms.
Broadly speaking, the learning approaches used can be categorized into two groups:
The first group is characterized by its ability to learn motions that are parameterized.
The motion is thus described by a (finite) set of design parameters, chosen by the user.
After the execution of the motion, these parameters are adapted to compensate for dis-
turbances. The direction and magnitude of the correction may be model-based, or based
on the user’s intuition. A discussion on the importance of choosing ‘good’ design pa-
rameters may be found in [31], where a learning algorithm for this kind of parameterized
motions is demonstrated for multiple flips and fast translations with quadrocopters. A
further demonstration of this class of learning algorithms is provided in [36]. The ability
to shape the tracking performance strongly depends on the number of parameters that
are optimized; in the above examples, the objective is to minimize the error at specific
time instants (‘key frames’), and a relatively small number of parameters is sufficient to
do so. This makes the methods computationally lightweight.
The second group of learning approaches considers more generic motions that need
not be specified by parameters. The system dynamics are considered in discrete time,
and the correction consists of correction values (typically control inputs or set points)
for each discrete time step. After execution of the motion, a numerical optimization over
the correction values is performed in order to minimize a metric related to the tracking
error. In this optimization, a model of the system dynamics provides the mapping from
corrections to the tracking error. This approach is commonly known as a form of iterative
learning control [12], and its application to high performance quadrocopter flight has been
demonstrated [38, 43, 47, 51].
The delimitation between the two groups is not strict. Indeed, the second group of
learning approaches could be seen as using a very large number of values to parameterize
the correction.
The algorithm presented in this paper can be characterized to be a form of repetitive
control [59] in that it is a technique for non-causally compensating repeated tracking
errors in the execution of periodic motions. Algorithms of this form have previously shown
good performance when applied to related problems where aerodynamic disturbances are
considered, in particular the rejection of periodic wind disturbances on wind turbines [7,
24].
Similar to the second group of learning algorithms, we do not assume a parameterized
motion. However, we reduce the dimensionality of the corrections that we intend to learn
by assuming that they are periodic. This allows us to parameterize the corrections as
the coefficients of a truncated Fourier series. The order of the Fourier series provides a
means to trade off computational complexity and the ability to compensate for temporally
local or high-frequency disturbances. Furthermore, the approach is conceptually similar
to the one presented in [31], which proposes an adaptation strategy
to correct for state errors at discrete points in time of parameterized motion primitives.
However, we consider periodic errors (instead of errors at specific points in time), and do
not require parameterization of the maneuver.
The contribution of this paper to the field of quadrocopter control lies in the appli-
cation of methods from the fields of repetitive control and iterative learning control to
quadrocopters. A general framework for arbitrary periodic motions is presented. We
demonstrate how a feedback controller can be leveraged to shape the closed-loop dynam-
ics of the quadrotor system, and show that a linear time-invariant approximation of the
closed-loop dynamics suffices to guide the learning process. Using statistical properties
of the disturbance, measurement noise, and the influence of nonlinearities, we derive the
optimal inter-execution learning update step size. The validity of the approach and its
performance is investigated through experiments in the ETH Flying Machine Arena, both
with a quadrocopter under normal position control and with a task-specific controller
stabilizing a more complex system consisting of a quadrocopter balancing an inverted
pendulum.
Furthermore, this paper introduces a novel method that extends the applicability of
the repetitive control approach when the reference trajectory is too fast to be learnt
directly, for example because the initial execution fails entirely. The core idea here lies in
providing an improved initial guess of the disturbances degrading tracking performance.
This typically enables learning of the trajectory because the errors are sufficiently small
for the first-principles model to provide reliable information on how to compensate. To
find the improved initial guess, we introduce a time scaling method that allows initial
learning to occur at reduced maneuvering speeds and the transfer of learnt corrections
from the reduced-speed execution to full speed. This method may also be applied to other
complex dynamic systems where it is necessary to limit initial tracking errors in order
to avoid the system failing. The time-scaling method provides an interesting alternative
to methods that rely on aborting trials when the errors grow too large [51], in that the
tracking error is always learnt over the entirety of the maneuver, and to more general
methods of extending the motion [47], owing to its computational simplicity.
Preliminary results of this method were presented at international conferences [22,23],
and this paper extends these results through the computation of the optimal learning rate
for given statistical properties of the process, a novel way to predict disturbances when
transferring knowledge between different execution speeds, as well as providing an
in-depth experimental analysis of the method.
Figure 1. The inertial coordinate system O and the vehicle coordinate system V , used to
describe the dynamics of the quadrocopter.
The remainder of this paper is structured as follows: We introduce the model of the
quadrocopter and the used feedback law in Section 2. The learning algorithm is then
presented and analyzed in Section 3. Section 4 presents experimental results highlighting
the performance of the algorithm. Section 5 discusses advantages and restrictions of the
learning algorithm, and Section 6 provides a conclusion.
2. Quadrocopter Dynamics and Closed-Loop Control
This section first introduces the first-principles model of a quadrocopter, along with
a discussion of the accuracy of the model. Furthermore, we introduce the input-output
linearizing feedback controller used to control the vehicle. The combination of vehicle and
feedback controller forms the closed-loop dynamics that the iterative learning algorithm is
then applied to. For ease of notation, vectors are expressed as n-tuples (x1 , x2 , ...) where
convenient, with dimension and stacking clear from context.
I ω̇ = ( L(F2 − F4 ), L(F3 − F1 ), ζ(F1 − F2 + F3 − F4 ) ) − ω × Iω        (1)
where ω = (ωx , ωy , ωz ) is the rotational rate of the vehicle, I is its rotational inertia, L
the arm length of the vehicle, ζ the drag-to-thrust ratio of the propeller, and F1 to F4 are
the individual thrust forces of each propeller. The total mass-normalized thrust produced
by the four propellers is
a = (F1 + F2 + F3 + F4 )/m        (2)
m
where m denotes the mass of the vehicle. The rotational kinematics are given by the
first-order differential equation of the rotation matrix [26]
O_V Ṙ = O_V R ⎡  0    −ωz   ωy  ⎤
              ⎢  ωz    0    −ωx ⎥ .        (3)
              ⎣ −ωy   ωx    0   ⎦
The translational dynamics of the vehicle, expressed in the inertial coordinate system
O, are
( p̈1 , p̈2 , p̈3 ) = O_V R ( 0, 0, a ) + ( 0, 0, −g )        (4)
While the modeling of such effects has provided valuable insight, their incorporation
into control strategies has generally been slow, because the highly complex nonlinear
models make controller design significantly more difficult and their parameters are hard
to identify. Instead, most control laws treat such effects
as disturbances, relying on feedback control to account for them implicitly. In this pa-
per, we will follow a similar approach in that we do not model the effects, but we will
compensate for them non-causally during the repeated execution of periodic motions.
Position Control For all three degrees of freedom, a feedback control law determines
the desired acceleration p̂¨i from the position and velocity errors such that the loop is
shaped to the dynamics of a second-order system with time constant τi and damping
ratio ζi :
p̈ˆi = (p̂i − pi )/τi² − (2ζi /τi ) ṗi        for i = {1, 2, 3}        (5)

a = (p̈ˆ3 + g)/R33 .        (6)
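As an illustration, the position control law (5)-(6) can be sketched in Python as follows. This is a minimal interpretation of the two equations above; the time constants, damping ratios, and set point used in the example are hypothetical values, not tuning values from the experiments:

```python
import numpy as np

def position_control(p, p_dot, p_hat, tau, zeta, R33, g=9.81):
    """Desired accelerations and thrust from position/velocity errors, Eqs. (5)-(6).

    p, p_dot, p_hat: position, velocity, and set point (3-vectors)
    tau, zeta: per-axis time constants and damping ratios (3-vectors)
    R33: third diagonal entry of the vehicle rotation matrix
    """
    # Eq. (5): shape each axis as a second-order system
    p_ddot_des = (p_hat - p) / tau**2 - (2.0 * zeta / tau) * p_dot
    # Eq. (6): total mass-normalized thrust compensating gravity
    a = (p_ddot_des[2] + g) / R33
    return p_ddot_des, a

# Example with hypothetical numbers: a 0.1 m vertical set point offset at hover
p_ddot, a = position_control(
    p=np.zeros(3), p_dot=np.zeros(3), p_hat=np.array([0.0, 0.0, 0.1]),
    tau=np.full(3, 0.4), zeta=np.full(3, 0.7), R33=1.0)
```

The resulting desired vertical acceleration is then passed to the thrust computation, while the horizontal components are handled by the reduced attitude controller described next.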
Reduced Attitude Control The desired rotation matrix entries R̂13 and R̂23 for the
given desired accelerations are computed from (4), (5) and (6) to be
R̂13 = p̈ˆ1 /a        and        R̂23 = p̈ˆ2 /a .        (7)
The attitude control loop is shaped such that the rotation matrix entries R13 and
R23 react in the manner of a first-order system with time constant τrp by computing the
desired derivative of the rotation matrix elements:
Ṙ̂i3 = (R̂i3 − Ri3 )/τrp        for i = {1, 2} .        (8)
Inverting the rotational kinematics (3), this is converted to the commanded rotational rates:
⎡ ω̂x ⎤    1   ⎡ R21   −R11 ⎤ ⎡ Ṙ̂13 ⎤
⎢    ⎥ = ──── ⎢            ⎥ ⎢     ⎥ .        (9)
⎣ ω̂y ⎦   R33  ⎣ R22   −R12 ⎦ ⎣ Ṙ̂23 ⎦
The commanded rotational rate about the third body axis, ω̂z , can be determined
separately as it does not influence the translational dynamics of the vehicle. We employ
a proportional controller on the Euler angle describing the vehicle heading.
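The reduced attitude control computations (7)-(9) can be sketched for a single control update as follows; the thrust, desired accelerations, and time constant in the usage example are hypothetical:

```python
import numpy as np

def attitude_commands(p_ddot_des, a, R, tau_rp):
    """Reduced attitude control, Eqs. (7)-(9).

    p_ddot_des: desired horizontal accelerations (p1, p2 components)
    a: total mass-normalized thrust
    R: current 3x3 rotation matrix of the vehicle
    tau_rp: first-order time constant of the attitude response
    """
    # Eq. (7): desired rotation matrix entries
    R13_hat = p_ddot_des[0] / a
    R23_hat = p_ddot_des[1] / a
    # Eq. (8): desired derivatives of the rotation matrix entries
    dR13 = (R13_hat - R[0, 2]) / tau_rp
    dR23 = (R23_hat - R[1, 2]) / tau_rp
    # Eq. (9): commanded body rates about the first two axes
    M = np.array([[R[1, 0], -R[0, 0]],
                  [R[1, 1], -R[0, 1]]])
    return (M @ np.array([dR13, dR23])) / R[2, 2]

# Hypothetical example: level vehicle, 1 m/s^2 desired acceleration along p1
omega_xy = attitude_commands(np.array([1.0, 0.0]), a=10.0, R=np.eye(3), tau_rp=0.2)
```

For a level vehicle, a desired acceleration along the first axis maps purely to a rate command about the second body axis, as one would expect from tilting the thrust vector.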
Body Rate Control Using the desired body rates ω̂x , ω̂y , ω̂z as commands, the body
rate controller is designed to follow the commands in the fashion of three decoupled
first-order systems. In order to achieve this, the rotational dynamics (1) are inverted:
with the time constants τpq for the first two axes, and τr for the third axis. The above
equation, in combination with the total thrust constraint resulting from combining equa-
tions (2) and (6), define the four individual propeller forces F1 to F4 , and thereby complete
the feedback control design.
The feedback control design is based on cascaded control loops that are designed using
a loop shaping approach. We assume time scale separation between the control loops (i.e.,
τrp ≪ τ1,2 and τpq ≪ τrp ) and approximate the closed-loop dynamics to depend only on the
position control loops. The nominal dynamics of the closed-loop system from a position
set point p̂ to the vehicle position p can then be approximated by three decoupled LTI
second-order systems:
p̈i ≈ (p̂i − pi )/τi² − (2ζi /τi ) ṗi        for i = {1, 2, 3}        (11)
with time constant τi and damping ratio ζi . We will use these nominal closed-loop
dynamics in the iteration-domain learning algorithm.
More accurate characterizations of the closed-loop dynamics could be used in the
learning algorithm, e.g. by including the underlying control loops such as the attitude
control (7)-(9). However, our experiments showed that the low-order model was sufficient
to guide the iterative learning process as long as very high frequencies are not considered.
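For reference, the frequency response of this nominal second-order model, which the learning algorithm later evaluates at the harmonics kΩ0, can be computed as in the following sketch (the parameter values are illustrative, not identified values):

```python
import numpy as np

def closed_loop_freq_response(omega, tau, zeta):
    """Frequency response of the nominal closed-loop dynamics (11),
    i.e. of the second-order system 1 / (tau^2 s^2 + 2 zeta tau s + 1),
    evaluated at s = j*omega."""
    s = 1j * np.asarray(omega)
    return 1.0 / (tau**2 * s**2 + 2.0 * zeta * tau * s + 1.0)

# At low frequencies the set point is tracked almost perfectly (gain ~1),
# while the gain drops off at high frequencies:
G = closed_loop_freq_response([0.1, 100.0], tau=0.4, zeta=0.7)
```

The roll-off at high frequencies is one reason why learning is restricted to the lower harmonics: there, the model gain is significant and well approximated by the LTI model.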
3. Learning Algorithm
This section introduces the learning algorithm that is applied to the quadrocopter system
in order to compensate for systematic disturbances that deteriorate flight performance
during the execution of periodic motions. The fundamental idea is to use data from
past executions in order to characterize the tracking errors, and to then compensate
for them in a non-causal manner during following executions. For this compensation,
we leverage the prior knowledge of the dominant dynamics of the quadrocopter under
feedback control, and combine this knowledge with measurement data from experiments
in order to determine appropriate compensations to apply during the next execution.
Compared to the use of pure feedback control, this scheme can improve the tracking
performance because repeated disturbances are compensated for non-causally, whereas
pure feedback control is limited to causal corrections.
The basic system structure used in the learning algorithm is depicted in Figure 2. The
closed-loop dynamics of the quadrocopter under feedback control remain unmodified by
the learning algorithm. We use the approach of adapting the set point of the controller,
also called a serial architecture [12] or indirect learning-type control [58, 59]. The con-
troller set point is augmented by adding a correction input u(t) to it. We assume that we
can derive a linear time-invariant (LTI) approximative model of the closed-loop dynamics
between the position tracking error and the correction input (as done in Section 2.3 for
the presented feedback control law). In comparison to modifying the control inputs of
the quadrocopter directly, the serial architecture offers the advantage that the dynamics
from a change in the set point to a change in the tracking error output are those of the
closed loop system, which are designed to be approximately LTI. Due to the periodic
Figure 2. The learning architecture. The iteration-domain feedback law uses the tracking
error as input, and its output is a set point shift to the real-time feedback law.
nature of the motions addressed in this paper, we use Fourier series [25] to characterize
correction input and tracking error output signals of the system.
The learning algorithm builds upon this LTI model when interpreting the measure-
ment data of an execution in order to determine the appropriate correction input signal
for the next execution, as detailed in Section 3.2. Using the statistical properties of the
relevant disturbance signals, it is possible to derive the optimal correction input signal in
a least-squares sense (Section 3.3). Safer learning of high-performance maneuvers can be
achieved by reducing the execution speed during the initial learning phase, and using the
disturbance characteristics identified at lower speeds to provide an improved disturbance
prediction at high speeds (Section 3.4).
The tracking error measurement of iteration i is modeled as the sum of three components,

y i (t) = di (t) + hi (t) + ni (t) .        (12)

The first component di (t) represents the systematic tracking error that should be
compensated. It is not known a priori, but we will later assume that a statistical description
of d0 (t) is available for the derivation of the optimal learning step size in Section 3.3.
Specifically, we will then assume that its mean is zero, and that it is stationary with a
known autocorrelation function. The evolution of the tracking error over several iterations
is modeled as

di+1 (t) = di (t) + γ i (t) ,        (13)

where γ i (t) models slight changes to the disturbance from iteration to iteration as a
trial-uncorrelated sequence of zero-mean stationary noise, where we will again assume a known
autocorrelation function in Section 3.3.
The second component, hi (t), is the response of the closed-loop dynamic system to
the correction input ui (t), characterized in the frequency domain by

H i (ω) = G(ω) U i (ω) ,        (14)

with G(ω) being the transfer function [13] of the LTI approximation of the closed-loop
system, and H i (ω) and U i (ω) being the frequency domain representations of hi (t) and
ui (t), respectively. The final component of the tracking error measurement, ni (t),
represents trial-uncorrelated, non-periodic noise. It is assumed to vary for each execution
of the maneuver, and captures effects such as non-repeatable disturbances to the vehicle
(for example wind gusts), but also measurement noise. The non-repeatable disturbances
are assumed to be zero mean; a non-zero mean would represent a repeatable disturbance
and would therefore be accounted for by di (t). Furthermore, we assume stationarity, and
for the derivation of the optimal learning step size a known autocorrelation function.
The goal of the learning algorithm is to choose the correction input ui (t) such that
the output y i (t) is minimized. The desired output is assumed to be periodic with period
T , and the algorithm will use measurements of previous executions y i−1 (t), y i−2 (t), . . . in
order to determine ui (t). Intuitively, this task implies finding the correction input ui (t)
such that hi (t) + di (t) is minimized, meaning that the unwanted repeated disturbance is
canceled out in the output (12) as well as possible.
Note that the dimensions of u(t) and y(t) are not necessarily given by the system
model, but can be chosen by the user to indicate which signals are relevant to the track-
ing task at hand, and which quantities can be modified. We will, however, assume that
the dimension of u(t) is no less than that of y(t), implying that we have at least as many
compensation inputs available as we have error quantities. If this is not the case, full
tracking can typically not be achieved, although it can be shown that the tracking error
can still be reduced [23]. In the specific implementation of the algorithm considered in
this paper, u(t) is a three-dimensional additive correction to the position control refer-
ence point and y(t) is the three-dimensional deviation of the vehicle from the reference
trajectory, as seen in Figure 2.
The correction input of iteration i is parameterized as a truncated Fourier series of
order N :

ui (t) = r0i + Σk=1..N rki cos (kΩ0 t) + Σk=1..N sik sin (kΩ0 t)        (15)
where Ω0 = 2π/T is the fundamental frequency of the series. The frequency domain
representation of u(t) is then
The system response hi (t) to this Fourier series ui (t) is also a Fourier series of order
N [25], which is defined in the frequency domain through the relationship

H i (kΩ0 ) = G (kΩ0 ) U i (kΩ0 )        for k = 0 . . . N .        (18)
We will now invert this relationship in order to use it as a learning feedback law
that compensates a given disturbance. Assuming that G (kΩ0 ) is full rank, let G+ (kΩ0 )
denote the Moore-Penrose pseudoinverse [10] of G (kΩ0 ) (in the special case that G is
square, G+ (kΩ0 ) = G−1 (kΩ0 ) holds).
Assume that we have executed the trajectory for iteration i, and measured the tracking
error y i (t). Let Y i (kΩ0 ) denote the Fourier series that coincides with y i (t) for t ∈ [0 T ].
The iteration feedback law is then given by the correction Fourier series

U i+1 (kΩ0 ) = U i (kΩ0 ) − µik G+ (kΩ0 ) Y i (kΩ0 )        (19)

for k = 0 . . . N , with the step size µik controlling the adaptation rate.
ui+1 (t) is then constructed from U i+1 (kΩ0 ) and applied to the system in the next iteration.
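To make the update concrete, a simplified single-channel version of one learning iteration can be sketched as follows. We assume here that the Fourier coefficients are stored in the complex one-sided DFT convention, so that each harmonic is a single complex number and the pseudoinverse reduces to 1/G; the signal lengths and model values are illustrative:

```python
import numpy as np

def learning_update(U, y, G, mu):
    """One iteration of the frequency-domain learning update for a scalar
    channel: U[k] <- U[k] - mu[k] / G[k] * Y[k] for harmonics k = 0..N.

    U: complex Fourier coefficients of the current correction (length N+1)
    y: sampled tracking error over one period (length M)
    G: complex closed-loop frequency response at k*Omega_0 (length N+1)
    mu: step sizes (length N+1)
    """
    N = len(U) - 1
    # One-sided Fourier coefficients of the measured error y(t)
    Y = np.fft.rfft(y) / len(y)
    return U - mu * Y[:N + 1] / G

def synthesize(coeffs, M):
    """Reconstruct a real time signal at M samples per period from its
    one-sided coefficients (zero-padded above the series order)."""
    spectrum = np.zeros(M // 2 + 1, dtype=complex)
    spectrum[:len(coeffs)] = coeffs
    return np.fft.irfft(spectrum, n=M) * M
```

With a perfect model and µ = 1, a repeatable disturbance is cancelled in a single iteration; smaller step sizes trade convergence speed for robustness to the noise terms, as analyzed below.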
Dropping the argument kΩ0 for notational convenience (all quantities in the following
are parameterized by kΩ0 ), we expand the above using Equations (13), (18), and (19):
Stability We first consider µik a tuning parameter, and derive stability conditions for
constant values µik = µ̄k . We rewrite Equation (22) using Equations (18), (20) and the
fact that G+ is the right inverse of G:
E[Y i+1 ] = (1 − µ̄k ) E[Y i ]        (25)
since the noise terms are zero mean as discussed in Section 3.3. Note that
E[Y 0 ] = 0        (26)
since it is assumed that D0 is zero mean and the expected value of the Fourier coeffi-
cients Y i is therefore zero for all i. If the algorithm is incorrectly initialized or D0 is
not zero mean, the tracking error still converges to zero in expectation for step sizes
0 < µ̄k ≤ 1. Furthermore, the variance of the Fourier coefficients can be computed by
rewriting Equation (23) using (18) and (20):
Taking the variance and using the fact that (Di + H i ), Γi , N i and N i+1 are independent,
it follows that
from which it follows that the variance does not diverge for step sizes 0 < µ̄k ≤ 1,
assuming bounded variance of N and Γ.
Because the Fourier series coefficients are approximately Gaussian, their mean and
variance fully describe their approximate distribution. Assuming that the approximation
as a Gaussian distribution is sufficiently accurate, the learning update law results in the
Fourier series coefficients of the tracking error being zero mean with bounded variance
for constant step sizes 0 < µ̄k ≤ 1.
It is important to note that this result only holds for the Fourier series coefficients
k = 0, 1, . . . , N of Y (kΩ0 ). No adaptation occurs for higher-frequency components, and
the coefficient dynamics for k > N are therefore simply those of Equation (22) without
the input:
Minimum Mean Square Error Step Size We now consider time-varying step sizes
µik , and derive the step size sequence that minimizes the trace of the tracking output
variance, also called the minimum mean square estimate [3]. Computing the trace of
the variance analogously to Equation (30) from Equation (27), and setting the derivative
with respect to µik to zero, it follows that the optimal step size µik∗ is
where Tr[·] denotes the trace operator, and the argument (kΩ0 ) is stated explicitly in order
to highlight that the optimal step size may differ for each multiple of the fundamental
frequency. The output variance Var[Y i ] can be computed recursively from Equation (27),
starting with the statistical properties of d0 (t) and n0 (t) to determine Var[Y 0 ]. Note that
the step size sequence µik∗ only depends on the statistical properties of the random signals
d0 (t), γ i (t) and ni (t) and not on their actual realizations, and can therefore be entirely
precomputed before starting experiments if desired.
We consider a sequence of execution speeds λℓ , with the corresponding time-scaled
maneuver period

Tℓ = T /λℓ ,        (33)
and assume that we have successfully learnt the motion for the execution speed λ1 , i.e.
the tracking error output y1 (t) associated with the execution speed λ1 is sufficiently small
for all t1 ∈ [0 T1 ]. The objective is then to use the learnt compensation input at speed λ1 ,
which we denote u1 (t), to initialize the compensation input for the (typically higher)
execution speed λ2 , which we analogously denote u2 (t).
Ideally, the values for u2 (t) should result in a tracking error at the new execution
speed y2 (t) that is as small as y1 (t). This would require knowing how the disturbance d(t)
changes with the time scaling, such that u(t) can be chosen accordingly as seen in Equa-
tion (12). However, due to d(t) capturing unmodeled disturbances, a model of its depen-
dence on execution speed is not readily available.
An obvious choice for the transfer between two maneuver speeds is to keep the learnt
input correction coefficients and simply re-map them to the corresponding frequencies
(i.e., U2 (kλ2 Ω0 ) = U1 (kλ1 Ω0 )). However, the varying sensitivity of the closed-loop trans-
fer function at the two different frequencies, as well as the changing disturbances, could
potentially lead to large errors.
In order to account for this, we use a linear extrapolation method (see e.g. [18]) to
predict the correct value of the compensation term H(λkΩ0 ) in Equation (20) from past
time scaling changes. The first-order prediction of H(λ3 kΩ0 ) for the execution speed λ3
from the past execution speeds λ1 and λ2 is then
H(λ3 kΩ0 ) = H(λ2 kΩ0 ) + ((λ3 − λ2 )/(λ2 − λ1 )) (H(λ2 kΩ0 ) − H(λ1 kΩ0 ))        (34)
which we can expand using Equation (14) to find the initial guess for the correction
Fourier series coefficients:
U (λ3 kΩ0 ) = G+ (λ3 kΩ0 ) [ G(λ2 kΩ0 )U (λ2 kΩ0 ) (1 + (λ3 − λ2 )/(λ2 − λ1 ))
              − G(λ1 kΩ0 )U (λ1 kΩ0 ) (λ3 − λ2 )/(λ2 − λ1 ) ] .        (35)
The prediction requires the storage of the learnt correction series coefficients of the
past two execution speeds (U (λ1 kΩ0 ) and U (λ2 kΩ0 )), as well as the respective execution
speeds (λ1 and λ2 ). For the first prediction, only one past set of learnt series coefficients is
available, and we therefore resort to a zeroth-order prediction by eliminating the second
summand in Equation (34).
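A single-channel sketch of this prediction step (Eqs. (34)-(35)) is shown below; the transfer function and fundamental frequency passed in the usage example are illustrative assumptions:

```python
import numpy as np

def predict_correction(U1, U2, lam1, lam2, lam3, G, Omega0):
    """Predict the correction Fourier coefficients for execution speed lam3
    from those learnt at speeds lam1 and lam2 (scalar channel).

    U1, U2: learnt correction coefficients for harmonics k = 0..N
    G: callable returning the closed-loop frequency response G(omega)
    Omega0: fundamental frequency 2*pi/T of the nominal maneuver
    """
    k = np.arange(len(U2))
    r = (lam3 - lam2) / (lam2 - lam1)
    # Eq. (34): first-order extrapolation of the compensated response H = G*U
    H3 = G(lam2 * k * Omega0) * U2 * (1 + r) - G(lam1 * k * Omega0) * U1 * r
    # Eq. (35): map the predicted response back to correction coefficients
    return H3 / G(lam3 * k * Omega0)
```

The zeroth-order prediction used for the very first speed change corresponds to dropping the extrapolation term, i.e. setting H(λ3 kΩ0 ) = H(λ2 kΩ0 ) and re-mapping through the transfer function at the new frequencies.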
The learning algorithm involves several design parameters, whose choice we discuss in
the following:
1. The order of the compensation Fourier series N fundamentally determines the
frequencies of the error output y(t) that are compensated for since the compensation
input u(t) is characterized by a truncated Fourier series of order N , and no adapta-
tion occurs at higher frequencies. While an upper bound for N is in principle only
given by the discrete-time measurements used to estimate the coefficients of y(t),
the high-frequency components of the tracking error are often inherently small and
can therefore be neglected by choosing a lower order. If the approximate model (18)
only captures the underlying closed-loop dynamics well for low frequencies, it may
also be beneficial to avoid learning at higher frequencies by limiting the series order.
2. The iteration- and order-dependent learning step size µik can be treated as a tuning
parameter chosen by the user, or its optimal value may be derived from the sta-
tistical properties of n(t), d0 (t), and γ(t). While the explicit statistical properties
of these signals may be difficult to identify for an experimental platform, they can
also be considered design parameters. In this context, the properties of γ(t) capture
the likely change of the disturbance from iteration to iteration due to unmodeled
dynamics (for example, nonlinearities of the underlying model), and can therefore
be used to encode model uncertainty. A typical choice would be to assume the
disturbance changes to be large initially to account for significant changes in the
correction input that could highlight nonlinear behaviour, and smaller changes as
the algorithm converges. The properties of the non-repetitive noise term n(t) en-
code measurement noise and the level of repeatability of the experiment, and large
values thereof enforce smaller step sizes (as seen in Equation (32)), reducing the
level of ‘trust’ in the measurements. The statistical properties of d0 (t) capture the
confidence in the initial guess of what the systematic disturbances are, and thereby
influences the step size during the initial phase of the learning. Assuming little prior
knowledge implies the coefficients of D0 (kΩ0 ) having very large variance, which in
turn implies that the first step size µ0∗ will approach 1 according to Equation (32).
3. Finally, when time scaling is applied, the time scale sequence λ0 , λ1 , . . . remains to
be chosen. This is typically done based on past experiments, where the performance
with no initial compensation input determines λ0 , and the following time scales are
determined based on the performance of the predictor (34). Note that, if the optimal
step size sequence µik∗ is used, γ(t) should be chosen to have significant influence
after a change in execution speed in order to account for modeling mismatches
caused by the change of execution speed.
4. Experiments
Experiments demonstrating the use of the learning algorithm presented herein for quadro-
copter flight are presented in this section. After introducing the experimental setup, we
present two different flight tasks to which the algorithm was applied, discuss its perfor-
mance, and demonstrate the influence of a selection of the tuning parameters introduced
in Section 3.5.
3. Wait for the system to converge under the new control inputs. Note that the
instantaneous change of the correction input in Step 2 represents a non-periodic
excitation of the system, making this step necessary.
The first constraint, limiting the collective thrust, is given by the minimum and maximum
rotational speed of the propellers. The second constraint, limiting the rotational rate
of the quadrocopter, is given by the limited range of the gyroscopic inertial sensors.
Because the quadrocopter has low rotational inertia and high achievable torques due to
the outwards mounting of the propellers, the body rate control loop (10) has very high
bandwidth. We therefore do not explicitly limit rotational accelerations, and assume that
the rate constraint (37), with ωmax suitably chosen, along with (36) suffices to ensure
feasibility.
Based on the constraints (36) and (37), we will now design the two components of
the figure-eight maneuver, i.e. the half-circles about the obstacle points and the splines
connecting them:
Semi Circles: A circular trajectory, covering an angle of 180◦ , surrounds each obstacle
point at a user-defined radius Rs . The thrust a and rotational rates ωx , ωy along this
Figure 3. The reference trajectory of the figure-eight maneuver, highlighting the composition
from two half-circles (dashed red) and two connecting splines (solid blue). The period of the
maneuver is T = 3.3 s, and the maximum speed reached is 6 m s−1 .
circular trajectory are then computed [49], and the time required for each semi circle
is chosen to be the fastest time for which the control input constraints (36)-(37) are
satisfied.
Connecting Splines: The two semi circle trajectories are connected through polyno-
mial trajectories. The trajectory is continuous in position, velocity, attitude, and rota-
tional rate of the quadrocopter if the polynomial trajectories match the position, velocity,
acceleration, and jerk of the circular trajectories at either end [35]. In order to satisfy
the four boundary constraints on both ends of the spline, we construct a seventh-order
polynomial. The remaining degree of freedom in the spline design is the duration of
the maneuver. In order to achieve high speed, we iterate over the duration until the fastest
maneuver that satisfies the control input bounds (36)-(37) is found, using the algorithm
from [49] to compute the inputs and verify feasibility.
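The boundary-value computation for each connecting spline reduces to a small linear system: eight boundary conditions (position, velocity, acceleration, and jerk at both ends) determine the eight coefficients of a seventh-order polynomial. A sketch for a single axis, with hypothetical boundary values:

```python
import math
import numpy as np

def connecting_spline(start, end, T):
    """Coefficients c[0..7] of p(t) = sum_n c[n] t^n matching position,
    velocity, acceleration, and jerk (given as 4-tuples) at t=0 and t=T."""
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for d in range(4):                       # derivative order 0..3
        A[d, d] = math.factorial(d)          # d-th derivative at t = 0
        b[d] = start[d]
        for n in range(d, 8):                # d-th derivative at t = T
            A[4 + d, n] = math.factorial(n) // math.factorial(n - d) * T**(n - d)
        b[4 + d] = end[d]
    return np.linalg.solve(A, b)

# Hypothetical rest-to-rest displacement of 1 m over a 0.93 s spline
c = connecting_spline((0, 0, 0, 0), (1, 0, 0, 0), 0.93)
```

In the trajectory design above, the spline duration would then be varied until the fastest polynomial satisfying the input constraints (36)-(37) is found, with feasibility verified as described in the text.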
The figure-eight maneuver is then fully defined through the concatenation of the first
semi circle, the first spline, the second semi circle, and the second spline. Due to the
continuity and feasibility conditions imposed on each component of the trajectory, the
resulting trajectory is feasible with respect to the constraints (36)-(37), and it is periodic.
An example of the figure-eight motion can be seen in Figure 3. In this specific example,
the parameters were chosen to be L = 4 m, Rs = 0.75 m, amax = 1.8g = 17.65 m s−2 , and
ωmax = 500 ◦ s−1 . The resulting maneuver duration is T = 3.3 s, with an average speed
of 4 m s−1 and a maximum speed of 6 m s−1 . The duration of each half circle is 0.71 s, and
the duration of each connecting spline is 0.93 s. This example was used as the reference
trajectory to be learnt in the following results.
Learning at Fixed Maneuver Speed As a first test case, we show the learning
performance when the maneuver is executed at the fixed speed of λ = 0.7, i.e. 70 % of
the maneuver speed the motion was nominally designed for. The order of the correction
input series was chosen to be N = 10, which proved to be a good compromise between
Figure 4. Top-down view of the flown trajectory before (thick solid red) and after (thin solid
blue) learning. The dotted black line denotes the nominal trajectory, and the dashed blue line
shows the set point p̂(t) provided to the feedback control law (as seen in Figure 2) after the
learning is applied.
the ability to compensate for temporally localized tracking errors and robustness to high-
frequency uncertainty. We chose the learning step size series to be
µik = min ( 1, 3/(i + 1) )        for all k        (38)
which provides fast initial convergence with the first three steps being of size 1, and good
robustness to non-repetitive noise since the steps reduce in size as the iteration number
increases.
At this speed, the maneuver could be safely executed with no initial correction input
(i.e. u0 (t) = 0), and the learning algorithm converged. Figure 4 shows the trajectories of
the vehicle and the set points in the horizontal plane, both at the start of the learning
process and after convergence. It can be seen that the deviations from the nominal trajec-
tory after the learning process are minimal, and that the learning algorithm significantly
improved the tracking performance.
Figure 5 shows the evolution of the error coefficients over 22 iterations of the learning
algorithm. It can be seen that the error coefficient magnitude quickly reduced from
values in excess of 100 cm to values below 2 cm. The peak tracking error was reduced from
approximately 200 cm to 5 cm. A significant outlier can be seen at the 18th iteration, likely
caused by a non-repeatable disturbance to the vehicle such as a wireless communication
failure. Because the step size µ18 k = 3/19 is relatively small, the large tracking error in
this iteration was not strongly propagated to following iterations.
It should be noted that the feedback control law used in these experiments was not
tuned to provide the best possible tracking performance without the adaptation; as dis-
cussed in Section 2.2, a straightforward improvement would be the inclusion of feedfor-
ward terms in the control law. What can, however, be seen from these results is that such
Figure 5. Evolution of the Fourier series coefficients during learning at a fixed maneuver speed
of λ = 0.7. The lines at iteration zero are from top to bottom: k = {1, 2, 4, 5, 0, 6, 3, 7, 10, 8, 9}.
Note the significant outlier at iteration i = 18; its influence on subsequent iterations
was limited because the step size µ18k = 3/19 (for all k) is relatively small.
improvements to the feedback control law do not appear necessary because the adaptation
law can largely eliminate repeatable tracking errors.
Figure 6. Learning performance (as measured by the RMS position tracking error) for the
high-performance figure-eight maneuver executed at a speed of λ = 0.8, with varying order
N of the correction Fourier series. The dashed black horizontal line represents the estimated
repeatability limit of 1.4 cm. Note that the relatively large uncertainty at higher frequencies
causes the learning performance to decrease for N = 9, and the vehicle collided with the floor
during the sixth iteration for N = 10.
For the flight trajectory learnt using a Fourier correction series of order N = 8, the average
RMS tracking error over the last five iterations (i.e., i = 21, 22, 23, 24, 25) was 24.2 mm
(with a standard deviation of 4.5 mm), indicating that the compensations by the learning
algorithm corrected for almost all systematic errors. In other words, if an input signal
u(t) exists that would entirely eliminate all systematic tracking errors given the true
dynamics of the quadrocopter, then it would allow the tracking error to be reduced by a
further approximately 11 mm compared to the learning algorithm presented here.
Figure 7. Logarithmic plot of the root mean square position tracking error during execution
of the figure-eight maneuver with time scaling. The maneuver execution speed (shown at the top
of the graph) is increased every five iterations. Note that λ = 1.0 denotes the nominal maneuver
speed, and values λ > 1.0 are not nominally feasible.
We initialized the speed factor to λ0 = 0.8, and then applied the update law
after every five learning iterations of the maneuver. We used the same learning step
size sequence (38) as before, but at iterations where time scaling occurs, we reset the
iteration counter to i = 0 in order to compensate for the significant model uncertainty
after a change of the time scale. The order of the correction Fourier series was chosen to
be N = 5.
In Figure 7, the results of the experiment show that the RMS tracking error was
initially reduced from 1.7 m to 0.18 m over the first four iterations, which were executed
at λ = 0.8. The RMS error increased by values of 0.005 m to 0.09 m during the first four
speed changes. Beginning with the 25th iteration, the execution speed was λ > 1, i.e.,
the maneuver was executed faster than computed to be feasible according to Section 4.3. It can
be seen that the tracking error increased as one would expect, and the learning algorithm
ultimately failed to reduce the error at an execution speed of λ = 1.1 due to significant
input saturations. These experimental results correspond well to the predicted feasibility.
was first presented in [21], and preliminary learning results have been presented in [23].
We begin by presenting the equations of motion of the pendulum on top of the quadro-
copter and the controller design used to stabilize the overall system consisting of quadro-
copter and pendulum. Following this, the learning is demonstrated for a sinusoidal tra-
jectory.
System Design The inverted pendulum is modeled as a point mass with two degrees of
freedom, chosen to be the horizontal position of the pendulum center of mass relative to its
base in O (r in the direction of p1 , s in the direction of p2 ). Using Lagrangian mechanics,
the nonlinear equations of motion of the pendulum can be derived [21]. Assuming that the
pendulum is attached to the center of mass of the quadrocopter (i.e. rotational motion of
the vehicle does not affect the pendulum), the overall system dynamics linearized about
the hover point are
r̈ = (g/Lp) r − R13 g        (40)
s̈ = (g/Lp) s − R23 g        (41)
p̈1 = R13 g                  (42)
p̈2 = R23 g                  (43)
Ṙ13 = ωy                    (44)
Ṙ23 = −ωx                   (45)
p̈3 = a − g                  (46)
where Lp is the length of the pendulum from its base to its center of mass and we
omitted the rotational dynamics (1) under the assumption that they remain controlled
as before using the feedback law (10) and that they are significantly faster than the
remaining dynamics.
The two horizontal degrees of freedom represent fifth-order systems, with the vehicle
forming a triple integrator from the body rate to its position. The vertical motion is
represented by a double integrator from thrust to position.
The feedback controller for the dynamics (40)-(46) is designed using an infinite-horizon
linear quadratic regulator (LQR; see e.g. [30]) design. The design penalizes only the
vehicle position p1 , p2 , p3 and the control effort ωx , ωy , a, and there is no penalty on the
pendulum state. The ratio of the penalties on position and control effort controls the
speed at which the position set point is tracked. Values for this ratio are tuned manually
until the system shows fast performance without saturating the control inputs. Note that,
for this specific controller, no effort of feedback linearization (as presented in Section 2.2
for the vehicle only) is made.
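As a rough illustration of this design procedure, the sketch below sets up one horizontal axis of the linearized dynamics (40), (42), (44) and solves the corresponding LQR problem with SciPy. The numerical values (pendulum length, penalty weights, and the small regularization eps that keeps the Riccati equation well-posed) are illustrative assumptions, not the values used in the experiments.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

g, Lp = 9.81, 0.5  # gravity; pendulum length to center of mass (assumed value)

# One horizontal axis of the linearized dynamics (40), (42), (44):
# state x = (r, r_dot, p1, p1_dot, R13), input u = omega_y.
A = np.array([[0.0,  1.0, 0.0, 0.0,  0.0],
              [g/Lp, 0.0, 0.0, 0.0, -g  ],
              [0.0,  0.0, 0.0, 1.0,  0.0],
              [0.0,  0.0, 0.0, 0.0,  g  ],
              [0.0,  0.0, 0.0, 0.0,  0.0]])
B = np.array([[0.0], [0.0], [0.0], [0.0], [1.0]])

# Penalize only the vehicle position p1 and the control effort, as in the
# text; the small eps on the remaining states is a numerical assumption
# that keeps the Riccati solver well-posed.
eps = 1e-3
Q = np.diag([eps, eps, 10.0, eps, eps])
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)  # state feedback u = -K x

# All closed-loop eigenvalues of A - B K should have negative real part.
closed_loop_poles = np.linalg.eigvals(A - B @ K)
```

Tuning the ratio of the position weight to the input weight, as described above, shifts these closed-loop poles and thereby the speed of the position response.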
The approximate system dynamics (14) are derived by computing the closed-loop
dynamics of the linearized system (40)-(46) under the LQR feedback law.
Figure 8. Logarithmic plot of the root mean square position tracking error during the inverted
pendulum tracking task, as seen in Figure 10. The learning step size for this experiment was
chosen to be constant with µ̄k = 0.6 for all k.
Results The vehicle position RMS tracking error during the learning process is shown
in Figure 8. It can be seen that the initial execution caused very high tracking errors (RMS
error of 1.95 m), and that the tracking error was continuously reduced, with tracking errors
below 0.1 m achieved for all iterations i > 11. Figure 9 shows the flight path before and
after application of the learning algorithm, and a series of snapshots of the experiment
can be seen in Figure 10 with the reference trajectory superimposed on the pictures.
5. Advantages and Limitations

The experimental results in the previous section demonstrated the ability of the learning
algorithm to significantly improve the tracking performance for periodic maneuvers. A
Figure 9. Top view of the flown trajectories during the inverted pendulum learning experi-
ment. The dotted black line shows the reference trajectory p∗(t) as described in Section 4.4.
The thick red line shows the flown trajectory before learning, and the thin blue line shows the
trajectory after applying the learning algorithm.
key enabler for this is that the learning is performed on the closed-loop system of the
quadrocopter under feedback control, which makes repeated maneuver executions highly
repeatable, and therefore well-suited for the non-causal correction of tracking errors.
The parametrization of the correction input u(t) as a truncated Fourier series allows
the use of the order of the series to define a trade-off between 1) the ability to compen-
sate for highly localized tracking errors, and 2) computational complexity and memory
requirements. Relatively low Fourier series orders often suffice to significantly improve
the tracking performance, as seen in our experiments. The limitation to relatively low
Fourier series orders also provides a convenient way to suppress high-frequency jitter in
the learnt compensation, an effect that can be frequently observed in iterative learning
control approaches [51]. Furthermore, inaccuracies of the dynamic model at high frequen-
cies can be circumvented by limiting the learning to lower frequencies, or can be modeled
through the characteristics of the noise models used to capture changing dynamics and
non-repeatable effects. Through the appropriate choice of these noise characteristics,
the learning algorithm can be tuned to quickly compensate for disturbances at frequen-
cies where an accurate model of the system is available, and to perform more cautious
adaptation at frequencies where significant model uncertainty is present.
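The truncation trade-off described above can be illustrated with a few lines of NumPy: a sampled periodic tracking error (synthetic here) is decomposed into Fourier coefficients via the DFT, and only the orders k ≤ N are retained for the correction. All signal values and parameters below are illustrative assumptions.

```python
import numpy as np

# Synthetic periodic tracking error sampled over exactly one period
# (period T, sample rate fs, and the signal itself are assumptions).
T, fs, N = 2.0, 100.0, 5
t = np.arange(0.0, T, 1.0 / fs)
err = 0.3 * np.sin(2 * np.pi * t / T) + 0.05 * np.sin(2 * np.pi * 7 * t / T)

# One-sided DFT, scaled so that |E[k]| is half the amplitude of harmonic k.
E = np.fft.rfft(err) / len(err)

# Keep only the orders k = 0..N of the correction series: the 7th harmonic
# above is deliberately discarded by the truncation, so high-frequency
# content (and high-frequency model uncertainty) never enters the learning.
coeffs = E[: N + 1]
```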
It can also be seen from our experiments that highly simplified models of the closed-
loop dynamics, though only capturing a relatively rough approximation of the true be-
haviour, suffice to guide the learning process both for the stand-alone quadrocopter as
well as when more complex tasks are performed. Similar results have been demonstrated
for other learning algorithms, e.g. reinforcement learning [28]. The simplified model,
Figure 10. Snapshots of the inverted pendulum trajectory tracking experiment. The red line
denotes the nominal trajectory p∗ (t), and the point on it represents the instantaneous nominal
position at the time of the snapshot. Performance before learning (i = 0) is seen on the left,
and performance after ten learning iterations (i = 10) is seen on the right. The snapshots are
taken every 1 s, and are ordered from top to bottom. Notice that, before learning, the vehicle
lags behind the nominal trajectory significantly.
and therewith the uncomplicated derivation of such a model, make the application of the
algorithm to changing conditions (such as specific tasks, differing controllers, or different
vehicles) a straightforward undertaking.
The algorithm presented particularly lends itself to applications where computational
power and memory are limited. During the execution of the maneuver, the algorithm only
requires the estimation of the coefficients of the tracking error Fourier series up to order
N , and the control input update can be reduced to a matrix multiplication according
to Equation (19) by pre-computing the step size sequence µ_k^i and the inverse transfer
function G⁺(kΩ0) for the considered frequencies k = 0, 1, . . . , N.
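In pseudocode-like Python, one learning update of this form can be sketched as follows. The variable names and numerical values are illustrative assumptions, and the sketch is not a verbatim transcription of Equation (19):

```python
import numpy as np

def update_correction(u_k, e_k, mu_k, G_inv):
    """One frequency-domain learning step: each measured error Fourier
    coefficient is pushed through the precomputed inverse transfer function
    G+(k*Omega0) and subtracted from the current correction, scaled by the
    per-frequency step size mu_k. With precomputed mu_k and G_inv this is a
    single element-wise multiply-accumulate."""
    return u_k - mu_k * G_inv * e_k

# Toy example with N = 2 (complex coefficients for k = 0, 1, 2); all
# values below are made up for illustration.
u = np.zeros(3, dtype=complex)                             # current correction
e = np.array([0.10 + 0.00j, 0.05 - 0.02j, 0.00 + 0.01j])  # measured error
G_inv = np.array([1.0 + 0.0j, 0.8 - 0.3j, 0.5 - 0.5j])    # precomputed G+
mu = np.array([0.6, 0.6, 0.6])                             # step sizes

u_next = update_correction(u, e, mu, G_inv)
```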
Due to the simplified dynamic model, the serial architecture, and the frequency do-
main approach in the learning algorithm, it is not trivial to incorporate additional con-
straints. Examples of potentially useful constraints are the explicit consideration of input
saturations, or the penalization of control effort. Their inclusion can improve the learning
performance because they help capture the underlying dynamics more accurately, or can
help to shape the outcome of the learning process.
Furthermore, maneuvers exhibiting very large initial tracking errors highlight the
limitations of using a simplified model in the learning process. The system dynamics may
not be captured with sufficient accuracy for large errors, causing the learning algorithm
to fail. The time-scaling method presented herein can offer a viable solution to this
problem when it is possible to initially reduce the execution speed of the maneuver: The
frequency domain approach to iterative learning provides a convenient way to transfer
learnt correction inputs between different execution speeds of the same maneuver. This
allows initial learning to occur at reduced speeds, thus providing a safe way to learn
high-speed maneuvers where a poor initial guess of the correction input can lead to a
crash or to non-convergence.
6. Conclusion
This paper presents the use of a frequency-domain iterative learning scheme for periodic
quadrocopter flight. The iteration-domain feedback law leverages an underlying feedback
control law that stabilizes the vehicle and makes its motion highly repeatable. The nom-
inally linear time-invariant closed-loop dynamics of the quadrocopter feedback system
are used to determine correction values from observed errors in a straightforward man-
ner. By parameterizing the non-causal tracking error compensation as Fourier series, the
algorithm is computationally lightweight and easy to adapt. Uncertainties, such as mea-
surement noise, inaccuracies of the approximate transfer function, or model uncertainty
can be accounted for in the learning step size, and the optimal step size for each frequency
component can be computed from the statistical properties thereof. The approach also
allows the learning of high-performance maneuvers by executing them at a reduced speed
initially and then transferring learnt corrections to higher speeds, where a disturbance
prediction scheme is used to initialize the learning at higher speeds.
The approach was experimentally verified for a quadrocopter executing a high-perfor-
mance motion, and for it executing the complex task of balancing an inverted pendulum
while tracking a trajectory. The experimental results also highlighted the advantages
of learning at reduced speeds, with the high-performance motion only being learnt
successfully at full speed when using learnt parameters from lower speeds to initialize the
learning process.
References
[1] Airrobot GmbH & Co KG. AirRobot Product Information (Available online at
www.airrobot-us.com/images/Airrobot info.pdf), 2007.
[17] J Dunfied, M Tarbouchi, and G Labonte. Neural Network Based Control of a Four
Rotor Helicopter. In International Conference on Industrial Technology, 2004.
[18] Walter Gautschi. Numerical Analysis. Birkhäuser Boston, 2012.
[19] Alexandra Suzanne Gibb. Droning the Story. Master thesis, Simon Fraser University,
2011.
[20] Daniel Gurdan, Jan Stumpf, Michael Achtelik, Klaus-Michael Doth, Gerd Hirzinger,
and Daniela Rus. Energy-Efficient Autonomous Four-Rotor Flying Robot Controlled
at 1 kHz. In International Conference on Robotics and Automation, 2007.
[21] Markus Hehn and Raffaello D’Andrea. A Flying Inverted Pendulum. In International
Conference on Robotics and Automation, 2011.
[22] Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Feed-Forward
Learning Scheme for High Performance Periodic Quadrocopter Maneuvers. In
International Conference on Intelligent Robots and Systems, 2013.
[23] Markus Hehn and Raffaello D’Andrea. An Iterative Learning Scheme for High
Performance, Periodic Quadrocopter Trajectories. In European Control Conference,
2013.
[24] Ivo Houtzager, Jan-Willem van Wingerden, and Michel Verhaegen. Rejection of
Periodic Wind Disturbances on a Smart Rotor Test Section Using Lifted Repetitive
Control. IEEE Transactions on Control Systems Technology, 21(2):347–359, 2013.
[25] Hwei P. Hsu. Schaum’s Outline of Signals and Systems. McGraw-Hill, 1995.
[26] Peter C. Hughes. Spacecraft Attitude Dynamics. John Wiley & Sons, 1986.
[27] T Inoue, S Iwai, and M Nakano. High Accuracy Control of a Proton Synchrotron
Magnet Power Supply. In IFAC World Congress, 1981.
[28] J Zico Kolter and Andrew Y Ng. Policy Search via the Signed Derivative. In Robotics:
Science and Systems, 2009.
[29] Alex Kushleyev, Daniel Mellinger, and Vijay Kumar. Towards A Swarm of Agile
Micro Quadrotors. In Robotics: Science and Systems, 2012.
[30] Huibert Kwakernaak and Raphael Sivan. Linear Optimal Control Systems. John
Wiley & Sons, 1972.
[31] Sergei Lupashin and Raffaello D’Andrea. Adaptive Fast Open-Loop Maneuvers for
Quadrocopters. Autonomous Robots, 33(1-2):89–102, 2012.
[32] Sergei Lupashin, Angela P. Schoellig, Markus Hehn, and Raffaello D’Andrea. The
Flying Machine Arena as of 2010. In International Conference on Robotics and
Automation, 2011.
[33] Robert Mahony, Vijay Kumar, and Peter Corke. Multirotor Aerial Vehicles:
Modeling, Estimation, and Control of Quadrotor. IEEE Robotics Automation
Magazine, 19(3):20–32, 2012.
[49] Angela Schoellig, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Feasibility
of Motion Primitives for Choreographed Quadrocopter Flight. In American Control
Conference, 2011.
[50] Angela Schoellig, Clemens Wiltsche, and Raffaello D’Andrea. Feed-Forward Param-
eter Identification for Precise Periodic Quadrocopter Motions. In American Control
Conference, 2012.
[51] Angela P Schoellig, Fabian L Mueller, and Raffaello D’Andrea. Optimization-
Based Iterative Learning for Precise Quadrocopter Trajectory Tracking. Autonomous
Robots, 33(1-2):103–127, 2012.
[52] J Schoukens and J Renneboog. Modeling the Noise Influence on the Fourier Coef-
ficients After a Discrete Fourier Transform. IEEE Transactions on Instrumentation
and Measurement, IM-35(3):278–286, 1986.
[53] Roland Siegwart, Illah R. Nourbakhsh, and Davide Scaramuzza. Autonomous Mobile
Robots. The MIT Press, second edition, 2011.
[54] A K Sinha, S Atyeo, A Schauenburg, A Rathore, B Quinn, and T Christoffersen.
Investigation of the Application of UAV Technology in Power Line Inspection. In
International UAV Systems Conference, 2006.
[55] Koushil Sreenath, Nathan Michael, and Vijay Kumar. Trajectory Generation and
Control of a Quadrotor with a Cable-Suspended Load: A Differentially-Flat Hybrid
System. In International Conference on Robotics and Automation, 2013.
[56] Stefania Tonetti, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Dis-
tributed Control of Antenna Array with Formation of UAVs. In IFAC World
Congress, 2011.
[57] Tara Volgoi, Hayley Roberts, Lindsay Gerber, and Nathan Aswege. Tracking The
Future. In The Standard, pages 6–7. Vicon Motion Systems Ltd, first edition, 2011.
[58] Youqing Wang and Francis J. Doyle. Stability Analysis for Set-Point-Related Indirect
Iterative Learning Control. In Conference on Decision and Control, 2009.
[59] Youqing Wang, Furong Gao, and Francis J. Doyle. Survey on Iterative Learning
Control, Repetitive Control, and Run-to-Run Control. Journal of Process Control,
19(10):1589–1600, 2009.
[60] Steven L. Waslander, Gabriel M. Hoffmann, Jung Soon Jang, and Claire J.
Tomlin. Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode
vs. Reinforcement Learning. In International Conference on Intelligent Robots and
Systems, 2005.
Paper V
Abstract
Quadrocopters allow the execution of high-performance maneuvers under feed-
back control. However, repeated execution typically leads to a large part of the
tracking errors being repeated. This paper evaluates an iterative learning scheme
for an experiment where a quadrocopter flies in a circle while balancing an in-
verted pendulum. The scheme permits the non-causal compensation of periodic
errors when executing the circular motion repeatedly, and is based on a Fourier
series decomposition of the repeated tracking error and compensation input. The
convergence of the learning scheme is shown for the linearized system dynamics.
Experiments validate the approach and demonstrate its ability to significantly im-
prove tracking performance.
Paper V. Iterative Learning Scheme for Periodic Trajectories
1. Introduction
The capability of quadrocopters to perform highly dynamic, complex, and precise motions
has been demonstrated repeatedly in recent years (see, for example, [12,14,15,18]). Such
motions are commonly executed by using a first-principles model of the vehicle dynamics
to determine nominal control inputs, and a feedback control law to ensure tracking of the
nominal trajectory.
In order to account for model mismatches, a number of learning schemes have been
developed. Examples include those based on sliding mode control and reinforcement
learning techniques [20], neural networks [5], and adaptive control [16].
When performing the same motion repeatedly, a further opportunity to improve track-
ing performance may arise because many of the disturbances that degrade tracking per-
formance will be similar each time the vehicle performs the motion. It is thus possible
to non-causally correct for these repeatable disturbances: By using data from previous
executions to characterize them, model-based correction inputs can be computed before
executing the motion again. Ideally, these correction inputs are able to fully compen-
sate for the repeatable disturbances, such that the feedback controller is only required to
compensate for non-repeatable disturbances.
Such iteration-based learning approaches have been successfully demonstrated for
multi-rotor vehicles performing high-performance maneuvers. Broadly speaking, the
learning approaches may be separated into two groups:
The first group is characterized by its ability to learn motions that are parameterized.
The motion is thus described by a (finite) set of design parameters, chosen by the user.
After the execution of the motion, these parameters are adapted to compensate for dis-
turbances. The direction and magnitude of the correction may be model-based, or based
on the user’s intuition. A discussion on the importance of choosing ‘good’ design pa-
rameters may be found in [10], where a learning algorithm for this kind of parameterized
motions is demonstrated for multiple flips and fast translations with quadrocopters. A
further demonstration of this class of learning algorithms is provided in [13]. The ability
to shape the tracking performance strongly depends on the number of parameters that
are optimized; in the above examples, the objective is to minimize the error at specific
time instants (‘key frames’), and a relatively small number of parameters is sufficient to
do so. This makes the methods computationally lightweight.
The second group of learning approaches considers more generic motions that need
not be specified by parameters. The system dynamics are considered in discrete time,
and the correction consists of correction values (typically control inputs or set points)
for each discrete time step. After execution of the motion, a numerical optimization over
the correction values is performed in order to minimize a metric related to the tracking
error. In this optimization, a model of the system dynamics provides the mapping from
corrections to the tracking error. This approach is commonly known as a form of iterative
learning control [3], and its application to high performance quadrocopter flight has been
demonstrated [17, 19].
The delimitation between the two groups is not strict. Indeed, the second group of
learning approaches could be seen as using a very large number of values to parameterize
the correction.
This paper evaluates a technique for non-causally compensating repeated, periodic
tracking errors. Specifically, we consider a motion where the linearized dynamics around
the nominal motion are time-invariant under an appropriate coordinate transformation.
Similar to the second group of learning algorithms, we do not assume a parameterized
motion. However, we reduce the dimensionality of the corrections that we intend to learn
by assuming that they are periodic. This allows us to parameterize the corrections as the
coefficients of a truncated Fourier series. The order of the Fourier series provides a means
to trade off computational complexity and the ability to compensate for temporally local
or high-frequency disturbances.
The specific motion considered in this paper consists of a quadrocopter balancing an
inverted pendulum and tracking a circular trajectory [7]. This problem is an interesting
platform for learning algorithms for multiple reasons: Due to the highly dynamic nature
of the motion and the agility required to balance the pendulum, the full range of the
vehicle dynamics is used. Due to relatively high flight speeds and fast vehicle attitude
changes, unmodeled aerodynamic effects (see e.g. [9]) significantly influence the dynamics
of the vehicle. Furthermore, the control system employs a time-varying coordinate system
transformation, the errors in which potentially introduce further periodic errors in the
feedback control loop.
The remainder of this paper is structured as follows: We explain our motivation for
developing the learning algorithm in Chapter 2 by introducing the flying inverted pen-
dulum experiment and recapitulating experimental results that highlight its performance
without learning. We then describe the iterative learning strategy in Chapter 3, and show
convergence of repeatable errors. Chapter 4 shows experimental results from the appli-
cation of the algorithm to the experiment. Finally, we summarize and discuss potential
research directions in Chapter 5.
Figure 1. The inertial coordinate system O and the vehicle coordinate system V, used to
describe the dynamics of the system.
This chapter introduces the flying inverted pendulum [7], an experiment that demon-
strates the performance and agility of quadrocopters. This experiment is the motivation
behind the algorithm presented in Chapter 3, and is the basis for the experimental results
in Chapter 4. The objective of this experiment is to balance an inverted pendulum on a
quadrocopter while the vehicle flies horizontal circles. The derivations from [7] are repro-
duced here in abbreviated form for the purpose of completeness; the reader is referred to
the previously published paper for a more thorough discussion.
2.1 Dynamics
The quadrocopter is modeled as a rigid body with six degrees of freedom: Its position
(x, y, z) in the inertial coordinate system O, and its attitude, represented by the rotation
between the inertial coordinate system O and the body-fixed coordinate system V, as
shown in Figure 1. The rotation is parameterized by three Euler angles, representing
rotations about the z-axis (α), the y-axis (β) and the x-axis (γ), executed in this order:
O_V R(α, β, γ) = Rz(α) Ry(β) Rx(γ) .        (1)
The control inputs are the rotational rates of the vehicle about the three body axes
(ωx , ωy , ωz ) and the collective, mass-normalized thrust applied by the vehicle along its
body z-axis, a (in units of acceleration). It follows that the differential equations gov-
erning the vehicle motion are
[ẍ; ÿ; z̈] = O_V R(α, β, γ) [0; 0; a] + [0; 0; −g]        (2)

[γ̇; β̇; α̇] = [cos β cos γ, − sin γ, 0; cos β sin γ, cos γ, 0; − sin β, 0, 1]⁻¹ [ωx; ωy; ωz]        (3)
2. The Flying Inverted Pendulum Experiment
The pendulum dynamics depend on the vehicle acceleration:

[r̈; s̈] = h(r, s, ṙ, ṡ, ẍ, ÿ, z̈) .        (4)
[x; y; z] =: [cos Ωt, − sin Ωt, 0; sin Ωt, cos Ωt, 0; 0, 0, 1] [u; v; w]        (5)

O_V R(α, β, γ) [0; 0; 1] =: O_W R(Ωt, µ, ν) [0; 0; 1]        (6)

[r; s] =: [cos Ωt, − sin Ωt; sin Ωt, cos Ωt] [p; q] .        (7)
µ̄ = arctan(−Ω²R/g)        (8)
ν̄ = 0        (9)
ā = √(g² + (Ω²R)²)        (10)
q̄ = 0        (11)
Ω²(R + p̄) + g p̄/√(L² − p̄²) = 0        (12)
where L represents the length of the pendulum from its base to its center of mass. The
vehicle yaw angle, α, is not defined by the motion, and may be chosen separately. In our
experiments, we chose a constant yaw angle α = 0.
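These nominal-trajectory conditions are easy to evaluate numerically. In the sketch below, Equations (8) and (10) are computed directly, and Equation (12) is solved for p̄ with a bracketing root finder; the values of Ω, R, and L are illustrative assumptions, not the experimental parameters.

```python
import numpy as np
from scipy.optimize import brentq

g = 9.81      # gravitational acceleration (m/s^2)
L = 0.5       # pendulum length to its center of mass (assumed value, m)
R = 0.3       # circle radius (m)
Omega = 2.0   # angular velocity of the circle (assumed value, rad/s)

# Nominal attitude and thrust, Equations (8) and (10):
mu_bar = np.arctan(-(Omega**2) * R / g)
a_bar = np.sqrt(g**2 + ((Omega**2) * R) ** 2)

# Nominal pendulum offset p_bar from Equation (12); the root lies in
# (-L, 0), i.e. the pendulum leans toward the center of the circle.
def residual(p):
    return Omega**2 * (R + p) + g * p / np.sqrt(L**2 - p**2)

p_bar = brentq(residual, -L + 1e-9, 0.0)
```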
and use the transformed control input v := (a, µ̇, ν̇, α̇). The true rotational rate control
inputs (ωx , ωy , ωz ) can be recovered for known values of v and x through Equations (6)
and (3). The linear dynamics resulting from a Taylor expansion about the nominal
trajectory are then time-invariant:
where a tilde denotes small deviations from the nominal trajectory, and f (x̄, v̄) denotes
the system dynamics consisting of Equations (2)-(4) under the coordinate transforma-
tion (5)-(7). An infinite-horizon linear quadratic regulator [2] was designed to stabilize
the system around the nominal trajectory. We denote the resulting time-invariant state
feedback law by ṽ = Kx̃.
Figure 2. Errors in the rotating coordinate system (from [7]): Pendulum position error (p̃, q̃)
and quadrocopter position error (ũ, ṽ). At t = 2 s, the controller is switched from a constant
nominal position to a circular trajectory with R = 0.1 m. Large repeating errors can be seen.
3. Learning Algorithm
The approach presented in this paper is conceptually similar to [10], which presents an
adaptation strategy to correct for state errors at discrete points in time of parameterized
motion primitives. However, we consider periodic errors (instead of errors at specific
points in time), and do not require parameterization of the maneuver. We propose a
correction strategy, and show that, under the assumption of linear time-invariant system
dynamics, this strategy indeed fully corrects for such recurring errors.
v̂ = r_0 + Σ_{k=1}^{N} r_k cos(kωt) + Σ_{k=1}^{N} s_k sin(kωt) .        (17)

x̃ = a_0 + Σ_{k=1}^{N} a_k cos(kωt) + Σ_{k=1}^{N} b_k sin(kωt)        (18)
for k = 1 . . . N .
ỹ = Cx̃ . (22)
Our objective is to eliminate the effects of errors described by the Fourier series (18)
on the output. Due to the truncation of the Fourier series, higher-frequency components
are not considered, and the order of the series would have to be increased in order to
compensate for them.
The dimension of ỹ is not defined by the problem, and may be chosen by the user as
the set of error outputs that should be minimized. If the dimension of ỹ is larger than
the dimension of the adaptation control input v̂, a full correction of all errors cannot be
expected.
The above equations (19)-(21) can be written in matrix form, and multiplied by C in
order to describe the output tracking errors (22):
[B, 0; 0, B] [s_k; r_k] = [−kω I, −Ā; −Ā, kω I] [a_k; b_k]        (23)

J [s_k; r_k] = e_k ,  where  J := [C, 0; 0, C] [−kω I, −Ā; −Ā, kω I]⁻¹ [B, 0; 0, B]
and  e_k := [C a_k; C b_k] .        (24)
where γ is the step size parameter, which can be used to trade off convergence of errors
against noise rejection. This may be necessary because e_k^{i−1} usually contains
components caused by non-repetitive process noise and measurement errors, and is
therefore only an estimate of the true repeatable tracking error.
3.5 Convergence
Using the above feedback law, the tracking error Fourier coefficients ek evolve as follows:
e_k^i = e_k^{i−1} + J ([s_k; r_k]^i − [s_k; r_k]^{i−1})        (26)
      = e_k^{i−1} − γ J J⁺ e_k^{i−1} .        (27)
If J⁺ is the right inverse of J (the interpretation of this is that there are not more
tracking error outputs than adaptation control inputs), it follows that the tracking error
converges to zero for 0 < γ ≤ 1.
Note that if the tracking error dimension is lower than the input dimension, the cor-
rection term J⁺e_k is the least-squares solution to an underconstrained set of equations [1],
implying that the Euclidean norm of the Fourier coefficients is minimized. By Parseval’s
theorem [8], this is equivalent to minimizing the energy of the correction input signal v̂.
Now consider the case where J + is not the right inverse, implying that there are more
tracking error outputs than inputs. In this case, the error dynamics are
where I denotes the identity matrix and (·)ᵀ denotes the transpose of a matrix. Assuming
that J is full rank, we apply a singular value decomposition [1]:
J = U [Σ; 0] Vᵀ        (31)

e_k^i = (I − γ U [I, 0; 0, 0] Uᵀ) e_k^{i−1} ,  with  M := I − γ U [I, 0; 0, 0] Uᵀ .        (32)
Using a similarity transformation [1], we show that the error does not diverge by
computing the eigenvalues of M :
eig(M) = eig(Uᵀ M U)        (33)
       = eig(I − γ [I, 0; 0, 0])        (34)
       = {1 − γ, . . . , 1 − γ, 1, . . . , 1} .        (35)
This implies that, under the appropriate coordinate transformation, the tracking error
coefficients either reduce to zero over time, or remain constant. Note that because this
case implies fewer control inputs than tracking errors, it is to be expected that not all
components of the tracking error Fourier series can be driven to zero.
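The eigenvalue argument above is easy to check numerically. The sketch below builds a random J with more error outputs than inputs and verifies that the iteration matrix I − γ J J⁺ has eigenvalues in {1 − γ, 1} up to numerical precision; the dimensions and γ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.3
J = rng.standard_normal((6, 4))   # more tracking error outputs than inputs
J_pinv = np.linalg.pinv(J)        # here a left inverse: J_pinv @ J == I

# J @ J_pinv is the orthogonal projection onto the column space of J,
# so the iteration matrix M is symmetric with real eigenvalues.
M = np.eye(6) - gamma * J @ J_pinv
eig = np.sort(np.linalg.eigvals(M).real)

# Expected: four eigenvalues at 1 - gamma (the error directions the inputs
# can reach) and two at 1 (error components that cannot be driven to zero).
```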
4. Experimental results
This chapter presents the application of the learning algorithm to the flying inverted
pendulum experiment introduced in Chapter 2. We start by presenting the experimental
setup, then discuss the implementation of the learning algorithm, and finally show results
obtained.
1. Measure the tracking error ỹ for multiple circles. While measuring the error over
one circle would suffice, averaging it over several circles improves the rejection of
non-repetitive disturbances. In our experiments, the tracking error was measured
and averaged over three circles.
2. Compute the updated input correction Fourier series v̂ according to Equation (25).
3. Wait for several circles to allow the system to converge under the new input cor-
rection. Our experiments showed that this step is important in order to correctly
measure the systematic tracking error in the next iteration. For the experiments
presented in this paper, we chose to wait for two circles before starting the error
measurement for the next iteration.
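Put together, the measure-update-wait cycle above amounts to a loop of the following shape. The sketch is entirely schematic: the closed-loop plant is replaced by a synthetic linear error map with noise, and all names and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.3                              # update rate, as in the experiments
J = rng.standard_normal((3, 3))          # synthetic error/input map
d = np.array([0.4, -0.2, 0.1])           # repeatable disturbance
c = np.zeros(3)                          # Fourier correction coefficients

def measure_error(c, circles=3):
    """Average the (synthetic) tracking error over several circles to
    suppress non-repeatable noise, mirroring step 1 of the procedure."""
    samples = [J @ c + d + 0.01 * rng.standard_normal(3) for _ in range(circles)]
    return np.mean(samples, axis=0)

for i in range(10):
    e = measure_error(c)                     # step 1: measure, averaged
    c = c - gamma * np.linalg.solve(J, e)    # step 2: update (perfect model here)
    # step 3 (waiting for transients to settle) has no analogue in this
    # static sketch, since the synthetic plant has no memory.

final_error = np.linalg.norm(measure_error(c))
```

In the error coordinates, each pass contracts the repeatable component by the factor (1 − γ), so after ten iterations only noise-level residuals remain.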
We chose the tracking error output to be the vehicle position error in the rotating coordi-
nate system: ỹ = (ũ, ṽ, w̃). This implies that the objective is for the vehicle to correctly
track the horizontal circle in size, phase, and height. The nominal trajectory was set to a
circle radius R of 0.3 m. Errors in the pendulum position p̃, q̃ are not penalized in these
experiments.
Figure 3. Photograph of experiments: The vehicle flying a circle while balancing the inverted
pendulum.
The update rate γ was chosen to be 0.3, a value that showed a good trade-off between
noise rejection and speed of convergence in our experiments. The order of the error and
input Fourier series, N , was varied during experiments. It was found that orders higher
than N = 1 had little impact on the tracking performance during our experiments.
Figure 4. Evolution of the tracking error output Fourier coefficients during learning. The
lines shown represent the Euclidean norm of the Fourier coefficients: ||Ca0 || (solid blue), ||Ca1 ||
(dashed green) and ||Cb1 || (dotted red). Higher order terms are significantly smaller, and have
been omitted in this figure.
4.3 Results
The circular motion of the pendulum was started with all correction coefficients set to
zero, i.e. v̂ = 0. Figure 4 shows the Euclidean norm of the error Fourier series coefficients
over ten iterations. It can be seen that initial tracking performance is relatively poor with
peak tracking errors of 64 cm. The errors are then quickly reduced over the first three
iterations, after which non-repeatable disturbances cause them to vary from iteration
to iteration while small improvements are made. After ten iterations of the learning
algorithm, the peak tracking error is reduced to 11 cm, and subsequent iterations do not
improve tracking performance significantly.
Figure 5 shows the flight path of the vehicle in the horizontal plane before learning
and after the tenth learning iteration. Note that the initial flight path shows large errors,
with the flown circle being much too large, shifted from the desired centre point, and
warped. After ten learning iterations, the tracking performance has improved considerably, although remaining, largely non-repeatable, disturbances still prevent the vehicle from following the trajectory perfectly.
5. Conclusion and Outlook

This paper evaluated an iterative adaptation scheme that improves tracking performance
when periodic disturbances cause poor tracking under feedback control. We have derived
convergence properties for the presented method, and have shown that our approach
greatly improves performance in an experiment where a quadrocopter balances an in-
verted pendulum while flying circles.
The method was presented with a focus on the specific problem of trajectories for
quadrocopters, and a number of assumptions were made in order to simplify derivations.
As is, the method is limited to dynamic systems where the linearization around the
nominal trajectory is – under an appropriate coordinate transformation – time-invariant.
We intend to investigate transferability to more general system descriptions, and hope to
perform more experiments to verify its applicability to other problems.
One open question is the choice of the order of the Fourier series used to represent
tracking errors and correction inputs. While we chose the order manually for our exper-
iments, more systematic approaches could be considered.
Furthermore, it would be worthwhile to investigate whether learnt correction values
could be transferred to similar maneuvers, as proposed for Iterative Learning Control
in [17]. This would be particularly applicable to our experiment because attempts to fly
large circles with the inverted pendulum currently lead to crashes before learning permits
correct tracking of the trajectory.
Figure 5. Horizontal flight trajectory of the quadrocopter over two rounds of the circular
trajectory. The dashed blue line shows the flight path before learning, the solid red line shows
flight after ten iterations of learning. The nominal trajectory is shown in dotted black. Errors
in height are not shown; they were reduced from a maximum error of 0.42 m to a maximum
of 0.01 m.
References
[10] Sergei Lupashin and Raffaello D’Andrea. Adaptive Fast Open-Loop Maneuvers for
Quadrocopters. Autonomous Robots, 33(1-2):89–102, 2012.
[11] Sergei Lupashin, Angela P. Schoellig, Markus Hehn, and Raffaello D’Andrea. The
Flying Machine Arena as of 2010. In International Conference on Robotics and
Automation, 2011.
[12] D. Mellinger, N. Michael, and V. Kumar. Trajectory Generation and Control
for Precise Aggressive Maneuvers with Quadrotors. The International Journal of
Robotics Research, 31(5):664–674, 2012.
[13] Daniel Mellinger, Nathan Michael, and Vijay Kumar. Trajectory Generation
and Control for Precise Aggressive Maneuvers with Quadrotors. In International
Symposium on Experimental Robotics, 2010.
[14] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The GRASP
Multiple Micro UAV Testbed. IEEE Robotics and Automation Magazine, 17(3):56–
65, 2010.
[15] Mark W. Müller, Sergei Lupashin, and Raffaello D’Andrea. Quadrocopter Ball Juggling.
In International Conference on Intelligent Robots and Systems, 2011.
[16] Ivana Palunko and Rafael Fierro. Adaptive Control of a Quadrotor with Dynamic
Changes in the Center of Gravity. In IFAC World Congress, 2011.
[17] Oliver Purwin and Raffaello D’Andrea. Performing and Extending Aggressive
Maneuvers Using Iterative Learning Control. Robotics and Autonomous Systems,
59(1):1–11, 2011.
[18] Robin Ritz, Mark W. Müller, Markus Hehn, and Raffaello D’Andrea. Cooperative
Quadrocopter Ball Throwing and Catching. In International Conference on Intelli-
gent Robots and Systems, 2012.
[19] Angela P Schoellig, Fabian L Mueller, and Raffaello D’Andrea. Optimization-
Based Iterative Learning for Precise Quadrocopter Trajectory Tracking. Autonomous
Robots, 33(1-2):103–127, 2012.
[20] Steven L. Waslander, Gabriel M. Hoffmann, Jung Soon Jang, and Claire J.
Tomlin. Multi-Agent Quadrotor Testbed Control Design: Integral Sliding Mode
vs. Reinforcement Learning. In International Conference on Intelligent Robots and
Systems, 2005.
185
Paper VI
Abstract
We extend the classic control problem of the inverted pendulum by placing the
pendulum on top of a quadrotor aerial vehicle. Both static and dynamic equilibria
of the system are investigated to find nominal states of the system at standstill and
on circular trajectories. Control laws are designed around these nominal trajec-
tories. A yaw-independent description of quadrotor dynamics is introduced, using
a ‘Virtual Body Frame’. This allows for the time-invariant description of curved
trajectories. The balancing performance of the controller is demonstrated in the
ETH Zurich Flying Machine Arena testbed. Development potential for the future is
highlighted, with a focus on applying learning methodology to increase performance
by eliminating systematic errors that were seen in experiments.
Paper VI. A Flying Inverted Pendulum
1. Introduction
The inverted pendulum is a classic control problem, offering one of the most intuitive,
easily describable and realizable nonlinear unstable systems. It has been investigated for
several decades (see, for example, [11], and references therein). It is frequently used as a
demonstrator to showcase theoretical advances, e.g. in reinforcement learning [6], neural
networks [14], and fuzzy control [13].
In this paper, we develop a control strategy that enables an inverted pendulum to
balance on top of a quadrotor. Besides being a highly visual demonstration of the dynamic
capabilities of modern quadrotors, the solution to such a complex control problem offers
insight into quadrotor control strategies, and could be adapted to other tasks.
Quadrotors offer exceptional agility. Thanks to the off-center mounting of the pro-
pellers, extraordinarily fast rotational dynamics can be achieved. This is combined with
typically high thrust-to-weight ratios, resulting in large achievable translational acceler-
ations when not carrying a payload.
While most early work on quadrotors focused on near-hover operation (e.g. [5], and
references therein), a growing community is working on using the full dynamical potential
of these vehicles. Flips have been executed by several groups, some focusing on speed and
autonomous learning [7] and some on safety guarantees [3]. Other complex maneuvers,
including flight through windows and perching have been demonstrated [8].
In Section 2, we introduce the dynamic models used in the controller design. Section 3
presents static and dynamic nominal trajectories for the quadrotor to follow. The dynam-
ics are then linearized around these trajectories in Section 4, and linear state feedback
controllers are designed in Section 5. The experimental setup and results are shown in
Section 6 and conclusions are drawn in Section 7, where an outlook is also presented.
2. Dynamics
We derive the equations of motion of the quadrotor and the inverted pendulum for the
trajectory-independent general case.
Given that the mass of the pendulum is small compared to the mass of the quadrotor,
it is reasonable to assume that the pendulum’s reactive forces on the quadrotor are
negligible. The dynamics of the quadrotor, then, do not depend on the pendulum, whereas
the dynamics of the pendulum are influenced by the motion of the quadrotor. This
assumption is justified by the experimental setup, with the weight of the pendulum being
less than 5% of that of the quadrotor vehicle.
2.1 Quadrotor
The quadrotor is described by six degrees of freedom: The translational position (x, y,
z) is measured in the inertial coordinate system O as shown in Figure 1. The vehicle
Figure 1. The inertial coordinate system O and the vehicle coordinate system V.
attitude V is defined by three Euler angles. From the inertial coordinate system, we
first rotate around the z-axis by the yaw angle α. The coordinate system is then rotated
around the new y-axis by the pitch angle β, and finally rotated about the new x-axis by
the roll angle γ:
\[
{}^{O}_{V}R(\alpha, \beta, \gamma) = R_z(\alpha)\, R_y(\beta)\, R_x(\gamma), \tag{1}
\]
where
\[
R_x(\gamma) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{pmatrix}, \tag{2}
\]
\[
R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}, \tag{3}
\]
\[
R_z(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}. \tag{4}
\]
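The composed rotation of Equations (1)–(4) is easily checked numerically. A minimal sketch (function and variable names are ours):

```python
import math

def Rx(g):  # roll
    return [[1.0, 0.0, 0.0],
            [0.0, math.cos(g), -math.sin(g)],
            [0.0, math.sin(g),  math.cos(g)]]

def Ry(b):  # pitch
    return [[ math.cos(b), 0.0, math.sin(b)],
            [0.0, 1.0, 0.0],
            [-math.sin(b), 0.0, math.cos(b)]]

def Rz(a):  # yaw
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a),  math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def R_OV(alpha, beta, gamma):
    """Attitude of V in O per Equation (1): Rz(alpha) Ry(beta) Rx(gamma)."""
    return matmul(Rz(alpha), matmul(Ry(beta), Rx(gamma)))
```

A quick sanity check is that the composed matrix is orthonormal (R Rᵀ = I) for any angles, and reduces to the identity at zero yaw, pitch, and roll.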
The translational acceleration of the vehicle is dictated by the attitude of the vehicle
and the total thrust produced by the four propellers. With a representing the mass-
normalized collective thrust, the translational acceleration in the inertial frame is
\[
\begin{pmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{pmatrix} = {}^{O}_{V}R(\alpha, \beta, \gamma) \begin{pmatrix} 0 \\ 0 \\ a \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ -g \end{pmatrix}. \tag{5}
\]
The vehicle attitude is not directly controllable, but it is subject to dynamics. The
control inputs are the desired rotational rates about the vehicle body axes, (ωx , ωy , ωz ),
and the mass-normalized collective thrust, a, as shown in Figure 2. High-bandwidth
controllers on the vehicle track the desired rates using feedback from gyroscopes. The
quadrotor has very low rotational inertia, and can produce high torques due to the out-
ward mounting of the propellers, resulting in very high achievable rotational accelerations
on the order of 200 rad s−2 . The vehicle has a fast response time to changes in the desired
rotational rate (experimental results have shown time constants on the order of 20 ms).
We will therefore assume that we can directly control the vehicle body rates and ignore
rotational acceleration dynamics. As with the vehicle body rates, we assume that the
thrust can be changed instantaneously. Experimental results have shown that the true
thrust dynamics are about as fast as the rotational dynamics, with propeller spin-up
being noticeably faster than spin-down.
The rates of the Euler angles are converted to the vehicle body coordinate system V through their respective transformations:
\[
\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix} = \begin{pmatrix} \dot{\gamma} \\ 0 \\ 0 \end{pmatrix} + R_x^{-1}(\gamma) \begin{pmatrix} 0 \\ \dot{\beta} \\ 0 \end{pmatrix} + R_x^{-1}(\gamma)\, R_y^{-1}(\beta) \begin{pmatrix} 0 \\ 0 \\ \dot{\alpha} \end{pmatrix}. \tag{6}
\]
The above can be written more compactly by combining the Euler rates into a single vector, calculating the relevant rows of the rotation matrices, and solving for the Euler angle rates:
\[
\begin{pmatrix} \dot{\gamma} \\ \dot{\beta} \\ \dot{\alpha} \end{pmatrix} = \begin{pmatrix} 1 & 0 & -\sin\beta \\ 0 & \cos\gamma & \cos\beta \sin\gamma \\ 0 & -\sin\gamma & \cos\beta \cos\gamma \end{pmatrix}^{-1} \begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix}. \tag{7}
\]
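Equations (6) and (7) can be verified with a small round-trip computation. The closed-form expressions below follow from expanding (6) for this Euler angle convention; this is our own sketch of the conversion, not code from the paper:

```python
import math

def body_rates(gamma, beta, gdot, bdot, adot):
    """Equation (6) expanded: body rates (wx, wy, wz) from Euler angle rates."""
    wx = gdot - adot * math.sin(beta)
    wy = bdot * math.cos(gamma) + adot * math.cos(beta) * math.sin(gamma)
    wz = -bdot * math.sin(gamma) + adot * math.cos(beta) * math.cos(gamma)
    return wx, wy, wz

def euler_rates(gamma, beta, wx, wy, wz):
    """Equation (7): the inverse map (singular at beta = +/-90 degrees)."""
    adot = (math.sin(gamma) * wy + math.cos(gamma) * wz) / math.cos(beta)
    bdot = math.cos(gamma) * wy - math.sin(gamma) * wz
    gdot = wx + adot * math.sin(beta)
    return gdot, bdot, adot
```

Converting Euler rates to body rates and back recovers the original rates, confirming the two maps are mutual inverses away from the pitch singularity.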
2.2 Pendulum

The position of the pendulum center of mass relative to the vehicle is described by the horizontal coordinates r and s; the corresponding vertical component is \( \zeta = \sqrt{L^2 - r^2 - s^2} \), where L denotes the length from the base of the pendulum to its center of mass. We model the pendulum as an inertialess point mass that is rigidly attached to the mass center of the quadrotor, such that rotations of the vehicle do not cause a motion of the pendulum base. In the experimental setup, the point that the pendulum is attached to is mounted off-center by about 10% of the length of the pendulum. While this assumption causes modeling errors, it simplifies the dynamics to such a great extent that the problem
becomes much more tractable.

Figure 2. The control inputs of the quadrotor: The rotational rates ωx, ωy, and ωz are tracked by an on-board controller, using gyroscope feedback.

The Lagrangian [9] of the pendulum can be written as
\[
\mathcal{L} = \tfrac{1}{2}\left[ (\dot{x} + \dot{r})^2 + (\dot{y} + \dot{s})^2 + \left( \dot{z} - \frac{r\dot{r} + s\dot{s}}{\zeta} \right)^{2} \right] - g\,(z + \zeta), \tag{9}
\]
where we assume unit pendulum mass without loss of generality. The first term
represents the kinetic energy of the pendulum, and the second the potential energy. The
full, nonlinear dynamic equations can be derived from L using conventional Lagrangian
mechanics:
\[
\frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{r}} - \frac{\partial \mathcal{L}}{\partial r} = 0, \tag{10}
\]
\[
\frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot{s}} - \frac{\partial \mathcal{L}}{\partial s} = 0, \tag{11}
\]
\[
\begin{pmatrix} \ddot{r} \\ \ddot{s} \end{pmatrix} = f(r, s, \dot{r}, \dot{s}, \ddot{x}, \ddot{y}, \ddot{z}), \tag{12}
\]
\[
\begin{aligned}
\ddot{r} = \frac{1}{(L^2 - s^2)\,\zeta^2} \Big[ &-r^4 \ddot{x} - (L^2 - s^2)^2 \ddot{x} - 2r^2 \big( s\dot{r}\dot{s} + (-L^2 + s^2)\ddot{x} \big) + r^3 \big( \dot{s}^2 + s\ddot{s} - \zeta (g + \ddot{z}) \big) \\
&+ r \big( -L^2 s\ddot{s} + s^3 \ddot{s} + s^2 (\dot{r}^2 - \zeta (g + \ddot{z})) + L^2 (-\dot{r}^2 - \dot{s}^2 + \zeta (g + \ddot{z})) \big) \Big],
\end{aligned} \tag{13}
\]
\[
\begin{aligned}
\ddot{s} = \frac{1}{(L^2 - r^2)\,\zeta^2} \Big[ &-s^4 \ddot{y} - (L^2 - r^2)^2 \ddot{y} - 2s^2 \big( r\dot{r}\dot{s} + (-L^2 + r^2)\ddot{y} \big) + s^3 \big( \dot{r}^2 + r\ddot{r} - \zeta (g + \ddot{z}) \big) \\
&+ s \big( -L^2 r\ddot{r} + r^3 \ddot{r} + r^2 (\dot{s}^2 - \zeta (g + \ddot{z})) + L^2 (-\dot{r}^2 - \dot{s}^2 + \zeta (g + \ddot{z})) \big) \Big].
\end{aligned} \tag{14}
\]
The rotational rate inputs act on the attitude of the vehicle in a nonlinear fashion. This attitude, combined with the thrust a, controls the
translational acceleration of the vehicle. While the acceleration drives the translational
motion of the vehicle linearly, it also drives the motion of the pendulum through nonlinear
equations. The combined system consists of thirteen states (three rotational and six
translational states of the quadrotor, and four states of the pendulum), and four control
inputs (three body rates, and the thrust).
3. Nominal trajectories
In this section, we find static and dynamic equilibria of the system that satisfy Equations
(5), (7), and (12). These are used as nominal trajectories to be followed by the quadrotor.
Corresponding nominal control inputs are also described. We denote nominal values by
a zero index (x0 , r0 , etc.).
At a constant nominal position, the equilibrium of the pendulum requires
\[
r = r_0 = 0, \tag{17}
\]
\[
s = s_0 = 0, \tag{18}
\]
meaning that, as expected, the inverted pendulum is exactly over the quadrotor.
For circular trajectories, positions are described in a coordinate system C with coordinates (u, v, w), rotating at the rate Ω of the circular motion:
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} =: R_z(\Omega t) \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} \cos\Omega t & -\sin\Omega t & 0 \\ \sin\Omega t & \cos\Omega t & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix}. \tag{19}
\]
To describe the vehicle attitude, a second set of Euler angles is introduced, describing
the ‘virtual body frame’ W and named η, µ, and ν:
\[
{}^{O}_{W}R(\eta, \mu, \nu) = R_z(\eta)\, R_y(\mu)\, R_x(\nu), \tag{20}
\]
constrained such that both attitudes produce the same thrust direction:
\[
{}^{O}_{V}R(\alpha, \beta, \gamma) \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = {}^{O}_{W}R(\eta, \mu, \nu) \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \tag{21}
\]
The above equation defines values for three elements of the rotation matrices. As every
column of a rotation matrix has unit norm [12], however, this equation only defines two
of the angles (η, µ, ν).
Comparing the constraint (21) with the translational equation of motion of the vehi-
cle (5), it is straightforward to see that the virtual body frame W represents an attitude
that is constrained such that it effects the same translational motion of the quadrotor as
the vehicle attitude V. The remaining degree of freedom represents the fact that rota-
tions about the axis along which ωz acts have no effect on the quadrotor's translational motion.
Applying (19), its derivatives, and (21), and setting the free parameter η = Ωt, the quadrotor translational dynamics can be expressed in the rotating coordinate system; the vertical component is \( \ddot{w} = a \cos\mu \cos\nu - g \). For the nominal circular trajectory, this yields
\[
\mu_0 = \arctan\left( -\frac{\Omega^2 R}{g} \right), \tag{23}
\]
\[
\nu_0 = 0, \tag{24}
\]
\[
a_0 = \sqrt{g^2 + (\Omega^2 R)^2}. \tag{25}
\]
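The nominal values (23)–(25) are directly computable; a small sketch (the function name and the parameter values in the test are ours, chosen for illustration):

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def nominal_attitude_thrust(Omega, R):
    """Nominal pitch mu0, roll nu0, and mass-normalized thrust a0 for a
    circle of radius R flown at rotational rate Omega (Equations (23)-(25))."""
    mu0 = math.atan(-Omega**2 * R / G)   # negative: tilt towards the circle center
    nu0 = 0.0
    a0 = math.sqrt(G**2 + (Omega**2 * R)**2)
    return mu0, nu0, a0
```

At Ω = 0 the nominal attitude is level and the thrust equals gravity; for any nonzero Ω the vehicle tilts inward and the thrust exceeds g.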
Knowing the nominal values for (η0 , µ0 , ν0 ), we solve for (α0 , β0 , γ0 ) using Equation (21).
Analogous to the constant position case, we set α0 = 0, simplifying (21) to
\[
\begin{pmatrix} \sin\beta_0 \cos\gamma_0 \\ -\sin\gamma_0 \\ \cos\beta_0 \cos\gamma_0 \end{pmatrix} = \begin{pmatrix} \cos\Omega t \sin\mu_0 \cos\nu_0 + \sin\Omega t \sin\nu_0 \\ \sin\Omega t \sin\mu_0 \cos\nu_0 - \cos\Omega t \sin\nu_0 \\ \cos\mu_0 \cos\nu_0 \end{pmatrix}, \tag{26}
\]
which can be solved for β0 and γ0 . This completes the description of the nominal states
required for the translational motion (5): In the coordinate systems C and W, the
nominal position and attitude are constant. Using Equations (19) and (26), the time-
varying nominal states in O and V may be found.
To calculate the rotational rate control inputs in Equation (7), we take the first derivative of Equation (26). Combining the resulting expressions with Equations (26) and (7), the nominal
states can be solved for the nominal control inputs (ωx 0 , ωy 0 , ωz 0 ). The full derivation is
made available online at www.idsc.ethz.ch/people/staff/hehn-m .
Analogously to the vehicle position, the pendulum relative coordinates r and s are rotated by Ωt:
\[
\begin{pmatrix} r \\ s \end{pmatrix} =: \begin{pmatrix} \cos\Omega t & -\sin\Omega t \\ \sin\Omega t & \cos\Omega t \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix}. \tag{29}
\]
Applying this rotation to the Lagrangian equations of motion of the pendulum (10), (11), and setting the base motion (ẍ, ÿ, z̈) to the circular trajectory, the pendulum dynamics can be expressed in the rotating coordinates. We seek a solution where \( \dot{p}_0 = \dot{q}_0 = \ddot{p}_0 = \ddot{q}_0 = 0 \), leading to the following constraints for equilibrium points:
\[
\Omega^2 q_0 + \frac{g q_0}{\zeta_0} = 0, \tag{32}
\]
\[
\Omega^2 (R + p_0) + \frac{g p_0}{\zeta_0} = 0. \tag{33}
\]
The only solution to the first equation is q0 = 0. The second equation defines a nonlin-
ear relationship between Ω, R, and p0 , with a solution p0 always existing in the range
−R ≤ p0 ≤ 0. This result is intuitive: The center of mass of the pendulum must lie to-
wards the center of the circle, such that the centripetal force acting on it is compensated
for by gravity. If the center of mass were to lie further than R inwards, the centripetal
force would change sign, making the pendulum fall.
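Because Equation (33) changes sign between p0 = −R and p0 = 0, the nominal pendulum offset can be found by bisection. A sketch under the assumption R < L, with ζ0 = sqrt(L² − p0²) at q0 = 0; the numerical values in the usage below are illustrative:

```python
import math

def pendulum_offset(Omega, R, L, g=9.81, tol=1e-12):
    """Solve Equation (33), Omega^2 (R + p0) + g p0 / zeta0 = 0, with
    zeta0 = sqrt(L^2 - p0^2), for p0 by bisection on [-R, 0].
    Assumes R < L so that zeta0 stays real on the interval."""
    def f(p):
        return Omega**2 * (R + p) + g * p / math.sqrt(L**2 - p**2)
    lo, hi = -R, 0.0
    # f(-R) < 0 and f(0) = Omega^2 R > 0, so a root lies in between.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with L = 0.565 m (the center-of-mass length from the experimental setup), R = 0.3 m, and Ω = 2 rad/s, the solver returns a p0 strictly between −R and 0, as the analysis predicts.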
4. Dynamics about nominal trajectories

Linear approximate dynamics are derived through a first-order Taylor expansion of the
equations of motion (5), (7), and (12) about the nominal trajectories found in Section
3. We denote small deviations by a tilde (x̃, r̃, etc.). We present only the resulting
linearized dynamics. The corresponding derivations are not shown in this paper due to
space constraints, but are made available online.
For the constant nominal position, the degrees of freedom of the quadrotor-pendulum system decouple entirely along the three axes of the O coordinate system, resulting in the following equations:
\[
\ddot{\tilde{r}} = \frac{g}{L}\tilde{r} - \tilde{\beta} g, \tag{34}
\]
\[
\ddot{\tilde{s}} = \frac{g}{L}\tilde{s} + \tilde{\gamma} g, \tag{35}
\]
\[
\ddot{\tilde{x}} = \tilde{\beta} g, \tag{36}
\]
\[
\ddot{\tilde{y}} = -\tilde{\gamma} g, \tag{37}
\]
\[
\dot{\tilde{\beta}} = \tilde{\omega}_y, \tag{38}
\]
\[
\dot{\tilde{\gamma}} = \tilde{\omega}_x, \tag{39}
\]
\[
\ddot{\tilde{z}} = \tilde{a}. \tag{40}
\]
The two horizontal degrees of freedom represent fifth-order systems, with the vehicle
forming a triple integrator from the body rate to its position. The vertical motion is
represented by a double integrator from thrust to position.
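The fifth-order structure of each horizontal axis can be checked by confirming that the x-axis subsystem of Equations (34), (36), (38) is controllable from the single input ω̃y. A minimal sketch (the state ordering and the numeric value of L are our choices):

```python
G, L = 9.81, 0.565  # gravity; pendulum center-of-mass length from the experiments

# x-axis subsystem of (34), (36), (38): state (r~, r~', x~, x~', beta~), input w~_y.
A = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [G / L, 0.0, 0.0, 0.0, -G],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, G],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
B = [0.0, 0.0, 0.0, 0.0, 1.0]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def controllability_rank(A, B):
    """Rank of the controllability matrix [B, AB, ..., A^(n-1)B],
    computed by Gaussian elimination with partial pivoting."""
    n = len(A)
    cols = [B[:]]
    for _ in range(n - 1):
        cols.append(matvec(A, cols[-1]))
    M = [[cols[j][i] for j in range(n)] for i in range(n)]
    rank = 0
    for c in range(n):
        piv = max(range(rank, n), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < 1e-9:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        for i in range(rank + 1, n):
            f = M[i][c] / M[rank][c]
            for j in range(c, n):
                M[i][j] -= f * M[rank][j]
        rank += 1
    return rank
```

The controllability matrix has full rank 5, so the body rate input can steer all five states of the chain.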
For circular trajectories, the linearized dynamics in the rotating coordinates are
\[
\ddot{\tilde{p}} = \frac{\zeta_0^2}{L^2} \left( \Omega^2 + \frac{g L^2}{\zeta_0^3} \right) \tilde{p} + 2\Omega \dot{\tilde{q}} + \tilde{\mu} \left( -\frac{p_0}{\zeta_0} a_0 \sin\mu_0 - a_0 \cos\mu_0 \right) + \tilde{a} \left( \frac{p_0}{\zeta_0} \cos\mu_0 - \sin\mu_0 \right), \tag{41}
\]
\[
\ddot{\tilde{q}} = \left( \Omega^2 + \frac{g}{\zeta_0} \right) \tilde{q} - 2\Omega \dot{\tilde{p}} + \tilde{\nu} a_0, \tag{42}
\]
\[
\ddot{\tilde{u}} = \tilde{a} \sin\mu_0 + \tilde{\mu} a_0 \cos\mu_0 + 2\Omega \dot{\tilde{v}} + \Omega^2 \tilde{u}, \tag{43}
\]
\[
\ddot{\tilde{v}} = -\tilde{\nu} a_0 - 2\Omega \dot{\tilde{u}} + \Omega^2 \tilde{v}, \tag{44}
\]
\[
\ddot{\tilde{w}} = \tilde{a} \cos\mu_0 - \tilde{\mu} a_0 \sin\mu_0. \tag{45}
\]
We observe that the linearized dynamics are indeed time-invariant in the coordinate
systems C and W.
The above equations reduce to Equations (34) – (40) if setting R = 0 and Ω = 0.
If R = 0, Ω is a free parameter (see Equation (33)), allowing the description of the
dynamics (34) – (40) in a rotating coordinate system.
5. Controller Design
We design linear full state feedback controllers to stabilize the system about its nominal
trajectories. We use an infinite-horizon linear-quadratic regulator (LQR) design [1] and
determine suitable weighting matrices.
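As an illustration of this step, the LQR gain for the vertical subsystem (40), a double integrator from thrust to height, can be computed by integrating the Riccati differential equation to steady state. This is a sketch in pure Python; the weights Q = I and R = 1 are illustrative, not the values used in the paper:

```python
def lqr_vertical(q_pos=1.0, q_vel=1.0, r=1.0, dt=1e-3, steps=100000):
    """LQR gain for z~'' = a~ with state (z~, z~') and cost weights
    Q = diag(q_pos, q_vel), R = r, found by integrating the Riccati ODE
    dP/dt = A'P + P A - P B r^-1 B' P + Q forward until convergence."""
    p00 = p01 = p11 = 0.0  # symmetric P = [[p00, p01], [p01, p11]]
    for _ in range(steps):
        d00 = q_pos - p01 * p01 / r
        d01 = p00 - p01 * p11 / r
        d11 = q_vel + 2.0 * p01 - p11 * p11 / r
        p00 += dt * d00
        p01 += dt * d01
        p11 += dt * d11
    # feedback gain K = r^-1 B'P: control a~ = -k1 z~ - k2 z~'
    return p01 / r, p11 / r
```

For Q = I and R = 1 the iteration converges to the known analytic solution of the algebraic Riccati equation for the double integrator, K = (1, √3).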
6. Experimental results
The algorithms presented herein were implemented in the Flying Machine Arena, an aerial
vehicle development platform at ETH Zurich [7]. We present results demonstrating the
performance of the controllers designed in the previous section.
Figure 3. The quadrotor balancing the pendulum at standstill. The pendulum's center of mass is at about half its length.
Figure 4. State errors during balancing: Pendulum position error (r̃, s̃) and quadrotor position
error (x̃, ỹ). The pendulum is manually placed on the vehicle at approximately t = 4.25 s, and
the balancing controller is switched on at approximately t = 4.75 s.
A motion tracking system measures the vehicle pose via retro-reflective markers mounted to the vehicle. The total closed-loop latency is approx-
imately 30 ms.
The inverted pendulum consists of a carbon fiber tube, measuring 1.15 m in length.
The top end of the pendulum carries a retro-reflective marker, allowing the position of
this point to be determined through the motion tracking system in the same manner as
vehicles are located. The center of mass of the pendulum is 0.565 m away from its base.
Figure 3 shows the quadrotor and the pendulum.
Conventional desktop computers are used to run all control algorithms, with one com-
puter acting as an interface to the testbed. Data is exchanged over Ethernet connections.
A Luenberger observer is used to filter the sensory data and provide full state information
to the controller. The observer also compensates for systematic latencies occurring in the
control loop, using the known control inputs to project the system state into the future.
Steady-state errors remain, as the controller does not act on integrated errors. The main suspected reason for these steady-state errors is miscalibrations in the system: Errors in the vehicle attitude measurement lead to the linear controller trading off the attitude error and the position error, and biased measurements of the on-board gyroscopes result in a biased response to control inputs.
Though originally designed for a constant position, this controller has been success-
fully tested for set point tracking at moderate speeds. A video of this is available online
at www.idsc.ethz.ch/people/staff/hehn-m .
Figure 5. Errors in the rotating coordinate system: Pendulum position error (p̃, q̃) and quad-
rotor position error (ũ, ṽ). The pendulum is balanced by the quadrotor during the entire duration
shown on this plot. At t = 2 s, the controller is switched from a constant nominal position to a
circular trajectory with R = 0.1 m.
A video showing the experiments of both cases presented herein is available online at
www.idsc.ethz.ch/people/staff/hehn-m .
Figure 6. The trajectory of the quadrotor and pendulum in O, compared to the nominal
trajectory.
Figure 7. Simulation results: Errors in the rotating coordinate system: Pendulum position
error (p̃, q̃) and quadrotor position error (ũ, ṽ). At t = 2 s, the controller is switched from a
constant nominal position to a circular trajectory with R = 0.1 m.
Figure 8. Simulation results: The trajectory of the quadrotor and pendulum in O, compared
to the nominal trajectory.
This result highlights the influence of biased sensory information, leading to position
errors in the inertial coordinate system O. These are observed as oscillating errors in C.
The mean errors on the trajectory also show different characteristics in simulation
and reality. This is particularly noticeable in the pendulum position error p̃ and q̃. The
simulation contains a detailed dynamic model of the quadrotor that has been validated
in several experiments. It is therefore probable that the differences are due mainly to the
simulated pendulum dynamics. The non-modeled off-center mounting of the pendulum
could explain this discrepancy.
Circle radii R of up to 0.5 m have been successfully tested in simulation and real-
ity. The vehicle and pendulum positioning errors increase with the circle size, but the
controllers are still capable of keeping the pendulum in balance.
7. Conclusion and Outlook

We have developed linear controllers for stabilizing a pendulum on a quadrotor, which can
be used for both static and dynamic equilibria of the pendulum. The virtual body frame
is a useful tool to describe motions in a convenient coordinate system (e.g. allowing the
use of symmetries), without enforcing this rotation for the vehicle. Using its properties
and a rotating coordinate system, the system description is time-invariant on circular
trajectories.
This allows the straightforward application of well-established state feedback design
principles. Controllers for standstill and circular motion have been validated experimen-
tally and are shown to stabilize the pendulum. This key milestone allows us to shift our
focus towards improving system performance.
Experimental results revealed systematic errors when applying the control laws. There
appear to be different sources of these errors:
• Miscalibrations of sensors cause biases in the experimental setup. These errors are
observed in the attitude information from the motion capture system, and in the
vehicle on-board control loops using gyroscope feedback.
• The simplifying assumption that the pendulum is mounted at the center of mass of
the vehicle is violated in the experimental setup. Rotations of the vehicle therefore
cause a motion of the pendulum base point.
• The equations of motion used to derive nominal trajectories and linear models
neglect many real-world effects, such as drag and underlying dynamics of the control
inputs.
• The control laws are designed assuming continuous-time control, while the vehicle
is controlled at only 50 Hz.
We have identified two approaches for extending the controller design presented in this
paper. The first, and most straightforward approach, is to include states that represent
the integrated errors, and to weigh them appropriately in the controller design. This
would permit compensation for some of the systematic errors. For instance, one would
expect this to drive the vehicle position errors in the standstill case to zero.
References
[1] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Volume 2. Athena
Scientific, 2007.
[2] Insik Chin, S. Joe Qin, Kwang S. Lee, and Moonki Cho. A Two-Stage Iterative
Learning Control Technique Combined with Real-Time Feedback for Independent
Disturbance Rejection. Automatica, 40(11):1913–1922, 2004.
[3] Jeremy H. Gillula, Haomiao Huang, Michael P. Vitus, and Claire J. Tomlin. Design of Guaranteed Safe Maneuvers Using Reachable Sets: Autonomous Quadrotor Aerobatics in Theory and Practice. In International Conference on Robotics and Automation, 2010.
[4] Daniel Gurdan, Jan Stumpf, Michael Achtelik, Klaus-Michael Doth, Gerd Hirzinger,
and Daniela Rus. Energy-Efficient Autonomous Four-Rotor Flying Robot Controlled
at 1 kHz. In International Conference on Robotics and Automation, 2007.
[5] Jonathan P. How, Brett Bethke, Adrian Frank, Daniel Dale, and John Vian.
Real-Time Indoor Autonomous Vehicle Test Environment. IEEE Control Systems
Magazine, 28(2):51–64, 2008.
[6] Michail G. Lagoudakis and Ronald Parr. Least-Squares Policy Iteration. Journal of
Machine Learning Research, 4(6):1107–1149, January 2003.
[7] Sergei Lupashin, Angela Schöllig, Michael Sherback, and Raffaello D’Andrea. A
Simple Learning Strategy for High-Speed Quadrocopter Multi-Flips. In International
Conference on Robotics and Automation, 2010.
[8] Nathan Michael, Daniel Mellinger, Quentin Lindsey, and Vijay Kumar. The GRASP
Multiple Micro UAV Testbed. IEEE Robotics and Automation Magazine, 17(3):56–
65, 2010.
[9] Francis C. Moon. Applied Dynamics: With Applications to Multibody and Mecha-
tronic Systems. Wiley, 1998.
Curriculum Vitae
Markus Hehn
born 14th March 1985
Peer-Reviewed Publications
Journals
• Michael Benz, Markus Hehn, Christopher H. Onder, and Lino Guzzella. Model-Based
Actuator Trajectories Optimization for a Diesel Engine Using a Direct Method. Journal
of Engineering for Gas Turbines and Power, Volume 133, Issue 3, 2011.
• Markus Hehn, Robin Ritz, and Raffaello D’Andrea. Performance Benchmarking of Quad-
rotor Systems Using Time-Optimal Control. Autonomous Robots, Volume 33, Issue 1-2,
pp 69-88, 2012.
• Sergei Lupashin, Markus Hehn, Mark W. Mueller, Angela P. Schoellig, Michael Sherback,
and Raffaello D’Andrea. A Platform for Aerial Robotics Research and Demonstration:
The Flying Machine Arena. Mechatronics, Volume 24, Issue 1, pp 41-54, 2014.
Conference Proceedings

• Markus Hehn and Raffaello D’Andrea. A Flying Inverted Pendulum. Proceedings of the
2011 IEEE International Conference on Robotics and Automation (ICRA), 2011.
• Sergei Lupashin, Angela P. Schoellig, Markus Hehn, and Raffaello D’Andrea. The Flying
Machine Arena as of 2010. Proceedings of the 2011 IEEE International Conference on
Robotics and Automation (ICRA) - Video Submission, 2011.
• Angela Schoellig, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Feasibility
of Motion Primitives for Choreographed Quadrocopter Flight. Proceedings of the 2011
American Control Conference (ACC), 2011.
• Markus Hehn and Raffaello D’Andrea. Quadrocopter Trajectory Generation and Control.
Proceedings of the 2011 IFAC World Congress, 2011.
• Stefania Tonetti, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Distributed
Control of Antenna Array with Formation of UAVs. Proceedings of the 2011 IFAC World
Congress, 2011.
• Robin Ritz, Markus Hehn, Sergei Lupashin, and Raffaello D’Andrea. Quadrocopter Per-
formance Benchmarking Using Optimal Control. Proceedings of the 2011 IEEE/RSJ In-
ternational Conference on Intelligent Robots and Systems (IROS), 2011.
• Markus Hehn and Raffaello D’Andrea. Real-Time Trajectory Generation for Interception
Maneuvers with Quadrocopters. Proceedings of the 2012 IEEE/RSJ International Confer-
ence on Intelligent Robots and Systems (IROS), 2012.
• Robin Ritz, Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. Cooperative
Quadrocopter Ball Throwing and Catching. Proceedings of the 2012 IEEE/RSJ Inter-
national Conference on Intelligent Robots and Systems (IROS), 2012.
• Markus Hehn and Raffaello D’Andrea. An Iterative Learning Scheme for High Performance,
Periodic Quadrocopter Trajectories. Proceedings of the 2013 European Control Conference
(ECC), 2013.
• Dario Brescianini, Markus Hehn, and Raffaello D’Andrea. Quadrocopter Pole Acrobatics.
Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS), 2013.
• Markus Hehn and Raffaello D’Andrea. A Frequency Domain Iterative Feed-Forward Learn-
ing Scheme for High-Performance Periodic Quadrocopter Maneuvers. Proceedings of the
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.
• Mark W. Mueller, Markus Hehn, and Raffaello D’Andrea. A Computationally Efficient
Algorithm for State-to-State Quadrocopter Trajectory Generation and Feasibility Verifica-
tion. Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 2013.