Game Theoretic Planning For Self-Driving Cars in Competitive Scenarios
I. INTRODUCTION
In this work we seek to enable an autonomous car to interact with other vehicles, both human-driven and autonomous, while reasoning about the other vehicle's intentions and responses. Interactions among multiple cars are complicated because there is no centralized coordination, and there exist both elements of cooperation and competition during the process. On the one hand, all agents must cooperate to avoid collisions. On the other hand, competition is found in numerous scenarios such as merging, lane changing, overtaking, and so on. In these situations, the action of one agent is dependent on the intentions of other agents, and vice versa, thus exhibiting rich dynamic behaviors that are seldom characterized in the existing literature.

Fig. 1: We demonstrate the performance of our algorithm in experiments using both RC cars indoors and a full-size research vehicle, X1, outdoors. (b) Outdoor experiments using X1, a drive-by-wire research vehicle, at Thunderhill Raceway Park, CA.
We propose a game theoretic planning algorithm that explicitly handles such complex phenomena. In particular, we exploit the concept of Nash equilibria from game theory, which are characterized by a set of actions such that no single agent can do better by changing its action unilaterally. We develop our algorithm for a two-car racing game in which the objective of each car is to advance along the track as far beyond its competitor as possible. The vehicles also share a common collision avoidance constraint to formalize the intention that both cars want to avoid collisions. Nevertheless, there is no means of explicit communication between the vehicles. It is through this collision avoidance constraint that one agent can impose influence on the other, and also can acquire information by observing or probing the opponent. We use an iterative algorithm, called Sensitivity Enhanced Iterated Best Response (SE-IBR), that seeks to converge to a Nash equilibrium in the joint trajectory space of the two cars. The ego vehicle iteratively solves several rounds of trajectory optimization, first for itself, then for the opponent, then again for itself, etc. The fixed point of this iteration represents a Nash equilibrium in the trajectories of the two vehicles; hence the trajectory for the ego vehicle accounts for the predicted response of the other vehicle. The planned trajectories are specifically suited to cars as we parameterize them by piecewise time polynomials, which take into account the vehicle kinematics and can also enforce the bounded steering and acceleration constraints.

We verify our algorithm in both simulations and experiments. The simulations use a high-fidelity model of vehicle dynamics and show that our game theoretic planner significantly outperforms an opponent using a conservative MPC-based trajectory planner which only avoids collisions passively. We also perform experiments with scale model cars, and a full-sized autonomous car. For the model cars, we show again that the car using our game theoretic planner significantly outperforms its competitor with the aforementioned MPC-based planner. A snapshot from an experimental run is shown in Fig. 1a. For the full-scale car (shown in Fig. 1b), we show that the game theoretic planner outperforms a simulated competing vehicle (simulated for safety reasons) using the aforementioned MPC-based planner.

1 Mingyu Wang, John Talbot, and J. Christian Gerdes are with the Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA, {mingyuw, john.talbot, gerdes}@stanford.edu.
2 Zijian Wang and Mac Schwager are with the Department of Aeronautics and Astronautics, Stanford University, Stanford, CA 94305, USA, {zjwang, schwager}@stanford.edu.
Toyota Research Institute ("TRI") provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.

II. RELATED WORK

Our work builds upon the results in [1], in which the Sensitivity Enhanced Iterated Best Response (SE-IBR) algorithm was proposed for drone racing, assuming single integrator dynamics for the drones, and representing the planned trajectories as discrete waypoints. Recently in [2], [3], Wang et al. extended this algorithm to multiple drones (more than two) in 3D environments, again with single integrator dynamics. The main innovation in our work beyond these is (i) to represent the trajectory as a continuous-time piecewise polynomial, (ii) to incorporate bicycle kinematic constraints, and (iii) to include turning angle and acceleration constraints. All of these advancements are for the purpose of making this algorithm suitable for car racing. We also provide new simulation results with a high-fidelity car dynamics simulator, experimental results with small-scale autonomous race cars in the lab, and experimental results with a full-scale autonomous car on a test track.

Some other recent works have proposed Iterative Best Response (IBR) algorithms to reach Nash equilibria in multi-vehicle interaction scenarios. In [4], Williams et al. propose an IBR algorithm together with an information theoretic planner for controlling two ground vehicles in close proximity. That work considers problems in which the objective function of each vehicle depends explicitly on the states of both vehicles. In contrast, in our problem, the only coupling between the two vehicles is through a collision avoidance constraint, which is handled by our SE-IBR algorithm by adding a sensitivity term to the best response iterations. Secondly, the scenario in [4] is collaborative: the vehicles try to maintain a prescribed distance from one another. Our scenario is competitive: the vehicles are competing to win a race. Finally, the algorithm in [4] uses an information theoretic planner in each iteration, which accommodates stochasticity, while ours uses a nonlinear MPC approach with continuous-time vehicle trajectories, which allows us to incorporate realistic kinematic constraints. In a similar vein, [5] proposes a game theoretic approach for an autonomous car to interact with a human-driven car. That work proposes a Stackelberg solution (as opposed to a Nash solution) with potential field models to account for human driver behavior. It also does not consider constraints (e.g. road or collision constraints) for the vehicles.

Aside from the above examples, the existing literature in game-theoretic motion planning and modeling for multiple competing agents is typically either for discrete state or action spaces [6]-[8], or specifically for pursuit-evasion style games. In [9], a hierarchical reasoning game theory approach is used for interactive driver modeling; [10] predicts the motion of vehicles in a model-based, intention-aware framework using an extensive-form game formulation; [11], [12] consider Nash and Stackelberg equilibria in a two-car racing game where the action spaces of both players are finite and the game is formulated in bimatrix form. A similar approach is also used to model human collision avoidance behavior in an extensive-form game [13]. In contrast to these previous works, we consider a continuous trajectory space which can capture the complex interactions and realistic dynamics of cars. Differential games are studied in [14]-[17] in the context of a pursuit-evasion game where pursuers and evaders are antagonistic towards each other. This antagonistic assumption is unrealistic in real-world driving and leads to overconservative behaviors for the ego vehicle.

There is also a vast body of work on collision avoidance in a non-game-theoretic framework. To name just a few, [18], [19] present a provable distributed collision avoidance framework for multiple robots using Voronoi diagrams, under the assumption that agents are reciprocal. Control barrier functions are used in [20] for adaptive cruise control. [21] presents a control structure for cars in emergency scenarios where cars have to maneuver at their limits to achieve stability and avoid collisions with static obstacles. [22] introduces a minimally-interventional controller for closed-loop collision avoidance using backward reachability analysis. By contrast, we focus on motion planning in a competitive two-car racing game, where agents are adversarial and have influence on each other through shared collision avoidance constraints.

III. PRELIMINARIES

In this section, the game theoretic planner from [1] is concisely summarized for two simplified discrete-time single-integrator robots in a racing scenario. Our work in this paper builds upon this previously proposed framework.
where lτ is the total track length. Track tangent and normal vectors are denoted as t = τ′ and n, respectively. Track width is wτ. These track parameters are known by all racing agents beforehand. The progress of one player along the track is defined as the arc length s of the closest point to the robot's current position p_i on the track center line:

    s_i(p_i) = arg min_s (1/2) ||τ(s) − p_i||².    (1)

The objective of a player is to travel along the track as far as possible ahead of its opponent, which can be formulated by the following optimization problem:

    max_{θ_i}  s_i(p_i^N) − s_j(p_j^N)    (2a)
    s.t.  p_i^n = p_i^{n−1} + u_i^n,    (2b)
          ||p_j^n − p_i^n|| ≥ d_i,    (2c)
          |n(p_i^n)^T [p_i^n − τ(p_i^n)]| ≤ wτ,    (2d)
          ||u_i^n|| ≤ ū_i,    (2e)

where the decision variable is θ_i = [p_i^1, . . . , p_i^N, u_i^1, . . . , u_i^N]. The optimization is solved and used in a receding horizon planning fashion with N steps. (2a) represents the objective of one player to travel further than its opponent at the end of the planning horizon. Constraint (2b) enforces the integrator dynamics; (2c) is the collision avoidance constraint for the two players, which we denote by i and j, where d_i is the minimum clearance distance for agent i to avoid collision (it may be different or the same for different agents, and can be used to represent risk tolerance for collisions). Constraint (2d) keeps the player inside the track boundaries, and (2e) is the actuation limit constraint with maximum velocity ū_i, which may be different for different agents.

In this optimization problem, note that player i only has access to its own planned trajectory θ_i. The opponent's planned trajectory θ_j is not available. However, because of the collision avoidance constraint (2c), the ego player needs the p_j's to calculate its optimal solution. In other words, the feasible trajectory set of player i depends on the unknown trajectory of player j.

To deal with the competitive nature of this problem, we model it as a zero-sum game, which can be seen explicitly by writing the objective of player j as s_j(p_j^N) − s_i(p_i^N). We use the concept of Nash equilibria [23]. Unlike in [5], where the robot car is assumed to take action first and the human-driven car responds, in our work no player has an information advantage over its opponent. Thus, Nash equilibria better suit the problem. A Nash equilibrium is a strategy profile (a trajectory pair, in our case) where no player can do better by unilaterally changing its strategy. Formally, let Θ = Θ_i × Θ_j be the set of feasible strategy profiles; a Nash equilibrium (θ*_i, θ*_j) satisfies:

    θ*_i = arg max_{θ_i ∈ Θ_i(θ*_j)} f_i(θ_i),    (3a)
    θ*_j = arg max_{θ_j ∈ Θ_j(θ*_i)} f_j(θ_j).    (3b)

Computing Nash equilibria in general can be computationally intractable, and there can be multiple Nash equilibria in a game. Instead, an iterative algorithm can be used to approach a Nash equilibrium online at each time step, as explained below.

B. Sensitivity Enhanced Objective Function

To simplify the notation, (2) can be rewritten as:

    max_{θ_i}  s_i(θ_i) − s_j(θ_j),    (4a)
    s.t.  h_i(θ_i) = 0,    (4b)
          g_i(θ_i) ≤ 0,    (4c)
          γ_i(θ_i, θ_j) ≤ 0,    (4d)

where h_i(·) represents the equality constraint (2b); g_i(·) represents the inequality constraints (2d) and (2e), both involving only player i; and γ_i(·, ·) is the inequality constraint (2c) involving both players, which provides the only coupling between the ego player and its opponent. As we pointed out, although player i has no control authority over the strategy of player j, it can still pose influence on the calculation of the opponent's optimal strategy through the coupled collision avoidance constraint (2c). In other words, player i's strategy does affect the payoff of player j at Nash equilibria. Thus, the optimal payoff of player j can be represented as a function of θ*_i, i.e., s*_j(θ*_i). To incorporate the influence of one player on its opponent's optimal payoff, we propose to substitute the objective in problem (4) by:

    s_i(θ_i) − α s*_j(θ_i).    (5)

However, a closed-form expression of s*_j(θ_i) is not easily available. We can instead apply sensitivity analysis [24] to obtain a linear approximation of s*_j(θ_i) around an available solution of θ_i. A sensitivity enhanced Iterated Best Response (IBR) scheme is then used in our algorithm. The algorithm starts from an initial guess of the opponent's trajectory, and solves the optimization problem (4) with the sensitivity-enhanced objective (5) for the ego and opponent players iteratively. When solving the problem for one player, we freeze and use the solution of the other player's strategy from the previous iteration.

Specifically, assume that in the l-th iteration, the solution of the ego player's optimization problem from the last iteration is θ_i^{l−1}. Then, θ_i^{l−1} is treated as fixed when the ego robot solves the optimization for the opponent to obtain the opponent's optimal value s*_j(θ_i^{l−1}). We linearize s*_j(θ_i) around θ_i^{l−1}:

    s*_j(θ_i) = s*_j(θ_i^{l−1}) + (d s*_j / d θ_i)|_{θ_i = θ_i^{l−1}} (θ_i − θ_i^{l−1}).    (6)
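The freeze-and-solve iteration just described can be illustrated on a toy two-player game. The quadratic payoffs below are hypothetical (not the racing objective); they are chosen so each best response is available in closed form, and alternating the best responses converges to the unique Nash equilibrium, the fixed point of the iteration:

```python
# Toy illustration of iterated best response (IBR): each player, in turn,
# maximizes its own payoff while the opponent's decision is frozen.
# Hypothetical quadratic payoffs with closed-form best responses:
#   f1(x1, x2) = -(x1 - 1)^2 - 0.5 * x1 * x2   ->  BR1(x2) = 1 - 0.25 * x2
#   f2(x1, x2) = -(x2 + 1)^2 - 0.5 * x1 * x2   ->  BR2(x1) = -1 - 0.25 * x1

def br1(x2):
    return 1.0 - 0.25 * x2

def br2(x1):
    return -1.0 - 0.25 * x1

def iterated_best_response(x2_init=0.0, iters=50):
    x2 = x2_init
    for _ in range(iters):
        x1 = br1(x2)   # solve for the ego player with the opponent frozen
        x2 = br2(x1)   # solve for the opponent with the ego player frozen
    return x1, x2

x1, x2 = iterated_best_response()
# The unique Nash equilibrium solves x1 = 1 - 0.25*x2 and x2 = -1 - 0.25*x1
# simultaneously, i.e. (x1, x2) = (4/3, -4/3); IBR contracts toward it.
```

In the racing problem the best responses are the constrained SQP solves described above rather than closed-form formulas, but the fixed-point structure of the iteration is the same.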
Note that the computation of s_i(θ_i) also requires an optimization problem, as given in (1). Again, a linear approximation of this term is obtained by sensitivity analysis. The final explicit form of the optimization problem is as follows:

    max_{θ_i ∈ Θ_i}  t^T p_i^N + α μ^{l T} (∂γ_j / ∂θ_i)|_{(θ_i^{l−1}, θ_j^l)} θ_i,    (8)

where μ^l denotes the Lagrange multipliers associated with the coupled constraint γ_j in the opponent's problem; see [1] for the detailed derivation.

The states and inputs of the kinematic bicycle model can be expressed as functions of the flat outputs σ = (σ_1, σ_2) and their derivatives up to second order. Such a relationship is called an endogenous transformation. For the bicycle model, this transformation is given as:

    x = σ_1,    y = σ_2,
    v = (σ̇_1² + σ̇_2²)^{1/2},
    a = (σ̇_1 σ̈_1 + σ̇_2 σ̈_2) / (σ̇_1² + σ̇_2²)^{1/2},    (10)

together with the heading relation θ̇(t) = (v(t)/L) tan φ(t), which recovers the steering angle φ(t), where L is the wheelbase.

Each trajectory section between two consecutive waypoints is represented using a second-order time polynomial; for the n-th piece,

    x_n(t) = A_n^T [1, t, t²]^T,    y_n(t) = B_n^T [1, t, t²]^T,    (11)

with coefficient vectors A_n, B_n ∈ R³.

1) Continuity constraints: We enforce the position and velocity to be continuous at the waypoints:

    [x_n(t_n), y_n(t_n)] = p^n,
    [x_n(t_{n+1}), y_n(t_{n+1})] = p^{n+1},
    [ẋ_n(t_n), ẏ_n(t_n)] = u^n,
    [ẋ_n(t_{n+1}), ẏ_n(t_{n+1})] = u^{n+1},    (12)

for n = 0, 1, . . . , N − 1. Substituting (11) into the equations above, we obtain a set of equality constraints on A_n and B_n.

The following constraints on speed, acceleration, and curvature should be enforced on each piece of the polynomial. For clarity of notation, we omit the subscript n for the n-th piece of polynomials, unless there is possible confusion.

2) Speed constraints: From (10), we have:

    v²(t) = ẋ²(t) + ẏ²(t),

which is a second-order polynomial in t. Plugging in (11), we obtain coefficients which are functions of the A's and B's. Notice that the maximum value of the speed in the interval t ∈ [t_n, t_{n+1}] is attained at its end points. Thus, we have v_{n,max} = max{||u^n||, ||u^{n+1}||}.

The curvature constraints involve f(t), a second-order polynomial in t. These constraints are nonlinear; we therefore impose the constraint

    κ_max ≤ κ̄_max    (17)

using Sequential Quadratic Programming (SQP) [27].

C. Sensitivity Enhanced Iterated Best Response

To summarize, the optimization problem each player should solve is described in the following. For player i, the decision variables are A_i ∈ R^{3N} and B_i ∈ R^{3N}, which are the coefficient vectors of the N second-order polynomials, and θ_i = [p_i^0, . . . , p_i^N, u_i^0, . . . , u_i^N], which is consistent with the previous notation. Note that this approach is also known as the collocation method [28]. For clarity, we rewrite the optimization problem in the following form:

    max_{θ_i, A_i, B_i}  s_i(p_i^N) − s_j(p_j^N)    (18a)
    s.t.  h_i(θ_i, A_i, B_i) = 0,    (18b)
          g_i(θ_i, A_i, B_i) ≤ 0,    (18c)
          γ_i(θ_i, θ_j) ≤ 0.    (18d)
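To make the piecewise-quadratic parameterization concrete, the sketch below (with made-up waypoint values) fits one second-order segment from endpoint positions and the starting velocity; with a quadratic, those three conditions determine the piece, and the end velocity then follows from the continuity constraints (12). It also checks numerically that, since v²(t) = ẋ² + ẏ² is a convex quadratic on each piece, the maximum speed is attained at an end point:

```python
import numpy as np

# One second-order trajectory piece x(t) = c0 + c1*t + c2*t^2 (same for y)
# on t in [t0, t1]. The waypoint values below are made up for illustration.
t0, t1 = 0.0, 1.0
p_start, p_end = np.array([0.0, 0.0]), np.array([1.0, 0.5])
u_start = np.array([0.8, 0.2])  # velocity at t0

def fit_quadratic(p0, v0, p1, t0, t1):
    """Solve for [c0, c1, c2] from position at t0, velocity at t0, position at t1."""
    M = np.array([[1.0, t0, t0 ** 2],
                  [0.0, 1.0, 2.0 * t0],
                  [1.0, t1, t1 ** 2]])
    return np.linalg.solve(M, np.array([p0, v0, p1]))

A = fit_quadratic(p_start[0], u_start[0], p_end[0], t0, t1)  # x coefficients
B = fit_quadratic(p_start[1], u_start[1], p_end[1], t0, t1)  # y coefficients

def speed(t):
    vx = A[1] + 2.0 * A[2] * t  # x-velocity is linear in t
    vy = B[1] + 2.0 * B[2] * t  # y-velocity is linear in t
    return np.hypot(vx, vy)

# v^2(t) is a sum of squares of linear functions, hence a convex quadratic,
# so its maximum over [t0, t1] is attained at one of the interval's end points:
ts = np.linspace(t0, t1, 201)
assert max(speed(t) for t in ts) <= max(speed(t0), speed(t1)) + 1e-9
```

This is only a numerical sanity check of the end-point property; in the planner the equivalent bound max{||u^n||, ||u^{n+1}||} enters the optimization as a constraint on the waypoint velocities.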
Algorithm 1 Sensitivity Enhanced Iterated Best Response
1: observe opponent's current states p^0, u^0
2: L: maximum number of iterations in IBR
3: initialize opponent trajectory: θ_j^0 = [p^1, . . . , p^{N+1}]
4: for l = 1, 2, . . . , L do
5:    ego: solve (19) using SQP with θ_j = θ_j^{l−1}
6:    obtain ego optimal strategy θ_i^l
7:    opponent: solve (19) using SQP with θ_i = θ_i^l
8:    obtain opponent optimal strategy θ_j^l
9: end for
10: ego: solve (19) with θ_j = θ_j^L

The simulation uses a high-fidelity vehicle dynamics model with a tire model [29], [30] and carefully calibrated system parameters of our experiment vehicle.

The proposed algorithm is implemented using the Gurobi 1 solver with a C++ interface. The non-convex constraints and objective function are imposed through linearization. Linearization is applied based on the solution from the previous iteration. For the highly nonlinear curvature constraints (17), we choose to enforce the constraints only on the first half of the trajectory. We found this significantly improves the stability of the solution. Since the algorithm is implemented in a receding horizon fashion, only the first several steps are executed, so the second half of the trajectory will not be executed during the process. Moreover, to achieve online and real-time planning, we solve the optimization problem outlined in Algorithm 1 for two iterations (L = 2) and do not wait until convergence. For both simulation and experiments, we use the Robot Operating System (ROS) framework, which handles message-passing.

1 http://www.gurobi.com/

The two players use the proposed game theoretic planner (GTP) and a naive Model Predictive Control (MPC) planner, respectively. The naive MPC planner serves as a baseline strategy and only optimizes its own objective (4) using an initial guess of the opponent's behavior. In other words, the baseline planner does not exploit its influence on the other player.

We use an oval-shaped racing track with total length lτ = 216 m and half-width wτ = 6.5 m, with the following parameters. The planning horizon is 5 sec and the planner updates at 2 Hz. The maximum acceleration for both cars is 5 m/s² and the maximum curvature is 0.11 m⁻¹, corresponding to a maximum steering angle of 18°.

We demonstrate the performance of the proposed planner in two settings: overtaking and blocking. In simulations, the two cars start from randomly selected positions, one always in front of the other. The maximum speed of the leading car is always set to a lower speed limit (5 m/s) than its competitor's (6 m/s). The GTP car starts in the leading position in blocking tests and the secondary position in overtaking tests.

1) In blocking scenarios, the GTP car has the lower maximum speed and thus is in a disadvantageous situation. Naturally, we would expect the slower car to be overtaken. However, in most cases, it is able to block the (higher maximum speed) MPC car and maintain a leading position in the game. Snapshots during a simulation are shown in Fig. 3. Red trajectories are GTP planner trajectories and green trajectories are MPC planner trajectories. Since we adopt a receding horizon planning approach, the illustrated trajectories are planned trajectories instead of the true path. We observe that on the first half of the straight segment, the GTP car tends to steer in front of its competitor instead of going straight forward. Even though going straight would achieve more progress along the track for itself, blocking is a better strategy to win the race. When the two cars are entering corners, however,

Fig. 3: Snapshots during a simulation in the blocking scenario. The GTP car (red trajectories) starts in the first position with a lower speed limit with respect to the MPC car (green trajectories). The top right figure shows that the two cars entering a corner choose the short course. The other three figures show that the leading vehicle is actively blocking the following vehicle by moving intentionally in front of it. This may appear to be suboptimal, but it ensures its leading position in the game.

Fig. 4: Snapshots during simulations in the overtaking scenario. (a) Overtaking scenario 1. (b) Overtaking scenario 2. The GTP car (red trajectories) starts in the secondary position with a higher speed limit behind the MPC car (green trajectories). Only key snapshots during overtaking are shown here. In 4a, the MPC car passively avoids collision and steers away from its optimal trajectory. In 4b, the GTP car overtakes the MPC car from the left.
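The receding-horizon use of Algorithm 1 described above (L = 2 best-response iterations per cycle, then execute only the first planned step and replan) can be sketched as follows. Here `solve_trajectory` is a hypothetical stand-in for the SQP solve of (19): it returns a constant-speed 1-D plan so the loop is runnable, whereas the real solver would be constrained by the frozen opponent plan.

```python
# Skeleton of the receding-horizon IBR loop in Algorithm 1: alternate
# freeze-and-solve steps for ego and opponent, stop after L = 2 iterations
# without waiting for convergence, execute only the first planned step.

def solve_trajectory(state, frozen_opponent_plan, v=1.0, N=5):
    """Stand-in planner: an N-step constant-speed plan along a 1-D track.
    (The frozen opponent plan would constrain the real SQP solve of (19);
    this stub ignores it.)"""
    return [state + v * (k + 1) for k in range(N)]

def plan_with_ibr(ego_state, opp_state, L=2, N=5):
    opp_plan = [opp_state] * N                            # initial guess for opponent
    for _ in range(L):
        ego_plan = solve_trajectory(ego_state, opp_plan)  # ego solve, opponent frozen
        opp_plan = solve_trajectory(opp_state, ego_plan)  # opponent solve, ego frozen
    return solve_trajectory(ego_state, opp_plan)          # final ego solve

ego, opp = 0.0, 1.0
for _ in range(10):                        # ten replanning cycles (e.g. at 2 Hz)
    ego = plan_with_ibr(ego, opp)[0]       # execute only the first planned step
    opp = solve_trajectory(opp, None)[0]   # opponent also advances one step
```

With the stand-in planner each car advances one step per cycle; the point of the sketch is the loop structure, not the plans themselves.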