
Transportation Research Part C 18 (2010) 120–139

Contents lists available at ScienceDirect

Transportation Research Part C


journal homepage: www.elsevier.com/locate/trc

Multi-agent model predictive control of signaling split in urban traffic networks ✩

Lucas Barcelos de Oliveira, Eduardo Camponogara *
Department of Automation and Systems Engineering, Federal University of Santa Catarina, Cx.P. 476, 88040-900 Florianópolis, SC, Brazil

Article info

Article history:
Received 15 October 2008
Received in revised form 3 March 2009
Accepted 29 April 2009

Keywords:
Urban traffic networks
Split control
Distributed agents
Distributed optimization
Model predictive control

Abstract
The operation of large dynamic systems such as urban traffic networks remains a challenge in control engineering, to a great extent due to their sheer size, intrinsic complexity, and nonlinear behavior. Recently, control engineers have looked for unconventional means for modeling and control of complex dynamic systems, in particular the technology of multi-agent systems, whose appeal stems from their composite nature, flexibility, and scalability. This paper contributes to this evolving technology by proposing a framework for multi-agent control of linear dynamic systems, which decomposes a centralized model predictive control problem into a network of coupled, but small, sub-problems that are solved by the distributed agents. Theoretical results ensure convergence of the distributed iterations to a globally optimal solution. The framework is applied to the signaling split control of traffic networks. Experiments conducted with simulation software indicate that the multi-agent framework attains performance comparable to conventional control. The main advantages of the multi-agent framework are its graceful extension and localized reconfiguration, which require adjustments only in the control strategies of the agents in the vicinity.
© 2009 Elsevier Ltd. All rights reserved.

1. Introduction
The steady advances in communications and computer technology are shaping the way traffic control systems are designed. Today, operating centers can receive data from remote sensors and apply control policies that respond to the prevailing traffic conditions. Among the existing real-time control systems, the Traffic-responsive Urban Control (TUC) framework (Diakaki et al., 2002) has drawn interest for its simplicity, robustness, and good performance corroborated by field applications in Munich, Southampton, and Chania (Bielefeldt et al., 2004; Diakaki and Papageorgiou, 1997; Kosmatopoulos et al., 2006). TUC uses a modified store-and-forward model of traffic flow (Gazis and Potts, 1963) with purely continuous state and control variables, which greatly simplifies the synthesis of a control strategy. In its baseline form, TUC has an off-line and an on-line module (Diakaki et al., 2002). The off-line module solves an unconstrained linear-quadratic-regulator (LQR) problem that minimizes a quadratic cost function on queue lengths and deviations from nominal split signals. The on-line module produces feasible split signals, which satisfy green time bounds and add up to cycle time, by solving a quadratic program that minimizes the distance from the infeasible signals obtained with the LQR policy. However, such a framework does not necessarily reach optimal solutions to the underlying constrained control problem (Camacho and Bordons, 2004). To this end, model predictive control (MPC) approaches have been proposed to explicitly handle constraints and thereby improve the solution quality of the TUC framework (Aboudolas et al., 2007; de Oliveira and Camponogara, 2007).
✩ This research was supported in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) under Grant #473841/2007-0.
* Corresponding author. Tel.: +55 48 3721 7688; fax: +55 48 3721 9934.
E-mail address: camponog@das.ufsc.br (E. Camponogara).

0968-090X/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.trc.2009.04.022


The technology of multi-agent systems has also advanced in the past decades, particularly so in artificial intelligence and software engineering (Jennings, 2000; Maturana et al., 2005). This evolving technology aims to arrange agents of limited perception and expertise in an organization to perform tasks that are beyond the abilities of the individual agents. The problem-solving ability of a multi-agent system emerges from the interactions of the agents, which employ some form of reasoning to cooperate with others and resolve conflicts when driven by the interests of the organization.
Intelligent agents and multi-agent systems have been successful in solving unstructured problems (for which adequate models are not known), in the replacement of and assistance to humans (Toms and Garcia, 2005; Rigolli and Brady, 2005; Nguyen-Duc et al., 2008), in solving high-abstraction problems (Pechoucek et al., 2006; Tumer and Agogino, 2007), and in handling discrete decisions (Yamashita et al., 2005; de Oliveira et al., 2005; Balan and Luke, 2006). The nature of these problems contrasts with dynamic control problems, which are typically structured (good models based on differential equations are known) and where the aim is to control machines, the decisions are of low level and demand guarantees of stability and convergence, and the control variables are continuous.
While multi-agent systems are very adaptive in unstructured problems, they have been mostly used as a software engineering paradigm in the field of dynamic control systems (Maturana et al., 2005; Srinivasan and Choy, 2006; Tatara et al., 2007). Control engineers and computer scientists are bridging the gap between these disciplines by developing multi-agent systems to cope with the sheer size and complexity of large dynamic control systems (Li et al., 2005; Manikonda et al., 2001; Tatara et al., 2005; Negenborn et al., 2008). The appeal of multi-agent technology stems from its composite nature, flexibility, and scalability.
Aligned with these efforts, this paper proposes a framework for a network of distributed agents to control linear dynamic systems, which are put together by interconnecting linear sub-systems with local input constraints. Our framework decomposes the optimization problem arising from the MPC approach into a network of coupled, but small, sub-problems to be solved by the agent network. Each agent senses and controls the variables of its sub-system, while communicating with agents in the vicinity to obtain neighborhood variables and coordinate their actions. A well-crafted problem decomposition and coordination protocol ensure convergence of the agents' iterates to a global optimum of the MPC problem.
The work reported here builds upon preceding work on distributed control (Camponogara et al., 2002; Camponogara and Talukdar, 2007) by exploiting the linear dynamic structure to develop simpler models and algorithms. The paper focuses on the development of the multi-agent MPC framework and its application to the control of signaling split in urban traffic networks. While able to attain performance comparable to centralized MPC, the multi-agent MPC framework is more robust in that the failure of a control agent compromises only its local sub-system. It also supports a plug-in technology that allows expansion and reconfiguration to be performed locally, rather than having to be coordinated at the control center.
The remaining sections are structured as follows. Section 2 presents basic concepts of urban traffic networks and describes the store-and-forward model used by the TUC strategy. Section 3 formulates split control as an MPC problem for a network of dynamically coupled sub-systems, one for each intersection; it then develops a decomposition of the MPC problem into a set of sub-problems and outlines a distributed algorithm for the agent network to reach an optimal solution. Section 4 reports results from computational experiments aimed at comparing the TUC LQR strategy with the multi-agent MPC approach. Section 5 draws some final remarks and suggests directions for future work.
2. Urban traffic control
The origin of urban traffic control dates back to the early 20th century with the appearance of traffic lights. The first attempts at real-time traffic control began in the 1980s with the implementation of the SCOOT (Robertson and Bretherton, 1991; Hunt et al., 1981) and SCATS (Lowrie, 1982) strategies. Nevertheless, despite the continuous research in the past decades, most control strategies still rely on heuristics to compute the signaling split, such as the acclaimed TRANSYT (Robertson, 1969).
Urban traffic control is usually divided into several modules, each responsible for different aspects of traffic control. These modules include ramp metering, dynamic message signaling, signaling split control, and public transport. By split we mean the green light time assigned to each street or road of an intersection. This is one of the four control factors that most influence traffic (Diakaki, 1999; Papageorgiou, 2004), the others being stage specification, cycle duration, and offset between intersections. The signaling split control module of the TUC strategy is of particular interest to this paper.
The traffic-responsive urban control framework uses a store-and-forward model which represents traffic flow with continuous variables, thereby facilitating the synthesis of multi-variable control algorithms such as LQR and MPC. The underlying assumption of this store-and-forward model is shown in Fig. 1. The bold full line in colors green¹ and red represents the cycle of a junction. The square wave in full line represents the usual traffic flow model of a single stream of vehicles, using integer variables to differentiate periods with right of way and saturated flow, associated with the green portion of the cycle line, from periods with no flow, where the cycle line is red. The dashed line, on the other hand, represents the same flow of vehicles as seen by the model proposed by Gazis and Potts (1963). From this illustration, one can view the store-and-forward model as the mean flow crossing the stop line of an intersection during the control interval, meaning that this interval has to be greater than the intersection's cycle time. The TUC traffic model does not try to realistically model the complex and rapidly evolving dynamics of traffic, such as driver reaction times, acceleration, and deceleration, but rather is concerned with the long-term evolution of the in- and outflows of the network.

¹ For interpretation of color in Fig. 1, the reader is referred to the web version of this article.

Fig. 1. Store-and-forward flow (dashed line) and flow modeled with binary variables (full line).
2.1. Urban traffic network modeling
The models for traffic network and traffic flow presented below are from (Diakaki, 1999). An urban traffic network consists of intersections or junctions joined by links which represent streets, avenues, roads, or any other infrastructure connecting them. A junction comprises a set of approaches ending at a common crossing area. An approach is a subset of the lanes of a link from which vehicles are able to cross the intersection simultaneously, being defined by the topology and stages of the network. A stage, or phase, is the period of time during which the traffic light signals are held constant at the intersection. Approaches may also be further divided into one or more streams. The maximum flow that can cross the stop line of an intersection when a stream has the right of way (r.o.w.) is the saturation flow, which is usually expressed in vehicles per hour. The yellow time introduced between consecutive phases to ensure safety is known as lost time. And the time frame until the repetition of stages is called cycle time or cycle. These concepts are the building blocks for traffic modeling.
Fig. 2 shows an urban traffic network with two roads, each of which has 4 lanes. Taking the horizontal link in the west-east direction as the reference, one notices two distinct approaches: one bundling vehicles willing to make a left turn and the other bundling the vehicles wishing to go straight ahead. The arrows show all streams of this network. The figure also illustrates the three stages of the intersection, which are repeated in each cycle.

Fig. 2. Basic concepts for traffic modeling.

An urban traffic network is therefore viewed as a directed graph whose nodes are the junctions j ∈ J and whose arcs correspond to the links z ∈ Z. The sets I_j and O_j have the incoming and outgoing links of junction j, respectively. The routes of


vehicles entering the network are assumed to follow statistical patterns that are modeled by turning rates. Specifically, the turning rate s_{z,w} gives the rate of vehicles that reach a junction j from link z ∈ I_j and turn into a link w ∈ O_j. For the purpose of the traffic control analysis herein, turning rates s_{z,w}, cycle times C_j and lost times L_j at the junctions, and saturation flows S_z for the links are all known constants.
Let F_j be the set of phases at junction j, while u_{j,i} denotes the green time of phase i ∈ F_j. It is typical to have all intersections operating with a common cycle time C, which is enforced by the constraint

    \sum_{i \in F_j} u_{j,i} + L_j = C

An additional constraint is u_{j,i} ∈ [u_{j,i}^{min}, u_{j,i}^{max}], where u_{j,i}^{min} (u_{j,i}^{max}) is the minimum (maximum) allowable green time. Also, let V_z ⊆ F_j be the subset of phases for which link z has the r.o.w. at junction j.
The traffic flow dynamics of the network link z in Fig. 3 is given by

    \Delta x_z(t+1) = \Delta T \, [ q_z(t) + d_z(t) - p_z(t) - c_z(t) ]        (1)

where t = 1, 2, ... is a discrete time index and \Delta T is the control interval; x_z denotes the number of vehicles in link z; q_z (p_z) is the inflow (outflow) of link z during the time window [\Delta T \cdot t, \Delta T \cdot (t+1)]; d_z is the demand, that is, the vehicles not originating from adjacent links that enter the network; and c_z is the exit flow.
Because turning rates are known, the traffic flow into link z is expressed as

    q_z(t) = \sum_{w \in I_{j_1}} s_{w,z} \, p_w(t)

where s_{w,z} is the turning rate towards link z ∈ O_{j_1} coming from link w ∈ I_{j_1}. Demand and exit rates are lumped together as a single disturbance, say e_z(t). Assuming that inflows and outflows of link z with r.o.w. are equal to their saturation flow, S_z, Eq. (1) becomes

    x_z(t+1) = x_z(t) + \Delta T \Big[ \sum_{w \in I_{j_1}} s_{w,z} \frac{S_w}{C} \sum_{i \in V_w} u_{j_1,i}(t) - \frac{S_z}{C} \sum_{i \in V_z} u_{j_2,i}(t) + e_z(t) \Big]        (2)

where the control signal u_{j_1,i}(t) is the green time for vehicles going through junction j_1 during phase i, whereas \sum_{i \in V_z} u_{j_2,i}(t) is the green time for vehicles leaving link z. Notice that link z starts at junction j_1 and ends at j_2. Generalizing Eq. (2) for all network links leads to the matrix equation

    x(t+1) = A x(t) + B u(t) + e(t)        (3)

where x(t) is the state vector; u(t) is the control vector containing signals u_{j,i}(t), ∀i ∈ F_j, ∀j ∈ J; e(t) is the vector with the disturbances; and A = I is the state matrix, whereas B is the control input matrix.
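The per-link update of Eq. (2) can be sketched numerically. The snippet below is a minimal illustration with made-up turning rates, saturation flows, and green times; none of these numbers come from the paper:

```python
# A minimal sketch of the store-and-forward link dynamics of Eq. (2).
# All numeric data below are hypothetical.

def link_update(x_z, u_up, u_down, s_wz, S_up, S_z, C, dT, e_z=0.0):
    """One step of x_z(t+1) = x_z(t) + dT * (inflow - outflow + e_z).

    x_z    : vehicles currently in link z
    u_up   : green times (s) of the upstream phases feeding link z,
             keyed by upstream link w
    u_down : total green time (s) of the phases discharging link z
    s_wz   : turning rates s[w, z] from upstream link w into z
    S_up   : saturation flows (veh/s) of the upstream links
    S_z    : saturation flow (veh/s) of link z
    C      : common cycle time (s); dT: control interval (s)
    e_z    : lumped demand/exit disturbance (veh/s)
    """
    inflow = sum(s_wz[w] * S_up[w] / C * u_up[w] for w in u_up)
    outflow = S_z / C * u_down
    return x_z + dT * (inflow - outflow + e_z)

# Hypothetical data: two upstream links feeding link z.
x_next = link_update(
    x_z=20.0,
    u_up={"w1": 30.0, "w2": 25.0},
    u_down=35.0,
    s_wz={"w1": 0.6, "w2": 0.3},
    S_up={"w1": 0.5, "w2": 0.4},
    S_z=0.5, C=90.0, dT=90.0,
)
```

With these numbers the link loses vehicles over the interval, since the discharge term dominates the inflow term.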
2.2. Split control
Traffic-responsive control systems adjust split signals according to the demands of the involved streams. In standard form, the TUC strategy uses the LQR technique to find a time-invariant gain matrix, which is simpler than optimizing a performance criterion (Diakaki et al., 2002) but potentially delivers a sub-optimal control law. To apply the LQR technique, the disturbances are disregarded and the dynamic system (3) becomes:

    x(t+1) = A x(t) + B u(t)        (4)

Such an assumption is plausible since the goal is to attain a satisfactory gain matrix. The minimization of the proportional occupancy of the links, x_z / x_z^{max}, where x_z^{max} is the link capacity, is attempted to reduce the risk of oversaturation and spillback. To this end, the following quadratic function is used:

Fig. 3. Traffic flow dynamics in a link.


    J = \frac{1}{2} \sum_{t=0}^{\infty} \left( \|x(t)\|_Q^2 + \|u(t)\|_R^2 \right)        (5)

where Q and R are diagonal matrices, with the first being positive semi-definite and the second being positive definite.² According to LQR theory, an infinite time horizon is used in (5) to achieve a time-invariant control law. As matrix Q weighs the states (the number of vehicles in the roads), the minimization of the average occupancy is achieved by making its diagonal elements equal to 1/(x_z^{max})^2 for the corresponding links z ∈ Z. Matrix R reflects the penalty imposed on control effort, usually defined as R = rI, where r is found experimentally. Minimizing criterion (5) leads to the control law

    u(t) = -L x(t)        (6)

where L is the Riccati gain matrix, which depends on A, B, Q, and R, but with small susceptibility to variation of these matrices (Diakaki et al., 2002). The feedback control law (6) does not account for the constraints on the control signals, which are imposed in an ad hoc manner by solving the following problem at each sample time t and for each junction j ∈ J

    Q_j(t) : \min_{U_{j,i}(t)} \sum_{i \in F_j} [ u_{j,i}(t) - U_{j,i}(t) ]^2        (7a)

    s.to: \sum_{i \in F_j} U_{j,i}(t) + L_j = C_j        (7b)

    U_{j,i}(t) \in [u_{j,i}^{min}, u_{j,i}^{max}], \quad \forall i \in F_j        (7c)
where U_{j,i}(t) is the closest feasible solution in Euclidean space to u_{j,i}(t). Q_j(t) is a quadratic program which can be solved in real-time with an efficient algorithm (Diakaki, 1999) that converges in at most |F_j| steps. Although this approach yields a feasible split, the resulting solution does not necessarily satisfy the optimality conditions for the dynamic system defined by Eq. (4) subject to the constraints on control signals. Actually, this multi-variable regulator behaves in a purely reactive way to unknown disturbances because no predictions on disturbances are made. On the other hand, the structure of matrix L provides the regulator with a gating effect, that is, the splits of highly loaded links on peripheral junctions are reduced to preclude saturation in upstream links and thereby avoid gridlocks.
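The projection performed by problem (7) can be illustrated with a small sketch. The scheme below is not the algorithm of Diakaki (1999) cited above; it solves the same quadratic program by bisecting on the Lagrange multiplier of the cycle-time constraint, with hypothetical green-time bounds:

```python
# A sketch of the feasibility projection in problem (7): find green times
# U closest (in Euclidean distance) to the LQR splits u, subject to
# sum(U) + L_j = C_j and box bounds. Not the cited algorithm of
# Diakaki (1999); this variant bisects on the Lagrange multiplier of the
# sum constraint.

def project_splits(u, lo, hi, C, L, tol=1e-9):
    """Minimize sum((u_i - U_i)^2) s.t. sum(U) = C - L, lo_i <= U_i <= hi_i."""
    target = C - L
    assert sum(lo) <= target <= sum(hi), "problem (7) must be feasible"

    def clipped(lmbda):
        # Optimal U_i for a fixed multiplier: shift then clip to the box.
        return [min(max(ui + lmbda, l), h) for ui, l, h in zip(u, lo, hi)]

    # sum(clipped(lmbda)) is nondecreasing in lmbda, so bisect on lmbda.
    a = min(l - ui for ui, l in zip(u, lo))
    b = max(h - ui for ui, h in zip(u, hi))
    while b - a > tol:
        mid = 0.5 * (a + b)
        if sum(clipped(mid)) < target:
            a = mid
        else:
            b = mid
    return clipped(0.5 * (a + b))

# Hypothetical junction: cycle 90 s, lost time 10 s, three phases.
U = project_splits(u=[50.0, 40.0, 20.0], lo=[10.0, 10.0, 10.0],
                   hi=[60.0, 60.0, 60.0], C=90.0, L=10.0)
```

Here the LQR splits sum to 110 s against an available green time of 80 s, so the projection shifts every phase down by 10 s.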
Previous works (Aboudolas et al., 2007; de Oliveira and Camponogara, 2007; de Oliveira, 2008) report that significant improvements may be induced by replacing the standard LQR control law with a procedure that accounts for system constraints, such as model predictive control. Generally speaking, the MPC approach is composed of (Camacho and Bordons, 2004; Kühne, 2005):
• a prediction model satisfactorily describing the process dynamics over a finite-time horizon;
• a cost function which gives the control signals when minimized; and
• a sliding horizon of prediction and control, which is translated a step forward at each sample period, requiring the computation of new control actions from which only that of the current time is implemented.
Model predictive control minimizes the same cost function as LQR control, except that it covers a limited time frame given by the prediction horizon. MPC is regarded as a feed-forward control strategy because a disturbance model can be embedded in its prediction model. Nevertheless, the use of a disturbance model may mask the benefits of the computation of a better control signal under equal circumstances. Put another way, the dynamic model for traffic flow should be the same for the TUC and MPC strategies when comparing their performances. Following these principles, the MPC problem for signaling split control at time t is cast as

    P(t) : \min \frac{1}{2} \sum_{k=1}^{K} \hat{x}(t+k|t)' Q \hat{x}(t+k|t) + \frac{1}{2} \sum_{k=0}^{K-1} \hat{u}(t+k|t)' R \hat{u}(t+k|t)        (8a)

    s.to: \hat{x}(t|t) = x(t)        (8b)

    For k = 0, ..., K-1:

    \hat{x}(t+k+1|t) = A \hat{x}(t+k|t) + B \hat{u}(t+k|t)        (8c)

    C \hat{u}(t+k|t) \geq c        (8d)

    D \hat{u}(t+k|t) = d        (8e)

where K is the length of the prediction horizon; x(t) is the current state of the traffic network at time t; \hat{x}(t+k|t) is the state prediction for time t+k; \hat{u}(t+k|t) is the control prediction for time t+k, but only \hat{u}(t|t) is implemented with u(t) = \hat{u}(t|t); C and c define the inequality constraints; and D and d define the equality constraints.
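The receding-horizon mechanics of problem (8) can be sketched on a scalar toy system. The dynamics, weights, bounds, and the coarse control grid below are all hypothetical stand-ins; a real implementation would solve the quadratic program (8a)-(8e) with a QP solver instead of enumerating a grid:

```python
# A schematic receding-horizon loop: at each sample time, optimize a
# K-step control sequence, implement only the first move, then roll the
# horizon forward. Toy scalar system; brute-force search stands in for
# the constrained QP of (8a)-(8e).

import itertools

A, B = 1.0, 1.0        # scalar dynamics x(t+1) = A x + B u
Q, R = 1.0, 0.1        # stage weights
K = 3                  # prediction horizon
U_GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]   # coarse stand-in for -1 <= u <= 1

def horizon_cost(x, controls):
    """Cost of (8a) along one predicted trajectory."""
    cost = 0.0
    for u in controls:
        cost += 0.5 * R * u * u          # control penalty, k = 0..K-1
        x = A * x + B * u                # predicted successor state
        cost += 0.5 * Q * x * x          # state penalty, k = 1..K
    return cost

def mpc_step(x):
    """Optimize over the horizon; return only the first control."""
    best = min(itertools.product(U_GRID, repeat=K),
               key=lambda seq: horizon_cost(x, seq))
    return best[0]

x = 3.0
for t in range(6):          # closed-loop simulation
    u = mpc_step(x)         # only u_hat(t|t) is implemented
    x = A * x + B * u       # plant moves; horizon rolls forward
```

Starting from x = 3, the controller drives the state to the origin and then holds it there with zero control, which is the receding-horizon behavior described in the text.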

² A positive semi-definite matrix M induces a vector norm \|x\|_M = \sqrt{x' M x}.


3. Multi-agent model predictive control

This section introduces the concept of linear dynamic network (LDN), which models the traffic flow dynamics and split control problem shown above. It presents a distributed formulation P(t) for LDNs which generalizes the MPC formulation for split control given in Eqs. (8a)-(8e). Further, this section develops a decomposition of P(t) into a set {P_m(t)} of sub-problems and proposes a distributed algorithm for an agent network to reach a solution to P(t) by iteratively solving {P_m(t)}.
3.1. MPC formulation
A dynamic network consists of the interconnection of M sub-systems that forms a graph G = (V, E), where each sub-system is a node in V and each arc (i, j) ∈ E defines a coupling between sub-systems i and j. Vector x_m ∈ R^{n_m} has the local state and u_m ∈ R^{p_m} has the local controls of sub-system m. The state of sub-system m evolves in time depending on its local state, local control signals, and the control signals at the up-stream sub-systems. For discrete-time dynamics, the state equation for sub-system m is:

    x_m(t+1) = A_m x_m(t) + \sum_{i \in I(m)} B_{mi} u_i(t)        (9)

where t ∈ N is the discrete sample time and I(m) = {m} ∪ {i : (i, m) ∈ E} is the set of input neighbors of sub-system m, which includes m and the up-stream sub-systems. The network state is x = (x_1, ..., x_M), whereas its control vector is u = (u_1, ..., u_M). Clearly, the dynamic Eqs. (9) are collectively given by x(t+1) = A x(t) + B u(t) for suitable matrices A and B. Hereafter, the network dynamic system is assumed to be controllable.³
Given the network state x(t), the MPC framework obtains the control signals for time t by solving the following quadratic programming problem:

    P(t) : \min \sum_{m=1}^{M} \phi_m(t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \frac{1}{2} \hat{x}_m(t+k|t)' Q_m \hat{x}_m(t+k|t) + \frac{1}{2} \hat{u}_m(t+k-1|t)' R_m \hat{u}_m(t+k-1|t)        (10a)

    s.to: \hat{x}_m(t|t) = x_m(t), \quad m \in M        (10b)

    For all m \in M, k \in K:

    \hat{x}_m(t+k+1|t) = A_m \hat{x}_m(t+k|t) + \sum_{i \in I(m)} B_{mi} \hat{u}_i(t+k|t)        (10c)

    C_m \hat{u}_m(t+k|t) \geq c_m        (10d)

    D_m \hat{u}_m(t+k|t) = d_m        (10e)

where \hat{x}_m(t+k|t) is sub-system m's state prediction for time t+k calculated at time t, whereas \hat{u}_m(t+k|t) is its predicted control signal; Q_m is positive semi-definite and R_m is positive definite; C_m and c_m (D_m and d_m) define the inequality (equality) constraints; and M = {1, ..., M} is the set with the indices of the sub-systems and K = {0, ..., K-1} defines the prediction horizon.
Only the control signals predicted for time t are implemented, namely u_m(t) = \hat{u}_m(t|t). The other control signals are calculated merely to predict the long-term effects of the present control actions and thereby avoid actions that have poor long-term performance. Because of this predictive feature, the framework is called model predictive control. At the next sample time, t+1, the prediction horizon is rolled forward: the current state x(t+1) is measured, P(t+1) is solved, and new control signals u_m(t+1) are obtained and implemented. The process continues indefinitely, with the horizon receding into the future. This is why such a control framework is also known as rolling-, sliding-, or receding-horizon control.
The test bed is the traffic network depicted in Fig. 4 with 13 one-way roads and six junctions. The state x_3 = (x_6, x_7)' of sub-system three has the number of vehicles in roads 6 and 7, while its control vector u_3 = (u_6, u_7)' has the green time for each road. The coupling graph G appears in Fig. 5. The set of input neighbors to sub-system three is I(3) = {1, 3, 4}. Matrix B_33 expresses the discharge of queues x_3 as a function of green times u_3, while B_31 (B_34) expresses how queues x_3 build up as x_1 (x_4) are emptied. For the purpose of illustration,
    B_{33} = T \begin{pmatrix} -S_6/C & 0 \\ 0 & -S_7/C \end{pmatrix}, \quad
    B_{34} = T \begin{pmatrix} 0 & 0 \\ s_{8,7} S_8/C & s_{9,7} S_9/C \end{pmatrix}, \quad
    B_{31} = T \begin{pmatrix} s_{1,6} S_1/C & s_{2,6} S_2/C & s_{3,6} S_3/C \\ 0 & 0 & 0 \end{pmatrix}

where T (seconds) is the control interval, s_{i,j} is the conversion rate from road i into j, S_i (vehicles/s) is the saturation flow of road i, and C (seconds) is the cycle time. The inequality constraints impose minimum and maximum green times on the phases. The equalities guarantee that the total green time plus lost time (yellow time) adds up to the cycle time.

³ With A being n × n and B being n × m, the pair (A, B) is said to be controllable if the n × nm matrix [B AB A²B ⋯ A^{n-1}B] has full row rank. For a controllable plant x(t+1) = Ax(t) + Bu(t), there exist control vectors u(0), u(1), ..., u(n-1) that force x(n) to the origin regardless of the initial state x(0).
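The rank condition in the footnote is easy to check numerically. A minimal sketch with a hypothetical 2x2 pair:

```python
# A numerical check of the controllability condition: the pair (A, B) is
# controllable when [B, AB, ..., A^(n-1)B] has full row rank.
# The 2x2 example is hypothetical.

import numpy as np

def is_controllable(A, B):
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])    # next block: A^k B
    ctrb = np.hstack(blocks)             # n x (n*m) controllability matrix
    return np.linalg.matrix_rank(ctrb) == n

A = np.array([[1.0, 1.0], [0.0, 1.0]])
b_good = np.array([[0.0], [1.0]])        # input reaches both states via coupling
b_bad = np.array([[1.0], [0.0]])         # second state never moves
```

With `b_good` the controllability matrix has rank 2; with `b_bad` the second state is unreachable, so the check fails.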


Fig. 4. Traffic network. The shaded area indicates sub-system 3 whose incoming queues are modeled by state variables x_6 and x_7.

Fig. 5. Dynamic coupling graph.

3.2. Compact formulation

The elimination of linear dependencies and the aggregation of variables over the prediction horizon leads to an equivalent form of P(t) that simplifies the design of algorithms. Note that sub-system m's state prediction for time t+k is a function of its state at time t and the control signals prior to time t+k:

    \hat{x}_m(t+k|t) = A_m^k x_m(t) + \sum_{l=1}^{k} \sum_{i \in I(m)} A_m^{l-1} B_{mi} \hat{u}_i(t+k-l|t)        (11)

Let vector \hat{u}_m(t) = (\hat{u}_m(t|t), ..., \hat{u}_m(t+K-1|t)) collect the control variables and \hat{x}_m(t) = (\hat{x}_m(t+1|t), ..., \hat{x}_m(t+K|t)) be the state variables predicted over the time horizon. By defining matrices

    \bar{A}_m = \begin{bmatrix} A_m \\ A_m^2 \\ \vdots \\ A_m^K \end{bmatrix}
    \quad and \quad
    \bar{B}_{mi} = \begin{bmatrix} B_{mi} & 0 & \cdots & 0 \\ A_m B_{mi} & B_{mi} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A_m^{K-1} B_{mi} & A_m^{K-2} B_{mi} & \cdots & B_{mi} \end{bmatrix}

where 0 denotes a matrix of zeros of suitable dimension, the state predictions are calculated as

    \hat{x}_m(t) = \bar{A}_m x_m(t) + \sum_{i \in I(m)} \bar{B}_{mi} \hat{u}_i(t)        (12)

Let I_n denote the identity matrix of dimension n. By defining \bar{Q}_m = I_K \otimes Q_m and \bar{R}_m = I_K \otimes R_m in terms of the Kronecker product⁴, and using Eq. (12), the objective term \phi_m(t) becomes

    \phi_m(t) = \frac{1}{2} x_m(t)' \bar{A}_m' \bar{Q}_m \bar{A}_m x_m(t) + \sum_{i \in I(m)} x_m(t)' \bar{A}_m' \bar{Q}_m \bar{B}_{mi} \hat{u}_i(t) + \frac{1}{2} \sum_{i \in I(m)} \sum_{j \in I(m)} \hat{u}_i(t)' \bar{B}_{mi}' \bar{Q}_m \bar{B}_{mj} \hat{u}_j(t) + \frac{1}{2} \hat{u}_m(t)' \bar{R}_m \hat{u}_m(t)        (13)

Now, define the following vectors, matrices, and constant:

    g_{mi}(t) = \bar{B}_{mi}' \bar{Q}_m \bar{A}_m x_m(t) \quad for \ i \in I(m)        (14a)

    H_{mij} = \bar{B}_{mi}' \bar{Q}_m \bar{B}_{mj} \quad for \ i, j \in I(m), \ i \neq m \ or \ j \neq m        (14b)

    H_{mmm} = \bar{B}_{mm}' \bar{Q}_m \bar{B}_{mm} + \bar{R}_m        (14c)

    c(t) = \sum_{m \in M} \frac{1}{2} x_m(t)' \bar{A}_m' \bar{Q}_m \bar{A}_m x_m(t)        (14d)

Then, problem P(t) becomes

    P(t) : \min \sum_{m \in M} \sum_{i \in I(m)} g_{mi}(t)' \hat{u}_i(t) + \frac{1}{2} \sum_{m \in M} \sum_{i \in I(m)} \sum_{j \in I(m)} \hat{u}_i(t)' H_{mij} \hat{u}_j(t) + c(t)        (15a)

    s.to: \bar{C}_m \hat{u}_m(t) \geq \bar{c}_m, \quad m \in M        (15b)

    \bar{D}_m \hat{u}_m(t) = \bar{d}_m, \quad m \in M        (15c)

where \bar{C}_m = I_K \otimes C_m, \bar{D}_m = I_K \otimes D_m, and \bar{c}_m = (c_m', ..., c_m')' and \bar{d}_m = (d_m', ..., d_m')' have appropriate dimensions.
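The lifted matrices of Eqs. (11)-(12) and the Kronecker-product weights can be assembled mechanically. The sketch below treats a single sub-system with no neighbors (I(m) = {m}) and hypothetical 2x2 dynamics, and checks that the stacked prediction agrees with step-by-step simulation:

```python
# A sketch of the condensed prediction matrices of Eq. (12) for one
# sub-system with no neighbors. A_bar stacks A^1..A^K; B_bar is block
# lower triangular; Q_bar = I_K (Kronecker) Q_m as in the text.
# All numeric data are hypothetical.

import numpy as np

def lift(A, B, K):
    n, p = B.shape
    A_bar = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, K + 1)])
    B_bar = np.zeros((K * n, K * p))
    for r in range(K):                 # block (r, c) holds A^(r-c) B
        for c in range(r + 1):
            B_bar[r*n:(r+1)*n, c*p:(c+1)*p] = np.linalg.matrix_power(A, r - c) @ B
    return A_bar, B_bar

A = np.array([[1.0, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = 4
A_bar, B_bar = lift(A, B, K)
Q_bar = np.kron(np.eye(K), np.diag([1.0, 0.5]))    # I_K (x) Q_m

# The stacked prediction x_hat = A_bar x0 + B_bar u matches step-by-step
# simulation of x(t+1) = A x + B u over the horizon.
x0 = np.array([1.0, -1.0])
u = np.array([0.5, -0.2, 0.0, 0.1])
x_hat = A_bar @ x0 + B_bar @ u
```

Once the predictions are condensed this way, the quadratic objective (13) follows by direct substitution, which is exactly how the vectors and matrices of (14a)-(14d) arise.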
Here, the issue is how a network of distributed agents solves P(t) instead of a centralized agent. In what follows, we develop a decomposition of P(t) into a set of coupled sub-problems {P_m(t)} and outline a distributed solution protocol.
3.3. Problem decomposition
For the distribution of decision-making, an agent m decides upon the values of the local control variables of sub-system m. The values u_m(t) are obtained by solving a local optimization problem P_m(t) at each sample time. The design of the sub-problem set {P_m(t)} and the couplings among the agents is the so-called problem decomposition. The decomposition is said to be perfect if each sub-problem P_m(t) encompasses all of the objective terms and constraints of P(t) that depend on \hat{u}_m(t). Models and algorithms for perfect and approximate decomposition are found in (Camponogara and Talukdar, 2004, 2005). For a perfect decomposition, let:
• O(m) = {i : m ∈ I(i), i ≠ m} be the set of output neighbors of sub-system m, that is, any sub-system i whose state \hat{x}_i(t) is affected by \hat{u}_m(t);
• C(m) = {(i, j) ∈ I(m) × I(m) : i = m or j = m} be the sub-system pairs of quadratic terms in \phi_m(t) that depend on \hat{u}_m(t); and
• C(m, k) = {(i, j) ∈ I(k) × I(k) : i = m or j = m} be the pairs of quadratic terms in \phi_k(t), k ∈ O(m), that depend on \hat{u}_m(t).
In the sample traffic network (Figs. 4 and 5), I(1) = {1}, O(1) = {2, 3, 5, 6}, C(1) = {(1, 1)}, and C(1, 3) = {(1, 3), (1, 4), (1, 1), (3, 1), (4, 1)}. Notice that \hat{u}_m(t) can affect the state of sub-systems other than I(m) ∪ O(m). For instance, sub-system 1 is coupled to sub-system 4 via sub-system 3, but 4 ∉ I(1) ∪ O(1). The notion of neighborhood establishes the interdependence among sub-systems. Agent m's view of the network is divided into three sets:
• local variables: the variables in vector u_m(t);
• neighborhood variables: all the variables in vector y_m(t) = (u_i(t) : i ∈ N(m)), where N(m) = I(m) ∪ {i : (i, j) ∈ C(m, k), k ∈ O(m)} − {m} is the neighborhood of agent m. The neighborhood of agent m consists of the sub-systems other than m that are affected by the decision u_m(t) or whose decisions affect x_m(t). Notice that O(m) ⊆ N(m); and
• remote variables: all of the other variables, which consist of vector z_m(t) = (u_i(t) : i ∉ N(m) ∪ {m}).
From agent m's viewpoint, u(t) = (u_m(t)', y_m(t)', z_m(t)')'. Let f(\hat{u}(t)) = \sum_{m \in M} \phi_m(t) denote the objective function of P(t). A perfect problem decomposition requires the local problem P_m(t) to account for all the dependencies with the neighbors of agent m. This is achieved if P_m(t) is obtained from P(t) by (i) discarding from the objective f the terms not involving \hat{u}_m(t) and (ii) dropping the constraints not associated with agent m. Formally, agent m's local problem is

⁴ The operator ⊗ denotes the Kronecker product. \bar{Q}_m is a block diagonal matrix with K blocks, each of which being the matrix Q_m.


    P_m(t; \hat{y}_m(t)) : \min f_m = \frac{1}{2} \hat{u}_m(t)' H_m \hat{u}_m(t) + g_m(t)' \hat{u}_m(t)        (16a)

    s.to: \bar{C}_m \hat{u}_m(t) \geq \bar{c}_m        (16b)

    \bar{D}_m \hat{u}_m(t) = \bar{d}_m        (16c)

where H_m is a suitable matrix and g_m(t) is a suitable vector. A step-by-step procedure to obtain H_m and g_m(t) from H_{mij} and g_{mi}(t) is developed in (Camponogara and de Oliveira, 2009). Evidently, a perfect decomposition ensures that

    f(\hat{u}(t)) = f_m(\hat{u}_m(t), \hat{y}_m(t)) + f^{-m}(\hat{y}_m(t), \hat{z}_m(t)) + c(t)

for each agent m, where f^{-m} is a suitable function. To simplify notation, hereafter P_m, P_m(t), and P_m(t; \hat{y}_m(t)) will be shorthands for sub-problem (16a)-(16c).
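The neighbor sets used throughout Section 3.3 can be computed directly from the coupling graph. The sketch below uses a hypothetical three-node chain, not the network of Figs. 4 and 5, and follows the set definitions given in the text: I(m) collects m and its up-stream sub-systems, O(m) the down-stream ones, and N(m) every other sub-system sharing a quadratic term with m:

```python
# A sketch of deriving the neighbor sets of Section 3.3 from a coupling
# graph. The edge list is hypothetical (a simple chain), not the graph
# of Fig. 5.

def neighbor_sets(M, edges):
    """edges: set of arcs (i, j) meaning u_i enters the dynamics of j."""
    I = {m: {m} | {i for (i, j) in edges if j == m} for m in M}
    O = {m: {i for i in M if m in I[i] and i != m} for m in M}
    # i shares a quadratic term with m in some phi_k, k in O(m),
    # exactly when i belongs to I(k):
    N = {m: (I[m] | set().union(*(I[k] for k in O[m]))) - {m} for m in M}
    return I, O, N

M = {1, 2, 3}
edges = {(1, 2), (2, 3)}          # chain: 1 feeds 2, 2 feeds 3
I, O, N = neighbor_sets(M, edges)
```

On this chain, agent 2 neighbors both 1 (its up-stream input) and 3 (whose objective couples u_2 with u_3), while agents 1 and 3 neighbor only agent 2.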
A perfect problem decomposition leads to some relationships between P(t) and {P_m(t)} that are handy for the design of a distributed algorithm for the agent network. Assumptions and resulting properties are presented below. The reader can refer to (Camponogara and de Oliveira, 2009) for the demonstrations and some illustrations.

Proposition 1. A solution \hat{u}(t) satisfies first-order KKT (Karush-Kuhn-Tucker) optimality conditions for P(t) if, and only if, \hat{u}_m(t) satisfies the KKT conditions of P_m(t; \hat{y}_m(t)) for each m ∈ M.

Definition 1. (Feasible spaces) The feasible spaces are:
• U_m = {\hat{u}_m : \bar{C}_m \hat{u}_m \geq \bar{c}_m, \bar{D}_m \hat{u}_m = \bar{d}_m} is the feasible space for P_m(t);
• U = U_1 × ⋯ × U_M is the feasible space for P(t); and
• Y_m = \prod_{i \in N(m)} U_i is the feasible space for agent m's neighborhood variables.

Assumption 1. (Compactness) The feasible space U is a compact set.

Assumption 2. (Strict feasibility) There exists \hat{u} ∈ U such that \bar{C}_m \hat{u}_m > \bar{c}_m and \bar{D}_m \hat{u}_m = \bar{d}_m for all m ∈ M.

Compactness⁵ is a plausible assumption because control signals are invariably bounded. So is the strict feasibility assumption: if the interior of U is empty, then some inequalities are indeed equalities and should be regarded as such.

Proposition 2. Problem P(t) given by (15a)-(15c) is convex.

Corollary 1. Sub-problem P_m(t; \hat{y}_m(t)) is convex.

Proposition 3. (Optimality conditions) Because f(\hat{u}(t)) is a convex function and U is a convex set, \hat{u}(t)* is a local minimum of f over U if and only if:

    \nabla f(\hat{u}(t)*)' (\hat{u}(t) - \hat{u}(t)*) \geq 0, \quad \forall \hat{u}(t) \in U        (17)

A vector \hat{u}(t) satisfying condition (17) is called a stationary point.

Corollary 2. (Local optimality conditions) \hat{u}(t)* is a local minimum for P(t) if, and only if, \hat{u}_m(t)* is a local minimum of P_m(t; \hat{y}_m(t)*) for all m ∈ M.

This corollary means that an overall control vector that cannot be unilaterally improved by a single agent (a fixed point) is locally optimal for all sub-problems {P_m(t)} and therefore also locally optimal for P(t). As the problems are all convex, a local optimum induces a global optimum.
3.4. Multi-agent distributed control
A perfect problem decomposition establishes an equivalence between an optimal solution to P(t) and a stationary solution for the sub-problem network {P_m(t)}. How do the agents reach a fixed point û(t)*? Below, we present a distributed algorithm for the agents to arrive at a stationary point for {P_m(t)}, which works by generating a sequence û(t)^k = (û_1^k(t), ..., û_M^k(t)) of iterates. Starting with a feasible control vector û(t)^0, at each iteration k the agents exchange their decisions locally, coordinate the iterations to preclude coupled agents from acting simultaneously, and keep working until convergence is attained or time is up. At this point, the control signals are implemented and the horizon is rolled forward to the next sample time. Two fundamental assumptions for the convergence of the agents' iterates to a stationary solution are stated below.

⁵ A set S is compact if for any given sequence {x^k} of vectors in S there exists a subsequence {x^{k_i}} which converges to a point x* in S. Any compact set is closed and bounded.


Assumption 3 (Synchronous work). If agent m revises its decisions at iteration k, then:

(i) agent m uses ŷ_m^k(t) = (û_i^k(t) : i ∈ N(m)) to produce an approximate solution to P_m(t, ŷ_m^k(t)) that becomes its next iterate û_m^{k+1}(t);
(ii) all the neighbors of agent m keep their decisions at iteration k, that is, û_i^{k+1}(t) = û_i^k(t) for all i ∈ N(m).

Assumption 4 (Continuous work). If û(t)^k is not a stationary point for all problems in {P_m(t)}, then at least one agent m for which û_m^k(t) is not a stationary point for P_m(t) produces a new iterate û_m^{k+1}(t) by approximately solving P_m(t, ŷ_m^k(t)).
Condition (ii) of Assumption 3 and Assumption 4 hold by arranging the agents to iterate repeatedly in a sequence ⟨S_1, ..., S_r⟩ where S_i ⊆ M, ∪_{i=1}^r S_i = M, and all distinct pairs m, n ∈ S_i are non-neighbors for all i. ⟨S_1, S_2, S_3⟩ with S_1 = {2, 4, 6}, S_2 = {3, 5}, and S_3 = {1} is a valid sequence for the illustrative scenario. Actually, this sequence is too restrictive because the dynamic Eq. (9) assumes that u_i(t), i ∈ I(m), influences the entire state vector x_m(t+1). This is not the case in the traffic scenario. While the control signals u_1(t) and u_4(t) influence x_3(t+1) as a whole in the model, u_1(t) influences only the part of x_3(t+1) associated with x_6, whereas u_4(t) influences only the part associated with x_7. Thus, S_1 = {2, 4, 6}, S_2 = {3, 5}, and S_3 = {1, 4} is also a plausible iteration sequence for the agents. Time-varying sequences that uphold the conditions and synchronization protocols are other alternatives.
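A sequence of this kind can be built offline by greedily coloring the agents' neighborhood graph so that no two coupled agents share a group. The sketch below is one way to do this; the neighbor sets are hypothetical stand-ins (only N(6) = {1, 5} is stated in the text), not the actual topology of Fig. 4.

```python
def iteration_groups(neighbors):
    """Greedily partition the agents into groups S_1, ..., S_r such that no
    two agents in the same group are neighbors, so each group can iterate
    simultaneously without violating condition (ii) of Assumption 3."""
    groups = []
    for m in sorted(neighbors):
        for group in groups:          # first group containing no neighbor of m
            if not group & neighbors[m]:
                group.add(m)
                break
        else:
            groups.append({m})        # otherwise open a new group for m
    return groups

# Hypothetical symmetric neighbor sets for six agents (illustrative only).
nbrs = {1: {3, 6}, 2: {3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {1, 5}}
groups = iteration_groups(nbrs)
print(groups)
```

Because the greedy pass visits agents in a fixed order, the partition is deterministic; any valid coloring of the neighborhood graph would serve equally well as an iteration sequence.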
Of relevance is the way an agent m solves P_m(t) approximately so that the iterates û(t)^k are drawn to a stationary point of {P_m(t)}. To this end, we developed a distributed algorithm based on the feasible direction method (Bertsekas, 1995), which is only outlined below but fully developed in Camponogara and de Oliveira (2009). The distributed feasible direction method is specially tailored for LDNs, taking advantage of the local dynamic and constraint structure, which is not present in frameworks for more general settings (Camponogara et al., 2002; Camponogara and Talukdar, 2007).
At the current iterate û(t)^k, agent m computes a locally descent direction d̂_m^k(t) = ū_m^k(t) − û_m^k(t) by solving the linear programming (LP) problem

D_m^k(t) :  min_{ū_m^k(t)}  ∇_{û_m(t)} f_m(û_m^k(t), ŷ_m^k(t))′ (ū_m^k(t) − û_m^k(t))    (18a)

            s. to :  C_m ū_m^k(t) ≥ c_m    (18b)

                     D_m ū_m^k(t) = d_m    (18c)
A direction d̂_m^k(t) ≠ 0 is locally feasible at (û_m^k(t), ŷ_m^k(t)) if û_m^k(t) + α_m d̂_m^k(t) ∈ U_m for all sufficiently small α_m > 0. A locally feasible direction is locally descent at a nonstationary point (û_m^k(t), ŷ_m^k(t)) if ∇f_m(û_m^k(t), ŷ_m^k(t))′ d̂_m^k(t) < 0. Notice that the solution to D_m^k(t) produces a locally descent direction if one exists.
The next iterate û_m^{k+1}(t) = û_m^k(t) + α_m^k(t) d̂_m^k(t) is obtained by finding a step α_m^k(t) that satisfies the Armijo rule. Given β_m, σ_m ∈ (0, 1), α_m^k(t) = β_m^{a_m} where a_m is the smallest nonnegative integer for which:

f_m(û_m^k(t) + β_m^{a_m} d̂_m^k(t), ŷ_m^k(t)) ≤ f_m(û_m^k(t), ŷ_m^k(t)) + σ_m β_m^{a_m} ∇f_m(û_m^k(t), ŷ_m^k(t))′ d̂_m^k(t)

Agent-iterations as delineated above, together with Assumptions 3 and 4, ensure that the iterates û(t)^k arrive at a stationary point of {P_m(t)} and thereby a solution to P(t). Some technical details are needed for the convergence proof, but effectively the agent network implements a distributed feasible direction method for quadratic programming (Camponogara and de Oliveira, 2009). The procedure used by each agent m at iteration k to solve {P_m(t)} is outlined below.
Agent-iteration(t, m, k)
1: if agent m cannot revise its decisions in iteration k then
2:   û_m^{k+1}(t) = û_m^k(t)
3:   return
4: end if
5: Agent m obtains ŷ_m^k(t) = (û_i^k(t) : i ∈ N(m)) from its neighbors
6: Agent m solves D_m^k(t) to obtain d̂_m^k(t)
7: if d̂_m^k(t) = 0 then
8:   û_m^{k+1}(t) = û_m^k(t)    ▹ (û_m^k(t), ŷ_m^k(t)) is stationary for P_m(t)
9:   return
10: end if
11: a_m = 0
12: while f_m(û_m^k(t) + β_m^{a_m} d̂_m^k(t), ŷ_m^k(t)) > f_m(û_m^k(t), ŷ_m^k(t)) + σ_m β_m^{a_m} ∇f_m(û_m^k(t), ŷ_m^k(t))′ d̂_m^k(t) do
13:   a_m = a_m + 1
14: end while
15: û_m^{k+1}(t) = û_m^k(t) + β_m^{a_m} d̂_m^k(t)


The iteration procedure is relatively simple. The most computationally demanding step is the solution of the linear program, for which fast and robust LP solvers are available off-the-shelf.
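As a concrete illustration of lines 11–15, the backtracking below computes the Armijo step β_m^{a_m}. The objective, gradient, and descent direction are made-up stand-ins for f_m, ∇f_m, and the LP solution d̂_m^k(t); only the search logic mirrors the procedure.

```python
import numpy as np

def armijo_step(f, grad_f, u, d, beta=0.5, sigma=0.1, max_pow=50):
    """Backtracking line search (lines 11-14 of Agent-iteration): returns the
    step beta**a for the smallest integer a >= 0 satisfying the Armijo rule
    f(u + beta**a d) <= f(u) + sigma * beta**a * grad_f(u) @ d.
    max_pow is a defensive cap, not part of the original procedure."""
    fu, slope = f(u), grad_f(u) @ d
    a = 0
    while f(u + beta**a * d) > fu + sigma * beta**a * slope and a < max_pow:
        a += 1
    return beta**a

# Toy stand-in for a sub-problem objective: f(u) = 0.5 * ||u - c||^2.
c = np.array([2.0, -1.0])
f = lambda u: 0.5 * np.sum((u - c) ** 2)
grad = lambda u: u - c

u = np.zeros(2)
d = -grad(u)                 # steepest descent is a locally descent direction
step = armijo_step(f, grad, u, d)
u_next = u + step * d
assert f(u_next) < f(u)      # the Armijo step strictly decreases the objective
```

With an overly long direction such as -4*grad(u), the loop backtracks twice and returns the step 0.25, illustrating why the rule prevents overshooting.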
3.4.1. Analytical computation of feasible descent direction
The constraint structure of the linear dynamic network for signaling split control allows D_m^k(t) to be solved analytically. To this end, replace the split u_i(t) of a road i at cycle t with u_i(t) = l_i + δ_i(t), where l_i is the lower bound for green time and δ_i(t) is the green-time extension. If the upper bound for u_i(t) is the cycle time C, then constraints (18b) and (18c) become

∑_{u_i ∈ u_m} δ̄_i(t+k|t) = C − ∑_{u_i ∈ u_m} l_i,  ∀k ∈ K

δ̄_i(t+k|t) ≥ 0,  ∀u_i ∈ u_m,  ∀k ∈ K

Such a constraint structure is separable, having an independent constraint set for each prediction time k. Further, any basic solution will have precisely one nonzero variable δ̄_i(t+k|t) for each k. The net result is that an optimal basic solution to D_m^k(t) is found by defining the basic variables as those corresponding to the most negative entries of the gradient ∇f_m(û_m^k(t), ŷ_m^k(t)).
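In code, this analytic solution amounts to one argmin per prediction step: the extension with the most negative gradient entry absorbs the whole green-time budget. The gradient values and bounds below are made up for illustration.

```python
import numpy as np

def solve_direction_lp(grad, n_links, K, C, l):
    """Analytic optimal basic solution of the direction LP D_m^k(t): for each
    prediction step k, a single basic variable -- the green-time extension
    with the most negative gradient entry -- takes the whole budget
    C - sum(l); all other extensions are zero."""
    budget = C - np.sum(l)
    delta = np.zeros(K * n_links)
    for k in range(K):
        block = grad[k * n_links:(k + 1) * n_links]
        delta[k * n_links + np.argmin(block)] = budget
    return delta                 # the splits are then u = l + delta per step

# Made-up gradient for a junction with 3 approaches and K = 2 steps.
grad = np.array([0.3, -1.2, 0.1,     # step k = 0
                 -0.4, 0.2, -0.9])   # step k = 1
delta = solve_direction_lp(grad, n_links=3, K=2, C=120.0, l=np.full(3, 20.0))
print(delta)  # the budget of 60 s goes to entry 1 (k = 0) and entry 5 (k = 1)
```

This replaces a generic LP solver call with O(K · n) arithmetic, which is the practical payoff of the separable constraint structure.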
3.4.2. Conflict resolution
The multi-agent MPC framework can be viewed as a dynamic game (Camponogara et al., 2006). Each agent m has an implicit reaction function R_m(y_m) determining the agent's response u_m to the decisions y_m of its neighboring agents. The reaction function is computed by solving sub-problem P_m(t). Thus, the agents resolve conflicts by iteratively reacting to one another's decisions until they reach a fixed point. Such a fixed point is a Nash point for the game, that is, a combined decision vector u which cannot be improved unilaterally by any agent with respect to its objective. On the one hand, the agents are selfish to the extent that they are driven by their own interests, as quantified by their objective functions. On the other hand, this selfish behavior leads to a global optimum, since the objectives of the agents are aligned with the global objective in problem P(t).
3.4.3. Multi-agent MPC as a multi-agent system
All in all, the multi-agent MPC framework falls within the class of multi-agent systems, which are systems composed of multiple interacting intelligent agents having the characteristics of autonomy, local views, and decentralization (Wooldridge, 2002). The agents have limited autonomy because they follow the iteration and communication protocol imposed by Assumptions 3 and 4, but each agent m is free to decide upon the values of parameters β_m and σ_m based on what worked best in the past, to perform multiple iterations rather than simply satisfying the Armijo rule, and even to utilize a totally different algorithm that solves P_m or finds a near-optimal solution which implicitly satisfies the Armijo rule. The views of the agents are local because they sense and decide upon the values of only a fraction of the state and control variables, respectively. And the agents are decentralized since no single agent has a complete view of, or operates, the entire network.
3.5. Closed-loop stability
The MPC approach is a kind of feedback control: it repeatedly revises the predicted control actions over a receding horizon as new state measurements are received. However, the optimizations do not explicitly consider the system behavior beyond the prediction horizon, potentially leading the system to an unstable mode. For simplicity, let the origin (x, u) = (0, 0) be an equilibrium point for the dynamic network x(t+1) = Ax(t) + Bu(t). It is important to mention that the stabilization conditions assume that the prediction model is perfect, x̂(t+k|t) = x(t+k), and that a global optimum is found for the optimization problems. The two main strategies for closed-loop stability of MPC are terminal constraints and infinite horizons (Maciejowski, 2002). The terminal-constraint strategy drives the final state to the origin, that is, it introduces the constraint x̂_m(t+K|t) = 0 for all m ∈ M. A positive-definite objective function and these terminal constraints then ensure closed-loop stability of the network. Notice that this strategy would couple the sub-systems in more complex ways than the local constraint structure given by Eqs. (10d) and (10e). Instead, the penalty term (1/ε)‖x̂_m(t+K|t)‖² for all m ∈ M can be introduced in the objective to retain the local structure of the network. Notice that x̂_m(t+K|t) tends to 0 as ε → 0.

An infinite prediction horizon also ensures closed-loop stability. As the optimizations would then not be in finite-dimensional space, the infinite-horizon problem has to be expressed in terms of a finite set of control variables. If the network is intrinsically stable (all eigenvalues of A are inside the unit disc), this strategy introduces a terminal cost ∑_{m∈M} x̂_m(t+K|t)′ W_m x̂_m(t+K|t), where W_m = ∑_{i=0}^∞ (A_m′)^i Q_m A_m^i is convergent because A_m is stable. For an unstable plant, W_m does not converge and the unstable modes must be forced to zero at the end of the prediction. The network state is decomposed in terms of stable (x_m^s = A_m^s x_m) and unstable (x_m^u = A_m^u x_m) modes via a Jordan decomposition. P(t) is augmented with a terminal constraint x̂_m^u(t+K|t) = 0 for all m ∈ M and a terminal cost ∑_{m∈M} x̂_m^s(t+K|t)′ W_m^s x̂_m^s(t+K|t), where W_m^s is obtained similarly to W_m. The terminal constraints couple the sub-systems through the constraint set, but they can be approximated with a penalty term, as explained above, to preserve the local structure of P(t).

All in all, the distributed agents can implement the terminal-cost strategy if the open-loop plant is stable, or otherwise introduce terminal constraints on the unstable modes while enforcing terminal costs on the stable modes. The agents


enforce terminal constraints approximately using penalty terms. Regardless of the strategy or combination thereof, the agents can ensure closed-loop stability without compromising the local structure of {P_m(t)}.
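The terminal-cost matrix W_m is the solution of a discrete Lyapunov equation, so it can be computed offline by simple fixed-point iteration. The sketch below uses an arbitrary stable matrix as a stand-in for a sub-system matrix A_m.

```python
import numpy as np

def terminal_weight(A, Q, tol=1e-12, max_iter=10000):
    """Terminal-cost matrix W = sum_{i>=0} (A')^i Q A^i for a stable A,
    computed as the fixed point of the discrete Lyapunov equation
    W = Q + A' W A; the series converges only if all |eig(A)| < 1."""
    assert np.max(np.abs(np.linalg.eigvals(A))) < 1.0, "A must be stable"
    W = Q.copy()
    for _ in range(max_iter):
        W_next = Q + A.T @ W @ A
        if np.max(np.abs(W_next - W)) < tol:
            return W_next
        W = W_next
    raise RuntimeError("Lyapunov recursion did not converge")

A = np.array([[0.5, 0.1], [0.0, 0.8]])   # stable: eigenvalues 0.5 and 0.8
Q = np.eye(2)
W = terminal_weight(A, Q)
# W satisfies the Lyapunov equation, so the residual vanishes
assert np.allclose(W, Q + A.T @ W @ A)
```

In practice one would call a library routine such as SciPy's discrete Lyapunov solver; the recursion above just makes the convergence argument of the text explicit.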
4. Simulation analysis
Fig. 4 shows the urban traffic network that served as a test bed for simulations with the multi-agent model predictive control strategy and the standard TUC two-stage regulator. First, both strategies are evaluated through a numerical analysis based on the nominal network model. Besides a comparison of these strategies, the study included an analysis of the convergence of the solutions produced by multi-agent MPC to the optimal solution obtained with centralized MPC. Second, simulations were conducted with professional traffic simulation software to assess the behavior of both strategies in a more realistic scenario, subject to model discrepancies. Third, the test bed network was expanded by adding two junctions, four state variables, and four control signals to illustrate the flexibility and scalability of multi-agent MPC.
4.1. Network specification
The test bed network was designed to represent an urban perimeter traversed by high-flow avenues, providing a convenient scenario for split-control evaluation. Nevertheless, the complexity of the network is influenced by other variables, including cycle time and offset between junctions. The network consists only of one-way links, to diminish the influence of these control parameters and of the network specification on the performance metrics. Further, offset control is not implemented and the cycle time is defined as a multiple of the shortest Webster cycle to balance the internal streams of vehicles. To mitigate the influence of the network specification and the uncontrolled variables (offset and cycle time), three scenarios were appraised.
4.1.1. Scenario I: distinct cycles (C≠)
Cycle times and nominal splits were computed through a method known as Webster's procedure (Webster, 1959), which yields optimal cycle times and signaling splits for isolated junctions. The procedure is summarized by the equations below:

C_j = (1.5 L_j + 5) / (1 − ∑_{k∈I_j} q_k^N / S_k)   and   u_{j,i} = (q_i^N / S_i) (C_j − L_j) / (∑_{k∈I_j} q_k^N / S_k),   for all i ∈ F_j

where C_j is the cycle of junction j; L_j denotes the lost time of the same junction; q_i^N is the nominal inflow to link i in vehicles per hour; S_i is the saturation flow of link i; I_j is the set of input links of junction j; u_{j,i} is the nominal green time allocated to phase i of junction j; and F_j is the set of phases of the controlled junction j.
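The two equations can be checked against Table 1. The sketch below implements them directly; the lost time L = 18 s for junction 1 is an assumption, chosen because it reproduces the published cycle of 192 s.

```python
def webster(q, S, L):
    """Webster's procedure for an isolated junction: returns the cycle time
    C_j (s) and the green splits u_{j,i} (s) from nominal inflows q (veh/h),
    saturation flows S (veh/h), and lost time L (s)."""
    Y = sum(qi / Si for qi, Si in zip(q, S))   # junction flow ratio
    C = (1.5 * L + 5.0) / (1.0 - Y)            # optimal cycle length
    splits = [(qi / Si) * (C - L) / Y for qi, Si in zip(q, S)]
    return C, splits

# Junction 1 of the test bed (inflows and saturation flows from Table 1);
# the lost time L = 18 s is an inferred assumption, not stated in the paper.
C, splits = webster(q=[1000, 1100, 900], S=[3600, 3600, 3600], L=18.0)
print(round(C, 1), [round(s, 1) for s in splits])  # 192.0 [58.0, 63.8, 52.2]
```

Note that the splits sum to C − L, so the allocated green plus the lost time fills the whole cycle.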
The cycle times and splits resulting from the application of Webster's procedure appear in Table 1. The cycles and splits are not optimal, since the junctions are not isolated and operate synchronously (their offset is zero). In fact, vehicle progression is erratic and difficult since the junctions have distinct cycles. In this scenario, the high traffic inflows concentrated in the main avenues x_2 and x_8 make progression even more difficult.
4.1.2. Scenario II: equal cycles (C=)
Cycle times were set to 120 s, providing a harmonic progression of vehicles and minimizing the undesirable effects of the lack of synchronization. For this scenario, the traffic inflows were more balanced, to avoid oscillations in the internal flows and thereby diminish the effects of synchronization. With the given cycle time, the nominal splits were obtained by Webster's procedure. The nominal traffic control parameters are presented in Table 2.

Table 1
Nominal parameters of the distinct cycles scenario.

Link z | Control u_{j,z} | Saturation S_z (veh/h) | Nominal inflow q_z^N (veh/h) | Nominal split u_{j,z}^N (s) | Cycle C (s)
x1  | u_{1,1} | 3600 | 1000 | 58.0 | 192.0
x2  | u_{1,2} | 3600 | 1100 | 63.8 | 192.0
x3  | u_{1,3} | 3600 | 900  | 52.2 | 192.0
x4  | u_{2,1} | 3600 | –    | 46.7 | 132.6
x5  | u_{2,2} | 3600 | –    | 73.9 | 132.6
x6  | u_{3,1} | 1800 | –    | 26.3 | 81.9
x7  | u_{3,2} | 3600 | –    | 43.6 | 81.9
x8  | u_{4,1} | 3600 | 1800 | 89.2 | 165.6
x9  | u_{4,2} | 3600 | 1300 | 64.4 | 165.6
x10 | u_{5,1} | 3600 | –    | 50.9 | 91.7
x11 | u_{5,2} | 1800 | –    | 28.8 | 91.7
x12 | u_{6,1} | 3600 | –    | 75.6 | 131.3
x13 | u_{6,2} | 3600 | –    | 43.7 | 131.3

Table 2
Nominal parameters of the equal cycles scenario.

Link z | Control u_{j,z} | Saturation S_z (veh/h) | Nominal inflow q_z^N (veh/h) | Nominal split u_{j,z}^N (s) | Cycle C (s)
x1  | u_{1,1} | 3600 | 800  | 28.8 | 120
x2  | u_{1,2} | 3600 | 1300 | 46.8 | 120
x3  | u_{1,3} | 3600 | 900  | 32.4 | 120
x4  | u_{2,1} | 3600 | –    | 72.5 | 120
x5  | u_{2,2} | 3600 | –    | 39.5 | 120
x6  | u_{3,1} | 1800 | –    | 54.9 | 120
x7  | u_{3,2} | 3600 | –    | 57.1 | 120
x8  | u_{4,1} | 3600 | 900  | 63.0 | 120
x9  | u_{4,2} | 3600 | 700  | 49.0 | 120
x10 | u_{5,1} | 3600 | –    | 59.8 | 120
x11 | u_{5,2} | 1800 | –    | 52.2 | 120
x12 | u_{6,1} | 3600 | –    | 54.7 | 120
x13 | u_{6,2} | 3600 | –    | 57.3 | 120
Table 3
Nominal parameters of the equal cycles scenario with crash simulation.

Link z | Control u_{j,z} | Saturation S_z (veh/h) | Nominal inflow q_z^N (veh/h) | Nominal split u_{j,z}^N (s) | Cycle C (s)
x1  | u_{1,1} | 3600 | 800        | 28.8 | 120
x2  | u_{1,2} | 3600 | 1300       | 46.8 | 120
x3  | u_{1,3} | 3600 | 900/0/1500 | 32.4 | 120
x4  | u_{2,1} | 3600 | –          | 72.5 | 120
x5  | u_{2,2} | 3600 | –          | 39.5 | 120
x6  | u_{3,1} | 1800 | –          | 54.9 | 120
x7  | u_{3,2} | 3600 | –          | 57.1 | 120
x8  | u_{4,1} | 3600 | 900        | 63.0 | 120
x9  | u_{4,2} | 3600 | 700        | 49.0 | 120
x10 | u_{5,1} | 3600 | –          | 59.8 | 120
x11 | u_{5,2} | 1800 | –          | 52.2 | 120
x12 | u_{6,1} | 3600 | –          | 54.7 | 120
x13 | u_{6,2} | 3600 | –          | 57.3 | 120
4.1.3. Scenario III: equal cycles with crash simulation (C=/crash)
This scenario has the same characteristics as the previous one, except for the simulation of a car crash in link x_3. This incident occurs at the 15th minute of simulation and blocks the link for 15 min. The intent is to temporarily suspend the traffic flow through the link. When the link is unblocked at the 30th minute of simulation, the inflow of link x_3 reaches a rate higher than the nominal rate for the remainder of the simulation, because of the accumulation of vehicles during the incident. Table 3 presents the values for this scenario.
4.1.4. Remarks
Table 4 shows the turning rates, which are common to all scenarios.⁶ The first column gives the origin link of the conversion, while the remaining columns define the destination links. The data characterizing a scenario and the turning rates are sufficient to determine the matrix B (see Section 2) and thereby obtain the dynamic system x(t+1) = Ax(t) + Bu(t). Scenarios II and III share the same dynamic system, but differ in the input demand f(k), which simulates the suspension of traffic flow on link x_3 for 15 min in scenario III. A comparison between scenarios I and II aims to verify whether one of the control strategies is more suitable for an erratic progression (distinct cycle times) or a smoother progression (equal cycle times). A comparison between scenarios II and III seeks to assess the robustness of the control strategies when the demands deviate drastically from the nominal demands.
4.2. Numerical results
This section presents results from numerical simulation using Eq. (4) as a model for the traffic system. The simulation can be implemented with scientific computation software, such as MATLAB and SCILAB, or even with programming languages such as PYTHON and C. The network's actual state is calculated at each interval based on the given initial conditions, the previous state, and the discrete model. To make the control design model different from the simulation model, a disturbance was introduced in the simulation model
⁶ Turning rates are not reported for x_4, x_5, x_12, and x_13 because they are exit links.



Table 4
Nominal turning rates for the test bed network.

s_{w,j} | x4   | x5   | x6   | x7   | x10  | x11  | x12  | x13
x1      | 0.20 | –    | 0.05 | –    | –    | 0.05 | –    | 0.70
x2      | 0.25 | –    | 0.30 | –    | –    | 0.30 | –    | 0.15
x3      | 0.65 | –    | 0.05 | –    | –    | 0.05 | –    | 0.15
x6      | –    | 0.50 | –    | –    | –    | –    | –    | –
x7      | –    | 0.80 | –    | –    | –    | –    | –    | –
x8      | –    | –    | –    | 0.40 | 0.60 | –    | –    | –
x9      | –    | –    | –    | 0.60 | 0.40 | –    | –    | –
x10     | –    | –    | –    | –    | –    | –    | 0.80 | –
x11     | –    | –    | –    | –    | –    | –    | 0.50 | –
Fig. 6. Flowchart of the numerical simulation process.

x(t+1) = Ax(t) + Bu(t) + f(t)    (19)

where the term f(t) represents the inflows of the input links, that is, vehicles entering the network. In the network of Fig. 4, the set of inflow links is {1, 2, 3, 8, 9}. Fig. 6 presents the flowchart of the numerical simulation.
Note, however, that this simulation is a rough representation of real traffic behavior and is best used for control design. For instance, the model does not allow vehicles to cross two intersections in the same control interval. Another limitation is the assumption that queue lengths are sufficiently large and downstream links are not obstructed, so that the outflow of a link with right of way is approximated by its saturation flow. This assumption does not hold when few vehicles are waiting at the stop line of a road, say x_z, which feeds another queue, say x_w.

The scenario chosen for this experiment is the one with distinct cycles. Since the model ignores the interactions between junctions, the given cycles and splits can be regarded as optimal, and there is no need to replicate the experiments for the other scenarios. The inflows of the network, f(t), are defined by a Gaussian function centered at the middle of the simulation time, with an initial value equal to the nominal inflows and a peak that doubles the nominal values.

The network was simulated for approximately 2 h, namely T = 40 simulation steps with a control interval of ΔT = 200 s. The impact of the prediction horizon on multi-agent MPC is evaluated for horizons ranging from 1 to 5 steps. Furthermore, 10 random initial conditions were considered to increase the reliability of the analysis. The initial states of the links were drawn at random in the range from 0 to 500 vehicles for each initial condition.
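The simulation loop of Fig. 6 can be sketched as follows. The matrices A and B below are placeholder stand-ins (the real ones come from the store-and-forward model of Section 2), so only the structure of the loop, the Gaussian demand profile, and the random initialization are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 13, 40                        # number of links and simulation steps
A = np.eye(n)                        # placeholder dynamics: the real A and B
B = -0.05 * np.abs(rng.standard_normal((n, n)))  # follow from Section 2
u_nom = np.full(n, 50.0)             # placeholder nominal splits (s)

f_nom = np.zeros(n)
f_nom[[0, 1, 2, 7, 8]] = [1000, 1100, 900, 1800, 1300]  # veh/h, input links
f_nom *= 200.0 / 3600.0              # vehicles per 200 s control interval

def inflow(t):
    """Gaussian demand: roughly nominal at the boundaries of the run,
    exactly twice the nominal at the middle of the simulation."""
    return f_nom * (1.0 + np.exp(-0.5 * ((t - T / 2) / (T / 6)) ** 2))

x = rng.uniform(0, 500, size=n)      # random initial queues (vehicles)
for t in range(T):
    u = u_nom                        # the split controller would act here
    x = np.clip(A @ x + B @ u + inflow(t), 0.0, None)  # Eq. (19), queues >= 0
```

The clipping keeps queues nonnegative, a detail the linear model (19) does not enforce by itself.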


Fig. 7. Mean accumulated cost over 40 simulation steps for a set of 10 random initial conditions.

Because model (19) is not ideal for traffic representation, the following accumulated cost was chosen as the objective function and comparative metric:

J_exp = ∑_{t=0}^{T} [ x(t)′ Q x(t) + Δu(t)′ R Δu(t) ]    (20)

where T is the number of simulation steps; Δu(t) = u^N − u(t) is the deviation from the nominal control signals; Q = I is an identity matrix weighing the states; and R = 0.003 I is a matrix weighing the control deviation.

The values assumed for the weighting matrices are typical of other papers on TUC control (Diakaki, 1999; Carlson et al., 2006). The objective J_exp simultaneously minimizes queues in a balanced way, with the quadratic term ‖x(t)‖²_Q, and the control deviation from a nominal fixed-time control policy u^N, with the quadratic term ‖Δu(t)‖²_R = ‖u(t) − u^N‖²_R. The traffic engineer experimentally sets the parameter r defining the control-cost matrix R = rI and thereby the trade-off rate between the two objectives. Nominal splits for the experiments appear in Tables 1–3.
A stop criterion of relative tolerance was selected for multi-agent MPC. Let J_exp(t) = ∑_{k=1}^{K} [ x̂(t+k|t)′ Q x̂(t+k|t) + Δû(t+k−1|t)′ R Δû(t+k−1|t) ] be the objective function for MPC over the prediction horizon, that is, the objective of P(t). The distributed agents iterate until |J_exp^k(t) − J_exp^{k−1}(t)| / |J_exp^{k−1}(t)| < ρ, where ρ is the tolerance, {Δû(t)^k} is the sequence of iterates produced by the agents, and k is the iteration counter. Such a criterion is satisfied when the relative decrease in the objective function becomes insignificant.
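In code, the criterion is a one-line check on consecutive objective values; ρ plays the role of the tolerance, and the sample values below are made up.

```python
def converged(J_hist, rho=1e-3):
    """Relative-tolerance stop criterion: stop iterating when the relative
    change of the MPC objective between consecutive agent iterations
    falls below the tolerance rho."""
    if len(J_hist) < 2:
        return False
    J_prev, J_curr = J_hist[-2], J_hist[-1]
    return abs(J_curr - J_prev) / abs(J_prev) < rho

assert not converged([100.0, 40.0])      # 60% decrease: keep iterating
assert converged([39.2, 39.199])         # ~0.003% change: stop
```

As the text observes, a generous ρ can trigger the stop early when one influential junction produces a large initial cost reduction, which is why the tolerance trades solution quality for iteration count.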
Fig. 7 shows the mean accumulated cost J_exp over 10 simulation runs with different initial conditions. These results corroborate the efficiency of multi-agent MPC. For a prediction horizon K = 5, multi-agent MPC achieves a performance improvement of approximately 10% in comparison with the TUC LQR approach. For long horizons, the changes in control signals are more subtle, and so are the variations in the objective function, as shown in the figure. For short horizons, the relative distance between the multi-agent MPC solution and the centralized solution becomes more pronounced, especially for high tolerances. Junctions with high influence on the network, such as junction 1, induce a large cost reduction that, compared to the reduction from less influential junctions, can trigger the stop criterion far from the optimal point.
4.3. Simulation results
Aiming to circumvent the limitations of the numerical analysis, the three scenarios were modeled in AIMSUN version 6, a professional traffic simulator (Barceló and Casas, 2002). The performance results from these simulations are more reliable, as the traffic dynamics are modeled more accurately.

Eq. (20) remains the objective function for computing the gain matrix L of the TUC strategy and for multi-agent MPC. Matrix Q was the identity, whereas the control deviation matrix R was either R_1 = 0.003 I or R_2 = I. All scenarios share the same control interval ΔT = 200 s and a duration of approximately 1 h. Further, equal prediction and control horizons of length K ∈ {1, 3} were used for multi-agent MPC. Although they seem small at first, such sliding horizons are in accordance with the dynamics of interest in the process: the proposed control interval is 200 s long, which is larger than the longest cycle time, thereby configuring an adequate control horizon.


Fig. 8. AIMSUN simulation model of the test bed network.

Table 5
Simulation results with R_1 matrix for all scenarios.

Strategy | Scenario | Journey time (s/km): Mean | Std. dev. | Density (veh/km): Mean | Std. dev.
TUC LQR     | C≠       | 241.23 | 3.15  | 29.51 | 0.67
TUC LQR     | C=       | 189.89 | 0.75  | 18.57 | 0.23
TUC LQR     | C=/crash | 193.06 | 2.72  | 19.14 | 2.74
M-MPC (K=1) | C≠       | 240.42 | 6.43  | 29.59 | 0.97
M-MPC (K=1) | C=       | 189.85 | 0.96  | 18.57 | 0.09
M-MPC (K=1) | C=/crash | 192.09 | 1.80  | 19.06 | 2.44
M-MPC (K=3) | C≠       | 465.66 | 55.38 | 53.57 | 4.74
M-MPC (K=3) | C=       | 208.21 | 2.68  | 20.30 | 0.27
M-MPC (K=3) | C=/crash | 205.77 | 18.83 | 20.55 | 3.86
Because state variables are not readily available in AIMSUN⁷, inductive loop detectors were inserted at the entrance and at the stop line of the controlled links. Then, the number of vehicles that have entered but not left a link is obtained by subtracting the measurements of the stop-line detector from the measurements of the entrance detector.
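The state estimate therefore reduces to a difference of cumulative detector counts, as sketched below with hypothetical count values.

```python
def vehicles_in_link(entry_count, exit_count):
    """State estimate of a link from cumulative detector counts: vehicles
    that crossed the entrance detector minus vehicles that crossed the
    stop-line detector are the vehicles still inside the link."""
    return entry_count - exit_count

# Hypothetical cumulative counts sampled at the end of three control intervals.
entries = [40, 95, 150]
exits = [10, 60, 130]
states = [vehicles_in_link(e, s) for e, s in zip(entries, exits)]
print(states)  # [30, 35, 20]
```

In a real deployment the counts would drift with detector errors, so the estimate is usually reset or corrected periodically; the sketch ignores this.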
Fig. 8 depicts the AIMSUN simulation model of the test bed network. A set of ten replications with different seeds was simulated for each scenario. Tables 5 and 6 report the results achieved by multi-agent MPC (M-MPC) and the TUC LQR strategy for matrices R_1 and R_2, respectively. The results encompass the scenarios of distinct cycle times (C≠), identical cycle times (C=), and identical cycle times with car crash (C=/crash).

With control-cost matrix R_1, the difference between the performance of multi-agent MPC with a unitary control horizon (K = 1) and that of the TUC LQR approach is not statistically significant. On the other hand, the multi-agent MPC performance is inferior with a prediction horizon of three steps, corroborating the hypothesis that the predictions from the traffic flow model given in Eq. (4) might be significantly wrong. This observation is reinforced by the lack of performance degradation in the numerical experiments, in which the predictions match the actual model.

With control-cost matrix R_2, the results are slightly favorable to multi-agent MPC, but not statistically significant, when the length of the prediction horizon is K = 1. The TUC LQR approach achieves better performance than multi-agent MPC when K = 3.
⁷ URL: http://www.aimsun.com.


Table 6
Simulation results with R_2 matrix for all scenarios.

Strategy | Scenario | Journey time (s/km): Mean | Std. dev. | Density (veh/km): Mean | Std. dev.
TUC LQR     | C≠       | 240.87 | 2.73  | 29.63 | 0.56
TUC LQR     | C=       | 189.03 | 0.59  | 18.46 | 0.24
TUC LQR     | C=/crash | 192.38 | 2.70  | 18.97 | 2.64
M-MPC (K=1) | C≠       | 237.82 | 3.03  | 29.35 | 0.37
M-MPC (K=1) | C=       | 188.74 | 0.80  | 18.47 | 0.06
M-MPC (K=1) | C=/crash | 191.60 | 2.47  | 19.05 | 2.53
M-MPC (K=3) | C≠       | 311.64 | 31.32 | 37.56 | 3.24
M-MPC (K=3) | C=       | 199.04 | 2.36  | 19.40 | 0.21
M-MPC (K=3) | C=/crash | 202.22 | 6.45  | 20.07 | 0.29

A comparison between Tables 5 and 6 indicates that the performance of all control strategies was slightly better when R = R_2.
4.4. Multi-agent MPC reconfigurability
To demonstrate that multi-agent MPC can be reconfigured with ease, two junctions were added to the test bed network, as depicted in Fig. 9. The inclusion of the new junctions takes place in two phases, first including sub-system 7 and afterwards sub-system 8. The introduction of junction 7 expands the neighborhood of junction 6 from the set N(6) = {1, 5} to N(6) = {1, 5, 7}. As a consequence, new terms are included in agent 6's objective function to account for the influence of the control signals at junction 6 on the state of junction 7. No change is required at any other junction.

Initially, the neighborhood of junction 7 consists only of junction 6, configuring an easily implementable sub-system. In the form of Eq. (16a), the objective function of agent 7 is given by

H_7(t) = H_{777} = B_{77}′ Q_7 B_{77} + R_7
g_7(t) = ½ (H_{767}′ + H_{776}) û_6(t) + g_{77}(t) = B_{77}′ Q_7 B_{76} û_6(t) + B_{77}′ Q_7 A_7 x_7(t)
The addition of junction 8 is very similar to the previous one. This time, the introduction of junction 8 expands the neighborhood of junction 7 from the set N(7) = {6} to N(7) = {6, 8}. As a consequence, agent 7's objective function must be updated to account for the influence on the state of junction 8.

Fig. 9. Expanded traffic network.



Table 7
Simulation results of the expanded test bed network.

Scenario | Journey time (s/km): Mean | Std. dev. | Density (veh/km): Mean | Std. dev.
M-MPC (K=1) | 200.07 | 0.76 | 19.63 | 0.05

The updated terms become

H_7(t) = H_{777} + H_{877} = B_{77}′ Q_7 B_{77} + R_7 + B_{87}′ Q_8 B_{87}
g_7(t) = ½ (H_{767}′ + H_{776}) û_6(t) + g_{77}(t) + ½ (H_{887}′ + H_{878}) û_8(t) + g_{87}(t)
       = B_{77}′ Q_7 B_{76} û_6(t) + B_{77}′ Q_7 A_7 x_7(t) + B_{87}′ Q_8 B_{88} û_8(t) + B_{87}′ Q_8 A_8 x_8(t)
The sub-problem of agent 8 is actually fairly simple, since junction 7 is its sole neighboring sub-system:

H_8(t) = H_{888} = B_{88}′ Q_8 B_{88} + R_8
g_8(t) = ½ (H_{878}′ + H_{887}) û_7(t) + g_{88}(t) = B_{88}′ Q_8 B_{87} û_7(t) + B_{88}′ Q_8 A_8 x_8(t)
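The reconfiguration thus boils down to reassembling the quadratic data (H, g) of the affected agents. The sketch below does this for a one-step sub-problem in simplified, subscript-free notation; all matrices and vectors are hypothetical illustrative values, not the test bed parameters.

```python
import numpy as np

def subproblem_data(A, B_self, B_nbr, Q, R, x, u_nbr):
    """Quadratic data of a one-step sub-problem: with predicted state
    x+ = A x + B_self u + B_nbr u_nbr, the cost x+' Q x+ + u' R u equals
    u' H u + 2 g' u + const, so adding a junction only requires recomputing
    H and g for the agents in its neighborhood."""
    H = B_self.T @ Q @ B_self + R
    g = B_self.T @ Q @ (A @ x + B_nbr @ u_nbr)
    return H, g

# Hypothetical 2-link junction data (illustrative values only).
A = 0.9 * np.eye(2)
B_self = np.diag([-0.4, -0.3])       # own green times drain the queues
B_nbr = np.diag([0.2, 0.1])          # the neighbor's green times feed them
Q, R = np.eye(2), 0.003 * np.eye(2)
x, u_nbr = np.array([30.0, 20.0]), np.array([50.0, 40.0])

H, g = subproblem_data(A, B_self, B_nbr, Q, R, x, u_nbr)
u_star = np.linalg.solve(H, -g)      # unconstrained minimizer of the sub-problem
```

Since only H and g depend on the neighbor's decision u_nbr, each agent can refresh g alone between iterations, which is what keeps the reconfiguration local.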
At this point the system is already configured with the newly added junctions. The reconfiguration process is summarized in the following steps:

(1) statistically gather the parameters of the new junction(s);
(2) determine the neighborhood of the added intersection(s); and
(3) revise the objective functions of the junctions belonging to that neighborhood and determine the objective function of the new sub-system(s) according to Eq. (16a).
The parameters necessary to put together the simulation scenario are:

- the turning rates are s_{12,14} = 0.6, s_{13,14} = 0.4, s_{14,16} = 0.5, and s_{15,16} = 0.5;
- the saturation flow is 3600 veh/h for links x_14, x_15, x_16, and x_17;
- the nominal splits are u^N_{7,1} = u^N_{7,2} = u^N_{8,1} = u^N_{8,2} = 54 s; and
- the inflow for links x_15 and x_17 is 800 veh/h.

With the purpose of illustration, the AIMSUN equal cycles scenario (C=) was modified to encompass junctions 7 and 8. The results from the simulations appear in Table 7 for a prediction horizon of one step and R = 3 × 10⁻³ I.
To provide a clear comparison with the LQR process of reconfiguration, the steps needed to include the two junctions above are listed below:

(1) statistically gather the parameters of junction 7;
(2) include the new data in the global matrices A, B, Q, and R;
(3) compute the new control matrix L;
(4) modify all the parameters of the control matrix;
(5) statistically gather the parameters of junction 8;
(6) include the new data in the global matrices A, B, Q, and R;
(7) compute the new control matrix L;
(8) modify all the parameters of the control matrix; and
(9) set up new procedures for recovering feasibility of control signals.

Although the numbers of steps involved are similar, the inclusion of a new junction in the LQR control scheme requires modification of the control laws of all junctions. As network complexity increases, this task not only becomes arduous, but also error prone, as the parameters must be manually input.
5. Summary and future work
The operation of large dynamic systems remains a challenge in control engineering, to a great extent due to their sheer size, intrinsic complexity, and nonlinear behavior (Tatara et al., 2005, 2007). Recently, control engineers have turned their attention to multi-agent systems for their composite nature, flexibility, and scalability. To this end, this paper contributed to this evolving technology with a framework for multi-agent control of linear dynamic networks, which are obtained from the interconnection of sub-systems that become dynamically coupled but otherwise have local constraints.


Of particular interest to this paper is the signaling split control of trafc ow modeled by store-and-forward equations.
Such model leads to a linear dynamic network of sub-systems matching the trafc junctions. The state variables are the
number of vehicles in the roads leading to each junction, while the control signals are the green times given to each of their
stages. The signaling split control entails solving a constrained, innite time, linear-quadratic-regulator problem (Diakaki
et al., 2002): the quadratic cost seeks to minimize queue lengths and deviation from nominal signals; the constraints ensure
that the green times add up to cycle time and are within bounds; and the linear dynamics result from the store-and-forward
trafc ow model.
The TUC approach uses a feedback control law for signaling split, whereby a static feedback matrix is computed off-line
with the LQR technique and a quadratic program is solved on-line to recover split feasibility. On the other hand, model predictive control handles constraints in a systematic way by using a nite-time rolling horizon and solving optimization problems on-line. To cope with large networks and allow distributed reconguration, this paper proposed a decomposition of the
MPC problem into a set of locally coupled sub-problems that are iteratively solved by a network of distributed agents. The
iterates produced by these distributed agents are drawn towards a globally optimal solution if they synchronize their work.
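The flavor of these synchronized iterations can be seen on a toy coupled quadratic, a stand-in for the paper's traffic sub-problems: each agent repeatedly minimizes the shared cost over its own variable while holding its neighbor's latest value fixed, and for a strictly convex cost the iterates approach the unique global optimum. The cost function and the two agents below are illustrative assumptions, not taken from the paper.

```python
# Toy analogue of the synchronized distributed iterations: two agents
# minimize the shared strictly convex cost
#     f(u1, u2) = u1**2 + u2**2 + 0.5*u1*u2 - u1 - u2,
# each over its own variable while holding the neighbor's value fixed
# (block coordinate descent).  Problem and numbers are illustrative.

def solve_distributed(num_iters=50):
    u1, u2 = 0.0, 0.0           # initial iterates of agents 1 and 2
    for _ in range(num_iters):
        # Agent 1: solve df/du1 = 2*u1 + 0.5*u2 - 1 = 0 with u2 fixed.
        u1 = (1.0 - 0.5 * u2) / 2.0
        # Agent 2: solve df/du2 = 2*u2 + 0.5*u1 - 1 = 0 with the new u1 fixed.
        u2 = (1.0 - 0.5 * u1) / 2.0
    return u1, u2

u1, u2 = solve_distributed()
# Strict convexity draws the iterates to the unique global
# optimum (0.4, 0.4), matching the centralized solution.
```

Here each local minimization has a closed form; in the MPC setting each agent instead solves a small quadratic program over its own green times and predicted queues.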
The purpose of the experiments was threefold. First, the numerical analysis aimed to demonstrate the convergent behavior
of the multi-agent system and compare its speed with that of an ideal, centralized agent that solves the overall MPC problem.
Second, the simulation analysis showed that multi-agent model predictive control can achieve performance comparable to
the TUC approach in representative scenarios implemented with the Aimsun simulator. And third, the experiments illustrated
the flexibility of the multi-agent MPC framework by introducing two additional controlled junctions, which required
only the reconfiguration of the control agent at the neighboring junction.
The research reported here is multidisciplinary, with contributions across the fields of multi-agent technology, optimization, and urban traffic control. Further improvements will be pursued along the following directions:

 numerical and simulated studies with very large networks, aimed at confirming the potential of the multi-agent MPC
framework;
 the formulation and application of traffic models that more accurately represent traffic flow (Aboudolas et al., 2007); and
 the formal extension of the multi-agent framework to handle constraints on state variables.

References
Aboudolas, K., Papageorgiou, M., Kosmatopoulos, E., 2007. Control and optimization methods for traffic signal control in large-scale congested urban road
networks. In: Proceedings of the American Control Conference, New York, USA, pp. 3132–3138.
Balan, G., Luke, S., 2006. History-based traffic control. In: AAMAS'06: Proceedings of the 5th International Joint Conference on Autonomous Agents and
Multiagent Systems, ACM, New York, NY, USA, pp. 616–621.
Barceló, J., Casas, J., 2002. Dynamic network simulation with Aimsun. In: Proceedings of the International Symposium on Transport Simulation. <http://
www.aimsun.com/site/content/view/35/50/>.
Bertsekas, D.P., 1995. Nonlinear Programming. Athena Scientific, Belmont, MA.
Bielefeldt, C., Diakaki, C., Papageorgiou, M., 2001. TUC and the SMART NETS project. In: Proceedings of the International IEEE Conference on Intelligent
Transportation Systems, Oakland, CA, USA, pp. 55–60.
Camacho, E.F., Bordons, C., 2004. Model Predictive Control. Springer-Verlag.
Camponogara, E., de Oliveira, L.B., 2009. Distributed optimization for model predictive control of linear dynamic networks. Accepted by IEEE Transactions on
Systems, Man, and Cybernetics – Part A. <http://www.das.ufsc.br/~camponog/papers/dmpc-tuc.pdf>.
Camponogara, E., Talukdar, S.N., 2004. Designing communication networks for distributed control agents. European Journal of Operational Research 153 (3),
544–563.
Camponogara, E., Talukdar, S.N., 2005. Designing communication networks to decompose network control problems. INFORMS Journal on Computing 17 (2),
207–223.
Camponogara, E., Talukdar, S.N., 2007. Distributed model predictive control: synchronous and asynchronous computation. IEEE Transactions on Systems,
Man, and Cybernetics – Part A 37 (5), 732–745.
Camponogara, E., Jia, D., Krogh, B.H., Talukdar, S.N., 2002. Distributed model predictive control. IEEE Control Systems Magazine 22 (1), 44–52.
Camponogara, E., Zhou, H., Talukdar, S.N., 2006. Altruistic agents in uncertain, dynamic games. Journal of Computer & Systems Sciences International 45,
536–552.
Carlson, R.C., Kraus Junior, W., Camponogara, E., 2006. Combining the TUC urban traffic control strategy with bandwidth maximisation control in
transportation systems. In: Proceedings of the 11th IFAC Symposium on Control in Transportation Systems.
de Oliveira, L.B., 2008. Otimização e controle distribuído de frações de verde em malhas veiculares urbanas. Master's thesis, Graduate Program in Electrical
Engineering, Federal University of Santa Catarina, in Portuguese.
de Oliveira, L.B., Camponogara, E., 2007. Predictive control for urban traffic networks: initial evaluation. In: Proceedings of the 3rd IFAC Symposium on
System, Structure and Control, Iguassu Falls, Brazil.
de Oliveira, D., Bazzan, A.L.C., Lesser, V., 2005. Using cooperative mediation to coordinate traffic lights: a case study. In: AAMAS'05: Proceedings of the 4th
International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 563470.
Diakaki, C., 1999. Integrated control of traffic flow in corridor networks. Ph.D. Thesis, Department of Production Engineering and Management, Technical
University of Crete, Greece.
Diakaki, C., Papageorgiou, M., 1997. Partners of the Project TABASCO, Urban Integrated Traffic Control Implementation Strategies. Tech. Rep. Project
TABASCO (TR1054), Transport Telematics Office, Brussels, Belgium (September 1997).
Diakaki, C., Papageorgiou, M., Aboudolas, K., 2002. A multivariable regulator approach to traffic-responsive network-wide signal control. Control
Engineering Practice 10 (2), 183–195.
Gazis, D.C., Potts, R.B., 1963. The oversaturated intersection. In: Proceedings of the Second International Symposium on Traffic Theory, pp. 221–237.
Hunt, P.B., Robertson, D.I., Bretherton, R.D., Winton, R.I., 1981. SCOOT – a traffic responsive method of coordinating signals. Tech. rep., Transport Research
Laboratory, Crowthorne, England.
Jennings, N., 2000. On agent-based software engineering. Artificial Intelligence 117, 277–296.


Kosmatopoulos, E., Papageorgiou, M., Bielefeldt, C., Dinopoulou, V., Morris, R., Mueck, J., Richards, A., Weichenmeier, F., 2006. International comparative field
evaluation of a traffic-responsive signal control strategy in three cities. Transportation Research Part A: Policy and Practice 40 (5), 399–413.
Kühne, F., 2005. Controle preditivo de robôs móveis não holonômicos. Master's thesis, Graduate Program in Electrical Engineering, Federal University of Rio
Grande do Sul, Brazil, in Portuguese.
Li, S., Zhang, Y., Zhu, Q., 2005. Nash-optimization enhanced distributed model predictive control applied to the Shell benchmark problem. Information
Sciences 170 (2–4), 329–349.
Lowrie, P.R., 1982. The Sydney co-ordinated adaptive traffic system – principles, methodology and algorithms. In: Proceedings of the IEE International
Conference on Road Traffic Signalling, London, pp. 67–70.
Maciejowski, J.M., 2002. Predictive Control with Constraints. Prentice Hall.
Manikonda, V., Levy, R., Satapathy, G., Lovell, D.J., Chang, P.C., Teittinen, A., 2001. Autonomous agents for traffic simulation and control. Transportation
Research Record 1774, 1–10.
Maturana, F.P., Staron, R.J., Hall, K.H., 2005. Methodologies and tools for intelligent agents in distributed control. IEEE Intelligent Systems 20 (1), 42–49.
Negenborn, R.R., Schutter, B.D., Hellendoorn, J., 2008. Multi-agent model predictive control for transportation networks: serial versus parallel schemes.
Engineering Applications of Artificial Intelligence 21 (3), 353–366.
Nguyen-Duc, M., Guessoum, Z., Mari, O., Perrot, J.-F., Briot, J.-P., Duong, V., 2008. Towards a reliable air traffic control. In: AAMAS'08: Proceedings of the 7th
International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 101–104.
Papageorgiou, M., 2004. Overview of road traffic control strategies. In: Information and Communication Technologies: From Theory to Applications, pp.
LIX–LLX.
Pechoucek, M., Šišlák, D., Pavlíček, D., Uller, M., 2006. Autonomous agents for air-traffic deconfliction. In: AAMAS'06: Proceedings of the 5th International
Joint Conference on Autonomous Agents and Multiagent Systems, ACM, New York, NY, USA, pp. 1498–1505.
Rigolli, M., Brady, M., 2005. Towards a behavioural traffic monitoring system. In: AAMAS'05: Proceedings of the 4th International Joint Conference on
Autonomous Agents and Multiagent Systems, ACM, New York, NY, USA, pp. 449–454.
Robertson, D.I., 1969. TRANSYT: a traffic network study tool. Tech. rep., Transport Research Laboratory, Crowthorne, England.
Robertson, D.I., Bretherton, R.D., 1991. Optimizing networks of traffic signals in real time – the SCOOT method. IEEE Transactions on Vehicular Technology
40 (1), 11–15.
Srinivasan, D., Choy, M.C., 2006. Cooperative multi-agent system for coordinated traffic signal control. IEE Proceedings – Intelligent Transport Systems 153
(1), 41–49.
Tatara, E., Birol, I., Teymour, F., Çinar, A., 2005. Agent-based control of autocatalytic replicators in networks of reactors. Computers & Chemical Engineering
29, 807–815.
Tatara, E., Çinar, A., Teymour, F., 2007. Control of complex distributed systems with distributed intelligent agents. Journal of Process Control 17, 415–427.
Tomás, V.R., Garcia, L.A., 2005. A cooperative multiagent system for traffic management and control. In: AAMAS'05: Proceedings of the 4th International
Joint Conference on Autonomous Agents and Multiagent Systems, ACM, New York, NY, USA, pp. 52–59.
Tumer, K., Agogino, A., 2007. Distributed agent-based air traffic flow management. In: AAMAS'07: Proceedings of the 6th International Joint Conference on
Autonomous Agents and Multiagent Systems, ACM, New York, NY, USA, pp. 1–8.
Webster, F.V., 1959. Traffic signal settings. Tech. Rep. 39, Road Research Laboratory, London, UK.
Wooldridge, M., 2002. An Introduction to MultiAgent Systems. John Wiley & Sons Ltd.
Yamashita, T., Izumi, K., Kurumatani, K., Nakashima, H., 2005. Smooth traffic flow with a cooperative car navigation system. In: AAMAS'05: Proceedings of
the 4th International Joint Conference on Autonomous Agents and Multiagent Systems, ACM, New York, NY, USA, pp. 478–485.
