
SIAM J. OPTIM.
Vol. 10, No. 3, pp. 643–657
© 2000 Society for Industrial and Applied Mathematics
INEXACT CUTS IN BENDERS DECOMPOSITION∗

GOLBON ZAKERI† , ANDREW B. PHILPOTT‡ , AND DAVID M. RYAN‡
Abstract. Benders decomposition is a well-known technique for solving large linear programs
with a special structure. In particular, it is a popular technique for solving multistage stochastic
linear programming problems. Early termination in the subproblems generated during Benders de-
composition (assuming dual feasibility) produces valid cuts that are inexact in the sense that they
are not as constraining as cuts derived from an exact solution. We describe an inexact cut algorithm,
prove its convergence under easily verifiable assumptions, and discuss a corresponding Dantzig–Wolfe
decomposition algorithm. The paper is concluded with some computational results from applying the
algorithm to a class of stochastic programming problems that arise in hydroelectric scheduling.
Key words. stochastic programming, Benders decomposition, inexact cuts
AMS subject classifications. 90C15, 90C05, 90C06, 90C90
PII. S1052623497318700
1. Introduction. Many large linear programming problems exhibit a block-diagonal
structure that makes them amenable to decomposition techniques such as
Dantzig–Wolfe decomposition [5, 6] or its dual, Benders decomposition [3]. The latter
technique has become increasingly popular in stochastic linear programming, starting
with the independent publication of the L-shaped method by Van Slyke and Wets [17]
for two-stage stochastic linear programming. (The L-shaped method is often referred
to as stochastic Benders decomposition.)
In this paper we are concerned with Benders decomposition applied to linear
programs of the form
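$$
\begin{aligned}
\text{P:}\quad \text{minimize}\quad & c^T x + q^T y \\
\text{subject to}\quad & Ax = b, \\
& Tx + Wy = h, \\
& x \ge 0,\ y \ge 0.
\end{aligned}
$$
If we define
$$ Q(x) = \min\{ q^T y \mid Wy = h - Tx,\ y \ge 0 \}, $$
then P can be written as
$$
\begin{aligned}
\text{P:}\quad \text{minimize}\quad & c^T x + Q(x) \\
\text{subject to}\quad & Ax = b, \\
& x \ge 0.
\end{aligned}
$$
Throughout this paper we assume that $X = \{x \ge 0 \mid Ax = b\}$ is contained in
$\operatorname{dom} Q = \{x \mid Q(x) < \infty\}$. Under this assumption the Benders decomposition
algorithm can be defined as follows.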
∗ Received by the editors March 25, 1997; accepted for publication (in revised form) January 10,
1999; published electronically February 29, 2000. This work was supported by the New Zealand
Public Good Science Fund, FRST contract 403.
http://www.siam.org/journals/siopt/10-3/31870.html
† Mathematical Sciences Division, Argonne National Lab, Argonne, IL 60439 (zakeri@mcs.anl.gov).
‡ Operations Research Group, Department of Engineering Science, University of Auckland, Private
Bag 92019, Auckland, New Zealand (a.philpott@auckland.ac.nz, d.ryan@auckland.ac.nz).

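Benders decomposition algorithm.
Set $i := 0$, $U_0 := \infty$, $L_0 := -\infty$, $F := \mathbb{R}^n \times [L_0, \infty)$.
While $U_i - L_i > 0$
(1) Set $i := i + 1$.
(2) Solve the master problem
$$
\begin{aligned}
\text{MP:}\quad \text{minimize}\quad & c^T x + \theta \\
\text{subject to}\quad & Ax = b, \\
& (x, \theta) \in F, \\
& x \ge 0
\end{aligned}
$$
to obtain optimal primal variables $(x_i, \theta_i)$.
(3) Set $L_i := c^T x_i + \theta_i$.
(4) Solve the subproblem
$$
\begin{aligned}
\text{SP}(x_i):\quad \text{minimize}\quad & q^T y \\
\text{subject to}\quad & Wy = h - T x_i, \\
& y \ge 0
\end{aligned}
$$
to obtain optimal primal variables $y_i$ and dual variables $\pi_i$.
(5) Set $U_i := \min\{U_{i-1},\ c^T x_i + q^T y_i\}$.
(6) Set $F := F \cap \{(x, \theta) \mid \pi_i^T (h - T x) \le \theta\}$.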
In the classical case the cut defined by step 6 comes from an optimal basic feasible
solution to the subproblem. Since there is a finite number of basis matrices for this
problem, finite termination of the algorithm at the optimal solution can be guaranteed
(see, e.g., [13]).
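The loop above can be illustrated on a deliberately tiny instance. The following Python sketch is not taken from the paper: it assumes a scalar first-stage variable x in [0, 10], c = 1, T = 1, W = [1, -1], q = (3, 0.5), and h = 4, so that Q(x) and an optimal dual solution are available in closed form, and it solves the one-dimensional master problem by enumerating breakpoints rather than calling an LP solver. The floor θ ≥ 0 (valid here because q ≥ 0) stands in for L_0 = −∞ so that the first master problem is bounded. All function names are hypothetical.

```python
def solve_master(cuts, c, lo, hi, theta_floor):
    """Minimize c*x + theta subject to lo <= x <= hi, theta >= theta_floor,
    and theta >= a + s*x for every cut (a, s). One-dimensional toy only."""
    lines = [(theta_floor, 0.0)] + cuts
    candidates = {lo, hi}
    # A convex piecewise-linear function attains its minimum at an endpoint
    # or at a point where two of its linear pieces intersect.
    for j in range(len(lines)):
        for k in range(j + 1, len(lines)):
            (a1, s1), (a2, s2) = lines[j], lines[k]
            if s1 != s2:
                x = (a2 - a1) / (s1 - s2)
                if lo <= x <= hi:
                    candidates.add(x)

    def theta_at(x):
        return max(a + s * x for a, s in lines)

    x_best = min(candidates, key=lambda x: (c * x + theta_at(x), x))
    return x_best, theta_at(x_best)


def solve_subproblem(x, h, q_plus, q_minus):
    """Closed-form SP(x): min q_plus*y1 + q_minus*y2 s.t. y1 - y2 = h - x, y >= 0.
    Returns (Q(x), optimal dual pi); dual feasibility is -q_minus <= pi <= q_plus."""
    r = h - x
    pi = q_plus if r > 0 else -q_minus
    return pi * r, pi


def benders(c=1.0, h=4.0, q_plus=3.0, q_minus=0.5, lo=0.0, hi=10.0, tol=1e-9):
    cuts = []
    upper, lower = float("inf"), float("-inf")
    best_x = None
    iterations = 0
    while upper - lower > tol:
        iterations += 1
        x, theta = solve_master(cuts, c, lo, hi, theta_floor=0.0)  # step 2
        lower = c * x + theta                                      # step 3
        q_val, pi = solve_subproblem(x, h, q_plus, q_minus)        # step 4
        if c * x + q_val < upper:                                  # step 5
            upper, best_x = c * x + q_val, x
        # Step 6: the cut pi*(h - x) <= theta, stored as (intercept, slope).
        cuts.append((pi * h, -pi))
    return best_x, upper, lower, iterations


best_x, upper, lower, iterations = benders()
print(best_x, upper, lower, iterations)
```

On this instance the first master solution is x = 0, the resulting cut moves the second master solution to x = 4, and the bounds then coincide at 4 after two iterations.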
In this paper we explore the Benders decomposition algorithm in the case where
the cuts are not computed from an optimal extreme-point solution to a linear pro-
gramming subproblem. For example, when the subproblems are very large, it makes
sense to determine the cuts by applying a primal-dual interior-point method to the
subproblem. Terminating this procedure when it yields a feasible dual solution will
still define a valid cut. We call this an inexact cut. If the dual solution is close to
optimal, then an inexact cut will also separate the optimal solution from the cur-
rent iterate (except when this is optimal). As observed by a number of authors (see,
e.g., [2]), inexact cuts may be less effort to compute than the exact cuts, especially
for linear programming algorithms that yield an approximately optimal dual feasible
solution before termination.
In theoretical terms, Benders decomposition is a special case of a more general
class of convex cutting plane algorithms first introduced by Kelley [14]. Cutting plane
algorithms construct a sequence of hyperplanes that separate the current iterate from
the optimal solution. In the case where the cutting planes are computed inexactly, the
asymptotic convergence of this process to the optimal solution has been investigated
by a number of authors [1, 7, 8, 11, 14]. In the context of Benders decomposition
applied to linear programs of the form P, all of the convergence results in these papers
assume that the sets containing $x$ and $\pi^T T$ are both bounded. In the convergence
theorem that we prove for inexact cuts, we require that $X = \{x \ge 0 \mid Ax = b\}$ be
bounded and that $X \subseteq \operatorname{dom} Q$. The latter assumption, which is known as relatively
complete recourse in stochastic programming, is weaker than requiring that $\pi^T T$ be
bounded.
To avoid possible confusion, we remark that our use of the term inexact is less
general here than that of Au, Higle, and Sen [1]. At each iteration $i$ of their inexact
subgradient algorithm (applied to minimize a general objective function $f(x)$), they
construct an approximate subgradient at the current point $x_i$ by computing a subgradient
to an approximating function $f_i$ and taking a projected step from $x_i$ in (the
negative of) that direction. With certain restrictions on the convergence of $\{f_i\}$ to
$f$ they prove convergence of $x_i$ to the minimum of $f$ under the assumption that the
subgradients of $f_i$ at $x_i$ form a bounded sequence. Our results are confined to Benders
decomposition (where $f(x) = c^T x + Q(x)$ and each $f_i$ is defined by the inexact cuts at
iteration $i$), but we do not require that the subgradients of $f_i$ at each iterate (namely,
$c - \pi_i^T T$ in our special case) form a bounded sequence.
In the next section we describe a Benders decomposition algorithm that termi-
nates the solution of the subproblem before optimality to produce an inexact cut. The
steps of the algorithm ensure that this cut separates the optimal solution from the
current iterate. In section 3 we consider the convergence of the inexact cut algorithm
under the above assumptions, and in section 4 we discuss the implications of our
results for Dantzig–Wolfe decomposition. In section 5 we give some computational
results.
2. The algorithm. We start the inexact cut algorithm by choosing a convergence
tolerance $\delta$, setting an iteration counter $i := 0$, and choosing some decreasing
sequence $\{\varepsilon_i\}$ that converges to 0. We also set $U_0 := \infty$ and $L_0 := -\infty$. The remaining
steps of the algorithm are as follows.
Inexact cut algorithm.
While $U_i - L_i > \delta$
(1) Set $i := i + 1$.
(2) Solve MP to obtain $(x_i, \theta_i)$.
(3) Set $L_i := c^T x_i + \theta_i$.
(4) Perform an inexact optimization to generate a vector $\pi_i$ feasible for the dual
of SP($x_i$) such that
$$ \pi_i^T (h - T x_i) + \varepsilon_i > Q(x_i). \tag{1} $$
(5) Set $U_i := \min\{U_{i-1},\ c^T x_i + \pi_i^T (h - T x_i) + \varepsilon_i\}$.
(6) If $\pi_i^T (h - T x_i) > \theta_i$, then add the cut $\pi_i^T (h - T x) \le \theta$ to MP,
else set $i := i + 1$, $x_{i+1} := x_i$, $\theta_{i+1} := \theta_i$, $L_{i+1} := L_i$, $U_{i+1} := U_i$ and go to
step 4.¹
We denote by $v_i$ the value of the inexact optimization in step 4. Thus $v_i =
\pi_i^T (h - T x_i)$. In step 6 of each iteration of this method we check to see if $v_i > \theta_i$,
which ensures that the hyperplane $\pi_i^T (h - T x) = \theta$ will strictly separate the current
iterate $(x_i, \theta_i)$ from any optimal solution of P. If this check fails, then we decrease the
duality gap tolerance and continue with the solution of SP($x_i$), until either $\varepsilon_i \to 0$
with no change in $(x_i, \theta_i)$ or $(x_i, \theta_i)$ is separated from an optimal solution of P by a
cut.
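The steps of the inexact cut algorithm can be sketched on the same kind of toy instance. This Python sketch rests on assumptions that are not in the paper: a scalar x in [0, 10], c = 1, T = 1, W = [1, -1], q = (3, 0.5), h = 4, a breakpoint-enumeration master, and a stand-in for early termination that shrinks the optimal dual toward the feasible point π = 0 until the duality gap is ε/2. The tolerance sequence (start at 10, multiply by 0.1 on every pass) and the pass limit are illustrative choices only.

```python
def solve_master(cuts, c, lo, hi, theta_floor):
    """Minimize c*x + theta over lo <= x <= hi with theta >= theta_floor and
    theta >= a + s*x for each cut (a, s), by enumerating breakpoints."""
    lines = [(theta_floor, 0.0)] + cuts
    candidates = {lo, hi}
    for j in range(len(lines)):
        for k in range(j + 1, len(lines)):
            (a1, s1), (a2, s2) = lines[j], lines[k]
            if s1 != s2:
                x = (a2 - a1) / (s1 - s2)
                if lo <= x <= hi:
                    candidates.add(x)

    def theta_at(x):
        return max(a + s * x for a, s in lines)

    x_best = min(candidates, key=lambda x: (c * x + theta_at(x), x))
    return x_best, theta_at(x_best)


def inexact_dual(x, eps, h, q_plus, q_minus):
    """Return (pi, v) with pi dual feasible and v + eps > Q(x), v = pi*(h - x).
    Hypothetical stand-in for stopping an interior-point solver early."""
    r = h - x
    pi_opt = q_plus if r > 0 else -q_minus
    q_val = pi_opt * r
    shrink = min(1.0, 0.5 * eps / q_val) if q_val > 0 else 0.0
    pi = (1.0 - shrink) * pi_opt      # stays inside [-q_minus, q_plus]
    return pi, pi * r


def inexact_cut_algorithm(c=1.0, h=4.0, q_plus=3.0, q_minus=0.5, lo=0.0, hi=10.0,
                          eps0=10.0, factor=0.1, delta=1e-6, max_passes=200):
    cuts, passes = [], 0
    upper, eps = float("inf"), eps0
    x, theta = solve_master(cuts, c, lo, hi, 0.0)            # step 2
    lower = c * x + theta                                     # step 3
    while passes < max_passes:
        passes += 1
        pi, v = inexact_dual(x, eps, h, q_plus, q_minus)      # step 4
        upper = min(upper, c * x + v + eps)                   # step 5
        if upper - lower <= delta:
            break
        if v > theta:                                         # step 6: add a cut
            cuts.append((pi * h, -pi))
            x, theta = solve_master(cuts, c, lo, hi, 0.0)
            lower = c * x + theta
        # Otherwise (x, theta) is kept and step 4 is repeated with smaller eps.
        eps *= factor
    return x, upper, lower, len(cuts), passes


x, upper, lower, n_cuts, passes = inexact_cut_algorithm()
print(x, upper - lower, n_cuts, passes)
```

In this run a single loose cut is enough to move the master solution to the minimizer; the remaining passes only tighten ε until the bound gap falls below δ.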
To show that this algorithm converges we make use of the following simple results.
Lemma 2.1. $-\pi_i^T T$ is an $\varepsilon_i$-subgradient of $Q$ at $x_i$.
Proof. Since $\pi_i$ is dual feasible for SP($x_i$), it is dual feasible for every possible
subproblem SP($x$). Hence for every $x$ of suitable dimension, we have
$$ Q(x) \ge \pi_i^T (h - T x), $$
and by (1)
$$ Q(x_i) \le v_i + \varepsilon_i. $$
Thus
$$ Q(x) - Q(x_i) \ge \pi_i^T (h - T x) - \pi_i^T (h - T x_i) - \varepsilon_i, $$
giving
$$ Q(x) \ge Q(x_i) - \pi_i^T T (x - x_i) - \varepsilon_i, $$
which gives the result.
¹ Note that in this case $x$ and $\theta$ remain fixed and only $\varepsilon$ (possibly) changes.
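The inequality of Lemma 2.1 can be checked numerically on a small example. The Python sketch below assumes (outside the paper) a scalar simple-recourse subproblem with T = 1, h = 4, and dual feasible interval [-0.5, 3], for which Q(x) = 3(h − x) when x < h and 0.5(x − h) otherwise; the sample grid and the function names are hypothetical.

```python
def q_value(x, h=4.0, q_plus=3.0, q_minus=0.5):
    """Closed-form Q(x) for the assumed toy subproblem."""
    r = h - x
    return q_plus * r if r > 0 else -q_minus * r


def is_eps_subgradient(g, x_i, eps, xs, slack=1e-12):
    """Check Q(x) >= Q(x_i) + g*(x - x_i) - eps on the sample points xs."""
    base = q_value(x_i)
    return all(q_value(x) >= base + g * (x - x_i) - eps - slack for x in xs)


T = 1.0
x_i, eps = 1.0, 0.6
pi = 2.9   # dual feasible; v = 2.9*3 = 8.7 and Q(x_i) = 9, so v + eps > Q(x_i)
grid = [k / 10.0 for k in range(-50, 151)]

feasible_ok = is_eps_subgradient(-pi * T, x_i, eps, grid)
# A multiplier outside the dual feasible interval (3.5 > 3) need not work.
infeasible_ok = is_eps_subgradient(-3.5 * T, x_i, eps, grid)
print(feasible_ok, infeasible_ok)
```

The second check is included only to show that dual feasibility is what drives the lemma: with π = 3.5 the inequality already fails at points far to the left of x_i.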

Lemma 2.2. Let $U_i$, $L_i$, $x_i$, and $\theta_i$ be generated by applying the inexact cut
algorithm with $\varepsilon_i$. Then
$$ 0 \le U_i - L_i \le v_i + \varepsilon_i - \theta_i. $$
Proof. Since $U_i$ is an upper bound on the value of P and $L_i$ is a lower bound,
$U_i - L_i \ge 0$. Moreover, since
$$ U_i \le c^T x_i + v_i + \varepsilon_i $$
and $L_i = c^T x_i + \theta_i$, we have
$$ 0 \le U_i - L_i \le c^T x_i + v_i + \varepsilon_i - c^T x_i - \theta_i $$
and the result follows.

3. Convergence of the algorithm. In this section we prove that the sequence
{(xi , θi )} generated by the inexact cut algorithm converges to an optimal solution
to P. As alluded to above, abstract proofs of convergence for cutting plane methods
(see [14]) typically invoke a compactness argument that in our context relies on an
assumption that the sets containing $x$ and $\pi^T T$ are both bounded. A general convergence
theory that might avoid these assumptions is developed in Higle and Sen
[10], who prove several convergence results for algorithms similar to the inexact cut
algorithm. Unfortunately, the direct application of these results to our algorithm is
not straightforward.
The difficulty with applying the results in [10] lies in demonstrating the key
assumption that the sequence $\{\max_{j<i} (\pi_j^T (h - T x_i))\}$ converges to $Q(\bar{x})$ whenever
$\{x_i\}$ converges to $\bar{x}$. (Observe that $\varepsilon_i \to 0$ implies $\{\max_{j<i} (\pi_j^T (h - T x_{i-1}))\}$ converges
to $Q(\bar{x})$ when $\{x_i\}$ converges to $\bar{x}$, but this is different from the above assertion, since
$x_{i-1}$ is not the minimizer of $\max_{j<i} (\pi_j^T (h - T x))$.) To deal with situations like this
(for a slightly different class of algorithms from ours) [10, Theorem 9] relaxes the
assumption to
$$ \lim_{i \in K} \Big\{ \max_{j<i} \big(\pi_j^T (h - T x_i)\big) - \max_{j<i} \big(\pi_j^T (h - T x_{i-1})\big) \Big\} = 0, $$
where $K$ is some infinite index set. This can be shown to be equivalent to our equation
(6) below. (Lemma 3.9 shows that this equation holds for our algorithm.) As
[10, Theorem 9] is not directly applicable, we present a self-contained proof of our
convergence result.
To illuminate the role that the boundedness of $\{\pi^T T\}$ plays in the proof we
begin by showing that the sequence $\{\pi_i^T T\}$ generated by the inexact cut algorithm
is bounded provided that the set $X = \{x \ge 0 \mid Ax = b\}$ is bounded, and $\operatorname{dom} Q$ is
$\mathbb{R}^n$. (In stochastic programming the latter is known as complete recourse.) In what
follows we shall relax the latter assumption to $X \subseteq \operatorname{dom} Q$, for which we may still
prove convergence although we are no longer guaranteed a bound on $\{\pi^T T\}$. We
make use of the following technical result.
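Lemma 3.1. If for some given pair $(b, \beta)$ the epigraph of
$$ f(x) = \max_{1 \le k \le N} \{ b_k^T x + \beta_k \} $$
lies in the half-space $H = \{(x, \mu) \mid \mu \ge b^T x + \beta\}$, then $\|b\| \le \max_{1 \le k \le N} \|b_k\|$.
Proof. Suppose $\|b\| > \max_{1 \le k \le N} \|b_k\| = M$ and let $\tilde{\beta} = \max_{1 \le k \le N} |\beta_k|$. Let
$$ n > \frac{|\tilde{\beta} - \beta|}{\|b\|^2 - M \|b\|} $$
and define $z := nb$. We will show that $f(z) < b^T z + \beta$, contradicting
the hypothesis. Formally,
$$
\begin{aligned}
b^T z + \beta = n \|b\|^2 + \beta &> n \|b\| M + \beta + |\tilde{\beta} - \beta| \\
&\ge n M \|b\| + \tilde{\beta} \\
&\ge \max_{1 \le k \le N} [(n b^T) b_k] + \tilde{\beta} \\
&= \max_{1 \le k \le N} [(n b^T) b_k] + \max_{1 \le k \le N} |\beta_k| \\
&\ge \max_{1 \le k \le N} [(n b^T) b_k + \beta_k] \\
&= f(z).
\end{aligned}
$$
This contradicts the assumption that the epigraph of $\max_{1 \le k \le N} \{b_k^T x + \beta_k\}$ lies in $H$.
Hence we must have $\|b\| \le M$ as required.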
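The construction in the proof of Lemma 3.1 can be traced numerically. The following Python sketch builds the point z = nb for an affine function whose slope is too steep and confirms that it rises above f at z. The example data and function names are hypothetical, and n is taken to be the threshold from the proof plus 1, which is one arbitrary way to satisfy the strict inequality.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def f(x, pieces):
    """f(x) = max_k (b_k^T x + beta_k) for pieces given as (b_k, beta_k)."""
    return max(dot(b_k, x) + beta_k for b_k, beta_k in pieces)


def witness(b, beta, pieces):
    """If ||b|| > M = max_k ||b_k||, return z = n*b as built in the proof,
    so that f(z) < b^T z + beta; otherwise return None."""
    M = max(norm(b_k) for b_k, _ in pieces)
    if norm(b) <= M:
        return None
    beta_tilde = max(abs(beta_k) for _, beta_k in pieces)
    n = abs(beta_tilde - beta) / (norm(b) ** 2 - M * norm(b)) + 1.0
    return [n * t for t in b]


pieces = [((1.0, 0.0), 0.5), ((0.0, 1.0), -2.0), ((-1.0, -1.0), 1.0)]
b, beta = (2.0, 1.0), -3.0          # ||b|| = sqrt(5) exceeds M = sqrt(2)

z = witness(b, beta, pieces)
violated = f(z, pieces) < dot(b, z) + beta
print(z, violated)
```

A slope with norm at most M, such as b = (1, 0), yields no witness from this construction, which matches the direction of the lemma.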
Lemma 3.2. If $\operatorname{dom} Q = \mathbb{R}^n$, then the sequence $\{-\pi_i^T T\}$ is bounded.
Proof. Let $\{\hat{\pi}_k \mid 1 \le k \le N\}$ be the set of basic feasible solutions of $W^T \pi \le q$.
Recall that for every $x \in \mathbb{R}^n$, $\pi_i$ is dual feasible for SP($x$). So for every such $x$ we
have
$$ \pi_i^T (h - T x) \le \max_{1 \le k \le N} \hat{\pi}_k^T (h - T x) = Q(x), $$
where the equation follows by virtue of $\operatorname{dom} Q = \mathbb{R}^n$. Therefore the epigraph of $Q$ lies
in the half-space
$$ H = \{(x, \mu) \mid \mu \ge b^T x + \beta\}, $$
where $b = -\pi_i^T T$ and $\beta = \pi_i^T h$. The conclusion is then immediate from Lemma
3.1.
Next we will show that the inexact cut algorithm terminates in a finite number of
iterations with a δ-optimal solution. If the inexact cut algorithm does not terminate
in a finite number of iterations, then it will produce an infinite sequence {(xi , θi )}
that satisfies one of the following conditions:
(1) There exists $m$ such that $\theta_i \ge v_i$ for all $i \ge m$.
(2) There exists a subsequence $\{(x_{\sigma(i)}, \theta_{\sigma(i)})\}$ such that $\theta_{\sigma(i)} < v_{\sigma(i)}$.
The following lemmas show that a contradiction results in either case, namely, the
algorithm eventually yields a $\delta$-optimal solution.
Lemma 3.3. If there exists $m$ such that $\theta_i \ge v_i$ for all $i \ge m$, then $U_i - L_i \downarrow 0$.
Proof. Since $\theta_i \ge v_i$, Lemma 2.2 implies
$$ 0 \le U_i - L_i \le v_i + \varepsilon_i - \theta_i \le \varepsilon_i. $$
The result follows since $\varepsilon_i \to 0$.

Lemma 3.4. If there exists a convergent subsequence $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ such that
$\theta_{\tau(i)} < v_{\tau(i)}$, then
(1) $0 < v_{\tau(i)} - \theta_{\tau(i)} \le v_{\tau(i)} - v_{\tau(i-1)} + \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)})$;
(2) $\lim\, (v_{\tau(i)} - v_{\tau(i-1)}) = 0$;
(3) $\liminf\, \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) \ge 0$.
Proof. It is clear that $0 < v_{\tau(i)} - \theta_{\tau(i)}$ from the assumption. To obtain the second
inequality, observe that $(x_{\tau(i)}, \theta_{\tau(i)})$ is constrained to satisfy the cut we added at
iteration $\tau(i-1)$. Therefore
$$ \theta_{\tau(i)} \ge \pi_{\tau(i-1)}^T (h - T x_{\tau(i)}), $$
which implies
$$
\begin{aligned}
v_{\tau(i)} - \theta_{\tau(i)} &\le v_{\tau(i)} - \pi_{\tau(i-1)}^T (h - T x_{\tau(i)}) \\
&= v_{\tau(i)} - v_{\tau(i-1)} + \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}).
\end{aligned}
\tag{2}
$$
Now $(x_{\tau(i)}, \theta_{\tau(i)}) \to (x^*, \theta^*)$ by assumption. Furthermore, from the algorithm we have
$$ Q(x_{\tau(i)}) - \varepsilon_{\tau(i)} \le v_{\tau(i)} \le Q(x_{\tau(i)}) $$
and therefore
$$ v_{\tau(i)} \to Q(x^*), \tag{3} $$
which implies $\lim\, (v_{\tau(i)} - v_{\tau(i-1)}) = 0$. Furthermore, (2) and (3) imply
$$ \liminf\, \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) \ge 0. $$
Lemma 3.5. Suppose $X = \{x \ge 0 \mid Ax = b\}$ is bounded and $\operatorname{dom} Q$ is $\mathbb{R}^n$. If
there exists a subsequence $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ such that $\theta_{\tau(i)} < v_{\tau(i)}$, then $U_i - L_i \downarrow 0$.
Proof. The subsequence $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ is bounded since $X$ is bounded. Thus we
may assume, by extracting a further subsequence if necessary, that $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ is
convergent to $(x^*, \theta^*)$, say. We proceed to show that $U_{\tau(i)} - L_{\tau(i)}$ converges to zero,
which implies the result. By Lemma 2.2 we have that
$$ 0 \le U_{\tau(i)} - L_{\tau(i)} \le v_{\tau(i)} + \varepsilon_{\tau(i)} - \theta_{\tau(i)}, $$
so if we let
$$ V_{\tau(i)} = v_{\tau(i)} + \varepsilon_{\tau(i)} - \theta_{\tau(i)}, $$
then by Lemma 3.4
$$ 0 < V_{\tau(i)} \le v_{\tau(i)} - v_{\tau(i-1)} + \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) + \varepsilon_{\tau(i)} \tag{4} $$
and
$$ \lim\, (v_{\tau(i)} - v_{\tau(i-1)}) = 0. \tag{5} $$
Furthermore, since $\operatorname{dom} Q = \mathbb{R}^n$, by Lemma 3.2, $\pi_{\tau(i-1)}^T T$ is bounded, and so
$$ \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) \to 0. $$
Substituting into (4) and taking the limit as $\tau(i) \to \infty$ yields $V_{\tau(i)} \to 0$. Since
$U_{\tau(i)} - L_{\tau(i)}$ is bounded above by $V_{\tau(i)}$ and below by 0, it must converge to 0. Now
by their definitions, $\{U_i\}$ is decreasing and $\{L_i\}$ is increasing. Hence $\{U_i - L_i\}$ is
decreasing, and since a subsequence of this sequence converges, it follows that the
whole sequence converges.
Theorem 3.6. If $\{x \ge 0 \mid Ax = b\}$ is bounded and $\operatorname{dom} Q = \mathbb{R}^n$, the inexact cut
algorithm terminates in a finite number of iterations with a $\delta$-optimal solution of P.
Proof. From Lemma 3.3 and Lemma 3.5 we have that $U_i - L_i \downarrow 0$. Therefore
there exists some $I$ such that $U_I - L_I < \delta$, so the algorithm terminates in at most $I$
iterations. Let $x_k$ be such that $U_I = c^T x_k + v_k + \varepsilon_k$. Then
$$ c^T x_k + Q(x_k) \le c^T x_k + v_k + \varepsilon_k < L_I + \delta, $$
and so $c^T x_k + Q(x_k)$ is within $\delta$ of the optimum.

We shall now consider relaxing the assumption that $\operatorname{dom} Q = \mathbb{R}^n$ to $X \subseteq \operatorname{dom} Q$.
(We retain the assumption that $X$ is bounded.) In this case we are no longer guaranteed
that $\{-\pi_i^T T\}$ is a bounded sequence, since $\{x_i\}$ could lie on the boundary of
the domain of $Q$. At such points it is possible to have unbounded $\varepsilon_i$-subgradients.
Since Lemma 3.4 remains valid without our assumption, in what follows we confine
our attention to the term
$$ \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) $$
and demonstrate that for some subsequence $\{x_{\sigma(i)}\}$ of $\{x_{\tau(i)}\}$,
$$ \lim_{i \to \infty} -\pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) = 0. \tag{6} $$
We do this by showing in Lemma 3.9 that for some subsequence $\{x_{\sigma(i)}\}$ of $\{x_{\tau(i)}\}$,
$$ \liminf\, -\pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) \ge 0. \tag{7} $$
Since by virtue of Lemma 3.4
$$ \liminf\, \pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) \ge 0, $$
we get
$$ \limsup\, -\pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) \le 0, $$
which with (7) yields (6).

The proof of Lemma 3.9 uses a subsequence of {xi } lying in the relative interior
of a face of X. Each face of X is a bounded polyhedral set. To derive the inequality
(7) we make use of the following two lemmas for polyhedral sets.
Lemma 3.7. Let
$$ K = \{x \mid b_j^T x \le \beta_j,\ 1 \le j \le k\} $$
and suppose $b_j^T x^* = \beta_j$, $1 \le j \le k$. Then $x^* + y \in K$ implies that $y$ is in the recession
cone of $K$.
Proof. Since $x^* + y \in K$ it follows that for every $j = 1, 2, \ldots, k$, $b_j^T y \le 0$, and so
for any $x \in K$, $\lambda \ge 0$ and any $j$,
$$
\begin{aligned}
b_j^T (x + \lambda y) &= b_j^T x + \lambda b_j^T y \\
&\le \beta_j + \lambda b_j^T y \\
&\le \beta_j,
\end{aligned}
\tag{8}
$$
which shows that $y$ is in the recession cone of $K$.

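Lemma 3.8. Suppose $\{x_i\}$ is a sequence of points in $G = \{x \mid b_j^T x \le \beta_j,\ 1 \le j \le m\}$,
converging to $x^*$, such that for some $k \le m$
$$ b_j^T x^* = \beta_j \ \text{ if } 1 \le j \le k, \qquad b_j^T x^* < \beta_j \ \text{ otherwise.} $$
Then there is some $\lambda > 0$ and $N$ such that for every $y$ in the recession cone of
$\{x \mid b_j^T x \le \beta_j,\ 1 \le j \le k\}$,
$$ i > N \implies x_i + \lambda \frac{y}{\|y\|} \in G. $$
Proof. If $k = m$, then the result is trivial. Otherwise $k < m$, so let
$$ K = \{x \mid b_j^T x \le \beta_j,\ 1 \le j \le k\}, $$
and define $C$ to be the recession cone of $K$. Since
$$ G = K \cap \{x \mid b_j^T x \le \beta_j,\ k < j \le m\}, $$
every member of $\{x_i\}$ lies in $K$ and satisfies
$$ x_i + \lambda y \in K, \quad \lambda \ge 0,\ y \in C. \tag{9} $$
Now since $b_j^T x^* < \beta_j$ for $k < j \le m$, we may choose $\lambda > 0$ so that
$$ \|z - x^*\| < 2\lambda \implies b_j^T z \le \beta_j, \quad k < j \le m. \tag{10} $$
Thus if $N$ is chosen sufficiently large so that
$$ i > N \implies \|x_i - x^*\| < \lambda, $$
then for every $y \in C$,
$$ \Big\| x_i + \lambda \frac{y}{\|y\|} - x^* \Big\| \le \|x_i - x^*\| + \lambda < 2\lambda. $$
It now follows from (10) that
$$ b_j^T \Big( x_i + \lambda \frac{y}{\|y\|} \Big) \le \beta_j, \quad k < j \le m. $$
Furthermore by (9),
$$ b_j^T \Big( x_i + \lambda \frac{y}{\|y\|} \Big) \le \beta_j, \quad 1 \le j \le k, $$
and so $x_i + \lambda \frac{y}{\|y\|} \in G$.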
We now apply the above lemmas to prove Lemma 3.9. The proof proceeds by
showing that for an appropriately chosen convergent subsequence $\{x_{\sigma(i)}\}$, the projection
of $\pi_{\sigma(i)}^T T$ in the direction of $x_{\sigma(i+1)} - x_{\sigma(i)}$ is uniformly bounded. Once this is
established the conclusion of Lemma 3.9 is immediate.
Lemma 3.9. Suppose $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ is a subsequence of the sequence of solutions
generated by the inexact cut algorithm, and let $\{\pi_{\tau(i)}\}$ be the corresponding approximately
optimal solutions to the dual of SP($x_{\tau(i)}$). Then there exists a subsequence of
$\{x_{\tau(i)}\}$ (indexed by $\sigma(i)$) such that $x_{\sigma(i)} \to x^*$ and
$$ \liminf\, -\pi_{\sigma(i)}^T T (x_{\sigma(i+1)} - x_{\sigma(i)}) \ge 0. $$
Proof. Since $X$ is bounded, convex, and polyhedral, the (finite) collection of all
relative interiors of the faces of $X$ partition it [16, Theorem 18.2]. Hence there is a
subsequence of $\{x_{\tau(i)}\}$, indexed by $\gamma(i)$, such that $\{x_{\gamma(i)}\}$ lies in the relative interior
of a face $G$ of $X$ and converges to a point $x^* \in G$. (We shall henceforth denote the
relative interior of $G$ by $\operatorname{ri} G$.) Since $G$ is polyhedral we may represent it by
$$ G = \{x \mid b_j^T x \le \beta_j,\ 1 \le j \le m\}. $$
If $x^*$ is in the interior of $G$, then define $C$ to be $\mathbb{R}^n$. In this case there is clearly some
$\lambda > 0$ such that for every $y \in C$, and $i$ sufficiently large, $x_{\gamma(i)} + \lambda \frac{y}{\|y\|} \in G$.
Otherwise, without loss of generality define $k$ to be such that
$$ b_j^T x^* = \beta_j,\ 1 \le j \le k, \qquad b_j^T x^* < \beta_j,\ k < j \le m, $$
and define $C$ to be the recession cone of $\{x \mid b_j^T x \le \beta_j,\ 1 \le j \le k\}$. By Lemma 3.8
there is some $\lambda > 0$ such that for every $y \in C$ and for $i$ sufficiently large,
$$ x_{\gamma(i)} + \lambda \frac{y}{\|y\|} \in G. \tag{11} $$
Since we are concerned here with the limiting behavior of $\{x_{\gamma(i)}\}$ we shall henceforth
assume that (11) holds for all members of $\{x_{\gamma(i)}\}$.
We now show that we can choose a subsequence $\{x_{\sigma(i)}\}$ of $\{x_{\gamma(i)}\}$ such that
$x_{\sigma(i-1)} - x_{\sigma(i)} \in C$. When $C = \mathbb{R}^n$ this is trivial. Otherwise we construct the subsequence
by choosing $x_{\sigma(k)}$ given $x_{\sigma(k-1)}$ in the following manner. Since $x_{\sigma(k-1)} \in \operatorname{ri} G$,
there exists $\varepsilon > 0$ such that
$$ (\{x_{\sigma(k-1)}\} + \varepsilon B) \cap \operatorname{aff} G \subseteq G, $$
where $B$ is the open unit ball and $\operatorname{aff} G$ is the affine hull of $G$. Now for $\gamma(i)$ large
enough we have that $x^* - x_{\gamma(i)} \in \varepsilon B$, and so if we choose $\sigma(k) = \gamma(i)$, then
$$ x^* + (x_{\sigma(k-1)} - x_{\sigma(k)}) = x_{\sigma(k-1)} + x^* - x_{\gamma(i)} \in G, $$
since $x_{\sigma(k-1)} + x^* - x_{\gamma(i)}$ is also in $\operatorname{aff} G$. Therefore
$$ x^* + (x_{\sigma(k-1)} - x_{\sigma(k)}) \in \{x \mid b_j^T x \le \beta_j,\ 1 \le j \le k\}, $$
and by Lemma 3.7 we deduce that
$$ x_{\sigma(k-1)} - x_{\sigma(k)} \in C. \tag{12} $$
Since $x_{\sigma(i)} \in \operatorname{ri} G$, this construction may be repeated to yield an infinite sequence.
Applying Lemma 2.1 to members of $\{x_{\sigma(i)}\}$, we have for any $x$ that
$$ Q(x) \ge Q(x_{\sigma(i-1)}) - \pi_{\sigma(i-1)}^T T (x - x_{\sigma(i-1)}) - \varepsilon_{\sigma(i-1)}. $$
If we choose
$$ x = x_{\sigma(i-1)} + \lambda \frac{x_{\sigma(i-1)} - x_{\sigma(i)}}{\|x_{\sigma(i-1)} - x_{\sigma(i)}\|}, $$
then by Lemma 3.8, (12) and (11) yield $x \in G$ and give
$$
\begin{aligned}
-\lambda \pi_{\sigma(i-1)}^T T \frac{x_{\sigma(i-1)} - x_{\sigma(i)}}{\|x_{\sigma(i-1)} - x_{\sigma(i)}\|} &\le Q(x) - Q(x_{\sigma(i-1)}) + \varepsilon_{\sigma(i-1)} \\
&\le \sup_{x \in G} Q(x) - \inf_{x \in G} Q(x) + \varepsilon_{\sigma(i-1)}.
\end{aligned}
$$
If we set $M = \sup_{x \in G} Q(x) - \inf_{x \in G} Q(x) + \varepsilon_1$, then since $\{\varepsilon_i\}$ is decreasing we obtain
$$ -\pi_{\sigma(i-1)}^T T \frac{x_{\sigma(i-1)} - x_{\sigma(i)}}{\|x_{\sigma(i-1)} - x_{\sigma(i)}\|} \le \frac{M}{\lambda}. $$
Therefore
$$ -\pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) \ge -\frac{M}{\lambda} \|x_{\sigma(i-1)} - x_{\sigma(i)}\|, $$
which implies
$$ \liminf\, -\pi_{\sigma(i-1)}^T T (x_{\sigma(i)} - x_{\sigma(i-1)}) \ge 0. $$
Theorem 3.10. If $X = \{x \ge 0 \mid Ax = b\}$ is bounded and $X \subseteq \operatorname{dom} Q$, the
inexact cut algorithm terminates in a finite number of iterations with a $\delta$-optimal
solution of P.
Proof. The proof is similar to that of Theorem 3.6. We will start by showing
$U_i - L_i \downarrow 0$. If there exists $m$ such that $\theta_i \ge v_i$ for all $i \ge m$, then Lemma 3.3
delivers the conclusion. Otherwise, there exists a subsequence $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ such
that $\theta_{\tau(i)} < v_{\tau(i)}$, and since $X$ is bounded, without loss of generality we may assume
that $\{(x_{\tau(i)}, \theta_{\tau(i)})\}$ converges to $(x^*, \theta^*)$, say. Then by Lemma 3.4,
$$ \liminf\, \pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) \ge 0. $$
Thus
$$ \limsup\, -\pi_{\tau(i-1)}^T T (x_{\tau(i)} - x_{\tau(i-1)}) \le 0. \tag{13} $$
Now we can apply Lemma 3.9 to extract a subsequence $\{x_{\sigma(i)}\}$ of $\{x_{\tau(i)}\}$ such that
$$ \liminf\, -\pi_{\sigma(i)}^T T (x_{\sigma(i+1)} - x_{\sigma(i)}) \ge 0. \tag{14} $$
From (13) and (14) we have
$$ -\pi_{\sigma(i)}^T T (x_{\sigma(i+1)} - x_{\sigma(i)}) \to 0. $$
This yields $U_{\sigma(i)} - L_{\sigma(i)} \to 0$, implying that the decreasing sequence $\{U_i - L_i\}$
tends to 0, which then gives the result as in the proof of Theorem 3.6.
4. Dantzig–Wolfe decomposition. It is well known that Benders decompo-
sition is dual to Dantzig–Wolfe decomposition. Therefore some form of inexact opti-
mization procedure should apply to the latter algorithm in a way that mirrors the
steps of the inexact cut algorithm described in section 2. In fact such a scheme has
been outlined in the literature by Kim and Nazareth [15], who discuss the compu-
tational advantages of using interior-point methods in such an approach. We digress
briefly in this section to explore the asymptotic convergence properties of such an
algorithm.
The dual problem of P can be formulated as
$$
\begin{aligned}
\text{D:}\quad \text{maximize}\quad & b^T u + h^T v \\
\text{subject to}\quad & A^T u + T^T v \le c, \\
& W^T v \le q.
\end{aligned}
$$
Suppose for the moment that the set $V = \{v \mid W^T v \le q\}$ is bounded with extreme
points $\{v_i\}$. Then Dantzig–Wolfe decomposition solves a restricted master problem
$$
\begin{aligned}
\text{MD:}\quad \text{maximize}\quad & b^T u + \sum_i \lambda_i h^T v_i \\
\text{subject to}\quad & A^T u + \sum_i \lambda_i T^T v_i \le c, \\
& \sum_i \lambda_i = 1, \\
& \lambda \ge 0,
\end{aligned}
$$
where the summations are taken over a subset of $\{v_i\}$. New extreme points are added
iteratively to this subset by solving MD, obtaining optimal dual variables $(x, \theta)$, and
then solving the subproblem
$$
\begin{aligned}
\text{SD}(x):\quad \text{maximize}\quad & (h^T - x^T T^T) v \\
\text{subject to}\quad & W^T v \le q,
\end{aligned}
$$
to give a new column $\begin{pmatrix} T^T v_i \\ 1 \end{pmatrix}$ to be added to the restricted master problem, in the
event that this column has a positive reduced cost defined by
$$ (h^T - x^T T^T) v_i - \theta. $$
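The pricing step just described amounts to two small computations: the reduced cost (h − Tx)ᵀv − θ of a candidate point v, and the column that enters the restricted master problem. The Python sketch below uses plain nested lists for matrices. The stacked column form (Tᵀv, 1) is read off from the two constraint rows of MD and should be treated as an assumption, as should the numerical data and the function names.

```python
def reduced_cost(h, T, x, v, theta):
    """Reduced cost (h - T x)^T v - theta of a candidate point v."""
    Tx = [sum(T[r][j] * x[j] for j in range(len(x))) for r in range(len(T))]
    return sum((h[r] - Tx[r]) * v[r] for r in range(len(h))) - theta


def new_column(T, v):
    """Column for the restricted master problem: T^T v stacked on top of the
    coefficient 1 from the convexity row sum(lambda) = 1."""
    n_cols = len(T[0])
    return [sum(T[r][j] * v[r] for r in range(len(T))) for j in range(n_cols)] + [1.0]


h = [4.0, 2.0]
T = [[1.0, 0.0],
     [1.0, 1.0]]
x = [1.0, 1.0]          # duals of the coupling rows of MD
theta = 5.0             # dual of the convexity row
v = [3.0, 0.5]          # candidate point returned by the pricing subproblem

rc = reduced_cost(h, T, x, v, theta)
column = new_column(T, v) if rc > 0 else None
print(rc, column)
```

Here h − Tx = (3, 0), so the reduced cost is 3·3 − 5 = 4 and the column (3.5, 0.5, 1) would be added.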
In our inexact Dantzig–Wolfe decomposition algorithm we first choose a convergence
tolerance $\delta$, set an iteration counter $i := 0$, and choose some decreasing sequence
$\{\varepsilon_i\}$ that converges to 0. We do not require that $V$ be bounded, but following [15]
we require an initial set of (not necessarily extreme) points $\{v_1, v_2, \ldots, v_N\} \subseteq V$ such
that MD has a feasible solution. The algorithm then proceeds as follows.
Inexact Dantzig–Wolfe decomposition algorithm.
While $U_i - L_i > \delta$
(1) Set $i := i + 1$.
(2) Solve MD to obtain $(u_i, \lambda)$ and dual variables $x_i$ and $\theta_i$.
(3) Set $L_i := b^T u_i + \sum_j \lambda_j h^T v_j$.
(4) Perform an inexact optimization to generate a vector $v_i$ feasible for SD($x_i$)
such that
$$ v_i^T (h - T x_i) + \varepsilon_i > V(\text{SD}(x_i)). \tag{15} $$
(5) Set $U_i := \min\{U_{i-1},\ c^T x_i + v_i^T (h - T x_i) + \varepsilon_i\}$.
(6) If $v_i^T (h - T x_i) > \theta_i$, then add the column $\begin{pmatrix} T^T v_i \\ 1 \end{pmatrix}$ to MD,
else set $i := i + 1$, $x_{i+1} := x_i$, $\theta_{i+1} := \theta_i$, $L_{i+1} := L_i$, $U_{i+1} := U_i$ and go to
step 4.
Here $V(\text{SD}(x_i))$ is the optimal value of SD($x_i$). Since the dual of SD($x_i$) is easily
seen to be SP($x_i$), $V(\text{SD}(x_i)) = Q(x_i)$, and so step 4 of this algorithm is identical to
the same step of the inexact cut algorithm of section 2.
In classical Dantzig–Wolfe decomposition, each solution vi obtained for SD is an
extreme point, of which there is a finite number, thus guaranteeing finite termination.
In the inexact algorithm, this is no longer true. However, Theorem 3.10 may be
invoked to yield the following corollary.
Corollary 4.1. If X = {x ≥ 0 | Ax = b} is a bounded set and for every x ∈ X
the problem SD(x) is bounded, then the inexact Dantzig–Wolfe algorithm terminates
in a finite number of iterations with a δ-optimal solution of D.
Since SD(x) will always have a feasible solution (if D does), the boundedness
condition on SD(x) is equivalent to SP(x) being feasible, which is the relatively com-
plete recourse assumption of the previous section. The other assumption, that X is
bounded, appears to be rather restrictive in the current context, and it fails to hold
in the case when A and b are both absent, a typical situation in many applications of
Dantzig–Wolfe decomposition. The convergence proof requires X to be bounded to
enable the extraction of convergent subsequences. Even when A and b fail to bound
X, we can still extract convergent subsequences as long as we have a guarantee that
the sequence {xi } lies in a bounded set. In Benders decomposition we can enforce this
condition in practice by placing a priori bounds on the components of x. Similarly, in
inexact Dantzig–Wolfe decomposition we can impose a priori bounds on the optimal
dual variables for the master problem constraints (by placing a priori penalties on
infeasibilities in these constraints).
5. Computational results. We conclude by presenting some computational
results of applying the inexact cut algorithm to a set of problems that arise in the
planning of hydroelectric power generation. The problems are all based on a mul-
tistage stochastic programming model developed by Broad [4], in which the New
Zealand electricity system is represented as a side-constrained network model with
nodes representing hydroelectric reservoirs, hydroelectric generation facilities, thermal
generation facilities, and demand points and arcs with constant losses representing
the transmission network. The model consists of six reservoirs, six thermal stations,
and 22 hydrostations.
Each stage is a week long, and demand in each week is represented by a piecewise
linear load duration curve with three linear sections. At each stage several random
outcomes are possible for the inflows into the reservoirs in the current week. We impose
a lower bound on the final level of the reservoirs at the end of the final stage. This lower
bound is a fixed fraction of the original initial level of the reservoirs in the very first
stage. Additional side constraints include DC load flow constraints that govern the
transmission flows and conservation of water flow equations in hydroelectric systems.
The linear program for each stage has 273 variables and 120 constraints. The objective
in each stage is to minimize the cost of thermal electricity generation over the current
week plus the expected future cost of thermal generation.
The multistage models described above were converted into two-stage and three-
stage problems by aggregating consecutive stages into larger problems. For example, to
obtain a two-stage problem from a multistage problem we aggregate each second-stage
problem and its descendants into a single deterministic equivalent linear program.
Table 1
Problem sizes (constraints × variables).

Problem   Stages after   Deterministic      Subproblem        Stages before   Outcomes
          aggregation    equivalent P                         aggregation     per stage
P1        2              10,920 × 24,843    1,200 × 2,730     3               9
P2        2              10,920 × 24,843    1,200 × 2,730     3               9
P3        2              14,520 × 33,033    4,800 × 10,920    5               3
P4        2              14,520 × 33,033    4,800 × 10,920    5               3
P5        2              43,680 × 99,372    14,520 × 33,033   6               3
P6        2              43,680 × 99,372    14,520 × 33,033   6               3
P7        3              14,520 × 33,033    1,560 × 3,549     5               3
P8        3              14,520 × 33,033    1,560 × 3,549     5               3
P9        3              43,680 × 99,372    4,800 × 10,920    6               3
P10       3              43,680 × 99,372    4,800 × 10,920    6               3
P11       3              35,154 × 42,966    1,404 × 1,560     5               5

Similarly, to obtain a three-stage problem from a multistage problem, we aggregate
each third-stage problem and its descendants into a single deterministic equivalent
linear program. Table 1 presents the size and characteristics of the resulting problems.
Although the problems in each pair have the same size, they differ in the lower bounds
imposed on the final levels of the reservoirs. Column 1 of Table 1 gives the problem
identifiers, column 2 presents the number of stages in the problem (after aggregation),
and column 3 contains the size of the deterministic equivalent problem. Column 4
contains the size of each subproblem after aggregation. Column 5 contains the number
of stages in the problem before aggregation. For example, problem P5 is a six-stage
problem, in which we have aggregated the last five stages to produce a two-stage
problem. The last column contains the number of random outcomes (inflows) at each
stage.
When applied to stochastic programs, Benders decomposition and the inexact cut
algorithm must solve a number of subproblems in each iteration. The resulting cut
has as coefficients the expectation of the subproblem coefficients. In the case of three-
stage problems we traverse the scenario tree depth first using the fast pass procedure
(see [12, 18]).
Benders decomposition and the inexact cut algorithm were both implemented
using CPLEX 4.0’s primal-dual interior-point solver baropt to solve the subproblems
and the simplex solver optimize to solve the first-stage problems. We do not apply
the crossover operation (hybbaropt) in solving the subproblems. For the inexact cut
algorithm we terminate optimizing the last stage problems once an $\varepsilon$-optimal solution
has been achieved. (All but last-stage problems are solved to optimality.) We start
with $\varepsilon = 10{,}000$ and reduce it by a factor of 10 at each iteration; we terminate baropt
when both primal and dual feasibility are attained in the subproblem and the dual
objective is at most $\varepsilon$ away from the primal objective.
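The tolerance schedule and stopping test described here are simple enough to write down directly. The Python sketch below is only an illustration: the function names are hypothetical, and the feasibility flags and objective values are assumed to be reported by whatever interior-point solver is in use (no CPLEX call is shown or implied).

```python
from itertools import islice


def epsilon_schedule(eps0=10_000.0, factor=10.0):
    """Yield eps0, eps0/factor, eps0/factor**2, ... one value per iteration."""
    eps = eps0
    while True:
        yield eps
        eps /= factor


def can_stop(primal_feasible, dual_feasible, primal_obj, dual_obj, eps):
    """Stop the subproblem solve once both iterates are feasible and the
    dual objective is within eps of the primal objective."""
    return primal_feasible and dual_feasible and (primal_obj - dual_obj) <= eps


first_four = list(islice(epsilon_schedule(), 4))
print(first_four)
print(can_stop(True, True, primal_obj=105.0, dual_obj=100.0, eps=10.0))
```

With the defaults taken from the text, the first four tolerances are 10,000, 1,000, 100, and 10.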
Observe that obtaining a primal feasible solution is not a key requirement of the
algorithm but gives a convenient means for bounding how far our dual solution is from
optimality; there is potential for efficiency improvements if a bound can be found that
requires less computation. Indeed it is easy to see that since the proof of convergence
works with a subsequence of the iterates, the requirement that $\varepsilon_i$ decreases monotonically
is not necessary, as long as $\varepsilon_i \to 0$. This raises the (unexplored) possibility of
ignoring $\varepsilon_i$, at least in the early stages of the algorithm, and interrupting baropt in
step 4 as soon as dual feasibility is attained, then restarting it only if $\pi_i^T (h - T x_i) \le \theta_i$
(i.e., the cut is not exact enough to change $x_i$).
Table 2
Performance comparison (times in seconds).

Problem   BD cuts   Inexact cuts   BD time   Inexact time   Improvement
P1        22        9              170       68             60%
P2        33        20             261       159            39%
P3        5         5              124       109            12%
P4        24        17             640       398            38%
P5        4         4              594       546            8%
P6        4         4              626       585            7%
P7        30        14             324       150            54%
P8        33        27             376       304            19%
P9        17        15             1207      1087           10%
P10       14        11             979       780            20%
P11       4         4              150       134            11%

Table 2 contains a comparison of the computational results for the two methods.
The termination criterion for both algorithms requires a relative gap of $10^{-5}$
between the upper and the lower bounds (i.e., we stop when $(U - L)/U < 10^{-5}$).
All times are reported on an SGI Power Challenge. Column 1 contains the problem
identifiers. Columns 2 and 3 contain the number of cuts under the exact and inexact
cut algorithms, respectively. Columns 4 and 5 contain the timing in seconds for the
exact and inexact methods, respectively. The last column contains the percentage
improvement of the inexact cut algorithm over the exact Benders decomposition
algorithm. The entries in this column are calculated as
$\frac{\text{exact time} - \text{inexact time}}{\text{exact time}} \times 100\%$.
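
As a check on the last column, the percentages can be recomputed from the reported times. The sketch below copies the times and percentages from Table 2 in row order; the function names are hypothetical. The stopping test `gap_closed` assumes the gap is measured relative to the upper bound and that the upper bound is positive.

```python
# (exact_time_s, inexact_time_s, reported_percent), in the row order of Table 2
TABLE_2_TIMES = [
    (170, 68, 60), (261, 159, 39), (124, 109, 12), (640, 398, 38),
    (594, 546, 8), (626, 585, 7), (324, 150, 54), (376, 304, 19),
    (1207, 1087, 10), (979, 780, 20), (150, 134, 11),
]


def improvement_percent(exact_time, inexact_time):
    """Percentage improvement of the inexact method over the exact method."""
    return 100.0 * (exact_time - inexact_time) / exact_time


def gap_closed(upper, lower, tol=1e-5):
    """Relative-gap stopping test; assumes upper > 0 (an assumption of this sketch)."""
    return (upper - lower) / upper < tol


# Round each recomputed percentage to the nearest integer, as in the table.
recomputed = [round(improvement_percent(e, i)) for e, i, _ in TABLE_2_TIMES]
```

Rounded to the nearest integer, the recomputed values agree with every entry in the last column of the table.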
Note that traditionally the subproblems are not aggregated and they are solved
using the (dual) simplex method with warm starting. For some problems this is more
efficient than using an interior-point method on an aggregated subproblem, although
in other cases (e.g., P3, P7, and P11) we experienced significant speed-up by aggre-
gating and using the interior-point method versus Benders decomposition with warm
starting simplex. It may be possible to warm start the interior-point method effec-
tively when solving the subproblems, using recent research developed to this end (see,
for example, [19, 9]).
6. Conclusions. In every one of our problems the inexact cut algorithm im-
proved the time to obtain a solution with the same accuracy as that of the Benders
decomposition algorithm. In our experiments, the choice of $\{\epsilon_i\}$ is made
independently of the problem. Further improvements in speed can be achieved by making
a problem-dependent choice of $\{\epsilon_i\}$. In Table 2 the greatest improvements were
obtained in cases where the Benders decomposition required a large number of cuts. In
these cases we observed that often during the course of the exact algorithm the lower
bounds did not change over the course of several iterations. The inexact cut algorithm
does not display this behavior, and it reaches an approximately optimal solution with
fewer cuts. This suggests that computing cuts inexactly is a promising and simple im-
provement strategy for operations research practitioners who observe similar behavior
in Benders decomposition applied to their stochastic linear programming models.
Acknowledgments. We thank Michael Saunders, Suvrajeet Sen, and the anonymous
referees for their insightful comments, which significantly improved the exposition
of this paper.
REFERENCES
[1] K. T. Au, J. L. Higle, and S. Sen, Inexact subgradient methods with applications in stochastic
programming, Math. Programming, 63 (1994), pp. 65–82.
[2] O. Bahn, O. Du Merle, J.-L. Goffin, and J.-P. Vial, A cutting plane method from analytic
centers for stochastic programming, Math. Programming Ser. B, 69 (1995), pp. 45–73.
[3] J. F. Benders, Partitioning procedures for solving mixed-variables programming problems,
Numer. Math., 4 (1962), pp. 238–252.
[4] K. P. Broad, Power Generation Planning Using Scenario Aggregation, M.S. thesis, University
of Auckland, Auckland, New Zealand, 1996.
[5] G. B. Dantzig and P. Wolfe, Decomposition principle for linear programs, Oper. Res., 8
(1960), pp. 101–111.
[6] G. B. Dantzig and P. Wolfe, The decomposition algorithm for linear programs, Economet-
rica, 29 (1961), pp. 767–778.
[7] E. Flippo and A. Rinnooy Kan, Decomposition in general mathematical programming, Math.
Programming, 60 (1993), pp. 361–382.
[8] A. M. Geoffrion, Generalized Benders decomposition, J. Optim. Theory Appl., 10 (1972),
pp. 237–260.
[9] J. Gondzio, Warm start of the primal-dual method applied in the cutting-plane scheme, Math.
Programming Ser. A, 83 (1998), pp. 125–143.
[10] J. L. Higle and S. Sen, On the convergence of algorithms with implications for stochastic and
nondifferentiable optimization, Math. Oper. Res., 17 (1992), pp. 112–131.
[11] W. Hogan, Application of general convergence theory for outer approximation algorithms,
Math. Programming, 5 (1973), pp. 151–168.
[12] J. Jacobs, G. Freeman, J. Grygier, D. Morton, G. Schultz, K. Staschus, and J. Ste-
dinger, SOCRATES: A system for scheduling hydro-electric generation under uncertainty,
Ann. Oper. Res., 59 (1995), pp. 99–133.
[13] P. Kall and S. W. Wallace, Stochastic Programming, John Wiley, New York, 1994.
[14] J. E. Kelley, Jr., The cutting-plane method for convex programs, J. Soc. Indust. Appl. Math.,
8 (1960), pp. 703–712.
[15] K. Kim and J. L. Nazareth, The decomposition principle and algorithm for linear program-
ming, Linear Algebra Appl., 152 (1991), pp. 119–133.
[16] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[17] R. M. Van Slyke and R. Wets, L-shaped linear programs with applications to optimal control
and stochastic programming, SIAM J. Appl. Math., 17 (1969), pp. 638–663.
[18] R. J. Wittrock, Advances in a Nested Decomposition Algorithm for Solving Staircase Linear
Programs, Report SOL 83-2, Systems Optimization Laboratory, Department of Operations
Research, Stanford University, Stanford, CA, 1983.
[19] G. Zakeri, D. M. Ryan, and A. B. Philpott, Techniques for Solving Large Scale Set Parti-
tioning Problems, Technical report, University of Auckland, New Zealand, 1996.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
