
A limited memory Broyden method to solve
high-dimensional systems of nonlinear equations

Thesis

submitted in order to obtain
the degree of Doctor at Leiden University,
by authority of the Rector Magnificus Dr. D. D. Breimer,
professor in the Faculty of Mathematics and Natural Sciences
and that of Medicine, in accordance with the decision of the
Doctorate Board, to be defended on Tuesday 9 December 2003
at 15:15

by

Bartholomeus Andreas van de Rotten

born in Uithoorn on 20 October 1976
Composition of the doctoral committee:

supervisors:   prof. dr. S.M. Verduyn Lunel
               prof. dr. ir. A. Bliek (Universiteit van Amsterdam)

referee:       prof. dr. D. Estep (Colorado State University)

other members: prof. dr. G. van Dijk
               dr. ir. H.C.J. Hoefsloot (Universiteit van Amsterdam)
               prof. dr. R. van der Hout
               dr. W.H. Hundsdorfer (CWI, Amsterdam)
               prof. dr. L.A. Peletier
               prof. dr. M.N. Spijker
A limited memory Broyden method to solve high-dimensional systems of nonlinear equations

Bart van de Rotten
Mathematisch Instituut, Universiteit Leiden, The Netherlands
ISBN: 90-9017576-8
Printed by PrintPartners Ipskamp
The research that led to this thesis was funded by N.W.O. (Nederlandse organisatie voor Wetenschappelijk Onderzoek, the Netherlands Organisation for Scientific Research) grant 616-61-410 and supported by the Thomas Stieltjes Institute for Mathematics.
Contents

Introduction

I Basics of limited memory methods

1 An introduction to iterative methods
  1.1 Iterative methods in one variable
  1.2 The method of Newton
  1.3 The method of Broyden

2 Solving linear systems with Broyden's method
  2.1 Exact convergence for linear systems
  2.2 Two theorems of Gerber and Luk
  2.3 Linear transformations

3 Limited memory Broyden methods
  3.1 New representations of Broyden's method
  3.2 Broyden Rank Reduction method
  3.3 Broyden Base Reduction method
  3.4 The approach of Byrd

II Features of limited memory methods

4 Features of Broyden's method
  4.1 Characteristics of the Jacobian
  4.2 Solving linear systems with Broyden's method
  4.3 Introducing coupling
  4.4 Comparison of selected limited memory Broyden methods

5 Features of the Broyden rank reduction method
  5.1 The reverse flow reactor
  5.2 Singular value distributions of the update matrices
  5.3 Computing on a finer grid using the same amount of memory
  5.4 Comparison of selected limited memory Broyden methods

III Limited memory methods applied to periodically forced processes

6 Periodic processes in packed bed reactors
  6.1 The advantages of periodic processes
  6.2 The model equations of a cooled packed bed reactor

7 Numerical approach for solving periodically forced processes
  7.1 Discretization of the model equations
  7.2 Tests for the discretized model equations
  7.3 Bifurcation theory and continuation techniques

8 Efficient simulation of periodically forced reactors in 2D
  8.1 The reverse flow reactor
  8.2 The behavior of the reverse flow reactor
  8.3 Dynamic features of the full two-dimensional model

Notes and comments

Bibliography

A Test functions

B Matlab code of the limited memory Broyden methods

C Estimation of the model parameters

Samenvatting (Waarom Broyden?) [Summary in Dutch: Why Broyden?]

Nawoord [Afterword]

Curriculum Vitae


Introduction

Periodic chemical processes form a field of major interest in chemical reactor engineering. Examples of such processes are the pressure swing adsorber (PSA), the thermal swing adsorber (TSA), the reverse flow reactor (RFR), and the more recently developed pressure swing reactor (PSR). The state of a chemical reactor containing a periodically forced process is given by the temperature and concentration profiles of the reactants. Starting from an initial state, the reactor generally goes through a transient phase lasting many periods before converging to a periodic limiting state, also known as the cyclic steady state (CSS). Because the reactor operates in this state most of the time, it is interesting to investigate how the cyclic steady state depends on the operating parameters of the reactor.
The simulation of periodically forced processes in packed bed reactors leads to models consisting of partial differential equations. In order to investigate the behavior of the system numerically, we discretize the equations in space. The action of the process during one period can then be computed by integrating the resulting system of ordinary differential equations in time over one period. The map that assigns to an initial state of the process the state after one period is called the period map. We denote the period map by f : R^n → R^n; in general, it is highly nonlinear. The dynamical process in the reactor can now be formulated as the dynamical system

    x_{k+1} = f(x_k),   k = 0, 1, 2, ...,

where x_k denotes the state of the reactor after k periods. Periodic states of the reactor are fixed points of the period map f, and a stable cyclic steady state can be computed by taking the limit of x_k as k → ∞. Depending on the convergence properties of the system at hand, the transient phase of the process might be very long, and efficient methods to find the fixed points of f are essential.


Fixed points of the map f correspond to zeros of g : R^n → R^n, where g is given by

    g(x) = f(x) − x.

So, the basic equation we want to solve is

    g(x) = 0,   x ∈ R^n.    (1)

Because (1) is a system of n nonlinear equations, iterative algorithms are needed to approximate a zero of the function g. These algorithms produce a sequence {x_k} of approximations to the zero x* of g. A function evaluation can be a rather expensive task, and it is generally accepted that the most efficient iterative algorithms for solving (1) are those that use the smallest number of function evaluations.
In his thesis [49], Van Noorden compares several iterative algorithms for determining the CSS of periodically forced processes by solving (1). He concludes that for this type of problem the Newton-Picard method and the method of Broyden are especially promising.
In this thesis, the study of periodically forced processes is extended to more complex models. Because the dimension of the discretized system of such models is very large, memory constraints arise and care must be taken in the choice of iterative methods. It is therefore necessary to develop limited memory algorithms to solve (1), that is, algorithms that use a restricted amount of memory. Since the method of Broyden is popular in chemical reactor engineering, we focus on approaches aimed at reducing the memory needed by the method of Broyden. We call the resulting algorithms limited memory Broyden methods.

Basics of limited memory methods


The standard iterative algorithm is the method of Newton. Let x_0 ∈ R^n be an initial guess in the neighborhood of a zero x* of g. Newton's method defines a sequence {x_k} in R^n of approximations of x* by

    x_{k+1} = x_k − J_g^{-1}(x_k) g(x_k),   k = 0, 1, 2, ...,    (2)

where J_g(x) is the Jacobian of g at the point x. An advantage of the method of Newton is that the convergence is quadratic in a neighborhood of a zero, i.e.,

    ‖x_{k+1} − x*‖ < c ‖x_k − x*‖^2,

for a certain constant c > 0. Since it is not always possible to determine the Jacobian of g analytically, we often have to approximate J_g using finite differences. The number of function evaluations per iteration step in the resulting approximate Newton method is n + 1.
In 1965, Broyden [8] proposed a method that uses only one function evaluation per iteration step instead of n + 1. The main idea of Broyden's method is to approximate the Jacobian of g by a matrix B_k. Thus the scheme (2) is replaced by

    x_{k+1} = x_k − B_k^{-1} g(x_k),   k = 0, 1, 2, ....    (3)

After every iteration step, the Broyden matrix B_k is updated using a rank-one matrix. If g is an affine function, that is, g(x) = Ax + b for some A ∈ R^{n×n} and b ∈ R^n, then

    g(x_{k+1}) − g(x_k) = A(x_{k+1} − x_k)

holds. In view of this equality, the updated Broyden matrix B_{k+1} is chosen such that it satisfies the equation

    y_k = B_{k+1} s_k,    (4)

with

    s_k = x_{k+1} − x_k   and   y_k = g(x_{k+1}) − g(x_k).

Equation (4) is called the secant equation, and algorithms for which this condition is satisfied are called secant methods. If we assume that B_{k+1} and B_k are identical on the orthogonal complement of the linear space spanned by s_k, the condition in (4) results in the following update scheme for the Broyden matrix B_k:

    B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k) = B_k + g(x_{k+1}) s_k^T / (s_k^T s_k).    (5)
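In the spirit of the Matlab codes of Appendix B, scheme (3) with update (5) can be sketched as follows (a minimal sketch: the function name broyden_sketch, the handle g, the tolerance tol and the iteration cap maxit are ours, and no safeguards are included):

    function x = broyden_sketch(g, x0, B0, tol, maxit)
    % Minimal sketch of the method of Broyden, cf. (3) and (5).
    x = x0; B = B0; gx = g(x);
    for k = 1:maxit
        s = -B \ gx;                   % Broyden step, see (3)
        x = x + s;
        gxnew = g(x);
        % update (5); note y - B*s = g(x_{k+1}), since B*s = -g(x_k)
        B = B + gxnew * s' / (s' * s);
        gx = gxnew;
        if norm(gx) < tol, break, end
    end

Storing and factoring the full n-by-n matrix B is exactly what becomes impossible for large n; this is the issue that the limited memory methods of this thesis address.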

In 1973, Broyden, Dennis and Moré [11] published a proof that the method of Broyden is locally q-superlinearly convergent, i.e.,

    lim_{k→∞} ‖x_{k+1} − x*‖ / ‖x_k − x*‖ = 0.

In 1979, Gay [22] proved that for linear problems the method of Broyden is in fact exactly convergent in 2n iterations. Moreover, he showed that this implies local 2n-step quadratic convergence for nonlinear problems,

    ‖x_{k+2n} − x*‖ ≤ c ‖x_k − x*‖^2,

with c > 0. This proof of exact convergence was simplified and sharpened in 1981 by Gerber and Luk [23]. In practice, these results imply that the method of Broyden needs more iterations to converge than the method of Newton. Yet, since only one function evaluation is made per iteration step, the method of Broyden may significantly reduce the amount of CPU time needed to solve the problem.
In Chapter 1, we discuss the method of Newton and the method of Broyden in more detail and, in particular, describe their derivation and convergence properties. Subsequently, we consider the method of Broyden for linear systems of equations in Chapter 2. We deduce that Broyden's method uses selective information about the system to solve it.
Both Newton's and Broyden's method need to store an (n × n)-matrix, see (2) and (3). For high-dimensional systems, this might therefore lead to severe memory constraints. From the early seventies on, serious attention has been paid to the issue of reducing the amount of storage required by iterative methods, and different techniques have appeared for solving large nonlinear problems [62].
The problem that we consider is the general nonlinear equation (1), so nothing is known beforehand about the structure of the Jacobian of the system. In Chapter 3, we develop several limited memory methods that do not depend on the structure of the Jacobian and are based on the method of Broyden. In addition to a large reduction of the memory used, these limited memory methods give more insight into the original method of Broyden, since we investigate the question of how much and which information can be dropped without destroying the property of superlinear convergence. In Section 3.2, we derive our main algorithm, the Broyden Rank Reduction method (BRR). To introduce the idea of the BRR method, we first consider an example.

Example 1. The period map f : R^n → R^n to be considered is a small (take ε = 1.0·10^{-2}) quadratic perturbation of two times the identity map,

    f(x) = (2x_1 − ε x_2^2, ..., 2x_{n−1} − ε x_n^2, 2x_n)^T.    (6)

The unique fixed point of the function f, x* = 0, can be found by applying Broyden's method to solve (1) with g(x) = f(x) − x up to a certain residual,

    ‖g(x)‖ < 1.0·10^{-12},

using the initial estimate x_0 = (1, ..., 1). In order to obtain a good example of memory reduction, we choose n = 100,000. Starting with a simple initial matrix B_0 = −I, the first Broyden matrix is given by

    B_1 = B_0 + c_1 d_1^T,

where c_1 = g(x_1)/‖s_0‖ and d_1 = s_0/‖s_0‖. Because B_0 does not have to be stored, it is more economical to store the vectors c_1 and d_1 instead of the matrix B_1 itself. Applying another update, we obtain the second Broyden matrix,

    B_2 = B_1 + c_2 d_2^T = B_0 + c_1 d_1^T + c_2 d_2^T,    (7)

where c_2 = g(x_2)/‖s_1‖ and d_2 = s_1/‖s_1‖. Now 4·n = 400,000 locations are used to store the vector pairs {c_1, d_1} and {c_2, d_2}. In the next iteration step, we would need 6·n = 600,000 storage locations to store all of the vector pairs.
We consider the update matrix in the fifth iteration, which consists of the first five rank-one updates to the Broyden matrix,

    Q = c_1 d_1^T + c_2 d_2^T + ... + c_5 d_5^T.

Because Q is the sum of five rank-one matrices, it has rank less than or equal to five, and if we compute the singular value decomposition of Q (see Section 3.2 for details) we see that Q can be written as

    Q = σ_1 u_1 v_1^T + ... + σ_5 u_5 v_5^T,

where {u_1, ..., u_5} and {v_1, ..., v_5} are orthonormal sets of vectors and

    σ_1 = 2.49,  σ_2 = 1.61,  σ_3 = 0.214·10^{-5},  σ_4 = 0.121·10^{-12},  σ_5 = 0.00.

This suggests that we can ignore the singular value σ_5 in the singular value decomposition of Q without changing the update matrix Q. We replace the matrix Q by Q̃ with

    Q̃ = Q − σ_5 u_5 v_5^T = σ_1 u_1 v_1^T + ... + σ_4 u_4 v_4^T.

We define c̃_i := σ_i u_i and d̃_i := v_i for i = 1, ..., 4. The difference between the original Broyden matrix B_5 = B_0 + Q and the 'reduced' matrix B̃_5 = B_0 + Q̃ can be estimated as

    ‖B_5 − B̃_5‖ = ‖B_0 + Q − B_0 − Q̃‖ = ‖σ_5 u_5 v_5^T‖ = σ_5 ‖u_5‖ ‖v_5‖ = σ_5,

which is equal to zero in this case. After this reduction we can store a new pair of update vectors,

    c_5 := g(x_6)/‖s_5‖   and   d_5 := s_5/‖s_5‖.

Continuing in this way, in every iteration step we first remove the singular value σ_5 of Q before computing the new update. This leads to Algorithm 3.11 of Section 3.2, the Broyden Rank Reduction method, with parameter p = 5. Surprisingly, the fifth singular value of the update matrix remains zero in all subsequent iterations until the process has converged, see Figure 1. Therefore, if in every iteration we keep the four largest singular values of the update matrix and drop the fifth, we do not alter the Broyden matrix. In fact, we apply the method of Broyden itself.
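In Matlab notation the reduction step can be sketched as follows (a minimal sketch under the assumption that the update vectors are stored as the columns of n-by-p matrices C and D, so that the update matrix is Q = C*D'; the QR-based route to the SVD is one cheap way to do this, and Algorithm 3.11 in Section 3.2 gives the formulation actually used in this thesis):

    % Drop the smallest singular value of Q = C*D' without forming Q.
    [QC, RC] = qr(C, 0);           % economy-size QR, QC is n-by-p
    [QD, RD] = qr(D, 0);
    [U, S, V] = svd(RC * RD');     % SVD of a small p-by-p matrix
    C = QC * U(:, 1:end-1) * S(1:end-1, 1:end-1);  % keep p-1 components
    D = QD * V(:, 1:end-1);
    % C*D' now equals Q minus its smallest singular component, and one
    % column pair is free to receive the next Broyden update.

Only matrices of p columns are ever stored, so the cost is 2pn locations instead of n^2.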

Figure 1: The singular values of the update matrix during the BRR process with p = 5 (singular values versus iteration k).

The rate of convergence of this process is plotted in Figure 2, together with the rate of convergence of the BRR method for other values of p. If p is larger than 5, the rate of convergence does not increase. We observe that the residual ‖g(x_k)‖ is approximately 10^{-14} after 14 iterations. For p = 5, the number of required storage locations is reduced from n^2 = 10^{10} for the Broyden matrix of the original method to 2pn = 10^6 for the BRR method.
Note that p cannot be arbitrarily small, and care is needed to find the optimal p. Clearly, the BRR process does not converge for p = 2. In the first iterations it might not be harmful to remove the second singular value of the update matrix, but after 8 iterations the process fails to maintain the fast q-superlinear convergence and starts to diverge. For p = 3 we observe the same kind of behavior, where the difficulties start in the 9th iteration.

Figure 2: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of the Broyden Rank Reduction method when computing a fixed point of the function f given by (6). ['◦' (p = 1), '×' (p = 2), '+' (p = 3), '∗' (p = 4), '□' (p = 5)]

The reduction applied to the update matrix Q can also be explained as follows. For the method of Broyden, the action of the Broyden matrix has to be known in all n directions, for example, in order to compute the new Broyden step s_k, see (3). The BRR method, in contrast, makes do with the action of the Broyden matrix in only p directions. These directions are produced by the Broyden process itself.

Features of limited memory methods

In Part I, we discuss most of the properties of Broyden's method and the newly developed Broyden Rank Reduction method. The good results in practical applications of Broyden's method can only be explained to a limited degree. Therefore, in Part II we apply the method of Broyden and the BRR method to several test functions. We are especially interested in nonlinear test functions, since it is known that for linear systems of equations Broyden's method is far less efficient than, for example, GMRES [61] and Bi-CGSTAB [70, 71].
In the neighborhood of the solution, a function can often be considered as approximately affine. Moreover, the rate of convergence of Broyden's method applied to the linearization of a function indicates how many iterations it might need for the function itself. In Part I, we show that if we apply Broyden's method to an affine function g(x) = Ax + b, the difference ‖B_k − A‖ between the Broyden matrix and the Jacobian of the function does not increase as k increases, if measured in the matrix norm induced by the l_2 vector norm. For nonlinear functions g the difference ‖B_k − J_g(x*)‖_F, measured in the Frobenius norm, may increase. However, we can choose a neighborhood N_1 of the solution x* and a neighborhood N_2 of the Jacobian J_g(x*), such that if (x_0, B_0) ∈ N_1 × N_2, the difference ‖B_k − J_g(x*)‖_F never exceeds two times the initial difference ‖B_0 − J_g(x*)‖_F.

Example 2. Let the matrix A be given by the sum

    A = \begin{pmatrix} 2 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 2 \end{pmatrix} + S,    (8)
where the elements of S are between zero and one. The matrix S contains in fact the values of a gray-scale picture of a cat. We consider the system of linear equations Ax = 0 and apply the method of Broyden from the initial condition x_0 = (1, ..., 1) and with initial Broyden matrix B_0 = −I. The dimension of the problem is n = 100. Since A is invertible, the theorem of Gay implies that it should take Broyden's method at most 2n = 200 iterations to solve the problem exactly. It turns out that in the simulation about 219 iterations are needed to reach a residual of ‖g(x_k)‖ < 10^{-12}. The finite arithmetic of the computer has probably introduced a nonlinearity into the system, so that the conditions of Gay's theorem are not completely fulfilled.
In Figure 3, we have plotted the matrix A and the Broyden matrix at different iterations. We observe that in some way the Broyden matrix B_k tries to approximate the Jacobian A. Although the final Broyden matrix is certainly not equal to the Jacobian, it approximates the Jacobian to such an extent that the solution to the problem Ax = 0 can be found.
After 50 iterations, the rough contour of the cat can be recognized in the Broyden matrix B_50. While reconstructing the two main diagonals of the Jacobian, the picture of the cat is sharpened. Note that the light spot at the left side of the image of the Jacobian is considered less interesting by the method of Broyden. On the other hand, the nose and eyes of the cat are clearly detected.

Figure 3: The Jacobian as given by (8) and the Broyden matrix at three different iterations of the Broyden process (B_50, B_100 and B_218; n = 100). Black corresponds to values smaller than 0 and white to values larger than 1.

Limited memory methods applied to periodic processes

We now consider an application in chemical reactor engineering. In the case of significant temperature fluctuations, it turns out to be essential to include a second space dimension in the model of packed bed reactors. Moreover, to obtain an accurate approximation of the periodic state of the reactor, it is necessary to use a fine grid. This implies that the number of equations, n, is very large. Combining the integration of the system of ordinary differential equations for the evaluation of the function g with a fine grid in the reactor makes it practically impossible to solve (1) using classical iterative algorithms for nonlinear equations. To overcome severe memory constraints, many authors have reverted to pseudo-homogeneous one-dimensional models and to coarse grid discretizations, which renders such models inadequate or inaccurate.


The radial transport of heat and matter is essential in non-isothermal
packed bed reactors [72]. A highly exothermic reaction, a large width of the
reactor, and efficient cooling of the reactor at the wall cause radial temperature
gradients to be present, see Figure 4. Clearly, for reactors operating under
these conditions the radial dimension must be taken into account explicitly.

Figure 4: Qualitative conversion and temperature distribution of the cooled reverse flow reactor in the cyclic steady state, using the two-dimensional model (10)-(12) with the parameter values of Tables C.1 and C.2 (panels: conversion and temperature versus axial and radial distance).

As an example, we consider the reverse flow reactor, which is studied in detail in Chapter 8. The reverse flow reactor (RFR) is a catalytic packed bed reactor in which the flow direction is periodically reversed in order to trap a hot zone within the reactor. Upon entering the reactor, the cold feed gas is heated up regeneratively by the hot bed so that a reaction can occur. The reaction is assumed to be exothermic. At the other end of the reactor, the hot product gas is cooled by the colder catalyst particles. The beginning and end of the reactor thus effectively work as heat exchangers. The cold feed gas purges the high-temperature (reaction) front in the downstream direction. Before the hot reaction zone exits the reactor, the feed flow direction is reversed. The flow-reversal period, denoted by t_f, is usually constant and predefined. One complete cycle of the RFR consists of two flow-reversal periods. Overheating of the catalyst and hot-spot formation are avoided by a limited degree of cooling.
In Chapter 6, we derive the balance equations of the two-dimensional model of a general packed bed reactor. Here we give a short summary of the derivation. We start with the one-dimensional pseudo-homogeneous model of Khinast, Jeong and Luss [33], which takes into account the axial heat and mass dispersion.

The concentration and temperature depend on the axial and the radial direction, c = c(z, r, t) and T = T(z, r, t). The second spatial dimension is incorporated by including the radial components of the diffusion terms,

    εD_rad (1/r) ∂/∂r ( r ∂c/∂r )   and   λ_rad (1/r) ∂/∂r ( r ∂T/∂r ),

in the component balance and the energy balance, respectively. The cooling term in the energy balance disappears. Instead, at the wall of the reactor the boundary condition

    λ_rad ∂T/∂r |_{r=R} = −U_w (T(R) − T_c)    (9)

is added to the system. Equation (9) describes the heat loss at the reactor wall to the surrounding cooling jacket.
In summary, we can now give the complete two-dimensional model. The component balance is given by

    ε ∂c/∂t = εD_ax ∂²c/∂z² − u ∂c/∂z − r′(c, T) + εD_rad (1/r) ∂/∂r ( r ∂c/∂r ),    (10)

the energy balance is given by

    ((ρc_p)_s (1 − ε) + (ρc_p)_g ε) ∂T/∂t = λ_ax ∂²T/∂z² − u(ρc_p)_g ∂T/∂z + (−ΔH) r′(c, T) + λ_rad (1/r) ∂/∂r ( r ∂T/∂r ),    (11)

and the boundary conditions are given by

    −λ_ax ∂T/∂z |_{z=0} = u(ρc_p)_g (T_0 − T(0)),    ∂T/∂z |_{z=L} = 0,
    −εD_ax ∂c/∂z |_{z=0} = u(c_0 − c(0)),            ∂c/∂z |_{z=L} = 0,
    ∂c/∂r |_{r=0} = 0,                               ∂c/∂r |_{r=R} = 0,
    ∂T/∂r |_{r=0} = 0,                               λ_rad ∂T/∂r |_{r=R} = −U_w (T(R) − T_c).    (12)

The values of the parameters in this model are derived in Appendix C and summarized in Tables C.1 and C.2.

In Chapter 7, we describe a numerical approach to deal with the partial differential equations in order to compute the cyclic steady state of the process. Once we are able to evaluate the period map of the process using discretization techniques and integration routines, we can apply the limited memory Broyden methods of Chapter 3.
We define f : R^n → R^n to be the period map of the RFR over one flow-reversal period, associated to the balance equations (10)-(12). We use 100 equidistant grid points in the axial direction and 25 grid points in the radial direction. The state vector, denoted by x, consists of the temperature and the concentration at every grid point. This implies that n = 5000.
In Chapter 8, we propose the use of the Broyden Rank Reduction method to simulate a full two-dimensional model of the reverse flow reactor, with radial gradients taken into account. A disadvantage of the method of Broyden is that an initial approximation of the solution has to be chosen, as well as an initial Broyden matrix. This problem is naturally solved by the application. The reverse flow reactor is usually started in a preheated state, for example T = 2T_0, where the reactor is filled with the carrier gas without a trace of the reactants, that is, c = 0. This initial state of the reactor is chosen as the initial state of the Broyden process. In the first periods, the state of the reverse flow reactor converges relatively fast to the cyclic steady state. Thereafter the rate of convergence decreases, and the dynamical process takes many periods before it reaches the cyclic steady state. By taking the initial Broyden matrix equal to minus the identity, B_0 = −I, the first iteration of the Broyden process is a dynamical simulation step,

    x_1 = x_0 − B_0^{-1} g(x_0) = x_0 + f(x_0) − x_0 = f(x_0).

We apply the BRR method for different values of p to approximate a zero of the function g(x) = f(x) − x with a residual of 10^{-8}. Figure 5 shows that the BRR method converges in 49 iterations for p = 10. Here, instead of the 25,000,000 (n^2) storage locations required for a standard Broyden iteration, only 100,000 (2pn) are needed for the Broyden matrix. If we allow a few more iterations, p can even be chosen equal to 5, and the number of storage locations is reduced further. If p is chosen too small (p = 2), the (fast) convergence is lost. For complete details of the computations, see Chapter 5.
The BRR method makes it possible to compute efficiently the cyclic steady state of the reverse flow reactor with radial gradients incorporated in the model.

Figure 5: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of the method of Broyden and the BRR method, for different values of p, applied to the period map of the reverse flow reactor using the two-dimensional model (10)-(12) with the parameter values of Tables C.1 and C.2. ['+' (p = 10), '∗' (p = 5), 'O' (p = 2)]
Part I

Basics of limited memory methods
Chapter 1

An introduction to iterative methods

A general nonlinear system of algebraic equations can be written as

    g(x) = 0,    (1.1)

where x = (x_1, ..., x_n) is a vector in R^n, the n-dimensional real vector space. The function g : R^n → R^n is assumed to be continuously differentiable in an open, convex domain D. The Jacobian of g, denoted by J_g, is assumed to be Lipschitz continuous in D, that is, there exists a constant γ ≥ 0 such that for every u and v in D,

    ‖J_g(u) − J_g(v)‖ ≤ γ ‖u − v‖.

We write J_g ∈ Lip_γ(D).
In general, systems of nonlinear equations cannot be solved analytically, and we have to consider numerical approaches. These approaches are generally built on the concept of iteration: steps involving similar computations are performed over and over again until the solution is approximated. The oldest and most famous iterative method is perhaps the method of Newton, also called the Newton-Raphson method. In Section 1.2, we derive and discuss the method of Newton and describe its convergence properties. In addition, we discuss some quasi-Newton methods, based on the method of Newton.
The quasi-Newton method of most interest for this work is the method of Broyden, proposed by Charles Broyden in 1965 [8]. In Section 1.3, we derive this method in the same way as Broyden did. We prove the local convergence of the method and discuss its most interesting features.
As a simple introduction to quasi-Newton methods, we first consider a scalar problem.


1.1 Iterative methods in one variable

The algorithms we discuss in this section are the scalar versions of Newton's method, Broyden's method and other quasi-Newton methods, as discussed in Sections 1.2 and 1.3. The multi-dimensional versions of the methods are more complex, but an understanding of the scalar case will help in understanding the multi-dimensional case. The theorems in this section are special cases of the theorems in Sections 1.2 and 1.3, so we often omit the proofs, unless they give insight into the algorithms.

The scalar version of Newton’s method


The standard iterative method to solve (1.1) is the method of Newton, which
can be described by a single expression. We choose an initial guess x0 ∈ R to
the solution x∗ and compute the sequence {xk } using the iteration scheme

g(xk )
xk+1 = xk − , k = 0, 1, 2, . . . . (1.2)
g 0 (xk )

This iteration scheme involves solving a local affine model for the function g
instead of solving the nonlinear equation (1.1) directly. A clear choice for the
affine model, denoted by lk (x), is the tangent line to the graph of g in the
point (xk , g(xk )). So, the function is linearized in the point xk , i.e.,

lk (x) = g(xk ) + g 0 (xk )(x − xk ), (1.3)

and xk+1 is defined to be the zero of this affine function, which yields (1.2).
We illustrate this idea with an example.

Example 1.1. Let g : R → R be given by

    g(x) = x^2 − 2.    (1.4)

The derivative of this function is g′(x) = 2x and an exact zero of g is x* = √2. As initial condition, we take x_0 = 3. The first affine model equals the tangent line to g at x_0,

    l_0(x) = g(x_0) + g′(x_0)(x − x_0) = 7 + 6(x − 3) = 6x − 11.

The next iterate x_1 is determined as the intersection point of the tangent line and the x-axis, x_1 = 11/6, see Figure 1.1. Next, we repeat the same step starting from the new estimate x_1.

Figure 1.1: The first two steps of the scalar version of Newton's method (1.2) for x^2 − 2 = 0, starting at x_0 = 3.

An important fact is that the method of Newton is locally q-quadratically convergent: in every iteration, the number of accurate digits is doubled, provided the iteration starts close enough to the true solution. The scalar version of Theorem 1.10 reads as follows.

Theorem 1.2. Let g : R → R be continuously differentiable in an open interval D, where g′ ∈ Lip_γ(D). Assume that for some ρ > 0, |g′(x)| ≥ ρ for every x ∈ D. If g(x) = 0 has a solution x* ∈ D, then there exists an ε > 0 such that if |x_0 − x*| < ε, the sequence {x_k} generated by

    x_{k+1} = x_k − g(x_k)/g′(x_k),   k = 0, 1, ...,

exists and converges to x*. Furthermore, for k ≥ 0,

    |x_{k+1} − x*| ≤ (γ/(2ρ)) |x_k − x*|^2.

The condition that g′(x) has a nonzero lower bound on D simply means that g′(x*) must be nonzero for Newton's method to converge quadratically. If g′(x*) = 0, then x* is a multiple root, and Newton's method converges only linearly [18]. In addition, if |g′(x)| ≥ ρ on D, the continuity of g implies that x* is the only solution in D.
Theorem 1.2 guarantees convergence only for a starting point x_0 that lies in a neighborhood of the solution x*. If |x_0 − x*| is too large, Newton's method might not converge. So, the method is useful for its fast local convergence, but we need to combine it with a more robust algorithm that can converge from starting points further away from the true solution.

The secant method

In many practical applications, the nonlinear equation cannot be given in closed form. For example, the function g might be the output of a computational or experimental procedure. In that case, g′(x) is not available and we have to modify Newton's method, which requires the derivative g′(x_k) to model g around the current estimate x_k by the tangent line to g(x) at x_k. The tangent line can be approximated by the secant line through g(x) at x_k and at a nearby point x_k + h_k. The slope of this line is given by

    a_k = (g(x_k + h_k) − g(x_k))/h_k,    (1.5)

and the function g(x) is modeled by

    l_k(x) = g(x_k) + a_k(x − x_k).    (1.6)

Iterative methods that solve (1.6) in every iteration step are called quasi-Newton methods. These methods follow the scheme

    x_{k+1} = x_k − g(x_k)/a_k,   k = 0, 1, ....    (1.7)
Of course, we have to choose h_k in the right way. For h_k sufficiently small, a_k is a finite-difference approximation to g′(x_k). In Theorem 1.5, we show that using a_k given by (1.5) with sufficiently small h_k works as well as using the derivative itself. However, in every iteration two function evaluations are needed. If computing g(x) is very expensive, h_k = x_{k−1} − x_k may be a better choice, where x_{k−1} is the previous iterate. Substituting h_k = x_{k−1} − x_k in (1.5) gives

    a_k = (g(x_{k−1}) − g(x_k))/(x_{k−1} − x_k),    (1.8)

and only one function evaluation is required, since g(x_{k−1}) has already been computed in the previous iteration. This quasi-Newton method is called the secant method, because the local model uses the secant line through the points x_k and x_{k−1}. Since a_0 is not defined by the secant method, a_0 is often chosen using (1.5) with h_0 small, or simply a_0 = −1.
While this choice of h_k may seem ad hoc, it turns out to work well. The method is slightly slower than a finite-difference method, but usually it is more efficient in terms of the total number of function evaluations required to obtain a specified accuracy.
To prove the convergence of the secant method we need the following lemma, which also plays a role in the multi-dimensional setting.

Lemma 1.3. Let g : R → R be continuously differentiable in an open interval D, and let g′ ∈ Lip_γ(D). Then for any x, y in D,

    |g(y) − g(x) − g′(x)(y − x)| ≤ γ(y − x)^2 / 2.

Proof. The fundamental theorem of calculus gives that g(y) − g(x) = ∫_x^y g′(z) dz, which implies

    g(y) − g(x) − g′(x)(y − x) = ∫_x^y (g′(z) − g′(x)) dz.    (1.9)

Under the change of variables

    z = x + t(y − x),   dz = (y − x) dt,

(1.9) becomes

    g(y) − g(x) − g′(x)(y − x) = ∫_0^1 (g′(x + t(y − x)) − g′(x))(y − x) dt.

Applying the triangle inequality to the integral and using the Lipschitz continuity of g′ yields

    |g(y) − g(x) − g′(x)(y − x)| ≤ |y − x| ∫_0^1 γ|t(y − x)| dt = γ|y − x|^2 / 2.

We analyze one step of the quasi-Newton process (1.7). By construction,

    x_{k+1} − x* = a_k^{-1} (a_k(x_k − x*) − g(x_k) + g(x*))
                 = a_k^{-1} (g(x*) − g(x_k) − g′(x_k)(x* − x_k) + (g′(x_k) − a_k)(x* − x_k))
                 = a_k^{-1} ( ∫_{x_k}^{x*} (g′(z) − g′(x_k)) dz + (g′(x_k) − a_k)(x* − x_k) ).

If we define e_k = |x_k − x*| and use g′ ∈ Lip_γ(D) in the same way as in the proof of Lemma 1.3, we obtain

    e_{k+1} ≤ |a_k^{-1}| ( (γ/2) e_k^2 + |g′(x_k) − a_k| e_k ).    (1.10)

In order to use (1.10), we have to know how close the finite-difference approximation a_k is to g′(x_k), as a function of h_k.

Lemma 1.4. Let g : R → R be continuously differentiable in an open interval D and let g′ ∈ Lip_γ(D). If x_k, x_k + h_k ∈ D and a_k is defined by (1.5), then

    |a_k − g′(x_k)| ≤ γ|h_k| / 2.    (1.11)

Proof. From Lemma 1.3, we have

    |g(x_k + h_k) − g(x_k) − g′(x_k) h_k| ≤ γ|h_k|^2 / 2.

Dividing both sides by |h_k| gives the desired result.

Substituting (1.11) in (1.10) gives

    e_{k+1} ≤ (γ / (2|a_k|)) (e_k + |h_k|) e_k.    (1.12)

Using this inequality, it is not difficult to prove the following theorem.

Theorem 1.5. Let g : R → R be continuously differentiable in an open interval D and let g′ ∈ Lip_γ(D). Assume that |g′(x)| ≥ ρ for some ρ > 0 and for every x ∈ D. If g(x) = 0 has a solution x* ∈ D, then there exist positive constants ε, h such that if {h_k} is a real sequence with 0 < |h_k| ≤ h, and if |x_0 − x*| < ε, then the sequence {x_k} given by

    x_{k+1} = x_k − g(x_k)/a_k,   a_k = (g(x_k + h_k) − g(x_k))/h_k,

for k = 0, 1, ..., is well defined and converges q-linearly to x*. If lim_{k→∞} h_k = 0, then the convergence is q-superlinear. If there exists some constant c_1 such that

    |h_k| ≤ c_1 |x_k − x*|,

or equivalently, a constant c_2 such that

    |h_k| ≤ c_2 |g(x_k)|,    (1.13)

then the convergence is q-quadratic. If there exists some constant c_3 such that

    |h_k| ≤ c_3 |x_k − x_{k−1}|,    (1.14)

then the convergence is at least two-step q-quadratic.

If we would like the finite-difference method to converge q-quadratically, we can simply set h_k = c_2|g(x_k)|. Indeed, if x_k is close enough to x*, the mean value theorem and the fact that g(x*) = 0 imply that |g(x_k)| ≤ c|x_k − x*| for some c > 0. Note that the secant method, h_k = x_{k−1} − x_k, is included as a special case of (1.14). We restrict the proof of Theorem 1.5 to the two-step q-quadratic convergence of the secant method.

Proof (of Theorem 1.5). We first prove that the secant method is q-linearly convergent. Choose ε = ρ/(4γ) and h = ρ/(2γ). Suppose x_0 and x_1 are in D and, in addition, |x_0 − x*| < ε, |x_1 − x*| < ε and |h_1| = |x_1 − x_0| < h. Since |g′(x)| ≥ ρ for all x ∈ D, (1.11) implies that

    |a_1| = |a_1 − g′(x_1) + g′(x_1)|
          ≥ |g′(x_1)| − |a_1 − g′(x_1)|
          ≥ ρ − γ|h_1|/2
          ≥ ρ − (γ/2)(ρ/(2γ)) = (3/4)ρ.

From (1.12), this gives

    e_2 ≤ (γ / (2·(3/4)ρ)) (e_1 + |h_1|) e_1 = (2γ/(3ρ)) (e_1 + |h_1|) e_1 ≤ (2γ/(3ρ)) (ρ/(4γ) + ρ/(2γ)) e_1 = (1/2) e_1.

Therefore, we have |x_2 − x*| ≤ (1/2) ε < ε and

    |h_2| = |x_2 − x_1| ≤ e_2 + e_1 ≤ (3/2) ε = (3/4) h < h.

Using the same arguments, we obtain

    e_{k+1} ≤ (2γ/(3ρ)) (e_k + |h_k|) e_k ≤ (1/2) e_k   for all k = 1, 2, ....    (1.15)

To prove the two-step q-quadratic convergence of the secant method, we note that

    |h_k| = |x_k − x_{k−1}| ≤ e_k + e_{k−1},   k = 1, 2, ....

Using the linear convergence, we derive from (1.15)

    e_{k+1} ≤ (2γ/(3ρ)) (e_k + e_k + e_{k−1}) e_k ≤ (2γ/(3ρ)) (2e_{k−1}) (1/2) e_{k−1} = (2γ/(3ρ)) e_{k−1}^2.

This implies the two-step q-quadratic convergence.

In numerical simulations, we have to deal with the restrictions arising from the finite arithmetic of the computer. In particular, we should not choose h_k too small, because then fl(x_k) = fl(x_k + h_k), where fl(a) is the floating point representation of a, and the finite-difference approximation of g′(x_k) is not defined. Additionally, it can happen that fl(g(x_k)) = fl(g(x_k + h_k)), although the derivative is not equal to zero. This is one reason why the secant process can fail. Of course, whether this happens depends on the function and on how accurately we want to approximate the zero of the function.
In the next example, we consider the secant method, again applied to the function g(x) = x^2 − 2.

Example 1.6. Let g : R → R be given by (1.4). We apply the secant method, (1.7) and (1.8), to solve g(x) = 0, starting from the initial condition x_0 = 3. In the first iteration step of the secant method, we use a_0 = g′(x_0). In Figure 1.2, the first two steps of the secant method are displayed. Table 1.1 shows that the secant method needs just a few more iterations than Newton's method to approximate √2 with the same precision.

iteration Newton’s method Secant method

0 3.0 3.0
1 1.833333333333 1.833333333333
2 1.462121212121 1.551724137931
3 1.414998429895 1.431239388795
4 1.414213780047 1.414998429895
5 1.414213562373 1.414218257349
6 - 1.414213563676
7 - 1.414213562373

Table 1.1: The


√ method of Newton (1.2) and the secant method (1.7) and (1.8) ap-
proximating 2.
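The two columns of Table 1.1 are easy to reproduce in Matlab (a minimal sketch; the iteration counts are hard-coded to match the table):

    g  = @(x) x^2 - 2;                % test function (1.4)
    dg = @(x) 2*x;                    % its derivative
    x = 3;                            % Newton's method (1.2)
    for k = 1:5, x = x - g(x)/dg(x); disp(x), end
    xo = 3; a = dg(xo);               % secant method (1.7), a0 = g'(x0)
    x = xo - g(xo)/a; disp(x)
    for k = 2:7
        a = (g(xo) - g(x))/(xo - x);  % secant slope (1.8)
        xn = x - g(x)/a;
        xo = x; x = xn; disp(x)
    end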

Figure 1.2: The first two steps of the secant method, defined by (1.7) and (1.8), for x^2 − 2 = 0, starting from x_0 = 3.

1.2 The method of Newton

In this section, we derive the method of Newton in the multi-dimensional setting. In Theorem 1.10, we prove that in R^n Newton's method is locally q-quadratically convergent. We then discuss the finite-difference version of Newton's method and other types of quasi-Newton methods, which leads to the introduction of the method of Broyden in Section 1.3.

Derivation of the algorithm

Similar to the one-dimensional setting, the method of Newton is based on finding the root of an affine approximation to g at the current iterate x_k. The local model is derived from the equality

    g(x_k + s) = g(x_k) + ∫_0^1 J_g(x_k + ts) s dt,

where J_g denotes the Jacobian of g. If the integral is approximated by J_g(x_k)s, the model in the current iterate becomes

    l_k(x_k + s) = g(x_k) + J_g(x_k) s.    (1.16)

We solve this affine model for s, that is, we find s_k ∈ R^n such that

    l_k(x_k + s_k) = 0.

This Newton step, s_k, is added to the current iterate,

    x_{k+1} = x_k + s_k.

The new iterate x_{k+1} is not expected to equal x*, but only to be a better estimate than x_k. Therefore, we build the Newton iteration into an algorithm, starting from an initial guess x_0.

Algorithm 1.7 (Newton’s method). Choose an initial estimate x0 ∈ Rn ,


pick ε > 0, and set k := 0. Repeat the following sequence of steps until
kg(xk )k < ε.

i) Solve Jg (xk )sk = −g(xk ) for sk ,

ii) xk+1 := xk + sk .
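A direct Matlab transcription of Algorithm 1.7 might read as follows (a minimal sketch; the name newton_sketch is ours, g and Jg are user-supplied function handles, and no maximum number of iterations is imposed):

    function x = newton_sketch(g, Jg, x0, tol)
    % Algorithm 1.7: Newton's method.
    x = x0;
    while norm(g(x)) >= tol
        s = -Jg(x) \ g(x);   % step i): solve J_g(x_k) s_k = -g(x_k)
        x = x + s;           % step ii)
    end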

In order to judge every iterative method described in this thesis, we consider the rate of convergence of each method on a test function, the discrete integral equation function, as described in Appendix A. We have chosen this function from a large set of test functions, the CUTE collection, cf. [18, 47]. It is a commonly chosen problem and, in addition, the method of Broyden is able to compute a zero of this function rather easily. The first time we use this test function, we explicitly give the expression of the function. In later examples, we refer to Appendix A.

Example 1.8. We apply Algorithm 1.7 to find a zero of the discrete integral equation function, given by

    g_i(x) = x_i + (h/2) [ (1 − t_i) \sum_{j=1}^{i} t_j (x_j + t_j + 1)^3 + t_i \sum_{j=i+1}^{n} (1 − t_j)(x_j + t_j + 1)^3 ],    (1.17)

for i = 1, ..., n, where h = 1/(n + 1) and t_i = i·h, i = 1, ..., n. We start with the initial vector x_0 given by

    x_0 = (t_1(t_1 − 1), ..., t_n(t_n − 1)).
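In Matlab, the test function (1.17) and the initial vector can be coded as follows (a minimal sketch consistent with the formula above; the function name dieq is ours, not from Appendix A):

    function gx = dieq(x)
    % Discrete integral equation function (1.17).
    n = length(x); h = 1/(n+1); t = (1:n)'*h;
    w = (x + t + 1).^3;
    gx = zeros(n, 1);
    for i = 1:n
        gx(i) = x(i) + h/2 * ((1 - t(i)) * sum(t(1:i).*w(1:i)) ...
                              + t(i) * sum((1 - t(i+1:n)).*w(i+1:n)));
    end
    % initial vector of Example 1.8: x0 = t.*(t-1)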

In Table 1.2, the convergence properties of Newton's method are described for different dimensions n of the problem. The initial residual ‖g(x_0)‖ and the final residual ‖g(x_{k*})‖ are given, where k* is the number of iterations used. The variable R is a measure of the rate of convergence and is defined by

    R = log(‖g(x_0)‖/‖g(x_{k*})‖)/k*.    (1.18)

The residual ‖g(x_k)‖, k = 0, ..., k*, is plotted in Figure 1.3. We observe that the dimension of the problem does not influence the convergence of Newton's method in the case of this test function.

method    n     ‖g(x_0)‖   ‖g(x_{k*})‖      k*   R
Newton    10    0.2518     6.3085·10^{-15}  3    10.4393
Newton    100   0.7570     1.7854·10^{-14}  3    10.4594
Newton    200   1.0678     2.4858·10^{-14}  3    10.4637

Table 1.2: The convergence properties of Algorithm 1.7 applied to the discrete integral equation function (1.17) for different dimensions n.

Figure 1.3: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of Algorithm 1.7 applied to the discrete integral equation function (1.17) for different dimensions n. ['◦' (n = 10), '×' (n = 100), '+' (n = 200)]

Note that if g is an affine function, Newton's method solves the problem in one iteration. Even if only a component function of g is affine, each iterate generated by Newton's method is a zero of this component function; that is, if g_1 is affine, then g_1(x_1) = g_1(x_2) = ... = 0. We illustrate this with an example.

Example 1.9. We consider the Rosenbrock function g : R^2 → R^2 defined by

    g(x) = ( 10(x_2 − x_1^2), 1 − x_1 ).

As initial condition, we choose x_0 = (−1.2, 1). The function value at x_0 equals g(x_0) = (−4.4, 2.2). Note that the second component of the Rosenbrock function is affine in x. This explains the zero in the function value of x_1 = (1, −3.84), which equals g(x_1) = (−48.4, 0). As said before, all future iterates will be zeros of the second component function. This implies that the first component of x_k will be equal to 1 in all subsequent iterations. So, the first component function of the Rosenbrock function has become affine as well, and the next iterate yields the solution, x_2 = (1, 1).
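These two steps are easily verified numerically (a minimal sketch, with the Jacobian of the Rosenbrock function written out by hand):

    g  = @(x) [10*(x(2) - x(1)^2); 1 - x(1)];
    Jg = @(x) [-20*x(1), 10; -1, 0];
    x = [-1.2; 1];
    for k = 1:2
        x = x - Jg(x) \ g(x);   % Newton step
        disp(x')                % prints (1, -3.84), then (1, 1)
    end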

Some problems arise in implementing Algorithm 1.7. The Jacobian of g is often not analytically available, for example, if g itself is not given in analytic form. A finite-difference method or a less expensive method should then be used to approximate the Jacobian. Secondly, if J_g(x_k) is ill-conditioned, solving J_g(x_k) s_k = −g(x_k) will not give a reliable solution.

Local convergence of Newton’s method


In this section, we give a proof of the local q-quadratic convergence of Newton’s
method and discuss its implications. The proof is a prototype of the proofs
for convergence of the quasi-Newton methods. All the convergence results in
this thesis are local, i.e., there exists an ε > 0 such that the iterative method
converges for all x0 in an open neighborhood N (x∗ , ε) of the solution x∗ . Here,

N (x∗ , ε) = {x ∈ Rn | kx − x∗ k < ε}.

Theorem 1.10. Let g : Rn → Rn be continuously differentiable in an open,


convex set D ∈ Rn . Assume that Jg ∈ Lipγ (D) and that there exist x∗ ∈ Rn
and β > 0 such that g(x∗ ) = 0 and Jg (x∗ ) is nonsingular with kJg (x∗ )−1 k ≤ β.
Then there exists an ε > 0 such that for all x0 ∈ N (x∗ , ε) the sequence {xk }
generated by

xk+1 = xk − Jg (xk )−1 g(xk ), k = 0, 1, 2, . . . ,

is well defined, converges to x∗ , and satisfies

kxk+1 − x∗ k ≤ βγkxk − x∗ k2 for k = 0, 1, 2, . . . . (1.19)


We can show that the convergence is q-quadratic by choosing ε such that J_g(x) is nonsingular for all x ∈ N(x*, ε). The reason is that if the Jacobian is nonsingular, the local error in the affine model (1.16) is at most of order O(‖x_k − x*‖^2). This is a consequence of the following lemma.

Lemma 1.11. Let g : R^n → R^n be continuously differentiable in the open, convex set D ⊂ R^n and let x ∈ D. If J_g ∈ Lip_γ(D), then for any y ∈ D,

    ‖g(y) − g(x) − J_g(x)(y − x)‖ ≤ (γ/2) ‖y − x‖^2.

Proof. According to the fundamental theorem of calculus, we have

    g(y) − g(x) − J_g(x)(y − x) = ( ∫_0^1 J_g(x + t(y − x))(y − x) dt ) − J_g(x)(y − x)
                                = ∫_0^1 (J_g(x + t(y − x)) − J_g(x))(y − x) dt.    (1.20)

We can bound the integral on the right hand side of (1.20) in terms of the integrand. Together with the Lipschitz continuity of J_g at x ∈ D, this implies

    ‖g(y) − g(x) − J_g(x)(y − x)‖ ≤ ∫_0^1 ‖J_g(x + t(y − x)) − J_g(x)‖ ‖y − x‖ dt
                                  ≤ ∫_0^1 γ ‖t(y − x)‖ ‖y − x‖ dt
                                  = γ ‖y − x‖^2 ∫_0^1 t dt = (γ/2) ‖y − x‖^2.

The next theorem states that matrix inversion is continuous in norm. Furthermore, it gives a relation between the norms of the inverses of two nearby matrices that is useful later in analyzing algorithms.

Theorem 1.12. Let ‖·‖ be the induced l_2-norm on R^{n×n} and let E ∈ R^{n×n}. If ‖E‖ < 1, then (I − E)^{-1} exists and

    ‖(I − E)^{-1}‖ ≤ 1/(1 − ‖E‖).

If A is nonsingular and ‖A^{-1}(B − A)‖ < 1, then B is nonsingular and

    ‖B^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}(B − A)‖).

The proof of Theorem 1.12 can be found in [18].

Proof (of Theorem 1.10). We choose

    ε ≤ 1/(2βγ)    (1.21)

such that N(x*, ε) ⊂ D. By induction on k, we show that (1.19) holds for each iteration step and that

    ‖x_{k+1} − x*‖ ≤ (1/2) ‖x_k − x*‖,

which implies that x_{k+1} ∈ N(x*, ε) if x_k ∈ N(x*, ε).
We first consider the basis step (k = 0). Using the Lipschitz continuity of J_g at x*, ‖x_0 − x*‖ ≤ ε and (1.21), we obtain

    ‖J_g(x*)^{-1}(J_g(x_0) − J_g(x*))‖ ≤ ‖J_g(x*)^{-1}‖ ‖J_g(x_0) − J_g(x*)‖ ≤ βγ ‖x_0 − x*‖ ≤ βγε ≤ 1/2.

Theorem 1.12 implies that J_g(x_0) is nonsingular and

    ‖J_g(x_0)^{-1}‖ ≤ ‖J_g(x*)^{-1}‖ / (1 − ‖J_g(x*)^{-1}(J_g(x_0) − J_g(x*))‖) ≤ 2 ‖J_g(x*)^{-1}‖ ≤ 2β.    (1.22)

This implies that x_1 is well defined and, additionally,

    x_1 − x* = x_0 − x* − J_g(x_0)^{-1} g(x_0)
             = x_0 − x* − J_g(x_0)^{-1} (g(x_0) − g(x*))
             = J_g(x_0)^{-1} (g(x*) − g(x_0) − J_g(x_0)(x* − x_0)).    (1.23)

The second factor in (1.23) is the difference between g(x*) and the affine model l_0(x) evaluated at x*. Therefore, by Lemma 1.11 and (1.22),

    ‖x_1 − x*‖ ≤ ‖J_g(x_0)^{-1}‖ ‖g(x*) − g(x_0) − J_g(x_0)(x* − x_0)‖ ≤ 2β (γ/2) ‖x_0 − x*‖^2 = βγ ‖x_0 − x*‖^2.

We have shown (1.19) for k = 0. Since ‖x_0 − x*‖ ≤ ε ≤ 1/(2βγ), it follows that ‖x_1 − x*‖ ≤ (1/2)‖x_0 − x*‖, which yields x_1 ∈ N(x*, ε) as well. This completes the proof for k = 0.
The proof of the induction step proceeds in the same way.
Note that if g is affine, the Jacobian is constant and the Lipschitz constant γ can be chosen to be zero. We then have

    ‖x_1 − x*‖ ≤ βγ ‖x_0 − x*‖^2 = 0,

and the method of Newton converges in one single iteration. If g is a nonlinear function, the relative nonlinearity of g at x* is given by γ_rel = β·γ. So, for x ∈ D,

    ‖J_g(x*)^{-1}(J_g(x) − J_g(x*))‖ ≤ ‖J_g(x*)^{-1}‖ ‖J_g(x) − J_g(x*)‖ ≤ βγ ‖x − x*‖ = γ_rel ‖x − x*‖.

The radius of guaranteed convergence of Newton's method is inversely proportional to the relative nonlinearity γ_rel of g at x*. The bound ε for the region of convergence is a worst-case estimate. In directions from x* in which g is less nonlinear, the region of convergence may very well be much larger.
We conclude with a summary of the characteristics of Newton's method.

Advantages of Newton’s method

• q-Quadractically convergent from good starting points if Jg (x∗ ) is non-


singular,

• Exact solution in one iteration for an affine function g (exact at each


iteration for any affine component function of g).

Disadvantages of Newton’s method

• Not globally convergent for many problems,

• Requires the Jacobian Jg (xk ) at each iteration step,

• Each iteration step requires the solution of a system of linear equations


that might be singular or ill-conditioned.

Quasi-Newton methods
We have already indicated that it is not always possible to compute the Jacobian of the function g, or that it may be very expensive. In that case, we have to approximate the Jacobian, for example by using finite differences.

Algorithm 1.13 (Discrete Newton method). Choose an initial estimate x_0 ∈ R^n, pick ε > 0, and set k := 0. Repeat the following sequence of steps until ‖g(x_k)‖ < ε:

i) Compute the finite-difference Jacobian

    A_k = [ (g(x_k + h_k e_1) − g(x_k))/h_k  ⋯  (g(x_k + h_k e_n) − g(x_k))/h_k ],

ii) Solve A_k s_k = −g(x_k) for s_k,

iii) x_{k+1} := x_k + s_k.
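Step i) of Algorithm 1.13 can be sketched in Matlab as follows (the helper name fdjac is ours). Note that it costs n additional function evaluations per iteration, in agreement with the count n + 1 quoted for the approximate Newton method:

    function A = fdjac(g, x, h)
    % Forward-difference approximation of the Jacobian of g at x.
    n = length(x); gx = g(x); A = zeros(n);
    for j = 1:n
        e = zeros(n, 1); e(j) = 1;         % j-th unit vector
        A(:, j) = (g(x + h*e) - gx) / h;   % j-th column of A_k
    end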
Example 1.14. We consider the discrete integral equation function g, given by (A.5). We assume that h_k ≡ h and apply Algorithm 1.13 for different values of h. We start with the initial condition x_0 given by (A.6) and set ε = 10^{-12}. The convergence properties of the discrete Newton method are described in Table 1.3. The rate of convergence is plotted in Figure 1.4. The difference between the true Jacobian and the approximated Jacobian, ‖J_g(x_k) − A_k‖, turns out to be of order 10^{-5} for h = 1.0·10^{-4}, of order 10^{-7} for h = 1.0·10^{-8}, and of order 10^{-3} for h = 1.0·10^{-12}.

method            n     h            ‖g(x_0)‖   ‖g(x_{k*})‖      k*   R
Discrete Newton   100   1.0·10^{-4}  0.7570     4.4908·10^{-16}  4    8.7652
Discrete Newton   100   1.0·10^{-8}  0.7570     8.6074·10^{-14}  3    9.9351
Discrete Newton   100   1.0·10^{-12} 0.7570     2.1317·10^{-13}  4    7.2246

Table 1.3: The convergence properties of Algorithm 1.13 applied to the discrete integral equation function (A.5) for different values of h.

If the finite-difference step size h_k is properly chosen, the discrete Newton method is also q-quadratically convergent. This is the content of the next theorem. We denote the l_1 vector norm and the corresponding induced matrix norm by ‖·‖_1.

Theorem 1.15. Let g and x* satisfy the assumptions of Theorem 1.10. Then there exist ε, h > 0 such that if {h_k} is a real sequence with 0 < |h_k| ≤ h and x_0 ∈ N(x*, ε), the sequence {x_k} generated by

    x_{k+1} = x_k − A_k^{-1} g(x_k),   k = 0, 1, 2, ...,

where

    A_k = [ (g(x_k + h_k e_1) − g(x_k))/h_k  ⋯  (g(x_k + h_k e_n) − g(x_k))/h_k ],

is well defined and converges q-linearly to x*. Additionally, if

    lim_{k→∞} h_k = 0,

then the convergence is q-superlinear. If there exists a constant c_1 such that

    |h_k| ≤ c_1 ‖x_k − x*‖_1,

or equivalently a constant c_2 such that

    |h_k| ≤ c_2 ‖g(x_k)‖_1,

then the convergence is q-quadratic.

Figure 1.4: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of the discrete Newton method (Algorithm 1.13) applied to the discrete integral equation function (A.5) for different values of h. ['◦' (h = 10^{-4}), '×' (h = 10^{-8}), '+' (h = 10^{-12})]

For the proof of Theorem 1.15 we refer to [18]. Another way to avoid computing the Jacobian in every iteration is to compute the Jacobian only in the first iteration, A = J_g(x_0), and to use this matrix in all subsequent iterations as an approximation of J_g(x_k). This method is called the Newton-Chord method. It turns out that the Newton-Chord method is locally linearly convergent [38].

Algorithm 1.16 (Newton-Chord method). Choose an initial estimate x_0 ∈ R^n, pick ε > 0, set k := 0, and compute the Jacobian A := J_g(x_0). Repeat the following sequence of steps until ‖g(x_k)‖ < ε:

i) Solve A s_k = −g(x_k) for s_k,

ii) x_{k+1} := x_k + s_k.
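Algorithm 1.16 differs from Algorithm 1.7 only in that the matrix is formed and factored once. A minimal sketch (the name newton_chord_sketch is ours; reusing the LU factorization of A = J_g(x_0) makes every subsequent step cheap):

    function x = newton_chord_sketch(g, Jg, x0, tol)
    % Algorithm 1.16: Newton-Chord method.
    [L, U, P] = lu(Jg(x0));           % factor A = J_g(x_0) once
    x = x0;
    while norm(g(x)) >= tol
        s = -(U \ (L \ (P * g(x))));  % step i) with the stored factors
        x = x + s;                    % step ii)
    end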

Example 1.17. Let g be the discrete integral equation function given by (A.5). We apply Algorithm 1.16 and Algorithm 1.7 to approximate the zero of g. As initial estimate we choose x_0 given by (A.6), multiplied by a factor 1, 10 or 100. The convergence properties of the Newton-Chord method and Newton's method are described in Table 1.4. In Figure 1.5, we can observe the linear convergence of the Newton-Chord method. The rate of convergence of the Newton-Chord method is very low in the case of the initial condition 100x_0. Clearly, for all initial conditions the Newton-Chord method needs more iterations to converge than the original method of Newton, see Figure 1.6.

method         n     factor   ‖g(x_0)‖      ‖g(x_{k*})‖      k*    R
Newton-Chord   100   1        0.7570        2.3372·10^{-14}  8     3.8886
Newton-Chord   100   10       18.5217       2.9052·10^{-13}  16    1.9866
Newton-Chord   100   100      3.8215·10^3   2.1287           200   0.0375
Newton         100   1        0.7570        1.7854·10^{-14}  3     10.4594
Newton         100   10       18.5217       2.7007·10^{-16}  4     9.6917
Newton         100   100      3.8215·10^3   3.9780·10^{-13}  9     4.0890

Table 1.4: The convergence properties of Algorithm 1.16 and Algorithm 1.7 applied to the discrete integral equation function (A.5) for different initial conditions (x_0, 10x_0 and 100x_0).

Figure 1.5: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of Algorithm 1.16 applied to the discrete integral equation function (A.5) for different initial conditions. ['◦' (x_0), '×' (10x_0), '+' (100x_0)]

Figure 1.6: The convergence rate (residual ‖g(x_k)‖ versus iteration k) of Algorithm 1.7 applied to the discrete integral equation function (A.5) for different initial conditions. ['◦' (x_0), '×' (10x_0), '+' (100x_0)]

1.3 The method of Broyden

The Newton-Chord method of the previous section saves us the expensive computation of the Jacobian J_g(x_k) in every iterate x_k of the process, by approximating it with the Jacobian at the initial condition, A = J_g(x_0). Additional information about the Jacobian obtained during the process is neglected. This information consists of the function values of g at the iterates, needed to compute the step s_k. In this section, we start with the basic idea for a class of methods that adjust the approximation of the Jacobian J_g(x_k) using only the function value g(x_k). We single out the method proposed by C.G. Broyden in 1965 [8], which has a q-superlinear and even 2n-step q-quadratic local convergence rate and appears to be very successful in practice. This algorithm, which is analogous to the method of Newton, is called the method of Broyden.

A derivation of the algorithm

Recall that in one dimension we use the local model (1.6),

    l_{k+1}(x) = g(x_{k+1}) + a_{k+1}(x − x_{k+1}),

for the nonlinear function g. Note that l_{k+1}(x_{k+1}) = g(x_{k+1}) for every choice of a_{k+1} ∈ R. If we set a_{k+1} = g′(x_{k+1}), we obtain Newton's method. If g′(x_{k+1}) is not available, we force the scheme to satisfy l_{k+1}(x_k) = g(x_k), that is,

    g(x_k) = g(x_{k+1}) + a_{k+1}(x_k − x_{k+1}),

which yields the secant approximation (1.8),

    a_{k+1} = (g(x_{k+1}) − g(x_k))/(x_{k+1} − x_k).

The next iterate x_{k+2} is the zero of the local model l_{k+1}. Therefore we arrive at the quasi-Newton update

    x_{k+2} = x_{k+1} − g(x_{k+1})/a_{k+1}.

The price we have to pay is a reduction of the local convergence rate, from q-quadratic to two-step q-quadratic convergence.
In multiple dimensions, we apply an analogous affine model

lk+1 (x) = g(xk+1 ) + Bk+1 (x − xk+1 ).

For Newton’s method Bk+1 equals the Jacobian Jg (xk+1 ). We enforce the same
requirement that led to the one-dimensional secant method. So, we assume
that lk+1 (xk ) = g(xk ), which implies that

g(xk ) = g(xk+1 ) + Bk+1 (xk − xk+1 ). (1.24)

Furthermore, if we define the current step by sk = xk+1 − xk , and the yield of


the current step by yk = g(xk+1 ) − g(xk ), Equation (1.24) is reduced to

Bk+1 sk = yk . (1.25)

We refer to (1.25) as the secant equation. For completeness we first give the
definition of a secant method.
Definition 1.18. The iterative process

xk+1 = xk − Bk−1 g(xk )

is called a secant method if the matrix Bk satisfies the secant equation (1.25)
in every iteration step.
The crux of the problem in extending the secant method to more than
one dimension is that (1.25) does not completely specify the matrix Bk+1 . In
fact, if sk 6= 0, there is an n(n − 1)-dimensional affine subspace of matrices
satisfying (1.25). Constructing a successful secant approximation consists of
selecting a good approach to choose from all these possibilities. The choice
should enhance the Jacobian approximation properties of Bk+1 or facilitate
its use in a quasi-Newton algorithm.
1.3 The method of Broyden 37

A possible strategy is using the former function evaluations. That is, in


additional to the secant equation, we set

g(xl ) = g(xk+1 ) + Bk+1 (xl − xk+1 ), l = k − m, . . . , k − 1.

This is equivalent to

g(xl ) = g(xl+1 ) + Bk+1 (xl − xl+1 ), l = k − m, . . . , k − 1,

so,
Bk+1 sl = yl , l = k − m, . . . , k − 1. (1.26)
For m = n−1 and linear independent sk−m , . . . , sk the matrix Bk+1 is uniquely
determined by (1.25) and (1.26). Unfortunately, most of the time sk−m , . . . , sk
tend to be linearly dependent, making the computation of Bk+1 a poorly posed
numerical problem.
The approach that leads to the successful secant approximation is quite
different. Aside from the secant equation no new information about either the
Jacobian or the model is given. The idea is to preserve as much as possible of
what we already have. Therefore, we try to minimize the change in the affine
model, subject to the secant equation (1.25). The difference between the new
and the old affine model, at any x is given by

lk+1 (x) − lk (x) = g(xk+1 ) + Bk+1 (x − xk+1 ) − g(xk ) − Bk (x − xk )


= yk − Bk+1 sk + (Bk+1 − Bk )(x − xk )
= (Bk+1 − Bk )(x − xk ).

The last equality is due to the secant equation. Now if we write an arbitrary
x ∈ Rn as
x − xk = αsk + q, where q T sk = 0, α ∈ R,
the expression that we want to minimize becomes

lk+1 (x) − lk (x) = α(Bk+1 − Bk )sk + (Bk+1 − Bk )q. (1.27)

We have no control over the first term on the right hand side of (1.27), since
it equals
(Bk+1 − Bk )sk = yk − Bk sk . (1.28)
However, we can make the second term on the right hand side of (1.27) zero
for all x ∈ Rn , by choosing Bk+1 such that

(Bk+1 − Bk )q = 0, for all q ⊥ sk . (1.29)


38 Chapter 1. An introduction to iterative methods

This implies that (Bk+1 − Bk ) has to be a rank-one matrix of the form usTk ,
with u ∈ Rn . Equation (1.28) now implies that u = (yk − Bk sk )/(sTk sk ). This
leads to the Broyden or secant update

(yk − Bk sk )sTk
Bk+1 = Bk + . (1.30)
sTk sk

The word ’update’ indicates that we are not approximating the Jacobian in
the new iterate, Jg (xk+1 ), from scratch. Rather a former approximation Bk
is updated into a new one, Bk+1 . This type of updating is shared by all the
successful multi-dimensional secant approximation techniques.
We arrive at the algorithm of Broyden’s method.

Algorithm 1.19 (Broyden’s method). Choose an initial estimate x0 ∈


Rn and a nonsingular initial Broyden matrix B0 . Set k := 0 and repeat the
following sequence of steps until kg(xk )k < ε.

i) Solve Bk sk = −g(xk ) for sk ,

ii) xk+1 := xk + sk

iii) yk := g(xk+1 ) − g(xk ),

iv) Bk+1 := Bk + (yk − Bk sk )sTk /(sTk sk ),

In this section we use the Frobenius norm, denoted by k.kF . The norm is
defined by
³Xn X n ´1/2
kAkF = A2ij . (1.31)
i=1 j=1

So, it equals the l2 -vector norm of the matrix written as a n2 -vector. For
y, s ∈ Rn the set of all matrices that satisfy the secant equation As = y is
denoted by
Q(y, s) = {A ∈ Rn×n | As = y}.
In the preceding, we have followed the steps of Broyden when developing
his iterative method in [8], but the derivation of the Broyden update can be
made much more rigorous. The Broyden update is the minimum change to
Bk consistent with the secant equation (1.25), if (Bk+1 − Bk ) is measured in
Frobenius norm. That is, of all matrices A that satisfy the secant equation
(1.25) the new Broyden matrix Bk+1 yields the minimum of kA − Bk kF . This
will be proved in Lemma 1.20.
1.3 The method of Broyden 39

Lemma 1.20. Let B ∈ Rn×n and s, y ∈ Rn arbitrary. If s 6= 0, then the


unique solution A = B̄ to
min kA − BkF (1.32)
A∈Q(y,s)

is given by
(y − Bs)sT
B̄ = B + .
sT s
Proof. We compute for any A ∈ Q(y, s),
° (y − Bs)sT °
° °
kB̄ − BkF = ° °
sT s F
° (A − B)ssT °
° °
= ° °
sT s F
° ssT °
° °
≤ kA − BkF ° T ° = kA − BkF .
s s 2
Note that Q(y, s) is a convex (in fact, affine) subset of Rn×n . Because the
Frobenius norm is strictly convex, the solution to (1.32) is unique on the
convex subset Q(y, s).
We have not defined yet what should be chosen for the initial approxima-
tion B0 to the Jacobian in the initial estimate, Jg (x0 ). The finite differences
approximation turns out to be a good start. It also makes the minimum change
characteristics of Broyden’s update more appealing, as given in Lemma 1.20.
Another choice, that avoids the computation of Jg (x0 ), is taking the initial
approximation equal to minus identity,
B0 = −I. (1.33)
Suppose the function g is defined by g(x) = f (x) − x, where f is the period
map of a dynamical process
xk+1 = f (xk ), k = 0, 1, . . . . (1.34)
A fixed point of the process (1.34) is a zero of the function g. By choosing
B0 = −I, the first iteration of Broyden is just a dynamical simulation step,
x1 = x0 − B0−1 g(x0 ) = x0 − (f (x0 ) − x0 ) = f (x0 ).
So, in this way, we let the system choose the direction of the first step. In
addition, the initial Broyden matrix is easy to store and can be directly imple-
mented in the computer code. This makes the reduction methods discussed
in Chapter 3 effective.
We now apply the method of Broyden to the test function (A.5).
40 Chapter 1. An introduction to iterative methods

Example 1.21. Let g be the discrete integral equation function given by


(A.5). We define the initial condition x0 by (A.6) and we set ε = 10−12 . We
apply Algorithm 1.19 for different dimensions of the problem. The convergence
results for the method of Broyden are described in Table 1.5. Although the
Broyden’s method needs more iterations to converge than Newton’s method, it
avoids the computation of the Jacobian. The method of Broyden method only
makes one function evaluation per iteration compared to the n + 1 function
evaluations of Algorithm 1.13. The rate of convergence again does not depend
on the dimension of the problem, see also Figure 1.7.

method n kg(x0 )k kg(xk∗ )k k∗ R

Broyden 10 0.2518 4.8980 · 10−14 21 1.3937


Broyden 100 0.7570 4.4398 · 10−13 21 1.3412
Broyden 200 1.0678 6.3644 · 10−13 21 1.3404

Table 1.5: The convergence results for Algorithm 1.19 applied to the discrete integral
equation function (A.5) for different dimensions n.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 2 4 6 8 10 12 14 16 18 20 22
iteration k

Figure 1.7: The convergence rate of Algorithm 1.19 applied to the discrete integral
equation function (A.5) for different dimensions n. [’◦’(n = 10), ’×’(n = 100), ’+’(n =
200)]

Example 1.22. Let g be the discrete integral equation function given by


(A.5). We apply Algorithm 1.19 to approximate the zero of g. As we did for
the Newton-Chord method and the method of Newton we multiply the initial
condition x0 , given by (A.6), by a factor 1, 10 and 100. The convergence results
1.3 The method of Broyden 41

for Broyden’s method are described in Table 1.6. For the initial condition
100x0 the method of Broyden fails to converge.

method n factor kg(x0 )k kg(xk∗ )k k∗ R

Broyden 100 1 0.7570 4.4398 · 10−13 21 1.3412


Broyden 100 10 18.5217 8.7765 · 10−13 33 0.9297
Broyden 100 100 3.8215 · 103 1.0975 · 10+20 13 -2.9151

Table 1.6: The convergence results for Broyden’s method 1.19 applied to the discrete
integral equation function (A.5) for different initial conditions, x0 , 10x0 and 100x0 .

Superlinear convergence
In order to prove the convergence of Broyden’s method, we first need the
following extension of Lemma 1.11.
Lemma 1.23. Let g : Rn → Rn be continuously differentiable in the open,
convex set D ⊂ Rn , x ∈ D. If Jg ∈ Lipγ (D) then for every u and v in D
kg(v) − g(u) − Jg (x)(v − u)k ≤ γ max{kv − xk, ku − xk}kv − uk. (1.35)
Moreover, if Jg (x) is invertible, there exist ε > 0 and ρ > 0 such that
(1/ρ)kv − uk ≤ kg(v) − g(u)k ≤ ρkv − uk, (1.36)
for all u, v ∈ D for which max{kv − xk, ku − xk} ≤ ε.
Proof. The proof of Equation (1.35) is similar to the proof of Lemma 1.11.
Equation (1.35) together with the triangle inequality implies that for u, v
satisfying max{kv − xk, ku − xk} ≤ ε,
kg(v) − g(u)k ≤ kJg (x)(v − u)k + kg(v) − g(u) − Jg (x)(v − u)k
≤ (kJg (x)k + γ max{kv − xk, ku − xk})kv − uk
≤ (kJg (x)k + γε)kv − uk.
Similarly,
kg(v) − g(u)k ≥ kJg (x)(v − u)k − kg(v) − g(u) − Jg (x)(v − u)k
³ 1 ´
≥ − γ max{kv − xk, ku − xk} kv − uk
kJg (x)−1 k
³ 1 ´
≥ − γε kv − uk.
kJg (x)−1 k
42 Chapter 1. An introduction to iterative methods

Thus if ε < (1/kJg (x)−1 kγ), then 1/kJg (x)−1 k − γε > 0 and (1.36) holds if we
choose ρ large enough such that
ρ > kJg (x)k + γε,
and
1 1
< − γε.
ρ kJg (x)−1 k

In the next theorem it is necessary to use the Frobenius norm (1.31).


Because all norms in a finite-dimensional vector space are equivalent, there is
a constant η > 0 such that
kAk ≤ ηkAkF , (1.37)
where k.k is the l2 -operator norm induced by the corresponding vector norm.
By L(Rn ) we denote the space of all linear maps from Rn to Rn , i.e., all
(n × n)-matrices. So, an element of the power set P{L(Rn )} is a set of linear
maps from Rn to Rn . The function Φ appearing in Theorem 1.24 is a set valued
function, that assigns to a couple of a vector x ∈ Rn and a matrix B ∈ L(Rn ) a
set of matrices {B̄}. This can consists of one single element and it can contain
B itself.
Theorem 1.24. Let g : Rn → Rn be continuously differentiable in the open,
convex set D ⊂ Rn , and assume that Jg ∈ Lipγ (D). Assume that there exists an
x∗ ∈ D such that g(x∗ ) = 0 and Jg (x∗ ) is non-singular. Let Φ : Rn × L(Rn ) →
P{L(Rn )} be defined in a neighborhood N = N1 × N2 of (x∗ , Jg (x∗ )) where N1
is contained in D and N2 only contains non-singular matrices. Suppose there
are non-negative constants α1 and α2 such that for each (x, B) in N, and for
x̄ = x − B −1 g(x), the function Φ satisfies
³ ´
kB̄ − Jg (x∗ )kF ≤ 1 + α1 max{kx̄ − x∗ k, kx − x∗ k} · kB − Jg (x∗ )kF
+ α2 max{kx̄ − x∗ k, kx − x∗ k} (1.38)
for each B̄ in Φ(x, B). Then for arbitrary r ∈ (0, 1), there are positive constants
ε(r) and δ(r) such that for kx0 − x∗ k < ε(r) and kB0 − Jg (x∗ )kF < δ(r), and
Bk+1 ∈ Φ(xk , Bk ), k ≥ 0, the sequence
xk+1 = xk − Bk−1 g(xk ) (1.39)
is well defined and converges to x∗ . Furthermore,
kxk+1 − x∗ k ≤ rkxk − x∗ k (1.40)
for each k ≥ 0, and {kBk k}, {kBk−1 k} are uniformly bounded.
1.3 The method of Broyden 43

Proof. Let r ∈ (0, 1) be given and set β ≥ kJg (x∗ )−1 k. Choose δ(r) = δ and
ε(r) = ε such that
ε
(2α1 δ + α2 ) ≤ δ, (1.41)
1−r
and for η given by (1.37),

β(1 + r)(γε + 2ηδ) ≤ r. (1.42)

If necessary further restrict ε and δ so that (x, B) lies in the neighborhood N


whenever kB−Jg (x∗ )kF < 2δ and kx−x∗ k < ε. Suppose that kB0 −Jg (x∗ )kF <
δ and kx0 − x∗ k < ε. Then kB0 − Jg (x∗ )k < ηδ < 2ηδ, and since (1.42) yields

2β(1 + r)ηδ ≤ r, (1.43)

Theorem 1.12 gives


β β
kB0−1 k ≤ ≤ = (1 + r)β.
1 − β2ηδ 1 − r/(1 + r)
Lemma 1.23 now implies that

kx1 − x∗ k ≤ kx0 − B0−1 g(x0 ) − x∗ k


³
≤ kB0−1 k · kg(x0 ) − g(x∗ ) − Jg (x∗ )(x0 − x∗ )k
´
+kB0 − Jg (x∗ )kkx0 − x∗ k
≤ β(1 + r)(γε + 2ηδ)kx0 − x∗ k,

and by (1.42) it follows that kx1 − x∗ k ≤ rkx0 − x∗ k. Hence, kx1 − x∗ k < ε,


and thus x1 ∈ D.
We complete the proof with an induction argument. Assume that both
kBk − Jg (x∗ )kF ≤ 2δ and kxk+1 − x∗ k ≤ rkxk − x∗ k for k = 0, 1, . . . , m − 1. It
follows from (1.38) that

kBk+1 − Jg (x∗ )kF − kBk − Jg (x∗ )kF


≤ (α1 kBk − Jg (x∗ )kF + α2 ) max{kxk+1 − x∗ k, kxk − x∗ k}
≤ (2α1 δ + α2 ) max{rkxk − x∗ k, kxk − x∗ k}
≤ (2α1 δ + α2 )rk kx0 − x∗ k
≤ (2α1 δ + α2 )εrk ,

and by summing both sides from k = 0 to m − 1, we obtain


ε
kBm − Jg (x∗ )kF ≤ kB0 − Jg (x∗ )kF + (2α1 δ + α2 ) ,
1−r
44 Chapter 1. An introduction to iterative methods

which by (1.41) implies that kBm − Jg (x∗ )k ≤ 2δ. To complete the induction
step we only need to prove that kxm+1 − x∗ k ≤ rkxm − x∗ k. This follows by
an argument similar to the one for m = 1. In fact, since kBm − Jg (x∗ )k ≤ 2ηδ,
Lemma 1.12 and (1.43) implies that
−1
kBm k ≤ (1 + r)β,

and by Lemma 1.23 it follows that


³
kxm+1 − x∗ k ≤ kBm −1
k kg(xm ) − g(x∗ ) − Jg (x∗ )(xm − x∗ )k
´
+kBm − Jg (x∗ )kkxm − x∗ k
≤ β(1 + r)(γε + 2ηδ)kxm − x∗ k,

and kxm+1 − x∗ k ≤ rkxm − x∗ k follows from (1.42).

Corollary 1.25. Assume that the hypotheses of Theorem 1.24 hold. If some
subsequence of {kBk − Jg (x∗ )k} converges to zero, then the sequence {xk }
converges q-superlinearly at x∗ .

Proof. We would like to show that

kxk+1 − x∗ k
lim = 0.
k→∞ kxk − x∗ k

By Theorem 1.24 there are numbers ε( 12 ) and δ( 21 ) such that kB0 −Jg (x∗ )kF <
δ( 21 ) and kx0 −x∗ k < ε( 12 ) imply that kxk+1 −k ≤ 12 kxk −x∗ k for each k ≥ 0. Let
now r ∈ (0, 1) be given. We can choose m > 0 such that kBm −Jg (x∗ )kF < δ(r)
and kxm − x∗ k < ε(r). So, kxk+1 − x∗ k ≤ rkxk − x∗ k for each k ≥ m. Since
r ∈ (0, 1) was arbitrary, the proof is completed.

It should be clear that some condition like the one in Corollary 1.25 is
necessary to guarantee q-superlinear convergence. For example, the Newton-
Chord iteration scheme, see Algorithm 1.16,

xk+1 = xk − Jg (x0 )−1 g(xk )

satisfies (1.38) with α1 = α2 = 0, but is, in general, only linearly convergent.


One of the interesting aspects of the result of the following theorem is that
q-superlinear convergence is guaranteed for the method of Broyden, without
any subsequence of {kBk − Jg (x∗ )k} necessarily converging to zero.
1.3 The method of Broyden 45

Theorem 1.26. Let g : Rn → Rn be continuously differentiable in the open,


convex set D ⊂ Rn , and assume that Jg ∈ Lip γ(D). Let x∗ be a zero of g, for
which Jg (x∗ ) is non-singular. Then the update function Φ(x, B) = {B̄ | s 6= 0},
where
sT
B̄ = B + (y − Bs) T , (1.44)
s s
is well defined in a neighborhood N = N1 × N2 of (x∗ , Jg (x∗ )), and the corre-
sponding iteration
xk+1 = xk − Bk−1 g(xk ) (1.45)
with Bk+1 ∈ Φ(xk , Bk ), k ≥ 0, is locally and q-superlinearly convergent at x∗ .
Before we can prove the theorem we need some preparations. The idea
of the proof of Theorem 1.26 is in the following manner. If B̄ is given by
(1.44), then Lemma 1.23 and standard properties of the matrix norms k.k 2
and k.kF imply that there exists a neighborhood N of (x∗ , Jg (x∗ )) such that
condition (1.38) is satisfied for every (x, B) in N. Subsequently, Theorem 1.24
yields that iteration (1.45) is locally and linearly convergent. The q-superlinear
convergence is a consequence of the following two lemma’s.
Lemma 1.27. Let xk ∈ Rn , k ≥ 0. If {xk } converges q-superlinearly to x∗ ∈
Rn , then in any norm k.k,
kxk+1 − xk k
lim = 1.
k→∞ kxk − x∗ k

Define the error in the current iteration ek by

ek = x k − x ∗ . (1.46)

The proof is drawn in Figure 1.8. Clearly, if


kek+1 k ksk k
lim = 0, then lim = 1.
k→∞ kek k k→∞ kek k

Proof (of Lemma 1.27). With ek given by (1.46) we compute


¯ ks k ¯ ¯ ks k − ke k ¯
¯ k ¯ ¯ k k ¯
lim ¯ − 1¯ = lim ¯ ¯
k→∞ kek k k→∞ kek k
¯ ks + e k ¯
¯ k k ¯
≤ lim ¯ ¯
k→∞ kek k
kek+1 k
= lim = 0,
k→∞ kek k
46 Chapter 1. An introduction to iterative methods

xk+1
PSfrag replacements
sk
ek+1

xk
ek
x∗

Figure 1.8: Schematic drawing of two subsequent iterates.

where the final equality is the definition of q-superlinear convergence if ek 6= 0


for all k.

Note that Lemma 1.27 is also of interest to the stopping criteria in our al-
gorithms. It shows that whenever an algorithm achieves at least q-superlinear
convergence, then any stopping test that uses sk is essentially equivalent to
the same test using ek , which is the quantity we are really interested in.
Lemma 1.28. Let D ⊆ Rn be an open, convex set, g : Rn → Rn continuously
differentiable, and Jg ∈ Lipγ (D). Assume that Jg (x∗ ) is non-singular for some
x∗ ∈ D. Let {Ak } be a sequence of nonsingular matrices in L(Rn ). Suppose
for some x0 ∈ D that the sequence of points generated by

xk+1 = xk − A−1
k g(xk ) (1.47)

remains in D, and satisfies limk→∞ xk = x∗ , where xk 6= x∗ for every k. Then


{xk } converges q-superlinearly to x∗ in some norm k.k and g(x∗ ) = 0, if and
only if
k(Ak − Jg (x∗ ))sk k
lim =0 (1.48)
k→∞ ksk k
where sk = xk+1 − xk .
Proof. Define ek = xk − x∗ . First we assume that (1.48) holds, and show that
g(x∗ ) = 0 and that {xk } converges q-superlinearly to x∗ . Equation (1.47) gives

0 = Ak sk + g(xk ) = (Ak − Jg (x∗ ))sk + g(xk ) + Jg (x∗ )sk ,

so that

−g(xk+1 ) = (Ak − Jg (x∗ ))sk + (−g(xk+1 ) + g(xk ) + Jg (x∗ )sk ), (1.49)


1.3 The method of Broyden 47

and
kgk+1 k kAk − Jg (x∗ )sk k k − g(xk+1 ) + g(xk ) + Jg (x∗ )sk k
≤ +
ksk k ksk k ksk k
kAk − Jg (x∗ )sk k
≤ + γ max{kx − x∗ k, kx − x∗ k} (1.50)
ksk k

where the second inequality follows from Lemma 1.23. Equation (1.50) to-
gether with limk→∞ kek k = 0 and (1.48) gives

kg(xk+1 )k
lim = 0. (1.51)
k→∞ ksk k

Since limk→∞ ksk k = 0, it follows that

g(x∗ ) = lim g(xk ) = 0.


k→∞

From Lemma 1.23, there exist ρ > 0, k0 ≥ 0, such that

1
kg(xk+1 )k = kg(xk+1 ) − g(x∗ )k ≥ kek+1 k, (1.52)
ρ

for all k ≥ k0 . Combining (1.51) and (1.52) gives

kg(xk+1 )k
0 = lim
k→∞ ksk k
1 kek+1 k
≥ lim
k→∞ ρ ksk k
1/ρ · kek+1 k 1/ρ · rk
≥ lim = lim ,
k→∞ kek k + kek+1 k k→∞ 1 + rk

where rk = kek+1 k/kek k. This implies

lim rk = 0,
k→∞

which completes the proof of q-superlinear convergence.


The proof of the reverse implication, that q-superlinear convergence and

g(x ) = 0 imply (1.48), is the derivation above read in more or less the reversed
order. From Lemma 1.23, there exist ρ > 0, k0 ≥ 0, such that

kg(xk+1 )k ≥ ρkek+1 k
48 Chapter 1. An introduction to iterative methods

for all k ≥ k0 . Therefore,


kek+1 k
0 = lim
k→∞ kek k
kg(xk+1 )k
≥ lim
k→∞ 1/ρkek k
kg(xk+1 )k ksk k
≥ lim ρ · · . (1.53)
k→∞ ksk k kek k
The q-superlinear convergence implies that limk→∞ ksk k/kek k = 1 according
to Lemma 1.27. Together with (1.53) this gives that (1.51) holds. Finally,
from (1.49) and Lemma 1.23,
k(Ak − Jg (x∗ ))sk k kg(xk+1 )k k − g(xk+1 ) + g(xk ) + Jg (x∗ )sk k
≤ +
ksk k ksk k ksk k
kg(xk+1 )k
≤ + γ max{kx − x∗ k, kx − x∗ k},
ksk k
which together with (1.51) and limk→∞ kek k = 0 proves (1.48).
Due to the Lipschitz continuity of Jg , it is easy to show that Lemma 1.28
remains true if (1.48) is replaced by
k(Ak − Jg (xk ))sk k
lim = 0. (1.54)
k→∞ ksk k
This condition has an interesting interpretation. Because sk = −A−1
k g(xk ),
Equation (1.54) is equivalent to
kJg (xk )(sN
k − sk )k
lim = 0,
k→∞ ksk k
where sN −1
k = −Jg (xk ) g(xk ) is the Newton step from xk . Thus the necessary
and sufficient condition for the q-superlinear convergence of a secant method
is that the secant steps converge, in magnitude and direction, to the Newton
steps from the same points.
After stating a final lemma we are able to prove the main theorem of this
section.
Lemma 1.29. Let s ∈ Rn be nonzero and E ∈ Rn×n . Then
° ³ ssT ´° ³ ³ kEsk ´2 ´1/2
° ° 2
° E I − ° = kEk F − (1.55)
sT s F ksk
1 ³ kEsk ´2
≤ kEkF − . (1.56)
2kEkF ksk
1.3 The method of Broyden 49

Proof. Note that I − (ssT /sT s) is a Euclidean projection, and so is ssT /sT s.
So by the Pythagorean theorem,
° ssT °2 ° ³ ssT ´°
° ° ° °2
kEk2F = °E T ° + °E I − T ° ,
s s F s s F
and the equality
° ssT ° kEsk
° °
°E T ° = ,
s s F ksk
we have proved (1.55). Because for any α ≥ |β| ≥ 0, (α2 − β 2 )1/2 ≤ α − β 2 /2α,
Equation (1.55) implies (1.56).

Proof (of Theorem 1.26). In order to be able use both Theorem 1.24 and
Lemma 1.28, we first derive an estimate for kB̄ − Jg (x∗ )k. Assume that x̄
and x are in D and ksk 6= 0. Define Ē = B̄ − Jg (x∗ ), E = B − Jg (x∗ ),
ē = x̄ − x∗ , and e = x − x∗ . Note that

Ē = B̄ − Jg (x∗ )
sT
= B − Jg (x∗ ) + (y − Bs) T
s s
³ ss T ´ sT
= (B − Jg (x∗ )) I − T + (y − Jg (x∗ )) T .
s s s s
Therefore,
° ³ ssT ´° ky − Jg (x∗ )sk
° °
kĒkF ≤ °(B − Jg (x∗ )) I − T ° +
s s F ksk
° ³ T
ss ° ´°
°
≤ °E I − T ° + γ max{kēk, kek}. (1.57)
s s F
For the last inequality of (1.57) Lemma 1.23 is used. Because I − ssT /(sT s)
is an orthogonal projection it has l2 -norm equal to one,
° ssT °
° °
°I − T ° = 1.
s s
Therefore, the inequality (1.57) can be reduced to

kĒkF ≤ kEkF + γ max{kēk, kek}. (1.58)

We define the neighborhood N2 of Jg (x∗ ) by


n 1o
N2 = B ∈ L(Rn ) | kJg (x∗ )−1 k · kB − Jg (x∗ )k < .
2
50 Chapter 1. An introduction to iterative methods

Then any B ∈ N2 is non-singular and satisfies


kJg (x∗ )−1 k
kB −1 k ≤ ≤ 2kJg (x∗ )−1 k.
1 − k(Jg (x∗ )−1 (B − Jg (x∗ ))k
To define the neighborhood N1 of x∗ , choose ε > 0 and ρ > 0 as in Lemma
1.23 so that max{kx̄ − x∗ k, kx − x∗ k} ≤ ε implies that x and x̄ belong to D
and that (1.36) holds, for u = x and v = x̄, i.e.,

(1/ρ)kx − x̄k ≤ kg(x) − g(x̄)k ≤ ρkx − x̄k. (1.59)

In particular, if kx − x∗ k ≤ ε and B ∈ N2 then x ∈ D and

ksk = kB −1 g(x)k ≤ kB −1 kkg(x) − g(x∗ )k ≤ 2ρkJg (x∗ )−1 kkx − x∗ k.


ε
Let N1 be the set of all x ∈ Rn such that kx − x∗ k < 2 and
ε
2ρkJg (x∗ )−1 kkx − x∗ k < .
2
If N = N1 × N2 and (x, B) ∈ N, then

kx̄ − x∗ k ≤ ksk + kx − x∗ k ≤ ε.

Hence, x̄ ∈ D and moreover, (1.59) shows that s = 0 if and only if x = x∗ . So,


the update function is well defined in N. Equation (1.58) then shows that the
update function associated with the iteration (1.45) satisfies the hypotheses
of Theorem 1.24 and therefore, the algorithm according to (1.45) is locally
convergent at x∗ . In addition we can choose r ∈ (0, 1) in (1.40) arbitrarily. We
take r = 12 , so
1
kxk+1 − x∗ k ≤ kxk − x∗ k (1.60)
2
Considering Lemma 1.28, a sufficient condition for {xk } to converge q-
superlinearly to x∗ is
kEk sk k
lim = 0. (1.61)
k→∞ ksk k

In order to justify Equation (1.61) we write Equation (1.57) as


° ³ sk sT ´°
° °
kEk+1 kF ≤ °Ek I − T k ° + γ max{kek+1 k, kek k}. (1.62)
sk sk F
Using Equation (1.60) and Lemma 1.29 in (1.62), we obtain
kEk sk k2
kEk+1 kF ≤ kEk kF − + γkek k,
2kEk kF ksk k2
1.3 The method of Broyden 51

or ³ ´
kEk sk k2
≤ 2kE k k F kE k k F − kE k+1 k F + γke k k . (1.63)
ksk k2
Theorem 1.24 gives that {kBk k} is uniformly bounded for k ≥ 0. This implies
that there exists an M > 0 independently of k such that

kEk k = kBk − Jg (x∗ )k ≤ kBk k + kJg (x∗ )k ≤ M.

By Equation (1.60) we obtain



X
kek k ≤ 2ε.
k=0

Thus from (1.63),

kEk sk k2 ³ ´
≤ 2M kEk kF − kEk+1 kF + γkek k , (1.64)
ksk k2
and summing the left and right sides of (1.64) for k = 0, 1, . . . , m, yields
m
X kEk sk k2 ³ m
X ´
≤ 2M kE0 kF − kEm+1 kF + γ kek k
ksk k2
k=0 k=0
³ ´
≤ 2M kE0 kF + 2εγ
³ ´
≤ 2M M + 2εγ . (1.65)

Because (1.65) is true for any m ≥ 0, we obtain



X kEk sk k2
< ∞,
ksk k2
k=0

which implies (1.61) and completes the proof.

The inverse notation of Broyden’s method


A restriction of the method of Broyden is that it is necessary to solve an
n-dimensional system to compute the Broyden step, see Algorithm 1.19. To
avoid this problem, instead of the Broyden-matrix one could store the inverse
of this matrix, and the operation is reduced to a matrix-vector multiplication.
If Hk is the inverse of Bk then

sk = −Hk g(xk ),
52 Chapter 1. An introduction to iterative methods

and the secant equation becomes

Hk+1 yk = sk . (1.66)

Equation (1.66) again does not define a unique matrix but a class of matrices.
In Section 1.3, the new Broyden matrix Bk+1 has been chosen so that,
in addition to the secant equation (1.25), it satisfies Bk+1 q = Bk q in any
direction q orthogonal to sk . This was sufficient to define Bk+1 uniquely and
the update was given by (1.30).
It is possible, using Householder’s modification formula, to compute the
−1
new inverse Broyden matrix Hk+1 = Bk+1 with very little effort from Hk .
Householder’s formula, also called the Sherman-Morrison formula, states that
if A is a nonsingular (n × n)-matrix, u and v are vectors in Rn , and (1 +
v T A−1 u) 6= 0, then (A + uv T ) is nonsingular and
A−1 uv T A−1
(A + uv T )−1 = A−1 − . (1.67)
1 + v T A−1 u
This formula is a particular case of the Sherman-Morrison-Woodbury for-
mula derived in the next theorem.
Theorem 1.30. Let A ∈ Rn×n be nonsingular and U, V ∈ Rn×p be arbitrary
matrices with p ≤ n. If (I + V T A−1 U ) is nonsingular then (A + U V T )−1 exists
and
(A + U V T )−1 = A−1 − A−1 U (I + V T A−1 U )−1 V T A−1 . (1.68)
Proof. The formula (1.68) is easily verified by computing

(A−1 − A−1 U (I + V T A−1 U )−1 V T A−1 )(A + U V T ),

and
(A + U V T )(A−1 − A−1 U (I + V T A−1 U )−1 V T A−1 ),
that both yield the identity. Therefore A + U V T is invertible and the inverse
is given by (1.68).

Equation (1.67) gives that if sTk Hk yk 6= 0, then

−1 sTk −1
Hk+1 = Bk+1 = (Bk + (yk − Bk sk ) )
sTk sk
sTk Bk−1
= Bk−1 − (Bk−1 yk − sk )
sTk Bk−1 yk
sTk Hk
= Hk + (sk − Hk yk ) . (1.69)
sTk Hk yk
1.3 The method of Broyden 53

The iterative scheme

xk+1 = xk − Hk g(xk ), k = 0, 1, 2, . . . ,

together with the rank one update (1.69) equals Algorithm 1.19.
Instead of assuming that Bk+1 q = Bk q in any direction q orthogonal to sk ,
we could also require that

Hk+1 q = Hk q for q T yk = 0.

This is, in some sense, the complement of the first method of Broyden. Since
Hk+1 satisfies (1.66), it is readily seen that for this method Hk+1 is uniquely
given by
yT
Hk+1 = Hk + (sk − Hk yk ) Tk .
yk yk
This update scheme, however, appears in practice to be unsatisfactory and is
called the second or ’bad’ method of Broyden [8].
54 Chapter 1. An introduction to iterative methods
Chapter 2

Solving linear systems with


Broyden’s method

One important condition for an algorithm to be a good iterative method is


that it should use a finite number of iterations to solve a system of linear
equations
Ax + b = 0, (2.1)
where A ∈ Rn×n and b ∈ Rn . As we have seen in Section 1.2, the method
of Newton satisfies this condition, that is, it solves a system of linear equa-
tions in just one single iteration step. Although computer simulations indicate
that the method of Broyden satisfies finite convergence, for a long time it was
not possible to prove this algebraically. In 1979, fourteen years after Charles
Broyden proposed his algorithm, David Gay published a proof that Broyden’s
method converges in at most 2n iteration steps for any system of linear equa-
tions (2.1) where A is nonsingular [22]. In addition Gay proved under which
conditions the method of Broyden needs exactly 2n iterations.
For many examples, however, it turns out that Broyden’s method needs
much less iterations. In 1981, Richard Gerber and Franklin Luk [23] published
an approach to compute the exact number of iterations that Broyden’s method
needs to solve (2.1).
In this chapter, we discuss the Theorems of Gay, Section 2.1, and of Ger-
ber and Luk, Section 2.2, and we give examples to illustrate the theorems. In
Section 2.3, we show that the method of Broyden is invariant under unitary
transformations and in some weak sense also under nonsingular transforma-
tions. This justifies that we restrict ourselves to examples where A is in Jordan
canonical block form, cf. Section 4.2.
But first we start again with the problem in the one-dimensional setting.

55
56 Chapter 2. Solving linear systems with Broyden’s method

The one-dimensional case


Consider the function g : R → R given by

g(x) = αx + β,

where α 6= 0. It is clear that Newton’s method converges in one iteration


starting from any initial point x0 , different from the solution x∗ . Indeed,

g(x0 ) αx0 + β β
x1 = x 0 − 0
= x0 − =− .
g (x0 ) α α

and g(x1 ) = 0. It turns out that Broyden needs two iterations from the same
initial point x0 6= x∗ , if b0 ∈ R is an arbitrary nonzero scalar, with b0 6= α. We
compute
g(x0 ) α β
s0 = − = − x0 − .
b0 b0 b0
So, if x1 = x0 + s0 , then
³ ´ ³ ´ ³ ´
g(x1 ) = α 1 − α/b0 x0 + 1 − α/b0 β = 1 − α/b0 (αx0 + β).

The scalar b0 is updated by

g1 (1 − α/b0 )(αx0 + β)
b1 = b0 + = b0 + = b0 − (b0 − α) = α.
s0 −(αx0 + β)/b0

Thus after one iteration Broyden’s method succeeds to find the derivative of
the function g. Therefore the method converges in the next iterations step,
that is,
g(x1 ) α β β
x2 = x 1 − = x 1 − x1 − =− .
b1 b1 b1 α

2.1 Exact convergence for linear systems


Suppose that g : Rn → Rn is an affine function, that is, for x ∈ Rn ,

g(x) = Ax + b, (2.2)

where A : Rn×n and b ∈ Rn . The matrix A is assumed to be nonsingular.


For notational simplicity we denote g(xk ) by gk . We consider the following
generalization of the method of Broyden.
2.1 Exact convergence for linear systems 57

Algorithm 2.1 (Generalized Broyden’s method). Choose x0 ∈ Rn and


a nonsingular (n × n)-matrix H0 . Compute s0 := −H0 g(x0 ) and let k := 0.
Repeat the following sequence of steps as long as sk 6= 0.
i) xk+1 := xk + sk ,
ii) yk := g(xk+1 ) − g(xk ),
iii) Choose vk such that the conditions
vkT yk = 1, (2.3)
vk = HkT uk , (2.4)
are satisfied, where uTk sk 6= 0.
iv) Hk+1 := Hk + (sk − Hk yk )vkT ,
v) Compute sk+1 := −Hk+1 g(xk+1 ),
Property (2.3) establishes the inverse secant equation (1.66),
Hk+1 yk = sk .
Note that both properties are satisfied when Broyden’s ’good’ update is used,
i.e.,
vk = HkT sk /(sTk Hk yk ) for sTk Hk yk 6= 0.
The ’bad’ Broyden update vk = yk /(ykT yk ) clearly satisfies property (2.3), but
might non keep (2.4) invariant.
The updated Broyden matrix can be written as
Hk+1 = Hk + (sk − Hk yk )vkT
= Hk − Hk (gk + yk )vkT
= Hk (I − gk+1 vkT ). (2.5)
With this relation the following lemma is easily been shown.
Lemma 2.2. If Hk is invertible and vk satisfies conditions (2.3) and (2.4)
then Hk+1 is invertible as well.
Proof. The determinant of the matrix I − gk+1 vkT equals
det(I − gk+1 vkT ) = 1 − vkT gk+1
= 1 − vkT (yk + gk )
= 1 − 1 − uTk Hk gk
= uTk sk .
58 Chapter 2. Solving linear systems with Broyden’s method

Because uTk sk is assumed to be nonzero, this implies that Hk+1 is invertible if


Hk is invertible.

According to the definition of the Broyden step, sk = −Hk gk , the non-


singularity of H0 implies that sk = 0 if and only if gk = 0 for all k ≥ 0. Thus
the algorithm stops if and only if the zero of the function g is found.
Since g is a affine function the yield of the step size can be expressed as

yk = Ask . (2.6)

The matrix A is assumed to be nonsingular. So, (2.6) establishes that yk is


a nonzero vector throughout the execution of Algorithm 2.1. We also use the
relations
yk = −AHk g(xk ), (2.7)

and
(I − AHk+1 )yk = 0. (2.8)

Equation (2.7) implies that

gk+1 = yk + gk = (I − AHk )gk . (2.9)

A theorem of Gay
In this section, we show that Algorithm 2.1 converges in at most 2n steps when
applied to an affine function g : Rn → Rn , given by (2.2), where A ∈ Rn×n
is nonsingular and b ∈ Rn . This follows as an easy corollary to the following
lemma. The notation bσc used below denotes the greatest integer less than or
equal to σ ∈ R.
In the proof of Lemma 2.3, we need the equalities

AHk+1 = A(Hk + (sk − Hk yk )vkT ) (2.10)

and
AHk+1 = AHk (I − (I − AHk )gk vkT ), (2.11)

for which we have used (2.6) and (2.7). From (2.10) we deduce

I − AHk+1 = (I − AHk )(I + AHk gk vkT )


= (I − AHk )(I − yk vkT ). (2.12)
2.1 Exact convergence for linear systems 59

Lemma 2.3. If A ∈ Rn×n and Algorithm 2.1 is applied to g(x) ≡ Ax − b


with the result that gk ≡ g(xk ) and yk−1 are linearly independent, then for
1 ≤ j ≤ b(k + 1)/2c, the vectors

(AHk−2j+1 )i gk−2j+1 , 0 ≤ i ≤ j, (2.13)

are linearly independent.

Proof. We prove (2.13) by induction on j. The linearity of g implies that


yk−1 = Ask−1 = −AHk−1 gk−1 , so (2.13) is easily seen to hold for j = 1, using
that yk−1 = gk − gk−1 . For the induction we proof that the vectors in (2.13)
are linearly independent for j = 2. The proof for 3 ≤ j ≤ b(k + 1)/2c is similar
and we refer to [22] for a complete derivation.
By (2.11) we have that
T
AHk−1 = AHk−2 (I − (I − AHk−2 )gk−2 vk−2 )

Moreover, Equation (2.9) gives gk−1 = (I − AHk−2 )gk−2 and therefore


T
(AHk−1 )gk−1 = AHk−2 (I − (I − AHk−2 )gk−2 vk−2 )gk−1
T
= AHk−2 (I − AHk−2 )gk−2 (1 − vk−2 gk−1 )
T
= (1 − vk−2 gk−1 )(I − AHk−2 )AHk−2 gk−2

Since gk−1 and (AHk−1 )gk−1 are linearly independent

(I − AHk−2 )gk−2 and (I − AHk−2 )AHk−2 gk−2

are linearly independent as well. According to (2.8) we have

(I − AHk−2 )yk−3 = 0.

Therefore yk−3 , gk−2 and AHk−2 gk−2 are linearly independent. Note that, as
before,
gk−2 = (I − AHk−3 )gk−3
and
T
(AHk−2 )gk−2 = (1 − vk−3 gk−2 )(I − AHk−3 )AHk−3 gk−3 .
Since yk−3 = AHk−3 gk−3 , we see that (AHk−3 )i gk−3 , 0 ≤ i ≤ 2, are linearly
independent. Therefore (2.13) holds for j = 2.

Theorem 2.4. If g(x) = Ax−b and A ∈ Rn×n is nonsingular, then Algorithm


2.1 converges in at most 2n steps, i.e., gk = 0 for some k ≤ 2n.
60 Chapter 2. Solving linear systems with Broyden’s method

Proof. By Lemma 2.3 there exists a k with 1 ≤ k ≤ 2n − 1 such that gk and


yk−1 are linearly dependent. The theorem clearly holds if gk = 0, so assume
that gk 6= 0, (whence gk−1 6= 0 too). Lemma 2.2 shows that Hl is nonsingular
for l ≥ 0, so sk−1 6= 0. Because A is also nonsingular, we must have yk−1 6= 0
and hence gk = λyk−1 for some λ 6= 0. According to (2.8) yk−1 = AHk−1 yk−1 ,
so gk = AHk gk , whence gk+1 = gk − AHk gk = 0.

With the following example we illustrate Theorem 2.4.

Example 2.5. Consider the linear function g1 (x) = A1 x and g2 (x) = A2 x,


where
   
2 1 0 0 2 1 0 0
0 2 1 0 0 2 0 0
A1 = 
0 0 2 1
 and A2 = 0
. (2.14)
0 2 1
0 0 0 2 0 0 0 2

We apply the method of Broyden, see Algorithm 1.19, starting with the initial
matrix B0 = −I and initial estimate x0 = (1, 1, 1, 1). The rate of convergence
is given in Figure 2.1. Clearly, here the number of 2n iterations is an upper
bound for Algorithm 1.19 to obtain the exact zero of the function g 1 and g2 .

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 1 2 3 4 5 6 7 8
iteration k

Figure 2.1: The convergence rate of Algorithm 1.19, when solving Ai x = 0, for i = 1, 2,
where A1 and A2 are defined in (2.14) [’◦’: A1 , ’×’: A2 ]

In Section 1.3, we have seen that the Broyden matrix Bk does not neces-
sarily converge to the Jacobian even if the sequence {xk } converges to x∗ .
2.1 Exact convergence for linear systems 61

Lemma 2.6. Let g : Rn → Rn be an affine function, with nonsingular Jaco-


bian A ∈ Rn×n . Consider Algorithm 1.19, where sTk Bk−1 yk is nonzero in every
iteration k. Then for all k = 0, . . . , k ∗ ,
kBk+1 − AkF ≤ kBk − AkF (2.15)
Proof. Because we assume that sTk Bk−1 yk 6= 0 for all k, Algorithms 1.19 and 2.1
are equivalent for vk = HkT sk /(sTk Hk yk ). Theorem (2.4) gives that the process
is well defined and converges in a finite number of iterations.
Because g is affine we have that yk = Ask and according to the Broyden
update, we obtain
sTk
Bk+1 = Bk + (yk − Bk sk )
sTk sk
³ sk sT ´
Bk+1 − A = (Bk − A) I − T k ,
sk sk
and by taken the Frobenius norm of both sides, we arrive at
° sk sT °
° °
kBk+1 − AkF ≤ kBk − AkF °I − T k ° ≤ kBk − AkF .
sk sk

In other words, the difference between the Jacobian and the Broyden ma-
trix, is projected on an (n−1)-dimensional subspace orthogonal to the Broyden
step sk . Therefore, the final difference kBk∗ − AkF depends in particular on
the orthogonality of the Broyden steps {s0 , . . . , sk∗ }.
Lemma 2.7. Let g : Rn → Rn be an affine function, with nonsingular Ja-
cobian A ∈ Rn×n . Consider Algorithm 2.1 and suppose for some k ≥ 1, that
yk 6= 0, vkT yk−1 6= 0 and rank(I − AHk ) = n − 1 then rank(I − AHk+1 ) = n − 1
and yk spans the kernel of (I − AHk+1 ).
Proof. The assumption that yk 6= 0 implies that yk−1 6= 0. According to (2.8)
we see that yk is in the kernel of (I −AHk+1 ). Similarly, since rank(I −AHk ) =
n − 1 the vector yk−1 spans the kernel of (I − AHk ). Any other null vector y
of (I − AHk+1 ) must (after scaling) satisfy (I − yk vkT )y = yk−1 . But vk spans
the kernel of (I − yk vkT )T , and because, by assumption, vkT yk−1 6= 0, yk−1 is
not in the range of (I − yk vkT ). So, yk spans the kernel of (I − AHk+1 ).
Lemma 2.7 leads to the important observation that the sequence of ma-
trices Hk does not terminate with the inverse of the matrix A, at least in the
usual case in which all vkT yk−1 6= 0. In fact, each matrix Hk and A−1 agree
only on a subspace of dimension one.
62 Chapter 2. Solving linear systems with Broyden’s method

2.2 Two theorems of Gerber and Luk


We consider the Broyden process again applied to compute a zero of the affine
function g given by (2.2). Let Zk be defined as the subspace spanned by the
Krylov sequence {gk , AHk gk , (AHk )2 gk , . . .}. So,

Zk = span{gk , AHk gk , (AHk )2 gk , . . .}, (2.16)

for k ≥ 0. We will call subspace Zk the kth Krylov subspace. So, Z0 will
be called the zeroth subspace. We already have proved that Algorithm 2.1
terminates at the kth iteration if and only if g(xk ) = 0. Thus sk = 0 if and
only the dimension of Zk is zero.
We proceed to show how the Zk ’s decrease in dimension and first derive
several lemma’s.
Lemma 2.8. Let zk+1 be any vector in Zk+1 . Then there exists a vector zk in
Zk such that
zk+1 = (I − AHk )zk .
Proof. It suffices to show that for j ≥ 0, there is a vector tj in Zk such that

(AHk+1 )j gk+1 = (I − AHk )tj .

We prove this by induction. If j = 0, we have t0 = gk because of (2.9). Assume


there is a vector tj in Zk such that

(AHk+1 )j gk+1 = (I − AHk )tj .

By definition of Hk+1 , we obtain

(AHk+1 )j+1 gk+1 = (AHk+1 )(AHk+1 )j gk+1


= (AHk + A(sk − Hk yk )vkT )(I − AHk )tj
= (I − AHk )AHk tj + c(I − AHk )yk ,

where c = vkT (I − AHk )tj . So,

(AHk+1 )j+1 gk+1 = (I − AHk )tj+1 ,

where tj+1 = AHk tj + cyk . By (2.7) the vector tj+1 ∈ Zk .

We immediately see that


Zk+1 ⊂ Zk . (2.17)
Another direct implication of Lemma 2.8 is formulated in the following lemma.
2.2 Two theorems of Gerber and Luk 63

Lemma 2.9. Let the vectors {t1 , t2 , . . . , td } span Zk . Then the vectors

(I − AHk )t1 , (I − AHk )t2 , . . . , (I − AHk )td

span Zk+1 .

Lemma 2.10. Let dim Zk = d + 1. If there is a nonzero vector wk in the


subspace Zk ∩ Ker(I − AHk ) then

d
X
wk = αi (AHk )i gk , αd 6= 0.
i=0

Proof. We have

d
X
0 = (I − AHk )wk = α0 gk + (αi − αi−1 )(AHk )i gk − αd (AHk )d+1 gk .
i=1

Suppose that αd = 0. As the vectors (AHk )i gk for i = 0, 1, . . . , d are linearly


independent, we deduce that αi = 0 for i = 0, 1, . . . , d − 1. But this contradicts
the assumption of a nonzero wk .

A consequence of Lemma 2.10 is that wk is unique up to a scalar multiple,


i.e., if there is a nonzero vector wk in Zk ∩ Ker(I − AHk ), then wk spans
Zk ∩ Ker(I − AHk ). Thus, with Lemma 2.9, we obtain the inequality

dim Zk − 1 ≤ dim Zk+1 ≤ dim Zk . (2.18)

The following theorems state the basic result.

Theorem 2.11. If dim Zk+1 = dim Zk , then dim Zk+2 = dim Zk+1 − 1.

Theorem 2.12. If dim Zk+1 = dim Zk − 1 and

vkT wk 6= 0, (2.19)

where wk spans Zk ∩ Ker(I − AHk ), then dim Zk+2 = dim Zk+1 .

Before we start to prove both theorems, a few remarks are in order on the
vector wk in Theorem 2.12. Since dim Zk+1 = dim Zk − 1, Lemma 2.9 shows
that a nonzero wk must exist. In fact, it can be shown that wk = λyk−1 for
some scalar λ, with the exception of the case where there exists a nonzero w 0 .
64 Chapter 2. Solving linear systems with Broyden’s method

Proof (of Theorem 2.11). Since dim Zk+1 = dim Zk , the subspaces Zk+1 Zk
are identical by (2.17), and so yk ∈ Zk+1 . By (2.7)-(2.8) and Lemma 2.10, yk
spans
Zk+1 ∩ Ker(I − AHk+1 ).
Applying Lemma 2.9 completes the proof.

The next lemma is needed in the proof of Theorem 2.12.


Lemma 2.13. If there is a nonzero vector wk in Zk ∩ Ker(I − AHk ) and if
vkT wk 6= 0, then Zk+1 ∩ Ker(I − AHk+1 ) = {0}.
Proof. Suppose there is a nonzero vector wk+1 in Zk+1 ∩Ker(I −AHk+1 ). First
we show that wk+1 = λyk for some scalar λ. Then we will prove that yk is not
in Zk+1 .
Assume wk+1 6= λyk for all nonzero scalars λ. Since wk+1 ∈ Zk and from
the equation
(I − AHk+1 ) = (I − AHk )(I − yk vkT ),
we deduce that
(I − yk vkT )wk+1 = αwk ,
for some nonzero scalar α. But then
αvkT wk = vkT wk+1 − vkT yk vkT wk+1 = 0,
contradicting the assumption that vkT wk 6= 0. So wk+1 = λyk for some nonzero
scalar λ.
Now we show that yk is not in Zk+1 . Let dim Zk = d + 1 where d ≥ 1. By
Lemma 2.10, the set of vectors
{gk , AHk gk , . . . , (AHk )d−1 gk , wk }
is a basis for Zk . Assuming that yk ∈ Zk+1 , Lemma 2.9 implies that
³X
d−1 ´
yk = (I − AHk ) βi (AHk )i gk + βd wk
i=0
³X
d−1 ´
= (I − AHk ) βi (AHK )i gk .
i=0

So, if d > 1, then

β0 + (β1 − β0 + 1)AHk gk + . . .
+ (βd−1 − βd−2 )(AHk )d−1 gk − βd−1 (AHk )d gk = 0.
2.2 Two theorems of Gerber and Luk 65

For d = 1 we obtain
β0 gk + (1 − β0 )AHk gk = 0.
Either case is impossible, as the vectors (AHk )i gk , i = 0, . . . , d, are linearly
independent. So yk ∈/ Zk+1 and hence Zk+1 ∩ Ker(I − AHk+1 ) = {0}.
Proof (of Theorem 2.12). The proof follows directly from the Lemmas 2.9 and
2.13.
Theorems 2.11 and 2.12 imply finite termination of the method. Let
dim Z0 = d0 and dim Z1 = d1 . From (2.18), we see that d0 − 1 ≤ d1 ≤ d0 .
Applying the Theorems 2.11 and 2.12, we conclude that Algorithm 2.1 must
terminate in exactly d0 + d1 steps, if (2.19) is satisfied in every iteration where
dim Zk+1 = dim Zk − 1. A weaker statement, though easier to check, is that
Broyden’s method needs at most 2d0 iterations, which is a direct consequence
of 2.11 and (2.18).
Corollary 2.14. Let d0 = dim Z0 then Algorithm 2.1 needs at most 2d0 iter-
ations to converge.
Example 2.15. It turns out that in case of the function g1 in Example 2.5
both the zeroth and the first Krylov space of the Broyden process has dimen-
sion 2 (= d0 + d1 ). This predicts the four iterations the method of Broyden
needs to solve g1 (x) = 0.
In case of the function g2 of Example 2.5, both the zeroth and the first
Krylov space of the Broyden process has dimension 4. The method of Broyden
needs 8 iterations to solve the equation g2 (x) = 0.
In the next example, we show (2.19) is a necessary condition for Theorem
2.12.
Example 2.16. Consider the linear function g(x) = Ax, where
 
−1 1 0
A =  0 −1 1  .
0 0 −1
If we apply the method of Broyden starting with the initial matrix B0 = −I
and initial estimate x0 = (1, 1, 1). Then we obtain the following process. Note
that the inverse of the Broyden also equal minus the identity, H0 = −I.
The function value in x0 equals g(x0 ) = Ax0 = (0, 0, −1). Therefore the
zeroth Krylov space is given by
     
n 0 0 −1 o
Z0 = span{g0 , AH0 g0 , (AH0 )2 g0 , . . .} = span  0  ,  1  ,  2  ,
−1 −1 −1
66 Chapter 2. Solving linear systems with Broyden’s method

and the dimension of Z0 equals three. The kernel of (I − AH0 ) is one-


dimensional and spanned by the vector (−1, 0, 0). So the vector w0 that spans
Z0 ∩ Ker(I − AH0 ) equals w0 = (−1, 0, 0).
We see that the first Broyden step equals s0 = −H0 g0 = (0, 0, −1) and
so the last element of the iterate is nicely removed, x1 = x0 + s0 = (1, 1, 0).
With the yield of the Broyden step y0 = (0, −1, 1) we compute the new inverse
Broyden matrix,
v0 = H0T s0 /(sT0 H0 y0 ) = (0, 0, 1),
and  
−1 0 0
H1 = H0 + (s0 − H0 y0 )v0T =  0 −1 −1 .
0 0 −1
The new Broyden matrix B1 is given by
 
−1 0 0
B1 =  0 −1 1  .
0 0 −1

The function value in x1 equals g(x1 ) = Ax1 = (0, −1, 0). It turn out that
the first Krylov space is two-dimensional and is given by
   
n 0 1 o
Z1 = span{g1 , AH1 g1 , . . .} = span −1 , −1 .
0 0

We thus have that d0 = 3 and d1 = 2. According to Theorem 2.12 the in-


tersection of Z1 and the kernel of (I − AH1 ) must be empty and dim Z2 =
dim Z1 , if v0T w0 6= 0. However, in this example v0T w0 √ = 0. √The kernel of
(I − AH1 ) is spanned by the vectors (−1, 0, 0) and (0, 12 2, 12 2). Therefore
Z1 ∩ Ker(I − AH1 ) is spanned by the nonzero vector w1 = (−1, 0, 0). The
dimension of the second Krylov space, denoted by d2 , equals one. Note that
w0 and w1 are parallel (here chosen to be identical).
The second Broyden step equals s1 = −H1 g1 = (0, −1, 0) and so the second
element of the iterate is nicely removed, x2 = x1 +s1 = (1, 0, 0). Together with
the yield of the Broyden step y1 = (−1, 1, 0), we compute the new inverse
Broyden matrix, v1 = H1T s1 /(sT1 H1 y1 ) = (0, 1, 1), and
 
−1 −1 −1
H2 = H1 + (s1 − H1 y1 )v1T =  0 −1 −1 .
0 0 −1
2.3 Linear transformations 67

The Broyden matrix B2 is given by


 
−1 1 0
B2 =  0 −1 1  .
0 0 −1
The Broyden matrix B2 equals the Jacobian of the linear function. Hence, we
know that the Broyden process terminates in the next iteration. Again the
vector v1 and w1 are orthogonal.
The function value in x2 equals g(x2 ) = Ax2 = (−1, 0, 0) and the second
Krylov space is given by
 
n −1 o
Z2 = span{g2 , . . .} = span  0  .
0
Because H2 is the inverse of the Jacobian, A, the kernel of (I − AH2 ) is the
entire space. The vector w2 is thus given by w2 = (−1, 0, 0), and w2 = w1 =
w0 .
The final Broyden step equals s2 = (−1, 0, 0) and the solution to the
problem is found, that is, x3 = 0. The third Krylov space is given by Z3 =
span{g3 , AH3 g3 , . . .} = {0}.

2.3 Linear transformations


An important observation for our present approach is that Broyden’s method
is invariant under unitary transformations for general systems. We make this
precise in the next lemma.
Lemma 2.17. Let g : Rn → Rn be a general function, and choose x0 ∈ Rn
and B0 ∈ Rn×n . Let U be a unitary matrix. Consider Algorithm 1.19 starting
with x̃0 = U T x0 and Be0 = U T B0 U, applied to the function g̃(z) = U T g(U z),
z ∈ Rn . Then for every k = 0, 1, . . . ,
x̃k = U T xk and Bek = U T Bk U. (2.20)
In particular,
kg̃(x̃k )k = kg(xk )k.
Proof. Statement (2.20) is easily proved using an induction principle. For
k = 0, Equation (2.20) follows from the assumptions. We compute
x̃k+1 = x̃k − B e −1 g̃(x̃k )
k
= U T xk − U T Bk−1 U U T g(U U T xk )
= U T (xk − Bk−1 g(xk )) = U T xk+1 .
68 Chapter 2. Solving linear systems with Broyden’s method

Therefore

g̃k+1 = g̃(x̃k+1 ) = U T g(U U T xk+1 ) = U T g(xk+1 ) = U T gk+1 ,

and
s̃k = x̃k+1 − x̃k = U T xk+1 − U T xk = U T sk .
This leads to
e T
ek + (ỹk − Bk s̃k )s̃k
ek+1 = B
B
s̃Tk s̃k
T
ek + g̃k+1 s̃k
= B
s̃Tk s̃k
U T g(U U T xk+1 )(U T sk )T
= U T Bk U +
(U T sk )T (U T sk )
g(xk+1 )sTk
= U T Bk U + U T T U
sk U U T sk
g(xk+1 )sTk
= U T (Bk + )U = U T Bk+1 U.
sTk sk

So, (2.20) is true for every k = 0, 1, . . . and

kg̃(x̃k )k = kU T g(U U T xk )k = kU T g(xk )k = kg(xk )k.

It might happen that a system is more or less singular. This is unprofitable


for the numerical procedures to solve this system. The question is whether
scaling of the system does change the rate of convergence of Broyden’s method.
For linear systems of equations we have the following result.

Lemma 2.18. Let g : Rn → Rn be an affine function. Suppose, for a certain


choice of x0 and H0 , the dimension of the zeroth Krylov space Z0 (2.16) is equal
to d0 . Let U be a nonsingular matrix, and consider the Broyden process starting
with x̃0 = U −1 x0 and Be0 = U −1 B0 U, applied to the function g̃(z) = U −1 g(U z),
n
z ∈ R . Then the method of Broyden needs at most 2d0 iterations to converge
exactly to the zero of g̃, i.e., g̃(x̃k ) = 0 for some k ≤ 2d0 .

Proof. First note that

g̃0 = g̃(x̃0 ) = U −1 g(U x̃0 ) = U −1 g(U U −1 x0 ) = U −1 g(x0 ) = U −1 g0 .


2.3 Linear transformations 69

If we apply the linear transformation x → U x, the zeroth Krylov space Ze0


eH
built with g̃0 and A e 0 becomes

Ze0 = span{g̃0 , A
eHe 0 g̃0 , (A
eHe 0 )2 g̃0 , . . .}
= span{U −1 g0 , U −1 AU U −1 H0 U U −1 g0 , (U −1 AU U −1 H0 U )2 U −1 g0 , . . .}
= span{U −1 g0 , U −1 AH0 g0 , U −1 (AH0 )2 g0 , . . .} = U −1 Z0

Because U is of full rank, the dimensions of Z0 and Ze0 are equal. Corollary
2.14 completes the proof.
70 Chapter 2. Solving linear systems with Broyden’s method
Chapter 3

Limited memory Broyden


methods

In the previous chapters, we saw that the method of Broyden has several
advantages. In comparison with the method of Newton it does not need ex-
pensive calculation of the Jacobian of the function g. According to a clever
updating scheme of the Broyden matrix, every iteration step includes only
one function evaluation. This makes the method efficient for problems where
the evaluation of g is very time-consuming. Although Broyden’s method fails
to have local q-quadratic convergence it is still q-superlinearly convergent for
nonlinear equations and exact convergent for linear equations. In addition, the
method of Broyden turns out to be quite suitable for problems stemming from
applications, for example, from chemical reaction engineering, see Section 8.3.
A disadvantage of Broyden’s method arises if we consider high-dimensional
systems of nonlinear equations, involving a large amount of memory to store
the n2 elements of the Broyden matrix.
In this chapter, we develop a structure to reduce the number of storage
locations for the Broyden matrix. All methods described in this chapter are
based on the method of Broyden and reduce the amount of memory needed
for the Broyden matrix from n2 storage locations to 2pn storage locations.
Therefore we call these algorithms limited memory Broyden methods. The
parameter p is fixed during the iteration steps of a limited memory Broyden
method.
In Section 3.1, we describe how we can use the structure of the Broyden
update scheme, to write the Broyden matrix B as a sum of the initial Broyden
matrix B0 and an update matrix Q, which is written as the product of two
(n × p)-matrices, Q = CD T . The initial Broyden matrix is set to minus the

71
72 Chapter 3. Limited memory Broyden methods

identity at every simulation (B0 = −I). By applying a reduction to the rank


of Q in subsequent iterations of the Broyden process, the number of elements
to store never exceeds 2pn.
The Broyden Rank Reduction method is introduced in Section 3.2. This
method considers the singular value decomposition of Q, and applies the reduc-
tion by truncating the singular value decomposition up to p−1 singular values.
We prove under which conditions of the pth singular value of the update ma-
trix, the q-superlinear convergence of the method of Broyden is retained. In
addition, we discuss several properties of the Broyden Rank Reduction method
that also gives more insight in the original Broyden process.
To increase the understanding of limited memory Broyden methods we
give in Section 3.3 a generalization of the Broyden Rank Reduction method.
In Section 3.4, we observe a limited memory Broyden method coming from
the work of Byrd et al. [12]. This approach cannot be trapped in the frame-
work of Section 3.3 but due to its natural derivation, we have taken it into
consideration.

3.1 New representations of Broyden’s method


The updates of the ’good’ method of Broyden, Algorithm 1.19, are generated
by
sT ³ g(xk+1 )sTk ´
Bk+1 = Bk + (yk − Bk sk ) Tk = Bk + , (3.1)
sk sk sTk sk
with
sk = xk+1 − xk and yk = g(xk+1 ) − g(xk ).

Equation (3.1) implies that if an initial matrix B0 is updated p times, the


resulting matrix Bp can be written as the sum of the initial matrix B0 and p
rank one matrices, that is,

p−1
X sT
Bp = B 0 + (yk − Bk sk ) Tk = B0 + CDT , (3.2)
k=0
sk sk

where C = [c1 , . . . , cp ], D = [d1 , . . . , dp ] are defined by

ck+1 = (yk − Bk sk )/ksk k, dk+1 = sk /ksk k,

for k = 0, . . . , p − 1.
3.1 New representations of Broyden’s method 73

The sum of all correction terms to the initial Broyden matrix B0 in (3.2),
we call the update matrix. So, if Q denotes the update matrix, then
p
X
Q = CD T = ck dTk . (3.3)
k=1

By choosing B0 to be minus the identity (B0 = −I), the initial Broyden


matrix can be implemented in the code for the algorithm. So, it suffices to
store the (n×p)-matrices C and D. In addition, we take advantage of Equation
(3.3) to compute the product Qz for any vector z ∈ Rn . The following lemma
is clear.

Lemma 3.1. Let Q = CD T , where C and D are arbitrary (n × p)-matrices.


Storing the matrices C and D requires 2pn storage locations. Furthermore the
computation of the matrix vector product Qz = C(D T z), with z ∈ Rn , costs
2pn floating point operations.

In the next iteration step of the Broyden process, 2(p+1)n storage locations
are needed to store the Broyden matrix Bp+1 . In the following iteration step
2(p + 2)n storage locations are needed to store Bp+2 , etc. In case n is even,
after n/2 iterations of Broyden’s method, 2(n/2)n = n2 storage locations are
needed, which equals the number of storage locations we need for the Broyden
matrix itself. In other words, this alternative notation for the Broyden matrix,
given by (3.2), is only useful if p can be kept small (p ¿ n). However, if the
method of Broyden needs more than p iterations to converge, we have to reduce
the number of rank-one matrices that forms the update matrix (3.3). We fix
the maximal number of corrections to be stored at p. After p iterations of the
method of Broyden all columns of the matrices C and D are used. To make it
possible to proceed after these p iterations, the two next examples are obvious.
We remove all corrections made to the initial Broyden matrix and start all over
again, or, we freeze the Broyden matrix and neglect all subsequent corrections.

Example 3.2. Let g be the discrete integral equation function, given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply the original method of Broyden, Algorithm 1.19, where the updates
to the initial Broyden matrix are stored as in (3.3). After p iterations we
remove all stored corrections and restart the Broyden algorithm with initial
estimate x0 = xp . The dimension of the problem is fixed at n = 100, therefore
the initial residual is kg(x0 )k ≈ 0.7570. The rate of convergence is given in
Figure 3.1. It turns out that for p = 10 the same number of iterations are
needed as for the original method of Broyden. For p = 3 and p = 5 a few more
74 Chapter 3. Limited memory Broyden methods

iterations are needed (24 and 26, respectively). However, for p = 2 about 92
iterations are needed.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.1: The convergence rate of Algorithm 1.19 applied to the discrete integral
equation function (A.5) where after p iterations the Broyden process is restarted.
[’◦’(Broyden), ’×’(p = 10), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

Example 3.3. Let g be the discrete integral equation function, given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply the original method of Broyden, Algorithm 1.19, where the updates
to the initial Broyden matrix are stored as in (3.3). After p iterations all future
corrections are neglected. The dimension of the problem if fixed at n = 100,
and the initial residual equals kg(x0 )k ≈ 0.7570. The rate of convergence is
given in Figure 3.2. The method is divergent for every value of p. Note that for
p = 1 and p = 2 the convergence behavior is equal. This can be explained by
Table 3.1. The difference between B1 and B2 is relatively small and therefore
it makes no difference whether we freeze the Broyden matrix after the first or
after the second iteration. Similarly the differences in l2 -norm between B3 , B4
and B5 is of order 10−1 and we see that the convergence behavior is equal for
p = 3, 4 and 5.
An appropriate name for the method used in Example 3.3 would be the
Broyden-Chord method. Unfortunately, the method does not work. The
method of Example 3.2 is more promising. However, it is worth investigating
whether it is possible to save more information about the previous iterations
of the process.
We introduce a more sophisticated approach. If p corrections to the initial
Broyden matrix are stored, the update matrix Q is the sum of p rank-one
3.1 New representations of Broyden’s method 75

k kBk+1 − Bk k k kBk+1 − Bk k

0 2.2125 10 1.893
1 0.052184 11 0.233
2 2.044 12 0.028375
3 0.11172 13 1.8165
4 0.19556 14 0.37294
5 2.014 15 0.0093723
6 0.0085303 16 1.9524
7 1.6974 17 0.10937
8 0.59031 18 0.060311
9 0.014515 19 1.9682

Table 3.1: The difference in l2 -norm between two subsequent Broyden matrices of
Algorithm 1.19 applied to the discrete integral equation function (A.5).

0
10
residual kg(xk )k

−5
10

−10
10
PSfrag replacements

0 2 4 6 8 10 12 14 16 18 20 22
iteration k

Figure 3.2: The convergence rate of Algorithm 1.19 applied to the discrete inte-
gral equation function (A.5), where after p iterations the Broyden matrix is frozen.
[’◦’(Broyden), ’×’(p = 10), ’¦’(p = 8), ’O’(p = 7), ’+’(p = 6), ’M’(p = 5), ’/’(p = 3),
’.’(p = 2), ’✩’(p = 1)]

matrices, and has at most rank p. If we approximate the update matrix by a


matrix of lower rank q (q ≤ p − 1). This approximation, denoted by Q, e can
be decomposed using two (n × q)-matrices, C e and D.
e In this way, memory is
available to store p − q additional updates. Repeating this action every time
that p updates are stored, it is sure that the number of storage locations for
the Broyden matrix never exceeds 2pn.
In this chapter, we derive a number of limited memory Broyden methods
that based on trying to ’reduce’ the update matrix Q, given by (3.3). To gather
76 Chapter 3. Limited memory Broyden methods

all methods in a general update-reduction scheme, we propose the following


conditions.

• The parameter p is predefined and fixed (1 ≤ p ≤ n).

• The Broyden matrix Bk is written as the sum of the initial Broyden


matrix B0 and the update matrix Q.

• The update matrix Q is written as a product of two (n × p)-matrices C


and D, that is Q = CD T , with C = [c1 , . . . , cp ] and D = [d1 , . . . , dp ]. A
rank-one update to the Broyden matrix is stored in a column of C and
the corresponding column of D.

• The current number of stored updates is denoted by m (0 ≤ m ≤ p).


The maximal number of updates to the initial Broyden matrix is thus
given by p.

• The initial Broyden matrix equals minus identity, B0 = −I. We start


the limited memory Broyden process with the matrices C and D equal
to zero (m := 0).

• A new update is stored in column m + 1 of the matrices C and D.


If already p updates are stored (m = p), a reduction is applied to the
update matrix just before the next update is computed. The new number
of updates after the reduction is denoted by q (0 ≤ q ≤ p − 1). No
reduction is performed as long as m < p.

• When applying the reduction, the decomposition CD T of the update


matrix is optionally rewritten by

e T = CZ T D
CDT = C(DZ) e T =: C
eDeT , (3.4)

where the matrix Z ∈ Rp×p is nonsingular. Thereafter the last p − q


columns of the matrices C and D are set to zero (m := q).

After rewriting the matrices C and D, the columns are ordered in such
a way that the last p − q columns of the matrices C and D can be removed
to perform the reduction. So, the first q columns are saved and the reduced
Broyden matrix is given by
p
X q
X
e = Bk −
B cl dTl = B0 + cl dTl . (3.5)
l=q+1 l=1
3.1 New representations of Broyden’s method 77

The new Broyden matrix B̄ after the updating scheme becomes


e + (yk − Bs
B̄ = B e k )sT /(sT sk ). (3.6)
k k

Only if m = p, a reduction to the update matrix is needed just before


storing the new correction. In the other case, if m < p, no reduction is applied
and the correction to the Broyden matrix is simply given by (3.1). This normal
update is stored in column m+1 of the matrices C and D. So, dm+1 = sk /ksk k
and
³ m
X ´
cm+1 := yk − B 0 s k − cl dTl sk /ksk k
l=1
= g(xk+1 )/ksk k. (3.7)

However, directly after a reduction is applied, care should be taken by comput-


ing the next update. Since the Broyden matrix Bk is replaced by a reduced
e (3.7) is no longer valid. Substituting (3.5) into (3.6) gives, with
matrix B,
m = q,
³ Xq ´
cm+1 := yk − B0 sk − cl dTl sk /ksk k (3.8)
l=1
or equivalently
³ p
X ´
cm+1 := g(xk+1 ) + cl dTl sk /ksk k. (3.9)
l=q+1

Note that the first approach, (3.8) has the disadvantage that we have to
store the vector yk . The number q determines which approach is the cheapest
one in floating points operations. Especially if q = p − 1, the second approach,
(3.9), is very attractive. The update is then reduced to

cp := (g(xk+1 ) + cp dTp sk )/ksk k.

In (3.9), the last p − q columns of C and D are still used to compute the new
update before they are set to zero. We proceed with the first approach.
We are now ready to give the algorithm of a general limited memory Broy-
den method.
Algorithm 3.4 (The limited memory Broyden method). Choose an
initial estimate x0 ∈ Rn , set the parameters p and q, and let C = [c1 , . . . , cp ],
D = [d1 , . . . , dp ] ∈ Rn×p be initialized by ci = di = 0 for i = 1, . . . , p (m := 0).
Set k := 0 and repeat the following sequence of steps until kg(xk )k < ε.
78 Chapter 3. Limited memory Broyden methods

i) Solve (B0 + CDT )sk = −g(xk ) for sk ,

ii) xk+1 := xk + sk ,

iii) yk := g(xk+1 ) − g(xk ),


e = CZ T and D
iv) If m = p define C e = DZ −1 for a nonsingular matrix
Z ∈ Rp×p and set ci = di = 0 for i = q + 1, . . . , p (m := q),

v) Perform the Broyden update, i.e.,


P
cm+1 := (yk − B0 sk − m T
l=1 cl dl sk )/ksk k
dm+1 := sk /ksk k,

and set m := m + 1.

It actually turns out that one can avoid solving the large n-dimensional
system Bk sk = −g(xk ), by using the Sherman-Morrison formula (1.68). This
gives

(B0 + CDT )−1 = B0−1 − B0−1 C(I + D T B0−1 C)−1 DT B0−1 . (3.10)

By inspection of (3.10) we see that (I + D T B0−1 C) is a (p × p)-matrix. So,


we only have to solve a linear system in Rp . Due to our choice of the initial
Broyden matrix (B0 = −I), the inverse is trivial, that is B0−1 = −I.
The new update to the Broyden matrix is always made after the reduction
to the update matrix Q. So, the limited memory Broyden method is still a
secant method. Equation (3.6) implies that
e k + (yk − Bs
B̄sk = Bs e k )sT sk /(sT sk ) = yk .
k k

Finally, note that in the first p iteration steps no reduction takes place.
So, during these iterations the limited memory Broyden method is equivalent
to the method of Broyden. Since xp+1 is computed still using the original, not
yet reduced, Broyden matrix Bp . The difference between a limited memory
Broyden method and the method of Broyden itself can be detected only in
iteration step k = p + 2.
We make a first attempt to reduce the number of columns of the matrices
C and D. The simplest thought is to do nothing with the columns of the
matrices C and D (so, Z = I) and if no additional corrections to the Broyden
matrix can be stored, free memory can be created by removing old updates.
We just make a selection of q updates that we would like to keep. The columns
of C and D corresponding to these updates are placed in the first q columns
3.1 New representations of Broyden’s method 79

and hereafter the last p − q columns of both matrices are put to zero. After
the reduction additional updates can be stored for the next p − q iterations of
the Broyden process. We will discuss some of the basic choices for the updates
to save.
One possibility is removing the update matrix Q completely and start all
over again. Thus take q = 0 and remove all columns of C and D. Note,
however, that the Broyden process does not restart with the initial matrix
Be = B0 , because directly after the reduction a new update is stored in the
first columns of C and D. So the algorithm considered in Example 3.5 is indeed
different from the algorithm of Example 3.2. Note also that, in this case, it is
superfluous to rewrite the matrices C and D, because all columns are removed.

Example 3.5. Let g be the discrete integral equation function, given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.4, where q is set to zero. Again the dimension is chosen
to be n = 100, and thus kg(x0 )k = 0.7570. The rate of convergence is given in
Figure 3.3. For p = 2, 4, 5 and 10 more or less the same number of iterations
are needed as for the method of Broyden itself. Only for p = 3 more iterations
are needed to converge and for p = 1 the process does not converge at all.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.3: The convergence rate of Algorithm 3.4 applied to the discrete integral
equation function (A.5) with q = 0. [’◦’(Broyden), ’×’(p = 10), ’M’(p = 5), ’∗’(p = 4),
’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

Another possibility to reduce the update matrix is to remove the first


column of both matrices, C and D, i.e., the oldest update of the Broyden
process. The parameter q is set to q := p − 1. If Z ∈ Rp×p is the permutation
80 Chapter 3. Limited memory Broyden methods

matrix  
0 1
 .. .. 
 . . 
Z=
 ..
,
 (3.11)
 . 1
1 0
step (iv) of Algorithm 3.4, then implies that
£ ¤
e = CZ T = c2 · · ·
C c p c1

and £ ¤
e = DZ −1 = d2 · · ·
D d p d1 .

Example 3.6. Let g be the discrete integral equation function given by (A.5).
As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 . We apply
the Algorithm 3.4, where q is set to p − 1 and Z is given by (3.11). We choose
n = 100 and thus kg(x0 )k ≈ 0.7570. The rate of convergence is given in Figure
3.4. For p = 2 and p = 3 a few more iterations are needed than for the method
of Broyden. For all other values of p the convergence is much slower. For p = 1
we have no convergence, which was already known because this is exactly the
same case as the algorithm used in Example 3.5 with p = 1.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.4: The convergence rate of Algorithm 3.4, applied to the discrete integral
equation function (A.5), with q = p − 1 and Z given by (3.11). [’◦’(Broyden), ’×’(p =
10), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

The next approach of reduction is removing the last column of both ma-
trices, C and D, i.e., the latest update of the Broyden process. So, again
3.1 New representations of Broyden’s method 81

q := p − 1, but now in step (iv) of Algorithm 3.4 the decomposition of the


update matrix is not rewritten (Z = I). Because the columns are removed
after the Broyden step, i.e., after xk+1 is computed, this approach is not equal
to freezing the Broyden matrix. Besides the new update is still computed and
added to the Broyden matrix. Therefore, this method is a secant method, that
is, the new Broyden matrix Bk+1 satisfies the secant equation (1.25).

Example 3.7. Let g be the discrete integral equation function given by (A.5).
As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 . We apply
Algorithm 3.4, where q is set to p − 1 and Z = I. We choose n = 100 and thus
kg(x0 )k ≈ 0.7570. The rate of convergence is given in Figure 3.5. The process
diverges for p = 4 and 5. For p = 2 and 3 the convergence is rather slow. Only
for p = 10 we have fast convergence. Note that we already have discussed the
case p = 1 in the previous two examples.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.5: The convergence rate of Algorithm 3.4 applied to the discrete integral
equation function (A.5), with q = p − 1 and Z = I. [’◦’(Broyden), ’×’(p = 10),
’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

Instead of removing one single update, we can remove the first two columns
of both matrices, C and D, the two oldest updates of the Broyden process.
In Section 1.3, we have seen that after the method of Broyden diverges for
one iteration, the next iteration it makes a large step in the right direction.
Perhaps two updates are in some way related. The parameter q is set to p − 2.
82 Chapter 3. Limited memory Broyden methods

If Z ∈ Rp×p is the permutation matrix


 
0 0 1
 .. .. .. 
 . . . 
 
 .. .. 
Z= . . 1 (3.12)
 
 .. 
1 . 0
1 0

step (iv) of Algorithm 3.4 implies that


£ ¤
e = CZ T = c3 · · ·
C c p c1 c2

and
£ ¤
e = DZ −1 = d3 · · ·
D d p d1 d2 ,
and subsequently the last two columns of C and D are set to zero.

Example 3.8. Let g be the discrete integral equation function given by (A.5).
As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 . We apply
Algorithm 3.4, where q is set to p−2 and Z given by (3.12). We choose n = 100
and thus kg(x0 )k ≈ 0.7570. The rate of convergence is given in Figure 3.6. The
method cannot be applied for p = 1. The rate of convergence is rather fast for
the smaller values of p, for p = 8 and p = 10 the process converges slower.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.6: The convergence rate of Algorithm 3.4 applied to the discrete integral
equation function (A.5), with q = p − 2 and Z is given by (3.12). [’◦’(Broyden),
’×’(p = 10), ’¦’(p = 8), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2)]
3.2 Broyden Rank Reduction method 83

In the final example of this section, we remove the last two columns of the
matrices C and D, when a reduction has to be applied. So, we remove the
two latest update of the Broyden process. The parameter q is set to p − 2, and
again Z = I.
Example 3.9. Let g be the discrete integral equation function given by (A.5).
As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 . We apply
Algorithm 3.4, where q is set to p − 2 and Z = I. We choose n = 100 and
thus kg(x0 )k ≈ 0.7570. The rate of convergence is given in Figure 3.7. Again
the method cannot be applied for p = 1. It is Remarkable that for p = 5 and
p = 10 the rate of convergence is much lower.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.7: The convergence rate of Algorithm 3.4 applied to the discrete integral
equation function (A.5), with q = p − 2 and Z = I. [’◦’(Broyden), ’×’(p = 10),
’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2)]

Of course we could think of more fancy approaches for the selection of


the columns of C and D. Perhaps, it is interesting to remove all odd or all
even columns of the matrices C and D. A more serious approach would be
removing the p − q updates stored in the update matrix that are the smallest
in Frobenius norm. From Table 3.1 we can derive that these updates probably
are the 2nd, 4th and 5th update, etc, for the example used in this chapter.

3.2 Broyden Rank Reduction method


In this section, we arrive to the main work of this thesis. In Algorithm 3.4,
we have represented the Broyden matrix by
Bk = B0 + CDT ,
84 Chapter 3. Limited memory Broyden methods

where C, D ∈ Rn×p . We store the corrections to the initial Broyden matrix in


the columns of the matrices C = [c1 , . . . , cp ] and D = [d1 , . . . , dp ]. The update
matrix, denoted by Q, is defined by
p
X
Q = CD T = cl dTl . (3.13)
l=1
In Section 3.1, we have tried to reduce the rank of Q during the Broyden
process. We saw that for small values of p often has difficulties to converge
and if the process succeeds to converge, the rate of convergence might be low.
In addition, if the limited memory Broyden method converges for a certain
value for p it might diverges for a larger value for p. Clearly, we cannot tell yet
whether and when removing an update destroys the structure of the Broyden
matrix too much.
To introduce a new special limited memory Broyden method, we first recall
some basic properties of singular values. Every real matrix A ∈ Rn×n can be
written as
A = U ΣV T = σ1 u1 v1T + · · · + σn un vnT (3.14)
where U = [u1 , . . . , un ] and V = [v1 , . . . , vn ] are orthogonal matrices and
Σ = diag(σ1 , . . . , σn ). The real nonnegative numbers σ1 ≥ . . . ≥ σn ≥ 0 are
called the singular values of A. Because, for i = 1, . . . , n,
AT Avi = V ΣU T U ΣV T vi = σi2 vi ,
AAT ui = U ΣV T V ΣU T ui = σi2 ui ,
the column vectors of U are the eigenvectors of AAT and are called the left
singular vectors of A, and the column vectors of V are the eigenvectors of A T A
and are called the right singular vectors of A. The rank of a matrix A equals p if
and only if σp is the smallest positive singular value, i.e., σp 6= 0 and σp+1 = 0.
The following basic theorem yields that the best rank-p approximation of a
matrix A is given by the first p terms of the singular value decomposition.
The proof can be found in [27].
Theorem 3.10. Let the singular value decomposition of A ∈ Rn×n be given
by (3.14). If q < r = rank A and
q
X
Aq = σk uk vkT
k=1
then
min kA − Bk = kA − Aq k = σq+1 ,
rank B=q
where k.k denotes the l2 -matrix norm.
3.2 Broyden Rank Reduction method 85

An interpretation of this theorem is that the largest singular values of a


matrix A contain the most important information of the matrix A. The theory
of singular values can be extended to rectangular matrices, see again [27].
This leads us to consider the following reduction procedure for the limited
memory Broyden method. Compute the singular value decomposition of the
update matrix Q. Because the rank of Q is less or equal to p, the singular
value decomposition can be reduced to

Q = σ1 u1 v1T + · · · + σp up vpT .

Next choose q and remove the smallest p − q singular value and their corre-
sponding left and right singular vectors from the singular value decomposition
of Q.
In other words, considering the general Algorithm 3.4, for step (iv) the
singular value decomposition of Q is computed and stored in the matrices C
and D. Then, by setting the last p − q columns of both matrices to zero the
last p − q terms of the singular value decomposition are removed. This leads
to the best rank q approximation of the update matrix Q that is available in
the l2 -norm.
A problem we still have to deal with is that we do not want to compute the
(n × n)-update matrix Q explicitly. So, the question is how we can determine
the singular values of this matrix.
Using the QR-decomposition of D = DR e we observe that Q can be written
as
CDT = C(DR)e T = CRT D eT = C eDeT ,
where D e is orthogonal. Now, using the singular value decomposition of Ce=
T
U ΣW , we see that
eD
C e T = (U ΣW T )D
e T = (U Σ)(DW
e )T = C
bDbT .

Because W and D e are orthogonal matrices, the product D b is orthogonal as


b b T
well. Therefore, C D represents an economic version of the singular value
decomposition of the update matrix. The matrix Z in Algorithm 3.4 is given
by Z = W −1 R.
Note that the singular values of Ce are the square roots of the eigenvalues
of Ce C
T e which is an (p × p)-matrix. In addition, the matrix W consists of
the eigenvectors of Ce T C.
e The right singular vectors of Q are obtained using
these eigenvectors. So the n-dimensional problem of the computation of the
singular values of Q has, in fact, become a p-dimensional problem.
This limited memory Broyden method in which we remove the pth singu-
lar value of the update matrix in every iteration, is called the Broyden Rank
86 Chapter 3. Limited memory Broyden methods

Reduction (BRR) method. In applications it turns out to be a very efficient


algorithm to solve high-dimensional systems of nonlinear equations. The the-
oretical justification of the BRR method is given in Theorem 3.12 where we
show under which conditions the method is q-superlinear convergent. The
general Algorithm 3.4 can be replaced by this new algorithm.

Algorithm 3.11 (The Broyden Rank Reduction method). Choose an


initial estimate x0 ∈ Rn , set the parameters p and q, and let C = [c1 , . . . , cp ],
D = [d1 , . . . , dp ] ∈ Rn×p be initialized by ci = di = 0 for i = 1, . . . , p (m := 0).
Set k := 0 and repeat the following sequence of steps until kg(xk )k < ε.

i) Solve (I − D T C)tk = DT g(xk ) for tk ,

ii) sk := g(xk ) + Ctk ,

iii) xk+1 := xk + sk ,

iv) yk := g(xk+1 ) − g(xk ),


e
v) Compute the QR-decomposition of D = DR,
e
D := D, T
C := CR ,

vi) Compute the SVD of C = U ΣW T , (σ1 ≥ · · · ≥ σp )


C := U Σ, D := DW,

vii) If m = p then set ci = di = 0 for i = q + 1, . . . , p (m := q),

viii) Perform the Broyden update, i.e.,


P
cm+1 := (yk + sk − m T
l=1 cl dl sk )/ksk k
dm+1 := sk /ksk k,

and set m := m + 1.

In Algorithm 3.11, we compute the singular value decomposition of Q in


every iteration, even if we apply no reduction, in order to obtain a better
understanding of the importance of the updates to the Broyden matrix. For
economical reasons it would be better to compute the singular value decom-
position
Note that in step (v) of the first iteration of Algorithm 3.11 the QR-
decomposition is computed of a zero matrix. Since, however, the matrix R is
then set to zero, we can choose any orthogonal matrix Q without disturbing
the procedure.
3.2 Broyden Rank Reduction method 87

To proof the q-superlinear convergence of a limited memory Broyden method


we observe that before a Broyden update is applied, the Broyden matrix is
reduced using a reduction matrix R,
e = B − R.
B

The new updated Broyden matrix B̄ is therefore given by

sT ³ ssT ´
e + (y − B)s
B̄ = B e T /(sT s) = B + (y − Bs) − R I − . (3.15)
sT s sT s
Comparable to the proof of the convergence of Broyden’s method, Theorem
1.26, we estimate the difference between the new Broyden matrix and the
Jacobian of g at x∗ . It follows that
³ ssT ´ sT ³ ssT ´
B̄ − Jg (x∗ ) = (B − Jg (x∗ )) I − T + (y − Jg (x∗ )s) T − R I − T .
s s s s s s
Thus instead of (1.58) we obtain

kB̄ − Jg (x∗ )kF ≤ kB − Jg (x∗ )kF + γ max{kx̄ − x∗ k, kx − x∗ k} + kRk (3.16)

According to Theorem 1.24, a general limited memory Broyden method would


be linearly convergent to x∗ if the norm kRk of the reduction R can be esti-
mated by the length of the Broyden step ksk, because then

kRk ≤ ksk ≤ 2 max{kx̄ − x∗ k, kx − x∗ k}. (3.17)

So, (1.38) is satisfied for all B̄ ∈ Φ(x, B) where Φ : Rn × L(Rn ) → P{L(Rn )}


is defined as Φ(x, B) = {B̄ | kRk ≤ ksk, s 6= 0} and B̄ by (3.15). This leads
to the following theorem.
Theorem 3.12. Let g : Rn → Rn be continuously differentiable in the open,
convex set D ⊂ Rn , and assume that Jg ∈ Lip γ(D). Let x∗ be a zero of g,
for which Jg (x∗ ) is non-singular. Then the update function Φ(x, B) = {B̄ :
kRkF < ksk, s 6= 0}, where

sT ³ ssT ´
B̄ = B + (y − Bs) − R I − ,
sT s sT s
is well defined in a neighborhood N = N1 × N2 of (x∗ , Jg (x∗ )), and the corre-
sponding iteration
xk+1 = xk − Bk−1 g(xk )
with Bk+1 ∈ Φ(xk , Bk ), k ≥ 0, is locally and q-superlinearly convergent at x∗ .
88 Chapter 3. Limited memory Broyden methods

The proof of the q-superlinear convergence of a limited memory Broyden


method is identical to the proof of Theorem 1.26. Due to (3.16) and (3.17),
the inequality in (1.57) holds, if γ is replaced by γ + 2.
We apply Algorithm 3.11 to our test function.

Example 3.13. Let g be the discrete integral equation function given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.11, for different values of p, and q := p−1. So, we remove
only the smallest singular value starting from the pth iteration. In Table 3.2,
the convergence results for the BRR method are given, for different values of
p. It turns out that the method only converges for p ≥ 7, however in those
cases the rate of convergence is exactly the same as the rate of convergence of
Broyden’s method, see Figure 3.8. In Figure 3.9, we consider the ratio between
the removed singular value σp and the size of the Broyden step ksk−1 k in the
kth iteration, k = 0, . . . , k ∗ . It is clear that for every p the quotient σp /ksk−1 k
eventually increases. If this quotient becomes of order one, the BRR method
get difficulties to achieve the fast convergence and the process starts to deviate
from the convergence of the method of Broyden.

method n p kg(x0 )k kg(xk∗ )k k∗ R

Broyden 100 - 0.7570 4.4433 · 10−13 21 1.3411


BRR 100 10 0.7570 4.4433 · 10−13 21 1.3411
BRR 100 8 0.7570 4.4469 · 10−13 21 1.3411
BRR 100 7 0.7570 3.6068 · 10−13 21 1.3511
BRR 100 6 0.7570 1.6256 · 10−3 200 0.0307
BRR 100 5 0.7570 3.4376 · 10−2 200 0.0155
BRR 100 4 0.7570 1.7514 · 10−1 200 0.0073
BRR 100 3 0.7570 1.9912 · 10+22 160 −0.3227
BRR 100 2 0.7570 4.6464 · 10+30 55 −1.2889
BRR 100 1 0.7570 3.2381 · 10−2 200 0.0158

Table 3.2: Characteristics of Algorithm 3.11 applied to the discrete integral equation
function (A.5), with q = p − 1.

We can conclude that the Broyden Rank Reduction method converges as


fast as the method of Broyden as long as the quotient σp /ksk−1 k remains small.
If the quotient grows, we cannot control the convergence process.
Instead of removing the smallest singular value we also could remove other
singular values from the SVD of the update matrix Q. If we want to remove
the largest singular value in every iteration, we have to include an intermediate
3.2 Broyden Rank Reduction method 89

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.8: The convergence rate of Algorithm 3.11 applied to the discrete integral
equation function (A.5), with q = p − 1. [’◦’(Broyden), ’×’(p = 10), ’¦’(p = 8),
’O’(p = 7), ’+’(p = 6), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

10
10
quotient σp /ksk−1 k

0
10

PSfrag replacements −10


10

0 5 10 15 20 25 30 35 40
iteration k

Figure 3.9: The quotient σp /ksk−1 k for Algorithm 3.11, applied to the discrete integral
equation function (A.5), with q = p − 1. [’◦’(Broyden), ’×’(p = 10), ’¦’(p = 8),
’O’(p = 7), ’+’(p = 6), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

step. After computing the singular value decomposition of the update matrix
in step (vi), we additionally permute the columns of the matrices C and D, so
that the first column of both matrices is moved to the last column. In other
words, we apply Algorithm 3.4 where q is set to q = p − 1 and the matrix Z
90 Chapter 3. Limited memory Broyden methods

is equal to  
0 1
 .. .. 
 . .  −1
Z=
 ..
 W R.
 (3.18)
 . 1
1 0

Example 3.14. Let g be the discrete integral equation function given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.4 with q := p − 1 and Z defined by (3.18) So, we
remove only the largest singular value starting from the pth iteration. In
Figure 3.10, we observe that the process diverges shortly after we remove the
largest singular value from the singular value decomposition of the update
matrix.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.10: The convergence rate of Algorithm 3.11 applied to the discrete integral
equation function (A.5), with q = p − 1 and Z given by (3.18). [’◦’(Broyden), ’×’(p =
10), ’O’(p = 7), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’✩’(p = 1)]

The computations
In order to compute the singular value decomposition of the update matrix
Q = CD T , we use two steps. First we make the matrix D orthogonal and
e In these
then we compute the singular value decomposition of the matrix C.
steps two important matrices are involved. We now have a closer view to the
the matrix R of the QR-decomposition of D and the matrix W, containing the
e T C.
eigenvectors of C e
3.2 Broyden Rank Reduction method 91

Note that after p iterations of the BRR process, the matrix D is nearly
orthogonal. The first p − 1 columns denoted by v1 , . . . , vp−1 are the right
singular vectors of the previous update matrix, and form an orthonormal set
in the Rn . Let cdT be the new rank-one update to the Broyden matrix, then
the last column of D contains the vector d. The decomposition of the update
matrix is rewritten by

e T = CRT D
CDT = C(DR) eT = C
eDeT ,

e is the QR-decomposition of D. So, R has the structure


where DR
 
1 r1p
 .. .. 
 . . 
R= 
 1 r1,p−1 
rpp

where the last column (r1p , . . . , rpp ) describes how the new vector d is dis-
tributed over the old ’directions’ of the update matrix. In fact rlp = vlT d,
l = 1, . . . , p − 1, and rpp normalizes the new vector d˜p after the orthog-
onalization. The matrix R is invertible if and only if rpp 6= 0, that is, if
d∈/ span{v1 , . . . , vp−1 }. The inverse matrix is then given by
 
1 −r1p /rpp
 .. .. 
 . . 
R−1 =  ,
 1 −r1,p−1 /rpp 
1/rpp

e = DR−1 , thus
and D
p−1
X rlp
1
d˜p = d− vl ,
rpp rpp
l=1

which is equivalent to the Gramm-Schmidt orthogonalization of d with re-


spect to the orthonormal set {v1 , . . . , vp−1 }. On the other hand, if rpp = 0
then d ∈ span{v1 , . . . , vp−1 } and d˜p can be any vector orthogonal to the set
{v1 , . . . , vp−1 }.
In order to obtain the singular value decomposition of Q, the eigenvectors
e
of C T C e are computed and stored in the (p × p)-matrix W. The right singular
vectors of Q are obtained by multiplying W from the left by D. e So,

eD
C e T = (U ΣW T )D
e T = (U Σ)(DW
e )T = C
bDbT .
92 Chapter 3. Limited memory Broyden methods

After the first p iterations of the BRR process, W has no particular structure
 
w11 · · · w1p
 ..  .
W =  ... ..
. . 
wp1 · · · wpp

Nothing can be said about the entries of the matrix, because a rank-one per-
turbation of a matrix can disturb the singular value decomposition completely.
On the other hand, if C eDe T is already in SVD-format then W = I. The matrix
W tells us how we have to turn the columns of D e to obtain the right singular
vectors of Q. By considering the diagonal of W we can observe whether or
not the update to the Broyden matrix changes the form of the singular value
decomposition.
Tables 3.3 and 3.4 can be explained in the following way. Because ini-
tially the matrices C and D are zero, the singular values decomposition of the
update-matrices does not have to be computed in the first iteration. For k = 2
it is trivial that all but the first element of the first column of R are equal to
zero, since m = 1. For k = 3 the element |r12 | is close to one. This implies
that the first two Broyden steps, s0 and s1 , point in more or less the same
direction. Note that the difference between the Broyden matrices B1 and B2
is small, see Table 3.1. The diagonal of W shows that the update matrix is
in singular value decomposition format, in spite of the addition of a rank-one
matrix. In the fourth iteration (k = 4) a second direction orthogonal to the
first is involved. According to the diagonal of W the two directions have to
be adjusted slightly to obtain the singular value decomposition. In the sixth
iteration (k = 6) the fourth and fifth direction are twisted (|w44 |, |w55 | 6= 1).
Note that the singular values corresponding to these directions are small. Di-
rectly after the introduction of a third direction in iteration 7 the second and
third direction are twisted. So, a direction is found that is more important
than the second direction of the last iteration. In iteration k = 8 the Broyden
step lies mainly in this new direction. Note that the first direction obtained by
Broyden’s method, remains the principal direction in all subsequent iterations.
3.2 Broyden Rank Reduction method 93

£ ¤ £ ¤
k m |r1m | ··· |rpm | |w11 | ··· |wpp |
1 1

2 1
0 0
1 1

3 2
0 0
1 1

4 3
0 0
1 1

5 4
0 0
1 1

6 5
0 0
1 1

7 6
0 0
1 1

8 7
0 0
1 1

9 7
0 0
1 1

10 7
0 0
1 1

11 7
0 0

Table 3.3: The absolute values of the elements of column m of R and the diagonal
of W during the BRR process, Algorithm 3.11 with p = 7 and q = 6, applied to the
discrete integral equation function (A.5) (n = 100).
94 Chapter 3. Limited memory Broyden methods

£ ¤ £ ¤
k m |r1m | ··· |rpm | |w11 | ··· |wpp |
1 1

12 7
0 0
1 1

13 7
0 0
1 1

14 7
0 0
1 1

15 7
0 0
1 1

16 7
0 0
1 1

17 7
0 0
1 1

18 7
0 0
1 1

19 7
0 0
1 1

20 7
0 0
1 1

21 7
0 0

Table 3.4: The absolute values of the elements of column m of R and the diagonal
of W during the BRR process, Algorithm 3.11 with p = 7 and q = 6, applied to the
discrete integral equation function (A.5) (n = 100).
3.2 Broyden Rank Reduction method 95

The Broyden Rank Reduction Inverse method


The reduction process to the update matrix of the Broyden matrix can also be
applied in case of the inverse notation of the method of Broyden. The inverse
Broyden matrix H can also be written as the sum of the initial matrix H0 and
an update matrix Q. Apart from the computation of the Broyden step and the
rank-one update to the Broyden matrix, the algorithm is essentially the same
and has similar convergence properties. The Sherman-Morrison-Woodbury
formula (1.68) shows, however, that Algorithm 3.11 and Algorithm 3.15 are
not identical.

Algorithm 3.15 (The Broyden Rank Reduction Inverse method).


Choose an initial estimate x0 ∈ Rn , set the parameters p and q, and let
C = [c1 , . . . , cp ], D = [d1 , . . . , dp ] ∈ Rn×p be initialized by ci = di = 0 for
i = 1, . . . , p (m := 0). Set k := 0 and repeat the following sequence of steps
until kg(xk )k < ε.

i) sk := g(xk ) − CD T g(xk ),

ii) xk+1 := xk + sk ,

iii) yk := g(xk+1 ) − g(xk ),


e
iv) Compute the QR-decomposition of D = DR,
e
D := D, T
C := CR ,

v) Compute the SVD of C = U ΣW T , (σ1 ≥ · · · ≥ σp )


C := U Σ, D := DW,

vi) If m = p then set ci = di = 0 for i = q + 1, . . . , p (m := q),

vii) Perform the Broyden update, i.e.,


P
α := −sTk yk + m T T
l=1 (cl sk )(dl yk ),
Pm
cm+1 := (sk + yk − l=1 cl dTl yk )/α,
P
dm+1 := −sk + m T
l=1 dl cl sk ,

and

cm+1 := cm+1 · kdm+1 k,


dm+1 := dm+1 /kdm+1 k,

and set m := m + 1.
96 Chapter 3. Limited memory Broyden methods

Example 3.16. Let g be the discrete integral equation function given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.15, for different values of p, and q := p − 1. So, we
remove only the smallest singular value starting from the pth iteration. It
turns out that the method converges as fast as the method of Broyden for
p ≥ 7, see Figure 3.8. For p = 6 just a few more iterations are needed. In
Figure 3.9, we consider the ratio between the removed singular value σ p and
the size of the Broyden step ksk−1 k in the kth iteration, k = 0, . . . , k ∗ . It
is clear that for every p the quotient σp /ksk−1 k eventually increases. If this
quotient becomes of order one, the BRR method get difficulties to achieve the
fast convergence and the process starts to deviate from the convergence of the
method of Broyden. Note that the results are similar to those of Example
3.13.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.11: The convergence rate of Algorithm 3.15, applied to the discrete integral
equation function (A.5), with q = p − 1. [’◦’(Broyden), ’×’(p = 10), ’¦’(p = 8),
’O’(p = 7), ’+’(p = 6), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

3.3 Broyden Base Reduction method


We now develop a generalization of the reduction methods described in the
previous section. For this purpose we repeat some results. The Broyden
matrix after the pth correction can be written

B = B0 + Q,
3.3 Broyden Base Reduction method 97

10
10
quotient σp /ksk−1 k

0
10

PSfrag replacements −10


10

0 5 10 15 20 25 30 35 40
iteration k

Figure 3.12: The quotient σp /ksk−1 k for Algorithm 3.15 applied to the discrete inte-
gral equation function (A.5), with q = p − 1. [’◦’(Broyden), ’×’(p = 10), ’¦’(p = 8),
’O’(p = 7), ’+’(p = 6), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

where the update matrix, denoted by Q, has at most rank p. As we have seen
before Q is the product of two (n × p)-matrices C and D,

Q = CD T .

In order to reduce the rank of Q, we propose the following approach. Let


V be a q-dimensional subspace of the Rn with orthonormal basis {v1 , . . . , vq }
(q ≤ p). The idea is that Q is approximated with a new matrix Q e without
destroying the action of the update matrix on the q-dimensional subspace V,
i.e.,
e V.
Q|V = Q| (3.19)
e has rank less than q, it is set equal to zero on the orthogonal
To assure that Q
complement of V, thus
Q|V ⊥ = 0. (3.20)
The new update matrix Qe can be decomposed in two (n×q)-matrices C e and D,
e
e
where D equals the matrix V = [v1 , . . . , vq ] that consists of the basis vectors.
By (3.19) and the orthogonality of V, it follows that

e = CV
QV = QV e T V = C.
e

Note that we have projected the update matrix on the q-dimensional subspace
V,
e = QV V T .
Q
98 Chapter 3. Limited memory Broyden methods

Besides, the second condition (3.20) is fulfilled, because for u ⊥ V it follows


e = QV V T u = 0. The usefulness of this approach depends on the choice
that Qu
of the subspace V.
Notice that if V = Im D with dim V = q ≤ p and Q e is defined by (3.19),
then
{Ker Q}⊥ = Im QT = Im DC T ⊂ Im D = V.
Thus V ⊥ ⊂ Ker Q and
e V ⊥ = 0,
Q|V ⊥ = Q|
which implies that Q and Q e are equal on the whole Rn . The difference in
decomposition between CD and C T eDe T is that D
e is an orthogonal (n × q)-
matrix and D is not necessarily.
Because the number of columns of C and D is reduced using an orthonor-
mal basis {v1 , . . . , vq } of the subspace V, we call this approach the Broyden
Base Reduction (BBR) method.

Example 3.17. We assume that rank Q ≤ p and take the subspace V spanned
by the right singular vectors {v1 , . . . , vp }, corresponding to the largest p sin-
gular values σ1 , . . . , σp of Q. The set of right singular vectors also forms an
orthonormal basis of V. We define D e := V = [v1 , . . . , vp ] and C
e := QV. The
product CeDe T represents the singular value decomposition of Q, since

eT C
C e = V T QT QV = V T Σ2 V = Σ2 ,

e are orthogonal and


where Σ = diag(σ1 , . . . , σp ) implies that the columns of C
there exists an orthogonal (n × p)-matrix U such that C e = U Σ. By taking
the subspace V = span{v1 , . . . , vq } with q < p the Broyden Rank Reduction
method is obtained.

After p iterations of Broyden’s method, the matrix D is given by


h sp−1
i
s
D = ks00 k · · · ksp−1 k .

To apply a reduction on the columns of D, we take a q-dimensional subspace,


V that contains the vectors {sp−q , . . . , sp−1 }, q < p. To obtain an orthonormal
basis for V we can compute the QL-decomposition of D = V L, where V is an
orthogonal (n × q)-matrix, and L is a lower triangular (p × p)-matrix. We use
the QL-decomposition instead of the usual QR-decomposition because then
for l = 1, . . . , p, we have that

span{sl−1 , . . . , sp−1 } ⊂ span{vl , . . . , vp }.


3.3 Broyden Base Reduction method 99

So, we can take V = span{vp−q+1 , . . . , vp }. We rewrite the decomposition of


Q by
CDT = C(V L)T = CLT V T =: C eDeT . (3.21)
e and D
By removing the first (p − q) columns of C e the update matrix retains
the same action on the last q Broyden steps. Note that this is not the case in
Example 3.6. This reduction is applied whenever the maximum number of p
columns in C and D is reached.

Example 3.18. Let g be the discrete integral equation function given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.4, for q := p − 1 and Z given by
 
0 1
 .. .. 
 . . 
Z=   L, (3.22)
.. 
 . 1
1 0

where L comes from the QL-decomposition of D. In Figure 3.13, we observe


that the rate of convergence is high for p = 8. For smaller values of p the rate
of convergence is rather low, or the process even diverges.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.13: The convergence rate of Algorithm 3.4 applied to the discrete integral
equation function (A.5), with q = p − 1 and Z given by (3.22). [’◦’(Broyden), ’×’(p =
10), ’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

Another choice for V could be the subspace that contains the first p − 1
Broyden steps. Thus V ⊃ span{s0 , . . . , sp−2 }. Note that after the pth iteration,
100 Chapter 3. Limited memory Broyden methods

the subspace V is set to Im V where D = V R. In the subsequent iterations,


the subspace V remains fixed. This implies that after every iteration, the new
correction cdT to the Broyden matrix is subdivided over the p − 1 existing
directions, because the update matrix is rewritten as
eD
CDT = C(V R)T = CRT V DT =: C eT , (3.23)
where V = [v1 , . . . , vp−1 , d˜p ]. The last column d˜p of V is orthogonal to the base
vectors v1 , . . . , vp−1 . After the reduction the first p − 1 columns of the matrix
C have been adapted and the first p − 1 columns of D are still the base vectors
v1 , . . . , vp−1 . Because we store the basis from the pth iteration, we call this
method the Broyden Base Storing (BBS) method.
Example 3.19. Let g be the discrete integral equation function given by
(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.4, for q := p−1 and Z = R, where R comes from the QR-
decomposition of D. In Figure 3.13, we observe that the rate of convergence
is again high for p = 8. For smaller values of p the rate of convergence is very
low, or the process diverges.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.14: The convergence rate of Algorithm 3.4, applied to the discrete integral
equation function (A.5), with q = p − 1 and Z = R. [’◦’(Broyden), ’×’(p = 10),
’M’(p = 5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]

3.4 The approach of Byrd


In 1994, Byrd, Nocedal and Schnabel derived a compact representation of
the matrices generated by Broyden’s update (3.1) for systems of nonlinear
3.4 The approach of Byrd 101

equations. These new compact representation is of interest in its own right, but
also of use in limited memory methods. Therefore we include the derivation
in this chapter.
Let us define the (n × k)-matrices Sk and Yk by
£ ¤ £ ¤
Sk = s0 . . . sk−1 , Yk = y0 . . . yk−1 . (3.24)

We first prove a preliminary lemma on products of projection matrices


yk sTk
Vk = I − , (3.25)
ykT sk
that will be useful in subsequent analysis and is also interesting in its own
right.
Lemma 3.20. The product of a set of k projection matrices of the form (3.25)
satisfies
V0 · · · Vk−1 = I − Yk Rk−1 SkT , (3.26)
where Rk is the (k × k)-matrix
(
sTi−1 yj−1 if i ≤ j,
(Rk )i,j =
0 otherwise.

Proof. Proceeding by induction, we note that (3.26) holds for k = 1, because


in this case the right hand side of (3.26) is given by
1 T
I − y0 s = V0 .
sT0 y0 0
Now, assume that (3.26) holds for some k, and consider k + 1. If we write the
matrix Rk+1 as · ¸
Rk SkT yk
Rk+1 = ,
0 1/ρk
we see that · ¸
−1 Rk−1 −ρk Rk−1 SkT yk
Rk+1 = .
0 ρk
This implies that
· ¸· ¸
−1 T
£ ¤ R−1 −ρk Rk −1S T yk SkT
I− Yk+1 Rk+1 Sk+1 = I − Yk yk k k
0 ρk sTk
= I − Yk Rk−1 SkT + ρk Yk Rk−1 SkT yk sTk − ρk yk sTk
= (I − Yk Rk−1 SkT )(I − ρk yk sTk ).
102 Chapter 3. Limited memory Broyden methods

Together with the induction hypothesis, we obtain

V0 · · · Vk = (I − Yk Rk−1 SkT )(I − ρk yk sTk )


−1 T
= (I − Yk+1 Rk+1 Sk+1 ),

which establishes the product relation (3.26) for all k.

Compact representation of the Broyden matrix


As before, we define
£ ¤ £ ¤
Sk = s0 . . . sk−1 , Yk = y0 . . . yk−1 ,

and we assume that the vectors si are nonzero.

Theorem 3.21. Let B0 be a nonsingular starting matrix, and let Bk be ob-


tained by updating B0 k times using Broyden’s formula (3.1) and the pairs
k−1
{si , yi }i=0 . Then
Bk = B0 + (Yk − B0 Sk )Nk−1 SkT , (3.27)
where Nk is the k × k matrix
(
sTi−1 sj−1 if i ≤ j,
(Nk )i,j = (3.28)
0 otherwise.

Proof. It is easy to show (using induction) that Bk can be written as

Bk = C k + D k , (3.29)

where Ck and Dk are defined recursively by

C0 = B 0 , Ck+1 = Ck (I − ρk sk sTk ) k = 0, 1, 2, . . . ,

and

D0 = 0, Dk+1 = Dk (I − ρk sk sTk ) + ρk yk sTk k = 0, 1, 2, . . . , (3.30)

where
ρk = 1/sTk sk .
Considering first Ck we note that it can be expressed as the product of C0
with a sequence of projection matrices,

Ck = C0 (I − ρ0 s0 sT0 ) · · · (I − ρk−1 sk−1 sTk−1 ). (3.31)


3.4 The approach of Byrd 103

Now we apply Lemma 3.20, with y := s in the definition (3.25), to (3.31) in


order to obtain
Ck = B0 − B0 Sk Nk−1 SkT , (3.32)
for all k = 1, 2, 3, . . . .
Next we show by induction that Dk has the compact representation
Dk = Yk Nk−1 SkT . (3.33)
By the definition (3.30), we have that D1 = y0 ρ0 sT0 , which agrees with (3.33)
for k = 1. Assume now that (3.33) holds for some k. Then by (3.30),
Dk+1 = Yk Nk−1 SkT (I − ρk sk sTk ) + ρk yk sTk
= Yk Nk−1 SkT − ρk Yk Nk−1 sk sTk + ρk yk sTk
· ¸· ¸ · ¸· ¸
£ ¤ N −1 −ρk Nk −1S T sk SkT £ ¤ 0 0 SkT
= Y k yk k k + Y k yk
0 0 sTk 0 ρk sTk
· −1 ¸
Nk −ρk Nk −1SkT sk T
= Yk+1 Sk+1 . (3.34)
0 ρk
Note, however, that
· −1 ¸· ¸
Nk −ρk Nk −1SkT sk Nk SkT sk
= I,
0 ρk 0 1/ρk
−1
which implies that the second matrix on the right hand side of (3.34) is Nk+1 .
By induction this establishes (3.33). Finally, substituting (3.32) and (3.33) in
(3.29), we obtain (3.27).

We now derive a compact representation of the inverse Broyden update


which is given by
sTk Hk
Hk+1 = Hk + (sk − Hk yk ) (3.35)
sTk Hk yk
Theorem 3.22. Let H0 be a nonsingular starting matrix, and let Hk be ob-
tained by updating H0 k times using the inverse Broyden’s formula (3.35) and
k−1
the pairs {si , yi }i=0 . Then
Hk = H0 + (Sk − H0 Yk )(Mk + SkT H0 Yk )−1 SkT H0 , (3.36)
where Sk and Yk are given by (3.24) and Mk is the (k × k)-matrix
(
−sTi−1 sj−1 if i > j,
(Mk )i,j = (3.37)
0 otherwise.
104 Chapter 3. Limited memory Broyden methods

Proof. Let
U = Y k − B0 Sk , V T = Nk−1 SkT ,
so that (3.27) becomes
Bk = B 0 + U V T
Applying the Sherman-Morrison-Woodbury formula (1.68), we obtain

Hk = Bk−1 = B0−1 − B0−1 U (I + V T B0−1 U )−1 V T B0−1


= H0 − H0 (Yk − B0 Sk )(I + Nk−1 SkT H0 (Yk − B0 Sk ))−1 Nk−1 SkT H0
= H0 − (H0 Yk − Sk )(Nk + SkT H0 Yk − SkT Sk )−1 SkT H0 .

By (3.28) and (3.37) we have Nk − SkT Sk = Mk , which gives (3.36).

Note that since we have assumed that all the updates given by (3.35) exist,
we have implicitly assumed the non-singularity of Bk . This non-singularity
along with the Sherman-Morrison formula (1.68) ensures that (Mk + SkT H0 Yk )
is nonsingular.
In applications, we only use the representation (3.36) of the inverse Broy-
den matrix. Because we always start with H0 = −I as initial matrix (3.36) is
reduced to
Hk = −I − (Yk + Sk )(Mk − SkT Yk )−1 SkT . (3.38)
The matrix we want to invert, (Mk −SkT Yk ), can be approximately singular,
because the size of the Broyden step ksk k decreases if the process converges.
In that case, the norm of the first column of M is much larger than the
norm of the last but one column. In the pth iteration, the first column of M
equals (0, −sT1 s0 , . . . , −sTp−1 s0 ) and column p − 1 equals (0, . . . , 0, −sTp−1 sp−2 ).
For the same reason the (p × p)-matrix SpT Yp probably does not have p large
singular values. In addition, the vectors {s0 , . . . , sp−1 } can be more or less
linear dependent. There exists a remarkable way to solve this problem. Instead
of storing the Broyden steps and their yields, we define
h sp−1
i
s
D = ks00 k · · · ksp−1 k = Sp T, (3.39)
h yp−1
i
y
C = ks00 k · · · ksp−1 k = Yp T, (3.40)

where T = diag(1/ks0 k, . . . , 1/ksp−1 k). Note that T is invertible, since sk 6= 0


during the Broyden process. So, we substitute Sp = DT −1 and Yp = CT −1
into (3.38) and arrive at

Hp = −I − (DT −1 + CT −1 )(T −1 (T M T )T −1 − (DT −1 )T CT −1 )−1 (DT −1 )T


= −I − (D + C)((T M T ) − D T C)−1 DT . (3.41)
3.4 The approach of Byrd 105

Note that the product T M T equals the (p × p)-matrix


( sT
(
s
i−1
− ksi−1 j−1
k sj−1 if i > j, −dTi−1 dj−1 if i > j,
(T M T )i,j = =
0 otherwise. 0 otherwise.

Removing updates
Using this notation, the method of Broyden can simply be transformed into a
limited memory method. We apply the condition that at most p updates to
the Broyden matrix can be stored. The Broyden steps and their corresponding
yields are stored according to (3.39) and (3.40) in the matrices C and D. So,
before a new Broyden step in iteration k = p + 1 can be computed, we have
to remove a column of both matrices C and D.

Algorithm 3.23 (The limited memory Broyden method of Byrd).


Choose an initial estimate x0 ∈ Rn , set the parameters p and q, and let
C = [c1 , . . . , cp ], D = [d1 , . . . , dp ] ∈ Rn×p be initialized by ci = di = 0 for
i = 1, . . . , p (m := 0). Set k := 0 and repeat the following sequence of steps
until kg(xk )k < ε.

i) Compute for i = 1, . . . , m and j = 1, . . . , m,


(
−dTi−1 dj−1 if i > j,
Mi,j = (3.42)
0 otherwise.

Pm T
Pm T
ii) Solve (M − l=1 dl cl )tk =− l=1 dl g(xk ) for tk ,
Pm
iii) dm+1 := −g(xk ) + l=1 (cl + dl )tk ,

iv) xk+1 := xk + dm+1 ,

v) cm+1 := g(xk+1 ) − g(xk ),

vi) dm+1 := dm+1 /dm+1 and cm+1 := cm+1 /cm+1 ,

vii) Let m := m + 1,

viii) If m = p then set cl = dl = 0 for l = q + 1, . . . , p (m := q).

Note that we changed the meaning of the matrices C and D compared to


the other limited memory methods. Instead of Bk = B0 + CDT we have in
106 Chapter 3. Limited memory Broyden methods

Algorithm 3.23 the matrix Bk given by


 
dT1
£ ¤ £ ¤  
B k = B 0 + ( c1 · · · c m + d1 · · · dm )N −1  ...  (3.43)
dTm

where the matrix N is given by


(
dTi−1 dj−1 if i ≤ j,
(N )i,j = (3.44)
0 otherwise,

for i = 1, . . . , m and j = 1, . . . , m. Note that the dimensions of N and M


depend on m, and so these dimensions are variable.
Because after the reduction step (vii) the latest (scaled) Broyden step s k
and its yield yk are stored in column q of the matrices C and D, Algorithm
3.23 is still a secant method.

Theorem 3.24. Let Bk+1 be given by (3.43), where C = [c1 , . . . , cm ] and


D = [d1 , . . . , dm ] are both (n × m)-matrices and N is given by (3.44). If
sk /ksk k is stored in column m of D and yk /ksk k is stored in column m of C,
then Bk+1 satisfies the secant equation (1.25).

Proof. Because N is non-singular, v = ksk k · em is the unique solution of the


equation
N v = D T sk
or, equivalently
   T 
dT0 d0 · · · dT0 dm−1 dT0 ksskk k d0 sk
 .. .. ..   .. 
 . . .   . 
  v =  
 T T sk 
dm−1 dm−1 dm−1 ksk k  dT sk  .
  m−1 
sT sk sT
k k
s
ksk k k
ksk k ksk k

Therefore

Bk+1 sk = B0 sk + (C − B0 D)N −1 DT sk
= B0 sk + (C − B0 D)ksk kem
= B 0 sk + y k − B 0 sk = y k .
3.4 The approach of Byrd 107

We observe that in case of p = 1 the update to the Broyden matrix is


directly removed at the end of every iteration step. Therefore, Algorithm 3.23
equals dynamical simulation for p = 1.

Example 3.25. Let g be the discrete integral equation function, given by


(A.5). As initial estimate we choose x0 given by (A.6) and we set ε = 10−12 .
We apply Algorithm 3.23, for different values values of p, the parameter q is
set to p − 1. The rate of convergence is given in Figure 3.15. Clearly, for every
value of p the method needs more iterations than the method of Broyden.
Only for p = 3 and p = 4 the convergence is reasonably fast. Note that for
p = 2 we have again the same result as we have for p = 1 in case of the other
limited memory Broyden methods. For p = 1 the method directly diverges.

0
10
residual kg(xk )k

−5
10

−10
10

PSfrag replacements
−15
10
0 5 10 15 20 25 30 35 40
iteration k

Figure 3.15: The convergence rate of Algorithm 3.23 applied to the discrete integral
equation function (A.5), for different values of p. [’◦’(Broyden), ’×’(p = 10), ’M’(p =
5), ’∗’(p = 4), ’/’(p = 3), ’.’(p = 2), ’✩’(p = 1)]
108 Chapter 3. Limited memory Broyden methods
Part II

Features of limited memory


methods

109
Chapter 4

Features of Broyden’s method

In Part I we provided the theoretical background to the limited memory Broy-


den methods. We discussed the derivation and convergence of the method of
Broyden and indicated the freedom in the algorithm to reduce the amount of
memory to store the Broyden matrix and still preserving the fast convergence.
In this chapter we investigate whether the characteristics of the function g
can tell us whether or not our main algorithm, the Broyden Rank Reduction
method, will succeed to approximate a zero x∗ of g.
We consider the function g as the difference between a period map f :
Rn → Rn and the identity. So,
g(x) = f (x) − x.
A zero x∗ of the function g is a fixed point of the function f. As we pointed
out in Section 1.3, the first step of a limited memory Broyden method is a
dynamical simulation step if the initial Broyden matrix is given by B0 = −I,
that is,
x1 = f (x0 ).
In addition, suppose that g is an affine function, g(x) = Ax + b, where A ∈
Rn×n and b ∈ Rn . We know that
Bk+1 sk = yk = Ask ,
for every k = 0, 1, 2, . . . , and that the Broyden matrix Bk+1 is the sum of B0
and the update matrix CD T . Therefore, the equality
CDT sk = (A + I)sk
holds, where (A + I) is the Jacobian of the period map f. For nonlinear func-
tions g and f, this suggests that the update matrix approximates in some sense

111
112 Chapter 4. Features of Broyden’s method

the Jacobian of f, Jf (x∗ ) = Jg (x∗ ) + I. Also, the parameter p of the limited


memory Broyden method could be chosen rank Jf (x∗ ) + 1. We investigate this
conjecture with an example.
Example 4.1. Let g : Rn → Rn be given by g(x) = ax, with nonzero a ∈ R.
The unique solution of the system g(x) = 0, is x∗ = 0 and the Jacobian of
g is given by Jg (x∗ ) = aI. Let A = aI, then the Jacobian of the period map
f : Rn → Rn is given by Jf (x∗ ) = A + I = (a + 1)I. Let x0 6= 0 be arbitrarily
given, the first Broyden step becomes
s0 = −B0−1 g(x0 ) = ax0 ,
and thus x1 = x0 + ax0 = (a + 1)x0 . Note that in case of a = −1 the Jacobian
of f equals the zero matrix, which has rank zero, and that the exact solution
is found in just one single iteration of the Broyden process. This is clear
since the initial Broyden matrix equals the Jacobian (B0 = A = −I) and the
method of Newton converges in one iteration on linear systems. Now assume
that a 6= −1. Because g(x1 ) = a(a + 1)x0 the new Broyden matrix becomes
g(x1 )sT0 a(a + 1)x0 · (ax0 )T x0 xT0
B1 = B 0 + = −I + = −I + (a + 1) .
sT0 s0 (ax0 )T (ax0 ) xT0 x0
The next Broyden step is given by
s1 = −B1−1 g(x1 )
³ x0 xT ´−1
= − − I + (a + 1) T 0 a(a + 1)x0
x0 x0
³ a + 1 x0 xT0 ´
= − −I + · T a(a + 1)x0
a x0 x0
= a(a + 1)x0 − (a + 1)2 x0 = −(a + 1)x0 ,
and x2 = x1 + s1 = (a + 1)x0 − (a + 1)x0 = 0. This was to be expected as well,
because the zeroth Krylov subspace is given by
Z0 = span {g(x0 ), (AH0 )g(x0 ), (AH0 )2 g(x0 ), . . .}
= span {g(x0 ), ag(x0 ), a2 g(x0 ), . . .} = span {x0 },
and has dimension d0 = dim Z0 = 1, for a 6= 0. According to Corollary 2.14
the method of Broyden converges in less that 2d0 = 2 iterations. Note that,
although A + I = (a + 1)I has full rank, it has singular value (a + 1) with
multiplicity n. Obviously, the method of Broyden uses the information of the
Jacobian Jf (x∗ ) in only one direction, that is, on the subspace spanned by
x0 . The last update matrix before the process converges exactly is given by
CDT = (a + 1)x0 xT0 /(xT0 x0 ).
4.1 Characteristics of the Jacobian 113

4.1 Characteristics of the Jacobian


In a small neighborhood of the solution x∗ , the nonlinear function g can be
considered as approximately linear, depending on the relative nonlinearity γ rel ,
see Section 1.3. Therefore, we compare in this section the convergence proper-
ties of the method of Broyden for several test functions and their linearizations
around x∗ . The initial Broyden matrix is set to minus the identity.
We define the affine function l : Rn → Rn by

l(x) = g(x∗ ) + Jg (x∗ )(x − x∗ ) = Jg (x∗ )x − Jg (x∗ )x∗ ,

and write
l(x) = Ax + b (4.1)
where
A = Jg (x∗ ) and b = −Jg (x∗ )x∗ . (4.2)
In this section we compute both the singular values of the matrix A + I and
of the zeroth Krylov space for the linearized problem, given by

Z0 = span {l(x0 ), (AH0 )l(x0 ), (AH0 )2 l(x0 ), . . .}.

We investigate the connection between d0 = dim Z0 an the choice of p for


the BRR method to solve g(x) = 0. The dimension of a Krylov space can-
not always be determined exactly. The vector (AH0 )j l(x0 ), for example, can
still be linearly independent of the first j vectors in the Krylov sequence,
l(x0 ), . . . , (AH0 )j−1 l(x0 ), but also lie close to the subspace spanned by these
j vectors. Therefore, we define the zeroth Krylov matrix by

K0 := K(l(x0 ), AH0 ), (4.3)

where h i
v Av An−1 v
K(v, A) = kvk kAvk ··· kAn−1 vk .

The rank of K0 equals the dimension of Z0 . However, we can derive the singular
values of K0 , to obtain a more continuous description of the rank of K0 . The
rank of K0 can be approximated by the number of relatively large (for example,
≥ 10−15 ) singular values of K0 .

The discrete integral equation function


We consider the function g : Rn → Rn as given by (A.5), with dimension
n = 20. As expected from Section 1.2 it takes Newton’s method 3 iterations to
114 Chapter 4. Features of Broyden’s method


Figure 4.1: The convergence rate of Algorithm 1.19 and Algorithm 3.11, with q = p−1,
applied to the discrete integral equation function (A.5), together with the quotient
σp/‖sk−1‖. [’◦’(Broyden), ’×’(p = 10), ’⋄’(p = 8), ’▽’(p = 7), ’+’(p = 6), ’△’(p = 5),
’∗’(p = 4), ’◁’(p = 3), ’▷’(p = 2), ’✩’(p = 1)]

converge to a residual of ‖g(xk)‖ < 10−12, starting from the initial condition
x0 given by (A.6). In Figure 4.1 we have plotted the rate of convergence of
the method of Broyden and the BRR method for different values of p.
The method of Broyden needs 21 iterations to obtain the same order of
residual. It turns out that the BRR method also needs 21 iterations for p ≥ 7,
cf. Section 3.2, where we took n = 50. For smaller values of the parameter p
the residual diverges from the path of Broyden’s method once the quotient
σp/‖sk−1‖ has become too large, see again Figure 4.1. Thereafter the residual
‖g(xk)‖ changes very little from iteration to iteration and the process slowly
diverges. So, the quotient σp/‖sk−1‖ decreases because the size of the Broyden
step increases, rather than because the singular value σp gets smaller.
In Figure 4.2, we have plotted the singular values of both Jf (x∗ ) and K0 ,
defined by (4.3). The graph of the singular values of Jf (x∗ ) describes an
exponential decay to 2. Clearly the matrix Jf (x∗ ) has full rank. The singular

values of K0 show a fast linear decay up to the 9th singular value. The
remaining singular values are all of the same order. Note that it is therefore
not evident how to determine the dimension of the zeroth Krylov space.


Figure 4.2: The singular values of Jf (x∗ ) (left) and K0 (right) in case of the discrete
integral equation function (A.5), n = 20.

In Figure 4.3, we have plotted the singular values of the same matrices
Jf (x∗ ) and K0 for a larger dimension (n = 50). We observe that the number


Figure 4.3: The singular values of Jf (x∗ ) (left) and K0 (right) in case of the discrete
integral equation function (A.5), n = 50.

of large singular values of K0 is about the same as in case of n = 20. So, for
the linearized system the method of Broyden would need as many iterations
for n = 20 as it needs for n = 50. This explains the same rate of convergence
for different dimensions of the nonlinear problem, see Example 1.21.

The discrete boundary value function


We consider the function g : Rn → Rn as given by (A.2), with dimension
n = 20. The method of Newton needs again 3 iterations to converge to a

residual of ‖g(xk)‖ < 10−10, starting from the initial condition x0, given by
(A.3). The method of Broyden needs 60 iterations to obtain the same order of
residual, see Figure 4.4. It turns out that the BRR method fails to converge
for every value of p. As can be seen in Figure 4.4, the residual increases directly
after the quotient σp/‖sk−1‖ has become too large.


Figure 4.4: The convergence rate of Algorithm 1.19 and Algorithm 3.11, with q = p−1,
applied to the discrete boundary value function (A.2), together with the quotient
σp/‖sk−1‖. [’◦’(Broyden), ’×’(p = 10), ’△’(p = 5), ’∗’(p = 4), ’◁’(p = 3), ’▷’(p = 2),
’✩’(p = 1)]

The singular values of Jf(x∗) are all distinct and are nicely distributed
over the interval [1, 5], see Figure 4.5. All singular values of K0 are larger than
10−15 and more than 10 singular values are even larger than 10−5. Although
one might consider K0 not to have full rank, it is rather close to being nonsingular.
So, the method of Broyden would need almost all 2n iterations to converge on
the linearized problem.


Figure 4.5: The singular values of Jf (x∗ ) (left) and K0 (right) in case of the discrete
boundary value function (A.2), n = 20.

The extended Rosenbrock function


We consider the function g : Rn → Rn as given by (A.7), with dimension
n = 20. The method of Newton needs 3 iterations to converge to a residual of
‖g(xk)‖ < 10−12, starting from the initial condition x0 given by (A.8). The
method of Broyden needs 18 iterations to obtain the same order of residual,
see Figure 4.6.
For p = 1 and p = 2 the BRR method fails to converge. For larger values
of p, however, the BRR method has a high rate of convergence and is even
faster than the method of Broyden. If we take p larger than 5, the rate of
convergence of the BRR method does not increase further, that is, the BRR
method still needs 11 iterations. Note that for p = 5 the quotient σp/‖sk−1‖
does not exceed 10−15, see Figure 4.6.
The unique solution of the extended Rosenbrock function is the vector
x∗ = (1, . . . , 1). The extended Rosenbrock function is a system of n/2 copies
of the Rosenbrock function, see Example 1.9. So, the Jacobian Jf = Jg + I at
the solution x∗ is a block-diagonal matrix, with blocks given by

    ( −19  10 )
    (  −1   1 ).

Therefore, the Jacobian Jf(x∗) has two distinct singular values, that is, σ1 =
. . . = σn/2 ≈ 21.5134 and σn/2+1 = . . . = σn ≈ 0.4183, see Figure 4.7. Clearly,
only two singular values of the matrix K0 are significant. So, the dimension of
the zeroth Krylov space Z0 is 2 and the method of Broyden would need at most
4 iterations to solve the linearized system. Note that the BRR method can
approximate the zero of the extended Rosenbrock function if we take p = 3.
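The two quoted singular values are easily checked numerically; the following one-off
computation (ours, not from the thesis) takes the singular value decomposition of a
single 2 × 2 diagonal block:

    import numpy as np

    # Singular values of one diagonal block of J_f(x*) for the extended Rosenbrock function
    block = np.array([[-19.0, 10.0],
                      [ -1.0,  1.0]])
    print(np.linalg.svd(block, compute_uv=False))      # approx [21.5134, 0.4183]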


Figure 4.6: The convergence rate of Algorithm 1.19 and Algorithm 3.11, with q = p−1,
applied to the extended Rosenbrock function (A.7), together with the quotient
σp/‖sk−1‖. [’◦’(Broyden), ’△’(p = 5), ’∗’(p = 4), ’◁’(p = 3), ’▷’(p = 2), ’✩’(p = 1)]

The extended Powell singular function

We consider the function g : Rn → Rn as given by (A.9), with dimension
n = 20. It turns out that the method of Newton converges linearly in 23
iterations to a residual of ‖g(xk)‖ < 10−12, starting from the initial condition
given by (A.10). With the same initial condition, the method of Broyden fails
to converge to the zero of the extended Powell singular function, as does the
BRR method, for every value of p.

The unique solution of the extended Powell singular function is the zero
vector, x∗ = (0, . . . , 0). The Jacobian Jf = Jg + I at the solution x∗ is a


Figure 4.7: The singular values of Jf (x∗ ) (left) and K0 (right) in case of the extended
Rosenbrock function (A.7), n = 20.

block-diagonal matrix, with blocks given by


 
    ( 2  10    0     0  )
    ( 0   1   √5   −√5 )
    ( 0   0    1     0  )
    ( 0   0    0     1  ).

The Jacobian Jf is nonsingular and has four different singular values, that is,
σ1 = . . . = σn/4 ≈ 10.2501, σn/4+1 = . . . = σn/2 ≈ 3.3064, σn/2+1 = . . . =
σ3n/4 ≈ 1.0000, and σ3n/4+1 = . . . = σn ≈ 0.0590, see Figure 4.8. Only three


Figure 4.8: The singular values of Jf (x∗ ) (left) and K0 (right) in case of the extended
Powell singular function (A.9), n = 20.

singular values of the matrix K0 are significant. The dimension of the zeroth
Krylov space Z0 is about 3 and the method of Broyden would need at most 6
iterations to solve the linearized system. The Jacobian Jg , however, is singular

at the zero x∗ of the function g and the theory of Sections 1.3 and 3.2 cannot
be applied.

4.2 Solving linear systems with Broyden’s method


As we have seen in Section 4.1, the linearized problem gives more insight into
the success of the method of Broyden on the original nonlinear problem. We
first recall the main results of Chapter 2. In many problems, components of
the function are linear or nearly linear. Therefore it is interesting to consider
the method of Broyden on linear systems.
Theorems 2.11 and 2.12 show that the number of iterations needed by the
method of Broyden to converge exactly on linear problems

Ax + b = 0, (4.4)

can be predicted by the sum of the dimensions of the Krylov spaces Z0 and
Z1 . By Corollary 2.14 we know that the method of Broyden needs at most
2d0 iterations on linear systems, where d0 = dim Z0 . According to Lemma
2.18 Broyden’s method needs at most 2d0 iterations for all linearly translated
systems of (4.4).
Therefore, we consider the method of Broyden applied to linear systems
where A has a Jordan canonical block form. In this section the vector b is
chosen to be the zero vector. As initial Broyden matrix we choose again
B0 = −I and in all examples we choose the initial condition x0 = (1, . . . , 1).
Another conclusion of Chapter 2 is that, although the difference between
the Broyden matrix and the Jacobian does not grow (Lemma 2.6), the Broyden
matrix need not approach the Jacobian, even when the linear system (4.4) is
solved. It has been proved that, under certain conditions, the Broyden matrix
and the Jacobian eventually coincide in only one single direction (Lemma 2.7).
In this section we illustrate the development of the Broyden matrix along
the Broyden process.

One Jordan block


Let us consider the matrix A ∈ Rn×n , given by
 
    A = ( λ   1            )
        (     λ   ⋱        )
        (         ⋱    1   )                              (4.5)
        (              λ   ).

The vector b is set to zero and we choose λ equal to 2. If x0 is given by
x0 = (1, . . . , 1), the dimension of the zeroth Krylov space Z0 equals d0 = n.
It takes Broyden’s method at most 2d0 = 2n iterations to solve (4.4). In
Example 2.5 we have seen that, indeed, for n = 4 the method of Broyden
needs 8 iterations to converge.
We choose the dimension n = 20 and apply the method of Broyden. The
residual ‖g(xk)‖ oscillates around 10 for 39 iterations and then suddenly drops
to 10−12 at the 40th iteration step.
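This experiment is straightforward to reproduce. The sketch below is illustrative
NumPy code, not the thesis implementation (the helper name broyden_linear is
ours); it specializes the Broyden iteration to the linear case under the conventions
above and counts the iterations for the Jordan block:

    import numpy as np

    def broyden_linear(A, b, x0, tol=1e-12, kmax=200):
        """Broyden's method for g(x) = A x + b, with B0 = -I and full steps (sketch only)."""
        B, x = -np.eye(len(x0)), x0.astype(float)
        for k in range(kmax):
            g = A @ x + b
            if np.linalg.norm(g) < tol:
                return x, k
            s = -np.linalg.solve(B, g)         # Broyden step
            x = x + s
            # with a full step, y_k - B_k s_k = g(x_{k+1}), so the update reads
            B += np.outer(A @ x + b, s) / (s @ s)
        return x, kmax

    n = 20
    A = 2.0 * np.eye(n) + np.eye(n, k=1)       # the Jordan block (4.5) with lambda = 2
    x, iters = broyden_linear(A, np.zeros(n), np.ones(n))
    print(iters)                               # should be close to the 40 = 2n iterations above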
We have plotted the Jacobian A in Figure 4.9. The structure of the Jaco-
bian can be clearly distinguished. In the same figure we have also plotted the
initial Broyden matrix (B0 = −I), as well as the Broyden matrix at several
iterations. The matrix B40 is the final matrix before the method of Broyden
solves (4.4).

[Panels: Jacobian, B0, B3, B4; B5, B6, B10, B15; B20, B25, B30, B40]

Figure 4.9: The Jacobian (4.5) of the linear system, the initial Broyden matrix and
the Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 2.

Clearly, Broyden’s method tries to recover the structure of the Jacobian,


starting from its initial matrix. Due to our choice of the Jacobian, the initial
Broyden matrix and the initial estimate x0 , this recovery starts at the bottom
right side of the matrix. Iteration after iteration, the update to the Broyden
matrix involves the next entry of the main diagonal. We see that the Broyden
matrix is also developing a sub-diagonal. After about 25 iterations the upper
left corner of the matrix is reached. Thereafter the elements of the two main
diagonals are adjusted and the off-diagonal elements are pressed to zero.
We have applied Algorithm 3.11 to solve the linear system with the Jacobian
(4.5), for every value of p. The BRR method only converges for p = 20, and
not as fast as the method of Broyden itself: in 60 iterations a residual of
3.2468 · 10−11 is reached. We have seen in Figure 4.9 that the method of
Broyden mainly operates on the main diagonals of the matrix. The other
elements of the Broyden matrix are kept approximately zero. The BRR method,
however, disturbs the structure of the Broyden matrix. That is, where the
elements of the Broyden matrix should be zero, a pattern arises, see Figure
4.10.

[Panels: B30, B40, B50, B60]

Figure 4.10: The Broyden matrix at four different iterations of Algorithm 3.11, with
p = 20 (n = 20). Black corresponds to the value −1 and white to the value 2.

For smaller values of p the BRR method fails to converge. That is, the
convergence behavior of Broyden’s method is followed for about 2p iterations,
but then the process diverges. We now apply Algorithm 3.11 with p = 1. This
implies that there is only one singular value available to update the initial
Broyden matrix B0 in order to approximate the Jacobian (4.5). We have
plotted the Broyden matrix at four iterations of the BRR process, see Figure
4.11. Again the update process starts at the lower right corner of the matrix.
However, instead of creating the upper sub-diagonal, after a few iterations the
diagonal structure is restored in the lower right corner.
We apply Algorithm 3.11 with p = 2. We have plotted the Broyden matrix
at four iterations of the BRR process, see Figure 4.12. The update process

[Panels: B10, B20, B30, B40]

Figure 4.11: The Broyden matrix at four different iterations of Algorithm 3.11, with
p = 1 (n = 20). Black corresponds to the value −1 and white to the value 2.

starts, as expected, in the lower right corner of the matrix. It turns out that the
process again fails to reconstruct the Jacobian. Instead, two spots are created
on the diagonal that destroy the banded structure of the Broyden matrix. As
said before, the process fails to converge.

[Panels: B10, B20, B30, B40]

Figure 4.12: The Broyden matrix at four different iterations of Algorithm 3.11, with
p = 2 (n = 20). Black corresponds to the value −1 and white to the value 2.

Two equal Jordan blocks


We assume that n is even and consider the matrix A ∈ Rn×n, given by

    A = ( A11    0  )                                     (4.6)
        (  0   A22 ),

where both A11, A22 ∈ Rn/2×n/2 are Jordan blocks (4.5) with the same eigen-
value λ = 2. The vector b is the zero vector. If x0 is given by x0 = (1, . . . , 1)
the dimension of the zeroth Krylov space Z0 equals d0 = n/2. It takes Broy-
den’s method at most 2d0 = n iterations to solve (4.4). In Example 2.5 we
have seen that for n = 4 the method of Broyden needs 4 iterations to converge.

We choose the dimension n = 20 and plot the Jacobian A, see Figure 4.13.
The Broyden matrix is plotted at several iterations. The matrix B20 is the
final matrix before the method of Broyden solves the problem.

[Panels: Jacobian, B5, B10, B20]

Figure 4.13: The Jacobian (4.6) of the linear system, the initial Broyden matrix and
the Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 2.

As for the previous example, Broyden’s method tries to recover the struc-
ture of the Jacobian, starting from the initial matrix. The process starts at
the bottom right side of the matrix. Iteration after iteration the update to
the Broyden matrix involves a next entry of the main diagonal. Note that
the Broyden matrix again is developing a sub-diagonal. But, in addition, two
bands arise that connect both (n/2)-dimensional systems.
Here the method of Broyden needs 20 iterations to converge, and ‖g(xk)‖
oscillates before it drops to 10−12. It turns out that the BRR method is as fast
as Broyden’s method for p ≥ 11. For p ≤ 10 the process eventually diverges.

Two different Jordan blocks


We assume that n is even and consider the matrix A ∈ Rn×n, given by

    A = ( A11    0  )                                     (4.7)
        (  0   A22 ),

where both A11, A22 ∈ Rn/2×n/2 are Jordan blocks (4.5), but with different
eigenvalues λ1 = 2 and λ2 = 3. The vector b is the zero vector. If the initial
condition is given by x0 = (1, . . . , 1), the dimension of the zeroth Krylov space
Z0 equals d0 = n. It takes Broyden’s method at most 2d0 = 2n iterations to
solve (4.4).
We choose the dimension n = 20 and plot the Jacobian A, see Figure 4.14.
The Broyden matrix is plotted at several iterations. The matrix B40 is the
final matrix before the method of Broyden solves the problem.

[Panels: Jacobian, B10, B30, B40]

Figure 4.14: The Jacobian (4.7) of the linear system, the initial Broyden matrix and
the Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 3.

Broyden’s method tries to recover the structure of the Jacobian, starting
from the initial matrix. As in the previous example, two bands are developed
that connect the two (n/2)-dimensional systems. However, at the end of
the process these bands are eventually removed again.
For the rest, the same description of the computations is valid as for the
single Jordan block. The method of Broyden needs 40 iterations to converge.
We have to choose p = 20 for the BRR method to converge. For smaller values
of p the BRR method indeed diverges.

4.3 Introducing coupling


In the previous section we have seen that the method of Broyden detects
when a system of equations can be split into several independent systems of
equations, and that it then tries to solve the independent systems simultaneously.
We consider the matrix A ∈ Rn×n, given by

    A = ( λ   δ            )
        (     λ   ⋱        )
        (         ⋱    δ   )                              (4.8)
        (              λ   ).
The vector b is set to zero and we choose λ equal to 2. The parameter δ varies
between zero and one. With x0 given by x0 = (1, . . . , 1), the exact dimension
of the zeroth Krylov space Z0 equals d0 = n whenever δ ≠ 0.
However, it turns out that for small values of δ the method of Broyden
needs less than 2n iterations.
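The dependence on δ can be explored with a small loop. The fragment below, again
illustrative, reuses the broyden_linear helper from the sketch in Section 4.2 and
runs exactly the values of δ examined one by one in the remainder of this section:

    import numpy as np

    # Assumes broyden_linear(A, b, x0) from the sketch in Section 4.2.
    n = 20
    for delta in (1e-4, 1e-3, 1e-2, 0.1, 0.5):
        A = 2.0 * np.eye(n) + delta * np.eye(n, k=1)   # matrix (4.8) with lambda = 2
        x, iters = broyden_linear(A, np.zeros(n), np.ones(n))
        print(delta, iters)                            # counts grow with delta, cf. Figure 4.20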
If δ = 1.0 · 10−4 the method of Broyden needs 8 iterations to converge to a
residual of 9.822·10−16 . In Figure 4.15 we have plotted the Jacobian and several

Broyden matrices of the process. After 8 iterations, only three elements on the
diagonal are ’recovered’, but evidently this is enough for Broyden’s method
to find the solution. The BRR method turns out to be convergent for every
value of p.

[Panels: Jacobian, B4, B6, B8]

Figure 4.15: The Jacobian (4.8) of the linear system, with δ = 1.0 · 10−4 , and the
Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 2.

If δ = 1.0 · 10−3 the method of Broyden needs 10 iterations to converge
to a residual of 1.4698 · 10−14. The Broyden matrix at the 10th iteration has
recovered four elements on the diagonal, see Figure 4.16. This is enough for
the method of Broyden to find the solution. Simulations show that the BRR
method is convergent for every value of p, except for p = 2.

[Panels: Jacobian, B6, B8, B10]

Figure 4.16: The Jacobian (4.8) of the linear system, with δ = 1.0 · 10−3 , and the
Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 2.

If δ = 1.0 · 10−2, then the method of Broyden needs 14 iterations to converge
to a residual of 3.2768 · 10−13. Similarly to the previous cases, the process has
to recover some of the diagonal elements of the Jacobian before it finds the
solution. Here, the final number of recovered elements is 6, see Figure 4.17.
The off-diagonal elements are still small and therefore not distinguishable in
the plots.
Remarkably, the BRR method has exactly the same rate of convergence for

p ≥ 6. For smaller values of p the rate of convergence is low or the process


diverges.

[Panels: Jacobian, B6, B10, B14]

Figure 4.17: The Jacobian (4.8) of the linear system, with δ = 1.0 · 10−2 , and the
Broyden matrix at subsequent iterations (n = 20). Black corresponds to the value
−1 and white to the value 2.

If δ = 0.1 the method of Broyden needs 30 iterations to converge. The
Broyden matrix is recovering the 14th element of the diagonal when the
process converges, see Figure 4.18. Clearly, the off-diagonal elements of the
Jacobian have become important. The BRR method only converges equally
fast for p ≥ 12.

[Panels: Jacobian, B10, B20, B30]

Figure 4.18: The Jacobian (4.8) of the linear system, with δ = 0.1, and the Broyden
matrix at subsequent iterations (n = 20). Black corresponds to the value −1 and
white to the value 2.

If δ = 0.5 the method of Broyden needs 40 iterations to converge. The


situation is comparable to the one described in Section 4.2, where we consid-
ered a Jacobian consisting of one canonical Jordan block. The plots in Figure
4.19 are similar to those of Figure 4.9. The BRR method fails to converge for
p < 20, and even for p = 20 the rate of convergence is lower, i.e., 47 iterations
are needed instead.
For different values of δ we have plotted the rate of convergence of the
method of Broyden when solving g(x) = 0, see Figure 4.20.

[Panels: Jacobian, B20, B30, B40]

Figure 4.19: The Jacobian (4.8) of the linear system, with δ = 0.5, and the Broyden
matrix at subsequent iterations (n = 20). Black corresponds to the value −1 and
white to the value 2.


Figure 4.20: The rate of convergence of Broyden’s method solving (4.4) where A
is given by (4.8) for different values of δ. [’◦’(δ = 1.0 · 10−4), ’×’(δ = 1.0 · 10−3),
’+’(δ = 1.0 · 10−2), ’∗’(δ = 0.1), ’□’(δ = 0.5)]

4.4 Comparison of selected limited memory Broyden methods

In this section we compare the most promising limited memory Broyden meth-
ods, derived in Chapter 3. For every test function of Appendix A and every
linear system discussed in Section 4.2 we applied the methods for p = 1, . . . , 20.
The results are collected in Tables 4.2–4.6. The results for the discrete
boundary value function (A.2) and the extended Powell singular function (A.9)
are not included, because all limited memory Broyden methods fail to converge
for these functions.
In the tables and the description of the results we have used abbreviations
for the limited memory Broyden methods, as listed in Table 4.1.
In all tables the methods are listed vertically and different values of p are
listed in horizontal direction. The initial condition x0 as well as the dimension

UPALL   The Broyden Update Restart method
        (Algorithm 3.4 with q = 0)
UP1     The Broyden Update Reduction method
        (Algorithm 3.4 with q = 1 and Z = I)
BRR     The Broyden Rank Reduction method
        (Algorithm 3.11 with q = p − 1)
BRRI    The Broyden Rank Reduction Inverse method
        (Algorithm 3.15 with q = p − 1)
BRR2    The Broyden Rank Reduction method
        (Algorithm 3.11 with q = p − 2)
BBR     The Broyden Base Reduction method
        (Algorithm 3.4 with q = p − 1 and Z given by (3.22))
BBS     The Broyden Base Storing method
        (Algorithm 3.4 with q = p − 1 and Z = R)
BYRD    The limited memory Broyden method proposed by Byrd et al.
        (Algorithm 3.23 with q = p − 1)

Table 4.1: The abbreviations of several limited memory Broyden methods.

n are uniform for every simulation in a table. For every combination of a
method and a parameter p, the number of iterations of the simulation is given,
together with the variable R that represents the rate of convergence of the
process in reaching a residual of ‖g(xk)‖ < ε. Note that if R is negative, the
final residual ‖g(xk∗)‖ is larger than the initial residual ‖g(x0)‖. If R is large,
then the method has a high rate of convergence. If a process fails to converge,
this is indicated by an asterisk.
In the examples of Chapter 3 we saw that in some cases a process initially
converges, but fails after a few iterations. So, an intermediate residual ‖g(xk)‖
might have been smaller than the final residual. This situation cannot be
distinguished in the tables of this section. We refer for further details to
Chapter 3 and Sections 4.1 and 4.2.

The discrete integral equation function


We consider the discrete integral equation function (A.5) for n = 20 and apply
the limited memory Broyden methods, starting from x0 given by (A.6), until a
residual of ‖g(xk)‖ < 10−12 is reached. The results are given in Table 4.2.
For p ≥ 10 all methods succeed in converging. Some of the methods are even
indistinguishable from the method of Broyden. Especially UPALL gives good
results, because this method converges for all p ≥ 2. For p = 1 all methods
fail to converge. Note that for p = 1 all methods, except for BRR2 and BYRD,
are in fact identical. For BRR there exists a sharp boundary,

method p = 20 p = 19 p = 18 p = 17 p = 16
UPALL 21 1.3412 23 1.3544 21 1.3619 21 1.4441 22 1.2688
UP1 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3327
BRR 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BRRI 21 1.3411 21 1.3411 21 1.3411 21 1.3411 21 1.3411
BRR2 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BBR 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BBS 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BYRD 21 1.3411 21 1.3411 21 1.3411 21 1.3410 21 1.3432

p = 15 p = 14 p = 13 p = 12 p = 11
UPALL 23 1.3103 20 1.3677 23 1.2103 22 1.3442 21 1.3353
UP1 24 1.2372 21 1.3029 27 1.0782 25 1.1281 29 0.9653
BRR 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BRRI 21 1.3411 21 1.3411 21 1.3411 21 1.3411 21 1.3411
BRR2 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BBR 21 1.3412 21 1.3412 21 1.3412 21 1.3412 21 1.3412
BBS 21 1.3412 21 1.3412 21 1.3411 21 1.3411 21 1.3401
BYRD 27 1.0585 32 0.8577 32 0.8585 63 0.4433 111 0.2489

p = 10 p=9 p=8 p=7 p=6


UPALL 22 1.3908 21 1.3658 20 1.4545 23 1.2895 22 1.2677
UP1 29 0.9555 41 0.6953 52 0.5278 32 0.8740 44 0.6512
BRR 21 1.3412 21 1.3412 21 1.3411 21 1.3511 200 0.0307 ∗
BRRI 21 1.3411 21 1.3411 21 1.3410 21 1.3329 23 1.1922
BRR2 21 1.3412 21 1.3411 21 1.3384 23 1.2061 36 0.7662
BBR 21 1.3412 21 1.3410 21 1.3401 200 0.0773∗ 200 0.0655∗
BBS 22 1.3909 22 1.3864 22 1.3733 131 −0.4034∗ 104 −0.6570∗
BYRD 171 0.1650 200 0.1027∗ 90 0.3157 200 0.0640∗ 94 0.2991

p=5 p=4 p=3 p=2 p=1


UPALL 21 1.3469 24 1.2088 33 0.8307 24 1.2468 200 0.0158 ∗
UP1 83 0.3425 30 0.9762 25 1.1068 26 1.0909 200 0.0158 ∗
BRR 200 0.0155∗ 200 0.0073∗ 160 −0.3226∗ 55 −1.2889∗ 200 0.0158∗
BRRI 50 0.5684 53 0.5189 119 0.2407 94 0.2995 200 0.0158∗
BRR2 51 0.5655 84 0.3360 114 0.2410 24 1.2468 – –
BBR 200 0.0605∗ 200 0.0008∗ 168 0.1646 38 0.7934 200 0.0158∗
BBS 64 0.4277 200 0.1195∗ 62 −0.7674∗ 200 0.0778∗ 200 0.0158∗
BYRD 75 0.3734 30 0.9509 38 0.7933 200 0.0158∗ 9 −6.3791∗

Table 4.2: The number of iterations and the rate of convergence for the limited
memory Broyden methods of Table 4.1, applied to the discrete integral equation
function (A.5) (n = 20). [’*’ (no convergence)]

that is, the method converges for p ≥ 7 and fails for p ≤ 6. The same holds
for the methods BBR and BBS: both converge for p ≥ 8. These methods,
however, also converge for some smaller values of p. For p ≤ 12 the method
BYRD needs many iterations to converge, except for p = 3 and p = 4, where

the method does converge rather fast. We can conclude that every method
can be trusted if p is larger than a certain critical value (p = 6 for BRR,
p = 7 for BBR and BBS, etc.). Beneath this critical value a method might
only occasionally converge.

The extended Rosenbrock function

For the extended Rosenbrock function (A.7) with n = 20, we give the results
of the simulations only for p ≤ 10, since for every method starting from x0
given by (A.8) the rate of convergence hardly increases for larger values of p.
The results are listed in Table 4.3.

method p = 10 p=9 p=8 p=7 p=6


UPALL 11 2.9756 13 ∞ 12 2.8867 14 ∞ 16 2.2081
UP1 11 2.9756 11 2.9444 15 ∞ 14 2.4949 25 1.2801
BRR 11 2.9756 11 2.9756 11 2.9756 11 2.9756 11 2.9756
BRRI 11 2.9756 11 2.9756 11 2.9756 11 2.9756 11 2.9756
BRR2 11 2.9756 11 2.9756 11 2.9756 11 2.9756 11 2.9756
BBR 11 2.9756 11 2.9756 11 2.9756 11 2.9756 11 2.9756
BBS 11 2.9756 11 2.9756 11 2.9756 11 2.9756 11 2.9756
BYRD 11 2.9723 14 ∞ 14 ∞ 18 1.8733 19 1.9444

p=5 p=4 p=3 p=2 p=1


UPALL 14 2.4087 38 0.8749 200 −0.1284∗ 200 −0.0374∗ 200 −0.0303∗
UP1 23 1.4538 20 1.6973 66 0.4997 30 1.1411 200 −0.0303 ∗
BRR 13 2.6200 13 2.6875 22 1.5367 62 0.5074 200 −0.0303 ∗
BRRI 13 2.5924 13 2.6017 77 0.4126 33 −1.3923∗ 200 −0.0316∗
BRR2 13 2.6200 13 2.6875 33 0.9461 200 −0.0374∗ – –
BBR 11 2.9756 11 2.9756 11 2.9756 200 −0.0986∗ 200 −0.0303∗
BBS 11 2.9756 11 2.9756 11 2.9756 18 1.7653 200 −0.0303 ∗
BYRD 20 1.6295 32 1.0355 145 0.2341 200 −0.0380∗ 4 −14.9518∗

Table 4.3: The number of iterations and the rate of convergence for the limited
memory Broyden methods of Table 4.1, applied to the extended Rosenbrock function
(A.7) (n = 20). [’*’ (no convergence)]

The ’∞’-sign indicates that, by chance, the exact zero of the extended
Rosenbrock function was found. Again all methods fail to converge for p = 1.
For p = 2 only the methods UP1, BRR and BBS converge. Note that most
methods converge for p = 3. The methods BBR and BBS are for p = 3 even
as fast as for p = 10.

One Jordan block


We consider again the matrix A ∈ Rn×n , given by (4.5), with λ equal to 2 and
n = 20. The vector b is the zero vector. In Table 4.4 we give the results for
the limited memory Broyden methods for 16 ≤ p ≤ 20, starting from x0 given
by x0 = (1, . . . , 1). All methods fail to converge for smaller values of p. Note
that the method of Broyden needs 2n iterations to solve (4.4), see Section 4.2.
method p = 20 p = 19 p = 18 p = 17 p = 16
UPALL 200 −0.0470∗ 200 −0.0491∗ 200 −0.0501∗ 200 −0.0472∗ 200 −0.0482∗
UP1 200 −0.0515∗ 200 −0.0636∗ 200 −0.0516∗ 200 −0.0474∗ 200 −0.0491∗
BRR 65 0.4889 162 −0.2709∗ 152 −0.2872∗ 143 −0.3057∗ 136 −0.3216∗
BRRI 83 0.3662 200 0.0620∗ 200 −0.0495∗ 200 −0.0805∗ 200 −0.1101∗
BRR2 96 0.3259 167 0.1812 136 0.2245 200 0.0361∗ 200 0.1294∗
BBR 194 −0.2331∗ 200 −0.1959∗ 200 −0.1987∗ 153 −0.2883∗ 182 −0.2450∗
BBS 62 0.5514 139 −0.3142∗ 123 −0.3592∗ 114 −0.3892∗ 104 −0.4233∗
BYRD 200 −0.0564∗ 200 −0.0560∗ 200 −0.0549∗ 200 −0.0634∗ 200 −0.0378∗

Table 4.4: The number of iterations and the rate of convergence for the limited
memory Broyden methods of Table 4.1, applied to the linear equation (4.4) where A
is given by (4.5) and b = 0 (n = 20). [’*’ (no convergence)]

Two equal Jordan blocks


We assume that n is even and consider the matrix A ∈ Rn×n , given by (4.6)
where both A11, A22 ∈ Rn/2×n/2 are Jordan blocks (4.5) with the eigenvalue
λ = 2. The vector b is the zero vector. The initial condition x0 is given by
x0 = (1, . . . , 1). Note that it takes Broyden’s method n iterations to solve
(4.4), see Section 4.2. The results of the simulation are given in Table 4.5.
Most of the limited memory Broyden methods fail for p ≤ 10, except for the
methods BRR2 and BBS (and UP1 for p = 2). The methods UPALL and UP1
also fail to converge for 11 ≤ p ≤ 16, and the method BYRD for 11 ≤ p ≤ 18.

Two different Jordan blocks


We take n = 20 and consider the matrix A ∈ Rn×n , given by (4.7) where
both A11, A22 ∈ Rn/2×n/2 are Jordan blocks given by (4.5), with different
eigenvalues λ1 = 2 and λ2 = 3. The vector b is the zero vector and the initial
condition x0 is given by x0 = (1, . . . , 1). Note that it takes Broyden’s method
2n iterations to solve (4.4), see Section 4.2. The results for the limited memory
Broyden methods for 16 ≤ p ≤ 20 are given in Table 4.6. More or less the
same description is valid as for Table 4.4.

method p = 20 p = 19 p = 18 p = 17 p = 16
UPALL 20 1.7351 20 1.7351 20 1.6369 104 0.3384 200 0.0589 ∗
UP1 20 1.7351 20 1.7351 20 1.6805 167 0.1870 200 0.1403 ∗
BRR 20 1.7351 20 1.7351 20 1.7331 20 1.7259 20 1.6852
BRRI 20 1.7685 20 1.7685 20 1.7662 20 1.7521 20 1.7198
BRR2 20 1.7351 20 1.7351 20 1.7326 20 1.7280 20 1.6866
BBR 20 1.7351 20 1.7351 20 1.7465 20 1.7275 20 1.7245
BBS 20 1.7351 20 1.7351 20 1.7387 20 1.7325 20 1.7205
BYRD 20 1.7035 20 1.7036 200 −0.0623∗ 200 0.0924∗ 200 0.0033∗

p = 15 p = 14 p = 13 p = 12 p = 11
UPALL 200 0.0476∗ 200 0.0171∗ 200 0.0251∗ 200 0.0151∗ 200 0.0162∗
UP1 200 0.0132∗ 200 0.0182∗ 200 0.0235∗ 200 0.0079∗ 200 0.0042∗
BRR 20 1.6375 20 1.6571 20 1.6027 20 1.5553 20 1.5288
BRRI 20 1.7052 20 1.6270 20 1.6756 20 1.5609 20 1.5227
BRR2 20 1.6466 20 1.6467 20 1.6305 20 1.5621 73 0.4173
BBR 20 1.7495 20 1.7549 20 1.6908 20 1.6256 20 1.5424
BBS 20 1.7113 20 1.6994 20 1.7035 20 1.7338 20 1.7177
BYRD 200 0.0469∗ 200 0.0598∗ 200 0.0121∗ 200 0.0114∗ 200 0.0029∗

p = 10 p=9 p=8 p=7 p=6


UPALL 200 0.0157∗ 200 0.0432∗ 200 0.0213∗ 200 0.0398∗ 200 0.0515∗
UP1 200 0.0182∗ 200 0.0348∗ 200 0.0359∗ 200 0.0088∗ 200 0.0581∗
BRR 166 −0.2652∗ 130 −0.3374∗ 116 −0.3775∗ 117 −0.3750∗ 105 −0.4178∗
BRRI 200 0.0287∗ 200 −0.0357∗ 200 −0.1002∗ 200 −0.0933∗ 200 −0.2109∗
BRR2 90 0.3381 159 0.1909 128 0.2394 200 0.0913∗ 200 0.0122∗
BBR 200 −0.1304∗ 200 −0.1804∗ 165 −0.2639∗ 200 −0.1673∗ 196 −0.2335∗
BBS 36 1.0697 122 −0.3577∗ 103 −0.4280∗ 100 −0.4388∗ 89 −0.4907∗
BYRD 200 0.0146∗ 200 0.0127∗ 200 0.0086∗ 200 0.0402∗ 200 0.0351∗

p=5 p=4 p=3 p=2 p=1


UPALL 200 0.0465∗ 200 0.0989∗ 200 0.0585∗ 200 0.1395∗ 105 −0.4200∗
UP1 200 0.0641∗ 200 0.0964∗ 200 0.1116∗ 184 0.1727 105 −0.4200∗
BRR 98 −0.4522∗ 93 −0.4750∗ 105 −0.4177∗ 97 −0.4541∗ 105 −0.4200∗
BRRI 196 −0.2225∗ 190 −0.2299∗ 146 −0.3023∗ 148 −0.3154∗ 105 −0.4200∗
BRR2 200 −0.0312∗ 200 0.0069∗ 200 −0.0013∗ 200 0.1395∗ – –
BBR 200 −0.1937∗ 200 −0.1759∗ 200 0.0980∗ 200 0.1205∗ 105 −0.4200∗
BBS 86 −0.5056∗ 99 −0.4398∗ 99 −0.4464∗ 99 −0.4484∗ 105 −0.4200∗
BYRD 200 0.0960∗ 200 0.0747∗ 200 0.1205∗ 105 −0.4200∗ 33 −1.3498∗

Table 4.5: The number of iterations and the rate of convergence for the limited
memory Broyden methods of Table 4.1, applied to the linear equation (4.4) where A
is given by (4.6) and b = 0 (n = 20). [’*’ (no convergence)]

method p = 20 p = 19 p = 18 p = 17 p = 16
UPALL 200 −0.0930∗ 200 −0.0806∗ 200 −0.0813∗ 200 −0.0798∗ 200 −0.0633∗
UP1 200 −0.0630∗ 200 −0.0570∗ 200 −0.0558∗ 200 −0.0465∗ 200 −0.0408∗
BRR 38 0.8408 45 0.7108 147 −0.2954∗ 149 −0.2953∗ 130 −0.3400∗
BRRI 38 0.8642 46 0.6741 200 0.0648∗ 200 −0.1018∗ 200 −0.0847∗
BRR2 38 0.8229 103 0.3301 112 0.2736 200 0.1223∗ 200 0.0566∗
BBR 42 0.7646 168 −0.2635∗ 200 −0.1886∗ 167 −0.2646∗ 164 −0.2708∗
BBS 40 0.8473 41 0.8489 146 0.2196 87 −0.5110∗ 105 −0.4182∗
BYRD 200 −0.0444∗ 200 −0.0678∗ 200 −0.0589∗ 200 −0.0672∗ 200 −0.0502∗

Table 4.6: The number of iterations and the rate of convergence for the limited
memory Broyden methods of Table 4.1, applied to the linear equation (4.4) where A
is given by (4.7) and b = 0 (n = 20). [’*’ (no convergence)]
Chapter 5

Features of the Broyden rank reduction method

Anticipating the simulations of Chapter 8, we investigate the convergence
properties of the Broyden Rank Reduction method for computing fixed points
of the period map f : Rn → Rn of the reverse flow reactor defined by (8.3),
corresponding to the partial differential equations of the one- and two-dimensional
models. The results are described in Section 5.1. In Section 5.2 we consider
the singular values of the update matrix for both models. In Section 5.3 we
show that the BRR method makes it possible to compute the limiting periodic
state of the reverse flow reactor on a finer grid using the same amount of
memory and just a few more iterations. Finally, in Section 5.4 we compare the
convergence properties of the limited memory Broyden methods listed in
Table 4.1.

5.1 The reverse flow reactor


The one-dimensional model
Let f : Rn → Rn be the map of one flow reverse period, see (8.3), that
corresponds to the balance equations (6.23)-(6.25) using the parameter values
of Table 6.2. In addition, we fix the flow reverse period and the dimensionless
cooling capacity (tf = 1200 s and Φ = 0.2). As initial condition we take a
state of the reactor that is at a high constant temperature (T = 2T0) and filled
with inert gas (c = 0). For the finite volume discretization an equidistant
grid is used with N grid points (N = 100). This leads to an n-dimensional
discretized problem, where n = 2N = 200. The system of ordinary differential
equations is integrated over one reverse flow period using the NAG-library


routine D02EJF. To solve the equation g(x) = 0 with g(x) = f (x) − x, the
BRR method is applied for different values of p.


Figure 5.1: The convergence rate of Algorithm 1.19 and Algorithm 3.11, with q = p−1,
applied to the period map of the reverse flow reactor (8.3) using the one-dimensional
model (6.23)-(6.25) with the parameter values of Table 6.2. [’◦’(Broyden), ’×’(p = 20),
’+’(p = 10), ’∗’(p = 5), ’□’(p = 4), ’⋄’(p = 3), ’▽’(p = 2), ’△’(p = 1)]

The information in Figure 5.1 can be interpreted in the following way.
The method of Broyden converges to a residual with ‖g(xk)‖ < 10−10 in 52
iterations. For p = 20, the BRR method approximates the convergence rate
of the method of Broyden using one fifth of the amount of memory. Note that
the residuals of both methods are equal up to the 45th iteration. For p = 10,
the BRR method is even faster than the method of Broyden. So, the number
of iterations needed to converge to ‖g(xk)‖ < 10−10 does not increase
monotonically as p decreases. If we take p = 5 or p = 4 instead of p = 10,
the BRR method needs a few more iterations to converge. However, the
amount of memory used is divided by a factor 2 and 5/2, respectively. For
p = 3, p = 2, and p = 1 the BRR method has a very low rate of convergence.
We see that a large reduction of memory is obtained at the cost of just a few
more iterations.

The two-dimensional model


Let f : Rn → Rn now be the map of one flow reverse period, see (8.3),
corresponding to the balance equations (6.26)-(6.28) using the parameter
values of Table 6.2. We fix the flow reverse period and the dimensionless
cooling capacity (tf = 1200 s and Φ = 0.2). The ratio between the width and

the length of the reactor is set at R/L = 0.0025. As initial condition a state of
the reactor is taken that is at high constant temperature (T = 2T0 ) and filled
with inert gas (c = 0). For the finite volume discretization an equidistant grid
is used with N grid points in the axial direction (N = 100). In the radial
direction a non-uniform grid of M grid points is chosen that becomes finer in
the direction of the wall of the reactor (M = 25). In fact, a segment of the
reactor is divided into M rings of the same volume. The dimension of the
discretized problem is denoted by n (n = 2 · M · N = 5000). The system of
ordinary differential equations is integrated over one reverse flow period using
the NAG-library routine D02NCF.
To solve the equation g(x) = 0, with g(x) = f (x) − x, the BRR method
is applied for different values of p. It turns out that for the two-dimensional
model it is no longer possible to apply the original method of Broyden, due to
memory constraints.


Figure 5.2: The convergence rate of Algorithm 3.11, with q = p − 1, applied to the
period map of the reverse flow reactor (8.3) using the two-dimensional model (6.26)-
(6.28) with the parameter values of Table 6.2. [’×’(p = 20), ’+’(p = 10), ’∗’(p = 5),
’□’(p = 4), ’⋄’(p = 3), ’▽’(p = 2)]

Figure 5.2 shows that the BRR method has a high rate of convergence for
p ≥ 5. For 2 ≤ p ≤ 4 the BRR method does not converge within 60 iterations.
The amount of memory needed to store the Broyden matrix can be reduced
by choosing p = 10 instead of p = 20, using approximately the same number
of iterations.

5.2 Singular value distributions of the update matrices

As we have explained in Section 3.2 the rank of the update matrix increases
during the Broyden process, since in every iteration a rank-one matrix is added
to the update matrix. So, the number of nonzero singular values of the update
matrix increases. In this section we investigate what happens with the singular
values if we remove the pth singular value in every iteration, that is, if we apply
the Broyden Rank Reduction method with parameter p.
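For concreteness, the reduction step can be sketched in a few lines. The fragment
below is an illustration of the idea only (the bookkeeping of Algorithm 3.11 differs):
the update matrix is kept in factored form CD^T, every Broyden update appends the
columns c = g(xk+1) and d = sk/(sk^T sk) to the factors, and the factorization is
truncated back to rank p through a singular value decomposition. The function name
reduce_rank and the random test data are ours.

    import numpy as np

    def reduce_rank(C, D, p):
        """Truncate the update matrix C @ D.T to its p largest singular directions (sketch)."""
        Qc, Rc = np.linalg.qr(C)                # thin QR: C D^T = Qc (Rc Rd^T) Qd^T
        Qd, Rd = np.linalg.qr(D)
        U, sigma, Vt = np.linalg.svd(Rc @ Rd.T)
        C_new = (Qc @ U[:, :p]) * sigma[:p]     # absorb the kept singular values into C
        D_new = Qd @ Vt[:p].T
        return C_new, D_new, sigma              # sigma yields plots like Figures 5.3 and 5.4

    # Check: the truncation is the best rank-p approximation of C D^T, so the
    # spectral-norm error equals the first discarded singular value.
    rng = np.random.default_rng(2)
    C, D = rng.standard_normal((40, 6)), rng.standard_normal((40, 6))
    C5, D5, sigma = reduce_rank(C, D, 5)
    print(np.linalg.norm(C @ D.T - C5 @ D5.T, 2), sigma[5])   # the two numbers agree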

The one-dimensional model


As in the previous section, we first consider the period map of the reverse flow
reactor defined by (8.3) corresponding to the one-dimensional model (6.23)-
(6.25). In Figure 5.3 we have plotted the singular values of the update matrix
during the BRR process, for different values of p.
In the case p = 50, we see that the update matrix has rank one at the beginning
of the second iteration, that is, the matrix has one nonzero singular value.
In every iteration one nonzero singular value is added, the smallest singular
value. This singular value increases during some iterations and thereafter it
reaches a more or less stable value. For example, the largest singular value σ1
jumps in the second iteration from about 10−1 to 1. Subsequently the value
of σ1 is rather stable. Because the parameter p is larger than the number of
iterations done by the BRR process, no singular values are removed. So, we
have considered a situation where the BRR method is equal to the method of
Broyden.
If we choose p = 10, we see that during the first 10 iterations the singular
value distribution is exactly the same as for p = 50. Thereafter the singular
value σ10 starts jumping around. The other nine singular values seem to be
invariant under the reduction procedure.
Decreasing the parameter p to 5, the singular value σ5 starts to oscillate
after the 5th iteration. In addition the singular value σ1 is larger than for
p = 50 and p = 10. The other singular values are still rather stable.

The two-dimensional model


We carry out the same investigation for the period map of the reverse flow reactor
defined by (8.3) corresponding to the two-dimensional model (6.26)-(6.28). In
Figure 5.4 we have plotted the singular values of the update matrix during
the BRR process, for different values of p.


Figure 5.3: The singular value distribution of the update matrix during the BRR
process, with q = p − 1, applied to the period map of the reverse flow reactor (8.3)
corresponding to the one-dimensional model (6.23)-(6.25). [top (p = 50), middle
(p = 10), bottom (p = 5)]

It turns out that we can describe the behavior of the singular values of the
update matrix in the same way as we did for the one-dimensional model. The
only difference is that for p = 5 the singular value σ4 starts to alter instead of
the singular value σ1.


Figure 5.4: The singular value distribution of the update matrix during the BRR
process, with q = p − 1, applied to the period map of the reverse flow reactor (8.3)
corresponding to the two-dimensional model (6.26)-(6.28). [top (p = 50), middle
(p = 10), bottom (p = 5)]

5.3 Computing on a finer grid using same amount of memory

In Section 5.1 we have seen that the BRR method makes it possible to find
symmetric periodic solutions of the RFR using the full two-dimensional model

of the RFR. In addition, we have shown that even when using the one-
dimensional description of the RFR, the BRR method saves memory. It turned
out that, surprisingly, for the two-dimensional model the same values for p can
be used as in case of the one-dimensional model. We now show that it is pos-
sible to use a finer grid with the same amount of memory to store the Broyden
matrix at the expense of just a few more iterations.
For the above simulation of the two-dimensional model a very slim reactor
is used (R/L = 0.0025). As will be discussed in Section 8.3, gradients in the
radial direction are absent in this case, and the two-dimensional model leads
to exactly the same results as the one-dimensional model. If we take a larger
radius for the reactor (R/L = 0.025), then radial temperature gradients are in
fact introduced. To illustrate the benefits of our limiting memory method we
compare two simulations of the model with M = 25 and M = 5 grid points
in the radial direction. So, the dimension of the discretized problem becomes
n = 5000 and n = 1000, respectively.
We have applied the BRR method with different values of p to compute
the periodic state of the reactor, see Table 5.1.

                M = 25                          M = 5
        # iterations  # storage loc.   # iterations  # storage loc.
p = 20        48         200,000             53          40,000
p = 10        50         100,000             55          20,000
p = 5         61          50,000             65          10,000
p = 4         65          40,000             82           8,000
p = 3         80          30,000             76           6,000
p = 2       > 100         20,000             90           4,000
p = 1       > 100         10,000           > 100          2,000

Table 5.1: The number of iterations of Algorithm 3.11, with q = p − 1, applied to


the period map of the reverse flow reactor (8.3) corresponding to the two-dimensional
model (6.26)-(6.28), and the number of storage locations for the Broyden matrix
using a grid with N = 100 grid points in the axial direction and M = 25, respectively
M = 5, grid points in the radial direction.
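The storage counts in Table 5.1 follow directly from the factored form of the update
matrix: the factors C, D ∈ Rn×p require 2pn storage locations, with n = 2·M·N and
N = 100 here. A small illustrative check:

    # Storage for the factored update C, D (each n-by-p) is 2*p*n locations,
    # with n = 2*M*N and N = 100; this reproduces the "# storage loc." columns above.
    for M in (25, 5):
        n = 2 * M * 100
        for p in (20, 10, 5, 4, 3, 2, 1):
            print(f"M = {M:2d}, p = {p:2d}: {2 * p * n:>7,d} storage locations")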

Although a few more iterations are needed than in case of the slim reactor
(R/L = 0.0025), still the same values for p can be used for both the fine and
the coarse grid. Note that for every value of p the rate of convergence for
M = 25 is higher than for M = 5. Suppose, for example, that at most 40, 000
storage locations are available. To accelerate the convergence we want to use
the largest value of p. For the coarse grid the parameter p can be chosen to

be 20 and for the fine grid at most p = 4. This implies that instead of 53
iterations for a coarse grid, 65 iterations are needed for a fine grid to solve
the discretized problem while using the same amount of memory to store the
Broyden matrix.
Although the approximation of the cyclic steady state is qualitatively good
using the coarse grid, Figure 5.5(b), the more accurate approximation using
the fine grid, Figure 5.5(a), is preferable.



(a) Fine grid (M = 25) (b) Coarse grid (M = 5)

Figure 5.5: Temperature distribution over the reactor bed using a coarse and a fine
grid in the radial direction.

5.4 Comparison of selected limited memory Broyden methods

Concluding this chapter, we apply the limited memory Broyden methods listed
in Table 4.1 to compute a fixed point of the period map f : Rn → Rn defined
by (8.3), as we did for several test functions in Chapter 4. The computations
are stopped if a maximal number of 200 iterations is reached or if the process
has converged to a residual of ‖g(xk)‖ < ε, where ε = 10−10 for the one-
dimensional model and ε = 10−8 for the two-dimensional model.

The one-dimensional model


The results of the simulations with the period map of the one-dimensional
model are given in Table 5.2.

It turns out that the methods BRR, BRRI, BRR2, BBR and BBS are
rather fast for p ≥ 5. Note that for p = 5 we need 2pn = 2 · 5 · 200 = 2000
storage locations to store the update matrix, and for p = 50 we need 20,000
storage locations. The method BRR can even be applied for p = 4, using 57
iterations. The method BRRI is still applicable for p = 3, using 69 iterations.
We clearly see that a smaller value of p does not necessarily imply that more
iterations are needed for the limited memory Broyden process. The fact that
for p = 50 not all methods converge in 48 iterations can be explained by
rounding errors introduced in the large computations. For two simulations
the results were not returned by the program, because an evaluation of the
period map failed during the process.

The two-dimensional model


The results of the simulation with the period map of the two-dimensional
model are given in Table 5.3. Note that all methods converge in 47 iterations
for p = 50. For p ≥ 10 the methods BRR, BRRI, BRR2 and BBR need less
than 51 iterations. The convergence properties of the limited memory Broyden
methods in case of the two-dimensional model are comparable to those in case
of the one-dimensional model. Note that the method BRR is applicable for
p = 4 using 64 iterations instead of 47.

method p = 50 p = 40 p = 30 p = 25 p = 20
UPALL 49 0.5186 75 0.3421 66 0.3917 79 0.3178 63 0.4051
UP1 49 0.5186 65 0.3839 92 0.2731 86 0.2923 118 0.2156
BRR 48 0.5202 47 0.5324 53 0.4750 51 0.4998 55 0.4575
BRRI 50 0.5019 56 0.4610 51 0.4979 47 0.5305 50 0.5056
BRR2 48 0.5202 47 0.5324 49 0.5134 60 0.4221 52 0.5067
BBR 49 0.5186 50 0.5069 49 0.5101 50 0.5187 55 0.4543
BBS 49 0.5186 47 0.5340 53 0.4744 55 0.4547 48 0.5264
BYRD 52 0.4831 50 0.4991 62 0.4215 57 0.4429 59 0.4441

p = 15 p = 14 p = 13 p = 12 p = 11
UPALL 82 0.3046 95 0.2654 71 0.3540 74 0.3421 72 0.3455
UP1 120 0.2076 126 0.2013 146 0.1716 119 0.2096 115 0.2216
BRR 56 0.4503 51 0.4889 50 0.5029 49 0.5119 54 0.4670
BRRI 53 0.4815 55 0.4561 48 0.5201 49 0.5193 46 0.5727
BRR2 46 0.5490 48 0.5216 57 0.4471 46 0.5414 47 0.5298
BBR 49 0.5078 48 0.5314 64 0.3968 46 0.5497 44 0.5732
BBS 54 0.4626 52 0.4851 59 0.4232 53 0.4701 51 0.5023
BYRD 76 0.3287 68 0.3807 101 0.2477 65 0.3883 85 0.3010

p = 10 p=9 p=8 p=7 p=6


UPALL 75 0.3342 86 0.2967 102 0.2442 131 0.1945 96 0.2596
UP1 162 0.1544 ... . . .∗ 132 0.1892 145 0.1725 146 0.1705
BRR 52 0.4843 55 0.4523 58 0.4332 50 0.5030 53 0.4715
BRRI 46 0.5511 49 0.5135 49 0.5159 44 0.5753 53 0.4693
BRR2 63 0.4016 49 0.5271 49 0.5281 52 0.4789 55 0.4528
BBR 49 0.5154 47 0.5323 42 0.5960 67 0.3809 47 0.5295
BBS 43 0.5852 58 0.4303 45 0.5667 53 0.4772 59 0.4314
BYRD 71 0.3550 91 0.2747 97 0.2572 133 0.1913 112 0.2266

p=5 p=4 p=3 p=2 p=1


UPALL 116 0.2201 138 0.1854 164 0.1561 200 0.0786∗ 200 0.0757∗
UP1 155 0.1628 155 0.1608 185 0.1388 191 0.1310 200 0.0757 ∗
BRR 54 0.4634 57 0.4432 85 0.2936 105 0.2420 200 0.0757 ∗
BRRI 62 0.4166 63 0.3952 69 0.3668 90 0.2778 200 0.0819 ∗
BRR2 60 0.4211 92 0.2737 158 0.1602 200 0.1129∗ – –
BBR 60 0.4175 ... . . .∗ 154 0.1617 182 0.1382 200 0.0757∗
BBS 71 0.3541 81 0.3098 76 0.3305 200 0.0717∗ 200 0.0757∗
BYRD 171 0.1458 128 0.1971 150 0.1663 188 0.1335 200 0.0059 ∗

Table 5.2: The number of iterations and the rate of convergence for different limited
memory Broyden methods, applied to the period map of the reverse flow reactor (8.3)
according to the one-dimensional model (6.23)-(6.25), n = 200. [’*’ (no convergence),
’. . .’ (no data)]

method p = 50 p = 40 p = 30 p = 25 p = 20
UPALL 47 0.4780 52 0.4259 54 0.4373 66 0.3568 67 0.3269
UP1 47 0.4780 61 0.3734 72 0.3167 98 0.2240 78 0.2830
BRR 47 0.4782 47 0.4782 47 0.4783 47 0.4783 47 0.5126
BRRI 47 0.4782 47 0.4783 47 0.4782 47 0.4783 47 0.5118
BRR2 47 0.4782 47 0.4782 47 0.4781 47 0.4778 47 0.4770
BBR 47 0.4780 47 0.4796 47 0.4912 47 0.4995 47 0.4669
BBS 47 0.4780 47 0.4782 47 0.4819 47 0.4876 50 0.4434
BYRD 47 0.4781 48 0.4567 53 0.4235 59 0.3742 56 0.4087

p = 15 p = 14 p = 13 p = 12 p = 11
UPALL 62 0.3760 72 0.3071 77 0.2880 115 0.1924 84 0.2659
UP1 78 0.2809 93 0.2357 101 0.2180 116 0.1909 115 0.1960
BRR 47 0.4913 47 0.5062 48 0.4657 48 0.4574 46 0.4781
BRRI 47 0.4955 47 0.5027 48 0.4640 48 0.4785 47 0.4690
BRR2 47 0.4796 48 0.4666 48 0.4833 47 0.4668 50 0.4427
BBR 48 0.4652 51 0.4580 47 0.4802 46 0.4833 47 0.4655
BBS 48 0.4648 50 0.4532 49 0.4813 60 0.3727 48 0.4678
BYRD 77 0.2861 61 0.3597 79 0.2813 74 0.3017 ... . . .∗

p = 10 p=9 p=8 p=7 p=6


UPALL 82 0.2677 79 0.2830 ... . . .∗ 103 0.2173 ... . . .∗
UP1 137 0.1629 171 0.1289 94 0.2404 110 0.1994 152 0.1539
BRR 49 0.4513 52 0.4512 55 0.4002 52 0.4238 50 0.4488
BRRI 48 0.4622 50 0.4621 51 0.4293 49 0.4593 59 0.3770
BRR2 50 0.4455 53 0.4198 59 0.3747 ... . . .∗ 53 0.4143
BBR 49 0.4643 ... . . .∗ 55 0.4104 55 0.4426 55 0.4082
BBS 52 0.4562 55 0.3983 55 0.4045 59 0.3908 74 0.2961
BYRD 88 0.2520 82 0.2723 79 0.2846 78 0.2828 92 0.2480

p=5 p=4 p=3 p=2 p=1


UPALL 81 0.2780 142 0.1557 96 0.2577 200 0.0560∗ ... . . .∗
UP1 127 0.1725 130 0.1695 ... . . .∗ 92 0.2453 ... . . .∗
BRR 60 0.3996 64 0.3441 79 0.2778 107 0.2112 ... . . .∗
BRRI 85 0.2610 59 0.3713 91 0.2405 110 0.2028 ... . . .∗
BRR2 61 0.3732 78 0.2804 79 0.2770 200 0.0891∗ – –
BBR 56 0.3924 80 0.2749 92 0.2388 ... . . .∗ ... . . .∗
BBS 57 0.3997 73 0.3027 102 0.2331 200 0.0807∗ ... . . .∗
BYRD 123 0.1785 ... . . .∗ ... . . .∗ ... . . .∗ 200 0.0014∗

Table 5.3: The number of iterations and the rate of convergence for different limited
memory Broyden methods, computing a fixed point of the period map (8.3) according
to the two-dimensional model (6.26)-(6.28), n = 5000. [’*’ (no convergence), ’. . .’ (no
data)]
Part III

Limited memory methods applied to periodically forced processes

Chapter 6

Periodic processes in packed bed reactors

In this chapter, we give a short introduction to chemical reactor engineering.
In Section 6.1, we discuss the most common cyclic processes in packed
bed reactors and explain their advantages. The balance equations for a general
packed bed reactor are derived in Section 6.2.

6.1 The advantages of periodic processes


Periodic processes in packed bed reactors mainly arise from periodically vary-
ing the feeding conditions, that is, the temperature, pressure and direction of
the feed streams.

Pressure and thermal swing adsorption


In pressure swing adsorption (PSA) processes, gas mixtures are separated by
selective adsorption over a bed of sorbent materials. If the adsorbent is satu-
rated, that is, it cannot adsorb any more adsorbate, it has to be regenerated.
Therefore, the adsorbent must bind components reversibly, so that it does
not have to be replaced every time it is saturated, but can be cleaned in the
reactor itself. The periodic nature of the PSA arises from the high pressure
adsorption phase and the subsequent low pressure regeneration phase.
During adsorption one component is selectively adsorbed, such that at the
product end of the reactor the gas stream does not contain this component. In
a packed bed a front is therefore formed that slowly migrates in the direction of
the product end. From the feed point up to the adsorption front, the feed gas


mixture is in equilibrium with a saturated sorbent, while further downstream,


the gas phase contains non-adsorbing components only and the sorbent is not
saturated. During this step the pressure is maintained at a high level.
Before the adsorbent in the reactor is completely saturated, the product
end of the reactor is closed and the pressure is released at the feed end of the
reactor. This second step is called the blowdown step.
When the pressure has dropped to a sufficiently low level, it is maintained at
this level and ’clean’ carrier gas is led into the reactor at the product end such
that the adsorbent in the reactor is purged, that is, the adsorbed component
is removed from the sorbent during this regeneration step.
When the adsorbent has lost enough of its loading, the product end of
the reactor is again closed and the pressure is raised to the former high level.
After this pressurization the process returns to the first step.
De Montgareuil and Domine and independently Skarstrom are generally
considered to be the inventors of the PSA. The Skarstrom PSA cycle was
immediately accepted for commercial use in air drying. Pressure swing ad-
sorption is widely used for bulk separation and purification of gases. Major
applications include, for example, moisture removal from air and natural gas,
separation of normal and iso-alkanes, and hydrogen recovery and purification. A
pressure swing adsorber designed to separate water from air has been studied
by e.g. Kvamsdal and Hertzberg [39].
Thermal swing adsorption (TSA) processes are similar to pressure swing
adsorption processes and are also intended to separate gas mixtures. But here
the cyclic nature arises from the low temperature adsorption phase and the
subsequent high temperature regeneration phase. Studies of thermal swing
adsorbers can be found in work by e.g. Davis and Levan [14]. Combinations
of PSA and TSA processes also exist.

Pressure swing reactor


The principle of Pressure Swing Reactors (PSR), sometimes also referred to
as Sorption Enhanced Reaction Processes (SERP), is based upon physically
admixing a sorbent and a catalyst in one vessel in order to achieve a separation
concurrent with a reaction. Sorption and catalysis may even be integrated in
a single material. The sorption enhanced reaction process has been demon-
strated primarily in achieving supra equilibrium levels in equilibrium limited
reactions. The adsorption is typically used to purify one of the reaction prod-
ucts. The cyclic nature of a pressure swing reactor arises from the same high
pressure adsorption and low pressure regeneration phases as in the pressure
swing adsorber. The pressure swing reactor is a relatively new process and

has been studied by e.g. Hufton et al. [29], Carvill et al. [13] and Kodde and
Bliek [36].
The PSR potentially offers the following advantages:
• Increased conversion of reactants,
• Improved selectivities and yields of desired products,
• Reduced requirements for external supply or cooling capacity,
• Reduced capital expenditure by process intensification,
• More favorable reaction conditions might be possible, resulting in longer
lifetime of equipment and less catalyst deactivation.
A well known application of the pressure swing reactor is the removal of
CO from syngas, combining low-temperature shift catalysis and selective CO2
removal by adsorption. Production of high purity hydrogen from syngas, as
required for instance for fuel cell applications, normally uses a multi-step pro-
cess, involving both a water gas shift and a selective oxidation process. In the
latter step a part of the produced hydrogen is inevitably lost. This disadvan-
tage can be avoided in a reactive separation using PSR. By a combination of
low temperature shift catalysis and selective adsorption of carbon dioxide in
one vessel, the removal of CO as a result of the shift reaction rather than by
selective oxidation might become feasible.
The shift reaction is given by

$$\mathrm{H_2O + CO \;\rightleftharpoons\; H_2 + CO_2}.$$
When adsorbing the CO2 the equilibrium of the above reaction shifts to the
right. This implies that more H2 is produced and more CO is removed. Being
a member of the family of adsorptive reactors, the PSR is limited to compara-
tively low temperature applications in order to maintain sufficient adsorption
capacity for the sorbent.

The reverse flow reactor


The simplest example of a periodic process might be the reverse flow reactor
(RFR), a packed bed reactor in which the flow direction is periodically re-
versed in order to trap a hot reaction zone within the reactor. In this way
even systems with a small adiabatic temperature rise can be operated with-
out preheating the feed stream. The reverse flow reactor concept was first
proposed and patented by Cottrell in 1938 for the removal of pollutants. We
describe the RFR in more detail in Section 8.1.

6.2 The model equations of a cooled packed bed reactor
We consider a tubular reactor filled with small catalyst particles in which gas flows in the axial direction. The gas contains a bulk part of an inert gas with a trace of a reactant A that, on contact with the catalyst, reacts to a product B. We deal with exothermic reactions only. To avoid overheating (melting and burning) of the catalyst particles, the reactor is cooled using a cooling jacket around the reactor. Turbulence of the gas around the particles causes a nearly constant velocity over a cross section of the reactor. The reactor we have described here is called a cooled packed bed reactor.
In this section a mathematical model is derived that describes the essen-
tials of the reactor unit. The dimension of the model denotes the number of
spatial directions in the model. Time is considered as an additional dimension.
Therefore, the one-dimensional model consists of the axial dimension. For the
two-dimensional model also the radial direction is taken into account. If we
distinguish between the solid phase (the catalyst or the adsorbent) and the gas phase, we obtain a heterogeneous model. We consider a pseudo-homogeneous model: we neglect the difference in temperature between the solid particles and the gas phase, and assume that the species exist only in the gas phase. The model is
based on the conservation of mass and energy, which is described by balance
equations. In order to be able to formulate the model several assumptions
have to be made.
The mass transport mechanisms we take into account are convective mass
transport, turbulence around the catalyst particles and bulk diffusion. The
latter two are lumped together as dispersion in axial (and radial) direction.
Heat transfer is the result of the following mechanisms.

• Mechanisms independent of flow:

– Thermal conduction through the solid particle,


– Thermal conduction through the contact point of two particles,
– Radiant heat transfer between the surfaces of two adjacent pellets.

• Mechanisms depending on the fluid flow:

– Thermal conduction through the fluid film near the contact surface
of two pellets,
– Heat transfer by convection,
– Heat conduction within the fluid,

– Heat transfer by lateral mixing.

The contribution of radiation to the total heat flow turns out to be important at temperatures above 400◦C. Below this temperature the various
mechanisms of heat transport, except for the heat transport by convection,
are usually described by a lumped parameter, the effective thermal conduc-
tivity.
The transport resistance between the gas phase and the catalyst is negli-
gible, as is the multiplicity of the catalyst particles, that is, the difference in
activity. Therefore the effectivity, denoted by η, is equal to one. We assume
that the gas phase satisfies the ideal gas law, and flows through the vessel at a
constant velocity. The velocity over a cross section of the reactor is assumed
constant, due to a high rate of turbulence. We assume that the pressure drop over the unit, caused by the flow along the catalyst bed, is negligible.
The equipment both upstream and downstream of the reactor has no influence on the behavior of the flow inside the vessel. Furthermore we assume that
dispersion of energy and mass, caused by diffusion and turbulence around the
catalyst particles, can only occur inside the reactor and not in the channels
leading to it. In addition, the reaction only occurs inside the reactor. There-
fore we can apply Danckwerts boundary conditions, see [63]. The temperature
and composition of the feed streams and the mass flow are constant in time.
The thermal equilibrium between the gas and the catalyst occurs instan-
taneously. Hence, intra-particle gradients in temperature or concentration are
assumed to be negligible. We assume that all the physical properties are con-
stant in the range of temperature and concentration that occurs in the reactor.
The dispersion coefficient is assumed to be constant and equal for every com-
ponent. The reaction is exothermic and the heat of reaction is independent
of the temperature. The reaction does not change the number of moles in the
gas phase, thus one mole of species A gives one mole of species B.
In order to model the cooling we assume that the reactor wall is cooled at
a more or less constant temperature, caused by a high flow rate or a large
density of the cooling flow. Inside the reactor the cooling occurs only via the
gas phase due to the negligible contact area between the catalyst particles and
the reactor wall.
In Table 6.1 we have summarized the assumptions made for both the one- and the two-dimensional model of a packed bed reactor. The additional con-
dition for the one-dimensional model is that concentration and temperature
are constant over a cross section of the reactor.

• The gas phase satisfies the ideal gas law.


• The velocity of the flow is constant.
• The heat and concentration equilibrium between the gas phase and the catalyst
occurs instantaneously.
• The transport resistance and multiplicity of the catalyst particles are negligible.
• The physical properties, like the dispersion coefficient, the thermal conductiv-
ity and the molar based heat capacity, are independent of temperature and
concentration and equal for every component.
• The pressure drop caused by the catalyst particles is negligible.
• The reaction does not change the number of moles in the gas phase.
• The equipment both upstream and downstream has no influence on the flow
inside the reactor.
• Dispersion of heat and mass occurs only inside the reactor.
• The temperature and composition of the feed gas is constant in time.
• The reactor wall is cooled at constant temperature.
• Cooling at the reactor wall inside the reactor occurs only via the gas phase.

Table 6.1: Assumptions on the cooled packed bed reactor.

The component balances and the mass balance


The component balance represents the conservation of mass of one single
species in the gas phase. We consider a very basic example of a species A
reacting into species B, that is A → B. The total concentration of the gas is
denoted by ρ and the mole fraction of species A by yA . The partial concentra-
tion of species A is given by CA = ρyA .
We compute the flow of species A through the cross section of the reactor
at z = z0 , that is, the number of moles that passes at z = z0 every second.
The flow is caused by convection and diffusion.
The convection is the bulk motion caused by feeding the reactor. If u is
the rate of the flow, then the convection is given by

$$B_A = C_A \cdot u \qquad \mathrm{mol/(m^2\,s)}.$$

The diffusion is based on contributions from molecular diffusion in the gas


phase (to create the highest possible entropy) and from the turbulent flow

Figure 6.1: A segment of the reactor of length ∆z.

around the particles, and is given by


$$J_A = -\rho D_{ax}\,\frac{\partial y_A}{\partial z} \qquad \mathrm{mol/(m^2\,s)}.$$
The molar flux is the sum of the convection and the diffusion term and
represents the number of moles of a component that crosses a unit area per
second,
$$W_A = J_A + B_A \qquad \mathrm{mol/(m^2\,s)}.$$
To compute the flow one has to multiply the flux by the cross sectional area
of the reactor, denoted by Ac . But, since the reactor is filled with particles,
the void fraction ε has to be taken into account. So, the flow equals

$$F_A = \varepsilon A_c\, W_A \qquad \mathrm{mol/s}.$$

The component balance is obtained by considering a small segment of the


reactor, see Figure 6.1. The volume of the segment is equal to ∆V = Ac ∆z.
The number of moles that accumulate in the small segment, ε∆V ∂CA /∂t, is
equal to the number of moles that enters the section, FA (z), minus the number
that leaves, FA(z + ∆z), minus the number that reacts per second. If r′ is the number of moles that reacts per kilogram catalyst per second, we have to multiply this by (1 − ε)∆V ρcat to obtain the number of moles that reacts per
second in the segment. This leads to the equality

{accumulation} = {in} − {out} − {reaction}


$$\varepsilon\Delta V\,\frac{\partial C_A}{\partial t} = F_A(z) - F_A(z+\Delta z) - (1-\varepsilon)\Delta V \rho_{cat}\, r'.$$
After dividing both sides by ∆V and letting the length of the segment go to zero (∆z ↓ 0), we arrive at a partial differential equation. The component balance of species A reads

$$\varepsilon\,\frac{\partial C_A}{\partial t} = \frac{\partial}{\partial z}\Big\{\rho\varepsilon D_{ax}\,\frac{\partial y_A}{\partial z} - \varepsilon u C_A\Big\} - (1-\varepsilon)\rho_{cat}\, r'. \qquad (6.1)$$

The left hand side of the component balance denotes the accumulation of
component A in the gas phase. The convective and diffusive contributions to
the flow and the reaction rate are represented by the right hand side terms.
In the same way we obtain the component balance of species B, given by

$$\varepsilon\,\frac{\partial C_B}{\partial t} = \frac{\partial}{\partial z}\Big\{\rho\varepsilon D_{ax}\,\frac{\partial y_B}{\partial z} - \varepsilon u C_B\Big\} + (1-\varepsilon)\rho_{cat}\, r'. \qquad (6.2)$$
Note the plus sign in front of the reaction term, which implies that the reaction
increases the concentration of species B.
Finally, a third species is also present in the reactor, namely, the carrier
gas. The carrier gas is an inert, that is, it does not take part in the reaction.
Therefore, the component balance of the inert is given by

$$\varepsilon\,\frac{\partial C_I}{\partial t} = \frac{\partial}{\partial z}\Big\{\rho\varepsilon D_{ax}\,\frac{\partial y_I}{\partial z} - \varepsilon u C_I\Big\}. \qquad (6.3)$$
If we add the component balances of all species we obtain the overall
mass balance, which is an important equation if the velocity is not necessarily
constant. Because the sum of the mole fractions equals one, yA + yB + yI = 1,
and has zero derivative, the overall mass balance is given by

$$\varepsilon\,\frac{\partial \rho}{\partial t} = -\varepsilon\,\frac{\partial (u\rho)}{\partial z}.$$
Note that the reaction term is also canceled in this equation since the reaction
does not change the total number of molecules.
The component balance equations contain a second order derivative of the
mole fraction. So, we derive the boundary conditions at both ends of the
reactor. Note that we have assumed that mass dispersion appears only inside
the reactor. The reactor is called a closed-closed vessel, that is, both upstream and downstream of the reactor mass dispersion is negligible. At the entrance the boundary equation compares the flux in and in front of the reactor. This leads to the equality WA,0 = WA|z=0, which is equal to

$$C_{A,0}\, u = \Big(-\rho D_{ax}\,\frac{\partial y_A}{\partial z} + C_A u\Big)\Big|_{z=0}. \qquad (6.4)$$

At the other end we have assumed that no influences exists of the equipment
on the behavior of the flow, which implies no gradients in the concentration
of the components,
$$\frac{\partial y_A}{\partial z}\Big|_{z=L} = 0. \qquad (6.5)$$

The energy balance


Let us consider the open system given by a thin segment of the packed bed
reactor of length ∆z. The total energy contained in the segment is given by
Esys . The energy balance describes the change in total energy of the system
(∂Esys /∂t), which can be computed in two different ways.
For the first approach we analyze what happens inside the segment. The
total energy is given by the sum of the energy of the catalyst and the energy
of the gas phase, that is,
$$E_{sys} = E_s \rho_s (1-\varepsilon)\Delta V + \sum_i E_i \rho_i\, \varepsilon\Delta V. \qquad (6.6)$$

The energy Ei of a species in the gas phase consists of the enthalpy, Hi , and the
product −P Vi . Here Vi is the specific volume per mol of species i. Therefore,
$$\sum_i E_i \rho_i = \sum_i (H_i - P V_i)\rho_i = \sum_i H_i \rho_i - P.$$

In the last step we used that Σi Vi ρi = 1. Dividing (6.6) by ∆V and differen-
tiating in time leads to the change in energy. If we assume that the density of
the catalyst is constant (∂ρs /∂t = 0), then the change in potential energy is
linearly proportional to the change in temperature, that is,
$$\frac{\partial H}{\partial t} = c_p\,\frac{\partial T}{\partial t},$$
where cp denotes the specific heat capacity, at constant pressure. The specific
heat capacity cp is assumed to be independent of temperature and concentra-
tion. Using the component balances (6.1)-(6.3), that is,
$$\varepsilon\,\frac{\partial \rho_i}{\partial t} = -\varepsilon\,\frac{\partial W_i}{\partial z} + \nu_i (1-\varepsilon)\rho_s\, r',$$
the change in energy reads
$$\frac{1}{\Delta V}\frac{\partial E_{sys}}{\partial t} = \rho_s(1-\varepsilon)\frac{\partial E_s}{\partial t} + \varepsilon\sum_i H_i\frac{\partial \rho_i}{\partial t} + \varepsilon\sum_i \rho_i\frac{\partial H_i}{\partial t} - \varepsilon\frac{\partial P}{\partial t}$$
$$= (1-\varepsilon)(\rho c_p)_s\frac{\partial T}{\partial t} + \varepsilon(\rho c_p)_g\frac{\partial T}{\partial t} - \varepsilon\sum_i H_i\frac{\partial W_i}{\partial z} - (1-\varepsilon)\rho_s(-\Delta H)\,r' - \varepsilon\frac{\partial P}{\partial t}, \qquad (6.7)$$

where (−∆H) = −Σi νi Hi denotes the heat of reaction.

On the other hand, we can consider the interaction of the segment with its
surrounding. We have to take into account conduction (through the gas phase
and the catalyst particles), heat transport due to the flow in the reactor, and
cooling. The energy that results from the work of equipment is neglected.
For the conduction term, we use Fick’s first law of diffusion, cf. [63, 72].
The amount of energy that passes a cross section of the reactor per square
meter, per second equals
$$-\lambda_{ax}\,\frac{\partial T}{\partial z}, \qquad (6.8)$$
where λax is the effective axial heat conductivity, that depends on the heat
conductivities in the gas and the solid phase and on the heat-transfer resistance
between the two phases. Note that the conduction operates in the direction
of decreasing temperature, indicated by the minus sign.
The flow of energy is the amount of energy that passes a cross section of the reactor per second and is given by

$$\sum_i F_i E_i = \varepsilon A_c \sum_i W_i E_i.$$

Because the energy of species i equals Ei = Hi − P Vi , we obtain


$$\sum_i W_i E_i = \sum_i W_i H_i - P\sum_i W_i V_i = \sum_i W_i H_i + P\rho D_{ax}\sum_i V_i\,\frac{\partial y_i}{\partial z} - P u \sum_i \rho_i V_i. \qquad (6.9)$$

By assuming that the specific volume is equal for every species in the gas
phase, the second term of the last expression in (6.9) disappears.
The cooling, denoted by Q̇, is the amount of energy that leaves the segment
at the wall of the reactor per second. The cooling rate per square meter
surface area is linearly proportional to the difference in temperature of the
segment and of the cooling jacket, −Uw(T − Tc). The surface area of the segment equals 2πR∆z and the volume of the segment is πR²∆z. By aw we denote the ratio between the surface area and the volume of the segment, aw = 2πR∆z/(πR²∆z) = 2/R. The total cooling per second is thus given by

$$\dot{Q} = -U_w (T - T_c)\, A_c \Delta z\, a_w. \qquad (6.10)$$

From (6.8), (6.9) and (6.10), we obtain the following expression for the change

in energy of the segment

$$\lim_{\Delta z\to 0}\frac{1}{\Delta V}\frac{\partial E_{sys}}{\partial t} = \lim_{\Delta z\to 0}\frac{1}{A_c\Delta z}\Big\{-\lambda_{ax}A_c\frac{\partial T}{\partial z}\Big|^{z}_{z+\Delta z} + \varepsilon A_c\Big(\sum_i W_i H_i - Pu\Big)\Big|^{z}_{z+\Delta z} - U_w a_w A_c\Delta z\,(T-T_c)\Big\}$$
$$= \lambda_{ax}\frac{\partial^2 T}{\partial z^2} - \varepsilon\frac{\partial}{\partial z}\Big\{\sum_i W_i H_i - Pu\Big\} - U_w a_w (T-T_c). \qquad (6.11)$$

Note that the second term of the right hand side can be expanded to
$$\varepsilon\frac{\partial}{\partial z}\sum_i W_i H_i = \varepsilon\sum_i W_i\frac{\partial H_i}{\partial z} + \varepsilon\sum_i H_i\frac{\partial W_i}{\partial z}. \qquad (6.12)$$

Because

$$\sum_i W_i = \sum_i\Big(-\rho D_{ax}\frac{\partial y_i}{\partial z} + \rho_i u\Big) = \rho_g u,$$

and because ∂Hi/∂z can be approximated by (cp)g ∂T/∂z, the first term of the right hand side of (6.12) becomes

$$\varepsilon(\rho c_p)_g\, u\,\frac{\partial T}{\partial z}.$$

Since (6.7) is valid for all ∆V, we can combine (6.7) and (6.11). The term ε Σi Hi ∂Wi/∂z cancels and we derive the equation for the energy balance

$$\Big((1-\varepsilon)(\rho c_p)_s + \varepsilon(\rho c_p)_g\Big)\frac{\partial T}{\partial t} - \varepsilon\frac{\partial P}{\partial t} = \lambda_{ax}\frac{\partial^2 T}{\partial z^2} - \varepsilon(\rho c_p)_g u\frac{\partial T}{\partial z} - \frac{\partial}{\partial z}\{Pu\} - U_w a_w(T-T_c) + (1-\varepsilon)\rho_s(-\Delta H)\,r'. \qquad (6.13)$$
The left hand side shows the accumulation of enthalpy in the gas and solid
phase. On the right hand side, the first two terms show the contribution of the
heat transfer by convection and diffusion. The fourth term denotes the heat
transfer through the reactor wall to the surroundings. The last term gives the
enthalpy change due to reaction.
The boundary conditions are obtained in a similar way to those of the
component balance, (6.4) and (6.5), and are given by
$$u(\rho c_p)_g\, T_0 = \Big(-\lambda_{ax}\frac{\partial T}{\partial z} + u(\rho c_p)_g\, T\Big)\Big|_{z=0},$$

at the entrance of the reactor and


$$\frac{\partial T}{\partial z}\Big|_{z=L} = 0.$$
at the product end.

Reaction rate
The reaction rate depends on many factors. First of all it depends on the
concentration of the reactants in the reactor. In addition the temperature is
important. At low temperature, for example, the reaction might not occur at
all. If the heat of the reactor is extremely high, it can accelerate the reaction
and the reactor might explode. The type of the catalyst and the system of the
reaction on the catalyst increase the complexity of the formula.
In the simulations of Chapter 8 we restrict ourselves to the reaction rate given by

$$r'(c,T) = \frac{\eta k_\infty a_v k_c \exp[-E_a/(R_{gas}T)]}{a_v k_c + \eta k_\infty \exp[-E_a/(R_{gas}T)]}\; c,$$

according to Khinast et al. [33].
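For illustration, this rate expression translates directly into code. The following sketch is not the implementation used for the simulations in this thesis; all parameter values are hypothetical placeholders.

import numpy as np

def reaction_rate(c, T, eta=1.0, k_inf=1.0e8, a_v=1.0e3, k_c=0.1,
                  E_a=8.0e4, R_gas=8.314):
    # Arrhenius factor exp(-Ea / (Rgas * T))
    k = k_inf * np.exp(-E_a / (R_gas * T))
    # lumped kinetics and mass-transfer resistance, as in the rate above
    return eta * k * a_v * k_c / (a_v * k_c + eta * k) * c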

Radial direction
To extend the one-dimensional model with the radial direction, we assume that the state in the reactor is cylindrically symmetric and that the dispersion
coefficient Drad and the thermal conductivity λrad are independent of position,
concentration and temperature. In addition, we assume that energy transport
by mass diffusion can be lumped into the thermal conductivity.
We consider the radial part of the diffusion in the energy balance equation.
The radial part of the diffusion in the component balance is obtained in a
similar way. We subdivide the segment of the reactor with width ∆z in M
rings, see Figure 6.2. The widths of the rings are given by ∆r1 , . . . , ∆rM .
Denote by ri the center radius of the ith ring, that is, r1 = ½∆r1, r2 = ∆r1 + ½∆r2, and in general

$$r_i = \sum_{j=1}^{i-1}\Delta r_j + \tfrac{1}{2}\Delta r_i, \qquad i = 1,\ldots,M.$$

We take a ring
with center radius r and width ∆r. The volume of this ring is given by
$$\Delta V = \Delta z\Big\{\pi\Big(r+\tfrac{1}{2}\Delta r\Big)^2 - \pi\Big(r-\tfrac{1}{2}\Delta r\Big)^2\Big\} = 2\pi\Delta z\cdot r\Delta r.$$
Similarly to the axial case (6.8), the heat conductivity in radial direction per
m2 surface area, is given by
$$-\lambda_{rad}\,\frac{\partial T}{\partial r}.$$

Figure 6.2: A segment of the reactor of length ∆z.

The accumulation in the ring under consideration equals the flow through the surface of the ring at r − ½∆r minus the flow through the surface of the ring at r + ½∆r. If we divide the accumulation term by the volume of the ring, we obtain

$$\frac{1}{\Delta V}\Big\{(-\lambda_{rad})\frac{\partial T}{\partial r}\Big|_{r-\frac{1}{2}\Delta r}\, 2\pi\Big(r-\tfrac{1}{2}\Delta r\Big)\Delta z - (-\lambda_{rad})\frac{\partial T}{\partial r}\Big|_{r+\frac{1}{2}\Delta r}\, 2\pi\Big(r+\tfrac{1}{2}\Delta r\Big)\Delta z\Big\}. \qquad (6.14)$$

The expression in Formula (6.14) can be further simplified to


$$\lambda_{rad}\,\frac{1}{\Delta r}\Big(\frac{\partial T}{\partial r}\Big|_{r+\frac{1}{2}\Delta r} - \frac{\partial T}{\partial r}\Big|_{r-\frac{1}{2}\Delta r}\Big) + \lambda_{rad}\,\frac{1}{2r}\Big(\frac{\partial T}{\partial r}\Big|_{r-\frac{1}{2}\Delta r} + \frac{\partial T}{\partial r}\Big|_{r+\frac{1}{2}\Delta r}\Big).$$

By taking the limit ∆r → 0, we arrive at


$$\lambda_{rad}\Big\{\frac{\partial^2 T}{\partial r^2} + \frac{1}{r}\frac{\partial T}{\partial r}\Big\},$$

which equals

$$\lambda_{rad}\,\frac{1}{r}\frac{\partial}{\partial r}\Big\{r\,\frac{\partial T}{\partial r}\Big\}. \qquad (6.15)$$
The radial part of the diffusion in the component balance is given by
$$\rho D_{rad}\,\frac{1}{r}\frac{\partial}{\partial r}\Big\{r\,\frac{\partial y_A}{\partial r}\Big\}. \qquad (6.16)$$

At the wall of the reactor the boundary condition

$$\lambda_{rad}\,\frac{\partial T}{\partial r}\Big|_{r=R} = -U_w\big(T(R) - T_c\big), \qquad (6.17)$$

is added to the system. Equation (6.17) describes the heat loss at the reactor
wall to the surrounding cooling jacket, which is linearly proportional to the
difference in the temperature inside and outside of the reactor wall. Because
no material can pass through the wall of the reactor, we have

$$\frac{\partial y_A}{\partial r}\Big|_{r=R} = 0.$$

The cylindrical symmetry in the reactor yields the boundary conditions

$$\frac{\partial y_A}{\partial r}\Big|_{r=0} = 0, \qquad\text{and}\qquad \frac{\partial T}{\partial r}\Big|_{r=0} = 0.$$

A justification of the two-dimensional model


In the following, we justify that the above extension of the one-dimensional
model is indeed natural. The relation between the one- and two-dimensional
balance equations is based on the idea of a weighted average. To give a useful
one-dimensional representation of the two-dimensional state of the reactor, the
weighted average can be taken of the temperature and the concentration over
the cross section of the reactor. In the two-dimensional model, the temperature
in the point (z, r) at time t is denoted by T (z, r, t). So, the average temperature
over the cross section through z = z0 equals
$$\bar{T}(z_0, t) = \frac{2}{R^2}\int_0^R r\,T(z_0, r, t)\,dr. \qquad (6.18)$$

Before we compare the energy balance equations of both models, we apply


a few simplifications. We assume that the term (ρcp )g is constant in time and
space, as well as the velocity and the pressure. Therefore, the energy balance
(6.13) becomes

$$\Big((1-\varepsilon)(\rho c_p)_s + \varepsilon(\rho c_p)_g\Big)\frac{\partial T}{\partial t} = \lambda_{ax}\frac{\partial^2 T}{\partial z^2} - \varepsilon(\rho c_p)_g u\frac{\partial T}{\partial z} + (1-\varepsilon)\rho_s(-\Delta H)\,r' - U_w a_w (T - T_c). \qquad (6.19)$$

The two-dimensional version of the energy balance equation reads


$$\Big((1-\varepsilon)(\rho c_p)_s + \varepsilon(\rho c_p)_g\Big)\frac{\partial T}{\partial t} = \lambda_{ax}\frac{\partial^2 T}{\partial z^2} - \varepsilon(\rho c_p)_g u\frac{\partial T}{\partial z} + (1-\varepsilon)\rho_s(-\Delta H)\,r' + \lambda_{rad}\,\frac{1}{r}\frac{\partial}{\partial r}\Big\{r\,\frac{\partial T}{\partial r}\Big\}. \qquad (6.20)$$
If we take the weighted average of both sides of the energy balance, (6.20),
over the cross section of the reactor and use (6.18) for the weighted average
of the temperature, we obtain

$$\big((\rho c_p)_s(1-\varepsilon) + (\rho c_p)_g\varepsilon\big)\frac{\partial \bar{T}}{\partial t} = \lambda_{ax}\frac{\partial^2 \bar{T}}{\partial z^2} - u(\rho c_p)_g\frac{\partial \bar{T}}{\partial z} + (-\Delta H)\,\frac{2}{R^2}\int_0^R r\cdot r'(c,T)\,dr + \lambda_{rad}\,\frac{2}{R^2}\, r\frac{\partial T}{\partial r}\Big|_{r=0}^{R}. \qquad (6.21)$$

Using the boundary conditions in radial direction, we can rewrite the last term
of (6.21) in the following way
$$\lambda_{rad}\,\frac{2}{R^2}\, r\frac{\partial T}{\partial r}\Big|_{r=0}^{R} = -\frac{2}{R}\cdot U_w\big(T(R) - T_c\big). \qquad (6.22)$$

If we substitute (6.22) in (6.21) and assume that the concentration and the
temperature are constant in the radial direction, we recover the energy balance
of the one-dimensional model, (6.19), with aw = 2/R.
In the same way we can show that the component balance of the one-
dimensional model is also a limiting case of the component balance of the
two-dimensional model.

Dimensionless equations
In order to obtain the dimensionless versions of the balance equations we
use the following dimensionless variables. The conversion is given by x =
(c0 − c)/c0 , where c0 is the concentration of the reactants in the feeding gas
and c = CA = ρyA . If the conversion equals zero no reaction has occurred
and if the conversion equals one the reaction is completed. The dimensionless
temperature is given by θ = (T − T0 )/T0 , where T0 is the temperature of the
feeding gas. Since the reaction is exothermic and the cooling temperature is
fixed at T0 , the dimensionless temperature is always positive. The independent
dimensionless variables are time, τ = tu/L, the axial distance, ξ = z/L, and
the radial distance, ζ = r/R, for the two-dimensional model.

a1 = (ρcp)s (1 − ε)/(ρcp)g + ε          a2 = (−∆H)c0/(T0(ρcp)g) = ∆Tad/T0
a3 = Lηk∞/u                             a4 = ηk∞/(av kc)
Pem = uL/(εDax)                         Peh = uL(ρcp)g/λax
β = Ea/(Rgas T0)                        Φ = 2LUw/(Ru(ρcp)g)
Pemp = uL/(εDrad)                       Pehp = (ρcp)g Lu/λrad

Table 6.2: The dimensionless parameters of the balance equations.
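For reference, the groups in Table 6.2 can be evaluated directly from the physical parameters; in the following sketch every numerical value is a hypothetical placeholder, chosen only to illustrate the computation.

# evaluating the dimensionless groups of Table 6.2
rho_cp_g = 0.5e3        # (rho c_p) of the gas phase       [J/(m^3 K)]
rho_cp_s = 1.0e6        # (rho c_p) of the solid phase     [J/(m^3 K)]
eps = 0.4               # void fraction
L, R = 1.0, 0.1         # reactor length and radius        [m]
u = 1.0                 # flow velocity                    [m/s]
D_ax = D_rad = 1.0e-3   # dispersion coefficients          [m^2/s]
lam_ax = lam_rad = 2.0  # effective thermal conductivities [W/(m K)]
minus_dH = 2.0e5        # heat of reaction (-dH)           [J/mol]
c0, T0 = 10.0, 300.0    # feed concentration and temperature
eta, k_inf = 1.0, 1.0e8 # effectivity and frequency factor
a_v, k_c = 1.0e3, 0.1   # specific surface area, mass transfer coefficient
E_a, R_gas = 8.0e4, 8.314
U_w = 50.0              # wall heat transfer coefficient   [W/(m^2 K)]

a1 = rho_cp_s * (1 - eps) / rho_cp_g + eps
a2 = minus_dH * c0 / (T0 * rho_cp_g)       # equals dT_ad / T0
a3 = L * eta * k_inf / u
a4 = eta * k_inf / (a_v * k_c)
Pe_m = u * L / (eps * D_ax)
Pe_h = u * L * rho_cp_g / lam_ax
beta = E_a / (R_gas * T0)
Phi = 2 * L * U_w / (R * u * rho_cp_g)
Pe_mp = u * L / (eps * D_rad)
Pe_hp = rho_cp_g * L * u / lam_rad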

We first derive the dimensionless version of the component (conversion)


balance of the one-dimensional model. Substituting the expressions for the
dimensionless variables into (6.1) gives
$$\varepsilon\,\frac{\partial (1-x)c_0}{\partial(\tau L/u)} = \varepsilon D_{ax}\,\frac{\partial^2 (1-x)c_0}{\partial(\xi L)^2} - u\,\frac{\partial (1-x)c_0}{\partial(\xi L)} - \frac{\eta k_\infty a_v k_c (1-x)c_0}{a_v k_c \exp\big(\frac{E_a}{R_{gas}T_0}\cdot\frac{1}{1+\theta}\big) + \eta k_\infty}.$$
0

Hereafter, we divide both sides by the factor −c0 u/L. By gathering all param-
eters in dimensionless groups, we obtain
$$\varepsilon\,\frac{\partial x}{\partial \tau} = \frac{1}{\mathrm{Pe}_m}\frac{\partial^2 x}{\partial \xi^2} - \frac{\partial x}{\partial \xi} + a_3\,\frac{(1-x)}{\exp(\beta/(1+\theta)) + a_4}.$$
In the same way the energy balance in (6.19) becomes
$$\big((\rho c_p)_s(1-\varepsilon) + (\rho c_p)_g\varepsilon\big)\frac{\partial T_0(1+\theta)}{\partial(\tau L/u)} = \lambda_{ax}\frac{\partial^2 T_0(1+\theta)}{\partial(\xi L)^2} - u(\rho c_p)_g\frac{\partial T_0(1+\theta)}{\partial(\xi L)}$$
$$+\; (-\Delta H)\,\frac{\eta k_\infty a_v k_c (1-x)c_0}{a_v k_c \exp\big(\frac{E_a}{R_{gas}T_0}\cdot\frac{1}{1+\theta}\big) + \eta k_\infty} - U_w a_w\big(T_0(1+\theta) - T_c\big).$$

Dividing by (ρcp )g T0 u/L gives


$$a_1\,\frac{\partial \theta}{\partial \tau} = \frac{1}{\mathrm{Pe}_h}\frac{\partial^2 \theta}{\partial \xi^2} - \frac{\partial \theta}{\partial \xi} + a_2 a_3\,\frac{(1-x)}{\exp[\beta/(1+\theta)] + a_4} - \Phi\theta.$$
The expressions for the dimensionless parameters are given in Table 6.2.
For the two-dimensional model the major part follows from the above dis-
cussion. We only deal with the radial components of the diffusion terms. After
dividing the radial diffusion term of the energy balance (6.15) by (ρcp )g T0 u/L,
the dimensionless version is given by
$$\frac{\lambda_{rad}}{(\rho c_p)_g L u}\cdot\frac{L^2}{R^2}\,\frac{1}{\zeta}\frac{\partial}{\partial \zeta}\Big\{\zeta\,\frac{\partial \theta}{\partial \zeta}\Big\}.$$

Therefore, we define
$$\mathrm{Pe}_{hp} = \frac{(\rho c_p)_g L u}{\lambda_{rad}}.$$
By substituting the dimensionless variables in the radial term of the com-
ponent balance (6.16) and subsequently dividing again by −c0 u/L, we obtain

$$\frac{\varepsilon D_{rad}}{uL}\cdot\frac{L^2}{R^2}\,\frac{1}{\zeta}\frac{\partial}{\partial \zeta}\Big\{\zeta\,\frac{\partial x}{\partial \zeta}\Big\},$$

and we define Pemp by

$$\mathrm{Pe}_{mp} = \frac{uL}{\varepsilon D_{rad}}.$$
In Appendix C we explain that in general Dax ≠ Drad. We have added the
parameters Pemp and Pehp in Table 6.2.
The dimensionless boundary conditions in axial direction of the one- and
two-dimensional model are derived in the same way as the balance equations
and given at the end of this section. To point out the difference between
the one- and two-dimensional model we explicitly derive the dimensionless
boundary condition for the temperature at r = R (ζ = 1). So, starting with
$$\lambda_{rad}\,\frac{\partial T}{\partial r}\Big|_{r=R} = -U_w\big(T(R) - T_c\big),$$

we substitute θ and ζ, and divide both sides by u(ρcp )g T0 R/L2 , which leads
to
$$\frac{\lambda_{rad}}{L u(\rho c_p)_g}\cdot\frac{L^2}{R^2}\,\frac{\partial \theta}{\partial \zeta}\Big|_{\zeta=1} = -\frac{U_w L}{R u(\rho c_p)_g}\cdot\theta(1) = -\frac{1}{2}\,\Phi\cdot\theta(1).$$
Note that the dimensionless cooling capacity, Φ, is multiplied by a factor one half.

A summary of the model equations


We summarize the complete dimensionless one- and two-dimensional model.
The parameters involved are given in Table 6.2.
For the one-dimensional model the component balance reads

$$\varepsilon\,\frac{\partial x}{\partial \tau} = \frac{1}{\mathrm{Pe}_m}\frac{\partial^2 x}{\partial \xi^2} - \frac{\partial x}{\partial \xi} + \chi(x,\theta), \qquad (6.23)$$

where the reaction rate is given by

$$\chi(x,\theta) = a_3(1-x)\,\Big\{\exp\big(\beta/(1+\theta)\big) + a_4\Big\}^{-1}.$$

The energy balance involves a cooling term and is given by

$$a_1\,\frac{\partial \theta}{\partial \tau} = \frac{1}{\mathrm{Pe}_h}\frac{\partial^2 \theta}{\partial \xi^2} - \frac{\partial \theta}{\partial \xi} + a_2\,\chi(x,\theta) - \Phi\theta. \qquad (6.24)$$
The boundary conditions read

$$\theta - \frac{1}{\mathrm{Pe}_h}\frac{\partial \theta}{\partial \xi}\Big|_{\xi=0} = 0, \qquad \frac{\partial \theta}{\partial \xi}\Big|_{\xi=1} = 0,$$
$$x - \frac{1}{\mathrm{Pe}_m}\frac{\partial x}{\partial \xi}\Big|_{\xi=0} = 0, \qquad \frac{\partial x}{\partial \xi}\Big|_{\xi=1} = 0. \qquad (6.25)$$

For the two-dimensional model the component balance is given by

$$\varepsilon\,\frac{\partial x}{\partial \tau} = \frac{1}{\mathrm{Pe}_m}\frac{\partial^2 x}{\partial \xi^2} - \frac{\partial x}{\partial \xi} + \chi(x,\theta) + \frac{1}{\mathrm{Pe}_{mp}}\frac{L^2}{R^2}\,\frac{1}{\zeta}\frac{\partial}{\partial \zeta}\Big\{\zeta\,\frac{\partial x}{\partial \zeta}\Big\}, \qquad (6.26)$$

the energy balance is given by

$$a_1\,\frac{\partial \theta}{\partial \tau} = \frac{1}{\mathrm{Pe}_h}\frac{\partial^2 \theta}{\partial \xi^2} - \frac{\partial \theta}{\partial \xi} + a_2\,\chi(x,\theta) + \frac{1}{\mathrm{Pe}_{hp}}\frac{L^2}{R^2}\,\frac{1}{\zeta}\frac{\partial}{\partial \zeta}\Big\{\zeta\,\frac{\partial \theta}{\partial \zeta}\Big\} \qquad (6.27)$$

and the boundary conditions are given by

$$\theta - \frac{1}{\mathrm{Pe}_h}\frac{\partial \theta}{\partial \xi}\Big|_{\xi=0} = 0, \qquad \frac{\partial \theta}{\partial \xi}\Big|_{\xi=1} = 0,$$
$$x - \frac{1}{\mathrm{Pe}_m}\frac{\partial x}{\partial \xi}\Big|_{\xi=0} = 0, \qquad \frac{\partial x}{\partial \xi}\Big|_{\xi=1} = 0,$$
$$\frac{\partial \theta}{\partial \zeta}\Big|_{\zeta=0} = 0, \qquad \frac{1}{2}\Phi\theta + \frac{1}{\mathrm{Pe}_{hp}}\frac{L^2}{R^2}\,\frac{\partial \theta}{\partial \zeta}\Big|_{\zeta=1} = 0, \qquad (6.28)$$
$$\frac{\partial x}{\partial \zeta}\Big|_{\zeta=0} = 0, \qquad \frac{\partial x}{\partial \zeta}\Big|_{\zeta=1} = 0.$$
Chapter 7

Numerical approach for solving periodically forced processes

To solve a model consisting of partial differential equations including nonlinear


terms, the use of a computer is unavoidable. Therefore the model has to be
discretized in space and implemented in the computer. In Section 7.1 we give
an example of a discretization, called finite volumes.
During this process of discretization and implementation many errors might
be made. For instance, rounding errors can have a major influence. Further-
more, the grid should be chosen fine enough. So, the implemented models
have to be checked. This will be done in Section 7.2.
In this chapter we use basic partial differential equations. The ideas developed in Sections 7.1 and 7.2 can easily be extended to the model equations of the packed bed reactor derived in Section 6.2.
The last part of this chapter, Section 7.3, contains a short description of
bifurcation theory and a continuation technique. The bifurcation theory can
be used to find out whether a periodically forced process has a periodic stable
limiting state. In addition, it shows when small changes in the parameters
have a major influence on the behavior of the limiting state. For parame-
ter investigation we do not want to compute the periodic limiting state from
scratch for every value of the bifurcation parameter. If we change the bifurca-
tion parameter slightly we would prefer to take the old periodic limiting state
as initial estimate of an iterative method to compute the new one.


7.1 Discretization of the model equations


We consider the following initial-boundary value problem

$$\begin{cases} u_t = d\,u_{zz} - a\,u_z + h(u), \\ (a u - d u_z)\big|_{z=0} = d\,u_z\big|_{z=1} = 0, \\ u(z,0) = u_0(z), \quad z\in[0,1], \end{cases} \qquad (7.1)$$

where u : [0, 1] × R+ → R and a, d > 0. The partial differential equation


describes, for example, the temperature distribution in a reactor. Note that
the positivity of a implies that the gas is flowing from the left to the right end
of the reactor, in positive z-direction.
In order to discretize (7.1) we divide the reactor in N segments of equal width. The state u is assumed to be constant over a segment and located in the center, see Figure 7.1. For every segment, i = 1, . . . , N, a balance

Figure 7.1: The distribution of the grid points over the interval.

equation is derived, where the accumulation term ut (zi ) is expressed in terms


of the state of the segment, u(zi ), and the states of the neighboring segments,
u(zi−2 ), u(zi−1 ), u(zi+1 ) and u(zi+2 ). This results in a large system of ordinary
differential equations, which can be written as

$$U_t = F(U(t)), \qquad (7.2)$$

where U(t) = (u(z1, t), . . . , u(zN, t)).


In other words, we divide the interval [0, 1] in N small intervals of equal length (∆z = 1/N), and define zi = (i − 1/2)·∆z, i = 1, . . . , N. The boundaries of the ith interval are given by z_{i+1/2} = i·∆z and z_{i−1/2} = (i − 1)·∆z, for i = 1, . . . , N. Therefore, z_{1/2} = 0 and z_{N+1/2} = 1. In order to approximate the first and second derivative of u in a grid point zi, we use the Taylor expansion

of u around zi,

$$u(z) = u(z_i) + u_z(z_i)(z-z_i) + \frac{1}{2}u_{zz}(z_i)(z-z_i)^2 + \frac{1}{6}u_{zzz}(z_i)(z-z_i)^3 + O(|z-z_i|^4). \qquad (7.3)$$
The first derivative can be computed in several ways. One approach is
called first order upwind. If the flow rate in the reactor is rather high, we
have to use information of grid points that lie in upstream direction. So, if
the flow comes from the left we use the state value u in zi−1 . We apply (7.3)
in z = zi−1 and obtain
$$u(z_{i-1}) = u(z_i) - u_z(z_i)\Delta z + \frac{1}{2}u_{zz}(z_i)(\Delta z)^2 + O((\Delta z)^3). \qquad (7.4)$$
If we rearrange the terms, the derivative of u in zi equals

$$u_z(z_i) = \frac{u(z_i) - u(z_{i-1})}{\Delta z} + O(\Delta z). \qquad (7.5)$$
The first term of the right hand side of (7.5) is the approximation for the
derivative of u. The situation is given schematically in Figure 7.2.
Figure 7.2: First order upwind approximation of the derivative in zi.

Another approach, called second order central, is applicable if the diffusion


in the reactor dominates the dynamics. We apply (7.3) in zi+1

$$u(z_{i+1}) = u(z_i) + u_z(z_i)\Delta z + \frac{1}{2}u_{zz}(z_i)(\Delta z)^2 + O((\Delta z)^3). \qquad (7.6)$$

Subtracting (7.4) from (7.6) gives

$$u(z_{i+1}) - u(z_{i-1}) = 2\Delta z\, u_z(z_i) + O((\Delta z)^3),$$

and therefore

$$u_z(z_i) = \frac{u(z_{i+1}) - u(z_{i-1})}{2\Delta z} + O((\Delta z)^2). \qquad (7.7)$$
The situation is given schematically in Figure 7.3.
Figure 7.3: Second order central approximation of the derivative in zi.

For the second derivative of u one relevant approximation is available. Note


that

$$\frac{u(z_{i+1}) - u(z_i)}{\Delta z} = u_z(z_i) + \frac{1}{2}u_{zz}(z_i)\Delta z + \frac{1}{6}u_{zzz}(z_i)(\Delta z)^2 + O((\Delta z)^3), \qquad (7.8)$$

and

$$\frac{u(z_i) - u(z_{i-1})}{\Delta z} = u_z(z_i) - \frac{1}{2}u_{zz}(z_i)\Delta z + \frac{1}{6}u_{zzz}(z_i)(\Delta z)^2 + O((\Delta z)^3). \qquad (7.9)$$

We subtract (7.9) from (7.8), divide by ∆z and arrive at

$$u_{zz}(z_i) = \frac{u(z_{i+1}) - 2u(z_i) + u(z_{i-1})}{(\Delta z)^2} + O((\Delta z)^2).$$
h2
If we use second order central for the diffusion term and first order upwind
for the convective term, the ith component function of F in (7.2) is given by

$$\frac{d}{(\Delta z)^2}\Big(u(z_{i+1}) - 2u(z_i) + u(z_{i-1})\Big) - \frac{a}{\Delta z}\Big(u(z_i) - u(z_{i-1})\Big) + h(u(z_i)), \qquad (7.10)$$

for i = 2, . . . , N − 1. In order to derive the first component function and the


last component function of F, we apply a slightly different approach. First,
note that

$$(d\,u_{zz} - a\,u_z)\big|_{z_i} = (d\,u_z - a\,u)_z\big|_{z_i} \approx \frac{1}{z_{i+1/2} - z_{i-1/2}}\,(d\,u_z - a\,u)\Big|^{z_{i+1/2}}_{z_{i-1/2}}.$$

The first derivative uz evaluated at the boundary between two segments, z_{i+1/2}, is approximated using the state u at the grid points of both segments (zi and zi+1), that is,

$$u_z(z_{i+1/2}) = \frac{1}{z_{i+1} - z_i}\big(u(z_{i+1}) - u(z_i)\big).$$

If we apply first order upwind, then u evaluated at z_{i+1/2} is approximated by u(zi), the nearest point in upstream direction. The ith component function of F becomes

$$\frac{1}{z_{i+1/2} - z_{i-1/2}}\, d\left(\frac{u(z_{i+1}) - u(z_i)}{z_{i+1} - z_i} - \frac{u(z_i) - u(z_{i-1})}{z_i - z_{i-1}}\right) - a\,\frac{u(z_i) - u(z_{i-1})}{z_{i+1/2} - z_{i-1/2}} + h(u(z_i)), \qquad (7.11)$$

for i = 2, . . . , N − 1. Note that for an equidistant grid (7.10) and (7.11) are equal.
Because the mesh is chosen such that z_{1/2} = 0, the left boundary condition reads (d uz − a u)|_{z_{1/2}} = 0. The first component function of F is given by

$$\frac{1}{z_{3/2}}\, d\,\frac{u(z_2) - u(z_1)}{z_2 - z_1} - a\,\frac{u(z_1)}{z_{3/2}} + h(u(z_1)).$$

On the other hand, we have defined z_{N+1/2} = 1 and thus d uz|_{z_{N+1/2}} = 0. The Nth component function of F yields

$$\frac{1}{1 - z_{N-1/2}}\, d\left(-\frac{u(z_N) - u(z_{N-1})}{z_N - z_{N-1}}\right) - a\,\frac{u(z_N) - u(z_{N-1})}{1 - z_{N-1/2}} + h(u(z_N)).$$

If we apply central discretization for the first derivative, u(z_{i+1/2}) is approximated by (u(zi) + u(zi+1))/2 to derive the last component function of F. In this case we have to evaluate u at z_{N+1}. Because the first derivative of u in z_{N+1/2} = 1 equals zero, we can replace u(z_{N+1}) by u(z_N).

The two-dimensional initial-boundary value problem

$$\begin{cases} u_t = d_1 u_{zz} - a_1 u_z + \dfrac{d_2}{r}(r u_r)_r + h(u), \\ (a_1 u - d_1 u_z)\big|_{z=0} = d_1 u_z\big|_{z=1} = 0, \\ u_r\big|_{r=0} = (a_2 u + d_2 r u_r)\big|_{r=1} = 0, \\ u(z,r,0) = u_0(z,r), \quad (z,r)\in[0,1]^2, \end{cases} \qquad (7.12)$$
where a1 , a2 , d1 , d2 > 0, can be discretized using mainly the same approach.
In addition to the axial derivative in the partial differential equation, we have
to deal with the radial component of the diffusion term. Therefore, we divide the radial axis into M intervals, with boundaries r_{1/2}, r_{3/2}, . . . , r_{M+1/2}. We set r_{1/2} = 0 and r_{M+1/2} = 1. In every interval j we choose a grid point rj, j = 1, . . . , M, and approximate the radial term at the grid point rj by

$$\frac{d_2}{r}(r u_r)_r\Big|_{r_j} \approx \frac{d_2}{r_j}\,\frac{1}{r_{j+1/2} - r_{j-1/2}}\,(r u_r)\Big|^{r_{j+1/2}}_{r_{j-1/2}}. \qquad (7.13)$$
2 2 2

For j = 2, . . . , M − 1 we can expand (7.13) to

$$\frac{d_2}{r}(r u_r)_r\Big|_{r_j} \approx \frac{d_2}{r_j}\,\frac{1}{r_{j+1/2} - r_{j-1/2}}\left(r_{j+1/2}\,\frac{u(r_{j+1}) - u(r_j)}{r_{j+1} - r_j} - r_{j-1/2}\,\frac{u(r_j) - u(r_{j-1})}{r_j - r_{j-1}}\right).$$

For the first and the last grid point the boundary conditions have to be taken
into account. Because the derivative ur at r = 0 equals zero, we obtain for
j = 1,

$$\frac{d_2}{r}(r u_r)_r\Big|_{r_1} \approx \frac{d_2}{r_1}\cdot\frac{1}{r_{3/2} - r_{1/2}}\,(r u_r)\Big|^{r_{3/2}}_{r_{1/2}} \approx \frac{d_2}{r_1}\cdot\frac{u(r_2) - u(r_1)}{r_2 - r_1}.$$

The other boundary condition leads to

$$\frac{d_2}{r}(r u_r)_r\Big|_{r_M} \approx \frac{d_2}{r_M}\cdot\frac{1}{r_{M+1/2} - r_{M-1/2}}\,(r u_r)\Big|^{r_{M+1/2}}_{r_{M-1/2}}$$
$$\approx \frac{1}{r_M}\cdot\frac{1}{1 - r_{M-1/2}}\left(-a_2\, u(1) - d_2\, r_{M-1/2}\,\frac{u(r_M) - u(r_{M-1})}{r_M - r_{M-1}}\right).$$

The value of u at r = 1 is not defined yet. Because the gradient of u at r = 1 in general is not equal to zero, we extrapolate u using the values at the grid points r_{M−1} and r_M, see Figure 7.4. This leads to

$$u(1) = u(r_M) + (1 - r_M)\cdot\frac{u(r_M) - u(r_{M-1})}{r_M - r_{M-1}}.$$

Figure 7.4: An approximation of the value u(1).

So far, we have only considered the axial and radial terms separately.
In order to discretize (7.12) the values of u in the grid points of the two-
dimensional mesh should be stored in one single vector U. In order to obtain
a small band width of the Jacobian of the function F, we first count in the
direction that has the smallest number of grid points (often M < N ). So the
vector U becomes

U = (u1,1 , u1,2 , . . . , u1,M , u2,1 , . . . , uN,1 , . . . , uN,M ),

where ui,j = u(zi , rj ).
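A minimal sketch of this ordering and the resulting bandwidth (the sizes N and M below are hypothetical):

import numpy as np

N, M = 100, 10          # axial and radial numbers of grid points

def index(i, j):
    # position of u(z_i, r_j) in the vector U, i = 1..N, j = 1..M;
    # the radial index j runs fastest, as in the ordering above
    return (i - 1) * M + (j - 1)

U = np.empty(N * M)
# axial neighbours are M positions apart, radial neighbours one
# position apart, so the Jacobian of F has a bandwidth of order M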

7.2 Tests for the discretized model equations


The initial-boundary value problem (7.1) can explicitly be solved if the function h is affine. In general the solution is an infinite sum of terms with cosine, sine and the exponential function, where the coefficients are fixed by the boundary conditions and the initial condition. The explicit solution is derived by splitting the variables, that is, assuming that u is of the form u(z, t) = Z(z)T(t) and substituting this in the partial differential equation.
The obtained analytical solution can be compared to the results of numerical
simulation.
If it is not possible to compute the explicit solution we have to consider
other techniques to check the solution given by the computer. One approach
is to check the balances. We first integrate both sides of the partial differential

equation of (7.1) in time (T > 0) and space,

$$\int_0^T\!\!\int_0^1 u_t\, dz\, dt = \int_0^T\!\!\int_0^1 \big(d\,u_{zz} - a\,u_z + h(u)\big)\, dz\, dt.$$

If we assume that the solution is continuous, we can change the order of integration. Therefore,

$$\int_0^1\Big\{u(z,t)\Big|_0^T\Big\}\, dz = \int_0^T\Big\{(d\,u_z - a\,u)\Big|_0^1 + \int_0^1 h(u)\, dz\Big\}\, dt.$$

By applying the boundary conditions of (7.1) we obtain

$$\int_0^1\big\{u(z,T) - u_0(z)\big\}\, dz = \int_0^T\Big\{-a\,u(1,t) + \int_0^1 h(u)\, dz\Big\}\, dt.$$

We can check the simulation by computing the integral at the right hand side
simultaneously with the variable u.
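The following sketch illustrates such a check for the discretization of Section 7.1, with the hypothetical choice h(u) = −u; the finite-volume right-hand side is repeated inline so that the fragment is self-contained, and all settings are illustrative.

import numpy as np
from scipy.integrate import solve_ivp

N, d, a, Tend = 200, 1e-2, 1.0, 1.0
h = lambda u: -u
dz = 1.0 / N
z = (np.arange(N) + 0.5) * dz
u0 = np.exp(-50.0 * (z - 0.3) ** 2)      # smooth initial profile

def F(t, U):
    # finite-volume right-hand side of (7.1), as in Section 7.1
    dU = np.empty_like(U)
    dU[1:-1] = (d * (U[2:] - 2 * U[1:-1] + U[:-2]) / dz**2
                - a * (U[1:-1] - U[:-2]) / dz + h(U[1:-1]))
    dU[0] = (d * (U[1] - U[0]) / dz - a * U[0]) / dz + h(U[0])
    dU[-1] = (-d * (U[-1] - U[-2]) / dz - a * (U[-1] - U[-2])) / dz + h(U[-1])
    return dU

sol = solve_ivp(F, (0.0, Tend), u0, dense_output=True, rtol=1e-9, atol=1e-11)

lhs = np.sum(sol.y[:, -1] - u0) * dz     # change of the total amount of u
ts = np.linspace(0.0, Tend, 2001)
Ut = sol.sol(ts)
# outflow at z = 1 plus the source term, integrated over time
rhs = np.trapz(-a * Ut[-1, :] + np.sum(h(Ut), axis=0) * dz, ts)
print(lhs, rhs)    # the two numbers should agree up to the discretization error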
Another possibility is multiplying both sides of the partial differential equation by u and then integrating in time and space, which results in the following 'energy estimate',

$$\int_0^T\!\!\int_0^1 u\,u_t\, dz\, dt = \int_0^T\!\!\int_0^1 u\,\big(d\,u_{zz} - a\,u_z + h(u)\big)\, dz\, dt.$$

Again interchanging the integration order gives

$$\int_0^1 \frac{1}{2}u^2\Big|_0^T\, dz = \int_0^T\Big\{d\,u\,u_z\Big|_0^1 - d\int_0^1 u_z^2\, dz - \frac{a}{2}u^2\Big|_0^1 + \int_0^1 u\,h(u)\, dz\Big\}\, dt,$$

and inserting the boundary conditions yields

$$\frac{1}{2}\int_0^1\big\{u^2(z,T) - u^2(z,0)\big\}\, dz = \int_0^T\Big\{-a\,u^2(0,t) - d\int_0^1 u_z^2\, dz - \frac{a}{2}u^2(1,t) + \frac{a}{2}u^2(0,t) + \int_0^1 u\,h(u)\, dz\Big\}\, dt.$$

This can be simplified to

$$\int_0^1 u^2(z,T)\, dz = \int_0^1 u_0^2(z)\, dz - \int_0^T\Big\{a\,u^2(1,t) + a\,u^2(0,t) + 2d\int_0^1 u_z^2\, dz - 2\int_0^1 u\,h(u)\, dz\Big\}\, dt.$$

This implies that if the function h satisfies $\int_0^1 u\,h(u)\,dz < 0$, then the 'total amount of energy' in the system, $\|u(\cdot,t)\|_2^2 = \int_0^1 u^2(z,t)\,dz$, is decreasing.
If the function h is affine, the two-dimensional problem (7.12) can be solved explicitly by splitting the variables, that is, assuming that u is of the form u(z, r, t) = Z(z)R(r)T(t). The resulting ordinary differential equations in z and t are solved in the same way, as was done for the one-dimensional problem.
The ordinary differential equation that involves the variable r is a so-called
Bessel equation and the solutions are given in terms of the Bessel function. In
general the solution of (7.12) is an infinite sum of terms with cosine, sine, the
exponential function and the Bessel function, where the coefficients are fixed
by the boundary conditions and the initial condition.
We briefly discuss some additional approaches to check the implementation of the discretized equations. If a reliable implementation of the one-dimensional model exists, we can consider (artificial) limit cases of the process in which the one-dimensional and the two-dimensional model should give the same results. That is, when the state of the reactor has no gradients in radial di-
rection. We describe two situations starting from an initial state u0, which is constant in radial direction. Note that the boundary conditions at z = 0 and z = 1 are uniform over the cross section of the reactor. If the diffusion in radial direction is high, differences in radial direction will be removed instantaneously. In terms of (7.12) this means that if d2 is large, the term (r ur)r will quickly become small. Another example is when the cooling of the reactor
stagnates at or near the reactor wall. That is, the reactor wall is insulated, a2 = 0, or no diffusion exists in radial direction, d2 = 0. If a2 = 0 then the temperature gradient at r = 1 will be zero and no gradients are introduced.
In order to check the implemented model if radial gradients are present,
we can again integrate the initial-boundary value problem (7.12) in time and
space. Note that for the radial direction we first have to multiply the equation
by r, in order to obtain the weighted average. We obtain similar integral
equations as in the case of the system (7.1). Terms of the integral equation
can simultaneously be integrated with the variable u.

7.3 Bifurcation theory and continuation techniques


Periodically forced processes in packed bed reactors can be described by use
of partial differential equations. In order to investigate the behavior of the
system numerically, we discretize the equations in space using a finite volumes
technique with first order upwind for the convective term. The state of the
reactor at time t is denoted by a vector x(t) from the n-dimensional vector

space, R^n. The resulting system of n ordinary differential equations can be written as

$$x'(t) = F(x(t), t), \qquad (7.14)$$
where F (·, t + tc ) = F (·, t) for t ∈ R, and tc denotes the period length.
The map f : Rn → Rn that assigns to an initial state at time zero, x(0) =
x0 , the value of the solution after one cycle, x(tc ), is called the Poincaré or
period map of (7.14). So, we have
$$f(x_0) = x(t_c). \qquad (7.15)$$
In other words, evaluating the map f is equivalent to simulating one cycle of
the process in (7.14).
Moreover, a periodic state of the reactor corresponds to a tc-periodic solution x(t) of (7.14). Since the initial condition, x0, of a periodic solution is a fixed point of the period map, we solve

$$f(x) - x = 0, \qquad (7.16)$$
using iterative methods. Note that the value f (x) is obtained by integrating
a large system of ordinary differential equations over a period tc . Therefore,
the function evaluation is a computationally expensive task, and the iterative
method that needs the fewest evaluations of f to solve (7.16), is the most
efficient. Since it might take a long transient time before the limiting periodic
state is reached, direct methods are preferable to dynamical simulation.
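As an illustration of this setup, the following sketch evaluates the period map by one time integration and solves (7.16) with the classical Broyden iteration, storing the full n × n approximate Jacobian. The right-hand side F and the period tc are assumed given; the limited memory methods of Part I replace precisely this stored matrix.

import numpy as np
from scipy.integrate import solve_ivp

def period_map(x0, F, tc):
    # evaluate f(x0) = x(tc) by integrating (7.14) over one period
    sol = solve_ivp(lambda t, x: F(x, t), (0.0, tc), x0,
                    rtol=1e-8, atol=1e-10)
    return sol.y[:, -1]

def broyden_fixed_point(f, x, tol=1e-8, maxit=100):
    # solve g(x) = f(x) - x = 0; B approximates the Jacobian Df - I.
    # B0 = -I makes the first step a plain Picard step x -> f(x).
    B = -np.eye(x.size)
    g = f(x) - x
    for _ in range(maxit):
        if np.linalg.norm(g) < tol:
            break
        s = -np.linalg.solve(B, g)                      # quasi-Newton step
        x = x + s
        g_new = f(x) - x
        B += np.outer(g_new - g - B @ s, s) / (s @ s)   # Broyden rank-one update
        g = g_new
    return x

Every iteration costs one evaluation of f, that is, one simulation of a full cycle; for n = 5000 the dense n × n matrix B is exactly the storage problem that the limited memory Broyden methods address.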
The model of a packed bed reactor contains several physical parameters,
which may vary over certain specified intervals. Therefore, it is important to
understand the qualitative behavior of the systems as a bifurcation parame-
ter change. A good design for the periodically forced process in the packed
bed reactor is such that the qualitative behavior does not change when the
bifurcation parameter is varied slightly from the value for which the original
design was made. The value of the bifurcation parameter where the qualita-
tive property of the state of the reactor changes is called a bifurcation point.
Knowledge of the bifurcation points is necessary for a good understanding of
the system. Our objective in this section is to give a short overview of the simplest types of bifurcations and of methods to find the bifurcation values.
Let f be the period map of a periodically forced process. Now assume that
the period map depends on a bifurcation parameter λ. Thus, we consider the
dynamical system
$$x_{k+1} = f(x_k; \lambda), \qquad (7.17)$$
where f : Rn × R → Rn , starting from an initial condition x0 . A periodic
state of the process is a fixed point of the period map (f (x∗ ) = x∗ ). In

fact, the periodic state depends on the value of the bifurcation parameter
(x∗ = x∗ (λ)). We are interested whether and how the periodic state changes,
when we vary the bifurcation parameter slightly. To understand the local
behavior of the system in more detail, we have to consider the Jacobian of the
period map, also called the monodromy matrix. The monodromy matrix M
describes the evolution of a small perturbation over one period. The stability
of periodic solutions is determined by the Floquet multipliers, the eigenvalues
of the monodromy matrix, [30].
A periodic solution is stable when the absolute values of all the (possibly
complex) eigenvalues of M are smaller than unity. This implies that a neigh-
borhood exists of the periodic state x∗ in which all trajectories converge to
the periodic state as time goes to infinity.
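If the Jacobian of the period map is not available in closed form, the monodromy matrix can be approximated column by column with finite differences; note that each column costs one additional cycle simulation. A sketch, with the period map f and the fixed point x∗ assumed given:

import numpy as np

def floquet_multipliers(f, x_star, eps=1e-6):
    # approximate the monodromy matrix M = Df(x*) by forward differences
    n = x_star.size
    fx = f(x_star)
    M = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        M[:, j] = (f(x_star + e) - fx) / eps
    return np.linalg.eigvals(M)

# the periodic state is stable if all multipliers satisfy |mu| < 1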
Figure 7.5: Different bifurcation scenarios of a periodically forced system.

When changing the bifurcation parameter an eigenvalue might cross the


unit circle and the dynamics of the system can change completely. At a bifurcation point the periodic state becomes unstable or disappears. The angle at which an eigenvalue µ crosses the unit circle determines the type of bifurcation. In the following example we consider three different scenarios, see Section 8.2 for more details.

Example 7.1. If the eigenvalue leaves the unit circle at µ = 1 the number of
periodic solutions of the system changes. In general this will be by two. The
bifurcation point is called a limit point or a saddle-node. Let the flow-reversal
time tf be the bifurcation parameter and fix all other physical parameters of the system. For moderate values of tf a stable periodic state exists at high temperature. However, the longer the gas flows in one direction, the more energy is purged out of the reactor during one flow-reversal period. There exists a minimum value of tf for which the extinguished state is the only possible periodic state. This value of tf corresponds to the bifurcation point.
If the eigenvalue leaves the unit circle at µ = −1 the period of the solution
is doubled. Let f be the map corresponding to half a period of the reverse flow

reactor, defined by (8.3). If x∗ is a stable fixed point of f, the limiting state of the reactor is a symmetric periodic state, see Figure 8.5(a). If we vary the bifurcation parameter such that the largest eigenvalue of the monodromy matrix leaves the unit circle at µ = −1, the fixed point x∗ becomes unstable, and the limiting state corresponds to a new point x̃∗ that satisfies

$$f^2(\tilde{x}^*) = \tilde{x}^* \neq f(\tilde{x}^*).$$

Since f² equals the period map of a whole cycle of the reverse flow reactor, x̃∗ is a periodic state of the process. However, it has become asymmetric, see Figure 8.5(b).
If a pair of eigenvalues leaves the unit circle at µ1 = e^{iθ0} and µ2 = e^{−iθ0}, where 0 < θ0 < π, the limiting state of the reactor becomes quasi-periodic, which implies that the state follows two frequencies, see Figure 8.5(c). Note that a complex eigenvalue always has a conjugate partner. This bifurcation is called a Neimark-Sacker bifurcation, and corresponds to a transition from a single to a two-frequency motion.

Continuation techniques
Clearly, we are interested in the dependence of the limiting periodic state of the
periodically forced process on certain bifurcation parameters. In this section
we describe the basics of continuation techniques to analyze the dynamical
system
$$x_{k+1} = F(x_k, \alpha), \qquad \alpha \in \mathbb{R},$$

where the period map F : R^{n+1} → R^n depends upon one bifurcation parameter α.
Fixed points of the period map, also called equilibrium points, satisfy the
equation
$$F(x, \alpha) = x. \qquad (7.18)$$
If we denote a point in Rn+1 by y = (x, α) and define G : Rn+1 → Rn by
G(y) = F (x, α) − x, Equation (7.18) leads to

$$G(y) = 0. \qquad (7.19)$$

By the implicit function theorem the system (7.19) locally defines a smooth one-dimensional curve C in R^{n+1} passing through a point y0 that satisfies (7.19), provided that

$$\operatorname{rank} J_G(y_0) = n. \qquad (7.20)$$

Here JG (y0 ) denotes the Jacobian of G at y0 . Every point on the curve C that
satisfies (7.20) is called regular.
During continuation, points on this curve (y0 , y1 , y2 , . . .) are approximated
with a desired accuracy. The first two points of the sequence are computed
by fixing the bifurcation parameter and applying iterative methods to solve
(7.18). For the subsequent points, most of the continuation algorithms used
in bifurcation analysis implement predictor-corrector methods that include
three basic steps, prediction, correction and step size adjustment. The next
continuation point is predicted by adding a step to the previous point, that is
based on previously computed points of the branch and an appropriate step
length. Next, the prediction is corrected bordered with a step length condition.
Finally the step size is adapted.
We describe some of the basic choices for the prediction and the correction step, and strategies to validate the newly computed point on the bifurcation branch in order to choose the new step size.

Prediction
Suppose that a regular point yk in the sequence approximating the curve C
has been found. Then, the initial guess ỹ of the next point in the sequence is
made using the prediction formula
ỹ = yk + ∆sk vk , (7.21)
where ∆sk is the current step size, and vk ∈ Rn+1 is a vector of unit length
(kvk k = 1).
A possible choice for vk is the tangent vector to the curve in yk . To obtain
the tangent vector we parametrize the curve near yk by the arc-length s with y(0) = yk. If we substitute the parametrization into (7.19) and take the derivative with respect to s, we obtain

$$J_G(y_k)\, v_k = 0, \qquad (7.22)$$
since vk = dy/ds(0). System (7.22) has a unique solution (vk has unit length)
because rank JG (yk ) = n by the assumption of regularity.
Another popular prediction method is the secant prediction. It requires two previous points on the curve, y_{k−1} and y_k. The prediction is given by (7.21), where now

$$v_k = \frac{y_k - y_{k-1}}{\|y_k - y_{k-1}\|}. \qquad (7.23)$$

The advantage of this method is that the computation of the Jacobian JG and the solution of a large system of equations are avoided.

A third, simple, method is changing the bifurcation parameter only and using the last point in the sequence yk as initial guess of the next point on the bifurcation branch. The direction of the step is therefore given by

$$v_k = e_{n+1},$$

where e_{n+1} is the last unit vector in R^{n+1}. A drawback of this method is that the branch can only be detected in the increasing direction of the bifurcation parameter.

Correction
Having predicted a point ỹ presumably close to the curve, one needs to locate
the next point yk+1 on the curve to within a specified accuracy. This correction
is usually performed by some Newton-like iterations. However, the standard
Newton iterations have to be applied to a system in which the number of
equations is equal to that of the unknowns. So, in order to apply Newton’s
method or a quasi-Newton method, a scalar condition

hk (y) = 0

has to be appended to the system (7.18), where hk : Rn+1 → R is called the


control function. We redefine the function G : R^{n+1} → R^{n+1} by

$$G(y) = \begin{pmatrix} F(x,\alpha) - x \\ h_k(y) \end{pmatrix}.$$

Solving
$$G(y) = 0 \qquad (7.24)$$
geometrically means that one looks for an intersection of the curve C with some
surface near ỹ. It is natural to assume that the prediction point ỹ belongs to
this surface as well (that is, hk (ỹ) = 0). There are several ways to specify the
function hk (y).
The simplest way is to take a hyperplane passing through the point ỹ that
is orthogonal to the coordinate axis of the bifurcation parameter, namely, set

hk (y) = α − α̃.

This approach is called natural continuation. Instead of the coordinate axis of the bifurcation parameter, often the axis is taken that corresponds to the index of the component of vk with the maximum absolute value, because the element of yk with this index is locally the most rapidly changing along C.

Figure 7.6: Prediction and correction step.

Another possibility, called pseudo-arclength continuation, is to select the


hyperplane passing through the point ỹ that is orthogonal to the vector vk. This hyperplane is defined by

$$0 = \langle y - \tilde{y},\, v_k \rangle.$$

Therefore we set

$$h_k(y) = \langle y - \tilde{y},\, v_k \rangle = \langle y - (y_k + \Delta s_k v_k),\, v_k \rangle = \langle y - y_k,\, v_k \rangle - \Delta s_k. \qquad (7.25)$$

If the curve is regular (rank JG (y) = n for all y ∈ C) and the step size ∆sk
is sufficiently small, one can prove that the Newton iterations for (7.24) will
converge to a point on the curve C from the predicted point ỹ of the tangent
prediction or the secant prediction, [38].
For the third possibility, not a hyperplane but a sphere around the previously computed point yk in the sequence is taken. That is, the distance between the approximation of the next point on the curve and yk is fixed at ∆sk. The control function is therefore defined as

$$h_k(y) = \|y - y_k\| - \Delta s_k.$$

Clearly, the predicted point ỹ lies on the sphere. The main disadvantage of this approach is that the control function is not linear and that, especially in the neighborhood of a bifurcation point, the continuation might go in the wrong direction, since the curve has at least two intersection points with the sphere.
Note that the matrix JG (yk ) needed in (7.22) can be extracted from the
last iteration of the Newton process solving (7.24).

Step size adjustment


There are many sophisticated algorithms to control the step size ∆sk. The
simplest convergence-dependent control, however, has proved to be reliable
and easy to implement. That is, if no convergence occurs after a prescribed
number of iterations in the correction step, we decrease the step size and return
to the prediction step. If the next point is successfully computed, we accept
it as a new point of the sequence and multiply the step length by a given
constant factor greater than one. If the convergence succeeds but uses many
iterations, we accept the new point of the sequence but decrease the step size
∆sk.
To summarize this section we give the continuation algorithm that we have
applied in our simulations.

Algorithm 7.2 (Continuation scheme). Let yk = (xk, αk) and yk−1 =
(xk−1, αk−1) be the last successfully computed points in the sequence approx-
imating the branch. Fix the real parameters a and b (a > 1 and 0 < b < 1)
and the number imax. The next point, yk+1 = (xk+1, αk+1), in the sequence is
determined by:

Secant prediction: Set ỹ = yk + ∆sk · vk, where ∆sk is the current step
size and vk is defined by (7.23).

Pseudo-arclength continuation: Solve (7.24), where G(y) = (F(x, α) −
x, hk(y)) and hk(y) is defined by (7.25).

Step-size control: If the correction step fails, then multiply ∆sk by b and
return to the prediction step. If the correction step succeeds using less
than imax/2 iterations of an iterative method, then accept the new point
in the sequence and set ∆sk+1 = a∆sk. If the correction step succeeds
but uses more than imax/2 iterations, then accept the new point in the
sequence and set ∆sk+1 = b∆sk. A sketch of this scheme in Matlab
follows the algorithm.
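
The following is a minimal Matlab sketch of Algorithm 7.2, under the assumption that a corrector routine correct (solving (7.24), for instance with a limited memory Broyden method) is available and returns the new point, the number of iterations used and a success flag; the names correct and dsmin are illustrative and not part of the actual implementation.

a = 1.5; b = 0.5; imax = 20;            % a > 1, 0 < b < 1
while dsk > dsmin
  %%% Secant prediction %%%
  vk = (yk - ykm1)/norm(yk - ykm1);     % secant direction (7.23)
  ytilde = yk + dsk*vk;
  %%% Pseudo-arclength correction %%%
  [ynew, iter, ok] = correct(ytilde, yk, vk, dsk, imax);
  %%% Step-size control %%%
  if ~ok
    dsk = b*dsk;                        % correction failed, retry
  else
    ykm1 = yk; yk = ynew;               % accept the new point
    if iter <= imax/2, dsk = a*dsk; else dsk = b*dsk; end
  end
end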
Chapter 8

Efficient simulation of
periodically forced reactors in
2D

The final chapter of this thesis is devoted to the connection between the
iterative methods for solving high-dimensional systems of nonlinear equations
and the efficient simulation of a two-dimensional model for the reverse flow
reactor, where the radial direction is taken into account.

8.1 The reverse flow reactor


We start by recalling the description of the reverse flow reactor from the intro-
duction. The reverse flow reactor (RFR) is a catalytic packed-bed reactor in
which the flow direction is periodically reversed to trap a hot zone within the
reactor. Upon entering the reactor, the cold feed gas is heated up regenera-
tively by the hot bed so that a reaction can occur. The reaction is assumed
to be exothermic. At the other end of the reactor the hot product gas is
cooled by the colder catalyst particles. The beginning and end of the reactor
thus effectively work as heat exchangers. The cold feed gas purges the high-
temperature (reaction) front in the downstream direction. Before the hot reaction
zone exits the reactor, the feed flow direction is reversed. The flow-reversal
period, denoted by tf, is usually constant and predefined. One complete cycle
of the RFR consists of two flow-reversal periods. Overheating of the catalyst
and hot spot formation are avoided by a limited degree of cooling at the wall,
which is kept at constant temperature. This can be achieved by using a large
amount of cooling water that flows at a high rate along the outside of the reactor wall. A


schematic diagram of the reactor is shown in Figure 8.1.


Figure 8.1: Schematic drawing of the cooled reverse flow reactor.

Starting with an initial state, the reactor goes through a long transient
phase before converging to a periodic limiting state, also called the cyclic
steady state (CSS). Limiting states of periodically forced packed bed reactors
are of interest to the industry because the reactor operates in this situation
most of the time.
The basic model for a fixed bed catalytic reactor, such as the RFR, is
the so-called pseudo-homogeneous one-dimensional model. This model does
not differentiate between the fluid and the solid phase and considers gradients
in the axial direction only. Eigenberger and Nieken [19] have investigated a
simplified one-dimensional model. Due to a very short residence time of the
gas in the reactor, they assume the continuity equation and the mass balance
equation to be in quasi steady state when compared to the energy balance
equation. They apply standard dynamical simulation to compute the limiting
periodic states of the reverse flow reactor. Due to their choice of the model
and the values of the parameters all periodic states discovered are symmetric,
that is, the state after one flow reversal period is the mirror image of the initial
state.
Rehacek, Kubicek and Marek [57, 58] have extended the model of the RFR
to a two-phase model with transfer of mass and energy between the fluid and
solid phase. They consider the period map, that is, the map which assigns
the new state after one period of the process to an initial state. To obtain a
numerical expression of the period map, the authors discretize the partial dif-
ferential equations of the model in space and integrate the resulting system of
ordinary differential equations over one period. Again with dynamical simulation,
that is, iterating the period map, symmetric stable periodic states of the
RFR are obtained. In addition, they observe asymmetric and quasi-periodic
behavior.
Khinast, Luss et al. [34, 32, 33, 35] have developed an efficient method to
compute bifurcation diagrams of periodic processes. Their approach is based
on previous work of Gupta and Bhatia [26] in which the system of partial
differential equations is considered as a boundary value problem in time. The
boundary condition implies that the initial state of the reactor equals the state
at the end of the cycle and, therefore, has to be a fixed point of the period map,
as explained in more detail in Section 7.3. The method of Broyden is used
in combination with continuation techniques to find the parameter dependent
fixed points of the period map.
For steady state processes that have coefficients and boundary conditions
invariant in time, two-dimensional models are standard practice, see [56].
When modeling a steady state process, a time invariant state can often be
expressed as the solution of a system of ordinary differential equations, where
time derivatives are absent. For the theoretical analysis of limiting states of
steady state processes, a great number of efficient mathematical and numer-
ical tools is available. In periodically forced systems, such as the RFR, the
limiting solution varies in time. To our knowledge, full two-dimensional mod-
els for the RFR have never been solved using a direct iterative method, such
as the method of Broyden. The reason is that an accurate simulation requires
a fine grid, which yields a high-dimensional discretized system. Due to
large computational costs, both regarding CPU-time and regarding memory
usage, two-dimensional models of periodically forced systems have so far been
avoided, at the expense of relevance and accuracy.
The radial transport of heat and matter, however, is very important in
non-isothermal packed bed reactors [72]. A highly exothermic reaction, a large
width of the reactor, and efficient cooling of the reactor at the wall cause
radial temperature gradients to be present, see Figure 8.2(b). Clearly, for
cooled reverse flow reactors the radial dimension must explicitly be taken into
account.

8.2 The behavior of the reverse flow reactor


From the initial state to the CSS
As initial condition for the reverse flow reactor we take a preheated reactor
filled with an inert gas. In the computations we consider the dimensionless
temperature θ = (T − T0)/T0 and the conversion x = (c0 − c)/c0, where T0 is
the temperature and c0 is the concentration of the feed gas.

Figure 8.2: Qualitative temperature and conversion distribution of the cooled reverse
flow reactor in the cyclic steady state according to the two-dimensional model (6.26)-
(6.28) with the parameter values of Table 6.2. [(a) conversion, (b) temperature]

The dimensionless initial condition is set to

θ ≡ 1 and x ≡ 1.

We start the process and let the gas flow enter the reactor at the left end.
The feed gas contains a trace of the reactant A and is at low temperature.
When entering the hot reactor, the cold feed is heated up and comes into
contact with the catalyst. Therefore, the reaction occurs and the concentration
of species A decreases. Because the reaction is assumed to be exothermic, the
temperature increases and a reaction front is created. On the other hand, the
catalyst at the left side of the reactor is cooled due to the low temperature
of the feed gas. At the left side of the reaction front the temperature is too
low to activate the reaction. At the other side of the reaction front all of the
reactant has reacted and the conversion is complete.

In Figure 8.3 the state of the reactor is given at different times. The
reaction front can easily be distinguished. Because the reactor is cooled, the
temperature decreases at the right side of the reaction front.

After a period of time tf the feeding at the left end of the reactor is stopped
and the flow direction is reversed by feeding from the right end of the reactor.
Directly after this flow reversal, the hot reaction zone withdraws from the
right end and moves in the left direction. The amount of species A still present
in the left part is purged out of the reactor and after a short intermediate phase the
conversion of species A in the product gas is again equal to one. The product
gas during this intermediate phase is often considered as waste gas. Note that
the reaction front now occurs at the right side of the hot zone, see Figure 8.4.

Figure 8.3: Snapshots of the first reverse flow period of the reactor.

Figure 8.4: Snapshots of the second reverse flow period of the reactor.

By reversing the flow direction after a fixed period tf over and over again,
the hot reaction zone is trapped in the reactor. The state of the reactor after
many cycles depends on the conditions of the process. Clearly, when
the cooling capacity is too high the state extinguishes, because the reaction
cannot be sustained at low temperatures. If the reverse flow period is too
long, the reaction front exits the reactor. We describe the limiting state of the
reactor for different values of the cooling capacity and a moderate reverse flow
period.

Periodic, asymmetric and quasi-periodic states


We use dynamical simulation to determine the limiting state of the reactor
for different values of the dimensionless cooling capacity. Adiabatic operation
leads to periodic, symmetric states at which the temperature (and concen-
tration) profiles at the beginning and end of a flow-reversal period are mir-
ror images. We call these states symmetric period-1 operation. Laboratory
and pilot-plant RFRs usually cannot be operated in an adiabatic mode [43].
Moreover, in some applications involving equilibrium-limited reactions cooling
is applied to avoid exceeding some critical temperatures at which either un-
desired reactions or catalyst deactivation may occur. Various modes of RFR
cooling were described by Matros and Bunimovich [44]. Reactor cooling may
introduce some complex and rich dynamic features, which do not exist in its
absence. For example, under relatively fast flow-reversal frequencies the sym-
metric states of a cooled RFR may become unstable and either asymmetric or
quasi-periodic states may be obtained. Quasi-periodic behavior of the reactor
means that, in addition to the flow-reversal period (the forcing frequency), a
second period determines the overall behavior. Examples of the dimensionless
temperature profiles for these three types of states are shown in Figure
8.5. The differences in the dynamic features are caused by changes in the
dimensionless cooling capacity Φ, as defined in Table 6.2.

Figure 8.5: The limiting state of the reverse flow reactor at the switch of the flow
direction. (a) Φ = 0.332, (b) Φ = 0.324, (c) Φ = 0.3.

To illustrate the development of quasi-periodic behavior we consider the
maximum temperature of the reactor at the end of every flow-reversal period,
see Figure 8.6. After a transient phase of about 50 flow-reversal periods the
reactor reaches a quasi-periodic regime. The second frequency of the quasi-
periodic behavior of the reactor equals 45 flow-reversal periods.

Figure 8.6: The maximal temperature of the reactor at the switch of the flow direction,
for the first 500 flow-reversal periods, and the same picture starting after 420 flow-
reversal periods (Φ = 0.3).
We construct a corresponding Poincaré map by considering ∆θave(n) versus
θave(n), where we define

θave(n) = ∫_0^1 θ(z, ntf) dz,   n = 0, 1, 2, . . . ,   (8.1)

and

∆θave(n) = 2 ( ∫_0^{1/2} θ(z, ntf) dz − ∫_{1/2}^1 θ(z, ntf) dz ),   n = 0, 1, 2, . . . .   (8.2)

The value θave (n) is the average reactor temperature after the nth flow reversal
and ∆θave (n) is the corresponding averaged difference between the tempera-
tures in the right and left half of the reactor. Clearly, the sign of ∆θave (n)
changes upon alternating flow reversal. For symmetric period-1 states, the
Poincaré map consists of two points, both for the same θave (n) value. For
asymmetric period-1 states, the Poincaré map has two points, but not for the
same θave(n) values. In Figure 8.7 we have plotted the Poincaré map corre-
sponding to the quasi-periodic behavior of Figure 8.6. It consists of a set of
points forming two closed curves, thus indicating quasi-periodic behavior.
Each curve corresponds to one flow direction.
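
The quantities (8.1) and (8.2) can be approximated directly from the discretized temperature profile. The following Matlab fragment is a minimal sketch, assuming that theta contains θ(z, ntf) on a uniform axial grid with an odd number of points nz, so that z = 1/2 is a grid point; the trapezoidal rule replaces the integrals.

z = linspace(0, 1, nz)';               % uniform axial grid
m = (nz+1)/2;                          % grid index of z = 1/2
theta_ave  = trapz(z, theta);                        % (8.1)
dtheta_ave = 2*( trapz(z(1:m), theta(1:m)) ...
               - trapz(z(m:end), theta(m:end)) );    % (8.2)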

Figure 8.7: The Poincaré map of ∆θave(2k) versus θave(2k), representing the quasi-
periodic behavior of the reverse-flow reactor after the transient phase (2k ≥ 50).

8.3 Dynamic features of the full two-dimensional model
Before doing simulations with the two-dimensional model (6.26)-(6.28), we
simplify the problem in the following way. From the mathematical point of
view, it makes no difference whether the flow direction in the reactor is reversed
or the reactor itself is reversed while the fluid flows from the same direction.
Therefore we do not compute the state of the RFR after a whole cycle, but we
integrate the system over one flow-reversal period (tf) and then reverse the
reactor in the axial direction. So, instead of f(x0) = x(z/L, (tf u)/L), the period
map is given by

f(x0) = x((L − z)/L, (tf u)/L),   (8.3)

where L is the length of the reactor and u is the superficial velocity. The state
of the reactor after a whole cycle is then obtained by applying the map f
twice to the initial condition. A fixed point of f corresponds to a symmetric
periodic state of the reactor. If asymmetric periodic states exist, we can find
them by computing fixed points of the original period map. The only way to
determine whether the limiting state of the reactor is quasi-periodic is by using
dynamical simulation. In this section we restrict ourselves to the computation
of symmetric periodic states. A sketch of the reduced period map is given below.
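
For a one-dimensional axial discretization, the reduced period map can be sketched in Matlab as follows; the routine integrate, which integrates the discretized model equations over one flow-reversal period tf, is assumed to be given, and the names are illustrative.

f = @(x0) flipud( integrate(x0, tf) );  % period map (8.3): integrate
                                        % over tf, then reverse axially
x_cycle = f(f(x0));                     % state after one complete cycle
% a fixed point of f is a symmetric periodic state of the reactor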
We consider aspects of limiting periodic states of the RFR for different
values of the dimensionless reactor radius, denoted by R/L. The results are
expressed in the dimensionless temperature (T − T0 )/T0 and the conversion
(c0 − c)/c0 . We have fixed the flow-reversal period (tf = 1200s). As a bifurca-
tion parameter we use the dimensionless cooling capacity, defined by

Φ = 2LUw / (Ru(ρcp)g).

To obtain the results of this section the BRR method is used with p = 30.
The bifurcation diagrams, describing the dependence of the symmetric periodic
state of the reactor on the dimensionless cooling capacity, are constructed
using a standard continuation technique in combination with the BRR method.
Eigenvalues of the Jacobian Jf are determined using the subspace method with
locking [60].
We describe two different cases of the limiting periodic state for a fixed
value of the cooling capacity (Φ = 0.2). If the reactor is rather slim (for exam-
ple, R/L = 0.0025), we observe that the temperature is constant over every
cross section of the reactor, see Figure 8.8(b). In this way we can validate
the two-dimensional model. Indeed, according to the theory of Section 6.2, if
radial gradients are absent, the weighted average of the two-dimensional tem-
perature profile equals the temperature profile of the one-dimensional model.
This has been confirmed by simulations of the one-dimensional model. The
same observation is valid for the conversion, see Figure 8.8(a).

Figure 8.8: Axial temperature and conversion profiles of the RFR (in CSS) at the
beginning of a reverse flow period according to the two-dimensional model (6.26)-(6.28)
with the parameter values of Table 6.2. The cooling capacity Φ is fixed at 0.2 and the
radius of the reactor equals R/L = 0.0025. [(a) Conversion, (b) Temperature]

We use the same value for the cooling capacity (Φ = 0.2), but now with a
larger reactor width (R/L = 0.025). This implies that the cooling now prop-
agates less easily through the reactor and steep temperature gradients in the
radial direction arise. In Figure 8.9(b) we have represented the distribution of
the temperature over the catalyst bed in the cyclic steady state. For several
positions in the radial direction, the temperature profile along the reactor is
plotted. The lines with the highest temperatures correspond to radial posi-
tions near the axis of the reactor. The lines with the lowest temperatures
correspond to radial positions near the wall of the reactor. Clearly, the cool-
ing is especially influencing the temperature of the catalyst near the wall of
the reactor. Note that for different radial positions the axial position of the
maximum temperature is shifted. This results in a lower maximum of the
weighted average of the temperature. In Figure 8.9(a) the conversion of the
same cyclic steady state is given. The lines with the highest conversion cor-
respond to radial positions near the axis of the reactor. The lines with the
lowest conversion correspond to radial positions near the wall of the reactor.
Note that only around the axis the conversion is complete at the end of the
reactor. Therefore the product gas consists of a mixture of both products and
reactants, and on average the conversion is not complete.

Figure 8.9: Axial temperature and conversion profiles of the RFR (in CSS) at the
beginning of a reverse flow period according to the two-dimensional model (6.26)-
(6.28) with the parameter values of Table 6.2. The cooling capacity Φ is fixed at 0.2
and the radius of the reactor equals R/L = 0.025. In addition, the weighted average
(6.18) is given ('◦'). [(a) Conversion, (b) Temperature]

Two bifurcation branches are shown in Figure 8.10. The weighted average
(6.18) of the temperature is computed over every cross section. The maxi-
mum of these values is plotted versus the dimensionless cooling capacity Φ for
different values of R/L. It can be shown that, for every value of the cooling ca-
pacity, a stable extinguished state exists. For the slim reactor (R/L = 0.0025)
the maximum average temperature is always higher than for the wide reactor
(R/L = 0.025) at the same cooling capacity. This can be explained by the fact
that for the wide reactor, for different radial positions, the maximum of the
temperature is not found at the same axial position in the reactor. Note that
for the slim reactor there exists a minimum in the upper branch (at Φ ≈ 0.3).
The reason is that the two high temperature zones, cf. Figure 8.8(b), merge
into one. For cooling capacities higher than Φ ≈ 0.67, the reactor cannot op-
erate at high temperature and dies out. The part of the branch with negative
cooling capacity has of course no physical meaning. The bifurcation branch
for the wide reactor has more or less the same characteristics. However, the
minimum has disappeared and the upper branch has become monotonically
decreasing.
To determine the stability of the points on the bifurcation branches, we
have also plotted the largest Floquet multiplier (µmax) in Figure 8.10.

Figure 8.10: The maximum dimensionless temperature (θmax) and the largest Floquet
multiplier (µmax) versus the cooling capacity (Φ) for two different values of the reactor
radius. The two-dimensional model (6.26)-(6.28) was used with the parameter values
of Table 6.2. ['∗' (R/L = 0.0025), '◦' (R/L = 0.025)]

Starting with Φ = 0 at the upper branch of the bifurcation diagram, the largest
eigenvalue of the Jacobian at the fixed points is slightly less than +1, im-
plying that the fixed points are stable. At Φ ≈ 0.15 a negative eigenvalue
becomes the largest eigenvalue in modulus and crosses the unit circle at µ = −1 for
Φ ≈ 0.19, causing a symmetry loss bifurcation, that is, the symmetric state
becomes unstable and a stable asymmetric period-1 state emerges. For cooling
capacities higher than Φ ≈ 0.32 (Φ ≈ 0.48), the largest eigenvalue returns into
the unit circle but remains close to −1. Then the symmetric state is stable,
but it takes the reactor a large number of cycles to converge to this limiting
state. Finally, at the limit point, for which Φ ≈ 0.67 (Φ ≈ 0.65), a positive
eigenvalue crosses the unit circle at µ = +1. So, for higher cooling capacities
the cooling eventually causes extinction of the reactor. The fixed points of the
lower branches for both the wide and the slim reactor are unstable.
Notes and comments

Section 1.1
For more information on finite arithmetic see [18] and [20].
A clear and detailed introduction to quasi-Newton methods for solving non-
linear equations and optimization problems is given by Dennis and Schnabel
[18].
The proof of Theorem 1.2 is given in [18] where it is Theorem 2.4.3.
Lemma 1.3 is Lemma 2.4.2 of [18] and Lemma 1.4 is Corollary 2.6.2 of [18].
Theorem 1.5 is given without proof in [18] where it is Theorem 2.6.3.

Section 1.2
In [8] Broyden uses the mean convergence rate R given by

R = (1/m) log( ‖g(x0)‖ / ‖g(xm−1)‖ )

as the measure of efficiency of a method for solving a particular problem,
where m is the total number of function evaluations. In this thesis we divide
the logarithm by k∗ instead of m, which makes R infinite if k∗ = 0.
Theorem 1.10 is a simplification of Theorem 5.2.1 of [18], where it is as-
sumed that Jg ∈ Lipγ(N(x∗, r)) with N(x∗, r) ⊂ D, for some r.
Lemma 1.11 is a simplification of Lemma 4.1.12 of [18], where Jg is assumed
to be Lipschitz continuous at x only.
Theorem 1.12 is called the Banach perturbation theorem. Theorem 3.1.4
of [18] is a more general version of the perturbation theorem, where ‖·‖ can be
any norm on Rn×n that satisfies ‖AB‖ ≤ ‖A‖·‖B‖, A, B ∈ Rn×n, and ‖I‖ = 1.
The theorem is also given in [55].
Theorem 1.15 is Theorem 5.4.1 of [18].


Section 1.3

Lemma 1.20 is a special case of Lemma 8.1.1 of [18]. If the l2-operator norm
is used in (1.32) instead of the Frobenius norm, multiple solutions for A exist,
some clearly less desirable than Broyden's update.
Lemma 1.23 is a combination of Lemmas 4.1.15 and 4.1.16 of [18]. The
lemma is also given in [11] where it is Lemma 3.1.
Theorem 1.24 is a special case of Theorem 3.2 of [11], where instead of the
Frobenius norm a weighted matrix norm, denoted by ‖·‖M, is used. This very general
theorem of Broyden, Dennis and Moré was developed to extend the analysis
given by Dennis for Broyden's method [15] to other secant methods. The
theorem is in some sense considered as unsatisfying, because the initial Broy-
den matrix must be close to the Jacobian. For the limited memory Broyden
methods, where we choose B0 = −I in general, this assumption is not satis-
fied. However, all other convergence proofs of the quasi-Newton methods for
nonlinear equations are built on this result.
Corollary 1.25 is Corollary 3.3 of [11].
Theorem 1.26 is a particular case of Theorem 4.3 of [11]. The proof of the
theorem is simplified by using results of [18] and [16].
Lemma 1.27 is Lemma 8.2.3 of [18].
Lemma 1.28 is Lemma 2.2 of [16].
Lemma 1.29 is Lemma 8.2.5 of [18].
Theorem 1.30 can be found in e.g. [28].
For practical implementation, the method of Broyden has to be used in
combination with global algorithms. Well known approaches are for example
line search and the model-trust region approach, see Sections 6.3 and 6.4
of [18]. To obtain a more robust method, Broyden himself chose the finite-
difference approximation of the Jacobian for the initial estimate B0 and applied
a backtracking strategy for the line search, see [8].
An overview of many of the important theoretical results of secant methods
is given in e.g. [17, 42].
In 1970 Broyden [7] proved that his method converges R-superlinearly
on linear problems, and in 1971 he proved that the method converges
locally and at least linearly on nonlinear problems [9].
In 2000 Broyden wrote a short note on the discovery of the 'good
Broyden' method [10].

Section 2.1

The generalized Broyden’s method, Algorithm 2.1, is proposed by Gerber and


Luk in [23]. The algorithm was also published by Gay in [22], where in case
of yk = 0 the new inverse Broyden matrix Hk+1 was set to Hk . In Chapter 2
we only consider affine functions g(x) = Ax + b, where the matrix A ∈ Rn×n
is nonsingular. Therefore, yk = Ask and yk = 0 if and only if sk = 0.
Lemma 2.2 is a particular case of results derived in [23].
A full proof of Lemma 2.3 can be found in [22], where it is Lemma 2.1.
Theorem 2.4 is Theorem 2.2 of [22].
Lemma 2.7 is a slightly adjusted version of Lemma 3.1 of [54], that is
derived from Lemma 2.3 of [22].
In [22] Gay proved under which conditions Algorithm 2.1 requires the full
2n steps to converge.
As a result of the 2n-step exact convergence for linear systems, Gay proved
in [22] that the method of Broyden is 2n-step quadratically convergent for
nonlinear functions.

Section 2.2

Lemma 2.8 is Lemma 3.1 of [23], Lemma 2.9 is Lemma 3.2 of [23] and Lemma
2.10 is Lemma 3.3 of [23].
Theorems 2.11 and 2.12 are Theorems 3.1 and 3.2 of [23]. Note that
the condition (2.19) is unsatisfactory, since it has to be checked during the
process. We would like to sharpen Theorem 2.12 in the following way. If
dim Zk+1 = dim Zk − 1 and vkᵀwk = 0, then dim Zk+2 = dim Zk+1 − 1 and a
nonzero vector wk+1 ∈ Zk+1 ∩ Ker(I − AHk+1) exists that equals wk and
satisfies vk+1ᵀwk+1 = 0. This would imply that if w0 ≠ 0 and v0ᵀw0 = 0, the
method of Broyden needs d0 iterations to converge. Simulations confirmed
this conjecture, see Example 2.16.
Lemma 2.13 is Lemma 3.4 of [23].

Section 2.3

According to Lemma 2.18, we consider in the examples of Chapters 2 and 4 affine
functions g(x) = Ax, where A is in Jordan normal form; see [64] for more
details.

Section 3.1
In this thesis we only consider limited memory methods that are based on the
method of Broyden and are applicable to nonlinear functions with a general
nonsingular Jacobian. In 1970 Schubert [62] proposed a secant method
to solve nonlinear equations where the Jacobian is sparse and the locations
of the nonzero elements are known. In addition to the secant equation, he
imposes that the updated Broyden matrix has the same sparsity structure as
the Jacobian. In 1971 Broyden [9] investigated the properties of this
modified algorithm, both theoretically and experimentally. Toint has extended
this approach to quasi-Newton algorithms for optimization problems, cf. [65,
66, 67].
The Newton-Picard method was first proposed by Lust et al. [41]. The
algorithm applies the method of Newton on a small p-dimensional subspace
and dynamical simulation, Picard iteration, on the orthogonal subspace. The
small subspace is formed by the eigenvectors corresponding to the largest
eigenvalues in modulus of the Jacobian Jg at the current iterate xk . The p
eigenvectors can be computed using subspace iteration, see [60], avoiding the
usage of a large (n × n)-matrix in the algorithm.
A relatively new field of research is the Newton-Krylov method, cf. [6], that
is based on solving the Newton iteration step without computing the Jacobian
of the function explicitly. To approximate the Newton step, subspace iteration
is used. Derived from this idea, Tensor-Krylov methods provide even faster
algorithms, see [5].
Limited memory quasi-Newton methods for optimization problems have
been studied by e.g. Kolda, O’Leary and Nazareth [37], Liu and Nocedal [40],
Morales and Nocedal [45] and Nocedal [48].

Section 3.2
A good overview of singular values can be found in e.g. [27, 25].
The rank reduction applied in Algorithm 3.11 with q = p − 1 can also be
considered as an additional rank-one update. Let vp be the right singular vec-
tor corresponding to the pth singular value of the update matrix Q in iteration
p + 1. The new update matrix Q̃ satisfies Q̃vp = 0, and in all other directions
Q̃ has the same action as Q. This implies for the intermediate Broyden matrix
B̃ that

B̃vp = B0 vp,   and   B̃u = Bu for u ⊥ vp,

and therefore

B̃ = B + (B0 vp − Bvp) vpᵀ/(vpᵀvp) = B − Q vp vpᵀ/(vpᵀvp),

which is a rank-one update of B.
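
This interpretation is easily checked numerically. The following Matlab fragment is a small illustration, with arbitrary sizes, that removing the pth singular triple of the update matrix Q = CDᵀ coincides with the rank-one update above.

n = 6; p = 3;
C = randn(n,p); D = randn(n,p); B0 = -eye(n);
Q = C*D'; B = B0 + Q;
[U,S,V] = svd(Q,0); vp = V(:,p);
Qtilde = Q - U(:,p)*S(p,p)*V(:,p)';     % remove the p-th singular value
Btilde = B0 + Qtilde;
norm( Btilde - (B - Q*(vp*vp')/(vp'*vp)) )   % of the order of eps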
The condition (3.17) on the reduction matrix R is already suggested in
[11].

Section 3.4
The idea of Section 3.4 comes from an article by Byrd, Nocedal and Schnabel
[12], in which they derived short representation for different quasi-Newton
methods.
Lemma 3.20 is Lemma 2.1 of [12], Theorem 3.21 is Theorem 6.1 of [12] and
Theorem 3.22 is Theorem 6.2 of [12].
The scaling in Algorithm 3.23 is proposed by Richard Byrd.
In a limited context, using the notation of Section 3.4, the multiple secant
version of Broyden's update, see (1.25) and (1.26), is given by

Bk = B0 + (Yk − B0 Sk)(Skᵀ Sk)⁻¹ Skᵀ.   (8.4)

This update is well defined as long as Sk has full column rank, and obeys the
k secant equations Bk Sk = Yk.
Comparing (8.4) to the formula in (3.27) for k consecutive, standard Broy-
den updates, we see that in the multiple secant approach we use Skᵀ Sk, while
in (3.27) it is the upper triangular portion of this matrix, including the main
diagonal. Therefore, the two updates are the same if the directions in Sk are
orthogonal. The preference between these two formulas does not appear to be
clear cut. The formula (3.27) has the advantage that it is well defined for any
Sk, while (8.4) is only well defined numerically if the k step directions that
make up Sk are sufficiently linearly independent. If they are not, only some
subset of them can be utilized in a numerical implementation of the multiple
Broyden method. This is the approach that has often been taken in implemen-
tations of this update. On the other hand, (8.4) always enforces the k prior
secant equations while (3.27) only enforces the most recent equation. Thus
it would probably be worthwhile considering either method (or their inverse
formulations) in a limited memory method for solving nonlinear equations.
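
As a small illustration, the multiple secant update (8.4) can be formed and checked in a few lines of Matlab; the sizes below are arbitrary, and Sk is assumed to have full column rank.

n = 8; k = 3;
Sk = randn(n,k); Yk = randn(n,k); B0 = -eye(n);
Bk = B0 + (Yk - B0*Sk)*((Sk'*Sk)\Sk');  % update (8.4)
norm( Bk*Sk - Yk )    % verifies the k secant equations Bk*Sk = Yk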

Section 6.2
A comprehensive overview of chemical reactors and modeling techniques is
written by e.g. Scott Fogler [63] and Froment and Bischoff [21].

A clear introduction is given by Aris [3].

Section 7.1
Basics of discretization techniques are given in [59].

Section 7.2
An introduction to dynamical systems can be found in [4].

Section 7.3
For locating a bifurcation branch it is enough to approximate the points on
the branch up to an error of about 10⁻² during the continuation scheme. In
the neighborhood of bifurcation points the points on the branch might have
to be determined more accurately.
Van Noorden et al. [51, 52] compared several convergence acceleration
techniques (such as the method of Newton, the method of Broyden and the
Newton-Picard method) in combination with continuation techniques. From
their work it turns out that Broyden’s method is the most efficient for solving
large systems of nonlinear equations in terms of function evaluations.
An advanced adapted Broyden method that uses information of the con-
tinuation process to update the Broyden matrix is developed by Van Noorden
et al. [50].
Studies in continuation techniques and bifurcation analysis can be found
in work by e.g. Allgower, Chien and Georg [1] and Allgower and Georg [2].

Section 8.2
An extended investigation of the dynamical behavior of the reverse flow reactor
is given by Khinast et al. [33].
Recent studies of the reverse-flow reactor can be found in work by Glöckler,
Kolios and Eigenberger [24] and Jeong and Luss [31].
Bibliography

[1] E.L. Allgower, C.-S. Chien, and K. Georg. Large sparse continuation
problems. J. Comput. Appl. Math., 26:3–21, 1989.

[2] E.L. Allgower and K. Georg. Numerical continuation methods, volume 13
of Springer Series in Computational Mathematics. Springer-Verlag, Berlin,
1990. An introduction.

[3] R. Aris. Mathematical modelling techniques. Dover Publications Inc.,
New York, 1994. Corrected and expanded reprint of the 1978 original.

[4] D.K. Arrowsmith and C.M. Place. An introduction to dynamical systems.
Cambridge University Press, Cambridge, 1990.

[5] A. Bouaricha. Tensor-Krylov methods for large nonlinear equations. Com-
put. Optim. Appl., 5:207–232, 1996.

[6] P.N. Brown and Y. Saad. Convergence theory of nonlinear Newton-Krylov
algorithms. SIAM J. Optim., 4:297–330, 1994.

[7] C.G. Broyden. The convergence of single-rank quasi-Newton methods.
Math. Comp., 24:365–382, 1970.

[8] C.G. Broyden. A class of methods for solving nonlinear simultaneous
equations. Math. Comp., 19:577–593, 1965.

[9] C.G. Broyden. The convergence of an algorithm for solving sparse non-
linear systems. Math. Comp., 25:285–294, 1971.

[10] C.G. Broyden. On the discovery of the ’good Broyden’ method. Math.
Program., B 87:209–213, 2000.

[11] C.G. Broyden, J.E. Dennis, Jr., and J.J. Moré. On the local and su-
perlinear convergence of quasi-Newton methods. J. Inst. Math. Appl.,
12:223–245, 1973.


[12] R.H. Byrd, J. Nocedal, and R.B. Schnabel. Representations of quasi-
Newton matrices and their use in limited memory methods. Math. Pro-
gram., 63:129–156, 1994.

[13] B.T. Carvill, J.R. Hufton, M. Anand, and S. Sircar. Sorption enhanced
reaction process. AIChE J., 42(10):2765–2772, 1996.

[14] M.M. Davis and M.D. Levan. Experiments on optimization of thermal
swing adsorption. Ind. Eng. Chem. Res., 28:778–785, 1989.

[15] J.E. Dennis, Jr. On the convergence of Broyden’s method for nonlinear
systems of equations. Math. Comp., 25:559–567, 1971.

[16] J.E. Dennis, Jr. and J.J. Moré. A characterization of superlinear con-
vergence and its application to quasi-Newton methods. Math. Comp.,
28:549–560, 1974.

[17] J.E. Dennis, Jr. and J.J. Moré. Quasi-Newton methods, motivation and
theory. SIAM Rev., 19:46–89, 1977.

[18] J.E. Dennis, Jr. and R.B. Schnabel. Numerical methods for unconstrained
optimization and nonlinear equations, volume 16 of Classics in applied
mathematics. Society for Industrial and Applied Mathematics (SIAM),
Philadelphia, PA, 1996. Corrected reprint of the 1983 original.

[19] G. Eigenberger and U. Nieken. Catalytic combustion with periodic-flow
reversal. Chem. Eng. Sci., 43:2109–2115, 1988.

[20] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational dif-
ferential equations. Cambridge University Press, Cambridge, 1996.

[21] G.F. Froment and K.B. Bischoff. Chemical reactor analysis and design.
John Wiley & Sons Ltd., New York, 1990.

[22] D.M. Gay. Some convergence properties of Broyden's method. SIAM J.
Numer. Anal., 16:623–630, 1979.

[23] R.R. Gerber and F.T. Luk. A generalized Broyden's method for solving
simultaneous linear equations. SIAM J. Numer. Anal., 18:882–890, 1981.

[24] B. Glöckler, G. Kolios, and G. Eigenberger. Analysis of a novel reverse-
flow reactor concept for autothermal methane steam reforming. Chem.
Eng. Sci., 58:593–601, 2003.

[25] G.H. Golub and C.F. Van Loan. Matrix computations. Johns Hopkins
Studies in the Mathematical Sciences. Johns Hopkins University Press,
third edition, 1996.

[26] V.K. Gupta and S.K. Bhatia. Solution of cyclic profiles in catalytic reactor
operation with periodic-flow reversal. Comput. Chem. Eng., 15:229–237,
1991.

[27] R.A. Horn and C.R. Johnson. Matrix analysis. Cambridge University
Press, Cambridge, 1990. Corrected reprint of the 1985 original.

[28] A.S. Householder. Principles of numerical analysis, pages 135–138.
McGraw-Hill, New York, 1953.

[29] J.R. Hufton, S. Mayorga, and S. Sircar. Sorption-enhanced reaction pro-
cess for hydrogen production. AIChE J., 45:248–256, 1999.

[30] G. Iooss and D.D. Joseph. Elementary stability and bifurcation theory.
Undergraduate texts in mathematics. Springer-Verlag, New York, second
edition, 1990.

[31] Y.O. Jeong and D. Luss. Pollutant destruction in a reverse-flow chro-
matographic reactor. Chem. Eng. Sci., 58:1095–1102, 2003.

[32] J.G. Khinast, A. Gurumoorthy, and D. Luss. Complex dynamic features
of a cooled reverse-flow reactor. AIChE J., 44:1128–1140, 1998.

[33] J.G. Khinast, Y.O. Jeong, and D. Luss. Dependence of cooled reverse-flow
reactor dynamics on reactor model. AIChE J., 45:299–309, 1999.

[34] J.G. Khinast and D. Luss. Mapping regions with different bifurcation
diagrams of a reverse-flow reactor. AIChE J., 43:2034–2047, 1997.

[35] J.G. Khinast and D. Luss. Efficient bifurcation analysis of periodically-
forced distributed parameter systems. Comput. Chem. Eng., 24:139–152,
2000.

[36] A.J. Kodde and A. Bliek. Selectivity enhancement in consecutive reac-
tions using the pressure swing reactor. Stud. Surf. Sci. Catal., 109:419–
428, 1997.

[37] T.G. Kolda, D.P. O'Leary, and L. Nazareth. BFGS with update skipping
and varying memory. SIAM J. Optim., 8:1060–1083, 1998.

[38] Y.A. Kuznetsov. Elements of applied bifurcation theory, volume 112 of


Applied Mathematical Sciences. Springer-Verlag, New York, second edi-
tion, 1998.

[39] H.M. Kvamsdal and T. Hertzberg. Optimization of pressure swing ad-


sorption systems the effect of mass transfer during the blowdown step.
Chem. Eng. Sci., 50:1203–1212, 1995.

[40] D.C. Liu and J. Nocedal. On the limited memory BFGS method for large
scale optimization. Math. Program., 45:503–528, 1989.

[41] K. Lust, D. Roose, A. Spence, and A.R. Champneys. An adaptive
Newton-Picard algorithm with subspace iteration for computing periodic
solutions. SIAM J. Sci. Comput., 19:1188–1209, 1998.

[42] J.M. Martínez. Practical quasi-Newton methods for solving nonlinear
systems. J. Comput. Appl. Math., 124:97–121, 2000.

[43] Yu.Sh. Matros. Catalytic processes under unsteady state conditions. El-
sevier, Amsterdam, 1989.

[44] Yu.Sh. Matros and G.A. Bunimovich. Reverse-flow operation in fixed bed
catalytic reactors. Catal. Rev., 38:1–68, 1996.

[45] J.L. Morales and J. Nocedal. Automatic preconditioning by limited mem-
ory quasi-Newton updating. SIAM J. Optim., 10:1079–1096, 2000.

[46] J.J. Moré and M.Y. Cosnard. Numerical solution of nonlinear equations.
ACM Trans. Math. Soft., 5:64–85, 1979.

[47] J.J. Moré, B.S. Garbow, and K.E. Hillstrom. Testing unconstrained op-
timization software. ACM Trans. Math. Soft., 7:17–41, 1981.

[48] J. Nocedal. Updating quasi-Newton matrices with limited storage. Math.
Comp., 35:773–782, 1980.

[49] T.L. van Noorden. New algorithms for parameter-swing reactors. PhD
thesis, Vrije Universiteit, Amsterdam, 2002.

[50] T.L. van Noorden, S.M. Verduyn Lunel, and A. Bliek. A Broyden rank
p + 1 update continuation method with subspace iteration. To appear in
SIAM J. Sci. Comput.

[51] T.L. van Noorden, S.M. Verduyn Lunel, and A. Bliek. Acceleration of
the determination of periodic states of cyclically operated reactors and
separators. Chem. Eng. Sci., 57:1041–1055, 2002.

[52] T.L. van Noorden, S.M. Verduyn Lunel, and A. Bliek. The efficient com-
putation of periodic states of cyclically operated chemical processes. IMA
J. Appl. Math., 68:149–166, 2003.

[53] Numerical Algorithms Group (NAG). The NAG Fortran library manual,
Mark 20, 2003. Available from http://www.nag.co.uk/.

[54] D.P. O'Leary. Why Broyden's nonsymmetric method terminates on linear
equations. SIAM J. Optim., 5:231–235, 1995.

[55] J.M. Ortega and W.C. Rheinboldt. Iterative solution of nonlinear equa-
tions in several variables, volume 30 of Classics in applied mathematics.
Society for Industrial and Applied Mathematics (SIAM), Philadelphia,
PA, 2000. Reprint of the 1970 original.

[56] R.M. Quinta Ferreira and C.A. Almeida-Costa. Heterogeneous models
of tubular reactors packed with ion-exchange resins: Simulation of the
MTBE synthesis. Ind. Eng. Chem. Res., 35:3827–3841, 1996.

[57] J. Rehacek, M. Kubicek, and M. Marek. Modeling of a tubular catalytic
reactor with flow reversal. Chem. Eng. Sci., 47:2897–2902, 1992.

[58] J. Rehacek, M. Kubicek, and M. Marek. Periodic, quasiperiodic and
chaotic spatiotemporal patterns in a tubular catalytic reactor with peri-
odic flow reversal. Comput. Chem. Eng., 22:283–297, 1998.

[59] R.D. Richtmyer and K.W. Morton. Difference methods for initial-value
problems, volume 4 of Interscience tracts in pure and applied mathematics.
John Wiley & Sons Ltd., New York-London-Sydney, second edition, 1967.

[60] Y. Saad. Numerical methods for large eigenvalue problems, algorithms and
architectures for advanced scientific computing. Manchester University
Press, Manchester, 1992.

[61] Y. Saad and M.H. Schultz. GMRES: a generalized minimal residual al-
gorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist.
Comput., 7:856–869, 1986.

[62] L.K. Schubert. Modification of a quasi-Newton method for nonlinear
equations with a sparse Jacobian. Math. Comp., 24:27–30, 1970.

[63] H. Scott Fogler. Elements of chemical reaction engineering. Prentice Hall
PTR, third edition, 1999.

[64] J. Stoer and R. Bulirsch. Introduction to numerical analysis, volume 12 of
Texts in Applied Mathematics. Springer-Verlag, New York, third edition,
2002. Translated from the German by R. Bartels, W. Gautschi and C.
Witzgall.

[65] Ph.L. Toint. On sparse and symmetric matrix updating subject to a linear
equation. Math. Comp., 31:954–961, 1977.

[66] Ph.L. Toint. On the superlinear convergence of an algorithm for solving
a sparse minimization problem. SIAM J. Numer. Anal., 16, 1979.

[67] Ph.L. Toint. A sparse quasi-Newton update derived variationally with
a nondiagonally weighted Frobenius norm. Math. Comp., 37:425–433,
1981.

[68] B.A. van de Rotten and S.M. Verduyn Lunel. A limited memory Broy-
den method to solve high-dimensional systems of nonlinear equations.
Technical Report 2003-06, Universiteit Leiden, 2003.

[69] B.A. van de Rotten, S.M. Verduyn Lunel, and A. Bliek. Efficient sim-
ulation of periodically forced reactor in 2-d. Technical Report 2003-13,
Universiteit Leiden, 2003.

[70] H.A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant
of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci.
Statist. Comput., 13:631–644, 1992.

[71] H.A. van der Vorst and G.L.G. Sleijpen. Iterative Bi-CG type methods
and implementation aspects. In Algorithms for large scale linear algebraic
systems (Gran Canaria, 1996), volume 508 of NATO Adv. Sci. Inst. Ser.
C Math. Phys. Sci., pages 217–253. Kluwer Acad. Publ., Dordrecht, 1998.

[72] K.R. Westerterp, W.P.M. van Swaaij, and A.A.C.M. Beenackers. Chemi-
cal reactor design and operation. John Wiley & Sons Ltd., second edition,
1988.
Appendix A

Test functions

This appendix is devoted to a discussion of the test functions used to test
the different limited memory Broyden methods of Chapter 3. Because the
methods of Newton and Broyden are not globally convergent and the area of
convergence can be small, we have chosen some specific test functions, taken
from the CUTE collection, cf. [18, 47].

Discrete boundary value function

The two-point boundary value problem

u''(t) = (1/2)(u(t) + t + 1)³,   0 < t < 1,   u(0) = u(1) = 0,   (A.1)

can be discretized by considering the equation at the points t = ti, i = 1, . . . , n.
We apply the standard O(h²) discretization and denote h = 1/(n + 1) and
ti = i · h, i = 1, . . . , n. The resulting system of equations is given by

g(x) = 0,

where

gi(x) = 2xi − xi−1 − xi+1 + (h²/2)(xi + ti + 1)³,   i = 1, . . . , n,   (A.2)

for x = (x1, . . . , xn) and xi = u(ti), i = 1, . . . , n. The Jacobian of this function
has a band structure with on both sub-diagonals the value −1. The elements
on the diagonal of the Jacobian are given by

∂gi/∂xi = 2 + (3h²/2)(xi + ih + 1)²,   i = 1, . . . , n.


As initial condition we define the vector x0 by

x0 = (t1 (t1 − 1), . . . , tn (tn − 1)). (A.3)

The so-called discrete boundary value function was first used by Moré and
Cosnard to test the methods of Brent and of Brown [46]. In Figure A.1 we
have plotted the initial condition x0 and the zero x∗ of the function g.

Figure A.1: The initial condition x0 (dotted line) and the zero x∗ (solid line) of the
function g given by (A.2).
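
A minimal Matlab sketch of the function (A.2) reads as follows; it matches the calling convention feval(gcn, x, n) used by the routines of Appendix B, and the name dbv is illustrative.

function g = dbv( x, n )
%%% Discrete boundary value function (A.2).
h = 1/(n+1); t = (1:n)'*h;
xe = [0; x; 0];                        % boundary values u(0) = u(1) = 0
g = 2*xe(2:n+1) - xe(1:n) - xe(3:n+2) + (h^2/2)*(x + t + 1).^3;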

Discrete integral equation function

In the same article Moré and Cosnard also considered the discrete integral
equation function [46]. If we integrate the boundary value problem (A.1) twice
and apply the boundary conditions, we obtain the nonlinear integral
equation

u(t) + (1/2) ∫_0^1 H(s, t)(u(s) + s + 1)³ ds = 0,   (A.4)

where

H(s, t) = s(1 − t) if s < t,   and   H(s, t) = t(1 − s) if s ≥ t.

To discretize Equation (A.4), we replace the integral by an n-point rectangular
rule based on the points t = ti, i = 1, . . . , n. If we denote h = 1/(n + 1) and
ti = i · h, i = 1, . . . , n, the resulting system of equations reads

g(x) = 0,

where g(x) is given by

gi(x) = xi + (h/2) [ (1 − ti) Σ_{j=1}^{i} tj (xj + tj + 1)³
        + ti Σ_{j=i+1}^{n} (1 − tj)(xj + tj + 1)³ ],   (A.5)

for i = 1, . . . , n. Note that the Jacobian of the function g has a dense structure.
As in the case of the discrete boundary value function, we start with the initial
vector x0, given by

x0 = (t1 (t1 − 1), . . . , tn (tn − 1)). (A.6)
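
A corresponding Matlab sketch of the function (A.5), again following the calling convention feval(gcn, x, n) of Appendix B, could read as follows (the name die is illustrative); the cumulative sums avoid the double loop suggested by (A.5).

function g = die( x, n )
%%% Discrete integral equation function (A.5).
h = 1/(n+1); t = (1:n)'*h;
w = (x + t + 1).^3;
a = cumsum( t.*w );                    % sums over j = 1,...,i
c = (1-t).*w;
b = flipud(cumsum(flipud(c))) - c;     % sums over j = i+1,...,n
g = x + (h/2)*( (1-t).*a + t.*b );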

Extended Rosenbrock function

The extended Rosenbrock function g : Rn → Rn is defined for even n by

g2i−1(x) = 10(x2i − x2i−1²),   g2i(x) = 1 − x2i−1,   i = 1, . . . , n/2.   (A.7)

This implies that the equation

g(x) = 0

equals n/2 copies of a system in the two-dimensional space.

Note that the Jacobian of the extended Rosenbrock function is a block
diagonal matrix. The (2 × 2)-matrices on the diagonal are given by

( −20x2i−1   10 )
( −1          0 ).

The unique zero of (A.7) is given by x∗ = (1, . . . , 1), so that the Jacobian of g
is nonsingular at x∗ and has singular values approximately 22.3786 and 0.4469,
each with multiplicity n/2.
As initial vector x0 for the iterative methods we choose

(−1.2, 1, . . . , −1.2, 1).   (A.8)



Extended Powell singular function

The extended Powell singular function contains n/4 copies of the same function
in the four-dimensional space. Let n be a multiple of 4 and define the function
g : Rn → Rn by

g4i−3(x) = x4i−3 + 10 x4i−2,
g4i−2(x) = √5 (x4i−1 − x4i),
g4i−1(x) = (x4i−2 − 2 x4i−1)²,
g4i(x) = √10 (x4i−3 − x4i)²,   i = 1, . . . , n/4.   (A.9)

The unique zero of (A.9) is x∗ = 0. So, the Jacobian is a block diagonal matrix
with blocks

( 1                     10                    0                     0                    )
( 0                     0                     √5                    −√5                  )
( 0                     2(x4i−2 − 2x4i−1)     −4(x4i−2 − 2x4i−1)    0                    )
( 2√10(x4i−3 − x4i)     0                     0                     −2√10(x4i−3 − x4i)  ),

which is singular at the zero x∗.
The initial point x0 is given by

(3, −1, 0, 1, . . . , 3, −1, 0, 1).   (A.10)


Appendix B

Matlab code of the limited memory Broyden methods

We have implemented the codes of the iterative methods described in Chapters
1 and 3 in the computer languages Fortran and Matlab. The codes in Fortran
were used in order to apply the integration routines and matrix manipulation
routines of the Fortran NAG-library [53], as well as to compute the solutions
of large dimensional systems of equations (n ≥ 1000). The codes in Matlab
were used in order to obtain more insight into the Broyden matrices and the
update matrices, to manufacture plots of the Broyden matrices, the singular
values of the update matrices and the rate of convergence, as well as to present
the codes in a convenient manner.

The method of Broyden

We omit the codes of Newton's method, Algorithm 1.7, the Newton-Chord
method, Algorithm 1.16, and the Discrete Newton method, Algorithm 1.13,
and start with the plain method of Broyden, Algorithm 1.19, that forms the
basis of the codes of all the limited memory Broyden methods to come.

function [ x ] = ...
    broyden( gcn, x, B, n, imax, ieps, ifail )

%%% Initialisation %%%
g = feval( gcn, x, n ); ite = 0;
ne(ite+1) = sqrt( g'*g );

%%% Broyden iteration %%%
while ( ne(ite+1) > ieps ),
  %%% Broyden step %%%
  s = -B\g; ns = s'*s;
  x = x + s;
  y = feval( gcn, x, n ) - g; g = y + g;
  ite = ite + 1;
  ne(ite+1) = sqrt( g'*g );
  %%% Matrix update %%%
  B = B + ( y - B*s )*s'/ns;
end;

We are not only interested in the zero of the function g but also in the
convergence properties of the method. Therefore, we include extra output
parameters in the subroutine, such as the number of iterations 'ite' and the
residue at every iteration step 'ne'. In addition, the algorithm can get stuck at
several points. The reason for failure of the subroutine is returned in the variable
'ifail'. The local matrices and vectors are declared at the beginning of the
subroutine. The extended code for the method of Broyden reads
function [ x, ite, ne, ifail ] = ...
    broyden( gcn, x, B, n, imax, ieps, meps, ifail )

if ( ifail ~= 0 | imax == 0 ) ifail = 1; return, end;

disp( '# *** The method of Broyden ***' );

%%% Preallocation %%%
g = zeros(n,1); s = zeros(n,1); y = zeros(n,1);

%%% Initialisation %%%
g = feval( gcn, x, n ); ite = 0;
ne(ite+1) = sqrt( g'*g );

%%% Broyden iteration %%%
while ( ne(ite+1) > ieps ),
  if ( ne(ite+1) > meps(2) ) ifail = 4; break, end;
  if ( ite >= imax ) ifail = 2; break, end;
  %%% Broyden step %%%
  if ( rcond(B) < meps(1) ) ifail = 5; break, end;
  s = -B\g; ns = s'*s;
  if ( ns <= 0 ) ifail = 3; break, end;
  x = x + s;
  y = feval( gcn, x, n ) - g; g = y + g;
  ite = ite + 1;
  ne(ite+1) = sqrt( g'*g );
  %%% Matrix update %%%
  B = B + ( y - B*s )*s'/ns;
end;

If the residual becomes larger than a predefined value 'meps(2)', the process
is not expected to converge. Therefore the computation is stopped to avoid
overflow. The reciprocal condition number of the Broyden matrix B is estimated
by rcond(B) ≈ 1/(‖B‖·‖B⁻¹‖). The Broyden matrix is considered as approximately
singular if this reciprocal condition number is smaller than the machine precision,
stored in 'meps(1)'.

The general limited memory Broyden method

We indicated in Chapter 3 that the structure of all limited memory Broyden
methods is similar, except for Algorithms 3.15 and 3.23. The basis of the
limited memory Broyden methods as described in Algorithm 2.1 is given by
the following routine.

function [ x, ite, ne, ifail ] = ...
    lmb( gcn, x, C, D, n, p, q, m, imax, ieps, meps, ifail )

if ( ifail ~= 0 | imax == 0 ) ifail = 1; return, end;
if ( p < 1 | p > n ) ifail = 1; return, end;
if ( q < 0 | q > p-1 ) ifail = 1; return, end;
if ( m < 0 | m > p ) ifail = 1; return, end;
disp( '# *** The limited memory Broyden method ***' );

%%% Preallocation %%%
g = zeros(n,1); s = zeros(n,1); y = zeros(n,1);
B2 = zeros(p,p);

%%% Initialisation %%%
g = feval( gcn, x, n ); ite = 0;
ne(ite+1) = sqrt( g'*g );

%%% Broyden iteration %%%
while ( ne(ite+1) > ieps ),
  if ( ne(ite+1) > meps(2) ) ifail = 4; break, end;
  if ( ite >= imax ) ifail = 2; break, end;
  %%% Broyden step %%%
  B2 = eye(p) - D'*C;
  if ( rcond(B2) < meps(1) ) ifail = 5; break, end;
  s = C*( B2 \ (D'*g) ) + g; ns = sqrt( s'*s );
  if ( ns <= 0 ) ifail = 3; break, end;
  x = x + s;
  y = feval( gcn, x, n ) - g; g = y + g;
  ite = ite + 1;
  ne(ite+1) = sqrt( g'*g );
  if ( m == p )
    %%% Recomposition %%%
    %%% Reduction %%%
    m = q;
  end;
  %%% Matrix update %%%
  m = m + 1;
  C(:,m) = ( y + s - C(:,1:m-1)*D(:,1:m-1)'*s )/ns;
  D(:,m) = s/ns;
end;

The only part that has to be filled in is how the decomposition CDᵀ of
the update matrix is rewritten and which columns of the matrices C and D
are removed.
In the main program the subroutine 'lmb' is for example called in the
following way.

function program
%%% Preallocation and initialisation %%%
ieps = 1.0E-12;
meps = [ 1.0E-16; 1.0E20 ];
imax = 200;
n = 100;
p = 5;
C = zeros(n,p); D = zeros(n,p);
m = 0;
x0 = ones(n,1);
ifail = 0;
q = p-1;
[ x, ite, ne, ifail ] = ...
    lmb( 'gcn', x0, C, D, n, p, q, m, imax, ieps, meps, ifail );

Removing columns in normal format

The simplest way to create free columns in the (n × p)-matrices C and D is just
by setting p − q columns equal to zero. To satisfy the conditions imposed on
the limited memory Broyden methods, see Section 3.1, the nonzero columns
are stored in the first q columns of the matrices.
For example, we can remove the newest p − q updates of the Broyden
process by setting the last p − q columns of C and D to zero.

%%% Reduction %%%
C(:,q+1:p) = zeros(n,p-q);
D(:,q+1:p) = zeros(n,p-q);

The oldest p − q updates of the Broyden process are removed by storing
the last q columns of C and D in the first q columns and again setting the last
p − q columns of the new matrices C and D equal to zero.

%%% Reduction %%%
C(:,1:q) = C(:,p-q+1:p); C(:,q+1:p) = zeros(n,p-q);
D(:,1:q) = D(:,p-q+1:p); D(:,q+1:p) = zeros(n,p-q);

Removing columns in SVD-format


For the Broyden Rank Reduction method three additional (p × p)-matrices
have to be declared.
%%% P r e a l l o c a t i o n %%%
R = zeros ( p , p ) ; S = zeros ( p , p ) ; W = zeros ( p , p ) ;
Before the reduction is applied, the matrices C and D are written as
the singular value decomposition of the update matrix. For this the QR-
decomposition is computed of the matrix D and thereafter the SVD-decomposition
of C.
%%% Recomposition %%%
%%% QR-decomposition, R %%%
[D,R] = qr(D,0); C = C*R';
%%% SVD-decomposition, W %%%
[C,S,W] = svd(C,0); C = C*S; D = D*W;
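As a standalone illustration (not part of the subroutine itself), one can
verify that the recomposition leaves the update matrix CD^T unchanged:

n = 50; p = 5;
C = randn(n,p); D = randn(n,p);
UD = C*D';                              % update matrix before recomposition
[D,R] = qr(D,0); C = C*R';
[C,S,W] = svd(C,0); C = C*S; D = D*W;
norm(UD - C*D')                         % of the order of machine precision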
The smallest p − q singular values of the update matrix are removed by
setting the last p − q columns of C and D equal to zero.
%%% Reduction %%%
C(:,q+1:p) = zeros(n,p-q);
D(:,q+1:p) = zeros(n,p-q);
In order to remove the largest p − q singular values of the update matrix,
the last q columns of C and D are copied to the first q columns of both matrices
and subsequently the new last p − q columns of C and D are set equal to zero.
%%% Reduction %%%
C(:,1:q) = C(:,p-q+1:p); C(:,q+1:p) = zeros(n,p-q);
D(:,1:q) = D(:,p-q+1:p); D(:,q+1:p) = zeros(n,p-q);

Note that these reduction procedures have also been applied in the normal
format.

Removing the first columns in QL-format


The Broyden Base Reduction method is essentially the same as removing the
first columns of the matrices C and D in the normal format. Before we apply
the reduction, the matrix D is first orthogonalized using a QL-decomposition.
So, the (p × p)-matrix L has to be declared.
%%% Preallocation %%%
L = zeros(p,p);
The decomposition of the update matrix is rewritten in the following way.
%%% Recomposition %%%
%%% QL-decomposition, L %%%
[D,L] = ql(D); C = C*L';
The first p − q columns of C and D are removed in the same way as done
in normal format.

%%% Reduction %%%
C(:,1:q) = C(:,p-q+1:p); C(:,q+1:p) = zeros(n,p-q);
D(:,1:q) = D(:,p-q+1:p); D(:,q+1:p) = zeros(n,p-q);

For the subroutine ’ql’ we used the QR-decomposition routine of Matlab.


Let {d1, . . . , dp} be the columns of D. If the QR-decomposition of
[dp, . . . , d1] is given by
\[
\begin{bmatrix} d_p & \cdots & d_1 \end{bmatrix}
= \begin{bmatrix} \tilde{d}_p & \cdots & \tilde{d}_1 \end{bmatrix}
\begin{bmatrix} r_{11} & \cdots & r_{1p} \\ & \ddots & \vdots \\ & & r_{pp} \end{bmatrix},
\]
then we obtain
\[
\begin{bmatrix} d_1 & \cdots & d_p \end{bmatrix}
= \begin{bmatrix} \tilde{d}_1 & \cdots & \tilde{d}_p \end{bmatrix}
\begin{bmatrix} r_{pp} & & \\ \vdots & \ddots & \\ r_{1p} & \cdots & r_{11} \end{bmatrix}
=: \widetilde{D}L.
\]
So, the ’ql’-subroutine reads
function [Q,L] = ql(A);
%%% [Q,L] = QL(A) produces the "economy size" QL-decomposition.
%%% If A is m-by-n with m > n, then the first n columns of Q
%%% are computed. L is a lower triangular matrix.
[Q,L] = qr(fliplr(A),0);
Q = fliplr(Q);
L = fliplr(flipud(L));
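A small standalone check (illustration only, not part of the thesis code)
confirms the properties of this decomposition:

A = randn(8,3);
[Q,L] = ql(A);
norm(Q*L - A)        % zero up to machine precision
norm(Q'*Q - eye(3))  % the columns of Q are orthonormal
norm(L - tril(L))    % L is lower triangular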

Removing the last columns in QR-format


The Broyden Base Storing method computes the QR-decomposition of the
matrix D before it removes the last p − q columns of the matrices C and D.
Therefore, we declare the (p × p)-matrix R.
%%% Preallocation %%%
R = zeros(p,p);
Subsequently we rewrite the decomposition of the update matrix in the
following way.
%%% Recomposition %%%
%%% QR-decomposition, R %%%
[D,R] = qr(D,0); C = C*R';
The last p − q columns are removed in the same way as done in the normal
format.
%%% Reduction %%%
C(:,q+1:p) = zeros(n,p-q);
D(:,q+1:p) = zeros(n,p-q);

The inverse notation of Broyden’s method


For the inverse notation only the computation of the Broyden step s and the
update to the inverse Broyden matrix differ from the standard 'lmb'-subroutine.
So, the Broyden iteration reads
%%% Broyden iteration %%%
while (ne(ite+1) > ieps),
  if (ne(ite+1) > meps(2)) ifail = 4; break, end;
  if (ite >= imax) ifail = 2; break, end;
  %%% Broyden step: s = -H*g for the inverse notation H = -I + C*D' %%%
  s = g - C(:,1:m)*D(:,1:m)'*g; ns = sqrt(s'*s);
  if (ns <= 0) ifail = 3; break, end;
  x = x + s;
  y = feval(gcn, x, n) - g; g = y + g;
  ite = ite + 1;
  ne(ite+1) = sqrt(g'*g);
  if (m == p)
    %%% Recomposition %%%
    %%% Reduction %%%
    m = q;
  end;
  %%% Matrix update: rank-one correction (s - H*y)*(H'*s)'/(s'*H*y) %%%
  m = m + 1;
  C(:,m) = C(:,1:m-1)*D(:,1:m-1)'*y - y;    % equals H*y
  stHy = s'*C(:,m);
  D(:,m) = D(:,1:m-1)*C(:,1:m-1)'*s - s;    % equals H'*s
  nHts = sqrt(D(:,m)'*D(:,m));
  if (stHy == 0 | nHts <= 0) ifail = 3; break, end;
  C(:,m) = (s - C(:,m))/stHy*nHts;
  D(:,m) = D(:,m)/nHts;
end;
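As a standalone check (illustration only), one can verify that this rank-one
update indeed enforces the inverse secant condition H y = s:

n = 6; m0 = 2;
C = randn(n,m0); D = randn(n,m0);
s = randn(n,1); y = randn(n,1);
H = -eye(n) + C*D';                          % current inverse Broyden matrix
Hnew = H + (s - H*y)*(H'*s)'/(s'*H*y);
norm(Hnew*y - s)                             % zero up to machine precision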

All reduction methods discussed above are applicable to the limited mem-
ory inverse Broyden method.

The limited memory Broyden method proposed by Byrd et al.


This limited memory Broyden method has to be implemented in a rather
different setting. In fact, the update to the Broyden matrix is not computed
explicitly, but is inherent in the algorithm. For the sake of clarity, we
declare the matrix M2 instead of B2. The vectors s and y, on the other hand,
are not used.
%%% Preallocation %%%
g = zeros(n,1);
M2 = zeros(p,p);
For simplicity we only consider the case where the oldest p − q updates to
the Broyden matrix are removed if m = p. The complete Broyden iteration
reads

%%% Broyden iteration %%%
while (ne(ite+1) > ieps),
  if (ne(ite+1) > meps(2)) ifail = 4; break, end;
  if (ite >= imax) ifail = 2; break, end;
  %%% Broyden step and update %%%
  if (m == 0)
    D(:,1) = g;
  else
    M2 = zeros(m,m);
    for i = 1:m
      for j = 1:i-1
        M2(i,j) = -D(:,i)'*D(:,j);
      end;
    end;
    M2(1:m,1:m) = M2(1:m,1:m) - D(:,1:m)'*C(:,1:m);
    if (rcond(M2(1:m,1:m)) < meps(1)) ifail = 5; break, end;
    D(:,m+1) = ...
        (C(:,1:m) + D(:,1:m))*(M2(1:m,1:m)\(D(:,1:m)'*g)) + g;
  end;
  x = x + D(:,m+1);
  ns = sqrt(D(:,m+1)'*D(:,m+1));
  if (ns <= 0) ifail = 3; break, end;
  C(:,m+1) = feval(gcn, x, n) - g; g = C(:,m+1) + g;
  ite = ite + 1; ng = sqrt(g'*g); ne(ite+1) = ng;
  %%% Scaling %%%
  D(:,m+1) = D(:,m+1)/ns;
  C(:,m+1) = C(:,m+1)/ns;
  m = m + 1;
  if (m == p),
    %%% Reduction %%%
    C(:,1:q) = C(:,p-q+1:p); C(:,q+1:p) = zeros(n,p-q);
    D(:,1:q) = D(:,p-q+1:p); D(:,q+1:p) = zeros(n,p-q);
    m = q;
  end;
end;

The scaling is inserted to overcome a bad condition number for the matrix
M 2. In contrast with the matrix B2 the matrix M 2 is not invertible if m < p,
because we have declared M 2 as a (p × p)-matrix. Therefore, we use the
left-upper (m × m)-part of the matrix M 2.
Appendix C

Estimation of the model parameters

In the simulations of the two-dimensional model (6.26)-(6.28), we take the


same parameter values as used by Khinast, Jeong and Luss (1999), see Table
C.1. To compute the effective axial heat conductivity the following expression
is proposed
\[
\lambda_{ax} = (1-\varepsilon)\lambda_s + \lambda_g + \frac{u^2(\rho c_p)_g^2}{h\, a_v}.
\]
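As an illustrative check, assuming this dimensionally consistent reading of
the expression, λax can be evaluated with the parameter values of Table C.1:

%%% Illustrative evaluation of lambda_ax (values from Table C.1)
eps_b = 0.38;  lam_s = 0.0;  lam_g = 2.6e-4;  % [-], kW/(m K), kW/(m K)
rcp_g = 0.6244;                               % (rho*cp)_g, kJ/(m^3 K)
u = 1.0;  h = 0.02;  av = 1426.0;             % m/s, kW/(m^2 K), m^2/m^3
lam_ax = (1-eps_b)*lam_s + lam_g + u^2*rcp_g^2/(h*av)
%%% yields approximately 1.4e-2 kW/(m K)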

(ρcp)s    1382.0 kJ/(m³ K)     (ρcp)g   0.6244 kJ/(m³ K)         η         1
k∞        9.85·10⁶ s⁻¹         av       1426.0 m²surf/m³react    kc        0.115 m/s
h         0.02 kW/(m² K)       L        4.0 m                    ε         0.38
Tc = T0   323 K                ∆Tad     50 K                     Ea/Rgas   8328.6 K
Dax       3·10⁻⁵ m²/s          u        1.0 m/s                  tf        1200 s
λs        0.0 kW/(m K)         λg       2.6·10⁻⁴ kW/(m K)

Table C.1: Parameter values for the reverse flow reactor

In this appendix we derive appropriate values for the radial dispersion


Drad and for the radial heat conductivity λrad using correlation formulas of
Westerterp, van Swaaij and Beenackers [72]. The derived values of the radial
parameters are given in Table C.2. In our simulations we fix the flow reverse
time (tf = 1200 s).
When a fluid flows through a packed bed of solid particles with low porosity,
the variations in the local velocity cause a dispersion in the direction of the
flow. In not too short beds (i.e., L/dp > 10, where dp denotes the particle
size) this dispersion can be described by means of a longitudinal dispersion


coefficient, although in reality no back mixing occurs. The void spaces of a


packed bed can be considered as ideal mixers, and the number of voids is
roughly equal to
\[
N \sim \frac{L}{d_p}.
\]
Using the relation N = Pem /2 = uL/(2εDax ), the following expression for the
axial dispersion in packed beds, denoted by the Bodenstein number, can be
derived
\[
\mathrm{Bo}_{ax} = \frac{u\, d_p}{\varepsilon D_{ax}} = 2.
\]
To avoid large wall effects, it is assumed that dp/(2R) < 0.1. It is known that
the radial Bodenstein number, Borad ∼ udp /(εDrad ), approaches a value of 10
to 12 at Re > 100. This implies that the coefficient of transverse dispersion
Drad is about six times smaller than Dax .
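This ratio follows directly from the two Bodenstein numbers; an illustrative
computation (taking Borad = 12) reproduces the value of Table C.2:

Dax  = 3.0e-5;           % m^2/s, Table C.1
Boax = 2; Borad = 12;    % Bo = u*dp/(eps*D), so D scales as 1/Bo
Drad = Dax*Boax/Borad    % = 0.5e-5 m^2/s, cf. Table C.2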
Heat can be transported perpendicular to the main flow by the same mech-
anism if a transverse temperature gradient exists, resulting in a convective heat
conductivity λ0rad . Besides, heat transport occurs by thermal radiation between
the particles. The (isotropic) thermal conductivity of the bed is denoted by
λ0 . The total radial thermal conductivity is then given by

λrad = λ0 + λ0rad ,

where λ0 and λ0rad act fairly independently. For the convective heat conduc-
tivity the following correlation is given
\[
\lambda^0_{rad} = \frac{(\rho c_p)_g\, d_p\, u}{8\,[2 - (1 - d_p/R)^2]}.
\]

Note that under stagnant conditions, we have that λ0rad = 0 and that the
radial heat dispersion coefficient equals the thermal conductivity. If the heat
diffusion through the solid particles can be neglected (that is, λs = 0) the
following expression is valid for λ0 ,

λ0 = 0.67 · λg · ε,

provided 0.26 < ε < 0.93 and T < 673 K. Using the parameter values given in
Table C.1, we arrive at the following expression for the radial heat conductivity
\[
\lambda_{rad} = \lambda^0 + \lambda^0_{rad}
= 6.6\cdot10^{-5} + \frac{0.6244\cdot d_p\cdot 1.0}{8\,[2-(1-d_p/R)^2]}
= 6.6\cdot10^{-5} + 7.81\cdot10^{-2}\,\frac{d_p}{2-(1-d_p/R)^2}.
\]

If we choose the particle diameter to be dp = 1.0·10⁻³ m and take R in the
range from 0.01 m to 0.1 m, then the radial heat conductivity varies from
1.32·10⁻⁴ to 1.43·10⁻⁴ kW/(mK). Therefore, we fix the value at
λrad = 1.4·10⁻⁴ kW/(mK).
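This computation can be reproduced with a few lines of Matlab (illustration
only):

dp = 1.0e-3;                                  % particle diameter, m
for R = [0.01 0.1]                            % the two extreme radii, m
  lam_rad = 6.6e-5 + 7.81e-2*dp/(2 - (1 - dp/R)^2)
end
%%% yields approximately 1.32e-4 and 1.43e-4 kW/(m K)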

dp   1.0·10⁻³ m      Drad   0.5·10⁻⁵ m²/s      λrad   1.4·10⁻⁴ kW/(m K)

Table C.2: The values of the radial parameters for the two-dimensional model of the
reverse flow reactor

In the computations of Chapters 5 and 8 we have used the dimensionless
equations for the one-dimensional model, (6.23)-(6.25), and the two-
dimensional model, (6.26)-(6.28). The corresponding dimensionless parameters
of Table 6.2 are computed using the values of Tables C.1 and C.2.

Symbols
In Section 6.2 we have used the following symbols.

Roman
av specific external particle surface area, m2surf /m3reactor
aw specific reactor wall surface area, m2wall /m3reactor
c, C concentration, kmol/m3
D dispersion coefficient, m2 /s
dp particle diameter, m
Ea activation energy, kJ/kmol
h heat-transfer coefficient, kW/(m2 K)
kc mass-transfer coefficient, m/s
k∞ frequency factor for reaction, s−1
L reactor length, m
r radial distance, m
R radius of the reactor, m
Rgas universal gas constant, kJ/(kmol K)
u superficial gas velocity, m/s
Uw heat-transfer coefficient at reactor wall, kW/(m2 K)
t time, s
tf flow reverse time, s
T temperature, K
Tc (T0 ) cooling(feed) temperature, K
z axial distance, m

Greek
−∆H heat of reaction, kJ/kmol
∆Tad adiabatic temperature rise, K
ε void fraction, [−]
η effectiveness factor, [−]
λ0 (isotropic) thermal conductivity of the bed, kW/(m K)
λ0rad convective heat conductivity, kW/(m K)
λ thermal conductivity, kW/(m K)
(ρcp ) volumetric heat capacity, kJ/(m3 K)
Φ dimensionless cooling capacity, [−]

Dimensionless parameters
Bo Bodenstein number
Pe Péclet number
Pr Prandtl number
Re Reynolds number

Subscripts
ax axial direction
rad radial direction
g gas phase
s solid phase
Summary
(Why Broyden?)

Mathematics is one of the oldest sciences in the world, but it still plays a
prominent role in present-day scientific research. Consider the following
examples.

• An ecological study of the pollution caused by a factory that discharges
its waste into a bay with an open connection to the sea.

• The behaviour of peat soil under the influence of day and night.

• The condition of the cartilage in the wrist under repeated loading.

• Periodically forced processes in chemical reactors.

What these processes have in common is that, from a mathematical point of
view, they are essentially the same: they are described by partial
differential equations with time-dependent parameters and boundary
conditions. We are mainly interested in what a system does after a (long)
time. Since the conditions of the process are periodic in time, we expect
the same for the eventual state of the system. In that case we say that the
system is in a cyclic steady state.

We consider a variable, say x, for example the concentration of the toxic
substance in the water of a bay, or the temperature of the reactor. This
variable depends on position and time. In simple cases the partial
differential equations can still be solved. But when more variables play a
role in the process, or when the mechanism becomes more complicated, this is
no longer feasible and we have to find a solution in a different way. In
recent decades we have been able to call in the help of the computer. To
this end we first have to adapt the partial differential equations so that
the computer can handle them at all. We discretize the equations on a grid,
that is, we divide the spatial domain into small blocks and assume that the
variable is constant on each block. Instead of the partial differential
equations we now have a large system of n ordinary differential equations,
one equation, depending on x, for every block in space. We define the period
map f : Rn → Rn as the function that maps the state at the beginning of a
period to the state at the end of the period. To obtain the period map, we
have to integrate the system of ordinary differential equations over one
period. The cyclic steady state is thus a fixed point of the period map and
a zero of the function g(x) = f(x) − x. The final equation that we have to
solve becomes

g(x) = 0.

The challenge now is to use ever more detailed models in order to describe
the processes better. Moreover, it may be necessary to use a finer grid,
that is, more grid points. The dimension n of the discretized problem
thereby becomes larger, and efficient methods are needed to solve g(x) = 0.

In applications the method of Broyden is popular. This method starts from an
initial estimate x0 for the zero x∗ of the function g. By means of an
iterative scheme a sequence of iterates {xk} is computed that converges to
the solution x∗. Here a matrix Bk is used that approximates the Jacobian
(the derivative) of the function g at the iterate xk. The Broyden matrix is
updated every iteration by adding a rank-one matrix to it. Only one function
evaluation is carried out per iteration. The method of Broyden turns out to
be particularly suitable for problems arising from periodic processes.

A drawback of the method of Broyden is that the (n × n)-matrix Bk has to be
stored. This can become a problem if the model gets too large. The question
is therefore whether we can store the Broyden matrix in an efficient way.
After an extensive study of the method of Broyden, the simulations of which
can mainly be found in the second part of this thesis, we have found a
solution. By approximating the Broyden matrix (itself an approximation), we
manage to store the matrix using 2pn elements instead of n² elements. The
parameter p is fixed in advance, and the ideal value of p is determined by
properties of the function g and not by the dimension n of the discretized
problem. It turns out that in many cases p can be chosen small. The method
we have developed is called the Broyden Rank Reduction method. As an
additional advantage, the large n-dimensional computations needed for the
method of Broyden turn into small p-dimensional computations.

We have proved under which conditions the Broyden Rank Reduction method is
as fast as the original method of Broyden. This is the main result of the
first part of this thesis.

The motivation for developing the Broyden Rank Reduction method was a
problem from reactor engineering. This problem is worked out in detail in
the last part of this thesis.
The reverse flow reactor is a cylindrical tube filled with catalyst
particles through which a gas is sent. This gas contains a reactant that,
when it comes into contact with the catalyst, reacts to form a product. We
assume that the reaction is exothermic, that is, heat is released. Because
the reaction only takes place if the temperature is high enough, we first
heat up the reactor before starting the process. If we now let cold gas (at
room temperature) into the reactor, the gas heats up and the reaction takes
place. This has two effects. At the location where the reaction takes place,
a reaction front is formed and the temperature rises. This reaction front is
subsequently pushed through the reactor and will, if we do not change the
conditions of the process, leave the reactor. The whole reactor is then at
room temperature and no reaction can take place anymore. We can prevent this
by operating the reactor in the opposite direction before the reaction front
has left it: we then let the gas in at the right end and collect the product
at the left end. As a result the reaction front moves back to the left.
Since a lot of energy is released by the reaction and the reactor is cooled
at the wall, temperature gradients arise in the radial direction. We would
therefore like to describe the reactor by a two-dimensional model, with the
concentration of the reactant and the temperature as variables. If we take
100 grid points in the axial direction and 25 grid points in the radial
direction for the discretization, the dimension of the discretized problem,
n, equals 2 · 100 · 25 = 5000. It turns out that the equation g(x) = 0 can
no longer be solved by the method of Broyden, since, in addition to all
other matrices and vectors, a Broyden matrix with 25,000,000 elements would
have to be stored. The Broyden Rank Reduction method, however, can be
applied, for example with p = 20 or p = 10, for which 200,000 and 100,000
elements, respectively, have to be stored. The method converges equally fast
for both values of p, while the memory usage is halved. The parameter p can
even be chosen equal to 5 (again halving the memory usage) at the cost of a
few extra iterations. A periodic state of the reverse flow reactor with
temperature gradients in the radial direction can now, for the first time,
be computed efficiently with the Broyden Rank Reduction method.
Acknowledgements

This thesis would never have been completed without the contribution and
support of many friends, colleagues and acquaintances. I think in particular
of the people who were directly involved in the realization of this thesis.
Several members of the doctoral committee have, through their remarks and
questions, made the presentation of the results clearer and reduced the
number of errors and inaccuracies.

Financial support for visits to conferences and other institutes was
provided by the Leids Universiteits Fonds, Shell and NWO through the Pioneer
project. The Mathematisch Instituut gave me every freedom to carry out my
research undisturbed and, moreover, to prepare myself for my future work in
front of the classroom. For four years I have felt at home at two completely
different institutes. Both in Leiden and at the Instituut voor de Technische
Scheikunde in Amsterdam there was always someone to ask for a solution to a
problem (not necessarily concerning my research) or to whom I could proudly
show my latest results (of my cat Siep, for example, see the introduction).

In the last year of my doctoral research I had the opportunity to visit
Colorado State University for a month at the invitation of Don Estep. It was
a great challenge for me to discuss the results of my research with him and
his colleagues. It was a pleasure to discover the various cycling routes in
and around Fort Collins.

From the beginning of my doctoral research I have benefited greatly from the
conversations with Tycho, who in essence preceded me in this research. Both
his clear ideas and his ability to put things into perspective have helped
me a lot. The basic idea of the Broyden Rank Reduction method is partly due
to him.

The time I shared an office with Miguel was pleasant and productive. The
blackboard in our office was used intensively, because explaining a problem
to the other was often already enough to suddenly see the solution oneself.
His knowledge of LaTeX has certainly benefited this thesis.

Outside the university, too, many people have helped me complete my
doctorate, often without even being aware of it. The comments of Bertram and
Luuk have improved the readability of the beginning and the end of this
thesis. The unconditional support of my parents, my brother and my sister is
of inestimable value to me.

Désirée is the reason for me to always persevere.
Curriculum Vitae

Bart van de Rotten was born on 20 October 1976 in Uithoorn. At the age of
seven he was the last of his class to obtain the certificate for
multiplication. This false start was quickly made up for, however, and he
managed to complete all the arithmetic assignments of primary school. One of
his teachers even tipped him as a future director of IBM. At secondary
school his interest in mathematics continued to grow, partly owing to the
enthusiasm of his mathematics teacher. From the third year of secondary
school until the last year of university he tutored pupils in mathematics
and physics. He completed gymnasium with a (rounded) grade of ten for the
final examinations in mathematics A, mathematics B and physics.

On 3 July 1995 he received his diploma, and in the autumn of that same year
he started the mathematics programme at the Vrije Universiteit in Amsterdam.
In the summer of 1998 he began a specialization in operator theory under the
supervision of dr. A.C.M. Ran. To write his master's thesis, 'Invariant
Lagrangian subspaces of infinite dimensional Hamiltonians and the Ricatti
equation', he left in October 1998 for four months for the Technische
Universität Wien, where he was a guest of prof. dr. H. Langer, an expert in
the field of operator theory. On 25 August 1999 he graduated cum laude.

His interest shifted from pure to more applied mathematics. With prof. dr.
S.M. Verduyn Lunel and prof. dr. A. Bliek (Universiteit van Amsterdam) he
carried out his research on solving large systems of nonlinear equations
arising from models of periodically forced chemical reactors, of which this
thesis is the result. During this period he taught exercise classes in
analysis and numerical mathematics. In addition, he attended conferences in
Lunteren, Wageningen, Hasselt and Montreal, gave talks on several occasions
in Leiden, Utrecht and Amsterdam, took part in a modelling week in Eindhoven
and followed, among other things, a course in Delft. As the highlight of his
doctoral research he visited Colorado State University in the spring of
2003, where it was possible to deepen his research with the help of the
expertise of prof. dr. D. Estep and his group. He will shortly return to the
Vrije Universiteit to be trained as a mathematics teacher.
