MAIN ARTICLE

Using white-box nonlinear optimization


methods in system dynamics policy
improvement
Ingmar Vierhaus,a* Armin Fügenschuh,a Robert Gottwaldb and
Stefan Grösserc

Abstract
We present a new strategy for the direct optimization of the values of policy functions. This
approach is particularly well suited to model actors with a global perspective on the system and
relies heavily on modern mathematical white-box optimization methods. We demonstrate our
strategy on two classical models: market growth and World2. Each model is first transformed
into an optimization problem by defining how the actor can influence the models’ dynamics and
by choosing objective functions to measure improvements. To improve comparability between
different runs, we also introduce a comparison measure for possible interventions. We solve the
optimization problems, discuss the resulting policies and compare them to the existing results
from the literature. In particular, we present a run of the World2 model which significantly
improves the published “towards a global equilibrium” run with equal cost of intervention.
Copyright © 2018 System Dynamics Society

Syst. Dyn. Rev. 33, 138–168 (2017)

Additional Supporting Information may be found online in the supporting information tab for
this article.

Introduction

System dynamics (SD) models describe the behavior of complex systems over time using several interrelated stocks, flows, feedback loops and delays
(Schwaninger and Grösser, 2008; Richardson, 2011). SD models are generally
developed, first, to understand existing dynamics in systems, and second, to
improve implemented policies, for example in project management (Ford
and Sterman, 1998), in the energy consumption of a residential built envi-
ronment (Grösser, 2014), in city development (Forrester, 1969), in innovation
management (Repenning et al., 2001) or in strategic management (Groesser
and Jovy, 2016; Rahmandad and Repenning, 2016). SD models consist of
ordinary differential equations, nonlinear functional relations and table data.

a Helmut Schmidt University/University of the Federal Armed Forces Hamburg, Holstenhofweg 85, 22043 Hamburg, Germany
b Zuse Institute Berlin, Takustraße 7, 14195 Berlin, Germany
c Bern University of Applied Science, School of Engineering, Quellgasse 10, 2105 Biel, Switzerland
* Correspondence to: Ingmar Vierhaus. E-mail: vierhaus@zib.de
Accepted by Andreas Größler, Received 21 July 2016; Revised 23 May 2017, 22 September 2017 and
20 October 2017; Accepted 29 October 2017

System Dynamics Review vol 33, No 2 (April/June 2017): 138-168
Published online in Wiley Online Library
(wileyonlinelibrary.com) DOI: 10.1002/sdr.1583


Even if each of the elements in such a system is individually well understood, the interplay between several of these elements may show a surprising, unexpected behavior over time (Forrester, 1971a).
Policies are a basic and important concept of SD modeling. We interpret a
policy as a model for an actor in the system. This actor influences the dynam-
ics of the system based on some input information. An example of an actor
that we have in mind is a manager, who decides on the expansion or
reduction of the production capacities of his company. Thus a policy is the
decision rule which specifies how the actor selects and uses available infor-
mation to make decisions in order to achieve goals (Sterman, 2000). The pol-
icy normally depends on the variables and information that are directly
available to this actor. It is said that such an actor has a local, limited view.
In other words, he exhibits bounded rationality (Simon, 1984). Questions reg-
ularly arise concerning whether a given policy can be improved, or even
what a “good” policy actually is. In this context, the need for efficient com-
putational methods for policy analysis as well as policy improvement and
design has been recognized in SD (see, for example, Keloharju and
Wolstenholme, 1988; Yücel and Barlas, 2011), and is an active field of
research. In terms of SD modeling, a policy is represented by a functional
relationship. The function depends on the model variables that correspond
to the information that is available to the actor. The value of the function
then represents the actor’s decision and influences a flow in the model. Such
functional relationships are often defined in terms of table functions. For
simplicity, in the remainder of this paper we will refer to such a function as a
policy function. When developing a simulation model, the modeling step of
“policy formulation and evaluation” also compares the performance of two
or more candidate policies (Sterman, 2000). When two simulations with dif-
ferent policies lead to different behaviors of the system, one has to evaluate
which of the two simulations is more suitable or “better” for a given model
purpose. To answer this question, one needs to define an objective function
such that the higher the value of the objective function for a given simulation
(or lower, in case of minimization), the better the corresponding policy
(Dangerfield and Roberts, 1996). Once an objective function is defined, there
are several approaches to computer-aided policy improvement.
Direct parameter policy design starts with the definition of an analytic,
parametrized and usually nonlinear policy function (Keloharju and
Wolstenholme, 1989). The parameters of this function are set to starting
values, and for each parameter a range (an interval) of valid values is
defined. In a common SD simulation, the values of all variables are
completely determined by the model equations and the initial values. To
optimize in an SD model, a set of variables must be free, that is, not deter-
mined by the model equations already. These “free variables” can be varied
by the optimization algorithm, to achieve the desired objective. A solution of
an SD optimization problem consists of one value assigned to each of the

free variables. In the case of direct parameter policy design, the parameters
of the policy function are the free variables. Consequently, the goal of the
policy improvement is to find a set of parameter values within the given
range that defines a policy function, which in turn improves the value of the
objective function. We call the set of all possible solutions of an optimization
problem the search space. In direct parameter policy design, the search
space is defined by the parameters of the policy function. Consider the
S-shaped logistic function $S(x) = 1/(1 + a e^{-x})$ as an example. The function is parametrized by a single parameter a. If the goal is to find the best value for this
parameter, the search space of the corresponding optimization problem is
only one dimensional and the problem would be easy to solve. On the other
hand, only S-shaped policy functions can be the result of such an optimiza-
tion. The expectations of the modeler on the shape of the policy limit the
possible results. By “modeler,” we refer to the person developing the model
and within this process also describing the actor. If a software package offers
parameter optimization capabilities, it is usually possible to attempt the
solution of such direct parameter policy design problems. In Yücel and
Barlas (2011) the “pattern-oriented parameter specifier (POPS)” routine was
presented, which aims to find parameter settings that produce a desired pat-
tern of behavior in one of the model variables. In this routine, the resulting
optimization problem is solved using a genetic algorithm. The core idea here is optimization by repeated simulation, one of the conventional approaches to SD optimization (Liu et al., 2012). For the optimization algorithm, the under-
lying model is a black box, which receives the current parameter values and
then returns the objective value after carrying out one single simulation run
of the model. Such black-box approaches have the advantage that any model
that can be simulated can also be optimized since there are no requirements
on the properties of the model equations. Examples of the application of
black-box optimization to SD can be found, for instance, in Sterman (2000).
However, approaches using repeated simulation suffer from the “curse of
dimensionality” (Bellman, 2003), where the significant dimension is that of
the space of free variables. An additional free variable adds a dimension to
the optimization algorithm’s search space. Solving optimization problems
with a large number of free variables therefore quickly becomes impractical.
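
To make the black-box loop concrete, here is a minimal sketch (our illustration, not code from the paper): a toy one-stock simulation is wrapped as an opaque objective, and the single parameter a of the logistic policy above is searched by repeated simulation. The model equations, bounds and time step are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def simulate(a, steps=100, dt=0.1):
    """Toy stand-in for a full SD run: one stock whose inflow is shaped
    by the logistic policy S(x) = 1 / (1 + a*exp(-x)). Hypothetical model."""
    x = 1.0
    for _ in range(steps):
        policy = 1.0 / (1.0 + a * np.exp(-x))  # policy value from current state
        x += dt * (policy - 0.05 * x)          # Euler step: inflow minus outflow
    return x                                   # objective: final stock level

# The optimizer sees only parameter in, objective out -- a black box.
res = minimize_scalar(lambda a: -simulate(a), bounds=(0.1, 10.0), method="bounded")
print("best a:", res.x, "final stock:", -res.fun)
```

Each trial costs one full simulation, and adding a second free parameter would already square the number of trials a grid-like search needs, which is the curse of dimensionality in miniature.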
Table function policy design is a possible way to generalize direct parame-
ter policy design, by defining a parametrized table function instead of an
analytic function (Keloharju and Wolstenholme, 1989). In this case, the mod-
eler has to define the number of data points of the table function and two
intervals that define valid values of the data points on the x- and y-axes. This
approach removes the modeler’s expectations of the shape of the policy from
the process. However, the possible policies are reduced to the space of the
piecewise linear functions with the selected number of points. If the data
points are required to have a predefined distance on the y-axis, the possible


solutions are reduced further, but the number of parameters and thus the
number of free variables decrease. As in the previous case, the goal of the
policy improvement is to find parameter values (i.e. data points of the table
function) that improve the value of the objective function. A software pack-
age that supports table function policy design is the Powersim Studio plug-
in SOPS (Moxnes and Krakenes, 2005).
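
As a sketch of the idea (our illustration; the breakpoints and values are hypothetical, not SOPS code), a table function policy can be represented by fixing the x-breakpoints and letting the optimizer vary the y-values:

```python
import numpy as np

X_GRID = np.array([0.0, 1.0, 2.0, 3.0])   # fixed breakpoints on the x-axis

def table_policy(x, y_points):
    """Piecewise-linear table function: the free variables of the
    optimization are the y-values at the fixed breakpoints."""
    return np.interp(x, X_GRID, y_points)

y_candidate = np.array([0.0, 0.3, 0.7, 1.0])  # one point in the search space
print(table_policy(1.5, y_candidate))          # decision for input 1.5 -> 0.5
```

The search space here has four dimensions, one per data point, regardless of the shape the resulting policy takes.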
The policy function is a model about what information an actor uses to
make decisions in a system. If the actor has only a bounded view of the sys-
tem, then the policy will only depend on the variables and information that
are available to this actor (Sterman, 2000). An improved policy will enable
this actor to make better decisions based on the limited information available
to him. Recent work has focused on improving policies for such actors,
using, for instance, co-evolutionary analysis (Liu et al., 2012). We will con-
sider a different kind of actor. Our actor has a global perspective on the
model, i.e. he or she has information on all the state variables at all times
within the simulation time horizon. Modeling a policy of such a comprehen-
sively aware actor with the conventional approaches to policy analysis is a
difficult endeavor. One way would be to define a table function for each
state, which depends only on that state. A mixed policy function that
depends on all states can then be defined as a sum of these table functions
(Keloharju and Wolstenholme, 1989). As a consequence of the “curse of
dimensionality,” the degrees of freedom of a mixed policy function are lim-
ited from a practical perspective, if an optimization of the policy by repeated
simulation is attempted.
We follow a different route and directly optimize the values of the policy
function. This is equivalent to defining the policy as a time-dependent table
function with one data point for each discrete time step within the time hori-
zon. In the context of physical systems, this type of problem is known as an
“optimal control problem” (Betts, 2010). In this approach, no assumptions
on the properties of the policy function are made a priori. It is only neces-
sary to select the free variables. In a conventional approach, these free vari-
ables contain the values of the policy functions. For each of these variables,
the range of valid values must be defined. It is then the task of the optimiza-
tion to find the optimal value for each free variable at each time, where the
continuous time is discretized into time steps. This leads to very large
parameter spaces compared to the conventional approaches. In particular, to
optimize n_p policy functions in a model with n_t time steps, the search space is of dimension n_p × n_t. We propose and demonstrate this approach by
employing state-of-the-art nonlinear optimization algorithms to solve the
resulting optimization problems (Hanson et al., 2009; Betts, 2010) that are
not as affected by the curse of dimensionality as the conventional methods.
Here, the optimization algorithm works directly on the model equations and
does not require a full simulation of the model at each iteration. In contrast
to the conventional black-box approaches, this can be seen as a white-box

optimization approach. Details on the transformation of SD models to optimization problems can be found in Appendix S1 (Supporting information).
The solution that results from this approach has several interesting charac-
teristics. First, the setup of the optimization problem is easily achieved. Sec-
ond, it optimizes the values of selected policies from a global perspective,
i.e. with respect to all states and further model parameters. Third, it is guar-
anteed to be locally optimal in a search space that has not been limited by a
priori assumptions about the policy function. Finally, depending on the
model size, it is computed within seconds or minutes. We classify problems
where a set of free variables need to be optimized to maximize or minimize
a given objective function defined in an SD model as a system dynamics
optimization (SDO) problem (Figure 1).
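
The size of the resulting search space is easy to see in a sketch (our illustration; the dimensions are hypothetical):

```python
import numpy as np

n_p, n_t = 2, 400                 # two policy functions, 400 time steps
z = np.zeros((n_p, n_t))          # one free variable per policy per step
lower = np.full(z.shape, -0.2)    # range of valid values per variable
upper = np.full(z.shape, 0.2)

# A white-box NLP solver optimizes the flattened vector directly, subject
# to the model equations as constraints; the search space has n_p * n_t
# dimensions instead of the handful of parameters of an analytic policy.
print(z.size)                     # 800 free variables
```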
The remainder of this paper is organized as follows. In the next section we
discuss different available optimization methods. In the third section we
consider two models from the literature to demonstrate the benefits of our
optimization approach. In the final section we summarize our findings and pro-
vide an outlook for further research topics. We address the technical details
of our approach in an Appendix S2 (Supporting information) that can be
downloaded from the journal’s web page. There, we also present a new data
format and new software that was developed and published to simplify the
use of the proposed methods.

Optimization methods for SD models

We discuss some optimization methods, and distinguish between black-box and white-box as well as between local and global methods. In the Appendix
S2 (Supporting information) to this paper, we also review the best-
documented optimization methods which are used by SD software. These
are Powell’s method, Box’s method, and Markov chain Monte Carlo methods
and their derivatives (such as genetic algorithms or simulated annealing).
They are used in Ventana Vensim (Ventana Systems, Inc., 2015), Insight

Fig. 1. Flowchart demonstrating our approach. The modeler is required to perform the steps in the top line: selection of the SD model, definition of the objective function (in the state variables), and selection of free variables with their ranges of valid values. For the tasks in the bottom line (time discretization, interpolation of non-smooth functions, and solution of the resulting continuous, nonlinear, non-smooth optimization problem using an NLP solver), software is already available or is presented in Appendix S1 (Supporting information)


Maker (Fortmann-Roe, 2014) and Goldsim (GoldSim Technology Group,


2015). There are other SD software packages that also offer some optimiza-
tion capabilities, e.g. Powersim (Powersim Software AS, 2015), Anylogic
(The AnyLogic Company, 2015) and Dynaplan Smia (Dynaplan, 2015). How-
ever, these vendors do not reveal many details about their software’s optimi-
zation method. Other software vendors do not currently include
optimization, e.g. Isee systems’ Stella (Isee Systems, Inc., 2015) and Forio
(Forio Corporation, 2015).

Black-box versus white-box optimization


When optimizing an SD model with respect to an objective function, two prin-
cipally different approaches exist: a black-box and a white-box approach.
For the first, fixed values are assigned to selected free variables in the model.
Then, it is simulated. After the simulation is completed, the resulting value
of the objective function is evaluated. Next, new fixed values for the free var-
iables are selected based on a defined algorithm or heuristic, e.g. a genetic
algorithm, and the model is again simulated. After that, the resulting value
of the objective function is evaluated. The black-box approach uses only
information about the objective function value from the SD model; all other
information in an SD model concerning the model structure is neglected.
Let us compare the black-box optimization approach with a hiker in the
mountains who is searching for the (locally) highest peak. The person lacks
a map of the terrain; however, the person has a GPS tracker and can thereby
obtain the current GPS coordinates of his position. Further, the hiker can
obtain his current altitude by means of the current GPS coordinates. Do the GPS coordinates tell the hiker anything about the direction he should go next? Do they give any hint to avoid falling down a steep cliff? No, not at all. This short story can be used to explain the black-box optimization
approach. The GPS coordinates are analogous to the fixed parameter values
in an SD model; the resulting altitude is analogous to the resulting values of
the objective function. Simulating an SD model for a set of specific parame-
ters (GPS coordinates) obtains an objective function value (altitude). The
promise of all black-box approaches is to test a large number of different
GPS coordinates with the intent to find better values of the objective func-
tion. Although this heuristic approach has been proven to be successful for
many applications, there is no theoretical guarantee that it converges or even
finds a solution at all, and thus it is also criticized (e.g. Conn, 2014: “Simu-
lated Annealing, Genetic Algorithms etc. are usually for the ignorant or the
desperate or both”).
The opposite approach is white-box optimization. It uses the full algebraic
information of the SD model and exploits it during the search for an opti-
mum. This approach allows us to compute objective function values for a
single set of parameters. Moreover, it allows us to obtain information on the

first partial derivatives of the SD model, which provides hints about the
direction of the steepest descent and ascent. When no further ascent is possi-
ble, the second derivatives of the model reveal the curvature at this point.
With this information, a local optimum can be guaranteed. Let us express
the white-box approach with the hiker analogy. Now, the hiker has in addi-
tion to the GPS device also a map of the local neighborhood at any given
point. So the hiker can use the information from the map to plan the next
steps with greater care, and falling off a cliff can be avoided.
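
As a small sketch of what "full algebraic information" buys (our illustration, reusing the logistic function from above as a stand-in for a model equation):

```python
import sympy as sp

x, a = sp.symbols("x a", positive=True)
S = 1 / (1 + a * sp.exp(-x))      # an algebraic model equation

# White-box: exact first and second derivatives are available symbolically,
# so a solver can follow ascent directions and check curvature instead of
# probing the model by repeated simulation.
dS = sp.diff(S, x)
d2S = sp.diff(S, x, 2)
print(sp.simplify(dS))
```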
The black-box optimization approach has been the focus of the existing literature on the optimization of SD models. The white-box approach, by contrast, has found considerable attention in the mathematical control community (Betts, 2010), but so far has left no significant footprint in the literature on optimization in SD. Our work addresses this gap. Although mathematical methods such as interior point, first introduced by Karmarkar (1984), and sequential quadratic programming, first introduced by Wilson (1963), have been around for quite some time, they have been vastly refined over the last
decade, and new ideas entered the stage that made it possible also to tackle
large-scale problems; some milestones of these developments can be found
in the work of Han (1976), Powell (1978), Fletcher (1982a, 1982b), Fletcher
and Leyffer (2002). Needless to say, those SD problems with hundreds of
stocks and flows and relations between them fall into the large-scale
category.

Global versus local optimization


In mathematical optimization, typically a distinction is made between local
and global optimization methods. Global methods find a global optimum,
i.e. a solution for the given problem that has the proven lowest (resp. high-
est) objective function value among all other possible solutions in the case of
a minimization (resp. maximization) problem. The computational difficulty
is hence twofold: finding one of these solutions and then proving that no bet-
ter solution exists. Since this is computationally very demanding, one can
usually only solve very small problem instances, and large problem
instances only if a very special underlying substructure can be exploited (for
example, a shortest path in a graph can be computed very quickly, which is
possible on a street map of the entire U.S.A. or Europe). We only consider
local optimization methods here. Some discussion on the relationship
between our work and global optimization can be found in the final
section (“Conclusions and outlook”).
Coming back to the hiking example from the previous paragraph, a local
optimizer can be seen as a hiker with a map that shows the neighborhood
around his current location, but only for a very short distance of a few
meters. However, this is already good enough to avoid falling down a steep
rock that lies a few steps ahead, and also good enough to plan the next step

in the direction of the steepest ascent that leads to the nearest peak. How-
ever, it is not sufficient to find the highest mountain of all. For this, the hiker would have to carry a global map of all mountains, but carrying such a heavy map would slow the hiker down significantly.
A solution found by a local method is only optimal within a certain neigh-
borhood of this solution. This is a much more tractable problem, since it
only needs to be shown that no better solution exists for any possible
descent direction from the computed incumbent solution. Karush (1939) and
independently Kuhn and Tucker (1951) provide a mathematical description,
i.e. a criterion, to verify whether a given solution is indeed locally optimal.
These conditions are called KKT conditions, taken from the initials of the
inventors. Existing software packages, e.g. CONOPT (Drud, 1985) and IPOPT
(Wächter and Biegler, 2006), verify this criterion and thus terminate with
proven local optimal solutions.
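
For a toy problem the KKT check can be written out directly; the following sketch (our illustration) verifies stationarity, feasibility and complementary slackness for min x_1^2 + x_2^2 subject to x_1 + x_2 >= 1, whose known optimum is x* = (0.5, 0.5) with multiplier 1:

```python
import numpy as np

# min f(x) = x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0
x_star = np.array([0.5, 0.5])
lam = 1.0                                    # Lagrange multiplier

grad_f = 2 * x_star                          # gradient of the objective
grad_g = np.array([-1.0, -1.0])              # gradient of the constraint
g_val = 1.0 - x_star.sum()

stationary = np.allclose(grad_f + lam * grad_g, 0.0)   # grad of Lagrangian = 0
feasible = g_val <= 1e-9
complementary = abs(lam * g_val) <= 1e-9               # lam * g = 0
print(stationary and feasible and lam >= 0 and complementary)  # True
```

Solvers such as CONOPT and IPOPT evaluate residuals of exactly this kind (to tolerances) before declaring a point locally optimal.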
Methods such as Box’s and Powell’s, as well as genetic algorithms, are
found in a further category of methods that may or may not find feasible
solutions. These methods, however, do not provide a formal, mathematical
guarantee of their local optimality. Moreover, such methods also do not pro-
vide a proof of global optimality (also called “certificate of optimality”).
They just return the best solution according to the termination criterion of
the method that does not come with any guarantee or proof of optimality. To
summarize, in some cases these methods are able to find good solutions.
However, they do not give any clues as to how good the provided solution
is. Hence modelers have to believe the results or rely on experience with
similar problems.

Interior point methods


We use state-of-the-art interior-point (IP) line-search algorithms as described by Wächter and Biegler (2006); other recent implementations include, for example, KNITRO (Byrd et al., 2006) and LOQO (Benson et al., 2004). These
methods deal with constrained optimization problems of the form defined in
the following equations:

$$\max \; f(x_1, \ldots, x_n) \qquad (1a)$$

$$\text{s.t.} \; g_k(x_1, \ldots, x_n) \le 0, \quad \forall k = 1, \ldots, m, \qquad (1b)$$

$$x = (x_1, \ldots, x_n) \in \mathbb{R}^n. \qquad (1c)$$

Here x_1, …, x_n are free variables, f is the objective function and g_k are the
constraint functions. The method is based on the KKT conditions, which are
necessary (but not sufficient) for the local optimality of a solution to a con-
strained optimization problem. These conditions yield a nonlinear equality

system which is solved multiple times for a parameter that converges to zero
during an optimization run. Once the parameter is (approximately) zero, a
local optimal solution has been found. In order to compute the KKT condi-
tions, first and second derivative information is needed. Hence it must be
assumed that all functions f and g_k are at least twice continuously differentiable.¹
Using this additional information, it is possible to apply the method to
large problem sizes. Wächter and Biegler implemented their method in the
publicly available software code IPOPT (Wächter and Biegler, 2006), and
applied it to a test set of problems, where the largest has about 250,000 vari-
ables and constraints. Interior point methods are, however, limited because
they always work on the full set of constraints. This problem is avoided by
sequential quadratic programming methods, which apply an active set strat-
egy, and which are explained in the following section.

Sequential quadratic programming methods


We also use sequential quadratic programming (SQP) methods that were
implemented by Drud in the software code CONOPT (see Drud, 1985, 1994),
which also deal with optimization problems of the form defined in Eq. (1).
Other recent implementations of the SQP method can be found in MINOS
(Murtagh and Saunders, 2003) and SNOPT (Gill et al., 2005). SQP methods
begin with an initial solution, and improve this solution at each step
(i.e. sequentially). At each step a quadratic programming (QP) problem must
be solved that indicates the progress step or search direction for the next
step. The QP to be solved is derived from an approximation of the Lagrang-
ian of the problem and the current solution. Such problems can be solved
easily by Newton’s method for solving nonlinear equation systems. A step
length is determined, which yields a step size from one iteration to the next.
As with IP methods, SQP methods also use the KKT conditions to determine
if a local optimum has been reached. Again, this makes it necessary to
assume twice-differentiable functions. SQP methods are particularly fast,
since they only work with a subset of all inequalities. The linear approxima-
tion of the constraints is only taken for those inequalities that are satisfied
with equality (called “active constraints”) at the current iterate. This subset
needs to be updated from one iterate to the next, which is called an “active
set strategy.” The weakness of SQP methods is that they may require many QP iterations while still being far away from a locally optimal solution.
In summary, both IP and SQP today work for large nonlinear optimization problems (NLP, for short) with twice-differentiable objective and constraint functions, but it cannot be decided beforehand which method is actually faster for a given problem class. Hence one needs to run both methods concurrently, which is what we do in the following.

¹ A function is twice continuously differentiable on a domain if the first and second derivatives are defined everywhere in the domain, and the second derivatives are continuous functions.

Summary
Sequential quadratic programming and interior point methods both can typi-
cally solve larger problems than Box’s and Powell’s methods. While both
methods (Box’s and Powell’s) operate on f(x) as a black box, these two more
modern methods (SQP and IP) make use of derivative information and
exploit them as steepest descent directions to converge faster. Additionally,
in this paper we consider constrained optimization problems, which cannot
be solved using Powell’s method. However, it is not entirely clear which of
the two, SQP or IP, is the faster method. Therefore, one has to implement the
model in both and then try out which of them is actually faster. For the two
system dynamics optimization models we use as test problems, we
attempted a solution of the optimization problem with the SQP solver CON-
OPT as well as with the interior point solver IPOPT. We report on our results
in the following sections.
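
A minimal sketch of running both method families on one instance of problem (1), here via SciPy's interior-point-type "trust-constr" and SQP-type "SLSQP" methods rather than the solvers used in the paper (our illustration; the instance is hypothetical):

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

# Maximize x1*x2 subject to x1^2 + x2^2 <= 1 (minimize the negative).
objective = lambda x: -(x[0] * x[1])
constraint = NonlinearConstraint(lambda x: x[0]**2 + x[1]**2, -np.inf, 1.0)
x0 = np.array([0.1, 0.1])

for method in ("trust-constr", "SLSQP"):   # IP-type vs SQP-type
    res = minimize(objective, x0, method=method, constraints=[constraint])
    print(method, res.x.round(4), round(-res.fun, 4))
```

Both should converge to x ≈ (0.707, 0.707) on this benign instance; on larger problems the two families can differ substantially in speed, which is why the paper runs both.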

Application

In order to use a modern interior-point- or SQP-based NLP solver, we need to reformulate the given SDO problem as a nonlinear program. This involves
the discretization of the model using a suitable scheme (such as Euler or
Runge–Kutta schemes), and if necessary the interpolation of non-smooth
functions that exist in the model. We developed a data format that allows for
the formulation of arbitrary SDOs on the basis of Vensim models (Ventana
Systems, Inc., 2015), as well as automatic tools that allow the conversion of
an SDO into a nonlinear program. More details on the conversion and
approximation process and on these tools can be found in the Appendix S2
(Supporting information).
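
A sketch of the discretization step (our illustration, with a hypothetical one-stock model): an explicit Euler scheme turns the differential equation of each stock into one algebraic equality constraint per time step, and these residuals become part of the NLP.

```python
import numpy as np

DT, HORIZON = 0.5, 100.0
N = int(HORIZON / DT)

def euler_residuals(x, z):
    """Equality constraints of the NLP: each residual forces the stock
    trajectory x to satisfy x[t+1] = x[t] + dt * flow(x[t], z[t]) for a
    hypothetical net flow driven by the policy values z."""
    flow = z[:-1] - 0.05 * x[:-1]           # net flow at each step
    return x[1:] - x[:-1] - DT * flow       # the solver drives these to zero

x_guess = np.ones(N + 1)                    # initial trajectory guess
z_guess = np.full(N + 1, 0.05)              # initial policy guess
print(np.abs(euler_residuals(x_guess, z_guess)).max())
```

Non-smooth elements (for example table functions) are additionally replaced by smooth interpolations so that the twice-differentiability assumption of the solvers holds.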
We selected two models from the literature to demonstrate the benefits of
our novel optimization approach. The first is the market growth model from
Forrester (1968); the second is the World2 model from Forrester (1971b). We
have chosen these two particular models because they are well explored, rig-
orously formulated and extensively tested and discussed in the literature. It
is likely that the reader already has some familiarity with them, which
allows us to concentrate on demonstrating the benefits of our approach with-
out requiring the reader to become familiar with a new model. We use simu-
lations of the unmodified models as base runs for comparison. Further, they
present the opportunity to demonstrate two interesting use cases. For the
market growth model, we directly optimize values that were previously

determined by an a priori defined policy function. In the case of World2 we compute policy functions for parameters that were constant in the original
model, and we are also able to use a previously documented solution for
comparison: the “Towards a Global Equilibrium” run of Forrester (1971b).
The concept of an actor with a global understanding of the world-system
was already considered in Forrester’s experiments with different parameter
choices. In our simulations of the World2 model we follow the same con-
cept. However, we want to emphasize that both applications are used fore-
most to demonstrate our approach. Already for these two models, there is a
literally infinite amount of different choices of which goal to use as the
objective that should be optimized, and our pick is just one possibility for
each of these two models.
In the next two sections, we demonstrate our approach with these two
models. The steps will be very similar in both cases. We first formulate an
objective function, i.e. a function of the model variables that will serve as a
quality indicator for a given model run. We then choose the “free variables,”
i.e. define how the behavior of the model can be altered. For both models,
we introduce additional “bookkeeping variables,” which we use to compare
the magnitude of interventions between base runs and optimized runs.
These new variables are state variables that only accumulate information,
without changing the dynamic behavior of the model (hence we call them
“bookkeeping variables”). In the final step, we compare the optimized solu-
tions with the selected base run.

Market growth
First, we consider the market growth model as presented by Forrester (1968).
A stock and flow diagram is shown in Figure 2. The model describes the pol-
icies governing the growth of sales and production capacity in a new prod-
uct market. Forrester’s original model resulted from a case study of an
electronics manufacturer and represents the opinions of the company’s
senior management about the way that corporate growth was managed.
We use the version of the market growth model as published by
Richardson (2011).
In the well-known base run of the market growth model, the evolution of
the firm shows an unexpected stagnation. Over the first 40 months, sales
rise, fueled by the increasing number of salesmen. Then, sales suddenly
level off. This is because those sales are not being backed up with added Pro-
duction capacity; this produces delivery delays, starting around month
20, and begins to affect sales a few months after that. The development of
Production capacity and Salesmen is shown in Figure 5.
In the original model, one of the key policies is modeled via the table
function CEF, which defines the Capacity expansion fraction depending on
the Delivery delay condition. Here, a production manager chooses by how

Fig. 2. Structure of the adjusted market growth model (Forrester, 1968). Source: Richardson (2011). Our modifications to the original model are shown in bold

much the Production capacity should be reduced or increased in the future.


CEF is a function of the backlog. Therefore, the decision maker decides only
on the basis of the backlog of the company after some delay. He or she is
subject to bounded rationality (Simon, 1984; Morecroft, 1985).
A completely different type of decision maker does not have this limitation.
He or she has a deep understanding of the dynamics of his or her company and
is therefore able to make more informed decisions. Thus we address the ques-
tion: Is it possible to model such a decision maker? If the answer is “Yes,” a
follow-up question would be: Are there alternative solutions to the model
which allow the company to choose an expansion strategy that overcomes the
decline of Production capacity in the considered period of time? The optimiza-
tion problem resulting from these questions is detailed in the next section.

Problem statement
As described in the previous section, the dynamics of the market growth
model lead to a decline in Production capacity as the company reacts to fluc-
tuations in demand using the policies for capacity expansion, hiring sales-
men and recognition of delivery delays. Our base model has the standard

values for the policies as described by Forrester (1968). The key control the
company has over the expansion or reduction of the Production capacity is
in setting the variable Capacity expansion fraction. In the original model,
this variable is defined in terms of a table function of the Delivery delay con-
dition. It is clear that setting the Capacity expansion fraction to a positive
value will lead to a continuous increase in the Production capacity. The
decline of the company would be avoided. However, the original paper does
not consider the costs that would be associated with an expansion of produc-
tion capacity and that might make a constant expansion strategy impossible.
When extending a pure simulation model to an optimization problem, this is
a common challenge: in the simulation case, the modeler has full control over
all variables and constantly checks the results for plausibility. In the case of opti-
mization, such plausibility checks need to be formulated as algebraic expres-
sions such that the optimization algorithm is able to take them into account.
In order to formalize this, we introduce a “cost of intervention.” In the
case of market growth, we consider the policy in question (i.e. the expansion
of the production capacity) as the intervention of interest. To quantify the
magnitude of this intervention, we assume that there is a cost associated
with the maintenance of the existing production capacity and another cost
associated with increasing the capacity. For our exemplary case, we assume
that adding one unit of new capacity is 16 times more expensive than main-
taining one unit of capacity. (If the method is to be applied to a concrete
company, this value has to be adjusted accordingly.)
To implement this concept, we extend the original market growth model
by introducing two cost variables c1(t), c2(t), two weights w1, w2 and one
new state variable, Accumulated Costs, integrating over the incurred cost:

$$c_1(t) = w_1 \cdot \text{Production capacity ordering}(t), \quad \forall t = 1, 2, \ldots, T, \qquad (2a)$$

$$c_2(t) = w_2 \cdot \text{Production capacity}(t), \quad \forall t = 1, 2, \ldots, T, \qquad (2b)$$

$$\text{Accumulated Costs} = \Delta t \sum_{t=0}^{T} \left( c_1(t) + c_2(t) \right). \qquad (2c)$$

Note that information flows only into these additional variables but never
back to the original model variables. Thus the dynamics of the model are not
changed by our modifications. Based on these definitions, we can now com-
pare two simulations for the magnitude of the intervention. In the base run,
the accumulated cost incurred after 100 months amounts to a value of 3.47 × 10^5. In the following, we consider two questions concerning the model:

1. Is there a capacity expansion policy with a similar cost of intervention


that can keep the Production capacity stable or even expand the Produc-
tion capacity within the considered time frame?


2. Can this capacity expansion policy be formulated as a bounded rational policy, i.e. as a function of the Delivery delay condition?

In the next section, we answer the first question by formulating and solv-
ing an optimization problem. By analyzing the solution to this problem, we
answer the second question.
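
A sketch of the bookkeeping computation of Eq. (2) (our illustration; the text fixes the weight ratio w1/w2 = 16 but not the absolute values, so the numbers below are assumptions):

```python
import numpy as np

def accumulated_costs(ordering, capacity, dt=1.0, w1=16.0, w2=1.0):
    """Eq. (2): expansion cost c1 plus maintenance cost c2, accumulated
    over the run. ordering/capacity are the simulated trajectories of
    Production capacity ordering and Production capacity.
    w1/w2 = 16 per the text; the absolute scale is assumed."""
    c1 = w1 * ordering                # Eq. (2a): cost of ordering capacity
    c2 = w2 * capacity                # Eq. (2b): cost of maintaining it
    return dt * np.sum(c1 + c2)       # Eq. (2c): Accumulated Costs
```

Comparing the value of this function across runs is what makes the magnitudes of different interventions commensurable.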

Formulation of the market growth optimization problem


At this point, the original model has been modified by adding the cost
accounting variables, as explained before, to assess the cost of intervention.
For our SDO problem, we now remove the functional dependency of the
Capacity expansion fraction from the model and consider it a free, time-
varying variable.² The table function CEF is thereby removed from the
model. We do not remove or change any other functional dependencies; in
particular, the bounded rational policy for the hiring of salesmen remains in
place. We refer to the equations of this new model with one time-varying
free variable as “optimization model.” We now formulate the following opti-
mization problem:

$$\max \; \text{Production capacity}(T), \qquad (3a)$$

$$\text{s.t.} \; z(t) = \text{Capacity expansion fraction}(t), \quad \forall t = 1, 2, \ldots, T, \qquad (3b)$$

$$\text{Accumulated Costs}(T) \le 3.5 \times 10^5, \qquad (3c)$$

$$[\text{market growth base model}], \qquad (3d)$$

$$[\text{change limit } z_c], \qquad (3e)$$

$$-0.2 \le z(t) \le 0.2, \quad \forall t = 1, 2, \ldots, T. \qquad (3f)$$

The first statement (Eq. 3a) defines the objective function. We will com-
pare two simulations by comparing the value of Production capacity at the
end of the simulation at time T. We consider a maximization problem,
i.e. we prefer higher values of the objective function. We use z(t) = Capacity
expansion fraction (t) to denote the time-varying free variable. Note that we
added a constraint that limits the accumulated cost of intervention to the
cost incurred in the base run (Eq. 3c). This way, the magnitude of the inter-
vention in the base run and in the optimized run remain comparable. We
computed the Accumulated Costs (T) for the base run (i.e. a simulation of the original model) to be 3.5 × 10^5. Constraint (Eq. 3d) represents the model
equations of the base market growth model.
² We distinguish between constant and time-varying variables. A constant variable takes on the same value at each time step of the simulation. Most model parameters are constant variables. A time-varying variable can have a different value at each time step in the simulation.


At first sight one might think that, by removing the link from the variable
Delivery delay condition to the Capacity expansion fraction, the capacity
expansion feedback loop is cut. However, the optimization algorithm uses the
information of all variables in the model, including the information about
delivery delay condition, to compute the optimal values for capacity expan-
sion fraction. This means that the mentioned feedback loop is still intact and
that additional feedback loops were created. In fact, by applying optimization
to the modified problem, we compute a more comprehensive policy function.
Locally optimal solutions of problems such as Eq. (3) often show so-called
“bang-bang behavior” (Sonneborn and Van Vleck, 1965; Artstein, 1980);
i.e. the value of the free variable remains at its upper bound for some time,
and then switches abruptly to its lower bound and vice versa. Even though
such a solution can theoretically be very efficient, it might be impossible to
realize in practice because the free variable cannot be changed infinitely fast.
To control how fast a free variable z can change in reality, we introduce an
additional parameter zc limiting the amount of change from one time step to
the next. This variable has the unit [unit of the free variable]/[unit of time] and is incorporated into the model via a constraint of the form

$$-z_c \le \frac{z(t+1) - z(t)}{\Delta t} \le z_c. \qquad (3g)$$

In the case of the market growth model, the variable z_c constrains the rate of change of the free variable Capacity expansion fraction. The corresponding constraint is Eq. 3e.
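
A sketch of how constraint (3g) can be checked for a candidate trajectory (our illustration; the trajectory is hypothetical):

```python
import numpy as np

def rate_limit_violations(z, z_c, dt=1.0):
    """Constraint (3g): |z(t+1) - z(t)| / dt <= z_c at every step.
    Returns the amount by which each step exceeds the limit."""
    rate = np.diff(z) / dt
    return np.maximum(np.abs(rate) - z_c, 0.0)

z = np.array([0.0, 0.001, 0.004, 0.005])     # candidate policy values
print(rate_limit_violations(z, z_c=0.002))    # [0. 0.001 0.]: step 2 too fast
```

In the NLP itself the same condition enters as a pair of linear inequality constraints per time step rather than as a penalty.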
We conducted experiments for different values of zc. For a constant Capac-
ity expansion fraction, i.e. zc = 0, no solution was found that respects the
cost constraint. Therefore, we varied zc from a value of 0.0001 (which leads
to a very slowly changing free variable) to zc = 0.4 (which allows the free
variable to jump from one bound to the other within one time step and is
therefore equivalent to no limit).
The results are summarized in Table 1 and Figure 3. In the following, we
select a value of z_c = 0.002, which leads to sufficiently slowly changing solutions for the market growth model. This constrained rate of change results in a reduction of 42 percent in the value of the objective function.

Optimization results
We solve the problem using the NLP solvers CONOPT and IPOPT. More
information on the necessary reformulation can be found in the Appendix S2 (Supporting information).
Table 1. Locally optimal objective values for the market growth model for different change limits z_c

| z_c | 0.0001 | 0.001 | 0.002 | 0.005 | 0.01 | 0.05 | 0.1 | 0.2 | 0.4 |
|---|---|---|---|---|---|---|---|---|---|
| Production capacity (T) [10^4] | 0.63 | 1.31 | 1.59 | 1.94 | 2.34 | 3.01 | 3.21 | 3.30 | 3.33 |


Fig. 3. Plot of optimal objective values of the market growth problem for different rate limits z_c

Fig. 4. Broken lines show base run, solid lines show optimization solution. (a) Comparison of values of the free variable in base run and optimal run. (b) Comparison of the ordered Production capacity as a result of the chosen expansion fraction. (c) Plot of functional dependency between capacity expansion fraction and delivery delay condition

The single-core solving time on a workstation equipped with an Intel Xeon E3-1290 V2 was 1.6 and 1.7 seconds with
CONOPT and IPOPT, respectively. Both solvers converged to the same
solution.
Figures 4 and 5 compare the solution from the optimization run with the
base run. Figure 4(a) shows the values of the free variable. In contrast to the
base run, the optimization solution starts with a strong reduction of Produc-
tion capacity, which is then gradually increased. In both runs, the first zero
point is reached at almost the same time. However, while in the base run the
first local maximum is reached shortly thereafter, in the optimized solution
the variable continues to grow and reaches its maximum well into the


Fig. 5. Comparison of state variable values (Production capacity, Backlog, Salesmen) of the market growth model in base run and optimized run (a–c). Broken lines show the base run values

second half of the considered time frame. Figure 4(b) shows how the solu-
tions result in adding or reducing Production capacity, and Figure 5(a) shows
the actual values of the Production capacity over time. At the beginning of
the time horizon, the Production capacity is declining in the base run as well
as in the optimized solution. Counterintuitively, in the optimized solution
the decline is more pronounced. A possible interpretation of this is that, in
the optimized solution, the company uses the first 40 months to reach a bet-
ter “starting position,” i.e. a better ratio of the Production capacity, backlog
and salesmen. Indeed, the Production capacity shows a regular u-shape, end-
ing with a higher Production capacity at the end of the time frame than at
t = 0. While in the base run the Production capacity drops to a value of
5260, in the optimized run the Production capacity at the end of the time
window reaches a value of 13,163. This represents an increase of 10 percent
compared to the start value, and an improvement of 250 percent compared
to the final value in the base run.
With the above considerations, we can answer question 1 with “Yes.” We
found a satisfying solution, spending a similar cost of intervention, but
avoiding the continuous decline in Production capacity. Indeed, the backlog
and the number of salesmen have also grown within the time frame
(Figure 5(b), (c)). Hence we do not have to expect a new strong oscillation
after the considered time frame.

To answer question 2, we refer to Figure 4(c). For the base run, we plotted
the table function CEF, i.e. the function has exactly one value on the second
axis for each value on the first axis. In the optimized solution, there is clearly
no function of the Delivery delay condition that would produce this plot.
The answer to question 2 is therefore “No.” Since, in the optimization proce-
dure, for each solution of the problem all variable values at all times are
known, the formulation and solution of the optimization problem (Eq. (3))
can be interpreted as modeling a decision maker who is aware of the full
model and its development over time. We showed that it is impossible to for-
mulate a policy function of one argument that reproduces this behavior.
Whether it would be possible to find a policy that depends on several state
variables and leads to a similar solution is an interesting question and
remains a topic for future research.
The main benefits of the policy computed with our approach can be sum-
marized as follows:

• Whereas in the base run the Production capacity was reduced, in the opti-
mized solution the Production capacity increases at the final time com-
pared to t = 0. The improvement compared to the base run amounts to
250 percent.
• We introduced a cost of intervention to account for the costs that result
from a different policy that were not accounted for in the original model.
Using the cost of intervention from the base run as a constraint of our
optimization problem, our optimized solution can be considered a redis-
tribution of effort rather than an increase of effort.
• With the exception of a permitted range of values, no assumptions about
the policy function, i.e. on the relationships between the policy function
values and other model variables, were made. Therefore, the search space
is much less limited than in a conventional policy improvement approach.
• The solution is guaranteed to be a local optimum within this larger search
space.
• In the original model, the Capacity expansion fraction was defined as a
function of the Delivery delay condition. In our approach we do not
assume this dependency, and consequently the computed policy can no longer be expressed as a function of this variable. It therefore could not have been computed by a conventional policy improvement approach restricted to a function, however arbitrary, of the Delivery delay condition.

World2
The World2 model was introduced by Forrester (1971b). Figure 6 shows its
stock and flow diagram. The model is Forrester’s answer to the futility of
addressing world challenges in a piecemeal fashion. Instead the problem
should be addressed as a system of problems. The model consists of five

Fig. 6. Model structure of the modified World2 model (Forrester, 1971b). Source: teaching material by George Richardson, University of Albany, PAD 624. Free variables are shown in red. Additional bookkeeping variables are shown in green

interacting subsystems, each of which deals with a different system of the model. The main systems are the food system (dealing with agriculture and
food production), the industrial system, the population system, the non-
renewable resources system and the pollution system. The model shows that
the production of goods, especially food, in this world is limited by the
available resources and constraints that will prevent population and produc-
tion from unlimited growth.

Fig. 7. Comparison of variables (Population, Capital, quality of life) of the World2 model in base run (dashed line) and optimized run (solid line) (a–c)

Problem statement
In Forrester (1971b), several scenarios are considered for the World2 model.
In particular, in chapter 6, “Towards a Global Equilibrium” parameter set-
tings are presented, which lead to a sustainable state of the world system
within a relatively short time. We use this scenario as our base run. Selected
state variables of this run are shown in Figure 7 and the parameter changes
suggested by Forrester are listed in Table 2 in the column “Base run.” In the
base run we see a stabilized level of the Population, a high level of quality of
life and low level of Pollution. For our optimization, we intend to improve
the system behavior of the World2 compared to the best policy run of Forres-
ter. We want to identify a policy to achieve a more sustainable world mea-
sured in quality of life with the smallest costs necessary to achieve this. As

Table 2. Free variables selected for optimization of the World2 model

| Free variable z_i | i | Value for t < 1970 | Base run | Optimization range | Weight (w_i) | z_{c,i} |
|---|---|---|---|---|---|---|
| NRUN 1970 | 1 | 1 | 0.25 | [0.1, 1.0] | 0.9 | 0.045 |
| Pollution per cap 1970 | 2 | 1 | 0.5 | [0.1, 1.0] | 0.9 | 0.045 |
| Cap inv rate 1970 | 3 | 0.05 | 0.03 | [0.01, 0.05] | 0.04 | 0.002 |
| Food coeff 1970 | 4 | 1 | 0.8 | [0.6, 1.25] | 0.65 | 0.0325 |
| Birth rate normal 1970 | 5 | 0.04 | 0.028 | [0.02, 0.04] | 0.01 | 0.0005 |


in the previous model, we introduce costs of intervention, which are incurred if a parameter value is changed at a given point in time.
As in the base run, all variables remain at their initial value zinit until the
year 1970 and can only be changed afterwards. We selected the following
symmetric exponential function to model a cost that increases exponentially
with increasing magnitude of the intervention. For each of the five free variables z_i, as given in Table 2, the cost incurred by a change from the initial value z_init,i to a value z_i is then given by

$$f_{c,i}(z_i(t), z_{\text{init},i}) = \alpha \left( e^{\beta (z_i(t) - z_{\text{init},i}) / w_i} - 1 + e^{-\beta (z_i(t) - z_{\text{init},i}) / w_i} - 1 \right) \qquad (4)$$

where we choose the parameter values α = 2.9 × 10^4 and β = 3.6. The value of this function is zero if z_i(t) = z_init,i, i.e. if the value of the free variable
remains at its initial value. This means there is no intervention, and there-
fore no costs are incurred. If the free variable is set to a different value from
the initial value a cost is computed for each timestep. The cost grows expo-
nentially with the difference between the new and the original value of the
variable. Each free variable zi must remain within a given interval. The
weights wi are chosen according to the width of the allowed interval. These
intervals, as well as the cost coefficients, are listed in Table 2. The chosen
weights wi are normalizations for different interval widths and ensure an
adequate weighting of the exponential growth of the individual cost compo-
nents. Furthermore, since zi and wi have the same unit and we choose α to
be of unit 1, fc,i is of unit 1 as well. We sum the costs of changing the free
variables from their initial value and accumulate them over time. The final
cost of intervention of a given run is then calculated as follows:
$$\text{Accumulated total costs}(T) = \Delta t \sum_{t=0}^{T} \left( \sum_{i=1}^{5} f_{c,i}(z_i(t), z_{\text{init},i}) \right) \qquad (5)$$

As in the market growth model, information flows only into our newly
introduced bookkeeping variables. Therefore, the model dynamics are not
changed by these modifications.
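To make these definitions concrete, the following is a minimal Python sketch of Eqs 4 and 5. The function names (intervention_cost, accumulated_cost) and the uniform time grid with step dt are our own assumptions for illustration; in our experiments the computation is carried out inside the algebraic optimization model rather than in Python.

```python
import numpy as np

ALPHA = 2.9e4   # cost scaling alpha in Eq. 4
BETA = 3.6      # cost steepness beta in Eq. 4

def intervention_cost(z, z_init, w):
    """Symmetric exponential cost of Eq. 4 for the free variables.

    Zero when z == z_init (no intervention); grows exponentially with
    the normalized deviation (z - z_init) / w in either direction.
    """
    d = (z - z_init) / w
    return ALPHA * (np.exp(BETA * d) - 1.0 + np.exp(-BETA * d) - 1.0)

def accumulated_cost(trajectories, z_init, w, dt):
    """Accumulated total costs of Eq. 5.

    trajectories: array of shape (n_steps, 5) with the values z_i(t)
    z_init, w:    length-5 arrays of initial values and weights (Table 2)
    dt:           simulation time step
    """
    per_step = intervention_cost(trajectories, z_init, w).sum(axis=1)
    return dt * per_step.sum()
```

Feeding the base-run trajectories (all parameters held at the "Base run" values of Table 2 from 1970 onwards) through such a function is how the Forrester budget described next can be reproduced.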
With the model adjusted as just described, we re-simulated the model
with Forrester's parameter selection and derived the costs incurred by his
policy changes. We call this total accumulated cost the "Forrester budget,"
i.e. the cost necessary to implement Forrester's final policy for World2.
The computed value is 3.8 × 10^7. Note that the utility of the unitless cost
of intervention lies in the comparison of simulations; the absolute value of a
single simulation on its own is not yet useful. We will therefore use the
Forrester budget as a reference value for the optimizations in the following
sections.


Similar to the first application, we again formulate a question to guide the
analysis of the optimization results:

• Is it possible to find time-varying functions, for all five policy variables,
that lead to a continuous growth of population and quality of life, while
not surpassing the cost of intervention incurred by Forrester's solution?

Formulation of the World2 optimization problem


In the section on the market growth model we directly optimized a variable
that was defined by a policy function in the original model. For the World2
model, we choose a different approach: we select the five parameters that
Forrester changed in his simulation as free variables. They are: NRUN (natural
resource utilization norm), Pollution per capita, Capital investment rate,
Food coefficient and Birth rate normal.
As is typical for manual parameter tuning, Forrester changed the parameters
only once, in the year 1970. In the optimization run, we remove the premise
that the five policy parameters (see Table 2) change only once during the
simulation. We consider them as time-dependent functions, with limitations
on the speed of change (Eq. 6e) as described above in the section on "Formu-
lation of the market growth optimization problem." We refer to Table 2 for
the values of the change limits z_c,i.
These considerations led to the formulation of our version of the World2
model:
$$
\begin{aligned}
\max\quad & \textstyle\sum_{t} \Delta t \,\bigl(\text{quality of life}(t) \times \text{Population}(t)\bigr) && (6a)\\
\text{s.t.}\quad & \text{control ranges defined in Table 2} && (6b)\\
& \text{Accumulated total costs}(T) \le 3.8 \times 10^{7} && (6c)\\
& [\text{World2 base model}] && (6d)\\
& [\text{change limits } z_{c,i}] && (6e)
\end{aligned}
$$

The desired result is a time-varying policy for all five free variables. As the
objective function, we chose the accumulated product of Population and
quality of life. In the base run, this accumulated product takes on a value
of 610.14.
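A sketch of how this problem could be posed to a general-purpose NLP solver is given below, reusing accumulated_cost from the earlier sketch. The simulator interface simulate_world2, the variable names, and the commented-out solver call are illustrative assumptions: in our experiments the model equations are handed to CONOPT/IPOPT in algebraic form rather than wrapped as a black box, and the rate-of-change limits (Eq. 6e) and the fixing of the controls before 1970 are omitted here for brevity.

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

DT, N_STEPS = 1.0, 200                    # one control value per simulated year
Z_INIT = np.array([1.0, 1.0, 0.05, 1.0, 0.04])   # values for t < 1970 (Table 2)
W      = np.array([0.9, 0.9, 0.04, 0.65, 0.01])  # weights w_i (Table 2)
LO     = np.array([0.1, 0.1, 0.01, 0.60, 0.02])  # lower control bounds (Eq. 6b)
HI     = np.array([1.0, 1.0, 0.05, 1.25, 0.04])  # upper control bounds (Eq. 6b)

def simulate_world2(Z):
    """Placeholder: maps control trajectories Z of shape (N_STEPS, 5) to
    per-step quality-of-life and Population arrays. In our experiments this
    role is played by the algebraic model handed to CONOPT/IPOPT."""
    raise NotImplementedError

def objective(x):
    qol, pop = simulate_world2(x.reshape(N_STEPS, 5))
    return -DT * np.sum(qol * pop)        # Eq. 6a, negated for minimization

# Budget constraint (Eq. 6c); accumulated_cost is the function sketched earlier.
budget_con = NonlinearConstraint(
    lambda x: accumulated_cost(x.reshape(N_STEPS, 5), Z_INIT, W, DT),
    0.0, 3.8e7)

# Start from the no-intervention run; in the paper the controls are in
# addition held fixed at Z_INIT for the first 70 simulated years.
x0 = np.tile(Z_INIT, N_STEPS)
bounds = list(zip(np.tile(LO, N_STEPS), np.tile(HI, N_STEPS)))
# result = minimize(objective, x0, method="trust-constr",
#                   bounds=bounds, constraints=[budget_con])
```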

Optimization results
As before, we attempted the solution with CONOPT as well as IPOPT. How-
ever, IPOPT was unable to find a feasible solution within 10 minutes. The
solving time with CONOPT for this model was 142 seconds, and the locally
optimal objective value is 1046.79. This is an improvement of roughly
70 percent over the base-run value of 610.14.

[Figure 8: three panels, (a) Food coeff 1970, (b) Birth rate normal 1970, (c) Total Costs (×10^5), each plotted over t [Years] from 100 to 200]

Fig. 8. Comparison of values of the selected free variables and intervention cost of the World2 model in base run (dashed
line) and optimized run (solid line) (a–c)

The Accumulated total costs in the optimized run reach the same value as in
the base run: the optimization uses all of the available budget of
interventions.
The results of the optimization are shown in Figures 7 and 8. The three
panels in Figure 7 compare the base run and the optimization run for the
state variables Population, Capital and quality of life. The last panel clearly
shows that, in the optimization run, quality of life is improved compared to
the base run, at every point along the time axis. At the end of the simulation,
the improvement amounts to roughly 20 percent. As shown in the first two
panels, this improvement is accompanied by a continued growth in popula-
tion and capital.
Figure 8 shows the optimized values for the free variables Food coeff 1970
and Birth rate normal 1970. The third panel compares the costs of the inter-
ventions accumulated up to a given time. Since in the base run all parameters
stay at the same value after the year 1970, the cost incurred per time step is
constant from 1970 onwards. While the accumulated cost of interventions
is initially slightly lower than in the base run, there is a sharp cost peak in
the year 2065, caused by a strong intervention that reduces the birth rate.
All in all, redistributing the interventions across different times and
different free variables, compared to the base run, led to a solution with a highly

improved value of the objective function. To the best of our knowledge, no
simulation of the World2 model has been published so far that allowed for
an ongoing growth of the population and did not lead to a collapse by the
year 2100. As we demonstrated here, such a solution exists if the assump-
tion that parameters can be changed only once is removed.
The main benefits of the policy computed with our approach can be sum-
marized as follows:

• In this application, we simultaneously optimized five time-varying
policies.
• As before, we introduced a cost of intervention and constrained our opti-
mization to not exceed the cost of intervention computed for Forrester’s
“Towards a Global Equilibrium” simulation of the World2 model.
• As in the market growth model, no assumptions on the policy functions
were made, except for the definition of a permitted range for the value of
the policy variables.
• The solution is guaranteed to be a local optimum.
• The solution represents an improvement of 70 percent compared to For-
rester’s original published solution.
• To the best of our knowledge, no solution with similar features has been
reported to date.

Before we conclude the paper, we demonstrate the sensitivity of the result-
ing value of the objective function when we limit the number of interven-
tions the actor can perform. In our optimization run, the actor could change
the value of each policy 200 times, i.e. one change per year (where those
values were overridden by the default values for the first 70 years). We
reduced the number of interventions in steps from 200 down to 1. For
instance, when only one intervention is allowed, the value of each policy
parameter is changed once and remains constant thereafter. In Figure 9 we
demonstrate the trade-off between less frequent changes in the free variables
and the loss in optimality. When only one intervention is allowed for each
parameter over the whole period, the objective function of the optimal solu-
tion is 631 units, which is in fact very close to Forrester's 610.14. With two
changes (one per 100 years), the objective function value already jumps
to 940 units: an improvement of 50 percent. Increasing the number of
interventions to four, i.e. one per 50 years, yields a further but modest
improvement of 5 percent. For another improvement of 5 percent (from
993 to 1046), one has to increase the number of interventions in each policy
parameter to 200, i.e. one per year. One can safely assume that this
observation is not particular to the SDO problem we have set up with
World2, but prototypical for other SDO problems as well: the number of
interventions has a strong effect on the resulting objective function values
when going from only one intervention to a few interventions.

[Figure 9: two panels, (a) Objective Value and (b) Solution Time [s], each plotted against Control Interval [Years] ∈ {200, 100, 50, 20, 10, 1}]

Fig. 9. The impact of changing the control interval in the World2 model on the achievable objective function (a) and the CPU
time to solve the resulting problems (b). The control interval defines how much time must pass before a control variable can
be assigned a new value. With a shorter control interval, the number of free variables increases. We would therefore expect
solving times to increase, which is in fact the case. The achieved objective value already increases significantly when
changing the control interval from 200 to 100 years. For further decreases of the control interval we see smaller but still
significant improvements

However, to gain the last few percent of optimality, one has to intervene
almost instantaneously and adapt the policy values with a high frequency.
Systems that are controlled with such high frequency exist. For example,
the U.S. economy can be seen as a highly complex dynamical system, subject
to national effects (e.g. inflation, employment rate, psychological factors)
and international effects (other economies and political systems). The Federal
Open Market Committee of the Federal Reserve Bank steers this system by
setting the value of a free variable, the short-term objective for the Fed's open
market operations. This value has been changed between one and 11 times
per year since the year 2000 (Board of Governors of the Federal Reserve
System, 2015).
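Technically, such a control interval can be imposed without touching the model equations, by optimizing one value per interval and expanding it to the per-step grid. A minimal sketch in Python (the helper name expand_controls is our own):

```python
import numpy as np

def expand_controls(z_coarse, control_interval, n_steps):
    """Expand one control value per interval into a per-step trajectory.

    z_coarse: shape (n_intervals, 5), one value per control interval
    Returns:  shape (n_steps, 5), piecewise constant over each interval
    """
    idx = np.minimum(np.arange(n_steps) // control_interval, len(z_coarse) - 1)
    return z_coarse[idx]

# Example: a 50-year control interval over 200 years leaves 4 free values
# per policy parameter instead of 200.
z_coarse = np.tile([1.0, 1.0, 0.05, 1.0, 0.04], (4, 1))
Z = expand_controls(z_coarse, control_interval=50, n_steps=200)
assert Z.shape == (200, 5)
```

Halving the control interval doubles the number of free variables per policy, which matches the growth in solution times seen in Figure 9(b).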

Conclusions and outlook

We have discussed recent mathematical optimization methods and demon-
strated their use in the context of system dynamics. We presented a novel
approach to the optimization of policy functions. In previous approaches,
policy functions are first parametrized, and then the parameters of the policy
function are optimized. In our approach, we directly optimize the time-
varying values of the policy variable. While previous approaches to policy
optimization are limited by the parametrization of the policy function, our
optimization approach takes the complete model dynamics into account. We
do not compute an optimal policy for an actor with a limited view of the sys-
tem, but for an actor with a comprehensive view of the model dynamics. We
used two example models from the literature to demonstrate the benefit of
this approach: market growth and World2. For both models, we computed
time-varying policy values, which are significant improvements over the base
runs with respect to the selected objective function. In order to demonstrate

our approach, we defined optimization problems around each model by
specifying an objective function and free variables.
In order to make the results of our optimization comparable to previous
simulations of a given model, we introduced a novel concept. We extend the
model by defining a cost associated with each intervention into the system,
which is accumulated over time. Interventions of high magnitude or long
duration incur higher costs than small or short interventions. The definition
of this cost of intervention allows us to compute the cost associated with
interventions in the original models and use them as base values. This
approach allows us to optimize our selected objective function merely by
redistributing effort.
The main contribution of our paper is to demonstrate a new approach to
policy design and evaluation and to make it available to the user: a systematic
way to compute solutions in a search space that was not accessible with
previous methods. It is our hope that this will contribute to the toolbox of
model analysis and policy exploration, when an actor with an understanding
of the entire system is of interest.
In the paper, we did not consider stochastic influences that may be present
in the underlying real-world system on which the considered models are
based. We left out stochastic influences and thereby followed the original lit-
erature on the models used. This corresponds to level 0 uncertainty in the
hierarchy defined by Kwakkel and Pruyt (2013). A combination of our
approach with a higher uncertainty level could be a subject for further
research. However, with the approach presented in this paper it is already
possible to quickly adapt to changing realities by adapting the model, objec-
tive function and choice of free variables, and then re-solve the optimization
problem.
This paper has focused on policy optimization. In SD, optimization
methods are also used to calibrate certain model parameters to achieve a
desired model behavior. From a mathematical standpoint, calibration and
policy optimization are very similar. In the first step, the user would define
the objective function to measure how close a given simulation is to the
desired behavior. For example, this could be the absolute difference between
the value of a state variable at a given time and the desired value. In a more
complex case, the user could define the desired behavior of a state by defin-
ing a table function with desired values for each simulated time step. The
objective function would then be the absolute difference between the actual
values of the state variable and the desired values at each time step. After
defining the objective, in the second step the parameter to be optimized
would be selected as the free variable. The methods presented in this paper
could then be applied unchanged. In this paper, we have solved problems
with hundreds of free variables; since calibration problems usually include
only a few free variables, we would expect local optima to be found without
difficulty.
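As an illustration of the calibration setup just described, here is a minimal sketch of such an objective; the simulate and target arguments stand in for the user's model and desired-behavior table and are assumptions of this example:

```python
import numpy as np

def calibration_objective(theta, simulate, target):
    """Sum of absolute differences between the simulated trajectory of a
    state variable and its desired value at each time step, as described
    in the text.

    theta:    candidate values of the parameters to be calibrated
    simulate: function mapping theta to the simulated state trajectory
    target:   desired value of the state at each simulated time step
    """
    return np.sum(np.abs(simulate(theta) - target))
```

Note that the absolute difference is non-smooth at zero; for the derivative-based solvers discussed in this paper, one would typically minimize squared differences or a smooth approximation instead.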

A key concept of our approach is the removal of a functional dependency
from the model. The value of a variable that was previously defined by this
relationship will afterwards be computed by an optimization procedure.
This can be seen as the substitution of a link in a model by an optimization
algorithm.
As mentioned before, one might think that one or several feedback loops
are cut by this procedure. However, the optimization algorithm uses the
information of all variables in the model to compute the optimal values. The
previous feedback loop is still intact and even additional feedback loops
might have been created. More research is needed to untangle what feedback
relations actually influence the designed policy and by how much. This
endeavor is related to a conventional sensitivity analysis of the model.
Methods of data science and exploratory data analysis could be employed to
uncover those conditions and causal relations. We see a potential for further
research in this direction.
Existing numerical solvers such as CONOPT are able to solve such SDO
problems to proven local optimality in a very short time: typically less than
1 minute. The only technical obstacle is to approximate table functions by
smooth functions, for which we suggest spline interpolation, because the
numerical optimization software requires existing first and second deriva-
tives. We believe that most systems and relationships that are represented
via SD models do not naturally contain non-smooth functions (“nature does
not jump”; see, for example, Linné, 1751). The existing non-smoothness in
functions in published models appears to be a result of modeling practices
rather than an inherent property of the modeled systems. Here, modeling
software could help by letting users enter data not by filling out tables but by
directly drawing a spline curve, as is possible, for instance, in CAD software
packages.
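As an illustration of the smoothing step, a cubic spline through the breakpoints of a table function is twice continuously differentiable and therefore provides the first and second derivatives the solvers require. A small sketch with made-up table values:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Breakpoints and lookup values of an illustrative table function
# (the numbers are made up for this example).
x_table = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_table = np.array([1.0, 0.9, 0.7, 0.4, 0.2])

# A cubic spline through the table points is twice continuously
# differentiable, so the first and second derivatives the solvers
# require exist everywhere on the interpolation interval.
table_fn = CubicSpline(x_table, y_table)

x = 0.75
value = table_fn(x)         # smoothed lookup
slope = table_fn(x, 1)      # first derivative
curvature = table_fn(x, 2)  # second derivative
```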
Our research also poses some further challenges for developers of numerical
solvers: the solutions found by the nonlinear optimization methods are only
locally optimal, and no information is provided about how far they are away
from a global optimal solution. To answer this question, one possible way is
to embed the whole procedure in a branch-and-bound search that guarantees
to find global optimal solutions in finite time. There has been some recent
research that explores this avenue.
To conclude this paper, we would like to express our sincere hope that we
were able to inspire our readers to consider modern optimization methods
as a powerful addition to their toolbox. To make these methods available
and accessible, we see the need for user-friendly software to bridge the gap
between SD and applied mathematics. In Appendix S1 (Supporting informa-
tion) we discuss and present some software solutions that we consider nec-
essary. Furthermore, some experience is required to become comfortable in
setting up optimization problems by defining objective functions and select-
ing free variables. For this, there is no better way than to choose a model
and to start experimenting. It is our hope that the use of modern

optimization methods will allow many practitioners to find new and inter-
esting solutions in models that we considered completely familiar.

Acknowledgements

We thank three reviewers for constructive comments on an earlier version of
our manuscript. Moreover, we thank those software providers who were
willing to share information and thus support our research.

Biographies

Armin Fügenschuh has been a professor of Engineering Mathematics and
Numerics of Optimization at the Brandenburg Technical University Cottbus-
Senftenberg since 2017, and a Fellow of the Zuse Institute Berlin since 2013.
He was a professor of Applied Mathematics at the Helmut Schmidt University /
University of the Federal Armed Forces Hamburg between 2013 and 2017,
and a postdoctoral researcher at the Technical University Darmstadt, the Zuse
Institute Berlin, the Georgia Institute of Technology, Atlanta, and the
Friedrich-Alexander University Erlangen-Nuremberg between 2008 and
2013. He received his PhD from the Technical University Darmstadt in 2005
and his Habilitation from the Technical University Berlin in 2011. His
research interests include mixed-integer linear and nonlinear optimization
and their applications to real-world problems, among them system dynamics
optimization models.

Stefan N. Groesser is professor of strategic and technology management and
Dean of Studies for Industrial Engineering and Management Science at the
Bern University of Applied Sciences, Switzerland. In addition, he is a senior
researcher in strategic management and system dynamics at the University
of St. Gallen, Switzerland, and was a visiting scholar at the System Dynamics
Group at MIT Sloan. Stefan received his PhD in management from the
University of St. Gallen. His research interests include strategic and technology
management, business models, mental models, and simulation methodology.

Ingmar Vierhaus received his Diploma in Physics from Humboldt University
Berlin in 2009. Since then he has held research positions at the German
Aerospace Center, the Zuse Institute Berlin, and the Helmut Schmidt Univer-
sity / University of the Federal Armed Forces Hamburg. Among other pro-
jects, from 2012 to 2015 he was part of the Collaborative Research Center
"Sustainable Manufacturing", funded by the German Research Foundation.
His academic work focuses on mixed-integer nonlinear optimization and
dynamical systems.

References

Artstein Z. 1980. Discrete and continuous bang-bang and facial spaces, or: look for
the extreme points. SIAM Review 22(2): 172–185.
Bellman R. 2003. Dynamic Programming. Dover Books on Computer Science Series.
Dover: Mineola, NY.
Benson HY, Shanno DF, Vanderbei R. 2004. Interior-point methods for nonconvex
nonlinear programming: jamming and numerical testing. Mathematical Program-
ming 99(1): 35–48.
Betts JT. 2010. Practical Methods for Optimal Control Using Nonlinear Programming.
Advances in Design and Control. Society for Industrial and Applied Mathematics:
Philadelphia, PA.
Board of Governors of the Federal Reserve System. 2015. Open market operations
archive. Available: https://www.federalreserve.gov/monetarypolicy/openmarket_
archive.htm [8 March 2017].
Byrd RH, Nocedal J, Waltz RA. 2006. KNITRO: an integrated package for nonlinear
optimization. In Large-Scale Nonlinear Optimization, di Pillo G, Roma M (eds).
Springer: Heidelberg; 35–59.
Conn A. 2014. A trust region method for solving grey-box mixed integer nonlinear
problems. In International Workshop on MINLP 2014, CMU, Pittsburgh, PA, 2–5
June 2014 (lecture slides).
Dangerfield B, Roberts C. 1996. An overview of strategy and tactics in system dynam-
ics optimization. Journal of the Operational Research Society 47: 405–423.
Drud AS. 1985. CONOPT: a GRG code for large sparse dynamic nonlinear optimiza-
tion problems. Mathematical Programming 31(2): 153–191.
Drud AS. 1994. A large scale GRG code. ORSA Journal on Computing 6(2): 207–216.
Dynaplan AS. 2015. Dynaplan SMIA software. Available: http://dynaplan.com
[30 June 2015].
Fletcher R. 1982a. A model algorithm for composite nondifferentiable optimization
problems. Mathematical Programming 17: 67–76.
Fletcher R. 1982b. Second order corrections for nondifferentiable optimization. In
Numerical Analysis, Vol. 912 of Lecture Notes in Mathematics, Watson GA (ed).
Springer: Berlin; 85–114.
Fletcher R, Leyffer S. 2002. Nonlinear programming without a penalty function.
Mathematical Programming 91(2): 239–269.
Ford DN, Sterman JD. 1998. Dynamic modeling of product development processes.
System Dynamics Review 14(1): 31–68.
Forio Corporation. 2015. Forio software. Available: http://forio.com [30 June 2015].
Forrester JW. 1968. Market growth as influenced by capital investment. Industrial
Management Review 9(2): 83–105.
Forrester JW. 1969. Urban Dynamics. Pegasus Communications: Waltham, MA.
Forrester JW. 1971a. Counterintuitive behavior of social systems. Industrial Manage-
ment Review 73(2): 52–68.
Forrester JW. 1971b. World Dynamics. Wright-Allen: Boston, MA.
Fortmann-Roe S. 2014. Insight maker: a general-purpose tool for web-based model-
ing & simulation. Simulation Modelling Practice and Theory 47: 28–45.


Gill PE, Murray W, Saunders MA. 2005. SNOPT: an SQP algorithm for large-scale
constrained optimization. SIAM Review 47(1): 99–131.
GoldSim Technology Group. 2015. Goldsim software. Available: http://goldsim.com
[30 June 2015].
Groesser SN, Jovy N. 2016. Business model analysis using computational modeling: a
strategy tool for exploration and decision-making. Journal of Management Control
27: 61–88.
Grösser SN. 2014. Co-evolution of legal and voluntary standards: development of
energy efficiency in Swiss residential building codes. Technological Forecasting
and Social Change 87(1): 1–16.
Han SP. 1976. Superlinearly convergent variable metric algorithms for general non-
linear programming problems. Mathematical Programming 11(3): 263–282.
Hanson DA, Kryukov Y, Leyffer S, Munson TS. 2009. Optimal control model of tech-
nology transition. International Journal of Global Energy Issues 33: 154–175.
Isee Systems, Inc. 2015. STELLA software. Available: http://iseesystems.com [30 June
2015].
Karmarkar NK. 1984. A new polynomial-time algorithm for linear programming.
Combinatorica 4: 373–395.
Karush W. 1939. Minima of functions of several variables with inequalities as side
constraints. Master’s thesis. Department of Mathematics, University of Chicago,
Chicago, IL.
Keloharju R, Wolstenholme E. 1988. The basic concepts of system dynamics optimi-
zation. Systemic Practice and Action Research 1(1): 65–86.
Keloharju R, Wolstenholme E. 1989. A case study in system dynamics optimization.
Journal of the Operational Research Society 40(3): 221–230.
Kuhn HW, Tucker AW. 1951. Nonlinear programming. In Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability, Neyman J (ed).
University of California Press: Berkeley, CA; 481–492.
Kwakkel JH, Pruyt E. 2013. Exploratory modeling and analysis: an approach for
model-based foresight under deep uncertainty. Technological Forecasting and
Social Change 80(3): 419–431.
Linné CV. 1751. Philosophia Botanica In Qua Explicantur Fundamenta Botanica Cum
Definitionibus Partium, Exemplis Terminorum, Observationibus Rariorum. Adjec-
tis Figuris Aeneis. Kiesewetter: Stockholm. pp.1–362.
Liu H, Howley E, Duggan J. 2012. Co-evolutionary analysis: a policy exploration method
for system dynamics models. System Dynamics Review 28(4): 361–369.
Morecroft JDW. 1985. Rationality in the analysis of behavioral simulation models.
Management Science 31(7): 900–916.
Moxnes E, Krakenes A. 2005. SOPS: a tool to find optimal policies in stochastic
dynamic systems. In Proceedings of the 23rd International Conference of the Sys-
tem Dynamics Society, Sterman JD, Repenning NP, Langer RS, Rowe JI, Yanni JM
(eds). University of Bergen: Bergen. pp.1–16. http://www.systemdynamics.org/
conferences/2005/proceed/papers/MOXNE288.pdf
Murtagh BA, Saunders MA. 2003. MINOS 5.51 user’s guide. Technical report SOL
83-20R. Systems Optimization Laboratory, Department of Management Science
and Engineering, Stanford University, Stanford, CA.


Powell MJD. 1978. A fast algorithm for nonlinearly constrained optimization calcula-
tions. In Numerical Analysis, Vol. 630 of Lecture Notes in Mathematics,
Watson GA (ed). Springer: Heidelberg; 144–157.
Powersim Software AS. 2015. Powersim software. Available: http://powersim.com
[30 June 2015].
Rahmandad H, Repenning N. 2016. Capability erosion dynamics. Strategic Manage-
ment Journal 37(4): 649–672.
Repenning NP, Goncalves P, Black LJ. 2001. Past the tipping point: the persistence
of firefighting in product development. Industrial Management Review 43(4): 44–54.
Richardson GP. 2011. Reflections on the foundations of system dynamics. System
Dynamics Review 27(3): 219–243.
Schwaninger M, Grösser S. 2008. System dynamics as model-based theory building.
Systems Research and Behavioral Science 25(4): 447–465.
Simon H. 1984. Models of Bounded Rationality, Vol. 1: Economic Analysis and Public
Policy, 1st ed. MIT Press: Cambridge, MA.
Sonneborn L, Van Vleck F. 1965. The bang-bang principle for linear control systems.
SIAM Journal of Control 2: 151–159.
Sterman JD. 2000. Business Dynamics: Systems Thinking and Modeling for a Com-
plex World. Irwin/McGraw-Hill: Boston, MA.
The AnyLogic Company. 2015. Anylogic software. Available: http://anylogic.com
[30 June 2015].
Ventana Systems, Inc. 2015. Vensim software. Available: http://vensim.com [30 June
2015].
Wächter A, Biegler LT. 2006. On the implementation of an interior-point filter line-
search algorithm for large-scale nonlinear programming. Mathematical Program-
ming 106: 25–57.
Wilson RB. 1963. A simplicial method for convex programming. PhD thesis. Harvard
University, Cambridge, MA.
Yücel G, Barlas Y. 2011. Automated parameter specification in dynamic feedback
models based on behavior pattern features. System Dynamics Review 27(2):
195–215. https://doi.org/10.1002/sdr.457.

Supporting information

Additional supporting information may be found in the online version of
this article at the publisher's website.

Appendix S1. Further Optimization Methods.


Appendix S2. Comparison with Optimization in Vensim.

