
CDC00-INV4502: A Numerical Method for Solving Singular Brownian Control Problems∗

Sunil Kumar (Graduate School of Business, Stanford University)
Muthukumar Muthuraman (Department of Computer Science, Stanford University)

Abstract

The Brownian approximation approach to developing dynamic control policies for multiclass queueing networks is useful when the limiting Brownian control problem can be solved. However, this problem can rarely be solved analytically. In this paper we present a new method for numerically solving the Brownian control problem. We adapt nonlinear finite element methods to numerically solve the Hamilton-Jacobi-Bellman equation associated with the Brownian control problem. The solution to this partial differential equation is then used to construct an optimal control for the Brownian system. We illustrate this method on an example of a Brownian control problem.

1 Introduction
Multiclass queueing networks form natural models of a wide variety of systems in manufacturing and communication. The problem of finding optimal scheduling policies for such networks is, therefore, of considerable interest. In general, analytically deriving such optimal policies is intractable. It is prudent to relax one's expectations and to find "good" policies that are optimal in some ideal asymptotic regime. A good candidate for such a regime is "heavy traffic", where the system is barely capable of handling the load impressed upon it. A systematic procedure for finding such good policies was proposed by Harrison [?] as follows. First, we set up the system model and the corresponding optimal control problem. Then, we imagine a system parameter n increasing without bound and scale all processes of interest by the parameter n as suggested by functional central limit theorems. We heuristically derive a corresponding Brownian control problem in terms of possible functional limits of scaled processes. We solve the limiting control problem and find an interpretation of its solution in the context of the original system. We use this interpretation to come up with a policy for the original system. Finally, we analyze performance under that policy in the limiting asymptotic regime to verify that it is indeed asymptotically optimal.

∗This research is supported in part by a Finmeccanica Faculty Scholarship at Stanford University.
A shortcoming of the approach proposed by Harrison is that it is most applicable when the limiting control problem is easily solved and the solution to the limiting control problem is easily interpreted. Unfortunately, it is only for a small class of networks that analytical solutions of the limiting Brownian control problem can be derived. Hence, if the approach is to become a universally applicable mechanistic procedure, it is necessary to find ways to solve the Brownian control problem in general, at least numerically. Furthermore, it is necessary that the numerical solution of the Brownian control problem lend itself to easy interpretation as a policy in the context of the original multiclass queueing network.
A numerical procedure for solving Brownian control problems, based on approximating Markov chains, was proposed by Kushner and Dupuis [?]. In this paper, we propose an alternate method for solving the Brownian control problem that has the following added advantage: it provides the numerical solution in a form that contains all the information necessary to interpret the solution as a policy in the original multiclass queueing network. To be more specific, it provides the solution in a form that allows one to apply the mechanistic policy design procedure called BIGSTEP proposed by Harrison [?]. Thus, the combination of the numerical method for solving the Brownian control problem and the BIGSTEP procedure provides one way to design a purely mechanistic procedure (such as a software package) that takes as input the primitives defining the multiclass queueing network and generates a "good" scheduling policy as output. Of course, the final step of verifying that the policies generated by this procedure are indeed asymptotically optimal remains. Despite several recent examples of such asymptotic optimality proofs in [?, ?, ?, ?], general proofs appear to be quite far away. However, recent work by Bramson [?] and Williams [?] has provided a powerful set of tools to help obtain such proofs.
The proposed method for solving the Brownian control problem may be described as follows. We begin by articulating the control problem that needs to be solved in its lowest dimensional form. This involves reformulating the problem in terms of an equivalent workload formulation [?]. Then, using Ito's formula, we formally derive the Hamilton-Jacobi-Bellman equation associated with this problem. This partial differential equation has an embedded optimization problem within it. We solve this equation using a two-step iterative procedure. In the first step, we assume a solution to the embedded optimization problem and solve a partial differential equation with known parameters using the finite element method. In the second step, the solution obtained to the partial differential equation is used to re-solve the optimization problem. This two-step procedure is then iterated until convergence. The reader who is familiar with numerical methods of optimal control will recognize this procedure as the analog of the policy improvement algorithm.
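For intuition, the discrete-state analog reads as follows. The sketch below runs classical policy iteration on a small made-up discounted Markov decision process: step one evaluates the current policy by solving a linear system, and step two improves the policy through the embedded minimization. This is the textbook algorithm, not the method of this paper; all transition and cost data are randomly generated for illustration only.

```python
# Classical policy iteration on a tiny discounted MDP: the discrete analog
# of the two-step procedure above (assume a control, solve for the value,
# improve the control). All problem data here are made up.
import numpy as np

n_states, n_actions, alpha = 3, 2, 0.9          # alpha: discount factor
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :]
c = rng.uniform(0.0, 1.0, size=(n_actions, n_states))             # cost c[a, s]

policy = np.zeros(n_states, dtype=int)
while True:
    # Step 1: policy evaluation -- solve (I - alpha * P_pi) v = c_pi
    P_pi = P[policy, np.arange(n_states), :]
    c_pi = c[policy, np.arange(n_states)]
    v = np.linalg.solve(np.eye(n_states) - alpha * P_pi, c_pi)
    # Step 2: policy improvement -- the embedded minimization
    q = c + alpha * P @ v                        # q[a, s]
    new_policy = q.argmin(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print("optimal policy:", policy, "value:", v)
```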
The procedure raises two important theoretical issues. First, we need to show that the iterative procedure actually converges, and that the limit is indeed a solution to the Hamilton-Jacobi-Bellman equation. Second, we need to verify that a solution to the Hamilton-Jacobi-Bellman equation is indeed optimal for the stochastic Brownian control problem. This conference paper is meant to disseminate the basic method and to illustrate its application. Hence, neither of these theoretical issues will be resolved in this paper.
The rest of this extended abstract is organized as follows. In the next
section, application of the method outlined above will be illustrated on an
example of a single-server queueing system with service rate control. In the
last section, the general method and other examples considered in the full
paper will be outlined.

2 An Example
Consider the queueing system shown in Figure 1. It consists of a single server whose service rate is continuously variable. There is a finite buffer in front of the server, and exogenous arrivals that do not find room in the finite buffer are lost. For concreteness, the reader can take the rate of arrivals to the system to be 1 and the service rate to be 1 − µ, where µ is a real-valued variable representing the deviation from the nominal service rate of 1. This interpretation is not necessary for the precise mathematical statement of the Brownian control problem that follows. We consider the case in which the service rate is controlled by the system manager to minimize the sum of the infinite-horizon expected discounted holding cost for jobs waiting in the buffer and the infinite-horizon expected discounted cost of control. That is, the manager attempts to optimize the trade-off between using a very high service rate to minimize the holding costs and the cost of implementing that control.

[Figure 1: Queueing System Example. Arrivals at rate 1 enter a finite buffer served by a single server with controllable service rate.]
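To make the setup concrete, here is a minimal discrete-event simulation of the system in Figure 1: Poisson arrivals at rate 1, exponential services at the state-dependent rate 1 − µ(z), and blocking when the buffer of size K is full. The buffer size, horizon, and the linear control rule passed in at the end are arbitrary placeholder choices, not values from the paper.

```python
# Discrete-event simulation of a single server with finite buffer K,
# arrivals at rate 1, and state-dependent service rate 1 - mu(z).
import random

def simulate(mu, K=20, horizon=10_000.0, seed=1):
    rng = random.Random(seed)
    t, z, area = 0.0, 0, 0.0            # time, queue length, integrated queue
    while t < horizon:
        rate_in = 1.0 if z < K else 0.0        # arrivals blocked when full
        rate_out = max(1.0 - mu(z), 0.0) if z > 0 else 0.0
        total = rate_in + rate_out
        if total == 0.0:
            break
        dt = rng.expovariate(total)            # time to the next event
        area += z * dt
        t += dt
        if rng.random() < rate_in / total:
            z += 1                             # arrival admitted
        else:
            z -= 1                             # service completion
    return area / t                            # long-run average queue length

# Placeholder control: serve faster (mu < 0) as the buffer fills.
print(simulate(lambda z: -0.5 * z / 20))
```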

Rather than consider a system of discrete jobs, we consider the following continuous approximation of the system, and the resulting Brownian control problem. The reader is referred to Chapter 2 of Harrison [?] for details of this system model. We assume that the inventory in the system Z(t), with Z(t) ∈ [0, 1] for all t, is given by

$$Z(t) = W(t) + L(t) - R(t), \qquad (1)$$

where W(t) is a controlled one-dimensional diffusion given by

$$W(t) = w + \int_0^t \mu(Z(s))\,ds + \int_0^t \sigma\,dB(s), \qquad (2)$$

with B(·) a standard Brownian motion, w the initial inventory, and σ the variance parameter of the diffusion, and where L(t) and R(t) are nondecreasing reflection processes that correspond to idleness of the server when the buffer is empty and turning away of arrivals when the buffer is full. They are given by

$$L(t) = \sup_{0 \le s \le t} \bigl(W(s) - R(s)\bigr)^-, \quad \text{and} \qquad (3)$$

$$R(t) = \sup_{0 \le s \le t} \bigl(1 - W(s) - L(s)\bigr)^-. \qquad (4)$$

The function µ(·) : [0, 1] → U represents the state-dependent drift rate control, taking values in a compact set U ⊂ R. When the inventory is Z(t), the drift rate is µ(Z(t)). The cost function that we intend to optimize, by choosing a suitable function µ, is

$$v(w) = \int_0^\infty e^{-\alpha t}\, \mathrm{E}\bigl[h(Z(t)) + c(\mu(Z(t)))\bigr]\, dt, \qquad (5)$$

where h(·) is the holding cost function and c(·) is the control cost function. For simplicity, we assume that both are general quadratic polynomials; α is the interest rate.
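As a sanity check on the model (1)-(5), the sketch below estimates the cost v(w) for a fixed test control by Monte Carlo: it simulates the diffusion with an Euler scheme and realizes the reflection processes L and R by clipping Z to [0, 1], a crude but standard stand-in for the Skorokhod map. The values of σ, α, the quadratic costs, and the test control are all illustrative assumptions, not values from the paper.

```python
# Monte Carlo estimate of the discounted cost (5) under a fixed control mu.
import numpy as np

def discounted_cost(mu, w=0.5, sigma=1.0, alpha=1.0,
                    h=lambda z: z**2, c=lambda u: u**2 / 2,
                    T=20.0, dt=1e-3, n_paths=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = np.full(n_paths, w)              # Z(0) = w on every path
    total = np.zeros(n_paths)
    disc = 1.0                           # running discount factor e^{-alpha t}
    for _ in range(int(T / dt)):
        u = mu(z)
        total += disc * (h(z) + c(u)) * dt
        # Euler step for (2); clipping to [0, 1] plays the role of L and R in (1)
        z = np.clip(z + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths),
                    0.0, 1.0)
        disc *= np.exp(-alpha * dt)
    return total.mean()

print(discounted_cost(mu=lambda z: 0.5 - z))   # a test control drifting toward 1/2
```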
Formally, using Ito's formula, we can derive the Hamilton-Jacobi-Bellman equation for the value function v(·) that corresponds to the control problem described above. The resulting equation is

$$\frac{\sigma^2}{2}\frac{\partial^2 v(w)}{\partial w^2} - \alpha v(w) + h(w) + \min_{\mu}\Bigl(c(\mu(w)) - \mu\,\frac{\partial v(w)}{\partial w}\Bigr) = 0, \qquad (6)$$

with boundary conditions

$$\frac{\partial v(0)}{\partial w} = \frac{\partial v(1)}{\partial w} = 0. \qquad (7)$$
The minimization in the equation above can be explicitly solved in some cases. For example, when c(µ) = µ²/2, we have¹ min_µ (c(µ) − µv′) = −(v′)²/2, and hence we obtain the differential equation for the value function v(·) as

$$\frac{\sigma^2}{2}\frac{\partial^2 v(w)}{\partial w^2} - \alpha v(w) - \frac{1}{2}\Bigl(\frac{\partial v(w)}{\partial w}\Bigr)^2 = -h(w),$$

with boundary conditions

$$\frac{\partial v(0)}{\partial w} = \frac{\partial v(1)}{\partial w} = 0.$$

In general, however, the optimization problem denoted by (∗) = min_µ (c(µ) − µv′) has to be solved numerically, and we will consider this general case in attempting to solve (6)-(7).

¹When unambiguous, we will denote ∂f(w)/∂w by f′(w).
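In that general case, a simple way to solve (∗) numerically is a grid search over the compact control set U. The sketch below does this and, for the quadratic cost c(µ) = µ²/2, checks the result against the closed form −(v′)²/2 derived above; the grid resolution and the choice U = [−2, 2] are arbitrary.

```python
# Grid search for the embedded minimization (*) = min_mu (c(mu) - mu * v').
import numpy as np

U = np.linspace(-2.0, 2.0, 4001)          # discretized compact control set
c = lambda mu: mu**2 / 2                  # the example's quadratic control cost

def embedded_min(v_prime):
    vals = c(U) - U * v_prime             # objective evaluated over the grid
    i = vals.argmin()
    return vals[i], U[i]                  # minimal value and minimizer mu*

for q in (0.0, 0.5, 1.3):
    star, mu_star = embedded_min(q)
    print(f"v'={q}: (*) = {star:.4f}  (closed form {-q * q / 2:.4f}), mu* = {mu_star:.4f}")
```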

Let us now turn our attention to solving the Hamilton-Jacobi-Bellman equation (6)-(7) using finite element methods. Consider the space [0, 1] divided into N intervals, each of length 1/N, and define the grid points w(1) = 0, w(i) = (i − 1)/N, and w(N + 1) = 1. We define "hat" functions N_a, a = 1, . . . , N + 1, as follows (at the boundary, N_1 and N_{N+1} are the obvious half-hats supported on a single interval); see the sketch after this list.

• N_a(·) is continuous and piecewise linear.

• N_a(w) = 0 for all w ≤ w(a − 1) and for all w ≥ w(a + 1).

• N_a(w(a)) = 1.
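Written out from the three defining properties, the hat functions on the uniform grid admit a one-line implementation; this is a quick sketch, with the boundary functions obtained simply by restricting the same formula to [0, 1].

```python
# Hat basis functions on the uniform grid w(i) = (i - 1)/N, i = 1..N+1.
import numpy as np

def hat(a, w, N):
    """Evaluate N_a at points w; the triangle peaks at w(a) = (a - 1)/N."""
    wa = (a - 1) / N
    return np.clip(1.0 - N * np.abs(w - wa), 0.0, None)

w = np.linspace(0.0, 1.0, 11)
print(hat(3, w, N=10))   # equals 1 at w(3) = 0.2 and 0 outside (0.1, 0.3)
```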

Consider any function g(w) defined on the same space [0, 1]. Multiplying equation (6) by g(w), integrating with respect to w from 0 to 1, and integrating by parts using the boundary conditions (7), we get

$$\frac{\sigma^2}{2}\,(v'(w), g'(w)) + \alpha\,(v(w), g(w)) = ((*), g(w)) + (h(w), g(w)),$$

where (f, g) denotes the inner product on L². Now approximating v(w) by

$$v(w) = \sum_{a=1}^{N+1} d_a N_a, \qquad (8)$$

and g(w) by

$$g(w) = \sum_{b=1}^{N+1} c_b N_b,$$

we get

$$\sum_{a=1}^{N+1}\Bigl[\frac{\sigma^2}{2}(N_a', N_b') + \alpha\,(N_a, N_b)\Bigr] d_a = ((*), N_b) + (h(w), N_b) \quad \text{for all } b = 1, 2, \ldots, N + 1.$$
This can be rewritten in matrix form as

$$K d = F + f_p, \qquad (9)$$

where d = [d_1, d_2, \ldots, d_{N+1}]', K = [\frac{\sigma^2}{2}(N_a', N_b') + \alpha\,(N_a, N_b)], F = [((*), N_b)], and f_p = [(h(w), N_b)].
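For uniform linear elements, the inner products (N_a′, N_b′) and (N_a, N_b) have standard closed forms (tridiagonal stiffness and mass matrices), so K and f_p can be assembled directly. The sketch below does so; σ, α, and the quadratic holding cost are illustrative stand-ins, and the basis index a runs 0, . . . , N here in place of the paper's 1, . . . , N + 1.

```python
# Assemble K and f_p of (9) for linear elements on a uniform grid of [0, 1].
import numpy as np

N, sigma, alpha = 50, 1.0, 1.0
hc = lambda w: w**2                      # holding cost (assumed quadratic)
h = 1.0 / N                              # mesh width

# (N_a', N_b'): tridiagonal stiffness matrix, halved on the boundary diagonal
S = (np.diag(2.0 * np.ones(N + 1)) - np.diag(np.ones(N), 1)
     - np.diag(np.ones(N), -1)) / h
S[0, 0] = S[N, N] = 1.0 / h
# (N_a, N_b): tridiagonal mass matrix, halved on the boundary diagonal
M = (np.diag(4.0 * np.ones(N + 1)) + np.diag(np.ones(N), 1)
     + np.diag(np.ones(N), -1)) * h / 6.0
M[0, 0] = M[N, N] = h / 3.0

K = 0.5 * sigma**2 * S + alpha * M

# f_p[b] = (hc, N_b), computed with a fine midpoint quadrature
nq = 20 * N
dw = 1.0 / nq
wq = (np.arange(nq) + 0.5) * dw
basis = np.array([np.clip(1.0 - N * np.abs(wq - a / N), 0.0, None)
                  for a in range(N + 1)])
fp = (hc(wq) * basis).sum(axis=1) * dw
```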
Note that f_p can be computed a priori. Equation (9) can then be solved iteratively as follows. First we assume a value for F (say, the zero vector) and solve for d. Using this solution for d, we compute the approximation for v using (8). We then use this approximation to solve (∗) and hence compute F. In doing so we also compute an approximation for the optimal control µ(·), and the procedure is repeated until convergence.
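Continuing the assembly sketch above (and reusing K, fp, basis, wq, dw, and N from it), the iteration might look as follows; the control grid, quadratic control cost, and stopping tolerance are arbitrary choices.

```python
# Fixed-point iteration on (9): solve K d = F + f_p, then refresh F from (*).
import numpy as np

U = np.linspace(-2.0, 2.0, 801)              # discretized control set
cost = lambda mu: mu**2 / 2                  # control cost (assumed quadratic)

# Derivative of each hat function at the quadrature nodes: +N then -N on its support.
dbasis = np.array([np.where((wq > (a - 1) / N) & (wq <= a / N), float(N), 0.0)
                   + np.where((wq > a / N) & (wq < (a + 1) / N), -float(N), 0.0)
                   for a in range(N + 1)])

d = np.zeros(N + 1)                          # corresponds to starting with F = 0
for _ in range(200):
    vprime = d @ dbasis                      # v'(w) at the quadrature nodes, via (8)
    star = (cost(U)[:, None] - U[:, None] * vprime).min(axis=0)  # (*) nodewise
    F = (star * basis).sum(axis=1) * dw      # F[b] = ((*), N_b)
    d_new = np.linalg.solve(K, F + fp)       # solve (9) for the new coefficients
    if np.max(np.abs(d_new - d)) < 1e-8:     # stop when the coefficients settle
        break
    d = d_new
```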
