The Maximum Collection Problem with Time-Dependent Rewards
E. Erkut
Department of Finance and Management Science, Faculty of Business,
University of Alberta

J. Zhang
Department of Earth and Atmospheric Sciences, Faculty of Science, University of Alberta

We consider a routing problem where the objective is to maximize the sum of the rewards
collected at the nodes visited. Node rewards are decreasing linear functions of time. Time is
spent when traveling between pairs of nodes, and while visiting the nodes. We propose a
penalty-based greedy (heuristic) algorithm and a branch-and-bound (optimal) algorithm
for this problem. The heuristic is very effective in obtaining good solutions. We can solve
problems with up to 20 nodes optimally on a microcomputer using the branch-and-bound
algorithm. We report our computational experience with this problem. © 1996 John Wiley & Sons, Inc.

1. INTRODUCTION
Perhaps the most popular problem in the network optimization literature is the travel-
ing-salesman problem (TSP). Recently, a number of network problems that are closely
related to the TSP have been suggested by different authors. One of these related problems
is the maximum collection problem with time-dependent rewards (MCPTDR), which was recently introduced by Brideau and Cavalier (BC) [2]. They suggested an integer program-
ming formulation and a heuristic algorithm for this problem. In this article we describe an
alternative heuristic, and an implicit enumeration algorithm for MCPTDR. We also report
our computational experience with the solution algorithms.
The MCPTDR is defined on a graph G(N, A) with node set N = {0, 1, 2, ..., n} and arc set A = {(i, j): i, j ∈ N}. The travel time on arc (i, j) is given by d_ij. The reward at node i is given by r_i(t) = max{b_i - s_i t, 0} for t ≥ 0, and the time required to collect the reward is denoted by v_i. It is assumed that the reward at node i is collected at the arrival instant. However, v_i units of time must pass before node i can be departed. Node 0 is assumed to be the depot with r_0(t) = 0 for all t > 0 and v_0 = 0. A solution to MCPTDR consists of a sequence of nodes in G. The objective is to find the sequence that maximizes the total
reward collected.
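To make the timing convention explicit, the arrival times induced by a visiting sequence and the objective can be restated as follows; this restatement is ours, derived from the definitions above, with σ = (σ_1, ..., σ_m) denoting the ordered set of nodes visited after leaving the depot and t_{σ_k} the arrival time at the k-th visited node:

    t_{σ_1} = d_{0σ_1},     t_{σ_{k+1}} = t_{σ_k} + v_{σ_k} + d_{σ_k σ_{k+1}},     k = 1, ..., m - 1,

    maximize   Σ_{k=1}^{m} max{0, b_{σ_k} - s_{σ_k} t_{σ_k}}.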
The MCPTDR is closely related to another routing problem, namely, the orienteering
problem, for which a number of applications, such as vehicle routing and production plan-
ning problems, have been suggested in the literature [10]. The MCPTDR problem may be
relevant in a competitive environment. BC suggest that the rewards may correspond to
sales made. The sales potentials may be declining over time, because other salespeople are
operating in the same area. Given the initial sales potential, and the decline rates, the objective of a salesperson may be to visit the nodes of the network in an order that maximizes
the total sales. Conversely, the rewards may correspond to purchases of a certain product.
The amount of the product available may be declining over time, because of other pur-
chasers operating in the same region. In this scenario, if one wishes to maximize the
amount purchased, the MCPTDR problem may apply. Butt and Cavalier [3] use this sce-
nario in the context of recruiting athletes from area high schools for a college team.
The case of a rock band with its first hit album and a popular music video may help
portray the competitive scenario. The band may plan a tour of cities to capitalize on its
new fame. However, in the music industry fame and fortune are short-lived. Hence, the
number of tickets the band will sell is likely to decrease over time as the attention of the
fans is directed to other up-and-coming bands. Therefore, it is in the best interest of the
band to play concerts in as many locations as possible, as quickly as possible. In this case, the number
of tickets sold would correspond to the reward, which would decline over time (perhaps at
the same rate), and the problem to be solved is MCPTDR.
The MCPTDR problem may also apply in the scheduling of visits for equipment repair.
Consider a telecommunications equipment repair company operating in a city or a prov-
ince. The repair company receives calls for service during the day and sends out a repair
team to its customers the next day. The repair contract may stipulate that the repair com-
pany will receive a certain amount of money for a repair service if they can respond by
early the next morning, and their revenue will decline in proportion to the delay in service.
A time-dependent penalty is reasonable in the case of telephone equipment, because a
phone line that is not operational will result in lost revenue for the customer requesting the
repair. The repair company may sell different repair contracts (i.e., different charges for
immediate service and different penalty rates) to their customers. For example, a bank may
be willing to pay a higher charge and impose stiffer delay penalties than a grocery store. In
this setting, the repair company would be interested in maximizing its revenue by solving
an MCPTDR. (If a customer cannot be serviced before the charge drops to zero, then the
service for that customer would either be contracted out to another repair company, or be
provided free of charge after all customers with positive rewards have been serviced.)

2. LITERATURE SURVEY
We now give a brief discussion of the relevant literature. There is only one reference in
the literature to the MCPTDR. In this reference, BC propose an integer programming
formulation for the problem and suggest a two-phase heuristic, which consists of a two-step
greedy construction phase, and a 2-OPT improvement phase. They can solve problems
with up to eight nodes, using their IP formulation and GAMS/MPSX. The BC heuristic is
able to find very good solutions to these problems.
Although there is only one reference to the MCPTDR, there are a number of articles
on a closely related problem in the literature, namely, the orienteering problem. In the
orienteering problem, a subset of nodes in a network is to be visited. At each node, a prize
is collected. The goal is to maximize the total prize collected within a given time allotment.
There are three differences between the orienteering problem and the MCPTDR: The re-
wards in the orienteering problem do not depend on the time of visit, there exists a time
limit for completing the (partial) tour in the orienteering problem, and no time is spent at
the nodes in the orienteering problem. The MCPTDR can be viewed as an unrestricted
orienteering problem in a competitive environment.
Tsiligirides [12] introduces the orienteering problem and suggests a stochastic and a deterministic heuristic algorithm for constructing good solutions, and a route-improvement algorithm for improving solutions constructed by these two heuristics. On test prob-
lems with 21, 32, and 33 nodes, and with different time limits, the stochastic heuristic
tends to outperform the deterministic heuristic in terms of solution quality. However, both
algorithms produce tours of comparable quality when complemented with the improve-
ment phase.
Golden, Levy, and Vohra [5] show that this problem is NP-hard, and present a center-of-
gravity heuristic for it, which is a four-phase approach. In the first phase, a greedy insertion
algorithm is used to construct a route, which is improved in phase two with the use of a 2-
OPT interchange procedure. In the third phase, the (weighted) center of gravity of the
improved route is computed, and nodes are sequenced according to a cheapest insertion
rule that uses distances between the nodes and the center of gravity. Finally, in the fourth
and final phase, the route that results from phase three is improved by using an interchange
procedure. The last two phases are repeated a number of times, and the best resulting tour
is reported as the product of this multiphase heuristic. The proposed four-phase heuristic
solves the Tsiligirides test problems in 0.5-10 s on a UNIVAC 1190. When compared to
the heuristics suggested by Tsiligirides, it is found that the center-of-gravity heuristic finds
better solutions than Tsiligirides’s stochastic heuristic (the better of the two heuristics sug-
gested by Tsiligirides) in about half of the problem instances, and it equals the performance
of the stochastic heuristic in about 20% of the instances. However, when one compares the
solutions from the center-of-gravity heuristic to the solutions from the stochastic heuristic
supported by an improvement phase (as suggested by Tsiligirides), the superiority of the
center-of-gravity heuristic is not as obvious.
Later, Golden, Wang, and Liu [7] improve upon the center-of-gravity heuristic by com-
bining it with the randomization idea of Tsiligirides’s stochastic heuristic, by taking into
account the opportunities posed by clusters of nodes, and by adding a learning feature. The
learning feature changes the desirability of the nodes in an insertion procedure based on
the outcomes of past tours. This heuristic is a relatively complicated one in comparison to
the other heuristics proposed for this problem. It uses five starting center-of-gravity loca-
tions, and uses 20 repetitions from each center-of-gravity location, producing a total of 100
solutions for every problem instance. Nevertheless, the computational effort required by
this heuristic is very reasonable (around 1 s per problem on a UNIVAC 1100/92) and it
finds better solutions to Tsiligirides's test problems than the stochastic heuristic and the
center-of-gravity heuristic.
The first three articles that study the orienteering problem suggest heuristic algorithms
for finding good solutions for it. In contrast, Kataoka and Morito [9] propose an optimal
algorithm for this problem. However, it seems that they were unaware of the existing liter-
ature, because they do not reference any of the articles discussed above. They provide two
formulations for the orienteering problem, which they term “the single-constraint maxi-
mum collection problem,” and describe a branch-and-bound procedure. They demon-
strate the solution procedure on a series of problems with 10 nodes and different prob-
lem parameters. These problems consume from 1 to 100 seconds on a 10-MHz micro-
computer.
In the most recent article that describes a heuristic for the orienteering problem, Ramesh
and Brown [10] propose a four-phase heuristic. This heuristic cycles over the first three
phases, namely, vertex insertions, edge interchanges, and vertex deletions, and takes the
best solution found to the fourth phase, where unvisited vertices are inserted into the tour.
Unlike its predecessors, this heuristic does not assume a Euclidean cost structure, and thus does not exploit the underlying geometry of the Euclidean problem. Thus, it can be viewed
as the most general heuristic for this problem. The quality of the solutions found on Eu-
clidean problems, and the computational effort, are comparable to those of the Golden,
Wang, and Liu [ 7 ] heuristic.
Later, Ramesh, Yoon, and Karwan [11] presented an optimal algorithm for this prob-
lem that employs a branch-and-bound search using Lagrangian relaxation. They reformu-
late the underlying network problem in such a way that the Lagrangian relaxation turns
out to be a modified form of a well-known problem. Using this procedure, they are able to
solve problems with 60 nodes in about 70 seconds on a VAX 11/780. This algorithm also allows them to test the performance of the Ramesh and Brown [10] heuristic. This heuristic finds solutions that are on the average about 10% away from the optimum on 20 test problems with 20, 30, 40, and 60 nodes and different time constraints.
We finish the literature survey with two articles that study problems that are closely
related to the orienteering problem. Balas [1] considers a "prize collecting TSP," where the
salesperson gets a prize in every city visited and pays a penalty for every city not visited.
The objective is to minimize the travel costs and penalties, while collecting a prescribed
amount of money. (Note that, without the penalties, the versions of the problem studied
by Kataoka and Morito and by Balas are very closely related. In fact, these two versions
can both be viewed as special cases of the bicriteria problem of maximizing rewards and
minimizing time spent.) Balas discusses structural properties of the polytope of this prob-
lem, identifying several families of facet-defining inequalities for the polytope. Although
these inequalities can be used in designing an algorithm for the problem, no algorithm is
suggested in this article.
In the most recent related paper, Butt and Cavalier [3] consider the multiple tour version
of the orienteering ( maximum collection) problem. They provide an integer programming
formulation for the problem, a heuristic solution procedure, and report their computa-
tional experience with the heuristic. On problems with 10, 15, and 20 nodes, the heuristic
succeeds in finding very good solutions, when compared to optimal solutions found using
the IP formulation.

3. MIXED-INTEGER PROGRAMMING FORMULATION


We now give a mixed-integer programming formulation for the MCPTDR by defining
one binary variable for each node to account for the piecewise linearity of the reward
function.

max   Σ_{i=1}^{n} r_i                                                   (1)

s.t.  Σ_j x_ij = 1,        i = 0, ..., n,                               (2)

      Σ_i x_ij = 1,        j = 0, ..., n,                               (3)

      ...                                                               (4)

      r_i ≤ b_i - s_i t_i + M(1 - y_i),     i = 1, ..., n,              (5)

      r_i ≤ M y_i,                          i = 1, ..., n,              (6)

      r_i, t_i ≥ 0,                         i = 1, ..., n,              (7)

      x_ij ∈ {0, 1},       i = 0, ..., n;  j = 1, ..., n;  i ≠ j,       (8)

      y_i ∈ {0, 1},        i = 0, ..., n.                               (9)

The problem parameters are as defined in Section 1, and M is a large number. The decision variables are

x_ij = 1 if node i is followed by node j in the tour, 0 otherwise,
t_i = the arrival time at node i,
r_i = the reward collected at node i,
y_i = 1 if the reward collected at node i is positive, 0 if the reward is zero.

The accounting of the rewards is accomplished as follows. For y_i = 1, (5) and (6) become r_i ≤ b_i - s_i t_i and r_i ≤ M, respectively. Thus, because the objective is to maximize, for y_i = 1, in an optimal solution we would have r_i = b_i - s_i t_i. For y_i = 0, (5) and (6) become r_i ≤ M and r_i ≤ 0, implying r_i = 0. Although this formulation would schedule visits to all nodes, it is clear that the zero-reward nodes (if any) would be encountered at the end of the tour. Depending on the problem scenario, it may be desirable to exclude such nodes from the tour in an actual implementation.
Note that it is not necessary to write another set of constraints involving the arrival time, the reward function parameters, and y_i [such as t_i ≤ b_i/s_i + M(1 - y_i)]. Because for y_i = 0 the reward is always zero, the maximization objective will prefer y_i = 1 for all nonnegative (b_i - s_i t_i). For negative (b_i - s_i t_i), the objective will prefer y_i = 0, which implies a zero reward.
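Constraint set (4), which the next paragraph notes also eliminates subtours, presumably links the arrival times t_i to the sequencing variables x_ij. A standard big-M form that such a linking constraint could take (offered here only as an assumed sketch, not necessarily the authors' exact formulation) is

    t_j ≥ t_i + v_i + d_ij - M(1 - x_ij),     i = 0, ..., n;  j = 1, ..., n;  i ≠ j,

with t_0 = 0, so that whenever x_ij = 1 the arrival at node j is no earlier than the departure from node i plus the travel time d_ij.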
Also note that constraint set (4) serves the purpose of eliminating subtours. The above formulation contains n² + n binary (0-1) variables, 2n continuous variables, and n² + 4n constraints. Although it is possible to solve the problem by solving the above formulation on a professional solver, we believe that this is not a desirable strategy, because of the size of the problem. Note that, even for a small problem with 10 nodes, the formulation contains 110 binary variables. Thus, we turn to heuristics in the next section, and develop a problem-specific branch-and-bound algorithm in Section 5.

4. HEURISTIC ALGORITHM

Because the computational effort required to solve an integer programming problem


increases quite rapidly as the problem size increases, BC provide a two-phase heuristic for
MCPTDR. The suggested heuristic consists of a serial application of two heuristics: a two-
step-look-ahead greedy and a 2-OPT procedure. We first suggest a minor improvement to
the first phase of this heuristic, and then propose a penalty heuristic for the first phase, which usually outperforms the two-step greedy in computational testing. We use the numerical example in Figure 1 to motivate the improvement we suggest.

Figure 1. The numerical example used to demonstrate the suggested improvement for the two-step greedy heuristic.
Suppose the tour is currently at node q, and the only nodes that are still unvisited are nodes j and k. At this point, the two-step greedy heuristic compares the reward-per-unit time for the sequence q-j-k with the sequence q-k-j, and chooses the next node to
be visited.
Following the notation in BC, the heuristic computes R_jk = [100 + (20 - 20) + (35 - 30)]/30 = 3.5 and R_kj = [100 + (35 - 20) + 0]/34 = 3.38, and chooses to visit node j next.
Clearly, this is an inefficient move, because the reward at node j is zero, and there is nothing to be gained from visiting node j before node k. In fact, the visit to node j delays the visit to
node k and results in a total reward that is lower than the total reward that can be achieved
by visiting node k first. This deficiency can be removed by inserting a simple check into the
two-step greedy heuristic: When choosing the next node to be visited, exclude all nodes
with zero rewards. In our preliminary computational tests, this minor modification im-
proved the quality of the solutions by an average of about 2%, and in the computational
tests we report later, we used the modified version of the two-step greedy heuristic.
(Inefficient visits to zero-reward nodes may be flushed out of the solution by the second
phase of the heuristic. However, this may increase the computational effort required by the
second phase.)

Penalty Heuristic
The heuristic we suggest for MCPTDR is a greedy construction heuristic that attempts
to minimize the opportunity cost-per-unit time of not visiting a node at any given point in
the tour. Suppose the last node visited in a partial tour is node q. Suppose if we visit node j next, we will collect a reward of r_j. If we do not visit node j next, the reward at node j will be reduced by some amount. The algorithm finds a lower bound on this reduction by considering visits to nodes other than node j. Suppose the tour visits node k next, instead of node j. Ignoring zero rewards for a moment, the loss of reward at node j will be equal to s_j(d_qk + v_k + d_kj - d_qj), if node j is visited immediately after node k. Thus, the loss value
calculated above is a lower bound on the loss of reward at node j, if node k is visited prior to node j. Repeating this calculation for all unvisited nodes and taking the minimum of all loss values gives a lower bound on the lost reward at node j if node j is not the next node to be visited. This can be considered as a penalty of not visiting node j next. We find a penalty per time unit by dividing this penalty by (d_qj + v_j). The next node to be visited is the node with the highest penalty-per-time value. If the reward at this node is zero at the time of the
arrival, then this node is excluded from the tour and the penalties are calculated again.
Hence, this algorithm may produce a tour of a subset of the initial node set, because it does
not include nodes with zero rewards in the tour.
We now state the penalty algorithm formally.

Initialize: partial tour P = {0}; nodes 1, ..., n are unvisited; time = 0; total_reward = 0.

repeat until all nodes are visited
    let q = the last node in partial tour P
    repeat for all unvisited nodes j
        repeat for all unvisited nodes k (k ≠ j)
            loss_k = (d_qk + v_k + d_kj - d_qj) s_j
        end
        minimum_loss = min_k { loss_k }
        penalty_j = minimum_loss / (d_qj + v_j)
    end
    maximum_penalty = max_j { penalty_j } = penalty_i
    delete node i from the list of unvisited nodes
    trial_time = time + v_q + d_qi
    trial_reward_i = b_i - s_i (trial_time)
    if trial_reward_i > 0 then
        add node i to the partial tour P
        time = trial_time
        total_reward = total_reward + trial_reward_i
    end
end
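For concreteness, the following is a minimal C++ sketch of the penalty construction heuristic as stated above. The data layout is our own assumption: a travel-time matrix d, reward parameters b and s, and visit times v, all indexed from 0 with node 0 as the depot and v[0] = 0; ties are broken by taking the first maximizer.

#include <vector>
#include <limits>
#include <algorithm>

struct PenaltyTour {
    std::vector<int> tour;      // visited nodes in order, starting at the depot
    double totalReward = 0.0;
};

PenaltyTour penaltyHeuristic(const std::vector<std::vector<double>>& d,  // travel times d[i][j]
                             const std::vector<double>& b,               // initial rewards b_i
                             const std::vector<double>& s,               // decline rates s_i
                             const std::vector<double>& v)               // visit times v_i (v[0] = 0)
{
    const int n = static_cast<int>(b.size()) - 1;   // nodes 1..n plus the depot 0
    std::vector<bool> unvisited(n + 1, true);
    unvisited[0] = false;

    PenaltyTour result;
    result.tour.push_back(0);
    double time = 0.0;
    int remaining = n;

    while (remaining > 0) {
        const int q = result.tour.back();

        // Choose the node whose penalty rate for NOT being visited next is largest.
        int chosen = -1;
        double bestPenalty = -1.0;
        for (int j = 1; j <= n; ++j) {
            if (!unvisited[j]) continue;
            double minLoss = std::numeric_limits<double>::max();
            for (int k = 1; k <= n; ++k) {
                if (!unvisited[k] || k == j) continue;
                // Lower bound on the reward lost at j if some other node k is visited first.
                minLoss = std::min(minLoss, (d[q][k] + v[k] + d[k][j] - d[q][j]) * s[j]);
            }
            if (minLoss == std::numeric_limits<double>::max()) minLoss = 0.0;  // only one node left
            const double penalty = minLoss / (d[q][j] + v[j]);
            if (penalty > bestPenalty) { bestPenalty = penalty; chosen = j; }
        }

        // Tentatively schedule the chosen node; keep it only if its reward is still positive.
        unvisited[chosen] = false;
        --remaining;
        const double trialTime = time + v[q] + d[q][chosen];   // arrival instant at the chosen node
        const double trialReward = b[chosen] - s[chosen] * trialTime;
        if (trialReward > 0.0) {
            result.tour.push_back(chosen);
            time = trialTime;
            result.totalReward += trialReward;
        }
        // Nodes whose reward has dropped to zero are simply left out of the tour.
    }
    return result;
}

A 2-OPT improvement pass would then be applied to result.tour, as suggested at the end of this section.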

Note that this heuristic uses O(n³) effort, like the two-step greedy heuristic suggested by BC. We use the numerical example in BC to demonstrate the penalty heuristic. The problem is reproduced in Figure 2 for convenience. The reward functions are printed next to each node; the visitation times are two units for each node. The first two iterations of the algorithm are summarized in Tables 1 and 2.

Figure 2. The numerical example used to demonstrate the penalty heuristic.

Table 1. The summary of Iteration 1 of the penalty algorithm. Node 2 is the first node to be visited, because the act of not visiting node 2 is associated with the highest penalty rate.

          Loss (k =)                     Penalty per
          1       2       3       4      unit time
j = 1     -       12      10      16     10/3
    2     18      -       18      24     18/3
    3     20      15      -       25     15/4
    4     2       2       1       -      1/6

Table 2. The summary of Iteration 2 of the penalty algorithm. Node 3 is the next node to be visited, because the act of not visiting node 3 is associated with the highest penalty rate.

          Loss (k =)              Penalty per
          1       3       4       unit time
j = 1     -       6       8       6/6
    3     25      -       15      15/5
    4     6       3       -       3/5

The matrices in the tables contain the reward-loss information. For example, if we do not visit node 1 in Iteration 1 but visit node 2, we will lose at least (1 + 2 + 4 - 1) * 2 = 12 units of reward. The last column in each table
provides the penalty per unit time if node j is not visited next, and the algorithm picks the
node with the largest penalty.
Continuing in this fashion (and breaking ties arbitrarily), the heuristic terminates with the following solution: 0 - 2 - 3 - 1 - 4 - 0. The reward associated with this solution is (148 - 1 * 3) + (148 - (1 + 2 + 3) * 5) + (144 - (1 + 2 + 3 + 2 + 2) * 2) + (122 - (1 + 2 + 3 + 2 + 2 + 2 + 3) * 1) = 494.
Although the solution found by the penalty heuristic for this numerical example is opti-
mal, we recognize that the penalty heuristic cannot be expected to find optimal solutions
with regularity, and we suggest using an improvement phase, such as 2-OPT, to improve
the solution found by the penalty heuristic.

5. EXACT ALGORITHM
In our opinion, solving the mixed-integer programming model in Section 3 using an
integer programming package is not the best way of solving MCPTDR. BC report that the
average solution time with a similar formulation (using GAMS/MPSX) is over 1.5 h on a
mainframe for problems with n = 8, and a problem with n = 10 was not solved in 50,000
CPU seconds. In the branching tree of a MIP problem with n² binary variables, there are n² levels, and as many as 2^(n²) tips at the bottom. However, the most intuitive representation of the problem (namely, one of viewing the solution as a sequence of n numbers) calls for only n levels in a branching tree with as many as n! tips at the bottom. Clearly, 2^(n²) grows much faster than n!.

Figure 3. Motivation of the improved upper bound. The last node in the incomplete tour is node q,
and nodes i and k are contained in set J.

We now describe a simple branch-and-bound algorithm to solve MCPTDR optimally.


Our algorithm constructs the optimal tour by implicitly enumerating all possible tours.
The algorithm is a depth-first search branching algorithm that operates on a vector of length n. All nodes in the branching tree, except for the tip nodes (in level n), correspond to
incomplete tours. Pruning of nodes in the enumeration tree is made possible by using lower
and upper bounds. The solution provided by the tandem of penalty heuristic and 2-OPT
delivers a (tight) lower bound for the problem. A simple upper bound for an incomplete
tour can be calculated quickly.
Denoting the set of nodes not visited by an incomplete tour by J, the partial reward of the incomplete tour by PR, the time consumed by the incomplete tour by t, and the last node visited by the incomplete tour by q, the following expression is an upper bound on the total reward for all possible completions of the incomplete tour:

    PR + Σ_{j∈J} max{0, b_j - s_j(t + v_q + d_qj)}.

Note that the assumption behind the above calculation is that we can visit every one of
the unvisited nodes immediately after node q. This would be possible if we had |J| visitors,
and if we would send one visitor to each node in J . Because there is only one visitor, and
nodes can only be visited sequentially, the above term is an upper bound on the total
reward that can be extracted from any completion of the incomplete tour. In fact, the
summation represents a potential reward. This simple upper bound is sufficient to solve
problems several orders of magnitude faster than the MIP model. Using a QUICKBASIC
program and a 386 microcomputer with a math coprocessor, we solved randomly gener-
ated problems with n = 10 in an average time of 26.5 s.
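As an illustration, the simple bound is only a few lines of code; the sketch below assumes the same data layout as the heuristic sketch in Section 4 (travel times d, parameters b and s, visit times v), with J the list of unvisited nodes, t the time consumed, PR the partial reward, and q the last node of the incomplete tour.

#include <vector>
#include <algorithm>

// Simple upper bound: pretend every unvisited node can be reached immediately
// after node q. This relaxation can only overestimate the collectible reward.
double simpleUpperBound(const std::vector<std::vector<double>>& d,
                        const std::vector<double>& b,
                        const std::vector<double>& s,
                        const std::vector<double>& v,
                        const std::vector<int>& J,    // unvisited nodes
                        double t, double PR, int q)
{
    double bound = PR;
    for (int j : J)
        bound += std::max(0.0, b[j] - s[j] * (t + v[q] + d[q][j]));   // potential of node j
    return bound;
}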
Clearly, it is possible to improve upon the upper bound introduced above. Here we out-
line possible improvements. The general idea is one of reducing the potential computed
above by recognizing the fact that the nodes in set J cannot be visited all at once. To
facilitate the development of the improved bound, we use Figure 3.
The potential for node i is max{0, b_i - s_i(t + v_q + d_qi)}. This is the reward that will be collected at node i if i is visited immediately after q. However, if node i is not visited immediately after q, then this potential will be reduced. The potential reward at node i will be reduced to the following amount, if node k is visited after q:

    max{0, b_i - s_i(t + v_q + d_qk + v_k + d_ki)}.

Thus, the reduction is

    s_i(d_qk + v_k + d_ki - d_qi).

We can find a lower bound on this reduction by considering all nodes j in J. A valid lower bound is

    s_i ( min_{j∈J} {d_qj} + min_{j∈J} {v_j} + min_{j,l∈J} {d_jl} - d_qi ).
Although the expression seems complicated, the idea is quite simple. If we do not visit i immediately after q, then some time will be lost due to travel to some (other) node in J, then some more time will be lost due to visiting that other node, and finally some more time will be lost due to the travel between the other node and node i. The lost time will be at least min_{j∈J}{d_qj} + min_{j∈J}{v_j} + min_{j,l∈J}{d_jl} units. The reduction formula above takes these losses into account. Reductions to potentials are computed for all nodes in J, and they are subtracted from the simple upper bound introduced as the potential. However, we must add the largest of these reductions back to the potential, because one of the nodes in J will actually be visited immediately after q. These calculations give us a sharper upper bound.
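The sketch below shows how one round of these reductions might be implemented on top of the simple bound; capping each reduction at the node's own potential is our own safeguard (so a potential is never reduced below zero), and the index sets of the minima follow the description above.

#include <vector>
#include <algorithm>
#include <limits>

// One reduction round: every node in J except the one actually visited next
// loses at least `reduction` units of its potential.
double sharperUpperBound(const std::vector<std::vector<double>>& d,
                         const std::vector<double>& b,
                         const std::vector<double>& s,
                         const std::vector<double>& v,
                         const std::vector<int>& J,
                         double t, double PR, int q)
{
    const double INF = std::numeric_limits<double>::max();
    double minDq = INF, minV = INF, minDjl = INF;   // minima used in the lost-time bound
    for (int j : J) {
        minDq = std::min(minDq, d[q][j]);
        minV  = std::min(minV,  v[j]);
        for (int l : J)
            if (l != j) minDjl = std::min(minDjl, d[j][l]);
    }
    if (J.size() < 2) minDjl = 0.0;   // with fewer than two nodes there is no second leg

    double bound = PR, totalReduction = 0.0, maxReduction = 0.0;
    for (int i : J) {
        const double potential = std::max(0.0, b[i] - s[i] * (t + v[q] + d[q][i]));
        bound += potential;
        // Lower bound on the reward lost at i if i is not visited immediately after q,
        // capped so that the potential of node i is never reduced below zero.
        const double lostTime  = minDq + minV + minDjl - d[q][i];
        const double reduction = std::min(potential, std::max(0.0, s[i] * lostTime));
        totalReduction += reduction;
        maxReduction    = std::max(maxReduction, reduction);
    }
    // Exactly one node of J is visited immediately after q, so the largest
    // reduction is added back, as described in the text.
    return bound - totalReduction + maxReduction;
}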
The reductions detailed above reduce the potential by accounting for the fact that only
one of the potentials in the set J can be realized, and all other potentials must be reduced,
because arrivals at all but one node will be delayed by one round in the sequence. This idea can be taken further, and potentials can further be reduced up to |J| - 1 times. This is, in fact, what we implemented in our program. However, we do not detail further reductions,
because the operations are very similar to the ones detailed above.
Clearly, the additional pruning achieved through improved upper bounds may not be
worth the effort spent for improving the upper bounds. However, we found that the version
of the algorithm with sharper upper bounds used less computational time for our test prob-
lems. It is perhaps worth mentioning that, before spending effort on improving the simple
upper bound at a node, one should check whether the simple upper bound is sufficient to
prune the node. Reductions in the potential should only be calculated if the simple upper
bound is unable to prune the node. Similarly, after every reduction round, a pruning should
be attempted before starting the next reduction round.
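Putting the pieces together, the depth-first enumeration can be sketched as below. This is only our illustration of the scheme described in this section: it uses the simpleUpperBound function from the earlier sketch for pruning (the sharper bound, or several reduction rounds, could be substituted), and the incumbent best would be initialized with the value of the penalty + 2-OPT solution, which the text uses as the lower bound.

#include <vector>
#include <algorithm>
#include <cstddef>

// Depth-first enumeration of visiting sequences, pruned with an upper bound.
// Assumes simpleUpperBound() from the earlier sketch; `best` holds the incumbent value.
void branchAndBound(const std::vector<std::vector<double>>& d,
                    const std::vector<double>& b,
                    const std::vector<double>& s,
                    const std::vector<double>& v,
                    const std::vector<int>& unvisited,   // nodes not yet in the partial tour
                    int q, double t, double PR,
                    double& best)
{
    best = std::max(best, PR);          // a partial tour is itself a feasible solution
    if (unvisited.empty()) return;
    if (simpleUpperBound(d, b, s, v, unvisited, t, PR, q) <= best) return;   // prune

    for (std::size_t idx = 0; idx < unvisited.size(); ++idx) {
        const int i = unvisited[idx];
        const double arrival = t + v[q] + d[q][i];
        const double reward  = std::max(0.0, b[i] - s[i] * arrival);

        std::vector<int> rest = unvisited;   // branch: visit node i next
        rest.erase(rest.begin() + static_cast<std::ptrdiff_t>(idx));
        branchAndBound(d, b, s, v, rest, i, arrival, PR + reward, best);
    }
}

// Typical call: branchAndBound(d, b, s, v, {1, ..., n}, 0, 0.0, 0.0, best);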

6. COMPUTATIONAL EXPERIENCE
We were interested in the computational effort required by the algorithms, as well as in
the accuracy of the heuristics. The algorithms were coded in QUICKBASIC and run on a
386 micro. We generated the network by generating uniformly distributed random points
in the (0, 10) square. The distances were assigned by using the Euclidean distance formula. The visiting times and the parameters of the reward function were generated with the use of uniform distributions as well. To facilitate duplication of our results, we report the ranges of these parameters along with our results below.

Table 3. Average accuracies of the heuristics on 20 random problems with n = 8.

s_i, v_i range     2SG        PEN        2SG + 2OPT    PEN + 2OPT
(1, 11)            90.66%     96.61%     96.79%        98.62%
(1, 16)            91.46%     90.50%     97.68%        97.51%

2SG = two-step greedy heuristic; PEN = penalty heuristic; 2SG + 2OPT = two-step greedy complemented with the 2-OPT improvement heuristic; PEN + 2OPT = penalty heuristic complemented with the 2-OPT improvement heuristic.

We first report the accuracy comparison of the two-step greedy with the penalty algo-
rithm on 20 problems with n = 8. Both algorithms were run individually, as well as in series
with the 2-OPT algorithm, resulting in a total of four heuristic runs per problem. Each
problem was solved optimally, with the use of the branch-and-bound algorithm. The range
for b_i in all problems solved was (300, 600). To observe the impact of the problem parameters, we used two sets of distributions for the s_i and v_i values. When we used a range of (1, 11) for s_i and v_i, the optimal tours collected positive rewards from an average of 7.25 nodes (out of 8), whereas when we switched to a range of (1, 16) for s_i and v_i (i.e., larger variance), on the average 2.05 nodes resulted in zero rewards. We report the average per-
centage optimality figures for the heuristics in Table 3.
This limited comparison suggests that the penalty heuristic is superior to the two-step-
greedy on problems where almost all of the nodes produce positive rewards. However, the
two-step-greedy is slightly superior on problems where several nodes produce zero rewards.
In almost all instances, the 2-OPT heuristic results in an improvement of the solutions. On
the average, the best of the four heuristic solutions is within 2.5% of the optimum.
We further compared these heuristics on larger problems (n = 10, 20, 30, 40, 50). In
each set 10 random problems were solved. However, we did not solve these larger problems
optimally. Thus, the average accuracies reported are in relation to the best solution found
by the four heuristics. The b range used was dependent on n, namely (40n, 80n). As in the first part of the experiment, we used two different ranges for s_i and v_i values. In Table 4 we report the average accuracies for the s-v range of (1, 11), and in Table 5 we report the accuracies for the s-v range of (1, 16). For the problems summarized in Table 4, almost all nodes produced positive rewards, whereas for the problems summarized in Table 5, several nodes (up to 20%) produced zero rewards.

Table 4. Average accuracies of the heuristics on 10 random problems with the s-v range of (1, 11).

n     b               2SG        PEN        2SG + 2OPT    PEN + 2OPT
10    (400, 800)      91.26%     98.69%     98.44%        99.75%
20    (800, 1600)     92.44%     96.77%     98.90%        99.67%
30    (1200, 2400)    90.87%     97.99%     98.94%        99.64%
40    (1600, 3200)    91.51%     97.53%     98.43%        99.92%
50    (2000, 4000)    91.84%     98.06%     98.07%        100.00%
Average               91.58%     97.81%     98.56%        99.80%

Table 5. Average accuracies of the heuristics on 10 random problems with the s-v range of (1, 16).

n     b               2SG        PEN        2SG + 2OPT    PEN + 2OPT
10    (400, 800)      91.68%     93.06%     97.79%        98.27%
20    (800, 1600)     90.07%     93.68%     98.53%        98.21%
30    (1200, 2400)    91.56%     90.60%     99.16%        99.28%
40    (1600, 3200)    91.51%     93.64%     98.32%        99.38%
50    (2000, 4000)    89.41%     93.17%     98.26%        99.79%
Average               90.85%     93.83%     98.41%        98.99%

These results indicate that the penalty heuristic is consistently superior to the two-step
greedy heuristic. In fact, in many individual problem instances, the solution produced by
the penalty heuristic is superior to the solution of the greedy complemented with 2-OPT.
The solution provided by the penalty heuristic, complemented by 2-OPT is almost always
the best among the heuristic solutions. Nevertheless, as the problem parameters are
changed to introduce more nodes with zero rewards, the accuracy gap between the penalty
heuristic and the greedy heuristic closes. As Table 3 indicates, the greedy algorithm may be
superior to the penalty algorithm, depending on the problem parameters.
When solving the test problems summarized in Table 5, we noticed that the solution
produced by the penalty algorithm almost always collected positive rewards from more
nodes than the solution produced by the greedy algorithm. In fact, in some problems with
50 nodes, the greedy collected zero rewards from up to 20 nodes, whereas the number of
zero-reward nodes never exceeded 10 for the penalty algorithm.
When solving the problems with larger values of n, it came to our attention that the 2-OPT phase consumed most of the computational effort. For example, the ten 50-node problems summarized in Table 4 consumed about 3.6 h. Only 1.5 and 3.7 min of this time was spent on the penalty and the greedy algorithms, respectively. The rest was spent on the
2-OPT procedure. The 2-OPT procedure complementing the greedy heuristic consumed
three times as much time as the 2-OPT procedure complementing the penalty heuristic.
Although not valid for every problem solved, in most instances a better phase-1 heuristic
solution (greedy or penalty) resulted in a better solution with less computational effort in
the 2-OPT phase. Thus, it would seem that for larger problems one should run several
construction heuristics, and attempt to improve upon the best solution using 2-OPT. It
also seems that for very large problems (n > 100), the 2-OPT procedure may be computa-
tionally overwhelming, and the solution of a construction heuristic may have to suffice.
We also note that, although the greedy and the penalty algorithms are of the same compu-
tational complexity, the greedy algorithm consumed, on the average, about 2-3 times as
much time as the penalty algorithm in all problem sets of Tables 4 and 5. This is because
the greedy algorithm performs many more arithmetic operations per iteration than the
penalty algorithm.
Although the penalty heuristic appeared to be superior to the greedy heuristic on the
larger problems reported in Tables 4 and 5, we can reach no conclusions about the absolute
quality of the heuristic solutions, because these larger problems were not solved optimally.
To get some indication of the absolute effectiveness of the penalty heuristic on these larger
problems, we solved 30 random problems with 15 nodes, with b = (600, 1200) and (s, v) = (1, 11). On this new problem set, the penalty heuristic complemented by 2-OPT generated
solutions that were 98.05% optimal on the average. This is consistent with the results re-
ported in Table 3. However, the heuristic solutions for four of the 30 problems were below 95% optimal, and the worst heuristic solution was only 92.15% optimal. This indicates that there is room for improvement of the heuristic solutions via alternative algorithms.

Table 6. Average solution times (in CPU seconds) for different problem sizes, using the C++ code of the optimal algorithm on a 60-MHz Pentium microcomputer.

n       10     11     12     13     14      15      16       17       18      19      20
Time    0.8    1.5    2.8    7.1    17.3    43.4    112.9    326.6    1015    2968    6398
One might expect the quality of the solutions of a heuristic to decrease as the size of the
problem (number of nodes) increases. We are unable to test this hypothesis in a conclusive
manner, because we cannot solve large problems optimally. However, as a start to answer-
ing this question, we solved 10 problems with 20 nodes, with b = (800, 1600) and (s, v) = (1, 11). The average optimality of the solutions produced by the penalty algorithm com-
plemented by 2-OPT was 96.19%, and the worst solution was only 90.12% optimal. These
figures, when compared to those that resulted from smaller problems, may be considered as
an indication that the quality of the heuristic solutions suffers as the problem size increases.
Although the penalty heuristic appears slightly superior to the two-step greedy in our
computational tests, we suggest using both heuristics (complemented by 2-OPT) to im-
prove the chances of finding a near-optimal solution to a problem. If one uses this strategy,
then the average heuristic solution for the 10 problems with 20 nodes is 98.08% optimal,
and the worst heuristic solution is 96.81% optimal.
In the final part of our computational experiment, we tested the speed of the optimal
algorithm. To reduce the time used up by the branch-and-bound algorithm, we translated
the code into C++ and ran it on a Pentium (60 MHz) microcomputer. For n = 10-20, we
solved 10 random problems for each problem size. Table 6 gives the average computational
time per problem for these problem sizes. We note that there is a large variance in times for
a given problem size. There is usually an order-of-magnitude difference between the fastest
and the slowest time in a group.
Note that the branch-and-bound algorithm (on a micro) solves problems with 10 nodes
5-6 orders of magnitude faster than an all-purpose MIP solver (on a mainframe) can solve
the MIP formulation suggested in BC.
We assumed that the growth of the computational effort as a function of the problem
size is in the form Y(n) = ab^n, where Y(n) is the average computational time for a problem with n nodes. Applying simple linear regression with ln(Y) and n as the dependent and independent variables, respectively, we found a = 0.00046 and b = 2.539 with R² = 0.994.
This implies that the effort to solve the problem grows by a factor of about 2.5 every time
one node is added to the problem. We conclude that this version of the implicit enumera-
tion algorithm is not practical for problems that have many more than 20 nodes.
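Spelled out, the fitted exponential model linearizes as

    ln Y(n) = ln a + n ln b,

so the slope obtained by regressing ln(Y) on n estimates ln b.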

7. SUMMARY AND CONCLUSIONS

In this article we considered a variant of the traveling-salesman problem: the maximum


collection problem with time-dependent rewards. The objective is to visit nodes of a net-
work to collect rewards that decline as time progresses. This problem may apply in com-
petitive environments, or in instances where there is a time-dependent penalty for delays
in service. After reviewing related literature, we provided a mathematical programming formulation, a penalty heuristic, and an implicit enumeration algorithm for this problem.
Our computational experience indicates that the penalty heuristic is usually superior to
a two-step greedy heuristic suggested for this problem. When the penalty heuristic is used
in tandem with a 2-OPT improvement heuristic, solutions that are within 1% of the opti-
mum are generated with great regularity for small problems (with up to 15 nodes). Larger
problems (with 50 nodes) can be solved in an average time of about 5 min on a 386 mi-
crocomputer with the use of the penalty heuristic followed by 2-OPT. The 2-OPT phase is
quite time consuming, and it seems that more effective construction heuristics would re-
duce the computational effort of the tandem heuristic. To that end, avenues worth pursuing
are the refinement of the penalty criterion, the development of a semigreedy penalty algo-
rithm [8], or a Tabu-search heuristic [4]. Using several different heuristics for a problem
would also greatly reduce the chances of producing a poor solution for an individual prob-
lem instance.
Based on our experience with small problems (with 10-20 nodes), we found that the computational time required by the implicit enumeration algorithm grew by a factor of about 2.5 with each node added to the problem. The average solution time for problems with 20 nodes on a
60-MHz Pentium was 1.8 h. It is probably possible to solve larger problems (n > 20) opti-
mally by improving the branch-and-bound algorithm through sharper lower and upper
bounds. However, to solve substantially larger problems (n > 100) optimally, one may
have to resort to a different optimal algorithm.
We conclude that MCPTDR is a difficult combinatorial problem. Nevertheless, it seems
that one can usually find very good solutions to MCPTDR problems by using a combina-
tion of simple and quick heuristics. In addition to being quite valuable in providing good
upper bounds for optimal algorithms, heuristics may provide the only practical alternative
for large instances of this problem.

ACKNOWLEDGMENT
This research has been supported in part by the Natural Sciences and Engineering Research Council of Canada (OGP 25481), and a McCalla Professorship at the University of Alberta.

REFERENCES
[1] Balas, E., "The Prize Collecting Traveling Salesman Problem," Networks, 19, 621-636 (1989).
[2] Brideau, R.J., III, and Cavalier, T.M., "The Maximum Collection Problem with Time Dependent Rewards," presented at the TIMS International Conference, Alaska, June 1994.
[3] Butt, S., and Cavalier, T., "A Heuristic for the Multiple Tour Maximum Collection Problem," Computers & Operations Research, 21(1), 101-111.
[4] Glover, F., "Tabu Search: Part I," ORSA Journal on Computing, 1 (1989).
[5] Golden, B.L., Levy, L., and Vohra, R., "The Orienteering Problem," Naval Research Logistics, 34, 307-318 (1987).
[6] Golden, B.L., and Stewart, W.R., "Empirical Analysis of Heuristics," in E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (Eds.), The Traveling Salesman Problem, Wiley, New York, 1985.
[7] Golden, B.L., Wang, Q., and Liu, L., "A Multifaceted Heuristic for the Orienteering Problem," Naval Research Logistics, 35, 359-366 (1988).
[8] Hart, J.P., and Shogan, A.W., "Semi-Greedy Heuristics: An Empirical Study," Operations Research Letters, 6, 107-114 (1987).
[9] Kataoka, S., and Morito, S., "An Algorithm for Single Constraint Maximum Collection Problem," Journal of the Operations Research Society of Japan, 31, 515-531 (1988).
[10] Ramesh, R., and Brown, K.M., "An Efficient Four-Phase Heuristic for the Generalized Orienteering Problem," Computers & Operations Research, 18(2), 151-165 (1991).
[11] Ramesh, R., Yoon, Y., and Karwan, M.H., "An Optimal Algorithm for the Orienteering Tour Problem," ORSA Journal on Computing, 4(2), 155-165 (1992).
[12] Tsiligirides, T., "Heuristic Methods Applied to Orienteering," Journal of the Operational Research Society, 35(9), 797-809 (1984).

Manuscript received February 1995


Revised manuscript received January 1996
Accepted February 21, 1996
