
LEARNING AND MOTIVATION 16, 423-443 (1985)

Choice Behavior in a Discrete-Trial Concurrent VI-VR:
A Test of Maximizing Theories of Matching
BEN A. WILLIAMS

University of California, San Diego

Rats were trained on a discrete-trial procedure in which one alternative (VR)
was correlated with a constant probability of reinforcement while the other was
correlated with a VI schedule which ran during the intertrial intervals and held
the scheduled reinforcers until they were obtained by the next VI response.
Relative reinforcement rate was varied in series of conditions in which the VR
schedule was varied and in series in which the VI was varied. Choice behavior
was described well by the generalized matching law, although moderate undermatching
occurred for all subjects. Contrary to the predictions of molar maximizing
(optimality) theories, there was no consistent bias in favor of the ratio alternative,
and the sensitivity to reinforcement allocation was not systematically affected
by whether the ratio or interval schedule was varied. The results were also
contrary to momentary maximizing accounts, as there was no correspondence
between the probability of a changeover to the VI behavior and the time since
the last response to the VI alternative. Neither variety of maximizing theory
appears to provide a general explanation of matching in concurrent schedules.
© 1985 Academic Press, Inc.

The matching law (cf. Herrnstein, 1961, 1970), expressed as Eq. (1),
states that the ratio of response rates for two alternatives in a choice
experiment “matches” the ratio of their respective reinforcement rates:

B1/B2 = R1/R2  (1)

Perfect matching often fails to occur, however, so that results from choice
studies are more often described in terms of Eq. (2), known as the
generalized matching law (Baum, 1974):

B1/B2 = b (R1/R2)^a  (2)

Accordingly, the exponent, a, indicates the sensitivity of the response
allocation to the reinforcement ratio (with a = 1.0 for perfect matching),
while the parameter, b, indicates bias toward one or the other response
alternative independent of the reinforcement ratios (for example, when
there are different types of reinforcers or responses).

Requests for reprints should be addressed to the author, Department of Psychology,
C-009, University of California, San Diego, La Jolla, CA 92093. This research was
supported by NIMH Research Grant 1 R01 MH 35572-02 and NSF Research Grant
BNS84-08878 to the University of California, San Diego.

423
0023-9690/85 $3.00
Copyright © 1985 by Academic Press, Inc.
All rights of reproduction in any form reserved.

One account of the matching relation is molar maximizing theory (also
known as optimality theory), which postulates that matching occurs because
it is the pattern of behavior that produces the highest overall rate of
reinforcement (cf. Baum, 1981; Rachlin, Battalio, Kagel, & Green, 1981;
Staddon & Motheral, 1978). Implicit in such analyses is the assumption
that subjects integrate rates of reinforcement from all sources in the
situation and then compare the overall rates obtained from different
patterns of responding. Such an assumption is, of course, quite different
from the traditional view that choice behavior can be understood by
comparing the response strengths associated with each individual behavior.
A procedure that provides a potential test of optimality theory is a
concurrent variable-interval (VI) variable-ratio (VR) schedule. From the
nature of the schedules involved, it should be apparent that the best
strategy for maximizing total reinforcements involves working continuously
on the ratio alternative with only occasional sampling of the VI alternative.
This is true because the VI reinforcers are held until obtained by the
next response, while VR reinforcement rate is directly proportional to
the total amount of VR responding. This intuition has been developed
in more formal terms by Baum (1981), who demonstrated (given several
simplifying assumptions) that the pattern of behavior which produces the
highest total reinforcement rate on a concurrent VI-VR is given by Eq.
(3), in which n refers to the ratio requirement (also see Prelec, 1982):

B_vr/B_vi = √n (R_vr/R_vi)  (3)

Thus, optimality theory predicts that matching of relative response rates
to relative reinforcement rates should occur, but with a bias toward the
VR schedule equal to the square root of the ratio requirement.

Equation (3) also implies that optimization theory predicts a different
outcome depending on how relative reinforcement rate is varied. With
variation due to changes in the VI component the result is a form of
biased matching, since n remains constant as long as the ratio component
is unchanged. Thus, √n is then equivalent to the bias term, b, of Eq.
(2). This prediction does not apply, however, when the relative rein-
forcement rate is varied by changes in the VR requirement. The value
of n then changes across experimental conditions so that the degree of
bias will covary with the rates of reinforcement produced by the ratio
component. That is, a fit of Eq. (2) to several different experimental
conditions in which n is varied would result in an averaging of the bias
term across conditions that would be captured by Eq. (2) both in terms
of the value of b being somewhere in the middle of the different values
of √n and in terms of a value of a substantially less than 1.0 (see Baum,
1981, Fig. 7, for a graphical depiction for why this is so). The result is
that the sensitivity of relative response rate to relative reinforcement
rate should be substantially less when relative reinforcement rate is varied
by changes in the VR schedule, than by changes in the VI schedule.
The major evidence pertaining to Eq. (3) comes from the study of
concurrent VR-VI by Herrnstein and Heyman (1979), who demonstrated
that matching did occur but with different types of biases depending
upon the behavioral measure. With response rate as the measure bias
was in favor of the VR alternative; with time allocation the bias was in
favor of the VI alternative. Such results are contrary to the predictions
of Eq. (3), both because the time-allocation data are those more relevant
to the derivation of Eq. (3) offered by Baum (1981) and because the bias
term for response rate was substantially less than required by Eq. (3).
The ratio values employed by Herrnstein and Heyman were in the range
of VR 30-60, so the expected bias should be in the range of 5-8. In
fact, however, the obtained bias for response rate was approximately
1.4. The predictions of optimality theory thus appear to be disconfirmed
by the empirical evidence.
Proponents of optimality theory (e.g., Rachlin et al., 1981) have argued
that the failure of the expected bias in favor of the VR alternative need
not be considered critical evidence, if the concept of “leisure” is in-
corporated into the analysis. Thus, rather than the total value in the
situation being given by the sum of the reinforcement rates from the
choice alternatives, value is also determined by the time spent not re-
sponding. Choice between VR and VI is then actually a choice between
different “packages” of food + leisure. Because ratio schedules maintain
higher local response rates than do interval schedules, this implies that
the choice of the interval schedule produces a relatively greater amount
of leisure, thus offsetting the bias toward the VR component that is
expected from Eq. (3), which is based only on the obtained rates of
food. To support this contention, Rachlin et al. (1981, pp. 409-410)
demonstrated that values of VI leisure and VR leisure could be derived
empirically that would allow the choice results of Herrnstein and Heyman
(1979) to be consistent with optimality-based predictions.
Differential amounts of leisure for VR versus VI schedules are a potential
confounding variable only because the two schedules produce different
local rates of responding. Thus, its effects should be eliminated by ensuring
that local response rates are equated for the two schedules. One method
of accomplishing this was used by Green, Rachlin, and Hanson (1983),
who trained subjects on a concurrent VR-“VR” schedule that was a
direct analog to a concurrent VR-VI. The major difference between their
procedure and a conventional VR-VR schedule was that one of the
response alternatives advanced not only its own ratio programmer but
that associated with the other response as well. Reinforcers for the
second response scheduled by the first ratio programmer were then held
until the subject emitted its next response to the second alternative.
Reinforcers for the second response (the “VR” alternative) thus became
available in two ways; by occurring while the animal was working on
that response, and by accumulating while the animal was on the VR
alternative. The critical difference between their procedure and the con-
ventional VR-VI schedule is that local response rates should not be
different for the two alternatives, since ratio schedules were involved in
both cases. The differential effects of leisure thus should be eliminated
and predictions of Eq. (3) should apply. The results were partially consistent
with this analysis, as, unlike the results for Herrnstein and Heyman
(1979), a substantial bias toward the VR component did occur. However,
the degree of bias was still substantially less than that predicted by
optimality theory and the data were highly variable (perhaps due to the
absence of a COD). Thus, the evidence relevant to optimality theory
remains ambiguous.
The present study used a different method for equating the response
requirements for the interval and ratio schedules. A discrete-trial procedure
was used in which a single response was allowed each trial. The probability
of reinforcement for the ratio alternative was held constant across trials,
while that for the VI alternative was governed by a VI timer that ran
during the intertrial intervals (ITIs), with each scheduled VI reinforcer
then held until the next VI response. Because only a single response
was required, no differences in local response rate were possible, so
differences in the amount of “leisure” presumably should not occur.
Once again, therefore, the predictions of Eq. (3) should directly apply.
Discrete-trial versions of a concurrent schedule have been studied
previously, but only with concurrent VI-VI (cf. Nevin, 1969; Silberberg,
Hamilton, Ziriax, & Casey, 1978). The major focus of such research has
been to test “momentary maximizing” accounts of matching, which
assume that matching occurs because the subject chooses the response
alternative with the higher local probability of reinforcement at any moment
(cf. Shimp, 1966, 1969). Although the outcomes of the discrete-trial pro-
cedures have generally provided negative evidence for such accounts
(see especially Nevin, 1979, 1982), more recent data (Hinson & Staddon,
1983a, 1983b) from conventional free-operant concurrent schedules have
provided strong support. The present study thus provides the opportunity
for further investigating the conflict between the previous studies.
As developed by Hinson and Staddon (1983a), the application of mo-
mentary maximizing to a concurrent VI-VR schedule can be determined
by comparing the moment-to-moment choice probabilities with the corresponding
reinforcement probabilities. For a VI schedule with an exponential
distribution of interreinforcement intervals, the probability of
reinforcement is given by the time since the last response to that alternative
combined with the overall scheduled rate of reinforcement, as expressed
by Eq. (4), in which λ refers to the scheduled rate of reinforcement and
t to the time since the last VI response:

P(R_vi) = 1 - e^(-λt)  (4)
Subjects should thus respond to the VI alternative when the value of
Eq. (4) exceeds the constant probability of reinforcement associated with
the ratio alternative (the inverse of the ratio requirement) and respond
to the ratio alternative otherwise. By recording the trial-by-trial choices
as a function of time since the last VI response, it should be possible
to determine whether a momentary maximizing strategy actually is present.
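The decision rule just described can be sketched directly; the schedule values below come from the conditions in Table 1, but the function names are illustrative, not the authors':

```python
import math

def p_vi_reinforcer(t, lam):
    """Eq. (4): probability that a reinforcer has been set up on the
    exponential VI in the t seconds since the last VI response."""
    return 1.0 - math.exp(-lam * t)

def momentary_maximizing_choice(t, lam, p_vr):
    """Respond VI when Eq. (4) exceeds the constant VR probability."""
    return "VI" if p_vi_reinforcer(t, lam) > p_vr else "VR"

# For a VI 90-s (lam = 1/90) against a p = .15 ratio alternative,
# indifference falls at t = -90 * ln(1 - .15), about 14.6 s.
t_indiff = -90.0 * math.log(1.0 - 0.15)
```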
The present study trained rats on a series of concurrent VI-VR schedules
using a discrete-trial procedure. In the first series of conditions the relative
rate of reinforcement was varied by changes in the ratio alternative;
subsequent conditions varied relative reinforcement rate by changes in
the VI schedule. The data were recorded both in terms of the overall
choice probabilities and in terms of the trial-by-trial sequences of choices.
The aim of the study was thus to provide a meaningful test of both the
molar maximizing and momentary maximizing theories of the matching
law.
METHOD
Subjects
Four experimentally naive Holtzman strain male albino rats were main-
tained at 85% of their free-feeding weights by additional feeding after
the end of each experimental session. The subjects were 10-11 months
of age at the start of the experiment and had an average free-feeding
weight of 650 g.
Apparatus
The experimental space consisted of an interior conditioning chamber
constructed of clear Plexiglas enclosed in a larger wooden box equipped
with a ventilating fan for sound insulation. The interior dimensions were
20.3 cm in height, 25 cm in width, and 28 cm in length. Mounted on the
front panel of the chamber, which was painted black, were two nonre-
tractable steel levers, located 12.7 cm above the floor and spaced 10 cm
apart. Each lever was approximately 0.3 cm thick, extended 1.3 cm into
the chamber from the front panel, and required a minimum force of 0.10
to 0.12 N for a depression sufficient to activate the microswitch to which
it was connected. Approximately 5.0 cm above each lever was mounted
a 3-W miniature light bulb, which was illuminated during trial periods.
Between the levers and 1.4 cm above the grid floor was the recessed
opening for the food dipper, which was activated for reinforcement. The
dipper size was 0.02 ml, and the reinforcer consisted of undiluted Mazola
corn oil.
Procedure
All subjects were hand shaped to press each lever in alternate sessions.
After 100 reinforcers had been obtained by presses of each separate
lever, all subjects were begun immediately on the first condition listed
in Table 1. A discrete-trial procedure was used in which the illumination
of the lights located above each lever signaled the trial onset. The first
response to either lever terminated the trial, producing the reinforcer if
scheduled, and initiating a constant 6-s ITI in which the lights were
extinguished. Trials also terminated after 5 s if no responses occurred.
A total of 400 trials were presented each session.
The sequence of schedules associated with each lever is shown in
Table 1. Interspersed between Conditions 6 and 7 were several sessions
in which the ITI was varied, but these data are not presented. Note that
one subject (S-20) became ill after Condition 6 and was terminated from
the study.
The VR schedule was programmed by two 33-position steppers, each
wired according to a random number table, with a Gerbrands’ cam stepper
(model G-4642) determining on a quasi-random basis which stepper con-
trolled the trial outcome for any particular trial. The VI schedules were
constructed of either 12 (the VI 270-s schedule) or 18 intervals (the VI
30-s and VI 90-s schedules) drawn from the exponential distribution of
Fleshler and Hoffman (1962). The VI timers ran continuously (during
both the ITI and trial periods) until a reinforcer became available. Training

TABLE 1
Order of Conditions and the Schedules Correlated with Each Response Lever

                Schedule                    Number of sessions
Order     Left         Right         S-2    S-8    S-14    S-20

1         VI 90        p = .50       20     16     16      16
2         VI 90        p = .15       18     23     31      16
3         VI 90        p = .08       16     16     16      18
4         VI 30        p = .15       16     16     21      15
5         VI 90        p = .15       19     17     18      21
6         VI 270       p = .15       17     25     21      17
7         p = .15      VI 30         19     15     21
8         p = .15      VI 90         21     21     28
9         p = .20      VI 270        21     20     20
10        p = .20      VI 90         17     21     17

Note. The p values correspond to the reciprocal of the ratio requirement. Also shown
are the number of sessions presented to each subject during each condition.
on each condition was continued for a minimum of 15 sessions and until
a stability criterion was reached. The latter was determined by dividing
the last nine sessions into three blocks of three sessions each, and computing
a choice proportion for each block of sessions. When these choice pro-
portions differed by no more than 5%, and showed no monotonically
increasing or decreasing trend, the choice behavior was considered stable.
The actual numbers of sessions received by each subject are shown in
Table 1.
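The stability rule above can be expressed as a short check; the input is the session-by-session choice proportions, and the function name is illustrative:

```python
def is_stable(choice_props):
    """Apply the stability criterion from the text: divide the last nine
    sessions into three blocks of three, compute a choice proportion per
    block, and require the proportions to differ by no more than 5% with
    no monotonically increasing or decreasing trend."""
    last9 = choice_props[-9:]
    blocks = [sum(last9[i:i + 3]) / 3.0 for i in (0, 3, 6)]
    within = max(blocks) - min(blocks) <= 0.05
    trend = (blocks[0] < blocks[1] < blocks[2]
             or blocks[0] > blocks[1] > blocks[2])
    return within and not trend
```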
Simulations
After the end of the training shown in Table 1, the distributions of
reinforcements produced by different distributions of responding were
determined by simulations of each of the first six conditions shown in
Table 1. Choice proportions to the VR alternative were varied in regular
intervals from 0.05 to 0.95, with each proportion maintained for a minimum
of four sessions. The distribution of VR choices for a particular proportion
was determined by a stepper programmed according to a random number
table. The simulated choice responses occurred with a latency of 1.0 s
after trial onset, which was the approximate average latency for the rat
subjects throughout training (see Table 4).
RESULTS
The data were first analyzed according to the generalized matching
law (Eq. (2)). The logarithmic version of Eq. (2) was fitted to the data
for individual subjects using a least-squares linear regression analysis.
Figure 1 shows the best-fitting functions, the values of a and b for that
fit, and the variance accounted for by the fit. Note that different data
symbols represent different series of conditions. The a values were similar
for all subjects, ranging from 0.73 to 0.81. The b values were somewhat
more variable, with two subjects showing a small bias (1.14 and 1.10)
in favor of the ratio alternative, and the remaining two showing little or
no bias.
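The regression just described can be sketched as follows; the response and reinforcement counts are hypothetical, not data from this study:

```python
import numpy as np

def fit_generalized_matching(b1, b2, r1, r2):
    """Least-squares fit of log(B1/B2) = a*log(R1/R2) + log(b),
    the logarithmic version of Eq. (2)."""
    x = np.log10(np.asarray(r1, float) / np.asarray(r2, float))
    y = np.log10(np.asarray(b1, float) / np.asarray(b2, float))
    a, log_b = np.polyfit(x, y, 1)          # slope = sensitivity a
    y_hat = a * x + log_b
    vaf = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return a, 10.0 ** log_b, vaf            # b = 10 ** intercept

# Hypothetical session totals across three reinforcement ratios,
# constructed to show undermatching (a < 1).
B1, B2 = [30, 55, 90], [90, 60, 35]
R1, R2 = [10, 20, 40], [40, 20, 10]
a, b, vaf = fit_generalized_matching(B1, B2, R1, R2)
```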
Three separate series of conditions are included in Fig. 1: those from
Conditions 1-3 in which the ratio schedule was varied, those from
Conditions 4-6 in which the interval schedule was varied, and those from
Conditions 7-10 in which the interval schedule was again varied after a
reversal in the lever correlated with the different schedules. By examining
the data from each series separately, it should be possible to determine
whether the bias and/or sensitivity varied systematically with the different
types of variation. To facilitate the comparison, the data from each series
were analyzed separately, with the resulting a and b values shown in
Table 2 (the fourth row of the table is discussed later). The value of a
was greatest for all subjects during the series of conditions in which the
VR schedule was varied. This difference, while not large, possibly could
[Figure 1 about here.]

FIG. 1. Plots of the log of the choice ratio (B_vr/B_vi) as a function of the log of the
reinforcement ratio (R_vr/R_vi) for each condition for each subject. Each point represents
the mean of the last five sessions of each condition. The best fitting functions that are
shown, along with their parameter values and the variance accounted for, were derived
by a linear regression analysis in terms of the logarithmic version of Eq. (2). Closed circles
correspond to Conditions 1-3, open circles to Conditions 4-6, and triangles to Conditions
7-10 (see Table 1).

be due to the order of conditions, as the VR series was presented early
in training and not repeated (because of the increasing age of the subjects).
Whatever the cause of the difference that was obtained, it is clear that
no evidence is provided for the prediction of Eq. (3) that sensitivity
should be greater with variations in the VI than with variations in the VR.
Changes in the bias term, b, were more variable, both across subjects
and across conditions. All subjects exhibited a bias toward the VR com-
ponent during the first series of conditions, but this effect did not continue
when the VI schedule was varied (for Subjects 114 and 120) and when
the correlation between the levers and the type of schedule was reversed
(Subjects 102, 108, and 114). A strong position bias was clearly apparent
for Subject 102, since its bias toward the right lever persisted throughout
training regardless of the schedule conditions. The presence of the position
bias is also the likely reason for the generally poorer fit of Eq. (2) to
the data for that subject, as shown in Fig. 1.
A complicating feature of the results is that subjects were able to
respond during the ITI (signaled by a completely dark chamber) when
no reinforcement was available. All subjects responded somewhat during
that time, particularly just after the termination of trials on which responses
were not reinforced. Because of a recording error, reliable measurements
of the ITI behavior were available only for Conditions 7-10, during which
time the average number of responses per 6-s ITI period ranged from 1
to 2, which implies that the responses during the trial periods constituted
only one-half to one-third of the total behavior during the session. In
order to evaluate how the ITI behavior may have affected matching
performance, the total behavior during the session was also subjected
to an analysis in terms of Eq. (2), with the results shown in the last row
of Table 2. In general, there was no systematic change in the values of
a, indicating that the sensitivity to reinforcement was not affected. However,
the bias term, b, was substantially greater in favor of the VR
alternative for all three subjects. The possible reason for this bias toward
the VR, despite responses during the ITI having no effect on the delivery
of reinforcement, is considered in the Discussion.
Simulations
The predictions of optimality theory noted in the introduction depend
upon a number of simplifying assumptions that may be violated by the
use of the discrete-trial procedure. In order to determine the extent to
which these predictions really do apply to the present situation, the optimal rate of

TABLE 2
Results of the Linear Regression Analysis for Different Series of Conditions

                                      Subject
                     102          108          114          120
                   a     b      a     b      a     b      a     b

VR series
(Cond. 1-3)      0.79  1.49   0.81  1.21   0.85  1.35   0.83  1.06
VI series
(Cond. 4-6)      0.55  1.57   0.73  1.08   0.66  0.77   0.77  0.87
VI series
after reversal
(Cond. 7-10)     0.64  0.12   0.72  1.01   0.77  0.93
VI series
after reversal,
all responses
(Cond. 7-10)     0.58  1.06   0.81  1.61   0.65  1.31

Note. Shown are the best fitting a and b values from the fit to the logarithmic version
of Eq. (2).
reinforcement possible for a given pair of schedules was determined by
simulations in which the choice of the VR alternative was systematically
varied in small steps. Figure 2 shows the results of those simulations in
terms of the total number of reinforcers from both schedules that would
be produced by particular choice proportions. The left portion shows
the results for Conditions 1-3 in which the VR schedule was varied in
conjunction with a VI 90-s schedule; the right portion shows the results
for Conditions 4-6 in which the VI schedule was varied in conjunction
with a ratio schedule in which the probability of reinforcement was held
constant at 0.15. Of greatest interest is the point on each function that
produced the greatest total reinforcers. Table 3 shows those numbers in
terms of the VR/VI ratio for each schedule comparison, along with the
corresponding relative rates of reinforcement. Also shown are the actually
obtained choice proportions, averaged over all four subjects. Note that
the choice ratios producing the maximum number of reinforcers may be
somewhat in error because of the step sizes of the choice allocations
presented in the simulations (usually in increments of 0.05).
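A minimal sketch of such a simulation, under the same assumptions (a constant-probability ratio side, and a VI timer that runs through both the ITI and the trial and holds a set-up reinforcer until collected), might look like this; all parameter names and the random-number details are mine, not the authors':

```python
import random

def simulate_session(p_vr, vr_p, vi_mean_s, trials=400,
                     iti_s=6.0, trial_s=1.0, seed=0):
    """Return total reinforcers from a simulated 400-trial session in
    which the VR side is chosen with probability p_vr on each trial."""
    rng = random.Random(seed)
    total, vi_armed, clock = 0, False, 0.0
    next_setup = rng.expovariate(1.0 / vi_mean_s)  # first VI setup time
    for _ in range(trials):
        clock += iti_s + trial_s            # VI timer runs during ITI and trial
        if not vi_armed and clock >= next_setup:
            vi_armed = True                 # reinforcer held until collected
        if rng.random() < p_vr:
            total += rng.random() < vr_p    # constant-probability VR side
        elif vi_armed:
            total += 1                      # collect the held VI reinforcer
            vi_armed = False
            next_setup = clock + rng.expovariate(1.0 / vi_mean_s)
        # a VI response with nothing armed earns nothing
    return total

# Sweeping p_vr in small steps locates the maximizing allocation,
# which falls heavily toward the VR side, as in Fig. 2.
totals = {p / 20: simulate_session(p / 20, 0.15, 90.0) for p in range(1, 20)}
```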
In order to have a quantitative comparison of the simulated response
distributions with those actually obtained, the simulated data shown in
the first two columns of Table 3 were fitted by the logarithmic version
of Eq. (2), as were the real data shown in Fig. 1. Separate fits were
provided for Conditions 1-3, 4-6, and all six together. Considering first
only Conditions 4-6, Eq. (3) predicts a bias toward the VR schedule
equal to the square root of the ratio requirement (the inverse of 0.15 =
6.67), which is 2.58, and a value of a equal to 1.0. The fitted values of
the parameters were b = 2.99 and a = 0.99. Given the imperfect location
of the choice ratios producing the true maximum in Fig. 2, the theoretical
predictions based on Eq. (3) thus accord well with the distributions of

[Figure 2 about here.]

FIG. 2. The results of simulations of Conditions 1-6 in which the probability of a choice
of the VR alternative was systematically varied in small steps. Shown is the total number
of reinforcers from both alternatives that resulted from various percentages of choice
allocations to the VR alternative. The left portion shows the results for Conditions 1-3;
the right portion shows the results for Conditions 4-6.

TABLE 3
Comparison of Simulation Data with the Obtained Choice Proportions

             Simulated response    Simulated reinforcement    Obtained response
Condition         VR/VI                    VR/VI                   VR/VI

1.                20.2                     12.9                    8.18
2.                9.51                     2.76                    1.21
3.                3.20                     1.00                    0.49
4.                2.00                     0.71                    0.23
5.                9.51                     2.16                    0.94
6.                20.0                     7.44                    3.04

Note. Shown are the simulated ratios of VR/VI responding that produced the highest
total rates of reinforcement in the simulation. Also shown are the reinforcement ratios
corresponding to those response ratios, and the actually obtained choice ratios from each
condition (the geometric means of four subjects).

behavior that would have produced the maximum rates of reinforcement
in the present situation.
The predictions of Eq. (3) for Conditions 1-3 are more indeterminate
because of the effects of averaging the bias term across conditions.
However, as noted in the introduction, Eq. (3) does predict that the
sensitivity parameter, a, should be smaller for Conditions 1-3 than
Conditions 4-6. The obtained fitted values for Conditions 1-3 were b = 3.69
and a = 0.70, so once again the predictions of Eq. (3) were upheld.
Finally, the obtained values of b and a for the fits to all six conditions
were b = 3.32 and a = 0.83. While the value of a predicted by Eq. (3)
is close to those actually obtained, as shown in Fig. 1, the predicted
bias term is clearly much larger than those actually obtained (which
ranged from 0.97 to 1.14). The analysis of the simulated data thus reveals
that the predictions formally derived from optimality theory do indeed
apply meaningfully to the present situation, and that the actually obtained
results are clearly different from those predictions.
Local Reinforcement Probabilities and Changeover Probabilities
Although the sensitivities to the reinforcement ratios, as shown in
Fig. 1, were substantially below perfect matching, they are sufficiently
close to warrant an assessment of whether the approximation to matching
that was obtained was due to adherence to some type of momentary
maximizing strategy. To make this assessment, the local probabilities of
reinforcement for each experimental condition were determined empirically,
by calculating the probability of reinforcement after successive numbers
of trials since the last response to the VI alternative. Because of recording
failures, these data were available only from Conditions 2-9, which are
shown in Fig. 3. Plotted are the obtained VI reinforcement probabilities
aggregated over all subjects, as well as the probability of reinforcement
[Figure 3 about here.]

FIG. 3. The empirical probability of reinforcement for a VI response as a function of
the number of trials since the last VI response. Each segment corresponds to a different
experimental condition as designated by the heading. Probabilities were aggregated over
all subjects for the last five sessions of each condition.

obtained for the VR alternative over the entire session. For all conditions,
the probability of reinforcement substantially increased with larger numbers
of intervening trials since the last VI response, with the major difference
across conditions being the slope and the intercepts of the functions. If
perfect adherence to a momentary maximizing strategy did occur, this
should be reflected by a choice of the VR alternative whenever the VI
reinforcement function fell below the flat VR reinforcement function, but
a choice of the VI alternative whenever the VI reinforcement function
fell above the VR line.
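The empirical calculation behind this analysis can be sketched from a trial-by-trial record; the (choice, reinforced) tuple format below is an assumption about the log, not the authors' recording scheme:

```python
from collections import defaultdict

def local_vi_probs(trial_log):
    """For each number of trials elapsed since the last VI response,
    return the empirical probability that a VI response was reinforced.
    trial_log is a sequence of (choice, reinforced) pairs, with choice
    in {"VI", "VR"}."""
    tallies = defaultdict(lambda: [0, 0])   # k -> [VI responses, reinforced]
    since = 0
    for choice, reinforced in trial_log:
        since += 1
        if choice == "VI":
            tallies[since][0] += 1
            tallies[since][1] += int(reinforced)
            since = 0                       # reset the trials-since counter
    return {k: sr / n for k, (n, sr) in tallies.items()}

# Toy record: VI responses after 1, 2, and 3 intervening trials.
log = [("VI", 0), ("VR", 0), ("VI", 1), ("VR", 0), ("VR", 0), ("VI", 1)]
probs = local_vi_probs(log)
```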
The actual choice probabilities are shown in Fig. 4, where each segment
of the graph corresponds to the probability-of-reinforcement functions
shown in Fig. 3. Although substantial variability occurred across conditions
(compare, for example, Conditions 2, 5, and 8, where the same rein-
forcement schedules were involved), the functions shown in Fig. 4 generally
do not correspond to the reinforcement probabilities shown in Fig. 3. In
fact, only during Condition 2 was there any clear evidence of increasing
probability of a VI response with the time since the last VI response,
as the more common patterns were generally flat functions or those that
actually decreased with time since the last VI response.
The abscissas of Figs. 3 and 4 include both trials on which a response
occurred and trials without a response. Since trials without a response
consumed substantially more time, this meant that the correlation between
the number of trials since the last response and the time since the last
VI response would be degraded the larger the number of trials without
[Figure 4 about here.]

FIG. 4. The probability of a response to the VI alternative as a function of the number
of trials since the last VI response. Each segment of the graph corresponds to a different
experimental condition and to the reinforcement probabilities in Fig. 3. Shown in each segment
are the functions for individual subjects. Data were taken from the last five sessions of
each condition. Different numbers of trials along the abscissa were blocked in order to
have a minimum of 100 observations per data point.

a response. Thus, the trials measure would also be imperfectly correlated
with the actual probabilities of reinforcement for the VI response, and
the subjects' behavior should not be expected to be well predicted by
that variable. To assess the potential importance of this variable, Table 4
shows the percentage of trials without a response for each condition, as

TABLE 4
The Percentage of Trials without a Response for Individual Subjects during
Each Condition

                                   Subject
              102             108             114             120
Condition   %   Latency     %   Latency     %   Latency     %   Latency

1          4.1             5.4             1.1             7.6
2          5.5             9.9             2.6             6.8
3          8.2    0.94    14.3    0.92     8.6    0.91     9.4    0.94
4          4.0    0.85     5.9    0.71     3.6    0.63     3.4    0.61
5          2.7    0.15     6.6    0.65     5.3    0.84     8.3    0.79
6          4.4    0.76     9.1    0.66     7.1    0.88     6.4    0.77
7          1.2    0.94     4.6    0.90     3.4    0.90
8          4.8    0.99     9.6    0.96     6.7    0.84
9          5.9    1.10     6.2    0.80    10.6    0.86
10        11.6    1.00     8.0    0.76    11.5    0.79

Note. Also shown are the average latencies (in seconds) for a response during trials on
which a response occurred. The latter measure was not recorded during the first two
conditions. Data are averages from the last five sessions of training.
well as the average latency of responses on trials in which a response
did occur. As can be seen, the failure to respond occurred only on a
few trials, with the average percentage across conditions being 6.7%.
The latency data were less variable across conditions, and generally
below 1.0 s for all subjects. Moreover, there was no relation between the latency values and the proportion of responses allocated to the VR versus the VI alternative, indicating that latency was not differentially related to which type of response occurred. In general, therefore, the trials measure and the time measure were necessarily highly correlated.

DISCUSSION
The present results demonstrate that the generalized matching law
(Eq. (2)) provides an excellent description of choice behavior of rats
trained on a discrete-trial version of a concurrent VI-VR reinforcement
schedule. For four subjects the average value of u was 0.77, which is
below the range of 0.80 to 1.0 that has been reported for subjects trained
on standard free-operant procedures (cf. Baum, 1979). The greater degree
of undermatching obtained here probably is not due to the use of rats as subjects, because the only two previous studies with rats that varied the relative rate of positive reinforcement over a meaningful range (Baum, 1976; Norman & McSweeney, 1978) obtained values of u comparable to those for pigeons. The use of the discrete-trial procedure is a
second possible reason for the undermatching, but its significance cannot
be evaluated properly because previous studies that have used discrete-
trial procedures (Nevin, 1969; Shimp, 1966; Silberberg et al., 1978) have
investigated only a single value of relative reinforcement rate. In any
event, the results approximated matching to a degree sufficient to warrant
their use as a means of evaluating the various theories of the underlying
processes.
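The fit of Eq. (2) reported above is an ordinary least-squares regression on the logarithmic ratios. The following sketch illustrates that computation; it is not the original analysis code, and the ratio values at the bottom are invented for the example. The slope of log(B1/B2) against log(R1/R2) estimates the sensitivity u, and the antilog of the intercept estimates the bias b:

```python
import math


def fit_generalized_matching(resp_ratios, rft_ratios):
    """Least-squares fit of log(B1/B2) = u*log(R1/R2) + log(b).

    Returns (u, b, percentage of variance accounted for)."""
    xs = [math.log10(r) for r in rft_ratios]
    ys = [math.log10(r) for r in resp_ratios]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    u = sxy / sxx                      # sensitivity (slope)
    b = 10 ** (my - u * mx)            # bias (antilog of intercept)
    ss_res = sum((y - (u * x + math.log10(b))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    vac = 100 * (1 - ss_res / ss_tot)  # % variance accounted for
    return u, b, vac


# Hypothetical data constructed to show undermatching (u < 1) with no bias.
rft = [0.25, 0.5, 1.0, 2.0, 4.0]
resp = [r ** 0.77 for r in rft]        # exactly u = 0.77, b = 1
u, b, vac = fit_generalized_matching(resp, rft)
print(round(u, 2), round(b, 2), round(vac, 1))  # 0.77 1.0 100.0
```

Perfect matching corresponds to u = 1.0 and b = 1.0; the obtained average of u = 0.77 with b near 1.0 is moderate undermatching without bias.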
As noted in the introduction, optimality theory implies two predictions
that can be tested by the present data. First, the sensitivity of choice
behavior to reinforcement allocation, as indexed by the fit of Eq. (2),
should be greater when relative reinforcement rate is varied by changes
in the VI than in the VR; second, the bias within each separate condition
should be toward the VR in proportion to the square root of the ratio
requirement. The results provided no support for the first of these pre-
dictions, as Table 2 reveals that to the extent any differences occurred
in the sensitivity to reinforcement (as shown by the values of u) they
were opposite to those predicted by optimality theory. However, the
present findings cannot be interpreted unambiguously because of the
confound of the possible effects of the order of presentation. A decreasing
sensitivity to reinforcement during later experimental conditions has been
noted in some previous studies (Todorov, Castro, Hanna, de Sa Bittencourt,

& Barreto, 1983), so it is possible that the greater sensitivity predicted


during the VI variation was attenuated by that effect in the present study.
However, the size of such decreasing sensitivities has usually been much
smaller than would be required to salvage the predictions of optimality
theory; such an order effect would need to be sufficiently large not only
to account for the failure to obtain the predicted effect of greater sensitivity
during the VI series, but also to offset the obtained effect in the opposite
direction. Moreover, a greater sensitivity to reinforcement when the ratio
schedule was varied has also been reported by Davison (1982) using a
free-operant procedure involving a choice between VI and FR schedules.
The second prediction of optimality theory also was not supported,
as Fig. 1 shows that little overall bias occurred for two of the four
subjects, and that for the remaining two subjects the bias in favor of the
ratio alternative was much smaller than the predicted levels. Moreover,
unlike previous free-operant studies of concurrent VI-VR in which different
local response rates occurred to the two schedules (e.g., Herrnstein &
Heyman, 1979), this failure to find the predicted bias cannot be ascribed
to differential amounts of leisure associated with the two types of behavior.
This apparent disconfirmation of optimality theory must be qualified by the fact that a substantially greater bias toward the VR alternative occurred when responses during the ITI were included in the analysis. Why the greater bias occurred for responses in the ITI is a mystery, but there are several reasons for doubting its importance. First, the effect was observed only after the reversal of the correlation between the schedule type and response position, so that the difference could be due entirely to a position preference which was manifested primarily during the ITI. Second, if it is assumed that the responses during the ITI reflect a failure of discrimination of the absence of reinforcement during the ITI (as opposed, for example, to being maintained by trial onset), this would mean that the ratio requirement, from the animal's perspective, would be much larger than that actually scheduled. Thus, instead of the ratio requirements of 5-6.7 (corresponding to reinforcement probabilities of .20 and .15), the ratio requirements would be the total number of responses to the VR alternative divided by the total number of reinforcers. The actual numbers of responses per reinforcer, averaged over Conditions 7-10, were 21, 22, and 11 for Subjects 102, 108, and 114, respectively. The corresponding biases predicted by optimality theory would thus be increased to 4.6, 4.7, and 3.3. Thus, even if the behavior during the ITI is included in the analysis, the predicted biases in favor of the VR alternative were still much larger than those actually obtained.
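The predicted biases quoted above follow directly from the square-root rule stated earlier (bias toward the VR in proportion to the square root of the ratio requirement). A quick check, using the responses-per-reinforcer values from the text:

```python
import math

# Responses per reinforcer on the VR alternative, averaged over
# Conditions 7-10, for Subjects 102, 108, and 114 (from the text).
responses_per_rft = {"102": 21, "108": 22, "114": 11}

# Optimality theory's predicted bias toward the ratio alternative is
# the square root of the effective ratio requirement.
predicted_bias = {s: round(math.sqrt(n), 1) for s, n in responses_per_rft.items()}
print(predicted_bias)  # {'102': 4.6, '108': 4.7, '114': 3.3}
```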
A final reason for questioning whether the ITI behavior can be taken as evidence in favor of optimality theory is that it requires a failure of reinforcement contingencies that seems inconsistent with the assumptions of optimality theory. The basic tenet of that approach is that animals learn which pattern of behavior is correlated with the highest overall rate of reinforcement, with the supplementary assumption that high rates of responding are more aversive than low rates of responding, so that differential amounts of "leisure" correlated with different types of schedules must also be included. Yet, in order to salvage optimality theory for the present situation it must be assumed that the animals fail to detect the presence versus absence of reinforcement which is perfectly correlated with a highly discriminable cue, even when the continued responding during the ITI entails a substantial loss of leisure which in other situations is considered to be a sufficiently strong variable to offset major differences in obtained reinforcement rates. To the extent that "leisure" is an important variable, one would expect the present subjects to quickly discriminate the absence of any response consequences during the ITI. Thus, the greater bias toward the VR schedule which occurred during the ITI can be taken as evidence for optimality theory only if the importance of leisure is discounted and if the animal is assumed to be insensitive to exteroceptive stimuli as cues for the optimal pattern of behavior.
The present data also provide a test of “momentary maximizing”
theories of matching, which hypothesize that subjects choose the response
alternative with the higher probability of reinforcement at any moment
(Hinson & Staddon, 1983a; Shimp, 1966; Silberberg et al., 1978). Such
conceptions imply that the probability of a choice of the VI alternative
in the present procedure should increase the greater the number of trials
since the last VI response. In most conditions, however, the changeover
functions were either flat or decreasing, indicating little control by the
local reinforcement contingencies. Thus, the present results, like those
of Nevin (1969), seem to provide evidence against control by the local
reinforcement contingencies.
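The changeover functions of Fig. 4 are simply conditional probabilities tabulated from the trial-by-trial choice record. A minimal sketch of that tabulation (the choice sequence below is fabricated for illustration; 1 = VI response, 0 = VR response):

```python
from collections import defaultdict


def changeover_function(choices):
    """P(VI response) as a function of trials since the last VI response.

    `choices` is a per-trial sequence with 1 = VI response, 0 = VR
    response.  Returns {trials_since_last_VI_response: probability}."""
    counts = defaultdict(lambda: [0, 0])   # n -> [VI choices, opportunities]
    since = None                           # trials since last VI response
    for c in choices:
        if since is not None:              # skip trials before the first VI
            counts[since][1] += 1
            if c == 1:
                counts[since][0] += 1
        since = 1 if c == 1 else (since + 1 if since is not None else None)
    return {n: vi / total for n, (vi, total) in sorted(counts.items())}


# Fabricated record: strict VI, VR, VR alternation.  Momentary maximizing
# predicts an increasing function; a flat one indicates no local control.
seq = [1, 0, 0] * 100
f = changeover_function(seq)
print(f)  # {1: 0.0, 2: 0.0, 3: 1.0}
```

With this perfectly periodic sequence the function rises sharply, the pattern momentary maximizing predicts; the flat or decreasing functions actually obtained in most conditions are what argue against local control.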
Hinson and Staddon (1983a) have argued that flat changeover functions,
when specified in terms of the number of responses since the last VI
response, do not constitute serious evidence against momentary maximizing
because time since the last response, not the number of intervening
responses, determines the reinforcement probability, and the number of
intervening responses and the amount of time may be only loosely related.
However, such an argument does not apply to discrete-trial procedures
like that used here because the allowance of a single response per trial
greatly reduces the variation in local response rates that would allow
the correlation between the two variables to become uncoupled. Some
variation could occur if the subjects frequently failed to respond, or
responded with variable latencies from trial to trial. As shown by Table
4, however, trials without a response were infrequent and latencies were
typically less than 1.0 s. Given that these latencies were quite short in

comparison to the period of time the VI timer could run during the ITI,
this means that a very small portion of the variance in time since the
last VI response could be accounted for by variation in response latencies.
The implication is that the number of trials and the amount of elapsed
time since the last response were necessarily highly correlated. The
present data thus question the generality of the momentary maximizing
account proposed by Hinson and Staddon and suggest that matching
does not depend upon adherence to a momentary maximizing strategy.
This does not mean that momentary maximizing does not occur, or that
it may not be correlated with molar matching in some situations, since
Hinson and Staddon do provide strong evidence for its occurrence in
their own procedure. Just what variables determine when it will occur,
and how it is related to molar matching, remain to be elucidated.
The variability in the changeover functions presented in Fig. 4 also suggests
the need for caution in generalizing about the nature of such functions
on the basis of a single pair of reinforcement schedules. The functions
for individual subjects were usually similar within conditions, but differed
across conditions, suggesting that both the particular schedule values
and the order of conditions played a significant role. Several varieties
of functions were obtained, including decreasing, flat, slightly increasing,
and even some which were nonmonotonic. Such variability may help
explain the discrepancy reported by different investigators in the past
(e.g., Nevin, 1982, versus Silberberg & Ziriax, 1982). Whatever the reasons
for the variability in the changeover functions, the present data make a
strong case that there is little correspondence between the changeover
probabilities and the actual reinforcement probabilities for such changeover
behavior. On the other hand, the present data agree with previous claims
(Silberberg & Ziriax, 1982) that changeover behavior is not randomly
distributed, since consistent nonrandom patterns were obtained in several
conditions.
Finally, the present results are relevant to the recent argument of Ziriax
and Silberberg (1984) that concurrent VR-VI schedules cannot be used
to study choice behavior in a meaningful fashion, because the relative
rate of reinforcement from such a schedule necessarily tracks the relative
response rates, with the result that matching is necessarily obtained. In
their study they simulated the schedule conditions used previously by
Herrnstein and Heyman (1979) and found that such tracking did occur
in all conditions. Because the finding of undermatching, shown in Fig. 1,
challenges the generality of their analysis, the extent to which the re-
inforcement ratios track the response ratios was explored further using
the simulation data shown in Fig. 2. The data from each of those functions
were analyzed in terms of the logarithmic version of Eq. (2). Such an
analysis allows an assessment of the extent that reinforcement ratios are
determined by the distribution of responding, and thus of the degrees of
freedom for deviations from the matching relation. Figure 5 shows the



FIG. 5. Plots of the simulation data in terms of the logarithm of the response ratios
(VR/VI) versus the logarithm of the reinforcement ratios.

results of this analysis with separate functions for each schedule pair.
Once again conditions with variation in the VR schedule are shown in
the left portion; those with variation in the VI schedule are shown in
the right portion. In general the shape of the functions was similarly
curvilinear in all cases, with the effect of changing the schedule value
being simply to move the location of the function along the abscissa.
The parameter values for the fits of Eq. (2) to the data shown in Fig. 5
are shown in Table 5. Despite the curvilinear nature of the functions,
Eq. (2) accounted for more than 90% of the variance in each individual
function. In addition, the sensitivities of the response rates to the re-
inforcement ratios (the value of the exponent u) were all substantially

TABLE 5
Results of the Simulation Data Shown in Fig. 5 in Terms of the Generalized Matching Law (Eq. (2))

Condition        b      u     % VAC
1              0.16   1.75    97.2
2              1.12   1.79    93.4
3              3.19   1.71    96.3
Aggregate 1-3  1.17   1.08    65.6
4              3.39   1.42    98.4
5              1.12   1.79    93.4
6              0.13   2.16    94.9
Aggregate 4-6  1.29   1.01    61.5

Note. Shown are the bias term toward the VR component (b), the sensitivity of the response ratios to the reinforcement ratios (u), and the percentage of variance accounted for by the fit (% VAC).

above 1.0, indicating substantial overmatching, and the bias terms changed
systematically as a function of the relative values of the schedule pairs.
But quite a different picture is provided by a fit of the aggregate of all
conditions in each portion of Fig. 5. For fits to the aggregate of conditions
with VR variation, and to those with VI variation, the exponent u ap-
proached 1.0, and the bias terms were only slightly greater than 1.0.
However, the quality of the fits was severely reduced, accounting for
only 60-65% of the total variance. Whether matching is obtained (when
u = 1.0) thus depends upon the particular mix of conditions that is
considered and does not follow automatically because reinforcement ratios
necessarily track response ratios. Moreover, that mix of conditions which
did provide an approximation to matching resulted in a poor description
of the data by the matching equation, with such poor fits standing in
sharp contrast to the fits of the real data shown in Fig. 1.
The reason for the discrepancy between the results in Table 5 and those of Ziriax and Silberberg (1984) is not clear, especially since they
demonstrated that reinforcement rate strongly tracked response rate using
a variety of different local response rates and changeover rates in their
simulation of the conditions of Herrnstein and Heyman (1979). However,
in a subsequent simulation of a single VI 30-s-VR 30 schedule, they
systematically varied changeover rates and local response rates and pro-
duced functions quite similar to those shown in Fig. 5 (see their Fig. 2).
That is, fits of Eq. (2) to some conditions in which these parameters
were held constant produced sensitivity values in the overmatching range
like those seen in Table 5. Changes in the local response rates also were
shown to produce systematic variation in the degree of bias toward the
VR component, in much the same way that different schedule pairs
systematically altered the degree of bias in Fig. 5. A fit of Eq. (2) to
aggregates of their conditions would thus have produced a much lower
sensitivity to reinforcement and much lower percentage of the variance
accounted for, as also occurred in the present situation. Thus, their
simulations also suggest that whether the matching relation is forced by
the schedule constraints, per se, depends upon the particular mix of
conditions included in the regression analysis. Given the results shown
in Table 5 and Fig. 5, the implication is that at least some versions of
a concurrent VI-VR schedule do not automatically produce the matching
relation and that such schedules can indeed be used as a meaningful test
of different theories of choice.
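The feedback constraint at issue, that the obtained reinforcement ratio on a concurrent VR-VI schedule depends on how responding is allocated, can be reproduced in a few lines. The sketch below is a simplified illustration, not the simulation actually reported: the parameter values, the random-choice rule, and the per-trial arming probability for the interval schedule are all assumptions. The VR pays off with a fixed probability per response, while the VI arms a reinforcer probabilistically on every trial and holds it until the next VI response:

```python
import random


def simulate(p_vi_choice, p_vr=0.20, vi_setup=0.05, n_trials=50_000, seed=1):
    """Discrete-trial concurrent VR-VI with a held interval reinforcer.

    p_vr: reinforcement probability per response on the ratio alternative.
    vi_setup: per-trial probability that the interval schedule arms a
    reinforcer; once armed, it is held until the next VI response.
    Returns (VR/VI response ratio, VR/VI reinforcement ratio)."""
    rng = random.Random(seed)
    armed = False
    resp = {"VR": 0, "VI": 0}
    rft = {"VR": 0, "VI": 0}
    for _ in range(n_trials):
        armed = armed or rng.random() < vi_setup   # VI timer runs every trial
        if rng.random() < p_vi_choice:             # random choice rule
            resp["VI"] += 1
            if armed:                              # collect held reinforcer
                rft["VI"] += 1
                armed = False
        else:
            resp["VR"] += 1
            if rng.random() < p_vr:
                rft["VR"] += 1
    return resp["VR"] / resp["VI"], rft["VR"] / rft["VI"]


for p in (0.2, 0.5, 0.8):
    b_ratio, r_ratio = simulate(p)
    print(round(b_ratio, 2), round(r_ratio, 2))
```

As the simulated allocation shifts toward the VI, the obtained reinforcement ratio shifts with it but less than proportionally, the tracking (and compression) that makes the aggregate fits of Eq. (2) in Table 5 approach u = 1.0 even though the individual functions show strong overmatching.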
REFERENCES
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231-242.
Baum, W. M. (1976). Time-based and count-based measurement of preference. Journal of the Experimental Analysis of Behavior, 26, 27-35.
Baum, W. M. (1979). Matching, undermatching, and overmatching in studies of choice. Journal of the Experimental Analysis of Behavior, 32, 269-281.
Baum, W. M. (1981). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36, 387-403.
Davison, M. (1982). Preference in concurrent variable-interval fixed-ratio schedules. Journal of the Experimental Analysis of Behavior, 37, 81-96.
Fleshler, M., & Hoffman, H. S. (1962). A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 5, 529-530.
Green, L., Rachlin, H., & Hanson, J. (1983). Matching and maximizing with concurrent ratio-interval schedules. Journal of the Experimental Analysis of Behavior, 40, 217-224.
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4, 267-272.
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13, 243-266.
Herrnstein, R. J., & Heyman, G. M. (1979). Is matching compatible with reinforcement maximization on concurrent variable interval, variable ratio? Journal of the Experimental Analysis of Behavior, 31, 209-223.
Herrnstein, R. J., & Vaughan, W. (1980). Melioration and behavioral allocation. In J. E. R. Staddon (Ed.), Limits to action (pp. 143-176). New York: Academic Press.
Hinson, J. M., & Staddon, J. E. R. (1983a). Hill-climbing by pigeons. Journal of the Experimental Analysis of Behavior, 39, 25-47.
Hinson, J. M., & Staddon, J. E. R. (1983b). Matching, maximizing, and hill-climbing. Journal of the Experimental Analysis of Behavior, 40, 321-331.
Nevin, J. A. (1969). Interval reinforcement of choice behavior in discrete trials. Journal of the Experimental Analysis of Behavior, 12, 875-885.
Nevin, J. A. (1979). Overall matching versus momentary maximizing: Nevin (1969) revisited. Journal of Experimental Psychology: Animal Behavior Processes, 5, 300-306.
Nevin, J. A. (1982). Some persistent issues in the study of matching and maximizing. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 153-165). Cambridge, MA: Ballinger.
Norman, W. D., & McSweeney, F. K. (1978). Matching, contrast, and equalizing in the concurrent lever-press responding of rats. Journal of the Experimental Analysis of Behavior, 29, 453-462.
Prelec, D. (1982). Matching, maximizing, and the hyperbolic feedback function. Psychological Review, 89, 189-230.
Rachlin, H., Battalio, R., Kagel, J., & Green, L. (1981). Maximization theory in behavioral psychology. The Behavioral and Brain Sciences, 4, 371-388.
Shimp, C. P. (1966). Probabilistically reinforced choice behavior in pigeons. Journal of the Experimental Analysis of Behavior, 9, 433-455.
Shimp, C. P. (1969). Optimum behavior in free-operant experiments. Psychological Review, 76, 97-112.
Silberberg, A., Hamilton, B., Ziriax, J. M., & Casey, J. (1978). The structure of choice. Journal of Experimental Psychology: Animal Behavior Processes, 4, 368-398.
Silberberg, A., & Ziriax, J. M. (1982). The interchangeover time as a molecular dependent variable in concurrent schedules. In M. L. Commons, R. J. Herrnstein, & H. Rachlin (Eds.), Quantitative analyses of behavior: Vol. 2. Matching and maximizing accounts (pp. 131-151). Cambridge, MA: Ballinger.
Staddon, J. E. R., & Motheral, S. (1978). On matching and maximizing in operant choice experiments. Psychological Review, 85, 436-444.
Todorov, J. C., Castro, J. M. O., Hanna, E. S., de Sa Bittencourt, M. C. N., & Barreto, M. Q. (1983). Choice, experience, and the generalized matching law. Journal of the Experimental Analysis of Behavior, 40, 99-109.
Ziriax, J. M., & Silberberg, A. (1984). Concurrent variable-interval variable-ratio schedules can provide only weak evidence for matching. Journal of the Experimental Analysis of Behavior, 41, 83-100.

Received February 18, 1985


Revised June 25, 1985
