Choice Behavior in a Discrete-Trial Concurrent VI-VR: A Test of Maximizing Theories of Matching
The matching law (cf. Herrnstein, 1961, 1970), expressed as Eq. (1),

B1/B2 = R1/R2,   (1)

states that the ratio of response rates for two alternatives in a choice
experiment “matches” the ratio of their respective reinforcement rates.
Perfect matching often fails to occur, however, so that results from choice
studies are more often described in terms of Eq. (2), known as the
generalized matching law (Baum, 1974):

B1/B2 = b(R1/R2)^a.   (2)

Accordingly, the exponent, a, indicates the sensitivity of the response
allocation to the reinforcement ratio (with a = 1.0 for perfect matching),
while the parameter, b, indicates
bias toward one or the other response alternative independent of the
reinforcement ratios (for example, when there are different types of
reinforcers or responses).

Requests for reprints should be addressed to the author, Department of
Psychology, C-009, University of California, San Diego, La Jolla, CA 92093.
This research was supported by NIMH Research Grant 1 R01 MH 35572-02 and
NSF Research Grant BNS84-08878 to the University of California, San Diego.

0023-9690/85 $3.00
Copyright © 1985 by Academic Press, Inc.
All rights of reproduction in any form reserved.

BEN A. WILLIAMS
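In practice, the parameters of Eq. (2) are estimated by linear regression on its logarithmic form, log(B1/B2) = a·log(R1/R2) + log b. A minimal sketch of that fit (the data values at the end are a hypothetical illustration, not results from this study):

```python
import math

def fit_generalized_matching(response_ratios, reinforcement_ratios):
    """Least-squares fit of log(B1/B2) = a*log(R1/R2) + log(b)."""
    xs = [math.log10(r) for r in reinforcement_ratios]
    ys = [math.log10(r) for r in response_ratios]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                 # sensitivity exponent
    log_b = my - a * mx
    return a, 10 ** log_b         # b is the bias term

# Hypothetical data exhibiting perfect matching (a = 1, b = 1)
a, b = fit_generalized_matching([0.5, 1.0, 2.0], [0.5, 1.0, 2.0])
```

Values of a below 1.0 from such a fit indicate undermatching; b different from 1.0 indicates bias.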
term across conditions that would be captured by Eq. (2) both in terms
of the value of b being somewhere in the middle of the different values
of √n and in terms of a value of a substantially less than 1.0 (see Baum,
1981, Fig. 7, for a graphical depiction for why this is so). The result is
that the sensitivity of relative response rate to relative reinforcement
rate should be substantially less when relative reinforcement rate is varied
by changes in the VR schedule, than by changes in the VI schedule.
The major evidence pertaining to Eq. (3) comes from the study of
concurrent VR-VI by Herrnstein and Heyman (1979), who demonstrated
that matching did occur but with different types of biases depending
upon the behavioral measure. With response rate as the measure bias
was in favor of the VR alternative; with time allocation the bias was in
favor of the VI alternative. Such results are contrary to the predictions
of Eq. (3), both because the time-allocation data are those more relevant
to the derivation of Eq. (3) offered by Baum (1981) and because the bias
term for response rate was substantially less than required by Eq. (3).
The ratio values employed by Herrnstein and Heyman were in the range
of VR 30-60, so the expected bias should be in the range of 5-8. In
fact, however, the obtained bias for response rate was approximately
1.4. The predictions of optimality theory thus appear to be disconfirmed
by the empirical evidence.
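The expected-bias range quoted above follows directly from the square-root rule (a one-line check):

```python
import math

# Optimality theory's prediction: bias toward the VR alternative grows
# as the square root of the ratio requirement.
low, high = math.sqrt(30), math.sqrt(60)   # VR 30 ... VR 60
# low ~ 5.5 and high ~ 7.7, hence the "range of 5-8" cited in the text,
# far larger than the obtained response-rate bias of roughly 1.4
```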
Proponents of optimality theory (e.g., Rachlin et al., 1981) have argued
that the failure of the expected bias in favor of the VR alternative need
not be considered critical evidence, if the concept of “leisure” is in-
corporated into the analysis. Thus, rather than the total value in the
situation being given by the sum of the reinforcement rates from the
choice alternatives, value is also determined by the time spent not re-
sponding. Choice between VR and VI is then actually a choice between
different “packages” of food + leisure. Because ratio schedules maintain
higher local response rates than do interval schedules, this implies that
the choice of the interval schedule produces a relatively greater amount
of leisure, thus offsetting the bias toward the VR component that is
expected from Eq. (3), which is based only on the obtained rates of
food. To support this contention, Rachlin et al. (1981, pp. 409-410) demonstrated that values of VI leisure and VR leisure could be derived
empirically that would allow the choice results of Herrnstein and Heyman
(1979) to be consistent with optimality-based predictions.
The differential amount of leisure for VR versus VI schedules is a potential
confounding variable only because the two schedules produce different
local rates of responding. Thus, its effects should be eliminated by ensuring
that local response rates are equated for the two schedules. One method
of accomplishing this was used by Green, Rachlin, and Hanson (1983),
who trained subjects on a concurrent VR-“VR” schedule that was a
direct analog to a concurrent VR-VI. The major difference between their
opening for the food dipper, which was activated for reinforcement. The
dipper size was 0.02 ml, and the reinforcer consisted of undiluted Mazola
corn oil.
Procedure
All subjects were hand shaped to press each lever in alternate sessions.
After 100 reinforcers had been obtained by presses of each separate
lever, all subjects were begun immediately on the first condition listed
in Table I. A discrete-trial procedure was used in which the illumination
of the lights located above each lever signaled the trial onset. The first
response to either lever terminated the trial, producing the reinforcer if
scheduled, and initiating a constant 6-s ITI in which the lights were
extinguished. Trials also terminated after 5 s if no responses occurred.
A total of 400 trials were presented each session.
The sequence of schedules associated with each lever is shown in
Table 1. Interspersed between Conditions 6 and 7 were several sessions
in which the ITI was varied, but these data are not presented. Note that
one subject (S-20) became ill after Condition 6 and was terminated from
the study.
The VR schedule was programmed by two 33-position steppers, each
wired according to a random number table, with a Gerbrands cam stepper
(model G-4642) determining on a quasi-random basis which stepper controlled
the trial outcome for any particular trial. The VI schedules were
constructed of either 12 (the VI 270-s schedule) or 18 intervals (the VI
30-s and VI 90-s schedules) drawn from the exponential distribution of
Fleshler and Hoffman (1962). The VI timers ran continuously (during
both the ITI and trial periods) until a reinforcer became available. Training
TABLE 1
Order of Conditions and the Schedules Correlated with Each Response Lever

Condition   Lever 1    Lever 2    Sessions per subject
 1          VI 90      p = .50    20  16  16  16
 2          VI 90      p = .15    18  23  31  16
 3          VI 90      p = .08    16  16  16  18
 4          VI 30      p = .15    16  16  21  15
 5          VI 90      p = .15    19  17  18  21
 6          VI 270     p = .15    17  25  21  17
 7          p = .15    VI 30      19  15  21
 8          p = .15    VI 90      21  21  28
 9          p = .20    VI 270     21  20  20
10          p = .20    VI 90      17  21  17

Note. The p values correspond to the reciprocal of the ratio requirement. Also shown
are the number of sessions presented to each subject during each condition.
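The Fleshler and Hoffman (1962) intervals mentioned under Procedure follow a standard progression whose mean equals the nominal VI value; a sketch of that formula (my reconstruction of the published progression, not the authors' equipment program):

```python
import math

def fleshler_hoffman(mean_s, n):
    """Generate n intervals from the Fleshler-Hoffman (1962) progression.

    Interval k (k = 1..n) approximates a constant-probability
    (exponential) distribution; the mean of the n intervals is mean_s.
    """
    intervals = []
    for k in range(1, n + 1):
        m = n - k
        term = m * math.log(m) if m > 0 else 0.0   # 0*log(0) -> 0
        t = mean_s * (1 + math.log(n) + term - (m + 1) * math.log(m + 1))
        intervals.append(t)
    return intervals

# e.g., the 18 intervals of the VI 90-s schedule used here
vi90 = fleshler_hoffman(90.0, 18)
```

The telescoping sum in the formula makes the mean of the intervals exactly the nominal VI value.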
FIG. 1. Plots of the log of the choice ratio (VR/VI responses) as a function of the log of the
reinforcement ratio (VR/VI reinforcers) for each condition for each subject. Each point represents
the mean of the last five sessions of each condition. The best fitting functions that are
shown, along with their parameter values and the variance accounted for, were derived
by a linear regression analysis in terms of the logarithmic version of Eq. (2). Closed circles
correspond to Conditions 1-3, open circles to Conditions 4-6, and triangles to Conditions
7-10 (see Table 1).
TABLE 2
Results of the Linear Regression Analysis for Different Series of Conditions

                                         Subject
                                a     b     a     b     a     b     a     b
VR series (Cond. 1-3)         0.79  1.49  0.81  1.21  0.85  1.35  0.83  1.06
VI series (Cond. 4-6)         0.55  1.57  0.73  1.08  0.66  0.77  0.77  0.87
VI series after reversal
  (Cond. 7-10)                0.64  0.12  0.72  1.01  0.77  0.93
VI series after reversal,
  all responses (Cond. 7-10)  0.58  1.06  0.81  1.61  0.65  1.31

Note. Shown are the best fitting a and b values from the fit to the logarithmic version
of Eq. (2).
FIG. 2. The results of simulations of Conditions 1-6 in which the probability of a choice
of the VR alternative was systematically varied in small steps. Shown is the total number
of reinforcers from both alternatives that resulted from various percentages of choice
allocations to the VR alternative. The left portion shows the results for Conditions 1-3,
the right portion shows the results for Conditions 4-6.
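The logic of such a simulation can be sketched as follows (a simplified reconstruction under my own assumptions about trial timing and schedule programming, not the original simulation code; `trial_dur_s` and the exponential VI sampler are placeholders):

```python
import random

def total_reinforcers(pct_vr, vr_prob=0.15, vi_mean_s=90.0,
                      n_trials=400, trial_dur_s=8.0):
    """Total reinforcers in one simulated session as a function of the
    percentage of trials allocated to the VR alternative.

    Simplifying assumptions: every trial (response plus 6-s ITI) takes a
    fixed trial_dur_s; the VI timer runs continuously and holds an armed
    reinforcer until the next VI choice collects it.
    """
    clock = 0.0
    next_vi = random.expovariate(1.0 / vi_mean_s)
    armed = False
    total = 0
    for _ in range(n_trials):
        clock += trial_dur_s
        if clock >= next_vi:
            armed = True                          # VI timer stops here
        if random.random() < pct_vr / 100.0:      # VR choice
            if random.random() < vr_prob:
                total += 1
        elif armed:                               # VI choice collects
            total += 1
            armed = False
            next_vi = clock + random.expovariate(1.0 / vi_mean_s)
    return total

# Sweep the choice allocation in small steps, as in Fig. 2
random.seed(1)
curve = {pct: total_reinforcers(pct) for pct in range(0, 101, 5)}
```

The interesting feature of such curves is where the total-reinforcement maximum falls relative to the obtained choice allocation.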
TABLE 3
Comparison of Simulation Data with the Obtained Choice Proportions
Condition   Simulated response   Simulated reinforcement   Obtained response
               (VR/VI)                 (VR/VI)                  (VR/VI)
1.             20.2                    12.9                     8.18
2.             9.51                    2.76                     1.21
3.             3.20                    1.00                     0.49
4.             2.00                    0.71                     0.23
5.             9.51                    2.16                     0.94
6.             20.0                    7.44                     3.04
Note. Shown are the simulated ratios of VR/VI responding that produced the highest
total rates of reinforcement in the simulation. Also shown are the reinforcement ratios
corresponding to those response ratios, and the actually obtained choice ratios from each
condition (the geometric means of four subjects).
obtained for the VR alternative over the entire session. For all conditions,
the probability of reinforcement substantially increased with larger numbers
of intervening trials since the last VI response, with the major difference
across conditions being the slope and the intercepts of the functions. If
perfect adherence to a momentary maximizing strategy did occur, this
should be reflected by a choice of the VR alternative whenever the VI
reinforcement function fell below the flat VR reinforcement function, but
a choice of the VI alternative whenever the VI reinforcement function
fell above the VR line.
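That decision rule reduces to a per-trial threshold comparison, which can be sketched as follows (the VI probability function here is illustrative, not the obtained functions of Fig. 3):

```python
def momentary_max_choice(trials_since_vi, vi_prob_fn, vr_prob):
    """Choose the VI lever only when its momentary reinforcement
    probability exceeds the flat VR probability; otherwise choose VR."""
    return "VI" if vi_prob_fn(trials_since_vi) > vr_prob else "VR"

# Illustrative VI function: reinforcement probability grows with the
# number of trials since the last VI response (a stand-in only)
def vi_prob(n):
    return 1 - 0.95 ** (n + 1)

choices = [momentary_max_choice(n, vi_prob, 0.15) for n in range(10)]
# The strategy switches to VI once the VI function crosses the VR line
```

Under these illustrative values the switch occurs after three consecutive non-VI trials; a strict momentary maximizer would show exactly this step-like pattern.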
The actual choice probabilities are shown in Fig. 4, where each segment
of the graph corresponds to the probability-of-reinforcement functions
shown in Fig. 3. Although substantial variability occurred across conditions
(compare, for example, Conditions 2, 5, and 8, where the same rein-
forcement schedules were involved), the functions shown in Fig. 4 generally
do not correspond to the reinforcement probabilities shown in Fig. 3. In
fact, only during Condition 2 was there any clear evidence of increasing
probabilities of a VI response with increasing time since the last VI response,
as the more common patterns were generally flat functions or those that
actually decreased with time since the last VI response.
The abscissas of Figs. 3 and 4 include both trials on which a response
occurred and trials without a response. Since trials without a response
consumed substantially more time, this meant that the correlation between
the number of trials since the last response and the time since the last
VI response would be degraded the larger the number of trials without
TABLE 4
The Percentage of Trials without a Response for Individual Subjects during
Each Condition
Subject
Note. Also shown are the average latencies (in seconds) for a response during trials on
which a response occurred. The latter measure was not recorded during the first two
conditions. Data are averages from the last five sessions of training.
DISCUSSION
The present results demonstrate that the generalized matching law
(Eq. (2)) provides an excellent description of choice behavior of rats
trained on a discrete-trial version of a concurrent VI-VR reinforcement
schedule. For four subjects the average value of a was 0.77, which is
below the range of 0.80 to 1.0 that has been reported for subjects trained
on standard free-operant procedures (cf. Baum, 1979). The greater degree
of undermatching obtained here probably is not due to the use of rats
as subjects, as the only two previous studies using rats as subjects (Baum,
1976; Norman & McSweeney, 1978) while varying the relative rate of
positive reinforcement over a meaningful range have obtained values of
a comparable to those of pigeons. The use of the discrete-trial procedure is a
second possible reason for the undermatching, but its significance cannot
be evaluated properly because previous studies that have used discrete-
trial procedures (Nevin, 1969; Shimp, 1966; Silberberg et al., 1978) have
investigated only a single value of relative reinforcement rate. In any
event, the results approximated matching to a degree sufficient to warrant
their use as a means of evaluating the various theories of the underlying
processes.
As noted in the introduction, optimality theory implies two predictions
that can be tested by the present data. First, the sensitivity of choice
behavior to reinforcement allocation, as indexed by the fit of Eq. (2),
should be greater when relative reinforcement rate is varied by changes
in the VI than in the VR; second, the bias within each separate condition
should be toward the VR in proportion to the square root of the ratio
requirement. The results provided no support for the first of these pre-
dictions, as Table 2 reveals that to the extent any differences occurred
in the sensitivity to reinforcement (as shown by the values of a) they
were opposite to those predicted by optimality theory. However, the
present findings cannot be interpreted unambiguously because of the
confound of the possible effects of the order of presentation. A decreasing
sensitivity to reinforcement during later experimental conditions has been
noted in some previous studies (Todorov, Castro, Hanna, de Sa Bittencourt,
comparison to the period of time the VI timer could run during the ITI,
this means that a very small portion of the variance in time since the
last VI response could be accounted for by variation in response latencies.
The implication is that the number of trials and the amount of elapsed
time since the last response were necessarily highly correlated. The
present data thus question the generality of the momentary maximizing
account proposed by Hinson and Staddon and suggest that matching
does not depend upon adherence to a momentary maximizing strategy.
This does not mean that momentary maximizing does not occur, or that
it may not be correlated with molar matching in some situations, since
Hinson and Staddon do provide strong evidence for its occurrence in
their own procedure. Just what variables determine when it will occur,
and how it is related to molar matching, remain to be elucidated.
The variability in changeover functions presented in Fig. 4 also suggests
the need for caution in generalizing about the nature of such functions
on the basis of a single pair of reinforcement schedules. The functions
for individual subjects were usually similar within conditions, but differed
across conditions, suggesting that both the particular schedule values
and the order of conditions played a significant role. Several varieties
of functions were obtained, including decreasing, flat, slightly increasing,
and even some which were nonmonotonic. Such variability may help
explain the discrepancy reported by different investigators in the past
(e.g., Nevin, 1982, versus Silberberg & Ziriax, 1982). Whatever the reasons
for the variability in the changeover functions, the present data make a
strong case that there is little correspondence between the changeover
probabilities and the actual reinforcement probabilities for such changeover
behavior. On the other hand, the present data agree with previous claims
(Silberberg & Ziriax, 1982), that changeover behavior is not randomly
distributed, since consistent nonrandom patterns were obtained in several
conditions.
Finally, the present results are relevant to the recent argument of Ziriax
and Silberberg (1984) that concurrent VR-VI schedules cannot be used
to study choice behavior in a meaningful fashion, because the relative
rate of reinforcement from such a schedule necessarily tracks the relative
response rates, with the result that matching is necessarily obtained. In
their study they simulated the schedule conditions used previously by
Herrnstein and Heyman (1979) and found that such tracking did occur
in all conditions. Because the finding of undermatching, shown in Fig. 1,
challenges the generality of their analysis, the extent to which the re-
inforcement ratios track the response ratios was explored further using
the simulation data shown in Fig. 2. The data from each of those functions
were analyzed in terms of the logarithmic version of Eq. (2). Such an
analysis allows an assessment of the extent that reinforcement ratios are
determined by the distribution of responding, and thus of the degrees of
freedom for deviations from the matching relation. Figure 5 shows the
results of this analysis with separate functions for each schedule pair.
Once again conditions with variation in the VR schedule are shown in
the left portion; those with variation in the VI schedule are shown in
the right portion. In general the shape of the functions was similarly
curvilinear in all cases, with the effect of changing the schedule value
being simply to move the location of the function along the abscissa.
The parameter values for the fits of Eq. (2) to the data shown in Fig. 5
are shown in Table 5. Despite the curvilinear nature of the functions,
Eq. (2) accounted for more than 90% of the variance in each individual
function. In addition, the sensitivities of the response rates to the
reinforcement ratios (the value of the exponent a) were all substantially
above 1.0, indicating substantial overmatching, and the bias terms changed
systematically as a function of the relative values of the schedule pairs.

TABLE 5
Results of the Simulation Data Shown in Fig. 5 in Terms of the
Generalized Matching Law (Eq. (2))

Condition        b      a      % VAC
1.               0.16   1.75   97.2
2.               1.12   1.79   93.4
3.               3.19   1.71   96.3
Aggregate 1-3    1.17   1.08   65.6
4.               3.39   1.42   98.4
5.               1.12   1.79   93.4
6.               0.13   2.16   94.9
Aggregate 4-6    1.29   1.01   61.5
But quite a different picture is provided by a fit of the aggregate of all
conditions in each portion of Fig. 5. For fits to the aggregate of conditions
with VR variation, and to those with VI variation, the exponent a approached 1.0, and the bias terms were only slightly greater than 1.0.
However, the quality of the fits was severely reduced, accounting for
only 60-65% of the total variance. Whether matching is obtained (when
a = 1.0) thus depends upon the particular mix of conditions that is
considered and does not follow automatically because reinforcement ratios
necessarily track response ratios. Moreover, that mix of conditions which
did provide an approximation to matching resulted in a poor description
of the data by the matching equation, with such poor fits standing in
sharp contrast to the fits of the real data shown in Fig. 1.
The reason for the discrepancy between the results in Table 5 and
those of Ziriax and Silberberg (1984) are not clear, especially since they
demonstrated that reinforcement rate strongly tracked response rate using
a variety of different local response rates and changeover rates in their
simulation of the conditions of Herrnstein and Heyman (1979). However,
in a subsequent simulation of a single VI 30-s-VR 30 schedule, they
systematically varied changeover rates and local response rates and pro-
duced functions quite similar to those shown in Fig. 5 (see their Fig. 2).
That is, fits of Eq. (2) to some conditions in which these parameters
were held constant produced sensitivity values in the overmatching range
like those seen in Table 5. Changes in the local response rates also were
shown to produce systematic variation in the degree of bias toward the
VR component, in much the same way that different schedule pairs
systematically altered the degree of bias in Fig. 5. A fit of Eq. (2) to
aggregates of their conditions would thus have produced a much lower
sensitivity to reinforcement and much lower percentage of the variance
accounted for, as also occurred in the present situation. Thus, their
simulations also suggest that whether the matching relation is forced by
the schedule constraints, per se, depends upon the particular mix of
conditions included in the regression analysis. Given the results shown
in Table 5 and Fig. 5, the implication is that at least some versions of
a concurrent VI-VR schedule do not automatically produce the matching
relation and that such schedules can indeed be used as a meaningful test
of different theories of choice.
REFERENCES
Baum, W. M. (1974). On two types of deviation from the matching law: Bias and undermatching. Journal of the Experimental Analysis of Behavior, 22, 231-242.
Baum, W. M. (1976). Time-based and count-based measurement of preference. Journal of the Experimental Analysis of Behavior, 26, 27-35.
Todorov, J. C., Castro, J. M. O., Hanna, E. S., de Sa Bittencourt, M. C. N., & Barreto, M. Q. (1983). Choice, experience, and the generalized matching law. Journal of the Experimental Analysis of Behavior, 40, 99-109.
Ziriax, J. M., & Silberberg, A. (1984). Concurrent variable-interval variable-ratio schedules can provide only weak evidence for matching. Journal of the Experimental Analysis of Behavior, 41, 83-100.