Modeling VI and VDRL Feedback Functions: Searching Normative Rules Through Computational Simulation
DOI: 10.1002/jeab.826
RESEARCH ARTICLE
Paulo Sergio Panse Silveira, José de Oliveira Siqueira, João Lucas Bernardy, Jessica Santiago, Thiago Cersosimo Meneses, Bianca Sanches Portela, Marcelo Frota Benvenuti

Department of Pathology, University of São Paulo, Medical School, São Paulo, Brazil
Department of Legal Medicine, Medical Ethics, Work and Social Medicine, University of São Paulo, Medical School, São Paulo, Brazil
Department of Experimental Psychology, University of São Paulo, Institute of Psychology, São Paulo, Brazil
National Institute of Science and Technology on Behavior, Cognition and Teaching, São Paulo, Brazil

Correspondence
Marcelo Frota Benvenuti, Instituto de Psicologia da USP, Av. Prof. Mello Moraes, 1721-030, Sao Paulo, SP, Brazil. Email: mbenvenuti@usp.br

Funding information
São Paulo Research Foundation, Grant/Award Number: 2014/50909-8; Brazilian Council for Scientific and Technological Development, Grant/Award Number: 465686/2014-1; Coordination for the Improvement of Higher Education Personnel, Grant/Award Number: 88887.136407/2017-00

Abstract
We present the mathematical description of feedback functions of variable interval and variable differential reinforcement of low rates as functions of schedule size only. These results were obtained using an R script named Beak, which was built to simulate rates of behavior interacting with simple schedules of reinforcement. Using Beak, we have simulated data that allow an assessment of different reinforcement feedback functions. This was made with unparalleled precision, as simulations provide huge samples of data and, more importantly, simulated behavior is not changed by the reinforcement it produces. Therefore, we can vary response rates systematically. We have compared different reinforcement feedback functions for random interval schedules, using the following criteria: meaning, precision, parsimony, and generality. Our results indicate that the best feedback function for the random interval schedule was published by Baum (1981). We also propose that the model used by Killeen (1975) is a viable feedback function for the random differential reinforcement of low rates schedule. We argue that Beak paves the way for greater understanding of schedules of reinforcement, addressing still open questions about quantitative features of simple schedules. Also, Beak could guide future experiments that use schedules as theoretical and methodological tools.

KEYWORDS
reinforcement feedback function, simple schedules of reinforcement, simulation, variable differential reinforcement of low rates, variable interval
The general definition of operant behavior implies that behavior controls environmental changes. Ferster and Skinner (1957) emphasized how these changes shaped different patterns of behavior. In their work, behavior was the dependent variable and reinforcement was the independent variable. On the other hand, the causality can be reversed: rate of reinforcement may be treated as the dependent variable and rates of behavior as the independent variable. The mathematical description of such a relation is called the reinforcement feedback function, or simply RFF (Baum, 1973; Rachlin, 1978).

Recent technologies pave the way for a more precise quantitative description of reinforcement processes and procedures. A quantitative analysis of feedback functions and their main features would directly address some old yet still pending questions about schedules of reinforcement (e.g., Baum, 1973, 1993; Catania & Reynolds, 1968; Killeen, 1975; Rachlin, 1978) and guide future research that uses schedules as a methodological tool.

In this work, we seek to resume the long-dormant discussion about the RFF of simple schedules through a computational routine called Beak. This routine simulates rates of behavior interacting with schedules of reinforcement. Our major contribution is that it allows us to test many possible response rates without having to rely on extensive experimentation with actual subjects. Despite the name Beak, it
is important to emphasize that we do not seek to simulate response patterns of any specific animal (e.g., rats, bees, pigeons, humans). Our goal is to build rules about possible outputs of a schedule over a large range of random response rates. These rules could guide future experiments that use schedules as theoretical and methodological tools.

Schedules as algorithms

Schedules of reinforcement are core concepts for the experimental analysis of behavior. The algorithms and rules that define schedules, however, are usually taken for granted, except for initial works (e.g., Catania & Reynolds, 1968; Ferster & Skinner, 1957; Fleshler & Hoffman, 1962; Millenson, 1963). The absence of schedule appraisal in the current literature is a potential problem because it could hinder replication.

A schedule of reinforcement is a set of rules that describes how behavior can produce reinforcers (Ferster & Skinner, 1957). Although the literature on the topic presents a myriad of schedule designations, all of them derive from the criteria used to define the so-called simple schedules. Fundamentally, reinforcers can be a function of a number of responses, of the passage of time, or some combination of both.

All these schedule requirements can be either fixed (F) or variable (V). On fixed schedules, the criterion to be met (schedule size) is constant between reinforcers. On variable schedules, this criterion is an average of a set of values. In the late fifties, implementing a variable schedule could be a challenge. Ferster and Skinner (1957) did so, selecting a series of values with an intended mean and "scrambling" them. However simple, this solution raises some important questions. How many values should one use? How should the relative frequency of such intervals be distributed? Does scrambling mean randomness?

Intuitively, one should build a schedule with as many values as possible in order to diminish predictability. Yet, in the past, researchers implemented schedules using a punched tape, in which the distances between holes corresponded to multiples of values that originated the variable schedule. This method imposed a practical limitation because too many values meant very long tapes, which could lead to more technical difficulties (Catania & Reynolds, 1968). The electromechanical apparatus also constrained choices regarding the distribution of frequency of interval values. Because it limited the number of values, distributions were always discrete. Instead of variable schedules, modern computers can easily apply random (R) procedures, with intervals distributed according to continuous probability density functions, as a feasible alternative to fixed and variable schedules.

The absence of discussions addressing the schedule algorithms used across many experiments suggests an apparent, but false, consensus. There are several critical aspects to defining and implementing schedules of reinforcement, which were already recognized by Ferster and Skinner (1957) in their seminal work. According to these authors, every schedule of reinforcement could be "represented by a certain arrangement of timers, counters and relay circuits" (Ferster & Skinner, 1957, p. 10). Still, most textbooks and technical papers omit relevant details about schedule algorithms and emphasize the behavioral patterns associated with each simple schedule (e.g., Mazur, 2016; Pierce & Cheney, 2017).

This discussion is not confined to solely theoretical matters. Schedules of reinforcement are held as crucial methodological tools for behavioral scientists to produce and analyze many experimental results. The correct interpretation of these results relies on clarity of schedule definitions when applied to problems such as discrimination learning by the use of multiple schedules (Ferster & Skinner, 1957; Weiss & Van Ost, 1974), observing behavior and conditioned reinforcement (Wyckoff, 1969), choice by the use of concurrent schedules (Herrnstein, 1961, 1970), self-control by the use of concurrent chained schedules (Rachlin & Green, 1972), behavioral pharmacology (Dews, 1962; Reilly, 2003), and decision making and bias (Fantino, 1998; Goodie & Fantino, 1995).

Schedules as feedback functions

The search for feedback functions for basic schedules is an important attempt to find normative rules about how simple schedules constrain reinforcement. This quantitative signature of schedules precedes the empirical pattern associated with each schedule and the ensuing controversy on differences among species, related repertoires, and stability criteria (e.g., Galizio & Buskist, 1988; Stoddard et al., 1988). Reinforcement feedback functions (RFFs) allow us to discover optimal relations between behavior and reinforcement for each schedule and so pose a way to propose normative rules for what to expect from actual (optimal) behavior. That is why RFFs are a research topic in their own right. Still, the feedback function of many schedules remains an open subject. The general shape of some RFFs is well known. Figure 1 depicts schematic RFFs, based on Rachlin (1989).

Each RFF clarifies how rates of reinforcement are constrained by basic schedules at a molar level of analysis. Time schedules (e.g., variable time, VT) do not depend on behavior; therefore, the RFF is a horizontal line with an intercept equal to the rate of reinforcement deduced from the schedule's size. In ratio schedules (e.g., variable ratio, VR), rates of behavior and reinforcement have a linear relation, with an intercept equal to zero and a slope that is the reciprocal of the ratio size. In interval schedules (e.g., variable interval, VI), reinforcement rate is further constrained by a temporal criterion, altering the prior linear function. In such cases, rates of response control increasing rates of reinforcement up to an asymptotic level.

As Baum (1992) pointed out, a viable RFF should fit the experimental data. But one cannot directly manipulate rates of behavior in the animal laboratory.
METHOD

Here we describe how we have implemented simple schedules and responses on Beak. For the sake of parsimony, we will describe the random interval (RI) and random differential reinforcement of low rates (RDRL). Our implementations of simple schedules are mainly based on initial work by Millenson (1963) and Ambler (1973). We consider their implementation ideal, because they are continuous versions of the discrete (and more widely used) algorithms (Fleshler & Hoffman, 1962).

FIGURE 1 Schematic feedback functions for three fundamental variable schedules: VT, VI, and VR. The VT schedule does not depend on behavior because reinforcers are provided at average time intervals. The VR schedule completely depends on the animal's behavior because reinforcers are provided after a given number of responses. The VI schedule is a middle ground in which the reinforcer becomes available at intervals, but it is only received after an animal's response.

Our implementation of responses is like the one by Green et al. (1983). Specifically, here, p stands simply for response probability, whereas 1 − p stands for the probability of no response. Also, trials can happen every fraction of a second, depending on the response rates we want to investigate.
reinforcer consumption. For studies with approximately zero consumption time, we argue that T ≤ 1 second is a convenient heuristic for T to meet both requirements simultaneously.

Given that the implemented schedule is a function of T and p, it is unlikely that the average and standard deviation will be identical to the planned value. Therefore, we suggest a 1% margin of tolerance. If x is the planned schedule average and standard deviation, this margin of tolerance for the mean can be described as

$$\frac{|x - T/p|}{x} \le 0.01 \qquad (3)$$

Applying the same margin of tolerance to the standard deviation,

$$1 \ge \frac{T\sqrt{1-p}/p}{x} \ge 0.99 \qquad (4)$$

In other words, values of T and p that meet the requirements expressed in Equations 3 and 4 will produce an RI with an exponential distribution of IATs that is sufficiently close to an RI x (of the same size) as planned beforehand. A small R script to determine adequate combinations of T and p is available as supplemental material (available at: https://sourceforge.net/projects/simpleschedules/).
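The supplemental script itself is not reproduced here; a minimal sketch of such a search, assuming a simple grid scan over T and p (function name and grid ranges are illustrative), could be:

```r
# Sketch: scan a grid of cycle lengths T (in seconds) and probabilities p,
# keeping pairs whose implemented mean, T/p, stays within 1% of the planned
# schedule size x (Equation 3) and whose implemented standard deviation,
# T * sqrt(1 - p) / p, stays between 0.99x and x (Equation 4).
# The name T mirrors the text's symbol, not R's shorthand for TRUE.
find_T_p <- function(x,
                     T_grid = seq(0.05, 1, by = 0.001),
                     p_grid = seq(0.001, 0.1, by = 0.0001)) {
  g <- expand.grid(T = T_grid, p = p_grid)
  mean_ratio <- abs(x - g$T / g$p) / x            # Equation 3
  sd_ratio   <- (g$T * sqrt(1 - g$p) / g$p) / x   # Equation 4
  g[mean_ratio <= 0.01 & sd_ratio >= 0.99 & sd_ratio <= 1, ]
}

head(find_T_p(x = 15))   # candidate (T, p) pairs for a planned RI 15 s
```

For a planned RI 15 s, pairs such as T = 0.530 with p = 0.035 pass both tolerances, the same kind of combination reported in our figures (e.g., T = 0.591, p = 0.03901).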
After choosing appropriate values for T and p, the the RDRL.
simulation starts running. A given interval will elapse
until the first reinforcer is assigned. After every reinforced
response, the chronometer restarts. That poses the inter- Simulating responses
val schedule’s criterion for reinforcement presentation
based on the period between two consecutive reinforcers Here we will present the assumptions of Beak regarding
(reinforcement as a function of both responding and pas- the implementation of responses to study schedules of
sage of time). Using such an implementation and data reinforcement using computational simulation. Beak pro-
produced using Beak, we will discuss the shape of the duces instantaneous responses programmed as a
RFF RI. Bernoulli process, where a success corresponds to the
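Beak's actual code is available as supplemental material; a minimal sketch of the cycle just described, assuming the discrete-step response process of Equation 6 (presented in the Simulating responses subsection below; all names are ours), might be:

```r
# Sketch of one RI session. Time advances in steps of t seconds; at the end
# of each full cycle of T seconds a reinforcer is assigned with probability
# p and held until collected; the first response after assignment collects
# it and restarts the schedule clock. B is responses per minute.
simulate_ri <- function(T, p, B, session_s = 3600, t = 0.005) {
  p_b <- B * t / 60                      # per-step response probability (Eq. 6)
  clock <- 0; available <- FALSE; reinforcers <- 0
  for (step in seq_len(session_s / t)) {
    clock <- clock + t
    if (!available && clock >= T) {      # a cycle of T s has elapsed
      available <- runif(1) < p          # assign a reinforcer with probability p
      clock <- 0
    }
    if (runif(1) < p_b && available) {   # reinforced response
      reinforcers <- reinforcers + 1
      available <- FALSE                 # chronometer restarts
      clock <- 0
    }
  }
  60 * reinforcers / session_s           # obtained reinforcers per minute
}

simulate_ri(T = 0.591, p = 0.03901, B = 60)   # roughly an RI 15 s
```

Holding an assigned reinforcer until it is collected, as sketched here, follows the usual hold feature of interval schedules; it is an assumption about Beak's internals, not a transcription of them.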
Random differential reinforcement of low rates

In the well-known DRL schedule (differential reinforcement of low rates of behavior), a minimum interresponse time (IRT) must precede rewarded responses (Ferster & Skinner, 1957). Using Beak, we were able to implement the random differential reinforcement of low rates—the RDRL schedule (Ambler, 1973; Logan, 1967). In a RDRL schedule, the required IRT varies randomly. Such variation is a function of parameters like those used to implement the RI schedule (Millenson, 1963).

Just like the previously defined RI, a reinforcement is assigned with probability equal to p every T seconds. The difference relies on the fact that, in the RI schedule, the parameter T is not affected by the organism's behavior, whereas the same parameter, in the RDRL, is directly affected by the organism's IRT. This happens because the chronometer that registers time during each cycle resets after every response emitted, which causes a cycle of time T to be fully completed only if no responses are emitted in the meantime. Such a condition makes p conditional on the organism's IRT, so in order to obtain a mean value for the probability of reinforcement in the session one must consider the minimum IRT the schedule requires (the size of the RDRL).

In other words, the same relation between T and p that defines the average IAT of an RI defines the average IRT with which the organism is required to comply in a RDRL. Therefore, substituting T with T′, to emphasize such a difference, the mean RDRL size is given by

$$\mu_{RDRL} = \frac{T'}{p} \qquad (5)$$

The parameter T′ is the minimum IRT required by the schedule for reinforcement assignment, and p is the probability that a reinforcer is assigned by the end of T′. Here we will use Beak to draw the RFF RDRL and discuss a convenient curve fit. Even though the RDRL was implemented in the animal laboratory (Aasvee et al., 2015; Logan, 1967), to the best of our knowledge, no further studies have been published about the RFF of the RDRL.
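A minimal sketch of this contingency, differing from the RI sketch above only in that every response resets the chronometer (again an illustration, not Beak's code):

```r
# Sketch of one RDRL session. A reinforcer is assigned with probability p
# at the end of every completed cycle of T1 (= T') seconds, but any
# response resets the cycle clock, so a cycle completes only after an IRT
# longer than T1; an assigned reinforcer is collected by the next response.
simulate_rdrl <- function(T1, p, B, session_s = 3600, t = 0.005) {
  p_b <- B * t / 60                      # per-step response probability (Eq. 6)
  clock <- 0; available <- FALSE; reinforcers <- 0
  for (step in seq_len(session_s / t)) {
    clock <- clock + t
    if (!available && clock >= T1) {     # a full cycle elapsed without responses
      available <- runif(1) < p
      clock <- 0
    }
    if (runif(1) < p_b) {                # a response occurs in this step
      if (available) {
        reinforcers <- reinforcers + 1
        available <- FALSE
      }
      clock <- 0                         # every response restarts the chronometer
    }
  }
  60 * reinforcers / session_s
}
```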
RDRL schedule (Ambler, 1973; Logan, 1967). In a is fractioned according to t (the minimum possible IRT).
RDRL schedule, the required IRT varies randomly. Such The mean rate of responses, B, is provided in minutes
variation is a function of parameters like those used to (the correspondence from minutes to seconds is repre-
implement the RI schedule (Millenson, 1963). sented by the constant 1/60 in Equation 6). For instance,
Just like the previously defined RI, a reinforcement is a response rate of 100 per minute and a second parti-
assigned with probability equal to p every T seconds. The tioned in intervals of 5/1000 of a second, would result in
difference relies on the fact that, in the RI schedule, the pb ≈ 0:0083 (the probability of response in each iteration
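The arithmetic of this example can be checked directly; the one-minute run below is illustrative:

```r
# Worked example from the text: B = 100 responses per minute and a second
# partitioned in steps of t = 5/1000 s give p_b of about 0.0083 (Equation 6).
B <- 100
t <- 5 / 1000
p_b <- B * t / 60
p_b                                # 0.00833...

# One simulated minute of instantaneous responses as a Bernoulli process:
responses <- runif(60 / t) < p_b   # 12,000 steps
sum(responses)                     # close to B = 100 on average
```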
randomly.

[Figure: reinforcement per minute for two implemented RI schedules: 15.15 s (T = 0.591, p = 0.03901) and 30.21 s (T = 1.000, p = 0.03310).]

for the RI and a possible RFF for the RDRL. For both schedules, in addition to graphic representations, we consider how well each RFF fits our simulated data using a goodness-of-fit measure (R²). Also,

TABLE 1 Equations explored here in investigating best RFF RI fit

Reference       RFF RI
Baum (1981)     R = 1 / (V/60 + 1/B)
Killeen (1975)  R = (60/V) [1 − exp(−B/c)]
Prelec (1982)   R = B [1 − exp(−1/((V/60) B))]
Rachlin (1978)  R = (60/V) (B/B_max)^m

Note. The value for V was provided in seconds and scaled by 60 for conversion to minutes.
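The candidates in Table 1 can be transcribed into R directly; the sketch below (our transcription, with an illustrative nls() call left commented out) makes the comparison concrete:

```r
# The four candidate RFFs of Table 1. B is responses per minute and V the
# schedule size in seconds (V/60 converts it to minutes). Baum's and
# Prelec's forms have no free parameters; c (Killeen) and m with B_max
# (Rachlin) must be estimated from the simulated (B, R) pairs.
rff_baum    <- function(B, V)           1 / (V / 60 + 1 / B)
rff_killeen <- function(B, V, c)        (60 / V) * (1 - exp(-B / c))
rff_prelec  <- function(B, V)           B * (1 - exp(-1 / ((V / 60) * B)))
rff_rachlin <- function(B, V, m, B_max) (60 / V) * (B / B_max)^m

# Illustrative fit of Killeen's c, assuming a data frame `sim` with
# columns B and R produced by the simulation:
# fit <- nls(R ~ rff_killeen(B, V = 15, c), data = sim, start = list(c = 30))
```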
the RI size. The parameter c shows a similar behavior across RI sizes, but it does not seem to have an upper limit.

As previously mentioned, an appropriate feedback function should fit the data (Baum, 1992). In order to compare fit qualities, one possible criterion is the goodness-of-fit measure, R², for which we suggest the thresholds .90 and .95 for good and excellent fit, respectively. Notwithstanding, using R² as the only criterion could be misleading, as it usually favors more complex RFFs. Thus, we will use BIC and AIC to compare models with different numbers of parameters (Schwarz, 1978). Table 3 summarizes the R² and BIC estimated for each RFF.

Our results favor Baum's (1981) RFF regarding both excellent fit (highest R²) and parsimony (lowest BIC/AIC). Overall, the R² seems to decay as the RI sizes increase. Figure 3 brings a graphical representation of the data for each simulated RI and of how each RFF fits our data.

Random differential reinforcement of low rates feedback function

As described, a RDRL could reinforce any IRT with a certain probability. Therefore, we expect an optimal rate greater than the size of the RDRL and, as a result, a maximum of reinforcement per minute that falls short of the theoretical asymptote deduced from the size of the schedule. All these features are shown in Figure 4, which depicts the points resulting from our simulation of four different RDRL—namely 2, 4, 8, and 16 s.

Our simulations are well described by the following equation:

$$R = a \left[ \exp\left(-\frac{B}{b}\right) - \exp\left(-\frac{B}{c}\right) \right] \qquad (7)$$

As for the RI schedule, R and B stand for rates of reinforcement and responses, respectively. The parameter a = 1/(V/60) = 60/V, where V still stands for the schedule size in seconds. Therefore, the parameter a is a theoretical asymptote of reinforcement per minute, a constant for which no estimation is required.

Using an iterative least squares algorithm, we have estimated the parameters b and c (Table 4) for all RDRL in Figure 4. The R² values summarized in this table show that Equation 7 is a proper RFF for the RDRL schedule (we dismiss a Bayesian information criterion analysis simply because we do not know any viable alternative to model the RDRL).

As in Killeen's (1975) model, b controls the decreasing and c the increasing of obtained reinforcements. However, it is also noticeable that b and c vary in a regular proportion. By assuming c = b/e, we were able to reduce Equation 7 to a single parameter b. In addition to that, it is also possible to show that ln(b) = 6 − ln(V), reducing Equation 7 to Equation 8, an equation with no free parameters that allows us to calculate reinforcement rate a priori, similar to the RI RFF by Baum (1981).

$$R = \frac{60}{V} \left[ \exp\left(-\frac{V}{e^{6}} B\right) - \exp\left(-\frac{V}{e^{5}} B\right) \right] \qquad (8)$$

In Equation 8, reinforcement rate R is simply a function of response rate B and the schedule size in seconds, V. Therefore, we can analytically explore the main quantitative features of the RFF RDRL. The first point of interest concerns optimal behavior, that is, the point B_m that optimizes reinforcement rate to a maximum given by R_m. The point B_m is given by

$$B_m = \frac{e^{6}}{(e - 1) V} \qquad (9)$$

Substituting Equation 9 in Equation 8, we calculate the maximum reinforcement rate, given by Equation 10:

$$R_m = \frac{60}{V} (e - 1) \exp\left(-\frac{e}{e - 1}\right) \qquad (10)$$

The values obtained through Equations 9 and 10 allow some interesting predictions about optimal behavior. The RDRL, unlike the RI schedule, punishes high rates of behavior with reinforcement loss. Therefore, we would expect actual subjects to respond at a rate close to B_m. To the best of our knowledge, this experiment has not yet been done.
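Because Equations 8–10 contain no free parameters, these predictions can be computed directly; a minimal sketch for the four simulated sizes:

```r
# Optimal response rate (Equation 9) and maximum reinforcement rate
# (Equation 10) of the parameter-free RDRL RFF (Equation 8).
# V is the schedule size in seconds; rates are per minute.
rdrl_rff <- function(B, V) {
  (60 / V) * (exp(-(V / exp(6)) * B) - exp(-(V / exp(5)) * B))
}
B_m <- function(V) exp(6) / ((exp(1) - 1) * V)                           # Eq. 9
R_m <- function(V) (60 / V) * (exp(1) - 1) * exp(-exp(1) / (exp(1) - 1)) # Eq. 10

V <- c(2, 4, 8, 16)
data.frame(V,
           B_m   = B_m(V),
           R_m   = R_m(V),
           check = rdrl_rff(B_m(V), V))  # reproduces R_m from Equation 8
```

For the RDRL 2 s, for instance, this gives B_m of about 117 responses per minute and R_m near 10.6 reinforcers per minute, well below the theoretical asymptote of 60/V = 30.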
Another point of interest is the inflection point, where the marginal reinforcement loss is maximum. For response rates greater than B_i, reinforcement loss increases at rising rates. The point B_i is given by
[Table of parameter estimates for each RFF across RI sizes of 5, 7, 10, 15, 30, and 60 s.]
TABLE 3 Fit precision and parsimony for each feedback function using simulated data from six RI schedules. [Columns: RFF, measure, and RI sizes of 5, 7, 10, 15, 30, and 60 s.]
FIGURE 3 Curve fit and R² for each feedback function using simulated data of RI 5, 7, 10, 15, 30, and 60 s. [Six panels plot reinforcement per minute against response rate (0–200 responses per minute); annotations in two panels show Killeen (1975) fits with R² = 0.96347 and R² = 0.95274.]
minute (i.e., RFF). It is important to emphasize that these simulations do not replace the study of behavior. Simulations are concerned with normative rules of schedules, going through a large range of possible response rates and exhaustively repeating these conditions. In this sense, Beak can provide orientation for a researcher in creating an experimental scenario to which a biological organism can be purposefully subjected. Because this biological being will behave with a certain response rate, its confrontation with the simulation predictions may clarify biases and constraints of actual behavior. In other words, simulations map the normative rules of schedules, whereas experiments map effective behaviors of organisms.

Comparisons between different RFFs for RI provide experimenters with better ways to describe the relation between behavior and environmental constraints. However, deciding between curve fits is no simple matter given that there are no definitive criteria. We will address the issue systematically, highlighting the pros and cons of each one of the four curve fits—Baum (1981), Killeen (1975), Prelec (1982), and Rachlin (1978).
$$R = \frac{60}{V} \left(\frac{B}{B_{max}}\right)^{\exp\left[1 - 2(1 - 1/e)\ln(V)\right]} \qquad (13)$$
[Figure 5 panels: RDRL 2 s, RDRL 4 s, RDRL 8 s, and RDRL 16 s; each plots reinforcement per minute against response rate (0–200 responses per minute), with Killeen (1975) fits of R² = 0.994, 0.986, 0.994, and 0.999, respectively, and markers at (B_m, R_m) and (B_i, R_i).]
FIGURE 5 Curve fit for simulated data of four RDRL. Top horizontal dashed lines are given by 60/V, where V is the schedule size in seconds. Decreasing dashed lines are adjusted by (60/V) exp(−B/c), rising dashed lines are adjusted by (60/V)[1 − exp(−B/b)], and empty circles are simulated data. The solid thick line is the RDRL fit by Equation 7, and the dashed line is provided by Equation 8, where B stands for response rate, (B_m, R_m) is the maximum point, and (B_i, R_i) is the inflection point.
likely IRT requirements. Here, we have implemented a RDRL, a continuous version of the somewhat minimalist Logan's RDRL. Even though Logan (1967) described his results in terms of proportion of IRTs, we argue that his seminal data are in agreement with the general shape of the RFF RDRL produced using Beak.

Logan found that the most likely IRT observed in the experiment "approximated an optimal strategy for maximizing reward" (Logan, 1967, p. 393). This meant that the subjects' first response after reinforcement occurred with an IRT slightly longer than the smaller of the two programmed DRL intervals, and further responses happened with IRTs around the other (longer) DRL interval. Therefore, he found two peaks of likely IRTs that matched the DRL intervals used.

Considering that behavior rate equals the reciprocal of IRT, Logan's results allow us to intuit what a RDRL RFF should look like. Reinforcers per minute should increase along with response rate until a certain maximum. However, if the response rate increases beyond this optimal point, reinforcement rate would decrease asymptotically. Because Logan (1967) built his variable differential reinforcement of low rates schedule out of two intervals, optimal rates of response could be easily predicted. In fact, rats that served as subjects learned how to maximize reinforcement by responding after the shorter interval and then waiting for the longer one. However, using a geometric distribution for the values that compose the schedule, we should expect the peaks observed in Logan's experiment to merge, forming the curves seen in Figures 4 and 5. Figure 5 shows that the greater the size of the programmed schedule, the sharper the curve at the peak of the RFF.

Our results show that Killeen's (1975) model is a viable RFF for the RDRL schedule. Killeen (1975) used Equation 7 to model response probability as a function of time elapsed since the last reinforcement. The model is based on two competing processes controlled by parameters b (concurrent) and c (inhibitory). Killeen (1975) interpreted the former as a measure of an increasing
of parsimony. In this case, realism introduced unnecessary complications with no gain in explanatory power.

Regarding schedules in which reinforcement may depend on both the passage of time and the occurrence of responses, the RDRL is a way to further constrain reinforcement in comparison to the RI schedule. The RFF RDRL is like the RFF RI in the sense that in both cases the rate of reinforcement depends on the response rate. Therefore, we have found increasing functions at low rates of response. However, these functions are also negatively accelerated functions. This represents the restriction imposed by time, which is present in both schedules. The RFFs of the two schedules differ in the extent to which the RDRL schedules further constrain reinforcement. In the interval schedule, the response rate has a positive monotonic relation with the ever-increasing rate of reinforcement. That is not the case in the RDRL. In the RDRL schedule, high rates of response are negatively punished by the postponement of reinforcement. In fact, this feedback system is well described by two competing processes (Killeen, 1975).

Briefly, our results demonstrate the power of our computational simulation to analyze basic schedules of reinforcement and refine ways to implement them. The enormous computational power available today should be used to offer, for instance, a variety of intervals instead of a simple shuffle of a small set of intervals mimicking older devices. Also, based on our results, we revised the RI RFF and proposed a RDRL RFF. Using computer simulation prevents unnecessary use of time and long experimentation without a clear notion of the normative rule that may be governing the strategic options involved. The new implementation methods presented pave the way for a richer study of schedules of reinforcement and their normative maximization rules, serving also as a guide toward promising questions that future experiments may want to explore.

FUNDING STATEMENT
Marcelo Benvenuti is a member of the National Institute of Science and Technology on Behavior, Cognition, and Teaching (INCT - ECCE), supported by São Paulo Research Foundation (FAPESP, grant No. 2014/50909-8), the Brazilian Council for Scientific and Technological Development (CNPq; grant #465686/2014-1), and the Coordination for the Improvement of Higher Education Personnel (CAPES; grant #88887.136407/2017-00).

CONFLICT OF INTEREST
The authors declare no conflict of interest. The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or material discussed in this manuscript.

ETHICS STATEMENT
This investigation is purely theoretical; thus, it was not submitted to any ethics committee.

ORCID
Paulo Sergio Panse Silveira https://orcid.org/0000-0003-4110-1038
José de Oliveira Siqueira https://orcid.org/0000-0002-3357-8939
João Lucas Bernardy https://orcid.org/0000-0002-3805-7366
Jessica Santiago https://orcid.org/0000-0002-7788-5455
Thiago Cersosimo Meneses https://orcid.org/0000-0003-3473-5841
Bianca Sanches Portela https://orcid.org/0000-0002-1351-652X
Marcelo Frota Benvenuti https://orcid.org/0000-0002-9397-3033

REFERENCES
Aasvee, K., Rasmussen, M., Kelly, C., Kurvinen, E., Giacchi, M. V., & Ahluwalia, N. (2015). Validity of self-reported height and weight for estimating prevalence of overweight among Estonian adolescents: The Health Behaviour in School-aged Children study. BMC Research Notes, 8(1), 606. https://doi.org/10.1186/s13104-015-1587-9
Ambler, S. (1973). A mathematical model of learning under schedules of interresponse time reinforcement. Journal of Mathematical Psychology, 10(4), 364–386. https://doi.org/10.1016/0022-2496(73)90023-0
Baum, W. M. (1973). The correlation-based law of effect. Journal of the Experimental Analysis of Behavior, 20(1), 137–153. https://doi.org/10.1901/jeab.1973.20-137
Baum, W. M. (1981). Optimization and the matching law as accounts of instrumental behavior. Journal of the Experimental Analysis of Behavior, 36(3), 387–403. https://doi.org/10.1901/jeab.1981.36-387
Baum, W. M. (1992). In search of the feedback function for variable-interval schedules. Journal of the Experimental Analysis of Behavior, 57(3), 365–375. https://doi.org/10.1901/jeab.1992.57-365
Baum, W. M. (1993). Performances on ratio and interval schedules of reinforcement: Data and theory. Journal of the Experimental Analysis of Behavior, 59(2), 245–264. https://doi.org/10.1901/jeab.1993.59-245
Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of the responding maintained by interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 11(3S2), 327–383. https://doi.org/10.1901/jeab.1968.11-s327
Dews, P. B. (1962). Psychopharmacology. In A. J. Bachrach (Ed.), Experimental foundations of clinical psychology (4th ed., pp. 423–441). Basic Books.
Fantino, E. (1998). Behavior analysis and decision making. Journal of the Experimental Analysis of Behavior, 69(3), 355–364. https://doi.org/10.1901/jeab.1998.69-355
Feller, W. (1968). An introduction to probability theory and its applications (Vol. 1, 3rd ed.). Wiley.
Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. Appleton-Century-Crofts.
Fleshler, M., & Hoffman, H. S. (1962). A progression for generating variable-interval schedules. Journal of the Experimental Analysis of Behavior, 5(4), 529–530. https://doi.org/10.1901/jeab.1962.5-529
Galizio, M., & Buskist, W. (1988). Laboratory lore and research practices in the experimental analysis of human behavior: Selecting
reinforcers and arranging contingencies. The Behavior Analyst, 11(1), 65–69. https://doi.org/10.1007/bf03392457
Goodie, A. S., & Fantino, E. (1995). An experientially derived base-rate error in humans. Psychological Science, 6(2), 101–106. https://doi.org/10.1111/j.1467-9280.1995.tb00314.x
Green, L., Rachlin, H., & Hanson, J. (1983). Matching and maximizing with concurrent ratio-interval schedules. Journal of the Experimental Analysis of Behavior, 40(3), 217–224. https://doi.org/10.1901/jeab.1983.40-217
Herrnstein, R. J. (1961). Relative and absolute strength of response as a function of frequency of reinforcement. Journal of the Experimental Analysis of Behavior, 4(3), 267–272. https://doi.org/10.1901/jeab.1961.4-267
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13(2), 243–266. https://doi.org/10.1901/jeab.1970.13-243
Hyndman, R. J. (1996). Computing and graphing highest density regions. American Statistician, 50(2), 120–126. https://doi.org/10.1080/00031305.1996.10474359
Killeen, P. R. (1975). On the temporal control of behavior. Psychological Review, 82(2), 89–115. https://doi.org/10.1037/h0076820
Killeen, P. R., & Sitomer, M. T. (2003). MPR. Behavioural Processes, 62(1), 49–64. https://doi.org/10.1016/S0376-6357(03)00017-2
Logan, F. A. (1967). Variable DRL. Psychonomic Science, 9(7), 393–394. https://doi.org/10.3758/BF03330862
Machado, A. (1997). Learning the temporal dynamics of behavior. Psychological Review, 104(2), 241–265. https://doi.org/10.1037/0033-295X.104.2.241
Mazur, J. E. (2016). Learning and behavior. Prentice-Hall.
Millenson, J. R. (1963). Random interval schedules of reinforcement. Journal of the Experimental Analysis of Behavior, 6(3), 437–443. https://doi.org/10.1901/jeab.1963.6-437
Nevin, J. A., & Baum, W. M. (1980). Feedback functions for variable-interval reinforcement. Journal of the Experimental Analysis of Behavior, 34(2), 207–217. https://doi.org/10.1901/jeab.1980.34-207
Pierce, W. D., & Cheney, C. D. (2017). Behavior analysis and learning: A biobehavioral approach. Erlbaum.
Prelec, D. (1982). Matching, maximizing, and the hyperbolic reinforcement feedback function. Psychological Review, 89(3), 189–230. https://doi.org/10.1037/0033-295X.89.3.189
Rachlin, H. (1978). A molar theory of reinforcement schedules. Journal of the Experimental Analysis of Behavior, 30(3), 345–360. https://doi.org/10.1901/jeab.1978.30-345
Rachlin, H. (1989). Judgment, decision, and choice: A cognitive/behavioral synthesis. Freeman.
Rachlin, H., & Green, L. (1972). Commitment, choice, and self-control. Journal of the Experimental Analysis of Behavior, 17(1), 15–22. https://doi.org/10.1901/jeab.1972.17-15
Reilly, M. P. (2003). Extending mathematical principles of reinforcement into the domain of behavioral pharmacology. Behavioural Processes, 62(1–3), 75–88. https://doi.org/10.1016/S0376-6357(03)00027-5
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464. https://doi.org/10.2307/2958889
Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 125–152). Prentice-Hall.
Stoddard, L. T., Sidman, M., & Brady, J. V. (1988). Fixed-interval and fixed-ratio reinforcement schedules with human subjects. The Analysis of Verbal Behavior, 6(1), 33–44. https://doi.org/10.1007/bf03392827
Weiss, S. J., & Van Ost, S. L. (1974). Response discriminative and reinforcement factors in stimulus control of performance on multiple and chained schedules of reinforcement. Learning and Motivation, 5(4), 459–472. https://doi.org/10.1016/0023-9690(74)90004-6
Wyckoff, L. B. J. (1969). The role of observing responses in discrimination learning: Part II. In D. P. Hendry (Ed.), Conditioned reinforcement (pp. 237–260). The Dorsey Press.

How to cite this article: Silveira, P. S. P., de Oliveira Siqueira, J., Bernardy, J. L., Santiago, J., Meneses, T. C., Portela, B. S., & Benvenuti, M. F. (2023). Modeling VI and VDRL feedback functions: Searching normative rules through computational simulation. Journal of the Experimental Analysis of Behavior, 119(2), 324–336. https://doi.org/10.1002/jeab.826