Modeling VI and VDRL Feedback Functions: Searching Normative Rules Through Computational Simulation

Received: 6 July 2022 Accepted: 20 November 2022
DOI: 10.1002/jeab.826
RESEARCH ARTICLE
Modeling VI and VDRL feedback functions: Searching normative

rules through computational simulation
Paulo Sergio Panse Silveira 1 | José de Oliveira Siqueira 2 |

João Lucas Bernardy 3,4 | Jessica Santiago 3
| Thiago Cersosimo Meneses 3 |
Bianca Sanches Portela 3 | Marcelo Frota Benvenuti 3,4
1
Department of Pathology, University of São Abstract
Paulo, Medical School, São Paulo, Brazil
2
We present the mathematical description of feedback functions of variable interval
Department of Legal Medicine, Medical
Ethics, Work and Social Medicine, University
and variable differential reinforcement of low rates as functions of schedule size
of São Paulo, Medical School, São Paulo, only. These results were obtained using an R script named Beak, which was built
Brazil to simulate rates of behavior interacting with simple schedules of reinforcement.
3
Department of Experimental Psychology, Using Beak, we have simulated data that allow an assessment of different rein-
University of São Paulo, Institute of
Psychology, São Paulo, Brazil
forcement feedback functions. This was made with unparalleled precision, as simu-
4
National Institute of Science and Technology
lations provide huge samples of data and, more importantly, simulated behavior is
on Behavior, Cognition and Teaching, not changed by the reinforcement it produces. Therefore, we can vary response
São Paulo, Brazil rates systematically. We’ve compared different reinforcement feedback functions
for random interval schedules, using the following criteria: meaning, precision,
Correspondence
Marcelo Frota Benvenuti, Instituto de
parsimony, and generality. Our results indicate that the best feedback function for
Psicologia da USP, Av. Prof. Mello Moraes, the random interval schedule was published by Baum (1981). We also propose that
1721-030, Sao Paulo, SP, Brazil. the model used by Killeen (1975) is a viable feedback function for the random dif-
Email: mbenvenuti@usp.br
ferential reinforcement of low rates schedule. We argue that Beak paves the way
for greater understanding of schedules of reinforcement, addressing still open ques-
Funding information
São Paulo Research Foundation, Grant/Award tions about quantitative features of simple schedules. Also, Beak could guide
Number: 2014/50909-8; Brazilian Council for future experiments that use schedules as theoretical and methodological tools.
Scientific and Technological Development,
Grant/Award Number: 465686/2014-1;
Coordination for the Improvement of Higher KEYWORDS
Education Personnel, Grant/Award Number: reinforcement feedback function, simple schedules of reinforcement, simulation, variable differential
88887.136407/2017-00 reinforcement of low rates, variable interval
Editor-in-Chief: Mark Galizio
Handling Editor: Sarah Cowie
The general definition of operant behavior implies that procedures. A quantitative analysis of feedback functions
behavior controls environmental changes. Ferster and and their main features would directly address some old
Skinner (1957) emphasized how these changes shaped dif- yet still pending questions about schedules of reinforce-
ferent patterns of behavior. In their work behavior was ment (e.g., Baum, 1973, 1993; Catania & Reynolds, 1968;
the dependent variable and reinforcement was the inde- Killeen, 1975; Rachlin, 1978) and guide future research
pendent variable. On the other hand, the causality can be that uses schedules as a methodological tool.
reversed: rate of reinforcement may be treated as the In this work, we seek to resume the long-dormant dis-
dependent variable and rates of behavior as the indepen- cussion about RFF of simple schedules through a compu-
dent variable. The mathematical description of such a tational routine called Beak. This routine simulates rates of
relation is called reinforcement feedback function, or sim- behavior interacting with schedules of reinforcement. Our
ply RFF (Baum, 1973; Rachlin, 1978). major contribution is that it allows us to test many possible
Recent technologies pave the way for a more precise response rates without having to rely on extensive experi-
quantitative description of reinforcement processes and mentation with actual subjects. Despite the name Beak, it
324 © 2023 Society for the Experimental Analysis of Behavior. wileyonlinelibrary.com/journal/jeab J Exp Anal Behav. 2023;119:324–336.
19383711, 2023, 2, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/jeab.826 by CAPES, Wiley Online Library on [10/04/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
SIMPLE SCHEDULES 325
is important to emphasize that we do not seek to simulate and Skinner (1957) in their seminal work. According to
response patterns of any specific animal (e.g., rats, bees, these authors, every schedule of reinforcement could be
pigeons, humans). Our goal is to build rules about possible “represented by a certain arrangement of timers, coun-
outputs of a schedule over a large range of random ters and relay circuits” (Ferster & Skinner, 1957, p. 10).
response rates. These rules could guide future experiments Still, most textbooks and technical papers omit relevant
that use schedules as theoretical and methodological tool. details about schedule algorithms and emphasize the
behavioral patterns associated with each simple sched-
ule (e.g., Mazur, 2016; Pierce & Cheney, 2017).
Schedules as algorithms This discussion is not confined to solely theoretical
matters. Schedules of reinforcement are held as crucial
Schedules of reinforcement are core concepts for the methodological tools for behavioral scientists to produce
experimental analysis of behavior. The algorithms and and analyze many experimental results. The correct inter-
rules that define schedules, however, are usually taken pretation of these results relies on clarity of schedule defi-
for granted, except for initial works (e.g., Catania & nitions when applied to problems such as discrimination
Reynolds, 1968; Ferster & Skinner, 1957; Fleshler & learning by the use of multiple schedules (Ferster &
Hoffman, 1962; Millenson, 1963). The absence of sched- Skinner, 1957; Weiss & Van Ost, 1974), observing behav-
ule appraisal in the current literature is a potential prob- ior and conditioned reinforcement (Wyckoff, 1969),
lem because it could hinder replication. choice by the use of concurrent schedules (Herrnstein,
A schedule of reinforcement is a set of rules that 1961, 1970), self-control by the use of concurrent chained
describes how behavior can produce reinforcers schedules (Rachlin & Green, 1972), behavior pharmacol-
(Ferster & Skinner, 1957). Although the literature on the ogy (Dews, 1962; Reilly, 2003), and decision making and
topic presents a myriad of schedule designations, all of bias (Fantino, 1998; Goodie & Fantino, 1995).
them derive from the criteria used to define the so-called
simple schedules. Fundamentally, reinforcers can be a
function of a number of responses, of the passage of time, Schedules as feedback functions
or some combination of both.
All these schedule requirements can be either fixed The search for feedback functions for basic schedules is
(F) or variable (V). On fixed schedules, the criterion to be an important attempt to find normative rules about how
met (schedule size) is constant between reinforcers. On simple schedules constrain reinforcement. This quantita-
variable schedules, this criterion is an average of a set of tive signature of schedules precedes the empirical pattern
values. In the late fifties, implementing a variable sched- associated with each schedule and the ensuing contro-
ule could be a challenge. Ferster and Skinner (1957) did versy on differences among species, related repertoires,
so, selecting a series of values with an intended mean and and stability criterion (e.g., Galizio & Buskist, 1988;
“scrambling” them. However simple, this solution raises Stoddard et al., 1988). Reinforcement feedback functions
some important questions. How many values should one (RFFs) allow us to discover optimal relations between
use? How should the relative frequency of such intervals behavior and reinforcement for each schedule and so
be distributed? Does scrambling mean randomness? pose a way to propose normative rules for what to expect
Intuitively, one should build a schedule with as many from actual (optimal) behavior. That is why RFFs are a
values as possible in order to diminish predictability. Yet, research topic in their own right. Still, the feedback func-
in the past, researchers implemented schedules using a tion of many schedules remains an open subject. The gen-
punched tape, in which the distances between holes corre- eral shape of some RFFs is well known. Figure 1 depicts
sponded to multiples of values that originated the variable schematic RFFs, based on Rachlin (1989).
schedule. Therefore, this method imposed a practical limi- Each RFF clarifies how rates of reinforcement are con-
tation because too many values meant very long tapes, strained by basic schedules in a molar level of analysis.
which could lead to more technical difficulties (Catania & Time schedules (e.g., variable time, VT) do not depend on
Reynolds, 1968). The electromechanical apparatus also behavior. Therefore, the RFF is a horizontal line with an
constrained choices regarding the distribution of frequency intercept equal to the rate of reinforcement deduced from
of interval values. Because it limited the number of values, the schedule’s size. In ratio schedules (e.g., variable rate,
distributions were always discrete. Instead of variable VR), rates of behavior and reinforcement have a linear
schedules, modern computers can easily apply random relation, with an intercept equal to zero and a slope that is
(R) procedures with intervals distributed according to con- the reciprocal of the ratio size. In interval schedules
tinuous density probability functions as a feasible alterna- (e.g., variable interval, VI), reinforcement rate is further
tive to fixed and variable schedules. constrained by a temporal criterion, altering the prior lin-
The absence of discussions addressing the schedule’s ear function. In such cases, rates of response control
algorithms used along many experiments suggests an increasing rates of reinforcement until an asymptotic level.
apparent, but false, consensus. There are several critical As Baum (1992) pointed out, a viable RFF should
aspects to defining and implementing schedules of rein- fit the experimental data. But one cannot directly
forcement, which were already recognized by Ferster manipulate rates of behavior in the animal laboratory
326 SILVEIRA ET AL.
METHOD
Here we describe how we have implemented simple
schedules and responses on Beak. For the sake of parsi-
mony, we will describe the random interval (RI) and
random differential reinforcement of low rates (RDRL).
Our implementations of simple schedules are mainly
based on initial work by Millenson (1963) and Ambler
(1973). We consider their implementation ideal, because
they are continuous versions of the discrete (and more
widely used) algorithms (Fleshler & Hoffman, 1962).
F I G U R E 1 Schematic feedback functions for three fundamental Our implementation of responses is like the one by
variable schedules: VT, VI, and VR. The VT schedule does not depend
Green et al. (1983). Specifically, here, p stands simply
on behavior because reinforcers are provided at average time intervals.
The VR schedule completely depends on the animal’s behavior because for response probability, whereas 1 p stands for a prob-
reinforcers are provided after a given number of responses. The VI ability of no response. Also, trials can happen every frac-
schedule is a middle ground in which the reinforcer becomes available tion of a second, depending on the response rates we
at intervals, but it is only received after an animal’s response. want to investigate.
without changing critical aspects of the environment. Random interval

That is an important limitation, as RFF models the
environment as a function of behavior and the rate of Millenson (1963) proposed the RI schedule as a version
response is the independent variable, but one that we of VI schedules. Millenson’s RI is a function of the
cannot manipulate systematically. Therefore, even with parameters T and p, where T stands for the duration of a
large samples of behavior, experiments rarely cover a cycle in any unit of time, at the end of which there is a
sufficiently wide range of response rates (Baum, 1992), probability p of reinforcement assignment. The interas-
whereas simulations allow the experimenter to explore signment time (IAT) is the number of cycles with dura-
and predict what optimal performances would look tion T until reinforcement assignment.
like for a wide range of environmental conditions. In For every specific RI size, there are infinite possible
fact, experiments with humans and animals do not combinations of T and p. However, not every combina-
seem to be the best choice to define a RFF, which was tion is eligible for our purposes: behavioral researchers
the historical attempt; their utility is the discovery of should find values of T and p that will produce an IAT
what strategies among many an organism can adopt with geometric distribution with mean equals to
and under which circumstances, given the normative
rules predicted by simulations (literally providing a T
μRI ¼ : ð1Þ
comprehensive map for each schedule), thus opening a p
whole new string of research.
A consequence of this approach is that the ideal con- To achieve a geometric distribution, we must meet
ditions for investigation of RFF are better achievable two requirements. First, the distribution’s average
through computer simulation because we can prevent (Equation 1) must be equal to the standard deviation of
simulated behavior from changing as a function of rates the distribution (σ), given by
of outcomes. Rather, we can vary it systematically. For
that reason, we have developed Beak, which allows us to T pffiffiffiffiffiffiffiffiffiffi
σ RI ¼ 1 p: ð2Þ
analyze with unprecedented precision the quantitative p
features of feedback functions and build normative rules
for different contingencies. Second, the geometric distribution of IAT will
Although Beak implemented other basic random approach an exponential distribution as T approaches
schedules such as RT and RR, in the present paper, we zero. The exponential distribution is desirable because it
will discuss the curve fit presented by Baum (1982), Pre- has the inherent property of lack of memory
lec (1982), and Rachlin (1978) for the RI schedule (the (Feller, 1968), videlicet, its past behavior bears no infor-
pure time or ratio schedules are extremes with monoto- mation about the future behavior of IAT distribution.
nous behavior that do not need further discussion for the This property is critical for a more refined implementa-
present focus). We also show that a curve from tion of variable schedules because it ensures minimum
(Killeen, 1975), which was originally proposed in a differ- predictability, as Fleshler and Hoffman (1962) intended.
ent theoretical context, is a viable RFF for RI schedules. Also, the exponential distribution conveniently portrays
More interestingly, this function is also a suitable RFF the continuity of time.
for the random differential reinforcement of low rates On the other hand, Millenson (1963) pointed out
that we could extensively map with Beak. that T should also be greater than the average time of
reinforcer consumption. For studies with approximately parameter T is not affected by the organism’s behavior,
zero consumption time, we argue that T ≤ 1 second is a whereas the same parameter, in the RDRL, is directly
convenient heuristic for T to meet both requirements affected by the organism’s IRT. This happens because
simultaneously. the chronometer that registers time during each cycle
Given that the implemented schedule is a function of resets after every response emitted, which causes a cycle
T and p, it is unlikely that the average and standard devi- of time T to only be fully completed if no responses are
ation will be identical to the planned value. Therefore, we emitted in the meantime. Such a condition makes p con-
suggest a 1% margin of tolerance. If x is the planned ditional to the organism’s IRT, so in order to obtain a
schedule average and standard deviation, this margin of mean value for the probability of reinforcement in the
tolerance for the mean can be described as session one must consider the minimum IRT the schedule
requires (the size of the RDRL).
jx Tp j In other words, the same relation between T and p
≤ 0:01: ð3Þ that defines the average IAT of an RI defines the
x
average IRT with which the organism is required to
comply in a RDRL. Therefore, substituting T for T 0 , to
Applying the same margin of tolerance to the stan- emphasize such a difference, the mean RDRL size is
dard deviation, given by
pffiffiffiffiffiffiffiffiffiffi
T
1p T0
1≥
p
≥ 0:99: ð4Þ μRDRL ¼ : ð5Þ
x p
In other words, values of T and p that meet the The parameter T 0 is the minimum IRT required by
requirements expressed in Equations 3 and 4 will produce the schedule for reinforcement assignment, and p is the
an RI with exponential distribution of IAT that is suffi- probability that a reinforcer is assigned by the end of T 0 .
ciently close to a RI x (of same size) as planned before- Here we will use Beak to draw the RFF RDRL and dis-
hand. A small R script to determine adequate cuss a convenient curve fit. Even though the RDRL was
combinations of T and p is available as supplemental implemented in the animal laboratory (Aasvee
material (available at: https://sourceforge.net/projects/ et al., 2015; Logan, 1967), to the best of our knowledge,
simpleschedules/). no further studies have been published about the RFF of
After choosing appropriate values for T and p, the the RDRL.
simulation starts running. A given interval will elapse
until the first reinforcer is assigned. After every reinforced
response, the chronometer restarts. That poses the inter- Simulating responses
val schedule’s criterion for reinforcement presentation
based on the period between two consecutive reinforcers Here we will present the assumptions of Beak regarding
(reinforcement as a function of both responding and pas- the implementation of responses to study schedules of
sage of time). Using such an implementation and data reinforcement using computational simulation. Beak pro-
produced using Beak, we will discuss the shape of the duces instantaneous responses programmed as a
RFF RI. Bernoulli process, where a success corresponds to the
emission of a response. The simulation explores a range
of response rates (B), with B constant along each session.
Random differential reinforcement of low rates The probability of response emission at each instant of
time (pb ) for each session is given by
In the well-known DRL schedule (differential reinforce-
ment of low rates of behavior), a minimum interresponse Bt
pb ¼ : ð6Þ
time (IRT) must precede rewarded responses (Ferster & 60
Skinner, 1957). Using Beak, we were able to implement
the random differential reinforcement of low rates—the The simulation evolves in discrete steps. Each second
RDRL schedule (Ambler, 1973; Logan, 1967). In a is fractioned according to t (the minimum possible IRT).
RDRL schedule, the required IRT varies randomly. Such The mean rate of responses, B, is provided in minutes
variation is a function of parameters like those used to (the correspondence from minutes to seconds is repre-
implement the RI schedule (Millenson, 1963). sented by the constant 1/60 in Equation 6). For instance,
Just like the previously defined RI, a reinforcement is a response rate of 100 per minute and a second parti-
assigned with probability equal to p every T seconds. The tioned in intervals of 5/1000 of a second, would result in
difference relies on the fact that, in the RI schedule, the pb ≈ 0:0083 (the probability of response in each iteration
328 SILVEIRA ET AL.
step). To investigate higher values of B, t needs to be rate of reinforcement in RI schedules is monotonically

smaller, making the simulation finer with greater compu- increased and negatively accelerated, as shown in
tational cost. Additionally, as this rate of trials increases, Figure 2.
the Bernoulli process approaches a time continuity, as in Below we will compare different RFFs for the RI. We
a Poisson process. have tested RFFs by Baum (1981), Prelec (1982), and
The researcher also determines session duration and Rachlin (1978). We also tested a RFF for the RI, pre-
the number of session repetitions. Beak records the rein- sented in a different theoretical context by Killeen (1975).
forcement rate (reinforcers per minute) of each repetition These RFFs are summarized in Table 1. In all RFF, B
and computes the 95% highest density interval (Hyndman, stands for the response rate, R stands for reinforcement
1996). For this work we computed 500 repetitions of 1-hr rate, and V stands for the size schedule in seconds,
sessions; therefore, each point of our simulations corre- whereas c and m are free parameters that are estimated a
sponding to a given B was the result of 500 sessions, each posteriori.
one depending on 720,000 iterations (3,600 0:005 1
), total- To fit the curves presented to the data we
ing 3:6 10 trials. Because B ranged from 0 to 200 (inte-
8
obtained through simulation, we allowed all parame-
ger values), the definition of each RFF depicted below ters of the equations to vary (except B and R, which are
was obtained by 7:236 1010 events. With such a number the variables we want to describe) to best fit the curve to
of trials, the obtained average rate of responses draws the data by a nonlinear least squares method. From that
itself nearer to the nominal rate of responses determined it follows that we have estimated parameters, which are
by the experimenter. not exactly those obtained empirically but are a
good approximation. For instance, we have an estimated
V that approaches the schedule’s size that was defined
Break-and-run patterns by the experimenter but renders a more accurate
description of the data than the one defined a priori.
For another simulation set, we have included two new We do that because we want to know how faithfully
parameters to implement a break-and-run pattern of one can assume that this parameter approaches the
responses (Nevin & Baum, 1980). We have started these schedule’s size. Therefore, we investigated how V, m, and
simulations running a probability of starting a new burst c vary across RI. These results are summarized in
of responses (i.e., probability of a run, Pr ), and during a Table 2.
run, a probability of stopping the emission of responses Table 2 shows that our estimations of V are all fairly
(i.e., the probability of a break, Pb ). During each run close to the chosen schedule sizes. The parameter m
period, responses are emitted according to the same (Rachlin, 1978, 1989) seems to be an inverse function of
Bernoulli process previously defined, meaning that the
probability of occurrence of each response is conditioned
to the continuity of a run. Such simulations allow us to RI feedback functions
compare two different accounts about the structure of
14
behavior, one that assumes that behavior is random and

other that assumes clusters of behavior start and stop 5.05s, T=0.197, p=0.03901
12
randomly.
Reinforcement per minute
10
7.07s, T=0.277, p=0.03918

RESULTS
8
10.10s, T=0.395, p=0.03911

Our results include a comparison of different RFFs
6
for the RI and a possible RFF for the RDRL. For 15.15s, T=0.591, p=0.03901
both schedules, in addition to graphic represen-
4
tations, we consider how well each RFF fits our simu- 30.21s, T=1.000, p=0.03310
lated data using a goodness-of-fit measure (R2 ). Also,
2
59.92s, T=1.000, p=0.01669

for RI schedules, we computed the Bayesian
0
information criterion (BIC) and Akaike information cri-

0 50 100 150 200
terion (AIC) values as parsimony and generality Response rate
measures.
F I G U R E 2 Simulated data relating mean reinforcement per
minute to responses per minute in three VI schedules. The RI
Random interval curve fit schedules are the best approximation of RI 5, RI 7, RI 10, RI 15, RI
30, and RI 60 s, with IAT geometrical distribution as function of
T and p (see text). Average is represented by points and 95% highest
The results of our simulation are in agreement with density intervals by vertical bars from 500 repetitions of simulated
Millenson’s (1963) RI findings. We also found that the sessions of 1 hr.
TABLE 1 Equations explored here in investigating best RFF a ¼ 1=ðV =60Þ ¼ 60=V , where V still stands for the sched-
RI fit ule size in seconds. Therefore, the parameter a is a theo-
Reference RFF RI retical asymptote of reinforcement per minute, a constant
for which not estimative is required.
Baum (1981) R ¼ ðV =60
1
ÞþB1 Using an iterative least squares algorithm, we have

Killeen (1975) R ¼ ðV =60Þ 1 exp Bc
1 estimated the parameters b and c (Table 4) for all RDRL
h i in Figure 4. The R2 values summarized in this table show
Prelec (1982) R ¼ B 1 exp ðV =60 1
ÞB that Equation 7 is a proper RFF for the RDRL schedule
m
Rachlin (1978) R ¼ ðV =60
1 B (we dismiss a Bayesian information criterion analysis
Þ B max
simply because we do not know any viable alternative to
Note. The value for V was provided in seconds and scaled by 60 for conversion to model the RDRL).
minutes.
As in Killeen’s (1975) model, b controls the
decreasing and c the increasing of obtained reinforce-
ments. However, it is also noticeable that b and c vary in
the RI size. The parameter c shows a similar behavior a regular proportion. By assuming c ¼ b=e, we were able
across RI sizes, but it does not seem to have an upper to reduce Equation 7 to a single parameter b. In addition
limit. to that, it is also possible to show that
As previously mentioned, an appropriate feedback ln ðbÞ ¼ ln ðV Þ þ 6, reducing Equation 7 to Equation 8,
function should fit the data (Baum, 1992). In order to com- an equation with no free parameters that allows us to cal-
pare fit qualities, one possible criterion is the goodness-of- culate reinforcement rate a priori, similar to the RI RFF
fit measure, R2 , for which we suggest the threshold .90 by Baum (1981).
and .95 for good and excellent fit, respectively. Notwith-

standing, using R2 as the only criterion could be mislead- 60 V V
ing, as it usually favors more complex RFF. Thus, we R¼ exp 6 B exp 5 B : ð8Þ
V e e
will use BIC and AIC to compare models with different
numbers of parameters (Schwarz, 1978). Table 3 summa-
rizes the R2 and BIC estimated for each RFF. In Equation 8, reinforcement rate R is simply a func-
Our results favor Baum’s (1981) RFF regarding tion of response rate B and the schedule size in seconds
both excellent fitting (highest R2 ) and parsimony (low- V. Therefore, we can analytically explore the main quan-
est BIC/AIC). Overall, the R2 seems to decay as the RI titative features of the RFF RDRL. The first point of
sizes increase. Figure 3 brings graphical representation of interest concerns optimal behavior, it is the point Bm that
data for each simulated RI and how each RFF fits optimizes reinforcement rate to a maximum given by Rm .
our data. The point Bm is given by
e6
Random differential reinforcement of low rates Bm ¼ : ð9Þ
ðe 1ÞV
feedback function
As described, a RDRL could reinforce any IRT with a Substituting Equation 9 in Equation 8, we calculate the
certain probability. Therefore, we expect an optimal rate maximum reinforcement rate, given by Equation 10:
greater than the size of RDRL and, as a result, a maxi-
mum of reinforcement per minute falls short of the theo- 60 e
Rm ¼ ðe 1Þ exp : ð10Þ
retical asymptote deduced from the size of schedule. All V e1
these features are shown in Figure 4, which depicts the
points resulting from our simulation of four different The values obtained through Equations 9 and 10
RDRL—namely 2, 4, 8, and 16 s. allow some interesting predictions about optimal behav-
Our simulations are well described by the following ior. The RDRL, unlike the RI schedule, punishes high
equation: rates of behavior with reinforcement loss. Therefore, we
would expect actual subjects to respond at a rate close to

B B Bm . To the best of our knowledge, this experiment has
R ¼ a exp exp : ð7Þ not yet been done.
b c
Another point of interest is the inflection point, where
the marginal reinforcement loss is maximum. For
As for the RI schedule, R and B stand for rates of response rates greater than Bi , reinforcement loss
reinforcement and responses, respectively. The parameter increases at rising rates. The point Bi is given by
330 SILVEIRA ET AL.
TABLE 2 Parameter estimates for each RFF RI using beak
RI size
RFF Parameter 5s 7s 10 s 15 s 30 s 60 s
Baum (1981) V 4.91 6.92 9.92 14.88 29.86 59.56

Killeen (1975) V 5.41 7.49 10.55 15.59 30.72 60.58
c 18.474 14.059 10.441 7.383 3.961 2.107
Prelec (1982) V 5.25 7.29 10.33 15.32 30.38 60.17
Rachlin (1978) V 4.90 6.83 9.71 14.52 29.20 58.60
m 0.210 0.174 0.142 0.111 0.071 0.043
TABLE 3 Fit precision and parsimony for each feedback function using simulated data from three RI schedules
RI size
RFF Measure 5s 7s 10 s 15 s 30 s 60 s
Baum (1981) R2 0.99993 0.99985 0.99975 0.99933 0.99696 0.98758

BIC 1070 1101 1185 1220 1330 1485
AIC 1077 1107 1192 1226 1337 1492
Killeen (1975) 2 0.96347 0.95274 0.93744 0.92245 0.88656 0.83498
R
BIC 189 62 77 263 601 963
AIC 179 52 87 273 611 973
Prelec (1982) R2 0.93349 0.92918 0.92353 0.91713 0.89932 0.86361
BIC 304 137 43 255 630 1006
AIC 297 131 49 261 637 1013
Rachlin (1978) 2 0.88794 0.85478 0.82013 0.76983 0.68004 0.58907
R
BIC 414 286 134 45 394 780
AIC 404 276 124 55 404 790
Note. Best goodness-of fit-measures and lowest BIC and AIC values are in boldface.
2e6 to some intuitive expectations, the reproduction of these

Bi ¼ ¼ 2Bm : ð11Þ
ðe 1ÞV break-and-run patterns did not influence the distribution
of reinforcements for different rates of responses. In the
Substituting Bi in Equation 8, we calculate the rein- break-and-run model, effective rate of responses per
forcement rate Ri at the point of inflection: minute is merely the product of the nominal responses
per minute by the proportion of time in which the animal

60 2 2e is behaving (Rachlin, 1989). The effective response prob-
Ri ¼ exp exp : ð12Þ ability is given by Pr =ðPr þ Pb Þ.Consequently, we can
V e1 e1
obtain the same results of a given effective rate in a cer-
tain schedule simulating the equivalent nominal rate
Figure 5 shows the data from four RDRL, 2, 4, without the break-and-run pattern (i.e., Pr ¼ 1 and
8, and 16 s. For each RDRL we have fitted Equations 7 Pb ¼ 0). This equivalence is shown in Figure 6.
and 8 and highlighted the points of interest (Bm ,Rm Þ and
(Bi ,Ri Þ. Also, the dashed exponential curves exemplify
how the RFF is controlled by the free parameters b and c DISCUSSION
that, in turn, control how reinforcement increases and
decreases. The main objective of the present paper was to imple-
ment and discuss simple schedules of reinforcement from
a quantitative perspective. In Beak, we have implemented
Break-and-run effects VI and variable differential reinforcement of low rates
schedules. Also, we have implemented a random distribu-
We have simulated the influence of several combinations tion of responses that allows us to investigate wide ranges
of the probabilities to start a run and a break. Contrary of response rate and their effects on reinforcements per
RFF RI5 sec RFF RI7 sec
12
8
10
6
8
6
4
Simulated Data Simulated Data
4
R2=0.99993, Baum (1981) R2=0.99985, Baum (1981)
2
R2=0.96347, Killeen (1975) R2=0.95274, Killeen (1975)
2
R2=0.93349, Prelec (1982) R2=0.92918, Prelec (1982)

R2=0.88544, Rachlin (1978) R2=0.85389, Rachlin (1978)
0
0
0 50 100 150 200 0 50 100 150 200

6
4
5
3
4
3
2
2
R2=0.99975, Baum (1981) 1 R2=0.99933, Baum (1981)

1
R2=0.92353, Prelec (1982) R2=0.91713, Prelec (1982)

0
0 50 100 150 200 0 50 100 150 200

2.0
1.0
0.8
1.5
0.6
1.0
0.4

R2=0.99696, Baum (1981) R2=0.98758, Baum (1981)
0.5
0.2

R2=0.89932, Prelec (1982) R2=0.86361, Prelec (1982)
0.0
0.0
0 50 100 150 200 0 50 100 150 200

Response Rate Response Rate
FIGURE 3 Curve fit and R 2 for each feedback function using simulated data of RI 5, 7, 10, 15, 30, and 60 s
minute (i.e., RFF). It is important to emphasize that behavior. In other words, simulations map the norma-
these simulations do not replace the study of behavior. tive rules of schedules, whereas experiments map effec-
Simulations are concerned with normative rules of tive behaviors of organisms.
schedules, going through a large range of possible Comparisons between different RFFs for RI provide
response rates, and exhaustively repeating these condi- experimenters with better ways to describe the relation
tions. In this sense, Beak can provide orientation for a between behavior and environmental constraints. How-
researcher in creating an experimental scenario to which ever, deciding between curve fits is no simple matter
a biological organisms can be purposefully subjected. given that there are no definitive criteria. We will address
Because this biological being will behave with a certain the issue systematically, highlighting the pros and cons of
response rate, its confrontation with the simulation pre- each one of the four curve fits—Baum (1981), Killeen
dictions may clarify biases and constraints of actual (1975), Prelec (1982), and Rachlin (1978).
332 SILVEIRA ET AL.
suggested that m ¼ 0:1 or m ¼ 0:2 across schedules of dif-

ferent sizes. In fact, if m approaches zero, the interval
schedule approaches a RT; if m approaches one, it
approaches a RR. However, we did not find a constant
value for m, which varies with RI size. As an example,
we wrote the parameter m as a function of V (the sched-
ule size) and rewrote Rachlin’s (1978) RFF as
exp ½12ð11eÞ ln ðV Þ
60 B
R¼ : ð13Þ
V B max
As shown in Equation 13, Rachlin’s (1978) RFF

could rely solely on meaningful parameters; in theory the
same could be done with Killeen’s (1975) free parameter
c, which seems to be a positive value with no upper limit.
F I G U R E 4 Simulated data relating reinforcement per minute to
responses per minute in four RDRL schedules
Precision and Parsimony
T A B L E 4 Parameter estimates (b and c) and R2 for simulated data Although the other RFFs seem to struggle with larger
of four RDRL schedules RI, Baum’s (1981) RFF is fairly stable. In fact, it seems
V a b c R2 to be the best available RFF RI with respect to both
goodness of fit and informational criteria (see Table 3).
2s 30.00 212.55 74.94 0.994
As Baum (1992) stated, Rachlin’s (1978) RFF has many
4s 15.00 99.68 35.46 0.986 drawbacks. First, it does not have a horizontal asymp-
8s 7.50 48.25 17.14 0.994 tote, an important feature of interval schedules. Second,
16 s 3.75 23.94 8.51 0.999 and more importantly, it does not fit the data properly
Note. RFF RDRL = 60=V½ exp ðB=bÞ exp ðB=cÞ]; B = Response rate; (see Table 3 and Figure 3). Even though Killeen’s (1975)
V = Schedule size in seconds; e = exp(1); R2 = Goodness-of-fit measure for and Prelec’s (1982) do better than Rachlin’s (1978), their
Equation 7. R2 also drops significantly for the larger RI, which seems
be a tendency for even larger RI.
Variable interval RFF

Generality
As Figure 2 shows, RI simulations executed using Beak
were able to reproduce the general shape of a known Following the above criteria, one should readily decide in
RFF. That enables the investigation of how precisely and favor of Baum’s (1981) RFF. However, it is worth men-
parsimoniously the RFF curves proposed can describe tioning that Baum’s function, like Prelec’s, applies only
simulated data, which gives us grounds to point out to interval schedules. Conversely, Killeen’s (1975) and
advantages and disadvantages of different ways to imple- Rachlin’s functions also describe other simple schedules
ment each RFF. (see Killeen & Sitomer, 2003). Rachlin’s (1978) exponen-
tial function, for instance, describes time, interval, and
ratio schedules. Still, we argue that such generality does
Meaning not compensate for the fallouts already discussed. Simi-
larly, Killeen’s (1975) functions have generality as the pri-
To describe the relation between B and R, Baum’s (1981) mary advantage, as it models not only interval schedules
and Prelec’s (1982) RFFs rely only on V, a single param- but also ratio schedules and the still undocumented
eter that has a built-in meaning and is supposed to corre- RDRL feedback function, as discussed below.
spond to the schedule’s size determined from
experimental planning. That is convenient, especially
because, at least in theory, they do not require estimation Variable differential reinforcement of low
methods. On the other hand, Rachlin’s (1978) depends rates RFF
on m, and Killeen’s (1975) depends on c. As far as we
know, these parameters have no empirical meaning. Our RDRL is an implementation of the variable differen-
Rachlin (1978, 1989) showed that m always falls tial reinforcement of low rates schedule. Logan (1967)
between zero and one for any interval schedule and exposed rats to a variable DRL with only two equally
RDRL2s RDRL4s
30
15
25
20
10
Simulated data Simulated data
15 Killeen (1975), R²=0.994 Killeen (1975), R²=0.986
Free parameter curve Free parameter curve
Rm Rm
10
5
Ri Ri
5
Bm Bm Bi
0
0
0 50 100 150 200 0 50 100 150 200
RDRL8s RDRL16s
8
4
6
3
Simulated data Simulated data
Killeen (1975), R²=0.994 Killeen (1975), R²=0.999
4
2
Free parameter curve Free parameter curve
Rm Rm
Ri Ri
2
Bm Bi Bm Bi
0
0 50 100 150 200 0 50 100 150 200

Response rate Response rate
F I G U R E 5 Curve fit or simulated data of four RDRL. Top horizontal dashed lines are given by 60=V , where V is the schedule size in seconds.
Decreasing dashed lines are adjusted by ð60=V Þexp ðB=cÞ, rising dashed lines are adjusted by ð60=V Þ½1 exp ðB=bÞ, and empty circles are
simulated data. Solid thick line is RDRL fit by Equation 7, and the dashed line is provided by Equation 8, where B stands for response rate, ðBm , Rm Þ
is the maximum point, and ðBi , Ri Þ is the inflection point.
likely IRT requirements. Here, we have implemented a optimal point, reinforcement rate would decrease asymp-
RDRL, a continuous version of the somewhat minimalist totically. Because Logan (1967) built his variable differ-
Logan’s RDRL. Even though Logan (1967) described his ential reinforcement of low rates schedule out of two
results in terms of proportion of IRT, we argue that his intervals, optimal rates of response could be easily pre-
seminal data are in agreement with the RFF RDRL gen- dicted. In fact, rats that served as subjects learned how to
eral shape produced using Beak. maximize reinforcement by responding after the shorter
Logan found that the most likely IRT observed in the interval and then waiting for the longer one. However,
experiment “approximated an optimal strategy for maxi- using a geometric distribution for the values that com-
mizing reward” (Logan, 1967, p. 393). This meant that pose the schedule we should expect the peaks observed in
the subjects’ first response after reinforcement occurred Logan’s experiment to merge, forming the curves seen in
with an IRT slightly longer than the smaller DRL inter- Figures 4 and 5. Figure 5 shows that the greater the size
val out of the two programmed, and further responses of the schedule programmed, the sharpest the curve on
happened with IRT around the other (longer) DRL inter- the peak of the RFF.
val. Therefore, he found two peaks of likely IRT that Our results show that Killeen’s (1975) model is a via-
matched the DRL intervals used. ble RFF for the RDRL schedule. Killeen (1975) used
Considering that behavior rate equals the reciprocal Equation 7 to model response probability as a function
of IRT, Logan’s results allow us to intuit what a RDRL of time elapsed since the last reinforcement. The model is
RFF should look like. Reinforcers per minute should based on two competing processes controlled by parame-
increase along with response rate until a certain maxi- ters b (concurrent) and c (inhibitory). Killeen (1975)
mum. However, if the response rate increases beyond this interpreted the former as a measure of an increasing
334 SILVEIRA ET AL.
correlated, allowing for a simplification. Using our esti-

mated values, we could rewrite the RFF in Equation 8,
where the single parameter is the schedule size V .
Another type of information we take from the simula-
tions and the estimations of the RDRL RFF is about the
maximum reinforcement rate possible to be obtained.
Equation 9 gives the response rate that maximizes rein-
forcement as a function of the schedule size. This is also
interesting for planning experiments with actual subjects,
especially considering differences between the typical
local operant rates of each subject.
Equation 10 shows the relation between maximiza-
tion and the size of the schedule. An interesting result we
observed across many simulations is that the maximum
reinforcement rate is a constant 60=ðeV Þ ¼ 0:37a. This
normative rule not only confers a priori the maximum
reinforcement an animal can obtain given the schedule
size but also the prediction of the response rate at which
this maximum will be achieved.
Although Beak allows testing for more extreme
schedules, for both RI and RDRL simulations are not
feasible for the response rate range presented here. For
F I G U R E 6 Influence of burst and stop behaviors in the simulation instance, a very rich schedule such as RI 0.1 s up to
of two exemplary schedules. Simulated data of a RI 15 s and a RDRL
8 s with several levels of probability to start a burst (Pb ) and to start a
200 responses per minute would barely start its ascen-
pause (Ps ). It was found that ranging from 0 to 200 responses per sion; conversely, a very lean schedule such as RI 600 s
minute did not affect the results; the curves are only incomplete for the would be already at the maximum plateau with one
effective number of responses per minute, given by Pb =ðPb þ Ps Þ. response per minute. These schedules would require,
respectively, the study of response rates up to several
hundreds of responses per minute or the study of frac-
competition among different activities and the latter as tional intervals in response rates between 0 and 1. This
postreinforcement inhibition. Even though Killeen’s kind of resulting curve would be qualitatively equal to
(1975) model was applied in a different theoretical con- the ones presented here but not applicable to emulate
text, we have found an analogous conflict in our analy- behavior, at least not for the animals usually available
sis. The RDRL poses a similar competition between in our laboratories.
contingency and postponement of reinforcers: on one These findings are important because they success-
hand, we have the negative punishment imposed by the fully add complexity in our basic description of simple
schedule to rates above the schedule’s criterion (con- schedules of reinforcement. This complexity may be
trolled by b); on the other, we have the direct relation viewed as consistent with other schedule parameters,
between rates of response and reinforcement (controlled showing the capacity of our computational model to
by c). compare different quantitative models of simple sched-
The decreasing dashed lines for each RDRL in ules of reinforcement as a starting point to analysis of
Figure 5 are controlled by the parameter b, whereas the other sources of control, including conflict between excit-
rising dashed lines are controlled by c. Greater values of atory and inhibitory control (Staddon, 1977) and tempo-
b mean a slower decay in reinforcer income at higher ral control (e.g., Machado, 1997).
rates of response. Greater values of c mean a slower Typical behavior of a real animal might present
increase in reinforcement at low rates of response. break-and-run patterns, with bursts of responding at
As shown in Table 4, we have found greater values of approximately constant tempo and pauses of variable
b and c for smaller (richer) RDRL. That matches our intervals. It is arguable that the structure of behavior in
interpretation of the model because smaller RDRL are the real organism is more complex than that implemented
more demanding and permissive. They are demanding in our model (Nevin & Baum, 1980); thus, more realism
because they require higher response rates in order to might be incorporated to make this model more useful
reach maximum reinforcement income, and they are per- for specific occasions. To assess the effect of a break-
missive because they allow higher response rates to go and-run model, it was necessary to implement this break-
unpunished (see Figure 5). and-run feature in our simulation. Somewhat against
Even though Killeen’s (1975) model showed good fit intuition, our results show that the effect of this modifica-
and a convenient theoretical analogy, we have showed tion merely reproduced the behavior of a lower response
that, for RDRL RFF, the parameters b and c are rate; thus, it may be removed from the model for the sake
of parsimony. In this case, realism introduced unneces- relationships, affiliations, knowledge or beliefs) in the
sary complications with no gain in explanatory power. subject matter or material discussed in this manuscript.
Regarding schedules in which reinforcement may
depend on both the passage of time and the occurrence of ETHICS ST ATE ME NT
responses, the RDRL is a way to further constrain rein- This investigation is purely theoretical; thus, it was not
forcement in comparison to the RI schedule. The RFF submitted to any ethics committee.
RDRL is like RFF RI, in a sense that in both cases the
rate of reinforcement depends on the response rate. OR CID
Therefore, we have found increasing functions at low Paulo Sergio Panse Silveira https://orcid.org/0000-
rates of response. However, these functions are also nega- 0003-4110-1038
tively accelerated functions. This represents the restric- José de Oliveira Siqueira https://orcid.org/0000-0002-
tion imposed by time, which is present in both schedules. 3357-8939
The RFFs of the two schedules differ in the extent to João Lucas Bernardy https://orcid.org/0000-0002-3805-
which the RDRL schedules further constrain reinforce- 7366
ment. In the interval schedule, the response rate has a Jessica Santiago https://orcid.org/0000-0002-7788-5455
positive monotonic relation with the ever-increasing rate Thiago Cersosimo Meneses https://orcid.org/0000-
of reinforcement. That is not the case in the RDRL. In 0003-3473-5841
the RDRL schedule, high rates of response are negatively Bianca Sanches Portela https://orcid.org/0000-0002-
punished by the postponement of reinforcement. In fact, 1351-652X
this feedback system is well described by two competing Marcelo Frota Benvenuti https://orcid.org/0000-0002-
processes (Killeen, 1975). 9397-3033
Briefly, our results demonstrate the power of our
computational simulation to analyze basic schedules of REF ER ENCE S
reinforcement and refine ways to implement them. The Aasvee, K., Rasmussen, M., Kelly, C., Kurvinen, E., Giacchi, M. V., &
enormous computational power available today should Ahluwalia, N. (2015). Validity of self-reported height and weight for
be used to offer, for instance, a variety of intervals estimating prevalence of overweight among Estonian adolescents:
instead of a simple shuffle of a small set of intervals mim- The Health Behaviour in School-aged Children study. BMC
icking older devices. Also, based on our results, we Research Notes, 8(1), 606. https://doi.org/10.1186/s13104-015-1587-9
Ambler, S. (1973). A mathematical model of learning under schedules of
revised RI RFF and proposed a RDRL RFF. Using interresponse time reinforcement. Journal of Mathematical Psychol-
computer simulation prevents unnecessary use of time ogy, 10(4), 364–386. https://doi.org/10.1016/0022-2496(73)90023-0
and long experimentation without a clear notion of the Baum, W. M. (1973). The correlation-based law of effect. Journal of the
normative rule that may be governing the strategic Experimental Analysis of Behavior, 20(1), 137–153. https://doi.org/
options involved. The new implementation methods pre- 10.1901/jeab.1973.20-137
Baum, W. M. (1981). Optimization and the matching law as accounts
sented paves the way for a richer study of schedules of of instrumental behavior. Journal of the Experimental Analysis of
reinforcement and their normative maximization rules, Behavior, 36(3), 387–403. https://doi.org/10.1901/jeab.1981.36-387
serving also as a guide toward promising questions that Baum, W. M. (1992). In search of the feedback function for variable-
future experiments may want to explore. interval schedules. Journal of the Experimental Analysis of Behav-
ior, 57(3), 365–375. https://doi.org/10.1901/jeab.1992.57-365
Baum, W. M. (1993). Performances on ratio and interval schedules of
F U N D I NG S T A T E M E NT reinforcement: Data and theory. Journal of the Experimental Anal-
Marcelo Benvenuti is member of the National Institute ysis of Behavior, 59(2), 245–264. https://doi.org/10.1901/jeab.1993.
of Science and Technology on Behavior, Cognition, and 59-245
Teaching (INCT - ECCE), supported by São Paulo Catania, A. C., & Reynolds, G. S. (1968). A quantitative analysis of the
Research Foundation (FAPESP, grant No. 2014/50909-8), responding maintained by interval schedules of reinforcement.
Journal of the Experimental Analysis of Behavior, 11(3S2),
the Brazilian Council for Scientific and Technological 327–383. https://doi.org/10.1901/jeab.1968.11-s327
Development (CNPq; grant #465686/2014-1), and the Dews, P. B. (1962). Psychopharmacology. In A. J. Bachrach (Ed.),
Coordination for the Improvement of Higher Education Experimental foundations of clinical psychology (4th ed.,
Personnel (CAPES; grant #88887.136407/2017-00). pp. 423–441). Basic Books.
Fantino, E. (1998). Behavior analysis and decision making. Journal of
the Experimental Analysis of Behavior, 69(3), 355–364. https://doi.
C ON F L I CT O F I NT E R E ST org/10.1901/jeab.1998.69-355
The authors declare no conflict of interest. The authors Feller, W. (1968). An introduction to probability theory and its applica-
certify that they have NO affiliations with or involvement tions (Vol. 1, 3rd ed.). Wiley.
in any organization or entity with any financial interest Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement.
(such as honoraria; educational grants; participation in Appleton-Century-Crofts.
Fleshler, M., & Hoffman, H. S. (1962). A progression for generating
spearkers’ bureaus; membership, employment, consultan- variable-interval schedules. Journal of the Experimental Analysis of
cies, stock ownership, or other equity interest; and expert Behavior, 5(4), 529–530. https://doi.org/10.1901/jeab.1962.5-529
testimony or patent-licensing arrangements), or non- Galizio, M., & Buskist, W. (1988). Laboratory lore and research prac-
financial interest (such as personal or professional tices in the experimental analysis of human behavior: Selecting
336 SILVEIRA ET AL.
reinforcers and arranging contingencies. The Behavior Analyst, Rachlin, H. (1978). A molar theory of reinforcement schedules. Journal
11(1), 65–69. https://doi.org/10.1007/bf03392457 of the Experimental Analysis of Behavior, 30(3), 345–360. https://
Goodie, A. S., & Fantino, E. (1995). An experientially derived base-rate doi.org/10.1901/jeab.1978.30-345
error in humans. Psychological Science, 6(2), 101–106. https://doi. Rachlin, H. (1989). Judgment, decision, and choice: A cognitive/behavioral
org/10.1111/j.1467-9280.1995.tb00314.x synthesis. Freeman.
Green, L., Rachlin, H., & Hanson, J. (1983). Matching and maximizing Rachlin, H., & Green, L. (1972). Commitment, choice, and self-control.
with concurrent ratio-interval schedules. Journal of the Experimen- Journal of the Experimental Analysis of Behavior, 17(1), 15–22.
tal Analysis of Behavior, 40(3), 217–224. https://doi.org/10.1901/ https://doi.org/10.1901/jeab.1972.17-15
jeab.1983.40-217 Reilly, M. P. (2003). Extending mathematical principles of reinforcement
Herrnstein, R. J. (1961). Relative and absolute strength of response as a into the domain of behavioral pharmacology. Behavioural Processes,
function of frequency of reinforcement. Journal of the Experimen- 62(1–3), 75–88. https://doi.org/10.1016/S0376-6357(03)00027-5
tal Analysis of Behavior, 4(3), 267–272. https://doi.org/10.1901/ Schwarz, G. (1978). Estimating the dimension of a model. The Annals
jeab.1961.4-267 of Statistics, 6(2), 461–464. https://doi.org/10.2307/2958889
Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimen- Staddon, J. E. R. (1977). Schedule-induced behavior. In W. K. Honig &
tal Analysis of Behavior, 13(2), 243–266. https://doi.org/10.1901/ J. E. R. Staddon (Eds.), Handbook of operant behavior
jeab.1970.13-243 (pp. 125–152). Prentice-Hall.
Hyndman, R. J. (1996). Computing and graphing highest density Stoddard, L. T., Sidman, M., & Brady, J. V. (1988). Fixed-interval and
regions. American Statistician, 50(2), 120–126. https://doi.org/10. fixed-ratio reinforcement schedules with human subjects. The Analysis
1080/00031305.1996.10474359 of Verbal Behavior, 6(1), 33–44. https://doi.org/10.1007/bf03392827
Killeen, P. R. (1975). On the temporal control of behavior. Psychologi- Weiss, S. J., & Van Ost, S. L. (1974). Response discriminative and rein-
cal Review, 82(2), 89–115. https://doi.org/10.1037/h0076820 forcement factors in stimulus control of performance on multiple
Killeen, P. R., & Sitomer, M. T. (2003). MPR. Behavioural Processes, and chained schedules of reinforcement. Learning and Motivation,
62(1), 49–64. https://doi.org/10.1016/S0376-6357(03)00017-2 5(4), 459–472. https://doi.org/10.1016/0023-9690(74)90004-6
Logan, F. A. (1967). Variable DRL. Psychonomic Science, 9(7), Wyckoff, L. B. J. (1969). The role of observing responses in discrimina-
393–394. https://doi.org/10.3758/BF03330862 tion learning: Part II. In D. P. Hendry (Ed.), Conditioned rein-
Machado, A. (1997). Learning the temporal dynamics of behavior. Psy- forcement (pp. 237–260). The Dorsey Press.
chological Review, 104(2), 241–265. https://doi.org/10.1037/0033-
295X.104.2.241
Mazur, J. E. (2016). Learning and behavior. Prentice-Hall.
Millenson, J. R. (1963). Random interval schedules of reinforcement.
How to cite this article: Silveira, P. S. P.,
Journal of the Experimental Analysis of Behavior, 6(3), 437–443.
https://doi.org/10.1901/jeab.1963.6-437 de Oliveira Siqueira, J., Bernardy, J. L., Santiago,
Nevin, J. A., & Baum, W. M. (1980). Feedback functions for variable- J., Meneses, T. C., Portela, B. S., & Benvenuti,
interval reinforcement. Journal of the Experimental Analysis of M. F. (2023). Modeling VI and VDRL feedback
Behavior, 34(2), 207–217. https://doi.org/10.1901/jeab.1980.34-207 functions: Searching normative rules through
Pierce, W. D., & Cheney, C. D. (2017). Behavior analysis and learning:
computational simulation. Journal of the
A biobehavioral approach. Erlbaum.
Prelec, D. (1982). Matching, maximizing, and the hyperbolic reinforce- Experimental Analysis of Behavior, 119(2),
ment feedback function. Psychological Review, 89(3), 189–230. 324–336. https://doi.org/10.1002/jeab.826
https://doi.org/10.1037/0033-295X.89.3.189

Modeling VI and VDRL Feedback Functions: Searching Normative Rules Through Computational Simulation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Modeling VI and VDRL Feedback Functions: Searching Normative Rules Through Computational Simulation

Uploaded by

Copyright:

Available Formats

Received: 6 July 2022 Accepted: 20 November 2022

Modeling VI and VDRL feedback functions: Searching normative

Paulo Sergio Panse Silveira 1 | José de Oliveira Siqueira 2 |

Editor-in-Chief: Mark Galizio

Handling Editor: Sarah Cowie

without changing critical aspects of the environment. Random interval

step). To investigate higher values of B, t needs to be rate of reinforcement in RI schedules is monotonically

behavior, one that assumes that behavior is random and

7.07s, T=0.277, p=0.03918

10.10s, T=0.395, p=0.03911

59.92s, T=1.000, p=0.01669

information criterion (BIC) and Akaike information cri-

TABLE 2 Parameter estimates for each RFF RI using beak

Baum (1981) V 4.91 6.92 9.92 14.88 29.86 59.56

Baum (1981) R2 0.99993 0.99985 0.99975 0.99933 0.99696 0.98758

2e6 to some intuitive expectations, the reproduction of these

RFF RI5 sec RFF RI7 sec

R2=0.99993, Baum (1981) R2=0.99985, Baum (1981)

R2=0.93349, Prelec (1982) R2=0.92918, Prelec (1982)

RFF RI10 sec RFF RI15 sec

R2=0.99975, Baum (1981) 1 R2=0.99933, Baum (1981)

R2=0.92353, Prelec (1982) R2=0.91713, Prelec (1982)

0 50 100 150 200 0 50 100 150 200

RFF RI30 sec RFF RI60 sec

Simulated Data Simulated Data

R2=0.88656, Killeen (1975) R2=0.83498, Killeen (1975)

0 50 100 150 200 0 50 100 150 200

suggested that m ¼ 0:1 or m ¼ 0:2 across schedules of dif-

As shown in Equation 13, Rachlin’s (1978) RFF

Variable interval RFF

0 50 100 150 200 0 50 100 150 200

correlated, allowing for a simplification. Using our esti-

You might also like