OPERANT CONDITIONING

In 1898, Edward Thorndike placed a hungry cat in a box with a mechanical latch and then placed food in full view just outside the box. The cat meowed, paced back and forth, and rubbed against the walls of the box. In so doing, it happened to trip the latch. Immediately, the door to the box opened, and the cat gained access to the food. Thorndike repeated the experiment, and with continued repetitions the cat became more adept at tripping the latch. Eventually, it was able to leave its cage almost as soon as food appeared.

Thorndike proposed a law of learning to account for this phenomenon, which he called the law of effect: An animal's tendency to reproduce a behavior depends on that behavior's effect on the environment and the consequent effect on the animal. If tripping the latch had not helped the cat reach the food, the cat would not have learned to keep brushing up against the latch. More simply, the law of effect states that behavior is controlled by its consequences.

The behavior of Thorndike's cats exemplifies a second form of conditioning, known as instrumental or operant conditioning. Thorndike used the term instrumental conditioning because the behavior is instrumental to achieving a more satisfying state of affairs. B. F. Skinner, who spent years experimenting with and systematizing the ways in which behavior is controlled by the environment, called it operant conditioning, which means learning to operate on the environment to produce a consequence.

Although the lines between operant and classical conditioning are not always hard and fast, the major distinction between them is that in classical conditioning, an environmental stimulus initiates a response, whereas in operant conditioning a behavior (or operant) produces an environmental response.
Operants are behaviors that are emitted (spontaneously produced) rather than elicited by the environment. Thorndike's cat spontaneously emitted the behavior of brushing up against the latch, which resulted in an effect that conditioned future behavior. Skinner emitted the behaviors of experimenting and writing about his results, which brought him the respect of his colleagues and hence influenced his future behavior. Had his initial experiments failed, he probably would have been less likely to persist, just as Thorndike's cats did not continue emitting behaviors with neutral or aversive environmental effects. In operant conditioning, whether the animal is a cat or a psychologist, the behavior precedes the environmental event that conditions future behavior. By contrast, in classical conditioning, an environmental stimulus (such as a bell) precedes a response.

The basic idea behind operant conditioning, then, is that behavior is controlled by its consequences. In this section, we explore two types of environmental consequence that produce operant conditioning: reinforcement, which increases the probability that a response will occur, and punishment, which diminishes its likelihood.

REINFORCEMENT

Reinforcement means just what the name implies: Something in the environment fortifies, or reinforces, a behavior. A reinforcer is an environmental consequence that occurs after an organism has produced a response and makes the response more likely to recur. Psychologists distinguish two kinds of reinforcement, positive and negative.

Positive Reinforcement

Positive reinforcement is the process whereby presentation of a stimulus (a reward or payoff) after a behavior makes the behavior more likely to occur again. For example, in experimental procedures pioneered by B. F.
Skinner (1938, 1953), a pigeon was placed in a cage with a target mounted on one side (Figure 5.8). (Pigeons and rats were Skinner's favorite, which is to say most reinforcing, subjects.) The pigeon spontaneously pecked around in the cage. This behavior was not a response to any particular stimulus; pecking is simply innate avian behavior. If, by chance, the pigeon pecked at the target, however, a pellet of grain dropped into a bin. If the pigeon happened to peck at the target again, it was once more rewarded with a pellet. The pellet is a positive reinforcer: an environmental consequence that, when presented, strengthens the probability that a response will recur. The pigeon would thus start to peck at the target more frequently, since this operant became associated with the positive reinforcer.

Positive reinforcement is not limited to pigeons. In fact, it controls much of human behavior as well. Students learn to exert effort studying when they are reinforced with praise and good grades; salespeople learn to appease obnoxious customers and laugh at their jokes because doing so yields them commissions; and people learn to go to work each day because they receive a paycheck.

Negative Reinforcement

Eliminating something aversive can itself be a reinforcer or reward. This is known as negative reinforcement: the process whereby termination of an aversive stimulus makes a behavior more likely to occur. Just as the presentation of a positive reinforcer rewards a response, the removal of an aversive stimulus rewards a response. Negative reinforcers, then, are aversive or unpleasant stimuli that strengthen a behavior by their removal. Hitting the snooze button on an alarm clock is negatively reinforced by the termination of the alarm; cleaning the kitchen is negatively reinforced by the elimination of unpleasant sights and smells.
Negative reinforcement occurs in both escape learning and avoidance learning. In escape learning, a behavior is reinforced by the elimination of an aversive state of affairs that already exists; that is, the organism escapes an aversive situation. For example, a rat presses a lever and terminates an electric shock; an overzealous sunbather applies lotion to her skin to relieve sunburn pain; or a child cleans his room to stop his parents from nagging. Avoidance learning occurs as an organism prevents an expected aversive event from happening. In this case, avoidance of a potentially aversive situation reinforces the operant. For example, a rat presses a lever when it hears a tone that signals that a shock is about to occur; the sunbather puts on sunscreen before going out in the sun to avoid a sunburn; or the child cleans his room to avoid nagging.

Figure 5.8: Apparatus for operant conditioning. (a) A pigeon is placed in a cage with a target on one side, which can be used for operant conditioning; (b) B. F. Skinner experiments with a rat placed in a Skinner box, with a similar design, in which pressing a bar may result in reinforcement.

PUNISHMENT

Reinforcement is one type of environmental consequence that controls behavior through operant conditioning; the other is punishment (Figure 5.9). Whereas reinforcement always increases the likelihood of a response, either by the presentation of a reward or the removal of an aversive stimulus, punishment decreases the probability that a response will recur. Thus, if Skinner's pigeon received an electric shock each time it pecked at the target, it would be less likely to peck again because this operant resulted in an aversive outcome. Parents intuitively apply this behavioral technique when they "ground" a teenager for staying out past curfew. The criminal justice system also operates on a system of punishment, attempting to discourage illicit behaviors by imposing penalties.
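The two kinds of consequence can be sketched as a toy simulation. This is an illustration of our own, not a model proposed by Thorndike or Skinner: a single operant whose probability of being emitted is nudged upward when reinforced and downward when punished, in the spirit of the law of effect. The function name, the linear update rule, and all parameter values are assumptions chosen for clarity.

```python
import random

def run_trials(consequence, p_initial=0.2, step=0.1, n_trials=50, seed=1):
    """Simulate one operant across a series of trials.

    consequence: +1 when the emitted behavior is reinforced,
                 -1 when it is punished, 0 when nothing follows it.
    Returns the probability of emitting the operant after n_trials.
    The linear update is an illustrative assumption, not an established model.
    """
    rng = random.Random(seed)
    p = p_initial
    for _ in range(n_trials):
        if rng.random() < p:             # the operant is emitted on this trial
            p += step * consequence      # its consequence conditions future behavior
            p = min(0.95, max(0.05, p))  # keep the probability in bounds
    return p

p_reinforced = run_trials(+1)  # reinforced pecking grows more frequent
p_punished = run_trials(-1)    # punished pecking grows less frequent
p_neutral = run_trials(0)      # a neutral consequence leaves the rate unchanged
```

With the same random seed, the reinforced operant ends up more probable than the neutral one and the punished operant less probable, mirroring the definitions above: reinforcement strengthens a response, punishment suppresses it.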
Like reinforcement, punishment can be positive or negative. ("Positive" and "negative" here do not refer to the feelings of the participants, who rarely consider punishment a positive experience. Positive simply means something is presented, whereas negative means something is taken away.) In positive punishment, such as spanking, exposure to an aversive event following a behavior reduces the likelihood of the operant recurring. Negative punishment involves losing, or not obtaining, a reinforcer as a consequence of behavior, as when an employee fails to receive a pay increase because of frequent lateness or absenteeism.

Figure 5.9: Conditioning processes. Behaviorists distinguish two kinds of conditioning, classical and operant. In operant conditioning, the environment influences behavior through reinforcement and punishment.

Punishment is commonplace and essential in human affairs, since reinforcement alone is not likely to inhibit many undesirable behaviors, but it is frequently applied in ways that render it ineffective (Chance, 1988; Laub & Sampson, 1995; Newsom et al., 1983; Skinner, 1953). One common problem in using punishment with animals and young children is that the learner may have difficulty distinguishing which operant is being punished. People who yell at their dog for coming after it has been called several times are actually punishing good behavior: coming when called. The dog is more likely to associate the punishment with its action than its inaction, and is likely to adjust its behavior accordingly, by becoming even less likely to come when called. A second and related problem associated with punishment is that the learner may come to fear the person meting out the punishment (via classical conditioning) rather than the action (via operant conditioning).
A child who is harshly punished by his father may become afraid of his father instead of changing his behavior.

Third, people who rely heavily on punishment often fail to recognize that punishment may not eliminate existing rewards for a behavior. In nature, unlike the laboratory, a single action may have multiple consequences, and behavior can be controlled by any number of them. A teacher who punishes the class clown may not have much success if the behavior is reinforced by classmates. Sometimes, too, punishing one behavior (such as stealing) may inadvertently reinforce another (such as lying).

Fourth, people typically use punishment when they are angry, which can lead both to poorly designed punishment (from a learning point of view) and to the potential for abuse. An angry parent may punish a child for misdeeds just discovered but that occurred a considerable time earlier. The time interval between the child's action and the consequence may render the punishment ineffective because the child does not adequately connect the two events. Parents also frequently punish depending more on their mood than on the type of behavior they want to discourage, which can prevent the child from learning what behavior is being punished, under what circumstances, and how to avoid it.

Finally, aggression that is used to punish behavior often leads to further aggression. The child who is beaten typically learns a much deeper lesson: that problems can be solved with violence. In fact, the more physical punishment parents use, the more aggressively their children tend to behave at home and at school (Dodge et al., 1985, 1997; Kaplan, 1996; Larzelere, 1986; Larzelere et al., 1996; Weiss et al., 1993). Correlation does not, of course, prove causation; aggressive children may provoke punitive parenting. Nevertheless, the weight of evidence suggests that violent parents tend to create violent children.
There is no evidence that the use of belts or paddles augments the painful effects of a swat with the hand of an adult who usually outweighs a child by a factor of at least 3 or 4. If parents who use such devices aim to teach their children self-control, they would do better to learn it themselves, since beating children tends to make them more likely as adults to have less self-control, lower self-esteem, more troubled relationships, and more depression, and to be more likely to abuse their own children and spouses (Rohner, 1975, 1985; Straus & Kantor, 1994).

Punishment tends to be most effective when it is accompanied by reasoning, even in 2- and 3-year-olds (Larzelere et al., 1996), and when the person being punished is also reinforced for an alternative, acceptable behavior. Explaining helps a child correctly connect an action with a punishment. Having other positively reinforced behaviors to draw on allows the child to generate alternative responses.

Extinction

As in classical conditioning, learned operant responses can be extinguished. Extinction occurs if enough conditioning trials pass in which the operant is not followed by the consequence previously associated with it. A child may reduce effort in school if hard work no longer leads to reinforcers (such as "good work!" written on homework), just as a corporate executive may choose to stop producing a product that is no longer bringing in profits.

Knowing how to extinguish behavior is important in everyday life, particularly for parents. Consider the case of a 21-month-old boy who had a serious illness requiring around-the-clock attention (Williams, 1959). After recovering, the child continued to demand this level of attention, which was no longer necessary. His demands were especially troublesome at bedtime, when he screamed and
As shown curve A te eid iniily Gd for long prods of Une, bu very few tai of nonreinored eying were reged oes fnguish the heaving. ncurve the behavior was again quickly extinguished folowing ts povianeous recovery, Sere: Wiliams, 199, p. 269 unless parent sat with him until he fll asleep, which could take up to two ‘Relying on the principle that unreinforced behavior will be extinguished, the swith some help from a psychologist, began following a new bedtime reg In the first trial of the extinction series, they spent a relaxed and warm, x-night session with their son, cosed the door when they left the room, and cto respond to the wails and sereams that followed, After 45 minutes, the fell asleep, and he fall asleep immediately on the second tial (Figure 5.10) ‘As in classical conditioning, spontancous recovery (in which a previously behavior recurs without renewed reinforcement) sometimes occurs. In ct the boy cried and screamed again one night when his aunt attempted to put to bed. She inadvertently reinforced this behavior by returning to his room; a rosult, his parents had to repeat their extinction procedure, . nm Sunmany Operant condoning means leasing to operate on the evion- ent 1 produce a consequence. Operants are behaviors that are emsited rater than by the environment Reinforcement refers fo consequence that increases fhe ty hat response wil cur Positive senforcement ecu when the envio consequence (a reward or payotf) makes e behavior more Hkely to occ again ive reinforcement ocurs when termiation ofa aversive stimuli takes a beh ike to recur Wheres telnfreement incesoe the probably ofa response, deceases the probly that response wil cur Panshrent common. cn human afar but fequenty applied in ways that render it nefective. 
Extinction in operant conditioning occurs if enough conditioning trials pass in which the operant is not followed by the consequence previously associated with it.

OPERANT CONDITIONING OF COMPLEX BEHAVIORS

So far we have discussed relatively simple behaviors controlled by their environmental consequences: pigeons pecking, rats pressing, and people showing up at work for a paycheck. In fact, operant conditioning offers one of the most comprehensive explanatory accounts of the range of human and animal behavior ever produced. We will now ratchet up the complexity level by exploring four phenomena that substantially increase the breadth of the behaviorist account of learning: schedules of reinforcement, discriminative stimuli, context, and characteristics of the learner.

Schedules of Reinforcement

In the examples described so far, an animal is rewarded or punished every time it performs a behavior. This situation, in which the consequence is the same each time the animal emits a behavior, is called a continuous reinforcement schedule (because the behavior is continuously reinforced). A child reinforced for altruistic behavior on a continuous schedule of reinforcement would be praised every time she shares, just as a rat might receive a pellet of food each time it presses a lever. Such consistent reinforcement, however, rarely occurs in nature or in human life. More typically, an action sometimes leads to reinforcement but other times does not. Such reinforcement schedules are known as partial or intermittent schedules of reinforcement because the behavior is reinforced only part of the time, or intermittently.
(These are called schedules of reinforcement, but the same principles apply with punishment.)

Intuitively, one would think that continuous schedules would be more effective. Although this tends to be true during the initial learning (acquisition) of a response (presumably because continuous reinforcement renders the connection between the behavior and its consequence clear and predictable), partial reinforcement is usually superior for maintaining learned behavior. For example, suppose you have a relatively new car, and every time you turn the key the engine starts. If one day, however, you try to start the car ten times and the engine will not turn over, you will probably give up and call a towing company. In contrast, if you are the proud owner of a rusted-out 1972 Chevy and are accustomed to ten turns of the ignition before the car finally cranks up, you may try 20 or 30 times before enlisting the help of a mechanic. Thus, behaviors maintained under partial schedules are usually more resistant to extinction.

Behaviorist researchers, notably Skinner and his colleagues, have categorized intermittent reinforcement schedules as either ratio schedules or interval schedules (Ferster & Skinner, 1957; Skinner, 1958). In ratio schedules, payoffs are tied to the number of responses emitted: Only a fraction of "correct" behaviors receive reinforcement (such as one out of every five, for a ratio of 1:5). In interval schedules, rewards are delivered only after some interval of time. The organism can produce a response as often as it wants, but the response will only be reinforced (or punished) after a certain amount of time has elapsed.

Reinforcement schedules are often studied with a cumulative response recorder, an instrument that tallies the number of times a subject produces a response, such as pressing a bar or pecking a target.
Figure 5.11 illustrates typical cumulative response recordings for the four reinforcement schedules we will now describe: fixed ratio, variable ratio, fixed interval, and variable interval.

Fixed-Ratio Schedules In a fixed-ratio (FR) schedule of reinforcement, an organism receives reinforcement for a fixed proportion of the responses it emits. Piecework employment uses a fixed-ratio schedule of reinforcement: A person receives payment for every bushel of apples picked (an FR-1 schedule) or for every ten scarves woven (an FR-10 schedule). Workers weave the first nine scarves without reinforcement; the payoff occurs when the tenth scarf is completed. As shown in Figure 5.11, FR schedules are characterized by rapid responding, with a brief pause after each reinforcement.

Figure 5.11: Schedules of reinforcement. The figure shows cumulative response records for fixed-ratio, variable-ratio, fixed-interval, and variable-interval reinforcement schedules. A cumulative response record graphs the total number of responses that have been emitted at any point in time. Different schedules of reinforcement produce different patterns of responding.

Variable-Ratio Schedules In variable-ratio (VR) schedules, the individual receives a reward for some percentage of responses, but the number of responses required before reinforcement is unpredictable (that is, variable). Variable-ratio schedules specify an average number of responses that will be rewarded. Thus, a pigeon on a VR-5 schedule may be rewarded on its fourth, seventh, 13th, and 20th responses, averaging one reward for every five responses. Variable-ratio schedules generally produce rapid, constant responding and are probably the most common in daily life. People cannot predict that they will be rewarded or praised for every fifth good deed, but they do receive occasional, irregular social reinforcement, which is enough to reinforce altruistic behavior in most people. Similarly, students may not receive a good grade each time they study hard for an examination, but many study nonetheless because they learn that the average rate of reinforcement is higher than if they do not study. The power of variable-ratio schedules can be seen in gambling, in which people may gradually lose their shirts if they are intermittently, and very irregularly, reinforced.

Fixed-Interval Schedules In a fixed-interval (FI) schedule, an animal receives reinforcement for its responses only after a fixed amount of time. For example, a rat that presses a bar is reinforced with a pellet of food every ten minutes. The rat may press the bar 100 times or one time during that ten minutes; doing so does not speed the delivery of the pellet, just as long as the rat presses the bar at some point during each ten-minute interval.

An animal on an FI schedule of reinforcement will ultimately learn to stop responding except toward the end of each interval, producing the scalloped cumulative response pattern shown in Figure 5.11. Fixed-interval schedules affect human performance in the same way: For example, workers whose boss comes by only at two o'clock are likely to relax the rest of the day. Schools rely heavily on FI schedules; as a result, some students procrastinate between exams and pull "all-nighters" when reinforcement (or punishment) is imminent. Politicians, too, seem to resemble rats in their response patterns (Figure 5.12). Periodic adjournment of Congress on a fixed interval appears to reinforce bill passing (the congressional equivalent of bar pressing), producing precisely the same scalloped response pattern (Weisberg & Waldrop, 1972).

Figure 5.12: The effects of a fixed-interval schedule on the U.S. Congress (cumulative bills passed per session, 80th through 83rd Congresses, 1947 to 1954).
Adjournment serves as a powerful reinforcer for members of the U.S. House of Representatives, who seem to respond to this fixed-interval schedule of reinforcement with a flurry of last-minute activity before being reinforced. As the graph shows, the pattern looked the same over the several years investigated. The scalloped curve is remarkably similar to the fixed-interval cumulative response recording derived from studying rats and pigeons in Figure 5.11. Source: Weisberg & Waldrop, 1972, p. 23.

Variable-Interval Schedules A variable-interval (VI) schedule, like a fixed-interval schedule, ties reinforcement to an interval of time after which the individual's response leads to reinforcement. In a VI schedule, however, the animal cannot predict how long that time interval will be. Thus, a rat might receive reinforcement for bar pressing, but only at five, six, 20, and 40 minutes (a VI-10 schedule).

Variable-interval schedules are more effective than fixed-interval schedules in maintaining consistent performance. Random, unannounced governmental inspections of working conditions in a plant are much more effective in getting management to maintain safety standards than inspections at fixed intervals. In the classroom, pop quizzes make similar use of VI schedules.

Discriminative Stimuli

In everyday life, then, rarely does a response receive continuous reinforcement in a given situation such as work or school. Making matters even more complicated for learners is that a single behavior can lead to different effects in different situations. Professors receive a paycheck for lecturing to their classes, but if they lecture new acquaintances at a cocktail party, the environmental consequences will not be the same. Similarly, domestic cats learn that the dining room table is a great place to stretch out and relax except when their owners are home.
In some situations, then, a connection might exist between a behavior and a consequence (called a response contingency, because the consequence is dependent, or contingent, on the behavior). In other situations, however, the contingencies might be different, so the organism needs to be able to discriminate
