PSYU-X1101 - 2021 - Week - 5 - Lectures - 9-10 - Operant - Cond - 15mar2021
Instrumental Learning or
Operant Conditioning
Learning that occurs from possible consequences of our actions
E.L. Thorndike (1874-1949): Law of Effect
Learning caused by consequences or “effects”.
Behaviours that had a “satisfying” effect were “stamped in” and behaviours that had an
“annoying” effect were “stamped out”.
B.F. Skinner (1904-1990): “The consequences of behaviour determine the probability that the
behaviour will occur again”
Learning that occurs from
consequences of our actions
Thorndike: Law of Effect
Thorndike’s rule is that the probability of an action being repeated is strengthened
when it is followed by a pleasant or satisfying consequence.
Skinner emphasized that:
1. reinforcement (which increases the likelihood of a response) and
2. punishment (which decreases the probability)
are always defined after the fact.
If the dog begs when he sees food, then food is a reinforcer.
It’s about the CONSEQUENCES!
Operant Conditioning
• Through operant conditioning, an individual
makes an association between a particular
behaviour and a consequence (Skinner, 1938).
• It is learning through reinforcement (reward)
and punishment
• Behaviour (responses) is voluntary
• Behaviour is modified according to its
consequences
The learning of a new association between behaviour
and its consequences:
• Reinforcement: response becomes more likely
• Punishment (“No!”): response becomes less likely
How is Operant Conditioning different to
Classical Conditioning?
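• Classical conditioning: the response is Elicited (example: human, puff)
• Operant conditioning: the response is Emitted (example: dog)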
Thorndike
• Studied cats placed in puzzle boxes
Law of Effect:
– Behaviour that results in reward will be more likely in the future.
– Behaviour that results in punishment will be less likely in the future.
– Behaviour is controlled by its consequences.
Edward L. Thorndike (1911). Animal Intelligence, Chapter V: Laws and Hypotheses for Behavior.
http://psychclassics.yorku.ca/Thorndike/Animal/chap5.htm
Thorndike studied cats placed in puzzle boxes
Thorndike’s Law of Effect
Tendency to perform, FIRST TRIAL: R1 (bite at bars), R2 (jump up & down), R3 (meow), R4……
Tendency to perform, LATER TRIAL: R1 (bite at bars)
• If a behaviour is reinforced, it is MORE likely to occur
• If a behaviour is punished, it is LESS likely to occur
• Incorrect responses weaken when not rewarded
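The Law of Effect can be pictured as a simple update rule. The sketch below is only a toy model, not Thorndike's own formulation: the numeric "strengths", the step size of 0.2 and the function names are all assumptions made for illustration.

```python
def apply_consequence(strengths, response, satisfying, step=0.2):
    """Return updated response strengths after one consequence.

    A satisfying consequence "stamps in" the response (strength goes up);
    an annoying one "stamps it out" (strength goes down, floored at 0).
    The step size is an arbitrary, hypothetical value.
    """
    updated = dict(strengths)
    change = step if satisfying else -step
    updated[response] = max(0.0, updated[response] + change)
    return updated


def response_probabilities(strengths):
    """Turn strengths into the probability of each response being performed."""
    total = sum(strengths.values())
    if total == 0:
        # Nothing has any strength left: treat all responses as equally likely.
        return {r: 1 / len(strengths) for r in strengths}
    return {r: s / total for r, s in strengths.items()}


# First trial: every response in the puzzle box starts out equally likely.
cat = {"bite at bars": 1.0, "jump up & down": 1.0, "meow": 1.0}
before = response_probabilities(cat)

# Hypothetical trial: biting at the bars is followed by a satisfying effect.
cat = apply_consequence(cat, "bite at bars", satisfying=True)
after = response_probabilities(cat)

print(before["bite at bars"], after["bite at bars"])
```

The reinforced response becomes relatively more likely, and the other responses become relatively less likely even though nothing happened to them directly.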
Olivia Colman for best actress in The Favourite
The Consequence of Responding in
Operant Conditioning
(The term consequence is used when there is a contingent
relationship between a behaviour and an event: a
consequence is an event that is CAUSED by a behaviour.)
Consequences include events that may
involve:
• the presentation of a stimulus
• the removal of a stimulus that is
already present
Type of Stimulus
Contingency                    Appetitive “Nice”                 Aversive “Nasty”
Positive (Stimulus Added)      Positive Reinforcement (Yum ☺)    Positive Punishment (Write-off ☹)
Negative (Stimulus Removed)    Negative Punishment (Fine ☹)      Negative Reinforcement (Relief ☺)
– Reinforcement is nice
– Negative means removal
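The two rules of thumb (reinforcement is nice, negative means removal) are enough to name every cell of the Type of Stimulus table. A minimal sketch; the function name and the boolean encoding are assumptions made for illustration:

```python
def classify_consequence(stimulus_added: bool, appetitive: bool) -> str:
    """Name the operant contingency for one cell of the 2x2 table.

    stimulus_added: True if a stimulus is presented, False if it is removed.
    appetitive:     True for a "nice" stimulus, False for a "nasty" one.
    """
    # Positive/negative only says whether a stimulus is added or removed.
    sign = "Positive" if stimulus_added else "Negative"
    # Adding something nice or removing something nasty strengthens behaviour;
    # adding something nasty or removing something nice weakens it.
    strengthens = stimulus_added == appetitive
    effect = "Reinforcement" if strengthens else "Punishment"
    return f"{sign} {effect}"


# The four examples from the table: (stimulus added?, appetitive?)
examples = {
    "Yum": (True, True),
    "Write-off": (True, False),
    "Fine": (False, True),
    "Relief": (False, False),
}
for label, (added, nice) in examples.items():
    print(f"{label}: {classify_consequence(added, nice)}")
```

"Positive" and "negative" never mean good or bad here; the good-or-bad outcome comes from the combination of the two inputs.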
Contrasting Positive and Negative
Reinforcement
Baby's View
• Wakes up hungry → Cries (response) → Receives bottle
(positive reinforcement for baby)
Mother's View
• Hears crying (aversive stimulus) → Gives bottle
(response) → Crying stops (negative reinforcement
for mother)
• Negative Punishment
– The removal of a pleasant stimulus after a behaviour
reduces the likelihood of the behaviour occurring in the
future, e.g. Speed → Lose licence
Positive versus Negative Punishment
Discriminative stimuli
S+ (e.g. a “Free Coffee” sign)
A controlling stimulus that sets the
occasion for reinforcement of an operant.
Responding in presence of this S+ will get
me the outcome I am seeking.
S- (e.g. an “Out of Order” sign), or extinction stimulus
A stimulus that sets the occasion for
non-reinforcement or extinction of an operant.
Responding in presence of this S- will NOT
get me the outcome I am seeking.
Acquiring Complex Behaviours:
Shaping
Skinner shapes Agnes to jump up wall for
Look magazine May 20, 1952
In 1951 Skinner published a paper in Scientific American in which he claimed it was easy to train animals.
In 1952 journalist Joseph Roddy, writing for Look, called his bluff: he arranged to meet Skinner with a
dog and asked Skinner to teach the dog a trick of Roddy’s choosing.
Skinner shapes Agnes
In the span of 20 minutes, Skinner was able to use reinforcement of successive approximations to
shape Agnes’s behaviour. The result was a pretty good trick: Agnes would wander in, stand on her
hind legs, and jump on command.
Variables That Affect Operant Conditioning
These apply to both reinforcers and punishers
• Reinforcer Magnitude
– The larger the reward - the faster the acquisition of learning.
– The quality of the reinforcer is also important.
– N.B. the reward has to be of a certain value in order for the
instrumental response to be performed (after acquisition).
Magnitude of Reinforcer
• Crespi – the larger the reward the faster rats run down an
alley.
• Likelihood and intensity of a response depends on size of
reward.
– Must be sufficient for response to occur
– Intensity of response varies with size of reward.
• Reward size also affects human learning.
– Children age 4 & 5 learn faster when given small prizes
instead of buttons (tokens).
– Adults show higher achievement when paid more money.
– Rats prefer one cube broken into pieces to one whole cube, as it appears to be a greater amount
Variables That Affect Operant Conditioning
Effect of delay of reward on operant conditioning
Delay of Reward
The greater the delay - the weaker the learning.
Frequency of reinforcement
• Must the response always be reinforced?
Skinner ‘mechanised’ the process with his invention of the Skinner box
(Figure labels: keys to peck, food hopper)
Reinforcement Contingencies: Timing and
Schedules of Reinforcement
• Intermittent Reinforcement: periodic administration
of the reinforcement.
(Figure labels: Continuous; Partial/Intermittent)
• Partial (Intermittent) Reinforcement
– Maintains behaviours with fewer reinforcement
trials following initial learning
– reinforcing a response only part of the time
– results in slower acquisition
– greater resistance to extinction
Schedules of Reinforcement
• Ratio schedules
– Reinforcement depends on the number of
responses made
• Fixed Ratio (FR)
– reinforces a response only after a specified
number of responses
– the faster you respond, the more rewards you get
– different ratios
– very high rate of responding
– like piecework pay
Schedules of Reinforcement
• Variable Ratio (VR)
– reinforces a response after an
unpredictable number of responses
– average ratios
– an example would be playing poker
machines
– very hard to extinguish because of
unpredictability
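The two ratio schedules can be sketched as counting rules. This is only an illustration: the function names are hypothetical, and drawing the variable requirement uniformly from 1 to 2 × mean − 1 is an assumption (it simply makes the requirement average out to the stated mean).

```python
import random


def fixed_ratio(n_responses: int, ratio: int) -> int:
    """Count reinforcers earned on an FR schedule: one per `ratio` responses."""
    return n_responses // ratio


def variable_ratio(n_responses: int, mean_ratio: int, rng: random.Random) -> int:
    """Count reinforcers earned on a VR schedule.

    After every reinforcer the required number of responses is redrawn
    (assumption: uniformly from 1 .. 2*mean_ratio - 1), so the learner
    cannot predict which response will pay off.
    """
    reinforcers = 0
    since_last = 0
    required = rng.randint(1, 2 * mean_ratio - 1)
    for _ in range(n_responses):
        since_last += 1
        if since_last >= required:
            reinforcers += 1
            since_last = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforcers


# FR 5: doubling the number of responses doubles the rewards.
print(fixed_ratio(100, ratio=5), fixed_ratio(200, ratio=5))  # prints 20 40

# VR 5: roughly one reinforcer per five responses, at unpredictable points.
print(variable_ratio(1000, mean_ratio=5, rng=random.Random(1)))
```

On both schedules the reward count depends only on how many responses are made, which is why ratio schedules favour fast responding.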
Schedules of Reinforcement
• Interval schedules
– Based on the amount of time between
reinforcements. The first response following
the minimum time is reinforced.
• Fixed Interval (FI)
– reinforces a response only after a specified
time has elapsed
– response occurs more frequently as the
anticipated time for reward draws near
• An example would be receiving a pay cheque
every two weeks
Schedules of Reinforcement
• Variable Interval (VI)
– reinforces a response at unpredictable
time intervals
– produces slow steady responding
• An example would be checking your emails at
random times to see if you have a new message
• Waiting for an appropriate wave to catch
• Buying petrol on a cheap(er) day
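Interval schedules can be sketched in the same way. The code below is a rough illustration under stated assumptions: time is measured in arbitrary units from the start of the session, the function names are hypothetical, and the variable interval is drawn uniformly between 0 and twice the mean.

```python
import random


def fixed_interval(response_times, interval):
    """Return the times of the reinforced responses on an FI schedule.

    The first response made at least `interval` time units after the
    previous reinforcer (or after the session start) is reinforced.
    """
    reinforced = []
    available_from = interval
    for t in sorted(response_times):
        if t >= available_from:
            reinforced.append(t)
            available_from = t + interval
    return reinforced


def variable_interval(response_times, mean_interval, rng):
    """Same rule, but the waiting time is redrawn after every reinforcer
    (assumption: uniformly from 0 .. 2*mean_interval)."""
    reinforced = []
    available_from = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_from:
            reinforced.append(t)
            available_from = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


slow = range(0, 60, 5)  # one response every 5 time units
fast = range(0, 60)     # one response every time unit

# FI 10: responding five times as often earns exactly the same reinforcers.
print(fixed_interval(slow, 10))
print(fixed_interval(fast, 10))

# VI 10: the reinforced moments are unpredictable.
print(variable_interval(fast, 10, random.Random(0)))
```

Unlike the ratio schedules, responding faster does not bring more rewards here; only the passage of time makes the next reinforcer available.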
Typical Response Patterns under Schedules of Reinforcement
(Figure: response rates labelled High, Medium, Slow)
• The basic procedure for establishing a secondary reinforcer is the process of classical
conditioning.
• Skinner used the flash of a strobe light as a conditioned reinforcer to train Agnes
flash light → cube of beef; jump up wall → flash light
Doing what I don’t like doing in order to do what I like doing:
Recidivism
Rehabilitation
Issues of Punishment
4. Learner may learn to fear the administrator
rather than the association between their
behaviour and the punishment
5. Punishment may not undo existing rewards
for a behaviour – unless it is delivered every
time
6. Punitive aggression may lead to modelling of
aggression
Issues of Punishment
Learned Helplessness
When there is no (perceived) relationship between
the individual’s behaviour and punishment
“Damned if I do and damned if I don’t” ☹
Most useful for promoting relaxation, which can help relieve a number of conditions that
are related to stress.
Skinner’s legacy
• Social facilitation
– One’s behaviour prompts similar behaviour of another
• Local or Stimulus enhancement
– Behaviour of one person/animal directs attention of
others to an object
• True imitation (complex)
– Imitation of a novel behaviour pattern in order to
achieve a specific goal of particular interest that is
either very unusual or quite improbable to have
occurred by other means (i.e. spontaneously).
Social facilitation (simplest)
Silva, K., Bessa, J. & de Sousa, L. (2012) Auditory contagious yawning in domestic dogs (Canis familiaris):
first evidence for social modulation, Animal Cognition, 15, 721-724. doi:10.1007/s10071-012-0473-2
Local or stimulus enhancement
• Local or stimulus enhancement refers to a process in which an observer's attention is
drawn to a particular object, activity or place in the environment after observing
another individual engage with it; the observer does not necessarily attend to the
actions of the “model”.
• E.g. stare at sky – others will look up to see what you’re looking at
Imitation – the most complex form of
social learning- only primates??
• Imitation (least simple). True imitation: when an animal
imitates a behaviour that it has never done before. True
imitation can be defined as duplicating a novel behaviour (or
sequence of behaviours) in order to achieve a specific goal,
without showing any understanding of the behaviour.
Kids absorb your drinking – DrinkWise.com.au
3. REPRODUCTION
Observational learning cannot occur if we lack the
motivation or motor skills necessary to imitate the model
4. REINFORCEMENT
We are more likely to repeat a modelled behaviour
if the model is reinforced for the behaviour
Albert Bandura
Social Learning Theory
• Children can learn by observation
– Vicarious reinforcement
• Child can learn without immediate performance of
the behaviour (may not produce the behaviour until they are an adult)
• Achieved through formation of a SYMBOLIC
REPRESENTATION
• Have to see someone do it (a MODEL)
Key Features of the MODEL
• APPROPRIATENESS
– aggressive male models more likely to be imitated
than aggressive female ones, due to cultural factors
in Western world
• SIMILARITY
– children are more likely to imitate someone they
perceive as similar to themselves
• same sex, same age, same ethnic group, etc.
Bandura Ross & Ross (1961)
• Aim – to test whether children who witnessed an aggressive
display by an adult would imitate this
aggression when given the opportunity.
Bandura, A., Ross, D. & Ross, S.A. (1961). Transmission of aggression through imitation
of aggressive models. Journal of Abnormal and Social Psychology, 63, 575-582.
http://psychclassics.yorku.ca/Bandura/bobo.htm
Bandura Ross & Ross (1961)
• Method – a laboratory
experiment in controlled
conditions.
• There were three conditions
• 24 children in each condition
– Non aggressive condition
– Aggressive condition: physical & verbal behaviours (“Sockeroo!”)
– Control condition
• There were male and female
role models
– 12 children in each
How the children played when they
thought they were unobserved!
Bandura Ross & Ross (1961)
• What was observed?
The criteria
• Aggression - physical & verbal
• Imitative aggression
• Non-imitative
What did Bandura et al. (1961) find?
– exposure to aggressive models will lead to imitation
of the aggression observed
– exposure to non-aggressive models generally has an
inhibiting effect on aggressive behaviour
– same-sex imitation is greater than opposite-sex
imitation for some behaviours (Boys especially)
– boys imitate aggression more than girls and are
generally more aggressive except for verbal
aggression
Conclusions
• Aggression is a learned behaviour, not an
in-built instinct
• Learning can take place in absence of any
reinforcement, only via observation and
modelling
• Modelling is a powerful and fast way of
learning
Bandura’s further research
• Bandura, Ross & Ross (1963): children
watched films with either an
aggressive or non-aggressive model
• Filmed model produced even more
aggression than live model
• Model rewarded or punished for
aggression
• Children imitated the rewarded
aggressive model the most
• Bandura’s research as the ‘first
generation’ of scientific research on
the effects of media violence on
children
http://www.topics-mag.com/edition02/images/tv_kidsyuki.jpeg