
Learning: an Introduction – Part 2

Instrumental Learning or Operant Conditioning
Learning that occurs from possible consequences of our actions

E.L. Thorndike (1874-1949): Law of Effect
Learning caused by consequences or “effects”. Behaviours that had a “satisfying” effect were “stamped in” and behaviours that had an “annoying” effect were “stamped out”.

B.F. Skinner (1904-1990)
“The consequences of behaviour determine the probability that the behaviour will occur again”

Learning that occurs from consequences of our actions
Thorndike
Law of Effect: Thorndike’s rule is that the probability of an action being repeated increases when it is followed by a pleasant or satisfying consequence.
Skinner
Skinner emphasized that:
1. reinforcement (which increases the likelihood of a response) and
2. punishment (which decreases the probability of a response)
are always defined after the fact.
It’s about the CONSEQUENCES!
If the dog begs when he sees food, then food is a reinforcer
Operant Conditioning
• Through operant conditioning, an individual
makes an association between a particular
behaviour and a consequence (Skinner, 1938).
• It is learning through reinforcement (reward)
and punishment
• Behaviour (responses) is voluntary
• Behaviour is modified according to its
consequences
The learning of a new association between behaviour
and its consequences

Reinforcement → response becomes more likely
Punishment (“No!”) → response becomes less likely
How is Operant Conditioning different to
Classical Conditioning?

The light provides information as to when responding will be rewarding.


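Key Difference between Classical and Operant Conditioning
• Classical conditioning: the response is elicited (figure example: an air puff directed at a human)
• Operant conditioning: the response is emitted (figure example: a dog)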
Thorndike
• Studied cats placed in puzzle boxes

Law of Effect:
• Behaviour that results in reward will be more likely in the future.
• Behaviour that results in punishment will be less likely in the future.
• Behaviour is controlled by its consequences.

Edward L. Thorndike (1911) Animal Intelligence, Chapter V: Laws and Hypotheses for Behavior
http://psychclassics.yorku.ca/Thorndike/Animal/chap5.htm
Thorndike studied cats placed in puzzle boxes
Thorndike’s Law of Effect
If a behaviour is reinforced, it is MORE likely to occur.
If a behaviour is punished, it is LESS likely to occur.

FIRST TRIAL (tendency to perform each response):
R1 (bite at bars)
R2 (jump up & down)
R3 (meow)
R4 …
Rcorrect (pull at string) → Reward

LATER TRIAL (tendency to perform each response):
R1 (bite at bars)
R2 (jump up & down)
R3 (meow)
R4 …
Incorrect responses weaken when not rewarded.
Rcorrect (pull at string) → Reward: the correct response gets “stamped in”.

Thorndike’s Law of Effect
Learning occurs by trial and error until the correct Response → Reward relationship is acquired
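The trial-and-error process described above can be mimicked with a toy simulation. This is a minimal sketch, not Thorndike's own model: the response labels come from the puzzle-box slide, while the starting strengths, the fixed step size and the rule that unrewarded responses lose strength are illustrative assumptions.

```python
import random
from typing import Dict

# Response repertoire taken from the puzzle-box slide; only one response opens the box.
RESPONSES = ["bite at bars", "jump up and down", "meow", "pull at string"]
CORRECT = "pull at string"


def puzzle_box(trials: int = 200, step: float = 0.1, seed: int = 0) -> Dict[str, float]:
    """Toy trial-and-error model; returns the final 'tendency to perform' each response."""
    rng = random.Random(seed)
    strength = {r: 1.0 for r in RESPONSES}  # all responses equally likely at first
    for _ in range(trials):
        # Emit one response, chosen in proportion to its current strength.
        response = rng.choices(RESPONSES, weights=[strength[r] for r in RESPONSES])[0]
        if response == CORRECT:
            strength[response] += step  # satisfying consequence: bond "stamped in"
        else:
            # Unrewarded response weakens; the floor keeps it possible but rare.
            strength[response] = max(0.01, strength[response] - step)
    return strength


def probability(strength: Dict[str, float], response: str) -> float:
    """Probability that `response` is the one emitted on the next trial."""
    return strength[response] / sum(strength.values())


final = puzzle_box()
print(f"P(pull at string) after training: {probability(final, CORRECT):.2f}")
```

With these assumptions the correct response comes to dominate the repertoire, which is the "stamping in" on the slide; changing `step` or `trials` changes how quickly this happens.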
Law of Effect
“Of several responses made to the same situation, those which
are accompanied or closely followed by satisfaction to the
animal will, other things being equal, be more firmly connected
with the situation...; those which are accompanied or closely
followed by discomfort ... will have their connections with that
situation weakened. The greater the satisfaction or discomfort,
the greater the strengthening or weakening of the bond.”
(Thorndike, 1911, p.244)

Thorndike dropped the Negative Law of Effect (weakening by discomfort) some time after 1929

Thorndike, E.L. (1911). Animal Intelligence. New York: Macmillan.


B. F. Skinner
Operant Conditioning: Learning New Behaviours
• B.F. Skinner’s “Radical behaviourism”:
– The factor controlling an organism’s behaviour
was the consequence of that behaviour.
– There was no need to hypothesise internal
processes.
– The only appropriate object of study is overt,
observable behaviour
– The laws governing “learning” via operant
conditioning were the same for all organisms.
Reinforcement Contingencies
• Contingencies reflect conditions that must be met in
order for reinforcement to be dispensed.
• The reinforcement must be meaningful to the
organism

• The reinforcement must follow the behaviour.

Olivia Colman for Best Actress in The Favourite
The Consequence of Responding in
Operant Conditioning
(The term consequence is used when there is a contingent relationship between a behaviour and an event: a consequence is an event that is CAUSED by a behaviour.)
Consequences include events that may
involve:
• the presentation of a stimulus
• the removal of a stimulus that is
already present

Therefore, there are …..


…. Two Types of Contingent Relationships
Between a Response and a Consequence
• Positive contingency - when a response
causes the presentation of a stimulus.
• Negative contingency - when a response
causes the removal of a stimulus that is
already present.

But remember that there are . . .

. . . Different Types of Stimulus Events

• Pleasant (desired; appetitive)
• Unpleasant (undesired; aversive)
• Neutral
NOTE: The terms “positive” and “negative” should ONLY be used to describe
the contingent relationships, NOT the type of stimulus.
Positive means that some event is added into the situation as the
consequence of an action. It does not mean nice!!
Negative means some event is removed from the situation as the consequence of an action. It does not mean nasty!!
Two Types of Effects of Behaviour-Consequence Relationships on Behaviour:
• Reinforcement - any contingent relationship between a consequence and a response that causes the response to increase in frequency.
• Punishment - any contingent relationship between a consequence and a response that causes the response to decrease in frequency.
Four Types of Behaviour-Consequence Relationships in Operant Conditioning

                                Type of Stimulus
Contingency                     Appetitive “Nice”                   Aversive “Nasty”
Positive (stimulus added)       Positive Reinforcement (Yum ☺) ↑    Positive Punishment (Write-off ☹) ↓
Negative (stimulus removed)     Negative Punishment (Fine ☹) ↓      Negative Reinforcement (Relief ☺) ↑

↑ = behaviour increases in frequency
↓ = behaviour decreases in frequency
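The 2 × 2 logic of the table can be written as a small lookup. This is an illustrative sketch built around a hypothetical `classify` helper: it predicts the label from the table's two dimensions, whereas Skinner's point that reinforcement and punishment are defined after the fact means the real test is always the observed change in behaviour.

```python
from typing import Tuple


def classify(contingency: str, stimulus: str) -> Tuple[str, str]:
    """Name a behaviour-consequence relationship and its expected effect.

    contingency: "positive" (stimulus added) or "negative" (stimulus removed)
    stimulus:    "appetitive" (nice) or "aversive" (nasty)
    """
    if contingency not in ("positive", "negative"):
        raise ValueError("contingency must be 'positive' or 'negative'")
    if stimulus not in ("appetitive", "aversive"):
        raise ValueError("stimulus must be 'appetitive' or 'aversive'")
    # Adding something nice or removing something nasty leaves the learner better off,
    # so the behaviour is expected to increase (reinforcement); the other two cells
    # are expected to make it decrease (punishment).
    better_off = (contingency == "positive") == (stimulus == "appetitive")
    kind = "reinforcement" if better_off else "punishment"
    effect = "increases" if better_off else "decreases"
    return f"{contingency} {kind}", effect


# e.g. a parent giving a bottle removes the aversive sound of crying
print(classify("negative", "aversive"))  # ('negative reinforcement', 'increases')
```

Encoding "positive" and "negative" only as added versus removed keeps the helper faithful to the note that these words describe the contingency, not whether the stimulus is nice.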




Four Types of Behaviour-Consequence Relationships in Operant Conditioning: Reinforcement
Positive Reinforcement (appetitive stimulus added) and Negative Reinforcement (aversive stimulus removed): behaviour increases in frequency


Reinforcement
• Positive Reinforcement
– The presentation of a pleasant stimulus after a behaviour makes the behaviour more likely to occur in the future. (Joy)
• Negative Reinforcement
– The removal of an aversive stimulus after a behaviour makes the behaviour more likely to occur in the future. (Relief)
1. Negative reinforcement is not punishment
2. Negative reinforcement is not punishment
3. Negative reinforcement is not punishment

– Reinforcement is nice
– Negative means removal
Contrasting Positive and Negative
Reinforcement
Baby's View
• Wakes up hungry → Cries (response) → Receives bottle (positive reinforcement for baby)
Mother's View
• Hears crying (aversive stimulus) → Gives bottle (response) → Crying stops (negative reinforcement for mother)

Example of positive and negative reinforcement in a mother-infant interaction


Four Types of Behaviour-Consequence Relationships in Operant Conditioning: Punishment
Positive Punishment (aversive stimulus added) and Negative Punishment (appetitive stimulus removed): behaviour decreases in frequency


Punishment
• Positive Punishment
– The presentation of an aversive stimulus after a behaviour reduces the likelihood of the behaviour occurring in the future. (Primary or social)
• Negative Punishment
– The removal of a pleasant stimulus after a behaviour reduces the likelihood of the behaviour occurring in the future. E.g. speed → lose licence
Positive versus Negative Punishment
Discriminative stimuli

The light provides information as to when responding will be rewarding.


Discriminative stimuli
• In classical conditioning – they elicit
autonomic responses (i.e. involuntary reflexes)

• In operant conditioning – they inform us as to when we can emit a voluntary response
The Discriminative Stimulus: Knowing When to Respond
• Discriminative stimulus: a stimulus in whose presence a response will be followed by reward or punishment
– Can be a particular situation or thing in the environment
• The behaviour may also be produced in response to a similar stimulus (stimulus generalisation), unless responding to that stimulus does not produce the same reward (stimulus discrimination)
Acquiring Complex Behaviours:
Shaping
• Complex behaviours, such as bar-pressing, are unlikely to occur spontaneously, so they are hard to reinforce.
• Solution: Shaping
– A procedure in which reinforcement is
delivered for successive approximations of
the desired response
• Training a dog to fetch the paper
• Teaching a child to tie shoelaces
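Shaping by successive approximations can be illustrated with a toy model. This is a sketch under stated assumptions: behaviour is represented as a single number (say, how high a dog jumps) that varies randomly from attempt to attempt, the target lies above the starting point, and a reinforced attempt pulls the learner's typical behaviour halfway towards it. The hypothetical `shape` function and its parameters are not from the lecture.

```python
import random
from typing import Optional, Tuple


def shape(target: float, start: float = 0.0, max_trials: int = 500,
          seed: int = 1) -> Tuple[Optional[int], float]:
    """Toy model of shaping by successive approximations (assumes target > start).

    Returns (trials needed to reach the target, final typical response);
    trials is None if the target was not reached within max_trials.
    """
    rng = random.Random(seed)
    typical = start  # learner's current typical response, e.g. jump height
    for trial in range(1, max_trials + 1):
        response = rng.gauss(typical, 1.0)  # behaviour varies from attempt to attempt
        # Reinforce only attempts that go further towards the target than the
        # learner's current typical behaviour (a closer approximation).
        if response > typical:
            # Reinforcement shifts typical behaviour towards the reinforced attempt,
            # which automatically raises the bar for the next reinforcer.
            typical += 0.5 * (response - typical)
        if typical >= target:
            return trial, typical
    return None, typical


print(shape(target=10.0))
```

Because the criterion moves with the learner's current behaviour, each reinforcer is only ever a small step away; demanding the final behaviour from the start would, in this model, almost never produce anything to reinforce.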
Pigeon learning to read through
shaping and discrimination training
Example of the two kinds of discriminative stimuli in operant conditioning for a caffeine addict

S+ (e.g. a “Free Coffee” sign)
A controlling stimulus that sets the occasion for reinforcement of an operant. Responding in the presence of this S+ will get me the outcome I am seeking.

S- (e.g. an “Out of Order” sign)
Responding in the presence of this S- will NOT get me the outcome I am seeking. The S-, or extinction stimulus, is a stimulus that sets the occasion for non-reinforcement or extinction of an operant.
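Discrimination training in the coffee example can be sketched as two response tendencies, one per stimulus, that drift towards whatever outcome responding produces in that stimulus. The linear learning rule, the starting value of 0.5 and the learning rate are assumptions chosen for illustration, not part of the lecture.

```python
from typing import Dict


def train_discrimination(trials: int = 30, rate: float = 0.2) -> Dict[str, float]:
    """Toy model: learned tendency to use the coffee machine under each stimulus."""
    # S+ = "Free Coffee" sign (responding is reinforced),
    # S- = "Out of Order" sign (responding is not reinforced: extinction).
    reinforced = {"S+": True, "S-": False}
    p_respond = {"S+": 0.5, "S-": 0.5}  # start undecided in both situations
    for _ in range(trials):
        for stimulus, pays_off in reinforced.items():
            outcome = 1.0 if pays_off else 0.0  # coffee, or nothing
            # Move the response tendency a fraction of the way towards the outcome
            # (a simple linear learning rule, assumed here for illustration).
            p_respond[stimulus] += rate * (outcome - p_respond[stimulus])
    return p_respond


p = train_discrimination()
print(f"respond under S+: {p['S+']:.3f}, respond under S-: {p['S-']:.3f}")
```

After training, responding is almost certain in the presence of S+ and almost absent in the presence of S-, which is what it means for a stimulus to set the occasion for (non-)reinforcement.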
Acquiring Complex Behaviours:
Shaping
Skinner shapes Agnes to jump up a wall for Look magazine, May 20, 1952

In 1951 Skinner published a paper in Scientific American in which he claimed it was easy to train animals. In 1952, journalist Joseph Roddy, writing for Look, called his bluff: he arranged to meet Skinner with a dog and asked Skinner to teach the dog a trick of Roddy’s choosing…
In the span of 20 minutes, Skinner was able to use reinforcement of successive approximations to shape Agnes’s behaviour. The result was a pretty good trick: Agnes would wander in, stand on her hind legs, and jump on command.
Variables That Affect Operant Conditioning
(these apply to both reinforcers and punishers)
• Reinforcer Magnitude
– The larger the reward, the faster learning is acquired.
– The quality of the reinforcer is also important.
– N.B. the reward has to reach a certain value for the instrumental response to be performed (after acquisition).
Magnitude of Reinforcer
• Crespi: the larger the reward, the faster rats run down an alley.
• The likelihood and intensity of a response depend on the size of the reward.
– The reward must be sufficient for the response to occur
– The intensity of the response varies with the size of the reward.
• Reward size also affects human learning.
– Children aged 4 & 5 learn faster when given small prizes instead of buttons (tokens).
– Adults show higher achievement when paid more money.
– Rats prefer one cube in pieces to one whole cube, as it appears to be greater
Variables That Affect Operant Conditioning
Effect of delay of reward on operant conditioning

Delay of Reward
The greater the delay, the weaker the learning.
Frequency of reinforcement
• Must the response always be reinforced?
Does the rat have to be rewarded every time it presses the bar?
Yes, when learning a new response
Frequency of reinforcement
Reinforcement Contingencies: Timing and Schedules
of Reinforcement
• Continuous reinforcement: reinforcing the desired
response each time it occurs
• Continuous reinforcement
– Problems:
• Habituation to the reinforcer: the reinforcement loses its reinforcing qualities
• Satiation: the organism becomes sated with the reinforcer.
How to measure responding → reinforcement
[Figure: Skinner box with keys to peck and a food hopper]
Skinner ‘mechanised’ the process with his invention of the Skinner box
• Intermittent Reinforcement: periodic administration of the reinforcement.
[Figure: continuous vs partial/intermittent reinforcement]
• Partial (Intermittent) Reinforcement
– maintains behaviours with fewer reinforced trials following initial learning
– reinforces a response only part of the time
– results in slower acquisition
– produces greater resistance to extinction
Schedules of Reinforcement
• Ratio schedules
– Reinforcement depends on the number of
responses made
• Fixed Ratio (FR)
– reinforces a response only after a specified
number of responses
– the faster you respond, the more rewards you get
– the ratio can be set at different values
– very high rate of responding
– like piecework pay
Schedules of Reinforcement
• Variable Ratio (VR)
– reinforces a response after an
unpredictable number of responses
– the number of responses required varies around an average ratio
– an example would be playing poker
machines
– very hard to extinguish because of
unpredictability
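The two ratio schedules can be expressed as rules that decide whether each response earns a reinforcer. This is a minimal sketch; the class names are hypothetical, and drawing the VR requirement uniformly from 1 to 2n − 1 (so that it averages n) is an assumption, since the slides only say the number is unpredictable.

```python
import random


class FixedRatio:
    """FR-n: every n-th response is reinforced."""

    def __init__(self, n: int):
        self.n = n
        self.count = 0  # responses made since the last reinforcer

    def respond(self) -> bool:
        """Record one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR-n: reinforced after an unpredictable number of responses that averages n."""

    def __init__(self, mean: int, seed: int = 0):
        self.mean = mean
        self.rng = random.Random(seed)
        self.count = 0
        self.required = self._next_requirement()

    def _next_requirement(self) -> int:
        # Assumption: requirement drawn uniformly from 1..(2*mean - 1), averaging `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        """Record one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_requirement()  # next requirement is unpredictable
            return True
        return False


def reinforcers_earned(schedule, responses: int) -> int:
    """Count reinforcers delivered for a run of `responses` responses."""
    return sum(schedule.respond() for _ in range(responses))


# On a ratio schedule, more responding means more reward (like piecework pay).
print(reinforcers_earned(FixedRatio(5), 100))  # 20
print(reinforcers_earned(FixedRatio(5), 200))  # 40
```

Under VR the learner can never tell whether the next response will pay off, which fits the slide's point about why poker-machine play is so hard to extinguish.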
Schedules of Reinforcement
• Interval schedules
– Based on the amount of time between
reinforcements. The first response following
the minimum time is reinforced.
• Fixed Interval (FI)
– reinforces a response only after a specified
time has elapsed
– response occurs more frequently as the
anticipated time for reward draws near
• An example would be receiving a pay cheque
every two weeks
Schedules of Reinforcement
• Variable Interval (VI)
– reinforces a response at unpredictable
time intervals
– produces slow steady responding
• An example would be checking your emails at
random times to see if you have a new message
• Waiting for an appropriate wave to catch
• Buying petrol on a cheap(er) day
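Interval schedules can be sketched the same way, except that the rule depends on the time of each response rather than on how many responses were made: the first response after the interval has elapsed is reinforced. The class names are hypothetical, and drawing VI intervals uniformly from 0 to twice the mean is an assumption made for illustration.

```python
import random


class FixedInterval:
    """FI-t: the first response at least t seconds after the last reinforcer is reinforced."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval  # the clock starts at time 0

    def respond(self, time: float) -> bool:
        """Record a response at `time` (seconds); return True if it is reinforced."""
        if time >= self.available_at:
            self.available_at = time + self.interval  # restart the interval
            return True
        return False  # too early: responding now earns nothing


class VariableInterval:
    """VI-t: like FI, but each interval is unpredictable and averages t seconds."""

    def __init__(self, mean_interval: float, seed: int = 0):
        self.mean_interval = mean_interval
        self.rng = random.Random(seed)
        self.available_at = self._next_interval()

    def _next_interval(self) -> float:
        # Assumption: intervals drawn uniformly from 0..2*mean, averaging `mean_interval`.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, time: float) -> bool:
        """Record a response at `time` (seconds); return True if it is reinforced."""
        if time >= self.available_at:
            self.available_at = time + self._next_interval()
            return True
        return False


def reinforcers_earned(schedule, response_times) -> int:
    """Count reinforcers delivered for responses made at the given times."""
    return sum(schedule.respond(t) for t in response_times)


every_second = [float(t) for t in range(1, 101)]      # 100 s, 1 response/s
every_half_second = [k * 0.5 for k in range(1, 201)]  # 100 s, 2 responses/s
print(reinforcers_earned(FixedInterval(10), every_second))       # 10
print(reinforcers_earned(FixedInterval(10), every_half_second))  # 10
```

Doubling the response rate under FI-10 does not earn a single extra reinforcer in this sketch, which is one plausible reason interval schedules support slower responding than ratio schedules.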
Typical Response Patterns under Schedules of Reinforcement
The slope of the cumulative record reflects the subject’s response rate.
[Figure: cumulative records showing high, medium and slow response rates]
Each type of reinforcement schedule tends to generate a characteristic pattern of responding.
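A cumulative record simply counts the total number of responses made so far at each tick of the clock, so its slope is responses per unit time. The sketch below assumes response times in seconds and a one-second tick; the function names are hypothetical.

```python
from typing import List


def cumulative_record(response_times: List[float], duration: float,
                      step: float = 1.0) -> List[int]:
    """Total number of responses made by each tick of the recorder's clock."""
    ticks = int(duration / step)
    return [sum(1 for t in response_times if t <= i * step) for i in range(ticks + 1)]


def response_rate(record: List[int], step: float = 1.0) -> float:
    """Overall slope of the record: responses per unit time."""
    return (record[-1] - record[0]) / ((len(record) - 1) * step)


fast = [0.5 * k for k in range(1, 21)]  # 2 responses per second for 10 s
slow = [2.0 * k for k in range(1, 6)]   # 1 response every 2 s for 10 s
print(response_rate(cumulative_record(fast, 10)))  # 2.0
print(response_rate(cumulative_record(slow, 10)))  # 0.5
```

The record can only stay flat or rise, so a pause in responding shows up as a flat stretch and a burst of responding as a steep one.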
Schedules of reinforcement and patterns of response
Everyday examples of behaviours that are rewarded on a scheduled basis
[Figure: examples arranged by fixed vs variable and ratio vs interval schedules]
Not all behaviour is about getting food
Primary and Secondary Reinforcers
• Primary: Reinforcers such as food, water and sex that have an innate basis because of their biological value to the organism.
• Secondary: Stimuli, such as money or grades, that acquire their reinforcing power by a learned association with a primary reinforcer.
– Also called Conditioned Reinforcers.

• The basic procedure for establishing a secondary reinforcer is the process of classical
conditioning.
• Skinner used the flash of a strobe light as a conditioned reinforcer to train Agnes: flash light → cube of beef; jump up wall → flash light
Doing what I don’t like doing in order to do what I like doing:

• The Premack Principle (Grandma’s Rule)
– Using a desired or high-frequency behaviour to reinforce a less desirable or lower-frequency behaviour;
• A more-preferred activity can be used to reinforce a less-preferred activity.
– Discovered by David Premack.
• Premack, D. (1959). Toward empirical behavior laws: I. Positive reinforcement. Psychological Review, 66, 219-233.

If you eat your veggies… then you get to eat cake
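The Premack Principle amounts to a comparison of how often the learner engages in each activity when free to choose. In this minimal sketch the baseline minutes and the helper names are hypothetical; the only rule taken from the slide is that the more-preferred activity can reinforce the less-preferred one.

```python
from typing import Dict


def can_reinforce(baseline: Dict[str, float], contingent: str, instrumental: str) -> bool:
    """Premack principle: access to `contingent` can reinforce `instrumental`
    if the learner engages in `contingent` more often when free to choose."""
    return baseline[contingent] > baseline[instrumental]


def grandmas_rule(baseline: Dict[str, float], a: str, b: str) -> str:
    """Phrase the contingency so the less-preferred activity earns the more-preferred one."""
    less, more = sorted((a, b), key=lambda activity: baseline[activity])
    return f"If you {less}... then you get to {more}"


# Hypothetical baseline: minutes per day a child spends on each activity when free to choose.
baseline = {"eat cake": 15, "eat your veggies": 2, "play outside": 60, "do homework": 10}
print(grandmas_rule(baseline, "eat cake", "eat your veggies"))
# If you eat your veggies... then you get to eat cake
```

Because preference is measured rather than assumed, the same activity can be a reinforcer for one learner and not for another.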


Issues of Punishment
1. Punishment does not usually result in long term
behavioural change - effects are temporary
2. Punishment does not promote better, alternative
behaviour
• Example: punishing a child for fighting with a sibling does not teach the child to cooperate with that sibling
• Better: Reinforce an alternative response
3. Punishment typically leads to escape behaviour
[Figure: recidivism vs rehabilitation]
Issues of Punishment
4. Learner may learn to fear the administrator
rather than the association between their
behaviour and the punishment
5. Punishment may not undo existing rewards
for a behaviour – unless it is delivered every
time
6. Punitive aggression may lead to modelling of
aggression
Issues of Punishment
Learned Helplessness
When there is no (perceived) relationship between
the individual’s behaviour and punishment

“Damned if I do and damned if I don’t” ☹

If the punishment is very aversive → PTSD


Learned Helplessness
Martin Seligman
Overmier and Seligman (1967) found
that dogs exposed to inescapable and
unavoidable electric shocks in one
situation later failed to learn to escape
shock in a different situation where
escape was possible. Shortly thereafter
Seligman and Maier (1967)
demonstrated that this effect was
caused by the uncontrollability of the
original shocks.

Overmier, J. B., & Seligman, M. E. P. (1967). Effects of inescapable shock upon subsequent escape and avoidance learning. Journal of Comparative and Physiological Psychology, 63, 28-33.
Seligman, M. E. P., & Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74, 1-9.
Maier, S. F., & Seligman, M. E. P. (1976). Learned helplessness: Theory and evidence. Journal of Experimental Psychology: General, 105, 3-46.
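The idea of "no relationship between behaviour and outcome" can be made concrete by comparing how often the outcome follows a response with how often it happens without one. This ΔP measure is an assumption introduced here to illustrate uncontrollability; it is not part of the lecture, and the trial counts below are hypothetical.

```python
def delta_p(outcome_after_response: int, response_trials: int,
            outcome_without_response: int, no_response_trials: int) -> float:
    """Response-outcome contingency: P(outcome | response) - P(outcome | no response)."""
    p_given_response = outcome_after_response / response_trials
    p_given_no_response = outcome_without_response / no_response_trials
    return p_given_response - p_given_no_response


# Hypothetical counts: does the shock end on trials where the dog responds vs. not?
escapable = delta_p(20, 20, 0, 20)     # responding always ends the shock
inescapable = delta_p(10, 20, 10, 20)  # shock ends equally often either way
print(escapable, inescapable)  # 1.0 0.0
```

A value near zero means the dog's behaviour makes no difference to whether the shock stops, the uncontrollability that Seligman and Maier identified as the cause of the later failure to escape.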
Applications of Operant Conditioning
• Behavioural Therapy
– wide variety of everyday behaviour problems,
including obesity, smoking, alcoholism, social
anxiety, depression, delinquency, and aggression.
– Token Economies
– Remedial Education
– Therapy for Autism
– Training dogs
– Biofeedback
Psychology in Everyday Life: Operant Conditioning
Can we learn to control involuntary body responses?
In biofeedback training:
1) internal bodily processes (like blood pressure or muscle tension) are electrically recorded
2) information is amplified and reported back to the patient through headphones, signal lights, and other means
3) this information helps the person learn to control bodily processes not normally under voluntary control

Most useful for promoting relaxation, which can help relieve a number of conditions that are related to stress.
Skinner’s legacy
Sarbi, an explosives detection dog with the Australian Army, received the RSPCA's most prestigious animal bravery award, the Purple Cross, at a ceremony at the Australian War Memorial in Canberra
Observational Learning
aka Social Learning (Bandura)
aka vicarious learning, imitation, modelling, …

Bandura, A. (1977) Social Learning Theory. New Jersey: Prentice Hall.


In other words, it’s about learning from others
• Observational learning or vicarious conditioning (sometimes referred to as behavioural contagion)
• Learning by watching others (“models” or “demonstrators”): it is how we acquire new information by being exposed to one another in a common environment
Observational Learning: Overview
• Learning that occurs as a result of observing the
experiences of others
• Consider: What would life be like if you could
only learn through your own trial and error?
– Adaptive to learn from others
– Basis of how our culture gets passed from one
generation to the next
• Who learns by observation: many species,
including chimpanzees, rhesus monkeys, some
birds and – bees!
What behaviours are learnt?
• Both specific actions and general styles of behaviour

When do we learn by observing?
• We copy when asocial learning is costly (dangerous/uncertain situations). We can’t afford to learn from our own mistakes as in operant conditioning: e.g. we might get lost if we don’t take the same precautions as experienced bushwalkers.
• We copy a successful individual, i.e. we copy if someone is doing it better, or when our established behaviour is unproductive: e.g. buy/sell shares when your wealthy friend does.
• Be careful not to get caught by Turnitin!! Copy your clever friend’s style of studying, not their output.
Laland, K. N. (2004). Social learning strategies. Learning & Behavior, 32, 4-14.
Besides true imitation, social learning results from one or more of a number of other social phenomena (listed from simple to complex):
• Social facilitation
– One’s behaviour prompts similar behaviour in another
• Local or Stimulus enhancement
– The behaviour of one person/animal directs the attention of others to an object
• True imitation
– Imitation of a novel behaviour pattern in order to achieve a specific goal, where the behaviour is either very unusual or quite improbable to have occurred by other means (i.e. spontaneously).
Social facilitation (simplest)
• an increase in the frequency or intensity of a behaviour (that is already in the animal’s repertoire) caused by the presence of others (of the same species) performing the same behaviour at that time.
Dogs can 'catch' yawns from humans - but it seems to work best when there's a bond between dog and man (5 times more likely to yawn when it’s their owner)

Silva, K., Bessa, J., & de Sousa, L. (2012). Auditory contagious yawning in domestic dogs (Canis familiaris): First evidence for social modulation. Animal Cognition, 15, 721-724. doi:10.1007/s10071-012-0473-2
Local or stimulus enhancement
• Local or stimulus enhancement refers to a process in which one individual directs another individual's attention to a particular object, activity or place in the environment: after observing another individual engage in that activity, the observer attends to the same object or place, but does not necessarily attend to the actions of the “model”.
• E.g. stare at the sky – others will look up to see what you’re looking at
Imitation – the most complex form of social learning - only primates??
• Imitation (the least simple form). True imitation: when an animal imitates a behaviour that it has never performed before. True imitation can be defined as duplicating a novel behaviour (or sequence of behaviours) in order to achieve a specific goal, without showing any understanding of the behaviour.
Kids absorb your drinking – DrinkWise.com.au

When kids look up to you, what do they see?


Observational Learning Processes

In order to learn by observation, four processes are involved:
1. Attention
2. Retention
3. Reproduction
4. Motivation (from reinforcement)

1. ATTENTION
Observational learning requires attention. This is why teachers insist on having students watch their demonstrations.
2. RETENTION
To learn new behaviours, we need to carefully note and remember the model’s directions and demonstrations.
3. REPRODUCTION
Observational learning cannot occur if we lack the motivation or motor skills necessary to imitate the model.
4. REINFORCEMENT
We are more likely to repeat a modelled behaviour if the model is reinforced for the behaviour.
Albert Bandura

• Albert Bandura proposes we learn through IMITATION or MODELLING
• “Observational (or Vicarious) Learning”
• This explains the SPEED of learning in young children (no need for trial-and-error)
Social Learning Theory
• Children can learn by observation
– Vicarious reinforcement
• A child can learn without immediately performing the behaviour (they may not produce the behaviour until they are an adult)
• Achieved through formation of a SYMBOLIC REPRESENTATION
• They have to see someone do it (a MODEL)
Key Features of the MODEL

• APPROPRIATENESS
– aggressive male models are more likely to be imitated than aggressive female ones, due to cultural factors in the Western world
• SIMILARITY
– children are more likely to imitate someone they
perceive as similar to themselves
• same sex, same age, same ethnic group, etc.
Bandura Ross & Ross (1961)
• Aim – to test whether children who witnessed an aggressive display by an adult would imitate this aggression when given the opportunity.

• TWO adult ‘role models’ (one male and one female) and a female experimenter

Bandura, A., Ross, D., & Ross, S. A. (1961). Transmission of aggression through imitation of aggressive models. Journal of Abnormal and Social Psychology, 63, 575-582.

http://psychclassics.yorku.ca/Bandura/bobo.htm
Bandura Ross & Ross (1961)
• Method – a laboratory experiment in controlled conditions.
• There were three conditions, with 24 children in each:
– Non-aggressive condition
– Aggressive condition (physical & verbal behaviours)
– Control condition
• There were male and female role models (12 children in each)
[Figure captions: “Sockeroo!”; How the children played when they thought they were unobserved!]
Bandura Ross & Ross (1961)
• What was observed?
The criteria
• Aggression - physical & verbal
• Imitative aggression
• Non-imitative aggression
What did Bandura et al. (1961) find?
– exposure to aggressive models will lead to imitation
of the aggression observed
– exposure to non-aggressive models generally has an
inhibiting effect on aggressive behaviour
– same-sex imitation is greater than opposite-sex imitation for some behaviours (boys especially)
– boys imitate aggression more than girls and are
generally more aggressive except for verbal
aggression
Conclusions
• Aggression is a learned behaviour, not an inbuilt instinct
• Learning can take place in the absence of any reinforcement, only via observation and modelling
• Modelling is a powerful and fast way of learning
Bandura’s further research
• Bandura, Ross & Ross (1963): children watched films with either an aggressive or non-aggressive model
• The filmed model produced even more aggression than the live model
• The model was rewarded or punished for aggression
• Children imitated the rewarded aggressive model the most
• Bandura’s research is regarded as the ‘first generation’ of scientific research on the effects of media violence on children
http://www.topics-mag.com/edition02/images/tv_kidsyuki.jpeg
