
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/308965897

Error based and reward based learning

Presentation · April 2016


DOI: 10.13140/RG.2.2.34036.14726


1 author:

Wondimu W Teka
U.S. Food and Drug Administration

All content following this page was uploaded by Wondimu W Teka on 10 October 2016.



Error based and reward based
learning
By Wondimu Teka

1
Research goal
• Develop a model that has both basal ganglia (BG) and cerebellum compartments
• Apply error-based and reward-based learning to the model
• Produce experimental results for
  • Control group (normal BG and cerebellum): sensory and reward prediction errors are presented
  • Damaged BG: only sensory prediction errors
  • Damaged cerebellum: only reward prediction errors
• Propose model predictions
• Write the paper in parallel with the above research tasks

2
Cerebellum and basal ganglia are separately responsible:
Doya 1999/2000 proposal (Kenji Doya).
The basal ganglia are specialized for reinforcement learning, which is guided by
the reward signal encoded in the dopaminergic input from the substantia nigra.
The cerebellum is specialized for supervised learning, which is guided by the
error signal encoded in the climbing fiber input from the inferior olive. The
cerebral cortex is specialized for unsupervised learning, which is guided by the
statistical properties of the input signal itself, but may also be regulated by
the ascending neuromodulatory inputs.
Doya, Kenji. "Complementary roles of basal ganglia and cerebellum in learning
and motor control." Current opinion in neurobiology 10.6 (2000): 732-739.

3
Error-based and reward-based learning

Error-based learning (supervised learning)
• Sensory prediction error
  • Minimize the error (error correction)
  • Observed sensory feedback
  • The magnitude and sign of the error are presented
• Action specification (move closer to the target)
• Visual feedback
• Cerebellum is responsible
• Learning is faster, forgetting is faster
• Sensory remapping

Reward-based learning (reinforcement learning)
• Reward prediction error
  • Maximize reward
  • Minimize punishment
  • Success/failure
• Action selection
  • Move around (explore)
  • Trial-and-error search
• No visual feedback
• Basal ganglia are responsible
• Learning is slower, forgetting is slower too
• Reach variability is high

4
Error correction: Feedback of errors can be used to directly improve
performance. In supervised or error-based learning the error vector gives both
the magnitude and the direction of the error, and the learning system then shifts
subsequent performance in the opposite direction, with the intention to reduce
the error on subsequent trials. The subject has clear information about the sign
and magnitude of the error, and the subject uses this to minimize error.
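
A minimal Python sketch of the error-correction rule described above. The learning rate, target, and constant perturbation are illustrative assumptions, not values from any cited study. On each trial the signed error vector is observed and the next command is shifted in the opposite direction.

```python
import numpy as np

def error_based_update(command, error, learning_rate=0.2):
    """Shift the next command against the signed error vector."""
    return command - learning_rate * error

# Hypothetical task: reach to a target while a constant offset
# (the perturbation) is added to the outcome of every movement.
target = np.array([10.0, 0.0])
perturbation = np.array([0.0, 3.0])
command = np.zeros(2)

error_sizes = []
for trial in range(50):
    outcome = command + perturbation      # what the learner observes
    error = outcome - target              # magnitude and direction are known
    error_sizes.append(np.linalg.norm(error))
    command = error_based_update(command, error)
```

Because the full error vector is available, the error shrinks by a fixed fraction on every trial in this sketch, and the command converges to the target minus the perturbation.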

Reinforcement learning: The process by which an animal or artificial system can
learn to optimize its behavior using rewards and/or punishments. The value of
actions reinforces those behaviors that maximize reward or minimize
punishment. However, the feedback reward or punishment does not dictate how
to improve performance.
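
For contrast, here is a minimal Python sketch of learning from binary success/failure feedback only. It is an illustrative policy-search rule, not the model of any cited paper. The exploration noise, reward tolerance, learning rates, and the 1-D reach angle are all assumptions. The learner never sees the sign or size of its error. It only compares the reward it received with the reward it expected (a reward prediction error) and moves toward or away from the action it just tried.

```python
import numpy as np

def reward_based_learning(target, n_trials=500, exploration_sd=2.0,
                          tolerance=2.5, learning_rate=0.5, seed=0):
    """Learn a 1-D reach angle from binary success/failure feedback only."""
    rng = np.random.default_rng(seed)
    mean_action = 0.0        # current preferred reach angle
    expected_reward = 0.0    # running estimate of the reward probability
    history = np.empty(n_trials)
    for trial in range(n_trials):
        # Explore: try an action near the current preference.
        action = mean_action + rng.normal(0.0, exploration_sd)
        # Binary feedback: success if the action lands in the reward zone.
        reward = 1.0 if abs(action - target) <= tolerance else 0.0
        # Reward prediction error: outcome minus expectation.
        rpe = reward - expected_reward
        # Move toward actions that did better than expected and
        # away from actions that did worse than expected.
        mean_action += learning_rate * rpe * (action - mean_action)
        expected_reward += 0.1 * rpe
        history[trial] = mean_action
    return history

history = reward_based_learning(target=4.0)
```

Progress in this sketch depends on exploration happening to land in the reward zone, which is one way to see why trial-and-error learning is slower and more variable than error correction.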

5
Error-based learning to understand motor adaptation and motor disorders

For references see:
Wolpert, Daniel M., Jörn Diedrichsen, and J. Randall Flanagan. "Principles of
sensorimotor learning." Nature Reviews Neuroscience 12.12 (2011): 739-751.

6
Experimental methods for motor learning and adaptation
The most common experimental methods for error-based and reward-based
learning are
• Visual perturbation (rotation)
• Force field perturbation
• Gradual visual perturbation
• Gradual force field perturbation
The errors are caused by the perturbation (external factor) or motor noise
(internal factor).
These experimental tasks involve the adjustment of an internal model to
compensate for an external perturbation.
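
As a sketch of how a visual rotation perturbation produces an error, the Python snippet below rotates the hand position about the start location to obtain the cursor position, then reports the signed angular error relative to the target. The function names, and the assumption that the start position is the origin, are illustrative.

```python
import numpy as np

def rotate_cursor(hand_xy, rotation_deg):
    """Return the cursor position for a hand position under a visual rotation."""
    theta = np.deg2rad(rotation_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return rot @ np.asarray(hand_xy, dtype=float)

def angular_error_deg(cursor_xy, target_xy):
    """Signed angle (degrees) from the target direction to the cursor direction."""
    cursor_angle = np.arctan2(cursor_xy[1], cursor_xy[0])
    target_angle = np.arctan2(target_xy[1], target_xy[0])
    diff = cursor_angle - target_angle
    # Wrap the difference into the range (-180, 180] degrees.
    return np.rad2deg(np.arctan2(np.sin(diff), np.cos(diff)))

# A hand movement aimed straight at the target, seen through a 30-degree rotation.
target = np.array([10.0, 0.0])
cursor = rotate_cursor(target, 30.0)
error = angular_error_deg(cursor, target)
```

An unadapted reach aimed straight at the target therefore shows an angular error equal to the imposed rotation. Adaptation means adjusting the internal model so that the hand is aimed in the opposite direction.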

7
Sensory prediction error dominates reward prediction error
Although reward prediction error is useful for learning and adaptation, the
change in the motor commands is driven almost entirely by sensory prediction
errors when there is high-quality sensory feedback.
Reward prediction error (RPE) is very useful if there is little or no sensory
prediction error (SPE).
Learning with RPE is slower than learning with SPE.
Izawa, Jun, and Reza Shadmehr. "Learning from sensory and reward prediction
errors during motor adaptation." PLoS Comput Biol 7.3 (2011): e1002012.

8
Reward-based learning (RPE) shows a higher degree of retention
than error-based learning (SPE), i.e. aftereffect decay is slow
Therrien, Amanda S., Daniel M. Wolpert, and Amy J. Bastian. "Effective
reinforcement learning following cerebellar damage requires a balance between
exploration and motor noise." Brain (2015): awv329.

9
Cerebellar patients show complete retention from reward-based learning (RBL),
but complete forgetting from error-based learning (EBL)
Therrien, Amanda S., Daniel M. Wolpert, and Amy J. Bastian. "Effective
reinforcement learning following cerebellar damage requires a balance between
exploration and motor noise." Brain (2015): awv329.

10
Reward-based learning (RPE) shows a higher degree of retention
than error-based learning (SPE), i.e. aftereffect decay is slow
BE = binary error, VE = vector error. BE for RBL, NA for EBL, BE+VE for both.
Error clamp: a false artificial correction in which the sensory error is 0 and
reward is provided.

All are for the healthy group.

Shmuelof, Lior, et al. "Overcoming motor “forgetting” through reinforcement of
learned actions." The Journal of neuroscience 32.42 (2012): 14617-14621a.
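
One common way to put a number on "aftereffect decay is slow", assumed here and not taken from the cited study, is a single-state model. In it the adapted state is multiplied by a retention factor on every error-clamp trial, and the error-driven term drops out because the sensory error is clamped to 0. The retention factors and the 30-degree starting adaptation in this Python sketch are hypothetical.

```python
import numpy as np

def error_clamp_decay(initial_adaptation_deg, retention, n_trials):
    """Adapted state on each error-clamp trial: x[n] = x[0] * retention**n."""
    trials = np.arange(n_trials)
    return initial_adaptation_deg * retention ** trials

# Hypothetical retention factors: closer to 1 means slower forgetting.
slow_decay = error_clamp_decay(30.0, retention=0.99, n_trials=50)
fast_decay = error_clamp_decay(30.0, retention=0.90, n_trials=50)
```

Under this assumption, a higher degree of retention corresponds to a retention factor closer to 1, so more of the learned adaptation remains after the same number of clamp trials.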

11
Reward caused greater memory retention, and punishment led to faster learning
Reward-based feedback during adaptation led subsequently to greater retention
when the directional feedback was fully withdrawn (no vision). Previous work has
shown that positive reinforcement can influence both online (retention across
trials) and offline (retention across time) motor retention.
Galea, Joseph M., et al. "The dissociable effects of punishment and reward on
motor learning." Nature neuroscience 18.4 (2015): 597-602.

12
Reward caused greater memory retention

Galea, Joseph M., et al. "The dissociable effects of punishment and reward on motor
learning." Nature neuroscience 18.4 (2015): 597-602.
13
Combination of reward and sensory (error) feedback accelerates
learning compared with either form of feedback alone.

Nikooyan, Ali A., and Alaa A. Ahmed. "Reward feedback accelerates motor
learning." Journal of neurophysiology 113.2 (2015): 633-646.

14
Combination of reward and sensory (error) feedback increases
learning performance and minimizes reach variability.

BE = binary error = RPE
VE = vector error = SPE

All are for the healthy group.

Shmuelof, Lior, et al. "Overcoming motor “forgetting” through reinforcement of
learned actions." The Journal of neuroscience 32.42 (2012): 14617-14621a.

15
Error-based learning (SPE) is very effective only for simple
learning, for example correcting distance and angle errors.
Reward-based learning (RPE) is important for complex
learning, for example correcting kinematic errors.
To do: search for supporting experimental studies.

16
Baseline performance with damaged cerebellum
Cerebellar patients do not differ from the control group in their baseline
performance, with or without sensory feedback.
Error variability may be high in cerebellar patients, i.e. a large standard
deviation of errors. Note: the baseline task is performed after training.

Izawa, Jun, Sarah E. Criscimagna-Hemminger, and Reza Shadmehr. "Cerebellar
contributions to reach adaptation and learning sensory consequences of action." The
Journal of neuroscience 32.12 (2012): 4230-4239.

Henriques, Denise YP, et al. "The cerebellum is not necessary for visually driven
recalibration of hand proprioception." Neuropsychologia 64 (2014): 195-204.

Synofzik, Matthis, Axel Lindner, and Peter Thier. "The cerebellum updates predictions
about the visual consequences of one's behavior." Current Biology 18.11 (2008): 814-818.

17
Cerebellar disorders show impaired motor learning
Cerebellar disorders show impaired motor learning in both visual and force field
adaptation. The cerebellum plays an important role in adaptation to visuomotor
(VM) and force field (FF) perturbations. Since the model lacks a cerebellum,
errors in the model are larger.

Donchin, Opher, et al. "Cerebellar regions involved in adaptation to force
field and visuomotor perturbation." Journal of neurophysiology 107.1
(2012): 134-147.
18
Learning despite a damaged cerebellum – gradual adaptation
Most studies agree that there is very weak or no error-based learning if the
cerebellum is damaged. However, recent studies show that there is gradual
error-based learning despite a damaged cerebellum.
Gradual motor adaptation in cerebellar patients is very similar to that of the
control group.

Gradual visual rotation of 5 degrees every 20 trials, with a maximum
perturbation of 30 degrees. Both the target and the cursor were visible.
Izawa, Jun, Sarah E. Criscimagna-Hemminger, and Reza Shadmehr. "Cerebellar
contributions to reach adaptation and learning sensory consequences of action."
The Journal of neuroscience 32.12 (2012): 4230-4239.

19
Learning despite a damaged cerebellum – gradual adaptation
Aftereffects following reaching with a gradually rotated cursor were
similar across the control and cerebellar patient groups.

Gradual visual rotation of 0.75 degrees on every trial, with a maximum
perturbation of 30 degrees. For the aftereffect, only the target is visible.
Henriques, Denise YP, et al. "The cerebellum is not necessary for visually
driven recalibration of hand proprioception." Neuropsychologia 64 (2014): 195-204.
20
Learning despite a damaged cerebellum – gradual adaptation
Cerebellar patients and healthy controls showed learning under error-based
feedback (the cursor was presented). Note: patients did not show an aftereffect
when the visual feedback and reward were removed, so the learning comes from
online visual feedback; it is not a function of the cerebellum.

Gradual visual rotation of 1 degree every 20 trials, with a maximum
perturbation of 15 degrees. Both the target and the cursor were visible for EB.
Therrien, Amanda S., Daniel M. Wolpert, and Amy J. Bastian. "Effective
reinforcement learning following cerebellar damage requires a balance between
exploration and motor noise." Brain (2015): awv329.

21
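
The three gradual-rotation protocols quoted on the preceding slides differ only in step size, trials per step, and maximum rotation, so they can be compared with one small Python helper. This is a sketch: the function name is hypothetical, and it assumes the first step is applied from the first perturbation trial, which the slides do not state. The trial counts are chosen only so that each schedule reaches its plateau.

```python
import numpy as np

def gradual_rotation_schedule(step_deg, trials_per_step, max_deg, n_trials):
    """Rotation (degrees) on each trial for a stepwise gradual perturbation."""
    trials = np.arange(n_trials)
    steps_taken = trials // trials_per_step + 1   # assumption: step 1 starts at trial 0
    return np.minimum(steps_taken * step_deg, max_deg)

# Parameters as quoted on the slides; n_trials is an arbitrary choice here.
izawa_like = gradual_rotation_schedule(5.0, 20, 30.0, n_trials=200)
henriques_like = gradual_rotation_schedule(0.75, 1, 30.0, n_trials=100)
therrien_like = gradual_rotation_schedule(1.0, 20, 15.0, n_trials=400)
```

Laying the schedules side by side makes the difference in per-trial error size explicit: 5 degrees per step in the first protocol versus 1 degree or less in the other two.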
Contradiction with the previous one.
No learning with a damaged cerebellum: no gradual adaptation
Here, when visual feedback was removed (the cursor was not visible), they
showed that there is no gradual adaptation to a 20-deg visuomotor rotation
introduced either in a single large step or in a series of smaller 5-deg steps.
The ataxic group exhibited a comparable deficit in both conditions.

Schlerf, John E., et al. "Individuals with cerebellar degeneration show similar
adaptation deficits with large and small visuomotor errors." Journal of
neurophysiology 109.4 (2013): 1164-1173.

Compare this with:
Izawa, Jun, Sarah E. Criscimagna-Hemminger, and Reza Shadmehr. "Cerebellar
contributions to reach adaptation and learning sensory consequences of action."
The Journal of neuroscience 32.12 (2012): 4230-4239.

22
Effects of experimental methods on motor learning and adaptation
Different experimental results may contradict each other because of the type
of experiment. Example:
1. Cerebellar patients showed gradual visuomotor adaptation.
Method: cursor feedback was provided throughout the reaching movement.
Izawa, Jun, Sarah E. Criscimagna-Hemminger, and Reza Shadmehr.
"Cerebellar contributions to reach adaptation and learning sensory
consequences of action." The Journal of neuroscience 32.12 (2012): 4230-4239.
2. Cerebellar patients have deficits in adapting their reaching movement to
a gradual visuomotor rotation.
Method: only endpoint feedback was provided.
Schlerf, John E., et al. "Individuals with cerebellar degeneration show similar
adaptation deficits with large and small visuomotor errors." Journal of
neurophysiology 109.4 (2013): 1164-1173.
The difference between 1 and 2 may arise from online feedback correction.

23