
ENPM808F:

Robot Learning
Summer 2017

Lecture 2:
Memory-Based Learning
Course Outline

•  Motor Learning and the Evolution of Intelligence

•  Memory-Based Learning

•  Behavior-Based Robotics

•  Reinforcement Learning

•  Value versus Policy Iteration

•  Q-Learning and Actor-Critic Models

•  Robot Shaping and Evolving Behaviors

•  Crossing the Reality Gap

•  Imitation and Learning from Demonstration

•  Deep Reinforcement Learning with CNNs

•  On-line and Lifelong Learning

Global vs. Local Learning

Global Neural Network Learning (e.g., multilayer perceptron network)
utilizes a completely distributed weight representation, thus requiring
an update of all network weights per pattern per iteration.

Local Neural Network Learning (e.g., CMAC, RBF, LWR) utilizes a
locally distributed weight representation, thus requiring updates to only
a small subset of all network weights.
Global Learning

Advantages:
•  Compact Representation
•  Automatic Resource Allocation
•  Generally Continuous and Differentiable Mappings

Disadvantages:
•  Very Slow Convergence
•  Unpredictable Local Minima (may not converge to global minimum)
•  Computationally Expensive
•  Generalization Not Easily Controllable
•  Comparatively Poor Accuracy (on some problems)

Local Learning

Advantages:
•  Rapid Convergence
•  Computationally Inexpensive
•  No Local Minima
•  Convergence Guaranteed
•  Very High Accuracy

Disadvantages:
•  Memory Intensive
•  Resource Allocation Not Automatic
•  Continuity and Differentiability of Mapping More Difficult to Guarantee
Secant Approximation to Tangent

Continuous CMAC
Curse of Dimensionality
vs.
Blessing of Non-Uniformity*
* --- Pedro Domingos

The Curse of Dimensionality in machine learning refers to the effect that
many algorithms that work well in low dimensions become intractable in
higher dimensions.

The Blessing of Non-Uniformity refers to the fact that many problems, and
therefore input spaces, which are high-dimensional can be represented
using a lower-dimensional manifold or representation.

Lazy Learning

versus

Eager Learning
Lazy versus Eager Learning

Lazy Learning methods store all of the training data and use it only
when called with a new input vector (query) to perform a
mapping. They make no assumptions about the overall
shape of the global mapping before the query is presented.
Also referred to as Instance-Based Learning methods.

Examples include: k-Nearest Neighbors, Locally Weighted Regression,
and Case-Based Reasoning
Lazy versus Eager Learning

Eager Learning methods construct an approximate representation
of the global function before receiving a query. They are
therefore limited to generating a single global
approximation to the target mapping.

Examples include: Multilayer Perceptrons (e.g., Backprop Networks),
RBFs, Decision Trees, CMAC, …
k-Nearest Neighbor

Training Examples $\langle x_i, f(x_i) \rangle$


k-Nearest Neighbor

Query $x_q$
k-Nearest Neighbor

Query $x_q$
k-Nearest Neighbor

k = 6

$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$
k-Nearest Neighbor

Requires:
•  Training exemplars and queries map to points in $\Re^n$
•  Small input vectors
•  Sufficient density of training data to cover areas of interest

Advantages:
•  No information is lost
•  Fast training
•  Can model highly complex surfaces

Disadvantages:
•  Additional computational complexity to answer queries
•  Weights all input attributes equally
•  Suffers from the Curse of Dimensionality
k-Nearest Neighbor Algorithm
(Real-valued)

•  Store all training examples $\langle x_i, f(x_i) \rangle$

•  Given query $x_q$, calculate the mean of the values of the $k$ nearest neighbors

$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}$

•  Note that there are no weights to update
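The algorithm above is small enough to sketch directly. Below is a minimal pure-Python illustration; the function name `knn_regress` and the toy data are illustrative assumptions, not from the lecture:

```python
import math

def knn_regress(query, examples, k=3):
    """Real-valued k-NN: predict f(x_q) as the mean f-value of the
    k stored exemplars nearest to the query point."""
    # Sort stored exemplars by Euclidean distance to the query.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Unweighted mean of the neighbors' target values; no weights to update.
    return sum(f_x for _, f_x in nearest) / len(nearest)

# Toy 1-D exemplars sampled from f(x) = 2x
data = [((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)]
print(knn_regress((1.1,), data, k=2))  # averages f(1.0) and f(2.0) -> 3.0
```

Note that "training" is just storing `data`; all work happens at query time, which is exactly the lazy-learning trade-off described earlier.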
Distance Weighted
k-Nearest Neighbor Algorithm
(Real-valued)

Neighbors are weighted based upon their distance from the query point $x_q$

Define the distance $d(x_q, x_i)$ between $x_q$ and $x_i$

Define weights $w_i \equiv \dfrac{1}{d(x_q, x_i)^2}$

Then given query $x_q$,  $\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
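A sketch of the distance-weighted variant, assuming the inverse-squared-distance weighting above (names and toy data are illustrative):

```python
import math

def dw_knn_regress(query, examples, k=3):
    """Distance-weighted k-NN with weights w_i = 1 / d(x_q, x_i)^2."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    num = den = 0.0
    for x, f_x in nearest:
        d = math.dist(query, x)
        if d == 0.0:
            return f_x  # query coincides with a stored exemplar
        w = 1.0 / d ** 2
        num += w * f_x
        den += w
    return num / den  # weighted mean: sum(w_i f(x_i)) / sum(w_i)

data = [((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0)]
print(dw_knn_regress((0.5,), data, k=2))  # equidistant neighbors -> 1.0
```

The zero-distance guard is needed because the weight $1/d^2$ is undefined when the query lands exactly on a stored point.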
Shepard’s Method

•  Store all training examples $\langle x_i, f(x_i) \rangle$

•  Given query $x_q$, calculate the weighted sum of the values of all $n$ points

$\hat{f}(x_q) \leftarrow \dfrac{\sum_{i=1}^{n} w_i f(x_i)}{\sum_{i=1}^{n} w_i}$

•  Note that since all points are used, Shepard's Method is a global learning
algorithm
Nearest Neighbor vs
Locally Weighted Regression

k-Nearest Neighbor
(Discrete)

Distance Weighted
k-Nearest Neighbor
(Continuous)

Locally Weighted
Regression
Locally Weighted Regression

LWR is a Lazy Learning method in which an approximation $\hat{f}(x)$ is
formed around each query point $x_q$

•  It is Local since only the points near $x_q$ are used.

•  It is Weighted since the influence of the points is determined by their
distance from the query point.

•  It is Regression since $\hat{f}(x)$ approximates a real-valued target function.

We can approximate the target function as a linear combination of the
m weighted attributes of the training points

$\hat{f}(x) = w_0 + w_1 a_1(x) + \cdots + w_m a_m(x)$
Locally Weighted Linear Regression

[Figure: unweighted averaging vs. locally weighted averaging, each visualized using springs]
Locally Weighted Linear Regression

To calculate the squared error over the k nearest neighbors, define the error function

$E_1(x_q) \equiv \frac{1}{2} \sum_{x \in \mathrm{kNN}(x_q)} \left( f(x) - \hat{f}(x) \right)^2$

To weight the influence of points based upon distance, define a kernel function, for example

$K(d) \equiv \frac{1}{d}$

$K(d) \equiv \frac{1}{d^2}$

$K(d) \equiv e^{-d^2}$

To minimize the weighted error across the entire training set

$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))$
Locally Weighted Linear Regression

Some typical kernel functions (from Atkeson et al., 1997a)
Locally Weighted Linear Regression

To minimize the weighted error across the entire training set

$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))$

To minimize the weighted error across the k nearest neighbors

$E_3(x_q) \equiv \frac{1}{2} \sum_{x \in \mathrm{kNN}(x_q)} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))$

The weight update rule can then be expressed

$\Delta w_j = \eta \sum_{x \in \mathrm{kNN}(x_q)} K(d(x_q, x)) \left( f(x) - \hat{f}(x) \right) a_j(x)$

with learning rate $\eta$ for each attribute $a_j$ of the input vector $x$
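The gradient-style update above can be sketched in pure Python. This is a minimal illustration, assuming a Gaussian-like kernel $K(d) = e^{-d^2}$, a 1-D local linear model, and the variant that sums over the whole kernel-weighted training set; all names are mine:

```python
import math

def lwr_gradient_step(query, examples, weights, eta=0.05):
    """One locally weighted update step: for each training point x,
    accumulate eta * K(d(x_q, x)) * (f(x) - f_hat(x)) * a_j(x)."""
    grads = [0.0] * len(weights)
    for x, f_x in examples:
        kern = math.exp(-math.dist(query, x) ** 2)  # K(d) = e^{-d^2}
        f_hat = weights[0] + sum(w * a for w, a in zip(weights[1:], x))
        err = kern * (f_x - f_hat)
        grads[0] += err                              # bias attribute a_0 = 1
        for j, a in enumerate(x):
            grads[j + 1] += err * a
    return [w + eta * g for w, g in zip(weights, grads)]

# Fit a local line around x_q = 1.0 on data sampled from f(x) = 2x
data = [((0.0,), 0.0), ((1.0,), 2.0), ((2.0,), 4.0)]
w = [0.0, 0.0]
for _ in range(2000):
    w = lwr_gradient_step((1.0,), data, w)
# The local model w[0] + w[1]*x should now predict f(1.0) close to 2.0
```

In practice the weighted least-squares problem is usually solved in closed form rather than by gradient descent; the iterative form above simply mirrors the update rule on the slide.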
Locally Weighted Nonlinear Regression

Locally weighted regression techniques may be extended to utilize
nonlinear local support functions
(e.g., quadratic, cubic polynomial)
through use of widely supported
curve fitting techniques from
statistical analysis.

LOESS (Locally Estimated Scatterplot
Smoothing) and LOWESS (Locally
Weighted Scatterplot Smoothing) are
efficient nonparametric methods for
fitting models to subsets of data.
Radial Basis Function Networks

Radial Basis Function Networks (RBFs) are a type of
neural network closely related to distance-weighted
regression.

They form global approximations of the target
functions using a linear combination of basis
functions weighted by distance from the points of
interest.
Radial Basis Function Network

[Figure: RBF network diagram with input attributes $(a_1(x), a_2(x), \ldots, a_n(x))$]
Radial Basis Func=on Networks

$\hat{f}(x) = w_0 + \sum_{u=1}^{k} w_u K_u(d(x_u, x))$

The kernel function $K_u$ is defined so that
it decreases with increasing distance.

A common choice for $K_u$ is the
Gaussian function

$K_u(d(x_u, x)) = e^{-\frac{d^2(x_u, x)}{2\sigma_u^2}}$

where $\sigma_u^2$ denotes the variance of the Gaussian centered at $x_u$
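The forward pass above is a weighted sum of Gaussian bumps; a minimal pure-Python sketch (function and parameter names are illustrative):

```python
import math

def rbf_predict(x, centers, sigmas, weights, w0=0.0):
    """f_hat(x) = w0 + sum_u w_u * exp(-d^2(x_u, x) / (2 * sigma_u^2))."""
    total = w0
    for c, s, w in zip(centers, sigmas, weights):
        d2 = sum((ci - xi) ** 2 for ci, xi in zip(c, x))  # squared distance
        total += w * math.exp(-d2 / (2.0 * s ** 2))
    return total

# A Gaussian unit centered exactly on the query contributes its full weight:
print(rbf_predict((0.0,), centers=[(0.0,)], sigmas=[1.0], weights=[3.0]))  # -> 3.0
```

Each hidden unit responds strongly only near its center $x_u$, which is what makes the representation locally distributed even though the resulting approximation is global.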
RBF Network Training

RBF networks are trained in a two-stage process.

First, the number of hidden units is determined (often the same as
the number of points in the training set). Their centers are
initialized (generally to the training points), and their variances
initialized to correspond with the chosen kernel function.

Second, the weights are trained using the global error function

$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))$

or a localized error function, e.g.

$E_3(x_q) \equiv \frac{1}{2} \sum_{x \in \mathrm{kNN}(x_q)} \left( f(x) - \hat{f}(x) \right)^2 K(d(x_q, x))$
RBF Network Training

The hidden layer nodes may
alternatively be initialized using the
EM (Expectation Maximization)
Algorithm, an unsupervised
clustering technique for fitting data
to a mixture of Gaussians.

Only the input vectors are used to
determine the cluster centers. As
EM is an unsupervised clustering
technique, the target vectors
(outputs) are not used or needed.
RBF Network Training

The outputs, however, are used to determine the weights connecting
the hidden layer nodes to the output. Since the output(s) are a linear
combination of these inputs, there are numerous linear regression-based
techniques for efficiently optimizing these weights.
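This second stage can be sketched under the simplest assumption: one hidden unit per training point, so the kernel matrix is square and the weights solve $\Phi w = y$ exactly. The helper names (`gauss_solve`, `train_rbf_weights`) are mine, not from the lecture:

```python
import math

def gauss_solve(A, b):
    """Tiny Gaussian-elimination solver with partial pivoting
    (illustration only, not numerically robust)."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gaussian_kernel(a, b, sigma):
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def train_rbf_weights(xs, ys, sigma=1.0):
    """Stage two of RBF training: centers fixed at the training points,
    weights found by solving the linear system Phi w = y."""
    Phi = [[gaussian_kernel(x, c, sigma) for c in xs] for x in xs]
    return gauss_solve(Phi, ys)

xs = [(0.0,), (1.0,)]
ys = [1.0, 2.0]
w = train_rbf_weights(xs, ys)
# With one unit per point, the trained network reproduces the targets exactly:
preds = [sum(wu * gaussian_kernel(x, c, 1.0) for wu, c in zip(w, xs)) for x in xs]
```

With fewer units than points the system is overdetermined and the weights would instead come from least squares, which is the regression step the slide refers to.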


Locally Weighted Learning
for Robotic Control

Locally weighted learning for control utilizes Memory Based (Lazy)
Learning methods for constructing local models from data.

A key concept is that rather than building a global model up front,
simply store all of the data.

When a query is presented, use the data near the query point to
construct a local model to answer the query. This model is then
discarded.

Locally Weighted Learning methods specify how models may be
built using Lazy Learning methods (such as LWR), but not how
they are used to construct learning controllers.
Locally Weighted Learning
for Robotic Control

First we will consider temporally independent control tasks, such
as setpoint control and vehicle trajectory following, such that

$y = f(x, u) + \mathrm{noise}$

for output vector $y$, state vector $x$, and control vector $u$.

The control task is to choose $u$ so that the outcome is $y$.

Use Lazy Learning to infer a model $\hat{f}$ which approximates $f$.
Locally Weighted Learning
for Inverse Models

Inverse model based control techniques use states and desired
outcomes to predict the control inputs necessary to achieve
the desired outcomes.

$u = \hat{f}^{-1}(x, y)$

[Figure: learned database implementing the inverse model]
Locally Weighted Learning
for Inverse Models

Pros:

•  The database is “trained” by adding new points $(x, u, y)$

•  If there is a monotonic relationship between $u$ and $y$, then there
are efficient methods for rapidly converging on the correct mapping

Cons:

•  May not work if
Ø  Vector space of actions and outcomes is not the same
Ø  Mapping is not one-to-one
Ø  Data include misleading noisy observations

Locally Weighted Learning
for Inverse Models

Inverse model fails to accurately predict the control action for the
desired outcome on a non-monotonic function.
Locally Weighted Learning
for Forward Models

Forward model based control techniques use states and control
inputs to predict outcomes.

$y = \hat{f}(x, u)$

[Figure: learned database implementing the forward model]
Locally Weighted Learning
for Forward Models

Pros:

•  The database is “trained” by adding new points $(x, u, y)$
•  Allows “mental simulation,” or prediction of the effects of different
actions

Cons:

•  Requires search of the database to find the action that corresponds
to the desired outcome for the current state.
Combining Inverse and
Forward Models


An Inverse Model may be used to generate a good starting point
for search of a Forward Model.

$u_0 = \hat{f}^{-1}(x, y_d)$

$u_0$ may then be used with a Lazy Forward Model

$\hat{y} = \hat{f}(x, u_0)$

If $\hat{y}$ is close to $y_d$, then Newton’s Method may be used to
further refine $u$.
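The refinement step can be sketched with a 1-D Newton iteration against the forward model. The forward model used here ($y = x + u^2$) and all names are hypothetical, purely for illustration:

```python
def refine_action(forward, x, u0, y_desired, tol=1e-8, max_steps=20):
    """Newton's Method on g(u) = forward(x, u) - y_desired, starting from
    the inverse model's suggestion u0; the derivative is taken numerically."""
    u = u0
    for _ in range(max_steps):
        y = forward(x, u)
        if abs(y - y_desired) < tol:
            break
        h = 1e-6
        # Central-difference estimate of dy/du at the current action.
        dy_du = (forward(x, u + h) - forward(x, u - h)) / (2.0 * h)
        u -= (y - y_desired) / dy_du
    return u

# Hypothetical forward model y = x + u^2 (monotonic in u for u > 0)
forward = lambda x, u: x + u * u
u = refine_action(forward, x=1.0, u0=1.0, y_desired=5.0)
print(u)  # refines toward u = 2, since 1 + 2^2 = 5
```

The inverse model supplies $u_0$ close enough to the answer that Newton's Method converges quickly, which is exactly the division of labor the slide describes.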
Locally Weighted Learning
for Robotic Control

Next we will consider temporally dependent control tasks such that

$x(t+1) = f(x(t), u(t))$

The task is then to regulate (control) the state to achieve a desired
setpoint $x_d$, or a sequence of values to create a trajectory
$x_d(1), x_d(2), x_d(3), \ldots$

The database is “trained” by adding triples of the form $(x_t, u_t, x_{t+1})$

$x_t$: current state
$u_t$: current control input
$x_{t+1}$: next state
Deadbeat Control
for Devil Sticking

One-step deadbeat control chooses actions to cause the immediate
next state to be the desired next state. If the next state is attainable
in one step, the action may be chosen without regard to future
states, decisions, or performance.

Atkeson et al. (1997) applied deadbeat control to learn the Devil
Sticking task.

First, an Inverse Model $u_t = \hat{f}^{-1}(x_t, x_{t+1})$ was learned
(database populated from sufficient exploration).

Next, given a desired state $x_d$, the database was queried to
determine $u_t = \hat{f}^{-1}(x_t, x_d)$

Locally Weighted Learning
for Robotic Control

Devil sticking robot link, by Professor Chris Atkeson:

YouTube Link: Devil Sticking Robot Video


Locally Weighted Learning
for Robotic Control

Deadbeat Control will fail if the dynamics of the plant being
controlled require more than a single step of lookahead.
Lazy Learning may be applied to more complex control
methods (e.g., LQR, Nonlinear Optimal Control).

The use of linear regression models often (with sufficient data)
guarantees the existence of derivatives, which may be easily
calculated from the models through numerical differentiation.

Learning methods for Nonlinear Optimal Control techniques
(e.g., value and policy iteration) often fall under Reinforcement
Learning techniques and will be visited later in the course.


References
•  Albus, James S., “A New Approach to Manipulator Control: The Cerebellar Model
Articulation Controller,” Journal of Dynamic Systems, Measurement, and Control,
pp. 225, September 1975.

•  Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning,”
Artificial Intelligence Review, 11(1-5): 11-73, 1997.

•  Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning for Control,”
Artificial Intelligence Review, 11(1-5): 75-113, 1997.

•  Domingos, P., “A Few Useful Things to Know About Machine Learning,”
Communications of the ACM, 55(10): 78-87, 2012.

•  Mitchell, T., “The Discipline of Machine Learning,” CMU-ML-06-108
(CMU Technical Report), July 2006.

•  Mitchell, T., Machine Learning, Chapter 8, McGraw-Hill Education, 1997.
Reading Assignments
•  Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning,”
Artificial Intelligence Review, 11(1-5): 11-73, 1997.

•  Atkeson, C. G., Moore, A. W., and Schaal, S., “Locally Weighted Learning for
Control,” Artificial Intelligence Review, 11(1-5): 75-113, 1997.

•  Brooks, Rodney A., “Intelligence Without Representation,” Artificial Intelligence,
47(1): 139-159, 1991.

•  Arkin, Ronald, “Motor Schema Based Navigation for a Mobile Robot: An Approach
to Programming by Behavior,” Proceedings of the 1987 IEEE International
Conference on Robotics and Automation, Vol. 4, IEEE, 1987.

•  Wahde, Mattias, and Wolff, Krister, “Behavior-Based Robotics,” 1997.
On-line (http://www.am.chalmers.se/~wolff/AA/Chapter3.pdf)
Homework Assignment #2
Due by 4pm, 06/20

1)  Program a Discrete CMAC and train it on a 1-D function (ref: Albus 1975, Fig. 5).
Explore the effect of overlap area on generalization and time to convergence.

2)  Program a Continuous CMAC by allowing partial cell overlap, and modify
the weight update rule accordingly. Compare the output of the Discrete CMAC
with that of the Continuous CMAC.

3)  Discuss how you might use recurrent connections to train a CMAC to output
a desired trajectory without using time as an input (e.g., state only).
