
Developmental Psychology Copyright 1996 by the American Psychological Association, Inc.

1996, Vol. 32, No. 5, 811-823 0012-1649/96/$3.00

Learning to Reach: A Mathematical Model

Neil E. Berthier
University of Massachusetts at Amherst

This article presents a mathematical model of the development of reaching that assumes that the
major problem facing infants is their lack of lower level motor control and that infants learn to adjust
their reaching strategies as a consequence of their previous experience and to match their current
level of control. The model hypothesizes that infant reaches are a series of submovements, with the
goal being to get the hand to the target in the face of errors in executed submovements. To relate
actual infant reaches to this model, reaching data were decomposed into submovements, using a
polynomial fitting algorithm that assumed minimum-jerk submovements. The model makes
quantitative predictions about the course of development that are supported by existing results. The
validity of the model's underlying assumptions was assessed by comparing the directional variability
of the submovements with the variability assumed in the model.

The author wishes to thank Andy Barto, Rachel Clifton, Michael McCarty, and David Rosenbaum for their helpful and insightful comments on this work, and to Daniel Robin and Daniel McCall for their help in the collection of data. This research was supported by NSF grant SBR 9410160 and NIH grant HD 27714.
Correspondence concerning this article should be addressed to Neil E. Berthier, Department of Psychology, Tobin Hall, University of Massachusetts, Amherst, Massachusetts 01003. Electronic mail may be sent via Internet to berthier@psych.umass.edu.

The development of motor control has increasingly been viewed as a significant problem in cognitive development (Lockman & Thelen, 1993). To succeed at reaching, infants must integrate visual, proprioceptive, and auditory input (Clifton, Rochat, Robin, & Berthier, 1994) and accommodate their movements to the external constraints of the task, both in terms of planning movements that will uncover objects or avoid obstacles and in terms of using appropriate muscle forces for the situation (Goldfield, Kay, & Warren, 1993; Thelen, 1994). Infants succeed even though they lack the fine motor control of adults and operate with a neuromuscular system that is continuously changing in size, strength, and neural connectivity. Even with these substantive difficulties, most infants successfully reach by about 16 weeks of age and are proficient at grasping small objects in the second half of the first year of life.

Although the study of motor development has long been a part of developmental psychology (e.g., Gesell, 1929; McGraw, 1943), von Hofsten (1979) was the first to use modern equipment to examine the detailed kinematics of infant reaching. Von Hofsten focused his analysis on the hand-speed profiles of infant reaches. A hand-speed profile is a plot of the speed of the hand as a function of time during the reach. He found that infant and adult hand-speed profiles differ in fundamental ways. In simple reaching situations adults will reach with a single acceleration and deceleration of the hand, but von Hofsten found that infants show multiple accelerations and decelerations during reaches. Von Hofsten labeled these accelerations and decelerations of the hand movement units. Von Hofsten's work has been confirmed and extended by Fetters and Todd (1987) and Mathew and Cook (1990). More recently, von Hofsten (1991) longitudinally studied 5 infants and found that reaching trajectories were relatively straight within movement units, that the number of movement units decreased with age, and that the first movement unit of the reach lengthened and started to dominate the reach until the hand-speed profiles approached the appearance of the adult.

Von Hofsten (1991, 1993) has suggested that movement units are actually the elementary action units of infant reaches. Because each movement unit is defined by the acceleration of the hand, it reflects an expenditure of energy and may be the result of a control action on the part of the infant. Further support is provided by results showing that the hand usually changes direction between but not within movement units (Fetters & Todd, 1987; von Hofsten, 1979, 1991, 1993). However, Mathew and Cook (1990) showed that when movement units are defined as by von Hofsten (1991), there are a significant number of direction changes within movement units. Von Hofsten (1993) also suggested that action units of the reach are not the output of a single motor program because movement units have highly variable shape.

Recently, others have focused on the dynamics of infant reach. Thelen et al. (1993) studied infants around the time of onset of reaching and confirmed von Hofsten's (1991) basic results. Unlike von Hofsten (1991), who focused on the common features of development, Thelen et al. (1993) emphasized that different infants show different developmental paths. Thelen et al. (1993) found that 2 of their infants moved very slowly with highly damped movements at the onset of reaching and that 2 of their infants initially moved rapidly and energetically. By 22 weeks of age, all 4 infants showed similar reaching styles; the slow-moving infants had speeded up their reaches, whereas the energetic infants had slowed their reaches. Thelen et al. (1993) concluded that the central nervous system does not contain rigid programs that detail the kinematics of early reaching but that development of reaching is the result of the infant matching the dynamics of the arm to the task.

Other data show that infant reaching is not simply a neural program that is triggered by the presence of a goal object, but that infants match the kinematics of their reaches to the task and their goals. Figure 1 shows two hand movements of a 6-


[Figure 1: Panels A and B; x-axis: Time (ms); traces labeled Bat and Reach]

Figure 1. Batting and reaching speed profiles for movements by a 6-month-old infant. The end of the time
series is the time of contact for each of the movements. A: Both speed profiles are displayed on the same
scale. The two movement units in the bat correspond to moving the hand back to the body and moving the
hand from the body to contact the toy. B: A plot of the distance of the hand from the marker on the toy for
the reach and bat. The reach occurred first, and the time delay from the end of the reach to the start of the
bat was compressed on the figure, but was 1 s during the trial. The distance to the toy at the beginning and
end of the bat is not equal because of slight movement of the toy and because the distance is measured to
the position of the marker on the toy, not the toy itself. These data were obtained from the experiment
described in Experiment 1.
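
As a rough illustration of how a hand-speed profile such as the one in Figure 1 can be obtained from sampled marker positions and divided into movement units, the following Python sketch may help. It is not the procedure used in the article: the function names, the example numbers, and the rule of cutting at every interior local minimum of speed are assumptions made for illustration, and real data would first need smoothing and a criterion for ignoring shallow minima.

```python
import math

def speed_profile(positions, dt):
    """Hand speed between successive 3-D samples (distance units per second)."""
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]

def movement_units(speeds):
    """Split a speed profile into movement units at interior local minima.

    Returns (start, end) index pairs; each unit spans one acceleration
    followed by one deceleration of the hand.
    """
    cuts = [i for i in range(1, len(speeds) - 1)
            if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]]
    bounds = [0] + cuts + [len(speeds) - 1]
    return list(zip(bounds, bounds[1:]))

# Hypothetical speed profile (mm/s) with two humps, loosely like the bat.
example = [0, 200, 400, 200, 50, 300, 900, 300, 0]
units = movement_units(example)  # two units, split at the speed valley
```

With positions sampled at 100 Hz, `dt` would be 0.01 s.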

month-old that resulted in contact with a plastic toy held in front of the infant. One movement, labeled reach, started with the hand close to the body and resulted in orientation of the hand with the toy and grasp of the object. The other movement, labeled bat, was a backward and forward movement of the hand that started with the hand in contact with the toy. During the bat the infant moved the hand back to the body to approximately the same position as the beginning of the reach and then moved the hand at high speed to momentarily contact the object. The two movement units constituting the bat correspond to the hand moving backward to the body and forward to the toy, respectively. In this particular experimental session, the infant made several reaches and bats. When the contact resulted in grasp of the toy, movement kinematics resembled those labeled reach; when the toy was approached open-handed and contacted momentarily, the kinematics resembled those labeled bat. This example shows that infants use different kinematics when the goal of the reach is different.

Figure 1 suggests that there is something about the task of reaching that leads the infant to adopt multiple-movement-unit hand movements. Figure 1 shows that it is clearly possible for the infant to move the hand to the toy in a single movement unit because that is what the infant does in the bat. One obvious difference between batting and reaching is that at the end of a reach the infant is in a position to grasp the toy, but at the end of a bat the hand is not in a position to grasp the toy because it is moving too rapidly. One hypothesis developed in this article is that infants show multiple-movement-unit reaches because they are the most kinematically efficient way to move the hand to grasp objects.

Thelen et al. (1993) proposed that the tailoring of movement kinematics to the task is one of soft assembly. Soft assembly means that the reacher marshals the dynamics of the body and the environment to create a movement that is specific to the task at hand. This means that function imposes constraints on the action, that the behavior is self-organizing and optimizing, and that the system discovers good solutions according to task demands. Von Hofsten (1993) has also described the problem of motor learning as a search of "task space" in which the infants explore the results of their actions and choose reaches that are the smoothest or most economical.

The main goal of this article is to propose a mathematical model of the development of reaching. The proposed model builds on the work of von Hofsten, Thelen, and others and makes predictions about the kinematics that should be observed at various times in development. The model hypothesizes that infants use multiple movement units in reaching because such reaches are relatively efficient given the state of their developing neuromuscular system. The generation of reaches is also hypothesized to occur through the interaction of the infant with the environment, using a trial and evaluation method that leads the infant to discover an optimal structuring of the reach. Data are presented that support the basic assumptions of the model.

Model

Infant reaching is modeled as a stochastic, or probabilistic, optimal control problem. Meyer, Abrams, Kornblum, Wright, and Smith (1988) have developed a similar model of adult pointing experiments. In a classic experiment, Fitts (1954) found that adults adjust the speed of reaching to the accuracy of movement; increasing demands for accuracy result in decreases in reach velocities. To explain these data Meyer et al. assumed that adults attempt to move to a target zone in such a way as to minimize total movement time and that they reach using any number of elementary movements. Meyer et al. further assumed that rapid movements result in more endpoint error than slow movements. The Meyer et al. model produces results that match experimental data.

The current work extends Meyer et al.'s (1988) model to infants. We assume that infant movements are composed of a sequence of elementary units, that each of the submovements has some error that depends on the speed of the submovement, and

that infants attempt to reach in a way that minimizes total movement time. The submovements of the current model differ from the movement units of von Hofsten (1979). Von Hofsten's movement units are operationally defined by an acceleration and deceleration of the hand and by definition do not overlap in time. The submovements in the current model do not require an acceleration and deceleration, and a submovement may be generated before the preceding submovement has been completed. For computational simplicity, the implementation of the model presented here assumes that a submovement is completed and the hand speed drops to zero before the next submovement is generated. It should be relatively simple to extend the current mathematical formulation to the case of overlapping submovements.

Infants are hypothesized to develop movement strategies through a process of "trial and evaluation" as modeled by connectionist (reinforcement learning) procedures. In reinforcement learning tasks, an agent takes actions in a particular environment (e.g., Barto, Bradtke, & Singh, 1995). Inherent in the task specifications is a goal that the agent is required to accomplish. Reinforcement learning algorithms that solve these tasks work by executing actions and evaluating the consequences. If the agent takes an action that makes the attainment of the goal more likely, the probability of that action in the future is increased. Conversely, if the agent takes an action that makes attainment of the goal less likely, the probability of that action is decreased. Reinforcement learning algorithms generally explore the effects of many possible actions to determine which action is the best. If a stochastic optimal control problem is addressed by a reinforcement learning algorithm, it can be proven that the reinforcement learning algorithm will find an optimal solution under certain assumptions (Watkins & Dayan, 1992). Reinforcement learning algorithms have the advantage that in nonstationary environments the algorithm will adjust the action probabilities to select actions appropriate for the current environment. In the current context, where the infant's neuromuscular system is constantly maturing and changing, reinforcement learning algorithms will select the action that is most appropriate for the current state of the infant's development.

The model specifically assumes the following:
1. Infant reaches are composed of a sequence of submovements.
2. Infants learn to reach when their cognitive, motor, and perceptual systems are undergoing rapid development. Because of the dynamics of the arm, motor commands often have outcomes that are unanticipated by the young infant. At this time, the mapping between their motor commands and movements is constantly changing. Thus, infants learn to reach when they cannot accurately predict the consequences of particular motor commands. This uncertainty decreases with age.
3. Movement strategies of infants are adapted to deal with this uncertainty, and the strategies used by infants arise through a "trial and evaluation" process. The strategies infants use track changes in their neuromuscular system and are optimal or near-optimal strategies.

The model is specified by a mathematical model of the neuromuscular system of the infant that has the key feature that the mapping between motor commands and movements is stochastic. The amount of uncertainty in the motor command to movement mapping is assumed to decrease with age and is modeled by reduction of model stochasticity. A numerical learning algorithm that works by "trial and evaluation" is used to compute the movement strategies that bring the hand most reliably to the target in minimum time. The computed strategies for movement can then be compared with the strategies used by infants at various ages.

Simulation Experiment

For simplicity, the current simulations assume that the hand is moving in a plane. The model of the lower level motor systems has a single parameter $k$, which controls the degree of stochasticity in the mapping from motor commands to movements. The state of the system at step $i = 0, 1, \ldots$ is an ordered pair $s_i = (x_i, y_i)$ specifying the planar coordinates of the hand. Each coordinate is an integer between $-40$ and $140$. The goal is to move the hand into a 10 × 10 box centered on (100, 100). The motor command generated by the controller (higher motor systems) at each step is a triple, $a_i = (dx_i, dy_i, v_i)$, with $dx_i$ and $dy_i$ being commanded distances in the x and y directions, respectively, and $v_i$ being commanded average speed; $dx_i, dy_i \in \{-60, -40, -20, \ldots, 100\}$, and $v_i \in \{80, 160, 240, \ldots, 480\}$. The present simulations use distance in mm and speed in mm/s. The motor command is applied to the model of the arm and state transitions are defined as

$$x_{i+1} = x_i + dx_i + \eta(0, \sigma^2) \qquad (1)$$

$$y_{i+1} = y_i + dy_i + \eta(0, \sigma^2), \qquad (2)$$

where $\eta(0, \sigma^2)$ is a Gaussian probability distribution with a mean of 0 and variance $\sigma^2 = k v_i^2 + .2$. $k$ is a constant between zero and one that controls the degree of stochasticity of the state transitions. Equations 1 and 2 define a model of the infant's arm in which the amount of uncertainty about intended movements increases with movement speed.

Q-learning was the particular reinforcement learning algorithm that was used to solve the stochastic optimal control problem stated above. Like other reinforcement learning algorithms, Q-learning works by trial and evaluation and has been shown to be computationally tractable. In the current context, Q-learning is not meant to be a specific model of infant learning but is simply used to efficiently compute movement strategies for various levels of stochasticity.

Q-learning works by computing a Q-function that estimates the cumulative costs of taking particular actions in particular states. The optimal action of each state is then the action that minimizes the total cumulative cost. Q-learning has been shown to converge to an optimal control strategy under certain conditions (Watkins & Dayan, 1992) and in practice has been shown to compute approximately optimal control strategies (Barto et al., 1995). A detailed description of the Q-learning procedure used in the current simulations is provided in the Appendix.

Simulation Results

A series of simulations were performed that differed only in the value of $k$, the arm model parameter that controls the degree of stochasticity. Computations with high stochasticity are meant to refer to infants just learning to reach, and those with

[Figure 2: Panels A-F; scatter plots of hand (x, y) positions, with both axes spanning roughly -20 to 140]

Figure 2. Hand positions at the end of each movement unit during 300 simulated reaches. The movements
were controlled by a Q-learning-computed policy that approximated a time-optimal policy. Panel A shows
the positions of the hand at the end of the first unit, Panel B at the end of the second unit, Panel C at the end
of the third unit, and so on. If the hand reached the target, the reach was terminated, and hand positions
were not plotted on subsequent panels. The starting position of the hand was (0, 0), and the target was the
box centered at (100, 100). These reaches used an arm model with $k = 20 \times 10^{-5}$. Note that the optimal
strategy is to move the hand about halfway to the target during the first movement unit and then use several
small movements to reach the target.
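
The arm model behind these simulated reaches can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulation code of the article: it assumes the noise variance reads k·v² + 0.2, keeps real-valued coordinates instead of integers clipped to the workspace, and replaces the Q-learning policy with a hypothetical greedy rule (`nearest_command`) that always aims straight at the target with the closest allowed distances. All function names are illustrative.

```python
import math
import random

TARGET = (100, 100)   # center of the 10 x 10 target box (mm)
HALF_WIDTH = 5        # half the side of the target box (mm)

def noise_sd(k, speed):
    """SD of endpoint noise; ASSUMES variance = k * speed**2 + 0.2."""
    return math.sqrt(k * speed ** 2 + 0.2)

def step(state, command, k, rng):
    """Apply one motor command (dx, dy, v) to the hand state (x, y)."""
    x, y = state
    dx, dy, v = command
    sd = noise_sd(k, v)
    return (x + dx + rng.gauss(0, sd), y + dy + rng.gauss(0, sd))

def in_target(state):
    """True if the hand lies inside the target box."""
    return all(abs(c - t) <= HALF_WIDTH for c, t in zip(state, TARGET))

def nearest_command(state, speed):
    """Hypothetical greedy policy: pick the allowed distances in
    {-60, -40, ..., 100} closest to the remaining gap on each axis."""
    allowed = range(-60, 101, 20)

    def pick(gap):
        return min(allowed, key=lambda d: abs(gap - d))

    return (pick(TARGET[0] - state[0]), pick(TARGET[1] - state[1]), speed)

def simulate_reach(k, speed, rng, max_units=50):
    """Count the submovements used to reach the target from (0, 0)."""
    state, units = (0.0, 0.0), 0
    while not in_target(state) and units < max_units:
        state = step(state, nearest_command(state, speed), k, rng)
        units += 1
    return units
```

Calling `simulate_reach(20e-5, 480, random.Random(1))` and `simulate_reach(2e-5, 80, random.Random(1))` gives a feel for how endpoint scatter grows with k and commanded speed. The greedy rule is only a stand-in; the point of the article is that the policy itself must be adapted to the level of noise.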

low stochasticity are meant to refer to older infants. Figure 2 shows a summary of the kinematics of 300 reaches with an arm model with relatively high stochasticity. Each panel of the figure shows the (x, y) positions of the hand after a particular submovement across the 300 reaches. Panel A shows hand positions after the first submovement, Panel B after the second submovement, and so on. Each dot shows the hand position for a particular reach. If hand position was within the box centered at (100, 100), the target was considered attained, and the reach was terminated. All reaches started at (0, 0).

The simulations with high arm-model stochasticity indicate that the optimal strategy is to reach for the target using a sequence of submovements of relatively high speed. The first unit takes the hand about halfway to the target, and several smaller movements take the hand the rest of the way. The target is usually attained within six movement units.

Figure 3 shows the kinematics of 300 reaches with an arm model of much lower stochasticity than Figure 2. Figure 3 shows that the optimal strategy with low stochasticity is to reach for the target in one submovement, a strategy that works on almost all of the trials. Subsequent simulations showed that the "scatter" seen in Panel B is the result of the coarseness of the data structure for the Q-function.

Figure 4 summarizes the results of the simulations. The mean number (top), speed (middle), and distance (bottom) of movement units in optimal strategies is shown as a function of decreasing model stochasticity, which corresponds to increasing age of the infants. As expected, the simulations showed that with increasing age the efficient strategy is to reach with fewer submovements of increasing distance. This prediction qualitatively matches the available longitudinal data showing that infants reach with fewer movement units of increasing distance as development proceeds (von Hofsten, 1991; Thelen et al., 1993).

An unexpected result was seen in the trade-off of movement speed and number of submovements. The simulations show that with high stochasticity the efficient strategy is to reach for

[Figure 3: Panels A-D; scatter plots of hand (x, y) positions, with both axes spanning roughly -20 to 140]

Figure 3. Hand positions at the end of each movement unit for 300 simulated reaches. Data plotted as in
Figure 2. These reaches are using an arm model with $k = 2 \times 10^{-5}$. Note that the optimal strategy is to try
to move the hand to the target in one movement unit. The "scatter" in hand positions in Panel B is a
numerical problem that is due to the coarseness of the representation of hand position in the data structure
storing the Q-values.
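
The "trial and evaluation" character of Q-learning, and the table of Q-values mentioned in the caption, can be illustrated with a toy problem. The sketch below is not the procedure of the Appendix: it is a generic tabular Q-learning loop on a hypothetical one-dimensional, noise-free task (six hand positions, steps of 1 or 2, a cost of one time unit per submovement), with a learning rate of 1 that is appropriate only because the toy transitions are deterministic. All names and parameter values are assumptions chosen for illustration.

```python
import random

N_STATES = 6          # hand positions 0..5 on a line; position 5 is the target
ACTIONS = (1, 2)      # hypothetical step sizes
COST_PER_STEP = 1.0   # each submovement costs one unit of time

def q_learning(episodes=500, alpha=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning that minimizes cumulative cost on the toy task."""
    rng = random.Random(seed)
    target = N_STATES - 1
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != target:
            # Trial: explore occasionally, otherwise take the cheapest known action.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = min(range(len(ACTIONS)), key=lambda j: q[s][j])
            s_next = min(s + ACTIONS[a], target)
            # Evaluation: move the estimate toward step cost plus best future cost.
            best_next = 0.0 if s_next == target else min(q[s_next])
            q[s][a] += alpha * (COST_PER_STEP + best_next - q[s][a])
            s = s_next
    return q

def greedy_rollout(q):
    """Follow the lowest-cost action from state 0; return the visited states."""
    s, path = 0, [0]
    target = N_STATES - 1
    while s != target:
        a = min(range(len(ACTIONS)), key=lambda j: q[s][j])
        s = min(s + ACTIONS[a], target)
        path.append(s)
    return path
```

After training, `greedy_rollout(q_learning())` follows the learned minimum-time strategy. In the stochastic arm model the same update would be applied to noisy transitions with a smaller learning rate, and the table would be indexed by a discretized (x, y) hand position, which is where the coarseness noted in the caption enters.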

the target with four or five relatively high speed units. As stochasticity decreases with age, an abrupt transition in strategy is seen. At about $k = 15 \times 10^{-5}$, the optimal strategy shifts from reaching with four or five high-speed units to reaching with one or two low-speed movements. Decreasing stochasticity beyond this point leads to faster and faster movements composed of one or two units. This leads to the counterintuitive prediction that a decrease in movement speed will be seen at a point in development when the number of submovements decreases.

Experiment 1

The current model assumes that infant reaches are a sequence of submovements and that by moving in such a sequence infants are able to correct for early inaccuracies in the reach by making corrections late in the sequence. Although the mathematical implementation above makes the simplifying assumption that the hand comes to rest between submovements, the available data clearly show that the submovements overlap in time. The goal of Experiment 1 is to relate the empirically obtained data with the simple assumptions of the mathematical model and to obtain a theoretical understanding of the form of the submovements underlying infant reaches.

Von Hofsten (1979) was the first to argue that infant reaches are composed of a series of submovements and to define submovements by minima of hand-speed profiles. Fetters and Todd (1987) later showed that the infant's hand changes direction at these times of minimal speed, a result that further supports the hypothesis that von Hofsten movement units are in fact functional submovements. Von Hofsten (1991) confirmed this analysis in a longitudinal study, and von Hofsten and Rönnqvist (1993) confirmed these results in neonates.

Although these results are consistent with the view that infant reaches are composed of a sequence of submovements, two problems remain. First, changes in hand direction are often seen between speed valleys. Approximately 20% of all curvature maxima occur between speed valleys (von Hofsten, 1991; von Hofsten & Rönnqvist, 1993; Mathew & Cook, 1990). This finding is not consistent with the hypothesis that infants reach in a sequence of simple submovements. It is possible, however, that the reason directional changes are observed within movement units is that the current segmentation methods do not fully decompose infant reaches into their elementary submovements.

A second problem is that the current segmentation methods are empirical in nature and do not have a rigorous theoretical justification. It is possible that the accelerations and decelerations of the hand during the reach are a reflection of the dynamics of the arm and not consequences of a sequence of actions, much like a damped pendulum accelerates and decelerates before finally resting. A theoretical understanding of the submovements of the reach that is consistent with current theories of movement generation would provide strong evidence that infant reaches are composed of a sequence of submovements.

Examination of the literature reveals that adults typically move their hands in straight lines and show bell-shaped speed profiles when asked to move in between two points in space (Abend, Bizzi, & Morasso, 1982). Flash and Hogan (1985) suggested that adults plan movements between two points as a straight line and adopt movement kinematics that minimizes hand jerk. Experimental results confirm that adults use minimum-jerk trajectories when asked to make simple movements (Flash & Hogan, 1985). Although the minimum-jerk model

does not fit all the data (e.g., Uno, Kawato, & Suzuki, 1989), minimum-jerk polynomials can account for a wide variety of adult data. It is likely that the observed minimum-jerk trajectories arise not from some planning process as Flash and Hogan (1985) suggested, but as a result of the dynamics of the arm itself.

Recently, Flash and Henis (1991) extended minimum-jerk theory to the case in which adults are required to move to a shifted target. They found that they could fit experimentally obtained kinematic data by assuming that movements to a shifted target consist of two minimum-jerk submovements that simply summed. The current experiment seeks to extend Flash and Henis's (1991) theory to infant reaching movements and show that infant reaches consist of a sequence of minimum-jerk submovements. Minimum-jerk polynomials are used simply because they empirically fit the adult data and not because any assumption is made in the current model that infants actively plan to minimize movement jerk. If this extension is successful, then strong evidence would be obtained that infants move in a sequence of elementary submovements and that the underlying movements are similar to simple point-to-point movements made by adults. A similar but atheoretical method has been developed by Milner (1992) for adult reaches.

Method

Participants. Six 6.5-month-old infants were used as participants in the current experiment. Three of the infants were male and 3 were female. Infants were recruited as part of a larger longitudinal study, and because of the time constraints of that study, infants were from families whose parents had ties to the university. All infants were born at full-term, in good health, and had a normal course of development.

Stimulus and apparatus. Infants were presented with a hand-held plastic replica of "Big Bird" (7 cm in length). A small rattle was held behind the toy. As part of the larger longitudinal study, infants were also tested in the dark with a glowing toy and in the dark with a sounding toy. Only data from the fully illuminated trials are presented here.

The infants were videotaped throughout the session at 33 frames/s with an infrared camera (Panasonic WV1800) that was placed to the right of the infant for a side view of the reaches. In addition to the videotape, the reaches were recorded using an Optotrak motion analysis system (Northern Digital). This system consists of three infrared cameras that generate estimates of a marker's position in three-dimensional coordinates. In the current experiments, four infrared emitting diodes (IREDs) were placed on the infant's arm and used as markers. The Optotrak system estimated the positions of these markers at a rate of 100 Hz. Position data were acquired during 15-s trials. Two IREDs were taped on the back of the infant's right hand, one just proximal to the joint of the index finger and one on the ulnar surface just proximal to the joint of the little finger. Two IREDs were used in order to keep at least one in camera view if the infant rotated his or her hand during the reach. Infants of this age are not bothered by the IREDs and tend to ignore them once they are in place. One IRED was placed on the experimenter's hand that held the toy. The Optotrak cameras were placed above and to the right of the infants.

Both the video camera and the Optotrak system were fed through a date-timer (For-A) and into a videocassette recorder (Panasonic Model 1950) and a video monitor (Sony Model 1271). The Optotrak system and the date-timer were triggered simultaneously in order to time-lock the IRED data with the videorecorded behavior for later scoring.

Procedure. Infants were seated on their parents' laps in front of the experimenter who presented the objects. The parents were asked to hold their infants firmly around the hips to support the infant and allow for free movement of the infant's arms. The parents were further asked to refrain from attempting to influence the infant in any way. A second experimenter was seated out of view to trigger the Optotrak system and to observe the infant on the video monitor. After the IREDs were applied to the infant's hand, the presenter indicated that the session could begin. Before each trial, the presenter got the infant's attention while concealing the toy, and began the trial when the infant looked straight ahead and made eye contact. A trial consisted of the presenter signaling the other experimenter to trigger the Optotrak, then bringing the toy into view and shaking it slightly. The toy was presented at approximately 30° from the midline in the infant's right hemifield about shoulder height and at arm's length. Presenting the object in this hemifield encouraged right-handed reaching, which could be tracked with IREDs. The presenter held the toy in this position, shaking it intermittently, until the infant successfully reached for and grasped it or until the 15-s trial ended. Intertrial intervals lasted approximately 10 s.

Lighted, glowing, and sounding object trials were always presented in blocks of two trials per condition. The order of blocks was randomized

[Figure 4: three stacked panels (top, middle, bottom); x-axis: Stochasticity, decreasing from left to right]

Figure 4. The simulated number (top), speed (middle), and distance (bottom) of the movement units as a function of increasing age. Increasing age was modeled as decreasing stochasticity in the arm model. The values of stochasticity are $\times 10^{-5}$. The speed and distance for movement units 2 and 3 are not given for stochasticities $< 15 \times 10^{-5}$ because these movements were almost always composed of one movement unit.

across participants. Each infant continued to receive blocks of trials in random order until he or she had completed 18 total trials (6 per condition) or until he or she became fussy. All infants had at least one reach in each condition.

Data scoring. The trials were scored by viewing the Optotrak data in conjunction with the videotape. Each trial was first examined on the videotape to ascertain whether a reach was performed on that trial and approximately where during the 15-s trial the reach occurred. A reach was defined as a forward motion of the arm that resulted in contact with the object that was not part of a turning motion or a torso rotation. The Optotrak data for successful reaches were then examined to determine whether the IREDs had remained within sight of the cameras for the duration of the reach. If a reach occurred, and the Optotrak data had gaps of less than 330 ms of missing data, the kinematic data were analyzed. Of the two IREDs on the reaching hand, only the one that was missing the least amount of data was used for further analysis. If neither IRED was missing any data, then the IRED that was used in the majority of the other trials for that participant was used.

Reach onset was defined as the moment when the infant's arm began an uninterrupted approach toward the object and was determined by viewing the infant's behavior on videotape. Two observers independently viewed each reach and noted the time of reach onset. In cases of disagreement, a third observer also viewed the videotape. If the third observer decided that reach onset was clearly one of the times noted by the other observers, the third observer chose that onset time. If the third observer could not decide which of the other observers' times was most appropriate, the earlier time was used. The same procedure was used to determine the time of initial contact with the object, except that the later time was chosen in ambiguous cases.

Kinematic data analysis and computational methods. The data obtained from the Optotrak system are estimates of the true IRED position at the time of the sample. The dynamic programming method of Busby and Trujillo (1985) was used to estimate the position, velocity, and acceleration of the hand. We used the criteria suggested by Busby and Trujillo for selecting B and used B = 1 × 10 - ~ . In the current article, the term speed refers to the magnitude of the velocity vector.

Following Flash and Henis (1991), we assumed that the observed speed profile is the simple sum of a series of submovements, each submovement being described by a minimum-jerk speed profile. Minimum-jerk movements that start at position x = 0 and time t = 0, and end at position x = 1 and at time t = 1, are given by

$$x(t) = 10t^3 - 15t^4 + 6t^5. \tag{3}$$

Equation 3 can be parameterized to describe movements of arbitrary length and starting location (Flash & Hogan, 1985). The speed profile of a minimum-jerk movement is obtained by differentiating Equation 3,

$$\dot{x}(t) = 30t^2 - 60t^3 + 30t^4, \tag{4}$$

where $\dot{x}$ is the time derivative of $x(t)$.

To fit arbitrary movements, we require a parameterization of Equation 4 that allows arbitrary shifting in time and scaling in time and height. An appropriate parameterization of Equation 4 is

$$\dot{x}(t) = p\left[30\left(\frac{t - c + w/2}{w}\right)^2 - 60\left(\frac{t - c + w/2}{w}\right)^3 + 30\left(\frac{t - c + w/2}{w}\right)^4\right], \tag{5}$$

where p is proportional to the peak speed, c is the time of peak, and w is the time duration of the movement. $\dot{x}(t)$ is defined as zero if $t < c - w/2$ or $t > c + w/2$.

If Flash and Henis's (1991) result is appropriate for infants and the observed speed profile is the simple sum of a series of minimum-jerk submovements, then the observed speed profile should be the simple sum of a set of functions described by Equation 5. If there are n submovements indexed by i, then the observed speed profile should be

$$\dot{x}(t) = \sum_{i} \dot{x}_i(t). \tag{6}$$

The parameters that control the position and shape of the ith submovement will be denoted by $p_i$, $c_i$, and $w_i$.

Although Equation 5 is nonlinear, its parameters can be estimated relatively easily because infant submovements are separated in time. In summary, the present algorithm first estimates the parameters of the largest submovement of the reach. This submovement is then subtracted from the data, and the largest remaining submovement is fit. This subtraction-fitting cycle proceeds until all the submovements have been fit. The parameter estimation step of each iteration uses gradient descent on the error function to compute parameter estimates.

The data are first filtered and differentiated by the algorithm of Busby and Trujillo (1985). Next, the maximum of the speed data is found, and it is assumed that the maximum occurs near the center, $c_i$, of the largest submovement. The parameters of the largest submovement are then estimated by fitting a movement of Equation 5 to the peak and its neighboring four points. Because the data were sampled at 100 Hz, this fitting was done using 50 ms of data. The parameter estimation is done by gradient descent on the mean squared error function,

$$E = \sum_{k=j}^{j+4} \left[x(k) - \hat{x}(k)\right]^2, \tag{7}$$

where $x$ is the actual segment of data to be approximated, $\hat{x}$ is the approximation, and j is the time index of the first of the five points of $x$ that we are trying to approximate. The parameters, $p_i$, $c_i$, and $w_i$, are adjusted to go down the gradient,

$$\Delta p_i = -\alpha \frac{\partial E}{\partial p_i}, \qquad \Delta c_i = -\beta \frac{\partial E}{\partial c_i}, \qquad \Delta w_i = -\gamma \frac{\partial E}{\partial w_i}. \tag{8}$$

Step sizes, $\alpha$, $\beta$, $\gamma$, were adjusted according to Darken and Moody (1991). The initial values for the parameters, $p_i$, $c_i$, and $w_i$, were provided by the peak speed of the data, the time of peak speed, and by a default value of 30 ms, respectively. Experimentation with various numbers of steps in the gradient descent indicated that 50,000 steps worked well in an acceptable amount of computer time.

After the 50 ms of the data were fit, the function was extended from $t = c_i - w_i/2$ to $t = c_i + w_i/2$. This function describes the largest submovement. Because Flash and Henis (1991) showed that multiple submovements sum, we then subtracted this submovement from the data and proceeded to fit the next largest submovement of the remaining data. A pseudo-R² was computed to assess the quality of the fit by dividing the mean squared error of the approximation by the mean of the sum of the squared data and subtracting it from 1. The iterations continued until one of four criteria was met: (a) the mean squared error was less than 40.0, (b) the addition of the last submovement reduced the mean squared error by less than 20.0, (c) on the average, there was more than one submovement per 100 ms, or (d) the pseudo-R² was greater than .992.

Once the number and parameters of the submovements in a reach were determined, a global gradient descent on the error function was performed to optimize the parameters. The global gradient descent was performed as in each of the previously mentioned fitting steps, except that all of the parameters were adjusted simultaneously. A computer program written in C implemented the decomposition algorithm (written by and available from Neil E. Berthier).

Results

To illustrate how the decomposition procedure works, decompositions of two movements will be described in detail. The

first movement was the high-speed bat of the toy presented in Figure 1, and the second was a typical reach for the toy. The bat is particularly relevant to the question of the extension of Flash and Henis (1991) to infant data because it consists of two submovements and is directly analogous to the two submovements that Flash and Henis found in response to double shifts of a target.

The hand-speed profile of the bat is shown in Figure 5. At the start of this movement the baby was touching the toy and rapidly brought her hand back toward the shoulder, reversed direction, and rapidly brought the hand back in contact with the toy, hitting it at the end of the data sequence. This movement appeared to consist of two movements, up and back, and thus is directly analogous to the Flash and Henis double shift paradigm.

Panels A through C of Figure 5 show the sequential decomposition of the movement. The algorithm first chose to fit the peak occurring at 310 ms. The program chose initial values for the submovement and submitted the 50 ms around the peak to a local fit. Once the parameters of the peak were estimated, they were extended over the full domain of the submovement (250-350 ms). The submovement after the first fit is shown in Panel A; the dashed line is a plot of a parameterized version of Equation 5. The program then chose to fit the peak at 150 ms. This peak was locally fit and the submovement extended, with the results shown in Panel B. After this step a third peak was attempted at 170 ms, but the mean squared error actually increased after the local fit, so the third peak was discarded.

The parameters representing the two submovements were then subject to global fitting. The results after the global fitting stage are shown in Panel C of Figure 5. The final fit was pseudo-R² = 0.9998. In all three panels the sum of the submovements is also shown by the dotted line (labeled approximation), but because it overlies either the tails of the submovement or the data itself, it is difficult to see.

Figure 6 shows the hand-speed profile for a typical reach performed by this infant. This movement had multiple peaks and was much lower in velocity than the movement shown in Figure 5. The decomposition of this reach is shown in Panels A through E. Panels A through D show the decomposition after the first, second, third, and fourth submovements. After the fourth submovement, a fifth submovement was attempted, but addition of the fifth submovement caused the mean squared error to increase, and it was discarded. Panel E of Figure 6 shows the decomposition after the global estimation step that resulted in a mean squared error of 124 and a pseudo-R² of 0.992.

Overall, the six infants reached on 52 trials in the light in their experimental sessions. Fifty of these reaches were made with the hand that had the IREDs on it, and 43 of the reaches had sufficient Optotrak data to be differentiated. These 43 reaches were submitted to the decomposition procedure described earlier.

The experimenter visually inspected the decompositions and provided an estimate of the quality of the fit. Of the 43 decompositions, 35 were judged "good" by the experimenter and were similar to the decomposition displayed in Figure 6. Six other decompositions were judged "adequate" by the experimenter, and two were judged "poor." The two decompositions that were judged poor included a very slow reach in which the maximum speed was 60 mm/s and a decomposition in which the local fit was acceptable, but the global fit resulted in the widening of a single submovement to span 600 ms. The mean pseudo-R² was .993 for the 41 reaches judged good or adequate. The mean pseudo-R² was .872 for the two reaches judged poor.

Figure 7 shows boxplots for durations of the submovements, for the time delay between subsequent submovements, and for the peak speeds of the submovements based on 239 submovements. The plots show that the submovements used by the infants are not random and show certain characteristics. For example, the central half of the submovement durations ranged

[Figure 5 plot: Panels A, B, and C, each showing hand speed (vertical axis, ticks from 200 to 1400) against Time (ms) (horizontal axis, 0 to 350), with curves labeled Data, Approximation, and Submovements.]

Figure 5. Steps in the decomposition of the hand-speed profile for a rapid bat of the toy. A: The fit after the first iteration. The algorithm has added a peak in the hand-speed profile at about 300 ms. B: The fit after the second iteration, where the algorithm added a peak at about 150 ms. C: The approximation after a global fit, where the parameters of both submovements were adjusted simultaneously. The overall pseudo-R² for this fit was 0.9998.
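
A two-submovement profile of the kind shown in Figure 5 can be built and scored with a few lines of code. The Python sketch below evaluates a single minimum-jerk speed profile, sums submovements as in Equation 6, and computes the pseudo-R² as described in the text. It assumes that the Equation 5 parameterization is Equation 4 evaluated at the normalized time (t - c + w/2)/w and scaled by p. The function names and the parameter values in the usage lines are hypothetical; they are not taken from the original C program.

```python
def min_jerk_speed(t, p, c, w):
    """Speed of one minimum-jerk submovement at time t.

    Assumed form of Equation 5: p scales the height, c is the time of
    peak speed, and w is the duration. The profile is zero outside
    the interval [c - w/2, c + w/2].
    """
    tau = (t - c + w / 2.0) / w  # normalized time: 0 at onset, 1 at offset
    if tau < 0.0 or tau > 1.0:
        return 0.0
    return p * (30.0 * tau**2 - 60.0 * tau**3 + 30.0 * tau**4)


def summed_speed(t, submovements):
    """Equation 6: the observed speed is the sum of the submovement speeds."""
    return sum(min_jerk_speed(t, p, c, w) for (p, c, w) in submovements)


def pseudo_r2(data, approx):
    """1 minus (mean squared error) / (mean of the squared data)."""
    n = len(data)
    mse = sum((d - a) ** 2 for d, a in zip(data, approx)) / n
    mean_sq = sum(d * d for d in data) / n
    return 1.0 - mse / mean_sq


# Hypothetical parameters (p, c in ms, w in ms), loosely patterned on the
# two peaks described for Figure 5; they are not fitted values.
subs = [(700.0, 310.0, 100.0), (500.0, 150.0, 120.0)]
times = [10.0 * k for k in range(36)]  # 0-350 ms sampled at 100 Hz
profile = [summed_speed(t, subs) for t in times]
print(pseudo_r2(profile, profile))  # identical series give 1.0
```

Under this assumed form the peak speed equals 1.875p, which is consistent with the statement that p is proportional to the peak speed.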

[Figure 6 plot: Panels A through E, each showing hand speed (vertical axis, ticks up to 400) against Time (ms) (horizontal axis, 0 to 600), with curves labeled Data, Approximation, and Submovements.]

Figure 6. Decomposition of a multisubmovement reach. The algorithm adds a single submovement at each stage of the iteration from Panels A-D. Panel E shows the decomposition after the parameters of the submovements were adjusted simultaneously. The overall pseudo-R² for this fit was 0.992.
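
The subtraction-fitting cycle that produces decompositions like the one in Figure 6 can be sketched as follows. This is a simplified illustration, not the author's program: a grid search over candidate durations with a closed-form least-squares height stands in for the gradient descent described in the text, the center is fixed at the peak sample, the global refinement step is omitted, and only two of the four stopping rules are used (the mean squared error threshold of 40.0 and discarding a submovement that does not reduce the error). The function names, the duration grid, and the synthetic test data are assumptions, as is the form of the speed profile.

```python
def min_jerk_speed(t, p, c, w):
    """Assumed Equation 5 form: Equation 4 at normalized time, scaled by p."""
    tau = (t - c + w / 2.0) / w
    if tau < 0.0 or tau > 1.0:
        return 0.0
    return p * (30.0 * tau**2 - 60.0 * tau**3 + 30.0 * tau**4)


def mean_square(values):
    return sum(v * v for v in values) / len(values)


def fit_local(times, resid, k, widths):
    """Fit one submovement to the sample at index k and its four neighbors.

    The center c is fixed at the peak sample. Each candidate duration w
    gets a closed-form least-squares height p; the best (p, c, w) wins.
    """
    idx = range(max(0, k - 2), min(len(times), k + 3))
    c = times[k]
    best = None
    for w in widths:
        shape = [min_jerk_speed(times[j], 1.0, c, w) for j in idx]
        p = sum(s * resid[j] for s, j in zip(shape, idx)) / sum(s * s for s in shape)
        err = sum((resid[j] - p * s) ** 2 for s, j in zip(shape, idx))
        if best is None or err < best[0]:
            best = (err, p, c, w)
    return best[1:]


def decompose(times, speed, mse_stop=40.0, max_subs=10, widths=None):
    """Greedy cycle: fit the largest remaining peak, subtract it, repeat."""
    widths = widths or list(range(60, 601, 10))  # candidate durations in ms
    resid = list(speed)
    subs = []
    mse = mean_square(resid)
    while mse >= mse_stop and len(subs) < max_subs:
        k = max(range(len(resid)), key=lambda j: resid[j])  # largest peak
        p, c, w = fit_local(times, resid, k, widths)
        new_resid = [r - min_jerk_speed(t, p, c, w) for t, r in zip(times, resid)]
        new_mse = mean_square(new_resid)
        if new_mse >= mse:  # a submovement that does not help is discarded
            break
        subs.append((p, c, w))
        resid, mse = new_resid, new_mse
    return subs


# Synthetic, noise-free test profile made of two non-overlapping submovements.
times = list(range(0, 601, 10))  # ms, 100 Hz sampling
true_subs = [(400.0, 150, 200), (250.0, 400, 200)]
speed = [sum(min_jerk_speed(t, *s) for s in true_subs) for t in times]
found = decompose(times, speed)
print(found)
```

Because the test profile is noise-free and its submovements do not overlap, the local fit recovers each submovement from five samples. Real infant data overlap in time, which is why the text follows the greedy pass with a global adjustment of all parameters.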

between 230 and 316 ms. All three distributions showed a handful of numerically large outliers.

Discussion

The current results show that it is possible to decompose a large proportion of infant reaches into an underlying sequence of submovements that resemble simple movements of adults. These results provide a theoretical understanding of the submovements themselves and how they are combined to form a reach, and they suggest that infant reaches are composed of a sequence of simple actions as hypothesized by the current model. The data also show that a major difference between adult reaches observed in other experiments (e.g., Flash & Hogan, 1985) and infant reaches is that adults are able to generate movements that reach targets in a single submovement, whereas infants require several submovements.

Von Hofsten (1991) found that single movement units did not preserve their basic shape from action unit to action unit when movement units were defined by speed minima. Movement units defined by speed minima often have asymmetrical and irregular shapes. He also found that movement unit duration and movement unit speed were relatively poorly correlated. These results led von Hofsten (1991) to suggest that single movement units are not generated by a single, simple rule, but that they were generated in some complex way that was highly context dependent. The current results show it is possible to decompose infant reaches into submovements that have a single, invariant shape. Further, the shape of these individual submovements is symmetrical and is the same as the shape of simple adult reaching movements. The current results suggest, therefore, that generation of the reach may be simpler than von Hofsten (1991) supposed and that the submovements may be the result of a relatively simple process.

To the extent that the current algorithm correctly identifies submovements of the reach, it is clear that submovements in infants overlap in time (e.g., Figures 5 and 6). On the other hand, the mathematical implementation of the model makes the simplifying assumption that action i is completed before action i + 1 is generated. Clearly, further development of the model will require modifications that allow for submovements to be generated before preceding submovements are completed.

An empirical finding of the current experiment is that infant reaches can be decomposed into the underlying movements for a large majority of infant reaches. This finding allows for a microexamination of reach kinematics to determine how infant reaching is planned and executed. The following experiment exploits this finding and analyzes single submovements to determine whether the directional variability is similar to that predicted by the current mathematical model.

Experiment 2

The model hypothesizes that multiple submovement reaches are the result of the infant's inability to precisely control the

[Figure 7 plot: boxplots for Dur, Delay, and Pk Speed; vertical axis ticks from 0 to 1200.]

Figure 7. Boxplots for submovement duration (Dur), delay between the start of subsequent submovements (Delay), and submovement peak speed (Pk Speed) for 239 submovements. Each plot shows the 75th and 25th percentiles as the upper and lower boundaries to the rectangle (the hinges), the median as the central horizontal line, and the mean as the plus sign. The whiskers on the box show the most deviant score that is less than 1.5 hingespreads from the hinges, and the outliers are labeled by circles (1.5-3.0 hingespreads from the hinge) and crosses (greater than 3.0 hingespreads from the hinge). See Smith and Prentice (1993) for further discussion of boxplots.

arm. The model assumes that infants are uncertain as to the effects of their motor commands on arm position and as a result use a corrective sequence of submovements to reach a target. The uncertainty assumption is modeled by adding a Gaussian random variable to the command to movement mapping (Equations 1 & 2).

The validity of the assumption that error in reaching leads to multiple submovement reaches can be directly tested by comparing the variability of the simulated movements with the variability of movements made by infants. If the assumption is correct, then the variability of movements in simulation should be comparable to the variability of movements when infants move in multiple submovements. Experiment 2 tests this prediction by comparing the directional variability of reaches made by 8 infants at approximately the time of reach onset with the directional variability of the simulated reaches of Figure 2.

Method

Participants. Eight infants were tested at approximately the age of onset of reaching. These infants were tested as part of a longitudinal experiment and were brought once a week into the laboratory, starting at about 10 weeks of age. Data presented here were obtained from the session during which infants first made five successful contacts with the object (age range = 13-18 weeks, M = 16.6 weeks).

Stimulus, apparatus, and procedure. The stimuli, apparatus, and procedure were identical to those of Experiment 1.

Data analysis. Data were scored, differentiated, and decomposed as in Experiment 1. The position of the hand at the time of submovement onset and offset was then determined. The direction of hand movement during the submovement was defined by the vector connecting these two points. The direction of the target at the time of submovement onset was then determined. Directional error of the hand during the submovement was then described by an error in azimuth and an error in elevation. The error in azimuth was defined as the angle to the left or right of the target, and the error in elevation was defined as the angle above or below the target.

Directional error was computed similarly for the simulated data. The time of submovement onset and offset was determined directly from the data in the simulations of Figure 2. Angular error was computed as just described, but because the simulated reaches are two-dimensional movements, a single error in azimuth completely describes the angular error.

Results

The distributions of angular errors are shown in Figure 8 with the error from the simulated reaches shown by the dashed lines with boxes, the error in elevation by the solid lines, and the error in azimuth by dotted lines with crosses. The infant data are based on 113 submovements of 27 reaches. Figure 8 shows that the distributions of the infant and the simulated data are approximately the same shape. The mean error in azimuth was 2.1° with a standard deviation of 67°. The mean error in elevation was -4.5° with a standard deviation of 60°. Neither mean was significantly different from zero, t(112) = 0.335 and t(112) = -0.798, respectively, p > .05. The 95% confidence intervals

[Figure 8 plot: distributions labeled Azimuth, Elevation, and Simulations; horizontal axis in Degrees (-200 to 200); vertical axis ticks from 0 to .30.]

Figure 8. The distributions of the angles between the direction of the target and the direction of the hand movement. One angle (degrees elevation) indicates directional error above or below the target, and the other (degrees azimuth) indicates directional error to the left or right. Angular error is also shown from the simulated reaches of Figure 2 (Simulations).
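
The azimuth and elevation errors plotted in Figure 8 can be computed from three points: the hand position at submovement onset, the hand position at offset, and the target position at onset. The Python sketch below is one way to do this. The coordinate convention (x to the infant's right, y forward, z up), the sign convention (positive azimuth error means the movement was directed to the right of the target, positive elevation error means above it), and the function names are assumptions; the text does not specify them.

```python
import math


def azimuth_elevation(v):
    """Azimuth and elevation of vector v in degrees.

    Assumed axes: x to the right, y forward, z up. Azimuth is measured in
    the horizontal plane from the forward axis; elevation is measured from
    the horizontal plane.
    """
    x, y, z = v
    azimuth = math.degrees(math.atan2(x, y))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def wrap_degrees(angle):
    """Wrap an angle into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def directional_error(onset, offset, target):
    """Angular error of a submovement relative to the target direction.

    The movement direction is the vector from hand onset to hand offset;
    the target direction is the vector from hand onset to the target.
    """
    move = tuple(b - a for a, b in zip(onset, offset))
    to_target = tuple(b - a for a, b in zip(onset, target))
    move_az, move_el = azimuth_elevation(move)
    targ_az, targ_el = azimuth_elevation(to_target)
    return wrap_degrees(move_az - targ_az), move_el - targ_el


# Hypothetical positions in millimeters: the target is straight ahead and
# the hand moves forward and to the right in the horizontal plane.
az_err, el_err = directional_error((0.0, 0.0, 0.0), (10.0, 10.0, 0.0), (0.0, 100.0, 0.0))
print(az_err, el_err)
```

For the two-dimensional simulated reaches, the same computation with z fixed at zero reduces to a single azimuth error, as noted in the data analysis.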

for the means were -10.3 < μ < 14.5 and -15.7 < μ < 6.7, respectively. The mean of the simulated data was -2.17° with a standard deviation of 29.8°.

Discussion

The close fit of the infant and simulated movement variability provides direct support for the assumptions of the mathematical model. First, the comparison shows that although the infants move directly toward the target on the average, there is considerable directional variability on a submovement-by-submovement basis. As is assumed in the model, infants do seem to be attempting to move toward the target, but the motor commands they use lead to random error in execution. Second, the data show that the variability of infant movement is closely approximated by the zero-mean Gaussian random variable assumed by the model. Third, the amount of error in infant reaching, which is the driving force in making multiple submovement reaches more efficient, is comparable to that assumed in the simulations that lead to predicted reaches of multiple submovements.

General Discussion

The current paper presents a mathematical model of the development of reaching that builds on the work of von Hofsten (1993), Thelen et al. (1993), and Meyer et al. (1988). The model assumes that infants learn to reach by evaluating the effects of movement commands on the state of their arm and on the state of the environment. The model hypothesizes that infants are primarily attempting to deal with their uncertainty in the motor command to movement mapping. Simulations were presented that qualitatively match the kinematic development of reaching. Experimental data were presented that support two key assumptions of the model. Experiment 1 showed that infant movements could be decomposed into the underlying submovements using a principled method, and Experiment 2 showed that the angular error in infant reaches matches the form and magnitude of error assumed by the model.

The current model hypothesizes that the development of reaching is the result of infants' active exploration of their own motor abilities and the constraints of the task. This hypothesis is not a new one (e.g., Gibson, 1988; von Hofsten, 1993; Thelen et al., 1993), but the current work differs from earlier work in presenting a concise, mathematical model of the process. The current simulations produce movement strategies that depend jointly on the properties of the muscle-arm system and on the size of the target. Whether motor learning is described as soft assembly (Thelen et al., 1993) or as search of a task space (von Hofsten, 1993), the current simulations show that learning by exploration is an effective means of selecting efficient strategies for movement.

The model, though clearly simplified, captures many of the important features of the development of reaching. The model characterizes infants, not as using a collection of reflexes, but as individuals that explore possibilities for action and select the most efficient acts. Infants are characterized not simply as reacting to changes in their motor systems and the environment, but as continually searching for the most efficient way of accomplishing tasks. This view is similar to that of Haith (1980), who characterized neonates as using eye movement strategies to maximize the collection of information about the environment. The model and data presented here also show that reach kinematics depend on the intentions of the infant. As shown by Figure 1, infants use one kinematic pattern to grasp objects and another to bat objects. Thelen et al. (1993) have also shown that infants adapt their reaching kinematics to increase the probability of grasping an object.

In the current work, motor learning was modeled using Q-learning. Q-learning is a simple algorithm that relies on the actor trying various motor actions and evaluating the results. Freedland and Bertenthal (1994) have emphasized that motor learning is critically dependent on generation of a rich set of possible actions from which the most appropriate can be selected. If too large a set of possible actions is used, improvement in performance will be slow because so many actions need to be tested. Q-learning attempts to limit the set of possible actions, but the algorithm is still slow because a significant number of actions are executed a large number of times. Perhaps the most important difference between motor learning in infants and Q-learning is that infants are much better at selecting and limiting the number of possible actions during learning (Siegler, 1994).

Q-learning stores the current information about the utility of actions as a Q-function, which in the current work was stored as a look-up table indexed by state and action. Representation of information in this way is not only psychologically unrealistic, but an impediment to learning. Representation of the Q-function in a more psychologically realistic way that readily allows for interpolation and generalization of information should lead to a better selection of actions and to faster and more realistic learning (e.g., Albus, 1981; Poggio & Girosi, 1990; Rosenbaum, Loukopoulos, Meulenbroek, Vaughan, & Engelbrecht, 1995).

A second way in which the current simulations are too simplistic is in the initial movements of the arm at the beginning of learning. In the current simulations, the initial actions are essentially random because the Q-function was initialized by uniformly setting it to 0.1. However, infants do not start reaching by randomly moving their arms. Infants start reaching with a set of movements that are the result of maturation of the neuromuscular system and their already extensive experience in moving their arms (Thelen et al., 1993; von Hofsten, 1982). This initial biasing of reaching strategies has the effect of accelerating the course of development of reaching.

The current model stresses that perceptual development must go hand in hand with motor development. Improvement in motor control is contingent on the infants' ability to perceive the position of the target and the posture of the arm. Although some investigators have emphasized the infant's use of vision of the hand in early reaching (e.g., Bushnell, 1985), results are accumulating that emphasize the role of proprioception and efference copy in estimating the posture of the arm. Clifton, Muir, Ashmead, and Clarkson (1993) showed that lack of vision of the hand around the onset of reaching has no effect on the success of reaching. Clifton et al. (1994) showed that lack of vision of the hand did not cause 6-month-old infants to alter the way they reach. Robin, Berthier, and Clifton (1996) showed that even when infants are required to catch a moving object, lack of vision of the hand has little effect on the success of reaching. These data all suggest that learning to reach, sight of the

target, and proprioception from the arm are far more important than sight of the hand.

The current model makes the simplifying assumption that submovements of the reach do not overlap in time. This translates in the model into having the infant make a submovement, have the hand come to a stop, perceive the position of the hand, and generate the next submovement. However, the data clearly show that reach submovements do overlap (e.g., Figure 1). If infant reaches are a sequence of correcting submovements, then infants seem to be able to generate a correction while the hand is in motion and to generate a correction for any error in the next submovement. This process may not involve any explicit prediction of future limb positions on the part of the infant but may be accomplished by learning a mapping between the current state and the correct action (e.g., Berthier, Singh, Barto, & Houk, 1993, p. 63).

Experimental results show that adults are able to make rapid corrections while the hand is in motion (e.g., Goodale, Pélisson, & Prablanc, 1986; Pélisson, Prablanc, Goodale, & Jeannerod, 1986). However, the data on infants are more equivocal. Ashmead, McCarty, Lucas, and Belvedere (1993) analyzed the hand paths of infants in a target switching experiment in the dark and found that 9-month-olds made in-flight corrections to switched targets. However, Ashmead et al. (1993) did not find evidence for in-flight corrections of 5-month-olds, a result that might be the result of a lack of experimental power.

In sum, this article presents a mathematical model that gives a precise implementation of a theory of the development of reaching. Many ideas in the theory have been discussed by others. The model characterizes infants as actively exploring options for movement and selecting the most efficient movement strategies. Infants are situated in a particular environment and act in order to fulfill their goals and intentions. Because the kinematics of reaching depend on what infants intend to do with objects (e.g., Figure 1), kinematic analyses of infant reaching offer a tool with which to investigate infant cognition. Recent work using kinematic analyses has provided evidence that infants form mental representations of objects they can no longer see (Clifton, Rochat, Litovsky, & Perris, 1991), that infants an-

Bushnell, E. (1985). The decline of visually guided reaching during infancy. Infant Behavior and Development, 8, 139-155.
Clifton, R. K., Muir, D. W., Ashmead, D. H., & Clarkson, M. G. (1993). Is visually guided reaching in early infancy a myth? Child Development, 64, 1099-1110.
Clifton, R. K., Rochat, P., Litovsky, R., & Perris, E. (1991). Object representation guides infants' reaching in the dark. Journal of Experimental Psychology: Human Perception and Performance, 17, 323-329.
Clifton, R. K., Rochat, P., Robin, D. J., & Berthier, N. E. (1994). Multimodal perception in the control of infant reaching. Journal of Experimental Psychology: Human Perception and Performance, 20, 876-886.
Darken, C., & Moody, J. (1991). Note on learning rate schedules for stochastic optimization. Neural Information Processing Systems, 3, 832-838.
Fetters, L., & Todd, J. (1987). Quantitative assessment of infant reaching movements. Journal of Motor Behavior, 19, 147-166.
Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381-391.
Flash, T., & Henis, E. (1991). Arm trajectory modifications during reaching towards visual targets. Journal of Cognitive Neuroscience, 3, 220-230.
Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model. Journal of Neuroscience, 7, 1688-1703.
Freedland, R. L., & Bertenthal, B. I. (1994). Developmental changes in interlimb coordination: Transition to hands-and-knees crawling. Psychological Science, 5, 26-32.
Gesell, A. (1929). Infancy and human growth. New York: Macmillan.
Gibson, E. J. (1988). Exploratory behavior in the development of perceiving, acting, and the acquiring of knowledge. Annual Review of Psychology, 39, 1-41.
Goldfield, E. C., Kay, B. A., & Warren, W. H. (1993). Infant bouncing: The assembly and tuning of action systems. Child Development, 64, 1128-1142.
Goodale, M. A., Pélisson, D., & Prablanc, C. (1986). Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement. Nature, 320, 728-750.
Haith, M. M. (1980). Rules babies look by: The organization of newborn visual activity. Hillsdale, NJ: Erlbaum.
Lockman, J. J., & Thelen, E. (1993). Developmental biodynamics:
ticipate movements of objects in accordance with the law of Brain, body, behavior connections. Child Do,elopment, 64, 953-959.
inertia (von Hofsten, Spelke, & Feng, 1993), and that infants Mathew, A., & Cook, M. (1990). The control of reaching movements
aim ahead of moving objects in order to grasp them (Robin, by young infants. Child Development, 61, 1238-1257.
Berthier, & Clifton, 1996). McGraw, M. B. (1943). The neuromuscular maturation of the human
injant. New York: Columbia University Press.
Meyer, D. E., Abrams, R. A., Kornblum. S., Wright, C. E., & Smith,
References J. E. K. ( 1988 ). Optimality in human motor performance: Ideal con-
trol of rapid aimed movements. P.s3,choh~gicalRo,iew, 95, 340-370.
Abend, W., Bizzi, E., & Morasso, E (1982). Human arm trajectory Milner, T. E. ( 1992 ). A model for the generation of movements requir-
formation. Brain, 105, 331-348. ing endpoint precision. Neuroscience, 49, 487-496.
Albus, J. (1981). Brains, behavior, and roboti~s. Peterborough, NH: P~lisson, D., Prablanc, C., Goodale, M. A., & Jeannerod, M. (1986).
Byte Books. Visual control of reaching movements without vision of the limb. II.
Ashmead, D. H., McCarty, M. E., Lucas, L. S., & Belvedere, M. C. Evidence of fast unconscious processes correcting the trajectory of
(1993). Visual guidance in infants' reaching toward suddenly dis- the hand to the final position of a double-step stimulus. Experimental
placed targets. Child Development, 64, 1111-1127. Brain Research, 62, 303-31 i.
Barto, A. G., Bradtke, S. J., & Singh, S. P. ( 1995 ). Learning to act using Poggio, T., & Girosi, E ( 1990, September). Networks for approxima-
real-time dynamic programming, Artificial Intelligence, 72, 81-138. tion and learning. Proceedings ~f the 1EEE, 78, 1481-1497.
Berthier, N. E., Singh, S. E, Barto, A. G., & Houk, J. C. (1993). Dis- Robin, D. J , Berthier, N. E., & Clifton, R. K. ( 1996 ). Infants' predictive
tributed representations of limb motor programs in arrays of adjust- reaching for moving objects in the dark. Developmental Ps),chology,
able pattern generators. Journal of Cognitive Neuroscience, 5, 56-78. 32, 824-835.
Busby, H. R., & Trujillo, D. M. ( 1985 ). Numerical experiments with a Rosenbaum, D., Loukopoulos, L., Meulenbroek, R., Vaughan, J., &
new differentiation filter. Journal of Biomechanical Engineering, 107, Engelbrecht, S. (1995). Planning reaches by evaluating stored pos-
293-299. tures, t3wchological Reviews; 102, 28-67.

Siegler, R. S. (1994). Cognitive variability: A key to understanding cognitive development. Current Directions in Psychological Science, 3, 1-5.
Smith, A. F., & Prentice, D. A. (1993). Exploratory data analysis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences, statistical issues (pp. 349-390). Hillsdale, NJ: Erlbaum.
Thelen, E. (1994). Three-month-old infants learn new patterns of interlimb coordination. Infant Behavior & Development, 17, 978.
Thelen, E., Corbetta, D., Kamm, K., Spencer, J. P., Schneider, K., & Zernicke, R. F. (1993). The transition to reaching: Mapping intention to intrinsic dynamics. Child Development, 64, 1058-1098.
Uno, Y., Kawato, M., & Suzuki, R. (1989). Formation and control of optimal trajectory in human multijoint arm movement. Biological Cybernetics, 61, 89-101.
von Hofsten, C. (1979). Development of visually directed reaching: The approach phase. Journal of Human Movement Studies, 5, 160-168.
von Hofsten, C. (1982). Eye-hand coordination in the newborn. Developmental Psychology, 18, 450-461.
von Hofsten, C. (1991). Structuring of early reaching movements: A longitudinal study. Journal of Motor Behavior, 23, 280-292.
von Hofsten, C. (1993). Prospective control: A basic aspect of action development. Human Development, 36, 253-270.
von Hofsten, C., & Rönnqvist, L. (1993). The structuring of neonatal arm movements. Child Development, 64, 1046-1057.
von Hofsten, C., Spelke, E., & Feng, Q. (1993). Principles of predictive reaching in 6-month-old infants. 34th Annual Meeting of the Psychonomic Society Abstracts, 9.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.

Appendix

Q-Learning

Stochastic, discrete-time, optimal-control problems are mathematically defined by a state
set $S$, which contains all the states of the dynamical system, an action set $A$, which
contains the possible actions (each action is not necessarily possible in all of the
states), and the state-transition probability mapping, which describes the probability of
making a transition from a particular state, $s_t$, to a particular next state, $s_{t+1}$,
given that action $a$ was taken. Each state transition of the dynamical system is viewed
as having a particular cost. In the case where a time-optimal solution is required, the
cost of each state transition is the time it takes for the transition to occur. In the
current context, the cost, $c(s_t, a, s_{t+1})$, of the transition from state $s_t$ to
state $s_{t+1}$, given that action $a$ was taken, is simply the time it takes for the arm
to move to a new configuration. The computational task is to develop an optimal policy
that maps each state to its optimal action or actions.

The true Q-function is written $Q(s, a)$, and it is the expected time it takes to reach a
goal state if action $a$ were taken in state $s$. Q-learning starts with an arbitrary
Q-function and iteratively computes the true Q-function. At step $k$ of the iteration, an
estimate of the true Q-function is provided by $Q_k(s, a)$. The current estimate of the
best action in state $s$ is the action with minimum $Q_k(s, a)$.

In the current simulations the Q-function was initialized by setting it uniformly to 0.1.
The system was then set to the start state, (0, 0). An action was then chosen, applied to
the plant, and a state transition simulated. The Q-function was then updated for that
action and state by the following rule:

$$
Q_{k+1}(s_i, a_j) = (1 - \alpha)\, Q_k(s_i, a_j) + \alpha \left[ c(s_i, a_j, s_{i+1}) + \gamma \min_{a \in A} Q_k(s_{i+1}, a) \right], \tag{A1}
$$

where $s_i$ is the current state in which action $a_j$ was taken, and $s_{i+1}$ is the
actual next state of the dynamical system, $\alpha$ is a learning rate parameter, and
$\gamma$ is the discount factor. The process was then repeated until a goal state was
reached. The state was then reset to the start state, and the process repeated until
100,000 to 300,000 reaches were completed.

How one chooses actions in Q-learning is a key decision. Early in learning the Q-function
contains little usable information about how to choose actions because it was initially
set uniformly to 0.1. Late in training, the Q-function hopefully becomes a good guide to
optimal actions because it has cached information about the expected results of
particular actions. In the present simulations, actions were chosen according to a
probabilistic exploration strategy that smoothly changes the probability of actions as
training proceeds (Darken & Moody, 1991). Because the Q-function is of little utility
early in training, the method randomly chooses actions in the initial stages of training.
As training proceeds, actions are chosen with probabilities that reflect their Q-values,
and late in training actions are chosen deterministically according to the Q-function.

To accelerate convergence to the optimal Q-function, neighboring states were lumped
together by dividing the state space into boxes that were 10 × 10 mm. States that were
within a particular box had the same Q-values. $\alpha$ was set to 0.1, and because the
simulations were finite-horizon problems, $\gamma$ was set to 1.0.

Received September 30, 1994
Revision received October 24, 1995
Accepted October 24, 1995
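
The Q-learning procedure described in this appendix can be illustrated with a minimal Python sketch. Only the box size (10 × 10 mm), the initial Q-value (0.1), the learning rate (0.1), the discount factor (1.0), and the start state (0, 0) come from the text. The toy plant, the eight-direction action set, the target position, the goal radius, the cap on submovements per reach, the zero future cost at the goal, and the Boltzmann-style temperature schedule are hypothetical stand-ins chosen for illustration; in particular, the exploration rule is only one plausible reading of "probabilities that reflect their Q-values" and is not claimed to be the schedule used in the simulations.

```python
import math
import random

# Values taken from the appendix text.
BOX_MM = 10.0   # side of each state-aggregation box, in mm
ALPHA = 0.1     # learning rate
GAMMA = 1.0     # discount factor
Q_INIT = 0.1    # uniform initial Q-value
START = (0.0, 0.0)

# Everything below is an illustrative assumption, not a detail from the article.
N_ACTIONS = 8             # hypothetical: eight submovement directions
TARGET = (100.0, 0.0)     # hypothetical target position, in mm
GOAL_RADIUS_MM = 15.0     # hypothetical radius counted as reaching the target
MAX_STEPS = 200           # hypothetical cap on submovements per reach


def box(state):
    """Map a continuous (x, y) position in mm to the index of its box."""
    return (math.floor(state[0] / BOX_MM), math.floor(state[1] / BOX_MM))


def q_values(Q, state):
    """Return the Q-values shared by every state in the same box."""
    return Q.setdefault(box(state), [Q_INIT] * N_ACTIONS)


def at_goal(state):
    """True when the hand is within the assumed goal radius of the target."""
    return math.hypot(state[0] - TARGET[0], state[1] - TARGET[1]) <= GOAL_RADIUS_MM


def choose_action(qs, temperature, rng):
    """Assumed Boltzmann-style rule: lower expected time means higher probability."""
    best = min(qs)
    if temperature <= 0.0:
        return qs.index(best)  # deterministic choice late in training
    weights = [math.exp(-(q - best) / temperature) for q in qs]
    return rng.choices(range(len(qs)), weights=weights)[0]


def q_update(Q, state, action, cost, next_state):
    """Apply the appendix update rule to one observed transition."""
    qs = q_values(Q, state)
    # Assumption: no further time is needed once the goal has been reached.
    future = 0.0 if at_goal(next_state) else min(q_values(Q, next_state))
    qs[action] = (1.0 - ALPHA) * qs[action] + ALPHA * (cost + GAMMA * future)


def toy_plant(state, action, rng, step_mm=20.0, noise_rad=0.3, duration_s=0.2):
    """Hypothetical plant: a fixed-length step whose direction is noisy."""
    angle = action * 2.0 * math.pi / N_ACTIONS + rng.gauss(0.0, noise_rad)
    x = min(max(state[0] + step_mm * math.cos(angle), -50.0), 150.0)
    y = min(max(state[1] + step_mm * math.sin(angle), -100.0), 100.0)
    return (x, y), duration_s  # the cost of a transition is its duration


def train(n_reaches=500, seed=0):
    """Run repeated reaches from the start state and return the learned Q table."""
    rng = random.Random(seed)
    Q = {}
    for k in range(n_reaches):
        # Assumed linear schedule: near-random early, near-greedy late.
        temperature = 1.0 - k / n_reaches
        state = START
        for _ in range(MAX_STEPS):
            action = choose_action(q_values(Q, state), temperature, rng)
            next_state, cost = toy_plant(state, action, rng)
            q_update(Q, state, action, cost, next_state)
            state = next_state
            if at_goal(state):
                break
    return Q


if __name__ == "__main__":
    Q = train()
    qs = q_values(Q, START)
    print("greedy action at the start state:", qs.index(min(qs)))
```

Keying the table on box indices rather than raw positions is what makes all states inside one 10 × 10 mm box share the same Q-values, so each update generalizes to its neighbors. The sketch runs far fewer reaches than the 100,000 to 300,000 reported in the text and is meant only to show the structure of the loop.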
