
Proceedings of 2012 IEEE International Conference on Mechatronics and Automation
August 5 - 8, Chengdu, China

Online Learning of COM Trajectory for Humanoid Robot Locomotion
Dingsheng Luo a,b, Yi Wang a,b, Xihong Wu a,b,c *
a Speech and Hearing Research Center, b Key Lab of Machine Perception (Ministry of Education), Peking University, Beijing, 100871, China
c College of Computer Science and Technology, Jilin University, Changchun, 130012, China
{dsluo, wangyi, wxh}

Abstract - Center of Mass (COM) trajectory is an essential factor for stable and natural robot locomotion. Unlike previous research, in which the COM trajectory is either restricted by the ZMP trajectory or directly predefined by a simple function such as a sinusoid, this research aims to establish the COM trajectory by online autonomous learning under the objective of locomotion stability and naturalness, which is expressed as a self-consistent measure in this paper. It provides an alternative that may avoid or weaken the mismatch between theoretical planning and practical implementation. The experimental results on a real humanoid robot, PKU-HR4, show that the approach is effective and promising.

Index Terms - COM trajectory; robot locomotion; consistent measure; stability criterion; online learning.

I. INTRODUCTION

Locomotion capability is one of the most essential skills for a legged robot, and various approaches have been proposed successfully. However, due to the intrinsically high complexity and non-linear dynamics of the control problem, it remains a great challenge in this area, especially for humanoid robots under objectives of stability, flexibility and naturalness.

Since a humanoid robot can be regarded as a mechanical body of connected rods, the coordinates of the COM, its projection onto the ground and its variation characteristics directly determine the feasibility and robustness of robot locomotion. Therefore, how to control the COM of the robot is one of the most critical factors in reaching a satisfactory locomotion behavior. At the very beginning, robot locomotion capability was realized by human hand-tuning strategies. To bring robot locomotion under control, mathematical formalization was then studied, and the Denavit-Hartenberg expression rule [1] as well as kinematics and dynamics analysis were introduced. After Zero Moment Point (ZMP) locomotion control was proposed by Vukobratovic [2][3], the COM became theoretically computable. To avoid the high computational cost of traditional ZMP theory, a simplified realization called the Inverted Pendulum Model (IPM) was then proposed [4]. These studies can be regarded as model based approaches, which greatly improved the performance of robot locomotion and became the mainstream research direction for robot locomotion control.

Due to the intrinsic complexity of mechanical control, model based approaches inevitably encounter inconsistent behavior between the expected planning and the real movement of the COM. Based on feedback control, ZMP compensation methods were studied [5][6]. To avoid inaccuracy from model calculation, a reinforcement learning based COM trajectory generation method was successfully proposed [7]. In response to the weaknesses that come from the framework of model based approaches, research on model-free robot locomotion control is booming [8][9]. However, for COM trajectory generation, the commonly used method is still simple predefinition.

Apart from model based approaches, another robot control methodology is the family of biologically inspired approaches. The Central Pattern Generator (CPG) [10][11] is a typical representative, which is inherently model free and has been applied successfully in robot locomotion control. An oscillator is a commonly used component of a CPG based robot control framework, and many predefined functions have been adopted successfully to realize the oscillator [12][13]. However, traditional CPG based robot control tends to optimize parameters only so as to approach the control objective, and pays little attention to COM movement. Recently, one report took the COM trajectory into account and employed a simple sinusoid function when building the robot controller [14]. Although success was achieved, the sinusoid function seems too simple to describe the behavior of COM movement accurately.

Since COM movement plays a very important role both in maintaining feasible and stable locomotion and in making the locomotion more flexible and natural, two aspects need to be considered. First, the COM trajectory should be modeled carefully so that it has enough capacity to express various locomotion behaviors. Second, the expected COM trajectory should be consistent with the real movement of the robot's physical body under mechanical control. In this paper, a Self-Consistent Stability Criterion (SC2) is proposed, which aims to measure the degree of match between the desired COM trajectory and the robot's real movement, so that a more flexible and natural locomotion control might be achieved. Based on the proposed SC2, the COM trajectory is learned online rather than predefined. Consequently, it is expected that the representation of the COM trajectory is more detailed and more self-consistent than a predefined one, and thus may better avoid the weaknesses of model based approaches to robot locomotion control.

978-1-4673-1278-3/12/$31.00 ©2012 IEEE 1996

The remainder of the paper is organized as follows. In Section II, the proposed measure SC2, the COM trajectory modeling and the online learning algorithm are described. In Section III, experiments on the PKU-HR4 humanoid robot are given to evaluate the effectiveness. The conclusions are drawn in Section IV.

II. ONLINE LEARNING OF COM TRAJECTORY

A. Problem Formulation

Inspired by biological mechanisms, legged robot locomotion can be regarded as a system of coupled oscillators. Consequently, the COM trajectory can be expressed by an oscillator, as suggested in [15]. To simplify the problem, it is assumed that the robot walks on a fixed road surface. Therefore, robot movement can be regarded as a periodic stationary process. Based on these points, robot COM trajectory learning is formulated mathematically as follows.

Given time $t$ in the interval $[0, +\infty)$ and the three-dimensional real space $\mathbb{R}^3$, in which each point represents a spatial position of the COM, a mapping

$$f_{\mathrm{COM}}(t): [0, +\infty) \to \mathbb{R}^3 \qquad (1)$$

represents a COM trajectory. Under the periodic stationary assumption, given the period $T$, a COM trajectory can be expressed as an oscillator, and formula (1) is then described as

$$OSC_{\mathrm{COM}}(t): T \to \mathbb{R}^3 \qquad (2)$$

which is further divided into three functions corresponding to the three spatial coordinates:

$$OSC_{\mathrm{COM\_X}}(t) = f_{\mathrm{COM\_X}}(t, \Theta_X) \qquad (3)$$
$$OSC_{\mathrm{COM\_Y}}(t) = f_{\mathrm{COM\_Y}}(t, \Theta_Y) \qquad (4)$$
$$OSC_{\mathrm{COM\_Z}}(t) = f_{\mathrm{COM\_Z}}(t, \Theta_Z) \qquad (5)$$

where $t$ belongs to $T$ and $\Theta_X$, $\Theta_Y$, $\Theta_Z$ are the three parameter sets corresponding to the three functions.

To measure the COM trajectory, a reward function should be established, which is described as

$$Reward(f): F \to [0, 1] \qquad (6)$$

where $f$ belongs to $F = \{f_{\mathrm{COM\_X}}, f_{\mathrm{COM\_Y}}, f_{\mathrm{COM\_Z}}\}$.

Therefore, to find an optimal COM trajectory $OSC^{*}_{\mathrm{COM}}$, the objective is to learn three parameter sets $\Theta_X^{*}$, $\Theta_Y^{*}$, $\Theta_Z^{*}$ such that

$$\Theta_X^{*} = \arg\max_{\Theta_X} Reward\big(f_{\mathrm{COM\_X}}(t, \Theta_X)\big) \qquad (7)$$
$$\Theta_Y^{*} = \arg\max_{\Theta_Y} Reward\big(f_{\mathrm{COM\_Y}}(t, \Theta_Y)\big) \qquad (8)$$
$$\Theta_Z^{*} = \arg\max_{\Theta_Z} Reward\big(f_{\mathrm{COM\_Z}}(t, \Theta_Z)\big) \qquad (9)$$

B. COM Trajectory Modeling

To establish the COM trajectory, modeling capacity and learning efficiency are two essential but conflicting aspects that need to be considered. In this research, instead of a simple sinusoid, a piecewise monotone cubic interpolation approach [16] is introduced for COM trajectory modeling. Monotone cubic interpolation is a variant of cubic interpolation that preserves the monotonicity of the data set being interpolated. Meanwhile, as more data points are involved, better modeling capacity can be achieved. Therefore, it is a reasonable choice for COM trajectory construction.

As mentioned in subsection A, COM trajectory modeling means generating the three functions expressed in formulas (3), (4) and (5). Taking the generation of $OSC_{\mathrm{COM\_X}}(t)$ as an example, the trajectory is established as follows.

Let $\pi: 0 = t_0 < t_1 < \cdots < t_{M-1} < t_M = T$ be a partition of the interval $[0, T]$, which is thereby divided into $M$ successive subintervals $I_m = [t_{m-1}, t_m]$, where $m = 1, \ldots, M$. A piecewise expression for formula (3) is then obtained:

$$OSC_{\mathrm{COM\_X}}(t) = \begin{cases} OSC^{I_1}_{\mathrm{COM\_X}}(t), & t \in I_1 \\ OSC^{I_2}_{\mathrm{COM\_X}}(t), & t \in I_2 \\ \quad\vdots & \quad\vdots \\ OSC^{I_M}_{\mathrm{COM\_X}}(t), & t \in I_M \end{cases} \qquad (10)$$

For each subinterval $I_m$, a further partition is made by $\pi_m: t_{m-1} = t_{m0} < t_{m1} < \cdots < t_{m,N_m-1} < t_{m,N_m} = t_m$. Let $\{x_{mi}: i = 0, 1, \ldots, N_m\}$ be a given set of monotone data values at the partition points. That is,

$$OSC^{I_m}_{\mathrm{COM\_X}}(t_{mi}) = x_{mi}, \quad i = 0, 1, \ldots, N_m \qquad (11)$$

Based on the sample pairs $\{(t_{mi}, x_{mi}): i = 0, 1, \ldots, N_m\}$, a function denoted $g^{I_m}_{\mathrm{COM\_X}}(t)$ over $I_m$ can be obtained by monotone cubic interpolation using the Fritsch-Carlson method [16]. Since the function is completely determined by the given sample pairs, the latter can be regarded as the parameter set of the function. As a consequence, a general expression for $OSC^{I_m}_{\mathrm{COM\_X}}(t)$ is reached:

$$OSC^{I_m}_{\mathrm{COM\_X}}(t) = g^{I_m}_{\mathrm{COM\_X}}\big(t, \Theta_X^{I_m}\big) \qquad (12)$$

where $t \in I_m$ and $\Theta_X^{I_m} = \{(t_{mi}, x_{mi}): i = 0, 1, \ldots, N_m\}$.

Thus, according to formulas (3) and (10), the parameterized piecewise model for $f_{\mathrm{COM\_X}}(t, \Theta_X)$ is obtained:

$$f_{\mathrm{COM\_X}}(t, \Theta_X) = \begin{cases} g^{I_1}_{\mathrm{COM\_X}}\big(t, \Theta_X^{I_1}\big), & t \in I_1 \\ g^{I_2}_{\mathrm{COM\_X}}\big(t, \Theta_X^{I_2}\big), & t \in I_2 \\ \quad\vdots & \quad\vdots \\ g^{I_M}_{\mathrm{COM\_X}}\big(t, \Theta_X^{I_M}\big), & t \in I_M \end{cases} \qquad (13)$$

It is easy to see that $\Theta_X = \bigcup_{m=1}^{M} \Theta_X^{I_m}$, which is a set of sample pairs and can be further expressed as

$$\Theta_X = \{(t_i, x_i): i = 0, 1, \ldots, N\} \qquad (14)$$

where $N = \sum_{m=1}^{M} N_m$.

The functions $f_{\mathrm{COM\_Y}}(t, \Theta_Y)$ and $f_{\mathrm{COM\_Z}}(t, \Theta_Z)$ can be established in exactly the same way as $f_{\mathrm{COM\_X}}(t, \Theta_X)$. In this way, a COM trajectory based on piecewise monotone cubic interpolation is modeled.
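The piecewise model of Section II.B can be illustrated with a short sketch. The code below is a minimal pure-Python version of monotone cubic Hermite interpolation with Fritsch-Carlson style tangent limiting, applied independently on each subinterval and wrapped over one period. The function names (`fritsch_carlson`, `piecewise_trajectory`), the knot values and the period are hypothetical choices for illustration, and treating each piece independently (with one-sided tangents at piece boundaries) is an assumption rather than a detail confirmed by the paper.

```python
import bisect
import math


def fritsch_carlson(ts, xs):
    """Return a callable monotone cubic Hermite interpolant through (ts, xs).

    ts must be strictly increasing; xs is assumed monotone on this piece.
    """
    n = len(ts)
    if n < 2 or len(xs) != n:
        raise ValueError("need at least two matching knots")
    h = [ts[i + 1] - ts[i] for i in range(n - 1)]
    d = [(xs[i + 1] - xs[i]) / h[i] for i in range(n - 1)]  # secant slopes
    # Initial tangents: one-sided at the ends, averaged secants inside,
    # and zero wherever neighbouring secants disagree in sign.
    m = [d[0]] + [0.0] * (n - 2) + [d[-1]]
    for i in range(1, n - 1):
        m[i] = 0.0 if d[i - 1] * d[i] <= 0 else 0.5 * (d[i - 1] + d[i])
    # Limit the tangents so that every cubic segment stays monotone.
    for i in range(n - 1):
        if d[i] == 0.0:
            m[i] = m[i + 1] = 0.0
            continue
        a, b = m[i] / d[i], m[i + 1] / d[i]
        s = a * a + b * b
        if s > 9.0:
            tau = 3.0 / math.sqrt(s)
            m[i], m[i + 1] = tau * a * d[i], tau * b * d[i]

    def evaluate(t):
        i = min(max(bisect.bisect_right(ts, t) - 1, 0), n - 2)
        u = (t - ts[i]) / h[i]
        h00 = (1 + 2 * u) * (1 - u) ** 2
        h10 = u * (1 - u) ** 2
        h01 = u * u * (3 - 2 * u)
        h11 = u * u * (u - 1)
        return (h00 * xs[i] + h10 * h[i] * m[i]
                + h01 * xs[i + 1] + h11 * h[i] * m[i + 1])

    return evaluate


def piecewise_trajectory(pieces):
    """Build f(t, Theta) from a list of (ts, xs) knot lists, one per I_m."""
    interps = [fritsch_carlson(ts, xs) for ts, xs in pieces]
    starts = [ts[0] for ts, _ in pieces]
    t0 = starts[0]
    period = pieces[-1][0][-1] - t0

    def f(t):
        t = t0 + (t - t0) % period  # periodic stationary assumption
        idx = max(bisect.bisect_right(starts, t) - 1, 0)
        return interps[idx](t)

    return f


# Hypothetical parameter set: M = 2 pieces, N_m = 4 (five knots per piece).
f_x = piecewise_trajectory([
    ([0.0, 0.125, 0.25, 0.375, 0.5], [0.0, 0.2, 0.7, 0.9, 1.0]),
    ([0.5, 0.625, 0.75, 0.875, 1.0], [1.0, 0.9, 0.4, 0.1, 0.0]),
])
```

In this sketch the knot pairs play the role of the parameter set in (14): changing any knot value changes the trajectory, which is what makes the knots a natural search space for learning.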

C. Self-Consistent Stability Criterion (SC2)

The evaluation of the COM trajectory is an essential issue in stable and natural robot locomotion control. The ZMP stability criterion provides a good theory for constraining the variation of the COM [3], but does not specify in detail what a COM trajectory should look like. Although IPM successfully simplifies the complex ZMP computation, it roughly treats the robot's physical body as a point mass and thus guides COM trajectory generation inaccurately [17]. Recognizing the importance of the difference between the desired trajectory and the real trajectory, approaches based on ZMP compensation control were proposed and successfully improved the stability of robot locomotion [5][6]. However, both the expected trajectory planning and the real trajectory calculation in these approaches are commonly obtained with model based methods, which imposes extra load on a robot with limited computational resources.

In this research, the viewpoint is taken that the planned COM trajectory should be consistent with the robot's body movement. It is believed that the higher the degree of match between the COM trajectory and the robot's body movement (i.e., the more self-consistent), the more stable and natural the robot locomotion. Based on this view, a stability criterion that focuses on self-consistency is proposed as follows.

Given an expected COM trajectory $f_{\mathrm{COM}}(t)$ as in (1) and the real movement of the robot body $h_{\mathrm{BodyMove}}(t)$ over the same interval, and considering that the values of both functions are three-dimensional, a self-consistent measure is defined on each pair of component functions along the same axis, denoted $(f, h)$:

$$Measure(f, h) = \frac{1}{2}\big(r_{fh} + (1 - \lambda_h)\big) \qquad (15)$$

where $r_{fh}$ is the correlation coefficient of $f$ and $h$, expressing the degree of match between the two functions from the view of global trend: the larger $r_{fh}$, the more consistent $f$ and $h$. $\lambda_h$ is the Lyapunov exponent [18] of the movement curve $h$, which is a commonly used index to quantify the local dynamic stability of a time series [19][20]. Since the trajectory $f$ is periodic stationary, $\lambda_h$ should be as small as possible for $h$ to match $f$. Therefore $\lambda_h$ can be adopted to describe the degree of consistency from the view of local variation. Here, so that the global and local measurements follow the same trend, the Lyapunov exponent $\lambda_h$ is employed in the simple reversed form $1 - \lambda_h$.

Since the robot's body movement is in fact driven by the planned COM trajectory, $h_{\mathrm{BodyMove}}(t)$ can be rewritten as $h_{\mathrm{BodyMove}}(t, f)$. As a consequence, the Lyapunov exponent $\lambda_h$ of $h$ becomes $\lambda_{h(f)}$, and the correlation coefficient $r_{fh}$ becomes $r_{f,h(f)}$. Thus, by redefining the measure in (15), the Self-Consistent Stability Criterion (SC2) is obtained:

$$SC^2(f) = \frac{1}{2}\big(r_{f,h(f)} + (1 - \lambda_{h(f)})\big) \qquad (16)$$

In practice, $h_{\mathrm{BodyMove}}(t, f)$ can be either obtained from sensor feedback or calculated from the variation of joint angles.

The proposed measure describes the degree of self-consistency: the larger its value, the higher the degree of match between the COM trajectory and the corresponding body movement.

D. Online Learning Algorithm

The proposed SC2 is a mapping that quantifies a trajectory $f$ as a real number, and therefore it can serve directly as the reward function during the COM trajectory learning process. Thus, formula (6) can be realized as follows:

$$Reward(f) = SC^2(f) \qquad (17)$$

To avoid the weaknesses of learning on a simulation platform, learning on the real robot is adopted in this research. Thus, the requirements of online and real-time learning should be considered. As suggested in [21], the computational efficiency and the convergence speed of the learning algorithm should be emphasized.

Based on the above discussion, along with the learning objective and the COM trajectory model established in the previous subsection, an online learning algorithm is presented as follows.

Firstly, by employing a Design of Experiments based Active Learning (DEAL) strategy [21], a set that contains $K$ subsets is obtained, denoted $\Theta^{(0)} = \{\Theta_k: k = 1, 2, \ldots, K\}$. Each $\Theta_k$, which is assumed to consist of $N$ sample pairs, can be regarded as a parameter set according to (14).

Secondly, a partition of $\Theta_k$ is made to obtain $M$ further subsets, denoted $\Theta_k = \{\Theta_{k,m}: m = 1, 2, \ldots, M\}$. Based on $\Theta_k$ and piecewise monotone cubic interpolation with the Fritsch-Carlson method, a piecewise COM trajectory $f(t, \Theta_k)$ is established according to (13). The reward measure for $f(t, \Theta_k)$, denoted $R_k$, is then obtained according to formula (17).

Thirdly, the previous process is repeated for all $k$ from 1 to $K$ to obtain $K$ reward scores $\{R_k: k = 1, 2, \ldots, K\}$, among which the maximum score $R^{*}$ and the corresponding index $k^{*}$ are found. Consequently, $\Theta_{k^{*}}$ is obtained, which satisfies

$$\Theta_{k^{*}} = \arg\max_{\Theta_k} Reward\big(f(t, \Theta_k)\big), \quad k = 1, 2, \ldots, K \qquad (18)$$

In addition, to avoid the optimum $\Theta_{k^{*}}$ being only a local one, a random gradient learning policy [22] is further employed. By random sampling in the neighborhood of $\Theta_{k^{*}}$, a new set $\Theta^{(1)} = \{\Theta^{(1)}_k: k = 1, 2, \ldots, K'\}$ is obtained. The same learning process described above is then performed to reach a new optimum parameter set $\Theta^{(1)}_{k^{*}}$. When the learning process has converged, the final best parameter set, denoted $\Theta_{best}$, is obtained, which satisfies

$$\Theta_{best} = \arg\max_{\Theta^{s}} Reward\big(f(t, \Theta^{s})\big), \quad \Theta^{s} \in \bigcup_{i \ge 0} \Theta^{(i)} \qquad (19)$$
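The criterion and the two-stage search of Sections II.C and II.D can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the measure is assumed to be the average of the correlation coefficient and the reversed Lyapunov exponent; the Lyapunov estimator is passed in as a caller-supplied function (`lyapunov`, hypothetical) because estimating it, for example with the method of [19], is beyond this sketch; the initial candidate sets stand in for the DEAL selection; and the refinement stage is a plain random neighborhood search that is swapped in for the random gradient policy of [22]. The names `pearson`, `sc2`, `learn`, `evaluate`, `k_prime` and `sigma` are all hypothetical.

```python
import math
import random


def pearson(f_vals, h_vals):
    """Correlation coefficient r_fh of two equally sampled curves."""
    n = len(f_vals)
    mf, mh = sum(f_vals) / n, sum(h_vals) / n
    cov = sum((a - mf) * (b - mh) for a, b in zip(f_vals, h_vals))
    var_f = sum((a - mf) ** 2 for a in f_vals)
    var_h = sum((b - mh) ** 2 for b in h_vals)
    if var_f == 0.0 or var_h == 0.0:
        return 0.0  # assumption: a constant curve carries no trend information
    return cov / math.sqrt(var_f * var_h)


def sc2(f_vals, h_vals, lyapunov):
    """Assumed form of the criterion: mean of r_fh and (1 - lambda_h)."""
    return 0.5 * (pearson(f_vals, h_vals) + (1.0 - lyapunov(h_vals)))


def learn(initial_sets, evaluate, k_prime=8, sigma=0.05, max_rounds=20, rng=None):
    """Pick the best initial candidate, then refine it by random sampling.

    initial_sets: candidate parameter vectors (stand-in for the DEAL stage).
    evaluate:     maps a parameter vector to its reward, e.g. an SC2 score
                  measured on the robot (hypothetical callback).
    """
    rng = rng or random.Random(0)
    scored = [(evaluate(theta), theta) for theta in initial_sets]
    best_score, best = max(scored, key=lambda p: p[0])
    for _ in range(max_rounds):
        # Sample K' candidates in the neighbourhood of the current optimum.
        neighbours = [[x + rng.gauss(0.0, sigma) for x in best]
                      for _ in range(k_prime)]
        score, theta = max(((evaluate(c), c) for c in neighbours),
                           key=lambda p: p[0])
        if score <= best_score:
            break  # no improvement: treat the search as converged
        best_score, best = score, theta
    return best, best_score
```

On a real robot, `evaluate` would build the trajectory from the candidate knots, execute it for a fixed number of steps, record the body movement and return the resulting score; here it is left abstract so that the search logic can be exercised with any reward function.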

III. EXPERIMENTS AND RESULTS

A. The PKU-HR4 Robot Platform

The PKU-HR4 is the fourth version of the humanoid robot developed by Peking University. It weighs 4.211 kg and is 56 cm tall. It has 21 DOF (degrees of freedom): 6 DOF for the hip, 1 DOF for each knee, 2 DOF for each ankle, 1 DOF for the torso, 2 DOF for the head and 3 DOF for each arm. It uses 3 ROBOTIS RX-64 and 18 ROBOTIS RX-28 servo actuators. All the servos are connected to the controller via an RS485 connection running at 1 Mbps. A PC-104 equipped with an inertial sensor is the main control system and is placed in the back of PKU-HR4. The inertial sensor has one 3-axis gyroscope and one 3-axis accelerometer, which are used to measure stability in this research. The appearance and design structure of PKU-HR4, with measurements in mm, are shown in Figure 1.

Figure 1. Appearance and design structure of humanoid robot PKU-HR4.

Apart from the robot PKU-HR4 itself, a PC laptop, a wireless router, an external power supply, etc. are also part of the robot platform. All of the learning processes are executed by the PC-104 on PKU-HR4, while the PC laptop is only used as a monitor that logs in to the robot remotely via the wireless router. All joint motors are powered by the external power supply to guarantee constant external conditions during the whole learning process.

The locomotion of PKU-HR4 is controlled by a locomotion engine, in which a predefined gait pattern is adopted and each locomotion step is divided into several sub-actions. Given the COM trajectory and the trajectory of the swinging leg, the sequences of joint angles corresponding to the sub-actions in each step are obtained via inverse kinematics, and robot locomotion is thereby realized.

B. Experimental Setup

In order to evaluate the effectiveness of the proposed COM trajectory learning framework, which involves its modeling and the SC2 based measure, two other COM trajectory generation approaches, i.e., predefined and IPM based COM generation, are employed as two baseline systems. For the former, a predefined sinusoid is adopted as the COM trajectory, while for the latter, the COM trajectory is generated by IPM as suggested in [23].

As mentioned in the previous subsection, apart from the COM trajectory, robot locomotion also depends on the trajectory of the swinging leg. In this research, to make the comparison fair, the three robot locomotion control systems (the proposed system and the two baseline systems) adopt the same swinging leg trajectory, which is modeled with the proposed trajectory modeling method while the parameter set is predefined so that the obtained trajectory is the same as in [24]. In other words, the three robot locomotion control systems are identical except for the desired COM trajectories. This experimental setup makes the comparison of COM trajectories reasonable.

As for obtaining the robot body movement $h_{\mathrm{BodyMove}}(t)$, namely the real COM trajectory, several methods are available; in this research, it is calculated from the variation of joint angles. Only the y dimension (left-right direction) of the COM trajectory is considered, while the x and z dimensions are simply set to constants. T equals the time duration of each step.

During COM trajectory modeling and learning, only the curve over half of the period T is considered, due to its symmetry. Under this setting, the parameter M in formula (13) is set to 2, which means that the COM trajectory is a two-piece piecewise function over half of T. For all m = 1, …, M, the corresponding $N_m$ are all set to 4. In DEAL learning, the parameter K is set according to $L_{16}(5^4)$, which means that the K candidates are actively selected with Taguchi's orthogonal arrays at the beginning of the learning process [21]. In random gradient learning, $K'$ is set to 8. Each evaluation of a parameter set lasts 30T, which means 30 steps.

C. Results and Discussion

1) Overall Performance Comparison

The overall performance of the three COM trajectories, i.e., generated by IPM, predefined by a sinusoid, and learned by the approach proposed in this paper, is compared in Figure 2, where the three measure values are calculated according to the proposed SC2.

Figure 2. Overall performance of three desired COM trajectories.

From Figure 2, it can be seen that the desired COM trajectory learned by the proposed approach outperforms the other two, while the sinusoid one receives the lowest measure score. The experimental result in Figure 2 agrees with our expectation: the sinusoid seems too simple to describe the COM trajectory; the one generated by IPM uses some more information in trajectory building and hence behaves better than the simple sinusoid; as for the one learned in this paper, it actually takes

the consistent measure as the learning criterion, and therefore a higher reward value can be obtained. The learned COM trajectory is successfully applied in robot locomotion control, and a comparably stable movement is achieved.

To further observe the difference between the three desired COM trajectories, a comparison of them is shown in Figure 3.

Figure 3. Comparison of three desired COM trajectories.

From Figure 3, it can be seen that the two curves produced by IPM and the sinusoid almost coincide with each other, while the curve learned in this research differs obviously from the other two. This may be the main reason why the learned COM trajectory achieves a higher consistent measure.

2) Desired and Real Trajectory Comparison

The desired trajectory and the real one in the three systems are compared in Figure 4. It consists of three subfigures that show the comparison of the two curves in the three systems. According to the idea of the consistent measure, the more similar the two curves are, the higher the SC2 score will be, and hence the better the desired COM trajectory will be. From Figure 4, it can be seen intuitively that the degree of match in (a) is the lowest; in (b) the degree of match is better and the two curves are closer to each other; while in (c) the degree of match appears much better. This agrees with what is shown in Figure 2, and further indicates that the proposed measure SC2 is effective. This intuitive degree of match is observed not only from the global view of the similarity between the two curves, but also from the local view of the similarity among subcurves over each period T.

Furthermore, the intuitive degree of match in terms of similarity among subcurves over each period T is only meaningful for the real COM trajectory, since the desired curve is strictly periodic stationary. It provides information on the stability of real robot locomotion, and thus further demonstrates that the proposed COM trajectory learning framework is effective.

(a) Sinusoid based robot locomotion control system
(b) IPM based robot locomotion control system
(c) Learned COM trajectory based robot locomotion control system
Figure 4. Comparison between the desired and real COM trajectory.

3) Online Learning Process

The real robot based COM trajectory learning process according to the presented algorithm is shown in Figure 5, where the SC2 measure score at each evaluation (trial) is indicated. The curve consists of two parts: the first part is evaluations based on parameter sets obtained by the active learning strategy DEAL, while the last part is evaluations based on parameter sets obtained by the random gradient learning policy.

Figure 5. COM trajectory online learning process based on SC2 reward.

Each point in the learning process curve is the average of 3 experiments, which is expected to guarantee statistical significance. From Figure 5, it can be seen that the learning

process is efficient and converged within about 60 trials. This efficiency is mainly owing to the active learning strategy, which could greatly reduce the hypothesis space.

IV. CONCLUSIONS

In order to obtain more stable and natural robot locomotion control, this paper has focused on the issue of COM trajectory generation. Unlike previous work based on predefinition, model calculation, etc., a new learning framework is established for a better COM trajectory, with improved modeling capacity and learning efficiency.

Firstly, to measure a COM trajectory candidate, a new index, the Self-Consistent Stability Criterion (SC2), is proposed, from the perspective that the better the robot locomotion capability, the more consistent the desired and real COM trajectories are. Then, by employing piecewise monotone cubic interpolation, a new COM trajectory modeling approach is presented, which has a better modeling capacity and could theoretically approximate any curve given a proper parameter set. Furthermore, considering the requirements of real robot based learning, an active learning method followed by a random gradient learning approach is introduced, where active learning (DEAL) improves the convergence speed by greatly reducing the hypothesis space, while random gradient learning is expected to avoid falling into a local optimum. Experiments are performed with three systems on the humanoid robot PKU-HR4 for locomotion control, and the results show the effectiveness of the contribution in this research.

ACKNOWLEDGMENT

The work was supported in part by the National Natural Science Foundation of China (No. 90920302, No. 91120001, No. 61121002), a "Twelfth Five-Year" National Science & Technology Support Program of China (No. 2012BAI12B01) and a research program from Microsoft China. The authors greatly appreciate Professor Huisheng Chi for his helpful and beneficial suggestions. The authors also greatly appreciate the anonymous reviewers for their valuable comments.

REFERENCES

[1] J. Denavit and R. S. Hartenberg, "A kinematic notation for lower-pair mechanisms based on matrices," Journal of Applied Mechanics, vol. 22, no. 2, pp. 215-221, 1955.
[2] M. Vukobratovic and D. Juricic, "Contribution to the synthesis of biped gait," IEEE Transactions on Biomedical Engineering, vol. 16, no. 1, pp. 1-6, 1969.
[3] M. Vukobratovic and B. Borovac, "Zero-moment point – thirty five years of its life," International Journal of Humanoid Robotics, vol. 1, no. 1, pp. 157-173, 2004.
[4] S. Kajita, O. Matsumoto and M. Saigo, "Real-time 3D walking pattern generation for a biped robot with telescopic legs," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 2299-2308, Seoul, Korea, 2001.
[5] K. Hirai, M. Hirose, Y. Haikawa and T. Takenaka, "The Development of Honda Humanoid Robot," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 1321-1326, Leuven, Belgium, 1998.
[6] S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada, K. Yokoi and H. Hirukawa, "Biped walking pattern generation by using preview control of zero-moment point," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 1620-1626, Taipei, Taiwan.
[7] T. Matsubara, J. Morimoto, J. Nakanishi, S. H. Hyon, J. G. Hale and G. Cheng, "Learning to acquire whole-body humanoid CoM movements to achieve dynamic tasks," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 2688-2693, Roma, Italy.
[8] A. J. Ijspeert, "Central pattern generators for locomotion control in animals and robots: a review," Neural Networks, vol. 21, no. 4, pp. 642-653, 2008.
[9] R. Chalodhorn, D. B. Grimes, K. Grochow and R. P. N. Rao, "Learning to walk through imitation," In Proceedings of International Joint Conference on Artificial Intelligence, pp. 2084-2090, Hyderabad, India, 2007.
[10] G. Taga, Y. Yamaguchi and H. Shimizu, "Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment," Biological Cybernetics, vol. 65, no. 3, pp. 147-159, 1991.
[11] G. Taga, "A model of the neuro-musculo-skeletal system for anticipatory adjustment of human locomotion during obstacle avoidance," Biological Cybernetics, vol. 78, no. 1, pp. 9-17, 1998.
[12] F. Delcomyn, "Neural basis for rhythmic behaviour in animals," Science, vol. 210, no. 4469, pp. 492-498, 1980.
[13] K. Matsuoka, "Sustained Oscillations Generated by Mutually Inhibiting Neurons with Adaptation," Biological Cybernetics, vol. 52, no. 6, pp. 367-376, 1985.
[14] I. Ha, Y. Tamura and H. Asama, "Gait pattern generation and stabilization for humanoid robot based on coupled oscillators," In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3207-3212, San Francisco, CA, USA, 2011.
[15] G. Endo, J. Nakanishi, J. Morimoto, and G. Cheng, "Experimental studies of a neural oscillator for biped locomotion with QRIO," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 598-604, Barcelona, Spain, 2005.
[16] F. N. Fritsch and R. E. Carlson, "Monotone piecewise cubic interpolation," SIAM Journal on Numerical Analysis, vol. 17, no. 2, pp. 238-246, 1980.
[17] Y. Choi, B. J. You, and S. R. Oh, "On the stability of indirect ZMP controller for biped robot systems," In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1966-1971, Sendai, Japan, 2004.
[18] A. M. Lyapunov, "The general problem of the stability of motion" (English translations), Taylor and Francis, International Journal of Control, vol. 55, no. 3, pp. 531-773, 1992.
[19] M. T. Rosenstein, J. J. Collins, and C. J. De Luca, "A practical method for calculating largest Lyapunov exponents from small data sets," Physica D, vol. 65, pp. 117-134, 1993.
[20] H. Kantz and T. Schreiber, "Nonlinear Time Series Analysis," Cambridge University Press, 1997.
[21] D. S. Luo, Y. Wang and X. H. Wu, "Active online learning of the bipedal walking," In Proceedings of IEEE-RAS International Conference on Humanoid Robots, pp. 352-357, Bled, Slovenia, 2011.
[22] N. Kohl and P. Stone, "Policy gradient reinforcement learning for fast quadrupedal locomotion," In Proceedings of IEEE International Conference on Robotics and Automation, pp. 2619-2624, New Orleans, LA, USA, 2004.
[23] M. Friedmann, J. Kiener, S. Petters, et al., "Versatile, high-quality motions and behavior control of humanoid soccer robots," In Proceedings of the Workshop on Humanoid Soccer Robots of the IEEE-RAS International Conference on Humanoid Robots, pp. 9-16, Genoa, Italy, 2006.
[24] C. Niehaus, T. Röfer and T. Laue, "Gait-optimization on a humanoid robot using particle swarm optimization," In Proceedings of the Workshop on Humanoid Soccer Robots of the IEEE-RAS International Conference on Humanoid Robots, Pittsburgh, PA, USA, 2007.

