DOI 10.1109/LARS-SBR.2015.41
2015 12th Latin American Robotics Symposium and 3rd Brazilian Symposium on Robotics
I. INTRODUCTION
Analyzing the research related to the development of humanoid robot locomotion, it can be noted that the vast majority of researchers focus on the stability of humanoid robots walking on flat floors. However, in a real environment, these robots must also be able to walk on inclined floors.
Climbing ramps is an important ability for humanoid robots:
ramps exist everywhere in the world, such as in accessibility
ramps and building entrances. Also, in the RoboCup domain,
ramps are used in the Rescue Robot League arenas, which
have 15-degree ramps to challenge localization and mapping implementations and to increase complexity. In other leagues, such as the Humanoid Leagues, stability on slightly sloped floors is important, because the arenas are not completely leveled. Some examples of ramps are presented in Figure 1.

Fig. 1: Ramps in the world: accessibility ramps, building entrance ramps, RoboCup Rescue Robots in their arenas.

This work proposes the use of Reinforcement Learning [1] to learn the action policy that will make a robot walk in an upright position on lightly sloped terrain. To do this, we propose a system with two layers: one that generates the gait of the robot, making use of a gyroscope in a closed feedback loop to correct the robot's gait; and a second layer – the RL component – that makes use of an accelerometer to generate a correction for the gait when the slope of the floor the robot is walking on changes.

This work was motivated by changes in the rules of the RoboCup Humanoid League for 2015: from this year onwards, the fields on which the robots play will be made of artificial grass, with the intent of making the game closer to the real soccer environment. Artificial grass fields are not perfectly plain, and variations in the turf cause small slopes that make the robots fall. The system proposed here intends to address this problem; the experiments performed in this article, however, demonstrate the model's efficiency in solving the more general problem of walking on sloped floors.
oscillator). Each oscillator develops a synchronized sinusoidal trajectory, allowing the robot to have a stable walk. The walk with oscillators does not perform the real-time ZMP calculation; however, our results show that this method performs a walk based on a dynamic model of ZMP, with an advantage: low computational cost. We can express the balance oscillator as in 3 and the movement oscillator as in 4:

OSCbal = ρbal sin(ωbal t + δbal) + μbal   (3)

OSCmove =
  ρmove,                                if 0 ≤ t < rT/4
  ρmove sin(ωmove t + δmove),           if rT/4 ≤ t < T/2 − rT/4
  −ρmove,                               if T/2 − rT/4 ≤ t < T/2 + rT/4
  ρmove sin(ωmove (t − rT/2) + δmove),  if T/2 + rT/4 ≤ t < T − rT/4
  ρmove,                                if T − rT/4 ≤ t < T   (4)

Where: ρ is the sinusoidal amplitude, ω is the angular velocity, δ is a phase shift, μ is an offset, and r is the Double Support Phase (DSP) ratio.

Ha, Tamura and Asama [6] assumed the dynamic model of the robot to be an inverted pendulum to develop the feedback model. The computed compensation torque is provided to the balance oscillator parameter μbal. According to Ha, Tamura and Asama, μbal was assumed to operate as a torque generator, so OSCbal can be written according to equation 5:

OSCbal = ρbal sin(ωbal t + δbal) + kp θr + kd θ̇r   (5)

Where: θ̇r is the gyroscope sensor data; θr is the joint angle sensor data; kp and kd are found experimentally.

III. RELATED WORK

As far as we know, the first work that specifically addressed the walk of a humanoid, biped robot on sloped terrain was published by Zheng and Shen [7], who proposed a scheme to enable a biped robot to climb sloping surfaces by using position sensors on the joints and force sensors underneath the heel and toe to detect the transition of the supporting terrain from a flat floor to a sloping surface. Their algorithm "for the biped robot control system evaluate the inclination of the supporting foot and the unknown gradient, and a compliant motion scheme is then used to enable the robot to transfer from level walking to climbing the slope".

While Zheng and Shen may be the first paper to address the problem, Chew, Pratt and Pratt [8] is currently the most cited one: they proposed control strategies for walking dynamically and steadily over sloped terrain with unknown slope gradients and transition locations. The proposed algorithm is based on geometric considerations, being very simple, detecting the ground blindly using foot contact switches. With this algorithm, they were able to make a simulated 7-link planar biped walk up and down slopes.

More recent work on this subject includes: Zhou et al. [9], who proposed a dynamically stable gait planning algorithm that uses zero-moment point (ZMP) constraints for climbing a sloping surface; Hong et al. [10], who proposed a command state (CS)-based modifiable walking pattern generator for modifiable walking on an inclined plane in both the pitch and roll directions; and Huang et al. [11], who compute the future ZMP locations taking into account a known slope gradient. The literature about biped walking on uneven terrain – such as grass, sand and rocks – is much larger, and unfortunately cannot be covered in this work.

The system proposed in this paper differs from all the published research because, as far as we know, we are the first to combine Reinforcement Learning techniques and gait generation, in a two-layer architecture, to achieve the goal of climbing up and down slopes.

IV. PROPOSED SYSTEM'S ARCHITECTURE

The proposed architecture of our system, presented in Figure 2, is a two-layer combination of the traditional gait generation control loop with a reinforcement learning component.

The low-level layer implements Ha, Tamura and Asama's [6] gait generation system, where the control generates the sinusoidal commands for the servomotors and implements a closed feedback control loop using a gyroscope to correct the robot's gait.

The Reinforcement Learning layer uses an accelerometer to generate a correction for the gait when the slope of the floor that the robot is walking on changes. To be able to correct the posture of the robot, this layer has to learn the optimal action policy needed to stabilize the robot in the upright position.

The problem that the RL layer is trying to learn can be described as a finite state MDP, where the states are the pitch angle of the ankle joint, discretized into 61 states using the accelerometer values (the discretization created 30 states where the robot tilts forward, 30 where it tilts backward, and one state that is the central position, which means an upright robot). In this layer, the robot can only perform two actions: increment or decrement the position of the ankle joint. The transition function is deterministic, as one action will make the robot move the ankle joint to the neighboring state. Finally, the reward given is a positive reinforcement when the robot reaches the upright position and a negative one for all other states. We only used 61 states because the learning was performed on a real robot, so we needed to reduce the state space as much as possible, thus reducing training time.

The algorithm in the RL layer acquires the measures of the accelerometer in the X and Z axes. The X axis indicates whether the robot is leaning forward or backward in a qualitative way, and the Z axis indicates quantitatively how much the robot is inclined forward or backward. The accelerometer is positioned near the center of mass of the robot.

To combine the two layers of control, the RL layer is able to change the parameter pitchOffset of the gait generation. In this way, when the floor inclination changes, the accelerometer perceives this change, and the RL layer defines a new value for pitchOffset.
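Equations 3 to 5 can be written out directly as code. The sketch below only follows the symbol definitions given with the equations; it is an illustration, not the implementation of [6], and choosing ω, δ and T so that the constant and sinusoidal pieces of equation 4 join smoothly is left open, as in the text:

```python
import math

def osc_bal(t, rho, omega, delta, mu):
    # Eq. (3): balance oscillator, a sinusoid with offset mu.
    return rho * math.sin(omega * t + delta) + mu

def osc_move(t, rho, omega, delta, T, r):
    # Eq. (4): movement oscillator over one gait period T with DSP ratio r.
    # The sinusoid is held at its extremes during the double-support phases.
    t = t % T
    if t < r * T / 4:
        return rho
    elif t < T / 2 - r * T / 4:
        return rho * math.sin(omega * t + delta)
    elif t < T / 2 + r * T / 4:
        return -rho
    elif t < T - r * T / 4:
        return rho * math.sin(omega * (t - r * T / 2) + delta)
    else:
        return rho

def osc_bal_fb(t, rho, omega, delta, kp, theta_r, kd, theta_r_dot):
    # Eq. (5): mu_bal replaced by PD-style feedback from the joint angle
    # (theta_r) and the gyroscope reading (theta_r_dot).
    return rho * math.sin(omega * t + delta) + kp * theta_r + kd * theta_r_dot
```

In this form the low computational cost claimed in the text is apparent: each servomotor command per time step is a single sine evaluation plus the feedback term, with no online ZMP computation.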
Fig. 2: The two-layer architecture proposed.
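The RL layer's MDP can be sketched as a tabular Q-learning loop, a natural reading given the citation of Watkins [2]. This is an illustrative reconstruction, not the authors' code: the reward magnitudes, learning rate, discount factor and exploration constant below are assumptions, not the values used in the experiments.

```python
import random

random.seed(0)                   # deterministic run for the example

N_TILT = 30                      # 30 forward + 30 backward tilt states
N_STATES = 2 * N_TILT + 1        # 61 discretized ankle-pitch states
UPRIGHT = N_TILT                 # index of the central (upright) state
ACTIONS = (-1, +1)               # decrement / increment the ankle joint

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    # Deterministic transition: one action moves the ankle joint to the
    # neighboring state; the reward is positive only in the upright state.
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == UPRIGHT else -1.0
    return next_state, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    for _ in range(episodes):
        s = random.randrange(N_STATES)        # start at a random tilt
        while s != UPRIGHT:                   # episode ends when upright
            a = random.randrange(2) if random.random() < eps else \
                max((0, 1), key=lambda i: Q[s][i])
            s2, r = step(s, ACTIONS[a])
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2

q_learning()
```

After training, the greedy policy moves every tilted state toward the upright state, which is the behavior the RL layer needs at run time.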
Fig. 6: The ankles' servomotors being adjusted according to the value of pitchOffset.
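The coupling shown in Figure 6 can be illustrated as follows. Both functions are hypothetical reconstructions: the paper does not give the exact mapping from accelerometer readings to the 61 states, so the atan2-based discretization and the 30-degree saturation angle are assumptions, and the offset application simply shifts the gait generator's ankle-pitch target by pitchOffset.

```python
import math

def discretize_tilt(acc_x, acc_z, n_tilt=30, max_tilt_deg=30.0):
    # Hypothetical mapping from accelerometer readings to one of the 61
    # states: the sign of X gives the lean direction, the X/Z ratio its
    # magnitude. The 30-degree saturation angle is an assumption.
    angle = math.degrees(math.atan2(acc_x, acc_z))   # tilt from vertical
    frac = max(-1.0, min(1.0, angle / max_tilt_deg))
    return n_tilt + round(frac * n_tilt)             # 0..60, 30 = upright

def apply_offset(nominal_ankle_pitch, pitch_offset):
    # The RL layer's only actuator on the gait: shift the ankle-pitch
    # servo target produced by the gait generator by pitchOffset.
    return nominal_ankle_pitch + pitch_offset
```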
missing. In order to facilitate the use of this system by other researchers, all the software is available for download at http://fei.edu.br/~rbianchi/software.html.

Fig. 7: Robot climbing an accessibility ramp.

VI. CONCLUSIONS

With the introduction of artificial grass fields in the RoboCup Humanoid League, the study of new equilibrium methods for robot locomotion became a necessity. Although the use of a gyroscope is already sufficient to maintain the stability of the robot on this type of ground, the use of other sensors, such as an accelerometer, combined with Reinforcement Learning techniques to help the robot stabilize itself during the walk, seems very promising.

The proposed architecture of our system, a two-layer combination of the traditional gait generation control loop with a reinforcement learning component that adjusts the robot's body inclination, improved the walking of the robot, enabling it to walk in different situations of sloping floors (ascending and descending).

Using the same architecture described in this article, other MDP models are going to be implemented in future research, using a 3D simulator, as is done in some research applying reinforcement learning in simulation [14]. The advantage of using a simulator is that a large number of trials may be run. On the other hand, transferring the behavior learned in the simulator to a real robot is difficult, as explained by Stone [14]; however, the aim of our research is focused on the domain of soccer humanoid robots in real environments.

One drawback of this system is that, during the walk, the robot needs to decrease the step size as the inclination increases. So, as future work, we propose to adapt the step height according to the inclination, to maintain the same step size, and to use Reinforcement Learning to update other variables, improving the stability of the robot during the walk.

ACKNOWLEDGMENT

The authors acknowledge the Centro Universitário da FEI and the Robotics and Artificial Intelligence Laboratory for supporting this project. The authors would also like to thank CAPES and CNPq for the scholarships provided.

REFERENCES

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[2] C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge University, 1989.
[3] T. McGeer, "Passive dynamic walking," The International Journal of Robotics Research, vol. 9, no. 2, pp. 62–82, 1990.
[4] M. Vukobratovic and D. Juricic, "Contribution to the synthesis of biped gait," IEEE Transactions on Biomedical Engineering, no. 1, pp. 1–6, 1969.
[5] M. Vukobratović and B. Borovac, "Zero-moment point – thirty five years of its life," International Journal of Humanoid Robotics, vol. 1, no. 1, pp. 157–173, 2004.
[6] I. Ha, Y. Tamura, and H. Asama, "Gait pattern generation and stabilization for humanoid robot based on coupled oscillators," in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2011, pp. 3207–3212.
[7] Y. Zheng and J. Shen, "Gait synthesis for the SD-2 biped robot to climb sloping surface," IEEE Transactions on Robotics and Automation, vol. 6, no. 1, pp. 86–96, Feb 1990.
[8] C.-M. Chew, J. Pratt, and G. Pratt, "Blind walking of a planar bipedal robot on sloped terrain," in Proceedings of the 1999 IEEE International Conference on Robotics and Automation, vol. 1, 1999, pp. 381–386.
[9] C. Zhou, P. K. Yue, J. Ni, and S.-B. Chan, "Dynamically stable gait planning for a humanoid robot to climb sloping surface," in 2004 IEEE Conference on Robotics, Automation and Mechatronics, vol. 1, Dec 2004, pp. 341–346.
[10] Y.-D. Hong, B.-J. Lee, and J.-H. Kim, "Command state-based modifiable walking pattern generation on an inclined plane in pitch and roll directions for humanoid robots," IEEE/ASME Transactions on Mechatronics, vol. 16, no. 4, pp. 783–789, Aug 2011.
[11] W. Huang, C.-M. Chew, Y. Zheng, and G.-S. Hong, "Pattern generation for bipedal walking on slopes and stairs," in 2008 8th IEEE-RAS International Conference on Humanoid Robots, Dec 2008, pp. 205–210.
[12] I. Ha, Y. Tamura, H. Asama, J. Han, and D. W. Hong, "Development of open humanoid platform DARwIn-OP," in Proceedings of the SICE Annual Conference (SICE). IEEE, 2011, pp. 2178–2181.
[13] D. H. Perico, I. J. Silva, C. O. Vilao, T. P. Homem, R. C. Destro, F. Tonidandel, and R. A. Bianchi, "Hardware and software aspects of the design and assembly of a new humanoid robot for RoboCup soccer," in 2014 Joint Conference on Robotics: SBR-LARS Robotics Symposium and Robocontrol. IEEE, 2014, pp. 73–78.
[14] A. Farchy, S. Barrett, P. MacAlpine, and P. Stone, "Humanoid robots learning to walk faster: From the real world to simulation and back," in Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems. IFAAMAS, 2013, pp. 39–46.