A Model-free Deep Reinforcement Learning Approach for Robotic Manipulators Path Planning
Abstract: Path planning problems have attracted much attention in robotic fields such as manipulation. In this paper, a model-free, off-policy, actor-critic deep reinforcement learning method is proposed to solve the classical path planning problem of a UR5 robot arm. Unlike standard path planning methods, the reward design of the proposed method includes a smoothness reward, which ensures a smooth trajectory of the UR5 robot arm when accomplishing path planning tasks. Additionally, the proposed method does not rely on any model, whereas the standard path planning method is model-based. The proposed method not only guarantees that the joint angles of the UR5 robot arm lie within the allowable range each time the arm reaches a random target point, but also ensures that the joint angles remain within the allowable range during the entire training episode. A standard path planning method was implemented in the Robot Operating System (ROS) as a baseline, and the proposed method was applied in CoppeliaSim to validate its feasibility. The experiments show that training with the proposed method is successful.
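The abstract names the class of algorithm (model-free, off-policy, actor-critic) but this excerpt omits the network and update details. Purely as an illustrative sketch of that class, and not the authors' implementation, the following is a minimal DDPG-style actor-critic update in Keras (the paper cites Keras [23]); every dimension, layer size, and hyperparameter below is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM = 12, 6   # hypothetical observation/action sizes
GAMMA, TAU = 0.99, 0.005        # assumed discount and soft-update rates

def make_actor():
    # Maps a state to a bounded action in [-1, 1]^ACTION_DIM.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(STATE_DIM,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(ACTION_DIM, activation="tanh"),
    ])

def make_critic():
    # Maps a (state, action) pair to a scalar Q-value estimate.
    s = tf.keras.Input(shape=(STATE_DIM,))
    a = tf.keras.Input(shape=(ACTION_DIM,))
    x = layers.Concatenate()([s, a])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    return tf.keras.Model([s, a], layers.Dense(1)(x))

actor, critic = make_actor(), make_critic()
target_actor, target_critic = make_actor(), make_critic()
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())
actor_opt = tf.keras.optimizers.Adam(1e-4)
critic_opt = tf.keras.optimizers.Adam(1e-3)

def update(states, actions, rewards, next_states, dones):
    """One off-policy update from a replay-buffer minibatch (float tensors)."""
    # Critic: regress Q(s, a) onto the one-step bootstrapped target.
    target_q = tf.squeeze(
        target_critic([next_states, target_actor(next_states)]), axis=1)
    y = rewards + GAMMA * (1.0 - dones) * target_q
    with tf.GradientTape() as tape:
        q = tf.squeeze(critic([states, actions]), axis=1)
        critic_loss = tf.reduce_mean(tf.square(y - q))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))

    # Actor: ascend the critic's value of the actor's own actions.
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([states, actor(states)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))

    # Slowly track the online networks with the targets (Polyak averaging).
    for net, tgt in ((actor, target_actor), (critic, target_critic)):
        tgt.set_weights([TAU * w + (1.0 - TAU) * tw
                         for w, tw in zip(net.get_weights(), tgt.get_weights())])
```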
r = r_d + r_o + r_a + r_k    (1)
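The definitions of the four terms in Eq. (1) lie outside this excerpt. The sketch below shows one plausible reading, in which every term definition is an assumption made only for illustration: a distance-to-target term r_d, an obstacle-clearance term r_o, an action-smoothness term r_a (matching the smoothness reward described in the abstract), and a joint-limit term r_k.

```python
import numpy as np

def reward(joint_angles, prev_action, action, ee_pos, target_pos,
           obstacle_dist, joint_low, joint_high):
    """Composite reward r = r_d + r_o + r_a + r_k in the spirit of Eq. (1).

    All four term definitions below are illustrative assumptions; the
    paper's exact formulas are not shown in this excerpt.
    """
    # r_d: negative Euclidean distance from end-effector to target.
    r_d = -np.linalg.norm(ee_pos - target_pos)

    # r_o: penalty that grows as the arm approaches an obstacle
    # (assumed 0.2 m safety margin).
    r_o = -max(0.0, 0.2 - obstacle_dist)

    # r_a: smoothness term penalizing abrupt changes between consecutive
    # actions, which is what yields a smooth trajectory.
    r_a = -0.1 * np.linalg.norm(np.asarray(action) - np.asarray(prev_action))

    # r_k: penalty whenever any joint angle leaves its allowable range.
    out_of_range = np.any((joint_angles < joint_low) | (joint_angles > joint_high))
    r_k = -1.0 if out_of_range else 0.0

    return r_d + r_o + r_a + r_k
```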
Fig. 4: The process of the UR5 robot arm reaching the target position. The pink disc depicts the position of the target.
Fig. 5: Pick and place tasks via the standard path planning method. The UR5 robot arm with orange color denotes the initial position of the actual UR5 robot arm. The UR5 robot arm with grey color represents the current position of the actual UR5 robot arm. The image viewer at the bottom right corner shows the view from the 3D camera. The location of the box can be computed from the ArUco marker [27] on top of it.

Fig. 6: Average reward of the proposed method. The transparent area indicates the standard deviation of the results.
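The Fig. 5 caption states that the box location is computed from the ArUco marker [27] on its top face. As a hedged illustration only, detection and pose estimation with OpenCV's classic cv2.aruco module (opencv-contrib; newer OpenCV releases wrap this in an ArucoDetector class) look roughly like this; the camera intrinsics, marker size, dictionary choice, and image path are all assumptions.

```python
import cv2
import numpy as np

# Assumed calibration: a plausible intrinsic matrix and zero distortion.
camera_matrix = np.array([[615.0, 0.0, 320.0],
                          [0.0, 615.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)
MARKER_SIZE_M = 0.05  # assumed 5 cm marker edge length

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
frame = cv2.imread("camera_view.png")  # placeholder path for a 3D-camera frame
corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)

if ids is not None:
    # One rotation/translation vector per detected marker; tvec gives the
    # marker (and hence box-top) position in the camera frame.
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_SIZE_M, camera_matrix, dist_coeffs)
    print("box top position in camera frame:", tvecs[0].ravel())
```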
…is trying to find path planning solutions in the left half-plane. It can be deduced from Fig. 8 that the proposed training method is feasible.

4.2 Comparison with Standard Path Planning Method

Unlike the standard path planning method, the proposed method can not only guarantee that the joint angles of the UR5 robot arm are within the allowable range each time the arm reaches the target point, but also ensure that the joint angles remain within the allowable range during the entire training episode.
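This excerpt does not show how the proposed method enforces the in-range guarantee during training. One common mechanism, offered here purely as an assumed sketch and not as the paper's method, is to clip each commanded joint target to the allowable interval before it is sent to the simulator; the limits and step scale below are illustrative.

```python
import numpy as np

# Assumed joint limits (radians); the paper's exact allowable range
# is not given in this excerpt.
JOINT_LOW = np.full(6, -np.pi)
JOINT_HIGH = np.full(6, np.pi)

def safe_joint_target(current_angles, action, step_scale=0.05):
    """Map a policy action in [-1, 1]^6 to an in-range joint target.

    Clipping the target to [JOINT_LOW, JOINT_HIGH] makes every commanded
    configuration valid at every step, so the joints stay within range
    for the whole episode, not just when the goal is reached.
    """
    target = current_angles + step_scale * np.asarray(action)
    return np.clip(target, JOINT_LOW, JOINT_HIGH)
```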
Fig. 9: Failed path planning with the standard path planning method. The orange UR5 robot arm stands for the initial position of the actual UR5 robot arm, the grey UR5 robot arm stands for the current position of the actual UR5 robot arm, and the transparent UR5 robot arm denotes the trajectory planned for the actual movement of the real UR5 robot arm, generated by OMPL (the Open Motion Planning Library) in rviz.

Fig. 9 shows an example of failed path planning. It can be seen from Fig. 9 that the trajectory planned by the standard path planning method contains a severe jerk, which makes it impossible to execute in the actual experiment.
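"Severe jerk" can be made quantitative: jerk is the third time-derivative of joint position, which can be approximated from a sampled trajectory by repeated finite differences. The check below is a small sketch; the sampling period, waypoint array, and rejection threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def max_jerk(joint_traj, dt):
    """Peak jerk magnitude of a sampled joint trajectory.

    joint_traj: (T, 6) array of joint angles sampled every dt seconds.
    Jerk is the third derivative of position, estimated here with three
    successive finite differences.
    """
    vel = np.diff(joint_traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    return np.abs(jerk).max()

# Illustrative use: flag plans too jerky to execute on the real arm.
# traj = np.array(planned_joint_waypoints)   # hypothetical waypoints
# if max_jerk(traj, dt=0.008) > 50.0:        # assumed threshold
#     print("reject plan: trajectory too jerky to execute")
```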
Table 1 presents the success-rate comparison for path planning. It can be inferred that the standard path planning method failed to complete the task.

5. CONCLUSION

In this paper, a model-free, off-policy, actor-critic deep reinforcement learning method is proposed to solve the problem of path planning. The simulation results in CoppeliaSim validated the feasibility of the proposed method. For the hardware implementation, a standard path planning method was implemented as a baseline against which to compare and contrast both methods. It can be deduced that the proposed method can generate a smooth trajectory and keep the joint angles of the UR5 robot arm within the allowable range at all times. In addition, the proposed method does not rely on any model, whereas the standard path planning method is model-based. In future work, the proposed method will be implemented on the real UR5 robot arm. Moreover, vision information can be incorporated into the proposed method to accomplish more complicated tasks.

ACKNOWLEDGEMENT

This work was supported by EPSRC project No. EP/S03286X/1 and EPSRC RAIN project No. EP/R026084/1.

REFERENCES

[1] K. Wei and B. Ren, "A method on dynamic path planning for robotic manipulator autonomous obstacle avoidance based on an improved RRT algorithm," Sensors, vol. 18, no. 2, p. 571, 2018.
[2] K. Wu, J. Hu, B. Lennox, and F. Arvin, "SDP-based robust formation-containment coordination of swarm robotic systems with input saturation," Journal of Intelligent & Robotic Systems, vol. 102, no. 1, pp. 1–16, 2021.

[3] H. Niu, Z. Ji, F. Arvin, B. Lennox, H. Yin, and J. Carrasco, "Accelerated sim-to-real deep reinforcement learning: Learning collision avoidance from human player," in 2021 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2021, pp. 144–149.

[4] K. Wu, J. Hu, B. Lennox, and F. Arvin, "Finite-time bearing-only formation tracking of heterogeneous mobile robots with collision avoidance," IEEE Transactions on Circuits and Systems II: Express Briefs, 2021.

[5] J. Hu, H. Niu, J. Carrasco, B. Lennox, and F. Arvin, "Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning," IEEE Transactions on Vehicular Technology, vol. 69, no. 12, pp. 14413–14423, 2020.

[6] H. Niu, Z. Ji, Z. Zhu, H. Yin, and J. Carrasco, "3D vision-guided pick-and-place using KUKA LBR iiwa robot," in 2021 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2021, pp. 592–593.

[7] B. Dasgupta and T. Mruthyunjaya, "Singularity-free path planning for the Stewart platform manipulator," Mechanism and Machine Theory, vol. 33, no. 6, pp. 711–725, 1998.

[8] A. A. Maciejewski and C. A. Klein, "Obstacle avoidance for kinematically redundant manipulators in dynamically varying environments," The International Journal of Robotics Research, vol. 4, no. 3, pp. 109–117, 1985.

[9] T. Greville, "The pseudoinverse of a rectangular or singular matrix and its application to the solution of systems of linear equations," SIAM Review, vol. 1, no. 1, pp. 38–43, 1959.

[10] S. R. Buss, "Introduction to inverse kinematics with Jacobian transpose, pseudoinverse and damped least squares methods," IEEE Journal of Robotics and Automation, vol. 17, no. 1-19, p. 16, 2004.

[11] D. E. Whitney, "Resolved motion rate control of manipulators and human prostheses," IEEE Transactions on Man-Machine Systems, vol. 10, no. 2, pp. 47–53, 1969.

[12] J. Lander and G. CONTENT, "Making Kine more flexible," Game Developer Magazine, vol. 1, no. 15-22, p. 2, 1998.

[13] R. Mukundan, "A robust inverse kinematics algorithm for animating a joint chain," International Journal of Computer Applications in Technology, vol. 34, no. 4, pp. 303–308, 2009.

[14] J.-J. Park, J.-H. Kim, and J.-B. Song, "Path planning for a robot manipulator based on probabilistic roadmap and reinforcement learning," International Journal of Control, Automation, and Systems, vol. 5, no. 6, pp. 674–680, 2007.

[15] R. S. Sutton, "Introduction: The challenge of reinforcement learning," in Reinforcement Learning. Springer, 1992, pp. 1–3.

[16] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.

[17] H. Zhang, H. Jiang, Y. Luo, and G. Xiao, "Data-driven optimal consensus control for discrete-time multi-agent systems with unknown dynamics using reinforcement learning method," IEEE Transactions on Industrial Electronics, vol. 64, no. 5, pp. 4091–4100, 2016.

[18] H. Kandath, J. Senthilnath, and S. Sundaram, "Multi-agent consensus under communication failure using actor-critic reinforcement learning," in 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2018, pp. 1461–1465.

[19] Y. Zhang and M. M. Zavlanos, "Distributed off-policy actor-critic reinforcement learning with policy consensus," in 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 4674–4679.

[20] E. Rohmer, S. P. Singh, and M. Freese, "V-REP: A versatile and scalable robot simulation framework," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013, pp. 1321–1326.

[21] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010.

[22] M. Lutz, Programming Python. O'Reilly Media, Inc., 2001.

[23] N. Ketkar, "Introduction to Keras," in Deep Learning with Python. Springer, 2017, pp. 97–111.

[24] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: an open-source robot operating system," in ICRA Workshop on Open Source Software, vol. 3. Kobe, Japan, 2009, p. 5.

[25] H. R. Kam, S.-H. Lee, T. Park, and C.-H. Kim, "Rviz: a toolkit for real domain data visualization," Telecommunication Systems, vol. 60, no. 2, pp. 337–345, 2015.

[26] I. A. Sucan, M. Moll, and L. E. Kavraki, "The open motion planning library," IEEE Robotics & Automation Magazine, vol. 19, no. 4, pp. 72–82, 2012.

[27] R. M. Salinas, "ArUco: A minimal library for augmented reality applications based on OpenCV," 2012.