
MIPRO 2020, September 28 - October 02, 2020, Opatija, Croatia

Estimating Robot Manipulator End-effector Forces using Deep Learning
Stanko Kružić∗, Josip Musić∗, Roman Kamnik†, Vladan Papić∗
∗ University of Split, Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture
† University of Ljubljana, Faculty of Electrical Engineering

E-mail: {skruzic, jmusic, vpapic}@fesb.hr, roman.kamnik@fe.uni-lj.si

Abstract—The measurement of robotic manipulator end-effector interaction forces can in certain cases be challenging, especially when using robots that have a small payload (and are consequently not capable of using a wrist-mounted force sensor), which is often the case with educational robots. In the paper, a method for estimation of end-effector forces using measurements from a base-mounted force sensor and deep neural networks is presented. Several deep architectures were trained using data collected on a real 6-DOF robot manipulator (Commonplace Robotics Mover6 robot) using a custom-made interaction object operated by a human. The obtained results show that, when using an appropriate deep architecture, promising estimates can be achieved (with an RMSE metric on the test set of 16%, 12% and 6% of the maximum force in the respective directions of the x, y and z axes). This makes the approach suitable for use in a variety of applications, including but not limited to haptic feedback interfaces for robot control.

Index Terms—robotics, force estimation, deep learning

I. INTRODUCTION

Robotic manipulators are nowadays used for a variety of different tasks and are usually programmed to execute a set of predefined trajectories to complete a given task. This is usually referred to as position control. However, in some applications this may be inappropriate, since feedback only about the current position does not guarantee successful task completion. In these cases, force control (which may also be thought of as interaction control [1]) is used, and therefore the forces on the robot end-effector need to be measured. This is especially true when robot actions involve physical contact between the robot and the environment.

In most scenarios, a force sensor is mounted on the robot wrist. However, when using manipulators with a small payload, it may be inconvenient or even impossible to use a wrist-mounted sensor. This is often the case with educational robotic manipulators or other small robots. This disadvantage can be overcome by estimating those forces instead of measuring them directly. One common method of force estimation uses force observers, as in [2], [3], [4]. In [2] the estimation is done using model-based observers, but it requires an accurate dynamic robot model. In [3] both linear and non-linear force observers were presented (the latter applied in a robot force control scheme), while in [4] the control error from joint control was used to estimate forces. End-effector forces can also be estimated from joint motor torques, as in [5], which requires that torque measurements are available for each joint motor and that the robot dynamics model is known. If measured joint motor torques are not available, they can be estimated using measured motor currents and, in turn, be used to estimate end-effector forces [6].

Neural networks have been, since their recent re-emergence, widely used in many fields, including robotics. They have been applied to grasping (using vision) [7], control [8], domain adaptation [9], etc. Force estimation in robotics is a less studied topic, but prior work does exist. In [10] neural networks are applied as an extension to force observers, which require full knowledge of the system dynamics; they are used there to overcome a complicated or not-fully-known dynamics model. There also exist application-specific approaches, e.g. in robotic surgery [11], [12], where estimation is done without a force sensor, using some kind of visual feedback and neural networks.

There have been some recent applications of neural networks for learning robot (inverse) dynamics [13], [14], [15]. In [13] the Lagrangian equation is implemented and used in a neural network for learning inverse dynamics. [14] uses some previous knowledge about the system to build a network. In [15] the learning of the inverse model is done using a Long Short-Term Memory (LSTM) network, which outperforms the Gaussian processes approach.

In the paper, we demonstrate the possibility of estimating the end-effector forces by measuring forces with a force sensor mounted under the robot base. The benefit of this method is that it does not rely on measurements of joint motor currents, because forces can be directly measured using force sensors, which are generally reliable and provide very accurate measurements. In our approach, the force sensor is mounted under the robot base and the estimation of the end-effector force is accomplished using neural networks, which may prove beneficial for the task because it requires no knowledge about the robot dynamics (the dynamics is inferred implicitly, i.e. the neural network is used to approximate it).

The rest of the paper is structured as follows. In Section II, an introduction and overview of the neural network architectures are given, while Section III describes the data collection process of the performed experiment and reports the trained NN architectures and accompanying hyperparameters. In Section IV results are presented and discussed, while in Section V

conclusions are drawn based on the obtained results and possible directions of future research on the topic are presented.

II. MATERIALS AND METHODS

The proposed approach aims to predict end-effector forces by measuring forces on the robot base. Since it is virtually impossible to predict those values with absolute precision, the aim is to identify which NN architecture (and with what hyperparameters) will produce the best results for use in a specific application.

Several NN architectures were used for the task: the multilayer perceptron (MLP), the convolutional NN and the LSTM network, as shown in Fig. 1. However, please note that these network graphs are only conceptual presentations of the architectures, since graphs with the real numbers of neurons per layer used in the experiment are inconvenient to draw. The actual numbers of layers and neurons are summarised in Table I. These architectures were used in the experiment with different numbers of layers and neurons per layer to assess which of them can produce the best estimates of the robot wrist force by measuring the force on the robot base. Networks are usually trained using back-propagation and non-linear optimisation algorithms which optimise a given loss function (mean square error and mean absolute error are commonly used).

The MLP is the simplest NN architecture, featuring only densely connected layers which are optimised to map inputs to outputs. Convolutional networks, on the other hand, use convolutional layers to extract features from the input. Convolutional layers can be 1-dimensional, used to extract temporal features (i.e. from time-sequence inputs), 2-dimensional, used for extracting spatial features (i.e. to process images), or 3-dimensional, for spatiotemporal features (sequences of images, i.e. video). LSTM [16] networks consist of LSTM layers and are a sub-type of recurrent neural networks; LSTM layers are used for processing sequential data and feature feedback connections (i.e. memory). Both of the latter architectures also have densely-connected layers which do the learning from the features extracted by the convolutional and LSTM layers, respectively. The MLP architecture was chosen based on the reasoning that the inputs are going to be of relatively low complexity (i.e. a low number of input features), while the convolutional and LSTM architectures were chosen to assess whether treating the inputs as time sequences can produce better force estimates. The details about the actual hyperparameters used in the experiment are presented in Section III.

Fig. 1. NN architectures used in the research: (a) MLP architecture; (b) convolutional NN architecture; (c) LSTM NN architecture. Each consists of an input, the architecture-specific layers (fully connected, 1-D convolutional or LSTM, respectively), fully connected (FC) layers and an output.

III. EXPERIMENTAL SETUP

The experiment was conducted on the Commonplace Robotics Mover6 robot, a six-degrees-of-freedom educational robot. The robot manufacturer did not provide the dynamics model nor any relevant data that would make the computation of the dynamic model possible. Furthermore, the robot has a payload of only 0.4 kg, which makes the use of wrist-mounted force sensors impossible. Hence, for this purpose, an (auxiliary) interaction tool was devised to enable direct measurement of end-effector forces, as depicted in Fig. 2. The tool was held by the experimenter, who applied force on the robot end-effector, as illustrated in Fig. 3. The robot executed random (but feasible) trajectories in the process. The other force sensor was mounted below the robot base. Both sensors were JR3 90M40 6-axis force-torque sensors, and data acquisition was done using Mathworks Simulink Real-Time software, which also synchronised the sensors at a rate of 100 Hz. In addition, an Optotrak Certus optical motion capture system was used for assessing the positions and orientations of the robot base and the interaction tool. The position data measurements were synchronised with the force measurements. Robot joint positions, provided by encoders on each of the joints, were also recorded.

The data were collected as follows. The experimenter holding the interaction tool applied forces to the robot by making contact between the interaction tool dome and the robot end-effector while the robot was either in motion or at a standstill. During the robot motion, random (but feasible) goal positions and orientations were generated and the trajectories executed. Between two successive motion trajectories, the robot was,

for a short period of time, at a standstill. A random number
of up to six contacts were made in a single measurement
instance, and a total of 800 measurement instances of random
length, ranging from approx. 10 s to approx. 30 s were
recorded, resulting in a total of 1,803,875 samples. All the
measurement instances were preprocessed in such a way that
both robot base and interaction tool forces were expressed
in the same reference frame. This was achieved by using
positions obtained from the Optotrak markers, three positioned
on the robot base and three on the interaction tool, which were used to define reference frames corresponding to the robot base and the interaction tool, respectively. Following this, the
transformation between the two frames was obtained, which
then was used to express both measured forces in a common
reference frame so that the principal axes from both force
measurements match. Finally, the obtained forces were low-
pass filtered to remove noise.
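To make the preprocessing concrete, a minimal sketch of the frame alignment and filtering is given below. It is illustrative rather than the authors' actual pipeline: the marker coordinates, force samples, filter order and cut-off frequency are all assumed placeholder values.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def frame_from_markers(p1, p2, p3):
    """Build a rotation matrix (frame) from three non-collinear marker positions."""
    x = (p2 - p1) / np.linalg.norm(p2 - p1)        # first axis along one marker pair
    z = np.cross(x, p3 - p1)                       # normal to the marker plane
    z /= np.linalg.norm(z)
    y = np.cross(z, x)                             # completes a right-handed frame
    return np.column_stack((x, y, z))              # columns are the frame axes

# Placeholder marker positions (three per rigid body, from the motion capture data)
b1, b2, b3 = np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.0]), np.array([0.0, 0.1, 0.0])
t1, t2, t3 = np.array([0.5, 0.0, 0.3]), np.array([0.6, 0.0, 0.3]), np.array([0.5, 0.1, 0.3])
f_tool = np.random.default_rng(0).normal(size=(1000, 3))   # placeholder tool forces (N x 3)

R_base = frame_from_markers(b1, b2, b3)            # robot-base frame
R_tool = frame_from_markers(t1, t2, t3)            # interaction-tool frame
R_rel = R_base.T @ R_tool                          # tool axes expressed in the base frame

# Express the tool forces in the base reference frame so both measurements share axes
f_tool_in_base = f_tool @ R_rel.T

# Low-pass filter to remove noise (Butterworth; order and cut-off are assumed here)
b, a = butter(4, 5.0, fs=100.0, btype="low")
f_filtered = filtfilt(b, a, f_tool_in_base, axis=0)
```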
From the obtained measurement data, two datasets were built: one with the filtered data, with the mean removed and scaled to unit variance (i.e. normalised); the other was the same, except that the recorded joint positions were also included as inputs along with the measured forces. Because multiple robot poses may result in the same measured forces, we included this information in the dataset to assess whether it helps the neural networks produce better estimates. The datasets were then divided into training, validation and test sets as follows. The test set used a random 20% of the data, the validation set used a further 20% of what was left, and the remaining data were used as the training set. The forces measured by the robot base-mounted sensor served as inputs to the NN (possibly along with joint positions), while the measured interaction tool forces were used as NN targets (outputs).

Fig. 2. Custom-built interaction tool used in the data collection, with Optotrak markers. Note the hemispheric dome placed on top of the force sensor to focus the applied forces.
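The dataset construction described above could look roughly as follows. The 60/20/20 split and the zero-mean/unit-variance scaling follow the description in the text; the array names and the use of scikit-learn are assumptions of this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Placeholder arrays: base-sensor forces (inputs), joint positions, tool forces (targets)
rng = np.random.default_rng(42)
forces_base = rng.normal(size=(1000, 3))
joints = rng.normal(size=(1000, 6))
forces_tool = rng.normal(size=(1000, 3))

# Dataset 1: normalised forces only; Dataset 2: forces plus joint positions as inputs
X1 = StandardScaler().fit_transform(forces_base)
X2 = StandardScaler().fit_transform(np.hstack((forces_base, joints)))
y = forces_tool

# A random 20% of the data for testing, then a further 20% of the rest for validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X1, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.2, random_state=0)
```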
Several networks were trained for each of the architectures
introduced in Section II with varying hyperparameters. The
overview of the used architectures with their respective hyper-
parameters is summarised in Table I, while obtained results
are presented and discussed in the following section. Please
note that MLP networks have a higher number of neurons
per layer than other architectures, which is based on authors’
previous experience in the field and on the fact that MLP does
the learning on raw input data rather than temporal features
(which is the case with convolutional and LSTM architectures), and thus needs more neurons to generalise properly. In the convolutional and LSTM networks, a single 1-D convolutional / LSTM layer was used at the beginning of the network, reasoning that for data of relatively low dimensionality and complexity such as ours (only forces along three principal axes), a single layer
is enough to extract temporal features. Those layers accept
sequences of fixed length N as inputs. Thus, the sequences in
the dataset had to be split into sequences of length N, and that was done such that the first sequence consisted of samples 1 to N, the second of samples 2 to N + 1, etc., from the originally recorded sequence. In the research, the sequence lengths used were 5 and 10, while the number of LSTM and convolutional units in the layer was fixed to 8.

Fig. 3. The process of collecting measurements.
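The described splitting is a stride-1 sliding window, which can be written compactly; the function name and shapes below are illustrative.

```python
import numpy as np

def to_windows(seq, n):
    """Split a (T, d) sequence into overlapping windows of length n (stride 1).

    The first window holds samples 1..n, the second 2..n+1, and so on,
    giving an array of shape (T - n + 1, n, d)."""
    return np.stack([seq[i:i + n] for i in range(len(seq) - n + 1)])

forces = np.random.default_rng(1).normal(size=(100, 3))  # one recorded instance (T x 3)
windows = to_windows(forces, 10)                         # N = 10
print(windows.shape)                                     # (91, 10, 3)
```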
All the networks were initialised with a Glorot uniform

initialiser [17] and trained using the Adam optimiser [18], starting with a learning rate of 0.001 (the optimiser is adaptive, so the learning rate may change during training). In the training process, the mean absolute error (MAE) was used as the loss function, while the ReLU function was used for neuron activation. In all instances, training lasted until there was no improvement in the validation loss for 10 consecutive epochs. The training was conducted using TensorFlow 2.0 and Keras.
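Under the stated setup, a Keras sketch of one of the convolutional variants might look as follows. The layer sizes follow Table I (a single 1-D convolutional layer with 8 units, two dense layers of 16 neurons), while the kernel size, batch size and placeholder data are assumptions of this example, not settings reported in the paper.

```python
import numpy as np
import tensorflow as tf

def build_conv_model(seq_len=10, n_features=3):
    """1-D convolutional front-end followed by dense layers (cf. arch. #5/#6 in Table I).

    Glorot uniform initialisation is the Keras default for these layers."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(8, kernel_size=3, activation="relu",
                               input_shape=(seq_len, n_features)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(3),                  # estimated force along x, y and z
    ])

model = build_conv_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mae")

# Stop once the validation loss has not improved for 10 consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                              restore_best_weights=True)

# Placeholder windowed data; in the experiment these come from the recorded sequences
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(800, 10, 3)), rng.normal(size=(800, 3))
X_val, y_val = rng.normal(size=(200, 10, 3)), rng.normal(size=(200, 3))

model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=1000, batch_size=64, callbacks=[early_stop])
```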
TABLE II
NETWORK FITNESS AND RMSE METRIC ALONG PRINCIPAL AXES (ROW NUMBERS CORRESPOND TO THE ARCHITECTURES ENUMERATED IN TABLE I). ALL QUANTITIES ARE EXPRESSED IN NEWTONS [N].

#   Val. set   Test set   RMSEx    RMSEy    RMSEz
1.  1.92130    3.5392     2.5623   3.0335   2.7934
2.  2.10563    3.2132     2.8725   3.0540   2.7845
3.  1.95203    3.4192     2.6007   2.8154   2.5659
4.  2.02529    3.4508     2.8554   2.9534   2.6193
5.  1.92932    3.5604     2.5468   2.7671   2.4515
6.  1.88883    3.3298     2.5386   2.7577   2.3894
7.  2.02308    3.3601     2.5757   2.9328   2.6488
8.  1.96090    3.0722     2.5637   2.8173   2.5134
9.  1.97469    3.5950     2.6007   2.8573   2.6239

IV. RESULTS AND DISCUSSION

The networks presented in Table I were trained, with the results presented in Table II. As a measure of network fitness, the validation loss and test loss (obtained using the MAE loss function) were used. As a metric of fit between targets and predictions, root-mean-square errors (RMSE) are also reported. Please note that all reported loss values are computed for all three principal axes together and are ℓ2 norms of the vector of losses along the principal axes, while RMSE values are reported on the test set along each principal axis separately.

From the results it is not immediately observable which architecture is optimal, because all of them, at first sight, perform similarly, with no significant differences between the various architectures. However, some interesting observations emerge when looking at the end-effector force predictions on the test set. Examples of predictions using different trained architectures are shown in Fig. 4 (please note that Figs. 4a, 4c and 4e show estimates from the same test case; similarly, Figs. 4b, 4d and 4f share another test case). The obtained test results partly contain both "good" and "bad" predictions (i.e. in one part of a single test case the predictions are fair, while in another part they are not). The general observation is that the MLP architecture performs marginally worse than the others and, by visual inspection of the obtained predictions, appears to perform considerably worse than its measured network fitness in Table II suggests. It was also observed that predictions made using MLP networks oscillate slightly more than predictions made using the other two architectures.

When looking at the RMSE obtained on the test set along each of the principal axes, it may be concluded that networks with a smaller number of total trainable parameters generally perform better (regardless of architecture). This is likely due to the simple input data (i.e. a small number of features). By this metric, the best architecture is, still by a small margin, the convolutional network (the smallest, in terms of the number of trainable parameters, among those trained). Also, for most of the trained networks the RMSE along the z axis is the smallest, which is encouraging, since the force component along the z axis is usually the dominant component of the force vector. For the architecture with the best obtained RMSE, an RMSE of 6% with respect to the maximum force along the z axis was achieved, which is significantly better than the 16% and 12% achieved along the x and y axes.

Based on the results, there is no clear-cut decision on which kind of architecture is optimal for the task. However, architectures that treat the input (measured) forces as time-series data give marginally better predictions. Please note that hyperparameter tuning for each of the architectures may provide somewhat better results, but that would likely require a grid search to identify optimal hyperparameters, and also a significant amount of time to train networks with all possible values of all chosen hyperparameters (i.e. the number of layers, number of neurons per layer, activation function, optimiser, loss function, etc.).
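For reference, the reported metrics can be computed as sketched below (per-axis RMSE on the test set, and an aggregated loss as the ℓ2 norm of the per-axis losses); the arrays here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.normal(size=(500, 3))                        # placeholder targets (x, y, z forces)
y_pred = y_test + rng.normal(scale=0.1, size=(500, 3))    # placeholder network predictions

# Per-axis RMSE on the test set, in Newtons (as reported in Table II)
rmse_per_axis = np.sqrt(np.mean((y_pred - y_test) ** 2, axis=0))

# Aggregated fitness: l2 norm of the vector of per-axis MAE losses
mae_per_axis = np.mean(np.abs(y_pred - y_test), axis=0)
aggregate_loss = np.linalg.norm(mae_per_axis)
```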
TABLE I
TRAINED ARCHITECTURES USED IN THE EXPERIMENT

#   Arch.   Joint pos.   Layers (neurons)¹   Seq. length
1.  MLP     No           3 (64)              /
2.  MLP     Yes          3 (64)              /
3.  MLP     Yes          2 (32)              /
4.  Conv    No           3 (32)              10
5.  Conv    Yes          2 (16)              10
6.  Conv    Yes          2 (16)              5
7.  LSTM    No           2 (16)              10
8.  LSTM    Yes          2 (16)              10
9.  LSTM    Yes          2 (16)              5

¹ Layers pertain to the number of densely connected hidden layers, while neurons pertain to the number of neurons in each hidden layer.

Fig. 4. Prediction examples on the test set for trained networks: (a) Arch. #3; (b) Arch. #2; (c) Arch. #4; (d) Arch. #6; (e) Arch. #9; (f) Arch. #8.

V. CONCLUSION

In the paper, a method for estimation of robot manipulator end-effector forces is presented. The forces are measured with a force sensor mounted under the robot base, and the end-effector forces are estimated using deep neural networks. It is shown that less complex networks (in the number of parameters, which corresponds to a lower number of layers and/or a lower number of neurons per layer) perform better, likely due to the low complexity (and dimensionality) of the input data.

Based on the obtained results, it might be concluded that this approach provides good estimates and can be used in applications not requiring a high degree of precision. However, if high precision is required, the approach needs to be extended. One direction of future development might be to also include joint velocities and accelerations in the
dataset, possibly leading to better predictions, reasoning that those are used for analytic computation of the inverse dynamics and hence might prove useful for better generalisation of the trained neural networks. The other direction might be incorporating knowledge about the robotic system into the model, leading to a better capture of the physical model of the robot. This may be tackled by introducing additional layer(s) in the neural networks that implement robot physics [13], [14], consequently leading to better prediction of contact forces due to the physical constraints those new layers would impose.

REFERENCES

[1] B. Siciliano and L. Villani, Robot Force Control, vol. 540. Springer Science & Business Media, 2012.
[2] P. J. Hacksel and S. E. Salcudean, "Estimation of environment forces and rigid-body velocities using observers," in Proceedings of the 1994 IEEE International Conference on Robotics and Automation, pp. 931–936 vol. 2, May 1994.
[3] A. Alcocer, A. Robertsson, A. Valera, and R. Johansson, "Force estimation and control in robot manipulators," IFAC Proceedings Volumes, vol. 36, no. 17, pp. 55–60, 2003. 7th IFAC Symposium on Robot Control (SYROCO 2003), Wroclaw, Poland, 1–3 September, 2003.
[4] A. Stolt, M. Linderoth, A. Robertsson, and R. Johansson, "Force controlled robotic assembly without a force sensor," in 2012 IEEE International Conference on Robotics and Automation, pp. 1538–1543, May 2012.
[5] M. Van Damme, P. Beyl, B. Vanderborght, V. Grosu, R. Van Ham, I. Vanderniepen, A. Matthys, and D. Lefeber, "Estimating robot end-effector force from noisy actuator torque measurements," in 2011 IEEE International Conference on Robotics and Automation, pp. 1108–1113, May 2011.
[6] A. Wahrburg, J. Bös, K. D. Listmann, F. Dai, B. Matthias, and H. Ding, "Motor-current-based estimation of cartesian contact forces and torques for robotic manipulators and its application to force control," IEEE Transactions on Automation Science and Engineering, vol. 15, pp. 879–886, April 2018.
[7] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, "Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection," The International Journal of Robotics Research, vol. 37, no. 4–5, pp. 421–436, 2018.
[8] L. Jin, S. Li, J. Yu, and J. He, "Robot manipulator control using neural networks: A survey," Neurocomputing, vol. 285, pp. 23–34, 2018.
[9] K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, and V. Vanhoucke, "Using simulation and domain adaptation to improve efficiency of deep robotic grasping," in 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 4243–4250, May 2018.
[10] A. C. Smith and K. Hashtrudi-Zaad, "Application of neural networks in inverse dynamics based contact force estimation," in Proceedings of 2005 IEEE Conference on Control Applications (CCA 2005), pp. 1021–1026, Aug 2005.
[11] A. Marbán, V. Srinivasan, W. Samek, J. Fernández, and A. Casals, "A recurrent convolutional neural network approach for sensorless force estimation in robotic surgery," CoRR, vol. abs/1805.08545, 2018.
[12] A. I. Aviles, S. Alsaleh, P. Sobrevilla, and A. Casals, "Sensorless force estimation using a neuro-vision-based approach for robotic-assisted surgery," in 2015 7th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 86–89, April 2015.
[13] M. Lutter, C. Ritter, and J. Peters, "Deep Lagrangian networks: Using physics as model prior for deep learning," CoRR, vol. abs/1907.04490, 2019.
[14] F. Díaz Ledezma and S. Haddadin, "First-order-principles-based constructive network topologies: An application to robot inverse dynamics," in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pp. 438–445, Nov 2017.
[15] E. Rueckert, M. Nakatenus, S. Tosatto, and J. Peters, "Learning inverse dynamics models in O(n) time with LSTM networks," in 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids), pp. 811–816, Nov 2017.
[16] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[17] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (Y. W. Teh and M. Titterington, eds.), vol. 9 of Proceedings of Machine Learning Research, (Chia Laguna Resort, Sardinia, Italy), pp. 249–256, PMLR, 13–15 May 2010.
[18] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in International Conference on Learning Representations, 2014.

