
Expert Systems With Applications 236 (2024) 121184


Neural-optimal tuning of a controller for a parallel robot


Daniel Blanck-Kahan a, Gerardo Ortiz-Cervantes a, Valentín Martínez-Gama a, Héctor Cervantes-Culebro b,∗, J. Enrique Chong-Quero b, Carlos A. Cruz-Villar c

a Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Av. Carlos Lazo No. 100, 01389, Ciudad de México, Mexico
b Tecnologico de Monterrey, Escuela de Ingeniería y Ciencias, Carr. al Lago de Guadalupe Km. 3.5, 52926, Estado de México, Mexico
c Cinvestav-IPN, Departamento de Ingeniería Eléctrica, Av. IPN 2508, A.P. 14-740, 07300, Ciudad de México, Mexico

ARTICLE INFO

Keywords: Deep Neural Networks; Differential Evolution; Neural control

ABSTRACT

In this article, a double strategy is proposed to find the optimal gains of a cascaded PI controller to minimize the trajectory position error in a five-bar parallel robot. The first strategy employs Differential Evolution to tune constant gains during the execution time of the desired trajectory. Once Differential Evolution achieves convergence by finding the vector of optimal gains that minimizes the position tracking error, all the position error and current data of the two brushless motors are saved. In the second strategy, the data generated in the first strategy are used to train a Deep Neural Network. After that, the trained Deep Neural Network replaces the constant gains of the first strategy with time-varying gains for the desired trajectory. Three working scenarios are proposed to test the generalization of the Deep Neural Network. In the first scenario, a training trajectory is executed. In the second one, a testing trajectory of the Deep Neural Network is evaluated. In the third one, a mass change is generated in the middle of the cycle. The results show that the Deep Neural Network is robust to different trajectories and mass changes during the execution of pick and place tasks.

1. Introduction

Finding the optimal gains of a controller for a five-bar parallel robot is a highly iterative process, since many variables must be tuned and often meet conflicting specifications such as energy efficiency and high accuracy (Rodríguez-Molina et al., 2020). In addition, the nonlinear dynamic behavior must be solved together with the kinetic constraints of the mechatronic system. An example of a kinetic constraint is the torque provided by the motor: the torque profile is unknown, it is time-varying, and it depends on the demanded task (Fang et al., 2016). Moreover, some drawbacks appear when looking for optimal gains for a given controller; in particular, the optimal gains are specific to an effector load, a trajectory, and a set of modeled parameters (Li et al., 2022). In the presence of security controls (Salwani et al., 2009), changes in working conditions, tasks, parametric uncertainties, and disturbances in the system, the optimal tuning becomes sub-optimal (Kumar & Kumar, 2017). This disadvantage can be addressed with classical robust control theory, such as robust LQR (Liu et al., 2012), the Fractional-Order PID (FOPID) controller (Goyal et al., 2019; Kler et al., 2018), robust FOPID (Hajiloo et al., 2012; Sánchez et al., 2017; Zhang & Liu, 2018), the sliding mode controller (Ye et al., 2021; Zhang et al., 2023), and the H-infinity controller (Ashok Kumar & Kanthalakshmi, 2018; Rigatos et al., 2017; Souza & Souza, 2019), among others.

On the other hand, the robustness problem in the face of parametric uncertainties and disturbances can also be addressed from an intelligent control perspective. Some examples of intelligent control methods are listed in the following four subcategories.

• Adaptive meta-heuristic algorithms. Particle Swarm Optimization (PSO) (Bingül & Karahan, 2011). Differential Evolution Based Control Adaptation (DEBAC) (Villarreal-Cervantes et al., 2018). Non-Dominated Sorting Genetic Algorithm (NSGA)-II (Zhou & Zhang, 2019). Ant Lion Optimization (ALO) (Pradhan et al., 2020). A comprehensive state-of-the-art review of the classification of metaheuristic algorithms to tune PID parameters can be found in Joseph et al. (2022).

The code (and data) in this article has been certified as Reproducible by Code Ocean: (https://codeocean.com/). More information on the Reproducibility
Badge Initiative is available at https://www.elsevier.com/physical-sciences-and-engineering/computer-science/journals.
∗ Corresponding author.
E-mail addresses: daniel.blanck@exatec.tec.mx (D. Blanck-Kahan), gerardo.ortiz.cervantes@exatec.tec.mx (G. Ortiz-Cervantes), valentin_mgm@exatec.tec.mx
(V. Martínez-Gama), hector_cervantes@tec.mx (H. Cervantes-Culebro), jchong@tec.mx (J.E. Chong-Quero), cacruz@cinvestav.mx (C.A. Cruz-Villar).

https://doi.org/10.1016/j.eswa.2023.121184
Received 24 May 2022; Received in revised form 9 August 2023; Accepted 10 August 2023
Available online 1 September 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

• Neural Network (NN) based control. An adaptive compensation control scheme based on an NN compensates for the effects of disturbances and uncertainties in drones (Jin et al., 2020; Ucgun et al., 2022). In Cheng et al. (2021), an adaptive NN control approach is studied to perform robust and accurate control of nonlinear systems with unknown dynamics.
• Reinforcement Learning (RL) and Adaptive Dynamic Programming (ADP). In Sutton and Barto (2018), an ADP algorithm is used to find an optimal controller. Robust optimal control for nonlinear systems with unknown disturbances has been built on disturbance observers (Song & Lewis, 2020; Trinh et al., 2021). RL approaches have been used to tune a hybrid Ziegler–Nichols controller (Hekimoğlu, 2019) and H-infinity control (Kiumarsi et al., 2017; Luo et al., 2014). A PI RL control approach that uses the metaheuristic Grey Wolf Optimizer (GWO) algorithm to train the policy NN is studied by Zamfirache et al. (2022). GWO is compared with Gradient Descent and PSO algorithms, and GWO offers the best performance in controlling a nonlinear servo system position.
• Fuzzy Control (FC). In Pang et al. (2018), a variable universe FC design is proposed for a vehicle semi-active suspension system with a magnetorheological (MR) damper through the combination of a Fuzzy Neural Network (FNN) and PSO. A hybrid force/position control method based on Adaptive Fuzzy Computed Torque Control and Fuzzy Proportional–Integral control is proposed by Wang et al. (2021). A self-tuning Adaptive Fuzzy Logic Based Repetitive Learning Controller is designed for robot manipulators (Yilmaz et al., 2021). Precup et al. (2021) propose a novel Slime Mould Algorithm (SMA) as the optimal fuzzy controller tuning approach, applied to the position control of a nonlinear servo system. That approach uses the information feedback model, i.e., the information available from individuals in previous iterations, to guide the search process and, consequently, accelerate convergence.

This study proposes a double strategy to tune the six gains of a cascade-type PI controller applied to a five-bar parallel robot. In the first strategy, Differential Evolution is used as a search algorithm to seek optimal gains that minimize the trajectory position error. Once the algorithm achieves convergence, the position error and current data are saved in packages of 10 samples until every desired trajectory is completed for each motor. A DNN is trained, validated, and tested with the data stored for each proposed trajectory. The inputs of the DNN are the position errors for all trajectories arranged in packs of 10 samples, and the outputs are the optimal gains for that trajectory section. Three working scenarios are proposed to test the generalization of the DNN. In the first scenario, a known trajectory is tested. In the second one, a trajectory unseen during the training of the DNN is evaluated. In the third one, a mass change is generated in the middle of the cycle. This article shows that the DNN can identify control gains for unknown trajectories, even when disturbances, such as a mass at the end effector, are added.

The paper contains six sections. In Section 2, the problem is defined. Then, in Section 3, the method and the algorithm to solve the problem are explained; the experimental setup is also described in that section. The results of training, validation, and testing are shown in Section 4. In Section 5, conclusions are presented. Finally, all trajectories used for training, validation, and testing purposes are displayed in the Appendix.

2. Optimization problem

The present section formally establishes the control system as a dynamic nonlinear optimization problem.

Fig. 1. Built prototype.

2.1. Problem definition

The five-bar parallel robot consists of five rigid links connected at their ends by five revolute joints (see Fig. 1). The two BLDC motors, M1 and M2, are fixed to the aluminum frame. l0 represents the distance between the two rotors of the BLDC motors. The rotors are rigidly attached to links l1 and l2, respectively. l3 and l4 represent the lengths of the non-actuated links. θ1 and θ2 represent the angular displacements controlled by motors M1 and M2, respectively.

Two origins are set as [Ox1, Oy1] = [0, 0] and [Ox2, Oy2] = [l0, 0]. The set of desired trajectories of the final effector, the intersection joint between links l3 and l4, is given in the Appendix. The set of trajectories is given in Cartesian space C[xN, yN] for training, validation, and testing of the DNN. For instance, the desired trajectory in Fig. 11(c) can be represented by an equation parameterized in time:

\[
x_N = ce_x + s_{ma}\cos\!\left(\frac{2\pi t}{t_f}\right), \qquad
y_N = ce_y + s_{m}\sin\!\left(\frac{2\pi t}{t_f}\right) \tag{1}
\]

where [ce_x, ce_y] is the center of the ellipse, s_ma is the semi-major axis, and s_m is the semi-minor axis. The cycle time is denoted by t_f, and the update time of the trajectory is expressed with the variable t. The desired Cartesian trajectory is transformed into the articular space θ̄_m for the two BLDC motors, m = [1, 2]. The desired angular displacement is given by the inverse kinematics:

\[
\bar{\theta}_m = \sigma \arccos\!\left(\frac{-l_{m+2}^2 + l_m^2 + p_{nm}^2}{2\, l_m\, p_{nm}}\right) + \arctan\!\left(\frac{p_{ym}}{p_{xm}}\right) \tag{2}
\]

where the positive or negative quadrant is represented by σ. The magnitude between the origin and the end effector is \(p_{nm} = \sqrt{p_{xm}^2 + p_{ym}^2}\), where \(p_{xm} = x_N - O_{xm}\) and \(p_{ym} = y_N - O_{ym}\).
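As an illustration of Eqs. (1)–(2), the following minimal Python sketch generates the elliptical trajectory and converts it to articular setpoints. The link lengths, ellipse parameters, and cycle time are placeholder values, since the paper does not report them numerically, and arctan2 replaces arctan to keep the correct quadrant:

import numpy as np

# Placeholder geometry: the paper does not give numeric link lengths,
# so all values below are illustrative (meters).
l = {1: 0.127, 2: 0.127, 3: 0.178, 4: 0.178}  # link lengths l1..l4
l0 = 0.10                                     # distance between the rotors
origins = {1: (0.0, 0.0), 2: (l0, 0.0)}       # [Ox1, Oy1] and [Ox2, Oy2]

def ellipse_trajectory(t, t_f, ce=(0.05, 0.20), s_ma=0.04, s_m=0.02):
    """Eq. (1): elliptical end-effector trajectory parameterized in time."""
    x_n = ce[0] + s_ma * np.cos(2.0 * np.pi * t / t_f)
    y_n = ce[1] + s_m * np.sin(2.0 * np.pi * t / t_f)
    return x_n, y_n

def inverse_kinematics(x_n, y_n, m, sigma=1.0):
    """Eq. (2): desired angle for motor m in {1, 2}; sigma selects the branch."""
    ox, oy = origins[m]
    p_x, p_y = x_n - ox, y_n - oy
    p_n = np.hypot(p_x, p_y)          # distance from origin m to end effector
    cos_arg = (-l[m + 2] ** 2 + l[m] ** 2 + p_n ** 2) / (2.0 * l[m] * p_n)
    # arctan2 instead of arctan keeps the correct quadrant.
    return sigma * np.arccos(np.clip(cos_arg, -1.0, 1.0)) + np.arctan2(p_y, p_x)

# Sample the desired articular trajectory at the 5 ms controller period.
t_f = 3.0
ts = np.arange(0.0, t_f, 0.005)
theta_bar = np.array([[inverse_kinematics(*ellipse_trajectory(t, t_f), m)
                       for m in (1, 2)] for t in ts])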


The dynamic model of the five-bar parallel robot as a function of the actuated joint position and velocity variables (θ_m, θ̇_m) can be described, following Ghorbel et al. (2000), by:

\[
\boldsymbol{\tau} = \mathbf{D}(\theta_m)\,\ddot{\boldsymbol{\theta}}_m + \mathbf{C}(\theta_m, \dot{\theta}_m)\,\dot{\boldsymbol{\theta}}_m + \mathbf{g}(\theta_m) \tag{3}
\]

where D(θ_m) is the inertia matrix, C(θ_m, θ̇_m) the centrifugal and Coriolis matrix, and g(θ_m) the gravity vector. In this work, the gravity vector is zero since gravity acts on the z-axis (see Fig. 1). τ are the external torques, which are shaped by the following cascade PI control law (U):

\[
\boldsymbol{\theta}_{m_{loop}} = \mathbf{K}_{P_m}\left[\bar{\boldsymbol{\theta}}_m - \boldsymbol{\theta}_m\right] =
\begin{bmatrix} K_{P_1} & 0 \\ 0 & K_{P_2} \end{bmatrix}
\begin{bmatrix} \bar{\theta}_1 - \theta_1 \\ \bar{\theta}_2 - \theta_2 \end{bmatrix} \tag{4}
\]

\[
\dot{\boldsymbol{\theta}}_{m_{loop}} = \mathbf{K}_{PV_m}\left[\boldsymbol{\theta}_{m_{loop}} - \dot{\boldsymbol{\theta}}_m\right] + \mathbf{K}_{IV_m} \int_0^{t_f} \left[\boldsymbol{\theta}_{m_{loop}} - \dot{\boldsymbol{\theta}}_m\right] dt \tag{5}
\]

\[
\mathbf{U} = \mathbf{K}_{PI_m}\left[\dot{\boldsymbol{\theta}}_{m_{loop}} - \mathbf{i}\right] + \mathbf{K}_{II_m} \int_0^{t_f} \left[\dot{\boldsymbol{\theta}}_{m_{loop}} - \mathbf{i}\right] dt \tag{6}
\]

where K_Pm is the position controller proportional gain, K_PVm is the proportional velocity controller gain, and K_IVm is the integral velocity controller gain. K_PIm and K_IIm correspond to the proportional and integral current gains, respectively. In this article, K_PIm and K_IIm are set automatically during driver configuration; for this reason, the gains associated with the current are not considered design variables in the proposed problem. If the position control mode is selected, the whole loop runs: the position control loop is executed first, Eq. (4), the speed control second, Eq. (5), and the current control last, Eq. (6). If velocity control mode is set, the position control part is not activated, and the velocity command is fed directly into the second-stage input. In torque control mode, only the current controller is used.

The input torque τ is a dynamic constraint:

\[
|\tau(t)| \leq \boldsymbol{\tau}_{max}; \quad 0 \leq t \leq t_f \tag{7}
\]

where τ_max is 1.99 Nm. The proposed torque limit corresponds to the rating of the BLDC motor, ODrive model D5065 270KV.
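For illustration, the following is a minimal discrete-time Python sketch of the cascade loops of Eqs. (4)–(6) for one motor. The current-loop gains shown are placeholders, since in the prototype they are fixed by the ODrive driver configuration, and rectangular integration at the 5 ms sample time is a simplifying assumption:

class CascadePI:
    """Discrete-time sketch of Eqs. (4)-(6) for one motor.

    Position loop (P) -> velocity loop (PI) -> current loop (PI).
    kp, kpv, kiv are the design variables of this paper; kpi and kii
    are placeholders for the driver-configured current-loop gains.
    """

    def __init__(self, kp, kpv, kiv, kpi=0.1, kii=0.05, dt=0.005):
        self.kp, self.kpv, self.kiv = kp, kpv, kiv
        self.kpi, self.kii = kpi, kii
        self.dt = dt
        self.int_vel = 0.0  # accumulated velocity-loop error
        self.int_cur = 0.0  # accumulated current-loop error

    def step(self, theta_bar, theta, theta_dot, current):
        # Eq. (4): position loop produces the velocity command.
        vel_cmd = self.kp * (theta_bar - theta)
        # Eq. (5): velocity loop produces the current command.
        vel_err = vel_cmd - theta_dot
        self.int_vel += vel_err * self.dt
        cur_cmd = self.kpv * vel_err + self.kiv * self.int_vel
        # Eq. (6): current loop produces the command U sent to the motor.
        cur_err = cur_cmd - current
        self.int_cur += cur_err * self.dt
        return self.kpi * cur_err + self.kii * self.int_cur

In position control mode all three stages run at every sample; in velocity or torque control mode the earlier stages are bypassed, as described above.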
2.2. Objective function

The control objective, J_obj, is to minimize the position trajectory tracking error of the five-bar parallel robot, defined in Eq. (8):

\[
\min_{\vec{x} \in R^6} J_{obj} = \int_0^{t_f} \left(e_{\theta_1}^2 + e_{\theta_2}^2\right) dt = \int_0^{t_f} \left[(\bar{\theta}_1 - \theta_1)^2 + (\bar{\theta}_2 - \theta_2)^2\right] dt \tag{8}
\]

where x⃗ = [K_P1, K_P2, K_PV1, K_PV2, K_IV1, K_IV2] is the vector of six cascade proportional–integral gains that minimize the position trajectory tracking error. e_θ1 and e_θ2 are the position errors for each degree of freedom. The desired position trajectory is denoted by θ̄1 and θ̄2; the measured positions are represented by θ1 and θ2. The dynamic optimization problem is subject to the inverse kinematics (Eq. (2)), the five-bar parallel robot dynamics (Eq. (3)), the internal cascade loops of position (Eq. (4)), velocity (Eq. (5)), and current (Eq. (6)) of the PI controller, and the torque limits that each of the BLDC motors can provide (Eq. (7)).

The search algorithm for the optimization problem defined by Eq. (8) is a customized version of Differential Evolution (Storn & Price, 1997), shown in Algorithm 1. First, NP individuals are randomly created; the current generation is represented by g, ∀j, j = 1, …, NP. Each design variable vector is evaluated to determine the objective function of the corresponding individual. Then, all solutions are ordered from the smallest to the largest numerical value of the objective function, and the best E individuals are passed directly to the next generation.

At the moment of reproduction, the parent vector x⃗_{j,g} = [x_{1,j,g}, …, x_{n,j,g}] generates one offspring vector u⃗_{j,g}. To form the offspring, two individuals, x⃗_{r1,g} and x⃗_{r2,g}, are randomly selected from a subpopulation s of NS survivors. After that, the difference between these two vectors is calculated and scaled by a random factor F ∈ [0, 1], whose outcome is a mutant vector. The mutant and parent vectors are recombined to create the offspring vector; in NM of the individuals, the offspring is additionally modified by a random mutation factor mr. The process iterates until convergence in the solution is achieved or the maximum number of generations, MAXGEN, is reached.

The parameters of the customized DE algorithm (Algorithm 1) are tuned for the following three main goals:

• Encourage the algorithm to explore new solutions without converging too fast into local minima. This is mainly influenced by the number of mutants, NM, and the mutation factor, mr, as these define the number of solutions that can randomly move in a new direction in the next generation.
• Achieve a balance of information between proven and new solutions to be tested. This is influenced by the number of survivors, NS, which dictates how many individuals of the current generation get to cross their solutions with each other to generate a new set of solutions.
• Converge to a solution with an adequate number of iterations, without excessive training time and resources, which the population size (NP), the maximum number of generations (MAXGEN), and the number of elite individuals (E) define. The elite individuals correspond to the best solutions tried so far; they populate part of the next generation without any changes or mutations, representing the current winning solution.

Algorithm 1: Customized Differential Evolution

begin
    g = 0
    Create a random initial population x_{j,g} ∀j, j = 1, …, NP
    Evaluate f(x_{j,g}) ∀j, j = 1, …, NP
    for g = 1 to MAXGEN do
        rank x_{j,g} on f(x_{j,g})
        for j = 1 to NP do
            select randomly r1 ≠ r2, with r1, r2 in x_{s,g}, where s = [1, NS]
            if j > E then
                F = rand[0, 1]
                u_{j,g+1} = x_{r1,g} + F (x_{r1,g} − x_{r2,g})
                if j < E + NM then
                    u_{j,g+1} = u_{j,g+1} · rand(1 − mr, 1 + mr)
            else
                u_{j,g+1} = x_{j,g}
            if f(u_{j,g+1}) ≤ f(x_{j,g}) then
                x_{j,g+1} = u_{j,g+1}
            else
                x_{j,g+1} = x_{j,g}
        g = g + 1
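A compact Python sketch of Algorithm 1 follows, using the parameter values reported in Section 3.1. The fitness function is a hypothetical stand-in for executing the trajectory on the prototype and evaluating Eq. (8); clipping the offspring to the gain bounds is an added safeguard not spelled out in the pseudocode:

import numpy as np

def customized_de(fitness, low, high, NP=8, NM=5, NS=3, E=1, mr=0.15, MAXGEN=5):
    """Sketch of Algorithm 1.

    fitness maps a 6-gain vector [KP1, KP2, KPV1, KPV2, KIV1, KIV2]
    to the cost of Eq. (8); in the paper it is evaluated by running
    the desired trajectory on the prototype.
    """
    low, high = np.asarray(low, float), np.asarray(high, float)
    rng = np.random.default_rng()
    pop = rng.uniform(low, high, size=(NP, low.size))
    cost = np.array([fitness(x) for x in pop])

    for _ in range(MAXGEN):
        order = np.argsort(cost)              # rank individuals, best first
        pop, cost = pop[order], cost[order]
        for j in range(NP):
            if j >= E:                        # elites (j < E) pass unchanged
                r1, r2 = rng.choice(NS, size=2, replace=False)  # survivors
                F = rng.uniform(0.0, 1.0)
                u = pop[r1] + F * (pop[r1] - pop[r2])           # mutant
                if j < E + NM:                # extra random mutation on NM
                    u *= rng.uniform(1.0 - mr, 1.0 + mr, size=u.shape)
                u = np.clip(u, low, high)     # added bound safeguard
            else:
                u = pop[j].copy()
            cu = fitness(u)
            if cu <= cost[j]:                 # greedy one-to-one replacement
                pop[j], cost[j] = u, cu
    best = int(np.argmin(cost))
    return pop[best], cost[best]

def objective(e_theta1, e_theta2, dt=0.005):
    """Eq. (8) evaluated numerically from sampled tracking errors."""
    return float(np.sum((e_theta1 ** 2 + e_theta2 ** 2) * dt))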
desired path. Likewise, the proposed vector remains constant from
The Customized DE algorithm, 1, parameters are tuned for the
the beginning of the trajectory until the end of the cycle. The whole
following three main goals:
desired trajectory is fragmented into sections of 𝑄 samples until the
• Encourage the algorithm to explore new solutions without con- desired trajectory is completed. If convergence is obtained, the optimal
verging too fast into local minima. The number of mutants, 𝑁𝑀, solution vector for the cascade controller gains, 𝐾⃗∗ = 𝑥⃗𝑗,𝑀𝐴𝑋𝐺𝐸𝑁 =
and mutation factor, 𝑚𝑟, mainly influences it as this defines the [𝐾𝑃1 , 𝐾𝑃2 , 𝐾𝑃 𝑉1 , 𝐾𝑃 𝑉2 , 𝐾𝐼𝑉1 , 𝐾𝐼𝑉2 ] is saved. Moreover, for each sec-
number of solutions that could randomly gain in a new direction tion, the trajectory position errors (𝑒𝜃𝑚 ) and the current supplied in
for the next generation. each of the motors (𝑖𝑚 ) are saved in packets of 𝑄 = 10 samples.


Fig. 2. PI controller gains tuning by DE and DNN.

Fig. 3. Data generation for DNN training flow chart.


The methodology used to generate the data to train the DNN is summarized in Fig. 3. All the data packets saved for all the proposed trajectories feed the second optimal gain tuning strategy.

The second control strategy consists of training a Deep Neural Network (DNN) offline, as seen at the bottom of Fig. 2(a). The position error and current histories of each package of Q samples are the inputs for the DNN training, while the optimal cascade PI controller gains for that trajectory section are the outputs. Once the DNN is trained, it is the DNN that updates the cascaded PI controller gains every Q samples, as a function of time and of the desired trajectory. Consequently, the optimal cascade PI controller gains change from being time-invariant, using DE, to time-varying, using the DNN, while the experiment is executed, as can be seen at the bottom of Fig. 2(b). From this block, it can be seen that the optimal gains K_Pm(t), K_PVm(t), K_IVm(t) are a function of time for a particular section of the desired trajectory. On the other hand, K_Pm, K_PVm, K_IVm are constant gains for a particular individual and trajectory when DE is implemented.
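A minimal sketch of this data packaging step is shown below. The exact ordering of the features inside each input vector is an assumption, chosen to be consistent with the 40-neuron input layer described in Section 3.1 (two motors × (Q errors + Q currents) = 40 values):

import numpy as np

Q = 10  # samples per trajectory section, as reported in the paper

def build_training_pairs(e1, e2, i1, i2, gains_per_section):
    """Assemble DNN training pairs from one converged DE run.

    e1, e2: position-error histories of motors 1 and 2.
    i1, i2: current histories of motors 1 and 2.
    gains_per_section: optimal 6-gain vector found for each section.
    """
    X, y = [], []
    for k in range(len(e1) // Q):
        s = slice(k * Q, (k + 1) * Q)
        # One input row = Q errors and Q currents per motor (40 values).
        X.append(np.concatenate([e1[s], e2[s], i1[s], i2[s]]))
        y.append(gains_per_section[k])
    return np.asarray(X), np.asarray(y)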
The training of the DNN is implemented with the Adaptive Moment Estimation (ADAM) optimization algorithm (Khan et al., 2020), which is integrated into the Keras and TensorFlow libraries. The optimizer uses a per-parameter learning rate and is a variant of stochastic gradient descent that combines the best properties of other optimization algorithms such as AdaGrad and RMSProp (Brownlee, 2017). It provides a computationally efficient yet robust solution to the noisy problems that arise when using the model in a non-stationary application (e.g., real-time or online requests, as in this article's case).

Keras libraries are used to determine the performance of the DNN. Both the training and test sets are monitored to calculate the model accuracy over the training process. To evaluate the model, the vector of optimal gains obtained with the DE algorithm is compared against the one obtained by the DNN. Once it is validated that the DNN predicts the set of optimal gains for particular trajectory sections, it is possible to substitute the DNN for the DE algorithm.

3.1. Experimental setup

A five-bar parallel robot has been built, as shown in Fig. 1. The link lengths of the parallel robot are set as suggested by Campos et al. (2010). This design offers a large workspace for trajectory design without falling into robot link singularities or link collisions.

Two BLDC motors, ODrive model D5065 270KV, are used. The maximum nominal speed of the motor is 8640 rpm, and its torque is 1.99 Nm. The angular positions θ1 and θ2 are measured with CUI AMT102-V encoders of 8192 pulses per revolution. Currents are measured with the ODrive V3.6 board.

The range of the variables of the design vector is set to K_Pm ∈ [0, 120], K_Vm ∈ [0, 0.25], and K_Im ∈ [0, 0.5]. The range of gains is obtained experimentally to ensure the system's stability. Likewise, with the range of proposed gains, the system's structural integrity is sought; for instance, it is observed that raising the upper limit of the variable K_Vm induces vibrations in the links.

The proposed parameters for the DE algorithm are a population of NP = 8 individuals, NM = 5 mutant individuals, NS = 3 survivors, E = 1 elite individual, a mutation factor mr = 0.15, and five iterations (MAXGEN).

The experimental platform is developed in Python. The TensorFlow and Keras libraries are used to design, train, validate, and test the DNN. The DNN consists of an input layer of 40 neurons; four hidden layers of 200, 500, 200, and 100 neurons, respectively; and an output layer with six neurons (Fig. 4). These outputs represent the design vector of the robot controller. The activation function used for all neurons except the output neurons is ReLU, due to its simplicity of implementation and low computational cost.

Fig. 4. Visual representation of training and testing of the model.

From Fig. 4, a sequence of Q = 10 consecutive position errors and currents is taken. The microcontroller sampling time is 5 ms, so a request to the network and a controller gains update are made every 50 ms. It is found experimentally that a minimum number of samples Q = 10 is necessary so that the DNN can generalize the process of following a span of any trajectory proposed in this article (Figs. 10–11). It is also experimentally verified that introducing the current data (i_mN … i_mN+10) improves the training accuracy, compared to inserting the current error of the actuators (e_imN … e_imN+10).
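Under these reported design choices, the network can be sketched in Keras as follows. The linear output activation, the mean-squared-error loss, and the epoch count are assumptions; the paper states only the layer sizes, the ReLU activations, and the use of the ADAM optimizer:

from tensorflow import keras

# 40 inputs -> 200/500/200/100 ReLU hidden layers -> 6 outputs,
# as described in Section 3.1.
inputs = keras.Input(shape=(40,))
x = keras.layers.Dense(200, activation="relu")(inputs)
x = keras.layers.Dense(500, activation="relu")(x)
x = keras.layers.Dense(200, activation="relu")(x)
x = keras.layers.Dense(100, activation="relu")(x)
outputs = keras.layers.Dense(6)(x)  # [KP1, KP2, KPV1, KPV2, KIV1, KIV2]
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")

# X, y as produced by build_training_pairs above; the paper reports a
# 70/30 train/validation split of the DE-generated data.
# history = model.fit(X, y, validation_split=0.3, epochs=100)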

4. Results and discussion

In this work, the tuning of optimal controller gains is achieved using a DNN substituting a DE algorithm. For all experiments, the optimization problem minimizes the tracking position error defined in Eq. (8).

The DNN is trained offline with data from six different trajectories (Figs. 10(a)–10(f)). The training is then validated with three unknown trajectories, as illustrated in Figs. 11(a)–11(c). After that, the DNN is tested with three unknown trajectories, as illustrated in Figs. 11(d)–11(f).

The trajectories used to train and validate the DNN are of irregular geometries, for example, names of people (Fig. 10(f)) or curved lines followed by straight lines (Fig. 11(e)), to have a diversity that allows better interpolation for any unknown trajectory. 70% of the data generated by running the entire DE algorithm is used to train the DNN; the remaining 30% is employed for validation purposes.

Fig. 5 shows, with a blue line, the accuracy of the DNN optimization algorithm at each iteration, where a smooth line with an improving trend is observed during training. The orange line shows the evaluation of the DNN on examples unseen during training, which is why oscillations are observed across epochs. The model shows 95.8% accuracy on training data and 94.7% on testing.

After validating the offline DNN training, three tuning techniques are presented for comparison: random gains, the DE algorithm, and the online DNN. Each of these techniques provides a possible solution for several industrial scenarios. The first technique represents an individual who has yet to learn about the system and sets a group of control gains. The second technique represents a meta-heuristic approach in which few assumptions about the system are made, and a global search optimization is carried out for a specific working scenario. The third technique applies a DNN as the search engine and uses online performance feedback to adapt to several working scenarios. This article exemplifies the scenarios by changing the desired trajectory and the load at the end effector.
Fig. 5. Visual representation of training and testing of the model.

Fig. 6. First experiment, training trajectory.

Fig. 7. Second experiment, testing trajectory.

Fig. 8. Third experiment, testing trajectory with the change of mass.

Table 1
Objective function performance on the training trajectory, Fig. 10(f).

Technique        Obj. function
Random           0.00777
DE               6.6714e−04
Neural Network   0.0011

Three experiments are conducted, each designed to evaluate the performance under two different working scenarios. In the first experiment, the trajectory of Fig. 10(f) is executed using the vector of optimal PID controller gains that the DE algorithm has converged on and the DNN has been trained with. The second experiment compares a testing trajectory (Fig. 11(e)) for the DNN and DE algorithms. In the third experiment, DE and DNN are examined when the same unknown trajectory is proposed and the mass of the end effector is changed in the middle of the cycle time.

4.1. First experiment

Fig. 6 shows the trajectory tracking position error results for the first experiment. A black dotted line depicts the result obtained with random PID controller gains; DE and DNN are shown with a red dashed line and a continuous blue line, respectively. Fig. 6 and Table 1 show that DE obtained the best result in minimizing the objective function. The solution vector obtained by the DE algorithm is x⃗ = [K_P1 = 65.84, K_V1 = 0.24, K_I1 = 1.67, K_P2 = 80.88, K_V2 = 0.16, K_I2 = 0.6]. On the other hand, the random tuning approach has an error over one order of magnitude larger than DE and DNN. The DNN technique yields an objective function value (Eq. (8)) 1.64 times bigger than the DE technique. In addition, the most significant amplitude errors between the DE and DNN tuning techniques occur in the time span from 1.8 to 2.4 s, as displayed in the zoomed box at the upper-right corner of Fig. 6. This behavior indicates that the two techniques are equivalent, because both have a similar reaction at this time.

4.2. Second experiment

Fig. 7 shows the results of this experiment, in which the DNN achieves the best performance based on the objective function (Table 2). The DE static gains obtained the best objective function in the first experiment; however, the system reaction time at the beginning of the trajectory generates more significant error peaks than the DNN. The zoomed box evidences how the DNN obtains peaks of half the magnitude compared to DE. On the other hand, after the first half of the trajectory, the random gains resulted in a critically stable system.


Fig. 9. PID controller gains tuning by the DNN technique.

Fig. 10. Trajectory training set.

Table 2
Objective function performance on the testing trajectory, Fig. 11(e).

Technique        Obj. function
Random           0.0112
DE               3.2953e−04
Neural Network   2.0506e−04

Results in Tables 1–2 suggest that DE reached a global minimum and converged to gains that are optimal for the first trajectory. Contrarily, for the second trajectory, DE does not perform optimally, because the PID controller gains are optimal only for a particular trajectory. Therefore, the DE algorithm would have to be rerun for the testing trajectory.

4.3. Third experiment

The third experiment aims to analyze the system behavior during a mass change at the end effector while the testing desired trajectory, Fig. 11(e), is executed. The elastic deformation of the robot links determined the weight limit used during this experiment.

Table 3
Objective function performance on a testing trajectory, Fig. 11(e), with the change of mass.

Technique        Obj. function
Random           0.3469
DE               0.1262
Neural Network   1e−04

In the last experiment, Fig. 8, the mass change at the end effector occurs at 0.9 s. The zoomed box in Fig. 8 shows the difference in magnitude of the tracking error between DE and the DNN at the time of the mass change. It can be seen that both present error peaks at almost the same time, but the DNN repeatedly achieves error minimization faster than DE. The DNN accomplished the best result of the objective function and the lowest impact on the system due to the change of mass, as can be observed in Table 3. The robustness of the DNN to changes in mass can be observed in that the amplitudes of the errors keep decreasing once the disturbance occurs. In contrast, the other two tuning techniques show their highest error peaks close to 1.6 s; in the case of the random technique, the most significant error is almost four times the magnitude of the other techniques.

Figs. 9(a)–9(f) show the time-varying gains of the two brushless motors for the testing trajectory. Experiment 2 is represented by a blue dotted line, when there are no changes of mass at the end effector. Experiment 3 is depicted by a continuous red line, when the mass is changed in the middle of the run time of the desired trajectory. During the first half of the trajectory, from second 0 to 0.9, the gains behave under a similar change pattern. This behavior is expected since the system is under the same conditions (no mass). When the change in mass occurs, the proportional gains reach maximum peaks of 102.5 and 121.188 for each motor, respectively, as can be seen in Figs. 9(a)–9(b). Approaching the end of the trajectory, the proportional position gains set on both motors start to converge to the same values. For the velocity gains, with the change in mass, an increase in the amplitude of the gains and a decrease in the frequency of change of these gains are observed in Figs. 9(c)–9(d). In the case of the integral gains, an average decrease is observed in Figs. 9(e)–9(f) for both motors once the mass disturbance appears.

5. Conclusions

This paper addresses the gain tuning of a PID controller for two brushless motors using three techniques: random assignment, DE, and DNN. Three case studies are analyzed to observe each method's advantages and disadvantages. All measurements of position errors, velocity errors, and currents are generated using the experimental prototype, thus avoiding the use of a mathematical model to solve the dynamics of the robot and actuators. Likewise, friction effects in the joints and the closed kinematic chain constraints of the parallel robot are implicitly considered.
Fig. 11. Trajectory validation and testing sets.

A DNN can interpolate between known data, while DE finds the global optimum for a particular scenario. When there is a change in the system (e.g., a change of mass at the end effector), the DNN can interpolate and find a robust solution to changes in the trajectory and in the mass at the end effector (see Fig. 8). However, DE must be iteratively re-executed if there are changes in the trajectory, in the mass, or in both.

CRediT authorship contribution statement

Daniel Blanck-Kahan: Software, Validation, Formal analysis, Investigation, Data curation. Gerardo Ortiz-Cervantes: Software, Validation, Formal analysis, Investigation, Data curation, Project administration. Valentín Martínez-Gama: Software, Validation, Formal analysis, Investigation. Héctor Cervantes-Culebro: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition. J. Enrique Chong-Quero: Conceptualization, Methodology, Formal analysis, Resources, Writing – review & editing, Project administration, Funding acquisition. Carlos A. Cruz-Villar: Conceptualization, Methodology, Validation, Formal analysis, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

I have shared all data and codes in the appendix section.

Acknowledgments


Appendix

Figs. 10(a)–10(f) show the set of trajectories with which the DNN is trained. The training trajectories are a set of splines executed in different quadrants of the Cartesian plane. Figs. 11(a)–11(c) depict the set of validation trajectories. In Fig. 11(a), a spline in the fourth quadrant of the Cartesian plane is shown as a trajectory. Fig. 11(b) reproduces the intersection of trajectories in straight and curved lines. A geometric ellipse is generated in Fig. 11(c). Figs. 11(d)–11(f) illustrate the set of testing trajectories, which are trajectories of closed and irregular geometry.

All of the code used for the project is contained in the following publicly available GitHub repository; the code is open for free use under the MIT license: https://github.com/valentin-martinez-gama/robo-evoML

The following are the main components of the codebase:

• Logic and control functions for the control-loop implementation using the ODrive, under the Odrive_control folder.
• The evolutionary algorithm logic and its implementation on top of the ODrive controller, contained in the evo_ML.py file.
• The methodology used to generate the training data set by running multiple evolutionary iterations on multiple trajectories, in the ML_training.py file.
• ML.py and ML_data.py, which contain supporting initialization and data-processing functions.
• The Trajectories folder, which contains the trajectories to be followed as pairs of angular setpoints for the two motors; all of them are uniformly spaced in time.
• Datasets and Keras Model, which contain the input data used for NN training and the resulting trained model.

References

Ashok Kumar, M., & Kanthalakshmi, S. (2018). H∞ tracking control for an inverted pendulum. Journal of Vibration and Control, 24(16), 3515–3524.
Bingül, Z., & Karahan, O. (2011). A fuzzy logic controller tuned with PSO for 2 DOF robot trajectory control. Expert Systems with Applications, 38(1), 1017–1031.
Brownlee, J. (2017). Gentle introduction to the Adam optimization algorithm for deep learning. Machine Learning Mastery, 3.
Campos, L., Bourbonnais, F., Bonev, I. A., & Bigras, P. (2010). Development of a five-bar parallel robot with large workspace. In International design engineering technical conferences and computers and information in engineering conference, 44106 (pp. 917–922).
Cheng, L., Wang, Z., Jiang, F., & Li, J. (2021). Adaptive neural network control of nonlinear systems with unknown dynamics. Advances in Space Research, 67(3), 1114–1123.
Fang, J., Zhao, J., Mei, T., & Chen, J. (2016). Online optimization scheme with dual-mode controller for redundancy-resolution with torque constraints. Robotics and Computer-Integrated Manufacturing, 40, 44–54.
Ghorbel, F. H., Chételat, O., Gunawardana, R., & Longchamp, R. (2000). Modeling and set point control of closed-chain mechanisms: Theory and experiment. IEEE Transactions on Control Systems Technology, 8(5), 801–815.
Goyal, V., Mishra, P., Shukla, A., Deolia, V. K., & Varshney, A. (2019). A fractional order parallel control structure tuned with meta-heuristic optimization algorithms for enhanced robustness. Journal of Electrical Engineering, 70(1), 16–24.
Hajiloo, A., Nariman-Zadeh, N., & Moeini, A. (2012). Pareto optimal robust design of fractional-order PID controllers for systems with probabilistic uncertainties. Mechatronics, 22(6), 788–801.
Hekimoğlu, B. (2019). Optimal tuning of fractional order PID controller for DC motor speed control via chaotic atom search optimization algorithm. IEEE Access, 7, 38100–38114.
Jin, X.-Z., He, T., Wu, X.-M., Wang, H., & Chi, J. (2020). Robust adaptive neural network-based compensation control of a class of quadrotor aircrafts. Journal of the Franklin Institute, 357(17), 12241–12263.
Joseph, S. B., Dada, E. G., Abidemi, A., Oyewola, D. O., & Khammas, B. M. (2022). Metaheuristic algorithms for PID controller parameters tuning: Review, approaches and open problems. Heliyon, Article e09399.
Khan, A. H., Cao, X., Li, S., Katsikis, V. N., & Liao, L. (2020). BAS-ADAM: An ADAM based approach to improve the performance of beetle antennae search optimizer. IEEE/CAA Journal of Automatica Sinica, 7(2), 461–471.
Kiumarsi, B., Lewis, F. L., & Jiang, Z.-P. (2017). H∞ control of linear discrete-time systems: Off-policy reinforcement learning. Automatica, 78, 144–152.
Kler, D., Sharma, P., Rana, K., & Kumar, V. (2018). A BSA tuned fractional-order PID controller for enhanced MPPT in a photovoltaic system. In Fractional order systems (pp. 673–703). Elsevier.
Kumar, A., & Kumar, V. (2017). Evolving an interval type-2 fuzzy PID controller for the redundant robotic manipulator. Expert Systems with Applications, 73, 161–177.
Li, H., Song, B., Tang, X., Xie, Y., & Zhou, X. (2022). Controller optimization using data-driven constrained bat algorithm with gradient-based depth-first search strategy. ISA Transactions, 125, 212–236.
Liu, H., Lu, G., & Zhong, Y. (2012). Robust LQR attitude control of a 3-DOF laboratory helicopter for aggressive maneuvers. IEEE Transactions on Industrial Electronics, 60(10), 4627–4636.
Luo, B., Wu, H.-N., & Huang, T. (2014). Off-policy reinforcement learning for H∞ control design. IEEE Transactions on Cybernetics, 45(1), 65–76.
Pang, H., Liu, F., & Xu, Z. (2018). Variable universe fuzzy control for vehicle semi-active suspension system with MR damper combining fuzzy neural network and particle swarm optimization. Neurocomputing, 306, 130–140.
Pradhan, R., Majhi, S. K., Pradhan, J. K., & Pati, B. B. (2020). Optimal fractional order PID controller design using ant lion optimizer. Ain Shams Engineering Journal, 11(2), 281–291.
Precup, R.-E., David, R.-C., Roman, R.-C., Petriu, E. M., & Szedlak-Stinean, A.-I. (2021). Slime mould algorithm-based tuning of cost-effective fuzzy controllers for servo systems. International Journal of Computational Intelligence Systems, 14(1), 1042–1052.
Rigatos, G., Siano, P., Selisteanu, D., & Precup, R. (2017). Nonlinear optimal control of oxygen and carbon dioxide levels in blood. Intelligent Industrial Systems, 3, 61–75.
Rodríguez-Molina, A., Mezura-Montes, E., Villarreal-Cervantes, M. G., & Aldape-Pérez, M. (2020). Multi-objective meta-heuristic optimization in intelligent control: A survey on the controller tuning problem. Applied Soft Computing, 93, Article 106342.
Salwani, M. I., Norzaidi, M. D., Chong, S. C., & Lin, B. (2009). Factors determining organisational commitment on security controls in accounting-based information systems. International Journal of Services and Standards, 5(1), 51–66.
Sánchez, H. S., Padula, F., Visioli, A., & Vilanova, R. (2017). Tuning rules for robust FOPID controllers based on multi-objective optimization with FOPDT models. ISA Transactions, 66, 344–361.
Song, R., & Lewis, F. L. (2020). Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing, 390, 185–195.
Souza, A., & Souza, L. (2019). Design of a controller for a rigid-flexible satellite using the H-infinity method considering the parametric uncertainty. Mechanical Systems and Signal Processing, 116, 641–650.
Storn, R., & Price, K. (1997). Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Trinh, N. H., Vu, N. T.-T., & Nguyen, P. D. (2021). Robust optimal tracking control using disturbance observer for robotic arm systems. Journal of Control, Automation and Electrical Systems, 1–12.
Ucgun, H., Okten, I., Yuzgec, U., & Kesler, M. (2022). Test platform and graphical user interface design for vertical take-off and landing drones. Science and Technology, 25(3), 350–367.
Villarreal-Cervantes, M. G., Mezura-Montes, E., & Guzmán-Gaspar, J. Y. (2018). Differential evolution based adaptation for the direct current motor velocity control parameters. Mathematics and Computers in Simulation, 150, 122–141.
Wang, Z., Zou, L., Su, X., Luo, G., Li, R., & Huang, Y. (2021). Hybrid force/position control in workspace of robotic manipulator in uncertain environments based on adaptive fuzzy control. Robotics and Autonomous Systems, 145, Article 103870.
Ye, M., Gao, G., & Zhong, J. (2021). Finite-time stable robust sliding mode dynamic control for parallel robots. International Journal of Control, Automation and Systems, 19(9), 3026–3036.
Yilmaz, B. M., Tatlicioglu, E., Savran, A., & Alci, M. (2021). Adaptive fuzzy logic with self-tuned membership functions based repetitive learning control of robotic manipulators. Applied Soft Computing, 104, Article 107183.
Zamfirache, I. A., Precup, R.-E., Roman, R.-C., & Petriu, E. M. (2022). Policy iteration reinforcement learning-based control using a Grey Wolf Optimizer algorithm. Information Sciences, 585, 162–175.
Zhang, B., Deng, B., Gao, X., Shang, W., & Cong, S. (2023). Design and implementation of fast terminal sliding mode control with synchronization error for cable-driven parallel robots. Mechanism and Machine Theory, 182, Article 105228.
Zhang, S., & Liu, L. (2018). Normalized robust FOPID controller regulation based on small gain theorem. Complexity, 2018.
Zhou, X., & Zhang, X. (2019). Multi-objective-optimization-based control parameters auto-tuning for aerial manipulators. International Journal of Advanced Robotic Systems, 16(1), Article 1729881419828071.
