
Neural Computing and Applications (2024) 36:4523–4535

https://doi.org/10.1007/s00521-023-09396-x

REVIEW

Improving productivity in mining operations: a deep reinforcement learning model for effective material supply and equipment management

Teddy V. Chiarot Villegas¹ · S. Francisco Segura Altamirano¹ · Diana M. Castro Cárdenas¹ · Ayax M. Sifuentes Montes² · Lucia I. Chaman Cabrera¹ · Antenor S. Aliaga Zegarra² · Carlos L. Oblitas Vera¹ · José C. Alban Palacios²

Received: 24 May 2023 / Accepted: 13 December 2023 / Published online: 13 January 2024
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024

Abstract
This research study examines the impact of shifts and lunch breaks on mining operations, particularly focusing on delays in
hauling equipment used to supply extracted material to crushers. These delays significantly reduce productivity, averaging
below 80% during regular working hours and adversely affecting mine profitability. To address this issue, a Q-learning-
based deep reinforcement learning model was developed, utilizing real-world data from mining operations. The model
aimed to achieve 90% coverage of material supply to the crushers. A simulation environment, closely resembling the
physical mining setting, was created to test the trucks as agents. Various scenarios, including equipment selection, cycle
time, queue times, and material types, were considered. Based on the results, a deep learning model was trained to
maximize coverage by determining the optimal combination of trucks and crushers. The solution successfully achieved a
90% supply coverage during shift changes and lunch breaks, with average execution times of less than 1 ms, making it
suitable for real-time applications. This research demonstrates the effectiveness of the proposed Q-learning deep rein-
forcement learning model in optimizing material supply and enhancing mining productivity. By addressing delays and
improving operational efficiency, this model holds significant potential for improving profitability in mining operations.

Keywords Mining operations · Hauling equipment · Q-learning · Truck dispatching

1 Introduction

Efficient fleet management is crucial in mining operations to maximize productivity and minimize costs, which constitute approximately 50% of total operational expenses in open-pit mines. The scarcity of hauling trucks and loading shovels presents significant challenges in achieving daily tonnage objectives. Such delays hinder the capacity to clear mining fronts for progression or to supply feed for crushing processes [1].
Efficient resource management plays a pivotal role in addressing the aforementioned challenges, and many

✉ S. Francisco Segura Altamirano
sseguraal@unprg.edu.pe

Teddy V. Chiarot Villegas
tchiarot@unprg.edu.pe

Diana M. Castro Cárdenas
dcastroc@unprg.edu.pe

Ayax M. Sifuentes Montes
asifuentesm@unp.edu.pe

Lucia I. Chaman Cabrera
lchamanc@unprg.edu.pe

Antenor S. Aliaga Zegarra
aaliagaz@unp.edu.pe

Carlos L. Oblitas Vera
coblitas@unprg.edu.pe

José C. Alban Palacios
jalbanp@unp.edu.pe

1 Pedro Ruiz Gallo National University, Calle Chile 848, JLO, Chiclayo, Lambayeque, Peru
2 National University of Piura, Piura, Peru


Content courtesy of Springer Nature, terms of use apply. Rights reserved.



mining activities employ fleet management systems to maximize production, minimize stockpile usage, align crusher feeding with objectives, and adhere to blending restrictions. However, these systems offer multiple configuration options determined by the operator's criteria and experience. Notably, the utilization and prioritization of shovels and crushers hold immense significance within this context. Dispatch decisions carry substantial weight in operational efficiency and are of utmost importance, considering that a significant portion of mining costs is directly associated with truck and shovel activities [2].
Many companies offer mining fleet management systems; the most widely used are Modular Mining Systems, Jigsaw Software, and Wenco, installed and used by 200, 130, and 65 mining companies, respectively. However, Micromine, with its Pitram system, and Caterpillar, with Cat MineStar Solutions, are the next leaders in mining fleet management systems [3].
Recent research has explored innovative solutions, such as integrating control layers into fleet management systems using deep learning to improve dispatch agility [4]. Such intelligent distributed architectures (namely a vehicular layer, a processing layer, and a decision layer), with high-performance algorithms and Internet-of-Things (IoT) connectivity, have demonstrated potential for enhanced real-time coordination. Experimental implementations showcase scalability possibilities across various mines.
Another study [5] utilized dense neural network models in open-pit mining truck systems to predict daily mined material production based on transport conditions and operating times. Verification with data processed over two months showed that the ore production figures in the morning and afternoon were 11.40% and 8.87%, respectively, with an error margin of approximately 4.17% between actual and forecasted production. This method of prediction can effectively address the challenges faced by conventional truck-system simulations based on complex algorithms.
Further studies have applied deep multi-agent reinforcement learning (RL) to vehicle supply-demand balance in large-scale ride-sharing platforms, coordinating thousands of units to enhance revenue and customer satisfaction [6]. Adapting such approaches to optimize shovel-truck fleets could transform mining productivity. Our work employs Q-learning-based models with tailored rewards to significantly improve crusher coverage. Rigorous testing on extensive data from Peru analyzes truck-agent behavior in shovel selection. By prioritizing ore movement and penalizing waste, the networks achieved significant improvements in efficiency and resource utilization. Further leveraging the latest advancements in ride-sharing could offer a highly promising transformation in fleet management.
A recent advancement in the area of large-scale ride-sharing platforms is the balancing of vehicle supply and demand, which is key to maximizing efficiency. A study by Lin et al. developed new contextual deep multi-agent reinforcement learning (RL) algorithms that effectively coordinated thousands of vehicles to improve platform revenue and customer satisfaction [7].
Building on this, [8] presents an actor-critic RL model to provide updated short-term production schedules and fleet assignments for trucks and shovels in a mining complex. The carefully designed reward system prioritizes mineral material movement and penalizes tailings. Extensive simulation demonstrates significantly enhanced productivity, efficiency, and resource utilization compared to current practices. By harnessing recent innovations in ride-sharing and mining through advanced machine learning, this study offers a promising direction for transforming fleet management.
Ahangaran et al. [9] explored the truck assignment problem in open-pit mines, aiming to maximize productivity while minimizing transportation costs. Their two-stage model first determines optimal routes using flow-network theory and then assigns trucks to shovels using binary integer programming, accommodating operational constraints and varying truck capacities. Mu et al. [10] addressed the efficiency challenges in dispatching large truck fleets, given the complexity of current road networks. They proposed a multilayer map-matching algorithm with spatial data structures and a novel "lookup table" method, outperforming traditional "pinpointing" techniques in empirical tests with Taiwanese truck data.
Chaowasakoo et al. [11] studied truck dispatch strategies in open-pit mining to maximize productivity under cycle-time uncertainties. Their simulation comparing four strategies revealed that the global-vision approach significantly outperformed traditional methods in a real coal mine scenario.
In summary, the reviewed studies effectively employ various analytical and simulation techniques, including integer programming, goal programming, hidden Markov models, and reinforcement learning algorithms, to optimize complex real-world transport and mining systems. Their findings demonstrate significant quantitative improvements, ranging from 47% to 270% gains in key performance indicators such as production levels, cash flows, truck delays, and fleet availability. These results underscore outstanding yet realistic opportunities for operational advancement leveraging modern optimization, machine learning, and data-driven solutions.
Addressing the persistent challenge of production bottlenecks in mining operations, an emergent approach in the field involves integrating an automated control layer into fleet management systems, harnessing the optimization
fleet management systems, harnessing the optimization


power of deep reinforcement learning models. Aligning with this high-potential direction, in this study we present a proposal utilizing a deep Q-learning algorithm tailored to the specifics of the mining context. Our proposed model demonstrates remarkable potential to significantly enhance truck coverage and material supply to crushers during shift changes and lunch breaks. This is achieved through meticulous, constraints-aware reward design, dynamically ensuring optimal fleet performance under varying conditions. By effectively harnessing the generalization capability of deep learning, our research puts forward a solution poised to overcome intrinsic production limitations and unlock untapped operational efficiency at scale.

2 Materials and methods

Machine learning is a branch of artificial intelligence that enables computers to learn without being explicitly programmed. Through algorithms, patterns are identified in data, and prediction or classification models are constructed. There are three types of learning: supervised, unsupervised, and reinforcement. Supervised learning uses labeled data to train a model and establish an input/output relationship. In unsupervised learning, data are unlabeled, and the algorithm must classify the information. In this study, reinforcement learning is prioritized: a model is developed with an agent that interacts with an unknown environment and learns through trial and error, using established rewards and punishments. By employing these machine learning techniques, this study explores the potential for data-driven advancements in various areas (see Fig. 1).

Fig. 1 Reinforcement learning model

The terms shown in Fig. 1 can be described as follows:

Environment In this context, the environment is characterized by the rewards it produces, the actions performed by the agent (the input to the environment), and the observations that provide additional information on the rewards acquired by the agent.

Reward Rewards are quantitative values obtained from the environment, serving as feedback to the agent. They can be positive or negative, influencing the agent's behavior and learning process.

Action This represents the agent's behavior within the environment, which is accepted by the game or activity in progress. In reinforcement learning, actions can be classified as discrete or continuous. Discrete actions are a finite set of options, enabling the agent to move between specific choices, such as left or right. Continuous actions, on the other hand, involve values associated with the action itself, such as the steering-wheel actions used in driving cars, which can vary in direction and steering angle.

Observation Observations serve as a reference provided by the environment to the agent, offering insights into the surrounding circumstances. This information is crucial for determining rewards, as it includes data on rewards, punishments, and the agent's acquired score, facilitating the learning process.

2.1 Q-learning

Q-learning is part of the reinforcement learning methodology; it addresses the difficulty of controlling autonomous agents through trial and error within a dynamic environment by providing a signal that allows each action carried out to be improved. Furthermore, it can solve decision problems in which the benefit of an action is subject to a chain of Markovian decisions. At run time, the agent is in a particular state s_0 of the environment and then performs an action a_0, generating a new state s_1 and reward r_0, and so on, as shown in Fig. 2. The agent performs a series of actions whose values are defined by the reward function over time, and it decides which action to take at each step to maximize the reward. Generally, the environment and the task are considered free of prior external influences, such that they satisfy the Markov property. A set of possible states S, actions A, transition probabilities P(s, a, s'), expected rewards R(s, a, s'), and the cumulative future reward R_t are defined as shown in Eq. 1, where γ is a discount factor whose value ranges from 0 to 1 [12].

Fig. 2 Q-learning methodology
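To make this trial-and-error loop concrete, the following minimal tabular Q-learning sketch learns to walk a toy five-state corridor. It is purely illustrative: the environment, rewards, and hyperparameters are invented for exposition and are unrelated to the mine simulator described later; the update applied is the standard Q-learning rule formalized in Eq. 2.

```python
import random

N_STATES, GOAL = 5, 4          # toy corridor with states 0..4; goal at the right end
ACTIONS = (-1, +1)             # move left or move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment transition: clamp to the corridor, reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

random.seed(0)
for _ in range(500):                               # training episodes
    s = 0
    done = False
    while not done:
        # epsilon-greedy action selection (explore with probability EPS)
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # standard Q-learning update (cf. Eq. 2):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = 0.0 if done else max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N_STATES)]
print(policy[:4])   # learned greedy policy points toward the goal: [1, 1, 1, 1]
```

A deep Q-learning model, as used in this study, replaces the explicit table Q with a neural network that maps observations to Q-values, which is what allows it to cope with the effectively infinite observation space of a real mine.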


S = {s_0, s_1, s_2, …, s_n}
A = {a_0, a_1, a_2, …, a_n}
P(s, a, s') = P{ s_{t+1} = s' | s_t = s, a_t = a }
R(s, a, s') = E{ r_{t+1} | s_t = s, a_t = a, s_{t+1} = s' }
R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}    (1)

Control theory methods are used in each episode, and algorithms capable of making better predictions are applied, in which a learning rate α ∈ (0, 1) updates the values of Q(s, a) (see Eq. 2). The learning rate is reduced step by step or held constant, depending on whether or not the work takes place in a static environment [12].

Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]    (2)

Reinforcement learning uses specific criteria, which are detailed below [13]:

• Defines multiple episodes, recording the loss values in a list;
• Sets random actions under some policy, for example, the direction of a path to be forward, backward, left, or right (see Fig. 2);
• Each action a carries a reward r, and a new state s_{t+1} is generated;
• Collects the transition (s, a, r, s') in the experience storage list (see Fig. 3), providing batch updates online, where each experience is generated based on probability, taking ε as the value for designated actions and then saving them in a list. The list is then decomposed into separate mini-batch tensors, which are iterated over to compute updates for the mini-batch Q-values. Finally, the target array Y is aggregated, and each state is stored in variable X to form a mini training batch [14];
• Determines the gradient-descent loss function, which decreases the neural network's prediction error.

2.2 Mining fleet management system

There are several large-scale mining management software packages, including DISPATCH, with data communication functions, GPS global positioning sensors for shovels, drills, and tractors, and automatic assignments to haul trucks in open-pit mines, with the following components [15]:

• The PTX (Modular Mining) computerized field system, comprising a graphic console and a central unit, installed on trucks, auxiliary equipment, shovels, and crushers;
• Global positioning system (GPS) technology;
• A radio data link to a base computer in the information center, whose purpose is to allocate automatic assignments to haul trucks.

The hauling process begins at the start of the shift, where each operator's data are entered in the Dispatch system for the subsequent registration of both the user and the truck, shovel, or crusher. The data are sent to the information center after the registration is confirmed. Then, the operator receives a message regarding the necessity for continuous use of the equipment and the mandatory inspections, after which the operator and the equipment can continue the hauling cycle. For breaks such as operator replacements, shift changes, maintenance, refueling, and blasting processes that occur in mines, Dispatch performs the allocation and optimization calculations before operators are entered into the system.

3 Analysis and proposal

Currently, mining companies have hundreds of trucks, dozens of shovels, several loaders, and primary crushers distributed among different pits. A probable scenario is shown in Fig. 4. Usually, there is an integration between the different pits.
The system is designed to accommodate the arrangement of shovels and loaders, allowing for flexibility in their location based on operational requirements and the daily mining plan. However, the crushers have fixed positions and cannot be easily relocated. Dumps serve as storage areas for waste or low-grade materials and are typically situated outside the main processing area, where they are stacked in piles. Additionally, truck distribution within the mine is determined by the Dispatch system configuration, which ensures alignment with the operational plan.

Fig. 3 Probability strategy

In mining operations, a crucial task is to transport graded material to the crushers. Currently, there is a sufficient amount of equipment in place to meet the objective of delivering material to the crushers when all equipment is operational. However, not all trucks are dedicated to transporting material to the crushers, due to the need to prioritize the removal of waste material in order to access the ore. This process is carried out on a daily basis. While


Fig. 4 Mining operation of two open pits (Image from Google Earth)

there are no difficulties in covering the crusher with trucks during hours when all equipment is available, specific periods such as shift changes or lunch breaks pose challenges. As a result, there is a decrease in truck coverage at the crushers, leading to a shortage of trucks and shovels available for this specific task.

3.1 Shift change

The mining operations are carried out around the clock, divided into two 12-hour periods, with shift changes occurring at 07:30 and 19:30. To ensure continuous supply to the crushers, a systematic arrangement is implemented, positioning all trucks and shovels strategically during these events. Because these events cannot be executed simultaneously, the shift changes are scheduled with a time gap between them. Figure 5 illustrates the distribution of equipment downtime during shift-change hours, showing both rounds. To minimize the movement of personnel, specific areas are designated as parking zones during shift-change hours, allowing for equipment downtime (see Fig. 6).

Fig. 5 Equipment stoppage process during shift change

Fig. 6 Timeline of the shift change process for entering operators

3.2 Lunch break

During daily mining operations, there is a scheduled equipment shutdown for the lunch break. Because all teams cannot be stopped simultaneously, the shutdown is carried out in three separate rounds. Each round involves approximately one-third of the fleet, with designated time slots from 11:30 to 12:30, 12:30 to 13:30, and 13:30 to 14:30. Supervisors are responsible for deciding which teams will be taken out of operation in each round. Figure 7 illustrates the minute-by-minute distribution of trucks during the lunch-break process, clearly indicating the specific times when trucks are affected by the downtime.

Fig. 7 Lunch break process on a usual day

3.3 Equipment: trucks, shovels and crushers

Using the Dispatch system, the operation time of each truck is obtained from the mining companies. The trucks are classified into categories to calculate the percentage of


availability and usage of the equipment. During the calendar time T_C, these categories are divided into available time T_D and wasted (unavailable) time. The available time is further broken down into subcategories: effective operating time T_O, operational delays, and non-operational delays. This breakdown allows for the calculation of the proportion of calendar time during which the equipment was available, termed the equipment availability D% (see Eq. 3).

D% = (T_D / T_C) × 100    (3)

The percentage of utilization U% is defined as the relationship between the operative time T_O and the available time T_D (see Eq. 4):

U% = (T_O / T_D) × 100    (4)

This study was based on information from one month of the year under study (2022) to calculate the number of trucks operating in the mine per hour: the operative times T_O of all trucks were aggregated for each hour, which allowed the estimation of the number of trucks operating in the mining company during that hour. Figure 8 shows how the number of trucks decreased during the shift-change and lunch-break processes, regardless of the shift. Although the schedules were similar for shift-change and lunch-break events, day and night shifts had different characteristics, as shown.

Fig. 8 Operating trucks per day in different shifts

Shovel operators also changed shifts and took lunch breaks; in these cases, the operators usually had relays so that the shovels were always operating. However, by simple inspection, it can be seen from Fig. 9 that the number of shovels available at shift changes and lunch breaks dropped by up to four units. All the previous processes determine the tonnage per hour fed to the crushers. The day and night shifts and the number of trucks unloaded per hour at the crushers can be seen in Fig. 10.

Fig. 9 Number of shovels available at shift changes and lunch breaks

For the crushers, an event called "Upstream Delays" indicates the percentage of time in which there were no trucks at the crushers; it is the time when the crusher did not have a truck, as shown in Fig. 12. This event had an impact on one of the crushers during shift-change schedules and lunch breaks, as shown in Fig. 11. In the month evaluated, the delay events had an impact of 12% on the use of one of the crushers.

3.4 Proposed methodology

This research focused on applying deep reinforcement learning, specifically Q-learning, within the context of trucks, shovels, and crushers in mining operations. By incorporating information on hauling cycle times, the agent function was assigned to the trucks, enabling them to make informed decisions. The actions available to the trucks involved selecting and approaching the available shovels. The research utilized the open-source programming language Python [16], along with relevant libraries that facilitated the simulation of the mining process and the training of the deep learning models. This approach holds significant potential for enhancing productivity in the mining industry.
The methodology encompassed a comprehensive analysis of the hauling cycle over a 30-day period. Through a Transact-SQL script and the Dispatch system, essential data on supply times, shift-change delays, truck transfers, shovel loading and unloading, as well as crushers and dumps, were extracted. The Dispatch system, integrated with GPS global positioning sensors, provided accurate information, while 15 shovels with flexible positions, four dumps, three crushers, and a designated truck parking area were strategically selected. By converting coordinates to the WGS84 cartographic map and considering 252 combinations of loading and unloading locations, a wide range of daily operating scenarios was accounted for. An additional 50 combinations were used for the truck parking location, optimizing proximity to the launch area, shovels, and crushers. This methodology ensured a comprehensive and reliable foundation for the subsequent deep reinforcement learning modeling and simulations.
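As a small worked example, the availability and utilization indicators of Eqs. 3 and 4 can be computed directly from the Dispatch time categories. The numbers below are illustrative only, not values taken from the mine's records:

```python
def availability_pct(t_available_h: float, t_calendar_h: float) -> float:
    """Eq. 3: D%, the share of calendar time T_C in which the equipment was available (T_D)."""
    return 100.0 * t_available_h / t_calendar_h

def utilization_pct(t_operative_h: float, t_available_h: float) -> float:
    """Eq. 4: U%, the share of available time T_D spent in effective operation (T_O)."""
    return 100.0 * t_operative_h / t_available_h

# Illustrative day for one truck: 24 h calendar, 21.6 h available, 16.2 h operative.
d = availability_pct(21.6, 24.0)   # about 90.0
u = utilization_pct(16.2, 21.6)    # about 75.0
print(f"D% = {d:.1f}, U% = {u:.1f}")
```

Aggregating U% per hour over all trucks is what yields the per-hour operating-truck counts shown in Fig. 8.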


The accommodation time of trucks at the shovel refers to the duration from the truck's arrival at the shovel until it is positioned to be loaded. In the various tests conducted, the average preparation time for multiple trucks was found to be 1.09 min, with a standard deviation of 0.54 min. To obtain data on shovel loading times, the information was divided into two categories. The first category focused on parking time at the shovels, yielding an average of 1.30 min with a standard deviation of 0.45 min. The second category involved the loading delay time for trucks at the same shovel, resulting in an average of 4.80 min with a standard deviation of 1.32 min.
The scheduling of dumps was consistent and unrestricted, resulting in queues during material dispatch with a mean of μ = 1.62 min and a standard deviation of σ = 0.45 min. Conversely, the crushers experienced truck queues and had restrictions on equipment access. The discharge-time values for the crushers were μ = 2.25 min with σ = 0.89 min. This higher value reflects the variability in discharge times, which are influenced by the material's mechanical characteristics, including its resistance and reaction to force during the crushing process.
The study analyzed the delay times in the mine over a 30-day period in 2022. These delays, including truck cycle times, were found to be variable rather than fixed values. To account for this variability, a normal distribution was fitted using the mean and standard deviation, and these delays were incorporated into the discrete-event simulator.
For the lunch break, the analysis revealed an average delay of μ = 49.77 min with a standard deviation σ of 6.85 min. During the shift change from 6:30 a.m. to 7:30 a.m. for the day shift, the average delay was μ = 25.72 min with a standard deviation of σ = 13.18 min. Similarly, during the shift change from 6:30 p.m. to 7:30 p.m. for the night shift, the average delay was μ = 25.92 min with a standard deviation of σ = 14.79 min.

Fig. 10 Number of discharges per day in crushers

Fig. 11 Event upstream delays

Fig. 12 Impact of upstream delays in one of the crushers

3.5 Deep learning model

In Q-learning, the neural network plays a crucial role by receiving the observations of each agent in the environment. The network's output consists of Q-values associated with each possible action that the agent can take. In this research, the observations for each agent were carefully defined to capture relevant information. These observations included travel times to each shovel (15 data points), queues at each shovel (15 data points), equipment availability on the route to each shovel (15 data points), queues at each crusher (three data points), and the number of other agents present in the environment (one data point). The outputs of the neural network represented the Q-values for each possible action, based on the agent's current observation. Considering that the mine operated in an unstable environment, the agent had an infinite range of possible observations. The neural network was responsible for calculating the Q-value based on the current state, resulting in a network output of 15 points, corresponding to the number of shovels in the mine.
Significant developments were made in the reward function R_t to accurately capture the performance of the agents. The reward value was determined based on two essential variables. The first was the total cycle time ΔT_cycle, measured


from when the truck was assigned to a shovel until it unloaded and requested a new assignment. This factor accounted for the impact of truck queues at the crusher or dump destinations; consequently, destinations with longer queues received lower rewards, providing an incentive to minimize delays. The second variable was the type of material carried by the agent, T_material, defined as either 0 or 1, representing waste or mineral, respectively. This factor allowed the reward function to differentiate between material types and their significance for the mining operation. Additionally, a 50% larger reward was obtained when the agent successfully loaded ore, having chosen an ore shovel as its action, followed by a direct discharge to the crushers. This encouraged efficient handling of valuable ore material. Overall, the reward function R_t was calculated as per Eq. 5, incorporating these variables. This approach ensured that the rewards aligned with the objectives of minimizing cycle times, prioritizing ore, and promoting efficient operations.

R_t = (T_material + 2) / ΔT_cycle    (5)

A neural network model with 49 inputs, 4900 neurons in the hidden layer, and 15 neurons in the output layer was created (see Fig. 13). The training parameters were set as follows:

• Learning rate α = 0.001;
• Probability strategy ε = 0.8;
• Mean-squared error (MSE) loss function and an Adam optimizer.

The latter updates the weights according to the training data in order to reduce the error during network learning [17].
The starting conditions of the mine, such as the environment, states, actions, shift changes, lunch breaks, and active equipment, were known. In addition, the processes carried out during the haul cycle are explained in detail as follows:

1. The statuses that trucks hold during the haul cycle:
   a. When the status was 0, trucks traveled from the point of origin to the nearest shovel.
   b. When the status was 1, the truck was parked next to the shovel and the material was being loaded.
   c. When the status was 2, the trucks followed their preset paths to reach the crushers if transporting mineral, or otherwise to reach the dump. At status 3, the material was unloaded; the trucks then returned to the point of origin and started a new cycle.
2. The duration times in the haul cycle used the normal distributions described in Sect. 3.4, depending on the status of the trucks.
3. The loss of operational equipment was considered during lunch-break hours (11:30 a.m.–2:30 p.m.) and shift-change hours (6:30 a.m.–7:30 a.m., 6:30 p.m.–7:30 p.m.).
4. Certain conditions inside the mine correspond to the input layer of the neural network model, as detailed below:
   a. The travel times of the empty trucks located around the 15 shovels;
   b. The number of trucks at each shovel;
   c. The number of trucks at each crusher;
   d. The assigned travel paths to each shovel.
5. The shovels were assigned, and the reward was given when a truck was unloaded (it had no material). Assignment was established using the epsilon-greedy strategy, by randomly designating a probability of ε = 0.2, or 1 − ε. The network estimated the maximum Q-value over the 15 shovels and stored the actions in a vector. Finally, the dumps and crushers were assigned.

The following parameters were designated:

• Memory size, equal to the total number of trucks (110 units);
• The number of epochs, set to 2500 (design criterion);
• An experience vector, in which all the states, actions, and rewards during a simulated day in the mine were saved;
• The trucks were considered to be the agents. Individual rewards per agent were defined, and their values were based on the time from the assignment of the truck to a shovel until the truck's unloading, either at the crusher or the dump, including the queue that the truck faced at a particular destination. Therefore, a destination with a long queue would be lightly rewarded. In addition, a 50% larger reward was obtained when the agent moved to the ore shovels and discharged its content directly into the crusher.
When the experience size (the accumulated rewards)
Fig. 13 Neural network design—network outputs after more than 2500 epochs of training exceeded a
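The architecture and training setup described above (49 state inputs, one hidden layer of 4900 neurons, 15 outputs, MSE loss, Adam at α = 0.001) can be sketched as a plain feed-forward pass. The following NumPy reconstruction is illustrative only, not the authors' code; the ReLU activation, the weight initialization, and all names are our assumptions, since the paper does not specify them.

```python
import numpy as np

# Sketch of the Q-network described above: 49 state inputs (travel times,
# trucks per shovel, trucks per crusher, assigned paths), 4900 hidden
# neurons, and 15 outputs (one Q-value per shovel). Illustrative only.
rng = np.random.default_rng(0)

n_inputs, n_hidden, n_shovels = 49, 4900, 15
W1 = rng.normal(0, np.sqrt(2 / n_inputs), (n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, np.sqrt(2 / n_hidden), (n_hidden, n_shovels))
b2 = np.zeros(n_shovels)

def q_values(state):
    """Forward pass: ReLU hidden layer, linear output layer."""
    h = np.maximum(0.0, state @ W1 + b1)
    return h @ W2 + b2

# One observation of the mine yields 15 Q-values; the greedy action is
# the shovel with the highest estimated value.
state = rng.normal(size=n_inputs)
q = q_values(state)
best_shovel = int(np.argmax(q))
```

In the paper, this forward pass would be trained with an MSE loss and the Adam optimizer; a deep learning framework would normally supply both, so only the inference shape is sketched here.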

When the experience size (the accumulated rewards) exceeded a particular value (10,000) after more than 2500 epochs of training, a sample was selected and fed in mini-batches to drive the Q-learning update. Finally, the loss function was calculated, and the neural network model was stored in a file.

4 Results

During each training period, the total discharges of the day were recorded to understand the material movement patterns, as depicted in Figure 14. By adjusting the model's parameters, it was observed that the number of discharges increased. However, there were also instances of low discharge values, particularly at higher training times. This behavior can be attributed to the epsilon-greedy strategy, which is commonly employed when training Q-learning models [18, 19]. It balances the exploration-exploitation trade-off through the value of epsilon during both the training and verification phases. A low value of epsilon delays convergence to the optimal solution, while a high value may favor suboptimal solutions even when the optimal solution has been found. Hence, the observed peaks of low discharge values at high training times can be attributed to the interaction between the epsilon-greedy strategy and the learning process. This behavior is expected, aligns with previous findings in the field, and demonstrates the importance of carefully adjusting the parameters to balance exploration and exploitation for optimal performance of the Q-learning model.

Fig. 14 Number of total discharges per day for each training epoch

The hauling events in the mine, spanning a full day (86,400 s), were simulated using the Pygame module. This simulation allowed the mining operations to be observed and analyzed over a continuous period. The speed of the operation was controlled, enabling a detailed examination of the situation at any given moment. This approach facilitated a comprehensive understanding of the dynamics and performance of the mine's hauling activities throughout the day. Figure 15 provides a visual representation of the simulated events, aiding in the visualization and interpretation of the results.

Fig. 15 Discrete event simulation of the mine

Samples were collected at intervals of 10 s to monitor the progression of the mining operations. Notably, the total number of loaded trucks decreased to 55 units during the shift change events, which occurred at approximately 25,000 and 70,000 s, equivalent to 18:56 and 19:26 h, respectively. This decline in the number of loaded trucks can be attributed to the transition between shifts. Similarly, during the lunch break event, it was estimated that around 75 trucks were present between 40,000 and 50,000 s, corresponding to a time range of 11:06–13:53 h. Figure 16 illustrates these fluctuations in the number of trucks during the lunch break period. By analyzing these samples and observing the temporal patterns of truck availability, valuable insights can be gained regarding the impact of shift changes and lunch breaks on the mining operations. Such information can aid in optimizing scheduling and resource allocation strategies to enhance overall efficiency and productivity in the mine.

The mining operations were simulated for an entire day, and the analysis focused on identifying periods when the crushers were without trucks. To determine this, upstream delays were calculated using the simulation results. It was found that during all hours of the day, these delays accounted for less than 10% of the total time, as depicted in Figure 17. This analysis sheds light on the efficiency of the mining operations, highlighting that the crushers were consistently supplied with trucks throughout the day.
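The epsilon-greedy selection and experience-replay sampling discussed earlier in this section can be sketched as follows. The ε values and the 10,000-transition threshold come from the paper; the function names, the toy Q-row, and the batch size of 32 are our own illustrative assumptions.

```python
import random

def epsilon_greedy(q_row, epsilon, n_actions):
    """With probability epsilon pick a random shovel (explore);
    otherwise pick the shovel with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_row[a])

# Experience vector: (state, action, reward, next_state) tuples saved
# during a simulated day; once it exceeds a threshold (10,000 in the
# paper), mini-batches are drawn for the Q-learning update.
experience = []

def store(transition, limit=10_000):
    experience.append(transition)
    return len(experience) > limit  # True -> ready to train

def sample_minibatch(batch_size=32):
    return random.sample(experience, batch_size)

# Toy usage with a fake 15-action Q row: with epsilon = 0 the choice
# is purely greedy, so the best shovel (index 7 here) is selected.
q_row = [0.0] * 15
q_row[7] = 1.0
action = epsilon_greedy(q_row, epsilon=0.0, n_actions=15)  # -> 7
```

This illustrates why a low ε slows exploration (the agent keeps exploiting its current estimates) while a high ε keeps injecting random actions even after a good policy has been found, matching the discharge fluctuations described above.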

The results demonstrate that delays in the upstream processes were minimal, indicating a well-optimized system that ensures a steady flow of material to the crushers. By maintaining delays below 10%, the mining operations can sustain high productivity and minimize any disruptions that could hinder the overall efficiency of the process. This information is crucial for making informed decisions regarding resource allocation, scheduling, and process improvements in order to maximize the mine's profitability.

Fig. 16 Simulation of daily operating trucks

Fig. 17 Percentage of crusher time without truck

The results allowed for estimating the utilization of the crushers per hour. It was observed that during hours without any specific events, the crushers operated at close to 100% capacity. During shift change and lunch break events, it was verified that the crushers operated with over 90% of the trucks, as shown in Figure 18. These findings demonstrate the effectiveness of the mining operations in terms of utilizing the crushers efficiently. The high utilization rates indicate that the crushers are effectively processing the material and contributing to the overall productivity of the mine. Even during shift changes and lunch breaks, when some trucks may be temporarily unavailable, the crushers still maintain a high operating capacity, ensuring minimal downtime and maximizing material processing. The results provide valuable insights into the performance and efficiency of the crusher operations, enabling informed decision-making regarding resource allocation and operational strategies. By maintaining high utilization rates, the mine can optimize its production capabilities and enhance its profitability.

Fig. 18 Percentage of use of crushers

Lastly, tests were conducted to assess the response time of the algorithm. A total of 1000 executions of the model were performed, recording the execution time for each case. The results demonstrated exceptionally fast response times, averaging less than 1 ms: the mean response time was μ = 0.76 ms, with a standard deviation of σ = 2.32 ms. These findings underscore the algorithm's efficiency and suitability for real-time applications. With response times measured in milliseconds, the algorithm can swiftly process and provide optimal solutions to the challenges encountered in mining operations. The consistently low execution times ensure that the algorithm can effectively and rapidly handle the complexities of the mining environment, contributing to enhanced productivity and profitability. The fast response times exhibited by the algorithm are a testament to its effectiveness in real-world mining scenarios, where timely decision-making and optimized resource allocation are paramount.

5 Discussion

Our research, employing a deep Q-learning reinforcement learning model in a simulated environment calibrated with real mining operation data from Peru, stands at the forefront of innovations in fleet management optimization. This approach is particularly relevant when juxtaposed with similar methodologies in recent studies.

In [7], the authors proposed a multi-agent contextual reinforcement learning framework for large-scale fleet management in ride-sharing platforms, using a simulator calibrated with real historical data from Didi Chuxing. Similarly, [8] presented an actor-critic approach for short-term production planning and fleet management in mining complexes, also utilizing a simulator based on historical data. These methodologies, while analogous in their use of sophisticated simulation environments, diverge in their

specific focus and application areas, highlighting the versatility of simulation-based research in operational management.

In terms of data utilization, [7] leveraged historical data from Didi Chuxing's travel orders and vehicle trajectories, and [8] relied on historical equipment performance data from a large-scale mining operation. Our study, however, focused on comprehensive haul cycle information over a 30-day period from a mining operation in Peru. This unique dataset allowed for a more detailed and localized analysis of fleet management challenges, providing insights specific to the operational contexts of Peruvian mines.

The results of these studies underscore the efficacy of simulation and advanced modeling in improving operational outcomes. Lin et al. [7] achieved significant enhancements in revenue and customer satisfaction by coordinating thousands of vehicles. de Carvalho and Dimitrakopoulos [8] reported a 47% increase in cash flow by adapting shovel and truck assignments. In comparison, our research achieved 90% supply coverage to crushers during shift changes and lunch breaks, demonstrating the practical benefits of our Q-learning model in a real-world mining context.

Regarding precision, the simulation used by [7] achieved an r² calibration of 0.9331 between the simulated and real gross merchandise value. de Carvalho and Dimitrakopoulos [8] saw a 5.2% improvement in the forecasts of fed material quality using data assimilation. Our solution, with average response times of less than a millisecond per truck assignment, indicates high precision and suitability for real-time applications. This aspect of our research is particularly noteworthy, as it suggests the potential for immediate practical implementation in the mining industry.

Comparatively, [20] presented a multistage optimization framework for truck assignment in open-pit mines, using goal programming and simulation, while [9] developed a two-stage model for real-time truck dispatch, combining flow network theory and integer programming. While these studies offer valuable insights into the complexity of fleet management, our research's use of deep Q-learning in a simulated environment to optimize crusher supply coverage demonstrates a novel approach that effectively balances theoretical robustness with practical applicability.

Furthermore, our solution's capacity to maintain over 90% supply coverage to crushers during crucial periods, such as shift changes and lunch breaks, represents a significant advancement over [20], who reduced upstream delays in crushers to less than 10% throughout the day. The average response time of less than a millisecond per truck assignment in our study underscores not only the precision of our model but also its potential for real-time application in the mining industry.

In summary, our study contributes to the growing body of knowledge in mining fleet management by introducing a novel, efficient, and highly precise method. By integrating advanced machine learning techniques and leveraging comprehensive data, we have demonstrated a significant improvement in operational efficiency, providing a promising direction for future research and practical applications in the field.

6 Conclusions

This study has yielded critical insights into the optimization challenges within mining operations. We observed notable reductions in operational efficiency during shift changes and lunch breaks, with decreases of approximately 45–60% and 35–45% in the number of operating trucks, respectively. Concurrently, a reduction of one to four units per day was noted in shovel availability. The study also identified additional delays, including planned and operational losses, particularly affecting crushers and equipment.

To address these inefficiencies, we designed a Q-learning deep reinforcement learning model, treating each truck as an independent agent with actions corresponding to the available number of shovels. The model's framework included key operational parameters such as travel times to shovels, truck queues at shovels and crushers, and the distribution of trucks en route to each shovel. A reward system was implemented, prioritizing ore transportation efficiency and minimizing cycle times, thereby fostering lower truck queues and more efficient material handling.

Our simulation environment was meticulously constructed to mirror real-world conditions, incorporating actual time distributions for each transport cycle stage and integrating delays associated with shift changes, lunch breaks, and travel. The neural network model featured 49 input neurons, 4900 neurons in the hidden layer, and 15 output neurons. The deep reinforcement learning algorithm trained the network to estimate the maximum Q-values based on agent observations.

The results showcased the model's effectiveness, achieving over 90% coverage for crusher usage during shift changes and lunch breaks, while maintaining manageable truck queues at strategic locations. Furthermore, the model's average execution time of one millisecond per truck assignment to shovels underscores its suitability for real-time operational applications.

This research marks a significant advancement in mining fleet management through the application of a deep Q-learning reinforcement learning model.
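The sub-millisecond assignment times reported above can be sanity-checked by timing repeated forward passes through a network of the stated size (49-4900-15). The micro-benchmark below is our own illustration, not the authors' measurement setup; absolute numbers depend on hardware, and the random weights stand in for a trained model.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(49, 4900))
W2 = rng.normal(size=(4900, 15))

def assign_shovel(state):
    """One truck-to-shovel decision: forward pass plus argmax."""
    h = np.maximum(0.0, state @ W1)
    return int(np.argmax(h @ W2))

# Time 1000 assignments, mirroring the paper's evaluation protocol
# (1000 executions, mean and standard deviation reported in ms).
times = []
for _ in range(1000):
    state = rng.normal(size=49)
    t0 = time.perf_counter()
    assign_shovel(state)
    times.append(time.perf_counter() - t0)

mean_ms = 1000 * float(np.mean(times))
std_ms = 1000 * float(np.std(times))
print(f"mean = {mean_ms:.3f} ms, std = {std_ms:.3f} ms")
```

On commodity hardware a dense pass of this size completes in well under a millisecond, which is consistent with the μ = 0.76 ms figure reported in Section 4.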

The model's ability to enhance productivity, reduce downtime, and increase profitability in the mining industry is a testament to its potential.

Nevertheless, it is crucial to acknowledge the study's limitations and outline future research directions. The simulated environment, calibrated with real operational data, does not fully capture the complexities of an actual mining operation. This limitation underscores the necessity for additional testing in physical environments to validate the model's real-world applicability.

Moreover, the model was tested in a single mining operation in Peru. To ascertain its broader applicability and generalizability, it is imperative to expand the scope of testing to include diverse mining sites with varying operational characteristics.

The model's current limitation, restricting truck actions to the number of available shovels, presents an opportunity for enhancement. Exploring more dynamic agent actions, such as alternative routing in response to fluctuating delays, could potentially elevate the model's performance.

In this study, the achievement of rapid response times demonstrates the model's potential for enhancing efficiency in mining operations. However, the challenge lies in replicating this efficiency on a larger scale, particularly in real-time implementations. Future research will be directed toward validating and refining this model under more diverse and realistic conditions. Testing the model in actual physical mining environments will provide critical insights into its practical applicability and effectiveness. Simultaneously, extending its application across multiple mining sites, each with its unique operational characteristics, will allow for a comprehensive evaluation of the model's adaptability and generalizability. This expansion is crucial for ascertaining the model's robustness in varying mining contexts.

Furthermore, the exploration of more dynamic actions for truck agents, such as adjusting routes in response to changing operational conditions, will contribute to enhancing the model's responsiveness and overall efficiency. The development of computational optimizations is also vital for the model's efficient deployment in large-scale, real-time applications. This includes incorporating additional uncertainties, like variations in cycle times and equipment availability, to augment the model's realism and practical utility. An expanded focus on coordinating the model across multi-pit mining operations will broaden its scope and enhance its effectiveness in complex mining scenarios. Lastly, integrating the model with established commercial fleet management systems, such as MODULAR MINING, will facilitate further testing and validation, ensuring its readiness for industry-wide adoption and implementation. Through these focused efforts, the study aims to significantly contribute to the advancement of mining operations, driving productivity and operational efficiency.

Data availability The data used to support the findings of this study are available from the corresponding author upon request.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this study. This study did not receive any financial support.

References

1. Moradi Afrapoli A, Tabesh M, Askari-Nasab H (2019) A multiple objective transportation problem approach to dynamic truck dispatching in surface mines. Eur J Oper Res 276:331–342. https://doi.org/10.1016/j.ejor.2019.01.008
2. de Carvalho JP, Dimitrakopoulos R (2021) Integrating production planning with truck-dispatching decisions through reinforcement learning while managing uncertainty. Minerals. https://doi.org/10.3390/min11060587
3. Mohtasham M, Mirzaei-Nasirabad H, Askari-Nasab H, Alizadeh B (2022) Multi-stage optimization framework for the real-time truck decision problem in open-pit mines: a case study on Sungun copper mine. Int J Min Reclam Environ 36:461–491. https://doi.org/10.1080/17480930.2022.2067709
4. Bnouachir H, Chergui M, Machkour N et al (2020) Intelligent fleet management system for open pit mine. Int J Adv Comput Sci Appl (IJACSA). https://doi.org/10.14569/IJACSA.2020.0110543
5. Baek J, Choi Y (2020) Deep neural network for predicting ore production by truck-haulage systems in open-pit mines. Appl Sci 10:1657. https://doi.org/10.3390/app10051657
6. Zhang C, Odonkor P, Zheng S et al (2020) Dynamic dispatching for large-scale heterogeneous fleet via multi-agent deep reinforcement learning. In: 2020 IEEE international conference on big data (big data). IEEE, Atlanta, GA, USA, pp 1436–1441
7. Lin K, Zhao R, Xu Z, Zhou J (2018) Efficient large-scale fleet management via multi-agent deep reinforcement learning. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining. Association for Computing Machinery, New York, NY, USA, pp 1774–1783
8. de Carvalho JP, Dimitrakopoulos R (2023) Integrating short-term stochastic production planning updating with mining fleet management in industrial mining complexes: an actor-critic reinforcement learning approach. Appl Intell. https://doi.org/10.1007/s10489-023-04774-3
9. Ahangaran DK, Yasrebi AB, Wetherelt A, Foster P (2012) Real-time dispatching modelling for trucks with different capacities in open pit mines. Arch Min Sci 57:39–52. https://doi.org/10.2478/v10267-012-0003-8
10. Mu C-Y, Chou T-Y, Hoang TV et al (2021) Development of multilayer-based map matching to enhance performance in large truck fleet dispatching. ISPRS Int J Geo-Inf. https://doi.org/10.3390/ijgi10020079
11. Chaowasakoo P, Seppälä H, Koivo H, Zhou Q (2017) Digitalization of mine operations: scenarios to benefit in real-time truck dispatching. Int J Min Sci Technol 27:229–236. https://doi.org/10.1016/j.ijmst.2017.01.007

12. Silva L, Torquato M, Fernandes M (2018) Parallel implementation of reinforcement learning Q-learning technique for FPGA. IEEE Access. https://doi.org/10.1109/ACCESS.2018.2885950
13. Chen J, Li K, Li K et al (2021) Dynamic bicycle dispatching of dockless public bicycle-sharing systems using multi-objective reinforcement learning. ACM Trans Cyber-Phys Syst. https://doi.org/10.1145/3447623
14. Adi TN, Bae H, Iskandar YA (2021) Interterminal truck routing optimization using cooperative multiagent deep reinforcement learning. Processes. https://doi.org/10.3390/pr9101728
15. Modular Mining Systems, Inc (2016) Sistema Dispatch
16. Führer C, Solem JE, Verdier O (2021) Scientific computing with Python: high-performance scientific computing with NumPy, SciPy, and pandas, 2nd edn. Packt Publishing Ltd, Birmingham Mumbai
17. Wilson R, Mercier PHJ, Navarra A (2022) Integrated artificial neural network and discrete event simulation framework for regional development of refractory gold systems. Mining 2:123–154. https://doi.org/10.3390/mining2010008
18. Bulut V (2022) Optimal path planning method based on epsilon-greedy Q-learning algorithm. J Braz Soc Mech Sci Eng 44:106. https://doi.org/10.1007/s40430-022-03399-w
19. Zai A, Brown B (2020) Deep reinforcement learning in action, 1st edn. Manning, Shelter Island, New York
20. Moradi Afrapoli A, Askari-Nasab H (2019) Mining fleet management systems: a review of models and algorithms. Int J Min Reclam Environ 33:42–60. https://doi.org/10.1080/17480930.2017.1336607
