A Hybrid Genetic Algorithm Based On Reinforcement Learning For The Energy-Aware Production Scheduling in The Photovoltaic Glass Industry

Computers & Operations Research 163 (2024) 106521
Weiwei Cui a,b, Biao Yuan c,d,∗
a School of Management, Shanghai University, Shanghai, China
b Shanghai Key Laboratory of Intelligent Manufacturing and Robotics, Shanghai, China
c Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Shanghai, China
d Data-Driven Management Decision-Making Lab, Shanghai Jiao Tong University, Shanghai, China
ARTICLE INFO

Keywords: Photovoltaic glass; Hybrid flow shop; Genetic algorithm; Reinforcement learning

ABSTRACT

Recently, the growing solar energy capacity has played a significant role in developing a clean energy supply system in China. However, the resulting rapid expansion of photovoltaic component (e.g., glass) manufacturing intensifies the energy demand in the locality of the plant. Therefore, this paper considers the energy-aware production scheduling of a deep-processing line in the photovoltaic glass plant, whose layout is a hybrid flow shop with batch and non-batch machines. Firstly, we establish a mixed integer programming model with the minimization of the energy consumption and the penalty for excess of the due date. Then, we propose a hybrid genetic algorithm (GA) based on reinforcement learning to solve the problem. Specifically, the expected Sarsa is used to extract critical knowledge about algorithmic parameters during the population evolution to guide the exploration of the GA. Finally, we conduct extensive numerical experiments to validate the effectiveness of the proposed algorithm by comparing it with a commercial optimization solver and other metaheuristics. The numerical results show that the average gap between the solver and the proposed algorithm is around 4% in small-sized instances. Compared with the heuristic used in the plant, the improvements of this paper are about 16%∼18% and 17%∼21% in practical-sized instances for the delay penalty and energy consumption objectives, respectively. In addition, the computational results provide managerial insights for managers in further pursuing energy efficiency from higher-level decision-making, e.g., planning over multiple periods from a tactical perspective, and changing production line configurations and introducing new processing techniques from a strategic perspective.
1. Introduction

According to the statistical review of world energy 2022,¹ solar energy capacity continued to proliferate from 72.2 GW in 2011 to 843.1 GW in 2021. It has become one of the most important renewable resources, accounting for more than half of the increase in global power generation over the past two years. As the primary catalyst for the growth of solar capacity, China installed 53 GW and 87 GW of solar capacity in 2021 and 2022, respectively, resulting in a cumulative installed capacity of 393.4 GW. Nowadays, the electricity generated by solar and wind is beyond 1.19 trillion kWh each year, which occupies 13.8% of the total electricity consumption in China (Department of Energy Statistics, 2023). The rapid growth of renewable energy dramatically reduces reliance on fossil fuels and highly supports the Chinese government in achieving the goal of a carbon peak in 2030. Meanwhile, the solar capacity will double in the next few years, according to the 14th Five-Year Plan published by the National Economic and Social Development of China.

To satisfy the increasing solar energy market around the world, more and more manufacturing companies have started to invest in new plants producing photovoltaic (PV) modules; for example, Jinjing Group, which is a leading company in the glass industry in China, just constructed a new line in Ningxia Province in 2022 to produce ultra-clear glass used in the PV panel. On the one hand, this plant contributes to the prosperity of the PV industry by manufacturing high-quality components. On the other hand, the production line consumes significant energy, putting additional pressure on the local energy supplier. Although the production and scheduling problems in traditional glass plants have been well studied by researchers considering the high energy demand in heating quartz sand to produce liquid glass (Fabiano Motta Toledo et al., 2016), the studies focusing on the deep-processing
∗ Corresponding author at: Sino-US Global Logistics Institute, Shanghai Jiao Tong University, Shanghai, China.
E-mail addresses: cuiww67@shu.edu.cn (W. Cui), biaoyuan.ie@sjtu.edu.cn (B. Yuan).
¹ https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html
https://doi.org/10.1016/j.cor.2023.106521
Received 19 May 2023; Received in revised form 26 September 2023; Accepted 19 December 2023
Available online 21 December 2023
0305-0548/© 2023 Elsevier Ltd. All rights reserved.
line used to remold the characteristics of glass sheets are relatively scarce.

Typically, glass sheets must undergo several processing stages, such as edging, coating, and tempering, in a deep-processing line to meet the PV panel requirements, such as shape, light transmission, and strength. Different processes require different machines. Therefore, scheduling the glass sheets in the deep-processing line can be viewed as a Hybrid Flow Shop Scheduling Problem (HFSSP), where a piece of the glass sheet represents a job. While no papers have specifically addressed the scheduling of PV glass in a deep-processing line as a hybrid flow shop, several papers have studied similar problems related to other types of glass. Wang et al. (2020) studied glass ceramics and assumed that the deep-processing line is a two-stage hybrid flow shop, in which several parallel machines exist in the first stage and only one furnace exists in the second stage. Liu et al. (2020) studied tempered glass and assumed that the deep-processing line is a three-stage hybrid flow shop, in which several parallel machines exist in the cutting and printing stages, and only one furnace exists in the tempering stage. Both papers consider the batch nature of the tempering furnace, although the assumptions about the processing time of one batch in the tempering furnace are different. In addition to the makespan, a traditional time-oriented objective, the energy cost under the time-of-use tariff is also considered (Wang et al., 2020; Liu et al., 2020). Both papers validate that inserting appropriate buffer times into the schedule is beneficial for the trade-off between these two objectives, since the buffer time helps to shift the processing of the job to intervals with lower electricity prices.

In this study, inspired by the scheduling challenges of a PV glass plant, we focus on the deep-processing line used to process different types of PV glass, where different machines in the line require energy to perform operations. In the problem, a set of glasses should be completed before the due date; otherwise, a delay penalty is incurred. Therefore, this study aims to minimize the penalty for exceeding the due date and the total energy consumption simultaneously by optimizing the production schedule of these glasses. We devise an efficient algorithm combining the metaheuristic and reinforcement learning to solve the real-sized problems. In addition, we intend to investigate the relationship between the two objectives and find approaches to reduce energy consumption through numerical experiments.

To the best of our knowledge, the proposed problem, considering the glass types and the batch production mode in the PV glass deep-processing line, has not been investigated in the literature so far to minimize the time- and energy-oriented objectives simultaneously. The main contributions of this study are summarized as follows.

(i) From the theoretical aspect, a mixed integer linear programming model is established, which is an extension of the traditional HFSSP model. Meanwhile, to solve the model efficiently, a reinforcement learning method is integrated into the algorithmic framework to improve the performance of the adopted metaheuristic by training and selecting its hyper-parameters. This idea can be applied to other studies that use metaheuristics to solve combinatorial optimization problems. For example, Yılmaz and Durmusoglu (2018) used the genetic algorithm (GA), simulated annealing, and artificial bee colony algorithm to solve the batching problem in a multi-hybrid cell manufacturing system; and Yılmaz and Yazıcı (2022) used the non-dominated sorting genetic algorithm II to solve the multi-objective disassembly line balancing problem.

(ii) From the practical aspect, a detailed numerical experiment is conducted through many generated instances to provide managerial insights. The outcomes of the research can not only help the plant managers to make a better arrangement for the glasses in the scheduling horizon, but can also provide helpful guidance to further pursue energy efficiency from higher-level decision-making, such as planning over multiple periods from a tactical perspective, and changing line configurations and introducing new processing techniques from a strategic perspective. It will reduce the energy demand intensity of glass plants and promote the sustainable development of the PV glass industry.

The rest of this paper is organized as follows. A review of works related to the production scheduling of glass plants is presented in Section 2. The problem description and mathematical model of the problem are given in Section 3. A hybrid GA based on reinforcement learning is designed to solve the model effectively in Section 4. Then, numerical experiments are investigated in Section 5 to validate the effectiveness of our algorithm and show managerial insights. Finally, conclusions and future works are summarized in Section 6.

2. Literature review

We first concentrate on the literature about scheduling problems in the glass industry in Section 2.1. Meanwhile, since the proposed problem is formulated as an HFSSP, we review the articles about HFSSP, especially those considering energy-efficient objectives.

2.1. Scheduling problems in the glass industry

Several papers study glass processing from the system level, focusing on different aspects of glass plants. He et al. (1996) was one of the earliest papers examining a production scheduling problem encountered in a glass manufacturing company, in which a sequential heuristic algorithm was designed to solve a no-delay flow shop model. Richard and Proust (2000) paid attention to the loading of the machine for the furnace considering the glass color, and T’kindt et al. (2001) continued to study this furnace and focused on the sequencing of orders. Almada-Lobo et al. (2008) designed a VNS approach to the long-term capacitated lot-sizing problem in a glass container industry to determine the production planning of multiple periods. Similarly, Hervert-Escobar and Pérez (2017) proposed an optimization model that maximizes the fulfillment of the demand from the planning production formulation based on a case study. Gicquel et al. (2010) focused on the coating stage in the glass plant. The authors proposed a mixed integer programming formulation to determine the optimal configuration of the machines in the coating stage. de Souza Amorim et al. (2021) also considered the configuration problem and adopted hybrid evolutionary algorithms to redesign new furnaces and molding machines for the glass container industry. Lozano and Medaglia (2014) investigated scheduling incompatible jobs on many parallel batch processing machines found in a bottleneck workstation of a safety glass manufacturing facility. Arbib et al. (2022) presented a mixed integer program in the classical vein of robust optimization to study the stock assortment and cutting in the cutting stage of glass production, considering the stochastic realization of defects in glass sheets.

The works mentioned above consider the unique characteristics of real-world production lines in glass plants and emphasize the significance of various important factors. They aim to address specific problems in those glass plants rather than improving algorithms to solve conventional classic problems under benchmark settings more efficiently. Despite the valuable contributions made by the above researchers, there remains a significant gap in effectively addressing production scheduling issues in the glass industry, mainly due to the substantial variations in production line configurations between different plants.

2.2. HFSSP

As a common manufacturing environment, the HFSSP has already been studied by many researchers (Ruiz and Rodríguez, 2010). The peculiarities of different industrial applications are formulated as constraints in the HFSSP. For example, Yılmaz and Yılmaz (2022) considered the limited waiting time between consecutive stages in the tire industry, which can be viewed as a new variant of HFSSP from the mathematical perspective and can be extended to other scenarios
from the realized practice perspective. Here, we mainly summarize the literature simultaneously considering traditional time-oriented and new energy-oriented objectives.

Bruzzone et al. (2012) tried to promote energy saving without changing the jobs’ assignment and sequencing provided by the reference schedule given by the APS software. Dai et al. (2013) tried to balance the makespan and total energy consumption of the system by selecting appropriate machine speeds and adopted an improved genetic-simulated annealing algorithm to solve the model. Yan et al. (2016) first optimized the cutting parameters of each machine based on grey relational analysis at the machine level, then optimized the scheduling at the flow shop floor level based on GA. Tang et al. (2016) used a predictive–reactive way to reduce energy consumption and makespan for a flexible flow shop with dynamic interruptions. Li et al. (2018) proposed a multi-objective optimization algorithm combining several deep-exploitation and deep-exploration strategies to minimize energy consumption considering the setup time between consecutive jobs. Lu et al. (2018) proposed a hybrid three-objective grey wolf optimizer and GA with a local search heuristic to optimize noise pollution alongside energy consumption and productivity issues. Meng et al. (2018) proposed five mixed integer linear programming models and an improved GA to solve the hybrid flow shop scheduling problem with unrelated parallel machines when turning off idle machines is allowed to save energy. Chen et al. (2018) considered the impact of lot streaming and adopted the GA to obtain approximate Pareto solutions of makespan and power consumption for a hybrid flow shop. Zhang et al. (2020) designed a three-stage multi-objective approach based on decomposition to investigate an energy-efficient HFSSP considering sequence-dependent setups and transportation operations. Ding et al. (2021) proposed a hybrid particle swarm optimization algorithm with integrated tabu search for a hybrid flow shop with variable speeds and Time-of-Use tariff. Lian et al. (2021) studied a steelmaking plant scheduling problem considering energy saving through a two-level strategy to enhance production efficiency and reduce energy costs simultaneously. Shao et al. (2021) designed an ant colony optimization behavior-based multi-objective evolutionary algorithm based on decomposition for a distributed heterogeneous HFSSP, considering different processing capabilities. Wu and Cao (2022) formulated the cold-drawn seamless steel pipes as an HFSSP with batch processing machines and proposed a multi-objective evolutionary algorithm embedded with a greedy selection strategy to solve the problem. We can find that metaheuristics are adopted in almost all papers to solve problems related to HFSSP since their basic version is NP-hard.

3. Problem description and mathematical model

The section first introduces the manufacturing process of glasses, describes the mathematical model for the problem, and uses an example to illustrate the problem. Table 1 lists the sets, parameters, and decision variables used in this section.

Table 1
Notation used in problem description.
Notation    Description
$g$    Index of producing stages, $g \in \{1, 2, 3\}$
$j$    Index of machines in one stage
$M^g$    Set of machines in Stage $g \in \{1, 2, 3\}$
$i$    Index of glasses
$I$    Set of glasses
$k$    Index of glass types
$I^k$    Set of glasses belonging to type $k$
$B_j$    Set of batches in Machine $j$ of $M^3$
$K$    Number of glass types
$T$    Due date of glasses
$p^g_{ij}$    Processing time of Glass $i$ in Machine $j$ of $M^g$, $g \in \{1, 2\}$
$p^3_i$    Processing time of Glass $i$ in any tempering furnace
$q_i$    Size of Glass $i$
$Q$    Capacity of one tempering furnace in Stage 3
$\mu^g_j$    Power demand of Machine $j$ of $M^g$ under working state, $g \in \{1, 2, 3\}$
$\vartheta^g_j$    Power demand of Machine $j$ of $M^g$ under idle state, $g \in \{1, 2, 3\}$
$\delta$    Penalty cost per unit of time delay
$\Theta$    Big constant adopted to linearize the model
$x^1_{ij}$    1 if Glass $i \in I$ is assigned to Machine $j$ of $M^1$; 0 otherwise
$x^2_{ij}$    1 if Glass $i \in I$ is assigned to Machine $j$ of $M^2$; 0 otherwise
$x^3_{ijb}$    1 if Glass $i \in I$ is assigned to the $b$th batch of Machine $j$ of $M^3$; 0 otherwise
$y^g_{ii'}$    1 if Glass $i \in I$ precedes Glass $i' \in I$ in Stage $g \in \{1, 2\}$; 0 otherwise
$s^g_i$    Start time of Glass $i \in I$ in Stage $g \in \{1, 2\}$
$c^g_i$    Finish time of Glass $i \in I$ in Stage $g \in \{1, 2\}$
$s^3_{jb}$    Start time of the $b$th batch in Machine $j$ of $M^3$
$c^3_{jb}$    Finish time of the $b$th batch in Machine $j$ of $M^3$
$\tau^3_{jb}$    1 if the $b$th batch in Machine $j$ of $M^3$ is not an empty batch; 0 otherwise
$C_{max}$    Completion time of the order
$st^g_j$    Time turning on Machine $j$ of $M^g$, $g \in \{1, 2, 3\}$
$ft^g_j$    Time turning off Machine $j$ of $M^g$, $g \in \{1, 2, 3\}$
$T^g_j$    Working time length of Machine $j$ of $M^g$, $g \in \{1, 2, 3\}$
$e^g_j$    Energy consumption of Machine $j$ of $M^g$, $g \in \{1, 2, 3\}$

3.1. Problem description

Photovoltaic glass is an essential component of solar panels. It is a type of ultra-clear glass with high light transmittance and strong pressure resistance to increase the light absorption ability and protect the crystalline silicon solar cell. Usually, the thickness of PV glass is 3.2∼4.0 mm, the transmissivity is higher than 93%, and the strength is 3∼5 times larger than the ordinary window glass in the market. Like ordinary glass, PV glass is also made from natural materials, such as sand, soda ash, and limestone, in a furnace with very high temperatures. After cooling, the liquid glass becomes a solid sheet. Then, the raw sheet glass is cut into several pieces with different sizes according to the requirements of the cutting machine. Unlike ordinary glass, PV glass must be processed further in a deep-processing line.

Considering the economic effect of the giant furnace and the quick response to the customer order, the raw sheet glasses are made in advance and then placed in the buffer zone in the manufacturing plant. Once detailed information about the customer order is obtained, the raw sheet glasses are picked from the buffer zone and processed in the deep-processing line. Generally, the raw sheet glass must experience edging, coating, and tempering in sequence before it is packed and delivered to the customer. Therefore, the deep-processing line usually contains three sequential processing stages, i.e., edging, coating, and tempering. To make the processing line more flexible, there are some buffer zones between stages for storing the semi-finished products and more than one machine to process the glasses in each stage. A classic layout of the deep-processing line is a straight line, as shown in Fig. 1.

In the first stage, the edges of each glass are smoothed and beveled in an edging machine. For any glass, choosing one in the set of heterogeneous parallel machines $M^1$ is feasible to finish this process. However, the processing times of one glass in different machines are different, and the power demands of different machines are also different. Some machines can complete the process faster with higher power demand. Each glass should be processed on precisely one machine in $M^1$, and each machine can only process one glass at a time. After the edging process, glasses are transferred to the coating stage or temporarily stored in the buffer zone.

Each glass is moved on a roller system in the second stage and coated in a coating machine. Anti-reflective coatings can reduce the theoretical reflection of soda-lime-silica float glass and increase light transmission. For any glass, choosing one in the set of heterogeneous parallel machines $M^2$, similar to $M^1$, is feasible to finish this process.
Fig. 1. Layout of the deep-processing line in a glass plant.
Although the coating machine is very long and there is more than one step processed by the machine, e.g., cleaning, drying, coating film, and heating, we unify them as a single stage. After the coating process, glasses are transferred to the tempering stage directly or stored in the buffer zone temporarily.

In the third stage, each glass is heated to a certain high temperature in the tempering furnace to enhance its strength using the reaction at the molecular level. Unlike the above two stages, batch processing is adopted in the third stage. Several glasses can be formed into a batch and moved into the tempering furnace simultaneously. The glasses in one batch are also moved out of this furnace together. This batch’s processing time equals the largest glass processing time in the batch. However, glasses of different types are incompatible and cannot be bundled into one batch. The batch capacity is defined by the physical space available in each tempering furnace for simultaneously processing multiple sheets of the same type. After the tempering process, glasses will be packed by the workers and delivered to the customers, which is beyond the scope of this paper.

3.2. Mathematical model

In this problem, all raw sheet glasses are ready before the deep-processing line at the horizon’s beginning. One glass can be viewed as a job that needs to be moved from Stage 1 to Stage 3. Thus, this problem can be modeled as a three-stage HFSSP with the batch feature in Stage 3. The machines in each of the first two stages are non-identical, whereas the tempering furnaces in Stage 3 are identical. Each machine’s power demand is higher during working time than during idle time. No power is required when a machine is off. Each machine is turned on before its first task and off after its last task.

At the beginning of the production horizon, the decision maker gets the information that a set of glasses $I$ needs to be finished during this horizon. The glasses belong to different types indexed by $k$; thus, $I = \bigcup_{k=1}^{K} I^k$. The parameters, such as the processing times of glasses at different stages, are deterministic and known in advance. We assume that unexpected machine failures are ignored since machines are usually maintained periodically to keep a higher reliability level in the glass plant. To guide the operations in the plant, the plant manager needs to decide at the beginning of the scheduling horizon about the jobs’ allocation on parallel machines, jobs’ sequences in all machines, and jobs’ start times. A good production schedule needs to satisfy all the practical conditions and minimize the operational cost of the system at the same time. The constraints and objectives are formulated as follows.

3.2.1. Constraints

\[ \sum_{j \in M^g} x^g_{ij} = 1, \quad \forall i \in I,\ g \in \{1, 2\} \tag{1} \]
\[ \sum_{j \in M^3} \sum_{b \in B_j} x^3_{ijb} = 1, \quad \forall i \in I \tag{2} \]
\[ x^3_{ijb} + x^3_{i'jb} \le 1, \quad \forall j \in M^3,\ b \in B_j;\ \forall i \in I^k,\ i' \in I^{k'},\ k < k' \tag{3} \]
\[ \sum_{i \in I} x^3_{ijb} \cdot q_i \le Q, \quad \forall j \in M^3,\ b \in B_j \tag{4} \]
\[ c^g_i = s^g_i + \sum_{j \in M^g} x^g_{ij} \cdot p^g_{ij}, \quad \forall i \in I,\ g \in \{1, 2\} \tag{5} \]
\[ s^2_i \ge c^1_i, \quad \forall i \in I \tag{6} \]
\[ s^g_{i'} + \Theta \cdot \left(3 - x^g_{ij} - x^g_{i'j} - y^g_{ii'}\right) \ge c^g_i, \quad \forall j \in M^g;\ \forall i \in I,\ i' \in I,\ i < i';\ g \in \{1, 2\} \tag{7} \]
\[ s^g_i + \Theta \cdot \left(2 - x^g_{ij} - x^g_{i'j} + y^g_{ii'}\right) \ge c^g_{i'}, \quad \forall j \in M^g;\ \forall i \in I,\ i' \in I,\ i < i';\ g \in \{1, 2\} \tag{8} \]
\[ s^3_{jb} \ge c^2_i + \Theta \cdot \left(x^3_{ijb} - 1\right), \quad \forall i \in I;\ \forall j \in M^3,\ b \in B_j \tag{9} \]
\[ c^3_{jb} \ge s^3_{jb} + x^3_{ijb} \cdot p^3_i, \quad \forall i \in I;\ \forall j \in M^3,\ b \in B_j \tag{10} \]
\[ s^3_{jb} \ge c^3_{j(b-1)}, \quad \forall j \in M^3,\ b \in B_j \tag{11} \]
\[ C_{max} \ge c^3_{jb}, \quad \forall j \in M^3,\ b \in B_j \tag{12} \]
\[ \tau^3_{jb} \ge \tau^3_{j(b+1)}, \quad \forall j \in M^3,\ b \in B_j \tag{13} \]
\[ \tau^3_{jb} \le \sum_{i \in I} x^3_{ijb}, \quad \forall j \in M^3,\ b \in B_j \tag{14} \]
\[ \Theta \cdot \tau^3_{jb} \ge \sum_{i \in I} x^3_{ijb}, \quad \forall j \in M^3,\ b \in B_j \tag{15} \]

Constraint (1) ensures that each job has to be assigned to only one machine in each stage. Constraint (2) enforces that each job has to be assigned to only one batch of a tempering furnace in Stage 3. Constraint (3) implies that jobs of different types cannot be bundled into one batch. Constraint (4) ensures that the total size of one batch cannot exceed the capacity of the furnace. Constraint (5) establishes the link between one job’s start and finish times in Stages 1 and 2. Constraint (6) ensures that the job can only be started in Stage 2 after it is finished in Stage 1. Constraints (7) and (8) guarantee that no two jobs can be processed simultaneously on a machine in Stage 1 and Stage 2. Constraint (9) means that a batch can only be started in the furnace after all jobs in that batch are finished in Stage 2. Constraint (10) means that the processing time of a batch equals the longest processing time of the job in that batch in the tempering furnace. Constraint (11) guarantees that no two batches can be processed simultaneously in the tempering furnace. Constraint (12) defines the order’s completion time, i.e., when all glasses leave the deep-processing line. The valid inequalities (13) to (15) are adopted to reduce the feasible solution space by deleting the situation that Batch $b + 1$ is formed without forming Batch $b$.
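The batching rules of Stage 3 can be illustrated with a small feasibility check. The following is a minimal Python sketch, not part of the original model code: the function name `check_batches` and the dictionary layout are assumptions made for illustration, and the batching tested at the end is hypothetical. The glass types, sizes, and capacity are those of the numerical example in Section 3.3.

```python
from collections import Counter


def check_batches(batches, glass_type, glass_size, capacity):
    """Return a list of violations of constraints (2)-(4) for a Stage-3 batching.

    batches maps (furnace, batch position) to the list of glass ids in that batch.
    """
    violations = []
    # Constraint (2): every glass appears in exactly one batch.
    counts = Counter(i for glasses in batches.values() for i in glasses)
    for i in glass_type:
        if counts[i] != 1:
            violations.append(f"glass {i} is assigned to {counts[i]} batches")
    for (j, b), glasses in batches.items():
        # Constraint (3): glasses of different types cannot share a batch.
        if len({glass_type[i] for i in glasses}) > 1:
            violations.append(f"batch {b} of furnace {j} mixes glass types")
        # Constraint (4): total size must not exceed the furnace capacity Q.
        if sum(glass_size[i] for i in glasses) > capacity:
            violations.append(f"batch {b} of furnace {j} exceeds capacity")
    return violations


# Data from the numerical example in Section 3.3: ten glasses, two types, Q = 12.
glass_type = {i: 1 if i <= 5 else 2 for i in range(1, 11)}
glass_size = dict(zip(range(1, 11), [4, 3, 4, 9, 2, 5, 2, 5, 4, 7]))

# Hypothetical batching, chosen for illustration (not read from Fig. 2).
batches = {(1, 1): [1, 2, 3], (1, 2): [6, 7, 8], (2, 1): [4, 5], (2, 2): [9, 10]}
print(check_batches(batches, glass_type, glass_size, capacity=12))
```

An empty list means the batching satisfies the three constraints; moving a type-2 glass into a type-1 batch would be reported as a type conflict.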
3.2.2. Objectives

If the order cannot be finished before the due date, a penalty cost will be calculated as Eq. (16), which means that no penalty is caused if jobs can be finished earlier.

\[ obj_1 = \delta \cdot \left(C_{max} - T\right)^{+} \tag{16} \]

In addition, the total power consumption during the horizon must be minimized to improve energy efficiency. The total power consumption can be calculated as Eq. (17). Eq. (18) defines the power consumption of Machine $j$ of $M^g$, including the consumption caused by working and idle time. Eqs. (19) and (20) calculate the working time length for each machine. Eqs. (21)–(24) define when each machine is turned on/off in this scheduling horizon.

\[ obj_2 = \sum_{g=1}^{3} \sum_{j \in M^g} e^g_j \tag{17} \]
\[ e^g_j = \left(ft^g_j - st^g_j\right) \vartheta^g_j + T^g_j \left(\mu^g_j - \vartheta^g_j\right), \quad \forall j \in M^g;\ g \in \{1, 2, 3\} \tag{18} \]
\[ T^g_j = \sum_{i \in I} x^g_{ij} \cdot p^g_{ij}, \quad \forall j \in M^g;\ g \in \{1, 2\} \tag{19} \]
\[ T^3_j = \sum_{b \in B_j} \left(c^3_{jb} - s^3_{jb}\right), \quad \forall j \in M^3 \tag{20} \]
\[ ft^g_j \ge c^g_i + \Theta \left(x^g_{ij} - 1\right), \quad \forall i \in I;\ \forall j \in M^g;\ g \in \{1, 2\} \tag{21} \]
\[ st^g_j \le s^g_i + \Theta \left(1 - x^g_{ij}\right), \quad \forall i \in I;\ \forall j \in M^g;\ g \in \{1, 2\} \tag{22} \]
\[ ft^3_j \ge c^3_{jb}, \quad \forall j \in M^3,\ b \in B_j \tag{23} \]
\[ st^3_j \le s^3_{jb}, \quad \forall j \in M^3,\ b \in B_j \tag{24} \]

As mentioned above, considering energy naturally results in a trade-off between two objectives. For a bi-objective problem, there are three approaches to solving the model: the $\epsilon$-constraint approach, the non-dominated Pareto solutions approach, and the approach of unifying different objectives. To simplify the analysis framework, we adopt the third approach and use the weighted sum of two objectives as the final objective (25), where $\rho_1 + \rho_2 = 1$.

\[ obj = \rho_1 \cdot obj_1 + \rho_2 \cdot obj_2 \tag{25} \]

Therefore, a mathematical model can be established by minimizing Objective (25) and subjecting it to Formulas (1) to (24). Besides, all the decision variables in the model are non-negative.

3.3. A numerical example

We use a numerical example in Fig. 2 to present the above model clearly. The deep-processing line consists of three edging machines in $M^1$, four coating machines in $M^2$, and two tempering machines in $M^3$. The size of the tempering machine $Q$ equals 12. There are ten glasses (jobs) that belong to two types, i.e., $K = 2$, $I^1 = \{1, 2, 3, 4, 5\}$, and $I^2 = \{6, 7, 8, 9, 10\}$. The sizes of ten glasses are {4, 3, 4, 9, 2, 5, 2, 5, 4, 7}. Due date $T$ equals 100. Processing times are $p^1_{11} = 10$, $p^2_{11} = 14$, $p^2_{51} = 20$, $p^3_4 = 30$, $p^3_5 = 25$, $p^3_9 = 36$, $p^3_{10} = 30$. The first batch in Machine 2 of $M^3$ consists of two glasses belonging to the same type; thus, it can only be started after these glasses are finished in the coating stage. The processing time of this batch equals 30, which is the longest processing time of these glasses in the tempering machine. Accordingly, the processing time of the second batch in Machine 2 of $M^3$ equals 36. The total size of glasses in any batch is not larger than 12. Finally, all glasses can be finished at time 110, which is later than the due date and thus leads to the penalty cost of $10\delta$. Each machine in all stages is turned on before its first processed job and turned off immediately after its last processed job. Accordingly, the working and idle time lengths can be calculated for each machine, which can be used to calculate the total energy consumption. Note that some parameter values are not given in this example to simplify the description.

4. Solution approach

Although the model established in Section 3 can be solved by an optimization solver, such as CPLEX, GUROBI, and COPT, it is impossible to get the optimal solution of a large-sized problem in the real-world industry within an acceptable computational time. Therefore, a metaheuristic based on the GA is designed to solve the problem since it performs very well in the combinatorial optimization field. The GA is inspired by the natural selection process and is widely used by researchers to solve machine scheduling problems. However, the effectiveness of the traditional GA highly depends on the genetic operators designed and key parameters such as crossover rate and mutation rate. Many data, including good solutions and bad solutions, are generated randomly by different operators during the evolution of the population; however, the algorithm only gives a higher survival probability to those better solutions and does not gain any knowledge from these data to adaptively control the operators.

Expected Sarsa is a typical Q-value-based reinforcement learning algorithm. It is an off-policy temporal-difference control algorithm that approximates the action-value function to search for the optimal policy, in which the Q value is updated as follows:

\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ \gamma_t + \beta \left( \sum_{a} \pi(a \mid s_{t+1}) Q(s_{t+1}, a) \right) - Q(s_t, a_t) \right], \tag{26} \]

where $s_t$ is the state value of the environment at time step $t$, indicating the characteristics of the environment sensed by the interactive goal-seeking agent; $a$ is the taken action; $\pi(a \mid s_{t+1})$ is the probability of Action $a$ taken by the agent under State $s_{t+1}$ using a predefined policy; $\gamma_t$ is the reward value obtained by the agent with Action $a_t$ under State $s_t$; $\alpha$ is the learning rate determining the step size of the learning process at each iteration; and $\beta$ is the discount rate adjusting the importance of rewards over time. In this paper, the $\varepsilon$-greedy policy is adopted to choose an action, i.e., the best action is selected greedily with probability $1 - \varepsilon$ and a random action is selected with a small probability $\varepsilon$.

When the evolved populations in the GA are treated as a dynamic environment, it is possible to train an agent using reinforcement learning to determine the GA operators adaptively. It is more flexible than the traditional GA, in which the operators are tuned in advance and fixed in all instances. Therefore, we design a novel Expected Sarsa-based Genetic Algorithm (ESGA) to solve our model, in which Expected Sarsa is used to extract critical knowledge about algorithmic parameters during the population evolution to guide the exploration of the GA. Although training the Q-value matrix (Q-table) requires additional computational resources, it is done offline. Once the Q-value matrix is well trained, the ESGA can directly use the values stored in the Q-table when selecting actions under different states. Therefore, the computational complexity of the ESGA is not higher than that of the traditional GA.

The overall flowchart of the ESGA is shown in Fig. 3 to illustrate the algorithmic framework. The main components are described in detail as follows.

4.1. Representation of solution

The encoding method is the first consideration when using the GA to solve problems. For the traditional HFSSP, the versions with a complex encoding scheme yield poor results, whereas the versions separating sequencing and assignment perform well (Ruiz and Maroto, 2006). Therefore, the chromosome is defined as a priority list $L$, indicating the priority relationship among all jobs. For example, $L = \{3, 2, 1, 5, 4\}$ represents that Job 3 is assigned to the machine first for a case with five jobs. The following parameters are defined to describe the decoding procedure: $L_i$ indicates the $i$th job in this list, and $LT^g_j$ indicates the allowable earliest start processing time of Machine $j$ of $M^g$. The details are as follows.
5
Fig. 2. Gantt chart of a solution with 10 glasses.
Fig. 3. Overall flowchart of ESGA.
6
Algorithm 1: Decoding Procedure
// arrange jobs in Stages 1 and 2
1: for $g := 1$ to 2 do
2:   $LT^g_j \leftarrow 0$, $T^g_j \leftarrow 0$, $\forall j$
3:   for $i := 1$ to $n$ do
4:     if $g = 1$
5:       choose Machine $j^*$ in stage $g$ to process $L_i$ by selecting the smallest value of $LT^g_j + p^g_{L_i j}$, and $s^g_{L_i} \leftarrow LT^g_{j^*}$, $c^g_{L_i} \leftarrow s^g_{L_i} + p^g_{L_i j^*}$
6:     else
7:       choose Machine $j^*$ in stage $g$ to process $L_i$ by selecting the smallest value of $\max(LT^g_j, c^{g-1}_{L_i}) + p^g_{L_i j}$, and $s^g_{L_i} \leftarrow \max(LT^g_{j^*}, c^{g-1}_{L_i})$, $c^g_{L_i} \leftarrow s^g_{L_i} + p^g_{L_i j^*}$
8:     end if
9:     for the chosen Machine $j^*$ of $M^g$, $LT^g_{j^*} \leftarrow c^g_{L_i}$, $T^g_{j^*} \leftarrow T^g_{j^*} + p^g_{L_i j^*}$
10:  end for
11:  update the list $L$ according to the finish time of jobs in stage $g$
12:  set $st^g_j$ to be the start time of the first processed job in Machine $j$ of $M^g$, $\forall j$
13:  set $ft^g_j$ to be the finish time of the last processed job in Machine $j$ of $M^g$, $\forall j$
14: end for
// generate a set of batches B for stage 3
15: set $B \leftarrow \emptyset$, and $b \leftarrow 0$
16: while $L$ is not empty do
17:   $b \leftarrow b + 1$
18:   generate a new batch $B^b$, set the type of $B^b$ to be the type of $L_1$, and set the capacity $Q^b \leftarrow Q$
19:   for $i := 1$ to $L$.count do
20:     if (Type of $L_i$ = Type of $B^b$) and ($q_{L_i} \le Q^b$)
21:       add $L_i$ into $B^b$, $Q^b \leftarrow Q^b - q_{L_i}$
22:     end if
23:   end for
24:   delete all jobs in $B^b$ from $L$
25:   set the release time of $B^b$ to be the largest $c^2_{L_i}$ among all jobs in $B^b$
26:   set the processing time $p(B^b)$ of $B^b$ to be the largest $p^3_{L_i}$ among all jobs in $B^b$
27:   add $B^b$ into $B$
28: end while
// arrange batches in Stage 3
29: sequence the batches according to the release time of batches, i.e., $r(B^b) \le r(B^{b+1})$, $\forall b$
30: for $b := 1$ to $|B|$ do
31:   choose Machine $j^*$ in stage 3 to process $B^b$ by selecting the earliest available machine
32:   set the start time of $B^b$ to be the maximum of its release time and the machine's available time
33:   update the machine's available time for Machine $j^*$ of $M^3$, add $B^b$ into $B_{j^*}$
34: end for
35: set $st^3_j$ to be the start time of the first processed batch in Machine $j$ of $M^3$, $\forall j$
36: set $ft^3_j$ to be the finish time of the last processed batch in Machine $j$ of $M^3$, $\forall j$
37: for $j := 1$ to $|M^3|$ do
38:   for $b := |B_j| - 1$ to 1 do
39:     $c^3_{jb} \leftarrow s^3_{j(b+1)}$, $s^3_{jb} \leftarrow c^3_{jb} - p(B^b)$
40:   end for
41: end for
42: update $st^3_j$ according to the start time of the first batch in Machine $j$ of $M^3$, $\forall j$

To illustrate the chromosome encoding and decoding schemes, we present an example in Fig. 4. We consider ten glasses in the example. According to the priority list represented by the chromosome, the glasses are first arranged in Stage 1. Then, we get the priority list {1, 3, 2, 4, 5, 7, 6, 8, 9, 10} for Stage 2. After arranging the glasses in Stage 2, we get the priority list {1, 4, 3, 2, 7, 5, 6, 8, 9, 10} for Stage 3, based on which we construct four batches. Finally, the four batches are arranged in Stage 3.

4.2. Genetic operators

4.2.1. Selection

The tournament selection operator is used in the algorithm, which can dramatically reduce the selection pressure compared to roulette wheel selection. Two individuals are randomly selected, and the one with the better performance is chosen as a parent. Then, the other parent is selected in the same way.

When selecting a parent, two kinds of criteria are considered. One is the objective value, which is commonly used to construct the fitness function. The other is the diversity contribution of each individual, since maintaining the diversity of the population as long as possible is crucial to the good performance of metaheuristics (Reeves, 2010). To reduce the computational time, the diversity of individual $P$ is defined as the distance between $P$ and the individual with the smallest objective value, $P_{best}$. A distance measure $\delta(P, P_{best})$ based on the differences between the jobs' priority relationships in $P$ and in $P_{best}$ is proposed. For Individual $P$, if Job $i$ is a predecessor of Job $j$ in the priority list, then $\epsilon_{ij} = 1$; otherwise, $\epsilon_{ij} = 0$. The number of job pairs is $n(n-1)/2$. Then, the distance is calculated as

$\delta(P, P_{best}) = \frac{2}{n(n-1)} \sum_{i \in I} \sum_{j > i} h(\epsilon_{ij}(P) \neq \epsilon_{ij}(P_{best}))$,

where $h(x)$ is an indicator function that returns one if the condition $x$ is true and zero otherwise.

4.2.2. Crossover and mutation

Considering the feasibility of the chromosome and the crossover pressure, the uniform crossover (UX) operator is adopted in the algorithm. A UX operation with $n$ genes is obtained by randomly generating a pattern of 0s and 1s, for example the array {1, 0, 1, 0, 0} when $n = 5$. The components corresponding to the 1s are copied from one parent, and the others, corresponding to the 0s, are taken from the other parent in the order in which they appear there.

The reverse mutation operator is adopted considering the feasibility of the chromosome and the ability to escape from sub-optimal regions. For the mutated individual, two genes are selected randomly, and the sequence between these two genes is reversed.

4.2.3. Population management

When updating the old population with offspring, the best individual with the smallest objective value in the old population is stored in the new population to improve the stability of the algorithm. If several individuals share the same smallest objective value, the one with the best diversity is chosen. If these individuals also have the same diversity, one of them is chosen randomly. The other individuals of the new population are directly adopted from the offspring. The algorithm stops when the population iteration reaches the maximum number $Iter_{max}$.

4.3. Parameter control scheme based on Expected Sarsa

This section introduces the main components of Expected Sarsa.

4.3.1. States

The states of reinforcement learning should describe the primary characteristics of the population and track the performance of the population. Let $Ite_t$ be the index of the $t$th population. Let $Div_t$ be the diversity of the $t$th population, $Div_t = \sum_{P \in Pop_t, P \neq P_{best}} \delta(P, P_{best}) / (|Pop_t| - 1)$. Then, the state of the population is set as $(Ite_t, Div_t)$. To use the Q-table method, it is necessary to partition the state space. In this study, the maximum iteration of the GA is set to 3000, and the discretization of $Ite_t$ is set to {100, 300, 600, 1000, 1500, 2000, 2400, 2700, 3000}. Since $Div_t$ is a value between 0 and 1, its discretization is set to {0.01, 0.03, 0.05, 0.07, 0.1, 0.2, 0.3, 0.4, 0.5, 1}. Therefore, the size of the state space is 90. A finer partition may yield a better parameter control scheme, but it also enlarges the state space and increases the computational burden.
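The distance measure of Section 4.2.1 and the state definition of Section 4.3.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' C# implementation. The function names are hypothetical, and reading each listed threshold as the upper bound of a bin is an assumption, since the text gives only the threshold values.

```python
from bisect import bisect_left
from itertools import combinations

# Discretization thresholds quoted in Section 4.3.1 (assumed to be bin upper bounds).
ITE_BOUNDS = [100, 300, 600, 1000, 1500, 2000, 2400, 2700, 3000]
DIV_BOUNDS = [0.01, 0.03, 0.05, 0.07, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0]


def distance(p, p_best):
    """Fraction of job pairs whose relative order differs between two priority lists."""
    n = len(p)
    rank_best = {job: k for k, job in enumerate(p_best)}
    # combinations(p, 2) yields each pair (i, j) with i ahead of j in p;
    # the pair counts as different when j is ahead of i in p_best.
    differing = sum(rank_best[i] > rank_best[j] for i, j in combinations(p, 2))
    return 2 * differing / (n * (n - 1))


def diversity(population, p_best):
    """Div_t: sum of distances to the best individual divided by |Pop_t| - 1."""
    # The best individual itself contributes a distance of zero, so summing
    # over the whole population equals summing over P != P_best.
    return sum(distance(p, p_best) for p in population) / (len(population) - 1)


def state_index(iteration, div):
    """Map (Ite_t, Div_t) to one of the 9 * 10 = 90 discrete states."""
    ite_bin = min(bisect_left(ITE_BOUNDS, iteration), len(ITE_BOUNDS) - 1)
    div_bin = min(bisect_left(DIV_BOUNDS, div), len(DIV_BOUNDS) - 1)
    return ite_bin * len(DIV_BOUNDS) + div_bin
```

With this layout, a Q-table is a 90-row matrix indexed by `state_index`, one row per (iteration bin, diversity bin) pair.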
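The genetic operators of Section 4.2 can be illustrated with a short Python sketch. It is one plausible reading of the textual description rather than the authors' code: the function names are hypothetical, the tournament compares objective values only, and the reversal is assumed to include both selected genes.

```python
import random


def binary_tournament(population, objective, rng=random):
    """Pick two individuals at random and keep the one with the smaller objective."""
    a, b = rng.sample(population, 2)
    return a if objective(a) <= objective(b) else b


def uniform_crossover(parent_a, parent_b, mask):
    """Copy genes at mask positions equal to 1 from parent_a; fill the remaining
    positions with the missing jobs in the order they appear in parent_b."""
    child = [gene if bit else None for gene, bit in zip(parent_a, mask)]
    kept = {gene for gene in child if gene is not None}
    fill = iter(gene for gene in parent_b if gene not in kept)
    return [gene if gene is not None else next(fill) for gene in child]


def reverse_mutation(chromosome, i, j):
    """Reverse the segment between positions i and j (both inclusive, by assumption)."""
    lo, hi = sorted((i, j))
    return chromosome[:lo] + chromosome[lo:hi + 1][::-1] + chromosome[hi + 1:]


child = uniform_crossover([3, 2, 1, 5, 4], [1, 2, 3, 4, 5], [1, 0, 1, 0, 0])
print(child)                                      # [3, 2, 1, 4, 5]
print(reverse_mutation([3, 2, 1, 5, 4], 1, 3))    # [3, 5, 1, 2, 4]
```

Because the 0-positions are filled only with jobs not yet present, the child is always a valid permutation, which is the feasibility property the text attributes to the UX operator.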
Fig. 4. Encoding and decoding schemes: (a) encoding scheme, (b) arranging glasses in Stages 1 and 2, (c) constructing the batches, and (d) arranging batches in Stage 3.
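The batch-construction step of Algorithm 1 (lines 15 to 28), shown in panel (c) of Fig. 4, can be sketched in Python as follows. The function name is hypothetical. The job sizes and types are those of the numerical example in Section 3.3, and the priority list is the Stage 3 list quoted for Fig. 4. Treating both as the same instance is an assumption, although the result matches the four batches mentioned in the text.

```python
def build_batches(priority, job_type, size, capacity):
    """Greedy batching: each new batch takes the type of the first remaining job,
    then scans the list and adds every same-type job that still fits."""
    remaining = list(priority)
    batches = []
    while remaining:
        first = remaining[0]
        if size[first] > capacity:
            raise ValueError(f"job {first} does not fit into an empty batch")
        batch_type, free, members = job_type[first], capacity, []
        for job in remaining:
            if job_type[job] == batch_type and size[job] <= free:
                members.append(job)
                free -= size[job]
        remaining = [job for job in remaining if job not in members]
        batches.append(members)
    return batches


sizes = dict(zip(range(1, 11), [4, 3, 4, 9, 2, 5, 2, 5, 4, 7]))
types = {job: 1 if job <= 5 else 2 for job in range(1, 11)}
stage3_priority = [1, 4, 3, 2, 7, 5, 6, 8, 9, 10]

batches = build_batches(stage3_priority, types, sizes, capacity=12)
print(batches)  # [[1, 3, 2], [4, 5], [7, 6, 8], [9, 10]]

# Batch processing time = longest tempering time among its jobs (Algorithm 1, line 26).
p3 = {4: 30, 5: 25, 9: 36, 10: 30}
print(max(p3[j] for j in batches[1]), max(p3[j] for j in batches[3]))  # 30 36
```

The batches {4, 5} and {9, 10} reproduce the processing times 30 and 36 reported for Machine 2 of the tempering stage in Section 3.3. The release time of a batch (line 25) would be computed in the same way from the Stage 2 finish times, which are not listed in the text.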
4.3.2. Actions

Three kinds of actions are defined. The first is the action $a_1$, which determines the selection method in the GA and contains two different methods. Method I keeps the individual with the better objective value when comparing two individuals. Method II keeps either the individual with the better objective value or the individual with the better diversity, chosen at random. The second is the action $a_2$, which determines the crossover probability of the GA. Since the crossover probability should be large to enhance the exploration ability of the algorithm and avoid being trapped in local optima, the action $a_2$ contains three different values {0.75, 0.85, 0.95}. The third is the action $a_3$, which determines the mutation probability of the GA. Since the mutation probability should be small to reduce the randomness of the search, the action $a_3$ contains four different values {0.05, 0.15, 0.25, 0.35}. Thus, the size of the action space is 2 × 3 × 4 = 24.

Algorithm 2: Training for Expected Sarsa
1: set the Q-value matrix to be a zero matrix and $T_{train} = 0$
2: while $T_{train} < N_{train}$ do
3:   set $t = 0$, randomly initialize population $Pop_t$, and obtain the state $s_t$ of $Pop_t$
4:   while $t < Iter_{max}$ do
5:     select an action $a_t$ in the action space using the $\varepsilon$-greedy policy under $s_t$
6:     generate a new population by using $a_t$, i.e., updating the old population with offspring obtained by the selection, crossover, and mutation operators
7:     observe the state $s'$ of the new population
8:     calculate the reward $\gamma_t$ and update the Q-value matrix using Formula (26)
9:     $t \leftarrow t + 1$
10:    $s_t \leftarrow s'$
11:  end while
12:  $T_{train} \leftarrow T_{train} + 1$
13: end while
14: return Q-value matrix
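The action space and one training update can be sketched in Python as follows. Formula (26) is not reproduced in this section, so the sketch uses the standard textbook Expected Sarsa update under an $\varepsilon$-greedy policy, which may differ in detail from the authors' formula. The names `alpha` (learning rate), `discount`, and `epsilon`, as well as all function names, are assumptions introduced for illustration.

```python
import itertools
import random

# Action space of Section 4.3.2: selection method x crossover prob. x mutation prob.
ACTIONS = list(itertools.product(("I", "II"),
                                 (0.75, 0.85, 0.95),
                                 (0.05, 0.15, 0.25, 0.35)))
N_STATES = 90  # 9 iteration bins x 10 diversity bins (Section 4.3.1)


def reward(best_before, best_after):
    """Reward of Section 4.3.3: one if the best objective value improved, else zero."""
    return 1.0 if best_after < best_before else 0.0


def epsilon_greedy(q_row, epsilon, rng=random):
    """Return a random action index with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def expected_value(q_row, epsilon):
    """Expected Q-value of the next state under the epsilon-greedy policy."""
    return epsilon * sum(q_row) / len(q_row) + (1 - epsilon) * max(q_row)


def expected_sarsa_update(q, s, a, r, s_next, alpha, discount, epsilon):
    """Textbook Expected Sarsa: move Q(s, a) toward r + discount * E[Q(s', .)]."""
    target = r + discount * expected_value(q[s_next], epsilon)
    q[s][a] += alpha * (target - q[s][a])


q_table = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
expected_sarsa_update(q_table, s=0, a=3, r=reward(100.0, 95.0), s_next=1,
                      alpha=0.1, discount=0.9, epsilon=0.1)
```

Unlike plain Sarsa, the update averages over all next actions instead of sampling one, which removes the variance caused by the random action choice in the next state.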
4.3.3. Reward

Since the goal of the algorithm is to find the feasible solution with the smallest objective value, it is intuitive to set the reward $\gamma_t(s_t, a_t)$ to one if the best objective value in $Pop_{t+1}$ is smaller than that in $Pop_t$. Otherwise, $\gamma_t(s_t, a_t)$ is set to zero.

4.3.4. Training

The training of the algorithm is a procedure that updates the Q-value matrix. The pseudo-code of the training is shown in Algorithm 2. Let $Iter_{max}$ be the number of iterations in the GA. Let $N_{train}$ be the number of training episodes.

5. Computational results

In this section, we first introduce the instances used in the numerical experiments. Then, we compare the proposed algorithm with three other algorithms. Next, we validate the effectiveness of the model. Finally, we discuss approaches to reduce energy consumption. The algorithm is coded in C# and tested on a computer with an Intel i5 3.20 GHz CPU, 8 GB RAM, and the Windows 10 operating system. The mathematical model is solved by IBM ILOG CPLEX 22.10 with default settings.

5.1. Test instances

The configuration of the deep-processing line in a glass plant is as follows: three edging machines with $\{\mu^1_1, \mu^1_2, \mu^1_3\} = \{30, 50, 75\}$ kW and $\vartheta^1_j = 0.2\mu^1_j$, $\forall j$; four coating machines with $\{\mu^2_1, \mu^2_2, \mu^2_3, \mu^2_4\} = \{100, 100, 250, 250\}$ kW and $\vartheta^2_j = 0.4\mu^2_j$, $\forall j$; two tempering machines with $\{\mu^3_1, \mu^3_2\} = \{300, 300\}$ kW and $\vartheta^3_j = 0.6\mu^3_j$, $\forall j$. The capacity of the tempering machine is 12. The speed of Machine 3 of $M^1$ is twice that of Machine 1 of $M^1$, and the speed of Machine 2 of $M^1$ is 1.5 times that of Machine 1 of $M^1$. The speed of Machine 3 of $M^2$ is twice that of Machine 1 of $M^2$. The speeds of the two tempering machines are the same. The actual processing time of one operation equals the basic processing time divided by the machine speed.

This section generates four cases by setting the number of glasses to {8, 50, 100, 200}. The due dates of the glasses in the four cases are {60 × 0.5, 60 × 2, 60 × 4, 60 × 8} minutes, respectively. For each glass, its size is randomly generated from the interval [1, 4]; the basic processing time in the edging machine is randomly generated from [4, 16] minutes; the basic processing time in the coating machine is randomly generated from [10, 30] minutes; the basic processing time in the tempering machine is randomly generated from [5, 15] minutes. In the plant, 200 glasses usually need to be finished within one eight-hour work shift. The
Table 2
Results of comparison between different algorithms.

n     Instance   CPLEX            PGA1            PGA2            ESGA
                 UB      LB       AVG     STD     AVG     STD     AVG     STD
8     I1         290.4   290.4    307.7   6.24    294.3   0.00    294.3   0.00
8     I2         268.5   268.5    289.2   8.12    281.1   0.43    280.8   0.00
8     I3         312.0   312.0    324.7   6.70    328.1   0.00    328.1   0.00
8     I4         342.4   337.4    361.6   6.38    360.6   0.00    360.6   0.00
8     I5         323.2   323.2    342.5   10.9    330.9   0.12    330.8   0.00
50    I1         1815    920.1    1941    23.3    1470    6.99    1470    3.55
50    I2         2155    990.8    2206    32.8    1632    3.33    1632    2.30
50    I3         2268    993.2    2137    38.1    1611    1.80    1611    1.61
50    I4         1933    969.9    2048    21.1    1563    2.87    1561    0.94
50    I5         2880    984.9    2074    29.2    1595    0.89    1594    0.79
100   I1         –       –        3915    64.4    2866    5.31    2862    2.00
100   I2         –       –        4392    66.2    3229    3.52    3226    0.80
100   I3         –       –        4349    48.5    3230    5.14    3228    3.99
100   I4         –       –        4276    66.7    3153    2.63    3152    1.64
100   I5         –       –        4326    43.8    3216    3.23    3214    2.86
200   I1         –       –        8296    103     6111    6.37    6103    4.14
200   I2         –       –        9012    95.7    6492    10.1    6477    7.19
200   I3         –       –        8829    118     6436    5.30    6431    4.84
200   I4         –       –        8546    146     6252    6.04    6243    3.59
200   I5         –       –        8713    82.6    6315    5.57    6307    2.59
penalty cost is 10 per minute if the finish time is later than the due date. For each case, five instances were generated randomly.

5.2. Comparison with different algorithms

5.2.1. Comparison with GA-based algorithms

To validate the effectiveness of the ESGA, numerical experiments were conducted by comparing it with three other algorithms when $\rho_1 = 0.5$. First, the mathematical model was solved by CPLEX. Second, the problem was solved by the traditional GA without reinforcement learning (PGA1), in which the chromosome of the solution contains multiple strings corresponding to job sequences on machines in three stages. Finally, the problem was solved by the traditional GA without reinforcement learning (PGA2), in which the chromosome of the solution is the same as that in Section 4.1. The maximum runtime of CPLEX is set to 3600 s. Each metaheuristic runs ten times on each instance, and each run takes several minutes. Table 2 reports the upper and lower bounds obtained by CPLEX in the columns "UB" and "LB", respectively, and the average value and standard deviation obtained by the GA-based algorithms in the columns "AVG" and "STD", respectively.

Table 2 indicates that the ESGA outperforms the other algorithms. Although CPLEX can obtain optimal solutions for small-sized instances, it cannot handle large-sized instances. The performance of the PGA1 is the worst, especially for the instances with more than 50 glasses. One reason is that the multiple-chromosome method is not suitable for the parallel-serial flow shop, as discussed by other researchers. Another reason is that the batch feature of the tempering machine and the type feature of the glasses cause incompatibility. The PGA2 is much better than the PGA1, which shows the effectiveness of the encoding and decoding methods proposed in this paper. The ESGA is slightly better than the PGA2 in terms of average objective value and solution robustness, which validates the advantage of using reinforcement learning to improve the performance of the GA.

In addition, we define an indicator $RPD$, which is the relative percentage deviation from the best solution $Obj_{best}$ obtained by all GAs for each instance, i.e., $RPD = 100 \cdot (Obj - Obj_{best}) / Obj_{best}$. Since CPLEX cannot produce competitive solutions for large-sized instances, we ignore it here and only draw the box-plots of the $RPD$ values obtained by the three GAs. Fig. 5 shows that the $RPD$ of the ESGA is the smallest, which also validates the effectiveness of the ESGA.

5.2.2. Comparison with other metaheuristics

In the previous section, we mainly compared different versions of GA-based algorithms. To further validate the effectiveness of the ESGA, we compare it with the SA (Fan and Su, 2022), the Jaya (Paraveen and Khurana, 2023), and the PSO (Hamdi and Boujneh, 2022). Note that the algorithmic framework and the encoding scheme are the same as in the references, while the decoding scheme is changed to Algorithm 1, as comparisons between different metaheuristics should be made in the same problem setting.

In the following experiments, two types of glasses are considered, and the number of glasses $n$ and the percentage of type I glasses are set to {100, 200} and {50%, 60%, 70%, 80%, 90%}, respectively, forming ten parameter combinations. For each combination, five instances are randomly generated. For each instance, each metaheuristic is run five times independently. Table 3 lists the average value, the coefficient of variation, and the relative percentage deviation in the columns "AVG", "COV", and "RPD", respectively, where each value is the average of five instances. The indicators "AVG" and "RPD" have the same meaning as in the previous section. The indicator "COV" represents the coefficient of variation, expressed as a percentage, of the objective values obtained during the five runs. Note that the "COV" indicator is not used in the previous section because its values do not differ significantly between the algorithms compared there. The results reveal that the ESGA outperforms the other metaheuristics in terms of both solution quality and algorithm stability.

In addition, the box-plots of the $COV$ and $RPD$ values obtained by the four metaheuristics are shown in Fig. 6. Fig. 6(b) clearly illustrates that every solution generated by the ESGA surpasses all solutions produced by the other metaheuristics across all instances, which validates the efficacy of the ESGA framework.

5.3. Model analysis

5.3.1. Effectiveness of model

The operations policy employed in glass plants is usually quite simple and is actually a heuristic described as follows. Different types of glasses are processed sequentially. For those glasses belonging to one type, similar to the bin packing problem, the plant manager tends to use the smallest number of batches to pack them, considering the batch nature of the tempering machine. Once the glasses have been split into several batches, they are processed sequentially from Stage 1 to Stage 3. The above algorithm is denoted as HA. To validate the effectiveness of this study, numerical experiments were conducted to
Fig. 5. Box-plots of 𝑅𝑃 𝐷 obtained by three GAs for different sized problems.
Table 3
Comparison for results obtained by different metaheuristics.

n     Pct. of Type I   ESGA                    SA                      Jaya                    PSO
      glasses (%)      AVG    COV     RPD      AVG    COV     RPD      AVG    COV     RPD      AVG    COV     RPD
100   50               3136   0.047   0.072    3185   0.187   1.621    3230   0.333   3.071    3232   0.367   3.119
100   60               3140   0.096   0.148    3183   0.349   1.545    3232   0.273   3.105    3234   0.346   3.152
100   70               3138   0.096   0.114    3183   0.322   1.520    3229   0.395   2.993    3234   0.391   3.167
100   80               3142   0.075   0.083    3187   0.289   1.524    3232   0.290   2.967    3233   0.408   2.981
100   90               3144   0.113   0.160    3194   0.323   1.761    3231   0.295   2.937    3243   0.443   3.310
200   50               6312   0.056   0.064    6453   0.224   2.294    6527   0.188   3.455    6520   0.249   3.360
200   60               6300   0.082   0.089    6451   0.293   2.479    6518   0.243   3.546    6510   0.268   3.417
200   70               6307   0.062   0.087    6456   0.253   2.447    6520   0.195   3.461    6504   0.239   3.204
200   80               6309   0.083   0.098    6455   0.180   2.413    6515   0.221   3.375    6503   0.268   3.179
200   90               6310   0.060   0.063    6477   0.208   2.703    6532   0.269   3.571    6520   0.279   3.392
Fig. 6. Box-plots of 𝐶𝑂𝑉 and 𝑅𝑃 𝐷 obtained by four metaheuristics.
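The two indicators plotted in Fig. 6 can be computed as in the following Python sketch. The $RPD$ formula is the one given in Section 5.2.1. For $COV$, the text does not say whether the sample or the population standard deviation is used, so the sample standard deviation is an assumption. The function names and the five objective values are hypothetical illustration data, not results from the paper.

```python
from statistics import mean, stdev


def rpd(obj, obj_best):
    """Relative percentage deviation from the best known objective value."""
    return 100.0 * (obj - obj_best) / obj_best


def cov(values):
    """Coefficient of variation in percent (sample standard deviation over mean)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical objective values from five independent runs on one instance.
runs = [6310.0, 6305.0, 6312.0, 6308.0, 6315.0]
best_known = min(runs)

print([round(rpd(v, best_known), 3) for v in runs])
print(round(cov(runs), 3))
```

Both indicators are scale-free, which is what allows values from instances with 100 and 200 glasses to be shown in the same box-plot.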
calculate the Gap between the HA and the ESGA, which equals $(Obj_{HA} - Obj_{ESGA}) / Obj_{HA} \cdot 100\%$. In this section, we set the number of glasses $n$ to 200. There are two types of glasses. To analyze the impact of the proportion of different types, we set the percentage of type I glasses to {90%, 80%, 70%, 60%, 50%}. To analyze the impact of the coefficient of different objectives, we set $\rho_1$ to {0.1, 0.3, 0.5, 0.7, 0.9}. The difference between the HA and the ESGA solutions considering the due date penalty is shown in Fig. 7. The difference between the HA and the ESGA considering the energy consumption is shown in Fig. 8. Each value in Figs. 7 and 8 indicates an average of five random instances.

Figs. 7 and 8 show that the performance of the ESGA is much better than that of the HA from both perspectives. The improvement under the due date objective is about 16%∼18%. The improvement under the energy objective is about 17%∼21%. The improvement is greater for the instances with a larger percentage of Type I glasses from both perspectives. The reason behind this phenomenon is that the HA performs worse in scenarios with unbalanced amounts of the two types of glass, since the HA handles different types of glass sequentially whereas the ESGA can interlace them. Considering the effect of the coefficient $\rho_1$, the improvement of the due date objective changes little, as shown in Fig. 7, while the improvement of the energy objective is greater when $\rho_1$ is smaller, as shown in Fig. 8. This is because, as a simple heuristic, the HA generates the same solution regardless of the coefficient $\rho_1$, whereas the ESGA can adaptively search for the solution
with lower energy consumption as the weight of the energy objective increases.

Fig. 7. Comparison between HA and ESGA with the objective of due date penalty.

Fig. 8. Comparison between HA and ESGA with the objective of energy consumption.

5.3.2. Determination of objective weights

Since two objectives are considered in this model, it is necessary to analyze the balance of different objectives and help managers determine the appropriate weights. Figs. 7 and 8 show that the absolute value of the improvement changes little when $\rho_1$ increases from 0.1 to 0.9, especially for the due date penalty objective. Also, the results do not demonstrate that one objective becomes smaller and another correspondingly higher. Therefore, we checked the objective values of different solutions in five random instances. We randomly generated 100,000 solutions for each instance and got their objective values. After sequencing these solutions according to the energy consumption value, they were divided into ten small groups. The correlation analysis was conducted, and the obtained R values are shown in Table 4.

Table 4
R value between due date penalty and energy consumption.

Group   Average R value
        I1      I2      I3      I4      I5
Whole   0.212   0.211   0.209   0.226   0.237
1       0.085   0.129   0.086   0.078   0.107
2       0.024   0.026   0.024   0.046   0.029
3       0.031   0.029   0.017   0.019   0.026
4       0.019   0.017   0.041   0.009   0.017
5       0.014   0.007   0.011   0.008   0.026
6       0.029   0.017   0.031   0.002   0.006
7       0.019   0.013   0.018   0.019   0.009
8       0.023   0.021   0.016   0.013   0.021
9       0.016   0.035   0.015   0.055   0.047
10      0.116   0.061   0.093   0.096   0.077

Table 4 shows that the correlation between the due date penalty and energy consumption objectives in this problem is not significant. In the whole group with 100,000 solutions, the R value is only about 0.2. In the divided small groups with 10,000 solutions each, the R value is below 0.1 in almost all cases. Thus, we can make the following remarks. (i) It is necessary to consider the two objectives simultaneously when the manager selects a solution, since a solution with a good due date objective does not necessarily perform well in the energy aspect and vice versa. (ii) It is not hard to persuade a traditional manager who used to focus only on the due date penalty objective to pay attention to the energy consumption objective, since reducing energy consumption may not necessarily lead to an increase in the due date penalty. (iii) For a plant in which the two objectives have been optimized using our approach, it is hard to decrease the energy consumption further, even if the manager is willing to sacrifice the due date penalty objective.

The preceding analysis highlights the importance of assigning a higher weight to the energy objective in this particular scenario. However, it is not possible to determine the exact weight in advance, as it depends on individual manager preferences and specific case requirements. The following procedure can help managers make the final decision. First, we enumerate $\rho_1$ from 0 to 1, which results in 11 settings when the step size is 0.1. For each setting, we use the proposed ESGA to obtain the solution and objective values. Since the computational time of the ESGA is within a few minutes, it is acceptable to run it 11 times to obtain all candidate solutions. Second, we use the Technique for Order of Preference by Similarity to the Ideal Solution (TOPSIS) method to select the appropriate solution among the candidate solutions. After constructing the positive and negative ideal solutions, the Euclidean distance between each candidate solution and the ideal solutions can be calculated with respect to the two objectives. Finally, we select the candidate solution with the highest closeness ratio. For the detailed calculation procedure, see Ghodratnama et al. (2023). This method finds wide applications in multi-objective decision problems, extending well beyond the domain of production scheduling (Mahmood et al., 2019).

5.4. Approaches to reduce energy consumption

Since energy-efficient objectives are becoming increasingly important nowadays, it is necessary to explore ways to reduce energy consumption further.

5.4.1. Comparison between different configurations

Firstly, we discuss the impact of the configuration of the deep-processing line. The original configuration is denoted as B. Configuration A assumes that only one tempering machine with ($Q = 24$, $\mu^3_1 = 600$) works in the tempering stage. Configuration C assumes that four tempering machines with ($Q = 6$, $\mu^3_j = 150$, $\forall j \le 4$) work in the tempering stage. The three configurations are compared in the case with two types of glasses and $\rho_1 = 0.5$. The comparison results of five instances are shown in Figs. 9 and 10.

Fig. 9 shows that the energy consumption of Configuration C is the smallest in all instances. Therefore, instead of one big tempering machine, setting up several small tempering machines in the tempering stage would be better. In addition, according to Fig. 10, Configuration C also performs better than A and B from the viewpoint of due date penalty. The reason is that it is more flexible to handle different types of glasses with more than one small machine than with one big machine. To validate this phenomenon, additional experiments were conducted when the number of types increased from 2 to 4 and 8. The energy reduction of Configuration B compared with A is denoted as Gap1, which equals $[obj_2(A) - obj_2(B)] / obj_2(A) \cdot 100$. The energy reduction of Configuration C compared with A is denoted as Gap2, which equals $[obj_2(A) - obj_2(C)] / obj_2(A) \cdot 100$. The numerical results are shown in Fig. 11, which validates that the reduction percentage is larger for cases with more types.
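The TOPSIS-based selection step described in Section 5.3.2 can be sketched in Python as follows. This is a generic TOPSIS variant under stated assumptions, not the exact procedure of Ghodratnama et al. (2023): vector normalization and equal criterion weights are assumed, both objectives are treated as costs to be minimized, and the function name and the candidate values are hypothetical.

```python
import math


def topsis_select(candidates, weights=(0.5, 0.5)):
    """Rank candidates given as (due_date_penalty, energy) tuples, both minimized.
    Returns the index of the best candidate and all closeness ratios."""
    m = len(weights)
    # Vector normalization of each objective column (guarding against a zero norm).
    norms = [math.sqrt(sum(c[k] ** 2 for c in candidates)) or 1.0 for k in range(m)]
    rows = [[weights[k] * c[k] / norms[k] for k in range(m)] for c in candidates]
    # For cost criteria the positive ideal is the column minimum, the negative ideal the maximum.
    ideal = [min(r[k] for r in rows) for k in range(m)]
    anti_ideal = [max(r[k] for r in rows) for k in range(m)]
    scores = []
    for r in rows:
        d_pos = math.dist(r, ideal)
        d_neg = math.dist(r, anti_ideal)
        scores.append(d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.0)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Hypothetical (due date penalty, energy consumption) pairs, one per rho_1 setting.
candidates = [(120.0, 6400.0), (150.0, 6250.0), (300.0, 6200.0)]
best, scores = topsis_select(candidates)
print(best, [round(s, 3) for s in scores])
```

In the procedure of the text, `candidates` would hold the 11 objective pairs obtained by running the ESGA once for each value of $\rho_1$.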
Fig. 9. Difference between energy consumption of three configurations.

Fig. 10. Difference between due date penalty of three configurations.

Fig. 11. Energy reduction of configuration in cases with different types.

5.4.2. Comparison between different techniques

Although reducing energy consumption through changing the configuration is possible, the reduction percentage is limited. What if the plant manager can invest money to acquire new techniques? If there are eight potential techniques, it is necessary to discuss which is better. Techniques I, II, and III: the energy demand per unit time of the edging machine, coating machine, and tempering machine can be reduced by 10%, respectively. Technique IV: the energy demand of all machines in each stage can be reduced by 10%. Techniques V, VI, and VII: the processing time of glasses in the edging machine, coating machine, and tempering machine can be reduced by 10%, respectively. Technique VIII: the processing time of glasses in all machines in each stage can be reduced by 10%. The impacts of different techniques are shown in Fig. 12. The performance of IV is clearly better than that of I, II, and III; the performance of VIII is better than that of V, VI, and VII. Meanwhile, the plant manager also needs to invest more money when developing techniques IV and VIII. Compared to IV, technique VIII can simultaneously reduce the due date penalty significantly. Thus, shortening the processing time of glasses on the machines in the deep-processing line is a good research direction. In addition, we find that Technique II is better than I and III, and Technique VI is better than V and VII. This shows that the value of investing in coating machines is larger than that of edging and tempering machines in terms of reducing both energy consumption and due date penalty. In summary, Technique VI is worth the investment, although its impact on reducing the objectives is slightly worse than that of VIII.

Fig. 12. Objectives reduction in cases with eight different techniques.

5.4.3. Comparison between different processing times

The above analysis focuses on one scheduling horizon, in which the glasses to be processed have already been determined in advance. For higher-level decision-making, the plant manager planning over multiple periods can decide the allocation of glasses to these periods. Can the system's performance at the scheduling horizon be improved by making appropriate allocations beforehand? To examine this impact, we compared various scenarios. The processing times described in Section 5.1 are set as the benchmark scenario. Another three scenarios are generated as follows. Scenario 1: the processing times of all glasses in Stage 1 are the same and equal the average value of [4, 16]. Scenario 2: the processing times of all glasses in Stage 2 are the same and equal the average value of [10, 30]. Scenario 3: the processing times of all glasses in Stage 3 are the same and equal the average value of [5, 15]. The computational results are shown in Table 5. $Ratio_1$ under the due date penalty is calculated by $obj_1(scenario1) / obj_1(benchmark)$. $Ratio_2$ shows the comparison between Scenario 2 and the benchmark scenario; $Ratio_3$ shows the comparison between Scenario 3 and the benchmark scenario.

Table 5 indicates that the system efficiency varies across different scenarios. Regarding the value of $Ratio_3$, Scenario 3 exhibits a slightly better performance than the benchmark scenario. Since Scenario 3 only modifies the processing times in Stage 3 (which is the final stage in the production line), the difference in objectives can be attributed to the batch production mode of the tempering machine, where the processing time of a batch equals the largest processing time of the glasses in that batch. Regarding the value of $Ratio_2$, Scenario 2 is better than the benchmark scenario, especially under the due date penalty. This is because the finish time of glasses in Stage 2 is actually the arrival time of glasses in Stage 3. Regarding the value of $Ratio_1$, Scenario 1 is worse than the benchmark scenario. The reason is that the speeds of the three edging machines in this problem differ. In Scenario 1, the basic processing times of glasses are the same, which leads to a big difference in the actual processing time of glasses in Stage 1. Therefore, the plant manager can follow the principle that the glasses should be allocated to different scheduling horizons according to the similarity of their processing times in the coating stage.
Table 5
Ratios between scenarios with different processing times.

n     Due date penalty            Energy consumption          Total objective
      Ratio1   Ratio2   Ratio3    Ratio1   Ratio2   Ratio3    Ratio1   Ratio2   Ratio3
200   1.021    0.917    1.019     1.000    0.981    0.986     1.003    0.972    0.991
300   1.016    0.916    1.014     1.001    0.980    0.987     1.003    0.971    0.991
400   1.012    0.927    1.010     0.998    0.981    0.983     1.000    0.973    0.987
500   1.010    0.911    1.007     1.001    0.981    0.984     1.003    0.971    0.987
600   1.008    0.901    1.007     0.999    0.978    0.983     1.001    0.968    0.986
6. Concluding remarks

This paper establishes a mathematical programming model for a deep-processing line in a PV glass plant, which intends to minimize the due date penalty and energy consumption simultaneously. It considers the distinct characteristics of machines in the edging, coating, and tempering stages; additionally, the types of glass pose a challenge in solving the problem. To solve the real-sized problem efficiently, a hybrid GA based on reinforcement learning is designed, in which the expected Sarsa is used to extract critical knowledge about algorithmic parameters during the population evolution to guide the exploration of the GA. The chromosome is encoded by a priority list representing the priority relationship among all jobs, which is then decoded by a constructive heuristic based on the problem feature analysis.

The key management insights derived from the numerical results are as follows:

• The plant manager needs to consider the two objectives simultaneously when choosing a solution, because a solution with a good due date objective does not necessarily perform well in the energy aspect, and vice versa. On the other hand, reducing energy consumption does not necessarily increase the due date penalty.
• The improvement of the proposed approach over the current method used in the plant is greater than 15% in terms of both the delay penalty and the energy consumption in all instances with different objective weights and percentages of glass types. Moreover, it is more important to use the solution approach proposed in this paper when the plant manager faces a scenario where the percentages of different glass types are unequal.
• The plant manager can further reduce energy consumption from a higher-level decision perspective. (i) When planning production over multiple periods from a tactical point of view, the glasses should be allocated into different scheduling horizons according to the similarity of processing times in the coating stage. (ii) When deciding the configuration of the deep-processing line, setting up several small machines in the tempering stage is better than setting up one large machine. (iii) When investing money to develop new technologies from a strategic point of view, it would be better to focus on reducing the processing time of glasses in the coating stage, as the return on investment is higher compared to other technologies.

There are several interesting extensions based on the research in this paper. The capacitated lot sizing problem, which determines the optimal production quantities of different glasses over multiple periods, is worthy of investigation. In addition, real-time power flow control needs to be optimized when the plant’s energy demand is supplied by a renewable energy generation/storage system, such as a solar energy generation system with a battery bank. During this process, uncertainty must be considered since the energy generation depends on dynamically changing weather conditions.

CRediT authorship contribution statement

Weiwei Cui: Methodology, Software, Writing – original draft. Biao Yuan: Conceptualization, Methodology, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [grant numbers 71801147 and 72301170] and the Startup Fund for Young Faculty at SJTU, China [23X010502006].
Lozano, A.J., Medaglia, A.L., 2014. Scheduling of parallel machines with sequence- de Souza Amorim, F.M., da Silva Arantes, M., de Souza Ferreira, M.P., Toledo, C.F.M.,
dependent batches and product incompatibilities in an automotive glass facility. J. 2021. MILP formulation and hybrid evolutionary algorithms for the glass container
Sched. 17, 521–540. industry problem with multiple furnaces. Comput. Ind. Eng. 158, 107398.
Lu, C., Gao, L., Li, X., Zheng, J., Gong, W., 2018. A multi-objective approach to welding Tang, D., Dai, M., Salido, M.A., Giret, A., 2016. Energy-efficient dynamic scheduling
shop scheduling for makespan, noise pollution and energy consumption. J. Clean. for a flexible flow shop using an improved particle swarm optimization. Comput.
Prod. 196, 773–787. Ind. 81, 82–95.
Mahmood, M.S., Zaidan, B., Zaidan, A., Ahmed, M.A., 2019. Survey on fuzzy TOPSIS T’kindt, V., Billaut, J.C., Proust, C., 2001. Solving a bicriteria scheduling problem on
state-of-the-art between 2007 and 2017. Comput. Oper. Res. 104, 207–227. unrelated parallel machines occurring in the glass bottle industry. European J. Oper.
Meng, L., Zhang, C., Shao, X., Ren, Y., Ren, C., 2018. Mathematical modelling Res. 135 (1), 42–49.
and optimisation of energy-conscious hybrid flow shop scheduling problem with Wang, S., Wang, X., Chu, F., Yu, J., 2020. An energy-efficient two-stage hybrid flow
unrelated parallel machines. Int. J. Prod. Res. 57, 1119–1145. shop scheduling problem in a glass production. Int. J. Prod. Res. 58 (8), 2283–2314.
Paraveen, R., Khurana, M.K., 2023. A comparative analysis of SAMP-Jaya and simple Wu, X., Cao, Z., 2022. An improved multi-objective evolutionary algorithm based on
Jaya algorithms for PFSSP (permutation flow shop scheduling problems). Soft decomposition for solving re-entrant hybrid flow shop scheduling problem with
Comput. 27, 10759–10776. batch processing machines. Comput. Ind. Eng. 169, 108236.
Reeves, C.R., 2010. Genetic algorithms. In: Gendreau, M., Potvin, J.Y. (Eds.), Handbook Yan, J., Li, L., Zhao, F., Zhang, F., Zhao, Q., 2016. A multi-level optimization approach
of Metaheuristics. Springer US, Boston, MA, pp. 109–139. for energy-efficient flexible flow shop scheduling. J. Clean. Prod. 137, 1543–1552.
Richard, P., Proust, C., 2000. Maximizing benefits in short-term planning in bottle-glass Yılmaz, Ö.F., Durmusoglu, M.B., 2018. A performance comparison and evaluation of
industry. Int. J. Prod. Econ. 64 (1), 11–19. metaheuristics for a batch scheduling problem in a multi-hybrid cell manufac-
Ruiz, R., Maroto, C., 2006. A genetic algorithm for hybrid flowshops with sequence turing system with skilled workforce assignment. J. Ind. Manag. Optim. 14 (3),
dependent setup times and machine eligibility. European J. Oper. Res. 169, 1219–1249.
781–800. Yılmaz, Ö.F., Yazıcı, B., 2022. Tactical level strategies for multi–objective disassembly
Ruiz, R., Rodríguez, J.A.V., 2010. The hybrid flow shop scheduling problem. European line balancing problem with multi–manned stations: an optimization model and
J. Oper. Res. 205, 1–18. solution approaches. Ann. Oper. Res. 319, 1793–1843.
Shao, W., Shao, Z., Pi, D., 2021. An ant colony optimization behavior-based Yılmaz, B.G., Yılmaz, Ö.F., 2022. Lot streaming in hybrid flowshop scheduling problem
MOEA/D for distributed heterogeneous hybrid flow shop scheduling problem by considering equal and consistent sublots under machine capability and limited
under nonidentical time-of-use electricity tariffs. IEEE Trans. Autom. Sci. Eng. 19, waiting time constraint. Comput. Ind. Eng. 173, 108745.
3379–3394. Zhang, B., Pan, Q., Gao, L., Meng, L., Li, X., Peng, K., 2020. A three-stage multiob-
jective approach based on decomposition for an energy-efficient hybrid flow shop
scheduling problem. IEEE Trans. Syst. Man Cybern. Syst. 50, 4984–4999.