Imperfect Corrective Maintenance Scheduling For Energy Efficient Manufacturing Systems Through Online Task Allocation Method

Version of Record: https://www.sciencedirect.com/science/article/pii/S0278612519300913
Tian Yu(a), Cheng Zhu(a), Qing Chang(a,*), Junfeng Wang(b)
(a) Department of Mechanical and Aerospace Engineering, University of Virginia, Charlottesville, VA, 22904, US
(b) Department of Industrial and Manufacturing System Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China
Abstract
Maintenance management is critical to smooth production operation in a manufacturing system. Once a machine fails, corrective maintenance (CM) is required to resume its normal operation. However, it is not always necessary to restore a failed machine to an as-good-as-new condition through perfect CM. Multi-level, or imperfect, CM is a more realistic and economical option that can be matched to production needs. In addition, maintenance resources such as maintenance staff are limited in reality. Therefore, dispatching the limited maintenance workforce with an appropriate level of CM is critical, since it directly affects the overall system's productivity, energy efficiency and cost. This paper aims to create a real-time CM scheduling policy that reduces the overall maintenance and energy-related cost of a stochastic serial production line. To accomplish this goal, the CM scheduling problem is formulated as an online task allocation (OTA) problem. The expected cost rate of the serial production line is introduced and used to define the payoff function of the OTA problem. A numerical case study evaluates the effectiveness of the OTA-based maintenance policy by comparing it with other heuristic policies.
Keywords: Real-time maintenance policy, Stochastic manufacturing system, Online task allocation, Energy Consumption
1. Introduction
Maintenance activity is important for all manufacturing systems. It plays a critical role in plant-floor operation by increasing productivity and quality and reducing overall costs. Modern manufacturing systems are highly complex, with strongly interconnected machines and buffers [1], [2]. A random machine failure not only halts production on a single machine but also propagates through the entire system, causing other machines to be starved or blocked. Thus, corrective maintenance (CM) has to be carried out in response to failures to rectify the faults so that the failed machines can be recovered.
*Corresponding Author.
Email address: qc9nq@virginia.edu (Q. Chang)
© 2019 published by Elsevier. This manuscript is made available under the Elsevier user license
https://www.elsevier.com/open-access/userlicense/1.0/
The classification of maintenance levels has attracted attention in industry and academia [3]. A perfect CM action restores the machine or equipment by replacing the component with a new one, while imperfect CM restores the machine to a state that is better than its pre-maintenance condition but not as good as new. Imperfect CM costs less than perfect CM, but it may accelerate the deterioration of components, resulting in damage to machines or equipment [4], [5]. Therefore, CM actions can be further classified into multiple levels according to the degree of restoration and the associated cost [6]. Because machines and buffers in a modern manufacturing system interact dynamically, the choice of CM level is not merely a local decision for an isolated machine or piece of equipment [7].
Extensive work has been devoted to maintenance policies that consider multi-level maintenance [8], [9], but most of these studies focus on single-unit systems. Maintenance of multi-unit systems differs from that of single-unit systems in that the components of multi-unit systems are interconnected and interdependent. Wang et al. [8] categorize maintenance policies for multi-unit systems into group maintenance policies and opportunistic maintenance policies. Both are based on the idea that the failure of any component affects the operation of the entire system and creates opportunities to maintain other components at the same time without incurring additional production loss. Some studies [10], [11] have explored the opportunity window of maintenance for multi-stage manufacturing systems. However, these studies consider a single maintenance level and do not discuss multiple imperfect maintenance levels or their implications for machine reliability and aging.
In addition, most maintenance models for manufacturing systems are policy-based [12], [13], [14]: policies are generated by observing the statistical behavior of the production system and maintenance operations over the long run. Such a statistical description offers only the average expected behavior of the system, so policies derived from it may not be optimal, especially in real-time scenarios, because the dynamic behavior of the system is not fully analysed for decision making [10].

A multi-stage manufacturing system is nonlinear and stochastic, and there is no closed-form representation of the performance of a general serial production line [15]. It is therefore very difficult to determine how the maintenance-caused stoppage of machines relates to the production loss of the system. To deal with this challenge, simulation-based methods have been developed to search for the optimal maintenance policy [16]. For example, Yang et al. [17] develop a Genetic Algorithm (GA) based method that searches for the most cost-effective policy by evaluating maintenance schedules through discrete-event simulation. Pandey et al. [18] apply an evolutionary algorithm to minimize the total maintenance, failure and shutdown cost, and Abdollahzadeh et al. [19] present a multi-objective model to minimize the total expected cost of maintenance actions. However, simulation-based optimization cannot reveal the relationships between system parameters and performance. Zou et al. [20] have shown that the system production loss caused by random machine failures is directly related to the dynamic states of the system. Therefore, a model-based CM policy requires real-time modeling of the production system.
In a modern lean manufacturing environment, it is not uncommon for maintenance tasks to outnumber the available maintenance staff at a given time. Therefore, maintenance priority directly affects production performance [21]. A common practice on the plant floor is to prioritize maintenance work using heuristic rules derived from human expert knowledge [22], [23]. Xia et al. [24] have developed an online manufacturing analytics framework that integrates sensor-driven prognostic models and opportunistic maintenance policies in mass customization manufacturing. However, maintenance staff limitations and the impact of multiple maintenance levels are not considered. Meanwhile, with growing worldwide concern about the eco-efficiency of manufacturing [25], [26], the energy and environmental performance of manufacturing systems has attracted increasing attention [27], [28]. Maintenance decision making therefore needs a systematic method to trade off production performance, maintenance-related cost and energy efficiency so as to minimize the overall operational cost. Recently, Huang et al. [7] developed a real-time maintenance scheduling method for multi-stage manufacturing systems using an expected cost rate concept together with an aging model of multi-level maintenance. However, the maintenance staff constraint and energy cost are ignored, which limits its practicality.
Existing maintenance scheduling methods for multi-stage production lines either focus on single-level maintenance or do not systematically take real-time production information into account, in particular how a machine's aging status after a given repair level affects system-level performance such as throughput, energy consumption and overall cost. Timely CM scheduling that assigns the limited workforce to the right machine and carries out the right level of CM is critical, but it is challenging because of the nonlinear interactions between machines and the stochastic nature of random machine failures.
This paper addresses these challenges by developing an online CM policy that prioritizes limited maintenance staff to perform an appropriate level of maintenance so as to minimize overall cost. The main contributions of this paper are: 1) formulating CM scheduling as an online task allocation (OTA) problem and defining the OTA payoff function by jointly considering the real-time dynamic status of production, multi-level maintenance cost and energy consumption; 2) evaluating the production loss of various maintenance actions according to real-time system properties and estimating the real-time maintenance cost rate; and 3) developing an optimal CM scheduling policy by solving the OTA problem.
This paper is organized as follows. Section 2 introduces the production system structure and notation, the online task allocation (OTA) problem, and the CM dispatching formulation based on OTA. In Section 3, a data-driven model of the manufacturing system is introduced, based on which the production loss and production loss risk triggered by maintenance actions are evaluated. In Section 4, the overall cost and the expected cost rate are derived, from which the solution to the OTA problem and the real-time maintenance policy are obtained. A case study and numerical experiments are given in Section 5. Conclusions and future work are provided in Section 6.
2. System Description and Problem Statement
2.1 Production Line System Description
A serial production line with $N$ machines (represented by rectangles) and $N-1$ intermediate buffers (represented by circles) is shown in Fig. 1. In this paper, the continuous flow model is adopted for convenience of mathematical representation, in which buffer levels change continuously within the interval from zero to their maximum capacity.
Fig. 1. Standard structure of a serial production line.
The following notation is adopted in this paper:
• $M_i$ denotes the $i$-th machine, $i = 1, 2, \dots, N$;
• $B_i$ denotes the $i$-th buffer, $i = 2, 3, \dots, N$. It also denotes the maximum capacity of the buffer;
• $T_i$ denotes the cycle time of machine $M_i$, and $1/T_i$ is the fixed rated speed of machine $M_i$;
• $M_{S^*}$ denotes the slowest machine in the serial production line, i.e., $S^* = \arg\max_{i} T_i$, $i = 1, 2, \dots, N$;
• $b_i(t)$ denotes the level of buffer $B_i$ at time $t$;
• $e_{i,j} = (i, m_{i,j}, t_{i,j}, d_{i,j})$, $i = 1, 2, \dots, N$, $j = 1, 2, \dots$, denotes the $j$-th maintenance action on machine $M_i$, starting at $t_{i,j}$ and lasting for $d_{i,j}$, with maintenance level $m_{i,j}$;
• $m_{i,j} \in \{1c, 2c\}$ denotes the level of the $j$-th maintenance on machine $M_i$. In this paper, we adopt two levels of maintenance: $1c$ denotes perfect maintenance, which restores the machine to as good as new, and $2c$ denotes imperfect maintenance, which restores the machine to a certain aging level;
• $OW_i(t)$ denotes the opportunity window of machine $M_i$ at time $t$;
• $PL_{e_{i,j}}$ denotes the permanent production loss caused by maintenance action $e_{i,j}$;
• $PLR_{e_{i,j}}$ denotes the production loss risk caused by maintenance action $e_{i,j}$;
• $n_s(t)$ denotes the total number of available maintenance staff at time $t$;
• $n_q(t)$ denotes the total number of machines waiting in the queue to be repaired at time $t$.
We also make the following assumptions based on realistic scenarios:
• CM is taken upon failure if and only if at least one maintenance staff member is available, and a failed machine is not recovered until its CM action is finished.
• Two maintenance levels are adopted: a perfect maintenance action restores the machine to as good as new, while an imperfect maintenance action restores the machine to a state between its current age and new. The recovery effect is evaluated through the age reduction factor of the maintenance action.
• The duration of each maintenance level is deterministic.
• Once a maintenance staff member is assigned to a CM task, that staff member is unavailable until the maintenance is finished; all maintenance staff have the same skill level.
• Machine $M_i$, $i = 1, 2, \dots, N-1$, is blocked if its downstream buffer $B_{i+1}$ is full. The machine at the end of the line is never blocked.
• Machine $M_i$, $i = 2, 3, \dots, N$, is starved if its upstream buffer $B_i$ is empty. The machine at the beginning of the line is never starved.
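The blocking and starvation rules above translate directly into a check on buffer levels. The Python sketch below is illustrative only: the function name `blocked_starved` and the 0-based list layout (index k holds buffer $B_{k+2}$, which sits between $M_{k+1}$ and $M_{k+2}$) are assumptions of this example, and machine failures are not modeled.

```python
from typing import List, Tuple


def blocked_starved(levels: List[float],
                    capacities: List[float]) -> Tuple[List[bool], List[bool]]:
    """Blocking/starvation flags for machines M_1..M_N of a serial line.

    levels[k] and capacities[k] describe buffer B_{k+2}, which sits between
    machines M_{k+1} and M_{k+2} (0-based list index k = 0..N-2).
    """
    if len(levels) != len(capacities):
        raise ValueError("levels and capacities must have the same length")
    n_machines = len(levels) + 1          # N machines share N - 1 buffers
    blocked = [False] * n_machines        # the last machine stays unblocked
    starved = [False] * n_machines        # the first machine stays unstarved
    for k in range(n_machines - 1):
        # M_{k+1} is blocked when its downstream buffer B_{k+2} is full
        blocked[k] = levels[k] >= capacities[k]
        # M_{k+2} is starved when its upstream buffer B_{k+2} is empty
        starved[k + 1] = levels[k] <= 0
    return blocked, starved
```

With levels [0, 5, 3] and capacities [4, 5, 6], machine $M_2$ is both starved (empty $B_2$) and blocked (full $B_3$).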
2.2 Problem Statement
2.2.1 Introduction to Online Task Allocation Problem
An online task allocation (OTA) problem can be described as in [29]: for a mission with $K$ tasks divided into $K_T$ disjoint subsets that arise one at a time, the goal is to allocate $K_S$ staff to the subsets of tasks so that the total payoff of the task assignment is maximized or minimized, where each task is solved by exactly one staff member and each staff member can work on only one task of a subset at a time. For the minimization case, this problem can be formulated as an Integer Linear Programming problem:
$$\min_{x_{ij}} \sum_{i=1}^{K} \sum_{j=1}^{K_S} p_{ij}(t_i)\, x_{ij}(t_i) \tag{1}$$
s.t.
$$\sum_{j=1}^{K_S} x_{ij}(t_i) = 1, \quad \forall i = 1, \dots, K$$
$$\sum_{i \in \mathcal{T}_k} x_{ij}(t_k) \le 1, \quad \forall j = 1, \dots, K_S,\ k = 1, \dots, K_T$$
$$x_{ij}(t_i) \in \{0, 1\}, \quad \forall i, j$$
where $p_{ij}(t_i)$ is the payoff of the $j$-th staff member finishing the $i$-th task, and $x_{ij}(t_i)$ defines the allocation of staff: it equals 1 if the $i$-th task is allocated to the $j$-th staff member and 0 otherwise; $t_i$ is the time at which the $i$-th task is assigned, $\mathcal{T}_k$ is the $k$-th task subset, and $t_k$ denotes the time at which the $k$-th task subset arises.
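Within a single arriving subset, the constraints of Eqn. (1) reduce to a small assignment problem. The sketch below enumerates staff permutations to find the minimum-payoff allocation; it assumes the subset has no more tasks than available staff, and the name `allocate_subset` is hypothetical. Brute force is only reasonable for tiny subsets; a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual choice for larger ones.

```python
from itertools import permutations
from typing import List, Tuple


def allocate_subset(payoff: List[List[float]]) -> Tuple[float, List[int]]:
    """Minimum-payoff allocation for one arriving task subset.

    payoff[i][j] is the payoff p_ij of staff j finishing task i. Every task is
    given to exactly one staff member and no staff member takes two tasks.
    Returns (total payoff, staff index chosen for each task).
    """
    n_tasks = len(payoff)
    n_staff = len(payoff[0]) if payoff else 0
    if n_tasks > n_staff:
        raise ValueError("this sketch assumes at least as many staff as tasks")
    best_cost, best_assign = float("inf"), []
    # Enumerate ordered choices of distinct staff, one per task
    for staff in permutations(range(n_staff), n_tasks):
        cost = sum(payoff[i][j] for i, j in enumerate(staff))
        if cost < best_cost:
            best_cost, best_assign = cost, list(staff)
    return best_cost, best_assign
```

For payoffs [[4, 1, 3], [2, 0, 5]], giving task 1 to staff 1 and task 2 to staff 0 yields the minimum total payoff of 3.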
2.2.2 Maintenance Scheduling Problem Formulation
With the definition of the OTA problem, the real-time CM scheduling problem can be formulated as an OTA problem in which limited maintenance staff perform an appropriate level of CM to minimize the overall cost in the long run. Since all maintenance staff are assumed to have the same skill level, the payoff $p_{ij}(t_i)$ in Eqn. (1) can be simplified as $p_i^{m_i}(t)$, $i = 1, 2, \dots, N$. Let $x_i^{m_i}(t)$ be the corresponding decision variable, which equals 1 if any staff member is dispatched to machine $M_i$ at time $t$ with maintenance level $m_i$, and 0 otherwise. The goal is to allocate the available staff to failed machines so as to minimize the total maintenance payoff, i.e.,
$$\min_{x_i^{m_i}(t),\, m_i} \sum_{i=1}^{N} p_i^{m_i}(t)\, x_i^{m_i}(t) \tag{2}$$
s.t.
$$\sum_{i=1}^{N} x_i^{m_i}(t) \le n_s(t)$$
$$x_i^{m_i}(t) \in \{0, 1\}, \quad \forall i, t, m_i$$
$$m_i \in \{1c, 2c\}$$
The staff constraint indicates that each available maintenance staff member can perform only one level of maintenance on exactly one machine at time $t$, so at most $n_s(t)$ machines are served at once. In addition, the maintenance level $m_i$ on machine $M_i$ can be either perfect ($1c$) or imperfect ($2c$). Specifically, for every $x_i^{m_i}(t) = 1$, the payoff $p_i^{m_i}(t)$ under the two levels of $m_i$ should be compared. Therefore, our goal is to find the optimal allocation $x_i^{m_i}(t)$ with the appropriate maintenance level $m_i$ that minimizes the total payoff. One key challenge is to define the payoff function $p_i^{m_i}(t)$. To find an appropriate payoff function, we need to further understand the dynamic properties of the production system, which are introduced in the next section.
3. Production System Modeling
In a multi-stage serial production line, the dynamic interactions between machines and buffers are nonlinear because the buffers have finite capacity. In addition, the system is constantly affected by random machine failures and by maintenance scheduling. Based on our previous work [7], the dynamic manufacturing system can be modeled by the state space equation:
$$\dot{\mathbf{Y}}(t) = \mathbf{F}\big(\mathbf{Y}(t), \mathbf{U}(t), \mathbf{W}(t)\big) \tag{3}$$
in which
$\mathbf{Y}(t) = [y_1(t), y_2(t), \dots, y_N(t)]^T$, where $y_i(t)$ is the state representing the production count of machine $M_i$ at time $t$;
$\mathbf{U}(t) = [u_1(t), u_2(t), \dots, u_N(t)]^T$, where $u_i(t)$ is the control input for machine $M_i$ at time $t$, i.e., the assignment of CM of level $m_i$, $m_i \in \{1c, 2c\}$;
$\mathbf{W}(t) = [w_1(t), w_2(t), \dots, w_N(t)]^T$, where $w_i(t)$ is the random disruption on machine $M_i$ at time $t$. In this paper, this information consists of random machine failures, which can be obtained directly from sensor data; $w_i(t)$ indicates whether machine $M_i$ is suffering a random failure at time $t$:
$$w_i(t) = \begin{cases} 0, & M_i \text{ is operational} \\ 1, & M_i \text{ has failed} \end{cases} \tag{4}$$
$\mathbf{F}(\cdot) = [f_1(\cdot), f_2(\cdot), \dots, f_N(\cdot)]^T$, where $f_i(\cdot)$ is the dynamic function of machine $M_i$.
Based on the conservation of flow, the system state can be evaluated at any time [8]. The level of buffer $B_i$ at time $t$ can be evaluated as:
$$b_i(t) = b_i(0) + y_{i-1}(t) - y_i(t) \tag{5}$$
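Eqn. (5) is a flow-balance update and can be checked numerically. In this sketch the 0-based list layout and the function name `buffer_levels` are assumptions made for illustration.

```python
from typing import List


def buffer_levels(initial: List[float], counts: List[float]) -> List[float]:
    """Eq. (5): b_i(t) = b_i(0) + y_{i-1}(t) - y_i(t) for buffers B_2..B_N.

    initial[k] is b_{k+2}(0) and counts[k] is the production count y_{k+1}(t)
    of machine M_{k+1} (0-based lists).
    """
    if len(counts) != len(initial) + 1:
        raise ValueError("need N production counts for N - 1 buffers")
    # Each buffer gains its upstream machine's output and loses its downstream machine's
    return [initial[k] + counts[k] - counts[k + 1] for k in range(len(initial))]
```

For example, with $b_2(0) = 2$, $b_3(0) = 5$ and production counts (10, 9, 12), the buffer levels become 3 and 2.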
Random machine failures directly affect the system state and the buffer levels. How CM is assigned, and the level to which failed machines are restored, also play a significant role in changing the system states and production performance. In a multi-stage serial production line, a stoppage at one machine does not necessarily lead to permanent production loss of the entire line, because of the finite buffers in the line and asynchronous machine processes [10]. Our previous studies demonstrate that there exists a threshold on machine downtime beyond which the system suffers permanent production loss that cannot be recovered [20]. This threshold is defined as the opportunity window. To keep this paper self-contained, we briefly summarize our previous results without detailed proofs.
Definition 1. [28], [30] The opportunity window $OW_i(t)$ is the longest possible disruption on machine $M_i$ at time $t$ that does not result in permanent production loss at the end-of-line machine, i.e.,
$$OW_i(t) = \max\Big\{ d \ge 0 : \exists\, T^*\ \text{s.t.}\ \int_0^{\tau} \dot{y}_N(s)\, ds = \int_0^{\tau} \dot{y}_N(s; d)\, ds,\ \forall \tau \ge T^* \Big\} \tag{6}$$
where $\int_0^{\tau} \dot{y}_N(s)\, ds$ and $\int_0^{\tau} \dot{y}_N(s; d)\, ds$ are the end-of-line production counts without and with the disruption event, respectively.
If the duration of a disruption event is longer than the machine's opportunity window, permanent production loss is incurred on the whole line. Otherwise, the disruption only causes temporary stoppages of adjacent machines, which are eventually recovered. Since a CM action is carried out whenever a downtime event occurs, determining whether the CM action will cause permanent production loss requires first evaluating the opportunity window at time $t$.
In our previous work, we further introduced the concept of permanent production loss, defined as the difference between the output of the real production line and that of a virtual ideal scenario of the same line [30]. The virtual ideal scenario is one in which there are no disruptions due to machine failures or other causes. For a serial production line, the slowest machine eventually determines the production output [15].
Thus, consider a maintenance action $e_{i,j} = (i, m_{i,j}, t_{i,j}, d_{i,j})$. If $d_{i,j} \le OW_i(t_{i,j})$, the maintenance does not incur permanent production loss; however, if $d_{i,j} > OW_i(t_{i,j})$, the maintenance will halt the slowest machine, which results in permanent production loss. Therefore, the permanent production loss caused by maintenance action $e_{i,j}$ can be formulated as:
$$PL_{e_{i,j}} = \max\left\{ \frac{d_{i,j} - OW_i(t_{i,j})}{T_{S^*}},\ 0 \right\} \tag{7}$$
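Eqn. (7) converts the part of a repair that exceeds the opportunity window into parts lost at the slowest machine. A minimal sketch, assuming $OW_i(t_{i,j})$ has already been evaluated by the system model; the same function applies to Eqn. (8) for $e_k^*$. The function name is hypothetical, and the numbers in the check are hypothetical apart from the 30-min perfect-repair duration of Table 1.

```python
def permanent_loss(duration: float, opportunity_window: float,
                   slowest_cycle_time: float) -> float:
    """Eq. (7)/(8): PL = max{(d - OW) / T_{S*}, 0}, measured in parts."""
    if slowest_cycle_time <= 0:
        raise ValueError("cycle time must be positive")
    # Only the downtime beyond the opportunity window reaches the slowest machine
    return max((duration - opportunity_window) / slowest_cycle_time, 0.0)
```

A 30-min repair against a 12-min opportunity window with $T_{S^*} = 9$ min loses (30 - 12)/9 = 2 parts, while a repair shorter than the window loses none.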
From Eqn. (7), maintenance action $e_{i,j}$ causes permanent production loss $PL_{e_{i,j}}$ only if it lasts longer than the opportunity window of machine $M_i$, and the observation interval is from $t_{i,j}$ to $t_{i,j} + d_{i,j}$. Therefore, $PL_{e_{i,j}}$ only measures the impact of $e_{i,j}$ on the system within its duration. However, $e_{i,j}$ also has a lasting impact on the system because it changes the system states at time $t_{i,j}$. To evaluate this lasting impact, the production loss risk $PLR_{e_{i,j}}$ is introduced to measure the difference in production loss of the first disruption event following $e_{i,j}$ between the scenarios with and without $e_{i,j}$ [7].
For example, suppose that after time $t_{i,j}$ the first random failure occurs on machine $M_k$ at time $t_k^*$, and a maintenance action $e_k^* = (k, m_k^*, t_k^*, d_k^*)$ is taken with the same level as $e_{i,j}$, i.e., $m_k^* = m_{i,j}$. Since $e_k^*$ is the only disruption event following $e_{i,j}$, with no other events between $t_{i,j}$ and $t_k^*$, the full list of disruption events can be written as $\mathcal{E} = [\mathcal{E}_0, e_{i,j}, e_k^*]$, in which $\mathcal{E}_0$ is the list of existing downtime events up to time $t_{i,j}$.
With the list of near-future disruption events $\mathcal{E}$ and the current system states, the set of buffer levels $\mathbf{B}(t_k^*) = [b_2(t_k^*), b_3(t_k^*), \dots, b_N(t_k^*)]$ at time $t_k^*$ can be derived, from which the opportunity window $OW_k(t_k^*)$ can be evaluated. Then, the permanent production loss $PL_{e_k^*}(t_k^*)$ caused by $e_k^*$ can be derived as:
$$PL_{e_k^*}(t_k^*) = \max\left\{ \frac{d_k^* - OW_k(t_k^*)}{T_{S^*}},\ 0 \right\} \tag{8}$$
The disruption event $e_k^*$ happens with probability $P(k, t_{i,j}, t_k^*)$, determined by the reliability of machine $M_k$. Because the machines fail independently of each other, $P(k, t_{i,j}, t_k^*)$ is the joint probability over all machines, i.e.,
$$P(k, t_{i,j}, t_k^*) = P_k(t_{i,j}, t_k^*) \prod_{l=1,\, l \ne k}^{N} \left[ 1 - \int_0^{t_k^*} P_l(t_{i,j}, s)\, ds \right] \tag{9}$$
where $P_l(t_{i,j}, \cdot)$ is the failure probability density of machine $M_l$.
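Eqn. (9) can be evaluated in closed form once a lifetime model is fixed. The sketch below assumes the Weibull model used later in the case study, measures time from $t_{i,j}$, and takes each $P_l$ to be the failure density conditional on the machine's current virtual age, so that $1 - \int_0^{t} P_l\, ds$ becomes the conditional survival probability. These modeling choices and the function names are assumptions of this example.

```python
import math
from typing import Sequence


def survival(t: float, eta: float, beta: float) -> float:
    """Weibull survival function R(t) = exp(-(t / eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def conditional_density(s: float, age: float, eta: float, beta: float) -> float:
    """Density of failing s time units after t_ij, given virtual age `age`."""
    x = age + s
    pdf = (beta / eta) * (x / eta) ** (beta - 1) * math.exp(-((x / eta) ** beta))
    return pdf / survival(age, eta, beta)


def first_failure_density(k: int, t: float, ages: Sequence[float],
                          etas: Sequence[float], betas: Sequence[float]) -> float:
    """Eq. (9): machine k fails at time t while every other machine survives to t."""
    p = conditional_density(t, ages[k], etas[k], betas[k])
    for other in range(len(ages)):
        if other != k:
            # 1 - integral_0^t P_l ds equals the conditional survival R(V + t) / R(V)
            p *= (survival(ages[other] + t, etas[other], betas[other])
                  / survival(ages[other], etas[other], betas[other]))
    return p
```

Integrating this density over time and summing over all machines gives 1, since exactly one machine fails first; for two identical new machines, each is the first to fail with probability 1/2.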
The expected value of the permanent production loss of $e_k^*$ can be evaluated as:
$$E\big[PL_{e_k^*}\big] = \sum_{k=1}^{N} \int_0^{\infty} P(k, t_{i,j}, t_k^*)\, PL_{e_k^*}(t_k^*)\, dt_k^* \tag{10}$$
Note that it is the potential disruption event $e_k^*$, rather than the initial disruption event $e_{i,j}$, that causes $PL_{e_k^*}(t_k^*)$. However, the existence of $e_{i,j}$ alters the buffer levels $\mathbf{B}(t_k^*)$ at time $t_k^*$ and thus changes $OW_k(t_k^*)$, which results in a different $PL_{e_k^*}(t_k^*)$. Therefore, the production loss of $e_k^*$ in the case where $e_{i,j}$ does not happen at time $t_{i,j}$ should also be calculated to evaluate how $e_{i,j}$ influences the future event $e_k^*$.
Consider the case where the full disruption event list is $\tilde{\mathcal{E}} = [\mathcal{E}_0, e_k^*]$, i.e., the list without $e_{i,j}$. The buffer levels $\tilde{\mathbf{B}}(t_k^*)$, the opportunity window $\widetilde{OW}_k(t_k^*)$ and the production loss $\widetilde{PL}_{e_k^*}(t_k^*)$ can be similarly evaluated at time $t_k^*$. Therefore, the expected production loss of the disruption event $e_k^*$ without $e_{i,j}$ can be formulated as:
$$E\big[\widetilde{PL}_{e_k^*}\big] = \sum_{k=1}^{N} \int_0^{\infty} \tilde{P}(k, t_{i,j}, t_k^*)\, \widetilde{PL}_{e_k^*}(t_k^*)\, dt_k^* \tag{11}$$
where $\tilde{P}(k, t_{i,j}, t_k^*)$ is the probability that machine $M_k$ fails at time $t_k^*$. Note that $\tilde{P}(k, t_{i,j}, t_k^*) \ne P(k, t_{i,j}, t_k^*)$, since $\tilde{P}(k, t_{i,j}, t_k^*)$ is derived by assuming that machine $M_i$ does not undergo $e_{i,j}$. Finally, the production loss risk of maintenance action $e_{i,j}$ is defined as the difference between the expected production losses with and without $e_{i,j}$:
$$PLR_{e_{i,j}} = E\big[PL_{e_k^*}\big] - E\big[\widetilde{PL}_{e_k^*}\big] \tag{12}$$
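Eqns. (10)-(12) only require integrating probability-weighted losses. The sketch below truncates the infinite upper limit at a finite horizon and applies the trapezoidal rule; the density and loss callables are hypothetical stand-ins for the system model that produces $P$, $\tilde{P}$, $PL_{e_k^*}$ and $\widetilde{PL}_{e_k^*}$, and the function names are assumptions.

```python
from typing import Callable

# P(k, t_ij, t*) and PL_{e_k*}(t*) viewed as functions of (k, t*); both are
# hypothetical callables supplied by a system model.
Density = Callable[[int, float], float]
Loss = Callable[[int, float], float]


def expected_loss(n_machines: int, density: Density, loss: Loss,
                  horizon: float, step: float) -> float:
    """Eqs. (10)-(11): sum_k integral_0^inf P(k, t) PL_k(t) dt.

    The infinite upper limit is truncated at `horizon`, and each integral is
    approximated by the composite trapezoidal rule on a uniform grid.
    """
    n = int(round(horizon / step))
    total = 0.0
    for k in range(n_machines):
        acc = 0.5 * (density(k, 0.0) * loss(k, 0.0)
                     + density(k, n * step) * loss(k, n * step))
        for m in range(1, n):
            t = m * step
            acc += density(k, t) * loss(k, t)
        total += acc * step
    return total


def production_loss_risk(n_machines: int,
                         density_with: Density, loss_with: Loss,
                         density_without: Density, loss_without: Loss,
                         horizon: float, step: float) -> float:
    """Eq. (12): PLR = E[PL of e_k* with e_ij] - E[PL of e_k* without e_ij]."""
    return (expected_loss(n_machines, density_with, loss_with, horizon, step)
            - expected_loss(n_machines, density_without, loss_without, horizon, step))
```

The horizon should be long enough that the failure densities have effectively decayed to zero; otherwise the truncation biases both expectations downward.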
With the production loss and the production loss risk of $e_{i,j}$ derived, we can quantify the impact of a maintenance action on the whole system. Using this property, we next derive the payoff $p_i^{m_i}(t)$ of the OTA problem for CM dispatching.
4. Maintenance Cost Analysis for OTA
We have formulated real-time CM dispatching as an OTA problem. The payoff $p_i^{m_i}(t)$ is directly related to the production loss discussed in the previous section, as well as to maintenance-related cost. One key point is that the aging condition of a machine has significant implications for maintenance-related cost. Thus, the failure rate of the machines in our system is not constant but changes over time.
4.1 Maintenance Cost Analysis
The cost of a maintenance action contains two parts: resource cost and production loss cost. The resource cost includes the cost of consumable materials and of machine or equipment replacement, depending on the maintenance level. Meanwhile, energy consumption may be reduced during a maintenance action because machines are stopped. Therefore, given a maintenance action $e_{i,j} = (i, m_{i,j}, t_{i,j}, d_{i,j})$, its cost, denoted as $C_{e_{i,j}}$, can be evaluated as:
$$C_{e_{i,j}} = c_{m_{i,j}} + c_p \left( PL_{e_{i,j}} + PLR_{e_{i,j}} \right) - ES_{e_{i,j}} \tag{13}$$
where $c_{m_{i,j}}$ is the resource cost of maintenance level $m_{i,j}$, $c_p$ is the profit per part, and $PL_{e_{i,j}}$ and $PLR_{e_{i,j}}$ are the permanent production loss and the production loss risk due to $e_{i,j}$, respectively, both derived in the previous section. $ES_{e_{i,j}}$ is the electrical energy saving during $d_{i,j}$, which can be evaluated as:
$$ES_{e_{i,j}} = c_e \cdot \sum_{k=1}^{N} \tau_{k, e_{i,j}} \tag{14}$$
where $c_e$ is the unit cost of electricity and $\tau_{k, e_{i,j}}$ is the stoppage time of machine $M_k$ due to maintenance action $e_{i,j}$, which may cause $M_k$ to be starved or blocked.
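Eqns. (13)-(14) combine into a one-line cost function. In this sketch $c_e$ is treated as the electricity cost per minute of machine stoppage so that the units of Eqn. (14) agree (an assumption), the function name is hypothetical, and all inputs other than the 600 USD resource cost of imperfect maintenance from Table 1 are hypothetical.

```python
from typing import Sequence


def maintenance_cost(resource_cost: float, profit_per_part: float,
                     pl: float, plr: float, electricity_cost: float,
                     stoppage_times: Sequence[float]) -> float:
    """Eqs. (13)-(14): C = c_m + c_p (PL + PLR) - ES with ES = c_e * sum_k tau_k."""
    energy_saving = electricity_cost * sum(stoppage_times)   # Eq. (14)
    return resource_cost + profit_per_part * (pl + plr) - energy_saving
```

With $c_p = 20$, $PL + PLR = 2.5$ parts, $c_e = 0.1$ and 20 min of total stoppage, the cost is 600 + 50 - 2 = 648.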
One may argue that the total cumulative cost over a finite time horizon is not an appropriate guide for optimal maintenance decisions. Although perfect maintenance may cost much more than imperfect maintenance, it may give the machine a longer operation time before the next failure and thus reduce the CM frequency and the associated cost. However, this relationship is not explicitly modeled in $C_{e_{i,j}}$. Thus, a normalized cost rate is needed, which evaluates the maintenance cost per unit time until the next failure and thereby captures the machine aging condition for a given level of CM. The real-time maintenance cost rate $CR_{e_{i,j}}$ is defined as:
$$CR_{e_{i,j}} = \frac{C_{e_{i,j}}}{L_{i,j} + d_{i,j}} \tag{15}$$
where $d_{i,j}$ is the duration of the maintenance and $L_{i,j}$ is the lifetime of machine $M_i$ after its $j$-th maintenance, which is a random variable with distribution $P(L_{i,j} \mid V_{i,j})$ following the aging model of machines [8]. The virtual age $V_{i,j}$ can be evaluated as:
$$V_{i,j} = \delta_{m_{i,j}} \left( V_{i,j-1} + L_{i,j-1} \right) \tag{16}$$
where $\delta_{m_{i,j}}$ is the age reduction factor of maintenance level $m_{i,j}$, $V_{i,j-1}$ is the virtual age of machine $M_i$ right after its $(j-1)$-th maintenance action is finished, and $L_{i,j-1}$ is the lifetime after the $(j-1)$-th maintenance action. For perfect maintenance, $\delta_{m_{i,j}} = 0$, since the virtual age of the machine restarts from zero, while $0 < \delta_{m_{i,j}} < 1$ for imperfect maintenance.
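Eqn. (16) can be applied recursively over a machine's repair history. The sketch uses $\delta = 0.5$ for imperfect and $\delta = 0$ for perfect maintenance, matching Table 1; the initial age, the lifetimes and the function names are hypothetical.

```python
from typing import List, Sequence


def virtual_age_after_repair(delta: float, prev_age: float, prev_lifetime: float) -> float:
    """Eq. (16): V_{i,j} = delta * (V_{i,j-1} + L_{i,j-1})."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("age reduction factor must lie in [0, 1)")
    return delta * (prev_age + prev_lifetime)


def age_history(deltas: Sequence[float], initial_age: float,
                lifetimes: Sequence[float]) -> List[float]:
    """Virtual ages right after each repair, given the preceding lifetimes."""
    ages = []
    age = initial_age
    for delta, life in zip(deltas, lifetimes):
        age = virtual_age_after_repair(delta, age, life)
        ages.append(age)
    return ages
```

Starting from an age of 40 min with lifetimes of 200, 300 and 100 min and repairs at levels 2c, 2c and 1c, the virtual ages after each repair are 120, 210 and 0 min.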
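The probability of the time to failure is conditional on the machine's virtual age $V_{i,j}$, i.e.,
$$P(L_{i,j} \mid V_{i,j}) = \frac{g_i(V_{i,j} + L_{i,j})}{\int_{V_{i,j}}^{\infty} g_i(s)\, ds}, \quad L_{i,j} > 0 \tag{17}$$
where $g_i(\cdot)$ is the probability density function of the lifetime of machine $M_i$.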
Therefore, the expected cost rate after the $j$-th maintenance and before the next failure can be calculated as:
$$E\big[CR_{e_{i,j}}\big] = \int_0^{\infty} \frac{C_{e_{i,j}}}{L_{i,j} + d_{i,j}}\, P(L_{i,j} \mid V_{i,j})\, dL_{i,j} \tag{18}$$
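Eqn. (18) generally has no closed form, so numerical quadrature is a natural way to evaluate it. The sketch below truncates the upper limit and uses the trapezoidal rule; treating $C_{e_{i,j}}$ and $d_{i,j}$ as fixed for a given level, the function name and the truncation point are assumptions of this example. The density argument can be, for instance, the Weibull conditional density of Eqn. (23).

```python
from typing import Callable


def expected_cost_rate(cost: float, duration: float,
                       life_density: Callable[[float], float],
                       upper: float, steps: int = 20000) -> float:
    """Eq. (18): E[CR] = integral_0^inf C / (L + d) * P(L | V) dL.

    life_density(L) stands in for P(L | V) at the post-repair virtual age V.
    The infinite upper limit is truncated at `upper`, and the integral is
    evaluated with the composite trapezoidal rule using `steps` intervals.
    """
    if duration <= 0 or upper <= 0 or steps <= 0:
        raise ValueError("duration, upper and steps must be positive")
    h = upper / steps

    def integrand(life: float) -> float:
        return cost / (life + duration) * life_density(life)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for m in range(1, steps):
        total += integrand(m * h)
    return total * h
```

Evaluating this once per candidate level, with the post-repair virtual age given by Eqn. (16), and keeping the smaller value gives the level choice discussed next.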
Note that the expected cost rate accounts for both the maintenance cost and the aging condition under different maintenance levels. When a random failure occurs at a machine, the maintenance level should be chosen to minimize the expected cost rate $E[CR_{e_{i,j}}]$.
4.2 Evaluation of the Payoff Function
Based on the cost analysis above, the expected cost rate of each specific maintenance action can be derived from Eqn. (18). Thus, we can compare the expected cost rates of all possible maintenance prioritizations and maintenance levels and find the minimum. In other words, whenever a maintenance action is needed, we can obtain the minimum expected cost rate and the corresponding maintenance assignment. Examining the OTA formulation of the maintenance scheduling problem in Section 2.2.2, we conclude that the expected cost rate $E[CR_{e_{i,j}}]$ is a reasonable value for the payoff $p_i^{m_i}(t)$ in Eqn. (2). With this payoff definition, we do not need to solve the integer linear programming problem of OTA. We can prove that, with the payoff set to $E[CR_{e_{i,j}}]$, the optimal solution of the OTA problem indeed minimizes the total expected cost rate. Proposition 1 makes this precise.
Proposition 1. Let $p_i^{m_i}(t) = E[CR_{e_{i,j}}]$, and assume $\mathbf{X}^{opt}(t) = [x_1^{m_1}(t), x_2^{m_2}(t), \dots, x_N^{m_N}(t)]^T$ is a solution of the OTA problem in Eqn. (2). Then this solution is the optimal CM assignment, which guarantees the minimum total cost of the system.
Proof: Let $\mathbf{P}(t) = [p_1^{m_1}(t), p_2^{m_2}(t), \dots, p_N^{m_N}(t)]$ denote the set of payoffs at time $t$, and $\mathbf{X}(t) = [x_1^{m_1}(t), x_2^{m_2}(t), \dots, x_N^{m_N}(t)]^T$ the set of maintenance allocations. Then the problem
$$\min_{x_i^{m_i}(t),\, m_i} \sum_{i=1}^{N} p_i^{m_i}(t)\, x_i^{m_i}(t) \tag{19}$$
can be rewritten as:
$$\min_{\mathbf{X}(t)} \mathbf{P}(t) \cdot \mathbf{X}(t) \tag{20}$$
Since $\mathbf{X}^{opt}(t)$ is the optimal solution that minimizes the product $\mathbf{P}(t) \cdot \mathbf{X}(t)$, we have $x_{(k)}^{m_{(k)}}(t) = 1$ for $k = 1, \dots, \kappa(t)$, where the subscript $(k)$ indicates that $p_{(k)}^{m_{(k)}}(t)$ is the $k$-th smallest element of $\mathbf{P}(t)$, and $\kappa(t) = \min\{n_s(t), n_q(t)\}$ is the number of executable tasks at time $t$. Therefore, $\mathbf{X}^{opt}(t)$ provides the allocation, with the respective maintenance levels $[m_1, m_2, \dots, m_N]$, that follows the ranking of the elements of $\mathbf{P}(t)$. Since the ranking of $p_i^{m_i}(t)$ is the ranking of the expected cost rates, $\mathbf{X}^{opt}(t)$ provides the optimal maintenance assignment, which guarantees the minimal expected cost rate at time $t$. In this way, the minimum total cost over the time horizon is obtained by integrating the cost of each maintenance scheduling decision at time $t$.
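By Proposition 1, the OTA solution reduces to ranking: each failed machine takes its cheaper maintenance level, and the $\min\{n_s(t), n_q(t)\}$ machines with the smallest expected cost rates are served. A minimal sketch with hypothetical payoff values; the dictionary layout and the function name are assumptions.

```python
from typing import Dict, List, Tuple


def allocate_staff(payoffs: Dict[int, Dict[str, float]],
                   n_staff: int) -> List[Tuple[int, str, float]]:
    """Serve the failed machines with the lowest expected cost rates.

    payoffs[i][m] is p_i^m(t) = E[CR] for failed machine i repaired at level m.
    Each machine first takes its cheaper level; machines are then ranked by
    that payoff and min(n_s(t), n_q(t)) of them are served.
    """
    best = []
    for machine, levels in payoffs.items():
        level = min(levels, key=levels.__getitem__)   # cheaper CM level
        best.append((machine, level, levels[level]))
    best.sort(key=lambda entry: entry[2])              # ascending payoff
    kappa = min(max(n_staff, 0), len(best))            # executable tasks
    return best[:kappa]
```

With payoffs {3: {1c: 4.2, 2c: 3.9}, 5: {1c: 2.5, 2c: 2.8}, 7: {1c: 6.0, 2c: 5.1}} and two available staff, machine 5 receives perfect and machine 3 imperfect maintenance, while machine 7 waits.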
Based on the analysis above, the optimal CM scheduling policy is shown as a flow chart in Fig. 2 and as the structured algorithm below.
Fig. 2. Flow chart of the real-time CM assignment policy.
Initialize $T_i$, $B_i$, $b_i(0)$, machine aging, failure rates and the number of maintenance staff.
Start the production simulation from $t \leftarrow 0$.
Repeat:
1) Check the number of failed machines $n_q(t)$.
2) Check the number of available maintenance staff $n_s(t)$.
3) If $n_q(t) \ne 0$ and $n_s(t) > 0$:
4) Check all allocations of maintenance staff $\mathbf{X}(t)$.
5) Check all corresponding maintenance costs $\mathbf{C}(t)$.
6) For each $\mathbf{X}(t)$, calculate the corresponding expected cost rates $p_i^{m_i}(t) \leftarrow E[CR_{e_{i,j}}]$ and the reward $r \leftarrow \sum_{i=1}^{N} p_i^{m_i}(t)\, x_i^{m_i}(t)$.
7) Find the optimal allocation $\mathbf{X}^{opt}(t)$.
8) Update the total maintenance cost $C_m \leftarrow C_m + \mathbf{C}(t) \cdot \mathbf{X}^{opt}(t)$.
9) Else, $t \leftarrow t + 1$.
10) Until $t = T$.
Calculate the total cost $C_{total} \leftarrow C_m + n_{staff}\, C_{labor}$.
Output the total maintenance cost $C_{total}$.
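The loop of Fig. 2 can be sketched as a time-stepped simulation. Here the time step is one minute, failures come from a given schedule, repaired machines do not fail again unless the schedule says so, and the payoff and cost callables stand in for Eqns. (18) and (13); all of these, along with the function name, are assumptions. Instead of enumerating every allocation in steps 4-6, the sketch uses the ranking of Proposition 1, and labor cost is added as in Eqn. (21).

```python
from typing import Callable, Dict, List, Tuple


def run_ota_policy(horizon: int, n_staff: int, failures: Dict[int, List[int]],
                   payoff: Callable[[int, str, int], float],
                   cost: Callable[[int, str, int], float],
                   durations: Dict[str, int], labor_cost: float) -> float:
    """Time-stepped sketch of the Fig. 2 loop, one step per minute (assumed).

    failures[t] lists the machines that break down at minute t; payoff(i, m, t)
    returns E[CR] and cost(i, m, t) returns C_e for repairing machine i at
    level m from minute t. All three inputs are hypothetical stand-ins.
    """
    queue: List[int] = []        # failed machines waiting for CM
    busy_until: List[int] = []   # finishing minute of each busy staff member
    maintenance_cost = 0.0       # C_m in the algorithm
    for t in range(horizon):
        queue.extend(failures.get(t, []))                  # step 1: n_q(t)
        busy_until = [f for f in busy_until if f > t]      # release finished staff
        free = n_staff - len(busy_until)                   # step 2: n_s(t)
        if not queue or free <= 0:                         # step 3, else branch
            continue
        options: List[Tuple[float, int, str]] = []
        for i in queue:                                    # steps 4-6
            level = min(durations, key=lambda m: payoff(i, m, t))
            options.append((payoff(i, level, t), i, level))
        options.sort()                                     # rank by E[CR]
        for _, i, level in options[:free]:                 # step 7
            maintenance_cost += cost(i, level, t)          # step 8
            busy_until.append(t + durations[level])
            queue.remove(i)
    return maintenance_cost + n_staff * labor_cost         # C_total, Eq. (21)
```

Because every free staff member is assigned in the same minute, the loop then finds either no free staff or an empty queue, which matches the "Else, $t \leftarrow t + 1$" branch of the algorithm.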
5. Case Study
To evaluate the effectiveness of the OTA-based maintenance policy, extensive numerical simulations are performed. The performance metric is the total cost within the time interval $(0, T]$, which includes the maintenance-related cost and the labor cost. The total cost $C_{total}$ is defined as:
$$C_{total} = \sum_{i=1}^{N} \sum_{j=1}^{J_i(T)} C_{e_{i,j}} + n_{staff}\, C_{labor} \tag{21}$$
in which $J_i(T)$ is the total number of maintenance work orders on machine $M_i$ up to time $T$, $n_{staff}$ is the total number of maintenance staff within the time interval $(0, T]$, and $C_{labor}$ is the unit cost of a maintenance staff member. We evaluate and compare the total cost $C_{total}$ of the proposed OTA-based maintenance policy with that of three benchmark policies. The first, FCFS-1c, always allocates perfect maintenance in the order of machine breakdowns. The second, FCFS-2c, always allocates imperfect maintenance in the order of machine breakdowns. The third, FCFS-r, allocates maintenance levels according to a predefined machine age threshold, with the maintenance order following the order of machine breakdowns: when a machine's age exceeds the threshold, perfect maintenance is performed; otherwise, imperfect maintenance is carried out. The threshold is set to 50 min in this experiment. All three benchmark policies are commonly used in practice.
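The three benchmarks differ only in how they pick the CM level; all serve machines first come, first served. A sketch with hypothetical function names, in which "age" is taken to be the machine's virtual age (an assumption).

```python
from typing import Dict, List


def benchmark_level(policy: str, machine_age: float, threshold: float = 50.0) -> str:
    """CM level chosen by the FCFS benchmark policies for a failed machine."""
    if policy == "FCFS-1c":
        return "1c"                                   # always perfect
    if policy == "FCFS-2c":
        return "2c"                                   # always imperfect
    if policy == "FCFS-r":
        # perfect only once the machine age goes beyond the threshold
        return "1c" if machine_age > threshold else "2c"
    raise ValueError(f"unknown policy: {policy}")


def fcfs_order(breakdown_times: Dict[int, float]) -> List[int]:
    """Machines in first-come, first-served order of their breakdown times."""
    return sorted(breakdown_times, key=breakdown_times.__getitem__)
```

Unlike the OTA policy, none of these rules looks at the opportunity window or the expected cost rate, which is the gap the comparison below is designed to expose.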
To perform the experiment, 100 different serial production lines were constructed by randomly selecting parameters from the following sets:
$T_i \in [5, 10]$ min, $i = 1, 2, \dots, 10$;
$B_i \in [2, 40]$, $i = 2, \dots, 10$;
$b_i(0) \in [0, B_i]$, $i = 2, \dots, 10$.
For demonstration purposes, the results from 5 different lines with 10 machines and 9 buffers are shown. For each production line, the simulation time is one week, i.e., $T = 10080$ min, and each simulation is repeated 20 times. The widely used Weibull distribution is adopted to model machine reliability, with scale parameter $\eta_i$ and shape parameter $\beta_i$. Its probability density function is:
$$g_i(t) = \frac{\beta_i}{\eta_i} \left( \frac{t}{\eta_i} \right)^{\beta_i - 1} \exp\left[ -\left( \frac{t}{\eta_i} \right)^{\beta_i} \right] \tag{22}$$
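Given the virtual age $V_{i,j}$ of the machine, the conditional probability density of the time to failure is
$$P(L_{i,j} \mid V_{i,j}) = \frac{\beta_i}{\eta_i} \left( \frac{L_{i,j} + V_{i,j}}{\eta_i} \right)^{\beta_i - 1} \exp\left[ \left( \frac{V_{i,j}}{\eta_i} \right)^{\beta_i} - \left( \frac{L_{i,j} + V_{i,j}}{\eta_i} \right)^{\beta_i} \right] \tag{23}$$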
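Eqn. (23) follows from Eqn. (17) because the Weibull tail integral is $\int_V^\infty g_i(s)\,ds = \exp[-(V/\eta_i)^{\beta_i}]$. The sketch below checks this numerically by evaluating both forms; the function names are hypothetical.

```python
import math


def weibull_pdf(t: float, eta: float, beta: float) -> float:
    """Eq. (22): Weibull probability density g_i(t)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def conditional_closed_form(life: float, age: float, eta: float, beta: float) -> float:
    """Eq. (23): P(L | V) for a Weibull machine with virtual age V."""
    x = life + age
    return (beta / eta) * (x / eta) ** (beta - 1) * math.exp((age / eta) ** beta - (x / eta) ** beta)


def conditional_from_definition(life: float, age: float, eta: float, beta: float) -> float:
    """Eq. (17) with the Weibull tail integral_V^inf g(s) ds = exp(-(V / eta)^beta)."""
    return weibull_pdf(age + life, eta, beta) / math.exp(-((age / eta) ** beta))
```

The two forms agree to floating-point precision; the closed form of Eqn. (23) is the one worth implementing, since it avoids dividing by a survival probability that can underflow for very old machines.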
The shape parameter $\beta_i$ of every machine is set to 2 in this case study, and the scale parameter $\eta_i$ and the initial virtual age $V_{i,0}$ are picked randomly from:
$\eta_i \in [500, 2000]$ min, $i = 1, 2, \dots, 10$;
$V_{i,0} \in [0, 100]$ min, $i = 1, 2, \dots, 10$.
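The random line generation can be reproduced with a seeded sampler. The sketch assumes uniform sampling over each range and integer buffer capacities, neither of which is stated explicitly in the text; the function name and the dictionary keys are hypothetical.

```python
import random
from typing import Dict, List


def random_line(n_machines: int = 10, seed: int = 0) -> Dict[str, List[float]]:
    """Sample one serial line's parameters from the ranges of the case study."""
    rng = random.Random(seed)
    cycle_time = [rng.uniform(5.0, 10.0) for _ in range(n_machines)]        # T_i (min)
    capacity = [float(rng.randint(2, 40)) for _ in range(n_machines - 1)]   # B_i, i = 2..N
    initial_level = [rng.uniform(0.0, b) for b in capacity]                 # b_i(0) in [0, B_i]
    eta = [rng.uniform(500.0, 2000.0) for _ in range(n_machines)]           # Weibull scale (min)
    initial_age = [rng.uniform(0.0, 100.0) for _ in range(n_machines)]      # V_{i,0} (min)
    return {"cycle_time": cycle_time, "capacity": capacity,
            "initial_level": initial_level, "eta": eta,
            "beta": [2.0] * n_machines, "initial_age": initial_age}
```

Calling `random_line(seed=s)` for s = 0, ..., 99 yields 100 lines, and the slowest machine $M_{S^*}$ of each is the one with the largest sampled cycle time.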
The maintenance parameters are shown in Table 1. The maintenance staff cost is assumed to be 24 USD per hour.
Table 1. Maintenance Parameters
Maintenance level $m$ | Age reduction factor $\delta_m$ | Maintenance duration (min) | Resource cost (USD)
1c | 0 | 30 | 1200
2c | 0.5 | 15 | 600
First, we compare the total cost defined in Eqn. (21) for all four policies with 1 to 5 maintenance staff. The average total cost of each of the 5 lines is shown in Fig. 3(a)-(e), one panel per number of maintenance staff. The small "I" bars on top of each bar represent the 95% confidence interval (CI). Overall, the OTA policy yields the lowest average total cost, which demonstrates that the OTA-based maintenance policy is effective in reducing the total maintenance cost. Note that with 1 or 2 maintenance staff, the 95% CIs of the OTA and FCFS-r policies overlap slightly for line 4 and line 5, whereas with 3 or more staff the 95% CI of OTA does not overlap with that of any other policy. This suggests that 1 or 2 staff may not be enough and that 3 or more staff are required for these lines. Although the OTA policy can assign the appropriate CM level to the right machines to minimize the total cost, an insufficient number of maintenance staff limits the benefit of the algorithm and leads to additional production loss.
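The confidence intervals and the overlap comparison described above can be reproduced along the following lines. This sketch assumes a two-sided Student-t interval over the 20 independent replications of each configuration (t quantile 2.093 for 19 degrees of freedom); `ci95` and `intervals_overlap` are hypothetical names, and the replication costs are made-up placeholders, not results from Fig. 3.

```python
import math
import statistics

T_975_DF19 = 2.093  # two-sided 95% Student-t quantile for n = 20 replications


def ci95(samples):
    """Mean and 95% CI half-width of replication results (valid for n = 20 only)."""
    if len(samples) != 20:
        raise ValueError("the quantile above is only valid for 20 replications")
    mean = statistics.mean(samples)
    half = T_975_DF19 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half


def intervals_overlap(a, b):
    """True if the 95% CIs of two policies overlap."""
    (ma, ha), (mb, hb) = ci95(a), ci95(b)
    return abs(ma - mb) <= ha + hb


# Placeholder replication costs (USD) for two policies on one line
ota = [100_000 + 1_000 * k for k in range(20)]
fcfs_r = [130_000 + 1_000 * k for k in range(20)]
print(intervals_overlap(ota, fcfs_r))  # False
```

Non-overlapping intervals, as reported for 3 or more staff, are a conservative indication that the difference in mean cost is not due to replication noise.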
[Fig. 3 (a) to (e): bar charts of total cost ($) by production line (Line 1 to Line 5) for FCFS-1c, FCFS-2c, FCFS-r and OTA; panel (a) title: "Total maintenance cost with 1 maintenance staff"]
Fig. 3. The comparison of overall cost among policies.

Furthermore, we compare the total cost of all policies for different numbers of maintenance staff, as shown in Fig. 4. For every policy, the average total cost reaches its minimum with 3 maintenance staff. This implies that 3 maintenance staff may be adequate for this type of serial production line with 10 machines and 9 buffers.
[Fig. 4: "Total maintenance cost with different number of maintenance staff"; total cost ($) versus number of maintenance staff (1 to 5) for FCFS-1c, FCFS-2c, FCFS-r and OTA]
Fig. 4. Comparison of overall cost for different numbers of maintenance staff
To further examine the policies, we track the maintenance cost history and the accumulated production count of the production lines with 3 maintenance staff over the whole simulation time, as shown in Fig. 5 and Fig. 6, respectively. Although the total cost and production count show no obvious difference among the four policies during the initial transient period, the OTA-based maintenance policy outperforms all other policies in the long run. This is because the OTA-based policy uses the expected cost rate as its payoff function, and an exhaustive search over this payoff guarantees the optimal assignment. In contrast, the FCFS policies do not consider the impact of current maintenance actions on future production and simply assign staff to whichever machine fails first. Consequently, the FCFS policies lead to a lower production count and a higher maintenance cost in the long run.
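The exhaustive search mentioned above can be illustrated with a brute-force enumeration. This is a sketch under simplifying assumptions, not the paper's OTA formulation: each failed machine receives at most one staff member and one CM level (1c or 2c), at most `staff` actions can start now, and `payoff` is a hypothetical table of expected cost rates (lower is better) standing in for the payoff derived earlier in the paper, including a rate for leaving a machine waiting.

```python
from itertools import product

# Hypothetical expected cost rates ($/min) for each failed machine under each option;
# None means the machine waits (no staff assigned now).
payoff = {
    2: {None: 9.0, "1c": 4.0, "2c": 5.0},
    5: {None: 3.0, "1c": 2.5, "2c": 1.5},
    8: {None: 7.0, "1c": 6.5, "2c": 2.0},
}


def best_assignment(payoff, staff):
    """Enumerate every option combination and keep the cheapest feasible one."""
    machines = sorted(payoff)
    best, best_cost = None, float("inf")
    for options in product([None, "1c", "2c"], repeat=len(machines)):
        if sum(o is not None for o in options) > staff:
            continue  # more CM actions than available staff
        cost = sum(payoff[m][o] for m, o in zip(machines, options))
        if cost < best_cost:
            best, best_cost = dict(zip(machines, options)), cost
    return best, best_cost


print(best_assignment(payoff, staff=2))  # ({2: '1c', 5: None, 8: '2c'}, 9.0)
```

With two staff the search serves machines 2 and 8, the two with the largest cost-rate reduction, and lets machine 5 wait; an FCFS rule would instead serve whichever two machines failed first.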
In addition, applying a single maintenance level regardless of machine aging also degrades system performance, as can be observed from Fig. 5 and Fig. 6. The total costs of the FCFS-1c and FCFS-2c policies are much higher than that of the OTA policy, because the OTA policy adaptively chooses between perfect and imperfect maintenance when assigning a CM task. Moreover, FCFS-r outperforms FCFS-1c and FCFS-2c, with a lower total cost and a higher production output, because it uses a fixed age threshold to choose between perfect and imperfect maintenance.
To summarize, optimally prioritizing the limited maintenance staff and performing the appropriate level of maintenance can significantly improve productivity and reduce overall cost. The OTA optimization, which is based on the real-time system model and system properties such as permanent production loss, is effective in this setting.
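[Figure: total cost ($) of each policy over the simulation time]
Fig. 5. Maintenance cost history of policies

[Figure: production count (parts) of each policy over the simulation time]
Fig. 6. Production count among policies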
6. Conclusion
In this paper, the CM scheduling problem in a stochastic serial production system is formulated as an OTA problem. We defined the real-time expected cost rate of multi-level maintenance actions based on machine age, production loss and production loss risk. The expected cost rate is used as the payoff function of the OTA problem, which guarantees an optimal solution, and an optimal maintenance policy is derived from this solution. Numerical experiments are performed, and a simulation case study demonstrates the effectiveness of the OTA-based policy. However, solving the optimization problem could be computationally expensive for very large systems, such as a production line with more than 50 machines. In the future, we will explore more efficient algorithms that incorporate machine learning techniques.
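The computational concern can be quantified under a simplifying assumption (each failed machine receives at most one staff member and one of two CM levels): with $k$ simultaneously failed machines and $m$ staff, the number of candidate assignments is $\sum_{j=0}^{\min(k,m)} \binom{k}{j} 2^j$. The sketch below evaluates this count; `num_assignments` is a hypothetical helper, and the figures are an illustrative count rather than a complexity result from the paper.

```python
from math import comb


def num_assignments(failed, staff, levels=2):
    """Count ways to serve at most `staff` of `failed` machines, each at one of `levels` CM levels."""
    return sum(comb(failed, j) * levels**j for j in range(min(failed, staff) + 1))


for failed, staff in [(3, 2), (10, 3), (50, 5)]:
    print(failed, staff, num_assignments(failed, staff))
```

For example, 3 failed machines and 2 staff give 19 candidates, 10 failed machines and 3 staff give 1,161, and 50 failed machines with 5 staff already give 71,646,921, which is why exhaustive search becomes impractical for very large lines.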
Acknowledgments:
This work was supported by the U.S. National Science Foundation (NSF) under Grants 1351160 and 1853454.
References:
[1] T. Fetene Adane, M. F. Bianchi, A. Archenti, and M. Nicolescu, “Application of system dynamics
for analysis of performance of manufacturing systems,” J. Manuf. Syst., vol. 53, pp. 212–233, 2019.
[2] P. Zheng et al., “Smart manufacturing systems for Industry 4.0: Conceptual framework, scenarios,
and future perspectives,” Front. Mech. Eng., vol. 13, no. 2, pp. 137–150, 2018.
[3] T. Wu, X. Ma, L. Yang, and Y. Zhao, “Proactive maintenance scheduling in consideration of
imperfect repairs and production wait time,” J. Manuf. Syst., vol. 53, pp. 183–194, 2019.
[4] F. Ding and Z. Tian, “Opportunistic maintenance for wind farms considering multi-level imperfect
maintenance thresholds,” Renew. Energy, vol. 45, pp. 175–182, 2012.
[5] P. Do, A. Voisin, E. Levrat, and B. Iung, “A proactive condition-based maintenance strategy with
both perfect and imperfect maintenance actions,” Reliab. Eng. Syst. Saf., vol. 133, pp. 22–32, 2015.
[6] H. Pham and H. Wang, “Imperfect maintenance,” Eur. J. Oper. Res., vol. 94, no. 3, pp. 425–438,
1996.
[7] J. Huang, Q. Chang, J. Zou, and J. Arinez, “A real-time maintenance policy for multi-stage
manufacturing systems considering imperfect maintenance effects,” IEEE Access, vol. 6, pp.
62174–62183, 2018.
[8] H. Wang, “A survey of maintenance policies of deteriorating systems,” Eur. J. Oper. Res., 2002.
[9] E. Ruschel, E. A. P. Santos, and E. de F. R. Loures, “Industrial maintenance decision-making: A
systematic literature review,” J. Manuf. Syst., vol. 45, pp. 180–194, 2017.
[10] Q. Chang, J. Ni, P. Bandyopadhyay, S. Biller, and G. Xiao, “Maintenance Opportunity Planning
System,” J. Manuf. Sci. Eng., 2007.
[11] X. Gu, X. Jin, W. Guo, and J. Ni, “Estimation of active maintenance opportunity windows in
Bernoulli production lines,” J. Manuf. Syst., vol. 45, pp. 109–120, 2017.
[12] H. Pham and H. Wang, “Optimal (τ, T) opportunistic maintenance of a k-out-of-n: G system
with imperfect PM and partial failure,” Nav. Res. Logist., vol. 47, no. 3, pp. 223–239, 2000.
[13] H. Wang and H. Pham, “Optimal preparedness maintenance of multi-unit systems with imperfect
maintenance and economic dependence,” Reliab. Optim. Maint., pp. 135–150, 2006.
[14] X. Wang, Y. Zhang, L. Wang, J. Wang, and J. Lu, “Maintenance grouping optimization with
system multi-level information based on BN lifetime prediction model,” J. Manuf. Syst., vol. 50,
pp. 201–211, 2019.
[15] J. Li and S. M. Meerkov, Production systems engineering. Springer Science & Business Media,
2008.
[16] A. Alrabghi and A. Tiwari, “State of the art in simulation-based optimisation for maintenance
systems,” Comput. Ind. Eng., vol. 82, pp. 167–182, 2015.
[17] Z. M. Yang, D. Djurdjanovic, and J. Ni, “Maintenance scheduling in manufacturing systems based
on predicted machine degradation,” J. Intell. Manuf., vol. 19, no. 1, pp. 87–98, 2008.
[18] M. Pandey, M. J. Zuo, and R. Moghaddass, “Selective maintenance scheduling over a finite
planning horizon,” Proc. Inst. Mech. Eng. Part O J. Risk Reliab., vol. 230, no. 2, pp. 162–177,
2016.
[19] H. Abdollahzadeh, K. Atashgar, and M. Abbasi, “Multi-objective opportunistic maintenance
optimization of a wind farm considering limited number of maintenance groups,” Renew. Energy,
vol. 88, pp. 247–261, 2016.
[20] J. Zou, Q. Chang, J. Arinez, G. Xiao, and Y. Lei, “Dynamic production system diagnosis and
prognosis using model-based data-driven method,” Expert Syst. Appl., 2017.
[21] Z. Yang, Q. Chang, D. Djurdjanovic, J. Ni, and J. Lee, “Maintenance Priority Assignment Utilizing
On-line Production Information,” J. Manuf. Sci. Eng., 2007.
[22] J. Levitt, “Maintenance Management,” Kirk‐Othmer Encycl. Chem. Technol., pp. 1–16, 2000.
[23] B. W. Niebel, Engineering maintenance management. CRC Press, 1994.
[24] T. Xia, X. Fang, N. Gebraeel, L. Xi, and E. Pan, “Online Analytics Framework of Sensor-Driven
Prognosis and Opportunistic Maintenance for Mass Customization,” J. Manuf. Sci. Eng., vol. 141,
no. 5, p. 51011, 2019.
[25] Y. S. Park, G. Egilmez, and M. Kucukvar, “A novel life cycle-based principal component analysis
framework for eco-efficiency analysis: case of the United States manufacturing and transportation
nexus,” J. Clean. Prod., vol. 92, pp. 327–342, 2015.
[26] P. Zurano-Cervelló, C. Pozo, J. M. Mateo-Sanz, L. Jiménez, and G. Guillén-Gosálbez, “Eco-
efficiency assessment of EU manufacturing sectors combining input-output tables and data
envelopment analysis following production and consumption-based accounting approaches,” J.
Clean. Prod., vol. 174, pp. 1161–1189, 2018.
[27] J. Zou, Q. Chang, X. Ou, J. Arinez, and G. Xiao, “Resilient adaptive control based on renewal
particle swarm optimization to improve production system energy efficiency,” J. Manuf. Syst.,
2019.
[28] Q. Chang, G. Xiao, S. Biller, and L. Li, “Energy saving opportunity analysis of automotive serial
production systems (March 2012),” IEEE Trans. Autom. Sci. Eng., 2013.
[29] L. Luo, N. Chakraborty, and K. Sycara, “Competitive analysis of repeated greedy auction algorithm
for online multi-robot task assignment,” in Proceedings - IEEE International Conference on
Robotics and Automation, 2012.
[30] J. Liu, Q. Chang, G. Xiao, and S. Biller, “The Costs of Downtime Incidents in Serial Multistage
Manufacturing Systems,” J. Manuf. Sci. Eng., vol. 134, no. 2, p. 021016, 2012.

