
Version of Record: https://www.sciencedirect.com/science/article/pii/S0278612519300913

Imperfect Corrective Maintenance Scheduling for Energy Efficient Manufacturing Systems Through Online Task Allocation Method

Tian Yu^a, Cheng Zhu^a, Qing Chang^a,*, Junfeng Wang^b

^a Department of Mechanical and Aerospace Engineering, University of Virginia, Charlottesville, VA, 22904, US
^b Department of Industrial and Manufacturing System Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China

Abstract

Maintenance management is critical to smooth production operation in a manufacturing system. Once a machine fails, corrective maintenance (CM) is required to resume its normal operation. However, CM does not always need to be perfect, i.e., restore the failed machine to an as-good-as-new condition. Multi-level, or imperfect, CM is a more realistic and economical option that can be matched to the operational needs of production. In addition, maintenance resources such as maintenance staff are limited in reality. Therefore, how to dispatch the limited maintenance workforce with an appropriate level of CM is critical, since it directly impacts the overall system's productivity, energy efficiency and cost. This paper aims to create a real-time CM scheduling policy that reduces the overall maintenance- and energy-related cost of a stochastic serial production line. To accomplish this goal, the CM scheduling problem is formulated as an online task allocation (OTA) problem. The expected cost rate of the serial production line is introduced and used to define the payoff function of the OTA problem. A numerical case study is provided to evaluate the effectiveness of the OTA-based maintenance policy by comparing it with other heuristic policies.

Keywords: Real-time maintenance policy, Stochastic manufacturing system, Online task allocation, Energy Consumption

1. Introduction

Maintenance activity is important for all manufacturing systems. It plays a critical role in plant floor operation, increasing productivity and quality and reducing overall costs. Modern manufacturing systems are highly complex, with strongly interconnected machines and buffers [1], [2]. A random machine failure not only halts production on a single machine but also propagates through the entire system, leaving other machines starved or blocked. Thus, corrective maintenance (CM) has to be carried out to respond to failures and rectify the faults so that the failed machines can be recovered.

*Corresponding Author.
Email address: qc9nq@virginia.edu (Q. Chang)

© 2019 published by Elsevier. This manuscript is made available under the Elsevier user license
https://www.elsevier.com/open-access/userlicense/1.0/

The classification of maintenance levels has attracted attention in industry and academia [3]. A perfect CM action restores the machine or equipment by replacing the component with a new one, while an imperfect CM action recovers the machine to a state that is better than its pre-maintenance condition but not as good as new. Imperfect CM costs less than perfect CM, but it may accelerate the deterioration of components, resulting in damage to machines or equipment [4], [5]. Therefore, CM actions can be further classified into multiple levels depending on the degree of restoration and the associated cost [6]. Because a modern manufacturing system is complex, with machines and buffers interacting dynamically, the choice of CM level is not merely a local decision on an isolated machine or piece of equipment [7].

Extensive work has been devoted to maintenance policies that consider multi-level maintenance [8], [9]. Most of these studies have focused on single-unit systems. However, maintenance of multi-unit systems differs from that of single-unit systems in that the components of a multi-unit system are interconnected and interdependent. Wang et al. [8] categorize maintenance policies for multi-unit systems into group maintenance policies and opportunistic maintenance policies. Both are based on the idea that the failure of any component affects the operation of the entire system and creates opportunities to maintain other components at the same time without incurring additional production loss. Some studies [10], [11] have explored the opportunity window of maintenance for multi-stage manufacturing systems. However, these studies focus on a single maintenance level and do not discuss multiple imperfect maintenance levels and their implications for machines' reliability and aging conditions.

In addition, most maintenance models for manufacturing systems are policy-based [12], [13], [14]: policies are generated by observing the statistical behavior of the production system and maintenance operations over the long run. Such a statistical description offers only an average expectation of system behavior, so policies built on it may not be optimal, especially in real-time scenarios, because the dynamic behavior of the system is not fully analyzed for decision making [10].


A multi-stage manufacturing system is nonlinear and stochastic, and there is no closed-form representation of the performance of a general serial production line [15]. It is therefore very difficult to determine how machine stoppages caused by maintenance relate to the production loss of the system. To deal with this challenge, simulation-based methods have been developed to search for the optimal maintenance policy [16]. For example, Yang et al. [17] develop a Genetic Algorithm (GA)-based method that searches for the most cost-effective policy by evaluating maintenance schedules through discrete-event simulation. Other studies, such as Pandey et al. [18], apply an evolutionary algorithm to minimize total maintenance, failure and shutdown costs. Abdollahzadeh et al. [19] present a multi-objective model to minimize the total expected cost of maintenance actions. However, simulation-based optimization cannot reveal the relationships between system parameters and performance. Zou et al. [20] have shown that the system production loss caused by random machine failures is directly related to the dynamic states of the system. Therefore, real-time modeling of the production system is necessary to obtain a model-based CM policy.

In a modern lean manufacturing environment, it is not uncommon for the number of maintenance tasks to exceed the number of available maintenance staff at a given time. Maintenance priority therefore directly impacts production performance [21]. A common practice on the plant floor is to prioritize maintenance work with heuristic rules derived from human expert knowledge [22], [23]. Xia et al. [24] have developed an online manufacturing analytics framework that integrates sensor-driven prognostic models and opportunistic maintenance policies in mass customization manufacturing. However, limits on maintenance staff and the impact of multiple maintenance levels are not considered. Meanwhile, with growing worldwide concern about the eco-efficiency of manufacturing [25], [26], the energy- and environment-related performance of manufacturing systems has attracted increasing attention [27], [28]. Maintenance decision making needs a systematic method to trade off production performance, maintenance-related cost and energy efficiency so as to minimize the overall operational cost. Recently, Huang et al. [7] developed a real-time maintenance scheduling method for multi-stage manufacturing systems that uses an expected cost rate concept and considers the aging model of multi-level maintenance. However, staff constraints and energy cost are ignored, which may not reflect practical scenarios.

Existing maintenance scheduling methods for multi-stage production lines either focus on single-level maintenance or do not systematically use real-time production information to account for how a machine's aging status after a given repair level affects system-level performance, such as throughput, energy consumption and overall cost. Timely CM scheduling, which assigns the limited workforce to the right machines and carries out the right level of CM, is critical but challenging because of the nonlinear interactions among machines and the stochastic nature of random machine failures.

This paper addresses the above challenges by obtaining an online CM policy that prioritizes limited maintenance staff to perform an appropriate level of maintenance so as to minimize overall cost. The main contributions of this paper are: 1) formulating CM scheduling as an online task allocation (OTA) problem and defining the OTA payoff function by jointly considering the real-time dynamic status of production, multi-level maintenance cost and energy consumption; 2) evaluating the production loss of various maintenance actions according to real-time system properties and estimating the real-time maintenance cost rate; and 3) developing an optimal CM scheduling policy by solving the OTA problem.

This paper is organized as follows. Section 2 introduces the production system structure and notation, the online task allocation (OTA) problem, and the OTA-based formulation of CM dispatching. In Section 3, a data-driven model of the manufacturing system is introduced and used to evaluate the production loss and production loss risk triggered by maintenance actions. In Section 4, the overall cost and the expected cost rate are derived, from which the solution to the OTA problem and the real-time maintenance policy are obtained. A case study and numerical experiments are given in Section 5. Conclusions and future work are provided in Section 6.

2. System Description and Problem Statement

2.1 Production Line System Description

A serial production line with $M$ machines (represented by rectangles) and $M-1$ intermediate buffers (represented by circles) is shown in Fig. 1. In this paper, the continuous flow model is adopted for convenience of mathematical representation: each buffer level changes continuously between zero and the buffer's maximum capacity.

Fig. 1. Standard structure of a serial production line.

The following notation is adopted in this paper:


• $S_i$ denotes the $i$th machine, $i = 1, 2, \dots, M$;
• $B_i$ denotes the $i$th buffer, located between $S_{i-1}$ and $S_i$, $i = 2, 3, \dots, M$; $B_i$ also denotes the maximum capacity of the buffer;
• $T_i$ denotes the cycle time of machine $S_i$, and $1/T_i$ is the fixed rated speed of $S_i$;
• $S_{M^*}$ denotes the slowest machine in the serial production line, i.e., $M^* = \arg\max_{i} T_i$, $i = 1, 2, \dots, M$;
• $b_i(t)$ denotes the level of buffer $B_i$ at time $t$;
• $m_{i,j} = (i, l_{i,j}, t_{i,j}, d_{i,j})$, $i = 1, 2, \dots, M$, $j = 1, 2, \dots$, denotes the $j$th maintenance action on machine $S_i$, which starts at $t_{i,j}$, lasts for $d_{i,j}$ and has maintenance level $l_{i,j}$;
• $l_{i,j} \in \{1c, 2c\}$ denotes the level of the $j$th maintenance on machine $S_i$. In this paper, we adopt two levels of maintenance: $1c$ denotes perfect maintenance, which restores the machine to as good as new, and $2c$ denotes imperfect maintenance, which restores the machine to a certain aging level;
• $OW_i(t)$ denotes the opportunity window of machine $S_i$ at time $t$;
• $PL_{m_{i,j}}$ denotes the permanent production loss caused by maintenance action $m_{i,j}$;
• $PLR_{m_{i,j}}$ denotes the production loss risk caused by maintenance action $m_{i,j}$;
• $N(t)$ denotes the total number of available maintenance staff at time $t$;
• $Q(t)$ denotes the total number of machines waiting in the queue to be repaired at time $t$.

We also make the following assumptions based on realistic scenarios:

• CM is performed upon failure if and only if at least one maintenance staff member is available, and a failed machine is not recovered until a CM action is finished.
• Two maintenance levels are adopted: a perfect maintenance action restores the machine to “as good as new”, while an imperfect maintenance action recovers the machine to a degree between its current age and new. The recovery effect is evaluated through the age reduction factor of the maintenance level.
• The duration of each maintenance level is deterministic.
• Once a maintenance staff member is assigned to a CM action, that staff member is unavailable until the maintenance is finished. All maintenance staff are assumed to have the same skill level.
• Machine $S_i$, $i = 1, 2, \dots, M-1$, is blocked if its downstream buffer $B_{i+1}$ is full. The end-of-line machine $S_M$ is never blocked.
• Machine $S_i$, $i = 2, 3, \dots, M$, is starved if its upstream buffer $B_i$ is empty. The first machine $S_1$ is never starved.

2.2 Problem Statement

2.2.1 Introduction to Online Task Allocation Problem

An online task allocation (OTA) problem can be described as follows [29]. Consider a mission of $n$ tasks that arise one subset at a time, in $n_K$ disjoint subsets. The allocation strategy, which aims to maximize or minimize the total payoff of the task assignment, allocates $n_W$ staff to the task subsets such that each task is handled by exactly one staff member and each staff member handles at most one task at a time within a subset. This problem can be formulated as an integer linear programming problem:

$$\min_{x_{ij}} \sum_{i=1}^{n} \sum_{j=1}^{n_W} \pi_{ij}\, x_{ij} \qquad (1)$$

s.t.

$$\sum_{j=1}^{n_W} x_{ij} = 1, \quad \forall i = 1, \dots, n$$
$$\sum_{i:\, t_i = \tau_k} x_{ij} \le 1, \quad \forall j, k:\ j = 1, \dots, n_W,\ k = 1, \dots, n_K$$
$$x_{ij} \in \{0, 1\}, \quad \forall i, j$$

where $\pi_{ij}$ is the payoff of the $j$th staff member finishing the $i$th task, and $x_{ij}$ defines the allocation of the staff: it equals 1 if the $i$th task is allocated to the $j$th staff member and 0 otherwise; $t_i$ is the time at which the $i$th task is assigned to a staff member, and $\tau_k$ denotes the time at which the $k$th task subset arises.

2.2.2 Maintenance Scheduling Problem Formulation

With the definition of the OTA problem, the real-time CM scheduling problem can be formulated as an OTA problem in which limited maintenance staff perform an appropriate level of CM to minimize the overall cost in the long run. Since all maintenance staff are assumed to have the same skill level, the payoff $\pi_{ij}$ in Eqn. (1) can be simplified to $\pi_i^{l}(t)$, $i = 1, 2, \dots, M$. Let $x_i^{l}(t)$ be the corresponding decision variable, which equals 1 if a staff member is dispatched to machine $S_i$ to perform level-$l$ maintenance at time $t$, and 0 otherwise.

The goal is to allocate all available staff to failed machines so as to minimize the total payoff of maintenance, i.e.,

$$\min_{x_i^{l}(t),\, l} \sum_{i=1}^{M} \pi_i^{l}(t)\, x_i^{l}(t) \qquad (2)$$

s.t.

$$\sum_{i=1}^{M} x_i^{l}(t) \le N(t)$$
$$x_i^{l}(t) \in \{0, 1\}, \quad \forall i, t, l$$
$$l \in \{1c, 2c\}$$

The first constraint indicates that each available maintenance staff member performs one level of maintenance on exactly one machine at time $t$, so at most $N(t)$ machines are served. In addition, the maintenance level $l$ on machine $S_i$ can be either perfect ($1c$) or imperfect ($2c$). Specifically, for every $x_i^{l}(t) = 1$, the payoff $\pi_i^{l}(t)$ must be compared across the two levels of $l$. Our goal is therefore to find the optimal maintenance allocation $x_i^{l}(t)$ with the appropriate maintenance level $l$ that minimizes the total payoff. One key challenge of this problem is to define the payoff function $\pi_i^{l}(t)$. To find an appropriate payoff function, we need to further understand the dynamic properties of the production system, which are introduced in the next section.
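To make the combinatorial structure of Eqn. (2) concrete, the sketch below solves it by exhaustive search on a small instance. It assumes the payoffs are already known (Section 4 derives them) and, following the statement that all available staff are allocated to failed machines, it assigns exactly min(number of available staff, number of failed machines) machines. The payoff dictionary layout and the function name `brute_force_allocation` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# payoffs[i][level] is the payoff pi_i^l(t) of serving failed machine i at level "1c" or "2c".
Payoffs = Dict[int, Dict[str, float]]


def brute_force_allocation(payoffs: Payoffs, staff: int) -> Tuple[float, Dict[int, str]]:
    """Exhaustively minimize the total payoff of Eqn. (2) for one decision epoch."""
    failed = sorted(payoffs)
    k = min(staff, len(failed))  # every available staff member is dispatched
    best_cost, best_plan = float("inf"), {}
    for chosen in combinations(failed, k):
        # Enumerate every maintenance-level combination for the chosen machines.
        plans: List[Dict[int, str]] = [{}]
        for i in chosen:
            plans = [{**plan, i: level} for plan in plans for level in payoffs[i]]
        for plan in plans:
            cost = sum(payoffs[i][level] for i, level in plan.items())
            if cost < best_cost:
                best_cost, best_plan = cost, plan
    return best_cost, best_plan
```

The search space grows combinatorially with the number of failed machines and levels, which is why a ranking argument such as Proposition 1 in Section 4.2 matters in practice.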

3. Production System Modeling

In a multi-stage serial production line, the dynamic interactions between machines and buffers are nonlinear because the buffers have finite capacity. In addition, the system is constantly affected by random machine failures and maintenance scheduling. Based on our previous work [7], the dynamic manufacturing system can be modeled by the state-space equation:

$$\dot{X}(t) = F\big(X(t), U(t), E(t)\big) \qquad (3)$$

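where

$X(t) = [x_1(t), x_2(t), \dots, x_M(t)]^T$, and $x_i(t)$ is the state representing the production count of machine $S_i$ at time $t$;

$U(t) = [u_1(t), u_2(t), \dots, u_M(t)]^T$, and $u_i(t)$ is the control input for machine $S_i$ at time $t$, i.e., the assignment of CM of level $l$, $l \in \{1c, 2c\}$;

$E(t) = [e_1(t), e_2(t), \dots, e_M(t)]^T$, and $e_i(t)$ is the random disruption on machine $S_i$ at time $t$. In this paper, such disruptions are random machine failures, which can be obtained directly from sensor data; $e_i(t)$ indicates whether machine $S_i$ is suffering a random failure at time $t$:

$$e_i(t) = \begin{cases} 0, & \text{if } S_i \text{ is operating normally at time } t \\ 1, & \text{if } S_i \text{ suffers a random failure at time } t \end{cases} \qquad (4)$$

$F(\cdot) = [f_1(\cdot), f_2(\cdot), \dots, f_M(\cdot)]^T$, and $f_i(\cdot)$ is the dynamic function of machine $S_i$.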

Based on the conservation of flow, the system state can be evaluated at any time according to [8]. The level of buffer $B_i$ at time $t$ is

$$b_i(t) = b_i(0) + x_{i-1}(t) - x_i(t) \qquad (5)$$
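The conservation-of-flow relation in Eqn. (5) and the blocking and starvation assumptions of Section 2.1 can be sketched as follows. This is a minimal illustration, not the authors' simulator: the function names are hypothetical, production counts are taken as given inputs, and buffer $B_i$ is assumed to sit between $S_{i-1}$ and $S_i$, consistent with buffers being indexed $2, \dots, M$.

```python
from typing import List, Sequence, Tuple


def buffer_levels(initial: Sequence[float], counts: Sequence[float]) -> List[float]:
    """Eqn. (5): b_i(t) = b_i(0) + x_{i-1}(t) - x_i(t) for buffers B_2..B_M.

    initial[k] holds b_{k+2}(0) and counts[k] holds x_{k+1}(t), so list
    position k of the result is the level of buffer B_{k+2}.
    """
    m = len(counts)
    if len(initial) != m - 1:
        raise ValueError("a line with M machines has M - 1 buffers")
    # B_{k+2} gains the output of S_{k+1} and loses the output of S_{k+2}.
    return [initial[k] + counts[k] - counts[k + 1] for k in range(m - 1)]


def line_status(levels: Sequence[float], capacities: Sequence[float]) -> List[Tuple[bool, bool]]:
    """Return (starved, blocked) for each machine S_1..S_M."""
    m = len(levels) + 1
    status = []
    for i in range(1, m + 1):
        # Upstream buffer of S_i is B_i (list index i - 2); S_1 has none.
        starved = i >= 2 and levels[i - 2] <= 0
        # Downstream buffer of S_i is B_{i+1} (list index i - 1); S_M has none.
        blocked = i <= m - 1 and levels[i - 1] >= capacities[i - 1]
        status.append((starved, blocked))
    return status
```

For example, with initial levels `[5, 5]` and production counts `[100, 98, 97]`, `buffer_levels` returns `[7, 6]`; if $B_2$ has capacity 7, `line_status` then reports $S_1$ as blocked.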

Random machine failures directly impact the system state and buffer levels. How CM is assigned, and to what level the failed machines are restored, also plays a significant role in changing system states and production performance. In a multi-stage serial production line, a stoppage at one machine does not necessarily lead to permanent production loss of the entire line, owing to the finite buffers in the line and the asynchronous machine processes [10]. Our previous studies demonstrate that there exists a threshold on machine downtime beyond which the system suffers permanent production loss that cannot be recovered [20]. This threshold is defined as the opportunity window. To keep this paper self-contained, we briefly summarize these previous results without detailed proofs.

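Definition 1 [28], [30]. The opportunity window $OW_i(t)$ of machine $S_i$ at time $t$ is the longest possible disruption on $S_i$ starting at time $t$ that does not result in permanent production loss at the end-of-line machine, i.e.,

$$OW_i(t) = \sup\left\{ d \ge 0 : \int_0^{T} \dot{x}_M(\tau)\, d\tau = \int_0^{T} \dot{x}_M(\tau; d)\, d\tau,\ \forall T \ge t + d \right\} \qquad (6)$$

where $\int_0^{T} \dot{x}_M(\tau)\, d\tau$ and $\int_0^{T} \dot{x}_M(\tau; d)\, d\tau$ are the end-of-line production counts without and with the disruption event, respectively.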

If the duration of a disruption event is longer than the machine's opportunity window, a permanent production loss is incurred on the whole line. Otherwise, the disruption event only causes temporary stoppages of adjacent machines, which are eventually recovered. Since a CM action is itself carried out during a downtime event, the opportunity window at time $t$ must be evaluated first to determine whether a CM action will cause permanent production loss.

In our previous work, we further introduced the concept of permanent production loss, defined as the difference between the output of the real production line and that of a virtual ideal scenario of the same line [30]. The virtual ideal scenario is one in which there are no disruptions due to machine failures or other causes. For a serial production line, the slowest machine ultimately determines the production output [15].

Thus, consider a maintenance action $m_{i,j} = (i, l_{i,j}, t_{i,j}, d_{i,j})$. If $d_{i,j} \le OW_i(t_{i,j})$, the maintenance does not incur permanent production loss; if $d_{i,j} > OW_i(t_{i,j})$, the maintenance halts the slowest machine, which results in permanent production loss. Therefore, the permanent production loss caused by maintenance action $m_{i,j}$ can be formulated as:

$$PL_{m_{i,j}} = \max\left\{ \frac{d_{i,j} - OW_i(t_{i,j})}{T_{M^*}},\ 0 \right\} \qquad (7)$$

Eqn. (7) shows that a maintenance action $m_{i,j}$ causes permanent production loss $PL_{m_{i,j}}$ only if it lasts longer than the opportunity window of machine $S_i$, and that the observation interval runs from $t_{i,j}$ to $t_{i,j} + d_{i,j}$. Therefore, $PL_{m_{i,j}}$ only measures the impact of $m_{i,j}$ on the system within its duration. However, $m_{i,j}$ has a further impact on the system because it changes the system states at time $t_{i,j}$. To evaluate this lasting impact, the production loss risk $PLR_{m_{i,j}}$ is introduced: it measures the difference in production loss of the first disruption event following $m_{i,j}$ between the scenarios with and without $m_{i,j}$ [7].
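Eqn. (7) is a simple piecewise-linear map from maintenance duration to lost parts, sketched below. The opportunity window is taken as an input; computing it requires the system model of Definition 1, which this sketch does not attempt. The function name and the numbers in the usage note are hypothetical.

```python
def permanent_production_loss(duration: float, opportunity_window: float,
                              slowest_cycle_time: float) -> float:
    """Eqn. (7): PL = max{(d - OW) / T_{M*}, 0}, measured in parts.

    duration and opportunity_window share the time unit of the slowest
    machine's cycle time T_{M*}.
    """
    if slowest_cycle_time <= 0:
        raise ValueError("cycle time must be positive")
    # Downtime absorbed by the opportunity window costs no output; each
    # further T_{M*} of downtime costs one part at the bottleneck.
    return max((duration - opportunity_window) / slowest_cycle_time, 0.0)
```

With an assumed opportunity window of 12 min and $T_{M^*} = 6$ min, a 30-min perfect CM (Table 1) loses 3 parts, whereas a 15-min imperfect CM loses 0.5 parts.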


For example, suppose that after time $t_{i,j}$ the first random failure occurs on machine $S_k$ at time $t^*$, and a maintenance action $m_k^* = (k, l_k^*, t^*, d_k^*)$ is taken with the same level as $m_{i,j}$, i.e., $l_k^* = l_{i,j}$. Since $m_k^*$ is the only disruption event following $m_{i,j}$, with no other events between $t_{i,j}$ and $t^*$, the full list of disruption events can be written as $D = [\bar{D}, m_{i,j}, m_k^*]$, in which $\bar{D}$ is the existing downtime list up to time $t_{i,j}$.

With the list of disruption events $D$ in the near future and the current system states, the set of buffer levels $\mathbf{b}(t^*) = [b_2(t^*), b_3(t^*), \dots, b_M(t^*)]$ at time $t^*$ can be derived, from which the opportunity window $OW_k(t^*)$ can be evaluated. Then the permanent production loss $PL_{m_k^*}(t^*)$ caused by $m_k^*$ is

$$PL_{m_k^*}(t^*) = \max\left\{ \frac{d_k^* - OW_k(t^*)}{T_{M^*}},\ 0 \right\} \qquad (8)$$
Cq∗

The disruption event $m_k^*$ happens with probability $p(k, t_{i,j}, t^*)$ based on the reliability of machine $S_k$. Because machine reliabilities are independent of each other, $p(k, t_{i,j}, t^*)$ is a joint probability over the machines: $S_k$ fails at $t^*$ while no other machine has failed by then, i.e.,

$$p(k, t_{i,j}, t^*) = p_k(t_{i,j}, t^*) \prod_{h=1,\, h \ne k}^{M} \left[ 1 - \int_0^{t^*} p_h(t_{i,j}, \tau)\, d\tau \right] \qquad (9)$$

The expected value of the permanent production loss of $m_k^*$ can be evaluated as:

$$\mathbb{E}[PL_{m^*}] = \sum_{k=1}^{M} \int_0^{\infty} p(k, t_{i,j}, t^*)\, PL_{m_k^*}(t^*)\, dt^* \qquad (10)$$

Note that it is the potential disruption event $m_k^*$, rather than the initial disruption event $m_{i,j}$, that causes $PL_{m_k^*}(t^*)$. However, the existence of $m_{i,j}$ alters the buffer levels $\mathbf{b}(t^*)$ at time $t^*$, and thus changes $OW_k(t^*)$, which results in a different $PL_{m_k^*}(t^*)$. Therefore, the production loss of $m_k^*$ in the scenario where $m_{i,j}$ does not happen at time $t_{i,j}$ must also be calculated in order to evaluate how $m_{i,j}$ influences the future event $m_k^*$.

Consider the case in which the full disruption event list is $\hat{D} = [\bar{D}, m_k^*]$. The buffer levels $\hat{\mathbf{b}}(t^*)$, opportunity window $\widehat{OW}_k(t^*)$ and production loss $\widehat{PL}_{m_k^*}(t^*)$ can be evaluated similarly at time $t^*$. Therefore, the expected production loss of the disruption event $m_k^*$ without $m_{i,j}$ can be formulated as:

$$\mathbb{E}[\widehat{PL}_{m^*}] = \sum_{k=1}^{M} \int_0^{\infty} \hat{p}(k, t_{i,j}, t^*)\, \widehat{PL}_{m_k^*}(t^*)\, dt^* \qquad (11)$$

where $\hat{p}(k, t_{i,j}, t^*)$ is the probability that machine $S_k$ fails at time $t^*$. Note that $\hat{p}(k, t_{i,j}, t^*) \ne p(k, t_{i,j}, t^*)$, since $\hat{p}$ is derived by assuming that machine $S_i$ does not undergo $m_{i,j}$. Finally, the production loss risk of maintenance action $m_{i,j}$ is defined as the difference between the expected production losses with and without $m_{i,j}$:

$$PLR_{m_{i,j}} = \mathbb{E}[PL_{m^*}] - \mathbb{E}[\widehat{PL}_{m^*}] \qquad (12)$$
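Eqns. (9), (10) and (12) can be evaluated numerically once the failure densities and the loss curves $PL_{m_k^*}(t^*)$ are available. The sketch below is a minimal version under stated assumptions: it uses the trapezoidal rule on a truncated horizon instead of the infinite upper limit, measures time from $t_{i,j}$, and treats the densities and loss curves as caller-supplied functions. In the paper's method the loss curves come from the system model, which is not reproduced here; all names, including the toy `exponential_density`, are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def exponential_density(rate: float) -> Fn:
    """Toy failure density rate * exp(-rate * t), for illustration only."""
    return lambda t: rate * math.exp(-rate * t)


def _running_integral(f: Fn, times: Sequence[float]) -> List[float]:
    """Trapezoidal cumulative integral of f from times[0] to each times[n]."""
    out = [0.0]
    for n in range(1, len(times)):
        dt = times[n] - times[n - 1]
        out.append(out[-1] + 0.5 * (f(times[n - 1]) + f(times[n])) * dt)
    return out


def expected_loss(densities: Sequence[Fn], losses: Sequence[Fn],
                  horizon: float, steps: int = 2000) -> float:
    """Eqns. (9)-(10): sum over k of the integral of p(k, t*) * PL_k(t*) on [0, horizon]."""
    times = [horizon * n / steps for n in range(steps + 1)]
    cdfs = [_running_integral(p, times) for p in densities]
    total = 0.0
    for k, (p_k, loss_k) in enumerate(zip(densities, losses)):
        values = []
        for n, t in enumerate(times):
            # Eqn. (9): S_k fails at t while every other machine is still up.
            others_up = 1.0
            for h, cdf in enumerate(cdfs):
                if h != k:
                    others_up *= max(0.0, 1.0 - cdf[n])
            values.append(p_k(t) * others_up * loss_k(t))
        # Eqn. (10): integrate the weighted loss of the first failure over t*.
        total += sum(0.5 * (values[n - 1] + values[n]) * (times[n] - times[n - 1])
                     for n in range(1, len(times)))
    return total


def production_loss_risk(with_m: Tuple[Sequence[Fn], Sequence[Fn]],
                         without_m: Tuple[Sequence[Fn], Sequence[Fn]],
                         horizon: float, steps: int = 2000) -> float:
    """Eqn. (12): expected first-failure loss with m_ij minus the loss without it."""
    return (expected_loss(*with_m, horizon, steps)
            - expected_loss(*without_m, horizon, steps))
```

As a sanity check, one machine with an exponential density and a constant loss of one part gives an expected loss equal to the probability that it fails within the horizon.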

With the production loss and production loss risk of $m_{i,j}$ derived, we can quantify the impact of a maintenance action on the whole system. Using this property, we next derive the payoff $\pi_i^{l}(t)$ of the OTA problem for CM dispatching.
4. Maintenance Cost Analysis for OTA

We have formulated real-time CM dispatching as an OTA problem. The payoff $\pi_i^{l}(t)$ is directly related to the production loss discussed in the previous section, as well as to maintenance-related cost. A key point is that the aging condition of a machine has significant implications for maintenance-related cost. Accordingly, the failure rate of machines in our system is not constant but changes over time.

4.1 Maintenance Cost Analysis

The cost of a maintenance action consists of two parts: resource cost and production loss cost. The resource cost includes the cost of consumable materials and of machine or equipment replacement, depending on the maintenance level. Meanwhile, energy consumption can be reduced during a maintenance action because machines are stopped. Therefore, given a maintenance action $m_{i,j} = (i, l_{i,j}, t_{i,j}, d_{i,j})$, its cost, denoted as $c_{i,j}^{l_{i,j}}$, can be evaluated as:

$$c_{i,j}^{l_{i,j}} = c_r^{l_{i,j}} + c_p\left( PL_{m_{i,j}} + PLR_{m_{i,j}} \right) - \Delta E_{i,j} \qquad (13)$$

where $c_r^{l_{i,j}}$ is the resource cost of maintenance level $l_{i,j}$, $c_p$ is the profit per part, and $PL_{m_{i,j}}$ and $PLR_{m_{i,j}}$ are the permanent production loss and production loss risk due to $m_{i,j}$, respectively; these two terms were derived in the previous section. $\Delta E_{i,j}$ is the electrical energy saving during $d_{i,j}$, which can be evaluated as:

$$\Delta E_{i,j} = c_e \cdot \sum_{k=1}^{M} ST_k^{i,j} \qquad (14)$$

where $c_e$ is the unit cost of electricity and $ST_k^{i,j}$ is the stoppage time of machine $S_k$ due to maintenance action $m_{i,j}$, which may leave $S_k$ starved or blocked.
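Eqns. (13) and (14) reduce to simple arithmetic once the production-loss terms are known. A minimal sketch, assuming consistent units (dollars for costs, parts for losses, and an electricity cost expressed per unit of machine stoppage time); the function name `maintenance_cost` and the example values are hypothetical.

```python
from typing import Sequence


def maintenance_cost(resource_cost: float, profit_per_part: float,
                     pl: float, plr: float, electricity_cost: float,
                     stoppage_times: Sequence[float]) -> float:
    """Cost c_ij of one maintenance action m_ij."""
    # Eqn. (14): energy saved while machines are stopped (down, starved or blocked).
    energy_saving = electricity_cost * sum(stoppage_times)
    # Eqn. (13): resource cost plus lost profit, minus the energy saving.
    return resource_cost + profit_per_part * (pl + plr) - energy_saving
```

For instance, an imperfect CM with resource cost 600, profit per part 10, PL = 3, PLR = 1.5, electricity cost 0.2 and stoppage times of 10 and 5 costs 600 + 10 × 4.5 − 0.2 × 15 = 642.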

One may argue that the total cumulative cost over a finite time horizon is not an appropriate guide for optimal maintenance decisions. Although perfect maintenance may cost much more than imperfect maintenance, it may give the machine a longer operating time before the next failure, thus reducing CM frequency and the associated cost. However, this relationship is not explicitly modeled in $c_{i,j}^{l_{i,j}}$. Thus, a normalized cost rate is defined to evaluate the maintenance cost per unit time until the next failure arrives, which captures the machine's aging condition under a given level of CM. The real-time maintenance cost rate $cr_{i,j}^{l_{i,j}}$ is defined as:

$$cr_{i,j}^{l_{i,j}} = \frac{c_{i,j}^{l_{i,j}}}{d_{i,j} + y_{i,j}} \qquad (15)$$

where $d_{i,j}$ is the duration of maintenance and $y_{i,j}$ is the lifetime of machine $S_i$ after its $j$th maintenance, a random variable with distribution $p(y_{i,j} \mid v_{i,j})$ that follows the aging model of machines [8]. The virtual age $v_{i,j}$ can be evaluated as:

$$v_{i,j} = \alpha^{l_{i,j}} \left( v_{i,j-1} + y_{i,j-1} \right) \qquad (16)$$

where $\alpha^{l_{i,j}}$ is the age reduction factor of maintenance level $l_{i,j}$, $v_{i,j-1}$ is the virtual age of machine $S_i$ right after its $(j-1)$th maintenance action is finished, and $y_{i,j-1}$ is the lifetime after the $(j-1)$th maintenance action. For perfect maintenance, $\alpha^{1c} = 0$, since the virtual age of the machine restarts from zero, while $0 < \alpha^{2c} < 1$ for imperfect maintenance.

The distribution of the time to failure is conditional on the machine's virtual age $v_{i,j}$, i.e.,

$$p(y_{i,j} \mid v_{i,j}) = \frac{g_i(v_{i,j} + y_{i,j})}{\int_{v_{i,j}}^{\infty} g_i(u)\, du}, \quad y_{i,j} > 0 \qquad (17)$$

where $g_i(\cdot)$ is the lifetime probability density of machine $S_i$. Therefore, the expected cost rate after the $j$th maintenance and before the next failure can be calculated as:

$$\mathbb{E}\left[cr_{i,j}^{l_{i,j}}\right] = \int_0^{\infty} p(y_{i,j} \mid v_{i,j})\, \frac{c_{i,j}^{l_{i,j}}}{d_{i,j} + y_{i,j}}\, dy_{i,j} \qquad (18)$$

The expected cost rate thus accounts for both the maintenance cost and the aging condition under each maintenance level. When a random failure occurs on a machine, the maintenance level should be chosen to minimize $\mathbb{E}[cr_{i,j}^{l_{i,j}}]$.

4.2 Evaluation of the Payoff Function

Based on the cost analysis above, the expected cost rate of each specific maintenance action can be derived from Eqn. (18). We can therefore compare the expected cost rates of all possible maintenance prioritizations and maintenance levels to find the minimal one; in other words, we can obtain the minimum expected cost rate and the corresponding maintenance assignment whenever a maintenance action is needed. Examining the OTA formulation of the maintenance scheduling problem in Section 2.2.2, we conclude that the expected cost rate $\mathbb{E}[cr_{i,j}^{l_{i,j}}]$ is a reasonable payoff value $\pi_i^{l}(t)$ in Eqn. (2). With this payoff definition, the OTA problem does not need to be solved as a linear programming problem. We can prove that, with the payoff set to $\mathbb{E}[cr_{i,j}^{l_{i,j}}]$, the optimal solution of the OTA problem indeed minimizes the total expected cost rate. Proposition 1 makes this precise.

Proposition 1. Let $\pi_i^{l}(t) = \mathbb{E}[cr_{i,j}^{l_{i,j}}]$, and assume that $\mathbf{A}_{opt} = [x_1^{l_1}(t), x_2^{l_2}(t), \dots, x_M^{l_M}(t)]^T$ is a solution of the OTA problem in Eqn. (2). Then this solution is the optimal CM assignment, which guarantees the minimum total cost of the system.

Proof: Let $\Pi(t) = [\pi_1^{l_1}(t), \pi_2^{l_2}(t), \dots, \pi_M^{l_M}(t)]$ denote the set of payoffs at time $t$, and let $\mathbf{A}(t) = [x_1^{l_1}(t), x_2^{l_2}(t), \dots, x_M^{l_M}(t)]^T$ be the maintenance allocation. The problem

$$\min_{x_i^{l}(t),\, l} \sum_{i=1}^{M} \pi_i^{l}(t)\, x_i^{l}(t) \qquad (19)$$

can then be rewritten as:

$$\min_{\mathbf{A}(t)} \Pi(t) \cdot \mathbf{A}(t) \qquad (20)$$

Since $\mathbf{A}_{opt}$ is the optimal solution that minimizes the product $\Pi(t) \cdot \mathbf{A}(t)$, we have $x_{i_k}^{l_{i_k}}(t) = 1$ for $k = 1, \dots, K$, in which $i_k$ indicates that $\pi_{i_k}^{l_{i_k}}(t)$ is the $k$th smallest element of $\Pi(t)$, and $K = \min\{N(t), Q(t)\}$ is the number of executable tasks at time $t$. Here, each failed machine's level $l_i$ is the one with the smaller payoff, so $\Pi(t)$ holds each machine's best achievable payoff. Therefore, $\mathbf{A}_{opt}$ provides the allocation with the respective maintenance levels $[l_1, l_2, \dots, l_M]$ that follows the ranking of the elements in $\Pi(t)$. Since the ranking of $\pi_i^{l}(t)$ is the ranking of expected cost rates, $\mathbf{A}_{opt}$ provides the optimal maintenance assignment, which guarantees the minimal expected cost rate at time $t$. In this way, the minimum total cost over the time horizon is obtained by integrating the cost of each maintenance scheduling decision at time $t$.
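Proposition 1 implies that Eqn. (2) needs no integer programming solver: each failed machine keeps its cheaper maintenance level, and the min(available staff, failed machines) machines with the smallest payoffs are served. A minimal sketch of that ranking step, assuming the expected cost rates of both levels are already computed; the function name `ota_ranking_allocation` and the data layout are hypothetical. Ties are broken by machine index, a choice the text does not specify.

```python
from typing import Dict, List, Tuple


def ota_ranking_allocation(payoffs: Dict[int, Dict[str, float]],
                           staff: int) -> List[Tuple[int, str, float]]:
    """Return (machine, level, expected cost rate) for each dispatched staff member."""
    candidates = []
    for machine, by_level in payoffs.items():
        # Each failed machine contributes its cheaper level (smallest payoff).
        level, rate = min(by_level.items(), key=lambda item: item[1])
        candidates.append((rate, machine, level))
    candidates.sort()  # ascending payoff; ties fall back to machine index
    k = min(staff, len(candidates))  # K = min{N(t), Q(t)}
    return [(machine, level, rate) for rate, machine, level in candidates[:k]]
```

On the instance {1: {1c: 5, 2c: 4}, 2: {1c: 3, 2c: 6}, 3: {1c: 9, 2c: 8}} with two staff, it serves machine 2 at level 1c and machine 1 at level 2c, for a total payoff of 7, which matches an exhaustive search over Eqn. (2).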

Based on the analysis above, the optimal CM scheduling policy is shown in Fig. 2.

Fig. 2. Flow chart of the real-time CM assignment policy.

The structured algorithm is as follows:

Initialize $T_i$, $B_i$, $b_i(0)$, machine aging, failure rates and the number of maintenance staff.
Start the production simulation from $t \leftarrow 0$.
Repeat:
1) Check the number of failed machines $Q(t)$.
2) Check the number of available maintenance staff $N(t)$.
3) If $Q(t) \ne 0$ and $N(t) > 0$, perform steps 4) to 8); otherwise, go to step 9).
4) Check all allocations of maintenance staff $\mathbf{A}(t)$.
5) Check all corresponding maintenance costs $\mathbf{C}(t)$.
6) For each $\mathbf{A}(t)$, calculate the corresponding expected cost rates $\pi_i^{l}(t) \leftarrow \mathbb{E}[cr_{i,j}^{l_{i,j}}]$ and the reward $r \leftarrow \sum_{i=1}^{M} \pi_i^{l}(t)\, x_i^{l}(t)$.
7) Find the optimal allocation $\mathbf{A}_{opt}$.
8) Update the total maintenance cost $C_m \leftarrow C_m + \mathbf{C}(t) \cdot \mathbf{A}_{opt}$.
9) $t \leftarrow t + 1$.
10) Until $t = T$.
Calculate the total cost $C_{total} \leftarrow C_m + C_{labor}$.
Output the total maintenance cost $C_{total}$.

5. Case Study

To evaluate the effectiveness of the OTA-based maintenance policy, extensive numerical simulations are performed. The performance metric for the maintenance policies is the total cost, which includes maintenance-related cost and labor cost within the time interval $(0, T]$. The total cost $C_{total}$ is defined as:

$$C_{total} = \sum_{i=1}^{M} \sum_{j=1}^{J_i(T)} c_{i,j}^{l_{i,j}} + n\, C_{labor} \qquad (21)$$

in which $J_i(T)$ is the total number of maintenance work orders on machine $S_i$ up to time $T$, $n$ is the total number of maintenance staff within the time interval $(0, T]$, and $C_{labor}$ is the unit cost of a maintenance staff member. We evaluate and compare the total cost $C_{total}$ of our proposed OTA-based maintenance policy with that of three benchmark maintenance policies. The first, FCFS-1c, always allocates perfect maintenance in the order of machine breakdowns. The second, FCFS-2c, always allocates imperfect maintenance in the order of machine breakdowns. The third, FCFS-r, allocates different maintenance levels according to a predefined machine age threshold, again serving machines in the order of breakdown: when a machine's age exceeds the threshold, perfect maintenance is performed; otherwise, imperfect maintenance is carried out. The threshold is set to 50 min in this experiment. All three benchmark policies are commonly used in practice.
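The three FCFS benchmarks differ only in how they pick the maintenance level; all serve failed machines in breakdown order. A minimal sketch of the dispatch rule, assuming "age" means the machine's current virtual age and that a machine whose age exceeds the 50-min threshold receives perfect maintenance; the names `fcfs_level` and `fcfs_dispatch` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fcfs_level(policy: str, age: float, threshold: float = 50.0) -> str:
    """Maintenance level chosen by a benchmark policy."""
    if policy == "FCFS-1c":
        return "1c"  # always perfect maintenance
    if policy == "FCFS-2c":
        return "2c"  # always imperfect maintenance
    if policy == "FCFS-r":
        return "1c" if age > threshold else "2c"  # age-threshold rule
    raise ValueError(f"unknown policy: {policy}")


def fcfs_dispatch(policy: str, queue: Sequence[Tuple[float, int, float]],
                  staff: int, threshold: float = 50.0) -> List[Tuple[int, str]]:
    """queue holds (failure_time, machine, age); earliest breakdowns are served first."""
    ordered = sorted(queue)
    return [(machine, fcfs_level(policy, age, threshold))
            for _, machine, age in ordered[:staff]]
```

Unlike the OTA ranking, none of these rules looks at production loss or energy, which is the gap the case study measures.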

To perform the experiment, 100 different serial production lines were constructed by randomly selecting parameters from the following sets:

$T_i \in [5, 10]$ min, $i = 1, 2, \dots, 10$
$B_i \in [2, 40]$, $i = 2, \dots, 10$
$b_i(0) \in [0, B_i]$, $i = 2, \dots, 10$

For demonstration purposes, the results from 5 different lines with 10 machines and 9 buffers are shown. For each production line, the simulation time is one week, i.e., $T = 10080$ min, and the simulation is repeated 20 times. The widely used Weibull distribution is adopted to model machine reliability, with scale parameter $\eta_i$ and shape parameter $\beta_i$. Its probability density function is:

$$g_i(t) = \frac{\beta_i}{\eta_i} \left( \frac{t}{\eta_i} \right)^{\beta_i - 1} \exp\left[ -\left( \frac{t}{\eta_i} \right)^{\beta_i} \right] \qquad (22)$$

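Given the virtual age $v_{i,j}$ of the machine, the conditional probability density of the time to failure is

$$p(y_{i,j} \mid v_{i,j}) = \frac{\beta_i}{\eta_i} \left( \frac{y_{i,j} + v_{i,j}}{\eta_i} \right)^{\beta_i - 1} \exp\left[ \left( \frac{v_{i,j}}{\eta_i} \right)^{\beta_i} - \left( \frac{y_{i,j} + v_{i,j}}{\eta_i} \right)^{\beta_i} \right] \qquad (23)$$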
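Eqns. (22) and (23) can be checked against each other in a few lines: dividing the Weibull density at $v + y$ by the Weibull survival function at $v$ should reproduce the closed form of Eqn. (23). A minimal sketch with hypothetical function names; it assumes $t > 0$ whenever the shape parameter is below 1, where the density is singular at zero.

```python
import math


def weibull_pdf(t: float, eta: float, beta: float) -> float:
    """Eqn. (22): (beta/eta) * (t/eta)^(beta-1) * exp(-(t/eta)^beta)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * math.exp(-((t / eta) ** beta))


def weibull_survival(t: float, eta: float, beta: float) -> float:
    """Probability that the lifetime exceeds t: exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_conditional_pdf(y: float, v: float, eta: float, beta: float) -> float:
    """Eqn. (23): density of the time to failure y after a repair leaving virtual age v."""
    z = (y + v) / eta
    return (beta / eta) * z ** (beta - 1) * math.exp((v / eta) ** beta - z ** beta)
```

With $\beta_i = 2$ as in this case study, the hazard rate increases with age, so a machine left with a larger virtual age tends to fail sooner after repair.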

The shape parameter $\beta_i$ of every machine is set to 2 in this case study, and the scale parameter $\eta_i$ and the initial virtual age $v_{i,0}$ are picked randomly from:

$\eta_i \in [500, 2000]$ min, $i = 1, 2, \dots, 10$
$v_{i,0} \in [0, 100]$ min, $i = 1, 2, \dots, 10$

The maintenance parameters are shown in Table 1. The maintenance staff cost is assumed to be 24 US dollars per hour.
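Eqn. (21) can be sketched as below under one interpretation that the text does not spell out: the per-staff unit cost is taken as the hourly wage times the horizon length. With 24 US dollars per hour over one week (10080 min, i.e. 168 h), this gives 4,032 US dollars per staff member. The function name `total_cost` and its arguments are hypothetical.

```python
from typing import Sequence


def total_cost(costs_per_machine: Sequence[Sequence[float]], staff: int,
               hourly_wage: float, horizon_min: float) -> float:
    """Eqn. (21): sum of all maintenance costs plus n * C_labor."""
    # Assumed: C_labor is the wage for the whole horizon (minutes -> hours).
    labor_per_staff = hourly_wage * horizon_min / 60.0
    # Double sum over machines i and their work orders j.
    maintenance = sum(sum(costs) for costs in costs_per_machine)
    return maintenance + staff * labor_per_staff
```

Under this interpretation, each additional staff member adds a fixed 4,032 US dollars per week, which the OTA policy must recover through lower maintenance-related cost.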
Table 1. Maintenance Parameters

Maintenance level $l$ | Age reduction factor $\alpha^{l}$ | Maintenance duration (min) | Resource cost (US dollars)
$1c$ | 0 | 30 | 1200
$2c$ | 0.5 | 15 | 600

First, we compare the total cost defined in Eqn. (21) for all four policies, with the number of maintenance staff varied from 1 to 5. The average total costs of the 5 production lines are shown in Fig. 3(a)-(e), one panel per staffing level. The small "I" bars on top of each bar represent the 95% confidence interval (CI). Overall, the OTA policy yields the lowest average total cost, which demonstrates that the OTA based maintenance policy is effective in reducing the total maintenance cost. Note that with 1 or 2 maintenance staff, the 95% CIs of the OTA policy and FCFS-r overlap slightly for Line 4 and Line 5, whereas with 3 or more maintenance staff, the 95% CI of OTA does not overlap with that of any other policy. This suggests that 1 or 2 staff members may not be enough and that 3 or more are required for these lines: although the OTA policy can assign the appropriate CM level to the right machines to minimize the total cost, insufficient maintenance staff limits the benefits of the algorithm and leads to extra production loss.
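The CI bars and the overlap check described above can be sketched as a Student-t interval over the 20 simulation replications. The paper does not state how its intervals were computed, so the t-interval, the hard-coded critical value $t_{0.975,19} \approx 2.093$, and the function names are assumptions.

```python
import math
import statistics

T_975_DF19 = 2.093  # two-sided 95% Student-t critical value, 20 - 1 = 19 degrees of freedom

def mean_ci95(samples):
    """Mean and half-width of a 95% t-interval from 20 simulation replications."""
    if len(samples) != 20:
        raise ValueError("critical value is hard-coded for n = 20")
    m = statistics.mean(samples)
    half = T_975_DF19 * statistics.stdev(samples) / math.sqrt(len(samples))
    return m, half

def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals overlap."""
    (ma, ha), (mb, hb) = a, b
    return abs(ma - mb) <= ha + hb
```

Note that non-overlapping 95% CIs imply a significant difference, but overlapping ones do not by themselves rule it out, so the overlaps seen with 1 or 2 staff indicate weaker evidence rather than equal performance.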

Fig. 3. The comparison of overall cost among policies. Panels (a)-(e): total maintenance cost with 1, 2, 3, 4 and 5 maintenance staff, respectively; x-axis: production line (Lines 1-5); y-axis: total cost ($); series: FCFS-1c, FCFS-2c, FCFS-r, OTA.


Furthermore, we compare the total cost of all policies for different numbers of maintenance staff, as shown in Fig. 4. For all policies, the average total cost reaches its minimum with 3 maintenance staff. This implies that 3 maintenance staff may be just adequate for this kind of serial production line with 10 machines and 9 buffers.

Fig. 4. The comparison of overall cost among different numbers of maintenance staff (x-axis: number of maintenance staff, 1-5; y-axis: total cost ($); series: FCFS-1c, FCFS-2c, FCFS-r, OTA).

To further examine the policies, we track the maintenance cost history and the accumulated production count of the production lines with 3 maintenance staff over the whole simulation time, as shown in Fig. 5 and Fig. 6, respectively. Although the total cost and production count show no obvious difference among the four policies during the initial transient period, the OTA based maintenance policy outperforms all other policies over the long run. This is because the OTA based policy uses the expected cost rate as its payoff function, and the optimal solution with respect to this payoff is guaranteed by exhaustive search. In contrast, the FCFS policies do not consider the potential impact of current maintenance actions on future production and simply assign staff to whichever machine failed first. Therefore, the FCFS policies lead to a lower production count and a higher maintenance cost in the long run.
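The exhaustive search can be sketched as enumerating every way to dispatch the free staff to waiting machines with either CM level and keeping the candidate with the lowest expected cost rate. The `cost_rate` callback is a hypothetical stand-in for the paper's payoff function, which is defined earlier in the paper and not reproduced here; dispatching exactly min(staff, failed machines) jobs is also an assumption.

```python
from itertools import combinations, product
from typing import Callable, Sequence

Assignment = tuple[tuple[int, int], ...]  # ((machine, cm_level), ...)

def ota_exhaustive(failed: Sequence[int], free_staff: int,
                   cost_rate: Callable[[Assignment], float],
                   levels: Sequence[int] = (1, 2)) -> tuple[Assignment, float]:
    """Return the staff-to-(machine, level) assignment with the lowest cost rate.

    cost_rate is a caller-supplied stand-in for the payoff; this sketch assumes
    every free staff member is dispatched when enough machines are down.
    """
    k = min(free_staff, len(failed))
    best, best_cost = (), float("inf")
    for machines in combinations(failed, k):       # which machines get staff
        for lv in product(levels, repeat=k):       # which CM level each receives
            cand = tuple(zip(machines, lv))
            c = cost_rate(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost

# Toy payoff (hypothetical numbers): per-action cost plus 10 per machine left waiting.
toy = {(1, 1): 5.0, (1, 2): 3.0, (2, 1): 4.0, (2, 2): 6.0, (3, 1): 2.0, (3, 2): 2.5}

def toy_rate(assign: Assignment) -> float:
    repaired = {m for m, _ in assign}
    return sum(toy[a] for a in assign) + 10.0 * (3 - len(repaired))

best, cost = ota_exhaustive([1, 2, 3], 2, toy_rate)
```

With $m$ failed machines and $k$ staff, the search evaluates $\binom{m}{k} 2^k$ candidates: 120 × 8 = 960 for $m = 10$, $k = 3$, but 19600 × 8 = 156800 for $m = 50$, $k = 3$, which illustrates why the computation grows quickly for large lines.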

In addition, maintenance policies that use a single maintenance level and ignore machine aging also lead to lower system performance, as can be observed from Fig. 5 and Fig. 6. The total costs of the FCFS-1c and FCFS-2c policies are much higher than that of the OTA policy, since the OTA policy adaptively chooses between perfect and imperfect maintenance when assigning a CM task. Moreover, FCFS-r also outperforms FCFS-1c and FCFS-2c, with lower total cost and higher production output, since it takes machine age into account through a fixed threshold when choosing between perfect and imperfect maintenance.

To summarize, optimally prioritizing the limited maintenance staff and having them perform the appropriate level of maintenance can significantly improve productivity and reduce overall cost. OTA optimization based on the real-time system model and on system properties such as permanent production loss is an effective way to achieve this.
Fig. 5. Maintenance cost history of policies (y-axis: total cost ($)).

Fig. 6. Production count among policies (y-axis: production count (parts)).

6. Conclusion

In this paper, the CM scheduling problem in a serial stochastic production system is formulated as an OTA problem. We defined the real-time expected cost rate of multi-level maintenance actions based on machine age, production loss and production loss risk. The expected cost rate serves as the payoff function of the OTA problem, which guarantees an optimal solution, and an optimal maintenance policy can be derived from this solution. Numerical experiments are performed, and a simulation case study demonstrates the effectiveness of the OTA based policy. However, solving the optimization problem could be computationally expensive for very large systems, such as a production line with more than 50 machines. In the future, we will explore more efficient algorithms by incorporating machine learning techniques.

Acknowledgments:

This work was supported by the U.S. National Science Foundation (NSF) Grants 1351160 and 1853454.
References:

[1] T. Fetene Adane, M. F. Bianchi, A. Archenti, and M. Nicolescu, “Application of system dynamics
for analysis of performance of manufacturing systems,” J. Manuf. Syst., vol. 53, pp. 212–233, 2019.
[2] P. Zheng et al., “Smart manufacturing systems for Industry 4.0: Conceptual framework, scenarios,
and future perspectives,” Front. Mech. Eng., vol. 13, no. 2, pp. 137–150, 2018.
[3] T. Wu, X. Ma, L. Yang, and Y. Zhao, “Proactive maintenance scheduling in consideration of
imperfect repairs and production wait time,” J. Manuf. Syst., vol. 53, pp. 183–194, 2019.
[4] F. Ding and Z. Tian, “Opportunistic maintenance for wind farms considering multi-level imperfect
maintenance thresholds,” Renew. Energy, vol. 45, pp. 175–182, 2012.
[5] P. Do, A. Voisin, E. Levrat, and B. Iung, “A proactive condition-based maintenance strategy with
both perfect and imperfect maintenance actions,” Reliab. Eng. Syst. Saf., vol. 133, pp. 22–32, 2015.
[6] H. Pham and H. Wang, “Imperfect maintenance,” Eur. J. Oper. Res., vol. 94, no. 3, pp. 425–438,
1996.
[7] J. Huang, Q. Chang, J. Zou, and J. Arinez, “A real-time maintenance policy for multi-stage
manufacturing systems considering imperfect maintenance effects,” IEEE Access, vol. 6, pp.
62174–62183, 2018.
[8] H. Wang, “A survey of maintenance policies of deteriorating systems,” Eur. J. Oper. Res., 2002.
[9] E. Ruschel, E. A. P. Santos, and E. de F. R. Loures, “Industrial maintenance decision-making: A
systematic literature review,” J. Manuf. Syst., vol. 45, pp. 180–194, 2017.
[10] Q. Chang, J. Ni, P. Bandyopadhyay, S. Biller, and G. Xiao, “Maintenance Opportunity Planning
System,” J. Manuf. Sci. Eng., 2007.
[11] X. Gu, X. Jin, W. Guo, and J. Ni, “Estimation of active maintenance opportunity windows in
Bernoulli production lines,” J. Manuf. Syst., vol. 45, pp. 109–120, 2017.
[12] H. Pham and H. Wang, "Optimal (τ, T) opportunistic maintenance of a k-out-of-n:G system
with imperfect PM and partial failure," Nav. Res. Logist., vol. 47, no. 3, pp. 223–239, 2000.
[13] H. Wang and H. Pham, “Optimal preparedness maintenance of multi-unit systems with imperfect
maintenance and economic dependence,” Reliab. Optim. Maint., pp. 135–150, 2006.
[14] X. Wang, Y. Zhang, L. Wang, J. Wang, and J. Lu, “Maintenance grouping optimization with
system multi-level information based on BN lifetime prediction model,” J. Manuf. Syst., vol. 50,
pp. 201–211, 2019.
[15] J. Li and S. M. Meerkov, Production systems engineering. Springer Science & Business Media,
2008.
[16] A. Alrabghi and A. Tiwari, “State of the art in simulation-based optimisation for maintenance
systems,” Comput. Ind. Eng., vol. 82, pp. 167–182, 2015.
[17] Z. M. Yang, D. Djurdjanovic, and J. Ni, “Maintenance scheduling in manufacturing systems based
on predicted machine degradation,” J. Intell. Manuf., vol. 19, no. 1, pp. 87–98, 2008.
[18] M. Pandey, M. J. Zuo, and R. Moghaddass, “Selective maintenance scheduling over a finite
planning horizon,” Proc. Inst. Mech. Eng. Part O J. Risk Reliab., vol. 230, no. 2, pp. 162–177,
2016.
[19] H. Abdollahzadeh, K. Atashgar, and M. Abbasi, “Multi-objective opportunistic maintenance
optimization of a wind farm considering limited number of maintenance groups,” Renew. Energy,
vol. 88, pp. 247–261, 2016.
[20] J. Zou, Q. Chang, J. Arinez, G. Xiao, and Y. Lei, “Dynamic production system diagnosis and
prognosis using model-based data-driven method,” Expert Syst. Appl., 2017.
[21] Z. Yang, Q. Chang, D. Djurdjanovic, J. Ni, and J. Lee, “Maintenance Priority Assignment Utilizing
On-line Production Information,” J. Manuf. Sci. Eng., 2007.
[22] J. Levitt, “Maintenance Management,” Kirk‐Othmer Encycl. Chem. Technol., pp. 1–16, 2000.
[23] B. W. Niebel, Engineering maintenance management. CRC Press, 1994.
[24] T. Xia, X. Fang, N. Gebraeel, L. Xi, and E. Pan, “Online Analytics Framework of Sensor-Driven
Prognosis and Opportunistic Maintenance for Mass Customization,” J. Manuf. Sci. Eng., vol. 141,
no. 5, p. 51011, 2019.
[25] Y. S. Park, G. Egilmez, and M. Kucukvar, “A novel life cycle-based principal component analysis
framework for eco-efficiency analysis: case of the United States manufacturing and transportation
nexus,” J. Clean. Prod., vol. 92, pp. 327–342, 2015.
[26] P. Zurano-Cervelló, C. Pozo, J. M. Mateo-Sanz, L. Jiménez, and G. Guillén-Gosálbez, “Eco-
efficiency assessment of EU manufacturing sectors combining input-output tables and data
envelopment analysis following production and consumption-based accounting approaches,” J.
Clean. Prod., vol. 174, pp. 1161–1189, 2018.
[27] J. Zou, Q. Chang, X. Ou, J. Arinez, and G. Xiao, “Resilient adaptive control based on renewal
particle swarm optimization to improve production system energy efficiency,” J. Manuf. Syst.,
2019.
[28] Q. Chang, G. Xiao, S. Biller, and L. Li, “Energy saving opportunity analysis of automotive serial
production systems (March 2012),” IEEE Trans. Autom. Sci. Eng., 2013.
[29] L. Luo, N. Chakraborty, and K. Sycara, “Competitive analysis of repeated greedy auction algorithm
for online multi-robot task assignment,” in Proceedings - IEEE International Conference on
Robotics and Automation, 2012.
[30] J. Liu, Q. Chang, G. Xiao, and S. Biller, “The Costs of Downtime Incidents in Serial Multistage
Manufacturing Systems,” J. Manuf. Sci. Eng., vol. 134, no. 2, p. 021016, 2012.
