
Information Sciences 181 (2011) 1212–1223

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

A parallel immune algorithm for traveling salesman problem and its application on cold rolling scheduling
Jun Zhao a, Quanli Liu a, Wei Wang a,⇑, Zhuoqun Wei a, Peng Shi b,c
a School of Control Science and Engineering, Dalian University of Technology, Dalian, China
b Department of Computing and Mathematical Sciences, University of Glamorgan, Pontypridd, CF37 1DL, United Kingdom
c School of Engineering and Science, Victoria University, Melbourne, VIC 8001, Australia

Article info

Article history:
Received 10 August 2010
Received in revised form 30 November 2010
Accepted 4 December 2010
Available online 13 December 2010

Keywords:
Parallel immune algorithm
Graphics processing unit
Traveling salesman problem
Cold rolling scheduling

Abstract

Parallel computing provides efficient solutions for combinatorial optimization problems. However, since communication among computing processes is costly, practical parallel or distributed algorithms come with substantial expenditures for hardware, management, and maintenance. In this study, a parallel immune algorithm based on the graphics processing unit (GPU), which was originally designed to process computer graphics in display adapters, is proposed. Genetic operators and a vaccine taboo list structure are designed, and the use of the GPU's internal memory is optimized. To verify the effectiveness and efficiency of the proposed algorithm, various middle-scale traveling salesman problems (TSP) are employed to demonstrate the potential of the proposed techniques. The simulation examples demonstrate that the developed method can greatly improve the computing efficiency for solving the TSP, and the improvement is more pronounced as the TSP scale grows. Furthermore, the derived algorithm is verified by a practical application in the steel industry that arranges the cold rolling schedule of a batch of steel coils.
© 2010 Elsevier Inc. All rights reserved.

1. Introduction

The traveling salesman problem (TSP) is one of the most widely studied problems in combinatorial optimization [13]. Because it is significant for many practical engineering applications, it has attracted researchers from fields such as artificial intelligence, biology, and operations research. The TSP is NP-hard, so practitioners typically seek an acceptable solution within limited computing time. Moreover, the solving time grows rapidly with the problem scale, which makes it difficult for generic algorithms to meet computing-time demands [17].
Methods based on intelligent computing, including genetic algorithms, ant colony optimization, and simulated annealing, have become important tools for this problem [2,3,5,10,12,13,16,18,19,22]. The work in [12] described a new hybrid nature-inspired approach that regards each individual of the colony as a solution. Although it has good global search performance, its traditional version is prone to premature convergence in application and is less efficient during evolution. In [19], an artificial ant system was presented that can find the shortest path from a food source to the nest without using visual cues. Simulated annealing, swarm intelligence, and genetic algorithms for the TSP were introduced in [10,12,19], respectively; although each can solve the TSP by exploiting its own characteristics, they share the common issue of low computing efficiency as the TSP scale grows. Moreover, [13] modified an immune-inspired self-organizing neural network to solve the TSP and compared it with other neural methods; the reported simulation results indicated better performance on combinatorial optimization. However, for application problems that can be cast as TSP prototypes, these intelligent algorithms still suffer from low solving efficiency, which often prevents them from meeting real-time requirements in practice.

⇑ Corresponding author.
E-mail address: wangwei@dlut.edu.cn (W. Wang).

0020-0255/$ - see front matter © 2010 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2010.12.003
Parallel computing can greatly shorten the time needed to obtain a solution and has therefore attracted increasing attention from researchers. However, practical parallel or distributed algorithms generally run on real devices such as computer clusters or multi-core processors. For example, [11] described serial and parallel implementations of simulated annealing and tabu search for the TSP, run on a multi-processor computer with 10 homogeneous 32-bit processors. A cooperative parallel multi-colony ACO for the TSP on a distributed-memory parallel architecture was introduced in [19], together with an analysis of several factors that influence the performance of multi-colony ACO algorithms. The work in [5] reviewed existing parallel and distributed ACO algorithms and introduced exchange strategies that organize the interaction among a group of colonies on a PC cluster. In [15], a tower-like master–slave parallel paradigm for artificial immune systems was constructed, in which a parallel immune memory was designed. Because they rely on machine clusters, these parallel algorithms incur substantial expenditures for hardware investment, resource management, and device maintenance. Similarly, multi-processor techniques are limited by the number of CPU cores. As the problem scale grows, communication among processors, or between master and slave nodes, incurs significant overhead. Recently, the graphics processing unit (GPU), originally designed to support computer graphics processing, has attracted attention as a data-parallel processor because of its low hardware cost [14]. GPU-based general-purpose computation has been extensively employed for algebraic computation [4], matrix decomposition [6], and optimization [20]. Li et al. [7,8] proposed a fine-grained parallel GA and a parallel ACO, respectively, on the GPU to reduce the running time for the TSP. Furthermore, a GPU-based immune algorithm was given in [9], in which the classical crossover and mutation operators were adopted in evolution. However, owing to the specific data-parallel structure of the GPU, the acceleration capacity of that parallelization could still not be fully exploited.
In this study, a GPU-based parallel immune algorithm (PIA) with parallelized genetic and immune operators is proposed for a class of middle-scale TSPs. In the genetic phase, three operators are combined to replace the traditional crossover and mutation; in the immune phase, parallelized vaccine extraction and vaccination are proposed, and a vaccine taboo list is introduced to increase the impact of the vaccines on the individuals and to enlarge the search scope. To verify the proposed parallel approach, several typical TSP benchmarks are used as simulation instances, and a comparative analysis against other intelligent algorithms and their parallelized versions is presented. To further demonstrate the effectiveness of the low-hardware-cost PIA, a practical application in the steel industry is provided, in which the production scheduling of a cold mill serves as an industrial example. The results indicate that the proposed PIA, with its low hardware cost, achieves higher solving efficiency than the existing intelligent algorithms compared.
The rest of this paper is organized as follows. Section 2 describes the TSP and the definition of the middle-scale type, and presents the parallel immune algorithm with its genetic and immune operators. Section 3 gives the implementation of the proposed PIA on the GPU. Section 4 validates the proposed approach on TSP benchmarks and on the industrial application. Finally, Section 5 draws conclusions.

2. Middle-scale TSP and immune algorithm

In the traveling salesman problem, we are given a set $\{c_1, c_2, \ldots, c_N\}$ of cities and, for each pair $\{c_i, c_j\}$ of distinct cities, a distance $d(c_i, c_j)$. The goal of the TSP is to find an ordering $\pi$ of the cities that minimizes the quantity

$$\sum_{i=1}^{N-1} d\big(c_{\pi(i)}, c_{\pi(i+1)}\big) + d\big(c_{\pi(N)}, c_{\pi(1)}\big).$$

This quantity is referred to as the tour length, since it is the length of the tour a salesman makes when visiting the cities in the order specified by the permutation and returning at the end to the initial city. This study concentrates on the symmetric TSP, in which the distances satisfy $d(c_i, c_j) = d(c_j, c_i)$ for $1 \le i, j \le N$. In general, a TSP with more than 100 and fewer than 1000 cities is defined as middle-scale.
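The tour-length definition and the middle-scale criterion can be checked with a short Python sketch. It assumes cities are given as 2-D coordinates with Euclidean distances (the definition above only requires a symmetric distance); `tour_length`, `euclidean_matrix`, and `is_middle_scale` are illustrative names, not the authors' code.

```python
import math

def tour_length(tour, dist):
    """Sum of d(c_pi(i), c_pi(i+1)) for i = 1..N-1 plus the return leg d(c_pi(N), c_pi(1))."""
    total = 0.0
    for i in range(len(tour) - 1):
        total += dist[tour[i]][tour[i + 1]]
    total += dist[tour[-1]][tour[0]]  # return to the initial city
    return total

def euclidean_matrix(coords):
    """Symmetric distance matrix, so dist[i][j] == dist[j][i]."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def is_middle_scale(num_cities):
    """Middle-scale TSP: more than 100 and fewer than 1000 cities."""
    return 100 < num_cities < 1000

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    d = euclidean_matrix(square)
    print(tour_length([0, 1, 2, 3], d))  # 4.0: the four unit sides
    print(tour_length([0, 2, 1, 3], d))  # uses both diagonals, so it is longer
```

Visiting the square's corners in order gives the perimeter; the second ordering crosses both diagonals and is longer, which is exactly what the minimization penalizes.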
Building on the characteristics of the original evolutionary algorithm, the immune algorithm selectively employs feature information of the problem being solved to suppress degeneration during the iterative process [7]. This characteristic information, called the vaccine, is extracted from the problem, called the antigen, and is then transformed into a class of new solutions, usually called antibodies. In this paper, city sequences contained in better solutions are extracted as vaccines, which are used to modify selected individuals to form antibodies.

2.1. Genetic operators

In parallel computing, each individual generally runs on its own process; hence, directly parallelizing crossover or mutation would greatly increase the communication cost between processes and lead to low computing efficiency. In this study, a set of genetic heuristics is designed to replace the traditional crossover and mutation operators. The designed operators are double-bit-exchange, multi-bit-exchange, and gene string move. In this genetic modification, individuals are randomly selected to undergo the operations. The probability of double-bit-exchange is set higher than those of the others because, empirically, double-bit-exchange makes a relatively small change to an individual. Illustrative examples of double-bit-exchange and gene string move are given below; multi-bit-exchange is similar to double-bit-exchange.

Double-bit-exchange:
The parent city tour is 0-5-8-7-4-1-2-6-3-9, and cities 8 and 2 are selected for exchange.
The child city tour is 0-5-2-7-4-1-8-6-3-9.
Gene string move:
The parent city tour is 0-1-2-3-4-5-6-7-8-9, and the gene string 3-4-5 is selected.
The child city tour is 0-1-2-6-7-8-9-3-4-5.

The three genetic operators are applied in turn to each individual to generate offspring, each operator being applied with its designated probability.
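A minimal Python sketch of these operators follows. `double_bit_exchange` and `gene_string_move` reproduce the two worked examples above; `multi_bit_exchange` is a hypothetical reading (the text only says it resembles double-bit-exchange), and the probabilities in `mutate` (0.6, 0.2, 0.2) are placeholders that merely respect the stated rule that double-bit-exchange gets the largest probability.

```python
import random

def double_bit_exchange(tour, i, j):
    """Swap the cities at positions i and j."""
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child

def multi_bit_exchange(tour, positions, rng):
    """Hypothetical reading: randomly permute the cities at several positions."""
    child = list(tour)
    cities = [child[p] for p in positions]
    rng.shuffle(cities)
    for p, c in zip(positions, cities):
        child[p] = c
    return child

def gene_string_move(tour, start, length, insert_at):
    """Cut tour[start:start+length] and reinsert it at index insert_at of the rest."""
    segment = tour[start:start + length]
    rest = tour[:start] + tour[start + length:]
    return rest[:insert_at] + segment + rest[insert_at:]

def mutate(tour, rng, probs=(0.6, 0.2, 0.2)):
    """Apply the three operators one after another, each with its own
    probability (the default values are illustrative assumptions)."""
    n = len(tour)
    child = list(tour)
    if rng.random() < probs[0]:
        i, j = rng.sample(range(n), 2)
        child = double_bit_exchange(child, i, j)
    if rng.random() < probs[1]:
        positions = rng.sample(range(n), min(4, n))
        child = multi_bit_exchange(child, positions, rng)
    if rng.random() < probs[2]:
        length = rng.randint(1, max(1, n // 4))
        start = rng.randint(0, n - length)
        insert_at = rng.randint(0, n - length)  # any index of the remaining tour
        child = gene_string_move(child, start, length, insert_at)
    return child

if __name__ == "__main__":
    print(double_bit_exchange([0, 5, 8, 7, 4, 1, 2, 6, 3, 9], 2, 6))
    print(gene_string_move(list(range(10)), 3, 3, 7))
    print(mutate(list(range(10)), random.Random(0)))
```

Every operator only rearranges cities, so each child remains a valid permutation and needs no repair step, which keeps the per-individual work independent of other individuals.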

2.2. Immune operator

Solutions with good objective values usually consist largely of short routes between neighboring cities. Therefore, the shortest route between two neighboring cities is likely to be contained in the optimal solution. In view of this, we extract vaccines as feature tour segments formed by nearest-neighbor routes.
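Following the extraction rule of Section 2.2.1 (each city paired with its nearest successor), the vaccine pool can be sketched in a few lines of Python. The function name is illustrative, and keeping both (a, b) and (b, a) when two cities are mutual nearest neighbors is an assumption; the text only says the pool size follows from the number of cities.

```python
def extract_vaccines(dist):
    """Vaccine pool: one segment (city, nearest successor) per city.
    Ties are broken by the lowest city index (an assumption)."""
    n = len(dist)
    pool = []
    for a in range(n):
        nearest = min((b for b in range(n) if b != a), key=lambda b: dist[a][b])
        pool.append((a, nearest))
    return pool

if __name__ == "__main__":
    xs = [0, 1, 3, 10]  # four cities on a line
    d = [[abs(p - q) for q in xs] for p in xs]
    print(extract_vaccines(d))  # [(0, 1), (1, 0), (2, 1), (3, 2)]
```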

2.2.1. Vaccination
A vaccine pool is employed in this study to hold all of the feature information used in the immune process.
First, the size of the vaccine pool is determined by the number of cities. Starting from each city, its nearest successor is chosen, and the tour segment consisting of the city and its nearest neighbor is stored in the vaccine pool. Each such segment serves as a vaccine that can be randomly selected to vaccinate a certain number of individuals. The vaccination process is illustrated in Fig. 1. As an example, let the old antigen be 0-3-5-7-2-1-4-8-6-9. Assuming the selected vaccine is 5-8, the antigen is vaccinated to produce the following list of antibodies:

We then calculate the fitness values of all generated antibodies and pick the best one as the new antigen after vaccination.
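The antibody list for this example is not recoverable from the text, so the Python sketch below assumes one plausible construction: remove the vaccine's two cities from the antigen, reinsert the segment 5-8 at every distinct position of the remaining closed tour, and keep the shortest antibody. The function names and this insertion rule are hypothetical.

```python
def tour_length(tour, dist):
    """Closed tour length, including the return to the first city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def vaccinate(antigen, vaccine, dist):
    """Hypothetical vaccination: build the antibody list by inserting the
    segment (a, b) at every position of the tour without a and b, then
    return the shortest antibody together with the whole list."""
    a, b = vaccine
    rest = [c for c in antigen if c not in (a, b)]
    # Positions 0..len(rest)-1 already cover every distinct cyclic insertion
    # point (inserting at len(rest) gives the same cycle as inserting at 0).
    antibodies = [rest[:k] + [a, b] + rest[k:] for k in range(len(rest))]
    best = min(antibodies, key=lambda t: tour_length(t, dist))
    return best, antibodies

if __name__ == "__main__":
    import math
    pts = [(math.cos(2 * math.pi * k / 10), math.sin(2 * math.pi * k / 10)) for k in range(10)]
    d = [[math.dist(p, q) for q in pts] for p in pts]
    best, antibodies = vaccinate([0, 3, 5, 7, 2, 1, 4, 8, 6, 9], (5, 8), d)
    print(len(antibodies), best)
```

Whatever the exact construction, every antibody contains the vaccine segment, and only the fittest one replaces the antigen.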

Fig. 1. Vaccination process.



Fig. 2. The taboo list related operation.

2.2.2. Vaccine taboo list


Since vaccines are randomly selected from the pool, some of them may well be chosen repeatedly within a few generations. This can easily trap the algorithm in a local optimum or degrade its efficiency. To avoid this, a vaccine taboo list is introduced to enlarge the search scope and avoid repetitive vaccination. Fig. 2 shows the taboo-list operation, assuming that the taboo list is full during the iteration and that a newly selected vaccine, 19-6, is pushed into the list after vaccination. The earliest stored vaccine, 7-11, is then released and returned to the vaccine pool. The taboo list length is set empirically to one fifth of the number of extracted vaccines.
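The taboo list behaves as a first-in, first-out queue, which the Python sketch below models with `collections.deque`. The class name is illustrative; rounding one fifth of the pool size down (with a minimum length of 1) and removing a vaccine from the pool while it is taboo are assumptions consistent with the release-and-return step described above.

```python
from collections import deque

class VaccineTabooList:
    """FIFO taboo list: a used vaccine leaves the pool and enters the list;
    when the list is full, its oldest vaccine is released back to the pool."""

    def __init__(self, num_vaccines):
        # One fifth of the extracted vaccines, rounded down, at least 1.
        self.capacity = max(1, num_vaccines // 5)
        self.items = deque()

    def push(self, vaccine, pool):
        """Move `vaccine` from the pool into the list; return the released
        vaccine, or None if the list was not yet full."""
        pool.remove(vaccine)
        released = None
        if len(self.items) == self.capacity:
            released = self.items.popleft()
            pool.append(released)
        self.items.append(vaccine)
        return released

if __name__ == "__main__":
    pool = [(7, 11), (3, 4), (19, 6), (0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (8, 9)]
    taboo = VaccineTabooList(len(pool))  # capacity 2
    taboo.push((7, 11), pool)
    taboo.push((3, 4), pool)
    print(taboo.push((19, 6), pool))  # (7, 11) is released, as in Fig. 2
```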

3. Parallelization on GPU

Although GPUs were originally used to process computer graphics, this processing is in essence single-instruction-multiple-data (SIMD) [14]. In this paper, we transform the immune algorithm for a class of middle-scale TSPs into an SIMD process and make full use of the GPU's parallel structure to improve the algorithm's efficiency.

3.1. Parallel model

In the solving process described above, each individual in the colony evolves independently through genetic operations, fitness calculation, and immune operations. If the algorithm were parallelized directly on the GPU, load imbalance among the processors could force processes to wait for one another, which clearly degrades the solving efficiency. On the other hand, the number of parallel processes must also be chosen reasonably. Generally, the more processes there are, the higher the acceleration ratio; however, as the number of processes rises, inter-process communication and management also become more complicated.
In this study, we employ the multiple processors of the GPU, where each individual evolves on one process of a processor. To avoid inter-process waiting during computation, we establish a separate parallel mechanism for each evolution operator and synchronize the processes after each parallelized operation. The GPU-based parallel model of the algorithm is shown in Fig. 3, where the host (the CPU) launches the separate kernels that issue parallelized instructions to the device, and the device (the GPU) allocates the computing tasks to its multiprocessors (blocks), each consisting of a number of threads running in parallel. The number of processes available to PIA is determined by both the number of blocks and the number of threads per block.
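The mapping from blocks and threads to individuals, and the "one kernel per operator, synchronize after each" scheme, can be emulated serially in Python. The sketch assumes 256 threads per block (the text does not give this value) and uses the standard CUDA global-index formula `blockIdx.x * blockDim.x + threadIdx.x`; the function names are hypothetical and no GPU code is involved.

```python
import math

def launch_config(population, threads_per_block=256):
    """Blocks needed so that every individual gets exactly one thread."""
    blocks = math.ceil(population / threads_per_block)
    return blocks, threads_per_block

def individual_index(block_idx, thread_idx, threads_per_block):
    """CUDA-style global thread index: blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * threads_per_block + thread_idx

def run_generation(population, kernels, threads_per_block=256):
    """Serial emulation of the host loop: each 'kernel' (one per evolution
    operator) finishes on all individuals before the next one starts."""
    blocks, tpb = launch_config(len(population), threads_per_block)
    for kernel in kernels:  # e.g. genetic, immune, fitness kernels
        for b in range(blocks):
            for t in range(tpb):
                idx = individual_index(b, t, tpb)
                if idx < len(population):  # guard threads of the last, partial block
                    population[idx] = kernel(population[idx])
        # A real CUDA host would synchronize with the device here.
    return population

if __name__ == "__main__":
    print(launch_config(768))  # (3, 256)
    print(run_generation([0, 1, 2], [lambda x: x + 1, lambda x: 2 * x], threads_per_block=2))
```

Because no individual depends on another within a kernel, the only coordination needed is the barrier between kernels, which is the point of giving each operator its own parallel mechanism.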

3.2. Memory optimization

In the GPU, texture memory is a read-only memory that originates from the texture rendering unit specialized for computer images; data stored in texture memory can be read quickly through the texture cache. Shared memory in each multiprocessor, in turn, has a high access speed despite having much less storage space than device memory. The proposed PIA optimizes the use of texture and shared memory: texture memory is used for operations involving large amounts of data access, such as storing the city distance data, while shared memory is employed to accelerate iterative access. The algorithm process is illustrated in Fig. 4.

[Figure: the host launches Kernel 0 to Kernel n; each kernel runs a grid of blocks, Block(0,0) to Block(1,4), and each block contains threads, Thread(0,0) to Thread(3,1).]
Fig. 3. GPU-based parallel model.
Step ① copies data from the CPU to the GPU device, including the initialized individuals of the colony, the city distance data bound to texture memory, and the parallel parameters. Step ② loads the vaccines from device memory into shared memory and then maps the individuals one-to-one onto the computing processes; the newly vaccinated individuals are stored in device memory. Together with step ②, all processes can asynchronously obtain information from shared memory (step ④) and texture memory (step ③) in parallel, without a master–slave management mode, which further saves inter-process communication cost. Step ④ also parallelizes the vaccination for each process and updates the taboo list. In Fig. 4, a bidirectional arrow denotes data transferred in both directions, while a unidirectional arrow denotes a one-way data stream. Since the whole computation reads and writes device memory only once, with vaccination served by shared memory and objective value calculation by texture memory, the solving efficiency of PIA is greatly improved.

Fig. 4. The optimized process based on GPU structure.



3.3. Implementation of PIA

For the immune algorithm, the initial colony is generated randomly with a fixed population size, and the coding length of each individual equals the number of cities in the TSP. The genetic operations of each individual are performed on one thread, which can be regarded as a computing process. In the parallelization, each GPU block serves as a processor, and the threads in a block communicate with each other through shared memory. The solving process is as follows.

Step 1: Initialize the GPU device and the algorithm parameters, including the blocks, the threads, and the individuals.
Step 2: Load the city distance data and the initialized data from the CPU into the GPU.
Step 3: Extract the vaccines from the city distance data and store them in the shared memory of the GPU.
Step 4: Complete the genetic procedure based on the three proposed operators.
Step 5: Complete the immune procedure for the newly generated individuals, including the taboo list check and update.
Step 6: Calculate the fitness value of each individual, construct a new generation by tournament selection, and record the best-so-far solution.
Step 7: Check whether the maximum number of iterations has been reached. If so, output the best-so-far solution; otherwise, return to Step 4.
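A serial Python skeleton of Steps 1–7 is sketched below to make the control flow concrete. It is deliberately simplified and is not the authors' implementation: Step 4 applies only a double-bit-exchange, Step 5 moves a vaccine's second city directly after its first without a taboo list, tournament size 2 is an assumption, and the GPU data transfers of Step 2 are omitted. On the GPU, the loop body over `pop` is what each thread would execute for its own individual.

```python
import math
import random

def tour_length(tour, dist):
    """Closed tour length, including the return to the first city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def tournament(pop, fits, rng, k=2):
    """Return a copy of the best (shortest) of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return list(pop[min(contenders, key=lambda i: fits[i])])

def serial_pia(dist, pop_size=50, generations=200, seed=0):
    """Serial skeleton of Steps 1-7 (no GPU, simplified operators)."""
    rng = random.Random(seed)
    n = len(dist)
    # Step 1: random initial colony; each individual is a permutation of n cities.
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    # Step 3: vaccine pool of (city, nearest successor) segments.
    vaccines = [(a, min((b for b in range(n) if b != a), key=lambda b: dist[a][b]))
                for a in range(n)]
    best = min(pop, key=lambda t: tour_length(t, dist))
    for _ in range(generations):  # Step 7: fixed iteration budget
        children = []
        for tour in pop:  # on the GPU: one thread per individual
            child = list(tour)
            # Step 4 (simplified): a single double-bit-exchange.
            i, j = rng.sample(range(n), 2)
            child[i], child[j] = child[j], child[i]
            # Step 5 (simplified, no taboo list): move city b right after city a.
            a, b = rng.choice(vaccines)
            child.remove(b)
            child.insert(child.index(a) + 1, b)
            children.append(child)
        # Step 6: fitness, tournament selection, best-so-far record.
        fits = [tour_length(t, dist) for t in children]
        pop = [tournament(children, fits, rng) for _ in range(pop_size)]
        champion = children[min(range(pop_size), key=lambda i: fits[i])]
        if tour_length(champion, dist) < tour_length(best, dist):
            best = champion
    return best

if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    d = [[math.dist(p, q) for q in pts] for p in pts]
    print(round(tour_length(serial_pia(d), d), 3))
```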

4. Simulations and industrial application

4.1. Simulations

To verify the effectiveness of the proposed parallel algorithm, we apply it to a class of middle-scale symmetric TSP benchmarks. The hardware environment is a dual-core 2.80 GHz CPU, 2 GB of RAM, and a GeForce GTS 250 GPU; the software platform is the Compute Unified Device Architecture (CUDA) 2.0 [6]. In the verification, we increase the number of cities and the population size, respectively, to test the proposed PIA. Three TSP benchmarks are used, with 150 to 318 cities. Each experiment is run 50 times independently on both the CPU and the GPU, and the maximum number of generations is 1000, the same setting as in [9]. The experimental data and the corresponding results are listed in Table 1, where the column Acc denotes the acceleration ratio between the serial algorithm and its parallel version.
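The Acc column is consistent with the ratio of serial (CPU) time to parallel (GPU) time for every row of Table 1. The short Python check below recomputes it for a few rows copied from the table; the function name is illustrative.

```python
# (instance, population size, GPU time in s, CPU time in s, reported Acc), from Table 1
ROWS = [
    ("kroA150", 400, 42.5, 133.9, 3.15),
    ("kroA150", 768, 66.4, 261.1, 3.93),
    ("kroA200", 768, 84.8, 423.8, 5.00),
    ("lin318", 1024, 189.1, 1420.1, 7.51),
]

def acceleration_ratio(cpu_time, gpu_time):
    """Acc = serial running time / parallel running time."""
    return cpu_time / gpu_time

for name, pop, gpu, cpu, reported in ROWS:
    acc = acceleration_ratio(cpu, gpu)
    print(f"{name:8s} pop={pop:5d} Acc={acc:.2f} (reported {reported:.2f})")
```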
It is apparent from Table 1 that the acceleration ratio gradually increases as the population size and the number of cities grow, which is consistent with the theoretical analysis of parallel computing. In general, the more individuals the colony contains, the stronger the global search capacity of the parallel computation. However, the computing cost also increases with the population size owing to inter-process communication and management. For the first two instances, the best tour length is obtained when the population size, that is, the number of parallel processes, reaches 768, and it is not improved by further enlarging the population. For lin318, the best population size is 1024. Fig. 5 illustrates the relationship between population size and computational time for the serial and parallel algorithms. The computing time of both increases as the population size or the problem scale grows; however, the growth rates clearly differ, with the serial version growing much faster than the parallel one. Therefore, the parallel algorithm exhibits high solving efficiency when the problem scale becomes large.
Table 1
Running results of the proposed PIA on the TSP benchmarks.

TSP      Population size  Best tour length  GPU time (s)  CPU time (s)  Acc
kroA150  400              30918             42.5          133.9         3.15
kroA150  512              30578             49.5          177.7         3.59
kroA150  600              30089             55.1          202.2         3.67
kroA150  768              28366             66.4          261.1         3.93
kroA150  800              28366             69.8          280.3         4.02
kroA200  400              36151             57.1          237.5         4.16
kroA200  512              33344             61.5          286.6         4.66
kroA200  600              32857             73.2          348.4         4.76
kroA200  768              31097             84.8          423.8         5.00
kroA200  800              31097             88.3          461.8         5.23
lin318   768              57138             149.3         1071.9        7.18
lin318   800              54568             156.9         1135.9        7.24
lin318   1000             49366             188.7         1396.2        7.40
lin318   1024             42519             189.1         1420.1        7.51
lin318   1200             42519             214.3         1615.8        7.54

Fig. 5. The relationship between the population size and the running time for the benchmarks.

For further validation of the proposed PIA, a group of comparative experiments is also carried out with the genetic algorithm (GA) from [7], ant colony optimization (ACO) from [8], the genetic-ant colony algorithm (GACO) from [1], and the generic immune algorithm (IA) from [9]. Each is implemented as both a serial and a GPU-based parallel algorithm, and the iteration number is again set to 1000, as in [9]. For a fair comparison, five TSP instances, some of which are used in [9], are considered, and the results are reported in Table 2. The table shows that the proposed method obtains shorter tours than the other methods on all instances except kroA150 and ts225, where it matches IA. More importantly, the proposed IA shows a clear advantage in solving time: its parallel version is the fastest on every instance, ranging from about 1.1 times faster than the parallel GA (lin318) to about 5.2 times faster than the parallel IA (kroA200). Such results are promising for applications that require high solving efficiency. The acceleration ratio of every method gradually increases as the number of cities grows, which again agrees with the theoretical analysis of parallel computing. Although the generic IA attains the highest acceleration ratio in all simulations, its parallel version still needs much more computing time than the proposed approach.

4.2. Cold rolling production scheduling

In this section, the proposed parallel algorithm is applied to a practical industrial problem that requires relatively high search efficiency. Cold rolling is a critical procedure after casting and hot rolling in the steel industry, and its scheduling closely affects the logistics and profits of the whole cold rolling line. Nowadays, as manufacturers and plant technicians pay more attention to manufacturing execution systems, the production scheduling of the cold mill has become a significant problem connecting production management with the manufacturing process.

4.2.1. Problem description


The production scheduling problem studied here is a time- and capital-consuming task that arranges a batch of steel coils, usually hundreds of coils, into an optimal rolling sequence under technical constraints. The objective of the scheduling is to minimize roller abrasion and reduce setup cost. In the rolling process, a roller change not only takes time but also adds setup cost; hence, the key to production scheduling is to use each roller as fully as possible over its life cycle. Within a roller-changing cycle, the width profile in cold rolling should go from wide to narrow to avoid leaving edge marks on subsequent coils. In practice, the rollers may be changed only once in a batch, at the point where the width must jump from narrow back to wide. Fig. 6 shows the coil width profile of a batch, in which the width must decrease from wide to narrow within each roller-changing period. In industrial manufacturing, smoothing the width and gauge jumps between adjacently rolled coils improves rolling quality and reduces roller abrasion; the larger the jumps, the more severe the roller abrasion, which may degrade the surface grade of the steel coils.
In engineering practice, a batch usually contains more than 100 steel coils. Because this scheduling problem is NP-hard, it is difficult to obtain a satisfactory solution within acceptable computation time using general mathematical programming. A fast computational algorithm is therefore needed for this practical problem.

Table 2
Comparative experimental results on the TSP benchmarks.

TSP      Algorithm    Best tour length  GPU time (s)  CPU time (s)  Acc
ch130    GA           7571              76.2          251.5         3.3
ch130    ACO          6338              193.6         561.3         2.9
ch130    GACO         6334              201.2         563.4         2.8
ch130    IA           6318              124.1         905.8         7.3
ch130    Proposed IA  6296              64.4          206.2         3.2
kroA150  GA           34252             85.9          292.1         3.4
kroA150  ACO          28939             253.6         1039.8        4.1
kroA150  GACO         28816             261.4         1041.2        4.0
kroA150  IA           28366             211.5         1882.3        8.9
kroA150  Proposed IA  28366             66.4          261.1         3.9
kroA200  GA           46720             170.1         663.3         3.9
kroA200  ACO          31815             365.3         2288.6        6.3
kroA200  GACO         31553             383.2         2296.5        6.0
kroA200  IA           31251             443.6         4568.6        10.3
kroA200  Proposed IA  31097             84.8          423.8         5.0
ts225    GA           157507            179.9         773.6         4.3
ts225    ACO          132394            397.9         2706.2        6.8
ts225    GACO         132105            424.1         2714.4        6.4
ts225    IA           129675            511.0         5876.8        11.5
ts225    Proposed IA  129675            155.7         887.7         5.7
lin318   GA           61102             206.6         1094.1        5.3
lin318   ACO          46776             451.2         3795.3        8.4
lin318   GACO         46297             453.6         3804.7        8.3
lin318   IA           42631             797.8         11834.1       15.8
lin318   Proposed IA  42519             189.1         1420.1        7.5

4.2.2. Mathematic model for cold rolling scheduling


A double TSP model without return is proposed in this study to solve this sequencing problem, since a batch contains two width-decreasing parts (see Fig. 6). For overall optimization, the scheduling must not only optimize the rolling sequence within the batch but also consider the connection with the previous batch to ensure a smooth transition. The last steel coil of the previous batch is regarded as the starting city of the first salesman, and the jump penalty between two adjacent coils in width and gauge is regarded as the distance between two cities. The objective of the model is to minimize the total distance traveled by the two salesmen under the constraints. The model is as follows.

[Figure: coil width along the production sequence, showing the previous batch plan (ending with its last coil), the batch plan studied in this paper, and the next batch plan.]
Fig. 6. The rolling width profile in a rolling batch.

$$\min \; \sum_{i,j} p_{ij}\,(u_{ij} + v_{ij}) \qquad (1)$$

$$\text{subject to} \quad \sum_{j=1,\, j \neq i}^{n} (u_{ij} + v_{ij}) = 1, \quad i = 0, 1, 2, \ldots, n \qquad (2)$$

where

$$u_{ij} = \begin{cases} 1, & \text{if coil } j \text{ is rolled directly after coil } i \text{ and } i, j \text{ are in the first width-decreasing part},\\ 0, & \text{otherwise,} \end{cases}$$

$$v_{ij} = \begin{cases} 1, & \text{if coil } j \text{ is rolled directly after coil } i \text{ and } i, j \text{ are in the second width-decreasing part},\\ 0, & \text{otherwise.} \end{cases}$$

Here, $i = 0$ denotes the last steel coil of the previous batch; $n$ is the number of steel coils; and $p_{ij} = w_{ij} + eg_{ij} + og_{ij}$, where $w_{ij}$, $eg_{ij}$, and $og_{ij}$ represent the width, entrance gauge, and exit gauge jump penalties, respectively. Eq. (1) is the objective function of the scheduling model, which minimizes the total jump penalty of consecutively rolled steel coils. Constraint (2) states that each coil can be processed only once.
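Objective (1) can be evaluated for a candidate schedule with the Python sketch below, which spells out the binary variables: $u_{ij} = 1$ exactly for the arcs of the first salesman's open path (starting at coil 0) and $v_{ij} = 1$ exactly for the arcs of the second. The function names are hypothetical, and letting the caller choose the first node of the second path is an assumption, because the model text does not fix how the two parts are linked.

```python
def combined_penalty(w, eg, og):
    """p_ij = w_ij + eg_ij + og_ij (width, entrance-gauge, exit-gauge penalties)."""
    n = len(w)
    return [[w[i][j] + eg[i][j] + og[i][j] for j in range(n)] for i in range(n)]

def arcs(path):
    """Consecutive (i, j) pairs of an open path; there is no return arc."""
    return list(zip(path, path[1:]))

def objective(path1, path2, p):
    """Eq. (1): sum over i != j of p_ij * (u_ij + v_ij), where u_ij = 1 for the
    arcs of path1 (starting at coil 0, the last coil of the previous batch)
    and v_ij = 1 for the arcs of path2."""
    u = set(arcs(path1))
    v = set(arcs(path2))
    return sum(p[i][j] * ((i, j) in u) + p[i][j] * ((i, j) in v)
               for i in range(len(p)) for j in range(len(p)) if i != j)

if __name__ == "__main__":
    w = [[abs(i - j) for j in range(4)] for i in range(4)]
    zeros = [[0] * 4 for _ in range(4)]
    p = combined_penalty(w, zeros, zeros)
    print(objective([0, 1, 2], [2, 3], p))  # 1 + 1 + 1 = 3
```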
A heuristic method [21] is adopted. Owing to the width profile within a batch and the connection with the previous batch, we first arrange the steel coils whose widths are larger than that of the last coil of the previous batch as Rolling Section 1 (see Fig. 7). The coils in this section are rolled right after the roller change. A standard TSP, solved by the proposed PIA, applies here, and its objective function is calculated from Eq. (1).
The remaining steel coils are then arranged as another rolling section, for which we consider the connection not only with the previous batch but also with Rolling Section 1. The width profile is illustrated in Fig. 8. This model is a typical double TSP without priority, which focuses on reasonably grouping and sequencing these coils. The PIA described above is also employed to solve it, as follows.

Step 1: Generate p binary strings, each with r bits, where r equals the number of remaining steel coils. In each string, '0' means that the corresponding coil is assigned to the segment before the roller change, and '1' means that it is assigned to the segment after it.
Step 2: Based on the assignment in Step 1, two independent TSPs are formed, and the PIA is applied to each of them.
Step 3: Obtain and record the optimized solution under the current grouping. The objective function is the sum of the fitness values of the two TSPs.
Step 4: Perform evolution and immune operations for the next iteration. Then redistribute the new individuals and repeat Steps 2 and 3 to seek a better grouping and sequence.
Step 5: Check whether the maximum number of iterations has been reached. If so, output the best-so-far grouping and rolling sequence; otherwise, repeat Step 4.
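Steps 1–3 can be sketched in Python as decoding one binary string, sequencing each group, and summing the two costs. To keep the sketch short, a greedy nearest-neighbor path stands in for the PIA (a swapped-in heuristic, not the paper's solver), and the start coil of each segment (`start_before`, `start_after`) is a hypothetical parameter supplied by the caller.

```python
def split_by_bits(coils, bits):
    """Step 1 decoding: bit 0 -> segment before the roller change, bit 1 -> after it."""
    before = [c for c, b in zip(coils, bits) if b == 0]
    after = [c for c, b in zip(coils, bits) if b == 1]
    return before, after

def nearest_neighbor_path(start, coils, p):
    """Stand-in for the PIA: greedy open path from `start` through `coils`."""
    path, left = [start], set(coils)
    while left:
        nxt = min(sorted(left), key=lambda c: p[path[-1]][c])  # sorted: deterministic ties
        path.append(nxt)
        left.remove(nxt)
    return path

def path_cost(path, p):
    """Total jump penalty of an open path (no return arc)."""
    return sum(p[i][j] for i, j in zip(path, path[1:]))

def evaluate_grouping(bits, coils, start_before, start_after, p):
    """Steps 2-3: solve the two TSPs independently and sum their costs."""
    before, after = split_by_bits(coils, bits)
    path_b = nearest_neighbor_path(start_before, before, p)
    path_a = nearest_neighbor_path(start_after, after, p)
    return path_cost(path_b, p) + path_cost(path_a, p), path_b, path_a

if __name__ == "__main__":
    x = [0, 1, 5, 2, 6]  # a toy 1-D attribute per coil; coil 0 is the previous batch's last coil
    p = [[abs(a - b) for b in x] for a in x]
    print(evaluate_grouping([0, 1, 0, 1], [1, 2, 3, 4], 0, 0, p))  # (8, [0, 1, 3], [0, 2, 4])
```

In the full method, the population of bit strings would itself evolve (Step 4), and each string's fitness would be the value returned by `evaluate_grouping`.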

At this point, both Rolling Sections have been optimized, and we combine them to form the final optimized rolling batch plan. Based on a large number of simulation experiments with real production data, the penalty coefficients for coil size jumps in the scheduling model are designated as in Table 3.
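The tiered penalties of Table 3 can be encoded as interval upper bounds and looked up with Python's `bisect` module, giving $p_{ij} = w_{ij} + eg_{ij} + og_{ij}$ for two coils. Two assumptions are labeled in the code: jumps are compared by absolute value against each interval's upper bound (which also resolves the table's gaps, such as 5 to 6 mm, and its overlap at 50 mm toward the lower tier), and the coil field names are hypothetical.

```python
import bisect

# Upper bound of each jump interval in Table 3 and the matching penalties;
# the last penalty applies to any jump above the last bound ('>' rows).
WIDTH_BOUNDS, WIDTH_PEN = [5, 10, 30, 50, 100], [1, 5, 10, 30, 50, 100]
ENTRY_BOUNDS, ENTRY_PEN = [0.10, 0.20, 0.30, 0.40, 0.50], [5, 10, 20, 30, 50, 100]
EXIT_BOUNDS, EXIT_PEN = [0.02, 0.05, 0.10, 0.15, 0.20], [5, 10, 30, 50, 100, 200]

def tier_penalty(jump, bounds, penalties):
    """Penalty of the first interval whose upper bound is >= |jump| (assumed rule)."""
    return penalties[bisect.bisect_left(bounds, abs(jump))]

def coil_penalty(a, b):
    """p_ij for coils given as dicts with hypothetical keys 'width' (mm),
    'entry_gauge' (mm), and 'exit_gauge' (mm)."""
    return (tier_penalty(a["width"] - b["width"], WIDTH_BOUNDS, WIDTH_PEN)
            + tier_penalty(a["entry_gauge"] - b["entry_gauge"], ENTRY_BOUNDS, ENTRY_PEN)
            + tier_penalty(a["exit_gauge"] - b["exit_gauge"], EXIT_BOUNDS, EXIT_PEN))

if __name__ == "__main__":
    c1 = {"width": 1250, "entry_gauge": 2.5, "exit_gauge": 0.8}
    c2 = {"width": 1230, "entry_gauge": 2.5, "exit_gauge": 0.8}
    print(coil_penalty(c1, c2))  # 10 (20 mm width jump) + 5 + 5 = 20
```

Note that a zero gauge jump still falls in the first gauge tiers of Table 3 and therefore costs 5; floating-point jumps that land exactly on a bound may need rounding before lookup.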

4.2.3. Algorithm application


In this study, we randomly select two batches of steel coils from the cold rolling plant of Shanghai Baosteel Co. Ltd., China,
as the validation examples, in which the amounts of steel coils are 105 and 135, respectively. At present, the human–ma-
chine interaction based on the workers experience is widely used in most of plants. However, the solving time for a batch
of coils needs generally about 1–2 h. For improving the scheduling efficiency and saving the human-capital cost, we apply

Fig. 7. Rolling Section 1 in a batch.


J. Zhao et al. / Information Sciences 181 (2011) 1212–1223 1221

Fig. 8. Rolling Section 2 in a batch.

Table 3
Penalty structure of the width and gauge jumps.

Width jump (mm)   Penalty   Entrance gauge jump (mm)   Penalty   Exit gauge jump (mm)   Penalty
0–5               1         0–0.10                     5         0–0.02                 5
6–10              5         0.11–0.20                  10        0.03–0.05              10
11–30             10        0.21–0.30                  20        0.06–0.10              30
31–50             30        0.31–0.40                  30        0.11–0.15              50
51–100            50        0.41–0.50                  50        0.16–0.20              100
>100              100       >0.50                      100       >0.20                  200

Table 4
Comparison of computation results using four methods and their parallel versions.

Batch No.   Number of steel coils   Method        Optimal objective value   Computational time (s)
                                                                            Serial        Parallel
1           105                     MA            72930                     About 1.6 h   n/a
                                    GA            61320                     208.6         68.2
                                    ACO           59780                     732.1         192.8
                                    IA            59780                     893.4         113.6
                                    Proposed IA   59780                     193.2         59.7
2           135                     MA            98520                     About 1.8 h   n/a
                                    GA            82710                     251.0         81.5
                                    ACO           77690                     823.6         219.3
                                    IA            77690                     918.5         126.7
                                    Proposed IA   77690                     216.4         64.1

the intelligent algorithms, especially their parallel versions, to this industrial problem. For comparison, we test four methods. These are the manual-based approach (MA), a generic evolutionary computation method (the GA, also from [7]), a swarm intelligence method (the ACO from [8]) and the generic immune algorithm (IA) from [9]. The corresponding parallelized GPU versions of the algorithms are tested as well. Table 4 gives the results for the two batches. In terms of solution quality, the proposed IA attains the same objective value as ACO and the generic IA, and this value is better than that of GA. All of the intelligent algorithms clearly outperform the manual-based approach. As for computational time, every intelligent algorithm achieves an acceleration ratio (serial time divided by parallel time) greater than 3. Even so, both the serial and the parallel versions of the proposed IA are faster than those of the other algorithms. As Table 4 shows, the proposed PIA obtains the optimal schedule for a batch of steel coils in about 1 min (59.7 s and 64.1 s for Batches 1 and 2). This efficiency fully meets the practical scheduling requirement and greatly reduces the human-capital cost.
Taking Batch No. 1 in Table 4 as an example, Fig. 9 compares the size jump trends of the coil sequences obtained by the MA and the proposed PIA. Fig. 9(a) shows the technical constraint of two width-decreasing partitions. Under this constraint, the two methods yield different exit gauge jump trends, as shown in Fig. 9(b): the exit gauge jumps in the PIA schedule are smoother than those in the MA schedule. Such schedules can effectively reduce roller abrasion.
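The acceleration ratios discussed above can be recomputed directly from Table 4. The short Python check below takes the acceleration ratio to be the serial time divided by the parallel time. The MA rows are left out because they have no serial/parallel pair.

```python
# (serial seconds, parallel seconds) transcribed from Table 4.
TABLE4 = {
    1: {"GA": (208.6, 68.2), "ACO": (732.1, 192.8),
        "IA": (893.4, 113.6), "Proposed IA": (193.2, 59.7)},
    2: {"GA": (251.0, 81.5), "ACO": (823.6, 219.3),
        "IA": (918.5, 126.7), "Proposed IA": (216.4, 64.1)},
}


def speedups(table):
    """Acceleration ratio = serial time / parallel time, rounded to 2 decimals."""
    return {batch: {method: round(serial / parallel, 2)
                    for method, (serial, parallel) in rows.items()}
            for batch, rows in table.items()}


if __name__ == "__main__":
    for batch, ratios in speedups(TABLE4).items():
        print(f"Batch {batch}: {ratios}")
```

Every ratio is above 3, consistent with the text; the smallest is about 3.06, for GA on Batch 1. The proposed IA also has the lowest absolute serial and parallel times in both batches.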

Fig. 9. Jump trend comparison in a batch of coils between MA and PIA.

5. Conclusions

Enhancing the solving efficiency of combinatorial optimization has become an active research topic in the field of intelligent computing. In this study, we developed a parallel immune algorithm for TSP based on the GPU architecture, which was originally designed for computer graphics processing. Compared with traditional parallelization, the proposed GPU-based parallel model requires a considerably lower hardware investment. It also delivers high-quality solutions with the fastest solving speed among the typical intelligent algorithms compared. Furthermore, an industrial application of a scale similar to that of the middle-scale TSPs demonstrates the effectiveness of the proposed parallel algorithm. The application results show that both the solving efficiency and the solution quality are remarkable.
In summary, the proposed GPU-based algorithm can (1) easily implement parallel computing on a low-cost hardware device; (2) reduce the management tasks for process communication and data storage; and (3) flexibly enlarge the number of parallel processes according to the application problem.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61034003), and the Engineering and Physical Sciences Research Council, UK (EP/F029195). The cooperation of the cold rolling plant of Shanghai Baosteel Co. Ltd., China is greatly appreciated. The authors would also like to thank the Associate Editor and the anonymous reviewers for their valuable comments and constructive suggestions, which have greatly helped improve the presentation of the paper.

References

[1] A. Acan, GAACO: A GA+ACO hybrid for faster and better search capability, in: Proceedings of International Workshop on Ant Algorithms, Brussels,
Belgium, 2002, pp. 15–26.
[2] I. Ellabib, P. Calamai, O. Basir, Exchange strategies for multiple ant colony system, Information Sciences 177 (2007) 1248–1264.
[3] S. Gao, Z. Tang, C. Vairappan, An effective immune algorithm for multiple-valued logic minimization problems, International Journal of Innovative
Computing, Information and Control 5 (11A) (2009) 3961–3970.
[4] M. Harris, G. Coombe, T. Scheuermann, A. Lastra, Physically-based visual simulation on graphics hardware, in: Proceedings of the ACM Conference on
Graphics Hardware, Saarbrücken, Germany, 2002, pp. 109–118.
[5] Z. Hua, F. Huang, A variable-grouping based genetic algorithm for large-scale integer programming, Information Sciences 176 (19) (2006) 2869–2885.
[6] S. Lahabar, P.J. Narayanan, Singular value decomposition on GPU using CUDA, in: Proceedings of the IEEE International Symposium on Parallel &
Distributed Processing, USA, 2009, pp. 1–10.
[7] J. Li, Z. Chi, D. Wan, Parallel genetic algorithm based on fine-grained model with GPU-accelerated, Control and Decision 23 (6) (2008) 697–700.
[8] J. Li, X. Hu, Z. Pang, K. Qian, A parallel ant colony optimization algorithm based on fine-grained model with GPU-accelerated, Control and Decision 14
(8) (2009) 1132–1136.
[9] J. Li, L. Zhang, L. Liu, A parallel immune algorithm based on fine-grained model with GPU-acceleration, in: International Conference on Innovative
Computing, Information and Control, 2009, pp. 683–686.
[10] C.C. Lo, C.C. Hus, Annealing framework with learning memory, IEEE Transactions on Systems, Man, and Cybernetics, Part A 28 (5) (1998) 1–13.
[11] M. Malek, M. Guruswamy, M. Pandya, Serial and parallel simulated annealing and tabu search algorithms for the traveling salesman problem, Annals of
Operations Research 21 (1989) 59–84.
[12] Y. Marinakis, M. Marinaki, G. Dounias, Honey bees mating optimization algorithm for the Euclidean traveling salesman problem, Information Sciences,
in press, doi:10.1016/j.ins.2010.06.032.
[13] T.A. Masutti, L.N. de Castro, A self-organizing neural network using ideas from the immune system to solve the traveling salesman problem,
Information Sciences 179 (2009) 1454–1468.
[14] J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, T.J. Purcell, A survey of general-purpose computation on graphics hardware,
Eurographics 8 (2005) 21–51.
[15] Y. Qi, L. Jiao, F. Liu, Parallel artificial immune algorithm for large-scale TSP, Acta Electronica Sinica 8 (2008) 1552–1558.
[16] C.K. Ting, S.T. Li, C. Lee, On the harmonious mating strategy through tabu search, Information Sciences 156 (3) (2003) 189–214.
[17] C.F. Tsai, C.W. Tsai, C.C. Tseng, A new hybrid heuristic approach for solving large traveling salesman problem, Information Sciences 166 (2004) 67–81.
[18] C. Twomey, T. Stützle, M. Dorigo, M. Manfrin, M. Birattari, An analysis of communication policies for homogeneous multi-colony ACO algorithms,
Information Sciences 180 (12) (2010) 2390–2404.
[19] A. Ugur, Path planning on a cuboid using genetic algorithms, Information Sciences 178 (16) (2008) 3275–3287.
[20] M. Wong, T. Wong, K. Fok, Parallel evolutionary algorithms on graphics processing unit, in: Proceedings of the IEEE Congress on Evolutionary Computation,
vol. 3, 2005, pp. 2286–2293.
[21] J. Zhao, Q. Liu, W. Wang, Models and algorithms of production scheduling in tandem cold rolling, Acta Automatica Sinica 34 (5) (2008) 565–573.
[22] Y. Zhu, Z. Tang, H. Dai, S. Gao, Cooperation artificial immune system with application to travelling salesman problem, ICIC Express Letters 2 (2) (2008)
143–148.
