IST 11 Edge - PID1744101
net/publication/252022440
5 authors, including: J. M. Bonelo (Universidad de Cádiz) and M. J. Martín-Vázquez (University of Malaga).
All content following this page was uploaded by Carlos G. Spinola on 28 May 2014.
Abstract: An image acquisition and processing system to measure the width and inspect the quality of stainless steel strip in a production line is presented in this paper. It is based on the real-time processing of the images acquired by a twin linear camera system. Image processing algorithms to detect defects and anomalies in the edges have been implemented. The core of the image processing is submitted to the GPU of the graphics card to reduce the total processing time. A system like this has been tested and installed in a stainless steel production line for quality control purposes.

Image processing; edge line search; edge defect detection; GPU, CUDA, graphic card processing; quality control.

I. INTRODUCTION

Stainless steel is a material of high added value and increasing importance in our world. It is initially produced in the melting shop, and in the subsequent hot rolling mill it is usually shaped into strip coils several hundred meters long and of different widths and thicknesses (Fig. 1). The strip width and thickness must be monitored, as these dimensions are important for fulfilling customer orders [1]. The strip surface and edges must also be accurately inspected to detect any kind of anomaly, both for quality control purposes and to avoid trouble in subsequent processing lines, especially in the cold rolling mill, where the very smooth rolls could be damaged and the strip could even break.

This project was conceived to fulfill both goals simultaneously, namely continuous strip width measurement and edge quality inspection for all coils processed in a production line.

Previous works have presented real-time inspection systems based on image processing in industrial environments using area cameras [2][3] and linear cameras [4]-[8]. It was therefore decided to design an industrial width measuring system based on image processing that could also provide edge inspection and detection of defects such as cracks, scratches, strike marks, etc. As we explain later, a twin linear camera system was used because of the precision required, the range of values to be measured and the peculiarities of the places available to position the system.

The description of the system, the basic model of the measuring task and the inspection algorithms implemented are presented in this paper. Accomplishing all these objectives in real time and in an industrial environment is very demanding, so the initial algorithms, developed to run on a standard CPU, were modified and adapted to run on the GPU (Graphics Processing Unit) of the graphics card. A dramatic reduction in processing time was achieved by exploiting its parallel processing capability, making room for additional and more detailed inspection in the future. This can be considered an important feature of the presented system.

This device has been installed and tested, and it is in operation in the Acerinox factory in Spain, one of the major producers of stainless steel in Europe.
The edge line positions $d_{real1}$ and $d_{real2}$, measured with respect to the optical center of the image, are determined in each pair of twin acquired images. The distance k between the two cameras is known from the system calibration; therefore a representative value of the coil width $d_{wd}$ for this pair of images can be calculated as in (2).

$$d_{wd} = d_{real1} + d_{real2} + k \qquad (2)$$

An ultrasonic sensor installed between the two cameras provides the sheet thickness, which is needed to calculate the width.

The detailed geometrical model, which accounts for the different thicknesses of the coils, their possible displacement from the central axis and the correction implemented to ensure the required accuracy, is presented in [7] and [8].

V. EDGE DEFECT DETECTION

A common feature of most edge defects is that they affect the straightness of the coil edge, as they are caused by a hit or strike (Fig. 5.a), and this fact is used to detect such defects in the image.

The defect detection algorithm complements the edge detection and is run once the position of the coil edge in the image has been determined. The approach consists of analyzing in more detail a narrow region around the detected edge. The steps of the algorithm are described below:

a) An ROI centered around the coil edge is selected (Fig. 5.a).

b) This ROI is binarized, separating roll pixels from steel pixels, in order to improve the robustness of the edge detection algorithm when working with small regions.

c) Afterwards, the ROI is divided into n windows along the edge, a number high enough to achieve the required accuracy while keeping the computational cost low enough to allow real-time processing. In this application, windows of 50x600 pixels (10 x 120 mm) have been selected.

d) Each binarized window i is processed in a way similar to that described earlier to detect the edge position in it. After completing this step a vector E(n) is obtained, where element E(i) contains the relative position of the local edge in the i-th window with regard to the global edge line in the image (Fig. 5.b).

e) Finally, E(n) is processed to detect abrupt variations in the local edge position. See positions 25 and 42 in Fig. 5.b.

Fig. 5. (a) ROI and window i over the acquired image (image rotated 90° to the left); (b) Edge vector E(n).

VI. GPU IMPLEMENTATION

So far it has been assumed that the image processing runs on a standard CPU, which could be placed in the production line itself. However, the number of images acquired and processed depends on the line speed, and the image processing uses almost all the available time. One way to solve this problem is to use the computational power of the GPU of current graphics cards, with its parallel processing capability based on a SIMD (Single Instruction stream, Multiple Data) architecture [11][12][13].

There are many high-level libraries to program the GPU, but most of them are oriented to very specific graphics programming applications and are harder to use, as they are not suited to a general-purpose application like ours.

CUDA (Compute Unified Device Architecture) is a general parallel programming framework developed by NVIDIA [14][15] to be used in general-purpose C language applications [16] to take advantage of the parallel processing capability of the graphics card. CUDA introduces the concept of threads, or elemental processors, that can be grouped in 1-, 2- or 3-dimensional grids that run the same operator over a data matrix. These simple abstractions allow some independence from the details of the graphics card model and shorten the development time.

VII. GPU IMAGE PROCESSING

When adapting the described algorithms to the GPU, it is possible to avoid some of the restrictions introduced earlier to
reduce the execution time, such as decimation and ROI selection. Therefore coil edge detection, width measurement and defect detection, described as sequential tasks in Sections IV and V, can be combined.

The image processing tasks explained in Sections III, IV and V can be submitted to the GPU as described in the following steps.

1. Memory allocation and transfer. GPU memory is allocated and the image matrix is transferred to it.

2. Welding Line Detection. This step is essentially identical to the one described in III. The GPU implementation of classical algorithms such as the Sobel filter is developed in a way similar to that described in references [17][18].

3. Averaging the luminance of image columns. The image is logically divided into N windows of the same height. A thread is devoted to each column, and the 1xw grid of threads, defined as shown in Fig. 6 (step 3), processes each row in parallel. Each time the average of a window is completed, the result is stored. Finally, a matrix M(n,w) is obtained in which each row holds the average luminance of the columns of the i-th window.

4. Filtering and gradient calculation of each row of the M(n,w) matrix. Each thread is assigned to an element whose result depends on the contiguous ones in the same row. The 1xw grid of threads, defined as shown in Fig. 6 (step 4), processes each row in parallel. The result is stored in the S(n,w) matrix.

5. Getting the position of the local coil edge. A thread is devoted to processing each row. The nx1 grid of threads defined in this step processes each column in parallel. After completing it, a vector E(n) is obtained whose element E(i) stores the position of the local edge in the i-th window (Fig. 6, step 5).

Vector E(n) is the final result of the GPU processing. It is transferred to the CPU, where the rest of the processing is done, because it has only a few elements and little parallelism is involved. The mean value of E(n) can be taken as the coil edge location in the image, $d_{real1}$ or $d_{real2}$, depending on the side of the image. These values are used to calculate the coil width, as in (2). The analysis of vector E(n) to detect edge defects is the same as that explained in V.

VIII. PERFORMANCE IMPROVEMENTS

The processing times of the presented algorithms, applied to 2048x2048 images, were compared for the CPU and GPU implementations. The transfer time of the image to the GPU, which is a time-consuming operation, was also taken into account. The hardware used as the test bench was:

• CPU: Intel Pentium Dual Core E5200 2.52GHz, 1GB RAM.
• GPU: NVIDIA GeForce GTX 285, 1 GB Global Memory. 240 CUDA cores. Compute capability 1.3.

TABLE I shows the processing times obtained. The speed-up obtained in weld line detection is 10x and in edge inspection 14x, with an overall speed-up of 9x.

TABLE I. COMPARISON OF TASK PROCESSING TIMES.

| Task | CPU time (ms) | GPU time (ms) | Speed-Up |
|---|---|---|---|
| Load Image | - | 3.75 | - |
| Weld Line Detection | 78 | 7.81 | 10x |
| Edge Detection | 63 | 4.40 | 14x |
| TOTAL | 141 | 15.96 | 9x |

Apart from the E(n) vector analysis, there is additional miscellaneous CPU processing for each image, such as database access, results storage, image compression, etc. Since there are two processors in the system, an extra reduction in processing time can be achieved if the tasks are done in parallel rather than sequentially. The GPU processing time Tg and the CPU miscellaneous processing time Tp can be overlapped as shown in Fig. 7. In each period, the i-th image is processed in the GPU while the CPU performs the miscellaneous tasks corresponding to the previous image i-1. It is assumed that a new image is always available at the beginning of each period.
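
The CPU-side post-processing of the edge vector E(n) (the mean edge position, the width from (2), and the abrupt-variation check of Section V, step e) can be illustrated with a minimal sketch. The paper does not state the criterion used to decide that a variation is abrupt, so the median-based test, the `threshold` parameter, the function names and the sample values below are all assumptions made only for illustration; positions are assumed to be already calibrated in mm.

```python
from statistics import mean, median


def edge_position(E):
    """Representative edge location: mean of the local edge positions E(i)."""
    return mean(E)


def coil_width(d_real1, d_real2, k):
    """Equation (2): d_wd = d_real1 + d_real2 + k."""
    return d_real1 + d_real2 + k


def abrupt_variations(E, threshold):
    """Indices of windows whose local edge deviates from the median of E
    by more than `threshold` (hypothetical criterion, not from the paper)."""
    ref = median(E)
    return [i for i, e in enumerate(E) if abs(e - ref) > threshold]


# Hypothetical edge vectors (mm), one per camera; window 3 of the first is dented.
E_cam1 = [100.0, 100.2, 99.8, 103.0, 100.0]
E_cam2 = [200.0, 200.0, 200.0, 200.0, 200.0]

d1 = edge_position(E_cam1)            # d_real1
d2 = edge_position(E_cam2)            # d_real2
width = coil_width(d1, d2, k=900.0)   # k: assumed camera distance in mm
defects = abrupt_variations(E_cam1, threshold=1.0)

print(width, defects)
```

Because E(n) holds only one value per window, this analysis is cheap and sequential, which is consistent with leaving it on the CPU rather than launching another GPU kernel.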