HujanenWaltz SPIEConfMachVisionAppIndInspec1995 PipelinedImplementation

SPIE Conf. on Machine Vision Applications in Industrial Inspection. Originally published San Jose, Feb. 1995.
Revised and republished January 1998. Copyright Jan. 1998 by F. M. Waltz & A. A. Hujanen. Duplication by permission only.
PIPELINED IMPLEMENTATION OF BINARY SKELETONIZATION USING FINITE-STATE MACHINES
Ahti A. Hujanen and Frederick M. Waltz
ABSTRACT
Skeletonization of binary images is an essential step in the inspection of many products, most notably printed circuit boards. It is also
used in many other situations, an unusual example being the location of branching points on growing plants for purposes of cutting
and vegetative propagation. Commercially available image processing boards typically cannot perform this operation, although they
readily perform the easier task of repeated binary erosion. While a single skeletonization step cannot be done in one pass using a 3x3
neighborhood, one pass with a 4x4 neighborhood suffices. This result has been implemented in custom integrated circuits embedded
in proprietary products, but (to our knowledge) is not commercially available.
This paper describes a new pipelined implementation of binary skeletonization which fits easily into the standard SKIPSM
(Separated-Kernel Image Processing using finite State Machines) architecture and which can be built using standard ICs costing less
than $200 total. The same approach also can be implemented in software, providing an order-of-magnitude increase in speed at no
extra cost. Furthermore, this same SKIPSM architecture is highly versatile and programmable, allowing it to be software-reconfigured
to perform hundreds of other pipelined image processing operations.
Keywords: image processing, separability, real-time, finite-state machines, inspection, binary skeletonization
1. INTRODUCTION
A set of six papers [1-6] introduced the concepts and many applications of SKIPSM (Separated-Kernel Image Processing using
finite State Machines). This paper presents the application of SKIPSM to binary skeletonization, an important operation which can
be done in a conventional pipelined architecture but which requires either multiple passes or specialized and rather expensive
single-purpose hardware, making real-time implementations generally unaffordable. The SKIPSM approach also provides significant
advantages for software-based systems.
The key features of SKIPSM are
• the separation of a large class of neighborhood image processing operations (generally considered not to be separable) into a row
operation followed by a column operation,
• the formulation of these row and column operations in a form compatible with pipelined operation,
• the realization of the resulting row and column operations as simple finite-state machines (FSMs),
• the automated generation of the FSM configuration data, and
• the implementation of the resulting FSMs in a few inexpensive standard microchips, or in software.
Note that the separation of 2-D operators into two 1-D operators does not involve separability in the usual sense, such as is defined for
2-D linear convolutions. All 2-D operators meeting a simple separability condition (see [1]) can be separated using SKIPSM, although
the result may be unwieldy in some cases. The separability condition is met for binary skeletonization.
Finite-state machine theory is a branch of automata theory. Finite-state machines are not machines in the usual sense, but autonomous
input-driven sequential mathematical/logical constructs having important analytical and computational properties and a very well-
developed body of theory. Hardware implementations of FSMs using flip-flops, multiplexing switches, or ASICs are widely used for
such things as sequencers, timers, and computer disk I/O drivers. However, our examination of thousands of books and articles on
image processing has revealed no direct references to FSMs being applied to pipelined image processing. Furthermore, judging by the
available products, high-speed hardware vendors make no use of FSMs for image processing. Finally, FSMs are not part of the usual
training given to practitioners in the image processing field.
This paper will not repeat the extensive development of the theory and applications of SKIPSM contained in [1-6], but will
instead assume that the reader has mastered the necessary concepts from these papers before proceeding here.
The image processing categories to which SKIPSM can be applied include but are not limited to the following:
• Binary template matching with large, arbitrary templates.
• “Fuzzy” binary template matching and binary correlation.
• Grey-level template matching, if the number of grey levels is not too large.
• Binary correlation.
• Binary morphology of all types with large, arbitrary SEs (structuring elements). SEs up to 25x25 and larger and with “holes” and
other non-convex shapes can be applied in a single pipelined pass.
• Multiple SEs can be applied simultaneously in a single pipelined pass. For example, six or more stages of the “Grassfire
Transform” have been carried out in one pass, and even larger numbers of stages are possible.
• Grey-level morphology, if the number of grey levels is not too large.
• Various “smearing” operations, including row, column, and diagonal summations and Hough transforms.
• Certain operations previously thought to be impossible in a pipelined system, such as “blob fill” and “patterned blob fill.”
• The subject of this paper: binary skeletonization with one pipelined pass per erosion stage, using a 4 pixel-by-4 pixel
neighborhood to prevent the breaking of connectivity (and hence failure of the algorithm) that occurs with algorithms attempting
to use a single pass per stage with a 3-by-3 neighborhood.
Figure 1 shows two versions of the general SKIPSM block diagram, applicable to both hardware and software implementations. It is
also possible to use configurations operating along the diagonals of the image. For the remainder of this paper, detailed discussion
will be limited to the first configuration shown in Figure 1, which is based on a row-by-row raster-scan input sequence. The other
configurations can be handled in an analogous manner. Binary skeletonization and all the other operations noted above can be
implemented using this same hardware configuration, some variations of which are shown in Figure 2. Note that this approach also
offers great advantages for software implementations.
Figure 1. Two versions of the basic SKIPSM architecture for a row-by-row raster scan pattern. First version: a Row Machine with a
Pixel Delay followed by a Column Machine with a Line Delay. Second version: a Column Machine with a Line Delay followed by a
Row Machine with a Pixel Delay. Each machine is drawn with In, Out, CS, and NS connections.
Figure 2. Some implementations of the SKIPSM architecture for pipelined systems: (a) rule-based configuration; (b) minimum
RAM-based configuration; (c) more flexible RAM-based configuration; (d) fully-programmable RAM-based configuration. Each panel
shows a row machine followed by a column machine.

2. THE BINARY SKELETONIZATION OPERATION
Skeletonization differs from binary erosion in one small but critical way: Binary erosion strips off the outer layer(s) of a binary blob
without regard for how much of the blob remains. Skeletonization strips off the outer layers in the same way, but stops just before
breaking connectivity. That is, for “thick” regions, the operation is indistinguishable from erosion. But, after successive passes in
which outer layers are stripped away, the remaining blob eventually becomes only a few pixels wide in some places. The
skeletonization operation must sense such situations and NOT erode when to do so would break the skeleton of a single blob into two
or more pieces. Note that there is no unique “correct” skeleton and that different algorithms give somewhat different results. Opinions
vary as to which skeleton is “best.” This helps explain the proliferation of papers on the subject [7-17]. For an overview of the
automated inspection of printed wiring boards and a list of forty-five references, many of which involve binary skeletonization,
consult Chin & Iverson [18].
There is not enough information in the usual 3x3 neighborhood to distinguish between cases where erosion should be done and where
it should not be done. For this reason all skeletonization operations must use either more than one 3x3 pass or a neighborhood larger
than 3x3. One of the earliest algorithms [7] used four 3x3 passes, one for the top, another for the right side, etc. The same paper
indicates that a 5x5 neighborhood is sufficient. Other algorithms were devised to reduce this to two 3x3 passes [8, 15, 17].
Perhaps it has already been pointed out in the literature, but one of these, the March 1984 Zhang-Suen algorithm [15], turns out to
be incorrect for skeletonization, as published. The one thing a skeletonization algorithm must not do is break connectivity. On the
first pass, in addition to 45 acceptable center-pixel removals and 197 acceptable center-pixel retentions, this algorithm removes the
center pixel and thereby breaks connectivity in the 14 cases shown in Figure 3. It is incorrect in a comparable number of cases on the
second pass. It is therefore useless for skeletonization.
Figure 3. Error cases on the first pass of the Zhang-Suen algorithm.
One vendor of industrial inspection systems has developed a proprietary single-pass skeletonization algorithm which uses a 4x5
neighborhood [19] and was designed to, among other things, reduce the number of extraneous spurs in the final skeleton.
Independently, Floeder [20] developed an excellent single-pass 4x4 algorithm, along with an EPLD implementation. It is Floeder’s
algorithm which is being implemented in SKIPSM in this paper. The Floeder neighborhood is shown in Figure 4.
Figure 4. The Floeder skeletonization neighborhood (columns A to D, rows 1 to 4). The key distinguishes four kinds of pixels: used
for all decisions (standard 3x3 neighborhood); used only for certain connectivity-preserving decisions; not used; and the pixel about
which the black/white decision is being made.
Floeder's algorithm defines seventy-eight 4x4 templates, shown in Figure 5, that distinguish between pixels that can and cannot be
removed without distorting the resulting skeleton, that is, the cases where deleting the pixel would break the connectivity. The
algorithm states that if the pixel is white and its neighborhood matches one of the templates, then the pixel is removed in the
resultant image. Otherwise, if the pixel is black or the neighborhood does not match one of the templates, then the pixel remains
unchanged in the resulting image.
Figure 5. Skeletonization templates (numbered 1 to 78; each template cell is marked as a don't-care, black, or white pixel).
The Floeder algorithm lends itself well to the SKIPSM method since we simply need to identify when the image matches one of the
templates and set the output accordingly.
3. PARTITIONING INTO A ROW MACHINE AND A COLUMN MACHINE
As usual, SKIPSM makes use of a row machine and a column machine. Figure 6 shows the division of the 4x4 neighborhood into the
two parts appropriate to these machines. The reader is referred to [1-6] for more information on this step.
Figure 6. Partitioning the neighborhood for SKIPSM implementation. The column machine encodes the three earlier rows of the
neighborhood (rows 1 to 3, row 1 being the earliest) into a state value from 0 to 599. This information, plus the current input from
the row machine, determines the current output value and the next state value. At each pixel time, the row machine encodes the
current row (row 4, from the earliest pixel to the most recent pixel in the current row) into a value from 0 to 9. This becomes the
input for the column machine.
4. IMPLEMENTATION OF THE ROW MACHINE

The purpose of the row machine is to identify each row-pixel pattern uniquely. Therefore, our first step in developing the row
machine is to create a state representation for each four-pixel row. The obvious choice is to use the binary pixels as a binary number.
By using the first three pixels of the row to represent the state and the fourth (most recent) as the input, we immediately discover that
the row machine has only eight states. These eight states, coupled with the two input values, tell us we need to evaluate 16
conditions. Because the left-most pixel in the row-machine neighborhood (Figure 6) is needed only for certain decisions, it turns out
that only 10 cases need to be distinguished, rather than the 16 one would expect from a four-pixel binary neighborhood. These 10
cases, along with their encoded values, are shown in Figure 7, which also gives the complete state-transition diagram and look-up
tables for this machine.
State transition and output table. State bits are listed in order of weights 1, 2, 4; raw output bits in order of weights 1, 2, 4, 8.

Current State   Input   Next State   Raw Output   Compressed Output
000  (0)        0       000  (0)     0000  (0)    0
000  (0)        1       001  (4)     0001  (8)    5
100  (1)        0       000  (0)     1000  (1)    0
100  (1)        1       001  (4)     1001  (9)    5
010  (2)        0       100  (1)     0100  (2)    1
010  (2)        1       101  (5)     0101 (10)    6
110  (3)        0       100  (1)     1100  (3)    1
110  (3)        1       101  (5)     1101 (11)    6
001  (4)        0       010  (2)     0010  (4)    2
001  (4)        1       011  (6)     0011 (12)    7
101  (5)        0       010  (2)     1010  (5)    2
101  (5)        1       011  (6)     1011 (13)    7
011  (6)        0       110  (3)     0110  (6)    3
011  (6)        1       111  (7)     0111 (14)    8
111  (7)        0       110  (3)     1110  (7)    4
111  (7)        1       111  (7)     1111 (15)    9

Compression of raw output values to compressed output values:

Raw          0,1   2,3   4,5   6   7   8,9   10,11   12,13   14   15
Compressed   0     1     2     3   4   5     6       7       8    9

Look-up table contents. Option 1 is stacked with the output as the high bits, Option 2 is stacked with the state as the high bits, and
Option 3 is packed as 10S+O.

Offset from     Next    Output   Output Values LUT
Base Address    State            Option 1   Option 2   Option 3
 0              0       0         0          0          0
 1              4       5        44         69         45
 2              0       0         0          0          0
 3              4       5        44         69         45
 4              1       1         9         17         11
 5              5       6        53         86         56
 6              1       1         9         17         11
 7              5       6        53         86         56
 8              2       2        18         34         22
 9              6       7        62        103         67
10              2       2        18         34         22
11              6       7        62        103         67
12              3       3        27         51         33
13              7       8        71        120         78
14              3       4        35         52         34
15              7       9        79        121         79

Figure 7. State transition and output tables for the row machine.
The table was generated by identifying the next state and output value for all possible current state values and input values. For a
given input value and state value, the next state is found simply by discarding the “oldest” pixel value and appending the input pixel
value. The output value is found simply by appending the input pixel to the binary state.
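The table-generation procedure just described can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' own generator; the names `COMPRESS`, `row_step`, and `build_row_table` are hypothetical, and the compression values are copied from Figure 7.

```python
# Sketch of the row-machine tables (illustrative names, not from the paper).
# A state holds the three previous pixels with weights 1, 2, 4 (the oldest
# pixel has weight 1); the raw output adds the input pixel with weight 8.

# Compression of the 16 raw outputs to the 10 values listed in Figure 7.
COMPRESS = [0, 0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 9]


def row_step(state, pixel):
    """Return (next_state, compressed_output) for one input pixel (0 or 1)."""
    raw = state + 8 * pixel      # append the input pixel to the binary state
    next_state = raw >> 1        # discard the oldest pixel (weight 1)
    return next_state, COMPRESS[raw]


def build_row_table():
    """Enumerate all 8 states x 2 inputs in address order 2*state + input."""
    return [row_step(state, pixel) for state in range(8) for pixel in (0, 1)]


table = build_row_table()
for address, (next_state, output) in enumerate(table):
    print(address, next_state, output)
```

The printed triples can be compared line by line with the address, next-state, and output columns of Figure 7.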
A graphical diagram of the row machine is shown in Figure 8. Each box in the diagram represents a state. The box contains state
number and corresponding pixel pattern. The connecting lines represent the transitions between states. The values accompanying each
transition are the input pixel value causing the transition and resulting output value of the row machine.
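Transitions, written as input/output -> next state:
State 0 (000): 0/0 -> 0, 1/5 -> 4
State 1 (100): 0/0 -> 0, 1/5 -> 4
State 2 (010): 0/1 -> 1, 1/6 -> 5
State 3 (110): 0/1 -> 1, 1/6 -> 5
State 4 (001): 0/2 -> 2, 1/7 -> 6
State 5 (101): 0/2 -> 2, 1/7 -> 6
State 6 (011): 0/3 -> 3, 1/8 -> 7
State 7 (111): 0/4 -> 3, 1/9 -> 7
Figure 8. State transition diagram for the row machine. Only eight states are required.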
Assuming that, to reduce hardware cost, a single lookup table (LUT) is to be used to contain both next-state and output information,
the numbers actually loaded into the LUT can be obtained in various ways. Three possibilities are shown in Figure 7. The first is
obtained by assigning the state values to the lowest three bits (weight = 1) and the output values to the next four higher bits (weight =
8), giving a range of 0-to-79 (decimal) and a 7-bit overall LUT word length. This is called “stacking.” The values can also be stacked
in the opposite order, as in Option 2, giving a range of 0-to-121 but the same 7-bit overall LUT word length. Option 3 shows the
values being “packed” without regard to bit boundaries, again giving a range of 0-to-79 and a 7-bit overall LUT word length.
“Packing” can sometimes reduce the overall LUT word length, but requires the column machine to do additional decoding steps, and
is therefore usually avoided. The choice depends on the hardware configuration of the column machine. The standard SKIPSM
architecture uses stacking, with the highest bits as output bits, as in Option 1.
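As a quick check of the three word layouts, the following Python sketch computes each option from a next-state value S (0 to 7) and an output value O (0 to 9). The function names are illustrative only; the formulas follow the description above and the column headings of Figure 7.

```python
def stacked_output_high(next_state, output):
    """Option 1: state in the low 3 bits, output in the next 4 bits."""
    return next_state | (output << 3)


def stacked_state_high(next_state, output):
    """Option 2: output in the low 4 bits, state in the next 3 bits."""
    return output | (next_state << 4)


def packed(next_state, output):
    """Option 3: 10*S + O, ignoring bit boundaries."""
    return 10 * next_state + output


def unstack_output_high(word):
    """Recover (next_state, output) from an Option 1 word."""
    return word & 0b111, word >> 3


# Largest entry of the table: next state 7, output 9.
print(stacked_output_high(7, 9), stacked_state_high(7, 9), packed(7, 9))
```

Decoding a stacked word needs only a mask and a shift, whereas a packed word needs a division by 10, which is one way to see why packing puts extra decoding work on the column machine.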
5. THE COLUMN MACHINE
In Section 4, we developed a finite state machine to uniquely identify each of the four-pixel patterns present in the skeletonization
templates. However, we must be able to identify when we get four rows matching one of the templates. To perform this operation, we
will build a second finite state machine. This is the column machine.
We know the column machine must produce a black pixel whenever pixel C3 (as labeled in Figure 4) is black. That is, if pixel C3 is
black it remains black. Additionally, if the neighborhood matches one of the templates, the output should be a black pixel. Otherwise,
the column machine should produce a white pixel. This is as prescribed by the Floeder algorithm.
To uniquely identify a neighborhood pattern, we simply concatenate the row machine output values. This becomes the state value.
This is the same procedure we followed for the row machine, but instead of binary pixel values, we have the row values provided by
the row machine. For example, for the pixel pattern shown in Figure 9, the row machine would produce the values 7, 0, 2 and 4 for
the rows in this pattern. The first three values are concatenated to produce the column machine state. The last value is the column
machine input. This pixel pattern would drive the column machine to state 702. Then, with the input from the fourth row, the machine
would move to state 024. Since the pattern 7, 0, 2, 4 matches template 1, the output for the transition from 702 to 024 is 0. (See
Figure 5.)
Figure 9. Column encoding for a particular pixel pattern. The row machine outputs for rows 1 to 4 are 7, 0, 2, and 4, giving
State = 702 and Input = 4.
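The worked example can be reproduced with decimal arithmetic. The sketch below assumes the uncompressed three-digit state described in the text; the function names are hypothetical.

```python
def state_from_rows(r1, r2, r3):
    """Concatenate three row-machine values (0-9) into a three-digit state."""
    return 100 * r1 + 10 * r2 + r3


def column_next_state(state, row_value):
    """Discard the oldest (hundreds) digit and append the new row value."""
    return (state % 100) * 10 + row_value


state = state_from_rows(7, 0, 2)           # rows 1 to 3 of the example
next_state = column_next_state(state, 4)   # row 4 arrives as the input
print(state, f"{next_state:03d}")
```

The next state is the integer 24, which the text writes with a leading zero as 024.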
Note that, as with the row machine, the next state is generated simply by discarding the "oldest" row machine value and appending the
input row value. Using this method we can generate a LUT for the column machine with 1,000 state values (10,000 lines, one for each
combination of state and input). Figure 9 shows the form and a few of the entries of the state transition table for this
(“uncompressed”) column machine.
Notice, however, that the top row of the templates uses only six of the 10 possible row values. Therefore, we can reduce the table to
600 state values (6,000 lines) by translating the most significant (“oldest”) value through a compression LUT. (See Figure 10.) It
should be noted that while translating adds some preliminary work in generating the column machine LUT, it adds no additional
complexity or execution time to the implementation. This compression is embedded in the final finite state machine.
Current State   Input   Next State   Output
000             0       000          0
000             1       001          0
000             2       002          0
000             3       003          0
000             4       004          0
…               …       …            …
702             4       024          0
…               …       …            …
999             8       998          1
999             9       999          1
Figure 9. Uncompressed column machine LUT.

Input    0   1   2   3   4   5   6   7   8   9
Output   0   0   1   2   2   0   3   4   5   5
Figure 10. Compression LUT.
There is one more consideration when producing the column machine LUT. When pixel C3 is black, the output should be a black
pixel. Thus any state in the column machine ending in 0, 1, 5 or 6 should have an output value of 0.
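The construction of the compressed column machine LUT can be sketched as follows. This is an illustration under stated assumptions, not the authors' generator: the 78 templates of Figure 5 are not reproduced in the text, so they are represented by a caller-supplied predicate `matches_template` (a hypothetical name) that receives the compressed top-row code and the three remaining row values. The compression values come from Figure 10, and the set {0, 1, 5, 6} is the one given above for a black pixel C3.

```python
# Compression LUT of Figure 10: row value (0-9) -> top-row code (0-5).
COMPRESS_TOP = [0, 0, 1, 2, 2, 0, 3, 4, 5, 5]
# Row values for which the text says pixel C of that row is black.
C_BLACK = {0, 1, 5, 6}


def build_column_lut(matches_template):
    """Return {(state, input): (next_state, output)} for 600 states x 10 inputs.

    matches_template(top, r2, r3, r4) is a stand-in for the template set of
    Figure 5; it is an assumption of this sketch, not part of the paper.
    """
    lut = {}
    for top in range(6):
        for r2 in range(10):
            for r3 in range(10):
                state = 100 * top + 10 * r2 + r3
                for r4 in range(10):
                    # Compress the value that becomes the new oldest row.
                    next_state = 100 * COMPRESS_TOP[r2] + 10 * r3 + r4
                    if r3 in C_BLACK or matches_template(top, r2, r3, r4):
                        output = 0   # black
                    else:
                        output = 1   # white pixel is kept
                    lut[(state, r4)] = (next_state, output)
    return lut


# With no templates loaded, only the "C3 is black" rule produces 0.
lut = build_column_lut(lambda top, r2, r3, r4: False)
print(len(lut), lut[(402, 4)])
```

In the compressed numbering, the example state 702 becomes 402, because row value 7 maps to code 4 in Figure 10. A predicate that recognizes the pattern (4, 0, 2, 4) would then give output 0 for that transition, as in the worked example.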
6. FURTHER COMMENTS
Figure 11 shows the minimum hardware required to realize the skeletonization implementation presented here. Notice that it requires
only 180,336 bits of memory. This is over 65 percent less memory than is required by Floeder's original implementation.
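The memory total can be checked from the RAM sizes labeled in Figure 11 (4x7, 112 bits, and 14x11, 176 Kbits). The sketch below reads each label as address bits by word bits, which is an interpretation rather than something the text states. Under that reading, the row machine would have 3 state bits plus 1 input bit of address and a 7-bit word, and the column machine 14 address bits and an 11-bit word.

```python
def ram_bits(address_bits, word_bits):
    """Total storage of a RAM with 2**address_bits words of word_bits each."""
    return (1 << address_bits) * word_bits


row_ram = ram_bits(4, 7)       # label "4x7" in Figure 11
col_ram = ram_bits(14, 11)     # label "14x11" in Figure 11
total = row_ram + col_ram

print(row_ram, col_ram, col_ram // 1024, total)
```

The two RAM sizes and their sum agree with the 112 bits, 176 Kbits, and 180,336 bits quoted in the paper.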
Of course, these results can be implemented in many other forms. Of particular interest to software-based systems is the fact that, at
the cost of a modest amount of RAM devoted to two lookup tables, each pass of the skeletonization operation is reduced to a few
steps: fetching the next input value, two bit concatenation or mask-and-add steps to generate the LUT addresses, two fetches from
fast RAM, and an output operation. Figures 12 and 13 show the results of applying our software implementation of the algorithm to
two images: one very simple and the other very complex.
Figure 11. The minimized hardware version of the architecture needed for binary skeletonization: a 4x7 RAM (112 bits) followed by a
14x11 RAM (176 Kbits), with a delay. One line at the delay is marked 10; lines are 1 bit except as noted.
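The per-pixel software steps listed above can be sketched as a single raster-scan pass. This is not the authors' implementation. The row LUT follows Figure 7, but the column machine here is a deliberately simple stand-in (`placeholder_column_step`, a hypothetical name) that loads no templates and merely reports pixel C3, because the template set of Figure 5 is not reproduced in the text. The sketch also assumes that pixels outside the image are black (0) and that the image is a list of rows of 0/1 values.

```python
# Row LUT in address order 2*state + pixel; entries are (next_state, value).
COMPRESS = [0, 0, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 9]
ROW_LUT = [((s + 8 * p) >> 1, COMPRESS[s + 8 * p])
           for s in range(8) for p in (0, 1)]


def placeholder_column_step(state, row_value):
    """Stand-in column machine: outputs C3 unchanged (no templates loaded)."""
    output = 0 if (state % 10) in (0, 1, 5, 6) else 1
    return (state % 100) * 10 + row_value, output


def skipsm_pass(image, column_step=placeholder_column_step):
    """One raster-scan pass: one row-LUT fetch and one column step per pixel."""
    width = len(image[0])
    col_state = [0] * width          # line delay: one column state per column
    result = []
    for row in image:
        row_state = 0                # pixel delay, cleared at each row start
        out_row = []
        for c, pixel in enumerate(row):
            row_state, row_value = ROW_LUT[2 * row_state + pixel]
            col_state[c], output = column_step(col_state[c], row_value)
            out_row.append(output)
        result.append(out_row)
    return result


img = [[0, 1, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 1]]
out = skipsm_pass(img)
print(out)
```

Because the decision pixel C3 lies one row above and one column to the left of the most recent input pixel, the output stream of this placeholder is the input delayed by one row and one column. A column step built from the real template LUT would plug into the same loop through the `column_step` argument.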
Figure 12. Skeletonized image of a simple object: (a) Source image (b) Skeletonized image
Figure 13. Skeletonized magnetic resonance image (a) Source image (b) Skeletonized image
7. CONCLUSIONS
In this paper, we have demonstrated the application of the SKIPSM method to binary skeletonization. The SKIPSM implementation
yields a 65 percent memory savings over the "direct" implementation while producing the same skeleton. Additionally, the
skeletonization operation offers all the advantages of the SKIPSM method, which include a pipelined implementation, a standard
inexpensive flexible hardware realization, and an efficient software implementation. Furthermore, this exact same hardware
configuration can be programmed to perform hundreds of other image processing operations, simply by loading different lookup
tables.
8. ACKNOWLEDGMENT
We hereby express appreciation to Steven P. Floeder, Engineering Systems & Technology Labs, 3M Company, Saint Paul, Minnesota,
for the excellent work [20] on skeletonization which was incorporated into this paper and which made this implementation possible.
9. REFERENCES
1. F. M. Waltz, “SKIPSM: separated-kernel image processing using finite-state machines,” Paper No. 36, Proc. SPIE Conf. on
Machine Vision Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
2. F. M. Waltz and H. H. Garnaoui, “Application of SKIPSM to binary morphology,” Paper No. 37, Proc. SPIE Conf. on Machine
Vision Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
3. F. M. Waltz and H. H. Garnaoui, “Fast computation of the Grassfire Transform using SKIPSM,” Paper No. 38, Proc. SPIE Conf.
on Machine Vision Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
4. F. M. Waltz, “Application of SKIPSM to binary template matching,” Paper No. 39, Proc. SPIE Conf. on Machine Vision
Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
5. F. M. Waltz, “Application of SKIPSM to grey-level morphology,” Paper No. 40, Proc. SPIE Conf. on Machine Vision
Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
6. F. M. Waltz, “Application of SKIPSM to the pipelining of certain global image processing operations,” Paper No. 41, Proc. SPIE
Conf. on Machine Vision Applications, Architectures, and Systems Integration III, Vol. 2347, Boston, Nov. 1994
7. Rosenfeld, A., "Connectivity in Digital Pictures," Journal of the ACM, vol. 17, no. 1, pp. 153-155, January 1970.
8. Beun, M., "A Flexible Method for Automatic Reading of Hand-Written Numerals," Philips Tech. Rev., vol. 31, no. 4, pp. 89-
101, 130-137, 1973.
9. Gudsen, A., "A Quantitative Analysis of Preprocessing Techniques for the Recognition of Hand-Printed Characters," Pattern
Recognition, vol. 8, pp. 219-227, 1976.
10. Moayer, B., Fu, K. S., "A Tree System Approach for Fingerprint Pattern Recognition," IEEE Trans. Comput., vol. C-25, pp. 262-
275, 1976.
11. Tamura, H., "A Comparison of Line Thinning Algorithms from Digital Geometry Viewpoint.", Proc. 4th Int. Joint Conf. on
Pattern Recognition, pp. 715-719, 1978.
12. Hilditch, C. J., "Linear Skeletons from Square Cupboards," Machine Intelligence, vol. 4, pp. 403-420, 1969.
13. Lee, D. T., "Medial Axis Transformation of Planar Shape," IEEE Trans. on Pattern Anal. and Mach. Intel., vol. PAMI-4, no. 4,
pp. 363-369, July 1982.
14. Naccache, N. J., Shinghal, R., "SPTA: A Proposed Algorithm for Thinning Binary Patterns," IEEE Trans. Syst. Man. Cybern.,
vol. 14, no. 3, p. 409, June 1984.
15. Zhang, T. Y., Suen, C. Y., "A Fast Parallel Algorithm for Thinning Digital Patterns," Communications of the ACM, vol. 27, no.
3, pp. 236-239, March 1984.
16. Holt, C., "An Improved Parallel Thinning Algorithm," Communications of the ACM, vol. 30, no. 2, pp. 156-160, February
1987.
17. Suzuki, S., Abe, K., "Binary Picture Thinning by an Iterative Parallel Two-Subcycle Operation," Pattern Recognition, vol. 20,
no. 3, pp. 297-307, 1987.
18. Chin, Roland T., and Iverson, Rolf, “Automated Visual Inspection of Printed Wiring Boards: A Critical Overview,” in Machine
Vision Systems Integration, SPIE Critical Reviews of Optical Science & Technology, Volume CR36, Boston, Nov. 1990. This
paper includes a list of 45 references, many of which pertain to skeletonization.
19. Chin, Roland T. et al., U. S. Patent No. 494930. This patent describes hardware implementations for printed wiring inspection,
including skeletonization.
20. S. P. Floeder, “A reprogrammable image processing system”, M. S. Thesis, Electrical Engineering Department, University of
Minnesota, Minneapolis, 1990
