
Pattern Recognition 47 (2014) 623–633

Contents lists available at ScienceDirect

Pattern Recognition
journal homepage: www.elsevier.com/locate/pr

Unsupervised segmentation and approximation of digital curves with rate-distortion curve modeling
Alexander Kolesnikov (a, *), Tuomo Kauranne (b)
(a) Arbonaut Ltd., Kaislakatu 2, FI-80130 Joensuu, Finland
(b) Lappeenranta University of Technology, P.O. Box 20, FI-53851 Lappeenranta, Finland

Article info

Article history:
Received 31 December 2012
Received in revised form 4 August 2013
Accepted 8 September 2013
Available online 16 September 2013

Keywords:
Shape
Graphical model
Piecewise linear approximation
Curve fitting

Abstract

This paper considers the problem of unsupervised segmentation and approximation of digital curves and trajectories with a set of geometrical primitives (model functions). An algorithm is proposed based on a parameterized model of the Rate–Distortion curve. A multiplicative cost function is then derived from the model. By analyzing the minimum of the cost function, a solution is defined that produces the best possible balance between the number of segments and the approximation error. The proposed algorithm was tested for polygonal approximation and multi-model approximation (circular arcs and line segments for digital curves, and polynomials for trajectories). The algorithm demonstrated its efficiency in comparisons with known methods that use heuristic cost functions. The proposed method can additionally be used for segmentation and approximation of signals and time series.

© 2013 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +358 468 104 853.
E-mail addresses: alexander.kolesnikov@arbonaut.com (A. Kolesnikov), tuomo.kauranne@lut.fi (T. Kauranne).

0031-3203/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.patcog.2013.09.002

1. Introduction

Polygonal approximation of digital curves is used in image processing, computer graphics, pattern recognition, data retrieval, CAD, GIS, shape analysis and encoding. In all these applications, it is used to reduce the amount of data that needs to be processed, stored and transmitted. In addition to polygonal approximation, signals, curves and trajectories can be approximated with nonlinear functions such as arcs, polynomials and splines [1–7]. There are two main approaches [8] to the problem of shape representation with a polygonal curve: (1) identification of the dominant points as vertices of the approximating polygon, or (2) approximation of the segments of the input curve with a continuous sequence of line segments. The former approach is based on analysis of the curvature or other features to identify the corner points of the curve [9–13]. Its drawback is that the algorithms are sensitive to contour noise and often include input or hidden parameters. With the latter approach, the problem of approximating digital curves is usually formulated in one of the following two forms [7,14]:

(1) Min-# problem: With a given constraint on the approximation error, approximate a curve with the minimum number of line segments (the description length).
(2) Min-ε problem: With a given number of approximating line segments M (description length R), approximate the input curve with a minimum approximation error.

The solution to the problem depends on the error measure in use. For the error measure L∞ (the maximum deviation), the solution can be found with heuristic [15–20] or optimal algorithms [21–24]. With the L2-norm (Integral Square Error, ISE), the problem can be solved with heuristic [20,25] or optimal algorithms [26–31].

Thus, an approximation can be found if the number of segments M (the description length R) or the error bound $\varepsilon_0$ is known. However, before an approximation can be formed, we need to know the most suitable value of the input parameter and how many segments are required to represent the curve adequately. Therefore, a method is needed that determines the input parameter of the approximation algorithm for a concrete curve.

A possible approach to the problem is to find a balance between the number of segments M and the approximation error E by introducing a cost function that incorporates both M and E. For the additive error measure L2, the Lagrange multiplier method can be used. The Lagrange multiplier algorithm searches for a solution with a minimal value of the additive cost function $C = E_2 + \lambda M$ [2,4,5,32], where $E_2$ is the Integral Square Error. The trade-off between the number of segments M and the Integral Square Error $E_2$ is controlled by means of the user-defined heuristic Lagrange multiplier λ.

In [9], a multiplicative criterion Figure of Merit (FOM) was introduced for evaluating solutions that have been obtained by using heuristic algorithms for polygonal approximation: $\mathrm{FOM} = E_2 M$.
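The following minimal Python sketch (an illustration, not part of the original paper) contrasts the two kinds of cost function on a precomputed Rate–Distortion curve: the additive Lagrangian cost $C = E_2 + \lambda M$, whose minimizer depends on the user-chosen λ, and the multiplicative $\mathrm{FOM} = E_2 M$. The synthetic curve $E_2(M) = 1000/M^2$ and the helper names `lagrangian_choice` and `fom_choice` are hypothetical.

```python
def lagrangian_choice(e2, lam):
    """Number of segments M minimizing the additive cost C = E2(M) + lam * M.

    e2 maps each candidate M to its Integral Square Error E2(M), e.g. the
    output of an optimal min-eps algorithm run for every M in a range."""
    return min(e2, key=lambda m: e2[m] + lam * m)


def fom_choice(e2):
    """Number of segments M minimizing the multiplicative FOM = E2(M) * M."""
    return min(e2, key=lambda m: e2[m] * m)


# Hypothetical R-D curve E2(M) = 1000 / M**2 for M = 1..50.
rd_curve = {m: 1000.0 / m ** 2 for m in range(1, 51)}

for lam in (0.1, 1.0, 10.0):
    print("lambda =", lam, "-> M =", lagrangian_choice(rd_curve, lam))
print("FOM -> M =", fom_choice(rd_curve))
```

Here λ = 0.1, 1 and 10 give M = 27, 13 and 6, so the answer is set entirely by the choice of λ. FOM needs no multiplier, but on this curve $E_2 M = 1000/M$ decreases monotonically, so it picks the largest M in the range, which illustrates the bias of FOM toward too many segments.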
Later, this criterion was presented as an “optimization error” [10], “weighted sum of square error” [11] and “compromise ratio” [13]. However, Rosin [33] has demonstrated that the two terms, $M$ and $E_2$, in the FOM are not balanced: the heuristic criterion gives solutions with too many linear segments. In the modified criterion FOM-n (n = 2 or 3) proposed in [12], the number of segments $M$ is penalized by raising it to a power $n$, which has the effect of reducing this bias:

$$\mathrm{FOM}\text{-}n = E_2 M^n.$$

The criterion FOM-2 was proposed earlier in [17] as the “relative error” in the form $E_r = M\sqrt{E_2}$ (equivalent to FOM-2, since $E_r^2 = E_2 M^2$), but this publication remained unnoticed. Although originally created as a criterion for evaluating heuristic algorithms [8,12,17,25], the criterion FOM-n has been used for determining the number of linear segments [20,34–36]. However, it is still not clear which value of the parameter $n$ is more appropriate, $n = 2$ or $n = 3$.

For approximation with the error measure L∞, the multiplicative criterion “weighted maximum error” $\mathrm{WE}_\infty$ was introduced in [11]: $\mathrm{WE}_\infty = E_\infty M$. The modified Figure of Merit (MFOM-3) was proposed in [19]. The criterion combines the criterion FOM-2 with the maximum deviation $E_\infty$ and the number of segments $M$ as follows:

$$\mathrm{MFOM}\text{-}3 = \mathrm{FOM}\text{-}2 \cdot E_\infty M = E_2 E_\infty M^3 \tag{1}$$

In the algorithm [19], a set of approximating polygonal curves was constructed sequentially for the input contour with a heuristic algorithm, increasing the error tolerance for the maximum deviation until the first local minimum of the MFOM-3 criterion was reached.

In this paper, we approach the problem from another angle. Based on the fractal properties of curves, we first construct a parameterized model of the approximation error as a function of the number of segments $M$. Then, we derive a multiplicative cost function for approximation with the error measures L2 and L∞. The minimum of the cost function gives us the optimal number and type of approximating segments.

The paper is organized as follows. Section 2 discusses the problem of segmentation and approximation of curves. In Section 3, we introduce a parameterized model of the Rate–Distortion curve. Section 4 presents a criterion that gives a solution to the problem in question. Section 5 sets out the results of our experiments for test curves. Section 6 presents the conclusions.

Fig. 1. General scheme of the full-search DP algorithm for polygonal approximation for error measures L2 (left) and L∞ (right) [26].

2. Rate–Distortion curve

An open planar N-vertex curve P is defined as an ordered set of points: $P = \{p(1), \dots, p(N)\}$, where $p(n) = (x(n), y(n))$. The polygonal curve P is approximated by another polygonal curve $Q = \{q(1), \dots, q(M+1)\}$ with M linear segments, where $q(m) = p(i_m)$.

The approximation error with the measure L∞ for a curve segment $\{p(i_m), \dots, p(i_{m+1})\}$ is defined as the maximum of the Hausdorff distance $d_H$ between the points $p(n)$ of the curve segment and the corresponding approximating linear segment $S(i_m, i_{m+1})$ defined by the points $p(i_m)$ and $p(i_{m+1})$. The maximum deviation for the curve P is defined as follows:

$$\delta(M) \equiv E_\infty(M) = \max_{1 \le m \le M} \left\{ \max_{i_m \le n \le i_{m+1}} d_H\big(p(n), S(i_m, i_{m+1})\big) \right\} \tag{2}$$

The approximation error with the measure L2 for a curve segment $\{p(i_m), \dots, p(i_{m+1})\}$ is defined as the sum of squared
Euclidean distances $d_E$ between the points $p(n)$ of the curve segment and the approximating line $L(i_m, i_{m+1})$ defined by the points $p(i_m)$ and $p(i_{m+1})$. The total approximation error $E_2(M)$ for the curve P is defined as the sum of the approximation errors for all the segments:

$$E_2(M) = \sum_{m=1}^{M} e_2(i_m, i_{m+1}) = \sum_{m=1}^{M} \sum_{n=i_m}^{i_{m+1}} d_E^2\big(p(n), L(i_m, i_{m+1})\big) \tag{3}$$

The approximation error with the L2-norm is also characterized by the Root-Mean-Square Error $s$:

$$s(M) = \sqrt{E_2(M)/N} \tag{4}$$

In the general case, we have a set of K approximation models (functions) $\Phi = \{\varphi_1, \dots, \varphi_K\}$, where the model functions are described by the corresponding numbers of parameters $\{\lambda_1, \dots, \lambda_K\}$ [7]. For two-model approximation with line segments ($\varphi_1$) and circular arcs ($\varphi_2$), the description lengths of the models are defined by the number of parameters needed to represent the model function: $\lambda_1 = 2$ and $\lambda_2 = 3$. For a polynomial model of order $q$ we thus have $\lambda_k = q$.

The input curve P is divided into segments $S(i_m, i_{m+1})$, and each segment is approximated by a function $\varphi_k$ from the set $\Phi$. The description length $r(i_m, i_{m+1}; \varphi_k)$ of the curve segment $S(i_m, i_{m+1})$ is defined as the description length $\lambda_k$ of the model $\varphi_k$ selected for the segment: $r(i_m, i_{m+1}; \varphi_k) = \lambda_k$. The total description length R for the curve is obtained from the sum of the description lengths of all segments:

$$R = \sum_{m=1}^{M} r(i_m, i_{m+1}; \varphi_k) \tag{5}$$

The functions $E_\infty(m)$ and $E_2(m)$ are called the Rate–Distortion curves for the error norms L∞ and L2, respectively.

Fig. 2. Approximation of the curve Great Britain with the L2-norm: (a) the Rate–Distortion curve in linear scale with its model; and (b) the Rate–Distortion curve in log–log scale with its model.

Fig. 3. Approximation of the curve Great Britain with the L∞-norm: (a) the Rate–Distortion curve in linear scale with its model; (b) the Rate–Distortion curve in log–log scale with its model: $D_\infty = 1.09$ and $a = 0.92$.

The solution to the min-ε problem can be found with a Dynamic Programming (DP) algorithm for the k-link shortest path in a weighted acyclic graph of size $M \times N$ (the state or search space) [26–30]. For multi-model approximation, a set of solutions can be calculated with the optimal DP algorithm [7] for the description length in the range $[R_1, R_2]$.

The cost function $C(n, m)$ is defined as the minimum approximation error for the n-vertex curve $\{p_1, p_2, \dots, p_n\}$ approximated by m line segments. Thus, the function $C(N, m)$ for $m \in [1, M]$ gives the Rate–Distortion curve. A 2D array of parent states $A(n, m)$ is also needed to construct the partition of the curve into segments. The general scheme of the full-search DP algorithm for polygonal approximation is given in Fig. 1. The modification of the algorithm for multi-model approximation is given in [7].

The approximation error $E_p(j, n)$ for a curve segment $\{p_j, \dots, p_n\}$ can be calculated in $O(N)$ and $O(1)$ time for the measures L∞ and L2, respectively [26]. The errors for all segments with end point $p_n$ are calculated beforehand and stored in an array of size N. The total time complexity of the DP algorithm for the L∞-norm is cubic, $T_\infty = O(N^3)$, i.e., it is defined by the time for calculation of approximation
errors for $O(N^2)$ curve segments. For the L2-norm, the complexity is defined by the complexity of the DP algorithm for the shortest k-link path: $T_2 = O(MN^2)$ [26]. The space complexity of the algorithms is $O(MN)$, which is defined by the size of the search graph (the size of the arrays C and A).

3. Model of Rate–Distortion curve

To construct a model of the Rate–Distortion curve, we consider the fractal dimension of curves. There are two main practical methods to find the fractal dimension of planar curves: the box-counting method [37] and the method based on calculation of the curve length [38–40]. With the box-counting method, the curve P is covered with a regular grid with cell size $\varepsilon \times \varepsilon$, and the number $n(\varepsilon)$ of cells (boxes) that cover the curve is counted. Then, the fractal dimension H is defined as follows:

$$H = -\lim_{\varepsilon \to 0} \frac{\log n(\varepsilon)}{\log \varepsilon} \tag{6}$$

Thus, the number $n(\varepsilon)$ of covering boxes of size $\varepsilon \times \varepsilon$ for the curve can be estimated by the power law $n(\varepsilon) \approx C\varepsilon^{-H}$.

Following this approach, we cover the curve with m variable-length rectangles of width $w_p$, where the width depends on the error measure: $w_\infty \sim \varepsilon_\infty = \delta(m)$ and $w_2 \sim \varepsilon_2 = s(m)$ for the measures L∞ and L2, respectively. The assumption is thus that the number m of such rectangles is described by the following function of the error $\varepsilon_p$ [41]:

$$m(\varepsilon_p) \sim \varepsilon_p^{-D_p} \tag{7}$$

where the parameter $D_p$ depends on the error measure. Relation (7) is inverted to get the dependence of the estimated approximation error $\hat\varepsilon_p$ on the number of segments m:

$$\hat\varepsilon_p(m) = C_p\, m^{-1/D_p} \tag{8}$$

The parameter $D_p$ is evaluated from the corresponding log-linear regression model for m in the range $[M_1, M_2]$ as follows:

$$D_p = \arg\min \left\{ \sum_{m=M_1}^{M_2} \big(\log\hat\varepsilon_p(m) - \log\varepsilon_p(m)\big)^2 \right\} \tag{9}$$

Since the least-squares slope of $\lg\varepsilon_p(m)$ versus $\lg m$ equals $-1/D_p$, this gives:

$$D_p = -\frac{(M_2 - M_1 + 1)\sum_{m=M_1}^{M_2} \lg^2 m - \Big(\sum_{m=M_1}^{M_2} \lg m\Big)^2}{(M_2 - M_1 + 1)\sum_{m=M_1}^{M_2} \lg m \,\lg\varepsilon_p(m) - \sum_{m=M_1}^{M_2} \lg m \sum_{m=M_1}^{M_2} \lg\varepsilon_p(m)} \tag{10}$$

Thus, a model is constructed that describes “on average” the behavior of the Rate–Distortion curve for m in the range $[M_1, M_2]$. The parameter $D_p$ of a curve is not necessarily equal to the fractal dimension H of the curve [41]. Indeed, the number $n(\varepsilon)$ of covering boxes of size $\varepsilon \times \varepsilon$ for a straight line segment of length L is $n(\varepsilon) \approx L/\varepsilon$. Thus, for the straight line, the fractal dimension H is equal to 1. On the other hand, only one rectangle of width $w_p$ is needed to cover the line segment, which gives $D_p = 0$ for the straight line.

An example of the Rate–Distortion (R–D) curve for the test contour Great Britain is given in Figs. 2 and 3; the parameters of the R–D curve model are $D_2 = 1.10$ and $D_\infty = 1.09$, respectively.

Fig. 4. The test data set: (a) Curve Great Britain, N = 10,909; (b) Curve #1, N = 1953; (c) Curve #2, N = 2661 [19]; (d) Curve #3, N = 460 [12]; (e) Curve #4, N = 428 [12]; (f) Curve #5, N = 1159 [19]; (g) Curve #6, N = 1042 [36]; (h) Curve #7, N = 2141 [19]; (i) Curve #8, N = 1029; (j) Curve #9, N = 939; (k) Trajectory Mario 8, N = 1861 [42]; and (l) 2D point set Shark, N = 449 [43].

4. Parameterized criterion $\mathrm{pFOM}_p$

From (8), it follows that the function $I_{RDC} = \hat\varepsilon_p(m)\, m^{1/D_p}$ is a constant for the model of the Rate–Distortion curve in the range $[M_1, M_2]$. We can derive the criterion $F_p(m)$ directly from the
model invariant $I_{RDC}$ to evaluate solutions as follows: $F_p(m) = \varepsilon_p(m)\, m^{1/D_p}$. However, to keep consistency with the criteria FOM-n and $\mathrm{WE}_\infty$ [12,11], the parameterized Figure of Merit $\mathrm{pFOM}_p$ is defined in the following way:

$$\mathrm{pFOM}_p = \begin{cases} E_2\, m^a, & a = 2D_2^{-1}, \; p = 2, \\ E_\infty\, m^a, & a = D_\infty^{-1}, \; p = \infty. \end{cases} \tag{11}$$

For $p = \infty$ this is exactly $F_\infty(m)$, and for $p = 2$ it equals $N F_2^2(m)$ by (4), so in both cases it has the same minimizer as $F_p(m)$. The global minimum of the criterion $\mathrm{pFOM}_p$ at the point $M^*$ gives a solution with the best balance between the error $E_p$ and the number of segments m in terms of the multiplicative cost function:

$$M^* = \arg\min_m \left\{ E_p(m)\, m^a \right\} \tag{12}$$

Fig. 5. Rate–Distortion curves with models for approximation with the L2-norm for: (a) Curves #1–6 (polygonal approximation); (b) Curves #7–9 (multi-model approximation with line segments and circular arcs); and (c) Trajectory Mario 8 (multi-model approximation with polynomials).

Fig. 6. Rate–Distortion curves for approximation with the L∞-norm for: (a) Curves #1–6 (polygonal approximation); and (b) Curves #7–9 (multi-model approximation with line segments and circular arcs).

In the case of multi-model approximation, the total description length R of the curve is used instead of the number of segments m. The criterion $\mathrm{pFOM}_p$ is then written as follows:

$$\mathrm{pFOM}_p = \begin{cases} E_2\, R^a, & a = 2D_2^{-1}, \; p = 2, \\ E_\infty\, R^a, & a = D_\infty^{-1}, \; p = \infty. \end{cases} \tag{13}$$

The algorithm for unsupervised approximation therefore consists of three steps:

Step 1: Calculate the approximation errors $E_p$ with the corresponding optimal algorithm and find the model parameter a;
Step 2: Find the global minimum of the criterion $\mathrm{pFOM}_p$ over the number of segments m (or the description length R);
Step 3: Approximate the input curve with the optimal algorithm, using the found number of approximating segments $M^*$ (description length $R^*$).

The heuristic criterion FOM-n was proposed in [12] to overcome the drawback of the criterion FOM by penalizing the number of segments m. In the criterion FOM-n, an integer value of the parameter n has to be guessed, whereas with the approach proposed in this paper, the parameter a is obtained from the Rate–Distortion curve. The criterion $\mathrm{WE}_\infty$ is a particular case of the criterion $\mathrm{pFOM}_\infty$ for $a = 1$; likewise, FOM-n coincides with $\mathrm{pFOM}_2$ for $a = n$.

The total time complexity of the algorithm is cubic. It is defined by the complexity of the optimal DP algorithm for polygonal and multi-model approximation of curves.

5. Results and discussions

The proposed algorithm was tested on an HP EliteBook (Microsoft Visual Studio 2010). The results of the experiments are set out for the curve Great Britain, nine digitized curves [5,22,24], the trajectory Mario 8 [42] and the 2D point set Shark [43] (see Fig. 4). The set of test shapes included smooth and noisy curves.

The proposed algorithm was tested with single- and multi-model approximation. The former case comprised polygonal approximation of the curve Great Britain and test curves #1–6 with the L2 and L∞-norms. The 2D point set Shark was approximated by a polygonal curve with the Optimal Transport algorithm [43]. In the latter case, the set of approximating models included line segments and circular arcs for test curves #7–9. The test trajectory Mario 8 was approximated with a set of polynomials of the 1st and 2nd order, where the order of the polynomial for each segment was defined with the optimal algorithm [7].

The results of the experiments demonstrated that the proposed parametric model describes the behavior of the Rate–Distortion curves well for most of the test curves, especially for the coastline of Great Britain, the 2D point set Shark, the trajectory Mario 8 and the curves #1, #2, #4 and #7 (see Figs. 2, 3, 5, 6 and 11). For artificial objects, like curves #3, #7 and #8, the R–D curve deviates significantly from its model (see Figs. 5 and 6). However, deviating R–D curves are of greater interest, because in such a case the criterion $\mathrm{pFOM}_p$ has a distinct global minimum. Moreover, the larger the deviation of the R–D curve from its model, the easier it is to find an adequate number of segments.

Fig. 7. The test curve Great Britain: (a) the criteria FOM-2, FOM-3, and $\mathrm{pFOM}_2$: $M^* = 22$ for $\mathrm{pFOM}_2$; and (b) results of the optimal polygonal approximation with the L2-norm for $M^* = 22$.

Fig. 8. The test curve Great Britain: (a) the criteria $\mathrm{WE}_\infty$, MFOM-3, and $\mathrm{pFOM}_\infty$: $M^* = 34$ for $\mathrm{pFOM}_\infty$; and (b) results of the optimal polygonal approximation with the L∞-norm for $M^* = 34$.

The parameter a of the model depends on the interval of the numbers of segments $[M_1, M_2]$ (description lengths $[R_1, R_2]$) used for construction of the Rate–Distortion curve. In other words, we should define the Region of Interest (ROI) for the approximation
errors $[\varepsilon_L, \varepsilon_U]$. It is unreasonable to take too small a lower bound, because the criterion $\mathrm{pFOM}_p$ has a degenerate solution, $\mathrm{pFOM}_p = 0$, when $E_p = 0$. On the other hand, the upper bound should be much less than the geometrical size of the shape. For test curves #1–9, we use $s_0 \in [0.3, 8.0]$ and $\delta_0 \in [0.3, 8.0]$. For the test trajectory, the range of approximation errors is $s_0 \in [3.0, 12.0]$.

Test curves #1–6 are sorted by the model parameter a for the L2-norm in ascending order, where the parameter a varies from 2.15 to 3.66 (see Fig. 5). The parameter a can be treated as a measure of curve smoothness. For smooth curves, the approximation error decreases faster with an increasing number of segments than for noisy ones, as can be seen by comparing the parameter a for two similar curves with different levels of noise: a = 2.15 (noisy curve #1) and a = 3.02 (smooth curve #5).

We have evaluated the performance of the criterion $\mathrm{pFOM}_p$ in comparison with some other multiplicative criteria for the curve Great Britain, the curves #1–6, and the 2D point set Shark. The criterion $\mathrm{pFOM}_2$ was compared with the criteria FOM-2 and FOM-3 [12] (see Figs. 7, 9, 11 and 12). The criterion $\mathrm{pFOM}_\infty$ was then compared with the criteria $\mathrm{WE}_\infty$ and MFOM-3 [11,19] (see Figs. 8 and 10). The criteria are scaled to the range [0, 1] to emphasize their minima in the range under study.

Fig. 9. Criteria FOM-2, FOM-3 and $\mathrm{pFOM}_2$ for approximation with the L2-norm of test curves #1–9.

Fig. 10. Criteria FOM-2, $\mathrm{WE}_\infty$, MFOM-3 and $\mathrm{pFOM}_\infty$ for approximation with the L∞-norm of test curves #1–9.

Fig. 11. Test 2D point set Shark, N = 449: (a) Rate–Distortion curve with its model: a = 2.26; (b) criteria FOM-2, FOM-3, and $\mathrm{pFOM}_2$; and (c) result of approximation with the found number of edges: $M^* = 30$ for $\mathrm{pFOM}_2$.

With the proposed method, the number of segments is evaluated from the Rate–Distortion curve. The resulting number of segments depends on the error measure in use. Approximation with the measure L∞ is more sensitive to outliers than approximation with the additive measure L2. That is why the number of segments evaluated with the measure L∞ is usually larger than that with the measure L2 (see Figs. 13 and 14). The appropriate choice of error measure depends on the purpose of curve segmentation.

The criteria FOM-2 or FOM-3 give the same results as the criterion $\mathrm{pFOM}_2$ if $a \approx 2$ (curve #1 and Great Britain) or $a \approx 3$ (curve #5), but which criterion to use has to be guessed in every particular case (see Figs. 7 and 13). For smooth shapes, both criteria overestimate the number of segments M. For curve #6 with $a_6 = 3.66$, the proposed method gives $M^* = 20$ compared with $M^* = 47$ and $M^* = 34$ for the criteria FOM-2 and FOM-3, respectively (see Fig. 9). For well-structured artificial contours like curves #7 and #8, the heuristic criteria also give good results. However, overall, the proposed criterion $\mathrm{pFOM}_p$ gives more adequate results than the heuristic criteria under study. In the criterion $\mathrm{pFOM}_p$, sensitivity to the error $E_p$ and the number of segments M is balanced by the model parameter a: the criterion is less sensitive to the number of segments for noisy curves and more sensitive for smooth curves.

The proposed unsupervised algorithm provides adequate shape representation with a smaller number of segments than the heuristic algorithms [12,35] (see Figs. 13–15). For example, for curve #3, the proposed method gives $M^* = 40$ and $M^* = 42$ for the L2 and L∞-norms, respectively, against $M = 61$ for the heuristic
algorithm in [12]. In [35], the first minimum of the criterion FOM-2 was used to evaluate the number of segments for optimal polygonal approximation of the curve #6 with the L2-norm: $M = 47$ (see Fig. 15). For this curve, the proposed method gives a more adequate value: $M^* = 22$ (see Fig. 13).

Fig. 12. (a) Criteria FOM-2, FOM-3 and $\mathrm{pFOM}_2$ for the test trajectory Mario 8; and result of polynomial approximation of the trajectory with the found parameters, R = 222, s = 4.33: (b) in the plane (Time, X), and (c) in the plane (Time, Y).

Fig. 13. Results of approximation for the error measure L2: (1) Curve #1, $M^* = 27$; (2) Curve #2, $M^* = 30$; (3) Curve #3, $M^* = 40$; (4) Curve #4, $M^* = 35$; (5) Curve #5, $M^* = 34$; (6) Curve #6, $M^* = 22$; (7) Curve #7, $M_{lines} = 8$ and $M_{arcs} = 3$, R = 25; (8) Curve #8, $M_{lines} = 8$ and $M_{arcs} = 4$, R = 28; and (9) Curve #9, $M_{lines} = 3$ and $M_{arcs} = 13$, R = 44.

The proposed method is not very sensitive to the presence of noise: quite adequate results are obtained for the smooth and noisy versions of the same shape: $M_1^* = 27$ for curve #1 and $M_5^* = 35$ for curve #5 (L2-norm), and $M_1^* = 34$ and $M_5^* = 38$ (L∞-norm), respectively. However, the corresponding approximation errors differ greatly for the two curves: $s_1 = 1.40$ and $s_5 = 0.63$ (L2-norm), and $\delta_1 = 3.2$ and $\delta_5 = 1.6$ (L∞-norm).

Due to the simplicity of the algorithm, the processing time is acceptable even for a few thousand vertices. For example, the processing time for approximation of the 2661-vertex curve #3 is 1 s and 30 s for the measures L2 and L∞, respectively. The processing time for the 10,909-vertex curve Great Britain for the measures L2 and L∞ is 10 s and 30 min, respectively. The processing time of the full-search Dynamic Programming algorithm with the error measure L∞ is too high for practical applications with $N \sim 10{,}000$ vertices. In such a case, it is recommended to reduce the number of points in the input curve to a reasonable value [18] or to use a faster heuristic or near-optimal approximation algorithm to calculate the Rate–Distortion curve.
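Steps 1 and 2 of Section 4 need only the Rate–Distortion curve, so they can be run on errors produced by the optimal DP algorithm or, as suggested above for large N, by a faster heuristic. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it fits $D_p$ with the closed-form regression (10) (with $n$ = number of points, which equals $M_2 - M_1 + 1$ for consecutive m), sets $a = 2/D_2$ for L2 or $a = 1/D_\infty$ for L∞ as in (11), and returns the $M^*$ of (12). The DP steps are not shown; the function names, the `"L2"`/`"Linf"` labels and the synthetic R–D curve are hypothetical. For L2 the curve is supplied as RMSE values $s(m)$ and converted to $E_2 = N s^2$ via (4).

```python
import math


def fit_dp(ms, eps):
    """Least-squares estimate of D_p following Eq. (10).

    Regresses lg(eps_p(m)) on lg(m); under the model (8),
    eps_p(m) = C_p * m**(-1/D_p), the fitted slope equals -1/D_p."""
    n = len(ms)
    x = [math.log10(m) for m in ms]
    y = [math.log10(e) for e in eps]
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    return -(n * sxx - sx * sx) / (n * sxy - sx * sy)


def pfom_choice(ms, eps, norm, n_points=None):
    """Return (M*, a, criterion values) per Eqs. (11)-(12).

    ms:  candidate numbers of segments M1..M2 (or description lengths R).
    eps: model error per candidate: RMSE s(m) for "L2", delta(m) for "Linf".
    For "L2" the criterion uses E2 = N * s**2, so n_points (N) is required."""
    d = fit_dp(ms, eps)
    if norm == "L2":
        if n_points is None:
            raise ValueError("n_points is required for the L2 criterion")
        a = 2.0 / d
        errors = [n_points * e * e for e in eps]
    elif norm == "Linf":
        a = 1.0 / d
        errors = list(eps)
    else:
        raise ValueError("norm must be 'L2' or 'Linf'")
    crit = [err * m ** a for err, m in zip(errors, ms)]
    best = min(range(len(ms)), key=crit.__getitem__)
    return ms[best], a, crit


def synthetic_rd(m):
    """Hypothetical R-D curve: steep decay up to m = 10, slow decay after."""
    return 10.0 * m ** -3.0 if m <= 10 else 0.01 * (m / 10.0) ** -0.5


if __name__ == "__main__":
    ms = list(range(2, 41))  # hypothetical range of interest [M1, M2] = [2, 40]
    eps = [synthetic_rd(m) for m in ms]
    m_star, a, _ = pfom_choice(ms, eps, "Linf")
    print("M* =", m_star, "a =", round(a, 2))
```

On this synthetic curve, which decays steeply up to m = 10 and slowly afterwards, the criterion has its global minimum at the kink, $M^* = 10$. For a curve that follows (8) exactly, $\mathrm{pFOM}_p$ is constant and the minimum is not distinct, which matches the remark in Section 5 that deviating R–D curves give a distinct global minimum. For multi-model approximation, the same functions can be called with description lengths R in place of m, as in (13).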

Fig. 14. Results of approximation for the error measure L∞: (1) Curve #1, $M^* = 34$; (2) Curve #2, $M^* = 34$; (3) Curve #3, $M^* = 42$; (4) Curve #4, $M^* = 38$; (5) Curve #5, $M^* = 32$; (6) Curve #6, $M^* = 23$; (7) Curve #7, $M_{lines} = 9$ and $M_{arcs} = 3$, R = 27; (8) Curve #8, $M_{lines} = 8$ and $M_{arcs} = 4$, R = 28; and (9) Curve #9, $M_{lines} = 4$ and $M_{arcs} = 12$, R = 44.

Fig. 15. Examples of polygonal approximation with heuristic algorithms: (1) Curve #3, M = 61 [12]; (2) Curve #4, M = 46 [12]; and (3) Curve #6, M = 47 [36].

The proposed approach with Rate–Distortion curve modeling has also been used to solve a similar problem in a different area, namely determining the number of clusters in multidimensional data [44].

6. Conclusions

This paper considered the problem of optimal representation of digital curves with a set of geometric primitives. The proposed method is based on an analysis of the approximation error as a function of the number of segments (description length). A parameterized model of the Rate–Distortion curve was presented for polygonal and multi-model approximation with the error measures L2 and L∞. The model parameter characterizes the smoothness of the curve: a larger value of the parameter means a smoother curve. Based on the constructed model, a multiplicative cost function was derived for determining the number of segments. The minimum of this criterion gives a solution with the best balance between the number of segments and the approximation error.

The proposed algorithm demonstrated good results for contour data when compared with the heuristic multiplicative criteria used for unsupervised approximation. The known heuristic criteria can be treated as particular cases of the presented criterion. The proposed algorithm has potential for use in computer graphics, CAD, digital cartography, spatial information systems, and shape analysis, modeling and comparison.

Conflict of interest

None declared.

Acknowledgment

We would like to thank Fernando de Goes et al. for providing experimental data for the 2D test point set Shark.

References

[1] R. Nygaard, A.K. Katsaggelos, Rate–Distortion optimal signal compression using second order polynomial approximation, in: Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2001, vol. 4, pp. 2617–2620.
[2] R. Mann, A.D. Jepson, T. El-Maraghi, Trajectory segmentation using dynamic programming, in: Proceedings of International Conference on Pattern Recognition, ICPR, 2002, vol. 1, pp. 331–334.
[3] L. Yin, Y. Yajie, L. Wenyin, Online segmentation of freehand stroke by dynamic programming, in: Proceedings of International Conference on Document Analysis and Recognition, ICDAR, 2005, vol. 1, pp. 197–201.
[4] E. Bodansky, A. Gribov, Approximation of a polyline with a sequence of geometric primitives, in: Proceedings of International Conference on Image Analysis and Recognition, ICIAR, LNCS, Springer, Heidelberg, 2006, vol. 4142, pp. 468–478.
[5] N. Miller, R. Mann, Detecting hand-ball events in video sequences, in: Proceedings of Canadian Conference on Computer and Robot Vision, CRV, 2008, pp. 139–146.
[6] F. Tortorella, R. Patraccone, M. Molinara, A dynamic programming approach for segmenting digital planar curves into line segments and circular arcs, in: Proceedings of IAPR International Conference on Pattern Recognition, ICPR, 2008, pp. 1–4.
[7] A. Kolesnikov, Segmentation and multi-model approximation of digital curves, Pattern Recognition Letters 33 (9) (2012) 1171–1179.
[8] F. Arrebola, F. Sandoval, Corner detection and curve segmentation by multiresolution chain-code linking, Pattern Recognition 38 (2005) 1596–1614.
[9] D. Sarkar, A simple algorithm for detection of significant vertices for polygonal approximation of chain-coded curves, Pattern Recognition Letters 14 (1993) 959–964.
[10] J.M. Iñesta, M. Buendia, M.A. Sarti, Reliable polygonal approximation of imaged real objects through dominant point detection, Pattern Recognition 31 (1998) 685–697.
[11] W.-Y. Wu, An adaptive method for detecting dominant points, Pattern Recognition 36 (2003) 2231–2237.
[12] M. Marji, P. Siy, Polygonal representation of digital planar curves through dominant point detection – a nonparametric algorithm, Pattern Recognition 37 (2004) 2113–2130.
[13] R. Dinesh, D.S. Guru, Finite automata inspired model for dominant point detection: a non-parameter approach, in: Proceedings of International Conference on Computing: Theory and Application, ICTA, 2007, pp. 579–583.
[14] H. Imai, M. Iri, Polygonal approximations of a curve (formulations and algorithms), in: G.T. Toussaint (Ed.), Computational Morphology, North-Holland, Amsterdam, 1988, pp. 71–86.
[15] D.H. Douglas, T.K. Peucker, Algorithm for the reduction of the number of points required to represent a line or its caricature, The Canadian Cartographer 10 (2) (1973) 112–122.
[16] J. Sklansky, V. Gonzalez, Fast polygonal approximation of digitized curves, Pattern Recognition 12 (1980) 327–331.
[17] A. Held, K. Abe, C. Arcelli, Towards a hierarchical contour description via dominant point detection, IEEE Transactions on Systems, Man, and Cybernetics 24 (1994) 942–949.
[18] A. Pikaz, I. Dinstein, An algorithm for polygonal approximation based on iterative point elimination, Pattern Recognition Letters 16 (1995) 557–563.
[19] T.P. Nguyen, I. Debled-Rennesson, Parameter-free method for polygonal representation of noisy curves, in: Proceedings of International Workshop on Combinatorial Image Analysis, IWCIA, 2009, pp. 65–78.
[20] M.T. Parvez, S.A. Mahmoud, Polygonal approximation of digital planar curves through adaptive optimizations, Pattern Recognition Letters 31 (2010) 1997–2005.
[21] G. Papakonstantinou, Optimal polygonal approximation of digital curves, Signal Processing 8 (1985) 131–135.
[22] J. Dunham, Optimum uniform piecewise linear approximation of planar curves, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1) (1986) 67–75.
[23] H. Imai, M. Iri, Polygonal approximations of a curve (formulations and algorithms), in: G.T. Toussaint (Ed.), Computational Morphology, North-Holland, Amsterdam, 1988, pp. 71–86.
[24] W.S. Chan, F. Chin, On approximation of polygonal curves with minimum number of line segments or minimum error, International Journal of Computational Geometry and Applications 6 (1996) 59–77.
[25] M. Shearer, J.J. Zou, Detection of dominant points based on noise suppression and error minimization, in: Proceedings of International Conference on Information, Technology and Applications, ICITA, 2005, vol. 1, pp. 772–775.
[26] J.C. Perez, E. Vidal, Optimum polygonal approximation of digitized curves, Pattern Recognition Letters 15 (1994) 743–750.
[27] C.-C. Tseng, C.-J. Juan, H.-C. Chang, J.-F. Lin, An optimal line segment extraction algorithm for online Chinese character recognition using dynamic programming, Pattern Recognition Letters 19 (1998) 953–961.
[28] M. Salotti, An efficient algorithm for the optimal polygonal approximation of digitized curves, Pattern Recognition Letters 22 (2) (2001) 215–221.
[29] M. Salotti, Optimal polygonal approximation of digitized curves using the sum of square deviations criterion, Pattern Recognition 35 (2002) 435–443.
[30] A. Kolesnikov, P. Fränti, Data reduction of large vector graphics, Pattern Recognition 38 (3) (2005) 381–394.
[31] A. Kolesnikov, ISE-bounded polygonal approximation of digital curves, Pattern Recognition Letters 33 (10) (2012) 1329–1337.
[32] A. Gribov, E. Bodansky, A new method of polyline approximation, in: Proceedings of International Conference on Structural, Syntactic and Pattern Recognition, LNCS, 2004, vol. 3138, pp. 504–511.
[33] P.L. Rosin, Techniques for assessing polygonal approximation of curves, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 659–666.
[34] M. Marji, P. Siy, Polygonal representation of digital planar curves through dominant point detection – a nonparametric algorithm, Pattern Recognition 37 (2004) 2113–2130.
[35] A. Carmona-Poyato, N.L. Fernández-García, R. Medina-Carnicer, F.J. Madrid-Cuevas, Dominant point detection: a new proposal, Image and Vision Computing 23 (2005) 1226–1236.
[36] A. Carmona-Poyato, F.J. Madrid-Cuevas, R. Medina-Carnicer, R. Muñoz-Salinas, A new measurement for assessing polygonal approximation of curves, Pattern Recognition 44 (1) (2011) 45–54.
[37] B. Mandelbrot, How long is the coast of Britain? Statistical self-similarity and fractional dimension, Science 156 (3775) (1967) 636–638.
[38] F. Normant, C. Tricot, Method for evaluating the fractal dimension of curves using convex hulls, Physical Review A 43 (12) (1991) 6518–6524.
[39] K. Falconer, Fractal Geometry, Wiley, New York, 2003.
[40] F.M. Barnsley, Fractals Everywhere, second ed., Morgan Kaufmann, 1993.
[41] J. Jimenez, J.L. Navalon, Some experiments in image vectorisation, IBM Journal of Research and Development 26 (1982) 724–734.
[42] E. Keogh, UCR Time Series Datasets at 〈http://www.cs.ucr.edu/~eamonn/time_series_data/〉.
[43] F. de Goes, D. Cohen-Steiner, P. Alliez, M. Desbrun, An optimal transport approach to robust reconstruction and simplification of 2D shapes, Computer Graphics Forum 30 (5) (2011) 1593–1602.
[44] A. Kolesnikov, E. Trichina, Determining the number of clusters with Rate–Distortion curve modeling, in: Proceedings of International Conference on Image Analysis and Recognition, ICIAR, 2012, vol. 1, pp. 43–50.

Alexander Kolesnikov received a M.Sc. degree in Physics and a Ph.D. degree in Computer Science. He used to be a Senior Researcher at the Research Institute in Novosibirsk
(Russia) and the University of Eastern Finland. His main research interests are signal and image processing and compression.

Tuomo Kauranne is Associate Professor in Applied Mathematics at Lappeenranta University of Technology. His research interests include algorithms for geospatial data
processing and their application in modeling the environment and climate change.
