
Int J Comput Vis (2008) 80: 92–103

DOI 10.1007/s11263-008-0130-z

Performance Modeling and Algorithm Characterization
for Robust Image Segmentation

Shishir K. Shah

Received: 13 July 2007 / Accepted: 4 March 2008 / Published online: 20 March 2008
© Springer Science+Business Media, LLC 2008

Abstract This paper presents a probabilistic framework based on Bayesian theory for the performance prediction and selection of an optimal segmentation algorithm. The framework models the optimal algorithm selection process as one that accounts for the information content of an input image as well as the behavioral properties of a particular candidate segmentation algorithm. The input image information content is measured in terms of image features, while the candidate segmentation algorithm's behavioral characteristics are captured through the use of segmentation quality features. Gaussian probability distribution models are used to learn the required relationships between the extracted image and algorithm features, and the framework is tested on the Berkeley Segmentation Dataset using four candidate segmentation algorithms.

Keywords Image segmentation · Performance modeling · Algorithm characterization · Algorithm selection

S.K. Shah
Quantitative Imaging Laboratory, University of Houston, Houston, TX 77204-3010, USA

S.K. Shah (✉)
Department of Computer Science, University of Houston, Houston, TX 77204-3010, USA
e-mail: shah@cs.uh.edu

1 Introduction

In any computer vision and image analysis system, segmentation is an integral part and is broadly defined as the partitioning of an image into separate regions. Each resulting region corresponds to a different object or area of interest. Typically, the partitioning is derived from specific constraints, and the segmentation process uses these constraints to construct homogeneous regions and smooth boundaries. Over the past 40 years, a multitude of methods and algorithms have been proposed in the literature (Pal and Pal 1993). Each of them attempts to solve the segmentation problem based on image properties, constraints derived from the application domain, or a combination of the two. Various image properties have been investigated, including gray-level intensity, color, and texture. Higher-level constructs, such as geometric models of objects of interest or Gestalt principles of perceptual organization, have also been used. The fundamental limit in solving the segmentation problem is the fact that segmentation is a problem of psycho-physical perception and therefore not susceptible to a purely analytical solution (Pal and Pal 1993; Pavlidis 1977; Fu and Mui 1981). Techniques and algorithms developed to obtain satisfactory results range from simple statistical models to adaptive filters, intensity and texture analyses, wavelets, entropy, fuzzy sets, and neural networks (Pal and Pal 1993; Nair and Aggarwal 1996; Spann and Nieminen 1988; Spann and Grace 1994). However, most approaches are developed for a specific application and cannot be generalized to all images. In fact, no single algorithm can be considered good for all images, nor are all algorithms good for a particular image (Pal and Pal 1993). Each algorithm's utility is limited by the specific characteristics that make it applicable to a particular kind of image. The fundamental challenge in image segmentation is then to provide a generalized framework that is capable of choosing a suitable algorithm from many candidates given a particular image (Haralick and Shapiro 1992).

This paper presents a probabilistic framework that allows for the selection of an appropriate image segmentation algorithm based on the characteristic properties of the image to be segmented and the algorithm's behavioral properties. Since a large number of image segmentation algorithms have already been developed, the present work focuses on evaluating the ability of a candidate algorithm from a list of possible algorithms to perform segmentation on a given image. Within the developed framework, the ability to perform this evaluation is learned using a learning set of images. Based on this knowledge, the evaluation or prediction of each candidate algorithm's capability of segmenting the input image is done without actually running any of the algorithms. Segmentation is performed using only the algorithm predicted to achieve the best outcome.

Performance analysis for tuning the parameters of various segmentation algorithms for the purpose of optimizing results within the context of an application domain has received much attention in the computer vision community in recent years (Konishi and Yuille 2000). A survey of major research in the area of segmentation algorithm evaluation is presented in Zhang (1996). Analytical examination is by far the most common approach to evaluating the performance of a segmentation algorithm. This is normally done by modeling the image to be processed, modeling the noise that degrades the original image with respect to the model, and analyzing the objective of the segmentation task (Chalmond et al. 2001). Another way of assessing performance has been through examination of the outputs of the candidate algorithms for an input image. Methods that combine both analytical examination and systematic optimization from model images for the derivation of performance have also been developed (Zhang and Haralick 1993; Cho et al. 1997). In the context of optimizing algorithm performance based on parameter tuning, recent approaches have exploited machine learning techniques and learning-based systems where the parameters are obtained from training data (Reynolds and Rolnick 1995; Perner 1999). Predicting the probability of incorrect segmentation for each candidate algorithm based on information-theoretic approaches has also been proposed (Konishi and Yuille 2000). In a restricted sense, algorithm evaluation is concerned with the study of the algorithm's behavior under different inputs. That is, performance evaluation is a process to characterize the capability of an algorithm, independent of the application domain. This requires a methodology to understand the input-output relationship of the algorithm. In many cases, performance evaluation has been coupled with methods for fair comparisons of algorithms (Randen and Husoy 1999; Shufelt 1999). Algorithm comparison then leads to a ranking of different algorithms for a given input image and the application of interest. However, instead of understanding the capability of a particular algorithm in analyzing different images, it emphasizes the relative merit of one algorithm over others and hence fails to offer any predictive information.

The proposed framework in this paper is different from the previous methods. The emphasis is on evaluating the capability of an algorithm to provide optimal results for a given input image. We are not interested in optimizing an algorithm for a given task. It is well understood that segmentation algorithms are difficult to generalize: depending on the context, a given algorithm will perform well and lead to dependable results, while for another image, still dependent on the context, the same algorithm will perform poorly and provide unreliable results. In this paper, we introduce a general probabilistic framework and a prototype system designed for the optimal selection of image segmentation algorithms. This system can be considered an intelligent system that combines algorithm evaluation with knowledge-based prediction for segmentation algorithm selection. Evaluation is used to obtain knowledge about an algorithm's capability, and the intelligent system design is used to apply that knowledge. Knowledge within the framework is learned from a limited sample of images representing the context variety and a measure of the candidate algorithms' performance.

The rest of the paper is organized as follows. Section 2 describes the proposed approach and provides the definitions and methodology of the probabilistic performance predictor. Section 3 presents the framework based on probability models and a maximum a posteriori estimator. Segmentation algorithms, image features, and performance evaluation measures are described in Sect. 4. Experimental results and evaluation of the proposed approach on a large database of segmented images with ground truth are presented in Sect. 5. Finally, Sect. 6 concludes this work and discusses future extensions.

2 Performance Prediction Framework

Selection of an appropriate segmentation algorithm given an input image is generally difficult for a computer, but rather easy for an experienced person. Humans can learn an algorithm's characteristics by observing the algorithm's performance over a range of inputs. As a result, humans are able to predict the performance of each algorithm on a given image and, moreover, to choose an optimal one. Simulating this process can lead to computerized automation in predicting the performance of various algorithms and hence the eventual selection of an optimal one given an input image. Performance analysis for any algorithm can include a range of measures such as processing quality, stability, and time and memory complexity. Each of these measures is relevant and can weigh on the selection of an optimal algorithm. For simplicity, we restrict our discussion in this paper to the quality of segmentation alone.

A general flow chart of the proposed performance prediction framework is shown in Fig. 1. The framework comprises three main modules: a feature extractor, a performance
Fig. 1 Framework of the proposed knowledge gathering optimal image segmentation system

evaluator, and a predictor. The feature extractor is meant to identify the context of the image. This means that the feature extractor should be general yet powerful enough to extract image features that can provide sufficient information about the characteristics of the image that can be associated with the behavioral properties of a segmentation algorithm. The performance evaluator simulates the human observer, wherein its main objective is to capture the behavioral properties of each algorithm. This is based on a ground-truth reference and accurate image features that can be measured to provide distinction among the candidate algorithms, with one chosen as being closest to the ground truth. Finally, the predictor provides a convenient mechanism to automatically acquire, store, and utilize human knowledge in an implicit way. Specifically, the predictor combines information from the feature extractor and the performance evaluator, which forms the basis of the image context and the knowledge about each algorithm's performance characteristics. The predictor uses this information to estimate the outcome of each candidate algorithm. The one predicted to generate the most desirable outcome is chosen. In this paper, we provide a statistical learning model for the predictor based on Bayesian decision theory (Duda and Hart 1973), relying on features extracted from the input image and the measure of algorithm performance extracted from a learning data set of images.

The framework is prototyped through the implementation of four image segmentation techniques, as detailed in Sect. 4. During training, the input image is processed with each algorithm and the result compared to ground-truth data. A range of measures is computed for each input image and associated with the obtained segmentation results. Both the image features and the measure of segmentation quality are used to design a predictor. This allows the representation and capture of the knowledge needed for algorithm selection. The predictor itself is formulated such that confidence levels can be associated with the knowledge captured. When this knowledge is put to use, features extracted from each new input image are used by the predictor, and the performance of all algorithms on that image is predicted. The algorithm corresponding to the best performance is selected as optimal and applied to that image.

3 Probabilistic Learning and Prediction

In presenting the probabilistic prediction framework, consider the problem of segmenting an image into different regions, each exhibiting an area of interest. Let I = {I_s, s ∈ S} be an image which belongs to V^S, where V is the set of values taken by I_s, typically V = {0, ..., 255}. Assume that a set of local characteristics, or image features, Y = {Y_s, s ∈ S}, extracted from I, is available. Let us also denote a set of segmentation algorithms A = {a_1, ..., a_k}, each resulting in a corresponding segmentation R = {R_a, a ∈ A}. Let us further denote R_a to be a random variable indicating the segmentation for an algorithm a. We further define the ground truth as being represented by R^g. The ground truth represents the exact state of nature with respect to the regions. Our objective is clearly to obtain a segmentation as close as possible to R^g. In probabilistic terms, the quality of the segmentation obtained can be measured by the probability of error, α = P(R_a ≠ R^g | R^g), calculated over all regions. The goal of the predictor then is to minimize this probability of error.

Generalizing the above point of view, the proposed probabilistic learning and prediction approach can be considered
Fig. 2 A probabilistic learning and prediction approach for hypothesis-based segmentation algorithm selection

a matching problem (Ettinger et al. 1996; Chiang and Moses 1999; Chiang et al. 2000), as summarized in Fig. 2. Given a bank of available segmentation algorithms, let us consider an index stage that provides a list of candidate hypotheses, H_k, k = 1, ..., K, based on a partitioning of the hypothesis space. A feature extraction stage serves to measure the image context, where such context is estimated from the image and used as a low-dimensional surrogate for sufficient statistics. This is modeled as a density function, p(Y|I), and reflects the sensitivity of the image features Y given the image I. Evaluation of the candidate hypotheses proceeds using a probabilistic interpretation of the observations. A feature prediction stage is used to compute a predicted feature set for each segmentation algorithm by combining the image context model from the feature extraction stage and a "goodness" of segmentation representation of a hypothesis H_k. The predicted feature set, X^k, is modeled by a distribution function p(X|H_k), acknowledging error in the model used to compute the predicted feature set and variation among segmentations in any algorithm. A Bayes match is used to compute the posterior probability of a candidate hypothesis, Λ(H_k), by combining the extracted and predicted feature sets. A rank order of the top hypotheses and the associated likelihoods provides the optimal selection of the segmentation algorithm in this proposed system.

The Bayes matching problem can be formulated as follows. Consider n features

Y = [Y_1, Y_2, ..., Y_n]^T    (1)

extracted from the input image, I, as one of the inputs to the Bayes match stage. Each feature, Y_i, can be a vector-valued function of the contextual attributes. Known also is a set of K candidate hypotheses, given as H = {H_k}, k ∈ [1, ..., K], each having a prior probability P(H_k). The set H is given by the index stage and can be considered to be a list of the candidate algorithms and their hypothesized ability to segment the input image. Further, consider a set of m predicted features given as

X^k = [X_1^k, X_2^k, ..., X_m^k]^T    (2)

which form the second input to the Bayes match stage. We assume available a feature prediction function, learned off-line, which maps a hypothesis H_k to the set of predicted features. We also assume a known probability model for the attributes of any predicted or extracted feature, as well as for the feature attributes if Y_i provides no positive association to a predicted feature (i.e., Y_i has a negative relationship to a predicted feature). The objective of the Bayes match stage is to compute the posterior likelihood of the observed features, Y, under each of the K hypotheses in the set H, given by

Λ_k = P(H_k|Y),  H_k ∈ H.    (3)

The most likely hypotheses and their corresponding likelihoods provide the optimal choice of the segmentation algorithm.

For simplicity of presentation, we drop the subscript of the hypothesis H_k, considering the solution for H ∈ H, and the superscript k on X^k. Then, Bayes' rule applied to compute the posterior likelihood in (3) for any H ∈ H is given by

P(H|Y) = p(Y|H) P(H) / p(Y).    (4)

Since the denominator of (4) does not depend on the hypothesis H, the maximum a posteriori (MAP) estimate is found by maximizing p(Y|H)P(H) over H ∈ H. The prior P(H) is computed during the model learning phase, where the prior for each hypothesis is given as the ratio of the number of times a particular hypothesis was true given an input image over the total size of the learning image set.

To determine the conditional estimate of Y given hypothesis H, we have

p(Y|H) = ∫ p(Y|X, H) p(X|H) dX    (5)

where p(X|H) models the feature prediction and p(Y|X, H) models the feature extraction. This is the case since X is completely determined from the hypothesis H.
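The MAP selection behind (4) can be sketched in a few lines of code. The following is a minimal illustration, not the paper's implementation: it assumes a single scalar feature per image and hypothetical per-algorithm Gaussian likelihood models (the names `models`, `gaussian_pdf`, and `map_select` are invented for this sketch); the evidence p(Y) is dropped since it is constant across hypotheses.

```python
import math

def gaussian_pdf(y, mean, var):
    """Univariate Gaussian density N(mean, var) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_select(y, hypotheses):
    """Return the hypothesis maximizing p(Y|H) P(H), as in Eq. (4).

    `hypotheses` maps a hypothesis name to (prior, mean, var) for a
    single scalar feature; p(Y) is a common factor and is ignored.
    """
    scores = {h: prior * gaussian_pdf(y, mean, var)
              for h, (prior, mean, var) in hypotheses.items()}
    return max(scores, key=scores.get)

# Hypothetical learned models for two candidate algorithms.
models = {
    "graph_based": (0.5, 0.2, 0.01),   # (P(H), feature mean, variance)
    "mean_shift":  (0.5, 0.8, 0.01),
}
print(map_select(0.75, models))  # → mean_shift
```

With equal priors, the selection reduces to picking the hypothesis whose learned feature distribution best explains the observed feature, which is the behavior the full framework generalizes to feature vectors.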

The computation of p(Y|X, H) requires that there be a relationship between the elements of Y and X, which alludes to the fact that the "goodness" of segmentation features will be implicitly related to the image data, since the characteristics of the segmentation algorithms are defined by the characteristics of the image data.

Let us define a mapping function λ that relates the extracted and predicted features, Y and X, respectively. The mapping function itself is a nuisance parameter that arises because the relationship between the predicted and extracted features is not explicitly known and hence not amenable to direct estimation. If we denote L to be the set of all admissible mapping functions, the probabilistic correspondence model for the Bayes likelihood estimate is given by

p(Y|H) = Σ_{λ∈L} p(Y|λ, H) P(λ|H)    (6)

where, similar to (5),

p(Y|λ, H) = ∫ p(Y|X, λ, H) p(X|λ, H) dX.    (7)

To solve (6), we require a probability model for p(Y|X, λ, H). Let us assume that the uncertainties of X_i are conditionally independent given H, and that the uncertainties of Y_j are conditionally independent given H and X. The independence assumption allows the use of the product rule to compute the likelihood as:

p(Y|λ, H) = Π_{j=1}^{n} p(Y_j|λ, H).    (8)

Each term in the product of (8) is a feature likelihood conditioned on a mapping function λ and hypothesis H. Each extracted feature Y_j either provides a positive association to a predicted feature or has no relationship. If a particular Y_j provides no positive association, the mapping can be considered to be a null function and the feature attribute modeled as a random vector with a known probability density function

p(Y_j|λ_j = 0, H) = p_NA(Y_j).    (9)

On the other hand, for positive associations between Y_j and a predicted feature X_i, the mapping can be written as λ_j = i and the likelihood of the match computed according to (7). Further, for all i ≠ null, (7) can be simplified to

p(Y_j|λ_j = i, H) = ∫ p(Y_j|X_i, H) p(X_i|H) dX_i.    (10)

If we assume the extracted and predicted feature attributes have Gaussian distributed uncertainties, then (10) can be further simplified to

p(Y_j|λ_j = i, H) = p(Y_j|X_i, H) ∼ N(X_i, Σ),    (11)

where Σ represents the error in the model that estimates the predicted feature X_i.

The remaining difficulty in solving (6) is knowledge of the priors P(λ|H). In general, the mapping λ provides a relationship between features Y_j and a specific predicted feature X_i through the function λ_j = i. The mapping itself can be considered independent of the image features if the same features are extracted from every image irrespective of the segmentation algorithm. In this case, the prior estimates P(λ|H) simplify to P_i(H), where P_i(H) is the probability of picking hypothesis H given output X_i.

It follows that if N is the probability that an extracted feature has no positive association to a predicted feature, that is, the extracted feature does not contribute to the problem of algorithm selection, and

L_i(H) = (1 − N) P_i(H) / Σ_{k=1}^{m} P_k(H)    (12)

is the probability that the selected algorithm comes from the list of hypotheses, the likelihood score can be computed as (Ettinger et al. 1996)

p(Y|H) = p(Y|X) = Π_{j=1}^{n} p(Y_j|X, H)
       = Π_{j=1}^{n} [ N p_NA(Y_j) + Σ_{i=1}^{m} L_i(H) p(Y_j|X_i, H) ]    (13)

where p(Y_j|X_i, H) is the likelihood that image features Y_j correspond to predicted segmentation features X_i under segmentation hypothesis H.

The selected hypothesis is the one that maximizes the likelihood score.

4 Features, Algorithms, and Segmentation Quality

To assess the feasibility of the proposed framework for optimal algorithm selection, we have created a bank of four segmentation algorithms. Each of these algorithms uses color and texture features computed from the input image. A dataset of human segmentations (Martin et al. 2001) is used as ground truth to learn the segmentation quality model and compute the segmentation quality features used in the proposed framework.

4.1 Image Features

Low-level features of the image are computed as characteristics defining the image context. Specifically, color and texture features are computed from the color image and the converted gray-level image, respectively. For the color features,

we use the 1976 CIE L*a*b* color space separated into luminance and chrominance channels. The color distribution is modeled with histograms constructed with kernel density estimates. Histograms are compared with the χ² histogram difference operator (Puzicha et al. 1999) to obtain a feature for each pixel. A brightness cue is also computed based on the L* histogram for each pixel. Once again, the actual feature for each pixel is the χ² histogram difference. The texture features used are computed from the gray-level image based on gray-level co-occurrence matrices (GLCM), fractal measures, and Gabor filters (Fogel and Sagi 1989; Laws 1980; Tourassi et al. 2000; Tuceryan and Jain 1993).

4.2 Segmentation Algorithms

The segmentation algorithms used in this study were implementations of the graph-based segmentation proposed in Felzenszwalb and Huttenlocher (2004), the level set based segmentation proposed in Brox and Weickert (2004), the Normalized Cut approach (Shi and Malik 2000), and the EDISON segmentor (Comaniciu and Meer 2002). Each implementation used the same feature set extracted from the input image, representing the color and texture content of the image. Further, the parameters for each algorithm were optimized to obtain the best segmentation performance across the entire learning set of images. These specific algorithms were chosen in this study as they exploit local features of the input image and aim to realize a global partitioning. The algorithms used differ from each other only in the way the optimal partitioning of the image is achieved.

4.3 Segmentation Quality

One way to characterize a segmentation algorithm is based on a measure of the quality of the segmentation achieved. The answer to "What is a good segmentation?" is not an easy one to quantify. Many approaches have been proposed to address this issue, and both analytical and empirical methods have been developed (Zhang 1996). We use a set of features computed from the regions in the image that have been segmented as an output of each segmentation algorithm to characterize the quality of segmentation. The Berkeley Segmentation Dataset (Martin et al. 2001) provides ground-truth segmentation data. This dataset contains 12,000 manual segmentations of 1,000 images by 30 human subjects. Half of the images were presented to subjects in grayscale, and half in color. We use the color segmentations for 300 images, divided into training and test sets of 100 and 200 each, respectively. Each image has been segmented by at least 5 subjects, so the ground truth, S^g, for each region is defined by a set of human segmentations. We consider a true segmentation region, S^g, to be one such that every pixel in that region is classified as belonging to that region by all the subjects. Features based on Gestalt principles of grouping, including proximity, similarity, good continuation, and closure, as well as symmetry and parallelism, are computed for each segmented region, S_i, resulting from the ith algorithm and the corresponding ground-truth region, S^g. Specifically, the features used are the ones defined in Ren and Malik (2003) and enumerated below:

1. inter-region texture similarity
2. intra-region texture similarity
3. inter-region brightness similarity
4. intra-region brightness similarity
5. inter-region contour energy
6. intra-region contour energy
7. curvilinear continuity.

We further combine the inter- and intra-region features for texture, brightness, and contour, respectively, to generate a feature vector for each segmented region in the image. The overall quality measure is given by the evaluation function originally proposed in Liu and Yang (1994):

X(I) = √N Σ_{j=1}^{N} e_j² / √P_j    (14)

where N is the number of segments, P_j is the number of pixels in segment j, and e_j² is the squared error of region j as computed from each of the above features, independently. The error for each region is given by the χ² histogram difference of the features of an algorithm's region and the corresponding ground-truth region. This metric results in four measures of segmentation quality based on texture, brightness, contour, and continuity. The measure allows for characterization of the pixel-to-pixel match between the achieved segmentation and the ground truth, but tends to favor under-segmentation.

To avoid over- and under-segmentation, we also use two measures that evaluate the segmentation quality based on the number of regions (Borsotti et al. 1998; Zhang et al. 2006). These are computed as:

X(I) = (√N / (1000 × S_I)) Σ_{j=1}^{N} [ e_j² / (1 + log S_j) + (N(S_j) / S_j)² ]    (15)

and

X(I) = − Σ_{j=1}^{N} (S_j / S_j^g) log(S_j / S_j^g)    (16)

where S_j is the true segmentation of region j in image I and S_j^g is a ground-truth segmentation for region j. The first of these two measures penalizes a very large number of segments while emphasizing the regional errors, thus achieving a balance between under- and over-segmentation. The second measure is information theoretic and reduces the bias towards over-segmentation.

5 Experimental Results Having learned the required priors and likelihoods, the
probability for selecting each of the segmentation algo-
To evaluate the proposed framework, we performed a per- rithms based on image features was computed for each
formance analysis on a subset of the Berkeley Segmentation image in the test dataset. In addition, the precision-recall
Dataset (Martin et al. 2001). The experiment was based on framework (Ren and Malik 2003; van Rijsbergen 1979) was
a two-part approach, learning the required models based on used to evaluate the performance of the proposed approach.
a learning dataset of images, and testing the learned mod- Precision is the fraction of detections which are true posi-
els on a dataset of test images. We divided the dataset of tives while recall is the fraction of true positives which are
300 images into 100 images for training and the remaining detected. Figure 3 shows the precision-recall curve for the
200 for testing. All the images in the training set were sub- training and test datasets. Both curves are almost the same
jected to segmentation based on each of the four algorithms. indicating that the performance predictor generalizes well
Image features Y and the segmentation quality features Xi and that the likelihood models estimated do not overfit the
dependent on each of the i algorithms were computed. In data. On the other hand, the curves could possibly be im-
addition, given the four segmentations for each image, the proved by employing more complex models for the like-
optimal segmentation was chosen as the one that minimized lihood functions. More sophisticated models may provide
the overall error between the segmentation output and the better performance than the simple models used here.
ground-truth. A loss function was introduced such that a loss To evaluate the sufficiency of extracted features in de-
of 1 was assigned for each pixel in a segmented region that scribing the image context, the effect of color and texture
did not belong to that regions ground-truth segmentation. features on the performance of the predictor was analyzed,
Hence, the total error was computed as the sum of all losses independently. Figure 4 presents the performance of the im-
divided by the total number of pixels in the image. age features during training phase. It is clear that both color
During the training phase, Pi (H ) had to be estimated giv- and texture features contain independent information and
ing a probability for picking a segmentation algorithm given provide useful knowledge towards the prediction of the opti-
the segmentation quality measure Xi . To do so, Pi (H ) was mal segmentation algorithm. Between color and texture fea-
computed as a ratio of the number of times a minimum mea- tures, texture has more information relevant to the algorithm
sure of Xi across all segmentation outputs resulted in select- selection process. In addition, the analysis also shows that
ing the correct algorithm by the total number of images. In other features could potentially add more information to bet-
addition, the likelihood distributions for p(Yj |Xi , H ) was ter represent the image context.
assumed to be Gaussian distributed. This captured the re- Figure 5 shows an example of segmentation selection un-
lationship between the image features and the likelihood of der the developed model. Figures 5a and b show the input
obtaining the best segmentation with quality measure Xi un- image and the ground-truth segmentation, respectively. Fig-
der hypothesis H . The parameters of this distribution were ures 5c–f show the segmentation result obtained by each of
estimated by computing the mean and covariance of the fea- the candidate segmentation methods. In this case, the devel-
tures Xi for each segmentation algorithm. In addition, a oped model chose the mean-shift segmentation algorithm,
likelihood model for negative association pNA (Yj ) based on which is the correct choice as that results in the minimum
image features was also estimated by computing the mean error between the resulting segmentation and ground-truth
and covariance of the quality features Xi belonging to the segmentation. The errors for each of the resulting segmen-
segmentation algorithm with the highest squared error mea-
sured between the segmentation output and the ground truth.

Fig. 4 Precision-recall curves for prediction performance based on


Fig. 3 Precision-recall curves for the developed performance predic- color, texture, and combined features as evaluated on the training set
tion system on training and test images of images
Fig. 5 Resulting segmentation of the input image (a) based on each of the four candidate algorithms (c)–(f). The mean-shift algorithm was selected as the optimal choice

The errors for the resulting segmentations, Figs. 5c–f, were 0.101, 0.053, 0.061, and 0.025, respectively.

Figure 6 presents another example of the ability of the developed framework to select the optimal segmentation algorithm. Once again, Figs. 6a and b show the input image and the ground-truth segmentation, respectively. Figures 6c–f show the segmentation result obtained by each of the candidate segmentation methods. In this case, the developed model chose the level set approach, and the errors associated with each of the algorithms c–f were computed to be 0.155, 0.048, 0.355, and 0.148, respectively.

Figure 7 presents an example where the optimal segmentation algorithm chosen by the developed model is in fact the second best segmentation. Figures 7a and b show the input image and the ground-truth segmentation, respectively. Figures 7c–f show the segmentation result obtained by each of the candidate segmentation methods. The developed model chose the graph-based segmentation algorithm, when in fact the error associated with the level set approach is lower than that of the graph-based algorithm. The errors associated with each of the algorithms c–f were 0.121, 0.120, 0.253, and 0.334, respectively. Based on visual inspection, it is clear that both algorithms achieve acceptable segmentations and have similar visual qualities. This result reinforces the fact that the problem of segmentation cannot be solved by a purely analytical solution and requires the incorporation of psycho-physical perception. It is also safe to conclude that the developed framework, while it did not select the algorithm with the lowest error, did select the algorithm that resulted in one of the better segmentations.

Table 1 presents the prediction rate of the developed system. To get a reliable estimate, we used a 5-fold validation scheme to evaluate the prediction rate. All images were divided into 5 groups, each with 60 images. In every test, only one group of images is used as training samples and the other 240 images are used as testing samples. After 5 iterations, all images have been used for training, and 1200 images have been used for testing. The algorithm chosen by the performance predictor is considered to be correct if the respective segmentation is the one with the lowest error as compared to the ground-truth segmentation. Accordingly, the predictor's performance in selecting the best algorithm is computed to be 86.4%. Each of the algorithms was ranked based on estimated probabilities, and the table below shows that the algorithm having the second, third, and fourth largest probability
Fig. 6 Resulting segmentation of the input image (a) based on each of the four candidate algorithms (c)–(f). The level set approach was selected as the optimal choice
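The per-image errors quoted for Figs. 5–7 measure the deviation of a segmentation from the ground truth. The paper's exact measure is the squared-error deviation defined earlier; purely as a stand-in illustration, one concrete, label-permutation-invariant alternative is the fraction of pixel pairs on which two labelings disagree about same-segment membership (one minus the Rand index):

```python
from itertools import combinations

def pair_disagreement(seg, gt):
    """Fraction of pixel pairs on which two labelings disagree about
    same-segment vs. different-segment membership (1 - Rand index).
    Label names do not matter, only the grouping of pixels."""
    total = agree = 0
    for i, j in combinations(range(len(seg)), 2):
        total += 1
        if (seg[i] == seg[j]) == (gt[i] == gt[j]):
            agree += 1
    return 1 - agree / total
```

This brute-force pairwise form is quadratic in the number of pixels and is meant only to make the notion of a segmentation error concrete; practical implementations count pairs through a contingency table instead.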

Table 1 Accuracy rates of the predictor in selecting the algorithm resulting in segmentation with the lowest computed deviation from ground-truth segmentation. The table also shows how often other choices of the predictor were ones with lowest error

Probabilistic predictor ranking    Prediction rate (%)
First (max probability)            86.40
Second                             11.70
Third                               1.80
Fourth                              0.10

Table 2 Accuracy rates in achieving segmentation with the lowest error against the ground-truth segmentation for each of the candidate algorithms as evaluated against the entire image set using a 5-fold validation scheme

Segmentation algorithm    Best segmentation rate (%)
Graph-based               27.4
Level set                 26.8
Normalized cut            22.3
Mean-shift                23.5
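The 5-fold protocol behind Tables 1 and 2 (train the predictor on one group of 60 images, test on the remaining 240, rotate through all five groups) reduces to counting how often the predicted choice coincides with the per-image minimum-error algorithm. A minimal sketch, where `choose` (fits a predictor from a training fold) and `errors` (per-image error of each candidate algorithm against ground truth) are hypothetical stand-ins:

```python
def prediction_rate(image_ids, choose, errors, k=5):
    """Rotate through k folds: fit the predictor on one fold and count,
    over the remaining images, how often its chosen algorithm is the
    one with the lowest error against the ground truth."""
    folds = [image_ids[i::k] for i in range(k)]
    correct = tested = 0
    for f in range(k):
        train = folds[f]
        test = [i for i in image_ids if i not in train]
        predictor = choose(train)  # fit on the single training fold
        for img in test:
            tested += 1
            best = min(errors[img], key=errors[img].get)
            if predictor(img) == best:
                correct += 1
    return correct / tested
```

Note the inverted split relative to conventional k-fold cross-validation: following the text, each iteration trains on one fold and tests on the other four, so five iterations yield 1200 test instances from 300 images.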
was the one with lowest error in 11.7%, 1.8%, and 0.1% of the cases, respectively.

Further, 5-fold validation was also performed for each segmentation algorithm independently to compare the benefit of the performance predictor. Table 2 gives the accuracy rates for each algorithm, where the rate was determined based on the number of times the particular algorithm resulted in segmentation with the lowest error. As seen, all algorithms have similar accuracy rates, indicating that no single algorithm consistently provides a result with the minimum error as compared to the ground-truth. In contrast, the predictor has an accuracy of 86.4%, clearly indicating its ability to select an appropriate algorithm based on the given image.

Fig. 7 Resulting segmentation of the input image (a) based on each of the four candidate algorithms (c)–(f). The graph-based approach was selected as the optimal choice, while the error associated with the level set approach was lower than that of the graph-based segmentation result

To additionally evaluate the performance of the predictor, we employ four quantitative performance measures introduced in Freixenet et al. (2002), Martin et al. (2001), Meila (2005), Yang et al. (2006), namely, Probabilistic Rand Index (PRI), Variation of Information (VoI), Global Consistency Error (GCE), and Boundary Displacement Error (BDE). PRI indicates the fraction of pixel pairs whose labels are consistent between the obtained segmentation and the ground

truth. VoI captures the amount of randomness in the obtained segmentation that cannot be explained by the ground truth. This is expressed in terms of the average conditional entropy of the obtained segmentation. GCE measures the extent of dependency between the obtained segmentation and the ground truth, providing a measure of consistency between the two. Finally, BDE measures the offset error in the boundary pixels of the obtained segmentation over the ground truth. Table 3 gives the comparison of the segmentation obtained by the predictor against each of the algorithms as computed over the testing data set.

Table 3 Average performance on the subset of the Berkeley Database chosen as the testing set for the performance predictor. PRI ranges between [0, 1], with 1 being the best. VoI ranges between [0, ∞), with 0 being the best. GCE ranges between [0, 1], with 0 being the best. BDE ranges between [0, ∞), with 0 being the best

                                 PRI      VoI      GCE      BDE
Algorithm chosen by predictor    0.8445   2.2743   0.1754   7.6821
Graph-based                      0.7821   2.4271   0.1965   8.8932
Level set                        0.7667   2.2053   0.2374   8.5473
Normalized cut                   0.7564   2.7541   0.2353   8.3678
Mean-shift                       0.7675   2.4572   0.2265   8.6031

As can be seen, the predictor outperforms the individual algorithms in terms of PRI, GCE, and BDE, each of which measures the segmentation in terms of under- and over-segmentation. This can be considered indicative of the fact that the models used by the predictor in estimating segmentation quality are represented accurately, since the segmentation quality metrics outlined in Sect. 4 are clearly the ones that penalize over- and under-segmentation along with pixel-level differences in each of the segmented regions. On the other hand, VoI, which is more indicative of the randomness between the obtained segmentation and ground truth, is
not the minimal, but closer to the average of the VoI of the chosen algorithm for each image in the testing data set.

6 Conclusions

This paper has proposed a probabilistic framework based on Bayesian theory for the performance prediction and selection of an optimal segmentation algorithm. The framework models the optimal algorithm selection process as one that accounts for the information content of an input image as well as the behavioral properties of a particular candidate segmentation algorithm. The input image information content is measured in terms of image features, while the candidate segmentation algorithm's behavioral characteristics are captured through the use of segmentation quality features. Gaussian probability distribution models are used to learn the required relationships between the extracted image and algorithm features, and the framework is tested on the Berkeley Segmentation Dataset using four candidate segmentation algorithms. The results show that in 86.4% of the test images, whose characteristics vary greatly, the developed approach picks the optimal algorithm. The optimality of the algorithm is characterized by the segmentation quality metrics used to model the predictor. A quantitative performance analysis is also performed on the segmentation results using four independent measures, the results of which are indicative of the predictor's ability to model the segmentation process entirely based on the segmentation quality, and hence capture the ability to perform segmentation independent of any one segmentation algorithm's intrinsic properties.

Future extensions of this work will focus on evaluating different image features as well as measures of segmentation quality. In addition, more realistic models for learning the relationship between the image features and segmentation quality measures need to be explored. The precision-recall framework can be used to evaluate the quality of the chosen model.

References

Borsotti, M., Campadelli, P., & Schettini, R. (1998). Quantitative evaluation of color image segmentation results. Pattern Recognition Letters, 19, 741–747.
Brox, T., & Weickert, J. (2004). Level set based image segmentation with multiple regions. In LNCS: Vol. 3175. Pattern recognition (pp. 415–423). Berlin: Springer.
Chalmond, B., Graffigne, C., Prenat, M., & Roux, M. (2001). Contextual performance prediction for low-level image analysis algorithms. IEEE Transactions on Image Processing, 10, 1039–1046.
Chiang, H. C., & Moses, R. L. (1999). ATR performance prediction using attributed scattering features. In Proceedings of SPIE (Vol. 3721, pp. 785–796). Bellingham: SPIE.
Chiang, H. C., Moses, R. L., & Potter, L. C. (2000). Classification performance prediction using parametric scattering feature models. In Proceedings of SPIE (Vol. 4053, pp. 546–557). Bellingham: SPIE.
Cho, K., Meer, P., & Cabrera, J. (1997). Performance assessment through bootstrap. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, 1185–1198.
Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603–619.
Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley-Interscience.
Ettinger, G. J., Klanderman, G. A., Wells, W. M., & Grimson, W. E. L. (1996). Probabilistic optimization approach to SAR feature matching. In Proceedings of SPIE (Vol. 2757, pp. 318–329). Bellingham: SPIE.
Felzenszwalb, P., & Huttenlocher, D. (2004). Efficient graph-based image segmentation. International Journal of Computer Vision, 59, 167–181.
Fogel, I., & Sagi, D. (1989). Gabor filters as texture discriminator. Biological Cybernetics, 61, 103–113.
Freixenet, J., Munoz, X., Raba, D., Marti, J., & Cufi, X. (2002). Yet another survey on image segmentation: Region and boundary information integration. In ECCV'02: Proceedings of the 7th European conference on computer vision—Part III (pp. 408–422). London: Springer.
Fu, K., & Mui, J. (1981). A survey on image segmentation. Pattern Recognition, 13, 3–16.
Haralick, R. M., & Shapiro, L. G. (1992). In Computer and robot vision (Vol. 1, pp. 303–370). Reading: Addison-Wesley.
Konishi, S., & Yuille, A. L. (2000). Statistical cues for domain specific image segmentation with performance analysis. In IEEE conference on computer vision and pattern recognition (CVPR) I (pp. 125–132).
Laws, K. I. (1980). Textured image segmentation. Ph.D. thesis.
Liu, J., & Yang, Y. H. (1994). Multiresolution color image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16, 689–700.
Martin, D. R., Fowlkes, C. C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In International conference on computer vision II (pp. 416–423).
Meila, M. (2005). Comparing clusterings: an axiomatic view. In ICML'05: Proceedings of the 22nd international conference on machine learning (pp. 577–584). New York: Assoc. Comput. Mach.
Nair, D., & Aggarwal, J. K. (1996). A focused target segmentation paradigm. In Fourth European conference on computer vision (Vol. 1, pp. 579–588). Berlin: Springer.
Pal, N. R., & Pal, S. K. (1993). A review on image segmentation techniques. Pattern Recognition, 26, 1277–1294.
Pavlidis, T. (1977). Structural pattern recognition. Berlin: Springer.
Perner, P. (1999). An architecture for a CBR image segmentation system. In ICCBR'99: Proceedings of the third international conference on case-based reasoning and development (pp. 525–534). Berlin: Springer.
Puzicha, J., Hofmann, T., & Buhmann, J. M. (1999). Histogram clustering for unsupervised segmentation and image retrieval. Pattern Recognition Letters, 20, 899–909.
Randen, T., & Husoy, J. H. (1999). Filtering for texture classification: A comparative study. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 291–310.
Ren, X., & Malik, J. (2003). Learning a classification model for segmentation. In International conference on computer vision (pp. 10–17).
Reynolds, R. G., & Rolnick, S. R. (1995). Learning the parameters for a gradient-based approach to image segmentation using cultural algorithms. In Proceedings of the first international symposium on intelligence in neural and biological systems (INBS'95) (Vol. 240). Los Alamitos: IEEE Comput. Soc.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888–905.
Shufelt, J. A. (1999). Performance evaluation and analysis of monocular building extraction from aerial imagery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21, 311–326.
Spann, M., & Grace, A. (1994). Adaptive segmentation of noisy and textured images. Pattern Recognition, 27, 1717–1733.
Spann, M., & Nieminen, A. (1988). Adaptive Gaussian weighted filtering for image segmentation. Pattern Recognition Letters, 8, 251–255.
Tourassi, G. D., Frederick, E. D., Vittitoe, N. F., & Coleman, R. E. (2000). Fractal texture analysis of perfusion lung scans. Computers in Biomedical Research, 33, 161–171.
Tuceryan, M., & Jain, A. K. (1993). Texture analysis. Handbook of Pattern Recognition and Computer Vision, 235–276.
van Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London: Butterworths.
Yang, A. Y., Wright, J., Sastry, S. S., & Ma, Y. (2006). Unsupervised segmentation of natural images via lossy data compression (Technical Report UCB/EECS-2006-195). EECS Department, University of California, Berkeley.
Zhang, H., Cholleti, S., Goldman, S. A., & Fritts, J. E. (2006). Meta-evaluation of image segmentation using machine learning. In IEEE conference on computer vision and pattern recognition I (pp. 1138–1145).
Zhang, X., & Haralick, R. M. (1993). Bayesian corner detection. In British machine vision conference.
Zhang, Y. J. (1996). A survey on evaluation methods for image segmentation. Pattern Recognition, 29, 1335–1346.