DOC10 Yang Et Lin Content Based 3 D Model Retrieval A Survey (2007)


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART C: APPLICATIONS AND REVIEWS, VOL. 37, NO. 6, NOVEMBER 2007, p. 1081

Content-Based 3-D Model Retrieval: A Survey


Yubin Yang, Member, IEEE, Hui Lin, Member, IEEE, and Yao Zhang

Abstract: As the number of available 3-D models grows, there is an increasing need to index and retrieve them according to their contents. This paper provides a survey of the up-to-date methods for content-based 3-D model retrieval. First, the new challenges encountered in 3-D model retrieval are discussed. Then, the system framework and some key techniques of content-based 3-D model retrieval are identified and explained, including canonical coordinate normalization and preprocessing, feature extraction, similarity match, query representation and user interface, and performance evaluation. In particular, similarity measures using semantic clues and machine learning methods, as well as retrieval approaches using nonshape features, are given adequate recognition as improvements and complements for traditional shape-matching techniques. Typical 3-D model retrieval systems and search engines are also listed and compared. Finally, future research directions are indicated, and an extensive bibliography is provided.

Index Terms: 3-D object, 3-D shape, content-based retrieval, feature extraction, query interface, similarity match.

Manuscript received February 2, 2005; revised February 20, 2006 and May 9, 2006. This work was supported in part by the Key Program of the National Natural Science Foundation of P.R. China under Grant 60723003, in part by the National Natural Science Foundation of P.R. China under Grant 60505008, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK2007520. This paper was recommended by Associate Editor L. Zhang.
Y. Yang is with the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China, and also with the Joint Laboratory for GeoInformation Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: yangyubin@ieee.org).
H. Lin is with the Joint Laboratory for GeoInformation Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
Y. Zhang is with the Department of Information Management, Nanjing University, Nanjing 210093, China.
Digital Object Identifier 10.1109/TSMCC.2007.905756
1094-6977/$25.00 © 2007 IEEE

I. INTRODUCTION

As an expansion from audio (1st wave of multimedia), image (2nd wave of multimedia), and video (3rd wave of multimedia), 3-D digital models and 3-D scenes are now regarded as the 4th wave of multimedia and are sufficiently realistic and complex to convey an increasing amount of perceptual detail [1]. Unlike 2-D images, 3-D models are able to overcome the illusion problem caused by the human eye; thus object segmentation becomes less error prone and easier to achieve [2]. Modern computer technology and powerful computing capacity, combined with new acquisition and modeling tools, make it much easier and cheaper to create and operate 3-D models with basic hardware. This consequently produces an increasing number of 3-D models in various sources, such as professional 3-D model collections (e.g., Biology, Medicine, Chemistry, Archaeology, and Geography) and on the Internet [3]. This naturally raises an increasing need for new retrieval tools, by which these large-scale and complicated new generation media can be easily organized and found by computer users. In addition, modeling highly realistic 3-D models is still a very laborious, high-cost, and time-intensive process. If the 3-D models currently available can be efficiently retrieved and reused, much less time and effort will be needed to complete the modeling task [4].

Content-based retrieval is a preferred method for handling multimedia data efficiently [5]–[7]. However, compared with the thriving development and achievements in 1-D and 2-D multimedia search, 3-D model search is still in its infancy. Many Web sites only allow users to search 3-D models in quite a limited and primitive way, such as browsing a directory structure, running a text keyword search [8]–[11], or retrieving by file type and file size [12]. These traditional text-based search techniques are not effective for 3-D models, as they suffer from problems such as low efficiency, low accuracy, and high ambiguity [6], [7]. Moreover, the most significant problem is that 3-D models contain both shape and appearance information, which is hard to represent and query using text keywords alone.

To address these issues, the idea of retrieving 3-D models in a "Content-based" scheme has already attracted considerable attention as a new hotspot in several research communities, such as computer vision, computer graphics, geometric modeling, pattern recognition, mechanical computer-aided design (CAD), and molecular biology [13]. This "Content-based" scheme is now developing into a "Content-based 3-D Model Retrieval" methodology, concentrating on the representation, recognition, and matching of 3-D models on the basis of extraction and comparison of intrinsic representative features, such as shapes, colors, textures, and light distributions [14]–[21]. A growing number of researchers have become involved in this area and have already made much progress [22]–[25]. Many algorithms have been proposed and reported, and there has been a subsequent increase in the publication of academic papers and books related to this topic, in a wide range of international journals and conferences [13], [14], [16], [17], [26]–[30]. The new international standard MPEG-7 has also covered some 3-D object descriptors as one of its feature sets [5], [31], [32]. In fact, content-based 3-D model retrieval can be applied widely in many domains, such as CAD, cultural heritage applications, robotics, molecular biology, virtual geography environment (VGE), 3-D spatial terrain, medicine, chemistry, military, and industrial manufacturing [26], [33]–[40], [55]. It can also be potentially applied in e-Business and Web search engines in distributed data environments [14], [41]–[44].

In this paper, we review the state-of-the-art research methods on content-based 3-D model retrieval and explore the important issues and the outstanding challenges of this area. Following this introduction, the remainder of the paper is organized as follows. In Section II, the basic concepts of content-based 3-D model retrieval are briefly introduced, highlighting the new challenges it faces. This is followed by a description of the architecture framework, leading in Section III to a discussion of the canonical coordinate normalization process and 3-D model
preprocessing technologies. Section IV provides an appropriate categorization of the currently available feature representation and extraction technologies. Section V reviews the representative work in similarity match techniques. Section VI briefly outlines the query representation and user interface technologies. Section VII discusses performance evaluation methods. Section VIII enumerates some typical content-based 3-D model retrieval systems and search engines. Finally, Section IX concludes the paper with a number of directions for future research.

II. CONTENT-BASED 3-D MODEL RETRIEVAL

A. Overview

The fundamental processing flow of content-based 3-D model retrieval can be described as follows: first, the compact and descriptive features, such as geometric shapes, spatial and topological relationships, statistical measures, textures, and material colors, are computed and extracted automatically from the 3-D data to build the multidimensional information indices. Then, the similarity measure between a query and each target model in the database is defined in the multidimensional feature space. The similarity degrees are then calculated and sorted so that the models having the nearest similarity are returned as the matched results, on the basis of which browsing and retrieval in 3-D model databases are finally implemented [14]. Here, "Content-based" means that 3-D models are retrieved according to their visual feature indices, which are automatically (or semiautomatically) extracted and expected to characterize their contents.

The ultimate goal of content-based 3-D model retrieval is to approach human visual perception so that semantically similar 3-D models can be correctly sourced by their looks. However, the types of methods outlined earlier, which can be termed "low-level similarity-induced semantics," capture some, but not all, aspects of the content of a 3-D model, and do not coincide with the high-level information it contains. For example, a sphere-like shape can be used to describe a 3-D ball, but how can we judge whether it is a 3-D ball or a 3-D model of the globe by using that feature alone? This is the well-known "semantic gap" problem [45], which indicates the relatively limited descriptive power of low-level visual features for approximating subjective high-level perception. Therefore, extraction of high-level features which can derive semantics from low-level features should also be integrated as an important part in content-based 3-D model retrieval.

B. New Challenges

Content description and retrieval tasks have become much more complex and difficult due to the new intricacy existing in the features of 3-D models, leading to new challenges, listed as follows.

First, building correct feature correspondence for 3-D models is more difficult and time-consuming. The 3-D models possess more complex and excessive poses than 2-D media, with different translations, rotations, scales, and reflections. This gives 3-D models many more arbitrary and unpredictable positions, orientations, and measurements and makes the 3-D models difficult to parameterize and search [2]. However, it is necessary to search 3-D models invariantly with respect to translation, rotation, scaling, and reflection [46]. Therefore, in many cases, additional alignment-normalization (pose registration) processes may be required to align 3-D objects to their canonical coordinate frame, or more intricate mappings or transformations may be needed for extracting invariant feature representations of a 3-D model before a similarity match, which are time-consuming, computing-intensive, and unstable.

Second, the diversity of the representations of 3-D information may impede the implementation of simple, convenient, and efficient 3-D model retrieval systems. There is no single common 3-D shape format that serves as a standard. The 3-D models are usually represented with two kinds of data: geometric properties and appearance attributes. Geometric properties have a wide variety of representations including different kinds of information like point data, surface data, volumetric data, and structures, such as solid structures used in the CAD/computer-aided manufacturing (CAM) field, parametric surfaces or polygon meshes, implicit surfaces, volumetric arrays of voxel grids, or just unstructured "polygon-soup" and point clouds, while appearance attributes may contain material types, material colors, transparency, reflection coefficients, and texture mapping. Due to the representation multiplicity, most of the currently available 3-D model matching algorithms merely depend on 3-D shape properties based on some specific 3-D data formats. How to overcome the unnecessary complexity and noneffective matching caused by format diversity is one of the major challenges of content-based 3-D model retrieval. To identify viable solutions to address these issues, it is necessary to develop some new types of high-level descriptors to provide a unified view on perceptual understanding of a 3-D model. Nevertheless, a 3-D model usually lacks high-level semantic clues. Therefore, there also are challenges for establishing effective mapping mechanisms between low-level 3-D data representations and high-level semantic descriptions.

Third, 3-D data representations have been designed for efficient visualization tasks, resulting in many problems for feature indexing and similarity comparison. For example, some 3-D representations are not inherently well-defined, such as virtual reality modeling language (VRML)-like "polygon soup" and some unclosed meshes. It is difficult and ineffective to transform these into well-defined representations before feature extraction [47]–[49]. Therefore, accepting polygon soup and other ill-defined 3-D models is a further challenge for 3-D model retrieval tasks [50].

Finally, 3-D models have both considerable appearance attributes and complex geometric properties, which greatly increases the amount of content information. In addition, the dimension of 3-D data is also too high to be dealt with effectively and efficiently. Moreover, the multiresolution feature representations should be effectively generated in order to be robust against different levels of detail (LOD) of 3-D model representations [46], [51].
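
The processing flow summarized in the Overview of this section (features indexed per model, a distance defined in feature space, results sorted by similarity) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three-component feature vectors, the model names, the `rank_models` helper, and the choice of Euclidean distance are hypothetical and are not taken from any surveyed system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_models(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (model name, distance) pairs, nearest first."""
    scored = [(name, euclidean(query, feat)) for name, feat in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:top_k]


# Hypothetical offline index: one low-level shape feature vector per model.
index = {
    "ball": [1.0, 1.0, 1.0],
    "globe": [1.0, 1.0, 0.98],
    "rod": [0.1, 0.1, 1.0],
}

# Online phase: the query is described by the same kind of feature vector.
ranked = rank_models([1.0, 1.0, 1.0], index, top_k=2)
for name, dist in ranked:
    print(f"{name}: {dist:.3f}")
```

In this toy index the ball and the globe are nearly indistinguishable, which is the semantic gap in miniature: a purely geometric feature cannot separate two sphere-like models that differ only in meaning.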
C. Architecture Framework

A typical content-based 3-D model retrieval system generally consists of four main components: 1) canonical coordinate normalization (pose registration) and model preprocessing; 2) feature extraction: the generation of both low-level features capturing the 3-D shape, appearance properties, and high-level semantic features; 3) similarity match: relevance ranking procedure according to similarity degrees calculated; and 4) query interface: a practical online user interface designed to represent and process user queries. A typical architecture framework is shown in Fig. 1.

Fig. 1. Typical architecture framework of content-based 3-D model retrieval [3], [52], [53].

Different from the traditional 3-D object recognition tasks, which are usually performed at the cost of high computational complexity by establishing correspondences between a pair of 3-D models and then comparing them, content-based 3-D model retrieval needs to be performed on a per-model basis, which means the feature used for matching should be calculated and stored independently of the target 3-D models [54]. This also allows for the so-called offline feature extraction process because there is no need to explicitly establish correspondence. Therefore, for the genuine online retrieval phase, matching is performed by comparing the values of feature descriptors for each 3-D model. As can be seen from Fig. 1, the feature information of each 3-D model is extracted from the 3-D model database during an offline stage to enable comparison to online queries later on.

III. CANONICAL COORDINATE NORMALIZATION AND PREPROCESSING

A. Overview

In many cases, we first need to normalize the size and orientation of a 3-D model before extracting its features. The normalization step aims to ensure that the same feature representation can be properly extracted from a 3-D model with any different scale, position, and orientation. This allows us to perform search and retrieval tasks on a per-model basis, without further alignment of 3-D models to each other. At present, there are two approaches to realize such a per-model based normalization [20]: 1) "Normalization": finding a canonical coordinate frame using methods similar to those used in principal component analysis (PCA), also referred to as "pose estimation" or "pose registration"; 2) "Invariance": defining and extracting feature descriptors that possess the inherent invariance characteristics, so as not to change under any rigid transformations. The "Invariance" approaches have been accorded increasing weight in recent research because of their robustness and simplicity. However, the invariance they possess is not always complete and all-sided. Moreover, the computation of these feature descriptors is necessarily completed over a unit coordinate frame. Hence, to guarantee the descriptive power and robustness of the feature representations, the canonical coordinate normalization, such as alignment and scaling, is also necessary before invariance-keeping features are extracted.

Besides the normalization process, performing some preprocessing steps on 3-D models before their features are extracted is inevitable. These steps include the transformation between different 3-D data representations (e.g., to transform polygon meshes into voxel grids), the partition of model units, and vertex clustering, etc. [3], [14], [55].

B. Normalization Steps

1) Translation: First, the model's center of mass should be shifted to the coordinate origin, i.e., $I_1 = I - c = \{u \mid u = v - c,\ v \in I\}$, where $I$ is the original coordinate, $I_1$ is the new coordinate after translation, and $c$ is the model's centroid [20].

2) PCA-Based Rotation: Next, for the rotation invariance, the most popular approach is the PCA [20], [56], [57]. PCA is used to determine the canonical coordinate system axes of a 3-D model, based on calculating the corresponding eigenvectors and the resulting diagonal matrix $R$ of eigenvalues, decreasingly ordered by their values. The rotation transformation is represented as $I_2 = R \cdot I_1 = \{x \mid x = R \cdot u,\ u \in I_1\}$, where $I_1$ is the 3-D model's coordinate frame before rotation and $I_2$ is the new coordinate frame after rotation, whose axes are identical to the directions having the top three largest variances of the point distribution.

The general PCA transformation in 3-D model retrieval is defined on the given set of representative points of a 3-D model, such as vertices, centroids of each surface, or even randomly selected locations on each surface using statistical techniques (e.g., the Monte Carlo approach) [58]. In considering the different sizes of triangles or meshes of a 3-D model, some appropriate weighting factors, proportional to their surface area can be
accommodated, so as to make the transformation more robust and improve the reliability and veracity of feature representation [20], [59], [60]. However, the point-based PCA transformation may cause an inaccurate normalization result that will seriously affect the retrieval precision if the chosen vertices do not distribute evenly on the surface. Therefore, a more thorough improvement, termed "CPCA" (continuous PCA), which performs PCA transformation based on the whole 3-D polygon mesh, is proposed in literature [57]. CPCA generalizes the PCA transformation by using the sums of integrals over surfaces instead of the sums over selective vertices. Assume that the whole size of all the surfaces in a 3-D model is represented as $S = \sum_{i=1}^{M} S_i = \iint_I dv$, where $v \in I$, $I = \bigcup_{i=1}^{m} T_i$ is the point set of the 3-D model, and $T = \bigcup_{i=1}^{m} T_i$ ($T_i \subset \mathbb{R}^3$) is the triangle set. The covariance matrix $R$ is then defined as $R = \frac{1}{S} \iint_I v \cdot v^{T}\, dv$.

The PCA algorithm is fairly simple and efficient. However, it may erroneously assign the principal axes and produce inaccurate normalization results, especially when the eigenvalues are equal or close to each other, which usually happens to different models within the same category [14], [22].

3) Reflection and Scaling: A diagonal flipping matrix $F$ is designed to accomplish the reflection invariance, which ensures a model and its reflection will have the same feature descriptor. Finally, the 3-D model should also be scaled by multiplying a proper scaling coefficient $s$ to a certain unit size to guarantee the scaling invariance. The definition of the flipping matrix and scaling coefficient can be found in the literature [57]. Consequently, the whole normalization process is illustrated as follows [20]:

$\tau(I) = s^{-1} \cdot F \cdot R \cdot (I - c)$. (1)

IV. FEATURE EXTRACTION

The newly adopted features in content-based 3-D model retrieval include 2-D shape projections, 3-D shapes, 3-D appearances, and even high-level semantics, which are required not only to be extracted, represented, and indexed easily and efficiently, but also to effectively distinguish the similar models from the dissimilar models, invariant to typical affine transformations [20], [61]–[63].

According to the different aspects of the content they represent, the features of a 3-D model can be roughly categorized into two main types: 1) shape features, namely, geometry and topology features and 2) appearance features, which represent some important cognitive characteristics such as material colors, reflection coefficients, and texture mapping. Categories and comparisons of these feature extraction methods are provided in Table I in detail.

A. Shape Representations and Extraction

Currently, most of the work on shape feature extraction places emphasis on geometrical and surface topological properties of 3-D shape features, based on surfaces, voxels, vertex sets, and structural shape models [2], [13], [20], [29], [30], [42], [52]. Generally, geometrical features represent the specific shape and spatial position of surfaces, edges, and vertices, while topological features maintain the linking relationship between surfaces, edges, and vertices. A review of the existing categorizations of 3-D shape representations according to different criteria and classification levels is listed in Table II [13], [29], [30], [52].

In this paper, in order to provide a more unified perspective on the way in which shape characteristics of a 3-D model are represented, a new general taxonomy of 3-D shape feature extraction methods is proposed, the salient features of which are as follows: 1) approaches based on global geometrical analysis; 2) approaches based on function mapping; 3) approaches based on statistical properties computation; and 4) approaches based on topology analysis. However, it should be borne in mind that these methods are not absolutely independent and isolated. In fact, many of them are quite interdependent. The purpose of our taxonomy is to provide a rational and comprehensible classification and summarization on the existing research literature.

1) Global Geometrical Analysis Methods: The global geometry of a 3-D model is analyzed by directly sampling the vertex set, the polygon mesh set, or the voxel set in the spatial domain. Aspect ratio, binary 3-D voxel bitmap, and 3-D angles of vertices or edges may be considered the most simple and straightforward [64], although their discriminative powers are limited. These types of analyses generally use PCA-like methods to align the model into a canonical coordinate frame at first, and then define the shape representation on this normalized orientation. For instance, Vranic et al. [20] proposed a ray-based geometrical feature representation. They sampled a 3-D model in its canonical coordinate frame as a set of regularly spaced direction vectors and set rays along each direction vector from the coordinate origin, which intersected with the triangle mesh of a polyhedron surrounding the 3-D model. For each direction, the maximum distance from the intersected triangle mesh to the coordinate origin was computed and all the distance samples composed a feature vector. Using a similar idea, Yu et al. [65] extracted the 3-D global geometry as distance map and surface penetration map features.

Suzuki et al. [66] proposed a geometrical feature representation based on the concept of "equivalence class" (see Fig. 2). First, they divided a 3-D model into a set of grid cells and computed the number of vertices (or normal vectors of a surface) in each grid cell, which was divided into different equivalence class groups. The equivalence class was defined as a set of grid cells sharing the common characteristics, for example, the vertices, whose new position was still in the same set. The feature vector was then calculated according to the properties of equivalence classes, such as their vertex numbers. Heczko et al. [60] and Vranic et al. [68] both implemented an octree structure method to represent the shape features of 3-D volumetric models by fulfilling a multiresolution subdivision of the 3-D model space. For each grid cell, they took the sum of mesh sizes bounded by the grid cell, or the 3-D Fourier transformation coefficients as the feature components, which formed a feature descriptor of $2^r \times 2^r \times 2^r$ dimensions, where $r$ was the resolution of octree representation. Tangelder et al. created a method using weighted point sets as the shape descriptor for a 3-D polygon
mesh [69]. They also enveloped the object in a 3-D voxel grid and represented the shape as a weighted point set by selecting one representative point for each nonempty grid cell. They then selected the vertex with the highest Gaussian curvature or the area-weighted mean of all the vertices in a grid cell, to represent the model's geometry features. The experimental results given by Tangelder were very promising, but their main shortcoming was the long time it took to compute the descriptors. As for 3-D industrial solid models, Cicirello et al. [70] and McWherter et al. [18] both compared 3-D shapes by extracting the geometrical and engineering features of 3-D models in spatial domains.

TABLE I
CATEGORIES AND COMPARISONS OF 3-D FEATURE EXTRACTION METHODS

The common characteristic of these methods is that they are almost all derived directly from the elementary unit of a 3-D model, that is, the vertex, polygon, or voxel, and a 3-D model is viewed and handled as a vertex set, a polygon mesh set, or a voxel set. Their advantages lie in their easy and direct derivation from 3-D data structures, together with their relatively good representation power. However, the computation processes are usually too time-consuming and sensitive to small features. Also, the storage requirements are too high due to the difficulties in building a concise and efficient indexing mechanism for them in large model databases [71], [72]. In order to improve the overall performance, the Divide-and-Conquer strategy was adopted in the feature extraction process. In some cases, the low efficiency was mainly caused because some of the feature representations cannot be computed directly from the 3-D meshes, which first need to be transformed into a 3-D voxel space. This process is time-consuming and requires a large amount of storage space [73]–[75]. To address this issue, Zhang et al. [76] proposed a global geometrical analysis algorithm using the Divide-and-Conquer strategy, with no need of volumetric transformation. They first computed the features for each elementary surface (a triangle or a tetrahedron) of a 3-D mesh model, and then summed them up to form the global feature vector.

2) Function Mapping Methods: Function mapping methods establish a functional mapping from the original 3-D model to a predefined domain, typically a spherical function with invariance properties or several representative 2-D planar views with reduced dimensions. This has long been studied in 3-D engineering design and CAD communities, and has become one of the popular means to extract 3-D shape signatures.

a) Spherical Mapping: The first viable method is to map arbitrary 3-D shapes into a series of spherical functions to guarantee rotation invariance. Horn et al. [77] generated extended Gaussian image (EGI) mapping from a 3-D object to a Gaussian sphere, which associated a point on the 3-D surface with the corresponding point on the Gaussian sphere, by finding point
pairs having the same surface normal vector. The distribution of those surface normal vectors on the Gaussian sphere was then extracted to match 3-D surfaces. Similar to the EGI feature, Delingette et al. [78] presented a representation termed "Simplex Angle Images" (SAI). A 3-D object was first approximated as a discrete ellipsoid using a surface deformation algorithm, then mapped to a normalized sphere. The curvatures of every surface finally served as shape signatures.

TABLE II
REVIEW OF DIFFERENT CATEGORIZATIONS OF 3-D SHAPE REPRESENTATIONS

Fig. 2. Feature extraction based on equivalence class [67].

Saupe et al. [79] adopted a spherical harmonic function as the basic mapping function in the feature extraction process. A unit sphere $S^2$ centered at the centroid of a 3-D object was then defined as a function mapping $r(u)$ ($u \in S^2$) (as shown in Fig. 3), which was given as follows:

$r : S^2 \to \mathbb{C}, \quad r(u) = x(u) + i \cdot y(u)$. (2)

They then performed the discrete Fourier transform (DFT) on a number of samples of $x(u)$ along some radial directions $u_{ij}$ in the spherical coordinate and took the first $K$ transforming coefficients as the feature vector. To improve on this method, Vranic et al. [46] calculated the spherical Fourier transforming coefficients by directly using the spherical harmonics function in frequency domain instead of sampling the 3-D surface first. Funkhouser et al. [14], [62] also introduced a rotation-invariant spherical harmonic representation, whose key idea is to establish a spherical function to depict the energies at different frequencies. This approach can be viewed as an expansion of the Fourier descriptor shape representation [80] to spherical functions. A 3-D volumetric model space was first represented as a series of spherical harmonics by mapping and decomposing into a set of concentric spheres with different radii. Then, those harmonics within each frequency were summed up to compute the norm of each frequency component, i.e., the spectrum of energies of the spherical function as 3-D shape signatures. Fig. 4 illustrates an instance of this calculation process. Those signatures are invariant and dimension-reduced so as to save both storage space and matching time. Also, using the concept of energy, Leifman et al. [81] proposed a shape feature representation based on a "sphere projection," which computed the amount of energy required to deform a 3-D model into a predefined 3-D sphere whose origin was located in the 3-D model's centroid.

Fig. 3. Multiresolution 3-D model reconstruction using spherical harmonic mapping [79].

From a different perspective, Kazhdan et al. [61], [82] introduced a reflective symmetry descriptor that captures the symmetry measures of 3-D objects. For each 2-D plane crossing through the 3-D model's centroid, a symmetry distance was defined to compute how similar the model was to its reflection. The 3-D voxel grid was then decomposed into a collection of concentric spheres so that all the symmetry distances on those spheres were computed. Finally, all the different values of those symmetry distances were combined to form a reflective symmetry descriptor as a 3-D shape feature.

The methods described earlier produce invariant shape features, which avoids the time-consuming canonical coordinate normalization process in feature extraction. However, they also have some shortcomings. Firstly, it is generally assumed that a 3-D model will have valid topology (for meshes), or explicit volume (for volumetric models), which cannot be guaranteed in practice. Secondly, the spherical function mapping process is complicated and time-consuming.

b) 2-D Planar Mapping: The other type of function mapping is quite intuitive and straightforward. Several 2-D functional projections of a 3-D model or 2-D planar views from different perspectives are generated and combined as shape or silhouette feature descriptors.

The 2-D functional projection reduces the 3-D matching problem into a 2-D case without computing multiple views of the object. Johnson et al. [26] proposed a "spin image" representation, a 2-D descriptive image associated with a sampling vertex set on a 3-D surface, for which both the position and direction information were involved. The x and y coordinate values of the 2-D spin image were defined as the accumulated values of two different distance functions of the 3-D vertices, and the correlation coefficient between two spin images was computed as the similarity measure. However, since a 3-D model usually consists of many surfaces, there is a large set of spin images generated for each 3-D model. To achieve a more concise and compact feature representation, the original set of spin images is compressed using the PCA method. Zhang et al. [83] reduced the 3-D surface matching problem to a 2-D image matching problem by employing harmonic map theory [84], which studies the boundary mappings between different metric manifolds in terms of the energy-minimization principle. Gu et al. and
Praun et al. [85], [86] discussed the geometry image concept, a simple 2-D array of quantized points with useful attributes, such as vertex positions, surface normals, and textures. Laga et al. [87] applied this method to 3-D shape matching by reducing the 3-D matching problem to measuring the similarity between parameterized 2-D geometry images. All those methods make use of specific 3-D geometry information from a 3-D model in their 2-D mapping process.

Fig. 4. Sphere harmonic representation [14].
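As a rough illustration of the spin image idea described above, the following Python sketch builds one spin image for a single oriented point and compares two images by their correlation coefficient. It assumes the two accumulated distances are the radial distance `alpha` from the line through the point along its normal and the signed height `beta` along that normal, which is the parameterization commonly used for spin images. The names `spin_image` and `correlation`, the bin size, and the image width are hypothetical choices for this example, not details taken from [26].

```python
import math

def spin_image(p, n, points, bin_size=0.25, width=8):
    """Accumulate a width x width spin image for the oriented point (p, n).

    p, n   : 3-tuples; n is assumed to be a unit normal (assumption of this sketch).
    points : iterable of 3-tuples sampled from the surface.
    """
    image = [[0] * width for _ in range(width)]
    for x in points:
        d = [x[k] - p[k] for k in range(3)]
        beta = sum(d[k] * n[k] for k in range(3))        # signed height along the normal
        sq = sum(c * c for c in d) - beta * beta
        alpha = math.sqrt(max(sq, 0.0))                  # radial distance from the normal line
        col = int(alpha / bin_size)
        row = math.floor(beta / bin_size + width / 2)    # shift so beta = 0 sits mid-image
        if 0 <= row < width and 0 <= col < width:
            image[row][col] += 1                         # points outside the image are ignored
    return image

def correlation(img_a, img_b):
    """Pearson correlation coefficient between two equally sized spin images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0                                       # constant image: correlation undefined
    return cov / math.sqrt(va * vb)
```

In a retrieval setting one such image would be computed per sampled vertex, which is why the resulting set is large and is usually compressed before matching.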
Compared with these methods, the 2-D mapping methods that establish mappings from a 3-D view to a set of specific 2-D planar views from different angles are much more natural and simple. The basic idea is that if two 3-D shapes are similar, they should be similar from many different views. Thus, 2-D shapes, such as 2-D silhouettes, can be extracted and adopted for 3-D shape matching. There is a prolific amount of literature on these particular techniques. Vranic et al. [88] presented a feature representation based on 2-D boundary information, after having projected the 3-D model onto three standard coordinate planes, i.e., the XY, XZ, and YZ planes. For each projection on a specified plane, a silhouette was acquired by selecting contour points, equidistantly or equiangularly spanned; then, the Fourier power spectrum was computed. The first n coefficients of the power spectrum were finally extracted as the feature. The drawback is the incapability of properly reflecting the 3-D spatial information, since the 3-D model was only viewed as a simple combination of three standard 2-D projections, losing too much structure information. To solve this problem, Vranic et al. added depth information, which encoded the spatial distance difference of 3-D surfaces into different gray values of their 2-D projection images [88]. They also replaced contour-based 2-D shape matching with region-based matching, which further increased the retrieval precision. Cyr et al. [27] proposed an aspect-graph approach to represent 3-D shapes, as shown in Fig. 5. First, 2-D projection views were computed according to the view angles obtained by partitioning the viewing sphere every 5°. Then, similar 2-D projection views were clustered into the same group so as to generate a number of clusters called "Aspects," from which the shape representation was created by selecting a representative view for each Aspect. Min et al. [3] projected each 3-D model into several 2-D silhouette images from m different viewpoints and then matched all their combinations with n (m > n) 2-D sketches drawn by the user or the counterpart combinations of other 3-D models. The similarity was measured as the minimal sum of all the pairwise sketch-to-image (or image-to-image) similarity scores. Chen et al. [24] proposed a light field descriptor representing the 4-D light field of a 3-D model with a collection of 2-D images, which were captured by a set of uniformly distributed cameras by borrowing the concept of light field from image-based rendering. The cameras were controlled to rotate many times when measuring the similarity between descriptors of two 3-D models, so as to be switched onto their different vertices. The final 3-D model retrieval results were combined from the matching results of all those acquired 2-D images by integrating 2-D Zernike moment and Fourier descriptors. Ohbuchi et al. [50] presented a similar method. They generated a depth or z-value image of a 3-D model from multiple viewpoints v that were equally spaced on the unit sphere. The 3-D model matching was then performed by adopting a 2-D Fourier descriptor [89] for similarity matching of 2-D images. The main difference is that Chen's 2-D images only contain silhouettes, while Ohbuchi's contain depth information. Fig. 6 depicts Ohbuchi's feature extraction process. The depth image was first mapped from Cartesian coordinates into polar coordinates to perform the Fourier transformation before the Fourier descriptors were computed.

Fig. 5. Aspect graph [27].

Fig. 6. Depth image [50].

Since many more features can be extracted for a 2-D shape, the function mapping methods make the retrieval process more flexible. They can also largely reduce the complexity of feature computation and make the feature descriptor more compact. However, this inevitably causes much loss of important 3-D information, since the function mapping process is restricted by different constraints. Moreover, for 2-D planar view mapping, how to decide the necessary number of 2-D projection views is another problem in practice [27].

Fig. 7. Shape functions [2].
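To make the silhouette-based descriptor discussed in this section more concrete, the sketch below computes the first n Fourier power-spectrum coefficients of a closed 2-D contour. It is only an illustration under stated assumptions: the contour is taken to be already sampled along the silhouette boundary, each centred point is treated as a complex number (a common Fourier descriptor convention, not necessarily the exact formulation of [88]), and the function name `fourier_power_descriptor` is hypothetical.

```python
import cmath

def fourier_power_descriptor(contour, n):
    """Return the first n power-spectrum coefficients of a closed 2-D contour.

    contour : list of (x, y) points, assumed already sampled along the
              silhouette boundary (the sampling scheme is outside this sketch).
    """
    m = len(contour)
    cx = sum(x for x, _ in contour) / m
    cy = sum(y for _, y in contour) / m
    # Treat each centred contour point as a complex number.
    z = [complex(x - cx, y - cy) for x, y in contour]
    powers = []
    for k in range(1, n + 1):          # k = 0 is skipped: it is zero after centring
        coeff = sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m)) / m
        powers.append(abs(coeff) ** 2)  # squared magnitude discards the phase
    return powers
```

Because only squared magnitudes are kept, the descriptor does not change when the contour is rotated in the plane or when a different starting point is chosen, which is one reason power spectra are convenient for matching projections.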
3) Statistical Properties Methods:
a) Moment: Shapes can also be compared based on their statistical properties. The most extensively used statistical property is the moment, such as Hu's image moments [90]. For example, Elad et al. [91] assumed the 3-D model to be a hollow model bounded by its surfaces, and computed 3-D moments of the surfaces as the 3-D shape representation by

$$m_{pqr} = \int_{\partial D} x^p y^q z^r \, dx\,dy\,dz \qquad (3)$$

where $D$ is the 3-D model, $\partial D$ is the surface of $D$, and $m_{pqr}$ is the $(p,q,r)$th 3-D moment. For a 3-D model, the set of moments $m_{pqr}$ is unique, so it constitutes a full and complete description of $D$, and a partial object description can also be obtained by using some subset of these moments [92].

However, the main drawback of these methods is that a unit-scale coordinate frame of the 3-D model has to be acquired prior to the feature computation process. To address this issue, some new statistical feature extraction approaches with no need of pose registration have been proposed. The shape feature based on 3-D Zernike moments [93] is an example. Novotni et al. [63] demonstrated that 3-D Zernike moments are computed as a projection from the function defining the 3-D object onto a set of orthonormal functions within a unit sphere, which have a simple representation but good retrieval performance. They further presented 3-D Zernike invariants as the 3-D shape descriptor. A 3-D model was transformed into a voxel grid representation of specific precision and then scaled into a unit sphere, so the geometrical moments were computed as follows:

$$m_{pqr} = \int_{|x^2+y^2+z^2| \le 1} f(x,y,z)\, x^p y^q z^r \, dx\,dy\,dz. \qquad (4)$$

Finally, the 3-D Zernike invariants were extracted on the basis of those computed geometrical moments. The 3-D Zernike invariants were reported to gain robustness against topological and geometrical deformations.

b) Histogram: There are also many other kinds of statistical property features expressed in the form of different discrete histograms of geometrical statistics [94]. By using histograms, the shape representation is simplified to a probability distribution problem and the model normalization process is avoided. In [95], 3-D histograms were constructed on the crease angles for all edges in a 3-D triangular mesh to match 3-D shapes. Paquet et al. [16] presented histogram features, including color histogram, normal vector histogram, and material histogram to represent 3-D shapes. Paquet et al. also pointed out that a histogram can represent the 3-D data distributions, based on voxels, and is transformation invariant. Ankerst et al. [34] subdivided the space into shells and sectors around the centroid of a 3-D object, making the resulting partitions correspond to the bins of a 3-D shape histogram. In the MPEG-7 standard, there is also a shape histogram descriptor for 3-D mesh models known as the 3-D shape spectrum descriptor (3-DSSD) [5]. For rigid 3-D shapes, Novotni et al. [96] introduced the so-called distance histograms as a basic representation. Their fundamental idea was that if two objects were similar, only a small part of the volume of one of the objects would be outside the boundary of the other one, and the average distance from the boundary would also be small. They first computed the offset hulls of each object based on a 3-D distance field, and then constructed the distance histograms for each object to indicate how much of the volume of one object was inside the offset hull of the other.

A specific set of shape functions can also be defined for histogram computation purposes in order to integrate probability distributions of geometrical properties. For instance, Osada et al. [2] computed different histograms, based on several self-defined shape functions known as shape distributions (including angle, distance, area, and volume, as shown in Fig. 7), which were all calculated from a randomly sampled point set on a 3-D model's surface. Song et al. [97] also adopted a histogram representation based on shape functions to match 3-D shapes by generating histograms using the discrete Gaussian curvature and discrete mean curvature of every vertex of a 3-D triangle mesh.

The introduction of geometrical properties into the histogram makes multiresolution shape representation possible. Ohbuchi et al. [51] proposed a multiresolution shape descriptor, represented in the form of an ordered set of histograms. They first defined a multiresolution representation (MRR) feature, specified as a set of 3-D $\alpha$-shapes [98], which was defined by using a group of $\alpha$-values spaced at power-of-two intervals. An $\alpha$-shape is a generalization of the convex hull of a point set: it shrinks by gradually developing cavities as $\alpha$ decreases, and it is identical to the convex hull when $\alpha = \infty$ [98]. Next, a 2-D histogram was generated for each MRR so that an ordered set of histograms could be produced as the shape descriptor.

To sum up, many statistical shape feature descriptors are simple to compute and useful for keeping invariant properties. In many cases, they are also robust against noise, or the small cracks and holes that exist in a 3-D model. Unfortunately, as an inherent drawback of a histogram representation, they provide only limited discrimination between objects: they neither preserve nor construct spatial information. Thus, they are often not discriminating enough to capture small differences between dissimilar 3-D shapes, and usually fail to distinguish different shapes having the same histogram.

4) Topology Methods: Topology is a relatively high-level representation. It describes the organization and spatial arrangement information: how vertices are connected to compose surfaces with edges. A well-designed graph data structure and graph algorithm can be adopted to represent the topology and skeleton characteristics of 3-D models. Therefore, this type of method usually produces a graph-like structure, rather than numeric
feature descriptors. Examples are as follows. Bardinet et al. [99] presented a structured 3-D shape representation based on 3-D skeleton and medial axes, as an extension of the concept of the 2-D medial axis transform (MAT) [100]. Adequate attributed relational graphs (ARGs), consisting of a set of nodes with attributes and a set of links, were first generated, and the topological features were then extracted from the node and link structures of those graphs. Hilaga et al. [101] represented the topology of a 3-D object as a Reeb graph using a function of geodesic distance [102] between points on the mesh. The Reeb graph is a skeleton representation using a continuous scalar function defined on an object with arbitrary dimensions [103]. An appropriately defined, continuous function is invariant to affine transformations. Hilaga et al. proposed their topological feature as a Multi-resolution Reeb Graph (MRG) (as shown in Fig. 8), using a continuous function based on the distribution of the geodesic distance, which is defined as follows:

$$\mu(v) = \int_{p \in S} g(v, p)\, dS \qquad (5)$$

where $v$ is a point on a surface $S$, and $g(v, p)$ represents the geodesic distance between $v$ and another point $p$ on $S$, which is the length of the shortest path from $v$ to $p$. To produce the scaling invariance, a normalization step is also used, represented as follows:

$$\mu_n(v) = \frac{\mu(v) - \min_{p \in S} \mu(p)}{\max_{p \in S} \mu(p)}. \qquad (6)$$

The MRG feature is invariant to translation and rotation and robust against changes in topology structure caused by a mesh simplification or subdivision. In consequence, it is discriminative on different levels of detail. However, MRG lacks the ability to correctly distinguish the corresponding parts of 3-D models.

Fig. 8. Multiresolutional Reeb graph [101].

A directed graph structure was also adopted to represent the skeleton of a 3-D volumetric model [104], where edges were directed according to a principle similar to that of the shock graph [105]. First, a volumetric cube was thinned into a skeletal graph, a line-like sketch composed of the points on the medial axis of the medial surface planes. Then, a clustering algorithm was implemented on the thinned voxels to increase the robustness against small perturbations on the surface and reduce the number of graph nodes. An undirected acyclic graph was first generated out of the skeletal points by applying the minimum spanning tree (MST) algorithm. After that, the directed graph was finally constructed by directing the edge from a voxel with the higher distance to the one with lower distance. Here, the distance means the minimum distance from a voxel to the boundary of the volumetric object. Fig. 9 gives an example of the skeleton graph structure.

Fig. 9. Skeleton graph [104].

Topology analysis can also be carried out by decomposing a 3-D model as a parametric model of a set of simple elementary regular shapes. The topology is depicted as the spatial relationships and arrangements of those basic shapes, such as generalized cylinders [106], deformable regions [107], shock scaffold [108], and superquadrics [109]. Ma et al. [110] presented a practical approach, using a model based on radial basis functions (RBFs) to extract 3-D skeletons. For a 3-D polygonal object, the vertices were treated as centers for RBF level set construction and a gradient descent algorithm was employed on each vertex to locate the local maxima in the RBF. Finally, all the connected maxima pairs were handled using the Snake method, and the final positions of the Snake sequences were extracted as the skeleton features. Tal et al. [111] first decomposed a mesh into elements called "watersheds" using a Watershed decomposition algorithm [112], then fit and classified them into four kinds of basic shapes: a spherical surface, a cylindrical surface, a cone surface, or a planar surface. Next, the shape signature, an attributed decomposition graph, was constructed.

The topological and skeletal shape features are attractive for 3-D retrieval because they are able to capture the significant shape structures of a 3-D object. Meanwhile, they are relatively high-level and close to human intuitive perception, which makes them useful for defining more natural 3-D query representations. They can also perform part-matching tasks because they contain both local and global structural properties. Some kinds of topological representations are also robust against the LOD structure of 3-D models due to their multiresolution properties. However, 3-D models are not always defined well enough to be easily and naturally decomposed into a canonical set of features or basic shapes. In addition, the decomposition process is usually computationally expensive. Moreover, model decomposition processes are quite sensitive to small perturbations of the model. Thus, extra effort is, in turn, required to handle them [2]. Finally, compared with the comparatively
straightforward indexing and similarity matching algorithms based on numeric feature vectors, the indexing and matching algorithms of graph-like representations are relatively more complex and time-consuming due to the necessary graph searching processes. And, since there is currently no universal general-purpose graph matching solution, different graph matching algorithms need to be designed to accommodate different graph-like representations.

B. Appearance Representations and Extraction

The 3-D models usually possess multimodal feature descriptors. Besides the shape features, the appearance attributes of 3-D models, such as material color, color distribution, and texture, are also an important part of content-based 3-D model retrieval. In particular, color and texture databases are necessary to render 3-D models. In many practical applications, 3-D appearance features, such as smoothness, roughness, and distribution of light, might also be of interest [20], so that 3-D model databases may also need to be searched according to the selected appearance properties. Besides, human visual perception of geometry is indeed influenced by color, and separate colors are often analyzed as distinct entities in the human visual system [16]. However, there are still insufficient research data on the appearance representation and extraction methodologies, compared with the abundant literature on 3-D shape representation and extraction. This is partly due to the diversity and complexity of appearance attributes. For example, the distribution and spatial relationship of colors in 2-D images or videos can be successfully defined and represented, whereas, in 3-D models, this is not the case. Therefore, the issues of appearance representation and measurement in 3-D situations are necessary and require intensive study, particularly how to integrate appearance information into the shape descriptor, how to directly derive feature descriptors from appearance information that can then be combined with traditional shape descriptors to comprehensively characterize 3-D models, and which similarity measurements are suitable for these appearance-contained feature descriptors. Although some shape features also contain partial appearance information such as color and texture, for example, geometry image [87] and histograms [16], they are still too superficial to depict the 3-D appearance attributes properly.

To date, the appearance representations adopted in 3-D model retrieval are mostly related to surface colors or surface textures. Paquet et al. [16] presented a color feature extraction method by separately taking into account the material color and its luminosity: on the one hand, the material color attribute was described using a color histogram of each component of the red, green, blue (RGB) color space; on the other hand, luminosity attributes were represented, employing seven different histograms of diffuse reflection coefficients, specular reflection coefficients, and textures. Suzuki et al. [21] presented a color feature extraction method from a different perspective. An appearance feature representation based on material colors was proposed. This method can retrieve 3-D polygonal models according to colors by reflecting the user's preferences from some material color databases. It is believed that material colors greatly influence the appearance of 3-D models. In a simple rendering model, material colors of a 3-D model can be specified by ambient color, diffuse color, specular color, emissive color, shininess, and transparency. Each material color item contains several light values. Since the shading model is given by equations with these light parameters, switching these values can generate a large number of different colors and change the appearance of objects. Hence, Suzuki et al. proposed a color extraction and matching method to handle the material color databases efficiently, based on the user's subjective evaluation scales. First, users were asked to evaluate and describe material colors for some portions of the database as a study dataset. User inputs were then analyzed and a multidimensional space was created, which reflected the user's personalized evaluations of material colors. To create a complete, personalized search space, a set of nonstudied data was mapped into the multidimensional space. Since the light characteristics of each material color were known, coordinates of each material color could be predicted, using multiple-regression analysis. In that way, each material color could be represented and matched. In addition, Suzuki et al. also evaluated another appearance feature representation using the surface textures of 3-D models, where the higher order local autocorrelation (HLAC) masks were extracted as texture features [113].

V. SIMILARITY MATCH

After the feature extraction process, appropriate similarity measurements should be assigned to measure the content similarity. The ideal goal of similarity measurement is twofold: 1) to make the feature vectors of similar 3-D models as close as possible in the feature space, and 2) to maintain the largest possible distances for dissimilar 3-D models. Hence, the mission of the similarity match is to compute suitable distances or dissimilarities in the multidimensional feature space between the user query and all the 3-D models in the database, and to rank the models in descending order of similarity. A variable number of models are then retrieved by listing the top-ranking items. At present, the available similarity match methods in content-based 3-D model retrieval can be categorized into four classes: 1) distance metrics; 2) graph matching; 3) machine learning; and 4) semantic measures.

A. Distance Metrics

Currently, distance metrics are perhaps the most popular and widely used similarity matching methods, with many types already either proposed or in use for content-based 2-D media retrieval [6], [114]. A distance metric is a dissimilarity measurement having some particular properties, for which there is a comprehensive body of research [52], [115]-[117]. For content-based 3-D model retrieval, the successfully applied distance metrics include Manhattan distance [58], [102], Euclidean distance [3], [63], [118], and Hausdorff distance [66]. The Manhattan and Euclidean measurements are both based on the $L_p$ distance ($p = 1, 2$), that is, the Minkowski distance. The $L_p$ distance between two points in $N$-dimensional space, $x, y \in \mathbb{R}^N$, is defined
as

$$L_p(x, y) = \left[ \sum_{i=1}^{N} |x_i - y_i|^p \right]^{1/p}. \qquad (7)$$
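As a minimal sketch of how Eq. (7) can drive the ranking step of a similarity match, the Python code below computes the $L_p$ distance between two feature vectors and sorts a small database by distance to a query. The absolute difference is used so that the value is a valid distance for odd $p$ as well. The function names `lp_distance` and `rank_models`, and the dictionary layout of the database, are assumptions made for this example only.

```python
def lp_distance(x, y, p=2):
    """Minkowski (L_p) distance between two equal-length feature vectors, p >= 1."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def rank_models(query, database, p=2):
    """Return model names sorted from most to least similar to the query.

    database : dict mapping a (hypothetical) model name to its feature vector.
    """
    return sorted(database, key=lambda name: lp_distance(query, database[name], p))
```

With `p=1` the function gives the Manhattan distance and with `p=2` the Euclidean distance; retrieving the top-ranking items then amounts to slicing the returned list.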
All $L_p$ distances are metrics when $p \ge 1$. The $L_p$ distance itself can also be directly used as a similarity measurement. For example, Osada et al. [2] employed it to implement a similarity match on the probability density function of shape distribution features. In particular, to assign different impacts to different features or to allow relevance feedback, Euclidean distance is often modified to a weighted Euclidean distance with a weight matrix [2], [61], [91]. The Hausdorff distance, another frequently used metric, is defined for comparing two point sets of different sizes as follows:

$$h(A, B) = \max_{a \in A} \min_{b \in B} d(a, b) \qquad (8)$$

where $d(a, b)$ is a distance metric between points, such as Euclidean distance. However, it is very sensitive to noise since even a single outlier can change the distance [116]. Many other distance metrics have also been studied for the 3-D model retrieval task. Ohbuchi et al. [53], [58] introduced an elastic-matching distance in order to compensate for the "larger-than-wanted" effect caused by rigid distance metrics, such as Euclidean distance, and the results were promising. Tangelder et al. [69] used an improved earth mover's distance [119] as the distance measure. Some 2-D distance metrics based on the Haar wavelet have also been proposed [42], [120].

B. Graph Matching Algorithms

When two 3-D objects that need comparing are represented by graph-like structures, specially designed graph matching algorithms are needed in order to perform the similarity match. However, matching two graphs is typically cast as the largest isomorphic subgraph problem, which is NP-hard and thus impractical to solve exactly in general. Hence, the currently available 3-D shape similarity measurements based on graph matching are all customized to the given 3-D topological features and do not provide general solutions. For example, Bardinet et al. [99] achieved this by finding the association matrix M so that an objective function involving all types of nodes, links, and attributes in the graph was minimized. Some heuristic constraints were also used in the objective function to guarantee the correctness of graph matching. Hilaga et al. [101] attached several attributes to each graph node and defined the similarity between two nodes as the similarity between their attributes. Then, the similarity for a given set of node pairs was computed as a whole similarity measure. In [71], [72], and [105], a graph matching algorithm for 2-D shock graphs was given (Fig. 10). A structural signature was defined for each graph node, which characterized the node's underlying subgraph structure, whose components were based on the eigenvalues of the subgraph's adjacency matrix. All the edges in the graph were discarded and the problem was redefined as finding the maximum cardinality, minimum weight matching in a bipartite graph. However, this method cannot be guaranteed to conform to the hierarchical structure of two graphs. To solve this problem, this process was combined with a recursive depth-first search in order to use matching at higher levels to constrain matching at lower levels [104]. The graph matching algorithm usually outputs a number of parameters that can be used to determine the goodness of the similarity match, such as the number of nodes matched, and information about which nodes were matched to other nodes. Further, a coarse-to-fine graph matching strategy can also be easily adopted.

Fig. 10. Graph matching [105].

C. Machine Learning Methods

The basic idea of similarity match based on machine learning is to train a specific learning classifier to compute and rank similarity degrees on a preselected training sample set with a specific scale by using machine learning methods such as artificial neural networks, support vector machines (SVMs), etc. This is particularly appropriate in cases where no suitable distance metric can effectively measure the similarity, for example, for feature vectors whose dimensions are too high. In those cases, some appropriate similarity measurements can be approached by learning hidden correlations and mappings from result-known training samples, which allows for great flexibility in the retrieval process. For example, Ibato et al. [28] utilized an SVM learning algorithm to obtain similarity degrees of pairs of feature vectors. The SVM is a binary classifier that creates and uses a nonlinear hyperplane with maximum margins to the training samples of the given pair of classes [121]. Ibato et al. carried out many experiments by combining the transform-invariant D2 shape feature [2] with the SVM, feeding the feature vector to an SVM to compute the dissimilarity. Pedro et al. [122] adopted another machine learning method, the self-organized map neural network (SOM), to implement similarity match based on spin image shape features. In [123], a weighted similarity function for CAD model classification, based on an underlying shape distribution feature representation and a K-nearest-neighbor learning algorithm (KNN), was proposed. Given a set of CAD solid models and corresponding categories, the KNN learning method was used to extract the related patterns to automatically construct a model classifier and identify new or hidden classifications using the shape distribution feature, learning from the stored, correctly categorized training examples. In addition, probabilistic approaches, such as Bayes' theorem, are also a practical way for similarity matching, in which specific probabilities on features are calculated and the 3-D model having the highest probability will be identified as the closest matching result [124].

Furthermore, machine learning methods can also be used to implement a user relevance feedback mechanism in 3-D model retrieval, iteratively refining the retrieval results step by step, by
making designed reactions to the user's interactive evaluations. This can also achieve a personalized retrieval, based on different users' preferences. A good example is Elad et al.'s work on relevance feedback [91], [125]. They made use of the SVM learning algorithm to derive the optimal weight combination for a weighted Euclidean distance metric, and made stepwise improvements to the similarity match, according to every iteration of the user's interactive evaluation.

D. Semantic Measurements

As the 3-D model retrieval results achieved by low-level features have proven not to be as discriminative as people had expected, this raises another important issue, that is, subjective semantic measurement in similarity comparison. Moreover, whether a retrieved 3-D model is relevant or irrelevant to the query is also judged by the users according to their subjective perception, related to the semantic content [6]. Consequently, it is highly important to develop semantic similarity-matching methods that consider human perception in the content-based 3-D model retrieval process.

Many approaches that have been proposed in 2-D media retrieval to reduce the "semantic gap" manage to perform similarity measurement based on high-level semantics [6], [7]. One method is to learn the connections between a 3-D model and a set of semantic descriptors, or the semantic meanings from those automatically extracted 3-D model features. This approach is usually based on machine learning and statistical classification, which groups 3-D models into semantically meaningful categories using low-level features so that semantically-adaptive searching methods can be applied to different categories. Examples are as follows. Suzuki et al. [66], [126] constructed a multidimensional scaling mechanism so that semantic keyword descriptors used in the query and the shape features computed from the 3-D shapes were strongly correlated, based on a training data set. The multidimensional scaling mechanism can analyze matrices of similar or dissimilar data by representing the rows and the columns as points in a Euclidean space and then measure their similarities using Euclidean distance. They then created a special user preference space according to this principle, in which a function mapping from 3-D model space was constructed to integrate semantic keywords and 3-D shapes as

the retrieval results of the current querying step. An SVM-based semantic clustering and retrieval method was also successfully implemented in the prototypical 3-D engineering shape search system (3-DESS) designed by Purdue University [128].

In addition, some concept hierarchies, such as predefined domain ontology, can also be introduced into the semantic measuring process. There is some work involved in building a fundamental framework for representing and measuring the semantic information of 3-D models, such as the Aim@shape project (http://www.aim-at-shape.net) launched by the European Commission in order to implement semantically capable digital representations of 3-D shapes that are expected to acquire, build, transmit, retrieve and process shapes with their associated knowledge. This project is an attempt to formalize shape knowledge (in particular, metadata, used for knowledge-based shape modeling) and define shape ontologies in specific contexts used for linking semantic keywords to shape features. The shape knowledge representation is built on three basic levels: geometric, structural, and semantic, where, at the semantic level, the association of specific semantics to structured and geometric models is established through automatic annotation of shapes or shape parts according to the concepts formalized by the domain ontology. Furthermore, by introducing a common formalization framework, it is also possible to build a shared semantic conceptualization of a multilayered architecture for shape models.

Another effective method is to perform user relevance feedback after each search iteration in the database [91], [129], [130]. This is effective in narrowing the gap between low-level feature similarity and high-level semantic similarity [91], so that what the user has in mind can be better captured. To some extent, it is also seen as a method of semantic measurement and has been extensively employed in 2-D media retrieval [130], [131]. In the case of 3-D retrieval, Leifman et al. [132] proposed a relevance feedback method combining query refinement and supervised feature extraction at each step, which tried to find an optimal linear transformation that reweights the low-level feature components in order to achieve the maximal separation of the original result set. They found this projection by maximizing a cost function defined as Fisher's Linear Discriminant Criterion [92]. Atmosukarto et al. [133] also presented a subjective similarity measurement relevance feedback process by
a representation of human subjective perception. Zhang et al. combining various distances measured by different feature rep-
[127] introduced the concept of hidden annotation to con- resentations. This was implemented by computing the integer
struct a semantic tree of the whole 3-D model database. They rank rk (Oi |Oj ) of the 3-D object Oi with respect to 3-D object
used an active learning method to compute a list of probabilities Oj using a probability estimation method in the feature space
for each 3-D model, which indicated the models probability of the relevant and irrelevant result sets.
of having a certain semantic attribute. The list of probabilities
was then utilized to calculate the semantic distance between
VI. QUERY REPRESENTATION AND USER INTERFACE
two models, or between the user query and a model in the
database. The overall dissimilarity between two models was A content-based 3-D model retrieval system is expected to
finally determined by combining the weighted sum of the se- allow users to submit their query in a natural and interactive way.
mantic distance with the low-level feature distance. In [28], a Due to the abundance of content descriptions of 3-D models,
novel semantic measurement that could simulate human visual there should be a variety of query specifications to be supported,
perception was also presented. It was achieved by employing a including the follows.
well-trained SVM learning classifier constructed by performing 1) Query-By-Example (QBE): A 3-D model example is di-
SVM learning from the tagged similar and dissimilar models in rectly provided as a query. The example model can be a new one
uploaded by users, one of the result models in a specific retrieval process, or any one in the available databases [3], [16], [118].
2) 2-D Projections: Representing a query with a set of 2-D projection images of a 3-D example model from different viewpoints [3].
3) 2-D Sketches: Using 2-D shapes sketched interactively by users. For instance, Min et al. [3], [134] designed an interactive 2-D sketch online interface, by which a query was represented as n 2-D sketches drawn by the users from several different viewpoints, as shown in Fig. 11.
4) 3-D Sketches: Using a 3-D shape sketched interactively by users. Min et al. [23] also implemented a 3-D sketch online interface based on the 3-D sketch tool Teddy, which was designed by Igarashi et al. [14], [135].
5) Text: Query interface based on text keywords [3] and/or semantic descriptions [127].
6) Multimodal Queries: Combinations of the query representations mentioned earlier.

In general, a query that integrates multiple query specifications is more likely to produce better results than any individual one [14], [136]. Moreover, the user interface of 3-D model retrieval is also responsible for displaying retrieval results to users in a visual and interactive way, so that users can browse them or pursue the next retrieval iteration easily. Some 3-D model retrieval systems have also introduced an interactive user relevance feedback mechanism into their query interface. For example, Elad et al. [91] presented a user relevance feedback interface that gives users a chance to mark a subset of the initial retrieval results as "relevant" or "irrelevant" (using a tick or a cross symbol, as shown in Fig. 12). Further, Zhang et al. [76] extended this kind of feedback interface by adding a way to mark the extent of "relevant" and "irrelevant," providing both qualitative and quantitative adjustments. Similar work can also be found in [132] and [133]. The iterative refinement can automatically narrow the perception gap between the retrieval system and the users, which is expected to enhance the retrieval performance.

Fig. 11. Query by 2-D sketch [134].

Fig. 12. Relevance feedback interface [132].

VII. PERFORMANCE EVALUATION

To compare and evaluate the effectiveness of the search using a 3-D model retrieval algorithm, or how well the system meets the users' requirements, retrieval performance evaluation is essential in content-based 3-D model retrieval.

Since there are many types of specialized 3-D models in different domains, the relevant research work, including versatile shape representations and similarity measures, may also have varied results on the retrieval task. As a result, when considering the performance evaluation issue, the first action is to define a relatively common and general-purpose 3-D model collection as a benchmark database, so that relevance judgments can be provided in a common way.

A. Three-Dimension Model Benchmark Databases

There are currently several representative 3-D model databases for performance evaluation purposes, among which the Princeton Shape Benchmark (PSB) [137] is perhaps the most popular and well-organized. PSB is a publicly available 3-D model benchmark database containing 1814 classified 3-D models, which have been collected from the Internet and organized into hierarchical semantic classifications made by experts. PSB is provided as separate training and test sets, and each 3-D model has a set of annotations. The authors have also issued a bundle of software tools for database analysis and browsing.

In addition to PSB, some other 3-D model databases, which contain a wide variety of 3-D objects and have been independently gathered by different research groups, can also be employed as standard benchmarks. These include the Utrecht databases [69], MPEG-7 databases [29], and Taiwan databases [138]. There are also several benchmark databases on specific domains, for example, CAD models [12] and 3-D protein structures [36]. More detailed statistics for most of the currently available 3-D model databases can be found in [137].

However, since the majority of these 3-D model databases primarily focus on 3-D shapes, there are currently no standard benchmark databases for appearance attributes, such as color, texture, and light distribution. Although PSB can partially perform this function, it is still neither ideal nor optimal.

B. Performance Evaluation Methods

The most common evaluation measures used in 3-D model retrieval are precision and recall, which were introduced by the information retrieval (IR) community and are widely adopted in 2-D media retrieval [139]. They are defined as

$$\text{Precision} = \frac{N_c}{N_r}, \qquad \text{Recall} = \frac{N_c}{N_s} \tag{9}$$

where $N_c$ is the number of retrieved models similar to the query, $N_r$ is the total number of models retrieved for the query, and $N_s$ is the number of similar models in the whole database.

They are also usually presented as a Precision-Recall (P-R) graph [140], which shows how precision falls as more and more
objects are retrieved. A retrieval performance evaluation can be achieved based on the P-R graphs. The closer the precision is to 1, the better the performance. Moreover, the performance can also be evaluated in other aspects in terms of the P-R graph, such as effectiveness and robustness [141]. However, since "relevant" and "irrelevant" are both judged subjectively by users, this evaluation is inherently subjective.

TABLE III
TYPICAL CONTENT-BASED 3-D MODEL RETRIEVAL SYSTEMS
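
As an illustration of the precision and recall measures in (9) and of how a P-R graph is traced, the following minimal sketch computes both values at every cutoff of a ranked result list. The function names, the model identifiers, and the ground-truth set are hypothetical and are not taken from any of the cited systems; the sketch also assumes that each model appears at most once in the ranked list.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, following Eq. (9)."""
    relevant = set(relevant)
    n_c = sum(1 for model in retrieved if model in relevant)  # N_c: retrieved and similar
    n_r = len(retrieved)                                      # N_r: all retrieved models
    n_s = len(relevant)                                       # N_s: all similar models in the database
    precision = n_c / n_r if n_r else 0.0
    recall = n_c / n_s if n_s else 0.0
    return precision, recall


def pr_curve(ranked, relevant):
    """Precision and recall after each cutoff k = 1..len(ranked) of a ranked list."""
    return [precision_recall(ranked[:k], relevant) for k in range(1, len(ranked) + 1)]


# Hypothetical ranked result list and ground-truth set of similar models.
ranked = ["m3", "m7", "m1", "m9", "m4"]
relevant = {"m3", "m1", "m5"}

for k, (p, r) in enumerate(pr_curve(ranked, relevant), start=1):
    print(f"top-{k}: precision={p:.2f} recall={r:.2f}")
```

Plotting the resulting pairs with recall on the horizontal axis gives the P-R graph. Recall can only stay the same or grow as the cutoff increases, whereas precision typically falls once irrelevant models enter the list.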
There are also other performance evaluation methods used for 3-D model retrieval. For example, a similarity matrix measurement was presented as a graphical performance evaluation, by which a matrix with higher contrast indicates that models were usually rated either very similar or very dissimilar according to specifically designed criteria [23]. Many other types of evaluation measures, such as "best matches," "distance image," and "tier image," have also been proposed [137].
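
The idea behind a similarity matrix can be sketched as follows, assuming that each model is described by a feature vector and that Euclidean distance serves as the dissimilarity. This is only an illustration under those assumptions, not the procedure of [23] or [137]; the function names, the toy feature vectors, and the class labels are hypothetical. When the models are listed class by class, a discriminative descriptor yields small distances inside the diagonal blocks and large distances elsewhere, which is what appears as high contrast when the matrix is drawn as an image.

```python
import math


def distance_matrix(features):
    """Pairwise Euclidean distances between feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)] for i in range(n)]


def mean_distance(matrix, labels, same_class):
    """Mean off-diagonal distance over pairs that share (or do not share) a class."""
    values = [
        matrix[i][j]
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j and (labels[i] == labels[j]) == same_class
    ]
    return sum(values) / len(values)


# Hypothetical 2-D feature vectors, listed class by class.
features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
labels = ["chair", "chair", "plane", "plane"]

matrix = distance_matrix(features)
within = mean_distance(matrix, labels, same_class=True)
between = mean_distance(matrix, labels, same_class=False)
print(f"within-class mean distance:  {within:.3f}")
print(f"between-class mean distance: {between:.3f}")
```

The gap between the two means is a crude numerical stand-in for the visual contrast of the matrix; a larger gap suggests that the descriptor separates the classes more clearly.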

VIII. EXAMPLES OF CONTENT-BASED 3-D MODEL RETRIEVAL SYSTEMS

For content-based 3-D model retrieval, a number of prototypes, standalone systems, and Internet-based search engines have already been implemented and publicized for research purposes. Representative examples are listed in Table III.

Moreover, there are also some professional 3-D model retrieval systems. For example, Ankerst et al. [36], [55] developed a content-based retrieval system for 3-D protein databases, while Heriot-Watt University implemented a Web-based search engine, ShapeSifter (URL: http://www.shapesearch.net), and Drexel University (URL: http://edge.mcs.drexel.edu/repository/frameset.html) built a digital library for 3-D CAD models and 3-D engineering designs [15], [40]. Another noticeable trend is 3-D model retrieval services for handheld devices such as mobile phones and personal digital assistants. For example, Suzuki et al. [67] developed a 3-D model retrieval prototype for mobile phone users. Moreover, a 3-D model retrieval system adopting the MPEG-7 mechanism can also be easily tailored for Pocket-PC users [24], [138].

IX. CONCLUSION AND FUTURE RESEARCH


Since its inception, content-based 3-D model retrieval has witnessed considerable development and achievement in both theory and application. Nevertheless, the accuracy of content information in 3-D models, as a consequence of its versatile aspects and subjective cognition, is very much in question. Much work still needs to be undertaken to remedy this situation. The following are just some of the crucial issues and challenges deserving further investigation.
1) Research on a unified 3-D model retrieval framework. The 3-D data formats are very diverse, while the content of a 3-D model remains independent of them and, so far, has been our main focus of attention. A practical unified 3-D retrieval framework is, therefore, needed; one that is capable of accommodating most 3-D data representations adaptively by extracting representation-independent features or performing standard transformations on-the-fly. Moreover, considering the efficiency of transmitting and retrieving 3-D models on the Internet, feature extraction and similarity matching directly from compressed 3-D data are also meaningful.
2) Further development of more discriminative 3-D shape features, especially those that are normalization-free and possess strong discriminative power. These must also be natural and simple for effective index mechanisms.
3) Local partial shape feature extraction, i.e., to implement feature vectors that are suitable for partial matching inside a 3-D model. In practice, partial shape features that can describe the local details are often needed for more precise, multiresolution, and flexible retrieval.
4) Multiple features need to be combined for effective similarity search. Some work has already been undertaken toward this aim [53], [96], [136], [144]. However, when a large number of feature descriptors are used for the query, the system may not be able to respond quickly because more computations are needed. Therefore, feature descriptor selection or reduction techniques must be designed and applied. Consequently, how to select and weight those feature descriptors is also an important and promising future direction.
5) Further development of nonshape descriptors of 3-D models, such as material color and texture. Furthermore, the extraction of high-level semantic features and similarity measurements combined with semantic information will also offer important research issues and challenges.
6) Research on the mechanism of relevance feedback and personalized retrieval integrating user preferences, by which users are able to tune the search criteria by themselves toward more satisfactory search results.
7) Development of simple but powerful query interfaces. The 3-D sketching tool currently used for 3-D shape queries is not user-friendly for the novice. A less complex way for users to build simple 3-D objects and 3-D sketches should be provided, for example, an interface that allows users to form a complicated 3-D object by connecting some basic shapes, just like using building blocks [14]. On the other hand, a more effective query interface that is able to locate objects under a nonrigid-body transformation should also be designed.
8) Retrieval issues targeting 3-D scenes that contain multiple 3-D models. Currently, retrieval methods are mostly limited to the single 3-D model. However, in many applications, such as a virtual reality environment, 3-D models are usually presented in complex 3-D scenes. Therefore, 3-D model retrieval technology should be extended to handle more complex 3-D scenes [26], [145]. A novel hierarchical object structure of 3-D scenes may need to be investigated, to localize and recognize the 3-D objects in a 3-D scene.

ACKNOWLEDGMENT

The authors would like to acknowledge the insights and suggestions provided by Dr. M. Barlow and all anonymous reviewers on this manuscript.

REFERENCES

[1] P. Schroder and W. Sweldens, Digital Geometry Processing. Frontiers of Engineering Series. Washington, DC: Nat. Acad. Eng. Press, 2001, pp. 41–44.
[2] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Matching 3D models with shape distributions," presented at the Int. Conf. Shape Model. Appl., Genova, Italy, 2001.
[3] P. Min, A. Halderman, M. Kazhdan, and A. Funkhouser, "Early experiences with a 3D model search engine," in Proc. Web3D Symp., Saint Malo, France, 2003, pp. 7–18.
[4] T. Funkhouser, M. Kazhdan, P. Shilane, P. Min, W. Kiefer, A. Tal, S. Rusinkiewicz, and D. Dobkin, "Modeling by example," ACM Trans. Graph., vol. 23, no. 3, pp. 652–663, Aug. 2004.
[5] MPEG-7 Visual Part of eXperimentation Model, version 9.0 ed., MPEG Video Group, Pisa, Italy, Jan. 2001.
[6] Y. B. Yang, Research and Applications on the Key Techniques of Content-Based Image Retrieval. Nanjing, China: Dept. Comput. Sci., Nanjing Univ., Jun. 2003 (in Chinese with English abstract).
[7] Y. X. Chen and J. Z. Wang, Eds., Machine Learning and Statistical Modeling Approaches to Image Retrieval. Norwell, MA: Kluwer, 2004.
[8] 3D cafe free 3D models meshes. (2003). [Online]. Available: http://www.3dcafe.com.
[9] Avalon 3D archive. (2003). [Online]. Available: http://avalon.viewpoint.com.
[10] CADLIB Web based CAD parts library. (2003). [Online]. Available: http://www.cadlib.co.uk.
[11] Meshnose, the 3D objects search engine. (2003). [Online]. Available: http://www.deepfx.com/meshnose.
[12] National design repository. (2003). [Online]. Available: http://www.deepfx.com/meshnose.
[13] J. W. Tangelder and R. C. Veltkamp, "A survey of content based 3D shape retrieval methods," in Proc. Shape Model. Int., 2004, pp. 145–156.
[14] T. Funkhouser, P. Min, and M. Kazhdan et al., "A search engine for 3D models," ACM Trans. Graph., vol. 22, pp. 83–105, Jan. 2003.
[15] J. Corney, H. Rea, and D. Clark et al., "Coarse filters for shape matching," IEEE Comput. Graph. Appl., vol. 22, no. 3, pp. 65–74, May/Jun. 2002.
[16] E. Paquet and M. Rioux, "A content-based search engine for VRML databases," in Proc. IEEE Int. Conf. Comput. Vis. and Pattern Recognit., Santa Barbara, CA, USA, 1998, pp. 541–546.
[17] W. Regli and V. Cicirello, "Managing digital libraries for computer-aided design," Comput. Aided Des., vol. 32, pp. 110–132, 2000.
[18] D. McWherter, M. Peabody, W. Regli, and A. Shokoufandeh, "Transformation invariant shape similarity comparison of solid models," presented at the ASME DETC, Pittsburgh, PA, Sep. 2001.
[19] S. Mukai, S. Furukawa, and M. Kuroda, "An algorithm for deciding similarities of 3-d objects," presented at the ACM Symp. Solid Model. Appl., Saarbrucken, Germany, Jun. 2002.
[20] D. V. Vranic and D. Saupe, "3D model retrieval," presented at the Spring Conf. Comput. Graph. (SCCG 2000), Comenius University Press, Budmerice, Slovakia, May 2000.
[21] M. Suzuki, "A web-based retrieval system for 3D polygonal models," in Proc. Joint 9th IFSA World Congr. 20th NAFIPS (IFSA/NAFIPS 2001), Vancouver, BC, Canada, Jul. 2001, pp. 2271–2276.
[22] M. Kazhdan, "Shape representations and algorithms for 3D model retrieval," Ph.D. dissertation, Dept. Comput. Sci., Princeton Univ., Princeton, NJ, Jun. 2004.
[23] P. Min, "A 3D model search engine," Ph.D. dissertation, Dept. Comput. Sci., Princeton Univ., Princeton, NJ, Jan. 2004.
[24] D. Y. Chen, "Three-dimensional model shape description and retrieval based on lightfield," Ph.D. dissertation, Dept. Comput. Sci. Inf. Eng., National Taiwan Univ., Taipei, Taiwan, Jun. 2003.
[25] T. Seidl, "Adaptable similarity search in 3-d spatial database systems," Ph.D. dissertation, Faculty Math. Comput. Sci., Univ. Munich, Munich, Germany, 1997.
[26] A. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 433–449, May 1999.
[27] C. Cyr and B. Kimia, "3D object recognition using shape similarity-based aspect graph," in Proc. 8th IEEE Int. Conf. Comput. Vision, Vancouver, BC, Canada, 2001, pp. 254–261.
[28] M. Ibato, T. Otagiri, and R. Ohbuchi, "Shape-similarity search of three-dimensional models based on subjective measures," IPSJ SIG Notes Graph. CAD, vol. 16, pp. 25–30, 2002.
[29] T. Zaharia and F. Preteux, "3D versus 2D/3D shape descriptors: A comparative study," in Proc. SPIE Conf. Image Process.: Algorithms Syst. III, SPIE Symp. Electron. Imaging, Sci. Technol., San Jose, CA, Jan., vol. 5298, pp. 47–58.
[30] N. Iyer, K. Lou, S. Jayanti, Y. Kalyanaraman, and K. Ramani, "Three dimensional shape searching: State-of-the-art review and future trends," Comput. Aided Des., vol. 37, no. 5, pp. 509–530, 2005.
[31] T. Zaharia and F. Preteux, "3D shape-based retrieval within the MPEG-7 framework," in Proc. SPIE Conf., Jan. 2004, vol. 4304, pp. 133–145.
[32] D. V. Vranic and D. Saupe, "A feature vector approach for retrieval of 3D objects in the context of mpeg-7," in Proc. Int. Conf. Augmented, Virtual Environ. Three-Dimensional Imaging (ICAV3D 2001), Mykonos, Greece, May, pp. 37–40.
[33] H. Kriegel and T. Seidl, "Approximation-based similarity search for 3-d surface segments," Geoinformatica, vol. 2, no. 2, pp. 113–147, 1998.
[34] M. Ankerst, G. Kastenmuller, H. Kriegel, and T. Seidl, "3D shape histograms for similarity search and classification in spatial databases," in Proc. Symp. Large Spatial Databases, 1999, pp. 207–226.
[35] D. Keim, "Efficient geometry-based similarity search of 3D spatial databases," presented at the ACM SIGMOD Int. Conf. Manage. Data, Philadelphia, PA, 1999.
[36] H. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne, "The protein data bank," Nucleic Acids Res., vol. 28, pp. 235–242, 2000.
[37] G. Kastenmuller, H. Kriegel, and T. Seidl, "Similarity search in 3D protein databases," presented at the 6th German Conf. Bioinformatics, Cologne, Germany, 1998.
[38] K. Lou, S. Prabhakar, and K. Ramani, "Content-based three-dimensional engineering shape search," in Proc. 20th Int. Conf. Data Eng. (ICDE 2004), Boston, MA, Mar./Apr., pp. 754–765.
[39] N. Iyer, Y. Kalyanaraman, K. Lou, S. Jayanti, and K. Ramani, "A reconfigurable, intelligent 3D engineering shape search system. Part I: Shape representation," presented at the ASME DETC 2003, Comput. Inform. Eng. (CIE) Conf., Chicago, IL.
[40] D. McWherter, M. Peabody, A. Shokoufandeh, and W. Regli, "Solid model databases: Techniques and empirical results," ASME/ACM Trans., J. Comput. Inf. Sci. Eng., vol. 1, no. 4, pp. 300–310, Dec. 2001.
[41] 3D knowledge: Acquisition, representation and analysis. (2003). [Online]. Available: http://3dk.asu.edu.
[42] W. H. Rynson and W. Ben, "Web-based 3D geometry model retrieval," World Wide Web: Internet Web Inf. Syst., vol. 5, no. 3, pp. 193–206, 2002.
[43] J. Loffler, "Content-based retrieval of 3D models in distributed web databases by visual shape information," in Proc. Int. Conf. Inf. Vis., Jul. 2000, p. 82.
[44] M. Suzuki, "A search engine for polygonal models to support development of 3D e-learning applications," in Proc. 10th Int. World Wide Web Conf., 2001, pp. 182–183.
[45] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the early years," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
[46] D. Vranic and D. Saupe, "Description of 3D-shape using a complex function on the sphere," in Proc. IEEE Int. Conf. Multimedia Expo (ICME 2002), Lausanne, Switzerland, Aug., pp. 177–180.
[47] T. Murali and T. Funkhouser, "Consistent solid and boundary representations from arbitrary polygonal data," in Comput. Graph. (1997 SIGGRAPH Symp. Interact. 3D Graph.), pp. 155–162.
[48] A. Gueziec, G. Taubin, F. Lazarus, and W. Horn, "Converting sets of polygons to manifold surfaces by cutting and stitching," in Proc. IEEE Vis., 1998, pp. 383–390.
[49] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Shape distributions," ACM Trans. Graph., vol. 21, no. 4, pp. 807–832, Oct. 2002.
[50] R. Ohbuchi, M. Nakazawa, and T. Takei, "Retrieving 3D shapes based on their appearance," in Proc. 5th ACM SIGMM, Int. Workshop Multimedia Inf. Retrieval, Berkeley, CA, 2003, pp. 39–45.
[51] R. Ohbuchi and T. Takei, "Shape-similarity comparison of 3D models using alpha shapes," in Proc. 11th Pacific Conf. Comput. Graph. Appl. (PG 2003), Canmore, AB, Canada, Oct., pp. 293–302.
[52] T. Hlavaty, "3D object classification and retrieval," Dept. Comput. Sci. Eng., Univ. West Bohemia, Pilsen, Czech Republic, Mar. 2003, Tech. Rep. DCSE/TR-2003-05.
[53] R. Ohbuchi, T. Minamitani, and T. Takei, "Shape-similarity search of 3D models by using enhanced shape functions," in Proc. Theory Pract. Comput. Graph., Jun. 2003, pp. 97–105.
[54] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Shape matching and anisotropy," ACM Trans. Graph., vol. 23, no. 3, pp. 623–629, Aug. 2004.
[55] M. Ankerst, G. Kastenmuller, H. Kriegel, and T. Seidl, "Nearest neighbor classification in 3D protein databases," in Proc. ISMB, 1999, pp. 34–43.
[56] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.
[57] D. Vranic, D. Saupe, and J. Richter, "Tools for 3D-object retrieval: Karhunen-Loeve transform and spherical harmonics," in Proc. IEEE 2001 Workshop Multimedia Signal Process., Cannes, France, Oct., pp. 293–298.
[58] R. Ohbuchi, T. Otagiri, M. Ibato, and T. Takei, "Shape-similarity search of three-dimensional models using parameterized statistics," in Proc. 10th Pacific Conf. Comput. Graph. Appl., Beijing, China, 2002, pp. 265–275.
[59] E. Paquet, A. Murching, T. Naveen, A. Tabatabai, and M. Roux, "Description of shape information for 2-D and 3-D objects," Signal Process.: Image Commun., vol. 16, pp. 103–122, 2000.
[60] M. Heczko, D. Keim, D. Saupe, and D. Vranic, "A method for similarity search of 3D objects," (in German) in Proc. BTW 2001, Oldenburg, Germany, Mar., pp. 384–401.
[61] M. Kazhdan, B. Chazelle, and D. Dobkin et al., "A reflective symmetry descriptor," in Proc. Eur. Conf. Comput. Vision (ECCV), May 2002, pp. 642–656.
[62] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Rotation invariant spherical harmonic representation of 3D shape descriptors," presented at the Eurograph./ACM SIGGRAPH Symp. Geom. Process., Aachen, Germany.
[63] M. Novotni and R. Klein, "3D Zernike descriptors for content based shape retrieval," presented at the 8th ACM Symp. Solid Model. Appl., Washington, Washington, DC, 2003.
[64] L. Kolonias, D. Tzovaras, S. Malassiotis, and M. Strintzis, "Fast content-based search of VRML models based on shape descriptors," in Proc. IEEE Int. Conf. Image Process., Thessaloniki, Greece, Oct. 2001, vol. 2, pp. 133–136.
[65] M. Yu, I. Atmosukarto, W. K. Leow, Z. Huang, and R. Xu, "3D model retrieval with morphing-based geometric and topological feature maps," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2003, pp. 656–661.
[66] M. Suzuki, T. Kato, and N. Otsu, "A similarity retrieval of 3D polygonal models using rotation invariant shape descriptors," in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC 2000), Nashville, TN, pp. 2946–2952.
[67] M. Suzuki, Y. Yaginuma, and Y. Sugimoto, "A 3D model retrieval system for cellular phones," in Proc. IEEE Int. Conf. Syst., Man, Cybern., Washington, DC, Oct. 2003, pp. 3846–3851.
[68] D. Vranic and D. Saupe, "3D shape descriptor based on 3D Fourier transform," in Proc. EURASIP Conf. Digital Signal Process. Multimedia Commun. Services (ECMCS 2001), K. Fazekas, Ed., Budapest, Hungary, Sep., pp. 271–274.
[69] J. Tangelder and R. Veltkamp, "Polyhedral model retrieval using weighted point sets," Int. J. Image Graph., vol. 3, pp. 1–21, 2003.
[70] V. Cicirello and W. Regli, "Machining feature-based comparisons of mechanical parts," in Proc. Int. Conf. Shape Model. Appl., May 2001, pp. 176–185.
[71] A. Shokoufandeh, S. J. Dickinson, K. Siddiqi, and S. W. Zucker, "Indexing using a spectral encoding of topological structure," in Proc. Comput. Vis. Pattern Recognit., 1999, vol. 2, pp. 491–497.
[72] A. Shokoufandeh, S. Dickinson, C. Jonsso, L. Bretzner, and T. Lindeberg, "On the representation and matching of qualitative shape at multiple scales," in Proc. 7th Eur. Conf. Comput. Vis., Copenhagen, Denmark, May 2002, pp. 759–775.
[73] H. Chen and T. Huang, "A survey of construction and manipulation of octrees," Comput. Vis., Graph., Image Process., vol. 43, pp. 409–431, 1988.
[74] S. Yang and T. Lin, "A new linear octree construction by filling algorithms," in Proc. 10th Annu. Int. Phoenix Conf., 1991, pp. 740–746.
[75] Y. Kitamura and F. Kishino, "A parallel algorithm for octree generation from polyhedral shape representation," Pattern Recognit., vol. 3, pp. 303–309, 1996.
[76] C. Zhang and T. Chen, "Efficient feature extraction for 2d/3d objects in mesh representation," presented at the ICIP 2001, Thessaloniki, Greece.
[77] B. Horn, "Extended Gaussian images," Proc. IEEE, vol. 72, no. 12, pp. 1671–1686, Dec. 1984.
[78] H. Delingette, M. Hebert, and K. Ikeuchi, "A spherical representation for the recognition of curved objects," in Proc. ICCV, 1993, pp. 103–112.
[79] D. Saupe and D. Vranic, "3D model retrieval with spherical harmonics and moments," in Proc. DAGM 2001, B. Radig and S. Florczyk, Eds., Munich, Germany, Sep., pp. 392–397.
[80] C. Zahn and R. Roskies, "Fourier descriptors for plane closed curves," IEEE Trans. Comput., vol. 21, no. 3, pp. 269–281, Mar. 1972.
[81] G. Leifman, S. Katz, A. Tal, and R. Meir, "Signatures of 3D models for retrieval," in Proc. Israel-Korea Bi-Nat. Conf. Geom. Model. Comput. Graph., Feb. 2003, pp. 159–163.
[82] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, "Symmetry descriptors and 3D shape matching," in Symp. Geometry Process., Jul. 2004, pp. 117–126.
[83] D. Zhang and M. Hebert, "Harmonic shape images: A 3D free-form surface representation and its applications in surface matching," in Proc. Energy Minimization Methods Comput. Vis. Pattern Recognit., M. Pellilo and E. Hancock, Eds. New York: Springer-Verlag, 1999, pp. 30–43.
[84] J. Eells and L. H. Sampson, "Harmonic mappings of Riemannian manifolds," Amer. J. Math., vol. 86, pp. 109–160, 1964.
[85] X. Gu, S. Gortler, and H. Hoppe, "Geometry images," in Proc. ACM Siggraph 2002, pp. 355–361.
[86] E. Praun and H. Hoppe, "Spherical parametrization and remeshing," in Proc. Siggraph 2003, pp. 340–349.
[87] H. Laga, H. Takahashi, and M. Nakajima, "Geometry image matching for similarity estimation of 3D shapes," in Proc. Comput. Graph. Int., Crete, Greece, Jun. 2004, pp. 490–496.
[88] D. Vranic, "3D model retrieval," Ph.D. dissertation, Univ. Leipzig, Leipzig, Germany, 2004.
[89] D. S. Zhang and G. Lu, "Shape-based image retrieval using generic Fourier descriptor," Signal Process.: Image Commun., vol. 17, no. 10, pp. 825–848, Nov. 2002.
[90] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Inf. Theory, vol. 8, no. 2, pp. 179–187, 1962.
[91] M. Elad, A. Tal, and S. Ar, "Content based retrieval of VRML objects - An iterative and interactive approach," in Proc. 6th Eurograph. Workshop Multimedia, Manchester, U.K., 2001, pp. 97–108.
[92] R. M. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[93] N. Canterakis, "3-D Zernike moments and Zernike affine invariants for 3D image analysis and recognition," presented at the 11th Scand. Conf. Image Anal., Kangerlussaq, Greenland, 1999, vol. 3.
[94] A. P. Ashbrook, N. A. Thacker, P. I. Rockett, and C. I. Brown, "Robust recognition of scaled shapes using pairwise geometric histograms," in Proc. BMVC, Birmingham, U.K., 1995, pp. 503–512.
[95] P. Besl, "Triangles as a primary representation, object recognition in computer vision," in Lecture Notes in Computer Science, vol. 1929. New York: Springer-Verlag, pp. 1191–1206, 1994.
[96] M. Novotni and R. Klein, "A geometric approach to 3D object comparison," in Proc. Int. Conf. Shape Model. Appl., Genova, Italy, May 2001, pp. 167–175.
[97] J. J. Song and F. Golshani, "Shape-based 3D model retrieval," in Proc. 15th IEEE Int. Conf. Tools Artif. Intell., Nov. 2003, pp. 636–640.
[98] H. Edelsbrunner and E. P. Mucke, "Three-dimensional alpha shapes," ACM Trans. Graph., vol. 13, no. 1, pp. 43–72, 1994.
[110] W. Ma, F. Wu, and M. Ouhyoung, "Skeleton extraction of 3D objects with radial basis functions," in Proc. Shape Model. Int. 2003, Seoul, Korea, May, pp. 207–216.
[111] A. Tal and E. Zuckerberger, "Mesh retrieval by components," presented at the Int. Conf. Comput. Graph. Theory Appl., Setúbal, Portugal, Feb. 2006.
[112] J. C. Serra, Image Analysis and Mathematical Morphology, 1st ed. London, U.K.: Academic, 1982.
[113] M. Suzuki, Y. Yaginuma, and Y. Shimizu, "A texture similarity evaluation method for 3D models," in Proc. Int. Conf. Internet Multimedia Syst. Appl. (IMSA 2005), pp. 185–190.
[114] S. Santini and R. Jain, "Similarity measures," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 9, pp. 871–883, Sep. 1999.
[115] A. Tversky, "Features of similarity," Psychol. Rev., vol. 84, no. 4, pp. 327–352, 1977.
[116] R. C. Veltkamp and M. Hagedoorn, "Shape similarity measures, properties and constructions," in Proc. VISUAL 2000, Lyon, France: Lecture Notes in Computer Science, vol. 1929. New York: Springer-Verlag, Nov., pp. 467–476.
[117] R. C. Veltkamp, "Shape matching: Similarity measures and algorithms," in Proc. Shape Model. Int., May 2001, pp. 188–197.
[118] B. David, "Methods for content-based retrieval of 3D models," presented at the 3rd Annu. CM316 Conf. Multimedia Syst., Southampton, U.K., Jan. 2003.
[119] Y. Rubner, C. Tomasi, and L. J. Guibas, "The earth mover's distance as a metric for image retrieval," Int. J. Comput. Vis., vol. 40, no. 2, pp. 99–121, Nov. 2000.
[120] J. Gain and J. Scott, "Fast polygon mesh querying by example," in Proc. SIGGRAPH Tech. Sketches, Aug. 1999, p. 241.
[121] V. Vapnik, The Nature of Statistical Learning Theory, 2nd ed. New York: Springer-Verlag, 1999.
[122] A. Pedro, D. Alberto, and M. Jose, "Spin images and neural networks for efficient content-based retrieval in 3D object databases," in Proc. CIVR 2002, Lecture Notes in Computer Science, vol. 2383. New York: Springer-Verlag, pp. 225–234.
[123] C. Ip, W. Regli, L. Sieger, and A. Shokoufandeh, "Automated learning of model classifications," in Proc. ACM Symp. Solid Model. Appl. Archive, Washington, DC, Jun. 2003, pp. 322–327.
[124] T. Ansary, J. Vandeborre, S. Mahmoudi, and M. Daoudi, "A Bayesian framework for 3D models retrieval based on characteristic views," in Proc. 2nd Int. Symp. 3D Data Process., Vis. Transmiss. (3DPVT 2004), Sep., pp. 139–146.
[125] M. Elad, A. Tal, and S. Ar, "Directed search in a 3D objects database using svm," HP Laboratories, Haifa, Israel, Tech. Rep. HPL-2000-20R1, 2000.
[126] M. Suzuki, T. Kato, and H. Tsukune, "3D object retrieval based on subjective measures," in Proc. 9th Int. Conf. Database Expert Syst. Appl. Workshop (DEXA), 1998, pp. 850–855.
[99] E. Bardinet, S. Vidal, S. Arroyo, G. Malandain, and N. Capilla, Struc-
tural object matching, presented at the Adv. Concepts Intell. Vision [127] C. Zhang and T. Chen, Active learning for information retrieval: Using
Syst. (ACIVS 2000), Baden-Baden, Allemagne, Aug. 3D models as an example, Carnegie Mellon Univ., Pittsburgh, PA, Tech.
[100] H. Blum, Biological shape and visual science, J. Theoret. Biol., vol. 38, Rep. AMP01-04, 2001.
pp. 205287, 1973. [128] S. Hou, K. Lou, and K. Ramani, SVM-based semantic clustering and
[101] M. Hilaga, Y. Shinagawa, T. Kohmura, and T. Kunii, Topology matching retrieval of a 3D model database, in J. Comput. Aided Des. Appl., Proc.
for fully automatic similarity estimation of 3D shapes, presented at the CAD 2005, Bangkok, Thailand, Jun., vol. 2, pp. 155164.
SIGGRAPH 2001, Los Angeles, CA. [129] Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, Relevance feed-
[102] M. Sharir and A. Schorr, On shortest paths in polyhedral spaces, SIAM back: A power tool in interactive content-based image retrieval, IEEE
J. Comput., vol. 15, no. 1, pp. 193215, 1986. Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 644655,
[103] G. Reeb, On the singular points of a completely integrable PfAFF form Sep. 1998.
or of a numerical function, (in French) Comptes Randus Acad. Sci., [130] Y. Ishikawa, R. Subramanya, and C. Faloutsos, Mindreader: Query
vol. 222, pp. 847849, 1946. databases through multiple examples, presented at the 24th VLDB
[104] H. Sundar, D. Silver, N. Gagvani, and S. Dickinson, Skeleton based Conf., New York, 1998.
shape matching and retrieval, in Proc. Shape Model. Int. 2003, Seoul, [131] Y. Rui, T. S. Huang, and S. F. Chang, Image retrieval: Current tech-
Korea, May, pp. 130139. niques, promising directions, and open issues, J. Vis. Commun. Image
[105] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker, Shock graphs Represent., vol. 10, no. 1, pp. 3962, 1999.
and shape matching, Comput. Vis., pp. 222229, 1998. [132] G. Leifman, R. Meir, and A. Tal, Relevance feedback for 3D shape
[106] T. Binford, Visual perception by computer, in Proc. IEEE Conf. Syst. retrieval, in Proc. IsraelKorea Bi-Nat. Conf. Geom. Model. Comput.
Sci. Cybern., Miami, FL, 1971. Graph., Oct. 2004, pp. 1519.
[107] R. Basri, L. Costa, D. Geiger, and D. Jacobs, Determining the similarity [133] I. Atmosukarto, W. K. Leow, and Z. Huang, Feature combination and
of deformable shapes, Vis. Res., vol. 38, pp. 23652385. relevance feedback for 3D model retrieval, in Proc. 11th Int. Multimedia
[108] F. Leymarie and B. Kimia, The shock scaffold for representing 3D Model. Conf. (MMM 2005), Melbourne, Australia, Jan., pp. 128133.
shape, in Proc. 4th Int. Workshop Visual Form, 2001, pp. 216228. [134] P. Min, J. Chen, and T. Funkhouser, A 2D sketch interface for a 3D
[109] Y. Zhang, A. Koschan, and M. Abidi, Superquadrics based 3D object model search engine, in Proc. SIGGRAPH Tech. Sketches, 2002, p. 138.
representation of automotive parts utilizing part decomposition, in Proc. [135] T. Igarashi, S. Matsuoka, and H. Tanaka, Teddy: A sketching interface
SPIE 6th Int. Conf. Qual. Control Artif. Vis., May 2003, vol. 5132, for 3D freeform design, in Proc. SIG-GRAPH 1999. Los Angeles, CA:
pp. 241251. ACM, pp. 409416.
[136] P. Min, M. Kazhdan, and T. Funkhouser, "A comparison of text and shape matching for retrieval of online 3D models," presented at the Eur. Conf. Digit. Libraries, Bath, U.K., Sep. 2004.
[137] P. Shilane, M. Kazhdan, P. Min, and T. Funkhouser, "The Princeton shape benchmark," in Proc. Shape Model. Int., Genoa, Italy, 2004.
[138] D. Chen, X. Tian, Y. Shen, and M. Ouhyoung, "On visual similarity based 3D model retrieval," in Proc. Comput. Graph. Forum, Sep. 2003, vol. 22, no. 3, pp. 223–232.
[139] K. Jarvelin and J. Kekalainen, "IR evaluation methods for retrieving highly relevant documents," in Proc. 23rd ACM SIGIR Conf. Res. Dev. Inf. Retrieval, 2000, pp. 41–48.
[140] J. Rocchio, "Relevance feedback in information retrieval," in The SMART Retrieval System: Experiments in Automatic Document Processing. Englewood Cliffs, NJ: Prentice-Hall, 1971, pp. 313–323.
[141] B. Bustos, D. Keim, D. Saupe, T. Schreck, and D. Vranic, "An experimental comparison of feature-based 3D retrieval methods," in Int. Symp. 3D Data Process., Vis., Transmiss., Thessaloniki, Greece, Sep. 6–9, 2004, pp. 215–222.
[142] E. Paquet and M. Rioux, "Nefertiti: A query by content software for three-dimensional models databases management," Image Vis. Comput., vol. 17, no. 2, pp. 157–166, 1999.
[143] M. Novotni and R. Klein, "Geometric 3D comparison: An application," presented at the ECDL WS Generalized Documents 2001, Darmstadt, Germany, 2001.
[144] B. Bustos, D. Keim, D. Saupe, T. Schreck, and D. Vranic, "Automatic selection and combination of descriptors for effective 3D similarity search," in Proc. IEEE Int. Workshop Multimedia Content-Based Anal. Retrieval, Dec. 2004, pp. 514–521.
[145] A. Johnson, O. Carmichael, D. Huber, and M. Hebert, "Toward a general 3-D matching engine: Multiple models, complex scenes, and efficient data filtering," in Proc. 1998 Image Understanding Workshop (IUW), Nov. 1998, pp. 1097–1107.

Yubin Yang (M'04) received the B.Sc. degree in computer science from the Wuhan Technical University of Surveying and Mapping, Wuhan, China, in 1997, and the M.Sc. and Ph.D. degrees in computer science from Nanjing University, Nanjing, China, in 2000 and 2003, respectively.
He is currently with the State Key Laboratory for Novel Software Technology, Nanjing University. He is also with the Joint Laboratory for GeoInformation Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong. His current research interests include multimedia information retrieval, particularly for images and 3-D models, spatial data mining and knowledge discovery, medical image processing and analysis, machine learning, and intelligent virtual geographical environment.

Hui Lin (M'00) received the Graduate degree in aerophotogrammetric engineering from the Wuhan Technical University of Surveying and Mapping, Wuhan, China, in 1980, the M.Sc. degree in remote sensing and cartography from the Graduate School of Chinese Academy of Sciences, Beijing, China, in 1983, and the Ph.D. degree in geographical information systems from the State University of New York at Buffalo, Buffalo, in 1992.
He is currently a Professor and the Director of the Joint Laboratory for GeoInformation Science, Chinese University of Hong Kong, Shatin, N.T., Hong Kong. His current research interests include virtual geographic environments, cloud-prone and rainy area remote sensing, spatially integrated humanities, and social science.
Prof. Lin was elected as an Academician of the International Eurasian Academy of Sciences in 1995.

Yao Zhang received the B.Sc. degree in English language and literature in 2004 from the Nanjing University of Technology, Nanjing, China, where she is currently working toward the M.Sc. degree in the Department of Information Management.
Her current research interests include information retrieval, digital libraries, and digital publishing.
