2015 - Car Re-Identification From Large Scale Images Using Semantic Attributes
Qi Zheng, Chao Liang, Wenhua Fang, Da Xiang, Xin Zhao, Chengping Ren, Jun Chen
National Engineering Research Center for Multimedia Software
School of Computer Science, Wuhan University, Wuhan, China
Email: zhengq@whu.edu.cn, cliang@whu.edu.cn
Abstract—Car re-identification, the task of searching for a specific car in a large-scale car image database, is investigated in this paper. Previous work mainly focuses on a fixed pose and overlooks special appearance. However, not matching across other poses leads to coarse car retrieval results, and special attributes such as individual paintings, which are greatly helpful for car retrieval, have not drawn enough attention. This paper addresses these problems through multi-pose matching and re-ranking based on special attributes. Our core idea is a query expansion method that captures weighted attributes to build the retrieval model, which allows us to estimate invisible attributes from the visible ones and to construct complete attribute vectors for car retrieval in any pose. Furthermore, we divide all attributes into two groups, special attributes and common attributes. Special attributes represent abnormal appearance, such as individual paintings or car damage, while common attributes denote the intrinsic appearance of the car. Using special attributes to re-rank results turns out to improve retrieval performance. Finally, experiments demonstrate the effectiveness of our approach on car datasets.

I. INTRODUCTION

[...] multimedia retrieval system with the greatly increasing number of cars. Unfortunately, the traditional technology using license [...]

[...] the utility of example-based retrieval with color correlograms avoid the limitations of strict color classification. Ming-Kuang Tsai et al. [4] used an active shape model (ASM) to fit 3D vehicle models to a 2D image, then obtained those parts to rectify vehicles from disparate views into the same reference view.

These methods mainly search in low-dimensional spaces, so they avoid finding other features in the objects, which leads to coarse car retrieval results. Another influential approach considers sensors as add-ons [5] [6]. Rogerio Feris et al. [5] searched for vehicles in surveillance videos based on attributes (such as color, direction of travel, speed, length, and height). They focus on surveillance settings in which some of the characteristics are obtained by kinetic sensors, which limits their applications. Besides, the last two methods mentioned above [4] [5] are restricted to common-pose car retrieval.

[Figure: a query image and gallery images, each described by binary attribute vectors, with attributes shown for the front-right, right, and back-right poses.]
978-1-4673-7478-1/15/$31.00 © 2015 IEEE
[...] we could not see it. Besides, some special appearance of the car, such as an individual painting, can locate the same car in the gallery images more accurately. Thus we propose the novel method shown in Fig. 1, which tackles the car retrieval problem under multi-pose circumstances. Our work is inspired by query expansion based on term co-occurrence and term similarity, which has been widely investigated in text retrieval with varying degrees of success [7]. We bring the query expansion idea to our system to deal with the multi-pose problem and take advantage of the strong correlations to evaluate the expansive attributes in another pose. Using invisible attributes as expansive attributes to deal with the pose change problem is an important part of our work. We then adopt distance measurements trained on 5 car poses (shown in Fig. 5) to find the optimal result. Based on the visible attributes combined with the expansive attributes for each car pose, we focus on the semantic attribute distance between the query image and each gallery image. In addition, we divide all attributes into two groups: special attributes, including "damage" and "complex-texture", and common attributes, including all others. If any special attributes are detected in the query image, SIFT features are used to re-rank the retrieval result. In our experiments, it is demonstrated that our [...]

[Figure: visible attributes (jeep, simple-texture, N-roof-antenna, N-roof-rack, high-underpan, ...) are used to estimate invisible attributes (back-bumper, spare-wheel).]
Fig. 2. Attribute estimation. We estimate the invisible attributes in order to measure images of other poses. Some invisible attributes, such as spare-wheel in the front-left pose, can be easily estimated from other visible attributes. Thus a query expansion is used to measure the attribute vector in another pose.

II. CAR REPRESENTATION

This section introduces some relevant concepts in car retrieval. In addition, Section 2.2 describes the training of the detector for each attribute.

A. Appearances

The appearance of a car usually changes with pose, which changes the visibility of attributes. We introduce invisible attribute estimation based on car pose to bridge the gaps among different car poses. Meanwhile, the color histogram and the SIFT feature are important for our preprocessing and result re-ranking, respectively.

Binary Semantic Attributes. According to the intrinsic properties of cars, we define the following space of N_a = 27 binary attributes for our study (summarized in Tab. I). Seventeen of those attributes are car components, six are related to the car status, and the remaining attributes are the type of the car. Fig. 3 shows an example of some attributes.

TABLE I. BINARY ATTRIBUTES SPACE WITH 27 ATTRIBUTES.

head-bumper       back-bumper         roof-antenna     container
roof-rack         front-plate         back-plate       low-underpan
middle-underpan   high-underpan       sivler-wheel     spare-wheel
tail-fin          vertical-taillight  four-headlight   horizontal-grille
coldCair-intake   minivan             sportyCar        jeep
ordinary-car      ragtop-open         wheel-visible    new
simple-texture    damage              roof-Cargo

[Figure: a Jeep labeled with the attributes head-bumper, spare-wheel, high-underpan, back-bumper, sivler-wheel, and ragtop-open.]

Car Pose. Based on the statistics of the car datasets, we divide the car pose into 8 directions, illustrated in Fig. 4.

Color Histogram. The main color of a car is not a binary variable, but it hardly changes with the car pose. Thus, as a preprocessing step in our framework, color is used to cluster images of similar color and obtain a subset of the original dataset as a coarse retrieval result. Experiments show that this improves our performance substantially. A standard color histogram is used here because it is robust to occlusion and to lighting and view changes. The pioneering work of Swain [8] has shown the effectiveness of color histograms for distinguishing a large number of objects.

SIFT Feature. The SIFT [9] algorithm has been successfully used to describe local image features. SIFT features are invariant to image scaling, translation, and rotation, and partially invariant to affine distortion, illumination changes, added noise, and partial occlusion. Because of its remarkable performance in feature matching, SIFT has been widely applied to retrieval [10] [11]. In this paper, the detection of any special attribute triggers SIFT matching to re-rank the retrieval result.

B. Attribute Detection

As described in Section 2.1, we classify the car pose into 8 directions. Additionally, we notice that symmetric directions (except the front-back pair) share the same attributes. Thus we decrease the number of directions to 5 by horizontal [...]
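As a rough illustration of the pose grouping described in Section II-B, the sketch below merges mirrored directions so that 8 poses collapse to 5. The direction names are hypothetical (the text only states that there are 8 directions and that symmetric pairs, except front and back, share attributes), and mapping left-facing poses onto right-facing ones is an assumption about how the merge might be done.

```python
# Hypothetical names for the 8 viewing directions; the paper only states
# that there are 8 and that mirrored pairs share the same attributes.
POSES_8 = ["front", "front-right", "right", "back-right",
           "back", "back-left", "left", "front-left"]


def canonical_pose(pose):
    """Map a left-facing direction onto its right-facing mirror image."""
    if pose not in POSES_8:
        raise ValueError("unknown pose: %r" % (pose,))
    # "front" and "back" contain no "left", so they are returned unchanged.
    return pose.replace("left", "right")


# The distinct canonical poses that remain after merging mirrored pairs.
POSES_5 = sorted({canonical_pose(p) for p in POSES_8})
```

Under these assumptions, one attribute detector per canonical pose would be enough, since a front-left image and a front-right image expose the same set of attributes.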
We derived 8 color channels (RGB, HSV and YCbCr) and 20 texture filters (Gabor, Schmid, LBP) from the luminance channel. We use the same parameter choices as [13] for γ, λ, θ and σ² in Gabor filter extraction, and for τ and σ in Schmid extraction. Finally, we use a bin size of 16 to describe each channel, except the LBP luminance channel, which is set to 59 bins.
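The per-channel description above can be sketched as a concatenation of normalized histograms. This is a minimal illustration, not the authors' implementation: it assumes that "a bin size of 16" means 16 bins per channel, that channel values are integers in [0, 255], and the function names `channel_histogram` and `describe` are hypothetical.

```python
def channel_histogram(values, bins=16, max_value=255):
    """Return an L1-normalized histogram of integer values in [0, max_value]."""
    if not values:
        raise ValueError("channel is empty")
    counts = [0] * bins
    width = (max_value + 1) / bins  # number of intensity levels per bin
    for v in values:
        if not 0 <= v <= max_value:
            raise ValueError("value out of range: %r" % (v,))
        counts[int(v // width)] += 1
    total = len(values)
    return [c / total for c in counts]


def describe(channels, bins=16):
    """Concatenate the per-channel histograms into one feature vector."""
    feature = []
    for channel in channels:
        feature.extend(channel_histogram(channel, bins))
    return feature
```

For example, `describe([[0, 1], [255, 255]])` yields a 32-dimensional vector, 16 bins for each of the two toy channels. A channel described with 59 bins, as stated for LBP, would need its own `bins` and `max_value` settings.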
However, some attributes are occluded in certain poses. For example, if a car is facing the camera, an attribute such as back-bumper cannot be seen in the image. But the back-bumper often co-occurs with the front-bumper or with the car type Jeep (illustrated in Fig. 2). This suggests that occluded attributes can be estimated from other visible attributes. Here we use an SVM to train the weight vector w_v for each viewpoint:
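$$
\min_{w_v, b} \theta(w_v) = \max_{\alpha_i \ge 0} \min_{w_v, b} \left( \frac{1}{2} \|w_v\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_{vi} \left( w_v^T a_{vi} + b \right) - 1 \right) \right) \tag{1}
$$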
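To make the estimation step concrete, the sketch below trains a linear max-margin classifier that predicts one occluded attribute from a vector of visible binary attributes. It is only an illustration under stated assumptions: it minimizes a regularized hinge loss by subgradient descent rather than solving the Lagrangian form of Eq. (1), and the attribute names, toy data, and hyperparameters are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit (w, b) by subgradient descent on the regularized hinge loss.

    samples: list of binary attribute vectors (lists of 0/1)
    labels:  +1 if the occluded attribute is present, -1 otherwise
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for a, y in zip(samples, labels):
            margin = y * (sum(wi * ai for wi, ai in zip(w, a)) + b)
            # The regularizer always shrinks w slightly.
            w = [wi - lr * reg * wi for wi in w]
            # The hinge term contributes only when the margin is violated.
            if margin < 1:
                w = [wi + lr * y * ai for wi, ai in zip(w, a)]
                b += lr * y
    return w, b


def predict(w, b, a):
    """Return +1 if the occluded attribute is estimated to be present."""
    score = sum(wi * ai for wi, ai in zip(w, a)) + b
    return 1 if score >= 0 else -1


# Hypothetical visible attributes: [jeep, front-bumper, minivan].
# Hypothetical target: whether the occluded back-bumper is present.
samples = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 1, -1, -1]
w, b = train_linear_svm(samples, labels)
```

On this toy data, `predict(w, b, [1, 0, 0])` estimates that a car detected as a Jeep has a back-bumper even though it is not visible. In practice one such classifier would be trained per occluded attribute and per viewpoint, matching the per-viewpoint weight vector in the text.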
REFERENCES
[1] Z. Sun and E. Technol, "On-road vehicle detection: a review," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 694–711, 2006.