
IEEE Sponsored 2nd International Conference on Innovations in Information, Embedded and Communication systems (ICIIECS) 2015

Survey of Scene Classification Approaches


R. Raja#, S. Md. Mansoor Roomi*, A. Lavanya*, S. Ishwarya*
*Department of Electronics and Communication
Thiagarajar College of Engineering, Madurai.
#Pandian Saraswathi Yadav Engineering College, Sivagangai.

Abstract—With the exponential growth in storage of digital images, searching an image takes a considerable amount of time. In such a scenario, the search time can be minimized by grouping (classifying) the similar images in the database. Many approaches have been proposed to categorize scenes into appropriate classes. This study discusses the evolution of computer-based scene classification approaches over the last decade, and reviews the successes and failures of the solutions proposed to the problems in scene classification. We have also analyzed the performance of the existing approaches in terms of classification accuracy.

Keywords—digital images; search time; grouping; scene classification; performance

I. INTRODUCTION
Due to advancements in digital imaging and digital photography, image databases have grown exponentially. Searching and retrieving an image in such a large collection takes an enormous amount of time. Scene classification has gained much attention from researchers as a means of reducing this search time. In addition, scene classification plays a key role in content-based image retrieval [1,2], automatic creation of photo albums, robot navigation [3] and tourist information access.
The hierarchy of the scene categories is shown in fig. 1. Generally, images can be categorized [4] into photographic images and synthetic images. Examples of synthetic images are drawings, computer game screenshots, window application screenshots, cartoons, logos, charts and maps. Photographic images can be further classified as indoor scenery and outdoor scenery. Indoor sceneries are categorized [5] into five major classes, viz. Home, Leisure places, Stores, Working places and Public places. Examples of Home scenes are living room, bedroom, kitchen, bathroom, garage, staircase, children's room, lobby and dining room. Buffet, restaurant, fast food, concert hall, game room, gym, hair salon, bar and movie theater are cases of Leisure places. Working places consist of hospital, classroom, art studio, laboratory, college, office, school, operating room, TV studio and computer room. Examples of Stores are bakery, grocery store, jewellery shop, flower shop, shoe shop, mall and video store. Church, inside bus, inside airport, waiting room, subway, prison cell, elevator and train station are examples of Public places. Outdoor scenery [6] can be further divided into natural scenery and artificial scenery. Natural sceneries can be further classified into sky/clouds, coast, river, mountain, forest and plains (desert). If a natural scene contains any manmade objects, it is said to be artificial scenery. City outside (street), market and bus station are cases of artificial scenes. Images can be a mixture of both indoor and outdoor scenes; in such a case, the images are classified as hybrid scenes. An example of a hybrid scene is a corridor image.

Fig. 1. Hierarchy of Scene categories

New needs have emerged regarding the development of advanced and user-friendly systems for the efficient manipulation of image content [7] to categorize scenes. To tackle these challenges, approaches that shift to image processing have been proposed [8].
The main objectives of this paper are to

 Study the scene classification approaches proposed over the last decade
 Describe the issues in categorizing scenes into appropriate classes
 Analyze the performance of existing scene classification approaches
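For illustration only, the hierarchy of fig. 1 can be encoded as a simple parent map; the category names come from the figure, while the `PARENT` table and `path_to_root` helper are our own illustrative constructs, not part of any surveyed system.

```python
# Illustrative sketch: category names follow fig. 1; the data
# structure and helper function are hypothetical choices of ours.
PARENT = {
    "Photographic": "Image",
    "Synthetic": "Image",
    "Indoor": "Photographic",
    "Outdoor": "Photographic",
    "Natural": "Outdoor",
    "Artificial": "Outdoor",
    "Home": "Indoor",
    "Leisure places": "Indoor",
    "Stores": "Indoor",
    "Working places": "Indoor",
    "Public places": "Indoor",
}

def path_to_root(category):
    """Walk upward from a category to the root of the hierarchy."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path
```

Such a map makes explicit that, e.g., a "Natural" scene is a kind of "Outdoor" photographic image, which is the structure a hierarchical classifier would exploit.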

The rest of this paper is organized as follows: Section II explains the classical approaches of scene classification; the performance analysis is reported in Section III. Finally, Section IV concludes the study of scene classification approaches.

II. CLASSICAL APPROACHES
This section presents standard approaches of scene classification and the issues regarding the representation of image content.

Fig. 2. Categories of Scene Classification approaches: Low level features based approaches (Color, Texture and Shape features) and Semantic modeling based approaches (Bag-of-Features model, Spatial Pyramid Matching model and Deformable Part based Model).

As proposed in [9], scene classification methods can be roughly divided into two broad families: low-level and semantic modeling based approaches. The three types of low level features are color, texture and shape. Semantic modeling based approaches can be further classified into the Bag-of-Features (BoF) model, the Spatial Pyramid Matching (SPM) model and the Deformable Part based Model (DPM), as depicted in fig. 2.
The major issues [10] in scene classification are

 Intra-class variability
 Illumination changes
 Viewpoint variations such as scaling and rotation
 Semantic gap between human perception and machine perception

A. Low level features based approaches
Visual content of images can be extracted using low level features such as color, texture and shape.
a) Color features: Color features are the most widely used descriptors in content-based image retrieval systems. There is a wide range of color features that can be used in scene understanding [11,12]. To provide reproducible representations of color, several color spaces have been introduced, such as Red Green Blue (RGB), Hue Saturation Value (HSV), Hue Min Max Difference (HMMD) and CIE-LAB.
The Color Layout Descriptor (CLD) [12] is used to describe the spatial distribution of colors from the RGB color space for content filtering and visualization. Perception of color is not represented well in the RGB color space, which leads to the wider use of HSV. The Scalable Color Descriptor (SCD) [12] is the most common way of describing color features, providing the distribution of color over the image using color histograms. It uses the HSV color space uniformly quantized to 255 bins. The histogram bin values are non-uniformly quantized, ranging from 16 bits/histogram for a rough representation of the color distribution up to 1000 bits/histogram for high-quality applications. The Dominant Color Descriptor (DCD) [12] provides a more compact representation, but at the expense of poorer performance in high-quality applications. It consists of the dominant colors in each region, their percentages and color variance. The Color Structure Descriptor (CSD) [12] is used to describe local features in an image by sliding a window across the image and counting the occurrence of each color in these regions, thereby constructing the color histogram. The Group-of-Frames/Group-of-Pictures (GoF/GoP) Color Descriptor [12] describes color features over a number of frames in a video using the SCD, which makes it useful for video retrieval applications. Opponent color spaces [12], proposed by van de Sande et al. [13], exhibit lesser dependency on the actual intensity values. The components of the opponent color model can be calculated as

    O1 = (R − G) / √2            (1)
    O2 = (R + G − 2B) / √6       (2)

b) Texture features: Texture aids in identifying objects of interest or regions of interest irrespective of the source of the image. The Homogeneous Texture Descriptor (HTD) [12] is extracted by filtering the image using a bank of Gabor filters with five scales and six orientations. The first and second moments of energy in the frequency bands are used as the components of the descriptor. The Texture Browsing Descriptor (TBD) [12] is similar to a human characterization, which categorizes texture in terms of regularity and directionality, but it is not appropriate for similarity ranking. Thus, the TBD is used to find a set of candidates, and the homogeneous texture descriptor is used to obtain a similarity ranking. A bank of 24 Gabor filters of 6 orientations and 4 scales is used to filter the image, and finally a 12-bit descriptor is computed. The Gray Level Co-Occurrence Matrix [14] describes spatial relationships among grey levels in an image. Energy, entropy, homogeneity and the information measure of correlation are used to characterize textured regions.
c) Shape features: In most scene classification applications, the difference between classes is based on the objects present in the scenes. Shape descriptors [11] play a vital role in the identification of objects in a particular scene. 2-D shape descriptors can be either region-based or contour-based. Numerous 3-D shape descriptors can also be used depending on the application.
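The opponent transform of Eqs. (1)–(2) can be sketched in a few lines of NumPy; the function name and array layout are our own choices, while the 1/√2 and 1/√6 normalizations follow the standard opponent color definition.

```python
import numpy as np

def opponent_channels(rgb):
    """Compute the O1/O2 opponent color channels of Eqs. (1)-(2)
    from an H x W x 3 RGB array. Both channels are zero for
    achromatic (gray) pixels, illustrating the reduced dependence
    on overall intensity."""
    rgb = np.asarray(rgb, dtype=np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2.0)
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)
    return O1, O2
```

For a gray pixel (R = G = B) both channels vanish, which is why features built on these channels are less sensitive to intensity changes than raw RGB.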

The 3-D Shape Descriptor [15]—Shape Spectrum—is used to measure the local convexity of an image region by computing the histogram of shape indexes with 100 bins. The Region-Based Descriptor ART (Angular Radial Transformation) extracts shape features from the entire region, invariant to transformations; this descriptor is more immune to segmentation noise. Contour-Based Descriptors are computed by applying curvature scale space (CSS) transforms on the image, and matching is done by comparing the CSS index at the most prominent peaks of the resultant image. The 2-D/3-D Shape Descriptor characterizes the shape of a 3-D object by multiple 2-D images with different view angles and matches them to calculate similarity. The Scale-Invariant Feature Transform (SIFT) is a computer vision algorithm to detect and describe shape features in images. The SIFT [16] feature descriptor is invariant to uniform scaling and orientation, and partially invariant to affine distortion and illumination changes. The SIFT descriptor finds the important key points of an object by computing the maxima and minima of a difference-of-Gaussians function applied in scale space to a series of smoothed and resampled images; these key points are then matched by computing the Euclidean distances between them. The Histogram of Oriented Gradients (HOG) [17] is a standard image feature used in object detection and deformable object detection. It decomposes the image into square cells of a given size, computes a histogram of oriented gradients in each cell, and then renormalizes the cells by looking into adjacent blocks.
Li Zhou et al. [18] proposed an approach based on a low level feature combination (texture and shape). This method represents sub-regions of the image more effectively using a new visual descriptor called GBPWHGO. The GBPWHGO descriptor is composed of a GBP (Gradient Binary Pattern) and a WHGO (Weighted Histogram of Gradient Orientation) to capture structural and textural properties of images effectively. Unlike LBP, the GBP operator describes each pixel by the relative gradient in different directions around the pixel, similar to the gradient information extracted by SIFT, HOG, etc. These descriptors describe regions of interest using the distribution of gradients, accumulating the gradient magnitude of each pixel into orientation bins. However, they do not use the frequency of occurrence of gradient orientation, which is actually very useful in scene discrimination. S.M.M. Roomi et al. [20, 21] presented methods for classifying images into major categories such as Indoor/Outdoor scenes using color, texture and shape information. Kelly et al. [22] proposed an approach for scene classification using neighborhood and spatial information. In this approach, the features are extracted from the images using the Contextual Mean Census Transform (CMCT). The information from distant neighbors augments the contextual information of a 3x3 window of pixels and helps in differentiating similar windows placed in different regions. In this way, the local information is extracted from the regions by CMCT. Multi-channel feature generation [23] was recently proposed to categorize scenes; in this approach, a hyper-opponent color space is used to extract the features using the mCENTRIST technique. All these existing methods struggle to group images when the same scene is taken from different viewpoints. To overcome issues such as illumination changes and viewpoint variations, Roomi et al. [10] proposed a robust scene classification algorithm. In this method, to achieve rotation- and scale-invariance, a Gabor filter is applied on the image, after which low level features are extracted from the filtered image. Principal Component Analysis (PCA) is used to reduce the dimensionality of the features.

B. Semantic Modeling based Approaches
Although the low level feature based approaches offer simplicity and low computational cost, they often exhibit poor performance when the number of scene categories is large. To solve this problem, semantic modeling based methods evolved. These approaches use a semantic intermediate representation to model the content of images and fill the gap between low-level and high-level features.
a) Bag-of-Features model: Because of their excellent performance, semantic modeling based methods are widely used for scene classification, especially the Bag-of-Features (BoF) model. The BoF model samples an image efficiently with various local interest point detectors [24] or dense regions [25], and describes it with local descriptors [26]. The image is then represented as a histogram of code word occurrences by mapping the local features to a codebook, which is typically generated with a clustering method. Zhong Ji et al. [27] presented a novel Object-Enhanced (OE) mechanism to strengthen the classical BoF based approaches. In this method, a bottom-up saliency detection method is first adopted to detect and segment the object regions of interest. Next, local features are extracted from the images. These enhanced features in object regions, together with features in background regions, are reunited against the pre-generated visual vocabulary to form the OE-BoF features. A Dictionary Learning Approach [28] is proposed to separate the features which are specific to particular classes from the features commonly present across various classes in a dictionary built by usual methods like sparse coding. K-means clustering based codebook generation yields a poor codebook because the features are not uniformly distributed. To overcome this hindrance, a simple fixed-radius clustering [29] based on mean shift is used for codebook generation.
There are two drawbacks of the traditional codebook model:

 code-word uncertainty
 code-word plausibility

To circumvent these problems, kernel codebooks [30] were proposed by van Gemert et al. An uncertainty modeling method is used for the codebook generation. In addition, kernel density estimation is applied to allow a degree of ambiguity in assigning code words to image features. By using kernel density estimation, the uncertainty between code words and image features is lifted beyond the vocabulary and becomes part of the codebook model. Christoph et al. [31] proposed an approach to recognize scenes using Bags of Space-time Energies (BoSE). This approach is built on primitive features that uniformly capture the spatial and temporal orientation structure of the scenes.
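The soft-assignment idea behind kernel codebooks can be sketched as follows: instead of incrementing a single histogram bin per descriptor (hard assignment), each descriptor distributes unit mass over all code words with a Gaussian kernel. This is a simplified sketch in the spirit of [30]; the function name and single-bandwidth kernel are our own simplifications, not the authors' exact formulation.

```python
import numpy as np

def soft_assign_histogram(descriptors, codebook, sigma=1.0):
    """Kernel-codebook style soft assignment (simplified sketch):
    each local descriptor contributes to every code word with a
    Gaussian weight, so ambiguous features are no longer forced
    into a single bin as in hard assignment."""
    # squared Euclidean distance from every descriptor to every code word
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # each descriptor spreads unit mass
    hist = w.sum(axis=0)                # accumulate over all descriptors
    return hist / hist.sum()            # normalized image representation
```

With a very small `sigma` this reduces to the classical hard-assigned BoF histogram; larger values model the code-word uncertainty discussed above.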

b) Spatial Pyramid Matching model: The Spatial Pyramid Matching (SPM) model [32] is an extension of the BoF model. This technique works by partitioning the image into increasingly fine sub-regions at multiple scales and computing histograms of local features found inside each sub-region. Pyramid matching works by placing a sequence of increasingly coarser grids over the feature space and taking a weighted sum of the number of matches that occur at each level of resolution. At any fixed resolution, two points are said to match if they fall into the same cell of the grid; matches found at finer resolutions are weighted more highly than matches found at coarser resolutions. However, SPM performs pyramid matching in the two-dimensional image space and uses traditional clustering techniques in feature space. Roland et al. [33] proposed an approach based on spatial information encoding, kernel design, and data embeddings compatible with image representation on a probability simplex. To achieve better classification performance for SPM, the spatial pyramid match kernel (SPMK) is used. SPMK replaces the l2 norm of the RBF by the histogram intersection (HI) metric. The appearance representation was based on SIFT descriptors computed on an evenly-spaced 4x4 pixel grid. 128-component Gaussian mixtures of diagonal covariance were used to model theme distributions, with the mixture parameters estimated by the EM algorithm. Randomized spatial pyramids [34] are used to extract finer spatial information than SPM techniques: the image is randomly partitioned multiple times and a pool of independent partition patterns is obtained. Different from SPM, where each local feature is hard-quantized into a unique sub-region at each level, the RSP-based method provides many partition patterns at the same level, such that each local feature can be soft-quantized into multiple sub-regions. This makes the image representation more robust, as it is less sensitive to spatial quantization error.
c) Deformable Part based Model: An image representation based on objects can be very useful in high-level visual recognition tasks for scenes cluttered with objects. For scene classification, the Deformable Part based Model (DPM) [35] can be used to capture the recurring visual elements and salient objects using standard global image features. A DPM represents an object by a lower-resolution root filter and a set of higher-resolution part filters arranged in a flexible spatial configuration. Congcong Li [36] proposed a new image representation based on DPM to model a full spectrum of arbitrarily higher-order object interactions for deeper scene understanding. This approach discovers groups of objects by exploiting object-level annotations in images. The groups of objects tend to be semantically meaningful and are not dependent on the appearance modeling choices. Inspired by DPM, Object Bank [37] was proposed by Li-Jia Li et al. In this technique, image contents are represented based on objects or, more rigorously, a collection of object sensing filters built on a generic collection of labeled objects. A regularized logistic regression method is used for describing structured sparsity as well as to explore both feature and object sparsity. This object bank is used as a high-level image representation to discover semantically meaningful descriptions of the scene classes. Yingbin [38] presented a new image representation based on responses extracted from object part filters. Since different objects may contain similar parts, the method uses a semantic hierarchy to automatically determine and merge filters shared by multiple objects to form hybrid part filters. The proposed Hybrid-Parts are generated by pooling the response maps of the hybrid filters. In contrast to other scene recognition approaches, this method adopts object-level detections as feature inputs, which enables a richer and finer-grained representation.

III. PERFORMANCE ANALYSIS
This section reports the performance analysis of various scene classification approaches. Analyses on the 15 scene category and the MIT 67 indoor scene category datasets are discussed in terms of performance metrics. The 15 scene category dataset consists of 15 categories of indoor scenes like bedroom, kitchen, living room, store, etc. and outdoor scenes like coast, forest, tall building, street, etc. At first, A. Oliva et al. [39] reported that their approach achieves 73.28%. Later on, many approaches were proposed to classify the scenes of the 15 scene category dataset; Yuning Jiang et al. [34] achieved 88.1% with their randomized spatial partition method.

TABLE I. CLASSIFICATION ACCURACY OF 15 SCENE CATEGORIES

    Methods                        Classification Accuracy (%)
    A. Oliva et al. [39]           73.28
    Wu et al. [40]                 73.29
    K. Gazolli et al. [41]         76.87
    Li et al. [37]                 80.9
    S. Lazebnik et al. [32]        81.40
    Marco Fornoni et al. [42]      82.42
    Bosch et al. [43]              83.7
    J. Wu et al. [31]              84.1
    Yingbin Zheng et al. [38]      86.3
    Yuning Jiang et al. [34]       88.1

The MIT 67 indoor scene category is a challenging dataset because it contains a larger number of classes as well as more cluttered images. Initially, Dalal et al. [17] reported 22.8% accuracy, which is not preferable for indoor scene classification. Hence, several researchers strived to improve the performance of the system. This led to a subsequent increase in overall accuracy owing to the usage of high level semantic approaches like the BoF, SPM and DPM methods. Ultimately, Yingbin Zheng et al. [38] achieved a high accuracy of 47.2% using object responses derived from hybrid part filters.
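The accuracy figures reported in these benchmarks are typically the mean of the per-class recognition rates. A small sketch of that metric (our own illustration, not taken from any single surveyed paper):

```python
import numpy as np

def mean_class_accuracy(y_true, y_pred, n_classes):
    """Average of the per-class recognition rates, the metric
    usually reported for the 15-scene and MIT-67 benchmarks
    (illustrative sketch, not tied to a specific paper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    rates = []
    for c in range(n_classes):
        mask = y_true == c            # samples whose true label is c
        if mask.any():
            rates.append(np.mean(y_pred[mask] == c))
    return 100.0 * float(np.mean(rates))
```

Averaging per class (rather than over all samples) prevents large classes from dominating the reported number, which matters on unbalanced datasets such as MIT 67.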

TABLE II. CLASSIFICATION ACCURACY OF 67 INDOOR SCENE CATEGORIES

    Methods                        Classification Accuracy (%)
    N. Dalal et al. [17]           22.8
    J. Wu et al. [40]              22.46
    A. Oliva et al. [39]           23.2
    A. Quattoni et al. [5]         25
    K. Gazolli et al. [41]         25.82
    Marco Fornoni et al. [42]      39.6
    Li Zhou et al. [18]            42.9
    M. Pandey et al. [35]          43.1
    Yingbin Zheng et al. [38]      47.2

IV. CONCLUSION
The proper classification of scene categories is highly essential to reduce time complexity in retrieval applications. Several paradigms have been proposed in the literature to accomplish this task effectively. In this article, a gist of various approaches to scene classification is presented. The possible scene categories and the challenging issues involved in their classification are also discussed. This paper can assist emerging researchers in this field with the analysis of the existing techniques. Though several methodologies are available, there is still a need for further improvement in scene recognition systems.

REFERENCES

[1] A. Vailaya, A. Jain, M. Figueiredo, H. Zhang, "Content-based hierarchical classification of vacation images", Proceedings of the IEEE International Conference on Multimedia Computing and Systems, vol. 2, IEEE Computer Society, pp. 518–523, 1999.
[2] E. Chang, K. Goh, G. Sychay, G. Wu, "CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines", IEEE Trans. Circuits Syst. Video Technol. 13 (1), pp. 26–38, 2003.
[3] C. Siagian, L. Itti, "Gist: a mobile robotics application of context-based vision in outdoor environment", Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) – Workshops, vol. 3, IEEE Computer Society, pp. 1063–1069, 2005.
[4] A. Vailaya, A. Jain, H. Zhang, "Video clustering", Tech. Rep. MSU-CPS-96-64, Michigan State University, 1996.
[5] A. Quattoni, A. Torralba, "Recognizing indoor scenes", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 413–420, 2009.
[6] S.M.M. Roomi, R. Raja, D. Dharmalakshmi, "Classification and retrieval of natural scenes", Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), DOI: 10.1109/ICCCNT.2013.6726534, pp. 1–8, 2013.
[7] R. Datta, D. Joshi, J. Li, J. Wang, "Image retrieval: ideas, influences, and trends of the new age", ACM Computing Surveys (CSUR) 40 (2), pp. 1–60, 2008.
[8] A. Hanjalic, R. Lienhart, W. Ma, J. Smith, "The holy grail of multimedia information retrieval: so close or yet so far away", Proceedings of the IEEE 9 (4), pp. 541–547, 2008.
[9] A. Bosch, X. Munz, R. Marti, "A review: which is the best way to organize/classify images by content", Image and Vision Computing 25 (6), pp. 778–791, 2007.
[10] S.M.M. Roomi, R. Raja, D. Dharmalakshmi, "Robust Indoor/Outdoor scene classification", ICAPR, 2015.
[11] A. Yamada, M. Pickering, S. Jeannin, L. Cieplinski, "MPEG-7 Visual Part of Experimentation Model Version 8.0", ISO/IEC JTC1/SC29/WG11/N3673, 2000.
[12] B. S. Manjunath, J.-R. Ohm, V. V. Vasudevan, A. Yamada, "MPEG-7 color and texture descriptors", IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 703–715, June 2001.
[13] Julia Vogel, Bernt Schiele, "Semantic Modeling of Natural Scenes for Content-Based Image Retrieval", Int. J. Comput. Vision 72, pp. 133–157, July 2006. DOI: 10.1007/s11263-006-8614-1.
[14] R.M. Haralick, K. Shanmugan, I. Dinstein, "Textural Features for Image Classification", IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-3, pp. 610–621, 1973.
[15] M. Bober, "MPEG-7 Visual Shape Descriptors", IEEE Trans. Circuits Syst. Video Technol., vol. 11, June 2001.
[16] D. Lowe, "Distinctive image features from scale-invariant keypoints", Int. J. Comput. Vis. 60 (2), pp. 91–110, 2004.
[17] N. Dalal, B. Triggs, "Histograms of oriented gradients for human detection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893, 2005.
[18] Li Zhou, Zongtan Zhou, Dewen Hu, "Scene classification using multiresolution low-level feature combination", Neurocomputing 122, pp. 284–297, 2013.
[19] S.M.M. Roomi, R. Raja, D. Dharmalakshmi, "Classification and Retrieval of Natural scenes", IEEE International Conference on ICCCNT 2013, pp. 1–8, July 2013.
[20] S.M.M. Roomi, R. Raja, D. Dharmalakshmi, S. Rohini, "Classification of indoor/outdoor scenes", IEEE International Conference on ICCIC 2013, pp. 1–4, Dec. 2013.
[21] R. Raja, S. Md. Mansoor Roomi, D. Dharmalakshmi, "Classification of scenes into Indoor/Outdoor", Research Journal of Applied Sciences, Engineering and Technology 8 (21), pp. 2172–2178, 2014.
[22] Kelly, Evandro, "Exploring neighborhood and spatial information for improving scene classification", Pattern Recognition Letters, vol. 46, pp. 83–88, 2014.
[23] Yang Xiao, Jianxin Wu, Junsong Yuan, "mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene Categorization", IEEE Transactions on Image Processing, vol. 23, no. 2, 2014.
[24] K. Mikolajczyk, C. Schmid, "Scale and affine invariant interest point detectors", International Journal of Computer Vision 60 (1), pp. 63–86, 2004.
[25] J. Qin, N.H.C. Yung, "Scene categorization via contextual visual words", Pattern Recognition, vol. 43, pp. 1874–1888, 2010.
[26] K. Mikolajczyk, C. Schmid, "A performance evaluation of local descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (10), pp. 1615–1630, 2010.
[27] Zhong Ji, Jing Wang, Yuting Su, Zhanjie Song, Shikai Xing, "Balance between object and background: Object-enhanced features for scene image classification", Neurocomputing, 2013.
[28] Shu Kong, Donghui Wang, "A Dictionary Learning Approach for Classification: Separating the Particularity and the Commonality", European Conference on Computer Vision (ECCV), 2012.
[29] Frederic Jurie, Bill Triggs, "Creating Efficient Codebooks for Visual Recognition", ICCV, 2005.
[30] Jan C. van Gemert, Jan-Mark Geusebroek, Cor J. Veenman, Arnold W.M. Smeulders, "Kernel Codebooks for Scene Categorization", ECCV, 2008.

[31] Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes, "Bags of Spacetime Energies for Dynamic Scene Recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[32] S. Lazebnik, C. Schmid, J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories", Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2169–2178, 2006.
[33] Roland Kwitt, Nuno Vasconcelos, Nikhil Rasiwasia, "Scene Recognition on the Semantic Manifold", ECCV, 2012.
[34] Yuning Jiang, Junsong Yuan, Gang Yu, "Randomized Spatial Partition for Scene Recognition", ECCV, 2012.
[35] M. Pandey, S. Lazebnik, "Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models", International Conference on Computer Vision (ICCV), 2011.
[36] Congcong Li, Devi Parikh, Tsuhan Chen, "Automatic Discovery of Groups of Objects for Scene Understanding", CVPR, 2012.
[37] Li-Jia Li, Hao Su, Eric P. Xing, Li Fei-Fei, "Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification", International Journal of Computer Vision, 2012.
[38] Yingbin Zheng, Yu-Gang Jiang, Xiangyang Xue, "Learning Hybrid Part Filters for Scene Recognition", CVPR, 2012.
[39] A. Oliva, A. Torralba, "Modeling the shape of the scene: a holistic representation of the spatial envelope", Int. J. Comput. Vis. 42 (3), pp. 145–175, 2001.
[40] J. Wu, J.M. Rehg, "CENTRIST: a visual descriptor for scene categorization", IEEE Trans. Pattern Anal. Mach. Intell. 33 (8), pp. 1489–1501, 2011.
[41] K. Gazolli, E. Salles, "A contextual image descriptor for scene classification", Online Proceedings on Trends in Innovative Computing, pp. 66–71, 2012.
[42] Marco Fornoni, Barbara Caputo, "Scene Recognition with Naive Bayes Non-linear Learning", ICPR, 2014.
[43] A. Bosch, A. Zisserman, X. Munoz, "Scene classification using a hybrid generative/discriminative approach", TPAMI 30, pp. 712–727, 2008.
[44] J. Wu, J.M. Rehg, "Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel", ICCV, 2009.
