A Cognitive Method For Object Detection From Aerial Image

International Conference on Computing, Communication and Automation (ICCCA2016)

A Cognitive Method for Object Detection from Aerial Image

Naveen Chandra, Ashu Sharma, Jayanta Kumar Ghosh
Civil Engineering Department
Indian Institute of Technology
Roorkee, India

Abstract—Aerial images are a rich source of data for extracting geospatial information. A deeper understanding of human cognitive processes is necessary to automate information extraction from aerial images. The objective of this research is to emulate human cognitive capacities for extracting geospatial information from an urban area. First, preliminary knowledge is collected about the sequence of cognitive processes that the human image cognition system uses during object detection. Second, a rule-based approach is used to represent the knowledge obtained from human visual interpretation of the image. Third, the defined rules are used to detect buildings in the aerial image using two different algorithms.

Keywords—Cognitive; Knowledge; Mixture Tuned Matched Filtering; Spectral Angle Mapper.

ISBN: 978-1-5090-1666-2/16/$31.00 ©2016 IEEE

I. INTRODUCTION
Cognitive task analysis (CTA) [1] is widely used in applied psychology for carrying out complex tasks. It is a well-organized method for defining the psychological processes that an analyst goes through during a task. CTA describes the cognitive capacities and inputs that are used to obtain the output of the task. It simplifies information extraction because it follows the thought process a human being uses to carry out the task. CTA can therefore be applied to the complex task of detecting buildings in an aerial image of an urban area.

Automatic building detection in urban locations has been a prime topic of research in computer vision [2]. With the development of aerial imaging techniques, high resolution aerial images have become available to the different methods proposed for object detection [3-4]. High resolution images are useful for interpreting and analysing small objects; however, the computational time for processing a large image is high. Building detection is a complex task because buildings are mainly located in urban areas, which also contain other objects such as streets, parking lots and trees. Buildings also vary in structural design and shape, which makes them difficult to interpret [4]. Object detection has always been a key step in aerial image interpretation. Applications include 3D city planning, cartography and classification. Various algorithms have been developed for building detection, and several surveys are available in the literature [4-6]. These methodologies use different data sources, models and assessment strategies [4].

In the past, several types of region growing methods were developed for building detection [7-10]. To start with, shadows were used to identify the edges and corners of buildings [9]. Afterwards, features such as the shape and height of a building were identified from the information obtained from its shadow [10]. Later, shadow information was used to complete the process of boundary grouping [11], and it was also used to verify methods that had been proposed earlier [12, 13]. A few methods for building detection have been based on supervised classification algorithms [14]. Then, support vector machines (SVM), graph theoretical tools and the scale invariant feature transform (SIFT) were used for detecting buildings in aerial images [15, 16]. More recently, a method for building detection was developed by integrating shadow information with fuzzy logic and the GrabCut partitioning algorithm [17]. A system was also proposed for building detection using laser scanning data [18]. Some work has used graphical models to improve the overall accuracy of the system [19, 20]. Later, Markov random field and conditional random field models were used for object detection in urban areas [21].

In this paper, a cognitive method for building detection from an aerial image is implemented. The paper is organized as follows: Section II describes the cognitive methodology used for building detection, Section III presents the results and Section IV gives the conclusion.

In this work, building detection is performed using CTA. CTA is carried out in the five steps shown in Fig. 1. The overall framework of the methodology is shown in Fig. 2.

A. Gathering Preliminary Knowledge
This is the first stage of CTA, during which the image analyst determines the flow of processes that are necessary for CTA. There are different methods for knowledge collection, such as unstructured interviews and observation, but in this study document review and analysis is used for preliminary knowledge collection. In this stage the analyst also identifies the method that will be required in the next stage. The image analyst tries to develop an elementary understanding of the domain, i.e., image analysis.

B. Selecting Method for Representing Knowledge
In the second stage, the image analyst closely examines the knowledge acquired in the previous stage in order to perform the cognitive analysis of the aerial image. In the past, various methods have been used for the representation of knowledge, for instance flow charts, semantic networks and concept maps [1]. In our work, a rule-based approach is used for the representation of knowledge. First, a set of training data is prepared for the input image. The objects in the aerial image that human interpreters identify as buildings are then labelled manually. These interpretation results are later used as rules for object detection from the aerial image.
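The paper does not spell out how the manual labels are turned into rules. As one possible reading, offered only as a sketch, the labelled building pixels could be averaged into a single reference spectrum for later spectral matching. The function name `reference_spectrum`, the per-band averaging and the sample pixel values below are all assumptions made for illustration, not the authors' procedure.

```python
from statistics import fmean


def reference_spectrum(pixels, labels):
    """Average the spectra of all pixels manually labelled as building.

    pixels: list of (R, G, B) tuples, one per pixel.
    labels: list of booleans, True where the interpreter marked a building.
    This averaging rule is an assumption for illustration only.
    """
    building = [p for p, is_building in zip(pixels, labels) if is_building]
    if not building:
        raise ValueError("no pixel is labelled as building")
    # zip(*building) regroups the values band by band; each band is averaged.
    return tuple(fmean(band) for band in zip(*building))


# Hypothetical training data: two roof pixels and two vegetation pixels.
pixels = [(200, 190, 180), (210, 200, 190), (40, 90, 30), (35, 80, 25)]
labels = [True, True, False, False]
rule = reference_spectrum(pixels, labels)
```

Under this assumption, `rule` would play the role of the reference (target) spectrum that the elicitation algorithms compare each image pixel against.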
C. Selecting Method for Knowledge Elicitation
Elicitation is a method of accumulating knowledge from various resources. During the third stage, the analyst applies different algorithms to integrate the knowledge acquired in the earlier stage. Previous research has used different techniques for knowledge elicitation, such as the simulation, prototyping and observation methods [1]. In this research, two methods are used for knowledge elicitation: mixture tuned matched filtering (MTMF) [22] and spectral angle mapper (SAM) [23].

Fig. 1. Flow Chart for Cognitive Task Analysis

Fig. 2. Overall Framework of the Methodology

The MTMF algorithm consists of three major steps: (1) minimum noise fraction (MNF) transformation of the apparent reflectance data [22], (2) matched filtering for abundance estimation, and (3) mixture tuning (MT) for identification of false-positive pixels [22]. The MTMF algorithm is divided into two parts, i.e., matched filter (MF) and mixture tuned (MT). The matched filter vector (MFV) [22] is calculated using (1).

$\mathrm{MFV} = \dfrac{C_{MNF}^{-1}\, t_{SMNF}}{t_{SMNF}^{T}\, C_{MNF}^{-1}\, t_{SMNF}}$ (1)

where $C_{MNF}^{-1}$ denotes the diagonal inverse covariance matrix of the MNF data set and $t_{SMNF}$ denotes the target spectrum converted to MNF space. Then, the matched filter image (MFI) [22] is calculated using (2).

$\mathrm{MFI} = \mathrm{MFV} \times D_{MNF}$ (2)

where MFI denotes the resulting image of the matched filter and $D_{MNF}$ denotes the MNF data set. Lastly, MT [22] is calculated using (3).

$MT_i = \dfrac{D_{MNF_i} - dm_i}{MTeval_i}$ (3)

where $MT_i$ is the mixture-tuned value for pixel $i$, $D_{MNF_i}$ is the MNF spectrum for pixel $i$, $dm_i$ is the mean value for pixel $i$ and $MTeval_i$ is the interpolated vector of eigenvalues for pixel $i$.

SAM is a well-known algorithm for target detection. It allows quick mapping by calculating the spectral similarity between two spectra, i.e., the image spectrum and the reference reflectance spectrum [23]. The algorithm is not influenced by illumination, since the angle between the two vectors does not depend on their lengths [23]. SAM calculates the similarity using (4).

$\alpha = \cos^{-1}\left( \dfrac{\sum_{i=1}^{nb} t_i r_i}{\left(\sum_{i=1}^{nb} t_i^{2}\right)^{1/2} \left(\sum_{i=1}^{nb} r_i^{2}\right)^{1/2}} \right)$ (4)

where $nb$ denotes the number of bands, $t_i$ denotes the test spectrum and $r_i$ denotes the reference spectrum. The key advantage of SAM is that it is a simple and fast algorithm; its primary limitation is the issue of spectral mixing [23].

D. Analyzing and Verifying the Obtained Data
The result of CTA depends on the method used for knowledge elicitation. A quantitative evaluation is therefore required to validate the output. During this stage, the output image obtained after knowledge representation is compared with the corresponding ground truth of the input data set to calculate the overall accuracy.

E. Formatting Output for Various Applications
In this last stage of CTA, a concluding report is produced that contains the theoretical and statistical results of the CTA. The cognitive approach used in this research can be applied to several decision-making tasks in the remote sensing domain.
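To make the two elicitation scores of Section II-C concrete, the following is a minimal sketch using only the Python standard library. It assumes the usual matched filter form, MFV = C⁻¹t / (tᵀC⁻¹t), with a diagonal covariance given as per-band variances and with spectra already expressed in MNF space, and the spectral angle as the arccosine of the normalized dot product. The function names and sample numbers are hypothetical, and the mixture-tuning step is not covered.

```python
import math


def spectral_angle(t, r):
    """Angle in radians between test spectrum t and reference spectrum r."""
    dot = sum(ti * ri for ti, ri in zip(t, r))
    norm_t = math.sqrt(sum(ti * ti for ti in t))
    norm_r = math.sqrt(sum(ri * ri for ri in r))
    if norm_t == 0 or norm_r == 0:
        raise ValueError("a zero spectrum has no defined angle")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_t * norm_r)))
    return math.acos(cos_angle)


def matched_filter_vector(target, variances):
    """MFV = C^-1 t / (t^T C^-1 t), assuming a diagonal covariance matrix C.

    variances holds the diagonal of C (one value per band), so applying
    C^-1 reduces to a per-band division.
    """
    weighted = [t / v for t, v in zip(target, variances)]
    scale = sum(t * w for t, w in zip(target, weighted))
    return [w / scale for w in weighted]


def matched_filter_score(mfv, pixel):
    """Project one pixel spectrum onto the matched filter vector."""
    return sum(m * p for m, p in zip(mfv, pixel))


# Hypothetical three-band example.
target = (2.0, 1.0, 0.5)
variances = (4.0, 1.0, 0.25)
mfv = matched_filter_vector(target, variances)
score_target = matched_filter_score(mfv, target)             # pure target pixel
angle_scaled = spectral_angle(target, (4.0, 2.0, 1.0))        # brighter copy of target
angle_orthogonal = spectral_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

With this normalization a pixel equal to the target scores approximately 1, and a brighter or darker copy of a spectrum has a spectral angle of approximately 0, which is the illumination independence described for SAM. A detection map would then follow from thresholding either score; the thresholds used in the paper are not stated.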


III. RESULTS AND DISCUSSION
A. Data Set
An aerial image of the city of Graz is used in this research [24]. The image was acquired with the UltraCamD camera from Microsoft Photogrammetry and has three color channels, i.e., red (R), green (G) and blue (B). The UltraCamD is capable of delivering images of 3680 × 7500 pixels along track and across track, respectively [24].

B. Quantitative Analysis
The potential of the cognitive method is evaluated using three well-known quality measures [25, 26], given in (5), (6) and (7).

$\mathrm{Precision} = \dfrac{\lVert TP \rVert}{\lVert TP \rVert + \lVert FP \rVert}$ (5)

$\mathrm{Recall} = \dfrac{\lVert TP \rVert}{\lVert TP \rVert + \lVert FN \rVert}$ (6)

$F_1 = \dfrac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$ (7)

where $\lVert \cdot \rVert$ represents the number of pixels assigned to each class and the F1 score combines precision and recall into a single score. Each pixel of the input image is classified into one of four classes, i.e., True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) [27].

C. Discussion
The qualitative evaluation of the cognitive method for building detection is done through visual interpretation by human beings. On the basis of visual analysis, it can be deduced that SAM performs better than MTMF. It is found that SAM is able to detect buildings in the image without considering false negative pixels. In Fig. 3, images (b) and (e) show the rule images of MTMF and SAM, respectively; (c) and (f) show the output of the forward MNF transformation for MTMF and SAM, respectively; (d) and (g) show the buildings detected by MTMF and SAM, respectively; and (h) is the corresponding ground truth of the input image.

Fig. 3. (a) Input Image (b) & (e) Rule Image (c) & (f) Output of MNF Transform (d) & (g) Detected Buildings (h) Ground Truth

The quantitative results are given in Table I. The F1 scores computed for MTMF and SAM are 58.88% and 59.83%, respectively. On the basis of the numerical results, MTMF has a lower precision (87.11%) than SAM. The statistical findings are shown in Table II. Although the input image contains buildings of different shapes and sizes, the cognitive method for building detection achieved fair performance. Therefore, on the basis of the qualitative and quantitative findings, it can be inferred that the cognitive approach for building detection performs well for such a complex sample of aerial imagery.

Cognitive task analysis is a new area of research and a key contribution in the field of cognitive psychology. CTA provides descriptive information that depends on the performance of the analyst during the task. CTA presents the information on the basis of the cognitive parameters of human beings. Cognitive task analysis can be used in remote sensing research by integrating it with image analysis. Building detection in urban areas has always been a challenging task because of the geometry of the buildings. To address this problem, a cognitive method capable of detecting buildings in an aerial image of a dense urban area is implemented in this research. However, the method has a limitation: it cannot separate building and non-building areas that have the same spectral values. In future work, the method therefore needs to be improved to clearly separate the two classes, i.e., buildings and non-buildings.
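The quality measures (5) to (7) of Section III-B can be computed from per-pixel labels as in the following sketch. It assumes that the detection result and the ground truth are flattened into equal-length sequences of 0 (non-building) and 1 (building). The function names and the toy label vectors are hypothetical and are not the paper's data.

```python
def confusion_counts(predicted, truth):
    """Count TP, FP, TN and FN over paired per-pixel labels (1 = building)."""
    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def quality_measures(tp, fp, fn):
    """Return (precision, recall, F1); a ratio with a zero denominator is set to 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with eight pixels.
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 1, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(predicted, truth)
precision, recall, f1 = quality_measures(tp, fp, fn)
```

In this toy case three of the four detected pixels are correct and three of the five true building pixels are found, so precision is 0.75 and recall is 0.6. True negatives are counted but do not enter any of the three measures.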


TABLE I. RESULTS OF ACCURACY ASSESSMENT

TABLE II. STATISTICAL OUTPUT OF THE METHODS

REFERENCES
[1] R. E. Clark, D. Feldon, J. J. van Merrienboer, K. Yates and S. Early, “Cognitive task analysis”, Handbook of research on educational communications and technology, 3, 577-593, 2008.
[2] X. Jin and C. H. Davis, “Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information”, EURASIP Journal on Applied Signal Processing, 2196-2206, 2005.
[3] Y. Xiao, S. K. Lim, T. S. Tan and S. C. Tay, “Feature extraction using very high resolution satellite imagery”, In Geoscience and Remote Sensing Symposium, Proceedings. IEEE International (Vol. 3). IEEE, September, 2004.
[4] Quang, T. Nguyen, “An Efficient Framework for Pixel-wise Building Segmentation from Aerial Images”, Proceedings of the Sixth International Symposium on Information and Communication Technology. ACM, 2015.
[5] E. P. Baltsavias, “Object extraction and revision by image analysis using existing geodata and knowledge: current status and steps towards operational systems”, ISPRS Journal of Photogrammetry and Remote Sensing, 58(3), 129-151, 2004.
[6] N. Haala and M. Kada, “An update on automatic 3D building reconstruction”, ISPRS Journal of Photogrammetry and Remote Sensing, 65(6), 570-580, 2010.
[7] M. Tavakoli and A. Rosenfeld, “Building and road extraction from aerial photographs”, IEEE Transactions on Systems, Man, and Cybernetics, 12, 84-91, 1982.
[8] M. Herman and T. Kanade, “Incremental reconstruction of 3D scenes from multiple, complex images”, Artificial intelligence, 30(3), 289-341, 1986.
[9] A. Huertas and R. Nevatia, “Detecting buildings in aerial images”, Computer Vision, Graphics, and Image Processing, 41(2), 131-152, 1988.
[10] R. B. Irvin and D. M. McKeown, “Methods for exploiting the relationship between buildings and their shadows in aerial imagery”, In OE/LASE'89, 15-20 Jan., Los Angeles, CA (pp. 156-164). International Society for Optics and Photonics, March, 1989.
[11] Y. T. Liow and T. Pavlidis, “Use of shadows for extracting buildings in aerial images”, Computer Vision, Graphics, and Image Processing, 49(2), 242-277, 1990.
[12] J. C. McGlone and J. Shufelt, “Projective and object space geometry for monocular building extraction”, In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR'94., 1994 IEEE Computer Society Conference on (pp. 54-61). IEEE, June, 1994.
[13] C. Lin and R. Nevatia, “Building detection and description from a single intensity image”, Computer vision and image understanding, 72(2), 101-121, 1998.
[14] D. S. Lee, J. Shan, and J. S. Bethel, “Class-guided building extraction from Ikonos imagery”, Photogrammetric Engineering & Remote Sensing, 69(2), 143-150, 2003.
[15] J. Inglada, “Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features”, ISPRS journal of photogrammetry and remote sensing, 62(3), 236-248, 2007.
[16] B. Sirmacek and C. Unsalan, “Urban-area and building detection using SIFT keypoints and graph theory”, Geoscience and Remote Sensing, IEEE Transactions on, 47(4), 1156-1167, 2009.
[17] A. O. Ok, “Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts”, ISPRS Journal of Photogrammetry and Remote Sensing, 86, 21-40, 2013.
[18] L. Matikainen, H. Kaartinen, and J. Hyyppa, “Classification tree based building detection from laser scanner and aerial image data”, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(Part 3), W52, 2007.
[19] S. Kumar and M. Hebert, “Man-made structure detection in natural images using a causal multiscale random field”, In Computer Vision and Pattern Recognition, Proceedings IEEE Computer Society Conference on (Vol. 1, pp. I-119), 2003.
[20] F. Korc and W. Forstner, “Interpreting terrestrial images of urban scenes using discriminative random fields”, In 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China (pp. 291-296), 2008.
[21] P. Zhong and R. Wang, “Object detection based on combination of conditional random field and markov random field”, In Pattern Recognition, 18th International Conference on (Vol. 3, pp. 160-163). IEEE, 2006.
[22] S. G. Meh., V. Ahadnejad, R. A. Abbaspour and M. Hamzeh, “Using the mixture-tuned matched filtering method for lithological mapping with Landsat TM5 images”, International Journal of Remote Sensing, 34(24), 8803-8816, 2013.
[23] G. Girouard, A. Bannari, A. El Harti and A. Desrochers, “Validated spectral angle mapper algorithm for geological mapping: comparative study between QuickBird and Landsat-TM”, In XXth ISPRS Congress, Geo-Imagery Bridging Continents, Istanbul, Turkey (pp. 12-23), 2004.
[24] N. T. Thuy, “Object Detection from Aerial Image”, Doctoral dissertation, Graz University of Technology, 2009.
[25] S. Aksoy, I. Z. Yalniz and K. Tasdemir, “Automatic detection and segmentation of orchards using very high resolution imagery”, Geoscience and Remote Sensing, IEEE Transactions on, 50(8), 3117-3131, 2012.
[26] A. O. Ok, “Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts”, ISPRS Journal of Photogrammetry and Remote Sensing, 86, 21-40, 2013.
[27] S. Ghaffarian, “Automatic building detection based on supervised classification using high resolution Google Earth images”, J. Photogr. Remote Sens., XL-3, 101-106, 2014.

