
A Cognitive Based Approach for Building Detection from High Resolution Satellite Images


Naveen Chandra, Jayanta Kumar Ghosh, Ashu Sharma
Civil Engineering Department
Indian Institute of Technology
Roorkee, India

Abstract— High resolution satellite images are new sources of data for geo-spatial information. In order to automate the process of extracting information from satellite images in the way human experts do, an understanding of the human cognitive processes involved in information extraction from images is required. The objective of this research work is to emulate the human cognitive capabilities by integrating cognitive task analysis for extraction of data from satellite images. Initially, preliminary knowledge about the sequence of cognitive processes which human beings utilize during the interpretation and classification of images was collected. Here, a rule based approach is used to represent the knowledge obtained from the visual interpretation of the image by human beings. The defined rules are used to determine the buildings in the satellite images using the mixture tuned matched filtering algorithm (MTMF). Further, during knowledge elicitation the domain knowledge is grouped together using a support vector classifier. The method is tested using four different sets of high resolution satellite images. The overall averages of precision and recall are computed as 99.08% and 75.85%, respectively.

Keywords— Cognitive; Knowledge; Mixture Tuned Matched Filtering

I. INTRODUCTION

Building detection in urban areas has been a focus of research in computer vision [1]. With the availability of high resolution satellite images, different types of algorithms and methods have been proposed for building detection from such images [2]. Building detection has been a major area of study in different applications; some surveys can be found in [3-7]. Considering the source of the images which have been utilized for building recognition, for instance multispectral images, SAR and LiDAR datasets, the current approaches can be divided into two groups, i.e., building detection using 3D image data and building detection using monocular remotely sensed data. Different region growing algorithms and geometry have been used in the past for building detection [8-11]. Initially, shadows were used to determine the sides and corners of the building [10]. Later, different features such as the height and shape of the building were determined on the basis of information obtained from shadows [11]. Then shadow information was used to finish the process of boundary grouping [12] and, simultaneously, as a verification parameter for previously proposed methods [13-14]. New methods based on supervised classification were then developed for building detection [15]. Later, support vector machines (SVM) were used for building detection using image features [16]. Then, the scale invariant feature transform (SIFT) and graph theoretical tools were used to detect buildings in an urban area [17]. Furthermore, a new technique for automated building detection was introduced that combines shadow information with the GrabCut partitioning algorithm and fuzzy logic [18]. Afterwards, the authors improved the accuracy of their previous research work on detecting shadow areas [19]. In this paper, an automatic method for detecting buildings from high resolution satellite images has been implemented using cognitive task analysis (CTA) [20]. The outline of the paper is as follows: Section II briefly describes the cognitive approach used for building detection, Section III presents the obtained results, and Section IV contains the conclusion.

II. METHODOLOGY

The process of building detection is integrated with cognitive task analysis performed in five different stages, shown in Fig. 1. Cognitive task analysis (CTA) is a term widely used in applied psychology for the analysis of complex tasks. CTA is a systematic procedure used to define the psychological processes used by an analyst to perform a task. CTA explains the cognitive processes, parameters and inputs that will be used to obtain the results of the complex task. It is a technique used by an individual for complex decision making in a real time environment. CTA is used for extracting information from the thought process used for performing a task. CTA is a method that concentrates on understanding tasks that require cognitive activity such as problem-solving, decision-making, memory and judgement. The cognitive task analysis approach analyzes and represents the cognitive activities which are taken into account by different users for performing a specific task. The overall architecture of the methodology is shown in Fig. 2.

A. Collecting the Preliminary Knowledge

During this initial stage the image analyst identifies the sequence of processes which are required for the cognitive task analysis and selects the method which will be used in knowledge elicitation. The analyst develops a basic understanding of the area in which cognitive task analysis will be carried out. In general, the image analyst becomes familiar with the domain knowledge, i.e., image interpretation.

B. Determining the Method for Knowledge Representation

During this stage the image analyst examines the preliminary knowledge and the tasks required for cognitive analysis of the satellite image. There are several methods of knowledge representation

978-1-5090-0673-1/16/$31.00 ©2016 IEEE

Authorized licensed use limited to: INDIAN INSTITUTE OF TECHNOLOGY ROORKEE. Downloaded on May 02,2024 at 18:02:03 UTC from IEEE Xplore. Restrictions apply.
such as semantic networks, concept maps and flow charts. In this research a rule based method is used for knowledge representation. These rules are based on the visual interpretation mechanism of the image analyst. Initially a training data set for the input image is prepared. Buildings in the image were labelled manually. Then the forward MNF transform of an input image with three bands is computed. The MNF transformation is essentially two cascaded principal component transformations [21]. The transformed data space is divided into two parts: one part is associated with large eigenvalues and coherent feature images, the other with near-unity eigenvalues and noise-dominated images [22-23]. The benefit of the MNF transform is that it removes the correlation between bands and concentrates the information in fewer components. The spectral information of the features is separated in the MNF feature space, and weak information is enhanced during the denoising process. In this manner, the separability of features is increased [21].

MTMF is a modern spectral unmixing method that does not require all materials within a scene to be known or to have identified endmembers [24]. It combines the best parts of the statistical matched filter model and the linear spectral mixing model while avoiding the limitations of each parent method. MTMF is based on signal processing concepts and can detect specific land cover types on the basis of their spectral properties [25]. MTMF performs the unmixing by determining the abundance of an endmember, maximizing the response of the endmember of interest and minimizing the response of the composite unknown background, thus 'matching' the known signature [26]. When unmixing is applied in the context of target detection, extraction of the exact value of target abundance may not be necessary. If the estimated abundance fraction for the desired target pixel vector is sufficient to distinguish that pixel from adjacent pixels, it will be more effective in image classification [27]. The MTMF algorithm consists of three fundamental steps: (1) an MNF transformation [28], (2) matched filtering for abundance estimation, and (3) mixture tuning (MT) to identify infeasible or false-positive pixels [24]. The output of MTMF is a set of rule images giving the MF and infeasibility scores of each pixel for each endmember. The floating-point MF results indicate the relative degree of match to the reference spectrum and the approximate sub-pixel abundance. A value of 1 denotes a very high degree of matching [26]. A major advantage of MTMF is that it does not require signatures for the other endmembers that occur in the image [24]. The constrained energy minimization (CEM) and target constrained interference minimization filter (TCIMF) results are calculated using the covariance matrix.

The binary view of the input image is obtained, and the detected target pixels are highlighted and overlaid on the rule image. Different parameter and filter options are then set for each target based on the prior knowledge and the size of the target. Sometimes a single target object can be separated into multiple objects by only several mis-detected pixels, and often there are false positives detected as just a single pixel or a few pixels. Clumping groups the separated pixels into one object using a kernel of the size specified in the clumping parameters. Sieving removes small isolated objects. It looks at the neighboring 4 or 8 pixels to determine if a target pixel is grouped with other target pixels. If the number of target pixels that are grouped is less than the Group Min Threshold value, those pixels will be removed from the output image.

C. Selecting the Knowledge Elicitation Method

Elicitation is a technique of gathering information or knowledge from different sources. During this stage, the analyst applies different techniques to gather the knowledge determined in the previous stage. In the past, several methods have been used for elicitation depending on the domain, such as the prototyping method, the simulation method and the observation method. Here, a classification method is used to elicit the knowledge. A support vector machine classifier is trained on the prior knowledge obtained using the training data set. The input image is classified into two different classes, namely building and non-building.

D. Analysis and Verification of the Acquired Data

The results of cognitive task analysis methods vary with the knowledge elicitation method. Therefore, the method used here requires validation so that it can be used further in different applications. In this stage the classified image and the image obtained after representation of the knowledge are compared for the analysis and verification of the obtained output.

Fig. 1. Flow Chart of Cognitive Task Analysis. The five stages, in order:
1) Collecting the preliminary knowledge
2) Determining the method for knowledge representation
3) Selecting the knowledge elicitation method
4) Analysis and verification of the acquired data
5) Formatting the output for the different application
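
The matched filtering step of MTMF described in Section II-B can be illustrated with a short sketch. The paper does not state the filter equation, so the code below assumes the standard matched filter form, score(x) = (t - m)^T C^-1 (x - m) / ((t - m)^T C^-1 (t - m)), where m and C are the mean and covariance of the scene pixels and t is the target (building) signature. The function names are hypothetical, and the MNF transformation and the mixture tuning (infeasibility) step are deliberately left out.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(pixels: Sequence[Sequence[float]]) -> Vector:
    """Per-band mean of the scene (the 'background' estimate)."""
    n = len(pixels)
    return [sum(p[b] for p in pixels) / n for b in range(len(pixels[0]))]


def covariance(pixels: Sequence[Sequence[float]], mean: Vector) -> List[Vector]:
    """Sample covariance matrix of the scene pixels."""
    bands = len(mean)
    cov = [[0.0] * bands for _ in range(bands)]
    for p in pixels:
        d = [p[b] - mean[b] for b in range(bands)]
        for i in range(bands):
            for j in range(bands):
                cov[i][j] += d[i] * d[j]
    return [[v / (len(pixels) - 1) for v in row] for row in cov]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def matched_filter_scores(pixels: Sequence[Sequence[float]],
                          target: Sequence[float]) -> List[float]:
    """MF score per pixel: 0 at the scene mean, 1 at the target signature.

    Assumed textbook formulation, not taken from the paper.
    """
    mean = mean_vector(pixels)
    cov = covariance(pixels, mean)
    t = [target[b] - mean[b] for b in range(len(mean))]
    w = solve(cov, t)                      # w = C^-1 (t - m)
    norm = sum(wi * ti for wi, ti in zip(w, t))
    return [sum(wi * (p[b] - mean[b]) for b, wi in enumerate(w)) / norm
            for p in pixels]


if __name__ == "__main__":
    # Toy three-band scene; the fourth pixel plays the role of a building.
    scene = [(10.0, 20.0, 30.0), (12.0, 19.0, 33.0), (11.0, 22.0, 29.0),
             (50.0, 60.0, 40.0), (9.0, 21.0, 31.0), (13.0, 18.0, 28.0)]
    print([round(s, 3) for s in matched_filter_scores(scene, scene[3])])
```

Under this formulation a pixel equal to the target signature scores exactly 1, which is in line with the statement that a value of 1 denotes a very high degree of matching, and the scores of all scene pixels sum to zero because the filter is applied to mean-removed data.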

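The sieving rule from Section II-B (group target pixels through their 4 or 8 neighbors and drop groups smaller than the Group Min Threshold) can be sketched with a plain connected-component search. This is a minimal illustration, not the implementation used by the authors. The function name `sieve` and its parameter names are hypothetical, and the binary mask is assumed to be a list of rows holding 0 or 1.

```python
from collections import deque
from typing import List

Mask = List[List[int]]


def sieve(mask: Mask, group_min_threshold: int, connectivity: int = 8) -> Mask:
    """Remove groups of target pixels smaller than group_min_threshold."""
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [[0] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one group of connected target pixels.
            group = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep the group only if it reaches the minimum size.
            if len(group) >= group_min_threshold:
                for y, x in group:
                    out[y][x] = 1
    return out
```

With 8-connectivity two diagonally touching pixels form one group, whereas with 4-connectivity they are two single-pixel groups, so the choice of neighborhood directly affects which isolated detections survive.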
Fig. 2. Overall Architecture of the Methodology. Diagram blocks: Input Image; Training data; SVM Classification and MTMF, each producing an Output; Quantitative Analysis; Accuracy Assessment.

E. Formatting Output for Different Applications

At the end a final report is prepared which includes the findings of the CTA. The output of the CTA is used in different automated applications. The CTA used should be compared with the other available methods in order to reach a decision. The model described here can be used in a variety of decision making tasks required in remote sensing.

III. RESULTS AND DISCUSSION

A. Data Set

The potential of the methodology was determined using high resolution satellite images which have three bands (RGB). The images are selected so that they represent different building features, namely the shapes and sizes of buildings. The detailed description of the data is given in Table I.

TABLE I. DESCRIPTION OF DATA SET [29]

B. Quantitative Analysis

The performance of the method is evaluated using three standard quality measures [30-31] given in (1), (2) and (3).

$$\mathrm{Precision} = \frac{|TP|}{|TP| + |FP|} \tag{1}$$

$$\mathrm{Recall} = \frac{|TP|}{|TP| + |FN|} \tag{2}$$

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

where $|\cdot|$ denotes the number of pixels assigned to each class and the F1 score combines precision and recall into a single score. All the pixels of the image are classified into four classes, namely True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) [32].

C. Discussion

The qualitative assessment of the method is performed through the visual interpretation of the results illustrated in Fig. 3. It is observed that the method is robust and detects buildings while including very few false positive pixels in the input images. Further, the numerical results of the method are shown in Table II. The overall averages of precision and recall are computed as 99.08% and 75.85%, respectively. The calculated F1 score over all the input images is 85.66%, which is a promising result for such a testing set of images. According to the numerical results, the lowest precision (98.22%) is produced by image D, due to the proximity of the spectral reflectance values of the buildings and the background. The maximum precision is shown by image C, whereas the lowest recall is shown by image D and the maximum F1 score by image A. The image set contains buildings of various colors, shapes and sizes, which the automatic building detection method detected with good performance. A comparative evaluation of the cognitive method for building detection is done using one sample image of the data set used in [32]. The F1 score obtained is 95.1%, which is higher than the score obtained in [32]. The results for the sample image are shown in Fig. 4. Based on the quantitative and qualitative results, it can be deduced that the cognitive building detection algorithm performs well for such a challenging set of images.
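
The quality measures in (1) to (3) can be computed from pixel-wise labels as in the following sketch. It assumes the predicted and reference masks are flattened to sequences of 0 (non-building) and 1 (building); the function names are hypothetical and not taken from the paper.

```python
from typing import Dict, Sequence


def confusion_counts(predicted: Sequence[int],
                     reference: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN and TN pixels for a binary building mask."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for p, r in zip(predicted, reference):
        if p == 1 and r == 1:
            counts["tp"] += 1
        elif p == 1 and r == 0:
            counts["fp"] += 1
        elif p == 0 and r == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision(tp: int, fp: int) -> float:
    """Equation (1): share of detected pixels that are really buildings."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Equation (2): share of building pixels that were detected."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Equation (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    c = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 1])
    p, r = precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"])
    print(c, round(p, 4), round(r, 4), round(f1_score(p, r), 4))
```

As a worked check, applying (3) directly to the reported average precision and recall gives 2 × 99.08 × 75.85 / (99.08 + 75.85) ≈ 85.92%, slightly above the reported 85.66%. One plausible reading, which the source does not state, is that the reported value is the mean of the per-image F1 scores, which in general differs from the F1 score of the averaged precision and recall.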

Fig. 3. (A) Input Images (B) Output of MNF Transform (C) Rule Image (D) Detected Buildings (E) Classified Image

TABLE II. NUMERICAL RESULTS OF THE METHOD

Fig. 4. (A) Sample Data Used in [32] (B) Output of MNF Transform (C) Rule Image (D) Detected Buildings (E) Classified Image

IV. CONCLUSION

CTA is an important contribution in cognitive psychology. CTA generates descriptive and precise information depending on the nature of the performance of the analyst while performing a specific task. CTA is an effective source of information which is based on the cognitive processes of an analyst. Most of the automated building detection algorithms have limitations in the context of the color, shape and size of the buildings. Moreover, there are a few limitations related to the density of building areas, which varies between rural and urban areas. To overcome these limitations, a method which can detect buildings from high resolution satellite images irrespective of their geometrical parameters has been implemented. However, this approach still has a limitation. In particular, it is not able to clearly separate building and non-building areas having similar spectral values. For future work, this method needs to be evaluated on complex images covering a large geographical area in order to determine its overall accuracy.

REFERENCES

[1] X. Jin and C. H. Davis, “Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information”, EURASIP Journal on Applied Signal Processing, 2196-2206, 2005.
[2] Y. Xiao, S. K. Lim, T. S. Tan and S. C. Tay, “Feature extraction using very high resolution satellite imagery”, In Geoscience and Remote Sensing Symposium, 2004. IGARSS'04. Proceedings. 2004 IEEE International (Vol. 3). IEEE, September, 2004.
[3] E. P. Baltsavias, “Object extraction and revision by image analysis using existing geodata and knowledge: current status and steps towards operational systems”, ISPRS Journal of Photogrammetry and Remote Sensing, 58(3), 129-151, 2004.
[4] H. Mayer, “Automatic object extraction from aerial imagery—a survey focusing on buildings”, Computer Vision and Image Understanding, 74(2), 138-149, 1999.
[5] C. Unsalan and K. L. Boyer, “A system to detect houses and residential street networks in multispectral satellite images”, Computer Vision and Image Understanding, 98(3), 423-461, 2005.
[6] C. Brenner, “Building reconstruction from images and laser scanning”, International Journal of Applied Earth Observation and Geoinformation, 6(3), 187-198, 2005.
[7] N. Haala and M. Kada, “An update on automatic 3D building reconstruction”, ISPRS Journal of Photogrammetry and Remote Sensing, 65(6), 570-580, 2010.
[8] M. Tavakoli and A. Rosenfeld, “Building and road extraction from aerial photographs”, IEEE Transactions on Systems, Man, and Cybernetics, 12, 84-91, 1982.
[9] M. Herman and T. Kanade, “Incremental reconstruction of 3D scenes from multiple, complex images”, Artificial Intelligence, 30(3), 289-341, 1986.
[10] A. Huertas and R. Nevatia, “Detecting buildings in aerial images”, Computer Vision, Graphics, and Image Processing, 41(2), 131-152, 1988.
[11] R. B. Irvin and D. M. McKeown, “Methods for exploiting the relationship between buildings and their shadows in aerial imagery”, In OE/LASE'89, 15-20 Jan., Los Angeles, CA (pp. 156-164). International Society for Optics and Photonics, March, 1989.
[12] Y. T. Liow and T. Pavlidis, “Use of shadows for extracting buildings in aerial images”, Computer Vision, Graphics, and Image Processing, 49(2), 242-277, 1990.
[13] J. C. McGlone and J. Shufelt, “Projective and object space geometry for monocular building extraction”, In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR'94., 1994 IEEE Computer Society Conference on (pp. 54-61). IEEE, June, 1994.
[14] C. Lin and R. Nevatia, “Building detection and description from a single intensity image”, Computer Vision and Image Understanding, 72(2), 101-121, 1998.

[15] D. S. Lee, J. Shan, and J. S. Bethel, “Class-guided building extraction from Ikonos imagery”, Photogrammetric Engineering & Remote Sensing, 69(2), 143-150, 2003.
[16] J. Inglada, “Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features”, ISPRS Journal of Photogrammetry and Remote Sensing, 62(3), 236-248, 2007.
[17] B. Sirmacek and C. Unsalan, “Urban-area and building detection using SIFT keypoints and graph theory”, Geoscience and Remote Sensing, IEEE Transactions on, 47(4), 1156-1167, 2009.
[18] A. O. Ok, “Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts”, ISPRS Journal of Photogrammetry and Remote Sensing, 86, 21-40, 2013.
[19] M. Teke, E. Baseski, A. O. Ok, B. Yuksel, and C. Senaras, “Multi-spectral false color shadow detection”, In Photogrammetric Image Analysis (pp. 109-119). Springer Berlin Heidelberg, 2011.
[20] R. E. Clark, D. Feldon, J. J. van Merrienboer, K. Yates and S. Early, “Cognitive task analysis”, Handbook of Research on Educational Communications and Technology, 3, 577-593, 2008.
[21] J. W. Boardman and F. A. Kruse, “Automated spectral analysis: a geological example using AVIRIS data, north Grapevine Mountains, Nevada”, In Proceedings of the Thematic Conference on Geologic Remote Sensing (Vol. 1, pp. I-407). Environmental Research Institute of Michigan, 1994.
[22] J. Wang, “A New Idea in the Clay Alteration Information Extraction in Vegetation Coverage”, Communications in Information Science and Management Engineering.
[23] A. A. Green, M. Berman, P. Switzer and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal”, Geoscience and Remote Sensing, IEEE Transactions on, 26(1), 65-74, 1988.
[24] J. W. Boardman, F. A. Kruse, and R. O. Green, “Mapping target signatures via partial unmixing of AVIRIS data”, In Proc. JPL Airborne Earth Sci. Workshop (Vol. 1, pp. 23-26), January, 1995.
[25] A. Harris and R. G. Bryant, “A multi-scale remote sensing approach for monitoring northern peatland hydrology: Present possibilities and future challenges”, Journal of Environmental Management, 90(7), 2178-2188, 2009.
[26] A. P. Williams and E. R. Hunt, “Estimation of leafy spurge cover from hyperspectral imagery using mixture tuned matched filtering”, Remote Sensing of Environment, 82(2), 446-456, 2002.
[27] C. I. Chang and D. C. Heinz, “Constrained subpixel target detection for remotely sensed imagery”, Geoscience and Remote Sensing, IEEE Transactions on, 38(3), 1144-1159, 2000.
[28] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal”, Geoscience and Remote Sensing, IEEE Transactions on, 26(1), 65-74, 1988.
[29] https://apollomapping.com/download-free-poster
[30] S. Aksoy, I. Z. Yalniz and K. Tasdemir, “Automatic detection and segmentation of orchards using very high resolution imagery”, Geoscience and Remote Sensing, IEEE Transactions on, 50(8), 3117-3131, 2012.
[31] A. O. Ok, “Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts”, ISPRS Journal of Photogrammetry and Remote Sensing, 86, 21-40, 2013.
[32] S. Ghaffarian, “Automatic building detection based on supervised classification using high resolution Google Earth images”, J. Photogr. Remote Sens., XL-3, 101-106, 2014.
