
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2020.3007883, IEEE Sensors Journal.

Invariant Feature based Darknet Architecture for Moving Object Classification

1S. Vasavi, Senior Member, IEEE, 2N. Kanthi Priyadarshini, 3K. Harsha Vardhan
1,2,3Department of Computer Science and Engineering, VR Siddhartha Engineering College, India
vasavi.movva@gmail.com, kanthi17.neerukonda@gmail.com, koneru.harshavardhan499@gmail.com

Submitted on 9th January 2020, Revised on 15th June 2020

Abstract— Object detection and classification is important for video surveillance applications. Counting vehicles such as cars, trucks and vans is useful for intelligent transportation systems to identify dense and sparse roads and to track loaded vehicles at country borders. Even though many solutions, such as appearance-based (Multi-block Local Binary Pattern) and model-based (DATMO algorithm) methods, have been proposed to classify moving objects within satellite images using machine learning and deep learning techniques, they suffer either from overfitting problems or from low performance. Hence these challenges have to be addressed while detecting and classifying objects. Instead of training classifiers with hand-crafted features, this paper uses neural network based object detection and classification to achieve promising accuracy, better than that of humans. The invariant feature concept is added to the existing Darknet architecture of You Only Look Once (YOLO) and is combined with Faster Region-Based Convolutional Neural Networks (Faster R-CNN) to count the number of vehicles at different spatial locations. This combined model improves the feature extraction step and the vehicle classification process. The proposed system is tested on two benchmark datasets, Cars Overhead with Context (COWC) and Vehicle Detection in Aerial Imagery (VEDAI), for counting cars and trucks. Experimental results show that the proposed system is better by 9% at detecting smaller objects than existing works.

Index Terms— Deep learning, Faster R-CNN, Neural networks, Object Classification, Satellite images, Vehicle detection, YOLO, Darknet

I. INTRODUCTION
In the era of surveillance with drones and remote sensing, solving research-significant applications such as the vehicle detection and classification problem with deep learning techniques at reduced cost has become important. Scale-invariant, rotation-invariant and dense object detection with good accuracy is an even more challenging task and requires a robust classifier. Machine vision software based on deep learning algorithms has become essential in many of the automated systems of Industry 4.0 that require object detection and classification tasks. Sensor based vehicle detection systems are used to solve the parking slot problem [Z. Zhang, M. Tao and H. Yuan (2015)]. Non-intrusive methods have been proposed as an alternate solution for vehicle detection in order to reduce establishment and maintenance costs [M. Rivas-López et al (2015)]. Wireless networks and wireless technologies, such as Radio Frequency Identification (RFID) based systems, are used to detect stolen vehicles and ambulances that require road clearance. Magnetic sensor network based vehicle detection can improve accuracy and reduce the signal-to-noise ratio [L. Zhang, R. Wang and L. Cui (2011)]. Such networks depend on similarity based algorithms to compare waveforms. Ultrasonic sensors are used to detect vehicles and save them from pothole accidents [R. Madli, S. Hebbar, P. Pattar and V. Golla (2015)]. Sensor based systems have been developed to classify vehicle movement in industrial monitoring target detection systems [X. Jin, S. Sarkar, A. Ray, S. Gupta and T. Damarla (2012)]. Radar based deep learning systems have been developed to distinguish different types of moving objects such as persons, vehicles and animals [X. Jin, S. Sarkar, A. Ray, S. Gupta and T. Damarla (2020)]. Vehicle detection on dense traffic roads is important to dynamically plan the signal lights at road junctions [S. Tuermer, F. Kurz, P. Reinartz and U. Stilla (2013)]. This requires detecting vehicles and estimating vehicle density periodically. Aerial imagery based vehicle detection requires a robust system to detect both small and large sized vehicles [Azam, S.; Rafique, A.; Jeon, M (2016)]. Vehicle detection accuracy in aerial images is not encouraging even with the usage of deep neural networks [Jiandan Zhong, Tao Lei and Guangle Yao (2017)]. Anchor generation methods have improved the accuracy of on-board vehicle detection [Wang, Y.; Liu, Z.; Deng, W (2019)]. Vehicle detection from satellite images faces the problem of limited appearance and size information [Yang, T.; Wang, X.; Yao, B.; Li, J.; Zhang, Y.; He, Z.; Duan, W (2016)]. Aerial imagery presents more detailed pixel information of vehicles than satellite images and hence can be used in applications that require real-time detection. Moving object detection from images with one meter resolution generated by earth observation systems can be used in surveillance applications such as security at country borders. Video satellites such as Skysat-1 and Jilin-1 can generate images with 0.92 meter resolution. Object detection in such videos is a challenging task because of occluded objects and shadow areas [Z. Hu, D. Yang, K. Zhang and Z. Chen (2020)]. Object detection accuracy with deep neural networks can be improved by extracting varied features [X. Chen, S. Xiang, C. Liu and C. Pan (2014)]. Rotation-invariant feature extraction methods help to detect vehicles with different pose and size [Y. Yu, H. Guan and Z. Ji (2015)]. Unmanned Aerial Vehicles (UAVs) that use DC motors for object detection can help in transport surveillance systems [L. Lindner et al (2016)]. Region based networks are proposed for


small object detection with respect to ground sampling distance in aerial images [L. Sommer, T. Schuchert and J. Beyerer (2019)]. Lightweight neural networks can reduce the computational cost of training, validation and testing for vehicle detection in aerial images [J. Shen, N. Liu, H. Sun and H. Zhou (2019)]. Context and scene specific feature detectors can reduce false alarm rates during vehicle detection [C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao and J. Zhang (2019)]. Region-of-interest based deep neural networks can predict the location of a vehicle that correlates to the bounding box of the object in the ground truth [W. Chu, Y. Liu, C. Shen, D. Cai and X. Hua (2018)]. Convolutional neural networks have enhanced machine vision with diverse technologies such as Artificial Neural Networks (ANN), recurrent networks and deep neural networks that enabled increased object detection accuracy [J. M. Gandarias, A. J. García-Cerezo and J. M. Gómez-de-Gabriel (2019)]. A number of self-learning based object detection architectures have been developed to automate the detection-classification workflow. Different stereoscopic vision systems, methods and implementations are explained in [Ramírez-Hernández L.R. et al. (2020)].

A. Motivation
Satellite image analysis helps in a wide variety of applications in both commercial and government sectors. Vehicle detection, as an active research area, has been widely used in military surveillance, intelligent traffic systems [Huang, Xiaohui, Pan He, Anand Rangarajan, and Sanjay Ranka (2019)] and maritime search and rescue. Human operators cannot monitor for long time periods. Also, detecting objects such as cars, trucks, aircraft and ships from high-resolution satellite images is a difficult task. Although various approaches attempt to solve this problem, there is no widely recognized solution. The difficulties mainly lie in three aspects: the diversity of colors and shapes of different vehicles, complex backgrounds and occlusions. Various object location methods have been applied to vehicle detection. Traditional approaches that use hand-crafted features such as Haar features, Scale Invariant Feature Transform (SIFT), Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG) for detecting moving objects have a high false alarm rate [Shugang Zhang, Zhiqiang Wei, Jie Nie, Lei Huang, Shuang Wang, Zhen Li (2017)]. Deep learning approaches for feature extraction are based on varied convolutional neural networks such as Region-based Convolutional Neural Networks (R-CNN), Faster R-CNN, Region-based Fully Convolutional Network (R-FCN), VGG-16 [K. Simonyan and A. Zisserman (2014)] and Residual Neural Network (ResNet). These approaches face accuracy and speed problems and are not suitable for real-time environments. As such, we require an automated approach that can improve vehicle detection accuracy.

B. Contribution
This paper proposes a combined system of both YOLO and Faster R-CNN for detection, classification and counting of cars and trucks in satellite images. Rotation-invariant features help in improving the detection accuracy of an object with different appearances. This combined system overcomes the problem of detecting small objects and gives the count of different vehicles that can be used for further processing in applications such as intelligent transportation systems.

C. Organization
The paper is organized as follows: Section 2 describes a literature survey on object detection and classification methods. Section 3 describes the proposed framework. Results and discussion are given in Section 4.

II. LITERATURE SURVEY
Work reported in [Shenquan Qu, Ying Wang, Gaofeng Meng, and Chunhong Pan (2016)] explores vehicle detection from satellite images in two stages. Binary Normed Gradients (BING) are used to extract regions and to speed up the localization process. A CNN is used for feature extraction and classification. The first stage generates category-independent region proposals. These proposals are the input data for the next stage. The second stage then uses the CNN to decide which proposals are vehicles. As vehicle detection requires localizing objects within an image, a commonly used approach that has been used for several decades is the sliding window based detector. This method is not practical since it is time consuming. The authors used satellite images of San Francisco city from Google Earth for implementation. Work described in [Qiling Jiang, Liujuan Cao, Ming Cheng, Cheng Wang, Jonathan Li (2015)] presents vehicle detection from satellite images using Deep Neural Networks (DNN). First, road segments are extracted. Graph based segmentation is used to extract image patches. A DNN is trained with these patches, which are finally classified into vehicle and non-vehicle classes. The ImageNet dataset is used by the authors. The images are of various sizes and are divided into 1000 classes. The training set contains about 1000 images of each class, which results in about 1.28 million images. Testing is done with 50000 and 150000 images from the same 1000 categories. They did not use the validation or test set in their work. The advantage is that their system achieved excellent performance in object recognition and could detect both bright and dark vehicles. The drawback is that it could detect vehicles on-road only. Authors of [Yohei Koga, Hiroyuki Miyazaki and Ryosuke Shibasaki (2018)] described hard example mining for detection using the R-CNN algorithm. USGS aerial ortho images are used by the authors to test their system. This method is time consuming and did not consider balanced training data. [Mundhenk, T.N, Konjevod, G, Sakla, W.A, Boakye, K (2016)] used a parallel DNN for detection that detects and counts cars independently of scene and location. It considers only cars and not other objects such as trucks and vans. Oriented_SSD (Single Shot MultiBox Detector, SSD) is described in [Tianyu Tang, Shilin Zhou, Zhipeng Deng, Lin Lei and Huanxin Zou (2017)]. This method produces errors in orientation estimation because of false and missing detections. Authors of [Konoplich, G.V.; Putin, E.O.; Filchenkov, A.A. (2016)] proposed an adapted hybrid neural network


(HDNN) to detect vehicles in real-time. This method is not practical since it is time consuming. A multi-scale deep CNN (MS-CNN) for object detection is described in [Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. (2016)]. This method requires large memory for input up-sampling. [Azam, S.; Rafique, A.; Jeon, M (2016)] described a Faster R-CNN based method for estimating the pose of the vehicle. Their method could not detect small objects and considers only cars. [Zhang, Huan, Cai Meng, Xiangzhi Bai, and Zhaoxi Li (2018)] also described estimating the pose of a vehicle using an arc based ellipse fitting method. It concentrates on how to improve detection accuracy in low resolution images using ellipse parameters. Region based networks could not detect small objects. A hyper region proposal network (HRPN) is described in [Tang T, Zhou S, Deng Z, Zou H, Lei L (2017)]. Performance is sometimes reduced because of background objects. Work reported in [Sang J, Guo P, Xiang Z, Luo H, Chen X (2017)] used three different convolutional neural networks, R-CNN, VGG16 and ResNet-152, to achieve good recognition accuracy. This method may not be suitable for real-time implementation because of its speed. For detecting small objects in satellite images, [Adam Van Etten (2018)] described a pipeline called You Only Look Twice (YOLT) that outputs bounding boxes around the objects. YOLO 1 [J. Redmon, S. Divvala, R. Girshick and A. Farhadi (2016)] and YOLO 2 [J. Redmon and A. Farhadi (2017)] were developed as an alternative to Faster R-CNN for vehicle detection. The Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) framework is proposed in [Adam Van Etten (2018)], which combines YOLT with the TensorFlow Object Detection Application Program Interface (API) for better object detection. Both methods could not differentiate features of highways and runways. Faster R-CNN [Qu T., Zhang Q., Sun S (2017)][Tang T., Zhou S., Deng Z., Zou H., Lei L (2017)] generates region proposals having foreground objects in the first step and then classifies these objects in the second step. The computational cost of this method is higher. Authors of [Dmitry Sincha, Mikhail Chervonenkis and Pavel Skribtsov (2016)] described a method to detect objects at multiple scales. The Munich dataset is used for experimentation. Vehicles are classified into three classes: heavy, light and middle. Performance can be increased by reducing detection quality. Work presented in [Jiandan Zhong, Tao Lei and Guangle Yao (2017)] described a cascaded CNN-based detection model. Partially occluded objects are not detected. Also, this method cannot distinguish between intra-class objects. [Hadj-Sahraoui, Omar, Hadria Fizazi, Faouzi Berrichi, Djemoui Chamakhi, and Lahcen Wahib Kebir (2019)] and [Atta, Randa, and Mohammad Ghanbari (2013)] discussed methods for improving the resolution of images. Various functions for performance evaluation are discussed in [Loss functions (2019)], such as cross entropy, hinge, Huber, Kullback-Leibler, Mean Absolute Error (MAE) and Mean Squared Error (MSE). The cross-entropy loss function is used in the proposed system to deal with overlapping multi-class labels. This function outputs the classifier performance as a probability value between 0 and 1. If the output value is less than the predicted value, it means it is nearing the actual label. Our proposed system uses Darknet based YOLO for detection, because existing object detection systems could not accurately locate small objects [Guo X. Hu, Zhong Yang, Lei Hu, Li Huang, and Jia M. Han (2018)], and Faster R-CNN is used for classification.

III. PROPOSED SYSTEM
Even though the mean average precision (mAP) of Faster R-CNN is good, it takes more processing time, at 5 to 18 frames per second (fps) [J. Shen, N. Liu, H. Sun and H. Zhou (2019)]. Also, R-CNN requires a fixed input size [H. Chen, Z. He, B. Shi and T. Zhong (2019)]. YOLO v3 runs at up to 155 fps and is fast at object detection without compromising on accuracy. As such, object detection is done using YOLO and classification using Faster R-CNN. Figure 1 presents the workflow of the proposed system.

In the first step, YOLO predicts all the bounding boxes containing objects in the given input satellite image. In the second stage, the convolutional network of Faster R-CNN verifies the detected regions and classifies the objects in them. YOLO determines the locations within the image where there is a possibility of an object's presence. The advantage of using YOLO is that, instead of using a pipeline of steps for detection, which is a slow process, it detects using a single neural network. Faster R-CNN produces more false positives than YOLO. Hence YOLO is used for detection and Faster R-CNN for classification. The advantage of combining YOLO with Faster R-CNN is to detect smaller objects and to predict more than one class.
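The two-stage flow described above can be summarized in pseudocode. The following is a minimal sketch, not the authors' released implementation; yolo_detect and faster_rcnn_classify are hypothetical stand-ins for the trained YOLO v3 detector and the Faster R-CNN classifier.

# Hypothetical stand-ins; real implementations would wrap the trained
# YOLO v3 network and the Faster R-CNN classification head.
def yolo_detect(image):
    return []              # yields [(box, objectness_score), ...]

def faster_rcnn_classify(image, box):
    return "car", 1.0      # (class label, class probability)

def detect_and_classify(image, conf_threshold=0.5):
    """Stage 1: YOLO localizes objects; Stage 2: Faster R-CNN classifies them."""
    counts = {"car": 0, "truck": 0, "others": 0}
    final_detections = []
    for box, score in yolo_detect(image):
        if score <= conf_threshold:        # discard weak detections
            continue
        label, class_prob = faster_rcnn_classify(image, box)
        final_detections.append((box, label, class_prob))
        counts[label] += 1
    return final_detections, counts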
object detection. Both methods could not differentiate features
for highways and runways. Faster R-CNN [Qu T., Zhang Q., Image size of 416X416 is taken as input. Initially YOLO is
Sun S (2017)][ Tang T., Zhou S., Deng Z., Zou H., Lei L used for detecting the probability of bounding box. A
(2017)] generates region proposals having foreground objects bounding box with high probability (more than 0.5) is passed
in the first step and then classifies these objects in the second to Faster R-CNN for final classification. Faster R-CNN is
step. Computational cost of this method is higher. Authors of slow and this can be solved by combining YOLO v3 [Joseph
[Dmitry Sincha, Mikhail Chervonenkis and Pavel Skribtsov Redmon, Ali Farhadi (2019), Kim, Daeho, Meiyin Liu,
(2016)] described a method to detect objects in multiple SangHyun Lee, and Vineet R. Kamat (2019)] (that uses one
scales. Munich Dataset is used for experimentation. Vehicles stage detector strategy) with Faster R-CNN without having to
are classified into three classes such as heavy, light and use much expensive hardware. One extra step is added to this
middle. Performance can be increased by reducing detection hybrid architecture of Faster R-CNN and YOLO, which is to
quality. Work presented in [Jiandan Zhong, Tao Lei and augment rotations, so as to handle the unusual orientation of
Guangle Yao (2017)] described CNN-based detection model the object. Rotation invariant features are used by [Y. Yu, H.
using convolutional neural networks. Partially occluded Guan and Z. Ji(2015)] to estimate object centroid. Whereas the
objects are not detected. Also this method cannot distinguish present system used augment rotations to increase the training
between intra class objects. [Hadj-Sahraoui, Omar, Hadria set. The output from YOLO is generated after applying 1X1
Fizazi, Faouzi Berrichi, Djemoui Chamakhi, and Lahcen kernel on the feature map. Kernel size is taken as 1X1X (3X
Wahib Kebir (2019), Atta, Randa, and Mohammad (5+3)) = 1X1X24, for 3 bounding boxes and 3 classes. 1X1
Ghanbari[2013)] discussed on methods for improving the kernel size results in non-spatially correlated information loss,
resolution of images. Various functions for performance but can benefit in reduced over fitting problem.
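The detection kernel depth above follows from B X (5 + C): each of the B bounding boxes carries 4 box coordinates, 1 objectness score and C class scores. A small sketch of this arithmetic and of sigmoid-based multi-label class confidence (used instead of softmax, as discussed below) follows; the example logits are illustrative assumptions, not values from the trained network.

import numpy as np

def detection_kernel_depth(num_boxes=3, num_classes=3):
    # B x (5 + C): 4 box offsets + 1 objectness + C class scores per box.
    return num_boxes * (5 + num_classes)

print(detection_kernel_depth())   # -> 24, i.e. a 1x1x24 kernel

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Independent sigmoids allow overlapping multi-class labels, unlike
# softmax, which assumes mutually exclusive classes.
raw_class_scores = np.array([1.2, -0.4, 0.1])  # car, truck, others (example logits)
print(sigmoid(raw_class_scores))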
Different detection sizes are taken for car and truck. The feature map has an identical height and width of 416. Multi-label prediction (with logistic regression) is done at down-sampled dimensions of the input image, with strides 32, 16 and 8 corresponding to the 13 x 13, 26 x 26 and 52 x 52 scales. Softmax activation is not used in the proposed system because it assumes mutually exclusive classes.


Figure 1. Workflow of the proposed system (input satellite image, augment rotation, feature map generation, 3X3 grid, bounding box and confidence score using the sigmoid function, final detections).

Hence the Sigmoid function is used for determining the class confidence. Darknet-53 [Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019)] with the Sigmoid function is shown in Figure 2. YOLO v3, which has 106 convolutional layers (53 from Darknet trained on ImageNet and an additional 53 for the detection task), is used for feature extraction. The sigmoid function is added to the architecture. The bottom three layers are used to detect objects at different scales.

The network downsamples the input image as explained in [Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019)]; at layer 81, with stride 32, the 1 x 1 detection kernel gives a feature map of 13 x 13 x 8 for an image size of 416X416. Upsampling is done by a factor of 2. Detections are made at layer 94 and layer 106, with strides 16 and 8 respectively. Upsampling helps in detecting small objects, where the network learns fine-grained features. Selective search, which uses color and texture properties, is used to classify the regions given by YOLO; it can reduce the number of bounding boxes to be analyzed. An Intersection over Union (IoU) overlap of 0.3 is taken for a positive prediction, for small objects too, as given in Eq(1).

IoU = Area of Overlap / Area of Union (1)
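As a concrete illustration of Eq(1), the sketch below computes IoU for axis-aligned boxes in (x1, y1, x2, y2) corner format; the box format and the example values are illustrative assumptions, while the 0.3 threshold check mirrors the overlap criterion stated above.

def iou(box_a, box_b):
    """Intersection over Union (Eq. 1) for boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as positive when it overlaps ground truth by at least 0.3.
print(iou((0, 0, 20, 20), (5, 5, 25, 25)) >= 0.3)   # -> True (IoU ~ 0.39)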
Type                Filters   Size        Output
Convolutional       32        3 X 3       256 X 256
Convolutional       64        3 X 3 / 2   128 X 128
1X  Convolutional   32        1 X 1
    Convolutional   64        3 X 3
    Residual                              128 X 128
Convolutional       128       3 X 3 / 2   64 X 64
2X  Convolutional   64        1 X 1
    Convolutional   128       3 X 3
    Residual                              64 X 64
Convolutional       256       3 X 3 / 2   32 X 32
8X  Convolutional   128       1 X 1
    Convolutional   256       3 X 3
    Residual                              32 X 32
Convolutional       512       3 X 3 / 2   16 X 16
8X  Convolutional   256       1 X 1
    Convolutional   512       3 X 3
    Residual                              16 X 16
Convolutional       1024      3 X 3 / 2   8 X 8
4X  Convolutional   512       1 X 1
    Convolutional   1024      3 X 3
    Residual                              8 X 8
(Different scaled objects are detected from the bottom blocks; the Sigmoid function follows.)

Figure 2. Architecture of Darknet-53


A. Methodology
1. Load the input image of size 416X416 and extract its dimensions. The images from the dataset have a spatial resolution of 15 cm Ground Sampling Distance (GSD).
2. For each image, create additional training images using the rotation set [0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180] (in degrees; a code sketch of this augmentation step follows the list).
3. Divide the input image into grids.
4. Assign the detection kernel size.
5. Train the model with labeled data. Convert each raw image file into matrix notation and divide the matrix into steps. Now form grids of size 13 x 13 to extract the training set (a total of 308988 grids for the COWC dataset) and the testing set (a total of 79447 grids for the COWC dataset) from large overhead scenes, without any overlap between the two sets. Here we have 3 classes for prediction: "car", "truck" and "others". The eight-dimension vector y of a grid cell is given as follows:
y = [pbo, dx, dy, dh, dw, o1, o2, o3]
Here,
• pbo specifies the object presence probability
• dx, dy, dh, dw specify the dimensions of the object
• o1, o2, o3 represent the classes car, truck and others
• If no object is found then o1, o2, o3 will be zero
6. Pass the objectness score pbo to the Sigmoid function (the output is squashed to a range of 0 to 1).
7. Repeat for all the boxes: any box with a probability less than or equal to 0.5 (threshold) is discarded. Output the box with the highest probability.
8. Image Location Detection: Data labels consist of an image mask where non-zero pixels denote the object centroid. Bounding box labels are created by assuming a mean car size of 3.0 meters and 4.0 meters for the truck, and transforming the image mask into bounding boxes of 20 pixels on a side centered on the label point.
9. Logistic regression for Vehicle Classification:
(a) Extract the object locations (width and height of the object) and save them into a matrix
(b) Draw a patch (bounding box) for each object
(c) Remove patches whose shape exceeds the shape of the matrix
(d) Read the image patch and write the locations and dimensions of the object into a text file
(e) Determine the step location of each object
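The rotation augmentation in step 2 can be sketched with OpenCV as follows. This is a minimal illustration under the assumption that plain rotation about the image center (with the labels rotated accordingly, omitted here) is what the augmentation performs; it is not the authors' exact preprocessing script.

import cv2
import numpy as np

ROTATION_SET = [0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180]

def augment_rotations(image):
    """Yield one rotated copy of `image` per angle in the rotation set."""
    h, w = image.shape[:2]
    center = (w / 2.0, h / 2.0)
    for angle in ROTATION_SET:
        m = cv2.getRotationMatrix2D(center, angle, 1.0)  # rotate about center
        yield angle, cv2.warpAffine(image, m, (w, h))

# Example: expand one 416x416 training tile into 13 rotated variants.
tile = np.zeros((416, 416, 3), dtype=np.uint8)           # placeholder image
print(len(list(augment_rotations(tile))))                # -> 13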
B. Dataset Description
1. The COWC dataset is used for experimentation in this work. This dataset consists of a large set of annotated cars and trucks from six locations: Toronto (Canada), Selwyn (New Zealand), Potsdam and Vaihingen (Germany), and Columbus and Utah (United States). The dataset has negative examples, which help for better classification. It also provides extra testing scenes for use after validation. It can be downloaded from [Cars Overhead With Context (COWC) (2018)]. It was observed that the Vaihingen and Columbus city images are grayscale and the images of the remaining four locations are in RGB.
2. The VEDAI dataset [Razakarivony, S. and Jurie, F. (2015)] consists of nine classes with 1250 images. For our experimentation we took the car and truck classes only.

IV. PERFORMANCE EVALUATION
A. Experimental Setup
OpenCV with Python 3.2 is used for implementing the algorithms of the proposed system. Our experimentation is done on a CPU and, as such, each image took nearly 12 seconds. The system initially creates patches and then proceeds to detection. A minimum probability of 0.5 is considered to filter out weak detections. The non-maxima suppression threshold for suppressing multiple detections of the same object is set to a default value of 0.5. YOLO is trained for about 160 epochs; the COWC dataset contains 32553 images, so the total number of iterations required is 160 * 509. The learning rate is taken as 0.9. Predefined YOLO weights available at [Pretrained Weight file (2019)] are used in the implementation. Open source code for Darknet-53 available at [Darknet (2018)] is customized for our experimentation to add invariant feature extraction, and all default values are used, such as a weight decay of 0.005, height and width of 608, and a batch size of 64. Patch labels are created for each dataset.
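The confidence filtering and non-maxima suppression just described can be sketched with OpenCV's built-in NMS helper. The thresholds (0.5 and 0.5) are those stated above; the surrounding data layout is an illustrative assumption.

import cv2
import numpy as np

CONF_THRESHOLD = 0.5   # minimum probability to keep a detection
NMS_THRESHOLD = 0.5    # IoU threshold for suppressing duplicate boxes

def filter_detections(boxes, scores):
    """boxes: list of [x, y, w, h]; scores: matching confidence list."""
    keep = [(b, s) for b, s in zip(boxes, scores) if s > CONF_THRESHOLD]
    if not keep:
        return []
    kept_boxes = [b for b, _ in keep]
    kept_scores = [float(s) for _, s in keep]
    # cv2.dnn.NMSBoxes returns the indices of boxes that survive suppression.
    idxs = cv2.dnn.NMSBoxes(kept_boxes, kept_scores, CONF_THRESHOLD, NMS_THRESHOLD)
    return [(kept_boxes[i], kept_scores[i]) for i in np.array(idxs).flatten()]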
Tables 1 to 3 present the results of the proposed system for detecting cars and trucks in the COWC and VEDAI datasets. It can be observed that false positives are few, because YOLO predicts object boundaries after a thorough scan of the entire image. A few objects are not detected because of cluttered background areas. The IoU of 0.3 also helped in reducing the number of false positives. It was observed that a few objects were not detected, as shown in these tables, because of the following reasons:
(i) Occlusions
(ii) The basic nature of YOLO, which looks only once over overlapping objects and merges them into a single object
(iii) The threshold of 0.5 set to discard weak detections (partial objects are not detected)

Table 1: Results of the proposed system for detecting cars in the COWC dataset
S.No  Dataset    Images  No. of cars  No. of cars detected
1     Columbus   1301    1281         1273
2     Potsdam    7046    6770         5770
3     Selwyn     4828    4779         4709
4     Toronto    7485    7287         7198
5     Utah       8145    8006         7939
6     Vaihingen  3748    3680         3568
      Total      32553   31803        30457

Table 2: Results of the proposed system for detecting trucks in the COWC dataset
S.No  Dataset    Images  No. of trucks  No. of trucks detected
1     Columbus   1301    11             8
2     Potsdam    7046    43             21
3     Selwyn     4828    24             13
4     Toronto    7485    29             14
5     Utah       8145    13             6
6     Vaihingen  3748    58             28

Table 3: Results of the proposed system for detecting cars and trucks in the VEDAI dataset
S.No  Object  Images  No. of objects  No. of detections
1     Car     1250    5875            4678
2     Truck   1250    343             301

B. Performance Evaluation Measures
Performance of the proposed system is calculated using several measures, as described in the following:
(i) Confusion matrix: It is a table with True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN), used to visualize classification algorithm performance, as shown in Table 4.

Table 4 Confusion Matrix [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)]
            Predicted a=0   Predicted b=1
Actual a=0  TP              FP
Actual b=1  FN              TN

(ii) Classification Accuracy Rate (CAR): This is measured based on the confusion matrix, as given in Eq(2) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
(iii) Precision: It is used to measure the relevancy of the result generated, as defined in Eq(3) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
Precision = TP / (TP + FP) (3)
(iv) Recall: It is used to measure the relevancy of the result generated, as given in Eq(4) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
Recall = TP / (TP + FN) (4)
(v) F-Measure, as given in Eq(5) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
F1 = 2 * (Precision * Recall) / (Precision + Recall) (5)
(vi) Cross entropy loss function H, as given in Eq(6) [Murphy, Kevin (2012)]. This value can be between 0 and 1; the lower this value, the more robust the developed model.
H(p, q) = − Σ_x p(x) log q(x) (6)
where p(x) is the required probability and q(x) is the predicted probability.
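The following sketch computes the measures of Eqs (2)-(5) from raw confusion-matrix counts, together with the discrete cross entropy of Eq(6). It is a generic illustration of these standard formulas, not the evaluation script used to produce the reported tables; the example counts are hypothetical.

import math

def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (Eqs 2-5)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x), Eq (6); eps guards against log(0)."""
    return -sum(px * math.log(qx + eps) for px, qx in zip(p, q))

# Example with illustrative counts (not taken from the reported tables):
print(evaluation_metrics(tp=90, tn=50, fp=10, fn=5))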
Initially, accuracy is calculated to evaluate the proposed model. Other measures, such as Precision and Recall, are used to evaluate the performance of the proposed system, and the F1 score is calculated to find the balance between precision and recall.

Tables 5 and 6 present the accuracy of the proposed system for detecting cars and trucks in the COWC dataset. Table 7 presents the accuracy of the proposed system for detecting cars and trucks in the VEDAI dataset.

Table 5 Accuracy of the proposed system for cars on COWC
S.No  Dataset    Precision  Recall  Accuracy
1     Columbus   99.77      99.61   99.38
2     Potsdam    99.89      85.22   85.17
3     Selwyn     99.62      98.54   98.17
4     Toronto    99.97      98.78   98.75
5     Utah       99.96      99.16   99.13
6     Vaihingen  100        96.96   96.96
      Average    99.87      96.38   96.26

Table 6 Accuracy of the proposed system for trucks on COWC
S.No  Dataset    Precision  Recall  Accuracy
1     Columbus   80         72.73   73.68
2     Potsdam    87.5       48.84   52.8
3     Selwyn     92.86      54.17   63.64
4     Toronto    77.78      56      60.53
5     Utah       75         46.15   52.63
6     Vaihingen  96.55      48.28   51.56
      Average    84.95      54.36   59.14

Table 7 Accuracy of the proposed system on the VEDAI dataset
S.No  Object  Precision  Recall  Accuracy
1     Cars    99.13      82.41   82.45
2     Trucks  94.06      89.05   86.51

It can be observed that Precision and Recall for the Utah region are lower when compared to other regions, because of factors such as car density, building architecture and vegetation pattern. Table 8 presents the accuracy of the proposed system for detecting cars and trucks in both datasets.

Table 8 Accuracy of the proposed system for both datasets
                 Precision        Recall           Accuracy
S.No  Dataset    Cars    Trucks   Cars    Trucks   Cars    Trucks
1     COWC       99.87   84.95    96.38   54.36    96.26   59.14
2     Vedai      99.13   94.06    82.41   89.05    82.45   86.51

Figures 3 and 4 present the accuracy of the proposed system at each epoch.

Figure 3. Accuracy for COWC

Figure 4. Accuracy for VEDAI

Tables 9 and 10 compare the accuracy of the proposed system with existing works. Table 11 presents a summary of the false positives in both datasets for cars and trucks. True positives, false positives, true negatives and false negatives are aggregated into single values, as shown in these tables; the aggregated F-Measure is 0.83 on COWC and 0.90 on VEDAI.

Table 9 Proposed system vs existing works on COWC
Performance Metric  Proposed system  [Mundhenk, T.N. et al. (2016)]  [David Yu (2018)]  [Junyan Lu et al. (2018)]
Accuracy            96.26            89.29                           85                 95.32
Precision           92.41            92.59                           -                  -
Recall              75.37            96.15                           -                  92.1
F-Measure           0.83             0.94                            -                  -

Table 10 Proposed system vs existing works on VEDAI
Performance Metric  Proposed system  [Dmitry Sincha et al. (2016)]  [Jiandan Zhong et al. (2017)]
Accuracy            82.45            -                              -
Precision           96.6             75.8                           -
Recall              85.73            85                             80.3
F-Measure           0.90             0.801                          0.782

It can be observed from Table 10 that the proposed system performed well in extracting small objects when compared to [Jiandan Zhong, Tao Lei and Guangle Yao (2017)].

Table 11 False positives for the two datasets
S.No  Dataset  False Positives
               Car    Truck
1     COWC     32     13
2     Vedai    41     19

The cross entropy loss function value is 0.4 for the COWC dataset and 0.56 for the VEDAI dataset. These probability values tell how easily the model can detect the given objects (e.g. spread between 0-1 or 0-0.5).

Figure 5 presents the precision-recall curve for both datasets on cars and trucks. For example, in the COWC dataset, if we choose precision at 92%, then 84% of cars are detected, and if the precision level is equal to 0.7, then 10% of the cars are detected. If recall increases to 93%, then the precision drops to 20%. When both cars and trucks are considered together, the precision comes to 92.41. The true positive rate (TPR) and false positive rate (FPR), given by Eq(7) and Eq(8), are shown in Table 12 to determine the parameters of the Receiver Operating Characteristics (ROC) curve shown in Figure 6, which validates the proposed model. If this curve is close to either the left border or the top border, then the test is accurate; if it comes near the 45° diagonal, then the model test is less accurate.
True Positive Rate = TP / (TP + FN) (7)
False Positive Rate = FP / (FP + TN) (8)
The least squares method, as explained in [Ramírez-Hernández, L. R., Rodríguez-Quiñonez, J. C., Castro-Toscano, M. J., Hernández-Balbuena, D., Flores-Fuentes, W., Rascón-Carmona, R., & Sergiyenko, O. (2020)], is an alternative method to model the camera calibration error.
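Eqs (7) and (8) can be computed directly from confusion-matrix counts; a small sketch follows. The example counts are hypothetical, chosen only so that the output reproduces the COWC (cars) row of Table 12 below (TPR 0.96, FPR 0.37).

def roc_point(tp, fp, tn, fn):
    """Return (TPR, FPR) per Eqs (7) and (8) for one operating point."""
    tpr = tp / (tp + fn)   # Eq (7): sensitivity / recall
    fpr = fp / (fp + tn)   # Eq (8): fall-out
    return tpr, fpr

# Sweeping the detection threshold and collecting (FPR, TPR) pairs traces
# the ROC curve of Figure 6; one hypothetical operating point:
print(roc_point(tp=96, fp=37, tn=63, fn=4))   # -> (0.96, 0.37)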


Figure 5. Precision-Recall curve

Figure 6. ROC curve

Table 12: True positive rate and false positive rate for the two datasets
S.No  Dataset         TPR   FPR
1     COWC (cars)     0.96  0.37
2     COWC (trucks)   0.52  0.003
3     Vedai (cars)    0.82  0.17
4     Vedai (trucks)  0.89  0.25

Figures 7 to 9 present the Intersection over Union (IoU) for both datasets, for cars and trucks. IoU values of 0.3, 0.4 and 0.5 were tried in order to fix the value of IoU. As can be seen from these figures, an IoU of 0.3 is better than the other IoU values when we compare the ground truth and detection boxes.

The proposed system achieved a mean Average Precision (mAP) of 92.41 for the COWC dataset and 96.43 for the VEDAI dataset. The higher mAP for VEDAI shows better performance for vehicle detection.

Figure 7. Intersection over Union (IoU) on cars for COWC

Figure 8. Intersection over Union (IoU) on trucks for COWC

Figure 9. Intersection over Union (IoU) for Vedai

V. CONCLUSIONS & FUTURE WORK
Detecting small objects with arbitrary orientations in satellite images is a challenging task. This paper investigated integrating YOLO (augmented with rotation) with Faster R-CNN to detect and classify vehicles. The first step in this integrated model is a regression step to find probable anchor boxes that may contain vehicles, and the second step is to classify the vehicle in each anchor box. The proposed system uses a convolutional kernel filter to predict bounding boxes


containing objects, along with the bounding box locations and the probability of each class, to achieve high accuracy. A Jaccard index (IoU) of 0.3, 0.4 or 0.5 is taken to consider a match between the predicted box and the ground truth box. Multi-label classification using logistic regression is done, and all bounding boxes with high scores are considered for the classification. Additional training images are added using the rotation step, and this process helped our proposed system to achieve a better detection rate. The robustness of the proposed system is evaluated using several performance measures. Objects are detected at different scales. Evaluation results on the two benchmark datasets COWC and VEDAI proved that our method can detect and classify with better accuracy. We faced difficulty in differentiating between camping cars and big vans because of the region size set for car and truck. Partially occluded objects are not detected. Even though our method can distinguish intra-class objects, identifying the appropriate region size is to be strengthened.

Our future work is to detect objects with partial occlusion and to identify the appropriate region size to distinguish intra-class vehicles. The proposed system will also be evaluated on other objects, such as ships and aircraft, and on its usage in on-board systems, so as to fine tune the system to detect generic objects.

REFERENCES
[1] Shenquan Qu, Ying Wang, Gaofeng Meng, and Chunhong Pan (2016), Vehicle Detection in Satellite Images by Incorporating Objectness and Convolutional Neural Network, Journal of Industrial and Intelligent Information, Vol. 4, No. 2, March 2016, pp. 158-162.
[2] Qiling Jiang, Liujuan Cao, Ming Cheng, Cheng Wang, Jonathan Li (2015), Deep neural networks-based vehicle detection in satellite images, 2015 International Symposium on Bioelectronics and Bioinformatics (ISBB).
[3] Yohei Koga, Hiroyuki Miyazaki and Ryosuke Shibasaki (2018), A CNN-Based Method of Vehicle Detection from Aerial Images Using Hard Example Mining, Remote Sens. 2018, 10, 124, pp. 1-21, doi:10.3390/rs10010124.
[4] Mundhenk, T.N.; Konjevod, G.; Sakla, W.A.; Boakye, K. (2016), A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning. In Lecture Notes in Computer Science, Proceedings of the ECCV 2016; Springer: Volume 9907, pp. 785-800.
[5] Razakarivony, S. and Jurie, F. (2015), Vehicle Detection in Aerial Imagery: A Small Target Detection Benchmark. Journal of Visual Communication & Image Representation, 34, 187-203. https://doi.org/10.1016/j.jvcir.2015.11.002
[6] Cars Overhead With Context (COWC) (2018), https://gdo152.llnl.gov/cowc/, last accessed August 1st 2018.
[7] Pretrained Weight file (2019), https://pjreddie.com/media/files/yolov3.weights, February 1st 2019.
[8] Tianyu Tang, Shilin Zhou, Zhipeng Deng, Lin Lei and Huanxin Zou (2017), Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks, Remote Sensing, 2017, 9, 1170.
[9] Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019), Development of a Mushroom Growth Measurement System Applying Deep Learning for Image Recognition, Agronomy 2019, 9(1), 32, pp. 1-21.
[10] S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018), "Chapter 12: Moving Object Classification in a Video Sequence Using Invariant Feature Extraction", IGI Global, 2018.
[11] Murphy, Kevin (2012), Machine Learning: A Probabilistic Perspective. MIT Press.
[12] David Yu (2018), Parking Lot Vehicle Detection Using Deep Learning, 2018, https://medium.com/geoai/parking-lot-vehicle-detection-using-deep-learning-49597917bc4a, last accessed 23-9-2018.
[13] Junyan Lu, Chi Ma, Li Li, Xiaoyan Xing, Yong Zhang, Zhigang Wang, Jiuwei Xu (2018), A Vehicle Detection Method for Aerial Image Based on YOLO, Journal of Computer and Communications, 2018, pp. 98-107.
[14] Joseph Redmon, Ali Farhadi (2019), YOLOv3: An Incremental Improvement, arXiv:1804.02767, Mar. 2018, [online] Available: https://arxiv.org/abs/1804.02767.
[15] Konoplich, G.V.; Putin, E.O.; Filchenkov, A.A. (2016), Application of deep learning to the problem of vehicle detection in UAV images. In Proceedings of the 2016 XIX IEEE International Conference on Soft Computing and Measurements (SCM), St. Petersburg, Russia, 25-27 May 2016; pp. 4-6.
[16] Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. (2016), A unified multi-scale deep convolutional neural network for fast object detection. In Proceedings of the 2016 European Conference on Computer Vision, Amsterdam, The Netherlands, 8-16 October 2016; pp. 354-370.
[17] Azam, S.; Rafique, A.; Jeon, M. (2016), Vehicle pose detection using region based convolutional neural network. In Proceedings of the International Conference on Control, Automation and Information Sciences (ICCAIS), Ansan, Korea, 27-29 October 2016; pp. 194-198.
[18] Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. (2017), Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors 2017, 17, 336.
[19] Sang, J.; Guo, P.; Xiang, Z.; Luo, H.; Chen, X. (2017), Vehicle detection based on faster R-CNN. J. Chongqing Univ. (Nat. Sci. Ed.) 2017, 40, 32-36.
[20] Adam Van Etten (2018), Satellite Imagery Multiscale Rapid Detection with Windowed Networks, Computer Vision and Pattern Recognition, 2018, pp. 1-12.
[21] Qu T., Zhang Q., Sun S. (2017), Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimedia Tools Appl. 2017; 76:21651-21663.
[22] Tang T., Zhou S., Deng Z., Zou H., Lei L. (2017), Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors. 2017; 17:336.
[23] Darknet (2018), https://github.com/pjreddie/darknet/blob/master/cfg/darknet53.cfg, last accessed 12-09-2018.
[24] Loss functions (2019), https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html, last accessed on 2-2-2019.
[25] Adam Van Etten (2018), You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery, pp. 1-8, 2018.
[26] Dmitry Sincha, Mikhail Chervonenkis and Pavel Skribtsov (2016), Vehicle Detection and Classification in Aerial Images, Indian Journal of Science and Technology, Vol 9(48), 2016, pp. 1-7.
[27] Jiandan Zhong, Tao Lei and Guangle Yao (2017), Robust Vehicle Detection in Aerial Images Based on Cascaded Convolutional Neural Networks, Sensors 2017, 17, 2720, pp. 1-17.
[28] Shugang Zhang, Zhiqiang Wei, Jie Nie, Lei Huang, Shuang Wang, and Zhen Li (2017), A Review on Human Activity Recognition Using Vision-Based Method, Journal of Healthcare Engineering, Volume 2017.
[29] Guo X. Hu, Zhong Yang, Lei Hu, Li Huang, and Jia M. Han (2018), Small Object Detection with Multiscale Features, International Journal of Digital Multimedia Broadcasting, Volume 2018, Article ID 4546896, 10 pages, 2018.
[30] Huang, Xiaohui, Pan He, Anand Rangarajan, and Sanjay Ranka (2019), "Intelligent Intersection: Two-Stream Convolutional Networks for Real-time Near Accident Detection in Traffic Video." arXiv preprint arXiv:1901.01138 (2019).
[31] Kim, Daeho, Meiyin Liu, SangHyun Lee, and Vineet R. Kamat (2019), "Remote proximity monitoring between mobile construction resources using camera-mounted UAVs." Automation in Construction 99 (2019): 168-182.
[32] Zhang, Huan, Cai Meng, Xiangzhi Bai, and Zhaoxi Li (2018), "Rock-ring detection accuracy improvement in infrared satellite image with sub-pixel edge detection." IET Image Processing 13, no. 5 (2018): 729-735.
[33] Hadj-Sahraoui, Omar, Hadria Fizazi, Faouzi Berrichi, Djemoui Chamakhi, and Lahcen Wahib Kebir (2019), "High-resolution DEM building with SAR interferometry and high-resolution optical image." IET Image Processing 13, no. 5 (2019): 713-721.
[34] Atta, Randa, and Mohammad Ghanbari (2013), "Low-contrast satellite images enhancement using discrete cosine transform pyramid and singular value decomposition." IET Image Processing 7, no. 5 (2013): 472-483.


[35] K. Simonyan and A. Zisserman (2014), "Very Deep Convolutional Networks for Large-Scale Image Recognition," CoRR, vol. abs/1409.1556, 2014.
[36] Rivas-López, M., Gomez-Sanchez, C. A., Rivera-Castillo, J., Sergiyenko, O., Flores-Fuentes, W., Rodríguez-Quiñonez, J. C., & Mayorga-Ortiz, P. (2015, June), Vehicle detection using an infrared light emitter and a photodiode as visualization system. In 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE), pp. 972-975. IEEE.
[37] Z. Zhang, M. Tao and H. Yuan (2015), "A Parking Occupancy Detection Algorithm Based on AMR Sensor," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1261-1269, Feb. 2015, doi: 10.1109/JSEN.2014.2362122.
[38] R. Sundar, S. Hebbar and V. Golla (2015), "Implementing Intelligent Traffic Control System for Congestion Control, Ambulance Clearance, and Stolen Vehicle Detection," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1109-1113, Feb. 2015, doi: 10.1109/JSEN.2014.2360288.
[39] R. Madli, S. Hebbar, P. Pattar and V. Golla (2015), "Automatic Detection and Notification of Potholes and Humps on Roads to Aid Drivers," in IEEE Sensors Journal, vol. 15, no. 8, pp. 4313-4318, Aug. 2015, doi: 10.1109/JSEN.2015.2417579.
[40] X. Jin, S. Sarkar, A. Ray, S. Gupta and T. Damarla (2012), "Target Detection and Classification Using Seismic and PIR Sensors," in IEEE Sensors Journal, vol. 12, no. 6, pp. 1709-1718, June 2012, doi: 10.1109/JSEN.2011.2177257.
[41] S. Tuermer, F. Kurz, P. Reinartz and U. Stilla (2013), "Airborne Vehicle Detection in Dense Urban Areas Using HoG Features and Disparity Maps," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 6, pp. 2327-2337, Dec. 2013, doi: 10.1109/JSTARS.2013.2242846.
[42] Wang, Y.; Liu, Z.; Deng, W. (2019), Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection. Sensors 2019, 19, 1089.
[43] Yang, T.; Wang, X.; Yao, B.; Li, J.; Zhang, Y.; He, Z.; Duan, W. (2016), Small Moving Vehicle Detection in a Satellite Video of an Urban Area. Sensors 2016, 16, 1528.
[44] X. Chen, S. Xiang, C. Liu and C. Pan (2014), "Vehicle Detection in Satellite Images by Hybrid Deep Convolutional Neural Networks," in IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797-1801, Oct. 2014, doi: 10.1109/LGRS.2014.2309695.
[45] Z. Hu, D. Yang, K. Zhang and Z. Chen (2020), "Object Tracking in Satellite Videos Based on Convolutional Regression Network With Appearance and Motion Features," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 783-793, 2020, doi: 10.1109/JSTARS.2020.2971657.
[46] A. Bhattacharya and R. Vaughan (2020), "Deep Learning Radar Design for Breathing and Fall Detection," in IEEE Sensors Journal, vol. 20, no. 9, pp. 5072-5085, 1 May 2020, doi: 10.1109/JSEN.2020.2967100.
[47] L. Zhang, R. Wang and L. Cui (2011), "Real-time traffic monitoring with magnetic sensor networks", J. Inf. Sci. Eng., vol. 27, no. 4, pp. 1473-1486, Jul. 2011.
[48] Y. Yu, H. Guan and Z. Ji (2015), "Rotation-Invariant Object Detection in High-Resolution Satellite Imagery Using Superpixel-Based Deep Hough Forests," in IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2183-2187, Nov. 2015, doi: 10.1109/LGRS.2015.2432135.
[49] L. Lindner et al. (2016), "Machine vision system for UAV navigation," 2016 International Conference on Electrical Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles & International Transportation Electrification Conference (ESARS-ITEC), Toulouse, 2016, pp. 1-6, doi: 10.1109/ESARS-ITEC.2016.7841356.
[50] L. Sommer, T. Schuchert and J. Beyerer (2019), "Comprehensive Analysis of Deep Learning-Based Vehicle Detection in Aerial Images," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 9, pp. 2733-2747, Sept. 2019, doi: 10.1109/TCSVT.2018.2874396.
[51] J. Shen, N. Liu, H. Sun and H. Zhou (2019), "Vehicle Detection in Aerial Images Based on Lightweight Deep Convolutional Network and Generative Adversarial Network," in IEEE Access, vol. 7, pp. 148119-148130, 2019, doi: 10.1109/ACCESS.2019.2947143.
[52] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao and J. Zhang (2019), "Scene Context-Driven Vehicle Detection in High-Resolution Aerial Images," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7339-7351, Oct. 2019, doi: 10.1109/TGRS.2019.2912985.
[53] W. Chu, Y. Liu, C. Shen, D. Cai and X. Hua (2018), "Multi-Task Vehicle Detection With Region-of-Interest Voting," in IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 432-441, Jan. 2018, doi: 10.1109/TIP.2017.2762591.
[54] J. M. Gandarias, A. J. García-Cerezo and J. M. Gómez-de-Gabriel (2019), "CNN-Based Methods for Object Recognition With High-Resolution Tactile Sensors," in IEEE Sensors Journal, vol. 19, no. 16, pp. 6872-6882, 15 Aug. 2019, doi: 10.1109/JSEN.2019.2912968.
[55] H. Chen, Z. He, B. Shi and T. Zhong (2019), "Research on Recognition Method of Electrical Components Based on YOLO V3," in IEEE Access, vol. 7, pp. 157818-157829, 2019, doi: 10.1109/ACCESS.2019.2950053.
[56] J. Redmon, S. Divvala, R. Girshick and A. Farhadi (2016), "You only look once: Unified real-time object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 779-788, Jun. 2016.
[57] J. Redmon and A. Farhadi (2017), "YOLO9000: Better faster stronger", Proc. CVPR, pp. 7263-7271, Jul. 2017.
[58] Ramírez-Hernández, L. R., Rodríguez-Quiñonez, J. C., Castro-Toscano, M. J., Hernández-Balbuena, D., Flores-Fuentes, W., Rascón-Carmona, R., & Sergiyenko, O. (2020), Improve three-dimensional point localization accuracy in stereo vision systems using a novel camera calibration method. International Journal of Advanced Robotic Systems, 17(1), 1729881419896717.
[59] Ramírez-Hernández L.R. et al. (2020), Stereoscopic Vision Systems in Machine Vision, Models, and Applications. In: Sergiyenko O., Flores-Fuentes W., Mercorelli P. (eds), Machine Vision and Navigation. Springer, Cham, ISBN 978-3-030-22587-2, pp. 241-265, 2020.