Invariant Feature Based Darknet Architecture For Moving Object Classification
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/JSEN.2020.3007883, IEEE Sensors
Journal
Sensors-30790-2020 1
1558-1748 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Carleton University. Downloaded on July 16,2020 at 14:15:16 UTC from IEEE Xplore. Restrictions apply.
small object detection with respect to ground sampling distance in aerial images [L. Sommer, T. Schuchert and J. Beyerer (2019)]. Lightweight neural networks can reduce the computational cost of training, validation and testing for vehicle detection in aerial images [J. Shen, N. Liu, H. Sun and H. Zhou (2019)]. Context- and scene-specific feature detectors can reduce false alarm rates during vehicle detection [C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao and J. Zhang (2019)]. Region-of-interest based deep neural networks can predict the location of a vehicle that correlates with the bounding box of the object in the ground truth [W. Chu, Y. Liu, C. Shen, D. Cai and X. Hua (2018)]. Convolutional neural networks have enhanced machine vision, together with diverse technologies such as artificial neural networks (ANN), recurrent networks and deep neural networks, which enabled increased object detection accuracy [J. M. Gandarias, A. J. García-Cerezo and J. M. Gómez-de-Gabriel (2019)]. A number of self-learning object detection architectures have been developed to automate the detection-classification workflow. Different stereoscopic vision systems, methods and implementations are explained in [Ramírez-Hernández L.R. et al. (2020)].

A. Motivation
Satellite image analysis helps in a wide variety of applications in both the commercial and government sectors. Vehicle detection, as an active research area, has been widely used in military surveillance, intelligent traffic systems [Huang, Xiaohui, Pan He, Anand Rangarajan, and Sanjay Ranka (2019)], and maritime search and rescue. Human operators cannot monitor for long time periods. Also, detecting objects such as cars, trucks, aircraft and ships from high-resolution satellite images is a difficult task. Although various approaches attempt to solve this problem, there is no widely recognized solution. The difficulties mainly lie in three aspects: the diversity of colors and shapes of different vehicles, complex backgrounds, and occlusions. Various object location methods have been applied to vehicle detection. Traditional approaches that use hand-crafted features such as Haar features, Scale Invariant Feature Transform (SIFT), Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG) for detecting moving objects have a high false alarm rate [Shugang Zhang, Zhiqiang Wei, Jie Nie, Lei Huang, Shuang Wang, Zhen Li (2017)]. Deep learning approaches for feature extraction are based on various convolutional neural networks such as Region-based Convolutional Neural Networks (R-CNN), Faster R-CNN, Region-based Fully Convolutional Network (R-FCN), VGG-16 [K. Simonyan and A. Zisserman (2014)] and Residual Neural Network (ResNet). These approaches face accuracy and speed problems and are not suitable for a real-time environment. As such, we require an automated approach that can improve vehicle detection accuracy.

B. Contribution
This paper proposes a combined system of both YOLO and Faster R-CNN for detecting, classifying and counting cars and trucks in satellite images. Rotation-invariant features help in improving the detection accuracy of an object with different appearances. This combined system overcomes the problem of detecting small objects and gives the count of different vehicles, which can be used for further processing in applications such as intelligent transportation systems.

C. Organization
The paper is organized as follows: Section 2 describes a literature survey on object detection and classification methods. Section 3 describes the proposed framework. Results and discussion are given in Section 4.

II. LITERATURE SURVEY
Work reported in [Shenquan Qu, Ying Wang, Gaofeng Meng, and Chunhong Pan (2016)] explores vehicle detection from satellite images in two stages. Binary normed gradients (BING) are used to extract regions and to speed up the localization process. A CNN is used for feature extraction and classification. The first stage generates category-independent region proposals. These proposals are the input data for the next stage. The second stage then uses the CNN to decide which proposals are vehicles. As vehicle detection requires localizing objects within an image, a commonly used approach that has been in use for several decades is the sliding window based detector. This method is not practical since it is time consuming. The authors used satellite images of San Francisco from Google Earth for implementation.

Work described in [Qiling Jiang, Liujuan Cao, Ming Cheng, Cheng Wang, Jonathan Li (2015)] presents vehicle detection from satellite images using Deep Neural Networks (DNN). First, road segments are extracted. Graph based segmentation is used to extract image patches. The DNN is trained with these patches, which are finally classified into vehicle and no-vehicle classes. The ImageNet dataset is used by the authors. The images are of various sizes and are divided into 1000 classes. The training set contains about 1000 images of each class, which results in about 1.28 million images. Testing is done with 50000 and 150000 images with the same 1000 categories. They did not use the validation or test set in their work. The advantage is that their system achieved excellent performance in object recognition and could detect both bright and dark vehicles. The drawback is that it could detect on-road vehicles only.

Authors of [Yohei Koga, Hiroyuki Miyazaki and Ryosuke Shibasaki (2018)] described hard example mining for detection using the R-CNN algorithm. USGS aerial ortho images are used by the authors to test their system. This method is time consuming and did not consider balanced training data. [Mundhenk, T.N, Konjevod, G, Sakla, W.A, Boakye, K (2016)] used a parallel DNN for detection that detects and counts cars independently of their scene and location. It considers only cars and not other objects such as trucks and vans. Oriented_SSD (Single Shot MultiBox Detector, SSD) is described in [Tianyu Tang, Shilin Zhou, Zhipeng Deng, Lin Lei and Huanxin Zou (2017)]. This method produces errors in orientation estimation because of false and missing detections. Authors of [Konoplich, G.V.; Putin, E.O.; Filchenkov, A.A (2015)] proposed an adapted hybrid neural network
(HDNN) to detect vehicles in real time. This method is not practical since it is time consuming. Multi-scale deep CNN (MS-CNN) for object detection is described in [Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N. (2016)]. This method requires large memory for input up-sampling. [Azam, S.; Rafique, A.; Jeon, M (2016)] described a method based on Faster R-CNN for estimating the pose of the vehicle. Their method could not detect small objects and considers only cars. [Zhang, Huan, Cai Meng, Xiangzhi Bai, and Zhaoxi Li (2018)] also described estimating the pose of a vehicle using an arc based ellipse fitting method. It concentrates on how to improve detection accuracy from low resolution images using ellipse parameters. Region based networks could not detect small objects. Hyper region proposal network (HRPN) is described in [Tang T, Zhou S, Deng Z, Zou H, Lei L (2017)]. Its performance is sometimes reduced because of background objects. Work reported in [Sang J, Guo P, Xiang Z, Luo H, Chen X (2017)] used three different convolutional neural networks, namely R-CNN, VGG16 and ResNet-152, to achieve good recognition accuracy. This method may not be suitable for real-time implementation because of its speed.

For detecting small objects from satellite images, [Adam Van Etten (2018)] described a pipeline called You Only Look Twice (YOLT) that outputs bounding boxes around the objects. YOLO 1 [J. Redmon, S. Divvala, R. Girshick and A. Farhadi (2016)] and YOLO 2 [J. Redmon and A. Farhadi (2017)] were developed as an alternative to Faster R-CNN for vehicle detection. The Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) framework is proposed in [Adam Van Etten (2018)]; it combines YOLT with the TensorFlow Object Detection Application Program Interface (API) for better object detection. Both methods could not differentiate features for highways and runways. Faster R-CNN [Qu T., Zhang Q., Sun S (2017)] [Tang T., Zhou S., Deng Z., Zou H., Lei L (2017)] generates region proposals having foreground objects in the first step and then classifies these objects in the second step. The computational cost of this method is higher. Authors of [Dmitry Sincha, Mikhail Chervonenkis and Pavel Skribtsov (2016)] described a method to detect objects at multiple scales. The Munich dataset is used for experimentation. Vehicles are classified into three classes: heavy, light and middle. Performance can be increased by reducing detection quality. Work presented in [Jiandan Zhong, Tao Lei and Guangle Yao (2017)] described a CNN-based detection model. Partially occluded objects are not detected. Also, this method cannot distinguish between intra-class objects. [Hadj-Sahraoui, Omar, Hadria Fizazi, Faouzi Berrichi, Djemoui Chamakhi, and Lahcen Wahib Kebir (2019), Atta, Randa, and Mohammad Ghanbari (2013)] discussed methods for improving the resolution of images.

Various functions for performance evaluation are discussed in [Loss functions (2019)], such as cross entropy, hinge, Huber, Kullback-Leibler, Mean Absolute Error (MAE) and Mean Squared Error (MSE). The cross-entropy loss function is used in the proposed system to deal with overlapping multi-class labels. This function outputs the classifier performance as a probability value between 0 and 1. If the output value is less than the predicted value, it means the prediction is nearing the actual label. Our proposed system uses Darknet based YOLO for detection, because existing object detection systems could not accurately locate small objects [Guo X. Hu, Zhong Yang, Lei Hu, Li Huang, and Jia M. Han (2018)], and Faster R-CNN is used for classification.

III. PROPOSED SYSTEM
Even though the mean average precision (mAP) of Faster R-CNN is good, it takes more processing time, at 5 to 18 frames per second (fps) [J. Shen, N. Liu, H. Sun and H. Zhou (2019)]. Also, R-CNN requires a fixed input size [H. Chen, Z. He, B. Shi and T. Zhong (2019)]. YOLO v3 runs at 155 fps and is fast in object detection without compromising on accuracy. As such, object detection is done using YOLO and classification using Faster R-CNN. Figure 1 presents the workflow of the proposed system.

[Figure 1: Workflow of the proposed system. Block labels recovered from the figure: Input Satellite Image; Augment Rotation; 3X3 Grid; Feature map generation; Bounding box, confidence score using sigmoid function; Rotation; Final detections.]

In the first step, YOLO predicts all the bounding boxes containing objects from the given input satellite image. In the second stage, the convolutional network of Faster R-CNN verifies the detected regions and classifies the objects in them. YOLO determines the locations within the image where an object may be present. The advantage of using YOLO is that, instead of using a pipeline of steps for detection, which is a slow process, it detects using a single neural network. Faster R-CNN produces more false positives than YOLO. Hence YOLO is used for detection and Faster R-CNN for classification. The advantage of combining YOLO with Faster R-CNN is the ability to detect smaller objects and to predict more than one class.

An image size of 416X416 is taken as input. Initially YOLO is used for detecting the probability of a bounding box. A bounding box with high probability (more than 0.5) is passed to Faster R-CNN for final classification. Faster R-CNN is slow, and this can be solved by combining YOLO v3 [Joseph Redmon, Ali Farhadi (2019), Kim, Daeho, Meiyin Liu, SangHyun Lee, and Vineet R. Kamat (2019)] (which uses a one-stage detector strategy) with Faster R-CNN, without having to use much expensive hardware. One extra step is added to this hybrid architecture of Faster R-CNN and YOLO, which is to augment rotations so as to handle unusual orientations of the object. Rotation-invariant features are used by [Y. Yu, H. Guan and Z. Ji (2015)] to estimate the object centroid, whereas the present system uses augmented rotations to increase the training set. The output from YOLO is generated after applying a 1X1 kernel on the feature map. The kernel size is taken as 1X1X(3X(5+3)) = 1X1X24, for 3 bounding boxes and 3 classes (each box carries 4 coordinates, 1 objectness score and 3 class scores). A 1X1 kernel size results in loss of non-spatially correlated information, but can help reduce overfitting.

A different detection size is taken for car and truck. The feature map has an identical height and width of 416. Multi-label prediction (with logistic regression) is done on the down-sampled input image at strides 32, 16 and 8, which for a 416X416 input correspond to the 13 x 13, 26 x 26 and 52 x 52 scales respectively. Softmax activation is not used in the proposed system because it treats the classes as mutually exclusive; hence the sigmoid function is used for determining the class confidence. Darknet-53 [Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019)] with the sigmoid function is shown in Figure 2. YOLO v3, which has 106 convolutional layers (53 from Darknet trained on ImageNet and an additional 53 for the detection task), is used for feature extraction. The sigmoid function is added to the architecture. The bottom three layers are used to detect different scaled objects.

Figure 2. Darknet-53 with sigmoid function (layer table recovered from the figure; a multiplier such as 2X applies to the block of two convolutions and one residual that follows it)

      Type            Filters   Size      Output
      Convolutional   32        3X3       256X256
      Convolutional   64        3X3/2     128X128
1X    Convolutional   32        1X1
      Convolutional   64        3X3
      Residual                            128X128
      Convolutional   128       3X3/2     64X64
2X    Convolutional   64        1X1
      Convolutional   128       3X3
      Residual                            64X64
      Convolutional   256       3X3/2     32X32
8X    Convolutional   128       1X1
      Convolutional   256       3X3
      Residual                            32X32
      Convolutional   512       3X3/2     16X16
8X    Convolutional   256       1X1
      Convolutional   512       3X3
      Residual                            16X16
      Convolutional   1024      3X3/2     8X8
4X    Convolutional   512       1X1
      Convolutional   1024      3X3
      Residual                            8X8
      Sigmoid Function
(Figure annotation on the last blocks: "Different Scaled objects")

The network downsamples the input image as explained in [Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019)] up to layer 81, with stride 32. A 1 x 1 detection kernel gives a feature map of 13 x 13 x 8 per bounding box for image size 416X416. Upsampling is then done by a factor of 2. Detections are done at layer 94 and layer 106 with strides 16 and 8 respectively. Upsampling helps in detecting small objects, as the network learns fine-grained features. Selective search, which uses color and texture properties, is used to classify the regions given by YOLO. It can reduce the number of bounding boxes to be analyzed. An Intersection over Union (IoU) overlap of 0.3 is taken for positive prediction of small objects too, as given in Eq(1).

$$\mathrm{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} \tag{1}$$
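As an illustration of Eq(1), the following is a minimal sketch of the IoU computation and of the 0.3 threshold used for a positive prediction. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the box layout and the function names `iou` and `is_positive` are hypothetical choices for this example and are not taken from the paper's implementation.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, as in Eq(1)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    overlap = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - overlap
    return overlap / union if union > 0 else 0.0


def is_positive(pred: Box, truth: Box, threshold: float = 0.3) -> bool:
    """Treat a prediction as positive when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Two 2x2 boxes sharing a 1x1 corner: overlap 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A lower threshold such as 0.3 is more forgiving for small objects, where a shift of a few pixels changes the overlap ratio considerably.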
[Table 2, continued from the previous page; its caption and leading column headers were not recovered. Only the last two headers survive, ending in "trucks" and "trucks detected".]

S.No  Dataset     (header lost)  ... trucks  ... trucks detected
1     Columbus    1301           11          8
2     Potsdam     7046           43          21
3     Selwyn      4828           24          13
4     Toronto     7485           29          14
5     Utah        8145           13          6
6     Vaihingen   3748           58          28

Table 3: Results of the proposed system for detecting cars and trucks in Vedai dataset
S.No  Object  Images  No. of objects  No. of detections
1     Car     1250    5875            4678
2     Truck   1250    343             301

B. Performance Evaluation Measures
Performance of the proposed system is calculated using several measures, as described in the following:
(i) Confusion matrix: It is a table with True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN), used to visualize classification algorithm performance as shown in Table 4.

Table 4 Confusion Matrix [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)]
              Predicted a=0   Predicted b=1
actual a=0    TP              FP
actual b=1    FN              TN

(ii) Classification Accuracy Rate (CAR): This is measured from the confusion matrix as given in Eq(2) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
$$\text{accuracy} = \frac{tp + tn}{tp + tn + fp + fn} \tag{2}$$
(iii) Precision: It is used to measure the relevancy of the result generated, as defined in Eq(3) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
$$\text{precision} = \frac{tp}{tp + fp} \tag{3}$$
(iv) Recall: It is used to measure the relevancy of the result generated, as given in Eq(4) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
$$\text{recall} = \frac{tp}{tp + fn} \tag{4}$$
(v) F-Measure as given in Eq(5) [S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018)].
$$F1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{5}$$
(vi) Cross entropy loss function H is given in Eq(6) [Murphy, Kevin (2012)]. This value can be between 0 and 1. The lower this value, the more robust the developed model.
$$H(p, q) = -\sum_{x} p(x) \log q(x) \tag{6}$$
where p(x) is the required probability and q(x) is the predicted probability.

Initially, accuracy is calculated to evaluate the proposed model. Other measures such as precision and recall are also used to evaluate the performance of the proposed system. The F1 score is calculated to find the balance of precision and recall.
Table 5 and Table 6 present the accuracy of the proposed system for detecting cars and trucks in the COWC dataset. Table 7 presents the accuracy of the proposed system for detecting cars and trucks in the Vedai dataset.

Table 5 Accuracy of the proposed system for cars on COWC
S.No  Dataset     Precision  Recall  Accuracy
1     Columbus    99.77      99.61   99.38
2     Potsdam     99.89      85.22   85.17
3     Selwyn      99.62      98.54   98.17
4     Toronto     99.97      98.78   98.75
5     Utah        99.96      99.16   99.13
6     Vaihingen   100        96.96   96.96
      Average     99.87      96.38   96.26

Table 6 Accuracy of proposed system for trucks on COWC
S.No  Dataset     Precision  Recall  Accuracy
1     Columbus    80         72.73   73.68
2     Potsdam     87.5       48.84   52.8
3     Selwyn      92.86      54.17   63.64
4     Toronto     77.78      56      60.53
5     Utah        75         46.15   52.63
6     Vaihingen   96.55      48.28   51.56
      Average     84.95      54.36   59.14

Table 7 Accuracy of the proposed system on Vedai dataset
S.No  Dataset  Precision  Recall  Accuracy
1     Cars     99.13      82.41   82.45
2     Trucks   94.06      89.05   86.51

It can be observed that precision and recall for the Utah region are lower than for other regions, because of factors such as car density, building architecture, and vegetation pattern. Table 8 presents the accuracy of the proposed system for detecting cars and trucks in both datasets.

Table 8 Accuracy of the proposed system for both datasets
S.No  Dataset  Precision        Recall           Accuracy
               Cars    Trucks   Cars    Trucks   Cars    Trucks
1     COWC     99.87   84.95    96.38   54.36    96.26   59.14
2     Vedai    99.13   94.06    82.41   89.05    82.45   86.51

Figure 3 and Figure 4 present the accuracy of the proposed system at each epoch.
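The measures in Eq(2) to Eq(5) can be sketched as a single helper that takes the four confusion-matrix counts. The function name `classification_metrics` is hypothetical. In the example call, tp = 8 and fn = 3 follow from the Columbus truck counts (11 trucks, 8 detected), while fp = 2 and tn = 6 are assumptions back-calculated so that the result agrees with the Columbus row of Table 6; the paper does not report those two counts.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (Eq 2 to 5)."""

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# fp and tn below are illustrative assumptions, not values reported in the paper.
m = classification_metrics(tp=8, fp=2, fn=3, tn=6)
print({name: round(100 * value, 2) for name, value in m.items()})
```

With these counts the helper gives a precision of 80, a recall of 72.73 and an accuracy of 73.68 percent, which shows how the three columns of Table 6 relate to one set of counts.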
The cross entropy loss function value is 0.4 for the COWC dataset and 0.56 for the VEDAI dataset. These values for the probabilities tell how easily the model can detect the given objects (e.g. spread between 0-1 or 0-0.5).
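For reference, the cross entropy of Eq(6) can be sketched for a single sample as follows. This is a minimal illustration that assumes natural logarithms, a required distribution p and a predicted distribution q given as plain lists, and a small clipping constant `eps` to avoid log(0); these choices and the function name `cross_entropy` are assumptions of the example, not details given in the paper.

```python
import math
from typing import Sequence


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum_x p(x) log q(x), with q clipped away from zero."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Terms with p(x) == 0 contribute nothing and are skipped.
    return -sum(p_x * math.log(max(q_x, eps)) for p_x, q_x in zip(p, q) if p_x > 0)


# One-hot required label on the first class, predicted with probability 0.5.
print(cross_entropy([1.0, 0.0, 0.0], [0.5, 0.25, 0.25]))
```

The loss shrinks towards zero as the predicted probability of the required class approaches 1, which is the sense in which a lower value indicates a better model.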
Figure 4. Accuracy for VEDAI

Table 9 and Table 10 compare the accuracy of the proposed system with existing works. Table 11 presents a summary of the false positives of both datasets for cars and trucks. True positives, false positives, true negatives and false negatives are aggregated into a single value as shown in these tables. It is 0.83 for cars and 0.9 for trucks.

Table 9 Proposed system Vs Existing works on COWC
Performance  Proposed  [Mundhenk, T.N.; Konjevod, G.;          [David Yu   [Junyan Lu, Chi Ma, Li Li, Xiaoyan Xing, Yong Zhang,
Metric       system    Sakla, W.A.; Boakye, K (2016)]          (2018)]     Zhigang Wang, Jiuwei Xu (2018)]
Accuracy     96.26     89.29                                   85          95.32
Precision    92.41     92.59                                   -           -

Figure 5 presents the precision recall curve for both datasets on cars and trucks. For example, in the COWC dataset, if we choose precision at 92%, then 84% of cars were detected, and if the precision level is equal to 0.7 then 10% of the cars are detected. If recall increases to 93%, then the precision drops to 20%. When both cars and trucks are considered together, the precision is 92.41. True positive rate (TPR) and false positive rate (FPR) are shown in Table 12; they determine the parameters of the Receiver Operating Characteristics (ROC) curve shown in Figure 6, which is used to validate the proposed model. If this curve is closer to either the left border or the top border, then the test is accurate; if it comes nearer to the 45° diagonal, then the test is less accurate.
$$\text{TruePositiveRate} = \frac{tp}{tp + fn} \tag{7}$$
$$\text{FalsePositiveRate} = \frac{fp}{fp + tn} \tag{8}$$

The least square method as explained in [Ramírez-Hernández, L. R., Rodríguez-Quiñonez, J. C., Castro-Toscano, M. J., Hernández-Balbuena, D., Flores-Fuentes, W., Rascón-Carmona, R., & Sergiyenko, O. (2020)] is an alternative method to model the camera calibration error.
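The rates in Eq(7) and Eq(8), and the way a ROC curve is traced from them, can be sketched as follows. The example sweeps a decision threshold over detection confidence scores; the function names `rates` and `roc_points`, the label convention (1 for an object, 0 for background) and the sample scores are all hypothetical and are not taken from the paper's experiments.

```python
from typing import List, Sequence, Tuple


def rates(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) as in Eq(7) and Eq(8)."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold over the observed scores."""
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        tpr, fpr = rates(tp, fp, fn, tn)
        points.append((fpr, tpr))
    return points


# Illustrative scores only: two objects (label 1) and two background regions (label 0).
print(roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))
```

Points that stay near the left and top borders correspond to thresholds with many true positives and few false positives, while points along the diagonal indicate a detector that is no better than chance.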
Figure 6. ROC curve

Figure 8. Intersection over Union (IoU) on trucks for COWC

Table 12: True positive rate and False positive rate for the two datasets
S.No  Dataset          TPR    FPR
1     COWC (cars)      0.96   0.37
2     COWC (trucks)    0.52   0.003
3     Vedai (cars)     0.82   0.17
4     Vedai (trucks)   0.89   0.25
containing objects along with bounding box locations and the probability of each class to achieve high accuracy. A Jaccard index (IoU) of 0.3, 0.4 and 0.5 is taken to consider a match between the predicted box and the ground truth box. Multi-label classification using logistic regression is done. All bounding boxes with high scores are considered for the classification. Additional training images are added using the rotation step, and this process helped our proposed system to achieve a better detection rate. The robustness of the proposed system is evaluated using several performance measures. Objects are detected at different scales. Evaluation results on two benchmark datasets, COWC and VEDAI, proved that our method can detect and classify with better accuracy. We faced difficulty in differentiating between camping cars and big vans because of the region set for car and truck. Partially occluded objects are not detected. Even though our method can distinguish intra-class objects, identifying the appropriate region size is to be strengthened.
Our future work is to detect objects with partial occlusion and to identify the appropriate region size to distinguish intra-class vehicles. The proposed system will be evaluated on other objects such as ships and aircraft, and on its usage in an on-board system, so as to fine tune the system so that it can detect generic objects.

REFERENCES
[1] Shenquan Qu, Ying Wang, Gaofeng Meng, and Chunhong Pan (2016), Vehicle Detection in Satellite Images by Incorporating Objectness and Convolutional Neural Network, Journal of Industrial and Intelligent Information, Vol. 4, No. 2, March 2016, pp. 158-162.
[2] Qiling Jiang, Liujuan Cao, Ming Cheng, Cheng Wang, Jonathan Li (2015), Deep neural networks-based vehicle detection in satellite images, Conference: 2015 International Symposium on Bioelectronics and Bioinformatics (ISBB).
[3] Yohei Koga, Hiroyuki Miyazaki and Ryosuke Shibasaki (2018), A CNN-Based Method of Vehicle Detection from Aerial Images Using Hard Example Mining, Remote Sens. 2018, 10, 124; pp. 1-21, doi:10.3390/rs10010124.
[4] Mundhenk, T.N.; Konjevod, G.; Sakla, W.A.; Boakye, K (2016). A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning. In Lecture Notes in Computer Science, Proceedings of the ECCV 2016: Springer: Volume 9907, pp. 785–800.
[5] Razakarivony, S. and Jurie, F. (2015) Vehicle Detection in Aerial Imagery: A Small Target Detection Benchmark. Journal of Visual Communication & Image Representation, 34, 187-203. https://doi.org/10.1016/j.jvcir.2015.11.002
[6] Cars Overhead With Context (COWC) (2018), https://gdo152.llnl.gov/cowc/, Last accessed August 1st 2018.
[7] Pretrained Weight file (2019), https://pjreddie.com/media/files/yolov3.weights, February 1st 2019.
[8] Tianyu Tang, Shilin Zhou, Zhipeng Deng, Lin Lei and Huanxin Zou (2017), Arbitrary-Oriented Vehicle Detection in Aerial Imagery with Single Convolutional Neural Networks, Remote Sensing, 2017, 9, 1170.
[9] Chuan-Pin Lu, Jiun-Jian Liaw, Tzu-Ching Wu and Tsung-Fu Hung (2019), Development of a Mushroom Growth Measurement System Applying Deep Learning for Image Recognition, Agronomy 2019, 9(1), 32; pp. 1-21.
[10] S. Vasavi, Reshma Shaik, Sahithi Yarlagadda (2018). "Chapter 12 Moving Object Classification in a Video Sequence Using Invariant Feature Extraction", IGI Global, 2018.
[11] Murphy, Kevin (2012). Machine Learning: A Probabilistic Perspective. MIT.
[12] David Yu (2018), Parking Lot Vehicle Detection Using Deep Learning, 2018, https://medium.com/geoai/parking-lot-vehicle-detection-using-deep-learning-49597917bc4a, Last accessed 23-9-2018.
[13] Junyan Lu, Chi Ma, Li Li, Xiaoyan Xing, Yong Zhang, Zhigang Wang, Jiuwei Xu (2018), A Vehicle Detection Method for Aerial Image Based on YOLO, Journal of Computer and Communications, 2018, pp. 98-107.
[14] Joseph Redmon, Ali Farhadi (2019), YOLOv3: An Incremental Improvement, arXiv:1804.02767, Mar. 2018, [online] Available: https://arxiv.org/abs/1804.02767.
[15] Konoplich, G.V.; Putin, E.O.; Filchenkov, A.A. (2016). Application of deep learning to the problem of vehicle detection in UAV images. In Proceedings of the 2016 XIX IEEE International Conference on Soft Computing and Measurements (SCM), St. Petersburg, Russia, 25–27 May 2016; pp. 4–6.
[16] Cai, Z.; Fan, Q.; Feris, R.S.; Vasconcelos, N (2016). A unified multi-scale deep Convolutional neural network for fast object detection. In Proceedings of the 2016 European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 354–370.
[17] Azam, S.; Rafique, A.; Jeon, M (2016). Vehicle pose detection using region based convolutional neural network. In Proceedings of the International Conference on Control, Automation and Information Sciences (ICCAIS), Ansan, Korea, 27–29 October 2016; pp. 194–198.
[18] Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L (2017). Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors 2017, 17, 336.
[19] Sang, J.; Guo, P.; Xiang, Z.; Luo, H.; Chen, X. (2017). Vehicle detection based on faster R-CNN. J. Chongqing Univ (NatSciEd) 2017, 40, 32–36.
[20] Adam Van Etten (2018), Satellite Imagery Multiscale Rapid Detection with Windowed Networks, Computer Vision and Pattern Recognition, 2018, pp. 1-12.
[21] Qu T., Zhang Q., Sun S (2017). Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimedia Tools Appl. 2017;76:21651–21663.
[22] Tang T., Zhou S., Deng Z., Zou H., Lei L (2017). Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors. 2017;17:336.
[23] Darknet (2018), https://github.com/pjreddie/darknet/blob/master/cfg/darknet53.cfg, last accessed 12-09-2018.
[24] Loss functions (2019), https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html, Last accessed on 2-2-2019.
[25] Adam Van Etten (2018), You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery, pp. 1-8, 2018.
[26] Dmitry Sincha, Mikhail Chervonenkis and Pavel Skribtsov (2016), Vehicle Detection and Classification in Aerial Images, Indian Journal of Science and Technology, Vol 9(48), 2016, pp. 1-7.
[27] Jiandan Zhong, Tao Lei and Guangle Yao (2017), Robust Vehicle Detection in Aerial Images Based on Cascaded Convolutional Neural Networks, Sensors 2017, 17, 2720; pp. 1-17.
[28] Shugang Zhang, Zhiqiang Wei, Jie Nie, Lei Huang, Shuang Wang, and Zhen Li, A Review on Human Activity Recognition Using Vision-Based Method, Journal of Healthcare Engineering, Volume 2017.
[29] Guo X. Hu, Zhong Yang, Lei Hu, Li Huang, and Jia M. Han, Small Object Detection with Multiscale Features, International Journal of Digital Multimedia Broadcasting, Volume 2018, Article ID 4546896, 10 pages, 2018.
[30] Huang, Xiaohui, Pan He, Anand Rangarajan, and Sanjay Ranka (2019). "Intelligent Intersection: Two-Stream Convolutional Networks for Real-time Near Accident Detection in Traffic Video." arXiv preprint arXiv:1901.01138 (2019).
[31] Kim, Daeho, Meiyin Liu, SangHyun Lee, and Vineet R. Kamat (2019). "Remote proximity monitoring between mobile construction resources using camera-mounted UAVs." Automation in Construction 99 (2019): 168-182.
[32] Zhang, Huan, Cai Meng, Xiangzhi Bai, and Zhaoxi Li (2018). "Rock-ring detection accuracy improvement in infrared satellite image with sub-pixel edge detection." IET Image Processing 13, no. 5 (2018): 729-735.
[33] Hadj-Sahraoui, Omar, Hadria Fizazi, Faouzi Berrichi, Djemoui Chamakhi, and Lahcen Wahib Kebir (2019). "High-resolution DEM building with SAR interferometry and high-resolution optical image." IET Image Processing 13, no. 5 (2019): 713-721.
[34] Atta, Randa, and Mohammad Ghanbari (2013). "Low-contrast satellite images enhancement using discrete cosine transform pyramid and singular value decomposition." IET Image Processing 7, no. 5 (2013): 472-483.
[35] K. Simonyan and A. Zisserman (2015), “Very Deep Convolutional Networks for Large-Scale Image Recognition,” CoRR, vol. abs/1409.1556, 2014
[36] Rivas-López, M., Gomez-Sanchez, C. A., Rivera-Castillo, J., Sergiyenko, O., Flores-Fuentes, W., Rodríguez-Quiñonez, J. C., & Mayorga-Ortiz, P. (2015, June). Vehicle detection using an infrared light emitter and a photodiode as visualization system. In 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE) (pp. 972-975). IEEE.
[37] Z. Zhang, M. Tao and H. Yuan (2015), "A Parking Occupancy Detection Algorithm Based on AMR Sensor," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1261-1269, Feb. 2015, doi: 10.1109/JSEN.2014.2362122.
[38] R. Sundar, S. Hebbar and V. Golla (2015), "Implementing Intelligent Traffic Control System for Congestion Control, Ambulance Clearance, and Stolen Vehicle Detection," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1109-1113, Feb. 2015, doi: 10.1109/JSEN.2014.2360288.
[39] R. Madli, S. Hebbar, P. Pattar and V. Golla (2015), "Automatic Detection and Notification of Potholes and Humps on Roads to Aid Drivers," in IEEE Sensors Journal, vol. 15, no. 8, pp. 4313-4318, Aug. 2015, doi: 10.1109/JSEN.2015.2417579.
[40] X. Jin, S. Sarkar, A. Ray, S. Gupta and T. Damarla (2012), "Target Detection and Classification Using Seismic and PIR Sensors," in IEEE Sensors Journal, vol. 12, no. 6, pp. 1709-1718, June 2012, doi: 10.1109/JSEN.2011.2177257.
[41] S. Tuermer, F. Kurz, P. Reinartz and U. Stilla (2013), "Airborne Vehicle Detection in Dense Urban Areas Using HoG Features and Disparity Maps," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 6, pp. 2327-2337, Dec. 2013, doi: 10.1109/JSTARS.2013.2242846.
[42] Wang, Y.; Liu, Z.; Deng, W. (2019), Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection. Sensors 2019, 19, 1089
[43] Yang, T.; Wang, X.; Yao, B.; Li, J.; Zhang, Y.; He, Z.; Duan, W. (2016), Small Moving Vehicle Detection in a Satellite Video of an Urban Area. Sensors 2016, 16, 1528
[44] X. Chen, S. Xiang, C. Liu and C. Pan (2014), "Vehicle Detection in Satellite Images by Hybrid Deep Convolutional Neural Networks," in IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797-1801, Oct. 2014, doi: 10.1109/LGRS.2014.2309695.
[45] Z. Hu, D. Yang, K. Zhang and Z. Chen (2020), "Object Tracking in Satellite Videos Based on Convolutional Regression Network With Appearance and Motion Features," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 783-793, 2020, doi: 10.1109/JSTARS.2020.2971657.
[46] A. Bhattacharya and R. Vaughan (2020), "Deep Learning Radar Design for Breathing and Fall Detection," in IEEE Sensors Journal, vol. 20, no. 9, pp. 5072-5085, 1 May 2020, doi: 10.1109/JSEN.2020.2967100.
[47] L. Zhang, R. Wang and L. Cui (2011), "Real-time traffic monitoring with magnetic sensor networks", J. Inf. Sci. Eng., vol. 27, no. 4, pp. 1473-1486, Jul. 2011.
[48] Y. Yu, H. Guan and Z. Ji (2015), "Rotation-Invariant Object Detection in High-Resolution Satellite Imagery Using Superpixel-Based Deep Hough Forests," in IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2183-2187, Nov. 2015, doi: 10.1109/LGRS.2015.2432135.
[49] L. Lindner et al. (2016), "Machine vision system for UAV navigation," 2016 International Conference on Electrical Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles & International Transportation Electrification Conference (ESARS-ITEC), Toulouse, 2016, pp. 1-6, doi: 10.1109/ESARS-ITEC.2016.7841356.
[50] L. Sommer, T. Schuchert and J. Beyerer (2019), "Comprehensive Analysis of Deep Learning-Based Vehicle Detection in Aerial Images," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 9, pp. 2733-2747, Sept. 2019, doi: 10.1109/TCSVT.2018.2874396.
[51] J. Shen, N. Liu, H. Sun and H. Zhou (2019), "Vehicle Detection in Aerial Images Based on Lightweight Deep Convolutional Network and Generative Adversarial Network," in IEEE Access, vol. 7, pp. 148119-148130, 2019, doi: 10.1109/ACCESS.2019.2947143.
[52] C. Tao, L. Mi, Y. Li, J. Qi, Y. Xiao and J. Zhang (2019), "Scene Context-Driven Vehicle Detection in High-Resolution Aerial Images," in IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7339-7351, Oct. 2019, doi: 10.1109/TGRS.2019.2912985.
[53] W. Chu, Y. Liu, C. Shen, D. Cai and X. Hua (2018), "Multi-Task Vehicle Detection With Region-of-Interest Voting," in IEEE Transactions on Image Processing, vol. 27, no. 1, pp. 432-441, Jan. 2018, doi: 10.1109/TIP.2017.2762591.
[54] J. M. Gandarias, A. J. García-Cerezo and J. M. Gómez-de-Gabriel (2019), "CNN-Based Methods for Object Recognition With High-Resolution Tactile Sensors," in IEEE Sensors Journal, vol. 19, no. 16, pp. 6872-6882, 15 Aug. 2019, doi: 10.1109/JSEN.2019.2912968.
[55] H. Chen, Z. He, B. Shi and T. Zhong (2019), "Research on Recognition Method of Electrical Components Based on YOLO V3," in IEEE Access, vol. 7, pp. 157818-157829, 2019, doi: 10.1109/ACCESS.2019.2950053.
[56] J. Redmon, S. Divvala, R. Girshick and A. Farhadi (2016), "You only look once: Unified real-time object detection", Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 779-788, Jun. 2016.
[57] J. Redmon and A. Farhadi (2017), "YOLO9000: Better faster stronger", Proc. CVPR, pp. 7263-7271, Jul. 2017.
[58] Ramírez-Hernández, L. R., Rodríguez-Quiñonez, J. C., Castro-Toscano, M. J., Hernández-Balbuena, D., Flores-Fuentes, W., Rascón-Carmona, R., & Sergiyenko, O. (2020). Improve three-dimensional point localization accuracy in stereo vision systems using a novel camera calibration method. International Journal of Advanced Robotic Systems, 17(1), 1729881419896717.
[59] Ramírez-Hernández L.R. et al. (2020), Stereoscopic Vision Systems in Machine Vision, Models, and Applications. In: Sergiyenko O., Flores-Fuentes W., Mercorelli P. (eds) Machine Vision and Navigation. Springer, Cham, ISBN 978-3-030-22587-2, pp. 241-265, 2020.