A Novel Model Based On AdaBoost and Deep CNN For Vehicle Classification
Received September 24, 2018, accepted October 8, 2018, date of publication October 12, 2018, date of current version November 8, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2875525
ABSTRACT Real-time vehicle classification is an important issue in intelligent transport systems. In this paper, we propose a novel model, based on the AdaBoost algorithm and deep convolutional neural networks (CNNs), to classify five distinct groups of real-life vehicle images. The experimental results demonstrate that the proposed model attains a classification accuracy of 99.50% on the test data set, while it takes only 28 ms to identify a vehicle image. This performance significantly outperforms traditional algorithms such as SIFT-SVM, HOG-SVM, and SURF-SVM. Moreover, the proposed deep CNN-based feature extractor has fewer parameters and thereby occupies much smaller storage resources than state-of-the-art CNN models. The high prediction accuracy and low storage cost confirm the effectiveness of the proposed model for real-time vehicle classification.
2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. VOLUME 6, 2018.
the operating time for each picture is only about 28 ms, which adequately meets the needs of real-time processing.

The rest of the paper is organized as follows. Section II describes the proposed model for vehicle classification in detail, including the architecture of the framework and the feature extractor, and explains why the model is suitable for vehicle classification. Section III presents the experimental results and analysis, as well as a comparison with other algorithms. Lastly, Section IV concludes the paper.

A. FRAMEWORK OVERVIEW
Figure 2 depicts the proposed model architecture. It begins with an input dataset of images sized 224 × 224 × 3 pixels (224 is the pixel height and width of the vehicle images, and 3 is the number of image channels). In order to reduce storage cost and training time, we designed a CNN model with 12 convolutional layers and 2 fully connected (FC) layers as a feature extractor. This CNN model has fewer parameters and deeper convolutional layers. As shown in Figure 2, the blue module accepts the dataset to train the CNN model, which is described in the next section. After training, the CNN model can be used to extract useful and highly representative features of vehicle images. In this novel model, the CNN serves as the feature extractor, and SVM serves as the weak classifier of AdaBoost. The proposed model's training procedure is described in Figure 3.
be depicted in (2) and (3):

$$y_i = \max_{R \times R} \, y_i^{r \times r} f(r, r) \tag{2}$$

$$f(r, r) = \varepsilon \cdot y_i^{k-1} \times \omega_{i,j}^{k} + e_j^{k} \tag{3}$$

In the above formulas, $\max_{R \times R}$ denotes the max-pooling operation in an $R \times R$ region, $y_i^{r \times r}$ is the $i$-th output map of an $r \times r$ window, and $f(r, r)$ represents the window function of the pooling blocks, where $\varepsilon$ is a trainable variable.

FIGURE 6. Illustration of the max-pooling operation.

Figure 6 shows an example of the max-pooling operation. From left to right, the pictures are: the first max-pooling, the feature map after the first max-pooling, the sixteenth max-pooling, and the feature map after the sixteenth max-pooling. It is clearly seen that the result of the pooling operation is smaller than its input image; in fact, the pooling operation is a "down-sampling" operation. The pooling layer has three functions: feature invariance, feature dimensionality reduction, and prevention of over-fitting.

3) DROPOUT LAYER
Over-fitting is such a considerable issue in deep learning that various countermeasures have been designed by researchers. To handle this issue, two approaches are adopted in our CNN-based feature extractor. For one thing, data augmentation, such as horizontal image flipping and random crops, was adopted in the training process. For another, the dropout layer proposed by Hinton et al. [22] is adopted in the feature extractor. The dropout layer makes the network structure cleaner and more regularized: the correlation between neurons decreases after the convolution operation, and the network can obtain better parameters in the process of updating weights. The specific working process is shown in Figure 7.

FIGURE 7. The diagram of the dropout layer.

In the deployment of the CNN, we add dropout layers to the two FC layers. The rates of all dropout layers are set to 0.3.

4) ACTIVATION FUNCTION
The activation function layer is also called the nonlinearity mapping layer. It is used to increase the nonlinear expressive power of the whole network. Stacking linear operation layers can only realize a linear mapping and cannot form complex functions. There are several available activation functions to choose from. In the following, we introduce the rectified linear unit (ReLU) used in the proposed network.

ReLU is one of the most commonly used activation functions in deep convolutional neural networks [23]. With saturating activations, it is difficult or even impossible during error back-propagation to transmit the error derivative in the saturated region to the front layers, which leads to training failures. The ReLU function is a piecewise function, defined as:

$$\mathrm{Rectifier}(x) = \max(0, x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases} \tag{4}$$

FIGURE 8. ReLU function and its gradient. (a) ReLU function. (b) ReLU gradient function.

As shown in Figure 8, the gradient of the function is 1 when x is greater than 0 and 0 otherwise. It has also been found that the function helps the stochastic gradient descent (SGD) method converge, with a convergence rate about 6 times that of the Sigmoid function [23].

C. TRAINING PROCESS OF CNN
The training stage of our CNN is shown in Table 1. We train the proposed model using the standard hinge loss function instead of the softmax cross-entropy loss function for optimization. Adaptive Moment Estimation (Adam), the most popular optimization approach for neural network training, is adopted for gradient descent. In the proposed model, an L2 regularization term, computed on the weight and bias matrices of each fully connected layer, was added to the eventual loss function. In order to prevent over-fitting, we applied dropout to the fully connected layers. Table 2 shows the initial parameter values of the Adam algorithm. After the CNN model is trained, the output layer of the CNN is removed. We then use the trained CNN model to extract features from the raw vehicle images.
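The hinge loss mentioned above can be sketched in NumPy as follows. This is a minimal illustration of a Weston-Watkins-style multiclass hinge loss with an L2 penalty on the FC weights; the exact margin formulation and regularization constant the authors used are not specified in the text, so `margin=1.0` and `l2` are assumed defaults.

```python
import numpy as np

def multiclass_hinge_loss(scores, labels, W, l2=1e-3, margin=1.0):
    """Multiclass hinge loss with an L2 penalty on the weight matrix W.

    scores: (n, c) raw class scores; labels: (n,) integer class ids.
    margin=1.0 and l2 are assumed values, not taken from the paper.
    """
    n = scores.shape[0]
    correct = scores[np.arange(n), labels][:, None]        # (n, 1) true-class score
    margins = np.maximum(0.0, scores - correct + margin)   # hinge per wrong class
    margins[np.arange(n), labels] = 0.0                    # no penalty on true class
    data_loss = margins.sum() / n
    reg_loss = 0.5 * l2 * np.sum(W * W)                    # L2 on FC weights
    return data_loss + reg_loss

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5))       # 4 samples, 5 vehicle classes
labels = np.array([0, 2, 1, 4])
W = rng.normal(size=(128, 5))          # hypothetical FC weight matrix
print(multiclass_hinge_loss(scores, labels, W))
```

When every wrong class scores at least `margin` below the true class, the data term vanishes and only the regularization term remains, which is what drives the margin-maximizing behavior the authors prefer over softmax cross-entropy.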
E. ENSEMBLE SCHEME
In the proposed model, we use the AdaBoost algorithm to assemble SVM methods. AdaBoost, proposed by Freund and Schapire in 1995, combines a large set of weak classification functions. It uses a weighted majority vote, associating a larger weight with good classification functions and a smaller weight with poor ones. In this paper, SVMs are assembled by AdaBoost for ensemble learning. Figure 11 shows the structure of the ensemble scheme.
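The weighted-majority-vote mechanism can be sketched as below. For a self-contained example, decision stumps stand in for the paper's SVM weak learners (training an SVM is out of scope here); the reweighting and vote-weighting steps are the standard binary AdaBoost update, not the authors' exact multiclass procedure.

```python
import numpy as np

def train_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w (y in {-1,+1})."""
    best = (None, None, 1, np.inf)                    # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost(X, y, rounds=10):
    """AdaBoost with stumps as weak learners (SVMs play this role in the paper)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                           # uniform initial sample weights
    ensemble = []
    for _ in range(rounds):
        j, t, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)         # larger vote for better learners
        pred = np.where(pol * (X[:, j] - t) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)                # up-weight misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted majority vote over the weak learners."""
    score = sum(a * np.where(p * (X[:, j] - t) > 0, 1, -1) for a, j, t, p in ensemble)
    return np.sign(score)

# Toy 2-class problem (e.g., feature vectors of two vehicle types).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost(X, y, rounds=15)
print((predict(model, X) == y).mean())   # training accuracy of the ensemble
```

The key property illustrated: after each round, misclassified samples gain weight, so subsequent weak learners focus on them, and the final vote weights each learner by its accuracy.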
TABLE 3. The process of the ensemble learning.

TABLE 4. Numbers of training and testing images for the five types of vehicles.

FIGURE 12. Photographing vehicles from different angles.

Figure 1 shows some vehicle images from traffic roads. Some of them were captured in bad conditions such as night, rain, and haze, which makes the vehicle classification task a challenging problem. In Figure 1, the vehicles from the first row to the last are cars, trucks, vans, buses, and tractors. Detailed information on the training and testing images of the five vehicle types is given in Table 4.

C. EXPERIMENTAL RESULTS
1) PERFORMANCE OF THE CNN EXTRACTOR
The learning curve is one of the most crucial evaluation measures for a deep learning algorithm. As can be seen in Figure 13, with the proposed model both the training loss and the validation loss continue to decrease as the number of epochs increases. From these curves, we can conclude that our model did not incur an "over-fitting" or "under-fitting" phenomenon.

Besides, classification accuracy is also an imperative aspect for evaluating a classifier in machine learning. It is defined as the number of correctly classified vehicles divided by the total number of testing images. We use a confusion matrix diagram to analyze the proposed method. As shown in Figure 14, the actual prediction accuracies of these five vehicle types are 99.80%, 99.74%, 99.62%, 98.85% and
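The per-class figures above can be read directly off the confusion matrix diagonal. A minimal sketch of that computation follows; the counts in the matrix are illustrative placeholders, NOT the paper's actual results.

```python
import numpy as np

# Hypothetical 5-class confusion matrix (rows: true class, cols: predicted).
# The counts are made up for illustration; see the paper's Figure 14 for real data.
classes = ["car", "truck", "van", "bus", "tractor"]
cm = np.array([
    [498,   1,   1,   0,   0],
    [  2, 495,   2,   1,   0],
    [  1,   3, 494,   2,   0],
    [  0,   2,   1, 496,   1],
    [  0,   0,   1,   1, 498],
])

overall = np.trace(cm) / cm.sum()          # correctly classified / total tested
per_class = np.diag(cm) / cm.sum(axis=1)   # per-class recall (the "accuracy" per type)

print(f"overall accuracy: {overall:.4f}")
for name, acc in zip(classes, per_class):
    print(f"{name:8s} {acc:.4f}")
```

Note that the per-class values computed this way are recalls (diagonal over row sum); the overall accuracy is the trace over the grand total, matching the definition given in the text.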
TABLE 7. Results of testing dataset by the proposed method and other methods.

FIGURE 15. Performance comparison with different methods in terms of accuracy on the testing dataset.

It can be seen that our method outperforms the comparative methods on all five vehicle types, as shown in Figure 15.

IV. CONCLUSIONS
A novel model based on the AdaBoost algorithm and deep convolutional neural networks has been proposed for vehicle classification in this paper. Inspired by VGGNet and AlexNet, a high-efficiency deep CNN model is designed to directly extract the features of vehicle images. The output of the CNN serves as the input to the base learners of the AdaBoost algorithm. Compared with state-of-the-art methods, the proposed method has several advantages. First of all, the presented model achieves the highest accuracy even on low-quality data. In addition, the proposed feature extractor dramatically saves storage resources. Last but not least, the proposed method reduces the computation cost while ensuring high accuracy. Our method can be used in many fields such as intelligent transport systems and real-time traffic supervision.

Future work is currently being carried out in the following aspects. First, we will enrich the vehicle types and their images to augment the deep learning datasets. Second, we will design a system to put this method into practical application.

REFERENCES
[1] Y. Tang et al., "Vehicle detection and recognition for intelligent traffic surveillance system," Multimedia Tools Appl., vol. 76, no. 4, pp. 5817–5832, 2015.
[2] A. Mocholí-Salcedo, J. H. Arroyo-Núñez, V. M. Milián-Sánchez, G. J. Verdú-Martín, and A. Arroyo-Núñez, "Traffic control magnetic loops electric characteristics variation due to the passage of vehicles over them," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 6, pp. 1540–1548, Jun. 2017.
[3] Y. Jo and I. Jung, "Analysis of vehicle detection with WSN-based ultrasonic sensors," Sensors, vol. 14, no. 8, pp. 14050–14069, 2014.
[4] K. Mu, F. Hui, and X. Zhao, "Multiple vehicle detection and tracking in highway traffic surveillance video based on SIFT feature matching," J. Inf. Process. Syst., vol. 12, no. 2, pp. 183–195, 2016.
[5] N. C. Acheruvu and V. Muthukumar, "Video based vehicle detection and its application in intelligent transportation systems," J. Transp. Technol., vol. 2, no. 4, pp. 305–314, Oct. 2012.
[6] K. Robert, "Video-based traffic monitoring at day and night vehicle features detection tracking," in Proc. ITSC, St. Louis, MO, USA, Oct. 2009, pp. 1–6.
[7] A. Jazayeri, H. Cai, J. Y. Zheng, and M. Tuceryan, "Vehicle detection and tracking in car video based on motion model," IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 583–595, Jun. 2011.
[8] X. Yuan, Y.-J. Lu, and S. Sarraf, "Computer vision system for automatic vehicle classification," J. Transp. Eng., vol. 120, no. 6, pp. 861–876, 1994.
[9] Q. Tian, T. Zhong, and H. Li, "A new method for vehicle detection using Mexican Hat wavelet and moment invariants," in Proc. SiPS, Taipei City, Taiwan, Oct. 2013, pp. 289–294.
[10] M. Liu, C. Wu, and Y. Zhang, "Motion vehicle tracking based on multi-resolution optical flow and multi-scale Harris corner detection," in Proc. IEEE Int. Conf. ROBIO, Sanya, China, Dec. 2007, pp. 2032–2036.
[11] M. A. Naiel, M. O. Ahmad, and M. N. S. Swamy, "Vehicle detection using approximation of feature pyramids in the DFT domain," in Proc. Int. Conf. Image Anal. Recognit., vol. 9164, Jul. 2015, pp. 429–436.
[12] S. Rahati, R. Moravejian, E. M. Kazemi, and F. M. Kazemi, "Vehicle recognition using contourlet transform and SVM," in Proc. 5th Int. Conf. Inf. Technol., Las Vegas, NV, USA, Apr. 2008, pp. 894–898.
[13] J. Arróspide and L. Salgado, "Log-Gabor filters for image-based vehicle verification," IEEE Trans. Image Process., vol. 22, no. 6, pp. 2286–2295, Jun. 2013.
[14] X. Cao, C. Wu, P. Yan, and X. Li, "Linear SVM classification using boosting HOG features for vehicle detection in low-altitude airborne videos," in Proc. 18th IEEE Int. Conf. Image Process., Brussels, Belgium, Sep. 2011, pp. 2421–2424.
[15] L. Hua, W. Xu, T. Wang, R. Ma, and B. Xu, "Vehicle recognition using improved SIFT and multi-view model," J. Xi'an Jiaotong Univ., vol. 47, no. 4, pp. 92–99, 2013.
[16] W. Yu, L. Lei, and S. Feng, "A new vehicle recognition approach based on graph spectral theory and BP neural network," in Proc. Int. Conf. Comput. Sci. Electron. Eng., Hangzhou, China, Mar. 2012, pp. 84–86.
[17] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[19] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[20] L. Deng and Z. Wang, "Deep convolution neural networks for vehicle classification," Appl. Res. Comput., vol. 33, no. 3, pp. 930–932, 2016.
[21] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. ICLR, 2015, pp. 1–14.
[22] G. Hinton et al., "Improving neural networks by preventing co-adaptation of feature detectors," Comput. Sci., vol. 3, no. 4, pp. 212–223, Jul. 2012.
[23] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. Int. Conf. Artif. Intell. Statist., 2012, pp. 315–323.
[24] P. Gangsar and R. Tiwari, "Comparative investigation of vibration and current monitoring for prediction of mechanical and electrical faults in induction motor based on multiclass-support vector machine algorithms," Mech. Syst. Signal Process., vol. 94, pp. 464–481, Sep. 2017.
[25] L. Yang, P. Luo, C. C. Loy, and X. Tang, "A large-scale car dataset for fine-grained categorization and verification," in Proc. IEEE Conf. CVPR, Boston, MA, USA, Jun. 2015, pp. 3973–3981.
[26] Y. Jia et al., "Caffe: Convolutional architecture for fast feature embedding," in Proc. 22nd ACM Int. Conf. Multimedia, 2014, pp. 675–678.
[27] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. CVPR, Boston, MA, USA, Jun. 2015, pp. 1–9.
[28] Y. Xu, G. Yu, Y. Wang, X. Wu, and Y. Ma, "A hybrid vehicle detection method based on Viola-Jones and HOG + SVM from UAV images," Sensors, vol. 16, no. 8, p. 1325, 2016.
[29] J.-W. Hsieh, L.-C. Chen, D.-Y. Chen, and S.-C. Cheng, "Vehicle make and model recognition using symmetrical SURF," in Proc. 10th IEEE Int. Conf. Adv. Video Signal Based Surveill., Krakow, Poland, Aug. 2013, pp. 472–477.
WEI CHEN is currently pursuing the Ph.D. degree with the School of Electronics Information, Nantong University, China. His research areas mainly include computer vision and deep learning.

JING-JING DONG received the B.S. degree from Nantong University in 2016. He is currently pursuing the M.S. degree with the School of Electronics Information, Nantong University, China. His research areas mainly include image processing and convolutional neural networks.