FPN-D-Based Driver Smoking Behavior Detection Method

IETE Journal of Research
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tijr20
FPN-D-Based Driver Smoking Behavior Detection

Method
Zuopeng Zhao, Haihan Zhao, Chen Ye, Xinzheng Xu, Kai Hao, Hualin Yan, Lan
Zhang & Yi Xu
To cite this article: Zuopeng Zhao, Haihan Zhao, Chen Ye, Xinzheng Xu, Kai Hao, Hualin Yan,
Lan Zhang & Yi Xu (2023) FPN-D-Based Driver Smoking Behavior Detection Method, IETE
Journal of Research, 69:8, 5497-5506, DOI: 10.1080/03772063.2021.1982409
To link to this article: https://doi.org/10.1080/03772063.2021.1982409
© 2021 The Author(s). Published by Informa

UK Limited, trading as Taylor & Francis
Group
Published online: 14 Oct 2021.
Submit your article to this journal
Article views: 4362
View related articles
View Crossmark data
Citing articles: 1 View citing articles
Full Terms & Conditions of access and use can be found at

https://www.tandfonline.com/action/journalInformation?journalCode=tijr20
IETE JOURNAL OF RESEARCH
2023, VOL. 69, NO. 8, 5497–5506
https://doi.org/10.1080/03772063.2021.1982409
FPN-D-Based Driver Smoking Behavior Detection Method

Zuopeng Zhao, Haihan Zhao, Chen Ye, Xinzheng Xu, Kai Hao, Hualin Yan, Lan Zhang and Yi Xu
School of Computer Science and Technology & Mine Digitization Engineering Research Center of Ministry of Education of the People’s Republic
of China, China University of Mining and Technology, Xuzhou 221116, People’s Republic of China
ABSTRACT KEYWORDS
In view of the fact that a driver’s smoking behavior seriously affects the driving safety, a feature Driver smoking; Feature
pyramid network (FPN)-based smoking behavior identification method has been studied in order pyramid networks; Dilated
to reduce the occurrence of the driver smoking. Most of the existing research has been focused on convolution; Small target;
detection and recognition based on movements while smoking or smog characteristics. Therefore, Detection; Behavior
detection
the probability of misjudgment in such methods is high, thus to address this issue, the present work
proposes a method based on FPN to detect the driver’s smoking behavior. FPN has been combined
with the dilated convolution technique in order to detect a small target object in the driver’s image
and recognize their smoking behavior. By using the driver behavior images collected from the vehicle
platform, a simulation experiment was carried out by employing the behavior identification method
proposed in this work. The experimental results show that the accuracy of the proposed method is
94.75%, the recall rate is 96%, the precision rate is 95.05% and the area under the receiver operating
characteristic curve is 95.5%.
1. INTRODUCTION
With continuous advancements in artificial intelligence,
A driver smoking while driving not only harms their deep learning is being applied in the field of tobacco
own health and that of others in their vicinity but also control and achieving an AI tobacco control has far-
increases the risk of road accidents. Many drivers have reaching consequences. In recent years, a large number
a habit of smoking while driving vehicles, which may of researchers have used deep learning in the recognition
lead to many adverse consequences. When smoking, the of the direction of behavior. Guan et al. [1] proposed a
driver usually controls the steering wheel with one hand network model for an in-depth study of human behav-
that might tilt the body shifting the center of gravity. ior recognition. By using the sliding window algorithm
This uneven force may easily lead to irregular driv- to perform motion segmentation, the time series data
ing. The carbon monoxide in the smoke also affects the are transformed into a deep network model and then
blood oxygen saturation in driver’s body. Physiologi- via end-to-end research, the feature vector is imported
cal studies have shown that if the blood oxygen satura- into the SoftMax classifier for identification. The recog-
tion in the human body is less than 80%, it will cause nition accuracy of the network model for the UCI Human
a series of symptoms of hypoxia, such as distraction, Activity Recognition (HAR) dataset is 91.73%. Zhang
weak thinking ability, memory loss, mild motion dishar- et al. [2] proposed a dedicated interleaved deep con-
mony, fatigue, etc. These factors are likely to pose a threat volutional neural network architecture, which uses the
to driving safety. In addition, a driver smoking in the information from a multi-stream input to merge the
car not only affects the air quality inside the car but extracted abstract features through multiple fusion lay-
might also cause a fire inside the car. If the car is loaded ers and introduces a temporal voting scheme based on
with flammable or explosive dangerous goods, smoking historical inference examples to achieve an enhanced
may cause serious consequences. Thus, it is obvious that accuracy in driver behavior recognition. The recogni-
the smoking habit of a driver while driving seriously tion accuracy of this method in five kinds of aggregated
affects their ability to drive safely. Especially for those behavior patterns, namely, grouping tasks that involve
drivers who drive the “two passengers and one danger” the use of a mobile device and eating and drinking,
vehicles, the habit of smoking while driving seriously is 81.66%. Yan et al. [3] used the color images on the
affects the safety of life and property of the drivers them- driver’s side to extract the skin-like regions using the
selves and also others and it is easy to cause irreparable Gaussian mixture model and passed it to the deep convo-
consequences. lutional neural network model to generate action labels.
© 2021 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/
licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered,
transformed, or built upon in any way.
5498 Z. ZHAO ET AL.: FPN-D-BASED DRIVER SMOKING BEHAVIOR DETECTION METHOD
Yang et al. [4] proposed a two-layer learning method generating multiple images of different resolutions by
for a driver’s behavior recognition using the electroen- multi-scale pixel sampling of the original image and then
cephalography (EEG) data, with the highest classification extracting different features from each layer for the pur-
accuracy of 83.5% obtained using this method. pose of prediction. However, this method requires a large
amount of calculation which, in turn, requires a large
At present, most methods used for identifying smoking amount of memory and time. Thus, researchers made
behavior focus on processing the experimental data by improvements in the above methods. Using the char-
relying on the use of smoking behavior gestures or smog acteristics of the convolution itself, the original image
generated in the vicinity while the driver smokes, for was convoluted and a pooling operation was carried
identifying the driver’s smoking behavior. For example, out to extract features at different scales from differ-
Chiu et al. [5] proposed a smoking behavior recogni- ent layers of the network for prediction. However, this
tion based on a spatiotemporal convolutional neural net- method did not use shallow features. Shallow networks
work, using data balance and data enhancement based on focus more on detailed information while deeper net-
GoogleNet and time slice network architecture to achieve works focus more on semantic information. The infor-
an efficient smoking action recognition having an accu- mation in the shallow features is quite helpful for detect-
racy of 91.67%. But not all datasets have smog and the ing small objects and this can improve the accuracy of
driver’s smoking dataset collected in the present work object detection to some extent. Therefore, researchers
has almost no smoke. In addition, the data enhancement continuously explored and as a result have proposed the
processing of the experimental data takes a lot of time FPN technique. By using deep learning to construct the
and effort. At the same time, if only gestures are used for feature pyramid, an improvement in the robustness of
recognition, it easily leads to low recognition accuracy the algorithm is achieved and accurate location informa-
and a high rate of false-positive results. tion is obtained by accumulating shallow as well as deep
features. The algorithm process is shown in Figure 1.
In order to solve the above problems, the present work
aims to study the behavior recognition from another First, a deep convolution operation is performed on the
angle, which is, using the cigarette in the dataset as input image, following which the features of the second
the recognition target. By detecting whether there is convolution layer are subjected to dimensionality reduc-
a cigarette in the driver’s image, in order to identify tion. The features of the third convolution layer are then
whether the driver smokes while driving, the occurrence upsampled so that they have the corresponding dimen-
of adverse consequences due to smoking while driving sions. Then, the processed second and third convolution
can be prevented and the safety of life and property of layers are subjected to an addition operation (addition
the driver and others can be guaranteed. Feature Pyramid of corresponding elements). The result thus obtained is
Network (FPN) is a network that solves multi-scale prob- added to the result of the processed first convolution
lems in object detection and greatly improves the effi- layer. The output thus obtained can then be used to make
ciency for detecting small objects without substantially predictions.
increasing the computation time of the original model
[6]. It is widely used in identifying detection tasks such 2.1.2 Dilated Convolution
as multi-scale target detection and small object detection The dilated convolution method [7] inserts gaps in
and recognition. In this work, based on the character- the standard convolution map in order to increase the
istics of the research object, the FPN network has been receptive field area. Compared to the normal convolu-
improved to form a new network model, namely, Feature tion operation, dilated convolution has a hyperparameter
Pyramid Network-dilated convolution (FPN-D), in order called dilation rate. This hyperparameter refers to the
to extract the features of cigarettes in the images. number of kernel intervals (the dilation rate for a nor-
mal convolution is 1). Since the receptive field area is
increased in this method without pooling, loss of infor-
2. APPROACH mation is avoided and each convolution output contains
a large amount of information. A comparison between
2.1 Method Overview
ordinary convolution and dilated convolution is shown
2.1.1 Fpn in Figure 2.
At present, people often use the method of construct-
ing multi-scale pyramids in computer vision to solve for The square on the upper left side in Figure 2 indicates
the problems caused due to objects of different sizes. a receptive field of size 3 × 3 for a 3 × 3 kernel in an
For example, the earliest image pyramid consisted of ordinary convolution. The position of multiplication of
Z. ZHAO ET AL.: FPN-D-BASED DRIVER SMOKING BEHAVIOR DETECTION METHOD 5499
Figure 1: The FPN structure
receptive field size of the dilated convolution, it is neces-

sary to first calculate the convolution kernel size of the
current layer. This is done using,
fk = (r − 1) ∗ (k − 1) + k (2)
where r is the dilation rate of the dilated convolution

kernel and k is the initial convolution kernel size.
Thus, by obtaining the convolution kernel size of the

current layer using Equation (2) and substituting it in
Equation (1), the receptive field size of the current layer
of the dilated convolution can be calculated.
Figure 2: An example showing the comparison between ordi- This paper uses a deep convolution network to detect
nary convolution and dilated convolution
a driver’s smoking behavior. According to the charac-
teristics of the dataset, the cigarettes in the images are
each point in the kernel is an adjacent 3 × 3 rectangle small target objects and thus are represented by only a
(shown by the red dot positions in the upper left side of few pixels in the image. An ordinary convolutional neu-
the picture) and the dilation rate, in this case, is 1. The ral network (CNN) continuously reduces the information
dilated convolution is represented by the square on the characteristics of a small target object while carrying
lower right side in Figure 2 showing a receptive field of out a continuous convolution operation on the image.
size 7 × 7 for a 3 × 3 kernel having a dilation rate of However, most of the information is concentrated in the
2. Note that its point multiplier position is no longer the shallow feature map. Thus, the final feature map obtained
adjacent 3 × 3 rectangle, but the red dot positions in the from an ordinary CNN has fewer or sometimes no target
lower right side of the figure. features and this makes it impossible to detect cigarettes
in the image. In contrast, since the FPN is continuously
Calculation of the receptive field in a dilated convolution blending shallow features in the convolution process, the
is done as follows: characteristics of the small targets are preserved in the
final feature map. Thus, in the dilated convolution pro-
The receptive field size of an ordinary convolution layer cess, there is no loss of information without pooling and a
k is calculated by, large receptive field area enables each convolution output
to contain a large amount of information. Due to these
advantages, in the present work, FPN combined with the

k−1
lk = lk−1 + (fk − 1) ∗ si (1) dilated convolution network has been selected to extract
i=1 the features from the driver’s image to obtain an accurate
detection.
where lk−1 is the receptive field size of the (k − 1)th layer,
fk is the convolution kernel size of the current layer, and
2.2 FPN-D Network Structure
si is the step size of the ith layer.
In the present work, a new network structure of FPN-D
Since the convolution kernel of the dilated convolution (schematic is shown in Figure 3) consisting of FPN and
changes with the dilation rate, in order to calculate the dilated convolution has been proposed. In this, the image
Figure 3: Structure diagram of FPN-D
data enter the FPN from the input layer after convolu-
tion and the characteristics of the shallow network are
continuously aggregated. The feature map obtained from
this operation is then subjected to a dilated convolution
operation. According to the characteristics of the dataset,
three dilation rates (r = 1, r = 2, r = 3) have been
used to perform the dilated convolution operation. At
the same time, considering that the dilated convolution
may lead to grid problems, the method of hierarchical
feature fusion [8] has been used to solve this problem.
This method uses the feature maps convolved by differ-
ent dilation ratios and superimposes them step by step.
As a result, no additional parameters are introduced, the
amount of calculation also does not increase much and
thus, the grid effect can be effectively improved. In addi-
tion, the input feature map is also added to the final out-
put feature map by doing an element-by-element summa-
tion to improve information transfer. The network then
Figure 4: Schematic for the details of the addition process
undergoes a convolution and the result thus obtained is
outputted by the Sigmoid layer. The above network struc-
ture can efficiently capture the features of small objects the evaluation data set, including 300 normal images of
in an image and the dilated convolution can retain a the driver; 400 images are used to constitute the test data
large amount of information so that the characteristics of set, including 200 driver smoking images. The original
the small objects can be learned from the driver image. data in this article is a video file of the driver during
The detailed process of addition to the network is shown the driving process collected from the vehicle monitoring
in Figure 4, where the activation functions used by all platform. Then take out the video file frame by frame to
convolution layers are Rectified Linear Units(ReLu). form the original driver image as shown in the image on
the left in Figure 5. Then simply crop the original driver
image to form the training image in the text as shown in
2.3 Data Preprocessing
the right image in Figure 5. In this way, the influence of
The driver image used in this article is provided by the the background information in the image on the experi-
local driver real-time monitoring platform. There are a mental results is reduced. In this work, the original driver
total of 7,000 training images, which consist of images of image is simply cropped to reduce the influence of back-
drivers driving more than 200 different types of vehicles. ground information in the image, as shown in Figure 5.
Of these, 3,500 are images of the driver smoking and the This is done using a code for face recognition written in
rest are images of the driver’s normal driving. The spe- the OpenCV environment, that can recognize a face in
cific allocation of the data set is as follows: 6000 images an image or a video and the program can save the rec-
are used to constitute the training data set, including 3000 ognized face as an image. Because the size of the image
driver smoking images; 600 images are used to constitute after trimming by the program is not uniform, the image
Figure 5: Driver image before (left) and after (right) cropping
size ranges from 380 × 380 to 410 × 410. Thus, the input configuration of the network structure is shown in
size of the image in the model is specified as (400 × 400, Table 1.
3),which can avoid the situation that the detection target
is lost. In the training process, the objective function adopted by
the network is a binary cross-entropy function and the
adaptive moment estimation (Adam) [9] optimizer has
2.4 Training and Parameter Selection been used. The learning rate of Adam is set to 0.001, the
The network is trained using the preprocessed driver exponential decay rate of the mean of the gradient is set to
images. The input size of the image is (400 × 400, 3) 0.9, and the exponential decay rate of the uncentered vari-
and the reading mode is RGB mode. Real-time data ance of the gradient is set to 0.999. The dilation ratio used
enhancement, such as horizontal flipping, random clip- in the present work adopts a combination of three dila-
ping, random scaling, etc., is performed before the data tion ratios of r = 1, r = 2, and r = 3, which can obtain
is inputted into the network for training and the net- information from a wider range of pixels and avoid grid
work is then trained as a whole. The detailed parameter problems. At the same time, the method can also adjust
Table 1: Detailed structure of the network

Layer Input size Output size Hyper parameter
Conv1 400 × 400,3 400 × 400,32 32,3 × 3,1
MaxPooling1 400 × 400,32 200 × 200,32 2 × 2,2
Conv2 200 × 200,32 200 × 200,32 32,3 × 3,1
MaxPooling2 200 × 200,32 100 × 100,32 2 × 2,2
Conv3 100 × 100,32 100 × 100,64 64,3 × 3,1
MaxPooling3 100 × 100,64 50 × 50,64 2 × 2,2
Conv4 50 × 50,64 50 × 50,64 64,3 × 3,1
MaxPooling4 50 × 50,64 25 × 25,64 2 × 2,2
Conv5 25 × 25,64 25 × 25,64 64,3 × 3,1
UpSampling1 25 × 25,64 50 × 50,64 2×2
Conv6 50 × 50,64 50 × 50,64 64,1 × 1,1
Add1 50 × 50,64 50 × 50,64 50 × 50,64 –
UpSampling2 50 × 50,64 100 × 100,64 2×2
Conv7 100 × 100,32 100 × 100,64 64,1 × 1,1
Add2 100 × 100,64 100 × 100,64 100 × 100,64 –
Conv8 100 × 100,64 98 × 98,64 64,3 × 3,1
MaxPooling5 98 × 98,64 49 × 49,64 2 × 2,2
D-conv Operation 49 × 49,64 49 × 49,64 64,3 × 3,1,rate = 1,2,3
Conv9 49 × 49,64 47 × 47,32 32,3 × 3,1
MaxPooling6 47 × 47,32 23 × 23,32 2 × 2,2
FC1 – – 64
FC2 – – 64
output – – 1
the size of the receptive field by modifying the dilation In order to ensure the fairness of comparison, the three
rate. benchmark methods of VGG, ResNet50 and FPN have
been debugged many times to ensure that they can be well
adapted to the experimental environment of this arti-
3. EXPERIMENTAL SIMULATION AND RESULTS
cle. Therefore, before the experimental comparison. First,
In this work, the detection performance of FPN-D was perform multiple debugging trainings for each network
evaluated on the basis of accuracy, precision, recall, speci- structure to be compared. Then select the best train-
ficity, and Receiver Operating Characteristic (ROC). The ing parameters corresponding to the network training to
accuracy rate indicates the ratio of correctly predicted compare the subsequent experiments to ensure the fair-
samples to all the samples; the precision rate indicates ness of the experimental comparison. The accuracy rate
the ratio of truly correct samples to all the samples that obtained by applying different methods to the dataset
have been classified as correct samples; the recall rate, used in the experimental simulation is shown in Figure 6,
also known as sensitivity, indicates the ratio of the sample while the loss value for the different methods is shown in
that is predicted to be the correct sample to all the sam- Figure 7. From these two plots, it can be seen that FPN-D
ples that are the correct samples in reality; the specificity has certain advantages in accurate detection in the dataset
refers to the probability of correctly predicting the wrong and changes in the loss function value during training.
samples. The above-mentioned performance parameters The classification performance of a particular compar-
are required to obtain the confusion matrix (see Table 2). ison method is expressed by the average classification
accuracy, precision, recall rate, and specificity. The values
Following are the formulae used for calculating the dif- of these parameters have been calculated using Equations
ferent performance parameters: (3) to (6) and the numerical values thus obtained are
TP + TN
Accuracy: (3)
TP + FN + FP + TN
TP
Recall rate: (4)
TP + FN
TP
Precision rate: (5)
TP + FP
TN
Specificity: (6)
FP + TN
In order to make a comprehensive and accurate evalua-
tion of FPN-D proposed in this work, this method has
been compared with different network structure meth-
ods. The main reason for choosing the VGG network
for comparison is to study whether the deep network
structure is effective in identifying small targets. Tra- Figure 6: Plot showing the accuracy rate obtained by applying
ditional neural networks have more or less problems different network structure training to the dataset
such as information loss when transmitting information.
The residual network optimizes the above problems by
directly bypassing the input information to protect the
integrity of the information. According to the charac-
teristics of the research object in this article, the target
information is originally small, and once the informa-
tion is lost, it will seriously affect the research results.
Therefore, ResNet50 is selected for comparison in the
article.
Table 2: Confusion matrix

Predictive 1 Predictive 0 Total
Actual 1 (P) TP FN TP + FN
Actual 0 (N) FP TN FP + TN Figure 7: Plot showing the loss value for the validation set for the
Total TP + FP FN + TN TP + FN + FP + TN different network structure training
Table 3: Values of the performance of different methods can automatically extract the relevant features from the
Method Accuracy Recall Specificity Precision two-dimensional images of the driver without the need
VGG 81 68.50 93.50 91.33 to design an algorithm for visual feature extraction. The
ResNet50 82.5 67.50 96 94.41 network can fully utilize the features in the driver’s image,
FPN 87.50 79.50 95.50 94.64
FPN-D 94.75 96 95 95.05 such as color, edge, texture, etc. to automatically train a
suitable convolution filter to extract the relevant features
Table 4: AUC and CI values for the different methods from the image.
Method AUC CI
VGG 80.90 (0.023) [76.5, 85.4] As shown in Table 3, the accuracy of detecting images
ResNet50 82.70 (0.022) [78.4, 87] using only FPN is 87.50% and the recall rate is 79.50%.
FPN 87.40 (0.019) [83.7, 91.2]
FPN-D 95.50 (0.012) [93.1, 97.8] The accuracy of detecting images with FPN-D is 94.75%,
with an increase of 7.25 percentage points while the recall
rate is 96%, an increase of 16.5 percentage points as com-
pared to that obtained by using only FPN. Since most
of the image features of small objects exist in the feature
map of the shallow network, the FPN-D network makes
full use of this feature. In the process of convolution, the
shallow feature fusion is continuously performed, so that
the shallow features are well preserved thus well retaining
the shallow features. Compared with traditional VGG,
ResNet and other networks, it has certain advantages.
The FPN-D detection effect is better than the feature
pyramid network because the FPN-D network has a hol-
low convolution layer, which can expand the range of
the receptive field without increasing the amount of cal-
culation, so as to better retain the features of the small
target in the feature map. Especially for some small tar-
gets with large dispersion in the image, the retention of
features in the feature map is higher than that of the gen-
Figure 8: ROC curves for different methods eral convolutional layer. Therefore, from the perspective
of the detection range, the hole convolution plays a cer-
given in Table 3. From a comparison of the numbers in tain role in the formation of the feature map of small
the table, the FPN-D method is found to be superior as objects. Thus, the FPN-D proposed in the present work is
compared to the other three feature extraction methods quite effective. In terms of computational cost, the aver-
in terms of accuracy, recall, and precision rate. Table 4 age time for categorization by GPU is not too different
gives the values of the area under the ROC curve (AUC) and can be neglected in practical applications. In addi-
and the asymptotically 95% confidence interval(CI) for tion, in order to evaluate the advantages of the network
different methods while Figure 8 shows the ROC curves studied in this work, its comparison with some tradi-
corresponding to the different methods. From the figure, tional network structures has also been done. From the
it can be found that the AUC (Area Under the Curve) results shown in Table 3, the accuracy of the most tra-
value of the method studied in the article is closest to 1. ditional convolutional network, VGG detection method,
So the FPN-D detection method has the highest authen- is 81% and its recall rate is 68.50%. It can be seen that
ticity. the deep network structure is not ideal for detecting small
target objects. The accuracy of the same data obtained by
employing the ResNet50 detection method is 82.50% and
4. ANALYSIS AND DISCUSSION
its recall rate is only 67.50%. It can be seen that the advan-
The present work uses the deep learning method in tages of the residual network cannot be well reflected in
order to detect the smoking behavior of drivers. Although the research data in the article. To verify the advantages
there are some investigations on driver behavior detec- of the feature pyramid network in the researched data,
tion and recognition based on deep learning [10–14] and two networks, ResNet50 and ResNet-FPN, are used for
research on driver’s smoking behavior, this paper is a experiments. Through the ROC curve shown in Figure
first attempt at using FPN to analyze the driver’s smok- 8 and the data in Table 4, it can be seen that the feature
ing habit. Compared to the traditional methods, FPN-D pyramid network has a good effect on the data studied
on other data sets. The network structure proposed in

the article has not been tested on other data sets. Later,
we will collect enough images of other small objects to
explore the recognition effect of FPN-D on other small
objects.
5. CONCLUSION
This work proposes a detection method based on the
convolutional neural network to detect the driver’s smok-
Figure 9: Test images of drivers smoking ing behavior. Results of the simulation experiment show
that the method proposed in this work can realize the
task of detecting the driver’s smoking behavior while
in the article. Then compare the AUC values of different driving quite accurately. Moreover, artificial intelligence
methods, the FPN-D method has the largest AUC value, is a very popular research topic at present. There are a
which is 95.60%, and is superior in accuracy as compared lot of studies done on image classification and recog-
to the AUC values of the other structures. Comparing the nition and human behavior recognition by employing
accuracy rate and loss value shown in Figures 6 and 7 for different convolution methods [15–28]. In this paper, we
different detection methods, the accuracy of the FPN-D apply deep learning to the driver’s image for a prelim-
network structure proposed in the present work has the inary study on the detection, by the network, of their
highest value and the value of loss is relatively the lowest. smoking behavior. The datasets used for the studies were
Thus, from the above-described analysis, it can be seen collected from real images captured by drivers while they
that the FPN-D has a higher probability and accuracy for are driving. Although the data obtained might be suf-
detecting the driver’s smoking behavior. That is to say ficient for the present study, as more data are acquired
that the FPN-D can efficiently learn the relevant features in the future, the network structure can be optimized in
to predict whether the driver in an image is smoking or order to achieve better detection results. This is because,
not. for the deep convolutional neural network, providing
more data helps the network to achieve better general-
From the analysis, it is observed that the FPN-D method ization performance and reduces the fitting problem. The
recognizes some images correctly while some incorrectly. current dataset is insufficient to achieve higher accuracy
Among the images shown in Figure 9, (1), (2) and (3) than that obtained in the present work. Further work is
are the images that the network correctly recognizes as needed to obtain better performance in order to achieve
a driver smoking; (4), (5) and (6) are the images that a better recognition accuracy and improve the possibil-
the network wrongly identifies as the driver not smok- ity of applying the algorithm to real life, thus enabling to
ing. From a comparative analysis, it can be inferred that achieve an AI smoke control for drivers while they drive.
the ambient light and the position of the camera inside In addition, the network structure in the article is rel-
the vehicle will have some influence on the detection of atively simple compared to other network structures. If
the driver’s smoking. the programming language can be converted into the lan-
guage supported by the hardware running motherboard
At present, there is no analysis method for design- and embedded in the platform, it should be able to meet
ing hyperparameters in FPN-D (such as learning rate the actual driving application. Finally, in real life, the goal
and momentum parameters, number of convolution of AI smoke control for the driver in the driving process
units, size of convolution kernel, etc.). These are mainly is realized.
obtained through experience. In addition, for deep con-
volutional networks, providing more data is necessary
to help the network obtain better generalization perfor- ACKNOWLEDGEMENTS
mance and reduce over-fitting problems. At present, the The authors would like to thank tutor for his guide and help for
data set in this article is not enough to obtain higher this paper work.
accuracy, and further research is needed to obtain better
performance in the future. And affected by the charac-
teristics of the experimental research data, the network DISCLOSURE STATEMENT
structure proposed in the article has not been tested No potential conflict of interest was reported by the author(s).
REFERENCES 14. C. Yan, et al. “Video-based classification of driving behav-

ior using a hierarchical classification system with multiple
1. S. Guan, Y. Zhang, and Z. Tian. “Research on human
features,” Int. J. Pattern Recognit. Artif. Intell., Vol. 30, no.
behavior recognition based on deep neural network,” in
05, pp. 1650010, 2016.
3rd International Conference on Mechatronics Engineering
and Information Technology (ICMEIT 2019). Atlantis Press,
15. S. Maity, D. Bhattacharjee, and A. Chakrabarti, “A novel
2019.
approach for human action recognition from silhouette
images,” IETE J. Res., Vol. 63, no. 2, pp. 160–71, 2017.
2. C. Zhang, R. Li, W. Kim, D. Yoon, and P. Patras. “Driver
behavior recognition via interwoven deep convolutional
16. B. V. Baheti, S. N. Talbar, and S. S. Gajre, “A training-free
neural nets with multi-stream inputs.” 2018.
approach for generic object detection,” IETE J. Res., 1–14,
2019. DOI:10.1080/03772063.2019.1611491.
3. S. Yan, et al. “Driver behavior recognition based on deep
convolutional neural networks,” in 2016 12th International
17. Z. Cai, et al. “A unified multi-scale deep convolutional
Conference on Natural Computation, Fuzzy Systems and
neural network for fast object detection.” in European con-
Knowledge Discovery (ICNC-FSKD). IEEE, 2016.
ference on computer vision. Springer, 2016.
4. L. Yang, et al. “Driving behavior recognition using EEG
18. S. N. Sivanandam, and M. Paulraj, “An approach for image
data from a simulated car-following experiment,” Accid.
classification using backpropagation scheme,” IETE J. Res.,
Anal. Prev., Vol. 116, no. SI, pp. 30–40, 2018.
Vol. 46, no. 5, pp. 315–7, 2000.
5. C. Chiu, C. Kuo, and P. Chang. “Smoking action recog-
19. B. K. Mohan, “Classification of remotely sensed images
nition based on spatial-temporal convolutional neural
using artificial neural networks,” IETE J. Res., Vol. 46, no.
networks,” in 2018 Asia-Pacific Signal and Information
5, pp. 401–10, 2000.
Processing Association Annual Summit and Conference
(APSIPA ASC). IEEE, 2018.
20. P. O. Glauner, “Deep convolutional neural networks for
smile recognition,” Comput. Sci., 2015.
6. T. Lin, et al. “Feature pyramid networks for object detec-
tion,” in Proceedings of the IEEE conference on computer
21. K. He, et al. “Deep residual learning for image recognition.”
vision and pattern recognition. 2017.
in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition. 2016.
7. F. Yu, and V. Koltun. Multi-scale context aggregation
by dilated convolutions. arXiv preprint arXiv:1511.07122,
22. P. Dollár, et al., “Fast feature pyramids for object detection,”
2015.
IEEE Trans. Pattern Anal. Mach. Intell., Vol. 36, no. 8, pp.
1532–45, 2014.
8. S. Mehta, et al. “Espnet: Efficient spatial pyramid of dilated
convolutions for semantic segmentation,” in Proceedings
23. L. Sun, et al., “Human action recognition using factorized
of the European Conference on Computer Vision (ECCV).
spatio-temporal convolutional networks,” in Proceedings
2018.
of the IEEE International Conference on Computer Vision.
2015.
9. D. P. Kingma, and J. Ba. Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980, 2014.
24. S. Ren, et al., “Object detection networks on convolutional
feature maps,” IEEE Trans. Pattern Anal. Mach. Intell., Vol.
10. Y. Xing, et al. “Driver activity recognition for intelligent
39, no. 7, pp. 1476–81, 2016.
vehicles: A deep learning approach,” IEEE Trans. Veh. Tech-
nol., Vol. 68, no. 6, pp. 5379–5390, 2019.
25. S. Zhao, et al., “Pooling the convolutional layers in deep
ConvNets for video action recognition,” IEEE Trans. Cir-
11. J. Zeng, Y. Sun, and L. Jiang. “Driver distraction detection
cuits Syst. Video Technol., Vol. 28, no. 8, pp. 1839–49, 2018.
and identity recognition in real-time.” in 2010 Second WRI
Global Congress on Intelligent Systems. IEEE, 2010.
26. E. H. Adelson, et al., “Pyramid methods in image process-
ing,” RCA Eng., Vol. 29, no. 6, pp. 33–41, 1984.
12. G. Jianqiang, and Y. Wei. Driver pre-accident behavior
pattern recognition based on dynamic radial basis func-
27. K. He, et al., “Spatial pyramid pooling in deep convolu-
tion neural network.” in Proceedings 2011 International
tional networks for visual recognition,” IEEE Trans. Pat-
Conference on Transportation, Mechanical, and Electrical
tern Anal. Mach. Intell., Vol. 37, no. 9, pp. 1904–16, 2015.
Engineering (TMEE), 2011.
28. P. Wang, et al. “Understanding convolution for semantic
13. P. Li, J. Shi, and X. Liu, “Driving style recognition based on
segmentation,” in 2018 IEEE winter conference on applica-
driver behavior questionnaire,” Open J. Appl. Sci., Vol. 07,
tions of computer vision (WACV). IEEE, 2018.
no. 04, pp. 115–28, 2017.
AUTHORS Kai Hao is a graduate student at China

University of Mining and Technology. He
Zuopeng Zhao is an associate professor at is currently engaged in the research of arti-
China University of Mining and Technol- ficial intelligence in transportation. The
ogy, a master’s tutor, and a doctor gradu- specific research direction is to imple-
ated from Peking University. He is mainly ment intelligent recognition algorithms
engaged in artificial intelligence, Beidou on hardware devices.
satellite positioning and mobile Internet of
Things and other aspects of research. He
Email: 515307059 @qq.com
has published 1 monograph, 7 papers in
SCI journals, more than 10 papers in EI journals, and 4 national Hualin Yan is a graduate student at China
invention patents. He hosted numerous vertical projects and University of Mining and Technology. She
horizontal topics. He won two second prizes of the 5th Science is currently engaged in the research of arti-
and Technology Progress Award of the State Administration of ficial intelligence in transportation. Her
Work Safety, and three second prizes of China Coal Industry specific research direction is to identify
Association Science and Technology Progress Award. driver behavior.
Email: 6510875@qq.com
Haihan Zhao is a graduate student at
Lan Zhang is a graduate student at China
China University of Mining and Technol-
University of Mining and Technology. She
ogy. She is currently engaged in research
is currently engaged in the research of arti-
on artificial intelligence in transportation.
ficial intelligence in transportation. Her
The specific research direction is to iden-
specific research direction is to identify the
tify the driver’s behavior.
driver’s behavior.
Corresponding author. Email: 903682394@qq.com Yi Xu is a graduate student at China Uni-
versity of Mining and Technology. He is
Chen Ye is a graduate student of China currently engaged in the research of arti-
University of Mining and Technology. He ficial intelligence in transportation. The
is currently engaged in the research of arti- specific research direction is to imple-
ficial intelligence in the medical field. His ment intelligent recognition algorithms
specific research direction is to identify the on hardware devices.
benign and malignant thyroid gland.
Email: ts17170032a3tm@cumt.edu.cn
Xinzheng Xu is an associate professor
at the China University of Mining and
Technology and a master’s tutor. He is
mainly engaged in machine learning and
data mining, artificial intelligence and pat-
tern recognition, medical image process-
ing and other aspects of research. He has
published 1 monograph, more than 10
papers in SCI journals, and more than 20 papers in EI journals.
Currently, he is a member of the Chinese Computer Society and
Chinese Artificial Intelligence Society.
Email: xuxinzh@163.com

FPN-D-Based Driver Smoking Behavior Detection Method

Uploaded by

Copyright:

Available Formats

You might also like

FPN-D-Based Driver Smoking Behavior Detection Method

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

FPN-D-Based Driver Smoking Behavior Detection Method

Uploaded by

Copyright:

Available Formats

IETE Journal of Research

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tijr20

FPN-D-Based Driver Smoking Behavior Detection

To link to this article: https://doi.org/10.1080/03772063.2021.1982409

© 2021 The Author(s). Published by Informa

Published online: 14 Oct 2021.

Submit your article to this journal

Article views: 4362

View related articles

View Crossmark data

Citing articles: 1 View citing articles

Full Terms & Conditions of access and use can be found at

FPN-D-Based Driver Smoking Behavior Detection Method

Figure 1: The FPN structure

receptive field size of the dilated convolution, it is neces-

where r is the dilation rate of the dilated convolution

Thus, by obtaining the convolution kernel size of the

Figure 3: Structure diagram of FPN-D

Figure 5: Driver image before (left) and after (right) cropping

Table 1: Detailed structure of the network

Table 2: Confusion matrix

on other data sets. The network structure proposed in

REFERENCES 14. C. Yan, et al. “Video-based classification of driving behav-

AUTHORS Kai Hao is a graduate student at China

You might also like