
Detection of Distracted Driver using Machine Learning Algorithm

Aayush Anand 1, Ishita Sharma 2, S. Suchitra 3
Department of Data Science and Business Systems, SRM Institute of Science and Technology, Chennai, India
aa0339@srmist.edu.in 1, is8299@srmist.edu.in 2, suchitrs@srmist.edu.in 3

Abstract- Driving a car is a challenging task that requires your full attention. Any action that causes the motorist to lose focus on the road constitutes distracted driving. About 1.35 million people die in automobile accidents every year. According to a report by the National Highway Traffic Safety Administration, nearly one in five auto accidents is the result of a distracted driver. We want to build a system that can identify distracted drivers and alert them. In this study, a machine learning algorithm for recognizing distracted drivers is proposed.
Keywords- Distracted driver detection, machine learning, image processing, convolutional neural networks.

1. Introduction
According to a report by the World Health Organization (WHO), 1.3 million people worldwide die in traffic accidents annually, making them the eighth leading cause of death, while another 20 to 50 million suffer injuries or disabilities. Indian roads account for the largest number of deaths in the world, according to a survey by the National Crime Records Bureau (NCRB), Government of India. Since 2006, the number of people killed in traffic accidents has risen steadily in India. The survey also notes that the overall number of fatalities increased to 1.46 lakh in 2015, with driver error accounting for the majority of these accidents.

In recent years, the incidence of accidents involving distracted driving has increased. In 2015, distracted driving caused 3,477 fatalities and 391,000 injuries in motor vehicle crashes, according to the National Highway Traffic Safety Administration (NHTSA) [2]. Every day in the US, an estimated 9 people are killed and over 1,000 are injured in traffic accidents where a distracted driver is at fault [1]. The NHTSA defines distracted driving as "any activity that diverts the attention of the driver from the task of driving" and divides it into manual, visual, and cognitive distractions [2] [1]. The CDC defines cognitive distraction as the "driver's mind is off the road." In other words, the driver is physically in a safe driving position but is mentally elsewhere, for example daydreaming or lost in thought. When a driver's "eyes are off the road," the driver is considered visually distracted, for example through inattention, tiredness, or drowsiness. Activities in which the "driver's hands are off the wheel" are considered manual distractions. Such distractions include talking or texting on cell phones, eating and drinking, conversing with passengers in the vehicle, adjusting the radio, applying makeup, etc.

Nowadays, Advanced Driver Assistance Systems (ADAS) are being developed to help avoid accidents by providing technologies that alert the driver to potential problems and keep the car's driver and occupants safe in the case of an accident. Even with today's most advanced autonomous vehicles, the driver must be alert and prepared to regain control of the car in an emergency. In May 2016, a Tesla on autopilot hit a white truck-trailer in Williston, Florida, which became the first fatal collision in autonomous car testing. Uber's self-driving car, with an emergency backup driver at the wheel, recently struck and killed a pedestrian in Arizona. The safety driver in each of these fatal accidents had a chance to avoid the collision, but the evidence shows that the driver was clearly distracted. For this reason, self-driving cars must also include a component that detects inattentive driving. Detecting distracted driving is, in our opinion, crucial for taking further preventive action.

If a vehicle could detect such distractions and warn the driver, the number of accidents could be reduced.

In this study, we focus on recognizing manual distractions, in which the driver is engaged in activities other than safe driving, and on identifying the cause of the distraction. We present a Convolutional Neural Network-based solution to this problem. We also try to reduce computational complexity and memory requirements while retaining high accuracy, which is important in real-time applications.

2. Related Works
This section briefly reviews the literature on distracted driving detection. Cell phone use is the main cause of manual distractions [2]. Motivated by this, several researchers worked on detecting mobile phone usage while driving. Zhang et al. [19] built a database with a camera positioned above the dashboard and used a Hidden Conditional Random Fields model to identify mobile phone usage. The model relies on hand, mouth, and face features. In 2015, Das et al. [5] developed a dataset for hand detection in the automobile environment and used the Aggregate Channel Features (ACF) object detector to obtain an average accuracy of 70.09%. Additionally, Seshadri et al. [14] produced their own dataset for the detection of cell phone usage.

The authors employed the Supervised Descent Method, Histogram of Oriented Gradients (HoG) features, and an AdaBoost classifier to obtain 93.9% classification accuracy. The system runs at 7.5 frames per second, which is close to real-time performance. Le et al. trained Faster-RCNN on the above-mentioned dataset to obtain 94.2% accuracy, which is higher than the earlier state-of-the-art techniques. They use face and hand segmentation to detect mobile phone usage. For detecting hands on the wheel and mobile phone usage, the system runs at 0.06 and 0.09 frames per second, respectively [7].

Although UCSD's Laboratory of Intelligent and Safe Automobiles made great progress in this area, they only addressed three forms of distraction: adjusting the radio, adjusting the mirrors, and operating the gear. Martin et al. [8] introduced a vision-based analysis framework that detects in-vehicle activities by employing two Kinect cameras to provide "hands on the wheel" information. Ohn-Bar et al. [12] proposed a classifier fusion in which the image is split into three regions (wheel, gear, and instrument panel) to infer the actual activity. They also presented a region-based classification method to detect hands in pre-defined regions of an image [11]. A model was learned for each region separately and then merged using a second-stage classifier. The authors extended their study [13] to include eye cues in addition to the previously used head and hand cues. However, it only considered three types of distractions.

Zhao et al. built another distracted driving dataset with a side view of the driver, covering four activities: safe driving, operating the shift lever, eating, and smartphone use. The authors used the contourlet transform and a random forest to obtain 90.5% accuracy. They also proposed a system using PHOG and a multilayer perceptron that achieves an accuracy of 94.75%. In 2016, Yan et al. [18] reported a Convolutional Neural Network-based solution with 99.78% classification accuracy. These earlier datasets covered only a small number of distractions, and many of them are not publicly accessible. In April 2016, State Farm's distracted driver detection competition on Kaggle defined 10 postures to be detected (safe driving plus nine distracted behaviors) [3]. This was the first publicly available dataset to consider a wide range of distractions. Many of the proposed approaches were based on conventional hand-crafted feature extractors like SIFT, SURF, and HoG combined with classical classifiers like SVM, BoW, and NN.

However, CNNs turned out to be the most successful methods for reaching high accuracy [9]. Under the competition rules, though, the use of the dataset is limited to competition purposes only. In 2017, Abouelnaga et al. [4] created a new dataset similar to State Farm's dataset for distracted driver detection. They preprocessed the images by segmenting the skin, face, and hands, and proposed a solution based on a weighted ensemble of five Convolutional Neural Networks. The system obtained high classification accuracy, but it is computationally too complex to run in real time, which is critical in autonomous driving.

3. Dataset Description
The dataset produced by Abouelnaga et al. [4] is used in this work. The dataset contains 10 classes: safe driving, texting on a mobile phone with the right or left hand, talking on a mobile phone with the right or left hand, adjusting the radio, eating or drinking, hair and makeup, reaching behind, and talking to passengers. Figure 1 shows a few samples from the dataset for each class. The data was collected from 31 participants from seven different countries, using four different cars, and includes many variations in drivers and driving conditions. For instance, drivers are exposed to a variety of lighting conditions, including sunlight and shadows. The dataset contains 17,308 images in total, split into a training set (12,977) and a test set (4,331).

4. Technical Approach
A deep convolutional neural network (CNN) is a type of Artificial Neural Network (ANN) inspired by the animal visual cortex. CNNs have made significant progress in a variety of applications in recent years, including image classification, object detection, action recognition, natural language processing, and many more. The basic building blocks of a CNN-based system are convolutional filters/layers, activation functions, pooling layers, and Fully Connected (FC) layers. A CNN is essentially built by stacking these layers one after the other. The availability of large amounts of labeled data and computational power has allowed CNNs to advance rapidly since 2012.

1. VGG-16 Architecture
VGG Net is one of the most influential CNN architectures in the literature. It promoted the idea that networks should be deep and simple. It performed well in both image classification and localization tasks. A 16-layer VGG-16 model is used here. The model consists mostly of convolutional layers, together with zero-padding, dropout, max-pooling, and flattening. To get the most out of transfer learning, we added a few more layers to assist the
model in adapting to our use case. Each layer's function:

• The global average pooling layer keeps only the mean of the values in each feature map.

• Dropout layers help control overfitting by randomly dropping a fraction of the units during training (it is worth experimenting with different dropout rates).

• The batch normalization layer normalizes the inputs to the subsequent layer, allowing for faster and more robust training.

• A dense layer is a regular fully connected layer with a given activation function.

We initialize the network with the pre-trained ImageNet model weights and then fine-tune every layer of the network on our dataset. As a first step, all images are resized to 224 × 224, and the per-channel mean of the RGB planes is subtracted from each pixel of the image. Geometrically, this centers the data cloud around the origin along each dimension. The initial layers of the CNN act as feature extractors, and the last layer, the softmax classifier, assigns each image to one of the predefined categories. The original model, however, has 1000 output channels that correspond to the 1000 ImageNet object classes. The last layer is therefore removed and replaced with a softmax layer with 10 classes.

2. VGG-16 with Regularization
Experiments with the original VGG-16 network revealed that the model overfits the training data. Although it performs very well on the training set, achieving near-perfect accuracy, it fails to generalize to unseen data. For this reason, we use several regularization techniques to reduce the generalization error. Furthermore, the LeakyReLU activation function is used instead of ReLU. The main differences from the original VGG-16 are as follows:

• LeakyReLU Activation Function
The Rectified Linear Unit (ReLU) activation function has gained popularity in recent years due to its efficiency and faster convergence. However, because ReLU sets the output to zero for all inputs less than zero, the weights of some neurons may never be updated, resulting in dead neurons. LeakyReLU solves this problem by introducing a small slope in the negative region to keep the updates alive.

• Dropout
Dropout, which randomly ignores some neurons during the training phase, is an effective way of reducing overfitting [17]. It helps reduce interdependent learning among neurons. We use linearly increasing dropout in a few convolutional and fully connected layers.

• L2 Weight Regularization
Weight regularization, also known as weight decay, relies on the implicit assumption that a model with small weights is in some way simpler than a network with large weights [10]. It is implemented by explicitly penalizing the squared magnitude of every parameter in the cost function. For each weight $w$ in the network, we add the term $\frac{1}{2}\lambda w^2$ to the cost function, where $\lambda$ is the regularization strength. We set $\lambda = 0.001$.

• Batch Normalization
Batch normalization improves the performance and stability of neural networks by explicitly forcing the activations of a layer to follow a unit Gaussian distribution [6]. It reduces the strong dependence on weight initialization, improves gradient flow through the network, and permits higher learning rates. In this work, the activations of all convolutional and fully connected layers are normalized.

3. Modified VGG-16
The main drawback of VGG-16 is its large number of parameters, nearly 140M in total. The fully connected layers are computationally expensive and hold the majority of these parameters. Additionally, a network with fully connected layers can only be applied to inputs of a fixed size. Replacing a fully connected layer with a convolutional layer saves parameters and allows for variable input size [15]. We therefore construct a fully convolutional neural network by replacing the dense layers with 1 × 1 convolutions. The modified architecture has only about 15M parameters, roughly 11% of the original VGG-16. All regularization parameters are the same as in the previous section.
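The two regularization ingredients that are easiest to state in code are the LeakyReLU activation and the L2 weight penalty. The following is a minimal pure-Python sketch, not the authors' Keras implementation. The penalty uses the regularization strength 0.001 given in the text; the negative slope `alpha = 0.01` and the sample inputs and weights are assumptions made only for illustration, since the paper does not state them.

```python
def relu(x: float) -> float:
    """Standard ReLU: zero output (and zero gradient) for negative inputs."""
    return x if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """LeakyReLU: keeps a small slope `alpha` for non-positive inputs.

    alpha = 0.01 is an assumed value; the text does not give the slope used.
    """
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of LeakyReLU; it is never exactly zero, so updates keep flowing."""
    return 1.0 if x > 0 else alpha


def l2_penalty(weights, lam: float = 0.001) -> float:
    """Sum of (1/2) * lam * w^2 over all weights; lam = 0.001 as in the text."""
    return sum(0.5 * lam * w * w for w in weights)


def l2_penalty_grad(w: float, lam: float = 0.001) -> float:
    """Gradient of the penalty with respect to a single weight: lam * w."""
    return lam * w


# Hypothetical pre-activations and weights, chosen only for illustration.
inputs = [-2.0, -0.5, 0.0, 1.5]
activations = [leaky_relu(x) for x in inputs]
weights = [0.5, -1.0, 2.0]
penalty = l2_penalty(weights)
```

With ReLU the negative inputs would map to 0 with zero gradient, whereas LeakyReLU maps them to small negative values with gradient `alpha`, which is the "modest slope" that prevents dead neurons. The penalty gradient `lam * w` shows why this regularizer is also called weight decay: every update shrinks each weight in proportion to its size.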
5. Results and Discussions
To identify distracted drivers, we develop a convolutional neural network-based system that detects distractions. The pre-trained ImageNet model is used for weight initialization, following the transfer learning approach. The weights of all network layers are fine-tuned on the dataset. All hyperparameters are tuned through extensive experimentation. Training uses Stochastic Gradient Descent with a learning rate of 0.0001, a decay rate of $10^{-6}$, and a momentum of 0.9. The batch size and the number of epochs are both set to 64. An NVIDIA P5000 GPU, which has 2560 CUDA cores and 16 GB RAM, is used for training and testing. The framework is implemented in Keras. When the original VGG-16 is used for distracted driver detection, it achieves 100% accuracy on the training set and 94.44% accuracy on the test set. Adding dropout, L2 weight regularization, and batch normalization considerably improves the system's performance and yields 96.31% accuracy on the test set. The system processes 42 images per second on average. Table 1 gives the confusion matrix, which provides a detailed and comprehensive view of the system's outputs. Table 2 shows the accuracy for each of the 10 classes in the dataset.

Because VGG-16 has a high parameter count and therefore a high memory requirement, we present a new architecture with a nearly 90% reduction in parameters without significantly affecting accuracy. On the test set, it achieves an accuracy of 95.54%. The confusion matrix and class-wise accuracies for the modified VGG-16 architecture are shown in Tables 3 and 4.
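The parameter figures quoted for VGG-16 (nearly 140M in total, most of them in the fully connected layers, and roughly 15M after the dense layers are replaced) can be checked with simple arithmetic. The sketch below counts the weights and biases of the standard VGG-16 configuration. The replacement head, a single 1 × 1 convolution from 512 channels to 10 classes, is a hypothetical stand-in: the paper does not list the exact layers of its modified head, so the modified total here is only a lower-bound illustration.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Standard VGG-16: (input channels, output channels) of each 3x3 convolution.
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)

# Original classifier head: 7*7*512 -> 4096 -> 4096 -> 1000.
fc_total = (dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 1000))

original_total = conv_total + fc_total

# Hypothetical replacement head (an assumption, not the paper's exact design):
# one 1x1 convolution mapping 512 feature channels to the 10 classes.
hypothetical_head = conv_params(512, 10, k=1)
modified_total = conv_total + hypothetical_head

reduction = 1 - modified_total / original_total
print(f"original total:  {original_total:,}")
print(f"convolutions:    {conv_total:,}")
print(f"dense layers:    {fc_total:,}")
print(f"modified total:  {modified_total:,} (assumed head)")
print(f"reduction:       {reduction:.1%}")
```

Under these assumptions the convolutional layers alone account for about 14.7M parameters, which is consistent with the reported 15M for the modified network and with a reduction of close to 90%.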
The confusion matrices show that the "safe driving" and "talking to passenger" postures are most frequently confused with each other. The "hands on the wheel" position common to both classes could be the reason. Additionally, texting and talking on a mobile phone are sometimes confused with each other. The lack of temporal information in the analysis may be responsible for this misclassification.
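Class-wise accuracy and the most frequently confused pair of classes can both be read directly off a confusion matrix. The sketch below shows one way to compute them, assuming rows are true classes and columns are predicted classes. The three-class matrix and its counts are hypothetical and are not the paper's results.

```python
def class_wise_accuracy(confusion):
    """Per-class accuracy: diagonal entry divided by the row total.

    Rows are true classes and columns are predicted classes.
    """
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(row[i] for i, row in enumerate(confusion))
    total = sum(sum(row) for row in confusion)
    return correct / total


def most_confused_pair(confusion):
    """Return (true_class, predicted_class, count) of the largest off-diagonal entry."""
    best = (0, 0, -1)
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i != j and count > best[2]:
                best = (i, j, count)
    return best


# Hypothetical counts for three of the ten classes, NOT the paper's results.
labels = ["safe driving", "texting", "talking to passenger"]
confusion = [
    [90, 2, 8],
    [3, 95, 2],
    [10, 1, 89],
]

per_class = class_wise_accuracy(confusion)
overall = overall_accuracy(confusion)
true_idx, pred_idx, count = most_confused_pair(confusion)
print(f"overall accuracy: {overall:.4f}")
print(f"most confused: {labels[true_idx]} -> {labels[pred_idx]} ({count} samples)")
```

In this toy matrix the largest off-diagonal entry is "talking to passenger" predicted as "safe driving", which mirrors the kind of confusion described above.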
Abouelnaga et al. [4] proposed a distracted driver detection and posture classification system that consists of a genetically weighted ensemble of five ConvNets. These five convolutional neural networks are trained on different image sources: raw, skin-segmented, hand, face, and "hands + face" images. On these five image sources, they trained the models using AlexNet or InceptionV3. This approach makes the system too complex for real-time use, which is crucial for self-driving cars. In contrast, we adopt a single ConvNet, which is less complex and still achieves higher accuracy than prior techniques, as shown in Table 5.
6. Conclusion and Future Work
Distracted driving is a serious problem that contributes to numerous car accidents all around the world. Even with self-driving cars, the detection of a distracted driver therefore becomes a crucial system component. Here, we describe a robust convolutional neural network-based system that can both detect a distracted driver and identify the cause of the distraction. We modify the VGG-16 architecture for this task and use several regularization techniques to avoid overfitting to the training data. With an accuracy of 96.31%, the proposed system outperforms existing distracted driver detection methods from the literature on this dataset, as shown in Table 5. On an NVIDIA P5000 GPU with 16 GB RAM, the system can process 42 images per second. Additionally, we propose a streamlined VGG-16 model that uses just 15M parameters while maintaining acceptable classification accuracy.

As an extension of this work, we are working on further reducing the number of parameters and the computation time. Including temporal information might help reduce misclassification errors and improve accuracy. In the future, we also want to build a system that can recognize visual and cognitive distractions in addition to manual distractions.

References:-

[1] Center for Disease Control and Prevention. https://www.cdc.gov/motorvehiclesafety/distracted_drivig/

[2] National Highway Traffic Safety Administration traffic safety facts. https://www.nhtsa.gov/risky-driving/distracted-driving/.

[3] State Farm distracted driver detection. https://www.kaggle.com/c/state-farm-distracted-driver-detection.

[4] Y. Abouelnaga, H. M. Eraqi, and M. N. Moustafa. Real-time distracted driver posture classification. CoRR, abs/1706.09498, 2017.

[5] N. Das, E. Ohn-Bar, and M. M. Trivedi. On performance evaluation of driver hand detection algorithms: Challenges, dataset, and metrics. In 2015 IEEE 18th International Conference on Intelligent Transportation Systems, pages 2953–2958, Sept 2015.

[6] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, ICML’15, pages 448–456. JMLR.org, 2015.

[7] T. H. N. Le, Y. Zheng, C. Zhu, K. Luu, and M. Savvides. Multiple scale faster-rcnn approach to driver cell-phone usage and hands on steering wheel detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 46–53, June 2016.

[8] S. Martin, E. Ohn-Bar, A. Tawari, and M. M. Trivedi. Understanding head and hand activities and coordination in naturalistic driving videos. In 2014 IEEE Intelligent Vehicles Symposium Proceedings, pages 884–889, June 2014.

[9] R. P. A. S. Murtadha D Hssayeni, Sagar Saxena. Distracted driver detection: Deep learning vs handcrafted features. IS&T International Symposium on Electronic Imaging, pages 20–26, 2017.

[10] A. Y. Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages 78–, New York, NY, USA, 2004. ACM.

[11] E. Ohn-Bar, S. Martin, A. Tawari, and M. M. Trivedi. Head, eye, and hand patterns for driver activity recognition. In 2014 22nd International Conference on Pattern Recognition, pages 660–665, Aug 2014.

[12] E. Ohn-Bar, S. Martin, and M. M. Trivedi. Driver hand activity analysis in naturalistic driving studies: challenges, algorithms, and experimental studies. J. Electronic Imaging, 22(4):041119, 2013.

[13] E. Ohn-Bar and M. Trivedi. In-vehicle hand activity recognition using integration of regions. In 2013 IEEE Intelligent Vehicles Symposium (IV), pages 1034–1039, June 2013.

[14] K. Seshadri, F. Juefei-Xu, D. K. Pal, M. Savvides, and C. P. Thor. Driver cell phone usage detection on strategic highway research program (shrp2) face view videos. In 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 35–43, June 2015.

[15] E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4):640–651, April 2017.

[16] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations.

[17] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15(1):1929–1958, Jan. 2014.

[18] C. Yan, F. Coenen, and B. Zhang. Driving posture recognition by convolutional neural networks. IET Computer Vision, 10(2):103–114, 2016.

[19] X. Zhang, N. Zheng, F. Wang, and Y. He. Visual recognition of driver hand-held cell phone use based on hidden crf., pages 248–251, July 2011.
