Drive-Awake: A YOLOv3 Machine Vision Inference Approach of Eyes Closure for Drowsy Driving Detection

Jonel R. Macalisang
Technology Licensing Office - ITSO
Technological University of the Philippines
Manila, Philippines
jonelmacalisang@gmail.com

Alvin Sarraga Alon
Digital Transformation Center, STEER Hub
Batangas State University
Batangas City, Philippines
alvin.alon@g.batstate-u.edu.ph

Moises F. Jardiniano
Department of Computer Engineering
FEU Institute of Technology
Manila, Philippines
mfjardiniano@feutech.edu.ph

Deanne Cameren P. Evangelista
College of Engineering and Architecture
Bohol Island State University
Tagbilaran, Bohol, Philippines
deannecameren.evangelista@bisu.edu.ph

Julius C. Castro
College of Engineering and Architecture
Bohol Island State University
Tagbilaran, Bohol, Philippines
julcas_cvcaft@yahoo.com

Meriam L. Tria
Department of Electronics Engineering
Eulogio "Amang" Rodriguez Institute of Science and Technology
Manila, Philippines
profmeriamtria@gmail.com

2021 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET) | 978-1-6654-2899-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/IICAIET51634.2021.9573811

Abstract — Nowadays, road accidents have become a major concern. The drowsiness of drivers owing to overfatigue or tiredness, driving while intoxicated, and driving too quickly are some of the primary causes. Drowsy driving contributes to or increases the number of traffic accidents each year. In response to this issue, this study presents a technique for detecting driver drowsiness. The sleep states of drivers in the driving environment were detected using a deep learning approach. A convolutional neural network (CNN) model was developed to assess whether the eyes in constant face images of drivers are closed. The suggested model has a wide range of possible applications, including human-computer interface design, facial expression detection, and determining driver tiredness and drowsiness. The YOLOv3 algorithm, along with additional tools such as the Pascal VOC format and LabelImg, was used to build this approach, which collects and trains a dataset of drowsy drivers. The study's total detection accuracy was 100%, with per-frame detection accuracy ranging from 49% to 89%.

Keywords—deep learning, driving, drowsiness, object detection, YOLOv3

I. INTRODUCTION

The number of road accidents has been one of the most alarming issues in society [1]. Because of traffic accidents, 1.25 million people around the world died last year. There are also alarming numbers from the Philippines, with reports from the Philippine Statistics Authority (PSA) showing that the number of road accident casualties has increased since 2006 (from 6,869 in 2006 to 10,012 in 2015) [2]. Factors such as reckless driving, overspeeding, drunk driving, human error, and even drowsiness have been the leading causes of these accidents [3]. The researchers in this study were interested in reducing road accidents by tackling the issue of driver sleepiness. Driver drowsiness is a major risk in the transportation system and has been established as a causal or contributory factor in road accidents [4]. According to the findings, driving when fatigued is equivalent to driving while under the influence of drugs or alcohol [5].

Real-time detection of drowsiness is one solution for lessening road accidents [6]. A study was conducted on how many decibels are needed to wake a driver up. Including people who are hard of hearing, the study used 3100 Hz and 450 Hz signals to test the effectiveness of the two signals at waking people up [7]. Both tests presented 75 decibels for 2 minutes. Only 57% of hard-of-hearing participants awoke at 3100 Hz, while at 450 Hz, 92% awoke [8]. Another study reports that vibration in car seats is one of the factors that makes a driver drowsy, but vibrations at some frequencies may have the opposite effect and help keep people awake [9]. That study conducted an observation test in which 15 participants went through a driving simulator for 60 minutes, one group receiving a 4-7 Hz vibration through the car seat and the other no vibration [10]. Based on the results, after exposure to the vibration for 15-30 minutes, sympathetic activity increased, producing a response of alertness when the person felt drowsy, and this peaked at 60 minutes [11].

The real-time detection system can be classified into two kinds of monitoring systems, namely the vehicle-oriented system and the driver-oriented system [12]. Sleepiness is detected in the vehicle-oriented system by analyzing the driver's behaviors using data collected by the vehicle's sensors, such as the vehicle's location in the lane, steering wheel motions, pedal pressure, and speed amplitude [13]. The drawback of this technique is that driving behavior differs from one driver to the next, which complicates creating an "ideal or proper driving" model that can be used to discover differences in driver habits [14].

Drowsiness is recognized in the driver-oriented system using facial features. The human face is dynamic and has a wide range of expressions [15]. Human eyes are important in object identification and facial expression research because they are one of the most prominent characteristics of the human face. When recognizing facial characteristics, it is therefore advantageous to recognize the eyes before identifying other facial features [16].



A similar study is about making vehicles more knowledgeable and interactive so that, under inevitable circumstances, they alert or notify the user and may provide critical real-time information about the situation to the car owner, rescuer, or authority. In today's growing number of road accidents, the exhaustion of drivers arising from sleep disorders or sleep deprivation is a major factor [17].
According to studies, a person who drives continuously without stopping appears to be at greater risk of becoming sleepy. Evidence shows that collisions are caused by tired drivers who need to rest, which indicates that sleepiness causes more road accidents than drunk driving [18].

To determine driver drowsiness, different studies used descriptive tests such as (1) vehicle-focused measures, (2) physiological measures, and (3) behavioral measures [19]. A detailed review of these measures provides insight into the existing systems, some relevant challenges, and the improvements that need to be developed to build a robust system. The three metrics for the sensors used are analyzed in this study, and the benefits and restrictions of each are discussed. The numerous ways in which drowsiness has been experimentally engineered are also discussed. It is concluded that the degree of drowsiness of a driver can be measured accurately by developing a hybrid drowsiness detection system that incorporates non-intrusive physiological parameters with other metrics. Several road incidents can then be prevented if a warning is given to a driver perceived to be drowsy [20].

Like other studies, this study aims to lessen the reported traffic accidents due to driver drowsiness and to avoid a high risk of accidents on the road. The difference between this study and others is that it utilized a pre-trained algorithm [21]-[23], which eliminated the need for manual image processing. The study of [24] used two algorithms, MTCNN and EM-CNN, where the former was used for face detection and focal-point location and the latter for evaluating the state of the eyes and mouth; in this study, the YOLOv3 algorithm is utilized to develop a drowsiness detection system that directly localizes the eyes, along with the LabelImg tool for dataset annotation. The succeeding sections of this paper further discuss the experimental methodology and results.

II. METHODOLOGY

Fig. 1. Proposed system block diagram.
Fig. 2. Sample dataset for driver drowsiness detection.

Fig. 1 depicts the research's block diagram. Data preparation was the first step, followed by training and validation. After that, the model created during training was evaluated. The last stage consisted of evaluating the final detection model as well as testing it using a video file.

A. Gathering of Datasets

The creation of the system starts with dataset collection. Data images, as shown in Fig. 2, are used in creating the object detection system. These datasets were gathered from Google Images and a dataset repository called Kaggle. They were trained and validated using YOLOv3, a pretrained algorithm. The study employed 1000 images in total, with 80% used for training and 20% for validation. The study also utilized 10 images and a video clip for testing.

B. Data Annotation (Ground Truth Labelling)

After gathering the image datasets, they were manually annotated using the LabelImg tool.
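The dataset preparation in Section II-A (1000 images split 80/20 into training and validation sets) can be sketched as follows; the file names here are hypothetical placeholders, not the study's actual dataset:

```python
import random

def split_dataset(image_paths, train_frac=0.8, seed=42):
    """Shuffle and split a list of image paths into training and validation sets."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)  # fixed seed for a reproducible split
    cut = int(len(paths) * train_frac)
    return paths[:cut], paths[cut:]

# 1000 images -> 800 for training, 200 for validation, as in the study
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train_set, val_set = split_dataset(images)
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when comparing models trained on the same partition.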


Fig. 3. Ground truth labeling of the dataset.
Fig. 4. YOLOv3 object detection framework.
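The Pascal VOC annotations produced in Section II-B are XML files; a minimal sketch of reading the bounding-box coordinates back, assuming the standard VOC layout that LabelImg writes (the `closed_eye` label is a hypothetical class name):

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Parse a Pascal VOC annotation file into (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return boxes
```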

For the annotation, the study used the LabelImg tool in Pascal VOC format. Bounding boxes are created in the images, as shown in Fig. 3, focusing on the target object to be detected. Saving the annotated images generates an annotation file containing the x and y coordinates of each bounding box.

C. Object Detection Framework

In terms of detection accuracy, YOLOv3 outperformed YOLOv2. SSD and RetinaNet are both accurate, but YOLOv3 is 3.0 and 3.8 times faster, respectively, according to [25]. For multiscale box prediction, YOLOv3 employs an approach similar to feature pyramids [26]. Darknet-53, as shown in Fig. 4, is the backbone feature extractor of YOLOv3. It has 53 convolutional layers (successive 3x3 and 1x1 convolutional layers with skip connections comparable to ResNet).

D. Training and Evaluation

The dataset was trained and validated once the annotation of the images was completed. Google Colaboratory and the pre-trained YOLOv3 algorithm were used to train and validate this dataset. YOLOv3 is a convolutional neural network and a variant of YOLO (You Only Look Once) that works very well in real-time object detection, as it can recognize specific objects in images, videos, and live feeds. During this phase, the algorithm passes through the entire dataset multiple times; in machine learning, each such pass is referred to as an epoch. Each epoch generated a new model, and the performance of these models is assessed using their mean average precision (mAP).

E. Testing

Generated models are evaluated using their mean average precision (mAP). For testing, the research utilizes the model with the highest mAP: for detection models like R-CNN and YOLO, the higher the model's mAP, the more accurate its detection.

For the testing process, the study used a different set of images and video datasets. These images are not part of the 1000 images used for training and validation; they are used to eliminate bias in the testing accuracy result, which is defined as the number of detected objects divided by the total number of objects, multiplied by 100.

Fig. 5. Training and validation results.

III. RESULTS AND DISCUSSIONS

A. Training and Validation Results

Fig. 5 graphically represents the training and validation of the trained dataset. The training and validation errors are represented on the Y-axis, while the number of epochs is represented on the X-axis. The dataset was trained and validated for 50 epochs, with a training loss of 1.0 to 37.0 and a validation loss of 0.4 to 12.0. These losses are the errors generated during training and validation and may indicate how good the model is.

The area under the interpolated precision-recall (PR) curve, which may be computed using formulas (1) and (2) below, is then defined as the Average Precision (AP). The interpolated precision p_{interp} at a certain level of recall r is defined as the highest precision identified for any level of recall r' >= r:

AP = \sum_{n} (r_{n+1} - r_n) \, p_{interp}(r_{n+1})    (1)

p_{interp}(r_{n+1}) = \max_{r' \ge r_{n+1}} p(r')    (2)

The average AP across all queries is denoted as the mean average precision (mAP) (3), where K is the number of queries in the set and AP(k) is the average precision for a given query k:

mAP = \frac{1}{K} \sum_{k=1}^{K} AP(k)    (3)

B. Model Evaluation

The average precision (AP) metric is frequently used to evaluate the accuracy of object recognition algorithms and models such as R-CNN and YOLOv3. For this study, Fig. 6 is the visual representation of the epoch with respect to the value of mAP in the evaluation of the training datasets.
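Equations (1)-(3) can be implemented directly; a minimal sketch, assuming the precision-recall pairs are already sorted by increasing recall:

```python
def interpolated_ap(recalls, precisions):
    """Average Precision (Eq. 1) using interpolated precision (Eq. 2)."""
    r = [0.0] + list(recalls)              # recall levels, with r_0 = 0
    p = [0.0] + list(precisions) + [0.0]   # trailing 0 so the backward max is defined
    # Eq. 2: p_interp(r) = max precision at any recall r' >= r (backward sweep)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Eq. 1: AP = sum over recall steps of (r_{n+1} - r_n) * p_interp(r_{n+1})
    return sum((r[i] - r[i - 1]) * p[i] for i in range(1, len(r)))

def mean_average_precision(ap_values):
    """Eq. 3: mAP is the mean of the per-query AP values."""
    return sum(ap_values) / len(ap_values)
```

A detector that is perfectly precise at every recall level yields AP = 1.0, matching the mAP of 1 reported for the study's best model.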
Fig. 6. Evaluation of model.
TABLE I. RESULTS OF EVALUATION (MODEL 20)

The mAP is depicted on the Y-axis, while the epoch number is represented on the X-axis, as illustrated in Fig. 6. In addition, only 17 models were made, with mAP ranging from 0.0337 (3.37%) to 1 (100%). Model 20 is used in this study, having an mAP value of 1 or 100% (as shown in Table I), a training loss of 2.5999, and a validation loss of 1.4047.

C. Testing Results

Fig. 8. Result of testing.

The detection accuracy per frame of the video testing is shown in Fig. 8. The short video clip, as shown in the graph, contains 50 frames, across which the variation or fluctuation of detection accuracy is graphically represented. The detection accuracy per frame ranges from 40% to 89%, and in totality the model has 100% testing accuracy, since it accurately detects the driver's eye drowsiness level in every frame. As shown in Fig. 7, the testing results for video image frames (a), (b), (c), and (d) were 42.803%, 59.283%, 67.172%, and 87.793%, respectively. The per-frame detection accuracy of 40% to 89% can be improved by adding more data.

IV. CONCLUSION AND FUTURE WORKS

The research indicates that deep learning can be used to detect eye drowsiness, which can help deter road crashes due to driver fatigue or exhaustion. The study employs the pre-trained YOLOv3 algorithm and the Pascal VOC format, as well as the LabelImg tool for data annotation, to come up with this solution. The model generated and used by the study achieved an mAP value of 1 (100%), 97.3609% training accuracy, and 98.5993% validation accuracy. With this, the final testing achieved 100% testing accuracy, where all images presented during testing were detected correctly and accurately.

Building on the proposed eye drowsiness detection for drivers, the system could be enhanced by adding another dataset that detects the driver's attentiveness using images of causes of driver distraction, such as texting, eating, or drinking while driving. Hardware can also be included in the system: a camera integrated with an alarm, so that every time the camera detects the driver's drowsiness, the alarm or notification is activated.
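The camera-plus-alarm integration suggested above can be sketched as a simple per-frame rule; the eye-state labels and the 15-frame threshold are hypothetical assumptions for illustration, not values from the study:

```python
ALARM_THRESHOLD = 15  # hypothetical: consecutive "closed" frames before the alarm fires

def should_alarm(frame_states, threshold=ALARM_THRESHOLD):
    """Trigger once 'closed' has been detected for `threshold` consecutive frames."""
    streak = 0
    for state in frame_states:
        streak = streak + 1 if state == "closed" else 0
        if streak >= threshold:
            return True
    return False
```

Requiring a run of consecutive closed-eye frames, rather than a single detection, keeps normal blinks from triggering the alarm.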
ACKNOWLEDGMENT
The authors would like to express their heartfelt thanks to
the Digital Transformation Center Lab of Batangas State
University's STEER Hub (Science, Technology, Engineering,
and Environment Research Hub).
Fig. 7. Testing result of video frames (a), (b), (c), and (d).

REFERENCES
[1] J. Rolison, S. Regev, S. Moutari and A. Feeney, "What are the factors that contribute to road accidents? An assessment of law enforcement views, ordinary drivers' opinions, and road accident records", Accident Analysis & Prevention, vol. 115, pp. 11-24, 2018. doi: 10.1016/j.aap.2018.02.025
[2] N. Verzosa and R. Miles, "Severity of road crashes involving pedestrians in Metro Manila, Philippines", Accident Analysis & Prevention, vol. 94, pp. 216-226, 2016. doi: 10.1016/j.aap.2016.06.006
[3] T. Taris, "Reckless driving behaviour of youth: Does locus of control influence perceptions of situational characteristics and driving behaviour?", Personality and Individual Differences, vol. 23, no. 6, pp. 987-995, 1997. doi: 10.1016/s0191-8869(97)00126-8
[4] A. Bener, E. Yildirim, T. Özkan and T. Lajunen, "Driver sleepiness, fatigue, careless behavior and risk of motor vehicle crash and injury: Population based case and control study", Journal of Traffic and Transportation Engineering (English Edition), vol. 4, no. 5, pp. 496-502, 2017. doi: 10.1016/j.jtte.2017.07.005
[5] A. McCartt, S. Ribner, A. Pack and M. Hammer, "The scope and nature of the drowsy driving problem in New York state", Accident Analysis & Prevention, vol. 28, no. 4, pp. 511-517, 1996. doi: 10.1016/0001-4575(96)00021-8
[6] R. Jabbar, K. Al-Khalifa, M. Kharbeche, W. Alhajyaseen, M. Jafari and S. Jiang, "Real-time Driver Drowsiness Detection for Android Application Using Deep Neural Networks Techniques", Procedia Computer Science, vol. 130, pp. 400-407, 2018. doi: 10.1016/j.procs.2018.04.060
[7] E. Aidman, C. Chadunow, K. Johnson and J. Reece, "Real-time driver drowsiness feedback improves driver alertness and self-reported driving performance", Accident Analysis & Prevention, vol. 81, pp. 8-13, 2015. doi: 10.1016/j.aap.2015.03.041
[8] M. Gromer, D. Salb, T. Walzer, N. Madrid and R. Seepold, "ECG sensor for detection of driver's drowsiness", Procedia Computer Science, vol. 159, pp. 1938-1946, 2019. doi: 10.1016/j.procs.2019.09.366
[9] X. Zhang, X. Wang, X. Yang, C. Xu, X. Zhu and J. Wei, "Driver drowsiness detection using mixed-effect ordered logit model considering time cumulative effect", Analytic Methods in Accident Research, vol. 26, p. 100114, 2020. doi: 10.1016/j.amar.2020.100114
[10] J. Solaz et al., "Drowsiness Detection Based on the Analysis of Breathing Rate Obtained from Real-time Image Recognition", Transportation Research Procedia, vol. 14, pp. 3867-3876, 2016. doi: 10.1016/j.trpro.2016.05.472
[11] P. Forsman, B. Vila, R. Short, C. Mott and H. Van Dongen, "Efficient driver drowsiness detection at moderate levels of drowsiness", Accident Analysis & Prevention, vol. 50, pp. 341-350, 2013. doi: 10.1016/j.aap.2012.05.005
[12] A. Picot, S. Charbonnier and A. Caplier, "EOG-based drowsiness detection: Comparison between a fuzzy system and two supervised learning classifiers", IFAC Proceedings Volumes, vol. 44, no. 1, pp. 14283-14288, 2011. doi: 10.3182/20110828-6-it-1002.00706
[13] Y. Jiao, Y. Deng, Y. Luo and B. Lu, "Driver sleepiness detection from EEG and EOG signals using GAN and LSTM networks", Neurocomputing, vol. 408, pp. 100-111, 2020. doi: 10.1016/j.neucom.2019.05.108
[14] H. Eoh, M. Chung and S. Kim, "Electroencephalographic study of drowsiness in simulated driving with sleep deprivation", International Journal of Industrial Ergonomics, vol. 35, no. 4, pp. 307-320, 2005. doi: 10.1016/j.ergon.2004.09.006
[15] A. Moujahid, F. Dornaika, I. Arganda-Carreras and J. Reta, "Efficient and compact face descriptor for driver drowsiness detection", Expert Systems with Applications, vol. 168, p. 114334, 2021. doi: 10.1016/j.eswa.2020.114334
[16] Q. Ji, "Real-Time Eye, Gaze, and Face Pose Tracking for Monitoring Driver Vigilance", Real-Time Imaging, vol. 8, no. 5, pp. 357-377, 2002. doi: 10.1006/rtim.2002.0279
[17] N. Bharadwaj, P. Edara and C. Sun, "Sleep disorders and risk of traffic crashes: A naturalistic driving study analysis", Safety Science, vol. 140, p. 105295, 2021. doi: 10.1016/j.ssci.2021.105295
[18] R. Huhta, K. Hirvonen and M. Partinen, "Prevalence of sleep apnea and daytime sleepiness in professional truck drivers", Sleep Medicine, vol. 81, pp. 136-143, 2021. doi: 10.1016/j.sleep.2021.02.023
[19] A. Murata, T. Yamaashi, K. Fukuda and M. Moriwaka, "Trend Analysis of Behavioral Measures for Predicting Point in Time of Crash", Procedia Manufacturing, vol. 3, pp. 2434-2441, 2015. doi: 10.1016/j.promfg.2015.07.503
[20] P. Forsman, B. Vila, R. Short, C. Mott and H. Van Dongen, "Efficient driver drowsiness detection at moderate levels of drowsiness", Accident Analysis & Prevention, vol. 50, pp. 341-350, 2013. doi: 10.1016/j.aap.2012.05.005
[21] H. Alon, M. Ligayo, M. Melegrito, C. Franco Cunanan and E. Uy II, "Deep-Hand: A Deep Inference Vision Approach of Recognizing a Hand Sign Language using American Alphabet", 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), 2021. doi: 10.1109/iccike51210.2021.9410803
[22] J. Dioses, Jr, "Bottle-SegreDuino: An Arduino Frequency-Based Bin for Tin Can and Plastic Bottle Segregation using an Inductive Proximity Effect", International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 4, pp. 5451-5454, 2020. doi: 10.30534/ijatcse/2020/184942020
[23] L. Lacatan, R. Santos, J. Pinkihan, R. Vicente and R. Tamargo, "Brake-Vision: A Machine Vision-Based Inference Approach of Vehicle Braking Detection for Collision Warning Oriented System", 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), 2021. doi: 10.1109/iccike51210.2021.9410750
[24] Z. Zhao, N. Zhou, L. Zhang, H. Yan, Y. Xu and Z. Zhang, "Driver Fatigue Detection Based on Convolutional Neural Networks Using EM-CNN", Computational Intelligence and Neuroscience, vol. 2020, pp. 1-11, 2020. doi: 10.1155/2020/7251280
[25] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement", arXiv, 2018. Available: https://arxiv.org/pdf/1804.02767.pdf
[26] T. Lin, P. Goyal, R. Girshick, K. He and P. Dollar, "Focal Loss for Dense Object Detection", 2017 IEEE International Conference on Computer Vision (ICCV), 2017. doi: 10.1109/iccv.2017.324
