Towards A System For Real-Time Prevention of Drowsiness-Related Accidents
Corresponding Author:
Abdelhak Khadraoui
Department of Computer Science, ENSAM Meknes, Moulay Ismail University, Meknes, Morocco
Marjane 2, P.B. 15290 Al-Mansour, Meknes 50500, Morocco
Email: a.khadraoui@edu.umi.ac.ma
1. INTRODUCTION
According to the Moroccan National Road Safety Strategy 2017-2026 and the World Health Organization,
road accidents are a major public health problem globally. Traffic accidents are one of the leading causes of
death and injury worldwide: each year, more than 1.35 million people die in road accidents and millions more
are injured or disabled. In Morocco, about 3,500 people die and 12,000 are seriously injured in road accidents
each year, i.e., on average 10 deaths and 33 serious injuries every day. With this in mind, and in collaboration
with all road safety stakeholders, Morocco has decided to implement the National Road Safety Strategy
(2017-2026) to prevent all forms of road accidents. This strategy defines an ambitious long-term vision to
develop ‘responsible action in Morocco’ and sets a quantifiable goal of halving road deaths by 2026 compared
to current levels (fewer than 1,900 road deaths in 2026).
Advanced driver assistance systems (ADAS) use advanced technologies to assist the driver of a
vehicle [1]. An ADAS collects data from inside and outside the vehicle and performs technical processing
such as identifying, detecting, and tracking static and dynamic objects, making it an active safety technology
that enables drivers to detect potential hazards as early as possible in order to draw their attention and improve
safety. Typical functions of visual ADAS include vehicle recognition and tracking, lane line recognition,
traffic sign recognition, and pedestrian recognition. Since ADAS have various built-in functions, one of the
constraints imposed on the module presented in this work is that it must not distract the driver and must avoid
triggering false alarms that cause the driver to turn off the ADAS. The aim is therefore a non-intrusive system
that can detect driver drowsiness from a series of images, which is currently a difficult task.
Currently, the main research results regarding driver behavior modeling technology include the
following. Cai et al. [2] proposed the concept of a driving characteristics map. Miyajima and Takeda [3] used
on-road driving data to model driver behavior; this technique was implemented using statistical
machine learning methods. Angkititrakul et al. [4] proposed a probabilistic driver behavior model using Gaus-
sian mixture models. Shi et al. [5] proposed to assess driving style by normalizing driving behavior based on
personalized driver modeling [6]. To quantitatively assess driving style in this way, an aggressiveness index
is proposed that can be used to identify abnormal driving behavior. Taniguchi et al. [7] proposed an unsu-
pervised learning method based on the original double articulation analyzer model. This
method predicts possible driving behavior scenarios by segmenting and modeling incoming time-series data
about driving behavior. Okuda et al. [8] proposed a probability-weighted autoregressive exogenous (ARX)
model, in which multiple ARX models are combined through probability weighting functions. This model
can represent real-world driving behavior.
This work begins with the following assumption: an on-board camera captures images of the driver
from the front and analyzes them using artificial intelligence (AI) technology to determine whether the driver
is feeling sleepy. The system can then warn the driver in real time to prevent possible accidents. The
implementation of the proposed solution combines a feature pyramid network (FPN) [9] with a deep residual
network (ResNet) [10]. The model can recognize patterns in a series of images, so it can predict whether the
driver is tired. FPN is one of the most popular feature fusion techniques for solving multiscale problems in
object detection.
The remainder of this paper is organized as follows: section 2 provides an overview of recent research
on driver drowsiness detection using deep learning; section 3 describes the proposed drowsiness detection
method; section 4 presents the experiments and results, analyzes them, and discusses identified problems,
possible improvements, and future work directions; and section 5 presents the conclusions of this work.
2. RELATED WORK
Human eye blinks can be categorized into the following types: reflex, involuntary, and voluntary
blinks [11]. Reflexive blinks are evoked by stimuli such as loud sounds and bright lights. Involuntary blinks
occur unconsciously, while voluntary blinks are consciously generated in response to a cue [11]. Blinking has
been used to detect drowsiness and concentration, eye fatigue, and dry eye [12], [13]. A blink is the momentary
or brief closure of both upper eyelids in a coordinated manner.
2.1. Eye aspect ratio-based driver’s blink detection
The quick closure and reopening of the human eye is known as blinking, and each person has a
unique blink pattern. A blink lasts around 100-400 milliseconds. The driver's blink behavior is therefore a
crucial cue for detecting whether he or she is drowsy.
The first step in blink detection is to obtain and crop the face from the frontal image. This step uses
the well-known facial recognition library Dlib [14]. Dlib is based on facial landmarks, which are points on the
face that represent features such as the eyes, mouth, and nose. It provides a pre-trained facial landmark detector
that can detect 68 points on a face, six of which are related to each eye, as shown in Figure 1. By calculating
the ratio of the Euclidean distances between the six eye coordinate points using the formula given by (1), we
can determine whether a person is blinking. This ratio is known as the eye aspect ratio (EAR) [15].
\[
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert} \tag{1}
\]
where p1, p2, p3, p4, p5, p6 represent 2D position coordinates as shown in Figure 1. When the eyes are open,
the aspect ratio is nearly constant, but during a blink it rapidly drops towards zero. A threshold is used to
assess whether the eyes are open or closed: an EAR below the threshold indicates closed eyes, while an EAR
above the threshold indicates open eyes. The time the driver's eyes remain closed can then be used to infer
fatigue.
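To make the computation concrete, the following Python sketch evaluates equation (1) on both eyes using Dlib's standard 68-landmark model (indices 36-41 and 42-47 for the two eyes). It is a minimal example, not the paper's released code; the predictor file name "shape_predictor_68_face_landmarks.dat" and the threshold of 0.2 are conventional, illustrative choices.

```python
# Minimal EAR-based eye-closure check, assuming Dlib's 68-landmark model.
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(eye):
    """Compute the EAR of equation (1) from the six 2D eye landmarks p1..p6."""
    vertical_1 = np.linalg.norm(eye[1] - eye[5])   # ||p2 - p6||
    vertical_2 = np.linalg.norm(eye[2] - eye[4])   # ||p3 - p5||
    horizontal = np.linalg.norm(eye[0] - eye[3])   # ||p1 - p4||
    return (vertical_1 + vertical_2) / (2.0 * horizontal)

def eyes_closed(gray_frame, threshold=0.2):
    """Return True if the first detected face has a mean EAR below the threshold."""
    faces = detector(gray_frame, 0)
    if not faces:
        return False
    shape = predictor(gray_frame, faces[0])
    points = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    left_ear = eye_aspect_ratio(points[36:42])     # landmarks of one eye
    right_ear = eye_aspect_ratio(points[42:48])    # landmarks of the other eye
    return (left_ear + right_ear) / 2.0 < threshold
```

In practice, fatigue is inferred not from a single frame but from how long the EAR stays below the threshold across consecutive frames.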
Figure 1. The six landmarks (two-dimensional coordinates) representing the eye, automatically detected by
Dlib: for an open eye (left image) and a closed eye (right image) [15]
Table 1. Comparative study of recent deep learning based approaches for driver drowsiness detection, trained
on benchmark image datasets

Articles | Network architecture | Classes | Datasets | Accuracy | Speed (fps)
Reddy et al. 2017 [18] | MTCNN (multi-task cascaded CNN) + DDDN (driver drowsiness detection network) | 3 classes: normal, yawning, drowsy | DROZY [19]; National Tsing Hua University (NTHU)-drowsy [20] | Baseline-4: 91.3%; Baseline-2: 93.8% | Baseline-4: 6.1; Baseline-2: 12.5
Dipu et al. 2021 [21] | Single-shot multibox detection (SSD) MobileNet + CNN | 2 classes: eyes open, eyes closed | Media Research Lab (MRL) Eye dataset [22] | 90% | 8
Kulhandjian 2021 [23] | DCNN (deep CNN) | 3 classes: yawn, no yawn, unknown | IEEE Yawning Detection Dataset (YawDD) [24] | 95.1% | 16-20
Kulhandjian 2021 [23] | DCNN (deep CNN) | 2 classes: open eyes, closed eyes | Closed Eyes in the Wild (CEW) dataset [25] | 91.8% | 16-20
Magán et al. 2022 [26] | CNN + recurrent neural network (RNN) | 2 classes: awake, drowsy | University of Texas at Arlington Real-Life Drowsiness Dataset (UTA-RLDD) [27] | 60% | 10
3. METHOD
In this section, we describe our proposed drowsiness detection system. Countless people are on the
road day and night, and professional and long-distance drivers often suffer from sleep deprivation. Drowsy
driving is therefore very dangerous, and many accidents are due to driver fatigue. To avoid these accidents,
we built a system based on an FPN and ResNet architecture to detect whether the driver is drowsy and, in case
of danger, alert him or her in real time. Figure 2 below shows the pipeline of the proposed system for real-time
warning of the driver in case of drowsiness.
Figure 2. Pipeline of the proposed system for real-time drowsiness detection and driver notification
Figure 4. General overview of the proposed deep learning architecture for eyes classification
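The real-time loop behind this pipeline can be summarized by the following Python sketch. It is a hedged illustration, not the paper's implementation: the model file name "eye_classifier.h5", the use of OpenCV's Haar cascade for eye localization, the grayscale (80, 80) input, and the 15-frame alert threshold are assumptions chosen for the example.

```python
# Hedged sketch of the real-time drowsiness-warning loop (cf. Figure 2).
import cv2
import numpy as np
from tensorflow import keras

model = keras.models.load_model("eye_classifier.h5")  # hypothetical trained classifier
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

CLOSED = 1        # assumed index of the "closed" class in the model output
ALERT_AFTER = 15  # illustrative: consecutive closed-eye frames before alerting

capture = cv2.VideoCapture(0)  # on-board camera facing the driver
closed_streak = 0
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in eyes[:1]:  # classify the first detected eye region
        crop = cv2.resize(gray[y:y + h, x:x + w], (80, 80)) / 255.0
        probabilities = model.predict(crop[np.newaxis, ..., np.newaxis], verbose=0)
        closed_streak = closed_streak + 1 if probabilities.argmax() == CLOSED else 0
    if closed_streak >= ALERT_AFTER:  # warn the driver in real time
        cv2.putText(frame, "DROWSINESS ALERT!", (30, 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    cv2.imshow("driver monitoring", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
capture.release()
cv2.destroyAllWindows()
```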
To address the challenge of detecting objects at various scales, recognition systems (particularly
object detectors) use feature pyramid networks, first developed by Lin et al. [9]. The goal is for the network
to learn the features of the same object across a broad range of scales by using a pyramid of the same image
at various scales. An FPN can therefore be described as a feature extraction network that creates feature
representations at various scales. In our work, we adopted the FPN, originally proposed for object detection
networks, for image classification (i.e., eyes open or closed). Many recognition systems use residual networks
(ResNets) [10], a classic state-of-the-art deep neural network, as their backbone; ResNet has been demonstrated
to perform well on classification problems.
Two final features are extracted by the FPN, each representing the input image at a different scale.
Each is followed by a dropout layer (to prevent overfitting) and then a first classification layer. The softmax
function is not appropriate here because it normalizes each output neuron relative to the other output neurons,
so the ReLU activation function is used instead. The construction is completed by connecting two classification
layers, each made up of two neurons, to a dense layer of ten neurons, to which the final classification layer with
the softmax function is then connected. In this way, the network combines several classification results based
on features at various scales, which enables it to classify the images more accurately.
3.3.1. Model training
The two datasets we used contain training and test data. We trained the model for up to 20 epochs.
For network training, we used ResNet50V2 as a pretrained model with its weights in order to speed up
convergence. In the fitting phase, we adopted Adam as the optimizer, categorical cross-entropy as the loss
function, and a learning rate of 0.0001. We also used data augmentation techniques to make learning more
efficient and prevent the network from overfitting.
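A minimal training sketch consistent with this setup, reusing the model built in the sketch above, is given below. The augmentation ranges, batch size, validation split, and the "train_dir" directory layout (one sub-folder per class) are illustrative assumptions, not values fixed by the paper.

```python
# Training sketch: Adam, categorical cross-entropy, lr = 0.0001, 20 epochs.
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"])

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,        # illustrative augmentation ranges
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    validation_split=0.1)

train_flow = augmenter.flow_from_directory(
    "train_dir", target_size=(96, 96), batch_size=32,
    class_mode="categorical", subset="training")
val_flow = augmenter.flow_from_directory(
    "train_dir", target_size=(96, 96), batch_size=32,
    class_mode="categorical", subset="validation")

model.fit(train_flow, validation_data=val_flow, epochs=20)
```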
4. EXPERIMENTAL RESULTS
In this section, we report the image classification results of the network on the test set. We
implemented the algorithm and network in a Google Colab notebook and used the Keras library with the
TensorFlow backend to develop and run the deep networks. The results for each dataset are reported in
Table 2 in terms of speed (in frames per second), precision, recall, F1-score, and accuracy. Note that in the
first dataset images have three color channels (red, green, and blue; RGB), while in the second dataset images
are in grayscale only.
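For reference, metrics of the kind reported in Table 2 can be computed as in the following sketch, where "test_images" and "test_labels" are hypothetical arrays holding the test images and their integer class labels.

```python
# Sketch of per-dataset evaluation: precision, recall, F1-score, accuracy, fps.
import time
from sklearn.metrics import classification_report

start = time.perf_counter()
probabilities = model.predict(test_images, verbose=0)  # model from section 3
elapsed = time.perf_counter() - start

predictions = probabilities.argmax(axis=1)
print(classification_report(test_labels, predictions,
                            target_names=["open", "closed"]))
print(f"speed: {len(test_images) / elapsed:.1f} fps")
```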
4.1. Discussion
The average results of the two models in the single-image evaluation phase in Table 2 show that the
models achieved an overall accuracy of 98.17%. Our experiments show that while our model examines the
eyes very carefully, standard models such as plain CNNs and OpenCV-based detectors treat similar-looking
eyes alike and misclassify closed-eye images as open. Our model therefore achieves high accuracy overall.
Table 2 shows the average results of our model on the two datasets: it properly classifies 931 out of 970
images, which is an acceptable value. It also shows that our model runs almost as fast as the other models in
Table 1, with good processing speed. Since the image sequences in our dataset have a resolution of (80, 80)
in most cases, the system can process them in about 100 to 132 milliseconds. The variation arises because the
algorithm for selecting the images can differ in each of the dataset image sequences, so the processing speed
changes more than expected. What makes this work credible is that it was developed and tested in real
conditions: the input data is viewed as a series of images or videos (not a single image) and evaluated on a
large dataset, exhibiting high accuracy, few false positives, and good inference speed. We hope our joint
dataset and code will help other researchers improve their AI models and use them to diagnose driver behavior.
4.2. Model evaluation
This model uses video as real-time input to the scoring process to determine whether subjects are
feeling sleepy. Figure 5 shows frames of the video used. The camera is not in the same position as the standard
computer camera (above the screen) with which the model was trained, which made eye tracking very sensitive.
Also, the videos were recorded at different frame rates (15 and 30 fps), which differs from the frame rate used
to train the model (10 fps).
Figure 5. Evaluation using video from a webcam: the first image (left) shows face and eye detection where
the driver is not drowsy; the second image (right) shows a drowsy driver with the alert message (coupled
with sound in the video)
5. CONCLUSION
In this work, we introduced a fully automated system for detecting driver drowsiness from an open-
and closed-eye dataset. In the first stage, we suggested an image processing approach to filter the appropriate
images of the driver's face; this algorithm improves the network's accuracy and speed. To improve
classification, we implemented a new deep neural network in the next step. This network can be used to
increase accuracy on various classification problems, particularly for photos carrying essential information
and images with crucial small-scale objects. To classify drowsy and non-drowsy database images, we trained
three different deep convolutional networks. Our model outperformed the others by utilizing ResNet50V2, a
modified pyramid network, and the designed architecture. We used the trained networks to run the fully
automated identification system after training. Our model achieved an overall accuracy of 98.17% for
single-image classification (the first evaluation method). In the driver state identification step (sequence of
images or video), our model performed better than the other systems, correctly identifying approximately 463
out of 502 images as drowsy. We also used the webcam, which continuously captures and observes the
driver's eyes, to test the classification accuracy. Based on the results, the proposed methods can improve
drowsiness detection while being fast enough to be implemented in an ADAS. Finally, we note that this
system can be used in other industrial situations where the awareness of the employee is crucial to avoid
accidents.
ACKNOWLEDGEMENTS
This work is carried out within the research project entitled “A knowledge management approach for
road accident scenarios” granted by the Moroccan Ministry of Equipment and the National Center for Scientific
and Technical Research (CNRST), Rabat, Morocco. The project is part of the Road Safety Research Program
2020.
REFERENCES
[1] F. Jiménez, J. E. Naranjo, J. J. Anaya, F. García, A. Ponz, and J. M. Armingol, “Advanced driver assistance system for
road environments to improve safety and efficiency,” Transportation Research Procedia, vol. 14, pp. 2245-2254, 2016, doi:
10.1016/j.trpro.2016.05.240.
[2] H. Cai, Z. Hu, Z. Chen, and D. Zhu, “A driving fingerprint map method of driving characteristic representation for driver identifica-
tion,” IEEE Access, vol. 6, pp. 71012–71019, 2018, doi: 10.1109/ACCESS.2018.2881722.
[3] C. Miyajima and K. Takeda, “Driver-behavior modeling using on-road driving data: A new application for behavior signal process-
ing,” IEEE Signal Processing Magazine, vol. 33, no. 6, pp. 14–21, 2016, doi: 10.1109/MSP.2016.2602377.
[4] P. Angkititrakul, C. Miyajima, and K. Takeda, “Modeling and adaptation of stochastic driver-behavior model with application to car
following,” in 2011 IEEE Intelligent Vehicles Symposium (IV), 2011, pp. 814–819, doi: 10.1109/IVS.2011.5940464.
[5] B. Shi et al., “Evaluating driving styles by normalizing driving behavior based on personalized driver modeling,” IEEE Transactions
on Systems, Man, and Cybernetics: Systems, vol. 45, no. 12, pp. 1502–1508, 2015, doi: 10.1109/TSMC.2015.2417837.
[6] J. Liu, Y. Jia, and Y. Wang, “Development of driver-behavior model based on WOA-RBM deep learning network,” Journal of Advanced
Transportation, vol. 2020, 2020, doi: 10.1155/2020/8859891.
[7] T. Taniguchi, S. Nagasaka, K. Hitomi, K. Takenaka, and T. Bando, “Unsupervised hierarchical modeling of driving behavior and
prediction of contextual changing points,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 1746–1760,
2015, doi: 10.1109/TITS.2014.2376525.
[8] H. Okuda, N. Ikami, T. Suzuki, Y. Tazaki, and K. Takeda, “Modeling and analysis of driving behavior based on a probability-weighted ARX model,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 98–112, 2013, doi:
10.1109/TITS.2012.2207893.
[9] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117-2125.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[11] K. Abe, H. Sato, S. Matsuno, S. Ohi, and M. Ohyama, “Automatic classification of eye blink types using a frame-splitting method,” in
Engineering Psychology and Cognitive Ergonomics. Understanding Human Cognition, D. Harris, Ed., Berlin, Heidelberg: Springer,
2013, pp. 117–124, doi: 10.1007/978-3-642-39360-0_13.
[12] P. P. Caffier, U. Erdmann, and P. Ullsperger, “Experimental evaluation of eye-blink parameters as a drowsiness measure,” European
Journal of Applied Physiology, vol. 89, pp. 319–325, 2003, doi: 10.1007/s00421-003-0807-5.
[13] J. M. Cori, C. Anderson, S. Shekari Soleimanloo, M. L. Jackson, and M. E. Howard, “Narrative review: Do spontaneous eye
blink parameters provide a useful assessment of state drowsiness?” Sleep Medicine Reviews, vol. 45, pp. 95–104, 2019, doi:
10.1016/j.smrv.2019.03.004.
[14] D. E. King, “Dlib-ml: A machine learning toolkit,” Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
[15] T. Soukupova and J. Cech, “Eye blink detection using facial landmarks,” in 21st Computer Vision Winter Workshop, Rimske Toplice,
Slovenia, 2016.
[16] A. Vuckovic, V. Radivojevic, A. C. Chen, and D. Popovic, “Automatic recognition of alertness and drowsiness from EEG by
an artificial neural network,” Medical Engineering and Physics, vol. 24, no. 5, pp. 349–360, 2002, doi: 10.1016/S1350-4533
(02)00030-9.
[17] Y. Xing, C. Lv, H. Wang, D. Cao, E. Velenis, and F.-Y. Wang, “Driver activity recognition for intelligent vehicles: A deep learning
approach,” IEEE Transactions on Vehicular Technology, vol. 68, no. 6, pp. 5379–5390, 2019, doi: 10.1109/TVT.2019.2908425.
[18] B. Reddy, Y.-H. Kim, S. Yun, C. Seo, and J. Jang, “Real-time driver drowsiness detection for embedded system using model com-
pression of deep neural networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),
2017, pp. 121–128.
[19] Q. Massoz, T. Langohr, C. François, and J. G. Verly, “The ULg multimodality drowsiness database (called DROZY) and
examples of use,” in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016, pp. 1–7, doi:
10.1109/WACV.2016.7477715.
[20] C.-H. Weng, Y.-H. Lai, and S.-H. Lai, “Driver drowsiness detection via a hierarchical temporal deep belief network,” in Computer
Vision – ACCV 2016 Workshops, C.-S. Chen, J. Lu, and K.-K. Ma, Eds., Cham: Springer International Publishing, 2017, pp.
117–133, doi: 10.1007/978-3-319-54526-4_9.
[21] M. T. A. Dipu, S. S. Hossain, Y. Arafat, and F. B. Rafiq, “Real-time driver drowsiness detection using deep learning,” International
Journal of Advanced Computer Science and Applications, vol. 12, no. 7, 2021, doi: 10.14569/IJACSA.2021.0120794.
[22] M. M. Rahman et al., “EyeNet: An improved eye states classification system using convolutional neural network,”
in 2020 22nd International Conference on Advanced Communication Technology (ICACT), 2020, pp. 84–90, doi:
10.23919/ICACT48636.2020.9061472.
[23] H. Kulhandjian, “Detecting driver drowsiness with multi-sensor data fusion combined with machine learning,” Mineta Transporta-
tion Institute Publications, 2021, doi: 10.31979/mti.2021.2015.
[24] S. Abtahi, M. Omidyeganeh, S. Shirmohammadi, and B. Hariri, “YawDD: a yawning detection dataset,” in Proceedings of the 5th
ACM Multimedia Systems Conference, Mar. 2014, pp. 24–28, doi: 10.1145/2557642.2563678.
[25] F. Song, X. Tan, X. Liu, and S. Chen, “Eyes closeness detection from still images with multi-scale histograms of principal oriented
gradients,” Pattern Recognition, vol. 47, no. 9, pp. 2825–2838, 2014, doi: 10.1016/j.patcog.2014.03.024.
[26] E. Magán, M. P. Sesmero, J. M. Alonso-Weber, and A. Sanchis, “Driver drowsiness detection by applying deep learning techniques
to sequences of images,” Applied Sciences, vol. 12, no. 3, p. 1145, 2022, doi: 10.3390/app12031145.
[27] R. Ghoddoosian, M. Galib, and V. Athitsos, “A realistic dataset and baseline temporal model for early drowsiness detection,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
[28] V. Kazemi and J. Sullivan, “One millisecond face alignment with an ensemble of regression trees,” in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, 2014, pp. 1867–1874.
[29] R. Fusek, “Pupil localization using geodesic distance,” in Advances in Visual Computing, G. Bebis, R. Boyle, B. Parvin, D. Koracin,
M. Turek, S. Ramalingam, K. Xu, S. Lin, B. Alsallakh, J. Yang, E. Cuervo, and J. Ventura, Eds., Cham: Springer, 2018, pp. 433–444,
doi: 10.1007/978-3-030-03801-4_38.
BIOGRAPHIES OF AUTHORS
Youssef Taki received the bachelor’s degree in applied mathematics in 2017 and the
master’s degree in modeling and scientific computing for mathematical engineering in 2019. He is
currently pursuing a Ph.D. degree at Moulay Ismail University, Morocco. His research focuses on
advanced driver assistance systems (ADAS), intelligent transportation systems, road safety, and
computer vision. He can be contacted at email: you.taki@edu.umi.ac.ma.