Mask-Vision: A Machine Vision-Based Inference System of Face Mask Detection for Monitoring Health Protocol Safety
Rovenson V. Sevilla
Department of Electrical Engineering
Technological University of the Philippines
Manila, Philippines
rovenson_sevilla@tup.edu.ph

Alvin Sarraga Alon
Digital Transformation Center, STEER Hub
Batangas State University
Batangas City, Philippines
alvin.alon@g.batstate-u.edu.ph

Mark P. Melegrito
Department of Electronics Engineering
Technological University of the Philippines
Manila, Philippines
mark_melegrito@tup.edu.ph

Ryan Carreon Reyes
Department of Electrical Engineering
Technological University of the Philippines
Manila, Philippines
ryan_reyes@tup.edu.ph

Bobby M. Bastes
College of Engineering and Architecture
Bohol Island State University
Tagbilaran, Bohol, Philippines
bobskhee@gmail.com

Roselle P. Cimagala
College of Engineering and Architecture
Bohol Island State University
Tagbilaran, Bohol, Philippines
roselle.cimagala@bisu.edu.ph

2021 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET) | 978-1-6654-2899-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/IICAIET51634.2021.9573664
Abstract— To avoid adversely affecting community health and the global economy, effective ways to limit the COVID-19 pandemic require constant attention. In the absence of efficient antivirals and with insufficient medical resources, WHO recommends several methods to minimize infection rates and prevent depletion of scarce healthcare resources. One of the non-pharmaceutical measures that can be used to reduce the primary source of SARS-CoV-2 droplets expelled by an infected individual is wearing a mask. Irrespective of disagreements about medical resources and mask types, many governments enforce the wearing of masks that cover the nose and mouth by the general population. In the coming years, the proposed mask detection model may be a valuable tool for ensuring that safety measures are followed correctly. The YOLOv3 model, a state-of-the-art deep transfer learning approach to object detection, is used to create a mask detection model in this paper. The proposed model's performance makes it suitable for video surveillance equipment. The proposed approach focuses on creating an enhanced dataset from a 300-image dataset using data augmentation techniques such as image filtering. The mean average precision of the data augmentation-based mask detection model was 89.8% during training, and its overall testing accuracy was 100%, with per-frame detection accuracy ranging from 40.03% to 65.03%.
Keywords— covid-19, face mask detection, deep learning, object detection, yolov3.
I. INTRODUCTION
Coronavirus disease 2019, also known as COVID-19, is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), according to the World Health Organization (WHO) [1]. It is an extremely contagious illness spread mostly by small droplets created by sneezing, coughing, and speaking. Although the droplets are not usually airborne, those who are in close proximity are more vulnerable to infection [2]. COVID-19 may be transmitted from person to person at close range. To prevent the transmission of this disease, keep a safe distance from others, wear a mask, and use disinfectants [3]. The question of whether or not masks should be worn has sparked a lot of debate. In the context of appropriate infection management, no single preventive measure holds the golden key to disease prevention [4]. Each measure makes a major contribution to the overall effort and helps the others in restricting and inhibiting the spread of COVID-19 [5].
Face masks, on the other hand, are the single most effective preventative strategy in our pandemic-stricken society [6]. Even though public spaces have recently been reopened for economic reasons, the number of cases in each nation remains significant. “Face masks can help protect against a variety of respiratory diseases conveyed by droplets, such as coronavirus and influenza,” according to a prominent scholar at the Johns Hopkins Center for Health Security and an infectious disease expert [7]. Viruses like the coronavirus may be spread via the air by coughing and sneezing, or by touching a contaminated surface and then touching your mouth, nose, or eyes before washing your hands, according to the expert [8]. If one is wearing a face mask, such droplets will not come into contact with one's face or mouth before hitting the ground [9].
Although masks are mandated in public locations such as malls and offices as a precaution, many people do not comply [10]. Most apartments and businesses rely on guards to keep an eye on such individuals, putting the guards at risk. The danger can be decreased by automating the mask detection procedure, for example by using object detection algorithms to grant access to a facility only to those wearing masks [11]. Over the last ten years, object detection methods have been used successfully to identify a variety of items, such as military guns, and in medical applications such as malignant cell identification, among others [12]-[15].
The goal of this work is to create a model for mask identification that employs a deep learning-based object detection method. The proposed model is trained using an appropriate data augmentation-based pre-processing approach
on a dataset of masked persons. This model aims to make the
detection process more efficient. This detection would aid
humans in adjusting to their new normal and ensuring their
protection from future viruses. It's also a step toward AI-
assisted automation that takes into account the need to adapt
to changing societal standards and practices.
The essential architecture of the selected object detection method, as well as the proposed mask detection technique and experimental strategy, is covered in Section II. The proposed model's results are analyzed and its performance is evaluated in Section III. The conclusion of this study is drawn in Section IV.
II. METHODOLOGY
Object detection is a computer vision technique used to detect the presence of particular objects, such as a person, automobile, billboard, or building, in a digital image or video [16]. Research on object detection is growing rapidly. Early work relied on traditional computer vision methods such as template matching. More recent deep learning-based algorithms include Region-Based Convolutional Neural Networks (R-CNN), Fast R-CNN, Faster R-CNN, and You Only Look Once (YOLO) [17]. YOLO was selected for this study because it inspects the entire image in a single pass during model evaluation, so its predictions draw on the full image as well as the target. YOLO is also considerably faster than Faster R-CNN because it performs bounding box regression and classification at the same time [18]. Fig. 1 depicts the system workflow, which includes dataset preparation, training, validation, and testing.
A. Dataset Gathering and Preparation
The dataset used in this study, shown in Fig. 2, was created by Prajna Bhandary [19]. It contains 1,376 images in two categories: 690 masked images and 686 non-masked images. The proponents selected 300 of the 690 masked images for training and validation. These images were split into a training dataset containing 80% of the images and a validation dataset containing the remaining 20%.
Fig. 2. Sample dataset of face mask images by Prajna Bhandary [19].
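The 80/20 split of the 300 selected images can be sketched as follows. This is a minimal illustration, not the authors' procedure: the paper does not say how images were assigned to each set, so the seeded shuffle, the function name `split_dataset`, and the file names are all assumptions.

```python
import random


def split_dataset(image_paths, train_fraction=0.8, seed=42):
    """Shuffle image paths reproducibly and split them into train/validation lists."""
    paths = sorted(image_paths)        # sort first so the shuffle is deterministic
    rng = random.Random(seed)          # local generator; global random state is untouched
    rng.shuffle(paths)
    n_train = round(len(paths) * train_fraction)
    return paths[:n_train], paths[n_train:]


# Hypothetical file names standing in for the 300 selected masked images.
images = [f"with_mask_{i:03d}.jpg" for i in range(300)]
train_set, val_set = split_dataset(images)
print(len(train_set), len(val_set))
```

With 300 images and a training fraction of 0.8, this yields 240 training images and 60 validation images, and no image appears in both sets.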
B. Dataset Annotation
The researchers used annotation software to label the images, as shown in Fig. 3. A rectangular bounding box was drawn around the lower face region of each person.
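A bounding box drawn in pixel coordinates is often stored in a normalized form before training. The sketch below shows one common convention (center, width, and height as fractions of the image size). The paper does not name the annotation tool or the file format, so this format, the function name `to_yolo_box`, and the example coordinates are assumptions for illustration only.

```python
def to_yolo_box(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert pixel corner coordinates to normalized (x_center, y_center, width, height)."""
    if not (0 <= x_min < x_max <= img_w and 0 <= y_min < y_max <= img_h):
        raise ValueError("box must lie inside the image and have positive area")
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return x_center, y_center, width, height


# Hypothetical box around the lower face region in a 400x400 image.
box = to_yolo_box(100, 200, 300, 360, img_w=400, img_h=400)
print(box)
```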
Fig. 3. Sample dataset annotation of face mask images.

Fig. 1. Overview of the system workflow.
Fig. 4. The network architecture of YOLOv3 (Redmon et al., 2018) [18].
C. Training (AI Modeling)
One of the fastest object detection algorithms is You Only Look Once version 3 (YOLOv3). To predict bounding boxes, YOLOv3 applies a single convolutional neural network (CNN) to the entire image, which it divides into several regions [18]. The network is efficient because it produces many predictions at the same time.
YOLOv3 uses a technique called regional proposal, which takes an image as input and creates a representation with bounding boxes drawn around the patches of the image that are most likely to contain the object [20]. This is quicker than the standard sliding window technique, which searches exhaustively for objects by scanning the whole frame pixel by pixel. During object detection, some regions may be noisy, overlapping, or ambiguous, causing the object to be misidentified. The regional solution proposed by YOLOv3 addresses this issue by returning the image region with the greatest probability score that is nearest to the identified object [21].
YOLOv3 (illustrated in Fig. 4) was chosen by the proponents because it is widely used in industry and many lightweight networks are built on it. The main reason for choosing YOLOv3 is its simple structure and ease of use.
As illustrated in Fig. 1, the authors propose applying image augmentation for mask detection before training with YOLOv3 [18]. The augmentation process includes the preparation of an enhanced dataset. This enhanced dataset contains the standard dataset as well as a second set generated by filtering procedures. Four filtering techniques are applied to the dataset to help the model perform better: scaling, cropping, HSV distortion, and image flipping.
In this study, each image passes through the YOLOv3 network only once to predict whether the person is wearing a mask. YOLOv3 splits the image into subcomponents and performs convolutions on each of them before pooling the results together to make a prediction.
D. Evaluation
For the evaluation procedure, the study used mean average precision (mAP) to compare the trained models and select the right one for inference. The procedure generates files that are used for testing. The greater the mAP, the better the model's detection accuracy. mAP is the mean of the average precision (AP) values.
Average Precision (AP): To reduce the influence of wiggles in the precision-recall curve, the precision is interpolated at successive recall levels before the AP is calculated (1). The interpolated precision (2) at a given recall level $r$ is defined as the maximum precision found at any recall level $r' \ge r$. AP is then defined as the area under the interpolated curve, which may be computed using the formula below.
$$AP = \sum_{n} (r_{n+1} - r_n)\, p_{\text{interp}}(r_{n+1}) \tag{1}$$
$$p_{\text{interp}}(r_{n+1}) = \max_{r' \ge r_{n+1}} p(r') \tag{2}$$
Mean Average Precision (mAP): The mean of the AP values over all queries is the mean average precision (mAP) (3), where $Q$ is the number of queries in the set and $AP(q)$ is the average precision (AP) for a given query $q$.
$$mAP = \frac{\sum_{q=1}^{Q} AP(q)}{Q} \tag{3}$$
E. Testing
The study used a new set of images and video data recorded in a workplace environment for the testing process. These images were used to prevent bias in the testing accuracy (4) results, as they were not part of the 300 images used for training and validation.
$$\text{Testing accuracy} = \frac{\text{Number of correctly detected face masks}}{\text{Total number of face masks}} \times 100 \tag{4}$$
III. RESULTS AND ANALYSIS
This section discusses the results of training, validation, and testing.
A. Training and Validation Results
Fig. 5 shows the training loss and the validation loss at each epoch. The x-axis is the epoch; the blue curve represents the training loss and the orange curve represents the validation loss.
Fig. 5. Model training chart.
Fig. 6. Model evaluation chart.
TABLE I. RESULTS OF EVALUATION (MODEL 43)

Parameter                  Value
Model                      detection_model-ex-043--loss-005.310.h5
IoU                        0.5
Threshold                  0.3
Non-Maximum Suppression    0.5
face-mask (AP, %)          89.80
mAP (%)                    89.80
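The IoU, threshold, and non-maximum suppression settings in Table I can be illustrated with a short sketch. It assumes that "Threshold" is a minimum confidence score and that suppression is the usual greedy variant; the paper does not describe either, and the function names and example boxes below are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.3, nms_threshold=0.5):
    """Drop low-score boxes, then apply greedy non-maximum suppression."""
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(box, kept_box) <= nms_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections as (box, confidence score) pairs.
detections = [
    ((10, 10, 110, 110), 0.90),
    ((20, 20, 120, 120), 0.60),    # overlaps heavily with the first box
    ((200, 200, 260, 260), 0.45),
    ((300, 300, 340, 340), 0.20),  # below the score threshold
]
kept = filter_detections(detections)
print(kept)
```

Here the second box is suppressed because its IoU with the first is about 0.68, and the fourth is discarded by the score threshold, so two detections remain.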
Training began with a training loss of 40.4109 and a validation loss of 9.9806. It finished at epoch 50 with losses of 5.3102 and 30.3824 for training and validation, respectively. Epoch 42 had the lowest training and validation losses, 5.209 and 2.609, whereas epoch 1 had the highest training and validation losses, 40.4109 and 9.9806.
Fig. 5 depicts the training loss versus validation loss graph, while Fig. 6 depicts the evaluation of each model to determine which one performs best. In this graph, Model 2 has the lowest mAP, 36.54%. Model 27 and Model 43 had the best results, with an mAP of 89.80%. The study's testing and model inference were conducted using the model from epoch 43, as shown in Table I.
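The AP and mAP definitions in Section II-D can be made concrete with a small sketch. It follows the interpolated-precision definition (the maximum precision at any equal or higher recall level) and sums the area under the resulting curve. The precision-recall points are invented for illustration and are not the paper's data, and the function names are hypothetical.

```python
def average_precision(recalls, precisions):
    """Area under the interpolated precision-recall curve.

    recalls must be sorted in increasing order; precisions[i] is the
    precision measured at recalls[i].
    """
    r = [0.0] + list(recalls)   # start the curve at recall 0
    p = list(precisions)
    # Interpolate: each precision becomes the max precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum (r_{n+1} - r_n) * p_interp(r_{n+1}) over all recall steps.
    return sum((r[i + 1] - r[i]) * p[i] for i in range(len(p)))


def mean_average_precision(ap_values):
    """Mean of the per-class (or per-query) AP values."""
    return sum(ap_values) / len(ap_values)


# Made-up precision-recall points for a single class.
recalls = [0.2, 0.4, 0.6, 0.8, 1.0]
precisions = [1.0, 0.8, 0.9, 0.6, 0.5]
ap = average_precision(recalls, precisions)
print(round(ap, 4))

# With a single class, the mAP equals that class's AP.
print(mean_average_precision([ap]))
```

Because the model has only one class (face-mask), its mAP is the same as the AP for that class, which is consistent with the two identical values in Table I.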
B. Testing Results
The research team created a video clip to evaluate the inference of model 43. The deployment and testing site is a simulated office setting in which workers' compliance with health protocols is monitored. The per-frame detection accuracy for the video test is shown in Fig. 7. The short clip contains 300 frames, and the graph shows how the detection accuracy fluctuates from frame to frame. The detection accuracy of each frame ranges from 40% to 91%, and the model has an overall testing accuracy of 100% because it correctly identifies the face mask, as illustrated in Fig. 8.
Fig. 8 shows sample images used to test the model's functionality. The results for video frames (a), (b), (c), (d), and (e) show that all of the face masks present in each frame were detected.
Fig. 8. Sample image testing (a), (b), (c), (d), and (e).
Fig. 7. Testing for detection accuracy.
IV. CONCLUSION AND FUTURE WORKS
The mask detection model is useful for detecting face masks in a variety of situations. One such situation is the current COVID-19 epidemic, in which persons wearing face masks in public areas must be monitored. This method would prevent the illness from spreading to security officers or others assigned to mask monitoring. To create such a detection system, this study offers a technique for mask detection that uses an augmented dataset created from a small number of training images. In contrast to traditional training, which requires a very large dataset, it augments the data using a filtering approach, resulting in high detection accuracy.
With model 43, the research achieved an mAP of 89.80%. This demonstrates that, even with a limited dataset, YOLOv3 is an excellent method for face mask detection systems. When evaluated in an office environment for monitoring health protocol safety, the system achieved 100% overall testing accuracy. For future work, the proponents recommend deploying the model on a portable device.
ACKNOWLEDGMENT
The authors are grateful to the Computing Resources of Batangas State University's STEER Hub (Science, Technology, Engineering, and Environment Research Hub), Digital Transformation Center Lab.
REFERENCES
[1] D. Wu, T. Wu, Q. Liu and Z. Yang, "The SARS-CoV-2 outbreak: What we know", International Journal of Infectious Diseases, vol. 94, pp. 44-48, 2020. doi: 10.1016/j.ijid.2020.03.004
[2] S. Falahi and A. Kenarkoohi, "Transmission routes for SARS-CoV-2 infection: review of evidence", New Microbes and New Infections, vol. 38, p. 100778, 2020. doi: 10.1016/j.nmni.2020.100778
[3] A. Rahmani and S. Mirmahaleh, "Coronavirus disease (COVID-19) prevention and treatment methods and effective parameters: A systematic literature review", Sustainable Cities and Society, vol. 64, p. 102568, 2021. doi: 10.1016/j.scs.2020.102568
[4] M. Liao et al., "A technical review of face mask wearing in preventing respiratory COVID-19 transmission", Current Opinion in Colloid & Interface Science, vol. 52, p. 101417, 2021. doi: 10.1016/j.cocis.2021.101417
[5] D. Pradhan, P. Biswasroy, P. Kumar Naik, G. Ghosh and G. Rath, "A Review of Current Interventions for COVID-19 Prevention", Archives of Medical Research, vol. 51, no. 5, pp. 363-374, 2020. doi: 10.1016/j.arcmed.2020.04.020
[6] J. Ju, L. Boisvert and Y. Zuo, "Face masks against COVID-19: Standards, efficacy, testing and decontamination methods", Advances in Colloid and Interface Science, vol. 292, p. 102435, 2021. doi: 10.1016/j.cis.2021.102435
[7] B. Krishnamachari, A. Morris, D. Zastrow, A. Dsida, B. Harper and A. Santella, "The role of mask mandates, stay at home orders and school closure in curbing the COVID-19 pandemic prior to vaccination", American Journal of Infection Control, vol. 49, no. 8, pp. 1036-1042, 2021. doi: 10.1016/j.ajic.2021.02.002
[8] L. Morawska et al., "How can airborne transmission of COVID-19 indoors be minimised?", Environment International, vol. 142, p. 105832, 2020. doi: 10.1016/j.envint.2020.105832
[9] C. Kähler and R. Hain, "Fundamental protective mechanisms of face masks against droplet infections", Journal of Aerosol Science, vol. 148, p. 105617, 2020. doi: 10.1016/j.jaerosci.2020.105617
[10] A. Karaivanov, S. Lu, H. Shigeoka, C. Chen and S. Pamplona, "Face masks, public policies and slowing the spread of COVID-19: Evidence from Canada", Journal of Health Economics, vol. 78, p. 102475, 2021. doi: 10.1016/j.jhealeco.2021.102475
[11] A. Tulbure, A. Tulbure and E. Dulf, "A review on modern defect detection models using DCNNs – Deep convolutional neural networks", Journal of Advanced Research, 2021. doi: 10.1016/j.jare.2021.03.015
[12] H. Alon, M. Ligayo, M. Melegrito, C. Franco Cunanan and E. Uy II, "Deep-Hand: A Deep Inference Vision Approach of Recognizing a Hand Sign Language using American Alphabet", 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), 2021. doi: 10.1109/iccike51210.2021.9410803
[13] J. Dioses, Jr, "Bottle-SegreDuino: An Arduino Frequency-Based Bin for Tin Can and Plastic Bottle Segregation using an Inductive Proximity Effect", International Journal of Advanced Trends in Computer Science and Engineering, vol. 9, no. 4, pp. 5451-5454, 2020. doi: 10.30534/ijatcse/2020/184942020
[14] L. Lacatan, R. Santos, J. Pinkihan, R. Vicente and R. Tamargo, "Brake-Vision: A Machine Vision-Based Inference Approach of Vehicle Braking Detection for Collision Warning Oriented System", 2021 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), 2021. doi: 10.1109/iccike51210.2021.9410750
[15] H. Alon, "Eye-Zheimer: A Deep Transfer Learning Approach of Dementia Detection and Classification from NeuroImaging", 2020 IEEE 7th International Conference on Engineering Technologies and Applied Sciences (ICETAS), 2020. doi: 10.1109/ICETAS51660.2020.9484315
[16] V. Sharma and R. Mir, "A comprehensive and systematic look up into deep learning based object detection techniques: A review", Computer Science Review, vol. 38, p. 100301, 2020. doi: 10.1016/j.cosrev.2020.100301
[17] A. Pathak, M. Pandey and S. Rautaray, "Application of Deep Learning for Object Detection", Procedia Computer Science, vol. 132, pp. 1706-1717, 2018. doi: 10.1016/j.procs.2018.05.144
[18] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement", arXiv, 2018.
[19] P. Bhandary, "observations/experiements/data at master · prajnasb/observations", GitHub, 2021. Retrieved 17 August 2021, from https://github.com/prajnasb/observations/tree/master/experiements/data.
[20] J. Xiao, "exYOLO: A small object detector based on YOLOv3 Object Detector", Procedia Computer Science, vol. 188, pp. 18-25, 2021. doi: 10.1016/j.procs.2021.05.048
[21] A. Shakarami, M. Menhaj, A. Mahdavi-Hormat and H. Tarrah, "A fast and yet efficient YOLOv3 for blood cell detection", Biomedical Signal Processing and Control, vol. 66, p. 102495, 2021. doi: 10.1016/j.bspc.2021.102495
