Mapua Institute of Technology at Laguna Academic Year 2019-2020

MAPUA INSTITUTE OF TECHNOLOGY AT LAGUNA
Academic Year 2019-2020
A CONVOLUTIONAL NEURAL NETWORK (CNN) BASED FACE MASK
DETECTION
Aeron Normel Almenanza MENDOZA
Daryl Cuevas PAYUAN
Aaron John Rendon SOTO
Engr. Jonathan R. Oracion
Submitted to the Faculty of Malayan Colleges Laguna
In Partial Fulfillment of the Requirements for the degree of
Bachelor of Science in Electronics Engineering

Methodology
This chapter explains the procedures that were used in this paper. The system consists of
several hardware components: the webcam, the Raspberry Pi, and an LED monitor display. The
webcam was responsible for gathering live video feed from its field of view. The researchers
used the Raspberry Pi to run the CNN program, which was written in Python. The CNN determined
whether each person in the video feed was wearing a mask. Python is a high-level
programming language that can be used to develop desktop graphical user interface (GUI)
applications, websites, and web applications. The researchers opted to use Python to code the
Convolutional Neural Network face mask detection system, which detects persons without a face
mask in real time, sounds an alarm, and captures a screenshot of the face of the person not
wearing a face mask. Results and screenshots were then sent to the LED monitor display as
output.
Fig 3.2 System Design

System Construction
Two webcams were installed within the premises of an establishment so that the researchers
could verify the wearing of face masks within the premises and ensure its strict enforcement.
All cameras were connected to the Raspberry Pi. The researchers opted to mount the webcam on
top of a 5 ft light stand so that it could detect people over a distance. The LED TV was
mounted on the light stand itself so that it is easily viewable by the user. The Raspberry Pi
and the speaker for the alarm were installed at the back of the monitor, and the overall
system was connected through cables.
Fig 3.3 Prototype construction set-up.

Flowchart
The flowchart of the overall process in Fig. 3.4 shows that the system started by gathering
live video feed from the webcam while simultaneously performing facial recognition. When the
program did not detect a face in the video feed, it continued to perform the facial
recognition until a face appeared in the video. In general, the facial recognition ran in a
loop in order to detect single or multiple faces.
When the system detected a face in the facial recognition stage, it extracted an image of the
face and applied that image to an image classifier. The image classifier then gave the program
the region of interest along with its coordinates, width, and height. For preprocessing, the
image was resized to a standard 100x100 pixels. The image then passed through the pre-trained
Convolutional Neural Network, which produced the output.
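The cropping and resizing step described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the frame is modeled as a list of pixel rows rather than a camera image, and nearest-neighbor sampling is assumed because the text does not say which resizing method was used.

```python
# Minimal sketch: crop a face region of interest (ROI) and resize it to 100x100.
# The frame is a list of rows; each pixel may be any value, e.g. an (R, G, B) tuple.

TARGET_SIZE = 100  # side length of the classifier input, as stated in the text


def crop_roi(image, x, y, w, h):
    """Return the w-by-h region with top-left corner (x, y), clamped to the image."""
    height, width = len(image), len(image[0])
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in image[y0:y1]]


def resize_nearest(image, size=TARGET_SIZE):
    """Resize to size-by-size with nearest-neighbor sampling (assumed method)."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


def preprocess_face(frame, roi):
    """Crop the detected face and bring it to the classifier input size."""
    x, y, w, h = roi
    face = crop_roi(frame, x, y, w, h)
    if not face or not face[0]:
        raise ValueError("ROI lies completely outside the frame")
    return resize_nearest(face)


# Usage with a synthetic 640x480 frame whose pixels encode their own position.
frame = [[(r, c, 0) for c in range(640)] for r in range(480)]
face = preprocess_face(frame, (200, 120, 250, 250))
print(len(face), len(face[0]))  # 100 100
```

A real deployment would more likely rely on an image library for both steps; the sketch only shows how the coordinates, width, and height of the region of interest are used.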
For the Convolutional Neural Network, the researchers used the MobileNetV2 architecture,
also shown in Fig. 3.4, which has two kinds of blocks. One is the residual block with a stride
of 1. The other is the block with a stride of 2 for downsizing. Both types of blocks have 3
layers. The first layer is a 1x1 convolution with ReLU6. The second layer is the depth-wise
convolution, and the last is another 1x1 convolution but without any non-linearity.
After being resized to 100x100 pixels, the input entered the first layer, the ReLU6 1x1
convolution, and then proceeded to the second layer, the ReLU6 3x3 depth-wise convolution.
Lastly, it passed through a linear 1x1 convolution, whose output was then flattened to 50
layers.
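To make the three-layer block concrete, the following sketch traces tensor shapes through one block and defines ReLU6. It is an illustration rather than the researchers' code. The expansion factor of 6 follows the usual MobileNetV2 design and is not stated in the text, "same" padding is assumed for the depth-wise convolution, and the channel counts in the usage lines are hypothetical.

```python
def relu6(x):
    """ReLU6 activation: clip the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def bottleneck_shapes(height, width, in_channels, out_channels, stride, expansion=6):
    """Trace (height, width, channels) through the three layers of one block."""
    hidden = in_channels * expansion  # channels after the expanding 1x1 convolution
    # With "same" padding, a stride-s convolution yields ceil(size / s) outputs.
    out_h = -(-height // stride)
    out_w = -(-width // stride)
    return [
        ("1x1 convolution + ReLU6", (height, width, hidden)),
        ("3x3 depth-wise convolution + ReLU6", (out_h, out_w, hidden)),
        ("1x1 convolution, linear", (out_h, out_w, out_channels)),
    ]


def uses_residual(in_channels, out_channels, stride):
    """The skip connection applies only when input and output shapes match."""
    return stride == 1 and in_channels == out_channels


# Usage: a stride-2 (downsizing) block on a 100x100 input with assumed channels.
for name, shape in bottleneck_shapes(100, 100, 16, 24, stride=2):
    print(name, shape)
print(uses_residual(24, 24, stride=1))  # True
```

The stride-2 block halves the spatial size, which is why it has no skip connection: its input and output no longer have the same shape.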
When the system detected that the person was wearing a face mask, the result was processed as
data. However, when the program detected that the person was not wearing a face mask, the
system captured a screenshot of the face and sounded a beeping alarm. All the outputs were
displayed on the monitor.
Fig 3.4 Flow Chart of the System
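The final decision step of the flowchart can be summarized with a small function. This is a sketch under assumptions: the label strings "mask" and "no_mask" and the function name are hypothetical, and the actual alarm and screenshot calls are left out because the text does not describe how they were implemented.

```python
def handle_frame(labels):
    """Summarize one frame given one label per detected face.

    labels: list of "mask" / "no_mask" strings (hypothetical label names).
    """
    unmasked = [i for i, label in enumerate(labels) if label == "no_mask"]
    return {
        "faces": len(labels),
        "masked": len(labels) - len(unmasked),
        "unmasked": len(unmasked),
        "alarm": bool(unmasked),       # beep when at least one face has no mask
        "screenshot_faces": unmasked,  # indices of the faces to capture
    }


# Usage: three faces in the frame, the second one without a mask.
print(handle_frame(["mask", "no_mask", "mask"]))
```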
Training of the dataset
The database used for training had 3200 pictures, with 1600 images of persons with a face
mask and the remaining 1600 images of persons without a face mask. The images were RGB and of
different resolutions. Almost 99% of the images were randomly collected from freely licensed
sources on the internet. 30% of the images have variation in facial expression and face size,
while 3% of the images have variation in capture angle.
Fig 3.5 Image sample for the training data
Materials
Web Camera. A webcam is a video camera that feeds or streams an image or video in real
time to or through a computer. A webcam was well suited to the researchers’ setup for
gathering video since it transmits video in high definition.

Fig 3.6 1080P HD Web Camera USB type
Raspberry Pi. The Raspberry Pi is a single-board computer that lets the user run programs
on its hardware, since it runs on Linux and operates in an open-source ecosystem. The
Raspberry Pi was ideal since it is small and relatively cheap compared to a full-size
computer.
Fig 3.7 Raspberry pi model 4
LED Monitor. The LED monitor was used to visually display the outputs of the system
provided by the Raspberry Pi.
Fig 3.7 AOC LED Monitor

Light Stand. The light stand serves as the backbone of the overall system, holding the
webcam, Raspberry Pi, and monitor together as one device.
Fig 3.8 Heavy Duty Light Stand
Keyboard and Mouse. The keyboard and mouse allowed the user to control the system manually
through the Raspberry Pi. Since the connections were USB-A, the inputs were convenient to use
with the Raspberry Pi.
Fig 3.9 A4TECH USB Keyboard and Mouse
Speakers. Speakers were used to amplify the beep sound coming from the output of the
Raspberry Pi when no face mask was detected.
Fig 4.0. 3.5mm Speaker

Results and Discussion
The system was deployed and tested at the residences of the researchers because it could not
be deployed at its originally planned location, Malayan Colleges Laguna, due to the pandemic
and the suspension of classes. The researchers gathered a group of 5 people to be detected
with and without masks.
The data that the researchers gathered differ from what a deployment in Malayan Colleges
Laguna would produce, since the crowd density of students there would be higher than that of
the 5 people gathered by the researchers. The researchers conducted 20 trials to gather the
data, 10 outside the premises and the remaining 10 inside the premises, simulating a
deployment in an establishment (Malayan Colleges Laguna). A sample video feed of the system
is shown in Fig. 5.2.
The system detected the number of faces in the live video feed, and the results were compared
to the manual count of faces made by the researchers themselves. The system also detected the
number of people wearing and not wearing a face mask. The researchers also recorded whether
false positive instances occurred. All results are tabulated in Table 1 and Table 2 below.

Trial #     Face detected     Face detected     False positive occurred
            by researchers    by system         in the system
Trial 1     2                 2                 None
Trial 2     1                 1                 None
Trial 3     1                 1                 None
Trial 4     3                 3                 None
Trial 5     2                 1                 Yes
Trial 6     3                 3                 Yes
Trial 7     1                 1                 None
Trial 8     1                 1                 None
Trial 9     1                 1                 None
Trial 10    1                 1                 None
Trial 11    4                 4                 None
Trial 12    5                 4                 Yes
Trial 13    5                 5                 None
Trial 14    4                 4                 None
Trial 15    5                 4                 Yes
Trial 16    5                 5                 None
Trial 17    5                 3                 Yes
Trial 18    5                 5                 None
Trial 19    5                 5                 None
Trial 20    5                 5                 None
Table 1 Data gathered for face detection of the system vs. manual detection by researchers
Trial #     Manual Count of    Manual Count of    Masked detected    No Mask detected
            Masked people      Unmasked people    by system          by system
Trial 1     2                  0                  2                  0
Trial 2     0                  1                  0                  1
Trial 3     1                  0                  1                  0
Trial 4     1                  2                  1                  2
Trial 5     2                  0                  1                  1
Trial 6     2                  1                  1                  2
Trial 7     0                  1                  0                  1
Trial 8     0                  1                  0                  1
Trial 9     0                  1                  0                  1
Trial 10    0                  1                  0                  1
Trial 11    2                  2                  2                  2
Trial 12    3                  2                  2                  2
Trial 13    3                  2                  3                  2
Trial 14    4                  0                  4                  0
Trial 15    3                  2                  3                  1
Trial 16    2                  3                  1                  4
Trial 17    2                  3                  2                  1
Trial 18    2                  3                  2                  3
Trial 19    5                  0                  5                  0
Trial 20    0                  5                  0                  5
Table 2. Data gathered for mask detection of the system vs. manual detection by researchers
Fig. 5.2 Sample Video feed of the face mask detection system
Table 1 shows the face detection counts of the CNN system relative to the manual counts made by
the researchers. The counts recorded by the system are close to the manual counts, with a difference
of 1-2 persons not counted by the system in the trials where the two differ. This discrepancy may be
caused by the training data not covering faces similar to those that were missed. In most of the
trials, however, the system counted the same number of faces as the researchers. This suggests that,
with enough training data, a CNN can be used not only for people counting but also for
detecting face masks.
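The comparison in Table 1 can be checked with a few lines of Python. The two lists below are copied from Table 1; the function name and the summary keys are illustrative choices, not part of the original study.

```python
# Face counts per trial, copied from Table 1 (trials 1 to 20).
manual = [2, 1, 1, 3, 2, 3, 1, 1, 1, 1, 4, 5, 5, 4, 5, 5, 5, 5, 5, 5]
system = [2, 1, 1, 3, 1, 3, 1, 1, 1, 1, 4, 4, 5, 4, 4, 5, 3, 5, 5, 5]


def summarize(manual_counts, system_counts):
    """Compare per-trial manual counts with the counts reported by the system."""
    missed = [m - s for m, s in zip(manual_counts, system_counts)]
    return {
        "manual_total": sum(manual_counts),
        "system_total": sum(system_counts),
        "mismatched_trials": [i + 1 for i, d in enumerate(missed) if d != 0],
        "max_missed": max(missed),
        # Ratio of the system total to the manual total over all trials.
        "count_ratio": sum(system_counts) / sum(manual_counts),
    }


print(summarize(manual, system))
```

For the data in Table 1 this reports 59 faces counted by the system against 64 counted manually, with differences in trials 5, 12, 15 and 17 and at most 2 faces missed in a single trial.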
Table 2 shows the face mask detection counts of the CNN system relative to the manual counts made
by the researchers. As in Table 1, the counts gathered by the system were close to those of the
researchers, with a discrepancy of 1-2 persons in the trials where the two differ. The discrepancy
in the face mask detection might be caused by the training data not covering the several varieties
of face masks available in the market today. Still, considering the amount of data on which the
system was trained, we can observe that a CNN can be used to detect people wearing and not wearing
face masks.
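A per-trial comparison of Table 2 can be sketched as follows. The four lists are copied from Table 2; the choice of summary measures (exact matches, maximum and mean absolute difference) is an assumption made for illustration and is not an analysis reported by the researchers.

```python
# Counts per trial, copied from Table 2 (trials 1 to 20).
manual_masked = [2, 0, 1, 1, 2, 2, 0, 0, 0, 0, 2, 3, 3, 4, 3, 2, 2, 2, 5, 0]
manual_unmasked = [0, 1, 0, 2, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 2, 3, 3, 3, 0, 5]
system_masked = [2, 0, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 4, 3, 1, 2, 2, 5, 0]
system_unmasked = [0, 1, 0, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 0, 1, 4, 1, 3, 0, 5]


def compare(manual_counts, system_counts):
    """Per-trial agreement between the manual and the system counts."""
    abs_diff = [abs(m - s) for m, s in zip(manual_counts, system_counts)]
    return {
        "exact_matches": abs_diff.count(0),
        "max_abs_diff": max(abs_diff),
        "mean_abs_diff": sum(abs_diff) / len(abs_diff),
    }


print("masked  ", compare(manual_masked, system_masked))
print("unmasked", compare(manual_unmasked, system_unmasked))
```

On these data the masked counts agree exactly in 16 of 20 trials and the unmasked counts in 15 of 20 trials, with the largest single difference being 2 persons.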
                      Manual Count of    Manual Count of    Masked detected    No Mask detected
                      Masked people      Unmasked people    by system          by system
Trial #               20                 20                 20                 20
Mean                  1.7                1.5                1.5                1.5
Standard Deviation    1.4546             1.3179             1.4327             1.3179
Variance              2.1158             1.7368             2.0526             1.7368
Table 3. System reliability test
To assess the reliability of the system, the researchers calculated the mean, standard deviation,
and variance across the 20 trials for the masked and unmasked counts, both manual and system, to
verify how reliable the CNN facemask detection can be. Based on the analysis, the standard
deviations of the manual and system counts were close to one another, with a slight difference of
0.0219 for the masked counts and no difference for the unmasked counts.
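The values in Table 3 can be reproduced from the per-trial counts in Table 2 with Python's standard statistics module. The sketch assumes the sample (n - 1) formulas, which is what the published figures correspond to; the dictionary keys are illustrative names.

```python
import statistics

# Counts per trial, copied from Table 2 (trials 1 to 20).
counts = {
    "manual_masked": [2, 0, 1, 1, 2, 2, 0, 0, 0, 0, 2, 3, 3, 4, 3, 2, 2, 2, 5, 0],
    "manual_unmasked": [0, 1, 0, 2, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 2, 3, 3, 3, 0, 5],
    "system_masked": [2, 0, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 4, 3, 1, 2, 2, 5, 0],
    "system_unmasked": [0, 1, 0, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 0, 1, 4, 1, 3, 0, 5],
}


def describe(values):
    """Number of trials, mean, sample standard deviation and sample variance."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
    }


summary = {name: describe(values) for name, values in counts.items()}
for name, stats in summary.items():
    print(f"{name:16s} n={stats['n']} mean={stats['mean']:.1f} "
          f"sd={stats['stdev']:.4f} var={stats['variance']:.4f}")

# Difference between the manual and system standard deviations (masked counts).
sd_gap_masked = summary["manual_masked"]["stdev"] - summary["system_masked"]["stdev"]
print(f"{sd_gap_masked:.4f}")  # 0.0219
```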
With almost the same standard deviation, we can claim that the Convolutional Neural
Network (CNN) system of facemask detection provided a significant amount of reliability
in detecting persons with and without facemasks. The line graphs in Figures 5.3 and 5.4 also
show the slight differences between the detection of the system and the manual count of the
researchers.
[Line graph: Manual vs CNN (Masked). Vertical axis: Face Mask detected (0-6). Horizontal axis:
Trial # (1-20). Series: Manual Count of Masked people, System Count of Masked people.]
Fig 5.3 Linear graph of manual facemask detection vs. CNN facemask detection (Masked)
[Line graph: Manual vs CNN (Unmasked). Vertical axis: No. of unmasked detected (0-6). Horizontal
axis: Trial # (1-20). Series: Manual Count of Unmasked people, System Count of Unmasked people.]
Fig 5.4 Linear graph of manual facemask detection vs. CNN facemask detection (Unmasked)
The graphs in Fig. 5.3 and 5.4 show that the trend lines of the manual and system counts
were almost the same for both the masked and unmasked trials. These trend lines show how the
system and the method used (Convolutional Neural Network) can be reliable in counting persons
with and without face masks. The graphs also show that in a given trial the system can have a
slight discrepancy of 1-2 persons with a small standard deviation. A small standard deviation
means that the data gathered were positioned close to the mean. Based on the standard
deviations in Table 3, the manual and system counts tend to vary in a similar way and are
hence reliable.
Conclusion
Based on the gathered results, we can conclude that the system developed by the researchers
provided accurate and reliable face mask detection using a Convolutional Neural Network
(CNN). The reliability of the system was also assessed and was found to be relatively close
to the manual detection of the researchers.
As trials were conducted inside and outside the premises of the researchers, the results also
indicate that the reliability of the face mask detection system was not compromised whether
the system was inside or outside the premises; hence, its effectiveness is expected to be the
same in public areas where the mandatory use of facemasks is now being implemented. Face mask
detection using a Convolutional Neural Network (CNN) can be helpful in implementing the
strict use of face masks in public areas.
The face mask detection using Convolutional Neural Network (CNN) can provide an
accurate estimation based on the architecture of MobileNet v2, which is a Convolutional Neural
Network well suited for mobile use as well as for small, compact, low-power devices such as
the Raspberry Pi. The network has 53 layers, and its pre-trained weights let it recognize the
image classes on which it was trained.
Based on testing, optimal results could be achieved when the camera is 5 ft above the ground
with no tilt (0 degrees) with respect to the vertical, whether inside or outside the premises
of an establishment. This deployment ensures that the face and mask are captured as a flat
image with fewer angles, which helps prevent false positive results.
The face mask detection system using Convolutional Neural Network (CNN) can provide
convenience to our frontliners by helping detect persons with and without facemasks,
especially in crowded areas where a facemask detection system could be a handy device to have.
The researchers’ main objective was to create a facemask detection system that was able
to detect and monitor persons not wearing a facemask using Convolutional Neural Network
(CNN), and the researchers do believe this main objective was achieved in this research.
Recommendations
Although the objectives were met in this research, certain improvements could still help
improve the reliability of the said system. At the time this research was conducted, one of
the most accurate algorithms for classifying and detecting images was the Convolutional
Neural Network (CNN). Therefore, further improvements in the algorithm could lead to higher
reliability in detecting face masks. Also, in the study conducted, the researchers used
training data of 3200 images, 1600 of which were of persons with a face mask and 1600 of
which were of persons without a face mask. As a recommendation, the researchers strongly
believe that having more training data could yield a higher accuracy in detecting face masks.
More training data would also mean that the device used for programming should have more RAM.
The researchers strongly suggest that, for a larger training set, the RAM should be at least
8GB. In this manner, the errors would be further reduced, thereby increasing the reliability
of the system.
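As a rough illustration of why memory grows with the size of the training set, the sketch below estimates the space needed to hold the resized images in memory. The 3200 images, the 100x100 size and the three RGB channels come from the text; storing each value as a 1-byte integer or a 4-byte float is an assumption, and the estimate ignores the model, the training framework and the operating system.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def dataset_bytes(num_images, height, width, channels, bytes_per_value):
    """Bytes needed to hold every image as a dense array in memory."""
    return num_images * height * width * channels * bytes_per_value


as_uint8 = dataset_bytes(3200, 100, 100, 3, 1)    # 8-bit pixel values
as_float32 = dataset_bytes(3200, 100, 100, 3, 4)  # 32-bit floats (assumed)

print(f"{as_uint8 / MIB:.1f} MiB")    # 91.6 MiB
print(f"{as_float32 / MIB:.1f} MiB")  # 366.2 MiB
```

Doubling the number of images doubles these figures, so a much larger training set held in memory this way would take up a noticeable share of the RAM on a small device.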
