Mapua Institute of Technology at Laguna Academic Year 2019-2020
DETECTION
This chapter explains the procedures that were used in this paper. The system consists of
several hardware components, including the webcam, the Raspberry Pi, and an LED monitor display.
The webcam was responsible for gathering the live video feed from its field of view. The
researchers used the Raspberry Pi to run the CNN program, which was written in Python. The
computation from the CNN determined whether a person in the video feed was wearing a mask.
Python is a high-level programming language that can be used to develop desktop graphical user
interface (GUI) applications, websites, and web applications. For the system, the researchers
opted to use Python to code the Convolutional Neural Network face mask detection system, which
detects persons without a face mask in real time, sounds an alarm, and captures a screenshot of
the face of the person not wearing a face mask. Results and screenshots are then fed to the LED
monitor display as output.
Two webcams were installed within the premises of an establishment so that the researchers
could fully verify the wearing of facemasks within the premises and ensure its strict
enforcement. All cameras were connected through the Raspberry Pi. The researchers opted to
install the webcam on top of a 5 ft light stand so that it could detect a certain population at
a distance. The LED TV was mounted on the light stand itself so that it was easily viewable by
the user. The Raspberry Pi, along with the speaker for the alarm, was installed at the back of
the monitor, and the overall system was connected through cables.
The flowchart of the overall process in Fig. 3.4 shows that the system started by gathering
the live video feed from the webcam while simultaneously performing facial recognition. When
the program did not detect a face in the video feed, it continued to perform facial recognition
until a face appeared in the video. In general, the facial recognition ran in a loop in order
to detect single or multiple faces.
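The loop described above can be sketched in plain Python. This is a minimal sketch and not the researchers' code: `frames` stands in for the webcam feed, and `detect_faces` is a hypothetical detector that returns a list of bounding boxes (empty when no face is in the frame).

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A bounding box as (x, y, width, height); this layout is an assumption.
Box = Tuple[int, int, int, int]


def faces_per_frame(
    frames: Iterable[object],
    detect_faces: Callable[[object], List[Box]],
) -> Iterator[Tuple[object, List[Box]]]:
    """Yield (frame, boxes) only for frames that contain at least one face.

    Frames without a face are skipped and the loop moves on to the next
    frame, mirroring the "keep looking until a face appears" behavior.
    """
    for frame in frames:
        boxes = detect_faces(frame)
        if not boxes:
            continue  # no face yet: keep reading the feed
        yield frame, boxes


# Hypothetical stand-ins: each "frame" is just a label, and the fake
# detector looks the answer up in a dictionary.
fake_detections = {
    "frame0": [],
    "frame1": [(10, 20, 50, 50)],
    "frame2": [(5, 5, 40, 40), (80, 10, 45, 45)],
}
results = list(faces_per_frame(fake_detections, fake_detections.get))
```

In a real deployment the frame source and the detector would come from a computer vision library; the generator form keeps the loop independent of that choice.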
When the system detected a face in the facial recognition stage, it extracted an image of
the face and passed that image to an image classifier. The image classifier then gave the
program the region of interest along with its width, height, and coordinates. For
preprocessing, the image was resized to a standard 100x100 pixels. The image then passed
through the pre-trained Convolutional Neural Network, which produced the output.
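The crop-and-resize step can be illustrated without any imaging library. The sketch below assumes a grayscale image stored as nested lists and uses nearest-neighbor sampling for simplicity; the source does not state which interpolation was used, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def crop_roi(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the region of interest given as (x, y, width, height)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def resize_nearest(image: Image, out_w: int, out_h: int) -> Image:
    """Resize with nearest-neighbor sampling (chosen here for simplicity)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Hypothetical 320x240 frame whose pixel value encodes its row index.
frame = [[r % 256] * 320 for r in range(240)]
face = crop_roi(frame, (60, 40, 150, 120))       # 150 wide, 120 tall
model_input = resize_nearest(face, 100, 100)     # standard 100x100 input
```

Whatever the size of the detected face, the classifier always receives a 100x100 input, which is what lets a fixed network handle faces at different distances from the camera.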
For the Convolutional Neural Network, the researchers used the MobileNetV2 structure,
also shown in Fig. 3.4, which has two kinds of blocks. One is the residual block with a stride
of 1. The other is the block with a stride of 2 for downsizing. Both types of blocks have 3
layers. The first layer is the 1x1 convolution with ReLU6. The second layer is the depth-wise
convolution, and the last is another 1x1 convolution but without any non-linearity.
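To make the two block types concrete, the sketch below traces tensor shapes through the three layers of one block. It is an illustration rather than the researchers' network: the channel counts and the expansion factor are assumed values, "same" padding is assumed, and the function names are hypothetical. In the MobileNetV2 design, the residual shortcut is used only when the stride is 1 and the input and output channel counts match.

```python
import math
from typing import List, Tuple


def relu6(x: float) -> float:
    """ReLU6 clamps the activation to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def bottleneck_shapes(
    h: int, w: int, c_in: int, c_out: int, stride: int, expansion: int
) -> List[Tuple[str, Tuple[int, int, int]]]:
    """Trace (height, width, channels) through the three layers of a block.

    Assumes "same" padding, so the depth-wise layer divides the spatial
    size by the stride (rounded up).
    """
    c_mid = c_in * expansion
    h_out, w_out = math.ceil(h / stride), math.ceil(w / stride)
    return [
        ("1x1 conv + ReLU6", (h, w, c_mid)),
        ("3x3 depth-wise + ReLU6", (h_out, w_out, c_mid)),
        ("1x1 conv, linear", (h_out, w_out, c_out)),
    ]


def uses_shortcut(c_in: int, c_out: int, stride: int) -> bool:
    """The residual shortcut applies only when input and output shapes match."""
    return stride == 1 and c_in == c_out


# Illustrative values only: channel counts and expansion factor are assumptions.
stride1 = bottleneck_shapes(50, 50, 24, 24, stride=1, expansion=6)
stride2 = bottleneck_shapes(50, 50, 24, 32, stride=2, expansion=6)
```

The stride-1 call keeps the 50x50 spatial size, while the stride-2 call halves it to 25x25, which is the downsizing role described above.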
After being resized to 100x100 pixels, the input entered the first layer, the 1x1
convolution with ReLU6, and then proceeded to the second layer, the 3x3 depth-wise
convolution with ReLU6. Lastly, it passed through a linear 1x1 convolution, whose output was
then flattened to 50 layers.
When the system detected that the person in the input was wearing a face mask, the result
was processed as data. However, when the program detected that the person was not wearing a
face mask, the system captured a screenshot of the face and sounded the beeping alarm. All the
outputs were displayed on the monitor.
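The branching on the classifier output can be sketched as a small function. The label strings and the two callbacks are hypothetical placeholders; on the actual device they would save the face image and drive the speaker.

```python
from typing import Callable, List


def handle_prediction(
    label: str,
    save_screenshot: Callable[[], None],
    sound_alarm: Callable[[], None],
) -> str:
    """React to one classified face and return the action taken."""
    if label == "mask":
        return "logged"  # masked face: record as data only
    save_screenshot()    # unmasked face: keep a screenshot of the face
    sound_alarm()        # and trigger the beeping alarm
    return "alerted"


# Hypothetical callbacks that only record that they were called.
events: List[str] = []
outcome_masked = handle_prediction(
    "mask", lambda: events.append("screenshot"), lambda: events.append("beep")
)
outcome_unmasked = handle_prediction(
    "no_mask", lambda: events.append("screenshot"), lambda: events.append("beep")
)
```

Passing the side effects in as callbacks keeps the decision logic testable without a camera or speaker attached.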
Fig 3.4 Flow Chart of the System
Training of the dataset
The database used for training had 3200 pictures, with 1600 images of persons with a face
mask and the remaining 1600 images of persons without a face mask. The images were RGB and of
different resolutions. Almost 99% of the images were randomly taken from freely licensed
sources on the internet. 30% of the images have variation in facial expression and face size,
while 3% of the images have variation in capture angle.
Materials
Web Camera. A webcam is a video camera that feeds or streams an image or video in real
time to or through a computer. A webcam was well suited for the researchers’ setup on gathering
Raspberry Pi. The Raspberry Pi is a single-board computer that lets the user run
programs on its hardware, since it runs on Linux and operates in an open-source
ecosystem. The Raspberry Pi was ideal since it is small in size and relatively cheaper compared
LED Monitor. The LED monitor was used to visually display the outputs of the system
construction of the overall system holding the webcam, Raspberry Pi, and monitor as one device.
Keyboard and Mouse. The keyboard and mouse allowed the user to control the system manually
using the controls provided by the Raspberry Pi. Since the connections were USB-A, the inputs came in
Speakers. For the beep sound when no face mask was detected, speakers were used to amplify
The system was deployed and tested at the residences of the researchers because it could
not be deployed (due to the pandemic and suspension of classes) in its previous location,
Malayan Colleges Laguna. The researchers gathered a crowd of 5 people
The data that the researchers gathered were not comparable to what would be obtained from
a deployment in Malayan Colleges Laguna, since the crowd density of students there would be
relatively higher than the crowd gathered by the researchers, which numbered only 5. The
researchers conducted 20 trials to gather the data, 10 outside the premises and the remaining
10 inside the premises, simulating a deployment in a certain establishment (Malayan Colleges
Laguna). A sample video feed of the system is shown in Fig. 5.2 below.
The system then detected the number of faces in the live video feed, and the results were
compared to the manual count of faces made by the researchers themselves. The system also
detected the number of people wearing and not wearing a facemask. The researchers also
recorded whether false positive instances occurred. All results were
Fig. 5.2 Sample Video feed of the face mask detection system
Table 1 shows the data gathered for the face detection of the CNN relative to the data gathered by
the researchers through manual counting. The data recorded by the system in Table 1 were
relatively close to the data gathered manually by the researchers, with a slight difference of
1-2 persons not counted by the system. This counting discrepancy may be partly caused by training
data that did not allow the system to recognize several faces. Nevertheless, in most of the trials,
the system counted the same number of faces as the researchers. This goes to show that
with enough training data, CNN can not only be used in people counting, but can also be used in
Table 2 shows the data gathered for the face mask detection of the CNN relative to the face mask
detection done manually by the researchers. As in Table 1, the data gathered by the
system were relatively close to the detection of the researchers, with a discrepancy of only 2-3
persons not counted by the system. The discrepancy in the face mask detection might be partly caused by
the training data not covering several varieties of face masks available in the market today.
But considering the amount of training data used for the system, we can observe that CNN
can be used to detect people wearing and not wearing face masks.
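The comparison between manual and system counts amounts to a per-trial subtraction. The sketch below uses made-up counts for five trials to show the calculation; the numbers are not taken from Table 1 or Table 2, and the function name is hypothetical.

```python
from typing import List


def count_discrepancies(manual: List[int], system: List[int]) -> List[int]:
    """Per-trial difference: persons the system missed (manual minus system)."""
    if len(manual) != len(system):
        raise ValueError("both lists must cover the same trials")
    return [m - s for m, s in zip(manual, system)]


# Made-up counts for five trials, NOT the values from Table 1 or Table 2.
manual_counts = [5, 4, 5, 3, 5]
system_counts = [5, 3, 4, 3, 5]

diffs = count_discrepancies(manual_counts, system_counts)
exact_matches = sum(1 for d in diffs if d == 0)
largest_miss = max(diffs)
```

Reporting both the number of exact matches and the largest miss gives a quick view of how often and by how much the system count departs from the manual count.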
To ensure the reliability of the system, the researchers performed a test of significance by calculating
the mean, standard deviation, and variance of the trials with and without facemasks, for both the manual and
system counts, to verify how reliable the CNN facemask detection can be. Based on the analysis, the
standard deviations of the manual and system counts for both masked and unmasked trials were
relatively close to one another, with a slight difference of 0.0219 in the masked while having no
With almost the same standard deviation, we can claim that the Convolutional Neural
Network (CNN) system of facemask detection provided a significant amount of reliability
when it comes to detecting persons with and without facemasks. A linear graph presentation can also be
seen in Figures 5.3 and 5.4, showing the slight differences of the detection of the system versus the manual
[Figure: line graph; x-axis: Trial # (1 to 20)]
Fig 5.3 Linear graph of manual facemask detection vs. CNN facemask detection (Masked)
[Fig 5.4: line graph; x-axis: Trial # (1 to 20)]
The graphs in Fig. 5.3 and 5.4 show that the trend lines for the masked and unmasked
trials were almost the same. These trend lines show how the system and the method used
(Convolutional Neural Network) can be reliable in counting persons with and without a face
mask. The graphs also show that in a given trial, the system can have a slight discrepancy of
1-2 persons with a small standard deviation. A small standard deviation means that the data
gathered were all positioned close to the mean. Based on the standard deviations in Table 3,
the manual and system counts tend to change in similar ways, and the system is therefore
considered reliable.
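The mean, variance, and standard deviation used in this comparison can be computed with Python's standard `statistics` module. The sketch below uses made-up counts, not the data behind Table 3, and it assumes the sample (n - 1) form of variance and standard deviation, since the source does not say which form was used.

```python
import statistics
from typing import Dict, List


def summarize(counts: List[int]) -> Dict[str, float]:
    """Mean, sample variance, and sample standard deviation of trial counts."""
    return {
        "mean": statistics.mean(counts),
        "variance": statistics.variance(counts),
        "stdev": statistics.stdev(counts),
    }


# Made-up masked-person counts for six trials, NOT the data behind Table 3.
manual_masked = [2, 3, 1, 4, 2, 3]
system_masked = [2, 3, 1, 3, 2, 3]

manual_stats = summarize(manual_masked)
system_stats = summarize(system_masked)

# Gap between the two standard deviations, the quantity compared in the text.
stdev_gap = abs(manual_stats["stdev"] - system_stats["stdev"])
```

A small `stdev_gap` indicates that the two sets of counts have a similar spread around their means.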
Conclusion
Based on the gathered results, we can conclude that the system developed by the
researchers provided accurate and reliable face mask detection using a Convolutional Neural
Network (CNN). The reliability of the system was also assessed and was found to be relatively
close to the manual detection of the researchers.
As trials were conducted inside and outside the premises of the researchers, the results
also indicate that the reliability of the face mask detection system was not compromised
whether the system was inside or outside the premises; hence, its effectiveness is expected to
be the same in public areas where the mandatory use of facemasks is now being implemented. The
face mask detection using Convolutional Neural Network (CNN) can be helpful in implementing the strict
The face mask detection using Convolutional Neural Network (CNN) can provide an
devices such as Raspberry Pi. The system utilizes 53 layers that can store pre trained images to
Based on testing, optimal results could be achieved when the camera is 5 ft above the
ground with no angle tilt (0 degrees) with respect to vertical, whether inside or outside the
premises of an establishment. This deployment ensures that the face and mask are captured as a
flat image with fewer angles, which helps prevent false positive results.
The face mask detection system using Convolutional Neural Network (CNN) can provide
convenience to our frontliners by helping detect persons with and without a facemask,
especially in crowded areas where a facemask detection system could be a handy device to have.
The researchers’ main objective was to create a facemask detection system that was able
to detect and monitor persons not wearing a facemask using Convolutional Neural Network
(CNN), and the researchers believe this main objective was achieved in this research.
Recommendations
Although the objectives were met in this research, certain improvements could still help
improve the reliability of the said system. At the time this research was conducted, one of
the most accurate algorithms for classifying and detecting images was the Convolutional Neural
Network (CNN). Therefore, further improvements in the algorithm could lead to higher
reliability in detecting face masks. Also, in the study conducted, the researchers used
training data of 3200 images, 1600 of which were of persons with a face mask and 1600 of which
were of persons without a face mask. As a recommendation, the researchers strongly believe
that having more training data could yield higher accuracy in detecting face masks. More
training data would also mean that the device used for programming should have more RAM. The
researchers strongly suggest that for a larger amount of training data, RAM should be at least
8 GB. In this manner, the errors would be further reduced,