MAPUA INSTITUTE OF TECHNOLOGY AT LAGUNA

Academic Year 2019-2020

A CONVOLUTIONAL NEURAL NETWORK (CNN) BASED FACE MASK DETECTION

Aeron Normel Almenanza MENDOZA

Daryl Cuevas PAYUAN

Aaron John Rendon SOTO

Engr. Jonathan R. Oracion

Submitted to the Faculty of Malayan Colleges Laguna

In Partial Fulfillment of the Requirements for the degree of

Bachelor of Science in Electronics Engineering


Methodology

This chapter explains the procedures used in this study. The system consists of several hardware components, including a webcam, a Raspberry Pi, and an LED monitor display. The webcam gathered live video feed from its field of view. The researchers programmed the CNN on the Raspberry Pi using Python. The CNN then determined whether each person in the video feed was wearing or not wearing a mask. Python is a high-level programming language that can be used to develop desktop graphical user interface (GUI) applications, websites, and web applications. For this system, the researchers used Python to code a Convolutional Neural Network face mask detection system that detects persons without a face mask in real time, sounds an alarm, and captures a screenshot of the face of the person not wearing a face mask. The results and screenshots were then sent to the LED monitor display as output.

Fig 3.2 System Design


System Construction

Two webcams were installed within the premises of an establishment so that the researchers could verify the wearing of face masks within the premises and support its strict enforcement. All cameras were connected to the Raspberry Pi. The researchers mounted the webcam on top of a 5 ft light stand so that it could detect people at a distance. The LED TV was mounted on the light stand itself so that it was easily viewable by the user. The Raspberry Pi and the speaker for the alarm were installed at the back of the monitor, and the overall system was connected through cables.

Fig 3.3 Prototype construction set-up.


Flowchart

The flowchart of the overall process in Fig. 3.4 shows that the system gathered live video feed from the webcam while simultaneously performing facial recognition. When the program did not detect a face in the video feed, it continued performing facial recognition until a face was detected. In general, the facial recognition ran in a loop in order to detect single or multiple faces.

When the system detected a face in the facial recognition stage, it extracted an image of the face and passed that image to an image classifier. The image classifier gave the program the region of interest along with its width, height, and coordinates. For preprocessing, the image was resized to a standard 100x100 pixels. The image then passed through the pre-trained Convolutional Neural Network, which produced the output.
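
The cropping and resizing step described above can be sketched as follows. This is a minimal illustration, not the researchers' code: the helper names `crop_roi` and `resize_nearest`, the synthetic frame, and the bounding-box values are all assumptions, and nearest-neighbor sampling on plain Python lists stands in for whatever image library the actual system used.

```python
# Hypothetical helpers; the thesis does not publish its implementation.
TARGET_SIZE = 100  # side length in pixels, as stated in the text


def crop_roi(image, x, y, w, h):
    """Return the sub-image covered by the bounding box (x, y, w, h)."""
    return [row[x:x + w] for row in image[y:y + h]]


def resize_nearest(image, size=TARGET_SIZE):
    """Resize a 2-D grid of pixels to size x size by nearest-neighbor sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


# Synthetic 240x320 grayscale frame; each pixel value encodes its position.
frame = [[(r + c) % 256 for c in range(320)] for r in range(240)]

# Assumed region of interest, standing in for the detector's output.
face = crop_roi(frame, x=60, y=40, w=150, h=180)
net_input = resize_nearest(face)

print(len(net_input), len(net_input[0]))  # 100 100
```

Whatever the size of the detected face, the network always receives a 100x100 input, which is why the region of interest must be resized before classification.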

For the Convolutional Neural Network, the researchers used the MobileNetV2 structure, also shown in Fig. 3.4, which has two kinds of blocks. One is the residual block with a stride of 1. The other is the block with a stride of 2 for downsizing. Both types of blocks have 3 layers. The first layer is a 1x1 convolution with ReLU6. The second layer is the depth-wise convolution, and the last is another 1x1 convolution but without any non-linearity.
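
Two details of these blocks can be made concrete with a short sketch: the ReLU6 activation, which clips values to the range 0 to 6, and the effect of the stride on the spatial size of the feature map. The function names are illustrative, and the 3x3 kernel with a padding of 1 is an assumption used only to show the arithmetic.

```python
def relu6(x):
    """ReLU6 activation: clamp the input to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def conv_output_size(n, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution over an n x n input.

    The 3x3 kernel and padding of 1 are assumed defaults for illustration.
    """
    return (n + 2 * padding - kernel) // stride + 1


print([relu6(v) for v in (-2.0, 3.5, 9.0)])  # [0.0, 3.5, 6.0]
print(conv_output_size(100, stride=1))        # 100 (size preserved)
print(conv_output_size(100, stride=2))        # 50 (downsized)
```

With a stride of 1 the input and output have the same spatial size, which is what allows a residual connection around the block, while a stride of 2 halves the width and height.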

After being resized to 100x100 pixels, the input entered the first layer, the 1x1 convolution with ReLU6, and then proceeded to the second layer, the 3x3 depth-wise convolution with ReLU6. Lastly, it passed through a linear 1x1 convolution, whose output was then flattened to 50 layers.

When the system detected that the person in the image was wearing a face mask, the result was processed as data. However, when the program detected that the person was not wearing a face mask, the system captured a screenshot of the face and sounded the beeping alarm. All the outputs were displayed on the monitor.

Fig 3.4 Flow Chart of the System

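
The decision stage of the flowchart can be summarized in a small sketch. The function name `process_faces`, the label strings, and the returned dictionary are hypothetical; the actual alarm and screenshot calls are replaced by action names so the logic can be read on its own.

```python
def process_faces(predictions):
    """Decide what to do for one frame.

    predictions: one label per detected face, either "mask" or "no_mask"
    (label names are assumptions for this sketch).
    """
    masked = sum(1 for label in predictions if label == "mask")
    unmasked = sum(1 for label in predictions if label == "no_mask")

    # An unmasked face triggers the screenshot and the alarm;
    # masked faces are only counted as data.
    actions = ["screenshot", "alarm"] if unmasked > 0 else []

    return {"masked": masked, "unmasked": unmasked, "actions": actions}


print(process_faces(["mask", "no_mask", "no_mask"]))
# {'masked': 1, 'unmasked': 2, 'actions': ['screenshot', 'alarm']}
print(process_faces([]))
# {'masked': 0, 'unmasked': 0, 'actions': []}
```

An empty list corresponds to a frame with no detected face, in which case the loop simply continues to the next frame.
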
Training of the dataset

The database used for training had 3200 pictures: 1600 images of persons with a face mask and 1600 images of persons without a face mask. The images were RGB and of different resolutions. Almost 99% of the images were randomly taken from freely licensed sources on the internet. 30% of the images have variation in facial expression and face size, while 3% of the images have variation in capture angle.

Fig 3.5 Image sample for the training data

Materials

Web Camera. A webcam is a video camera that feeds or streams images or video in real time to or through a computer. A webcam was well suited to the researchers' setup for gathering video since it transmits video in high definition.


Fig 3.6 1080P HD Web Camera USB type

Raspberry Pi. The Raspberry Pi is a single-board computer that lets the user run programs on its hardware, since it runs Linux and operates in an open-source ecosystem. The Raspberry Pi was ideal since it is small and relatively cheap compared to a full-size computer.

Fig 3.7 Raspberry pi model 4

LED Monitor. The LED monitor was used to visually display the outputs of the system provided by the Raspberry Pi.

Fig 3.7 AOC LED Monitor


Light Stand. The light stand served as the backbone of the overall system, holding the webcam, Raspberry Pi, and monitor together as one device.

Fig 3.8 Heavy Duty Light Stand

Keyboard and Mouse. The keyboard and mouse allowed the user to control the system manually through the Raspberry Pi. Since both used USB-A connections, they connected directly to the Raspberry Pi.

Fig 3.9 A4TECH USB Keyboard and Mouse

Speakers. Speakers were used to amplify the beep sound output by the Raspberry Pi when a person without a face mask was detected.

Fig 4.0. 3.5mm Speaker


Results and Discussion

The system was deployed and tested at the residences of the researchers because it could not be deployed at its originally planned location, the Malayan Colleges Laguna, due to the pandemic and the suspension of classes. The researchers gathered a group of 5 people to be detected with and without masks.

The data that the researchers gathered differ from what would be obtained at Malayan Colleges Laguna, since the crowd density of students there would be higher than that of the group gathered by the researchers, which numbered only 5. The researchers conducted 20 trials to gather the data: 10 outside the premises and 10 inside the premises, simulating a deployment in an establishment (Malayan Colleges Laguna). A sample video feed of the system is shown in Fig. 5.2 below.

The system detected the number of faces in the live video feed, and the results were compared with the manual count of faces made by the researchers themselves. The system also detected the number of people wearing and not wearing a face mask. The researchers also recorded whether false positive instances occurred. All results are tabulated in Table 1 and Table 2 below.


Trial #     Face detected     Face detected     False positive occurred
            by researchers    by system         in the system
Trial 1     2                 2                 None
Trial 2     1                 1                 None
Trial 3     1                 1                 None
Trial 4     3                 3                 None
Trial 5     2                 1                 Yes
Trial 6     3                 3                 Yes
Trial 7     1                 1                 None
Trial 8     1                 1                 None
Trial 9     1                 1                 None
Trial 10    1                 1                 None
Trial 11    4                 4                 None
Trial 12    5                 4                 Yes
Trial 13    5                 5                 None
Trial 14    4                 4                 None
Trial 15    5                 4                 Yes
Trial 16    5                 5                 None
Trial 17    5                 3                 Yes
Trial 18    5                 5                 None
Trial 19    5                 5                 None
Trial 20    5                 5                 None

Table 1 Data gathered for face detection of the system vs. manual detection by researchers

Trial #     Manual Count of    Manual Count of     Masked detected    No Mask detected
            Masked people      Unmasked people     by system          by system
Trial 1     2                  0                   2                  0
Trial 2     0                  1                   0                  1
Trial 3     1                  0                   1                  0
Trial 4     1                  2                   1                  2
Trial 5     2                  0                   1                  1
Trial 6     2                  1                   1                  2
Trial 7     0                  1                   0                  1
Trial 8     0                  1                   0                  1
Trial 9     0                  1                   0                  1
Trial 10    0                  1                   0                  1
Trial 11    2                  2                   2                  2
Trial 12    3                  2                   2                  2
Trial 13    3                  2                   3                  2
Trial 14    4                  0                   4                  0
Trial 15    3                  2                   3                  1
Trial 16    2                  3                   1                  4
Trial 17    2                  3                   2                  1
Trial 18    2                  3                   2                  3
Trial 19    5                  0                   5                  0
Trial 20    0                  5                   0                  5

Table 2. Data gathered for mask detection of the system vs. manual detection by researchers

Fig. 5.2 Sample video feed of the face mask detection system

Table 1 shows the data gathered for the face detection of the CNN relative to the manual count by the researchers. The counts recorded by the system in Table 1 are relatively close to the counts gathered manually by the researchers, with a slight difference of 1-2 persons not counted by the system in some trials. This discrepancy may be caused by the training data not covering several of the faces encountered. In most of the trials, however, the system counted the same number of faces as the researchers. This suggests that, with enough training data, a CNN can be used not only for people counting but also for detecting face masks.
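
The agreement described above can be checked directly from Table 1. The sketch below transcribes the two count columns and computes a few summary figures; the variable names are illustrative, and the ratio of total faces is only one possible summary, not a measure the researchers reported.

```python
# Face counts per trial, transcribed from Table 1 (trials 1 to 20).
manual = [2, 1, 1, 3, 2, 3, 1, 1, 1, 1, 4, 5, 5, 4, 5, 5, 5, 5, 5, 5]
system = [2, 1, 1, 3, 1, 3, 1, 1, 1, 1, 4, 4, 5, 4, 4, 5, 3, 5, 5, 5]

# Faces the system missed in each trial (manual count minus system count).
missed_per_trial = [m - s for m, s in zip(manual, system)]
exact_trials = sum(1 for d in missed_per_trial if d == 0)

print(sum(manual), sum(system))                 # 64 59
print(exact_trials, len(manual))                # 16 20
print(max(missed_per_trial))                    # 2
print(round(sum(system) / sum(manual), 4))      # 0.9219
```

The system matched the manual count in 16 of the 20 trials and never missed more than 2 faces in a single trial.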

Table 2 shows the data gathered for the face mask detection of the CNN relative to the manual face mask detection of the researchers. As in Table 1, the data gathered by the system were relatively close to the detection of the researchers, with a discrepancy of at most 2 persons in either category in any trial. The discrepancy in the face mask detection might be caused by the training data not covering several varieties of face masks available in the market today. Considering the amount of data on which the system was trained, we can observe that a CNN can be used to detect people wearing and not wearing face masks.
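
To see where the mask counts in Table 2 differ, the per-trial differences can be listed for each category. This sketch transcribes the table row by row; the names `rows`, `masked_diff`, and `unmasked_diff` are illustrative.

```python
# (manual masked, manual unmasked, system masked, system unmasked), trials 1 to 20,
# transcribed from Table 2.
rows = [
    (2, 0, 2, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 2, 1, 2), (2, 0, 1, 1),
    (2, 1, 1, 2), (0, 1, 0, 1), (0, 1, 0, 1), (0, 1, 0, 1), (0, 1, 0, 1),
    (2, 2, 2, 2), (3, 2, 2, 2), (3, 2, 3, 2), (4, 0, 4, 0), (3, 2, 3, 1),
    (2, 3, 1, 4), (2, 3, 2, 1), (2, 3, 2, 3), (5, 0, 5, 0), (0, 5, 0, 5),
]

# Map trial number -> absolute difference, keeping only trials that disagree.
masked_diff = {
    trial: abs(mm - sm)
    for trial, (mm, mu, sm, su) in enumerate(rows, start=1)
    if mm != sm
}
unmasked_diff = {
    trial: abs(mu - su)
    for trial, (mm, mu, sm, su) in enumerate(rows, start=1)
    if mu != su
}

print(masked_diff)    # {5: 1, 6: 1, 12: 1, 16: 1}
print(unmasked_diff)  # {5: 1, 6: 1, 15: 1, 16: 1, 17: 2}
```

The masked counts differ in 4 trials and the unmasked counts in 5 trials, with the largest single difference being 2 persons in Trial 17.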

                      Manual Count of    Manual Count of     Masked detected    No Mask detected
                      Masked people      Unmasked people     by system          by system
Number of trials      20                 20                  20                 20
Mean                  1.7                1.5                 1.5                1.5
Standard Deviation    1.4546             1.3179              1.4327             1.3179
Variance              2.1158             1.7368              2.0526             1.7368

Table 3. System reliability test

To assess the reliability of the system, the researchers calculated the mean, standard deviation, and variance of the manual and system counts across the trials, for both masked and unmasked persons, to verify how reliable the CNN face mask detection can be. Based on this analysis, the standard deviations of the manual and system counts were relatively close to one another, with a slight difference of 0.0219 for the masked counts and no difference for the unmasked counts.
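
The figures in Table 3 can be reproduced from the per-trial counts in Table 2. The sketch below uses Python's standard `statistics` module; the dictionary keys are illustrative names, and the sample (n - 1) forms of the standard deviation and variance are used because they are the ones that match the tabulated values.

```python
import statistics

# Per-trial counts transcribed from Table 2 (trials 1 to 20).
counts = {
    "manual_masked":   [2, 0, 1, 1, 2, 2, 0, 0, 0, 0, 2, 3, 3, 4, 3, 2, 2, 2, 5, 0],
    "manual_unmasked": [0, 1, 0, 2, 0, 1, 1, 1, 1, 1, 2, 2, 2, 0, 2, 3, 3, 3, 0, 5],
    "system_masked":   [2, 0, 1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 3, 4, 3, 1, 2, 2, 5, 0],
    "system_unmasked": [0, 1, 0, 2, 1, 2, 1, 1, 1, 1, 2, 2, 2, 0, 1, 4, 1, 3, 0, 5],
}

summary = {
    name: {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),        # sample standard deviation (n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }
    for name, values in counts.items()
}

for name, stats in summary.items():
    print(name, round(stats["mean"], 4), round(stats["stdev"], 4),
          round(stats["variance"], 4))

masked_gap = abs(summary["manual_masked"]["stdev"] - summary["system_masked"]["stdev"])
print(round(masked_gap, 4))  # 0.0219
```

Note that equal standard deviations indicate a similar spread of counts across trials; they do not by themselves show that the counts agree trial by trial, so the per-trial comparison in Table 2 remains the more direct check.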

Because the standard deviations are almost the same, we can claim that the Convolutional Neural Network (CNN) face mask detection system provided a considerable degree of reliability in detecting persons with and without face masks. The line graphs in Figs. 5.3 and 5.4 show the slight differences between the detection of the system and the manual count of the researchers.

[Figure: line graph titled "Manual vs CNN (Masked)". x-axis: Trial # (1 to 20); y-axis: Face Mask detected; series: Manual Count of Masked people, System Count of Masked people.]

Fig 5.3 Linear graph of manual facemask detection vs. CNN facemask detection (Masked)

[Figure: line graph titled "Manual vs CNN (Unmasked)". x-axis: Trial # (1 to 20); y-axis: No. of unmasked detected; series: Manual Count of Unmasked people, System Count of Unmasked people.]

Fig 5.4 Linear graph of manual facemask detection vs. CNN facemask detection (Unmasked)

The graphs in Figs. 5.3 and 5.4 show that the trend lines of the manual and system counts were almost the same for both the masked and unmasked trials. These trend lines show that the system and the method used (Convolutional Neural Network) can be reliable in counting persons with and without face masks. The graphs also show that in a given trial the system can have a slight discrepancy of 1-2 persons. A small standard deviation means that the data gathered were positioned close to the mean. Based on the standard deviations in Table 3, the manual and system counts tend to vary similarly and are hence considered reliable.

Conclusion

Based on the gathered results, we can conclude that the system developed by the researchers provided accurate and reliable face mask detection using a Convolutional Neural Network (CNN). The reliability of the system was also assessed and found to be relatively close to the manual detection of the researchers.

As trials were conducted both inside and outside the premises of the researchers, the results also indicate that the reliability of the face mask detection system was not compromised whether the system was inside or outside the premises; hence, its effectiveness is expected to be the same in public areas where the mandatory use of face masks is now being implemented. Face mask detection using a Convolutional Neural Network (CNN) can be helpful in enforcing the strict use of face masks in public areas.

Face mask detection using a Convolutional Neural Network (CNN) can provide an accurate estimate based on the MobileNetV2 architecture, a Convolutional Neural Network well suited to mobile use and to small, compact, low-power devices such as the Raspberry Pi. The network has 53 layers, whose pre-trained weights are used to detect the kinds of images it was trained on.

Based on testing, optimal results were achieved when the camera was 5 ft above the ground with no tilt (0 degrees) with respect to the vertical, whether inside or outside the premises of an establishment. This placement lets the face and mask be captured as a flat image with fewer angles, which helps prevent false positive results.

The face mask detection system using a Convolutional Neural Network (CNN) can provide convenience to our frontliners by helping detect persons with and without face masks, especially in crowded areas where a face mask detection system could be a handy device to have.

The researchers’ main objective was to create a facemask detection system that was able

to detect and monitor persons not wearing a facemask using Convolutional Neural Network

(CNN), and the researchers do believe this main objective was achieved in this research.

Recommendations

Although the objectives of this research were met, certain improvements could still help improve the reliability of the system. At the time this research was conducted, one of the most accurate algorithms for classifying and detecting images was the Convolutional Neural Network (CNN). If further improvements to the algorithm can be made, this could lead to higher reliability in detecting face masks. Also, in this study, the researchers trained on 3200 images, 1600 of persons with a face mask and 1600 of persons without a face mask. As a recommendation, the researchers strongly believe that more training data could yield higher accuracy in detecting face masks. More training data would also mean that the device used for programming should have more RAM. The researchers strongly suggest that, for a larger training set, the RAM should be at least 8 GB. In this manner, errors would be further reduced, thereby increasing reliability.
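
As a rough illustration of how the training set size relates to memory, the sketch below estimates the space needed to hold the images alone. It assumes every image is stored at the 100x100 RGB input size and that the whole set is kept in memory at once; the function name and the two storage formats are assumptions. The estimate is a lower bound for the pixel data only and ignores the memory used by the training framework, the model, and intermediate activations.

```python
NUM_IMAGES = 3200   # size of the training set reported in the study
SIDE = 100          # images resized to 100x100 pixels
CHANNELS = 3        # RGB


def dataset_bytes(num_images, bytes_per_value):
    """Bytes needed to hold the resized images fully in memory."""
    return num_images * SIDE * SIDE * CHANNELS * bytes_per_value


uint8_mb = dataset_bytes(NUM_IMAGES, 1) / 1e6     # 1 byte per channel value
float32_mb = dataset_bytes(NUM_IMAGES, 4) / 1e6   # 4 bytes per channel value

print(uint8_mb)    # 96.0
print(float32_mb)  # 384.0

# Doubling the number of images doubles the pixel data that must be held.
print(dataset_bytes(2 * NUM_IMAGES, 4) / 1e6)  # 768.0
```

The pixel data grows linearly with the number of images, which is consistent with the recommendation that a larger training set calls for a device with more RAM.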
