A Project Report
on
Face Mask Recognition Using Deep Learning
Submitted in partial fulfillment of the
requirement for the award of the degree of
Bachelor of Technology
Submitted By
Avdeep Malik 19SCSE1010214
Shubham Upadhyay 19SCSE1010467
CANDIDATE’S DECLARATION
We hereby certify that the work which is being presented in the project, entitled "Face Mask
Recognition Using Deep Learning", in partial fulfillment of the requirements for the award of
the Bachelor of Technology, submitted in the School of Computing Science and Engineering of
Galgotias University, Greater Noida, is an original work carried out during the period of January
2022 to May 2022, under the supervision of Mr. Deependra Rastogi, Department of Computer
Science and Engineering. The matter presented in the project has not been submitted by us for
the award of any other degree.
This is to certify that the above statement made by the candidates is correct to the best of my
knowledge.
CERTIFICATE
The viva-voce examination of Avdeep Malik (19SCSE1010214) and Shubham
Upadhyay (19SCSE1010467) has been held on 10/05/2022 and their work is recommended for the
award of B.Tech.
Date: 10/05/2022
Abstract
Table of Contents
Title
Candidates Declaration
Acknowledgement
Abstract
Contents
List of Figures
Chapter 1 Introduction
1.1 Introduction
1.2 Formulation of Problem
1.2.1 Tool and Technology Used
Chapter 2 Literature Survey/Project Design
List of Figures
S.No. Caption
1 Overall structure of CNN
2 System architecture
3 Use case diagram
4 Sequence diagram
5 Activity diagram
6 Block diagram
7 UML diagram
8 Flow chart diagram
CHAPTER-1
Introduction
Face mask detection refers to detecting whether a person is wearing a mask or not. In
fact, the problem is the reverse engineering of face detection, where the face is detected
using different machine learning algorithms for the purposes of security,
authentication and surveillance. Face detection is a key area in the field of
Computer Vision and Pattern Recognition. A significant body of research has
contributed sophisticated algorithms for face detection in the past. The primary
research on face detection was done in 2001, using handcrafted feature design
and traditional machine learning algorithms to train effective
classifiers for detection and recognition. The problems encountered with this
approach include high complexity in feature design and low detection accuracy. In
recent years, face detection methods based on deep convolutional neural networks
(CNNs) have been widely developed to improve detection performance.
1. Python
2. Keras
Keras is an API designed for human beings, not machines. Keras follows
best practices for reducing cognitive load: it offers consistent & simple
APIs, it minimizes the number of user actions required for common use
cases, and it provides clear & actionable error messages. It also has
extensive documentation and developer guides.
3. tensorflow
4. opencv-python
5. imutils
6. matplotlib
7. numpy
8. scipy
9. PyCharm
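The Python packages above can be captured in a requirements file for reproducible installation; no version numbers are pinned here since the report does not state the versions used:

```text
keras
tensorflow
opencv-python
imutils
matplotlib
numpy
scipy
```

Python itself and PyCharm are installed separately, not via pip.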
CHAPTER-2
Literature Survey
[1] A paper was released by Militante and Dionisio in 2020 in which they used a
dataset of 25,000 pictures with a resolution of 224×224 pixels and achieved a 96
percent accuracy rate. They used an Artificial Neural Network (ANN) to simulate
human brain activation. A Raspberry Pi is used in their research to raise a warning in
public areas if someone enters without a mask.
[2] Guillermo et al. (2020) provided a research report on detecting face
masks. In their report they constructed an artificial dataset of their own:
starting from around 600 raw photos without masks, they created
a synthetic dataset by applying a mask to each face, using artificial
intelligence techniques to estimate the facial positions.
[3] In 2018, Boyko, Basystiuk, and Shakhovska revealed their findings, which
were based on the Dlib and OpenCV libraries. The efficiency of these two most
frequently used machine learning packages is compared using the HOG method for detection
and subsequent recognition. The coordinates of the facial border were obtained
using the OpenFace package. They divided the facial characteristics into groups
using 128 extracted facial features, which allowed them to classify faces with greater
accuracy.
[4] Face recognition will gain substantial importance and prospective uses in the
coming years, according to D. Dwivedi's "Data Science, Artificial Intelligence, Deep
Learning, Computer Vision, Machine Learning, Data Visualization, and Coffee."
Face detection is a critical component of the face recognition process. In the last
several years, research effort has been devoted to advancing face detection and improving
prediction accuracy. It has a wide range of uses in a variety of industries, including
law enforcement, entertainment, and safety.
[5] Pandiyan submitted this study paper in 2020, in which he devised a text-message
warning system for people without masks detected by video
surveillance in public places. CNN layers are utilised in this study to recognise
masks and to collect photographs of persons who are not wearing them; the recorded
photos are stored on the fly using AWS (Amazon Web Services). Twilio messaging,
an API for sending and receiving text messages, is used to issue a
notification to the person whose picture has been captured and saved
in the Amazon Web Services system.
[6] In 2020, Das, Ansari, and Basak published their research report, which
compared the accuracy and loss outcomes on two datasets. The
study is based on OpenCV, TensorFlow, and Python, and uses the Keras
libraries to obtain the results. The first dataset was created by the authors
themselves and contains a total of 1,376 forward-facing images: 690 images
with a white mask and 686 images without a mask. The second dataset comes
from Kaggle and has 853 photos in two classes, with and without masks. They
trained their model for 20 epochs, using 90 percent of the data for training
and 10 percent for testing.
[7] Chen et al. (2020) presented their study work in which they offered a method
for determining whether or not a person is wearing a face mask. Deep learning and
machine learning approaches are combined in this study. The model has seven
steps: take the face mask dataset as input, restructure the collection onto disk,
read the dataset from disk, detect faces from a picture stream using Python
packages, decide whether a mask is being worn, and report the result.
[8] This research article was given by Kalas (2019), in which the author worked on
recognising a person's face from a video stream. Three technologies are employed
for face detection in this study: OpenCV, AdaBoost, and the Haar-like
method. OpenCV is used for object recognition, AdaBoost is used for
sample picture training since it does not overfit the conceptual framework, and
the Haar-like algorithm is employed to collect the boundary parameters of the face
by evaluating the face's properties.
[9] Mohan, Paul, and Chirania (2021) published a research article that uses an
ARM Cortex-M7 microcontroller clocked at 480 MHz with a 496 KB frame buffer to
identify a face mask. They demonstrated their approach with a 138 KB
post-quantization model and a 30 fps inference rate, using three datasets: two from
Kaggle with 12,232 photographs each, and one collected by the authors using an
OpenMV Cam H7 controller camera, with 1,979 images. The dataset was augmented
to 131,124 photos, and all photos were then reduced to 32×32 pixels, the best size
for the microcontroller's configured 496 KB frame buffer. The result was a best-fit
model for RAM-constrained microcontrollers, with 99.79 percent accuracy.
[10] Salihbasic and Orehovacki (2020) showed their study in which they
utilised OpenCV to recognise faces in images and determine gender from facial traits.
Salihbasic et al. employed the LBPH model for feature classification, which
recognises features and feeds them to three Convolutional Neural Network
layers. The first layer of the Convolutional Neural Network has 96 filters, the
second layer contains 256 filters, and the final layer contains 384
filters. However, if the person's face is differently lighted, the position is different,
or the smartphone's camera characteristics and performance differ,
the accuracy of their model is reduced significantly.
[11] Loey, Manogaran, Taha, and Khalifa (2021) presented their research work in
which they proposed utilising three datasets to compare their accuracy by
running them through the same algorithms. RMFD, SMFD, and LFW are the three
datasets they utilised. They used ResNet-50 in conjunction with a traditional
machine-learning Support Vector Machine (SVM) in this study, finding that ResNet-50
performs better as a feature extractor. As a result, ResNet-50 is employed as a
feature extractor, while the SVM is used for training, validation,
and testing. The study obtained 99.64 percent, 99.49 percent, and 100 percent
accuracy on RMFD, SMFD, and LFW, respectively, using these technologies.
CHAPTER-3
Functionality/Working of Project
The Viola-Jones algorithm is one of the most popular algorithms for object
recognition in an image. This research paper deals with the possibilities of
parametric optimization of the Viola-Jones algorithm to achieve maximum
efficiency of the algorithm in specific environmental conditions. It has been shown
that, with the use of additional modifications, it is possible to increase the speed of
the algorithm on a particular image by 2-5 times, with a loss of accuracy and
completeness of no more than 3-5%.
Positive Images: These images contain the object which we want our classifier to
identify.
Negative Images: Images of everything else, which do not contain the object we
want to detect.
DEEP LEARNING
1. Deep learning is an AI function that mimics the workings of the human brain
in processing data for use in detecting objects, recognizing speech,
translating languages, and making decisions.
2. Deep learning AI is able to learn without human supervision, drawing from
data that is both unstructured and unlabeled.
3. In this project, face mask detection is built using a Deep Learning technique called
Convolutional Neural Networks (CNN).
Deep learning methods aim at learning feature hierarchies with features from
higher levels of the hierarchy formed by the composition of lower-level features.
Automatically learning features at multiple levels of abstraction allows a system to
learn complex functions mapping the input to the output directly from data,
without depending completely on human-crafted features. Deep learning
algorithms seek to exploit the unknown structure in the input distribution in order
to discover good representations, often at multiple levels, with higher-level learned
features defined in terms of lower-level features.
CNNs, like neural networks, are made up of neurons with learnable weights and
biases. Each neuron receives several inputs, takes a weighted sum over them, passes
it through an activation function and responds with an output. The whole network
has a loss function, and all the tips and tricks that we developed for neural networks
still apply to CNNs. In more detail, the image is passed through a series of
convolution, nonlinear, pooling and fully connected layers, which then generate
the output.
Each layer combines patches from previous layers. Convolutional Networks are
trainable multistage architectures composed of multiple stages. The input and output of
each stage are sets of arrays called feature maps. At the output, each feature map
represents a particular feature extracted at all locations on the input. Each stage is
composed of a filter bank layer, a non-linearity layer, and a feature pooling layer.
A ConvNet is composed of 1, 2 or 3 such 3-layer stages, followed by a
classification module.
Basic structure of CNN, where C1, C3 are convolution layers and S2, S4 are
pooled/sampled layers.
Filter: A trainable filter (kernel) in the filter bank connects an input feature map to an
output feature map. Convolutional layers apply a convolution operation to the input,
passing the result to the next layer. The convolution emulates the response of an
individual neuron to visual stimuli.
CONVOLUTIONAL LAYER
It is always first. The image (a matrix of pixel values) is entered into it. Imagine that
the reading of the input matrix begins at the top left of the image. Next the software
selects a smaller matrix there, which is called a filter. Then the filter performs
convolution, that is, it moves along the input image. The filter's task is to multiply its
values by the original pixel values. All these multiplications are summed up and one
number is obtained at the end. Since the filter has read the image only in the upper
left corner, it moves further by one unit to the right, performing a similar operation. After
passing the filter across all positions, a matrix is obtained, but one smaller than the input
matrix.
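The sliding-filter computation described above can be sketched in a few lines of NumPy; for an n×n input and an f×f filter with stride 1, the output is (n-f+1)×(n-f+1), i.e. smaller than the input:

```python
import numpy as np

def convolve2d(image, kernel):
    # valid-mode sliding-window convolution with stride 1
    n, f = image.shape[0], kernel.shape[0]
    out = np.zeros((n - f + 1, n - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # multiply the filter by the patch it covers and sum to one number
            out[i, j] = np.sum(image[i:i+f, j:j+f] * kernel)
    return out

image = np.arange(16).reshape(4, 4).astype(float)   # 4x4 input
kernel = np.ones((3, 3))                            # 3x3 filter
result = convolve2d(image, kernel)                  # 2x2 output
```

With a 4×4 input and a 3×3 filter, the output is (4-3+1)×(4-3+1) = 2×2.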
NON-LINEAR LAYER
It is added after each convolution operation. It has the activation function, which
brings in the nonlinear property; without this property a network would not be
sufficiently intense and would not be able to model the response variable.
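A common choice for this activation, used later in this report's model, is ReLU, which simply zeroes out negative values element-wise:

```python
import numpy as np

def relu(x):
    # element-wise max(0, x): passes positives through, zeroes out negatives
    return np.maximum(0, x)

feature_map = np.array([[-1.0, 2.0],
                        [ 3.0, -4.0]])
activated = relu(feature_map)   # [[0, 2], [3, 0]]
```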
POOLING LAYER
It follows the nonlinear layer. It works with the width and height of the image and
performs a down-sampling operation on them. As a result the image volume is
reduced. This means that, if some features have already been identified in the previous
convolution operation, a detailed image is no longer needed for further
processing, and it is compressed to a less detailed picture.
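The down-sampling step can be sketched as 2×2 max pooling with stride 2, which keeps only the strongest response in each window and halves the width and height:

```python
import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling with stride 2: output is half the width and height
    h, w = x.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            # keep only the strongest response in each 2x2 window
            out[i // 2, j // 2] = np.max(x[i:i+2, j:j+2])
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [9., 1., 2., 3.],
              [0., 5., 4., 1.]])
pooled = max_pool_2x2(x)   # 4x4 input compressed to a 2x2 output
```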
CNN MODEL
1. This CNN model is built using the TensorFlow framework and the OpenCV
library, which is widely used for real-time applications.
2. This model can also be used to develop full-fledged software to scan every
person before they can enter a public gathering.
1. Conv2D Layer: It has 100 filters and the activation function used is
'ReLU'. The ReLU function stands for Rectified Linear Unit; it outputs
the input directly if it is positive, otherwise it outputs zero.
3. Flatten() Layer: It is used to flatten all the layers into a single 1D layer.
5. Dense Layer: The activation function here is softmax, which outputs a
vector with two probability distribution values.
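A minimal Keras sketch of the layers listed above; the 224×224×3 input size, the 3×3 kernel, and the intermediate MaxPooling2D layer are illustrative assumptions, not details taken from the report:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# sketch of the described model: Conv2D with 100 ReLU filters, a pooling
# step (assumed), a Flatten layer, and a 2-way softmax Dense output
model = Sequential([
    Conv2D(100, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(2, activation="softmax"),
])
```

The final Dense layer's softmax output gives the two probability values (mask / no mask) described above.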
SYSTEM ARCHITECTURE
Data Visualization
In the first step, let us visualize the total number of images in our dataset in both
categories. We can see that there are 690 images in the 'yes' class and 686 images
in the 'no' class.
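Counting the images per class can be sketched with the standard library, assuming one folder per category; here a throwaway directory with dummy files stands in for the real dataset:

```python
import os
import tempfile

def count_images(directory, categories):
    # map each category folder to the number of image files it holds
    return {c: len(os.listdir(os.path.join(directory, c))) for c in categories}

# build a throwaway stand-in dataset: 3 'yes' images and 2 'no' images
root = tempfile.mkdtemp()
for category, n in [("yes", 3), ("no", 2)]:
    os.makedirs(os.path.join(root, category))
    for i in range(n):
        open(os.path.join(root, category, f"img{i}.jpg"), "w").close()

counts = count_images(root, ["yes", "no"])   # {'yes': 3, 'no': 2}
```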
Data Augmentation
In the next step, we augment our dataset to include a larger number of images for our
training. In this step of data augmentation, we rotate and flip each of the images in
our dataset.
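The rotate-and-flip augmentation can be sketched directly with NumPy (a Keras ImageDataGenerator would apply the same transforms on the fly during training); each source image yields several variants:

```python
import numpy as np

def augment(image):
    # return the original image plus rotated and flipped variants
    return [
        image,
        np.rot90(image),    # rotated 90 degrees
        np.fliplr(image),   # mirrored left-right
        np.flipud(image),   # mirrored top-bottom
    ]

img = np.arange(9).reshape(3, 3)
variants = augment(img)   # 4 training images from 1 source image
```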
DESIGN
Use Case Diagram
Sequence Diagram
Activity Diagram
BLOCK DIAGRAM
CLASS DIAGRAM
FLOWCHART DIAGRAM
CODE IMPLEMENTATION
DIRECTORY = r"dataset"
CATEGORIES = ["with_mask", "without_mask"]
data = []
labels = []
# loop over both categories and load each image as a 224x224 array
for category in CATEGORIES:
    path = os.path.join(DIRECTORY, category)
    for img_name in os.listdir(path):
        image = load_img(os.path.join(path, img_name), target_size=(224, 224))
        image = preprocess_input(img_to_array(image))
        data.append(image)
        labels.append(category)
# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))
# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False
# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig("plot.png")
# pass the blob through the network and obtain the face detections
faceNet.setInput(blob)
detections = faceNet.forward()
print(detections.shape)
while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
Chapter 4
Results and Discussion
Input
Output
Input
Output
The flow to identify whether the person in the webcam is wearing a face mask or not
is two-fold:
1. Identify the faces in the webcam.
2. Classify the faces based on the mask.
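The two-fold flow can be sketched as a small pipeline of two functions; `detect_faces` and `classify_mask` here are hypothetical stubs standing in for the real OpenCV detector and the trained CNN:

```python
def detect_faces(frame):
    # hypothetical stub for step 1: the real code uses OpenCV's
    # pre-trained face detector and returns (x, y, w, h) boxes
    return [(10, 10, 50, 50)]

def classify_mask(face_crop):
    # hypothetical stub for step 2: the real code runs the trained CNN
    return "with_mask"

def process_frame(frame):
    # apply the two-fold flow: detect every face, then classify each one
    results = []
    for (x, y, w, h) in detect_faces(frame):
        results.append(((x, y, w, h), classify_mask(frame)))
    return results
```

In the real system, `classify_mask` would receive the crop `frame[y:y+h, x:x+w]` resized to the CNN's input size.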
Identify the Face in the Webcam: To identify the faces, a pre-trained model
provided by the OpenCV framework was used.
The model was trained using web images. OpenCV provides two models for this face
detector.
FUNCTIONAL REQUIREMENTS
• External outputs are those that are exported outside the organization.
• Internal outputs, which are the main user and computer displays, have a
place within the organization.
• Operational outputs are used only by the computer department.
• User-interface outputs allow the user to communicate directly with the
system.
• Understanding the user's preferences, the level of technology and the needs
of his or her business through a friendly questionnaire.
NON-FUNCTIONAL REQUIREMENTS
SYSTEM CONFIGURATION
This project can run on commodity hardware. We ran the entire project on an AMD
Ryzen 5 processor with 16 GB RAM and a 4 GB Nvidia graphics processor; it has 6
cores which run at 1.7 GHz base and 2.1 GHz boost clocks. The first part of the
project is the training phase, which takes 10-15 minutes; the second part is the
testing phase, which takes only a few seconds to make predictions and calculate accuracy.
HARDWARE REQUIREMENTS
• RAM: 4 GB
• Storage: 500 GB
• CPU: 2 GHz or faster
• Architecture: 32-bit or 64-bit
SOFTWARE REQUIREMENTS
• Python 3.5 in PyCharm is used for data pre-processing, model training and
prediction
• Operating System: Windows 7 and above, a Linux-based OS, or macOS
• Coding Language: Python
• Nvidia Graphic Processor
Chapter 5
Conclusion and Future Scope
CONCLUSION
As technology is blooming with emerging trends, we have a
novel face mask detector which can possibly contribute to public healthcare. The
architecture uses MobileNet as its backbone, so it can be used in both high- and low-computation
scenarios. In order to extract more robust features, we utilize transfer
learning to adopt weights from a similar task, face detection, which was trained on a
very large dataset. We used OpenCV, TensorFlow, and a neural network to detect whether
people were wearing face masks or not. The models were tested with images and
real-time video streams. Good model accuracy has been achieved, and since
optimization of the model is a continuous process, we are building a highly
accurate solution by tuning the hyperparameters. This specific model could be
used as a use case for edge analytics. Furthermore, the proposed method achieves
state-of-the-art results on a public face mask dataset. Through the development of face
mask detection we can detect whether a person is wearing a face mask and allow their
entry accordingly, which would be of great help to society.
FUTURE ENHANCEMENT
REFERENCES
[1] Militante, S. V., & Dionisio, N. V. (2020). Real-Time Face Mask Recognition
with Alarm System using Deep Learning. 2020 11th IEEE Control and System
Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia.
https://doi.org/10.1109/ICSGRC49013.2020.9232610
[2] Guillermo, M., Pascua, A. R. A., Billones, R. K., Sybingco, E., Fillone, A., &
Dadios, E. (2020). COVID19 Risk Assessment through Multiple Face Mask
Detection using MobileNetV2 DNN. The 9th International Symposium on
Computational Intelligence and Industrial Applications (ISCIIA2020), Beijing,
China. https://isciia2020.bit.edu.cn/docs/20201114082420135149.pdf
[3] Boyko, N., Basystiuk, O., & Shakhovska, N. (2018). Performance Evaluation
and Comparison of Software for Face Recognition, Based on Dlib and Opencv
Library. 2018 IEEE Second International Conference on Data Stream Mining &
Processing (DSMP), Lviv, Ukraine. https://doi.org/10.1109/DSMP.2018.8478556
[5] Pandiyan, P. (2020, December 17). Social Distance Monitoring and Face Mask
Detection Using Deep Neural Network. Retrieved from:
https://www.researchgate.net/publication/347439579_Social_Distance_Monitoring_and_Face_Mask_Detection_Using_Deep_Neural_Network
[6] Das, A., Ansari, M. W., & Basak, R. (2020). Covid-19 Face Mask Detection
Using TensorFlow, Keras and OpenCV. 2020 IEEE 17th India Council
International Conference (INDICON), New Delhi, India.
https://doi.org/10.1109/INDICON49873.2020.9342585
[7] Chen, Y., Hu, M., Hua, C., Zhai, G., Zhang, J., Li, Q., & Yang, S. X. (2020).
Face Mask Assistant: Detection of Face Mask Service Stage Based on Mobile
Phone. Preprint arXiv:2010.06421. https://arxiv.org/abs/2010.06421
[8] Kalas, M. S. (2019). Real Time Face Detection and Tracking using OpenCV.
International Journal of Soft Computing and Artificial Intelligence, 2(1), 41- 44.
[9] Mohan, P., Paul, A. J., & Chirania, A. (2021). A Tiny CNN Architecture for
Medical Face Mask Detection for Resource-Constrained Endpoints. In: Mekhilef,
S., Favorskaya, M., Pandey, R. K., Shaw, R. N. (eds.) Innovations in Electrical and
Electronic Engineering. Lecture Notes in Electrical Engineering, vol 756. Springer,
Singapore. https://doi.org/10.1007/978-981-16-0749-3_52
[11] Loey, M., Manogaran, G., Taha, M. H. N., & Khalifa, N. E. M. (2021a).
Fighting against COVID-19: A novel deep learning model based on YOLOv2 with
ResNet-50 for medical face mask detection. Sustainable Cities and Society, 65,
102600. https://doi.org/10.1016/j.scs.2020.102600