
A Project Report
on
Face Mask Recognition Using Deep Learning
Submitted in partial fulfillment of the
requirement for the award of the degree of

Bachelor of Technology

Under The Supervision of


Mr. Deependra Rastogi

Submitted By
Avdeep Malik 19SCSE1010214
Shubham Upadhyay 19SCSE1010467

SCHOOL OF COMPUTING SCIENCE AND ENGINEERING


GALGOTIAS UNIVERSITY, GREATER NOIDA
INDIA
May, 2022


CANDIDATE'S DECLARATION

We hereby certify that the work which is being presented in the project, entitled "Face Mask Recognition Using Deep Learning", in partial fulfillment of the requirements for the award of the Bachelor of Technology, submitted in the School of Computing Science and Engineering of Galgotias University, Greater Noida, is an original work carried out during the period of January 2022 to May 2022 under the supervision of Mr. Deependra Rastogi, Department of Computer Science and Engineering, School of Computing Science and Engineering, Galgotias University, Greater Noida.

The matter presented in the project has not been submitted by us for the award of any other degree of this or any other institute.

Avdeep Malik 19SCSE1010214

Shubham Upadhyay 19SCSE1010467

This is to certify that the above statement made by the candidates is correct to the best of my knowledge.

Mr. Deependra Rastogi


CERTIFICATE

The Final Project Viva-Voce examination of Avdeep Malik 19SCSE1010214 and Shubham Upadhyay 19SCSE1010467 has been held on 10/05/2022 and their work is recommended for the award of B.Tech.

Signature of Examiner(s) Signature of Supervisor(s)

Signature of Project Coordinator Signature of Dean

Date: 10/05/2022

Place: Greater Noida


Abstract

The global COVID-19 pandemic caused an outbreak of dangerous disease all over the world. Wearing a face mask helps limit the spread of infection and protects the individual from contracting airborne infectious germs. Using a face mask detection system, one can monitor whether people are wearing masks or not.

Here the Haar cascade algorithm is used for face detection. Compared with other existing algorithms, this classifier produces a high recognition rate even with varying expressions, efficient feature selection and a low assortment of false positive features. The Haar feature-based cascade classifier utilizes only 200 features out of 6,000 to yield a recognition rate of 85-95%.

Motivated by this, we propose mask detection as a unique public health service system during the global COVID-19 pandemic. The model is trained on face mask images and non-face-mask images.

Keywords: COVID-19 epidemic, Haar cascade algorithm, mask detection, face mask image, non-face mask image


Table of Contents

Title
Candidates Declaration
Acknowledgement
Abstract
Contents
List of Figures

Chapter 1 Introduction
1.1 Introduction
1.2 Formulation of Problem
1.2.1 Tool and Technology Used
Chapter 2 Literature Survey/Project Design

Chapter 3 Functionality/Working of Project

Chapter 4 Results and Discussion

Chapter 5 Conclusion and Future Scope


5.1 Conclusion
5.2 Future Scope
References
Publication/Copyright/Product


List of Figures

S.No. Caption
1 Overall structure of CNN
2 System architecture
3 Use case diagram
4 Sequence diagram
5 Activity diagram
6 Block diagram
7 Class diagram
8 Data flow diagram
9 Flowchart diagram


CHAPTER-1

Introduction

Face mask detection refers to detecting whether a person is wearing a mask or not. In fact, the problem is the reverse engineering of face detection, where the face is detected using different machine learning algorithms for the purposes of security, authentication and surveillance. Face detection is a key area in the field of Computer Vision and Pattern Recognition. A significant body of research has contributed sophisticated algorithms for face detection in the past. The primary research on face detection was done in 2001, using handcrafted feature design and the application of traditional machine learning algorithms to train effective classifiers for detection and recognition. The problems encountered with this approach include high complexity in feature design and low detection accuracy. In recent years, face detection methods based on deep convolutional neural networks (CNN) have been widely developed to improve detection performance.

Although numerous researchers have committed efforts to designing efficient algorithms for face detection and recognition, there exists an essential difference between 'detection of the face under a mask' and 'detection of a mask over the face'. As per the available literature, very little research has attempted to detect masks over faces. Thus, our work aims to develop a technique that can accurately detect a mask over the face in public areas (such as airports, railway stations, crowded markets, bus stops, etc.) to curtail the spread of Coronavirus and thereby contribute to public healthcare. Further, it is not easy to detect faces with/without a mask in public, as the dataset available for detecting masks on human faces is relatively small, making the model hard to train. So, the concept of transfer learning is used here to transfer the learned kernels from networks trained for a similar face detection task on an extensive dataset. The dataset covers various face images, including faces with masks, faces without masks, faces with and without masks in one image, and confusing images without masks. With an extensive dataset containing 45,000 images, our technique achieves an outstanding accuracy of 98.2%. The major contributions of the proposed work are given below:

1. Develop a novel object detection method that combines one-stage and two-stage detectors for accurately detecting the object in real time from video streams, with transfer learning at the back end.

2. Creation of an unbiased face mask dataset with an imbalance ratio of nearly one.

3. The proposed model requires less memory, making it easily deployable on embedded devices used for surveillance purposes.

Tool and Technology Used

1. Python

Python is an interpreted, object-oriented, high-level programming language


with dynamic semantics. Its high-level built in data structures, combined
with dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language
to connect existing components together. Python's simple, easy to learn
syntax emphasizes readability and therefore reduces the cost of program
maintenance. Python supports modules and packages, which encourages
program modularity and code reuse. The Python interpreter and the
extensive standard library are available in source or binary form without
charge for all major platforms, and can be freely distributed.

2. Keras

Keras is an API designed for human beings, not machines. Keras follows
best practices for reducing cognitive load: it offers consistent & simple
APIs, it minimizes the number of user actions required for common use
cases, and it provides clear & actionable error messages. It also has
extensive documentation and developer guides.

3. tensorflow

TensorFlow is an open source framework developed by Google researchers


to run machine learning, deep learning and other statistical and predictive
analytics workloads. Like similar platforms, it's designed to streamline the
process of developing and executing advanced analytics applications for

Downloaded by ashish yadav (ashishyadavking123@gmail.com)


lOMoARcPSD|16065545

users such as data scientists, statisticians and predictive modelers. The


TensorFlow software handles data sets that are arrayed as computational
nodes in graph form. The edges that connect the nodes in a graph can
represent multidimensional vectors or matrices, creating what are known as
tensors. Because TensorFlow programs use a data flow architecture that
works with generalized intermediate results of the computations, they are
especially open to very large-scale parallel processing applications, with
neural networks being a common example.
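
As a small, self-contained illustration of the tensor-based computation described above (a toy sketch for this report, not part of the project's code):

import tensorflow as tf

a = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])   # a 2 x 2 tensor
b = tf.constant([[1.0],
                 [2.0]])        # a 2 x 1 tensor

# the matrix product flows through the computation as a new 2 x 1 tensor
print(tf.matmul(a, b))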

4. opencv-python

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code. The library has more than 2,500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc. OpenCV has a user community of more than 47 thousand people and an estimated number of downloads exceeding 18 million. The library is used extensively by companies, research groups and governmental bodies.

5. imutils

A series of convenience functions that make basic image processing tasks such as translation, rotation, resizing, skeletonization, displaying Matplotlib images, sorting contours and detecting edges much easier with OpenCV, on both Python 2.7 and Python 3.
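
A minimal sketch of the convenience this provides (the image path is a placeholder); the same resize call is used later in the detection loop:

import cv2
import imutils

frame = cv2.imread("sample.jpg")           # placeholder input image
frame = imutils.resize(frame, width=400)   # height is computed automatically to keep the aspect ratio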


6. matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. It can create publication-quality plots, make interactive figures that can zoom, pan and update, customize visual style and layout, export to many file formats, embed in JupyterLab and graphical user interfaces, and use a rich array of third-party packages built on Matplotlib.

7. numpy

NumPy (pronounced /ˈnʌmpaɪ/ (NUM-py) or sometimes /ˈnʌmpi/ (NUM-


pee)) is a library for the Python programming language, adding support for
large, multi-dimensional arrays and matrices, along with a large collection
of high-level mathematical functions to operate on these arrays. The
ancestor of NumPy, Numeric, was originally created by Jim Hugunin with
contributions from several other developers. In 2005, Travis
Oliphant created NumPy by incorporating features of the competing
Numarray into Numeric, with extensive modifications. NumPy is open-
source software and has many contributors. NumPy is
a NumFOCUS fiscally sponsored project.

8. scipy

SciPy provides algorithms for optimization, integration, interpolation,


eigenvalue problems, algebraic equations, differential equations, statistics
and many other classes of problems. SciPy’s high level syntax makes it
accessible and productive for programmers from any background or
experience level. SciPy is a free and open-source Python library used for
scientific computing and technical computing. SciPy contains modules for
optimization, linear algebra, integration, interpolation, special functions,
FFT, signal and image processing, ODE solvers and other tasks common in
science and engineering.

9. PyCharm

PyCharm is a dedicated Python Integrated Development Environment (IDE)


providing a wide range of essential tools for Python developers, tightly
integrated to create a convenient environment for productive Python, web,
and data science development.


CHAPTER-2

Literature Survey

[1] A paper was released by Militante and Dionisio in 2020 in which they used a dataset of 25,000 pictures with a pixel resolution of 224×224 and achieved a 96 percent accuracy rate. They used an Artificial Neural Network (ANN) to simulate human brain activation. A Raspberry Pi is used in their research to raise a warning in public areas if someone enters without a mask.

[2] Guillermo et al. (2020) provided a research report on detecting face masks. In their report, they constructed an artificial dataset on their own: they used around 600 raw photos without masks to create a synthetic dataset, applying a mask to each face and employing artificial intelligence techniques to obtain the facial positions.

[3] In 2018, Boyko, Basystiuk, and Shakhovska revealed their findings, which were based on the Dlib and OpenCV libraries. The efficiency of these two most frequently used machine learning packages is compared using the HOG method for finding faces and subsequently recognizing them. The coordinates of the facial border were obtained using the OpenFace package. They divided the facial characteristics into groups using 128-dimensional facial feature extraction, which allows them to be matched with greater accuracy.

[4] Face recognition will gain substantial importance and prospective uses in the coming years, according to D. Dwivedi's "Data Science, Artificial Intelligence, Deep Learning, Computer Vision, Machine Learning, Data Visualization, and Coffee." Face detection is a critical component of the face recognition process. In the last several years, study effort has been devoted to advancing face detection and improving prediction accuracy. It has a wide range of uses in a variety of industries, including law enforcement, entertainment, and safety.

[5] Pandiyan submitted a study paper in 2020, in which he devised a text message warning system for people without masks, monitored through video surveillance in public places. CNN layers are utilised in this study to recognise masks and capture photographs of persons who are not wearing them; the recorded photos are stored on the fly using AWS (Amazon Web Services). Twilio messaging, an API for sending and receiving text messages, has been used to issue a notification to the person whose picture has been apprehended and saved in the Amazon Web Services system.


[6] In 2020, Das, Ansari, and Basak published their research report, which compared the accuracy and loss outcomes on two sets of data. The study is based on OpenCV, TensorFlow and Python, with the Keras libraries used to acquire the results. Das et al. specified their field of study with two datasets: the first, created by the authors, has a total of 1,376 forward-facing images, 690 images with a white mask and 686 images without a mask; the second comes from Kaggle and has 853 photos, divided into faces with masks and faces without masks. They trained their model for 20 epochs, with 90 percent training data and 10 percent testing data used in the experiment.

[7] Chen et al. (2020) presented their study work, in which they offered a method for determining whether or not a person is wearing a face mask. Deep learning and machine learning approaches are combined in this study. The model proceeds in steps: input the face mask dataset, restructure the collection onto disk, read the dataset from disk, detect faces in a picture stream using Python packages, decide whether a mask is worn, and report the result.

[8] This research article was given by Kalas (2019), and the author worked on recognising a person's face from a video stream. Three technologies are employed for face detection in this study: OpenCV, AdaBoost, and the Haar-like method. As per the research, the Haar-like method is used for object recognition, AdaBoost is used for training on sample pictures since it does not overstretch the conceptual framework, and the Haar-like algorithm is employed to collect the boundary parameters of the face by evaluating the face's properties.

[9] Mohan, Paul, and Chirania (2021) published a research article that uses an ARM Cortex-M7 microprocessor clocked at 480 MHz with a 496 KB frame buffer to identify a face mask. They demonstrated their approach in 138 kilobytes post-quantization with a 30 fps inference rate, using three datasets: two from Kaggle with 12,232 photographs each, and one from the authors using an OpenMV Cam H7 controller camera with 1,979 images. The dataset was augmented to 131,124 photos, and all photos were then reduced to 32×32 pixels, the best size for the microcontroller's configured 496 KB frame buffer. Finally, it was the best-fit model for RAM-constrained microcontrollers, with 99.79 percent accuracy.

[10] Salihbasic and Orehovacki (2019) presented a study in which they utilised OpenCV to recognise faces in images and gender from facial traits. Salihbasic et al. employed the LBPH model for feature classification, which recognises features and feeds them to three Convolutional Neural Network layers. The first layer of the Convolutional Neural Network has ninety-six filters, the second layer contains two hundred fifty-six filters, and the final layer contains 384 filters. However, if the person's face is lit differently, the position is different, or the smartphone camera's characteristics and performance differ, the accuracy of their model is reduced significantly.

[11] Loey, Manogaran, Taha, and Khalifa (2021) presented their research work, in which they utilised three datasets, running them through the same algorithms to compare accuracy. RMFD, SMFD, and LFW are the three datasets they utilised. They used ResNet-50 in conjunction with a traditional machine-learning Support Vector Machine in this study, reasoning that ResNet-50 performs better as a feature extractor. As a result, ResNet-50 is employed as a feature extractor, while the Support Vector Machine is used for training, validation, and testing. Using these technologies, the study obtained 99.64 percent, 99.49 percent, and 100 percent accuracy on RMFD, SMFD, and LFW, respectively.


CHAPTER-3

Functionality/Working of Project

Machine Learning approaches:

1. Viola–Jones object detection framework based on HAAR Features


2. Scale-invariant feature transform (SIFT)
3. Histogram of oriented gradients (HOG) features

Machine learning (ML) is the study of computer algorithms that improve


automatically through experience. It is seen as a subset of artificial intelligence.
Machine learning algorithms build a mathematical model based on sample data,
known as "training data", in order to make predictions or decisions without being
explicitly programmed to do so. Machine learning algorithms are used in a wide
variety of applications, such as email filtering and computer vision, where it is
difficult or infeasible to develop conventional algorithms to perform the needed
tasks. Machine learning is closely related to computational statistics, which focuses
on making predictions using computers. The study of mathematical optimization
delivers methods, theory and application domains to the field of machine learning.
Data mining is a related field of study, focusing on exploratory data analysis
through unsupervised learning. In its application across business problems,
machine learning is also referred to as predictive analytics.

Approaches for Machine Learning:

Viola-Jones object detection framework based on HAAR features:

The Viola-Jones algorithm is one of the most popular algorithms for object recognition in an image. Research on the parametric optimization of the Viola-Jones algorithm shows how to achieve maximum efficiency in specific environmental conditions: with additional modifications it is possible to increase the speed of the algorithm on a particular image by 2-5 times, with a loss of accuracy and completeness of no more than 3-5%.


Scale-invariant feature transform (SIFT):

The scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision used to detect and describe local features in images. SIFT key points of objects are first extracted from a set of reference images and stored in a database. An object is recognized in a new image by individually comparing each feature from the new image to this database and finding candidate matching features based on the Euclidean distance of their feature vectors. From the full set of matches, subsets of key points that agree on the object and its location, scale, and orientation in the new image are identified to filter out good matches. The determination of consistent clusters is performed rapidly by using an efficient hash table implementation of the generalized Hough transform. Each cluster of 3 or more features that agree on an object and its pose is then subjected to further detailed model verification, and subsequently outliers are discarded. Finally, the probability that a particular set of features indicates the presence of an object is computed, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
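
The following sketch illustrates this key-point matching with OpenCV, assuming OpenCV >= 4.4 (where SIFT is in the main package); the image paths and the 0.75 ratio threshold are illustrative assumptions:

import cv2

sift = cv2.SIFT_create()

# extract key points and descriptors from a reference and a query image
kp1, des1 = sift.detectAndCompute(cv2.imread("reference.jpg", 0), None)
kp2, des2 = sift.detectAndCompute(cv2.imread("query.jpg", 0), None)

# brute-force matcher with the L2 (Euclidean) norm, as SIFT descriptors require
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test filters out ambiguous candidate matches
good = [m for m, n in matches if m.distance < 0.75 * n.distance]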

Histogram of oriented gradients (HOG):

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
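
A minimal sketch of computing a HOG descriptor with OpenCV's default parameters (64x128 detection window, 9 orientation bins); the image path and the resize to a single window are assumptions for illustration:

import cv2

image = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder image
window = cv2.resize(image, (64, 128))                    # one detection window

hog = cv2.HOGDescriptor()          # default people-detection parameters
descriptor = hog.compute(window)   # flattened histogram-of-gradients vector
print(descriptor.size)             # 3780 values for the default configuration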

HAAR Feature-Based Cascade Classifiers

It is an object detection algorithm used to identify faces in an image or a real-time video, and an effective approach to object detection. In this method, a large number of positive and negative images are used to train the classifier. In this experiment, a model pre-trained on frontal face features is used to detect faces in real time.

Positive images: these contain the images which we want our classifier to identify.

Negative images: images of everything else, which do not contain the object we want to detect.
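
A minimal sketch of face detection with one of OpenCV's pre-trained Haar cascades (the cascade XML ships with opencv-python; the image path is a placeholder):

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("sample.jpg")                  # placeholder input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # the cascade works on grayscale

# slide the cascade over the image at multiple scales
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 0), 2)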

DEEP LEARNING

1. Deep learning is an AI function that mimics the workings of the human brain in processing data for use in detecting objects, recognizing speech, translating languages, and making decisions.
2. Deep learning AI is able to learn without human supervision, drawing from data that is both unstructured and unlabeled.
3. In this project, face mask detection is built using a deep learning technique called Convolutional Neural Networks (CNN).

Deep learning methods aim at learning feature hierarchies, with features from higher levels of the hierarchy formed by the composition of lower-level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features.

Convolution Neural Network

A convolutional neural network is a special architecture of artificial neural network proposed by Yann LeCun in 1988. One of the most popular uses of the architecture is image classification. CNNs have wide applications in image and video recognition, recommender systems and natural language processing. The example this project takes is related to Computer Vision; however, the basic concept remains the same and can be applied to any other use case.

CNNs, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function and responds with an output. The whole network has a loss function, and all the tips and tricks developed for neural networks still apply to CNNs. In more detail, the image is passed through a series of convolutional, nonlinear, pooling and fully connected layers, and the network then generates the output.


In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the visual cortex. CNNs use relatively little pre-processing compared to other image classification algorithms. A CNN is a special kind of multi-layer neural network applied to 2-D arrays (usually images), based on spatially localized neural input. CNNs generate 'patterns of patterns' for pattern recognition.

Each layer combines patches from previous layers. Convolutional networks are trainable multistage architectures composed of multiple stages. The input and output of each stage are sets of arrays called feature maps. At the output, each feature map represents a particular feature extracted at all locations on the input. Each stage is composed of a filter bank layer, a non-linearity layer, and a feature pooling layer. A ConvNet is composed of one, two or three such 3-layer stages, followed by a classification module.

Figure 1 shows the basic structure of a CNN, where C1, C3 are convolution layers and S2, S4 are pooled/sampled layers.

Filter: a trainable filter (kernel) in the filter bank connects an input feature map to an output feature map. Convolutional layers apply a convolution operation to the input, passing the result to the next layer. The convolution emulates the response of an individual neuron to visual stimuli.


CONVOLUTIONAL LAYER

It is always first. The image (a matrix of pixel values) is entered into it. Imagine that the reading of the input matrix begins at the top left of the image. Next the software selects a smaller matrix there, which is called a filter. The filter then produces convolution, i.e. it moves along the input image. The filter's task is to multiply its values by the original pixel values. All these multiplications are summed up, and one number is obtained at the end. Since the filter has read the image only in the upper left corner, it moves further by one unit to the right, performing a similar operation. After passing the filter across all positions, a matrix is obtained, but smaller than the input matrix.

From a human perspective, this operation is analogous to identifying boundaries and simple colours in the image. But in order to recognize the whole object, the full network is needed. The network consists of several convolutional layers mixed with nonlinear and pooling layers. Convolution is the first layer to extract features from an input image; it learns image features using small squares of input data. It is a mathematical operation that takes two inputs, such as an image matrix and a filter or kernel:

● An image matrix of dimension (h x w x d)
● A filter of dimension (fh x fw x d)
● An output volume of dimension (h - fh + 1) x (w - fw + 1) x 1

Consider a 5 x 5 image whose pixel values are 0 and 1, and a 3 x 3 filter matrix, as shown below.


Then the convolution of the 5 x 5 image matrix with the 3 x 3 filter matrix produces a matrix called the "Feature Map", as shown in the output below.
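
Since the figure does not reproduce here, the following sketch works the same example in NumPy; the 0/1 pixel values and the 3 x 3 filter are commonly used illustrative values, not data from this project:

import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])   # 5 x 5 image with pixel values 0 and 1

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])        # 3 x 3 filter

fh, fw = kernel.shape
out = np.zeros((image.shape[0] - fh + 1, image.shape[1] - fw + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # element-wise products of the window and the filter, summed to one number
        out[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)

print(out)   # the 3 x 3 feature map, matching (5 - 3 + 1) x (5 - 3 + 1)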

THE NON-LINEAR LAYER:

It is added after each convolution operation. It contains the activation function, which brings in the nonlinear property; without this property a network would not be sufficiently intense and would not be able to model the response variable.
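
A one-line sketch of ReLU, the activation used in this project's convolution layer, illustrating the nonlinearity:

import numpy as np

def relu(x):
    # pass positive responses unchanged, zero out negative ones
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]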

THE POOLING LAYER:

It follows the nonlinear layer. It works on the width and height of the image and performs a downsampling operation on them. As a result the image volume is reduced. This means that if some features have already been identified in the previous convolution operation, a detailed image is no longer needed for further processing, and it is compressed into a less detailed picture.
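
A small sketch of 2 x 2 max pooling, the down-sampling used in this project's MaxPooling2D layer; the 4 x 4 input values are illustrative:

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 6, 8]])   # a 4 x 4 feature map

# keep the strongest response in each non-overlapping 2 x 2 window
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 4]
                #  [7 9]]  -- width and height are halved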


FULLY CONNECTED LAYER:

After the completion of a series of convolutional, non-linear and pooling layers, it is necessary to attach a fully connected layer. This layer takes the output information from the convolutional network. Attaching a fully connected layer to the end of the network results in an N-dimensional vector, where N is the number of classes from which the model selects the desired class.


CNN MODEL

1. This CNN model is built using the TensorFlow framework and the OpenCV library, which is widely used for real-time applications.
2. This model can also be used to develop full-fledged software to scan every person before they can enter a public gathering.

LAYERS IN CNN MODEL

 Conv2D
 MaxPooling2D
 Flatten()
 Dropout
 Dense

1. Conv2D Layer: It has 100 filters, and the activation function used is 'ReLU'. ReLU stands for Rectified Linear Unit; it outputs the input directly if it is positive, otherwise it outputs zero.

2. MaxPooling2D: It is used with a pool size (filter size) of 2*2.

3. Flatten() Layer: It is used to flatten all the layers into a single 1D layer.

4. Dropout Layer: It is used to prevent the model from overfitting.

5. Dense Layer: The activation function here is softmax, which will output a vector with two probability distribution values. A sketch of this layer stack is given below.
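
A sketch of the layer stack described above, assuming 224 x 224 RGB inputs and a 3 x 3 kernel (the report does not state the kernel size, input size or dropout rate, so these are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential([
    Conv2D(100, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dropout(0.5),                      # dropout rate is an assumption
    Dense(2, activation="softmax"),    # with_mask / without_mask probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
    metrics=["accuracy"])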


SYSTEM ARCHITECTURE

Data Visualization
In the first step, let us visualize the total number of images in our dataset in both categories. We can see that there are 690 images in the 'yes' class and 686 images in the 'no' class.
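
A minimal sketch of this count, assuming the two classes are stored in 'yes' and 'no' sub-folders of the dataset directory:

import os

for cls in ("yes", "no"):
    n = len(os.listdir(os.path.join("dataset", cls)))
    print(cls, n)   # expected: yes 690, no 686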

Data Augmentation
In the next step, we augment our dataset to include more images for training. In this data augmentation step, we rotate and flip each of the images in our dataset.


Splitting the data


In this step, we split our data into the training set which will contain the images on
which the CNN model will be trained and the test set with the images on which our
model will be tested.

Building the Model


In the next step, we build our Sequential CNN model with various layers such as
Conv2D, MaxPooling2D, Flatten, Dropout and Dense.

Pre-Training the CNN model

After building our model, let us create the 'train_generator' and 'validation_generator' to fit them to our model in the next step, as sketched below.
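
A sketch of building the two generators with flow_from_directory; the directory names, target size and batch size are assumptions for illustration:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=20,
    horizontal_flip=True)

train_generator = datagen.flow_from_directory(
    "dataset/train", target_size=(224, 224), batch_size=32,
    class_mode="categorical")
validation_generator = datagen.flow_from_directory(
    "dataset/val", target_size=(224, 224), batch_size=32,
    class_mode="categorical")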

Training the CNN model

This is the main step, where we fit the images in the training set and the test set to the Sequential model built using the Keras library. The model was trained for 30 epochs (iterations). We could train for more epochs to attain higher accuracy, provided over-fitting does not occur.

Labeling the Information

After building the model, we label the two probabilities for our results ('0' as 'without_mask' and '1' as 'with_mask'). We also set the bounding rectangle colours using RGB values, as sketched below.
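
A sketch of the mapping described above; the variable names are illustrative:

labels_dict = {0: "without_mask", 1: "with_mask"}
color_dict = {0: (0, 0, 255),   # red rectangle (BGR) when no mask is worn
              1: (0, 255, 0)}   # green rectangle when a mask is worn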


DESIGN
Use Case Diagram

Sequence Diagram


Activity Diagram


BLOCK DIAGRAM

CLASS DIAGRAM


DATA FLOW DIAGRAM


FLOWCHART DIAGRAM


CODE IMPLEMENTATION

Training the dataset


# import the necessary packages
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import os

# initialize the initial learning rate, number of epochs to train for,


# and batch size
INIT_LR = 1e-4
EPOCHS = 20
BS = 32

DIRECTORY = r"dataset"
CATEGORIES = ["with_mask", "without_mask"]

# grab the list of images in our dataset directory, then initialize


# the list of data (i.e., images) and class images
print("[INFO] loading images...")

data = []
labels = []

for category in CATEGORIES:
    path = os.path.join(DIRECTORY, category)
    for img in os.listdir(path):
        img_path = os.path.join(path, img)
        image = load_img(img_path, target_size=(224, 224))
        image = img_to_array(image)
        image = preprocess_input(image)

        data.append(image)
        labels.append(category)

# perform one-hot encoding on the labels


lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)

data = np.array(data, dtype="float32")


labels = np.array(labels)

(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.20, stratify=labels, random_state=42)

# construct the training image generator for data augmentation


aug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest")

# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)


# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False

# compile our model


print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
metrics=["accuracy"])

# train the head of the network


print("[INFO] training head...")
H = model.fit(
    aug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)

# make predictions on the testing set


print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report


print(classification_report(testY.argmax(axis=1), predIdxs,
target_names=lb.classes_))

# serialize the model to disk


print("[INFO] saving mask detector model...")
model.save("mask_detector.model", save_format="h5")

# plot the training loss and accuracy


N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")


plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig("plot.png")

Detection of Face Mask

# import the necessary packages


from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
from imutils.video import VideoStream
import numpy as np
import imutils
import time
import cv2
import os

def detect_and_predict_mask(frame, faceNet, maskNet):
    # grab the dimensions of the frame and then construct a blob
    # from it
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (224, 224),
        (104.0, 177.0, 123.0))

    # pass the blob through the network and obtain the face detections
    faceNet.setInput(blob)
    detections = faceNet.forward()
    print(detections.shape)

    # initialize our list of faces, their corresponding locations,
    # and the list of predictions from our face mask network
    faces = []
    locs = []
    preds = []

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the detection
        confidence = detections[0, 0, i, 2]

        # filter out weak detections by ensuring the confidence is
        # greater than the minimum confidence
        if confidence > 0.5:
            # compute the (x, y)-coordinates of the bounding box for
            # the object
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")

            # ensure the bounding boxes fall within the dimensions of
            # the frame
            (startX, startY) = (max(0, startX), max(0, startY))
            (endX, endY) = (min(w - 1, endX), min(h - 1, endY))

            # extract the face ROI, convert it from BGR to RGB channel
            # ordering, resize it to 224x224, and preprocess it
            face = frame[startY:endY, startX:endX]
            face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
            face = cv2.resize(face, (224, 224))
            face = img_to_array(face)
            face = preprocess_input(face)

            # add the face and bounding boxes to their respective
            # lists
            faces.append(face)
            locs.append((startX, startY, endX, endY))

    # only make predictions if at least one face was detected
    if len(faces) > 0:
        # for faster inference we'll make batch predictions on *all*
        # faces at the same time rather than one-by-one predictions
        # in the above `for` loop
        faces = np.array(faces, dtype="float32")
        preds = maskNet.predict(faces, batch_size=32)

    # return a 2-tuple of the face locations and their corresponding
    # predictions
    return (locs, preds)

# load our serialized face detector model from disk


prototxtPath = r"face_detector\deploy.prototxt"
weightsPath = r"face_detector\res10_300x300_ssd_iter_140000.caffemodel"
faceNet = cv2.dnn.readNet(prototxtPath, weightsPath)

# load the face mask detector model from disk


maskNet = load_model("mask_detector.model")

# initialize the video stream


print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()

# loop over the frames from the video stream


while True:
    # grab the frame from the threaded video stream and resize it
    # to have a maximum width of 400 pixels
    frame = vs.read()
    frame = imutils.resize(frame, width=400)

    # detect faces in the frame and determine if they are wearing a
    # face mask or not
    (locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)

    # loop over the detected face locations and their corresponding
    # predictions
    for (box, pred) in zip(locs, preds):
        # unpack the bounding box and predictions
        (startX, startY, endX, endY) = box
        (mask, withoutMask) = pred

        # determine the class label and color we'll use to draw
        # the bounding box and text
        label = "Mask" if mask > withoutMask else "No Mask"
        color = (0, 255, 0) if label == "Mask" else (0, 0, 255)

        # include the probability in the label
        label = "{}: {:.2f}%".format(label, max(mask, withoutMask) * 100)

        # display the label and bounding box rectangle on the output
        # frame
        cv2.putText(frame, label, (startX, startY - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, color, 2)
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)

    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF

    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()


Chapter 4
Results and Discussion

INPUT AND OUTPUT

Input

Output


Input

Output


REAL TIME INPUT/OUTPUT


Face Mask Detection in a webcam stream:

The flow identifies whether the person in the webcam is wearing a face mask or not. The process is two-fold:
 Identify the faces in the webcam.
 Classify the faces based on the mask.

Identify the face in the webcam: to identify the faces, a pre-trained model provided by the OpenCV framework was used. The model was trained using web images. OpenCV provides two models for this face detector, a Haar cascade-based detector and a deep-learning-based (Caffe SSD) detector; the detection code above loads the latter.

FUNCTIONAL REQUIREMENTS

The primary purpose of computer outputs is to deliver processing results to users. They are also employed to maintain a permanent record of the results for future use. In general, the following are the main types of outputs:

 External outputs, which are exported outside the company.
 Internal outputs, which are the main user and computer displays and have a place within the organization.
 Operational outputs, used only by the computer department.
 User-interface outputs, which allow the user to communicate directly with the system.
 Understanding the user's preferences, level of technology and business needs through a friendly questionnaire.

NON-FUNCTIONAL REQUIREMENTS

SYSTEM CONFIGURATION

This project can run on commodity hardware. We ran the entire project on an AMD Ryzen 5 processor with 16 GB RAM and a 4 GB Nvidia graphics processor; it has 6 cores running at 1.7 GHz and 2.1 GHz. The first part of the project is the training phase, which takes 10-15 minutes; the second part is the testing phase, which takes only a few seconds to make predictions and calculate accuracy.


HARDWARE REQUIREMENTS

• RAM: 4 GB
• Storage: 500 GB
• CPU: 2 GHz or faster
• Architecture: 32-bit or 64-bit

SOFTWARE REQUIREMENTS

• Python 3.5 in PyCharm is used for data pre-processing, model training and prediction
• Operating System: Windows 7 and above, or a Linux-based OS, or macOS
• Coding Language: Python
• Nvidia Graphic Processor


Chapter 5
Conclusion and Future Scope

CONCLUSION

As technology blooms with emerging trends, we now have a novel face mask detector which can possibly contribute to public healthcare. The architecture uses MobileNet as its backbone, so it can be used in both high- and low-computation scenarios. In order to extract more robust features, we utilize transfer learning to adopt weights from a similar task, face detection, trained on a very large dataset. We used OpenCV, TensorFlow and neural networks to detect whether people were wearing face masks or not. The models were tested with images and real-time video streams. The accuracy of the model has been achieved, and since optimization of the model is a continuous process, we are building a highly accurate solution by tuning the hyperparameters. This specific model could be used as a use case for edge analytics. Furthermore, the proposed method achieves state-of-the-art results on a public face mask dataset. The development of face mask detection lets us detect whether a person is wearing a face mask and allow their entry accordingly, which would be of great help to society.

FUTURE ENHANCEMENT

Jingdong's recognition accuracy is greater than 99 percent. We created the


MFDD, RMFRD, and SMFRD datasets, as well as a cutting-edge algorithm based
on them. The algorithm will provide contactless face authentication in settings
such as community access, campus governance, and enterprise resumption. Our
research has given the world more scientific and technological strength.


REFERENCES

[1] Militante, S. V., & Dionisio, N. V. (2020). Real-Time Face Mask Recognition with Alarm System using Deep Learning. 2020 11th IEEE Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia. https://doi.org/10.1109/ICSGRC49013.2020.9232610

[2] Guillermo, M., Pascua, A. R. A., Billones, R. K., Sybingco, E., Fillone, A., & Dadios, E. (2020). COVID-19 Risk Assessment through Multiple Face Mask Detection using MobileNetV2 DNN. The 9th International Symposium on Computational Intelligence and Industrial Applications (ISCIIA2020), Beijing, China. https://isciia2020.bit.edu.cn/docs/20201114082420135149.pdf

[3] Boyko, N., Basystiuk, O., & Shakhovska, N. (2018). Performance Evaluation
and Comparison of Software for Face Recognition, Based on Dlib and Opencv
Library. 2018 IEEE Second International Conference on Data Stream Mining &
Processing (DSMP), Lviv, Ukraine. https://doi.org/10.1109/DSMP.2018.8478556

[4] D. Dwivedi, "Data Science, Artificial Intelligence, Deep Learning, Computer Vision, Machine Learning, Data Visualization and Coffee"

[5] Pandiyan, P. (2020, December 17). Social Distance Monitoring and Face Mask Detection Using Deep Neural Network. Retrieved from: https://www.researchgate.net/publication/347439579_Social_Distance_Monitoring_and_Face_Mask_Detection_Using_Deep_Neural_Network

[6] Das, A., Ansari, M. W., & Basak, R. (2020). Covid-19 Face Mask Detection Using TensorFlow, Keras and OpenCV. 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India. https://doi.org/10.1109/INDICON49873.2020.9342585

[7] Chen, Y., Hu, M., Hua, C., Zhai, G., Zhang, J., Li, Q., & Yang, S. X. (2020).
Face Mask Assistant: Detection of Face Mask Service Stage Based on Mobile
Phone. Preprint arXiv:2010.06421. https://arxiv.org/abs/2010.06421

[8] Kalas, M. S. (2019). Real Time Face Detection and Tracking using OpenCV. International Journal of Soft Computing and Artificial Intelligence, 2(1), 41-44.


[9] Mohan, P., Paul, A. J., & Chirania, A. (2021). A Tiny CNN Architecture for Medical Face Mask Detection for Resource-Constrained Endpoints. In: Mekhilef, S., Favorskaya, M., Pandey, R. K., Shaw, R. N. (eds.) Innovations in Electrical and Electronic Engineering. Lecture Notes in Electrical Engineering, vol 756. Springer, Singapore. https://doi.org/10.1007/978-981-16-0749-3_52

[10] Salihbasic, A., & Orehovacki, T. (2019). Development of Android Application for Gender, Age and Face Recognition Using OpenCV. 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia. https://doi.org/10.23919/MIPRO.2019.8756700

[11] Loey, M., Manogaran, G., Taha, M. H. N., & Khalifa, N. E. M. (2021). Fighting against COVID-19: A novel deep learning model based on YOLOv2 with ResNet-50 for medical face mask detection. Sustainable Cities and Society, 65, 102600. https://doi.org/10.1016/j.scs.2020.102600

