COVID-19 Care: Checking Whether People Are Following Social Distancing and Wearing Face Masks or Not Using Deep Learning

International Conference on IoT based Control Networks and Intelligent Systems (ICICNIS 2020)

COVID-19 Care: Checking whether people are following social distancing and
wearing face masks or not using Deep Learning
Mrs. Vidya Zope, Nikhil Joshi, Srivatsan Iyengar, Krish Mahadevan
Vivekanand Education Society’s Institute of Technology, Chembur, Mumbai
vidya.zope@ves.ac.in
2017.nikhil.joshi@ves.ac.in
2017.srivatsan.iyengar@ves.ac.in
2017.mahadevan.krishvenkatteshwaran@ves.ac.in

Abstract

The novel coronavirus disease (COVID-19) has affected people all over the globe. Although many countries have started developing
herd immunity against the disease, no specific medication has yet been found to cure it. In India, cases continue to grow at an
alarming rate and, given the country's massive population, it is practically impossible to check manually whether every individual
is following the rules. Hence, in the proposed work, we have developed a system that detects whether people are following social
distancing and wearing face masks in images, videos and webcam feeds. The system can be deployed on CCTV cameras by
government officials to monitor the public and ensure that everyone follows the protocols.
Keywords: COVID-19, Object Detection, Deep Learning, YOLO, Social Distancing, Face Masks.

1. Introduction

The novel coronavirus has affected countries across the globe. The outbreak started in Wuhan, China in December 2019, and no
medicine has yet been found to cure the coronavirus disease (COVID-19). Though several countries have made themselves corona-free,
there is no specific medication given to COVID-positive patients. As of 23rd September 2020, more than 31 million people have
been affected by the virus, of whom more than 23 million have recovered. The total number of deaths is about 9.7 lakh [13].
In India, about 5.6 million people have contracted the virus and a total of 90,210 deaths have occurred (as of 23rd
September 2020) [13]. Among the most affected countries are India, the USA, Brazil and Russia [13]. India ranks 2nd in terms
of the total number of patients [13]. This is because of the massive population that India has. Although severe restrictions were
imposed on the country, the numbers continue to grow day by day. Despite the tremendous hard work put in by the state and central
governments, police staff and other officials, not much success has been achieved. Since there is no specific medicine against the
virus, people are required to follow the norms set by the government, such as maintaining social distancing in every public place
and wearing face masks at all times. This system is designed to help police staff and government officials by detecting whether
people are following social distancing norms and wearing face masks in images, videos and webcam feeds. It is most beneficial when
used with CCTV cameras that keep a close eye on the public. The system draws bounding boxes around the people detected in
images and videos and checks whether they are following the rules and regulations. If any person is seen violating a rule, the
bounding box is marked in red and an alarm is sounded, so that the officials are alerted and can take the necessary action.

This preprint research paper has not been peer reviewed. Electronic copy available at: https://ssrn.com/abstract=3768472

2. Problem Statement

The problem statement can be divided into 2 modules: detecting whether people are following social distancing, and
detecting whether people are wearing face masks. The detection can be done in images, videos and webcam feeds. Depending
on the results, the system alerts the officials by sounding an alarm so that the police staff or other officials can take the
necessary action.

3. Scope of the Project

The system is generic and is not affected by background disturbance, so it has a very wide scope. It can be used with
CCTV cameras in public places such as shopping malls, gardens and market areas.

4. Literature Survey

1. Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal, “Monitoring COVID-19 social distancing with person
detection and tracking via fine-tuned YOLO v3 and Deepsort techniques” - This paper presents an automated framework to
monitor social distancing using surveillance video. It uses the YOLO v3 object detection model to detect people and draw
bounding boxes around them. It also compares the results with Faster RCNN and SSD models on parameters such as loss
and FPS. Its advantages are that it presents a deep comparative study of different models and that it uses the L2 norm to
identify clusters of people not obeying social distancing. However, no alert or warning is raised for people not
following social distancing.
2. Mahdi Rezaei, Mohsen Azarmi, “DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19
Pandemic” - In this paper, a DeepSOCIAL network system is implemented which uses a webcam as the source and detects
whether people are following social distancing. The paper provides strong visualizations using tools such as heat maps and
moving trajectories. It also performs well in difficult situations such as occlusion, lighting variations, shadows and partial
visibility. However, it does not have a warning system to alert people if they are violating social distancing.

3. Shashi Yadav, “Deep Learning based Safe Social Distancing and Face Mask Detection in Public Areas for COVID19 Safety
Guidelines Adherence” - This paper combines the lightweight neural network MobileNet V2 and SSD with a transfer
learning technique to balance resource limitations against recognition accuracy, so that the model can be used for real-time
video surveillance. The work uses neural network models to analyze RTSP (Real-Time Streaming Protocol) video streams
using OpenCV and MobileNet V2. The approach of mixing modern deep learning methods with classic geometry meets
the requirement and keeps the accuracy high. The work applies transfer learning on top of the high-performing
pre-trained SSD model to detect faces, with the MobileNet V2 architecture, to create a lightweight model which can be used on
embedded devices like the Raspberry Pi.
4. Dongfang Yang, Ekim Yurtsever, Vishnu Renganathan, Keith A. Redmill, Umit Ozguner, “A Vision-based Social Distancing
and Critical Density Detection System for COVID-19” - The paper proposes the use of a monocular camera and deep
learning-based real-time object detectors to measure social distancing. The system uses a
pre-trained deep convolutional neural network (CNN) to detect individuals with bounding boxes in each monocular camera
frame. Then, detections in the image domain are transformed into real-world bird’s-eye view coordinates.
5. Kerim Kürşat Çevik, “Computer Vision Based Distance Measurement System using Stereo Camera View” - This paper
uses a stereo camera algorithm and technique to obtain the distance between two faces and the distance between the face and
the camera. The system gives a good accuracy of approximately 98% for a camera at a range of 60 cm to 120 cm. However, 2
cameras are used, which adds to the expense, and more work must be done to calibrate the two cameras for different environments.

5. Lacuna in the existing system

1. One of the issues we encountered during the literature survey was that all the systems mentioned are either hardware-based
solutions or software solutions which require different equipment depending on the application. Our system is a
generalized system which can be used at multiple places with the same results, and the software used is completely
open source, hence no expense is involved.
2. The mentioned systems do not give any alert if any rules are broken. Our system gives an alert by sounding an alarm if anyone
is found not wearing a mask or breaking the social distancing rules.
3. Camera calibration is required for social distancing detection in webcam feeds, since the measured distance between 2
people depends on their distance from the camera, the size of the bounding box, the angle of the camera and so on.

6. Proposed Solution

In the proposed system, we have used object detection techniques to identify human faces and human bodies in the given input
image or video frame. We have used a pre-trained Resnet caffemodel for detecting human faces and YOLO v3 weights for the social
distancing detector. The main reason for using pre-trained networks is that they provide excellent results in terms of accuracy and
speed, which is difficult to achieve with custom-trained networks. Also, the datasets used to train these models are huge datasets
like the ImageNet dataset or the COCO dataset, which minimizes errors and helps prevent the models from overfitting. The dataset
used for the face mask detection is an auxiliary dataset consisting of about 1500 images belonging to 2 classes: “wearing mask”
and “not wearing mask”. The dataset is not a real dataset; it was created artificially by taking normal images of human faces and
attaching a mask after identifying ROIs such as the eyes, nose, mouth and jawline. Although it is not a real dataset, it gives good
results when used for real-world predictions. The YOLO v3 weights used for the social distancing detection are trained on the COCO
dataset. For the face mask detection model, we use 80% of the data for training and 20% of the data for validation.
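
The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the code used in this work: the file names, the `split_dataset` helper and the fixed random seed are all assumptions introduced for the example.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # fixed seed only so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the ~1500 labelled face images.
samples = [(f"img_{i:04d}.jpg", "mask" if i % 2 == 0 else "no mask")
           for i in range(1500)]

train, val = split_dataset(samples)
print(len(train), len(val))
```

Shuffling before cutting keeps both classes represented in each part; a stratified split would guarantee the class ratio exactly, at the cost of a few more lines.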

Figure 1: Block Diagram of the proposed work

7. Proposed System Block Diagram

Figure 1 depicts the system block diagram. The first step in our system is data pre-processing, which involves getting
frames from the video and webcam feeds, adjusting the height and width of the images, converting to RGB format and so on. Next, the
diagram splits into 2 parts. For the face mask detection, we identify human faces in the given image or video frame through object
detection. We draw bounding boxes around the detected faces and pass the image to the deep learning model. The deep learning
model classifies each detected face into 2 classes: “mask” or “without mask”. If a person is detected as “without mask”, an alarm
is sounded.
Similarly, in the social distancing detector, after data pre-processing, the image or video frame is given to the object detector. It
identifies human bodies and draws a bounding box around each of them. Once all the bounding boxes are detected, we find the
pair-wise distance between bounding boxes. The distance used is the Euclidean distance between the centroids of 2 bounding boxes.
If this distance is less than a threshold value, we infer that those 2 people are not following social distancing. We also display the
total number of social distancing violations at the top of the screen.

8. Methodology

As we can see, the system can be divided into 2 modules: the face mask detector and the social distancing detector. The face
mask detector is a 2-stage pipeline in which the first stage detects human faces in the given input and the second stage classifies
each detected face into 2 classes: “mask” and “no mask”. If no face is detected in the first stage, no classification is done. For the
detection of human faces, we have used the pre-trained Resnet caffemodel. The architecture of the Resnet model is well suited to face
detection and, since it is a pre-trained model trained on the COCO dataset, it achieves better accuracy. In the second stage, we
classify the face into “mask” or “no mask”. We have used the pre-trained MobileNet model and modified the input layer and the head
portion by adding a few dense layers to it. The pre-trained model can be found in the Keras library of Python. It is a highly efficient
architecture which can be applied on embedded devices like the Raspberry Pi. We can also detect multiple faces in a single frame. If
a person is detected as “no mask”, an alarm is sounded.
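
The control flow of the 2-stage pipeline can be sketched as below. This is only an illustration of the structure: `detect_faces` and `classify_face` are hypothetical callables standing in for the Resnet face detector and the MobileNet-based classifier, and the stub versions used in the demo are placeholders, not real models.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def check_masks(frame, detect_faces: Callable, classify_face: Callable) -> List[Tuple[Box, str]]:
    """Stage 1 finds face boxes; stage 2 labels each cropped face."""
    results = []
    for (x1, y1, x2, y2) in detect_faces(frame):
        crop = [row[x1:x2] for row in frame[y1:y2]]   # region of interest for this face
        results.append(((x1, y1, x2, y2), classify_face(crop)))
    return results   # empty when stage 1 finds no face, so stage 2 never runs

def alarm_needed(results) -> bool:
    """The alarm is raised if at least one face is labelled 'no mask'."""
    return any(label == "no mask" for _, label in results)

# Demo with placeholder stages (assumptions for illustration only).
frame = [[0] * 4 + [255] * 4 for _ in range(4)]          # tiny grayscale "image"
stub_detector = lambda f: [(0, 0, 4, 4), (4, 0, 8, 4)]   # pretends two faces were found
stub_classifier = lambda crop: "mask" if crop[0][0] > 127 else "no mask"

results = check_masks(frame, stub_detector, stub_classifier)
print(results, alarm_needed(results))
```

Passing the two stages in as functions keeps the pipeline logic independent of the specific detector and classifier, which also allows several faces in one frame to be handled by the same loop.
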
In the second module, i.e. the social distancing detector, we have used the pre-trained YOLO weights for detecting human bodies
in the given image or video frame. The biggest advantage YOLO offers is its speed, which is close to 45 frames per second without
much change in accuracy. Unlike region proposal classification networks such as Faster RCNN, which perform detection on the
same region multiple times, YOLO works like a fully convolutional network: it splits the image into a grid and each grid cell is
classified only once. For each grid cell, we get the bounding box and the probability as the output. Once we have the bounding boxes
around the human bodies, we find the centroid of each box. Since we get the coordinates of the diagonal corners of the box, finding
the centroid is straightforward. Next, we find the distance between every pair of centroids. The distance used is the Euclidean
distance between 2 centroids. If this distance is less than a specified threshold, we infer that those 2 people are not following
social distancing. The threshold is not a constant: it is calculated from the height and width of the bounding boxes, since a person's
apparent size in pixels depends on the distance from the camera. It is assumed that the safe distance between 2 people is 2 meters.
We also maintain a set of people who are not following social distancing. Since it is a set, the same person cannot be added multiple
times. Finally, we display the total number of social distancing violations at the top of the screen.
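
A minimal sketch of the pair-wise check is given below. The exact rule used to derive the threshold from the box dimensions is not specified in the text, so the scaling here is an assumption: the mean box height of each pair is taken to correspond to an assumed average person height of 1.7 m, and the 2 m safe distance is converted to pixels on that basis. The function names and the example boxes are likewise hypothetical.

```python
import math
from itertools import combinations

SAFE_DISTANCE_M = 2.0    # safe distance stated in the text
ASSUMED_HEIGHT_M = 1.7   # assumption: average standing height of a person

def centroid(box):
    """Centre of a box given by its diagonal corners (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def pixel_threshold(box_a, box_b):
    """Convert 2 m to pixels, using the mean box height of the pair as the scale."""
    mean_height_px = ((box_a[3] - box_a[1]) + (box_b[3] - box_b[1])) / 2.0
    return SAFE_DISTANCE_M * mean_height_px / ASSUMED_HEIGHT_M

def find_violators(boxes):
    """Return the set of box indices that are too close to at least one other box."""
    violators = set()   # a set, so a person is never counted twice
    for i, j in combinations(range(len(boxes)), 2):
        (ax, ay), (bx, by) = centroid(boxes[i]), centroid(boxes[j])
        if math.hypot(ax - bx, ay - by) < pixel_threshold(boxes[i], boxes[j]):
            violators.update((i, j))
    return violators

# Hypothetical detections: two people standing close together and one far away.
boxes = [(100, 100, 160, 270), (180, 105, 240, 275), (600, 100, 660, 270)]
violators = find_violators(boxes)
print(sorted(violators), len(violators))
```

Comparing every pair costs O(n^2) distance computations per frame, which is acceptable for the number of people typically visible in a single CCTV frame.
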
As an additional feature, we also keep a record of the predictions after every test run for both modules. We maintain a
table which stores the location, the date and time, and the prediction results each time the user tests the system. We store the
number of social distancing violations for the social distancing detector and “mask” or “no mask” for the mask detector. These
results can be used for more in-depth analysis, such as the average number of social distancing violations, the location at which
violations are most frequent, the time at which violations peak and so on. This can help police staff and other officials to exercise
better control over a particular location or time period, so that community spread of the virus can be controlled as far as possible.
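
One possible shape for such a record table is sketched below using Python's built-in `sqlite3` module. The schema, column names and sample rows are assumptions made for illustration; the text does not specify how the table is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for the sketch only
conn.execute(
    """CREATE TABLE predictions (
           location   TEXT,
           tested_at  TEXT,      -- date and time of the test
           module     TEXT,      -- 'distancing' or 'mask'
           violations INTEGER,   -- filled for the social distancing detector
           mask_label TEXT       -- 'mask' / 'no mask' for the mask detector
       )"""
)

# Hypothetical records; real rows would be written after each test run.
rows = [
    ("Market", "2020-09-23 18:05", "distancing", 4, None),
    ("Market", "2020-09-24 18:10", "distancing", 6, None),
    ("Mall",   "2020-09-23 12:30", "distancing", 1, None),
    ("Mall",   "2020-09-23 12:31", "mask", None, "no mask"),
]
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?, ?)", rows)

# Average number of distancing violations per location, worst location first.
worst = conn.execute(
    """SELECT location, AVG(violations) AS avg_violations
       FROM predictions
       WHERE module = 'distancing'
       GROUP BY location
       ORDER BY avg_violations DESC"""
).fetchall()
print(worst)
```

The same table can answer the time-of-day question by grouping on a substring of `tested_at` instead of `location`.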

The entire system is built in Python, since developing, training and testing the models is much easier in
Python through its various libraries. We have built a web application using the Django web framework which allows users to
test our deep learning models and get the predictions.

9. Results

For the face mask detector model, we have plotted the model accuracy and loss against the number of epochs, as shown
below:

Figure 2: Graph between model accuracy and loss vs epochs

We have also calculated various evaluation metrics such as accuracy, loss, precision and recall for the trained model on the
training and validation datasets using the Keras library of Python. The results can be found below:

Metric         Training set    Validation set
Accuracy       99.91%          98.91%
Loss           0.0043          0.0203
Precision      99.81%          98.81%
Recall         99.90%          98.91%
F1-Score       99.85%          98.91%
Specificity    99.85%          98.91%
Sensitivity    99.85%          98.91%
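
For reference, the standard definitions of these metrics in terms of confusion-matrix counts can be sketched as follows. The counts in the example are hypothetical and are not taken from our experiments; note that, under these standard definitions, sensitivity is another name for recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts, with "mask" treated as the positive class.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```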

Later, we tested the Face Mask Detection module on both positive and negative images and the results are shown below:

Figure 3: Output of a person wearing mask (Image source: Google)

Figure 4: Output of a person NOT wearing mask (Image source: Google)

Finally, we also tested the Social Distancing module on a few test images, and one of the outputs is shown below:

Figure 5: Social Distancing Detector Output (Image source: Google)

10. Conclusion

Thus, we have presented an effective solution for the tough period we are all facing. This project will be of
great help to government officials in keeping an eye on the public and seeing whether they are following the rules. In turn, it
will benefit the public, because following the rules helps keep them safe from the rampant pandemic.

11. Future Work

In the first stage of the face mask detection, i.e. face detection, if the input image or video frame is too obscured, the Resnet
model will not be able to identify human faces, and hence the mask detector will not work. To overcome this limitation, an
end-to-end object detector can be developed.

References

[1] Narinder Singh Punn, Sanjay Kumar Sonbhadra, Sonali Agarwal, “Monitoring COVID-19 social distancing with person detection and tracking
via fine-tuned YOLO v3 and Deepsort techniques”, arXiv:2005.01385v2 [cs.CV], 6 May 2020.
[2] Mahdi Rezaei, Mohsen Azarmi, “DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19 Pandemic”, MDPI article,
published in October 2020.
[3] Shashi Yadav, “Deep Learning based Safe Social Distancing and Face Mask Detection in Public Areas for COVID19 Safety Guidelines Adherence”,
International Journal for Research in Applied Science Engineering Technology (IJRASET), Volume 8, Issue VII, July 2020.
[4] Dongfang Yang, Ekim Yurtsever, Vishnu Renganathan, Keith A. Redmill, Umit Ozguner, “A Vision-based Social Distancing and Critical Density
Detection System for COVID-19”, arXiv:2007.03578, July 2020.
[5] Kerim Kürşat Çevik, “Computer Vision Based Distance Measurement System using Stereo Camera View”, 2019 3rd International Symposium on
Multidisciplinary Studies and Innovative Technologies (ISMSIT), October 2019.
[6] Rinkal Keniya, Ninad Mehendale, “Real-time social distancing detector using SocialdistancingNet19 deep learning network”, SSRN preprint, August 2020.
[7] Vincent Chi-Chung Cheng, Shuk-Ching Wong, Vivien Wai-Man Chuang, Simon Yung-Chun So, Jonathan Hon-Kwan Chen, Siddharth Sridhar,
Kelvin Kai-Wang To, Jasper Fuk-Woo Chan, Ivan Fan-Ngai Hung, Pak-Leung Ho, Kwok-Yung Yuen, “The role of community-wide wearing of face mask
for control of coronavirus disease 2019 (COVID-19) epidemic due to SARS-CoV-2”, Journal of Infection, Volume 81, Issue 1, Pages 107-114, July 2020.
[8] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, “You Only Look Once: Unified, Real-Time Object Detection”, The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 779-788, 2016.
[9] Huaizu Jiang, Erik Learned-Miller, “Face Detection with the Faster R-CNN”, arXiv:1606.03473v1 [cs.CV], 10 Jun 2016.
[10] Sachin Sudhakar Farfade, Mohammad Saberian, Li-Jia Li, “Multi-view Face Detection Using Deep Convolutional Neural Networks”, ICMR ’15:
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, June 2015.
[11] Fengwei Yu, Wenbo Li, Quanquan Li, Yu Liu, Xiaohua Shi, Junjie Yan, “POI: Multiple Object Tracking with High Performance Detection and
Appearance Feature”, In: Hua G., Jégou H. (eds) Computer Vision – ECCV 2016 Workshops. ECCV 2016. Lecture Notes in Computer Science,
vol 9914, Springer, 19 June 2016.
[12] Yan Zhang, Stephen J. Kiselewich, William A. Bauson, Riad Hammoud, “Robust Moving Object Detection at Distance in the Visible Spectrum and
Beyond Using A Moving Camera”, 24th ACM international conference on Multimedia, 22 August 2016.
[13] Coronavirus Updates: https://www.worldometers.info/coronavirus/
