
SMART SOCIAL DISTANCING USING ARTIFICIAL

INTELLIGENCE DURING PANDEMIC

A PROJECT REPORT

Submitted by

VINOTH D
412719405001

In partial fulfillment for the award of the degree of

MASTER OF ENGINEERING
in

COMPUTER SCIENCE AND ENGINEERING

TAGORE ENGINEERING COLLEGE


RATHINAMANGALAM
CHENNAI-600 127
ANNA UNIVERSITY:: CHENNAI 600 025

DECEMBER 2020

ANNA UNIVERSITY :: CHENNAI 600 025
BONAFIDE CERTIFICATE

Certified that this project report “SMART SOCIAL DISTANCING USING


ARTIFICIAL INTELLIGENCE DURING PANDEMIC” is the bonafide work of
“VINOTH D (412719405001)” who carried out the project work under my
supervision.

SIGNATURE                                          SIGNATURE

Dr. S. SURENDRAN, M.E., Ph.D.                      Mr. SUDHEER REDDY BANDI, M.E.,
HEAD OF THE DEPARTMENT                             SUPERVISOR
Professor,                                         Assistant Professor,
Department of CSE,                                 Department of CSE,
Tagore Engineering College,                        Tagore Engineering College,
Rathinamangalam, Vandalur,                         Rathinamangalam, Vandalur,
Chennai – 600127                                   Chennai – 600127

Submitted for the Project Report Viva Voce held on

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGMENT

My heartfelt thanks go to Prof. Dr. M. Mala, M.A., M.Phil., Chairperson of Tagore Engineering
College, Rathinamangalam, for providing all the necessary infrastructure and facilities and for her
support in completing this project successfully.

I extend my sincere gratitude to Dr. L. Raja, M.E., Ph.D., Principal, Tagore Engineering College, for
his encouragement and moral support during the course of this project.

I am extremely happy to express my heartfelt gratitude to the Head of our Department, Dr. S.
SURENDRAN, M.E., Ph.D., for his valuable suggestions, which helped me complete the project
successfully.

I express my sincere gratitude to my supervisor, Mr. SUDHEER REDDY BANDI, M.E., for extending
all possible help for this project work.

My sincere thanks to all teaching and non-teaching staff who have rendered help during the various
stages of my work.

ABSTRACT

Social distancing, also called “physical distancing,” means keeping a safe space between yourself and
other people who are not from your household. To practice social or physical distancing, stay at least
6 feet (about 2 arms’ length) from other people who are not from your household in both indoor and
outdoor spaces. Social distancing should be practiced in combination with other everyday preventive
actions to reduce the spread of COVID-19.

Social distancing is crucial in our fight against COVID-19; it may remain an essential part of our
lives for months and reshape how we interact with the outside world forever. The current way of
social distancing at stores is not a viable long term solution, and we need to come up with a better
way to restore the casual life while ensuring the safety of everyone. The Smart Social Distancing
application uses AI to detect social distancing violations in real-time. It works with the existing
cameras installed at any workplace, hospital, school, or elsewhere.

To ensure users’ privacy, the Smart Social Distancing application avoids transmitting the videos over
the internet to the Cloud. Instead, it is designed to be deployed on small, low-power edge devices that
process the data locally. The Smart Social Distancing application uses edge IoT devices such as the
NVIDIA Jetson Nano to monitor social distancing in real time in a privacy-preserving manner.

LIST OF ABBREVIATIONS

AI – Artificial Intelligence
IoT – Internet of Things
COCO – Common Objects in Context
COVID-19 – Coronavirus Disease 2019
UI – User Interface

TABLE OF CONTENTS
ABSTRACT
LIST OF ABBREVIATIONS

CHAPTER 1  INTRODUCTION
    1.1 WHY SOCIAL DISTANCING
    1.2 OBJECTIVE

CHAPTER 2  LITERATURE STUDY
    2.1 INTRODUCTION

CHAPTER 3  PROBLEM DEFINITION
    3.1 Existing System
    3.2 Proposed System
        3.2.1 Advantages of Jetson Nano
        3.2.2 Disadvantages of Jetson Nano

CHAPTER 4  SYSTEM REQUIREMENTS SPECIFICATION
    4.1 Hardware Requirements
    4.2 Software Requirements
    4.3 DOMAIN DESCRIPTION
        4.3.1 Machine Learning
        4.3.2 Applications of Machine Learning
        4.3.3 Steps Involved in Machine Learning
        4.3.4 Types of Learning
    4.4 LANGUAGE DESCRIPTION
        4.4.1 About Python Language
        4.4.2 Python Programming Characteristics
        4.4.3 Applications of Python Programming
        4.4.4 Data Set and Model
        4.4.5 Container

CHAPTER 5  UML REPRESENTATION
    Use Case Diagram
    Sequence Diagram
    Activity Diagram

CHAPTER 6  CORE ARCHITECTURE
    6.1.1 Computer Vision Engine
    6.1.2 Data Pre-Processing
    6.1.3 Model Inference
    6.1.4 Bounding boxes post-processing
    6.1.5 Compute distances
    6.1.6 User Interface (UI) and Web Application

CHAPTER 7  TESTING
    7.1.1 Software Testing
    7.1.2 Test Case
    7.1.3 Testing Techniques

CHAPTER 8  SMART SOCIAL DISTANCING ROADMAP

APPENDIX
    Source Code
    Screenshot

REFERENCES

CHAPTER 1
INTRODUCTION

Smart Social Distancing aims to quantify social distancing measures using edge computer vision
systems. Since all computation runs on the device, it requires minimal setup and minimizes privacy
and security concerns. It can be used in retail, workplaces, schools, construction sites, healthcare
facilities, factories, etc.

We can run this application on edge devices such as NVIDIA's Jetson Nano. This application
measures social distancing rates and gives proper notifications each time someone ignores social
distancing rules. By generating and analyzing data, this solution outputs statistics about high-traffic
areas that are at high risk of exposure to COVID-19 or any other contagious virus.

1.1 WHY SOCIAL DISTANCING

COVID-19 spreads mainly among people who are in close contact (within about 6 feet) for a
prolonged period. Spread happens when an infected person coughs, sneezes, or talks, and droplets
from their mouth or nose are launched into the air and land in the mouths or noses of people nearby.
The droplets can also be inhaled into the lungs. Recent studies indicate that people who are infected
but do not have symptoms likely also play a role in the spread of COVID-19. Since people can spread
the virus before they know they are sick, it is important to stay at least 6 feet away from others when
possible, even if you—or they—do not have any symptoms. Social distancing is especially important
for people who are at higher risk for severe illness from COVID-19.

1.2 OBJECTIVE
Social distancing is crucial in our fight against COVID-19; it may remain an essential part of our
lives for months and reshape how we interact with the outside world forever. The current way of
social distancing at stores is not a viable long-term solution, and we need to come up with a better
way to restore the shopping experience while ensuring the safety of everyone.

We can always be smarter about distancing ourselves socially and make shopping more efficient and
safer during COVID-19 and beyond. With the help of Artificial Intelligence (AI), the same technology
that is the backbone of self-driving Teslas and Netflix recommendations, combined with edge
computing, the technology that is reshaping the Internet of Things (IoT), we can practice social
distancing with minimal disruption to our daily lives. Imagine a seamless integration of social
distancing into our shopping experience powered by big data and AI. Using the available data, stores
can better implement social distancing. For example, stores can change aisle traffic in real time,
identify hotspots and redistribute products to eliminate them, and vary the number of cashiers to
reduce long wait times while eliminating the risk of exposure for shoppers and workers.

CHAPTER 2
LITERATURE STUDY
2.1 INTRODUCTION
The purpose of a literature survey is to give complete information about the reference papers. The
goal of the literature review is to identify the technical papers that form the foundation of this project.
A literature survey is the documentation of a comprehensive review of the published and unpublished
work from secondary sources of data in the areas of specific interest to the researcher. The library is
a rich storage base for secondary data, and researchers used to spend several weeks and sometimes
months going through books, journals, newspapers, magazines, conference proceedings, doctoral
dissertations, master's theses, government publications and financial reports to find information on
their research topic. With computerized databases now readily available and accessible, the literature
search is much speedier and easier. The researcher can start the literature survey even as the
information from the unstructured and structured interviews is being gathered. Reviewing the
literature on the topic area at this time helps the researcher to focus further interviews more
meaningfully on certain aspects found to be important in the published studies, even if these had not
surfaced during the earlier questioning. So, the literature survey is important for gathering the
secondary data for the research, which might prove very helpful in the research. The literature survey
can be conducted for several reasons, and the literature review can be in any area of the business.

Title: Smartphone app for efficient contact tracing

Author: Singapore Government.

Year: 2020

Description:

The smartphone app is able to identify people who have been in close proximity – within 2m for
at least 30 minutes – to coronavirus patients using wireless Bluetooth technology, said its developers,
the Government Technology Agency (GovTech) and the Ministry of Health (MOH), on Friday (March
20). "This is especially useful in cases where the infected persons do not know everyone whom they
had been in close proximity with for an extended duration," said its developers.

While use of the app is not compulsory, those who use it have to turn on the Bluetooth settings in
their phones for tracing to be done. They also need to enable push notifications and location
permissions in the app, which is available on the Apple App Store or the Google Play store.

Title: Novel Economical Social Distancing Smart Device for COVID-19

Authors: Rahul Reddy Nadikattu, Sikender Mohsienuddin Mohammad, Pawan Whig.

Year: 2020

Description:

In the era of COVID-19, there is a panic-like situation everywhere, and according to the World Health
Organization, social distancing will prove to be the only solution. In this research paper, an innovative
localization method is proposed to track humans' positions in an outdoor environment based on
sensors. With the help of artificial intelligence, this novel smart device is handy for maintaining social
distancing as well as detecting patients with COVID-19 symptoms, thereby improving safety. In this
COVID-19 environment, where everyone is conscious of their safety, we came up with the idea of
this novel device. Most of the time, people on the roadside watch what is in front of them but are not
able to look after what is going on behind them. The device alerts the person if someone comes within
the critical range of six feet around them. The method is reasonably accurate and can be very useful
in maintaining social distancing. The sensor model used is described, and the expected errors in
distance estimates are analyzed and modeled. Finally, the experimental results are presented.

CHAPTER 3
PROBLEM DEFINITION
3.1 Existing System
Coronavirus Disease 2019 (COVID-19) has caused chaos and fear all over the world. All non-
essential businesses and services have been shut down in 42 states due to the COVID-19 pandemic,
and many businesses are struggling to survive. This pandemic will reshape how we live our lives for
many years to come. Essential businesses such as grocery stores and pharmacies remain open, but
they are hotspots for COVID-19. The workers are at a high risk of contracting the virus, and so are the
shoppers. White House’s response coordinator Deborah Birx said in a statement that “This is the
moment not to be going to the grocery store, not going to the pharmacy, but doing everything you can
to keep your family and your friends safe.” People are afraid to go out shopping for their essential
needs, but with few alternatives, many have no choice but to make the trip. So how do we protect
shoppers and workers at the store?

We are experiencing a deadly pandemic, but we still need access to essential goods and services.
We also know that, according to the Centers for Disease Control and Prevention (CDC), to stay safe
during this pandemic, we need to practice social distancing rules and keep a minimum distance of 6 feet from
people outside of our household. Essential businesses are allowed to be open so long that they follow
social distancing rules. There is a problem; however, stores have limited space, and people might
break social distancing rules without even realizing it. So how do we work with the limited options
we have to make shopping for essential goods safe?

In response to social distancing rules, stores were asked to limit the number of people who can be
inside at the same time. Shoppers are forced to keep their distance while waiting in line to get in or
during checkout. Some stores are marking where people should be standing, spacing everybody apart,
and some supermarkets are testing one-way aisles. Besides the entrance and checkout lines, there is
nowhere else in the store where these safety measures are enforced. The best we can hope for is that
shoppers will honor the social distancing rules while they shop. So far, creating long lines and
increasing wait times for getting groceries has been our best hope for minimizing the spread of
COVID-19, but there are drawbacks to this approach. Longer wait times mean more time spent outside
the house, and more time spent in public means a higher risk of exposure to COVID-19. Longer wait
times have also led to overbuying, which stresses the supply chain, results in waste, and causes
shortages for others. Many stores have brought in more staff to help manage the social distancing
etiquette. This is not only resource-intensive but puts the health of personnel at risk. More importantly,

what is the guarantee that all shoppers will always maintain a safe distance? Social distancing might
work in theory but might not be practical without the appropriate measures and tools. Aisles are
sometimes too narrow, high-demand items might be placed next to each other, and shoppers might
simply forget or misjudge the safe distance. What if two customers want a product from the same
shelf? It quickly becomes apparent that social distancing at stores is not as simple as limiting the
store’s total capacity. Ad hoc solutions to social distancing in the absence of relevant data could lead
to unexpected outcomes such as high-traffic areas and a higher risk of exposure.

3.2 Proposed System

Our approach uses artificial intelligence and edge AI devices such as Jetson Nano or Edge TPU to
track people in different environments and measure adherence to social distancing guidelines. It can
give proper notifications each time social distancing rules are violated. Our solution can be modified
to work in real time by processing a USB camera or Camera Serial Interface (CSI) feed. The demo
runs on a video that can be provided via the configuration file. The solution is designed to run on a
small-form-factor AI board, so we have chosen the NVIDIA Jetson Nano. NVIDIA® Jetson Nano™ lets you bring
incredible new capabilities to millions of small, power-efficient AI systems. It opens new worlds of
embedded IoT applications, including entry-level Network Video Recorders (NVRs), home robots,
and intelligent gateways with full analytics capabilities. Jetson Nano is also the perfect tool to start
learning about AI and robotics in real-world settings, with ready-to-try projects and the support of an
active and passionate developer community.

3.2.1 Advantages of Jetson Nano

 4 x USB 3.0 Type-A ports for better connectivity to depth cameras and external accessories.
 4K video processing capability, unlike the Raspberry Pi.
 Multiple monitors can be hooked up.
 Selectable power source.

3.2.2 Disadvantages of Jetson Nano

 Limited RAM bandwidth of 25.6 GB/s (still better than the Raspberry Pi).

 microSD as the main storage device limits disk performance; using the M.2 Key-E with
PCIe x1 slot for an SSD, or a USB HDD/SSD, can solve this problem.
 Less software support, as the architecture is AArch64; much software will not work out
of the box.

CHAPTER 4
SYSTEM REQUIREMENTS SPECIFICATION

4.1 Hardware Requirements

System           : NVIDIA Jetson Nano
Hard disk        : 500 GB memory card
Monitor          : 15" VGA
Utilities        : Keyboard & Mouse

4.2 Software Requirements

Operating System : Ubuntu 18.04 LTS
Coding Language  : Python
IDE              : Jupyter
Model & Dataset  : SSD MobileNet V2 & COCO
Utilities        : NVIDIA JetPack 4.3 with nvidia-docker
Modules          : SciPy, TensorRT
Container        : Docker

4.3 DOMAIN DESCRIPTION


In a high-level study, we can divide our code into two primary modules, which we review as
follows:

4.3.1 Machine Learning


Machine Learning (ML) is automated learning with little or no human intervention. It involves
programming computers so that they learn from the available inputs. The main purpose of machine
learning is to explore and construct algorithms that can learn from previous data and make
predictions on new input data. To solve this problem, algorithms are developed that build knowledge
from specific data and past experience by applying the principles of statistical science, probability,
logic, mathematical optimization, reinforcement learning, and control theory.

4.3.2 Applications of Machine Learning
 Vision processing
 Language processing
 Pattern recognition
 Games
 Data mining
 Expert systems
 Robotics

4.3.3 Steps Involved in Machine Learning


 Defining a Problem
 Preparing Data
 Evaluating Algorithms
 Improving Results
 Presenting Results

4.3.4 Types of Learning


Supervised (inductive) learning
Training data includes desired outputs
Unsupervised learning
Training data does not include desired outputs
Reinforcement learning
Rewards from sequence of actions
Supervised Learning
Supervised learning is commonly used in real world applications, such as face and speech
recognition, products or movie recommendations, and sales forecasting. Supervised learning can be
further classified into two types – Regression and Classification.

Regression
Regression trains on and predicts a continuous-valued response, for example predicting real estate
prices.

Classification
It attempts to find the appropriate class label, such as analyzing positive/negative sentiment, male and
female persons, benign and malignant tumors, secure and unsecure loans etc.
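
As a concrete illustration, a minimal supervised classification sketch using scikit-learn is shown below (the dataset and model choice are illustrative examples, not part of this project):

# A small supervised classification sketch with scikit-learn (illustrative only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                        # labeled training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=200)                 # a simple classifier
model.fit(X_train, y_train)                              # learn from labeled examples
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))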

Unsupervised Learning

Unsupervised learning is used to detect anomalies, outliers, such as fraud or defective equipment, or
to group customers with similar behaviors for a sales campaign. It is the opposite of supervised
learning. There is no labeled data here. Unsupervised learning algorithms are extremely powerful
tools for analyzing data and for identifying patterns and trends. They are most commonly used for
clustering similar input into logical groups. Unsupervised learning algorithms include k-means
clustering and hierarchical clustering, among others.
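
A minimal unsupervised clustering sketch with scikit-learn follows (the toy data are illustrative):

# A small unsupervised clustering sketch with scikit-learn (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]])           # no labels are provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)                                    # group membership discovered from the data
print(kmeans.cluster_centers_)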

Reinforcement Learning
Here learning data gives feedback so that the system adjusts to dynamic conditions in order to achieve
a certain objective. The system evaluates its performance based on the feedback responses and reacts
accordingly. The best-known instances include self-driving cars and the Go-playing program AlphaGo.
A typical application is sequential decision making (e.g., a robot or a game-playing engine).

4.4 LANGUAGE DESCRIPTION

4.4.1 About Python Language

Python is an object-oriented programming language created by Guido van Rossum in 1989. It is
ideally designed for rapid prototyping of complex applications. It has interfaces to many OS system
calls and libraries and is extensible in C or C++. Many large companies that use the Python
programming language include NASA, Google, YouTube, BitTorrent, etc. Python programming is
widely used in Artificial Intelligence, Natural Language Generation, Neural Networks and other
advanced fields of Computer Science. Python has a deep focus on code readability.

4.4.2 Python Programming Characteristics

 It provides rich data types and an easier-to-read syntax than many other programming languages.

 It is a platform-independent scripted language with full access to operating system APIs.

 Compared to other programming languages, it allows more run-time flexibility.

 It includes the basic text manipulation facilities of Perl and Awk.

 A module in Python may have one or more classes and free functions.

 Libraries in Python are cross-platform compatible with Linux, Macintosh, and Windows.

 For building large applications, Python can be compiled to byte-code.

 Python supports functional and structured programming as well as OOP.

 It supports an interactive mode that allows interactive testing and debugging of snippets of code.

 In Python, since there is no compilation step, editing, debugging and testing are fast.

4.4.3 Applications of Python Programming

Web Applications

You can create scalable web applications using frameworks and CMSs (Content Management
Systems) that are built on Python. Some of the popular platforms for creating web applications are
Django, Flask, Pyramid, Plone, and Django CMS. Sites like Mozilla, Reddit, Instagram and PBS are
written in Python.

Scientific and Numeric Computing


There are numerous libraries available in Python for scientific and numeric computing. There are
libraries like SciPy and NumPy that are used in general-purpose computing, and there are domain-
specific libraries like EarthPy for earth science, AstroPy for astronomy, and so on. Also, the language
is heavily used in machine learning, data mining and deep learning.

Creating software Prototypes


Python is slow compared to compiled languages like C++ and Java. It might not be a good choice if
resources are limited and efficiency is a must. However, Python is a great language for creating
prototypes. For example: You can use Pygame (library for creating games) to create your game's
prototype first. If you like the prototype, you can use a language like C++ to create the actual game.

Good Language to Teach Programming


Python is used by many companies to teach programming to kids and newbies. It is a good language
with a lot of features and capabilities. Yet, it's one of the easiest languages to learn because of its
simple, easy-to-use syntax.

4.4.4 Data Set and Model

Common Object in Context


The core application is trained with COCO, a large-scale object detection, segmentation, and
captioning dataset. COCO has several features:

 Object segmentation
 Recognition in context
 Superpixel stuff segmentation
 330K images (>200K labeled)
 1.5 million object instances
 80 object categories
 91 stuff categories
 5 captions per image
 250,000 people with keypoints
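
For reference, a minimal sketch of browsing COCO-style annotations with the pycocotools library is shown below (the annotation file path is illustrative):

# Browse COCO-style annotations with pycocotools (annotation path is illustrative).
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")        # load the annotation file
person_cat_ids = coco.getCatIds(catNms=["person"])       # the class this project cares about
img_ids = coco.getImgIds(catIds=person_cat_ids)          # images containing people
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=person_cat_ids, iscrowd=None)
annotations = coco.loadAnns(ann_ids)                     # bounding boxes, segmentations, etc.
print(len(img_ids), "images with people;", len(annotations), "person annotations in the first one")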

In today’s world of deep learning if data is King, making sure it is in the right format might just be
Queen. Or at least Jack or 10. Anyway, it is important. After working hard to collect your images and
annotating all the objects, you must decide what format you are going to use to store all that info.
This may not seem like a big decision compared to all the other things you have to worry about, but
if you want to quickly see how different models perform on your data, it’s vital to get this step right.

Back in 2014 Microsoft created a dataset called COCO (Common Objects in COntext) to help
advance research in object recognition and scene understanding. COCO was one of the first large
scale datasets to annotate objects with more than just bounding boxes, and because of that it became
a popular benchmark to use when testing out new detection models. The format COCO uses to store
annotations has since become a de facto standard, and if you can convert your dataset to its style, a
whole world of state-of-the-art model implementations opens.

This is where pycococreator comes in. pycococreator takes care of all the annotation formatting
details and will help convert your data into the COCO format. Let’s see how to use it by working
with a toy dataset for detecting squares, triangles, and circles.

Figure 4.1 Shapes of Different datasets.

The shapes dataset has 500 128x128px jpeg images of random colored and sized circles, squares, and
triangles on a random colored background. It also has binary mask annotations encoded in png of
each of the shapes. This binary mask format is fairly easy to understand and create. That’s why it’s
the format your dataset needs to be in before you can use pycococreator to create your COCO-styled
version. You might be thinking, “why not just use the png binary mask format if it’s so easy to
understand?” Remember, the whole reason we’re trying to make a COCO dataset isn’t because it’s
the best way of representing annotated images, but because everyone else is using it. The example
script we’ll use to create the COCO-style dataset expects your images and annotations to have the
following structure:

shapes
└───train
    └───annotations
    │       <image_id>_<object_class_name>_<annotation_id>.png
    │       ...
    └───<subset><year>
            <image_id>.jpeg
            ...

In the shapes example, subset is “shapes_train”, year is “2018”, and object_class_name is “square”,
“triangle”, or “circle”. You would generally also have separate “validate” and “test” datasets.

{
    "info": info,
    "licenses": [license],
    "categories": [category],
    "images": [image],
    "annotations": [annotation]
}

The “info”, “licenses”, “categories”, and “images” lists are straightforward to create, but the
“annotations” can be a bit tricky. Luckily we have pycococreator to handle that part for us. Let’s start
out by getting the easy stuff out of the way first. We’ll describe our dataset using python lists and
dictionaries and later export them to json.

INFO = {
    "description": "Example Dataset",
    "url": "https://github.com/waspinator/pycococreator",
    "version": "0.1.0",
    "year": 2018,
    "contributor": "waspinator",
    "date_created": datetime.datetime.utcnow().isoformat(' ')
}

LICENSES = [
    {
        "id": 1,
        "name": "Attribution-NonCommercial-ShareAlike License",
        "url": "http://creativecommons.org/licenses/by-nc-sa/2.0/"
    }
]

CATEGORIES = [
    {
        'id': 1,
        'name': 'square',
        'supercategory': 'shape',
    },
    {
        'id': 2,
        'name': 'circle',
        'supercategory': 'shape',
    },
    {
        'id': 3,
        'name': 'triangle',
        'supercategory': 'shape',
    },
]
Okay, with the first three done, we can continue with images and annotations. All we have to do is
loop through each image jpeg and its corresponding annotation pngs and let pycococreator generate
the correctly formatted items. In the full example script, lines 90 and 91 create our image entries,
while lines 112-114 take care of the annotations.

# filter for jpeg images
for root, _, files in os.walk(IMAGE_DIR):
    image_files = filter_for_jpeg(root, files)

    # go through each image
    for image_filename in image_files:
        image = Image.open(image_filename)
        image_info = pycococreatortools.create_image_info(
            image_id, os.path.basename(image_filename), image.size)
        coco_output["images"].append(image_info)

        # filter for associated png annotations
        for root, _, files in os.walk(ANNOTATION_DIR):
            annotation_files = filter_for_annotations(root, files, image_filename)

            # go through each associated annotation
            for annotation_filename in annotation_files:

                if 'square' in annotation_filename:
                    class_id = 1
                elif 'circle' in annotation_filename:
                    class_id = 2
                else:
                    class_id = 3

                category_info = {'id': class_id, 'is_crowd': 'crowd' in image_filename}

                binary_mask = np.asarray(Image.open(annotation_filename)
                                         .convert('1')).astype(np.uint8)

                annotation_info = pycococreatortools.create_annotation_info(
                    segmentation_id, image_id, category_info, binary_mask,
                    image.size, tolerance=2)

                if annotation_info is not None:
                    coco_output["annotations"].append(annotation_info)

There are two types of annotations COCO supports, and their format depends on whether the
annotation is of a single object or a “crowd” of objects. Single objects are encoded using a list of
points along their contours, while crowds are encoded using column-major RLE (Run Length
Encoding). RLE is a compression method that works by replacing repeated values with the number of
times they repeat. For example, 0 0 1 1 1 0 1 would become 2 3 1 1. Column-major just means that
instead of reading a binary mask array left-to-right along rows, we read it up-to-down along columns.
The tolerance option in pycococreatortools.create_annotation_info() changes how precisely contours
will be recorded for individual objects. The higher the number, the lower the quality of the annotation,
but it also means a lower file size; 2 is usually a good value to start with. COCO uses JSON
(JavaScript Object Notation) to encode information about a dataset. There are several variations of
COCO, depending on whether it is being used for object instances, object keypoints, or image
captions. We are interested in the object instances format, whose top-level structure was shown earlier.
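
The RLE idea can be illustrated with a short sketch (a conceptual, uncompressed encoder; COCO itself stores compressed RLE, and the function name here is hypothetical):

# A minimal sketch of run-length encoding as used conceptually by COCO "crowd"
# annotations: counts of alternating 0s and 1s, starting with the count of 0s.
def rle_encode(bits):
    counts = []
    current, run = 0, 0  # start by counting zeros
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current, run = b, 1
    counts.append(run)
    return counts

print(rle_encode([0, 0, 1, 1, 1, 0, 1]))  # -> [2, 3, 1, 1], matching the example above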

SSD-MobileNet-v2 is built using the COCO dataset on the Jetson Nano with Docker technology to
make the model more portable.

SSD Mobilenet V2 object detection on Jetson Nano at 20+ FPS

The MobileNet V2 object detection model is built on the Jetson Nano and runs at 20+ frames per
second (FPS). This is roughly twice as fast, while cutting memory consumption down to only 32.5%
of the total 4 GB memory on the Jetson Nano (i.e., around 1.3 GB), leaving plenty of memory for
running other workloads. CPU usage is also quite low, at only around 10% across the quad-core
processor.

The MobileNet V2 detection pipeline is built upon the latest JetPack 4.3 - L4T R32.3.1 base image. To
make an inference with a TensorRT engine file, two Python packages are required: TensorRT and
PyCUDA. Building the PyCUDA package from source on the Jetson Nano can take some time, so we
decided to pack the pre-built package into a wheel file and make the Docker build process much
smoother. Note that PyCUDA pre-built with JetPack 4.3 is not compatible with older versions of
JetPack, and vice versa. The TensorRT Python package comes from the Jetson Nano itself, at the
directory /usr/lib/python3.6/dist-packages/tensorrt/. This approach avoids installing the TensorFlow
GPU Python module, since inference runs from the TensorRT engine file on top of JetPack 4.3.

Now, for the limitation of the TensorRT engine file approach: it simply will not work across different
JetPack versions. The reason lies in how the engine file is built: TensorRT searches through CUDA
kernels for the fastest implementation available, so the engine must be built on the same GPU and
software stack (CUDA, cuDNN, TensorRT, etc.) on which the optimized engine will run. A TensorRT
engine file is like a dress tailored exclusively for one setup, but its performance is amazing when
fitted on the right person/dev board.

4.4.5 Container

A container is a standard unit of software that packages up code and all its dependencies, so the
application runs quickly and reliably from one computing environment to another. A Docker container
image is a lightweight, standalone, executable package of software that includes everything needed
to run an application: code, runtime, system tools, system libraries and settings.

Docker
Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.
Containers allow a developer to package up an application with all the parts it needs, such as libraries
and other dependencies, and deploy it as one package. By doing so, thanks to the container, the
developer can rest assured that the application will run on any other Linux machine regardless of any
customized settings that machine might have that could differ from the machine used for writing and
testing the code.

CHAPTER 5

UML Representation
The use case diagram below depicts the relationship between the actor and the procedures involved.

The sequence diagram is used to depict a set of events in order; the activities are ordered in the way
they occur. Here, they start from the multiple frames and end at the user interface.

Sequence Diagram

[Sequence diagram participants: Multiple Frames → One Frame → Pre-processing → COCO Dataset
→ Calibration Method → User Interface]

Activity Diagram

[Activity flow: COCO Dataset → Object Detection → Distance Calculation → Calibration Method
→ User Interface]

CHAPTER 6
Core Architecture

[Core architecture pipeline: Video Input → Video Frames → One Frame → Preprocessing →
Model Inference (SSD-MobileNet Model) → Post-Processing → Calculate Distances → Visualization]

6.1.1 Computer Vision Engine

The computational core of this application lies in this module. The Distancing class defined in the
core.py file takes video frames as input and returns the coordinates of each detected object as well as
a matrix of distances measured between each pair. The coordinates are further used by the update
method in the WebGUI class to draw bounding boxes around each person.
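
A minimal sketch of this interface is given below. It is not the project's exact implementation; the method and attribute names are illustrative, assuming a detector like the SSD-MobileNet-v2 Detector listed in the appendix:

# Sketch of the Distancing interface: one frame in, objects and a distance matrix out.
import numpy as np
from scipy.spatial import distance as dist

class Distancing:
    def __init__(self, detector):
        self.detector = detector  # e.g. an object with an inference(rgb_frame) method

    def process_frame(self, rgb_frame):
        objects_list = self.detector.inference(rgb_frame)   # [{"id", "bbox", "score"}, ...]
        centroids = np.array([self._centroid(o["bbox"]) for o in objects_list])
        distances = dist.cdist(centroids, centroids) if len(centroids) else np.zeros((0, 0))
        return objects_list, distances

    @staticmethod
    def _centroid(bbox):
        # bbox as (y1, x1, y2, x2); return the box center
        y1, x1, y2, x2 = bbox
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)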

6.1.2 Data Pre-Processing

To prepare each frame to enter the object detection model, we applied some pre-processing, such as
re-sizing and RGB transformation, to the frames.

resized_image = cv.resize(cv_image, tuple(self.image_size[:2]))

rgb_resized_image = cv.cvtColor(resized_image, cv.COLOR_BGR2RGB)

6.1.3 Model Inference

In the constructor, we have a detector attribute that specifies the object detection model. We used
SSD-MobileNet-v2 as the default model for this application. The average inference time on Jetson
Nano was 44.7 milliseconds (22 frames per second).

The object detection model is built based on the config file and should have an inference method that
takes a proper image as input and returns a list of dictionaries. Each dictionary of this list contains
the information of a detected object, i.e., bounding box coordinates and object id.

tmp_objects_list = self.detector.inference(rgb_resized_image)

6.1.4 Bounding boxes post-processing

Since we used a general-purpose object detector (trained on COCO with 80 different classes) that has
not been trained for our specific task, the output of this model needs partial post-processing to be
robust for our specific purpose.

We applied three post-processing filterings to the raw bounding boxes to eliminate large boxes,
collapse duplicated boxes for a single object, and keep track of moving objects in different frames.

These post-processing filterings are applied in the following lines of code:

new_objects_list = self.ignore_large_boxes(objects_list)

new_objects_list = self.non_max_suppression_fast(
    new_objects_list,
    float(self.config.get_section_dict("PostProcessor")["NMSThreshold"]))

tracked_boxes = self.tracker.update(new_objects_list)
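
To illustrate the idea behind the duplicate-collapsing step, a minimal IoU-based non-maximum suppression sketch follows. It is not the project's non_max_suppression_fast implementation; box layout ([x1, y1, x2, y2]) and the threshold are illustrative assumptions:

# Greedy non-maximum suppression: keep the highest-scoring box, drop boxes
# whose overlap (IoU) with it exceeds the threshold, and repeat.
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the best box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]   # drop heavily overlapping boxes
    return keep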

6.1.5 Compute distances

After post-processing, we need to calculate the distances between every pair of persons detected
in each frame. We use Python’s scipy library to calculate the distances between each pair of bounding
box centroids. It is clear that the distances matrix is symmetric and has zeros on the diagonal.

centroids = np.array([obj["centroid"] for obj in new_objects_list])


distances = dist.cdist(centroids, centroids)
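
Since the matrix is symmetric with a zero diagonal, only its upper triangle needs to be checked for violations. The sketch below shows one way to do this; the threshold value and function name are illustrative:

# Turn the pairwise distance matrix into a list of violating pairs.
import numpy as np

def find_violations(distances, threshold=2.0):   # e.g. 2 metres / 6 feet
    # Look only above the diagonal: the matrix is symmetric and the diagonal is zero.
    i_idx, j_idx = np.triu_indices(distances.shape[0], k=1)
    close = distances[i_idx, j_idx] < threshold
    return list(zip(i_idx[close], j_idx[close]))  # index pairs of people too close together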
6.1.6 User Interface (UI) and Web Application

The UI section is responsible for drawing bounding boxes around the detected persons at each
frame. It uses different colors to show the distance between each pair of people.

The WebGUI object implements a Flask application and serves as an interface for the user. WebGUI
constructor takes config and engine_instance parameters as inputs and acts as the central application
for the output view.

Processing the video begins with the WebGUI start method. Within this method, the engine instance
calls process_video to process the video frame by frame. This method returns a list of dictionaries for
each detected object, a matrix of distances, and the image itself with the desired output resolution.
These values are then passed to the update method of the WebGUI class that draws bounding boxes
with proper colors around each object, i.e., person.

CHAPTER 7
TESTING
7.1.1 Software Testing

General
In a generalized way, we can say that system testing is a type of testing whose main aim is to make
sure that the system performs efficiently and seamlessly. The process of testing is applied to a
program with the main aim of discovering previously undetected errors, errors which could otherwise
have damaged the future of the software. A test case that has a high probability of discovering an
error is considered successful. Such a successful test helps to uncover errors that are still unknown.

7.1.2 Test Case

Testing, as already explained earlier, is the process of discovering all possible weak points in the
finalized software product. Testing exercises sub-assemblies, components, assemblies, and the
complete product. The software is taken through different exercises with the main aim of making sure
that it meets the business requirements and user expectations and does not fail abruptly. Several types
of tests are used today, and each test type addresses a specific testing requirement.

7.1.3 Testing Techniques

A test plan is a document that describes the approach, scope, resources, and schedule of the intended
testing activities. It helps to identify the test items, the features to be tested, the testing tasks, who will
do each task, the degree of the tester's independence, the environment in which the test takes place,
the test design technique, the entry and exit criteria to be used along with the rationale for their
choice, and any risks requiring contingency planning. It can also be referred to as the record of the
test planning process. Test plans are usually prepared with significant input from test engineers.

(I) UNIT TESTING

In unit testing, test cases are designed to validate the internal program logic; all decision
branches and internal code flows are validated. Unit testing takes place after an individual
unit is completed and before integration. It is the most basic, component-level test and
exercises specific business processes, system configurations, etc. A unit test ensures that
each unique path of the process performs precisely to the documented specifications and
contains clearly defined inputs with expected results.
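
As an illustration, a minimal unit test for the distance computation step, written with pytest, might look like the sketch below (the helper and test names are illustrative, not part of the project's test suite):

# A unit test checking that the pairwise distance matrix behaves as documented.
import numpy as np
from scipy.spatial import distance as dist

def compute_distances(centroids):
    return dist.cdist(centroids, centroids)

def test_distance_matrix_is_symmetric_with_zero_diagonal():
    centroids = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
    d = compute_distances(centroids)
    assert np.allclose(d, d.T)                 # symmetric
    assert np.allclose(np.diag(d), 0.0)        # zero diagonal
    assert np.isclose(d[0, 1], 5.0)            # 3-4-5 triangle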

(II) INTEGRATION TESTING

Integration testing determines whether individually tested components execute correctly
together as a single program or application. The testing is event driven and is thus
concerned with the basic outcome of screens or fields. Integration tests demonstrate that
although the components were individually satisfactory, as already shown by successful
unit testing, their combination is correct and consistent. This type of testing is specifically
aimed at exposing the issues that come up when components are combined.

(III) FUNCTIONAL TESTING

Functional tests provide a systematic demonstration that the functions tested are available
as specified by the technical requirements, the system documentation and the user manual.

(IV) SYSTEM TESTING

System testing, as the name suggests, is the type of testing that ensures the software system
meets the business requirements and aims. Configuration testing takes place here to ensure
predictable results and allow their analysis. System testing relies on the description of
processes and their flows, stressing pre-driven process links and integration points.

(V) WHITE BOX TESTING

White box testing is the type of testing in which the internal components of the system
software are open and can be inspected by the tester. It is therefore a complex type of
testing process. All the data structures, components, etc. are tested by the tester to find
possible bugs or errors. It is used in situations in which black box testing is incapable of
finding a bug. It is a complex type of testing which takes more time to apply.

(VI) BLACK BOX TESTING

Black box testing is the type of testing in which the internal components of the software
are hidden, and only the input and output of the system are the keys for the tester to find a
bug. It is therefore a simple type of testing; a programmer with basic knowledge can also
carry out this type of testing. It is less time consuming compared to white box testing. It is
very successful for software that is less complex and straightforward in nature. It is also
less costly than white box testing.

(VII) ACCEPTANCE TESTING

User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional
requirements.

CHAPTER 8
Smart Social Distancing roadmap

As the project is under substantial development, lots of ideas can help us improve application
performance and add other exciting features. The Smart Social Distancing roadmap best explains our
priorities for the future of this project:
 Evaluate and benchmark different models.
 Improve the distance calculation module by considering perspectives.
 Provide a safety score to a given site and calculate useful statistics about a specific time
interval; for example, how many people entered the place, and how many times social
distancing rules were violated.
 UI improvements: show statistical and historical data in the GUI.
 Aid the program optimization by profiling different parts of the software, from the CV engine
to UI overheads.
 Provide the ability for the user to customize and re-train models with task-specific datasets.

APPENDIX
Source Code:

Detector.py

import logging
logger = logging.getLogger(__name__)


class Detector:
    """
    Detector class is a high level class for detecting object using NVIDIA jetson devices.
    When an instance of the Detector is created you can call inference method and feed your
    input image in order to get the detection results.
    :param config: Is a ConfigEngine instance which provides necessary parameters.
    """

    def __init__(self, config):
        self.config = config
        self.net = None
        self.fps = None
        # Get model name from the config
        self.name = self.config.get_section_dict('Detector')['Name']
        if self.name == 'ssd_mobilenet_v2_coco' or self.name == "ssd_mobilenet_v2_pedestrian_softbio":
            from . import mobilenet_ssd_v2
            self.net = mobilenet_ssd_v2.Detector(self.config)
        else:
            raise ValueError('Not supported network named: ', self.name)

    def __del__(self):
        del self.net

    def inference(self, resized_rgb_image):
        """
        Run inference on an image and get Frames rate (fps)
        Args:
            resized_rgb_image: A numpy array with shape [height, width, channels]
        Returns:
            output: List of objects, each obj is a dict with keys "id", "bbox" and "score"
            e.g. [{"id": 0, "bbox": [x1, y1, x2, y2], "score": s}, {...}, {...}, ...]
        """
        self.fps = self.net.fps
        output = self.net.inference(resized_rgb_image)
        return output

mobilenet_ssd_v2.py

import ctypes
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import time
from ..utils.fps_calculator import convert_infr_time_to_fps
# import pycuda.autoinit  # Required for initializing CUDA driver

import logging
logger = logging.getLogger(__name__)


class Detector:
    """
    Perform object detection with the given prebuilt tensorrt engine.
    :param config: Is a ConfigEngine instance which provides necessary parameters.
    :param output_layout:
    """

    def _load_plugins(self):
        """ Required as Flattenconcat is not natively supported in TensorRT. """
        ctypes.CDLL("/opt/libflattenconcat.so")
        trt.init_libnvinfer_plugins(self.trt_logger, '')

    def _load_engine(self):
        """ Load engine file as a trt Runtime. """
        trt_bin_path = '/repo/data/jetson/TRT_%s.bin' % self.model
        with open(trt_bin_path, 'rb') as f, trt.Runtime(self.trt_logger) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    def _allocate_buffers(self):
        """
        Create some space to store intermediate activation values.
        Since the engine holds the network definition and trained parameters, additional space is
        necessary.
        """
        for binding in self.engine:
            size = trt.volume(self.engine.get_binding_shape(binding)) * \
                self.engine.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)
            self.bindings.append(int(cuda_mem))
            if self.engine.binding_is_input(binding):
                self.host_inputs.append(host_mem)
                self.cuda_inputs.append(cuda_mem)
            else:
                self.host_outputs.append(host_mem)
                self.cuda_outputs.append(cuda_mem)

        del host_mem
        del cuda_mem

        logger.info('allocated buffers')
        return

    def __init__(self, config, output_layout=7):
        """ Initialize TensorRT plugins, engine and context. """
        self.config = config
        self.model = self.config.get_section_dict('Detector')['Name']
        self.class_id = int(self.config.get_section_dict('Detector')['ClassID'])
        self.conf_threshold = self.config.get_section_dict('Detector')['MinScore']
        self.output_layout = output_layout
        self.trt_logger = trt.Logger(trt.Logger.INFO)
        self._load_plugins()
        self.fps = None

        self.host_inputs = []
        self.cuda_inputs = []
        self.host_outputs = []
        self.cuda_outputs = []
        self.bindings = []
        self._init_cuda_stuff()

    def _init_cuda_stuff(self):
        cuda.init()
        self.device = cuda.Device(0)  # enter your GPU id here
        self.cuda_context = self.device.make_context()
        self.engine = self._load_engine()
        self._allocate_buffers()
        self.engine_context = self.engine.create_execution_context()
        self.stream = cuda.Stream()  # create a CUDA stream to run inference

    def __del__(self):
        """ Free CUDA memories. """
        for mem in self.cuda_inputs:
            mem.free()
        for mem in self.cuda_outputs:
            mem.free()

        del self.stream
        del self.cuda_outputs
        del self.cuda_inputs
        self.cuda_context.pop()
        del self.cuda_context
        del self.engine_context
        del self.engine
        del self.bindings
        del self.host_inputs
        del self.host_outputs

    @staticmethod
    def _preprocess_trt(img):
        """ Preprocess an image before TRT SSD inferencing. """
        img = img.transpose((2, 0, 1)).astype(np.float32)
        img = (2.0 / 255.0) * img - 1.0
        return img

    def _postprocess_trt(self, img, output):
        """ Postprocess TRT SSD output. """
        img_h, img_w, _ = img.shape
        boxes, confs, clss = [], [], []
        for prefix in range(0, len(output), self.output_layout):
            # index = int(output[prefix + 0])
            conf = float(output[prefix + 2])
            if conf < float(self.conf_threshold):
                continue
            x1 = (output[prefix + 3])  # * img_w
            y1 = (output[prefix + 4])  # * img_h
            x2 = (output[prefix + 5])  # * img_w
            y2 = (output[prefix + 6])  # * img_h
            cls = int(output[prefix + 1])
            boxes.append((y1, x1, y2, x2))
            confs.append(conf)
            clss.append(cls)
        return boxes, confs, clss

    def inference(self, img):
        """
        Detect objects in the input image.
        Args:
            img: uint8 numpy array with shape (img_height, img_width, channels)
        Returns:
            result: a list of dictionaries, e.g. [{"id": 0, "bbox": [x1, y1, x2, y2], "score": s}, {...}, ...]
        """
        img_resized = self._preprocess_trt(img)
        # transfer the data to the GPU, run inference and then copy the results back
        np.copyto(self.host_inputs[0], img_resized.ravel())

        # Start inference time
        t_begin = time.perf_counter()
        cuda.memcpy_htod_async(
            self.cuda_inputs[0], self.host_inputs[0], self.stream)
        self.engine_context.execute_async(
            batch_size=1,
            bindings=self.bindings,
            stream_handle=self.stream.handle)
        cuda.memcpy_dtoh_async(
            self.host_outputs[1], self.cuda_outputs[1], self.stream)
        cuda.memcpy_dtoh_async(
            self.host_outputs[0], self.cuda_outputs[0], self.stream)
        self.stream.synchronize()
        inference_time = time.perf_counter() - t_begin  # Seconds

        # Calculate Frames rate (fps)
        self.fps = convert_infr_time_to_fps(inference_time)
        output = self.host_outputs[0]
        boxes, scores, classes = self._postprocess_trt(img, output)
        result = []
        for i in range(len(boxes)):  # number of boxes
            if classes[i] == self.class_id + 1:
                result.append({"id": str(classes[i] - 1) + '-' + str(i), "bbox": boxes[i], "score": scores[i]})

        return result
classifier.py

class Classifier:
    """
    Classifier class is a high level class for classifying images using x86 devices.
    When an instance of the Classifier is created you can call inference method and feed your
    input image in order to get the classifier results.
    :param config: Is a ConfigEngine instance which provides necessary parameters.
    """

    def __init__(self, config):
        self.config = config
        self.name = self.config.get_section_dict('Classifier')['Name']

        if self.name == 'OFMClassifier':
            from libs.classifiers.x86 import face_mask
            self.net = face_mask.Classifier(self.config)
        else:
            raise ValueError('Not supported network named: ', self.name)

    def inference(self, resized_rgb_image):
        self.fps = self.net.fps
        output, scores = self.net.inference(resized_rgb_image)
        return output, scores

loggers.py
import time

LOG_FORMAT_VERSION = "1.0"

class Logger:
"""logger layer to build a logger and pass data to it for logging
this class build a layer based on config specification and call update
method of it based on logging frequency
:param config: a ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""

def __init__(self, config, camera_id):

"""build the logger and initialize the frame number and set attributes"""
self.config = config
# Logger name, at this time only csv_logger is supported. You can implement your own logger
# by following csv_logger implementation as an example.
self.name = self.config.get_section_dict("Logger")["Name"]
if self.name == "csv_logger":
from . import csv_processed_logger
self.logger = csv_processed_logger.Logger(self.config, camera_id)

# For Logger instance from loggers/csv_logger


# region csv_logger
# from . import csv_logger
# self.logger = csv_logger.Logger(self.config)
# end region

# Specifies how often the logger should log information. For example with time_interval of 0.5
# the logger logs the information every 0.5 seconds.
self.time_interval = float(self.config.get_section_dict("Logger")["TimeInterval"]) # Seconds
self.submited_time = 0
# self.frame_number = 0 # For Logger instance from loggers/csv_logger

def update(self, objects_list, distances):


"""call the update method of the logger.
based on frame_number, fps and time interval, it decides whether to call the
logger's update method to store the data or not.
Args:
objects_list: a list of dictionary where each dictionary stores information of an object
(person) in a frame.
distances: a 2-d numpy array that stores distance between each pair of objects.
"""

if time.time() - self.submited_time > self.time_interval:
objects = self.format_objects(objects_list)
self.logger.update(objects, distances, version=LOG_FORMAT_VERSION)
self.submited_time = time.time()
# For Logger instance from loggers/csv_logger
# region
# self.logger.update(self.frame_number, objects_list, distances)
# self.frame_number += 1
# end region

def format_objects(self, objects_list):


""" Format the attributes of the objects in a way ready to be saved
Args:
objects_list: a list of dictionary where each dictionary stores information of an object
(person) in a frame.
"""
objects = []
for obj_dict in objects_list:
obj = {}
# TODO: Get 3D position of objects
obj["position"] = [0.0, 0.0, 0.0]
obj["bbox"] = obj_dict["bbox"]
obj["tracking_id"] = obj_dict["id"]
if "face_label" in obj_dict and obj_dict["face_label"] != -1:
obj["face_label"] = obj_dict["face_label"]
# TODO: Add more optional parameters

objects.append(obj)
return objects
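The update method above forwards data to the underlying csv logger only when TimeInterval seconds have elapsed since the last write. The following stand-alone sketch isolates that throttling pattern; the class and variable names are illustrative and not part of the project code:

import time

class ThrottledWriter:
    """Forward updates at most once every `interval` seconds."""
    def __init__(self, interval=0.5):
        self.interval = interval
        self.last_submit = 0

    def update(self, payload):
        # mirror of Logger.update: only log when the interval has elapsed
        if time.time() - self.last_submit > self.interval:
            print("logging:", payload)  # stand-in for csv_processed_logger.Logger.update
            self.last_submit = time.time()

writer = ThrottledWriter(0.5)
for i in range(5):
    writer.update({"frame": i})
    time.sleep(0.2)  # only roughly every third call ends up being logged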

csv_logger.py
import csv
import os
from datetime import date
from tools.objects_post_process import extract_violating_objects

import numpy as np

def prepare_object(detected_object, frame_number):


"""Construct a dictionary that is appropriate for csv writer.
This function transform a dictionary with list values to a dictionary
with scalar values. This transformation is necessary for csv writer to avoid
writing lists into csv.
Args:
detected_object: It is a dictionary that contains an detected object information after
postprocessing.
frame_number: current frame number
Returns:
A transformed version of detected_object to a dictionary with only scalar values. It also
contains an item
for frame number.
"""
object_dict = {}
object_dict.update({"frame_number": frame_number})
for key, value in detected_object.items():
if isinstance(value, (list, tuple)):
for i, item in enumerate(value):
# TODO: Inspect why some items are float and some are np.float32
if isinstance(item, (float, np.float32)):
item = round(float(item), 4)

object_dict.update({str(key) + "_" + str(i): item})
else:
# TODO: Inspect why some items are float and some are np.float32
if isinstance(value, (float, np.float32)):
value = round(float(value), 4)
object_dict.update({key: value})
return object_dict
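To illustrate the flattening performed by prepare_object, a hypothetical detection dictionary and the scalar-only row it would produce (all values are made up):

detected = {"position": [0.0, 0.0, 0.0], "bbox": [0.25, 0.5, 0.75, 1.0], "tracking_id": "0-3"}
row = prepare_object(detected, frame_number=42)
# row == {"frame_number": 42,
#         "position_0": 0.0, "position_1": 0.0, "position_2": 0.0,
#         "bbox_0": 0.25, "bbox_1": 0.5, "bbox_2": 0.75, "bbox_3": 1.0,
#         "tracking_id": "0-3"}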

class Logger:
"""A CSV logger class that store objects information and violated distances information into csv
files.
This logger creates two csv file every day in two different directory, one for logging detected
objects
and one for logging violated social distancing incidents. The file names are the same as recording
date.
:param config: A ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""

def __init__(self, config, camera_id):


self.config = config
# The parent directory that stores all log file.
self.log_directory = config.get_section_dict("Logger")["LogDirectory"]
# A directory inside the log_directory that stores object log files.
self.objects_log_directory = os.path.join(self.log_directory, camera_id, "objects_log")
self.distances_log_directory = os.path.join(self.log_directory, "distances_log")
self.dist_threshold = config.get_section_dict("PostProcessor")["DistThreshold"]
os.makedirs(self.objects_log_directory, exist_ok=True)
os.makedirs(self.distances_log_directory, exist_ok=True)

def update(self, frame_number, objects_list, distances):
"""Write the object and violated distances information of a frame into log files.
Args: frame_number: current frame number objects_list: A list of dictionary where each
dictionary stores
information of an object (person) in a frame. distances: A 2-d numpy array that stores distance
between each
pair of objects.
"""
file_name = str(date.today())
objects_log_file_path = os.path.join(self.objects_log_directory, file_name + ".csv")
distances_log_file_path = os.path.join(self.distances_log_directory, file_name + ".csv")
self.log_objects(objects_list, frame_number, objects_log_file_path)
self.log_distances(distances, frame_number, distances_log_file_path)

@staticmethod
def log_objects(objects_list, frame_number, file_path):
"""Write objects information of a frame into the object log file.
Each row of the object log file consist of a detected object (person) information such as
object (person) ids, bounding box coordinates and frame number.
Args: objects_list: A list of dictionary where each dictionary stores information of an object
(person) in a
frame. frame_number: current frame number file_path: log file path
"""
if len(objects_list) != 0:
object_dict = list(map(lambda x: prepare_object(x, frame_number), objects_list))

if not os.path.exists(file_path):
with open(file_path, "w", newline="") as csvfile:
field_names = list(object_dict[0].keys())
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writeheader()

with open(file_path, "a", newline="") as csvfile:
field_names = list(object_dict[0].keys())
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writerows(object_dict)

def log_distances(self, distances, frame_number, file_path):


"""Write violated incident's information of a frame into the object log file.
Each row of the distances log file consist of a violation information such as object (person) ids,
distance between these two object and frame number.
Args:
distances: A 2-d numpy array that stores distance between each pair of objects.
frame_number: current frame number
file_path: The path for storing log files
"""
violating_objects = extract_violating_objects(distances, self.dist_threshold)
if not os.path.exists(file_path):
with open(file_path, "w", newline="") as csvfile:
field_names = ["frame_number", "object_0", "object_1", "distance"]
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writeheader()
with open(file_path, "a", newline="") as csvfile:
field_names = ["frame_number", "object_0", "object_1", "distance"]
writer = csv.DictWriter(csvfile, fieldnames=field_names)
writer.writerows([{"frame_number": frame_number,
"object_0": indices[0],
"object_1": indices[1],
"distance": distances[indices[0], indices[1]]} for indices in violating_objects])

csv_processed_logger.py
import csv
import os
from datetime import date, datetime
from tools.environment_score import mx_environment_scoring_consider_crowd
from tools.objects_post_process import extract_violating_objects
import itertools

import numpy as np

class Logger:
"""A CSV logger class that store objects information and violated distances information into csv
files.
This logger creates two csv file every day in two different directory, one for logging detected
objects
and violated social distancing incidents. The file names are the same as recording date.
:param config: A ConfigEngine object which store all of the config parameters. Access to any
parameter
is possible by calling get_section_dict method.
"""

def __init__(self, config, camera_id):


self.config = config
# The parent directory that stores all log file.
self.log_directory = config.get_section_dict("Logger")["LogDirectory"]
# A directory inside the log_directory that stores object log files.
self.objects_log_directory = os.path.join(self.log_directory, camera_id, "objects_log")
self.dist_threshold = config.get_section_dict("PostProcessor")["DistThreshold"]

os.makedirs(self.objects_log_directory, exist_ok=True)

def update(self, objects_list, distances, version):
"""Write the object and violated distances information of a frame into log files.
Args:
objects_list: List of dictionary where each dictionary stores information of an object
(person) in a frame.
distances: A 2-d numpy array that stores distance between each pair of objects.
"""
file_name = str(date.today())
objects_log_file_path = os.path.join(self.objects_log_directory, file_name + ".csv")
self.log_objects(version, objects_list, distances, objects_log_file_path)

def log_objects(self, version, objects_list, distances, file_path):


"""Write objects information of a frame into the object log file.
Each row of the object log file consists of a detected object's (person's) information such as
the object (person) id, bounding box coordinates and frame number.
Args:
objects_list: A list of dictionaries where each dictionary stores information of an object
(person) in a frame.
distances: A 2-d numpy array that stores distance between each pair of objects.
file_path: The path for storing log files
"""
# TODO: Remove violation logic and move it to frontend
violating_objects = extract_violating_objects(distances, self.dist_threshold)
# Get unique objects that are in close contact
violating_objects_index_list = list(set(itertools.chain(*violating_objects)))

# Get the number of violating objects (people)


no_violating_objects = len(violating_objects)
# Get the number of detected objects (people)
no_detected_objects = len(objects_list)

# Get environment score
environment_score = mx_environment_scoring_consider_crowd(no_detected_objects,
no_violating_objects)
# Get the current time, which is used as the Timestamp
now = datetime.now()
current_time = now.strftime("%Y-%m-%d %H:%M:%S")
file_exists = os.path.isfile(file_path)
with open(file_path, "a") as csvfile:
headers = ["Version", "Timestamp", "DetectedObjects", "ViolatingObjects",
"EnvironmentScore", "Detections", 'ViolationsIndexes']
writer = csv.DictWriter(csvfile, fieldnames=headers)

if not file_exists:
writer.writeheader()

writer.writerow(

{'Version': version, 'Timestamp': current_time, 'DetectedObjects': no_detected_objects,


'ViolatingObjects': no_violating_objects, 'EnvironmentScore': environment_score,
'Detections': str(objects_list), 'ViolationsIndexes': str(violating_objects_index_list)})
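The daily file written above can be inspected with the standard csv module; a small sketch, assuming illustrative values for the log directory, camera id and date (in the application these come from config.ini and the current date):

import csv
import os

log_directory = "./data/logs"  # assumed; configured via the Logger LogDirectory entry
camera_id = "0"                # assumed camera id
log_file = os.path.join(log_directory, camera_id, "objects_log", "2020-12-01.csv")

with open(log_file, newline="") as f:
    for row in csv.DictReader(f):
        print(row["Timestamp"], row["DetectedObjects"], row["ViolatingObjects"], row["EnvironmentScore"])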
distancing.py
import cv2 as cv
import numpy as np
from scipy.spatial.distance import cdist
import math
import os
import shutil
import time
from libs.centroid_object_tracker import CentroidTracker
from libs.loggers.loggers import Logger
from tools.environment_score import mx_environment_scoring_consider_crowd

from tools.objects_post_process import extract_violating_objects
from libs.utils import visualization_utils
from libs.utils.camera_calibration import get_camera_calibration_path
from libs.uploaders.s3_uploader import S3Uploader
import logging
logger = logging.getLogger(__name__)
class Distancing:

def __init__(self, config, source):


self.config = config
self.detector = None
self.device = self.config.get_section_dict('Detector')['Device']

self.classifier = None
self.classifier_img_size = None
self.face_mask_classifier = None

self.running_video = False
self.tracker = CentroidTracker(
max_disappeared=int(self.config.get_section_dict("PostProcessor")["MaxTrackFrame"]))
self.camera_id = self.config.get_section_dict(source)['Id']
self.logger = Logger(self.config, self.camera_id)
self.image_size = [int(i) for i in self.config.get_section_dict('Detector')['ImageSize'].split(',')]
self.default_dist_method = self.config.get_section_dict('PostProcessor')["DefaultDistMethod"]

if self.config.get_section_dict(source)["DistMethod"]:
self.dist_method = self.config.get_section_dict(source)["DistMethod"]
else:
self.dist_method = self.default_dist_method

self.dist_threshold = self.config.get_section_dict("PostProcessor")["DistThreshold"]

self.resolution = tuple([int(i) for i in self.config.get_section_dict('App')['Resolution'].split(',')])
self.birds_eye_resolution = (200, 300)

if self.dist_method == "CalibratedDistance":
calibration_file = get_camera_calibration_path(
self.config, self.config.get_section_dict(source)["Id"])
try:
with open(calibration_file, "r") as file:
self.h_inv = file.readlines()[0].split(" ")[1:]
self.h_inv = np.array(self.h_inv, dtype="float").reshape((3, 3))
except FileNotFoundError:
logger.error("The specified 'CalibrationFile' does not exist")
logger.info(f"Falling back using {self.default_dist_method}")
self.dist_method = self.default_dist_method

self.screenshot_period = float(
self.config.get_section_dict("App")["ScreenshotPeriod"]) * 60 # config.ini uses minutes as unit
self.bucket_screenshots = config.get_section_dict("App")["ScreenshotS3Bucket"]
self.uploader = S3Uploader(self.config)
self.screenshot_path = os.path.join(self.config.get_section_dict("App")["ScreenshotsDirectory"], self.camera_id)
if not os.path.exists(self.screenshot_path):
os.makedirs(self.screenshot_path)

def __process(self, cv_image):


"""
Returns a list of dicts, one for each detected object;
obj["bbox"] holds the normalized [x0, y0, x1, y1] coordinates of the box.
"""

# Resize input image to resolution
cv_image = cv.resize(cv_image, self.resolution)

resized_image = cv.resize(cv_image, tuple(self.image_size[:2]))


rgb_resized_image = cv.cvtColor(resized_image, cv.COLOR_BGR2RGB)
tmp_objects_list = self.detector.inference(rgb_resized_image)
# Get the classifier result for detected face
if self.classifier is not None:
faces = []
for itm in tmp_objects_list:
if 'face' in itm.keys():
face_bbox = itm['face'] # [ymin, xmin, ymax, xmax]
if face_bbox is not None:
xmin, xmax = np.multiply([face_bbox[1], face_bbox[3]], self.resolution[0])
ymin, ymax = np.multiply([face_bbox[0], face_bbox[2]], self.resolution[1])
croped_face = cv_image[int(ymin):int(ymin) + (int(ymax) - int(ymin)),
int(xmin):int(xmin) + (int(xmax) - int(xmin))]

# Resizing input image


croped_face = cv.resize(croped_face, tuple(self.classifier_img_size[:2]))
croped_face = cv.cvtColor(croped_face, cv.COLOR_BGR2RGB)
# Normalizing input image to [0.0-1.0]
croped_face = np.array(croped_face) / 255.0
faces.append(croped_face)
faces = np.array(faces)
face_mask_results, scores = self.classifier.inference(faces)
[w, h] = self.resolution
idx = 0
for obj in tmp_objects_list:
if self.classifier is not None and 'face' in obj.keys():
if obj['face'] is not None:

obj['face_label'] = face_mask_results[idx]
idx = idx + 1
else:
obj['face_label'] = -1
box = obj["bbox"]
x0 = box[1]
y0 = box[0]
x1 = box[3]
y1 = box[2]
obj["centroid"] = [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]
obj["bbox"] = [x0, y0, x1, y1]
obj["centroidReal"] = [(x0 + x1) * w / 2, (y0 + y1) * h / 2, (x1 - x0) * w, (y1 - y0) * h]
obj["bboxReal"] = [x0 * w, y0 * h, x1 * w, y1 * h]

objects_list, distancings = self.calculate_distancing(tmp_objects_list)


anonymize = self.config.get_section_dict('PostProcessor')['Anonymize'] == "true"
if anonymize:
cv_image = self.anonymize_image(cv_image, objects_list)
return cv_image, objects_list, distancings

def gstreamer_writer(self, feed_name, fps, resolution):


"""
This method creates and returns an OpenCV Video Writer instance. The VideoWriter expects
its `.write()` method
to be called with a single frame image multiple times. It encodes frames into live video
segments and produces
a video segment once it has received enough frames to produce a 5-seconds segment of live
video.
The video segments are written on the filesystem. The target directory for writing segments is
determined by
`video_root` variable. In addition to writing the video segments, the VideoWriter also updates
a file named

playlist.m3u8 in the target directory. This file contains the list of generated video segments and
is updated
automatically.
This instance does not serve these video segments to the client. It is expected that the target
video directory
is being served by a static file server and the clientside HLS video library downloads
"playlist.m3u8". Then,
the client video player reads the link for video segments, according to HLS protocol, and
downloads them from
static file server.
:param feed_name: Is the name for video feed. We may have multiple cameras, each with
multiple video feeds (e.g. one
feed for visualizing bounding boxes and one for bird's eye view). Each video feed should be
written into a
separate directory. The name for target directory is defined by this variable.
:param fps: The HLS video player on client side needs to know how many frames should be
shown to the user per
second. This parameter is independent from the frame rate with which the video is being
processed. For example,
if we set fps=60, but produce only 30 frames (by calling `.write()`) per second, the client will see
a loading
indicator for 5*60/30 seconds and then 5 seconds of video is played with fps 60.
:param resolution: A tuple of size 2 which indicates the resolution of output video.
"""
encoder = self.config.get_section_dict('App')['Encoder']
video_root = f'/repo/data/processor/static/gstreamer/{feed_name}'

shutil.rmtree(video_root, ignore_errors=True)
os.makedirs(video_root, exist_ok=True)

playlist_root = f'/static/gstreamer/{feed_name}'
if not playlist_root.endswith('/'):
playlist_root = f'{playlist_root}/'

# the entire encoding pipeline, as a string:
pipeline = f'appsrc is-live=true ! {encoder} ! mpegtsmux ! hlssink max-files=15 ' \
f'target-duration=5 ' \
f'playlist-root={playlist_root} ' \
f'location={video_root}/video_%05d.ts ' \
f'playlist-location={video_root}/playlist.m3u8 '

out = cv.VideoWriter(
pipeline,
cv.CAP_GSTREAMER,
0, fps, resolution
)

if not out.isOpened():
raise RuntimeError("Could not open gstreamer output for " + feed_name)
return out
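A brief sketch of how the returned writer is driven; it assumes OpenCV was built with GStreamer support, a Distancing instance named engine, and an iterable of BGR frames already resized to the chosen resolution. Each .write() call appends one frame, and hlssink packages the frames into 5-second .ts segments plus playlist.m3u8 under the feed's directory:

writer = engine.gstreamer_writer("camera0", fps=25, resolution=(640, 480))  # names and values are illustrative
for frame in frames:  # frames: assumed iterable of 640x480 BGR images
    writer.write(frame)
writer.release()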

def process_video(self, video_uri):


if self.device == 'Jetson':
from libs.detectors.jetson.detector import Detector
self.detector = Detector(self.config)
elif self.device == 'EdgeTPU':
from libs.detectors.edgetpu.detector import Detector
from libs.classifiers.edgetpu.classifier import Classifier
self.classifier = Classifier(self.config)
self.detector = Detector(self.config)
self.classifier_img_size = [int(i) for i in
self.config.get_section_dict('Classifier')['ImageSize'].split(',')]
elif self.device == 'Dummy':
from libs.detectors.dummy.detector import Detector
self.detector = Detector(self.config)

elif self.device == 'x86':
from libs.detectors.x86.detector import Detector
from libs.classifiers.x86.classifier import Classifier
self.detector = Detector(self.config)
if 'Classifier' in self.config.get_sections():
self.classifier = Classifier(self.config)
self.classifier_img_size = [int(i) for i in
self.config.get_section_dict('Classifier')['ImageSize'].split(',')]

if self.device != 'Dummy':
print('Device is: ', self.device)
print('Detector is: ', self.detector.name)
print('image size: ', self.image_size)

input_cap = cv.VideoCapture(video_uri)
fps = max(25, input_cap.get(cv.CAP_PROP_FPS))

if (input_cap.isOpened()):
logger.info(f'opened video {video_uri}')
else:
logger.error(f'failed to load video {video_uri}')
return

self.running_video = True

# enable logging gstreamer errors (https://stackoverflow.com/questions/3298934/how-do-i-view-gstreamer-debug-output)
os.environ['GST_DEBUG'] = "*:1"
out, out_birdseye = (
self.gstreamer_writer(feed, fps, resolution)
for (feed, resolution) in (

(self.camera_id, self.resolution),
(self.camera_id + '-birdseye', self.birds_eye_resolution)
)
)

dist_threshold = float(self.config.get_section_dict("PostProcessor")["DistThreshold"])
class_id = int(self.config.get_section_dict('Detector')['ClassID'])
frame_num = 0
start_time = time.time()
while input_cap.isOpened() and self.running_video:
_, cv_image = input_cap.read()
birds_eye_window = np.zeros(self.birds_eye_resolution[::-1] + (3,), dtype="uint8")
if np.shape(cv_image) != ():
cv_image, objects, distancings = self.__process(cv_image)
output_dict = visualization_utils.visualization_preparation(objects, distancings,
dist_threshold)

category_index = {class_id: {
"id": class_id,
"name": "Pedestrian",
}} # TODO: json file for detector config
# Draw bounding boxes and other visualization factors on input_frame
visualization_utils.visualize_boxes_and_labels_on_image_array(
cv_image,
output_dict["detection_boxes"],
output_dict["detection_classes"],
output_dict["detection_scores"],
output_dict["detection_colors"],
category_index,
instance_masks=output_dict.get("detection_masks"),
use_normalized_coordinates=True,

line_thickness=3,
)
# TODO: Implement perspective view for objects
birds_eye_window = visualization_utils.birds_eye_view(birds_eye_window,
output_dict["detection_boxes"],
output_dict["violating_objects"])
try:
fps = self.detector.fps
except:
# fps is not implemented for the detector instance"
fps = None

# Put fps to the frame


# region
# -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_-
txt_fps = 'Frames rate = ' + str(fps) + '(fps)' # Frames rate = 95 (fps)
# (0, 0) is the top-left (x,y); normalized number between 0-1
origin = (0.05, 0.93)
visualization_utils.text_putter(cv_image, txt_fps, origin)
# -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_-
# endregion

# Put environment score to the frame


# region
# -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_-
violating_objects = extract_violating_objects(distancings, dist_threshold)
env_score = mx_environment_scoring_consider_crowd(len(objects),
len(violating_objects))
txt_env_score = 'Env Score = ' + str(env_score) # Env Score = 0.7
origin = (0.05, 0.98)
visualization_utils.text_putter(cv_image, txt_env_score, origin)

# -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_- -_-
# endregion

out.write(cv_image)
out_birdseye.write(birds_eye_window)
frame_num += 1
if frame_num % 100 == 1:
logger.info(f'processed frame {frame_num} for {video_uri}')

# Save a screenshot only if the period is greater than 0, a violation is detected, and the minimum period has occurred
if (self.screenshot_period > 0) and (time.time() > start_time + self.screenshot_period) and (len(violating_objects) > 0):
start_time = time.time()
self.capture_violation(f"{start_time}_violation.jpg", cv_image)

self.save_screenshot(cv_image)
else:
continue
self.logger.update(objects, distancings)
input_cap.release()
out.release()
out_birdseye.release()
del self.detector
self.running_video = False

def stop_process_video(self):
self.running_video = False

def calculate_distancing(self, objects_list):

"""
this function post-process the raw boxes of object detector and calculate a distance matrix
for detected bounding boxes.
post processing is consist of:
1. omitting large boxes by filtering boxes which are bigger than the 1/4 of the size the image.
2. omitting duplicated boxes by applying an auxilary non-maximum-suppression.
3. apply a simple object tracker to make the detection more robust.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
object_list: the post processed version of the input
distances: a NxN ndarray which i,j element is distance between i-th and l-th bounding box
"""
new_objects_list = self.ignore_large_boxes(objects_list)
new_objects_list = self.non_max_suppression_fast(new_objects_list,
float(self.config.get_section_dict("PostProcessor")[
"NMSThreshold"]))
tracked_boxes = self.tracker.update(new_objects_list)
new_objects_list = [tracked_boxes[i] for i in tracked_boxes.keys()]
for i, item in enumerate(new_objects_list):
item["id"] = item["id"].split("-")[0] + "-" + str(i)

centroids = np.array([obj["centroid"] for obj in new_objects_list])


distances = self.calculate_box_distances(new_objects_list)

return new_objects_list, distances

@staticmethod

def ignore_large_boxes(object_list):

"""
Filtering out boxes which are bigger than 1/4 of the size of the image.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
object_list: input object list without large boxes
"""
large_boxes = []
for i in range(len(object_list)):
if (object_list[i]["centroid"][2] * object_list[i]["centroid"][3]) > 0.25:
large_boxes.append(i)
updated_object_list = [j for i, j in enumerate(object_list) if i not in large_boxes]
return updated_object_list

@staticmethod
def non_max_suppression_fast(object_list, overlapThresh):

"""
omitting duplicated boxes by applying an auxilary non-maximum-suppression.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such
"id", "centroid" (a tuple of the normalized centroid coordinates (cx,cy,w,h) of the box) and
"bbox" (a tuple
of the normalized (xmin,ymin,xmax,ymax) coordinate of the box)
overlapThresh: threshold of minimum IoU of to detect two box as duplicated.
returns:

62
object_list: input object list without duplicated boxes
"""
# if there are no boxes, return an empty list
boxes = np.array([item["centroid"] for item in object_list])
corners = np.array([item["bbox"] for item in object_list])
if len(boxes) == 0:
return []
if boxes.dtype.kind == "i":
boxes = boxes.astype("float")
# initialize the list of picked indexes
pick = []
cy = boxes[:, 1]
cx = boxes[:, 0]
h = boxes[:, 3]
w = boxes[:, 2]
x1 = corners[:, 0]
x2 = corners[:, 2]
y1 = corners[:, 1]
y2 = corners[:, 3]
area = (h + 1) * (w + 1)
idxs = np.argsort(cy + (h / 2))
while len(idxs) > 0:
last = len(idxs) - 1
i = idxs[last]
pick.append(i)
xx1 = np.maximum(x1[i], x1[idxs[:last]])
yy1 = np.maximum(y1[i], y1[idxs[:last]])
xx2 = np.minimum(x2[i], x2[idxs[:last]])
yy2 = np.minimum(y2[i], y2[idxs[:last]])

w = np.maximum(0, xx2 - xx1 + 1)

h = np.maximum(0, yy2 - yy1 + 1)
# compute the ratio of overlap
overlap = (w * h) / area[idxs[:last]]
# delete all indexes from the index list that have
idxs = np.delete(idxs, np.concatenate(([last],
np.where(overlap > overlapThresh)[0])))
updated_object_list = [j for i, j in enumerate(object_list) if i in pick]
return updated_object_list
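A tiny worked example (with made-up boxes and an illustrative overlapThresh of 0.98; the actual value comes from the PostProcessor NMSThreshold entry in config.ini) showing the intent of the suppression above: two nearly identical detections collapse to one, while a distant detection is kept:

a = {"id": "0-0", "centroid": [0.30, 0.30, 0.20, 0.40], "bbox": [0.20, 0.10, 0.40, 0.50]}
b = {"id": "0-1", "centroid": [0.31, 0.30, 0.20, 0.40], "bbox": [0.21, 0.10, 0.41, 0.50]}  # near-duplicate of a
c = {"id": "0-2", "centroid": [0.80, 0.30, 0.10, 0.40], "bbox": [0.75, 0.10, 0.85, 0.50]}  # far from a and b
kept = Distancing.non_max_suppression_fast([a, b, c], overlapThresh=0.98)
print(len(kept))  # 2: one of the near-duplicates is suppressed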

def calculate_distance_of_two_points_of_boxes(self, first_point, second_point):

"""
This function calculates a distance l for two input corresponding points of two detected
bounding boxes.
It is assumed that each person is H = 170 cm tall in the real scene, in order to map distances
in the image (in pixels) to physical distance measures (in centimeters).
params:
first_point: (x, y, h)-tuple, where x,y is the location of a point (center or each of 4 corners of a
bounding box)
and h is the height of the bounding box.
second_point: same tuple as first_point for the corresponding point of other box
returns:
l: Estimated physical distance (in centimeters) between first_point and second_point.
"""

# estimate corresponding points distance


[xc1, yc1, h1] = first_point
[xc2, yc2, h2] = second_point

dx = xc2 - xc1

dy = yc2 - yc1

lx = dx * 170 * (1 / h1 + 1 / h2) / 2
ly = dy * 170 * (1 / h1 + 1 / h2) / 2

l = math.sqrt(lx ** 2 + ly ** 2)

return l
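A worked numeric example of the mapping above, using made-up pixel values and the same formula as calculate_distance_of_two_points_of_boxes: two box centres 100 px apart horizontally, with boxes 300 px and 340 px tall, map to roughly half a metre:

dx, dy, h1, h2 = 100, 0, 300, 340  # hypothetical pixel measurements
lx = dx * 170 * (1 / h1 + 1 / h2) / 2
ly = dy * 170 * (1 / h1 + 1 / h2) / 2
l = (lx ** 2 + ly ** 2) ** 0.5
print(round(l, 1))  # ~53.3 cm, well below typical 1.5-2 m distancing thresholds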

def calculate_box_distances(self, nn_out):

"""
This function calculates a distance matrix for detected bounding boxes.
Three methods are implemented to calculate the distances: the first one estimates distance with
a calibration matrix which transforms the points to 3-d world coordinates, the second one
estimates the distance between the center points of the boxes, and the third one uses the
minimum distance between each of the 4 corner points of the bounding boxes.
params:
object_list: a list of dictionaries. each dictionary has attributes of a detected object such as
"id", "centroidReal" (a tuple of the centroid coordinates (cx,cy,w,h) of the box) and "bboxReal"
(a tuple
of the (xmin,ymin,xmax,ymax) coordinate of the box)
returns:
distances: an NxN ndarray whose (i, j) element is the estimated distance between the i-th and j-th
bounding box in the real scene (cm)
"""
if self.dist_method == "CalibratedDistance":
world_coordinate_points = np.array([self.transform_to_world_coordinate(bbox) for bbox in
nn_out])
if len(world_coordinate_points) == 0:
distances_asarray = np.array([])

else:
distances_asarray = cdist(world_coordinate_points, world_coordinate_points)

else:
distances = []
for i in range(len(nn_out)):
distance_row = []
for j in range(len(nn_out)):
if i == j:
l=0
else:
if (self.dist_method == 'FourCornerPointsDistance'):
lower_left_of_first_box = [nn_out[i]["bboxReal"][0], nn_out[i]["bboxReal"][1],
nn_out[i]["centroidReal"][3]]
lower_right_of_first_box = [nn_out[i]["bboxReal"][2], nn_out[i]["bboxReal"][1],
nn_out[i]["centroidReal"][3]]
upper_left_of_first_box = [nn_out[i]["bboxReal"][0], nn_out[i]["bboxReal"][3],
nn_out[i]["centroidReal"][3]]
upper_right_of_first_box = [nn_out[i]["bboxReal"][2], nn_out[i]["bboxReal"][3],
nn_out[i]["centroidReal"][3]]

lower_left_of_second_box = [nn_out[j]["bboxReal"][0],
nn_out[j]["bboxReal"][1],
nn_out[j]["centroidReal"][3]]
lower_right_of_second_box = [nn_out[j]["bboxReal"][2],
nn_out[j]["bboxReal"][1],
nn_out[j]["centroidReal"][3]]
upper_left_of_second_box = [nn_out[j]["bboxReal"][0],
nn_out[j]["bboxReal"][3],
nn_out[j]["centroidReal"][3]]
upper_right_of_second_box = [nn_out[j]["bboxReal"][2],
nn_out[j]["bboxReal"][3],

nn_out[j]["centroidReal"][3]]

l1 = self.calculate_distance_of_two_points_of_boxes(lower_left_of_first_box,
lower_left_of_second_box)
l2 = self.calculate_distance_of_two_points_of_boxes(lower_right_of_first_box,
lower_right_of_second_box)
l3 = self.calculate_distance_of_two_points_of_boxes(upper_left_of_first_box,
upper_left_of_second_box)
l4 = self.calculate_distance_of_two_points_of_boxes(upper_right_of_first_box,
upper_right_of_second_box)

l = min(l1, l2, l3, l4)


elif (self.dist_method == 'CenterPointsDistance'):
center_of_first_box = [nn_out[i]["centroidReal"][0], nn_out[i]["centroidReal"][1],
nn_out[i]["centroidReal"][3]]
center_of_second_box = [nn_out[j]["centroidReal"][0],
nn_out[j]["centroidReal"][1],
nn_out[j]["centroidReal"][3]]

l = self.calculate_distance_of_two_points_of_boxes(center_of_first_box,
center_of_second_box)
distance_row.append(l)
distances.append(distance_row)
distances_asarray = np.asarray(distances, dtype=np.float32)
return distances_asarray

def transform_to_world_coordinate(self, bbox):


"""
This function will transform the center of the bottom line of a bounding box from image
coordinate to world
coordinate via a homography matrix

Args:
bbox: a dictionary of a coordinates of a detected instance with "id",
"centroidReal" (a tuple of the centroid coordinates (cx,cy,w,h) of the box) and "bboxReal" (a
tuple
of the (xmin,ymin,xmax,ymax) coordinate of the box) keys
Returns:
A numpy array of (X,Y) of transformed point
"""
floor_point = np.array([int((bbox["bboxReal"][0] + bbox["bboxReal"][2]) / 2),
bbox["bboxReal"][3], 1])
floor_world_point = np.matmul(self.h_inv, floor_point)
floor_world_point = floor_world_point[:-1] / floor_world_point[-1]
return floor_world_point
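A short illustration of the projective mapping applied above, using a made-up inverse homography; in the application h_inv is loaded from the camera's calibration file:

import numpy as np

h_inv = np.array([[0.02, 0.0, -5.0],
                  [0.0, 0.05, -10.0],
                  [0.0, 0.001, 1.0]])  # made-up matrix, for illustration only
bbox_real = [400.0, 300.0, 480.0, 620.0]  # hypothetical (xmin, ymin, xmax, ymax) in pixels
floor_point = np.array([int((bbox_real[0] + bbox_real[2]) / 2), bbox_real[3], 1])
world = np.matmul(h_inv, floor_point)
world = world[:-1] / world[-1]  # divide by the homogeneous coordinate
print(world)  # (X, Y) position of the person's feet in world coordinates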

def anonymize_image(self, img, objects_list):


"""
Anonymize every instance in the frame.
"""
h, w = img.shape[:2]
for box in objects_list:
xmin = max(int(box["bboxReal"][0]), 0)
xmax = min(int(box["bboxReal"][2]), w)
ymin = max(int(box["bboxReal"][1]), 0)
ymax = min(int(box["bboxReal"][3]), h)
ymax = (ymax - ymin) // 3 + ymin
roi = img[ymin:ymax, xmin:xmax]
roi = self.anonymize_face(roi)
img[ymin:ymax, xmin:xmax] = roi
return img

@staticmethod

def anonymize_face(image):
"""
Blur an image to anonymize the person's faces.
"""
(h, w) = image.shape[:2]
kernel_w = int(w / 3)
kernel_h = int(h / 3)
if kernel_w % 2 == 0:
kernel_w -= 1
if kernel_h % 2 == 0:
kernel_h -= 1
return cv.GaussianBlur(image, (kernel_w, kernel_h), 0)

# TODO: Make this an async task?


def capture_violation(self, file_name, cv_image):
self.uploader.upload_cv_image(self.bucket_screenshots, cv_image, file_name, self.camera_id)

def save_screenshot(self, cv_image):


dir_path = f'{self.screenshot_path}/default.jpg'
if not os.path.exists(dir_path):
logger.info(f"Saving default screenshot for {self.camera_id}")
cv.imwrite(f'{self.screenshot_path}/default.jpg', cv_image)
visualization_utils.py
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""A set of functions that are used for visualization.


These functions often receive an image, perform some visualization on the image.
Most functions do not return a value, instead they modify the image itself.
"""
import collections
import numpy as np
import PIL.Image as Image
import PIL.ImageColor as ImageColor
import PIL.ImageDraw as ImageDraw
import PIL.ImageFont as ImageFont
import cv2 as cv

_TITLE_LEFT_MARGIN = 10
_TITLE_TOP_MARGIN = 10

STANDARD_COLORS = [
"Green",
"Blue"
]

def draw_bounding_box_on_image_array(
image,
ymin,

xmin,
ymax,
xmax,
color=(255, 0, 0), # RGB
thickness=4,
display_str_list=(),
use_normalized_coordinates=True,
):
"""Adds a bounding box to an image (numpy array).

Bounding box coordinates can be specified in either absolute (pixel) or


normalized coordinates by setting the use_normalized_coordinates argument.

Args:
image: a numpy array with shape [height, width, 3].
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
draw_bounding_box_on_image(
image_pil,
ymin,

xmin,
ymax,
xmax,
color,
thickness,
display_str_list,
use_normalized_coordinates,
)
np.copyto(image, np.array(image_pil))

def draw_bounding_box_on_image(
image,
ymin,
xmin,
ymax,
xmax,
color=(255, 0, 0), # RGB
thickness=4,
display_str_list=(),
use_normalized_coordinates=True,
):
"""Adds a bounding box to an image.

Bounding box coordinates can be specified in either absolute (pixel) or


normalized coordinates by setting the use_normalized_coordinates argument.

Each string in display_str_list is displayed on a separate line above the


bounding box in black text on a rectangle filled with the input 'color'.
If the top of the bounding box extends to the edge of the image, the strings
are displayed below the bounding box.

Args:
image: a PIL.Image object.
ymin: ymin of bounding box.
xmin: xmin of bounding box.
ymax: ymax of bounding box.
xmax: xmax of bounding box.
color: color to draw bounding box. Default is red.
thickness: line thickness. Default value is 4.
display_str_list: list of strings to display in box
(each to be shown on its own line).
use_normalized_coordinates: If True (default), treat coordinates
ymin, xmin, ymax, xmax as relative to the image. Otherwise treat
coordinates as absolute.
"""
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
if use_normalized_coordinates:
(left, right, top, bottom) = (
xmin * im_width,
xmax * im_width,
ymin * im_height,
ymax * im_height,
)
else:
(left, right, top, bottom) = (xmin, xmax, ymin, ymax)
draw.line(
[(left, top), (left, bottom), (right, bottom), (right, top), (left, top)],
width=thickness,
fill=color,
)

try:
font = ImageFont.truetype("arial.ttf", 24)
except IOError:
font = ImageFont.load_default()

# If the total height of the display strings added to the top of the bounding
# box exceeds the top of the image, stack the strings below the bounding box
# instead of above.
display_str_heights = [font.getsize(ds)[1] for ds in display_str_list]
# Each display_str has a top and bottom margin of 0.05x.
total_display_str_height = (1 + 2 * 0.05) * sum(display_str_heights)

if top > total_display_str_height:


text_bottom = top
else:
text_bottom = bottom + total_display_str_height
# Reverse list and print from bottom to top.
for display_str in display_str_list[::-1]:
text_width, text_height = font.getsize(display_str)
margin = np.ceil(0.05 * text_height)
draw.rectangle(
[
(left, text_bottom - text_height - 2 * margin),
(left + text_width, text_bottom),
],
fill=color,
)
draw.text(
(left + margin, text_bottom - text_height - margin),
display_str,
fill="black",

font=font,
)
text_bottom -= text_height - 2 * margin

def draw_keypoints_on_image_array(
image, keypoints, color="red", radius=2, use_normalized_coordinates=True
):
"""Draws keypoints on an image (numpy array).

Args:
image: a numpy array with shape [height, width, 3].
keypoints: a numpy array with shape [num_keypoints, 2].
color: color to draw the keypoints with. Default is red.
radius: keypoint radius. Default value is 2.
use_normalized_coordinates: if True (default), treat keypoint values as
relative to the image. Otherwise treat them as absolute.
"""
image_pil = Image.fromarray(np.uint8(image)).convert("RGB")
draw_keypoints_on_image(
image_pil, keypoints, color, radius, use_normalized_coordinates
)
np.copyto(image, np.array(image_pil))

def draw_keypoints_on_image(
image, keypoints, color="red", radius=2, use_normalized_coordinates=True
):
"""Draws keypoints on an image.

Args:

image: a PIL.Image object.
keypoints: a numpy array with shape [num_keypoints, 2].
color: color to draw the keypoints with. Default is red.
radius: keypoint radius. Default value is 2.
use_normalized_coordinates: if True (default), treat keypoint values as
relative to the image. Otherwise treat them as absolute.
"""
draw = ImageDraw.Draw(image)
im_width, im_height = image.size
keypoints_x = [k[1] for k in keypoints]
keypoints_y = [k[0] for k in keypoints]
if use_normalized_coordinates:
keypoints_x = tuple([im_width * x for x in keypoints_x])
keypoints_y = tuple([im_height * y for y in keypoints_y])
for keypoint_x, keypoint_y in zip(keypoints_x, keypoints_y):
draw.ellipse(
[
(keypoint_x - radius, keypoint_y - radius),
(keypoint_x + radius, keypoint_y + radius),
],
outline=color,
fill=color,
)

def draw_mask_on_image_array(image, mask, color="red", alpha=0.4):


"""Draws mask on an image.

Args:
image: uint8 numpy array with shape (img_height, img_height, 3)
mask: a uint8 numpy array of shape (img_height, img_height) with

values between either 0 or 1.
color: color to draw the keypoints with. Default is red.
alpha: transparency value between 0 and 1. (default: 0.4)

Raises:
ValueError: On incorrect data type for image or masks.
"""
if image.dtype != np.uint8:
raise ValueError("`image` not of type np.uint8")
if mask.dtype != np.uint8:
raise ValueError("`mask` not of type np.uint8")
if np.any(np.logical_and(mask != 1, mask != 0)):
raise ValueError("`mask` elements should be in [0, 1]")
if image.shape[:2] != mask.shape:
raise ValueError(
"The image has spatial dimensions %s but the mask has "
"dimensions %s" % (image.shape[:2], mask.shape)
)
rgb = ImageColor.getrgb(color)
pil_image = Image.fromarray(image)

solid_color = np.expand_dims(np.ones_like(mask), axis=2) * np.reshape(


list(rgb), [1, 1, 3]
)
pil_solid_color = Image.fromarray(np.uint8(solid_color)).convert("RGBA")
pil_mask = Image.fromarray(np.uint8(255.0 * alpha * mask)).convert("L")
pil_image = Image.composite(pil_solid_color, pil_image, pil_mask)
np.copyto(image, np.array(pil_image.convert("RGB")))

def visualize_boxes_and_labels_on_image_array(

image,
boxes,
classes,
scores,
colors,
category_index,
instance_masks=None,
instance_boundaries=None,
keypoints=None,
use_normalized_coordinates=True,
max_boxes_to_draw=20,
min_score_thresh=0.0,
agnostic_mode=False,
line_thickness=4,
groundtruth_box_visualization_color="black",
skip_scores=False,
skip_labels=False,
):
"""Overlay labeled boxes on an image with formatted scores and label names.

This function groups boxes that correspond to the same location


and creates a display string for each detection and overlays these
on the image. Note that this function modifies the image in place, and returns
that same image.

Args:
image: uint8 numpy array with shape (img_height, img_width, 3)
boxes: a numpy array of shape [N, 4]
classes: a numpy array of shape [N]. Note that class indices are 1-based,
and match the keys in the label map.
scores: a numpy array of shape [N] or None. If scores=None, then

this function assumes that the boxes to be plotted are groundtruth
boxes and plot all boxes as black with no classes or scores.
colors: BGR format colors for drawing the boxes
category_index: a dict containing category dictionaries (each holding
category index `id` and category name `name`) keyed by category indices.
instance_masks: a numpy array of shape [N, image_height, image_width] with
values ranging between 0 and 1, can be None.
instance_boundaries: a numpy array of shape [N, image_height, image_width]
with values ranging between 0 and 1, can be None.
keypoints: a numpy array of shape [N, num_keypoints, 2], can
be None
use_normalized_coordinates: whether boxes is to be interpreted as
normalized coordinates or not.
max_boxes_to_draw: maximum number of boxes to visualize. If None, draw
all boxes.
min_score_thresh: minimum score threshold for a box to be visualized
agnostic_mode: boolean (default: False) controlling whether to evaluate in
class-agnostic mode or not. This mode will display scores but ignore
classes.
line_thickness: integer (default: 4) controlling line width of the boxes.
groundtruth_box_visualization_color: box color for visualizing groundtruth
boxes
skip_scores: whether to skip score when drawing a single detection
skip_labels: whether to skip label when drawing a single detection

Returns:
uint8 numpy array with shape (img_height, img_width, 3) with overlaid boxes.
"""
# Create a display string (and color) for every box location, group any boxes
# that correspond to the same location.
box_to_display_str_map = collections.defaultdict(list)

box_to_color_map = collections.defaultdict(str)
box_to_instance_masks_map = {}
box_to_instance_boundaries_map = {}
box_to_keypoints_map = collections.defaultdict(list)
if not max_boxes_to_draw:
max_boxes_to_draw = boxes.shape[0]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
if scores is None or scores[i] > min_score_thresh:
box = tuple(boxes[i].tolist())
if instance_masks is not None:
box_to_instance_masks_map[box] = instance_masks[i]
if instance_boundaries is not None:
box_to_instance_boundaries_map[box] = instance_boundaries[i]
if keypoints is not None:
box_to_keypoints_map[box].extend(keypoints[i])
if scores is None:
box_to_color_map[box] = groundtruth_box_visualization_color
else:
display_str = ""
if not skip_labels:
if not agnostic_mode:
if classes[i] in category_index.keys():
class_name = category_index[classes[i]]["name"]
else:
class_name = "N/A"
display_str = str(class_name)
if not skip_scores:
if not display_str:
display_str = "{}%".format(int(100 * scores[i]))
else:
display_str = "{}: {}%".format(

display_str, int(100 * scores[i])
)
box_to_display_str_map[box].append(display_str)
if agnostic_mode:
box_to_color_map[box] = "DarkOrange"
else:
box_to_color_map[box] = STANDARD_COLORS[
classes[i] % len(STANDARD_COLORS)
]

# Draw all boxes onto image.


for box, color in zip(boxes, colors):
xmin, ymin, xmax, ymax = box
if instance_masks is not None:
draw_mask_on_image_array(image, box_to_instance_masks_map[tuple(box)], color=color)
if instance_boundaries is not None:
draw_mask_on_image_array(
image, box_to_instance_boundaries_map[tuple(box)], color="red", alpha=1.0
)
draw_bounding_box_on_image_array(
image,
ymin,
xmin,
ymax,
xmax,
color=color,
thickness=line_thickness,
display_str_list=box_to_display_str_map[tuple(box)],
use_normalized_coordinates=use_normalized_coordinates,
)
if keypoints is not None:

draw_keypoints_on_image_array(
image,
box_to_keypoints_map[box],
color=color,
radius=line_thickness / 2,
use_normalized_coordinates=use_normalized_coordinates,
)

return image

def visualization_preparation(nn_out, distances, dist_threshold):


"""
prepare the objects boxes and id in order to visualize
Args:
nn_out: a list of dicionary contains normalized numbers of bonding boxes
{'id' : '0-0', 'bbox' : [x0, y0, x1, y1], 'score' : 0.99(optional} of shape [N, 3] or [N, 2]
distances: a symmetric matrix of normalized distances
dist_threshold: the minimum distance for considering unsafe distance between objects
Returns:
an output dictionary contains object classes, boxes, scores
"""
output_dict = {}
detection_classes = []
detection_scores = []
detection_boxes = []
is_violating = []
colors = []

distance = np.amin(distances + np.identity(len(distances)) * dist_threshold * 2, 0) if distances != [] else [dist_threshold]

for i, obj in enumerate(nn_out):
# Colorizing bounding box based on the distances between them
# R = 255 when dist=0 and R = 0 when dist > dist_threshold
redness_factor = 1.5
r_channel = np.maximum(255 * (dist_threshold - distance[i]) / dist_threshold, 0) * redness_factor
g_channel = 255 - r_channel
b_channel = 0
# Create a tuple object of colors
color = (int(b_channel), int(g_channel), int(r_channel))
# Get the object id
obj_id = obj["id"]
# Split and get the first item of obj_id
obj_id = obj_id.split("-")[0]
box = obj["bbox"]
if "score" in obj:
score = obj["score"]
else:
score = 1.0
# Append all processed items
detection_classes.append(int(obj_id))
detection_scores.append(score)
detection_boxes.append(box)
colors.append(color)
is_violating.append(True) if distance[i] < dist_threshold else is_violating.append(False)
output_dict["detection_boxes"] = np.array(detection_boxes)
output_dict["detection_scores"] = detection_scores
output_dict["detection_classes"] = detection_classes
output_dict["violating_objects"] = is_violating
output_dict["detection_colors"] = colors
return output_dict
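To make the colour interpolation above concrete, a small hypothetical example with a 150 cm threshold (the real value comes from the PostProcessor DistThreshold entry) and a nearest-neighbour distance of 50 cm:

import numpy as np

dist_threshold = 150.0  # assumed threshold in cm
nearest = 50.0          # assumed distance to the closest other person, in cm
redness_factor = 1.5
r_channel = np.maximum(255 * (dist_threshold - nearest) / dist_threshold, 0) * redness_factor
g_channel = 255 - r_channel
print((0, int(g_channel), int(r_channel)))  # (0, 0, 255): the box is drawn fully red (BGR)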

def birds_eye_view(input_frame, boxes, is_violating):
"""
This function receives a black window and draw circles (based on boxes) at the frame.
Args:
input_frame: uint8 numpy array with shape (img_height, img_width, 3)
boxes: A numpy array of shape [N, 4]
is_violating: list of booleans (True/False) indicating whether the corresponding object in boxes is
a violating object or not
Returns:
input_frame: Frame with red and green circles
"""
h, w = input_frame.shape[0:2]
for i, box in enumerate(boxes):
center_x = int((box[0] * w + box[2] * w) / 2)
center_y = int((box[1] * h + box[3] * h) / 2)
center_coordinates = (center_x, center_y)
color = (0, 0, 255) if is_violating[i] else (0, 255, 0)
input_frame = cv.circle(input_frame, center_coordinates, 2, color, 2)
return input_frame

def text_putter(input_frame, txt, origin, fontscale=0.75, color=(255, 0, 20), thickness=2):


"""
The function renders the specified text string on the image. This function does not return a
value; instead it modifies the input image.
Args:
input_frame: The source image, is an RGB image.
txt: The specific text string for drawing.

origin: Top-left corner of the text string in the image. The resolution should be normalized
between 0-1
fontscale: Font scale factor that is multiplied by the font-specific base size.
color: Text Color. (BGR format)
thickness: Thickness of the lines used to draw a text.
"""
resolution = input_frame.shape
origin = int(resolution[1] * origin[0]), int(resolution[0] * origin[1])
font = cv.FONT_HERSHEY_SIMPLEX
cv.putText(input_frame, txt, origin, font, fontscale,
color, thickness, cv.LINE_AA)

SCREENSHOT

REFERENCES

1. R. E. Park, (1924) "The Concept of Social Distance: As Applied to the Study of Racial
Relations," Journal of Applied Sociology, vol. 8, pp. 339-344.
2. N. Karakayali, (2009) "Social Distance and Affective Orientations," Sociological Forum,
vol. 24, no. 3, pp. 538-562.
3. A. Rupani, P. Whig, G. Sujediya and P. Vyas, (2017) "A robust technique for image
processing based on interfacing of Raspberry-Pi and FPGA using IoT," International
Conference on Computer, Communications and Electronics (Comptelix), IEEE Xplore,
18 August 2017.
4. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale
Hierarchical Image Database," in CVPR, 2009.
5. C. Lampert, H. Nickisch, and S. Harmeling, "Learning to detect unseen object classes by
between-class attribute transfer," in CVPR, 2009.
6. O. Russakovsky, J. Deng, Z. Huang, A. Berg, and L. Fei-Fei, "Detecting avocados to
zucchinis: what have we done, and where are we going?" in ICCV, 2013.
7. C. Fellbaum, WordNet: An Electronic Lexical Database. Blackwell Books, 1998.
8. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen (Google Inc.),
"MobileNetV2: Inverted Residuals and Linear Bottlenecks," in CVPR, 2018.

