Finding Vacant Parking Slots using AI

A Major Project Report Submitted To


Jawaharlal Nehru Technological University, Hyderabad in partial
fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
INFORMATION TECHNOLOGY
Submitted by:

K. Anil Kumar Reddy 17N31A1262


Mohammed Haji Baba 17N31A1296
Nandhigama Jeevan Reddy 17N31A12B2

Under the guidance of:


Mr. A. Yogananda MTech(Ph.D.)
Associate professor

DEPARTMENT OF INFORMATION TECHNOLOGY


MALLA REDDY COLLEGE OF ENGINEERING AND
TECHNOLOGY 2017-2021

(Autonomous Institution- UGC, Govt. of India)


(Affiliated to JNTUH, Hyderabad, Approved by AICTE, NBA & NAAC with ‘A’ Grade) Maisammaguda, Kompally,
Dhulapally, Secunderabad – 500100 website: www.mrcet.ac.in
Malla Reddy College of Engineering & Technology
DEPARTMENT OF INFORMATION TECHNOLOGY
(Autonomous Institution- UGC, Govt. of India)
(Affiliated to JNTUH, Hyderabad, Approved by AICTE, NBA & NAAC with ‘A’ Grade) Maisammaguda, Kompally,
Dhulapally, Secunderabad – 500100 website: www.mrcet.ac.in

CERTIFICATE

This is to certify that this is the bonafide record of the project entitled “Finding Vacant
Parking Slots using AI”, submitted by Kashireddy Anil Kumar Reddy (17N31A1262),
Mohammed Haji Baba (17N31A1296) and Nandhigama Jeevan Reddy (17N31A12B2) of
B.Tech in partial fulfillment of the requirements for the degree of Bachelor of Technology
in Information Technology, Department of IT, during the year 2020-2021. The results
embodied in this project report have not been submitted to any other university or institute
for the award of any degree or diploma.

Internal Guide Head of the Department


Mr. A. Yogananda MTech(Ph.D.) Dr. G. Sharada (M. Tech, Ph. D, MISTE, MCSI)

Associate Professor Director & HOD

External Examiner
DECLARATION

We hereby declare that the project titled “FINDING VACANT PARKING SLOTS USING
AI”, submitted to Malla Reddy College of Engineering and Technology (UGC Autonomous),
affiliated to Jawaharlal Nehru Technological University Hyderabad (JNTUH), for the award of the
degree of Bachelor of Technology in Information Technology, is the result of original work carried
out by us. It is further declared that the project report or any part thereof has not been previously
submitted to any university or institute for the award of any degree or diploma.

Kashireddy Anil Kumar Reddy 17N31A1262


Mohammed Haji Baba 17N31A1296
Nandhigama Jeevan Reddy 17N31A12B2
ACKNOWLEDGEMENT

We feel honored and privileged to place our warm salutation to our college, Malla Reddy College
of Engineering and Technology (UGC-Autonomous), and to our Director Dr. VSK Reddy and
Principal Dr. S. Srinivas Rao, who gave us the opportunity to gain experience in engineering and
profound technical knowledge.

We express our heartiest thanks to Dr. G. Sharada, Head of the Department for encouraging us
in every aspect of our course and helping us realize our full potential.

We would like to thank our Project Guide Mr. A. Yogananda for his regular guidance,
suggestions and constant encouragement. We are extremely grateful to our Project Incharge Mr.
Praveen for his continuous monitoring and unflinching co-operation throughout the project work.

We would like to thank our Class Incharge Mrs. Shilpa Thakur who in spite of being busy with
her academic duties took time to guide and keep us on the correct path.

We would also like to thank all the faculty members and supporting staff of the Department of
IT and all other departments who have been helpful directly or indirectly in making our project
a success.

We are extremely grateful to our parents for their blessings and prayers for the completion of
our project that gave us strength to do our project.

With regards and gratitude

Kashireddy Anil Kumar Reddy 17N31A1262

Mohammed Haji Baba 17N31A1296

Nandhigama Jeevan Reddy 17N31A12B2


ABSTRACT

Car parking occupancy detection is one of the most important systems needed at various
parking lots. For this project, CNNs have been used because they achieve more promising results
than traditional parking detection techniques. This project presents a robust technique for car
parking occupancy detection that addresses most of the common parking issues, such as parking
displacements, non-unified car sizes and inter-object occlusion. It presents real-time parking space
detection based on Convolutional Neural Networks (CNNs). The aim of this project is to solve the
car parking problem to some extent by taking videos from surveillance cameras and detecting
whether each parking space in the lot is Empty or Occupied. With this technique, the main parking
lot issue, the availability of parking spaces, is addressed, so that drivers do not waste much time
searching for a parking space and do not leave in frustration at not being able to find an empty
space. This project uses the YOLOv4 object detection algorithm, which is implemented using a
deep neural network architecture. This is achieved by collecting data from five parking lots at our
institute and training a YOLOv4 model on it. The detection is tested on both images and videos,
and the results indicate that this method is efficient at detecting cars in a parking lot.
1.INTRODUCTION ........................................................................................................................ 1
1.1 Overview of the Project ........................................................................................................... 1
1.2 PROBLEM STATEMENT ...................................................................................................... 2
1.3 OBJECTIVE............................................................................................................................ 2
2.SYSTEM ANALYSIS ................................................................................................................... 3
2.1 Existing System ....................................................................................................................... 3
2.2 Proposed system ...................................................................................................................... 4
2.3 Hardware Requirements ........................................................................................................... 5
2.4 Software Requirements ............................................................................................................ 5
2.5 SYSTEM STUDY ................................................................................................... 9
3 TECHNOLOGIES USED .......................................................................................................... 12
3.1 PYTHON .............................................................................................................................. 12
3.2 CNN ...................................................................................................................................... 21
3.3 Artificial Intelligence ............................................................................................................. 22
3.4 Machine Learning .................................................................................................................. 24
3.4.1 Introduction .................................................................................................................... 24
3.5 Deep Learning ....................................................................................................................... 30
4 SYSTEM DESIGN ..................................................................................................................... 32
4.1 System Architecture ............................................................................................................... 32
4.2 YOLOv4 Architecture............................................................................................................ 32
4.3 UML Diagram ....................................................................................................................... 34
5 METHODOLOGY ..................................................................................................................... 36
5.1 Data Collection ...................................................................................................................... 36
5.2 Annotating Images ................................................................................................................. 36
5.3 Data Processing ..................................................................................................................... 37
5.4 YOLOv4................................................................................................................................ 37
5.5 Object detectors ..................................................................................................................... 52
5.6 Training and Testing .............................................................................................................. 54
6. IMPLEMENTATION ............................................................................................................... 56
6.1 Code: ..................................................................................................................................... 56
7.OUTPUTS................................................................................................................................... 59
8.CONCLUSION ........................................................................................................................... 66
8.1 Future scope .......................................................................................................................... 67
9.REFERENCES ........................................................................................................................... 68
1.INTRODUCTION
1.1 Overview of the Project

Finding a parking space in a parking lot is a major issue these days that cannot be
neglected, because a lot of the driver's time and energy is wasted. As the population grows
day by day, the number of private vehicles keeps increasing, yet the number of parking spaces
in a parking lot usually remains the same. Sometimes there may be an empty space in the
parking lot, but the driver sitting in the car cannot know where that empty space is located.

The reason may be that the free spot is far away from the car, or that the parking space
is not visible because other cars or large objects hide it. Earlier, and even now at some parking
lots, there used to be a person who managed the parking spaces; in fact, even that person does
not know where the empty spaces are located and only knows the total number of empty spaces.
So the drivers themselves have to look around the parking lot and search for an empty space,
and in the meantime other drivers arrive, leading to many losses such as time, fuel and also temper.

For these kinds of issues, researchers have developed solutions such as parking lot
recognition techniques that use devices like video cameras and sensors to detect the empty and
occupied states of parking spaces. This system can be implemented mostly in outdoor parking
lots so that it can cover the whole lot. Outdoor parking lots include those at shopping malls,
restaurants and universities.

This can be useful as a real-time application: knowing the empty and occupied parking spaces
helps reduce the time taken to park a vehicle, the pollution caused, and the queues at the
parking lot.

1.2 PROBLEM STATEMENT

As the population grows, there is an increase in private vehicles, which has caused many
problems for people in crowded areas such as the parking lots of shopping malls, restaurants
and universities. This results in drivers wasting time searching for an empty parking space,
which also adds atmospheric pollution and driver frustration. On the other hand, most other
systems show only the total number of empty or occupied spaces rather than the exact location.

1.3 OBJECTIVE

 To design a model which can efficiently detect the Empty and Occupied status of
the parking spaces in an image and a video by making use of YOLOv4 object
detection.

 To detect the occupancy of a car parking lot in an image and in a video, classifying
each space as Empty or Occupied, using the YOLOv4 detector, with the output
overlaid using OpenCV, and to collect data which is good and large enough to train
the model.

2.SYSTEM ANALYSIS
2.1 Existing System

It is very common to see large crowds at places like mega shopping malls, stadiums
and universities. In heavily crowded places like shopping malls, there will be huge
crowds whenever merchants offer discounts and sales. As many people travel to these
shopping malls in their own vehicles, the parking lots will also be crowded on those
days. Similarly, students and staff at universities come to the institute in their own
vehicles. There are many drivers who only park at their favorite parking spaces in the
parking lot during these periods.
Generally, the solutions to parking lot problems are categorized into three groups:

• Counter-based

• Sensor-based

• Vision-based

Counter-based: This type of system works with sensors placed at the entrance and exit
points of the parking lot. These can only give information about the total number of empty
or occupied spaces in the parking lot but do not give the exact location of an empty space,
so these types of systems cannot be used for street and residential parking spaces.

Sensor-based: Sensor-based systems use wired and wireless sensors: the wired systems
mainly depend on ultrasonic and infrared sensors, while the wireless ones are based on
magnetic sensors fixed at the parking lot. Both wired and wireless sensors are mainly used
at indoor parking lots such as those at shopping malls. Since they are mainly used in indoor
environments, these ground sensors can be used to determine the position of the parking
spaces.

However, this kind of sensor requires high maintenance and installation costs
at each and every parking space, so in total this can become expensive when there are
many parking spaces in a parking lot. One solution is to use application-focused sensors
such as magnetic or infrared sensors and pneumatic road tubes, but these kinds of sensors
are used only rarely in some cities. Paperwork is required for installing these sensors for
parking space detection, time is wasted, and drivers face issues when they want to park
their cars while the installation of the sensors is going on.

Vision-based: Using vision-based systems rather than sensor-based ones has two
advantages: versatility and lower cost. Rather than using one ground sensor per parking
space, a single smart camera can be used, which decreases the cost. Compared to the other
systems, the accuracy of this approach is far better. Therefore, rather than using the other
systems, it is better to go with vision-based systems, which give highly accurate results.
In addition to the above advantages, the same cameras can also be used for video
surveillance activities such as face or person recognition and video tracking of moving
cars and people. Fig 2.3 below illustrates vision-based parking.

2.2 Proposed system


The proposed method presents a robust detection algorithm based on deep learning
techniques with good accuracy. It detects car occupancy in a parking lot in an image and in a
video by training the model on a parking lot dataset created for this project. This shows that
convolutional neural networks achieve a success rate close to the traditional methods used in
the past.
The system will be able to detect car occupancy in the parking lot according to the
camera direction once the camera is installed by the user. The images are collected from
surveillance cameras at different angles of the parking lot so that the model can detect cars
from different angles. The data for this is collected from five parking lots at the institute for
better results. As mentioned earlier, there are some disadvantages to this vision-based
approach, but the detection is more accurate and the advantages far outweigh the
disadvantages.
Image-based detection methods are widely used in many areas. With the help of
Convolutional Neural Networks (CNNs) for image category recognition tasks, we can find
parking spaces. The parking occupancy detection framework uses a deep CNN and a
YOLOv4 classifier to detect the occupancy of outdoor parking spaces from images.

The classifier was trained and tested on features learned by the deep CNN from public
datasets (PKLot) with different illuminance and weather conditions. Subsequently, we
evaluate the transfer learning performance of the developed method on a parking dataset
created for this project. The goal is to detect car parking lot occupancy in an image and in a
video, classifying each space as Empty or Occupied using the YOLOv4 detector, with the
output overlaid using OpenCV, and to collect data that is good and large enough to train
the model.
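
As a rough illustration of how a trained YOLOv4 detector can be run on a parking-lot frame and overlaid using OpenCV, a minimal sketch is shown below; the file names (yolov4.cfg, yolov4.weights, obj.names, parking_lot.jpg) and thresholds are assumptions for illustration, not the exact configuration used in this project.

import cv2

# Load a trained YOLOv4 network (config and weights file names are assumed).
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

# Class names, e.g. "Empty" and "Occupied" (assumed file).
with open("obj.names") as f:
    class_names = [line.strip() for line in f]

frame = cv2.imread("parking_lot.jpg")  # one surveillance-camera frame
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)

# Overlay each detection on the frame.
for class_id, score, (x, y, w, h) in zip(class_ids, scores, boxes):
    label = "%s: %.2f" % (class_names[int(class_id)], float(score))
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("parking_lot_detected.jpg", frame)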

2.3 Hardware Requirements


 RAM: 2 GB
 Processor: Intel i3
 Refresh rate: 60 Hz
 Hard disk: 40 GB

2.4 Software Requirements

We have used Python, and the libraries used in this project are:

 OpenCV: OpenCV (Open Source Computer Vision Library) is an open-source computer
vision and machine learning software library. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine
perception in commercial products. Being a BSD-licensed product, OpenCV makes it
easy for businesses to utilize and modify the code.

The library has more than 2500 optimized algorithms, which includes a comprehensive
set of both classic and state-of-the-art computer vision and machine learning algorithms.
These algorithms can be used to detect and recognize faces, identify objects, classify
human actions in videos, track camera movements, track moving objects, extract 3D
models of objects, produce 3D point clouds from stereo cameras, stitch images together
to produce a high resolution image of an entire scene, find similar images from an image
database, remove red eyes from images taken using flash, follow eye movements,
recognize scenery and establish markers to overlay it with augmented reality, etc.
OpenCV has more than 47 thousand people of user community and estimated number of
downloads exceeding 18 million. The library is used extensively in companies, research
groups and by governmental bodies.

Along with well-established companies like Google, Yahoo, Microsoft, Intel, IBM, Sony,
Honda, Toyota that employ the library, there are many start-ups such as Applied Minds,
VideoSurf, and Zeitera, that make extensive use of OpenCV. OpenCV’s deployed uses
span the range from stitching street view images together, detecting intrusions in

surveillance video in Israel, monitoring mine equipment in China, helping robots navigate
and pick up objects at Willow Garage, detection of swimming pool drowning accidents
in Europe, running interactive art in Spain and New York, checking runways for debris in
Turkey, inspecting labels on products in factories around the world on to rapid face
detection in Japan.

It has C++, Python, Java and MATLAB interfaces and supports Windows,
Linux, Android and Mac OS. OpenCV leans mostly towards real-time vision applications
and takes advantage of MMX and SSE instructions when available. Full-featured CUDA
and OpenCL interfaces are being actively developed. There are over 500
algorithms and about 10 times as many functions that compose or support those algorithms.
OpenCV is written natively in C++ and has a templated interface that works seamlessly
with STL containers.
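
Since this project reads surveillance video and overlays the detection results with OpenCV, a minimal sketch of that read-frame-and-draw pattern is given below; the video file name and the drawn box are illustrative placeholders, not the project's actual detection logic.

import cv2

cap = cv2.VideoCapture("parking_lot.mp4")  # assumed surveillance video file
while True:
    ok, frame = cap.read()
    if not ok:
        break  # end of the video stream
    # ... run the parking-space detector on `frame` here ...
    cv2.rectangle(frame, (50, 80), (180, 160), (0, 0, 255), 2)  # placeholder "Occupied" box
    cv2.putText(frame, "Occupied", (50, 75), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    cv2.imshow("Parking occupancy", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()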

 Lxml:
lxml is a Pythonic, mature binding for the libxml2 and libxslt libraries. It provides safe and
convenient access to these libraries using the ElementTree API. It extends the ElementTree
API significantly to offer support for XPath, RelaxNG, XML Schema, XSLT, C14N and
much more.
 Tqdm:
tqdm derives from the Arabic word taqaddum which can mean “progress,” and is an
abbreviation for “I love you so much” in Spanish (te quiero demasiado).
Instantly make your loops show a smart progress meter - just wrap any iterable with
tqdm(iterable), and you’re done!
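
A minimal usage sketch of the wrap-an-iterable pattern described above (the loop body is just a placeholder for per-item work such as processing one image):

from tqdm import tqdm
import time

for item in tqdm(range(100)):  # wraps any iterable and shows a progress bar
    time.sleep(0.01)           # placeholder for the real work done per item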

 TensorFlow==2.3.0rc0

TensorFlow is an end-to-end open-source platform for machine learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets
researchers push the state of the art in ML and developers easily build and deploy
ML-powered applications.

 Easy model building
Build and train ML models easily using intuitive high-level APIs like Keras with eager
execution, which makes for immediate model iteration and easy debugging.

Robust ML production anywhere
Easily train and deploy models in the cloud, on-prem, in the browser, or on-device, no
matter what language you use.

Powerful experimentation for research
A simple and flexible architecture to take new ideas from concept to code, to state-of-
the-art models, and to publication faster.
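
As a rough sketch of the "easy model building" style mentioned above, a tiny Keras classifier might look like the following; the layer sizes and input shape are illustrative assumptions, not the model used in this project.

import tensorflow as tf

# A small fully connected classifier for 64x64 grayscale patches (shapes are assumed).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. Empty vs. Occupied
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # train on labeled patches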
 absl-py:
This repository is a collection of Python library code for building Python applications.
The code is collected from Google's own Python code base, and has been extensively
tested and used in production.
Features
 Simple application startup
 Distributed command line flags system
 Custom logging module with additional features
 Testing utilities
Installation
To install the package, simply run:
pip install absl-py
Or install from source:
python setup.py install
Running Tests
To run the Abseil tests, you can clone the git repo and run Bazel:
git clone https://github.com/abseil/abseil-py.git
cd abseil-py
bazel test absl/...
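A minimal sketch of the command-line flags and application-startup features listed above (the flag name and default value are arbitrary examples):

from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string("video", "parking_lot.mp4", "Path to the input video.")  # example flag

def main(argv):
    del argv  # unused
    print("Processing", FLAGS.video)

if __name__ == "__main__":
    app.run(main)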
 easydict: EasyDict allows accessing dict values as attributes (works recursively): a
JavaScript-like dot notation for Python dicts.
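For example, a small sketch (the keys and values are arbitrary):

from easydict import EasyDict

cfg = EasyDict({"model": {"input_size": 416, "classes": ["Empty", "Occupied"]}})
print(cfg.model.input_size)  # 416, accessed with dot notation instead of cfg["model"]["input_size"]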

 Matplotlib: Matplotlib is a plotting library for the Python programming language and
its numerical mathematics extension NumPy. It provides an object-oriented API for
embedding plots into applications using general-purpose GUI toolkits like Tkinter,
wxPython, Qt, or GTK. There is also a procedural "pylab" interface based on a state
machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is
discouraged. SciPy makes use of Matplotlib.

Matplotlib was originally written by John D. Hunter. Since then, it has had an active
development community and is distributed under a BSD-style license. Michael Droettboom
was nominated as Matplotlib's lead developer shortly before John Hunter's death in August
2012 and was later joined by Thomas Caswell.

Matplotlib 2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with
Matplotlib 1.2. Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has
pledged not to support Python 2 past 2020 by signing the Python 3 Statement.
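
A minimal plotting sketch with the object-oriented API mentioned above (the data points are arbitrary example values):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 7, 12, 3])  # example data
ax.set_xlabel("frame")
ax.set_ylabel("empty spaces")
fig.savefig("empty_spaces.png")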

 Pillow: Pillow is the friendly PIL fork by Alex Clark and Contributors. PIL is the Python
Imaging Library by Fredrik Lundh and Contributors. As of 2019, Pillow development is
supported by Tidelift. The Python Imaging Library adds image processing capabilities to
your Python interpreter.

This library provides extensive file format support, an efficient internal representation, and
fairly powerful image processing capabilities.

The core image library is designed for fast access to data stored in a few basic pixel formats.
It should provide a solid foundation for a general image processing tool.
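
A minimal Pillow usage sketch (the file names are examples):

from PIL import Image

img = Image.open("parking_lot.jpg")        # example image file
img = img.convert("L").resize((416, 416))  # convert to grayscale and resize
img.save("parking_lot_processed.jpg")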

 pytesseract

Python-tesseract is an optical character recognition (OCR) tool for python. That is, it will
recognize and “read” the text embedded in images.

Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a
stand-alone invocation script for tesseract, as it can read all image types supported by the
Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others.
Additionally, if used as a script, Python-tesseract will print the recognized text instead of
writing it to a file.
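
A minimal OCR sketch (the image file name is an example, and the Tesseract engine itself must be installed separately):

from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("sign.png"))  # example image containing text
print(text)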

2.5 SYSTEM STUDY

FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth
with a very general plan for the project and some cost estimates. During system analysis, the
feasibility study of the proposed system is carried out. This is to ensure that the proposed
system is not a burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

• ECONOMICAL FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the
organization. The amount of fund that the company can pour into the research and development
of the system is limited. The expenditures must be justified. Thus, the developed system as
well within the budget and this was achieved because most of the technologies used are freely
available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand on the
available technical resources, as this would lead to high demands being placed on the
client. The developed system must have modest requirements, as only minimal or no
changes are required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but instead must accept it as a necessity. The level of acceptance by
the users solely depends on the methods that are employed to educate the user about the system
and to make them familiar with it. The user's level of confidence must be raised so that they are
also able to make some constructive criticism, which is welcomed, as they are the final user of
the system.
INPUT AND OUTPUT DESIGN
INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation, i.e., the steps
necessary to put transaction data into a usable form for processing. This can be achieved by
having the computer read data from a written or printed document, or by having people key
the data directly into the system. The design of input focuses on controlling the amount of
input required, controlling errors, avoiding delay, avoiding extra steps and keeping the process
simple. The input is designed in such a way that it provides security and ease of use while
retaining privacy. Input design considered the following things:
What data should be given as input?
How should the data be arranged or coded?
The dialog to guide the operating personnel in providing input.
Methods for preparing input validations and steps to follow when errors occur.

OBJECTIVES
1. Input design is the process of converting a user-oriented description of the input
into a computer-based system. This design is important to avoid errors in the data input process
and show the correct direction to the management for getting correct information from the
computerized system.
2. It is achieved by creating user-friendly screens for data entry to handle large
volumes of data. The goal of designing input is to make data entry easier and free from
errors. The data entry screen is designed in such a way that all data manipulations can be
performed. It also provides record viewing facilities.
3. When the data is entered it will be checked for validity. Data can be entered with
the help of screens. Appropriate messages are provided as and when needed, so that the user
is not left confused at any instant. Thus, the objective of input design is to create an input
layout that is easy to follow.

OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and
presents the information clearly. In any system, the results of processing are communicated to
the users and to other systems through outputs. In output design it is determined how the
information is to be displayed for immediate need, and also the hard-copy output. It is the most
important and direct source of information to the user. Efficient and intelligent output design
improves the system's relationship with the user and helps user decision-making.
1. Designing computer output should proceed in an organized, well-thought-out
manner; the right output must be developed while ensuring that each output element is designed
so that people will find the system easy and effective to use. When analysts design computer
output, they should identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information produced by
the system.
The output form of an information system should accomplish one or more of the
following objectives:
• Convey information about past activities, current status or projections of the future.
• Signal important events, opportunities, problems, or warnings.
• Trigger an action.
• Confirm an action.

3 TECHNOLOGIES USED
3.1 PYTHON

Python is a general-purpose interpreted, interactive, object-oriented, high-level
programming language. An interpreted language, Python has a design philosophy that
emphasizes code readability (notably using whitespace indentation to delimit code blocks
rather than curly brackets or keywords), and a syntax that allows programmers to express
concepts in fewer lines of code than might be used in languages such as C++ or Java.
It provides constructs that enable clear programming on both small and large scales.
Python interpreters are available for many operating systems. CPython, the reference
implementation of Python, is open-source software and has a community-based development
model, as do nearly all of its variant implementations. Python is managed by the non-profit
Python Software Foundation. Python features a dynamic type system and automatic memory
management. It supports multiple programming paradigms, including object-oriented,
imperative, functional and procedural, and has a large and comprehensive standard library.
What is Python
Python is a popular programming language. It was created by Guido van Rossum, and released
in 1991.
It is used for:
 web development (server-side),
 software development
 mathematics
 system scripting.
What can Python do
 Python can be used on a server to create web applications.
 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready software
development.
Why Python
 Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.
 Python runs on an interpreter system, meaning that code can be executed as soon as it
is written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-orientated way or a functional
way.

Good to know
 The most recent major version of Python is Python 3, which we shall be using in this
tutorial. However, Python 2, although not being updated with anything other than
security updates, is still quite popular.
 In this tutorial Python will be written in a text editor. It is possible to write Python in
an Integrated Development Environment, such as Thonny, PyCharm, NetBeans or
Eclipse which are particularly useful when managing larger collections of Python files.
Python Syntax compared to other programming languages
 Python was designed for readability, and has some similarities to the English language
with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets for
this purpose.
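For instance, a small sketch of indentation defining scope:

# The indented lines belong to the loop; the doubly indented lines belong to the if/else blocks.
for n in range(5):
    if n % 2 == 0:
        print(n, "is even")
    else:
        print(n, "is odd")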
Python Install
Many PCs and Macs will have python already installed.
To check if you have python installed on a Windows PC, search in the start bar for Python or
run the following on the Command Line (cmd.exe):
C:\Users\Your Name>python --version
To check if you have python installed on a Linux or Mac, then on Linux open the command
line or on Mac open the Terminal and type:
python --version
If you find that you do not have python installed on your computer, then you can download it
for free from the following website: https://www.python.org/
Python QuickStart
Python is an interpreted programming language; this means that as a developer you write
Python (.py) files in a text editor and then put those files into the python interpreter to be
executed.
The way to run a python file is like this on the command line:
C:\Users\Your Name>python helloworld.py
Where "helloworld.py" is the name of your python file.
Let's write our first Python file, called helloworld.py, which can be done in any text editor.
helloworld.py
print ("Hello, World!")

Simple as that. Save your file. Open your command line, navigate to the directory where you
saved your file, and run:
C:\Users\Your Name>python helloworld.py
The output should read:
Hello, World!
Congratulations, you have written and executed your first Python program.
The Python Command Line
To test a short amount of code in python sometimes it is quickest and easiest not to write the
code in a file. This is made possible because Python can be run as a command line itself.
Type the following on the Windows, Mac or Linux command line:
C:\Users\Your Name>python
Or, if the "python" command did not work, you can try "py":
C:\Users\Your Name>py
From there you can write any python, including our hello world example from earlier in the
tutorial:
C:\Users\Your Name>python
Python 3.6.4 (v3.6.4: d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bits (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>print ("Hello, World!")
Which will write "Hello, World!" in the command line:
C:\Users\Your Name>python
Python 3.6.4 (v3.6.4: d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bits (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>print ("Hello, World!")
Hello, World!
Whenever you are done in the python command line, you can simply type the following to quit
the python command line interface:
exit ()
Virtual Environments and Packages
Introduction
Python applications will often use packages and modules that don’t come as part of the standard
library. Applications will sometimes need a specific version of a library, because the

application may require that a particular bug has been fixed or the application may be written
using an obsolete version of the library’s interface.
This means it may not be possible for one Python installation to meet the requirements of every
application. If application A needs version 1.0 of a particular module but application B needs
version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will
leave one application unable to run.
The solution for this problem is to create a virtual environment, a self-contained directory tree
that contains a Python installation for a particular version of Python, plus a number of
additional packages.
Different applications can then use different virtual environments. To resolve the earlier
example of conflicting requirements, application A can have its own virtual environment with
version 1.0 installed while application B has another virtual environment with version 2.0. If
application B requires a library be upgraded to version 3.0, this will not affect application A’s
environment.
Creating Virtual Environments
The module used to create and manage virtual environments is called venv. venv will usually
install the most recent version of Python that you have available. If you have multiple versions
of Python on your system, you can select a specific Python version by running python3 or
whichever version you want.
To create a virtual environment, decide upon a directory where you want to place it, and run
the venv module as a script with the directory path:
python3 -m venv tutorial-env
This will create the tutorial-env directory if it doesn’t exist, and also create directories inside it
containing a copy of the Python interpreter, the standard library, and various supporting files.
A common directory location for a virtual environment is .venv. This name keeps the directory
typically hidden in your shell and thus out of the way while giving it a name that explains why
the directory exists. It also prevents clashing with .env environment variable definition files
that some tooling supports.
Once you’ve created a virtual environment, you may activate it.
On Windows, run:
tutorial-env\Scripts\activate.bat
On Unix or MacOS, run:
source tutorial-env/bin/activate
(This script is written for the bash shell. If you use the csh or fish shells, there are alternate
activate.csh and activate.fish scripts you should use instead.)

Activating the virtual environment will change your shell’s prompt to show what virtual
environment you’re using, and modify the environment so that running python will get you
that particular version and installation of Python. For example:
$ source ~/envs/tutorial-env/bin/activate
(tutorial-env) $ python
Python 3.5.1 (default, May 6 2016, 10:59:36)
...
>>> import sys
>>>sys.path
['', '/usr/local/lib/python35.zip', ...,
'~/envs/tutorial-env/lib/python3.5/site-packages']
>>>
Managing Packages with pip
You can install, upgrade, and remove packages using a program called pip. By default pip will
install packages from the Python Package Index, <https://pypi.org>. You can browse the
Python Package Index by going to it in your web browser, or you can use pip’s limited search
feature:
(tutorial-env) $ pip search astronomy
skyfield - Elegant astronomy for Python
gary - Galactic astronomy and gravitational dynamics.
novas - The United States Naval Observatory NOVAS astronomy library
astroobs - Provides astronomy ephemeris to plan telescope observations
PyAstronomy - A collection of astronomy related tools for Python.
...
pip has a number of subcommands: “search”, “install”, “uninstall”, “freeze”, etc. (Consult the
Installing Python Modules guide for complete documentation for pip.)
You can install the latest version of a package by specifying a package’s name:
(tutorial-env) $ pip install novas
Collecting novas
Downloading novas-3.1.1.3.tar.gz (136kB)
Installing collected packages: novas
Running setup.py install for novas

Successfully installed novas-3.1.1.3
You can also install a specific version of a package by giving the package name followed by
== and the version number:
(tutorial-env) $ pip install requests==2.6.0
Collecting requests==2.6.0
Using cached requests-2.6.0-py2.py3-none-any.whl
Installing collected packages: requests
Successfully installed requests-2.6.0
If you re-run this command, pip will notice that the requested version is already installed and
do nothing. You can supply a different version number to get that version, or you can run pip
install --upgrade to upgrade the package to the latest version:

(tutorial-env) $ pip install --upgrade requests


Collecting requests
Installing collected packages: requests
Found existing installation: requests 2.6.0
Uninstalling requests-2.6.0:
Successfully uninstalled requests-2.6.0
Successfully installed requests-2.7.0
pip uninstall followed by one or more package names will remove the packages from the virtual
environment.

pip show will display information about a particular package:

(tutorial-env) $ pip show requests


---
Metadata-Version: 2.0
Name: requests
Version: 2.7.0
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz

Author-email: me@kennethreitz.com
License: Apache 2.0
Location: /Users/akuchling/envs/tutorial-env/lib/python3.4/site-packages
Requires:
pip list will display all of the packages installed in the virtual environment:

(tutorial-env) $ pip list


novas (3.1.1.3)
numpy (1.9.2)
pip (7.0.3)
requests (2.7.0)
setuptools (16.0)
pip freeze will produce a similar list of the installed packages, but the output uses the format
that pip install expects. A common convention is to put this list in a requirements.txt file:

(tutorial-env) $ pip freeze > requirements.txt


(tutorial-env) $ cat requirements.txt
novas==3.1.1.3
numpy==1.9.2
requests==2.7.0
The requirements.txt can then be committed to version control and shipped as part of an
application. Users can then install all the necessary packages with install -r:

(tutorial-env) $ pip install -r requirements.txt


Collecting novas==3.1.1.3 (from -r requirements.txt (line 1))
...
Collecting numpy==1.9.2 (from -r requirements.txt (line 2))
...
Collecting requests==2.7.0 (from -r requirements.txt (line 3))
...
Installing collected packages: novas, numpy, requests

Running setup.py install for novas
Successfully installed novas-3.1.1.3 numpy-1.9.2 requests-2.7.0
pip has many more options. Consult the Installing Python Modules guide for complete
documentation for pip. When you’ve written a package and want to make it available on the
Python Package Index, consult the Distributing Python Modules guide.
Cross Platform
platform.architecture(executable=sys.executable, bits='', linkage='')
Queries the given executable (defaults to the Python interpreter binary) for various architecture
information.
Returns a tuple (bits, linkage) which contain information about the bit architecture and the
linkage format used for the executable. Both values are returned as strings.

Values that cannot be determined are returned as given by the parameter presets. If bits is
given as '', the sizeof(pointer) (or sizeof(long) on Python versions < 1.5.2) is used as an
indicator for the supported pointer size.
The function relies on the system’s file command to do the actual work. This is available on
most if not all Unix platforms and some non-Unix platforms and then only if the executable
points to the Python interpreter. Reasonable defaults are used when the above needs are not
met.
Note on Mac OS X (and perhaps other platforms), executable files may be universal files
containing multiple architectures.
To get at the “64-bitness” of the current interpreter, it is more reliable to query the sys.maxsize
attribute:
is_64bits = sys.maxsize > 2**32
platform.machine()
Returns the machine type, e.g., 'i386'. An empty string is returned if the value cannot be
determined.
platform.node()
Returns the computer’s network name (may not be fully qualified!). An empty string is returned
if the value cannot be determined.
platform.platform(aliased=0, terse=0)
Returns a single string identifying the underlying platform with as much useful information as
possible.
The output is intended to be human readable rather than machine parseable. It may look
different on different platforms and this is intended.

If aliased is true, the function will use aliases for various platforms that report system names
which differ from their common names; for example, SunOS will be reported as Solaris. The
system_alias() function is used to implement this.
Setting terse to true causes the function to return only the absolute minimum information
needed to identify the platform.
platform.processor()
Returns the (real) processor name, e.g., 'amdk6'.
An empty string is returned if the value cannot be determined. Note that many platforms do not
provide this information or simply return the same value as for machine(). NetBSD does this.
platform.python_build()
Returns a tuple (buildno, builddate) stating the Python build number and date as strings.
platform.python_compiler()
Returns a string identifying the compiler used for compiling Python.
platform.python_branch()
Returns a string identifying the Python implementation SCM branch.
New in version 2.6.
platform.python_implementation()
Returns a string identifying the Python implementation. Possible return values are: ‘CPython’,
‘IronPython’, ‘Jython’, ‘PyPy’.
New in version 2.6.
platform.python_revision()
Returns a string identifying the Python implementation SCM revision.
New in version 2.6.
platform.python_version()
Returns the Python version as string 'major.minor.patchlevel'.
Note that unlike the Python sys.version, the returned value will always include the patch level
(it defaults to 0).
platform.python_version_tuple()
Returns the Python version as tuple (major, minor, patch level) of strings.
Note that unlike the Python sys.version, the returned value will always include the patch level
(it defaults to '0').
platform.release()

Returns the system’s release, e.g., '2.2.0' or 'NT'. An empty string is returned if the value cannot
be determined.
platform.system()
Returns the system/OS name, e.g., 'Linux', 'Windows', or 'Java'. An empty string is returned if
the value cannot be determined.
platform.system_alias(system, release, version)
Returns (system, release, version) aliased to common marketing names used for some systems.
It also does some reordering of the information in some cases where it would otherwise cause
confusion.
platform.version()
Returns the system’s release version, e.g., '#3 on degas'. An empty string is returned if the value
cannot be determined.

platform.uname()
Fairly portable uname interface. Returns a tuple of strings (system, node, release, version,
machine, processor) identifying the underlying platform.
Note that unlike the os.uname() function this also returns possible processor information as
additional tuple entry.
Entries which cannot be determined are set to ''.
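
A small sketch using a few of the functions listed above:

import platform
import sys

print(platform.system())          # e.g. 'Windows' or 'Linux'
print(platform.python_version())  # e.g. '3.6.4'
print(platform.machine())         # e.g. 'AMD64'
is_64bits = sys.maxsize > 2**32   # recommended check for a 64-bit interpreter
print(is_64bits)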

3.2 CNN

Convolutional Neural Networks (ConvNets or CNNs) are analogous to biological neural
networks in that they are built from neurons with learnable weights. By using Convolutional
Neural Networks, the task can be learned easily by the network. Here, a CNN is used with the
pre-existing algorithm to recognize, in real time, the occupancy of the parking spaces. This
gives a quality, accurate, efficient and scalable answer for real-time recognition of the parking
lot. Convolutional Neural Networks are a kind of multi-layered network designed in such a
way that, with only some preprocessing, they can detect visual patterns from the image itself.
Learning techniques, particularly convolutional neural networks, provide a solution to
problems such as parking occupancy detection.

This deep learning technique provides a solution that is robust to disturbances such as
partial occlusions, presence of shadows and variations of lighting conditions, and it exhibits
great performance. The results remain of good quality even when the parking occupancy
detection is done in an environment that is totally different from the data used while training
the model. In the classification phase it needs few computational resources, which makes it
appropriate for embedded environments such as the Raspberry Pi. A convolutional neural
network is a stack of a large number of hidden layers, in which each layer performs
mathematical computations on the image and produces an output that is given as input to the
next layer.

A convolutional neural network differs from classical neural networks in that the CNN has
convolutional layers, which are able to model the spatial correlation of neighboring pixels
better than normal fully connected layers. The classification output depends on the labeled
input given during the training phase. The model takes a lot of training time and is also a bit
expensive, but once training is done the detection by the network is quite fast and efficient.
The figure below represents the Convolutional Neural Network (CNN) architecture.

Fig: CNN Architecture
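
As a rough illustration of the layered structure described above, a minimal Keras CNN for a binary Empty/Occupied patch classifier could look like the following; the layer sizes and input shape are illustrative assumptions, not the network used in this project.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # 1 = Occupied, 0 = Empty
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_patches, train_labels, epochs=10)  # labeled parking-space patches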

3.3 Artificial Intelligence


Introduction to Artificial Intelligence
“The science and engineering of making intelligent machines, especially intelligent computer
programs”. -John McCarthy-
Artificial Intelligence is an approach to make a computer, a robot, or a product think the way
smart humans think. AI is the study of how the human brain thinks, learns, decides and works
when it tries to solve problems. Finally, this study produces intelligent software systems. The
aim of AI is to improve computer functions which are related to human knowledge, for example
reasoning, learning, and problem-solving.
The intelligence is intangible. It is composed of

Reasoning
Learning
Problem Solving
Perception
Linguistic Intelligence
The objectives of AI research are reasoning, knowledge representation, planning, learning,
natural language processing, realization, and ability to move and manipulate objects. There are
long-term goals in the general intelligence sector.
Approaches include statistical methods, computational intelligence, and traditional coding AI.
During the AI research related to search and mathematical optimization, artificial neural
networks and methods based on statistics, probability, and economics, we use many tools.
Computer science attracts AI in the field of science, mathematics, psychology, linguistics,
philosophy and so on.
Applications of AI
 Gaming − AI plays an important role in enabling a machine to think of a large number of
possible positions based on deep knowledge in strategic games, for example chess, river
crossing and the N-queens problem.
 Natural Language Processing − Interact with the computer that understands natural
language spoken by humans.
 Expert Systems − Machine or software provide explanation and advice to the users.
 Vision Systems − Systems understand, explain, and describe visual input on the
computer.
 Speech Recognition − Some AI-based speech recognition systems have the ability to hear
and express sentences and understand their meanings while a person talks to them. For
example, Siri and Google Assistant.
 Handwriting Recognition − Handwriting recognition software reads the text written on
paper, recognizes the shapes of the letters and converts it into editable text.
 Intelligent Robots − Robots are able to perform the instructions given by a human.

Major Goals
 Knowledge reasoning
 Planning
 Machine Learning
 Natural Language Processing
 Computer Vision
 Robotics
IBM Watson
“Watson” is an IBM supercomputer that combines Artificial Intelligence (AI) and complex
inquisitive programming for ideal execution as a “question answering” machine. The
supercomputer is named for IBM’s founder, Thomas J. Watson.
IBM Watson is at the forefront of the new era of computing. At the point when IBM Watson
made, IBM communicated that “more than 100 particular techniques are used to inspect
perceive sources, find and make theories, find and score affirm, and combination and rank
speculations.” recently, the Watson limits have been expanded and the way by which Watson
works has been changed to abuse new sending models (Watson on IBM Cloud) and propelled
machine learning capacities and upgraded hardware open to architects and authorities. It isn’t
any longer completely a request answering figuring system arranged from Q&A joins yet can
now ‘see’, ‘hear’, ‘read’, ‘talk’, ‘taste’, ‘translate’, ‘learn’ and ‘endorse’

3.4 Machine Learning

3.4.1 Introduction

Machine learning is a subfield of artificial intelligence (AI). The goal of machine learning
generally is to understand the structure of data and fit that data into models that can be
understood and utilized by people.
Although machine learning is a field within computer science, it differs from traditional
computational approaches. In traditional computing, algorithms are sets of explicitly
programmed instructions used by computers to calculate or problem solve. Machine learning
algorithms instead allow for computers to train on data inputs and use statistical analysis in
order to output values that fall within a specific range. Because of this, machine learning
facilitates computers in building models from sample data in order to automate decision-
making processes based on data inputs.

Any technology user today has benefitted from machine learning. Facial recognition
technology allows social media platforms to help users tag and share photos of friends. Optical
character recognition (OCR) technology converts images of text into movable type.
Recommendation engines, powered by machine learning, suggest what movies or television
shows to watch next based on user preferences. Self-driving cars that rely on machine learning
to navigate may soon be available to consumers.
Machine learning is a continuously developing field. Because of this, there are some
considerations to keep in mind as you work with machine learning methodologies, or analyze
the impact of machine learning processes.
In this tutorial, we’ll look into the common machine learning methods of supervised and
unsupervised learning, and common algorithmic approaches in machine learning, including the
k-nearest neighbor algorithm, decision tree learning, and deep learning. We’ll explore which
programming languages are most used in machine learning, providing you with some of the
positive and negative attributes of each. Additionally, we’ll discuss biases that are perpetuated
by machine learning algorithms, and consider what can be kept in mind to prevent these biases
when building algorithms.
Machine Learning Methods:
In machine learning, tasks are generally classified into broad categories. These categories are
based on how learning is received or how feedback on the learning is given to the system
developed.
Two of the most widely adopted machine learning methods are supervised learning which
trains algorithms based on example input and output data that is labeled by humans,
and unsupervised learning which provides the algorithm with no labeled data in order to
allow it to find structure within its input data. Let’s explore these methods in more detail.
Supervised Learning:
In supervised learning, the computer is provided with example inputs that are labeled with their
desired outputs. The purpose of this method is for the algorithm to be able to “learn” by
comparing its actual output with the “taught” outputs to find errors, and modify the model
accordingly. Supervised learning therefore uses patterns to predict label values on additional
unlabeled data.
For example, with supervised learning, an algorithm may be fed data with images of sharks
labeled as fish and images of oceans labeled as water. By being trained on this data, the
supervised learning algorithm should be able to later identify unlabeled shark images
as fish and unlabeled ocean images as water.

A common use case of supervised learning is to use historical data to predict statistically likely
future events. It may use historical stock market information to anticipate upcoming
fluctuations, or be employed to filter out spam emails. In supervised learning, tagged photos of
dogs can be used as input data to classify untagged photos of dogs.
Unsupervised Learning:
In unsupervised learning, data is unlabeled, so the learning algorithm is left to find
commonalities among its input data. As unlabeled data are more abundant than labeled data,
machine learning methods that facilitate unsupervised learning are particularly valuable.
The goal of unsupervised learning may be as straightforward as discovering hidden patterns
within a dataset, but it may also have a goal of feature learning, which allows the computational
machine to automatically discover the representations that are needed to classify raw data.
Unsupervised learning is commonly used for transactional data. You may have a large dataset
of customers and their purchases, but as a human you will likely not be able to make sense of
what similar attributes can be drawn from customer profiles and their types of purchases. With
this data fed into an unsupervised learning algorithm, it may be determined that women of a
certain age range who buy unscented soaps are likely to be pregnant, and therefore a marketing
campaign related to pregnancy and baby products can be targeted to this audience in order to
increase their number of purchases.
Without being told a “correct” answer, unsupervised learning methods can look at complex
data that is more expansive and seemingly unrelated in order to organize it in potentially
meaningful ways. Unsupervised learning is often used for anomaly detection including for
fraudulent credit card purchases, and recommender systems that recommend what products to
buy next. In unsupervised learning, untagged photos of dogs can be used as input data for the
algorithm to find likenesses and classify dog photos together.
Approaches
As a field, machine learning is closely related to computational statistics, so having a
background knowledge in statistics is useful for understanding and leveraging machine
learning algorithms.
For those who may not have studied statistics, it can be helpful to first define correlation and
regression, as they are commonly used techniques for investigating the relationship among
quantitative variables. Correlation is a measure of association between two variables that are
not designated as either dependent or independent. Regression at a basic level is used to
examine the relationship between one dependent and one independent variable. Because
regression statistics can be used to anticipate the dependent variable when the independent

26 | P a g e
variable is known, regression enables prediction capabilities.
Approaches to machine learning are continuously being developed. For our purposes, we’ll go
through a few of the popular approaches that are being used in machine learning at the time of
writing.
k-nearest neighbor
The k-nearest neighbor algorithm is a pattern recognition model that can be used for
classification as well as regression. Often abbreviated as k-NN, the k in k-nearest neighbor is
a positive integer, which is typically small. In either classification or regression, the input will
consist of the k closest training examples within a space.
We will focus on k-NN classification. In this method, the output is class membership. This will
assign a new object to the class most common among its k nearest neighbors. In the case of k
= 1, the object is assigned to the class of the single nearest neighbor.
Let’s look at an example of k-nearest neighbor. In the diagram below, there are blue diamond
objects and orange star objects. These belong to two separate classes: the diamond class and
the star class.

When a new object is added to the space — in this case a green heart — we will want the
machine learning algorithm to classify the heart to a certain class.

27 | P a g e
When we choose k = 3, the algorithm will find the three nearest neighbors of the green heart
in order to classify it to either the diamond class or the star class.
In our diagram, the three nearest neighbors of the green heart are one diamond and two stars.
Therefore, the algorithm will classify the heart with the star class.

28 | P a g e
Among the most basic of machine learning algorithms, k-nearest neighbor is considered to be
a type of “lazy learning” as generalization beyond the training data does not occur until a query
is made to the system.
Decision Tree Learning
For general use, decision trees are employed to visually represent decisions and show or inform
decision making. When working with machine learning and data mining, decision trees are
used as a predictive model. These models map observations about data to conclusions about
the data’s target value.
The goal of decision tree learning is to create a model that will predict the value of a target
based on input variables.
In the predictive model, the data’s attributes that are determined through observation are
represented by the branches, while the conclusions about the data’s target value are represented
in the leaves.
When “learning” a tree, the source data is divided into subsets based on an attribute value test,
which is repeated on each of the derived subsets recursively. Once the subset at a node has the
equivalent value as its target value has, the recursion process will be complete.
Let’s look at an example of various conditions that can determine whether or not someone
should go fishing. This includes weather conditions as well as barometric pressure conditions.

In the simplified decision tree above, an example is classified by sorting it through the tree to
the appropriate leaf node. This then returns the classification associated with the particular leaf,

29 | P a g e
which in this case is either a Yes or a No. The tree classifies a day’s conditions based on
whether or not it is suitable for going fishing.
A true classification tree data set would have a lot more features than what is outlined above,
but relationships should be straightforward to determine. When working with decision tree
learning, several determinations need to be made, including what features to choose, what
conditions to use for splitting, and understanding when the decision tree has reached a clear
ending.

3.5 Deep Learning

Introduction to Deep Learning


What is deep learning?
Deep learning is a branch of machine learning which is completely based on artificial neural
networks, as neural network is going to mimic the human brain so deep learning is also a kind
of mimic of human brain. In deep learning, we don’t need to explicitly program everything.
The concept of deep learning is not new. It has been around for a couple of years now. It’s on
hype nowadays because earlier we did not have that much processing power and a lot of data.
As in the last 20 years, the processing power increases exponentially, deep learning and
machine learning came in the picture.
A formal definition of deep learning is- neurons
Deep learning is a particular kind of machine learning that achieves great power and flexibility
by learning to represent the world as a nested hierarchy of concepts, with each concept defined
in relation to simpler concepts, and more abstract representations computed in terms of less
abstract ones.
In human brain approximately 100 billion neurons all together this is a picture of an individual
neuron and each neuron is connected through thousands of their neighbours.
The question here is how do we recreate these neurons in a computer. So, we create an artificial
structure called an artificial neural net where we have nodes or neurons. We have some neurons
for input value and some for-output value and in between, there may be lots of neurons
interconnected in the hidden layer.

30 | P a g e
31 | P a g e
4 SYSTEM DESIGN
4.1 System Architecture

4.2 YOLOv4 Architecture


All object detectors take an image in for input and compress features down through a
convolutional neural network backbone. In image classification, these backbones are the end
of the network and prediction can be made off of them.

In object detection, multiple bounding boxes need to be drawn around images along
with classification, so the feature layers of the convolutional backbone need to be mixed and
held up in light of one another. The combination of backbone feature layers happens in the
neck.

It is also useful to split object detectors into two categories: one-stage detectors and two stage
detectors. Detection happens in the head. Two-stage detectors decouple the task of object
localization and classification for each bounding box.

32 | P a g e
One-stage detectors make the predictions for object localization and classification at
the same time. YOLO is a one-stage detector, hence, You Only Look Once.

But inYOLOv4 uses the Two- stage Detector

The image above comes from YOLOv4's predecessor, EfficientDet. Written by Google
Brain, EfficientDet uses neural architecture search to find the best form of blocks in the neck
portion of the network, arriving at NAS-FPN. The EfficientDet authors then tweak it slightly
to the make the more architecture more intuitive (and probably perform better on their
development sets).

YOLOv4 chooses PANet for the feature aggregation of the network. They don't write
much on the rationale for this decision, and since NAS-FPN and BiFPN were written
concurrently, this is presumably an area of future research.

Additionally, YOLOv4 adds a SPP block after CSPDarknet53 to increase the receptive
field and separate out the most important features from the backbone.

33 | P a g e
4.3 UML Diagram

UML stands for Unified Modeling Language. UML is a standardized general-purpose


modeling language in the field of object-oriented software engineering. The standard is
managed, and was created by, the Object Management Group.
The goal is for UML to become a common language for creating models of object-
oriented computer software. In its current form UML is comprised of two major components:
A Meta-model and a notation. In the future, some form of method or process may also be
added to; or associated with, UML.
The Unified Modeling Language is a standard language for specifying, Visualization,
Constructing and documenting the artifacts of software system, as well as for business
modeling and other non-software systems.
The UML represents a collection of best engineering practices that have proven
successful in the modeling of large and complex systems.
The UML is a very important part of developing objects-oriented software and the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

GOALS:
The Primary goals in the design of the UML are as follows:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
2. Provide extendibility and specialization mechanisms to extend the core concepts.
3. Be independent of particular programming languages and development process.
4. Provide a formal basis for understanding the modeling language.
5. Encourage the growth of OO tools market.
6. Support higher level development concepts such as collaborations, frameworks,
patterns and components.
7. Integrate best practices.
USE CASE DIAGRAM:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented
as use cases), and any dependencies between those use cases. The main purpose of a use case

34 | P a g e
diagram is to show what system functions are performed for which actor. Roles of the actors
in the system can be depicted.

35 | P a g e
5 METHODOLOGY

In this this, training and finding a car in the parking space is done using convolutional
neural networks. In the proposed method, we would like to collect the images of cars
parked in the outdoor parking lot and create a dataset with all the images so that the
neural network can detect the car present in the parking space or not accurately.

5.1 Data Collection

In this project, we collected the images of the few parking lots. The images are collected using
the temporary surveillance camera which is placed above some coffee shop and at the hotel in
a position such that every parking slot is covered. This camera can be accessed using the VLC
media player, through which the videos are recorded. The videos are recorded every day in
different lighting conditions. We took snaps of the two different parking lots from the video
recorded. We collected 900 images from the parking lot and trained the model. For test images,
we randomly collected the images from the videos recorded and tested them for detection, for
testing the videos, we collected the video on a different day for few minutes and tested them.
The camera which is used for collecting the data is Honeywell’s new HBW4PR2 bullet
camera which has high picture clarity. The image quality is Full HD 1080p 25/30 fps. The
camera is very good in taking the quality of the data to the next level, it also has wide dynamic
range. The performance of the camera is also excellent in low-light conditions with 3D noise
reduction.

5.2 Annotating Images

We annotated images with two different classes which are “Empty” and “Occupied”. For
labelling the image we used Labeling software which is an open source. In this software the
images can be annotated easily as it is user friendly. Firstly, the software contains some pre-
defined classes which should be changed according to our classes, we have named the classes
as “Empty” and “Occupied”. The software is set to YOLO format and we annotated the images
where there is no car in a particular slot as “Empty” and where there is a car as “Occupied”.
The data of the bounding boxes that we annotated along with the two classes are saved in the
txt format which will save the coordinates of the bounding boxes.

36 | P a g e
5.3 Data Processing

For YOLOv4 detector, there is a different format for the values to be fed into the
model. These values are saved in the txt format which contains some of the
specified classes and some values. These values can be as below.
The order of the values in txt file is class, x, y, w, h. where:

 x = Absolute x / width of total image.


 y = Absolute y / height of total image.
 w = Absolute width / width of the total image.
 h = Absolute height / height of total image.”

Figure 3.3: YOLO format of txt files


Where Absolute x, Absolute y, Absolute width, Absolute height are given below

“Absolute x = (Xmin + (Absolute width/2))

Absolute y = (Ymin + (Absolute height/2))

Absolute width = abs (Xmax - Xmin)

Absolute height = abs (Ymax - Ymin)”

In the above figure 3.3 the number ‘0’ denotes the class “Empty” and the number
‘1’ denotes the class “Occupied” and the other values denotes the metrics x, y, w,
h.

5.4 YOLOv4

YOLO is short for You Only Look Once. It is a real-time object recognition system that can
recognize multiple objects in a single frame. YOLO recognizes objects more precisely and
faster than other recognition systems. It can predict up to 9000 classes and even unseen classes.
The real-time recognition system will recognize multiple objects from an image and also make

37 | P a g e
a boundary box around the object. It can be easily trained and deployed in a production system.
YOLO is based on a single Convolutional Neural Network (CNN). The CNN divides an image
into regions and then it predicts the boundary boxes and probabilities for each region. It
simultaneously predicts multiple bounding boxes and probabilities for those classes. YOLO
sees the entire image during training and test time so it implicitly encodes contextual
information about classes as well as their appearance.
YOLOv4 is twice as fast as Efficient (competitive recognition model) with comparable
performance. In addition, AP (Average Precision) and FPS (Frames Per Second) increased by
10% and 12% compared to YOLOv3
YoloV4 is an important improvement of YoloV3, the implementation of a new architecture in
the Backbone and the modifications in the Neck have improved the mAP(mean Average
Precision) by 10% and the number of FPS(Frame per Second) by 12%. In addition, it has
become easier to train this neural network on a single GPU.

We are going to detail all the layers of YoloV4, to understand how it works.
Backbone
What’s the backbone for? It’s a deep neural network composed mainly of convolution layers.
The main objective of the backbone is to extract the essential features, the selection of the
backbone is a key step it will improve the performance of object detection. Often pre-trained
neural networks are used to train the backbone.

The YoloV4 backbone architecture is composed of three parts:


 Bag of freebies
 Bag of specials
 CSPDarknet53

We are going to explain all these concepts in the following parts.


Bag of freebies

Bag of freebies methods are the set of methods that only increase the cost of training or change
the training strategy while leaving the cost of inference low. Let’s present some simple methods
commonly used in computer vision.

38 | P a g e
Data augmentation
The main objective of data augmentation methods is to increase the variability of an image in
order to improve the generalization of the model training.

Bag of freebies for fight against bias

Focal loss

The Focal Loss is designed to address the one-stage object detection scenario in which there is
an extreme imbalance between foreground and background classes during training (e.g.,
1:1000).

Usually, in classification problems cross entropy is used as a loss function to train the model.
The advantage of this function is to penalize an error more strongly if the probability of the
class is high.

Nevertheless, this function also penalizes true positive examples these small loss values can
overwhelm the rare class.

The new Focal loss function is based on the cross entropy by introducing a (1-pt)^gamma
coefficient. This coefficient allows to focus the importance on the correction of
misclassified examples.

The focusing parameter γ smoothly adjusts the rate at which easy examples are down-weighted.
When γ = 0, FL is equivalent to CE, and as γ is increased the effect of the modulating factor is
likewise in- creased

39 | P a g e
figure 8: Focal Loss (FL)

Label smoothing

Whenever you feel absolutely right, you may be plainly wrong. A 100% confidence in a
prediction may reveal that the model is memorizing the data instead of learning. Label
smoothing adjusts the target upper bound of the prediction to a lower value say 0.9. And it
will use this value instead of 1.0 in calculating the loss. This concept mitigates overfitting.

IoU loss

Most object detection models use bounding box to predict the location of an object. To
evaluate the quality of a model the L2 standard is used, to calculate the difference in
position and size of the predicted bounding box and the real bounding box.

The disadvantage of this L2 standard is that it minimizes errors on small objects and tries to
minimize errors on large bounding boxes.
To address this problem we use IoU loss for the YoloV4 model.

40 | P a g e
Compared to the l2 loss, we can see that instead of optimizing four coordinates
independently, the IoU loss considers the bounding box as a unit. Thus, the IoU loss could
provide more accurate bounding box prediction than the l2 loss. Moreover, the definition
naturally norms the IoU to [0, 1] regardless of the scales of bounding boxes

figure 9: IoU loss

Bag of specials

Bag of special methods are the set of methods which increase inference cost by a small
amount but can significantly improve the accuracy of object detection.

Mish activation

Mish is a novel self-regularized non-monotic activation function which can be defined by


f(x) = x tanh(softplus(x)).

41 | P a g e
figure 10: activation functions

Why this activation function improves the training?


Mish is bounded below and unbounded above with a range of [≈ -0.31,∞[. Due to the preservation of a
small amount of negative information, Mish eliminated by design the preconditions necessary for the
Dying ReLU phenomenon. A large negative bias can cause saturation of the ReLu function and causes
the weights not to be updated during the backpropagation phase making the neurons inoperative for
prediction.
Mish properties helps in better expressivity and information flow. Being unbounded above, Mish avoids
saturation, which generally causes training to slow down due to near-zero gradients drastically. Being
bounded below is also advantageous since it results in strong regularization effects.

Explain CSPDarknet53

42 | P a g e
The Cross Stage Partial architecture is derived from the DenseNet architecture which uses
the previous input and concatenates it with the current input before moving into the
dense layer.

Each stage layer of a DenseNet contains a dense block and a transition layer, and each dense
block is composed of k dense layers. The output of the ith dense layer will be concatenated
with the input of the ith dense layer, and the concatenated outcome will become the input of
the (i + 1)th dense layer. The equations showing the above-mentioned mechanism can be
expressed as:

where ∗ represents the convolution operator, and [x0, x1, …] means to concatenate x0, x1,
…, and wi and xi are the weights and output of the ith dense layer, respectively.

The CSP is based on the same principle except that instead of concatenating the ith output
with the ith input, we divided the input ith in two parts x0' and x0'’, one part will pass
through the dense layer x0'’, the second part x0' will be concatenated at the end with the
result at the output of the dense layer of x0'’.

This translates mathematically into the following equation:

43 | P a g e
Cross Stage Partial Forward

This will result in different dense layers repeatedly learn copied gradient information.

Neck (detector)

The essential role of the neck is to collect feature maps from different stages. Usually, a
neck is composed of several bottom-up paths and several top-down paths.

We will explain the different elements that make up the neck of the yoloV4 is their usefulness
in architecture.

SPP

What is the problem caused by CNN and fully connected network ?

The fully connected network requires a fixed size so we need to have a fixed size image, when
detecting objects, we don’t necessarily have fixed size images. This problem forces us to scale
the images, this method can remove a part of the object we want to detect and therefore decrease
the accuracy of our model.

The second problem caused by CNN is that the size of the sliding window is fixed.

How SPP runs?

At the output of the convolution neural networks, we have the features map, these are features
generated by our different filters. To make it simple, we can have a filter able to detect circular
geometric shapes, this filter will produce a feature map highlighting these shapes while keeping
the location of the shape in the image.

44 | P a g e
figure 13: SPP

Spatial Pyramid Pooling Layer will allow to generate fixed size features whatever the size
of our feature maps. To generate a fixed size, it will use pooling layers like Max Pooling for
example, and generate different representations of our feature maps.

I will detail the different steps carried out by an SPP.

In the case of Figure 13, we have a 3-level PPS. Suppose the conv5 (i.e., the last convolution
layer) has 256 features map.

1. First, each feature map is pooled to become a one value (grey part in figure 13). Then
the size of the vector is (1, 256)
2. Then, each feature map is pooled to have 4 values (green par in figure 13). Then the
size of the vector is (4, 256)
3. On the same way, each feature is pooled to have 16 values (blue part in figure 13). Then
the size of the vector is (16, 256)
4. The 3 vectors created in the previous 3 steps are then concatenated to form a fixed size
vector which will be the input of the fully connected net.

What are the benefits of SPP ?

45 | P a g e
1. SPP is able to generate a fixed- length output regardless of the input size
2. SPP uses multi-level spatial bins, while the sliding window pooling uses only a single
window size. Multi-level pooling has been shown to be robust to object deformations
3. SPP can pool features extracted at variable scales thanks to the flexibility of input
scales

PaNet: for aggregate different backbone levels

In the early days of deep learning, simple networks were used where an input passed through
a succession of layers. Each layer takes input from the previous layer. The early layers extract
localized texture and pattern information to build up the semantic information needed in the
later layers. However, as we progress to the right, localized information that may be needed
to fine-tune the prediction may be lost.

To correct this problem, PaNet has introduced an architecture that allows better propagation
of layer information from bottom to top or top to bottom.

The components of the neck typically flow up and down among layers and connect only the
few layers at the end of the convolutional network.

We can see in figure 14, that the information of the first layer is added in layer P5 (red
arrow), and propagated in layer N5 (green arrow). This is a shortcut to propagate low level
information to the top.

PaNet

How PaNet add information to top layers ?

46 | P a g e
In the original implementation of PaNet, the current layer and information from a
previous layer is added together to form a new vector. In the YoloV4 implementation, a
modified version is used where the new vector is created by concatenating the input and the
vector from a previous layer.

Yolo V4 modified PaNet

Head (detector)
The role of the head in the case of a one stage detector is to perform dense prediction. The
dense prediction is the final prediction which is composed of a vector containing the
coordinates of the predicted bounding box (centre, height, width), the confidence score
of the prediction and the label.

Bag of freebies (BoF)

CIoU-loss

The CIoU loss introduces two new concepts compared to IoU loss. The first concept is the
concept of central point distance, which is the distance between the actual bounding box
centre point and the predicted bounding box centre point.

The second concept is the aspect ratio, we compare the aspect ratio of the true bounding box
and the aspect ratio of the predicted bounding box. With these 3 measures we can measure
the quality of the predicted bounding box.

47 | P a g e
CIoU loss

where b and bgt denote the central points of B and Bgt, ρ(·) is the Euclidean distance, c is the
diagonal length of the smallest enclosing box covering the two boxes, α is a positive trade-off
parameter, and v measures the consistency of aspect ratio.

consistency of aspect ratio

where w is the height of the bounding box and w is the width.

CmBN (Cross mini–Batch Normalization)

Why use Cross mini–Batch Normalization instead of Batch Normalization? What are its
advantages and how does it work? We will answer these questions in this paragraph.

Batch Normalization does not perform when the batch size becomes small. The estimate of
the standard deviation and mean is biased by the sample size. The smaller the sample size, the
more likely it is not to represent the completeness of the distribution. To solve this problem,
Cross mini–Batch Normalization is used, which uses estimates from recent batches to
improve the quality of each batch’s estimate. A challenge of computing statistics over
multiple iterations is that the network activations from different iterations are not comparable
to each other due to changes in network weights. To solve this problem, Taylor polynomials
are used to approximate any indefinitely differentiable function.

Taylor formula

48 | P a g e
Let’s take the example of the cosine function, let’s note f(x) = cos(x), we will look for an
approximation of this function in the neighbourhood of 0. We use Taylor’s formula at
order 2:
f(x) = f(x0) + f’(x0)(x-x0) + (1/2)f’’(x0)(x-x0)
= 1 — x/2

We can see on figure 19(curve green), that this approximation in the neighbourhood of 0 is
pretty good, nevertheless the further we move away the more the quality of the
approximation decreases, we can go to higher orders to improve the quality of the
approximation.

Taylor approximation for cos(x)

Now back to Batch Normalization, the classic way to normalize a batch is as follows:

Batch Normalization

where ε is a small constant added for numerical stability, and μt(θt) and σt(θt) are the mean
and variance computed for all the examples from the current mini-batch.

49 | P a g e
The cross mini–Batch Normalization is defined as follows:

Cross mini–Batch Normalization

where the mean and variance are calculated from the previous N means and variances and
approximated using Taylor formulae to express them as a function of the parameters θt rather
than θt-N.

DropBlock regularization

Neural networks work better if they are able to generalize better, to do this we use
regularization techniques such as dropout which consists in deactivating certain neurons
during training. These methods generally improve accuracy during the test phase.

Nevertheless, the dropout drops features randomly, this method works well for fully connected
layers but is not efficient for convoluted layers where features are spatially correlated.

In Drop Block, features in a block (i.e., a contiguous region of a feature map), are dropped
together. As Drop Block discards features in a correlated area, the networks must look
elsewhere for evidence to fit the data.

The majority of CNN-based object detectors are largely applicable only for
recommendation systems. For example, searching for free parking spaces via urban video
cameras is executed by slow accurate models, whereas car collision warning is related to fast
inaccurate models. Improving the real-time object detector accuracy enables using them not
only for hint generating recommendation systems, but also for stand-alone process

50 | P a g e
management and human input reduction. Real-time object detector operation on conventional
Graphics Processing Units (GPU) allows their mass usage at an affordable price. The most
accurate modern neural networks do not operate in real time and require large number of GPUs
for training with a large mini-batch-size. We address such problems through creating a CNN
that operates in real-time on a conventional GPU, and for which training requires only one
conventional GPU.

Comparison of the proposed YOLOv4 and other state-of-the-art object detectors. YOLOv4
runs twice faster than Efficient with comparable performance. Improves Yolov4’s AP and FPS
by 10% and 12%, respectively.

The main goal of this work is designing a fast-operating speed of an object detector in
production systems and optimization for parallel computations, rather than the low
computation volume theoretical indicator (BFLOP). We hope that the designed object can be
easily trained and used. For example, anyone who uses a conventional GPU to train and test
can achieve real-time, high quality, and convincing object detection results, as the YOLOv4
results shown in Figure 1.

YOLOv4 consists of:

• Backbone: CSPDarknet53

• Neck: SPP , PAN

• Head: Yolov4

YOLO v4 uses:

51 | P a g e
• Bag of Freebies (BoF) for backbone: Cut Mix and Mosaic data augmentation, Drop Block

regularization, Class label smoothing

• Bag of Specials (BoS) for backbone: Mish activation, Cross-stage partial connections

(CSP), Multi input weighted residual connections (MiWRC)

• Bag of Freebies (BoF) for detector: CIoU-loss, CmBN, DropBlock regularization, Mosaic

data augmentation, Self-Adversarial Training, eliminate grid sensitivity, using multiple


anchors for a single ground truth, Cosine annealing scheduler , Optimal hyperparameters,
Random training shapes

• Bag of Specials (BoS) for detector: Mish activation, SPP-block, SAM-block, PAN path-

aggregation block,
DIoU-NMS

5.5 Object detectors

A modern detector is usually composed of two parts, a backbone which is pre-trained


on ImageNet and a head which is used to predict classes and bounding boxes of objects. For
those detectors running on GPU platform, their backbone could be VGG, ResNet, ResNeXt, or
DenseNet . For those detectors running on CPU platform, their backbone could be SqueezeNet
[31], MobileNet [28, 66, 27, 74], or ShuffleNet [97, 53]. As to the head part, it is usually
categorized into two kinds, i.e., one-stage object detector and two-stage object detector. The
most representative two-stage object detector is the R-CNN [19] series, including fast R-CNN
[18], faster R-CNN [64], R-FCN [9], and Libra R-CNN [58]. It is also possible to make a two-
stage object detector an anchor-free object detector, such as RepPoints [87]. As for one-stage
object detector, the most representative models are YOLO [61, 62, 63], SSD [50], and
RetinaNet [45]. In recent years, anchor-free one-stage object detectors are developed. The
detectors of this sort are CenterNet [13], CornerNet [37, 38], FCOS [78], etc. Object detectors
developed in recent years often insert some layers between backbone and head, and these layers
are usually used to collect feature maps from different stages. We can call it the neck of an
object detector. Usually, a neck is composed of several bottom-up paths and several top-down
paths. Networks equipped with this mechanism include Feature Pyramid Network (FPN) [44],
Path Aggregation Network (PAN) [49], BiFPN [77], and NAS-FPN [17].
In addition to the above models, some researchers put their emphasis on directly building a new
backbone (DetNet [43], DetNAS [7]) or a new whole model (SpineNet [12], HitDetector [20])
for object detection.

52 | P a g e
To sum up, an ordinary object detector is composed of several parts:

• Input: Image, Patches, Image Pyramid

• Backbones: VGG16 [68], ResNet-50 [26], SpineNet

[12], EfficientNet-B0/B7 [75], CSPResNeXt50 [81], CSPDarknet53 [81]

• Neck:

• Additional blocks: SPP [25], ASPP [5], RFB [47], SAM [85]

• Path-aggregation blocks: FPN [44], PAN [49], NAS-FPN [17], Fully-connected FPN,

BiFPN
[77], ASFF [48], SFAM [98]

• Heads:

• Dense Prediction (one-stage):

◦ RPN [64], SSD [50], YOLO [61], RetinaNet [45] (anchor based)

53 | P a g e
5.6 Training and Testing
As the deep learning requires hardware with high end specifications for training, so
we are using hardware with the following configurations
 CPU- Intel core i7-7700k, 4.3 GHz,
 Motherboard – Asus Prime Z270-A and Asus ROG-SYRIX-GTX1080TI
 RAM – 16GB for CPU and 11GB for GPU

We have used the pre-trained weights of YOLOV4 which are trained on COCO dataset
which consists of 80 classes and images are more than 80000. The training of our model
on the dataset was done by the previous steps. The filters of YOLOV4 configuration
file should be set according to the below YOLOV4 filters formula

• “Filters = (classes + 5) * 3”

As there are two classes in our case, the classes are set to our number of objects that is
two in each of 3 yolo layers, the filters which are in convolutional layer just before the
3 yolo layers are set to 21. The Yolov4 configuration file consists of batch number
which is set to 64, subdivisions to 16, learning rate to 0.001, saturation to 1.5 and
exposure to 1.5.
The resolution of the input image is resized to a width and height of 608*608. And
the training of our model is done, but as YOLOv4 does not limit the number of
iterations to be done, the system continuously trains the model. Usually, 2000 iterations
are sufficient for each class(object), but the iteration number should not be less than
4000 in total. But, for a better precision the training should be stopped for the iteration
where the average loss no longer decreases or remains constant. So, when the training
should be actually stopped depends upon the average loss function.
The average loss function changes gradually from a higher value to the lower value
and remains constant, then the training can be stopped. The weights file is generated
and saved for every 1000 iterations and this can be changed in the configuration file for
our desired number of iterations.
After the training is done the mAP (Mean Average Precision) is calculated for the
validated images. This should be done for every weight so that we can know the mAP
for each and every weight and the weights file whose mAP is higher can be used for

54 | P a g e
testing the image for good detection. Thresh is considered as the minimum threshold of
the IOU (Intersection Over Union) during the training. IOU (Intersection Over Union)
is determined to be the fraction of area of union between the anchor box and the ground
truth.

Figure3.4: Intersection Over Union

55 | P a g e
6. IMPLEMENTATION

6.1 Code:

56 | P a g e
57 | P a g e
58 | P a g e
7.OUTPUTS

The occupancy detection of the parking lot is done by YOLOv4 detector but as the parking lot
at our institute is in quadrilateral form and in the form of squares and rectangles, the bounding
boxes of the parking lot are drawn manually using OpenCV python code and saved them in an
xml format and the output data of the YOLOv4 detector and the each class i.e whether it is
Empty or Occupied is used to detect the occupancy by overlapping them with the bounding
boxes using python code in such a way that it covers each slot of the parking lot. For each
image below we can see the good performance of the system most of the times. So far, the
accuracy is good when tested on the images and also from the videos. The confidence of the
model for each test image are attached below the image which is pretty much good at all the
times.

59 | P a g e
60 | P a g e
61 | P a g e
62 | P a g e
63 | P a g e
64 | P a g e
65 | P a g e
8.CONCLUSION

This project has shown as a proof of concept the effectiveness of a single-shot detector CNN
such as YOLO can be a baseline to evolve a model intended to detect cars in a parking lot. It
highlights the effectiveness of containerization of responsibilities during training, where image
processing and compilation occurs concisely and efficiently in a separate service. Finally, this
project shows the application of car detection as a means to track throughput and capacity in
parking lots over time without the expenses of hardware such as GPUs or labor of parking lot
managers patrolling at regular intervals.

The level of accuracy in detecting cars in a parking lot depends greatly on the training
set and by extensive the method of data collection. Fixed camera angles present greater
challenges in object detection than top-down drone vantages. For near-perfect accuracy, drone
footage should be captured to minimize the effects of pose, depth, and occlusion that the CNN
would be challenged to work past.

However, if the use case was to track general throughput and capacity at regular time
intervals, the model could make fairly accurate detections of cars in the parking lot, thus still
providing valuable data and insight into the parking lot. There are several next steps in the
research process to work towards higher detection accuracy, particularly with regards to
addressing the challenges highlighted by the fixed camera angle experiments. Namely, the next
leap forward for this CNN 49 setup is deepening the feature extraction portion even further
than the version three did.

This would require extensive research and retraining on standard object detection
datasets such as VOC and COCO along with classifier-specific ones such as ImageNet. The
end result will be a slower runtime, but ideally, there will be more comprehensive feature maps
for the fourth (or even fifth) detection layer to detect either really large or small cars in the
foreground and background, respectively. There are also next steps in the maturation of the
practical application behind this project’s parking lot car detector. Skynet can be modified to
run continuously on a server that accepted image inputs.

This would remove a significant runtime bottleneck, the instantiation of the entire CNN
upon every execution of Skynet. Further, Skynet can further be augmented with an option to

66 | P a g e
read input from a video stream, which would convert the input at regular intervals into a tensor
for the neural network to consume. Object detection as a field will continue to evolve. There
may be a single-shot detector with higher accuracy than the YOLO-based ones or a different
approach to representing car locations’ ground truth, and perhaps those innovations will be the
next leaps forward in detecting cars in a parking lot.

8.1 Future scope

In future the model can be trained by increasing the images in the dataset and making
huge dataset in such a way that it can detected at different locations and with different
environment. The other scope is that the system can be developed in such a way that it
could give directions to the drivers to the nearby available empty space by using deep
learning and AI methods.

67 | P a g e
9.REFERENCES
[1] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S Davis. Soft-NMS–
improving object detection with one line of code. In Proceedings of the IEEE
International Conference on Computer Vision (ICCV), pages 5561–5569, 2017. 4
[2] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving into high quality object
detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 6154–6162, 2018. 12
[3] Jiale Cao, Yanwei Pang, Jungong Han, and Xuelong Li. Hierarchical shot detector. In
Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages
9705–9714, 2019. 12
[4] Ping Chao, Chao-Yang Kao, Yu-Shan Ruan, Chien-Hsiang Huang, and Youn-Long Lin.
HarDNet: A low memory traffic network. Proceedings of the IEEE International
Conference on Computer Vision (ICCV), 2019. 13
[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L
Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous
convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and
Machine Intelligence (TPAMI), 40(4):834–848, 2017. 2, 4
[6] S. Lee, D. Yoon and A. Ghosh, "Intelligent parking lot application using wireless sensor
networks," 2008 International Symposium on Collaborative Technologies and Systems,
Irvine, CA, 2008.
[7] V. W. s. Tang, Y. Zheng and J. Cao, "An Intelligent Car Park Management System based
on Wireless Sensor Networks," 2006 First International Symposium on Pervasive
Computing and Applications, Urumqi, 2006.
[8] Z. Zhang, M. Tao and H. Yuan, "A Parking Occupancy Detection Algorithm Based on
AMR Sensor," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1261-1269, Feb. 2015.
[9] Q. Wu, C. Huang, S. Wang, W. Chiu and T. Chen, "Robust Parking Space Detection
Considering Inter-Space Correlation," 2007 IEEE International Conference on
Multimedia and Expo, Beijing, 2007.
[10] Bong, David & K.C, Ting & Lai, Koon Chun. (2008). Integrated Approach in the Design
of Car Park Occupancy Information System (COINS). IAENG International Journal of
Computer Science. 35.
[11] G. Amato, F. Carrara, F. Falchi, C. Gennaro and C. Vairo, "Car parking occupancy
detection using smart camera networks and Deep Learning," 2016 IEEE Symposium on
Computers and Communication (ISCC), Messina, 2016.
[12] S. Valipour, M. Siam, E. Stroulia and M. Jagersand, "Parking-stall vacancy indicator
system, based on deep convolutional neural networks," 2016 IEEE 3rd World Forum on
Internet of Things (WF-IoT), Reston, VA, 2016.
[13] TzuTaLin, “tzutalin/labelimg: Labelimg is a graphical image annotation tool and label
object bounding boxes in images.” https://github.com/ tzutalin/labelImg, June 2017.
(Accessed on 09/30/2016).
[14] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, “Labelme: A database and
web-based tool for image annotation,” International Journal of Computer Vision, vol. 77,
no. 1, pp. 157–173, 2008.

68 | P a g e
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in Neural Information Processing Systems
25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 1097–1105,
Curran Associates, Inc., 2012.

[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to


document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

[17] Redmon, J. (Mar 26, 2018). Yolo: Real-time object detection.

69 | P a g e

You might also like