
FACE RECOGNITION BASED ATTENDANCE SYSTEM

A PROJECT REPORT

Submitted by

INDHU PRAKASH P 1705081


NIVASINI R 1705093
SNEHA S 1705105
GANESH R 1805216

in partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

COIMBATORE INSTITUTE OF TECHNOLOGY


(Government Aided Autonomous Institution Affiliated to Anna University)
COIMBATORE-641014
ANNA UNIVERSITY-CHENNAI 600 025

MARCH 2020

COIMBATORE INSTITUTE OF TECHNOLOGY


(A Govt. Aided Autonomous Institution Affiliated to Anna University)

COIMBATORE –641 014

BONAFIDE CERTIFICATE

Certified that this project “Face Recognition Based Attendance System” is the bonafide
work of INDHU PRAKASH P, NIVASINI R, SNEHA S and GANESH R under our supervision
during the academic year 2019-2020.

Prof. Dr. G. Kousalya, Mr. K. Navaneeth Kumar,


HEAD OF THE DEPARTMENT, SUPERVISOR,
Department of CSE, Department of CSE,
Coimbatore Institute of Technology, Coimbatore Institute of Technology,
Coimbatore - 641 014. Coimbatore- 641 014.

Certified that the candidates were examined by us in the project work viva-
voce examination held on …………………

Internal Examiner External Examiner

Place :
Date :

ACKNOWLEDGEMENT

We express our sincere thanks to our Secretary Dr. R. Prabhakar and our Principal Dr.
V. Selladurai for providing us a great opportunity to carry out our work. Words are rather
meagre to express our gratitude to them. This work is the outcome of their inspiration and the
product of their vast knowledge and rich experience.
We record our deep sense of gratitude to Dr. G. Kousalya, Head of the Department of
Computer Science and Engineering, for her encouragement during this tenure.
We equally tender our sincere thanks to our project guide Mr. K. Navaneeth Kumar,
Department of Computer Science and Engineering, for his valuable suggestions and guidance
during this course.
During the entire period of study, all staff members of the Department of Computer
Science and Engineering have offered ungrudging help. It is also a great pleasure to acknowledge
the unfailing help we have received from our friends.
It is a great pleasure to thank our parents and family members for their constant support
and co-operation in the pursuit of this endeavour.

ABSTRACT

Automatic face recognition (AFR) technologies have seen dramatic improvements in
performance over the past years, and such systems are now widely used for security and
commercial applications. The proposed system performs human face recognition in a real-time
setting so that a college can mark the attendance of its students. Smart Attendance using
Real-Time Face Recognition is thus a real-world solution for the day-to-day activity of handling
student attendance. The task is difficult because real-time background subtraction in an image is
still a challenge. Faces are detected in real time, and a simple, fast Principal Component
Analysis is used to recognize the detected faces with a high accuracy rate. The matched face is
then used to mark the attendance of the student. The system maintains attendance records
automatically: we designed an efficient module that uses face recognition to manage the
attendance records of students. Enrolment is a one-time process during which each face is
stored in the database, and a system is required only for this enrolment step. Each student's roll
number serves as a unique identifier, and the presence of each student is updated in the
database. The results showed improved performance over a manual attendance management
system. Attendance is marked after identification, and the product gives accurate results in a
user-interactive manner compared to existing attendance and leave management systems. Face
recognition based attendance is the process of recognizing students' faces for taking attendance
by using face biometrics based on high-definition monitor video and other information
technology. In our project, a computer system finds and recognizes human faces quickly and
precisely in images or videos captured through a surveillance camera. Numerous algorithms and
techniques have been developed for improving the performance of face recognition, but the
concept implemented here is Deep Learning. It helps convert the frames of the video into
images so that the face of the student can be easily recognized for attendance and the attendance
database can be updated automatically.

TABLE OF CONTENTS
Chapter 1 8
1.Introduction 8
1.1. Object Recognition 8
1.2. Object Recognition Methods 8
1.3. Open-cv Feature Descriptor 10
Chapter 2 14
2.Literature Survey 14
2.1. Review Of Literature 14
Chapter 3 19
3.Methodology 19
3.1. YOLO Algorithm 19
3.2. Steps Of Implementation 21
3.3. Hardware And Software Used 24
3.4.Advantages 24
3.5.Disadvantages 25
3.6.Applications 25
3.7.Evaluation of the system 25
Chapter 4 42
4.Result And Discussion 42
4.1. Database Creation 42
4.2. Training Stage 43
4.2.1. Training Using YOLO Algorithm 43
4.3. Testing Stage 43
4.3.1. Testing using YOLO Algorithm 43
4.4. Comparative study 44
4.5. The Problems 47
Chapter 5 49
5.Conclusion 49
Chapter 6 50
6.Future scope 50

Reference 51

LIST OF FIGURES

1.1 A simplified illustration of YOLO object detector pipeline

1.2 Face graph for face recognition

1.3 Process of YOLO

3.1.1 Flowchart for YOLO algorithm

3.3.1 Students sitting at different positions from first row to last row

3.3.2 Face detection tested with different parameters

3.3.3 Plot of detection rate against image size for image 1

3.3.4 Face detection performed at Merge Threshold level of 0

3.3.5 Plot of detection rate against Merge Threshold level for image 1

3.3.6 Face detection tested with different parameters

3.3.7 Plot of detection rate against image size for image 2

3.3.8 Plot of detection rate against Merge Threshold for image 2

3.3.9 Face detection of image 1 at original size

3.3.10 Face detection of image 1 at ¾ of original size

3.3.11 Face detection of image 1 at ½ of original size

3.3.12 Face detection of image 1 at ¼ of original size

4.1 Database of objects in a classroom

4.2.1 CSV file database for YOLO

4.3.1 Testing using YOLO algorithm

4.4.1 Comparison of recognition rates or accuracy

4.5.1 System generated excel sheet

LIST OF TABLES

1 List of papers referred for literature survey

2 Textual description of images with level of difficulty for face detection

3 Showing default parameters

4 (a) Shows performance metrics of different image sizes at ¾, ½ and ¼ of the original image size using default parameters

4 (b) Shows performance metrics of different image sizes at ¾, ½ and ¼ of the original image size using YOLO face detection

4 (c) Shows the performance metrics of different Merge Threshold levels

5 (a) Shows performance metrics of different image sizes at ¾, ½ and ¼ of the original image size using YOLO face detection

5 (b) Shows performance metrics of different image sizes at ¾, ½ and ¼ of the original image size

5 (c) Shows the performance metrics of different Merge Threshold levels

CHAPTER-1

1.INTRODUCTION
Automatic face recognition (AFR) technologies have brought many improvements to a
changing world. In our face recognition project, a computer system will be able to find and
recognize human faces quickly and precisely in images or videos captured through a
surveillance camera. Numerous algorithms and techniques have been developed for improving
the performance of face recognition, but the concept implemented here is Deep Learning. It
helps convert the frames of the video into images so that the face of the student can be easily
recognized for attendance and the attendance database can be updated automatically.

1.1.OBJECT RECOGNITION

Without getting much into the details (we would like to write another story about how it
works), we want to focus on the different implementations and how to use them. YOLO (You
Only Look Once) uses deep learning and convolutional neural networks (CNN) for object
detection. It stands out from its “competitors” because, as the name suggests, it only needs to
“see” each image once. This makes YOLO one of the fastest detection algorithms (naturally
sacrificing some precision). Thanks to this speed, YOLO can detect objects in real time (up to
30 FPS).
The image is divided into an SxS grid, and each cell predicts N possible bounding boxes
together with the level of confidence (or probability) of each one, which means SxSxN boxes
are calculated. The vast majority of these boxes have a very low probability, so the algorithm
proceeds to discard the boxes that fall below a minimum probability threshold.
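As a rough, self-contained sketch of this thresholding step (not the code used in this project), the snippet below builds a hypothetical S x S x N grid of box predictions, flattens it, and keeps only the boxes whose confidence exceeds a chosen threshold; the array values and the threshold are placeholders.

    import numpy as np

    # Hypothetical raw output of a YOLO-style detector: an S x S grid where each
    # cell predicts N boxes, each encoded as (x, y, w, h, confidence).
    S, N = 7, 2
    predictions = np.random.rand(S, S, N, 5)       # placeholder values for illustration

    CONF_THRESHOLD = 0.25                          # minimum probability to keep a box
    boxes = predictions.reshape(-1, 5)             # S*S*N candidate boxes
    kept = boxes[boxes[:, 4] >= CONF_THRESHOLD]    # discard low-confidence boxes

    print(f"{len(kept)} of {len(boxes)} candidate boxes survive thresholding")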

1.2.OBJECT RECOGNITION METHODS

Face recognition compares a detected and processed face to a database of known faces to
decide who that person is. Since 2002, face detection can be performed fairly easily and reliably
with Intel's open source framework called OpenCV. However, detecting a person's face when
that person is viewed from an angle is usually harder, sometimes requiring 3D head pose
estimation. In addition, a lack of proper lighting in an image, increased contrast in shadows on
the face, a blurry picture, or the person wearing glasses can greatly increase the difficulty of
detecting a face. Face recognition, however, is much less reliable than face detection, with an
accuracy of 30-70% in general. Face recognition has been an active field of research since the
1990s but is still a long way from being a reliable method of user authentication. More and
more techniques are being developed each year. The Eigenface technique is considered the
simplest method of accurate face recognition, but many other (much more complex) methods or
combinations of multiple methods are slightly more accurate.
OpenCV was started at Intel in 1999 by Gary Bradski for the purposes of accelerating
research in and commercial applications of computer vision in the world and, for Intel, creating
a demand for ever more powerful computers by such applications. Vadim Pisarevsky joined
Gary to manage Intel's Russian OpenCV software team. Over time the OpenCV team moved on
to other companies and other research. Several of the original team eventually ended up working
in robotics and found their way to Willow Garage. In 2008, Willow Garage saw the need to
rapidly advance robotic perception capabilities in an open way that leverages the entire research
and commercial community and began actively supporting OpenCV, with Gary and Vadim once
again leading the effort. Intel's open-source computer-vision library can greatly simplify
computer-vision programming. It includes advanced capabilities - face detection, face tracking,
face recognition, Kalman filtering, and a variety of artificial intelligence (AI) methods - in
ready-to-use form. In addition, it provides many basic computer-vision algorithms via its
lower-level APIs.
OpenCV has the advantage of being a multi-platform framework; it supports both
Windows and Linux, and more recently, Mac OS X. OpenCV has so many capabilities that it
can seem overwhelming at first. Fortunately, only a select few need to be known beforehand to
get started.

Fig 1.1. A simplified illustration of the YOLO object detector pipeline
1.3.OPENCV FEATURE DESCRIPTOR
Step 1 : Preprocessing

Often an input image is pre-processed to normalize contrast and brightness effects.
Sometimes gamma correction produces slightly better results. The reason is that nobody knows
in advance which of these preprocessing steps will produce good results; you try a few different
ones and some might give marginally better results.
We do use color information when available. RGB and LAB color spaces give comparable
results, but restricting to grayscale reduces performance by 1.5% at 10−4 FPPW. Square root
gamma compression of each color channel improves performance at low FPPW (by 1% at 10−4
FPPW), but log compression is too strong and worsens it by 2% at 10−4 FPPW.
As you can see, the authors did not know in advance what pre-processing to use; they
made reasonable guesses and used trial and error.
As part of pre-processing, an input image or patch of an image is also cropped and resized
to a fixed size. This is essential because the next step, feature extraction, is performed on a
fixed-size image.

Step 2 : Feature Extraction

The input image contains a large amount of extra information that is not necessary for
classification. Consequently, the first step in image classification is to simplify the image by
extracting the important information and leaving out the rest. For example, if you want to find
shirt and coat buttons in images, you will notice a significant variation in RGB pixel values.

Histogram of Oriented Gradients (HOG)

1. It converts an image of fixed size to a feature vector of fixed size. HOG is based on the
   idea that local object appearance can be effectively described by the distribution
   (histogram) of edge directions (oriented gradients).
2. Gradient calculation: Calculate the x and y gradient images from the original image. This
   can be done by filtering the original image with the following kernels.
3. Using the gradient images, we can calculate the magnitude and orientation of the gradient
   using the following equations.
4. The calculated gradients are “unsigned” and therefore lie in the range 0 to 180 degrees.
5. Cells: Divide the image into 8×8 cells.
6. Calculate the histogram of gradients in these 8×8 cells: At each pixel in an 8×8 cell we
   know the gradient (magnitude and direction), so we have 64 magnitudes and 64 directions,
   i.e. 128 numbers. A histogram of these gradients provides a more useful and compact
   representation. If the direction of the gradient at a pixel is exactly 0, 20, 40 … or 160
   degrees, a vote equal to the magnitude of the gradient is cast by the pixel into that bin. A
   pixel where the direction of the gradient is not exactly 0, 20, 40 … 160 degrees splits its
   vote between the two nearest bins based on the distance from each bin.
7. Block normalization: The histogram calculated in the previous step is not very robust to
   lighting changes. To counteract these effects we can normalize the histogram, i.e. think of
   the histogram as a vector of 9 elements and divide each element by the magnitude of this
   vector. In the original HOG paper, this normalization is not done over the 8×8 cell that
   produced the histogram, but over 16×16 blocks. The idea is the same, but instead of a
   9-element vector you now have a 36-element vector.
8. Feature Vector: In the previous steps we saw how to calculate a histogram over an 8×8
   cell and then normalize it over a 16×16 block. Consequently, we can make 7 steps in the
   horizontal direction and 15 steps in the vertical direction, which adds up to 7 x 15 = 105
   steps.
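To make steps 2 to 6 concrete, the sketch below computes gradient magnitudes and unsigned orientations for a single 8x8 cell with plain NumPy and builds the 9-bin histogram with votes split between neighbouring bins. It is only an illustrative sketch of the procedure described above, not the project's code; the cell values are random placeholders.

    import numpy as np

    def cell_histogram(cell, bins=9):
        # Gradients via [-1, 0, 1] kernels (border pixels left at zero for simplicity).
        gx = np.zeros_like(cell, dtype=float)
        gy = np.zeros_like(cell, dtype=float)
        gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
        gy[1:-1, :] = cell[2:, :] - cell[:-2, :]
        magnitude = np.hypot(gx, gy)
        angle = np.rad2deg(np.arctan2(gy, gx)) % 180        # unsigned orientation, 0-180 degrees

        bin_width = 180.0 / bins                            # 20 degrees per bin
        hist = np.zeros(bins)
        for mag, ang in zip(magnitude.ravel(), angle.ravel()):
            pos = ang / bin_width
            lo = int(pos) % bins
            frac = pos - int(pos)
            hist[lo] += mag * (1 - frac)                    # split the vote between the
            hist[(lo + 1) % bins] += mag * frac             # two nearest bins
        return hist

    cell = np.random.randint(0, 256, (8, 8)).astype(float)  # stand-in for an 8x8 image patch
    print(cell_histogram(cell))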

Step 3 : Learning Algorithm For Classification

In this section, we will learn how a classification algorithm takes this feature vector as
input and outputs a class label (e.g. cat or background).
Before a classification algorithm can do its magic, we need to train it by showing it
thousands of examples of cats and backgrounds. Different learning algorithms learn differently,
but the general principle is that learning algorithms treat feature vectors as points in a
higher-dimensional space and try to find separating planes / surfaces.
To simplify things, let us examine one learning algorithm called Support Vector Machines
(SVM) in some detail.
The Support Vector Machine (SVM) is one of the most popular supervised binary
classification algorithms. Although the ideas used in SVM have been around since 1963, the
current version was proposed in 1995 by Cortes and Vapnik.
We can think of this vector as a point in a 3780-dimensional space. Visualizing higher
dimensional space is infeasible, so let us simplify things somewhat and imagine the feature
vector was just two-dimensional.

Fig 1.2. Face graph for face recognition

All black dots belong to one class and the white dots belong to the other class. In other
words, we tell the algorithm the coordinates of the 2D dots and also whether each dot is black
or white.
Different learning algorithms deduce how to separate these two classes in different ways.
Linear SVM tries to find the best line that separates the two classes. H1 does not separate the
two classes and is therefore not a good classifier. H2 and H3 both separate the two classes, but
intuitively H3 feels like a better classifier than H2 because H3 appears to separate the two
classes more cleanly: H2 is too close to some of the black and white dots, whereas H3 is chosen
such that it is at a maximum distance from members of the two classes.
If you get a new 2D feature vector corresponding to an image the algorithm has never seen
before, you can simply test which side of the line the point lies on and assign it the appropriate
class label. If your feature vectors are in 3D, SVM will find the appropriate plane that maximally
separates the two classes. As you may have guessed, if your feature vector is in a
3780-dimensional space, SVM will find the appropriate hyperplane.
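As a small, hedged illustration of the two-class case discussed above (a sketch, not the project's implementation), the snippet below fits scikit-learn's LinearSVC to two synthetic 2D point clouds and classifies a new point by which side of the learned line it falls on. The cluster centres and the test point are arbitrary.

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # "black" dots
    class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))   # "white" dots
    X = np.vstack([class_a, class_b])
    y = np.array([0] * 50 + [1] * 50)

    clf = LinearSVC()              # linear SVM: finds the separating line (hyperplane in nD)
    clf.fit(X, y)

    new_point = np.array([[2.5, 2.8]])
    print("predicted class:", clf.predict(new_point)[0])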

Fig 1.3. Process of YOLO

CHAPTER-2

2.LITERATURE SURVEY
This chapter gives a review of existing literature on the topic, the use of object and image
recognition in various applications. It then enumerates the literature gaps that have been found
from the survey and gives a description of the proposed work.

2.1 REVIEW OF LITERATURE

YOLO is a new approach to object detection. Its unified architecture is extremely fast.
The base YOLO model processes images in real time at 45 frames per second. A smaller version
of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving
double the mAP of other real-time detectors.
Compared to state-of-the-art detection systems, YOLO makes more localization errors but
is less likely to predict false positives on background. Finally, YOLO learns very general
representations of objects. It outperforms other detection methods, including DPM and R-CNN,
when generalizing from natural images to other domains like artwork.
Humans glance at an image and instantly know what objects are in the image, where they
are, and how they interact. The human visual system is fast and accurate, allowing us to perform
intricate tasks like driving with little conscious thought. Current detection systems repurpose
classifiers to perform detection. More recent approaches like R-CNN use region proposal
methods to first generate potential bounding boxes in an image and then run a classifier on these
proposed boxes. After classification, post-processing is used to refine the bounding boxes,
eliminate duplicate detections, and rescore the boxes based on other objects in the scene. These
complex pipelines are slow and hard to optimize because each individual component must be
trained separately. In YOLO, object detection is reframed as a single regression problem,
straight from image pixels to bounding box coordinates and class probabilities.
YOLO is refreshingly simple. This unified model has several benefits over traditional
methods of object detection. First, YOLO is extremely fast. Since detection is framed as a
regression problem, a complex pipeline is not needed. A neural network is simply run on a new
image at test time to predict detections. The base network runs at 45 frames per second with no
batch processing on a Titan X GPU, and a fast version runs at more than 150 fps. This means
video streams can be processed in real time with less than 25 milliseconds of latency.
Furthermore, YOLO achieves more than twice the mean average precision of other real-time
systems.
Second, YOLO reasons globally about the image when making predictions. Fast R-CNN,
a top detection method, mistakes background patches in an image for objects because it cannot
see the larger context. YOLO makes less than half the number of background errors compared
to Fast R-CNN. Third, YOLO learns generalizable representations of objects. Since YOLO is
highly generalizable, it is less likely to break down when applied to new domains or unexpected
inputs. YOLO still lags behind state-of-the-art detection systems in precision: while it can
quickly identify objects in images, it struggles to precisely localize some objects, especially
small ones.
The spatial constraint imposed by YOLO limits the number of nearby objects that the
model can predict. The model struggles with small objects that appear in groups, such as flocks
of birds. Finally, while training is done on a loss function that approximates detection
performance, the loss function treats errors the same in small bounding boxes and in large
bounding boxes. A small error in a large box is generally benign, but a small error in a small
box has a much greater effect on IOU. The main source of error is incorrect localization.
YOLO is a fast, accurate object detector, making it ideal for computer vision applications.
While YOLO processes images individually, when attached to a webcam it functions like a
tracking system, detecting objects as they move around and change in appearance. YOLO is a
unified model for object detection. It is simple to construct and can be trained directly on full
images. Fast YOLO is the fastest general-purpose object detector in the literature, and YOLO
pushes the state of the art in real-time object detection. YOLO also generalizes well to new
domains, making it ideal for applications that rely on fast, robust object detection.

Sl.No | Title | Author | Algorithm used | Advantages

1 | Automated barcode recognition for smart identification and inspection automation | Youssef | Fast hierarchical Hough transform | Barcode readers are used at checkout counters; however, there is a major constraint when this tool is used: with traditional camera-based imaging, the distance between the sensor and the object is close to zero when the reader is applied.

2 | CSIFT: A SIFT Descriptor with Color Invariant Characteristics | Abdel-Hakim, Farag | CSIFT | The CSIFT approach balances colour and geometrical characteristics. Colour invariance is achieved using a colour invariance model, whereas geometrical invariance is achieved by building CSIFT.

3 | SIFT optimization and automation for matching images from multiple temporal sources | Castillo-Carrion and Guerrero-Ginen | SIFT | A classic approach and the original inspiration for most descriptors proposed later. It is more accurate than other descriptors and is also scale invariant.

4 | Color-blob-based COSFIRE filters for object recognition | Gecer, Azzopardi, Petkov | COSFIRE | A positive prototype pattern for a configuration is extended by adding a set of negative prototype patterns. Effective for the detection of key points and the recognition of objects.

5 | Image matching algorithm based on SIFT using color and exposure information | Zhao, Zhai, Duvois, Wang | SIFT | Partially invariant to changes in illumination and camera viewpoint; the features are highly distinctive.

6 | YOLO-Tomato: A Robust Algorithm for Tomato Detection Based on YOLOv3 | Zhou, Cheng, Li | Robust algorithm | Has a local minimum point, guarantees starting from any initial design, and usually requires few calculations for each iteration.

7 | Fast, Compact and Discriminative: Evaluation of Binary Descriptors for Mobile Applications | Madeo, Dober | Binary descriptor and image retrieval | Binary image descriptors encode patch appearance using a compact binary string, providing an attractive alternative to the widely used floating-point descriptors.

8 | 3D object recognition in cluttered scenes with robust shape description and correspondence selection | Tang, Song, Chen | 3D object recognition | The projected image can be efficiently processed with faster 2D convolutions, yielding lower latency.

9 | Object recognition using local invariant features for robotic applications: A survey | Loncomilla, Ruiz-del-Solar, Martinez | Local invariant features | The features are local and therefore robust to occlusion and clutter; no prior segmentation is required.

10 | FREAK: Fast Retina Keypoint | Alahi, Ortiz, Vandergheynst | FREAK | Reduced computational complexity; efficient and easy to implement.

11 | Speeded-Up Robust Features (SURF) | Herbert Bay, Andreas Ess, Tinne Tuytelaars and Luc Van Gool | SURF | After an image is converted into an integral image, the difference between any two blocks can be computed with just six calculations.

12 | Object recognition and full pose registration from a single image for robotic manipulation | Collet, Berenson, Srinivasa, Ferguson | Robust perception (RANSAC, Mean Shift) | Supports you as you dig inside both the application and the infrastructure to see what is actually going on.

13 | PCA-SIFT: A More Distinctive Representation for Local Image Descriptors | Ke, Sukthankar | PCA-SIFT | Low noise sensitivity, decreased storage and memory requirements, and increased efficiency.

14 | BRISK: Binary Robust Invariant Scalable Keypoints | Leutenegger, Chli, Siegwart | BRISK | Rotation and scale invariant, but takes more time to detect the feature points.

15 | Vision Guided pick and place objects using Robotic arm based on SIFT | Patil | SIFT | Can generate a large number of features that densely cover the image over the full range of scales and locations.

16 | A Hybrid Model for Frontal View Human Face Detection and Recognition | Mayuresh Naik, Saxena, Lalit | Hybrid | Extracts features from an input image of a human face and matches these features with the features stored in the database.

17 | DeepFace: Closing the Gap to Human-Level Performance in Face Verification | Yaniv Taigman, Ming Yang, Lior Wolf | Deep neural network | Technology used in Facebook and Instagram.

18 | Real Time Facial Recognition Using Principal Component Analysis (PCA) and EmguCV | Sultoni, A.G. Abdullah | PCA | Assesses the accuracy of the PCA method when combined with EmguCV for face recognition in real time.

19 | Face recognition using principal component analysis and neural networks | A.S. Syed Navaz, T. Dhevishri, Pratap Mazumder | PCA and neural network | Security and authentication of a person.

Table 1: List of papers referred for literature survey

CHAPTER-3

3.METHODOLOGY
This chapter explains in detail the entire methodology used in this project work. It
describes the YOLO, PCA-YOLO, FREAK and hybrid PCA-YOLO-FREAK algorithms with
flowcharts, and explains the steps of implementation: database creation, training stage, testing
stage, comparative results of the algorithms and real-time implementation.

3.1.YOLO Algorithm

ARCHITECTURAL DESIGN:

YOLO Algorithm has been studied and flowchart for YOLO algorithm is as shown in Fig

Fig 3.1.1 Flowchart for YOLO Algorithm

This unified model has several benefits over traditional methods of object detection.
First, YOLO is extremely fast. Since detection is framed as a regression problem, a complex
pipeline is not needed. A neural network is simply run on a new image at test time to predict
detections. The base network runs at 45 frames per second with no batch processing on a Titan
X GPU, and a fast version runs at more than 150 fps. This means video streams can be
processed in real time with less than 25 milliseconds of latency. Furthermore, YOLO achieves
more than twice the mean average precision of other real-time systems.
3.2.STEPS OF IMPLEMENTATION

Face Detection:

The Haar object detector detects objects using the YOLO face detection algorithm. The
detector is used to find people's faces, noses, eyes, mouths, or upper bodies. The algorithm
examines an image within a sliding box, matching dark and light regions to identify a face that
contains a mouth, eyes and a nose. The Haar classifier determines the regions where a face can
be detected. First, the input source of the image is set up; this will be either the database or a
webcam directly. If loading from the database, the corresponding functionality is implemented.
Note that the image variable has been declared global, so it can be accessed anywhere in the
GUI.
Merge Threshold: the criterion needed to define the final detection in an area where there
are multiple detections around an object. With the threshold specified by an integer, varying the
integer targets a larger area and influences the false detection rate during multiscale detection.
ScaleFactor: this controls the detection resolution between MinSize and MaxSize. The
search region is scaled by the detector between MinSize and MaxSize, where MinSize is the
size of the smallest detectable object (a two-element vector, set in pixels). Similarly, MaxSize is
the largest detectable object, in the same format as MinSize, set in pixels.
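The MergeThreshold, ScaleFactor and MinSize parameters described above have rough counterparts in OpenCV's cascade detector. The sketch below is a hypothetical OpenCV equivalent, not the code actually used in this project: minNeighbors plays a role similar to MergeThreshold, and the image file name is a placeholder.

    import cv2

    # Pre-trained frontal-face model shipped with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    image = cv2.imread("classroom.jpg")                 # placeholder input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    faces = cascade.detectMultiScale(
        gray,
        scaleFactor=1.05,      # detection-resolution step between window sizes (cf. ScaleFactor)
        minNeighbors=1,        # roughly analogous to the MergeThreshold described above
        minSize=(20, 20))      # smallest detectable face (cf. MinSize)

    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    print(f"{len(faces)} faces detected")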

Crop and Write Images to a Folder:

The faces are cropped and resized, and the files are then written to disk. All the faces
detected in the bounding boxes are checked and cropped; a for-loop iterates over each bounding
box.
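A minimal sketch of this crop-and-save step (assumed folder name, dummy frame and example boxes, not the project's code) could look like this:

    import os
    import cv2
    import numpy as np

    output_dir = "cropped_faces"                                  # assumed output folder
    os.makedirs(output_dir, exist_ok=True)

    frame = np.zeros((480, 640, 3), dtype=np.uint8)               # stand-in for a captured frame
    bounding_boxes = [(100, 80, 60, 60), (300, 120, 55, 55)]      # example (x, y, w, h) detections

    for i, (x, y, w, h) in enumerate(bounding_boxes):
        face = frame[y:y + h, x:x + w]                            # crop the detected face region
        face = cv2.resize(face, (112, 112))                       # fixed size for the recogniser
        cv2.imwrite(os.path.join(output_dir, f"face_{i}.png"), face)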

Face Recognition:

The recognition part of the system has been implemented using YOLO with Singular
Value Decomposition (SVD). This choice was based on the information gathered during the
literature review.

Creating and Loading the Dataset:

The dataset is a folder of image folders, with one image folder assigned to each subject in
the dataset. Parameters are assigned that will be used later in the implementation to generate the
indices needed for the training and recognition processes respectively.

The dataset content is declared by the variable “folderContents” as a cell matrix. A matrix
is an array of a single data type, while a cell matrix is a matrix that holds matrices of varying
types and formats. Before going through each folder in our dataset, we initialize an empty cell
where this information will be stored. In order to have a valid index for each subject in the
dataset, we initialize the subject index (studentIndex) to zero. The size command is used to find
the number of folders in the dataset. Inside the loop, the folder for each subject is accessed to
read the list of files ending in the image format in which they were saved. It is important to
display the subject's name to the user to show that the code is being executed. A vector declared
by the variable “uvt” contains 5 integers that match the integer values of the file names stored in
the folder of each subject. With these integer values, only the images used for training and for
recognition testing are accessed and processed respectively.

Feature Extraction:

First use the function “imagesize” to ensure all the images in the folder are of the same
size. Then we convert each image to grayscale. This improves computational speed, and little
information is lost during extraction. The images (defined by the vector) are read in a for-loop.

Image Filtering:

A minimum order-statistic filter (a non-linear spatial filter) is applied to the grayscale
image. This filter reduces the amplitude of intensity variation by replacing each pixel with an
order statistic of its neighbourhood. It is an image restoration technique that makes corrupted
images closely resemble the original image. The results were good due to the use of this filter.
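For reference, a minimum order-statistic filter can be applied with SciPy as in the hedged sketch below (a random stand-in image is used; this is not the report's implementation):

    import numpy as np
    from scipy.ndimage import minimum_filter

    gray = np.random.randint(0, 256, (56, 46)).astype(np.uint8)   # stand-in grayscale face image
    # Replace each pixel with the minimum of its 3x3 neighbourhood,
    # damping small spikes in intensity.
    filtered = minimum_filter(gray, size=3)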

Proceeding to feature extraction for training, we use SVD. First, we convert the image into
grayscale and convert the block matrix into double to represent a data matrix with grayscale
values for each pixel. The “blockcell” is a 5x52 matrix, with 52 rows corresponding to the
number of blocks extracted for each image and 5 columns corresponding to the number of
images to be trained. These values are obtained because the image is resized to half of its size to
give [56 46]. This increases computational speed, and the observation sequence is generated as
explained in the design chapter. The overlapping size is given by OL = (newHeight-1). The
number of blocks, as seen earlier, is given by #Block = (H-newH)/(newH-OL) + 1. This leaves
us with #Block = (56-5)/(5-4)+1 = 52.
The block extraction starts with the parameters of the blocks being declared in figure 5.7.
With these parameters declared, extraction is implemented by the code from line 74 to line 82 of
figure 5.9 above. This converts each face into a sequence of 52 elements, as mentioned earlier,
represented by a 5-by-52 element matrix. Note that we are using 5 images for training, as
declared by the vector of integers.
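The block counting and per-block SVD described above can be sketched as follows (illustrative only, with a random stand-in image; the real system operates on the resized face images):

    import numpy as np

    H, W = 56, 46                                      # resized face image (half size, as above)
    new_h = 5                                          # block height
    overlap = new_h - 1                                # OL = newHeight - 1 = 4
    step = new_h - overlap                             # one row per step

    image = np.random.rand(H, W)                       # stand-in grayscale face image

    n_blocks = (H - new_h) // step + 1                 # (56 - 5) / (5 - 4) + 1 = 52
    features = []
    for b in range(n_blocks):
        block = image[b * step: b * step + new_h, :]   # 5 x 46 overlapping block
        features.append(np.linalg.svd(block, compute_uv=False))   # 5 singular values per block

    features = np.array(features)                      # 52 blocks x 5 coefficients
    print(n_blocks, features.shape)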

Quantization:

The values of the coefficients are quantized into discrete values.
The next step is block quantization. We find the maximum and minimum coefficients of
the observation vector. First, we declare the coefficients and use a for-loop to go through the
second row of the generated data and put the values into separate matrices. The stored values
are quantized into the discrete values mentioned above. These coefficients have maximum and
minimum values, which can be computed from all possible observed coefficients. First, we use
the values stored in the second row of the database (studentDatabase) to compute the minimum
and maximum values of all observation vectors using a for-loop, accumulating them into
separate matrices as shown. Each block needs to store a discrete value (label), so each block of
the image is assigned an integer. With all the labels stored as a cell matrix in the fourth row of
the database, they can be converted to a conventional matrix.
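A hedged sketch of this quantization step (the number of discrete levels is assumed, and the coefficients are random placeholders) is shown below:

    import numpy as np

    coefficients = np.random.rand(52, 5) * 10                 # stand-in SVD coefficients
    c_min, c_max = coefficients.min(), coefficients.max()     # global minimum and maximum
    n_levels = 16                                             # assumed number of discrete labels

    edges = np.linspace(c_min, c_max, n_levels + 1)
    labels = np.digitize(coefficients, edges[1:-1])           # integer label 0..n_levels-1
    print(labels.min(), labels.max())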

Training the Dataset:

This step gives an estimate of the emission and transition probabilities. First, initial
probabilities are generated. The computation is done with the number of states of our model and
all possible observation symbols obtained during quantization. This is the probability matrix
that will be used to estimate the final state probabilities. The return values are an n-by-n matrix
of ones representing the number of states, since we are using the 7-state model. These are stored
in the sixth row of the database. These processes are iterated for all training images
corresponding to the integers declared by the vector “uvt” for the subjects in the dataset.
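As a rough illustration of how transition probabilities can be estimated for a 7-state model (a sketch only, using a random stand-in label sequence rather than the project's data):

    import numpy as np

    n_states = 7
    sequence = np.random.randint(0, n_states, size=52)        # stand-in state sequence for one face

    # Start from the n-by-n matrix of ones mentioned above so that no transition
    # has zero probability, then count the observed transitions.
    transitions = np.ones((n_states, n_states))
    for current, nxt in zip(sequence[:-1], sequence[1:]):
        transitions[current, nxt] += 1

    # Normalise each row to sum to one, giving the estimated transition probabilities.
    transition_prob = transitions / transitions.sum(axis=1, keepdims=True)
    print(np.round(transition_prob, 2))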

Recognition:

Recognition is done by associating each model with a face. Each input image is associated
with its own observation vector. Using the trained model, we have the final estimated transition
and emission matrices. This process is iterated for all images. The maximum probability of a
sequence, given the transition and emission matrices, can be computed with the YOLO
algorithm. The results are stored for each face index. To obtain the percentage at which the
recognized subject agrees with the generated sequence, we iterate through all images. The
YOLO face command is used to generate the random sequence of states of the model from the
transition and emission probabilities of the database (studentDatabase). From the requirements,
the client wanted to see the matched face displayed alongside the probe image. The name of the
subject the image is matched to is displayed on top of the image found in the database. This was
implemented and displayed using the “imshow” command. A for-loop is used to go through the
database and select the folder set that is identical to the match corresponding to the index of the
maximum probability in the displayed results.

Data Capture Implementation:

The webcam or external camera has been implemented with the YOLO command. This
creates the webcam object cam, which connects to a single webcam on the system. With this
function, a menu can be created to capture or exit once the camera is started. The face detector
algorithm has been embedded here to detect faces straight away via a selection from an input
command by the user.
The system can be evaluated by calling the face recognition function. The face recognition
function returns the index of the recognized face in the database. The total number of correctly
recognized faces can then be computed as a percentage of the total number of faces. First, the
function is implemented to go through the folder and pick the images that were not trained,
declared by a vector “uvt”. The vector has integer values that match the indices of the images
that were trained. The total number of recognized subjects is finally held in a function.

3.3.HARDWARE AND SOFTWARE USED

Hardware: Real-time implementation is done using a laptop and camera.
Laptop specifications: Dell Inspiron 15, Intel (R) Core (TM) i5-7200U processor, 3.1 GHz.
Camera specifications: Logitech HD 720p.
Software: YOLO image processing toolbox, YOLO image acquisition toolbox. YOLO
version: 2017.
In the next chapter, the YOLO algorithm is implemented. Results for all five steps, i.e.
database creation, training stage, testing stage, comparative results of the algorithms and
real-time implementation results, are presented.

3.4. ADVANTAGES

- The software can be used for security purposes in organizations and in secured zones.
- The software stores the detected faces and automatically marks attendance.
- The system is convenient and secure for its users.
- It saves their time and effort.
3.5.DISADVANTAGES

- The system does not recognize faces properly in poor light and so may give false results.
- It can only detect faces from a limited distance.

3.6.APPLICATIONS
- The system can be used in places that require security, such as banks, military facilities, etc.
- It can also be used in houses and housing societies to recognize outsiders and record their
identity.
- The software can be used to mark attendance based on face recognition in organizations.

3.7.EVALUATION OF THE SYSTEM:

Window Size: The window size consists of the MinSize and MaxSize. This sets the size
a face detection can be. For this system, to maximize the accuracy, I have decided to use
MinSize only which sets the minimum size for a face in order to include very tiny faces at the
back of the class, as the faces are of different sizes. The MinSize [height width] is greater than
or equal to [20 20] for this system. Other sizes have been tested during the implementation and
iteration testing of the system before settling for this size.
ScaleFactor: The ScaleFactor determines the scale for the detection resolution between
successive increments of the window size during scanning. This parameter helps to decrease the
number of false positives (a decrease in false positives results from an increase in scale factor).
MergeThreshold: This parameter controls the number of face detections and declares the
final detection of a face area after the combination and rejection of multiple detections around
the object. The accuracy of the system depends on the level of MergeThreshold: the higher the
MergeThreshold level, the lower the accuracy, and vice versa.
The algorithm has been analyzed with images of different sizes for Image1, Image2 and
Image3 respectively. These images were taken from different websites showing students in a
classroom setting with natural sitting positions, with faces of different sizes (from 24x24 to
60x60). The textual description of the images has been summarized in the table below and
classified in order of difficulty.
Key:
nFaces = Number of Faces per image
nRows = Number of Rows per image

Image1 — Image Size: 930x620, nFaces: 29, nRows: 9, Classification: Easy
 - All the students are facing the camera with each face clearly visible.
 - The sitting position is not relatively spaced out (can be seen as organised).
 - The students are arranged in 8 rows and 6 columns directly facing the camera.
 - The sitting arrangement is queued in a quadrilateral form.
 - There is minimal obstruction to the students' frontal faces.
 - At least 5 students have facial hair.
 - Older-looking students in the age range 30 to 50.
 - At least 7 students have glasses on.
 - The face sizes range from 25x25 to 80x80.
 - This is a clear picture with high illumination and high resolution.
 - The angle of inclination to the camera is approximately 90 degrees.

Image2 — Image Size: 850x570, nFaces: 59, nRows: 17, Classification: Difficult
 - Not all the students are facing the camera, as their sitting positions are side by side.
 - The sitting position has a more natural lecture-hall feel compared to Image1.
 - The sitting position can be seen in the form of 11 rows with 5 identical columns at the front
   and 7 columns at the back.
 - Students' faces are at different angles to each other with reference to the camera position.
 - The gap between the first row and the last row is wide, making the face sizes at the back very tiny.
 - There is a wide variety of face sizes ranging from 16x16 to 60x74.
 - The image shows increased brightness at the front seats, decreasing progressively towards the
   back of the lecture hall.
 - This image can generally be described as having low illumination and lower resolution
   compared to Image1.
 - The angle of inclination to the camera is approximately 90 degrees.

Image3 — Image Size: 1200x630, nFaces: 88, nRows: 13, Classification: Very Difficult
 - The camera is positioned to capture most of the classroom.
 - There is a sparse, haphazard distribution of the students distant from the camera.
 - The majority of the students are sitting at the back of the classroom.
 - There is a wide gap and empty space in the middle between the first row and the last row.
 - The faces are at different angles to each other.
 - There is a wide variety of face sizes compared to Image2, ranging from 14x16 to 40x40.
 - Some students have their hands on their faces, and some faces are obstructed by other students.
 - The lecturer obstructs some faces in some of the front rows which could otherwise be detected.
 - There is no clear distinction of facial features such as beards, mouth and nose because the
   student faces in the image are a little blurry.
 - The angle of inclination to the camera is approximately 45 degrees.

Table 2: Textual Description of Images with Level of Difficulty for Face Detection.

In order to carry out the experiment, all three images were used. The image-size values
were generated by resizing each image to ¾, ½ and ¼ of the original image size respectively.
The Merge Threshold values were obtained by performing the experiment at Merge Threshold
levels of 0, 1, 5, 7 and 10. The ScaleFactor used was 1.05 and 1.25. The performance metrics
used to analyze the impact of these techniques are True Positive (TP), False Positive (FP) and
False Negative (FN).
Where:
TP is the number of faces detected by the algorithm (detected and identified as true),
FP is the number of non-faces erroneously detected as faces (detected and identified as false),
FN is the number of faces not detected.
Precision is the proportion of TP against all positive results: Precision = TP / (TP + FP).
Detection Rate = TP / Total Faces.
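These two metrics can be computed directly from the table entries; the short sketch below uses the Image1 row of Table 4B (28 TP, 15 FP, 29 faces in total) as an example:

    def precision(tp, fp):
        # Proportion of true positives among all positive detections.
        return tp / (tp + fp) if (tp + fp) else 0.0

    def detection_rate(tp, total_faces):
        # Proportion of faces in the image that were actually detected.
        return tp / total_faces if total_faces else 0.0

    print(f"precision: {precision(28, 15):.2f}")            # 0.65
    print(f"detection rate: {detection_rate(28, 29):.2%}")  # 96.55%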
Below, the first image is considered, with the students sitting with all their faces facing
the front of the classroom.

Figure 3.3.1 of Image1: Students sitting at different positions from first row to last
row. Face Detection tested with no parameters.

In this image, the size of the image has not been minimized. The Merge Threshold
level, face window size and Scale Factor have not been taken into consideration. This shows
how well the system can perform with a good quality image with faces not too distant from
the camera.
Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image1 | 4 | 930x620 | [] | 1.1 | 29 | 0 | 0 | 100
Image2 | 4 | 850x570 | [] | 1.1 | 29 | 0 | 0 | 49.15
Image3 | 4 | 1200x630 | [] | 1.1 | 20 | 0 | 0 | 22.72

Table 3: Showing default parameters

Figure 3.3.2 of Image1: Face Detection tested with different parameters.

In figure 3.3.2 of Image1, the algorithm has been run with the following parameters:
MergeThreshold = 1, face window size = [20,20] and ScaleFactor = 1.05.

In this approach, the [20,20] window slides over the image in different directions,
scanning the image. After one complete scan, the scale factor (1.05) is applied to increase the
detection resolution of the window. This process continues until the window size approximates
the size of each face. The image size also plays a large role during this process, as it determines
the face sizes in the image. Reducing the image size may lead to a loss of features. However, the
detector still finds some false positives, as the algorithm searches for anything with the shape
and features of a face. Further experiments were carried out to demonstrate the performance of
the algorithm. The results of these experiments are summarized in the tables below.

Table Key:

MergeThreshold = MT.

YOLO function Imagesize(I), where I is the image; ScaleFactor = SF.

True Positive = TP.

False Positive = FP.

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image1 | 4 | Image size(I) | [] | 1.1 | 29 | 0 | 0 | 100
Image1 | 4 | Image size(I,0.75) | [] | 1.1 | 29 | 0 | 0 | 100
Image1 | 4 | Image size(I,0.5) | [] | 1.1 | 19 | 0 | 0 | 65.51
Image1 | 4 | Image size(I,0.25) | [] | 1.1 | 3 | 0 | 0 | 10.34
Table 4A Image 1: Shows Performance Metrics of different Image sizes (using
Imagesize) to ¾, ½ and ¼ of the original image size using default parameters.

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image1 | 1 | Image size(I) | [20 20] | 1.05 | 28 | 15 | 0 | 96.55
Image1 | 1 | Image size(I,0.75) | [20 20] | 1.05 | 28 | 9 | 0 | 96.55
Image1 | 1 | Image size(I,0.5) | [20 20] | 1.05 | 25 | 3 | 4 | 89.29
Image1 | 1 | Image size(I,0.25) | [20 20] | 1.05 | 7 | 0 | 22 | 24.13

Table 4B Image 1: Shows Performance Metrics of different Image sizes (using
Imagesize) to ¾, ½ and ¼ of the original image size for YOLO face detection.

Table 4B above shows the evaluation of Image1 at its original size and at different reduced
sizes, and the results have been displayed as shown. From this table, the system performs best
with a 96.55% detection rate. As the image is reduced to different sizes, the detection rate can
be seen to drop. A plot of the detection rate against image size is shown in Figure 3.3.3 below.

Figure 3.3.3: Plot of Detection Rate Against Image Size for Image 1.

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image1 | 0 | Image size(I) | [20 20] | 1.25 | 22 | 3 | 0 | 75.86
Image1 | 1 | Image size(I) | [20 20] | 1.25 | 28 | 5 | 1 | 96.55
Image1 | 5 | Image size(I) | [20 20] | 1.25 | 21 | 0 | 8 | 72.41
Image1 | 7 | Image size(I) | [20 20] | 1.25 | 15 | 0 | 14 | 56.62
Image1 | 10 | Image size(I) | [20 20] | 1.25 | 11 | 0 | 18 | 37.93

Table 4C Image 1: Shows Performance Metrics of different Merge Threshold Levels

Table 4C above shows the analysis at different MergeThreshold levels with a ScaleFactor
of 1.25 and a window size of [20,20], keeping the original size of the image. The detection rate
was 75.86% at a threshold level of 0. At this MergeThreshold level, a detected face is counted
depending on the number of windows clustered on the face. The least number of windows on a
face to be considered is three; the smaller windows whose centres fall within the centre of the
face and the window with the highest detection confidence were considered. The number of
detecting windows is expected to grow linearly with the size of the window scanned over a face.

As a consequence, faces with fewer than three bounding boxes are not counted when the
system is run at a MergeThreshold level of zero. Consequently, objects with a high cluster of
windows were counted as a face and used to calculate the detection rate. Figure 3.3.4 below
shows the face detection performed at a MergeThreshold level of zero.

Figure 3.3.4 of Image1 (Gettyimages): Face Detection performed at Merge

Threshold level of 0.

Figure 3.3.5: Plot of Detection Rate Against Merge Threshold level for Image 1.

A further experiment was carried out on image 2 and image 3 respectively, with the same
parameters used to analyze image 1, and the following results were obtained as shown in the
figures and tables below.

Figure 3.3.6 of Image 2 (Gettyimages): Face Detection tested with different parameters

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image2 | 4 | Image size(I) | [] | 1.1 | 29 | 0 | 0 | 49.15
Image2 | 4 | Image size(I,0.75) | [] | 1.1 | 18 | 0 | 0 | 30.51
Image2 | 4 | Image size(I,0.5) | [] | 1.1 | 8 | 0 | 0 | 13.55
Image2 | 4 | Image size(I,0.25) | [] | 1.1 | 0 | 0 | 0 | 0

Table 5A Image 2: Shows Performance Metrics of different Image sizes (using Image
size) to ¾, ½ and ¼ of the original image size using default parameters of the YOLO Face
Detection algorithm.

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image2 | 1 | Image size(I) | [20 20] | 1.05 | 47 | 12 | 12 | 76.6
Image2 | 1 | Image size(I,0.75) | [20 20] | 1.05 | 33 | 9 | 26 | 55.93
Image2 | 1 | Image size(I,0.5) | [20 20] | 1.05 | 17 | 4 | 42 | 28.81
Image2 | 1 | Image size(I,0.25) | [20 20] | 1.05 | 3 | 1 | 56 | 5.08

Table 5B Image 2: Shows Performance Metrics of different Image sizes (using


Image size) to ¾, ½ and ¼ of the original image size.

Figure 3.3.7: Plot of Detection Rate Against Image Size for Image 2.

Image | MT | Image Size | Window Size | SF | TP | FP | FN | Detection Rate
Image2 | 0 | Image size(I) | [20 20] | 1.25 | 39 | 0 | 20 | 66.10
Image2 | 1 | Image size(I) | [20 20] | 1.25 | 40 | 7 | 19 | 67.79
Image2 | 5 | Image size(I) | [20 20] | 1.25 | 14 | 0 | 45 | 23.72
Image2 | 7 | Image size(I) | [20 20] | 1.25 | 11 | 0 | 48 | 18.64
Image2 | 10 | Image size(I) | [20 20] | 1.25 | 8 | 0 | 51 | 13.56

Table 5C Image 2: Shows Performance Metrics of different Merge Threshold Levels


Figure 3.3.8: Plot of Detection Rate Against Merge Threshold level for Image 2.

Figure 3.3.9: Face Detection of Image1 at original Size.

Figure 3.3.10: Face Detection of Image1 at ¾ of Original Size

Figure 3.3.11: Face Detection of Image1 at ½ of original Size

Figure 3.3.12: Face Detection of Image1 at ¼ of original Size

The tables above show three performance metrics obtained with the different parameters
used during the experiment. Tables 4A and 5A show the performance metrics at different image
sizes with respect to the default parameters of the YOLO face detection algorithm. In Table 4A,
the performance of the algorithm on Image1 shows a 100% output. However, this depends on
the illumination and the frontal pose of each individual in the picture. As the size of the image is
halved, a marked drop in the detection rate leaves the TP at a total of 19 detected faces, and it
drops further as the image size is reduced to ¼. There were no FP and FN obtained. Comparing
with Table 4B, against the parameters set out in this experiment, the algorithm performs better
because it shows a detection rate of 96.55 that does not fall drastically as the image size is
reduced, compared to the values obtained in Table 4A.
The performance of the algorithm in Table 5A shows how poorly the algorithm performs
with the default parameters on an image of a more difficult level. The detection rate is 49.15%,
as compared to 76.6 obtained in Table 5B, and it decreases as the image size decreases. There
are no FP and FN obtained using the default parameters. On a more difficult image, as shown in
Table 5A, the detection rate is 22.72% against 53.40 using the parameters specified in this
report.
Figures 3.3.9 to 3.3.12 show the different sizes of Image1. The original size of the image is
shown in Figure 3.3.9. Figure 3.3.10 shows the image reduced to ¾ of the original size. From
the figure, the reduction in size is not as noticeable as the reduction shown in Figure 3.3.11
(reduced to half of the original size) and Figure 3.3.12 (reduced to ¼ of the original size). This
has an impact on the face sizes of the individuals in the images. The face sizes shrink as the
image size shrinks, leading to a poor detection rate. As a result, the performance of the system
depends on the variation of the image resolution.
This shows that the default parameters will only perform better with images of higher
illumination, higher resolution, a face size of at least 25x25, enough contrast on the faces, and
most of the characteristics outlined for Image1 in Table 2 above.
The values were obtained by resizing the original images of Image1, Image2 and Image3
(930x620, 850x570, 1200x630) respectively. Each image was scanned with a scale factor of
1.05 at a window size of [20 20]. In these tables (4B and 5B), the number of True Positives and
False Positives and the detection rate drop as the image size decreases. In contrast, the number
of False Negatives increases as the image size decreases. This has been further illustrated by
plotting the detection rate against image size, as shown in Figures 3, 7 and 10 respectively.
Tables 4C and 5C show the performance metrics at different MergeThreshold levels. Each
threshold level was introduced as the experiment was carried out. The window size [20 20] was
scaled at 1.25. From these tables (4C and 5C), we can see that the number of True Positives and
False Positives decreases as the MergeThreshold level is increased through the different levels
from 1 to 10. There is no great change at a MergeThreshold level of zero. The detection rate
decreases as the Merge Threshold level increases, and the number of False Negatives increases
as the Merge Threshold level increases. To further illustrate this, a plot of detection rate against
Merge Threshold level is shown in figures 5, 8 and 11 respectively.
From the analysis above, based on the figures obtained, the face detection part of this
system uses a Merge Threshold of 1, a ScaleFactor of 1.05, and a window size of [20 20], with
the assumption that the lecturer will ask the students to face forward into the camera and that
the students will cooperate, giving the best performance rate for face detection.
The faces that have not been detected were either faces whose resolution the ScaleFactor
could not correctly determine, as set by the MinSize of the system, or faces that did not have
enough contrast. Some of the faces appear in half profile, which makes it difficult to extract the
features that make it easy for the system to detect them as faces (head pose).
Approximately 56 out of 88 faces in image3 and approximately 12 out of 59 faces
in image2 are too far at the back of the lecture hall which can additionally be a resolution
quandary. The sizes of these faces are approximated at [17 17], [16 16], [19 19], [19x25],
which are far below the minimum size designated by the system for detection. Moreover,
one can conclude that albeit all the images are of high resolution (930x620, 850x570,
1200x630) it will not meet the requisites for a better performance of the algorithm if
truncated remarkably. A truncation in the image size leads to a truncation in the face sizes
which in turn leads to poor detection rates. In summary for better performance the image
size should not be truncated more than ¾ of its pristinesize.
Furthermore, the arduousness of image3 which makes it arduous to optically
discern facial features even from the human perspective could be a more reason why the
system could not detect more faces. However, it can be very arduous to decide image
quality as it is a contributing factor that determines the evaluation time to detect a face
(illumination).
The configuration of this component of the system was settled based on the
findings from the investigation of the different parameters of the YOLO algorithm. The
purpose of this component is to detect faces and count the number of faces detected for
attendance. The system automatically counts the total number of bounding boxes it detects
as faces; this total includes both True Positives and False Positives. The headcount for
the lecture room is therefore obtained from the total number of bounding boxes detected
by the system.
This count can, however, be higher or lower than the actual number of students
because of False Positives and missed faces. The lecturer will have to verify it by
counting the students and comparing the manual count with that obtained by the system.
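A trivial sketch of this cross-check is shown below; every value in it is made up for illustration.

    # Hypothetical detector output (TP + FP bounding boxes) and a hypothetical manual count.
    bounding_boxes = [(120, 80, 28, 28), (410, 95, 26, 26), (700, 110, 25, 25)]
    system_count = len(bounding_boxes)
    lecturer_count = 3
    print("System count:", system_count,
          "| Lecturer count:", lecturer_count,
          "| difference:", abs(system_count - lecturer_count))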
To make this practical, the front three rows are photographed first, and the
remaining students are covered in further groups of three rows for detection and,
subsequently, for the recognition phase of the system. Recognition cannot be achieved
from very small detected faces, as their size is too small to carry usable detail.
To achieve this for a class of 80, a 16 MP phone camera is recommended, giving a
maximum image size of 4920x3264 pixels. The assumption here is that the camera is
positioned directly facing the students with respect to their seating positions.
For this project, the practical conclusion for the recognition phase is to use a
professional camera that can produce a high-resolution image with good illumination. With
this, the class is taken as 8 rows of 40 students. The assumption here is that an image of
each row is taken with the students facing the camera. This allows the face detection part
of the system to detect faces at a size that is adequate for face recognition, and also to
produce an accurate headcount for the class register. Each row of students is therefore
considered in turn, and a photo of each set of students is taken for face detection. The
face detection part of the system has been implemented to automatically crop the faces
detected within bounding boxes. The cropped faces are automatically stored in a folder
and are used as test images for the recognition stage.
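A sketch of this cropping step is shown below, again assuming OpenCV; the input photo, parameter values and output folder name are illustrative assumptions.

    import os
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    image = cv2.imread("row_photo.jpg")              # hypothetical photo of one row of students
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=1, minSize=(20, 20))

    os.makedirs("cropped_faces", exist_ok=True)      # folder of test images for the recognition stage
    for i, (x, y, w, h) in enumerate(boxes):
        cv2.imwrite(os.path.join("cropped_faces", f"face_{i}.jpg"),
                    image[y:y + h, x:x + w])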

CHAPTER-4

4. RESULTS AND DISCUSSION


This chapter presents and describes the results and output obtained with the YOLO
algorithm for all five stages, i.e. database creation, the training stage, the testing
stage, comparative results of the algorithms, and real-time implementation results.

4.1. DATABASE CREATION

Fig 4.1 Database of objects in classroom


4.2. TRAINING STAGE
The YOLO algorithm has been trained on the static images of the created
database for all 25 objects, with 5 samples of each object.

4.2.1. Training using the YOLO algorithm: The database creation code for the YOLO
algorithm is run. YOLO feature descriptors are generated for each image, and CSV files
are created and saved folder-wise, from s(1) to s(25), within the YOLO folder, one for
each object and for each of its 5 sample images. Fig 4.2.1 shows the CSV file database
for YOLO for object 1 and its 5 samples.

Fig 4.2.1 CSV file database for YOLO
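A minimal sketch of this folder and CSV layout is given below; the descriptor function is a placeholder (the actual extractor output is not reproduced here), the image file names are invented, and NumPy is assumed for writing the CSV files.

    import os
    import numpy as np

    def extract_descriptor(image_path):
        """Placeholder feature extractor returning a fixed-length vector (illustrative only)."""
        return np.random.rand(128)

    for obj in range(1, 26):                          # objects 1..25
        folder = f"s({obj})"                          # folders s(1) .. s(25)
        os.makedirs(folder, exist_ok=True)
        for sample in range(1, 6):                    # 5 sample images per object
            desc = extract_descriptor(f"object{obj}_sample{sample}.jpg")
            np.savetxt(os.path.join(folder, f"sample{sample}.csv"), desc, delimiter=",")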

4.3. TESTING STAGE

Testing is performed on static test images saved in a folder called test images.
The testing stage is carried out using the YOLO algorithm code.

4.3.1. Testing using the YOLO algorithm: Figure 4.3.1(a) shows the results obtained with
YOLO; in the command window we can see the object match and the execution time reported
for each test image as it is processed. No match was found for objects 5 and 6, and
object 22 was matched to object 21. The recognition rate obtained with the YOLO algorithm
is 88%. This is because some of the key points may have been missed, and the algorithm
confuses some objects.

Fig 4.3.1: Testing using YOLO Algorithm
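The shape of such a testing loop is sketched below; the nearest-neighbour matching rule, the placeholder descriptor function and the folder names are assumptions of this sketch rather than the exact code behind Figure 4.3.1.

    import glob
    import os
    import time
    import numpy as np

    def extract_descriptor(image_path):
        """Placeholder feature extractor, as in the training sketch (illustrative only)."""
        return np.random.rand(128)

    # Load the stored descriptors: one folder per object, one CSV per sample.
    database = {}
    for folder in glob.glob("s(*)"):
        database[folder] = [np.loadtxt(f, delimiter=",")
                            for f in glob.glob(os.path.join(folder, "*.csv"))]
    if not database:
        raise SystemExit("No descriptor folders found; run the training step first.")

    # Match each test image against the database and report the execution time.
    for test in sorted(glob.glob(os.path.join("test images", "*.jpg"))):
        start = time.perf_counter()
        query = extract_descriptor(test)
        best = min(database,
                   key=lambda k: min(np.linalg.norm(query - d) for d in database[k]))
        print(f"{test} -> {best} ({time.perf_counter() - start:.3f} s)")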

4.4. COMPARATIVE STUDY

The comparative analysis is presented in tabular form, as a confusion matrix and
as a bar graph. Table 4.4 lists the computation time, recall, recognition rate and the
match obtained for each object in the dataset.

Confusion matrix: A confusion matrix is a summary of prediction results on a
classification problem. The numbers of correct and incorrect predictions are shown as
counts, broken down by class. It shows the ways in which the classification model gets
confused when it makes predictions, giving insight not only into the errors made by the
classifier but also into the types of errors being made. Each row of the matrix
corresponds to an actual class, while each column corresponds to a predicted class. The
counts of correct and incorrect classifications are then filled into the table: the
number of correct predictions for a class goes into the cell at that class's actual row
and its own predicted column, and, in the same way, the number of incorrect predictions
for a class goes into the cell at that class's actual row and the column of the class it
was predicted as.
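For illustration, a confusion matrix can be computed with scikit-learn as below; the label lists are invented and merely echo the kinds of confusion reported in the testing section (objects 5 and 6 unmatched, object 22 confused with object 21).

    from sklearn.metrics import confusion_matrix

    # Invented ground-truth and predicted labels for a handful of test images.
    true_labels      = ["obj5", "obj6", "obj21", "obj22", "obj1"]
    predicted_labels = ["none", "none", "obj21", "obj21", "obj1"]

    cm = confusion_matrix(true_labels, predicted_labels,
                          labels=["obj1", "obj5", "obj6", "obj21", "obj22", "none"])
    print(cm)   # each row is an actual class, each column a predicted class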

Fig 4.4.1: Comparison of Recognition rates/Accuracy

Fig 4.4.2: System generated excel sheet

4.5. THE PROBLEMS

During the development of the system, some problems were encountered; they are
discussed here.

1. The experiments conducted after the implementation of the system required some
changes. During the initial phase of the evaluation, the aim was to vary the parameters.
However, it became clear that pursuing this objective with these parameters would cause
a major failure in the second part of the system. The decision to set the parameters of
this part of the system based on a very small class size was due to the failures obtained
from the recognition part of the system. The size of the image is very important in face
recognition, as every pixel counts; resizing an image discards pixels and noticeably
degraded the system's performance. The evaluation results showed the face detection part
of the system performing at approximately a 70% detection rate, which was not as good as
hoped. We have learned that face detection algorithms are sensitive to changes in
illumination, image resolution and similar factors, as already mentioned, and detection
in images with uncontrolled backgrounds is still an active research area. The next
challenge for future work and research is to implement a system that can achieve high
performance on such images.
2. The evaluation of the face recognition part of the system produced results that were
not as expected. It showed that the recognition part achieves approximately a 60%
recognition rate, depending on the image resolution; with poor resolution there is a high
chance the system will fail. Furthermore, to achieve even this result, the user must make
sure the image is of the resolution discussed in this report. Some users will not follow
these instructions, which may lead to poor image quality and poor system performance.
However, when the system is evaluated against images taken with controlled backgrounds
(the Yale database), there is a high chance of excellent results. The use of images of
faces in the wild to evaluate the system was a suggestion from our supervisor, and it
gave us a rough idea of the performance rate, at approximately 70%. This differs from the
literature review, where a similar system performed at 97% on one of the datasets used.
3. The choice of implementation platform may also have contributed to the poor
performance, as we do not know the language particularly well.
4. The algorithm used to determine the percentage probability generates a different
percentage score each time. This is because the percentage expresses the extent to which
the most likely state sequence agrees with the random sequence, where the random sequence
comes from the input image and the most likely sequence from the output image. Although
this does not change the overall recognition percentage, it was not possible to tell the
user at what percentage they could decide that a face is a match. The only way round this
was to test all five input images; with at least three matches, the user can confirm a
face, based on the output match displayed side by side.
5. The performance of the system has affected its reliability. Because this is still an
active research area, the system will not be released for general use at the end of the
project. However, it can be used for research purposes by the supervisor and trialled in
a lecture room before approval.

CHAPTER-5

5. CONCLUSION

The purpose of reducing the errors that occur in the traditional attendance-taking
system has been achieved by implementing this automated attendance system. In this
report, a face recognition system using deep learning has been presented that is robust
in recognising users, with an accuracy of 98.3%. The results show the system's ability to
cope with changes in the pose and projection of faces. During face detection, the problem
of illumination is addressed by converting the original image into a HOG representation
that captures the major features of the image regardless of brightness. In the face
recognition method, local facial landmarks are located for further processing, after
which each face is encoded into 128 measurements, and recognition is performed by finding
the person's name whose stored encoding best matches the captured face. The result is
then used to generate an excel sheet, a pdf of which is sent to the students and
professors at weekly intervals. This system is convenient to the user and gives better
security.
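A compact sketch of this pipeline is shown below, assuming the open-source face_recognition (dlib) library for the 128-dimensional face encodings and pandas for writing the excel sheet; the image files, student names and register layout are illustrative, and the weekly pdf distribution step is not shown.

    import face_recognition
    import pandas as pd

    # One known (enrolled) face per student; names and files are illustrative.
    enrolled = {"Student A": "student_a.jpg", "Student B": "student_b.jpg"}
    known_names, known_encodings = [], []
    for name, path in enrolled.items():
        image = face_recognition.load_image_file(path)
        known_names.append(name)
        known_encodings.append(face_recognition.face_encodings(image)[0])  # 128 measurements

    # Encode every face found in a cropped classroom image and look up the best match.
    frame = face_recognition.load_image_file("classroom_crop.jpg")
    present = set()
    for encoding in face_recognition.face_encodings(frame):
        matches = face_recognition.compare_faces(known_encodings, encoding)
        if True in matches:
            present.add(known_names[matches.index(True)])

    # Write the register to an excel sheet.
    register = pd.DataFrame({"Name": known_names,
                             "Status": ["Present" if n in present else "Absent"
                                        for n in known_names]})
    register.to_excel("attendance.xlsx", index=False)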

CHAPTER-6

6. FUTURE SCOPE

This chapter briefly outlines future development that can be undertaken and other
related applications that can be built on the above system. Further improvements that
can be incorporated in the work include developing a GUI with additional features that
is more user friendly, and using a faster computer, which can improve computational
speed significantly, together with a higher-resolution camera. The system could also
serve as the visual front end of a robot for recognising and retrieving objects in
applications such as pick and place, and it could be adapted into a grocery-shopping
assistance system for the blind.

More detailed research is needed on a project such as this. The methods used here
could be combined with others to achieve better results; different methods have been
implemented in the past, according to the literature review. This will need more time,
as any trial would have to take the existing methods into consideration in order to
arrive at a genuinely new approach.

Login functionality would be implemented in the system for security purposes, and
the system would be deployed as a standalone application that could be used by other
schools. Data confidentiality is very important: at the start of each school year, images
of new students are taken and stored by the university, and each student has the right to
be informed about the use of their face in a face recognition attendance system. This
must be in line with government legislation on ethics and data protection, and the
students will have to consent to their images being used for attendance purposes. The
system that has been delivered should only be used for experimental purposes, as it is
not yet completely reliable.
