D8 - Major - Project Report Final
A PROJECT REPORT
ON
“OBJECT RECOGNITION AND FACE RECOGNITION
FOR VISUALLY IMPAIRED PEOPLE”
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by
May 2022
Rukmini Knowledge Park, Kattigenahalli, Yelahanka, Bengaluru-560064
www.reva.edu.in
SCHOOL OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Certified that the project work entitled “Object Recognition and Face Recognition for Visually Impaired People” was carried out under the guidance of Prof. Bindushree D C, Assistant Professor, School of CSE, REVA University, by the bonafide students of REVA University during the academic year 2021–22. Mr. Manoj Sharma V (R18CS218), Mr. Manthan N R (R18CS219), Mr. Lalith Kumar Nitesh Kumar Mehtha (R18CS200), and Mr. Krishnapuram Kalyan Kumar Reddy (R18CS191) are submitting the project report in partial fulfillment of the requirements for the award of Bachelor of Technology in Computer Science and Engineering during the academic year 2021–22. The project report has been tested for plagiarism and has passed the plagiarism test with a similarity score of less than 16%. The project report has been approved as it satisfies the academic requirements.
Dr. Ashwinkumar
Prof. Bindushree D C.
Guide Deputy Director
External Examiner
Dr. M Dhanamjaya
Vice Chancellor
1.
2.
DECLARATION
We, Mr. Manoj Sharma V (R18CS218), Mr. Manthan N R (R18CS219), Mr. Lalith Kumar
Nitesh Kumar Mehtha (R18CS200), Mr. Krishnapuram Kalyan Kumar Reddy
(R18CS191), students of B. Tech, belong to School of Computer Science and Engineering,
REVA University, declare that this Project Report entitled “Object Recognition and Face
Recognition for Visually Impaired People” is the result of the project work done by us under
the supervision of Mrs. Bindushree D C, Assistant Professor, School of CSE, REVA
University.
We are submitting this Project Report in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in Computer Science and Engineering by REVA University,
Bengaluru, during the academic year 2021–22.
We declare that this project report has been tested for plagiarism and has passed the plagiarism test
with a similarity score of less than 16%, and that it satisfies the academic requirements in respect of
the project work prescribed for the said degree.
We further declare that this project, or any part of it, has not been submitted for any other degree of
this University or any other University/Institution.
1.
2.
3.
4.
(Signature of the Students)
Signed on 27th May 2022.
Certified that this project work, submitted by Mr. Manoj Sharma V (R18CS218), Mr.
Manthan N R (R18CS219), Mr. Lalith Kumar Nitesh Kumar Mehtha (R18CS200), and Mr.
Krishnapuram Kalyan Kumar Reddy (R18CS191), has been carried out under our guidance,
and the declaration made by the candidates is true to the best of our knowledge.
ACKNOWLEDGEMENT
We feel it is our duty to acknowledge the help rendered to us by various persons.
With immense pleasure, we express our sincere gratitude, regards and thanks to our
project guide Mrs. Bindushree D C, Asst. Professor, School of Computer Science and
Engineering, REVA University, Bengaluru, for extending her full support, co-operation,
guidance, motivation and all the help needed throughout the course of our project
work. We can never forget her valuable guidance and the timely suggestions given to us.
We wish to record our profound and sincere gratitude to our Chancellor Dr. P
Shyama Raju and Vice Chancellor Dr. M Dhanamjaya, REVA University, Bengaluru,
for extending their full support and co-operation by allowing us to do the project in the
establishment.
We would like to thank our entire teaching and non-teaching faculty for their
support, and our friends for their friendship, which made life at REVA enjoyable and
memorable.
We would like to thank one and all who directly or indirectly helped us in
completing this project work.
ABSTRACT
The technology presented here is designed to help visually impaired people recognize objects and
faces in their environment, allowing users to walk around safely without colliding with obstacles.
Object identification is performed on the video captured by the camera. OpenCV, YOLO, and
FaceNet are used to recognize faces and objects in the recorded video. When a human face is
spotted, the algorithm matches a name to the individual, and the user is then given an audio
version of that person's identity. Likewise, objects spotted in the area are delivered to the user as
audio.
Keywords: Visually Impaired, YOLO, OpenCV, Object Identification, Face Recognition.
TABLE OF CONTENTS
Acknowledgement
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Introduction
1.2 Problem Statement
1.3 Objective
1.4 Scope of Work
Chapter 2: Literature Survey
Chapter 3: Proposed Work
3.1 Methodology
3.2 System Implementation
LIST OF FIGURES
LIST OF TABLES
Chapter 1
INTRODUCTION
1.1 Introduction
Many people are disabled, whether temporarily or permanently, and there are many blind people
throughout the world. According to the World Health Organization (WHO), about 39 million people
are fully blind, while another 285 million are visually impaired [5]. Many supporting or guiding
systems have been produced, or are being developed, to help such people navigate from one place
to another. Poor eyesight affects common tasks that require medium sight, interaction, reading and
writing that require near sight, analysis of the surrounding space, and activities involving distant sight.
Emerging developments in computer vision technology have prompted researchers to focus their
efforts on providing solutions for persons with visual impairments. These devices are designed to
help users perceive their immediate surroundings. The prospect of deploying technologies to detect
objects [7] and individuals in the immediate surroundings was investigated in this study. The
detection result is delivered to the user as digital audio, since the capabilities of vision and hearing
are similar in many respects. A real-time object detection and facial recognition system is detailed,
with the goal of making the user aware of the objects and people around them.
1.2 Problem Statement
The visually impaired are often unable to move freely because they cannot recognize the terrain
and their surroundings. In daily life, they repeatedly require assistance and walking-support
systems. Without vision, it can be difficult to navigate a room or hallway without running into
objects. Even with assistance, such as a walking stick, avoiding obstacles can be difficult,
uncomfortable, and possibly inaccurate.
1.3 Objective
The goal of this project is to create an object recognition system that can distinguish between 2D and
3D objects in a picture. The features used and the classifier chosen for recognition determine the
performance of an object recognition system. This study aims to present a new feature extraction
approach for extracting global features and obtaining local features from the study area. In addition,
the work aims to combine classical classifiers in order to recognize the object.
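As a sketch of the global-plus-local idea described above (the specific descriptors here — an intensity histogram and grid-of-patch means — are illustrative assumptions, not the report's actual feature extraction method): a global descriptor summarizes the whole image, local descriptors capture regional detail, and the two are concatenated into one vector for a classical classifier.

```python
import numpy as np

def global_histogram(image, bins=16):
    """Global feature: normalized intensity histogram of the whole image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def local_patch_means(image, grid=4):
    """Local features: mean intensity of each cell in a grid x grid partition."""
    h, w = image.shape
    ph, pw = h // grid, w // grid
    return np.array([
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw].mean()
        for r in range(grid) for c in range(grid)
    ])

def feature_vector(image):
    """Concatenate global and local descriptors for a classical classifier."""
    return np.concatenate([global_histogram(image), local_patch_means(image)])

img = np.zeros((64, 64), dtype=np.uint8)
vec = feature_vector(img)
print(vec.shape)  # (32,): 16 histogram bins + 16 patch means
```

The resulting fixed-length vector could then be fed to any classical classifier, e.g. an SVM or k-NN from Scikit-learn.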
1.4 Scope of Work
The system is both free and accessible, and it provides results in real time. It is reliable: the visually
impaired user may depend on the system for accurate results. Different objects, such as a chair or a
table, can be clearly distinguished, depending on the video quality.
Chapter 2
LITERATURE SURVEY
[1] One of the most challenging situations for the blind or visually impaired population is coping
with unemployment. Many schools adapt the existing Braille system to educate them, but out of 12
million visually impaired people, only 10% make an effort to learn Braille. Using computer vision
to read any text in any format and lighting condition, an inexpensive wearable device was designed
using a Raspberry Pi along with a camera to record the surrounding content and translate it for the
blind in their language of choice, plus a sensor to alert the user about the distance to an object. The
system combines image processing, machine learning and speech synthesis techniques. The
accuracy recorded with both the optical character recognition and the object recognition algorithms
was found to be 84% [1].
[2] Smart Specs produce a voice output for visually impaired persons using text detection. The
specs comprise an inbuilt camera to capture images, which are then analyzed using Tesseract
Optical Character Recognition (OCR). Text is converted to speech with the open-source speech
synthesizer eSpeak, and headphones deliver the speech produced by TTS. A Raspberry Pi acts as
the interface between the camera, sensors and image processing, and controls the peripheral units.
[3] In the field of electronic travel aids (ETA), which combines sensor technology and signal
processing, the mobility of visually impaired persons in dynamic conditions is greatly improved.
Results have been achieved in areas such as integrated environments for assisted movement,
acoustical virtual reality (AVR), and bio-inspired solutions [3].
[4] It is described how a CNN-based correlation algorithm can help visually impaired persons. Given
the wealth of information that can be derived from pictures captured, adding a visual processing unit
in the framework of systems that aid persons with visual impairments is urgently important,
regardless of the version presented. This research describes a correlation technique that uses cellular
neural networks (CNNs) to improve the characteristics of helping systems and provide more
information from the surroundings to visually impaired people. Parallel processing can handle the
majority of the operations (calculations) in the suggested approach. As a result, the computing time
may be reduced, and the computing time does not rise proportionately with the size of the template
pictures.
[5] Many computer vision technologies have been developed to assist blind or visually impaired
people. Wayfinding, navigation, and finding daily necessities have all been made easier with the
help of camera-based systems. The observer's movement causes all scene objects, whether
stationary or not, to appear to move; as a result, detecting moving objects from a moving observer
is critical.
Chapter 3
PROPOSED WORK
3.1 Methodology
Identifying several objects in a picture is called object detection, and it includes both object
localization and object categorization. A first, basic method would be to slide a window of variable
dimensions over the image and use a network trained on cropped photos to predict the content class
at each position. This procedure has a significant computational cost, but it can be automated
efficiently using convolutions.
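The sliding-window baseline can be sketched as follows; the window size, stride, and the stand-in `classify` function are illustrative assumptions (a real system would call a trained network on each crop), shown only to make the cost of the exhaustive scan concrete.

```python
import numpy as np

def sliding_windows(image, win, stride):
    """Yield (x, y, crop) for every window position over the image."""
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]

def classify(crop):
    # Stand-in for a trained classifier: here, "object" just means a bright patch.
    return "object" if crop.mean() > 127 else "background"

image = np.zeros((64, 64), dtype=np.uint8)
image[16:32, 16:32] = 255  # a bright square to detect
hits = [(x, y) for x, y, crop in sliding_windows(image, win=16, stride=16)
        if classify(crop) == "object"]
print(hits)  # [(16, 16)]
```

Every window position triggers one classifier call, which is exactly the cost that a convolutional (YOLO-style) formulation amortizes into a single forward pass.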
The main principle behind YOLO is to place a grid on an image (typically 19×19) in which just one
cell, the one holding the center/midpoint of an object, is responsible for identifying that object.
The recorded image is broken into small grid cells in this method, and the midpoint of each object
is assigned to one of them. The midpoint of the bounding box is (bx, by), and its width and height
are bw and bh. A confidence score is determined from this, and if the probability at the midpoint is
equal to or greater than the confidence threshold, the object or person name matching that
confidence level is predicted.
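The grid assignment can be sketched in a few lines. The 19×19 grid follows the text, while the 608×608 input size and the exact target encoding below are assumptions based on common YOLO conventions, not details taken from this report's implementation.

```python
def encode_box(box, grid=19, img_w=608, img_h=608):
    """Map a box (x_min, y_min, x_max, y_max) in pixels to YOLO-style targets:
    the (row, col) of the grid cell holding the midpoint, plus
    (bx, by) = the midpoint's offset inside that cell, and
    (bw, bh) = the box size relative to the whole image."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0          # midpoint in pixels
    cy = (y_min + y_max) / 2.0
    cell_w, cell_h = img_w / grid, img_h / grid
    col, row = int(cx // cell_w), int(cy // cell_h)
    bx = cx / cell_w - col              # offset inside the cell, in [0, 1)
    by = cy / cell_h - row
    bw = (x_max - x_min) / img_w        # size relative to the image
    bh = (y_max - y_min) / img_h
    return row, col, bx, by, bw, bh

print(encode_box((0, 0, 64, 64)))
```

Only the cell at (row, col) is trained to predict this object, which is what lets YOLO detect everything in one pass over the grid.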
The Intersection over Union (IoU) method is used to assess object localization; it quantifies the
overlap between two bounding boxes. Many candidate outputs may be generated when estimating
the bounding box of a given object in a particular grid cell; Non-Max Suppression (NMS) ensures
each object is identified just once. It chooses the box with the highest probability and discards the
other boxes that overlap it heavily (high IoU).
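A minimal IoU and Non-Max Suppression sketch, assuming boxes are given as (x_min, y_min, x_max, y_max) corner coordinates and the 0.5 IoU threshold is an illustrative default:

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def non_max_suppression(detections, iou_threshold=0.5):
    """detections: list of (box, score). Keep the highest-scoring box,
    drop every remaining box that overlaps it too much, repeat."""
    kept = []
    pending = sorted(detections, key=lambda d: d[1], reverse=True)
    while pending:
        best = pending.pop(0)
        kept.append(best)
        pending = [d for d in pending if iou(best[0], d[0]) < iou_threshold]
    return kept

dets = [((10, 10, 50, 50), 0.9),       # two overlapping guesses for one object
        ((12, 12, 52, 52), 0.8),
        ((100, 100, 140, 140), 0.7)]   # a separate object
print(non_max_suppression(dets))
```

Here the two overlapping boxes collapse to the single highest-scoring one, while the distant box survives untouched.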
The camera captures live video, and frames are drawn from the footage. The objects, as well as any
person's face, are identified, and each detection's name and confidence score are determined. The
audio output is created after text-to-speech conversion.
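The detection-to-speech step might look like the following sketch. `format_announcement`, its wording, and the name "Rahul" are hypothetical; a real system would hand the returned string to a TTS engine (e.g. pyttsx3's `say()`) rather than print it.

```python
def format_announcement(label, confidence, is_person=False):
    """Turn one detection into the sentence handed to text-to-speech.
    Hypothetical helper: phrasing and thresholds are illustrative only."""
    pct = int(round(confidence * 100))
    if is_person:
        return f"{label} is in front of you, {pct} percent sure"
    return f"A {label} detected ahead, {pct} percent sure"

# One frame's detections, as (label, confidence, is_person) triples.
frame_detections = [("chair", 0.91, False), ("Rahul", 0.87, True)]
for label, conf, person in frame_detections:
    print(format_announcement(label, conf, person))
```

Keeping this formatting step separate from detection makes it easy to change the spoken phrasing, or the output language, without touching the vision pipeline.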
The Python libraries used include Scikit-learn for machine learning, OpenCV for computer vision,
TensorFlow for neural networks, and more. Real-time computer vision tasks are performed with
OpenCV, while YOLO provides a framework for object detection in near-real time. Keras is a deep
neural network library compatible with TensorFlow and other frameworks. It is user-friendly and
makes neural-network-based machine learning models extremely straightforward to train; it is a
useful toolbox for a number of applications since it contains a variety of neural network add-ons,
such as layers, optimizers, and activation functions.
The hardware used comprises a camera for live video capture and headphones for audio output.
Chapter 4
RESULT ANALYSIS
Figure 2 shows an example output of facial recognition on the system, complete with bounding box,
name of person detected, and confidence score.
Figures 3, 4, 5, and 6 show the sample output, which shows identified objects with bounding boxes, labels,
and confidence ratings. These photos were captured for the purpose of object detection.
The live input, expected output, live output, and status of the face recognition tests are detailed in
Table 1.
Table 2 shows the live input, expected output, live output, YOLO confidence score, and face
recognition testing status.
Chapter 5
CONCLUSION AND FUTURE ENHANCEMENT
5.1 Conclusion
Object categorization and localization within a scene are two of the most challenging aspects of
object detection. The application of deep neural networks has aided the identification of objects;
however, implementing such strategies requires a significant amount of computing and memory
resources. Even so, deep neural network designs for object detection, such as YOLO, produce
positive results, demonstrating that they may be utilized for real-time object identification and face
recognition, which can benefit the visually impaired.
5.2 Future Enhancement
For object detection at night, the camera's night-vision mode should be accessible as an integrated
feature. For visual monitoring, the scale of the design remains constant: when the size of the
monitored object decreases over time, the background takes precedence over the tracked object,
and in that case the item may not be traceable. Splitting and merging with a single camera is not
possible in all cases, resulting in a loss of content from the projection of a 3D object into 2D images.
REFERENCES
[1] M.P. Arakeri, N.S. Keerthana, M. Madhura, A. Sankar, T. Munnavar, “Assistive Technology for
the Visually Impaired Using Computer Vision”, International Conference on Advances in Computing,
Communications and Informatics (ICACCI), Bangalore, India, pp. 1725-1730, Sept. 2018.
[2] R. Ani, E. Maria, J.J. Joyce, V. Sakkaravarthy, M.A. Raja, “Smart Specs: Voice Assisted Text
Reading system for Visually Impaired Persons Using TTS Method”, IEEE International Conference
on Innovations in Green Energy and Healthcare Technologies (IGEHT), Coimbatore, India, Mar.
2017.
[4] L. Ţepelea, A. Gacsádi, I. Gavriluţ, V. Tiponuţ, “A CNN Based Correlation Algorithm to Assist
Visually Impaired Persons”, IEEE Proceedings of the International Symposium on Signals Circuits
and Systems (ISSCS 2011), pp. 169-172, Iasi, Romania, 2011.
[5] P. Szolgay, L. Ţepelea, V. Tiponuţ, A. Gacsádi, “Multicore Portable System for Assisting Visually
Impaired People”, 14th International Workshop on Cellular Nanoscale Networks and their
Applications, pp. 1-2, University of Notre Dame, USA, July 29-31, 2014.
[6] E.A. Hassan, T.B. Tang, “Smart Glasses for the Visually Impaired People”, 15th International
Conference on Computers Helping People with Special Needs (ICCHP), pp. 579-582, Linz, Austria,
2016.
[7] M. Trent, A. Abdelgawad, K. Yelamarthi, “A Smart Wearable Navigation System for Visually
Impaired”, 2nd EAI international Conference on Smart Objects and Technologies for Social Good
(GOODTECHS), pp. 333-341, Venice, Italy, 2016.
[8] Jae Sung Cha, Dong Kyun Lim and Yong-Nyuo Shin, “Design and Implementation of a Voice
Based Navigation for Visually Impaired Persons”, International Journal of Bio-Science and
Bio-Technology, Vol. 5, No. 3, pp. 61-68, June 2013.
[9] S. Khade, Y.H. Dandawate, “Hardware Implementation of Obstacle Detection for Assisting
Visually Impaired People in an Unfamiliar Environment by Using Raspberry Pi”, Smart Trends in
Information Technology and Computer Communications, SMARTCOM 2016, vol. 628,
pp. 889-895, Jaipur, India, 2016.
[10] R. C. Gonzalez, R. E. Woods and S. L. Eddins, “Digital Image Processing using MATLAB”, Pearson
Education, 2004.
We presented our paper at the 4th International Conference on Advances in Computing and
Information Technology (IACIT-2022) on 17th and 18th May 2022, and have applied for
publication in a Scopus-indexed journal.