A Report On Existing AI Work For Visually Impaired People: Ayesha Tariq
Executive Summary
Blind people face several problems in daily life, especially when walking. For their protection, an object detection system is developed in this project based on a Raspberry Pi and the TensorFlow framework. An image captured by a web camera is taken as input and processed to detect objects in the blind person's surroundings. The proposed algorithm is tested on objects such as chairs, tables, TVs and other outdoor obstacles. The system not only detects on which side of the person a particular object is present but also detects people as objects. After detection, the recognized results are used to notify the blind person through voice in real time. The SSD (Single Shot Detector) MobileNet model is used to evaluate the object recognition system.
Table of Contents
1 Introduction
2 Motivation
3 Project Overview
4 Project Plan
5 Background
6 Related Work
7.1 Flowchart
7.2 Models
12 Statement of Contribution
13 References
14 Terminology & Definitions
1 Introduction
In a report of the World Health Organization, 285 million people are visually impaired, among whom 39 million are completely blind. According to researchers, with the increase in population, the number of people with partial or full blindness will grow from 39 million to 115 million by 2050 [1]. Low vision and blindness reduce productivity and mobility in completing day-to-day tasks, so blind people have to rely on smart sticks, experience and other people to assist them in walking without hitting obstacles. Moreover, sudden changes in their surroundings can cause accidents, because it is not possible for them to react properly in such instant situations. It is not easy for them to understand visual aspects like the depth, color, position and orientation of an object. Over the last few decades, technology has made several advancements to help partially or fully visually impaired people; despite these advancements, blind people still face many problems in their personal and professional lives.
In this project, we try to resolve this issue by using deep learning techniques to detect and identify objects. A large dataset consisting of objects such as chairs, doors, tables and bicycles from daily scenes is generated with the help of a web camera to apply the object detection and recognition technique. The Google Object Detection API (Application Program Interface), based on TF (TensorFlow), is used to identify different obstacles. The API provides several object detection models; we use Single Shot Detector (SSD) MobileNet, which is trained on the COCO dataset. Using this model, we not only identify the type of object but also inform the user, through a voice message, of the direction in which that particular object is present.
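As a rough illustration of the post-processing step described above, the sketch below filters raw detector output (in the normalized box/class/score format that the TensorFlow Object Detection API emits) and builds the text later handed to a voice engine. The label subset and the 0.5 threshold are illustrative assumptions, not the project's exact values.

```python
# Illustrative subset of COCO class ids -> names (assumed, not the full map).
LABELS = {1: "person", 62: "chair", 63: "couch", 72: "tv"}

def filter_detections(boxes, classes, scores, threshold=0.5):
    """Keep detections above the confidence threshold with a known label."""
    kept = []
    for box, cls, score in zip(boxes, classes, scores):
        if score >= threshold and cls in LABELS:
            kept.append((LABELS[cls], box))
    return kept

def describe(detections):
    """Build the text that would later be passed to a text-to-speech engine."""
    return ", ".join(f"{name} ahead" for name, _ in detections)

# Fake detector output mimicking the API's [ymin, xmin, ymax, xmax] boxes.
boxes = [[0.1, 0.2, 0.8, 0.5], [0.3, 0.6, 0.9, 0.95]]
classes = [1, 62]
scores = [0.91, 0.42]
print(describe(filter_detections(boxes, classes, scores)))
# only the person passes the 0.5 threshold -> "person ahead"
```

In the real pipeline these arrays would come from the SSD MobileNet model rather than being hard-coded.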
2 Motivation
The motivation behind this project is to help blind people by making their navigation easier with the help of an object detection system. With technology growing, it is our responsibility to use it for the betterment of people, especially those who face disabilities and depend on others to fulfil their daily tasks. By properly utilizing advanced technology, we can help disabled people become independent members of society. The aim of this project, in particular, is to help blind people and those suffering from low vision. Such persons encounter many obstacles while walking indoors or outdoors. A walking stick or cane is a simple mechanical device used to detect static objects in the surroundings of a visually impaired person. Although this device is light and portable, it is range-limited and is not an accurate source of protection. A lot of research work has been done to ensure the safety of blind people, and researchers are still trying to make further advancements. The system proposed in this project is likewise designed to make navigation in his/her surroundings easier for a blind person.
3 Project Overview
Blindness is a physical disability that makes a person dependent on static devices like a cane or a stick, and on other people, as they cannot perform their daily tasks and fulfil their needs on their own. Researchers have worked on supporting such people by developing modern technologies, but there is still a long way to go in this field. Considering the hardships that a blind person faces in his/her daily routine, this project is developed to reduce the sufferings of visually impaired people by building smart glasses based on Artificial Intelligence.
For the detection of objects and humans, the TensorFlow framework is used with a Single Shot Detector (SSD) based on the MobileNet architecture. The system is implemented on a Raspberry Pi, and the model is trained on the COCO dataset. After the successful detection of an object or a person, the system gives an output in the form of text, which is then converted into an audio message to alert the user so that he/she can act accordingly. Moreover, text detection and recognition are also performed in this project to guide the user to the location of a classroom, office or room number. For example, if the user is a student who wants to go to a classroom, the smart glasses can detect and recognize the classroom number mentioned on or above the door. For text detection and recognition, OpenCV is used with the Tesseract OCR engine.
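Tesseract's raw text output is often noisy, so a small post-processing step is usually needed before the room number can be spoken to the user. The helper below is a hypothetical sketch of that step: it assumes OCR text has already been produced (e.g. by Tesseract) and pulls out the first token that looks like a room label; the pattern is an illustrative assumption.

```python
import re

def extract_room_number(ocr_text):
    """Return the first token that looks like a room number, e.g. 'B-214'."""
    # Matches an optional letter prefix, an optional dash, and 2-4 digits.
    match = re.search(r"\b[A-Z]?-?\d{2,4}[A-Z]?\b", ocr_text)
    return match.group(0) if match else None

print(extract_room_number("Room B-214 Computer Lab"))  # -> B-214
```

A real deployment would tune the pattern to the numbering scheme of the building in question.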
4 Project Plan
It is important to consider the risks that might arise during project design and implementation, so risk management is a necessity for completing a project. The risks we may face during the project, and their solutions, are summarized below.

Risk: Schedule and time conflicts
Solution: Distribute the tasks among group members and hold weekly meetings to track the performance of each member

Risk: Difficulty in finding related tutorials
Solution: Seek the advisor's and seniors' help related to the project
5 Background
AI is not a new technology or a new world for researchers. It began in the 1940s and saw an explosion of interest around 2010. The reasons behind the immensely increasing demand for AI were developments in machine learning, the availability of big data sources and the increasing power of computer processing [3]. AI has been used in:
Robotics
Tools for monitoring social media for false and dangerous content
5.1.1 Categories of AI
Artificial Intelligence usually falls under three categories.
Artificial Narrow Intelligence (ANI)
Artificial General Intelligence (AGI)
Artificial Super Intelligence (ASI)
Narrow AI:
Because of its narrow range of abilities, ANI is also termed "weak AI". It is basically a goal-oriented type of AI designed to perform a specific single task, such as speech recognition, facial recognition or internet surfing. These machines work under narrow sets of limitations and constraints. Instead of replicating human intelligence, ANI only simulates human behavior within a narrow set of contexts and parameters [4].
General AI:
AGI, also known as "strong AI" or "deep AI", is the branch of AI which can intelligently solve any problem by mimicking human behavior and intelligence. AGI is able to reason, make judgments, learn, plan and solve problems by integrating prior knowledge in a creative and inventive way.
Super AI:
ASI is the branch of AI in which a machine does not merely understand or mimic human intelligence; at this level the machine becomes self-aware and exceeds human ability and intelligence.
5.1.2 AI Goals:
1. Reasoning, Problem Solving:
Most artificial intelligence algorithms require giant computational software systems to solve complex problems. The memory size and time required for the computation depend on how big the problem is. It is the first priority of AI to find highly efficient and optimized problem-solving algorithms using logical techniques [5].
2. Knowledge Representation:
Knowledge engineering and representation are the central parts of AI, because most machines require extensive knowledge about human behavior, human problem-solving abilities and the environment in which a particular problem occurs. Moreover, a machine must also learn about different objects, the relations between them, and their categories, events, properties and states in order to solve a problem.
3. Machine Learning:
Machine Learning (ML) is a sub-branch of Artificial Intelligence and consists of computer algorithms that have the ability to improve automatically through experience [6].
4. Planning:
An AI-based computer system assesses its environment to make a plan consisting of predictions and parameters that maximize its ability to solve a problem. This plan can also adapt according to assessments made using algorithms.
7. Robotics:
This branch of engineering deals with the use of robots: their design, their construction and the operations they can perform. It also deals with sensory feedback, control mechanisms and information processing [9].
8. Intelligence:
A machine uses both social and general intelligence to learn and understand any intelligent task that a human can perform [10].
5.2 Machine learning
Machine Learning is an exciting sub-area of Artificial Intelligence that has spread across the modern world. It surfaces data in new ways, such as the stories suggested by Facebook. Machine Learning allows computers to learn and adapt to a situation automatically by making predictions, recognitions and detections.
First, the training data is fed as input into the selected algorithm. Training data, whether labeled or unlabeled, is used to develop the final ML model; the data type affects which algorithm is appropriate, as covered in later stages. To check whether the algorithm is working correctly, new data is fed to it and the resulting predictions are checked. If the difference between the predictions and the expected results is large, the algorithm is run iteratively until the desired output is achieved. Through this iterative approach, the algorithm learns continuously to reach the most optimal solution to the problem.
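The iterative loop described above can be sketched in a few lines: fit y = w*x by repeatedly measuring the prediction error and nudging the weight in the direction that reduces it. The toy data and learning rate are illustrative assumptions.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # true relationship: y = 2x

w = 0.0   # initial guess for the weight
lr = 0.02 # learning rate

for _ in range(500):
    # gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # nudge w against the gradient

print(round(w, 3))  # converges close to 2.0
```

Each pass shrinks the gap between prediction and target, which is exactly the "run iteratively until the desired output is achieved" behavior the text describes.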
5.2.3 Types of ML:
Machine learning is divided into three categories.
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
1. Supervised Learning:
In supervised learning, labeled (known) data is used as training data. Because the data is known, the learning is called supervised, meaning it is directed toward successful implementation and execution. The input data to the ML algorithm is used to train a model; after a model has been trained on known data, unknown data is used to produce new responses. Figure 1 gives an example to illustrate the concept of Supervised Learning. Common supervised algorithms include:
Random Forest
Linear Regression
Polynomial Regression
Logistic Regression
K-nearest neighbors
Decision Trees
Naïve Bayes
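One algorithm from the list above, k-nearest neighbors, can be sketched directly: labeled training points supervise the prediction for a new point by majority vote among its k closest neighbors. The 2-D toy data and labels are illustrative.

```python
def knn_predict(train, query, k=3):
    """train: list of ((x, y), label); returns majority label of k nearest."""
    dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Labeled training data: two clusters with known labels.
train = [((0, 0), "chair"), ((0, 1), "chair"), ((1, 0), "chair"),
         ((5, 5), "table"), ((5, 6), "table"), ((6, 5), "table")]
print(knn_predict(train, (0.5, 0.5)))  # -> chair
```

The "supervision" is exactly the labels attached to the training points; without them no prediction could be made.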
2. Unsupervised Learning:
In Unsupervised Learning, the training data is not only unknown but also unlabeled. Without knowing the data, it is impossible to guide the algorithm toward a specific output. Unsupervised Learning addresses this: the trained model gives the required response after searching for a particular pattern or an interesting structure in the data. Common unsupervised algorithms include:
Fuzzy C-means
K-means Clustering
Apriori
Singular Value Decomposition
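K-means clustering from the list above shows the unsupervised idea concretely: with no labels at all, the algorithm alternates between assigning points to the nearest centroid and moving each centroid to its cluster's mean. The 1-D data and initial centroids are illustrative.

```python
def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # update step: move each centroid to its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

print(kmeans([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0]))
# the two centroids settle near 1.0 and 9.0
```

The grouping emerges purely from the structure of the data, which is what distinguishes this from the supervised case.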
3. Reinforcement Learning:
Here, the algorithm uses a trial-and-error process to explore the data and then decides which actions result in better rewards. Reinforcement Learning consists of three major components: the agent, actions and the environment. The agent is the decision maker or learner, actions are the ways the agent can act, and the environment is everything around the agent. Reinforcement learning is based on the agent's actions, which aim to maximize the reward obtained in a given time.
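The agent/action/environment loop above can be sketched with tabular Q-learning on a tiny corridor of 5 cells whose reward sits at the right end. All parameters (learning rate, discount, exploration rate, episode count) are illustrative assumptions.

```python
import random

random.seed(0)
n_states, actions = 5, (-1, +1)  # the agent may move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(200):  # episodes of trial and error
    s = 0
    while s != n_states - 1:
        # the agent picks an action: mostly greedy, sometimes exploring
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)       # environment transition
        r = 1.0 if s2 == n_states - 1 else 0.0      # environment's reward
        # update the action-value estimate toward the observed return
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2

# after learning, moving right is valued higher than moving left in every state
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(n_states - 1)))
```

Nothing tells the agent the answer directly; the preference for moving right emerges only from rewards, matching the description above.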
Advantages: wide applications.
Disadvantages: data acquisition.

Bi-Directional RNN: the output depends not only on the past but also on the future.
Deep RNN: multiple layers are present per step to achieve more accuracy and a higher learning rate.
Disadvantages: susceptible to noise.
3. Autoencoder:
Autoencoders work in an unsupervised setting using the principle of backpropagation. Although they bear some resemblance to Principal Component Analysis (PCA), they are more flexible than PCA. The basic purpose of autoencoders is the identification and determination of normal or regular data in order to detect anomalies. Despite using multiple layers, the output signal remains close to the input. There are four different types of autoencoders.
Advantages: low complexity.
Disadvantages: some autoencoders create a deterministic bias in the model.
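The anomaly-detection idea above can be sketched with a tiny linear autoencoder (2 inputs, a 1-number code, 2 outputs) trained by gradient descent on "normal" points lying on the line y = 2x. The data, sizes and learning rate are illustrative assumptions; a point that does not fit the learned pattern reconstructs badly and can be flagged as an anomaly.

```python
normal = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0), (-1.0, -2.0)]
w1 = w2 = v1 = v2 = 0.5  # encoder (w) and decoder (v) weights
lr = 0.01

def recon_error(x, y):
    c = w1 * x + w2 * y       # encode the point down to a 1-D code
    rx, ry = v1 * c, v2 * c   # decode the code back to 2-D
    return (rx - x) ** 2 + (ry - y) ** 2

for _ in range(2000):
    for x, y in normal:
        c = w1 * x + w2 * y
        rx, ry = v1 * c, v2 * c
        # gradients of the squared reconstruction error
        dc = 2 * (rx - x) * v1 + 2 * (ry - y) * v2
        v1 -= lr * 2 * (rx - x) * c
        v2 -= lr * 2 * (ry - y) * c
        w1 -= lr * dc * x
        w2 -= lr * dc * y

# a point far from the training pattern reconstructs badly -> flagged anomalous
print(recon_error(1.0, 2.0) < recon_error(2.0, -1.0))  # -> True
```

Like PCA, this linear version can only learn a single direction; real autoencoders add nonlinear layers, which is the extra flexibility the text refers to.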
1. Object Detection
In object detection, different instances of objects such as humans, weapons and airplanes are detected in images and videos. A large body of research work on object detection is based on different networks such as R-CNN, Faster R-CNN, YOLO, SSD and HOG. Object detection models are usually divided into three stages.
2. Face Recognition
Face recognition is one of the important applications of computer vision. A feature extractor is used to extract features from a face, and predictions are made by a classifier over the extracted features. CNNs brought a revolutionary change to the field of face recognition, through which security systems have become far more capable.
5. You-Only-Look-Once (YOLO)
YOLO is a widely used method for the recognition and detection of objects. The YOLO model works in real time and processes images at 45 fps, while another version, Fast YOLO, processes images at 155 fps; both still achieve a higher mAP than other real-time detectors. In some domains, this algorithm works better than DPM and R-CNN. The YOLO network understands the context of an image and makes fewer background errors. However, it has certain limitations: it struggles with unusual aspect ratios and with small objects in an image.
Figure 11. YOLO Architecture [20]
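Detectors like YOLO emit many overlapping boxes per object, so a standard post-processing step, non-maximum suppression (NMS), keeps only the highest-scoring box among heavy overlaps. The sketch below uses (x1, y1, x2, y2) boxes; the 0.5 overlap threshold is an illustrative assumption.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        # keep box i only if it does not heavily overlap an already-kept box
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]: the near-duplicate box 1 is suppressed
```

The same step applies to SSD output; only the raw box format differs between detectors.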
Image Recognition
Video Detection
6 Related Work
The development of computing machines and the availability of large-scale datasets have made machine learning an important field of research. Machine learning not only helps in the medical field and in the development of autonomous cars, but has also enabled wonderful advances in computer vision by detecting objects in images taken by any web camera. In [27], the authors implemented machine learning algorithms for object detection to assist people with visual disabilities. The experiment is based on a Convolutional Neural Network (CNN), trained on the ImageNet dataset, that not only detects objects but also narrates information about the detected object to the blind person. Moreover, it is stated that the implementation can be performed on any device with a camera, such as tablets, computers and mobile phones.
7.1 Flowchart
The flow chart shows how the system is processed and how it works step by step. We have implemented two models, which are explained through flow charts in a later section.
7.2 Models
7.2.1 Model 1:
The first model we implemented, based on the TensorFlow SSD MobileNet, detects a person along with the side of the user on which the person is present. The model deals with three conditions. It is trained on the COCO dataset and runs on a Raspberry Pi.
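The "three conditions" mentioned above can be sketched as deciding whether a detected person is to the user's left, center or right from the horizontal midpoint of the detection box. The box uses the TensorFlow Object Detection API's normalized [ymin, xmin, ymax, xmax] convention; the one-third split of the frame is an illustrative assumption.

```python
def side_of_user(box):
    """Classify a normalized detection box as left / center / right."""
    _, xmin, _, xmax = box
    center = (xmin + xmax) / 2  # horizontal midpoint in [0, 1]
    if center < 1 / 3:
        return "left"
    if center > 2 / 3:
        return "right"
    return "center"

print(side_of_user([0.2, 0.05, 0.9, 0.25]))  # -> left
print(side_of_user([0.2, 0.70, 0.9, 0.95]))  # -> right
```

The returned word would then be embedded in the voice message played to the user.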
7.2.2 Model 2:
The second model is based on text detection and recognition using OpenCV with Tesseract OCR. The flow chart of the model is shown in the figure below.
Figure. Flowchart of Model 1
Figure. Flowchart of Model 2
12 Statement of Contribution
Each task lists the contributing Student IDs and their contribution percentages.

T1: Planning
T2: Requirements
T4: Background
T5: Design
T6: Implementation
201601484 — 33.3%

Overall participation:
201501602
201601484
13 References
[1] Bourne, R., Flaxman, S., Braithwaite, T., Cicinelli, M., Das, A., Jonas, J., Taylor, H. (2017). Magnitude, temporal trends, and projections of the global prevalence of blindness and distance and near vision impairment: A systematic review and meta-analysis. Lancet Glob. Health, 5, e888–e897.
[3] Executive Office of the President, National Science and Technology Council, Committee on Technology
(October 12, 2016), Preparing for the Future of Artificial Intelligence, p. 6,
https://obamawhitehouse.archives.gov/sites/default/
[4] Heath, N. (2019, July 1). What is AI? Everything you need to know about Artificial Intelligence. Retrieved
December 3, 2019, from https://www.zdnet.com/article/what-is-ai-everything-you-need-to-know- about-
artificial-intelligence
[5] Reasoning system. (2019, November 30). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Reasoning_system.
[6] Machine learning. (2019, November 29). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Machine_learning.
[7] Natural language processing. (2019, November 30). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Natural_language_processing.
[8] Brownlee, J. (2019, July 5). A Gentle Introduction to Computer Vision. Retrieved December 3, 2019, from
https://machinelearningmastery.com/what-is-computer-vision/.
[10] Artificial general intelligence. (2019, December 2). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Artificial_general_intelligence.
[11] Priyadharshini (2020). What is Machine Learning and How Does It Work?
https://www.simplilearn.com/tutorials/machine-learning-tutorial/what-is-machine-learning
[12] Team, D. F. (2019, March 1). Advantages and Disadvantages of Machine Learning Language. Retrieved
December 3, 2019, from https://data-flair.training/blogs/advantages-and-disadvantages-of-machine-
learning/.
[13] Sanskruti Patel, A. P. (2018). Deep Learning Architectures and its Applications: a Survey. International Journal of Computer Sciences and Engineering, vol. 6, no. 6, pp. 1177-1183.
[14] Witold Pedrycz, Shyi-Ming Chen (2020). Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, DOI: 10.1007/978-3-030-31756-0
[15] https://hub.packtpub.com/top-5-deep-learning-architectures/
[16] History of Face Recognition & Facial recognition software. (2019, May 9). Retrieved December 3, 2019,
from https://www.facefirst.com/blog/brief-history-of-face-recognition-software/ .
[17] Symanovich, S. (n.d.). How does facial recognition work? Retrieved December 3, 2019, from
https://us.norton.com/internetsecurity-iot-how-facial-recognition-software-works.html.
[18] https://www.gannettcdn.com/media/2018/05/22/USATODAY/USATODAY/636626047644270447-
052218-facial-recognition-Online.png
[19] Ivankov, A. (2019, October 22). Facial Recognition: Advantages and Disadvantages. Retrieved December
3, 2019, from https://www.profolus.com/topics/facial-recognition-advantages-and- disadvantages/.
[20] https://icatcare.org/app/uploads/2018/07/Thinking-of-getting-a-cat.png
[21] https://medium.com/@techmayank2000/object-detection-using-ssd-mobilenetv2-using-tensorflow-api-can-
detect-any-single-class-from-31a31bbd0691
[22] What is TensorFlow? Introduction, Architecture & Example. (n.d.). Retrieved December 3, 2019, from
https://www.guru99.com/what-is-tensorflow.html.
[23] Sumitra A. Jakhete, Avanti Dorle, Piyush Pimplikar, Pranit Bagmar, Atharva Rajurkar (2019). Object Recognition App for Visually Impaired. IEEE Pune Section International Conference (PuneCon), DOI: 10.1109/PuneCon46936.2019.9105670
[24] Tatsuro Ueda, Hirohiko Kawata, Tetsuo Tomizawa, Akihisa Ohya and Shin'ichi Yuta. Visual Information Assist System Using 3D SOKUIKI Sensor for Blind People, IECON, DOI: 10.1109/IECON.2006.347767
[25] Akhilesh Krishnan, Deepakraj G, Nishanth N, Dr.K.M.Anandkumar, 2016. Autonomous Walking Stick for
The Blind Using Echolocation and Image Processing, IEEE, DOI: 10.1109/IC3I.2016.7917927
[26] Aniqua Nusrat, Sonia Corraya (2016). Detecting real time object along with the moving direction for visually impaired people. 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), DOI: 10.1109/ICECTE.2016.7879628
[27] Jawaid Nasreen, Warsi Arif, Asad Ali Shaikh, Yahya Muhammad, Monaisha Abdullah, (2019). Object
Detection and Narrator for Visually Impaired People. IEEE 6th International Conference on Engineering
Technologies and Applied Sciences (ICETAS), DOI: 10.1109/ICETAS48360.2019.9117405
[28] Rais Bastomi, Firza Putra Ariatama, Lucke Yuansyah Arif Tryas Putri, Septian Wahyu Saputra, Mohammad Rizki Maulana, Mat Syai'in, Ii Munadhif, Agus Khumaidi, Mohammad Basuki Rahmat, Annas Singgih Setiyoko, Budi Herijono, E.A. Zuliari, Mardlijah (2019). Object Detection and Distance Estimation Tool for Blind People Using Convolutional Methods with Stereovision. International Symposium on Electronics and Smart Devices, DOI: 10.1109/ISESD.2019.8909515
[29] M. Nassih, I. Cherradi, Y. Maghous, B. Ouriaghli, Y. Salih-Alj (2012). Obstacles Recognition System for the Blind People Using RFID. Sixth International Conference on Next Generation Mobile Applications, Services and Technologies, DOI: 10.1109/NGMAST.2012.28
[30] Sunitha M. R, Fathima Khan, Gowtham Ghatge R, Hemaya S (2019). Object Detection and Human Identification using Raspberry Pi. 1st International Conference on Advances in Information Technology (ICAIT), DOI: 10.1109/ICAIT47043.2019.8987398
[31] Samkit Shah, Jayraj Bandariya, Garima Jain, Mayur Ghevariya, Sarosh Dastoor, (2019). CNN based Auto-
Assistance System as a Boon for Directing Visually Impaired Person. Proceedings of the Third
International Conference on Trends in Electronics and Informatics, DOI: 10.1109/ICOEI.2019.8862699
[32] Zhigong Zhou, Xiaosong Lan, Shuxiao Li, Chengfei Zhu, Hongxing Chang, (2019). Feature Pyramid SSD:
Outdoor Object Detection Algorithm for Blind People. IEEE 5th International Conference on Computer
and Communications.
[33] Md. Mazidul Alam, Sayed Erfan Arefin, Miraj Al Alim, Samiul Islam Adib, Md. Abdur Rahman (2017). Indoor Localization System for Assisting Visually Impaired People. International Conference on Electrical, Computer and Communication Engineering (ECCE), DOI: 10.1109/ECACE.2017.7912927
[34] Milios Awad, Jad El Haddad, Edgar Khneisser, (2018). Intelligent Eye: A Mobile Application for Assisting
Blind People. IEEE Middle East and North Africa Communications Conference (MENACOMM), DOI:
10.1109/MENACOMM.2018.8371005
[35] Kresimir Romic, Irena Galic, Tomislav Galba, (2015). Technology Assisting the Blind - Video Processing
Based Staircase Detection. 57th International Symposium ELMAR, DOI: 10.1109/ELMAR.2015.7334533
14 Terminology & Definitions
AI: Artificial intelligence
TF: TensorFlow
API: Application Program Interface
SSD: Single Shot Detector
ANI: Artificial Narrow Intelligence
AGI: Artificial General Intelligence
ASI: Artificial Super Intelligence
NLP: Natural Language Processing
RNN: Recurrent Neural Network
CNN: Convolutional Neural Network
HOG: Histogram of Oriented Gradients
R-CNN: Region-based Convolutional Neural Network
SPP: Spatial Pyramid Pooling
YOLO: You-Only-Look-Once
Biography
Appendix
Attention Model: