A Report On Existing AI Work For Visually Impaired People: Ayesha Tariq

Executive Summary
Blind people face several problems in daily life, especially when they are walking. To
support and protect them, this project develops an object detection system based on the
Raspberry Pi and the TensorFlow framework. An image captured by a web camera is taken as
input and processed to detect objects in the blind person's surroundings. The proposed
algorithm is tested on objects such as a chair, a table, a TV and other outdoor obstacles.
The system not only determines on which side of the person a particular object is present
but also detects people as objects. After detection, the recognized results are used to
notify the blind person by voice in real time. The SSD (Single Shot Detector) MobileNet
model is used to evaluate the object recognition system.
Table of Contents
1 INTRODUCTION
2 MOTIVATION
3 BACKGROUND
3.1 Artificial Intelligence
3.1.1 Categories of AI
3.1.2 AI Goals
3.2 Machine learning
3.2.1 What is Machine Learning
3.2.2 How does ML work
3.2.3 Types of ML
3.2.4 Advantages and Disadvantages
3.3 Deep Learning
3.3.1 What is Deep Learning
3.3.2 How Deep Learning works
3.3.3 Deep Learning Architectures
3.3.4 Applications of Deep Learning in Computer Vision
3.3.5 Object Detection Techniques
3.4 TensorFlow Object Detection API
3.4.1 What is API
3.4.2 What is TensorFlow
3.4.3 Advantages of TensorFlow
4 RELATED WORK
4.1 Object Detection App. for Blind People
4.2 3D SOKUIKI System for Blind People
4.3 Autonomous Walking Stick using Image Processing and Echolocation
4.4 Real Time Object Detection with Moving Direction
4.5 Convolutional Neural Network (CNN) with ImageNet Dataset
4.6 Object Detection with Distance Estimation using CNN
4.7 Object Detection using Radio Frequency Identification
4.8 Object Detection using Raspberry Pi
4.9 Haar Cascade and CNN for Object Detection
4.10 Feature Pyramid SSD using BLIND Dataset
4.11 Indoor Localization System
4.12 GUI Based Mobile Application
4.13 Mobile Application to Assist Blind People
4.14 Hand Gestures and Facial Recognition using OpenCV
5 REFERENCES
6 TERMINOLOGY & DEFINITIONS

List of Figures
1 FIG.1 – WORK BREAKDOWN STRUCTURE (WBS)
2 FIG.2 – SUPERVISED LEARNING
3 FIG.3 – UNSUPERVISED LEARNING
4 FIG.4 – RNN NETWORK
5 FIG.5 – CNN ARCHITECTURE
6 FIG.6 – BASIC ILLUSTRATION OF AUTOENCODER
7 FIG.7 – HOG ARCHITECTURE
8 FIG.8 – R-CNN ARCHITECTURE
9 FIG.9 – SPP-NET ARCHITECTURE
10 FIG.10 – FAST R-CNN ARCHITECTURE
11 FIG.11 – YOLO ARCHITECTURE
12 FIG.12 – SSD MOBILENET ARCHITECTURE

1 Introduction
According to a World Health Organization report, 285 million people are visually impaired,
of whom 39 million are completely blind. According to researchers, as the population
grows, the number of people with partial or full blindness will increase from 39 million
to 115 million in 2020 [1]. Low vision and blindness reduce their productivity and
mobility in day-to-day tasks, so they have to rely on smart sticks, experience and other
people to help them walk without hitting obstacles. Moreover, sudden changes in their
surroundings can cause accidents because they cannot react quickly to unexpected
situations. It is not easy for them to understand visual aspects such as the depth, color,
position and orientation of an object. Over the last few decades, technology has made
several advancements to help partially or fully visually impaired people. Despite these
advancements, blind people still face many problems in their personal and professional
lives.
In this project, we try to address this issue by using deep learning techniques to detect
and identify objects. A large dataset of objects from daily scenes, such as chairs, doors,
tables and bicycles, is collected with a web camera in order to apply object detection and
recognition. The Google Object Detection API (Application Programming Interface), based on
TensorFlow (TF), is used to identify different obstacles. The API provides several object
detection models, but we use Single Shot Detector (SSD) MobileNet, which is trained on the
COCO dataset. Using this model, we not only identify the type of object but also inform
the user, through a voice message, of the direction in which that object is located.
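The direction step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `describe_detection`, the pixel box format `(x_min, y_min, x_max, y_max)` and the split of the frame into three equal bands are hypothetical choices, not details taken from the project. The returned string is the text that would be passed on to a text-to-speech step.

```python
def describe_detection(label, box, image_width):
    """Return a short message saying where a detected object lies.

    box is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    """
    x_min, _, x_max, _ = box
    center_x = (x_min + x_max) / 2.0
    # Assumption: the frame is split into three equal vertical bands.
    if center_x < image_width / 3.0:
        side = "on your left"
    elif center_x > 2.0 * image_width / 3.0:
        side = "on your right"
    else:
        side = "in front of you"
    return f"{label} {side}"


message = describe_detection("chair", (20, 100, 180, 300), image_width=640)
print(message)
```

A detector such as SSD MobileNet would supply the label and box; only the mapping from box position to a spoken direction is shown here.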
2 Motivation
The motivation behind this project is to help blind people by making navigation easier
with an object detection system. As technology grows, it is our responsibility to use it
for the betterment of people, especially those who live with disabilities and depend on
others for their daily tasks. By properly using advanced technology, we can help disabled
people become independent members of society. The aim of this project is, in particular,
to help people who are blind or have low vision. Such persons encounter many obstacles
while walking indoors or outdoors. A walking stick or cane is a simple mechanical device
used to detect static objects in the surroundings of a visually impaired person. Although
this device is light and portable, it is range-limited and is not a reliable source of
protection for that person. A lot of research has been done to ensure the safety of blind
people, and researchers are still trying to make further advancements. The system
proposed in this project is likewise designed to make navigation easier for a blind person
in his/her surroundings.
3 Project Overview
Blindness is a physical disability that makes a person dependent on static devices such as
a cane or a stick and on other people, because he/she cannot perform daily tasks and meet
his/her needs alone. Researchers have worked on supporting such people by developing
modern technologies, but there is still a long way to go in helping people deal with their
disabilities. Considering the hardships that a blind person faces in his/her daily
routine, this project aims to reduce the difficulties of visually impaired people by
building smart glasses based on Artificial Intelligence.
For the detection of objects and humans, the TensorFlow framework is used with the Single
Shot Detector (SSD) based on the MobileNet architecture. The system is implemented on a
Raspberry Pi and uses a model trained on the COCO dataset. After an object or a person is
successfully detected, the system produces an output in the form of text, which is then
converted into an audio message to alert the user so that he/she can act accordingly.
Moreover, text detection and recognition is also performed in this project to guide the
user to the location of a classroom, office or room number. For example, if the user is a
student who wants to go to a classroom, the smart glasses can detect and recognize the
classroom number written on or above the door. For text detection and recognition, OpenCV
is used together with the Tesseract OCR engine.
4 Project Plan
4.1 Work breakdown structure (WBS)

The different phases of the project are shown in Figure 1. The project involves five
stages, and each stage has sub-stages consisting of small and large tasks.
Figure 1. Work Breakdown Structure (WBS)

4.2 Schedule/ Team organization
4.2.1 Weekly schedule
Table 1. Weekly work schedule
Week No.   Tasks                                          Week No.   Tasks
Week 1     - Meet Advisor                                 Week 9
           - Decide on a topic
Week 2     - Understand Problem                           Week 10
           - Divide the tasks between group members
           - Weekly Plan
           - Build WBS
Week 3     - Write Abstract                               Week 11
           - Write Introduction
Week 4     - Write Motivation                             Week 12
           - Project Overview
Week 5     - Write Background (AI, ML)                    Week 13
Week 6     - Write Background (DL, TensorFlow             Week 14
             Object Detection)
Week 7     - Literature Review                            Week 15
Week 8     - Project                                      Week 16
4.2.2 Team organization
Table 2. Team organization

4.3 Risk management
It is important to consider the risks that might arise during project design and
implementation, so risk management is a necessity for completing a project. The risks that
we may face during the project and their solutions are summarized in Table 3.
Table 3. Risks and their solutions
Risk                                        Solution
Schedule and time conflict                  Distribute the tasks among group members; hold
                                            weekly meetings to track the performance of
                                            each member
Difficulty in finding related tutorials     Seek help from the advisor and seniors on
                                            matters related to the project
Hard to download libraries (OpenCV,         Use Anaconda
TensorFlow)
Dataset collection                          Take real-time images of different places and
                                            situations with guidance from the supervisor
5 Background
5.1 Artificial Intelligence

Artificial Intelligence (AI) is the branch of computer science in which machines are used
to solve complex problems the way a human would. In AI, algorithms implemented on a
computer draw on characteristics of human intelligence; for example, an AI system first
analyzes its surroundings and then takes actions that maximize its chances of success [2].
AI is not a new technology or a new term for researchers. It began in the 1940s and saw an
explosion of interest around 2010. The reasons behind the rapidly increasing demand for AI
were developments in machine learning, the need for big data sources and the increasing
power of computer processing [3]. AI has been used in:
• Prediction tools and disease mapping
• Spam filters in email
• Smart assistants like Alexa and Siri
• Robotics
• Tools for monitoring social media for false and dangerous content
• Optimized health care recommendations
5.1.1 Categories of AI
Artificial Intelligence usually falls under three categories.
• Artificial Narrow Intelligence (ANI)
• Artificial General Intelligence (AGI)
• Artificial Super Intelligence (ASI)
Narrow AI:
Because of its narrow range of abilities, ANI is also termed “weak AI”. It is a
goal-oriented type of AI designed to perform a single specific task, such as speech
recognition, facial recognition or internet search. These machines work under a narrow set
of limitations and constraints. Instead of replicating human intelligence, ANI only
simulates human behavior within a narrow set of contexts and parameters [4].
General AI:
AGI is also known as “strong AI” or “deep AI”. It is the branch of AI that aims to solve
any problem intelligently by mimicking human behavior and intelligence. AGI would be able
to reason, make judgments, learn, plan and solve problems by integrating prior knowledge
in a creative and inventive way.
Super AI:
ASI is the branch of AI in which the machine does not only understand or mimic human
intelligence; at this level the machine becomes self-aware and exceeds human ability and
intelligence.
5.1.2 AI Goals:
1. Reasoning, Problem Solving:
Most AI algorithms require large computational resources to solve complex problems. The
memory and time required for the computation depend on the size of the problem. A first
priority of AI is therefore to find highly efficient and optimized problem-solving
algorithms using logical techniques [5].
2. Knowledge Representation:
Knowledge engineering and representation are a central part of AI because most machines
require extensive knowledge about human behavior, about how humans solve a problem and
about the environment in which a particular problem occurs. Moreover, a machine also needs
knowledge of different objects, the relations between them, and their categories, events,
properties and states in order to solve a problem.
3. Machine Learning:
Machine Learning (ML) is a sub-branch of Artificial Intelligence and consists of computer
algorithms that improve automatically through experience [6].
4. Planning:
An AI-based computer system assesses its environment to make a plan, consisting of
predictions and parameters, that maximizes its ability to solve a problem. This plan can
also adapt according to the assessments made by the algorithms.
5. Natural Language Processing (NLP):
NLP helps machines read and understand the languages that humans speak. A powerful NLP
system enables natural-language user interfaces and can acquire knowledge from
human-written texts on the internet [7]. Machine translation and text mining are
applications of NLP.
6. Computer Vision:
In this field of study, techniques and algorithms are developed that help computers
understand images taken by a web camera and video content captured in real time [8].
7. Robotics:
This branch of engineering deals with the design, construction and operation of robots.
It also deals with sensory feedback, control mechanisms and information processing [9].
8. Intelligence:
Both social and general intelligence are used by a machine to learn and understand any
intelligent task that a human can perform [10].
5.2 Machine learning
Machine Learning is an exciting sub-area of Artificial Intelligence and has become
pervasive in the modern world. It presents data in new ways, such as the stories suggested
by Facebook. Machine Learning allows computers to learn and adapt to the situation
automatically by making predictions, recognitions and detections.
5.2.1 What is Machine Learning:

Machine Learning does not need explicit programming because it can learn from experience,
much as humans do. When machine learning applications are exposed to new data, they learn,
change, grow and develop without external help. In other words, Machine Learning helps
computers find insightful information without being told where to look. It does so by
leveraging algorithms that learn from data iteratively: previous transactions and
iterations are used to learn from the data with “pattern recognition” algorithms.
5.2.2 How does ML work:

Machine Learning is one of the most exciting branches of Artificial Intelligence. It uses
specific inputs to complete the task of learning from data. It is important to understand
how machine learning works and how researchers can use it for future developments.
First, training data is fed as input into a selected algorithm. The training data, whether
labeled or unlabeled, is used to develop the final ML model, and the type of data affects
the choice of algorithm, as discussed in Section 5.2.3. To check whether the algorithm is
working correctly, new data is fed to it and the predictions are compared with the
expected results. If the difference between the predictions and the expected results is
large, the algorithm is retrained iteratively until the desired output is achieved.
Through this iterative approach, the algorithm keeps learning until it reaches a
near-optimal solution to the problem.
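The iterative loop described above can be illustrated with a minimal sketch. The choice of a one-variable linear model trained by gradient descent, the function name `train_linear`, the learning rate and the toy data are all assumptions made for illustration; the report does not prescribe a particular algorithm here.

```python
def train_linear(xs, ys, lr=0.05, tolerance=1e-6, max_iters=10000):
    """Fit y = w * x + b by repeating predict / compare / adjust steps."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_iters):
        predictions = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(predictions, ys)]
        loss = sum(e * e for e in errors) / n  # mean squared error
        if loss < tolerance:
            break  # predictions are close enough to the expected results
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (hypothetical example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)

# Check the trained model on new data that was not used for training.
new_x = 5.0
prediction = w * new_x + b
print(round(w, 2), round(b, 2), round(prediction, 2))
```

The loop stops once the difference between predictions and expected results is small, which mirrors the "run iteratively until the desired output is achieved" step in the text.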
5.2.3 Types of ML:
Machine learning is divided into three categories.
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
1. Supervised Learning:
In supervised learning, labeled (known) data is used as training data. Because the data is
known, the learning is called supervised, meaning it is directed toward a successful
outcome. The input data is fed to the ML algorithm to train a model. Once the model has
been trained on known data, unknown data is given to it to produce a new response.
Figure 2 illustrates the concept of supervised learning with an example.
Figure 2: Supervised Learning [11]
In this example, the model learns to distinguish apples from other fruits. Once the model
is successfully trained, it identifies that the given data is an apple.
The algorithms used for supervised learning are listed below:
• Random Forest
• Linear Regression
• Polynomial Regression
• Logistic Regression
• K-nearest neighbors
• Decision Trees
• Naïve Bayes
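As a small illustration of supervised learning, the sketch below uses K-nearest neighbors, one of the algorithms listed above, on the apple example. The two features (redness and roundness on a 0 to 1 scale), the training values and the function name `knn_predict` are invented for illustration only.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    def squared_distance(features):
        return sum((a - b) ** 2 for a, b in zip(features, query))

    nearest = sorted(train, key=lambda item: squared_distance(item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Labeled training data: (redness, roundness) -> fruit name (hypothetical values).
training_data = [
    ((0.90, 0.80), "apple"),
    ((0.85, 0.90), "apple"),
    ((0.80, 0.85), "apple"),
    ((0.20, 0.30), "banana"),
    ((0.15, 0.25), "banana"),
    ((0.30, 0.20), "banana"),
]

# Unknown sample given to the trained model.
print(knn_predict(training_data, (0.88, 0.82)))
```

The labels in `training_data` play the role of the "known data"; the query point is the unknown data for which the model produces a response.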
2. Unsupervised Learning:
In unsupervised learning, the training data is unlabeled and its structure is not known in
advance. Without labels, the algorithm cannot be guided toward a known output. Instead,
the model searches the data for a particular pattern or an interesting structure and
responds based on what it finds.
Figure 3 helps in understanding the concept of unsupervised learning.
Figure 3: Unsupervised Learning [11]
The algorithms used for unsupervised learning are:
• Fuzzy C-means
• K-means Clustering
• Partial Least Squares
• Apriori
• Singular Value Decomposition
• Principal Component Analysis
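To make the idea of finding structure without labels concrete, here is a minimal sketch of K-means clustering, one of the algorithms listed above, on one-dimensional data. The function name `kmeans_1d`, the data points and the starting centers are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the given number of centers (plain K-means)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled data with two visible groups (hypothetical values).
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels are supplied; the two groups emerge only from the distances between the points.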
3. Reinforcement Learning:
Here, the algorithm uses a trial-and-error process to explore and then determines which
actions yield the greatest rewards. Reinforcement learning consists of three major
components: the agent, the actions and the environment. The agent is the decision maker or
learner, the actions are what the agent can do, and the environment is everything the
agent interacts with. Reinforcement learning is based on choosing the agent's actions so
as to maximize the reward over a given time.
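The agent, action and environment roles can be sketched with tabular Q-learning, a standard reinforcement learning method that the report itself does not name. Everything below is a hypothetical toy setup: a corridor of five positions, a reward of 1 for reaching the last position, and the chosen values of `alpha`, `gamma` and `epsilon`.

```python
import random

N_STATES = 5            # corridor positions 0..4; position 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # index 0 moves left, index 1 moves right


def step(state, action):
    """Environment: apply the move and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn action values by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: try a random action
            else:
                best = max(q[state])  # exploit: pick a best-valued action
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:GOAL]]
print(policy)
```

After training, the greedy policy moves right from every position, since that is the action sequence that reaches the reward soonest.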
5.2.4 Advantages and Disadvantages

Although machine learning has helped many organizations and companies because of its
ability to handle large datasets, some disadvantages are also associated with it [12].
1. Advantages
• It can easily identify patterns and trends.
• Human intervention is not required.
• It continuously improves in efficiency and accuracy.
• It has a wide range of applications.
• It can handle multi-variety and multi-dimensional data.
2. Disadvantages
• Data acquisition
• It requires considerable time and massive resources to let algorithms learn and develop
• Selection of the required algorithm for interpretation of results
• High error susceptibility
5.3 Deep Learning
5.3.1 What is Deep Learning:

Deep learning is a sub-area of machine learning. Through Artificial Intelligence, machines
become able to act intelligently with minimal human involvement. Machine learning consists
of algorithms that can build models from input data. Deep learning goes further by using
deep neural networks that learn features from the input data on their own, so the machine
can make its own decision by analyzing all the parameters that affect a problem. Unlike
much of machine learning, deep learning does not rely on task-specific algorithms; it
works by learning data representations, and the learning can be supervised,
semi-supervised or unsupervised. Moreover, it places no specific condition on the data
type: it can handle structured and unstructured data, including text, images and sound. It
is also known as end-to-end learning because of its ability to learn directly from input
data. Deep learning algorithms need little human intervention, and they sometimes produce
more accurate and efficient results than humans.
5.3.2 How Deep Learning works:

Deep learning is built on multi-layer neural networks that have multiple neurons in every
layer and are used for tasks such as regression, classification and clustering. Each
neuron has an activation function and acts as a single logistic node whose output is
connected to the inputs of the subsequent layer.
To adjust the weights, a loss function is computed and minimized so that the network fits
the input data. The neurons in each layer are initialized with different weights and learn
from the input data simultaneously. Thus, each node in a layer learns from the output of
the previous layer, progressively refining the approximation of the input data to achieve
an accurate output [13].
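The behavior of a single neuron can be sketched as follows. This is a minimal illustration, assuming a sigmoid activation and a squared-error loss; the function names, learning rate and input values are hypothetical and are not taken from the report.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_forward(weights, bias, inputs):
    """Weighted sum of the inputs followed by the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update for the loss L = (out - target)^2 / 2."""
    out = neuron_forward(weights, bias, inputs)
    # Chain rule: dL/dz = (out - target) * sigmoid'(z) = (out - target) * out * (1 - out)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


weights, bias = [0.0, 0.0], 0.0
inputs, target = [1.0, 2.0], 1.0

before = neuron_forward(weights, bias, inputs)
for _ in range(100):
    weights, bias = train_step(weights, bias, inputs, target)
after = neuron_forward(weights, bias, inputs)
print(before, after)
```

The output starts at 0.5 and moves toward the target as the weights are adjusted to reduce the loss. A full network repeats this across many neurons and layers using backpropagation.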
5.3.3 Deep Learning Architectures:

Many architectures are being used for deep learning and some of them are discussed
below.
1. Recurrent Neural Networks (RNN):

The RNN is one of the fundamental architectures that serve as building blocks of deep
learning. The basic difference between a typical neural network and a recurrent neural
network is that the RNN has connections that feed back into the same layer or a prior
layer, while other networks have only feed-forward connections. Using this feedback, RNNs
can retain a memory of previous inputs and use it when processing later ones. RNNs can be
unfolded in time and trained with backpropagation through time (BPTT).
The key differentiator of an RNN is the presence of feedback, which can originate from the
output layer, a hidden layer or some combination of them.
Figure 4. RNN Network [15]
Over the years, different varieties of RNN have been developed.
• Bi-Directional RNN: The output depends not only on past inputs but also on future ones.
• Deep RNN: Multiple layers are present per time step to achieve higher accuracy and
better learning.
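The feedback connection can be illustrated with a minimal sketch of a one-unit recurrent layer. The scalar weights `w_x`, `w_h` and `b`, the tanh activation and the input sequence are assumptions chosen for illustration.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit RNN: h_t = tanh(w_x * x_t + w_h * h_(t-1) + b)."""
    h = h0
    states = []
    for x in inputs:
        # The previous hidden state h is fed back in together with the new input.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# A single non-zero input followed by zeros (hypothetical sequence).
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)

# With the feedback weight set to zero, the network forgets the first input.
no_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0)
print(no_memory)
```

With feedback, the hidden state stays non-zero after the input has dropped to zero, which is the "memory of previous inputs" described above.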
2. Convolutional Neural Network (CNN):

A CNN uses convolution rather than standard matrix multiplication. Convolutional networks
are known for two distinctive characteristics: parameter sharing and sparse interactions.
With sparse connectivity, the kernel is smaller than the input, which reduces memory use
and computation [14]. A CNN involves four steps:
• Convolution: The first stage, where the input signal is received and convolved with a
set of filters.
• Subsampling: The output of the convolutional layer is smoothed to reduce the filters'
sensitivity to noise and other variations.
• Activation: This stage controls how the signal flows from one layer to the next.
• Fully Connected: Finally, each neuron in a layer is connected to all neurons of the
preceding and subsequent layers.
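The four steps above can be sketched in plain Python on a tiny image. This is only an illustration: the 5x5 image, the 2x2 edge kernel, the use of max pooling for subsampling, ReLU for activation and the fully connected weights are all assumed values, and the convolution is written without kernel flipping, as is common in CNN implementations.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (valid positions only, no flipping)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(feature_map):
    """Subsampling: keep the largest value in each 2x2 block."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def relu(feature_map):
    """Activation: pass positive values, block negative ones."""
    return [[max(0, v) for v in row] for row in feature_map]


# 5x5 image with a vertical edge between columns 1 and 2 (hypothetical data).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

feature_map = convolve2d(image, kernel)   # 4x4
pooled = max_pool2x2(feature_map)         # 2x2
activated = relu(pooled)

# Fully connected step: every remaining value feeds one output neuron.
fc_weights = [0.25, 0.25, 0.25, 0.25]     # assumed weights
flat = [v for row in activated for v in row]
score = sum(w * v for w, v in zip(fc_weights, flat))
print(feature_map, activated, score)
```

The same 2x2 kernel is reused at every image position, which is the parameter sharing mentioned above, and each output value depends on only four input pixels, which is the sparse interaction.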
Figure 5: CNN Architecture [15]
The advantages and disadvantages of CNN are listed below:
Advantages:
• Excellent visual recognition
• Once a segment is learned, a CNN can recognize that segment anywhere in an image.
Disadvantages:
• Dependent on the quality and size of the training data
• Susceptible to noise
3. Autoencoder:
Autoencoders work in an unsupervised setting and are trained with backpropagation.
Although they resemble Principal Component Analysis (PCA), they are more flexible than
PCA. The basic purpose of autoencoders here is to learn what normal or regular data looks
like in order to detect anomalies. Despite passing through multiple layers, the output
signal remains close to the input. There are four types of autoencoders.
• Vanilla Encoder: A neural network with one hidden layer.
• Convolutional Encoder: Input data is processed through convolution instead of fully
connected layers.
• Multi-layer Encoder: Uses multiple hidden layers.
• Regularized Encoder: Uses a specific loss function to get optimal results.
Figure 6. Basic Illustration of Autoencoder [15]
The advantages and disadvantages of autoencoders are:
Advantages:
• The model is based on data instead of pre-defined filters
• Low complexity
Disadvantages:
• Some encoders create a deterministic bias in the model
• Sometimes, the time required for training can be high
5.3.4 Applications of Deep Learning in Computer Vision
1. Object Detection
In object detection, instances of objects such as humans, weapons and airplanes are
detected in images and videos. A large body of research on object detection is based on
techniques such as R-CNN, Faster R-CNN, YOLO, SSD and HOG. Object detection models are
usually divided into three stages.
• Informative Region Selection: Because an image may contain objects of different sizes
and aspect ratios, a multi-scale sliding window is used to scan the complete image.
• Feature Extraction: To recognize various objects, different features are extracted from
an image by using different object detection techniques.
• Classification: A classifier is used to distinguish the target object from all other
objects in an image.
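The region selection stage can be sketched as a function that lists every window position to be examined. The function name `sliding_windows`, the image size, the window sizes and the stride are hypothetical values chosen for illustration.

```python
def sliding_windows(image_width, image_height, window_sizes, stride):
    """List candidate boxes (x_min, y_min, x_max, y_max) at several scales."""
    boxes = []
    for win_w, win_h in window_sizes:
        for y in range(0, image_height - win_h + 1, stride):
            for x in range(0, image_width - win_w + 1, stride):
                boxes.append((x, y, x + win_w, y + win_h))
    return boxes


# An 8x6 image scanned with a small and a full-size window (assumed sizes).
boxes = sliding_windows(8, 6, window_sizes=[(4, 4), (8, 6)], stride=2)
print(len(boxes))
print(boxes[0], boxes[-1])
```

Each box would then be passed to the feature extraction and classification stages. The number of boxes grows quickly with image size and number of scales, which is one reason later detectors replace exhaustive scanning.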
2. Face Recognition
Face recognition is one of the important applications of computer vision. A feature
extractor is used to extract features from a face, and predictions are made by a
classifier that operates on the extracted features. CNNs brought a revolutionary change
to the field of face recognition, as a result of which security systems have become more
advanced.
3. Activity and Action recognition
Many deep learning techniques have been used for human activity and action recognition.
This application is very helpful in crowded areas where suspicious activity is likely.
4. Human Pose Estimation
Human pose estimation is used to determine the posture and the position of joints from
images, depth images, image sequences or skeleton data captured by motion-capture
hardware.
5.3.5 Object Detection Techniques

Different object detecting techniques are discussed below.
1. Histogram of Oriented Gradients (HOG):

HOG is a feature descriptor used for object detection. It counts occurrences of gradient
orientations in localized parts of an image, such as detection windows or a region of
interest (ROI). A sliding window moves across the entire image for detection. The main
advantages of HOG are that it is easy to understand and implement and is computationally
inexpensive.
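The core idea of HOG, a histogram of gradient orientations for one image cell, can be sketched as follows. This is a simplified illustration: it uses nine unsigned-orientation bins and hard bin assignment, and it omits the bin interpolation and block normalization used in full HOG implementations. The function name and the 4x4 cell values are hypothetical.

```python
import math


def hog_cell_histogram(cell, n_bins=9):
    """Histogram of gradient orientations (0 to 180 degrees) for one cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    height, width = len(cell), len(cell[0])
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Cell whose brightness changes from left to right (a vertical edge).
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
# Cell whose brightness changes from top to bottom (a horizontal edge).
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [10, 10, 10, 10], [10, 10, 10, 10]]

print(hog_cell_histogram(vertical_edge))
print(hog_cell_histogram(horizontal_edge))
```

The two cells place all their votes in different bins, which is what lets a classifier tell differently oriented edges apart.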
Figure 7: HOG architecture [16]
2. Region-based Convolutional Neural Network (R-CNN)

After the deep learning revolution, researchers started to investigate new ways to replace
HOG-based classifiers. However, applying CNNs to every candidate window was
computationally expensive and too slow. To solve this issue, researchers developed R-CNN,
which uses an object proposal algorithm known as Selective Search to reduce the total
number of bounding boxes to be classified. R-CNN uses a ConvNet to classify objects and
achieves high detection accuracy. Moreover, it can scale to a large number of object
classes without using hashing techniques.
Figure 8. R-CNN Architecture [17]
3. Spatial Pyramid Pooling (SPP-Net)

R-CNN was still slow, so researchers developed SPP-Net, in which the CNN representation of
the whole image is calculated only once. Irrespective of image size and scale, SPP-Net
produces a fixed-length representation. Pyramid pooling has shown robust results in object
detection. However, one problem remained: backpropagation through the SPP layer. This
problem paved the way for Fast R-CNN.
Figure 9: SPP-Net Architecture [18]
4. Fast Region-based Convolutional Neural Network (Fast R-CNN)

This algorithm was developed to fix the problems of R-CNN and SPP-Net by improving the
system's speed and accuracy. In R-CNN, each region proposal is cropped and resized,
whereas Fast R-CNN processes the complete image. The detection rate of Fast R-CNN is
higher than that of R-CNN and SPP-Net because a multi-task loss is used to train the
network in a single stage. Moreover, Fast R-CNN does not need extra storage to cache the
features of images.
Figure 10. Fast R-CNN Architecture [19]
5. You-Only-Look-Once (YOLO)
YOLO is a widely used method for the recognition and detection of objects. The YOLO model
works in real time and processes images at 45 fps, while a smaller version called Fast
YOLO processes images at 155 fps. Even at this rate, it still achieves a higher mAP than
other real-time detectors. In some domains, this algorithm works better than DPM and
R-CNN. Because YOLO looks at the whole image, it captures context and makes fewer
background errors. However, it has certain limitations: it struggles with unusual aspect
ratios and with small objects in an image.
Figure 11. YOLO Architecture [20]
6. Single Shot Detector (SSD)

SSD uses a single deep network to detect objects in an image. In this approach, the output
space of bounding boxes is divided into a set of default boxes with different aspect
ratios and scales for each feature map location. It combines predictions from multiple
feature maps of different resolutions to handle objects of different sizes. This algorithm
is easy to integrate into systems that require object detection. In this project we use
the SSD model with the MobileNet architecture because MobileNet is built from 3x3
depthwise convolutions followed by 1x1 pointwise convolutions, which reduces computation.
By merging these two architectures, we get a faster and more efficient deep learning
method for object detection.
Figure 12. SSD MobileNet Architecture [21]
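The saving from a 3x3 depthwise convolution followed by a 1x1 pointwise convolution can be estimated by counting multiply-add operations. The sketch below is an illustration only; the layer sizes (64 input channels, 128 output channels, a 56x56 feature map) are assumed values and are not taken from this project.

```python
def standard_conv_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a standard convolution on a fmap x fmap output."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def depthwise_separable_cost(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a depthwise convolution plus a 1x1 pointwise one."""
    depthwise = kernel * kernel * in_ch * fmap * fmap   # one filter per channel
    pointwise = in_ch * out_ch * fmap * fmap            # 1x1 mixing of channels
    return depthwise + pointwise


# Assumed example layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
ratio = (depthwise_separable_cost(3, 64, 128, 56)
         / standard_conv_cost(3, 64, 128, 56))
print(f"separable / standard cost: {ratio:.3f}")
```

The ratio equals 1/out_ch + 1/kernel^2, so for 3x3 kernels the separable layer needs roughly 8 to 9 times fewer operations than a standard convolution.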

5.4 TensorFlow Object Detection API
5.4.1 What is API:

API stands for Application Programming Interface. It provides developers with a
common set of operations, which saves time because they do not need to program
everything from scratch.
5.4.2 What is TensorFlow:

TensorFlow is an open-source framework for creating deep learning networks that can help in
object detection and recognition [22]. Its Object Detection API provides models pre-trained on
datasets such as KITTI, COCO and Open Images. TensorFlow can be used for:
• Image Recognition
• Video or Sound Recognition
• Video Detection
5.4.3 Advantages of TensorFlow:

The reasons behind the popularity of TensorFlow are:
1. It is easily accessible.
2. It incorporates various APIs.
3. It allows the developer to use TensorBoard to visualize the
structure of a neural network as a computation graph.
4. It makes debugging of programs easier.

6 Related Work
6.1 Object Detection Application for Blind People

Vision is a precious sense that people need to interact with the real
world. About 200 million people around the world are visually impaired and, because
of their vision problems, they face many obstacles in their day-to-day activities. For
this reason, it is essential for such people to understand their environment and to
recognize the objects they can interact with. In [23], the authors have developed an
Android application that helps such people perceive their surroundings using a handheld
device such as a mobile phone. The application integrates different techniques
that not only recognize obstacles around the user but also alert the user
through audio output in real time. The application alerts users as early
as possible to ensure their safety. Object detection and recognition are done using the
Single Shot Detector (SSD). This algorithm was chosen because it gives
accurate detection results with a response time that is faster than that of
other real-time algorithms. TensorFlow APIs were used in the development of this
application, along with the TextToSpeech API for audio output.
6.2 3D SOKUIKI System for Blind People

In [24], the authors have developed a system that makes blind people aware of their
surroundings. The user carries a small PC and a 3D scanner while walking. The scanner
captures a 3D range data map of the surroundings, and the PC analyzes the
range data to detect objects that are relevant to visually impaired
people and to support them while walking. This information is delivered by the PC as
synthesized sound. In this paper, the authors not only introduce
the basic concept of the entire system but also clarify the tasks needed to realize
it. Moreover, they describe the method of acquiring 3D data along
with the representation of obstacles and the detection of objects. Finally, the
practicality of the system is examined in an experiment in which their trial system
detected the bumps and trenches faced by a blind person in an experimental environment.
6.3 Autonomous Walking Stick using Image Processing and Echolocation

The authors of [25] have developed a smart walking stick, known as the Assistor, to help
visually impaired people identify objects so that they can
reach their destination without depending on others. The Assistor is built
using image processing, echolocation and a navigation system. It
serves as a potential aid for visually challenged people and helps
improve their quality of life. Although much earlier research
aimed to improve the lives of such people through walking sticks of different
types and several other systems, none of them offered real-time
independent navigation together with object detection and alerts after identification.
Ultrasonic sensors are used in the Assistor to emit sound waves and detect
objects from their echoes. An image sensor captures real-time images for object
identification. Finally, a smartphone application helps the user navigate using the
Global Positioning System (GPS) and maps.
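Echolocation with an ultrasonic sensor rests on a simple relation: the pulse travels to the obstacle and back, so the distance is half the round-trip time multiplied by the speed of sound. The sketch below illustrates that relation only; it is not the Assistor's code, the speed of sound is an approximate value for air at about 20 °C, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value, dry air at about 20 °C


def echo_distance_m(round_trip_s):
    """Distance to an obstacle from the round-trip time of an echo."""
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    # The pulse covers the distance twice (out and back), hence the / 2.
    return SPEED_OF_SOUND_M_PER_S * round_trip_s / 2.0
```

For example, an echo that returns after 10 milliseconds corresponds to an obstacle a little under two metres away.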
6.4 Real Time Object Detection with Moving Direction

Detecting objects in real time along with their moving direction is a
challenging area for researchers. Recent developments in technology for capturing real
world scenes and advanced portable devices such as Microsoft Kinect have made it
necessary to build a simple yet fast and reliable navigation system for blind people. In
[26], an effective and suitable technique has been developed to detect objects along with
their moving direction in an indoor environment. Microsoft Kinect is used to capture
depth information about the scene in front of the visually impaired person. From one
second of video, the authors extract three consecutive depth frames and, for every depth
frame, create distance-along-line profile graphs for four predefined lines.
The line profile graphs are then analyzed to detect any moving object
along with its moving direction. The experimental results show that
the proposed system detects a moving object and its direction with an
accuracy of 92%. However, the overall efficiency of the proposed system is 90%.
6.5 Convolutional Neural Network (CNN) using ImageNet Dataset
The development of computing machines and the availability of large-scale datasets
have made machine learning an important field of research. Machine learning not
only helps in the medical field and in the development of autonomous cars but has
also enabled remarkable advances in computer vision, such as detecting objects in
images taken by an ordinary web camera. In [27], the authors have implemented machine
learning algorithms for object detection to assist people with visual disabilities.
The experiment is based on a Convolutional Neural Network (CNN) trained
on the ImageNet dataset, which not only detects objects but also narrates information
about each detected object to the blind person. Moreover, the authors state that the
implementation can run on any device that has a camera, such as tablets,
computers and mobile phones.
6.6 Object Detection with Distance Estimation using CNN

The authors of [28] describe a tool that provides information about objects.
The tool can also estimate the distance to an object with the help of a
camera, and it is mounted on a pair of glasses to make it easy to use for the people
who need it. With this tool, a blind person can not only detect nearby objects but
also recognize them, which improves the person's ability to work
independently. The camera acts like a human eye, capturing real-time
images and video. The RGB visual data is processed by a
Convolutional Neural Network (CNN): an input of 176x132 pixels is convolved twice,
producing smaller feature maps of 41x33 pixels, and the weights are obtained
from a predetermined dataset using back propagation for classification. After
detection, the centroid of each detected object is found; this center point is used to
calculate the distance between the camera and the object with Stereo Vision.
To alert the blind person, the output is then converted into sound and
played through an earphone. In this way, the blind person can navigate easily by
listening to the information given by the system. The simulated results show that the
tool can detect and recognize objects such as tables, humans, cars, chairs, motorbikes
and bicycles with an accuracy of 93.33%. However, the distance measurement shows an
error of 6.1% for distances between 50 cm and 300 cm.
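The centroid and stereo vision steps can be sketched with the standard relation for two parallel, rectified cameras: distance = focal length x baseline / disparity. This is an illustration under stated assumptions and not the method of [28]; the focal length, the baseline and the function names are hypothetical.

```python
def box_centroid(xmin, ymin, xmax, ymax):
    """Center point of a bounding box given in pixels."""
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def stereo_distance_cm(left_box, right_box, focal_px, baseline_cm):
    """Distance to an object seen by both cameras of a stereo pair.

    left_box / right_box: (xmin, ymin, xmax, ymax) of the same object in
    the left and right image. focal_px is the focal length in pixels and
    baseline_cm the spacing between the cameras (assumed known from
    calibration).
    """
    x_left, _ = box_centroid(*left_box)
    x_right, _ = box_centroid(*right_box)
    disparity = x_left - x_right  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid match")
    return focal_px * baseline_cm / disparity
```

With an assumed focal length of 700 pixels and a 6 cm baseline, a disparity of 20 pixels between the two centroids corresponds to a distance of 210 cm. Nearby objects produce a larger disparity, which is why small pixel errors matter more at long range.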
6.7 Object Detection using Radio Frequency Identification

In [29], Radio Frequency Identification (RFID) is used to recognize obstacles for blind
people. People with visual disabilities use canes or smart sticks as tactile cues to
walk independently. The main problem for a blind person is
detecting and identifying the objects in their
surroundings. Several techniques have been developed to make navigation easier,
such as RFID and the Differential Global Positioning System
(DGPS). Previous work has shown limitations of
DGPS in detecting the position of obstacles, whereas RFID has
developed remarkably as an aid for blind people. The paper not only develops an obstacle
recognition system built into canes using RFID but also reviews
existing work in the field of RFID.
6.8 Object Detection using Raspberry Pi

The authors of [30] investigate an independent navigation system for
people who are partially or fully visually impaired, built around a microprocessor
with synthetic speech output. The system is designed to inform blind users
about the identity of the people they encounter. The implemented algorithm uses a
stereoscopic sonar system based on ultrasonic waves to detect obstacles, and
vibro-tactile feedback informs the blind person
about the location of objects. First an obstacle is detected, and then the system starts
recognizing whether it is a human or a static object. In the case of a
human, the system identifies the person by matching the person's details against an
existing database and then delivers this information to the user.
6.9 Haar Cascade and CNN for Object Detection
Machine Learning has paved the way for Artificial Intelligence (AI) systems
that can learn and improve from
experience without being explicitly programmed. In other words, it equips a system with
the ability to make decisions based on trained algorithms, for example in computer
vision. The main purpose of [31] is to implement an object and obstacle detection system
that helps visually impaired people manage their daily activities without
external help. The paper also compares two detection
algorithms, the Convolutional Neural Network (CNN) and the Haar Cascade. A Haar
Cascade is a classifier that is used in face detection algorithms but
can also be trained to detect various other objects. A
Convolutional Neural Network is a deep learning technique that can be employed for
object recognition. The dataset used in this research consists of 2300 images in 3
separate classes. The experiment analyzes which system is more
appropriate for real-time detection, and the results show that the CNN is more suitable
than the Haar Cascade in terms of accuracy.
6.10 Feature Pyramid SSD using BLIND Dataset

The authors of [32] have established BLIND, an object detection dataset of outdoor
environments, so that visually impaired people can benefit from deep learning techniques.
This dataset differs from existing datasets like PASCAL VOC because
it covers objects at various scales, caused by the different
distances between cameras and objects. Because of this characteristic,
detection on the BLIND dataset requires scale invariance, for which the standard SSD is
not sufficient. To handle this problem, the paper proposes a novel detector called
Feature Pyramid SSD (FPSSD), which
applies different feature fusion strategies to the standard SSD. The experimental results
show that FPSSD achieves 75.4% mAP on the BLIND
dataset, 1.7% better than the classical SSD. Moreover, the analysis
of the results demonstrates the need for the BLIND dataset and validates the
efficiency of FPSSD for object detection in support of
blind people.
6.11 Indoor Localization System

In [33], the authors have developed an indoor localization system based on
image processing techniques for color detection of connected objects. The system
determines the user’s location using color detection with high
accuracy in real-time scenarios. First, the system filters the image for a
specific color and extracts the pixel coordinates of the matching region. After that,
the user’s location is determined by comparing the resulting matrix with
stored matrices of trained images. The authors conducted indoor experiments that yielded
efficient and accurate results. After evaluating their results, they also integrated
the localization system with an indoor navigation system. The integrated system
can help blind people in scenarios where accuracy and efficiency are
the most important factors. An Android application is also developed in this paper to
further assist blind people in navigation.
6.12 GUI Based Mobile Application

In [34], an Android application called Intelligent Eye is developed to help visually
impaired people. The application assists partially or fully visually challenged people by
offering a set of helpful features, namely color detection, light detection, banknote
recognition and object recognition, through a Graphical User Interface (GUI) that is
customized for blind people. All these features are provided on a single device, which
reduces complexity and cost and makes the application more practical. The
final results show that the application achieves its aim by offering
all the desired features.
6.13 Mobile Application to Assist Blind People

Video analysis and processing have become an important part of today’s technical systems
because they enable applications that assist people
with physical disabilities. Fast processing of real-time video helps in extracting useful
information for object detection. In [35], an algorithm is proposed to help blind people
navigate by detecting indoor and outdoor staircases in video frames. The
proposed method uses morphological preprocessing and horizontal and vertical
analysis of frames. The experimental results indicate that the implemented algorithm is
accurate and efficient.
6.14 Hand Gestures and Facial Recognition using OpenCV

A recognition system is developed in [36] using hand gestures and face recognition to
perform various tasks that help blind people complete their daily chores. The
proposed system takes image frames from live video, which are then processed
using certain algorithms. For the hand gesture system, skin color is detected in the YCbCr
color space, and the Convex Defect algorithm is used to locate features such as the angle
between fingers and the fingertips. The gesture recognition system can be used to perform
various tasks like turning lights or fans on and off. For faces, Haar Cascade
classifiers and an LBPH recognizer are used to detect and recognize faces, respectively.
The research is implemented using OpenCV. The system successfully detects and
recognizes human faces and hand gestures. The accuracy of hand gesture recognition
is 95.2%, while facial recognition is performed with 92% accuracy.
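Skin color detection in the YCbCr color space can be sketched as a per-pixel threshold on the two chroma channels. The conversion below uses the ITU-R BT.601 full-range coefficients; the Cb and Cr ranges are commonly used values assumed here for illustration, since the thresholds used in [36] are not given, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (ITU-R BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Assumed skin ranges; they are not taken from [36].
CB_RANGE = (77.0, 127.0)
CR_RANGE = (133.0, 173.0)


def is_skin(r, g, b):
    """True if the pixel's chroma falls inside the assumed skin ranges."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])


def skin_mask(image):
    """Binary mask for an image given as rows of (r, g, b) tuples."""
    return [[is_skin(*pixel) for pixel in row] for row in image]
```

The luma channel Y is ignored on purpose, which makes the threshold less sensitive to brightness than a rule written directly in RGB. The resulting mask is the kind of input on which a contour and convexity analysis of the hand would then operate.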
7 Proposed Design
7.1 Flowchart
The flowchart shows step by step how the system works.
We have implemented two models, which are explained through flowcharts in a later section.
7.2 Models
7.2.1 Model 1:
The first model that we have implemented is based on TensorFlow SSD MobileNet.
It detects a person and determines on which side of the user the person is present.
The model handles three conditions. The model is trained on the COCO dataset using
a Raspberry Pi.
• Condition 1: the person is in the left half of the image
• Condition 2: the person is in the right half of the image
• Condition 3: the person is at a distance of (full width of image)/4
The implementation process of Model 1 is shown in the Flowchart of Model 1 figure.
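Conditions 1 and 2 can be sketched as a comparison of the bounding box center with the vertical midline of the image. This is a minimal sketch and not the project's code; it assumes the detector returns the left and right box edges in pixels, and the function names and the spoken message are hypothetical. Condition 3 is left out because the text does not specify it precisely enough.

```python
def person_side(xmin, xmax, image_width):
    """Return 'left' or 'right' for a detected person's bounding box.

    xmin / xmax: left and right edges of the box in pixels (assumed).
    """
    center_x = (xmin + xmax) / 2.0
    return "left" if center_x < image_width / 2.0 else "right"


def voice_message(xmin, xmax, image_width):
    """Text that a text-to-speech engine could read to the user."""
    return f"Person on your {person_side(xmin, xmax, image_width)}"
```

For a 640 pixel wide frame, a box spanning pixels 10 to 110 yields the message for the left side and a box spanning 400 to 600 the message for the right side.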
7.2.2 Model 2:
The second model performs text detection and recognition using OpenCV with
Tesseract OCR. The flowchart of the model is shown in the Flowchart of Model 2 figure.
Figure. Flowchart of Model 1
Figure. Flowchart of Model 2
12 Statement of Contribution
Task Contribution (columns: Task, Student ID, Contribution Percentage)
T1: Planning
T2: Requirements
T3: Documentation
T4: Background
T5: Design
T6: Implementation
201601484 33.3%
Overall participation (columns: Student ID, Student Name, Signature)
201506485
201501602
201601484
13 References
[1] Bourne, R., Flaxman, S., Braithwaite, T., Cicinelli, M., Das, A., Jonas, J., Taylor, H. Magnitude, temporal
trends, and projections of the global prevalence of blindness and distance and near vision impairment: A
systematic review and meta-analysis. J. Lancet Glob. Health 2017,5, 888–897.
[2] Artificial intelligence (2019). Retrieved December 3, 2019, from
http://en.wikipedia.org/wiki/Artificial_intelligence#Definitions.
[3] Executive Office of the President, National Science and Technology Council, Committee on Technology
(October 12, 2016), Preparing for the Future of Artificial Intelligence, p. 6,
https://obamawhitehouse.archives.gov/sites/default/
[4] Heath, N. (2019, July 1). What is AI? Everything you need to know about Artificial Intelligence. Retrieved
December 3, 2019, from https://www.zdnet.com/article/what-is-ai-everything-you-need-to-know-about-artificial-intelligence
[5] Reasoning system. (2019, November 30). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Reasoning_system.
[6] Machine learning. (2019, November 29). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Machine_learning.
[7] Natural language processing. (2019, November 30). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Natural_language_processing.
[8] Brownlee, J. (2019, July 5). A Gentle Introduction to Computer Vision. Retrieved December 3, 2019, from
https://machinelearningmastery.com/what-is-computer-vision/.
[9] Robotics. (2019, November 26). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Robotics.
[10] Artificial general intelligence. (2019, December 2). Retrieved December 3, 2019, from
https://en.wikipedia.org/wiki/Artificial_general_intelligence.
[11] Priyadharshini (2020). What is Machine Learning and How Does It Work?
https://www.simplilearn.com/tutorials/machine-learning-tutorial/what-is-machine-learning
[12] Team, D. F. (2019, March 1). Advantages and Disadvantages of Machine Learning Language. Retrieved
December 3, 2019, from https://data-flair.training/blogs/advantages-and-disadvantages-of-machine-learning/.
[13] A. P. Sanskruti Patel, (2018). Deep Learning Architectures and its Applications a Survey.
International Journal of Computer Sciences and Engineering, vol. 6, no. 6, pp.
1177-1183.
[14] Witold Pedrycz, Shyi-Ming Chen, (2020). Deep Learning: Concepts and Architectures. Studies in
Computational Intelligence, DOI: 10.1007/978-3-030-31756-0
[15] https://hub.packtpub.com/top-5-deep-learning-architectures/
[16] History of Face Recognition & Facial recognition software. (2019, May 9). Retrieved December 3, 2019,
from https://www.facefirst.com/blog/brief-history-of-face-recognition-software/.
[17] Symanovich, S. (n.d.). How does facial recognition work? Retrieved December 3, 2019, from
https://us.norton.com/internetsecurity-iot-how-facial-recognition-software-works.html.
[18] https://www.gannettcdn.com/media/2018/05/22/USATODAY/USATODAY/636626047644270447-052218-facial-recognition-Online.png
[19] Ivankov, A. (2019, October 22). Facial Recognition: Advantages and Disadvantages. Retrieved December
3, 2019, from https://www.profolus.com/topics/facial-recognition-advantages-and-disadvantages/.
[20] https://icatcare.org/app/uploads/2018/07/Thinking-of-getting-a-cat.png
[21] https://medium.com/@techmayank2000/object-detection-using-ssd-mobilenetv2-using-tensorflow-api-can-detect-any-single-class-from-31a31bbd0691
[22] What is TensorFlow? Introduction, Architecture & Example. (n.d.). Retrieved December 3, 2019, from
https://www.guru99.com/what-is-tensorflow.html.
[23] Sumitra A. Jakhete, Avanti Dorle, Piyush Pimplikar, Pranit Bagmar, Atharva Rajurkar, (2019). Object
Recognition App for Visually Impaired. IEEE Pune Section International Conference (PuneCon),
DOI: 10.1109/PuneCon46936.2019.9105670
[24] Tatsuro UEDA, Hirohiko KAWATA, Tetsuo TOMIZAWA, Akihisa OHYA and Shin’ichi YUTA. Visual
Information Assist System Using 3D SOKUIKI Sensor for Blind People. IECON, DOI:
10.1109/IECON.2006.347767
[25] Akhilesh Krishnan, Deepakraj G, Nishanth N, Dr. K. M. Anandkumar, (2016). Autonomous Walking Stick for
The Blind Using Echolocation and Image Processing. IEEE, DOI: 10.1109/IC3I.2016.7917927
[26] Aniqua Nusrat, Sonia Corraya, (2016). Detecting real time object along with the moving direction for
visually impaired people. 2nd International Conference on Electrical, Computer & Telecommunication
Engineering (ICECTE), DOI: 10.1109/ICECTE.2016.7879628
[27] Jawaid Nasreen, Warsi Arif, Asad Ali Shaikh, Yahya Muhammad, Monaisha Abdullah, (2019). Object
Detection and Narrator for Visually Impaired People. IEEE 6th International Conference on Engineering
Technologies and Applied Sciences (ICETAS), DOI: 10.1109/ICETAS48360.2019.9117405
[28] Rais Bastomi, Firza Putra Ariatama, Lucke Yuansyah Arif Tryas Putri, Septian Wahyu Saputra,
Mohammad Rizki Maulana, Mat Syai’in, Ii Munadhif, Agus Khumaidi, Mohammad Basuki Rahmat,
Annas Singgih Setiyoko, Budi Herijono, E.A. Zuliari, Mardlijah, (2019). Object Detection and Distance
Estimation Tool for Blind People Using Convolutional Methods with Stereovision. International
Symposium on Electronics and Smart Devices, DOI: 10.1109/ISESD.2019.8909515
[29] M. Nassih, I. Cherradi, Y. Maghous, B. Ouriaghli, Y. Salih-Alj, (2012). Obstacles Recognition System
for the Blind People Using RFID. Sixth International Conference on Next Generation Mobile Applications,
Services and Technologies, DOI: 10.1109/NGMAST.2012.28
[30] Sunitha M. R, Fathima Khan, Gowtham Ghatge R, Hemaya S, (2019). Object Detection and Human
Identification using Raspberry Pi. 1st International Conference on Advances in Information Technology
(ICAIT), DOI: 10.1109/ICAIT47043.2019.8987398
[31] Samkit Shah, Jayraj Bandariya, Garima Jain, Mayur Ghevariya, Sarosh Dastoor, (2019). CNN based Auto-
Assistance System as a Boon for Directing Visually Impaired Person. Proceedings of the Third
International Conference on Trends in Electronics and Informatics, DOI: 10.1109/ICOEI.2019.8862699
[32] Zhigong Zhou, Xiaosong Lan, Shuxiao Li, Chengfei Zhu, Hongxing Chang, (2019). Feature Pyramid SSD:
Outdoor Object Detection Algorithm for Blind People. IEEE 5th International Conference on Computer
and Communications.
[33] Md. Mazidul Alam, Sayed Erfan Arefin, Miraj Al Alim, Samiul Islam Adib, Md. Abdur Rahman, (2017).
Indoor Localization System for Assisting Visually Impaired
People. International Conference on Electrical, Computer and Communication Engineering (ECCE), DOI:
10.1109/ECACE.2017.7912927
[34] Milios Awad, Jad El Haddad, Edgar Khneisser, (2018). Intelligent Eye: A Mobile Application for Assisting
Blind People. IEEE Middle East and North Africa Communications Conference (MENACOMM), DOI:
10.1109/MENACOMM.2018.8371005
[35] Kresimir Romic, Irena Galic, Tomislav Galba, (2015). Technology Assisting the Blind - Video Processing
Based Staircase Detection. 57th International Symposium ELMAR, DOI: 10.1109/ELMAR.2015.7334533
14 Terminology & Definitions
AI: Artificial intelligence
TF: TensorFlow
API: Application Programming Interface
SSD: Single Shot Detector
ANI: Artificial Narrow Intelligence
AGI: Artificial General Intelligence
ASI: Artificial Super Intelligence
NLP: Natural Language Processing
RNN: Recurrent Neural Network
CNN: Convolutional Neural Network
HOG: Histogram of Oriented Gradients
R-CNN: Region-based Convolutional Neural Network
SPP: Spatial Pyramid Pooling
YOLO: You-Only-Look-Once
Biography
Appendix
Attention
Model:
