Visvesvaraya Technological University: Belagavi, Karnataka-590 014
BACHELOR OF ENGINEERING
in
Mechatronics Engineering
by
Under Supervision of
MRS. ASHWINI TP
Senior Assistant Professor
Department of Mechatronics Engineering
MITE, Moodabidri
CERTIFICATE
This is to certify that the Project (18MTP68) entitled “Blind Assistance & Navigation System” has been carried out by Mr. SHAHID SAYED (USN: 4MT19MT045), Mr. MOHAMMED GOUSE SHAIKH (USN: 4MT19MT029), Mr. NAWMAN BAIG (USN: 4MT19MT035), and Mr. SHAQEEN M (USN: 4MT20MT403), bonafide students of Mangalore Institute of Technology & Engineering, in partial fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Mechatronics Engineering of Visvesvaraya Technological University, Belagavi, during the year 2021-22. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited in the departmental library. The project report has been approved as it satisfies the academic requirements in respect of the project work prescribed for the said degree.
1.
2.
ACKNOWLEDGEMENT
We would like to thank our beloved Chairman of Mangalore Institute of Technology and Engineering, Mr. Rajesh Chowta, the Principal, Dr. M S Ganesha Prasad, and the Dean (Academics), Dr. Divakar Shetty S, for their kind patronage.
We heartily convey our deepest thanks to our internal guide, Mrs. Ashwini T P, Senior Assistant Professor, Department of Mechatronics Engineering, MITE, for her constant monitoring and guidance throughout the course of this project work.
We would like to thank all the teaching and non-teaching staff of MITE for their support in the completion of this project work.
Our acknowledgement would be incomplete if we did not thank our parents for their encouragement and support throughout our educational life. Finally, we extend our gratitude to all our friends and to everyone who has helped us in the completion of this project work.
NAWMAN BAIG
MOHAMMED GOUSE
SHAQEEN M
SHAHID SAYED
DECLARATION
We hereby declare that the project work entitled “Blind Assistance and Navigation System” is a record of original project work undertaken by us in partial fulfilment of the requirements for the award of the degree of Bachelor of Engineering in Mechatronics Engineering of Visvesvaraya Technological University, Belagavi, during the academic year 2021-2022. We have completed this project work under the supervision of Mrs. Ashwini T P, Senior Assistant Professor, Department of Mechatronics Engineering, Mangalore Institute of Technology and Engineering, Moodabidri.
We also declare that, to the best of our knowledge and belief, the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred on an earlier occasion to any other student.
MOHAMMED GOUSE    4MT19MT029
SHAQEEN M    4MT20MT403
ABSTRACT
Science and technology constantly strive to make human life easier. People with complete blindness or low vision face many difficulties during navigation. Blindness can occur for many reasons, including disease, injury, or other conditions that limit vision. The main purpose of this project is to develop a navigation aid for blind and visually impaired people. In this work, we design and implement a system which helps blind and visually impaired people navigate freely by experiencing their surroundings. The scene around the person is captured using a Raspberry Pi camera and the objects in the scene are detected. Earphones give a voice output describing the detected objects. The architecture of the system includes a Raspberry Pi 3 processor, the Raspberry Pi camera, earphones, and a power source. The processor collects frames of the surroundings and converts them to voice output. The device uses the YOLO (You Only Look Once) object detection framework for object detection and classification. YOLO makes it possible to build machine learning models capable of identifying and classifying multiple objects in a single image; thus, details corresponding to the various objects present within a single frame are obtained. A text-to-speech synthesizer (TTS) called eSpeak is used to convert the details of the detected objects (in text format) into speech output. The video captured by the Raspberry Pi camera is therefore finally converted to speech signals, and a narration of the scene describing the various objects is produced. Objects belonging to 90 different classes, such as cell phone, vase, person, and couch, are detected.
TABLE OF CONTENTS
CHAPTER NO.    TITLE    PAGE NO.
    Acknowledgement    i
    Abstract    ii
    Table of Contents    iii
    List of Figures    iv
CHAPTER 1 INTRODUCTION 2
1.1 Overview 2
1.2 Existing system 4
1.3 Problem statement 4
1.4 Proposed system 5
1.5 Objective of the project 5
CHAPTER 2 LITERATURE REVIEW 6
CHAPTER 3 METHODOLOGY 10
3.1 Flowchart 10
3.1.1 Image acquisition 11
3.1.2 Noise removal 11
3.1.3 Edge detection 11
3.1.4 Edge linkage 12
3.1.5 Processed image 13
3.1.6 Object detection 13
3.2 Image to text conversion 14
3.3 Text to speech conversion 14
3.4 GPS system 15
CHAPTER 4 BLOCK DIAGRAM AND DESIGN 16
4.1 Hardware Requirements 17
4.1.1 Raspberry Pi 18
LIST OF FIGURES
FIGURE NO.    DESCRIPTION    PAGE NO.
4.3 Raspberry Pi 18
4.7 Battery 22
4.9 ESP8266 24
CHAPTER 1
INTRODUCTION
1.1 Overview
On our planet of 7.4 billion humans, 285 million people are visually impaired, of whom 39 million are completely blind, i.e., have no vision at all, and 246 million have mild or severe visual impairment (WHO, 2011). It has been predicted that by the year 2020 these numbers will rise to 75 million blind and 200 million visually impaired people.
As reading is of prime importance in the daily routine of mankind (text being present everywhere, from newspapers, commercial products, and sign boards to digital screens), visually impaired people face a lot of difficulties. Our device assists the visually impaired by reading out text to them. There have been numerous advances in this area to help the visually impaired read without much difficulty. The existing technologies use an approach similar to the one described in this report, but they have certain drawbacks. Firstly, the input images used in previous works have no complex background, i.e., the test inputs are printed on a plain white sheet. It is easy to convert such images to text without pre-processing, but such an approach is not useful in a real-time system. Also, in methods that use segmentation of characters for recognition, the characters are read out as individual letters rather than complete words, which gives an undesirable audio output to the user. For our project, we wanted the device to be able to detect text against any complex background and read it efficiently.
Today, many physically challenged individuals depend on assistive technologies to undertake their day-to-day activities. As a result, they require additional support during and after disasters, especially when infrastructure and other services are unavailable. Different disaster management plans (Duncan et al., 2018; Ulmasova, Silcock, & Schranz, 2009; World Health Organization, 2011) have been put forward addressing groups with special requirements. Compared to the diversity of the problems and the size of this population, this is still minimal (World Health Organization, 2017). The term 'disability' covers a wide range of disability forms; this study, however, focuses on individuals living with visual impairment. A World Health Organization (WHO) report states that there are 285 million people with visual impairment worldwide; according to these statistics, 39 million of this group are completely blind.
Initially, visually impaired people used a white cane to navigate, but they faced many problems while doing so. To overcome this, IR sensors were fitted to the white cane for collision avoidance. The IR sensor works on a basic principle: the IR transmitter sends an infrared signal that strikes an object, bounces off it, and is received by the IR receiver, which captures the signal and detects the object in front of the user; the user is then alerted through sound or vibration.
The ultrasonic sensor system works in the same way as the IR sensor system, but the ultrasonic sensor is more advanced: it measures distance using ultrasonic sound waves, which helps the visually impaired person judge their surroundings much more accurately.
Another objective is to read the text on sign boards, name boards, etc. For this we use a TTS engine that converts the text message to speech; the output is in the form of a sound signal. A GPS system is also used for live tracking, for the safety of the user, and for routing to the destination.
The objective of this research study is to design an assistive wearable cap for the blind or
visually impaired persons. The proposed solution presented is an assistive wearable ‘Pilot
System’ that helps people with visual impairment interact and navigate their way to safety by
wearing a cap fitted with a camera, which interacts with an online voice navigation system.
This solution could support visually impaired people to navigate their way to safety as
well as identify dangerous objects, ditches, fire and flood water scenarios after disasters. The
organization of this paper is as follows: first, related assistive technologies for the visually
impaired are discussed. Next, a description of the design of the proposed
solution along with the physical implementation of the pilot system is presented. The paper
concludes with some proposed future work.
To develop an object/obstacle detection device using real-time image processing.
To develop a user tracking mechanism using a GSM module.
CHAPTER 2
LITERATURE REVIEW
Abstract: In order to help visually challenged people, a study is proposed that helps them walk more confidently. The study hypothesizes that a smart walking stick that alerts visually impaired people to obstacles, pits, and water in front of them could help them walk with fewer accidents. It outlines a better navigational tool for the visually impaired. It consists of a simple walking stick equipped with sensors that give information about the environment. GPS technology is integrated with pre-programmed locations to determine the optimal route to be taken. The user can choose a location from the set of destinations stored in memory and is led in the correct direction by the stick. The system uses an ultrasonic sensor, a pit sensor, a water sensor, a GPS receiver, a level converter, a driver, a vibrator, a voice synthesizer, a keypad, a speaker or headphones, a PIC controller, and a battery. The overall aim of the device is to provide a convenient and safe way for the blind to overcome their difficulties in daily life.
Title: Objects Talk - Object Detection and Pattern Tracking Using Yolo
Year: 2018
Author: Rasika Phadnis; Jaya Mishra; Shruti Bendale
Abstract: Objects in a household that are frequently in use often follow certain patterns with respect to time and geographical movement. Analyzing these patterns can help us keep better track of our objects and maximize efficiency by minimizing the time wasted in forgetting or searching for them. In our project, we used Yolo, a relatively new library, to model our neural network. The Yolo object detection API is used to detect multiple objects in real-time video streams. We then introduce an algorithm to detect patterns and alert the user if an anomaly is found. We consider the research presented by Laube et al., Finding REMO - detecting relative motion patterns in geospatial lifelines, 201-214, (2004).
environment aided by their hearing and crutches and then based on a vague impression
speculate where they are located. To let the blind confidently take each step, this paper
proposes sticking the electronic tag of the radio-frequency identification (RFID) system on
the back of guide bricks.
Title: Design and Implementation of a Walking Stick Aid for Visually Challenged People
Year: 2019
Author: Nilima Sahoo, Hung-Wei Lin
Abstract: Visually challenged people (VCPs) face many difficulties in their routine life.
Usually, in many cases, they need to depend upon others, which makes them unconfident in
an unfamiliar environment. Thus, in this paper, we present an aid that helps in detecting
obstacles and water puddles in their way. This system comprises a walking stick and Android-
based applications (APPs). The walking stick is embedded with Raspberry Pi and
programmable interface controller (PIC) as a control kernel, sensors, a global position system
(GPS) module, and alert-providing components. Sensors help to detect obstacles, and the VCP
is informed through vibrations or a buzzer according to the obstacle detected. The GPS
module receives the coordinates of the VCP’s location, and the location can be tracked by
parents using an APP. Another important APP is used, called an emergency APP, by which
the VCP can communicate with parents or friends immediately by just shaking his/her cell
phone or pushing the power button four times in 5 s in panic situations. We used fewer
components to make the device simple, lighter, and cozy with very good features. This device
will help VCPs to live an independent life up to some extent (with security), which ultimately
will increase their confidence level in an unknown environment.
CHAPTER 3
METHODOLOGY
In order to increase the safety of the visually impaired person, this blind assistance system can be used to help the blind person judge and detect an oncoming object in advance and to inform the user about the obstacle through a voice command. Each frame captured by the camera is processed; detected objects are compared with reference data, and algorithms are applied to these frames based on predetermined constraints.
3.1 Flowchart
Fig 3.4: Original captured image
Fig 3.5: Resized gray-scale version
Fig 3.6: Edge map of the captured image
Fig 3.7: Obstacle extracted from the foreground
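As an illustration of the pre-processing steps listed in the flowchart (image acquisition, noise removal, edge detection, edge linkage), the following is a minimal OpenCV sketch of the kind of pipeline that produces images like Figs. 3.4-3.7. The file names and parameter values are placeholders, not the project's actual settings.

    import cv2

    # Fig 3.4: original captured frame (file name is a placeholder)
    frame = cv2.imread("captured_frame.jpg")
    small = cv2.resize(frame, (320, 240))                # shrink to reduce processing load

    # Fig 3.5: resized gray-scale version (3.1.1 image acquisition)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

    # 3.1.2 noise removal and 3.1.3 edge detection -> Fig 3.6 edge map
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # 3.1.4 edge linkage and 3.1.5/3.1.6 foreground extraction -> Fig 3.7
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    outlined = cv2.drawContours(small.copy(), contours, -1, (0, 255, 0), 2)

    cv2.imwrite("edge_map.png", edges)
    cv2.imwrite("obstacles.png", outlined)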
CHAPTER 4
The schematic diagram of the project is simple and can be understood easily. Video data is acquired from the camera module and is processed using MATLAB. A GPS system is used to track and direct the visually impaired person to the destination. OCR technology is used to convert the image to text, and the TTS engine reads out the text coming from the OCR software.
Fig 4.1 above shows the block diagram of the system. All the sensors are connected to the controller (Raspberry Pi) module, and the Python code decides whether an object is present in front of the user. The data from the OCR and GPS modules are sent to the user through the speaker.
The above figure shows the circuit diagram of the system, which uses a Pi camera for capturing images, a GPS module for tracking the user's location, a speaker for audio output, and a GSM module for internet connectivity.
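The flow described above can be summarised as a simple control loop. The sketch below is only an outline, not the project's actual code: detect_objects and read_text are hypothetical stubs standing in for the object detection and OCR stages, and the espeak call assumes the eSpeak binary is installed.

    import subprocess
    import time
    import cv2

    def detect_objects(frame):
        """Hypothetical stub: run the object detection model and return label strings."""
        return []

    def read_text(frame):
        """Hypothetical stub: run OCR on the frame and return any recognised text."""
        return ""

    def speak(message):
        """Send text to the eSpeak text-to-speech engine."""
        subprocess.run(["espeak", message])

    camera = cv2.VideoCapture(0)              # Pi camera exposed as /dev/video0
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                continue
            for label in detect_objects(frame):   # objects in front of the user
                speak(label + " ahead")
            text = read_text(frame)               # sign-board / name-board text
            if text:
                speak(text)
            time.sleep(0.5)                        # pacing between announcements
    finally:
        camera.release()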
The organization behind the Raspberry Pi consists of two arms. The first two models were
developed by the Raspberry Pi Foundation. After the Pi Model B was released, the Foundation
set up Raspberry Pi Trading, with Eben Upton as CEO, to develop the third model, the B+.
Raspberry Pi Trading is responsible for developing the technology.
Current monitors on the USB ports mean the B+ now supports hot-plugging.
Current limiter on the 5 V line for HDMI means HDMI cable-powered VGA converters work in all cases.
14 more GPIO pins.
EEPROM readout support for the new HAT expansion boards.
Higher drive capacity for analog audio out, from a separate regulator, which means better audio DAC quality.
No more back-powering problems, due to the USB current limiters which also inhibit back flow, together with the "ideal power diode".
Composite output moved to the 3.5 mm jack.
Connectors now moved to two sides of the board rather than the four of the original device.
Ethernet LEDs moved to the Ethernet connector.
Four squarely-positioned mounting holes for more rigid attachment to cases.
a) Model A/A+
This is the basic device, with a single USB port and 256 MB of SDRAM. Onboard connectors include a full-size SD card slot, an HDMI output port, a 26-pin expansion header exposing GPIO, a 3.5 mm audio jack, a camera interface port (CSI-2), an LCD display interface port (DSI), and one micro-USB power connector for powering the device.
4.1.6 PI CAMERA:
The Pi camera module is a portable light weight camera that supports Raspberry Pi. It
communicates with Pi using the MIPI camera serial interface protocol. It is normally used in
image processing, machine learning or in surveillance projects. It is commonly used in
surveillance drones since the payload of camera is very less. Apart from these modules Pi can
also use normal USB webcams that are used along with computer
Pi Camera Features
5MP color camera module without microphone for Raspberry Pi
Supports both Raspberry Pi Model A and Model B
MIPI Camera serial interface
OmniVision OV5647 camera module
Resolution: 2592 × 1944
Supports 1080p, 720p and 480p video
Lightweight and portable (3 g only)
This Raspberry Pi camera is used in this model for online streaming. Online streaming from the camera is handled by the Raspberry Pi processor and can be viewed by typing the IP address of the host into any web browser. Thus the activities taking place in front of the camera can be viewed anytime and anywhere over an internet connection, which helps ensure security. The Raspberry Pi camera module can be used to take high-definition video as well as still photographs. It is easy to use for beginners, but has plenty to offer advanced users looking to expand their knowledge.
There are lots of examples online of people using it for time-lapse, slow-motion and other
video cleverness. You can also use the libraries we bundle with the camera to create effects.
If you’re interested in the nitty-gritty, you’ll want to know that the module has a five
megapixel fixed-focus camera that supports 1080p30, 720p60 and VGA90 video modes, as
well as stills capture. It attaches via a 15cm ribbon cable to the CSI port on the Raspberry Pi.
It can be accessed through the MMAL and V4L APIs, and there are numerous third-party
libraries built for it, including the Pi Camera Python library.
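As a rough illustration of driving the module from Python, here is a minimal sketch using the picamera library. It assumes Raspberry Pi OS with picamera installed; the file names and chosen resolution are arbitrary.

    from time import sleep
    from picamera import PiCamera

    camera = PiCamera()
    camera.resolution = (1280, 720)       # well within the 2592 x 1944 still resolution

    camera.start_preview()
    sleep(2)                              # give the sensor time to adjust exposure
    camera.capture('still.jpg')           # single still photograph

    camera.start_recording('clip.h264')   # 720p video clip
    sleep(10)
    camera.stop_recording()
    camera.stop_preview()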
4.1.7 Battery
Lithium polymer cells have evolved from lithium-ion and lithium-metal batteries. The primary difference is that instead of using a liquid lithium-salt electrolyte (such as LiPF6) held in an organic solvent (such as EC/DMC/DEC), the battery uses a solid polymer electrolyte (SPE) such as poly(ethylene oxide) (PEO), poly(acrylonitrile) (PAN), poly(methyl methacrylate) (PMMA) or poly(vinylidene fluoride) (PVdF).
A typical cell has four main components: positive electrode, negative electrode, separator, and electrolyte. The separator itself may be a polymer, such as a microporous film of polyethylene (PE) or polypropylene (PP); thus, even when the cell has a liquid electrolyte, it will still contain a "polymer" component. In addition, the positive electrode can be further divided into three parts: a lithium transition-metal oxide (such as LiCoO2 or LiMn2O4), a conductive additive, and a polymer binder of poly(vinylidene fluoride) (PVdF). The negative electrode material may have the same three parts, only with carbon replacing the lithium metal oxide. The voltage of a single LiPo cell depends on its chemistry and varies from about 4.2 V (fully charged) to about 2.7–3.0 V (fully discharged), where the nominal voltage is 3.6 or 3.7 V (roughly the middle of the highest and lowest values), for cells based on lithium metal oxides (such as LiCoO2); this compares to 1.8–2.0 V (discharged) to 3.6–3.8 V (charged) for those based on lithium iron phosphate (LiFePO4).
The Global Positioning System (GPS) makes use of signals sent by satellites in space and ground stations on Earth to accurately determine position on Earth. Radio-frequency signals sent from satellites and ground stations are received by the GPS receiver, which uses them to determine its exact position; the receiver itself does not need to transmit any information. The signals received from the satellites and ground stations contain time stamps recording when the signals were transmitted. By calculating the difference between the time when a signal was transmitted and the time when it was received, and knowing the speed of the signal, the distance between the satellite and the GPS receiver can be determined using the simple distance = speed × time relation. For example, a signal travelling at the speed of light (about 3 × 10^8 m/s) that takes roughly 0.07 s to arrive corresponds to a satellite about 21,000 km away.
The GPS receiver module uses UART communication to communicate with a controller or PC terminal. Before using the UART on the Raspberry Pi, it must be configured and enabled. For more information about the UART on the Raspberry Pi and how to use it, refer to the Raspberry Pi UART documentation.
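A minimal sketch of reading NMEA sentences from the GPS module over the Pi's UART is shown below. It assumes the receiver is wired to /dev/serial0 at 9600 baud and that the pyserial and pynmea2 packages are installed; the port name and baud rate may differ on other setups.

    import serial      # pyserial
    import pynmea2     # NMEA-0183 sentence parser

    with serial.Serial('/dev/serial0', baudrate=9600, timeout=1) as port:
        while True:
            line = port.readline().decode('ascii', errors='ignore').strip()
            if line.startswith('$GPGGA'):       # GGA sentence: fix time, latitude, longitude
                fix = pynmea2.parse(line)
                print('lat:', fix.latitude, 'lon:', fix.longitude)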
The ESP8266 originally had very little English-language documentation, which prompted the community to investigate the chip and the software on it, as well as to translate the Chinese documentation. The ESP8285 is an ESP8266 with 1 MiB of built-in flash, allowing the building of single-chip devices capable of connecting to Wi-Fi.
These microcontroller chips have been succeeded by the ESP32 family of devices, including the pin-compatible ESP32-C3.
The ESP8266 can be prevented from booting if some pins are pulled LOW or HIGH. The following list shows the state of these pins at boot:
GPIO16: pin is high at BOOT
GPIO0: boot failure if pulled LOW
GPIO2: pin is high on BOOT, boot failure if pulled LOW
GPIO15: boot failure if pulled HIGH
GPIO3: pin is high at BOOT
GPIO1: pin is high at BOOT, boot failure if pulled LOW
GPIO10: pin is high at BOOT
GPIO9: pin is high at BOOT
An ultrasonic sensor is an electronic device that measures the distance to a target object by emitting ultrasonic sound waves and converting the reflected sound into an electrical signal. Ultrasonic waves have a frequency above the range of audible sound (i.e., the sound that humans can hear). Ultrasonic sensors have two main components: the transmitter (which emits the sound using piezoelectric crystals) and the receiver (which picks up the sound after it has travelled to and from the target).
In order to calculate the distance between the sensor and the object, the sensor measures the time between the emission of the sound by the transmitter and its arrival at the receiver. The formula for this calculation is D = ½ T × C (where D is the distance, T is the time, and C is the speed of sound, about 343 metres/second). For example, if an ultrasonic sensor is aimed at a box and it takes 0.025 seconds for the sound to bounce back, the distance between the ultrasonic sensor and the box would be D = ½ × 0.025 s × 343 m/s ≈ 4.3 m.
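For reference, a minimal HC-SR04-style measurement sketch on the Raspberry Pi GPIO is given below; the pin numbers are assumptions chosen for illustration, and the project's actual wiring may differ.

    import time
    import RPi.GPIO as GPIO

    TRIG, ECHO = 23, 24                  # assumed BCM pin numbers
    SPEED_OF_SOUND = 343.0               # m/s in air at about 20 degrees C

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(TRIG, GPIO.OUT)
    GPIO.setup(ECHO, GPIO.IN)

    def measure_distance():
        # 10 microsecond trigger pulse starts the ultrasonic burst
        GPIO.output(TRIG, True)
        time.sleep(0.00001)
        GPIO.output(TRIG, False)

        start = end = time.time()
        while GPIO.input(ECHO) == 0:     # wait for the echo pulse to start
            start = time.time()
        while GPIO.input(ECHO) == 1:     # wait for the echo pulse to end
            end = time.time()

        round_trip = end - start         # T in the formula D = 1/2 * T * C
        return 0.5 * round_trip * SPEED_OF_SOUND

    print("distance: %.2f m" % measure_distance())
    GPIO.cleanup()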
Ultrasonic sensors are used primarily as proximity sensors. They can be found in automobile
self-parking technology and anti-collision safety systems. Ultrasonic sensors are also used in
robotic obstacle detection systems, as well as manufacturing technology. In comparison to
infrared (IR) sensors in proximity sensing applications, ultrasonic sensors are not as susceptible
to interference of smoke, gas, and other airborne particles (though the physical components are
still affected by variables such as heat).
Ultrasonic sensors are also used as level sensors to detect, monitor, and regulate liquid
levels in closed containers (such as vats in chemical factories). Most notably, ultrasonic
technology has enabled the medical industry to produce images of internal organs, identify
tumors, and ensure the health of babies in the womb.
PYTHON FEATURES:
1. Easy to code:
Python is a high-level programming language. Python is very easy to learn compared to other languages such as C, C++, C#, and Java. It is very easy to code in Python, and anybody can learn the basics of Python in a few hours or days. It is also a developer-friendly language.
2. Free and Open Source:
The Python language is freely available on the official website and can be downloaded via the Download Python link there.
3. Object-Oriented Language:
One of the key features of Python is object-oriented programming. Python supports object-oriented concepts such as classes, objects, and encapsulation.
5. High-Level Language:
Python is a high-level language. When we write programs in python, we do not need to
remember the system architecture, nor do we need to manage the memory.
6. Extensible feature:
Python is an extensible language. We can write some of our Python code in C or C++ and compile that code as C/C++.
9. Interpreted Language:
Python is an interpreted language, because Python code is executed line by line. Unlike languages such as C or C++, there is no need to compile Python code first, which makes it easier to debug. The source code of Python is converted into an intermediate form called bytecode.
1) Web Applications
We can use Python to develop web applications. It provides libraries to handle internet protocols and formats such as HTML, XML, and JSON, along with e-mail processing, requests, BeautifulSoup, feedparser, etc. It also provides frameworks such as Django, Pyramid, and Flask to design and develop web-based applications. Some important developments are Python wiki engines, Python blog software, etc.
3) Software Development
Python is helpful for the software development process. It works as a support language and can be used for build control and management, testing, etc.
5) Business Applications
Python is used to build Business applications like ERP and e-commerce systems. Tryton is a
high-level application platform.
4.2.3 OpenCV:
OpenCV [OpenCV] is an open source computer vision library available from
http://SourceForge.net/projects/opencvlibrary. The library is written in C and C++ and runs
under Linux, Windows and Mac OS X. There is active development on interfaces for Python,
Ruby, MATLAB, and other languages. OpenCV was designed for computational efficiency
and with a strong focus on Real-time applications. OpenCV is written in optimized C and can
take advantage of multicore processors. If you desire further automatic optimization on Intel
architectures [Intel], you can buy Intel’s Integrated Performance Primitives (IPP) libraries
[IPP], which consist of low-level optimized routines in many different algorithmic areas.
OpenCV automatically uses the appropriate IPP library at runtime if that library is installed.
One of OpenCV’s goals is to provide a simple-to-use computer vision infrastructure that helps
people build fairly sophisticated vision applications quickly. The OpenCV library contains
over 500 functions that span many areas in vision, including factory product inspection,
medical imaging, security, user interface, camera calibration, stereo vision, and robotics.
Because computer vision and machine learning often go hand in hand, OpenCV also contains a full, general-purpose Machine Learning Library (MLL). This sub-library is focused on statistical pattern recognition and clustering. The MLL is highly useful for the vision tasks
that are at the core of OpenCV’s mission, but it is general enough to be used for any machine
learning problem.
4.2.4 Raspbian OS
Raspbian OS is one of the official Operating systems available for free to download and use.
The system is based on Debian Linux and is optimized to work efficiently with the Raspberry
Pi computer. As we already know an OS is a set of basic programs and utilities that runs on a
specified hardware, in this case the Pi. Debian is very lightweight and makes a great choice for
the Pi. Raspbian includes tools for browsing, Python programming, and a GUI desktop. The Raspbian desktop environment is known as the Lightweight X11 Desktop Environment, or LXDE for short. It has a fairly attractive user interface built using the X Window System software and offers a familiar point-and-click interface. We shall look more into how to install and
use this OS in the next section.
Step 1: Take the Pi out of its anti-static cover and place it on the non-metal table.
Step 2: Connect the display – Connect the HDMI cable to the HDMI port on the Pi and the
other end of the HDMI cable to the HDMI port of the TV.
Step 3: Connect your Ethernet cable from the Router to the Ethernet port on the Pi
Step 4: Connect your USB mouse to one of the USB ports on the Pi
Step 5: Connect your USB Keyboard to the other USB port on the Pi
Step 6: Connect the micro-USB charger to the Pi but don’t connect it to the power
Languages:
The eSpeak speech synthesizer supports several languages, however in many cases these are
initial drafts and need more work to improve them. Assistance from native speakers is welcome
for these, or other new languages. Please contact me if you want to help.
eSpeak does text to speech synthesis for the following languages, some better than others.
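As an illustration, eSpeak can be driven from Python through its command-line interface. The sketch below assumes the espeak binary is installed; the voice and speaking rate shown are illustrative values, not the project's settings.

    import subprocess

    def speak(text, voice="en", words_per_minute=150):
        """Speak the given text through eSpeak on the default audio output."""
        subprocess.run(["espeak", "-v", voice, "-s", str(words_per_minute), text], check=True)

    speak("Person detected two metres ahead")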
4.2.5 Yolo
Yolo is a framework composed of two core building blocks — a library for defining
computational graphs and a runtime for executing such graphs on a variety of different
hardware. A computational graph has many advantages but more on that in just a moment.
Computational Graphs
In a nutshell, a computational graph is an abstract way of describing computations as a directed
graph. A directed graph is a data structure consisting of nodes (vertices) and edges. It’s a set of
vertices connected pairwise by directed edges.
Operations create or manipulate data according to specific rules. In Yolo those rules are called
ops, short for operations. Variables on the other hand represent shared, persistent state that
can be manipulated by running Ops on those variables.
The edges correspond to data, or multidimensional arrays (so-called Tensors) that flow
through the different operations. In other words, edges carry information from one node to
another. The output of one operation (one node) becomes the input to another operation and
the edge connecting the two nodes carry the value.
Here’s an example of a very simple program:
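The code listing itself is missing from this copy of the report; judging from the description that follows, it was presumably equivalent to something like:

    a = 15               # these could equally be constants
    b = 5

    prod = a * b         # multiply them
    total = a + b        # take their sum
    res = prod / total   # divide one result by the other

    print(res)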
To create a computational graph out of this program, we create nodes for each of the operations
in our program, along with the input variables a and b. In fact, a and b could be constants if
they don’t change. If one node is used as the input to another operation, we draw a directed
arrow that goes from one node to another.
The computational graph for this program might look like this:
Computational graph representing our simple program and its data flow
This graph is drawn from left to right but you may also find graphs that are drawn from top to
bottom or vice versa. The reason why I chose the former is simply because I find it more
readable.
The computational graph above represents distinct computational steps that we need to
execute to arrive at our final outcome. First, we create two constants a and b. Then, we
multiply them, take their sum and use the results of those two operations to divide one by the
other. And finally, we print out the result.
This is not too difficult, but the question is: why do we need a computational graph for this? What are the advantages of organizing computations as a directed graph? In a conventional program, statements are executed, often sequentially, line by line. This means we would first multiply a and b, and only when this expression was evaluated would we take their sum. So the program specifies the order of execution, whereas a computational graph exclusively specifies the dependencies across the operations; in other words, it describes how the output of these operations flows from one operation to another.
Now that we have a solid foundation let’s look at the core parts that constitute a computational
graph in Yolo. These are the parts that we will later on re-implement from scratch.
Yolo Basics:
A computational graph in Yolo consists of several parts:
Variables: Think of Yolo variables like normal variables in our computer programs. A
variable can be modified at any point in time, but the difference is that they have to be
initialized before running the graph in a session. They represent changeable parameters
within the graph. A good example for variables would be the weights or biases in a neural
network.
Placeholders: A placeholder allows us to feed data into the graph from outside and unlike
variables they don’t need to be initialized. Placeholders simply define the shape and the
data type. We can think of placeholders as empty nodes in the graph where the value is
provided later on. They are typically used for feeding in inputs and labels.
Graph: A graph is like a central hub that connects all the variables, placeholders, constants
to operations.
Remember from the beginning that we said Yolo is composed of two parts, a library for defining computational graphs and a runtime for executing these graphs. That's the Graph and the Session. The Graph class is used to construct the computational graph, and the Session is used to execute and evaluate all or a subset of nodes. The main advantage of deferred execution is that during the definition of the computational graph we can construct very complex expressions without directly evaluating them and without allocating the memory that would be needed.
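The snippet referred to in the next paragraph is not reproduced in this copy. It presumably resembled the following comparison; this sketch uses NumPy alongside a TensorFlow-style graph API (matching the text's mention of tf.zeros), and keeps the eagerly allocated NumPy array small so that the example actually runs.

    import numpy as np
    import tensorflow.compat.v1 as tf   # TF1-style graph mode, matching the description
    tf.disable_eager_execution()

    # Declaring a huge all-zeros tensor only records its shape and dtype in the graph;
    # no memory for the values is allocated at this point.
    big = tf.zeros((10**6, 10**6))

    # The NumPy equivalent would try to allocate the full array immediately and
    # exhaust memory, so only a small array is created here to show the contrast.
    small_np = np.zeros((3, 3))

    with tf.Session() as sess:
        small_tf = sess.run(tf.zeros((3, 3)))   # values are materialised only when run

    print(small_np)
    print(small_tf)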
In the snippet above we use both tf.zeros and np.zeros to create a matrix with all elements set to zero. While NumPy will immediately allocate the amount of memory needed for, say, a trillion-by-trillion matrix filled with zeros, Yolo will only declare the shape and the data type, and will not allocate the memory until this part of the graph is executed.
4.3 Result
The user is able to visualize the path and the distance of objects in the path. The user can operate the device without any external help, as it is a simple and user-friendly device. The user can be located on the map by GPS, and the user's guardian can locate and track them. Overall, the device helps the user by indicating the path and recognising the environment through voice commands.
CHAPTER 5
CONCLUSION
The system has a simple architecture that transforms the visual information captured by a camera into voice information using a Raspberry Pi. Unlike other systems available in the market, the subject needs only to wear the helmet and does not require any particular skills to operate it. The proposed system is cheap and configurable. Any blind or visually impaired person can use it simply, since he/she only has to power up the device. The system helps with clear path indication and environment recognition. The device is a real-time system that monitors the environment and provides audio information about it, making the user's navigation safer and more secure.
The blind assistance and navigation system will be really helpful for blind people in their navigation. The object detection can be extended to count the number of objects in a scene. In this work, the COCO dataset is used to train the SSD MobileNet model, which performs the object detection. The number of detectable object classes can be increased by training the model ourselves. Face detection can also be incorporated so that the blind person can easily identify his/her family members and friends.
CHAPTER 6
FUTURE SCOPE
To address this challenge, this research study has proposed the blind assistance and navigation
system solution that can be utilized by the visually impaired for normal activities, and
especially during disaster situations. This blind assistance and navigation system device
provides a real-time navigation and narrative system. The device is cost effective (about NZD
200), which makes it affordable and accessible for the wider community who suffer from this
problem. We hope that this proposed blind assistance and navigation system can be a step to
providing the visually-impaired people with the missing support and services they so
desperately need during and after disaster situations. This research work is only a proof of concept; in our future work, we hope to make a complete standalone version with additional assistive functionalities for the blind.