
2022 5th International Conference on Advances in Science and Technology (ICAST)

AI-Powered Smart Glasses for Blind, Deaf, and Dumb

Sneha M
Department of Computer Engineering
R.M.K Engineering College
Tamilnadu, India

Shree Lakshmi R
Department of Computer Engineering
R.M.K Engineering College
Tamilnadu, India

A. Thilagavathy
Department of Computer Engineering
R.M.K Engineering College
Tamilnadu, India

Swetha K
Department of Computer Engineering
R.M.K Engineering College
Tamilnadu, India

978-1-6654-9263-8/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICAST55766.2022.10039557

Abstract— Technology has now advanced far beyond what was once possible. We can determine what a deaf or mute person is saying simply by capturing it and comparing it to predetermined datasets, revealing what the individual is trying to convey. In this study, we propose a way of supporting impaired people, such as those who are deaf, dumb, or blind, by giving them a new technology that acts as an eye, an ear, and a brain for them. Machine learning methods are employed for object recognition, with the aid of image processing, to give the blind a pair of eyes. The deaf and the dumb can both benefit from text-to-speech and speech-to-text translation communicated over Bluetooth or radio technology. The convergence of all these technologies, together with AI, AR, VR, and IoT, will help in finding solutions to the problems that people with these disabilities face.

Keywords— Internet of Things, Artificial Intelligence, image processing, wireless communication, Bluetooth communication, Augmented Reality, Virtual Reality.

I. INTRODUCTION

According to the World Health Organization, there are around 285 million blind persons, 300 million deaf persons, 1 million mute persons, and many more afflicted with one or more of the disabilities listed above.

Inability to communicate: Visual impairments impede a person's ability to communicate with others, acquire information, and build his or her own knowledge and experience; the need for assistive aids with multitasking capabilities to cope with a variety of scenarios is therefore critical.

No advancement in technology: The aids and technologies available on the market today are expensive and serve only one purpose at a time. For example, a device designed for deaf people cannot be used by blind people. Because no device on the market can satisfy all of these needs, new devices with multiple functions and lower prices are required.

Transportation: Getting about and exploring new locations can be difficult, especially for someone living in a remote region.

Finding objects: Things move by themselves (sarcastically speaking) when someone else moves them and forgets about it afterward. This makes it difficult for handicapped people to reach an item they knew was on the kitchen counter; one might mistakenly or unwittingly use spices instead of sugar when baking cookies, making for quite a baking adventure.

Receiving insufficiently comprehensive scene descriptions: Normally, blind persons are unable to see the outside world and miss out on the majority of outdoor experiences.

People talking slowly throws off lip reading: When people realize someone is deaf, one of the first things they do is switch to a much slower mode of communication. This is frequently done because individuals believe it will aid lip reading, with deliberately exaggerated elocution. In reality, anyone attempting lip reading will find it increasingly difficult as a result. Lip reading depends on people talking normally; that is how lip readers learned to read the shapes people's mouths make as they converse, so that they can have genuinely smooth interactions with others. Switching to a slower pace of speech alters the way the mouth moves and, as a result, makes lip reading harder.

Nighttime and gloomy venues: Dark settings such as clubs or concerts are particularly problematic for persons with hearing problems. In the end, lip reading and gesture-based communication are the only methods used by the hard of hearing to communicate with others, and it quickly becomes difficult to communicate effectively when there is little light. Even dimly lit rooms can cause significant problems for the practically deaf.

There is no universal sign language: As strange as it may seem to a newcomer, gesture-based communication is not a universal tongue. Even the distinctions between British Sign Language and American Sign Language are significant, since different nations have different conventions. Combine that with the fact that local regions have their own variations, similar to accents in spoken languages, and you have a slew of common misunderstandings.

We tackled these challenges with the help of several pre-programmed algorithms, which we tweaked to fit our needs. Numerous hardware and software requirements are relevant to this work, and the design was carried out in compliance with those needs.

The following is a list of hardware requirements:


T-OLED (transparent OLED panel): This transparent OLED panel makes it possible for persons who are deaf or dumb to see the speech-to-text translation shown on the glasses' HUD.

OLED: Rather than a T-OLED panel, we could use a normal OLED panel with a resolution of 128x32 and a mirror that reflects the text from the OLED screen directly onto the glass in front of the user's eye, negating the need for any extra gear.

Two IP cameras: Since we have no storage or server in the glass itself, we use IP cameras. The raw video must therefore be transferred to another mobile device for processing and conversion to our output specifications.

Proximity sensors: Used in impact-resistant technology; if an obstruction occurs within the range of the sensors, the user is alerted.

Ultrasonic sensors: The distance to any approaching item or vehicle is calculated using ultrasonic sensors. They take over where proximity sensors cannot be used, at longer distances.

Microphone array: The microphone array is crucial for both the deaf and the blind. The device takes in surrounding sound from the user's present position and processes it in line with the sound input, such as spotting anything approaching from the rear, while the front camera handles its own tasks.

Speaker: To speak the text aloud and play it close to the listener's ear (useful in the case of a blind individual, as detailed in the literature survey). In short, the speaker functions as an earpiece.

The ADC (Analog-to-Digital Converter) is a type of electrical integrated circuit that transforms analog signals into binary signals made up of 1s and 0s. The vast majority of these converters provide digital output in the form of a binary integer and take input voltages in the 0-to-10 V, -5 V to +5 V, and other ranges. We use ADCs to convert the incoming analog signal to digital so that it may be stored in the microcontroller before being sent onward. The circuit diagram is shown in Figure 1.

Fig 1: ADC sound converter module circuit diagram

The DAC (Digital-to-Analog Converter) converts digitally recorded sound back to analog before delivering it to the app for further processing. The circuit diagram is depicted in Figure 2.

Fig 2: DAC sound converter module circuit diagram

Microcontroller: We are considering a BeagleBone Black as the microcontroller for this project. Before commencing a full-fledged manufacturing line, we will convert the modules to SMDs and employ bare CPUs instead of development boards.

II. LITERATURE REVIEW

In [1], a system based on Google Glass was created to serve as a visual assistant for blind and visually impaired persons (BVIP) and to help with scene identification tasks. The built-in camera on the smart glasses is used to take photographs of the surroundings, which are then analyzed using the Custom Vision Application Programming Interface (Vision API) of Microsoft's Azure Cognitive Services. In [2], the study's goal is to provide an updated, complete overview of this research area so that the numerous elements of its interdisciplinary nature may be utilized by developers. In [3], a survey of wearable obstacle-avoidance electronic travel aids for the blind informs the scientific community and users about the capabilities of these systems and about advancements in assistive technology for people with visual impairments; it is a comparative assessment of portable/wearable obstacle detection/avoidance systems (a subclass of ETAs). In [4], the authors provide a technique for automated sign detection and recognition in natural environments and apply it to a sign translation task; using this technique, they developed a system that can recognize Chinese signs from camera input and translate them into English. In [5], the feasibility of converting text to speech using a cheap computer and a small amount of stored data was investigated, though the approach is not appropriate for all computer memory capacities. In [6], background subtraction, a common task in computer vision, is addressed at the conventional pixel level; an effective adaptive approach using a Gaussian mixture probability density is created, with recursive equations used to simultaneously choose the appropriate number of components for each pixel and update the parameters regularly. In [7], the project's objective is to provide a glove-based technology for translating deaf and mute communication. In [8], preliminary research with blind people indicates a variety of difficulties with modern cutting-edge technology, such as problems with alignment, focus, accuracy, mobility, and efficiency.

In [9], a vision-based assistive technology with speech output is suggested for label verification. To assist blind persons in reading text labels and product packaging on hand-held items in their daily lives, the authors propose a camera-based assistive text reading framework. In [10], a smart device that can assist deaf people in understanding and responding to body language with AI algorithm solutions was developed. In [11], the goal of the project is to create voice-based, artificial intelligence (AI) smart gadgets that would utilize


Python, OpenCV, CNN, ASR, the Google text-to-speech and speech-to-text APIs, YOLO, a Web API, Image Search and Image Recognition APIs, and HTML5 Geolocation functionality in addition to the Maps JavaScript API, the latest technology, to show how it may help those who have hearing loss. According to [12], the purpose of this technical IoT smart gadget is to assist in detecting phantom electrical energy use in a smart home setting in order to encourage energy conservation.

The work in [13] concludes that while deep processing structures might enhance this genre, the choice of features and the structure with which they are integrated, including layer width, can also be important considerations. The project in [14] seeks to provide a single-device solution that is easy to use, quick, accurate, and economical; the major goal of the technology is to give persons with disabilities a sense of independence and confidence by seeing, hearing, and speaking for them. The purpose of the study in [15] is to determine how deaf individuals experience voice hallucinations and whether the reported perceptual qualities mirror individual sensory and linguistic experiences. The purpose of the paper in [16] is to alert the research and development community to these gadgets' potential for providing assistance; the usefulness of a number of currently available features of these devices for visually impaired people is explored, recommendations are made to make those features accessible, and some fresh uses for these platforms are suggested.

III. METHODOLOGY

To process the data, ASR, YOLO, the Google voice-to-text API, and other algorithms were used. These algorithms make use of the following devices and technologies:

Fig 3: YOLO result for object detection

Fig 4: YOLO accuracy graph (correct, localization, similar-class, other, and background detections)

The following is a description of the theoretical approach:

A. Deaf Person
For persons who cannot hear, we use cameras in a two-way system with microphones: one camera on the back of the head and another on the front of the glass. The deaf person's rear camera will identify any impending obstacles, such as automobiles, and convey that information to the microcontroller, which processes it. The microphones attached to the glass will pick up any sound in the environment, mostly speech, and transfer it wirelessly to the microcontroller. When the microcontroller receives all of these datasets as input, it will first analyze them against the established constraints and bounds before transmitting them as text messages to the smart glass's HUD, where they will be displayed and acted upon. Imagine a real-life circumstance where a bus is coming up behind a deaf individual. He won't hear it; therefore, the back camera will capture it and send it to the microcontroller. In response, the microcontroller will send a message to the HUD alerting the wearer that a vehicle is approaching.

B. Dumb People
We went app-based rather than hardware-based in the case of a person who cannot speak. For the dumb, we have created an app (akin to an Android or iOS app) that detects (captures) the person's hand movements or gestures and converts them to voice or text. The AI glass itself will only work on the receiving side for the dumb, not for the transfer of information. The receiver (a physically fit person) will point the video camera at the dumb person performing hand signals. Our application's video camera will capture the video, process it, and display the needed information on the smartphone screen or read it out as speech.
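The deaf-person flow above ends with text shown on the smart glass's HUD. As a rough illustration, assuming the 128x32 OLED from the hardware list and a hypothetical 21-column by 4-row character grid (our assumption, not a figure from the paper), an alert or recognized utterance could be wrapped for display like this:

```python
# Hypothetical sketch: wrap a recognized utterance or a rear-camera
# alert into short lines that fit a small 128x32 OLED HUD.
# HUD_COLS and HUD_ROWS are assumed values (128 px / ~6 px per glyph,
# 32 px / 8 px per text row), not specifications from the paper.
import textwrap

HUD_COLS = 21  # assumed characters per HUD line
HUD_ROWS = 4   # assumed text rows on the HUD

def format_for_hud(message: str) -> list[str]:
    """Wrap a message into at most HUD_ROWS lines for the glasses HUD."""
    lines = textwrap.wrap(message, width=HUD_COLS)
    if len(lines) > HUD_ROWS:
        # Truncate and mark the overflow so the wearer knows text was cut.
        lines = lines[:HUD_ROWS - 1] + [lines[HUD_ROWS - 1][:HUD_COLS - 1] + "…"]
    return lines

print(format_for_hud("Warning: vehicle approaching from behind"))
```

The same helper would serve both paths: speech-to-text output for the deaf wearer and obstacle alerts from the rear camera.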


C. Blind Person
The front and rear cameras come to the aid of the visually impaired. The front camera performs all of the functions of the back camera, such as detecting incoming obstacles, as well as identifying objects, such as a chair or a table, converting them to speech via the microcontroller and sending the result to the speaker for output, or scanning a QR code, as needed. Occasionally the camera might not be able to distinguish what kind of object it is looking at. The picture will then be captured and a search performed on the collected image, and the top result from that search will be output. Failing that, the system will only be able to name similarly shaped objects and create a trail for the blind person to follow. Proximity sensors have been fitted on the front frame of the smart glass body; these sensors will detect any obstruction within the preset range and inform the user. Furthermore, the blind person will also have a voice-bot assistant: if he has to make a call or find the time and date, rather than asking someone for assistance he can directly use the AI glass.

D. The App
All raw data, video feed, and microphone data will be transmitted to the Android app. The video feed will be relayed to the app through an IP address over a wireless connection, so the backend code of the app will access the IP address of the ESP module to reach the video feed and execute the operations detailed in the Methodology section on it. Receiving direct analog input through the microphone while recording and transmitting would introduce a significant amount of noise. To minimize noise and seamlessly integrate the sound with the ESP clock signal, we will use a microphone to record the sound, convert it to digital using ADCs, and store it in the microcontroller before transmitting it to the app. During transmission the information is converted back to analog via DACs after being read from the microcontroller, so that the app can receive and analyze it on the backend. To notify the user of any impending obstacles that the system decides they should be aware of, the application will automatically acquire distance data via the proximity and ultrasonic sensors.

Fig 5: Diagram showing how the technological aspects of the smart glass will be implemented and operate

There will be two subscriptions in the app to reduce the processing time of the app and the backend code, or we may extend the system to three distinct types, covering persons with complete disabilities (blind and deaf at the same time). One subscription will be used to handle the blind person, who will only receive video inputs from the cameras, with the assistance of a microphone for voice assistance. The other subscription will be used to analyze incoming sounds and speech for the deaf and dumb. Deaf and dumb people are not required to put extra camera systems over their eyes, because they can see normally and without any hassle. We kept all of the hardware described for the blind in order to reduce the strain on the CPU and mobile device. However,


functionality such as listening to incoming conversations will be disabled, and the microphone will only activate in response to calls and other predetermined triggers. The subscription will be selected just once, when the device is first started and paired with the app. For the third sort of subscription, we have no option but to use all of the hardware indicated and all of the processing to assist the person individually.

IV. RESULTS AND DISCUSSION

The following is a detailed account of how the product will benefit a blind person. We must ensure that the URL connects directly to the video stream and not to an HTML page where the video stream is merely embedded, since the filename or URL must have the correct suffix for OpenCV to read a video stream. The vast majority of the processing takes place here. After importing all of the necessary libraries, we first construct an instance of the object-detection class. We then set the model type to YOLOv3, the YOLO model trained on many different sorts of objects for comparison (at least 80 classes). Transferring video to the server: because the ESP32 is also a centralized system, we may upload code straight to the board to transmit video to the server. Once we enter the network credentials into the code, the serial monitor will reveal the board's IP address and display the live footage from the camera. Text-to-voice is based on the same technology as video object detection. We then run the detect-objects-from-video method and pass it the needed parameters (one of which must be the camera input). We translate the detected object's name to voice and transmit it to the associated earpiece, which acts as a speaker, to alert the wearer of an approaching impediment from behind or in front. We also examine the proximity sensors, alerting the user to obstacles from the sides and making him or her aware of the surroundings.

Fig 6: Object detection flowchart for blind people

What function will the product have for a person who is dumb, blind, or deaf?

For blind people:
1) This AI smart glass will assist in detecting objects and allow them to hold a conversation with others.
2) They can also read on their own.
3) They are self-sufficient in terms of money and possessions.
4) They are able to move around obstacles while avoiding them.

For the deaf:
1) They will be able to read what others are saying and reply appropriately.
2) On the HUD, the input may be analyzed and turned into text.

For the dumb:
1) Through the Android software mentioned in the preceding section, they may communicate with others using sign language, and the HUDs are used to convert the sign to voice or text format that can be understood by others.

Fig 7: Product benefit
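The detection loop described in the Results section ultimately reduces to turning a list of labeled detections into one short alert for the earpiece. Below is a minimal sketch of that post-processing step only; the (label, probability) format and the 0.5 confidence threshold are our own assumptions, and the paper's detect-objects-from-video call (which needs a camera and YOLOv3 weights) is deliberately not reproduced:

```python
# Hedged sketch: filter YOLO detections by confidence and compose a
# short spoken alert. Input format and threshold are assumptions.
def detections_to_alert(detections, min_prob=0.5):
    """Turn [(label, probability), ...] into one alert sentence."""
    names = [label for label, prob in detections if prob >= min_prob]
    if not names:
        return "Path clear"
    # De-duplicate while keeping detection order.
    unique = list(dict.fromkeys(names))
    return "Ahead: " + ", ".join(unique)

print(detections_to_alert([("car", 0.91), ("person", 0.87),
                           ("car", 0.66), ("dog", 0.30)]))
# → "Ahead: car, person"
```

The resulting string would then be handed to the text-to-speech stage and played through the earpiece, as described above.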


Because there is no direct technique for converting an image to voice, we must extract the information from the model outlined above and pass it to an ML algorithm for text-to-speech conversion. SpeechRecognition, PyAudio, and pyttsx3 must all be installed. We must handle the incoming sound in such a manner that the software has a second or two to adjust its recording energy threshold to the external noise level. After receiving the needed sound, we perform the speech-to-text translation (an active internet connection is required). We must use the init() method with its parameters to initialize the libraries needed for converting text to speech. Finally, we use runAndWait() to execute the speech; unless the interpreter sees runAndWait(), none of the strings will be spoken. Once we have the needed text output, we can simply output it to the smart glass's HUD. As an add-on, we will use APIs for searching images and identifying objects, as well as the HTML5 Geolocation feature; the Maps JavaScript API is also used to notify the user of GPS position, and if the YOLO model fails to identify some of the objects it notices, a standard Google-based algorithm known as Google Lens can be used to identify the object.

Fig 8: Analysis

V. CONCLUSION

This paper demonstrates how Artificial Intelligence and Internet of Things technologies, along with widely used machine learning algorithms, may be used to tackle problems that have plagued the blind, deaf, and dumb for years. Giving the deaf a voice and helping them understand what those around them are saying may be possible with automatic speech recognition. The blind could be guided while walking with the help of live image transmission and onboard object recognition, effectively giving them an eye to see what is going on around them. Smart glasses would help resolve the issues of many blind, deaf, and dumb people who lack confidence in speaking, and even assist them in various activities. The blind and deaf might both benefit from this efficient, low-cost technology's ability to see and hear.

REFERENCES

[1] H. Ali A., S. U. Rao, S. Ranganath, T. S. Ashwin and G. R. M. Reddy, "A Google Glass Based Real-Time Scene Analysis for the Visually Impaired," IEEE Access, vol. 9, pp. 166351-166369, 2021, DOI: 10.1109/ACCESS.2021.3135024.
[2] S. Real and A. Araujo, "Navigation Systems for the Blind and Visually Impaired: Past Work, Challenges, and Open Problems," Sensors, vol. 19, no. 15, 3404, 2019, DOI: 10.3390/s19153404.
[3] N. G. Bourbakis and D. Dakopoulos, "Wearable Obstacle Avoidance Electronic Travel Aids for Blind: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 1, pp. 25-35, Jan. 2010, DOI: 10.1109/TSMCC.2009.2021255.
[4] X. Chen, J. Zhang and A. Waibel, "Automatic Detection and Recognition of Signs From Natural Scenes," IEEE Transactions on Image Processing, vol. 13, no. 1, pp. 87-99, 2004, DOI: 10.1109/TIP.2003.819223.
[5] W. A. Ainsworth, "A System for Converting English Text into Speech," IEEE Transactions on Audio and Electroacoustics, vol. AU-21, no. 3, pp. 288-290, Jun. 1973.
[6] Z. Zivkovic, "Improved Adaptive Gaussian Mixture Model for Background Subtraction," Proceedings of the International Conference on Pattern Recognition, vol. 2, pp. 28-31, 2004, DOI: 10.1109/ICPR.2004.1333992.
[7] M. Deshpande, P. Deshmukh and S. Mathapati, "Assistive Translation for Deaf and Dumb People," IJECCE, vol. 5, no. 4, Technovision-2014, Jul. 2014.
[8] R. Shilkrot, J. Huber, M. E. Wong, P. Maes and S. Nanayakkara, "FingerReader: A Wearable Device to Explore Printed Text on the Go," pp. 2363-2372, 2015, DOI: 10.1145/2702123.2702421.
[9] Y. Ramesh Babu, "Vision-Based Assistive System for Label Detection with Voice Output," International Journal of Innovative Research in Science, Engineering and Technology, vol. 3, pp. 546-549, 2014.
[10] D. S. Battina and L. Surya, "Innovative study of an AI voice based smart Device to assist deaf people in understanding and responding to their body language," SSRN Electronic Journal, vol. 9, pp. 816-822, 2021.
[11] Harshada, "Smart Communication Assistant for Deaf and Dumb People," International Journal for Research in Applied Science and Engineering Technology, vol. 9, pp. 1358-1360, 2021.
[12] M. Li, W. Gu, W. Chen, Y. He, Y. Wu and Y. Zhang, "Smart Home: Architecture, Technologies and Systems," Procedia Computer Science, vol. 131, pp. 393-400, 2018.
[13] N. Morgan, "Deep and Wide: Multiple Layers in Automatic Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 7-13, 2012.
[14] A. Karmel, A. Sharma, M. Pandya and D. Garg, "IoT based Assistive Device for Deaf, Dumb and Blind People," Procedia Computer Science, vol. 165, pp. 259-269, 2019.
[15] J. Atkinson, K. Gleeson, J. Cromwell and S. O'Rourke, "Exploring the perceptual characteristics of voice-hallucinations in deaf people," Cognitive Neuropsychiatry, vol. 12, no. 4, pp. 339-361, 2007.
[16] R. Jafri and S. A. Ali, "Exploring the potential of eyewear-based wearable display devices for use by the visually impaired," International Conference on User Science and Engineering, Shah Alam, 2-5 September 2014.
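As an illustrative addendum, the SpeechRecognition/pyttsx3 sequence described in the Results section (ambient-noise calibration for about a second, Google speech-to-text over the internet, then init() and runAndWait() for speech output) can be sketched as follows. The microphone path requires hardware and an internet connection and is untested here; the small threshold helper is our own illustrative addition, not part of either library:

```python
# Sketch of the speech pipeline from the Results section, under the
# stated assumptions. Requires: SpeechRecognition, PyAudio, pyttsx3.

def needs_recalibration(threshold, ambient_level, margin=1.5):
    """Illustrative helper (our addition): re-run ambient-noise
    calibration when the room has become noticeably louder relative
    to the recognizer's current energy threshold."""
    return ambient_level * margin > threshold

def listen_and_speak():
    """Microphone -> text -> spoken echo. Needs mic, speakers, internet."""
    import speech_recognition as sr
    import pyttsx3

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Give the recognizer a second to set its energy threshold
        # from the external noise level, as the text describes.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        audio = recognizer.listen(source)
    text = recognizer.recognize_google(audio)  # speech-to-text (online)

    engine = pyttsx3.init()   # initialize the text-to-speech engine
    engine.say(text)
    engine.runAndWait()       # nothing is spoken until this runs
    return text

if __name__ == "__main__":
    print(listen_and_speak())
```

In the device described above, the returned text would additionally be pushed to the HUD for the deaf wearer.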
