
End to End Voice Assisted Inner Website Navigating Model for Handless and Blind


Prof. Karthikeyan A
Computer Science and Engineering
Panimalar Engineering College
Chennai, India
yourskeyan@gmail.com

Manoj Krishnaa
Computer Science and Engineering
Panimalar Engineering College
Chennai, India
manojkrish91298@gmail.com

Jerome
Computer Science and Engineering
Panimalar Engineering College
Chennai, India
jeromeprince99@gmail.com

Abstract—Handicapped (handless) people can access websites using speech recognition and voice based navigation in a browser. It requires no keyboard or mouse for entering text and clicking links; speech recognition software converts speech to text commands, and appropriate actions such as clicking links, searching a query, or playing media are performed. Implementations that help blind and handicapped people already exist, but they do not support many features and functions. Our solution covers a model for navigating within a website comfortably, with more features, end to end: from registration, login, and form submission to logout. It requires a standard template for webpage recognition. It helps handicapped (handless) and blind people navigate inside a website more freely and comfortably, and enables a person to browse the web without hands, even from a distant position. It is implemented efficiently using a natural language processor and a JS execution engine to give an experience as seamless as keyboard or mouse operation. The future scope of this implementation is to add more voice functionality covering all common functions of websites; this voice support feature can also be included as an add-on in newly developed and existing websites.

Keywords—voice based browser, natural language processor, speech recognition, speech to text, Internet browsing.

I. INTRODUCTION
The evolution of natural language processing has made speech driven actions possible. Such methodologies appear both in hardware, such as robotics, and in software, such as voice assistant applications. A speech recognizer can enable smooth web browsing for both handicapped and blind people, and such an implementation can be designed using natural language processing and website technologies to help handless people. This solution enables browsing the web without hands, keyboard, or mouse.
II. RELATED WORK
Voice assistant software [1] provides capabilities for interacting with an application by voice. Through it we can hear conversational AI [2] and respond to it. It browses the web and gives answers in the form of human speech, opens the web pages you name, schedules tasks, and delivers numerous services as speech. But it cannot click a link, fill a form, or play media by voice. These website based operations are the focus of this research.
III. PROPOSED WORK
This solution makes use of the HTML5 Web Speech API and JavaScript. Since the browser understands only frontend languages, it leverages JavaScript for its operations. This implementation offers voice based browser surfing functionality.

INTERACTIONS INVOLVED
i. The user (handless) speaks the command to be performed (example: "click").
ii. The microphone in the personal computer or laptop records the audio and passes the audio signal to the speech recognizer.
iii. The HTML5 Web Speech API [3] extracts the commands from the speech and sends them to the JavaScript application.
iv. The JavaScript application checks whether the received command matches any known command and executes the necessary action in the browser or current webpage (example: opening a new tab, searching a topic, playing media, scrolling, ...).

IV. SYSTEM ARCHITECTURE

Fig. 1. System Architecture

Fig. 1 describes the architecture of this solution. The user, who is handless or at a distant position, speaks out the commands to perform. These commands are listed in the commands table below. The application provides an interface that lets the user follow the voice based operation and shows the progress of execution.

TABLE 1. COMMANDS

Command   Description
Link      Opens links and clicks the desired one
Scroll    Automatically scrolls the webpage from top to bottom and vice versa
Search    Searches the given content in the web
Play      Plays the video
Submit    Submits the form
Zoom      Performs zoom in the current webpage
Copy      Copies the text of the selected element

V. MODULES
The solution is divided into multiple modules, each with separate functionality. Fig. 2 shows the module description.
1. LINK MODULE:
When the user says "link", the application searches for all the links in the current tab's web page and assigns a unique identity to each. The user can then choose which link to click by saying the required link ID, after which the application performs a click operation, navigating to that link by voice.

2. AUTO SCROLL MODULE:
This module features auto scrolling, enabling the user to scroll automatically just by saying the command "scroll". It also gives the option to customize the scrolling speed.

3. MEDIA MODULE:
Capable of opening media sites such as youtube.com and others, with options to play and pause the video, forward the video, and enable full screen display.

4. TAB MODULE:
This module gives the features of opening a new tab, closing an existing tab, and going back and forth between pages. It enables voice based web searching, similar to Google's search by voice.

5. FORM FILLING MODULE:
Forms are difficult for handicapped people to fill. This module overcomes that difficulty by enabling a person to fill a form by voice command, from entering field text and selecting options to submitting the form.

FIG. 2. MODULES

VI. USER INTERFACE
The application is designed as a browser extension, to be published in the Google Chrome extension store so that everyone can easily download, install, and use it.

The user interface is designed so that users can easily use and manipulate the features. Visually, it adds another layer on top of the current tab, creating interfaces dynamically based on the user's speech using HTML, CSS, and JS.

A session for each tab is maintained by the session handler: when you move to another tab and later come back, you see the same state where you left off.

Fig. 3 illustrates an example interface depicting the successful matching of a command spoken by the user.

Fig. 3. User Interface for successful matching

Fig. 4 displays an example interface depicting the failure to match a command spoken by the user.

Fig. 4. User Interface for unmatched command
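The matching step that Figs. 3 and 4 illustrate — checking a spoken transcript against the command table and either dispatching an action or reporting no match — can be sketched in plain JavaScript. The command-to-action mapping and the action names below are illustrative assumptions, not the extension's actual code:

```javascript
// Sketch of the command-matching step (interaction iv): the transcript
// received from the Web Speech API is normalized and scanned for the
// first word that appears in the command table (Table 1).
const COMMANDS = {
  link: "collectLinks",   // enumerate links and assign spoken IDs
  scroll: "autoScroll",   // scroll the page top to bottom
  search: "webSearch",    // search the spoken query on the web
  play: "playMedia",      // play the current video
  submit: "submitForm",   // submit the active form
  zoom: "zoomPage",       // zoom in the current webpage
  copy: "copySelection",  // copy text of the selected element
};

// Returns the action name for the first recognized command word, or
// null when nothing matches (the "unmatched command" case of Fig. 4).
function matchCommand(transcript) {
  const words = transcript.toLowerCase().trim().split(/\s+/);
  for (const w of words) {
    if (w in COMMANDS) return COMMANDS[w];
  }
  return null;
}
```

In the real extension such a function would be called from the speech recognizer's result handler, with the returned action dispatched to the module that implements it.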
VII. TEXT TO SPEECH ANALYSIS FOR BLIND

Since blind people cannot see the webpage, a new methodology can be created to read out the webpage text, enabling the blind to hear the webpage content through speech. Screen reader or text to speech software can do this job, but it has one problem: since such software reads all the webpage content, it is irrelevant and time consuming for the user to hear everything. Say a webpage has 1000 words; the blind person has to listen to all that text until the software completes reading. What the blind person actually needs is to know what is available in the webpage and what commands should be said to access those things. So the content has to be mapped to commands, and this mapping should be read out to the user by the text to speech software. The blind person can then know what he or she needs from that webpage and how to access it, and say the respective command. This can be accomplished by two methods:
1. Using Name Attribute in HTML Tags:
By giving a name attribute to each HTML element, we can make the text to speech software read out the values of the name attributes, so that the purpose of each tag becomes known and the blind person can learn about the content. But this is not feasible, since we cannot assume that every website has name attributes in its tags.
2. XML Based Approach with AI:
An XML based structure for each webpage can be developed using AI, so that the screen reader can read the XML data and the person can know the purpose of the content.
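Method 1 can be sketched as follows. The element list stands in for a parsed DOM, and the readout phrasing, element names, and the `buildReadout` helper are invented for illustration:

```javascript
// Sketch of Section VII, method 1: instead of reading the full page
// text, read out only a short command map built from the name
// attribute of each tag.
function buildReadout(elements) {
  // Keep only elements that actually carry a name attribute (method
  // 1's stated limitation: not every site provides them).
  return elements
    .filter((el) => el.name)
    .map((el, i) => `Say "open ${i + 1}` + `" for ${el.name}`);
}

const page = [
  { tag: "a", name: "latest news" },
  { tag: "div" },                     // no name attribute: skipped
  { tag: "video", name: "intro video" },
];

// Each line would be passed to the text to speech software in turn.
console.log(buildReadout(page).join("\n"));
```

A 1000-word page thus collapses to a few spoken sentences, one per named element, addressing the time-consumption problem described above.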

VIII. LOGIN ANALYSIS FOR BLIND

With voice based functions it is not safe for a blind person to dictate the username and password while logging in, since others can openly hear them. So some methodology should be used while logging in to websites. This works similar to the Google password manager [4], which automatically generates a random password and auto fills it in further logins. The application generates a random password and stores it mapped to the website URL, so that it automatically fetches and fills the password field in further logins; the user can log in without ever saying the password.
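The password handling described above can be sketched as follows. The generator, the character alphabet, the 16-character length, and the in-memory map are all assumptions, since the paper does not specify them; a real extension would also need persistent, encrypted storage:

```javascript
// Sketch of Section VIII: generate a random password once per site,
// keyed by URL, and auto fill the same password on later logins so
// the blind user never has to speak it aloud.
const vault = new Map();

function passwordFor(url) {
  if (!vault.has(url)) {
    // Hypothetical generator: 16 characters from a mixed alphabet.
    const chars =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#";
    let pw = "";
    for (let i = 0; i < 16; i++) {
      pw += chars[Math.floor(Math.random() * chars.length)];
    }
    vault.set(url, pw);
  }
  // The same stored password is returned on every later login.
  return vault.get(url);
}
```

On the first login the application would write `passwordFor(location.href)` into the password field; on later visits the lookup returns the identical stored value, so autofill succeeds without any spoken secret.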

CONCLUSION
Accessing information via the web is indispensable for everyone, yet it remains difficult for handicapped and blind people. Considering this problem, we have built this model using existing technology. Technology is growing faster than before, and it can bring new methods of browsing the web that make it more comfortable for blind and handless people.

REFERENCES

[1] Abhay Dekate, Chaitanya Kulkarni, and Rohan Killedar, "Study of Voice Controlled Personal Assistant Device," International Journal of Computer Trends and Technology (IJCTT), vol. 42, no. 1, December 2016.
[2] Conversation of AI: https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html
[3] "Web Speech API," Draft Community Group Report, 21 January 2020.
[4] Google Password Manager: https://passwords.google.com
