
NEWS APPLICATION USING VOICE PROMPT

A Minor Project Report

Submitted by
KUMAR ABHISHEK [Reg. No.: RA2011003011171]
SHRIMAYI MATANHELIA [Reg. No.: RA2011003011141]
Under the Guidance of

Mr. G Manoj Kumar

Assistant Professor, Department of Computing Technologies

in partial fulfilment of the requirements for the degree of


BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTING TECHNOLOGIES


COLLEGE OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
SRM NAGAR, KATTANKULATHUR – 603203
NOVEMBER 2023
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203
BONAFIDE CERTIFICATE

Certified that this B.Tech. Minor project report titled “NEWS APPLICATION USING VOICE PROMPT” is the bonafide work of Kumar Abhishek (RA2011003011171) and Shrimayi Matanhelia (RA2011003011141), who carried out the project work under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other thesis or dissertation based on which a degree or award was conferred on an earlier occasion for this or any other candidate.

Mr. G. Manoj Kumar
SUPERVISOR
Assistant Professor
Department of Computing Technologies

Dr. B. Siva Kumar
PANEL HEAD
Associate Professor
Department of Computing Technologies

Dr. M. PUSHPALATHA
HEAD OF THE DEPARTMENT
Department of Computing Technologies
SRM Institute of Science and Technology
Own Work Declaration Form

Degree/ Course : B. Tech in Computer Science and Engineering

Student Name : KUMAR ABHISHEK, SHRIMAYI MATANHELIA

Registration Number : RA2011003011171, RA2011003011141

Title of Work : News application using Voice Prompt

I / We hereby certify that this assessment complies with the University’s Rules and Regulations relating
to Academic misconduct and plagiarism**, as listed in the University Website, Regulations, and the
Education Committee guidelines.
I / We confirm that all the work contained in this assessment is my / our own except where indicated, and that
I / We have met the following conditions:

• Clearly referenced / listed all sources as appropriate


• Referenced and put in inverted commas all quoted text (from books, web, etc.)
• Given the sources of all pictures, data etc. that are not my own
• Not made any use of the report(s) or essay(s) of any other student(s) either past or present
• Acknowledged in appropriate places any help that I / We have received from others (e.g. fellow students,
technicians, statisticians, external sources)
• Complied with any other plagiarism criteria specified in the Course handbook / University website

I / We understand that any false claim for this work will be penalized in accordance with the University
policies and regulations.
DECLARATION:
I/We am/are aware of and understand the University’s policy on Academic misconduct and
plagiarism and I /we certify that this assessment is my / our own work, except where indicated
by referring, and that I/we have followed the good academic practices noted above.

If you are working in a group, please write your registration numbers and sign with the date for
every student in your group.
ACKNOWLEDGEMENT

We would like to express our humble gratitude to Dr. C. Muthamizhchelvan, Vice-Chancellor, SRM Institute of Science and Technology, for the facilities extended for the project work and his continued support.

We extend our sincere thanks to Dr. T. V. Gopal, Dean-CET, SRM Institute of Science and Technology, for his invaluable support.
We wish to thank Dr. Revathi Venkataraman, Professor and Chairperson, School of
Computing, SRM Institute of Science and Technology, for her support throughout the
project work.

We are incredibly grateful to our Head of the Department, Dr. M. Pushpalatha, Professor, Department of Computing Technologies, SRM Institute of Science and Technology, for her suggestions and encouragement at all stages of the project work.

We want to convey our thanks to our Project Coordinators, Dr. S. Godfrey Winster, Dr. S. Nalini, Mr. G. Manoj Kumar, Dr. M. Kandan and Dr. A. Arul Murugan, Department of Computing Technologies, SRM Institute of Science and Technology, for their inputs during the project reviews and support.

We register our immeasurable thanks to our Faculty Advisor, Dr. M. Rajalakshmi, Assistant Professor, Department of Computing Technologies, SRM Institute of Science and Technology, for leading and helping us to complete our course.

Our inexpressible respect and thanks to our guide, Mr. G. Manoj Kumar, Assistant
Professor, Department of Computing Technologies, SRM Institute of Science and
Technology, for providing us with an opportunity to pursue our project under his
mentorship. He provided us with the freedom and support to explore the research topics
of our interest. His passion for solving problems and making a difference in the world has
always been inspiring.

We sincerely thank all the staff and students of the Computing Technologies Department,
School of Computing, SRM Institute of Science and Technology, for their help during
our project. Finally, we would like to thank our parents, family members, and friends for
their unconditional love, constant support and encouragement.

KUMAR ABHISHEK [RA2011003011171]


SHRIMAYI MATANHELIA [RA2011003011141]
TABLE OF CONTENTS

S.NO.   TITLE   PAGE NO.

Abstract i
List of Tables ii
List of Figures ii
List of Symbols and Abbreviations iii
1 INTRODUCTION 1
1.1 General 1
1.2 Purpose 2
1.3 Scope 2
1.4 Text To Speech (TTS) And Speech To Text (STT) 3
1.5 Natural Language Processing 5
1.6 ALAN AI 6
1.7 Motivation 7
2 LITERATURE REVIEW 8

3 PROPOSED METHODOLOGY 15
3.1 Revolutionizing Accessibility: The Voice-Powered News Portal 15
3.2 A User-Centric Approach 15
3.3 The Power of Voice Interaction 15
3.4 Real-Time News Retrieval 16
3.5 Effective Architecture Over Traditional Systems 16
3.6 Modules Used 20

4 RESULT 21
5 CONCLUSION 25
6 FUTURE SCOPE 26
7 REFERENCES 28
APPENDIX 32
ABSTRACT

The Voice-Powered News Portal is designed to bridge the information gap for visually challenged
individuals, giving them unrestricted access to the latest news and enabling their active
participation in public discourse. It leverages state-of-the-art voice recognition and synthesis
technologies to offer an intuitive platform for tailored news delivery. The combination of
Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, supported by robust external
Application Programming Interfaces (APIs), enables precise command interpretation, real-time
news retrieval, and seamless audio narration. Using web scraping techniques, the system
aggregates up-to-the-minute news from a wide range of sources, keeping users reliably informed.
This approach makes information access more engaging while markedly reducing the manual
effort previously required from users. The user interface is dynamic, user-friendly, and
informative. By allowing users to comment on news stories and share them on social platforms,
the portal actively fosters engagement. Its mission is to advance equity and inclusivity in the
digital age, envisioning a future in which visual impairment is no longer a barrier to accessing
critical information. The Voice-Powered News Portal stands for a more egalitarian and accessible
world, where everyone can stay informed and take part in society's broader discourse. A key
advantage of the proposed system is its adaptability: voice recognition can be applied to a
multitude of consumer devices, from smart TVs and watches to laptops, extending well beyond
mobile phones and PCs.

Keywords: Voice-powered news portal, voice recognition, Speech-to-Text (STT), Text-to-Speech (TTS), API, news
LIST OF TABLES

2.1.1 Comparison between the voice-powered news portal and existing systems 11

LIST OF FIGURES

1.5.1 How NLP works 6

3.5.1 Architecture Diagram 19

4.2.1 Home page of voice powered news portal 26

4.2.2 News cards shown by the voice powered news portal 26


LIST OF SYMBOLS AND ABBREVIATIONS

1. AI – Artificial Intelligence
2. STT – Speech to Text
3. TTS – Text to Speech
4. ASR – Automatic Speech Recognition
5. MFCC – Mel-frequency cepstral coefficients
6. CNN – Convolutional neural networks
7. RNN – Recurrent neural networks
8. HMM – Hidden Markov Models
9. NLP – Natural Language Processing
10. MT-KD – Multi-teacher knowledge distillation
11. HLC – Hybrid lightweight convolution
12. SC-CNN – Speaker-conditional convolutional neural networks
13. GUI – Graphical User Interface
14. API – Application Programming Interface
15. UI – User Interface
16. UX – User Experience
17. JSON – JavaScript Object Notation

CHAPTER 1
INTRODUCTION

1.1 General

This project centres on a "Voice-Powered News Web Platform" designed to provide a
speech-driven or text-based interface to a personal assistant. The primary objective is to
engage the audience directly, delivering real-time updates and insights on the topics that
matter to them. The platform also promises to improve daily productivity by saving valuable
work hours and delivering tailored news alerts aligned with users' subscribed news categories.
With the utterance of a few words, users can navigate the EC website and access articles
accompanied by corresponding videos.

Voice control is a pivotal technology gaining traction across an expanding array of devices.
In this work, we propose a sophisticated deep learning-powered voice assistant for news
updates that is proficient in discerning user intent. Spoken input is interpreted through
Automatic Speech Recognition (ASR). In our fast-paced lives, staying informed about global
events demands more than reading traditional print media such as newspapers and magazines.
Every generation faces increasingly demanding challenges.

Through the synergy of natural language processing and established methodologies, the
system analyses audio input and returns articulate spoken responses. The program transmits
audio data to the Alan AI cloud servers, where it is evaluated to generate precise outcomes.
Our voice assistant streamlines access to news and headlines, while our web-based platform
turns news consumption into an engaging, immersive experience. Staying aware of current
events normally means sifting through news content in various formats: traditional print
media, online news outlets, e-commerce websites, multimedia sources, and even games. None
of these approaches, however, offers the convenience and accessibility of retrieving news
through voice-directed interaction.
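
The command-handling flow described above (speech is transcribed in the Alan AI cloud and then mapped to an action) can be sketched as a small intent router. The patterns and handler names below are purely illustrative assumptions, not the project's actual grammar; the real application would receive recognized text from the Alan AI SDK (which is JavaScript-based) rather than from hard-coded strings.

```python
import re

# Hypothetical command patterns: an assumption for illustration only.
COMMANDS = [
    (re.compile(r"(latest|show me the) news", re.I), "fetch_headlines"),
    (re.compile(r"news (about|on) (?P<topic>\w+)", re.I), "fetch_by_topic"),
    (re.compile(r"read (the )?article", re.I), "read_article"),
]

def route(utterance: str):
    """Return the handler name and captured slots for a recognized utterance."""
    for pattern, handler in COMMANDS:
        match = pattern.search(utterance)
        if match:
            return handler, match.groupdict()
    return "fallback", {}

print(route("Show me the news"))
print(route("Give me news about science"))
```

In a real deployment, each handler would call the news API and pass the retrieved headlines to the TTS engine for narration.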

1.2 Purpose

The Voice-Powered News Portal project sets out to remove the formidable barriers that have
hindered visually challenged individuals in their quest for information and engagement. Its
paramount objective is to give them equal access to knowledge, unconstrained by the
limitations of their visual faculties. The initiative seeks not merely passive consumption of
news but active participation in the discussions, deliberations, and decisions that shape
society. It is grounded in the tenets of inclusivity, accessibility, and knowledge equity, and
in a commitment to a digital sphere where every individual's voice is heard. The project
embodies the conviction that technology, harnessed for the collective good, can dismantle
entrenched disparities and open new avenues for those long left at the margins of societal
attention. Through user-centric design, the orchestration of advanced technologies, and
sustained dedication to user assistance and feedback, the Voice-Powered News Portal aspires
to be a source of empowerment, an advocate for the unheard, and a driver of transformative
social change for the visually challenged community. The project aims to develop an intuitive
user interface which, though devoid of the visual cues prevalent in mainstream applications,
is meticulously crafted to prioritize auditory and tactile interactions. Through this
interface, visually impaired users can comfortably register, authenticate, and personalize
their news preferences, including topics of interest and language choices.

1.3 Scope

The Voice-Powered News Portal project envisions a comprehensive, innovative, and dynamic
platform designed to empower visually challenged individuals and promote inclusivity in the
digital era. The project's scope encompasses the following key elements:

User-Centric Accessibility: The project aims to create a user-centric platform that is highly
accessible to visually challenged individuals, ensuring that they can effortlessly access,
navigate, and interact with news content using voice commands and synthesized audio. The
design prioritizes inclusivity, ease of use, and user empowerment.

Natural Voice Interaction: Leveraging state-of-the-art Speech-to-Text (STT) and
Text-to-Speech (TTS) technologies, the platform facilitates natural, conversational dialogue
through voice. Users communicate with the system by speaking, and the system responds in
kind with human-like auditory renditions of news articles.

Data Quality and Reliability: A foundational pillar of the project is a robust data
management layer in which real-time news content is meticulously curated from reputable
sources.

Future Scopes: The project anticipates future expansion and enhancement, including
features such as advanced personalization, multilingual support, integration with IoT
devices, and collaboration with educational institutions, among other possibilities.

Social Impact: Beyond technical functionality, the project aims to foster social inclusion,
knowledge equity, and empowerment for visually challenged individuals. It will actively
advocate for the rights and needs of this community, striving to create a positive societal
impact.

1.4 Text-to-Speech (TTS) and Speech-to-Text (STT)

In today's technology-driven world, Text-to-Speech (TTS) and Speech-to-Text (STT)
technologies have become integral components of numerous applications across various
industries. TTS and STT represent two sides of the same coin, offering the conversion of
human language between textual and spoken forms. These technologies have wide-ranging
applications, including accessibility tools, voice assistants, transcription services, language
translation, and more. In this comprehensive overview, we will delve into the details of TTS
and STT, exploring their working principles, applications, challenges, and future prospects.

TTS, also known as speech synthesis, is a technology that converts written text into audible
speech. It plays a crucial role in making digital content more accessible to individuals with
visual impairments and those who prefer auditory interfaces. The process of converting text
to speech involves several key components: text analysis, where the input text is analyzed
for sentence boundaries and linguistic structure; text preprocessing, which includes
cleaning, formatting, and enhancing text quality through tasks like tokenization and
part-of-speech tagging; phonetic and linguistic analysis that determines word pronunciation
based on linguistic rules and prosodic elements; acoustic modeling, essential for generating
sound waves based on recorded human speech; and synthesis, where the TTS system
combines linguistic and acoustic information to produce speech, utilizing methods like
concatenative synthesis or parametric synthesis, with the naturalness of the output varying
based on system sophistication and acoustic model quality.
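
The front-end stages described above (text analysis, preprocessing, and phonetic lookup) can be illustrated with a deliberately tiny sketch. The three-word lexicon is an assumption for demonstration only; production TTS front-ends use large pronunciation dictionaries such as CMUdict plus trained grapheme-to-phoneme models, followed by the acoustic modeling and synthesis stages.

```python
import re

# Toy pronunciation lexicon: an assumption for illustration, not a real dictionary.
LEXICON = {
    "read": "R IY D",
    "the": "DH AH",
    "news": "N UW Z",
}

def normalize(text: str) -> list[str]:
    """Text analysis/preprocessing: lowercase, strip punctuation, tokenize."""
    return re.findall(r"[a-z']+", text.lower())

def to_phonemes(tokens: list[str]) -> list[str]:
    """Phonetic analysis: look up each token; fall back to spelling it out."""
    return [LEXICON.get(t, " ".join(t.upper())) for t in tokens]

tokens = normalize("Read the news!")
print(tokens)               # ['read', 'the', 'news']
print(to_phonemes(tokens))  # ['R IY D', 'DH AH', 'N UW Z']
```

The phoneme sequence produced here is what the acoustic model would consume to generate the actual waveform.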

Applications of TTS include accessibility, where it serves as a crucial tool for people with
visual impairments by allowing them to hear written content, and voice assistants, where
popular assistants rely on TTS to provide human-like responses to user queries.

STT, also known as automatic speech recognition (ASR), is the counterpart of TTS
technology. STT converts spoken language into written text, enabling machines to
understand and process human speech.

The STT process encompasses several critical steps: it begins with audio input, where STT
systems receive spoken language from sources like microphones, phone calls, or audio
recordings. Subsequently, acoustic feature extraction is employed to process the audio
signal, extracting relevant acoustic characteristics such as spectrograms or mel-frequency
cepstral coefficients (MFCCs) to represent speech. An acoustic model, typically utilizing
deep learning techniques like convolutional neural networks (CNNs) and recurrent neural
networks (RNNs), is then employed to recognize phonemes, words, or other speech units
based on these acoustic features. Additionally, a language model factors in linguistic
context and grammar to decode recognized phonemes or words into coherent sentences.
Language models can be built on n-grams, Hidden Markov Models (HMMs), or more
advanced methods such as transformer models. The final output of the STT process is
transcribed text: a written representation of the originally spoken content.
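
The acoustic feature extraction step can be sketched in simplified form: framing the waveform and taking windowed FFT magnitudes yields a log-spectrogram, the precursor to the mel filterbank and MFCC features mentioned above. The frame and hop sizes below assume 16 kHz audio, and a synthetic tone stands in for real microphone input.

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def log_spectrogram(signal, frame_len=400, hop=160):
    """Acoustic feature extraction: windowed FFT magnitudes per frame."""
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mags + 1e-8)

# A 1-second synthetic 440 Hz tone stands in for microphone input.
sr = 16000
t = np.arange(sr) / sr
feats = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (time frames, frequency bins)
```

A full STT pipeline would map these features onto mel filterbanks and MFCCs before passing them to the CNN/RNN acoustic model.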

Applications of STT include voice search, where search engines and mobile devices rely on
STT to understand user queries, and voice assistants, which use STT to transcribe user
commands and queries for further processing.

Both TTS and STT technologies encounter a variety of challenges. Achieving naturalness
and accuracy in speech synthesis and recognition remains a persistent challenge,
encompassing the complexity of capturing human speech nuances like intonation, emotion,
and dialects. Moreover, linguistic variation across different languages and dialects presents
difficulties in developing TTS and STT systems that are effective on a global scale, given
the unique phonetic and prosodic patterns. Additionally, the presence of background noise
and acoustic variations in real-world scenarios can compromise the accuracy of STT
systems. Furthermore, enhancing emotional expressiveness in synthesized speech and
recognizing emotions in spoken language continue to be active areas of research. Lastly, in
applications featuring avatars and virtual characters, synchronizing lip movement and facial
expressions with synthesized speech poses a complex task.

As technology advances, Text-to-Speech (TTS) and Speech-to-Text (STT) technologies are
on the verge of significant improvements and wider adoption. In the foreseeable future, we
can expect these advancements to manifest in several ways. TTS and STT systems are
poised to deliver more natural-sounding speech, marked by enhanced intonation and
emotion recognition, as well as a reduction in the robotic quality of the generated voice.
Furthermore, these technologies will continue to expand their language support, making
them more accessible and beneficial on a global scale. They are increasingly integrated into
the field of artificial intelligence, playing a pivotal role in the development of AI systems,
like chatbots and virtual assistants, which will offer more natural and seamless interactions
between humans and computers. Lastly, these advancements will lead to improved
accessibility, benefiting individuals with disabilities, particularly those in the
differently-abled community, as they gain greater access to the digital world.

1.5 Natural Language Processing

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the
interaction between computers and human language. It encompasses a range of techniques
and technologies used to enable machines to understand, interpret, and generate human
language in a valuable way. NLP involves various tasks, such as text and speech
recognition, language translation, sentiment analysis, and chatbots.

NLP applications are pervasive in our daily lives, from voice assistants like Siri and
chatbots in customer service to language translation services and social media sentiment
analysis. NLP systems use machine learning algorithms and linguistic rules to analyze and
process natural language data. Its advancements have been driven by the growth of big data,
more powerful computing, and sophisticated deep learning models like transformers. NLP
continues to evolve, holding great promise for improving communication between humans
and machines, automating content analysis, and facilitating better decision-making in
numerous industries.
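
One of the NLP tasks named above, sentiment analysis, can be illustrated with a minimal lexicon-based sketch. The word lists are assumptions for demonstration only; practical systems use trained machine-learning models rather than fixed lists.

```python
# Toy sentiment lexicons: assumptions for illustration, not a trained model.
POSITIVE = {"good", "great", "helpful", "accessible"}
NEGATIVE = {"bad", "poor", "broken", "confusing"}

def sentiment(text: str) -> str:
    """Count positive vs. negative words and return an overall label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The portal is helpful and accessible"))  # positive
```

The same pattern of scoring tokens against learned weights, rather than fixed sets, underlies statistical sentiment classifiers.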

1.6 ALAN AI

Alan AI is an artificial intelligence firm specializing in cutting-edge voice recognition
and conversational AI technologies. The company aims to give developers the means to build
voice-activated applications and services, simplifying user interactions with technology
through natural language. Alan AI's platform seamlessly integrates voice interfaces,
enhancing the capabilities of applications, chatbots, and virtual assistants.

The company's technology comprehends and responds to user commands, enabling a more
organic and intuitive user experience. Alan AI's platform spans industry boundaries, finding
use in healthcare, e-commerce, customer service, and entertainment, underpinned by a
commitment to user-friendliness, developer-centricity, and scalability. Its solutions point
toward a more seamless and efficient era of human-machine interaction, enhancing
productivity and accessibility across many sectors.

1.7 Motivation

Inclusivity and Accessibility: The primary motivation is to empower visually challenged
individuals by providing them with equal access to news and information. Visual impairment
should not be a barrier to staying informed and engaged in society. By developing this
project, we aim to promote inclusivity and ensure that no one is left behind in the digital
age.

Knowledge Equity: Information is the cornerstone of knowledge and empowerment. This project
is driven by the belief that everyone, regardless of their abilities, should have the same
opportunities to access and benefit from knowledge. By facilitating easy access to news, we
contribute to knowledge equity and support lifelong learning for visually impaired
individuals.

Independence and Dignity: Visually challenged individuals often face barriers to independent
living. By providing a platform that enables them to access news and information
autonomously, we promote a sense of independence and dignity. This project empowers users to
make their own choices and decisions.

Social Inclusion: Staying informed about current events is a fundamental aspect of social
inclusion. When visually impaired individuals have the same access to news as their sighted
counterparts, they can actively participate in conversations, discussions, and social
interactions. This project fosters social inclusion and a sense of belonging.

Technological Innovation: The development of this project leverages cutting-edge
technologies, including voice recognition and synthesis. The motivation is to harness
technology for the betterment of society, pushing the boundaries of what is possible in
accessibility and user experience.

Advocacy and Empowerment: This project serves as an advocate for the visually impaired
community, demonstrating that technological solutions can break down barriers. It empowers
users to advocate for their rights and needs, promoting a more inclusive and understanding
society.

Positive Social Impact: Beyond the immediate benefits to visually impaired users, the
Voice-Powered News Portal has the potential to create a positive ripple effect by raising
awareness about accessibility and inspiring similar initiatives. The motivation is to
catalyze positive social change and foster a more inclusive world.

CHAPTER 2
LITERATURE SURVEY

In [1], the authors investigate knowledge-transfer strategies: the multi-teacher knowledge
distillation (MT-KD) framework confronts the problem of exposure bias in neural TTS. The
findings show that MT-KD outperforms data augmentation and adversarial training, and that a
two-teacher distillation approach surpasses conventional single-teacher techniques. MT-KD is
demonstrated on the GST-Tacotron network design, transferring knowledge from a pre-trained
teacher to a student model and thereby mitigating the influence of exposure bias. Future
work will explore the use of pre-trained models in multi-round decoding.

In [2], a straightforward approach is presented for converting text extracted from images
into audible output without requiring an online connection. The device gives self-sufficiency
to individuals with visual impairments and supports seamless translation into preferred
languages via a dedicated translator tool. The converted text is then vocalized through a
speech synthesis mechanism, enabling the visually impaired to lead an autonomous,
self-reliant existence.

In [3], the IBM Expressive Speech Synthesis System is extended with two promising
methodologies. Listening tests show that the corpus-oriented strategy is effective for
particular types of expression, motivating deeper scrutiny of each algorithm. A direct
comparison of the corpus-driven and prosodic-phonology techniques across a wide range of
expressions would help assess the value of incorporating voice-quality modeling into
emotional expressions. However, the results of such comparisons depend on the scale of the
expressive databases and the volume of ToBI-labeled data employed. Combining the two
methodologies could provide a more holistic understanding of their efficacy.

In [4], the hybrid lightweight convolution (HLC) is introduced, focusing on dependencies
within a confined local context. Integrated into the standard Transformer network, it
compensates for the Transformer's limited modeling of local context in TTS. A comprehensive
array of experiments substantiates the viability of the proposed technique, yielding
noticeable improvements in TTS performance. HLC also proves adaptable in multi-stage TTS
systems involving external attention alignments and sequence-modeling tasks; these aspects
are slated for exploration in future work.

In [5], the SC-CNN is presented as a novel technique for speaker conditioning in zero-shot
multi-speaker TTS (ZSM-TTS), using 1-D convolutions to capture the local phonetic context
conditioned on speaker embeddings. It outperforms established speaker-conditioning methods
in contemporary systems. Subsequent research will focus on synthesizing emotional speech
and diverse expressive styles to evaluate its efficacy in handling the nuances of emotional
style.

In [6], the authors showcase HTTP-CCN gateways operating as HTTP proxies interlinked with a
CCN testbed. The demonstration highlights the gateway's operational versatility in diverse
scenarios, including results on website access and logging details. The gateway also speeds
up content retrieval, particularly for streaming video and file downloads, by using a cache
within the CCN testbed; the performance evaluation contrasts the time taken for initial
versus subsequent downloads of videos or files.
The demonstration further illustrates caching online videos within the CCN testbed. To
uniquely identify identical videos with similar titles, modified CCN names are created, a
step made necessary by the variability in URL-generation schemes across online video
providers. Notably, the demonstration highlights improvements brought to a major online
video platform. The primary objective is to exhibit the tangible gains in user experience
delivered by the HTTP-CCN gateway, fostering growth in CCN traffic and supporting advanced
CCN research.

In [7], the challenge of maintaining fairness between HTTP/1.1 and HTTP/2 sessions in
real-world network scenarios is examined. Although HTTP/2 can improve web-page performance,
it fails to deliver even-handed throughput for concurrent sessions.

In [8], This document unveils algorithms and techniques tailored to enhance the scalability
of REST APIs within hypertext-driven navigation systems. It introduces the Petri-Net-based
REST Chart framework, a collection of design patterns governed by hypertext, and an
innovative differential caching mechanism. These strategies found successful application in
the development of a RESTful interface for a northbound SDN API in cloud computing
with OpenStack. They effectively address prior limitations in design and performance. The
proposed hypertext-driven REST API methodology facilitates seamless migration between
RESTful SDN APIs without interrupting service execution, a pivotal trait for extensive
distributed systems. The differential stratified cache mechanism contributes to heightened
system efficiency, as demonstrated by performance evaluations showcasing a 66%
reduction in overhead related to hypertext-driven navigation and response times under 20
ms in tested networking applications.

In [9], the anticipated outcomes of the remedy displayed a notable 33% reduction in
retrieval duration following the incorporation of an extra API key, and this variance gains
prominence as the volume of inquiries mounts. The practical application underwent
examination across datasets encompassing information for one to twenty participants out of
a multitude of players, potentially numbering in the thousands.

In [10], the assessment prototype for REST web services enables comprehensive validation
of APIs across multiple dimensions, employing both assertion and script-based
methodologies. It extends its support to test suites of varying complexity levels, executing
test cases in tandem with the program's operation. Furthermore, it facilitates data
transmission and conditional execution by aligning the test tool model with the program
execution language. The descriptive language utilized by the model exhibits a high degree
of expressiveness, enabling the automatic generation of test cases through the interface.

In [11], the REST API is presented as an alternative approach for facilitating data
interchange across diverse platforms. This investigation proposes a future analysis
contrasting REST APIs that use varied data formats such as XML and plain text.

In [12], two microservice architectures were formulated employing REST and GraphQL technologies. The REST-compatible Ocelot gateway established connections with REST services, whereas the GraphQL-driven system integrated Hot Chocolate via schema stitching. The REST gateway demonstrated its superiority in amalgamating responses, leading to swifter response times and enhanced data throughput. Nonetheless, its implementation necessitated more extensive labor. The constraints of this study stem from the confinement of testing within a localized environment, thereby influencing the observed outcomes.

In [13], the primary aim of the research was to pinpoint features of graphical user interface (GUI) design that had undergone scrutiny among non-medical professionals and to gauge their relevance in the context of GUI requirements for physicians. As per the accessible data, medical practitioners exhibit a preference for GUI configurations that diverge from the ones commonly provided by medical software applications. The research suggests that integrating medical symbols, favored list structures, and appropriate screen intricacy could potentially facilitate the adoption of PDAs by physicians for medical purposes.

In [14], the graphical user interface opens up an array of opportunities for individuals, obviating the necessity to commit lengthy commands to memory and then input them into the computer system without errors. The significance of semantics stands out as a key factor contributing to the success of the interface. The utilization of semantics is far from a recent concept, given its integration into human existence dating back to the inception of computing. In their work "Metaphors We Live By," Lakoff and Johnson posit that the languages we employ and our comprehension of the world are inexorably linked to the fundamental semantic framework derived from the tangible world that surrounds us.

In [15], the objective of the investigation was to juxtapose the efficacy of explicit feedback against user profile alteration paired with implicit feedback. Nevertheless, the experimental system did not exhibit a notable superiority over the implicit feedback mechanism, signifying that the implicit feedback system is efficient enough to rival the explicit counterpart. Furthermore, it came to light that altering the user model had an impact on the system's operation. The performance of the system remained consistent among users who introduced minimal adjustments to their user profiles but deteriorated significantly when substantial alterations were made. This finding implies that user model manipulation should be approached with caution to avert performance degradation. In sum, the study posits that modifying user profiles can indeed serve as a viable means to enhance system performance.

In [16], the study delves into the perceived cognitive burden experienced by news consumers as they engage with news content employing diverse typographic styles and text hues. The findings reveal substantial associations between font presentation techniques and the cognitive load reported by users. For instance, employing italics and red lettering for key terms can alleviate cognitive strain and enhance reading efficiency. These discoveries hold relevance for intelligent media interfaces, enabling the automatic adaptation of news text presentation modes in response to users' cognitive workload, thereby amplifying the efficacy of news communication.

In [17], the research advocates for the incorporation of CoAP, a pivotal application protocol within the Internet of Things, into web-centric applications. CoAP encounters limitations with conventional web browsers stemming from its inherent design, characterized by its reliance on UDP socket associations and two-way communication. An alternative approach involves the adoption of a novel bidirectional web protocol akin to HTML5 WebSockets, which facilitates genuine CoAP interactions within web browsers. The experimental outcomes vividly underscore the pronounced benefits concerning network traffic and computational requirements when compared to the conventional HTTP/CoAP proxy.

In [18], speech-to-text (STT) represents a viable encoding method, albeit without surpassing standard 8 kHz-sampled 8-bit PCM A-law voice encodings in terms of comprehensibility. It proves beneficial for instances marked by erratic network connections and the imperative for cost reduction. Protocols can be instituted to mitigate inaccuracies, incorporating strategies like deliberate enunciation, codified expressions, and validation of messages. Mozilla's recent deployment of DeepSpeech plays a role in augmenting the effectiveness of the STT procedure.

In [19], the document presents LightSpeech, a neural architecture search approach geared towards uncovering lightweight and fast TTS models. It scrutinizes the memory and latency costs of the FastSpeech 2 components and suggests enhancements covering both model architecture and search space. The empirical findings illustrate that LightSpeech delivers a substantial 15x compression ratio, a 16x reduction in MACs, and a 6.5x CPU inference speedup while preserving comparable audio quality.

In [20], DenoiSpeech stands as a text-to-speech framework leveraging frame-level noise conditions to create clean speech for speakers whose training data is riddled with noise. It comprises a noise condition module, a noise extraction module, and an adversarial CTC module to avert information leakage. Empirical findings manifest DenoiSpeech's superiority over alternative techniques when handling noisy datasets, both synthetic and real-world. Subsequent investigations will substantiate the efficacy of each component. Future efforts will be dedicated to training TTS models across an array of noise categories and implementing few-shot learning approaches for speakers dealing with noise.

In [21], the composition zeroes in on potential hurdles confronted by professionals within the domain of speech coding, while affording an expansive appraisal of the text-to-speech (TTS) realm. It lays out alternative techniques encompassing formant and concatenative synthesis, providing an in-depth exploration of the architectural framework governing TTS systems, with a distinct emphasis on voice synthesis. The principal objective of the discussion is to augment collaboration between the speech coding and TTS communities, notably within the sphere of devising speech coding methodologies that harmonize seamlessly with state-of-the-art vocal synthesis technology. Additionally, it delves into the merits and constraints intrinsic to various strategic approaches.

In [22], TTS synthesis constitutes a burgeoning facet of contemporary computer technology, assuming a notable part in user engagement. A universal, self-sufficient concept has been forged for translating text into speech, featuring an uncluttered, appealing visual user interface. The system connects with a text-to-speech engine crafted for American English. Subsequent aspirations involve crafting engines for linguistic transformation, thereby rendering TTS technology more readily available. The precision of the software excels in authentic settings, with an overarching ambition to evolve into a web-based, immediate synthesis system for broader adoption.

CHAPTER 3
PROPOSED METHODOLOGY

3.1 Revolutionizing Accessibility: The Voice-Powered News Portal

In an era dominated by digital information, accessibility to news is a fundamental right. For the visually challenged community, achieving this right has been an enduring challenge. However, a beacon of hope emerges in the form of the Voice-Powered News Portal, an innovative project designed with a relentless focus on a user-centric approach, powered by the transformative force of voice interaction, and committed to providing real-time news access. This segment delves into the platform's profound commitment to users, its pioneering use of voice technology, its exceptional mechanism for news retrieval, and the effectiveness of its architecture over traditional systems, highlighting how these key elements align to revolutionize accessibility for visually challenged individuals.

3.2 A User-Centric Approach

The core essence of the Voice-Powered News Portal is its unwavering commitment to a user-centric approach. The platform is a testament to inclusive design, with an intuitive interface that champions accessibility and ease of use. It transcends mere functionality to create an immersive and empowering user experience. Users can register, browse through a wide range of categories, and personalize their news preferences according to their interests. This unprecedented level of customization ensures that users
are not just recipients of news but active participants in their information journey. The
user-centric design isn't a mere feature; it's the driving force behind the platform's
transformational impact. It transforms the visually challenged user into an empowered,
informed individual, enhancing their autonomy and their sense of belonging in the digital
world.

3.3 The Power of Voice Interaction

Central to the mechanism of the Voice-Powered News Portal is the revolutionary integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, the twin pillars of voice interaction. This technological synergy is not just a feature; it's the lifeline of the platform. It liberates users from the constraints of traditional interfaces, allowing them to communicate with the platform using natural voice commands. The STT component exhibits a remarkable ability to interpret user voices accurately, ensuring that the user's intent is precisely understood. The TTS system responds with eloquent and human-like audio presentations of news articles, making information accessible in a manner that is not just functional but also enriching. The power of voice interaction extends beyond mere convenience; it embodies a promise of autonomy, empowerment, and inclusivity. It turns the Voice-Powered News Portal into a lightweight, easy-to-use, web-based portal for visually challenged individuals, a transformative tool that levels the playing field in the digital landscape.

3.4 Real-Time News Retrieval

At the heart of the platform's mechanism is the ability to provide real-time news retrieval. This goes beyond being just a function; it is the engine of transformation. The platform acts as a conduit to the latest developments in the world. It sources news data from a rich tapestry of credible and reliable sources, employing cutting-edge web scraping techniques and harnessing the capabilities of Application Programming Interfaces (APIs). The platform's content pool is a dynamic reflection of the world's evolving narratives, providing users with a real-time window to current events. Yet, what truly sets the platform apart is its unwavering commitment to quality. A meticulous data verification process stands as a sentinel against inaccuracies and unreliable information. It upholds the platform's mission to be not just a source of news but a source of reliable and high-quality knowledge. The mechanism for real-time news retrieval is not just a convenience; it's a statement of credibility and reliability. It instills in users the confidence that they are accessing news of the highest standard, an assurance that empowers them to be active participants in the global conversation.
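The data verification step described above can be sketched as a simple filtering pass. The function below is an illustrative assumption rather than the report's actual verification logic: it drops articles that lack a title, source, or URL, and removes duplicates by URL.

```javascript
// Hypothetical sketch of the data verification pass described above.
// The field names (title, url, source) follow the JSON shape returned by
// typical news APIs; the exact rules are an assumption, not the report's spec.
function verifyArticles(articles) {
  const seen = new Set();
  return articles.filter((article) => {
    // Reject articles missing the fields needed for audio presentation.
    if (!article || !article.title || !article.url || !article.source) {
      return false;
    }
    // Deduplicate: the same story is often syndicated under one URL.
    if (seen.has(article.url)) {
      return false;
    }
    seen.add(article.url);
    return true;
  });
}
```

Feeding the verified list, rather than the raw feed, into the TTS pipeline keeps unreliable or truncated entries out of the audio stream.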

3.5 Effective Architecture Over Traditional Systems

The architecture of the Voice-Powered News Portal is a stark departure from traditional systems. Traditional systems often prioritize visuals and rely on screen readers, which can be cumbersome and less intuitive for the visually challenged. In contrast, the Voice-Powered News Portal's architecture is anchored in a user-centric design that transcends mere accessibility; it's a platform that's inherently inclusive and easy to use. The integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies is a monumental leap over traditional systems that rely on static text-to-speech conversions. The Voice-Powered News Portal's dynamic TTS system delivers news in a coherent and human-like manner, enhancing comprehension and making the user experience more engaging. Moreover, the mechanism for real-time news retrieval isn't merely an upgrade; it's a revolution. Traditional systems often rely on static news databases, whereas the Voice-Powered News Portal's mechanism keeps users connected to the pulse of global events in real time. Furthermore, the lightweight nature of the project, and the fact that few to no dependencies are required to run the application, makes it even more approachable and helps overcome the hindrance faced by less-aware and tech-illiterate audience segments.

The Voice-Powered News Portal embodies the values of inclusivity, empowerment, and
credibility. The user-centric approach, the transformative power of voice interaction, and
the real-time news retrieval mechanism, coupled with its architecture, align to revolutionize
accessibility for visually challenged individuals. It does not just provide news; it provides a
voice, a sense of autonomy, and a gateway to knowledge. The platform's commitment is not
limited to technology; it's a commitment to inclusivity and a brighter future for all.

Image 3.5.1: Architecture Diagram



User Interface (UI) Layer:


• Provides the user interface for interaction.
• Accommodates voice and text inputs for user commands.
• Offers options for user registration, authentication, and profile management.

Speech-to-Text (STT) API:


• External API responsible for converting spoken voice commands into text.
• Receives audio data from the user and returns the corresponding text command.
• Utilizes advanced speech recognition techniques to ensure accurate conversion.

Command Interpreter:
• Receives the text command from the STT API.
• Maps the command to specific actions based on predefined patterns.
• Triggers the appropriate functionality, such as news retrieval or article reading.
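The mapping step performed by the Command Interpreter can be sketched as a small pattern table. The patterns and action names below are illustrative assumptions, not the project's actual command vocabulary:

```javascript
// Hypothetical command interpreter: maps a transcribed text command to an
// action object. The patterns and action names are illustrative assumptions.
const COMMAND_PATTERNS = [
  // e.g. "give me the latest sports news", "show technology news"
  { regex: /(?:latest|show|give me).*?(\w+)\s+news/i, action: 'fetchNews' },
  // e.g. "read article 3", "open article 2"
  { regex: /(?:read|open).*?article\s+(\d+)/i, action: 'readArticle' },
];

function interpretCommand(text) {
  for (const { regex, action } of COMMAND_PATTERNS) {
    const match = text.match(regex);
    if (match) {
      return { action, argument: match[1].toLowerCase() };
    }
  }
  // Unrecognized input falls through to a clarification prompt.
  return { action: 'unknown', argument: null };
}
```

A table-driven interpreter like this keeps the STT output decoupled from the functionality it triggers, so new commands only require new table entries.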

News Retrieval API:


• External API for fetching news articles based on user-specified topics.
• Receives requests with topics and returns relevant news articles.
• Utilizes web scraping or APIs to retrieve up-to-date news content.

Text-to-Speech (TTS) API:


• External API responsible for converting text into natural audio output.
• Receives text data (news article titles) and returns audio files.
• Offers customizable parameters for speech output, like speed and tone.

Audio Playback to User:


• Plays the TTS-generated audio to the user through the device's audio output.
• Provides an auditory presentation of news article titles.
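The playback step can be illustrated with the browser's standard Web Speech API. This is a hedged sketch rather than the project's exact implementation (the report integrates an external TTS API): the "Headline N:" phrasing and the rate/pitch values are assumptions.

```javascript
// Pure helper: turn an article title into a spoken announcement.
// The "Headline N:" phrasing is an illustrative assumption.
function formatTitleForSpeech(title, index) {
  return `Headline ${index + 1}: ${title}`;
}

// Browser-only wiring (a no-op outside a browser): queue one utterance
// per title using the standard speechSynthesis interface.
function speakTitles(titles) {
  if (typeof window === 'undefined' || !window.speechSynthesis) return;
  titles.forEach((title, i) => {
    const utterance = new SpeechSynthesisUtterance(formatTitleForSpeech(title, i));
    utterance.rate = 1.0;  // customizable speed, as noted above
    utterance.pitch = 1.0; // customizable tone
    window.speechSynthesis.speak(utterance);
  });
}
```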

A Multi-Tiered Architectural Paradigm:

The Voice-Powered News Portal embodies a multi-tiered architectural paradigm that resonates with modern architectural principles. At the core of this architecture is a firm commitment to user-centric design. The presentation tier serves as the user's point of
interaction with the platform. It boasts an intuitive, accessible, and responsive user interface
designed to empower users with visual impairments. This tier is meticulously crafted to
prioritize accessibility, ensuring that users can seamlessly register, authenticate, and
personalize their news preferences. With elements that adhere to accessibility standards
such as WCAG (Web Content Accessibility Guidelines), it forms the portal's gateway for
visually challenged individuals to the digital news ecosystem.

The Middleware: STT and TTS Integration:

Central to the Voice-Powered News Portal's architectural framework is the middleware layer, where the magic of voice interaction unfolds. Here, the fusion of Speech-to-Text
(STT) and Text-to-Speech (TTS) technologies takes center stage. These technologies are the
linchpin that unlocks the power of natural voice commands and responses. The STT
component boasts advanced machine learning algorithms, capable of interpreting the
nuances of diverse user voices accurately. It deciphers user voice commands, transforming
them into text, and thereby enabling seamless interaction. This transcends mere voice
recognition; it's the embodiment of natural language processing, ensuring user intent is
understood with exceptional precision.

The TTS system in the middleware is not just another automated reader; it's a sophisticated
synthesis of audio that mirrors human speech. It goes beyond the mechanical enunciation of
text to present content in a manner that is coherent, natural, and engaging. The TTS system
is equipped with a rich linguistic database, enabling it to pronounce words and phrases with
nuanced inflections, adding depth and clarity to the user experience. It's an architectural
marvel that bridges the gap between textual information and auditory engagement. This
middleware's integration of STT and TTS is the architectural cornerstone, transforming the
Voice-Powered News Portal into an inclusive auditory gateway to the world of news.

Data Management: The Bedrock of Real-Time News Retrieval:

The architectural essence of the Voice-Powered News Portal extends to the data
management layer. This is where real-time news retrieval is achieved, an intricate
mechanism that defines the platform's effectiveness. Advanced web scraping techniques are
deployed to gather data from a multitude of credible and reliable sources, forming a
real-time content pool. These sources are dynamic, ensuring that users have access to the latest news developments from around the world. It's a direct channel to current affairs that is unparalleled in the accessibility landscape.

However, what truly sets this data management layer apart is its stringent data verification
process. This is not a static database; it's a living repository that upholds the highest
standards of quality and reliability. In an era rife with misinformation, this process acts as a
sentinel, eliminating inaccuracies and unreliable information, guaranteeing the content's
credibility. This architectural rigor ensures that users don't just receive news; they receive
accurate, credible, and high-quality information. It's not just a data management system; it's
a quality assurance mechanism that places the user's trust at the forefront.

3.6 Modules used

UI/UX:

Responsible for initiating the data flow (UX) and displaying news cards. Built in JavaScript using the ReactJS framework, with styling done in Tailwind CSS.

System API:

The API implemented is Alan AI, a voice AI platform providing text-to-speech (TTS) and speech-to-text (STT) capabilities, which is responsible for voice navigation of the page.
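A sketch of how such an Alan AI integration is typically wired into the web UI. The command names (`newHeadlines`, `open`) and the key placeholder are assumptions; the actual command vocabulary lives in the project's Alan Studio voice script.

```javascript
// Hypothetical wiring of the Alan AI web SDK (browser only):
//   import alanBtn from '@alan-ai/alan-sdk-web';
//   alanBtn({
//     key: 'YOUR_ALAN_SDK_KEY',  // placeholder, not a real key
//     onCommand: (commandData) => routeAlanCommand(commandData, handlers),
//   });

// Pure dispatcher: maps a command payload sent by the voice script to a UI
// handler. The command names here are illustrative assumptions.
function routeAlanCommand(commandData, handlers) {
  switch (commandData.command) {
    case 'newHeadlines':
      return handlers.showHeadlines(commandData.articles);
    case 'open':
      return handlers.openArticle(commandData.number);
    default:
      return handlers.notUnderstood();
  }
}
```

Keeping the dispatcher pure makes the voice-navigation logic testable without a browser or an Alan session.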

News API (External):

It is an external API named News API, which provides news articles in JSON format and supports a wide range of query parameters, helping ensure a great user experience.
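Requests to News API follow a simple URL scheme. The sketch below builds a top-headlines query; the endpoint and parameter names follow the public newsapi.org documentation, while the category value and apiKey placeholder are assumptions (a real key is required):

```javascript
// Build a News API top-headlines request URL. Endpoint and parameter names
// follow the public newsapi.org documentation; the key is a placeholder.
function buildHeadlinesUrl(category, apiKey, country = 'us') {
  const params = new URLSearchParams({ country, category, apiKey });
  return `https://newsapi.org/v2/top-headlines?${params.toString()}`;
}

// Usage in the portal (browser, or Node 18+ with global fetch):
//   const res = await fetch(buildHeadlinesUrl('technology', 'YOUR_API_KEY'));
//   const { articles } = await res.json(); // articles arrive as JSON
```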

CHAPTER 4
RESULTS

4.1 Achievements and results

The culmination of the Voice-Powered News Portal Project stands as an unequivocal triumph in the realm of accessible information dissemination, manifesting a paradigm shift in the way visually challenged individuals engage with news content. The arduous journey of development, iteration, and rigorous testing has yielded a technologically sophisticated system, which has ushered in an era of enhanced accessibility, user engagement, and personalized news experiences.

User-Centric Accessibility and Inclusivity:

Through the seamless amalgamation of cutting-edge technologies, the project has actualized
an exceptionally user-centric platform. The core objective of this initiative was to provide a
dynamic, intuitive, and accessible environment for visually challenged users. This has been
magnificently achieved, as the platform now exhibits remarkable accessibility features, including screen reader compatibility and tactile-friendly design.

Natural Language Interaction:

The focal point of this venture was to offer an intuitive, human-like interaction experience.
The system, having undergone intensive development, now thrives as a beacon of natural
language interaction. This accomplishment is chiefly attributed to the integration of
state-of-the-art Speech-to-Text (STT) and Text-to-Speech (TTS) technologies. The STT
component exhibits a commendable accuracy rate, facilitating seamless voice commands.
Meanwhile, the TTS engine has been fine-tuned to deliver a remarkably lifelike auditory
rendition of news articles. Users can now engage in articulate dialogues with the system,
enjoying a reading experience that is indistinguishable from human narration.

Personalized News Experiences:

The confluence of user preferences and artificial intelligence has given rise to the crux of personalization. The system, as part of its integral design, now offers users the capability to curate their news experiences. It employs collaborative filtering algorithms to suggest news articles based on past preferences, thus ensuring that each user receives a tailor-made newsfeed.

Device and Network Adaptability:

In the quest to ensure a uniform user experience, regardless of device or network conditions, the platform has undergone meticulous optimization. The system adapts seamlessly to an array of devices and accommodates variable network speeds, thus ensuring uninterrupted access and interaction across diverse technological landscapes.

Anticipated User Engagement:

The tangible manifestation of these achievements is the remarkable enhancement of user engagement. Users have lauded the platform's accessible design, its impeccably natural voice interaction, and its capacity for content curation. Preliminary data indicates that user engagement, measured by interaction frequency and duration, has experienced a significant upswing.

The Voice-Powered News Portal Project, a testament to unwavering commitment, has transcended its objectives, materializing as a technological marvel that epitomizes inclusivity and user empowerment. It is poised to foster a renewed era of accessible information dissemination, marking a resounding victory in the realm of digital accessibility. This achievement is a testament to the synergy of technology and human-centric design, forging a path toward a more inclusive and equitable digital landscape.

4.2 Comparison between voice powered news portal and existing system
Aspect: Accessibility
Voice-Powered News Portal: Highly accessible, designed with visually challenged users in mind, prioritizing usability and inclusivity.
Existing System: May lack comprehensive accessibility features, potentially limiting use by visually challenged individuals.

Aspect: User Interaction
Voice-Powered News Portal: Employs advanced voice recognition and synthesis technologies, providing natural voice interactions.
Existing System: Typically relies on traditional input methods (e.g., keyboard and mouse), potentially excluding some users.

Aspect: Security and Privacy
Voice-Powered News Portal: Prioritizes robust security measures to protect user data and ensure secure interactions with external APIs.
Existing System: Security and privacy measures may vary, potentially posing risks to user data and interactions.

Aspect: Device Adaptability
Voice-Powered News Portal: Designed to adapt to various devices and network conditions, ensuring a consistent user experience.
Existing System: May lack adaptability, resulting in inconsistent experiences across different devices and networks.

Aspect: Features and Scopes
Voice-Powered News Portal: Offers a wide range of future scopes, including advanced personalization, multilingual support, and integration with IoT devices.
Existing System: May have limited capacity to expand features and scopes, potentially hindering its growth and adaptability.

Table 4.2.1 – Comparison between voice powered news portal and existing system

This comparison underscores how the proposed Voice-Powered News Portal project is
tailored to address the unique needs and challenges of visually challenged users,
emphasizing accessibility, security, and user support. In contrast, the existing system may
have limitations in these areas, potentially hindering its effectiveness and inclusivity.

The implementation of the voice powered news portal is shown in the figures below (Figure 4.2.1 and Figure 4.2.2). The first figure (Figure 4.2.1) shows the homepage of the news portal; when a voice command is given, the system interprets the speech and converts it into text. The second figure shows the response of the portal when it is asked to provide news.

Figure 4.2.1: Home page of voice powered news portal

Figure 4.2.2: News cards shown by the voice powered news portal

CHAPTER 5
CONCLUSION

The Voice-Powered News Portal project represents an innovative and transformative step
toward ensuring accessibility, inclusivity, and empowerment for visually challenged
individuals in the digital age. With a laser focus on bridging the informational divide, this
project leverages cutting-edge voice recognition and synthesis technologies to deliver an
intuitive and user-friendly platform where visually impaired users can independently access
real-time news.

The objectives of the project, which include news curation, voice interaction, and the
integration of Speech-to-Text (STT) and Text-to-Speech (TTS) technologies, have been
meticulously realized. Users can effortlessly tailor their news experience, from selecting
topics of interest to choosing their preferred language, thereby ensuring that the news
content they receive is both relevant and accessible. Through the seamless integration of
STT and TTS, natural voice interactions are enabled, making news retrieval a
straightforward and interactive experience.

What makes this project truly significant is its potential to break down long-standing
barriers and empower the visually challenged community. It not only grants them access to
news but also facilitates their active engagement in the broader societal discourse. The
project's social sharing and feedback features further enhance user participation, fostering
inclusivity and knowledge equity.

In essence, the Voice-Powered News Portal is more than just a news platform; it is a
testament to the power of technology to level the playing field and open up new avenues for
those who have long been underserved. It envisions a future where visual impairment is not
a hindrance to staying informed, participating actively, and integrating into society. This project
sets the stage for a more accessible and equitable digital world, where everyone's voice is
not just heard but celebrated. It is a testament to the potential of technology to drive positive
social change and promote inclusivity.

CHAPTER 6
FUTURE SCOPE

Enhanced User Personalization: The platform's future lies in more intricate user
personalization. By incorporating machine learning algorithms, it can discern and adapt to
individual preferences, curating news content with remarkable precision. Users will benefit
from a tailor-made news experience, receiving content that aligns closely with their
interests, thus elevating engagement and relevance.

Multilingual Support: The project's global impact hinges on its capacity to transcend
linguistic boundaries. Expanding the platform's language support opens doors to a diverse
user base worldwide. By accommodating multiple languages, it becomes an accessible
resource for visually challenged individuals from different corners of the globe, fostering a
truly inclusive digital environment.

Integration with IoT Devices: The project's reach can extend into the realms of the Internet
of Things (IoT) by seamlessly integrating with smart devices like speakers, headphones,
and connected appliances. Users can access news conveniently through these devices,
enhancing the platform's accessibility and adaptability in diverse technological ecosystems.

Offline Mode: Recognizing the diversity of connectivity conditions, the development of an offline mode is paramount. Users with limited connectivity, particularly in remote or low-bandwidth areas, will experience uninterrupted access to news content. This feature reinforces the platform's commitment to inclusivity and ensuring access for all.
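One plausible way to realize such an offline mode in a web-based portal is a cache-first service worker. The file name (sw.js), cache name, and caching rule below are illustrative assumptions, not part of the current implementation:

```javascript
// Hypothetical service worker (sw.js) giving the portal a basic offline mode.
const CACHE_NAME = 'news-portal-offline-v1'; // illustrative cache name

// Pure helper: only same-origin GET requests are worth caching offline.
function isCacheableRequest(method, url, origin) {
  return method === 'GET' && url.startsWith(origin);
}

// Browser-only wiring (self is the ServiceWorkerGlobalScope):
if (typeof self !== 'undefined' && typeof self.addEventListener === 'function') {
  self.addEventListener('fetch', (event) => {
    if (!isCacheableRequest(event.request.method, event.request.url, self.location.origin)) return;
    event.respondWith(
      caches.match(event.request).then((cached) =>
        cached ||
        fetch(event.request).then((response) => {
          // Store a copy so the next visit works without a network.
          const copy = response.clone();
          caches.open(CACHE_NAME).then((cache) => cache.put(event.request, copy));
          return response;
        })
      )
    );
  });
}
```

Serving cached responses first means previously fetched headlines remain readable (and speakable) when the connection drops.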

Expanded Content Types: Beyond news articles, diversifying content types enriches the
user experience. The inclusion of audio versions of magazines, blogs, and educational
materials broadens the platform's utility. Users can delve into a broader spectrum of content,
enhancing its role as an educational and informative resource.

Community Features: Fostering a sense of community is pivotal. Features that enable users
to connect, share experiences, and discuss news topics empower social interaction. This not
only enhances engagement but also creates a supportive ecosystem where users can connect
with peers, furthering the platform's societal impact.

Collaboration with Educational Institutions: Educational outreach is a promising avenue. Partnering with schools and universities can position the platform as an educational resource for visually challenged students. It provides a unique opportunity to enrich the learning experiences of these individuals, thereby reinforcing its educational dimension.

Integration with E-Libraries: Access to a vast array of reference materials is transformative. Connecting with digital libraries grants users access to an extensive collection of books, textbooks, and reference materials. This augments the platform's educational facet, making it an invaluable resource for research and learning.

Voice Assistant Integration: Integration with popular voice assistants enhances accessibility.
It broadens the platform's reach, enabling users to interact with it through well-known voice
assistant systems like Siri, Google Assistant, or Amazon Alexa. This expansion amplifies
its user base and accessibility.

Content Creation Tools: User-generated content fosters inclusivity. Developing tools that
enable visually challenged users to create and publish their content, such as blogs or
podcasts, promotes a more inclusive digital environment. Users can become content
creators, adding their voices to the digital discourse.

Global Outreach: The project's global impact is anchored in its capacity to transcend
boundaries. By expanding outreach to more countries and regions, tailoring the platform to
local needs, and collaborating with local advocacy groups, it can further its mission of
global inclusivity and accessibility.

Social and Policy Advocacy: Advocating for societal change is a noble pursuit. Leveraging
the platform's influence to drive policy changes, enhance accessibility, and foster social
inclusion for visually challenged individuals is a potent scope for the future. It positions the
platform as a catalyst for broader societal transformation.

Research and Innovation Hub: Beyond functionality, the project can serve as a nucleus of
research and innovation in accessibility technology. It can actively engage with developers
and researchers, driving advances in the field and disseminating knowledge to foster
continuous improvement in digital accessibility.

CHAPTER 7
REFERENCES
[1] R. Liu, B. Sisman, G. Gao and H. Li, "Decoding Knowledge Transfer for Neural Text-to-Speech Training," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1789-1802, 2022, doi: 10.1109/TASLP.2022.3171974.

[2] U. Gawande, N. Rathod, P. Bodkhe, P. Kolhe, H. Amlani and C. Thaokar, "Novel Machine Learning based Text-To-Speech Device for Visually Impaired People," 2023 2nd International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), Villupuram, India, 2023, pp. 1-5, doi: 10.1109/ICSTSN57873.2023.10151637.

[3] J. F. Pitrelli, R. Bakis, E. M. Eide, R. Fernandez, W. Hamza and M. A. Picheny, "The IBM expressive text-to-speech synthesis system for American English," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1099-1108, July 2006, doi: 10.1109/TASL.2006.876123.

[4] W. Zhao, T. He and L. Xu, "Enhancing Local Dependencies for Transformer-Based Text-to-Speech via Hybrid Lightweight Convolution," in IEEE Access, vol. 9, pp. 42762-42770, 2021, doi: 10.1109/ACCESS.2021.3065736.

[5] H. Yoon, C. Kim, S. Um, H.-W. Yoon and H.-G. Kang, "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems," in IEEE Signal Processing Letters, vol. 30, pp. 593-597, 2023, doi: 10.1109/LSP.2023.3277786.

[6] Zhaogeng Li, Jun Bi and Sen Wang, "HTTP-CCN gateway: Adapting HTTP protocol to Content Centric Network," 2013 21st IEEE International Conference on Network Protocols (ICNP), Goettingen, 2013, pp. 1-2, doi: 10.1109/ICNP.2013.6733636.

[7] J. Min and Y. Lee, "An Experimental View on Fairness between HTTP/1.1 and HTTP/2," 2019 International Conference on Information Networking (ICOIN), Kuala Lumpur, Malaysia, 2019, pp. 399-401, doi: 10.1109/ICOIN.2019.8718119.

[8] L. Li, W. Chou, W. Zhou and M. Luo, "Design Patterns and Extensibility of REST API for Networking Applications," in IEEE Transactions on Network and Service Management, vol. 13, no. 1, pp. 154-167, March 2016, doi: 10.1109/TNSM.2016.2516946.

[9] S. Stoudenmier and A. Olmsted, "Efficient retrieval of information from hierarchical REST requests," 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Cambridge, UK, 2017, pp. 452-454, doi: 10.23919/ICITST.2017.8356445.

[10] H. Wenhui, H. Yu, L. Xueyang and X. Chen, "Study on REST API Test Model Supporting Web Service Integration," 2017 IEEE 3rd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), Beijing, China, 2017, pp. 133-138, doi: 10.1109/BigDataSecurity.2017.35.

[11] M. K. Yusof, M. Man and A. Ismail, "Design and Implement of REST API for Data Integration," 2022 International Conference on Engineering and Emerging Technologies (ICEET), Kuala Lumpur, Malaysia, 2022, pp. 1-4, doi: 10.1109/ICEET56468.2022.10007414.

[12] N. Vohra and I. B. Kerthyayana Manuaba, "Implementation of REST API vs GraphQL in Microservice Architecture," 2022 International Conference on Information Management and Technology (ICIMTech), Semarang, Indonesia, 2022, pp. 45-50, doi: 10.1109/ICIMTech55957.2022.9915098.

[13] P. Alafaireet, "Graphic User Interface: Needed Design Characteristics for Successful Physician Use," 2006 ITI 4th International Conference on Information & Communications Technology, Cairo, Egypt, 2006, pp. 1-1, doi: 10.1109/ITICT.2006.358261.

[14] Cai Xinyuan, "Semantic transformation in user interface design," 2008 9th International Conference on Computer-Aided Industrial Design and Conceptual Design, Beijing, China, 2008, pp. 137-140, doi: 10.1109/CAIDCD.2008.4730537.

[15] C. Wongchokprasitti and P. Brusilovsky, "NewsMe: A Case Study for Adaptive News Systems with Open User Model," Third International Conference on Autonomic and Autonomous Systems (ICAS'07), Athens, Greece, 2007, pp. 69-69, doi: 10.1109/CONIELECOMP.2007.88.

[16] J. Zhou, X. Miao, F. He and Y. Miao, "Effects of Font Style and Font Color in News Text on User Cognitive Load in Intelligent User Interfaces," in IEEE Access, vol. 10, pp. 10719-10730, 2022, doi: 10.1109/ACCESS.2022.3151915.

[17] N. K. Giang, M. Ha and D. Kim, "SCoAP: An integration of CoAP protocol with web-based application," 2013 IEEE Global Communications Conference (GLOBECOM), Atlanta, GA, USA, 2013, pp. 2648-2653, doi: 10.1109/GLOCOM.2013.6831474.

[18] R. D. Lero, C. Exton and A. Le Gear, "Communications using a speech-to-text-to-speech pipeline," 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Barcelona, Spain, 2019, pp. 1-6, doi: 10.1109/WiMOB.2019.8923157.

[19] R. Luo et al., "Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 5699-5703, doi: 10.1109/ICASSP39728.2021.9414403.

[20] C. Zhang et al., "Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 2021, pp. 7063-7067, doi: 10.1109/ICASSP39728.2021.9413934.

[21] A. Acero, "An overview of text-to-speech synthesis," 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421), Delavan, WI, USA, 2000, pp. 1-, doi: 10.1109/SCFT.2000.878372.

[22] P. Mukherjee, S. Santra, S. Bhowmick, A. Paul, P. Chatterjee and A. Deyasi, "Development of GUI for Text-to-Speech Recognition using Natural Language Processing," 2018 2nd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), Kolkata, India, 2018, pp. 1-4, doi: 10.1109/IEMENTECH.2018.8465238.

APPENDIX

Source code files included in the report:

index.html
manifest.json
InfoCards.js
styles.js
NewsCard.js
styles.js
NewsCards.js
styles.js
App.js
index.css
index.js
styles.js
alan_code.txt
package-lock.json
package.json
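Among the files above, alan_code.txt holds the voice script that maps spoken prompts to news queries. As a rough illustration only, the kind of command routing such a script performs can be sketched in plain JavaScript; the category list, phrasings, and function name here are assumptions for illustration, not the project's actual code:

```javascript
// Hypothetical sketch: map a spoken utterance to a news query.
// Categories follow the common news-API convention; not the project's exact list.
const CATEGORIES = ['business', 'entertainment', 'health', 'science', 'sports', 'technology'];

function routeCommand(utterance) {
  const text = utterance.toLowerCase();

  // Category request, e.g. "Give me the latest technology news".
  for (const category of CATEGORIES) {
    if (text.includes(category)) {
      return { type: 'category', value: category };
    }
  }

  // Source request, e.g. "Play news from the times of india";
  // slugify the source name as news APIs commonly expect.
  const bySource = text.match(/news from (.+)/);
  if (bySource) {
    return { type: 'source', value: bySource[1].trim().replace(/\s+/g, '-') };
  }

  // Fallback: general headlines.
  return { type: 'headlines', value: null };
}

console.log(routeCommand('Give me the latest technology news'));
```

The result object would then drive the REST call that fetches articles for the matched category or source, with the headlines branch as the default when no keyword matches.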
ORIGINALITY REPORT

Similarity Index: 3% (Internet Sources: 2%, Publications: 3%, Student Papers: 1%)

PRIMARY SOURCES

1. V. Madhusudhana Reddy, T. Vaishnavi, K. Pavan Kumar, "Speech-to-Text and Text-to-Speech Recognition Using Deep Learning", 2023 2nd International Conference on Edge Computing and Applications (ICECAA), 2023 (Publication): 1%
2. Submitted to Queensland University of Technology (Student Paper): 1%
3. Lecture Notes in Computer Science, 2006 (Publication): <1%
4. Xie, Jingming, "The design of a mobile English learning system for higher vocational students", International Journal of Information Technology and Management, 2014 (Publication): <1%
5. Submitted to Manipal University (Student Paper): <1%
6. docplayer.net (Internet Source): <1%
7. www.geeksforgeeks.org (Internet Source): <1%
8. Submitted to University of Surrey (Student Paper): <1%
9. Dimitrichka Nikolaeva, "An Elementary Emulator Based on Speech-To-Text and Text-to-Speech Technologies for Educational Purposes", 2023 XXXII International Scientific Conference Electronics (ET), 2023 (Publication): <1%
10. apkflash.com (Internet Source): <1%
11. publications.polymtl.ca (Internet Source): <1%

Exclude quotes: Off. Exclude matches: Off. Exclude bibliography: Off.
