Major Project 1

The Team
Philemon Daniel: Major Project Instructor
Ritesh Pushkar, Tanuj Kumar, Virendra Saini, Animesh Baranyal
194114, 194115, 194113, 194105

Text to speech software
…
September 26, 2022
Overview
Text-to-speech (TTS) is the capacity of a computer or device to read text aloud through
software such as Speechify. The benefits of this technology are far-reaching. Research has
shown that text-to-speech technology improves accessibility, facilitates comprehension, and
creates a more efficient learning environment. It can be used to assist students, to meet
business needs, to support children, or simply for personal enjoyment. Individuals with
disabilities such as dyslexia or blindness benefit particularly from this kind of speech
software.
Understanding the problem
Single Box Text
The text in any single box is spoken continuously from top to bottom. Individual boxes are
spoken in the order they are added to the slide. There appears to be no way to alter this,
and the way to force a different order is to create a new slide and copy the text boxes in
the required order.

Animated text
Spoken in the order that the animations play, waiting for each to complete before speaking
the next. However, if a number of animations are used within a single text box, PowerTalk
waits for the first animation only and then speaks all the remaining text. You may want to
break the text into separate boxes and animate each.

Text for Images
It is possible to specify the text spoken for images (and indeed any object including text)
by entering Alternative Web Text for the image. This is done as follows: right click on the
image, select 'format object' ('size and shape'), select the Web tab and enter the
alternative text to be spoken. If you enter a space then nothing is spoken at all.
Natural Language Processing (NLP) module: It produces a phonetic transcription of the text
read, together with prosody.
Major operations of the NLP module

Text Analysis: First the text is segmented into tokens. The token-to-word conversion then
creates the orthographic form of each token. For the token “Mr”, the orthographic form is
“Mister”.
Application of Pronunciation Rules: After the text analysis has been completed,
pronunciation rules can be applied. Letters cannot be transformed 1:1 into phonemes because
the correspondence is not always parallel.
Prosody Generation: Once the pronunciation has been determined, the prosody is generated.
The degree of naturalness of a TTS system depends on prosodic factors such as intonation
modelling, amplitude modelling and duration modelling.
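
The first two NLP operations can be illustrated with a minimal Python sketch. It is not the project's implementation: the abbreviation table, the exception lexicon, the letter rules and the ARPAbet-style phoneme symbols are all hypothetical and chosen only for illustration. The digraph rules show why letters cannot be mapped 1:1 to phonemes.

```python
import re

# Hypothetical abbreviation table used for token-to-word conversion.
ABBREVIATIONS = {"mr": "mister", "dr": "doctor"}

# Hypothetical exception lexicon: words pronounced by lookup, not by rule.
LEXICON = {"mister": ["M", "IH", "S", "T", "ER"]}

# Toy pronunciation rules: two letters can map to a single phoneme.
DIGRAPHS = {"sh": "SH", "th": "TH", "ch": "CH", "ph": "F"}
SINGLE = {"a": "AE", "e": "EH", "i": "IH", "o": "AA", "u": "AH", "c": "K"}


def tokenize(text):
    """Segment text into word, number and punctuation tokens."""
    return re.findall(r"[A-Za-z]+|\d+|[.,!?;]", text)


def normalize(tokens):
    """Token-to-word conversion: expand abbreviations, keep plain words."""
    words = []
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok in ABBREVIATIONS:
            words.append(ABBREVIATIONS[tok])
            # Skip the period that belongs to the abbreviation, if present.
            if i + 1 < len(tokens) and tokens[i + 1] == ".":
                i += 1
        elif tok.isalpha():
            words.append(tok)
        # Numbers and punctuation are ignored in this sketch.
        i += 1
    return words


def to_phonemes(word):
    """Apply the lexicon first, then the letter rules (digraphs before single letters)."""
    if word in LEXICON:
        return list(LEXICON[word])
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            # Letters without a rule fall back to their upper-case form.
            phonemes.append(SINGLE.get(word[i], word[i].upper()))
            i += 1
    return phonemes


def front_end(text):
    """Run text analysis followed by the pronunciation rules."""
    return [(word, to_phonemes(word)) for word in normalize(tokenize(text))]
```

For example, `front_end("Mr. Smith")` expands "Mr" to "mister" through the abbreviation table before any pronunciation rule is applied. A real system would use a far larger lexicon and context-sensitive rules.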
The output of the NLP module is passed to the DSP module, where the actual synthesis of the
speech signal happens. In concatenative synthesis, speech segments are selected and linked
together. For each individual sound, the best option (where several appropriate options are
available) is selected from a database, and the selected segments are concatenated.
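
The selection and concatenation step can be sketched as follows. This is an illustrative toy under stated assumptions, not the engine used in the project: the `Unit` type, the tiny `DATABASE`, the cost weights and the sample values are all hypothetical. It also uses a simple greedy choice per sound, whereas production unit-selection systems typically search over the whole sequence.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One recorded speech segment (hypothetical structure)."""
    phoneme: str
    pitch_hz: float
    samples: list  # waveform samples as floats


# Hypothetical database: several candidate recordings per sound.
DATABASE = {
    "HH": [Unit("HH", 110.0, [0.0, 0.1, 0.2]), Unit("HH", 140.0, [0.0, 0.3, 0.5])],
    "AY": [Unit("AY", 120.0, [0.2, 0.4, 0.1]), Unit("AY", 135.0, [0.6, 0.4, 0.0])],
}


def select_units(phonemes, target_pitch_hz, join_weight=100.0):
    """Greedily pick, for each sound, the candidate with the lowest total cost."""
    chosen = []
    for phoneme in phonemes:

        def cost(unit):
            # Target cost: how far the unit is from the requested pitch.
            target_cost = abs(unit.pitch_hz - target_pitch_hz)
            # Join cost: amplitude jump at the boundary with the previous unit.
            join_cost = 0.0
            if chosen:
                join_cost = abs(chosen[-1].samples[-1] - unit.samples[0])
            return target_cost + join_weight * join_cost

        chosen.append(min(DATABASE[phoneme], key=cost))
    return chosen


def concatenate(units):
    """Link the selected segments into one output signal."""
    signal = []
    for unit in units:
        signal.extend(unit.samples)
    return signal
```

Calling `concatenate(select_units(["HH", "AY"], 115.0))` returns a single list of samples built from one candidate per sound. The join cost is what discourages audible discontinuities at segment boundaries.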
[Figure: The DSP component of a general concatenation-based synthesizer.]
[Figure: Operations of the natural language processing module of a TTS synthesizer.]
[Figure: TTSR interface when a text document is loaded into it.]
CONCLUSION
Text-to-speech synthesis is a rapidly growing area of computer
technology and plays an increasingly important role in the way we
interact with systems and interfaces across a variety of platforms.
We have identified the various operations and processes involved in
text-to-speech synthesis. We have also developed a simple and
attractive graphical user interface that allows the user to type
text into the text field provided in the application. Our system
interfaces with a text-to-speech engine developed for American
English. In future, we plan to create engines for local languages
so as to make text-to-speech technology accessible to a wider
range of people.
Timeline
Dates: 26 Sept 2022, 20 Nov 2022
Milestones: Deployment, Speech to text, Project Submit, Global go-live
Major Project 2