Speech Recognition


Minor Project On

Speech Recognition

Project Guide: Chinmaya Ku. Swain, Asst. Prof., Comp. Sci. Engg.

Completed by: Ashish Kumar (0811012097), 4th year (Computer Sc. & Engg.), ITER (Bhubaneswar)

1. Introduction to Speech Recognition
2. Steps of Speech Recognition
3. Speech Recognition Algorithm
   3.1 Dynamic Time Warping Algorithm
   3.2 Hidden Markov Model
4. Speech Recognition Software
5. Future Scope of Speech Recognition
6. Conclusion

Speech recognition is the process by which a computer (or other type of machine) identifies spoken words. It is an alternative to typing on a keyboard.

1. The analog wave is converted to digital data.
2. The digital data is divided into small segments.
3. The small segments are matched with known phonemes.
4. The extracted phonemes are compared with other phonemes stored in library files.
5. The speech recognition software tries to work out the actual word using a complex statistical model such as a Hidden Markov Model.
6. The output is given in the form of text or speech.

The analog signal is converted into a digital signal by the sound card, which takes samples of the analog signal at regular intervals. The digital signal is then filtered to remove unwanted noise and normalised to match the speed of the template sound samples already stored in the system's memory.
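The sampling and clean-up stage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "analog" input is simulated by a hypothetical 440 Hz tone, the noise filter is a plain moving average, and the normalisation shown is a simple peak (amplitude) normalisation rather than the speed matching described above, which the matching algorithms handle later. All function names are made up for this example.

```python
import math

def sample_signal(analog, duration_s, rate_hz):
    """Take regular samples of a continuous-time signal analog(t)."""
    n = int(round(duration_s * rate_hz))
    return [analog(i / rate_hz) for i in range(n)]

def quantize(samples, bits=16):
    """Map samples in [-1.0, 1.0] to signed integers, as a converter would."""
    top = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, s)) * top)) for s in samples]

def moving_average(samples, width=5):
    """Crude low-pass filter: average each sample with its neighbours."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def peak_normalize(samples):
    """Scale so that the largest absolute value becomes 1.0."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak else list(samples)

# Hypothetical "analog" input: a 440 Hz tone at half amplitude.
def tone(t):
    return 0.5 * math.sin(2 * math.pi * 440 * t)

digital = quantize(sample_signal(tone, duration_s=0.01, rate_hz=8000))
clean = peak_normalize(moving_average(digital))
```

A real sound card and front end would use proper anti-aliasing and noise filters; the moving average is used here only because it is easy to read.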

Speech recognition techniques generally take a database approach to comparison and recognition. An internal database module, consisting of a set of pre-defined speech sequences, is compared with the user input by one of the following speech algorithms: the Dynamic Time Warping algorithm or the Hidden Markov Model algorithm.

Dynamic Time Warping (DTW) is used mainly for pattern recognition and pattern comparison. It is a method for finding an optimal match between two given sequences: the sequences are warped non-linearly along the time dimension, and their similarity is then measured independently of non-linear variations in time.
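A minimal sketch of dynamic time warping, assuming one-dimensional numeric sequences and absolute difference as the local cost (real recognisers compare feature vectors per frame). The function name and the sample sequences are illustrative only.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, half speed
other = [3, 3, 3, 0, 0, 0, 3]

print(dtw_distance(template, slow))   # 0.0: the warp absorbs the speed change
print(dtw_distance(template, other))  # a positive cost: the shapes differ
```

The point of the example is that `slow` matches `template` with zero cost even though it is twice as long, which a sample-by-sample comparison could not do.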

Sequence alignment is used when one of the given sequences is incomplete or damaged. Segments are matched and recognised either manually or automatically. It gives less importance to continuity and is better suited to matching sequences with missing information, provided a long enough string is present. It is used in traditional speech recognition, where the input voice string is compared with built-in speech tracks to find the meaning of what was said.

Any two sequences may be interpreted as parts of a common larger sequence, or one may be a part of the other. Directly comparing a damaged or incomplete sequence with a previously acquired complete sequence may produce a mismatch, because the start or end substring may be damaged, so alignment may be required first.
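One way to align a damaged fragment against a complete stored sequence is approximate substring matching by edit distance, where the match may start and end anywhere in the stored sequence. The sketch below assumes that technique; the slides do not say which alignment method was used, and the function name and sample strings are hypothetical. The symbols are characters here, but they could equally be phoneme labels.

```python
def best_fragment_match(fragment, reference):
    """Find where `fragment` fits best inside `reference`.

    Returns (edit_cost, end_index), where end_index is the position in
    `reference` just past the best-matching region.
    """
    # prev[j] = cheapest cost of matching the fragment prefix seen so far
    # against some substring of reference ending at j. Row 0 is all zeros,
    # so a match may start anywhere in the reference at no cost.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(fragment) + 1):
        cur = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            sub = prev[j - 1] + (fragment[i - 1] != reference[j - 1])
            cur[j] = min(sub,              # match or substitute a symbol
                         prev[j] + 1,      # fragment symbol has no partner
                         cur[j - 1] + 1)   # reference symbol has no partner
        prev = cur
    cost = min(prev)
    return cost, prev.index(cost)

stored = "speechrecognition"
print(best_fragment_match("recognit", stored))  # (0, 14): exact fragment
print(best_fragment_match("recagnit", stored))  # (1, 14): one damaged symbol
```

Because the first row is free, a fragment with a missing start or end is not penalised for the parts of the stored sequence it does not cover.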


A Hidden Markov Model (HMM) is a statistical model that can be represented as a simple dynamic Bayesian network. It is widely used in temporal pattern recognition, such as speech, handwriting and gesture recognition and part-of-speech tagging. An HMM used in speech recognition views the speech signal as a piecewise stationary signal, or a short-time stationary signal, over intervals of about 10 ms.
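The short-time stationarity assumption is what allows the signal to be cut into frames that are each treated as one observation. A minimal sketch, assuming non-overlapping 10 ms frames and mean squared amplitude as a stand-in feature; practical front ends commonly use overlapping windows and spectral features instead. The function names are illustrative.

```python
def split_into_frames(samples, rate_hz, frame_ms=10):
    """Cut a sample list into consecutive, non-overlapping frames."""
    size = rate_hz * frame_ms // 1000            # samples per frame
    usable = len(samples) - len(samples) % size  # drop the ragged tail
    return [samples[i:i + size] for i in range(0, usable, size)]

def frame_energy(frame):
    """Mean squared amplitude, a very simple per-frame feature."""
    return sum(s * s for s in frame) / len(frame)

# 250 samples at 8000 Hz: three full 80-sample frames, 10 samples dropped.
frames = split_into_frames([1] * 250, rate_hz=8000)
energies = [frame_energy(f) for f in frames]
```

Each entry of `energies` would then play the role of one observation in the sequence that the HMM scores.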


It is a probabilistic approach: a number of possible outcomes are scored, and the one with the maximum likelihood is selected. An HMM may be considered a generalization of a mixture model in which the latent variables are related through a Markov process rather than being independent of each other.
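Selecting the most likely hidden-state sequence for a series of observations is usually done with the Viterbi algorithm. The sketch below assumes that algorithm and a toy two-state model (silence "S" and voiced "V", observing "low" or "high" frame energy); the states, probabilities and names are invented for illustration and do not come from the slides.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state path."""
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximises the path probability.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])

states = ["S", "V"]
start_p = {"S": 0.8, "V": 0.2}
trans_p = {"S": {"S": 0.7, "V": 0.3}, "V": {"S": 0.2, "V": 0.8}}
emit_p = {"S": {"low": 0.9, "high": 0.1}, "V": {"low": 0.2, "high": 0.8}}

prob, path = viterbi(["low", "high", "high", "low"],
                     states, start_p, trans_p, emit_p)
print(path)  # ['S', 'V', 'V', 'S']
```

Real recognisers work with log probabilities to avoid underflow on long utterances; plain products are kept here so the arithmetic is easy to follow by hand.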


After the input speech sequence has been processed by the various algorithms, refinement techniques may be applied to improve recognition for a particular speaker. Vocal Tract Length Normalization (VTLN): compensates for differences in vocal tract length between speakers (for example, between male and female voices) by warping the frequency axis of the recorded speech. Maximum Likelihood Linear Regression (MLLR): adapts the parameters of the acoustic model to a speaker by applying a linear transform chosen to maximise the likelihood of that speaker's data.
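In its usual formulation, maximum likelihood linear regression adapts each Gaussian mean of the acoustic model with an affine transform, mu' = A mu + b. The sketch below shows only how such a transform is applied; the matrix A and offset b are hand-picked, hypothetical values, whereas in practice they are estimated by maximising the likelihood of a speaker's adaptation data.

```python
def adapt_mean(mean, A, b):
    """Apply the affine update mu' = A * mu + b to one mean vector."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]

# Hypothetical 2-dimensional model mean and transform.
mean = [1.0, 2.0]
A = [[1.0, 0.5],
     [0.0, 2.0]]
b = [0.5, -1.0]

print(adapt_mean(mean, A, b))  # [2.5, 3.0]
```

Sharing one transform across many Gaussians is what lets the model adapt from only a small amount of speech from the new speaker.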


Most recognition software only performs up to the string matching and recognition phase. The future of this technique is to achieve speech understanding through Natural Language Processing (NLP). NLP is the branch of AI that enables a computer to understand human language, process it and generate an appropriate response. It gives an analytical computer a human personality.


- Sphinx toolkit, from Carnegie Mellon University
- MacSpeech Dictate (2010), for Macintosh
- iListen, for Macintosh
- ViaVoice, developed by IBM Corporation
- Windows Speech Recognition, for Windows XP/Vista/7
- Siri personal digital assistant, by Apple Inc.


- Universal translator, combining automatic translation and voice activation technologies.
- The GALE project, run by DARPA, focuses on instant translation between two languages, with about 90% accuracy.
- TRANSTAC, a DARPA-funded R&D project: language processing software that lets soldiers on foreign soil communicate with the locals.


Despite growing sophistication and a high level of research, speech recognition is still at an early stage. Its main route forward is a merger with Natural Language Processing. In this age of automation, the prospects for speech processing are limitless.


