
Voice Recognition Using MATLAB

Presented by: Avienash Raibole, Paresh Meshram, Vinayak Kolpek

The purpose of our project is to implement an efficient voice recognition algorithm using MATLAB. Voice recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The recognised words can be an end in themselves, as in applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding.

What We Can Do with Voice Recognition

Transcription: dictation, information retrieval
Command and control: data entry, device control, navigation, call routing
Information access: airline schedules, stock quotes, directory assistance
Problem solving: travel planning, logistics


Speaker Recognition Methods

Text-dependent: the speaker's identity is established from his or her speaking one or more specific phrases.
Text-independent: speaker models capture characteristics of a person's speech that show up irrespective of what is being said.


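[Figure: processing chain from continuous speech to mel cepstrum. Stages recovered from the diagram: continuous speech, frame blocking, Mel-frequency wrapping, mel cepstrum.]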
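The frame blocking stage named above can be illustrated with a short sketch. It is written in Python rather than MATLAB purely for illustration. The frame length of 256 samples, the hop of 100 samples, and the Hamming window applied to each frame are assumptions that the slides do not state, and `frame_blocks`, `hamming`, and `windowed_frames` are hypothetical helper names.

```python
import math


def frame_blocks(signal, frame_len, hop):
    """Split samples into overlapping frames of frame_len samples.

    Each frame starts hop samples after the previous one. Trailing samples
    that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += hop
    return frames


def hamming(n):
    """Hamming window of length n (assumed here; not named in the slides)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal, frame_len=256, hop=100):
    """Frame the signal and taper each frame with a Hamming window."""
    window = hamming(frame_len)
    return [
        [sample * weight for sample, weight in zip(frame, window)]
        for frame in frame_blocks(signal, frame_len, hop)
    ]
```

Because the hop is smaller than the frame length, consecutive frames overlap, so a sound that falls on a frame boundary is still fully covered by a neighbouring frame.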


Feature Extraction
Feature extraction is the stage that extracts a small amount of data from the voice signal that can later be used to represent each speaker. A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as (a) Linear Prediction Coding (LPC), (b) Mel-Frequency Cepstrum Coefficients (MFCC), and others.

MFCC is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies. To capture the phonetically important characteristics of speech, the signal is expressed on the mel frequency scale.
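The filter spacing described above can be made concrete with a small sketch. The slides give no formula for the mel scale, so the widely used mapping mel = 2595 log10(1 + f/700) is an assumption here. The code is Python for illustration only, and `hz_to_mel`, `mel_to_hz`, and `mel_center_frequencies` are hypothetical names.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed formula, see lead-in)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies in Hz of n_filters spaced evenly on the mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]
```

Centers that are evenly spaced in mels sit close together in Hz at low frequencies and progressively further apart at high frequencies, which is the behaviour the paragraph describes.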



How does it work?

1. Record a voice.
2. Digitized speech signal (.wav file).
3. Acoustic preprocessing (DFT + MFCC).
4. Extract feature vectors.
5. Speech recognizer (Dynamic Time Warping).
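The Dynamic Time Warping recognizer named above can be sketched as follows. This is a minimal textbook version in Python, not the project's MATLAB code: the Euclidean frame distance, the unconstrained step pattern, and the names `dtw_distance` and `recognize` are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Cumulative cost of the cheapest alignment of two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(features, templates):
    """Return the word whose stored template is closest under DTW.

    templates maps each word to its reference feature sequence.
    """
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

Because the alignment may repeat frames on either side, the same word spoken faster or slower can still match its template with low cost.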

Record the voice command (time domain). Transform it into the frequency domain using the Fourier transform and take the magnitude spectrum. Compare the spectra of the voice commands.
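The transform-and-compare steps above can be sketched in a few lines. This is an illustrative Python version using a direct O(N^2) DFT so that it needs no external libraries; in MATLAB the same spectrum would normally come from `abs(fft(x))`. The unit-norm scaling and the Euclidean comparison are assumptions, since the slides do not say how spectra are compared, and `magnitude_spectrum` and `spectral_distance` are hypothetical names.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitude of the DFT for bins 0..N/2 of a real-valued signal."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must be non-empty")
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum


def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two spectra after scaling each to unit norm.

    The scaling makes the comparison insensitive to overall loudness.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same length")

    def normalize(spec):
        norm = math.sqrt(sum(v * v for v in spec))
        return [v / norm for v in spec] if norm > 0 else list(spec)

    a, b = normalize(spec_a), normalize(spec_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A recorded command would then be assigned to whichever stored command has the smallest spectral distance to it.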

Applications

Device control.
Hands-free mobile phone use in a car.
Single-purpose command and control systems.
Voice verification.
Many more.

Advantages

The model trains much faster than other methods. It is able to reduce large datasets to a smaller number of codebook vectors. It is easy to implement and more accurate. Speech is a very natural way to interact, and it is not necessary to sit at a keyboard or work with a remote control. No training is required for users.
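The reduction of a large dataset to a small set of codebook vectors mentioned above is the idea behind vector quantization. The slides do not say how the codebook is built, so the sketch below assumes plain Lloyd (k-means style) iterations started from the first few training vectors. It is Python for illustration, and `nearest_codeword`, `train_codebook`, and `average_distortion` are hypothetical names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(vector, codebook):
    """Index of the codebook vector closest to the given vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=10):
    """Build a codebook of `size` vectors with Lloyd iterations.

    Assumption: the codebook starts from the first `size` training vectors.
    """
    if not 0 < size <= len(vectors):
        raise ValueError("size must be between 1 and the number of vectors")
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest_codeword(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(squared_distance(v, codebook[nearest_codeword(v, codebook)])
                for v in vectors)
    return total / len(vectors)
```

One way such a codebook could be used is to train one per speaker and attribute a new utterance to the speaker whose codebook gives the lowest average distortion.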

Limitations

The number of words that our program could recognize was limited: the more words we tried to add, the less accurate it became. The voice recognition program only works for the voice of the person it was trained on. The program is less accurate in noisy environments. Voice recognition works best if the microphone is close to the user.

Future of Voice Recognition

Better rejection of extraneous speech.
Better recognition of embedded commands.
Better efficiency on low-cost processors.
Standards for performance evaluation.
Increased portability.
Lower error rates.
Improved overall robustness.

Research Articles on Speech

Koester, H.H. (2006). Factors that Influence the Performance of Experienced Speech Recognition Users. Assistive Technology, 18(1): 56-76.
Koester, H.H. (2004). Usage, Performance, and Satisfaction Outcomes for Experienced Users of Speech Recognition. Journal of Rehabilitation Research and Development, 41(5): 739-754.
Koester, H.H. (2003). Abandonment of Speech Recognition Systems by New Users. Proceedings of RESNA 2003 Annual Conference, Atlanta, GA. Arlington, VA: RESNA Press.
Koester, H.H. (2002). User Performance with Speech Recognition Systems: A Literature Review. Assistive Technology, 13(2): 116-30.

