
Synopsis on

Performance Analysis of Combined Wavelet Transform and Artificial Neural Network for Isolated Marathi Digit Recognition

By

Atul Dattatraya Narkhede
Under the Supervision of

Dr. Milind Nemade

Faculty of Engineering
PACIFIC ACADEMY OF HIGHER EDUCATION AND
RESEARCH UNIVERSITY, UDAIPUR.

SUMMARY

Abstract

Introduction

Review of Literature

Research Gaps

Scope of Research

Research Objectives

Hypothesis

Tools & Techniques

Research Plan

Tentative Chapter Flow

References

ABSTRACT

Speech processing is useful for various applications such as mobile applications, healthcare, automatic translation, robotics, video games, transcription, audio and video database search, household applications, and language learning applications.

A speech recognition system has two major components: feature extraction and classification.

There are two dominant approaches to acoustic measurement: the temporal-domain (parametric) approach and the nonparametric frequency-domain approach.

The objective of our research is to investigate the combined performance of the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

We propose to derive effective, efficient, and noise-robust features from the frequency subbands of each frame. Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform (DWT), and each subband is then classified using an ANN.

Effective speech features and classification would improve speech quality, represent the speech signal in terms of frequency and bandwidth, and improve speech recognition.
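The frame-level decomposition described above can be sketched in a few lines. The following is an illustrative pure-Python sketch using the Haar wavelet; the actual work uses MATLAB, and the choice of mother wavelet and the three-level depth here are assumptions made purely for illustration:

```python
def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Splits an even-length frame into a low-frequency approximation
    band and a high-frequency detail band, each half the input length.
    """
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavedec(frame, levels):
    """Multi-level decomposition: the approximation band is split repeatedly,
    yielding one detail subband per level plus a final approximation."""
    subbands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        subbands.append(detail)
    subbands.append(approx)
    return subbands

# A 256-sample frame decomposed over 3 levels gives 4 frequency subbands.
frame = [float(i % 17) for i in range(256)]
bands = wavedec(frame, 3)
print([len(b) for b in bands])  # → [128, 64, 32, 32]
```

Each subband would then be summarized as a feature vector (for example, subband energies) and passed to the ANN classifier.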

INTRODUCTION

The performance of a speech processing system is usually measured in terms of recognition accuracy.

All speech recognizers include an initial signal processing front end that converts a speech signal into a more convenient and compressed form called feature vectors.

The feature extraction method plays a vital role in the speech recognition task.

The wavelet transform, with its flexible time-frequency window, is an appropriate tool for the analysis of non-stationary signals like speech.

In a speech signal, high frequencies are present very briefly at the onset of a sound, while lower frequencies appear later and persist for longer periods.

The DWT resolves all these frequencies well.

The DWT parameters contain the information of different frequency scales.

This helps in obtaining the speech information of the corresponding frequency band.

An artificial neural network (ANN) is an efficient pattern recognition mechanism that simulates the neural information processing of the human brain.

The computational intelligence of neural networks stems from their processing units, their characteristics, and their ability to learn.

During learning, the parameters of an NN vary over time; NNs are characterized by their capacity for local and parallel computation, simplicity, and regularity.

The wavelet transform is an elegant tool for the analysis of non-stationary signals like speech.

Results have shown that a hybrid architecture using discrete wavelet transforms and neural networks can effectively extract features from the speech signal for automatic speech recognition.
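As a minimal illustration of the classification stage, the forward pass of a one-hidden-layer network can be sketched as follows. The layer sizes and weight values are purely hypothetical (a real recognizer would learn them, e.g. by backpropagation in MATLAB's Neural Network Toolbox); the predicted digit is simply the index of the largest output:

```python
import math

def forward(x, w_hidden, w_out):
    """Forward pass of a minimal one-hidden-layer network (biases omitted).

    A feature vector x is mapped to per-class scores; the predicted
    digit is the index of the largest output.
    """
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

# Hypothetical weights: 3 input features, 2 hidden units, 2 classes.
w_h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_o = [[1.0, -1.0], [-1.0, 1.0]]
scores = forward([0.2, 0.7, 0.1], w_h, w_o)
predicted_digit = scores.index(max(scores))
```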

REVIEW OF LITERATURE
Remarkable observations in the review of work are as follows:

Speech features, usually obtained via Fourier transforms (FT), short-time Fourier transforms (STFT), or linear predictive coding techniques, are used for some kind of automatic speech/speaker recognition (ASR). They may not be suitable for representing speech/voice.

In spite of the improvement in computation time for the FFT, the recognition time of the proposed systems is still too long for real-time applications.

Multi-core and parallel processing for the speech recognition algorithm are necessary to further improve the recognition time and are worthwhile to examine in this research.

Conventional approaches such as Mel frequency cepstral coefficients (MFCC) and linear predictive coefficients (LPC) focus on spectral features limited to lower frequency bands.

The best recognition was obtained from the DWT decomposition when compared to MFCCs, for speaker-independent and speaker-dependent tasks respectively.

Wavelet transform approaches provided good results in clean, noisy, and reverberant environments, and also have much lower computational complexity.

Wavelet decomposition results in a logarithmic set of bandwidths, which is very similar to the response of the human ear to frequencies.

The wavelet transform efficiently locates spectral changes in the speech signal, and the beginning and end of sounds can also be located.

Results show that a hybrid architecture using discrete wavelet transforms and neural networks can effectively extract features from the speech signal for automatic speech recognition.

Artificial neural network performance depends on the size and quality of the training samples.

Simplifying the ANN architecture without reducing the recognition rate can also speed up the recognition time.

The recognition accuracy of the system can be improved by combining multiple classifiers.

RESEARCH GAPS

Feature extraction and classification are the major components of a speech recognition system and play a vital role in it, so an efficient representation of speech features and their classification is required.

To improve the accuracy of a speech recognition system, we can use a hybrid architecture consisting of the Wavelet Transform (WT) and an Artificial Neural Network (ANN).

The ANN architecture can be simplified without reducing the recognition rate.

Isolated Marathi digits should be recognized quickly by speeding up the recognition time.

SCOPE OF RESEARCH

The scope of our research is limited to investigating the combined performance of the Wavelet Transform (WT) and an Artificial Neural Network (ANN) for feature extraction and classification of isolated Marathi digits.

TOOLS & TECHNIQUES

MATLAB / Simulink programming environment

RESEARCH OBJECTIVES
The objective of our research is to investigate the combined performance of the Wavelet Transform (WT) and an Artificial Neural Network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

To derive effective, efficient, and noise-robust features from the frequency subbands of the frame using the discrete wavelet transform.

Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform.

Classification of each subband using an artificial neural network (ANN).

Determination of the accuracy of the speech recognition system.

ISOLATED DIGIT RECOGNITION

RESEARCH METHODOLOGY

Hypothesis
The combined use of the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits can improve the accuracy of the speech recognition system.

Tentative Chapter Flow

1. Introduction to Speech Processing, Wavelet Transform & ANN
2. Speech Feature Extraction using Wavelet Transform (WT)
3. Speech Feature Classification using Artificial Neural Network (ANN)
4. Performance Analysis of Speech Feature Extraction and Classification Techniques

RESEARCH PLAN

Activities scheduled across Phases I-VI:

Literature survey
Study of software tools such as MATLAB/SIMULINK, the Neural Network Toolbox and its MATLAB link
Survey of existing methods and algorithms
Suggesting techniques for removing limitations in existing algorithms
Simulation of combined strategies
Comparing results of developed strategies with existing algorithms
Performance evaluation and implementation
Documentation
Review & research paper preparation & presentation/publication

REFERENCES
[1] T. F. Quatieri, Discrete Time Speech Signal Processing, Pearson Education, 2002.
[2] R. M. Rao, A. S. Bopardikar, Wavelet Transform, Pearson Education, 2005.
[3] J. M. Zurada, Introduction to Artificial Neural Network, West, 1992.
[4] Yoshua Bengio, Renato De Mori, Regis Cardin, Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge, Department of Computer Science, McGill University, pp. 218-225, 1990.
[5] Bhiksha Raj, Lorenzo Turicchia, Bent Schmidt-Nielsen, and Rahul Sarpeshkar, An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007, pp. 1-13, 2007.
[6] Adam Glowacz, Witold Glowacz, Andrzej Glowacz, Sound Recognition of Musical Instruments with Application of FFT and K-NN Classifier with Cosine Distance, AGH University of Science and Technology, 2010.

[7] Gil Lopes, Fernando Ribeiro, Paulo Carvalho, Whistle Sound Recognition in Noisy Environment, Universidade do Minho, Departamento de Electrónica Industrial, Guimarães, Portugal.
[8] Shing-Tai Pan, Chih-Chin Lai and Bo-Yu Tsai, The Implementation of Speech Recognition Systems on FPGA-Based Embedded Systems with SOC Architecture, International Journal of Innovative Computing, Information and Control, vol. 7, no. 11, pp. 6161-6175, November 2011.
[9] Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar, Automatic Identification of Bird Calls Using Spectral Ensemble Average Voice Prints, 14th European IEEE Signal Processing Conference, pp. 1-5, 2006.
[10] Dwijen Rudrapal, Smita Das, S. Debbarma, N. Kar, N. Debbarma, Voice Recognition and Authentication as a Proficient Biometric Tool and its Application in Online Exam for P.H People, International Journal of Computer Applications (0975-8887), vol. 39, no. 12, pp. 7-12, February 2012.
[11] Asm Sayem, Speech Analysis for Alphabets in Bangla Language: Automatic Speech Recognition, International Journal of Engineering Research, vol. 3, no. 2, pp. 88-93, February 2014.

[12] W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer, P. A. Torres-Carrasquillo, Support Vector Machines for Speaker and Language Recognition, Elsevier Journal of Computer Speech & Language, vol. 20, issue 2/3, pp. 210-229, 2006.
[13] Siddheshwar S. Gangonda, Dr. Prachi Mukherji, Speech Processing for Marathi Numeral Recognition using MFCC and DTW Features, International Journal of Engineering Research and Applications (IJERA), pp. 218-222, March 2012.
[14] Wahyu Kusuma R., Prince Brave Guhyapati V., Simulation Voice Recognition System for Controlling Robotic Applications, Journal of Theoretical and Applied Information Technology, vol. 39, no. 2, pp. 188-196, May 2012.
[15] Thiang and Suryo Wijoyo, Speech Recognition Using Linear Predictive Coding and Artificial Neural Network for Controlling Movement of Mobile Robot, International Conference on Information and Electronics Engineering, vol. 6, pp. 179-183, 2011.
[16] Bishnu Prasad Das, Ranjan Parekh, Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers, International Journal of Modern Engineering Research, vol. 2, pp. 854-858, May-June 2012.

[17] P. Zegers, Speech Recognition Using Neural Networks, MS Thesis, Department of Electrical & Computer Engineering, University of Arizona, 1998.
[18] Paul A. K., Das D., Kamal M. M., Bangla Speech Recognition System Using LPC and ANN, 7th IEEE International Conference on Advances in Pattern Recognition, pp. 171-174, 2009.
[19] Firoz Shah A., Raji Sukumar A. and Babu Anto P., Discrete Wavelet Transforms and Artificial Neural Networks for Speech Emotion Recognition, International Journal of Computer Theory and Engineering, vol. 2, no. 3, pp. 319-322, June 2010.
[20] Jeih-Weih Hung, Hao-Teng Fan, and Syu-Siang Wang, Several New DWT-Based Methods for Noise-Robust Speech Recognition, International Journal of Innovation, Management and Technology, vol. 3, no. 5, pp. 547-551, October 2012.
[21] Jagannath H. Nirmal, Mukesh A. Zaveri, Suprava Patnaik and Pramod H. Kachare, A Novel Voice Conversion Approach Using Admissible Wavelet Packet Decomposition, EURASIP Journal on Audio, Speech, and Music Processing, pp. 1-10, 2013.
[22] T. B. Adam, M. S. Salam, T. S. Gunawan, Wavelet Cepstral Coefficients for Isolated Speech Recognition, Telkomnika, vol. 11, no. 5, pp. 2731-2738, May.

[23] Sanja Grubesa, Tomislav Grubesa, Hrvoje Domitrovic, Speaker Recognition Method Combining FFT, Wavelet Functions and Neural Networks, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia.
[24] Mohammed Anwer and Rezwan-Al-Islam Khan, Voice Identification Using a Composite Haar Wavelets and Proper Orthogonal Decomposition, International Journal of Innovation and Applied Studies, vol. 4, no. 2, pp. 353-358, October 2013.
[25] Marco Jeub, Dorothea Kolossa, Ramon F. Astudillo, Reinhold Orglmeister, Performance Analysis of Wavelet-based Voice Activity Detection, NAG/DAGA, Rotterdam, 2009.
[26] Beng T. Tan, Robert Lang, Heiko Schroder, Andrew Spray, Phillip Dermody, Applying Wavelet Analysis to Speech Segmentation and Classification, Department of Computer Science.
[27] Bartosz Ziolko, Suresh Manandhar, Richard C. Wilson and Mariusz Ziolko, Wavelet Method of Speech Segmentation, University of York, Heslington, YO10 5DD, York, UK.

[28] N. S. Nehe, R. S. Holambe, New Feature Extraction Techniques for Marathi Digit Recognition, International Journal of Recent Trends in Engineering, vol. 2, no. 2, November 2009.
[29] Sonia Sunny, David Peter S., K. Poulose Jacob, Discrete Wavelet Transforms and Artificial Neural Networks for Recognition of Isolated Spoken Words, International Journal of Computer Applications, vol. 38, no. 9, pp. 9-13, January 2012.
[30] N. S. Nehe, R. S. Holambe, DWT and LPC based feature extraction methods for isolated word recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012, pp. 1-7, 2012.
[31] Engin Avci, Zuhtu Hakan Akpolat, Speech recognition using a wavelet packet adaptive network based fuzzy inference system, Elsevier Expert Systems with Applications, vol. 31, pp. 495-503, 2006.

Thank you


The training phase accepts speech samples from different people and trains the system to create acoustic models for each word in the vocabulary. The training phase goes through two stages: data preparation and data recording.
The verification phase displays some random numbers and then checks the pronounced numbers.
Sometimes the system includes speech processing covering digit boundary detection and recognition, which uses zero-crossing and energy techniques. Mel frequency cepstral coefficient (MFCC) vectors are used to provide an estimate of the vocal tract filter, while dynamic time warping (DTW) is used to detect the nearest recorded voice.
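The template-matching step named above can be sketched with the classic DTW recurrence. This is an illustrative pure-Python sketch over 1-D sequences; a real recognizer would compare sequences of MFCC vectors using a vector distance for the local cost, and the stored template with the smallest distance would win:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Classic O(len(a) * len(b)) recurrence that aligns the sequences
    while tolerating local stretching or compression in time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allowed moves: diagonal match, insertion, deletion.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]

# A time-stretched copy of a sequence stays at zero distance under DTW.
print(dtw_distance([1, 2, 3, 2], [1, 1, 2, 2, 3, 2]))  # → 0.0
```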
The general methodology of audio classification involves extracting discriminatory features from the audio data and feeding them to a pattern classifier. Different approaches and various kinds of audio features have been proposed, with varying success rates. The features can be extracted either directly from the time-domain signal or from a transform domain, depending on the choice of signal analysis approach. Among the audio features that have been successfully used for audio classification are Mel frequency cepstral coefficients (MFCC).


MFCCs are commonly derived as follows:

1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
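The five steps above can be traced in a compact pure-Python sketch. The frame length, sampling rate, filter count, and coefficient count below are illustrative choices rather than values from the study, and a real front end would use an FFT with 20-40 mel filters:

```python
import math

def mfcc(frame, sr=8000, n_filt=8, n_ceps=4):
    """Minimal sketch of the five MFCC steps (illustrative sizes only)."""
    n = len(frame)
    # 1. Fourier transform of the (assumed pre-windowed) frame -> power spectrum.
    power = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    # 2. Triangular overlapping filters spaced evenly on the mel scale.
    mel = lambda f: 2595 * math.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    step = mel(sr / 2) / (n_filt + 1)
    bins = [int(round(mel_inv(i * step) * n / sr)) for i in range(n_filt + 2)]
    energies = []
    for i in range(1, n_filt + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        e = sum(power[k] * (k - lo) / max(c - lo, 1) for k in range(lo, c))
        e += sum(power[k] * (hi - k) / max(hi - c, 1)
                 for k in range(c, min(hi, len(power))))
        energies.append(max(e, 1e-12))
    # 3. Log of each mel filterbank energy.
    log_e = [math.log(e) for e in energies]
    # 4./5. DCT of the log energies; the first n_ceps amplitudes are the MFCCs.
    return [sum(log_e[j] * math.cos(math.pi * q * (j + 0.5) / n_filt)
                for j in range(n_filt)) for q in range(n_ceps)]

# A 440 Hz tone in a 64-sample frame yields 4 cepstral coefficients.
coeffs = mfcc([math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)])
print(len(coeffs))  # → 4
```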

