
Synopsis on

Performance Analysis of Combined Wavelet Transform and Artificial Neural Network for Isolated Marathi Digit Recognition

By

Atul Dattatraya Narkhede
Under the Supervision of

Dr. Milind Nemade

Faculty of Engineering
PACIFIC ACADEMY OF HIGHER EDUCATION AND
RESEARCH UNIVERSITY, UDAIPUR.

SUMMARY

Abstract

Introduction

Review of Literature

Research Gaps

Scope of Research

Research Objectives

Hypothesis

Tools & Techniques

Research Plan

Tentative Chapter Flow

References

ABSTRACT

Speech processing is useful for various applications such as mobile applications, healthcare, automatic translation, robotics, video games, transcription, audio and video database search, household applications, and language learning applications.

A speech recognition system has two major components: feature extraction and classification.

There are two dominant approaches to acoustic measurement: the temporal-domain (parametric) approach and the nonparametric frequency-domain approach.

The objective of our research is to investigate the combined performance of the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

We propose to derive effective, efficient, and noise-robust features from the frequency subbands of each frame. Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform (DWT), and each subband is then classified using an ANN.

Effective speech features and classification would improve speech quality, represent the speech signal in terms of frequency and bandwidth, and improve speech recognition.
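The frame-level decomposition described above can be sketched in a few lines. The following is an illustrative pure-Python sketch using the Haar wavelet; the actual work uses MATLAB, and the choice of mother wavelet and the three-level depth here are assumptions made purely for illustration:

```python
def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Splits an even-length frame into a low-frequency approximation
    band and a high-frequency detail band, each half the input length.
    """
    s = 2 ** 0.5
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def wavedec(frame, levels):
    """Multi-level decomposition: the approximation band is split repeatedly,
    yielding one detail subband per level plus a final approximation."""
    subbands = []
    approx = list(frame)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        subbands.append(detail)
    subbands.append(approx)
    return subbands

# A 256-sample frame decomposed over 3 levels gives 4 frequency subbands.
frame = [float(i % 17) for i in range(256)]
bands = wavedec(frame, 3)
print([len(b) for b in bands])  # → [128, 64, 32, 32]
```

Each subband would then be summarized as a feature vector (for example, subband energies) and passed to the ANN classifier.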

INTRODUCTION

The performance of a speech processing system is usually measured in terms of recognition accuracy.

All speech recognizers include an initial signal processing front end that converts a speech signal into a more convenient and compressed form called feature vectors.

The feature extraction method plays a vital role in the speech recognition task.

The wavelet transform, with its flexible time-frequency window, is an appropriate tool for the analysis of non-stationary signals like speech.

In a speech signal, high frequencies are present very briefly at the onset of a sound, while lower frequencies appear later and persist for longer periods.

The DWT resolves all these frequencies well.

The DWT parameters contain the information of different frequency scales.

This helps in obtaining the speech information of the corresponding frequency band.

An artificial neural network (ANN) is an efficient pattern recognition mechanism that simulates the neural information processing of the human brain.

The computational intelligence of neural networks stems from their processing units, their characteristics, and their ability to learn.

During learning, the parameters of an NN vary over time; NNs are characterized by their capacity for local and parallel computation, simplicity, and regularity.

The wavelet transform is an elegant tool for the analysis of non-stationary signals like speech.

Results have shown that a hybrid architecture using discrete wavelet transforms and neural networks can effectively extract features from the speech signal for automatic speech recognition.
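As a minimal illustration of the classification stage, the forward pass of a one-hidden-layer network can be sketched as follows. The layer sizes and weight values are purely hypothetical (a real recognizer would learn them, e.g. by backpropagation in MATLAB's Neural Network Toolbox); the predicted digit is simply the index of the largest output:

```python
import math

def forward(x, w_hidden, w_out):
    """Forward pass of a minimal one-hidden-layer network (biases omitted).

    A feature vector x is mapped to per-class scores; the predicted
    digit is the index of the largest output.
    """
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]

# Hypothetical weights: 3 input features, 2 hidden units, 2 classes.
w_h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_o = [[1.0, -1.0], [-1.0, 1.0]]
scores = forward([0.2, 0.7, 0.1], w_h, w_o)
predicted_digit = scores.index(max(scores))
```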

REVIEW OF LITERATURE
Remarkable observations in the review of work are as follows:

Speech features, usually obtained via Fourier transforms (FT), short-time Fourier transforms (STFT), or linear predictive coding techniques, are used for some kind of automatic speech/speaker recognition (ASR). They may not be suitable for representing speech/voice.

In spite of the improvement in computation time for the FFT, the recognition time of the proposed systems is still too long for real-time applications.

Multi-core and parallel processing for the speech recognition algorithm are necessary to further improve the recognition time and are worthwhile to examine in this research.

Conventional approaches such as Mel frequency cepstral coefficients (MFCC) and linear predictive coefficients (LPC) focus on spectral features limited to lower frequency bands.

The best recognition was obtained from the DWT decomposition when compared to MFCCs, for speaker-independent and speaker-dependent tasks respectively.

Wavelet transform approaches provided good results in clean, noisy, and reverberant environments, and also have much lower computational complexity.

Wavelet decomposition results in a logarithmic set of bandwidths, which is very similar to the response of the human ear to frequencies.

The wavelet transform efficiently locates spectral changes in the speech signal, and the beginning and end of sounds can also be located.

Results show that a hybrid architecture using discrete wavelet transforms and neural networks can effectively extract features from the speech signal for automatic speech recognition.

Artificial neural network performance depends on the size and quality of the training samples.

Simplifying the ANN architecture without reducing the recognition rate can also speed up the recognition time.

The recognition accuracy of the system can be improved by combining multiple classifiers.

RESEARCH GAPS

Feature extraction and classification are the major components of a speech recognition system and play a vital role in it, so an efficient representation of speech features and their classification is required.

To improve the accuracy of a speech recognition system, we can use a hybrid architecture consisting of the Wavelet Transform (WT) and an Artificial Neural Network (ANN).

The ANN architecture can be simplified without reducing the recognition rate.

Isolated Marathi digits should be recognized quickly by speeding up the recognition time.

SCOPE OF RESEARCH

The scope of our research is limited to investigating the combined performance of the Wavelet Transform (WT) and an Artificial Neural Network (ANN) for feature extraction and classification of isolated Marathi digits.

TOOLS & TECHNIQUES

MATLAB / Simulink programming environment

RESEARCH OBJECTIVES
The objective of our research is to investigate the combined performance of the Wavelet Transform (WT) and an Artificial Neural Network (ANN) for isolated Marathi digits, so as to improve the accuracy of the speech recognition system.

To derive effective, efficient, and noise-robust features from the frequency subbands of the frame using the discrete wavelet transform.

Each frame of the speech signal is decomposed into different frequency subbands using the discrete wavelet transform.

Classification of each subband using an artificial neural network (ANN).

Determination of the accuracy of the speech recognition system.

ISOLATED DIGIT RECOGNITION

RESEARCH METHODOLOGY

Hypothesis
The combined use of the wavelet transform and an artificial neural network (ANN) for isolated Marathi digits can improve the accuracy of the speech recognition system.

Tentative Chapter Flow

1. Introduction to Speech Processing, Wavelet Transform & ANN
2. Speech Feature Extraction using Wavelet Transform (WT)
3. Speech Feature Classification using Artificial Neural Network (ANN)
4. Performance Analysis of Speech Feature Extraction and Classification Techniques

RESEARCH PLAN

Activities scheduled across Phases I-VI:

Literature survey
Study of software tools such as MATLAB/SIMULINK, the Neural Network Toolbox and its MATLAB link
Survey of existing methods and algorithms
Suggesting techniques for removing limitations in existing algorithms
Simulation of combined strategies
Comparing results of developed strategies with existing algorithms
Performance evaluation and implementation
Documentation
Review & research paper preparation & presentation/publication

REFERENCES
[1] T. F. Quatieri, Discrete Time Speech Signal Processing, Pearson Education, 2002.
[2] R. M. Rao, A. S. Bopardikar, Wavelet Transform, Pearson Education, 2005.
[3] J. M. Zurada, Introduction to Artificial Neural Network, West, 1992.
[4] Yoshua Bengio, Renato De Mori, Regis Cardin, Speaker Independent Speech Recognition with Neural Networks and Speech Knowledge, Department of Computer Science, McGill University, pp. 218-225, 1990.
[5] Bhiksha Raj, Lorenzo Turicchia, Bent Schmidt-Nielsen, and Rahul Sarpeshkar, An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2007, pp. 1-13, 2007.
[6] Adam Glowacz, Witold Glowacz, Andrzej Glowacz, Sound Recognition of Musical Instruments with Application of FFT and K-NN Classifier with Cosine Distance, AGH University of Science and Technology, 2010.

[7] Gil Lopes, Fernando Ribeiro, Paulo Carvalho, Whistle Sound Recognition in Noisy Environment, Universidade do Minho, Departamento de Electrónica Industrial, Guimarães, Portugal.
[8] Shing-Tai Pan, Chih-Chin Lai and Bo-Yu Tsai, The Implementation of Speech Recognition Systems on FPGA-Based Embedded Systems with SOC Architecture, International Journal of Innovative Computing, Information and Control, vol. 7, no. 11, pp. 6161-6175, November 2011.
[9] Hemant Tyagi, Rajesh M. Hegde, Hema A. Murthy and Anil Prabhakar, Automatic Identification of Bird Calls Using Spectral Ensemble Average Voice Prints, 14th European IEEE Signal Processing Conference, pp. 1-5, 2006.
[10] Dwijen Rudrapal, Smita Das, S. Debbarma, N. Kar, N. Debbarma, Voice Recognition and Authentication as a Proficient Biometric Tool and its Application in Online Exam for P.H People, International Journal of Computer Applications (0975-8887), vol. 39, no. 12, pp. 7-12, February 2012.
[11] Asm Sayem, Speech Analysis for Alphabets in Bangla Language: Automatic Speech Recognition, International Journal of Engineering Research, vol. 3, no. 2, pp. 88-93, February 2014.

[12] W. M. Campbell, J. P. Campbell, D. A. Reynolds, E. Singer, P. A. Torres-Carrasquillo, Support Vector Machines for Speaker and Language Recognition, Elsevier Journal of Computer Speech & Language, vol. 20, issue 2/3, pp. 210-229, 2006.
[13] Siddheshwar S. Gangonda, Dr. Prachi Mukherji, Speech Processing for Marathi Numeral Recognition using MFCC and DTW Features, International Journal of Engineering Research and Applications (IJERA), pp. 218-222, March 2012.
[14] Wahyu Kusuma R., Prince Brave Guhyapati V., Simulation Voice Recognition System for Controlling Robotic Applications, Journal of Theoretical and Applied Information Technology, vol. 39, no. 2, pp. 188-196, May 2012.
[15] Thiang and Suryo Wijoyo, Speech Recognition Using Linear Predictive Coding and Artificial Neural Network for Controlling Movement of Mobile Robot, International Conference on Information and Electronics Engineering, vol. 6, pp. 179-183, 2011.
[16] Bishnu Prasad Das, Ranjan Parekh, Recognition of Isolated Words using Features based on LPC, MFCC, ZCR and STE, with Neural Network Classifiers, International Journal of Modern Engineering Research, vol. 2, pp. 854-858, May-June 2012.

[17] P. Zegers, Speech Recognition Using Neural Networks, MS Thesis, Department of Electrical & Computer Engineering, University of Arizona, 1998.
[18] Paul A. K., Das D., Kamal M. M., Bangla Speech Recognition System Using LPC and ANN, 7th IEEE International Conference on Advances in Pattern Recognition, pp. 171-174, 2009.
[19] Firoz Shah A., Raji Sukumar A. and Babu Anto P., Discrete Wavelet Transforms and Artificial Neural Networks for Speech Emotion Recognition, International Journal of Computer Theory and Engineering, vol. 2, no. 3, pp. 319-322, June 2010.
[20] Jeih-Weih Hung, Hao-Teng Fan, and Syu-Siang Wang, Several New DWT-Based Methods for Noise-Robust Speech Recognition, International Journal of Innovation, Management and Technology, vol. 3, no. 5, pp. 547-551, October 2012.
[21] Jagannath H. Nirmal, Mukesh A. Zaveri, Suprava Patnaik and Pramod H. Kachare, A Novel Voice Conversion Approach Using Admissible Wavelet Packet Decomposition, EURASIP Journal on Audio, Speech, and Music Processing, pp. 1-10, 2013.
[22] T. B. Adam, M. S. Salam, T. S. Gunawan, Wavelet Cepstral Coefficients for Isolated Speech Recognition, Telkomnika, vol. 11, no. 5, pp. 2731-2738, May.

[23] Sanja Grubesa, Tomislav Grubesa, Hrvoje Domitrovic, Speaker Recognition Method Combining FFT, Wavelet Functions and Neural Networks, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia.
[24] Mohammed Anwer and Rezwan-Al-Islam Khan, Voice Identification Using a Composite Haar Wavelets and Proper Orthogonal Decomposition, International Journal of Innovation and Applied Studies, vol. 4, no. 2, pp. 353-358, October 2013.
[25] Marco Jeub, Dorothea Kolossa, Ramon F. Astudillo, Reinhold Orglmeister, Performance Analysis of Wavelet-based Voice Activity Detection, NAG/DAGA, Rotterdam, 2009.
[26] Beng T. Tan, Robert Lang, Heiko Schroder, Andrew Spray, Phillip Dermody, Applying Wavelet Analysis to Speech Segmentation and Classification, Department of Computer Science.
[27] Bartosz Ziolko, Suresh Manandhar, Richard C. Wilson and Mariusz Ziolko, Wavelet Method of Speech Segmentation, University of York, Heslington, YO10 5DD, York, UK.

[28] N. S. Nehe, R. S. Holambe, New Feature Extraction Techniques for Marathi Digit Recognition, International Journal of Recent Trends in Engineering, vol. 2, no. 2, November 2009.
[29] Sonia Sunny, David Peter S., K. Poulose Jacob, Discrete Wavelet Transforms and Artificial Neural Networks for Recognition of Isolated Spoken Words, International Journal of Computer Applications, vol. 38, no. 9, pp. 9-13, January 2012.
[30] N. S. Nehe, R. S. Holambe, DWT and LPC based feature extraction methods for isolated word recognition, EURASIP Journal on Audio, Speech, and Music Processing, vol. 2012, pp. 1-7, 2012.
[31] Engin Avci, Zuhtu Hakan Akpolat, Speech recognition using a wavelet packet adaptive network based fuzzy inference system, Elsevier Expert Systems with Applications, vol. 31, pp. 495-503, 2006.

Thank you


The training phase accepts speech samples from different people and trains the system to create acoustic models for each word in the vocabulary. The training phase goes through two stages: data preparation and data recording.
The verification phase displays some random numbers and then checks the pronounced numbers.
Sometimes the system includes speech processing covering digit boundary detection and recognition, which uses zero-crossing and energy techniques. Mel frequency cepstral coefficient (MFCC) vectors are used to provide an estimate of the vocal tract filter, while dynamic time warping (DTW) is used to detect the nearest recorded voice.
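The template-matching step named above can be sketched with the classic DTW recurrence. This is an illustrative pure-Python sketch over 1-D sequences; a real recognizer would compare sequences of MFCC vectors using a vector distance for the local cost, and the stored template with the smallest distance would win:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Classic O(len(a) * len(b)) recurrence that aligns the sequences
    while tolerating local stretching or compression in time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allowed moves: diagonal match, insertion, deletion.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]

# A time-stretched copy of a sequence stays at zero distance under DTW.
print(dtw_distance([1, 2, 3, 2], [1, 1, 2, 2, 3, 2]))  # → 0.0
```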
The general methodology of audio classification involves extracting discriminatory features from the audio data and feeding them to a pattern classifier. Different approaches and various kinds of audio features have been proposed, with varying success rates. The features can be extracted either directly from the time-domain signal or from a transform domain, depending on the choice of signal analysis approach. Among the audio features that have been successfully used for audio classification are Mel frequency cepstral coefficients (MFCC).


MFCCs are commonly derived as follows:

1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
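The five steps above can be traced in a compact pure-Python sketch. The frame length, sampling rate, filter count, and coefficient count below are illustrative choices rather than values from the study, and a real front end would use an FFT with 20-40 mel filters:

```python
import math

def mfcc(frame, sr=8000, n_filt=8, n_ceps=4):
    """Minimal sketch of the five MFCC steps (illustrative sizes only)."""
    n = len(frame)
    # 1. Fourier transform of the (assumed pre-windowed) frame -> power spectrum.
    power = []
    for k in range(n // 2 + 1):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    # 2. Triangular overlapping filters spaced evenly on the mel scale.
    mel = lambda f: 2595 * math.log10(1 + f / 700)
    mel_inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    step = mel(sr / 2) / (n_filt + 1)
    bins = [int(round(mel_inv(i * step) * n / sr)) for i in range(n_filt + 2)]
    energies = []
    for i in range(1, n_filt + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        e = sum(power[k] * (k - lo) / max(c - lo, 1) for k in range(lo, c))
        e += sum(power[k] * (hi - k) / max(hi - c, 1)
                 for k in range(c, min(hi, len(power))))
        energies.append(max(e, 1e-12))
    # 3. Log of each mel filterbank energy.
    log_e = [math.log(e) for e in energies]
    # 4./5. DCT of the log energies; the first n_ceps amplitudes are the MFCCs.
    return [sum(log_e[j] * math.cos(math.pi * q * (j + 0.5) / n_filt)
                for j in range(n_filt)) for q in range(n_ceps)]

# A 440 Hz tone in a 64-sample frame yields 4 cepstral coefficients.
coeffs = mfcc([math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)])
print(len(coeffs))  # → 4
```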

