
“In Pursuit of Global Competitiveness”

Project Report

On

SPEAKER RECOGNITION USING


VECTOR QUANTIZATION

Submitted By

SARATH KOLLI (T.E. E&TC) Exam. No.


MAYUR BHAMRE (T.E. E&TC) Exam. No.
SHRIKANT MALI (T.E. E&TC) Exam. No.

Department of Electronics & Telecommunication Engineering


MET’S Institute of Engineering, BKC, Nashik.
(2008 - 2009)
CERTIFICATE

This is to certify that the project “Speaker Recognition Using Vector Quantization”,
submitted by Sarath Kolli, is a bona fide work completed under supervision and
guidance in partial fulfilment of the Electronics System Design Lab. Mini Project (T.E.
E&TC) at MET’S Institute of Engineering, BKC, affiliated to the University of Pune (M.S.).

Place: Nashik

Date:

Prof. Risodkar Y.R. / Prof. Nandre A.G.
Project Coordinator
Department of Electronics & Telecommunication Engineering

Prof. Patil D.P.
Head
Department of Electronics & Telecommunication Engineering

Principal
MET’S Institute of Engineering, BKC,
Nashik (M.S.) – 422003
CONTENTS

List of Abbreviations i
List of Figures ii
List of Graphs iii
List of Tables iv

1. INTRODUCTION 1
1.1 Introduction 1
1.2 Necessity 3
1.3 Objectives 5
1.4 Theme 6
1.5 Organization 6

2. LITERATURE SURVEY 8
2.1 Historical Background 8
2.2 Previous Work 9
2.3 Forms of Speaker Recognition 11
2.3.1 Text Dependent 12
2.3.2 Text Independent 13
2.3.3 Speaker Verification 14
2.3.4 Speaker Identification 15
2.4 Human Speech Production System 16
2.4.1 Human Speech Production Model 20
2.4.2 Feature Sets 23
2.4.3 Mel-Frequency Cepstral Coefficients 24
2.4.4 Relative Spectral Feature Set 24
2.4.5 Band Filter Cross Correlation 24
2.5 Development of System 25
2.6 A Few Companies Manufacturing Speaker Recognition Systems 26
2.7 Comparison to Other Biometrics 29
2.8 Speaker Modeling 30

3. SYSTEM DEVELOPMENT 32
3.1 Feature Extraction 32
3.1.1 Introduction 32
3.1.2 Short Term Analysis 33
3.1.3 Cepstrum 34
3.1.4 MFCC 36
3.2 Speaker Matching and Modeling 37
3.2.1 Introduction 38
3.2.2 Vector Quantization 39
3.2.3 VQ Classifier 44
3.3 Speaker Discriminative Weighting Matching 45
3.3.1 Introduction 45
3.3.2 Weighted Similarity Measure 47
3.3.3 Computing Weights 47
3.4 Decision 48

4. PERFORMANCE ANALYSIS 49
4.1 MFCC Analysis 49
4.2 Experimental Analysis 52
4.3 Training and Testing 61
4.4 Results 63

5. CONCLUSIONS 75
5.1 Conclusions 75
5.2 Future Scope 75
5.3 Applications 76

Appendices 77
Cost Estimation 78
References 80
Acknowledgment 81
List of Abbreviations
Symbol Illustrations
PIN Personal Identification Number
FAR False Acceptance Rate
FRR False Rejection Rate
EER Equal Error Rate
NIST National Institute of Standards and Technology
STFT Short-Time Fourier Transform
FFT Fast Fourier Transform
IDFT Inverse Discrete Fourier Transform
MFCC Mel-Frequency Cepstral Coefficients
DCT Discrete Cosine Transform
LPC Linear Predictive Coding
LPCC Linear Predictive Cepstral Coefficients
DTW Dynamic Time Warping
VQ Vector Quantization
MSE Mean Squared Error
GLA Generalized Lloyd Algorithm
TI Texas Instruments
GMM Gaussian Mixture Modeling
HMM Hidden Markov Modeling
List of Figures

Figure Illustrations Page


2.1 Speaker Recognition 11
2.2 Speaker verification system 14
2.3 Speaker Identification System 15
2.4 Human Voice Production System 17
2.5 Global Airflow and Spectrogram Waveform 19
2.6 Wideband and Narrowband Spectrogram of an Utterance 20
2.7 Vocal Tract Model 21
2.8 Source Filter Model 23
3.1 Short Term Analysis 34
3.2 Speech Magnitude Spectrum 35
3.3 Cepstrum 35
3.4 Computation of Mel Cepstrum 36
3.5 Triangular Filters Used to Compute Mel Cepstrum 37
3.6 Vector Quantization of Two Speakers 39
3.7 Flow chart of LBG Algorithm 43
3.8 VQ Classifier 44
3.9 Illustration of Code Vectors Having Different Discrimination Power 46
3.10 Decision Process 48
4.1 Dependency of Identification Time on Population Size 49
4.2 Distance Stabilization 51
4.3 Identification Error Rate Depending on Amount of Test Data 52
4.4 Speech Signal of 4 sec. Duration 53
4.5 Pre-emphasis of Speech Signal 54
4.6 Hamming Window Output 55
4.7 FFT of Speech Signal 56
4.8 Discrete Fourier Transform 57
4.9 Mel-frequency warping 58
List of Graphs

Graph Illustrations Page


4.1 Identification Rate for 1 sec. Training & 1 sec. Testing Duration. 68
4.2 Identification Rate for 2 sec. Training & 1 sec. Testing Duration. 68
4.3 Identification Rate for 2 sec. Training & 2 sec. Testing Duration. 69
4.4 Identification Rate for 5 sec. Training & 1 sec. Testing Duration. 69
4.5 Identification Rate for 5 sec. Training & 2 sec. Testing Duration. 70
4.6 Identification Rate for 5 sec. Training & 3 sec. Testing Duration. 70
4.7 Identification Rate for 10 sec. Training & 1 sec. Testing Duration. 71
4.8 Identification Rate for 10 sec. Training & 2 sec. Testing Duration. 71
4.9 Identification Rate for 10 sec. Training & 3 sec. Testing Duration. 72
4.10 Identification Rate for 20 sec. Training & 1 sec. Testing Duration. 72
4.11 Identification Rate for 20 sec. Training & 2 sec. Testing Duration. 73
4.12 Identification Rate for 20 sec. Training & 3 sec. Testing Duration. 73
4.13 Identification Rate for 20 sec. Training & 4 sec. Testing Duration. 74
List of Tables

Table Illustrations Page


2.1 Historical Review of Speaker Recognition 10
4.1 Numerical Examples for Time Complexities of Feature Extraction Algorithm 50
4.2 Performance Measure Parameters 62
4.3 Comparison of Success Rate for 1 sec. Training & 1 sec. Testing Duration. 63
4.4 Comparison of Success Rate for 2 sec. Training & 1 sec. Testing Duration. 63
4.5 Comparison of Success Rate for 2 sec. Training & 2 sec. Testing Duration. 64
4.6 Comparison of Success Rate for 5 sec. Training & 1 sec. Testing Duration. 64
4.7 Comparison of Success Rate for 5 sec. Training & 2 sec. Testing Duration. 64
4.8 Comparison of Success Rate for 5 sec. Training & 3 sec. Testing Duration. 65
4.9 Comparison of Success Rate for 10 sec. Training & 1 sec. Testing Duration. 65
4.10 Comparison of Success Rate for 10 sec. Training & 2 sec. Testing Duration. 65
4.11 Comparison of Success Rate for 10 sec. Training & 3 sec. Testing Duration. 66
4.12 Comparison of Success Rate for 20 sec. Training & 1 sec. Testing Duration. 66
4.13 Comparison of Success Rate for 20 sec. Training & 2 sec. Testing Duration. 66
4.14 Comparison of Success Rate for 20 sec. Training & 3 sec. Testing Duration. 67
4.15 Comparison of Success Rate for 20 sec. Training & 4 sec. Testing Duration. 67
Acknowledgment

I take this opportunity to express my heartfelt gratitude to the Project Coordinators for
their constant encouragement, able guidance and support throughout the course of this
semester.
I sincerely thank Prof. Rehpade R.B., Principal, MET’S Institute of Engineering, Nashik,
for his advice and support during the course of this work.
I also thank the Head of Department, Electronics & Telecommunication Engineering,
Prof. Patil D.P., and express my gratitude to my parents, colleagues and friends for
their kind support during the completion of this work.

Sarath Kolli
TE(E&TC)
Roll No.
