Omar's Research
Supervisor signature:
Date:
Page 1 of 25
TABLE OF CONTENTS
2.1 Introduction ...................................................... 11
Chapter 4: Conclusion
References
Abstract
The speech signal primarily carries the linguistic message, but it also contains
speaker-specific information. It is generated by acoustically exciting the cavities
of the mouth and nose, and can be used to recognize (identify or verify) a person.
This project deals with the speaker identification task: finding the identity of a
person from his/her speech, among a group of persons already enrolled during the
training phase.
Speaker recognition is the process of recognizing a speaker from the individual's
speech biometrics. The voice characteristics of every speaker are different and can
therefore be used to construct a model. This model is later used to recognize an
enrolled speaker from the list of available speakers. The project discusses the
linear predictive coding (LPC) technique for extracting voice characteristics,
together with the Gaussian mixture model (GMM). Further, an in-depth analysis of
these techniques is made to identify their advantages and limitations. Work in the
field of speaker recognition systems has a wide range of applications.
ACKNOWLEDGMENTS
Foremost, we would like to express our sincere gratitude to our supervisor,
Assist. Prof. Dr. Ahmad Kamil Hasan, for his continuous support, patience,
motivation, and immense knowledge. His guidance helped us throughout the research
and writing of this project, and we appreciate his time and effort.
We thank our close friends and fellows for their generous help and encouragement
in our research.
Last but not least, we would like to thank our families: our parents, sisters, and
brothers, for encouraging and supporting us spiritually throughout our lives.
CHAPTER ONE
1.1 Introduction
The fundamental difference between identification and verification is the number
of decision alternatives. In identification, the number of decision alternatives is
equal to the size of the population, whereas in verification there are only two
choices, acceptance or rejection, regardless of the population size. Therefore,
speaker identification performance decreases as the size of the population
increases, whereas speaker verification performance approaches a constant
independent of the size of the population, unless the distribution of physical
characteristics of speakers is extremely biased [3].
There is also a case called "open set" identification, in which a reference model for
an unknown speaker may not exist. In this case, an additional decision alternative,
"the unknown does not match any of the models", is required.
Verification can be considered a special case of the "open set" identification mode
in which the known population size is one. In either verification or identification, an
additional threshold test can be applied to determine whether the match is
sufficiently close to accept the decision, or if not, to ask for a new trial [4].
The speech signal conveys information about the identity of the speaker. The area
of speaker identification is concerned with extracting the identity of the person
speaking the utterance. As speech interaction with computers becomes more
pervasive in activities such as the telephone, financial transactions and information
retrieval from speech databases, the utility of automatically identifying a
speaker based solely on vocal characteristics increases.
This project emphasizes text-dependent speaker identification, which deals with
detecting a particular speaker from a known population. The system prompts the
user to provide a speech utterance, then identifies the user by comparing the
codebook of that utterance with those stored in the database, and lists the
speakers most likely to have given that utterance [5].
The speech signal is recorded for N speakers, and then the features are extracted.
Feature extraction is done by means of LPC coefficients. The GMM is trained by
applying these features as input parameters, and the features are stored in
templates for further comparison. Here the GMM corresponds to the output, and the
input is the extracted features of the speaker to be identified. The GMM does the
adjustment, and the best match is found to identify the speaker.
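The matching step described above can be sketched as follows. This is a simplified illustration in which each enrolled speaker is summarized by a single diagonal Gaussian (a stand-in for the full GMM); `score` and `identify` are hypothetical helper names:

```python
import numpy as np

def score(features, model):
    """Average log-likelihood of a feature sequence under one speaker's
    diagonal-Gaussian model (mean vector, variance vector)."""
    mu, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (features - mu) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(features, models):
    """Closed-set identification: return the index of the enrolled
    speaker whose model best matches the test features."""
    return int(np.argmax([score(features, m) for m in models]))

# Two enrolled speakers with different mean feature vectors.
models = [(np.zeros(2), np.ones(2)), (np.full(2, 5.0), np.ones(2))]
test = np.full((3, 2), 5.0)   # an utterance resembling speaker 1
```

In the full system each per-speaker model would be a trained GMM, and the score its log-likelihood accumulated over all feature frames.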
Feature extraction techniques transform the speech signal into acoustic feature
vectors carrying the essential characteristics of the signal, from which the
identity of the speaker can be recognized by voice. The aim of feature extraction
is to reduce the dimension of the acoustic feature vectors by removing unwanted
information and emphasizing the speaker-specific information.
[Figure 1] Speaker identification system.
1.3 Classification of Speaker Identification Systems
1.4 Objective of this Project
Chapter Two gives a general overview of human speech production, then introduces
speaker recognition, feature extraction, and speaker identification using the
linear predictive coding (LPC) model, and finally explains the Gaussian mixture
model. Chapter Three presents speaker identification using LPC and GMM and gives
the experimental results.
CHAPTER TWO
2.1 Introduction
In speaker verification, an identity is claimed by an unknown speaker and an
utterance of this unknown speaker is compared with a model for the speaker
whose identity is being claimed. If the match is good enough, that
is, above a threshold, the identity claim is accepted. A high threshold makes it
difficult for impostors to be accepted by the system, but with the risk of falsely
rejecting valid users. Conversely, a low threshold enables valid users to be accepted
consistently, but with the risk of accepting impostors. To set the threshold at the
desired level of customer rejection (false rejection) and impostor acceptance (false
acceptance), data showing distributions of customer and impostor scores are
necessary.
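The trade-off described above can be sketched numerically; the match-score values below are made up purely for illustration:

```python
import numpy as np

def error_rates(customer_scores, impostor_scores, threshold):
    """False rejection rate (customers scoring below the threshold) and
    false acceptance rate (impostors scoring at or above it)."""
    frr = float(np.mean(np.asarray(customer_scores) < threshold))
    far = float(np.mean(np.asarray(impostor_scores) >= threshold))
    return frr, far

# Hypothetical match scores for valid users and impostors.
customers = [0.9, 0.8, 0.85, 0.6]
impostors = [0.3, 0.5, 0.45, 0.7]
frr_low, far_low = error_rates(customers, impostors, 0.4)     # lenient threshold
frr_high, far_high = error_rates(customers, impostors, 0.75)  # strict threshold
```

Raising the threshold from 0.4 to 0.75 drives the false acceptance rate down at the cost of rejecting some valid users, which is exactly the trade-off the threshold must balance against the score distributions.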
2.2 Classification of Speaker Identification
Both text-dependent and text-independent methods have a serious weakness: these
security systems can easily be circumvented, because someone can play back the
recorded voice of a registered speaker uttering key words or sentences into the
microphone and be accepted as the registered speaker. Another problem is that
people often do not like text-dependent systems, because they do not like to
utter their identification number, such as their social security number, within
the hearing of other people. To cope with these problems, some methods use a
small set of words, such as digits, as key words, and each user is prompted to
utter a given sequence of key words that is randomly chosen every time the system
is used. Yet even this method is not reliable enough, since it can be
circumvented with advanced electronic recording equipment that can reproduce key
words in a requested order.
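The randomly prompted key-word sequence described above might be generated as follows; the digit vocabulary and the sequence length are arbitrary choices for illustration:

```python
import random

def prompt_sequence(keywords, length, seed=None):
    """Pick a random sequence of key words to prompt at each session,
    so that a fixed pre-recorded utterance cannot simply be replayed."""
    rng = random.Random(seed)
    return [rng.choice(keywords) for _ in range(length)]

digits = [str(d) for d in range(10)]
prompt = prompt_sequence(digits, 4, seed=42)  # e.g. a 4-digit challenge
```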
For almost all recognition systems, training is the first step. In a speaker
identification system we call this step the enrollment phase, and the following
step the recognition phase. Its purpose is to obtain the speaker models, or
voiceprints, for the speaker database. The first phase of verification systems is
also enrollment, as shown in Fig. 3. In this phase we extract the most useful
features from the speech signal for speaker identification (SI) and train models
to obtain optimal system parameters.
Extracted features are intended to be informative and non-redundant, facilitating
the subsequent learning and generalization steps, and in some cases leading to
better human interpretation. Feature extraction is related to dimensionality
reduction [4].
2.4 Linear Predictive Coding
One of the most powerful speech analysis techniques is the method of linear
predictive analysis. This method has become the predominant technique for
estimating the basic speech parameters, e.g., pitch, formants, spectra and vocal
tract area functions, and for representing speech for low-bit-rate transmission
or storage. The importance of this method lies both in its ability to provide
extremely accurate estimates of the speech parameters and in its relative speed
of computation. The basic idea behind LPC analysis is that a speech sample can be
approximated as a linear combination of past speech samples. By minimizing the
sum of the squared differences (over a finite interval) between the actual speech
samples and the linearly predicted ones, a unique set of predictor coefficients
can be determined.
It is assumed that the variations with time of the vocal tract shape can be
approximated with sufficient accuracy by a succession of stationary shapes. It is
possible to define an all-pole transfer function H(z) that produces the output
speech s(n) given the input excitation u(n) (either an impulse train or white
noise):

H(z) = G / (1 − Σ_{k=1}^{p} a_k z^{−k})    (1)

Thus, the linear filter is completely specified by the scale factor G (gain
factor) and the p predictor coefficients a_1, …, a_p. The number of coefficients
p required to represent any speech segment adequately is determined by many
factors, such as the length of the vocal tract, the coupling of the nasal
cavities, the place of the excitation and the nature of the glottal flow
function.
A major advantage of the all-pole model of speech production is that it allows
one to determine the filter parameters in a straightforward manner by solving a
set of linear equations. In the all-pole model, the speech sample s(n) at the nth
sampling instant is related to the excitation u(n) by the following equation:
s(n) = Σ_{k=1}^{p} a_k s(n − k) + G u(n)    (2)
where u(n) is the nth sample of the excitation and G is the gain factor.
Equation (2) is the LPC difference equation, which shows that the present output
value is the sum of the weighted present input, G u(n), and a weighted sum of the
past output samples. If the excitation u(n) is white noise, the best estimate
ŝ(n) of a speech sample based on the previous p samples is given by:
ŝ(n) = Σ_{k=1}^{p} a_k s(n − k)    (3)
where ŝ(n) is the predicted value of s(n) and the a_k are the predictor
coefficients. The prediction error between the actual speech sample and the
predicted sample is defined as:

e(n) = s(n) − ŝ(n) = s(n) − Σ_{k=1}^{p} a_k s(n − k)    (4)
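The prediction of Eq. (3) and its error can be checked numerically with a short sketch; the signal and the coefficient value below are arbitrary:

```python
import numpy as np

def lpc_predict(s, a):
    """Predicted signal s_hat(n) = sum_k a_k s(n-k) and the prediction
    error e(n) = s(n) - s_hat(n); samples before n = 0 are taken as zero."""
    s = np.asarray(s, dtype=float)
    pred = np.zeros_like(s)
    for n in range(len(s)):
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                pred[n] += a[k - 1] * s[n - k]
    return pred, s - pred

# First-order predictor a_1 = 1 on a ramp: each sample is predicted
# as the previous one, leaving a constant error of 1.
pred, err = lpc_predict([1, 2, 3, 4], [1.0])
```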
which is the output of a system whose transfer function is:

A(z) = E(z) / S(z) = 1 − Σ_{k=1}^{p} a_k z^{−k}    (6)

where A(z) is the transfer function of the prediction-error filter, i.e., the
inverse filter for the system H(z). To determine the filter coefficients a_k, the
mean squared prediction error is minimized over a short segment of speech of N
samples. The average squared prediction error becomes:
E_m = Σ_{n=0}^{N−1} e²(n) = Σ_{n=0}^{N−1} [ s(n) − Σ_{k=1}^{p} a_k s(n − k) ]²
Setting the derivatives of E_m with respect to each a_k to zero yields a set of
linear equations involving the short-time autocorrelation:

R(i) = Σ_{n=0}^{N−1−i} s(n) s(n + i)    (10)
In matrix form, the resulting equations are:

[ R(0)    R(1)   ⋯  R(p−1) ] [ a_1 ]   [ R(1) ]
[ R(1)    R(0)   ⋯  R(p−2) ] [ a_2 ] = [ R(2) ]
[  ⋮        ⋮    ⋱    ⋮    ] [  ⋮  ]   [  ⋮   ]
[ R(p−1)  R(p−2) ⋯  R(0)   ] [ a_p ]   [ R(p) ]
Equation (12) is the DFT of the sequence, Equation (13) gives the logarithm of
the absolute value of the DFT of the input, and Equation (14) gives the real
cepstral coefficients of the input sequence. The real cepstrum is mainly used as
a feature vector, as an improvement over the direct use of LPC-based cepstral
features, for a given speaker in the speaker identification process.
The p×p autocorrelation matrix above has the form of a Toeplitz matrix, which is
symmetric and has the same values along the lines parallel to the main diagonal.
This type of equation is called a Yule-Walker equation. Since the autocorrelation
matrix is positive definite, the equation for the autocorrelation method can be
efficiently solved by the Levinson-Durbin recursion.
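A minimal sketch of the autocorrelation method and the Levinson-Durbin recursion follows; it is a simplified illustration (real systems first pre-emphasize and window each frame):

```python
import numpy as np

def autocorr(s, p):
    """Short-time autocorrelation R(i) = sum_n s(n) s(n+i), i = 0..p (Eq. 10)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    return np.array([np.dot(s[:N - i], s[i:]) for i in range(p + 1)])

def levinson_durbin(R, p):
    """Solve the Toeplitz Yule-Walker system for the predictor
    coefficients a_1..a_p by the Levinson-Durbin recursion."""
    a = np.zeros(p)
    E = R[0]                                   # prediction-error energy
    for m in range(1, p + 1):
        # Reflection coefficient for order m.
        k = (R[m] - np.dot(a[:m - 1], R[m - 1:0:-1])) / E
        prev = a[:m - 1].copy()
        a[m - 1] = k
        a[:m - 1] = prev - k * prev[::-1]      # update lower-order coefficients
        E *= 1.0 - k * k
    return a

# For R(i) = 0.5**i (an AR(1)-like sequence), the order-2 solution
# reduces to a_1 = 0.5, a_2 = 0.
a = levinson_durbin(np.array([1.0, 0.5, 0.25]), 2)
```

The returned a_k are the predictor coefficients of Eq. (2); in production code the same Toeplitz system can be solved with `scipy.linalg.solve_toeplitz`.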
2.5 Gaussian Mixture Model (GMM)
The Gaussian mixture model is a probabilistic model for representing the presence
of subpopulations within an overall population, without requiring that an
observed data set identify the subpopulation to which an individual observation
belongs. Formally, a mixture model corresponds to the mixture distribution that
represents the probability distribution of observations in the overall
population. The probability density function of each Gaussian component can be
defined as:
N(x; μ, Σ) = (1 / ((2π)^{D/2} |Σ|^{1/2})) exp(−½ (x − μ)^T Σ^{−1} (x − μ))

where
μ — the mean vector
Σ — the covariance matrix
D — the dimension of the feature vector x
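The Gaussian density above, and the weighted sum that forms the GMM density, can be evaluated directly; this is a small illustrative sketch, checked against the 1-D standard normal:

```python
import numpy as np

def gaussian_pdf(x, mu, cov):
    """Multivariate Gaussian density N(x; mu, Sigma) for a
    D-dimensional feature vector x."""
    x = np.asarray(x, float)
    mu = np.asarray(mu, float)
    cov = np.asarray(cov, float)
    D = x.size
    diff = x - mu
    norm = (2 * np.pi) ** (-D / 2) / np.sqrt(np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def gmm_pdf(x, weights, means, covs):
    """GMM density: a weighted sum of component Gaussian densities."""
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))
```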
It is also important to note that because the component Gaussians act together to
model the overall feature density, full covariance matrices are not necessary
even if the features are not statistically independent. A linear combination of
diagonal-covariance Gaussians is capable of modeling the correlations between
feature-vector elements; the effect of using a set of M full-covariance Gaussians
can be equally obtained by using a larger set of diagonal-covariance Gaussians.
GMMs are often used in biometric systems, most notably in speaker recognition
systems, due to their capability of representing a large class of sample
distributions. One of the powerful attributes of the GMM is its ability to form
smooth approximations to arbitrarily shaped densities.
There are several methods to estimate the statistical parameters of the GMM
model. The most popular are the maximum likelihood (ML) and maximum a posteriori
(MAP) estimators.
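A minimal one-dimensional sketch of ML estimation via the EM algorithm is given below. The quantile-based initialization and the fixed iteration count are simplifying assumptions; practical systems use library implementations with convergence checks and variance floors:

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=50):
    """Maximum-likelihood fit of a K-component 1-D GMM by EM."""
    x = np.asarray(x, dtype=float)
    w = np.full(K, 1.0 / K)                        # mixture weights
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread initial means
    var = np.full(K, np.var(x))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component per sample.
        dens = w / np.sqrt(2 * np.pi * var) \
             * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        Nk = resp.sum(axis=0)
        w = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var

# Two well-separated clusters are recovered by the fit.
x = np.array([0.0, 0.1, -0.1, 0.05, 10.0, 10.1, 9.9, 10.05])
w, mu, var = em_gmm_1d(x, 2)
```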
Figure 6 compares the densities obtained using a unimodal Gaussian model and a
GMM. Plot (a) shows the histogram of a single feature from a speaker recognition
system (a single cepstral value from a 25-second utterance by a male speaker);
plot (b) shows a unimodal Gaussian model of this feature distribution; plot (c)
shows a GMM and its ten underlying component densities. The GMM not only provides
a smooth overall distribution fit; its components also capture the multimodal
nature of the density.
Chapter 3
There are two inputs to the speaker identification system: the first is the
identity claim, which may be provided by a keyed-in identification number; the
second is the speech utterance itself.
where I is the number of frames in the speech signal. The typical chosen values
of N and M are 280 samples (about 17.5 msec) and 100 samples (about 6 msec),
respectively. The frame window used to minimize the signal discontinuities at the
beginning and end of each frame is defined as:

w(n) = 0.54 − 0.46 cos(2πn / (N − 1)),    0 ≤ n ≤ N − 1
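The framing and windowing described above can be sketched as follows. A 16 kHz sampling rate is inferred from the quoted durations, and the Hamming window is a common choice assumed here:

```python
import numpy as np

def frame_signal(s, N=280, M=100):
    """Split the signal into overlapping frames of N samples,
    advancing M samples per frame."""
    n_frames = 1 + (len(s) - N) // M
    return np.array([s[i * M:i * M + N] for i in range(n_frames)])

def hamming(N):
    """Hamming window applied to each frame to reduce edge discontinuities."""
    n = np.arange(N)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

frames = frame_signal(np.zeros(1000))      # e.g. a 62.5 ms signal at 16 kHz
windowed = frames * hamming(280)           # window each frame (broadcast)
```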
matching similarity. The previous procedures are repeated for all unknown
speakers, and the system is checked to see whether it identifies each speaker or
not. The system is then tested to find the identification rate, which is defined
as:

identification rate = (number of correctly identified utterances / total number
of test utterances) × 100%
Table (1) shows the identification rate for different numbers of speakers using
LPC and GMM. It is clear from this table that the identification rate decreases
as the number of speakers increases.
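The identification rate defined above is simply the percentage of correctly identified trials:

```python
def identification_rate(predicted, actual):
    """Percentage of test utterances whose predicted speaker
    matches the true speaker."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Three of four hypothetical trials identified correctly -> 75%.
rate = identification_rate([1, 2, 3, 4], [1, 2, 3, 5])
```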
Chapter 4
Conclusion
4.1 Conclusion
References
[5] S. Chen and Y. Luo, "Speaker verification using LPC and support
vector machine," Proc. Int. MultiConference ..., vol. I, pp. 18-21, 2009.
Republic of Iraq
Ministry of Higher Education and Scientific Research
University of Technology
Electromechanical Engineering Department
Energy and Renewable Energies Branch
BY
Omar Assam Abbas
Ameer Abd-AlKareem
Ahmed Moneim Zidan
SUPERVISOR
Dr. Ahmad Kamil Hasan