MajorInterim Report1
Abstract— Various distinctive, measurable physiological and behavioural traits have been investigated for biometric recognition. Speech is the primary method of communication, and its physiological and behavioural characteristics differ between individuals, i.e., distinctive features of the vocal tract shape and intonation can be captured and utilized for Automatic Speaker Verification (ASV). Based on a speech sample, the ASV's function is to accept or reject a claimed identity. Even though biometric authentication has advanced significantly, such systems remain vulnerable to spoofing attacks: one can synthetically produce an individual's speech and use it for authentication purposes. Our project focuses on fundamental recognition performance, as opposed to security against spoofing, by modifying some existing features like Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Cepstral Coefficients (LPCC) and Perceptual Linear Prediction (PLP), and conducts a study on impersonation, replay, speech synthesis and voice conversion spoofing attacks.

... associated with each test example. On the contrary, in speaker identification the speaker does not make any explicit claim; the system instead attempts to find the best match of the test speech against the set of available trained speaker models. SV systems are more widely accepted for practical deployment, while SI systems are used in forensic applications. Based on the text, SV systems can be classified as text-dependent or text-independent. In a text-dependent system the text is fixed, so the speaker utters the same text during training and testing, whereas in a text-independent system there is no restriction on the speech content of the users during training and testing.
II. PROBLEM STATEMENT
\[
C_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, C_k\, a_{m-k}, \qquad 1 \le m \le p
\]

\[
C_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, C_k\, a_{m-k}, \qquad m > p
\]

... against MFCC. It is defined as one of the characteristic properties of the audio system.

4) RFCC FEATURE EXTRACTION METHOD:
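Returning to the LPCC recursion for C_m given earlier in this section (not the RFCC method), the conversion from LPC coefficients to cepstral coefficients can be sketched as follows. This is a minimal illustration rather than the project's actual implementation: the function name `lpc_to_lpcc` and its arguments are hypothetical, and it assumes the predictor convention s_hat(n) = sum_k a_k s(n-k), for which the recursion holds with the signs shown.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients C_1..C_n_ceps.

    Assumption: `a` holds the predictor coefficients of
    s_hat(n) = sum_k a_k * s(n - k), so a[0] is a_1.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused so that c[m] matches C_m
    for m in range(1, n_ceps + 1):
        # The a_m term is present only while m <= p.
        acc = a[m - 1] if m <= p else 0.0
        # k starts at max(1, m - p), which covers both branches of the recursion.
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]


if __name__ == "__main__":
    # A single-pole model with a_1 = 0.5 has cepstrum C_n = 0.5**n / n.
    print(lpc_to_lpcc([0.5], 4))
```

Using one loop bound, `max(1, m - p)`, keeps the two cases of the recursion in a single code path; the single-pole case is a convenient sanity check because its cepstrum has a closed form.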
... a posteriori (MAP) estimation from a well-trained prior model. The EM algorithm is iterative in nature. GMMs are generally used for text-independent speaker identification. There is no need for a vocabulary database or a large phoneme set. Capturing the general characteristics of a population and then adapting them to the individual speaker is the basic idea of the UBM. The UBM is the model that represents person-independent feature characteristics, against which the person-specific feature model is compared when deciding on acceptance or rejection. The UBM can also be described as a GMM trained on a large set of speakers. First, a likelihood score or ratio for an unknown speech sample is found; after that, the match score of the speaker-specific model and the universal background model is formed by using speaker ...

... with mean vector $\mu_i$ and covariance matrix $\Sigma_i$. The mixture weights satisfy the constraint that $\sum_i \omega_i = 1$. The complete Gaussian mixture model is parameterized by the mean vectors, covariance matrices and mixture weights from all component densities. These parameters are collectively represented by the notation

\[
\lambda = \{\omega_i, \mu_i, \Sigma_i\}
\]

For a sequence of T training vectors

\[
X = \{x_1, \dots, x_T\}
\]

the GMM likelihood, assuming independence between the vectors, can be written as

\[
p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)
\]

For utterances with T frames, the log-likelihood of a speaker model s is

\[
L_s(X) = \log p(X \mid \lambda_s) = \sum_{t=1}^{T} \log p(x_t \mid \lambda_s)
\]
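As a rough illustration of the GMM likelihood p(X | λ) and the log-likelihood L_s(X) above, the sketch below scores a sequence of feature vectors against a GMM and picks the best-scoring enrolled model. It rests on several assumptions that are not in the original text: covariances are taken to be diagonal (stored as per-dimension variances), the log-likelihood is computed as the sum of per-frame log densities (the logarithm of the product above), all mixture weights are strictly positive, and the helper names `log_gaussian_diag`, `gmm_log_likelihood` and `identify` are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(X, weights, means, variances):
    """Sum over frames of log sum_i w_i * N(x_t; mu_i, diag(var_i))."""
    total = 0.0
    for x in X:
        comp = [
            math.log(w) + log_gaussian_diag(x, mu, var)
            for w, mu, var in zip(weights, means, variances)
        ]
        # Log-sum-exp over components for numerical stability.
        top = max(comp)
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total


def identify(X, models):
    """Return the name of the enrolled model with the highest log-likelihood.

    `models` maps a speaker name to a (weights, means, variances) tuple.
    """
    return max(models, key=lambda s: gmm_log_likelihood(X, *models[s]))


if __name__ == "__main__":
    # Toy one-dimensional models; the values are made up for illustration.
    models = {
        "A": ([1.0], [[0.0]], [[1.0]]),
        "B": ([1.0], [[5.0]], [[1.0]]),
    }
    print(identify([[0.1], [-0.2]], models))
```

Working in the log domain avoids the underflow that the raw product over T frames would cause for long utterances.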
For speaker identification, the value of $L_s(X)$ is computed for all speaker models s enrolled in the system, and the owner of the model that generates the highest value is returned as the identified speaker. During the training phase, the models are trained on the feature vectors using the Expectation-Maximization (EM) algorithm, which iteratively updates each of the parameters in $\lambda$, with the log-likelihood increasing at each step.

EQUAL ERROR RATE:

In an automatic speaker verification (ASV) system, the equal error rate (EER) is a measure used to evaluate system performance: it is the error rate at the decision threshold where the false acceptance rate equals the false rejection rate. Calculating the EER usually requires a large number of testing samples. In order to estimate the EER without experiments using testing samples, a method of model-based EER estimation, which computes likelihood scores directly from the client speaker models, has been proposed. However, the distribution of the computed likelihood scores is significantly biased relative to the distribution of likelihood scores obtained from testing samples. The speaker models of the client speakers and the imposters are therefore designed and manipulated so that the distribution of the computed likelihood scores is closer to the distribution of likelihood scores obtained from testing samples. A more reliable EER can then be calculated from the speaker models.

The natural/converted speech decision is made using the log-scale likelihood ratio as follows:

\[
\Lambda(C) = \log p(C \mid \lambda_{\text{converted}}) - \log p(C \mid \lambda_{\text{natural}})
\]

where C is the feature vector sequence of a speech signal, $\lambda_{\text{converted}}$ is the GMM for converted speech, and $\lambda_{\text{natural}}$ is the GMM for natural speech. Under the three different situations, we have the same natural speech model $\lambda_{\text{natural}}$ but three different converted speech models $\lambda_{\text{converted}}$. The number of Gaussian components of each GMM is set to 512. Equal error rate (EER) is reported as the evaluation criterion.

Feature        Equal error rate (%)
MFCCs          16.80
cos-phase      6.60
MGDF-phase     9.13

Source: 2006 NIST Speaker Recognition Evaluation Test Set

VII. FUTURE WORK

Our future work includes identifying the best possible feature extraction method and improving its performance. Based on this data, the speaker verification system is to be trained and tested with a Gaussian mixture model (GMM). Other blocks in the SV system will be studied and implemented. The vulnerability of the SV system to different spoofing attacks will be studied. Countermeasures for spoofing attacks will be proposed
and implemented, and the system will be trained accordingly and a performance check will be conducted.

VIII. WORK DONE

We evaluated the performance of different feature extraction methods like

[4] Douglas A. Reynolds, "Speaker identification and verification using Gaussian mixture speaker models," Speech Communication 17 (1995) 91-108; ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, Martigny, 5-7 April 1994.
[5] Longbiao Wang, Yuta Kawakami, "Relative Phase Information for Detecting Human Speech and Spoofed Speech," INTERSPEECH 2015.
[6] C. J. Kaufman, Rocky Mountain Research Lab., Boulder, CO, private communication, May 1995.