Journal of Intelligent & Fuzzy Systems xx (20xx) x-xx

DOI:10.3233/JIFS-169692
IOS Press

SVM based Voice Activity Detection by fusing a new acoustic feature PLMS with some existing acoustic features of speech

Shambhu Shankar Bharti*, Manish Gupta and Suneeta Agarwal

Department of Computer Science & Engineering, National Institute of Technology, Allahabad, U.P., India

Abstract. Voice activity detection (VAD) identifies the presence/absence of human speech in a frame of a given speech signal. The presence/absence of human speech can easily be identified in a clean speech signal, but accuracy decreases with decreasing Signal-to-Noise ratio (SNR) value. Robust VAD helps to enhance the efficiency of speech signal based automated applications like speech enhancement, speaker identification, hearing aid devices etc. In this paper, a new feature of the speech signal, "Peak of Log Magnitude Spectrum (PLMS)", is introduced and used for VAD. This newly defined feature PLMS, along with three existing acoustic features (MFCC, RASTA-PLP and Formant Frequency), is used to train an SVM classifier for VAD. Experimentally, it is found that the coefficients of PLMS play the most prominent role. Experimentally, it is also observed that the accuracy of the trained SVM classifier for VAD is the highest when compared with other state of the art methods (Sohn VAD and VAD G.729).

Keywords: VAD, PLMS, SVM, MFCC, RASTA-PLP, Formant Frequency

1. Introduction

Voice Activity Detection (VAD) is a binary classification problem because any frame either contains human speech or does not. VAD differentiates the presence or absence of human speech in a speech frame. Voice (human speech) can be easily identified in a clean (noise-free) speech signal using simple acoustic features of speech like zero crossing rate (ZCR) and energy. Identification of voice becomes tedious when the signal-to-noise ratio (SNR) is low. Due to this, robust voice activity detection is considered to be one of the most open and challenging tasks in the speech signal processing area. An efficient and robust VAD improves the SNR of existing speech enhancement algorithms by enhancing the capability of noise estimation. Several approaches for VAD, like the traditional algorithmic approach, the machine-learning based approach etc., exist in the literature. The traditional algorithmic approach takes its decision on the basis of one or two acoustic features. Machine-learning based VAD uses multiple features for taking its decision, which makes it more resistant to noise than traditional approaches. These VADs are more applicable in other speech processing systems because they can naturally integrate with systems like speech/speaker recognition. Since machine-learning based VAD fuses multiple acoustic features for taking its decision, the selection of features plays a crucial role. Fusion of multiple features of speech (with similar properties) may or may not increase the performance of the system. One can select a large number of features to produce a better result, but it may take a longer time in training, modelling and testing, which will defeat its applicability in real life applications like hearing aid devices, online audio chat etc. Hence,

*Corresponding author. Shambhu Shankar Bharti, Department of Computer Science & Engineering, National Institute of Technology Allahabad, U.P-211004, India. E-mails: shambhu4u08@gmail.com and rcs1403@mnnit.ac.in.

1064-1246/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved
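
The introduction notes that, in clean speech, voice can be identified from simple features such as zero crossing rate (ZCR) and energy. A minimal Python sketch of that traditional idea is given here for illustration only. It is not the PLMS/SVM method of the paper, and the frame length, hop size, thresholds and all function names are assumptions chosen for the example.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def simple_vad(samples, frame_len=160, hop=80,
               energy_threshold=0.01, zcr_ceiling=0.5):
    """Mark a frame as speech-like when its energy is above an (assumed)
    threshold and its ZCR is below an (assumed) ceiling."""
    decisions = []
    for frame in frame_signal(samples, frame_len, hop):
        is_speech = (short_time_energy(frame) > energy_threshold
                     and zero_crossing_rate(frame) < zcr_ceiling)
        decisions.append(is_speech)
    return decisions

# Synthetic test signal at an assumed 8 kHz sampling rate:
# 0.1 s of silence followed by 0.1 s of a 200 Hz tone standing in for voiced speech.
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
decisions = simple_vad(silence + tone)
print(sum(decisions), "of", len(decisions), "frames marked as speech-like")
```

Fixed thresholds of this kind tend to fail once noise raises the energy and ZCR of non-speech frames, which is the low-SNR difficulty the introduction describes and the motivation for fusing several features in a trained classifier.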
