Implementing A Voice Recognition System Based On DSP Tms320c28121
HO CHI MINH CITY
FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING
JANUARY 2009
TABLE OF CONTENTS
Part 2: Overview of sound recognition and selection of a digital algorithm to embed in the DSP
I. Overview of ASR systems
3. Feature extraction
Part 3: The embedded system
I. eZdsp board
II. TMS320F2812
4. Power supply
c) Windowing and FFT
e) Cepstrum
III. Results and remarks
Appendix
Part 1:
Introduction.
I. Some applications of speech recognition systems.
Automatic Speech Recognition (ASR) systems have promising applications in every
area of life. If applied successfully, they would revolutionize human-machine
interaction (Human Machine Interface). Their applications span many fields, such as
industry, security, and entertainment.
In control:
Systems with a small vocabulary and isolated-word input can be used in relatively
simple applications to make data entry more efficient (speech input is about twice as
fast as typing): in production environments (sorting tasks), in applications where the
hands cannot be used (for example in a dark room or in a cockpit), in remote control of
devices, robot control, and toy control, or in devices that must be miniaturized by
eliminating the keypad. Where feasible, they would also be an effective way for visually
impaired people to interact with and control devices. In general, these are specialized
tasks that tend to limit the vocabulary and the message content. In this thesis, the
author focuses on a recognition system with a small vocabulary (fewer than 10 words)
for control systems with a fixed command set.
The practical benefits of such a system would be enormous: our computers would no
longer need keyboards, control systems would no longer need complicated control panels,
and telephones would no longer need dial pads. In front of a car driver there would be
a chip that answers automatically when asked for directions, and at home a calendar
would remind people of unfinished tasks when asked. This can be regarded as a
breakthrough in every area of our lives.
Systems of this kind can be found in modern mobile phones such as Apple's iPhone
or Nokia's Nseries.
In signal conversion:
When an interview is to be published in a newspaper, a complete sentence recognition
system would spare the reporter from having to recall the interview. In live conferences
or remote talks, the minutes of the meeting would be printed automatically, with no
secretary needed to draft them. The speech recognition system automatically converts
speech into text.
In conversations hampered by a language barrier, or by sensitivities around national
pride, a text translation system combined with two speech recognition systems would
convert back and forth between the two languages and let the conversation proceed
normally and naturally. Such a live language conversion system is very useful at large
conferences attended by many countries and peoples.
A system of this kind requires very high recognition capability, and so far its
practical use remains limited.
In identification:
Speech recognition combined with speech synthesis is also applied to speaker
identification. A voice password system identifies a person by voice, for example when
withdrawing money from a bank or performing other transactions that require personal
confidentiality, without checking a signature or other documents. It can also be used in
automatic lock systems in which the key is the voice.
ASR systems of this kind already have practical applications.
For more on practical applications of ASR systems, see
http://en.wikipedia.org/wiki/Speech_recognition.
Health. A person in good health pronounces words quite differently than when ill, for
example with a cold or flu.
Speaking rate.
Even for one person within a short period, different utterances of the same word can
differ.
Noise from the surrounding environment. For example, a person speaking in a quiet space
is easier to hear than one out on the street.
The recording handset may differ from one situation to another.
The ideal condition for sound recognition in general, and speech recognition in
particular, is that the voice remains stable during both training and recognition. Each
person's voice is unique and does not coincide with anyone else's. For these reasons,
recognizing sound and speech remains a very difficult task to date.
III.
Write an embedded program for the DSP that performs speech recognition. Choose an
algorithm that fits the resources of the hardware.
Because the goal is to focus on the algorithm, there is no need to build additional
demo hardware (for example, a voice-controlled car). In this thesis, the recognition
result is shown only on the LCD and the LEDs.
Evaluate what can be achieved.
Part 2:
OVERVIEW
OF SOUND RECOGNITION
& SELECTION OF A DIGITAL ALGORITHM TO EMBED IN THE
DSP
I. Overview of ASR systems.
1. Operating principle of an ASR system.
The following general principles are defined for speaker identification systems.
Text-dependent speaker identification systems are classified in a similar way.
See [1].
ASR systems can be divided into two types: identification (Speaker Identification)
and verification (Speaker Verification).
Identification system (Speaker Identification): a system that decides which of the
people who trained the system is currently speaking to it.
Verification system (Speaker Verification): a system that accepts or rejects a given
person. It decides whether the person who has just spoken to the system is among the
registered people or not.
Figure 2.1 shows the basic structure of these two systems.
[Figure 2.1. (a) Speaker identification: input speech -> feature extraction ->
similarity against the reference models (Speaker #1 ... Speaker #N) -> maximum
selection -> identification result (Speaker ID). (b) Speaker verification: input speech
and claimed speaker ID (#M) -> feature extraction -> similarity against the reference
model (Speaker #M) -> decision against a threshold -> verification result
(Accept/Reject).]
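The two decision stages of Figure 2.1 can be sketched in C as follows. This is only an illustration: the function names, the use of `double` similarity scores, and the "higher is more similar" convention are assumptions, not code from the thesis.

```c
#include <assert.h>
#include <stddef.h>

/* Identification ("maximum selection"): return the index of the reference
 * model with the highest similarity score.
 * ASSUMPTION: a larger score means a better match. */
size_t identify_speaker(const double *similarity, size_t n_models)
{
    assert(similarity != NULL && n_models > 0);
    size_t best = 0;
    for (size_t i = 1; i < n_models; ++i) {
        if (similarity[i] > similarity[best]) {
            best = i;
        }
    }
    return best;
}

/* Verification: accept (1) the claimed speaker only if the similarity to
 * that speaker's reference model reaches the threshold, else reject (0). */
int verify_speaker(double similarity_to_claimed, double threshold)
{
    return similarity_to_claimed >= threshold ? 1 : 0;
}
```

Identification always returns one of the trained speakers, whereas verification can reject; this is why only the verification branch of the figure needs a threshold.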
3. Feature extraction.
Extracting the features of a sample is an important part of any recognition system.
Ideally, each distinct object has one or more distinguishing features. The more the
features differ between objects, the more accurate the recognition.
Recognition is based on these features, using either a single feature or a
combination of several. Current ASR systems usually use only one feature of the audio
signal.
To date, the main feature extraction methods include Linear Prediction Coding (LPC),
Mel Frequency Cepstrum Coefficients (MFCC), Principal Components Analysis (PCA), and
others.
Linear Prediction Coding.
[Figure: MFCC processing chain. Frame Blocking -> (frame) -> Windowing -> FFT ->
(spectrum) -> Mel-frequency Wrapping -> (mel spectrum) -> Cepstrum -> (mel cepstrum).]
Record / Sampling.
The human ear is most sensitive to signals in the range of about 100 Hz to 5 kHz, and
for typical audio signals this band carries most of the energy of the emitted sound.
To capture the main energy of human speech, my system samples the audio at 12 kHz, so
the captured signal can carry frequencies up to 6 kHz.
1.b)
Frame Blocking.
1.c)
Word detection.
To reduce the amount of computation and increase accuracy, the signal is processed only
when speech has actually been captured. This task is called word detection (end point
detection).
The most common method of word detection uses the energy (ESS, Energy of Speech
Signal) combined with the zero crossing rate (ZCR, Zero Crossing Rate). See [2]. Other
methods include Teager's Energy, see [3], and trained neural networks.
Because of its simplicity and popularity, the ESS method is chosen for word detection
here.
The ESS method rests on the observation that when speech is present, the captured
energy is much larger than during silence. Whether speech is present is therefore
decided from the energy of the captured signal.
The energy of the n-th frame is computed as follows:
$$E_n = \sum_{i=1}^{\text{length of frame}} \ldots$$
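As an illustration of the two measurements named above, the following C sketch computes a frame energy and a zero-crossing count. The summand of the energy formula is an assumption (sum of squared samples is used here; a sum of magnitudes is another common choice), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short-time energy of one frame.
 * ASSUMPTION: energy is the sum of squared samples. */
int64_t frame_energy(const int16_t *x, size_t len)
{
    assert(x != NULL);
    int64_t e = 0;
    for (size_t i = 0; i < len; ++i) {
        e += (int64_t)x[i] * x[i];
    }
    return e;
}

/* Zero-crossing count: number of sign changes between adjacent samples.
 * A sample equal to zero is treated as non-negative. */
size_t zero_crossings(const int16_t *x, size_t len)
{
    assert(x != NULL);
    size_t n = 0;
    for (size_t i = 1; i < len; ++i) {
        if ((x[i - 1] < 0) != (x[i] < 0)) {
            ++n;
        }
    }
    return n;
}
```

The 64-bit accumulator avoids overflow for any realistic frame of 16-bit samples; on a fixed-point DSP a narrower accumulator with scaling would be a design trade-off.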
Compute
I1 = 0.03 * (IMX - IMN) + IMN.
I2 = 4 * IMN.
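A minimal sketch of how such thresholds could be derived in C is shown below. It assumes the commonly cited Rabiner-Sambur formulation, in which IMX is the peak energy, IMN is the silence energy, I2 = 4 * IMN, ITL = min(I1, I2) and ITU = 5 * ITL; the values ITL = 1191 and ITU = 5955 quoted later in this document are consistent with ITU = 5 * ITL. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t itl;  /* lower energy threshold */
    int32_t itu;  /* upper energy threshold */
} energy_thresholds;

/* ASSUMPTION: Rabiner-Sambur style thresholds.
 *   I1  = 0.03 * (IMX - IMN) + IMN
 *   I2  = 4 * IMN
 *   ITL = min(I1, I2)
 *   ITU = 5 * ITL
 * Integer arithmetic is used so the result is exact and DSP-friendly. */
energy_thresholds compute_thresholds(int32_t imx, int32_t imn)
{
    assert(imn >= 0 && imx >= imn);
    int64_t i1 = (3 * ((int64_t)imx - imn)) / 100 + imn;
    int64_t i2 = 4 * (int64_t)imn;
    energy_thresholds t;
    t.itl = (int32_t)((i1 < i2) ? i1 : i2);
    t.itu = 5 * t.itl;
    return t;
}
```

For example, a peak energy of 10000 and a silence energy of 100 give I1 = 397 and I2 = 400, so ITL = 397 and ITU = 1985.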
Windowing.
$0 \le n \le N - 1$.
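For reference, the Hamming window named in the caption of Figure 2.12 is conventionally defined as below. The constants 0.54 and 0.46 are the standard textbook values and are an assumption here, not values confirmed by this document.

```latex
w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1
```

With this definition the window tapers to $w(0) = w(N-1) = 0.08$ at the frame edges and is close to 1 in the middle, which reduces the spectral leakage caused by cutting the signal into frames.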
FFT.
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-j 2\pi k n / N}, \qquad k = 0, 1, 2, \ldots, N-1$$
Figure 2.12. Spectrum magnitude after the FFT. Top: the signal without windowing.
Bottom: the signal windowed with a Hamming window.
The result of this step is a sequence of frequency-spectrum magnitudes for the
consecutive frames.
1.e)
Psychophysical studies of the human ear show that its response to speech signals does
not follow a linear frequency scale. As a perceptual approach, each sound signal is
therefore converted to a matching scale, and this is where the mel frequency is used.
The mel frequency scale is linear below 1 kHz and logarithmic above 1 kHz. The mel
frequency is computed as follows:
$$\mathrm{mel}(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)$$
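As a quick check of the mel formula, the worked values below use only the constants 2595 and 700 given above, together with the 12 kHz sampling rate (6 kHz upper frequency) stated earlier. The inverse mapping is obtained by solving the formula for f.

```latex
\mathrm{mel}(1000) = 2595 \log_{10}\!\left(1 + \tfrac{1000}{700}\right)
                   = 2595 \log_{10}(2.4286) \approx 2595 \times 0.3854 \approx 1000

\mathrm{mel}(6000) = 2595 \log_{10}\!\left(1 + \tfrac{6000}{700}\right)
                   = 2595 \log_{10}(9.5714) \approx 2595 \times 0.9810 \approx 2546

f = 700 \left(10^{\,\mathrm{mel}/2595} - 1\right)
```

So 1 kHz maps to about 1000 mel, while the whole band up to 6 kHz spans only about 2546 mel, which shows the compression of high frequencies.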
In practice, the mel spectrum is computed with a filterbank of filter windows that are
spaced uniformly on the mel scale, as in Figure 2.13 below. The filterbank consists of
band-pass filters shaped like ...
$$\tilde{S}(l) = \sum_{k=0}^{N/2} |X_k| \, M_l(k), \qquad l = 0, 1, \ldots, L-1$$
Cepstrum
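There are many formulas for computing the DCT; for further reference see ...
$$\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\!\left[ n \left(k - \frac{1}{2}\right) \frac{\pi}{K} \right], \qquad n = 0, 1, \ldots, K-1$$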
... the first speaker, and rectangles mark the acoustic vectors of the second speaker.
During training, a clustering algorithm (described later) is used to build a VQ codebook
for that word.
[Figure: VQ codebook formation. Samples and centroids of Speaker 1 and Speaker 2, with
the VQ distortion shown as the distance from a sample to its nearest centroid.]
$$l = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + \ldots + (b_n - a_n)^2}$$
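The distance above is the basis of the VQ distortion. A minimal C sketch follows; it uses the squared distance (no square root), which is an assumption made to stay in integer arithmetic and which does not change which codeword is nearest. Function names and the row-major array layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Squared Euclidean distance between two dim-dimensional vectors. */
int64_t sq_dist(const int16_t *a, const int16_t *b, size_t dim)
{
    int64_t acc = 0;
    for (size_t i = 0; i < dim; ++i) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (int64_t)d * d;
    }
    return acc;
}

/* Total VQ distortion of n_vecs acoustic vectors against one codebook:
 * each vector contributes its squared distance to the nearest codeword.
 * Both arrays are stored row-major, one vector per row of length dim. */
int64_t vq_distortion(const int16_t *vecs, size_t n_vecs,
                      const int16_t *codebook, size_t n_codes, size_t dim)
{
    assert(vecs != NULL && codebook != NULL && n_codes > 0);
    int64_t total = 0;
    for (size_t v = 0; v < n_vecs; ++v) {
        const int16_t *x = vecs + v * dim;
        int64_t best = sq_dist(x, codebook, dim);
        for (size_t c = 1; c < n_codes; ++c) {
            int64_t d = sq_dist(x, codebook + c * dim, dim);
            if (d < best) {
                best = d;
            }
        }
        total += best;
    }
    return total;
}
```

At recognition time, the codebook with the smallest total distortion for the input vectors would be chosen as the match.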
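[Figure: flowchart of the LBG algorithm. Find centroid -> split each centroid
(m = 2*m) -> cluster vectors -> find centroids -> compute D (distortion) -> test
(D' - D) / D against a threshold: if No, set D' = D and cluster again; if Yes, test
m < M: if Yes, split again; if No, stop.]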
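The flowchart steps can be sketched in C as below. This is a simplified illustration under stated assumptions, not the thesis implementation: it uses `double` rather than fixed-point arithmetic, splits each centroid y into y(1 + eps) and y(1 - eps), reuses eps as the convergence threshold, and requires M to be a power of two. The limits 16 and 20 echo the codebook size and vector dimension mentioned in this document; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LBG_MAX_CODES 16   /* codebook size mentioned in the text */
#define LBG_MAX_DIM   20   /* mel filterbank size mentioned in the text */

static double sq_dist_d(const double *a, const double *b, size_t dim)
{
    double acc = 0.0;
    for (size_t i = 0; i < dim; ++i) {
        double d = a[i] - b[i];
        acc += d * d;
    }
    return acc;
}

/* One pass: cluster vectors around the current centroids, then move each
 * centroid to the mean of its cluster. Returns the mean distortion measured
 * against the centroids as they were before the update. */
static double lbg_iterate(const double *data, size_t n, size_t dim,
                          double *code, size_t m)
{
    double sum[LBG_MAX_CODES][LBG_MAX_DIM] = {{0.0}};
    size_t count[LBG_MAX_CODES] = {0};
    double total = 0.0;
    for (size_t v = 0; v < n; ++v) {
        const double *x = data + v * dim;
        size_t best = 0;
        double best_d = sq_dist_d(x, code, dim);
        for (size_t c = 1; c < m; ++c) {
            double d = sq_dist_d(x, code + c * dim, dim);
            if (d < best_d) { best_d = d; best = c; }
        }
        total += best_d;
        ++count[best];
        for (size_t i = 0; i < dim; ++i) sum[best][i] += x[i];
    }
    for (size_t c = 0; c < m; ++c) {
        if (count[c] == 0) continue;          /* keep an empty cell as is */
        for (size_t i = 0; i < dim; ++i)
            code[c * dim + i] = sum[c][i] / (double)count[c];
    }
    return total / (double)n;
}

/* Train an M-vector codebook (M a power of two). data holds n row-major
 * vectors; code must have room for M * dim values. Returns the final mean
 * squared-error distortion. */
double lbg_train(const double *data, size_t n, size_t dim,
                 double *code, size_t M, double eps)
{
    assert(data != NULL && code != NULL && n > 0);
    assert(dim > 0 && dim <= LBG_MAX_DIM);
    assert(M >= 1 && M <= LBG_MAX_CODES && (M & (M - 1)) == 0);
    for (size_t i = 0; i < dim; ++i) code[i] = 0.0;
    (void)lbg_iterate(data, n, dim, code, 1);      /* centroid of all data */
    double d = lbg_iterate(data, n, dim, code, 1); /* its distortion */
    for (size_t m = 1; m < M; ) {
        for (size_t c = 0; c < m; ++c) {           /* split each centroid */
            for (size_t i = 0; i < dim; ++i) {
                double y = code[c * dim + i];
                code[(c + m) * dim + i] = y * (1.0 - eps);
                code[c * dim + i] = y * (1.0 + eps);
            }
        }
        m *= 2;
        double prev;
        d = lbg_iterate(data, n, dim, code, m);
        do {                                       /* until D stops improving */
            prev = d;
            d = lbg_iterate(data, n, dim, code, m);
        } while (prev > 0.0 && (prev - d) / prev > eps);
    }
    return d;
}
```

One known limitation of multiplicative splitting is that a centroid at the origin cannot be split; an additive perturbation is a common alternative.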
... the sound HONEY; the second plot is the codebook (16 vectors) obtained from the LBG
algorithm. Plotted this way, the position of the codebook relative to the acoustic
vectors cannot be seen. However, because the resulting codebook consists of
20-dimensional vectors (the size of the mel filterbank), it is difficult to draw in a
multidimensional space, so this plot mainly serves to show the result.
Figure 2.19. VQ codebook obtained from the corresponding acoustic vectors using the LBG
algorithm.
The larger the codebook, the higher the recognition rate. According to reference [6],
the effect is shown in the following two tables. These results have not been
re-verified here, but they can be taken as reference figures:
Figure 2.20. Recording program used to test the algorithm on a PC.
Results obtained when running in MATLAB.
When tested with 9 words, the digits 1 to 9, the program distinguished all 9 words.
Part 3:
The embedded system.
I. Board eZdsp
My thesis is built on a DSP from Texas Instruments; the board used is the eZdsp
http://c2000.spectrumdigital.com/ezf2812.
The board has the following features:
DSP core: TMS320F2812.
Speed: 150 MIPS.
18K words on-chip RAM.
128K words on-chip Flash memory.
64K words off-chip SRAM memory.
Onboard IEEE 1149.1 JTAG controller.
Onboard IEEE 1149.1 JTAG emulation connector.
II. TMS320F2812.
The TMS320F2812 is the core of the eZdsp F2812 board. What follows is a brief
introduction to this DSP; more detailed information is given in TI's Application
Reports.
The C28x DSP family is the newest member of the TMS320C2000 DSP line. C28x programs
are compatible with the 24x/240x DSPs. With its 32 x 32-bit MAC capability and 64-bit
processing capability, the C28x becomes a choice for applications that would otherwise
require floating-point controllers.
3. External Interface.
The asynchronous interface has 19 address lines, 16 data lines, and 3 chip-select
lines. The chip-select lines are mapped to 5 external zones: Zones 0, 1, 2, 6, and 7.
Each zone can be programmed with its own wait states and strobe signal setup and hold
timing.
4. Flash.
The F2812 contains 128K x 16 of Flash memory, divided into four 8K x 16 sectors and
six 16K x 16 sectors.
Flash performance can be improved by enabling the flash pipeline feature in the Flash
control registers.
5. M0, M1 SARAMs.
Each of these blocks contains 1K x 16 of RAM and can hold either program or data.
7. Boot ROM.
The Boot ROM contains boot-loading software that runs after the CPU is reset. It checks
several GPIO pins to decide which mode to use to start the program.
8. Oscillator and PLL.
The F2812 is clocked either by an external oscillator or by a crystal attached to the
chip. A PLL provides up to 10 scaling ratios for this clock. The PLL ratio can be
changed even while the program is running, which allows the program to lower the
operating frequency, in ...
9. GPIO Multiplexer.
Most peripheral signals are multiplexed with general-purpose I/O. Registers select
whether a pin acts as a GPIO pin or as a peripheral signal pin.
External pin.
See the document spru060c (ADC user manual) for more complete information.
III.
On the eZdsp F2812 board, the program can be placed in RAM, either on-chip or
off-chip, to obtain the highest operating speed (150 MIPS from on-chip RAM). In this
case, data and program are both held in RAM.
The program can also be placed in Flash (F devices) or ROM (C devices). In this
configuration, however, the operating speed drops to only about 120 to 130 MIPS.
In standalone systems the DSP is not connected to a host, so the program must be stored
in Flash or ROM. To reach the highest speed for functions that are computationally
demanding, these parts of the program can be copied from Flash into RAM and run from
there.
In my program, this copy method is applied to functions with strict real-time
requirements, such as collecting data from the ADC, computing the acoustic vector
features, training, and recognition.
The Application Report SPRA958, Running an Application from Internal Flash Memory on
the TMS320F281x DSP, provides more information on this method.
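The copy step itself is just a word-by-word transfer from the load address in Flash to the run address in RAM, performed once at start-up before the time-critical functions are called. The portable sketch below illustrates the idea on a host machine; the function name is hypothetical, and on the real device the source and destination addresses would come from linker-generated symbols for the section that holds the RAM-resident functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 16-bit words from [src, src_end) to dst.
 * On the DSP, src/src_end would be the load-start/load-end addresses of the
 * code section in Flash and dst its run-start address in RAM (assumed to be
 * provided by the linker; not shown here). */
void mem_copy_words(const uint16_t *src, const uint16_t *src_end,
                    uint16_t *dst)
{
    assert(src != NULL && src_end != NULL && dst != NULL);
    while (src < src_end) {
        *dst++ = *src++;
    }
}
```

The trade-off is RAM usage: every function copied this way occupies space both in Flash and in the limited on-chip RAM, so only the real-time routines are worth moving.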
The following is the linker file that partitions memory for the program in my thesis.
MEMORY
{
PAGE 0: /* Program Memory */
PAGE = 0
/* Math Code */
PAGE = 0, TYPE = NOLOAD /* Math Tables In ROM */
/* newwwwww */
eramdata
: > ZONE6
PAGE = 1
/* for my APP */
FFTipcb
ALIGN(256) : { }
>
RAMH0
PAGE 1
RAMH0
RAMH0
RAMH0
PAGE 1
PAGE 1
PAGE = 1
TI provides many DSP libraries, which considerably shorten the time needed to develop
algorithms. The DSP libraries used in my project include:
C28x Communications Driver Library: the SPI library is used to communicate with the
external EEPROM.
C28x Fast Fourier Transforms Library: computes the FFT of the audio signals.
C28x Filter Library: an IIR filter is used to filter the input signal from the ADC.
C28x Fixed-Point Math Library: a math toolkit that supports arithmetic operations.
C28x IQMath Library - A Virtual Floating Point Engine: a toolkit for real-number
computation on a fixed-point DSP, which also supports arithmetic operations.
Part 4:
Figure 4.2. Communication with the EEPROM over the SPI interface.
The ROM used is the Microchip 25AA640A: 64 Kbit of memory with an 8-bit interface and a
bus speed of up to 10 MHz.
To communicate with this EEPROM, its instruction set is used to select the functions of
the chip.
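As an illustration of the instruction-based access described above, the sketch below builds the three-byte header (opcode, then a 16-bit address sent high byte first) that precedes a read or write on this family of SPI EEPROMs. The opcode values are the usual 25xx-series ones and should be checked against the 25AA640A datasheet; the enum and function names are hypothetical, and the actual SPI transfer is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction opcodes (ASSUMPTION: standard 25xx-series values). */
enum {
    EEPROM_WRSR  = 0x01,  /* write status register */
    EEPROM_WRITE = 0x02,  /* write data starting at address */
    EEPROM_READ  = 0x03,  /* read data starting at address */
    EEPROM_WRDI  = 0x04,  /* reset write-enable latch */
    EEPROM_RDSR  = 0x05,  /* read status register */
    EEPROM_WREN  = 0x06   /* set write-enable latch */
};

#define EEPROM_SIZE_BYTES 8192u  /* 64 Kbit organised as 8K x 8 */

/* Fill out[0..2] with opcode, address high byte, address low byte.
 * Returns the number of header bytes to send before the data phase. */
size_t eeprom_build_header(uint8_t opcode, uint16_t addr, uint8_t out[3])
{
    assert(out != NULL);
    assert(addr < EEPROM_SIZE_BYTES);
    out[0] = opcode;
    out[1] = (uint8_t)(addr >> 8);
    out[2] = (uint8_t)(addr & 0xFFu);
    return 3;
}
```

A write would additionally need a preceding write-enable instruction and a wait for the internal write cycle to finish, typically by polling the status register.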
4. Power supply.
To interface with the DSP (3.3 V supply) and to power the op-amp signal amplifier,
+5 V, +3.3 V, and -5 V supplies are needed. To create the 1.5 V offset voltage applied
before the signal enters the ADC, a 1.5 V supply is also required.
The circular buffer used is 128 points long, and each point is a signed int. After
capture, the signal is passed through an IIR filter to remove the offset.
The ADC is configured to operate in override mode. In this mode, channel 0 is converted
8 times in succession and the values are stored in Sequence 1 and then Sequence 2.
After these 8 conversions, an interrupt is raised.
Configured this way, the program does not spend much time servicing ADC interrupts.
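The buffering scheme described above, in which one interrupt delivers a burst of 8 conversions into a 128-point circular buffer, can be sketched as follows. The structure and function names are hypothetical and the ADC register access is omitted; only the buffer length and burst size come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN  128u  /* circular buffer length, from the text */
#define ADC_BURST 8u    /* conversions delivered per interrupt, from the text */

typedef struct {
    int16_t data[RING_LEN];
    size_t head;   /* index of the next write */
    size_t count;  /* number of valid samples, saturates at RING_LEN */
} ring_buffer;

void ring_init(ring_buffer *rb)
{
    assert(rb != NULL);
    rb->head = 0;
    rb->count = 0;
}

/* Called once per ADC interrupt with the n samples of the burst.
 * RING_LEN is a power of two, so wrapping is a cheap bit mask. */
void ring_push_burst(ring_buffer *rb, const int16_t *burst, size_t n)
{
    assert(rb != NULL && burst != NULL);
    for (size_t i = 0; i < n; ++i) {
        rb->data[rb->head] = burst[i];
        rb->head = (rb->head + 1u) & (RING_LEN - 1u);
        if (rb->count < RING_LEN) {
            ++rb->count;
        }
    }
}

/* Read the sample written `age` pushes ago (0 = newest). */
int16_t ring_peek(const ring_buffer *rb, size_t age)
{
    assert(rb != NULL && age < rb->count);
    return rb->data[(rb->head + RING_LEN - 1u - age) & (RING_LEN - 1u)];
}
```

Handling 8 samples per interrupt cuts the interrupt rate to one eighth of the sampling rate, which is the saving the text refers to.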
The IIR filter.
TI provides an IIR library for signal processing, together with a MATLAB script that
generates the coefficients of the filter.
The script eziir32.m is a tool for designing IIR filters with 32-bit coefficients for
TI's C28x Filter Library.
Figure 4.8 below shows how these coefficients are obtained:
*/
#define IIR32_COEFF {\
0,12501435,0,-2987997,2987997,\
-13537326,29729702,4594994,-9189987,4594994,\
-16060578,32499120,1025818115,-2051636230,1025818115}
4275781
#define IIR32_NBIQ 3
#define IIR32_QFMAT 24
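A Q-format value of 24 means that the 32-bit coefficients are fixed-point numbers with 24 fractional bits, so an integer q represents q / 2^24. The short sketch below shows this interpretation and a Q24 multiply; it is a host-side illustration with hypothetical names, not the TI library code, and it assumes an arithmetic right shift for negative values, as on typical compilers.

```c
#include <assert.h>
#include <stdint.h>

#define QFMAT 24  /* number of fractional bits, as in IIR32_QFMAT */

/* Interpret a Q24 integer as a real number. */
double q24_to_double(int32_t q)
{
    return (double)q / (double)((int64_t)1 << QFMAT);
}

/* Multiply two Q24 numbers using a 64-bit intermediate product.
 * ASSUMPTION: >> on a negative int64_t is an arithmetic shift. */
int32_t q24_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> QFMAT);
}
```

For instance, the coefficient 12501435 from the listing corresponds to roughly 0.745 in real terms.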
To avoid abrupt changes between frames, consecutive frames must overlap. In my
program, consecutive frames overlap by 48 samples.
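With an overlap between frames, each new frame starts (frame length - overlap) samples after the previous one. The helper functions below show the index arithmetic; the names are hypothetical, and the frame length of 128 used in the usage note is only an assumption for illustration, since the text fixes only the 48-sample overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complete frames that fit in `total` samples. */
size_t frame_count(size_t total, size_t frame_len, size_t overlap)
{
    assert(frame_len > overlap);
    if (total < frame_len) {
        return 0;
    }
    return (total - frame_len) / (frame_len - overlap) + 1;
}

/* Index of the first sample of frame number `index` (0-based). */
size_t frame_start(size_t index, size_t frame_len, size_t overlap)
{
    assert(frame_len > overlap);
    return index * (frame_len - overlap);
}
```

With a hypothetical frame length of 128 and the 48-sample overlap, the hop is 80 samples, so 1000 samples yield 11 frames and frame 2 starts at sample 160.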
ITL = 1191
ITU = 5955
The energy of frames with and without speech can be told apart as shown in the figure
below.
#define MEL_BANK 10
const Uint16 b0[] = {5, 1, 1224, 1688, 710};
const Uint16 b1[] = {7, 2, 312, 1290, 1822, 1007, 256};
const Uint16 b2[] = {8, 4, 178, 993, 1744, 1558, 907, 297};
const Uint16 b3[] = {9, 7, 442, 1093, 1703, 1723, 1180, 667, 179};
const Uint16 b4[] = {11, 10, 277, 820, 1333, 1821, 1714, 1271, 847, 440, 50};
const Uint16 b5[] = {12, 14, 286, 729, 1153, 1560, 1950, 1675, 1314,
                     965, 629, 304};
const Uint16 b6[] = {15, 19, 325, 686, 1035, 1371, 1696, 1990, 1685,
1389,
1103,
824,
553,
290,
1447,
621,
1710,
405,
10,
33};
1967,
315,
611,
1783,
1540,
897,
1302,
1070,
193};
1379,
1595,
1389,
1198,
1010,
217,
460,
1807,
698,
1986,
826,
930,
1782,
646,
469,
218,
416,
1584,
294,
124};
const Uint16 b9[] = {26, 40,
802,
990,
1955,
1790,
854,
706,
1174,
1628,
561,
14,
1354,
1531,
1468,
418,
1706,
1311,
1156,
276,
137};
611,
1876,
1004,
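The filter tables above appear to be stored in a packed form. The sketch below shows one way such a table could be applied to a magnitude spectrum. The layout is inferred from the intact tables and is an assumption: element 0 is taken as the total number of elements in the array, element 1 as the first FFT bin covered by the filter, and the remaining elements as weights scaled so that 2048 represents 1.0. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEL_WEIGHT_SHIFT 11  /* ASSUMPTION: weights are scaled by 2^11 = 2048 */

/* Apply one packed mel filter to a magnitude spectrum.
 * ASSUMED layout: filt[0] = total element count, filt[1] = first FFT bin,
 * filt[2..] = weights for consecutive bins. */
uint32_t apply_mel_filter(const uint16_t *filt, const uint16_t *mag,
                          size_t n_bins)
{
    assert(filt != NULL && mag != NULL && filt[0] >= 2);
    size_t n_weights = (size_t)filt[0] - 2u;
    size_t first = filt[1];
    assert(first + n_weights <= n_bins);
    uint64_t acc = 0;
    for (size_t i = 0; i < n_weights; ++i) {
        acc += (uint64_t)filt[2 + i] * mag[first + i];
    }
    return (uint32_t)(acc >> MEL_WEIGHT_SHIFT);
}
```

Storing only the non-zero weights and the starting bin keeps each filter short, which matters when the tables must fit in on-chip memory.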
These words are stored in the EEPROM.
III.
Results and remarks.
Results:
The DEMO program has four words preloaded in ROM: "một" (one), "hai" (two), "lên" (up),
and "xuống" (down). Experiments in a low-noise environment (indoors, with fan noise or
distant conversation) gave the following results:
Word            Result
Một (one)       100.00%
Hai (two)       100.00%
Lên (up)        70.00%
Xuống (down)    75.00%
Part:
Appendix.
Appendix A:
MATLAB program that computes the coefficients of the mel-spaced filterbank.
function m = melfb(p, n, fs)
% MELFB         Determine matrix for a mel-spaced filterbank
%
% Inputs:       p   number of filters in filterbank
%               n   length of fft
%               fs  sample rate in Hz
%
% Outputs:      x   a (sparse) matrix containing the filterbank amplitudes
%                   size(x) = [p, 1+floor(n/2)]
%
% Usage:        For example, to compute the mel-scale spectrum of a
%               column-vector signal s, with length n and sample rate fs:
%
%               f = fft(s);
%               m = melfb(p, n, fs);
%               n2 = 1 + floor(n/2);
%               z = m * abs(f(1:n2)).^2;
%
%               z would contain p samples of the desired mel-scale spectrum
%
%               To plot filterbanks e.g.:
%
%               plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
%               title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');
f0 = 700 / fs;
fn2 = floor(n/2);
lr = log(1 + 0.5/f0) / (p+1);
% convert to fft bin numbers with 0 for DC term
Appendix B.
Schematic of the hardware that interfaces with the eZdsp F2812.
Appendix C.
MATLAB program for sound recognition with the VQ algorithm.
function d = disteu(x, y)
% DISTEU Pairwise Euclidean distances between columns of two matrices
%
% Input:
%       x, y:   Two matrices in which each column is a data vector.
%
% Output:
%       d:      Element d(i,j) will be the Euclidean distance between two
%               column vectors X(:,i) and Y(:,j)
%
% Note:
%       The Euclidean distance D between two vectors X and Y is:
%       D = sum((x-y).^2).^0.5
[M, N] = size(x);
[M2, P] = size(y);
if (M ~= M2)
error('Matrix dimensions do not match.')
end
d = zeros(N, P);
if (N < P)
copies = zeros(1,P);
for n = 1:N
d(n,:) = sum((x(:, n+copies) - y) .^2, 1);
end
else
copies = zeros(1,N);
for p = 1:P
d(:,p) = sum((x - y(:, p+copies)) .^2, 1)';
end
end
d = d.^0.5;
function r = mfcc(s, fs)
% MFCC
%
% Inputs: s contains the signal to analyze
%         fs is the sampling rate of the signal
%
% Output: r contains the transformed signal
%
%
%%%%%%%%%%%%%%%%%%
% Mini-Project: An automatic speaker recognition system
%
% Responsible: Vladan Velisavljevic
% Authors: Christian Cornaz
%          Urs Hunkeler
m = 100;
n = 256;
l = length(s);
nbFrame = floor((l - n) / m) + 1;
for i = 1:n
for j = 1:nbFrame
M(i, j) = s(((j - 1) * m) + i);
end
end
h = hamming(n);
M2 = diag(h) * M;
for i = 1:nbFrame
frame(:,i) = fft(M2(:, i));
end
t = n / 2;
tmax = l / fs;
m = melfb(20, n, fs);
n2 = 1 + floor(n / 2);
z = m * abs(frame(1:n2, :)).^2;
r = dct(log(z));
function r = vqlbg(d,k)
% VQLBG Vector quantization using the Linde-Buzo-Gray algorithm
%
% Inputs: d contains training data vectors (one per column)
%         k is number of centroids required
%
% Output: r contains the result VQ codebook (k columns, one for each centroid)
%
%
%%%%%%%%%%%%%%%%%%
% Note:
%       Sound files in traindir are supposed to be:
%       s1.wav, s2.wav, ..., sn.wav
% Example:
%       >> code = train('C:\data\train\', 8);
k = 16;
for i = 1:n
% train a VQ codebook for each speaker
file = sprintf('%ss%d.wav', traindir, i);
disp(file);
[s, fs] = wavread(file);
v = mfcc(s, fs);
code{i} = vqlbg(v, k);
end
% Compute MFCC's
% Train VQ codebook
% Compute MFCC's
distmin = inf;
k1 = 0;
for l = 1:length(code)
% each trained codebook, compute distortion
d = disteu(v, code{l});
dist = sum(min(d,[],2)) / size(d,1);
if dist < distmin
distmin = dist;
k1 = l;
end
end
msg = sprintf('Speaker %d matches with speaker %d', k, k1);
disp(msg);
end