
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING

DESIGN OF A SPEECH RECOGNIZER BASED ON THE TMS320C2812 DSP PLATFORM

STUDENT: NGUYEN QUOC DINH
ADVISOR: DR. HUYNH THAI HOANG

JANUARY 2009

Student: Nguyen Quoc Dinh.
Advisor: Dr. Huynh Thai Hoang.

CONTENTS

Part 1: Introduction.
I. Some applications of speech recognition.
II. Difficulties and limitations of Automatic Speech Recognition (ASR) systems to date.
III. Objectives of the thesis.

Part 2: Overview of speech recognition and selection of the algorithm to embed in the DSP.
I. Overview of ASR systems.
1. Operating principle of an ASR system.
2. Text-dependent and text-independent ASR systems.
3. Feature extraction.
4. Training the features.
II. Selecting the algorithm to embed in the DSP.
1. Feature extraction procedure.
2. Training and recognition.
3. Testing in MATLAB.

Part 3: The embedded system.
I. eZdsp board.
II. TMS320F2812.
III. Memory partitioning for standalone applications.
IV. Code Composer Studio.

Part 4: Algorithms embedded in the DSP.
I. Power supply, interface and sampling design.
1. Signal conversion.
2. EEPROM interface.
3. LCD interface.
4. Power supply.
II. DSP program.
1. Acoustic vector feature extraction.
a) Sampling and IIR high-pass filtering to remove the offset.
b) Frame blocking. Energy-based word detection.
c) Windowing and FFT.
d) Mel-frequency wrapping.
e) Cepstrum.
2. Training with the LBG algorithm.
3. Flowchart of the whole program.
III. Results and remarks.

Appendix.

References.


Part 1:

Introduction.
I. Some applications of speech recognition systems.
Automatic Speech Recognition (ASR) systems have valuable applications in every area of life. If applied successfully, they would revolutionize the human-machine interface (HMI), with applications spanning industry, security and entertainment.
In control:
Systems with a small vocabulary and isolated-word input can be used in relatively simple applications to make data entry more efficient (speaking is about twice as fast as typing): in manufacturing (sorting tasks), where the hands are unavailable (for example in a dark room or a cockpit), in remote control of equipment, robots and children's toys, and in miniaturized devices that must do without a keypad. Where feasible, speech is also an effective way for visually impaired people to operate devices. In general, these are specialized tasks that tend to restrict both the vocabulary and the content of the message. This thesis focuses on a recognizer with a small vocabulary (fewer than 10 words) for control systems with a fixed command set.
The practical benefits of such a system are enormous: computers would no longer need keyboards, control systems would no longer need complicated control panels, and telephones would no longer need dials. A chip in front of a car driver would answer requests for directions, and a calendar at home would remind you of unfinished tasks when asked. This can be regarded as a breakthrough in every area of daily life.
Systems of this kind can already be found in modern mobile phones such as Apple's iPhone and Nokia's Nseries.
In signal conversion:
With a complete speech recognition system, a journalist would not need to retype an interview before it is published in a newspaper. In live conferences or remote discussions, the minutes would be printed automatically without a secretary to draft them. The speech recognition system converts speech into text automatically.
In conversations hampered by a language barrier or by sensitivities about national pride, a text translation system combined with two speech recognition systems would convert back and forth between the two languages and let the conversation proceed normally and naturally. Such a direct language-conversion system is very useful at large conferences attended by many countries and peoples.
A system of this kind demands very high recognition capability, so its use is still limited at present.
In identification:
Speech recognition combined with speech synthesis is also applied to speaker identification. A voice-password system identifies a person by voice, for example to withdraw money from a bank or carry out other transactions without checking a signature or other documents, where personal identity must be kept confidential. It can also be used in automatic locks whose key is the voice.
ASR systems of this kind are already in practical use.
See http://en.wikipedia.org/wiki/Speech_recognition for more practical applications of ASR systems.

II. Difficulties and limitations of ASR systems to date:

The use of ASR systems is still limited because the object they deal with, speech or sound, is inherently unstable. The main difficulties are the following.
Variability of the speaker's pronunciation:
- The voice changes over time and with age.
- State of health. A healthy person pronounces words quite differently from when he or she is ill, for example with a cold or flu.
- Speaking rate.
- Even for one person within a short period, repeated utterances of the same word can differ.
Influence of the environment:
- Noise from the surroundings. For example, a person speaking in a quiet room is easier to understand than one speaking in the street.
- The recording handset can differ from one situation to another.
- The distance from the speaker's mouth to the handset.
The ideal condition for speech recognition, and for sound recognition in general, is a voice that stays the same during both training and recognition. Each person's voice is unique and does not coincide with anyone else's. For these reasons, recognizing sound and speech remains a very difficult task.

III. Objectives of the thesis.

- Write a program embedded in the DSP that performs speech recognition, selecting an algorithm that fits the resources of the hardware.
- Because the focus is on the algorithm, no additional demonstration hardware (for example a voice-controlled car) is built. In this thesis the recognition result is simply shown on an LCD and on LEDs.
- Evaluate what can be achieved.

Part 2:

OVERVIEW OF SPEECH RECOGNITION
& SELECTION OF THE ALGORITHM TO EMBED IN THE DSP
I. Overview of ASR systems.
1. Operating principle of an ASR system.
The general principles below are defined for text-independent speaker identification systems. Text-dependent speaker identification systems are classified in a similar way. See [1].
ASR systems can be divided into two classes: identification (Speaker Identification) and verification (Speaker Verification).
Speaker Identification: the system decides which of the people who trained it is currently speaking to it.
Speaker Verification: the system accepts or rejects a claimed identity, that is, it decides whether the person who has just spoken is one of the registered speakers.
Figure 2.1 shows the basic structure of the two systems.

[Figure 2.1 (a) Speaker Identification: input speech -> feature extraction -> similarity against the reference models (Speaker #1 ... Speaker #N) -> maximum selection -> identification result (speaker ID).]
[Figure 2.1 (b) Speaker Verification: input speech and claimed speaker ID (#M) -> feature extraction -> similarity against the reference model (Speaker #M) -> decision against a threshold -> verification result (accept/reject).]

Figure 2.1. Basic structures of ASR systems. Adapted from [1].
As the two diagrams show, at the highest level every ASR system consists of two main modules: feature extraction and feature matching.
Feature extraction: extracts from the input speech the data that best characterize the speaker.
Feature matching: identifies the person speaking to the system by comparing his or her features with those of the trained speakers.
Both modules are described in more detail in sections 3 and 4.


2. Text-dependent and text-independent ASR systems.

As the name suggests, text-dependent ASR depends on the word that is spoken. It is typically found in control systems or services that must be accessible to many people.
Text-dependent ASR is usually based on template matching. In this method the spoken word is converted into a sequence of feature vectors, usually short-time spectral features. The time sequences of the input speech and of the reference template (from the trained words) are aligned with the Dynamic Time Warping (DTW) algorithm. The result is their similarity, accumulated from the beginning to the end of the utterance.
The Hidden Markov Model (HMM) is a more effective method than DTW. HMM can be regarded as an extension of DTW and is used in large systems.
Text-independent ASR recognizes who is speaking rather than which word is spoken, so it is used mainly for speaker recognition.
One of the common methods for such systems is Vector Quantization (VQ), used with a small number of templates. The ergodic HMM has been introduced as a more effective method for recognizing a larger number of templates.

3. Feature extraction.
Extracting the features of a pattern is an important part of any recognition system. Ideally, different objects have one or more distinguishing features. The more the features differ between objects, the more accurate the recognition.
Recognition is based on these features, using either a single feature or a combination of several. Current ASR systems usually use only one feature of the audio signal.
The main feature extraction methods to date include Linear Prediction Coding (LPC), Mel Frequency Cepstrum Coefficients (MFCC), Principal Components Analysis (PCA) and others.
Linear Prediction Coding.
Figure 2.2. Block diagram of the LPC method.

A complete ASR system based on LPC is described fairly thoroughly in [4] and at http://www.owlnet.rice.edu/~elec532/PROJECTS98/speech/cepstrum/cepstrum.html.
Mel Frequency Cepstrum Coefficients.
[Block diagram: continuous speech -> Frame Blocking -> (frame) -> Windowing -> FFT -> (spectrum) -> Mel-frequency Wrapping -> (mel spectrum) -> Cepstrum -> (mel cepstrum)]

Figure 2.3. Block diagram of the MFCC method.


A complete ASR system based on MFCC is described in detail, with MATLAB code, at http://www.ifp.uiuc.edu/~minhdo/teaching/speaker_recognition

4. Training and recognizing the features.

Once the features have been produced, by whatever method, they are used for training, to build the database, and later for recognition.


The main techniques used for speech recognition are:
Dynamic Time Warping (DTW).
This method was popular in the 1970s and 1980s.
An ASR system based on DTW is described in detail, with MATLAB code, at http://www.ee.columbia.edu/~dpwe/resources/matlab/dtw/.
Vector Quantization (VQ).
This method gives better results than DTW for small vocabularies (about 20 words), which makes it a method of interest here.
An ASR system built in MATLAB can be found in [1].
Hidden Markov Modeling / Artificial Neural Network (HMM/ANN).
HMM is the most recent technique and is used in systems that require very large vocabularies, up to thousands of words.
The processing flow of such an ASR system is shown below.

Figure 2.4. Block diagram of an ASR system using the HMM technique.

Details on building such a system can be found in [5].

II. Selecting the algorithm to embed in the DSP.

The algorithm was chosen so that the program is not too heavy for the available DSP chip, yet still recognizes a minimum number of words.
After studying the algorithms in common use, I chose MFCC for feature extraction and VQ for training and recognition, because they are effective and do not require a large amount of computation.

1. Feature extraction procedure.

The procedure follows the model below:

Figure 2.5. Diagram of the feature extraction process to be embedded in the DSP.


1.a) Record / Sampling.

The human ear is most sensitive to frequencies between about 100 Hz and 5 kHz, and this band usually carries most of the energy of a speech signal.
To capture the main energy of human speech, the system samples the audio at 12 kHz, so the acquired signal contains frequencies up to 6 kHz.
1.b) Frame Blocking.

A speech signal varies slowly with time, a property called quasi-stationarity, as Figure 2.6 shows. Over a sufficiently short interval, about 5 to 100 ms, its characteristics are nearly stationary; this can be seen in the lower plot of Figure 2.6. Over a long interval, however, the characteristics change to reflect the different sounds being spoken.

Figure 2.6. An example of a speech signal.
The upper plot shows the signal over a long interval; the lower plot shows it over a short interval.

For this reason, short-time spectral analysis is commonly used to analyze speech signals. Since a spoken word can last up to 1 s, the acquired signal must be divided into short frames, each about 5 to 10 ms long.
To avoid abrupt changes between frames, consecutive frames overlap. The first frame contains N samples; the next frame also contains N samples, of which only M (M < N) are new, while its first N - M samples are the last N - M samples of the previous frame. The same applies to all following frames.
N is usually chosen as 128, 256 or 512 to suit the FFT computed later, with M ~ N/3.
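
The overlap rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DSP implementation of the thesis: the function name `frame_blocking` and the defaults `n=256`, `m=85` (M ~ N/3) are hypothetical, and trailing samples that do not fill a whole frame are simply dropped.

```python
def frame_blocking(samples, n=256, m=85):
    """Split samples into frames of n samples each.

    Consecutive frames start m samples apart, so each new frame
    contributes m new samples and repeats the last n - m samples
    of the previous frame. Leftover samples are discarded
    (an assumption of this sketch).
    """
    frames = []
    start = 0
    while start + n <= len(samples):
        frames.append(samples[start:start + n])
        start += m
    return frames


# Small usage example: frames of 4 samples, 2 new samples per frame.
demo = frame_blocking(list(range(10)), n=4, m=2)
```

With `n=4` and `m=2`, every frame shares its first two samples with the end of the previous one, which is the overlap the text describes.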
1.c) Word detection.

To reduce the amount of computation and improve accuracy, the signal is processed only when speech is present. This task is called word detection (end point detection).
The most common method uses the energy of the speech signal (ESS) combined with the zero crossing rate (ZCR); see [2]. Other methods include Teager's energy, see [3], and trained neural networks.
The ESS method is chosen here because it is simple and widely used.
The ESS method assumes that the acquired energy is much larger when speech is present than during silence. The presence of speech is therefore decided from the energy of the acquired signal.
The energy of frame n is computed as

$$E_n = \sum_{i=1}^{L} x_{n,i}^2$$

where $L$ is the frame length and $x_{n,i}$ is sample $i$ of frame $n$.
Figure 2.7 shows the energy of the spoken word "MOT" (one).

Figure 2.7. An example of the energy of a sound.

The ESS-based word detection algorithm is described in Figure 2.8, adapted from [2].

Figure 2.8. Flowchart of energy-based word detection.

The thresholds ITL and ITU are determined as follows:
- Record the loudest sound and compute IMX = max( E(n) ).
- Record the values E(n) during silence, when only noise is present, and compute IMN = min( E(n) ).

- Compute
I1 = 0.03 * ( IMX - IMN ) + IMN    [ 3% of the energy range above the minimum ]
I2 = 4 * IMN    [ 4 times the minimum energy ]
- Compute the energy thresholds ITL and ITU:
ITL = min( I1, I2 )
ITU = 5 * ITL
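
The threshold rules can be written as a short Python sketch. It assumes that I2 is four times the minimum (silence) energy, as the bracketed note in the text states, and that the frame energy is the sum of squared samples; the function names and the two input lists are hypothetical and only illustrate the arithmetic.

```python
def frame_energy(frame):
    """Energy of one frame, taken here as the sum of squared samples."""
    return sum(x * x for x in frame)


def energy_thresholds(silence_energies, speech_energies):
    """Return (ITL, ITU) from frame energies recorded in silence and in speech."""
    imx = max(speech_energies)       # IMX: largest energy of the loudest sound
    imn = min(silence_energies)      # IMN: smallest energy during silence
    i1 = 0.03 * (imx - imn) + imn    # 3% of the energy range above the minimum
    i2 = 4.0 * imn                   # 4 times the minimum energy
    itl = min(i1, i2)                # lower threshold
    itu = 5.0 * itl                  # upper threshold
    return itl, itu


# Usage example with made-up energies.
itl, itu = energy_thresholds([2.0, 3.0], [100.0, 50.0])
```

Here I1 = 0.03 * 98 + 2 = 4.94 is smaller than I2 = 8, so ITL is about 4.94 and ITU about 24.7.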

This algorithm gives the following result:

Figure 2.9. An example of a word extracted with ESS.

The figure shows that the result is acceptable. Once the signal has been classified as speech or not, the detected utterance is a sequence of frames that is passed on to spectral analysis. Before that, each frame is windowed to reduce the edge effects of framing.
1.d) Windowing.

The next step is to window each frame in order to reduce the discontinuity between its first and last samples. Windowing minimizes spectral distortion by tapering the signal toward the beginning and end of each frame.
Let the window be w(n), 0 ≤ n ≤ N - 1, where N is the number of samples in each frame. The windowed signal is

$$y_l(n) = x_l(n)\,w(n), \quad 0 \le n \le N-1$$

The Hamming window is commonly used. It has the form

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

Figure 2.10. A 128-point Hamming window.

After Hamming windowing the signal is attenuated at both ends, as shown in Figure 2.11.
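
As an illustration, the Hamming window and its application to a frame can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes at least two points per window so that N - 1 is not zero.

```python
import math


def hamming(n_points):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by the window sample by sample: y(n) = x(n) * w(n)."""
    return [x * w for x, w in zip(frame, window)]


# Usage example: a 128-point window applied to a constant frame.
w = hamming(128)
tapered = apply_window([1.0] * 128, w)
```

The window equals 0.08 at both ends and approaches 1 in the middle, which is why the windowed frame shrinks toward its edges.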

Figure 2.11. Effect of the Hamming window on the signal. The signal is attenuated at both ends.

1.e) FFT.

This step converts the time-domain signal into the frequency domain. The FFT is a fast algorithm for the discrete Fourier transform (DFT).
For N samples the transform is

$$X_k = \sum_{n=0}^{N-1} x_n e^{-j 2\pi k n / N}, \quad k = 0, 1, 2, \ldots, N-1$$

The values $X_k$ obtained from this formula are complex numbers, but usually only their magnitudes are of interest.
The sequence $X_k$ can be split into two parts: the positive frequencies $0 \le f < F_s/2$ correspond to $0 \le k \le N/2 - 1$, and the negative frequencies $-F_s/2 < f < 0$ correspond to $N/2 + 1 \le k \le N - 1$.
The result of this step is shown below; it also shows the effect of windowing the frame.
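
For illustration, the magnitude spectrum can be computed directly from the DFT definition, as in the Python sketch below. This is a slow O(N^2) evaluation meant only to make the formula concrete; it is not the FFT routine used on the DSP. The function name is hypothetical, and returning the bins k = 0 .. N/2 is a choice made for this sketch.

```python
import cmath
import math


def dft_magnitude(frame):
    """Magnitudes |X_k| for k = 0 .. N/2, evaluated from the DFT definition."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(x_k))
    return mags


# Usage example: a cosine with exactly 2 cycles in 8 samples peaks at k = 2.
tone = [math.cos(2.0 * math.pi * 2 * i / 8) for i in range(8)]
spectrum = dft_magnitude(tone)
```

For a real cosine of unit amplitude that fits the frame exactly, the peak magnitude is N/2, here 4, and the other bins are numerically zero.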

Figure 2.12. Spectrum magnitude after the FFT. Upper plot: signal without windowing. Lower plot: signal windowed with a Hamming window.

The result of this step is a sequence of magnitude spectra of the consecutive frames.

1.f) Mel-frequency wrapping.

Psychophysical studies of human hearing show that the ear's response to speech does not follow a linear frequency scale. To follow this subjective perception, each sound is mapped onto a more suitable scale, the mel frequency scale.
The mel scale is linear below 1 kHz and logarithmic above 1 kHz. The mel frequency is computed as

$$\mathrm{mel}(f) = 2595\,\log_{10}\left(1 + \frac{f}{700}\right)$$
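
The mel mapping is simple to check numerically. The Python sketch below implements the formula with a base-10 logarithm and adds its algebraic inverse; the inverse and both function names are additions for illustration and do not appear in the original text.

```python
import math


def hz_to_mel(f_hz):
    """mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Algebraic inverse of hz_to_mel (added here for illustration)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Usage example: 1 kHz maps to roughly 1000 mel.
mel_1k = hz_to_mel(1000.0)
```

The inverse is convenient when placing filterbank centers at equal steps on the mel axis and converting them back to Hz.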
In practice the mel spectrum is computed with a filterbank whose filters are spaced uniformly on the mel scale, as in Figure 2.13 below. The filters are triangular band-pass filters whose spacing and bandwidth are constant on the mel scale.

Figure 2.13. Mel-spaced filterbank with 20 mel spectrum coefficients.

The program that computes these coefficients is described in Appendix A.
To obtain the mel spectrum, the magnitude spectrum from the FFT step is passed through the mel filters:

$$\tilde{S}_l = \sum_{k=0}^{N/2} |X_k|\, M_l(k), \quad l = 0, 1, \ldots, L-1$$

where $M_l(k)$ is the response of mel filter $l$ at frequency bin $k$.
With L = 20, the mel spectrum of about 20 input frames looks as follows:

Figure 2.14. 20-point mel spectrum of 20 consecutive frames.

The result up to this step is a matrix of size [number of mel spectrum coefficients] x [number of frames]. These coefficients are passed to the final step, which produces the acoustic vectors that characterize the voice.

1.g) Cepstrum.

This step converts the logarithm of the mel spectrum back to the time domain. The result is called the mel frequency cepstrum coefficients (MFCC). The MFCC represent the spectral properties of the signal in each frame.
The transform used here is the DCT (discrete cosine transform) rather than the IFFT, for the following reasons:
- The signal is real.
- The IFFT operates on complex values, whereas the DCT operates on real values.

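- The DCT plays the same role as the IFFT but is more efficient because it uses only real arithmetic.

There are several definitions of the DCT; see http://en.wikipedia.org/wiki/Discrete_Cosine_Transform. The most common one, used here, is the DCT-II:

$$\tilde{c}_n = \sum_{k=1}^{K} \left(\log \tilde{S}_k\right) \cos\left[n\left(k - \frac{1}{2}\right)\frac{\pi}{K}\right], \quad n = 0, 1, \ldots, K-1$$

where $\tilde{S}_k$ are the mel spectrum coefficients and $K$ is their number.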
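
The cepstrum step can be sketched in Python as a direct evaluation of the DCT-II sum over the log mel spectrum. The function name is hypothetical, and the natural logarithm is an assumption of this sketch, since the text does not state the base; a different base only scales the coefficients.

```python
import math


def mel_cepstrum(mel_spectrum):
    """c_n = sum_{k=1..K} log(S_k) * cos(n * (k - 1/2) * pi / K), n = 0..K-1.

    mel_spectrum holds K strictly positive mel spectrum coefficients.
    The natural logarithm is assumed here.
    """
    k_total = len(mel_spectrum)
    logs = [math.log(s) for s in mel_spectrum]
    return [
        sum(logs[k - 1] * math.cos(n * (k - 0.5) * math.pi / k_total)
            for k in range(1, k_total + 1))
        for n in range(k_total)
    ]


# Usage example: a flat mel spectrum has energy only in coefficient 0.
flat = mel_cepstrum([math.e] * 4)
```

A flat spectrum carries no spectral shape, so every coefficient except the first comes out as zero; the first coefficient only reflects the overall level.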

The output of this step is called the acoustic vector and carries the characteristics of the recorded voice. The acoustic vector of one frame may look as follows:

Figure 2.15. An example of the acoustic vector of one frame. The cepstrum has 20 points.

Figure 2.16. Acoustic vectors of 20 consecutive frames.

At this point the features of the recorded speech are available as a sequence of acoustic vectors, one per frame. These acoustic vectors are used for training and for recognition.

2. Training and recognition.

As mentioned in section I, the main techniques used so far for speech recognition are DTW, HMM and VQ. VQ is chosen here because it is simple to train and highly effective.
VQ maps the vectors of a large space onto a finite number of vectors in the same space. Each region of the space is called a cluster and is represented by its center, called a codeword. The set of codewords is called a codebook.
Figure 2.17 below illustrates the principle. Only two voices and two dimensions of the acoustic vector space are shown. Circles mark the acoustic vectors of the first speaker,
and rectangles mark the acoustic vectors of the second speaker. During training, a clustering algorithm (described later) is used to build a VQ codebook for each word.

[Figure: samples and centroids of Speaker 1 and Speaker 2 in a two-dimensional space, with the VQ distortion between a sample and its nearest centroid marked.]

Figure 2.17. Conceptual diagram of the information contained in a codebook.

One sound can be distinguished from another by the positions of its centroids.
For recognition, the Euclidean distance from each acoustic vector to the nearest codeword of each trained codebook (stored in the database) is computed. The utterance is assigned to the VQ codebook for which the total Euclidean distance is smallest, that is, to the sound that produced that codebook.
2.a) Euclidean distance.
The Euclidean distance between two n-dimensional vectors a(a1, a2, ..., an) and b(b1, b2, ..., bn) is

$$l = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2 + \ldots + (b_n - a_n)^2}$$

2.b) Clustering the training vectors.

The purpose of clustering is to build one VQ codebook for each recorded word from the acoustic vectors produced in the previous step. The LBG algorithm (Linde, Buzo & Gray, 1980) is a well-known algorithm for clustering L training vectors into M codebook vectors.

The LBG algorithm is summarized as follows, adapted from [1]:

1. Initialize a 1-vector codebook: the centroid of all acoustic vectors obtained in the previous step.
2. Double the size of the codebook by splitting each current codeword $y_n$ according to the rule

$$y_n^{+} = y_n (1 + \varepsilon)$$
$$y_n^{-} = y_n (1 - \varepsilon)$$

for n = 1 ... (current codebook size), where $\varepsilon$ is the splitting parameter, for example $\varepsilon = 0.01$.
3. Nearest Neighbor Search: for each training vector (here, each acoustic vector), find the closest codeword in the current codebook (by Euclidean distance) and assign the vector to the corresponding cell.
4. Centroid Update: update the codeword of each cell from step 3 using the centroid of the acoustic vectors assigned to that cell, most commonly their arithmetic mean.
5. Repeat steps 3 and 4 until the average distance falls below a preset threshold.
6. Repeat steps 2, 3 and 4 until the codebook reaches the desired size.
Figure 2.18 below shows the flowchart of the LBG algorithm.
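
The six steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the DSP code of the thesis: the names `lbg`, `centroid` and `squared_distance` are hypothetical, the target size is assumed to be a power of two, an empty cell keeps its old codeword, and the inner loop stops when the relative drop in average distortion is at most `tol`.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Arithmetic mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def lbg(vectors, size, eps=0.01, tol=1e-3):
    """Train a codebook of `size` codewords (a power of two) with LBG."""
    codebook = [centroid(vectors)]                       # step 1
    while len(codebook) < size:
        # Step 2: split every codeword y into y(1 + eps) and y(1 - eps).
        codebook = [[c * (1.0 + s * eps) for c in cw]
                    for cw in codebook for s in (1, -1)]
        prev = None
        while True:
            # Step 3: nearest-neighbor search, filling one cell per codeword.
            cells = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                dists = [squared_distance(v, cw) for cw in codebook]
                best = min(range(len(codebook)), key=dists.__getitem__)
                cells[best].append(v)
                total += dists[best]
            # Step 4: centroid update (an empty cell keeps its codeword).
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            # Step 5: stop when the distortion no longer improves enough.
            distortion = total / len(vectors)
            if prev is not None and prev - distortion <= tol * prev:
                break
            prev = distortion
    return codebook                                      # step 6 via outer loop


# Usage example: two well-separated clusters yield two codewords.
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
        [10.0, 10.0], [10.2, 9.8], [9.8, 10.2]]
book = sorted(lbg(data, 2))
```

The multiplicative split leaves an all-zero codeword unchanged, so a real implementation may prefer an additive perturbation; the sketch keeps the rule exactly as stated in step 2.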

[Flowchart: Find centroid -> test m < M (No: Stop; Yes: continue) -> Split each centroid, m = 2*m -> Cluster vectors -> Find centroids -> Compute D (distortion) -> test (D' - D)/D < ε (No: set D' = D and return to Cluster vectors; Yes: return to the test m < M).]

Figure 2.18. Flowchart of the LBG algorithm.

Where:
M: the desired codebook size.
Cluster vectors: step (3), the nearest neighbor search.
Find centroids: step (4).
Figure 2.19 shows the result of the LBG algorithm. The first plot shows the acoustic vectors obtained for the word HONEY; the second shows the codebook (16 vectors) produced by LBG. Plotted this way, the position of the codebook relative to the acoustic vectors cannot be seen. However, the codebook vectors are 20-dimensional (the size of the mel filterbank) and are hard to display in a multidimensional space, so the plots are meant mainly to show the result.
Figure 2.19. VQ codebook obtained from the corresponding acoustic vectors with the LBG algorithm.

The larger the codebook, the higher the recognition rate. According to reference [6], this effect is shown in two tables; these results have not been re-verified here but may be taken as reference data.

The result of training with the LBG algorithm is therefore a set of codebooks, each consisting of a set of codewords. These vectors are used in the recognition process.

2.c) Word recognition.

As introduced in section I, there are two basic kinds of ASR system, Speaker Identification and Speaker Verification. The system here is built on the Speaker Identification approach.
Recognition proceeds in the following steps:
- For each utterance, extract its features, the acoustic vectors described above.
- Compute the distance between these acoustic vectors and the codebook of each trained word: for every acoustic vector find the smallest Euclidean distance to the codewords of that codebook, then average these distances. This gives one distance for each trained codebook.
- Find the smallest of these distances and assign the utterance to the corresponding trained word.
Together with feature extraction and LBG training, this completes a fairly full description of speech recognition by Vector Quantization (VQ).
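
The recognition steps can be sketched in Python as below. The function names and the dictionary that maps each trained word to its codebook are hypothetical; the sketch uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def average_distortion(vectors, codebook):
    """Mean, over all acoustic vectors, of the distance to the nearest codeword."""
    return sum(min(math.dist(v, cw) for cw in codebook)
               for v in vectors) / len(vectors)


def recognize(vectors, codebooks):
    """Return the trained word whose codebook gives the smallest average distortion.

    codebooks maps each trained word to its list of codewords.
    """
    return min(codebooks,
               key=lambda word: average_distortion(vectors, codebooks[word]))


# Usage example with two tiny two-dimensional codebooks.
trained = {
    "one": [[0.0, 0.0], [1.0, 1.0]],
    "two": [[10.0, 10.0], [11.0, 11.0]],
}
result = recognize([[0.1, 0.2], [0.9, 1.1]], trained)
```

Averaging rather than summing makes the score independent of the number of frames in the utterance, which matters when spoken words differ in length.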

3. Testing the algorithm in MATLAB.

The procedure described in sections 1 and 2 was tested in MATLAB 7.0. The test is not real time: the test samples and training samples were recorded to files.
The program first trains on the training samples and obtains a codebook for each recorded word. It then recognizes the words in the test samples. The results give a preliminary estimate of the accuracy of the algorithm and of the influence of its parameters.
Recording the test and training samples.
The samples were recorded with the All2WAV Recorder program. For example, in one experiment I recorded the words for the numbers 1 to 10, each spoken twice, once for training and once for testing.

Figure 2.20. Recording program used to test the algorithm on a PC.

Results obtained in MATLAB.
When tested with the 9 words for the numbers 1 to 9, the program distinguished all 9 words.

Figure 2.21. Results obtained with MATLAB.

Part 3:

The embedded system.
I. eZdsp board
This thesis is built on a Texas Instruments DSP. The board used is the eZdsp from Spectrum Digital Incorporated; see http://c2000.spectrumdigital.com/ezf2812.

Figure 3.1. The eZdsp TMS320F2812 board.

This kit is designed specifically for motor control. The eZdsp F2812 board has the following features:
TMS320F2812 digital signal processor.
150 MIPS operating speed.
18K words of on-chip RAM.
128K words of on-chip Flash memory.
64K words of off-chip SARAM memory.
Onboard IEEE 1149.1 JTAG controller.
Onboard IEEE 1149.1 JTAG emulation connector.
Figure 3.2. Block diagram of the eZdsp TMS320F2812.

The memory structure of the board is as follows:

Figure 3.3. Memory organization of the board.

For more information and support tools for the board, see http://c2000.spectrumdigital.com/ezf2812 or http://www.ti.com.

II. TMS320F2812.
The TMS320F2812 is the processor of the eZdsp F2812 board. The following is an overview of this DSP; more details are given in TI's Application Reports.
The C28x DSP family is the newest member of the TMS320C2000 DSP line. C28x code is compatible with the 24x/240x DSP family. The 32 x 32-bit MAC capability of the C28x and its 64-bit processing capability make it a candidate for applications that would otherwise require a floating-point controller.

1. Memory Bus (Harvard Bus Architecture)

As in other DSP chips, multiple buses move data between the memories, the peripherals and the CPU. The C28x memory architecture contains a program read bus and data read and write buses. The program read bus has 22 address lines and 32 data lines. The 32 data lines allow single-cycle 32-bit operations.
The multiple-bus architecture, known as the Harvard bus architecture, lets the C28x fetch an instruction, read a data value and write a data value in a single cycle. All peripherals and memories attached to the memory bus prioritize memory accesses.

2. Real-Time JTAG and Analysis

The F281x and C281x implement the standard IEEE 1149.1 JTAG interface. In addition, they support a real-time mode in which memory, peripheral and register locations can be modified while the processor is running. This real-time capability is implemented in the CPU hardware, a distinctive feature of the F281x and C281x, so no software monitor is required.

3. External Interface.
The asynchronous interface has 19 address lines, 16 data lines and 3 chip-select lines. The chip-select lines are mapped to five external zones: Zones 0, 1, 2, 6 and 7.
Each zone can be programmed with its own number of wait states and its own strobe signal setup and hold timing.

4. Flash.
The F2812 contains 128K x 16 of Flash memory, divided into four 8K x 16 sectors and six 16K x 16 sectors.
Flash performance can be improved by enabling the flash pipeline mode in the Flash control registers.

5. M0, M1 SARAMs.
Each of these blocks contains 1K x 16 of RAM and can hold either program or data.

6. L0, L1, H0 SARAMs.

The F2812 contains 16K x 16 of single-access RAM, divided into three blocks: L0, L1 and H0. Each block can hold either data or program.

7. Boot ROM.
The Boot ROM contains the boot-loading software, which runs after the CPU is reset. It checks several GPIO pins to decide which boot mode to use.

8. Oscillator and PLL.
The F2812 can be clocked by an external oscillator or by a crystal attached to the chip. A PLL provides up to 10 ratio settings for this clock. The PLL ratio can be changed while the program is running, which lets the program lower the operating frequency when it needs to save power.

9. GPIO Multiplexer.
Most peripheral signals are multiplexed with general-purpose I/O. Registers select whether a pin acts as GPIO or as a peripheral signal.

10. 32-bit CPU timers (0, 1, 2)

Timer 0 is for general use.
Timers 1 and 2 are reserved for the DSP/BIOS real-time OS.

11. Serial peripherals

The F2812 supports many peripheral interfaces, for compatibility with current MCUs:
eCAN: supports 32 mailboxes and time stamping of messages; compatible with CAN 2.0B.
McBSP: the multichannel buffered serial port, used to connect to E1/T1 lines, phone-quality codecs for modem applications, or high-quality stereo audio DAC devices.
SPI: typically used to connect the DSP to external peripherals or other processors.
SCI: equivalent to a UART.

12. On-chip ADC.

The F2812 ADC module has 16 channels. It can be configured as two independent 8-channel modules serving Event Managers A and B, or the two 8-channel modules can be cascaded to form one 16-channel module.
Main features of the ADC module:
- One 12-bit ADC core with two built-in sample-and-hold circuits.
- Simultaneous or sequential sampling modes.
- Analog input: 0 V to 3 V.
- ADC conversion clock of 25 MHz, corresponding to 12.5 MSPS.
- 16 input channels and 16 result registers. The result is computed as follows:

Several sources can trigger an ADC conversion:
- S/W: software immediate start.
- EVA: Event Manager A.
- EVB: Event Manager B.
- External pin.
See the ADC user manual spru060c for more complete information.

III. Memory partitioning for the eZdsp F2812 in standalone applications.

On the eZdsp F2812 board the program can be held in RAM, either on-chip or off-chip, to reach the highest operating speed (150 MIPS from on-chip RAM). In this case both data and program reside in RAM.
The program can also be held in Flash (F devices) or ROM (C devices). In this configuration, however, the operating speed drops to about 120 to 130 MIPS.
In a standalone system the DSP is not connected to a host, so the program must reside in Flash or ROM. To reach the highest speed for time-critical functions, those parts of the program can be copied from Flash into RAM and executed there.
In my program this copy method is applied to the functions with strict real-time requirements: acquiring data from the ADC, extracting the acoustic vector features, training and recognition.
Application Report SPRA958, "Running an Application from Internal Flash Memory on the TMS320F281x DSP", gives more information on this method.
The linker command file that partitions the memory for the program in this thesis follows.

MEMORY
{
PAGE 0: /* Program Memory */

Trang - 39 / 82 -

SVTT: Nguyn Quc nh.

GVHD: TS. Hunh Thi Hong.

/* Memory (RAM/FLASH/OTP) blocks can be moved to PAGE1 for data allocation */


ZONE0
: origin = 0x002000, length = 0x002000
/* XINTF zone 0 */
ZONE1
: origin = 0x004000, length = 0x002000 /* XINTF zone 1 */
RAML0
: origin = 0x008000, length = 0x001000 /* on-chip RAM block L0 */
ZONE2
: origin = 0x080000, length = 0x080000 /* XINTF zone 2 */
OTP
: origin = 0x3D7800, length = 0x000800
/* on-chip OTP */
FLASHJ
: origin = 0x3D8000, length = 0x002000 /* on-chip FLASH */
FLASHI : origin = 0x3DA000, length = 0x002000 /* on-chip FLASH */
FLASHH
: origin = 0x3DC000, length = 0x004000 /* on-chip FLASH */
FLASHG
: origin = 0x3E0000, length = 0x004000 /* on-chip FLASH */
FLASHF
: origin = 0x3E4000, length = 0x004000 /* on-chip FLASH */
FLASHE
: origin = 0x3E8000, length = 0x004000 /* on-chip FLASH */
FLASHD
: origin = 0x3EC000, length = 0x004000 /* on-chip FLASH */
FLASHC
: origin = 0x3F0000, length = 0x004000 /* on-chip FLASH */
FLASHA : origin = 0x3F6000, length = 0x001F80 /* on-chip FLASH */
CSM_RSVD : origin = 0x3F7F80, length = 0x000076 /* Part of FLASHA. Program with all
0x0000 when CSM is in use. */
BEGIN
: origin = 0x3F7FF6, length = 0x000002 /* Part of FLASHA. Used for "boot to
Flash" bootloader mode. */
CSM_PWL : origin = 0x3F7FF8, length = 0x000008 /* Part of FLASHA. CSM password
locations in FLASHA */
ZONE7
: origin = 0x3FC000, length = 0x003FC0 /* XINTF zone 7 available if MP/MCn=1
*/
ROM
: origin = 0x3FF000, length = 0x000FC0 /* Boot ROM available if MP/MCn=0 */
RESET
: origin = 0x3FFFC0, length = 0x000002 /* part of boot ROM (MP/MCn=0) or
XINTF zone 7 (MP/MCn=1) */
VECTORS : origin = 0x3FFFC2, length = 0x00003E /* part of boot ROM (MP/MCn=0) or
XINTF zone 7 (MP/MCn=1) */
PAGE 1 : /* Data Memory */
/* Memory (RAM/FLASH/OTP) blocks can be moved to PAGE0 for program allocation */
/* Registers remain on PAGE1
*/
/* new */
ZONE6
: origin = 0x100000, length = 0x080000 /* XINTF zone 6 */
RAMM0
RAMM1
RAML1
FLASHB
RAMH0

: origin = 0x000000, length = 0x000400 /* on-chip RAM block M0 */


: origin = 0x000400, length = 0x000400 /* on-chip RAM block M1 */
: origin = 0x009000, length = 0x001000 /* on-chip RAM block L1 */
: origin = 0x3F4000, length = 0x002000 /* on-chip FLASH */
: origin = 0x3F8000, length = 0x002000 /* on-chip RAM block H0 */


/* Allocate sections to memory blocks.
   Note:
   codestart  user defined section in DSP28_CodeStartBranch.asm used to redirect code
              execution when booting to flash
   ramfuncs   user defined section to store functions that will be copied from Flash into RAM
*/
SECTIONS
{
/* Allocate program areas: */
.cinit       : > FLASHA    PAGE = 0
.pinit       : > FLASHA,   PAGE = 0
.text        : > FLASHA    PAGE = 0
codestart    : > BEGIN     PAGE = 0
ramfuncs     : LOAD = FLASHD,
               RUN = RAML0,
               LOAD_START(_RamfuncsLoadStart),
               LOAD_END(_RamfuncsLoadEnd),
               RUN_START(_RamfuncsRunStart),
               PAGE = 0
csmpasswds   : > CSM_PWL   PAGE = 0
csm_rsvd     : > CSM_RSVD  PAGE = 0

/* Allocate uninitialized data sections: */
.stack       : > RAMM0     PAGE = 1
.ebss        : > RAML1     PAGE = 1
.esysmem     : > RAMH0     PAGE = 1

/* Initialized sections go in Flash */
/* For SDFlash to program these, they must be allocated to page 0 */
.econst      : > FLASHA    PAGE = 0
.switch      : > FLASHA    PAGE = 0

/* Allocate IQ math areas: */
IQmath       : > FLASHC    PAGE = 0                 /* Math Code */
IQmathTables : > ROM       PAGE = 0, TYPE = NOLOAD  /* Math Tables In ROM */

/* new */
eramdata     : > ZONE6     PAGE = 1

/* for my APP */
FFTipcb  ALIGN(256) : { } > RAMH0  PAGE 1
FFTmag   > RAMH0  PAGE 1
FFTtf    > RAMH0  PAGE 1
iirfilt  :> RAMH0  PAGE = 1

/* .reset is a standard section used by the compiler. It contains */
/* the address of the start of _c_int00 for C Code. */
/* When using the boot ROM this section and the CPU vector */
/* table is not needed. Thus the default type is set here to */
/* DSECT */
.reset       : > RESET,    PAGE = 0, TYPE = DSECT
vectors      : > VECTORS   PAGE = 0, TYPE = DSECT
}

IV. Code Composer Studio (CCS).

CCS is the development environment for the DSP families from Texas Instruments (TI). A distinctive feature of CCS is that it can plot DSP variables in real time. In addition, TI supplies many DSP libraries, which save a great deal of time when developing algorithms.
Real-time plotting in CCS makes it possible to monitor the actual signals. Figure 3.4 shows four plots covering the signal taken directly from the ADC, the signal after filtering, and its FFT spectrum.

Figure 3.4. Real-time plotting in CCS.

TI provides many DSP libraries that considerably shorten algorithm development time. The DSP libraries used in this thesis include:
C28x Communications Driver Library: its SPI driver is used to communicate with the external EEPROM.
C28x Fast Fourier Transforms Library: computes the FFT of the audio signals.
C28x Filter Library: its IIR filter is used to filter the input signal from the ADC.
C28x Fixed-Point Math Library: a math toolkit that supports arithmetic operations.
C28x IQMath Library - A Virtual Floating Point Engine: a toolkit for real-number computation on the fixed-point DSP that also supports arithmetic operations.

All of the libraries above can be found on TI's website, http://www.ti.com.

Part 4:

Building the ASR system on the DSP platform.

I. Power supply, interface, and sampling design.
The eZdsp F2812 board provides only the DSP and the JTAG components; on its own it does not form a complete system.
To build a minimal ASR system, an interface board for the eZdsp F2812 has to be designed. The interface board serves the following purposes:
Convert the sound into an electrical signal, then amplify and level-shift it into the ADC input range (0 to 3 V).
The TMS320F2812 DSP has no ROM available for data storage. Such storage is needed, for example, for the features of the reference sounds and for values that must be retained after power is removed. The interface board therefore needs an EEPROM.
User-interface elements, namely buttons and an LCD.
Power supply.
To meet these requirements, I built two hardware boards: the VR Interface board and the power-supply board. Detailed schematics of both can be found in Appendix B.

1. Signal conditioning:

The purpose of the signal-conditioning circuit is to convert the sound into an electrical signal in the 0 to 3 V range.

Figure 4.1. Signal-conditioning circuit.

The final signal [ADCin] is an AC signal riding on a 1.5 V offset generated by op-amp U6C.
Op-amp U6A high-pass filters the microphone signal; the signal after this op-amp has zero offset.
Op-amp U6B amplifies the AC signal; its gain is set by potentiometer R47.
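
The relation between the conditioned voltage and the ADC result can be sketched as follows. This is only an illustration that assumes an ideal 12-bit converter with a 0 to 3.0 V input span and the 1.5 V offset described above; the function names are hypothetical and are not taken from the thesis program.

```c
#include <stdint.h>

/* Assumptions: ideal 12-bit ADC, 0..3.0 V span, 1.5 V mid-scale offset. */
#define ADC_FULL_SCALE_V 3.0
#define ADC_MAX_CODE     4095
#define ADC_OFFSET_V     1.5

/* Convert a raw ADC code (0..4095) to the voltage at the ADC pin. */
double adc_code_to_volts(uint16_t code)
{
    return ADC_FULL_SCALE_V * (double)code / (double)ADC_MAX_CODE;
}

/* Ideal code produced by a pin voltage, clamped to the valid range. */
uint16_t adc_volts_to_code(double volts)
{
    if (volts <= 0.0) return 0;
    if (volts >= ADC_FULL_SCALE_V) return ADC_MAX_CODE;
    return (uint16_t)(volts * ADC_MAX_CODE / ADC_FULL_SCALE_V + 0.5);
}

/* AC part of the signal in volts: pin voltage minus the 1.5 V offset. */
double adc_code_to_ac_volts(uint16_t code)
{
    return adc_code_to_volts(code) - ADC_OFFSET_V;
}
```

Under these assumptions a silent input sits near code 2048, which is the constant that the later high-pass filtering stage has to remove.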

2. Interfacing with the EEPROM over SPI.

The purpose is to retain the necessary parameters after the board is powered off.

Figure 4.2. Interfacing with the EEPROM over SPI.
The ROM used is the Microchip 25AA640A, a 64 Kbit EEPROM with an 8-bit data interface and an SPI clock rate of up to 10 MHz.
To communicate with this EEPROM, its instruction set is used to select the chip's functions.

The communication protocol for this ROM can be found in the Microchip 25AA640A datasheet.

In my program, the C28x Communications Driver Library is used to communicate with this ROM.
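
As an illustration of what such an instruction looks like on the wire, the sketch below builds the header that precedes a read or write. The opcode values are those commonly used by 25-series SPI EEPROMs and are an assumption here, so they should be checked against the 25AA640A datasheet; the function name is hypothetical and this is not the driver-library API.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcodes of the usual 25-series SPI EEPROM instruction set
   (assumed values; verify against the device datasheet). */
enum {
    EE_WRSR  = 0x01,  /* write status register */
    EE_WRITE = 0x02,  /* write data starting at an address */
    EE_READ  = 0x03,  /* read data starting at an address */
    EE_WRDI  = 0x04,  /* reset the write-enable latch */
    EE_RDSR  = 0x05,  /* read status register */
    EE_WREN  = 0x06   /* set the write-enable latch */
};

/* Fill the 3-byte header (opcode, then 16-bit address MSB first)
   that precedes the data phase of a READ or WRITE.
   Returns the header length in bytes. */
size_t ee_build_header(uint8_t opcode, uint16_t address, uint8_t out[3])
{
    out[0] = opcode;
    out[1] = (uint8_t)(address >> 8);
    out[2] = (uint8_t)(address & 0xFF);
    return 3;
}
```

A write would normally be preceded by a separate one-byte EE_WREN transfer, and the status register would be polled afterwards until the internal write cycle finishes.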

3. Interfacing with the LCD.

To make the system easy to operate, an LCD is used to display messages.

Figure 4.3. 4-bit interface to the LCD.

To save I/O pins, the LCD is driven through its 4-bit interface. A reference for the 4-bit interface is available at http://www.experimentboard.com/hd44780_4-bit_interface.
To make interaction and future extension easier, the program builds a menu for the LCD. The menu code is adapted from an AVR program written by Nguyn Hi H on the forum www.dientuvietnam.net.
The menu is organized as follows:

Figure 4.4. Built-in menu for the LCD.
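
The 4-bit interface used in this subsection sends every command or data byte as two 4-bit transfers, high nibble first. A minimal sketch of that split is shown below; the function name is hypothetical, and the pin driving and enable-pulse timing are deliberately left out.

```c
#include <stdint.h>

/* Split one byte into the two 4-bit transfers used by an HD44780
   in 4-bit mode: nibbles[0] is sent first (high nibble),
   nibbles[1] second (low nibble). */
void lcd_split_nibbles(uint8_t value, uint8_t nibbles[2])
{
    nibbles[0] = (uint8_t)((value >> 4) & 0x0F);
    nibbles[1] = (uint8_t)(value & 0x0F);
}
```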

4. Power supply.
To interface with the DSP (3.3 V supply) and to power the op-amp amplifier stage, +5 V, +3.3 V, and -5 V rails are required. A 1.5 V supply is also needed to produce the 1.5 V offset that is added to the signal before it enters the ADC.

Figure 4.5. Power supply.

II. The DSP program.

The DSP program implements the algorithm described in Chapter III: features are extracted as MFCC coefficients, and training and recognition use the VQ method.

1. Acoustic vector feature extraction.

Feature extraction follows Figure 3.1 in Chapter III. An IIR filtering stage is added here to remove the 1.5 V offset that is applied to the signal before it enters the ADC.

Figure 4.6. Signal-processing flow on the DSP.


a) Sampling and high-pass filtering to remove the offset.
The DSP samples the signal at 12 kHz with its 12-bit ADC. With Fs = 12 kHz, signal components up to 6 kHz can be captured. To keep data acquisition continuous and to make it easy to split the signal into overlapping frames, a ring buffer is used. This technique is described at http://en.wikipedia.org/wiki/Ring_buffer.

Figure 4.7. Ring buffer data structure.

The ring buffer is 128 points long, and each point is a signed int. Each acquired sample is passed through the IIR filter to remove the offset.
The ADC is configured to run in override mode. In this mode, channel 0 is converted 8 times in succession and the results are stored through Sequence 1 and then Sequence 2. After these 8 conversions, an interrupt is generated.
Configured this way, the program spends little time servicing ADC interrupts.
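
A ring buffer of this kind can be sketched as below. The 128-point length comes from the text; the type and function names are hypothetical, and the sketch ignores the interrupt context in which the real buffer is filled.

```c
#include <stdint.h>

#define RING_SIZE 128  /* ring length stated in the text */

typedef struct {
    int16_t  data[RING_SIZE];
    uint16_t head;    /* index where the next sample will be written */
    uint32_t count;   /* total number of samples written so far */
} ring_t;

void ring_init(ring_t *r)
{
    r->head = 0;
    r->count = 0;
}

/* Store one sample, overwriting the oldest one when the ring is full. */
void ring_push(ring_t *r, int16_t sample)
{
    r->data[r->head] = sample;
    r->head = (uint16_t)((r->head + 1) % RING_SIZE);
    r->count++;
}

/* Copy the newest n samples (n <= RING_SIZE) into out, oldest first.
   Returns 0 on success, -1 if fewer than n samples are available. */
int ring_read_latest(const ring_t *r, int16_t *out, uint16_t n)
{
    uint16_t i, start;
    if (n > RING_SIZE || r->count < n) return -1;
    start = (uint16_t)((r->head + RING_SIZE - n) % RING_SIZE);
    for (i = 0; i < n; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    return 0;
}
```

Because the write index simply wraps around, the acquisition side never has to shift data, and a frame can be read out at any time from the most recent samples.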
IIR filter.
TI provides an IIR library for signal processing, together with a MATLAB script that generates the filter coefficients.
The script eziir32.m is a design tool for IIR filters with 32-bit coefficients for TI's C28x Filter Library.
Figure 4.8 shows how these coefficients are obtained:

Figure 4.8. Finding the IIR coefficients with eziir32.

The response of the resulting filter is as follows:

Figure 4.9. Response of the IIR filter.

The coefficients generated by the script:

/* HPF co-efficients for IIR32 module */
#define IIR32_COEFF {\
0,12501435,0,-2987997,2987997,\
-13537326,29729702,4594994,-9189987,4594994,\
-16060578,32499120,1025818115,-2051636230,1025818115}

#define IIR32_ISF 4275781
#define IIR32_NBIQ 3
#define IIR32_QFMAT 24

The coefficients above are used in the program to filter the signal.
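
One quick sanity check on these numbers is to convert them from Q24 to real values and sum the numerator taps of each second-order section, since that sum is the numerator gain at DC. The sketch below assumes that each group of five values is ordered {a2, a1, b2, b1, b0}; this ordering is an assumption about the library's layout, and the helper names are hypothetical.

```c
#include <stdint.h>

#define NBIQ  3    /* IIR32_NBIQ from the listing */
#define QFMAT 24   /* IIR32_QFMAT from the listing */

/* IIR32_COEFF from the listing; each row is assumed to be
   {a2, a1, b2, b1, b0} for one second-order section. */
static const int32_t iir_coeff[NBIQ][5] = {
    {         0, 12501435,          0,    -2987997,    2987997},
    { -13537326, 29729702,    4594994,    -9189987,    4594994},
    { -16060578, 32499120, 1025818115, -2051636230, 1025818115}
};

/* Convert a Q24 fixed-point value to floating point. */
double q_to_double(int32_t q)
{
    return (double)q / (double)(1L << QFMAT);
}

/* Sum of the numerator taps of one section, i.e. the numerator
   gain at DC (z = 1). */
double section_dc_numerator(int section)
{
    const int32_t *c = iir_coeff[section];
    return q_to_double(c[2]) + q_to_double(c[3]) + q_to_double(c[4]);
}

/* Feedback coefficient a1 of one section as a real number. */
double section_a1(int section)
{
    return q_to_double(iir_coeff[section][1]);
}
```

With the listed values the numerator sum of every section is zero to within one Q24 step, which is what a high-pass filter meant to block the DC offset should show.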


Signal obtained after the filter
With data acquired as described above, the captured signal looks as follows:

Figure 4.10. Signal taken directly from the ADC, with the 1.5 V offset; the signal after the IIR filter has removed the low frequencies; and the audio signal captured over a long interval.
b) Framing. Energy-based word segmentation.

To avoid abrupt changes between frames, consecutive frames must overlap. In my program, consecutive frames overlap by 48 samples.

#define BUF_SIZE 128
#define OVER_LAP 48
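
With these two constants, each new frame starts 128 - 48 = 80 samples after the previous one. The sketch below works out how many complete frames fit in a recording and where each one starts; the helper names are hypothetical.

```c
#include <stddef.h>

#define BUF_SIZE  128                    /* frame length, from the listing */
#define OVER_LAP  48                     /* samples shared by consecutive frames */
#define FRAME_HOP (BUF_SIZE - OVER_LAP)  /* 80 new samples per frame */

/* Number of complete frames that fit in n samples. */
size_t frame_count(size_t n)
{
    if (n < BUF_SIZE) return 0;
    return (n - BUF_SIZE) / FRAME_HOP + 1;
}

/* Index of the first sample of frame k (k = 0, 1, ...). */
size_t frame_start(size_t k)
{
    return k * FRAME_HOP;
}
```

At the 12 kHz sampling rate used here, one second of audio (12000 samples) gives 149 complete frames.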
Not all of the captured signal is speech; some stretches are silence containing only noise. To improve accuracy and reduce the computational load, only the speech segments are passed on to the later stages.
Word segmentation here is based on the signal energy. The algorithm is described in Figure 3.5.
With this algorithm running on the DSP, the results show that the program segments words fairly accurately. The training routine produced the following two energy thresholds (trained in a quiet home environment):

ITL = 1191
ITU = 5955
The energy of frames with and without speech can be distinguished as shown in the figure below.

Figure 4.11. Frame energy with and without a speech signal.

However, to work well in different environments, the program includes a function that determines these energy thresholds. It is available as menu11 in the built-in LCD menu.

Figure 4.12. Menu section for training the word segmentation.

The ITL and ITU values are written to the EEPROM so that they remain available after power is removed.
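
The two-threshold idea behind ITL and ITU can be sketched as follows. The energy measure (sum of absolute sample values per frame) and the three-way classification are assumptions made for illustration; the decision logic actually used is the one in Figure 3.5, and the names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define ITL 1191  /* lower threshold reported in the text */
#define ITU 5955  /* upper threshold reported in the text */

typedef enum { FRAME_SILENCE, FRAME_MAYBE, FRAME_SPEECH } frame_class_t;

/* Assumed energy measure: sum of absolute sample values over the frame. */
uint32_t frame_energy(const int16_t *x, int n)
{
    uint32_t e = 0;
    int i;
    for (i = 0; i < n; i++)
        e += (uint32_t)abs((int)x[i]);
    return e;
}

/* Above ITU: speech. Between ITL and ITU: possible word boundary.
   At or below ITL: silence or background noise. */
frame_class_t classify_energy(uint32_t e)
{
    if (e > ITU) return FRAME_SPEECH;
    if (e > ITL) return FRAME_MAYBE;
    return FRAME_SILENCE;
}
```

A detector built on this would typically declare a word only after a frame exceeds ITU, then extend the start and end outward while the energy stays above ITL.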
c) Windowing and FFT
For spectral analysis, I use TI's real FFT library, which also provides windowing of the signal frame.
The window used is a Hamming window matched to the 128-point frame.
The result of the 128-point FFT spectral analysis is shown below:

Figure 4.13. Plot 1: signal taken directly from the ADC. Plot 2: signal after the IIR filter. Plot 3: FFT computed by the program from the signal in plot 2. Plot 4: FFT of signal (2) computed by CCS.
Comparing plots (3) and (4) above, the FFT computed by the program closely matches the one computed by CCS.
When there is no sound, only noise, the magnitude spectrum looks as follows:

Figure 4.14. Plot 1: spectrum computed by CCS for a noise signal. Plot 2: spectrum computed by the program for a noise signal.
For noise alone, the spectral magnitude obtained is 0 at every frequency.
d) Mel-frequency wrapping.
As described in Part III, the mel spectrum is used to reweight the spectral values obtained from the FFT so as to model the nonlinear response of the human ear to sound.

Using the program in Appendix A, the following coefficients are obtained:

#define MEL_BANK 10
const Uint16 b0[] = {5, 1, 1224, 1688, 710};
const Uint16 b1[] = {7, 2, 312, 1290, 1822, 1007, 256};
const Uint16 b2[] = {8, 4, 178, 993, 1744, 1558, 907, 297};
const Uint16 b3[] = {9, 7, 442, 1093, 1703, 1723, 1180, 667, 179};
const Uint16 b4[] = {11, 10, 277, 820, 1333, 1821, 1714, 1271, 847, 440, 50};
const Uint16 b5[] = {12, 14, 286, 729, 1153, 1560, 1950, 1675, 1314, 965, 629, 304};
const Uint16 b6[] = {15, 19, 325, 686, 1035, 1371, 1696, 1990, 1685, 1389, 1103, 824,
                     553, 290, 33};
const Uint16 b7[] = {18, 24, 10, 315, 611, 897, 1176, 1447, 1710, 1967, 1783, 1540,
                     1302, 1070, 843, 621, 405, 193};
const Uint16 b8[] = {21, 33, 217, 460, 698, 930, 1157, 1379, 1595, 1807, 1986, 1782,
                     1584, 1389, 1198, 1010, 826, 646, 469, 294, 124};
const Uint16 b9[] = {26, 40, 14, 218, 416, 611, 802, 990, 1174, 1354, 1531, 1706,
                     1876, 1955, 1790, 1628, 1468, 1311, 1156, 1004, 854, 706, 561, 418,
                     276, 137};

The resulting mel spectrum and the corresponding FFT spectrum are shown in the figure below for 2 frames.

Figure 4.15. FFT and the corresponding mel spectrum of 2 frames.

The result of this step is a sequence of mel-spectrum vectors. The number of vectors equals the number of frames produced by the word-segmentation routine.
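
How one such coefficient array might be applied to a spectrum is sketched below. The layout is an assumption inferred from the numbers (first entry: total array length, second entry: first FFT bin, remaining entries: the filter weights); the function name is hypothetical and the thesis program may organize this differently.

```c
#include <stdint.h>

/* b0 from the listing. Assumed layout:
   {array length, first FFT bin, weight, weight, ...}. */
static const uint16_t b0[] = {5, 1, 1224, 1688, 710};

/* Weighted sum of spectrum bins for one filter stored in the
   assumed layout. The caller must make sure the spectrum covers
   every bin the filter touches and that the sum fits in 32 bits. */
uint32_t mel_filter_output(const uint16_t *bank, const uint32_t *spectrum)
{
    uint16_t n_weights = (uint16_t)(bank[0] - 2);
    uint16_t first_bin = bank[1];
    uint32_t acc = 0;
    uint16_t i;
    for (i = 0; i < n_weights; i++)
        acc += (uint32_t)bank[2 + i] * spectrum[first_bin + i];
    return acc;
}
```

Storing only the non-zero weights of each triangular filter keeps both the table size and the number of multiplications small, since each filter touches only a few neighbouring bins.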
e) Cepstrum
This step converts the mel spectrum back to the time domain using the DCT. For the LOG and COS operations, I use TI's Qmath and IQmath libraries instead of the LOG and COS functions in <math.h>, in order to meet the speed requirement.
The following figure shows the cepstrum of 2 consecutive frames compared with the corresponding FFT.

Figure 4.16. FFT and the corresponding cepstrum of 2 consecutive frames.

2. Training with the LBG algorithm

The LBG algorithm was described in detail in Part 3.
It is applied on the DSP to build the database for each trained word.
These words are then stored in the EEPROM.

Figure 4.17. Cepstrum values (plot 1) and the corresponding VQ (plot 2) after applying the LBG algorithm.
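
The step that distinguishes LBG from plain clustering is the codebook split, in which every centroid is replaced by two slightly perturbed copies before the clustering pass is repeated. A minimal floating-point sketch of that step follows, mirroring the line r = [r*(1+e), r*(1-e)] of the MATLAB code in Appendix C; the function name and the 10-element vector size are assumptions for illustration.

```c
#define VQ_DIM 10   /* elements per code vector (assumed, matching MEL_BANK) */

/* Split every centroid c into c*(1+eps) and c*(1-eps), doubling the
   codebook. The first n rows receive the (1+eps) copies and rows
   n..2n-1 the (1-eps) copies. book must have room for 2*n vectors.
   Returns the new codebook size. */
int lbg_split(double book[][VQ_DIM], int n, double eps)
{
    int i, d;
    for (i = 0; i < n; i++) {
        for (d = 0; d < VQ_DIM; d++) {
            double c = book[i][d];
            book[i][d]     = c * (1.0 + eps);
            book[n + i][d] = c * (1.0 - eps);
        }
    }
    return 2 * n;
}
```

Starting from the mean of all training vectors and splitting three times produces an 8-vector codebook.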


3. Flowchart of the complete program.

For maximum speed, all program code is copied into SARAM so that it can run at the full 150 MHz. Following menuLcd, the main program consists of 3 parts:
Tuning the endpoint-detection parameters.
The Endpoint Detection tuning part uses signal energy to locate the endpoints, so that the system can operate under different noise conditions. The endpoint-analysis values ITL and ITU are stored in the external ROM.
Whenever the working environment changes, these parameters should be retuned first to keep recognition accurate.

Figure 5.18. Flowchart of the endpoint-detection part.

Training.
The training part trains the reference words. My program has two operating modes: one that saves to ROM and one that does not (temporality).
In the save-to-ROM mode, the features extracted for each word are written to the ROM. For example, each word is quantized into 8 VQ levels, each VQ level is a 10-element vector, and each element is 32 bits wide. Each trained word therefore takes 8*10*32 = 2560 bits = 320 bytes.
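
The same sizing arithmetic also gives an upper bound on how many words fit in the EEPROM. The sketch below assumes the whole 64 Kbit (8192-byte) device is available except for a caller-specified reserved area; how the thesis program actually lays out the EEPROM is not stated, and the names are hypothetical.

```c
#include <stdint.h>

#define VQ_LEVELS     8     /* code vectors per word (from the text) */
#define VQ_DIM        10    /* elements per code vector (from the text) */
#define ELEMENT_BYTES 4     /* 32-bit elements (from the text) */
#define EEPROM_BYTES  8192  /* 64 Kbit device = 8192 bytes */

/* Bytes needed to store one trained word. */
uint32_t word_bytes(void)
{
    return (uint32_t)VQ_LEVELS * VQ_DIM * ELEMENT_BYTES;
}

/* Upper bound on the number of stored words when `reserved` bytes
   are kept for other data such as ITL and ITU. */
uint32_t max_words(uint32_t reserved)
{
    if (reserved >= EEPROM_BYTES) return 0;
    return (EEPROM_BYTES - reserved) / word_bytes();
}
```

Under these assumptions the device could hold at most 25 trained words, well above the vocabulary sizes used in the experiments.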

In the mode without saving, each trained word is stored temporarily in RAM. A variable named vqdatabase is reserved to hold the trained words and can hold at most 10 of them.

Figure 5.19. Flowchart of feature extraction for a word.

According to the flowchart, not every captured word is accepted: for example, sounds that are too short are most likely noise, and words that are too long could overflow the program's memory.
For demonstration purposes, I stored 4 trained words in the ROM: "một" (one), "hai" (two), "lên" (up), and "xuống" (down).
Recognition.
Mirroring the training part, recognition has two modes: recognizing words stored in the ROM and recognizing words that have just been trained (temporality mode).
In the ROM mode, the program first loads the data from ROM into RAM for faster access.
Recognition follows the Voice Identification approach: among the available words, the program finds the one that best matches the captured word.
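
The matching rule can be sketched as follows: for each stored codebook, average the distance from every feature vector of the captured word to its nearest code vector, then pick the codebook with the smallest average. This sketch uses squared Euclidean distance in floating point to avoid a square root, which is a simplification relative to the MATLAB code in Appendix C; the names and the flat array layout are hypothetical.

```c
#include <float.h>

#define VQ_DIM 10  /* elements per vector (from the sizing example) */

/* Squared Euclidean distance between two VQ_DIM-element vectors. */
static double sq_dist(const double *a, const double *b)
{
    double s = 0.0;
    int d;
    for (d = 0; d < VQ_DIM; d++)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

/* Mean over all frames of the distance to the nearest code vector.
   feat holds n_frames vectors and book holds n_codes vectors,
   both stored back to back. */
double avg_distortion(const double *feat, int n_frames,
                      const double *book, int n_codes)
{
    double total = 0.0;
    int f, c;
    for (f = 0; f < n_frames; f++) {
        double best = DBL_MAX;
        for (c = 0; c < n_codes; c++) {
            double d = sq_dist(feat + f * VQ_DIM, book + c * VQ_DIM);
            if (d < best) best = d;
        }
        total += best;
    }
    return total / n_frames;
}

/* Index of the codebook with the smallest average distortion,
   or -1 if n_books is 0. books holds n_books codebooks back to back. */
int recognize(const double *feat, int n_frames,
              const double *books, int n_books, int n_codes)
{
    int w, best_w = -1;
    double best = DBL_MAX;
    for (w = 0; w < n_books; w++) {
        double d = avg_distortion(feat, n_frames,
                                  books + (long)w * n_codes * VQ_DIM, n_codes);
        if (d < best) { best = d; best_w = w; }
    }
    return best_w;
}
```

Because the rule always returns the closest word, a practical system might also reject inputs whose best distortion is still large; that threshold is not discussed in the text.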

Figure 5.20. Flowchart of the recognition program.

III. Results and remarks.

Results:
The demo program uses 4 words preloaded into the ROM: "một", "hai", "lên", and "xuống". Experiments in a low-noise environment (indoors, with fan noise or distant conversation) gave the following results:
Word      Accuracy
Một       100.00%
Hai       100.00%
Lên       70.00%
Xuống     75.00%

The word "lên" is often confused with "hai", and "xuống" is often confused with "lên" or "hai". The words "một" and "hai" are recognized with 100% accuracy.
Words with similar pronunciation, such as "một" and "bốn" or "trái" and "phải", are often confused.
As the number of words to recognize grows, the results deteriorate sharply; with this program, recognition works best with fewer than 6 words. For words that differ considerably, the results are very good, usually above 70%.
Results are better when the person who trains the system is also the one speaking the commands; if the trainer and the speaker have different voices, recognition accuracy drops markedly.
Remarks.
The recognizable vocabulary is quite small, fewer than 6 words. Moreover, recognition is acceptable only when these words differ substantially in pronunciation. Results degrade badly when the vocabulary exceeds 6 words or when the trainer and the speaker have very different voices.
The strengths of the system are that it runs in real time and can be trained on new words.
It can also be configured to run in different noise environments.
Directions for future work.
From the outset, the VQ recognition algorithm chosen here is inherently suited only to small-vocabulary systems, so the limited vocabulary was unavoidable. More advanced recognition algorithms, such as HMMs or neural networks, would give better results. However, it must be considered whether the hardware can handle the resulting computational load.
Isolated-word recognition depends on the differences between the stressed sounds of the words, so with this approach the vocabulary cannot be large: as the number of words grows, similarities between words become unavoidable. A natural approach is to use context to make the final decision. Context here may be grammar or the topic under discussion. This resembles how people rely on context in conversation.

Appendices.
Appendix A:
MATLAB program for computing the mel-spaced filterbank coefficients.
function m = melfb(p, n, fs)
% MELFB         Determine matrix for a mel-spaced filterbank
%
% Inputs:       p   number of filters in filterbank
%               n   length of fft
%               fs  sample rate in Hz
%
% Outputs:      x   a (sparse) matrix containing the filterbank amplitudes
%                   size(x) = [p, 1+floor(n/2)]
%
% Usage:        For example, to compute the mel-scale spectrum of a
%               column-vector signal s, with length n and sample rate fs:
%
%               f = fft(s);
%               m = melfb(p, n, fs);
%               n2 = 1 + floor(n/2);
%               z = m * abs(f(1:n2)).^2;
%
%               z would contain p samples of the desired mel-scale spectrum
%
%               To plot filterbanks e.g.:
%
%               plot(linspace(0, (12500/2), 129), melfb(20, 256, 12500)'),
%               title('Mel-spaced filterbank'), xlabel('Frequency (Hz)');

f0 = 700 / fs;
fn2 = floor(n/2);

lr = log(1 + 0.5/f0) / (p+1);

% convert to fft bin numbers with 0 for DC term
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1));

b1 = floor(bl(1)) + 1;
b2 = ceil(bl(2));
b3 = floor(bl(3));
b4 = min(fn2, ceil(bl(4))) - 1;

pf = log(1 + (b1:b4)/n/f0) / lr;
fp = floor(pf);
pm = pf - fp;

r = [fp(b2:b4) 1+fp(1:b3)];
c = [b2:b4 1:b3] + 1;
v = 2 * [1-pm(b2:b4) pm(1:b3)];

m = sparse(r, c, v, p, 1+fn2);

Appendix B.
Schematics of the interface hardware for the eZdsp F2812.

Appendix C.
MATLAB program for sound recognition using the VQ algorithm.

function M3 = blockFrames(s, fs, m, n)
% blockFrames: Puts the signal into frames
%
% Inputs: s  contains the signal to analyze
%         fs is the sampling rate of the signal
%         m  is the distance between the beginnings of two frames
%         n  is the number of samples per frame
%
% Output: M3 is a matrix containing all the frames
%
%
%%%%%%%%%%%%%%%%%%
% Mini-Project: An automatic speaker recognition system
%
% Responsible: Vladan Velisavljevic
% Authors:     Christian Cornaz
%              Urs Hunkeler

l = length(s);
nbFrame = floor((l - n) / m) + 1;

for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i);
    end
end

h = hamming(n);
M2 = diag(h) * M;

for i = 1:nbFrame
    M3(:, i) = fft(M2(:, i));
end

function d = disteu(x, y)
% DISTEU Pairwise Euclidean distances between columns of two matrices
%
% Input:
%       x, y:   Two matrices whose each column is an a vector data.
%
% Output:
%       d:      Element d(i,j) will be the Euclidean distance between two
%               column vectors X(:,i) and Y(:,j)
%
% Note:
%       The Euclidean distance D between two vectors X and Y is:
%       D = sum((x-y).^2).^0.5

[M, N] = size(x);
[M2, P] = size(y);

if (M ~= M2)
    error('Matrix dimensions do not match.')
end

d = zeros(N, P);

if (N < P)
    copies = zeros(1,P);
    for n = 1:N
        d(n,:) = sum((x(:, n+copies) - y) .^2, 1);
    end
else
    copies = zeros(1,N);
    for p = 1:P
        d(:,p) = sum((x - y(:, p+copies)) .^2, 1)';
    end
end

d = d.^0.5;

function r = mfcc(s, fs)
% MFCC
%
% Inputs: s  contains the signal to analyze
%         fs is the sampling rate of the signal
%
% Output: r contains the transformed signal
%
%
%%%%%%%%%%%%%%%%%%
% Mini-Project: An automatic speaker recognition system
%
% Responsible: Vladan Velisavljevic
% Authors:     Christian Cornaz
%              Urs Hunkeler

m = 100;
n = 256;
l = length(s);

nbFrame = floor((l - n) / m) + 1;

for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i);
    end
end

h = hamming(n);
M2 = diag(h) * M;

for i = 1:nbFrame
    frame(:,i) = fft(M2(:, i));
end

t = n / 2;
tmax = l / fs;

m = melfb(20, n, fs);
n2 = 1 + floor(n / 2);
z = m * abs(frame(1:n2, :)).^2;

r = dct(log(z));

function r = vqlbg(d,k)
% VQLBG Vector quantization using the Linde-Buzo-Gray algorithm
%
% Inputs: d contains training data vectors (one per column)
%         k is number of centroids required
%
% Output: r contains the result VQ codebook (k columns, one for each centroid)
%
%
%%%%%%%%%%%%%%%%%%
% Mini-Project: An automatic speaker recognition system
%
% Responsible: Vladan Velisavljevic
% Authors:     Christian Cornaz
%              Urs Hunkeler

e = .01;
r = mean(d, 2);
dpr = 10000;

for i = 1:log2(k)
    r = [r*(1+e), r*(1-e)];

    while (1 == 1)
        z = disteu(d, r);
        [m,ind] = min(z, [], 2);
        t = 0;
        for j = 1:2^i
            r(:, j) = mean(d(:, find(ind == j)), 2);
            x = disteu(d(:, find(ind == j)), r(:, j));
            for q = 1:length(x)
                t = t + x(q);
            end
        end
        if (((dpr - t)/t) < e)
            break;
        else
            dpr = t;
        end
    end
end

function code = train(traindir, n)
% Speaker Recognition: Training Stage
%
% Input:
%       traindir : string name of directory contains all train sound files
%       n        : number of train files in traindir
%
% Output:
%       code     : trained VQ codebooks, code{i} for i-th speaker
%
% Note:
%       Sound files in traindir is supposed to be:
%                       s1.wav, s2.wav, ..., sn.wav
% Example:
%       >> code = train('C:\data\train\', 8);

k = 16;                         % number of centroids required

for i = 1:n                     % train a VQ codebook for each speaker
    file = sprintf('%ss%d.wav', traindir, i);
    disp(file);

    [s, fs] = wavread(file);

    v = mfcc(s, fs);            % Compute MFCC's

    code{i} = vqlbg(v, k);      % Train VQ codebook
end

function test(testdir, n, code)
% Speaker Recognition: Testing Stage
%
% Input:
%       testdir : string name of directory contains all test sound files
%       n       : number of test files in testdir
%       code    : codebooks of all trained speakers
%
% Note:
%       Sound files in testdir is supposed to be:
%               s1.wav, s2.wav, ..., sn.wav
%
% Example:
%       >> test('C:\data\test\', 8, code);

for k = 1:n                     % read test sound file of each speaker
    file = sprintf('%ss%d.wav', testdir, k);
    [s, fs] = wavread(file);

    v = mfcc(s, fs);            % Compute MFCC's

    distmin = inf;
    k1 = 0;

    for l = 1:length(code)      % each trained codebook, compute distortion
        d = disteu(v, code{l});
        dist = sum(min(d,[],2)) / size(d,1);

        if dist < distmin
            distmin = dist;
            k1 = l;
        end
    end

    msg = sprintf('Speaker %d matches with speaker %d', k, k1);
    disp(msg);
end

Main references:

[1] Minh N. Do. Digital signal processing mini project: An Automatic Speaker Recognition System, http://www.ifp.uiuc.edu/~minhdo/teaching/speaker_recognition.
[2] L.R. Rabiner, M.R. Sambur. An algorithm for determining the endpoints of isolated utterances, Bell System Tech. Journal, Feb. 1975.
[3] G.S. Ying, C.D. Mitchell & L.H. Jamieson. Endpoint detection of isolated utterances based on a modified Teager energy measurement.
[4] John L. Ostrander, Timothy D. Hopmann & Edward J. Delp. Speech Recognition Using LPC Analysis. January 1986.
[5] John-Paul Hosom, Jacques de Villiers, Ron Cole, Mark Fanty, Johan Schalkwyk, Yonghong Yan & Wei Wei. Training Hidden Markov Model/Artificial Neural Network (HMM/ANN) Hybrids for Automatic Speech Recognition (ASR), http://speech.bme.ogi.edu/tutordemos/nnet_training/tutorial.html.
[6] Md. R. Hasan, M. Jamil, Md. G. Rabbani, Md. S. Rahman. Speaker Identification Using Mel Frequency Cepstral Coefficients.
[7] Application Reports and User Manuals from Texas Instruments.