
Speech Processing Project Linear Predictive coding using Voice excited Vocoder

ECE 5525
Osama Saraireh
Fall 2005
Dr. Veton Kepuska

The basic form of the pitch-excited LPC vocoder is shown below.

The speech signal is band-limited to no more than one half of the system sampling frequency, and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame, a pitch period estimate is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, $A(z)$. In addition, a gain parameter $G$, representing some function of the speech energy, is computed. An encoding procedure is then applied to transform the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute the noise-free channel transmission bit rate.

At the receiver, the transmitted parameters are decoded into quantized versions of the coefficient analysis and pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal drives a synthesis filter $1/A(z)$ corresponding to the analysis model $A(z)$. Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples $\hat{s}(n)$ are then passed through a D/A converter and low-pass filtered, with a filter similar to the one at the input of the system, to generate the synthetic speech $s(t)$.

Linear predictive coding (LPC) of speech

The linear predictive coding (LPC) method for speech analysis and synthesis is based on modeling the vocal tract as a linear all-pole (IIR) filter with the system transfer function

$$H(z) = \frac{G}{1 + \sum_{k=1}^{p} a_p(k)\, z^{-k}}$$

[Figure: simple speech production model]

where $p$ is the number of poles, $G$ is the filter gain, and $a_p(k)$ are the parameters that determine the poles. Two mutually exclusive excitation functions are used to model voiced and unvoiced speech sounds. On a short-time basis, voiced speech is considered periodic with a fundamental frequency $F_0$ and a pitch period $1/F_0$, which depends on the speaker. Hence, voiced speech is generated by exciting the all-pole filter model with a periodic impulse train. Unvoiced sounds, on the other hand, are generated by exciting the all-pole filter with the output of a random noise generator.
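The two excitation types can be illustrated with a minimal MATLAB sketch. The filter coefficients, gain, and pitch period below are made-up values chosen only so that the filter is stable; they are not taken from the project.

```matlab
% Sketch only: a, G and pitchT are hypothetical values, not project data.
N      = 160;              % 20 ms frame at an assumed 8 kHz sampling rate
a      = [1 -1.2 0.8];     % hypothetical denominator [1 a(1) a(2)] of G/A(z)
G      = 0.5;              % hypothetical gain
pitchT = 64;               % hypothetical pitch period in samples

% Voiced excitation: periodic impulse train, one impulse per pitch period
voicedExc = zeros(N, 1);
voicedExc(1:pitchT:N) = 1;

% Unvoiced excitation: output of a random noise generator
unvoicedExc = randn(N, 1);

% Drive the same all-pole filter G/A(z) with either excitation
voiced   = filter(G, a, voicedExc);
unvoiced = filter(G, a, unvoicedExc);
```

Both outputs share the same spectral envelope, set by `a`; only the excitation differs.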

The fundamental difference between these two types of speech sounds comes from the way they are produced. The vibrations of the vocal cords produce voiced sounds. The rate at which the vocal cords vibrate dictates the pitch of the sound. On the other hand, unvoiced sounds do not rely on the vibration of the vocal cords. The unvoiced sounds are created by the constriction of the vocal tract. The vocal cords remain open and the constrictions of the vocal tract force air out to produce the unvoiced sounds.

Given a short segment of a speech signal, say about 20 ms or 160 samples at a sampling rate of 8 kHz, the speech encoder at the transmitter must determine the proper excitation function, the pitch period for voiced speech, the gain, and the coefficients $a_p(k)$. The block diagram below describes the encoder/decoder for linear predictive coding. The parameters of the model are determined adaptively from the data, encoded into a binary sequence, and transmitted to the receiver. At the receiver, the speech signal is then synthesized from the model and the excitation signal.

The parameters of the all-pole filter model are determined from the speech samples by means of linear prediction. Specifically, the output of the linear prediction filter is

$$\hat{s}(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k)$$

and the corresponding error between the observed sample $s(n)$ and the predicted value $\hat{s}(n)$ is

$$e(n) = s(n) - \hat{s}(n)$$

By minimizing the sum of the squared error we can determine the pole parameters $a_p(k)$ of the model. Differentiating this sum with respect to each of the parameters and equating the result to zero gives a set of $p$ linear equations

$$\sum_{k=1}^{p} a_p(k)\, r_{ss}(m-k) = -r_{ss}(m), \qquad m = 1, 2, \ldots, p$$

where $r_{ss}(m)$ is the autocorrelation of the sequence $s(n)$, defined as

$$r_{ss}(m) = \sum_{n=0}^{N-1} s(n)\, s(n+m)$$

with $N$ the number of samples in the frame and $s(n)$ taken as zero outside the frame. The equations above can be expressed in matrix form as

$$R_{ss}\, a = -r_{ss}$$

where $R_{ss}$ is a $p \times p$ autocorrelation matrix, $r_{ss}$ is a $p \times 1$ autocorrelation vector, and $a$ is a $p \times 1$ vector of model parameters.
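As a check on the matrix form, the following MATLAB sketch builds the autocorrelation matrix for a synthetic signal and solves the normal equations directly. The test signal is an assumption made for illustration: the impulse response of a known two-pole filter, for which the autocorrelation method should return the filter's own coefficients.

```matlab
% Sketch only: aTrue is a hypothetical all-pole model used as ground truth.
aTrue = [1 -0.9 0.2];                       % A(z) = 1 - 0.9 z^-1 + 0.2 z^-2
s = filter(1, aTrue, [1 zeros(1, 199)]);    % impulse response of 1/A(z)
p = 2;                                      % model order
N = length(s);

% Autocorrelation r_ss(0..p), with s(n) treated as zero outside the frame
r = zeros(p + 1, 1);
for m = 0:p
    r(m + 1) = sum(s(1:N - m) .* s(1 + m:N));
end

Rss = toeplitz(r(1:p));     % p-by-p autocorrelation matrix
rss = r(2:p + 1);           % p-by-1 autocorrelation vector
a   = Rss \ (-rss);         % solves Rss * a = -rss
```

Here `a` comes out very close to `[-0.9; 0.2]`, the coefficients of `aTrue` after the leading 1. A direct solve costs more than the Levinson-Durbin recursion but is convenient for validating it.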
    % (excerpt from proclpc; the frame loop continues in the full listing)
    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                   % Convert ms to samples
    msfs = round(sr/1000*fs);                   % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';     % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';              % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1       % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));  % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);             % Compute the cross correlation
        autoCorVec = autoCor(msfs+[0:L]);
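The frame bookkeeping in the excerpt above can be worked through with concrete numbers. This sketch uses the 16 kHz rate set in `speechcoder1` and the 20 ms / 30 ms defaults of `proclpc`; the one-second recording length is a hypothetical value.

```matlab
% Sketch only: duration is a hypothetical 1-second recording.
sr = 16000;                      % sampling rate in Hz
fr = 20;                         % frame time increment in ms
fs = 30;                         % frame size in ms
duration = 16000;                % number of samples (assumed)

msfr      = round(sr/1000*fr);   % frame increment in samples
msfs      = round(sr/1000*fs);   % frame size in samples
msoverlap = msfs - msfr;         % samples shared by consecutive frames

frameStarts = 1:msfr:duration-msfs+1;           % same indexing as the for loop
nframes     = numel(frameStarts);
lastSample  = frameStarts(end) + msfs - 1;      % last sample that gets analyzed
```

With these values each frame holds 480 samples, advances by 320, and overlaps its neighbor by 160. The final 160 samples of the recording fall outside the last full frame.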

These equations can be solved in MATLAB by using the Levinson-Durbin algorithm.

        % Levinson's method
        err(1) = autoCorVec(1);
        k(1) = 0;
        A = [];
        for index=1:L
            numerator = [1 A.']*autoCorVec(index+1:-1:2);
            denominator = -1*err(index);
            k(index) = numerator/denominator;       % PARCOR coeffs
            A = [A+k(index)*flipud(A); k(index)];
            err(index+1) = (1-k(index)^2)*err(index);
        end
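The recursion above can be checked against a direct solve of the normal equations. In this sketch the autocorrelation values in `r` are hypothetical numbers chosen to form a valid (positive definite) sequence; the loop body follows the listing.

```matlab
% Sketch only: r holds hypothetical autocorrelation values r_ss(0..3).
r = [1.0; 0.5; 0.2; 0.05];
p = length(r) - 1;               % model order

err = zeros(p + 1, 1);           % prediction error energy per order
k   = zeros(p, 1);               % PARCOR (reflection) coefficients
err(1) = r(1);
A = [];
for index = 1:p
    numerator = [1 A.'] * r(index+1:-1:2);
    k(index)  = -numerator / err(index);
    A = [A + k(index)*flipud(A); k(index)];
    err(index+1) = (1 - k(index)^2) * err(index);
end

% Reference: solve R_ss * a = -r_ss directly
aDirect = toeplitz(r(1:p)) \ (-r(2:p+1));
```

`A` and `aDirect` agree to rounding error. The final `err(p+1)` equals `r(1) + A.'*r(2:p+1)`, the residual energy from which the listing takes the gain as `sqrt(err(L+1))`.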

The gain parameter of the filter can be obtained from the input-output relationship

$$s(n) = -\sum_{k=1}^{p} a_p(k)\, s(n-k) + G\, x(n)$$

where $x(n)$ represents the input sequence. Rearranging this equation in terms of the error sequence gives

$$G\, x(n) = s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) = e(n)$$

Then

$$G^2 \sum_{n=0}^{N-1} x^2(n) = \sum_{n=0}^{N-1} e^2(n)$$

If the input excitation is normalized to unit energy by design, then

$$G^2 = \sum_{n=0}^{N-1} e^2(n) = r_{ss}(0) + \sum_{k=1}^{p} a_p(k)\, r_{ss}(k)$$

where $G^2$ is set equal to the residual energy resulting from the least-squares optimization.
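The last equality, which links the residual energy to the autocorrelation values, follows from the normal equations. The derivation below is a sketch under the autocorrelation-method assumption that the sums run over all $n$ with $s(n)$ taken as zero outside the frame, so that $\sum_n s(n)\,s(n-k) = r_{ss}(k)$.

```latex
\begin{aligned}
\sum_n e^2(n)
  &= \sum_n e(n)\Big[\, s(n) + \sum_{k=1}^{p} a_p(k)\, s(n-k) \Big] \\
  &= \sum_n e(n)\, s(n) + \sum_{k=1}^{p} a_p(k) \sum_n e(n)\, s(n-k) \\
  &= \sum_n e(n)\, s(n)
     \qquad \text{since } \sum_n e(n)\, s(n-m)
     = r_{ss}(m) + \sum_{k=1}^{p} a_p(k)\, r_{ss}(m-k) = 0,\quad m = 1,\ldots,p \\
  &= r_{ss}(0) + \sum_{k=1}^{p} a_p(k)\, r_{ss}(k)
\end{aligned}
```

The third line uses the fact that the optimal prediction error is orthogonal to the past samples used in the prediction.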
    % filter response (disabled debugging plot)
    if 0
        gain=0;
        cft=0:(1/255):1;
        % aCoeff(1,nframe) is the leading 1 of A(z), so coefficient number
        % index multiplies z^-(index-1) and the sum runs over all L+1 terms
        for index=1:L+1
            gain = gain + aCoeff(index,nframe)*exp(-1i*2*pi*cft).^(index-1);
        end
        gain = abs(1./gain);
        spec(:,nframe) = 20*log10(gain(1:128))';
        plot(20*log10(gain));
        title(nframe);
        drawnow;
    end
    if 0
        impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
        freqResponse = 20*log10(abs(fft(impulseResponse)));
        plot(freqResponse);
    end

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is, what the pitch is. The pitch can be determined by computing the following sequence in MATLAB:

$$r_e(n) = \sum_{k=1}^{p} r_a(k)\, r_{ss}(n-k)$$

where $r_a(k)$ is defined as

$$r_a(k) = \sum_{i=1}^{p} a_p(i)\, a_p(i+k)$$

which is the autocorrelation sequence of the prediction coefficients. The pitch is detected by finding the peak of the normalized sequence $r_e(n)/r_e(0)$ in the time interval that corresponds to 3 to 15 ms in the 20 ms sampling frame. If the value of this peak is at least 0.25, the frame of speech is considered voiced with a pitch period equal to the value of $n = N_p$, where $r_e(N_p)/r_e(0)$ is a maximum.

If the peak value is less than 0.25, the frame of speech is considered unvoiced and the pitch is set to zero.

    errSig = filter([1 A'],1,frameData);    % find excitation noise
    G(nframe) = sqrt(err(L+1));             % gain
    autoCorErr = xcorr(errSig);             % calculate pitch & voicing information
    [B,I] = sort(autoCorErr);               % ascending: B(num) is the largest value
    num = length(I);
    % voiced if the second-largest value exceeds 1% of the largest;
    % the pitch is the lag between the two
    if B(num-1) > .01*B(num)
        pitch(nframe) = abs(I(num) - I(num-1));
    else
        pitch(nframe) = 0;
    end
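The peak-picking rule stated in the text (search lags of 3 to 15 ms, threshold 0.25) can be sketched as follows. This illustrates that rule only, not the sort-based test used in the listing, and it works directly on the autocorrelation of a residual; the residual here is a hypothetical impulse train with a period of 64 samples at an assumed 8 kHz rate.

```matlab
% Sketch only: e is a synthetic residual, not data from the project.
fs = 8000;                           % assumed sampling rate in Hz
e  = zeros(240, 1);                  % 30 ms frame
e(1:64:end) = 1;                     % impulses every 64 samples (8 ms)
N  = length(e);

minLag = round(0.003 * fs);          % 3 ms  -> 24 samples
maxLag = round(0.015 * fs);          % 15 ms -> 120 samples

% Autocorrelation of the residual for lags 0..maxLag
re = zeros(maxLag + 1, 1);
for n = 0:maxLag
    re(n + 1) = sum(e(1:N - n) .* e(1 + n:N));
end
reNorm = re / re(1);                 % normalize by the zero-lag value

% Largest normalized value inside the allowed lag range
[peak, idx] = max(reNorm(minLag + 1:maxLag + 1));
if peak >= 0.25
    pitchPeriod = minLag + idx - 1;  % voiced: lag of the peak, in samples
else
    pitchPeriod = 0;                 % unvoiced
end
```

For this input the peak sits at lag 64 with a normalized value of 0.75, so the frame is classified as voiced.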

The values of the LPC coefficients, the pitch period, and the type of excitation are then transmitted to the receiver. The decoder synthesizes the speech signal by passing the proper excitation through the all-pole filter model of the vocal tract.

Typically the pitch period requires 6 bits, the gain parameter is represented in 5 bits after its dynamic range is compressed logarithmically, and the prediction coefficients normally require 8-10 bits each for accuracy reasons. This accuracy is very important in LPC because any small change in the prediction coefficients results in a large change in the pole positions of the filter model, which can cause instability in the model. This is overcome by using the PARCOR method.
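These bit allocations translate into a channel bit rate once a frame rate and model order are fixed. The sketch below assumes the 20 ms frame increment and order 10 used elsewhere in this report, and assumes that voicing needs no separate bit because an unvoiced frame is signalled by a pitch period of zero.

```matlab
% Sketch only: frame increment, order and the "no voicing bit" choice are assumptions.
frameIncrementMs = 20;               % one parameter set every 20 ms
order            = 10;               % number of prediction coefficients
pitchBits        = 6;
gainBits         = 5;
coeffBits        = [8 10];           % low and high end of the quoted range

framesPerSecond = 1000 / frameIncrementMs;
bitsPerFrame    = pitchBits + gainBits + order * coeffBits;
bitRate         = framesPerSecond * bitsPerFrame;    % bits per second
```

Under these assumptions the rate lies between 4550 and 5550 bit/s.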

Is speech frame Voiced or Unvoiced

Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if so, what the pitch is. If the speech frame is decided to be voiced, an impulse train is employed to represent it, with nonzero taps occurring every pitch period. A pitch-detection algorithm is used to determine the correct pitch period (or frequency). The autocorrelation function is used to estimate the pitch period. However, if the frame is unvoiced, white noise is used to represent it and a pitch period of T=0 is transmitted. Therefore, either white noise or an impulse train becomes the excitation of the LPC synthesis filter.

Two types of LPC vocoders were implemented in MATLAB.

The plain LPC vocoder diagram is shown below:

    % LPC vocoder
    function [ outspeech ] = speechcoder1( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % Returns:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    outspeech = synlpc(aCoeff, pitch, Fs, G);

Results:

[Figure: residual plot]
[Figure: the LPC gain vs. frames]
[Figure: original speech signal]
[Figure: output speech spectrum using the LPC vocoder]

Voice-excited LPC vocoder (utilizing the DCT for a high compression rate / low bit rate)

The input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. This filtered signal is called the residual.

To achieve a high compression rate, the discrete cosine transform (DCT) of the residual signal can be employed. The DCT concentrates most of the energy of the signal in the first few coefficients. Thus one way to compress the signal is to transmit only the coefficients that contain most of the energy.

    function [ outspeech ] = speechcoder2( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % Returns:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    outspeech = synlpc2(aCoeff, resid, Fs, G);
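The energy-compaction idea behind keeping only the first 50 of 480 DCT coefficients can be illustrated without any toolbox by building the orthonormal DCT-II matrix explicitly. The test frame below is a hypothetical smooth signal chosen for illustration; a real LPC residual is likely to compact less well.

```matlab
% Sketch only: x is a synthetic frame, and C stands in for the dct/idct calls.
N = 480;                              % frame length (30 ms at 16 kHz)
K = 50;                               % number of coefficients kept
n = (0:N-1)';
x = cos(2*pi*3*n/N) .* exp(-n/200);   % hypothetical smooth test frame

% Orthonormal DCT-II matrix: C(k+1,n+1) = c_k * cos(pi*k*(2n+1)/(2N))
C = sqrt(2/N) * cos(pi * (0:N-1)' * (2*(0:N-1) + 1) / (2*N));
C(1,:) = C(1,:) / sqrt(2);

X  = C * x;                           % forward DCT
Xk = X;
Xk(K+1:end) = 0;                      % keep only the first K coefficients
xHat = C' * Xk;                       % inverse DCT (C is orthonormal)

energyKept = sum(Xk.^2) / sum(X.^2);  % fraction of energy retained
reconError = sum((x - xHat).^2);      % equals the discarded energy
```

Because `C` is orthonormal, the reconstruction error energy equals the energy of the discarded coefficients, so `energyKept` directly measures how much of the frame survives truncation.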

Results:

[Figure: noise = [ zeros(50,b); 0.01*randn(430,b) ]; the noise added to the signal to make it sound better]
[Figure: resid = resid + noise]
[Figure: original speech signal]
[Figure: reconstructed signal using the voice-excited LPC vocoder]

MATLAB files:

    clear all;
    % Osama Saraireh
    % Speech processing
    % Dr. Veton Kepuska
    % FIT Fall 2005
    a = input('please load the speech signal as a .wav file ', 's');
    Inputsoundfile = a;
    % read the wave file (wavread has been removed from recent MATLAB
    % releases; audioread is its replacement)
    [inspeech, Fs, bits] = wavread(Inputsoundfile);
    outspeech1 = speechcoder1(inspeech);    % plain LPC vocoder
    outspeech2 = speechcoder2(inspeech);    % voice-excited LPC vocoder
    % plot results
    figure(1);
    subplot(3,1,1); plot(inspeech); grid;
    subplot(3,1,2); plot(outspeech1); grid;
    subplot(3,1,3); plot(outspeech2); grid;
    disp('Press any key to play the original sound file');
    pause;
    soundsc(inspeech, Fs);
    disp('Press any key to play the LPC compressed file!');
    pause;
    soundsc(outspeech1, Fs);
    disp('Press a key to play the voice-excited LPC compressed sound!');
    pause;
    soundsc(outspeech2, Fs);

    function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)
    % L      - The order of the analysis.
    % fr     - Frame time increment, in ms. Defaults to 20ms
    % fs     - Frame size in ms.
    % preemp - default 0.9378
    % aCoeff - The LPC analysis results,
    % resid  - The LPC residual,
    % pitch  - calculated by finding the peak in the residual's autocorrelation
    %          for each frame.
    % G      - The LPC gain for each frame.
    % parcor - The parcor coefficients.
    % stream - The LPC analysis' residual or excitation signal as one long vector.

    if (nargin<3), L = 10; end
    if (nargin<4), fr = 20; end
    if (nargin<5), fs = 30; end
    if (nargin<6), preemp = .9378; end
    [row col] = size(data);
    if col==1 data=data'; end
    nframe = 0;
    msfr = round(sr/1000*fr);                   % Convert ms to samples
    msfs = round(sr/1000*fs);                   % Convert ms to samples
    duration = length(data);
    speech = filter([1 -preemp], 1, data)';     % Preemphasize speech
    msoverlap = msfs - msfr;
    ramp = [0:1/(msoverlap-1):1]';              % Compute part of window
    for frameIndex=1:msfr:duration-msfs+1       % frame rate=20ms
        frameData = speech(frameIndex:(frameIndex+msfs-1));  % frame size=30ms
        nframe = nframe+1;
        autoCor = xcorr(frameData);             % Compute the cross correlation
        autoCorVec = autoCor(msfs+[0:L]);

        % Levinson's method
        err(1) = autoCorVec(1);
        k(1) = 0;
        A = [];
        for index=1:L
            numerator = [1 A.']*autoCorVec(index+1:-1:2);
            denominator = -1*err(index);
            k(index) = numerator/denominator;   % PARCOR coeffs
            A = [A+k(index)*flipud(A); k(index)];
            err(index+1) = (1-k(index)^2)*err(index);
        end
        aCoeff(:,nframe) = [1; A];
        parcor(:,nframe) = k';

        % filter response (disabled debugging plot)
        if 0
            gain=0;
            cft=0:(1/255):1;
            % aCoeff(1,nframe) is the leading 1 of A(z), so coefficient number
            % index multiplies z^-(index-1) and the sum runs over all L+1 terms
            for index=1:L+1
                gain = gain + aCoeff(index,nframe)*exp(-1i*2*pi*cft).^(index-1);
            end
            gain = abs(1./gain);
            spec(:,nframe) = 20*log10(gain(1:128))';
            plot(20*log10(gain));
            title(nframe);
            drawnow;
        end

        % Calculate the filter response from the filter's impulse
        % response (to check above).
        if 0
            impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);
            freqResponse = 20*log10(abs(fft(impulseResponse)));
            plot(freqResponse);
        end

        errSig = filter([1 A'],1,frameData);    % find excitation noise
        G(nframe) = sqrt(err(L+1));             % gain
        autoCorErr = xcorr(errSig);             % calculate pitch & voicing information
        [B,I] = sort(autoCorErr);
        num = length(I);
        if B(num-1) > .01*B(num)
            pitch(nframe) = abs(I(num) - I(num-1));
        else
            pitch(nframe) = 0;
        end

        % improve the compressed sound quality
        resid(:,nframe) = errSig/G(nframe);
        if(frameIndex==1)       % add residual frames using a trapezoidal window
            stream = resid(1:msfr,nframe);
        else
            stream = [stream;
                      overlap+resid(1:msoverlap,nframe).*ramp;
                      resid(msoverlap+1:msfr,nframe)];
        end
        if(frameIndex+msfr+msfs-1 > duration)
            stream = [stream; resid(msfr+1:msfs,nframe)];
        else
            overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);
        end
    end
    stream = filter(1, [1 -preemp], stream)';
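The trapezoidal overlap-add at the end of the frame loop relies on the rising ramp and its mirror image summing to one across the overlap region. The sketch below checks this with two hypothetical constant frames; it uses `linspace`, which yields the same values as the `0:1/(msoverlap-1):1` expression in the listing.

```matlab
% Sketch only: frameA and frameB are hypothetical all-ones residual frames.
msfr = 320;                           % frame increment in samples
msfs = 480;                           % frame size in samples
msoverlap = msfs - msfr;              % 160 overlapping samples

ramp = linspace(0, 1, msoverlap)';    % rising edge of the trapezoid

frameA = ones(msfs, 1);
frameB = ones(msfs, 1);

% Tail of frame A fades out while the head of frame B fades in
overlap = frameA(msfr+1:msfs) .* flipud(ramp);
blended = overlap + frameB(1:msoverlap) .* ramp;
```

`blended` stays at 1 throughout, so the cross-fade introduces no amplitude dip or bump where consecutive residual frames are stitched together.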

Speech model one, LPC vocoder:

    function [ outspeech ] = speechcoder1( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    % Outputs:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 8000;      % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % decode/synthesize speech using LPC and impulse-trains as excitation
    outspeech = synlpc(aCoeff, pitch, Fs, G);

    % Voice-excited LPC vocoder
    function [ outspeech ] = speechcoder2( inspeech )
    % Parameters:
    %   inspeech  : wave data with sampling rate Fs
    %               (Fs can be changed underneath if necessary)
    % Output:
    %   outspeech : wave data with sampling rate Fs
    %               (coded and resynthesized)
    if ( nargin ~= 1)
        error('argument check failed');
    end;
    Fs = 16000;     % sampling rate in Hertz (Hz)
    Order = 10;     % order of the model used by LPC
    % encode the speech using LPC
    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);
    % perform a discrete cosine transform on the residual
    resid = dct(resid);
    [a,b] = size(resid);
    % only use the first 50 DCT coefficients; this can be done
    % because most of the energy of the signal is conserved in these coeffs
    resid = [ resid(1:50,:); zeros(430,b) ];
    % quantize the data
    resid = uencode(resid,4);
    resid = udecode(resid,4);
    % perform an inverse DCT
    resid = idct(resid);
    % add some noise to the signal to make it sound better
    noise = [ zeros(50,b); 0.01*randn(430,b) ];
    resid = resid + noise;
    % decode/synthesize speech using LPC and the compressed residual as excitation
    outspeech = synlpc2(aCoeff, resid, Fs, G);

