Huffman Compression
Đại học Đà Nẵng (University of Da Nang) - 2008
1. Introduction:

In today's information technology and telecommunications, transmitting information is a routine task. The data to be transmitted, however, is often very large, which makes transmission costly: it consumes network resources and system capacity. Compression algorithms were developed to address this problem.

The earliest is run-length coding, RLC (Run Length Coding), which detects runs of repeated bits or characters. It is the simplest method. Its basic principle is to detect a character that occurs consecutively more times than some fixed threshold. Such a run is replaced by 3 characters: the first is a special character signalling that what follows is a special sequence, the second gives the number of repetitions, and the third gives the repeated character. The idea is therefore to replace a run with a shorter sequence whenever the run reaches a threshold, which is usually 4.

Next is the Huffman method, which is based on a statistical model: it counts how often each character occurs, then assigns short codewords to frequent characters and long codewords to rare ones. The code table must be stored together with the compressed data.

A completely different approach is dictionary-based compression. There are 2 kinds:
+ static dictionary coding
+ dynamic dictionary coding
Many algorithms apply this technique, such as LZ77, LZK, LZSS, LZH, ..., but this report covers only two main algorithms:
+ The LZ78 algorithm.
+ The LZW algorithm.
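The run-length scheme described in the introduction can be sketched in Python as below. This is a minimal sketch under assumptions the report does not fix: the escape marker is the hypothetical byte value 0x1B, the repeat count is stored in a single byte (so runs are capped at 255), and every occurrence of the escape byte itself is written as a triple so that decoding stays unambiguous. The threshold of 4 follows the report.

```python
ESC = 0x1B       # hypothetical escape byte; the report does not fix a value
THRESHOLD = 4    # runs of at least this length are replaced (value from the report)


def rle_encode(data):
    """Replace long runs with the triple (ESC, count, byte)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        # Count repeats; cap at 255 so the count fits in one byte.
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if run >= THRESHOLD or byte == ESC:
            # Escape bytes are always written as a triple so decoding is unambiguous.
            out += bytes([ESC, run, byte])
        else:
            out += bytes([byte]) * run  # short run: copy literally
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand every (ESC, count, byte) triple; copy other bytes unchanged."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            count, byte = data[i + 1], data[i + 2]
            out += bytes([byte]) * count
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


encoded = rle_encode(b"AAAAAAB")
print(encoded, rle_decode(encoded))
```

Here the six-byte run of "A" shrinks to three bytes, while runs shorter than the threshold are copied unchanged because a triple would not save space.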
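The Huffman method outlined above (merge the two lightest symbols repeatedly, then label left branches 0 and right branches 1) can be sketched in Python as follows. This is a minimal sketch, not the report's program: it uses the standard `heapq` module as the priority queue, returns codes as strings of '0'/'1' characters instead of packed bits, assumes a non-empty input, and does not serialize the tree, although a real compressor must store the code table with the compressed data.

```python
import heapq
from collections import Counter
from itertools import count


class Node:
    """A Huffman tree node; leaves carry a character, internal nodes do not."""

    def __init__(self, weight, char=None, left=None, right=None):
        self.weight = weight
        self.char = char
        self.left = left
        self.right = right


def build_tree(text):
    """Steps 1-2: repeatedly merge the two lightest nodes until one remains."""
    if not text:
        raise ValueError("input must be non-empty")
    tie = count()  # tie-breaker so heapq never has to compare Node objects
    heap = [(w, next(tie), Node(w, ch)) for ch, w in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = Node(w1 + w2, left=left, right=right)
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def make_codes(node, prefix="", table=None):
    """Step 3: label left branches '0' and right branches '1'."""
    if table is None:
        table = {}
    if node.char is not None:
        # A one-symbol input has a bare leaf as root; give it the code "0".
        table[node.char] = prefix or "0"
    else:
        make_codes(node.left, prefix + "0", table)
        make_codes(node.right, prefix + "1", table)
    return table


def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, root):
    """Walk down from the root bit by bit; emit a character at each leaf."""
    out = []
    node = root
    for bit in bits:
        if root.char is None:  # a bare-leaf root needs no walking
            node = node.left if bit == "0" else node.right
        if node.char is not None:
            out.append(node.char)
            node = root
    return "".join(out)


root = build_tree("abracadabra")
codes = make_codes(root)
bits = huffman_encode("abracadabra", codes)
print(len(bits), huffman_decode(bits, root))
```

For "abracadabra" the encoded string is 23 bits long. The exact codewords depend on how ties between equal weights are broken, but the total length does not, and the frequent "a" never receives a longer code than the rare "c".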
Jacob Ziv and Abraham Lempel described dictionary-based techniques in their LZ77 and LZ78 coders. The idea is to replace a group of characters with a pointer to an earlier occurrence of that group. LZW belongs to the same LZ family, improves on LZ77 and LZ78, and is widely used today. Because of limited scope, this report presents only a few compression algorithms, discusses some of their advantages and disadvantages, and compares them to highlight the strengths of LZW compression.

2. Content:

2.1. The Huffman coding method

2.1.1. Principle: The Huffman method encodes the bytes of the source file with variable-length binary codes, i.e. sequences of bits. It is a statistical compression method: characters that occur more often receive shorter codes.

2.1.2. Algorithm:

Compression:
Step 1: Take the two characters with the smallest weights (occurrence counts) and merge them into one; the weight of the new character is the sum of the weights of the two merged characters.
Step 2: While the list still contains more than one character, repeat step 1; otherwise go to step 3.
Step 3: Split the last remaining character back apart to obtain a binary tree, labeling every left branch 0 and every right branch 1. The codeword of each character is the sequence of labels on the path from the root to its leaf.

Decompression:
Step 1: Read the compressed file bit by bit, walking down the binary tree from the root until a leaf is reached. Write that leaf's character to the decompressed file and return to the root.
Step 2: While the end of the compressed file has not been reached, repeat step 1; otherwise go to step 3.
Step 3: End the algorithm.

Some limitations of Huffman coding:
+ Huffman coding can only be applied when the character frequencies are known.
+ Huffman coding only removes redundancy in the distribution of characters.
+ Static Huffman coding requires building, in advance, a binary tree that covers all possibilities. This takes considerable time because the type of data to be compressed is not known beforehand.
+ Decompression is complicated because the length of each code is not known in advance; it only becomes known once the character has been found.

2.2. The LZ78 coding method

Instead of reporting the position of a passage repeated earlier in the text, LZ78 numbers all phrases: each new phrase is recorded as the number of an earlier phrase plus one character that makes it differ from that earlier phrase. Every new phrase is therefore an earlier phrase extended by one new character, which is why it differs from all earlier phrases.

Example: consider the text aaabbabaabaaabab. LZ78 splits it into the following phrases:

Phrase   Input   Output
1        a       0+a
2        aa      1+a
3        b       0+b
4        ba      3+a
5        baa     4+a
6        baaa    5+a
7        bab     4+b
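The LZ78 phrase splitting in the example can be checked with a short Python sketch. It is a minimal, in-memory sketch rather than the report's program: the dictionary maps phrases to numbers, number 0 stands for the empty phrase, and a phrase left unfinished at the end of the input is emitted with an empty character, a convention the report does not specify.

```python
def lz78_encode(text):
    """Split text into LZ78 phrases; return (prefix number, new char) pairs."""
    dictionary = {"": 0}  # number 0 stands for the empty phrase
    pairs = []
    w = ""  # longest phrase read so far that is already in the dictionary
    for ch in text:
        if w + ch in dictionary:
            w += ch  # keep extending a known phrase
        else:
            pairs.append((dictionary[w], ch))  # earlier phrase + one new char
            dictionary[w + ch] = len(dictionary)  # new phrase gets the next number
            w = ""
    if w:
        # Input ended inside a known phrase: emit its number with no new char.
        pairs.append((dictionary[w], ""))
    return pairs


def lz78_decode(pairs):
    """Rebuild each phrase as phrase[number] + char, numbering phrases as we go."""
    phrases = [""]  # phrases[0] is the empty phrase
    out = []
    for number, ch in pairs:
        s = phrases[number] + ch
        out.append(s)
        phrases.append(s)
    return "".join(out)


pairs = lz78_encode("aaabbabaabaaabab")
print(pairs)
print(lz78_decode(pairs))
```

Running `lz78_encode` on the example text reproduces the seven pairs in the table, and `lz78_decode` rebuilds the original text from them.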
For this example, the compressed output is therefore: (0,a); (1,a); (0,b); (3,a); (4,a); (5,a); (4,b)

Compression algorithm:
Step 1: Start with an empty dictionary in which index 0 denotes the empty phrase; ww := '' (the current phrase).
Step 2: while not eof(f) do
    begin
      read the next character -> ch;
      w := ww + ch;
      if w is in the dictionary then ww := w
      else
        begin
          Code(ww, j);   {j := dictionary index of ww, 0 if ww is empty}
          write j and ch to the compressed file;
          add w to the dictionary;
          ww := '';
        end;
    end;
Step 3: if ww is not empty then Code(ww, j) and write j to the compressed file; stop the program.

Decompression algorithm:
Step 1: Start with a dictionary containing only entry 0, the empty phrase.
Step 2: while not eof(f) do
    begin
      read the next pair (j, ch);
      s := dictionary[j] + ch;
      write s to the decompressed file;
      add s to the dictionary;
    end;
Step 3: Stop the program.

Evaluation: in general, LZ78 is a fairly good text compression algorithm with a relatively fast running time, but its potential for saving space is not fully exploited.

2.3. The LZW coding method

This algorithm is derived from LZ78. As seen above, LZ78 stores the character that follows each phrase, which wastes memory and limits compression efficiency. LZW removes the character after each phrase, so the output for each phrase contains only a pointer (a dictionary index). To do so, it prepares a list of phrases that initially holds every character of the input alphabet and then extends that alphabet: additional codes are used to represent strings of ordinary characters. To apply LZW to 8-bit ASCII, the alphabet is extended by using codes of 9 bits or more; the 256 additional codes provided by 9-bit codes are used to store strings chosen from the strings in the source. The algorithm does not achieve a high compression ratio under the following conditions:
+ The source is not homogeneous, and its redundancy characteristics change throughout the file.
+ The source is considerably longer than the string table can accommodate.

Compression algorithm:
Step 1: Scan the input to build the initial dictionary of characters, write it to the compressed file, and read the first character -> w.
Step 2: while not eof(f) do
    begin
      read the next character -> ch;
      TIMCHU(w + ch, tl);   {tl := true if w + ch is already in the dictionary}
      if tl = true then w := w + ch
      else
        begin
          Code(w, j);   {j := dictionary index of w}
          write j to the compressed file;
          add w + ch to the dictionary;
          w := ch;
        end;
    end;
Step 3: Code(w, j); write j to the compressed file; stop the program.

Decompression algorithm:
Step 1: Read the dictionary stored in the compressed file; read the first code, decode it into w, and write w to the decompressed file.
Step 2: while not eof(f) do
    begin
      read the next code -> b;
      Decode(b, s, t);   {t := true if b is already in the dictionary; s := its string}
      if t = false then s := w + w(1);   {special case: b is the entry being defined right now}
      write s to the decompressed file;
      add w + s(1) to the dictionary;
      w := s;
    end;
Step 3: Stop the program.

3. Conclusion:

In the other methods, the encoder outputs a pair <i, S>, where i is an integer index pointer and S is a string. This output is rather redundant and inefficient. LZW overcomes the problem: its encoder outputs only the integer index and drops the trailing string S. LZW thus eliminates memory waste that earlier algorithms could not fully avoid, and it also relaxes the rigidity of the compression algorithm, making it more flexible and more attractive to users.

This report has presented several commonly used compression algorithms, giving an overview of data compression, and has described two algorithms, LZ78 and LZW, in more detail. They can be considered representative algorithms of the LZ family and a foundation for the better compression algorithms that followed. My application so far only compresses data from *.TXT files. Beyond that, I plan to extend it to practical use so that it can compress data from more kinds of sources, and to keep learning so that I can use improved, more efficient algorithms to make the program faster and achieve higher compression ratios.
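The LZW compression and decompression steps described in this report can be sketched in Python as follows. As in the report's Step 1, the initial dictionary is built from the characters that occur in the input and is returned alongside the codes; sorting that alphabet is an assumption, since the report does not say how the dictionary is ordered or stored. Codes are kept as plain integers rather than packed into 9-bit or wider fields, so the sketch shows the dictionary logic only, not an on-disk format.

```python
def lzw_encode(text):
    """Return (initial alphabet, list of integer codes)."""
    # Step 1: the initial dictionary holds the distinct input characters.
    alphabet = sorted(set(text))
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    w = ""
    for ch in text:
        if w + ch in dictionary:
            w += ch  # w + ch is already known: keep extending the phrase
        else:
            codes.append(dictionary[w])  # output the index only, no trailing char
            dictionary[w + ch] = len(dictionary)
            w = ch
    if w:
        codes.append(dictionary[w])  # Step 3: flush the last phrase
    return alphabet, codes


def lzw_decode(alphabet, codes):
    """Rebuild the dictionary on the fly while decoding the codes."""
    if not codes:
        return ""
    dictionary = list(alphabet)
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code < len(dictionary):
            s = dictionary[code]
        else:
            # Special case: the code names the entry being defined right now.
            s = w + w[0]
        out.append(s)
        dictionary.append(w + s[0])
        w = s
    return "".join(out)


alphabet, codes = lzw_encode("aaabbabaabaaabab")
print(alphabet, codes)
print(lzw_decode(alphabet, codes))
```

On the LZ78 example text aaabbabaabaaabab this sketch emits ten codes with no trailing characters. The second code (2) refers to the entry "aa" that the decoder has not built yet, which exercises the special case w + w(1).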