5.1 Introduction to image compression

(1) Basic concepts

Data compression
Data compression is the process of reducing the "redundant" information in the original data, so that the amount of information left after compression is usually much smaller than the original. For image data the result is typically 10 : 1, and some methods do better: fractal compression gives a compression ratio of 30 : 1. Besides "data compression", other names are also used, such as redundancy reduction and source image coding.

Compression rate
The compression rate is one of the most important characteristics of any compression method. In general it is defined as:

compression rate = 1/r (expressed in %)

where r is the compression ratio:

r = size of the original data / size of the data after compression

The compression efficiency is therefore (1 - compression rate), also expressed in %. For example, r = 10 gives a compression rate of 10% and an efficiency of 90%.

(2) Types of data redundancy

(i) Character distribution
In a sequence of characters, some characters occur more often than others. Frequent characters are replaced by binary codewords with few bits; infrequent ones are encoded by codewords with more bits.

(ii) Character repetition
The technique used in this case replaces a repeated run by a new sequence with two components: the repeat count and the symbol being coded. This kind of coding is called run-length coding, RLC (Run Length Coding).

(iii) Positional redundancy
Because data values depend on one another, knowing the symbol (value) at one position sometimes makes it possible to predict the values at other positions. For example, when an image is represented on a two-dimensional grid, some points in a column of a data block appear at the same position in different rows. Instead of storing the data itself, we then only need to store the row and column positions.

(3) Classification of compression methods

First approach: based on the compression principle.
Exact or lossless compression: methods for which decompression recovers the original data exactly.
Lossy compression: methods for which decompression does not recover data identical to the original.

Second approach: based on how compression is carried out.
Spatial methods (Spatial Data Compression): methods that compress by acting directly on the image samples in the spatial domain.
Transform methods (Transform Coding): methods that act on a transform of the original image rather than directly on the image as the previous family does.

Third approach: based on the philosophy of the coding.
First-generation compression methods: methods whose computation is simple, for example sampling and codeword assignment.
Second-generation compression methods: methods based on the saturation level of the compression rate.

Fourth approach, proposed by Anil K. Jain:
Pixel-based methods.
Predictive methods.
Transform-based methods.
Hybrid methods.
This classification is in fact a subdivision of the third one, based on the mechanism by which compression is carried out.

5.2 Run-length coding

Run-length coding was originally developed for bi-level digital images with a black level (1) and a white level (0), such as text on a white background, printed pages and technical drawings.

The principle is to detect a run of repeated bits, for example a run of 0 bits between two 1 bits or, conversely, a run of 1 bits between two 0 bits. Such a sequence of repeated bits is called a run. The run is replaced by a new sequence carrying two pieces of information: the run length and the repeated bit (repeated character). The replacement is therefore shorter than the sequence it replaces. Note that a run may be longer than 255, so its length does not always fit in one byte.

Suppose the runs consist of M bits. For convenience, let M = 2^m - 1. Each run is then replaced by a new run of m bits, so every run is encoded by a codeword of the same length. It has been calculated that with M = 15 we get m = 4 and a compression ratio of 1.95.

With a fixed length, the algorithm is simple to implement. However, the compression achieved is not as good as with a variable length, which is known as adaptive RLC.

5.3 Huffman coding

Huffman coding is based on a statistical model. The frequency of occurrence of each character (1 byte) is computed from the original data by scanning the original file sequentially from beginning to end. Characters with high frequency are assigned short codewords and characters with low frequency are assigned long ones: the higher the frequency, the shorter the code.

The algorithm has two main phases:

Phase one: computing the character frequencies in the original data. Scan the original file sequentially from beginning to end to build the code table, then sort the table in order of decreasing frequency.

Phase two: encoding. Scan the frequency table from the end toward the beginning and merge the two elements with the lowest frequencies into a single element whose frequency is the sum of the two. Update the table and remove the two elements just processed. Repeat until the table holds a single element. This process is called building the Huffman code tree, because the merging is carried out as a binary tree with two branches.
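The merge loop of the two phases can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it uses a priority queue (`heapq`) in place of the sorted table, and the function name `huffman_code_lengths` and the sample input are illustrative. It returns the codeword length of each byte, which is the depth of its leaf in the tree.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return {byte value: codeword length in bits} for the given data."""
    freq = Counter(data)                      # phase one: one sequential pass
    if len(freq) == 1:                        # a lone symbol still needs 1 bit
        return {next(iter(freq)): 1}
    tie = count()                             # tie-breaker so lists are never compared
    # Each heap item: (frequency, tie-breaker, symbols below this node)
    heap = [(f, next(tie), [sym]) for sym, f in freq.items()]
    heapq.heapify(heap)
    depth = dict.fromkeys(freq, 0)
    while len(heap) > 1:                      # phase two: merge the two rarest
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:             # every leaf below moves one level down
            depth[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return depth


lengths = huffman_code_lengths(b"aaaabbbccd")
total_bits = sum(lengths[s] * n for s, n in Counter(b"aaaabbbccd").items())
```

For this sample, `a` (4 occurrences) gets a 1-bit code, `b` (3) a 2-bit code, and `c` and `d` 3-bit codes, for a total of 19 bits against 80 bits at one byte per character.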
The element with the lower frequency goes on the right and the other on the left. With this construction, all data characters are leaf nodes and the internal nodes are merged nodes. Once the tree is built, codes are assigned to the leaf nodes: each step down to the right appends a "1" bit to the codeword, and each step down to the left appends a "0" bit. The opposite convention works equally well; only the code values change, while the total length stays the same.

5.4 The LZW method

The idea of dictionary compression was first introduced by Abraham Lempel and Jacob Ziv in 1977 and later developed into the LZ family of dictionary compression algorithms. In 1984 Terry Welch improved the LZ algorithm into a new, more efficient algorithm named LZW. Dictionary compression builds a dictionary of strings that recur often and replaces each string with its codeword whenever it is met again.

The LZW compression algorithm builds a dictionary of the patterns that occur frequently in the image. The dictionary is a set of pairs, each made of a word and its meaning: the words are codewords arranged in a fixed order, and the meaning is a substring of the image data. The dictionary is built while the data is being read. The presence of a substring in the dictionary means that the substring has already occurred in the data read so far. The algorithm continually looks up and updates the dictionary after reading each input character.

Because memory is finite and lookups must stay fast, the dictionary is limited to 4096 entries, holding at most 4096 codeword values. The maximum codeword length is therefore 12 bits (4096 = 2^12). The codes are allocated as follows:

Codes 0 to 255 hold the basic characters.
Code 256 holds a special code, the clear code (Clear Code). The clear code handles the case where the number of repeated patterns in the image exceeds 4096. The image is then treated as several image pieces, and the dictionary as a set of several sub-dictionaries. At the end of each piece a clear code is sent to signal the end of the old piece and the start of a new one, and the dictionary is reinitialized for the new piece.
Code 257 holds the end-of-information code (EOI, End Of Information). A GIF file can contain several images, each encoded separately. The decoder repeats the decoding operation image by image and stops when it meets the end-of-information code.
The remaining codes (258 to 4095) hold the patterns that recur in the image.

The first 512 entries of the dictionary are represented with 9 bits. Codes 512 to 1023 are represented with 10 bits, codes 1024 to 2047 with 11 bits, and codes 2048 to 4095 with 12 bits.
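The code layout above can be written down as a few constants and two helpers. This is a minimal Python sketch; the names `CLEAR_CODE`, `EOI_CODE`, `FIRST_FREE`, `MAX_ENTRIES`, `init_dictionary` and `code_width` are illustrative assumptions, not identifiers from any library. `code_width` simply maps a code value to the width listed in the text; real encoders typically choose the width from how many entries the dictionary currently holds.

```python
CLEAR_CODE = 256        # "clear code": start a new piece, reset the dictionary
EOI_CODE = 257          # end of information
FIRST_FREE = 258        # first code available for learned patterns
MAX_ENTRIES = 4096      # dictionary limit, 2**12


def init_dictionary() -> dict[bytes, int]:
    """Dictionary holding only the 256 basic characters (codes 0 to 255)."""
    return {bytes([i]): i for i in range(256)}


def code_width(code: int) -> int:
    """Bits used to represent a code: 9 below 512, then 10, 11 and 12."""
    if not 0 <= code < MAX_ENTRIES:
        raise ValueError("code outside the 4096-entry dictionary")
    return max(9, code.bit_length())
```

For instance, `code_width(262)` is 9 and `code_width(2048)` is 12, matching the ranges given above.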
Consider the input string "ABCBCABCABCD" (the ASCII code of A is 65, B is 66, and so on). The dictionary initially holds the 256 basic characters. The decompression algorithm is described as follows:
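    FIRST_CODE := TRUE;
    while (GetNextCode() != EOI) do
    Begin
        If code = CC                      /* clear code */
        Then Begin
            InitDictionary();
            FIRST_CODE := TRUE;
        End
        Else If FIRST_CODE                /* first code of each image piece */
        Then Begin
            OutBuff(code);
            OldStr := code;
            FIRST_CODE := FALSE;
        End
        Else Begin
            /* DeCode must also accept a code not yet in the dictionary: */
            /* it then stands for OldStr + FirstChar(OldStr)             */
            NewStr := DeCode(code);
            OutBuff(NewStr);
            AddtoDictionary(OldStr + FirstChar(NewStr));
            OldStr := NewStr;
        End;
    End;

For the input string "ABCBCABCABCD", the input has size 12 * 8 = 96 bits (twelve 8-bit characters).

The strings added to the dictionary during compression are AB - BC - CB - BCA - ABC - CA - ABCD (as in the table), and the output code sequence is:

65 - 66 - 67 - 259 - 258 - 67 - 262

followed by 68 for the final D. Leaving that final character aside, the output has size 4 * 8 + 3 * 9 = 59 bits (four 8-bit characters: 65, 66, 67, 67; and three 9-bit codes: 259, 258, 262).

The compression ratio is 96 : 59 ≈ 1.63.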
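The worked example can be checked with a short Python sketch of both directions. It is a simplified illustration under stated assumptions: the function names `lzw_compress` and `lzw_decompress` are hypothetical, codes are kept as a list of integers rather than packed into bits, no clear or EOI codes are emitted, and the 4096-entry limit is not enforced. New entries start at code 258, as in the text.

```python
def lzw_compress(data: bytes, first_free: int = 258) -> list[int]:
    """Return the LZW code sequence for data (no clear/EOI codes, no bit packing)."""
    table = {bytes([i]): i for i in range(256)}   # the 256 basic characters
    next_code = first_free
    codes: list[int] = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit the longest known prefix
            table[candidate] = next_code          # learn the new pattern
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])              # flush what is left at end of input
    return codes


def lzw_decompress(codes: list[int], first_free: int = 258) -> bytes:
    """Rebuild the original bytes, growing the same dictionary as the compressor."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = first_free
    old = table[codes[0]]                         # first code is always a basic character
    out = [old]
    for code in codes[1:]:
        if code in table:
            new = table[code]
        elif code == next_code:
            new = old + old[:1]                   # code used before the decoder has stored it
        else:
            raise ValueError(f"invalid code {code}")
        out.append(new)
        table[next_code] = old + new[:1]          # same entry the compressor added
        next_code += 1
        old = new
    return b"".join(out)


codes = lzw_compress(b"ABCBCABCABCD")
restored = lzw_decompress(codes)
```

Here `codes` is `[65, 66, 67, 259, 258, 67, 262, 68]`: the seven codes listed in the example plus a final 68 for the trailing D, and `restored` equals the original string.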