
CHAPTER 5

DICTIONARY TECHNIQUES
Lecture by Prof. M. Dhanalakshmi,
Asst. Prof., IT Dept,
SCET, Surat.
DICTIONARY TECHNIQUES - INTRODUCTION
 Dictionary techniques are lossless compression
techniques.
 They are used when the source contains long phrases
and long sentences that recur.
 Just as a normal English dictionary maintains an
index to identify the meanings of words, these
techniques use an index to identify patterns.
 These techniques, both static and adaptive (or
dynamic), build a list of commonly occurring
patterns and encode these patterns by
transmitting their index in the list.
 Dictionary techniques are of two types:
 1. Static Dictionary
 2. Dynamic (or Adaptive) Dictionary

 Static Dictionary:
 The long phrases are encoded using a dictionary built
in advance from existing knowledge of the source.
 Dynamic (or Adaptive) Dictionary:
 The dictionary is updated as new phrases are
encountered, so dictionary updating and encoding
proceed in parallel.
 A very reasonable approach to encoding long phrases
is to keep a list, or dictionary, of frequently occurring
patterns.
 When these patterns appear in the source output,
they are encoded with a reference to the dictionary.
 In effect we are splitting the input into two
classes, frequently occurring patterns and
infrequently occurring patterns.
 For this technique to be effective, the class of
frequently occurring patterns, and hence the size
of the dictionary, must be much smaller than the
number of all possible patterns.
TYPES OF DICTIONARY TECHNIQUES

Dictionary
 Static: Digram Coder
 Dynamic: LZ77, LZ78, LZW


STATIC DICTIONARY
 A static dictionary is most appropriate when
considerable prior knowledge about the source is
available.
 If the task were to compress the student records
at a university, a static dictionary approach may
be the best.
 This is because we know ahead of time that
certain words such as “Name” and “Student ID”
are going to appear in almost all of the records.
 A static dictionary technique that is less
specific to a single application is digram
coding.
DIGRAM CODING
 The dictionary consists of all letters of the source
alphabet followed by as many pairs of letters,
called digrams, as can be accommodated by the
dictionary.
 Steps:
 The digram encoder reads a two-character input and
searches the dictionary to see if this input exists in
the dictionary.
 If it does, the corresponding index is encoded and
transmitted.
 If it does not, the first character of the pair is
encoded. The second character in the pair then
becomes the first character of the next digram.
 The encoder reads another character to complete the
digram, and the search procedure is repeated.
 Example:
 Suppose we have a source with a five-letter
alphabet A = {a, b, c, d, r}. Based on knowledge about
the source, we build the dictionary shown in
Table:

 Suppose we wish to encode the sequence:

 The encoder reads the first two characters ab and
checks to see if this pair of letters exists in the
dictionary.
 It does and is encoded using the codeword 101.
 The encoder then reads the next two characters ra
and checks to see if this pair occurs in the dictionary.
 It does not, so the encoder sends out the code for r,
which is 100, then reads in one more character, c, to
make the two-character pattern ac.
 This does exist in the dictionary and is encoded as
110.
 Continuing in this fashion, the remainder of the
sequence is coded.
 The output string for the given input sequence is
101100110111101100000.
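The digram coding steps and the walkthrough above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's own code: the dictionary table and the input sequence are not reproduced in the text, so the table below only keeps the three codewords quoted in the walkthrough (ab = 101, r = 100, ac = 110) and fills in the rest with assumed 3-bit codes, and the input abracadabra is an assumed sequence chosen because it reproduces the output string quoted above. The names DIGRAM_DICT and digram_encode are hypothetical.

```python
# Assumed dictionary: only ab -> 101, r -> 100 and ac -> 110 are quoted
# in the text; the remaining codewords are illustrative assumptions.
DIGRAM_DICT = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}


def digram_encode(text, dictionary=DIGRAM_DICT):
    """Encode text with a static digram dictionary of fixed-length codes."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # The two-character input exists in the dictionary: send its index.
            out.append(dictionary[pair])
            i += 2
        else:
            # Otherwise encode only the first character; the second character
            # becomes the first character of the next digram.
            out.append(dictionary[text[i]])
            i += 1
    return "".join(out)


# Assumed input sequence (not given in the text).
encoded = digram_encode("abracadabra")
print(encoded)
```

Every codeword has the same length (3 bits for 8 entries), so the decoder can split the bit string into fixed-size groups and look each one up in the same table.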
ADAPTIVE DICTIONARY
 LZ77 Approach

 LZ78 Approach

 LZW Approach
 LZ77/LZ1/Sliding Window Technique
 Sliding window is divided into two parts:
 1. Search Buffer
 2. Look Ahead Buffer
 Triplet of the form <o,l,c>
 o -> offset
 l -> length of match
 c -> codeword of the symbol that follows the match
 LZ77 Approach:
 In the LZ77 approach, the dictionary is simply a
portion of the previously encoded sequence.
 The encoder examines the input sequence through a
sliding window.
 The window consists of two parts:
 1. A search buffer that contains a portion of the
recently encoded sequence.
 2. A look-ahead buffer that contains the next portion
of the sequence to be encoded.
 To encode the sequence in the look-ahead buffer,
the encoder moves a search pointer back through
the search buffer until it encounters a match to
the first symbol in the look-ahead buffer.
 Offset: The distance of the pointer from the look-
ahead buffer is called the offset.
 The encoder then examines the symbols following
the symbol at the pointer location to see if they
match consecutive symbols in the look-ahead
buffer.
 Length of match:
 The number of consecutive symbols in the search
buffer that match consecutive symbols in the look-
ahead buffer, starting with the first symbol, is called
the length of the match.
 Once the longest match has been found, the
encoder encodes it with a triple <o,l,c>, where o is
the offset, l is the length of the match, and c is
the codeword corresponding to the symbol in the
look-ahead buffer that follows the match.
 Three different possibilities that may be
encountered during the coding process:
 1. There is no match for the next character to be
encoded in the window.
 2. There is a match.
 3. The matched string extends inside the look-ahead
buffer.
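The search procedure and the <o,l,c> triple described above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation. Several choices are assumptions made for illustration: the function and parameter names are hypothetical, the raw symbol is emitted in place of its codeword C(c), the closest match (smallest offset) wins among equally long matches, and the match length is capped so that one symbol always remains in the look-ahead buffer to serve as c. The sample input is the sequence cabracadabrarrarrad with a search buffer of 7 and a look-ahead buffer of 6, the values used in this section's example. A small decoder is included only to check the round trip.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Return a list of (offset, length, next_symbol) triples."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Leave one symbol in the look-ahead buffer for the third element c.
        max_length = min(lookahead_size, len(data) - pos) - 1
        # Move the search pointer back through the search buffer.
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            # The match may run past the search buffer into the look-ahead
            # buffer, because pos - offset + length can reach pos or beyond.
            while (length < max_length
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1
    return triples


def lz77_decode(triples):
    """Rebuild the sequence by copying `length` symbols from `offset` back."""
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # symbol-by-symbol copy allows overlap
        out.append(symbol)
    return "".join(out)


triples = lz77_encode("cabracadabrarrarrad")
for t in triples:
    print(t)
```

Copying one symbol at a time in the decoder is what makes the third case work: when the match extends into the look-ahead buffer, the symbols being copied are produced just before they are needed.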
 Example:
 Sequence: cabracadabrarrarrad
 Window size = 13
 Look-ahead buffer (LAB) = 6 [Given in problem]
 Search buffer (SB) = 7
 i) c -> <0,0,C(c)> (SB is empty, no match)
 ii) a -> <0,0,C(a)> (SB: c, no match)
 iii) b -> <0,0,C(b)> (SB: ca, no match)
 iv) r -> <0,0,C(r)> (SB: cab, no match)
 v) a -> <3,1,C(c)> (SB: cabr; "a" matches at offset 3,
the next symbol is c)
 vi) a -> <2,1,C(d)> (SB: cabrac; "a" matches at offset 2,
the next symbol is d)
 vii) a -> <7,4,C(r)> (SB: abracad; "abra" matches at
offset 7, the next symbol is r)
 viii) r -> <3,5,C(d)> (SB: adabrar; "rarra" matches at
offset 3 and the match extends into the LAB, the next
symbol is d)
 Final encoded triplet sequence:
 <0,0,C(c)>

 <0,0,C(a)>

 <0,0,C(b)>

 <0,0,C(r)>

 <3,1,C(c)>

 <2,1,C(d)>

 <7,4,C(r)>

 <3,5,C(d)>
LZ77 DECODING
LZ78 ENCODING
 The encoder output is represented as a doublet
<i,c>.
 i is the index of the dictionary entry that is the
longest match to the input.
 c is the code for the character in the input
following the matched portion of the input.
 If no dictionary entry matches the input, the index
value is 0.
 Each encoded doublet also becomes a new entry in
the dictionary.
 Example:
 Encode the sequence using the LZ78 approach:
 ABCDABCABCDAABCABCE

Encoder output   Index   Entry
<0,C(A)>         1       A
<0,C(B)>         2       B
<0,C(C)>         3       C
<0,C(D)>         4       D
<1,C(B)>         5       AB
<3,C(A)>         6       CA
<2,C(C)>         7       BC
<4,C(A)>         8       DA
<5,C(C)>         9       ABC
<9,C(E)>         10      ABCE
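The table above can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions: the function name lz78_encode is hypothetical, the raw character is emitted in place of its code C(c), dictionary indices start at 1 as in the table, and the handling of an input that ends in the middle of a known phrase is an assumed convention that the example does not exercise.

```python
def lz78_encode(data):
    """Return (pairs, dictionary) where pairs is a list of (index, symbol)."""
    dictionary = {}   # phrase -> index, indices start at 1
    pairs = []
    phrase = ""       # longest match found so far
    for symbol in data:
        if phrase + symbol in dictionary:
            # Keep extending the match while it is still a dictionary entry.
            phrase += symbol
            continue
        # Index 0 means no dictionary entry matched (phrase is empty).
        pairs.append((dictionary.get(phrase, 0), symbol))
        # The matched phrase plus the new symbol becomes the next entry.
        dictionary[phrase + symbol] = len(dictionary) + 1
        phrase = ""
    if phrase:
        # Assumed convention: flush a trailing match as (prefix index, last symbol).
        pairs.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return pairs, dictionary


pairs, dictionary = lz78_encode("ABCDABCABCDAABCABCE")
for (index, symbol), entry in zip(pairs, dictionary):
    print(f"<{index},C({symbol})>  {dictionary[entry]:>2}  {entry}")
```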
LZ78 DECODING
 The decoder receives the sequence of doublets:
<0,C(A)>, <0,C(B)>, <0,C(C)>, <0,C(D)>,
<1,C(B)>, <3,C(A)>, <2,C(C)>, <4,C(A)>, <5,C(C)>,
<9,C(E)>
 It rebuilds the same dictionary as it decodes:

Encoder output   Index   Entry
<0,C(A)>         1       A
<0,C(B)>         2       B
<0,C(C)>         3       C
<0,C(D)>         4       D
<1,C(B)>         5       AB
<3,C(A)>         6       CA
<2,C(C)>         7       BC
<4,C(A)>         8       DA
<5,C(C)>         9       ABC
<9,C(E)>         10      ABCE
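The way the decoder rebuilds this table can be sketched in Python as follows. This is a minimal sketch, assuming the doublets arrive as (index, symbol) tuples with the raw character in place of C(c); the function name lz78_decode is hypothetical.

```python
def lz78_decode(pairs):
    """Return (decoded_text, entries) from a list of (index, symbol) pairs."""
    entries = [""]    # index 0 stands for "no match", i.e. the empty phrase
    out = []
    for index, symbol in pairs:
        # Each doublet expands to an existing entry followed by one symbol,
        # and that expansion becomes the next dictionary entry.
        phrase = entries[index] + symbol
        entries.append(phrase)
        out.append(phrase)
    return "".join(out), entries[1:]


received = [
    (0, "A"), (0, "B"), (0, "C"), (0, "D"), (1, "B"),
    (3, "A"), (2, "C"), (4, "A"), (5, "C"), (9, "E"),
]
text, entries = lz78_decode(received)
print(text)
```

No dictionary is transmitted: the decoder adds entries in the same order as the encoder, so both sides hold identical tables at every step.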
 Final sequence after decoding:
 ABCDABCABCDAABCABCE
LZ78 DRAWBACK
 The dictionary keeps growing without bound.
 In a practical situation, we would have to stop
the growth of the dictionary at some stage, and
then either prune it back or treat the encoding as
a fixed dictionary scheme.
LZW ALGORITHM
 There are a number of ways the LZ78 algorithm
can be modified.
 The most well-known modification, one that
initially sparked much of the interest in the LZ
algorithms, is a modification by Terry Welch
known as LZW.
 Welch proposed a technique for removing the
necessity of encoding the second element of the
pair <i,c>.
 The encoder would only send the index to the
dictionary.
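Sending only the index works because the LZW dictionary is first loaded with every letter of the source alphabet, so even a single unmatched symbol already has an index. The Python sketch below illustrates this under stated assumptions: the function name lzw_encode, the 1-based indices assigned in sorted alphabet order, and the sample input ABABABA are all choices made for illustration and do not come from the lecture.

```python
def lzw_encode(data, alphabet):
    """Return the list of dictionary indices produced by LZW encoding."""
    # Prime the dictionary with all single letters of the source alphabet.
    dictionary = {s: i for i, s in enumerate(sorted(alphabet), start=1)}
    indices = []
    phrase = ""
    for symbol in data:
        if phrase + symbol in dictionary:
            # Keep extending the current match.
            phrase += symbol
        else:
            # Send only the index of the longest match ...
            indices.append(dictionary[phrase])
            # ... add the match plus the new symbol as a new entry ...
            dictionary[phrase + symbol] = len(dictionary) + 1
            # ... and start the next match from the unmatched symbol.
            phrase = symbol
    if phrase:
        indices.append(dictionary[phrase])
    return indices


print(lzw_encode("ABABABA", "AB"))
```

For the sample input the entries AB, BA and ABA are added as indices 3, 4 and 5 while encoding, and the output consists of indices only, with no character codes.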
