Unit - 5 - Dictionary Technique
5 Dictionary Techniques
Data Compression
Get Prepared Together
Prepared and Edited by:- Divya Kaurani Designed by:- Kussh Prajapati
www.collegpt.com collegpt@gmail.com
Unit - 5 : Dictionary Techniques
Dictionary Coding:
The algorithm searches for matches between the text to be compressed and a set of
strings held in a dictionary. When a match is found, the encoder replaces the matched
text with a reference to the string's position (index) in the dictionary. The encoded
data is decoded by replacing each reference with the text it represents.
1. Static Dictionary
2. Adaptive Dictionary
Static Dictionary:
The dictionary is fixed: it may allow additions, but not deletions. It is usually
application-specific or data-specific. Static dictionary compression (SDC) is a
technique that uses word-based dictionaries to compress short texts.
Digram coding and pattern substitution are better suited to text compression than
run-length encoding. Pattern substitution is especially effective for compressing
programming-language source code.
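In digram coding, the encoder reads two characters of the message at a time. The
dictionary consists of all letters in the source alphabet, followed by as many
pairs of letters (digrams) as the dictionary can accommodate. If the pair just read is
in the dictionary, its index is sent and the encoder moves on two characters;
otherwise, the index of the first character is sent and the encoder moves on one.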
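A minimal Python sketch of digram coding follows. The alphabet `abcdr` and the three digrams `ab`, `ra` and `ca` are hypothetical choices for illustration (in practice the pairs would be the most frequent ones in typical data), and the helper names `build_dictionary`, `digram_encode` and `digram_decode` are likewise hypothetical.

```python
def build_dictionary(alphabet, digrams):
    """Index single symbols first, then the chosen digrams (hypothetical helper)."""
    entries = list(alphabet) + list(digrams)
    return {entry: index for index, entry in enumerate(entries)}


def digram_encode(text, dictionary):
    """Read two characters at a time; fall back to one when the pair is unknown."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            codes.append(dictionary[pair])     # one code covers two symbols
            i += 2
        else:
            codes.append(dictionary[text[i]])  # send the first symbol on its own
            i += 1
    return codes


def digram_decode(codes, dictionary):
    """Replace every index by the dictionary entry it stands for."""
    inverse = {index: entry for entry, index in dictionary.items()}
    return "".join(inverse[code] for code in codes)


# Hypothetical dictionary: a=0, b=1, c=2, d=3, r=4, ab=5, ra=6, ca=7
dictionary = build_dictionary("abcdr", ["ab", "ra", "ca"])
print(digram_encode("abracadabra", dictionary))  # → [5, 6, 7, 3, 5, 6]
```

With this dictionary the 11-letter input becomes 6 codes; only the pair `da` is missing, so `d` is sent on its own.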
Adaptive Dictionary:
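In data compression, an adaptive dictionary is built and updated while the
compression process is running. As the encoder compresses the text, it adds strings
that are not yet in the dictionary (for example, letter triplets or words). The decoder
rebuilds the same dictionary from the data it decodes, so the dictionary does not have
to be transmitted.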
1. LZ77
2. LZ78
3. LZW
1. LZ77
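The encoder uses a sliding window made of a search buffer, which holds a portion of
the recently encoded sequence, and a look-ahead buffer, which holds the next symbols
to be encoded. A search pointer is moved back through the search buffer to find a
match for the pattern at the start of the look-ahead buffer. The encoder looks for the
longest match and sends
Code = (Offset, Max_match_length, New_symbol)
where Offset is the distance from the current position back to the start of the match,
Max_match_length is the length of the match, and New_symbol is the symbol in the
look-ahead buffer that follows the match.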
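• Any pattern that recurs over a period longer than the search buffer size will not be
captured.
• Increasing the look-ahead buffer size L (or the search buffer size S) makes longer
matches possible, which can improve compression efficiency, although the length (or
offset) field then needs more bits.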
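The LZ77 encoder described above can be sketched in Python as follows. The parameter names `search_size` (S) and `lookahead_size` (L), their default values, and the tie-breaking rule (the first longest match found wins) are assumptions for illustration; the triples are kept as Python tuples rather than being packed into bits.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Encode `data` as (offset, length, next_symbol) triples."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        # Move the search pointer back through the search buffer.
        for candidate in range(max(0, pos - search_size), pos):
            length = 0
            # A match may run on into the look-ahead buffer, but it must leave
            # room in that buffer (and in the input) for the next symbol.
            while (length < lookahead_size - 1
                   and pos + length < len(data) - 1
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:  # keep the first longest match found
                best_offset, best_length = pos - candidate, length
        next_symbol = data[pos + best_length]
        triples.append((best_offset, best_length, next_symbol))
        pos += best_length + 1
    return triples


def lz77_decode(triples):
    """Rebuild the data by copying `length` symbols starting `offset` back."""
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # symbol by symbol, so overlapping copies work
        out.append(symbol)
    return "".join(out)


print(lz77_encode("aaaa"))  # → [(0, 0, 'a'), (1, 2, 'a')]
```

In the second triple the match starts one position back and runs into the look-ahead buffer, which is why the decoder copies one symbol at a time.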
Improvements of LZ77
• Encode the triples with variable-length codes (VLC), as in PKZIP, ZIP, LHarc, PNG,
ARJ and WinZip.
LZSS
• Uses a flag bit to indicate whether what follows is the codeword for a single new
symbol or an (offset, length) pair, so a full triple is not spent when there is no match.
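A minimal LZSS sketch follows, with the flag stored as the first element of each token (1 for an (offset, length) pair, 0 for a literal symbol). The threshold `min_match`, the limits `search_size` and `max_length`, and their values are assumptions: the sketch sends a pair only when the match is long enough to be worth more than sending the symbols uncoded.

```python
def lzss_encode(data, search_size=4096, max_length=18, min_match=3):
    """Emit (1, offset, length) for a match or (0, symbol) for a literal."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        for candidate in range(max(0, pos - search_size), pos):
            length = 0
            while (length < max_length
                   and pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))  # flag 1: (offset, length)
            pos += best_length
        else:
            tokens.append((0, data[pos]))  # flag 0: one uncoded symbol
            pos += 1
    return tokens


def lzss_decode(tokens):
    """Check each token's flag and either copy a match or output a literal."""
    out = []
    for token in tokens:
        if token[0] == 1:
            _, offset, length = token
            start = len(out) - offset
            for k in range(length):
                out.append(out[start + k])  # overlapping copies are allowed
        else:
            out.append(token[1])
    return "".join(out)


print(lzss_encode("aaaaa"))  # → [(0, 'a'), (1, 1, 4)]
```

Unlike the LZ77 sketch, no next symbol is attached to a match, and an unmatched symbol costs one flag plus the symbol instead of a whole triple.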
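2. LZ78
In the LZ78 encoding table, the column POS indicates the current position in the input
data. The column DICTIONARY shows the string that has been added to the dictionary at
each step; the index of that string is equal to the step number. Each step outputs a
pair (index, symbol), where index points to the longest dictionary string that matches
the input at POS (0 if there is none) and symbol is the character that follows it.
The output of each step therefore decodes to the string that has been added to the
dictionary.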
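The table columns described above can be reproduced with a minimal Python LZ78 sketch. `lz78_encode` returns the output pairs together with one row per step (step, POS, DICTIONARY entry, output); the function names and the 1-based POS convention are assumptions made for illustration.

```python
def lz78_encode(data):
    """Return the output pairs and one (step, POS, DICTIONARY, output) row per step."""
    dictionary = {}   # string -> index; index 0 stands for the empty string
    pairs, rows = [], []
    pos = 0
    while pos < len(data):
        start = pos
        phrase, prefix_index = "", 0
        # Find the longest dictionary string that matches the input at POS.
        while pos < len(data) and phrase + data[pos] in dictionary:
            phrase += data[pos]
            prefix_index = dictionary[phrase]
            pos += 1
        # Append the next symbol; it is empty only if the input ends mid-match.
        symbol = data[pos] if pos < len(data) else ""
        pos += 1
        phrase += symbol
        step = len(pairs) + 1
        if symbol:
            dictionary[phrase] = step  # the new string's index is the step number
        pairs.append((prefix_index, symbol))
        rows.append((step, start + 1, phrase, (prefix_index, symbol)))  # POS is 1-based
    return pairs, rows


def lz78_decode(pairs):
    """Each pair decodes to (entry at index) + symbol, which is also the new entry."""
    entries = [""]
    out = []
    for prefix_index, symbol in pairs:
        phrase = entries[prefix_index] + symbol
        entries.append(phrase)
        out.append(phrase)
    return "".join(out)


pairs, rows = lz78_encode("ABBCBCABABCAABCAAB")
for step, pos, added, output in rows:
    print(step, pos, added, output)
```

For this input the sketch produces seven steps with outputs (0, A), (0, B), (2, C), (3, A), (2, A), (4, A), (6, B), adding A, B, BC, BCA, BA, BCAA and BCAAB to the dictionary at positions 1, 2, 3, 5, 8, 10 and 14.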
3. LZW
Process:
● Codes 0-255 in the code table are always assigned to represent single bytes
from the input file.
● When encoding begins, the code table contains only the first 256 entries; the
rest of the table is blank.
● Compression is achieved by using codes 256 through 4095 (so every code fits in
12 bits) to represent sequences of bytes.
● As the encoding continues, LZW identifies repeated sequences in the data, and
adds them to the code table.
● Decoding takes each code from the compressed file and translates it through the
code table to find the character or characters it represents. The decoder builds the
same code table as it goes, so the table is not stored in the compressed file.
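The process above can be sketched in Python as follows. The limit `max_code=4095` matches the code range given above; what happens once the table is full differs between implementations, and this sketch simply stops adding entries (an assumption). Real LZW formats also pack the codes into bit fields, which the sketch leaves out.

```python
def lzw_encode(data, max_code=4095):
    """Encode a bytes object as a list of LZW codes."""
    table = {bytes([i]): i for i in range(256)}  # codes 0-255: single bytes
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit the longest known sequence
            if next_code <= max_code:
                table[candidate] = next_code      # add the repeated sequence
                next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes, max_code=4095):
    """Translate codes back to bytes while rebuilding the same code table."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The encoder added this entry one step before the decoder can:
            # it must be the previous sequence plus its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        if next_code <= max_code:
            table[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return b"".join(out)


codes = lzw_encode(b"ABABABA")
print(codes)              # → [65, 66, 256, 258]
print(lzw_decode(codes))  # → b'ABABABA'
```

Decoding code 258 exercises the case where a code arrives before the decoder has finished building that table entry.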
Matching scope:
• LZ77: within the sliding window
• LZ78: both the dictionary and previous data
• LZW: both the dictionary and current data
Difference between Static and Adaptive Dictionary Techniques.
Memory usage:
• Static: requires less memory, since only the fixed dictionary is stored.
• Adaptive: may require more memory to store and update the adaptive dictionary.