
Data Compression (COLLEGPT, IT603N)

5 Dictionary Techniques

Get Prepared Together

Prepared and Edited by: Divya Kaurani
Designed by: Kussh Prajapati


www.collegpt.com collegpt@gmail.com
Unit - 5 : Dictionary Techniques
Dictionary Coding:

Dictionary coding is a lossless data compression technique that replaces frequently occurring patterns in data with shorter codewords that refer to entries in a dictionary. The dictionary can be fixed in advance (static) or built while the data is being processed (adaptive). This approach can significantly reduce the size of the compressed data.

The algorithm works by searching for matches between the text to be compressed and a set of strings in the dictionary. When a match is found, the encoder substitutes a reference to the string's position in the dictionary. The encoded data can then be decoded by replacing each code with the text it represents.

There are two types of Dictionary Techniques:

1. Static Dictionary

2. Adaptive Dictionary

Static Dictionary:
The dictionary is fixed (it may allow additions, but not deletions) and is usually application-specific or data-specific. Static dictionary compression (SDC) is a technique that uses word-based dictionaries to compress short texts.
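As a concrete illustration, here is a minimal Python sketch of word-based static dictionary coding. The dictionary contents (`STATIC_DICTIONARY`) and the function names `encode_words`/`decode_words` are hypothetical and not part of any standard. Integer tokens stand for dictionary indexes, and string tokens are words passed through unchanged.

```python
# A minimal sketch of word-based static dictionary coding.
# STATIC_DICTIONARY is a hypothetical example; a real one would be chosen
# in advance for a specific application or type of data.
STATIC_DICTIONARY = ["the", "of", "and", "data", "compression"]

def encode_words(text, dictionary=STATIC_DICTIONARY):
    """Replace each dictionary word with its index; keep other words as literals."""
    index = {word: i for i, word in enumerate(dictionary)}
    tokens = []
    for word in text.split(" "):
        # An int token is a dictionary reference; a str token is a literal word.
        tokens.append(index.get(word, word))
    return tokens

def decode_words(tokens, dictionary=STATIC_DICTIONARY):
    """Replace each index with its dictionary word; literals pass through."""
    return " ".join(dictionary[t] if isinstance(t, int) else t for t in tokens)

tokens = encode_words("the art of data compression")
print(tokens)                # [0, 'art', 1, 3, 4]
print(decode_words(tokens))  # the art of data compression
```

A real encoder would need a flag or escape code in the bit stream to distinguish references from literals. In this sketch, the Python type plays that role.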

● Digram Coding: Digram coding is a data compression method that uses semi-static dictionaries. In the first pass, the method finds all of the characters and the most frequently used two-character blocks (digrams) in the source and inserts them into a dictionary. In the second pass, compression is performed.

Digram encoding and pattern substitution are better for text compression than run-length encoding. Pattern substitution is especially effective for compressing programming languages.

In digram coding, the encoder reads two characters of the message at a time. If the pair is in the dictionary, its index is sent. Otherwise, the index of the first character is sent and the encoder moves ahead by one character. The dictionary consists of all letters in the source alphabet, followed by as many pairs of letters as the dictionary can accommodate.
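The two-pass digram scheme described above can be sketched in Python as follows. The function names and the 8-entry example dictionary (`a b c d r ab ac ad`) are illustrative assumptions. Ties between equally frequent pairs are broken by the ordering of `Counter.most_common`, which is an arbitrary choice.

```python
from collections import Counter

def build_digram_dictionary(text, size):
    """First pass: all single characters, then the most frequent pairs that fit."""
    alphabet = sorted(set(text))
    pair_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    room = max(size - len(alphabet), 0)
    return alphabet + [pair for pair, _ in pair_counts.most_common(room)]

def digram_encode(text, dictionary):
    """Second pass: send a pair's index if the pair is in the dictionary, else one character."""
    index = {entry: i for i, entry in enumerate(dictionary)}
    codes, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in index:
            codes.append(index[pair])     # two characters consumed
            i += 2
        else:
            codes.append(index[text[i]])  # fall back to a single character
            i += 1
    return codes

def digram_decode(codes, dictionary):
    """Each code maps back to one or two characters."""
    return "".join(dictionary[c] for c in codes)

fixed = ["a", "b", "c", "d", "r", "ab", "ac", "ad"]
print(digram_encode("abracadabra", fixed))  # [5, 4, 6, 7, 5, 4, 0]
```

The fixed dictionary shows the second pass alone. `build_digram_dictionary` shows how a first pass might choose the pairs from the data itself.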

Adaptive Dictionary:
In data compression, an adaptive dictionary is generated and updated while the compression process is running. As the encoder processes the text, it adds strings (such as letter sequences or words) that are not yet in the dictionary. The decoder rebuilds the same dictionary from the encoded data, so the dictionary itself does not need to be transmitted.

Types of Adaptive Dictionary:

1. LZ77

2. LZ78

3. LZW

1. LZ77

It is an adaptive dictionary technique. LZ77 works by replacing redundant data with metadata in the form of a triplet.

The encoder moves a search pointer back through the search buffer, which holds a portion of the recently encoded sequence, looking for a match to the pattern at the beginning of the look-ahead buffer.

The encoder searches the search buffer for the longest matching pattern and sends
Code = (Offset, Max_match_length, New_symbol)

Where:
Offset is the distance from the current position back to the start of the matched pattern (0 if no match is found).
Max_match_length is the number of symbols in the string found in the search buffer that are identical to those at the beginning of the look-ahead buffer.
New_symbol is the code of the next symbol in the look-ahead buffer after the matched pattern.
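The LZ77 encoder can be sketched as follows. This sketch assumes small hypothetical buffer sizes (`search_size` for S, `lookahead_size` for L) and uses a brute-force search. A triple with offset 0 and length 0 means "no match". A match may run past the current position into the look-ahead buffer. The decoder handles this overlap by copying one symbol at a time.

```python
def lz77_encode(data, search_size=8, lookahead_size=6):
    """Emit (offset, max_match_length, new_symbol) triples; sizes are illustrative."""
    triples = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        start = max(0, pos - search_size)
        # Leave room for New_symbol: it occupies one slot of the look-ahead buffer.
        max_length = min(lookahead_size - 1, len(data) - pos - 1)
        for candidate in range(start, pos):
            length = 0
            while (length < max_length
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        new_symbol = data[pos + best_length]
        triples.append((best_offset, best_length, new_symbol))
        pos += best_length + 1
    return triples

def lz77_decode(triples):
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for k in range(length):          # one at a time, so overlapping copies work
            out.append(out[start + k])
        out.append(symbol)
    return "".join(out)

print(lz77_encode("aaaa"))  # [(0, 0, 'a'), (1, 2, 'a')]
```

The second triple in the example copies from a match that overlaps the symbols being produced, which is why the decoder cannot copy the whole match in a single slice.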

Analysis of LZ77:

• LZ77 assumes that patterns in the input stream occur close together.

• Any pattern that recurs over a period longer than the search buffer size will not be captured.

• A better compression method would save frequently occurring patterns in the dictionary.

• The size L of the look-ahead buffer is limited.

• The size S of the search buffer is limited.

• Increasing L (or S) makes longer matches possible, which can improve compression efficiency.

• However, searching for longer matches reduces speed.

• Larger buffers also need more bits for the offset and length fields, so beyond some point compression efficiency drops.

Improvements of LZ77

• Encode the triples with variable-length codes (VLC), as in PKZIP, ZIP, LHarc, PNG, ARJ, and WinZip.

LZSS

• Encode two fields instead of three: either a single symbol or an (offset, length) pair.

• Use a flag bit to indicate whether what follows is the codeword for a single symbol or for a pair.

• For example, 0 for single characters and 1 for (offset, length) pairs.
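The LZSS variant can be sketched as follows, assuming the 0/1 flag convention from the bullets. In this sketch, `(0, symbol)` is a literal and `(1, offset, length)` is a match. The parameters `window`, `max_length`, and `min_match` are hypothetical. A pair is sent only when the match reaches `min_match` symbols, because a very short match can cost more bits than the literal it replaces.

```python
def lzss_encode(data, window=16, max_length=8, min_match=2):
    """Emit (0, symbol) literals or (1, offset, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_offset, best_length = 0, 0
        for candidate in range(max(0, pos - window), pos):
            length = 0
            while (length < max_length and pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        if best_length >= min_match:
            tokens.append((1, best_offset, best_length))  # flag 1: pair
            pos += best_length
        else:
            tokens.append((0, data[pos]))                 # flag 0: single symbol
            pos += 1
    return tokens

def lzss_decode(tokens):
    out = []
    for token in tokens:
        if token[0] == 0:
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset
            for k in range(length):  # symbol-by-symbol copy allows overlap
                out.append(out[start + k])
    return "".join(out)

print(lzss_encode("aaaa"))  # [(0, 'a'), (1, 1, 3)]
```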


2. LZ78

LZ78 Compression Algorithm

● Start with an empty dictionary
● P = empty
● WHILE (there are more characters in the char stream)
    C = next character in the char stream
    IF (the string P+C is in the dictionary)
        P = P+C
    ELSE // (if P is empty, zero is its codeword)
        output the code word corresponding to P
        output C
        add the string P+C to the dictionary
        P = empty
● IF (P is not empty)
    output the code word corresponding to P
● END
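Here is a Python sketch of the LZ78 pseudocode. It assumes the dictionary is a Python `dict` that maps strings to indexes starting at 1, with index 0 for the empty prefix. When the input ends while P is non-empty, it emits `None` as the character, because the pseudocode does not specify an end marker.

```python
def lz78_encode(text):
    """Return a list of (W, C) pairs: W = prefix index (0 = empty), C = next char."""
    dictionary = {}
    output = []
    prefix = ""                       # P
    for char in text:                 # C
        if prefix + char in dictionary:
            prefix += char            # P = P + C
        else:
            # The empty prefix is not stored, so .get() yields its codeword 0.
            output.append((dictionary.get(prefix, 0), char))
            dictionary[prefix + char] = len(dictionary) + 1
            prefix = ""
    if prefix:                        # leftover P is already in the dictionary
        output.append((dictionary[prefix], None))
    return output

print(lz78_encode("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```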

Solution: The encoding process is presented below in which:

The column STEP indicates the number of the encoding step.

The column POS indicates the current position in the input data.

The column DICTIONARY shows what string has been added to the dictionary. The
index of the string is equal to the step number.

The column OUTPUT presents the output in the form (W, C).

W is the dictionary index of the longest matching prefix (0 for the empty prefix), and C is the next character.

The output of each step decodes to the string that has been added to the dictionary.
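Decoding mirrors this: each (W, C) pair expands to the phrase at index W followed by C, and that phrase becomes the next dictionary entry. The following sketch is self-contained and takes its pairs as plain tuples. It assumes the `None` end-of-input convention (a trailing pair with no character), which is not part of the pseudocode.

```python
def lz78_decode(pairs):
    """Rebuild the text and the dictionary from (W, C) pairs."""
    phrases = [""]  # index 0 stands for the empty prefix
    out = []
    for index, char in pairs:
        phrase = phrases[index] + (char if char is not None else "")
        out.append(phrase)
        if char is not None:
            phrases.append(phrase)  # same string the encoder added at this step
    return "".join(out)

print(lz78_decode([(0, "a"), (0, "b"), (1, "b"), (3, "a")]))  # abababa
```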
3. LZW

Process:

● Codes 0-255 in the code table are always assigned to represent single bytes
from the input file.
● When encoding begins, the code table contains only the first 256 entries; the rest of the table is empty.
● Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.
● As the encoding continues, LZW identifies repeated sequences in the data, and
adds them to the code table.
● Decoding is achieved by taking each code from the compressed file, and
translating it through the code table to find what character or characters it
represents.
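Here is a Python sketch of the decoding step described in the last bullet. It assumes 8-bit input symbols and the 0-4095 code range from above, with `max_code` as a hypothetical parameter name. The decoder rebuilds the code table as it reads codes, so the table is never transmitted. It also handles the one case where a code arrives before the decoder has added that code to its table. In that case, the entry must be the previous string followed by its own first byte.

```python
def lzw_decode(codes, max_code=4095):
    """Translate LZW codes back to bytes, rebuilding the code table on the fly."""
    table = {i: bytes([i]) for i in range(256)}  # codes 0-255: single bytes
    next_code = 256
    if not codes:
        return b""
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code not yet in the table: it must be previous + previous[0].
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        if next_code <= max_code:
            # Same entry the encoder added one step earlier.
            table[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return b"".join(out)

print(lzw_decode([97, 98, 256, 256]))  # b'ababab'
```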

LZW Compression Algorithm:

● Initialize table with single character strings
● P = first input character
● WHILE not end of input stream
    C = next input character
    IF P + C is in the string table
        P = P + C
    ELSE
        output the code for P
        add P + C to the string table
        P = C
  END WHILE
● output code for P
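The LZW pseudocode translates almost line for line into Python. In this sketch, the function name `lzw_encode` and the `max_code` cap are assumptions. The table maps byte strings to codes. P starts empty, so the first iteration acts as "P = first input character". Once code 4095 is used, the encoder stops adding new entries.

```python
def lzw_encode(data: bytes, max_code=4095):
    """Return the list of LZW codes for data."""
    table = {bytes([i]): i for i in range(256)}  # single-byte strings
    next_code = 256
    codes = []
    prefix = b""                                 # P
    for byte in data:
        candidate = prefix + bytes([byte])       # P + C
        if candidate in table:
            prefix = candidate
        else:
            codes.append(table[prefix])          # output the code for P
            if next_code <= max_code:
                table[candidate] = next_code     # add P + C to the table
                next_code += 1
            prefix = bytes([byte])               # P = C
    if prefix:
        codes.append(table[prefix])              # output code for P
    return codes

print(lzw_encode(b"ababab"))  # [97, 98, 256, 256]
```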
Difference between LZ77, LZ78, and LZW Dictionary Techniques.
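
| Feature | LZ77 | LZ78 | LZW |
|---|---|---|---|
| Type | Dictionary coding | Dictionary coding | Dictionary coding |
| Dictionary | Sliding window over recently encoded data | Explicit dictionary of phrases, starting empty | Explicit code table, starting with all single symbols (codes 0-255) and usually capped at a fixed size |
| Matching scope | Within the sliding window | Phrases already in the dictionary | Strings already in the code table |
| Encoding output | Triples (offset, length, next symbol) | Pairs (dictionary index, next symbol) | Single codes (table index only) |
| Decompression | Needs the sliding window of previously decoded data | Rebuilds the dictionary from the pairs | Rebuilds the code table from the codes |
| Random access | Not supported; decoding is sequential | Not supported; decoding is sequential | Not supported; decoding is sequential |
| Memory usage | Lower (bounded by the window size) | Potentially higher (dictionary keeps growing) | Bounded by the maximum table size |
| Complexity | Simpler | Slightly more complex | Simpler |
| Adaptability | Limited to the window size | Adapts to encountered patterns | Adapts to encountered patterns until the table is full |
| Compression ratio | Good for self-similar data | Generally good | May be lower for complex data |
| Applications | Data with self-similarity | General-purpose compression | Image compression (GIF) and Unix compress (historically) |
| Example | DEFLATE in ZIP | - | GIF images (historically) |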
Difference between Static and Adaptive Dictionary Techniques.

| Feature | Static | Adaptive |
|---|---|---|
| Dictionary usage | Uses a fixed dictionary that does not change during compression | Adapts the dictionary dynamically based on the input data |
| Codeword assignment | Codewords are pre-assigned based on fixed patterns or frequencies | Codewords are assigned dynamically as new patterns are encountered |
| Compression ratio | Generally lower, unless the dictionary matches the data well | Typically higher |
| Memory usage | Requires less memory for storing the fixed dictionary | May require more memory to store and update the adaptive dictionary |
| Adaptability | Not adaptive; the dictionary stays the same throughout compression | Adaptive; the dictionary adjusts to the input data, allowing better compression of varying patterns |
| Encoding efficiency | May be less efficient for data with variable patterns | More efficient for data with varying patterns, as the dictionary adapts |
| Decoding efficiency | Decoding is usually simpler and faster | Decoding may require more computation because the dictionary must be rebuilt |
| Examples | Digram coding, word-based static dictionaries | LZ77, LZ78, LZW (Lempel-Ziv-Welch), DEFLATE |