Chapter 3-Part II
• Statistical methods:
• They require prior information about the occurrence of symbols.
E.g. Huffman coding and entropy coding
• They estimate the probabilities of symbols, code one symbol at a time,
and assign shorter codes to symbols with high probabilities.
• Dictionary-based coding
• The previous algorithms (both entropy and Huffman) require
statistical knowledge, which is often not available (e.g., live audio,
video).
• Dictionary-based coding, such as the Lempel-Ziv (LZ) compression
techniques, does not require prior information to compress strings.
• Rather, it replaces symbols with pointers to dictionary entries.
Common Compression Techniques
• Compression techniques are classified into static, adaptive
(or dynamic), and hybrid.
• Static coding requires two passes: one pass to compute
probabilities (or frequencies) and determine the mapping,
and a second pass to encode.
• Examples: Huffman coding, entropy encoding
• Adaptive coding:
• It adapts to localized changes in the characteristics of the data
and does not require a first pass over the data to calculate a
probability model. All adaptive methods are one-pass
methods; only one scan of the message is required.
• The cost of these advantages is that the encoder and decoder
must be more complex to keep their states synchronized, and more
computational power is needed to keep adapting the
encoder/decoder state.
• Examples: Lempel-Ziv and Adaptive Huffman Coding
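The synchronization requirement of adaptive coding can be sketched in a few lines. This is only an illustration under stated assumptions: the `AdaptiveModel` class, the `trace` helper, and the choice to start every count at 1 are hypothetical and are not taken from any particular codec.

```python
from collections import Counter


class AdaptiveModel:
    """Hypothetical adaptive frequency model (an assumption for illustration).

    Every symbol of the alphabet starts with count 1, and the count of a
    symbol grows each time that symbol is processed.
    """

    def __init__(self, alphabet):
        self.counts = Counter({s: 1 for s in alphabet})

    def probability(self, symbol):
        # Current estimate, based only on symbols processed so far.
        return self.counts[symbol] / sum(self.counts.values())

    def update(self, symbol):
        self.counts[symbol] += 1


def trace(message, alphabet):
    """Return the probability assigned to each symbol just before it is seen."""
    model = AdaptiveModel(alphabet)
    probs = []
    for s in message:
        probs.append(model.probability(s))  # estimate first ...
        model.update(s)                     # ... then adapt the state
    return probs
```

If the encoder and the decoder both start from the same alphabet and apply `update` after every symbol, they hold identical counts at every step, which is the synchronized state described above. No first pass over the data is needed.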
Data Compression = Modeling + Coding
• Data compression consists of taking a stream of symbols
and transforming them into codes.
• The model is a collection of data and rules used to process input
symbols and determine their probabilities.
• A coder uses a model (probabilities) to assign codes for the given
input symbols.
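The split between model and coder can be made concrete with a small sketch. The function names `build_model`, `code_lengths`, and `entropy_bits` are hypothetical, and the coder here only reports Shannon code lengths, ceil(-log2 p), rather than producing actual bit strings.

```python
import math
from collections import Counter


def build_model(message):
    """Model: estimate each symbol's probability from its relative frequency."""
    counts = Counter(message)
    total = len(message)
    return {s: c / total for s, c in counts.items()}


def code_lengths(model):
    """Coder side: Shannon code length ceil(-log2 p) for each symbol."""
    return {s: math.ceil(-math.log2(p)) for s, p in model.items()}


def entropy_bits(model):
    """Average number of bits per symbol that an ideal coder would need."""
    return -sum(p * math.log2(p) for p in model.values())
```

For the message `"aaaabbcc"`, the model assigns probability 0.5 to `a` and 0.25 to each of `b` and `c`, so the code lengths are 1, 2, and 2 bits and the entropy is 1.5 bits per symbol.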
[Figures omitted: worked Huffman coding solution, Steps 2 to 7, ending with the Huffman code tree]
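A minimal sketch of Huffman tree construction follows. It assumes symbol frequencies are counted from the input text itself (the static, two-pass approach); the function name `huffman_codes` and the example input are illustrative and are not the worked example from the slides.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {symbol: bit string} table built with Huffman's algorithm."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The tie breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix 0 for one, 1 for the other.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For `"aaaabbc"` the most frequent symbol `a` receives a 1-bit code and `b` and `c` receive 2-bit codes, so the message needs 10 bits in total. The exact bit patterns depend on how ties are broken, so other valid Huffman codes for the same input exist.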
• zip and unzip use the LZH technique, while UNIX's compress
belongs to the LZW and LZC classes.
Lempel-Ziv compression
• The problem with Huffman coding is that it requires
knowledge about the data before encoding takes place.
• Huffman coding requires the frequencies of symbol occurrence
before codewords are assigned to symbols.
• Lempel-Ziv compression:
• Does not rely on previous knowledge about the data
• Rather, it builds this knowledge in the course of data
transmission/data storage
• The Lempel-Ziv algorithm (called LZ) uses a table of codewords
created during data transmission;
• each time, it replaces a string of characters with a reference to a previous
occurrence of that string.
Lempel-Ziv Compression Algorithm
• The multi-symbol patterns are of the form: C0C1 . . . Cn-1Cn.
The prefix of a pattern consists of all the pattern
symbols except the last: C0C1 . . . Cn-1
LZ Compression
• Second, index the pieces of text obtained in the breaking-down
process from 1 to n.
• The empty string (start of text) has index 0, "a" has index 1, ...
1. ABBCBCABABCAABCAAB
2. SATATASACITASA.
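The breaking-down and indexing steps can be sketched as follows. This assumes the LZ78 variant, in which each new piece is a previously indexed piece plus one extra symbol and is emitted as an (index, symbol) pair; the function names `lz78_encode` and `lz78_decode` are illustrative.

```python
def lz78_encode(text):
    """Break text into pieces and emit (index of prefix piece, next symbol)."""
    dictionary = {"": 0}          # the empty string has index 0
    output = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending a piece already seen
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # next free index
            phrase = ""
    if phrase:
        # Text ended inside a known piece: emit its index with no new symbol.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decode(pairs):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    pieces = []
    for index, ch in pairs:
        piece = phrases[index] + ch
        phrases.append(piece)
        pieces.append(piece)
    return "".join(pieces)
```

Applied to the first string above, the pieces are A, B, BC, BCA, BA, BCAA, BCAAB with indices 1 to 7, and the encoder output is (0,A), (0,B), (2,C), (3,A), (2,A), (4,A), (6,B). The decoder needs no table sent in advance, because it rebuilds the same dictionary as it reads the pairs.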
Group Assignment
Compare the Huffman coding and Lempel-Ziv
algorithms.
• The algorithm
• Draw a flowchart
• Time and space complexity
• Provide an example showing how the algorithms work in
image compression