Professional Documents
Culture Documents
Compression: Another Example of Greedy Algorithm: Huffman Codes
Compression: Another Example of Greedy Algorithm: Huffman Codes
Compression: Another Example of Greedy Algorithm: Huffman Codes
CS 445-
445- Algorithms • Take a text (or an object) Lossless compression
• Encode it so that
– Less space is used, or energy to transmit it.
Another example of greedy algorithm:
– No information is lost – we can reconstruct the
Huffman codes original text
Huffman coding
Huffman coding / prefix codes • Codewords are presented by a binary tree
• Each leaf stores, and represents a character
Prefix codes: • Node with two children – left ~ 0; right ~ 1
no codeword is a prefix of another codeword • codeword = path from the root to the leaf storing given
characters
• Easy encoding:
The code represented by the lefs of the
– Construcs the strings s of bits by
tree is a prefix code (why?) 0 1
concatenating codewords of chars
char code • Easy decoding: char code 0 1 A
A 1 – given s, we find the first few bits (prefix) that
A 1 B
forms a char – there can be only one such prefix. 0 1
B 01 B 01
– Remove this prefix from s and repeat.
1 C
C 001 C 001
DADABCA
D 0001 0001 1 0001 1 01 001 1 D 0001 D
1
Huffman coding Huffman codes and full trees
• Given a text string X, find a prefix code for the characters of X
• Codewords are presented by binary trees giving smallest encoding for X
– Frequent characters should have short codewords
• We can always aim at getting full binary trees
– Rare characters should have long codewords
– (no node with a single child) • Example
0 1 – X = ABRACADABRA (“R” is rate, “A” is frequent)
char code – T1 encodes X into 29 bits
A 1 1 A – T2 encodes X into 24 bits
0
B 01 0 1 B
C 001
1 C
D 0001
000 C D B A B R
D
A R C D
Huffman codes-cont
Huffman codes and full trees • Σ’ - alphabet. X – input file to encode.
• f(x) = how many times x appears X.
• Given a file X, find a prefix code for the characters of X giving
smallest encoding for X.
• Let w(x) denote the binary code of a char x ∈Σ’.
• The size of the encoded file is therefor
– Frequent characters should have short codewords
∑x ∈Σ’ f(x) w(x)
– Rare characters should have long codewords
Greedy algorithm for generating opt tree Credit for next several slides:
2
Huffman Tree Construction 1
The number indicate the frequency Huffman Tree Construction 2
A C E H I A H
3 5 8 2 7 3 2
C E I Q
a 5 8 7
5
•The two least-frequent nodes are A,H •The two least-frequent nodes are A,H
•The algorithm replaces them with one new node a. •The algorithm replaces them with one new node a.
•Its frequency is the sum of frequencies of these two nodes •Its frequency is the sum of frequencies of these two nodes
b 10 8 7
b 10 15
c
The two least frequent nodes were a
and C, and they were replaced by a
node b whose frequency is the sum of
their frequencies – 10.
3
Huffman codes Huffman codes-correctness
Assume by induction that the algorithm works correctly for all
• Correctness: alphabets with less than n characters.