Lecture 10
Arithmetic Code
• Shannon-Fano-Elias coding
• Arithmetic code
• Competitive optimality of Shannon code
• Generation of random variables
The cumulative distribution function of the source is
$$F(x) = \sum_{x_i \le x} p_i \quad \text{(discrete)}, \qquad F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \quad \text{(continuous)}.$$
Shannon-Fano-Elias coding uses the modified CDF $\bar F(x) = F(x-1) + \frac{1}{2}p(x)$, the midpoint of the jump of $F$ at $x$.
• Truncate the binary expansion of the real number $\bar F(x)$ to enough bits that the codewords are unique and prefix-free
[Figure: the staircase CDF $F(x)$ over $x = 1, 2, \ldots$; at each symbol $x$ the jump has height $p(x)$, and $\bar F(x)$ lies midway between $F(x-1)$ and $F(x)$.]
x   p(x)   F(x)   F̄(x)    F̄(x) in binary   l(x) = ⌈log 1/p(x)⌉ + 1   Codeword
1   0.25   0.25   0.125   0.001            3                         001
2   0.25   0.5    0.375   0.011            3                         011
3   0.2    0.7    0.6     0.10011          4                         1001
4   0.15   0.85   0.775   0.1100011        4                         1100
5   0.15   1.0    0.925   0.1110110        4                         1110
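As an illustration, here is a minimal Python sketch of the construction (the function name and list-of-pairs interface are my own); it reproduces the table above:

```python
import math

def sfe_code(pmf):
    """Shannon-Fano-Elias code: truncate the binary expansion of the
    midpoint F_bar(x) = F(x-1) + p(x)/2 to l(x) = ceil(log2 1/p(x)) + 1 bits."""
    code = {}
    F_prev = 0.0                      # F(x-1): CDF just below the current symbol
    for x, p in pmf:
        mid = F_prev + p / 2          # modified CDF F_bar(x)
        l = math.ceil(math.log2(1 / p)) + 1
        bits, frac = "", mid
        for _ in range(l):            # first l bits of the binary expansion
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        code[x] = bits
        F_prev += p                   # advance to F(x)
    return code

pmf = [(1, 0.25), (2, 0.25), (3, 0.2), (4, 0.15), (5, 0.15)]
print(sfe_code(pmf))  # {1: '001', 2: '011', 3: '1001', 4: '1100', 5: '1110'}
```

Floats are fine at this scale; the construction is usually stated with exact dyadic arithmetic.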
Truncating $\bar F(x)$ to $l(x)$ bits changes it by less than $2^{-l(x)}$:
$$\bar F(x) - \lfloor \bar F(x) \rfloor_{l(x)} < \frac{1}{2^{l(x)}}$$
For $l(x) = \left\lceil \log \frac{1}{p(x)} \right\rceil + 1$,
$$\frac{1}{2^{l(x)}} = \left(\frac{1}{2}\right)^{\left\lceil \log \frac{1}{p(x)} \right\rceil + 1} \quad (1)$$
$$\le \frac{1}{2} \cdot 2^{\log p(x)} = \frac{p(x)}{2} \quad (2)$$
$$= \bar F(x) - F(x-1) \quad (3)$$
so the truncated codeword still lies strictly above $F(x-1)$, inside the step of $F$ that belongs to $x$.
A real number $z^* = 0.z_1 z_2 \ldots z_l z_{l+1} \ldots$ lies in the interval
$$\left[\,0.z_1 z_2 \ldots z_l,\; 0.z_1 z_2 \ldots z_l + \frac{1}{2^l}\,\right)$$
so a codeword of length $l(x)$ pins its value down to an interval of width $2^{-l(x)}$ contained in the step of $F$ at $x$. These intervals are disjoint across symbols, hence the code is prefix-free.
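To make the decodability concrete, here is a small companion decoder (a sketch of my own, pairing with the sfe_code example above):

```python
def sfe_decode(bits, pmf):
    """Decode one Shannon-Fano-Elias codeword: read the bits as a binary
    fraction z; by the interval argument, z falls inside exactly one
    step [F(x-1), F(x)) of the CDF."""
    z = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))
    F = 0.0
    for x, p in pmf:
        if F <= z < F + p:    # z lies in the step belonging to x
            return x
        F += p

pmf = [(1, 0.25), (2, 0.25), (3, 0.2), (4, 0.15), (5, 0.15)]
print([sfe_decode(c, pmf) for c in ["001", "011", "1001", "1100", "1110"]])
# [1, 2, 3, 4, 5]
```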
$$\sum_x p(x)\, l(x) = \sum_x p(x) \left( \left\lceil \log \frac{1}{p(x)} \right\rceil + 1 \right) \le H(X) + 2 \text{ bits}$$
• For the example above, $L = 3.5$ bits while $H(X) \approx 2.29$ bits, within the 2-bit guarantee (a quick check follows)
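A quick numeric check of these figures (plain Python, same pmf as the table):

```python
import math

pmf = [0.25, 0.25, 0.2, 0.15, 0.15]
L = sum(p * (math.ceil(math.log2(1 / p)) + 1) for p in pmf)  # expected length
H = -sum(p * math.log2(p) for p in pmf)                      # entropy
print(L, round(H, 4), L <= H + 2)   # 3.5 2.2855 True
```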
How do we code a whole sequence, $C(X_1 X_2 \cdots X_n)$?
• Arithmetic codes: apply the same interval idea to the entire sequence
• Decode on-the-fly
Reference: I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520-540, 1987.
• Given any length $n$ and $p(x_1 x_2 \cdots x_n)$, the code has length at most $\log \frac{1}{p(x_1 x_2 \cdots x_n)} + 2$ bits (a toy encoder is sketched below)
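Here is a toy float-based sketch of that idea (my own illustration, not the integer-renormalizing coder of Witten, Neal, and Cleary; floats restrict it to short sequences):

```python
import math

def arithmetic_encode(seq, pmf):
    """Narrow [low, low + width) one symbol at a time, then emit the
    Shannon-Fano-Elias codeword of the final interval."""
    cum, acc = {}, 0.0
    for s, p in pmf:
        cum[s] = acc                  # cumulative probability below s
        acc += p
    prob = dict(pmf)
    low, width = 0.0, 1.0
    for s in seq:
        low += width * cum[s]         # sub-interval of s inside [low, low+width)
        width *= prob[s]
    l = math.ceil(math.log2(1 / width)) + 1   # width = p(x1...xn)
    mid, bits = low + width / 2, ""
    for _ in range(l):                # truncate the midpoint to l bits
        mid *= 2
        bits += str(int(mid))
        mid -= int(mid)
    return bits

pmf = [("a", 0.5), ("b", 0.25), ("c", 0.25)]
print(arithmetic_encode("aab", pmf))  # p = 1/16, so 5 bits: '00101'
```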
• No other code can do much better than the Shannon code most of the time: with $l^*(x) = \lceil \log \frac{1}{p(x)} \rceil$ the Shannon code length and $l(x)$ the length of any other uniquely decodable code (for dyadic $p$),
$$P(l^*(X) < l(X)) \ge P(l^*(X) > l(X))$$
• Huffman codes are hard to analyze this way because there is no closed-form expression for their codeword lengths
• Dual problem: how many fair coin tosses are needed to generate a given random variable X?
• Example tree: h → a, th → b, tth → a, ··· ; if the pattern keeps alternating as the ellipsis suggests, this generates P(a) = 2/3 and P(b) = 1/3 (see the sketch below)
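A minimal sketch of this particular tree in Python (assuming the alternating pattern continues; `random.random() < 0.5` stands in for a fair coin):

```python
import random

def generate_ab():
    """h -> a, th -> b, tth -> a, ...: flip a fair coin until the first
    heads; an odd number of flips yields a, an even number yields b.
    P(a) = 1/2 + 1/8 + 1/32 + ... = 2/3 and P(b) = 1/3."""
    n = 1
    while random.random() < 0.5:      # tails: flip again
        n += 1
    return "a" if n % 2 == 1 else "b"

print(sum(generate_ab() == "a" for _ in range(10000)) / 10000)  # ~0.667
```

The expected number of flips here is $\sum_n n\,2^{-n} = 2$, consistent with the general result that the expected number of fair coin flips needed is less than $H(X) + 2$.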
• Coding by “interval”:
  - Shannon-Fano-Elias code
  - Arithmetic code