
10.7 Arithmetic Coding
Arithmetic coding is unlike all the other methods discussed in that it takes in the complete data
stream and outputs one specific codeword. This codeword is a floating point number between 0
and 1. The bigger the input data set, the more digits in the number output. This unique number is
encoded such that when decoded, it will output the exact input data stream. Arithmetic coding,
like Huffman, is a two-pass algorithm. The first pass computes the characters' frequency and
generates a probability table. The second pass does the actual compression.
The probability table assigns a range between 0 and 1 to each input character. The size of each
range is directly proportional to a character's frequency. The order of assigning these ranges is
not as important as the fact that the same order must be used by both the encoder and decoder.
Each range consists of a low value and a high value. These parameters are very important to the
encode/decode process. The more frequently occurring characters are assigned wider ranges in
the interval, requiring fewer bits to represent them. The less likely characters are assigned
narrower ranges, requiring more bits.
With arithmetic coding, you start out with the range 0.0 to 1.0 (Figure 10.9). The first character
input will constrain the output number with its corresponding range. The range of the next
character input will further constrain the output number. The more input characters there are, the
more precise the output number will be.

Figure 10.9 Assignment of ranges between 0 and 1.

Suppose we are working with an image that is composed of only red, green, and blue pixels.
After computing the frequency of these pixels, we have a probability table that looks like
 
Pixel   Probability   Assigned Range
Red     0.2           [0.0, 0.2)
Green   0.6           [0.2, 0.8)
Blue    0.2           [0.8, 1.0)

The algorithm to encode is very simple.


LOW = 0.0
HIGH = 1.0
WHILE not end of input stream
    get next CHARACTER
    RANGE = HIGH - LOW
    HIGH = LOW + RANGE * high range of CHARACTER
    LOW = LOW + RANGE * low range of CHARACTER
END WHILE
output LOW
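
The encoding loop can be sketched in Python as follows. This is a minimal illustration, not a production coder: it assumes the red/green/blue probability table given earlier, uses exact fractions to sidestep floating-point rounding, and the names `PIXEL_RANGES` and `encode` are hypothetical. The Green-Green-Red input is chosen only as an example.

```python
from fractions import Fraction

# Ranges from the red/green/blue probability table, as exact fractions.
PIXEL_RANGES = {
    "RED": (Fraction(0), Fraction(2, 10)),
    "GREEN": (Fraction(2, 10), Fraction(8, 10)),
    "BLUE": (Fraction(8, 10), Fraction(1)),
}


def encode(symbols, ranges):
    """Narrow [low, high) once per symbol and return the final bounds."""
    low, high = Fraction(0), Fraction(1)
    for symbol in symbols:
        sym_low, sym_high = ranges[symbol]
        width = high - low  # RANGE in the pseudocode
        # Both updates use the old value of low, as in the pseudocode.
        high = low + width * sym_high
        low = low + width * sym_low
    return low, high


low, high = encode(["GREEN", "GREEN", "RED"], PIXEL_RANGES)
print(float(low), float(high))  # prints 0.32 0.392
```

The pseudocode outputs only LOW; the sketch also returns HIGH so the final interval can be inspected.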

Figure 10.10 shows how the range for our output is reduced as we process two possible input
streams.

Figure 10.10 Reduced output range: (a) Green-Green-Red; (b) Green-Blue-Green.

Let's encode the string ARITHMETIC. Our frequency analysis will produce the following
probability table.
Symbol Probability Range
A 0.100000 0.000000 - 0.100000
C 0.100000 0.100000 - 0.200000
E 0.100000 0.200000 - 0.300000
H 0.100000 0.300000 - 0.400000
I 0.200000 0.400000 - 0.600000
M 0.100000 0.600000 - 0.700000
R 0.100000 0.700000 - 0.800000
T 0.200000 0.800000 - 1.000000
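
A table like this can be derived from the input in a first pass. The sketch below is one way to do it, under the assumption that ranges are assigned in sorted symbol order (any fixed order works as long as the decoder uses the same one). The function name `build_ranges` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def build_ranges(message):
    """Map each symbol to a half-open range [low, high) sized by its frequency."""
    counts = Counter(message)
    total = len(message)
    ranges = {}
    low = Fraction(0)
    # Sorted order is an assumption; the decoder must use the same order.
    for symbol in sorted(counts):
        high = low + Fraction(counts[symbol], total)
        ranges[symbol] = (low, high)
        low = high
    return ranges


# Print symbol, probability, and range in the same layout as the table.
for symbol, (low, high) in build_ranges("ARITHMETIC").items():
    print(f"{symbol} {float(high - low):.6f} {float(low):.6f} - {float(high):.6f}")
```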

Before we start, LOW is 0 and HIGH is 1. Our first input is A. RANGE = 1 - 0 = 1. HIGH will
be 0 + 1 x 0.1 = 0.1. LOW will be 0 + 1 x 0 = 0. These three calculations will be repeated until
the input stream is exhausted. As we process each character in the string, RANGE, LOW, and
HIGH will look like

A  range = 1.000000000  low = 0.0000000000  high = 0.1000000000
R  range = 0.100000000  low = 0.0700000000  high = 0.0800000000
I  range = 0.010000000  low = 0.0740000000  high = 0.0760000000
T  range = 0.002000000  low = 0.0756000000  high = 0.0760000000
H  range = 0.000400000  low = 0.0757200000  high = 0.0757600000
M  range = 0.000040000  low = 0.0757440000  high = 0.0757480000
E  range = 0.000004000  low = 0.0757448000  high = 0.0757452000
T  range = 0.000000400  low = 0.0757451200  high = 0.0757452000
I  range = 0.000000080  low = 0.0757451520  high = 0.0757451680
C  range = 0.000000016  low = 0.0757451536  high = 0.0757451552

Our output is then 0.0757451536.
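
As a check on the trace, the width of the final interval equals the product of the probabilities of the encoded symbols, which ties the number of output digits to how likely the message is. The symbols w and p(s_i) are notation introduced here for illustration and do not come from the source.

```latex
\begin{align*}
w &= \prod_{i=1}^{10} p(s_i) = 0.1^{6} \times 0.2^{4} = 1.6 \times 10^{-9} \\
  &= \text{HIGH} - \text{LOW} = 0.0757451552 - 0.0757451536 \\
-\log_2 w &\approx 29.2
\end{align*}
```

So roughly 30 binary digits are enough to pick out a number inside the final interval for this ten-character input.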


The decoding algorithm is just the reverse process.
get NUMBER
DO
    find CHARACTER that has HIGH > NUMBER and LOW <= NUMBER
    set HIGH and LOW corresponding to CHARACTER
    output CHARACTER
    RANGE = HIGH - LOW
    NUMBER = NUMBER - LOW
    NUMBER = NUMBER / RANGE
UNTIL no more CHARACTERs
As we decode 0.0757451536, we see

num = 0.075745153600  A  Range = 0.1  low = 0.0  high = 0.1
num = 0.757451536000  R  Range = 0.1  low = 0.7  high = 0.8
num = 0.574515360000  I  Range = 0.2  low = 0.4  high = 0.6
num = 0.872576800000  T  Range = 0.2  low = 0.8  high = 1.0
num = 0.362884000000  H  Range = 0.1  low = 0.3  high = 0.4
num = 0.628840000000  M  Range = 0.1  low = 0.6  high = 0.7
num = 0.288400000002  E  Range = 0.1  low = 0.2  high = 0.3
num = 0.884000000024  T  Range = 0.2  low = 0.8  high = 1.0
num = 0.420000000120  I  Range = 0.2  low = 0.4  high = 0.6
num = 0.100000000598  C  Range = 0.1  low = 0.1  high = 0.2
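
The decoding steps can be sketched in Python as follows. The source does not say how the decoder knows when to stop, so this sketch assumes the message length is passed in separately. It also uses exact fractions rather than floating point, so the intermediate values carry no rounding error, and the lower bound of each range is treated as inclusive. The names `RANGES` and `decode` are hypothetical.

```python
from fractions import Fraction

# Ranges from the ARITHMETIC probability table, written in tenths.
TENTHS = {
    "A": (0, 1), "C": (1, 2), "E": (2, 3), "H": (3, 4),
    "I": (4, 6), "M": (6, 7), "R": (7, 8), "T": (8, 10),
}
RANGES = {s: (Fraction(lo, 10), Fraction(hi, 10)) for s, (lo, hi) in TENTHS.items()}


def decode(number, ranges, length):
    """Recover `length` symbols from the encoded number (length is assumed known)."""
    out = []
    for _ in range(length):
        # Find the symbol whose range contains the number: low <= number < high.
        for symbol, (low, high) in ranges.items():
            if low <= number < high:
                break
        else:
            raise ValueError("number is outside every range")
        out.append(symbol)
        # Undo this symbol's narrowing: shift by low, then rescale by the width.
        number = (number - low) / (high - low)
    return "".join(out)


print(decode(Fraction("0.0757451536"), RANGES, 10))  # prints ARITHMETIC
```

With exact arithmetic the last value of the number is exactly 0.1, the lower bound of C's range, which is why the lower bound has to be inclusive.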

Arithmetic coding is one possible algorithm for use in the entropy coder during JPEG
compression. For JPEG compression, see the next part. Arithmetic coding achieves slightly higher
compression ratios than the Huffman option but is computationally more intensive.
