Chapter 4

CHAPTER 4
ARITHMETIC
CODING
Lecture By,
Prof.M.Dhanalakshmi,
Asst.Prof., IT Dept,
SCET,Surat.
INTRODUCTION
 Huffman approach generates variable length
codes.
 Another popular method of generating
variable-length codes called arithmetic
coding.
 When a string is converted to arithmetic
encoding, frequently used characters will be
stored with fewer bits and not-so-frequently
occurring characters will be stored with more
bits, resulting in fewer bits used in total.
DISADVANTAGES OF HUFFMAN CODE
 Huffman coding methods generating codewords
for individual symbols.
 Arithmetic coding differs Huffman coding,
 Rather than separating the input into
component symbols and replacing each with a
code, arithmetic coding encodes the entire
message into a single number.
 It represents the current information as a range,
defined by two numbers.
 Storage of Huffman code for large applications
would be a highly inefficient and time-consuming
procedure.
 Impractical to obtain Huffman codes for long
sequences of symbols.
DISADVANTAGES OF HUFFMAN CODE
 In order to find the Huffman codeword for a
particular sequence of length m, we need
codewords for all possible sequences of length m.
 This fact causes an exponential growth in the
size of the codebook.
 We need a way of assigning codewords to
particular sequences without having to generate
codes for all sequences of that length.
 The arithmetic coding technique fulfills this
requirement.
HUFFMAN AND ARITHEMTIC
 Implemented results show that:
 Compression ratio of arithmetic coding is
better than Huffman coding,
LOGIC OF ARITHMETIC CODE
 In arithmetic coding a unique identifier or tag is
generated for the sequence to be encoded.
 This tag corresponds to a binary fraction, which
becomes the binary code for the sequence.
 Arithmetic coding approach is divided into two
phases:
 1. Phase 1: Unique identifier or tag is generated
for a given sequence of symbols.
 2. Phase 2: Tag is then given a unique binary
code.
 A unique arithmetic code can be generated for a
sequence of length m without the need for
generating codewords for all sequences of length m.
 This is contrast to the situation for Huffman
codes.
CODING A SEQUENCE
 In order to distinguish a sequence of symbols
from another sequence of symbols we need to
tag it with a unique identifier.
 One possible set of tags for representing sequences of

symbols are the numbers in the unit interval [0 ,1) .
 Because the number of numbers in the unit interval

is infinite, it should be possible to assign a unique tag
to each distinct sequence of symbols.
 In order to do this we need a function that will map

sequences of symbols into the unit interval.
CODING A SEQUENCE
 A function that maps random variables, and
sequences of random variables, into the
unit interval is the cumulative distribution
function (cdf) of the random variable associated
with the source.
 This is the function we will use in developing the

arithmetic code.
EXAMPLE
 Consider 3-letter alphabet A={a1,a2,a3} with
P(a1)=0.7, P(a2)=0.1, P(a3)=0,2.
 Using equation Fx(0)=0 , Fx(1)=0.7, Fx(2)=0.8,
Fx(3)=1.
EXAMPLE
 For three letter alphabets a1, a2, a3
 Step 1:
 Calculate the difference d
 d = Upper bound (UB) – lower bound (LB)
 Step 2:
 R(a1) = LB + d(Probability of a1)
 R(a2) = LB + d (Probability of a2)
 Iteration 1:
 First alphabet to be encoded is a1.
 So LB = 0, UB = 0.7
 d = 0.7 – 0 = 0.7
 R(a1) = 0 + 0.7(0.7) = 0.49
 R(a2) = 0.49 + 0.7(0.1) = 0.56
EXAMPLE
 Step 1:
 Calculate the difference d
 Step 2:
 Iteration 2:
 Next alphabet to be encoded is a2.
 Now LB = 0.49, UB = 0.56
 d = 0.56 – 0.49 = 0.07
 R(a1) = 0.49 + 0.07(0.7) = 0.539
 R(a2) = 0.539 + 0.07(0.1) = 0.546
EXAMPLE
 Step 1:
 Calculate the difference d
 Step 2:
 Iteration 3:
 Next alphabet to be encoded is a3.
 Now LB = 0.56, UB = 0.546
 d = 0.56 – 0.546 = 0.014
 R(a1) = 0.546 + 0.014(0.7) = 0.5558
 R(a2) = 0.5558 + 0.014(0.1) = 0.5572

EXAMPLE
 To calculate the tag:
 Tag = upper bound + lower bound
2
 0.56 + 0.546
2
 Tag value = 0.553
EQUATION TO GENERATE LOWER LIMIT
AND UPPER LIMIT
 For any sequence x= (x1,x2,x3….xn)
 Midpoint of the interval for the tag is

EXAMPLE GENERATING A TAG
 Consider the source A=(a1,a2,a3} with probability
P(a1)=0.8, P(a2)=0.02, P(a3)=0.18.
 Suppose we wish to encode the sequence 1 3 2 1
 Fx(0)=0, Fx(1)=0.8, Fx(2)=0.82, Fx(3)=1
 Use below equations to determine the lower and

upper limits of the interval containing the tag.
 Initialize u(0) 1 and l(0) to 0 .

 Our first element in the sequence is 1.
 The tag is contained in interval [0,0.8].

 The second element of the sequence is 3. Using
the update equations we get
 The interval containing the tag for the sequence

1 3 is [0.656,0.8]
 The third element, 2, results in the following
update equations:
 The interval for the tag is [0.7712,0.77408]

 Last element is 1.
 Continuing with the last element, the upper
 and lower limits of the interval containing the

tag are:
 Tag for the sequence 1 3 2 1 is generated as
DECIPHERING
THE TAG
DECIPHERING THE TAG
 Let us try to obtain the sequence represented by
the tag.
 The tag value is 0.772352.
 Start with l(0) 0 and u(0) = 1.
 After decoding the first element of the sequence

x1, the upper and lower limits become
 The interval containing the tag is

[FX(x1−1).FX(x1)].
 We need to find the value of x1 for which
0.772352 lies in the interval [FX(x1−1).FX(x1)].
DECIPHERING THE TAG
 If we pick x1 = 1, the interval is [0,0.8).
 If we pick x1 = 2, the interval is [0.8,0.82)
 If we pick x1 = 3, the interval is [0.82,1.0).
 As 0.772352 lies in the interval [0.0,0.8] we

choose x1 = 1.
 So, decoded first element in the sequence is 1.

DECIPHERING THE TAG
 Second element x2.
 The updated values of l(1) and u(1).
 If we pick x2 =1, the updated interval is [0,0.64],

which does not contain the tag.
 If we pick x2 = 2, the updated interval is [0.6,0.656],
which also does not contain the tag.
 If we pick x2 = 3, the updated interval is [0.656,0.8],
which does contain the tag value of 0.772352.
 Therefore, the second element in the sequence is 3.
 Decoded sequence is 1 3
DECIPHERING THE TAG
 To identify third element x3:
 Similarly calculate x3 and x4 by your own…

 Exercise…To calculate elements x3 and x4.
ALGORITHM TO DECIPHER A TAG
GENERATING A
BINARY CODE
GENERATING A BINARY CODE
 Using the algorithm described in the previous
section, we can obtain a tag for a given sequence x.
 However, the binary code for the sequence is what we
really want to know.
 We want to find a binary code that will represent the
sequence x in a unique and efficient manner.
 The tag forms a unique representation for the
sequence.
 This means that the binary representation of the tag
forms a unique binary code for the sequence.
 The binary representation of some of these values
would be infinitely long, in which case, although the
code is unique, it may not be efficient.
 To make the code efficient, the binary representation
has to be truncated.
 is a number in the interval 0 1. A binary
code for can be obtained by taking the binary
representation of this number and truncating it
to bits.
 Example:
 Consider a source A that generates letters from

an alphabet of size four,
 Binary code for the input source
ALGORITHM IMPLEMENTATION
 A recursive algorithm for the boundaries of the
interval containing the tag for the sequence being
encoded as
 where xn is the value of the random variable

corresponding to the nth observed symbol, l(n) is
the lower limit of the tag interval at the nth
iteration, and u(n) is the upper limit of the tag
interval at the nth iteration.
 Before we can implement this algorithm, there is
one major problem we have to resolve.
 Recall that the rationale for using numbers in the
interval [0,1] as a tag was that there are an
infinite number of numbers in this interval.
 Consider the values of l(n) and u(n) in previous
example.
 As n gets larger, these values come closer and
closer together.
 This means that in order to represent all the
subintervals uniquely we need increasing
precision as the length of the sequence increases.
 In a system with finite precision, the two values
are bound to converge, and we will lose all
information about the sequence from the point at
which the two values converged.
 To avoid this situation, we need to rescale
the interval.
 We would also like to perform the encoding
incrementally—that is, to transmit portions of
the code as the sequence is being observed,
rather than wait until the entire sequence has
been observed before transmitting the first bit.
 The algorithm we describe in this section takes
care of the problems of synchronized rescaling
and incremental encoding.
 As the interval becomes narrower, we have three
possibilities:
 1. The interval is entirely confined to the lower
half of the unit interval [0,0.5].
 2. The interval is entirely confined to the upper
half of the unit interval [0.5,1.0].
 3. The interval straddles the midpoint of the
unit interval.
 Consider only the first two cases.
 The most significant bit of the binary
representation of all numbers in the interval
[0,0.5] is 0, and the most significant bit of the
binary representation of all numbers in the
interval [0.5,1] is 1.
 Therefore, once the interval gets restricted to
either the upper or lower half of the unit interval,
the most significant bit of the tag is fully
determined.
 Therefore, without waiting to see what the rest of
the sequence looks like, we can indicate to the
decoder whether the tag is confined to the upper
or lower half of the unit interval by sending a 1
for the upper half and a 0 for the lower half.
 Once the encoder and decoder know which half
contains the tag, we can ignore the half of the
unit interval not containing the tag and
concentrate on the half containing the tag.
 As our arithmetic is of finite precision, we can do
this best by mapping the half interval containing
the tag to the full [0,1] interval.
 The mappings required are:
 As soon as we perform either of these mappings,
we lose all information about the most significant
bit.
 However, this should not matter because we have
already sent that bit to the decoder.
 We can now continue with this process,
generating another bit of the tag every time the
tag interval is restricted to either half of the unit
interval.
 This process of generating the bits of the
tag without waiting to see the entire
sequence is called incremental encoding.
EXAMPLE TAG GENERATION WITH
SCALING
 Encode the sequence 1 3 2 1
 P(a1)=0.8, P(a2)=0.02, P(a3)=0.18
 Initialize u(0) to 1, and l(0) to 0, the first

element of the sequence, 1, results in the
following update:
 The interval [0,0.8] is not confined to either the

upper or lower half of the unit interval, so we
proceed.
SCALING
 The second element of the sequence is 3. This
results in the update:
 The interval [0.656,0.8) is contained entirely in

the upper half of the unit interval
 So send the binary code 1 and rescale using
E2(x)=2(x-0.5)
 The third element, 2, results in the following

update equations:
SCALING
 The interval for the tag is [0.5424,0.54816],
which is contained entirely in the upper half of
the unit interval.
 Send 1 and go through another scaling
[E2 mapping].
 This interval is contained entirely in the lower

half of the unit interval.
 Send a 0 and use the E1 mapping to rescale:
 The interval is still contained entirely in the

lower half of the unit interval,
SCALING
 The interval is still contained entirely in the
lower half of the unit interval.
 Send 0 and perform scaling using E1 mapping.
 Because the interval containing the tag remains

in the lower half of the unit interval.
 Send 0 and rescale using E1 mapping.
SCALING
 Now the interval containing the tag is contained
entirely in the upper half of the unit interval.
 Send 1 and perform scaling using E2 mapping.
 Fourth element in input sequence is 1.

 Continuing with the last element, the upper and
lower limits of the interval containing the tag are
SCALING
 At this point, if we wished to stop encoding, all
we need to do is inform the receiver of the final
status of the tag value.
 We can do so by sending the binary
representation of any value in the final tag
interval.
 Generally, this value is taken to be u(n).
SCALING
 Therefore, this technique solves the finite
precision problem.
 As we shall soon see, the bits that we have been
sending with each mapping constitute the tag
itself, which satisfies our desire for incremental
encoding.
 The binary sequence generated during the
encoding process in the previous example is
1100011.
 A binary number 1100011 corresponds to the
decimal number 0.7734375.
INTEGER
IMPLEMENTATION
• Encoder Implementation
• Decoder Implementation
INTEGER IMPLEMEMTATION
 We have described a floating-point
implementation of arithmetic coding.
 Let us now repeat the procedure using integer

arithmetic and generate the binary code in the
process.
 ENCODER IMPLEMENTATION:
 The first thing we have to do is decide on the word
length to be used. Given a word length of m, we map
the important values in the 0 1 interval to the range
of 2m binary words.
 The point 0 gets mapped to
 1 gets mapped to
 Equations for Encoder implementation:
 ENCODING ALGORITHM:
ENCODING ALGORITHM:
 ENCODING ALGORITHM:
 E3 condition:
 Second MSB of l(k) = 1 and second MSB of u(k)=0.
 Scale3 -> No.of E3 Mappings.
ENCODING ALGORITHM:
 EXAMPLE:
 Encode the sequence 1 3 2 1 with parameters shown
in table below:
 First we need to select the word length m.

 Maximum available range is 2m values.
ENCODING EXAMPLE
 m should be large enough to accommodate
uniquely the set of values between 0 and
Total_Count.
 For this example add all the values of Count,
Cum_Count and Total_Count.
 40+1+9+50+40+41+50=231.
 231 fits with the range of 28=256.
 So, word length m=8.
 So, use 8-bit representation.
 With this value of m we have

ENCODING EXAMPLE
 Encoding sequence is 1 3 2 1
 First element of the sequence to be encoded
is 1.
 Next element in sequence is 3.
 The MSBs of l(2) and u(2) are both 1.

 Se, Send 1
ENCODING EXAMPLE
 Shift this value out and send it to the decoder.
 All other bits are shifted left by 1 bit, giving
 While the MSBs of the limits are different, the

second MSB of the upper limit is 0, while the
second MSB of the lower limit is 1.
 This is the condition for the E3 mapping.
 We complement the second MSB of both limits

and shift 1 bit to the left, shifting in a 0 as the
LSB of l(2) and a 1 as the LSB of u(2).
ENCODING EXAMPLE
 Also increment Scale3 to a value of 1.

 Next element in sequence is 2.
 Updating the limits we have:
 Two MSBs are equal to 1.

 So, Send 1.
 So, we shift out a 1 and shift left by 1 bit

ENCODING EXAMPLE
 As Scale3 value is > 0 send complement of 1

previously transmitted.
 Send 0
 Decrement Scale3 to 0.
 In the previous l(3) and u(3) two MSBs are equal

to 0.
 So, send 0.
ENCODING EXAMPLE
 Both MSBs are 0 again.
 So, shift the bits.
 Send 0.
 Both MSBs are equal to 1.

 So, shift bits to left.
 Send 1.
 Again MSBs are same 0.

 So, Send 0.
ENCODING EXAMPLE
 Now, MSBs are different. But second MSB of

l(3) =1 and second MSB of u(3) =0 which represents
E3 mapping.
 Increment Scale3 to 1.
ENCODING EXAMPLE
 The next element in the sequence to be encoded
is 1. Therefore,
 Until now our encoded sequence is 1100010

 We have completed our input sequence
1 3 2 1 to 1100010.
 At the point in terminating encoding it is
necessary to send the current status of the tag.
 Send the value of lower limit l(4).
ENCODING EXAMPLE
 As l(4) is 0, we will end up sending eight 0s.
 However, Scale3 at this point is 1.
 Therefore, after we send the first 0 from the

value of l(4), we need to send a 1 before
sending the remaining seven 0s.
 The final transmitted sequence is

1100010010000000.
DECODER IMPLEMENTATION
 DECODING EXAMPLE:
 After encoding the sequence in previous example
we ended up with the following binary sequence:
1100010010000000.
 Treating this as the received sequence and using
the parameters from Table below
DECODING EXAMPLE
 Using the same word length, eight, we read in
the first 8 bits of the received sequence to form
the tag t:
 11000100
 Initialize the lower and upper limits as
 To begin decoding, we compute

DECODING EXAMPLE
 Compare this value to
 Decode the first symbol as 1.
 Once we have decoded a symbol, we update the

lower and upper limits:
DECODING EXAMPLE
 The MSB of the limits are different and the E3

condition does not hold.
 Therefore, we continue decoding without
modifying the tag value.
 To obtain the next symbol, we compare
 which is 48, against the Cum_Count array:
 We decode 3
DECODING EXAMPLE
Update the limits:
 As the MSB of u and l are the same, we shift the

MSB out and read in a 0 for the LSB of 1 and a 1
for the LSB of u.
 We mimic this action for the tag as well, shifting
the MSB out and reading in the next bit from the
received bitstream as the LSB:
DECODING EXAMPLE
 Examining l and u we can see we have an E3
condition.
 Therefore, for l, u, and t, we shift the MSB out,
complement the new MSB, and read in a 0 as the
LSB of l, a 1 as the LSB of u, and the next bit in
the received bitstream as the LSB of t.
 To decode the next symbol, we compute
 Decode 2.
 Updating the limits using this decoded symbol,
we get:
 Exercise:
 Continue decoding the get the final element 1.
 Final decoded sequence would be 1 3 2 1

Chapter 4 - Arithmetic Coding

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 4 - Arithmetic Coding

Uploaded by

Copyright:

Available Formats

 One possible set of tags for representing sequences of

 Because the number of numbers in the unit interval

 In order to do this we need a function that will map

 This is the function we will use in developing the

 Now LB = 0.56, UB = 0.546

 d = 0.56 – 0.546 = 0.014

 R(a1) = 0.546 + 0.014(0.7) = 0.5558

 R(a2) = 0.5558 + 0.014(0.1) = 0.5572

 Midpoint of the interval for the tag is

 Fx(0)=0, Fx(1)=0.8, Fx(2)=0.82, Fx(3)=1

 Use below equations to determine the lower and

 Initialize u(0) 1 and l(0) to 0 .

 The tag is contained in interval [0,0.8].

 The interval containing the tag for the sequence

 The interval for the tag is [0.7712,0.77408]

 and lower limits of the interval containing the

 Start with l(0) 0 and u(0) = 1.

 After decoding the first element of the sequence

 The interval containing the tag is

 If we pick x1 = 2, the interval is [0.8,0.82)

 If we pick x1 = 3, the interval is [0.82,1.0).

 As 0.772352 lies in the interval [0.0,0.8] we

 So, decoded first element in the sequence is 1.

 If we pick x2 =1, the updated interval is [0,0.64],

 Similarly calculate x3 and x4 by your own…

 Consider a source A that generates letters from

 where xn is the value of the random variable

 Initialize u(0) to 1, and l(0) to 0, the first

 The interval [0,0.8] is not confined to either the

 The interval [0.656,0.8) is contained entirely in

 The third element, 2, results in the following

 This interval is contained entirely in the lower

 The interval is still contained entirely in the

 Because the interval containing the tag remains

 Fourth element in input sequence is 1.

 Let us now repeat the procedure using integer

 First we need to select the word length m.

 231 fits with the range of 28=256.

 So, word length m=8.

 So, use 8-bit representation.

 With this value of m we have

 Next element in sequence is 3.

 The MSBs of l(2) and u(2) are both 1.

 While the MSBs of the limits are different, the

 We complement the second MSB of both limits

 Also increment Scale3 to a value of 1.

 Updating the limits we have:

 Two MSBs are equal to 1.

 So, we shift out a 1 and shift left by 1 bit

 As Scale3 value is > 0 send complement of 1

 In the previous l(3) and u(3) two MSBs are equal

 Both MSBs are equal to 1.

 Again MSBs are same 0.

 Now, MSBs are different. But second MSB of

 Until now our encoded sequence is 1100010

 However, Scale3 at this point is 1.

 Therefore, after we send the first 0 from the

 The final transmitted sequence is

 Initialize the lower and upper limits as

 To begin decoding, we compute

 Decode the first symbol as 1.