Download as pdf or txt
Download as pdf or txt
You are on page 1of 66

CHAPTER 4

ARITHMETIC
CODING
Lecture By,
Prof.M.Dhanalakshmi,
Asst.Prof., IT Dept,
SCET,Surat.
INTRODUCTION
 Huffman approach generates variable length
codes.
 Another popular method of generating
variable-length codes called arithmetic
coding.
 When a string is converted to arithmetic
encoding, frequently used characters will be
stored with fewer bits and not-so-frequently
occurring characters will be stored with more
bits, resulting in fewer bits used in total.
DISADVANTAGES OF HUFFMAN CODE
 Huffman coding methods generating codewords
for individual symbols.
 Arithmetic coding differs Huffman coding,
 Rather than separating the input into
component symbols and replacing each with a
code, arithmetic coding encodes the entire
message into a single number.
 It represents the current information as a range,
defined by two numbers.
 Storage of Huffman code for large applications
would be a highly inefficient and time-consuming
procedure.
 Impractical to obtain Huffman codes for long
sequences of symbols.
DISADVANTAGES OF HUFFMAN CODE
 In order to find the Huffman codeword for a
particular sequence of length m, we need
codewords for all possible sequences of length m.
 This fact causes an exponential growth in the
size of the codebook.
 We need a way of assigning codewords to
particular sequences without having to generate
codes for all sequences of that length.
 The arithmetic coding technique fulfills this
requirement.
HUFFMAN AND ARITHEMTIC
 Implemented results show that:
 Compression ratio of arithmetic coding is
better than Huffman coding,
LOGIC OF ARITHMETIC CODE
 In arithmetic coding a unique identifier or tag is
generated for the sequence to be encoded.
 This tag corresponds to a binary fraction, which
becomes the binary code for the sequence.
 Arithmetic coding approach is divided into two
phases:
 1. Phase 1: Unique identifier or tag is generated
for a given sequence of symbols.
 2. Phase 2: Tag is then given a unique binary
code.
 A unique arithmetic code can be generated for a
sequence of length m without the need for
generating codewords for all sequences of length m.
 This is contrast to the situation for Huffman
codes.
CODING A SEQUENCE
 In order to distinguish a sequence of symbols
from another sequence of symbols we need to
tag it with a unique identifier.

 One possible set of tags for representing sequences of


symbols are the numbers in the unit interval [0 ,1) .

 Because the number of numbers in the unit interval


is infinite, it should be possible to assign a unique tag
to each distinct sequence of symbols.

 In order to do this we need a function that will map


sequences of symbols into the unit interval.
CODING A SEQUENCE
 A function that maps random variables, and
sequences of random variables, into the
unit interval is the cumulative distribution
function (cdf) of the random variable associated
with the source.

 This is the function we will use in developing the


arithmetic code.
EXAMPLE
 Consider 3-letter alphabet A={a1,a2,a3} with
P(a1)=0.7, P(a2)=0.1, P(a3)=0,2.
 Using equation Fx(0)=0 , Fx(1)=0.7, Fx(2)=0.8,
Fx(3)=1.
EXAMPLE
 For three letter alphabets a1, a2, a3
 Step 1:
 Calculate the difference d
 d = Upper bound (UB) – lower bound (LB)
 Step 2:
 R(a1) = LB + d(Probability of a1)
 R(a2) = LB + d (Probability of a2)
 Iteration 1:
 First alphabet to be encoded is a1.
 So LB = 0, UB = 0.7
 d = 0.7 – 0 = 0.7
 R(a1) = 0 + 0.7(0.7) = 0.49
 R(a2) = 0.49 + 0.7(0.1) = 0.56
EXAMPLE
 For three letter alphabets a1, a2, a3
 Step 1:
 Calculate the difference d
 d = Upper bound (UB) – lower bound (LB)
 Step 2:
 R(a1) = LB + d(Probability of a1)
 R(a2) = LB + d (Probability of a2)
 Iteration 2:
 Next alphabet to be encoded is a2.
 Now LB = 0.49, UB = 0.56
 d = 0.56 – 0.49 = 0.07
 R(a1) = 0.49 + 0.07(0.7) = 0.539
 R(a2) = 0.539 + 0.07(0.1) = 0.546
EXAMPLE
 For three letter alphabets a1, a2, a3
 Step 1:
 Calculate the difference d
 d = Upper bound (UB) – lower bound (LB)

 Step 2:
 R(a1) = LB + d(Probability of a1)
 R(a2) = LB + d (Probability of a2)
 Iteration 3:
 Next alphabet to be encoded is a3.

 Now LB = 0.56, UB = 0.546

 d = 0.56 – 0.546 = 0.014

 R(a1) = 0.546 + 0.014(0.7) = 0.5558

 R(a2) = 0.5558 + 0.014(0.1) = 0.5572


EXAMPLE
 To calculate the tag:
 Tag = upper bound + lower bound
2
 0.56 + 0.546
2
 Tag value = 0.553
EQUATION TO GENERATE LOWER LIMIT
AND UPPER LIMIT
 For any sequence x= (x1,x2,x3….xn)

 Midpoint of the interval for the tag is


EXAMPLE GENERATING A TAG
 Consider the source A=(a1,a2,a3} with probability
P(a1)=0.8, P(a2)=0.02, P(a3)=0.18.
 Suppose we wish to encode the sequence 1 3 2 1

 Fx(0)=0, Fx(1)=0.8, Fx(2)=0.82, Fx(3)=1

 Use below equations to determine the lower and


upper limits of the interval containing the tag.

 Initialize u(0) 1 and l(0) to 0 .


 Our first element in the sequence is 1.

 The tag is contained in interval [0,0.8].


EXAMPLE GENERATING A TAG
 The second element of the sequence is 3. Using
the update equations we get

 The interval containing the tag for the sequence


1 3 is [0.656,0.8]
EXAMPLE GENERATING A TAG
 The third element, 2, results in the following
update equations:

 The interval for the tag is [0.7712,0.77408]


EXAMPLE GENERATING A TAG
 Last element is 1.
 Continuing with the last element, the upper

 and lower limits of the interval containing the


tag are:
EXAMPLE GENERATING A TAG
 Tag for the sequence 1 3 2 1 is generated as
DECIPHERING
THE TAG
DECIPHERING THE TAG
 Let us try to obtain the sequence represented by
the tag.
 The tag value is 0.772352.

 Start with l(0) 0 and u(0) = 1.

 After decoding the first element of the sequence


x1, the upper and lower limits become

 The interval containing the tag is


[FX(x1−1).FX(x1)].
 We need to find the value of x1 for which
0.772352 lies in the interval [FX(x1−1).FX(x1)].
DECIPHERING THE TAG
 If we pick x1 = 1, the interval is [0,0.8).

 If we pick x1 = 2, the interval is [0.8,0.82)

 If we pick x1 = 3, the interval is [0.82,1.0).

 As 0.772352 lies in the interval [0.0,0.8] we


choose x1 = 1.

 So, decoded first element in the sequence is 1.


DECIPHERING THE TAG
 Second element x2.
 The updated values of l(1) and u(1).

 If we pick x2 =1, the updated interval is [0,0.64],


which does not contain the tag.
 If we pick x2 = 2, the updated interval is [0.6,0.656],
which also does not contain the tag.
 If we pick x2 = 3, the updated interval is [0.656,0.8],
which does contain the tag value of 0.772352.
 Therefore, the second element in the sequence is 3.
 Decoded sequence is 1 3
DECIPHERING THE TAG
 To identify third element x3:

 Similarly calculate x3 and x4 by your own…


 Exercise…To calculate elements x3 and x4.
ALGORITHM TO DECIPHER A TAG
GENERATING A
BINARY CODE
GENERATING A BINARY CODE
 Using the algorithm described in the previous
section, we can obtain a tag for a given sequence x.
 However, the binary code for the sequence is what we
really want to know.
 We want to find a binary code that will represent the
sequence x in a unique and efficient manner.
 The tag forms a unique representation for the
sequence.
 This means that the binary representation of the tag
forms a unique binary code for the sequence.
 The binary representation of some of these values
would be infinitely long, in which case, although the
code is unique, it may not be efficient.
 To make the code efficient, the binary representation
has to be truncated.
GENERATING A BINARY CODE
 is a number in the interval 0 1. A binary
code for can be obtained by taking the binary
representation of this number and truncating it
to bits.
 Example:

 Consider a source A that generates letters from


an alphabet of size four,
GENERATING A BINARY CODE
 Binary code for the input source
ALGORITHM IMPLEMENTATION
 A recursive algorithm for the boundaries of the
interval containing the tag for the sequence being
encoded as

 where xn is the value of the random variable


corresponding to the nth observed symbol, l(n) is
the lower limit of the tag interval at the nth
iteration, and u(n) is the upper limit of the tag
interval at the nth iteration.
 Before we can implement this algorithm, there is
one major problem we have to resolve.
 Recall that the rationale for using numbers in the
interval [0,1] as a tag was that there are an
infinite number of numbers in this interval.
ALGORITHM IMPLEMENTATION
 Consider the values of l(n) and u(n) in previous
example.
 As n gets larger, these values come closer and
closer together.
 This means that in order to represent all the
subintervals uniquely we need increasing
precision as the length of the sequence increases.
 In a system with finite precision, the two values
are bound to converge, and we will lose all
information about the sequence from the point at
which the two values converged.
 To avoid this situation, we need to rescale
the interval.
ALGORITHM IMPLEMENTATION
 We would also like to perform the encoding
incrementally—that is, to transmit portions of
the code as the sequence is being observed,
rather than wait until the entire sequence has
been observed before transmitting the first bit.
 The algorithm we describe in this section takes
care of the problems of synchronized rescaling
and incremental encoding.
 As the interval becomes narrower, we have three
possibilities:
 1. The interval is entirely confined to the lower
half of the unit interval [0,0.5].
 2. The interval is entirely confined to the upper
half of the unit interval [0.5,1.0].
 3. The interval straddles the midpoint of the
unit interval.
ALGORITHM IMPLEMENTATION
 Consider only the first two cases.
 The most significant bit of the binary
representation of all numbers in the interval
[0,0.5] is 0, and the most significant bit of the
binary representation of all numbers in the
interval [0.5,1] is 1.
 Therefore, once the interval gets restricted to
either the upper or lower half of the unit interval,
the most significant bit of the tag is fully
determined.
 Therefore, without waiting to see what the rest of
the sequence looks like, we can indicate to the
decoder whether the tag is confined to the upper
or lower half of the unit interval by sending a 1
for the upper half and a 0 for the lower half.
ALGORITHM IMPLEMENTATION
 Once the encoder and decoder know which half
contains the tag, we can ignore the half of the
unit interval not containing the tag and
concentrate on the half containing the tag.
 As our arithmetic is of finite precision, we can do
this best by mapping the half interval containing
the tag to the full [0,1] interval.
 The mappings required are:
ALGORITHM IMPLEMENTATION
 As soon as we perform either of these mappings,
we lose all information about the most significant
bit.
 However, this should not matter because we have
already sent that bit to the decoder.
 We can now continue with this process,
generating another bit of the tag every time the
tag interval is restricted to either half of the unit
interval.
 This process of generating the bits of the
tag without waiting to see the entire
sequence is called incremental encoding.
EXAMPLE TAG GENERATION WITH
SCALING
 Encode the sequence 1 3 2 1
 P(a1)=0.8, P(a2)=0.02, P(a3)=0.18

 Initialize u(0) to 1, and l(0) to 0, the first


element of the sequence, 1, results in the
following update:

 The interval [0,0.8] is not confined to either the


upper or lower half of the unit interval, so we
proceed.
EXAMPLE TAG GENERATION WITH
SCALING
 The second element of the sequence is 3. This
results in the update:

 The interval [0.656,0.8) is contained entirely in


the upper half of the unit interval
 So send the binary code 1 and rescale using
E2(x)=2(x-0.5)

 The third element, 2, results in the following


update equations:
EXAMPLE TAG GENERATION WITH
SCALING
 The interval for the tag is [0.5424,0.54816],
which is contained entirely in the upper half of
the unit interval.
 Send 1 and go through another scaling
[E2 mapping].

 This interval is contained entirely in the lower


half of the unit interval.
 Send a 0 and use the E1 mapping to rescale:

 The interval is still contained entirely in the


lower half of the unit interval,
EXAMPLE TAG GENERATION WITH
SCALING
 The interval is still contained entirely in the
lower half of the unit interval.
 Send 0 and perform scaling using E1 mapping.

 Because the interval containing the tag remains


in the lower half of the unit interval.
 Send 0 and rescale using E1 mapping.
EXAMPLE TAG GENERATION WITH
SCALING
 Now the interval containing the tag is contained
entirely in the upper half of the unit interval.
 Send 1 and perform scaling using E2 mapping.

 Fourth element in input sequence is 1.


 Continuing with the last element, the upper and
lower limits of the interval containing the tag are
EXAMPLE TAG GENERATION WITH
SCALING
 At this point, if we wished to stop encoding, all
we need to do is inform the receiver of the final
status of the tag value.
 We can do so by sending the binary
representation of any value in the final tag
interval.
 Generally, this value is taken to be u(n).
EXAMPLE TAG GENERATION WITH
SCALING
 Therefore, this technique solves the finite
precision problem.
 As we shall soon see, the bits that we have been
sending with each mapping constitute the tag
itself, which satisfies our desire for incremental
encoding.
 The binary sequence generated during the
encoding process in the previous example is
1100011.
 A binary number 1100011 corresponds to the
decimal number 0.7734375.
INTEGER
IMPLEMENTATION
• Encoder Implementation
• Decoder Implementation
INTEGER IMPLEMEMTATION
 We have described a floating-point
implementation of arithmetic coding.

 Let us now repeat the procedure using integer


arithmetic and generate the binary code in the
process.
 ENCODER IMPLEMENTATION:
 The first thing we have to do is decide on the word
length to be used. Given a word length of m, we map
the important values in the 0 1 interval to the range
of 2m binary words.
 The point 0 gets mapped to
 1 gets mapped to
 Equations for Encoder implementation:
 ENCODING ALGORITHM:
ENCODING ALGORITHM:

 ENCODING ALGORITHM:

 E3 condition:
 Second MSB of l(k) = 1 and second MSB of u(k)=0.
 Scale3 -> No.of E3 Mappings.
ENCODING ALGORITHM:

 EXAMPLE:
 Encode the sequence 1 3 2 1 with parameters shown
in table below:

 First we need to select the word length m.


 Maximum available range is 2m values.
ENCODING EXAMPLE
 m should be large enough to accommodate
uniquely the set of values between 0 and
Total_Count.
 For this example add all the values of Count,
Cum_Count and Total_Count.
 40+1+9+50+40+41+50=231.

 231 fits with the range of 28=256.

 So, word length m=8.

 So, use 8-bit representation.

 With this value of m we have


ENCODING EXAMPLE
 Encoding sequence is 1 3 2 1
 First element of the sequence to be encoded
is 1.

 Next element in sequence is 3.

 The MSBs of l(2) and u(2) are both 1.


 Se, Send 1
ENCODING EXAMPLE
 Shift this value out and send it to the decoder.
 All other bits are shifted left by 1 bit, giving

 While the MSBs of the limits are different, the


second MSB of the upper limit is 0, while the
second MSB of the lower limit is 1.
 This is the condition for the E3 mapping.

 We complement the second MSB of both limits


and shift 1 bit to the left, shifting in a 0 as the
LSB of l(2) and a 1 as the LSB of u(2).
ENCODING EXAMPLE

 Also increment Scale3 to a value of 1.


 Next element in sequence is 2.

 Updating the limits we have:

 Two MSBs are equal to 1.


 So, Send 1.

 So, we shift out a 1 and shift left by 1 bit


ENCODING EXAMPLE

 As Scale3 value is > 0 send complement of 1


previously transmitted.
 Send 0

 Decrement Scale3 to 0.

 In the previous l(3) and u(3) two MSBs are equal


to 0.
 So, send 0.
ENCODING EXAMPLE
 Both MSBs are 0 again.
 So, shift the bits.

 Send 0.

 Both MSBs are equal to 1.


 So, shift bits to left.

 Send 1.

 Again MSBs are same 0.


 So, Send 0.
ENCODING EXAMPLE

 Now, MSBs are different. But second MSB of


l(3) =1 and second MSB of u(3) =0 which represents
E3 mapping.

 Increment Scale3 to 1.
ENCODING EXAMPLE
 The next element in the sequence to be encoded
is 1. Therefore,

 Until now our encoded sequence is 1100010


 We have completed our input sequence
1 3 2 1 to 1100010.
 At the point in terminating encoding it is
necessary to send the current status of the tag.
 Send the value of lower limit l(4).
ENCODING EXAMPLE
 As l(4) is 0, we will end up sending eight 0s.

 However, Scale3 at this point is 1.

 Therefore, after we send the first 0 from the


value of l(4), we need to send a 1 before
sending the remaining seven 0s.

 The final transmitted sequence is


1100010010000000.
DECODER IMPLEMENTATION
DECODER IMPLEMENTATION
DECODER IMPLEMENTATION
 DECODING EXAMPLE:
 After encoding the sequence in previous example
we ended up with the following binary sequence:
1100010010000000.
 Treating this as the received sequence and using
the parameters from Table below
DECODING EXAMPLE
 Using the same word length, eight, we read in
the first 8 bits of the received sequence to form
the tag t:
 11000100

 Initialize the lower and upper limits as

 To begin decoding, we compute


DECODING EXAMPLE
 Compare this value to

 Decode the first symbol as 1.

 Once we have decoded a symbol, we update the


lower and upper limits:
DECODING EXAMPLE

 The MSB of the limits are different and the E3


condition does not hold.
 Therefore, we continue decoding without
modifying the tag value.
 To obtain the next symbol, we compare

 which is 48, against the Cum_Count array:

 We decode 3
DECODING EXAMPLE
Update the limits:

 As the MSB of u and l are the same, we shift the


MSB out and read in a 0 for the LSB of 1 and a 1
for the LSB of u.
 We mimic this action for the tag as well, shifting
the MSB out and reading in the next bit from the
received bitstream as the LSB:
DECODING EXAMPLE
 Examining l and u we can see we have an E3
condition.
 Therefore, for l, u, and t, we shift the MSB out,
complement the new MSB, and read in a 0 as the
LSB of l, a 1 as the LSB of u, and the next bit in
the received bitstream as the LSB of t.

 To decode the next symbol, we compute

 Decode 2.
 Updating the limits using this decoded symbol,
we get:

 Exercise:
 Continue decoding the get the final element 1.
 Final decoded sequence would be 1 3 2 1

You might also like