
EC8093 - DIGITAL IMAGE PROCESSING
Dr.K.Kalaivani
Associate Professor
Dept. of EIE
Easwari Engineering College
UNIT 5
IMAGE COMPRESSION & RECOGNITION

• Definition: Compression means storing data in a format that requires less space than usual.

• Data compression is particularly useful in communications because it enables devices to transmit the same amount of data in fewer bits.

• The bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing data at the receiving end.

• There are a variety of data compression techniques, but only a few have been standardized.

Types of Data Compression

• There are two main types of data compression: Lossy and Lossless.

• In Lossy data compression the message can never be recovered exactly as it was before it was compressed.

• In Lossless data compression the original message can be exactly decoded.

• Lossless compression is ideal for text.

• Huffman coding is a type of lossless data compression.

Compression Algorithms
• Huffman Coding

• Run Length Encoding

• Shift Codes

• Arithmetic Codes

• Block Truncation Codes

• Transform Codes

• Vector Quantization
Huffman Coding

• Huffman coding is a popular compression technique that assigns variable length codes (VLC) to symbols, so that the most frequently occurring symbols have the shortest codes.

• On decompression the symbols are reassigned their original fixed length codes.

• The idea is to use short bit strings to represent the most frequently used characters and longer bit strings to represent less frequently used characters.

• That is, the most common characters, usually space, e, and t, are assigned the shortest codes.

• In this way the total number of bits required to transmit the data can be considerably less than the number required if the fixed length ASCII representation is used.

• A Huffman code is a binary tree with branches assigned the value 0 or 1.

Huffman Algorithm

• To each character, associate a binary tree consisting of just one node.

• To each tree, assign the character’s frequency, which is called the tree’s weight.

• Look for the two lightest-weight trees. If there are more than two, choose among them randomly.

• Merge the two into a single tree with a new root node whose left and right subtrees are the two we chose.

• Assign the sum of weights of the merged trees as the weight of the new tree.

• Repeat the previous step until just one tree is left.

Huffman Coding Example
• Character frequencies
• A: 20% (.20)
• B: 9% (.09)
• C: 15% (.15)
• D: 11% (.11)
• E: 40% (.40)
• F: 5% (.05)

• No other characters in the document

Huffman Code

[First merge: the two lightest trees, B (.09) and F (.05), are joined under a new node BF (.14), with branch 0 to B and branch 1 to F. Remaining weights: E .4, BF .14, D .11, A .20, C .15]
Huffman Code ABCDEF

[Completed tree: root 1.0 splits 0→ABCDF (.6) and 1→E (.4); ABCDF splits 0→BFD (.25) and 1→AC (.35); BFD splits 0→BF (.14) and 1→D (.11); BF splits 0→B (.09) and 1→F (.05); AC splits 0→A (.20) and 1→C (.15)]

• Codes
• A: 010
• B: 0000
• C: 011
• D: 001
• E: 1
• F: 0001

• Note
• None are prefixes of another
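A minimal Python sketch of the merging algorithm above (an illustration, not from the slides), built on heapq. Heap tie-breaking may flip some 0/1 labels relative to the tree shown, but the code lengths come out the same:

    import heapq
    from itertools import count

    def huffman(freqs):
        tick = count()  # tie-breaker so heap tuples stay comparable
        heap = [(w, next(tick), {sym: ""}) for sym, w in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:                    # merge the two lightest trees
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (w1 + w2, next(tick), merged))
        return heap[0][2]

    codes = huffman({"A": .20, "B": .09, "C": .15, "D": .11, "E": .40, "F": .05})
    print(codes)  # code lengths: E = 1 bit; A, C, D = 3 bits; B, F = 4 bits, as above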

Huffman Coding: TENNESSEE

• Encoding: E: 1, S: 00, T: 010, N: 011

[Tree: root (9) splits 0→node (5) and 1→e (4); node (5) splits 0→s (2) and 1→node (3); node (3) splits 0→t (1) and 1→n (2)]

Average code length = (1*4 + 2*2 + 3*2 + 3*1) / 9 = 1.89

Average Code Length

Average code length = Σ(i=1..n) (length_i × frequency_i) / Σ(i=1..n) frequency_i

= { 1(4) + 2(2) + 3(2) + 3(1) } / (4+2+2+1)

= 17 / 9 = 1.89
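The same figure can be checked with a couple of lines of Python (code lengths and counts from the TENNESSEE example):

    pairs = [(1, 4), (2, 2), (3, 2), (3, 1)]   # (code length, frequency): E, S, N, T
    avg = sum(l * f for l, f in pairs) / sum(f for _, f in pairs)
    print(avg)   # 1.888... ≈ 1.89 bits per symbol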

ENTROPY
Entropy is a measure of information content: the more probable a message, the lower its information content and the lower its entropy.

Entropy = −Σ(i=1..n) p_i log2 p_i    (p_i = probability of symbol i)

For TENNESSEE (p_i = 4/9, 2/9, 2/9, 1/9 for E, N, S, T, i.e. ≈ 0.44, 0.22, 0.22, 0.11):

= − ( 0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11 )

= − ( 0.44 log 0.44 + 2(0.22 log 0.22) + 0.11 log 0.11 ) / log 2

≈ 1.8366 bits/symbol
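A quick check of this figure using the exact TENNESSEE probabilities:

    from math import log2

    probs = [4/9, 2/9, 2/9, 1/9]   # E, N, S, T
    entropy = -sum(p * log2(p) for p in probs)
    print(round(entropy, 4))   # 1.8366 bits/symbol, just under the 1.89-bit Huffman average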

Advantages & Disadvantages

• The problem with Huffman coding is that it uses an integral number of bits in each code.

• If the entropy of a given character is 2.5 bits, the Huffman code for that character must be either 2 or 3 bits, not 2.5.

• Though Huffman coding is inefficient because it uses an integral number of bits per code, it is relatively easy to implement and very efficient for encoding and decoding.

• Among codes that assign each symbol a whole number of bits, it provides the best approximation to optimal coding.

Run-length encoding

• Run-length encoding (RLE) is a very simple form of data compression encoding.

• RLE is a lossless type of compression.

• It is based on a simple principle: every stream formed of the same data value (a sequence of repeating values is called a run) is replaced with a count and a single copy of the value.

• This intuitive principle works best on data types in which long sequences of repeated values occur.

• RLE is usually applied to files that contain a large number of consecutive occurrences of the same byte pattern.

• RLE may be used on any kind of data regardless of its content, but the data being compressed determines the compression ratio that will be achieved.

• RLE works on text files that contain multiple spaces for indentation and for formatting paragraphs, tables and charts.

• Digitized signals also contain unchanging runs of samples, so such signals can be compressed by RLE as well.

• A good example of such a signal is a monochrome image; questionable compression would probably be achieved if RLE were used on continuous-tone (photographic) images.

• A fair compression ratio may be achieved if RLE is applied to computer-generated color images.

• RLE is a lossless type of compression and cannot achieve great compression ratios,

• but its strong points are that it can be easily implemented and quickly executed.

Example 1
• Hypothetical scan line:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

• If we apply a simple run-length code to the above scan line, we get the following:

• 12WB12W3B24WB14W
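A minimal sketch of such an encoder in Python (an illustration; counts of 1 are left implicit so the output matches the line above):

    from itertools import groupby

    def rle_encode(text):
        out = []
        for ch, run in groupby(text):            # each run of identical characters
            n = len(list(run))
            out.append((str(n) if n > 1 else "") + ch)
        return "".join(out)

    line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
    print(rle_encode(line))   # 12WB12W3B24WB14W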

Shift code:
A shift code is generated by:
• Arranging the source symbols so that their probabilities are monotonically decreasing,
• Dividing the total number of symbols into symbol blocks of equal size,
• Coding the individual elements within all blocks identically, and
• Adding special shift-up or shift-down symbols to identify each block. Each time a shift-up or shift-down symbol is recognized at the decoder, it moves one block up or down with respect to a pre-defined reference block.
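A small illustrative Python sketch of the idea (the 2-bit block size and the choice of "11" as the shift symbol are assumptions, not from the slides). Symbols are assumed already sorted by decreasing probability; each block holds three 2-bit codewords, and the reserved codeword "11" shifts the decoder to the next block:

    def shift_code(symbols, block_bits=2):
        per_block = 2 ** block_bits - 1      # one codeword per block is reserved
        shift = "1" * block_bits             # the shift symbol
        codes = {}
        for i, sym in enumerate(symbols):
            block, slot = divmod(i, per_block)
            codes[sym] = shift * block + format(slot, f"0{block_bits}b")
        return codes

    print(shift_code(["a", "b", "c", "d", "e"]))
    # {'a': '00', 'b': '01', 'c': '10', 'd': '1100', 'e': '1101'}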
Arithmetic coding
• Unlike the variable-length codes described previously, arithmetic coding generates non-block codes.

• In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist.

• Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.

• Arithmetic coding is a widely used entropy coder; its only drawback is speed, but its compression tends to be better than Huffman coding can achieve.



• The code word itself defines an interval of real numbers between 0 and 1.

• As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger.

• Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence.

• The method approaches the limit set by entropy.



• The idea behind arithmetic coding is to have a probability line, 0-1,

• and to assign every symbol a range on this line based on its probability:

• the higher the probability, the larger the range assigned to it.

• Once we have defined the ranges and the probability line, we start to encode symbols:

• every symbol narrows down where the output floating point number lands.



Example

Symbol   Count   Probability   Range
a        2       0.50          [0.0, 0.5)
b        1       0.25          [0.5, 0.75)
c        1       0.25          [0.75, 1.0)


Algorithm to compute the output number

• Low = 0
• High = 1
• Loop over all the symbols:
      Range = High − Low
      High = Low + Range × high_range of the symbol being coded
      Low  = Low + Range × low_range of the symbol being coded
  (High is computed first; both updates use the old value of Low.)
Symbol    Range      Low value   High value
(start)              0           1
b         1          0.5         0.75
a         0.25       0.5         0.625
c         0.125      0.59375     0.625
a         0.03125    0.59375     0.609375

The output number will be 0.59375 (encoding the message "baca").
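The loop above as a short Python sketch, using the symbol ranges from the example table (note that High is updated before Low, since both updates use the old Low):

    ranges = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

    def arithmetic_encode(message):
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low                # Range = High - Low
            lo_r, hi_r = ranges[sym]
            high = low + span * hi_r
            low = low + span * lo_r
        return low                           # any value in [low, high) would do

    print(arithmetic_encode("baca"))         # 0.59375, as in the table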
Arithmetic coding example

Let the message to be encoded be a1a2a3a3a4, with P(a1) = 0.2, P(a2) = 0.2, P(a3) = 0.4, P(a4) = 0.2, i.e. ranges a1 [0, 0.2), a2 [0.2, 0.4), a3 [0.4, 0.8), a4 [0.8, 1.0).



[Figure: successive subdivision of the interval while encoding a1a2a3a3a4; the working interval narrows from [0, 1) to [0, 0.2), then [0.04, 0.08), [0.056, 0.072), [0.0624, 0.0688), and finally [0.06752, 0.0688)]


So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.
Decode 0.39.
Since 0.4 > 0.39 ≥ 0.2, the first symbol should be a2.

[Figure: decoding proceeds the same way; at each step the current interval is subdivided in the ratios 0.2 / 0.2 / 0.4 / 0.2, the sub-interval containing the code value selects the next symbol, and the process repeats.]
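A matching decoding sketch under the same assumptions, using the a1-a4 ranges from the encoding example: at each step the sub-interval containing the code value names the next symbol, and the code is rescaled into [0, 1):

    ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4),
              "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

    def arithmetic_decode(code, n_symbols):
        message = []
        for _ in range(n_symbols):
            for sym, (lo, hi) in ranges.items():
                if lo <= code < hi:                  # which range holds the code?
                    message.append(sym)
                    code = (code - lo) / (hi - lo)   # rescale and repeat
                    break
        return message

    print(arithmetic_decode(0.068, 5))   # ['a1', 'a2', 'a3', 'a3', 'a4']
    print(arithmetic_decode(0.39, 1))    # ['a2']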
