
EC8093 - DIGITAL IMAGE PROCESSING
Dr.K.Kalaivani
Associate Professor
Dept. of EIE
Easwari Engineering College
UNIT 5
IMAGE COMPRESSION & RECOGNITION

• Definition: Compression means storing data in a format that requires less space than usual.

• Data compression is particularly useful in communications because it enables devices to transmit the same amount of data in fewer bits.

• The bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing data at the receiving end.

• There are a variety of data compression techniques, but only a few have been standardized.

Types of Data Compression

• There are two main types of data compression: Lossy and Lossless.

• In Lossy data compression the message can never be recovered exactly as it was before it was compressed.

• In Lossless data compression the original message can be exactly decoded.

• Lossless compression is ideal for text.

• Huffman coding is a type of lossless data compression.

Compression Algorithms
• Huffman Coding

• Run Length Encoding

• Shift Codes

• Arithmetic Codes

• Block Truncation Codes

• Transform Codes

• Vector Quantization
Huffman Coding

• Huffman coding is a popular compression technique that assigns variable length codes (VLC) to symbols, so that the most frequently occurring symbols have the shortest codes.

• On decompression the symbols are reassigned their original fixed length codes.

• The idea is to use short bit strings to represent the most frequently used characters and longer bit strings to represent less frequently used characters.

• That is, the most common characters, usually space, e, and t, are assigned the shortest codes.

• In this way the total number of bits required to transmit the data can be considerably less than the number required if the fixed length ASCII representation is used.

• A Huffman code is a binary tree with branches assigned the value 0 or 1.

Huffman Algorithm

• To each character, associate a binary tree consisting of just one node.

• To each tree, assign the character’s frequency, which is called the tree’s weight.

• Look for the two lightest-weight trees. If there are more than two, choose among them randomly.

• Merge the two into a single tree with a new root node whose left and right subtrees are the two we chose.

• Assign the sum of weights of the merged trees as the weight of the new tree.

• Repeat the previous step until just one tree is left.

Huffman Coding Example
• Character frequencies
• A: 20% (.20)
• B: 9% (.09)
• C: 15% (.15)
• D: 11% (.11)
• E: 40% (.40)
• F: 5% (.05)

• No other characters in the document

Huffman Code

[First merge: the two lightest trees, B (.09) and F (.05), are joined under a new node BF (.14), with branch 0 to B and branch 1 to F. Remaining weights: E .4, BF .14, D .11, A .20, C .15]
Huffman Code ABCDEF

[Completed tree: root 1.0 splits 0→ABCDF (.6) and 1→E (.4); ABCDF splits 0→BFD (.25) and 1→AC (.35); BFD splits 0→BF (.14) and 1→D (.11); BF splits 0→B (.09) and 1→F (.05); AC splits 0→A (.20) and 1→C (.15)]

• Codes
• A: 010
• B: 0000
• C: 011
• D: 001
• E: 1
• F: 0001

• Note
• None are prefixes of another
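A minimal Python sketch of the merging algorithm above (an illustration, not from the slides), built on heapq. Heap tie-breaking may flip some 0/1 labels relative to the tree shown, but the code lengths come out the same:

    import heapq
    from itertools import count

    def huffman(freqs):
        tick = count()  # tie-breaker so heap tuples stay comparable
        heap = [(w, next(tick), {sym: ""}) for sym, w in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:                    # merge the two lightest trees
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            heapq.heappush(heap, (w1 + w2, next(tick), merged))
        return heap[0][2]

    codes = huffman({"A": .20, "B": .09, "C": .15, "D": .11, "E": .40, "F": .05})
    print(codes)  # code lengths: E = 1 bit; A, C, D = 3 bits; B, F = 4 bits, as above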

Huffman Coding: TENNESSEE

• Encoding: E: 1, S: 00, T: 010, N: 011

[Tree: root (9) splits 0→node (5) and 1→e (4); node (5) splits 0→s (2) and 1→node (3); node (3) splits 0→t (1) and 1→n (2)]

Average code length = (1*4 + 2*2 + 3*2 + 3*1) / 9 = 1.89

Average Code Length

Average code length = Σ(i=1..n) (length_i × frequency_i) / Σ(i=1..n) frequency_i

= { 1(4) + 2(2) + 3(2) + 3(1) } / (4+2+2+1)

= 17 / 9 = 1.89
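The same figure can be checked with a couple of lines of Python (code lengths and counts from the TENNESSEE example):

    pairs = [(1, 4), (2, 2), (3, 2), (3, 1)]   # (code length, frequency): E, S, N, T
    avg = sum(l * f for l, f in pairs) / sum(f for _, f in pairs)
    print(avg)   # 1.888... ≈ 1.89 bits per symbol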

ENTROPY
Entropy is a measure of information content: the more probable a message, the lower its information content and the lower its entropy.

Entropy = −Σ(i=1..n) p_i log2 p_i    (p_i = probability of symbol i)

For TENNESSEE (p_i = 4/9, 2/9, 2/9, 1/9 for E, N, S, T, i.e. ≈ 0.44, 0.22, 0.22, 0.11):

= − ( 0.44 log2 0.44 + 0.22 log2 0.22 + 0.22 log2 0.22 + 0.11 log2 0.11 )

= − ( 0.44 log 0.44 + 2(0.22 log 0.22) + 0.11 log 0.11 ) / log 2

≈ 1.8366 bits/symbol
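A quick check of this figure using the exact TENNESSEE probabilities:

    from math import log2

    probs = [4/9, 2/9, 2/9, 1/9]   # E, N, S, T
    entropy = -sum(p * log2(p) for p in probs)
    print(round(entropy, 4))   # 1.8366 bits/symbol, just under the 1.89-bit Huffman average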

Advantages & Disadvantages

• The problem with Huffman coding is that it uses an integral number of bits in each code.

• If the entropy of a given character is 2.5 bits, the Huffman code for that character must be either 2 or 3 bits, not 2.5.

• Though Huffman coding is inefficient because it uses an integral number of bits per code, it is relatively easy to implement and very efficient for encoding and decoding.

• Among codes that assign each symbol a whole number of bits, it provides the best approximation to optimal coding.

Run-length encoding

• Run-length encoding (RLE) is a very simple form of data compression encoding.

• RLE is a lossless type of compression.

• It is based on a simple principle: every stream formed of the same data value (a sequence of repeating values is called a run) is replaced with a count and a single copy of the value.

• This intuitive principle works best on data types in which long sequences of repeated values occur.

• RLE is usually applied to files that contain a large number of consecutive occurrences of the same byte pattern.

• RLE may be used on any kind of data regardless of its content, but the data being compressed determines the compression ratio that will be achieved.

• RLE works on text files that contain multiple spaces for indentation and for formatting paragraphs, tables and charts.

• Digitized signals also contain unchanging runs of samples, so such signals can be compressed by RLE as well.

• A good example of such a signal is a monochrome image; questionable compression would probably be achieved if RLE were used on continuous-tone (photographic) images.

• A fair compression ratio may be achieved if RLE is applied to computer-generated color images.

• RLE is a lossless type of compression and cannot achieve great compression ratios,

• but its strong points are that it can be easily implemented and quickly executed.

Example 1
• Hypothetical scan line:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

• If we apply a simple run-length code to the above scan line, we get the following:

• 12WB12W3B24WB14W
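A minimal sketch of such an encoder in Python (an illustration; counts of 1 are left implicit so the output matches the line above):

    from itertools import groupby

    def rle_encode(text):
        out = []
        for ch, run in groupby(text):            # each run of identical characters
            n = len(list(run))
            out.append((str(n) if n > 1 else "") + ch)
        return "".join(out)

    line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
    print(rle_encode(line))   # 12WB12W3B24WB14W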

Shift code:
A shift code is generated by:
• Arranging the source symbols so that their probabilities are monotonically decreasing,
• Dividing the total number of symbols into symbol blocks of equal size,
• Coding the individual elements within all blocks identically, and
• Adding special shift-up or shift-down symbols to identify each block. Each time a shift-up or shift-down symbol is recognized at the decoder, it moves one block up or down with respect to a pre-defined reference block.
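A small illustrative Python sketch of the idea (the 2-bit block size and the choice of "11" as the shift symbol are assumptions, not from the slides). Symbols are assumed already sorted by decreasing probability; each block holds three 2-bit codewords, and the reserved codeword "11" shifts the decoder to the next block:

    def shift_code(symbols, block_bits=2):
        per_block = 2 ** block_bits - 1      # one codeword per block is reserved
        shift = "1" * block_bits             # the shift symbol
        codes = {}
        for i, sym in enumerate(symbols):
            block, slot = divmod(i, per_block)
            codes[sym] = shift * block + format(slot, f"0{block_bits}b")
        return codes

    print(shift_code(["a", "b", "c", "d", "e"]))
    # {'a': '00', 'b': '01', 'c': '10', 'd': '1100', 'e': '1101'}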
Arithmetic coding
• Unlike the variable-length codes described previously, arithmetic coding generates non-block codes.

• In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist.

• Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.

• Arithmetic coding is a widely used entropy coder; its only drawback is speed, but its compression tends to be better than Huffman coding can achieve.



• The code word itself defines an interval of real numbers between 0 and 1.

• As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger.

• Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence.

• The method approaches the limit set by entropy.



• The idea behind arithmetic coding is to have a probability line, 0-1,

• and to assign every symbol a range on this line based on its probability:

• the higher the probability, the larger the range assigned to it.

• Once we have defined the ranges and the probability line, we start to encode symbols:

• every symbol narrows down where the output floating point number lands.



Example

Symbol   Count   Probability   Range
a        2       0.50          [0.0, 0.5)
b        1       0.25          [0.5, 0.75)
c        1       0.25          [0.75, 1.0)


Algorithm to compute the output number

• Low = 0
• High = 1
• Loop over all the symbols:
      Range = High − Low
      High = Low + Range × high_range of the symbol being coded
      Low  = Low + Range × low_range of the symbol being coded
  (High is computed first; both updates use the old value of Low.)
Symbol    Range      Low value   High value
(start)              0           1
b         1          0.5         0.75
a         0.25       0.5         0.625
c         0.125      0.59375     0.625
a         0.03125    0.59375     0.609375

The output number will be 0.59375 (encoding the message "baca").
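The loop above as a short Python sketch, using the symbol ranges from the example table (note that High is updated before Low, since both updates use the old Low):

    ranges = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

    def arithmetic_encode(message):
        low, high = 0.0, 1.0
        for sym in message:
            span = high - low                # Range = High - Low
            lo_r, hi_r = ranges[sym]
            high = low + span * hi_r
            low = low + span * lo_r
        return low                           # any value in [low, high) would do

    print(arithmetic_encode("baca"))         # 0.59375, as in the table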
Arithmetic coding example

Let the message to be encoded be a1a2a3a3a4, with P(a1) = 0.2, P(a2) = 0.2, P(a3) = 0.4, P(a4) = 0.2, i.e. ranges a1 [0, 0.2), a2 [0.2, 0.4), a3 [0.4, 0.8), a4 [0.8, 1.0).



[Figure: successive subdivision of the interval while encoding a1a2a3a3a4; the working interval narrows from [0, 1) to [0, 0.2), then [0.04, 0.08), [0.056, 0.072), [0.0624, 0.0688), and finally [0.06752, 0.0688)]


So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.
Decode 0.39.
Since 0.4 > 0.39 ≥ 0.2, the first symbol should be a2.

[Figure: decoding proceeds the same way; at each step the current interval is subdivided in the ratios 0.2 / 0.2 / 0.4 / 0.2, the sub-interval containing the code value selects the next symbol, and the process repeats.]
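A matching decoding sketch under the same assumptions, using the a1-a4 ranges from the encoding example: at each step the sub-interval containing the code value names the next symbol, and the code is rescaled into [0, 1):

    ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4),
              "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

    def arithmetic_decode(code, n_symbols):
        message = []
        for _ in range(n_symbols):
            for sym, (lo, hi) in ranges.items():
                if lo <= code < hi:                  # which range holds the code?
                    message.append(sym)
                    code = (code - lo) / (hi - lo)   # rescale and repeat
                    break
        return message

    print(arithmetic_decode(0.068, 5))   # ['a1', 'a2', 'a3', 'a3', 'a4']
    print(arithmetic_decode(0.39, 1))    # ['a2']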
