
Image Compression

CS474/674 – Prof. Bebis


Chapter 8 (except Sections 8.10-8.12)
Image Compression
• The goal of image compression is to reduce the amount
of data required to represent a digital image while at the
same time preserving as much information as possible.
• Lower memory requirements, faster transmission rates.
Data ≠ Information

• Data and information are not synonymous terms!

• Data is the means by which information is conveyed.

• The same information can be represented by different amounts of data!
Data ≠ Information - Example

Ex1: Your wife, Helen, will meet you at Logan Airport in Boston at 5 minutes past 6:00 pm tomorrow night.

Ex2: Your wife will meet you at Logan Airport at 5 minutes past 6:00 pm tomorrow night.

Ex3: Helen will meet you at Logan at 6:00 pm tomorrow night.
Image Compression (cont’d)

• Lossless
– Information preserving
– Low compression ratios

• Lossy
– Information loss
– High compression ratios

Trade-off: information loss vs compression ratio


Compression Ratio

Compression ratio: $C_R = \frac{n_1}{n_2}$

where $n_1$ and $n_2$ are the number of information-carrying units (e.g., bits) in the original and the compressed representation.

Relative Data Redundancy

$R_D = 1 - \frac{1}{C_R}$

Example: if $C_R = 10$ (i.e., 10:1), then $R_D = 0.9$, i.e., 90% of the data in the original representation is redundant.
Types of Data Redundancy

(1) Coding Redundancy


(2) Interpixel (or Spatial) Redundancy
(3) Psychovisual (or Irrelevant Information)
Redundancy

• The goal of data compression is to reduce one or more of these redundancy types.
Coding Redundancy
• A code is a system of rules to convert information (e.g.,
an image) into another form for efficient storage or
transmission.
– A code consists of a list of
symbols (e.g., letters, numbers,
bits etc.)
– A code word is a sequence of
symbols used to represent some
information (e.g., gray levels).
– The length of a code word is the
number of symbols in the code
word; could be fixed or variable.
Coding Redundancy (cont’d)
• Coding redundancy results from employing inefficient
coding schemes.

• To compare the efficiency of different coding schemes, we can compute the average number of symbols Lavg per code word (or average data content).

• Coding schemes with lower Lavg are more efficient since they save more memory.
Coding Redundancy (cont’d)
N x M image (symbols: bits)

r_k: k-th gray level
l(r_k): # of bits for representing r_k
P(r_k): probability of r_k

$L_{avg} = \sum_{k=0}^{L-1} l(r_k)\, P(r_k)$

(average length of symbols per code word, or average data content in bits per pixel)

Average image size: $N M L_{avg}$ bits
Coding Redundancy - Example

• Case 1: l(rk) = fixed length

L = 8 gray levels, Lavg = 3 bits/pixel

Average image size: 3NM bits


Coding Redundancy – Example (cont’d)
• Case 2: l(rk) = variable length

L = 8 gray levels, Lavg = 2.7 bits/pixel

Average image size: 2.7NM bits
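
A minimal Python sketch (not from the slides) of this computation; the probabilities and code-word lengths below are the classic 8-level textbook example and are used here only for illustration:

```python
# Sketch: L_avg = sum_k l(r_k) * P(r_k) for a fixed-length and a variable-length code.
# The probabilities and code lengths are illustrative (classic 8-level example).

p = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]   # P(r_k), k = 0..7
l_fixed = [3] * 8                                       # 3-bit fixed-length code
l_var   = [2, 2, 2, 3, 4, 5, 6, 6]                      # a variable-length code

def average_length(lengths, probs):
    """Average number of bits per pixel: L_avg = sum l(r_k) * P(r_k)."""
    return sum(l * q for l, q in zip(lengths, probs))

N, M = 512, 512
print("L_avg (fixed)    =", average_length(l_fixed, p), "bits/pixel")   # 3.0
print("L_avg (variable) =", average_length(l_var, p), "bits/pixel")     # 2.7
print("image size (fixed)    =", average_length(l_fixed, p) * N * M, "bits")
print("image size (variable) =", average_length(l_var, p) * N * M, "bits")
```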


Interpixel redundancy
• Interpixel redundancy results from pixel correlations (i.e.,
a pixel value can be reasonably predicted by its neighbors).

(figure: example images with their histograms and the auto-correlation along one scanline)

Correlation: $f(x) \circ g(x) = \int f(a)\, g(x+a)\, da$

auto-correlation: f(x) = g(x)
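
A small sketch (an illustration using numpy, not from the slides) of how the normalized auto-correlation of one scanline could be computed to quantify how strongly neighboring pixels are correlated:

```python
import numpy as np

def normalized_autocorrelation(line, max_shift):
    """Normalized auto-correlation of a 1-D scanline for shifts 0..max_shift.
    A(dn) is the mean of line[y] * line[y + dn]; the result is A(dn) / A(0)."""
    line = line.astype(np.float64)
    N = len(line)
    A = [np.mean(line[:N - dn] * line[dn:]) for dn in range(max_shift + 1)]
    return np.array(A) / A[0]

# Hypothetical usage on row 100 of an image 'img' (2-D numpy array):
# gamma = normalized_autocorrelation(img[100, :], max_shift=50)
# Values close to 1 at small shifts indicate strong interpixel correlation.
```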
Interpixel redundancy (cont’d)
• To reduce interpixel redundancy, some transformation needs
to be applied on the data.

Example: gray-levels of line #100

thresholding transformation: 11……………0000……………………..11…..000…..

Original (grayscale): 1024 bytes
Thresholded (binary): 1024 bits
Psychovisual redundancy
• The human eye is more sensitive to the lower frequencies
than to the higher frequencies in the visual spectrum.
• Idea: discard data that is perceptually insignificant (e.g., using
quantization).
256 gray levels          16 gray levels          16 gray levels + random noise

Example: add a small random number to each pixel prior to quantization.

CR = 8/4 = 2:1
Data ≠ Information (revisited)

Goal: reduce the amount of data while preserving as much information as possible!

Question: What is the minimum amount of data that preserves the information content of an image?

We need some measure of information!


How do we measure information?
• We assume that information is generated by a
probabilistic process.

• Idea: associate information with probability!

• A random event E with probability P(E) contains $I(E) = \log\frac{1}{P(E)} = -\log P(E)$ units of information.

Note: when P(E)=1, I(E)=0 (a certain event carries no information)


How much information does a pixel contain?

• Suppose that pixel values are generated by a random process; then gray-level r_k contains

$I(r_k) = -\log P(r_k)$

units of information!

(assuming statistically independent random events)


How much information does an image contain?

• The average information content of an image is:

$E = \sum_{k=0}^{L-1} I(r_k)\, P(r_k)$

using $I(r_k) = -\log_2 P(r_k)$:

Entropy: $H = -\sum_{k=0}^{L-1} P(r_k)\, \log_2 P(r_k)$ units of info/pixel (e.g., bits/pixel)
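
A minimal sketch (assuming an 8-bit grayscale numpy image) of the first-order entropy estimate defined above:

```python
import numpy as np

def first_order_entropy(img, levels=256):
    """First-order entropy estimate H = -sum P(r_k) log2 P(r_k), in bits/pixel."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                      # ignore gray levels that never occur
    return -np.sum(p * np.log2(p))

# Hypothetical usage:
# H = first_order_entropy(img)   # e.g., 1.81 bits/pixel for the example image
```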
Entropy – Example

H=1.6614 bits/pixel H=8 bits/pixel H=1.566 bits/pixel

The amount of entropy, and thus the information content of an image, is far from intuitive!
Data Redundancy
• Data redundancy can be computed by comparing data to information:

$R = L_{avg} - H$ (bits/pixel)

Do not confuse R with R_D (relative data redundancy)

where L_avg is the average number of bits per pixel and H is the entropy.

Note: if L_avg = H, then R = 0 (no data redundancy)


Data Redundancy - Example

Lavg = 8 bits/pixel, H = 1.81 bits/pixel

R = Lavg − H = 6.19 bits/pixel


Entropy Estimation
• Estimating H reliably is not easy!

First-order estimate of H: use relative frequencies of individual pixels.
Second-order estimate of H: use relative frequencies of pixel blocks (e.g., pairs of adjacent pixels).

Which estimate is more reliable?


Entropy Estimation (cont’d)

• In general, differences between first-order and higher-order entropy estimates indicate the presence of interpixel redundancy.

• As we have discussed earlier, some transformation needs to be applied on the data to address interpixel redundancy.
Estimating Entropy - Example
• Consider a simple transformation that subtracts
column i+1 from column i (starting at i=1):
Difference image:

• No information has been lost – the original image can be completely reconstructed from the difference image by adding column i to column i+1 (starting at i=0).
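
A hedged sketch of one possible version of this column-differencing transformation and its exact inverse (assuming an 8-bit numpy image; the slide's exact indexing convention may differ slightly):

```python
import numpy as np

def column_difference(img):
    """Difference image: keep column 0, replace each later column by
    its difference from the previous column."""
    diff = img.astype(np.int16).copy()
    diff[:, 1:] = img[:, 1:].astype(np.int16) - img[:, :-1].astype(np.int16)
    return diff

def reconstruct(diff):
    """Invert the transformation by cumulative summation along each row."""
    return np.cumsum(diff, axis=1).astype(np.uint8)

# No information is lost:
# assert np.array_equal(reconstruct(column_difference(img)), img)
```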
Estimating Entropy – Example (cont’d)

Entropy of difference image: less than the entropy of the original image (i.e., 1.81 bits/pixel).

• It is possible that an even better transformation could be found, since the 2nd-order entropy estimate is even lower!
General Image Compression and
Transmission Model

We will focus on the Source Encoder/Decoder only.


Encoder – Three Main Components

• Mapper: applies a transformation to the data to account for interpixel redundancies.
Encoder (cont’d)

• Quantizer: quantizes the data to account for psychovisual redundancies.
Encoder (cont’d)

• Symbol encoder: encodes the data to account for coding redundancies.
Decoder - Three Main Components

• The decoder applies the same steps in inverse order.

• Note: the quantization is irreversible in general!


Fidelity Criteria

• How close is the reconstructed image $\hat{f}(x,y)$ to the original image $f(x,y)$?

• Criteria
– Subjective: based on human observers.
– Objective: based on mathematically defined criteria.
Subjective Fidelity Criteria
Objective Fidelity Criteria

• Root mean square error (RMS):

$e_{rms} = \left[\frac{1}{NM}\sum_{x=0}^{N-1}\sum_{y=0}^{M-1}[\hat{f}(x,y) - f(x,y)]^2\right]^{1/2}$

• Mean-square signal-to-noise ratio (SNR):

$SNR_{ms} = \frac{\sum_{x}\sum_{y}\hat{f}(x,y)^2}{\sum_{x}\sum_{y}[\hat{f}(x,y) - f(x,y)]^2}$
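
A minimal numpy sketch of the two objective criteria above:

```python
import numpy as np

def rms_error(f, f_hat):
    """Root-mean-square error between the original f and the reconstruction f_hat."""
    e = f_hat.astype(np.float64) - f.astype(np.float64)
    return np.sqrt(np.mean(e ** 2))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio of the reconstruction."""
    f_hat = f_hat.astype(np.float64)
    e = f_hat - f.astype(np.float64)
    return np.sum(f_hat ** 2) / np.sum(e ** 2)
```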


Objective Fidelity Criteria - Example

RMS = 5.17 RMS = 15.67 RMS = 14.17


Lossless Compression
Taxonomy of Lossless Methods

(Run-length encoding)

(see “Image Compression Techniques” paper)


Huffman Coding
(addresses coding redundancy)
• A variable-length coding technique.

• Source symbols (i.e., gray-levels) are encoded one at a time.


– There is a one-to-one correspondence between source symbols and
code words.
– It is optimal in the sense that it minimizes code word length per
source symbol.
Huffman Coding (cont’d)
• Forward Pass
1. Sort probabilities per symbol (e.g., gray-levels)
2. Combine the two lowest probabilities
3. Repeat Step 2 until only two probabilities remain.
Huffman Coding (cont’d)

• Backward Pass
Assign code symbols going backwards
Huffman Coding (cont’d)
• Lavg assuming binary coding:

• Lavg assuming Huffman coding:


Huffman Decoding

• Decoding can be performed unambiguously using a look-up table.
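
A minimal Python sketch (my own code, not the slides' implementation) of Huffman code construction by repeatedly merging the two lowest probabilities, plus table-based decoding:

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code (symbol -> bit string) from {symbol: probability}.
    Minimal sketch; ties are broken arbitrarily."""
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)     # two lowest probabilities
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_decode(bits, code):
    """Decode a bit string using the inverse look-up table (prefix-free code)."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Illustrative usage (probabilities are assumptions):
# probs = {'a1': 0.4, 'a2': 0.3, 'a3': 0.2, 'a4': 0.1}
# code = huffman_code(probs)
# L_avg = sum(p * len(code[s]) for s, p in probs.items())
```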
Arithmetic (or Range) Coding
(addresses coding redundancy)

• Huffman coding encodes source symbols one at a time, which might not be efficient overall.

• Arithmetic coding assigns sequences of source symbols to variable-length code words.
– There is no one-to-one correspondence between source symbols and code words.
– Slower than Huffman coding but can achieve higher compression.
Arithmetic Coding – Main Idea

• Maps a sequence of symbols to a real number (arithmetic code) in the interval [0, 1).

α1 α2 α3 α3 α4

• The mapping is built incrementally (i.e., as each source symbol arrives) and depends on the source symbol probabilities.
• The original sequence of symbols can be obtained by decoding
the arithmetic code.
Arithmetic Coding – Main Idea (cont’d)
symbol sequence: α1 α2 α3 α3 α4 (known probabilities P(αi))

– Start with the interval [0, 1).

– A sub-interval of [0, 1) is chosen to encode the first symbol α1 in the sequence (based on P(α1)).

– A sub-interval inside the previous sub-interval is chosen to encode the next symbol α2 in the sequence (based on P(α2)).

– Eventually, the whole symbol sequence is encoded by a number within the final sub-interval.
Arithmetic Coding - Example
Encode α1 α2 α3 α3 α4: subdivide [0, 1) based on P(αi), then repeatedly subdivide the sub-interval chosen for each symbol.

final sub-interval: [0.06752, 0.0688)

arithmetic code: 0.068
(can choose any number within the final sub-interval)

Warning: finite precision arithmetic might cause problems due to truncations!
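
A minimal sketch of the interval-subdivision encoder described above. The probabilities P(α1)=0.2, P(α2)=0.2, P(α3)=0.4, P(α4)=0.2 are assumptions (they reproduce the final sub-interval shown on the slide), and infinite-precision floats are assumed:

```python
def arithmetic_encode(sequence, prob):
    """Minimal arithmetic-encoding sketch.
    prob: {symbol: probability}; returns the final sub-interval [low, high)."""
    # cumulative distribution: symbol -> (cum_low, cum_high)
    cdf, c = {}, 0.0
    for sym, p in prob.items():
        cdf[sym] = (c, c + p)
        c += p
    low, high = 0.0, 1.0
    for sym in sequence:
        span = high - low
        c_low, c_high = cdf[sym]
        low, high = low + span * c_low, low + span * c_high
    return low, high          # any number in [low, high) encodes the sequence

# Illustrative usage (assumed probabilities):
# low, high = arithmetic_encode(['a1', 'a2', 'a3', 'a3', 'a4'],
#                               {'a1': 0.2, 'a2': 0.2, 'a3': 0.4, 'a4': 0.2})
# -> [0.06752, 0.0688)
```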


Arithmetic Coding - Example (cont’d)
• The arithmetic code 0.068 can be encoded using binary fractions:

0.068 ≈ 0.000100011 (9 bits) (subject to conversion error; the exact value of 0.000100011 is 0.068359375)
α1 α2 α3 α3 α4

• Huffman Code: 0100011001 (10 bits)

• Fixed Binary Code: 5 x 8 bits/symbol = 40 bits
Arithmetic Decoding - Example
Decode 0.572: subdivide [0, 1) based on P(αi); at each step, the sub-interval that contains 0.572 identifies the next symbol, and that sub-interval is itself subdivided for the following step.

Decoded sequence: α3 α3 α1 α2 α4

A special EOF symbol can be used to terminate the iterations.
LZW Coding
(addresses interpixel redundancy)

• Requires no prior knowledge of symbol probabilities.

• Assigns sequences of source symbols to fixed-length code words.
– There is no one-to-one correspondence between source symbols and code words.
– Included in the GIF, TIFF, and PDF file formats.
LZW Coding
• LZW builds a codebook (or dictionary) of symbol
sequences (i.e., gray-level sequences) as it processes
the image pixels.
• Each symbol sequence is encoded by its dictionary
location.
Dictionary Location    Entry
0                      …
1                      …
…                      …
255                    …
256                    10-120-51
…                      …
511                    -

For example, the sequence of gray-levels 10-120-51 (3 bytes) will be encoded by 256 (1 byte).
LZW Coding (cont’d)
• Initially, the first 256 entries of the dictionary are assigned
to the gray levels 0,1,2,..,255 (i.e., assuming 8 bits/pixel
images)
Initial Dictionary
Dictionary Location Entry

0 0
1 1
… …
255 255
256 -
… …
511 -
LZW Coding – Main Idea
Example: 4 x 4 image

39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126

As the encoder examines the image pixels, gray-level sequences that are not in the dictionary are added to the dictionary.

Dictionary Location    Entry
0                      0
1                      1
…                      …
255                    255
256                    39-39
…                      …
511                    -

- Is 39 in the dictionary? ……. Yes
- What about 39-39? …………. No
* Add 39-39 at location 256

So, 39-39 can be encoded by 256!


Example
4 x 4 image:
39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126

Concatenated Sequence: CS = CR + P
(CR: currently recognized sequence, P: next pixel)

CR = empty
repeat
    P = next pixel
    CS = CR + P
    if CS is in the dictionary D:
        (1) CR = CS
    else:
        (1) add CS to D
        (2) output D(CR)   (the dictionary location of CR)
        (3) CR = P
until no more pixels

Encoded output: 10 code words x 9 bits/code word = 90 bits vs 16 pixels x 8 bits/pixel = 128 bits
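
A minimal Python sketch of the LZW encoding loop above (my own code, not the slides'):

```python
def lzw_encode(pixels, bits_per_pixel=8):
    """Minimal LZW encoding sketch following the CS = CR + P loop above.
    In the slide example, the dictionary has 512 locations (9-bit code words)."""
    # initial dictionary: one entry per gray level
    D = {(g,): g for g in range(2 ** bits_per_pixel)}
    next_code = 2 ** bits_per_pixel           # first free location (256)
    output, CR = [], ()
    for P in pixels:
        CS = CR + (P,)                        # concatenated sequence
        if CS in D:
            CR = CS                           # keep growing the recognized sequence
        else:
            output.append(D[CR])              # output code for recognized sequence
            D[CS] = next_code                 # add the new sequence to the dictionary
            next_code += 1
            CR = (P,)
    if CR:
        output.append(D[CR])
    return output

# Usage for the 4x4 example image (row 39 39 126 126 repeated four times):
# codes = lzw_encode([39, 39, 126, 126] * 4)
# -> 10 code words: [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```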
Decoding LZW

• Decoding can be done using the dictionary again.


• For image transmission, there is no need to transmit the dictionary for decoding.

• The dictionary can be built on the fly by the decoder as it reads the received code words.
Run-length coding (RLC)
(addresses interpixel redundancy)

• Represent sequences of repeating symbols (a “run”) using a compact representation (symbol, count):
(i) symbol: the symbol itself
(ii) count: the number of times the symbol repeats

111110000001 → (1, 5) (0, 6) (1, 1)
aaabbbbbbcc → (a, 3) (b, 6) (c, 2)

• Each pair (symbol, count) can be thought of as a “new” symbol.
– Huffman or arithmetic coding could be used to encode them.
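
A minimal sketch of run-length encoding and decoding:

```python
def run_length_encode(symbols):
    """Encode a sequence as (symbol, count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

def run_length_decode(runs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [s for s, count in runs for _ in range(count)]

# run_length_encode("111110000001") -> [('1', 5), ('0', 6), ('1', 1)]
# run_length_encode("aaabbbbbbcc")  -> [('a', 3), ('b', 6), ('c', 2)]
```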
Bit-plane coding
(addresses interpixel redundancy)

• Process each bit plane individually.

(1) Decompose an image into a series of binary images.


(2) Compress each binary image (e.g., using RLC)
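
A minimal numpy sketch of the decomposition step (1); each binary plane can then be fed to the run-length encoder sketched earlier:

```python
import numpy as np

def bit_planes(img):
    """Decompose an 8-bit image into 8 binary images (bit planes).
    Plane 0 is the least significant bit, plane 7 the most significant."""
    img = img.astype(np.uint8)
    return [(img >> b) & 1 for b in range(8)]

# Example of step (2), compressing each plane with RLC:
# planes = bit_planes(img)
# rle_per_plane = [run_length_encode(plane.flatten().tolist()) for plane in planes]
```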
Lossy Methods - Taxonomy

(see “Image Compression Techniques” paper)


Lossy Compression – Transform Coding
• Transform the image into some other domain to address
interpixel redundancy.

Quantization is irreversible in general!


Example: Fourier Transform

Approximate f(x,y) using fewer F(u,v) coefficients (K << N):

$\hat{f}(x,y) = \sum_{u=0}^{K-1}\sum_{v=0}^{K-1} F(u,v)\, e^{j2\pi(ux+vy)/N}$

Note that the magnitude of the FT decreases as u, v increase!
What transformations can be used?

• Various transformations T(u,v) are possible, for example:


– DFT
– DCT (Discrete Cosine Transform)
– KLT (Karhunen-Loeve Transformation)
– Principal Component Analysis (PCA)

• JPEG employs DCT for handling interpixel redundancy.


DCT (Discrete Cosine Transform)

Forward:

$C(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}$

Inverse:

$f(x,y) = \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\, C(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}$

where $\alpha(u) = \sqrt{1/N}$ if u = 0 and $\alpha(u) = \sqrt{2/N}$ if u > 0 (similarly for $\alpha(v)$).
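
A direct (unoptimized) numpy sketch of the forward and inverse DCT formulas above; for real use, a library routine such as scipy's DCT is much faster:

```python
import numpy as np

def dct_basis(N):
    """B[u, x] = alpha(u) * cos((2x+1) u pi / (2N)), with alpha as defined above."""
    a = np.array([np.sqrt(1.0 / N)] + [np.sqrt(2.0 / N)] * (N - 1))
    x = np.arange(N)
    return a[:, None] * np.cos(np.outer(np.arange(N), 2 * x + 1) * np.pi / (2 * N))

def dct2(f):
    """Forward 2-D DCT of an N x N block: C = B f B^T."""
    B = dct_basis(f.shape[0])
    return B @ f @ B.T

def idct2(C):
    """Inverse 2-D DCT: f = B^T C B."""
    B = dct_basis(C.shape[0])
    return B.T @ C @ B

# Round-trip check on a random 8 x 8 block:
# block = np.random.rand(8, 8)
# assert np.allclose(idct2(dct2(block)), block)
```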
DCT (cont’d)

• Basis functions for a 4x4 image (i.e., cosines of different frequencies).
DCT (cont’d)
DFT WHT DCT

The image is divided into 8 x 8 sub-images (64 coefficients per sub-image).

Sub-images were reconstructed by truncating 50% of the coefficients.

The RMS error was computed between the original and reconstructed images.

RMS error: 2.32 (DFT)    1.78 (WHT)    1.13 (DCT)


DCT (cont’d)

• Sub-image size selection: (plot of RMS error vs. sub-image size)

Reconstructions (75% truncation of coefficients):

original    2 x 2 sub-images    4 x 4 sub-images    8 x 8 sub-images


JPEG Compression

(block diagram: the encoder ends in an entropy encoder and the decoder starts with an entropy decoder)

JPEG became an international image compression standard in 1992.
JPEG - Steps

1. Divide image into 8x8 subimages.

For each subimage do:


2. Shift the gray-levels to the range [-128, 127]
(i.e., reduces the dynamic range requirements of the DCT)

3. Apply DCT; yields 64 coefficients


1 DC coefficient: F(0,0)
63 AC coefficients: F(u,v)
Example

(figure: sub-image shifted to [-128, 127] and its DCT spectrum)

The low-frequency components are around the upper-left corner of the spectrum (not centered!).
JPEG Steps

4. Quantize the coefficients (i.e., reduce the amplitude of coefficients that do not contribute much):

$C_q(u,v) = round\left(\frac{C(u,v)}{Q(u,v)}\right)$

Q(u,v): quantization array


Computing Q[i][j] - Example

• Quantization array Q[i][j]


Example (cont’d)

(figure: DCT coefficients C(u,v), quantization array Q(u,v), and quantized coefficients Cq(u,v))

Small-magnitude coefficients have been truncated to zero!

The “quality” parameter controls how many of them will be truncated.
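
A sketch of the quantization step. The rule used below to generate Q[i][j] from the "quality" parameter is an assumption for illustration only; the actual array on the slide may differ:

```python
import numpy as np

def quantization_array(quality, N=8):
    """One simple way to generate Q[i][j] from a 'quality' parameter
    (assumed rule: Q[i][j] = 1 + (1 + i + j) * quality;
    with this rule, larger quality -> coarser quantization -> more zeros)."""
    i, j = np.indices((N, N))
    return 1 + (1 + i + j) * quality

def quantize(C, Q):
    """Cq(u,v) = round(C(u,v) / Q(u,v)); small coefficients become zero."""
    return np.round(C / Q).astype(np.int32)

def dequantize(Cq, Q):
    """De-quantized coefficients: multiply by Q(u,v)."""
    return Cq * Q

# Typical use on one 8x8 block (with dct2/idct2 from the DCT sketch):
# Q  = quantization_array(quality=2)
# Cq = quantize(dct2(block - 128), Q)
# reconstructed = idct2(dequantize(Cq, Q)) + 128
```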
JPEG Steps (cont’d)
5. Order the coefficients using zig-zag ordering
- Creates long runs of zeros (i.e., ideal for RLC)
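
A small sketch of zig-zag ordering for an 8x8 block of quantized coefficients:

```python
import numpy as np

def zigzag_order(block):
    """Return the 64 coefficients of an 8x8 block in zig-zag order
    (anti-diagonals, alternating direction), so trailing zeros form long runs."""
    N = block.shape[0]
    out = []
    for s in range(2 * N - 1):                 # s = u + v indexes the anti-diagonal
        idx = [(u, s - u) for u in range(N) if 0 <= s - u < N]
        if s % 2 == 0:
            idx = idx[::-1]                    # even diagonals are traversed in reverse
        out.extend(block[u, v] for u, v in idx)
    return np.array(out)

# zigzag_order(Cq) typically ends in a long run of zeros, ideal for run-length coding.
```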
JPEG Steps (cont’d)

6. Encode coefficients:

6.1 Form “intermediate” symbol sequence.

6.2 Encode “intermediate” symbol sequence into a binary sequence (e.g., using Huffman coding).
Intermediate Symbol Sequence – DC coefficient

symbol_1 (SIZE)    symbol_2 (AMPLITUDE)
(6)                (61)

SIZE: # bits needed to encode the amplitude

Example: a DC coefficient with amplitude 61 needs SIZE = 6 bits, so it is represented by the intermediate symbols (6)(61).
Amplitude of DC coefficient: symbol_1 (SIZE), symbol_2 (AMPLITUDE)

Predictive coding: the DC coefficient of every block other than the first is substituted by the difference between the DC coefficient of the current block and that of the previous block.
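
A tiny sketch of this predictive (differencing) step applied to the DC coefficients of consecutive blocks:

```python
def dc_differences(dc_coefficients):
    """Predictive coding of DC terms: keep the first DC value,
    replace each later one by its difference from the previous block's DC."""
    diffs = [dc_coefficients[0]]
    diffs += [dc_coefficients[i] - dc_coefficients[i - 1]
              for i in range(1, len(dc_coefficients))]
    return diffs

# Example (made-up DC values): [150, 155, 149, 152] -> [150, 5, -6, 3]
```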
Intermediate Symbol Sequence – AC coefficient

symbol_1 (RUN-LENGTH, SIZE)    symbol_2 (AMPLITUDE)    end-of-block

RUN-LENGTH: run of zeros preceding the coefficient
SIZE: # bits for encoding the amplitude of the coefficient

Note: If RUN-LENGTH > 15, use the symbol (15,0) as many times as needed, e.g.,
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 → (15, 0) (3, 2) (2)
AC Coefficient Encoding
Symbol_1: Variable Length Code (VLC), pre-computed Huffman codes
Symbol_2: Variable Length Integer (VLI), pre-computed codes

Idea: smaller (and more common) values use fewer bits and take up less space than larger (and less common) values.

DC coefficients are encoded similarly.

(1, 4) (12) → 111110110 1100
              (VLC)     (VLI)
Final Symbol Sequence
Effect of “Quality” parameter

(58k bytes) (21k bytes) (8k bytes)

lower compression higher compression


Effect of “Quality” parameter (cont’d)
Effect of Quantization:
homogeneous 8 x 8 block
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Quantized coefficients → de-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
homogeneous 8 x 8 block (cont’d)

Reconstructed
Error is low!

Original
Effect of Quantization:
non-homogeneous 8 x 8 block
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Quantized coefficients → de-quantized coefficients (multiply by Q(u,v))
Effect of Quantization:
non-homogeneous 8 x 8 block (cont’d)

Reconstructed

Error is high!

Original
Case Study: Fingerprint Compression

• The FBI is digitizing fingerprints at 500 dots per inch using 8 bits of grayscale resolution.

• A single fingerprint card (containing fingerprints from all 10 fingers) turns into about 10 MB of data.

A sample fingerprint image:
768 x 768 pixels = 589,824 bytes
Need to Preserve Fingerprint Details

The "white" spots in the middle of


the black ridges are sweat pores
which are admissible points of
identification in court.

These details are just a couple


pixels wide!
What compression scheme should be used?

• Lossless or lossy compression?

• In practice, lossless compression methods haven’t done better than 2:1 on fingerprints!

• Does JPEG work well for fingerprint compression?


Results using JPEG compression
file size 45853 bytes
compression ratio: 12.9

Fine details have been lost.

The image has an artificial “blocky” pattern superimposed on it.

These artifacts will affect the performance of fingerprint recognition.
WSQ Fingerprint Compression

• An image coding standard for digitized fingerprints employing the Discrete Wavelet Transform (Wavelet/Scalar Quantization, or WSQ).

• Developed and maintained by:
– FBI
– Los Alamos National Lab (LANL)
– National Institute of Standards and Technology (NIST)
Results using WSQ compression
file size 45621 bytes
compression ratio: 12.9

Fine details are better preserved.

No “blocky” artifacts.
WSQ Algorithm

Target bit rate can be set via a parameter, similar to the "quality" parameter in JPEG.
Compression ratio

• FBI’s target bit rate is around 0.75 bits per pixel (bpp)

• This corresponds to a compression ratio of 8/0.75 ≈ 10.7.

• Let’s compare WSQ with JPEG …


Varying compression ratio (cont’d)
0.9 bpp compression
WSQ image, file size 47619 bytes, JPEG image, file size 49658 bytes,
compression ratio 12.4 compression ratio 11.9
Varying compression ratio (cont’d)
0.75 bpp compression
WSQ image, file size 39270 bytes JPEG image, file size 40780 bytes,
compression ratio 15.0 compression ratio 14.5
Varying compression ratio (cont’d)
0.6 bpp compression
WSQ image, file size 30987 bytes, JPEG image, file size 30081 bytes,
compression ratio 19.0 compression ratio 19.6
JPEG Modes

• JPEG supports several different modes


– Sequential Mode
– Progressive Mode
– Hierarchical Mode
– Lossless Mode
(see “Survey” paper)
Sequential Mode

• Image is encoded in a single scan (left-to-right, top-to-bottom); this is the default mode.
Progressive JPEG
• Image is encoded in multiple scans.
• Produces a quick, roughly decoded image when
transmission time is long.
Progressive JPEG (cont’d)

• Main algorithms:
(1) Progressive spectral selection algorithm
(2) Progressive successive approximation algorithm
(3) Hybrid progressive algorithm
Progressive JPEG (cont’d)

(1) Progressive spectral selection algorithm


– Group DCT coefficients into several spectral bands
– Send low-frequency DCT coefficients first
– Send higher-frequency DCT coefficients next

Example:
Example
Progressive JPEG (cont’d)

(2) Progressive successive approximation algorithm


– Send all DCT coefficients but with lower precision.
– Refine DCT coefficients in later scans.

Example:
Example
Example
after 0.9s after 1.6s

after 3.6s after 7.0s


Progressive JPEG (cont’d)

(3) Hybrid progressive algorithm


– Combines spectral selection and successive approximation.
Hierarchical JPEG

• Hierarchical mode encodes the image at different resolutions.

• Image is transmitted in multiple passes with increased resolution at each pass.
Hierarchical JPEG (cont’d)

(pyramid figure: the N x N image is down-sampled to f2 at N/2 x N/2 and f4 at N/4 x N/4; lower-resolution levels are up-sampled when moving to the next, higher-resolution level)
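
A rough sketch of the resolution pyramid behind hierarchical mode. Simple 2x2 averaging for down-sampling and nearest-neighbor up-sampling are illustrative assumptions only; the actual standard specifies its own filters:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Build the hierarchical-mode pyramid by repeated down-sampling by 2
    (simple 2x2 averaging here). Assumes dimensions divisible by 2**(levels-1).
    The encoder sends the coarsest level first, then refinements."""
    pyramid = [img.astype(np.float64)]
    for _ in range(levels - 1):
        f = pyramid[-1]
        down = (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2]) / 4.0
        pyramid.append(down)
    return pyramid[::-1]      # coarsest level (e.g., N/4 x N/4) first

def upsample(f):
    """Nearest-neighbor up-sampling by 2, used when moving to the next finer level."""
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)
```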
Problem 1
Problem 2
Problem 3
Quiz #7
• When: 12/6/2021 @ 1pm
• What: Image Compression
• Duration: 10 minutes
