UNIT 5: Image Compression


• Image compression, the art and science of reducing the amount of data required to
represent a digital image, is one of the most useful and commercially successful
technologies in the field of digital image processing.

Need for compression:


• Data storage
• Data transmission
• Example:

 A two-hour standard-definition movie (roughly 720 × 480 pixels per frame, 24 bits per pixel, 30 frames per second) amounts to about 224 GB of data. Twenty-seven 8.5 GB dual-layer DVDs are needed to store it. To put a two-hour movie on a single DVD, each frame must be compressed, on average, by a factor of 26.3.
 The compression must be even higher for high-definition (HD) television, where image resolutions reach 1920 × 1080 × 24 bits/image.
 Web page images and high–resolution digital camera photos also are compressed
routinely to save storage space and reduce transmission time.
 For example, residential Internet connections deliver data at speeds ranging from 56 kbps via conventional phone lines to more than 12 Mbps for broadband.
 The time required to transmit a small 128 × 128 × 24-bit full-color image over this range of speeds is from about 7 to 0.03 seconds.
 Compression can reduce transmission time by a factor of 2 to 10 or more.
 Image compression plays an important role in many other areas, including televideo
conferencing, remote sensing, document and medical imaging, and facsimile
transmission (FAX).
 An increasing number of applications depend on the efficient manipulation, storage
and transmission of binary, gray-scale and color images.
 Benefits of Image Compression
 It provides potential cost savings associated with sending less data over a switched telephone network, where the cost of a call is usually based on its duration.
 It not only reduces storage requirements but also overall execution time.
 It also reduces the probability of transmission errors since fewer bits are transferred.
 It also provides a level of security against illicit monitoring.

Fundamentals:

 The term data compression refers to the process of reducing the amount of data
required to represent a given quantity of information.
 A clear distinction must be made between data and information. They are not
synonymous. Data are the means by which information is conveyed. Because various
amounts of data may be used to represent the same amount of information,
representations that contain irrelevant or repeated information are said to contain
redundant data.
 If n1 and n2 denote the number of information-carrying units in two data sets that represent the same information, the relative data redundancy RD of the first data set (the one characterized by n1) can be defined as

RD = 1 − 1/CR

where CR, commonly called the compression ratio, is

CR = n1/n2

For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the first representation of the information contains no redundant data.
When n2 << n1, CR → ∞ and RD → 1, implying significant compression and highly redundant data.
Finally, when n2 >> n1, CR → 0 and RD → −∞, indicating that the second data set contains much more data than the original representation. This, of course, is the normally undesirable case of data expansion.
In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively.

A practical compression ratio, such as 10 (or 10:1), means that the first data set has 10
information carrying units (say, bits) for every 1 unit in the second or compressed data set.
The corresponding redundancy of 0.9 implies that 90% of the data in the first data set is
redundant.
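The two definitions above can be checked with a few lines of Python (a minimal sketch; the function names are ours):

```python
def compression_ratio(n1, n2):
    """CR = n1 / n2: information-carrying units in the original
    data set per unit in the compressed data set."""
    return n1 / n2

def relative_redundancy(n1, n2):
    """RD = 1 - 1/CR."""
    return 1.0 - 1.0 / compression_ratio(n1, n2)

# The 10:1 example: 10 units in the first data set for every 1 in the second.
print(compression_ratio(10, 1))    # 10.0
print(relative_redundancy(10, 1))  # 0.9
```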
In digital image compression, three basic data redundancies can be identified and exploited:
(1) Coding Redundancy
(2) Interpixel Redundancy(spatial and temporal redundancy)
(3) Psychovisual Redundancy(irrelevant information)

Data compression is achieved when one or more of these redundancies are reduced or
eliminated.

• Coding Redundancy: An uncompressed image usually is coded with a fixed-length code for each pixel. For example, an image with 256 gray levels is represented by an array of 8-bit integers. Using a variable-length code scheme such as Huffman coding or arithmetic coding may produce compression, because most 2-D intensity arrays contain more bits than are needed to represent the intensities.

• Interpixel Redundancy: A redundancy corresponding to statistical dependencies among pixels, especially between neighboring pixels. Pixels of most 2-D intensity arrays are correlated spatially, and video sequences are temporally correlated.

• Psychovisual Redundancy: A redundancy that arises because the human eye is not equally sensitive to all image signals. Eliminating some of the relatively less important information may therefore be acceptable: most 2-D intensity arrays contain information that is ignored by the human visual system.

Coding Redundancy:

Let us assume that a discrete random variable rk in the interval [0, 1] represents the gray levels of an image and that each rk occurs with probability pr(rk):

pr(rk) = nk / n,  k = 0, 1, ..., L − 1

where L is the number of gray levels, nk is the number of times that the kth gray level appears in the image, and n is the total number of pixels in the image. If the number of bits used to represent each value of rk is l(rk), then the average number of bits required to represent each pixel is

Lavg = Σ (k = 0 to L − 1) l(rk) pr(rk)

That is, the average length of the code words assigned to the various gray-level values is found by summing the product of the number of bits used to represent each gray level and the probability that the gray level occurs. Thus the total number of bits required to code an M X N image is MN·Lavg.
Example of variable length coding:

The average number of bits required to represent each pixel is

Lavg = Σ (k = 0 to L − 1) l(rk) pr(rk) = 0.25(2) + 0.47(1) + 0.25(3) + 0.03(3) = 1.81 bits

C = 8 / 1.81 ≈ 4.42

R = 1 − 1/4.42 = 0.774

Thus, 77.4% of the data in the original 8-bit 2-D intensity array is redundant.
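The computation above is easy to reproduce; a small Python sketch (variable names are ours, values taken from the example):

```python
# Probabilities p_r(r_k) and variable-length code lengths l(r_k) from the example
probs   = [0.25, 0.47, 0.25, 0.03]
lengths = [2, 1, 3, 3]

L_avg = sum(l * p for l, p in zip(lengths, probs))  # average bits/pixel
C = 8 / L_avg                                       # vs. the fixed 8-bit code
R = 1 - 1 / C
print(L_avg)  # ~1.81 bits/pixel, so C ~ 4.42 and R ~ 0.774
```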

Interpixel Redundancy:

1. All 256 intensities are equally probable.
2. The pixels along each line are identical.
3. The intensity of each line was selected randomly.

A run-length pair specifies the start of a new intensity and the number of consecutive pixels that have that intensity. Each 256-pixel line of the original representation is replaced by a single 8-bit intensity value and length 256 in the run-length representation. The compression ratio is

(256 × 256 × 8) / [(256 + 256) × 8] = 128 : 1
Psychovisual Redundancy:

The brightness of a region, as perceived by the eye, depends on factors other than simply the
light reflected by the region.
For example, intensity variations (Mach bands) can be perceived in an area of constant
intensity. Such phenomena result from the fact that the eye does not respond with equal
sensitivity to all visual information.
Certain information simply has less relative importance than other information in normal
visual processing. This information is said to be psychovisually redundant. It can be
eliminated without significantly impairing the quality of image perception.
That psychovisual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative analysis of
every pixel value in the image.
In general, an observer searches for distinguishing features such as edges or textural regions
and mentally combines them into recognizable groupings. The brain then correlates these
groupings with prior knowledge in order to complete the image interpretation process.
Unlike coding and interpixel redundancy, psychovisual redundancy is associated with real or
quantifiable visual information. Its elimination is possible only because the information itself
is not essential for normal visual processing. Since the elimination of psychovisually
redundant data results in a loss of quantitative information, it is commonly referred to as
quantization.
This terminology is consistent with normal usage of the word, which generally means the
mapping of a broad range of input values to a limited number of output values. As it is an
irreversible operation (visual information is lost), quantization results in lossy data
compression.

Fidelity criteria:

The removal of psychovisually redundant data results in a loss of real or quantitative visual
information. Because information of interest may be lost, a repeatable or reproducible means
of quantifying the nature and extent of information loss is highly desirable.

Two general classes of criteria are used as the basis for such an assessment:

A) Objective fidelity criteria (mathematically defined criteria)

B) Subjective fidelity criteria (based on human observers)

Subjective fidelity criteria:


Although objective fidelity criteria offer a simple and convenient mechanism for
evaluating information loss, most decompressed images ultimately are viewed by humans.
Consequently, measuring image quality by the subjective evaluations of a human observer
often is more appropriate. This can be accomplished by showing a "typical" decompressed
image to an appropriate cross section of viewers and averaging their evaluations. The
evaluations may be made using an absolute rating scale or by means of side-by-side
comparisons of f(x, y) and f^(x, y).

Objective fidelity criteria:

When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is said to be
based on an objective fidelity criterion. A good example is the root-mean-square (rms) error between an input and output image. Let f(x, y) represent an input image and let f^(x, y) denote an estimate or approximation of f(x, y) that results from compressing and subsequently decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f^(x, y) can be defined as

e(x, y) = f^(x, y) − f(x, y)

so that the total error between the two images is

Σ (x = 0 to M − 1) Σ (y = 0 to N − 1) [f^(x, y) − f(x, y)]

where the images are of size M X N.

The root-mean-square error, erms, between f(x, y) and f^(x, y) is then the square root of the squared error averaged over the M X N array, or

erms = [ (1/MN) Σ (x = 0 to M − 1) Σ (y = 0 to N − 1) [f^(x, y) − f(x, y)]^2 ]^(1/2)
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the compressed-decompressed image. If f^(x, y) is considered to be the sum of the original image f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio of the output image, denoted SNRms, is

SNRms = Σ Σ f^(x, y)^2 / Σ Σ [f^(x, y) − f(x, y)]^2

(with both double sums taken over x = 0 to M − 1 and y = 0 to N − 1). The rms value of the signal-to-noise ratio, denoted SNRrms, is obtained by taking the square root of the equation above.
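Both objective measures are straightforward to compute. A small Python sketch over plain nested lists (the function names are ours):

```python
import math

def rms_error(f, g):
    """e_rms: square root of the squared error averaged over the M x N array."""
    M, N = len(f), len(f[0])
    total = sum((g[x][y] - f[x][y]) ** 2 for x in range(M) for y in range(N))
    return math.sqrt(total / (M * N))

def snr_ms(f, g):
    """Mean-square SNR: sum of f^(x,y)^2 over sum of squared error."""
    signal = sum(v * v for row in g for v in row)
    noise = sum((g[x][y] - f[x][y]) ** 2
                for x in range(len(f)) for y in range(len(f[0])))
    return signal / noise

f = [[100, 100], [100, 100]]   # original image
g = [[101,  99], [101,  99]]   # decompressed approximation f^
print(rms_error(f, g))         # 1.0
print(snr_ms(f, g))            # 10001.0
```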
Measuring Image Information:
A key question in image compression is: “What is the minimum amount of data that is
sufficient to describe completely an image without loss of information?”
How do we measure the information content of an image?
We assume that information generation is a probabilistic process.

A random event E with probability P(E) is said to contain

I(E) = log(1/P(E)) = −log P(E)

units of information. In particular, I(E) = 0 when P(E) = 1: an event that is certain to occur conveys no information.

• Suppose that gray-level values are generated by a random process; then rk contains

I(rk) = −log P(rk)

units of information.

• Average information content (entropy) of an image:

H = Σ (k = 0 to L − 1) I(rk) pr(rk) = −Σ (k = 0 to L − 1) pr(rk) log pr(rk)

units/pixel (e.g., bits/pixel when the logarithm is base 2).

Redundancy:

R = 1 − 1/C, where C = Lavg/H

If Lavg = H, then C = 1 and R = 0 (no redundancy).
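The first-order entropy estimate can be computed directly from an image's gray-level histogram; a minimal Python sketch:

```python
import math
from collections import Counter

def entropy_bits(pixels):
    """H = -sum p_r(r_k) * log2 p_r(r_k), estimated from the pixel histogram."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())

# Two equally likely intensities: H = 1 bit/pixel, so a fixed 8-bit
# code for this source would be highly redundant.
print(entropy_bits([0, 255, 0, 255]))  # 1.0
```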

Image compression models:


As Fig. 3.1 shows, a compression system consists of two distinct structural blocks: an encoder
and a decoder. An input image f(x, y) is fed into the encoder, which creates a set of symbols
from the input data. After transmission over the channel, the encoded representation is fed to
the decoder, where a reconstructed output image f^(x, y) is generated. In general, f^(x, y) may
or may not be an exact replica of f(x, y). If it is, the system is error free or information
preserving; if not, some level of distortion is present in the reconstructed image.
Both the encoder and decoder shown in Fig. 3.1 consist of two relatively independent
functions or subblocks. The encoder is made up of a source encoder, which removes input
redundancies, and a channel encoder, which increases the noise immunity of the source
encoder's output. The decoder includes a channel decoder followed by a source decoder. If the
channel between the encoder and decoder is noise free, the channel encoder and decoder are
omitted, and the general encoder and decoder become the source encoder and decoder,
respectively.

Fig.3.1 A general compression system model


The Source Encoder and Decoder:
The source encoder is responsible for reducing or eliminating any coding, interpixel, or
psychovisual redundancies in the input image.
As Fig. 3.2 (a) shows, each operation is designed to reduce one of the three redundancies.
Figure 3.2 (b) depicts the corresponding source decoder.
In the first stage of the source encoding process, the mapper transforms the input data into a
(usually nonvisual) format designed to reduce interpixel redundancies in the input image. This
operation generally is reversible and may or may not reduce directly the amount of data
required to represent the image.
Fig.3.2 (a) Source encoder and (b) source decoder model

Run-length coding is an example of a mapping that directly results in data compression in this
initial stage of the overall source encoding process. The representation of an image by a set of
transform coefficients is an example of the opposite case. Here, the mapper transforms the
image into an array of coefficients, making its interpixel redundancies more accessible for
compression in later stages of the encoding process.
The second stage, or quantizer block in Fig. 3.2 (a), reduces the accuracy of the
mapper's output in accordance with some pre-established fidelity criterion. This stage reduces
the psychovisual redundancies of the input image. This operation is irreversible. Thus it must
be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder creates a
fixed- or variable-length code to represent the quantizer output and maps the output in
accordance with the code. The term symbol coder distinguishes this coding operation from the
overall source encoding process. In most cases, a variable-length code is used to represent the
mapped and quantized data set. It assigns the shortest code words to the most frequently
occurring output values and thus reduces coding redundancy. The operation is reversible.
Upon completion of the symbol coding step, the input image has been processed to remove
each of the three redundancies.
Figure 3.2(a) shows the source encoding process as three successive operations, but all three operations are not necessarily included in every compression system. For example, the quantizer must be omitted when error-free compression is desired.
In predictive compression systems, for instance, the mapper and quantizer are often represented by a single block, which simultaneously performs both operations.
The source decoder shown in Fig. 3.2(b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations
of the source encoder's symbol encoder and mapper blocks. Because quantization results in
irreversible information loss, an inverse quantizer block is not included in the general source
decoder model shown in Fig. 3.2(b).

IMAGE COMPRESSION TECHNIQUES:


Image compression techniques are broadly classified into two categories, depending upon whether or not an exact replica of the original image can be reconstructed from the compressed image.
These are
 lossless and
 lossy techniques
Lossless Compression Techniques:
In lossless compression techniques, the original image can be perfectly recovered from the compressed (encoded) image. These techniques are also called noiseless, since they do not add noise to the signal (image). Lossless compression is also known as entropy coding, since it uses statistical/decomposition techniques to eliminate or minimize redundancy. Lossless compression is used only for a few applications with stringent requirements, such as medical imaging.
Lossy Compression Techniques:
Lossy schemes give much better compression ratios as compared with Lossless schemes.
These types of schemes are commonly used as the quality of the reconstructed image is
satisfactory for most applications.
Major performance considerations of a Lossy Compression scheme include:
1. Compression ratio
2. Signal-to- noise ratio
3. Speed of encoding and decoding.
Lossy Compression techniques include following schemes:
1. Transformation Coding
2. Vector Quantization
3. Fractal Coding
4. Block Truncation Coding
5. Sub band Coding.

Some basic compression methods:


Variable-Length Coding:

The simplest approach to error-free image compression is to reduce only coding redundancy. Coding redundancy normally is present in any natural binary encoding of the gray levels in an image. It can be eliminated by coding the gray levels. To do so requires construction of a variable-length code that assigns the shortest possible code words to the most probable gray levels. Here, we examine several optimal and near-optimal techniques for constructing such a code. These techniques are formulated in the language of information theory. In practice, the source symbols may be either the gray levels of an image or the output of a gray-level mapping operation (pixel differences, run lengths, and so on).

Huffman coding:

The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually, Huffman
coding yields the smallest possible number of code symbols per source symbol. In terms of
the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to the
constraint that the source symbols be coded one at a time.

The first step in Huffman's approach is to create a series of source reductions by ordering the
probabilities of the symbols under consideration and combining the lowest probability
symbols into a single symbol that replaces them in the next source reduction.
Figure 4.1 illustrates this process for binary coding .
At the far left, a hypothetical set of source symbols and their probabilities are ordered from
top to bottom in terms of decreasing probability values. To form the first source reduction, the
bottom two probabilities, 0.06 and 0.04, are combined to form a "compound symbol" with
probability 0.1. This compound symbol and its associated probability are placed in the first
source reduction column so that the probabilities of the reduced source are also ordered from
the most to the least probable.
This process is then repeated until a reduced source with two symbols (at the far right) is
reached.
The second step in Huffman's procedure is to code each reduced source, starting with the
smallest source and working back to the original source. The minimal length binary code for a
two-symbol source, of course, is the symbols 0 and 1.

As Fig. 4.2 shows, these symbols are assigned to the two symbols on the right (the assignment
is arbitrary; reversing the order of the 0 and 1 would work just as well). As the reduced source
symbol with probability 0.6 was generated by combining two symbols in the reduced source
to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and 1 are
arbitrarily

Fig.4.1 Huffman source reductions.

Fig.4.2 Huffman code assignment procedure.


appended to each to distinguish them from each other. This operation is then repeated for each
reduced source until the original source is reached. The final code appears at the far left in Fig.
4.2.
The average length of this code is

Lavg = 0.4(1) + 0.3(2) + 0.1(3) + 0.1(4) + 0.06(5) + 0.04(5) = 2.2 bits/symbol

and the entropy of the source is 2.14 bits/symbol. The resulting Huffman code efficiency is 2.14/2.2 = 0.973.
Huffman's procedure creates the optimal code for a set of symbols and probabilities subject to
the constraint that the symbols be coded one at a time. After the code has been created, coding
and/or decoding is accomplished in a simple lookup table manner. The code itself is an
instantaneous uniquely decodable block code. It is called a block code because each source
symbol is mapped into a fixed sequence of code symbols. It is instantaneous, because each
code word in a string of code symbols can be decoded without referencing succeeding
symbols. It is uniquely decodable, because any string of code symbols can be decoded in only
one way. Thus, any string of Huffman encoded symbols can be decoded by examining the
individual symbols of the string in a left to right manner. For the binary code of Fig. 4.2, a
left-to-right scan of the encoded string 010100111100 reveals that the first valid code word is
01010, which is the code for symbol a3 .The next valid code is 011, which corresponds to
symbol a1. Continuing in this manner reveals the completely decoded message to be
a3a1a2a2a6.
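Huffman's two-step procedure (source reductions, then code assignment) can be sketched in Python with a priority queue. The exact codewords depend on tie-breaking and may differ from those in Fig. 4.2, but the average length of 2.2 bits/symbol is the same; the symbol probabilities below are those of the figure:

```python
import heapq
import itertools

def huffman_code(probs):
    """Repeatedly merge the two least probable (compound) symbols,
    prepending 0/1 to the codewords on each side of the merge."""
    order = itertools.count()  # tie-breaker so heap entries stay comparable
    heap = [(p, next(order), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(order), merged))
    return heap[0][2]

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
code = huffman_code(probs)
L_avg = sum(p * len(code[s]) for s, p in probs.items())
print(round(L_avg, 2))  # 2.2
```

Any valid Huffman code for this source achieves the same minimal average length, even though the individual codewords may differ.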
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates nonblock
codes.
In arithmetic coding, a one-to-one correspondence between source symbols and code words
does not exist.
Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic
code word. The code word itself defines an interval of real numbers between 0 and 1. As the
number of symbols in the message increases, the interval used to represent it becomes smaller
and the number of information units (say, bits) required to represent the interval becomes
larger. Each symbol of the message reduces the size of the interval in accordance with its
probability of occurrence. Because the technique does not require, as does Huffman's
approach, that each source symbol translate into an integral number of code symbols (that is,
that the symbols be coded one at a time), it achieves (but only in theory) the bound established
by the noiseless coding theorem.
Fig.5.1 Arithmetic coding procedure
Figure 5.1 illustrates the basic arithmetic coding process. Here, a five-symbol sequence or
message, a1a2a3a3a4, from a four-symbol source is coded. At the start of the coding process,
the message is assumed to occupy the entire half-open interval [0, 1). As Table 5.1 shows, this interval is initially subdivided into four regions based on the probabilities of each source symbol. Symbol a1, for example, is associated with subinterval [0, 0.2). Because it is the first symbol of the message being coded, the message interval is initially narrowed to [0, 0.2). Thus
in Fig. 5.1 [0, 0.2) is expanded to the full height of the figure and its end points labeled by the
values of the narrowed range.
The narrowed range is then subdivided in accordance with the original source symbol
probabilities and the process continues with the next message symbol.

Table 5.1 Arithmetic coding example

In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it to
[0.056, 0.072), and so on. The final message symbol, which must be reserved as a special end-
of-message indicator, narrows the range to [0.06752, 0.0688). Of course, any number within
this subinterval—for example, 0.068—can be used to represent the message.
In the arithmetically coded message of Fig. 5.1, three decimal digits are used to represent the
five-symbol message. This translates into 3/5 or 0.6 decimal digits per source symbol and
compares favorably with the entropy of the source, which is 0.58 decimal digits or 10-ary
units/symbol. As the length of the sequence being coded increases, the resulting arithmetic
code approaches the bound established by the noiseless coding theorem.
In practice, two factors cause coding performance to fall short of the bound: (1) the addition of
the end-of-message indicator that is needed to separate one message from an-other; and (2) the
use of finite precision arithmetic. Practical implementations of arithmetic coding address the
latter problem by introducing a scaling strategy and a rounding strategy (Langdon and
Rissanen [1981]). The scaling strategy renormalizes each subinterval to the [0, 1) range before
subdividing it in accordance with the symbol probabilities. The rounding strategy guarantees
that the truncations associated with finite precision arithmetic do not prevent the coding
subintervals from being represented accurately.
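The interval narrowing of Fig. 5.1 can be reproduced in a few lines of Python (a sketch; finite-precision floats stand in for exact arithmetic, so the endpoints are only approximate):

```python
# Source model of Table 5.1: symbol -> its subinterval of [0, 1)
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def message_interval(message):
    """Narrow [0, 1) symbol by symbol; each symbol shrinks the current
    interval in proportion to its probability of occurrence."""
    low, high = 0.0, 1.0
    for s in message:
        s_low, s_high = model[s]
        span = high - low
        low, high = low + span * s_low, low + span * s_high
    return low, high

low, high = message_interval(["a1", "a2", "a3", "a3", "a4"])
print(low, high)            # about 0.06752 and 0.0688
print(low <= 0.068 < high)  # True: 0.068 codes the whole message
```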

Run-Length Coding:

Run-length encoding is a very simple form of data compression in which runs of data
(that is, sequences in which the same data value occurs in many consecutive data
elements) are stored as a single data value and count, rather than as the original run.
This is most useful on data that contains many such runs: for example, relatively simple
graphic images such as icons, line drawings, and animations.
Run-length Encoding, or RLE is a technique used to reduce the size of a repeating string
of characters.
This repeating string is called a run. Typically, RLE encodes a run of symbols into two
bytes: a count and a symbol.
Compression is achieved by eliminating a simple form of spatial redundancy: groups of
identical intensities.
When there are few or no runs of identical pixels, run-length encoding results in data
expansion.
RLE cannot achieve high compression ratios compared to other compression methods.
It is easy to implement and is quick to execute.
Run-length encoding is supported by most bitmap file formats such as TIFF, BMP and
PCX.
Example:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWW
WWWWWWWWWWWWWBWWWWWWWWWWWWWW
RLE coding:
12W1B12W3B24W1B14W
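The encoding above can be sketched with `itertools.groupby` in Python:

```python
from itertools import groupby

def rle_encode(s):
    """Store each run as <count><symbol> instead of the original run."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(s))

data = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
print(rle_encode(data))  # 12W1B12W3B24W1B14W
```

Note that a string with no repeats, such as "AB", encodes to "1A1B" and actually grows, which is the data-expansion case mentioned above.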
In one-dimensional run-length coding schemes for binary images, runs of continuous 1s or
0s in every row of an image are encoded together, resulting in substantial bit savings.
Either, it is to be indicated whether the row begins with a run of 1s or 0s, or the encoding
of a row compulsorily begins with 1s, where the count could be zero, if it begins with 0s.
Run-length encoding may be illustrated with the following example of a row of an image:
0001101000111110010111111111111111011100000000010000111111100000001
Let us say, we adopt the second scheme, to begin with runs of 1s. Hence, the first run
count in the given binary sequence is 0.
Then we have a run of three 0s. Hence, the next count is 3. Proceeding in this manner, the
reader may verify that the given binary sequence gets encoded to:
0,3,2,1,1,3,5,2,1,1,15,1,3,9,1,4,7,7,1
Each run-count value for 1s and 0s is associated with some probability, which may be determined through statistical observation. It is therefore possible to estimate the entropies H0 and H1 for the runs of 0s and 1s, respectively. If L0 and L1 are the average values of the run lengths for 0s and 1s, the run-length entropy of the image is expressed as

HRL = (H0 + H1) / (L0 + L1)
We may apply some variable-length coding to encode the run values. Although run-length coding is essentially a compression technique for binary images, it can be applied to gray-scale images in an indirect way by first decomposing the gray-scale image into bit-plane images and then applying run-length coding to each of the bit-plane images.

Bit-Plane Coding:
An effective technique for reducing an image's inter pixel redundancies is to process
the image's bit planes individually. The technique, called bit-plane coding, is based on the
concept of decomposing a multilevel (monochrome or color) image into a series of binary
images and compressing each binary image via one of several well-known binary compression
methods.
Bit-plane decomposition:
The gray levels of an m-bit gray-scale image can be represented in the form of the base 2 polynomial

a(m−1) 2^(m−1) + a(m−2) 2^(m−2) + ... + a1 2^1 + a0 2^0

Based on this property, a simple method of decomposing the image into a collection of binary
images is to separate the m coefficients of the polynomial into m 1-bit bit planes.

The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, while the (m − 1)st-order bit plane contains the a(m−1) bits or coefficients. In general, each bit plane is numbered
from 0 to m-1 and is constructed by setting its pixels equal to the values of the appropriate bits
or polynomial coefficients from each pixel in the original image.
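Decomposition is a matter of shifting and masking; a minimal Python sketch over a flat list of pixels:

```python
def bit_planes(pixels, m=8):
    """Plane k collects bit a_k of every pixel: (p >> k) & 1."""
    return [[(p >> k) & 1 for p in pixels] for k in range(m)]

planes = bit_planes([127, 128])
print(planes[7])  # [0, 1] -- most significant bits of 01111111 and 10000000
print(planes[0])  # [1, 0] -- least significant bits
```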

The inherent disadvantage of this approach is that small changes in gray level can have a
significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is
adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will contain a
corresponding 0 to 1 (or 1 to 0) transition. For example, as the most significant bits of the two
binary codes for 127 and 128 are different, bit plane 7 will contain a zero-valued pixel next to
a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at that point.

An alternative decomposition approach (which reduces the effect of small gray-level variations) is to first represent the image by an m-bit Gray code. The m-bit Gray code g(m−1)...g2g1g0 that corresponds to the polynomial above can be computed from

gi = ai ⊕ a(i+1),  0 ≤ i ≤ m − 2
g(m−1) = a(m−1)

Here, ⊕ denotes the exclusive OR operation. This code has the unique property that successive code words differ in only one bit position. Thus, small changes in gray level are less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent, only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes that correspond to 127 and 128 are 01000000 and 11000000, respectively.
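Bitwise, the Gray-code rule collapses to g = b XOR (b >> 1), since shifting right by one aligns each a(i+1) under ai. A quick check in Python:

```python
def binary_to_gray(b):
    """g_i = a_i XOR a_(i+1) for 0 <= i <= m-2, and g_(m-1) = a_(m-1)."""
    return b ^ (b >> 1)

print(format(binary_to_gray(127), "08b"))  # 01000000
print(format(binary_to_gray(128), "08b"))  # 11000000
# The two Gray codes differ only in bit 7, so only one bit plane is affected.
```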

Lossless Predictive Coding:


The error-free compression approach does not require decomposition of an image into
a collection of bit planes. The approach, commonly referred to as lossless predictive coding, is
based on eliminating the interpixel redundancies of closely spaced pixels by extracting and
coding only the new information in each pixel. The new information of a pixel is defined as
the difference between the actual and predicted value of that pixel.
Figure 8.1 shows the basic components of a lossless predictive coding system. The
system consists of an encoder and a decoder, each containing an identical predictor. As each
successive pixel of the input image, denoted fn, is introduced to the encoder, the predictor
generates the anticipated value of that pixel based on some number of past inputs. The output
of the predictor is then rounded to the nearest integer, denoted f^n, and used to form the difference or prediction error

en = fn − f^n

which is coded using a variable-length code (by the symbol encoder) to generate the next element of the compressed data stream.
Fig.8.1 A lossless predictive coding model: (a) encoder; (b) decoder

The decoder of Fig. 8.1(b) reconstructs en from the received variable-length code words and performs the inverse operation

fn = en + f^n

Various local, global, and adaptive methods can be used to generate f^n. In most cases, however, the prediction is formed by a linear combination of m previous pixels. That is,

f^n = round[ Σ (i = 1 to m) αi fn−i ]

where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1,2,..., m are prediction coefficients.
In raster scan applications, the subscript n indexes the predictor outputs in accordance with
their time of occurrence. That is, fn, f^n and en in Eqns. above could be replaced with the more
explicit notation f (t), f^(t), and e (t), where t represents time. In other cases, n is used as an
index on the spatial coordinates and/or frame number (in a time sequence of images) of an
image.
In 1-D linear predictive coding, for example, the equation above can be written as

f^(x, y) = round[ Σ (i = 1 to m) αi f(x, y − i) ]

where each subscripted variable is now expressed explicitly as a function of spatial coordinates x and y. The equation indicates that the 1-D linear prediction f^(x, y) is a function of the previous pixels on the current line alone. In 2-D predictive coding, the prediction is a function
of the previous pixels in a left-to-right, top-to-bottom scan of an image. In the 3-D case, it is
based on these pixels and the previous pixels of preceding frames. Equation above cannot be
evaluated for the first m pixels of each line, so these pixels must be coded by using other
means (such as a Huffman code) and considered as an overhead of the predictive coding
process.
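A first-order previous-pixel predictor (m = 1, α1 = 1) makes the scheme concrete; this Python sketch encodes one image row and shows that decoding is exact:

```python
def predictive_encode(row):
    """e_n = f_n - f^_n with f^_n = f_(n-1); the first pixel has no
    predecessor, so it is sent as-is (the scheme's overhead)."""
    errors = [row[0]]
    for n in range(1, len(row)):
        errors.append(row[n] - row[n - 1])
    return errors

def predictive_decode(errors):
    """Inverse operation: f_n = e_n + f^_n."""
    row = [errors[0]]
    for e in errors[1:]:
        row.append(e + row[-1])
    return row

row = [100, 102, 103, 103, 101]
e = predictive_encode(row)
print(e)                            # [100, 2, 1, 0, -2]
print(predictive_decode(e) == row)  # True
```

The error values cluster near zero for spatially correlated data and can therefore be coded with a short variable-length code, which is where the compression comes from.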
Lossy Predictive Coding:
In this type of coding, we add a quantizer to the lossless predictive model and examine the
resulting trade-off between reconstruction accuracy and compression performance.
As Fig.9 shows, the quantizer, which absorbs the nearest integer function of the error-free
encoder, is inserted between the symbol encoder and the point at which the prediction error is
formed.
It maps the prediction error into a limited range of outputs, denoted e˙n, which establish the
amount of compression and distortion associated with lossy predictive coding.

Fig. 9 A lossy predictive coding model: (a) encoder and (b) decoder.

In order to accommodate the insertion of the quantization step, the error-free encoder of figure
must be altered so that the predictions generated by the encoder and decoder are equivalent.
As Fig.9 (a) shows, this is accomplished by placing the lossy encoder's predictor within a
feedback loop, where its input, denoted f˙n, is generated as a function of past predictions and
the corresponding quantized errors. That is,

f˙n = e˙n + f^n
This closed loop configuration prevents error buildup at the decoder's output. Note from Fig.
9(b) that the output of the decoder also is given by the above Eqn.
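The closed-loop arrangement can be sketched as follows. This is a toy first-order example that assumes a uniform quantizer of step size `step` (the text does not fix a particular quantizer); note that the predictor is driven by the reconstructed pixels f˙n, not the originals, so encoder and decoder stay in lockstep:

```python
import numpy as np

def dpcm_lossy(row, step=4, alpha=1.0):
    """Lossy predictive coding of one scan line with the quantizer
    inside the encoder's prediction loop (closed-loop configuration).

    Uniform quantizer (an assumed choice): e_dot = step * round(e / step).
    Returns the quantized errors and the decoder's reconstruction.
    """
    row = np.asarray(row, dtype=float)
    q_errors = np.empty(len(row))
    recon = np.empty(len(row))
    q_errors[0] = row[0]                           # first pixel sent as overhead
    recon[0] = row[0]
    for n in range(1, len(row)):
        prediction = round(alpha * recon[n - 1])   # from past *reconstructed* pixels
        e = row[n] - prediction                    # prediction error en
        e_dot = step * round(e / step)             # quantized error e_dot_n
        q_errors[n] = e_dot
        recon[n] = e_dot + prediction              # f_dot_n = e_dot_n + f_hat_n
    return q_errors, recon
```

Feeding the predictor with `recon` rather than `row` is exactly the feedback loop of Fig. 9(a): any quantization error made at pixel n is visible to the predictor at pixel n + 1, so errors cannot accumulate at the decoder.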
Optimal predictors:

The optimal predictor used in most predictive coding applications minimizes the encoder's
mean-square prediction error

E{ en² } = E{ [fn − f^n]² }

subject to the constraints that

f˙n = e˙n + f^n ≈ en + f^n = fn

and

f^n = Σi=1..m αi fn−i
That is, the optimization criterion is chosen to minimize the mean-square prediction error, the
quantization error is assumed to be negligible (e˙n ≈ en), and the prediction is constrained to a
linear combination of m previous pixels. These restrictions are not essential, but they
simplify the analysis considerably and, at the same time, decrease the computational
complexity of the predictor. The resulting predictive coding approach is referred to as
differential pulse code modulation (DPCM).
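In practice the expectation E{en²} is not available, so the coefficients αi are often estimated from training data. The sketch below replaces the expectation with a sample average and solves the resulting least-squares problem; the function name and the use of a 1-D training signal are illustrative assumptions:

```python
import numpy as np

def optimal_dpcm_coefficients(signal, m=2):
    """Least-squares estimate of the m prediction coefficients alpha_i
    that minimize the mean-square prediction error over a training
    signal (a sample-based stand-in for the expectation E{en^2}).

    Solves  min_alpha  sum_n ( fn - sum_i alpha_i * f(n-i) )^2.
    """
    f = np.asarray(signal, dtype=float)
    # Design matrix: each row holds the m previous samples f(n-1), ..., f(n-m).
    rows = [f[n - m:n][::-1] for n in range(m, len(f))]
    A = np.vstack(rows)
    b = f[m:]                                       # targets fn
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha
```

For a signal that exactly follows fn = 0.9 f(n−1), for example, the estimate recovers α1 = 0.9.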

Transform Coding:

All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques that are
based on modifying the transform of an image. In transform coding, a reversible, linear
transform (such as the Fourier transform) is used to map the image into a set of transform
coefficients, which are then quantized and coded. For most natural images, a significant
number of the coefficients have small magnitudes and can be coarsely quantized (or discarded
entirely) with little image distortion. A variety of transformations, including the discrete
Fourier transform (DFT), can be used to transform the image data.

Fig. 10 A transform coding system: (a) encoder; (b) decoder.

Figure 10 shows a typical transform coding system. The decoder implements the inverse
sequence of steps (with the exception of the quantization function) of the encoder, which
performs four relatively straightforward operations: subimage decomposition, transformation,
quantization, and coding.
An N × N input image is first subdivided into subimages of size n × n, which are then
transformed to generate (N/n)² subimage transform arrays, each of size n × n.
The goal of the transformation process is to decorrelate the pixels of each subimage, or to
pack as much information as possible into the smallest number of transform coefficients.
The quantization stage then selectively eliminates or more coarsely quantizes the coefficients
that carry the least information. These coefficients have the smallest impact on reconstructed
subimage quality.
The encoding process terminates by coding (normally using a variable-length code) the
quantized coefficients. Any or all of the transform encoding steps can be adapted to local
image content, called adaptive transform coding, or fixed for all subimages, called
nonadaptive transform coding.
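The four encoder steps can be sketched end to end. The toy pipeline below uses an orthonormal DCT built from scratch, and keeping only the largest-magnitude coefficients per block stands in crudely for the quantization stage; block size, the `keep` parameter, and the assumption that the image dimensions are multiples of n are all illustrative choices:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that T = C @ block @ C.T."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1 / np.sqrt(n)
    C[1:] *= np.sqrt(2 / n)
    return C

def transform_code(image, n=8, keep=10):
    """Toy transform coding: split into n x n subimages, DCT each,
    zero all but the 'keep' largest-magnitude coefficients (a crude
    stand-in for quantization), then inverse-transform.
    Assumes image dimensions are multiples of n.
    """
    C = dct_matrix(n)
    out = np.empty_like(image, dtype=float)
    N, M = image.shape
    for i in range(0, N, n):
        for j in range(0, M, n):
            block = image[i:i + n, j:j + n].astype(float)
            T = C @ block @ C.T                      # forward transform
            thresh = np.sort(np.abs(T), axis=None)[-keep]
            T[np.abs(T) < thresh] = 0.0              # discard small coefficients
            out[i:i + n, j:j + n] = C.T @ T @ C      # inverse transform
    return out
```

On smooth image content the DCT packs most of the energy into a handful of coefficients, so even an aggressive `keep` gives a visually close reconstruction; keeping all n² coefficients reconstructs the block exactly.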

Consider a subimage of size n × n whose forward, discrete transform T(u, v) can be
expressed in terms of the relation

T(u, v) = Σx=0..n−1 Σy=0..n−1 g(x, y) r(x, y, u, v)

for u, v = 0, 1, 2, ..., n − 1.

Given T(u, v), g(x, y) similarly can be obtained using the generalized inverse discrete
transform

g(x, y) = Σu=0..n−1 Σv=0..n−1 T(u, v) s(x, y, u, v)

for x, y = 0, 1, 2, ..., n − 1.

Here r(x, y, u, v) and s(x, y, u, v) are called the forward and inverse transformation kernels,
respectively.
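These two double sums can be evaluated directly once a kernel pair is chosen. The sketch below picks one concrete choice, the separable orthonormal DCT kernel (for which r = s), and checks that the inverse sum recovers g(x, y) exactly; the helper names are hypothetical, and the quadruple loop mirrors the equations rather than an efficient implementation:

```python
import numpy as np

def basis(x, y, u, v, n):
    """Separable orthonormal DCT kernel r(x, y, u, v) = s(x, y, u, v)
    (one concrete choice; the equations hold for any valid kernel pair)."""
    def c(k, t):
        a = np.sqrt(1 / n) if k == 0 else np.sqrt(2 / n)
        return a * np.cos((2 * t + 1) * k * np.pi / (2 * n))
    return c(u, x) * c(v, y)

def forward_transform(g):
    """T(u, v) = sum_x sum_y g(x, y) r(x, y, u, v)."""
    n = g.shape[0]
    T = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            T[u, v] = sum(g[x, y] * basis(x, y, u, v, n)
                          for x in range(n) for y in range(n))
    return T

def inverse_transform(T):
    """g(x, y) = sum_u sum_v T(u, v) s(x, y, u, v)."""
    n = T.shape[0]
    g = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            g[x, y] = sum(T[u, v] * basis(x, y, u, v, n)
                          for u in range(n) for v in range(n))
    return g
```

Because the chosen kernel is orthonormal, applying the inverse sum to the forward sum reproduces the subimage to within floating-point precision.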

Two main types of image transforms:


-Orthogonal transform:
e.g. Walsh-Hadamard transform, DCT
-Subband transform:
e.g. Wavelet transform
Image formats and compression standards:

In the context of digital imaging, an image file format is a standard way to organize and store
image data. It defines how the data is arranged and the type of compression, if any, that is
used.
An image container is similar to a file format but handles multiple types of image data.
Image compression standards define procedures for compressing and decompressing
images.

ISO: International Organization for Standardization

IEC: International Electrotechnical Commission
ITU-T: International Telecommunication Union (Telecommunication Standardization Sector)
CCITT: International Telegraph and Telephone Consultative Committee
