UNIT 5: Image Compression
• Image compression, the art and science of reducing the amount of data required to
represent a digital image, is one of the most useful and commercially successful
technologies in the field of digital image processing.
Storing an uncompressed two-hour standard-definition movie would require twenty-seven 8.5 GB dual-layer DVDs. To put a two-hour movie on a single DVD, each frame must be compressed, on average, by a factor of 26.3.
The compression must be even higher for high-definition (HD) television, where image
resolutions reach 1920 X 1080 X 24 bits/image.
Web page images and high–resolution digital camera photos also are compressed
routinely to save storage space and reduce transmission time.
For example, residential Internet connections deliver data at speeds ranging from
56 kbps via conventional phone lines to more than 12 Mbps for broadband.
The time required to transmit a small 128 X 128 X 24-bit full-color image over this
range of speeds runs from about 7 seconds down to 0.03 seconds.
Compression can reduce transmission time by a factor of 2 to 10 or more.
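The transmission-time figures above can be checked directly; a minimal sketch in Python (the connection speeds are the two quoted above):

```python
# Transmission time for an uncompressed 128 x 128 x 24-bit full-color image
# at the two residential connection speeds quoted above.
bits = 128 * 128 * 24            # 393,216 bits for the whole image

for name, bps in [("56 kbps phone line", 56_000),
                  ("12 Mbps broadband", 12_000_000)]:
    seconds = bits / bps
    print(f"{name}: {seconds:.2f} s")
```

The slow connection needs about 7 seconds, the fast one about 0.03 seconds, matching the range stated in the text.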
Image compression plays an important role in many other areas, including televideo
conferencing, remote sensing, document and medical imaging, and facsimile
transmission (FAX).
An increasing number of applications depend on the efficient manipulation, storage
and transmission of binary, gray-scale and color images.
Benefits of Image Compression
It provides potential cost savings when sending data over the switched
telephone network, where the cost of a call is usually based on its duration.
It not only reduces storage requirements but also overall execution time.
It also reduces the probability of transmission errors since fewer bits are transferred.
It also provides a level of security against illicit monitoring.
Fundamentals:
The term data compression refers to the process of reducing the amount of data
required to represent a given quantity of information.
A clear distinction must be made between data and information. They are not
synonymous. Data are the means by which information is conveyed. Because various
amounts of data may be used to represent the same amount of information,
representations that contain irrelevant or repeated information are said to contain
redundant data.
If n1 and n2 denote the number of information-carrying units in two data sets that
represent the same information, the relative data redundancy RD of the first data set
(the one characterized by n1) can be defined as

    RD = 1 - 1/CR

where CR, commonly called the compression ratio, is

    CR = n1/n2
For the case n2 = n1, CR = 1 and RD = 0, indicating that (relative to the second data set) the
first representation of the information contains no redundant data.
When n2 << n1, CR → ∞ and RD → 1, implying significant compression and highly redundant
data.
Finally, when n2 >> n1, CR → 0 and RD → −∞, indicating that the second data set contains much
more data than the original representation. This, of course, is the normally undesirable case of
data expansion.
In general, CR and RD lie in the open intervals (0, ∞) and (−∞, 1), respectively.
A practical compression ratio, such as 10 (or 10:1), means that the first data set has 10
information carrying units (say, bits) for every 1 unit in the second or compressed data set.
The corresponding redundancy of 0.9 implies that 90% of the data in the first data set is
redundant.
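The 10:1 example above can be reproduced with a short sketch (the function names are illustrative):

```python
def compression_ratio(n1: int, n2: int) -> float:
    """Compression ratio CR = n1 / n2."""
    return n1 / n2

def relative_redundancy(n1: int, n2: int) -> float:
    """Relative data redundancy RD = 1 - 1/CR."""
    return 1 - 1 / compression_ratio(n1, n2)

# 1000 bits before compression, 100 bits after: CR = 10, RD = 0.9,
# i.e. 90% of the data in the first data set is redundant.
print(compression_ratio(1000, 100))    # 10.0
print(relative_redundancy(1000, 100))  # 0.9
```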
In digital image compression, three basic data redundancies can be identified and exploited:
(1) Coding Redundancy
(2) Interpixel Redundancy(spatial and temporal redundancy)
(3) Psychovisual Redundancy(irrelevant information)
Data compression is achieved when one or more of these redundancies are reduced or
eliminated.
• Coding Redundancy: An uncompressed image usually is coded with a fixed-length code for
each pixel. For example, an image with 256 gray levels is represented by an array of
8-bit integers. Variable-length code schemes such as Huffman coding and
arithmetic coding can produce compression, because most 2-D intensity arrays contain more
bits than are needed to represent the intensities.
Coding Redundancy:
Let us assume that a discrete random variable rk in the interval [0, 1] represents the gray levels
of an image and that each rk occurs with probability pr(rk):

    pr(rk) = nk/n,    k = 0, 1, 2, ..., L − 1

where L is the number of gray levels, nk is the number of times that the kth gray level appears
in the image, and n is the total number of pixels in the image. If the number of bits used to
represent each value of rk is l(rk), then the average number of bits required to represent each
pixel is

    Lavg = Σ l(rk) pr(rk)    (sum over k = 0 to L − 1)
That is, the average length of the code words assigned to the various gray-level values is
found by summing the product of the number of bits used to represent each gray level and the
probability that the gray level occurs. Thus the total number of bits required to code an M X N
image is MNLavg.
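As a numerical check of these definitions, here is a small sketch; the six probabilities and code-word lengths below are a hypothetical example, not taken from the text's figure:

```python
import math

# Hypothetical six-symbol source and one possible variable-length code
# with code-word lengths 1, 2, 3, 4, 5, 5 bits.
probs   = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]
lengths = [1,   2,   3,   4,   5,    5]

# Lavg = sum of l(rk) * pr(rk)
l_avg = sum(l * p for l, p in zip(lengths, probs))

# Entropy of the source: the lower bound on Lavg for any code
entropy = -sum(p * math.log2(p) for p in probs)

print(f"L_avg   = {l_avg:.2f} bits/symbol")
print(f"entropy = {entropy:.2f} bits/symbol")
```

For this source the average code length is 2.20 bits/symbol against an entropy of about 2.14 bits/symbol, so the variable-length code is close to optimal.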
Example of variable length coding:
Psychovisual Redundancy:
The brightness of a region, as perceived by the eye, depends on factors other than simply the
light reflected by the region.
For example, intensity variations (Mach bands) can be perceived in an area of constant
intensity. Such phenomena result from the fact that the eye does not respond with equal
sensitivity to all visual information.
Certain information simply has less relative importance than other information in normal
visual processing. This information is said to be psychovisually redundant. It can be
eliminated without significantly impairing the quality of image perception.
That psychovisual redundancies exist should not come as a surprise, because human
perception of the information in an image normally does not involve quantitative analysis of
every pixel value in the image.
In general, an observer searches for distinguishing features such as edges or textural regions
and mentally combines them into recognizable groupings. The brain then correlates these
groupings with prior knowledge in order to complete the image interpretation process.
Unlike coding and interpixel redundancy, psychovisual redundancy is associated with real or
quantifiable visual information. Its elimination is possible only because the information itself
is not essential for normal visual processing. Since the elimination of psychovisually
redundant data results in a loss of quantitative information, it is commonly referred to as
quantization.
This terminology is consistent with normal usage of the word, which generally means the
mapping of a broad range of input values to a limited number of output values. As it is an
irreversible operation (visual information is lost), quantization results in lossy data
compression.
Fidelity criteria:
The removal of psychovisually redundant data results in a loss of real or quantitative visual
information. Because information of interest may be lost, a repeatable or reproducible means
of quantifying the nature and extent of information loss is highly desirable.
Two general classes of criteria are used as the basis for such an assessment:
When the level of information loss can be expressed as a function of the original or
input image and the compressed and subsequently decompressed output image, it is said to be
based on an objective fidelity criterion. A good example is the root-mean-square (rms) error
between an input and output image. Let f(x, y) represent an input image and let f^(x, y) denote
an estimate or approximation of f(x, y) that results from compressing and subsequently
decompressing the input. For any value of x and y, the error e(x, y) between f(x, y) and f^(x, y)
can be defined as

    e(x, y) = f^(x, y) − f(x, y)

The root-mean-square error, erms, between f(x, y) and f^(x, y) then is the square root of the
squared error averaged over the M X N array, or

    erms = [ (1/MN) Σ Σ [f^(x, y) − f(x, y)]^2 ]^(1/2)    (sums over x = 0..M−1, y = 0..N−1)
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the
compressed-decompressed image. If f^(x, y) is considered to be the sum of the original image
f(x, y) and a noise signal e(x, y), the mean-square signal-to-noise ratio of the output image,
denoted SNRms, is

    SNRms = Σ Σ f^(x, y)^2 / Σ Σ [f^(x, y) − f(x, y)]^2    (sums over x = 0..M−1, y = 0..N−1)

The rms value of the signal-to-noise ratio, denoted SNRrms, is obtained by taking the square
root of the equation above.
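Both criteria are easy to compute; a minimal sketch on a toy 2 X 2 image (the function names and sample values are illustrative):

```python
import math

def rms_error(f, f_hat):
    """Root-mean-square error between an M x N input image f and its
    compressed-decompressed approximation f_hat (lists of rows)."""
    M, N = len(f), len(f[0])
    se = sum((f_hat[x][y] - f[x][y]) ** 2 for x in range(M) for y in range(N))
    return math.sqrt(se / (M * N))

def snr_ms(f, f_hat):
    """Mean-square signal-to-noise ratio of the output image."""
    num = sum(v ** 2 for row in f_hat for v in row)
    den = sum((f_hat[x][y] - f[x][y]) ** 2
              for x in range(len(f)) for y in range(len(f[0])))
    return num / den

f     = [[100, 102], [98, 101]]
f_hat = [[101, 102], [97, 100]]   # toy approximation with errors of +/-1
print(rms_error(f, f_hat))
print(snr_ms(f, f_hat))
```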
Measuring Image Information:
A key question in image compression is: “What is the minimum amount of data that is
sufficient to describe completely an image without loss of information?”
How do we measure the information content of an image?
We assume that information generation is a probabilistic process.
• Suppose that gray-level values are generated by a random process; then rk contains

    I(rk) = −log pr(rk)

units of information. The average information per pixel, the entropy of the source, is

    H = −Σ pr(rk) log2 pr(rk)    units/pixel (e.g., bits/pixel)
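A first-order entropy estimate can be computed directly from the image's gray-level histogram; a short sketch (the helper name is illustrative):

```python
import math
from collections import Counter

def image_entropy(pixels):
    """First-order estimate of information content, -sum p*log2(p),
    in bits/pixel, from the normalized gray-level histogram."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A constant image carries no information;
# a uniform 4-level image carries 2 bits/pixel.
print(image_entropy([7] * 16))
print(image_entropy([0, 1, 2, 3] * 4))
```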
The Source Encoder:
In the first stage of the source encoding process, the mapper transforms the input data into a
format designed to reduce interpixel redundancies in the input image. This operation generally
is reversible. Run-length coding is an example of a mapping that directly results in data
compression in this initial stage of the overall source encoding process. The representation of
an image by a set of transform coefficients is an example of the opposite case. Here, the
mapper transforms the image into an array of coefficients, making its interpixel redundancies
more accessible for compression in later stages of the encoding process.
The second stage, or quantizer block in Fig. 3.2 (a), reduces the accuracy of the
mapper's output in accordance with some preestablished fidelity criterion. This stage reduces
the psychovisual redundancies of the input image. This operation is irreversible. Thus it must
be omitted when error-free compression is desired.
In the third and final stage of the source encoding process, the symbol coder creates a
fixed- or variable-length code to represent the quantizer output and maps the output in
accordance with the code. The term symbol coder distinguishes this coding operation from the
overall source encoding process. In most cases, a variable-length code is used to represent the
mapped and quantized data set. It assigns the shortest code words to the most frequently
occurring output values and thus reduces coding redundancy. The operation is reversible.
Upon completion of the symbol coding step, the input image has been processed to remove
each of the three redundancies.
Figure 3.2(a) shows the source encoding process as three successive operations, but all
three operations are not necessarily included in every compression system. For example, the
quantizer must be omitted when error-free compression is desired.
In the predictive compression systems, for instance, the mapper and quantizer are often
represented by a single block, which simultaneously performs both operations.
The source decoder shown in Fig. 3.2(b) contains only two components: a symbol
decoder and an inverse mapper. These blocks perform, in reverse order, the inverse operations
of the source encoder's symbol encoder and mapper blocks. Because quantization results in
irreversible information loss, an inverse quantizer block is not included in the general source
decoder model shown in Fig. 3.2(b).
Huffman coding:
The most popular technique for removing coding redundancy is due to Huffman
(Huffman [1952]). When coding the symbols of an information source individually, Huffman
coding yields the smallest possible number of code symbols per source symbol. In terms of
the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to the
constraint that the source symbols be coded one at a time.
The first step in Huffman's approach is to create a series of source reductions by ordering the
probabilities of the symbols under consideration and combining the lowest probability
symbols into a single symbol that replaces them in the next source reduction.
Figure 4.1 illustrates this process for binary coding .
At the far left, a hypothetical set of source symbols and their probabilities are ordered from
top to bottom in terms of decreasing probability values. To form the first source reduction, the
bottom two probabilities, 0.06 and 0.04, are combined to form a "compound symbol" with
probability 0.1. This compound symbol and its associated probability are placed in the first
source reduction column so that the probabilities of the reduced source are also ordered from
the most to the least probable.
This process is then repeated until a reduced source with two symbols (at the far right) is
reached.
The second step in Huffman's procedure is to code each reduced source, starting with the
smallest source and working back to the original source. The minimal length binary code for a
two-symbol source, of course, is the symbols 0 and 1.
As Fig. 4.2 shows, these symbols are assigned to the two symbols on the right (the assignment
is arbitrary; reversing the order of the 0 and 1 would work just as well). As the reduced source
symbol with probability 0.6 was generated by combining two symbols in the reduced source
to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and 1 are
arbitrarily appended to each to distinguish them from each other. This operation is repeated
for each reduced source until the original source is reached. The average length of the
resulting code is 2.2 bits/symbol, and the entropy of the source is 2.14 bits/symbol. The
resulting Huffman code efficiency is 0.973 (that is, 2.14/2.2).
Huffman's procedure creates the optimal code for a set of symbols and probabilities subject to
the constraint that the symbols be coded one at a time. After the code has been created, coding
and/or decoding is accomplished in a simple lookup table manner. The code itself is an
instantaneous uniquely decodable block code. It is called a block code because each source
symbol is mapped into a fixed sequence of code symbols. It is instantaneous, because each
code word in a string of code symbols can be decoded without referencing succeeding
symbols. It is uniquely decodable, because any string of code symbols can be decoded in only
one way. Thus, any string of Huffman encoded symbols can be decoded by examining the
individual symbols of the string in a left to right manner. For the binary code of Fig. 4.2, a
left-to-right scan of the encoded string 010100111100 reveals that the first valid code word is
01010, which is the code for symbol a3 .The next valid code is 011, which corresponds to
symbol a1. Continuing in this manner reveals the completely decoded message to be
a3a1a2a2a6.
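The two-step construction and the left-to-right decoding can be sketched in Python. Note two assumptions: the heap-based construction below may emit different (but equally optimal) code words than Fig. 4.2, depending on tie-breaking; and the decoding table is the one implied by the text's walk-through (a3 = 01010 and a1 = 011 are quoted in the text, while the remaining entries are assumed):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for {symbol: probability} via repeated
    source reduction; returns {symbol: bit string}."""
    tiebreak = count()           # keeps heap comparisons off the dicts
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # two least-probable (compound) symbols
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def huffman_decode(bits, code):
    """Instantaneous left-to-right decoding with a lookup table."""
    inverse = {w: s for s, w in code.items()}
    decoded, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            decoded.append(inverse[word])
            word = ""
    return decoded

probs = {"a1": 0.1, "a2": 0.4, "a3": 0.06, "a4": 0.1, "a5": 0.04, "a6": 0.3}
code = huffman_code(probs)
l_avg = sum(probs[s] * len(w) for s, w in code.items())
print(f"L_avg = {l_avg:.2f} bits/symbol")     # 2.20

# Decoding with the code table implied by the text's walk-through:
fig42 = {"a2": "1", "a6": "00", "a1": "011", "a4": "0100",
         "a3": "01010", "a5": "01011"}
print(huffman_decode("010100111100", fig42))  # ['a3', 'a1', 'a2', 'a2', 'a6']
```

Whatever tie-breaking the heap uses, the resulting average length is the same, because all Huffman codes for a given source achieve the same minimal Lavg.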
Arithmetic coding:
Unlike the variable-length codes described previously, arithmetic coding generates nonblock
codes.
In arithmetic coding, a one-to-one correspondence between source symbols and code words
does not exist.
Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic
code word. The code word itself defines an interval of real numbers between 0 and 1. As the
number of symbols in the message increases, the interval used to represent it becomes smaller
and the number of information units (say, bits) required to represent the interval becomes
larger. Each symbol of the message reduces the size of the interval in accordance with its
probability of occurrence. Because the technique does not require, as does Huffman's
approach, that each source symbol translate into an integral number of code symbols (that is,
that the symbols be coded one at a time), it achieves (but only in theory) the bound established
by the noiseless coding theorem.
Fig.5.1 Arithmetic coding procedure
Figure 5.1 illustrates the basic arithmetic coding process. Here, a five-symbol sequence or
message, a1a2a3a3a4, from a four-symbol source is coded. At the start of the coding process,
the message is assumed to occupy the entire half-open interval [0, 1). As Table 5.2 shows, this
interval is initially subdivided into four regions based on the probabilities of each source
symbol. Symbol a1, for example, is associated with subinterval [0, 0.2). Because it is the first
symbol of the message being coded, the message interval is initially narrowed to [0, 0.2). Thus
in Fig. 5.1 [0, 0.2) is expanded to the full height of the figure and its end points labeled by the
values of the narrowed range.
The narrowed range is then subdivided in accordance with the original source symbol
probabilities and the process continues with the next message symbol.
In this manner, symbol a2 narrows the subinterval to [0.04, 0.08), a3 further narrows it to
[0.056, 0.072), and so on. The final message symbol, which must be reserved as a special end-
of-message indicator, narrows the range to [0.06752, 0.0688). Of course, any number within
this subinterval—for example, 0.068—can be used to represent the message.
In the arithmetically coded message of Fig. 5.1, three decimal digits are used to represent the
five-symbol message. This translates into 3/5 or 0.6 decimal digits per source symbol and
compares favorably with the entropy of the source, which is 0.58 decimal digits or 10-ary
units/symbol. As the length of the sequence being coded increases, the resulting arithmetic
code approaches the bound established by the noiseless coding theorem.
In practice, two factors cause coding performance to fall short of the bound: (1) the addition of
the end-of-message indicator that is needed to separate one message from another; and (2) the
use of finite precision arithmetic. Practical implementations of arithmetic coding address the
latter problem by introducing a scaling strategy and a rounding strategy (Langdon and
Rissanen [1981]). The scaling strategy renormalizes each subinterval to the [0, 1) range before
subdividing it in accordance with the symbol probabilities. The rounding strategy guarantees
that the truncations associated with finite precision arithmetic do not prevent the coding
subintervals from being represented accurately.
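The interval narrowing of Fig. 5.1 can be reproduced directly; the sketch below uses the source of the example (p(a1) = p(a2) = p(a4) = 0.2, p(a3) = 0.4) and works in plain floating point, without the scaling and rounding strategies a practical coder needs:

```python
def arithmetic_interval(message, model):
    """Narrow [0, 1) symbol by symbol; `model` maps each symbol to its
    cumulative probability subinterval [low, high). Returns the final
    message interval."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = model[sym]
        span = high - low
        low, high = low + span * s_low, low + span * s_high
    return low, high

model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4),
         "a3": (0.4, 0.8), "a4": (0.8, 1.0)}
low, high = arithmetic_interval(["a1", "a2", "a3", "a3", "a4"], model)
print(low, high)   # final interval, approximately [0.06752, 0.0688)
```

Any number inside the final interval, such as 0.068, represents the whole five-symbol message.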
Run-Length Coding:
Run-length encoding (RLE) is a very simple form of data compression in which runs of data
(that is, sequences in which the same data value occurs in many consecutive data
elements) are stored as a single data value and count, rather than as the original run.
This is most useful on data that contains many such runs: for example, relatively simple
graphic images such as icons, line drawings, and animations.
The repeating string is called a run. Typically, RLE encodes a run of symbols into two
bytes: a count and a symbol.
Compression is achieved by eliminating a simple form of spatial redundancy –groups of
identical intensities.
When there are few or no runs of identical pixels, run-length encoding results in data
expansion.
RLE cannot achieve high compression ratios compared to other compression methods.
It is easy to implement and is quick to execute.
Run-length encoding is supported by most bitmap file formats such as TIFF, BMP and
PCX.
Example:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWW
WWWWWWWWWWWWWBWWWWWWWWWWWWWW
RLE coding:
12W1B12W3B24W1B14W
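The example above can be reproduced with a few lines; a minimal sketch (the `<count><symbol>` output format mirrors the example, and the function name is illustrative):

```python
import re

def rle_encode(s: str) -> str:
    """Encode each run of identical characters as <count><symbol>."""
    # r"(.)\1*" matches a character followed by any repeats of itself.
    return "".join(f"{len(m.group(0))}{m.group(1)}"
                   for m in re.finditer(r"(.)\1*", s))

row = "W" * 12 + "B" + "W" * 12 + "BBB" + "W" * 24 + "B" + "W" * 14
print(rle_encode(row))   # 12W1B12W3B24W1B14W
```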
In one-dimensional run-length coding schemes for binary images, runs of continuous 1s or
0s in every row of an image are encoded together, resulting in substantial bit savings.
Either, it is to be indicated whether the row begins with a run of 1s or 0s, or the encoding
of a row compulsorily begins with 1s, where the count could be zero, if it begins with 0s.
Run-length encoding may be illustrated with the following example of a row of an image:
0001101000111110010111111111111111011100000000010000111111100000001
Let us say, we adopt the second scheme, to begin with runs of 1s. Hence, the first run
count in the given binary sequence is 0.
Then we have a run of three 0s. Hence, the next count is 3. Proceeding in this manner, the
reader may verify that the given binary sequence gets encoded to:
0,3,2,1,1,3,5,2,1,1,15,1,3,9,1,4,7,7,1
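The reader can verify the run counts with a short sketch of the second scheme (encoding compulsorily begins with a run of 1s, whose count may be zero):

```python
def binary_run_lengths(bits: str):
    """Alternating run counts, 1s first: if the row starts with 0s,
    the first count is 0 (the second scheme described above)."""
    counts, current, run = [], "1", 0
    for b in bits:
        if b == current:
            run += 1
        else:
            counts.append(run)
            current, run = b, 1
    counts.append(run)
    return counts

# The example row from the text:
row = ("0001101000111110010111111111111111"
       "0111000000000"
       "10000111111100000001")
print(binary_run_lengths(row))
```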
Each run-count value for 1s and 0s is associated with some probability, which may be
determined through statistical observations. It is therefore possible to estimate the
entropies H0 and H1 for the runs of 0s and 1s, respectively. If L0 and L1 are the average
values of run-lengths for 0s and 1s, the run-length entropy of the image is expressed as

    HRL = (H0 + H1) / (L0 + L1)
We may apply some variable-length coding to encode the run values. Although run-length
coding is essentially a compression technique for binary images, it can be applied to gray-
scale images in an indirect way by first decomposing the gray-scale image into bit-plane
images and then applying run-length coding on each of the bit-plane images.
Bit-Plane Coding:
An effective technique for reducing an image's inter pixel redundancies is to process
the image's bit planes individually. The technique, called bit-plane coding, is based on the
concept of decomposing a multilevel (monochrome or color) image into a series of binary
images and compressing each binary image via one of several well-known binary compression
methods.
Bit-plane decomposition:
The gray levels of an m-bit gray-scale image can be represented in the form of the base 2
polynomial

    am-1 2^(m-1) + am-2 2^(m-2) + ... + a1 2^1 + a0 2^0

Based on this property, a simple method of decomposing the image into a collection of binary
images is to separate the m coefficients of the polynomial into m 1-bit bit planes.
The zeroth-order bit plane is generated by collecting the a0 bits of each pixel, while the (m - 1)
st-order bit plane contains the am-1, bits or coefficients. In general, each bit plane is numbered
from 0 to m-1 and is constructed by setting its pixels equal to the values of the appropriate bits
or polynomial coefficients from each pixel in the original image.
The inherent disadvantage of this approach is that small changes in gray level can have a
significant impact on the complexity of the bit planes. If a pixel of intensity 127 (01111111) is
adjacent to a pixel of intensity 128 (10000000), for instance, every bit plane will contain a
corresponding 0 to 1 (or 1 to 0) transition. For example, as the most significant bits of the two
binary codes for 127 and 128 are different, bit plane 7 will contain a zero-valued pixel next to
a pixel of value 1, creating a 0 to 1 (or 1 to 0) transition at that point.
An alternative decomposition approach, which reduces the effect of small gray-level
variations, is to first represent the image by an m-bit Gray code. The m-bit Gray code
gm-1 ... g1 g0 that corresponds to the polynomial above is computed from

    gi = ai ⊕ ai+1,    0 ≤ i ≤ m − 2
    gm-1 = am-1

Here, ⊕ denotes the exclusive OR operation. This code has the unique property that
successive code words differ in only one bit position. Thus, small changes in gray level are
less likely to affect all m bit planes. For instance, when gray levels 127 and 128 are adjacent,
only the 7th bit plane will contain a 0 to 1 transition, because the Gray codes that correspond
to 127 and 128 are 01000000 and 11000000, respectively.
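The 127/128 case is easy to check with the standard binary-reflected Gray code, whose bitwise form g = b ⊕ (b >> 1) is equivalent to gi = ai ⊕ ai+1:

```python
def gray_code(b: int) -> int:
    """Binary-reflected Gray code: g_i = a_i XOR a_{i+1}, i.e. g = b ^ (b >> 1)."""
    return b ^ (b >> 1)

for v in (127, 128):
    print(f"{v}: binary {v:08b} -> Gray {gray_code(v):08b}")

# The binary codes of 127 and 128 differ in all 8 bit planes;
# their Gray codes differ only in bit plane 7.
print(bin(gray_code(127) ^ gray_code(128)))
```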
The decoder of Fig. 8.1 (b) reconstructs en from the received variable-length code words and
performs the inverse operation

    fn = en + f^n
Various local, global, and adaptive methods can be used to generate f^n. In most cases,
however, the prediction is formed by a linear combination of m previous pixels. That is,

    f^n = round[ Σ αi fn-i ]    (sum over i = 1 to m)

where m is the order of the linear predictor, round is a function used to denote the
rounding or nearest integer operation, and the αi, for i = 1, 2, ..., m, are prediction coefficients.
In raster scan applications, the subscript n indexes the predictor outputs in accordance with
their time of occurrence. That is, fn, f^n and en in Eqns. above could be replaced with the more
explicit notation f (t), f^(t), and e (t), where t represents time. In other cases, n is used as an
index on the spatial coordinates and/or frame number (in a time sequence of images) of an
image.
In 1-D linear predictive coding, for example, the equation above can be written as

    f^(x, y) = round[ Σ αi f(x, y − i) ]    (sum over i = 1 to m)
Fig. 9 A lossy predictive coding model: (a) encoder and (b) decoder.
In order to accommodate the insertion of the quantization step, the error-free encoder of figure
must be altered so that the predictions generated by the encoder and decoder are equivalent.
As Fig. 9 (a) shows, this is accomplished by placing the lossy encoder's predictor within a
feedback loop, where its input, denoted f˙n, is generated as a function of past predictions and
the corresponding quantized errors. That is,

    f˙n = e˙n + f^n
This closed loop configuration prevents error buildup at the decoder's output. Note from Fig.
9(b) that the output of the decoder also is given by the above Eqn.
Optimal predictors:
The optimal predictor used in most predictive coding applications minimizes the encoder's
mean-square prediction error

    E{en^2} = E{[fn − f^n]^2}

subject to the constraint that

    f˙n = e˙n + f^n ≈ fn

and

    f^n = Σ αi fn-i    (sum over i = 1 to m)
That is, the optimization criterion is chosen to minimize the mean-square prediction error, the
quantization error is assumed to be negligible (e˙n ≈ en), and the prediction is constrained to a
linear combination of m previous pixels.1 These restrictions are not essential, but they
simplify the analysis considerably and, at the same time, decrease the computational
complexity of the predictor. The resulting predictive coding approach is referred to as
differential pulse code modulation (DPCM).
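The feedback structure described above can be sketched for the simplest case, a first-order previous-pixel predictor (m = 1, α1 = 1); the function names are illustrative, and passing an identity quantizer makes the scheme lossless:

```python
def dpcm_encode(row, quantize=lambda e: e):
    """Previous-pixel DPCM with the predictor inside the feedback loop.
    The identity `quantize` gives lossless (error-free) coding."""
    recon_prev = row[0]                   # first sample is sent verbatim
    errors = [row[0]]
    for f in row[1:]:
        e = quantize(f - recon_prev)      # e_n = f_n - f^_n, then quantized
        errors.append(e)
        recon_prev = recon_prev + e       # f._n = e._n + f^_n (decoder's value)
    return errors

def dpcm_decode(errors):
    """Decoder: f_n = e_n + f^_n, with f^_n the previous reconstruction."""
    out = [errors[0]]
    for e in errors[1:]:
        out.append(out[-1] + e)
    return out

row = [100, 102, 105, 105, 104, 110]
enc = dpcm_encode(row)
print(enc)                    # [100, 2, 3, 0, -1, 6]
print(dpcm_decode(enc))       # [100, 102, 105, 105, 104, 110]
```

Because the encoder predicts from its own reconstructed values rather than the original pixels, encoder and decoder stay in lockstep and quantization error cannot accumulate at the decoder's output.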
Transform Coding:
All the predictive coding techniques operate directly on the pixels of an image and
thus are spatial domain methods. In this coding, we consider compression techniques that are
based on modifying the transform of an image. In transform coding, a reversible, linear
transform (such as the Fourier transform) is used to map the image into a set of transform
coefficients, which are then quantized and coded. For most natural images, a significant
number of the coefficients have small magnitudes and can be coarsely quantized (or discarded
entirely) with little image distortion. A variety of transformations, including the discrete
Fourier transform (DFT), can be used to transform the image data.
Figure 10 shows a typical transform coding system. The decoder implements the inverse
sequence of steps (with the exception of the quantization function) of the encoder, which
performs four relatively straightforward operations: subimage decomposition, transformation,
quantization, and coding.
An N X N input image first is subdivided into subimages of size n X n, which are then
transformed to generate (N/n)^2 subimage transform arrays, each of size n X n.
The goal of the transformation process is to decorrelate the pixels of each subimage, or to
pack as much information as possible into the smallest number of transform coefficients.
The quantization stage then selectively eliminates or more coarsely quantizes the coefficients
that carry the least information. These coefficients have the smallest impact on reconstructed
subimage quality.
The encoding process terminates by coding (normally using a variable-length code) the
quantized coefficients. Any or all of the transform encoding steps can be adapted to local
image content, called adaptive transform coding, or fixed for all subimages, called
nonadaptive transform coding.
The forward and inverse transforms of an n X n subimage g(x, y) are of the form

    T(u, v) = Σ Σ g(x, y) r(x, y, u, v)    (sums over x = 0..n−1, y = 0..n−1)

    g(x, y) = Σ Σ T(u, v) s(x, y, u, v)    (sums over u = 0..n−1, v = 0..n−1)

where r(x, y, u, v) and s(x, y, u, v) are the forward and inverse transformation kernels.
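The whole pipeline can be sketched on one subimage; a minimal illustration (the 4 X 4 sample block and the threshold are arbitrary, and a real coder would use an 8 X 8 DCT with a quantization table rather than simple thresholding):

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C (n x n), so C C^T = I."""
    return [[(math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)) *
             math.cos((2 * x + 1) * u * math.pi / (2 * n))
             for x in range(n)] for u in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

n = 4
C = dct_matrix(n)
g = [[52, 55, 61, 66],      # arbitrary 4 x 4 subimage
     [70, 61, 64, 73],
     [63, 59, 55, 90],
     [67, 61, 68, 104]]

T = matmul(matmul(C, g), transpose(C))        # forward: T = C g C^T
g_exact = matmul(matmul(transpose(C), T), C)  # lossless round trip recovers g

# Crude "quantization": discard coefficients of small magnitude.
thresh = 20
Tq = [[t if abs(t) >= thresh else 0 for t in row] for row in T]
g_hat = matmul(matmul(transpose(C), Tq), C)   # lossy reconstruction
```

Because the transform is orthonormal, the round trip without quantization is exact; the distortion in g_hat comes entirely from the discarded coefficients, which is the sense in which the transform "packs" the information into a few coefficients.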
In the context of digital imaging, an image file format is a standard way to organize and store
image data. It defines how the data is arranged and the type of compression, if any, that is
used.
An image container is similar to a file format but handles multiple types of image data.
Image compression standards define procedures for compressing and decompressing
images.