Lecture 7 Application of Transforms

Digital Signal Processing (ECN-312)
Lecture 7 (Applications of Transforms)
Dheeraj Kumar
dheeraj.kumar@ece.iitr.ac.in
February 9, 2023
Table of Contents
1 Application of transforms in speech, audio, image and video coding
Speech and audio coding
Image and video coding
2 The discrete cosine transform (DCT)
DCT-1 and DCT-2
Energy compaction property of DCT-2
Applications of DCT: JPEG and MPEG compression standards
Transform coding
❑ A type of data compression scheme for “natural” data like speech, audio, images, and video
❑ The transformation is typically lossless (perfectly reversible) on its own
❑ e.g., DFT
❑ Knowledge of the application is used to choose information to discard, thereby lowering its bandwidth
❑ Used to enable better (more targeted) quantization
❑ Results in a lower quality copy of the original input
❑ Close enough for the purpose of the application
❑ Lossy compression
Transform coding
❑ It's a “frequency-domain” approach
❑ Efficiency depends on the type of linear transform and the nature of bit allocation for quantizing transform coefficients
Audio coding
❑ For high-quality production of music (including speech) in multiple channels
❑ Music has a much wider bandwidth than speech and uses multiple channels
❑ Uses high-rate, waveform-based coders
❑ To retain the natural sound quality
❑ Make extensive use of human hearing properties in determining the quantization levels in different frequency bands
❑ Each frequency component is quantized with a step size that depends on the hearing threshold
❑ Don’t code if the ear cannot hear it!
Audio coding: basic idea
❑ Decompose a signal into separate frequency bands
❑ Analyze signal energy in different bands and determine the total masking threshold of each band
❑ Quantize samples in different bands with accuracy determined by the masking level
❑ Any signal below the masking level does not need to be coded
❑ Signals above the masking level are quantized with a quantization step size set according to the masking level
❑ Bits are assigned across bands so that each additional bit provides maximum reduction in perceived distortion
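The band-by-band rule above can be illustrated with a minimal sketch. It assumes one representative amplitude per band and a step size equal to `step_factor` times the masking level; the function name, the parameter names, and the numbers are hypothetical and are not taken from any audio coding standard.

```python
def quantize_bands(band_levels, masking_levels, step_factor=1.0):
    """Quantize one representative amplitude per frequency band.

    band_levels    : signal amplitude in each band (illustrative values)
    masking_levels : masking threshold of each band, in the same units
    step_factor    : assumed constant tying the step size to the masking level
    """
    coded = []
    for level, mask in zip(band_levels, masking_levels):
        if abs(level) < mask:
            # Below the masking level: inaudible, so nothing is coded.
            coded.append(None)
            continue
        step = step_factor * mask      # coarser steps where masking is stronger
        index = round(level / step)    # integer index that would be transmitted
        coded.append((index, index * step))
    return coded


# Three bands: the first one falls below its masking level and is skipped.
result = quantize_bands([0.05, 1.0, 3.2], [0.1, 0.25, 1.0])
```

Each coded entry holds the integer index and the reconstructed amplitude, so the quantization error in a band never exceeds half of that band's step size.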
Audio coding block diagram
Image and video coding
❑ Images and videos have a vast amount of data associated with them
❑ Compression is a key technology for their digital transmission and storage
❑ Compression techniques take advantage of the structure of images and video
❑ Statistical, spatial and temporal redundancies
❑ Exploit the limitations of human visual perception to omit components of the signal that will not be noticed
Image and video coder structure
❑ Transform T(x) is usually invertible
❑ Quantization is not invertible, and introduces distortion
❑ The combination of encoder and decoder is lossless, so quantization is the only source of distortion
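The difference between the invertible transform and the non-invertible quantizer can be seen in a small sketch. It assumes the DFT as the transform T and a uniform quantizer with a hypothetical step of 0.5 applied to the real and imaginary parts; the test vector is arbitrary.

```python
import cmath


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def quantize(X, step):
    # Uniform quantizer on real and imaginary parts; this step cannot be undone.
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in X]


def max_error(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


x = [0.3, 1.7, -0.4, 2.2]
X = dft(x)
lossless = idft(X)               # transform alone: x is recovered up to round-off
lossy = idft(quantize(X, 0.5))   # with quantization: distortion appears
```

Comparing `max_error(x, lossless)` with `max_error(x, lossy)` shows that only the quantizer contributes a visible error.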
Real-valued transform
❑ General class of finite-length transform representations
❑ $A[k] = \sum_{n=0}^{N-1} x[n]\,\phi_k^*[n]$
❑ $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} A[k]\,\phi_k[n]$
❑ where the basis sequences $\phi_k[n]$ are orthogonal to one another
❑ $\frac{1}{N} \sum_{n=0}^{N-1} \phi_k[n]\,\phi_m^*[n] = \begin{cases} 1, & m = k \\ 0, & m \neq k \end{cases}$
❑ For the DFT, $\phi_k[n] = e^{j 2\pi k n / N}$ are complex and periodic
❑ Sequence $A[k]$ is generally complex, even if the sequence $x[n]$ is real
❑ Natural to inquire if there exist sets of real-valued basis sequences that would yield a real-valued $A[k]$ when $x[n]$ is real
❑ Discrete cosine transform (DCT)
❑ Closely related to the DFT
❑ Useful and important in a number of signal-processing applications (e.g., speech and image compression)
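As a quick numerical check of the orthogonality relation and of the claim that a real input yields complex DFT coefficients, here is a minimal sketch. The helper names `dft_basis`, `inner`, and `analysis` are hypothetical and chosen only for this illustration.

```python
import cmath


def dft_basis(k, N):
    # phi_k[n] = exp(j 2 pi k n / N), n = 0, ..., N-1
    return [cmath.exp(2j * cmath.pi * k * n / N) for n in range(N)]


def inner(k, m, N):
    # (1/N) * sum_n phi_k[n] * conj(phi_m[n])
    pk, pm = dft_basis(k, N), dft_basis(m, N)
    return sum(a * b.conjugate() for a, b in zip(pk, pm)) / N


def analysis(x):
    # A[k] = sum_n x[n] * conj(phi_k[n])
    N = len(x)
    return [sum(x[n] * dft_basis(k, N)[n].conjugate() for n in range(N))
            for k in range(N)]


same = inner(2, 2, 8)                  # close to 1
different = inner(2, 5, 8)             # close to 0
A = analysis([1.0, 2.0, 0.0, -1.0])    # real input, complex coefficients
```

For this input, `A[1]` works out by hand to 1 − 3j, which confirms that the DFT of a real sequence need not be real.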
A periodic, symmetric sequence from a finite-length sequence
❑ Basis sequences $\phi_k[n]$ in the DCT are cosines
❑ Periodic and even symmetric
❑ For the DFT, we represented finite-length sequences by forming periodic sequences
❑ From which the finite-length sequence can be uniquely recovered
❑ Similarly, for the DCT, we form a periodic, symmetric sequence from a finite-length sequence
❑ Original finite-length sequence can be uniquely recovered
❑ Many ways to do this, hence, many definitions of the DCT (called DCT-1, DCT-2, DCT-3, and DCT-4)
[Figure: original finite-length (N = 4) sequence]
Various periodic, symmetric sequences from a finite-length sequence
❑ $\tilde{x}_1[n]$
❑ Period: 2N − 2 = 6
❑ Even symmetric about both n = 0 and n = N − 1 = 3
❑ $\tilde{x}_2[n]$
❑ Period: 2N = 8
❑ Even symmetric about half-sample points $n = -\frac{1}{2}$ and $n = \frac{7}{2}$
❑ $\tilde{x}_3[n]$
❑ Period: 4N = 16
❑ Even symmetric about both n = 0 and n = 8
❑ $\tilde{x}_4[n]$
❑ Period: 4N = 16
❑ Even symmetric about half-sample points $n = -\frac{1}{2}$ and $n = \frac{15}{2}$
❑ DCT-1 and DCT-2 are most popular
Extension for the DCT-1
❑ x[n] is first modified at the endpoints and then extended to have period 2N − 2
❑ $\tilde{x}_1[n] = x_\alpha[((n))_{2N-2}] + x_\alpha[((-n))_{2N-2}]$
❑ $x_\alpha[n]$ is the modified sequence $x_\alpha[n] = \alpha[n]\,x[n]$
❑ $\alpha[n] = \begin{cases} \frac{1}{2}, & n = 0 \text{ and } N-1 \\ 1, & 1 \le n \le N-2 \end{cases}$
❑ The weighting of the endpoints compensates for the doubling that occurs when the two terms in the expression of $\tilde{x}_1[n]$ overlap at n = 0 and n = N − 1
❑ With this weighting, $\tilde{x}_1[n] = x[n]$ for all $0 \le n \le N-1$, including the overlap points n = 0 and n = N − 1
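The extension can be checked numerically. The sketch below evaluates $\tilde{x}_1[n]$ directly from the definition, treating $x_\alpha[n]$ as zero outside $0 \le n \le N-1$; the function name `x_tilde_1` and the N = 4 test values are illustrative assumptions.

```python
def x_tilde_1(x, n):
    """Type-1 periodic extension evaluated at an arbitrary integer n."""
    N = len(x)
    P = 2 * N - 2                      # period of the extension

    def x_alpha(i):
        # Endpoint-weighted sequence, taken as zero outside 0..N-1.
        if i < 0 or i > N - 1:
            return 0.0
        alpha = 0.5 if i in (0, N - 1) else 1.0
        return alpha * x[i]

    # ((n)) modulo P and ((-n)) modulo P; Python's % is non-negative for P > 0.
    return x_alpha(n % P) + x_alpha((-n) % P)


x = [4.0, 3.0, 2.0, 1.0]                       # N = 4, so the period is 6
one_period = [x_tilde_1(x, n) for n in range(6)]
```

Over one period the values are x[0], x[1], x[2], x[3], x[2], x[1], which makes the even symmetry about n = 0 and n = N − 1 = 3 visible.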
DCT-1 transform pair
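$$X^{c1}[k] = 2 \sum_{n=0}^{N-1} \alpha[n]\,x[n] \cos\left(\frac{\pi k n}{N-1}\right), \quad 0 \le k \le N-1$$
$$x[n] = \frac{1}{N-1} \sum_{k=0}^{N-1} \alpha[k]\,X^{c1}[k] \cos\left(\frac{\pi k n}{N-1}\right), \quad 0 \le n \le N-1$$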
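A direct, unoptimized sketch of the DCT-1 pair follows. It evaluates the two sums as written (O(N²) operations) and assumes N ≥ 2; the function names are hypothetical.

```python
import math


def alpha(i, N):
    # Endpoint weighting: 1/2 at i = 0 and i = N-1, 1 elsewhere.
    return 0.5 if i in (0, N - 1) else 1.0


def dct1(x):
    N = len(x)
    return [2 * sum(alpha(n, N) * x[n] * math.cos(math.pi * k * n / (N - 1))
                    for n in range(N))
            for k in range(N)]


def idct1(X):
    N = len(X)
    return [sum(alpha(k, N) * X[k] * math.cos(math.pi * k * n / (N - 1))
                for k in range(N)) / (N - 1)
            for n in range(N)]


x = [4.0, 3.0, 2.0, 1.0]
X = dct1(x)          # real-valued coefficients for a real input
x_back = idct1(X)    # should reproduce x up to round-off
```

For N = 2 the pair reduces to X[0] = x[0] + x[1] and X[1] = x[0] − x[1], which is an easy case to verify by hand.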
Extension and transform pair for DCT-2
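❑ x[n] is extended to have period 2N
❑ $\tilde{x}_2[n] = x[((n))_{2N}] + x[((-n-1))_{2N}]$
$$X^{c2}[k] = 2 \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le k \le N-1$$
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} \beta[k]\,X^{c2}[k] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le n \le N-1$$
❑ where $\beta[k] = \begin{cases} \frac{1}{2}, & k = 0 \\ 1, & 1 \le k \le N-1 \end{cases}$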
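The DCT-2 pair can be sketched the same way, evaluating the sums as written with the scaling used on this slide (factor 2 in the forward transform, 1/N and β[k] in the inverse). The function names are hypothetical, and library DCT routines may use a different normalization.

```python
import math


def dct2(x):
    N = len(x)
    return [2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                    for n in range(N))
            for k in range(N)]


def idct2(X):
    N = len(X)
    return [sum((0.5 if k == 0 else 1.0) * X[k]        # beta[k] weighting
                * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for k in range(N)) / N
            for n in range(N)]


x = [4.0, 3.0, 2.0, 1.0]
X = dct2(x)
x_back = idct2(X)                  # should reproduce x up to round-off

flat = dct2([1.0] * 8)             # constant input: only the k = 0 term survives
```

With this scaling a constant sequence of value c gives X[0] = 2Nc and zeros elsewhere, a first hint of the energy compaction property.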
DCT-1 and DCT-2 example
Energy compaction
❑ DCT-2 of a finite-length sequence often has its coefficients more highly concentrated at low indices than the DFT does
❑ Preferred in data compression applications
❑ Example: Consider input $x[n] = a^n \cos(\omega_0 n + \phi)$, $n = 0, 1, ..., N-1$
❑ $a = 0.9$, $\omega_0 = 0.1\pi$, $\phi = 0$, and $N = 32$
DFT of the example signal
DCT-2 of the example signal
❑ DCT-2 values are highly concentrated at low indices
❑ Energy of the sequence is more concentrated in the DCT-2 representation than in the DFT representation
Truncated representation
❑ Energy concentration property can be quantified by truncating the DFT and DCT-2 representations
❑ Compare the mean-squared approximation error for the two representations
❑ Both use the same number of real coefficient values
❑ DFT truncated representation:
$$x_m^{dft}[n] = \frac{1}{N} \sum_{k=0}^{N-1} T_m[k]\,X[k]\,e^{j 2\pi k n / N}, \quad n = 0, 1, ..., N-1$$
❑ $X[k]$ is the N-point DFT of $x[n]$
❑ $T_m[k] = \begin{cases} 1, & 0 \le k \le \frac{N-1-m}{2} \\ 0, & \frac{N+1-m}{2} \le k \le \frac{N-1+m}{2} \\ 1, & \frac{N+1+m}{2} \le k \le N-1 \end{cases}$
❑ m = 1 → $X[\frac{N}{2}]$ is removed
❑ m = 3 → $X[\frac{N}{2}]$, $X[\frac{N}{2}+1]$, and $X[\frac{N}{2}-1]$ are removed
❑ ...
❑ $x_m^{dft}[n]$ is synthesized by symmetrically omitting $m \in \{1, 3, 5, ...\}$ DFT coefficients
❑ DCT truncated representation:
$$x_m^{dct}[n] = \frac{1}{N} \sum_{k=0}^{N-1-m} \beta[k]\,X^{c2}[k] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le n \le N-1$$
❑ $\beta[k] = \begin{cases} \frac{1}{2}, & k = 0 \\ 1, & 1 \le k \le N-1 \end{cases}$
Truncation error
$$E^{dft}[m] = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] - x_m^{dft}[n] \right|^2 \qquad E^{dct}[m] = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] - x_m^{dct}[n] \right|^2$$
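The two error measures can be evaluated for the example signal (a = 0.9, ω0 = 0.1π, ϕ = 0, N = 32) with a short sketch. It evaluates the truncated sums directly rather than with a fast algorithm, and the function names are hypothetical.

```python
import cmath
import math


def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def dct2(x):
    N = len(x)
    return [2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                    for n in range(N))
            for k in range(N)]


def truncated_dft(x, m):
    """Resynthesize x after zeroing the m DFT coefficients centered on k = N/2."""
    N = len(x)
    X = dft(x)
    lo, hi = (N + 1 - m) // 2, (N - 1 + m) // 2    # band where T_m[k] = 0
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N) if not lo <= k <= hi) / N
            for n in range(N)]


def truncated_dct(x, m):
    """Resynthesize x from the first N - m DCT-2 coefficients."""
    N = len(x)
    C = dct2(x)
    return [sum((0.5 if k == 0 else 1.0) * C[k]
                * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for k in range(N - m)) / N
            for n in range(N)]


def mse(x, y):
    return sum(abs(a - b) ** 2 for a, b in zip(x, y)) / len(x)


N = 32
x = [0.9 ** n * math.cos(0.1 * math.pi * n) for n in range(N)]
for m in (1, 3, 5, 25):
    print(m, mse(x, truncated_dft(x, m)), mse(x, truncated_dct(x, m)))
```

Because the omitted DFT coefficients come in conjugate pairs around k = N/2, the truncated DFT synthesis stays real up to round-off, so both errors compare real-valued approximations of the same signal.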
❑ DFT error grows steadily as m increases, while the DCT error remains very small up to about m = 25
❑ The N = 32 values of the sequence x[n] can be represented with only slight error by just N − m = 7 DCT-2 coefficients
Applications of DCT
❑ Major application of the DCT-2 is in signal compression
❑ Key part of many standardized algorithms for image, video, and audio coding
❑ e.g., JPEG for images, MPEG for video, etc.
❑ Blocks of the signal are represented by their cosine transforms
❑ Exploiting the energy concentration property of the DCT
JPEG
❑ JPEG algorithm is based on lossy transform coding
❑ Partitions the image into 8 × 8 pixel blocks
❑ Each of these blocks is then coded using the two-dimensional DCT
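A two-dimensional DCT of one block can be sketched by applying the one-dimensional DCT-2 along the rows and then along the columns. This uses the unnormalized DCT-2 scaling from the earlier slides, which is an assumption: the JPEG standard defines its own normalization, and the function names here are hypothetical.

```python
import math


def dct2_1d(x):
    N = len(x)
    return [2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                    for n in range(N))
            for k in range(N)]


def dct2_2d(block):
    """Separable 2-D DCT-2 of a square block given as a list of rows."""
    n = len(block)
    rows = [dct2_1d(r) for r in block]                        # transform each row
    cols = [dct2_1d([rows[i][v] for i in range(n)])           # then each column
            for v in range(n)]
    # cols[v][u] holds vertical frequency u at horizontal frequency v.
    return [[cols[v][u] for v in range(n)] for u in range(n)]


flat_block = [[1.0] * 8 for _ in range(8)]    # 8 x 8 block of constant intensity
coeffs = dct2_2d(flat_block)
```

For a constant block all of the energy lands in the single coefficient `coeffs[0][0]`, equal to 4N² = 256 with this scaling, while every other coefficient is zero up to round-off.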
JPEG example
❑ The next step is to quantize the DCT coefficients
❑ Quantization partitions the range of DCT coefficient values into windows and generates a code to represent each window
❑ Each coefficient is linearly quantized (quantization window size is constant) independently of the other coefficients
❑ JPEG specification allows each DCT coefficient to be assigned its own quantization step size
❑ More important frequency terms can be represented more accurately than less important terms
❑ The frequency terms are scanned or reordered according to increasing spatial frequency (zig-zag scan)
❑ Since higher spatial frequency terms are often zero or quantized to zero, there will tend to be many zero terms in a row
❑ Well suited to run-length coding
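The quantize, zig-zag, and run-length steps can be sketched as follows. The step sizes, the `(zero_run, value)` pair format, and the `"EOB"` end marker are illustrative assumptions; they are not the tables or the entropy code defined by the JPEG specification.

```python
def quantize(block, steps):
    # Each coefficient is divided by its own step size and rounded to an integer.
    return [[round(c / s) for c, s in zip(crow, srow)]
            for crow, srow in zip(block, steps)]


def zigzag_order(n):
    """(row, col) index pairs of an n x n block in zig-zag scan order."""
    def key(p):
        s = p[0] + p[1]                       # anti-diagonal index
        return (s, p[0] if s % 2 else -p[0])  # direction alternates per diagonal
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def run_length_encode(values):
    """Encode as (number of preceding zeros, nonzero value) pairs plus a marker."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")                       # trailing zeros are implied
    return pairs


def encode_block(block, steps):
    q = quantize(block, steps)
    scan = [q[i][j] for i, j in zigzag_order(len(q))]
    return run_length_encode(scan)


# Hypothetical 4 x 4 coefficient block with a uniform step size of 10.
block = [[0.0] * 4 for _ in range(4)]
block[0][0], block[0][1], block[1][0], block[2][2] = 100.0, 20.0, -20.0, 31.0
steps = [[10.0] * 4 for _ in range(4)]
code = encode_block(block, steps)
```

The long run of zeros between the low-frequency terms and the isolated coefficient at position (2, 2) collapses into a single pair, which is why the zig-zag ordering suits run-length coding.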
Thanks.
