
Digital Signal Processing (ECN-312)

Lecture 7 (Applications of Transforms)

Dheeraj Kumar

dheeraj.kumar@ece.iitr.ac.in

February 9, 2023
Table of Contents

1 Application of transforms in speech, audio, image and video coding
    Speech and audio coding
    Image and video coding

2 The discrete cosine transform (DCT)
    DCT-1 and DCT-2
    Energy compaction property of DCT-2
    Applications of DCT: JPEG and MPEG compression standards

Transform coding

❑ A type of data compression scheme for “natural” data like speech,
audio, images, and video
❑ The transformation is typically lossless (perfectly reversible) on its
own
❑ e.g., DFT
❑ Knowledge of the application is used to choose information to
discard, thereby lowering the bandwidth of the signal
❑ Used to enable better (more targeted) quantization
❑ Results in a lower quality copy of the original input
❑ Close enough for the purpose of the application
❑ Lossy compression
❑ It's a “frequency-domain” approach
❑ Efficiency depends on the type of linear transform and the nature
of bit allocation for quantizing transform coefficients
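
The transform coding idea can be sketched in a few lines of Python. This is a minimal illustration rather than a real codec: it assumes the DFT as the transform, one uniform step size for every coefficient (the `step` parameter is a hypothetical choice), and an arbitrary test signal. It uses only the standard library. The transform alone is reversible; quantizing the coefficients is what makes the scheme lossy.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def quantize(X, step):
    # Uniform quantizer applied separately to the real and imaginary parts
    return [complex(round(c.real / step), round(c.imag / step)) * step for c in X]

x = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]   # arbitrary test signal
X = dft(x)

# Transform followed by inverse transform: reversible up to rounding error
lossless = [v.real for v in idft(X)]
# Transform, quantize, inverse transform: a lower quality copy of the input
lossy = [v.real for v in idft(quantize(X, step=2.0))]

max_err_lossless = max(abs(a - b) for a, b in zip(x, lossless))
max_err_lossy = max(abs(a - b) for a, b in zip(x, lossy))
print(max_err_lossless, max_err_lossy)
```

A larger `step` lowers the number of distinct coefficient values that need to be coded, at the cost of a larger reconstruction error.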

Audio coding

❑ For high-quality production of music (including speech) in multiple
channels
❑ Music has a much wider bandwidth than speech and uses multiple channels
❑ High-rate waveform-based speech coder
❑ To retain the natural sound quality
❑ Makes extensive use of human hearing properties in determining
the quantization levels in different frequency bands
❑ Each frequency component is quantized with a step size that
depends on the hearing threshold
❑ Don’t code if the ear cannot hear it!

Audio coding: basic idea

❑ Decompose a signal into separate frequency bands
❑ Analyze signal energy in different bands and determine the total
masking threshold of each band
❑ Quantize samples in different bands with a step size proportional to
the masking level
❑ Any signal below the masking level does not need to be coded
❑ Signals above the masking level are quantized with a quantization
step size chosen according to the masking level
❑ Bits are assigned across bands so that each additional bit provides
the maximum reduction in perceived distortion
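
The per-band quantization rule can be illustrated with a small sketch. Everything numeric here is hypothetical: the band coefficients, the masking levels in `masks`, and the `scale` factor that ties step size to masking level are made-up values, and a real coder would derive the masking thresholds from a psychoacoustic model. The function only shows the two rules from the list: drop what lies below the mask, and quantize the rest with a step that follows the mask.

```python
def quantize_bands(coeffs, masks, scale=1.0):
    """Quantize each band with a step size proportional to its masking level.

    coeffs[b] is the list of coefficients in band b and masks[b] is the
    (hypothetical) masking level of that band, expressed as an amplitude.
    """
    coded = []
    for band, mask in zip(coeffs, masks):
        step = scale * mask
        out = []
        for c in band:
            if abs(c) < mask:
                out.append(0.0)                    # below the mask: not coded
            else:
                out.append(round(c / step) * step)  # coarser step for higher mask
        coded.append(out)
    return coded

# Made-up coefficients for three bands and made-up masking levels
coeffs = [[10.0, -7.3, 0.4], [2.6, -0.2, 1.1], [0.05, 0.3, -0.08]]
masks = [0.5, 1.0, 0.25]
coded = quantize_bands(coeffs, masks)
print(coded)
```

Bit allocation across bands is not modeled here; the sketch fixes the step size directly from the masking level.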

Audio coding block diagram

Image and video coding

❑ Images and videos have a vast amount of data associated with
them
❑ Compression is a key technology for their digital transmission and
storage
❑ Compression techniques take advantage of the structure of
images and video
❑ Statistical, spatial and temporal redundancies
❑ Exploit the limitations of human visual perception to omit
components of the signal that will not be noticed

Image and video coder structure

❑ Transform T(x) is usually invertible
❑ Quantization is not invertible, and introduces distortion
❑ Combination of encoder and decoder (the coding of the quantized
values) is lossless; only quantization loses information
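
A toy version of this structure is sketched below under several assumptions: the transform is a simple pairwise sum/difference (chosen only because it is easy to invert), the quantizer uses a single hypothetical `step`, and `zlib` stands in for the lossless encoder and decoder. None of these choices come from the slides; the point is only to show which stage is reversible and which is not.

```python
import struct
import zlib

def transform(x):
    """Invertible toy transform: pairwise sums followed by pairwise differences."""
    pairs = list(zip(x[0::2], x[1::2]))
    return [a + b for a, b in pairs] + [a - b for a, b in pairs]

def inverse_transform(t):
    half = len(t) // 2
    out = []
    for s, d in zip(t[:half], t[half:]):
        out += [(s + d) / 2, (s - d) / 2]
    return out

def encode(indices):
    # Lossless stage: pack the integer indices and compress them with zlib
    return zlib.compress(struct.pack(f"<{len(indices)}h", *indices))

def decode(payload, count):
    return list(struct.unpack(f"<{count}h", zlib.decompress(payload)))

step = 2.0                                            # hypothetical step size
x = [12.0, 13.0, 12.5, 11.0, 40.0, 41.5, 39.0, 40.0]  # made-up samples

indices = [round(v / step) for v in transform(x)]     # quantizer: not invertible
received = decode(encode(indices), len(indices))      # encoder + decoder: bit-exact
x_hat = inverse_transform([i * step for i in received])

max_err = max(abs(a - b) for a, b in zip(x, x_hat))
print(received == indices, max_err)
```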

Real-valued transform

❑ General class of finite-length transform representations
❑ $A[k] = \sum_{n=0}^{N-1} x[n]\,\phi_k^*[n]$
❑ $x[n] = \frac{1}{N} \sum_{k=0}^{N-1} A[k]\,\phi_k[n]$
❑ Where the basis sequences $\phi_k[n]$ are orthogonal to one another
❑ $\frac{1}{N} \sum_{n=0}^{N-1} \phi_k[n]\,\phi_m^*[n] = \begin{cases} 1, & m = k \\ 0, & m \neq k \end{cases}$
❑ For the DFT, $\phi_k[n] = e^{j 2\pi k n / N}$ are complex and periodic
❑ The sequence A[k] is, in general, complex even if the sequence x[n] is real
❑ Natural to inquire if there exist sets of real-valued basis
sequences that would yield a real-valued A[k] when x[n] is real
❑ Discrete cosine transform (DCT)
❑ Closely related to the DFT
❑ Useful and important in a number of signal-processing applications
(e.g., speech and image compression)
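
The orthogonality relation and the complex-valued DFT coefficients of a real sequence can be checked numerically. The sketch below assumes N = 8 and an arbitrary real test sequence; `phi`, `inner`, and `analysis` are illustrative helper names.

```python
import cmath

N = 8

def phi(k, n):
    """DFT basis sequence phi_k[n] = exp(j*2*pi*k*n/N)."""
    return cmath.exp(2j * cmath.pi * k * n / N)

def inner(k, m):
    """(1/N) * sum over n of phi_k[n] * conj(phi_m[n])."""
    return sum(phi(k, n) * phi(m, n).conjugate() for n in range(N)) / N

def analysis(x):
    """A[k] = sum over n of x[n] * conj(phi_k[n])."""
    return [sum(x[n] * phi(k, n).conjugate() for n in range(N)) for k in range(N)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]   # arbitrary real sequence
A = analysis(x)

print(abs(inner(3, 3)), abs(inner(3, 5)))   # close to 1 and close to 0
print(A[1])                                 # has a nonzero imaginary part
```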

A periodic, symmetric sequence from a
finite-length sequence

❑ Basis sequences $\phi_k[n]$ in the DCT are cosines
❑ Periodic and even symmetric
❑ For DFT, we represented finite-length sequences by forming
periodic sequences
❑ From which the finite-length sequence can be uniquely recovered
❑ Similarly, for DCT, we form a periodic, symmetric sequence from a
finite-length sequence
❑ Original finite-length sequence can be uniquely recovered
❑ Many ways to do this, hence, many definitions of the DCT (called
DCT-1, DCT-2, DCT-3, and DCT-4)

❑ Original finite-length (N = 4) sequence

Various periodic, symmetric sequences from a
finite-length sequence

❑ x̃1 [n]
❑ Period: 2N − 2 = 6
❑ Even symmetric about both n = 0 and n = N − 1 = 3
❑ x̃2 [n]
❑ Period: 2N = 8
❑ Even symmetric about half-sample points n = −1/2 and n = 7/2
❑ x̃3 [n]
❑ Period: 4N = 16
❑ Even symmetric about both n = 0 and n = 8
❑ x̃4 [n]
❑ Period: 4N = 16
❑ Even symmetric about half-sample points n = −1/2 and n = 15/2
❑ DCT-1 and DCT-2 are most popular
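
The two most popular extensions can be built by simple mirroring. The sketch below constructs one period of the type-1 and type-2 extensions for a made-up N = 4 sequence (the values 4, 3, 2, 1 are an assumption, not the sequence from the figure) and checks the stated symmetries.

```python
def extend_type1(x):
    """One period (length 2N-2): x followed by its mirror, endpoints not repeated."""
    return x + x[-2:0:-1]

def extend_type2(x):
    """One period (length 2N): x followed by its reversal, endpoints repeated."""
    return x + x[::-1]

x = [4, 3, 2, 1]          # hypothetical N = 4 sequence
p1 = extend_type1(x)      # period 2N - 2 = 6
p2 = extend_type2(x)      # period 2N = 8

# Type 1: even about n = 0, i.e. x1[n] == x1[-n]
sym1 = all(p1[n % 6] == p1[-n % 6] for n in range(12))
# Type 2: even about n = -1/2, i.e. x2[n] == x2[-1 - n]
sym2 = all(p2[n % 8] == p2[(-1 - n) % 8] for n in range(16))
print(p1, p2, sym1, sym2)
```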

Extension for the DCT-1

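❑ x[n] is first modified at the endpoints and then extended to have
period 2N − 2
❑ $\tilde{x}_1[n] = x_\alpha[((n))_{2N-2}] + x_\alpha[((-n))_{2N-2}]$
❑ $x_\alpha[n]$ is the modified sequence $x_\alpha[n] = \alpha[n]\,x[n]$
❑ $\alpha[n] = \begin{cases} \frac{1}{2}, & n = 0 \text{ and } N-1 \\ 1, & 1 \le n \le N-2 \end{cases}$
❑ The weighting of the endpoints compensates for the doubling that
occurs when the two terms in the expression of $\tilde{x}_1[n]$ overlap at
n = 0 and n = N − 1 (and at the periodic repetitions of these points)
❑ As a result, $\tilde{x}_1[n] = x[n]$ for 0 ≤ n ≤ N − 1, including the
overlap points n = 0 and n = N − 1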
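
The extension formula can be evaluated directly. In this sketch, `((n))` is taken to mean n modulo the period 2N − 2 and the modified sequence is treated as zero outside 0 ≤ n ≤ N − 1; the test sequence is a made-up example.

```python
def alpha(n, N):
    """Endpoint weighting: 1/2 at n = 0 and n = N-1, otherwise 1."""
    return 0.5 if n in (0, N - 1) else 1.0

def x_tilde1(x, n):
    """Type-1 periodic extension evaluated at any integer n."""
    N = len(x)
    P = 2 * N - 2
    # Modified sequence x_alpha, zero-padded to one full period
    xa = [alpha(i, N) * x[i] for i in range(N)] + [0.0] * (P - N)
    return xa[n % P] + xa[(-n) % P]

x = [4.0, 3.0, 2.0, 1.0]                       # hypothetical N = 4 sequence
one_period = [x_tilde1(x, n) for n in range(6)]
print(one_period)
```

The halved endpoints are added to themselves at n = 0 and n = N − 1, which restores the original sample values there.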

DCT-1 transform pair

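$$X^{c1}[k] = 2 \sum_{n=0}^{N-1} \alpha[n]\,x[n] \cos\left(\frac{\pi k n}{N-1}\right), \quad 0 \le k \le N-1$$

$$x[n] = \frac{1}{N-1} \sum_{k=0}^{N-1} \alpha[k]\,X^{c1}[k] \cos\left(\frac{\pi k n}{N-1}\right), \quad 0 \le n \le N-1$$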
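
A direct, unoptimized evaluation of the DCT-1 pair is sketched below using only the standard library. The function names `dct1` and `idct1` and the test sequence are illustrative; the sums are computed exactly as written, at O(N²) cost, rather than through a fast algorithm.

```python
import math

def alpha(i, N):
    """Endpoint weighting: 1/2 at index 0 and N-1, otherwise 1."""
    return 0.5 if i in (0, N - 1) else 1.0

def dct1(x):
    N = len(x)
    return [2 * sum(alpha(n, N) * x[n] * math.cos(math.pi * k * n / (N - 1))
                    for n in range(N))
            for k in range(N)]

def idct1(X):
    N = len(X)
    return [sum(alpha(k, N) * X[k] * math.cos(math.pi * k * n / (N - 1))
                for k in range(N)) / (N - 1)
            for n in range(N)]

x = [4.0, 3.0, 2.0, 1.0]        # made-up test sequence
X = dct1(x)
x_rec = idct1(X)
print(X)
print(x_rec)
```

The coefficients are real for real input, and applying `idct1` to them recovers the original sequence up to rounding error.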

Extension and transform pair for DCT-2

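❑ x[n] is extended to have period 2N
❑ $\tilde{x}_2[n] = x[((n))_{2N}] + x[((-n-1))_{2N}]$

$$X^{c2}[k] = 2 \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le k \le N-1$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} \beta[k]\,X^{c2}[k] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le n \le N-1$$

❑ Where $\beta[k] = \begin{cases} \frac{1}{2}, & k = 0 \\ 1, & 1 \le k \le N-1 \end{cases}$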
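
The DCT-2 pair can be evaluated the same way. The sketch below follows the unnormalized definition given here (factor 2 in the forward transform, 1/N and β[k] in the inverse); library routines often use a different scaling, so results may differ from theirs by constant factors. The names `dct2` and `idct2` and the test sequence are illustrative.

```python
import math

def dct2(x):
    N = len(x)
    return [2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                    for n in range(N))
            for k in range(N)]

def idct2(X):
    N = len(X)
    beta = lambda k: 0.5 if k == 0 else 1.0
    return [sum(beta(k) * X[k] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for k in range(N)) / N
            for n in range(N)]

x = [4.0, 3.0, 2.0, 1.0]        # made-up test sequence
X = dct2(x)
x_rec = idct2(X)
print(X)
print(x_rec)
```

With this scaling the k = 0 coefficient equals twice the sum of the samples.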

DCT-1 and DCT-2 example

Energy compaction

❑ DCT-2 of a finite-length sequence often has its coefficients more
highly concentrated at low indices than the DFT does
❑ Preferred in data compression applications
❑ Example: Consider input $x[n] = a^n \cos(\omega_0 n + \phi)$,
n = 0, 1, ..., N − 1
❑ a = 0.9, ω0 = 0.1π, ϕ = 0, and N = 32

DFT of the example signal

DCT-2 of the example signal

❑ DCT-2 values are highly concentrated at low indices


❑ Energy of the sequence is more concentrated in the DCT-2
representation than in the DFT representation
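
The comparison can be reproduced numerically for the example signal (a = 0.9, ω0 = 0.1π, ϕ = 0, N = 32). The sketch below computes coefficient magnitudes for both transforms by direct summation and reports where the largest ones sit; the variable names are illustrative, and the DCT-2 uses the unnormalized definition from the transform pair.

```python
import cmath
import math

N, a, w0, phase = 32, 0.9, 0.1 * math.pi, 0.0
x = [a**n * math.cos(w0 * n + phase) for n in range(N)]

# Magnitudes of the N-point DFT
dft_mag = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
           for k in range(N)]
# Magnitudes of the DCT-2
dct_mag = [abs(2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                       for n in range(N)))
           for k in range(N)]

dct_peak = max(range(N), key=lambda k: dct_mag[k])
dft_top4 = sorted(sorted(range(N), key=lambda k: dft_mag[k], reverse=True)[:4])

print("largest DCT-2 coefficient at k =", dct_peak)
print("four largest DFT coefficients at k =", dft_top4)
```

Because x[n] is real, the DFT magnitudes are mirrored about k = N/2, so large DFT values appear at both ends of the index range, whereas the large DCT-2 values sit only at low indices.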

Truncated representation

❑ Energy concentration property can be quantified by truncating the
DFT and DCT-2 representations
❑ Compare the mean-squared approximation error for the two
representations
❑ Both use the same number of real coefficient values
❑ DFT truncated representation:

$$x_m^{\text{dft}}[n] = \frac{1}{N} \sum_{k=0}^{N-1} T_m[k]\,X[k]\,e^{j 2\pi k n / N}, \quad n = 0, 1, \dots, N-1$$

❑ X[k] is the N-point DFT of x[n]
❑ $T_m[k] = \begin{cases} 1, & 0 \le k \le \frac{N-1-m}{2} \\ 0, & \frac{N+1-m}{2} \le k \le \frac{N-1+m}{2} \\ 1, & \frac{N+1+m}{2} \le k \le N-1 \end{cases}$


❑ m = 1 → X[N/2] is removed
❑ m = 3 → X[N/2], X[N/2 + 1], and X[N/2 − 1] are removed
❑ ...
❑ $x_m^{\text{dft}}[n]$ is synthesized by symmetrically omitting m ∈ {1, 3, 5, ...}
DFT coefficients
❑ DCT truncated representation:

$$x_m^{\text{dct}}[n] = \frac{1}{N} \sum_{k=0}^{N-1-m} \beta[k]\,X^{c2}[k] \cos\left(\frac{\pi k (2n+1)}{2N}\right), \quad 0 \le n \le N-1$$

❑ $\beta[k] = \begin{cases} \frac{1}{2}, & k = 0 \\ 1, & 1 \le k \le N-1 \end{cases}$

Truncation error

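$$E^{\text{dft}}[m] = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] - x_m^{\text{dft}}[n] \right|^2, \qquad E^{\text{dct}}[m] = \frac{1}{N} \sum_{n=0}^{N-1} \left| x[n] - x_m^{\text{dct}}[n] \right|^2$$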
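
Both truncation errors can be computed by direct summation for the example signal (a = 0.9, ω0 = 0.1π, ϕ = 0, N = 32). This is a sketch with illustrative function names (`dft_error`, `dct_error`); it assumes N even and m odd, so that the band of zeroed DFT indices is centered on k = N/2.

```python
import cmath
import math

N, a, w0 = 32, 0.9, 0.1 * math.pi
x = [a**n * math.cos(w0 * n) for n in range(N)]

X_dft = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
X_dct = [2 * sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
         for k in range(N)]

def dft_error(m):
    """Mean-squared error after zeroing the m DFT coefficients around k = N/2."""
    lo, hi = (N + 1 - m) // 2, (N - 1 + m) // 2
    kept = [k for k in range(N) if not lo <= k <= hi]
    err = 0.0
    for n in range(N):
        xm = sum(X_dft[k] * cmath.exp(2j * math.pi * k * n / N) for k in kept) / N
        err += abs(x[n] - xm) ** 2
    return err / N

def dct_error(m):
    """Mean-squared error after dropping the m highest-index DCT-2 coefficients."""
    err = 0.0
    for n in range(N):
        xm = sum((0.5 if k == 0 else 1.0) * X_dct[k]
                 * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                 for k in range(N - m)) / N
        err += (x[n] - xm) ** 2
    return err / N

for m in (1, 5, 15, 25):
    print(m, dft_error(m), dct_error(m))
```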


❑ DFT error grows steadily as m increases, while the DCT error
remains very small up to about m = 25
❑ The N = 32 samples of the sequence x[n] can therefore be represented
with only slight error by just seven DCT-2 coefficients

Applications of DCT

❑ Major application of the DCT-2 is in signal compression
❑ Key part of many standardized algorithms for image, video, and
audio coding
❑ e.g., JPEG for images, MPEG for video, etc.
❑ Blocks of the signal are represented by their cosine transforms
❑ Exploiting the energy concentration property of DCT

JPEG

❑ JPEG algorithm is based on lossy transform coding


❑ Partitions the image into 8 × 8 pixel blocks
❑ Each of these blocks is then coded using two-dimensional DCT
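
The block partitioning and the two-dimensional DCT can be sketched as follows. The 2-D transform is computed separably, as a 1-D DCT-2 on every row followed by a 1-D DCT-2 on every column, using the unnormalized DCT-2 definition from this lecture (the JPEG standard uses a differently scaled version). The flat test image and the helper names are assumptions made for illustration.

```python
import math

def dct2_1d(v):
    N = len(v)
    return [2 * sum(v[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                    for n in range(N))
            for k in range(N)]

def dct2_2d(block):
    rows = [dct2_1d(r) for r in block]              # transform each row
    cols = [dct2_1d(list(c)) for c in zip(*rows)]   # then each column
    return [list(r) for r in zip(*cols)]            # back to [row][column] order

def blocks(image, size=8):
    """Yield size x size tiles of a 2-D list whose dimensions are multiples of size."""
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            yield [row[c:c + size] for row in image[r:r + size]]

image = [[128.0] * 16 for _ in range(8)]    # hypothetical flat 8 x 16 image
coeffs = [dct2_2d(b) for b in blocks(image)]
print(len(coeffs), coeffs[0][0][0])
```

For a constant block, all of the energy lands in the single coefficient at index (0, 0), which is the extreme case of energy compaction.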

JPEG example

❑ The next step is to quantize the DCT coefficients
❑ Partitions the range of each DCT coefficient into windows and
generates a code to represent each window
❑ Each coefficient is linearly quantized (quantization window size is
constant) independently of the other coefficients
❑ JPEG specification allows each DCT coefficient to be assigned its
own quantization step size
❑ More important frequency terms can be represented more
accurately than less important terms
❑ The frequency terms are scanned or reordered according to
increasing spatial frequency (zig-zag scan)
❑ Since higher spatial frequency terms are often zero or quantized
to zero, there will tend to be many zero terms in a row
❑ Well suited to run-length coding
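
The quantization, zig-zag scan, and run-length steps can be sketched together. Everything numeric below is hypothetical: the step-size table `steps` simply grows with spatial frequency and is not a table from the JPEG specification, the coefficient values are made up, and the (zero run, value) pairs with an `"EOB"` end marker are a simplified stand-in for the actual JPEG entropy-coding symbols.

```python
def zigzag_order(n=8):
    """Visit an n x n block along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order += diag if s % 2 else diag[::-1]
    return order

def quantize(coeffs, steps):
    """Uniform quantization with a separate step size per coefficient."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, steps)]

def run_length(seq):
    """Code each nonzero value with the count of zeros preceding it."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")     # trailing zeros collapse into one end-of-block marker
    return pairs

# Hypothetical step sizes: coarser for higher spatial frequencies
steps = [[8 + 4 * (i + j) for j in range(8)] for i in range(8)]

# Made-up DCT coefficients with energy concentrated at low frequencies
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0] = 624.0, -45.0, 31.0
coeffs[1][1], coeffs[0][2], coeffs[3][3] = -14.0, 9.0, 5.0

levels = quantize(coeffs, steps)
scan = [levels[i][j] for i, j in zigzag_order()]
code = run_length(scan)
print(code)
```

The small coefficient at position (3, 3) is quantized to zero by its large step size, so the 64 values of the block collapse to five pairs and an end marker.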

Thanks.