Prof. Tsuhan Chen


tsuhan@ece.cmu.edu
18-899 Special Topics in Signal Processing
Multimedia Communications:
Coding, Systems, and Networking
Lecture 8
MPEG-1 Audio
18-899/Spring 1998/Chen
MPEG-1 Audio
Outline
Background
Psychoacoustics
Subband coding
Layer I and II
Layer III
Frame structure and packetization
MPEG-1 Audio
ISO/IEC 11172-3 (1988~1991)
First high quality audio compression standard
CD quality two-channel audio at 256 kbits/s
CD: 44.1 kHz × 16 bits × 2 channels = 1.411 Mbits/s
                   Frequency    Sampling     Bits per   Raw Bitrate
                   Band (Hz)    Rate (kHz)   Sample     (kbits/s)
Telephone Speech   300~3400     8            8          64
Wideband Speech    50~7000      16           8          128
Mediumband Audio   10~11000     24           16         384
Wideband Audio     10~22000     48           16         768
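The raw bitrates in the table and the CD figure all follow from sampling rate × bits per sample × number of channels. A minimal Python sketch of that arithmetic is given here; the function name `raw_bitrate_kbps` and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
def raw_bitrate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Uncompressed PCM bitrate in kbits/s."""
    return sampling_rate_khz * bits_per_sample * channels

# (sampling rate in kHz, bits per sample) for each row of the table
table = {
    "Telephone Speech": (8, 8),
    "Wideband Speech": (16, 8),
    "Mediumband Audio": (24, 16),
    "Wideband Audio": (48, 16),
}

for name, (fs_khz, bits) in table.items():
    print(f"{name}: {raw_bitrate_kbps(fs_khz, bits)} kbits/s")

# CD audio: 44.1 kHz, 16 bits, 2 channels (about 1411.2 kbits/s)
cd_kbps = raw_bitrate_kbps(44.1, 16, 2)
print(f"CD: {cd_kbps / 1000:.3f} Mbits/s")
```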
Quality Demonstration
MPEG-1 Audio (Layer II)
Stereo 44.1 kHz at 64 kbits/s
Stereo 44.1 kHz at 128 kbits/s
Stereo 44.1 kHz at 192 kbits/s
Stereo 44.1 kHz at 256 kbits/s
Psychoacoustics
Threshold in quiet
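The threshold curve itself is not reproduced here. As a rough stand-in, the sketch below evaluates a widely used closed-form approximation of the threshold in quiet (Terhardt's fit). This formula is an assumption brought in for illustration; it is not taken from the lecture and is not the table used by the standard's psychoacoustic models.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate threshold in quiet (dB SPL) at f_hz, using Terhardt's fit."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The ear is most sensitive around 3 to 4 kHz; the threshold rises
# steeply toward both low and high frequencies.
for f_hz in (100, 1000, 3300, 10000, 15000):
    print(f"{f_hz:>6} Hz: {threshold_in_quiet_db(f_hz):6.1f} dB SPL")
```

Any signal component below this curve is inaudible even without a masker, so an encoder does not need to spend bits on it.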
Frequency Masking
Temporal Masking
Post-masking: lasts 50~200 ms after the masker ends
Pre-masking also occurs, but is much shorter
Encoder Block Diagram
[Figure: ISO/IEC 11172-3 encoder. PCM audio samples (32, 44.1, 48 kHz) -> mapping -> quantizer and coding -> frame packing -> encoded bitstream. A psychoacoustic model controls the quantizer and coding; ancillary data is added at frame packing.]
Decoder Block Diagram
[Figure: ISO/IEC 11172-3 decoder. Encoded bitstream -> frame unpacking -> reconstruction -> inverse mapping -> PCM audio samples (32, 44.1, 48 kHz). Ancillary data is extracted at frame unpacking.]
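Mapping: Subband Coding
[Figure: M-channel subband coder. Analysis filterbank H1(z), H2(z), ..., HM(z), each followed by downsampling by M and a quantizer Q; synthesis filterbank with upsampling by M followed by F1(z), F2(z), ..., FM(z).]
Critical downsampling (each of the M subbands is downsampled by M, so the total sample rate is unchanged)
Q should be based on the signal-to-masking ratio (SMR)
The ear's critical bands are not uniform, but approximately logarithmic in frequency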
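To make the mismatch between uniform subbands and critical bands concrete, the sketch below measures the width of each uniform subband on the Bark (critical band) scale. The Hz-to-Bark conversion is the common Zwicker and Terhardt approximation, and the choice of 32 subbands at 48 kHz is an assumption made for this example; neither the formula nor the function names come from the lecture.

```python
import math

def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def subband_widths_in_bark(fs_hz=48000, num_subbands=32):
    """Width of each uniform subband, expressed in Bark."""
    width_hz = fs_hz / (2 * num_subbands)  # 750 Hz per subband at 48 kHz
    return [hz_to_bark((i + 1) * width_hz) - hz_to_bark(i * width_hz)
            for i in range(num_subbands)]

widths = subband_widths_in_bark()
print(f"lowest subband spans  {widths[0]:.2f} Bark")
print(f"highest subband spans {widths[-1]:.2f} Bark")
```

Under these assumptions the lowest subband covers several critical bands while the highest covers only a small fraction of one, which is why a uniform filterbank gives coarse perceptual resolution at low frequencies.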
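Polyphase Filterbank
[Figure: polyphase implementation of the M-channel filterbank. Analysis side: delay chain (z^-1), downsampling by M in each branch, polyphase matrix E(z). Synthesis side: polyphase matrix R(z), upsampling by M in each branch, and a second delay chain recombining the branches.]
Alias cancellation and perfect reconstruction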
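A toy illustration of perfect reconstruction in polyphase form: if the synthesis matrix R inverts the analysis matrix E, the critically sampled system returns its input exactly. The sketch assumes the simplest possible case, a constant (memoryless) orthonormal E built from a DCT-II with R equal to its transpose, and M = 4. This is not the MPEG-1 filterbank, which uses a much longer prototype filter; all names here are hypothetical.

```python
import math

def dct_matrix(M):
    """Orthonormal DCT-II matrix, used here as a constant polyphase matrix E."""
    rows = []
    for k in range(M):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / M)
        rows.append([scale * math.cos(math.pi * (n + 0.5) * k / M) for n in range(M)])
    return rows

def analyze(x, E):
    """Take M input samples at a time (delay chain + decimation) and apply E."""
    M = len(E)
    blocks = [x[i:i + M] for i in range(0, len(x), M)]
    return [[sum(E[k][n] * b[n] for n in range(M)) for k in range(M)] for b in blocks]

def synthesize(subbands, E):
    """Apply R = E^T to each subband vector and interleave the outputs."""
    M = len(E)
    out = []
    for s in subbands:
        out.extend(sum(E[k][n] * s[k] for k in range(M)) for n in range(M))
    return out

E = dct_matrix(4)
x = [0.3, -1.2, 0.7, 2.0, -0.5, 0.0, 1.1, -0.9]  # length must be a multiple of M
subbands = analyze(x, E)
y = synthesize(subbands, E)
print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error, near zero
```

The number of subband samples equals the number of input samples (critical sampling), and because R E = I there is no residual aliasing.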
Layers
Increasing complexity, delay, and quality
Layer I
~384 kbits/s for perceptually lossless quality (4:1)
Layer II
~192 kbits/s for perceptually lossless quality (8:1)
Layer III
~128 kbits/s for perceptually lossless quality (12:1)
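The quoted ratios can be checked with a few lines of Python. They come out exact if the uncompressed source is taken to be 48 kHz, 16-bit stereo (1536 kbits/s); that reference is an inference from the numbers, not something the slide states, and the names below are illustrative.

```python
def compression_ratio(raw_kbps, coded_kbps):
    """Ratio of uncompressed to compressed bitrate."""
    return raw_kbps / coded_kbps

raw_48k_stereo = 48 * 16 * 2  # 1536 kbits/s (assumed reference)
layer_bitrates_kbps = {"Layer I": 384, "Layer II": 192, "Layer III": 128}

for layer, kbps in layer_bitrates_kbps.items():
    print(f"{layer}: {compression_ratio(raw_48k_stereo, kbps):.0f}:1")
```

Against CD audio (1411.2 kbits/s) the same bitrates give slightly lower ratios, roughly 3.7:1, 7.4:1, and 11:1.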
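Layer I and II Encoder
[Figure: Layer I and II encoder block diagram. Main path: Analysis Filterbank (32 subbands) -> Scaler & Quantizer -> Coder -> Mux. Side path: FFT -> Masking Threshold Generator -> Dynamic Bit Allocator, which controls the Scaler & Quantizer.]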
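One way to picture the dynamic bit allocator is as a greedy loop: give the next bit to whichever subband currently has the worst noise-to-mask ratio (SMR minus quantizer SNR). The sketch below assumes roughly 6.02 dB of SNR per bit, whereas the standard uses tabulated SNR values and also accounts for scalefactor and side-info bits, so this is a simplified illustration with hypothetical names, not the normative procedure.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedy allocation: each step gives one bit per sample to the subband
    whose noise-to-mask ratio (SMR - SNR) is currently largest."""
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Subbands already at max_bits are excluded from further allocation.
        nmr = [smr - 6.02 * b if b < max_bits else float("-inf")
               for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every subband is saturated
        bits[worst] += 1
    return bits

# Three subbands with SMRs of 20, 5, and -10 dB and a budget of 4 bits
print(allocate_bits([20.0, 5.0, -10.0], 4))  # [3, 1, 0]
```

The subband whose signal already lies below the masking threshold (negative SMR) receives no bits, which is where most of the bitrate saving comes from.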
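Block-Based Coding
[Figure: each analysis filterbank output is grouped into blocks of 12 samples; Layer I codes one block of 12 at a time, Layer II codes three consecutive blocks (3 × 12 = 36) together.]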
12 samples for Layer I, 36 samples for Layer II
Block companding: Each block normalized by scalefactor
For Layer II, up to 3 scalefactors, with 2-bit scalefactor select
Each block receives one bit allocation
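Block companding can be sketched as follows: find a scalefactor for the 12-sample block, normalize by it, and quantize the normalized samples uniformly with the allocated number of bits. The sketch assumes the scalefactor is simply the block's peak magnitude and uses a midtread quantizer with 2^bits - 1 levels; the standard instead picks the scalefactor from a fixed table, so treat the functions below as hypothetical.

```python
def half_range(bits):
    """Largest code magnitude for a midtread quantizer with 2**bits - 1 levels."""
    return 2 ** (bits - 1) - 1

def compand_block(samples, bits):
    """Normalize a block by its peak and quantize to integer codes (bits >= 2)."""
    scale = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero
    h = half_range(bits)
    codes = [round(s / scale * h) for s in samples]
    return scale, codes

def expand_block(scale, codes, bits):
    """Decoder side: rescale the integer codes by the transmitted scalefactor."""
    h = half_range(bits)
    return [c / h * scale for c in codes]

block = [0.5, -0.25, 0.1, 0.0, -0.4, 0.3, 0.05, -0.05, 0.2, -0.5, 0.45, 0.01]
scale, codes = compand_block(block, bits=4)
decoded = expand_block(scale, codes, bits=4)
print(scale, codes)
print(max(abs(a - b) for a, b in zip(block, decoded)))  # at most scale / 14
```

Because the step size scales with the block's peak, quiet blocks get proportionally finer quantization for the same bit allocation.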
Layer III Encoder
[Figure: Layer III encoder block diagram. Blocks shown: Analysis Filterbank, MDCT (6 or 18, with overlap), Scaler & Quantizer, Huffman Coding, Mux; FFT and Masking Threshold Generator; a separate Coding block.]
New Features in Layer III
Modified DCT (MDCT)
DCT with overlap
Long/short window switching
Short for better temporal resolution (to prevent pre-echoes)
Long for better frequency resolution
Nonuniform quantization
Entropy coding
Run-length and Huffman coding
Bit reservoir (buffer)
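The "DCT with overlap" idea can be demonstrated directly: each MDCT block takes 2M windowed samples and yields only M coefficients, yet 50% overlap-add of the inverse transforms cancels the time-domain aliasing and restores the signal. The sketch below is a direct (slow) implementation with a sine window and M = 6, one of the two block sizes named above; the window choice, normalization, and function names are assumptions for illustration, not the Layer III definition.

```python
import math

def sine_window(M):
    """Sine window of length 2M; satisfies w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, M):
    """2M input samples -> M coefficients."""
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M)) for k in range(M)]

def imdct(coefs, M):
    """M coefficients -> 2M time-aliased output samples."""
    return [2.0 / M * sum(coefs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                          for k in range(M)) for n in range(2 * M)]

def analysis_synthesis(x, M):
    """MDCT each 50%-overlapped block, invert, and overlap-add.
    len(x) must be a multiple of M."""
    w = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * M + 1, M):
        block = [padded[start + n] * w[n] for n in range(2 * M)]
        y = imdct(mdct(block, M), M)
        for n in range(2 * M):
            out[start + n] += y[n] * w[n]
    return out[M:M + len(x)]

x = [math.sin(0.3 * n) + 0.1 * n for n in range(24)]
y = analysis_synthesis(x, M=6)
print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error, near zero
```

A shorter block (M = 6 rather than 18) confines quantization noise to a shorter time span, which is the motivation for switching to short windows around transients.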
Frame Structure
[Figure: frame layout: Header Info | Side Info | Subband Samples | Aux Data]
Header info: sync bits, system info, CRC (cyclic redundancy code)
Side info: bit allocation, scalefactor (and scalefactor select for Layer II and III)
Subband samples: 32 × 12 for Layer I, 32 × 36 for Layer II and III
Packetization: 4-byte header, 184-byte payload
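The samples-per-frame figures fix the frame duration and, for a given bitrate, the frame size. A small sketch of that arithmetic follows; it ignores padding and uses illustrative function names that are not from the lecture.

```python
def samples_per_frame(layer):
    """32 subbands × 12 samples (Layer I) or × 36 samples (Layer II and III)."""
    return 32 * (12 if layer == 1 else 36)

def frame_duration_ms(layer, fs_hz):
    """Time covered by one frame, in milliseconds."""
    return 1000.0 * samples_per_frame(layer) / fs_hz

def frame_bytes(layer, bitrate_bps, fs_hz):
    """Average frame size in bytes at a constant bitrate (padding ignored)."""
    return samples_per_frame(layer) * bitrate_bps / (8 * fs_hz)

print(samples_per_frame(1), samples_per_frame(2))  # 384 1152
print(frame_duration_ms(2, 48000))                 # 24.0
print(frame_bytes(2, 192000, 48000))               # 576.0
```

With a 184-byte packet payload, a 576-byte frame of this kind spans a little more than three packets.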
Stereo Redundancy Coding
Four modes: mono, stereo, dual channel (two separate channels), joint stereo
In joint stereo mode
Human stereo perception above 2 kHz is based on the signal envelope
Intensity stereo coding above 2 kHz
Encode (L + R)
Assign independent left and right scalefactors
Layer III also supports (L + R) and (L - R) coding
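Sum/difference coding is a simple invertible transform of the two channels. The sketch below assumes a scaling of 1/2 so that decoding is a plain add and subtract; the standard's exact normalization may differ, and the function names are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples to sum (mid) and difference (side) signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Recover left/right from the sum and difference signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

left = [0.5, 0.25, -0.75]
right = [0.5, 0.125, -0.5]
mid, side = ms_encode(left, right)
print(mid, side)
print(ms_decode(mid, side))
```

When the two channels are strongly correlated, the difference signal is small and cheap to code, which is the redundancy this mode exploits.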
References
Peter Noll, "MPEG digital audio coding," IEEE Signal Processing Magazine, Sept. 1997, pp. 59-81
D. Pan, "A tutorial on MPEG/Audio compression," IEEE Multimedia, vol. 2, no. 2, 1995, pp. 60-74
