Audio Compression
4.1 Introduction
4.2 Audio compression
4.3 Video compression
Decoder
• The decoder simply adds the previous register contents (PCM)
to the received DPCM (difference) signal
• Since the ADC output contains noise, cumulative errors build
up in the value of the register signal
Operation of DPCM
• To eliminate this noise effect, predictive methods are used
to predict a more accurate version of the previous signal
(using not only the current signal but also varying
proportions of a number of the preceding estimated
signals)
• The proportions used are known as the predictor
coefficients
Audio Compression- Third-order predictive DPCM signal encoder and decoder
Operation of Third-order predictive DPCM
• The difference signal is computed by subtracting varying
proportions of the last three predicted values from the
current digitized value output by the ADC
• If the three predictor coefficients are C1 = 0.5 and C2 = C3 = 0.25,
the contents of R1 are shifted right by 1 bit and those of R2
and R3 by 2 bits
• The three shifted values are added together and the resulting
sum is subtracted from the current ADC output
• The value in the R1 register is transferred to R2, that of
R2 to R3, and the new predicted value goes into R1
• Decoder operates in a similar way by adding the same
proportions of the last three computed PCM signals to
the received DPCM signal
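The register shuffle described above can be sketched in Python. The coefficients C1 = 0.5 and C2 = C3 = 0.25 are taken from the text; floating-point multiplication stands in for the bit-shift implementation, and no quantizer is modelled:

```python
import numpy as np

# Coefficients from the text; in hardware, multiplying by 0.5 and
# 0.25 is done by shifting R1 right by 1 bit and R2, R3 by 2 bits.
C = np.array([0.5, 0.25, 0.25])

def dpcm_encode(samples):
    regs = np.zeros(3)                  # R1, R2, R3
    diffs = []
    for s in samples:
        pred = C @ regs                 # C1*R1 + C2*R2 + C3*R3
        diffs.append(s - pred)          # difference (DPCM) signal
        # R1 -> R2, R2 -> R3, new predicted value into R1
        regs = np.array([pred + diffs[-1], regs[0], regs[1]])
    return diffs

def dpcm_decode(diffs):
    regs = np.zeros(3)
    out = []
    for d in diffs:
        pred = C @ regs                 # same proportions as encoder
        out.append(pred + d)            # add received DPCM signal
        regs = np.array([out[-1], regs[0], regs[1]])
    return out
```

Because this idealized sketch omits the quantizer between encoder and decoder, the round trip is lossless; a real codec quantizes the difference signal.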
Adaptive differential PCM (ADPCM)
• Savings of bandwidth or improved quality can be
obtained by varying the number of bits used for the
difference signal depending on its amplitude (fewer bits
to encode smaller difference signals)
• An international standard for this is defined in ITU-T
Recommendation G.721
• This is based on the same principle as DPCM except that an
eighth-order predictor is used and the number of bits used
to quantize each difference value is varied
• This can be either 6 bits (producing 32 kbps) to obtain a
better-quality output than with third-order DPCM, or 5
bits (producing 16 kbps) if lower bandwidth is more
important
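The amplitude-dependent bit allocation can be sketched as below; the thresholds and bit counts are invented for illustration and are not the G.721 quantizer rules:

```python
def bits_for_difference(d, thresholds=(0.1, 0.5)):
    # Illustrative only: the thresholds and bit counts are
    # assumptions, not those of the actual G.721 quantizer.
    if abs(d) < thresholds[0]:
        return 2            # small difference: fewer bits suffice
    if abs(d) < thresholds[1]:
        return 4
    return 6                # large difference: more bits
```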
Adaptive differential PCM (ADPCM)
• A second ADPCM standard, a derivative of G.721, is defined in
ITU-T Recommendation G.722 (better sound quality)
• This uses subband coding in which the input signal prior to sampling
is passed through two filters: one which passes only signal
frequencies in the range 50Hz through to 3.5kHz and the other only
frequencies in the range 3.5kHz through to 7kHz
• By doing this the input signal is effectively divided into two separate
equal-bandwidth signals, the first known as the lower subband signal
and the second the upper subband signal
• Each is then sampled and encoded independently using ADPCM, the
sampling rate of the upper subband signal being 16 ksps to allow for
the presence of the higher frequency components in this subband
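A minimal sketch of the two-band split follows, using an FFT brick-wall filter for clarity; G.722 itself uses quadrature mirror filters, and only the 3.5 kHz split frequency is from the text:

```python
import numpy as np

def split_subbands(x, fs, f_split=3500.0):
    # Illustrative brick-wall split in the frequency domain; the
    # real G.722 encoder uses quadrature mirror filters, not an FFT.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    lower = np.fft.irfft(X * (freqs <= f_split), len(x))
    upper = np.fft.irfft(X * (freqs > f_split), len(x))
    return lower, upper     # lower and upper subband signals
```

Since the two frequency masks partition the spectrum, the two subband signals sum back to the original, mirroring how the decoder recombines the streams.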
Audio Compression- ADPCM subband encoder and
decoder schematic
Adaptive differential PCM (ADPCM)
• The use of two subbands has the advantage that
different bit rates can be used for each
• In general the frequency components in the lower
subband have a higher perceptual importance than
those in the higher subband
• For example with a bit rate of 64 kbps the lower
subband is ADPCM encoded at 48kbps and the
upper subband at 16kbps
• The two bitstreams are then multiplexed together
to produce the transmitted (64 kbps) signal – in
such a way that the decoder in the receiver is able
to divide them back again into two separate
streams for decoding
Adaptive predictive coding
• Even higher levels of compression are possible, at the cost
of higher complexity
• These can be obtained by also making the predictor
coefficients adaptive
• In practice, the optimum set of predictor coefficients varies
continuously, since it is a function of the characteristics
of the audio signal being digitized
• To exploit this property, the input speech signal is divided
into fixed time segments and, for each segment, the
currently prevailing characteristics are determined
• The optimum set of coefficients is then computed, and these
are used to predict the signal more accurately
• This type of compression can reduce the bandwidth
requirements to 8kbps while still obtaining an acceptable
perceived quality
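The per-segment computation of optimum coefficients can be sketched as a least-squares fit; the third-order fit and the synthetic test signal are assumptions for illustration, not the method of a specific codec:

```python
import numpy as np

def optimum_coefficients(segment, order=3):
    # Least-squares sketch: fit the coefficients that best predict
    # each sample from the previous `order` samples of the segment.
    seg = np.asarray(segment, dtype=float)
    A = np.array([seg[i - order:i][::-1]       # [x[i-1], x[i-2], ...]
                  for i in range(order, len(seg))])
    b = seg[order:]                            # samples to predict
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs
```

For a segment that genuinely follows a fixed third-order recursion, the fit recovers the underlying coefficients exactly.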
Linear predictive coding (LPC) signal encoder and
decoder
Temporal masking
• After the ear hears a loud sound it takes a further short
time before it can hear a quieter sound
• This is known as temporal masking
• After the loud sound ceases it takes a short period of time
for the signal amplitude to decay
• During this time, signals whose amplitudes are less than
the decay envelope will not be heard and hence need not
be transmitted
• In order to achieve this the input audio waveform must be
processed over a time period that is comparable with that
associated with temporal masking
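A minimal sketch of exploiting temporal masking, assuming a simple exponential decay envelope (the decay constant is illustrative):

```python
def temporally_masked(samples, decay=0.8):
    # Sketch: track a decaying envelope of the loudest recent
    # amplitude; samples under the envelope are masked (inaudible)
    # and need not be transmitted. `decay` is an assumed constant.
    env = 0.0
    audible = []
    for s in samples:
        env *= decay                 # envelope decays over time
        if abs(s) >= env:
            audible.append(s)        # audible: keep and reset envelope
            env = abs(s)
        else:
            audible.append(0.0)      # masked: not transmitted
    return audible
```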
MPEG Audio Coders
• Perceptual coding is used in a range of
different audio compression applications
• The Motion Pictures Expert Group (MPEG) was formed by
ISO to formulate a set of standards relating to a
range of multimedia applications that involve
the use of video with sound
• Coders associated with audio compression
part – MPEG audio coders
Audio Compression – MPEG perceptual coder
schematic
MPEG audio coder
• The audio input signal is first sampled and quantized using
PCM
• The bandwidth available for transmission is divided into a
number of frequency subbands using a bank of analysis
filters also known as critical band filters
• The bank of filters maps each set of 32 (time-related) PCM
samples into an equivalent set of 32 frequency samples,
one per subband (the subband samples)
– These indicate the magnitude of each of the 32 frequency components
that are present in a segment of the audio input signal of a time
duration equal to 32 PCM samples
– E.g. assuming 32 subbands and a sampling rate of 32 ksps (i.e. a max.
frequency of 16 kHz), each subband has a bandwidth of 500 Hz
• The time duration of each sampled segment of the
audio input signal is equal to the time to accumulate
12 successive sets of 32 PCM (and hence subband)
samples, i.e. 12 × 32 PCM samples
• The analysis filter bank also determines the maximum
amplitude of the subband samples in each subband,
which is the scaling factor for the subband
• Both are passed to the psychoacoustic model together
with the set of frequency samples in each subband
• Processing associated with both frequency
and temporal masking is carried out by the
psychoacoustic model
• The 12 sets of 32 PCM samples are converted into
frequency components using the DFT
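The mapping of 12 sets of 32 PCM samples to subband samples and scale factors can be sketched with a DFT standing in for the analysis filter bank; the real MPEG polyphase filter bank is considerably more elaborate:

```python
import numpy as np

def analysis_frame(pcm_frame):
    # DFT stand-in for the analysis filter bank: 32 time-domain PCM
    # samples -> 32 frequency-domain magnitudes (one per subband).
    assert len(pcm_frame) == 32
    return np.abs(np.fft.fft(pcm_frame))

def segment_subband_samples(pcm_segment):
    # 12 successive sets of 32 PCM samples -> a 12 x 32 array of
    # subband samples, plus one scale factor (the maximum subband
    # amplitude over the segment) per subband.
    frames = np.array([analysis_frame(pcm_segment[i * 32:(i + 1) * 32])
                       for i in range(12)])
    return frames, frames.max(axis=0)
```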
MPEG audio coder
• The output of the psychoacoustic model is a set of what are
known as signal-to-mask ratios (SMRs); these identify the
frequency components whose amplitude is below the
audibility threshold
• This allows more bits to be allocated to the regions of
highest sensitivity compared with the less sensitive regions
• In the encoder, all the frequency components are carried in
a frame
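A greedy sketch of SMR-driven bit allocation follows, assuming each extra bit buys roughly 6 dB of quantization SNR; the loop is illustrative, not the standard's allocation algorithm:

```python
import numpy as np

def allocate_bits(smr_db, total_bits):
    # Greedy sketch: repeatedly give one bit to the subband with the
    # highest remaining signal-to-mask ratio. Each bit is assumed to
    # improve quantization SNR by about 6 dB.
    smr = np.array(smr_db, dtype=float)
    bits = np.zeros(len(smr), dtype=int)
    for _ in range(total_bits):
        i = int(np.argmax(smr))
        if smr[i] <= 0:
            break              # everything is already below the mask
        bits[i] += 1
        smr[i] -= 6.0
    return bits
```

Subbands whose components are already below the masking threshold (SMR ≤ 0) receive no bits, i.e. they are not transmitted.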
[Table: MPEG audio layers – Layer 3 provides CD-quality output at
64 kbps per channel, with a processing delay of about 60 ms]
Reference:
• Fred Halsall, Multimedia Communications: Applications,
Networks, Protocols and Standards, 1st ed., Pearson
Education India, 2002