Speech Coding
Speech coders
  Source coders: LPC, Vocoders
  Waveform coders
    Frequency domain: SBC, ATC
    Time domain
      Nondifferential: PCM
      Differential: Delta, ADPCM, CVSDM, APC
PCM (Pulse Code Modulation)
Here, analog signals are quantized in uniform steps, as in ordinary
A/D conversion
It does not compress the information rate, since it does not use
speech-specific characteristics; the design is based only on the
statistical characteristics of speech amplitude
The number of quantization bits B must satisfy:
$$2^B \Delta \ge L \quad \text{or} \quad B \ge \log_2(L / \Delta)$$
where Δ is the quantization step size, and L is the range of signal
amplitude
The number of bits B must be decided so that the SNR of the
quantized signal is larger than that of the signal before quantization
The PCM used in the ordinary telephone system is called log PCM, since
the amplitude is compressed by a logarithmic transformation before
linear quantization and coding
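The bit-count condition for PCM can be checked numerically. The sketch below is illustrative only: the function names and the mid-rise quantizer layout are assumptions, not something the slides specify.

```python
import math

def bits_required(signal_range, step):
    """Smallest integer B such that 2**B * step >= signal_range."""
    return math.ceil(math.log2(signal_range / step))

def uniform_quantize(x, signal_range, bits):
    """Mid-rise uniform quantizer over [-signal_range/2, +signal_range/2).

    Assumed layout: 2**bits cells of equal width, output at the cell centre.
    """
    levels = 2 ** bits
    step = signal_range / levels
    index = math.floor(x / step)
    # Clip to the available codes so out-of-range inputs saturate.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step
```

For an in-range sample the reconstruction error is at most half a step, which is why a smaller step (more bits) raises the quantization SNR.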
PCM
Since the amplitude of a speech signal has an exponential distribution,
the occurrence probability for each bit is equalized by the logarithmic
transformation
Two types of transformation expressions that control the compression:
μ-law:
Logarithmic compressor used in North American
telecommunications systems
Has an input-output magnitude characteristic of the form:
$$|y| = \frac{\log(1 + \mu |s|)}{\log(1 + \mu)}$$
where |s| is the magnitude of the input, |y| is the magnitude of the
output, and μ is a parameter that is selected to give the desired
compression characteristics
PCM
The larger the value of μ, the larger the amount of compression
Typically μ is chosen between 100 and 500. μ = 255 is the standard
for encoding speech waveforms in the US and Canada
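A minimal sketch of the μ-law characteristic, assuming the input is normalized to the range [-1, 1]. The function names are hypothetical, and the expander is simply the algebraic inverse of the compressor rather than a bit-exact telephone codec.

```python
import math

MU = 255.0  # value quoted in the slide for the US and Canada

def mu_law_compress(s, mu=MU):
    """|y| = log(1 + mu|s|) / log(1 + mu), with the sign of s restored."""
    return math.copysign(math.log1p(mu * abs(s)) / math.log1p(mu), s)

def mu_law_expand(y, mu=MU):
    """Inverse mapping: |s| = ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)
```

A small input such as 0.01 is mapped to an output above 0.2, so low-level speech occupies many more quantizer levels than it would under uniform quantization.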
A-law:
Logarithmic compressor used in European telecommunications
systems
Has an input-output magnitude characteristic of the form:
$$|y| = \frac{1 + \log(A |s|)}{1 + \log A}$$
where |s| is the magnitude of the input, |y| is the magnitude of the
output, and A is a parameter that is selected to give the desired
compression characteristics
PCM
A = 87.56 is chosen as the standard for encoding speech waveforms
Even though the two compression characteristics are different nonlinear
functions, they are very similar.
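The A-law characteristic can be sketched in the same way, again assuming an input normalized to [-1, 1]. The logarithmic form shown in the slide applies for |s| >= 1/A; the linear segment used below for smaller inputs is the usual A-law definition and is added here as an assumption, since the slide does not show it.

```python
import math

A = 87.56  # value quoted in the slide

def a_law_compress(s, a=A):
    """A-law compressor for s in [-1, 1] (natural logarithm)."""
    x = abs(s)
    if x < 1.0 / a:
        # Linear segment near zero (standard A-law, not shown in the slide).
        y = a * x / (1.0 + math.log(a))
    else:
        # Logarithmic segment: |y| = (1 + log(A|s|)) / (1 + log A).
        y = (1.0 + math.log(a * x)) / (1.0 + math.log(a))
    return math.copysign(y, s)
```

At an input of 0.1 this gives roughly 0.58, while μ-law with μ = 255 gives roughly 0.59, which illustrates how close the two curves are.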
Differential PCM
This method is based on predictive coding:
Information compression can be achieved by coding the difference
between adjacent samples or the difference between the actual
sample value and the value predicted from the correlation
(the prediction residual);
This is possible since a speech signal has a correlation between
adjacent samples as well as between distant samples;
The number of quantization bits can be reduced, because the difference
and the prediction residual have a smaller range of variation and a
smaller mean energy than the original signal.
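[Figure: DPCM encoder and decoder block diagrams. Encoder: input s(n), prediction error e(n), quantizer output ẽ(n) sent to the channel, predictor driven by the reconstruction s̃(n). Decoder: predictor loop producing s̃(n), which is sent to the D/A converter.]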
Differential PCM
When the prediction is performed according to linear prediction, the
prediction residual is quantized and transmitted. For first-order
linear prediction, the equation becomes $e_t = s_t + \alpha_1 s_{t-1}$.
If the predictor coefficient is set to $\alpha_1 = -1$, the system only
transmits the difference between adjacent samples. This system is
called differential PCM (DPCM)
This differential method is used to cope with the accumulated encoder
error and to achieve the maximum prediction gain
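A minimal sketch of first-order DPCM with the predictor coefficient set to -1, so that the residual is the difference from the previous sample. The function names and the uniform residual quantizer are assumptions for illustration. The encoder predicts from the reconstructed previous sample, not the original one, which is what keeps the quantization error from accumulating at the decoder.

```python
def dpcm_encode(samples, step):
    """Return integer codes for the quantized prediction residual."""
    codes = []
    recon_prev = 0.0                # reconstruction the decoder will also have
    for s in samples:
        e = s - recon_prev          # residual with predictor coefficient -1
        code = round(e / step)      # uniform quantizer index (assumed)
        codes.append(code)
        recon_prev += code * step   # track the decoder state exactly
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized residuals."""
    out = []
    recon = 0.0
    for code in codes:
        recon += code * step
        out.append(recon)
    return out
```

With this loop structure every decoded sample stays within half a step of the original, regardless of how long the sequence is.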
Differential PCM
One nonstationary characteristic of a speech signal is that the variance
and the autocorrelation function of the source output vary with time,
while PCM and DPCM encoders are designed on the basis that the
source output is stationary
The efficiency and performance of these encoders can be improved by
adapting them to the slowly time-variant statistics of the speech signal
In PCM and DPCM, the quantization error q(n) from a uniform
quantizer operating on a nonstationary input signal has a time-variant
variance (quantization noise power); one improvement that reduces the
dynamic range of the quantization noise is the use of an adaptive
quantizer.
An adaptive quantizer used in conjunction with PCM is called Adaptive
PCM (APCM); used with DPCM it is called Adaptive DPCM (ADPCM)
Adaptive PCM
A method that exploits the nonstationarity of the dynamic
characteristics of speech amplitude to improve the SNR of quantized
speech
The step size of the quantization is varied according to the rms value of
the amplitude.
Since the speech signal can be considered to be stationary for a short
period, the step size can be varied relatively slowly
Two different classifications of adaptive quantizers: feedforward and
feedback.
Adaptive PCM
feedforward adaptive quantizer
Step size is adjusted for each signal sample based on a short-term
temporal estimate of the input speech signal variance
The optimum step size is decided according to the rms value
calculated for every block, and is transmitted to the receiver as side
information
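[Figure: feedforward adaptive quantizer. Encoder: the input s(n) is scaled by an adaptive gain controller driven by the input, then quantized. Decoder: the quantizer output is rescaled to give s̃(n).]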
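The feedforward scheme can be sketched as a block-wise quantizer whose step size follows the rms value of each block and is sent as side information. Everything named here is hypothetical: the block size, the 3-bit quantizer, and the `loading` factor (how many rms values the quantizer range covers) are illustrative assumptions.

```python
import math

def apcm_feedforward_encode(samples, block_size=4, bits=3, loading=4.0):
    """Return a list of (step, codes) pairs, one per block.

    The step is the side information; codes are mid-rise quantizer indices.
    """
    levels = 2 ** bits
    blocks = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        rms = math.sqrt(sum(x * x for x in block) / len(block))
        # Quantizer range is +/- loading * rms; guard against silent blocks.
        step = max(2.0 * loading * rms / levels, 1e-12)
        codes = []
        for x in block:
            idx = math.floor(x / step)
            idx = max(-(levels // 2), min(levels // 2 - 1, idx))
            codes.append(idx)
        blocks.append((step, codes))
    return blocks

def apcm_feedforward_decode(blocks):
    """Rebuild samples from each block's step size and codes."""
    out = []
    for step, codes in blocks:
        out.extend((idx + 0.5) * step for idx in codes)
    return out
```

A quiet block gets a small step and a loud block a large one, so the same number of bits yields a similar SNR in both.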
Adaptive PCM
feedback (backward) adaptive quantizer
The step size does not need to be transmitted, since it can be
automatically generated sample by sample by using reconstructed
samples at both ends
The output of the quantizer is used in the adjustment of the step size
The backward adaptation is more efficient in bit rate than the forward
adaptation
The forward adaptation has a higher bit rate, because of the side
information
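[Figure: feedback (backward) adaptive quantizer. Encoder: the input s(n) is scaled and quantized, with an adaptive gain controller driven by the quantizer output. Decoder: a matching adaptive gain controller driven by the received codes rescales them to give the output s̃(n).]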
Adaptive DPCM
In Adaptive Differential PCM (ADPCM), the predictor is made adaptive
The coefficients of the predictor can be changed periodically to reflect the
changing signal statistics of the source
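[Figure: ADPCM encoder and decoder. Encoder: the predictor output is subtracted from s(n) to give e(n); the quantizer, with step-size adaptation, produces ẽ(n) for the channel; a predictor adaptation block updates the predictor, which is driven by s̃(n). Decoder: ẽ(n) from the channel is added to the predictor output to give s̃(n), which is sent to the D/A converter.]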
Adaptive DPCM
Here, the short-term autocorrelation method is used to compute
estimates of the LP parameters over the current frame
The predictor coefficients thus determined are transmitted along with the
quantized error to the receiver, which implements the same predictor
Therefore, the transmission of the predictor coefficients results in a
higher bit rate over the channel, offsetting in part the lower data rate
achieved by having a quantizer with fewer bits (fewer levels) to handle
the reduced dynamic range in the error resulting from adaptive
prediction.
As an alternative to transmitting the prediction coefficients, the
reflection coefficients can be transmitted. They have a smaller dynamic
range and thus result in a lower bit rate
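One common way to realize the autocorrelation method is the Levinson-Durbin recursion, which yields both the predictor coefficients and the reflection coefficients. The slides do not name the algorithm, so treat this as an assumed illustration. The sign convention follows the residual form used earlier in the deck, e(n) = s(n) + sum of a[i] s(n-i).

```python
def autocorrelation(frame, max_lag):
    """Short-term autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, k, err).

    a   : predictor polynomial with a[0] = 1
    k   : reflection coefficients, one per recursion stage
    err : final prediction error energy
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= (1.0 - ki * ki)
    return a, k, err
```

For a valid autocorrelation sequence each reflection coefficient has magnitude below 1, which is the bounded dynamic range that makes them attractive to quantize and transmit.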
Adaptive DPCM
A 32-kbps ADPCM standard has been established by CCITT
(International Telegraph and Telephone Consultative Committee)
for international telephone communications and by ANSI (American
National Standards Institute) for North American telephone systems
The forward type of ADPCM, where optimum prediction is performed
for every frame of the speech signal, is called adaptive predictive coding
(APC). In the narrowest sense, this term designates a coding system
involving pitch prediction and two-level quantization for the predictive
parameters
APC (Adaptive Predictive Coding)
Viewed as an enhanced version of ADPCM where the periodicity of voiced
speech is used to reduce the size of error. Thus fewer bits are needed to
represent the error sequence
APC
The speech signal is analyzed frame by frame to obtain the predictor
coefficients $\alpha_i$, the pitch period M and the amplitude of the
pitch component $\beta$.
This information and the quantization step width q for the residual signal,
which together are called side information, are transmitted along with the
residual signal. This residual signal is quantized and 1-bit coded (two
levels)
Since linear prediction is performed using all samples in each frame, a
large prediction gain can be obtained
Subjective evaluation experiments indicated that when the sampling
frequency is 6.67 kHz (the transmission bit rate for the residual signal is
6.67 kbps and a small amount of side information is additionally
transmitted), the quality of coded speech is slightly lower than with 6-bit
log PCM
Delta Modulation
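A simplified form of DPCM where a two-level (1-bit) quantizer is used in
conjunction with a fixed first-order predictor
[Figure: delta modulation encoder and decoder. Encoder: the prediction is subtracted from s(n) to give e(n), which is quantized to ẽ(n) = ±1 and sent to the channel; a unit delay z^-1 supplies the prediction s̃(n-1). Decoder: ẽ(n) is accumulated through a unit delay z^-1 to give s̃(n), which passes through a lowpass filter to the output.]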
This is an extreme method of differential quantization, where the sampling
frequency is raised so high that the difference between adjacent samples
can be approximated by a 1-bit representation
Its advantage is its simple structure, based on the fact that the correlation
between adjacent samples increases as a function of the sampling
frequency, except for uncorrelated signals. As the correlation increases, the
prediction residual decreases
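A minimal sketch of delta modulation with a fixed step size. The names are hypothetical, the ±1 quantizer output is scaled by an assumed `step`, and the lowpass filter at the decoder output is left out for brevity.

```python
def delta_modulate(samples, step):
    """Emit one bit per sample: 1 to step up, 0 to step down."""
    bits = []
    approx = 0.0                     # staircase approximation s~(n-1)
    for s in samples:
        bit = 1 if s >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits

def delta_demodulate(bits, step):
    """Accumulate the +/- step decisions to rebuild the staircase."""
    out = []
    approx = 0.0
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out
```

As long as the signal changes by less than one step per sample (no slope overload), the staircase stays within one step of the input. This is why the sampling frequency has to be raised until adjacent samples differ only slightly.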
SBC (Subband Coding)
It is a coding in frequency domain, where the speech band is divided
into several neighbouring bands by a bank of band-pass filters (BPFs),
and a specific coding strategy is employed for each band signal
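[Figure: SBC block diagram. Encoder: the input feeds N parallel branches (BPF 1..N, DS 1..N, Coder 1..N) whose outputs are combined by a multiplexer. Decoder: a demultiplexer feeds N parallel branches (Decoder 1..N, INT 1..N, BPF 1..N) whose outputs are combined to form the output.]
DS = down-sampling; INT = interpolation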
The speech signal passing through each BPF is transformed into a baseband
signal by low-frequency conversion, down-sampled at the Nyquist rate,
and coded by an adaptive coding method, such as APCM
The inverse procedures reproduce the original signal
SBC
Design of the filter is important in achieving good performance of the
SBC
Advantages:
Processing concerning human auditory characteristics such as noise
shaping can easily be applied
A higher bit rate can be allocated to those bands in which higher
speech energy is concentrated or to those bands which are
subjectively more important
Produces less perceptible quantization noise at the same or even at a
lower bit rate
The quantization noise produced in one band does not influence any
other band; that is, low-level speech input will not be corrupted by
quantization noise in another band
SBC
Since a short-time frequency analysis of input signals is performed in
the human auditory system, the method for controlling the
quantization noise in the frequency domain is effective and relatively
natural
The filter bank necessary for this method is realized by general digital
filters which handle analog sampled values. The most reasonable
way of dividing the frequency band is to equalize the contributions to
the articulation index from all subbands
Although this method is classified as frequency-domain coding, it
can also be regarded as a time-domain coding method, in which input
signals are subdivided into frequency bands and quantized.
ATC (Adaptive Transform Coding)
It is a method where a speech signal is divided into several frequency
bands, similarly to SBC. Here, a speech wave is divided into frames,
where each frame can be considered stationary.
Each speech frame is first orthogonally transformed into
frequency-domain components, which are subsequently processed by adaptive
quantization
ATC (Adaptive Transform Coding)
At the decoder stage, the speech wave is reproduced by concatenating
the inverse-transformed block waveforms
The system usually uses a discrete cosine transform for the
transformation and adaptive bit allocation for the quantization.
To achieve coding efficiency, more bits are assigned to the more
important spectral coefficients and fewer bits to the less important
spectral coefficients.
By dynamically allocating the total number of bits among the spectral
coefficients, the coder can adapt to the changing statistics of the
speech signal.
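The two building blocks of ATC, the transform and the bit allocation, can be sketched as follows. This is an illustration under stated assumptions: an orthonormal DCT-II written out directly, and a simple greedy allocation rule that hands out one bit at a time to the coefficient with the largest remaining energy. The slides do not specify the allocation rule, so that part is hypothetical.

```python
import math

def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient k of an n-point frame."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(x):
    """Orthonormal DCT-II of one frame."""
    n = len(x)
    return [_scale(k, n) * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n)
                               for t in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
                for k in range(n))
            for t in range(n)]

def allocate_bits(energies, total_bits):
    """Greedy allocation: each bit goes to the largest remaining energy."""
    bits = [0] * len(energies)
    remaining = list(energies)
    for _ in range(total_bits):
        k = max(range(len(remaining)), key=remaining.__getitem__)
        bits[k] += 1
        remaining[k] /= 4.0   # one more bit cuts noise power by about 6 dB
    return bits
```

Given coefficient energies of 16, 1, 0.25 and 0 and a budget of 4 bits, the rule assigns 3 bits to the first coefficient, 1 to the second and none to the rest.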
Vocoders
The previous waveform coding techniques are based on a sample-by-sample
or frame-by-frame representation of the speech waveform, in either
the time or the frequency domain
Here, the method is based on representing the speech signal
by an all-pole model of the vocal system. In other words, the speech
production system is modeled as an all-pole filter
For voiced speech, the excitation is a periodic impulse train with period
equal to the pitch period of the speech; for unvoiced speech, the
excitation is a white noise sequence
Basically, a vocoder estimates the model parameters from
frames of speech (speech analysis), encodes and transmits the parameters
to the receiver on a frame-by-frame basis, and reconstructs the speech
signal from the model (speech synthesis) at the receiver
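The synthesis side of this model can be sketched as an excitation generator feeding an all-pole filter. The function names, the unit-amplitude impulses, and the Gaussian noise source are assumptions for illustration. The filter uses the same sign convention as the residual equation earlier in the deck, so that s(n) = gain * u(n) - sum of alpha[i] s(n-i).

```python
import random

def excitation(n_samples, pitch_period=None, seed=0):
    """Voiced: impulse train with the given period. Unvoiced: white noise."""
    if pitch_period:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(u, alpha, gain=1.0):
    """Run the excitation u through 1 / (1 + sum_i alpha[i-1] z^-i)."""
    out = []
    for n, x in enumerate(u):
        acc = gain * x
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out
```

Only the filter coefficients, the gain, the voiced/unvoiced flag, and the pitch period need to be transmitted per frame, which is where the low bit rate of a vocoder comes from.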
Vocoders
The most widely discussed types include channel vocoders, phase vocoders,
formant vocoders and cepstral vocoders
LPC (Linear Predictive Coders)
This is a time-domain vocoder, where the significant features of speech
are extracted from the time waveform
LPC is computationally intensive; however, it is the most popular
among the class of low bit rate vocoders
LPC (Linear Predictive Coders)
Advantages:
The system is free from quality degradation due to source modeling
A low-frequency waveform is exactly reproduced within the limit of
the quantization error
Spectral information for the entire frequency range is efficiently
represented by this method
Since pitch period estimation and voiced/unvoiced decision are not
necessary, the system is free from both pitch estimation error and
voiced/unvoiced decision error.
The most widely discussed variants include residual-excited LPC and
multipulse LPC
Performance Evaluation
Two techniques for evaluating the quality of speech coded by various
methods:
Subjective evaluation
The listening tests are conducted by playing the sample to a number
of listeners and asking them to judge the quality of the speech
The tests provide results in terms of overall quality, listening effort,
intelligibility, and naturalness
Examples:
1. A-B discrimination test
Tests the transparency of the quantizer, for broadcast-quality
coders.
Forces the listeners to guess which of two signals was the
original, and which was quantized
Performance Evaluation
2. Diagnostic Rhyme Test (DRT)
The most popular and widely used intelligibility test.
Measures the listener's ability to identify the spoken word.
Here, a word from a pair of rhymed words such as those-dose is
presented to the listener and the listener is asked to identify which
word was spoken
Typical percentages correct on the DRT range from 75% to 90%
3. Diagnostic Acceptability Measure (DAM)
Evaluates the acceptability of speech coding systems
The results of these tests are difficult to rank and hence require a
reference system
Performance Evaluation
4. Mean Opinion Score (MOS)
The most popular ranking system
Ask listeners to rate signals on a five-point scale
Average across listeners, and across sentences
Quality Scale   Score   Listening Effort Scale
Excellent       5       No effort required
Good            4       No appreciable effort required
Fair            3       Moderate effort required
Poor            2       Considerable effort required
Bad             1       No meaning understood with reasonable effort
Performance Evaluation
Objective evaluation
Have the general nature of a signal-to-noise ratio
Provide a quantitative value of how well the reconstructed speech
approximates the original speech
It does not necessarily give an indication of speech quality as perceived
by the human ear
Examples: Mean Square Error (MSE) distortion, frequency-weighted
MSE, SNR, segmental SNR, etc.
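Two of the objective measures listed above, SNR and segmental SNR, can be sketched as follows. The function names and the rule of skipping all-zero frames are assumptions; practical implementations often also clamp per-frame values, which is not done here.

```python
import math

def snr_db(original, decoded):
    """Overall SNR in dB: signal energy over error energy."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)

def segmental_snr_db(original, decoded, frame_len):
    """Average of per-frame SNR values (dB) over non-silent frames.

    Assumes at least one complete, non-silent frame is present.
    """
    values = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        o = original[start:start + frame_len]
        d = decoded[start:start + frame_len]
        if any(o):                      # skip all-zero frames
            values.append(snr_db(o, d))
    return sum(values) / len(values)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which is why segmental SNR tends to track perceived quality more closely than a single overall SNR.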
