Vocoder

Vocoders
The Channel Vocoder (analyzer):
• The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz.
• Typically, 16-20 linear-phase FIR filters are used.
• The output of each filter is rectified and lowpass filtered.
• The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.
• In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
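The rectify-and-lowpass step of one analysis channel can be sketched as follows. This is only an illustration, not the filter design behind the slides: the moving-average lowpass filter, its 40-ms length, and the names `channel_envelope` and `avg_len` are assumptions, and the input is taken to be the output of one bandpass filter.

```python
import math

def channel_envelope(band_signal, avg_len):
    """Full-wave rectify, then smooth with a moving-average (linear-phase FIR) lowpass."""
    rectified = [abs(x) for x in band_signal]
    envelope = []
    running = 0.0
    for n, x in enumerate(rectified):
        running += x
        if n >= avg_len:
            running -= rectified[n - avg_len]
        envelope.append(running / avg_len)  # the first avg_len - 1 outputs are start-up values
    return envelope

fs = 8000                                   # sampling rate in Hz (assumed)
amp = 0.5
# Stand-in for one bandpass filter output: a 1-kHz tone, 200 ms long.
tone = [amp * math.sin(2 * math.pi * 1000 * n / fs) for n in range(1600)]
env = channel_envelope(tone, avg_len=320)   # 40-ms averaging window (assumed)
frame_values = env[319::160]                # one magnitude every 20 ms, i.e. 50 per second
```

Each entry of `frame_values` is the quantity that would be quantized and sent for this channel once per frame.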
The Channel Vocoder (analyzer block diagram):
[Figure: S(n) feeds parallel channels, each a bandpass filter, rectifier, lowpass filter and A/D converter, together with a voicing detector and a pitch detector. All outputs go to the encoder and then to the channel.]
The Channel Vocoder (synthesizer):
• 16-20 linear-phase FIR filters
• Covering 0-4 kHz
• Each having a bandwidth between 100 and 300 Hz
• 20-ms frames, i.e. the spectral magnitudes change at a 50-Hz rate
• LPF bandwidth: 20-25 Hz
• Sampling rate of the output of the filters: 50 Hz
• Bit rate:
• 1 bit for the voicing detector
• 6 bits for the pitch period
• For 16 channels, each coded with 3-4 bits and updated 50 times per second, the spectral magnitudes take 2400-3200 bps (16 × 3-4 bits × 50); the voicing and pitch bits add 7 bits per frame.
• Further reductions to 1200 bps can be achieved by exploiting frequency correlations of the spectrum magnitude.
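The bit-rate arithmetic can be checked with a few lines. The function name and argument names are illustrative only; the numbers are the ones quoted on the slide.

```python
def channel_vocoder_bit_rate(channels, bits_per_channel, frames_per_second,
                             voicing_bits=1, pitch_bits=6):
    """Return (bps for the channel magnitudes, bps for voicing and pitch)."""
    magnitude_bps = channels * bits_per_channel * frames_per_second
    side_bps = (voicing_bits + pitch_bits) * frames_per_second
    return magnitude_bps, side_bps

low = channel_vocoder_bit_rate(16, 3, 50)    # 3 bits per channel
high = channel_vocoder_bit_rate(16, 4, 50)   # 4 bits per channel
```

The quoted 2400-3200 bps range corresponds to the first element of each result; the voicing and pitch bits account for the second.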
• At the receiver the signal samples are passed through D/A converters.
• The outputs of the D/As are multiplied by the voiced or unvoiced signal sources.
• The resulting signals are passed through bandpass filters.
• The outputs of the bandpass filters are summed to form the synthesized speech signal.
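The Channel Vocoder (synthesizer block diagram):
[Figure: data from the channel pass through the decoder. In each channel a D/A converter output scales the excitation, which then passes through a bandpass filter; the bandpass outputs are summed (∑) to give the output speech. The voicing information drives a switch that selects between a pulse generator, controlled by the pitch period, and a random noise generator.]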
The Phase Vocoder:
• The phase vocoder is similar to the channel vocoder.
• However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
• By coding and transmitting the phase derivative, this vocoder destroys the phase information.
The Phase Vocoder (analyzer block diagram, kth channel)
[Figure: S(n) is multiplied by cos ω_k n and by sin ω_k n. Each product passes through a lowpass filter and a decimator, giving a_k(n) and b_k(n). These signals and their differentiated versions feed a block that computes the short-term magnitude and the short-term phase derivative, which go to the encoder and then to the channel.]
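The "compute short-term magnitude and phase derivative" block can be sketched from the two lowpass outputs a_k(n) and b_k(n). The sketch assumes the phase is defined as atan2(b, a), so its derivative is (a b' - b a') / (a² + b²); other sign conventions exist, and the central-difference differentiator and function name are illustrative choices.

```python
import math

def magnitude_and_phase_derivative(a, b):
    """Short-term magnitude and phase derivative (radians per sample) of one channel."""
    mags, dphis = [], []
    for n in range(1, len(a) - 1):
        da = (a[n + 1] - a[n - 1]) / 2.0   # central-difference differentiator
        db = (b[n + 1] - b[n - 1]) / 2.0
        power = a[n] ** 2 + b[n] ** 2
        mags.append(math.sqrt(power))
        # d/dn atan2(b, a) = (a*b' - b*a') / (a^2 + b^2)
        dphis.append((a[n] * db - b[n] * da) / power if power > 0 else 0.0)
    return mags, dphis

drift = 0.1                                 # phase advance per sample (test value)
a = [2.0 * math.cos(drift * n) for n in range(50)]
b = [2.0 * math.sin(drift * n) for n in range(50)]
mags, dphis = magnitude_and_phase_derivative(a, b)
```

For this test signal the magnitude stays at 2 and the estimated phase derivative is sin(0.1), close to the true 0.1 rad per sample; the small gap comes from the finite-difference approximation.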
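The Phase Vocoder (synthesizer block diagram, kth channel)
[Figure: data from the channel pass through the decoder, which delivers the decimated short-term amplitude and the decimated short-term phase derivative. Blocks shown: integrator, cos and sin, two interpolators, modulators cos ω_k n and sin ω_k n, and a summer (∑).]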
The Phase Vocoder:
• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of spectrum magnitude and phase derivative: 50-60 samples per second
• Spectral magnitude is coded using PCM or DPCM
• Phase derivative is coded linearly using 2-3 bits
• The resulting bit rate is 7200 bps
The Formant Vocoder:
• The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.
• It is this information plus the pitch period that is encoded and transmitted to the receiver.
The Formant Vocoder:
• Example of formants:
• (a): The spectrogram of the utterance “day one”, showing the pitch and the harmonic structure of speech.
• (b): A zoomed spectrogram of the fundamental and the second harmonic.
[Figure: spectrogram panels (a) and (b).]
The Formant Vocoder (analyzer block diagram):
[Figure: the input speech is analyzed into the formant pairs (F1, B1), (F2, B2) and (F3, B3); a pitch and V/U block delivers the V/U decision and F0.]
Fk: the frequency of the kth formant
Bk: the bandwidth of the kth formant
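The Formant Vocoder (synthesizer block diagram):
[Figure: an excitation signal generated from V/U and F0 drives three formant filters F1, F2 and F3, controlled by (F1, B1), (F2, B2) and (F3, B3); their outputs are summed (∑).]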
Linear Predictive Coding:
• The objective of LP analysis is to estimate the parameters of an all-pole model for the vocal tract.
• Several methods have been devised for generating the excitation sequence for speech synthesis.
• Various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.
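One common way to estimate the all-pole parameters is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not name the method, so this is an assumption; the function name and the convention that the predictor is ŝ[n] = Σ a_k s[n-k] are also illustrative.

```python
def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a_1..a_p.

    r is the autocorrelation sequence r[0..order]. Returns the coefficients,
    the reflection coefficients, and the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    reflection = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # i-th reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        reflection.append(k)
        err *= (1.0 - k * k)
    return a[1:], reflection, err

# Autocorrelation of a first-order process, r[k] = 0.5 ** k.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input a second-order fit finds that one coefficient is enough: the second coefficient comes out as zero.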
LPC-10:
• This method is called LPC-10 because 10 coefficients are typically employed.
• LPC-10 partitions the speech into 180-sample frames.
• Pitch and voicing decisions are determined by using the AMDF and zero-crossing measures.
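A General Discrete-Time Model For Speech Production
[Figure: for voiced speech, an impulse generator driven by the pitch period feeds the glottal filter G(z); for unvoiced speech, an uncorrelated noise generator is used. Each branch has a gain, and a voiced/unvoiced (V/U) switch selects the volume velocity u(n), which passes through the vocal tract filter H(z) and the filter R(z) to give the speech signal s(n).]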
Linear prediction
Determining the prediction order
$$PG = 10 \log_{10} \left( \frac{\sum_{n=m-M+1}^{m} s^{2}[n]}{\sum_{n=m-M+1}^{m} e^{2}[n]} \right)$$
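The prediction gain is the ratio, in dB, of the signal energy to the prediction error energy over the same window of samples. A minimal sketch, with an illustrative function name and made-up sample values:

```python
import math

def prediction_gain_db(signal, error):
    """10*log10 of signal energy over prediction error energy, same window for both."""
    signal_energy = sum(x * x for x in signal)
    error_energy = sum(x * x for x in error)
    return 10.0 * math.log10(signal_energy / error_energy)

s = [1.0, 2.0, 3.0, 4.0]
e = [0.1, 0.2, 0.3, 0.4]      # error ten times smaller in amplitude
pg = prediction_gain_db(s, e)
```

An error that is ten times smaller than the signal in amplitude gives a gain of 20 dB; comparing this figure across prediction orders is one way to choose the order.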
Example
[Figure: plots labeled M = 4 and M = 10.]
Example
[Figure: plots labeled M = 2, M = 10 and M = 54.]
The idea of long-term linear prediction
[Figure: plots labeled M = 10 and M = 50.]
Long-term linear prediction
LPC10 vocoder
General specifications
LPC10 vocoder
Encoder
[Figure: encoder block diagram; surviving labels: PCM, LPC, Bit Encoder.]
Pitch period detection
$$R[l, m] = \sum_{n=m-N+1}^{m} s[n]\, s[n-l]$$
$$MDF[l, m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|$$
$$s[n] = b\, s[n-N] + e[n], \quad m-N+1 \le n \le m$$
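A pitch period estimate based on the magnitude difference function can be sketched as follows: evaluate the sum of |s[n] - s[n-l]| over the frame for each candidate lag l and keep the lag with the smallest value. The function names, the test signal, and the candidate range 20 to 79 are assumptions made for the example.

```python
import math

def magnitude_difference(s, lag, m, frame_len):
    """Sum of |s[n] - s[n - lag]| over the frame n = m - frame_len + 1 .. m."""
    return sum(abs(s[n] - s[n - lag]) for n in range(m - frame_len + 1, m + 1))

def estimate_pitch_period(s, m, frame_len, candidates):
    """Return the candidate lag with the smallest magnitude difference."""
    return min(candidates, key=lambda lag: magnitude_difference(s, lag, m, frame_len))

period = 40   # true period in samples (5 ms at an assumed 8-kHz sampling rate)
s = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
     for n in range(400)]
best = estimate_pitch_period(s, m=399, frame_len=180, candidates=range(20, 80))
```

The candidate range stops below twice the true period here, so the multiple at lag 80, which matches equally well, is not in competition.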
MDF
• T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154
Encoder
[Figure: encoder block diagram; surviving labels: LPC, RC.]
Speech synthesis
Original signal
Encoder section:
• Determine whether the frame is voiced or unvoiced
• Determine the pitch period, only for the voiced case
• Compute the signal gain
[Figure: synthesis model. An impulse train with period equal to the pitch period, or random noise, is selected by V/U and scaled by G to give the synthesized speech.]
Limitations
AR
Residual Excited LP Vocoder:
• Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as done in the case of DPCM.
• In one method, the LPC model and excitation parameters are estimated from a frame of speech.
• The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error.
• The residual error is quantized, coded, and transmitted to the receiver.
• At the receiver the signal is synthesized by adding the residual error to the signal generated from the model.
• The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate.
• In the synthesizer, it is rectified and spectrum flattened (using an HPF); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model.
• The RELP vocoder provides communication-quality speech at about 9600 bps.
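RELP Analyzer (type 1):
[Figure: S(n) is buffered and windowed into the frame f(n; m). Short-term LP (stLP) analysis gives the LP parameters {â(i; m)}; the excitation parameters are the gain estimate Θ̂0, the V/U decision and the pitch estimate P̂. An LP synthesis model driven by these parameters produces a synthetic frame that is subtracted (∑) from f(n; m) to give the residual error e(n; m). The parameters and the residual error go to the encoder and then to the channel.]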
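RELP Analyzer (type 2):
[Figure: S(n) is buffered and windowed into the frame f(n; m), which passes through the inverse filter Â(z; m) to give the prediction residual. The residual passes through a lowpass filter, a decimator and a DFT to the encoder and then to the channel. Short-term LP (stLP) analysis supplies the LP parameters {â(i; m)}.]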
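The inverse-filter path of the type 2 analyzer, and the reason the residual is a good excitation, can be sketched as below. It assumes the convention Â(z) = 1 - Σ a_k z^-k and skips quantization, lowpass filtering and decimation; the function names and coefficient values are illustrative.

```python
def lp_residual(frame, a):
    """Inverse filter: residual[n] = frame[n] - sum_k a[k] * frame[n - 1 - k]."""
    residual = []
    for n in range(len(frame)):
        pred = sum(a[k] * frame[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        residual.append(frame[n] - pred)
    return residual

def lp_synthesize(excitation, a):
    """All-pole synthesis: out[n] = excitation[n] + sum_k a[k] * out[n - 1 - k]."""
    out = []
    for n in range(len(excitation)):
        pred = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(excitation[n] + pred)
    return out

a = [0.9, -0.2]                               # assumed, stable predictor coefficients
frame = [0.0, 1.0, 0.5, -0.3, 0.8, -1.2, 0.1, 0.4]
residual = lp_residual(frame, a)
rebuilt = lp_synthesize(residual, a)
```

With an unquantized, full-band residual the synthesis filter returns the original frame; RELP gives up part of that accuracy to save bits.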
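Synthesizer for a RELP vocoder
[Figure: data from the channel pass through the decoder to a buffer and controller. The residual is interpolated; a rectifier and highpass filter branch is summed (∑) with it to form the excitation of the LP synthesizer, whose LP model receives parameter updates from the controller.]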
Multipulse LPC Vocoder
• RELP needs to regenerate the high-frequency components at the decoder, which yields only a crude approximation of the high frequencies.
• The multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal system filter.
Multipulse LPC Vocoder
• The information concerning the excitation sequence includes:
  - the location of the pulses
  - an overall scale factor corresponding to the largest pulse amplitude
  - the pulse amplitudes relative to the overall scale factor
• The scale factor is logarithmically quantized into 6 bits.
• The amplitudes are linearly quantized into 4 bits.
• The pulse locations are encoded using a differential coding scheme.
• The excitation parameters are updated every 5 msec.
• The LPC vocal-tract parameters and the pitch period are updated every 20 msec.
• The bit rate is 9600 bps.
Analysis-by-synthesis coder
• A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter.
• The synthetic speech is compared with the original speech.
• The residual error signal is weighted perceptually by a filter
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
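A perceptual weighting filter of the form Â(z)/Â(z/c) can be applied as a direct difference equation. The sketch assumes Â(z) = 1 - Σ a_k z^-k, so that Â(z/c) has coefficients a_k c^k; the function names and the test values are illustrative.

```python
def bandwidth_expand(a, c):
    """Coefficients of A(z/c): a_k is replaced by a_k * c**k (k starts at 1)."""
    return [a_k * c ** (k + 1) for k, a_k in enumerate(a)]

def perceptual_weight(error, a, c):
    """Filter error through W(z) = A(z) / A(z/c)."""
    a_c = bandwidth_expand(a, c)
    out = []
    for n in range(len(error)):
        acc = error[n]
        for k in range(len(a)):
            if n - 1 - k >= 0:
                acc -= a[k] * error[n - 1 - k]   # zeros: numerator A(z)
                acc += a_c[k] * out[n - 1 - k]   # poles: denominator A(z/c)
        out.append(acc)
    return out

impulse = [1.0, 0.0, 0.0, 0.0]
response = perceptual_weight(impulse, a=[0.5], c=0.8)
unweighted = perceptual_weight([0.3, -0.7, 0.2], a=[0.5, 0.25], c=1.0)
```

With c = 1 the numerator and denominator cancel and the error passes through unchanged; values of c below 1 de-emphasize the error near the formant peaks.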
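Obtaining the multipulse excitation (analysis-by-synthesis method):
[Figure: the input speech s(n) is buffered and LP-analyzed, giving the frame f(n; m) and the pitch estimate P̂. A multipulse excitation generator drives the pitch synthesis filter Θp(z) and the LP synthesis filter to produce f̂(n; m), which is subtracted (∑) from f(n; m). The error passes through the perceptual weighting filter W(z), and an error minimization block controls the excitation generator.]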
Code Excited LP:
• CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.
• The bit rate of the CELP is 4800 bps.
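CELP (analysis-by-synthesis coder):
[Figure: the speech samples are buffered and LP-analyzed, giving the LP parameters sent as side information. An entry of the Gaussian excitation codebook is scaled by a gain and passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter. The result is subtracted (∑) from the speech, the difference passes through the perceptual weighting filter W(z), and its energy is computed (square and sum) to select the index of the excitation sequence.]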
• This weighted error is squared and summed over a subframe block to give the error energy.
• By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.
• The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy for the block of samples.
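The exhaustive search with a per-codeword optimal gain can be sketched as follows. To keep it short, the sketch assumes each codeword has already been passed through the synthesis and weighting filters, so the search compares it directly with a weighted target; the function name and the seeded random codebook are illustrative.

```python
import random

def search_codebook(target, codebook):
    """Return (index, gain, error energy) of the best gain-scaled codeword."""
    target_energy = sum(t * t for t in target)
    best = (None, 0.0, float("inf"))
    for index, word in enumerate(codebook):
        corr = sum(t * w for t, w in zip(target, word))
        energy = sum(w * w for w in word)
        gain = corr / energy                            # gain minimizing |target - gain*word|^2
        error = target_energy - corr * corr / energy    # error energy at that gain
        if error < best[2]:
            best = (index, gain, error)
    return best

rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(1024)]
target = [2.0 * x for x in codebook[3]]     # target built from a known codeword
index, gain, error = search_codebook(target, codebook)
```

Because the target here is an exact multiple of one codeword, the search returns that codeword with its scale factor and an error energy of essentially zero.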
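CELP (synthesizer):
[Figure: data from the channel pass through the decoder to a buffer and controller, which selects the entry of the Gaussian excitation codebook and supplies LP parameter, gain and pitch estimate updates. The excitation passes through the pitch synthesis filter and then the LP synthesis filter.]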
CELP synthesizer
• Cascade of two all-pole filters with coefficients that are updated periodically
• The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech
• This filter has the form
$$\Theta_p(z) = \frac{1}{1 - b z^{-p}}$$
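A long-delay pitch filter of this kind reduces to the recursion y[n] = x[n] + b·y[n - p]. The sketch below assumes a unit numerator and a single tap; the function name and the values of b and the lag are illustrative.

```python
def pitch_synthesis(excitation, b, lag):
    """All-pole pitch filter: out[n] = excitation[n] + b * out[n - lag]."""
    out = []
    for n, x in enumerate(excitation):
        feedback = b * out[n - lag] if n >= lag else 0.0
        out.append(x + feedback)
    return out

impulse = [1.0] + [0.0] * 119
y = pitch_synthesis(impulse, b=0.5, lag=40)
```

A single input pulse comes back every 40 samples with amplitudes 1, 0.5, 0.25, which is the pitch periodicity the filter is meant to restore in voiced speech.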
CELP
• Parameters of the filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 msec.
• The second filter is a short-delay all-pole (vocal-tract) filter and has 10-12 coefficients that are determined every 10-20 msec.
Example:
• The sampling frequency is 8 kHz.
• Pitch estimation and excitation sequence selection are performed on subframe blocks of 5 msec.
• We have 40 samples per 5-msec subframe.
• The excitation sequence consists of 40 samples.
Example:
• A codebook of 1024 sequences gives good-quality speech.
• For such a codebook size, we require 10 bits to send the codebook index.
• Hence the excitation costs 10 bits per 40 samples, i.e. 1/4 bit per sample.
• The transmission of the pitch predictor parameters and the spectral predictor parameters brings the bit rate to about 4800 bps.
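The numbers in this example follow from the sampling rate, the subframe length and the codebook size. A short check, with an illustrative function name:

```python
def celp_excitation_cost(fs_hz=8000, subframe_ms=5, codebook_size=1024):
    """Return (samples per subframe, index bits, bits per sample, index bits per second)."""
    samples = fs_hz * subframe_ms // 1000
    index_bits = (codebook_size - 1).bit_length()   # bits needed to address the codebook
    bits_per_sample = index_bits / samples
    index_bps = index_bits * 1000 // subframe_ms
    return samples, index_bits, bits_per_sample, index_bps

samples, index_bits, bits_per_sample, index_bps = celp_excitation_cost()
```

The codebook index alone accounts for 2000 bps; the rest of the roughly 4800 bps goes to the gain and to the pitch and spectral predictor parameters.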
Low-delay CELP coder
• CELP has been used to achieve toll-quality speech at 16000 bps with low delay.
• Other types of vocoders also produce high-quality speech at 16000 bps, but these vocoders buffer 10-20 msec of speech samples.
• Their one-way delay is of the order of 20-40 msec.
• With modification of CELP, it is possible to reduce the one-way delay to about 2 msec.
• Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.
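[Figure: low-delay CELP coder. The input speech s(n) is buffered and windowed into f(n; m). An entry of the excitation vector quantizer codebook is scaled by a gain and passed through the high-order LP synthesis filter to give f̂(n; m), which is subtracted (∑) from f(n; m). The error passes through the perceptual weighting filter W(z) to the error minimization block. Gain adaptation and predictor adaptation blocks update the gain and the synthesis filter.]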
• The pitch predictor used in the conventional forward-adaptive coder is eliminated.
• In order to compensate for the loss in pitch information, the LPC predictor order is increased significantly, to an order of 50.
• The LPC coefficients are updated more frequently, every 2.5 msec.
• A 5-sample excitation vector corresponds to an excitation block duration of 0.625 msec at an 8-kHz sampling rate.
• The logarithm of the excitation gain is adapted every subframe excitation block by employing a 10th-order adaptive linear predictor in the logarithmic scale.
• The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation signal blocks.
• The perceptual weighting filter is also 10th order and is updated once every four blocks by employing an LPC analysis on frames of the input speech signal of duration 2.5 msec.
• The excitation codebook in the low-delay CELP is also modified compared to conventional CELP.
• A 10-bit excitation codebook is employed.
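The timing and bit-rate figures of the low-delay coder fit together as the short check below shows. It assumes that the 10-bit codebook index is the only information sent per 5-sample vector, which is what backward adaptation makes possible; the function name is illustrative.

```python
def ld_celp_figures(fs_hz=8000, vector_len=5, index_bits=10, blocks_per_update=4):
    """Return (block duration in ms, update interval in ms, bit rate in bps)."""
    block_ms = 1000.0 * vector_len / fs_hz
    update_ms = block_ms * blocks_per_update
    bit_rate = index_bits * fs_hz // vector_len
    return block_ms, update_ms, bit_rate

block_ms, update_ms, bit_rate = ld_celp_figures()
```

Four 0.625-msec blocks give the 2.5-msec update interval, and 10 bits per 5 samples at 8 kHz give 16000 bps.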
Vector Sum Excited LP:
• The VSELP coder differs from the CELP coder basically in the method by which the excitation sequence is formed.
• In the next block diagram of the VSELP, there are three excitation sources.
• One excitation is obtained from the pitch period state.
• The other two excitation sources are obtained from two codebooks.
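VSELP Decoder:
[Figure: three excitations, from the long-term filter state, codebook 1 and codebook 2, each with its own gain, are summed (∑) and passed through the pitch synthesis filter, the spectral envelope (LP) synthesis filter and the spectral post filter to give the synthetic speech.]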
VSELP Decoder
• The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 ms.
• The coefficients are updated in each 5-ms frame by interpolation.
• The excitation parameters are also updated every 5 ms.
VSELP Decoder
• There are 128 codewords in each of the two codebooks.
• The codewords are constructed from two sets of seven basis codewords by forming linear combinations of the seven basis codewords.
• The long-term filter state is also a codebook with 128 codeword sequences.
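Seven basis codewords give 128 codewords when each basis vector enters the sum with a coefficient of +1 or -1, one bit of the index per basis vector. The slide only says "linear combinations"; the ±1 coefficients are an assumption taken from the usual description of VSELP, and the function name and random basis are illustrative.

```python
import random

def vector_sum_codebook(basis):
    """Build 2**M codewords as +/-1 combinations of the M basis vectors."""
    m = len(basis)
    length = len(basis[0])
    codebook = []
    for index in range(2 ** m):
        # Bit i of the index selects the sign of basis vector i.
        signs = [1.0 if (index >> i) & 1 else -1.0 for i in range(m)]
        codebook.append([sum(s * v[n] for s, v in zip(signs, basis))
                         for n in range(length)])
    return codebook

rng = random.Random(1)
basis = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(7)]
codebook = vector_sum_codebook(basis)
```

Only the seven basis vectors need to be stored, and flipping every bit of an index yields the negated codeword, which halves the search effort.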
VSELP Decoder
• In each 5-msec frame, the codewords from this codebook are filtered through the speech system filter Θ̂(z) and correlated with the input speech sequence.
• The filtered codeword is used to update the history, and the lag is transmitted to the decoder.
VSELP Decoder
• Thus the update occurs by appending the best-filtered codeword to the history codebook.
• The oldest sample in the history array is discarded.
• The result is that the long-term state becomes an adaptive codebook.
VSELP Decoder
• The three excitation sequences are selected sequentially from each of the three codebooks.
• Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error.
• Once the codewords have been selected, the three gain parameters are optimized.
VSELP Decoder
• Joint gain optimization is sequentially accomplished by orthogonalizing each weighted codeword vector prior to the codebook search.
• These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-ms frame.
Vector Sum Excited LP:
• The bit rate of the VSELP is about 8000 bps.
• Bit allocations for 8000-bps VSELP:

Parameters                                       Bits/5-ms frame   Bits/20 ms
10 LPC coefficients                              -                 38
Average speech energy                            -                 5
Excitation codewords from two VSELP codebooks    14                56
Gain parameters                                  8                 32
Lag of pitch filter                              7                 28
Total                                            29                159
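The totals in the table and the resulting bit rate can be verified directly. The dictionary keys are shortened labels for the table rows; the layout is illustrative.

```python
SUBFRAMES_PER_FRAME = 4   # four 5-ms subframes per 20-ms frame

per_frame_only = {"LPC coefficients": 38, "average speech energy": 5}
per_subframe = {"excitation codewords": 14, "gain parameters": 8, "pitch lag": 7}

bits_per_subframe = sum(per_subframe.values())
bits_per_frame = sum(per_frame_only.values()) + SUBFRAMES_PER_FRAME * bits_per_subframe
bit_rate = bits_per_frame * 1000 // 20    # bits per second for 20-ms frames
```

The 159 bits per 20-ms frame give 7950 bps, which is the figure rounded to "about 8000 bps".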
VSELP Decoder
• Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter; this post filter is a pole-zero filter of the form
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
DEMO
Audio samples (male speaker, female speaker, music) for each speech codec:
• Original speech/music (16-bit, sampled at 8 kHz)
• FS-1015 (LPC-10e, 2.4 kb/s)
• FS-1016 (CELP, 4.8 kb/s)
• IS-54 (VSELP, 7.95 kb/s)
• G.721 (32 kb/s ADPCM)
• Standard Voice Algorithms
• G.711
  - The most widely used digital representation of voice signals is that of G.711, or PCM (Pulse Code Modulation).
  - This codec represents a voice signal band-limited to 4 kHz, sampled at 8 kHz, using 8 bits per sample with A-law or μ-law coding.
• G.726
  - The G.726 codec encodes a 64 kbps A-law or μ-law PCM signal at one of four bit rate options, ranging from 2 bits per sample to 5 bits per sample.
  - The algorithm is based on Adaptive Differential Pulse Code Modulation (ADPCM) with a one-sample backward prediction scheme.
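The idea behind μ-law coding in G.711 can be illustrated with the continuous μ-law companding curve. This is a sketch of the principle only: G.711 itself specifies a segmented approximation with fixed 8-bit codes, and the function names here are illustrative.

```python
import math

MU = 255.0   # value of mu used for 8-bit mu-law

def mu_law_compress(x):
    """Map x in [-1, 1] to [-1, 1], with finer resolution for small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

pcm_bit_rate = 8000 * 8                  # 8-kHz sampling, 8 bits per sample
small = mu_law_compress(0.01)            # a quiet sample uses a large share of the code range
restored = mu_law_expand(mu_law_compress(-0.3))
```

A sample at 1% of full scale maps to roughly 23% of the compressed range, which is why 8 bits per sample are enough for telephone-quality speech at 64 kbps.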
• G.728
  - The G.728 algorithm compresses PCM-coded voice signals to a bit rate of 16 kbps.
  - This algorithm is based on a strong backward prediction scheme and is considered by far one of the most complex voice algorithms produced by the ITU standards organization.
• G.729
  - For compression of voice signals at 8 kbps, the G.729 algorithm offers toll quality with built-in algorithmic delays of less than 15 msec.
  - Additional features described in the G.729 Annex provide VAD (voice activity detection) and Comfort Noise Generation functionalities to enhance the quality and reduce the overall bit rate.
• G.723.1
  - The most widely used algorithm for band-limited channels, such as VoIP and video conferencing, is G.723.1.
  - The algorithm has two operating bit rates, 6.3 kbps and 5.3 kbps.
  - Although the delay is not as low as that of the other ITU standards, its quality is near toll quality for the given low bit rates, making it very efficient in bit usage.
• GSM-AMR
  - The latest GSM (Global System for Mobile communications) standard is the multi-rate Algebraic Code Excited Linear Prediction codec, which provides compression in the range of 4.75 to 12.2 kbps.
  - In total the codec provides 8 bit rates that cover the half-rate to full-rate channel capacity.
• GSM-FR
  - The first digital codec used in a mobile environment is the GSM Full Rate vocoder.
  - The codec compresses 13-bit PCM samples to a rate of 13 kbps.
  - The algorithm is based on a very simple Regular Pulse Excited Linear Prediction Coding technique.
• GSM-HR
  - To increase capacity, the GSM committee decided on a lower bit rate of 5.6 kbps for the voice channel.
  - The algorithm is based on Vector Sum Excited Linear Prediction (VSELP) and is computationally as complex as other low bit rate algorithms.
