Gain Normalization in BPS Homomorphic Vocoder: Jae Chung and Ronald Schafer
HOMOMORPHIC VOCODER
Jae H. Chung and Ronald W. Schafer
Georgia Institute of Technology
School of Electrical Engineering
Atlanta, GA 30332
In the synthesizer, depicted in Figure 2, an excitation sequence consisting of isolated impulses or random noise was created, and
this input was convolved with the estimated vocal tract impulse response to produce the synthetic speech output. The
pitch period, amplitude of the excitation, and the low-time
cepstrum values comprise a parametric representation of the
speech signal that can be encoded for digital transmission
or storage.
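The synthesizer step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `make_excitation` and `synthesize`, the toy impulse response, and the choice to truncate the convolution to the excitation length are all assumptions made for the example.

```python
import numpy as np

def make_excitation(num_samples, amplitude, pitch_period=None, rng=None):
    """Return isolated impulses (voiced) or random noise (unvoiced).

    pitch_period=None selects the noise excitation (an assumed convention).
    """
    if pitch_period is None:
        rng = np.random.default_rng(0) if rng is None else rng
        return amplitude * rng.standard_normal(num_samples)
    excitation = np.zeros(num_samples)
    excitation[::pitch_period] = amplitude  # one impulse per pitch period
    return excitation

def synthesize(excitation, impulse_response):
    """Convolve the excitation with the estimated vocal tract impulse response."""
    # Truncate to the excitation length so one frame in gives one frame out.
    return np.convolve(excitation, impulse_response)[:len(excitation)]

# Toy impulse response, chosen only to make the result easy to check by hand.
h = np.array([1.0, 0.5, 0.25])
voiced = synthesize(make_excitation(8, 2.0, pitch_period=4), h)
```

Each impulse in the excitation simply launches a scaled copy of the impulse response, which is why the pitch period, the excitation amplitude, and the cepstrum values are enough to describe the output.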
The availability of increasingly powerful, inexpensive
DSP microcomputers has made it possible to consider much
more sophisticated methods for obtaining the excitation signal in vocoders. Multipulse [3], code-excited [4], and self-excited or vector excitation [5] LPC vocoders have been widely
Abstract
This paper describes a new technique for coding the gains in a vector excitation homomorphic
vocoder. In this system, the excitation signal, which
is obtained by analysis-by-synthesis, consists of a part
derived from a Gaussian codebook and a part derived from the past excitation. The paper shows how
the correlation between the two gain parameters of
the excitation can be increased and how they can be
jointly coded at a lower bit-rate. This new approach
makes it possible to reduce the bit rate of the homomorphic vocoder from 4800 bps to 4200 bps with
essentially no degradation in speech quality.
1 Introduction
In the original definition of the homomorphic vocoder, an
estimate of the time-varying vocal tract impulse response
was extracted using the homomorphic filtering procedure
depicted in Figure 1.[1] The upper part of the figure depicts
the operations required to compute the cepstrum $\hat{h}[n]$ of
the vocal tract impulse response.[1][2] In Figure 1, v[n] is a
window sequence (e.g., Hamming window) which selects a
short segment of the speech signal for analysis, and l[n] is a
lifter of the form
$$l[n] = \begin{cases} 2, & 1 \le n < n_0 \\ 0, & \text{otherwise,} \end{cases} \tag{1}$$
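As a rough sketch of how a lifter of this form is applied, the following computes a real cepstrum of a Hamming-windowed segment and keeps only its low-time part. The function names, the FFT length `nfft`, the small constant added before the logarithm, and the `c0_weight` parameter (the weight given to the n = 0 term, which the equation as printed sets to zero) are assumptions for illustration only.

```python
import numpy as np

def lifter(n0, length, c0_weight=0.0):
    """l[n] = 2 for 1 <= n < n0 and 0 otherwise.

    c0_weight is a hypothetical knob for the n = 0 term; the default
    follows the equation as printed, which leaves it at zero.
    """
    weights = np.zeros(length)
    weights[1:n0] = 2.0
    weights[0] = c0_weight
    return weights

def low_time_cepstrum(segment, n0, nfft=512):
    """Window a speech segment, take its real cepstrum, and apply the lifter."""
    windowed = segment * np.hamming(len(segment))      # window sequence v[n]
    log_magnitude = np.log(np.abs(np.fft.fft(windowed, nfft)) + 1e-12)
    cepstrum = np.fft.ifft(log_magnitude).real         # real cepstrum
    return cepstrum * lifter(n0, nfft)

# Example on a synthetic 160-sample segment (20 ms at 8 kHz).
segment = np.random.default_rng(0).standard_normal(160)
c = low_time_cepstrum(segment, n0=30)
```

Everything at or beyond index `n0` is zeroed, so only the low-time cepstrum values remain to be encoded.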
CH2829-0/90/0000-0942 $1.00 © 1990 IEEE
Figure 3 shows a block diagram representation of the analysis-by-synthesis algorithm for determining the excitation signal
e[n] for the homomorphic vocoder. The excitation model for
a short excitation analysis frame (e.g., 5 msec or 40 samples
at the 8 kHz sampling rate) is of the form
$$y[n] = \beta_1 x_1[n] + \beta_2 x_2[n] \tag{3}$$
where $x_1[n] = g[n] * f_{\gamma_1}[n]$ and $x_2[n] = g[n] * e[n - \gamma_2]$, and
$g[n] = w[n] * h[n]$ is the perceptually weighted vocal tract
impulse response. The excitation signal is composed of the
following two parts: $\beta_1 f_{\gamma_1}[n]$, where $f_{\gamma_1}[n]$ is a zero-mean
Gaussian codebook sequence corresponding to index $\gamma_1$ in
the codebook, and $\beta_2 e[n - \gamma_2]$, which represents a short segment of the past (previously computed) excitation beginning
$\gamma_2$ samples before the present excitation frame. Henceforth, $\beta_1$ is called the codebook gain and $\beta_2$ the
self-excitation gain.
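To make the two-part excitation concrete, here is a minimal sketch that assembles one excitation frame from a codebook entry and a delayed segment of past excitation. The function name, the array layout (one codebook sequence per row, past excitation stored oldest sample first), and the restriction that the delay is at least one frame long are assumptions of this example, not details taken from the paper.

```python
import numpy as np

def excitation_frame(codebook, past_excitation, beta1, gamma1, beta2, gamma2,
                     frame_len=40):
    """Build beta1 * f_gamma1[n] + beta2 * e[n - gamma2] for one frame.

    Assumes gamma2 >= frame_len so the delayed segment lies entirely in
    the previously computed excitation (a simplification for this sketch).
    """
    if gamma2 < frame_len or gamma2 > len(past_excitation):
        raise ValueError("gamma2 must satisfy frame_len <= gamma2 <= len(past)")
    start = len(past_excitation) - gamma2        # sample gamma2 steps back
    self_part = past_excitation[start:start + frame_len]
    codebook_part = codebook[gamma1, :frame_len]
    return beta1 * codebook_part + beta2 * self_part

# Toy data chosen so the result can be checked by hand.
codebook = np.ones((2, 40))
past = np.arange(80.0)
frame = excitation_frame(codebook, past, beta1=0.5, gamma1=1,
                         beta2=2.0, gamma2=40)
```

After each frame is computed it would be appended to the stored past excitation, so later frames can reuse it through the delay.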
First, the parameters $\gamma_2$ and [...] the mean-squared error [...]
[Equations (4) and (5) are missing from the source text.]
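One common way to carry out a search of this kind is to try each candidate delay, compute the least-squares gain for it in closed form, and keep the candidate with the smallest squared error. The sketch below shows that standard procedure only; it is not a reconstruction of the paper's own equations, and the names `best_lag_and_gain` and `target` (the weighted signal to be matched), the zero-state truncated convolution, and the restriction to delays of at least one frame are assumptions.

```python
import numpy as np

def best_lag_and_gain(target, g, past_excitation, lags, frame_len=40):
    """Exhaustive search over delays with a closed-form least-squares gain.

    Returns (lag, gain, squared_error) for the best candidate, or None if
    every candidate has zero energy.
    """
    best = None
    for lag in lags:
        start = len(past_excitation) - lag
        segment = past_excitation[start:start + frame_len]
        x2 = np.convolve(g, segment)[:frame_len]   # filtered candidate
        energy = np.dot(x2, x2)
        if energy == 0.0:
            continue
        gain = np.dot(target, x2) / energy         # minimizes the squared error
        error = np.sum((target - gain * x2) ** 2)
        if best is None or error < best[2]:
            best = (lag, gain, error)
    return best

# Synthetic check: the target is built from a known delay and gain.
rng = np.random.default_rng(1)
g = np.array([1.0, 0.5])
past = rng.standard_normal(120)
target = 0.7 * np.convolve(g, past[60:100])[:40]
result = best_lag_and_gain(target, g, past, lags=range(40, 121))
```

The same pattern applies to a codebook search: replace the delayed segment with each codebook sequence and subtract the contribution already found before computing the error.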
Gain Normalization
Bit Rate     Samples/Frame            Bits/Frame
(bits/sec)   excitation   cepstrum   cepstrum   $\beta_1$   $\beta_2$   $\gamma_1$   $\gamma_2$
4800         40           160        8          4           4           7            7
4200         40           160        8          5 (joint)               7            7
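The two bit rates follow from the bit allocation by simple arithmetic, sketched below. The figures used are assumptions read from the table: 8 bits for the cepstrum per 160-sample frame, 7 bits for each of the two indices per 40-sample excitation frame, and either 4 + 4 bits for separately coded gains or 5 bits for jointly coded gains. The function name `bit_rate` is hypothetical.

```python
def bit_rate(cepstrum_bits, gain_bits, index_bits,
             excitation_frame=40, cepstrum_frame=160, sample_rate=8000):
    """Bits per second for one cepstrum frame plus its excitation frames."""
    frames = cepstrum_frame // excitation_frame          # 4 excitation frames
    bits = cepstrum_bits + frames * (gain_bits + index_bits)
    return bits * sample_rate // cepstrum_frame

separate_gains = bit_rate(cepstrum_bits=8, gain_bits=4 + 4, index_bits=7 + 7)
joint_gains = bit_rate(cepstrum_bits=8, gain_bits=5, index_bits=7 + 7)
print(separate_gains, joint_gains)  # 4800 4200
```

Under these assumptions, saving 3 gain bits in each of the four excitation frames removes 12 bits per 20 msec, which is the 600 bps difference between the two rates.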
which serves as an approximate relationship between the
codebook gain
and the normalized self-excited gain $a|\beta_2|$.
Conclusions
References
[1] A. V. Oppenheim, Speech analysis-synthesis system
based on homomorphic filtering, J. Acoust. Soc. Am.,
vol. 45, pp. 458-465, Feb. 1969.
[3] B. Atal and J. Remde, A new model of LPC excitation for producing natural-sounding speech at low bit
rates, Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 614-617, 1982.
[4] M. R. Schroeder and B. Atal, Code-excited linear prediction (CELP): high-quality speech at very low bit
rates, Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, pp. 937-940, 1985.
[Figure: block diagram. Recoverable block labels: Discrete Convolution, Error Minimization, Weighted Error.]
[Figure: plots against excitation frame index (0 to 400). Recoverable panel title: Self-Excitation Gain.]