Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VQL. ASSP-28, NO.

1, FEBRUARY 1980 99

value. Thus, we might expect that the true value is 0.81 and window by the synthesis window prior to overlap-adding. This
(0.7/0.865). In bath cases we expect that the true MSC is simplified interpretation results in a more efficient structure for short-
in about the 0.7 to 0.8 range, but our estimates varied from time synthesis when a synthesis window is desired. In addition, we
0.4 to 0.7 because of misalignment when estimating the MSC, show how this structure can be used for andysis/synthesis applications
which require different analysis and synthesis rates, such as time com-
IV. CONCLUSION pression or expansion.
In conclusion, we see that even with a large number of FFT
segments, estimates of the magnitude-squared coherence can I. INTRODUCTIUN
be significantly biased downward, giving an erroneous indica-
tion of the value of the coherence. When the data are realigned The concepts ‘ofshort-time Fourier analysis and synthesis
and processed, estimates of the coherence are informative have been widely used for analyzing and modeling quasi-
descriptors of the extent to which the mean channel can be stationary (slowly time-varying) signals, such as speech. These
modeled by a h e a r timeinvariant filter. concepts have evolved from two fundamenta1 points of view.
One point of view is basically that of a filter-bank model,
REFERENCES The input signal is filtered by a bank of bandpass filters which
span the frequency range of interest. The outputs of the band-
G. M, Jenkins and D.G , Watts, Spectral Analysis and Its Applb- pass filters can then be used to de€ine a short-time Fourier
cations. San Francisco: Holden-Day, 1968.
L. €4.Koopmans, m e Spectral Analysis of Time Series. New spectrum. This model is the basis fox the phase vocoder de-
York: Academic, 1974. veloped by Flanagan and Golden [ l ] In the synthesis pro- s

G. C. Carter and C, H. Knapp, ‘‘Coherence and its estimation cedure a signal can be reconstructed from its short-time
via the partitioned modified chirp-z transform,” 1 . E . Zhrzs. Fourier spectra by summing theoutputs of the bandpass
Acotcst,, Speech, Sigzal Pmcessirag, vol. ASSP-23, pp. 257-264, filters. This method of analysis and synthesis is referred to as
June 1975, the filter-bank summation method. In practice, since the out-
G.C.Carter, C. H. Knapp, and A. H.Nuttall, “Estimation of the put signals from the filter bank are narrow-band signals, they
magnitude-squaredcoherence function via overlapped fast Fourier can be sampled at a lower sampling rate than the initial input
transform processing,” IEEE Trans. Audio Electrowoust., VOL signal and interpolated back to a high sampling rate for
AU-21, pp. 337-344, Aug. 1973, synthesis.
A. H. Nuttall and G . C. Carter, “Bias of the estimate ofmagnitude-
squared coherence,”IEEE Truns, Acoust., Speech, signd Process- The second point of view is basically that of a block-by-
ing, VO~.ASS€’-24, pp. 582-583, Dw. 1976. block analysis in time. The input signal is time windowed into
5 . S, Bendat and A. G . Piersol, Random Data: Analysis rand Mea- overlapping finite duration time segments [ $1. Each segment
surement hocedures. New Y a k : Wiley, 19’71. (New engineer- is then Fourier transformed t o give a short-time Fourier spec-
ing text on coherence t o appear in 1980.) trum, Schafer and Rabiner showed haw this analysis tech-
W. G, Halvorsen and J. S . Bendat, “Noise source identification nique can be efficiently implemented using the FFT algorithm
using coherent output power spectra,” J. Sound and Vibrutiort, [ 21. Allen [ 31 carefully discussed a method of synthesizing a
pp. 15-24, A u ~ 1975.
.
D, R. Brillinger, Time Series Data Analysis and l7zeory. New signal from its short-time Fourier spectra by inverse transfarm-
York: Holt, Rinehart, and Winston, 1975. ing each sample of the short-time spectra to recover the short-
H, Akaike and Y. Yamanouchi, “On the statistical estimation time segments of the signal in time. These overlapped signal
of frequency response function,” Ann. Inst. Statist. Math., vol. segments are then appropriately summed (overlapped and
14, pp. 23-56,1963. added) to reproduce the time signal. This method is referred
G, C. Carter, A. H. Nuttall, and P. G . Cable, “The smoothed to as the overlap-add synthesis method.
coherence transform,” Proc, lEEE, vol. 61, pp. 1497-1498, More recently, Allen and Rabiner compared the effects of
1973. modifications of the shortdime spectrum in the filter-bank
C. H. Knapp and E. C, Carter, “The generalized correlation
method for estimation of time delay,” IEEE Duns. Acoust., summation and overlap-add methods. Portnoff [ 5 1 , [ 91 has
Speech, SE‘gnal Processing, vol. AMP-24, Aug. 1976. fully developed the use of a synthesis window and has de-
J. P. Kuhn, “Detection performance of the smoothed coherence veloped general expressions €or short-time analysislsynthesis
transform (SCUT),” present4 at 1978 Xnt. Conf. Acuust., using distinct analysis and synthesis windows [ 6 ] ,[ 91. He has
Speech, Signal Processing. proposed an.implementation of shor-Lme synthesis based on
the FFT algorithm in which inverse transformedshort-time
Fourier spectra are appropriately interpolated by the synthesis
filler to obtain the desired reconstructed output. In this paper
.how that Portnoff’s windowed synthesis procedure can be
greatly simplified and that it can be implemented in the form
of a weighted overlap-add procedure of short-time synthesis.
In this technique the short-time spectra are inverse trans-
A Weighted Overlap-Add Method of Short-Time Fourier formed to produce the short-time signal segments. These
short-time signal segments are weighted by the synthesis win-
Analysis/S ynthesis dow. They are then overlapped and added in a manner similar
to the overlap-add synthesis procedure. The synthesis window
R. E. CROCHIERE
has the same effect in the synthesis procedure of the filter-
bank summation method.
Abstract-In this correspondence we present a new structure and a
Thus, two principle methods of implementation (or interpre-
simplified interpretation of short-time Fourier synthesis using synthesis
tation) of short-time Fourier analysislsynthesis can be readily
defined (various modifications of these two basic methods, of
windows. We show that this approach can be interpreted as a modifica-
cou.rse, can be generated). They are the filter-bank summation
tion of the overlapadd method where we inverse the Fourier transform
method with decimation (reduced sampling rate) and interpofa-
tion in each channel, and ’the weighted overlap-add method
Manuscript received May 25, 1979; revised September 5,1979. in which the input of the discrete Fourier transform is win-
The author is with Bell Laboratories, Murray Hill, NJ 07974, dowed by the analysis window and the output of the inverse
OO96-35 I8/8O/U2OO-OU99$QU.75 O 1980 IEEE

Authorized licensed use limited to: Universidad Nacional Autonoma De Mexico (UNAM). Downloaded on November 04,2020 at 04:25:59 UTC from IEEE Xplore. Restrictions apply.
transform is windowed by the synthesis window in the manner
to be described in this paper, Both methods are completely
general in terms of irnplernentating arbitrary analysis and syn-
X,(sR) =
M-
x-
m=o
1
xm(sR) W M
-mk

thesis windows (filters) and they are mathematically identical


where
and interchangeable (assuming the analysis and synthesis win-
dows are appropriately defined). The mathematical basis is
similar to that sf Portnoff [ 6 ] and, depending on the specific
synthesis windows defined, the methods of filterhank sum and and where ( ( n ) ) denotes
~ n reduced modulo M . Equations
overlapadd analyzed by Allen [ 33 and Allen and Rabiner [ 4 ] ( I)-( 7 ) define the basic mathematical framework for shart-
result. time analysis.

11. REVIEWO F THE MATHEMATICAL F R A M E W O R K FOR


SHORT-TIME ANALYSIS/SYNTHESXS
As stated above, the mathematical framework considered
here is similar to that derived by Psrtnoff-only the interpreta-
tion of the DFT synthesis procedure is changed. Therefore,
the notation that we will use is similar to Portnoff's, with
small changes for convenience, and we refer t o [ 6 ] for details
of the derivations.
A* Analysis where f(n) defines the synthesis window. The conditions on
The discrete short-time Fourier transform of a signal x ( m ) , the windows (filters) f(n) and h ( n ) , in order to achieve an ex-
sampled at equispaced frequencies every R samples in time act resynthesis, are derived in [ 61 , By defining Gn (SI?') as
m is commonly defined in the form the inverse DFT of & ( s ~ '1, is.,
w 1 M-1

where (8) becomes

M is the number of frequency samples, k is the discrete 'fre-


quency index, h (m)is the analysis window, and s denotes the
time index of the short-time transform at the decimated sam-
pling rate (decimated by the integer factor R ) .
By a change of variables r = m - d?, (1) can be modified to
the form
and it corresponds to the short-time transform referenced t~
00 the linearly Increasing (sliding) time frame y1 = sR'. The in-
verse transform of &(SI? ') can be defined as $,(SI?'), and
from (1 I ) and the &tion between linear phasg shift in the
D I T frequency domain [used to derive (7)j it can be shown
that

where X & R ) is the short-time transform referenced to a


fixed time origin rn = s = 0 and X k ( s R ) is the short-time trans-
form referenced t o a linearly &&sing (sliding) time reference
m = sR,which corresponds to the origin of the sliding analysis
window. As seen by (3), these two short-time transform are
related by a linear phase component and their magnitudes are Applying (I1 3) to ( 1 0)gives
identical. The sharl-time transform Xk (sR> can be expressed
ry

in the DFT farm


M-I
Thus, it is seen that at the time ,reference y1 replaced by n +
m=Q soR'the so term in (14) contributes a component f(n) ($OR ')
to the time shifted signal x^(n + soR'). It should be noted that
where x,Y (sR)is the time-aliased signal each term in the sum of ( 14) is itself a sequence in n, with
each term concentrated in a different region of the rz axis.
This form will be useful in the next section t o define the
weighted overlap-add synthesis procedure.
I ----
III. WEIGHTED OVERLAY-ADD
IMPLEMENTATION OF
The interpretation of this time-aliased signal will become more
SHORT-TIME ANALYSIS/SYNTHESXS
clear in the next section.
Since a linear phase shift in the DFT frequency domain cor- With the above mathematical framework, we can now illus-
responds to a circular or modulo rotation in the time domain, trate the weighted overlapadd implementation. Tu perform
X&?) can also be defined as the DFT of the circularly TO- the analysis we first select the appropriate section of the input
tated version of zm(sR ), i.e., signal and window it by the analysis window. We will assume

Authorized licensed use limited to: Universidad Nacional Autonoma De Mexico (UNAM). Downloaded on November 04,2020 at 04:25:59 UTC from IEEE Xplore. Restrictions apply.
101

I M SAMPLES OF x(m) I FA SAMPLES

. INPUT BUFFER * X(m)


INPUT SAMPLES IN
J1 BLOCKS OF R SAMPLES

CIRCULAR
JL
ROTATION SHORT-TIME TRANSFORM
/( SLIDING T1ME REF. 1

4- OUTPUT SAMPLES
IN .BfBCKS OF
R ' SAMPLES
M SAMPLES I
ZERO
SAMPLES

Fig. 2. A weighted overlapadd procedure for short-time Fourier


analysis/syntkesis using linear phase modification in the frequency
domain.
increasing time referenced short-time transform &(sR '). It
is then multiplied by (- to account for the circillar rotation
in the time domain by M / 2 samples (Le,, the inverse operation
t o that shown in Fig. 1). The resulting phase-shifted signal is
then inverse transformed and windowed by the synthesis win-
dow. These inverse-transformed windowed short-time signals
Cb) are then summed into the output buffer in an overlap-add
Fig. 1. fa) Generation of the short-time signal zm(sR) by windowing manner according to (14). At the synthesis (block) time so,
and circulary shifting by M / 2 samples. (b) An equivalent method the so term in (14) is summed into the output buffer. The
based on a phase modification of the DFT. resulting structure is seen in the lower part of Fig. 2.
The overall structure of Fig. 2 is seen t o be a modification
that the duration of h (m)is less than M where M is the trans- of the overlap-add structure. Input samples x ( n ) are shifted
from size (assumed t o be even). We select a block of M sam- into the input buffer in blocks of R samples and samples of
ples ( x ( m + SR - M / 2 ) , pn = 8, 1, 2, M - 1) and window
e 9
the output signal x^@) are shifted out of the output buffer in
the block by h (- m), as seen in Fig. I(&, such that h (0)aligns blocks of R' samples, with zero valued samples filling in the
with the m = M / 2 sample In the block. Assuming that h ( m ) is rightmost R' samples of the output buffer. The entire struc-
a symmetric zero-phase (odd number of taps) FIR window and ture is implemented in a block-by-block manner with the
that we wish to align the center sample of the window to the operations between the two buffers being performed uncc per
first sample of the transform, the windowed signal can be cir- block.
cularly rotated by M / 2 samples as'seen in Fig. l(a). The re- Note that for generality we have allowed R and R' t o be dif-
sulting signal corresponds to the short-time signal gm(SR) ferent. This can be useful in applications such as time expan-
according to ( 9 , and its DFT corresponds to X&R). Since sion or compression where the input and output rates may be
a circular shift by M / 2 samples corresponds to ;phase shift of different { 6 ]
exp ( - j 2 7 ~ k ( M / 2 ) / M=) (- I)&, an alternate form of this imple- If the operation of the phase modification in Fig. 2 is com-
mentation is shown in Fig. l(b). mutable with the short-time spectral modification, then it is
The upper part of Fig. 2 shows the overall implementation possible to replace the phase modifications (- W i S R k and
of the short-time analysis based un (3)-(7) and on the inter- (- WkrR'k with a single phase modification of the form
pretation in Fig. l(b). The input signal is buffered with an M W a R f - R ) k . Furthermore, if R' = R, the phase modifications
sample buffer, The contents of the buffer are copied every R can be eliminated entirely and the spectral modifications can
samples and windowed by the analysis window h(-m). This be made directly on the short-time transform in the sliding
windowed segment of speech is transformed to give the short- time reference (again art& if the phase modification and spec-
time Fourier transform in the sliding time reference as de- tral modification operations are commutable).
scribed above and in Fig. l . It is then multiplied by the phase By using the fact that multiplication by a linear phase shift
factor (- I)kW&"Rk to convert it tu the fixed time reference. in the DFT frequency domain corresponds t o a circular rata-
From IS)-( 14), a similar inverse synthesis structure can be tion in the time domain, the alternate analysis structure of
derived, as shown in the lower part of Fig. 2. The short-time Fig. 1 (a) results. Similary, an alternate synthesis structure can
transform &(SI? ') is first multiplied by W M sR'k , according to be defined in which f & R r ) .is first inverse transformed and
( 1 1) t o convert it from a fixed time reference to the linearly then circularly rotated by ((12 + sR' + M/2))* samples accord-

Authorized licensed use limited to: Universidad Nacional Autonoma De Mexico (UNAM). Downloaded on November 04,2020 at 04:25:59 UTC from IEEE Xplore. Restrictions apply.
102 IEEE TRANSACTIQNS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL, ASSP-28, NO, 1 , FEBRUARY 1980

I -
v. CONCLUSION
In this paper we have proposed a weighted overlap-add im-
plementation of short-time anafysislsynthesis. The advantages
of this scheme are that it is more efficient, in terms of storage,
and easier to program than the method used by Portnoff. It
also clearly illustrates the relationship between the weighted
overlap-add and the filter-bank summation methods of short-
time analysislsynthesis using arbitrary analysis and synthesis
windows. We have alsa attempted to point aut the differences
between the sliding and the fixed time references which are
often confused. in these methods. The basic mathematical
framework is similar to that derived by Portnoff, including ef-
fects of short-time modification-only the interpretation of
the structure is different.
ACKNOWLEDGMENT

U
~ to thank J. B. Allen and M. R. Port-
The author W S U ~like
noff for comments in the development of this paper.
REFERENCES
[ 11 J. L. Flanagan and R. Golden, “Phase vocoder,” Bell S’st. Tech.
Fig. 3. The filter-bank summation interpretation of short-time Fourier J., V S ~ . 45, pp. 1493-1509, Nov. 1966.
analysislsynthesis. [Z] R. W. Schafer and L. R. Rabiner, “Design and simulation of a
speech analysis-synthesis system based on short-time Fourier anal-
.ing to ( 12) to give $,(SI?’) (multiplies by W M
rc s ( R ‘ - R ) k can also
ysis-synthesis system based on short-time Fourier analysis,”
Duns. Audio Elecmacoust., val. AU-21, pp. 165-174, June 1973,
be implemented with a circular shift). This signal is then [3] J. €3- Allen, “Short-term spectral analysis synthesis, and modifica-
weighted by f(n) and overlap-added into the output buffer in tion by discrete Fourier transform,” IEEE Trans. Acousl., Speech,
the same manner as in Fig. 2. Signal Processing, vol. A S P - 2 5 , pp. 235-238, June 1977.
The advantages of the implementation of Fig 2 of the short- [4] J. €3.Allen and L. R, Rabiner, “A unified approach to short-time
time synthesis over the structure proposed by Portnoff [ 51, Fourier analysis and synthesis,” mot. IEEE, v d 6 5 , pp. I55X-
[ 6 ] is that it requires substantially less internal storage of data 1564, Nov. 1977.
and it is more easily programmed. Only M samples of partial [SI ha. R. Portnoff, “Implementation of the digital phase vocoder us-
sums of products in the overlap-add output buffer need be ing the fast Fourier transform,” XEEE Trans. Acoust, Speech,
Signal Processing, vol. A S P - 2 4 , pp. 243-248, June 1976.
stored, as opposed to requiring the storage sf many inverse [ 6 ] M. R. Portnoff, “Time scale modification of speech based on
transformed short-time segments. The number of actual arith- short-time Fourier analysis,” Ph.D. dissertation, Massachusetts
metic operations, however, is the same as that of Portnoff‘s Inst. Tech., Cambridge, Apr. 1978.
(assuming the alternate method of circular rotation is used in [ 7 ] T.A. C. M. Claasen and W,F. G . Mecklenbrauker, “On the trans-
place of the phase modification). portation of linear timsvarying discrete-time networks and its
Another advantage of the implementation of Fig. 2 is that it application to rnultirate digital systems,” Philips J. Res., vol. 23,
clearly shows the relationship of the filter-bank synthesis pp. 78-102,1978.
method to that of the overlap-add synthesis method. For this [8] D. Gabar, “Theory of communication?” J, IEE, no. 93, pp. 429-
reason we refer to this method as a weighted overlap-add tech- 457, Nov. 1946.
[9] M, R. Portnoff, “Time-frequency representation of digital signals
nique. If an M point rectangular synthesis window is used, and systems based on short-time Fourier analysis,” I E E E Trans.
then the synthesis method is identical to the overlap-add tech- A C O U S, Speech,
~. Signal Processing, this issue, pp. 5 5 6 9 .
nique proposed by Allen [ 3 ] .
rv.RELATIONSHIP TQ THE MODIFIED FILTER-BANK
SUMMATIONMETHOD
An alternative interpretation of short-time analysis/synthesis
is that of the modified filter-bank summation approach shown Generating Covariance Sequences and the Calculation of
in Fig. 3. Since this structure is derived from the same mathe- Quantization and Rounding Error Variances in Digital Filters
matical framework as the weighted overlap-add structure, it is
clear that they are equivalent structures. The values X,(&) in JEANPIERRE DUGRE, ALBYSIUS A, L. BEEX,
Fig. 3 are identical to those in Fig. 2. Both structures can im- AND LOUIS L* SCHARF
plement arbitrary analysis and/or synthesis windows. In the
case where R = R ’= 1 and f ( n )= ~ ( v z(a) unit pulse, 6 ( n ) = 1
for y1 = 0 , and 6(n)= 0 for n # O), it can be seen that the Abstract-A linear algorithm is given for the generation of cuvariance
modified filter-bank summation method becomes identical to sequences for rational digital filters using numerator and denominator
the filter-bank summation approach discussed by Allen and coefficients directly. There is no need to solve a Lyapunov equation o r
Rabiner [ 4 ]
It is also interesting to note that if we take the transpose of Manuscript Ieceived December 22, 1978; revised June 8, 1979 and
either of the structures of Fig. 2 , or Fig. 3, the structures re- July 23, 1979. This work was partially supported by the Office of
main the same; however, the roles o f the analysis and synthesis Naval Research, Statistics and Probability Branch, Arlington, VA, under
windows and R and R‘ are interchanged. This is consistent Contract NUUO24-75-C-0518.
with the concepts of transposition of linear time varying sys- The authors are with the Department of ElectricalEngineering,
tems discussed by Claasen and Mecklenbrauker [ 71. Colorado State University, Fort Collins, CO 80523.

Authorized licensed use limited to: Universidad Nacional Autonoma De Mexico (UNAM). Downloaded on November 04,2020 at 04:25:59 UTC from IEEE Xplore. Restrictions apply.

You might also like