Professional Documents
Culture Documents
Structured Iir Models For HRTF Interpolation
Structured Iir Models For HRTF Interpolation
262
base filters. These associations tell the interpolation algo- terms in the denominator of the transfer function. In the
rithm which pole (or zero) from filter A corresponds to a sequel two techniques for obtaining IIR approximations to
given pole (or zero) in filter B, thereby allowing the speci- FIR filters are presented.
fication of intermediate configurations. Unfortunately there
is no foolproof rule to create this kind of associations, al- 2.1. The Kalman Method
though some ad hoc strategies [12] have been proposed.
One method for obtaining an IIR filter approximation [6, 7]
We propose a new strategy for IIR interpolation, namely
is based on observing incoming values x0 , x1 , x2 , . . . and out-
the use of structured IIR filters. The specific structure im-
coming values y0 , y1 , y2 , . . . of a filter and trying to fit these
posed on the IIR representations is based on a division of
observations to an IIR filter equation of the form
the frequency range in N frequency bands, and it requires
that each frequency band contains exactly one pole and one N N
zero. Each pole and each zero of a filter becomes thus imme- ŷ(n) = ∑ a j x(n − j) − ∑ b j y(n − j),
j=0 j=1
diately associated with the corresponding pole and zero of
another filter via their common frequency bands. We show where the error f (a, b) = ky − ŷk is to be minimized over
that HRTF filters may be approximated by structured IIR a given range of index values. This is a convex quadratic
filters with errors that are comparable to those obtained by optimization problem with a unique minimizer, that can be
Kalman filter models [7]. We further show that structured directly computed by solving the first order necessary opti-
IIR filters can be interpolated using filter coefficient inter- mality condition (∇ f (a, b) = 0). Denoting by
polation and pole-zero interpolation with better results than
the corresponding interpolation of Kalman filters. w(k) = (xk , . . . , xk−N , yk−1 , . . . , yk−N )0
The structure of this paper is as follows. Section 2 pre-
the windowed joint input-output process, one obtains
sents two methods to construct alternative IIR representa-
tions for each HRTF in the database, the Kalman method a
= R−1 c,
and the structured IIR method. Section 3 discusses two −b
IIR interpolation strategies, filter coefficient interpolation
and pole-zero interpolation, and section 4 presents and dis- where R = ∑k w(k)w(k)0 and c = ∑k y(k)w(k).
cusses numerical experiments regarding the fitness of the By letting x represent a Dirac’s Delta function and y rep-
IIR models and of the interpolation methods. Finally, sec- resent the HRIR of interest, the above method allows the
tion 5 presents some conclusions and further work. computation of an IIR model that is close to the original
HRTF in the sense that the errors between the original HRIR
and the predicted values given by the above filter equation
2. IIR MODELS FOR HRTFS are minimized.
To obtain best results all HRTFs are time-aligned [10,
HRTF filters are usually obtained by direct measurements of
1, 12] so that all HRIRs start at time 0 (this is undone later
the corresponding HRIR using contact microphones inside
in the interpolation phase). This time-alignment guarantees
the ears of a person or a head doll [2]. Such a representation
a minimum-phase property of all systems, thus avoiding a
in the form of an impulse response is equivalent to the list
large row of zero coefficients to compensate for the initial
of coefficients of a FIR filter. As the number of coefficients
time delay (for the arrival of the direct sound).
in this representation increases (with a fixed sample rate),
It should be noted that there is a catch in the above ob-
the better the reproduction of subtleties that torso, head and
jective function: the measured errors relate the correct out-
pinna imprints in the sound signal as it enters the auditory
put values and the predicted output values, obtained from
canal.
past input and from correct past output values. When this
While highly individualized HRTF filters may provide
filter equation is applied to a new signal these latter values
wonderful results in terms of auralization, they are computa-
are not available, and the feedback terms in the equation
tionally expensive to run on a multi-user setting, not to men-
are fed with predicted past output values. The consequence
tion the difficulties in measuring a different HRTF database
is that the minimal error obtained by this method does not
for each individual user. A reasonable alternative is the use
correspond to the error between the original HRIR and the
of non-individualized HRTF datasets, from with users can
impulse response of the IIR filter, or to the error between
extract useful directional information [14]. But even when
the original HRTF and the transfer function of the IIR filter
adopting non-individualized HRTF databases such as CIPIC
(these two errors are interrelated via Parseval’s theorem).
[2], no less than 200 filter coefficients are needed for each
HRTF.
2.2. Structured IIR Filters
IIR (or recursive) filters are able to represent transfer
functions with high level of detail and fewer coefficients [7] The IIR model proposed here addresses both the error is-
due to the feedback terms, which correspond to polynomial sue raised in the previous section as well as the difficulty
263
of making associations between poles and zeros of different a multi-resolution first-order descent method was applied to
filters raised in [12]. As mentioned in the introduction, this obtain stationary points.
model requires that each frequency band in the z plane con- Although the computational cost of solving these prob-
tains exactly one pole and one zero, in such a way that the lems for a whole HRTF database is high, this procedure has
error between the original HRTF and the corresponding IIR to be done only once at a setup stage, and so this cost has
transfer function is minimized. no impact whatsoever on the computational cost of auraliza-
Consider the division of the upper half of the complex tion, which is always linear on N.
plane into N radial frequency bands with endpoints ω0 =
0, ω1 , . . . , ωN = 2π. The optimization problem has 2N com- 3. IIR HRTF INTERPOLATION
plex variables π1 , . . . , πN and ζ1 , . . . , ζN corresponding to the
poles and zeros in the upper half of the plane, so that π j HRTF databases are built by measuring the incoming HRIR
and ζ j belong to the band defined by the interval [ω j−1 , ω j ]. signal at each ear (of a head doll or a human) for a large
Each pole and zero has a corresponding conjugate depen- number of possible incoming directions. The minimum an-
dent counterpart, so that the IIR filter actually has 2N poles gle between adjacent measured directions can be found by
and 2N zeros. In order to ensure compactness of the feasible spatial sampling analysis [1] to be about 4.9◦ for a 44.1kHz
region and to guarantee the existence of solution, each pole sampling rate and 18cm distance between ears. The CIPIC
and zero is constrained to lie within the unit circle (it should database [2], for instance, uses 25 azimuth and 50 elevation
be noted that poles lying on the unit circle would lead to angles, for a total of 1250 HRTF filters for each ear. Al-
unstable filters, but these are ruled out by the minimization though this number of measurements is reasonably large, if
procedure since the original FIR filters are always stable). a soundscape rendering of a moving sound source is needed,
Let arg(·) denote the argument or angle of a complex the need for an interpolation procedure becomes evident.
number, ∗ denote complex conjugation, and let H be the A moving sound source is defined by an output audio
original HRTF. The complete optimization problem is thus signal x(n) and a spatial trajectory τ(n) that can be repre-
formulated as sented as polar coordinates (absolute value, azimuth and el-
evation). The absolute value in HRTF auralization is used
min kH − K̂(H, π, ζ )Ĥ(π, ζ )k2
for controlling the overall gain (using a distance-based rule
s.t. ω j−1 ≤ arg(π j ) ≤ ω j
such as inverse-distance gains or other psychoacoustical es-
ω j−1 ≤ arg(ζ j ) ≤ ω j , i = 1, . . . , N
timates), while the azimuth and elevation angles define the
|π j | ≤ 1, |ζ j | ≤ 1
closest HRTF filters in the database that will be combined in
the interpolation.
where Ĥ(π, ζ ) is the transfer function given by
One of the simplest interpolation approaches is to al-
ways choose the filter corresponding to the closest measured
∏Nj=1 (1 − ζ j z−1 )(1 − ζ j∗ z−1 )
Ĥ(π, ζ ) = . direction, and to crossfade transitions to avoid discontinu-
∏Nj=1 (1 − π j z−1 )(1 − π ∗j z−1 ) ities [12]. Unfortunately this is not good enough to ensure
auralization smoothness if sound sources move slightly fast:
and the overall amplitude factor K̂(H, π, ζ ) is set to with the CIPIC database a 1 second circular round-trip of a
sound source would involve 50 crossfades/second, a transi-
H, Ĥ(π, ζ ) tion rate which would produce audible artifacts due to phase
K̂(H, π, ζ ) =
kĤ(π, ζ )k2 shifts from one filter to the next.
The way to avoid this is to use adaptive filters whose
in order to minimize kH − K̂ Ĥ(π, ζ )k2 , which is a convex coefficients are recomputed every sample or every block of
quadratic expression on K̂ for fixed H, π and ζ . B samples (comprising a few milliseconds). For instance,
The radial division may be chosen according to the ap- linear and bilinear interpolations [4, 12] are exactly adap-
plication context, for instance equal division (ω j = j · Nπ ) or tive crossfades between 3 (or 4) measured HRTFs that are
π
octave division (ω0 = 0, ω j = 2N− j , j = 1, . . . , N). Other closest to the direction of τ(n) at any given time, where the
error measurements, such as the distance between magni- crossfade weights are linearly (or bilinearly) related to the
tude transfer functions, which avoids the phase uncertainty relative position of the sound source within the triangle (or
problem referred to by [8], can also be used at discretion. rectangle) defined by the original HRTF directions. By re-
This optimization problem has a nonlinear and noncon- computing the filter coefficients at sample rate, the adaptive
vex objective function, and requires global optimization filtering will not introduce audible discontinuities in the sig-
strategies such as multistart or branch-and-bound, combined nal, even with IIR models, provided that the spatial trajec-
with local descent strategies (Cauchy or Newton-like meth- tory function τ(n) is smooth and slowly-varying.
ods). In the experiments of section 4, Kalman solutions Obtaining the weights in the triangular or rectangular
were modified to obtain feasible starting points, from which case is a very simple geometrical problem. In triangular
264
interpolation, if the closest HRTFs have directions d 1 , d 2 also of the transfer functions (due to linearity of the z trans-
and d 3 relative to the user, and the virtual sound source has a form), whereas in the IIR case the interpolation affects si-
normalized direction dˆ (so that dˆ lies in the triangle defined multaneously the numerator and denominator polynomials
by d 1 , d 2 and d 3 ), the weights w that satisfy dˆ = w1 d 1 + of the transfer function
w2 d 2 + w3 d 3 are given by
a0 + a1 z−1 + · · · + aN z−N
H(z) =
−1 1 + b1 z−1 + · · · + bN z−N
w = d1 d2 d3 ˆ
d. that corresponds to the given IIR filter equation. Since the
interpolation of filter coefficients affects the position of the
poles and zeros in highly nonlinear ways, there is no a pri-
In rectangular (or bilinear) interpolation [4], if the azimuth ori guarantee that interpolated filters will always be stable.
and elevation angles of the closest HRTFs are (α j , βk ), for Unstable configurations were not observed in the numerical
j = 1, 2 and k = 1, 2, and the azimuth and elevation an- experiments (section 4), and they should not be the cause of
gles of the virtual sound source is (α̂, β̂ ), the corresponding troubles in the context of moving sound sources, where an
weights will be unstable configuration would probably be soon replaced by
a stable one.
|α2− j − α̂| |β2−k − β̂ |
w jk =
(α2 − α1 ) (β2 − β1 ) 3.2. Pole-Zero Interpolation (PZI)
corresponding to the HRTF (α j , βk ), for j = 1, 2 and k = The pole-zero interpolation takes the positions of the poles
1, 2. These weights are used in a variety of interpolation and zeros (zeros of the numerator and of the denominator
contexts in the remaining of this section, and will be referred of the transfer functions, respectively) of the IIR models of
to as the appropriate weights with respect to the interpola- 3 or 4 HRTFs that are closest to the direction of the virtual
tion method (triangular or rectangular). source, and combines them linearly using the appropriate
The choice between triangular or rectangular interpola- weights. Each pole (or zero) of the interpolated filter is ob-
tion may be based on computational performance (triangular tained by a linear combination of exactly one pole (zero)
being roughly 25% faster because it combines 3 instead of from each of the 3 or 4 filters, where this set of matching
4 HRTFs) or on minimizing the number of transitions be- poles (zeros) has been defined beforehand (this problem will
tween different sets of HRTFs (rectangular subdivision of be discussed in the sequel).
the database creates half the number of regions as does tri- Consider the set of filters {Fj } for 1 ≤ j ≤ 3 (triangular
angular subdivision). case) or 1 ≤ j ≤ 4 (rectangular case), defined by their pole-
Three adaptive interpolation approaches based on the zero structure
IIR filter model are presented in the sequel. The first one
uses the above weights to combine filter coefficients in the Fj ↔ {π jk , ζ jk , k = 1, . . . , N}.
filter equation, and the remaining two use the weights to po- and overall gain factor K j , that are supposed to be combined
sition poles and zeros in the z plane representation of the IIR with appropriate weights w j to create a new interpolated fil-
filters obtained by the Kalman method or by the structured ter. By assuming that the poles and zeros are ordered ac-
IIR approach. cording to their association, in such a way that π1k , π2k , π3k
are corresponding poles and ζ1k , ζ2k , ζ3k are corresponding
3.1. Interpolation of Filter Coefficients zeros, the structure of the interpolated filter will be given by
A first approach to interpolate IIR models of HRTF filters is
π̂k = ∑ j w j π jk ,
to look at the IIR filter equation F̂ ↔ .
ζˆk = ∑ j w j ζ jk , k = 1, . . . , N
N N
The interpolation of the overall gain factor could sim-
y(n) = ∑ a j x(n − j) − ∑ b j y(n − j) ply be computed as K̂ = ∑ j w j K j , but in many contexts this
j=0 j=1
leads to poor interpolation results, because this expression
and to consider the coefficients a j and b j as functions of assumes that the overall gain of the interpolated filter is
n, defined as linear combinations of the coefficients of the similar to those of the closest filters used in the interpola-
closest IIR models of measured HRTFs, using the appropri- tion. This assumption often fails to hold, for instance with
ate weights as discussed above. Kalman filters, which lack any kind of constraints on the po-
This idea is almost a direct translation of the usual linear sition of poles and zeros. We propose to recompute the over-
interpolation of HRTF FIR filters, with the notable excep- all gain factor from the original HRTF data, as in section 2.2.
tion that in the FIR case the interpolation of the coefficients In the case of interpolation, where the original HRTF is un-
is the same as the interpolation of the impulse responses, and known, this means finding the best overall gain with respect
265
to the closest measured HRTFs. Specifically, if {H j } are the Once a conforming IIR database is produced, the asso-
transfer functions used in the interpolation, we define ciation of poles and zeros may proceed by considering sepa-
D E rately the problem of associating real poles of one filter with
H j , Ĥ(π̂, ζˆ ) real poles of another filter, and so on for real zeros, complex
K̂ j = , poles and complex zeros.
kĤ(π̂, ζˆ )k2 One association strategy was previously proposed in [12]
using a minimum distance criterion. Consider for instance a
and the interpolated overall gain factor as K̂ = ∑ j w j K̂ j . set of complex poles π1 , π2 , . . . , πM that must be associated
In order to create associations of poles and zeros for dif- with another set of complex poles ρ1 , ρ2 , . . . , ρM . The pro-
ferent IIR filters one has to define exactly what it is that posed scheme is as follows: out of all possible permutations
must be achieved with the association. In the case of inter- p of the indices 1, 2, . . . , M, choose the one that minimizes
polation one wants to define intermediate IIR filters, and so the overall distance ∑ j |π j − ρ p( j) |. This association is re-
the structure of the interpolated poles and zeros must sat- flexive (that is, p−1 minimizes ∑ j |ρ j − π p−1 ( j) |), but it lacks
isfy the requirement that for each complex pole (or zero) its transitivity, meaning that the best association between filters
complex conjugate is also a pole (zero) of the same filter. A and C may be incompatible with the best associations be-
This is the same as saying that the filter coefficients (and the tween filters A and B and between filters B and C. This is
output filtered signal) have to be real-valued. a huge problem when trying to create associations between
Even this minimal requirement already turns the associ- poles and zeros of 3 or 4 HRTF filters at a time (as required
ation problem into an intrinsically difficult problem. Real for interpolation).
poles (or zeros) may be readily associated with other real We propose another association, based on frequency or-
poles (zeros). When it comes to complex poles or zeros one dering. For every IIR model in a preprocessed database,
is free to associate any members of different filters without sort all complex poles and zeros according to their angles,
violating this requirement, as long as the exact same asso- and use this ordering as the association for complex poles
ciations are made with the corresponding complex conju- and zeros of different filters, matching the j-th complex pole
gates. But there is no way of matching a filter with 2 zeros (zero) of any filter with the j-th complex pole (zero) of any
at z = −1 and z = 1, for instance, with another filter with other filter. Correspondingly, sort all real poles and zeros ac-
2 zeros at z = −i and z = i. Any linear combination will cording to their values, and use this ordering for association
produce a complex filter as a result of interpolation. Only as above. This association satisfies reflexivity and transi-
under very special circumstances may a real pole (or zero) tivity, being a feasible substitute for the minimum distance
be associated with a complex pole (zero), namely when the approach.
multiplicity of the real pole or zero is even, and an associa-
tion of a pair of overlapping real poles (or zeros) with a pair
of complex poles (zeros) is made possible. 3.4. PZI Using Structured IIR Filters
In the sequel this association problem will be addressed Structured IIR filters were proposed with the PZI strategy
in the specific contexts of IIR filters produced by the Kalman in mind, so the association of poles and zeros is already es-
method, and of structured IIR filters obtained by optimiza- tablished through the association of each frequency band to
tion as seen in section 2.2. exactly one pole and one zero. This is closely related to
the frequency-based association just proposed for IIR filters
3.3. PZI Using Kalman IIR Filters obtained by the Kalman method.
It should be noted that real poles and zeros may occur in
Since the Kalman method has no way of regulating the pres- structured IIR filters, either on the extreme frequency bands
ence or multiplicity of real poles or real zeros, some prepro- (the first starting on 0 and the last ending on π) or in any
cessing of the original filters is necessary in order to use the other band (if the magnitude of the pole or zero is zero after
produced IIR database in an interpolation context. Specifi- optimization). When that happens, the corresponding filter
cally, since overlapping poles or zeros are usually not pro- has actually two coinciding poles (or zeros) produced by the
duced by this method, the number of real poles in any filter same complex variable of the optimization model. During
should be made equal, as well as the number of real zeros. interpolation, if the spatial trajectory of the virtual sound
Two possible strategies for this preprocessing are: (1) allow source departs from a filter with a real pole in frequency
only complex poles and zeros in the filters; or (2) choose band j and reaches another filter where the matching pole
the most frequent number of real poles (and of real zeros) in the same frequency band is complex, the interpolation
and force every non-conforming filter to adapt to the cho- scheme will produce mirrored, complex-conjugate pole tra-
sen number of real poles (and of real zeros). In both cases jectories in frequency bands j and 2N − j + 1 (on the lower
a search among modified versions of the filter is needed to half of the unit circle), ensuring that the intermediate filters
enforce the chosen structure of poles and zeros. will be real-valued, as they should be.
266
Even though all poles and zeros used in the interpolation Performance Measures for IIR Interpolation
are constrained within the unit circle, we still obtained best 70 Relative Time (%)
Relative Number of Users
results by using the interpolation of the recomputed over-
60
all gain factors K̂ j , as defined in section 3.2, instead of the
interpolation of the original gains K j . 50
40
4. NUMERICAL EXPERIMENTS 30
20
The methodology for building IIR databases and for using
HRTF interpolation in binaural rendering presented in the 10
267
Errors for Kalman IIR Models Errors for Structured IIR Models, Left Ear, N=16
0.22
1 4
Kalman (avg)
Kalman (min) 0.2
3.5
Kalman (max)
0.8 3 0.18
2.5 0.16
Elevation (rad)
Relative Error
0.6 2
0.14
1.5
0.12
0.4 1
0.1
0.5
0.2 0 0.08
-0.5 0.06
0 -1 -0.5 0 0.5 1
2 4 6 8 10 12 14 16 18 20 0.04
Azymuth (rad)
Number of Poles (and Zeros)
0.6 1
Structured IIR Coef. Interp. (avg)
Structured IIR Coef. Interp. (std)
Kalman Coef. Interp. (avg)
0.4 0.8 Kalman Coef. Interp. (std)
0 0.4
2 4 6 8 10 12 14 16 18 20
Number of Poles (and Zeros)
0.2
268
Errors for Pole-Zero Interpolation Workshop on Applications of Signal Processing to Au-
1
Structured IIR PZI (avg) dio and Acoustics, New Paltz, 2001.
Structured IIR PZI (std)
Modified Kalman PZI (avg)
0.8 Modified Kalman PZI (std) [3] C. Cheng, “Moving sound source synthesis for bin-
Structured IIR Coef. Interp. (avg)
aural electro-acoustic music using interpolated head-
Relative Error
269