Audio Engineering Society

Convention Paper
Presented at the 131st Convention
2011 October 20–23 New York, NY, USA

This Convention paper was selected based on a submitted abstract and 750-word precis that have been peer reviewed by at least
two qualified anonymous reviewers. The complete manuscript was not peer reviewed. This convention paper has been
reproduced from the author's advance manuscript without editing, corrections, or consideration by the Review Board. The AES
takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved.
Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio
Engineering Society.

Observing the Clustering Tendencies of Head Related Transfer Function Databases
Areti Andreopoulou, Agnieszka Roginska, and Juan Bello

Music and Audio Research Laboratory (MARL), New York University, New York, NY, 10012, USA
aa1510@nyu.edu, roginska@nyu.edu, jpbello@nyu.edu

ABSTRACT

This study offers a detailed description of the clustering tendencies of a large, standardized HRTF repository, and
compares the quality of the results to those of a CIPIC database subset. The statistical analysis was implemented by
applying k-means clustering on the log magnitude of HRTFs on the horizontal plane, for a varying number of
clusters. A thorough report on the grouping behavior of the filters as the number of clusters increases revealed a
superiority of the HRTF repository in describing common behaviors across equivalent azimuth positions, over the
CIPIC subset, for the majority of the HRTF datasets.

1. INTRODUCTION

Nowadays, a large number of commercial products, such as video/computer games, 3D movies, and 3D televisions, aim at the creation of an immersive environment for the user. Such products employ spatial sound technologies for the design of accurate and realistic auditory fields. Generalized Head Related Transfer Functions (HRTFs) are the most frequent tool of such immersive systems, due to their computational efficiency, at the cost, sometimes, of spatial realism.

HRTFs, or their time-domain equivalent Head Related Impulse Responses (HRIRs), are the composite of all the filtering effects that a sound undergoes before reaching one’s ears, through constructive and destructive interaction with the head, pinnae, and torso. High variability in the anthropometric parameters of the human body results in unique, subject-specific HRTF filters, which assist localization accuracy and spatial impression best when used by their original owner [7]. Several research methods are currently being explored in search of efficient HRTF individualization techniques, such as Database Matching [14], the Boundary Element Method [9], HRTF decomposition [4], Principal Component Analysis [8], HRTF clustering [3, 12], and others.

1.1. Cluster Analysis

Cluster analysis is a very common statistical tool in various research areas, such as data mining, machine learning, information retrieval, and pattern recognition. It can be considered a method of unsupervised learning, where members of a dataset are divided into subsets,
Andreopoulou et al. Clustering tendencies of HRTF databases.

called clusters. Elements of the same cluster share similar characteristics in some respect.

One of the most popular methods of cluster analysis is k-means clustering, in which the data collection is partitioned into k clusters by grouping the data observations based on their proximity to each cluster’s centroid. The main advantages of this algorithm are its speed and simplicity, which permit its use on large datasets [13]. The algorithm proceeds as follows:

1. The value k for the number of clusters is chosen.

2. k clusters are randomly generated and the cluster centers are calculated.

3. Each data point is assigned to the nearest cluster, based on its proximity to the cluster center.

4. The cluster centers are updated.

5. The previous two steps are repeated until convergence is met; that is, until the cluster assignments of the data points stop changing.

K-means clustering is a heuristic algorithm that cannot guarantee convergence to the global optimum. Its results strongly depend on the cluster initializations, which are random, and can differ between successive runs. Repeating the algorithm with varying initialization conditions can yield improved results.

1.2. Related work

Clustering techniques on HRTFs are currently widely used in spatial audio research. The term may refer to one of two things: either to the process of updating the resolution grid of an HRTF set so that it contains only the minimum number of necessary filters, before localization accuracy starts dropping; or to the process of grouping together elements of a large database, based on their similarity, and choosing the best representative for each cluster. Both methods lead to significant data reduction in an HRTF collection and, therefore, to more computationally efficient 3D audio processing systems.

The research of Fahn & Lo on HRTF clustering [3] falls in the first of the two categories mentioned above. In their work, the authors apply an improved LBG-based clustering algorithm on the power spectrum of HRTFs in order to lower the large number of filters per set, by sampling in a non-uniform but, rather, perceptual manner. An HRTF filter corresponding to any virtual sound-source position can be re-synthesized, during reproduction, from this bank of HRTFs by means of linear interpolation and Interaural Time Difference (ITD) insertion.

Sets removed from the HRTFs can be re-synthesized by means of linear interpolation. Lemaire et al., in their HRTF individualization research, used clustering as a statistical tool to detect the most representative impulse response filters in a set [10]. Those filters would have to be measured in order to estimate the remaining ones. For this task a Self-Organizing Map (SOM) was used as a clustering method. A Hierarchical Agglomerative Clustering (HAC) tool, using the Ward criterion, ran on top of each map, grouping the HRTFs that shared similar characteristics.

Several further papers have explored the idea of reducing the full-scale process of HRTF measurements to a set of representative filters that carry the necessary spatial variance and individualization properties to be used as models for the rest of the set. On that premise, Nicol et al. [11] compared 5 distance criteria in search of the most meaningful HRTF dissimilarity measure. Those were the Mean Square Error (MSE), the CO, the Fahn, the Avendano, and the Durant criterion. In both studies described in their paper the Avendano criterion was chosen as the most appropriate one.

So et al. [12], on the other hand, used cluster analysis to calculate twelve orthogonal components from a set of 196 HRTFs that were associated with forward and backward positions. Those components were afterwards presented to users to choose their preferred ones. Validation tests on the user selections showed that, in addition to the achieved data reduction, localization accuracy improved for the subjects, by means of minimization of front-back confusion errors.

This paper relates to on-going research that merges the fields of Database Matching and HRTF Clustering. More specifically, it offers a thorough investigation of the clustering tendencies of a large, standardized HRTF repository. Furthermore, it examines the effect of database size on clustering, by comparing the clustering results to those of the CIPIC database, one of the most popular, publicly available HRTF collections, by the University of California Davis [1]. The technique chosen for cluster analysis of the data in this experiment

AES 131st Convention, New York, NY, USA, 2011 October 20–23
Page 2 of 10

was k-means, due to its simplicity and its ability to handle large datasets fast.

2. HRTF CLUSTERING

2.1. Pre-processing

The HRTF repository used in this study consists of 110 datasets selected from the LISTEN [15], CIPIC [1], FIU [6], and KEMAR [5] databases. A more detailed description of its content and format can be found in [2].

For the purposes of this experiment, only a part of the repository was used. This subset exclusively contained filters on the horizontal plane (0º elevation), converted to 256-sample long, minimum-phase HRTFs at a sampling rate of 44100 Hz. Linear interpolation was applied beforehand on the minimum-phase HRIRs to create sets that advance in 15º azimuthal increments. Before use, the subset was standardized by subtracting the mean and dividing by the standard deviation. Such standardization steps were found necessary to assure uniformity of the sets. The resulting collection consisted of 5280 HRTFs (110 subjects at 24 locations for both ears).

2.2. Clustering

The repository was divided into 3 groups based on the content of its filters: the first contained filters that corresponded to ipsilateral azimuth positions only, the second filters of contralateral azimuth positions, and the third contained all filters. In each group k-means clustering was applied either to filters of both ears, or to filters of the left and right ear separately. This further division resulted in 9 different HRTF sub-groups.

Cluster analysis was run for a variable number of clusters, with k ranging from 2 to 10. The reason for this variability was two-fold. While the target was to approach the optimum number of clusters that would best describe the location-dependent filter relationships, the changes in the content of each cluster, as k increased, also revealed interesting information on the correlation of the filters. The observations of the clustering tendencies of these filters are described in the following section.

3. CLUSTERING RESULTS

3.1. Ipsilateral content

This selection from the repository is comprised only of ipsilateral HRTFs for each ear. More specifically, for the right ear it contains filters from 0º to 180º (13 azimuth positions), and for the left ear filters from 180º to 360º (13 azimuth positions) on the horizontal plane. The third sub-group, for both ears, is a composite of the previous two.

3.1.1. Left and Right Ears

These two subsets contain 1430 HRTFs each (110 subjects at 13 azimuth locations). The maximum number of clusters, before severe overlap starts, is 6. Such a value for k roughly corresponds to 2 azimuth locations per cluster. Figure 1a depicts the distribution of HRTF filters per cluster for all the analyzed subgroups. In this setup, as illustrated in the figure, the distribution of filters among different clusters is fairly even, with just one exception. Such a cluster, occupying a wider area on the left and right hemisphere, will be noticed in most of the studied subsets.

Due to symmetries in the head geometry, the left and right ear subsets are found to cluster in an equivalent, symmetric way across the median plane. This observation can be easily spotted in Figure 2, which depicts the grouping tendencies of the per-ear ipsilateral content HRTFs in 4 clusters. In these types of graphs, different azimuth locations are indicated by the position of the clusters on a circle. Each gray shade corresponds to a different cluster, while the length of the sectors denotes the number of HRTFs per cluster. In Figure 2, the symmetrical properties of the left and right ear subsets can be seen by looking at the size and color of equivalent azimuth-position sectors. The overall structure of the left ear graph is almost a mirror image of that of the right ear. Consequently, it suffices to describe the decorrelation of HRTFs as k increases for just one ear.

For k = 2, azimuth positions on the median plane start separating from the rest. The 180º filters are fully concentrated into one cluster, along with a large number of HRTFs from azimuths in the 150º-180º range, and approximately half of the 0º ones. The remaining HRTFs get grouped into the second cluster.
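
As a reading aid for the results, the standardization of Section 2.1 and the five k-means steps of Section 1.1 can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function names `standardize` and `kmeans`, the per-column z-scoring, the squared Euclidean distance, the seeded start from k randomly chosen observations, and the toy data are all chosen here for illustration only.

```python
import math
import random


def standardize(rows):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((r[d] - means[d]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[d] - means[d]) / stds[d] for d in range(dims)] for r in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, max_iter=100):
    """Plain k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 2 (assumed variant): start from k distinct observations as centers.
    centers = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every observation to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centers[c]))
            for row in rows
        ]
        # Step 5: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy stand-in for log-magnitude spectra: two well-separated groups.
toy = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
labels, centers = kmeans(standardize(toy), k=2)
print(labels)
```

Because the initialization is random, the numeric label given to each group can change with the seed, even when the grouping itself stays the same. Running the function for several seeds and several values of k is one simple way to mirror the sweep from 2 to 10 described in Section 2.2.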


Figure 1 The distribution of HRTF filters per cluster for all the analyzed subgroups. a) Right Ear: Ipsilateral
Content, b) Both Ears: Ipsilateral Content, c) Left Ear: Contralateral Content, d) Both Ears: Contralateral Content,
e) Right Ear: All Content, f) Both Ears: All Content.

When k = 4, the front and back azimuth positions on the median plane get split into 2 different clusters, along with filters from adjacent azimuths within a range of 30º. For the remaining locations, a wide cluster occupies positions from 30º to 90º, and a narrower one from 105º to 150º.

For k = 6, on the other hand, the filters become more evenly distributed. The two median plane clusters are more concentrated, containing the vast majority of the 0º and 180º HRTFs. Their range is also much narrower, including filters within a 15º offset. The remainder of the azimuth positions are divided into 4 clusters, with each one occupying, roughly, 2 adjacent locations. The only exception is the 90º one, which spans from 60º to 90º.

3.1.2. Both ears

This subset contains 2860 HRTF filters (110 subjects at 13 azimuth locations for both ears). The maximum number of clusters, before severe overlap between clusters starts occurring, is 6. This number roughly corresponds to 4 azimuth locations per cluster. In this setting, as can be seen in Figure 1b, the distribution of filters per cluster is fairly even, with the exception of the ±90º cluster.

Following the decorrelation of the HRTFs as k increases, it is interesting that the first azimuth position that completely detaches itself into a separate cluster, for k = 2, is 180º, along with half of the 0º azimuth filters. The rest of the azimuths remain grouped in a single cluster.
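
Statements such as "180º detaches completely, along with half of the 0º filters" amount to reading a table of azimuth against cluster label. The sketch below shows one way such a table could be built. The function name `azimuth_cluster_shares` and the toy labels are hypothetical, chosen only to mimic the k = 2 description above; they are not data from the study.

```python
from collections import Counter, defaultdict


def azimuth_cluster_shares(azimuths, labels):
    """For each azimuth, return the fraction of its filters in each cluster."""
    counts = defaultdict(Counter)
    for az, lab in zip(azimuths, labels):
        counts[az][lab] += 1
    shares = {}
    for az, per_cluster in counts.items():
        total = sum(per_cluster.values())
        shares[az] = {lab: n / total for lab, n in sorted(per_cluster.items())}
    return shares


# Hypothetical labels for four filters at each of three azimuths (degrees).
azimuths = [0, 0, 0, 0, 90, 90, 90, 90, 180, 180, 180, 180]
labels = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

for az, row in sorted(azimuth_cluster_shares(azimuths, labels).items()):
    print(az, row)
# 0 {0: 0.5, 1: 0.5}
# 90 {1: 1.0}
# 180 {0: 1.0}
```

An azimuth whose share is 1.0 for a single cluster is fully concentrated there, while shares near 0.5 correspond to a position that is split between two clusters.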

Figure 2 The clustering behavior of the Left and Right Ear Ipsilateral HRTFs, for k = 4. Each gray shade corresponds to a different cluster, while the length of the sectors denotes the number of HRTFs per cluster. This figure illustrates the symmetry in the clustering tendencies of the two ears, across the median plane. Such performance is most likely connected to the symmetric properties of the human head.

When k = 4, the front and back locations on the median plane (0º and 180º respectively) get separated into different clusters, along with some filters from adjacent azimuth positions within a range of ±30º. The remaining angles are divided into a wide cluster that occupies positions from ±30º to ±105º and a narrower one for the rest (Figure 3b).

For k = 6 the two median plane clusters remain separated but become considerably narrower, including significantly fewer adjacent filters, from a range of ±15º. The remaining clusters contain the rest of the positions on the two hemispheres, with an average of 2 azimuth positions per cluster. The only exception, as mentioned before, is the ±90º cluster, which occupies a much wider area of azimuth positions ranging from ±60º to ±90º.

3.2. Contralateral content

This selection from the repository contains only contralateral content HRTFs. For the right ear, this corresponds to filters from 180º to 360º (13 azimuth positions), and for the left, to filters from 0º to 180º (13 azimuth positions) on the horizontal plane. Once again the third sub-group, which contains filters for both ears, is a composite of the previous two. The 0º and 180º azimuth positions appear in both ipsilateral and contralateral content groups, since the median plane does not clearly belong to either of these two classes.

3.2.1. Left and Right Ears

These two subsets contain 1430 HRTF filters each (110 subjects at 13 azimuth locations). The maximum value for k before severe overlap between clusters starts occurring is 4, which corresponds approximately to 3 azimuth locations per cluster. The distribution of the number of HRTFs per cluster can be seen in Figure 1c. Due to the anatomical symmetry of the human head mentioned above, the two subsets cluster proportionally. For this reason only a description of the clustering tendencies of the left ear, contralateral content HRTFs will be given.

For k = 2, as seen before, the filters on the median plane, along with filters from adjacent azimuth positions within a 30º range, get separated into a cluster. The remaining positions stay grouped together.

When k = 4, the clustering behavior of the subsets is different from what we have seen so far in the previous groups. The 0º azimuth cluster becomes more concentrated, containing considerably fewer adjacent filters, from a 15º range. Furthermore, the wide 90º cluster is also present, occupying a range from 75º to 120º. The remaining 2 clusters, on the other hand, demonstrate a different behavior, with a response that resembles the cone-of-confusion. The first spans from 15º to 45º and from 150º to 180º, while the second spans from 30º to 60º and from 135º to 150º. This behavior can also be observed in Figure 3c.
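
The azimuth sets behind the ipsilateral and contralateral groups, and the subset sizes quoted for them, follow directly from the 15º grid. The sketch below is a minimal illustration, assuming azimuth runs from 0º to 345º with 0º to 180º on the right-ear side, as stated in Section 3.1. The helper name `azimuth_groups` and the constants are hypothetical names introduced here.

```python
STEP = 15       # azimuthal increment in degrees (Section 2.1)
SUBJECTS = 110  # datasets in the repository (Section 2.1)


def azimuth_groups(ear):
    """Return (ipsilateral, contralateral) azimuth lists for 'left' or 'right'."""
    grid = list(range(0, 360, STEP))                    # 24 positions
    right_side = [az for az in grid if az <= 180]       # 0..180, 13 positions
    left_side = [az for az in grid if az >= 180] + [0]  # 180..345 plus 360 (= 0)
    if ear == "right":
        return right_side, left_side
    if ear == "left":
        return left_side, right_side
    raise ValueError("ear must be 'left' or 'right'")


ipsi, contra = azimuth_groups("right")
print(len(ipsi), len(contra))                   # 13 13
print(sorted(set(ipsi) & set(contra)))          # [0, 180]
print(SUBJECTS * len(ipsi))                     # 1430 filters per ear
print(2 * SUBJECTS * len(ipsi))                 # 2860 filters for both ears
print(2 * SUBJECTS * len(range(0, 360, STEP)))  # 5280 filters in total
```

The median plane positions 0º and 180º land in both lists, which matches the statement in Section 3.2 that they belong to both content groups.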

Figure 3 The clustering tendencies for each of the analyzed HRTF subgroups, when k = 4. a) Right Ear:
Ipsilateral Content, b) Both Ears: Ipsilateral Content, c) Left Ear: Contralateral Content, d) Both Ears: Contralateral
Content, e) Right Ear: All Content, f) Both Ears: All Content. Each gray shade corresponds to a different cluster,
while the length of the sectors denotes the number of HRTFs per cluster.

3.2.2. Both ears

This subset contains 2860 HRTF filters (110 subjects at 13 azimuth locations for both ears). In this subgroup, the maximum number of clusters, before severe overlap starts occurring, is, once again, 4. This value for k roughly corresponds to 6 azimuth positions per cluster. The distribution of the number of filters per cluster for this setting can be seen in Figure 1d.

Following the clustering tendencies of the contralateral filters as k increases, we can make the following observations. When k = 2, the median plane HRTFs, along with adjacent azimuth positions within a range of ±30º for the front and ±45º for the back, get separated from the rest of the filters into a cluster. The second cluster contains the remaining filters.

For k = 4, the distribution of azimuth locations among clusters is different from what has been observed so far. There still exists a cluster that contains the vast majority of the median plane filters. Unlike in other subsets, the 0º and 180º positions do not separate into different groups, but remain joined together along with some filters from the +15º and ±135º azimuth locations. There also still exists the ±90º cluster, which spans from ±30º to ±120º. Of the remaining 2 clusters, one occupies filters in the ±105º to ±120º range, and the second exhibits the “cone-of-confusion” behavior, covering the ±150º to ±165º and the ±15º to ±30º ranges. These clustering tendencies can be viewed in Figure 3d.
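
The “cone-of-confusion” clusters pair a frontal range with a rear one, for example ±15º to ±30º with ±150º to ±165º. On a horizontal-plane grid these ranges are front-back mirror images of one another about the interaural axis. The sketch below illustrates that relation only. `front_back_mirror` is a hypothetical helper, and signed azimuths with 0º in front and ±90º at the ears are an assumed convention.

```python
def front_back_mirror(azimuth):
    """Reflect a signed azimuth (degrees) about the interaural axis."""
    mirrored = (180 - azimuth) % 360
    # Fold back into the signed range (-180, 180].
    return mirrored - 360 if mirrored > 180 else mirrored


front = [15, 30, -15, -30]
print([front_back_mirror(az) for az in front])  # [165, 150, -165, -150]
```

Under this mapping the ±90º positions are their own mirror images, which is consistent with the wide cluster around the interaural axis never splitting into a front and a back part.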


3.3. All content

This last group contains filters of both ipsilateral and contralateral content. More specifically, all available azimuth locations on the horizontal plane are divided into left and right ear sub-groups (24 azimuth positions per ear). The third sub-group is a composite of the previous two, holding HRTFs of both ears (48 azimuth positions).

3.3.1. Left and Right Ears

These two sub-groups contain 2640 HRTF filters each (110 subjects at 24 azimuth locations). The most meaningful number of clusters for this collection is 6, which is equivalent to approximately 4 azimuth locations per cluster. For larger values of k, multiple clusters begin to span the same range, some of them lacking a significant peak at an azimuth location. The distribution of filters per cluster for the maximum allowed value of k can be observed in Figure 1e. Due to the observed clustering symmetry across the left and right ear subsets, only the right ear clustering tendencies will be described here.

For k = 2, the ipsilateral content HRTFs get separated from the contralateral content ones into different clusters. As expected, the majority of the median plane filters get grouped along with the ipsilateral HRTFs.

When k = 4, two of the clusters cover, mainly, the ipsilateral and the rest the contralateral content filters. More specifically, the first cluster contains the median plane HRTFs along with filters from adjacent positions within a range of ±15º, while the second cluster contains the remaining ipsilateral azimuth positions, spanning from 15º to 150º. For the contralateral content, there is a centre cluster ranging from -45º to -135º, and another one that exhibits the “cone-of-confusion” behavior, containing filters from the -135º to 180º and the -15º to -60º azimuth positions (Figure 3e).

For k = 6, half of the clusters cover the ipsilateral content HRTFs, and the rest the contralateral content ones. For the ipsilateral case, one cluster covers the azimuth positions from 0º to 30º, another from 45º to 120º, and the last from 135º to 165º. For the contralateral case, the first cluster spans from -60º to -120º, while the remaining two reveal a “cone-of-confusion” behavior, spanning from -15º to -30º and -150º to -180º, and from -45º to -60º and -135º to 150º respectively. The interesting finding in these subgroups is that for this value of k the 0º azimuth positions get clustered with the ipsilateral HRTFs, while the 180º ones get clustered with the contralateral HRTFs.

3.3.2. Both ears

This subset contains 5280 filters (110 subjects at 24 azimuth locations for both ears). It is a composite of the left and right ear subgroups and contains all the filters from the studied HRTF repository. For this case, the selected optimum value for k is 8, which corresponds, roughly, to 6 azimuth positions per cluster. The distribution of filters per cluster for this setting can be seen in Figure 1f.

Following the clustering tendencies of the HRTFs as the number of clusters increases, the following observations can be made. When k = 2, the ipsilateral and contralateral content filters get split into two different clusters. Yet, while the frontal median plane HRTFs get primarily grouped in the ipsilateral content cluster, the 180º azimuth positions are almost evenly split between both.

For k = 4, two of the clusters contain ipsilateral and the remaining two contralateral content filters. There is one wide centre cluster for the ipsilateral content, ranging from ±30º to ±135º. There is also one containing the median plane positions along with adjacent filters within a range of ±15º for the front and ±30º for the back. For the contralateral filters, there is also a wide centre cluster spanning from ±60º to ±120º, and another one covering the ±15º to ±45º and -135º to -180º range, demonstrating a “cone-of-confusion” behavior (Figure 3f).

When k = 6, two clusters primarily contain the front and back of the median plane filters respectively, another two the ipsilateral, and the remaining two the contralateral content filters. More specifically, of the median plane clusters, the first contains the majority of the 0º azimuth HRTFs along with adjacent filters within a range of ±30º. The second one contains the majority of the 180º filters along with the ±165º and the contralateral filters of the ±15º and ±30º azimuth positions. Of the two ipsilateral-content clusters, one covers the range from ±45º to ±120º and the second from ±135º to ±165º, while the contralateral content ones demonstrate the previously seen behavior of a wide centre cluster, spanning from ±60º to ±120º, and the “cone-of-confusion” one, ranging from +15º to 30º and from ±135º to ±165º.

For k = 8, two clusters contain HRTFs from the front and back azimuth positions of the median plane. Of the remaining 6, half operate on ipsilateral and the rest on contralateral content filters. The clustering tendencies are very similar to the ones of k = 6. Yet, as expected, the clusters are now narrower and more concentrated, with reduced overlap between adjacent azimuth positions. The contralateral content HRTFs maintain the “clustering along the cone of confusion” behavior, while the ipsilateral content ones are just symmetrical across the left and right hemisphere of the head. For values of k greater than 8 severe overlap occurs, as multiple clusters tend to occupy the same azimuthal space. Most of the clustering noise normally occurs in the contralateral content filters.

3.4. The CIPIC Database

This last group contains a subset of HRTFs found in the CIPIC database [1]. More specifically, it is comprised of 2160 filters (45 subjects x 24 azimuth locations x 2 ears) on the horizontal plane (0º elevation), advancing in 15º azimuthal increments. No further standardization steps were taken.

Due to space limitations in this paper, only the clustering tendencies of the whole content of the CIPIC subset will be described. The most meaningful number of clusters for this set is 4, which roughly corresponds to 6 azimuth positions per cluster. Any attempt to further increase the value of k only introduced overlaid clusters over the same range of azimuth locations. The distribution of filters per cluster when k = 4 can be viewed in Figure 4.

Monitoring the clustering tendencies of the CIPIC subset as the value of k increases, one can make the following observations. For k = 2, ipsilateral content filters get separated from contralateral content ones. The zero azimuth position is exclusively grouped in the ipsilateral cluster along with filters from within a range of ±30º. Interestingly enough, the HRTFs from the 180º azimuth position are evenly distributed between the two clusters.

When k = 4, two clusters contain ipsilateral and two contralateral content filters. Yet, although the latter exhibit the known “cone-of-confusion” behavior, with one centre cluster from ±75º to ±120º and the other from ±30º to ±60º and from ±165º to ±135º, the clustering results of the ipsilateral filters are considerably different from what has been observed so far. The first of the two ipsilateral clusters contains HRTFs on the median plane, along with filters from adjacent azimuth positions within a range of ±30º. The second one, however, encloses filters from -120º to 120º in one very wide cluster, along with half of the 0º azimuth HRTFs. The described clustering behaviors can be viewed in Figure 4.

Figure 4 Top: The distribution of HRTF filters per cluster in the CIPIC database. Bottom: The clustering behavior of the CIPIC Database, for k = 4. The results are compared to those of Figure 3f.

4. DISCUSSION

In the previous section, a detailed description of the clustering tendencies of an HRTF data collection, comprised of filters exclusively on the horizontal plane, was given. The further partition of this data set into sub-groups based on the content of the HRTF filters (ipsilateral or contralateral) revealed some interesting patterns in the way the data was grouped.

First of all, the maximum number of meaningful clusters in a given HRTF collection depends not only on the total number of filters available, but also, as expected, on the spatial information and the described spatial variation of the set. This is the main reason why the contralateral content sub-groups, studied in the previous section, were divided into 4 clusters as opposed to 6 for the ipsilateral and 8 for the all-content ones. While the concentration of each cluster improves as k increases within the suggested limits (clusters become narrower and better centered), any attempt to further increase the number of clusters in a set introduces redundancy by means of multiple overlaid clusters on the same azimuth locations.

Moreover, left and right ear HRTF banks were found to cluster in a symmetrical manner. Such behavior is believed to be correlated with the anthropometric properties of the human head, which, despite the anatomical differences among individuals that account for the personalized properties of HRTFs, carries a generally symmetric outline. This observation holds true for both ipsilateral and contralateral filters, although it is more clearly observed in the first case. For this reason minimum-phase HRTFs can be successfully applied to either of the two ears with the correct insertion of the Interaural Time Difference (ITD).

Furthermore, depending on the content of filters in each analyzed HRTF sub-group (ipsilateral, contralateral, or both), the first filters that separate from the rest are either the median plane ones (first two cases) or the ipsilateral from the contralateral content ones (latter case). Yet, it is also interesting that, as the number of clusters available increases, the 0º azimuth position tends to group together with ipsilateral filters, while the 180º position groups with contralateral ones.

In addition, it is worth restating the differences in the clustering behavior of the individual HRTF groups based on the content of their filters. More specifically, ipsilateral content sets group adjacent azimuth positions into the same cluster, keeping the median plane filters separate. Contralateral content groups, on the other hand, demonstrate a “cone of confusion” clustering behavior, with clusters centered approximately across the 90º axis. As expected, HRTF collections with both ipsilateral and contralateral content filters maintain the clustering tendencies of both sub-groups.

Finally, there are some observations that remain consistent irrespective of the number of clusters and the content of each subset. For example, the clusters with the highest concentration are those occupying positions on the median plane (0º and 180º). Additionally, there is always a wide cluster, centered at approximately 90º azimuth, spanning roughly ±30º around the interaural axis. This cluster will not easily be partitioned any further, since for an increased value of k, multiple clusters will just overlap over the same space.

A comparison of the discussed tendencies and the clustering behavior of the CIPIC database subset revealed the limitations of the latter in clearly conveying grouping patterns between azimuth positions. The maximum number of meaningfully distributed clusters was considerably lower than that of the equivalent case in our repository. This limitation led to a deficient grouping of the data into unevenly distributed clusters, which only succeeded in separating the ipsilateral from the contralateral content. It is, therefore, believed that larger and more complete HRTF repositories will reveal even more explicitly the clustering tendencies of HRTF filters.

5. CONCLUSIONS / FUTURE WORK

This study includes a detailed description of the clustering tendencies of a large, standardized HRTF repository, based on k-means clustering. The clarity of the clustering results was evaluated by comparison to those of the CIPIC database. The results revealed that not only the length, but also the carried spatial information and the described spatial variation of an HRTF collection, play an important role in the quality of the clustering results.

Future work will begin with studying the clustering behavior of even larger HRTF databases that would account for the characteristics of more individuals, and with an attempt to reduce the size of the search-space for similarities across different filters, by investigating the appropriate low level features that would describe each dataset best.

6. REFERENCES

[1] Algazi, V. R., Duda, R. O., Thompson, D. M., and Avendano, C. "The CIPIC HRTF Database". Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Electroacoustics, pp. 99-102, Mohonk Mountain House, New Paltz, NY, October 2001.

[2] Andreopoulou, A., and Roginska, A. "Towards the Creation of a Standardized HRTF Repository". Proc. of the 131st AES Convention, New York, NY, October 2011.

[3] Fahn, C., and Lo, Y. C. "On the clustering of Head-Related-Transfer Functions used for 3-D Sound Localization". Journal of Information Science and Engineering, Vol. 19, pp. 141-157, 2003.

[4] Faller, K. J., II, Barreto, A., and Adjouadi, M. "Decomposition of Head-Related Transfer Functions Based on the Hankel Total Least Squares Method". Proc. of the Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop, pp. 161-166, Marco Island, FL, January 2009.

[5] Gardner, B., and Martin, K. D. "HRTF Measurements of a KEMAR". Journal of the Acoustical Society of America, Vol. 97, Issue 6, pp. 3907-3908, June 1995.

[6] Gupta, N., Barreto, A., Joshi, M., and Agudelo, J. C. "HRTF Database at FIU DSP Lab". Proc. of the IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp. 169-172, Dallas, TX, March 2010.

[7] Hu, H., Chen, L., and Wu, Z. Y. "The Estimation of Personalized HRTFs in Individual VAS". Proc. of the Fourth International Conference on Natural Computation, Vol. 1, pp. 203-207, Jinan, October 2008.

[8] Hwang, S., and Park, Y. "Interpretations on principal components analysis of head-related impulse responses in the median plane". Journal of the Acoustical Society of America, 123 (4), pp. EL65-EL71, March 2008.

[9] Katz, B. F. G. "Boundary element method calculation of individual head-related transfer function". Journal of the Acoustical Society of America, 110 (5), Pt. 1, pp. 2440-2448, 2005.

[10] Lemaire, V., Clérot, F., Busson, S., Nicol, R., and Choqueuse, V. "Individualized HRTFs From Few Measurements: a Statistical Learning Approach". Proc. of the IEEE International Joint Conference on Neural Networks, 2005 (IJCNN’05), Vol. 4, pp. 2041–2046, Montreal, Canada, July 2005.

[11] Nicol, R., Lemaire, V., Bondu, A., and Busson, S. "Looking for a Relevant Similarity Criterion for HRTF Clustering: A Comparative Study". Proc. of the 120th AES Convention, Paris, France, May 2006.

[12] So, R. H. Y., Ngan, B., Horner, A., Braasch, J., Blauert, J., and Leung, K. L. "Toward orthogonal non-individualised head-related transfer functions for forward and backward directional sound: cluster analysis and an experimental study". Ergonomics, Vol. 53, No. 6, pp. 767-781, June 2010.

[13] Witten, I. H., and Frank, E. "Data Mining. Practical Machine Learning Tools and Techniques", Second Edition. San Francisco: Elsevier Inc, 2005.

[14] Zotkin, D. N., Hwang, J., Duraiswami, R., and Davis, L. S. "HRTF Personalization Using Anthropometric Measurements". Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 157-160, New Paltz, NY, October 2003.

[15] http://recherche.ircam.fr/equipes/salles/listen/index.html
