net/publication/224234249
2 authors, including:
Juergen Freudenberger
Cyberagentur - Germany's Cyber Security Innovation Agency
All content following this page was uploaded by Juergen Freudenberger on 08 March 2015.
Abstract
In cars, communication between different seat rows is difficult due to road noise and the acoustic situation (seating positions and lines of sight). Drivers typically tend to turn their heads when talking to rear seat passengers, unconsciously endangering their own and the passengers' safety. In this paper, we address the enhancement of speech intelligibility, particularly for the scenario where the driver or co-driver talks to rear seat passengers. We propose an in-car communication system that amplifies the acoustic propagation from front to rear, taking into account the acoustic conditions in a car. In particular, we present a spectral subtraction approach for noise and feedback suppression, as well as frequency shift and feedback cancellation modules for such systems.
1 Introduction
Although luxury vehicles provide a noise-reduced environment in the cabin, communication between the front and rear seats is difficult. In contrast to a normal conversation, in cars there is road noise, and the positions and lines of sight of the passenger seats are fixed, which impairs speech understanding. Usually, passengers do not feel comfortable conducting long conversations. Frequently, the driver is tempted to turn his or her head in order to improve the communication. Thus, for safety and comfort reasons, a system which supports natural communication between passengers is desirable. In this paper, we discuss an in-car communication system which improves speech communication in a vehicle. Such a system basically works as an intercom between the different passenger seats. Furthermore, it can serve as the acoustic front-end for other applications like hands-free telephony, voice-controlled devices, broadcast services, and dialog systems. Similar concepts are considered, for example, in [1], [7], [5], and [8].

Usually, communication systems are associated with bi-directional communication, i.e., an in-car communication system may be expected to amplify or improve the communication from front seat passengers to rear seat passengers and vice versa. However, practical experience and experiments have shown that the front-to-rear communication path requires more attention concerning signal improvement [3]. First, the directivity of a human head (mouth) is forward-oriented, especially at higher frequencies. Second, upper mid-range cars featuring hands-free telephony already provide all microphones and loudspeakers which are required for enhanced front-to-rear communication. Third, given typical microphone positions (no close-talking microphones), measurements have shown that it is possible to achieve a gain of 10 dB from the front seats to the back seats by simply amplifying the signal without any further processing (noise reduction), whereas the achievable gain for the communication from rear to front is lower. For the communication between driver and co-driver, the gain is not even significantly larger than 0 dB [5].

Figure 1: Seating position of passengers in a car and the acoustic propagation of speech from the driver to the right rear seat passenger.

In-car communication systems have tight restrictions on the maximum delay and the allowable amplification for the transmission from the speaking to the listening person; otherwise, the sound of the system is usually not acceptable. For instance, if the system gain is too high and the delay too long, the speaking person perceives his or her own echo. Moreover, high gains and long delays may lead to situations where the listening person localizes the speaker in the direction of the loudspeaker. This localization mismatch is rather disturbing and should therefore be avoided. Typically, the overall delay (including A/D and D/A conversion) should not exceed 10 ms [8]. Due to this delay restriction, the signal processing for in-car communication systems is usually performed in the time domain, and block processing is avoided. This work presents a spectral subtraction approach where the filter is calculated in the frequency domain, whereas the actual filtering is performed in the time domain. The filter is used to suppress background noise as well as to prevent howling due to acoustic feedback.

In this paper, we propose a half-duplex communication system amplifying speech signals from the front seats to the rear seats in addition to the acoustic propagation of these signals. The scenario is outlined in Fig. 1. It should be noted that our system is specifically designed for use with microphones and loudspeakers which are already provided by hands-free telephony and sound reproduction systems and, thus, can be seen as an efficient add-on that does not require much extra hardware. As opposed to the systems described in [5] or [8], our system cannot rely on specially tailored microphones and loudspeakers at optimal positions, but nevertheless features an audible and
measurable enhancement of the communication from the front seats to the rear seats.

In our experiments, we limit our considerations to the communication from the driver to the right rear seat passenger. In Fig. 1, this acoustic path is indicated by a dotted line; further acoustic propagation paths are represented by solid lines. Speech signals are captured by two microphones located in the rear-view mirror, and the processed signal is output by two loudspeakers (one in the door and one in the back shelf). Although not explicitly shown in this illustration, we always examine both paths, to the left and to the right ear of the passenger. In the remainder of this paper, we describe the architecture of our proposed communication system and discuss the results of our experiments.

2 System Architecture

The architecture of our proposed system is depicted in Fig. 2. Speech signals captured by both microphones are first processed by a delay-and-sum beamformer. This beamformer achieves a signal-to-noise ratio gain of 2 to 3 dB for frequencies above 1 kHz, while the gain is only marginal at lower frequencies. Using a fixed beamformer instead of an adaptive one reduces the complexity of feedback cancellation, because only one feedback canceler is required, in contrast to one canceler per microphone with adaptive beamformers.

[Figure 2: Architecture of the proposed system (caption not recoverable; surviving block labels: "noise level", "AGC").]

Due to the acoustic paths from the loudspeakers to the microphones, the in-car communication system is a closed-loop system that may become unstable if the system gain is too large. The task of the feedback canceler is to estimate the acoustic feedback and subtract it from the beamformer output signal. However, feedback cancellation is extremely difficult due to the strong correlation between the speech signal and the loudspeaker output signal; in this respect, the conditions of our system resemble those of hearing aids [4]. Usually, additional means to suppress rising feedback tones are required. In [5] and [8], adaptive FIR filters are used to predict and suppress periodic signal components. In our system, we use the spectral subtraction algorithm to reduce the background noise level as well as to attenuate feedback frequencies.

Conventionally, noise reduction is performed in the frequency domain, e.g., by spectral subtraction. The objective of the spectral subtraction algorithm is to estimate the noise component of the short-time signal spectrum and to calculate an appropriate filter that attenuates noisy signal components. At the 16 kHz sampling rate of our system, spectral subtraction typically operates on overlapping signal blocks of 256 or 512 samples from which the spectrum is calculated.

In order to estimate the noise component appropriately, a robust method for voice activity detection (VAD) is required. To avoid frequent toggling between pause and speech decisions, which is very likely to occur in simple power-based VAD algorithms, we compare the signal power in different frequency bands (below 750 Hz, 750 Hz to 1875 Hz, above 1875 Hz), and we consider a modified power spectrum $\tilde{\Phi}_{xx}(f,k)$ at discrete time $k$ that depends on previous values and is calculated as

$$\tilde{\Phi}_{xx}(f,k) = (1-\delta)\,\tilde{\Phi}_{xx}(f,k-1) + \delta\,\Phi_{xx}(f,k),$$

where $\delta \in [0, 1]$ is approximately 0.3 and $\Phi_{xx}(f,k)$ is the current power spectrum. Knowing the current signal power spectrum $\Phi_{xx}(f)$ and an estimate $\Phi_{nn}(f)$ of the noise power spectrum, the filter is determined according to

$$\hat{H}^{ss}(f) = \sqrt{1 - \frac{\Phi_{nn}(f)}{\Phi_{xx}(f)}},$$

and in conventional block processing it would be applied to the signal in the frequency domain before an inverse FFT is applied. The delay caused by such block processing is not tolerable for in-car communication. To overcome this restriction, we apply the inverse FFT to the transfer function of the filter and obtain the output signal by convolution in the time domain, where the filter coefficients are updated every 2 ms. To remove musical noise, the time-domain filter coefficients are recursively smoothed, i.e., if $h^{ss}_i(k)$ denotes the $i$th coefficient of $\mathcal{F}^{-1}\{\hat{H}^{ss}(f)\}$ at time $k$, the smoothed coefficient is

$$\hat{h}^{ss}_i(k) = (1-\gamma)\,\hat{h}^{ss}_i(k-1) + \gamma\,h^{ss}_i(k),$$

where $\gamma$ is a constant in the range $[0.1, 0.9]$.

On the one hand, the frequent update of the noise reduction filter increases the computational complexity of the system. On the other hand, simulations show that a 64- or 128-point FFT is sufficient to obtain the required SNR gain of 3 to 5 dB, so that the overall complexity is similar to that of conventional noise reduction algorithms.

In order to avoid feedback, a nonlinearity is introduced into the system. Here, we choose a frequency shift method similar to the one described in [6]: the single-sideband signal is not shifted by a fixed frequency offset (e.g., 5 Hz) but by a variable offset changing from 0 Hz to 10 Hz at a rate of 5 Hz ("frequency warbling"). The frequency shift is only used at high noise levels, since in this case a higher signal amplification is required (increasing the risk of feedback). At lower noise levels, where even the minor distortion introduced by the offset would disturb the auditory impression, the frequency shift is switched off.

The theory behind frequency shifting requires a single-sideband signal which is multiplied by a complex exponential, so that the output signal is

$$s_{\text{shifted}}(k) = \Re\left\{ s_{\text{SSB}}(k)\, e^{\Omega_{\text{shift}} \cdot k} \right\},$$
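where $s_{\text{SSB}}(k)$ is the single-sideband input signal and $\Omega_{\text{shift}} = 2\pi j \cdot f_{\text{shift}}(k)/f_s$, with variable frequency offset $f_{\text{shift}}(k)$ and sampling frequency $f_s$. In our implementation, we follow the approach shown in Figure 3, which uses […]

[Figure 3: Block diagram of the frequency shifter: $s(k)$ passes through a delay $\delta(t-L)$ and is multiplied by $\cos(\Omega_{\text{shift}} \cdot k)$; its Hilbert transform is multiplied by $\sin(\Omega_{\text{shift}} \cdot k)$; the two branches are combined (+/−) to give $s_{\text{shifted}}(k)$.]

[Spectrograms "without support" and "with support" (axes: Frequency, Time).]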
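The frequency shifter of Figure 3 can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation: we assume a 127-tap Hamming-windowed FIR Hilbert transformer (the paper does not specify its design), realize the delay $\delta(t-L)$ as the FIR's group delay $L = 63$ samples, and assume a sinusoidal warbling law that sweeps the offset between 0 Hz and 10 Hz at a 5 Hz rate. The function names `hilbert_fir`, `warble_offset`, and `frequency_shift` are our own.

```python
import math


def hilbert_fir(num_taps=127):
    """Hamming-windowed FIR approximation of a Hilbert transformer (odd length)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    center = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - center
        ideal = 2.0 / (math.pi * m) if m % 2 else 0.0  # ideal: 2/(pi*m) for odd m, else 0
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def warble_offset(k, fs, rate_hz=5.0, max_offset_hz=10.0):
    """Hypothetical warbling law: offset swings sinusoidally between 0 and max_offset_hz."""
    return 0.5 * max_offset_hz * (1.0 - math.cos(2.0 * math.pi * rate_hz * k / fs))


def frequency_shift(x, fs, offset_hz, num_taps=127):
    """Single-sideband shift: y(k) = x(k-L) cos(phi(k)) - H{x}(k) sin(phi(k)).

    offset_hz is a callable k -> shift in Hz. The phase is accumulated sample by
    sample so that a time-varying offset yields a continuous output.
    """
    h = hilbert_fir(num_taps)
    delay = (num_taps - 1) // 2  # L: matches the group delay of the Hilbert FIR
    phase = 0.0
    y = []
    for k in range(len(x)):
        # Hilbert branch: causal FIR convolution with zero initial state
        x_hat = sum(h[n] * x[k - n] for n in range(min(num_taps, k + 1)))
        # delay branch, aligned with the Hilbert branch
        x_delayed = x[k - delay] if k >= delay else 0.0
        y.append(x_delayed * math.cos(phase) - x_hat * math.sin(phase))
        phase = (phase + 2.0 * math.pi * offset_hz(k) / fs) % (2.0 * math.pi)
    return y
```

Because the offset varies over time, the sketch accumulates the phase instead of evaluating $\Omega_{\text{shift}} \cdot k$ directly, which keeps the output continuous while the offset warbles. For example, `frequency_shift(x, 16000, lambda k: warble_offset(k, 16000))` applies the warbling shift to a signal `x` sampled at 16 kHz. Note that this assumed FIR length alone adds 63 samples (about 3.9 ms at 16 kHz) of delay, so a real-time design would need to budget it against the 10 ms limit.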
[…] microphones. A fixed delay of approximately 2 ms is inserted in our simulation to model the delay introduced by the A/D and D/A converters.

[Figure (caption not recoverable): plot over Frequency [Hz], 200 Hz to 6400 Hz.]

The processed signal is output via two loudspeakers, one located in the door and one in the back shelf. The balance between the two loudspeakers can be adjusted with a system variable in our simulations. With this variable at its default value of 0.5, the signal is output in equal measure by both loudspeakers.
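The noise-reduction filter of Section 2 can be sketched as follows: the gain $\hat{H}^{ss}(f)$ is computed from a 64-point spectrum every 32 samples (2 ms at 16 kHz), turned into time-domain coefficients by an inverse DFT, smoothed with $\gamma$, and applied by sample-wise convolution. This is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: a Hann analysis window, a direct DFT in place of an FFT, clamping of negative gain arguments to zero, centering the impulse response at tap $N/2$ to make it causal, $\gamma = 0.5$, and a noise spectrum `phi_nn` supplied by the caller (the VAD-driven noise estimation is not sketched). All function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """|DFT|^2 of a Hann-windowed frame (direct DFT; adequate for 64 or 128 points)."""
    n = len(frame)
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    return [abs(sum(frame[i] * win[i] * cmath.exp(-2j * math.pi * f * i / n)
                    for i in range(n))) ** 2
            for f in range(n)]


def subtraction_gain(phi_xx, phi_nn):
    """H(f) = sqrt(1 - Phi_nn(f) / Phi_xx(f)); negative arguments clamped to 0 (assumption)."""
    return [math.sqrt(max(0.0, 1.0 - nn / xx)) if xx > 0.0 else 0.0
            for xx, nn in zip(phi_xx, phi_nn)]


def gain_to_taps(gain):
    """Inverse DFT of the real, symmetric gain, centered at tap N/2 so the FIR is causal."""
    n = len(gain)
    return [sum(gain[f] * cmath.exp(2j * math.pi * f * (i - n // 2) / n)
                for f in range(n)).real / n
            for i in range(n)]


def noise_reduce(x, phi_nn, n_fft=64, hop=32, gamma=0.5):
    """Design the filter blockwise in the frequency domain, filter samplewise in time.

    hop=32 samples corresponds to 2 ms at 16 kHz. phi_nn is a noise power
    spectrum estimate of length n_fft supplied by the caller.
    """
    taps = [0.0] * n_fft
    taps[n_fft // 2] = 1.0  # pass-through (pure delay of N/2 samples) until the first update
    y = []
    for k in range(len(x)):
        if k >= n_fft and k % hop == 0:
            phi_xx = power_spectrum(x[k - n_fft:k])
            raw = gain_to_taps(subtraction_gain(phi_xx, phi_nn))
            # recursive smoothing of the time-domain coefficients against musical noise
            taps = [(1.0 - gamma) * old + gamma * new for old, new in zip(taps, raw)]
        # time-domain convolution: no block delay beyond the filter's own N/2 samples
        y.append(sum(taps[i] * x[k - i] for i in range(min(n_fft, k + 1))))
    return y
```

Centering the zero-phase impulse response makes the filter causal at the cost of $N/2$ samples of delay, here 32 samples or 2 ms, which has to fit within the overall delay budget of about 10 ms. With `phi_nn` set to zero, the gain is one in every bin and the filter reduces to this pure delay.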
[Figure 6 plot: SNR at passenger's ear [dB] versus SNR at front microphone [dB]. Figure 7 plot: SNR [dB] versus Frequency [Hz], one-third octave bands from 200 Hz to 6400 Hz.]

Figure 6: SNR (in dB) at front microphone and rear seat passenger.

Figure 7: SNR (in dB) in relevant one-third octave bands.
[…] output signal) and visual impression (looking at the spectrogram), we also include objective metrics in our system evaluation. First, we consider the delay, which we measure with the aid of composite speech signals (speech signals containing white noise bursts of 200 ms length, repeated every second) at a high SNR (engine turned off). Cross-correlation measurements with the in-car communication system turned off yield a delay of the acoustic path of 61 samples (3.8 ms at 16 kHz). Including the system and subtracting the cross-correlation of the acoustic propagation, we obtain a system delay of 110 samples (7 ms). With such a small delay, the first wavefront at the rear seat passenger arrives from the front (and not from the loudspeakers behind him), leaving the correct impression that the signal is coming from the driver, according to the Haas effect [2].

Having determined the delays, we perform SNR measurements at different noise levels. As shown in Figure 6, we measure the reference SNR at the front microphone. Then we determine the signal-to-noise ratio at the rear seat passenger's left ear with the in-car communication system switched off (without support) and on (with support). The curves in the diagram show that our system achieves an SNR improvement of at least 3 dB, which increases further as the overall SNR decreases.

The distribution of the SNR improvement over the one-third octave bands is illustrated in Figure 7. Comparing the dashed (without system support) and solid (with support) lines, it can be observed that the SNR improvement introduced by our in-car communication system primarily occurs in the relevant bands between 300 Hz and 6400 Hz, confirming the acoustic impression that the signal is actually enhanced.

4 Conclusion

In this paper, we have presented a half-duplex in-car communication system enhancing the acoustic propagation from front passengers to rear passengers inside a car. The system has low computational complexity and has been implemented on the digital signal processor TMS320C6713 from Texas Instruments. Our experiments on real data show an audible enhancement of the speech signal, which can not only be visualized in spectrograms, as shown in Fig. 5, but also be quantified by signal-to-noise ratio measurements, as outlined in Figures 6 and 7. For the half-duplex communication from front seat passengers to rear seat passengers, our system features an SNR improvement of 3 to 7 dB at typical noise levels in a car cabin.

Acknowledgements

Research for this article was supported by the German Federal Ministry of Education and Research (Grant No. 17 N11 08).

References

[1] B. M. Finn. Integrated vehicle voice enhancement system and hands-free cellular telephone system. European Patent EP 0 932 142 A2, Jul. 1999.
[2] H. Haas. The influence of a single echo on the audibility of speech. J. Audio Eng. Soc., 20:145–159, March 1972.
[3] E. Hänsler and G. Schmidt, editors. Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing. Springer, 2006.
[4] J. M. Kates. Signal processing for hearing aids. Chapter 6 in M. Kahrs and K. Brandenburg (eds.), Applications of Digital Signal Processing to Audio and Acoustics. Kluwer Academic Publishers, 1998.
[5] K. Linhard and J. Freudenberger. Passenger in-car communication enhancement. In Proc. EUSIPCO, Vienna, pages 21–24, 2004.
[6] G. Nishinomiya. Improvement of acoustic feedback stability of public address system by warbling. In Proceedings of the Sixth International Congress of Acoustics, pages 93–96, 1968.
[7] K. Schaaf, J. Schultz, and K. Tontch. Digital voice enhancement for improved in-car communication. In Proc. 3rd IFAC Workshop Advances in Automotive Control, Karlsruhe, Germany, March 2001.
[8] G. Schmidt and T. Haulick. Signal processing for in-car communication systems. Signal Processing, 86(6):1307–1326, 2006.