Enhancement Based On Auditory Masking

In the phenomenon of auditory masking, one sound component is concealed by the presence
of another sound component. Heretofore in this chapter and throughout the text, we have
sporadically made use of this auditory masking principle to reduce the perception of noise.
In one case, we exploited the masking of quantization noise by a signal, both noise and signal
occurring within a particular frequency band. In another, we exploited the masking of additive
noise by rapid change in a signal, both noise and signal change occurring at a particular time
instant. These two different psychoacoustic phenomena are referred to as frequency and temporal
masking, respectively.
Research in psychoacoustics has also shown that we can have difficulty hearing weak signals that
fall in the frequency or time vicinity of stronger signals (as well as those superimposed in time
or frequency on the masking signal, as in the above two cases). A small spectral component may
be masked by a stronger nearby spectral component. A similar masking can occur in time for two
closely spaced sounds. In this section, this principle of masking is exploited for noise reduction
in the frequency domain. While temporal masking by adjacent sounds has proven useful,
particularly in wideband audio coding, it has been less widely used in speech processing because
it is more difficult to quantify. We begin with a further look at frequency-domain masking that
is based on the concept of a critical band. Then, using the critical band paradigm, we describe
an approach to determine the masking threshold for complex signals such as speech. The speech
masking threshold is the spectral level (determined from the speech spectrum) below which
non-speech components are masked by speech components in frequency. Finally, we illustrate the
use of the masking threshold in two different noise reduction systems that are based on
generalizing spectral subtraction.

13.5.1 Frequency-Domain Masking Principles

The basilar membrane, located at the front end of the human auditory system, can be modeled as
a bank of about 10,000 overlapping bandpass filters, each tuned to a specific frequency (the
characteristic frequency) and with bandwidths that increase roughly logarithmically with
increasing characteristic frequency. These physiologically based filters thus perform a spectral
analysis of the sound pressure level appearing at the eardrum. In contrast, there also exist
psychoacoustically based filters that relate to a human's ability to perceptually resolve sound
with respect to frequency. The bandwidths of these filters are known as the critical bands of
hearing and are similar in nature to the physiologically based filters.
Frequency analysis by a human has been studied by using perceptual masking. Consider a
tone at some intensity that we are trying to perceive; we call this tone the maskee. A second tone,
adjacent in frequency, attempts to drown out the presence of the maskee; we call this adjacent
tone the masker. Our goal is to determine the intensity level of the maskee (relative to the
absolute level of hearing) at which it is not audible in the presence of the masker. This intensity
level is called the masking threshold of the maskee.
The general shape of the masking curve for a masking tone at frequency Ω0 with a particular
sound pressure level (SPL) in decibels was first established by Wegel and Lane. Adjacent tones
that have an SPL below this curve are not audible in the presence of the tone at Ω0. We see
then that there is a range of frequencies about the masker whose audibility is affected. Maskee
tones above the masking frequency are more easily masked than tones below this frequency. The
masking threshold is therefore asymmetric, the masking threshold curve for frequencies higher
than Ω0 having a milder slope.

The steepness of this slope at the higher frequencies depends on the level of the masking
tone at frequency Ω0, with a milder slope as the level of the masking tone increases. On the other
hand, for frequencies lower than Ω0, the masking curve is modeled with a fixed slope.
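
The asymmetry described above can be sketched numerically. The following Python fragment is
only an illustration, not the Wegel and Lane curve itself: the function name, the warped
frequency axis, and every slope constant are assumptions, chosen to show a fixed steep slope
below the masker and a level-dependent, milder slope above it.

```python
import numpy as np

def masking_curve_db(x, x0, level_db, lower_slope=27.0,
                     upper_base=24.0, upper_per_db=0.2, upper_min=5.0):
    """Illustrative two-slope masking curve in dB (all constants are assumed).

    x        : positions on a warped (roughly logarithmic) frequency axis
    x0       : position of the masking tone on the same axis
    level_db : SPL of the masking tone in dB
    """
    x = np.asarray(x, dtype=float)
    # Above the masker, the slope becomes milder as the masker level rises.
    upper_slope = max(upper_base - upper_per_db * level_db, upper_min)
    dx = x - x0
    # Below the masker, the slope is fixed and steep.
    return np.where(dx < 0.0,
                    level_db + lower_slope * dx,
                    level_db - upper_slope * dx)
```

In this sketch a maskee at position x would be treated as inaudible when its level falls below
masking_curve_db(x, x0, level_db). The peak is placed at the masker level purely for simplicity.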

Another important property of masking curves is that their bandwidth increases roughly
logarithmically as the frequency of the masker increases. In other words, the range of
frequencies affected by the masker increases as the frequency of the masking tone increases.
This range of frequencies in which the masker and maskee interact was quantified by Fletcher
through a different experiment. In Fletcher's experiment, a tone (the maskee) is masked by a
band of noise centered at the maskee frequency. The level of the tone was set so that the tone
was not audible in the presence of wideband white noise. The bandwidth of the noise was then
decreased until the tone became audible. This experiment was repeated at different frequencies,
and the resulting bandwidths were dubbed by Fletcher the critical bands. The critical band also
relates to the effective bandwidth of the general masking curve. Critical bands reflect the
frequency range in which two sounds are not experienced independently but are affected by each
other in the human perception of sound, thus also relating to our ability to perceptually
resolve frequency components of a signal.
The roughly logarithmically increasing width of the critical band filters suggests that about
24 critical band filters cover our maximum frequency range of 15000 Hz for human perception.
A means of mapping linear frequency to this perceptual representation is through the bark
scale. In this mapping, one bark covers one critical band, with the bark number expressed as a
function of frequency.
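
As a concrete illustration of such a mapping, the sketch below uses one widely quoted
closed-form approximation of the bark scale together with a companion approximation of the
critical bandwidth (both usually attributed to Zwicker and Terhardt). Whether these are the
exact relations intended here is an assumption, and the function names are hypothetical.

```python
import math

def hz_to_bark(f_hz):
    """Approximate bark number for a frequency in Hz (assumed formula)."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))

def critical_bandwidth_hz(f_hz):
    """Approximate critical bandwidth in Hz around f_hz (assumed formula)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69
```

With this approximation, 15000 Hz maps to roughly 24 bark, consistent with the count of about
24 critical bands given above, and the critical bandwidth grows with center frequency.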

13.5.2 Calculation of the Masking Threshold

For complex signals such as speech, the effects of individual masking components are additive;
the overall masking at a frequency component due to all the other frequency components is given
by the sum of the masking due to the individual frequency components, giving a single masking
threshold. This threshold tells us what is or is not perceptible across the spectrum.
For a background noise disturbance (the maskee) in the presence of speech (the masker), we
want to determine the masking threshold curve, as determined from the speech spectrum,
below which the background noise would be inaudible. For the speech threshold calculation,
however, we must consider that the masking ability of tonal and noise components of speech (in
masking background noise) is different.
Based on the above masking properties, a common approach to calculating the background noise
masking threshold on each speech frame, denoted by T(pL, ω), was developed by Johnston, who
performs the analysis on a critical-band basis. This approach approximates the masking
threshold and reduces the computation that would be required on a linear-frequency basis.
The method can be stated in the following four steps:

S1: The masking threshold is obtained on each analysis frame from the clean speech by first
finding spectral energies (by summing squared magnitude values of the discrete STFT), denoted
by Ek with k the bark number, within the above 24 critical bands; the critical band edges
have logarithmically increasing frequency spacing. This step accounts approximately for the
frequency selectivity of a masking curve associated with a single tone at bark number k with
energy Ek. Because only noisy speech is available in practice, an approximate estimate of the
clean speech spectrum is computed with spectral subtraction.

S2: To account for masking among neighboring critical bands, the critical band energies Ek from
step S1 are convolved with a "spreading function." This spreading function has an asymmetric
shape but with fixed slopes, and has a range of about 15 on a bark scale. If we denote the
spreading function by hk on the bark scale, then the resulting masking threshold curve is given
by Tk = Ek * hk, where * denotes convolution over the bark index k.

S3: Subtract a threshold offset that depends on the noise-like or tone-like nature of the masker.
One approach to determining this threshold offset uses the method of Sinha and Tewfik, which is
based on speech being typically tone-like at low frequencies and noise-like at high frequencies.

S4: Map the masking threshold Tk resulting from step S3 from the bark scale back to a linear
frequency scale to obtain T(pL, ω), where ω is sampled at the DFT frequencies.
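
The four steps can be sketched for a single frame as follows. This is a minimal illustration
under stated assumptions, not Johnston's exact procedure: the Hz-to-bark formula, the spreading
slopes (25 and 10 dB per bark), the offsets (14.5 + k dB for tone-like bands, 5.5 dB for
noise-like bands), the band index tone_like_below that separates the two, and the per-bin
normalization in S4 are all assumed values. Renormalization and comparison with the absolute
threshold of hearing are omitted. The input is taken to be a clean-speech magnitude estimate
from a real FFT of even length.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Assumed Hz-to-bark approximation (not taken from the text)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def masking_threshold(clean_mag, fs, n_bands=24, lower_slope_db=25.0,
                      upper_slope_db=10.0, tone_like_below=12):
    """Per-bin masking threshold (power units) for one frame; a sketch only."""
    clean_mag = np.asarray(clean_mag, dtype=float)
    # Bin center frequencies of an even-length real FFT, 0 to fs/2 inclusive.
    freqs = np.linspace(0.0, fs / 2.0, clean_mag.size)
    band = np.minimum(np.floor(hz_to_bark(freqs)).astype(int), n_bands - 1)

    # S1: critical-band energies Ek (sum of squared magnitudes in each band).
    energy = np.bincount(band, weights=clean_mag ** 2, minlength=n_bands)
    bins_per_band = np.bincount(band, minlength=n_bands)

    # S2: convolve with an asymmetric, fixed-slope spreading function hk.
    k = np.arange(n_bands)
    dz = k[:, None] - k[None, :]          # maskee band minus masker band
    spread_db = np.where(dz < 0, lower_slope_db * dz, -upper_slope_db * dz)
    spread = (10.0 ** (spread_db / 10.0)) @ energy

    # S3: subtract (in dB) an offset: tone-like low bands, noise-like high bands.
    offset_db = np.where(k < tone_like_below, 14.5 + k, 5.5)
    thr_band = spread * 10.0 ** (-offset_db / 10.0)

    # S4: map back to DFT bins; dividing by the bin count is an assumed choice
    # that makes the threshold comparable to a per-bin power value.
    thr_band = thr_band / np.maximum(bins_per_band, 1)
    return thr_band[band]
```

A noise component at a given DFT bin would then be judged inaudible when its power falls below
the returned value at that bin. The matrix product in S2 is the discrete convolution
Tk = Ek * hk written out explicitly, which also realizes the additive combination of the
individual maskers.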
