
AUDIO DENOISING BY DIAGONAL AND NON-DIAGONAL TIME-FREQUENCY COEFFICIENTS
VINUTHA A
PG Scholar, Department of E & CE, JNNCE, Shivamogga, Karnataka, INDIA.

AJAY BETUR P
Assistant Professor, Department of E & CE, JNNCE, Shivamogga, Karnataka, INDIA.

ABSTRACT: Audio includes voice, music and many other kinds of sound. Audio signals are often corrupted by background noise and by hum from audio equipment. Audio noise reduction is the process of attenuating this noise, and audio denoising procedures are designed to attenuate the noise while retaining the original signal. This paper focuses on denoising audio signals corrupted by Gaussian white noise, which is especially hard to remove, and describes diagonal and non-diagonal audio denoising algorithms. Diagonal audio denoising procedures attenuate noise by processing each time-frequency coefficient independently with a thresholding operator; hard and soft thresholding are implemented as diagonal estimators. Because they lack time-frequency regularity, these estimators create isolated time-frequency structures that are perceived as musical noise, which degrades audio perception. A non-diagonal audio denoising algorithm based on adaptive time-frequency block thresholding is introduced to remove this highly scattered musical noise. Numerical experiments show that the adaptive estimator is robust to variations in signal type and, for music signals, improves the SNR compared to the diagonal estimators. The procedure is assessed through objective and subjective evaluations and is implemented in MATLAB.

KEYWORDS: Hard thresholding, Soft thresholding, Block thresholding, Spectrogram representation.

INTRODUCTION

Over the past few decades, World Wide Web applications have grown extensively and have become a requisite tool for education, communication, industry, entertainment and more. These are multimedia applications built from images, audio and video. Audio data is an integral part of many modern computer and multimedia applications, which handle large numbers of audio recordings. Audio signals usually range from 20 Hz to 20 kHz. Most natural audio signals are highly structured: speech, bird song, environmental sounds and most types of music consist of distinct components, such as transients and harmonics, with a characteristic orientation in time and frequency. Audio signal processing has been studied for many years, and noise, which disturbs its applications, remains one of its major problems. Noise is undesirable electrical energy that falls within the passband of the signal and disrupts communication; it is an unwanted and inevitable interference in any form of communication. Noise carries no information and masks the intelligence of the original signal: an unwanted signal is superimposed on the undisturbed one. The less regular the noise, the more sophisticated the denoising methods must be. As a signal passes through equipment and connecting wires, noise is naturally added to it, and once the signal is contaminated it is difficult to remove the noise without altering the original signal. Denoising is therefore a basic task in signal processing. The objective of audio denoising is to attenuate the noise while recovering the underlying signal, with applications such as music and speech restoration. Audio denoising procedures are designed to attenuate the noise and retain the original signal, and they use two types of estimation:

Diagonal estimation.
Non-diagonal estimation.

In diagonal estimation, audio denoising algorithms attenuate the noise by processing each windowed Fourier coefficient with a thresholding operator; the techniques used are hard thresholding and soft thresholding. In these techniques a threshold is applied to each noisy coefficient to remove the noise, and the signal-to-noise ratio (SNR) of the denoised signal is then calculated. The denoised time-domain signal is converted to the time-frequency domain with the short-time Fourier transform (STFT), and the resulting time-frequency coefficients, arranged as a matrix, are displayed as a spectrogram. The spectrogram reveals residual noisy coefficients scattered across frequencies; they form isolated time-frequency structures that are perceived as musical noise, which is very hard to remove. Non-diagonal estimation is therefore introduced to denoise the signal while preventing this musical-noise artifact. Non-diagonal estimation uses block thresholding, whose parameters are chosen adaptively by minimizing a Stein estimate of the risk. The resulting time-frequency attenuation acts as a filter on the noisy signal: all coefficients of a block are processed with a common threshold that depends on the block. For non-stationary signals, the Fourier transform is computed over windowed segments of the signal, which is the short-time Fourier transform; the resulting time-frequency decomposition resembles a music score. Adaptive block thresholding removes musical noise block by block and, for music signals, gives a better SNR than the diagonal estimation techniques.

Functional block diagram

Basics of audio and noise
Audio
Audible sound arises from pressure variations in the air falling on the eardrum. The human auditory system responds to sounds in the frequency range of 20 Hz to 20 kHz. Audio signals are non-stationary: their properties change slowly with time. Audio signals can be broadly categorized into four types: environmental sounds, artificial sounds, speech and music. Speech can be viewed as a sequence of phones, and music as an evolving pattern of notes.
Noise
Noise is an unwanted and inevitable interference in any form of communication. It carries no information and masks the intelligence of the original signal. The audio signal f is contaminated by a noise $\varepsilon$ that is often modeled as a zero-mean Gaussian process independent of f:

$$ y[n] = f[n] + \varepsilon[n], \qquad n = 0, 1, \ldots, N-1 \tag{1} $$

Figure 1. Audio signal with noise
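The noise model of Eq. (1) can be simulated directly. The sketch below adds zero-mean white Gaussian noise to a clean signal and measures the result with an SNR function. The paper does not state its SNR formula, so the standard definition $10\log_{10}(\|f\|^2 / \|f - \hat{f}\|^2)$ in dB is an assumption here, as are the test tone and the function names.

```python
import math
import random

def add_white_gaussian_noise(f, sigma, seed=0):
    """y[n] = f[n] + eps[n], with eps[n] i.i.d. N(0, sigma^2) and independent of f (Eq. 1)."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in f]

def snr_db(f, f_hat):
    """SNR of f_hat with respect to the clean signal f, in dB (assumed definition):
    10 * log10( sum f[n]^2 / sum (f[n] - f_hat[n])^2 )."""
    signal_energy = sum(x * x for x in f)
    error_energy = sum((x - e) ** 2 for x, e in zip(f, f_hat))
    return 10.0 * math.log10(signal_energy / error_energy)

# Hypothetical test signal: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
f = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
y = add_white_gaussian_noise(f, sigma=0.1)
input_snr = snr_db(f, y)  # about 17 dB: signal power 0.5 versus noise power 0.01
```

A denoiser is then judged by how much it raises `snr_db(f, f_hat)` above this input SNR.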

Basics of STFT, Spectrogram and Hanning window


STFT
The short-time Fourier transform (STFT) is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The Fourier transform analyzes the signal over all time, as if it had infinite duration: time and frequency are treated as orthogonal variables that are never mixed, so the spectrum does not show when a frequency component occurs. A time-frequency audio denoising procedure computes the STFT of the noisy signal and processes the resulting coefficients to attenuate the noise. The STFT converts the signal from the time domain to the time-frequency domain, and its spectral components are displayed as a spectrogram.

Spectrograms
A spectrogram is a three-dimensional plot that represents the energy of the frequency content of a signal as it changes over time. The horizontal axis represents time, the vertical axis represents frequency, and the intensity of each point represents the amplitude of a particular frequency at a particular time.


Figure 2. STFT of (a) original Mozart signal (b) noise-added Mozart signal

Hanning Window
Applying the Fourier transform to a finite segment of a non-stationary signal produces side lobes (spectral leakage). To reduce the side lobes, a Hanning window is used. Compared with a rectangular window, the Hanning window has a wider main lobe, but its highest side lobe is about 31 dB below the main lobe instead of about 13 dB, which makes it a good choice for audio analysis. Windowing is straightforward, needs minimal computational effort and reduces the side lobes. A Hanning window is smooth: it tapers to zero with zero slope at its edges. With a suitable overlap between frames, it generates a tight-frame STFT.

Figure 3. Fourier transform of hanning window and normalised hanning window
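The tight-frame property can be checked numerically. For an STFT whose atoms are shifted by a hop u (with at least as many frequency bins as window samples), the frame is tight when $\sum_l w^2[n - lu]$ is the same for every n. The sketch below, using a periodic Hanning window and hypothetical helper names, shows that 50% overlap makes the plain window sum constant but not the squared sum, while 75% overlap (hop = N/4) makes the squared sum constant.

```python
import math

def hann(N):
    """Periodic Hanning window w[m] = 0.5 - 0.5*cos(2*pi*m/N), m = 0..N-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]

def overlap_profile(w, hop, power):
    """Return sum_l w[n - l*hop]**power for n = 0..hop-1.

    Away from the signal edges, entry n gives the sum for every sample whose index is
    congruent to n modulo hop: the window samples landing there are those with m = n (mod hop).
    Assumes hop divides len(w)."""
    acc = [0.0] * hop
    for m, v in enumerate(w):
        acc[m % hop] += v ** power
    return acc

w = hann(1024)
cola = overlap_profile(w, 512, 1)       # 50% overlap, plain sum: every entry ≈ 1.0
not_tight = overlap_profile(w, 512, 2)  # 50% overlap, squared sum: varies from 0.5 to 1.0
tight = overlap_profile(w, 256, 2)      # 75% overlap, squared sum: every entry ≈ 1.5
```

The constant 1.5 in the last case is what later fixes the redundancy factor of the frame.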

STFT of a signal
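The STFT of the original signal x(t) is

$$ \mathrm{STFT}\{x(t)\} \equiv X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, w(t-\tau)\, e^{-j\omega t}\, dt \tag{2} $$

x(t) = original signal in the time domain.
w(t) = window function, commonly a Hann window; $\tau$ is the time position of the window and $\omega$ the angular frequency.
In discrete time, the data to be transformed are broken up into frames, which usually overlap each other. Each frame is Fourier transformed, and the complex results are collected in a matrix.

$$ \mathrm{STFT}\{x[n]\} \equiv X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n-m]\, e^{-j\omega n} \tag{3} $$

x[n] = signal in the discrete time domain.
w[n] = window function in discrete time; m is the position of the window.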
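Equation (3) sampled at the frequencies $\omega = 2\pi k / N$ gives the matrix of STFT coefficients, and its squared magnitude gives the spectrogram described earlier. The following is a minimal pure-Python sketch with a Hanning window and a naive DFT; the function names, window length and hop are illustrative assumptions, and the phase is measured from the start of each frame, which differs from Eq. (3) only by a unit-modulus factor and leaves the spectrogram unchanged.

```python
import cmath
import math

def hann(N):
    """Periodic Hanning window of length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]

def stft(x, win_len, hop):
    """STFT matrix X[l][k] = sum_m x[l*hop + m] * w[m] * exp(-2j*pi*k*m/win_len).

    One row per frame l (frames that would run past the end of x are dropped) and one
    column per frequency bin k = 0..win_len-1, i.e. omega = 2*pi*k/win_len."""
    w = hann(win_len)
    X = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + m] * w[m] for m in range(win_len)]
        # Naive O(win_len^2) DFT for clarity; a practical implementation would use an FFT.
        X.append([sum(seg[m] * cmath.exp(-2j * math.pi * k * m / win_len)
                      for m in range(win_len))
                  for k in range(win_len)])
    return X

def spectrogram_db(X, floor=1e-12):
    """Energy |X|^2 in dB for the non-negative frequency bins (a real input is symmetric)."""
    half = len(X[0]) // 2 + 1
    return [[10.0 * math.log10(abs(c) ** 2 + floor) for c in row[:half]] for row in X]

# 1 kHz tone at 8 kHz sampling: bin k = 1000 * 64 / 8000 = 8 of a 64-point DFT.
fs, win_len, hop = 8000, 64, 16
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]
S = spectrogram_db(stft(tone, win_len, hop))
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in S]  # 8 for every frame
```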

AUDIO DENOISING PROCEDURE

The audio signal f is contaminated by a noise $\varepsilon$, so that the observed signal is

$$ y[n] = f[n] + \varepsilon[n], \qquad n = 0, 1, \ldots, N-1 \tag{4} $$

The STFT decomposes the noisy signal y[n] over a family of time-frequency atoms $\{g_{l,k}\}_{l,k}$, where l is the time index and k is the frequency index.
The resulting coefficients can be written as

$$ Y[l,k] = \langle y, g_{l,k} \rangle = \sum_{n=0}^{N-1} y[n]\, g_{l,k}^{*}[n] \tag{5} $$
where * denotes complex conjugation. The noisy signal y[n] can be reconstructed with

$$ y[n] = \frac{1}{A} \sum_{l,k} Y[l,k]\, g_{l,k}[n] \tag{6} $$

where A is the redundancy factor of the tight frame formed by the time-frequency atoms. A denoising algorithm modifies the time-frequency coefficients by multiplying each noisy coefficient Y[l,k] by an attenuation factor a[l,k]. The resulting denoised signal estimator is

$$ \hat{f}[n] = \frac{1}{A} \sum_{l,k} \hat{F}[l,k]\, g_{l,k}[n] = \frac{1}{A} \sum_{l,k} a[l,k]\, Y[l,k]\, g_{l,k}[n] \tag{7} $$

Denoising algorithms therefore differ in how they compute a[l,k].
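Equations (5) to (7) can be checked on a small example. The sketch below builds hypothetical analysis and synthesis functions for circularly shifted Hanning atoms with hop K/4, for which the redundancy factor is A = 1.5K; these choices are assumptions made so that the frame is tight, since the text does not specify them. With a[l,k] = 1 the synthesis of Eq. (7) reduces to the reconstruction formula (6).

```python
import cmath
import math
import random

def hann(K):
    """Periodic Hanning window of length K."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / K) for m in range(K)]

def analysis(y, K, hop):
    """Y[l][k] = <y, g_{l,k}> (Eq. 5) for atoms g_{l,k}[n] = w[n - l*hop] exp(2j*pi*k*(n - l*hop)/K).

    y is treated as circular (len(y) must be a multiple of hop and at least K), so every
    sample is covered by the same number of frames."""
    N, w = len(y), hann(K)
    return [[sum(y[(l * hop + m) % N] * w[m] * cmath.exp(-2j * math.pi * k * m / K)
                 for m in range(K))
             for k in range(K)]
            for l in range(N // hop)]

def synthesis(Y, N, K, hop, a=None):
    """f_hat[n] = (1/A) sum_{l,k} a[l][k] Y[l][k] g_{l,k}[n]  (Eq. 7; Eq. 6 when a is None).

    With this window and hop = K/4, sum_l w^2[n - l*hop] = 1.5 for every n, so A = 1.5*K."""
    w, A = hann(K), 1.5 * K
    out = [0j] * N
    for l, row in enumerate(Y):
        for m in range(K):
            s = sum((1.0 if a is None else a[l][k]) * row[k] * cmath.exp(2j * math.pi * k * m / K)
                    for k in range(K))
            out[(l * hop + m) % N] += s * w[m] / A
    # For real y and a[l][k] == a[l][K-k], the imaginary part is rounding error only.
    return [v.real for v in out]

rng = random.Random(1)
y = [rng.uniform(-1.0, 1.0) for _ in range(64)]
K, hop = 16, 4
Y = analysis(y, K, hop)
y_rec = synthesis(Y, len(y), K, hop)                            # Eq. (6): equals y up to rounding
y_half = synthesis(Y, len(y), K, hop, [[0.5] * K for _ in Y])   # a = 0.5 everywhere: y / 2
```

Any diagonal or block estimator then only has to supply the matrix of attenuation factors; for a real input it should satisfy a[l][k] = a[l][K−k] so that the output stays real.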
The optimal (oracle) attenuation factor is

$$ a[l,k] = 1 - \frac{1}{\xi[l,k] + 1} \tag{8} $$

where $\xi[l,k]$ is the a priori SNR,

$$ \xi[l,k] = \frac{|F[l,k]|^{2}}{\sigma^{2}[l,k]} \tag{9} $$

and $\sigma^{2}[l,k]$ is the noise variance of the coefficient at (l,k).
This attenuation factor minimizes the following upper bound on the quadratic estimation risk r:

$$ r = E\{\|f - \hat{f}\|^{2}\} \le \frac{1}{A} \sum_{l,k} E\{|F[l,k] - \hat{F}[l,k]|^{2}\} \tag{10} $$

where f is the original signal and $\hat{f}$ is the denoised signal. The oracle risk $r_0$ is the lower bound of the risk r:

$$ r_0 = \frac{1}{A} R_0 \tag{11} $$

$$ R_0 = \sum_{l,k} \frac{|F[l,k]|^{2}\, \sigma^{2}[l,k]}{|F[l,k]|^{2} + \sigma^{2}[l,k]} \tag{12} $$
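The oracle risk $r_0$ cannot be reached in practice, because the a priori SNR depends on the unknown coefficients F[l,k]. The a posteriori SNR $\gamma[l,k]$ is therefore computed from the noisy coefficients:

$$ \gamma[l,k] = \frac{|Y[l,k]|^{2}}{\sigma^{2}[l,k]} \tag{13} $$

and the a priori SNR is estimated with the unbiased estimator

$$ \hat{\xi}[l,k] = \gamma[l,k] - 1 \tag{14} $$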
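The oracle factor (8) needs the unknown a priori SNR, so in practice it is replaced by the estimate (14). The sketch below shows this substitution, which yields an empirical Wiener gain a = 1 − 1/γ. Clamping negative gains to zero is an assumption here (the estimate γ − 1 can be negative), and the function names are hypothetical.

```python
def a_posteriori_snr(Y, sigma2):
    """gamma[l,k] = |Y[l,k]|^2 / sigma^2[l,k]  (Eq. 13)."""
    return abs(Y) ** 2 / sigma2

def oracle_gain(xi):
    """Oracle attenuation a = 1 - 1/(xi + 1)  (Eq. 8); needs the unknown a priori SNR xi."""
    return 1.0 - 1.0 / (xi + 1.0)

def empirical_wiener_gain(Y, sigma2):
    """Replace xi by xi_hat = gamma - 1 (Eq. 14) in Eq. (8), giving a = 1 - 1/gamma.
    xi_hat is negative when |Y|^2 < sigma^2, so the gain is clamped at 0 (assumption)."""
    gamma = a_posteriori_snr(Y, sigma2)
    if gamma <= 0.0:
        return 0.0
    return max(0.0, 1.0 - 1.0 / gamma)

def oracle_risk_term(F, sigma2):
    """Contribution of one coefficient to R0 in Eq. (12)."""
    return abs(F) ** 2 * sigma2 / (abs(F) ** 2 + sigma2)

g_strong = empirical_wiener_gain(3 + 4j, 1.0)  # gamma = 25, gain ≈ 0.96
g_weak = empirical_wiener_gain(0.5, 1.0)       # gamma = 0.25 < 1, gain clamped to 0.0
risk = oracle_risk_term(2.0, 1.0)              # 4 * 1 / (4 + 1) = 0.8
```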

DIAGONAL ESTIMATION

Diagonal estimators are simple time-frequency audio denoising algorithms that compute each attenuation factor a[l,k] only from the corresponding noisy coefficient Y[l,k].

The types of diagonal estimation are empirical Wiener estimation, power subtraction estimation, and Donoho and Johnstone thresholding estimation.

Following the statistical work of Donoho and Johnstone, thresholding estimators have been studied for audio noise removal. Thresholding separates signal from noise by amplitude: each noisy coefficient is compared with the threshold $T = \sigma\sqrt{2\log_e N}$, where $\sigma$ is the noise standard deviation and N is the number of samples.
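The threshold $T = \sigma\sqrt{2\log_e N}$ is chosen so that pure noise rarely exceeds it: the largest of N independent Gaussian samples of standard deviation $\sigma$ is close to $\sigma\sqrt{2\log_e N}$ with high probability. The hypothetical sketch below computes T and counts how many of N simulated noise samples exceed it; $\sigma$ and N are illustrative values.

```python
import math
import random

def universal_threshold(sigma, n_samples):
    """Donoho-Johnstone threshold T = sigma * sqrt(2 * ln N)."""
    return sigma * math.sqrt(2.0 * math.log(n_samples))

sigma, N = 0.1, 44100                     # e.g. one second of audio at 44.1 kHz
T = universal_threshold(sigma, N)         # ≈ 0.462
rng = random.Random(0)
noise = [rng.gauss(0.0, sigma) for _ in range(N)]
exceed = sum(abs(e) > T for e in noise)   # expected count is only about 0.17
```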

Figure 4. Thresholding types: (a) Hard thresholding (b) Soft thresholding.

Hard thresholding
In hard thresholding, coefficients whose magnitude is at or below the threshold are set to zero, and coefficients above the threshold are kept unchanged as the actual coefficients of the signal:

$$ \hat{F}[l,k] = \begin{cases} Y[l,k], & |Y[l,k]| > T \\ 0, & |Y[l,k]| \le T \end{cases} $$

Figure 5. Denoised by Hard threshold and its Spectrogram.
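A minimal sketch of the hard thresholding rule, applied coefficient by coefficient as a diagonal estimator; the function name and the sample coefficients are illustrative only.

```python
def hard_threshold(Y, T):
    """Hard thresholding: keep Y if |Y| > T, otherwise set it to 0.
    Equivalent to an attenuation factor a = 1 or a = 0 in Eq. (7); works for real or complex Y."""
    return Y if abs(Y) > T else 0.0

coeffs = [0.2, -1.7, 0.9 + 0.1j, -0.05, 2.5j]
denoised = [hard_threshold(c, 1.0) for c in coeffs]  # [0.0, -1.7, 0.0, 0.0, 2.5j]
```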

Soft thresholding
In soft thresholding, coefficients whose magnitude is at or below the threshold are set to zero, and the threshold is subtracted from the magnitude of the coefficients above it. The threshold value changes with the noise level:

$$ \hat{F}[l,k] = \begin{cases} \operatorname{sgn}(Y[l,k])\,(|Y[l,k]| - T), & |Y[l,k]| > T \\ 0, & |Y[l,k]| \le T \end{cases} $$

Figure 6. Denoised by Soft threshold and its Spectrogram.
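Soft thresholding can be sketched the same way. The sketch assumes the standard sign-preserving form $\operatorname{sgn}(Y)(|Y| - T)$, which for complex STFT coefficients keeps the phase and shrinks the magnitude by T; the function name and sample values are illustrative.

```python
def soft_threshold(Y, T):
    """Soft thresholding: sgn(Y) * (|Y| - T) if |Y| > T, else 0.
    For complex Y, sgn(Y) = Y/|Y|, so the result is Y * (1 - T/|Y|): the phase is kept and the
    magnitude shrinks by T. Equivalent to a = max(0, 1 - T/|Y|) in Eq. (7)."""
    mag = abs(Y)
    return Y * (1.0 - T / mag) if mag > T else 0.0

shrunk = [soft_threshold(c, 1.0) for c in [0.2, -2.5, 3 + 4j]]  # ≈ [0.0, -1.5, 2.4+3.2j]
```

Unlike hard thresholding, every kept coefficient is biased toward zero by T, which is consistent with the lower SNR values reported for soft thresholding in Table 1.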

Table 1. SNR values for different audio signals


Signal           SNR (soft)   SNR (hard)
Mozart signal    1.55         3.66
Piano signal     1.48         2.52
Violin signal    3.99         7.48
Speech signal    3.77         6.42
Bird signal      10.43        17.18

Table 1 shows that, in diagonal estimation, hard thresholding gives a better SNR than soft thresholding for all the audio signals tested. However, some noisy coefficients still remain in the denoised signals, and they are perceived as musical noise. Non-diagonal estimation is used to remove this musical noise.

NON DIAGONAL ESTIMATION

In non-diagonal estimation, block thresholding is used. The STFT maps the noisy signal from the time domain to the time-frequency domain, and the spectrogram displays the time-frequency blocks. All coefficients of a block are processed with a common threshold; different block shapes are tested, the best one is kept, and the attenuation factors are computed from it. This block grouping regularizes the estimation and removes musical noise. Adaptive time-frequency block thresholding is introduced, which leaves hardly any musical noise and improves the SNR. The procedure adjusts all its parameters adaptively to the signal properties by minimizing a Stein estimate of the risk. Like the diagonal estimators, this time-frequency procedure computes the STFT of the noisy signal and processes the resulting coefficients to attenuate the noise. The STFT is invertible: the original signal can be recovered from its transform by the inverse STFT. The adaptive block thresholding algorithm uses two matrices of the same size as the STFT coefficient matrix:

Attenuation factor map: contains the attenuation factor for each block.

Flag depth: records how each macroblock is subdivided into blocks.
The coefficient matrix is partitioned into macroblocks. Because the signal is real, the STFT coefficient matrix is conjugate-symmetric in frequency, so only the negative frequencies need to be processed.

Figure 6. Partition of macro blocks

The blocks are rectangles of length $L_i$ in time and width $W_i$ in frequency. Adaptive block thresholding chooses the block sizes by minimizing an estimate of the risk r. The risk r cannot be computed because f is unknown, but it can be estimated with Stein's unbiased risk estimate (SURE). This estimate is used to find the best block shapes within each macroblock by minimizing the estimated risk. The thresholding level $\lambda$ is set through the residual noise probability

$$ \delta = \mathrm{Prob}\{\bar{\varepsilon}^{2} > \lambda \sigma^{2}\} \tag{15} $$

where $\delta$ is the probability of keeping residual noise and $\bar{\varepsilon}^{2}$ is the average noise energy over a block. The thresholding level $\lambda$ and the block sizes B are thus the optimization parameters of block thresholding estimators.

The choices of B and $\lambda$ are made as follows.


Choice of block B
The STFT coefficient matrix is grouped into disjoint rectangular blocks of size $B_i = L_i \times W_i$, where $L_i \in \{8, 4, 2\}$ is the block length in time and $W_i \in \{16, 8, 4, 2, 1\}$ is the block width in frequency.
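Once the block size is fixed, every coefficient of a block receives the same attenuation factor. The sketch below assumes the block rule of Yu, Mallat and Bacry (2008), $a = \max(0, 1 - \lambda\sigma^2/\overline{Y^2})$, where $\overline{Y^2}$ is the mean of $|Y[l,k]|^2$ over the block; this formula is not written out in the text above. It uses a single fixed block size L × W, omits the SURE-based choice of block shapes, and the function name is hypothetical. The value $\lambda = 3.2$ is taken from Table 2 for L = W = 2.

```python
def block_attenuation(Y, L, W, lam, sigma2):
    """Attenuation-factor map for disjoint L x W blocks of the coefficient matrix Y.

    Y is a list of rows (time index l) of coefficients (frequency index k). Every coefficient
    in a block gets the common factor a = max(0, 1 - lam * sigma2 / mean_energy), where
    mean_energy is the average of |Y[l][k]|^2 over the block (assumed rule, see lead-in).
    Blocks at the matrix edges may be smaller than L x W."""
    n_l, n_k = len(Y), len(Y[0])
    a = [[0.0] * n_k for _ in range(n_l)]
    for l0 in range(0, n_l, L):
        for k0 in range(0, n_k, W):
            cells = [(l, k) for l in range(l0, min(l0 + L, n_l))
                     for k in range(k0, min(k0 + W, n_k))]
            mean_energy = sum(abs(Y[l][k]) ** 2 for l, k in cells) / len(cells)
            gain = max(0.0, 1.0 - lam * sigma2 / mean_energy) if mean_energy > 0 else 0.0
            for l, k in cells:
                a[l][k] = gain
    return a

# Toy 4 x 4 matrix: one strong 2 x 2 block (|Y|^2 = 9) among weak noise-like values (|Y|^2 = 0.25).
Y = [[3.0, 3.0, 0.5, 0.5],
     [3.0, 3.0, 0.5, 0.5],
     [0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, 0.5, 0.5]]
a = block_attenuation(Y, L=2, W=2, lam=3.2, sigma2=1.0)
# a[0][0] == a[1][1] == 1 - 3.2/9 (about 0.64); every other entry is 0.0
```

Because the whole block shares one factor, an isolated large noise coefficient inside a weak block is suppressed along with its neighbours, which is how block grouping avoids the scattered structures that diagonal estimators leave behind.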
Choice of thresholding level $\lambda$
For a chosen block size and residual noise probability $\delta$, the thresholding level $\lambda$ is obtained from (15). For each block size $B_i$, $\lambda$ is estimated by Monte Carlo simulation. Table 2 shows the resulting values for $\delta = 0.1$.
Table 2. Thresholding level $\lambda$ for different block sizes $B_i = L_i \times W_i$, with $\delta = 0.1$
$\lambda$   W=16   W=8   W=4   W=2   W=1
L=8         1.5    1.6   1.9   2.3   2.5
L=4         1.7    1.9   2.4   3     3.4
L=2         1.9    2.5   2.4   3.2   4.8
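The Monte Carlo estimation of $\lambda$ from Eq. (15) can be sketched as follows: draw many blocks of pure noise, compute the mean energy per block, and take its $(1 - \delta)$ quantile. This sketch assumes independent complex Gaussian noise coefficients of unit variance. Real STFT noise coefficients are correlated across overlapping frames and neighbouring bins, so it does not reproduce Table 2; it only illustrates the procedure, and the function name is hypothetical.

```python
import math
import random

def lambda_for_delta(B, delta=0.1, trials=100_000, seed=0):
    """Monte Carlo estimate of lam such that Prob{mean of B values |eps|^2 > lam * sigma^2} = delta.

    Assumes sigma^2 = 1 and B independent circular complex Gaussian noise coefficients
    per block (independence is a simplification, see the lead-in)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # std of real and imaginary parts, so that E|eps|^2 = 1
    means = []
    for _ in range(trials):
        energy = 0.0
        for _ in range(B):
            re, im = rng.gauss(0.0, s), rng.gauss(0.0, s)
            energy += re * re + im * im
        means.append(energy / B)
    means.sort()
    return means[int((1.0 - delta) * trials)]  # empirical (1 - delta) quantile

lam_1 = lambda_for_delta(1)  # close to ln(10) ≈ 2.30
```

For B = 1, $|\varepsilon|^2/\sigma^2$ is exponentially distributed, so the exact answer is $\lambda = \ln 10 \approx 2.30$ for $\delta = 0.1$. Larger blocks average out the noise and give a smaller $\lambda$, the same general trend seen along the rows of Table 2.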

RESULTS

The block thresholding algorithm has been applied in the time-frequency domain. Changing most parameters produces only minor changes in SNR, but the SNR does vary with the window size, so an optimal window size was determined. The optimal window size in time depends on the sampling frequency, whereas the results show that the noise level has little influence on it.

Figure 7. 44.1 kHz Sampling frequency

Figure 8. 11.1 kHz Sampling frequency

Figure 9. 16.0 kHz Sampling frequency

Unlike the diagonal estimators, the algorithm based on non-diagonal time-frequency estimation performs well for denoising music signals corrupted by noise. The SNR values of hard, soft and block thresholding are compared for different audio signals in Table 3: block thresholding gives a better SNR than hard and soft thresholding for the music signals (Mozart, piano and violin), while hard thresholding gives the highest SNR for the speech and bird signals.

Table 3. SNR values for different audio signals
Signal           SNR (soft)   SNR (hard)   SNR (block)
Mozart signal    1.55         3.66         6.47
Piano signal     1.48         2.52         2.98
Violin signal    3.99         7.48         10.29
Speech signal    3.77         6.42         4.23
Bird signal      10.43        17.18        13.15

CONCLUSIONS

Non-diagonal estimators are more effective than diagonal estimators because they produce less musical noise. Perception by the human ear is the most important measure of audio quality. A subjective listening test shows that music signals, speech signals and bird songs denoised with block thresholding are perceived as significantly improved in audible quality. The objective evaluation shows that block thresholding gives a higher SNR than hard and soft thresholding for music signals, whereas for the speech signal and the bird song its SNR is lower than that of hard thresholding, though still higher than that of soft thresholding. Block thresholding therefore performs best on music signals and is one of the best algorithms for audio denoising, especially for music.

REFERENCES

D. Donoho and I. M. Johnstone (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3), 425-455.
G. Yu, S. Mallat and E. Bacry (2008). Audio denoising by time-frequency block thresholding. IEEE Transactions on Signal Processing, 56(5).
O. Cappé (1994). Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Transactions on Speech and Audio Processing, 2, 345-349.
S. Mallat (2009). A Wavelet Tour of Signal Processing: The Sparse Way, 3rd edition.
J. Lim and A. Oppenheim (1979). Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE, 67(12), 1586-1604.
Y. Ephraim, H. Lev-Ari and W. Roberts (2005). A brief survey of speech enhancement. In The Electrical Engineering Handbook. CRC Press.
M. Berouti, R. Schwartz and J. Makhoul (1979). Enhancement of speech corrupted by acoustic noise. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '79), 4.
S. Boll (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), 113-120.
