
INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATION ENGINEERING & TECHNOLOGY (IJECET)

ISSN 0976 6464(Print) ISSN 0976 6472(Online) Volume 4, Issue 1, January- February (2013), pp. 01-10 IAEME: www.iaeme.com/ijecet.asp Journal Impact Factor (2012): 3.5930 (Calculated by GISI) www.jifactor.com


IMPROVING THE GLOBAL PARAMETER SIGNAL TO DISTORTION VALUE IN MUSIC SIGNALS USING PANNING TECHNIQUE AND DISCRETE WAVELET TRANSFORMS
VENKATESH KUMAR N.1, RAGHAVENDRA N.2, SUBASH KUMAR T. G.3, MANOJ KUMAR K.4
1 SET, Asst. Professor, Department of ECE, Jain University, Jakkasandra, Ramanagar Taluk, Karnataka, India (kumarsparadise@yahoo.com)
2 Principal Staff Engineer, Google Inc. [formerly Motorola Mobility], Bangalore, India
3 Project Leader, Jasmin Infotech Pvt Ltd, Velacherry, Chennai, India
4 Consultant, Java Mentor, Bangalore, India

ABSTRACT

In this paper, an attempt is made to alleviate the effect of distortion during feature extraction of a music signal. The proposed method is compared with existing methods and is shown to improve the signal-to-distortion value.

Keywords: Blind Source Separation; DWT; FFT; Panning; STFT; Signal to Distortion Ratio

1. INTRODUCTION

The singing voice, in addition to being the oldest musical instrument, is also one of the most complex from an acoustic standpoint [1]. Research on the perception of singing is not as developed as in the closely related field of speech research [2]. Some of the existing work is surveyed in this section. Chou and Gu [3] utilized a Gaussian mixture model (GMM) to detect the vocal regions. The feature vectors used for the GMM include 4 Hz modulation energy, harmonic coefficients, 4 Hz harmonic coefficients, delta mel-frequency cepstral coefficients (MFCC) and delta log energy. Berenzweig and Ellis [4] used a classifier derived from a speech recognizer to distinguish vocal segments from accompaniment.


Kim and Whitman [5] developed a system for singer identification in popular music recordings using voice coding features. Another system for automatic singer identification was proposed by Zhang [6]. Maddage et al. [7] proposed a framework for music structure analysis with the help of repeated chord pattern analysis and vocal content analysis. Maximo Cobos [8] proposed a system for extracting singing voice from stereo recordings. This system combines panning information and pitch tracking, which allows the time-frequency mask applied for extracting a vocal segment to be refined, thus improving the separation.

1.1 Motivation

In real-time applications of sound separation such as lyrics recognition and music remixing, music information retrieval requires accurate extraction of features from the music signal. Existing methods result in a poor signal-to-distortion value. Hence, it is necessary to enhance music quality by improving the signal-to-distortion value.

1.2 Problem Statement

Real-time applications of music separation algorithms demand better signal-to-noise and distortion ratios (SINAD). These parameters depend on the technique used for feature extraction. In the literature, similarity measures between the Short-Time Fourier Transforms of the input signals were used to identify the time-frequency (TF) regions occupied by each source based on the panning coefficient. Instead, in this work, we implement the audio source separation using similarity measures between the Discrete Wavelet Transforms (DWTs) of the input signals to identify the time-frequency regions occupied by each source based on the panning coefficient, hence improving the Signal to Distortion Ratio.

2. PROPOSED SOURCE SEPARATION TECHNIQUE

2.1 Music Source Separation Model

The source separation problem can be stated as follows: given M linear mixtures of N sources mixed via an unknown M × N mixing matrix A, estimate the underlying sources from the mixtures. When M = N, this can be achieved by estimating an un-mixing matrix W, which allows the original sources to be estimated up to a permutation and a scale factor. Independent Component Analysis (ICA) algorithms are able to perform the separation if some conditions are satisfied: the sources must be non-Gaussian and statistically independent [9]. Moreover, the number of sources must be equal to the number of available mixtures, M = N, and the problem is said to be even-determined. When M > N, the mixing process is over-determined and the underlying sources can be estimated by least-squares optimization using matrix pseudo-inversion. If M < N, the mixing process is underdetermined and the estimation of the sources becomes much more difficult [10]. When dealing with stereo commercial music recordings, only the information of the left and right channels is available, and thus the mixture is generally underdetermined [11]. Sparse methods provide a powerful approach to the separation of several signals when there are more sources than sensors [12]. The sparsity property of audio signals means that in most time-frequency bins all sources but one, at most, will have a time-frequency coefficient of zero or close to zero [13][14]. The DUET algorithm [15], originally conceived for separating underdetermined speech mixtures, assumes that, because of the sparsity of speech in the Short-Time


Fourier Transform (STFT) domain, almost all mixture time-frequency points with significant magnitude are in fact due to only one of the original sources. In the ideal case, when each time-frequency point belongs to only one source, the sources are said to be W-Disjoint Orthogonal (W-DO).

2.2 Overview of the Proposed Model
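As context for the model, the sparsity/W-DO assumption just described can be illustrated with a minimal sketch: on synthetic "time-frequency" coefficient grids where each bin belongs to exactly one source (so W-DO holds exactly), assigning every mixture bin to the dominant source recovers both sources. The grids and all names below are illustrative, not part of the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sparse "time-frequency" coefficient grids for two sources:
# each source is active only on its own bins, so W-DO holds exactly.
shape = (64, 128)                       # (frequency bins, time frames)
active = rng.random(shape) < 0.5        # bins belonging to source 1
S1 = np.where(active, rng.normal(size=shape), 0.0)
S2 = np.where(~active, rng.normal(size=shape), 0.0)

X = S1 + S2                             # single-channel mixture of the grids

# Ideal binary mask: assign every bin to the dominant source.
mask1 = np.abs(S1) > np.abs(S2)
S1_hat = np.where(mask1, X, 0.0)
S2_hat = np.where(~mask1, X, 0.0)

# Under exact W-DO the masked mixture equals each original source.
print(np.allclose(S1_hat, S1), np.allclose(S2_hat, S2))  # True True
```

With real music the assumption holds only approximately, which is why the method below weights bins with a soft Gaussian window rather than a hard binary mask.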

Figure 1 Overview of the proposed separation model: the stereo input (X1, X2) passes through the similarity measure, the partial similarity measures, and the ambiguity-resolving function; panning index analysis and Gaussian windowing (with a chosen window width) then yield the foreground streams, whose DWTs are windowed and inverted (DWT⁻¹ operator) to obtain the target signal ŝ_i(t), i = 1, 2.
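The similarity and panning-index blocks of Figure 1 follow Avendano's frequency-domain panning analysis. As a sketch, the quantities can be computed directly on arrays of complex TF-domain coefficients. The synthetic source, the gains gL/gR, and the eps guard are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 1e-12                              # guard against division by zero

# Synthetic complex TF coefficients of one source, panned with gains (gL, gR).
S = rng.normal(size=(32, 64)) + 1j * rng.normal(size=(32, 64))
gL, gR = 0.9, 0.1                        # source panned strongly to the left
X1, X2 = gL * S, gR * S                  # left and right channel coefficients

# Similarity: 1 when panned centre, 0 when panned hard to one side.
psi = 2 * np.abs(X1 * np.conj(X2)) / (np.abs(X1)**2 + np.abs(X2)**2 + eps)

# Partial similarities and their difference.
psi1 = np.abs(X1 * np.conj(X2)) / (np.abs(X1)**2 + eps)
psi2 = np.abs(X1 * np.conj(X2)) / (np.abs(X2)**2 + eps)
delta = psi1 - psi2

# Ambiguity-resolving function: the sign of the difference.
delta_hat = np.sign(delta)

# Panning index in [-1, 1]; negative in this sketch's convention for a
# left-panned source.
PI = (1 - psi) * delta_hat
print(round(float(PI.mean()), 4))
```

Because every bin carries the same panning gains here, the panning index is constant over the whole map; with a real mix, each source forms its own cluster of PI values.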


2.3 Panning Index Windowing

An initial segregation of the singing voice is obtained by applying the source identification technique developed by Avendano [16]. This technique is based on a comparison of the left and right signals in the TF plane, obtaining a two-dimensional map that identifies different source components related to the panning gains used in the stereo mixdown. Firstly, a similarity measure is defined:

ψ(k,m) = 2|X1(k,m) X2*(k,m)| / (|X1(k,m)|² + |X2(k,m)|²)    (1)

where * denotes complex conjugation. If the source is panned to the center, the function attains its maximum value of one, and if the source is panned completely to either side, the function attains its minimum value of zero. A quadratic dependence on the panning knob makes the function (1) multi-valued, and an ambiguity appears in knowing the lateral direction of the source. The ambiguity is resolved using the following partial similarity measures:

ψ1(k,m) = |X1(k,m) X2*(k,m)| / |X1(k,m)|²,  ψ2(k,m) = |X1(k,m) X2*(k,m)| / |X2(k,m)|²    (2)

and their difference

Δ(k,m) = ψ1(k,m) − ψ2(k,m).    (3)

The ambiguity-resolving function is:

Δ̂(k,m) = 1 if Δ(k,m) > 0; 0 if Δ(k,m) = 0; −1 if Δ(k,m) < 0.    (4)

Finally, the panning index Ψ(k,m) is obtained as

Ψ(k,m) = [1 − ψ(k,m)] · Δ̂(k,m),    (5)

which identifies the time-frequency components of the sources in the stereo mix when they are all panned to different positions. If several sources are equally panned, they will appear in the PI map as a single source. Due to the overlap with other sources, selecting only bins with Ψ(k,m) = Ψ0 will exclude bins where the source might still have significant energy but whose panning index has been altered by the presence of the interference. A Gaussian window is proposed to let components with values equal to Ψ0 pass unmodified and to weight TF points with a PI value near Ψ0:

Θ(k,m) = v + (1 − v) exp{−[Ψ(k,m) − Ψ0]² / (2ξ²)}    (6)

where Ψ0 is the panning index value for extracting a given source, ξ controls the width of the window, and v is a floor value necessary to avoid setting DWT values to zero, which might result in musical-noise artifacts. The Ψ0 value must be specified for centering the separating window. Most vocal removers exploit the fact that the singing voice is usually panned to the center. This is true for most music recordings, so Ψ0 = 0 is normally used. A supervised exploration along different PI values can be used for locating more exactly the pan


location of the vocals. The value of ξ, used for setting the window width, can be obtained using

ξ² = −(Ψc − Ψ0)² / (2 ln A)    (7)

where Ψc is the PI value at which the window reaches a small value A, for example −60 dB. Once the parameters of the window have been set up, the DWTs of the initial foreground streams are simply obtained by applying the window to each of the mixture channels:

F_i(k,m) = Θ(k,m) X_i(k,m),  i = 1, 2    (8)

These are converted back to the time domain by applying the DWT⁻¹ operator, obtaining f_i^(1)(t), where the superscript denotes the corresponding step of the separation method. The recovered target signal is obtained by adding the foreground streams of both channels:

ŝ(t) = f_1^(1)(t) + f_2^(1)(t)    (9)

2.4 Performance Evaluation

Separation algorithms can be evaluated by using a set of measures under some allowed distortions. These distortions depend on the kind of application considered. In [17], four numerical performance criteria are defined: the Signal to Distortion Ratio

SDR = 10 log10( ||s_target||² / ||e_interf + e_noise + e_artif||² )    (10)

the Signal to Interferences Ratio

SIR = 10 log10( ||s_target||² / ||e_interf||² )    (11)

the Signal to Noise Ratio

SNR = 10 log10( ||s_target + e_interf||² / ||e_noise||² )    (12)

and the Signal to Artifacts Ratio

SAR = 10 log10( ||s_target + e_interf + e_noise||² / ||e_artif||² )    (13)


where s_target is a version of the true source modified by an allowed distortion, and where e_interf, e_noise and e_artif respectively are the interferences, noise and artifacts error terms resulting from the decomposition

ŝ(t) = s_target(t) + e_interf(t) + e_noise(t) + e_artif(t)    (14)

The SIR and the SAR are indicators of the rejection of the interferences and of the absence of burbling artifacts, respectively. The SNR is a measure of the rejection of the sensor noise, and the SDR can be seen as a global performance measure.

3. IMPLEMENTATION
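Before the implementation specifics, the four criteria (10)-(13) can be sketched numerically. Here the error terms of the decomposition (14) are simply assumed known synthetic signals; the BSS_EVAL toolbox [18] instead estimates them from the true and estimated sources. All names and values are illustrative:

```python
import numpy as np

def ratio_db(num, den):
    """10*log10 energy ratio of two signals, as used by (10)-(13)."""
    return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

rng = np.random.default_rng(3)
n = 44100                                  # one second at 44.1 kHz

# Assumed-known terms of the decomposition (14):
# s_hat = s_target + e_interf + e_noise + e_artif
s_target = np.sin(2 * np.pi * 440 * np.arange(n) / 44100)
e_interf = 0.05 * rng.normal(size=n)       # leakage from other sources
e_noise  = 0.01 * rng.normal(size=n)       # sensor noise
e_artif  = 0.02 * rng.normal(size=n)       # separation ("burbling") artifacts

sdr = ratio_db(s_target, e_interf + e_noise + e_artif)          # (10)
sir = ratio_db(s_target, e_interf)                              # (11)
snr = ratio_db(s_target + e_interf, e_noise)                    # (12)
sar = ratio_db(s_target + e_interf + e_noise, e_artif)          # (13)

print(f"SDR={sdr:.1f} dB  SIR={sir:.1f} dB  SNR={snr:.1f} dB  SAR={sar:.1f} dB")
```

Note that the SDR is never larger than the SIR, since its denominator includes the interference energy plus the noise and artifact energies; this matches its role as the global measure.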

The model discussed above is implemented in MATLAB R2010a, and the BSS_EVAL toolbox [18] for MATLAB is used for performance evaluation. Later in this section the features extracted using our method are compared with two other feature extraction techniques, STFT and FFT.

3.1 Design Parameters for Source Separation

Table 1 Design parameters for source separation

Parameter                    Value
Frame size                   1000
Frame overlap                0.75%
Panning index (Ψc)           —
Panning index (Ψ0)           0
Smallest window value (A)    0.001
Floor value (v)              0.0005

Considering the design parameters listed in Table 1, we calculate the performance evaluation parameter SDR for the proposed model. To evaluate the extracted features, they were compared in two classification experiments with two feature sets that have been proposed in the literature. The first feature set consists of features extracted using the STFT. The second feature set consists of features extracted using the Fast Fourier Transform (FFT). The source separation method is applied over several wave files, where each audio file is approximately 30 seconds long, with a frame size of 1000 samples at a 44100 Hz sampling rate.
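As a sketch of how the design parameters of Table 1 enter the window of Section 2.3, the following computes ξ from (7) and evaluates the floored Gaussian of (6). The value psi_c = 0.2 is an arbitrary illustrative choice, as are all variable names:

```python
import numpy as np

# Window parameters (cf. Table 1; psi_c chosen here for illustration only)
psi0 = 0.0        # centre of the window: vocals assumed panned centre
psi_c = 0.2       # PI value where the Gaussian should have decayed to A
A = 0.001         # "smallest window value" (about -60 dB)
v = 0.0005        # floor; keeps DWT bins from being zeroed (musical noise)

# (7) solve exp(-(psi_c - psi0)^2 / (2 xi^2)) = A for the width xi
xi = np.sqrt(-(psi_c - psi0) ** 2 / (2.0 * np.log(A)))

# (6) Gaussian window with floor, evaluated over the PI axis
psi = np.linspace(-1.0, 1.0, 401)
theta = v + (1.0 - v) * np.exp(-((psi - psi0) ** 2) / (2.0 * xi ** 2))

print(round(float(theta.max()), 4))   # peak value at psi = psi0
idx = np.argmin(np.abs(psi - psi_c))  # nearest grid point to psi_c
print(round(float(theta[idx]), 4))    # roughly v + (1 - v) * A there
```

Bins whose panning index sits at Ψ0 pass with weight 1; bins far from Ψ0 are attenuated down to the floor v rather than zeroed, which limits musical-noise artifacts.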


4. RESULTS

The similarity measure between the Discrete Wavelet Transforms (DWTs) of the input signals is used to identify the time-frequency regions occupied by each source based on the panning coefficient assigned to it during the mix. Individual music components are identified and manipulated by clustering the time-frequency components with a given panning coefficient. After modification, an inverse DWT (IDWT) is used to synthesize a time-domain processed signal. The figures below show the plots of both the input signal and the extracted voice signal of several wave files, plotted using MATLAB. The performance evaluation parameter SDR is obtained using the BSS_EVAL toolbox in MATLAB. The music separation method is applied over several wave files, where each audio file is approximately 30 seconds long, with a frame size of 1000 samples at a 44100 Hz sampling rate. The experiments are listed below.

Experiment 1: Fig. 2 shows the input signal and the extracted voice signal from the wave file boyfriend.wav, composed by Ashley Simpson, which is 20 seconds long, with a 44100 Hz sampling rate. MATLAB is used to plot the results. The results are tabulated in Table 2.

Figure 2 Input signal and the extracted voice signal from the wave file boyfriend.wav

Table 2 SDRs of boyfriend.wav

I/P wave file / Composer      FFT       STFT      DWT
Boyfriend - Ashley Simpson    44.1153   51.1288   84.0558


Experiment 2: Fig. 3 shows the input signal and the extracted voice signal from the wave file chammak challo.wav, composed by Vishal Shekhar, which is 35 seconds long, with a 44100 Hz sampling rate. MATLAB is used to plot the results. The results are tabulated in Table 3.

Figure 3 Input signal and the extracted voice signal from the wave file chammakchallo.wav

Table 3 SDRs of chammakchallo.wav

I/P wave file / Composer          FFT       STFT      DWT
Chammak Challo - Vishal Shekhar   35.5603   40.5822   83.47656607

Experiment 3: Fig. 4 shows the input signal and the extracted voice signal from the wave file toxic.wav, composed by Britney Spears, which is 27 seconds long, with a 44100 Hz sampling rate. MATLAB is used to plot the results. The results are tabulated in Table 4.

Figure 4 Input signal and the extracted voice signal from the wave file toxic.wav

Table 4 SDRs of toxic.wav

I/P wave file / Composer      FFT       STFT      DWT
Toxic - Britney Spears        56.2875   59.7563   85.7679


5. CONCLUSION

Audio source separation using the DWT has been presented. The Discrete Wavelet Transforms (DWTs) of the input signals were used to identify the time-frequency regions occupied by each source based on the panning coefficient, thereby improving the signal-to-noise and distortion ratios. From Tables 2, 3 and 4, it is evident that the results obtained using the DWT as the feature extractor are approximately 38% better than those of the two other feature extractors, showing that the proposed method provides better Signal to Noise and Distortion Ratios.

6. ACKNOWLEDGEMENTS

The authors 1 and 4 wish to acknowledge the technical support provided by Jasmin Infotech, India and Google India.

REFERENCES

[1] Kim, Y. and Whitman, B., Singer identification in popular music recordings using voice coding features, Proc. ISMIR 2002.
[2] P. Comon, Independent component analysis, a new concept?, Signal Processing, vol. 36, no. 3, pp. 287-314, April 1994.
[3] Chou, W. and Gu, L., Robust singing detection in speech/music discriminator design, Proc. ICASSP 2001.
[4] Berenzweig, A. and Ellis, D.P.W., Locating singing voice segments within music signals, Proc. WASPAA 2001.
[5] Kim, Y. and Whitman, B., Singer identification in popular music recordings using voice coding features, Proc. ISMIR 2002.
[6] Zhang, T., System and method for automatic singer identification, Proc. ICME 2003.
[7] Maddage, N.C., et al., Content-based music structure analysis with applications to music semantic understanding, Proc. ACM Multimedia 2004.
[8] Maximo Cobos and Jose J. Lopez, Singing voice separation combining panning information and pitch tracking, Audio Engineering Society Convention Paper, presented at the 124th Convention, 2008 May 17-20, Amsterdam, The Netherlands.
[9] J. F. Cardoso, Blind signal separation: statistical principles, Proceedings of the IEEE, vol. 86, no. 10, pp. 2009-2025, October 1998.
[10] T. W. Lee, M. S. Lewicki, M. Girolami and T. J. Sejnowski, Blind source separation of more sources than mixtures using overcomplete representations, IEEE Signal Processing Letters, vol. 6, no. 4, pp. 87-90, April 1999.
[11] A. S. Master, Stereo Music Source Separation via Bayesian Modeling, Ph.D. Dissertation, Stanford University, June 2006.
[12] P. D. O'Grady, B. A. Pearlmutter and S. T. Rickard, Survey of sparse and non-sparse methods in source separation, International Journal of Imaging Systems and Technology (IJIST), vol. 15, no. 1, pp. 18-33, 2005.
[13] C. Jutten and M. Babaie-Zadeh, Source separation principles, current advances and applications, presented at the 2006 German-French Institute for Automation and Robotics Annual Meeting, IAR 2006, Nancy, France, November 2006.


[14] K. Torkkola, Blind separation for audio signals: are we there yet?, Proceedings of the Workshop on Independent Component Analysis and Blind Signal Separation (ICA 1999), 1999.
[15] O. Yilmaz and S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830-1847, July 2004.
[16] C. Avendano, Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 2003.
[17] E. Vincent, R. Gribonval and C. Fevotte, Performance measurement in blind audio source separation, IEEE Transactions on Speech and Audio Processing, vol. 14, no. 4, pp. 1462-1469, 2006.
[18] C. Fevotte, R. Gribonval and E. Vincent, BSS_EVAL Toolbox User Guide, IRISA, Rennes, France, 2006.
[19] Ravindra M. Malkar, Vaibhav B. Magdum and Darshan N. Karnawat, An adaptive switched active power line conditioner using discrete wavelet transform (DWT), International Journal of Electrical Engineering & Technology (IJEET), vol. 2, issue 1, 2011, pp. 14-24, published by IAEME.

