1.7 MATLAB Exercise – Sampling Rate Conversion Between Typical Speech and Audio Rates
Program Directory: matlab_gui\SRC
Program Name: SRC_GUI25.m
GUI data file: SRCONV.mat
Callbacks file: Callbacks_SRC_GUI25.m
TADSP: Sections 2.5.3-2.5.6, pp. 47-55.
Speech and audio waveforms are generally sampled at a number of standard rates, including 2000, 4000, 6000,
6667, 8000, 10000, 16000, 20000 and 40000 Hz. Often speech processing (within a MATLAB m-file) expects speech
or audio to be sampled at one of these standard rates. Hence if the sampling rate for an input signal to a MATLAB
algorithm or application is not the required rate, it is often necessary to convert the sampling rate, do the processing at
the specified rate, and convert back to the original sampling rate (e.g., for a speech processing algorithm).[1]
This MATLAB exercise provides a simple sampling rate converter between the standard sampling rates of 2000, 4000,
6000, 6667, 8000, 10000, 16000, 20000 and 40000 Hz, as well as between any pair of sampling rates that have an
integer least common multiple.
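
As an illustration of the arithmetic behind such a conversion, the following minimal MATLAB sketch derives the integer interpolation and decimation factors for the 16,000 to 10,000 Hz case. It is not taken from SRC_GUI25.m: the variable names are illustrative, the 1 kHz tone is only a stand-in for a speech file, and the sketch assumes the Signal Processing Toolbox function resample is available.

```matlab
% Sketch: derive integer up/down factors for a rational rate change.
% fs_in and fs_out are illustrative values, not read from the GUI.
fs_in  = 16000;              % original sampling rate (Hz)
fs_out = 10000;              % target sampling rate (Hz)

g = gcd(fs_in, fs_out);      % greatest common divisor of the two rates
L = fs_out / g;              % interpolation (upsampling) factor
M = fs_in  / g;              % decimation (downsampling) factor
fs_mid = fs_in * L;          % intermediate rate, equal to lcm(fs_in, fs_out)

fprintf('L = %d, M = %d, intermediate rate = %d Hz\n', L, M, fs_mid);

% A 1 kHz test tone stands in for a speech file (assumption).
t = (0:fs_in-1)' / fs_in;    % one second of sample times
x = sin(2*pi*1000*t);

% resample (Signal Processing Toolbox) upsamples by L, applies a
% lowpass filter, and downsamples by M in a single call.
y = resample(x, L, M);
```

For this pair of rates the sketch gives L = 5 and M = 8, so the signal passes through an intermediate rate of 80,000 Hz, which is the least common multiple of the two rates.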

Sampling Rate Conversion – GUI Design


The GUI for this exercise consists of three panels, two graphics panels, one title box, and 13 buttons. The functionality of
the three panels is:

1. one panel for the graphics display,


2. one panel for parameters related to retrieving an existing speech file and allowing the user to play the original
speech signal, plot the original signal waveform, and plot the original signal long time spectrum,
3. one panel for specifying the name of the output file used to save the sampling rate converted speech, specifying
the new sampling rate of the signal and for performing sampling rate conversion, playing the converted speech
signal, plotting the converted speech signal, plotting the converted speech long time spectrum, and saving the
converted speech in the specified output filename.

The top graphics panel is used to display a waveform plot of the original speech waveform or its long time spectrum,
and the bottom graphics panel is used to display the sampling rate converted waveform or its long time spectrum. The
title box displays information about the file that is sampling rate converted, and the functionality of the 13 buttons is:

1. a pushbutton to select the directory with the desired speech file,


2. a popupmenu button to display the list of available speech/audio files in the directory speech_files, and to
select the speech file to be processed by the Sampling Rate Conversion code,
3. a pushbutton to play the original speech/audio file at the original sampling rate,
4. a pushbutton to plot the original signal waveform,
5. a pushbutton to plot the original signal long time spectrum,
6. an editable text button to specify the name of the file used to store the sampling rate converted
signal (initially set to ’Output File Conv’); note that the sampling rate converted signal can be stored in any desired
MATLAB directory,
7. an editable button to specify the converted signal sampling rate (initially set to 10000 samples per second),

[1] For some speech and audio processing systems, sampling rate conversion back to the original rate may not be necessary; e.g., when the result of the processing is not the signal itself but a parametric representation of the speech signal which can be well represented at the converted rate.

8. a pushbutton to convert the signal sampling rate,

9. a pushbutton to play the converted speech signal,

10. a pushbutton to plot the converted speech signal,

11. a pushbutton to plot the converted speech long time spectrum,

12. a pushbutton to save the converted speech in the file designated by an earlier button, in a user-designated directory,

13. a pushbutton to terminate the GUI.

Sampling Rate Conversion – Scripted Run


A scripted run of the program ’SRC_GUI25.m’ is as follows:

1. run the program ’SRC_GUI25.m’ from the directory ’matlab_gui\SRC’,

2. hit the pushbutton labeled ’Speech Directory’ and click ’OK’ on the pre-selected directory ’speech_files’,

3. select the speech file ’test_16k.wav’ from the list of files in the popupmenu button,

4. hit the ’Play Original Signal’ button to play the signal,

5. hit the ’Plot Original Signal Waveform’ button to plot the original speech signal in the top graphics panel
(displacing any previous plot),

6. hit the ’Plot Original Signal Long Time Spectrum’ button to plot the original signal long time spectrum in the
top graphics panel (displacing any previous plot),

7. using the defaults for the sampling rate converted speech filename (’Output File Conv’), and for the sampling
rate of the converted file (10,000 samples/second), hit the ’Convert Sampling Rate’ button to convert the signal
to the new sampling rate,

8. hit the ’Play Converted Speech Signal’ button to play the converted speech signal,

9. hit the ’Plot Converted Speech Signal’ button to plot the converted speech signal in the bottom graphics panel
(again displacing any existing plot),

10. hit the ’Plot Converted Speech Long Time Spectrum’ button to plot the long time log magnitude spectrum of
the converted speech in the lower graphics panel (again displacing any previous plot),

11. hit the ’Save Converted Speech in File’ button to save the converted speech file in a user-selected directory,
using the filename in the designated button above,

12. experiment with different speech files and different sampling rate conversion parameter settings to see the effect
of changes in the sampling rate,

13. hit the ’Close GUI’ button to terminate the GUI.

The resulting graphical display for converting the speech file ’test_16k.wav’ (originally having a sampling rate of
16,000 samples/second) to a converted file with a sampling rate of 10,000 samples/second, is shown in Figure 1.

Figure 1: Example of graphical output of program ’SRC_GUI25.m’ for the speech file ’test_16k.wav’
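
For readers who prefer the command line, the main GUI steps of the scripted run can be approximated by a short script. The following is a minimal sketch and not the code behind SRC_GUI25.m: the temporary file names are hypothetical, the input is a synthesized two-tone signal so that the example runs on its own, and resample from the Signal Processing Toolbox is assumed to stand in for whatever conversion method the GUI uses.

```matlab
% Sketch of the GUI steps as a script. The input file is synthesized
% here so the example is self-contained; substitute any speech file.
fs_in   = 16000;                                   % original rate (Hz)
fs_out  = 10000;                                   % converted rate (Hz)
infile  = fullfile(tempdir, 'src_demo_in.wav');    % hypothetical name
outfile = fullfile(tempdir, 'src_demo_out.wav');   % hypothetical name

% Stand-in "speech": tones at 500 Hz and 7 kHz, one second long.
t = (0:fs_in-1)' / fs_in;
audiowrite(infile, 0.4*sin(2*pi*500*t) + 0.4*sin(2*pi*7000*t), fs_in);

[x, fs] = audioread(infile);      % steps 2-3: select and load the file
% soundsc(x, fs);                 % step 4: play original (uncomment)

[L, M] = rat(fs_out / fs);        % step 7: rational conversion factors
y = resample(x, L, M);            %         convert the sampling rate
% soundsc(y, fs_out);             % step 8: play converted (uncomment)

% steps 5 and 9: waveform plots in an upper and a lower axis
subplot(2,1,1); plot((0:length(x)-1)/fs, x);
title(sprintf('Original, fs = %d Hz', fs)); xlabel('time (s)');
subplot(2,1,2); plot((0:length(y)-1)/fs_out, y);
title(sprintf('Converted, fs = %d Hz', fs_out)); xlabel('time (s)');

y = y / max(1, max(abs(y)));      % rescale only if a peak exceeds 1
audiowrite(outfile, y, fs_out);   % step 11: save the converted signal
```

The 7 kHz tone lies above the 5 kHz Nyquist frequency of the converted signal, so the anti-aliasing filter inside resample should attenuate it; listening to both signals gives a rough analogue of the comparison made in the GUI.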

Sampling Rate Conversion – Issues for Experimentation


1. run the scripted exercise above, and answer the following:

• can you hear the difference between the original speech file (with a sampling rate of fs = 16,000 samples
per second), and the sampling rate converted signal (with a sampling rate of fs = 10,000 samples per
second)?
• can you see the difference between the original speech file waveform display and the waveform display of
the converted signal?
• can you see the difference between the original speech file long time spectrum and the long time spectrum
of the converted signal?
• what sounds of this speech utterance are most affected by the sampling rate conversion process? Can you
see the differences in the waveform plots?

2. change the sampling rate of the converted file to fs = 8000 samples per second (down from the original
fs = 16,000 samples per second); what effect does the reduced sampling rate have on the speech signal?
3. change the sampling rate of the converted file to fs = 6000 samples per second (down from the original
fs = 16,000 samples per second); what effect does the reduced sampling rate have on the speech signal
naturalness and intelligibility?
4. change the sampling rate of the converted file to fs = 20,000 samples per second (up from the original
fs = 16,000 samples per second); what effect does this up-sampling have on the speech signal intelligibility
or naturalness?
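
The spectral comparisons asked for above can also be explored outside the GUI. The sketch below overlays an averaged log magnitude spectrum of a test signal before and after conversion from 16,000 to 8000 Hz. It assumes that a Welch averaged periodogram (pwelch, Signal Processing Toolbox) is an acceptable stand-in for the long time spectrum; the GUI may compute its spectrum differently, and the three-tone test signal and the window length are illustrative choices.

```matlab
% Sketch: compare averaged spectra before and after rate conversion.
fs_in  = 16000;                          % original rate (Hz)
fs_out = 8000;                           % converted rate (Hz)

% Two-second stand-in signal with tones at 1, 3 and 6.5 kHz.
t = (0:2*fs_in-1)' / fs_in;
x = sin(2*pi*1000*t) + 0.5*sin(2*pi*3000*t) + 0.5*sin(2*pi*6500*t);

g = gcd(fs_in, fs_out);
y = resample(x, fs_out/g, fs_in/g);      % here: up by 1, down by 2

% Welch estimate: Hamming window, 50% overlap, one-sided spectrum.
nwin = 512;                              % window length (assumption)
[Px, fx] = pwelch(x, hamming(nwin), nwin/2, nwin, fs_in);
[Py, fy] = pwelch(y, hamming(nwin), nwin/2, nwin, fs_out);

plot(fx, 10*log10(Px), fy, 10*log10(Py));
xlabel('frequency (Hz)'); ylabel('power/frequency (dB/Hz)');
legend('original, 16 kHz', 'converted, 8 kHz');
```

The converted spectrum extends only to 4000 Hz, half the new sampling rate, so any energy the original signal had above that frequency (the 6.5 kHz tone in this sketch) cannot be represented after conversion.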
