
Project1 Final Report (Team 13)

Name 1:王启帆 SID 1: 12110206 Name 2:康耀中 SID 2: 12110225

Name 3:郑振涛 SID 3: 12111031 Name 4:王晨阳 SID 4: 12111020

Final Report for Project 1. Speech Synthesis and Perception with Envelope Cue

1.Introduction
According to survey data, the prevalence of hearing impairment in China is 1.62%. About 20 million people
currently live with hearing impairment, and the number of children with hearing loss grows by 20,000 to
40,000 each year. As the population ages, the number of elderly patients with presbycusis is expected to
rise further.

A cochlear implant (CI) is a surgically implanted neuroprosthesis that gives a person with bilateral
moderate-to-profound sensorineural hearing loss sound perception and, with therapy, an opportunity for
improved speech understanding in both quiet and noisy environments. A CI bypasses acoustic hearing by
stimulating the auditory nerve directly with electrical current. Cochlear implants are currently the only
treatment for severe deafness.

In this project, we studied the mechanism of human hearing and the principle of cochlear implants, and
implemented a simulation of CI speech processing in MATLAB.

2.Background information
In this project, the cochlear implant (CI) processing we implemented is modeled on the real ear. The real
ear receives different frequencies of speech at different places, and these places are not evenly distributed
over frequency. The correspondence between position and received frequency is shown in the following
figure.

Position-frequency correspondence formula: f = 165.4 * (10^(0.06*d) - 1), where f is the frequency in Hz
and d is the position along the cochlea.
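The mapping can be written as a pair of inverse functions. The following is a minimal sketch: the constants 165.4 and 0.06 are the ones that appear in our gen_vo1 code, while the names pos2freq and freq2pos are hypothetical and introduced only for illustration.

```matlab
% Position d (the distance variable used in this report) to frequency in Hz.
pos2freq = @(d) 165.4*(10.^(0.06*d) - 1);
% Frequency in Hz back to position d (inverse of pos2freq).
freq2pos = @(f) log10(f/165.4 + 1)/0.06;

d_limits = freq2pos([200 7000]);   % positions of the 200 Hz and 7000 Hz limits
f_back   = pos2freq(d_limits);     % round trip, should reproduce [200 7000]
fprintf('200 Hz -> d = %.2f, 7000 Hz -> d = %.2f\n', d_limits(1), d_limits(2));
```

Working in d rather than in Hz is what lets the band edges be spaced the way the ear spaces them.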


In a cochlear implant, speech is split into multiple bands by a bank of bandpass filters, and each band is
passed through a rectifier and a low-pass filter to extract its envelope. The resulting electrical signals
are sent to electrodes that stimulate specific areas of the patient's cochlea, which the patient perceives as
sound.

Because a typical cochlear implant has only 22 electrodes, the number of usable bands is limited. In this
project, we process the signal between 200 Hz and 7000 Hz and follow the same steps as a typical cochlear
implant, but instead of stimulating electrodes, we multiply each envelope by a carrier signal, sum the
channels, and finally normalize the output to the energy of the original signal.

3.Basic Task


We evaluate the recovered sound signal by listening to it directly and by inspecting time-domain plots,
spectrograms, and frequency-domain plots.

Task 1
Set LPF cutoff frequency to 50Hz, and change the number of bands, N from 1 to 8. Explore the effect
of N on intelligibility.

In this task, we used Butterworth filters designed with butter(4, ...) (for the bandpass case MATLAB returns
a filter of twice that order), and the envelope was obtained by full-wave rectification followed by low-pass
filtering. The band edges of the bandpass filters are spaced equally in position along the cochlea rather
than equally in frequency. The carrier of each band is the frequency that corresponds to the midpoint, in
cochlear position, between the two edges of that band. The formula for the carrier frequency is shown in the
following figure.

Code

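function s = gen_vo1(signal, N, fs, low_pass)
% Synthesize speech from N envelope-modulated sine carriers.
% Band edges are equally spaced in cochlear position between 200 Hz and 7000 Hz;
% each carrier is the frequency at the midpoint (in position) of its band.
signal = signal(:).';                        % force a row vector
d200  = log10(200/165.4 + 1)/0.06;           % position of 200 Hz
d7000 = log10(7000/165.4 + 1)/0.06;          % position of 7000 Hz
d = (d7000 - d200)/N;                        % width of one band in position
x = (0:length(signal)-1)/fs;                 % time axis in seconds
s = zeros(1, length(signal));
[b_lp, a_lp] = butter(4, low_pass/(fs/2));   % low-pass filter for the envelope
for i = 1:N
    fq_low  = 165.4*(10^(0.06*(d*(i-1) + d200)) - 1);     % lower band edge
    fq_high = 165.4*(10^(0.06*(d*i + d200)) - 1);         % upper band edge
    [b, a] = butter(4, [fq_low, fq_high]/(fs/2));         % band-pass filter
    y = filter(b, a, signal);
    y = abs(y);                                           % full-wave rectification
    y = filter(b_lp, a_lp, y);                            % envelope
    fq_center = 165.4*(10^(0.06*(d*(i-0.5) + d200)) - 1); % carrier; optionally + 1000*rand(1)
    s = s + y.*sin(2*pi*fq_center.*x);                    % modulate and accumulate
end
s = s/norm(s)*norm(signal);                  % match the energy of the input
end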
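To make the band layout concrete, the edges and carriers that gen_vo1 uses can be computed on their own. This is a minimal sketch under the same constants as above; N = 4 is only an example value, and the helper names pos2freq and freq2pos are hypothetical.

```matlab
N = 4;                                        % number of bands (example value)
pos2freq = @(d) 165.4*(10.^(0.06*d) - 1);     % position -> frequency in Hz
freq2pos = @(f) log10(f/165.4 + 1)/0.06;      % frequency in Hz -> position

d_lo = freq2pos(200);
d_hi = freq2pos(7000);
d_step = (d_hi - d_lo)/N;                     % equal steps in position

edges    = pos2freq(d_lo + d_step*(0:N));           % N+1 band edges in Hz
carriers = pos2freq(d_lo + d_step*((1:N) - 0.5));   % N carriers at band midpoints
disp(round(edges));
disp(round(carriers));
```

The bands come out narrow at low frequencies and progressively wider toward 7000 Hz, and each carrier lies between the two edges of its band.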

Result

The spectrum and spectrogram show that as the number of passbands increases, the frequency content becomes
richer, more information is retained, and the output moves closer to the original signal. However, when
N > 100, the time-domain signal begins to deform and the amplitude of its last segment starts to grow, while
the energy in the frequency domain begins to concentrate at a single frequency. When N exceeds 175, the
time-domain plot degenerates into a meaningless funnel shape, and only two frequencies have non-zero values
in the frequency domain. We interpret this as follows. At first, more passbands make the spectrum richer and
closer to the true one. Beyond a certain limit, however, the average energy of each passband becomes so
small that most bands are effectively lost, and only one band, or a narrow range of bands, retains a large
value. As a result, only one frequency remains in the spectrum, and the time-domain signal becomes
meaningless.


Task 2
Set the number of bands to N=4, and change the LPF cutoff frequency. Explore the effect of cutoff
frequency on intelligibility.

We find that the restored sound improves as the cutoff frequency increases. In the spectrogram, many small
components appear next to the original carrier frequencies, and comparison with the original spectrogram
shows that these components partially restore the original signal, which explains the improvement. The
filtered envelope signal shows the same thing. However, the purpose of the low-pass filter in envelope
extraction is to suppress other frequencies so that each channel contains only its carrier frequency and can
be perceived through the cochlear implant, so it is not advisable to raise the cutoff frequency blindly. In
our repeated experiments, the restoration quality essentially stopped improving once the cutoff frequency
exceeded 2000 Hz. A possible reason is that 2000 Hz already retains most of the information, so little is
gained from anything above it.
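The small components next to each carrier can be reproduced with a synthetic example: an envelope component at f_m multiplying a carrier at f_c creates spectral lines at f_c - f_m and f_c + f_m, so an envelope limited to the cutoff frequency can only add components within that distance of the carrier. The sketch below uses made-up values (a 1000 Hz carrier and a 50 Hz envelope tone) rather than speech.

```matlab
fs = 16000;                          % sampling rate in Hz (assumed)
t  = (0:fs-1)/fs;                    % exactly 1 s, so FFT bins are 1 Hz apart
fc = 1000;                           % carrier frequency in Hz (example)
fm = 50;                             % envelope fluctuation rate in Hz (example)

env = 1 + 0.5*cos(2*pi*fm*t);        % synthetic envelope with one component
s   = env.*sin(2*pi*fc*t);           % envelope-modulated carrier

S = abs(fft(s))/length(s);           % magnitude spectrum
f = (0:length(s)-1)*fs/length(s);    % frequency of each bin in Hz
half = S(1:fs/2);                    % keep bins below fs/2
[~, idx] = sort(half, 'descend');
peaks_hz = sort(f(idx(1:3)));        % the three strongest spectral lines
disp(peaks_hz);                      % carrier plus one sideband on each side
```

Raising fm moves the two sidebands further from the carrier, which matches the widening of the components around each carrier when the cutoff frequency is increased.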


Task3 & Task4

Generate a noisy signal at an SNR of -5 dB, set the LPF cutoff frequency to 50 Hz, and change the number of
bands N from 2 to 16. Explore the effect of N on intelligibility, and compare the findings with those
obtained in Task 1.

Generate a noisy signal at an SNR of -5 dB, set the number of bands to N=6, and change the LPF cutoff
frequency from 20 Hz to 400 Hz.

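Code: generate the noise

function sig = noise(signal, fs, SNR)
% Generate speech-shaped noise (SSN) scaled to the given SNR (in dB) relative to signal.
% signal is expected to be a row vector.
sig_noise = repmat(signal, 1, 10);                % repeat the speech for a smoother estimate
[Pxx, w] = pwelch(sig_noise, [], [], 512, fs);    % power spectrum of the speech
b = fir2(3000, w/(fs/2), sqrt(Pxx/max(Pxx)));     % FIR filter shaped like the speech spectrum
noise_white = 1 - 2*rand(1, length(signal));      % white noise, uniform between -1 and 1
sig = filter(b, 1, noise_white);                  % shape the white noise
sig = sig/norm(sig)*norm(signal)/(10^(SNR/20));   % scale to obtain SSN at the target SNR
end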
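The last line of the function fixes the SNR through the ratio of the two norms. The sketch below checks that scaling step in isolation. It is only an illustration: it uses plain white noise and a sine tone as a stand-in for speech, so it needs no toolbox, and all variable names are hypothetical.

```matlab
fs = 16000;                                   % sampling rate in Hz (assumed)
t  = (0:fs-1)/fs;
signal = sin(2*pi*440*t);                     % stand-in for the speech signal
SNR = -5;                                     % target SNR in dB

n = 1 - 2*rand(1, length(signal));            % white noise (not speech-shaped)
n = n/norm(n)*norm(signal)/(10^(SNR/20));     % same scaling as in noise()
noisy = signal + n;                           % mixture at the target SNR

measured_snr = 20*log10(norm(signal)/norm(n));
fprintf('target %.1f dB, measured %.4f dB\n', SNR, measured_snr);
```

At -5 dB the noise carries more energy than the speech, which is why the noisy cases are harder to understand for the same N and cutoff frequency.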

In Task 3 and Task 4, the conclusions of Task 1 and Task 2 still hold under noise: with a sufficient number
of passbands and a higher cutoff frequency, the original speech can be heard more clearly in the noise.


4.Alternative signal processing strategies

1.Limiter
The first part is a limiter, whose purpose is simple: to keep signal peaks below a threshold. We also wrote
a variant with an adjustable threshold that attenuates, rather than clips, the part of the signal above the
threshold. Our idea is to cut off the relatively large noise peaks so that the original signal becomes
clearer. In our tests, the limiter did make the signal clearer, but only to a limited extent.
The effect of the limiter is shown in the figure below.
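A limiter of this kind can be sketched in a few lines. This is not our original implementation, only an assumed form of it: the function name limit and the ratio parameter are hypothetical. With ratio = 0 the peaks are clipped at the threshold, and with 0 < ratio < 1 the part above the threshold is only attenuated.

```matlab
% limit(x, th, ratio): samples with |x| <= th pass unchanged; the excess
% above th is multiplied by ratio (0 = hard clipping, 1 = no limiting).
limit = @(x, th, ratio) sign(x).*min(abs(x), th + (abs(x) - th)*ratio);

x = [-2 -0.5 0 0.5 2];       % example samples
hard = limit(x, 1, 0);       % peaks clipped at the threshold
soft = limit(x, 1, 0.5);     % excess above the threshold halved
disp(hard);
disp(soft);
```

Samples below the threshold are never changed, so the limiter only acts on the loudest parts of the signal.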


2.F0
At first, we used a Fourier transform of the signal to find the fundamental frequency, but it did not work
well. Later, we switched to short-time spectral analysis with a spectrogram. The following is part of the
code:

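spec = spectrogram(y, chebwin(fs/10));       % spectrogram with a Chebyshev window of fs/10 samples
peak = zeros(size(spec,2), 1);               % frequency offset (in bins) for each time frame
scaler_f = fs/2 / size(spec,1);              % approximate width of one frequency bin in Hz
threshold = max(max(abs(spec)))/40;          % minimum prominence: 1/40 of the global maximum
for j = 1:size(spec,2)
    [~, locs] = findpeaks(abs(spec(:,j)), ...
        "Npeaks", 1, "MinPeakProminence", threshold);
    if locs > 1                              % false when no peak was found (locs is empty)
        peak(j) = locs - fq_center/scaler_f; % offset from the center frequency, in bins
    end
end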

First, a spectrogram of the input signal y is computed with the specified window function. The threshold is
then set to 1/40 of the maximum absolute value of the spectrogram. Next, the loop iterates over each column
of the spectrogram, that is, over each time frame. For each column, findpeaks is called on the magnitude,
returning at most one peak whose prominence exceeds the threshold. If a peak is found, its position minus
the position of the center frequency is stored in the entry for the current column. The result is a
frequency offset map. Applying this frequency offset improves the intonation of the sound and also has some
denoising effect.
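The idea of tracking one spectral peak per time frame can also be illustrated without spectrogram and findpeaks. The sketch below swaps in a simpler technique, a plain FFT per frame with max and no prominence threshold or window function, and runs it on a synthetic 440 Hz tone. It is an illustration of the principle under those assumptions, not the code used in the project.

```matlab
fs = 16000;                               % sampling rate in Hz (assumed)
t  = (0:fs-1)/fs;
y  = sin(2*pi*440*t);                     % synthetic test tone

frame_len = fs/10;                        % 100 ms frames, so bins are 10 Hz apart
n_frames  = floor(length(y)/frame_len);
peak_hz   = zeros(n_frames, 1);           % strongest frequency in each frame

for j = 1:n_frames
    frame = y((j-1)*frame_len + (1:frame_len));
    mag = abs(fft(frame));
    [~, k] = max(mag(1:frame_len/2));     % strongest bin below fs/2
    peak_hz(j) = (k - 1)*fs/frame_len;    % convert bin index to Hz
end
disp(peak_hz.');
```

Subtracting a channel's center frequency from peak_hz would give a per-frame offset in Hz, the counterpart of the peak array above, which is expressed in bins.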

5. Other comparison
1. envelope()
We tried using the envelope() function in MATLAB to extract the envelope of a signal, and initially found
that it worked well. However, after adjusting the low-pass cutoff frequency used for envelope extraction, we
found that the envelope function behaved like a filter with a high cutoff frequency, an effect that was also
visible in the spectrum plot. Compared with the envelope function, rectification followed by a Butterworth
low-pass filter gave better results in terms of reducing noise interference.


y = envelope(y);

2.Dividing the signal equally in frequency

We also tried dividing the band equally in frequency, with each carrier at the center of the upper and lower
band edges. This method was less effective than our previous method of dividing the band according to
position along the cochlea. To find the reason, we compared the frequency plots and found that dividing by
cochlear position places a larger share of the passbands at low frequencies, which carry more information
and therefore lead to a more successful recovery. The carrier frequency itself had no significant effect on
the recognizability of the recovered sound. When we replaced the carrier frequency derived from cochlear
position with a random frequency, the pitch of the sound changed on every run, but the speech remained
recognizable.
The final conclusion is that we should use more bands at low frequencies to acquire information, just like a
real ear. The choice of carrier frequency changes the pitch but does not affect recognition.
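The difference between the two band layouts can be checked numerically. The sketch below compares how many bands lie entirely below 1000 Hz for N = 8. The 1000 Hz limit is an arbitrary reference chosen for illustration, and the helper names are hypothetical.

```matlab
N = 8;                                        % number of bands
pos2freq = @(d) 165.4*(10.^(0.06*d) - 1);     % position -> frequency in Hz
freq2pos = @(f) log10(f/165.4 + 1)/0.06;      % frequency in Hz -> position

% Equal division in frequency between 200 Hz and 7000 Hz.
edges_lin = 200 + (7000 - 200)*(0:N)/N;

% Equal division in cochlear position between the same limits.
d_lo = freq2pos(200);
d_hi = freq2pos(7000);
edges_coch = pos2freq(d_lo + (d_hi - d_lo)*(0:N)/N);

% Count the bands whose upper edge is at or below 1000 Hz.
n_lin  = sum(edges_lin(2:end)  <= 1000);
n_coch = sum(edges_coch(2:end) <= 1000);
fprintf('bands entirely below 1 kHz: equal frequency %d, cochlear position %d\n', ...
    n_lin, n_coch);
```

With equal frequency division the first band alone already extends past 1000 Hz, so the low-frequency region is represented by a single envelope, while the position-based layout spends several bands there.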

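function s = gen_vo(low_band, high_band, signal, fs, low_pass)
% Return one band of the signal after envelope extraction and carrier modulation.
% The carrier is the midpoint of the two band edges; the caller defines the
% passbands by dividing the frequency range equally.
[b, a] = butter(4, [low_band, high_band]/(fs/2));   % band-pass filter
y = filter(b, a, signal);
y = abs(y);                                         % full-wave rectification
[b_lp, a_lp] = butter(4, low_pass/(fs/2));          % low-pass filter for the envelope
y = filter(b_lp, a_lp, y);
x = 0:length(y)-1;
x = x./fs;                                          % time axis in seconds
s = y.*sin(2*pi*(low_band + high_band)/2.*x);       % modulate the carrier
end

s1_5_n = zeros(1, length(y1));
N = 8;
for i = 1:N                                         % sum N equally wide bands
    s1_5_n = s1_5_n + gen_vo(200+(i-1)*(7000-200)/N, 200+i*(7000-200)/N, y1, fs1, 50);
end
s1_5_n = s1_5_n/norm(s1_5_n)*norm(y1);              % match the energy of the input
figure;
sound(s1_5_n, fs1);
spect(s1_5_n, zeros(1,length(y1)), fs1, 'Equal frequency division');  % spect: our plotting helper, not listed in this report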


fq_center = 165.4*(10^(0.06*(d*(i-0.5)+d200))-1)+2000*rand(1); % random carrier frequency

3.Virtual channels
In our experiments, we found that the more channels there are, the better the synthesized speech. In
practice, however, the number of physical electrodes is limited, usually 12-22, so the best possible effect
must be achieved with a limited number of electrodes. In physical implementations, adjacent electrodes can
be stimulated simultaneously to form discernible virtual channels, which can improve the performance of
cochlear implants.

Summary
In this project, we found that the number of passbands and the cutoff frequency for envelope extraction both
have a significant impact on recognizability. Increasing the number of passbands (while N stays below 100)
and increasing the cutoff frequency for envelope extraction (up to about 2000 Hz) both significantly
improved the recognizability of the sound. We also ran other experiments, such as using a limiter and
extracting the fundamental frequency to improve recognizability, and we compared several methods for
recovering the sound (different filters, different methods for extracting the envelope, different passbands
and carriers). Not every attempt succeeded: we tried four algorithms (similarity, spectral distance, dynamic
time warping, and the norm of the time-domain difference) to quantify the quality of the recovered signal,
but none of them was suitable for this project. Still, we learned a lot from these failures, and overall
this project was extremely beneficial to us.
