
Applied Mathematics and Nonlinear Sciences, 9(1) (2024) 1-14

Applied Mathematics and Nonlinear Sciences


https://www.sciendo.com

The use and effective analysis of vocal spectrum analysis method in vocal music
teaching

Bo Zhang1,†
1. Department of Art and Design, Tongcheng Teachers College, Tongcheng, Anhui, 231400, China.

Submission Info

Communicated by Z. Sabir
Received February 12, 2024
Accepted April 14, 2024
Available online June 3, 2024

Abstract

As computer science and technology continue to evolve and become more pervasive, their application in analyzing the
audio spectrum of vocalizations offers valuable insights for vocal music education. This study introduces a method
utilizing Fourier transform analysis to examine time-frequency domain signals in vocal teaching. Initially, voice
frequencies are collected during vocal music instruction. Subsequently, these frequencies are processed to extract
characteristic sequences, which are then reduced in scale to develop a model for voice spectrum recognition tailored to
vocal music education. This model facilitates detailed spectral analysis, enabling the investigation of its auxiliary benefits
in vocal music teaching, particularly in identifying prevalent instructional challenges. Our findings indicate that during training on the vowels “a” and “i,” professional singers’ spectral level at 4 kHz declined to between −15 and −18 dB, whereas students’ levels varied around ±6 dB and trended upward. In cases of air leakage, significant gaps were observed near 5500 Hz, 10500 Hz, and 14500 Hz. Students also exhibited missing frequencies at 7 kHz, 12 kHz, and 14 kHz during glottal tone production, with pronounced, abrupt peaks and no natural transitions occurring when the vocal folds were tightly constricted. This research substantiates the theoretical and practical benefits of digital spectrum technology
in enhancing vocal music education, thereby providing a scientific and supportive role.

Keywords: Spectrum analysis; Fourier transform; Audio feature sequence; Vocal spectrum recognition; Vocal music
teaching.
AMS 2010 codes: 68T05

†Corresponding author.
Email address: 15855616959@163.com
ISSN 2444-8656
https://doi.org/10.2478/amns-2024-1361
© 2023 Bo Zhang, published by Sciendo.
This work is licensed under the Creative Commons Attribution alone 4.0 License.

1 Introduction

All sounds have their unique frequencies. Sound is generated by vibration, transmitted through a
medium, and received by the human ear. Throughout this process, whether the sound is music or noise, people cannot intuitively understand and analyze its acoustic characteristics. A spectrum analyzer solves this problem by mapping the sound onto spectral coordinates for analysis [1-3]. From the resulting image, the signal magnitude in every frequency band can be read intuitively, making it possible to understand and isolate the bands one needs to practice efficiently and so adjust the voice [4-6].

The human voice can be considered a musical instrument in the broadest sense of the word. However, the
human vocal organs are composed of soft muscles, unlike the hard materials of other musical
instruments, and the nerves that control these muscles differ from the nerves that control the
fingers, lips, and tongue [7-9]. It is because of these characteristics that vocal music seems so abstract,
relying largely on intuitive, subjective feeling rather than objective clarity [10-12].
By analyzing the spectra of different voices, the technical requirements, vocal aesthetics, and singing state embodied in a vocalist’s performance can be explored, and abstract vocal music can be visualized as
intuitive images [13-15].

With the continuous development of vocal art, more and more people have begun to study the nature
of sound. With the growing popularization of computers, people can understand and analyze sound
by scientific and technological means, approaching the unseen and untouchable art of vocal music
from another perspective [16-18]. In vocal learning, spectral analysis makes it possible to quickly
find the strengths and weaknesses of the voice, so that the direction of adjustment is clear and
learning efficiency improves [19].

In this paper, assuming that the reception vector of a given vocal frequency signal has been
determined, spectral sequence features are extracted from the detected vocal frequency signal,
and the trained audio-signal feature sequence is used to identify unknown signals; on this basis,
the vocal frequency dataset is constructed. The Fourier transform is used to determine the vocal
spectra in vocal music teaching and to analyze the signal in the time and frequency
domains. The model is applied to the spectra of “a” and “i” vocalizations of professional singers
and vocal students in order to compare them, find the differences, and test its positive auxiliary effect
in vocal music teaching. Finally, the three typical problems of “air leakage,” “glottal sound,” and
“vocal fold squeezing” are analyzed spectrally to identify the causes of students’ wrong vocalizations
and thereby improve the efficiency of vocal teaching.

2 Construction of the human voice spectral analysis model

2.1 Spectrum analysis algorithm

2.1.1 Time-frequency analysis of vocal signals

The Fourier transform is a linear integral transform that converts a time-domain signal into a
frequency-domain signal so that the intrinsic properties of the signal can be analyzed and understood
from a frequency-domain perspective.

Equation (1) is the mathematical representation of the Fourier series: $X(k\Omega)e^{jk\Omega t}$ is the complex sinusoidal component of the signal, where $X(k\Omega)$ represents the amplitude and $k\Omega$ the frequency of the complex sinusoid. The decomposition expresses the actual periodic signal $y(t)$ as a superposition of an infinite number of sinusoidal signals. From the expression, we know that $y(t)$ is also a periodic function with period $T = 2\pi/\Omega$:

$$y(t) = \sum_{k=-\infty}^{\infty} X(k\Omega)\, e^{jk\Omega t} \tag{1}$$

The DFT operates on discrete frequency- and time-domain signals, which a computer system can represent digitally. Actual signals may be of infinite length, but a computer cannot represent infinite-length signals, so the original signal is truncated, and this truncation is periodically extended to form the main sequence of the DFT. Let the length of the sequence $x(n)$ be $M$; it is extended with zeros to length $N \ge M$ as needed.

The $N$-point DFT is:

$$X(k) = \mathrm{DFT}[x(n)]_N = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \tag{2}$$

where $N$ is called the interval length of the discrete Fourier transform. For ease of writing, let $W_N = e^{-j 2\pi / N}$. Thus, the $N$-point DFT is usually denoted as:

$$X(k) = \mathrm{DFT}[x(n)]_N = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1 \tag{3}$$

The process of recovering the time-domain signal from its frequency-domain representation is known as the inverse Fourier transform. The $N$-point inverse discrete Fourier transform (IDFT) of $X(k)$ is defined as:

$$x(n) = \mathrm{IDFT}[X(k)] = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1 \tag{4}$$
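Equations (3) and (4) can be sketched directly in code. The following minimal NumPy implementation of the DFT/IDFT pair follows the definitions above (function names are illustrative, not from the paper), and checks the result against the library FFT and the round trip:

```python
import numpy as np

def dft(x):
    """N-point DFT from Eq. (3): X(k) = sum_n x(n) * W_N^{kn}, with W_N = e^{-j2*pi/N}."""
    N = len(x)
    k = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(k, k) / N)  # matrix of powers W_N^{kn}
    return W @ x

def idft(X):
    """N-point IDFT from Eq. (4): x(n) = (1/N) sum_k X(k) * W_N^{-kn}."""
    N = len(X)
    k = np.arange(N)
    W = np.exp(2j * np.pi * np.outer(k, k) / N)
    return (W @ X) / N

x = np.array([1.0, 2.0, 0.0, -1.0])
X = dft(x)
assert np.allclose(X, np.fft.fft(x))   # agrees with the library DFT
assert np.allclose(idft(X), x)         # the IDFT inverts the DFT
```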

The DFT also has many useful properties. If $x_1(n)$ and $x_2(n)$ are discrete $N$-point sequences and $X_1(k)$ and $X_2(k)$ are their corresponding DFTs, then:

$$\mathrm{DFT}[a x_1(n) + b x_2(n)] = a X_1(k) + b X_2(k) \tag{5}$$

This is called the linearity of the DFT: if the original signal is a linear combination of $x_1(n)$ and $x_2(n)$, then its DFT is the same linear combination of the DFTs of $x_1(n)$ and $x_2(n)$. The shift property of the DFT states that if the signal sequence $x(n)$ is shifted left or right by $m$ sampling points, its corresponding DFT equals the DFT of $x(n)$ multiplied by a coefficient depending on $m$ and $k$, i.e.:

$$\mathrm{DFT}[x(n+m)] = W_N^{-km} X(k), \qquad \mathrm{DFT}[x(n-m)] = W_N^{km} X(k) \tag{6}$$

The odd, even, imaginary, and real symmetries of the DFT are the theoretical basis for the filter design and for the optimization of the real FFT/IFFT algorithm in this work. For a complex sequence $x(n)$, if $x^*(n)$ is the complex conjugate of the original signal, then its DFT $X^*(k)$ satisfies the following relationship with the DFT of the original signal:

$$\mathrm{DFT}[x^*(n)] = X^*(-k) \tag{7}$$

In practical applications, the source sequences of the DFT are often purely real rather than complex; for example, the sound pressure signals in this work are discrete real sequences. When $x(n)$ is real, the DFT has the following special properties:

$$\begin{aligned}
X^*(k) &= X(-k) = X(N-k) \\
X_R(k) &= X_R(-k) = X_R(N-k) \\
X_I(k) &= -X_I(-k) = -X_I(N-k) \\
|X(k)| &= |X(N-k)| \\
\arg[X(k)] &= -\arg[X(-k)]
\end{aligned} \tag{8}$$

where $X_R(k)$ and $X_I(k)$ are the real and imaginary parts of the Fourier transform $X(k)$, respectively. When $x(n)$ is a real even function, its Fourier transform is a purely real sequence; if $x(n)$ is an odd function, its Fourier transform is a purely imaginary sequence.

2.1.2 Fast implementation of the Fourier transform

Completing the DFT for an $N$-point sequence requires $N^2$ complex multiplications and $N(N-1)$ complex additions. Each complex multiplication requires four real multiplications, and each complex addition requires two real additions. For $N = 2048$, a total of 16,777,216 real multiplications and 8,384,512 real additions are necessary.

The Fast Fourier Transform requires the DFT to have a transform interval of length $N = 2^M$, where $M$ is a natural number. According to the defining equation (3) of the DFT, the sum is decomposed by the parity of $n$ as:

$$X(k) = \sum_{l=0}^{N/2-1} x(2l)\, W_N^{2kl} + \sum_{l=0}^{N/2-1} x(2l+1)\, W_N^{k(2l+1)} \tag{9}$$

Let $x_1(l) = x(2l)$ and $x_2(l) = x(2l+1)$. Since $W_N^{2kl} = W_{N/2}^{kl}$, the above equation can be written as:

$$X(k) = \sum_{l=0}^{N/2-1} x_1(l)\, W_{N/2}^{kl} + W_N^k \sum_{l=0}^{N/2-1} x_2(l)\, W_{N/2}^{kl}, \quad k = 0, 1, 2, \ldots, N-1 \tag{10}$$

Denote by $X_1(k)$ and $X_2(k)$ the $N/2$-point DFTs of $x_1(l)$ and $x_2(l)$, respectively, i.e.:

$$X_1(k) = \mathrm{DFT}[x_1(l)]_{N/2} = \sum_{l=0}^{N/2-1} x_1(l)\, W_{N/2}^{kl}, \quad k = 0, 1, \ldots, N/2-1$$
$$X_2(k) = \mathrm{DFT}[x_2(l)]_{N/2} = \sum_{l=0}^{N/2-1} x_2(l)\, W_{N/2}^{kl}, \quad k = 0, 1, \ldots, N/2-1 \tag{11}$$

Using $W_N^{k+N/2} = -W_N^k$ and the implied periodicity of $X_1(k)$ and $X_2(k)$, one obtains:

$$X(k) = X_1(k) + W_N^k X_2(k), \qquad X\!\left(k + \frac{N}{2}\right) = X_1(k) - W_N^k X_2(k), \quad k = 0, 1, \ldots, \frac{N}{2}-1 \tag{12}$$

When $N = 2^3 = 8$, the first parity-extraction decomposition can be represented by a butterfly diagram. It can be shown that after one such decomposition, when $N \gg 1$, the number of operations for an $N$-point DFT is approximately halved. The decomposition should therefore be continued: after $M$ levels of time-domain parity extraction, the transform decomposes into $N$ 1-point DFTs and $M$ levels of butterfly computation with $N/2$ butterflies per level. A 1-point DFT is simply the 1-point time-domain sequence itself.
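The recursive decomposition above can be sketched compactly. The following illustrative radix-2 decimation-in-time FFT (not the paper's implementation) applies the butterfly of Eq. (12) at each level and stops at the 1-point DFT:

```python
import numpy as np

def fft_radix2(x):
    """Recursive decimation-in-time radix-2 FFT (N must be a power of two),
    applying the butterfly of Eq. (12):
      X(k)       = X1(k) + W_N^k * X2(k)
      X(k + N/2) = X1(k) - W_N^k * X2(k)
    A 1-point DFT is the sample itself."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    X1 = fft_radix2(x[0::2])  # even-indexed samples -> x1(l)
    X2 = fft_radix2(x[1::2])  # odd-indexed samples  -> x2(l)
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)  # twiddle factors W_N^k
    return np.concatenate([X1 + W * X2, X1 - W * X2])

x = np.arange(8, dtype=float)
assert np.allclose(fft_radix2(x), np.fft.fft(x))  # matches the library FFT
```

This is why the FFT requires $N = 2^M$: each level halves the problem until only trivial 1-point transforms remain.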

2.2 Vocal Spectrum Recognition in Vocal Music Teaching

2.2.1 Detection of vocal frequency signals

Assume that the received vector for a given human-voice audio signal has been determined. All transmit-channel matrices are traversed, the Euclidean distance between each candidate vector and the received vector is calculated, and the candidate with the smallest distance is taken as the vector of the audio signal. At a given transmission rate at the transmitter, the audio signal $x$ is estimated at the receiver as:

$$\hat{x} = \arg\min_{x \in A^{2N_i}} \| y - Gx \|^2 \tag{13}$$

where $N_i$ denotes the number of transmitting antennas, $y$ denotes the target (received) vector, $A$ denotes the amplitude set of the audio signal, and $G$ denotes the channel matrix.
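As a minimal sketch of Eq. (13), the following brute-force detector (an illustration, not the paper's algorithm) enumerates every candidate vector with entries drawn from the amplitude set and keeps the one closest to the received vector:

```python
import numpy as np
from itertools import product

def min_distance_detect(y, G, A):
    """Exhaustive minimum-distance detection of Eq. (13):
    x_hat = argmin ||y - G x||^2 over all x with entries from the amplitude set A.
    (Brute force; a cost function is what practical detectors use to prune
    this search, as the text describes next.)"""
    n = G.shape[1]
    best_x, best_cost = None, np.inf
    for cand in product(A, repeat=n):      # every candidate vector in A^n
        x = np.array(cand, dtype=float)
        cost = float(np.sum((y - G @ x) ** 2))
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x

G = np.array([[1.0, 0.5],
              [0.2, 1.0]])
x_true = np.array([1.0, -1.0])
y = G @ x_true + np.array([0.01, -0.02])  # received vector with slight noise
assert np.allclose(min_distance_detect(y, G, (-1.0, 1.0)), x_true)
```

The search space grows exponentially with the vector length, which is exactly why the cost function below is introduced to control complexity.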

To control the complexity of the algorithm, a cost function is introduced into the detection algorithm, expressed as:

$$\phi(x) = x^H G^H G x - 2\,\mathrm{Re}\!\left( y^H G x \right) \tag{14}$$

After the above calculation, $\phi(x)$ is negative; the smaller the result, the better the performance of the algorithm, and vice versa.

2.2.2 Audio signal feature sequence extraction

Spectral sequence feature extraction is performed on the detected vocal frequency signal to obtain a
high dimensional feature vector set B, B = b1 , b2 , , bN  , the feature vector set representation is in
the form of a multidimensional spectrogram, and N in the vector set denotes the frame index and
the difference distance between frame i and the rest of the N −1 is computed for the spectral
sequence respectively D0 The computational formula is as follows:

Dij = b j − bi i, j = 1,, N (15)

Combining all the computed distance vectors Di gives the N nd order matrix H . i.e.:

 0 D12 D13  D1N 


D 0 D23  D2 N 
 12
H =  D13 D23 0  D3 N  (16)
 
    
 D1N D2 N D3 N  0 

Where i and j denote the spectral frame indexes, Di denotes the difference between the two
spectrograms, bi and b3 denote the spectral sequence of frame i or j , respectively, and H
denotes the vector matrix consisting of all the difference values. By default, the spectral information
within J and J T is identical, and the upper triangular matrix is used to represent the entire matrix
in the subsequent operations to reduce the computational complexity.

To obtain all the information in the spectrogram for constructing the audio-signal feature sequence, the mean of the multidimensional spectrogram feature set and the distance between each frame's features and that mean are calculated as follows:

$$b_{ave} = \frac{1}{N} \sum_{i=1}^{N} b_i \tag{17}$$

$$diag_i = \| b_i - b_{ave} \| \tag{18}$$

where $b_{ave}$ denotes the average of the set of spectrogram features and $diag_i$ denotes the distance difference; these are combined to form the matrix $H_m$.

$$H_m = \begin{bmatrix}
diag_1 & D_{12} & D_{13} & \cdots & D_{1N} \\
0 & diag_2 & D_{23} & \cdots & D_{2N} \\
0 & 0 & diag_3 & \cdots & D_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & diag_N
\end{bmatrix} \tag{19}$$

The spectral sequence extracted through the above process is the feature sequence of the vocal audio signal. After the target-signal feature sequence is obtained, it is used as a training sample to recognize the vocal audio signal.

2.2.3 Signal Recognition

In the human-voice audio signal recognition stage, the trained sequence of audio-signal features is used to recognize unknown signals. The recognition task is divided into binary classification tasks, forming $C(C-1)/2$ binary classification models, denoted as:

$$\omega = \sum_{i=1}^{N} \alpha_i R_i (B_i S_i) \tag{20}$$

$$E = \sum_{i=1}^{N} \left( \alpha_i - \left[ \sum \alpha_i R_i (B_i S_i) \right] R_i (B_j S_i) \right) \tag{21}$$

where $\omega$ denotes the normal vector of the binary classification model, $E$ denotes the bias, $R_i$ denotes the regularization parameter, and $\alpha_i$ denotes the indicator function. For an input feature vector $H_m$, the judgment process of the binary classification model is:

$$\lambda = S(B_i \omega, E, R_i) \tag{22}$$

where $\lambda$ is the recognition result of the binary classification model. This completes the design of the human-voice frequency signal recognition algorithm based on a multidimensional spectrogram.

2.2.4 Building a Human Voice Frequency Dataset

Open-source MIDI sheet music is used as the recognition target for extracting audio-map features. The sampling rate is set to 44 kHz and the bit depth to 16 bits; the start and stop time of each note in the score is extracted, and a WAV file is synthesized as the standard template sequence for the experiment.

For a given pitch sub-band signal $s$, the short-time energy of the signal at sample $n \in N$ is calculated as:

$$W(s) = \sum_{r \in \left( n - \frac{d}{2},\, n + \frac{d}{2} \right]} |s(r)|^2 \tag{23}$$

where $d$ denotes the rectangular window length and $r$ indexes the samples within each window.
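A minimal sketch of Eq. (23), clipping the rectangular window at the signal boundaries (the boundary handling is an assumption, since the paper does not specify it):

```python
import numpy as np

def short_time_energy(s, n, d):
    """Short-time energy of Eq. (23): the sum of squared samples inside a
    rectangular window of length d centred on sample n (clipped at the
    signal boundaries)."""
    lo = max(0, n - d // 2)
    hi = min(len(s), n + d // 2 + 1)
    return float(np.sum(np.asarray(s[lo:hi], dtype=float) ** 2))

s = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
assert short_time_energy(s, n=2, d=2) == 6.0  # samples 1, 2, 3: 1 + 4 + 1
```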

After processing, the standard human voice frequency template sequence is obtained to complete the
construction of the experimental dataset.

3 Results and Discussion

3.1 Vocal Spectrum Analysis as an Aid to Vocal Teaching and Learning

After comparison samples of singers and students had been obtained, the method could be put to use in actual vocal teaching. A group of vocal students from a training school was selected to participate in the new teaching experience. First, the beginner vocalists taking the course practiced pronouncing “a” and “i,” and the spectrum was then used to display the pronunciation graphs in real time. Figure 1 shows the results of two tests of the same pronunciation at the same tuning position, and of the same pronunciation tested at different pitches and distances.

The singer's spectrum is shown in blue and the vocal student's in orange. The comparison shows that during pronunciation, the student's loudness in the bass frequencies is clearly deficient, while the middle and high frequencies are significantly higher than in the singer's spectrum. This phenomenon becomes especially obvious as the pitch rises: at 4 kHz, where the level should decline, the student's level is maintained or even increased at different pitches. While the professional singer's spectral curve at 4 kHz drops to a range of −15 dB to −18 dB, the student's level remains within ±6 dB and even tends to rise further. The student therefore has a significant laryngeal problem.

Figure 1. Professional singers contrast with vocal students singing

After identifying the direction of the problem, the vocal teacher can correct it through a series of targeted exercises: relaxing the jaw, keeping the student's oral cavity active, preventing the lower throat from exerting force, and relaxing the tongue. After the student shifted most of the deliberate effort from the laryngeal position to the breath, the throaty sound was clearly reduced, and the student's spectrum moved closer to the singer's. The corrected and improved spectrum is shown in Figure 2. Although the student still did not reach the singer's standard at 4 kHz, the original upward trend was changed to a downward trend in the −6 dB to −3 dB range. This fully demonstrates the effective role of spectrum analysis in vocal teaching.

Figure 2. Professional singers and vocal students singing contrast (improved)

Another student's articulation was similar to the first one's, with a tight laryngeal sound. The same sampling method was used to obtain a comparison of the student's and the professional singer's vocalization spectra, shown in Fig. 3.

The figure clearly shows that at the same loudness, the student's spectrum lacks obvious energy in the mid- and high-frequency parts compared with the singer's, accompanied by regional features characteristic of laryngeal problems. For example, around 3 kHz the singer's spectral curve drops from the −6 dB to −9 dB range down to the −15 dB to −18 dB range, while the student's drops only to the −9 dB to −12 dB range, leaving a clear gap between student and singer. In traditional teaching, because this articulation sounds close to a laryngeal problem, inexperienced teachers tend to seek the breakthrough in the breath. The spectrum, however, shows the main characteristics to be those categorized as a nasal problem, so emphasizing the breath would be counterproductive: the student's apparent laryngeal problem is caused by excessive nasality. The teacher therefore focused on solving the nasal-sound problem through a series of methods, such as adjusting the internal state of the mouth, singing with a speech-like state, concentrating the voice, and practicing mainly with closed-mouth vocalization supplemented by open-mouth vocalization. After correction, the student also made significant progress.

Figure 3. Professional singers contrast with vocal students singing

3.2 Spectral analysis of common problems in vocal music

In the process of vocal learning, singers who have not yet mastered correct vocal technique may experience problems with their voice. These problems are subtle in their effect on the student, and without a teacher nearby, singers often do not realize that their voices exhibit them. Spectrum analysis software makes it possible to analyze one's own voice spectrally. In vocal teaching, the teacher can use the resulting spectrum charts to help judge a student's incorrect vocal habits and correct the mistakes in time.

3.2.1 Spectral curves of sound leakage and their solutions

For students who have not mastered the correct method of vocalization, air leakage at the vocal folds is one of the more likely problems: the voice sounds breathy, unfocused, and scattered. Beginners, afraid of singing badly, find it easiest to hold back the vocal folds, so a teacher should guide students to dare to sing with the vocal folds engaged rather than hiding them; the more the vocal folds are held back, the more problems occur. Figure 4 shows the spectrogram of the correct state, and Figure 5 the spectrogram of the leakage state. The comparison shows that in the leakage state, certain frequency bands, such as those near 5500 Hz, 10500 Hz, and 14500 Hz, are missing, because the vocal folds are not vibrating fully.

Figure 4. Unleakable vocal cord spectrum Figure 5. Spectrum of acoustic leakage

There are two general causes of vocal fold leakage. The first is a damaged, diseased, or severely defective vocal fold. Many diseases affect the vocal folds, such as nodules, polyps, and acute inflammation; these seriously affect closure, resulting in uneven force when the folds close, so that the closure is not tight and air leaks through. The second cause is that the singer's vocal folds are not strong enough to close while the breath is too strong, leaving the folds powerless to close and block the airflow. In the first case, singers should seek medical treatment in time, preferably resting the voice, because a healthy vocal fold is fundamental to good sound quality, and singers need to protect their vocal folds. To avoid the second case, singers should adhere to the principle of gradual, orderly progress in daily practice and increase the difficulty of exercises step by step to avoid overusing the vocal cords.

3.2.2 Spectral curves and solutions for laryngeal sounds

A laryngeal (guttural) voice is also a common problem among beginning voice students: the voice is heavy, rough in tone, and dull. Why does the guttural voice sound dull and heavy? Figure 6 shows a typical frequency curve with a guttural-voice problem. The lower positions are louder while the higher frequencies are lacking, with missing values at 7 kHz, 12 kHz, and 14 kHz, so such a voice sounds dull and lackluster.

Figure 6. The spectral curve of the guttural sound

Beginners may not correctly understand what an “open throat” means, and this misunderstanding is a main cause of the pressed, oversized throat sound; so is deliberately imitating a singer's recorded voice while learning to sing, pursuing volume and thickness. Such students cannot send the voice on the breath into the nasopharyngeal cavity to obtain real resonance, so when they sing the larynx tenses, the voice is blocked in the throat, and the breath cannot flow. Some, in pursuit of resonance, also hold the throat open, making voice and breath incoherent. The vocal cords are severely strained when singing this way, which easily leads to vocal fatigue.

To address the laryngeal voice, beginners should first listen and observe more rather than merely imitating voices, which is a bad habit. Second, they should know their own level and not attempt works far too difficult for them. They should establish a correct concept so that the entire throat stays in a state of natural relaxation, relax the root of the tongue and the muscles around the throat, and avoid consciously holding the throat open. With a correct understanding and strengthened breath training, the breath can be felt flowing up and down in front of the pharyngeal wall, driving the vocal cords to vibrate and producing a feeling of relaxation in the throat.

3.2.3 Spectral curves of vocal folds “jammed” and their solutions

Vocal fold “squeezing” usually occurs because the breath is not smooth: the vocal folds lack breath support yet still have to produce sound, so they rely only on the surrounding muscles to vibrate, and the jaw, throat, and chest cavity feel very tense. Such a voice sounds stiff, pressed, and pale rather than free. A vocal teacher can analyze a student's voice with the help of spectral analysis software to help determine what problems the student may have. Figure 7 shows the spectrum of a typical vocal-fold squeeze, with very pronounced, abrupt peaks and no natural transitions. Observation shows that the voice is straight and does not vibrate as it typically does when relaxed.

Figure 7. The spectral curve of Cord tension

To solve the “squeezing” problem, the breath must be addressed first: establish the breath before the sound, and carry the sound on the breath. Only when the breath problem is solved can the excess tension around the vocal folds be released. A correct understanding of the voice must then be established: singing cannot blindly pursue a high or bright sound; only a voice grounded in proper vocal technique can convey the emotion of a song, and that is the beautiful sound.

4 Conclusion

In this paper, we constructed a vocal spectrum analysis model. We verified its auxiliary effect on
vocal teaching based on the comparison of professional singers’ and students’ vocal spectra. We
explored the performance of common vocal teaching problems in the spectrum and came to the
following conclusions:

1) In the vocal training of “a” and “i,” the professional singer's spectral curve at 4 kHz dropped to the range of −15 dB to −18 dB, while the student's remained within ±6 dB and was still rising. After correction guided by the spectrum analysis, the student's curve showed a downward trend reaching the −6 dB to −3 dB range, which confirms the effective auxiliary role of spectrum analysis in vocal music teaching.

2) Spectral analysis of the three typical problems of “air leakage,” “laryngeal sound,” and “vocal fold squeezing” in vocal music teaching shows that in the air-leakage state, the frequency bands near 5500 Hz, 10500 Hz, and 14500 Hz are missing. With a guttural sound, there are missing values at 7 kHz, 12 kHz, and 14 kHz. When the vocal folds are squeezed, the peaks are very pronounced and abrupt, with no natural transitions.

References
[1] Cao, W. (2022). Evaluating the vocal music teaching using backpropagation neural network. Mobile
Information Systems.
[2] Liu, W., & Shapii, A. B. (2022). Study on aesthetic teaching methods in ethnic music teaching in
universities in the context of intelligent internet of things. Scientific programming(Pt.16), 2022.
[3] Ma, X. (2021). Analysis on the application of multimedia-assisted music teaching based on ai technology.
Advances in multimedia(Pt.1), 2021.

[4] Bittner, R. M., Demetriou, A., Gulati, S., Humphrey, E. J., Reddy, S., & Seetharaman, P., et al. (2019).
An introduction to signal processing for singing-voice analysis: high notes in the effort to automate the
understanding of vocals in music. IEEE Signal Processing Magazine.
[5] Xu, Y. (2021). Systematic study on expression of vocal music and science of human body noise based on
wireless sensor node. Mobile information systems.
[6] Huang, M., & Zhang, Y. (2021). Design and construction of a pbl based evaluation index system for
classroom music education. International Journal of Emerging Technologies in Learning (iJET), 16(17),
107.
[7] Dimitrova-Grekow, T., Klis, A., & Igras-Cybulska, M. (2019). Speech emotion recognition based on
voice fundamental frequency. Archives of acoustics, 44(2), 277-286.
[8] Takeuchi, M., Soejima, Y., Ahn, J., Lee, K., Takaki, K., & Ifukube, T., et al. (2022). Development of a
hands-free electrolarynx for obtaining a human-like voice using the lpc residual wave. Electrical
engineering in Japan.
[9] Xiang, X., Zhang, X., & Chen, H. (2021). Acquisition and enhancement of remote human vocal signals
based on doppler radar. IEEE sensors journal(21-18).
[10] Han, J. H., Kwak, J.-H., Joe, D. J., Hong, S. K., Wang, H. S., Park, J. H., Hur, S., & Lee, K. J. (2018). Basilar membrane-inspired self-powered acoustic sensor enabled by highly sensitive multi tunable frequency band. Nano Energy, 53.
[11] Vijayan, K., Li, H., & Toda, T. (2018). Speech-to-singing voice conversion: the challenges and strategies for improving vocal conversion processes. IEEE Signal Processing Magazine.
[12] Raymundo, A. A., Akhtar, M. Z., Felipe, S. J., Douglas, O., & Henrique, F. T. (2018). Feature pooling of
modulation spectrum features for improved speech emotion recognition in the wild. IEEE Transactions
on Affective Computing, PP, 1-1.
[13] Zhang, Y., & Yi, D. (2021). A new music teaching mode based on computer automatic matching
technology. International Journal of Emerging Technologies in Learning (iJET)(16).
[14] Li, W. (2019). Design and implementation of music teaching assistant platform based on internet of things.
Transactions on Emerging Telecommunications Technologies.
[15] Nam, J., Choi, K., Lee, J., Chou, S. Y., & Yang, Y. H. (2019). Deep learning for audio-based music
classification and tagging: teaching computers to distinguish rock from bach. IEEE Signal Processing
Magazine.
[16] Xia, X., & Yan, J. (2021). Construction of music teaching evaluation model based on weighted naive Bayes. Scientific Programming.
[17] Zhang, Y., & Li, Z. (2021). Automatic synthesis technology of music teaching melodies based on
recurrent neural network. Scientific programming(Pt.13), 2021.
[18] Gan, L., Wang, D., Wang, C., Xiao, D., & Li, F. (2021). Design and implementation of multimedia
teaching platform for situational teaching of music appreciation course based on virtual reality.
International Journal of Electrical Engineering Education, 002072092098609.
[19] Neilsen, T. B., Vongsawad, C. T., & Onwubiko, S. G. (2020). Teaching musical acoustics with simple
models and active learning. The Journal of the Acoustical Society of America, 148(4), 2528-2528.
