Scott McCoy | 73

Chapter 6
Voice Analysis

Voice science has provided a wealth of information about the voice and its function, much of which is directly applicable to artistic singing. Research into acoustics and biomechanics, including issues of breath pressure and airflow, has been particularly important. This chapter will explore some of the ways this information trickles down to singers and teachers for practical use in the voice studio.

Acoustic voice analysis might have originated in the scientific community, but it certainly has not remained exclusively in that domain. Thanks to the pioneering work of prominent pedagogues, including William Vennard, Ralph Appleman, and Richard Miller, analysis has become an important, accessible tool for singers and their teachers. Analysis instruments, such as oscilloscopes and spectrum analyzers, formerly were found only in well-funded laboratories. But the personal computer has changed everything. With appropriate software, an inexpensive PC⁴ becomes a comprehensive analysis station, something that would have cost tens of thousands of dollars less than a generation ago. Ordinary singers and teachers are now able to see the voice using programs and technology that are affordable and accessible. Indeed, if you can write an e-mail or log on to Facebook®, your technical savvy is sufficiently advanced to use and understand many of these tools.

Voice laboratories that include sophisticated analysis instruments are increasingly common in music schools around the world, many of which have capabilities extending well beyond acoustics.⁵ Vocal fold vibration can be examined noninvasively through a process called electroglottography. Breath support can be evaluated through inductive plethysmography, which calculates the range of motion in the thorax and abdomen during respiration, and electromyography, which measures electrical activity in individual muscles during their contraction.
Aerodynamic systems are available to measure airflow and the subglottal air pressure required to initiate phonation (reliable ways to measure subglottal pressure during phonation are more invasive). Nasality can be quantified through airflow and acoustic measures. Hybrid instruments even exist that can explore acoustics, airflow, and vocal fold movement simultaneously. While most of these instruments are beyond the fiscal means of individual singers and teachers, a respectable voice lab can be set up in an institution for no more than the cost of a typical computer, music theory, or piano lab.

4 At the time of this writing, computerized voice analysis remains firmly rooted in the PC world. Options for Mac are limited but improving.
5 A partial list of notable institutions with significant labs includes Oberlin, The Ohio State University, Shenandoah Conservatory, Westminster Choir College, University of Kansas, University of Texas-Austin, University of Texas-San Antonio, and the University of North Texas. This list is far from inclusive and additional schools establish labs every year.

74 | The Basics of Voice Science and Pedagogy

Computerized voice analysis is not a panacea; no matter how fast the computer or how complex the program, it is unlikely ever to surpass the human ear. It can help its user understand what is happening in a voice; it cannot, however, judge whether a sound is beautiful or musical. In fact, it cannot reliably tell the difference between a tenor and a soprano!
It is this author's opinion that voice analysis is best viewed as a teaching aid, much like an audio or video recorder. Just as no one has ever learned to sing solely by listening to recordings of his voice, no one will learn to sing solely by looking at a computer monitor.

Acoustic analysis transforms sound from the aural to the visual realm. Much like prisms refract light into the colors of the rainbow, analysis instruments divide sound into its component parts: overtones and noise. The result is an objective representation of what once existed only in the subjective realm. Timbre can be visualized and quantified. The singer's formant can be located and measured for frequency and amplitude. The relative amplitudes of the first vowel formant and the singer's formant can be analyzed to help create a beautiful chiaroscuro tone. Legato, vibrato, and changes in musical dynamics can be visualized and used as biofeedback in teaching.

Analysis instruments and programs come in a wide variety of formats. Rudimentary examples can be downloaded for little or no cost from the Internet; few of these programs are designed specifically for voice analysis, but they still can be useful. Programs written specifically to look at the voice can range in price from relatively modest to prohibitively expensive. Increased cost usually is accompanied by more accurate acquisition, digitization, and analysis of the audio signal, additional display options (such as 3-D plotting of spectrograms), and voice-specific analysis options such as formants, vowel plotting, and perturbation assessment.

The remainder of this chapter will focus on specific types of voice analysis and the way they might be used. Most examples were created using VoceVista Professional (Visualization Software, LLC), and the Computer Speech Lab (Kay-Pentax).

Acoustic Measures

Power Spectrum

A power spectrum displays the analysis through a line or bar graph, allowing the components of a complex soundwave to be seen (amplitude and frequency of the fundamental and overtones). In a typical power spectrum, frequency is plotted on the horizontal (X) axis and amplitude is plotted on the vertical (Y) axis (Figure 6.1 and 6/1).

Figure 6.1: Power spectrum, showing frequency and amplitude of harmonics

Power spectrum analysis shows the content of sound during a single moment in time. This period might range from a few milliseconds to several hours (length is restricted by the processing power and sampling limitations of the analysis device). When the period is extended to include several cycles of vibration, the resulting analysis is called a long-term average spectrum (LTAS), as was shown in the previous chapter.

An analogy might prove helpful in understanding aspects of a power spectrum. When viewed in real-time, the analysis is like a movie; it consists of individual frames that knit together to form a seamlessly moving image. When you view a still image of a power spectrum, it is like viewing a photograph or a single frame from a movie. A long-term average spectrum is like a photograph taken with a long exposure time.

Used as a feedback tool, power spectrum analysis only can display the sound that leaves a singer's mouth; it cannot separate the source from the filter and therefore cannot directly show the locations of formants. The location of formants, however, is easily inferred by looking at the relative strength of elements in the spectrum; harmonics that are stronger than others almost certainly have been amplified by a formant. You can't see the actual formant, but you can see its impact on the sound (Figure 6.2).
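
The inference described above (a harmonic that stands out from its neighbors has probably been boosted by a formant) can be illustrated with a small sketch. This is not how VoceVista or the Computer Speech Lab compute their displays; it is a minimal, assumption-laden example that builds a synthetic tone, takes a plain discrete Fourier transform of one frame, and reads the level at each harmonic. The sample rate, the 200 Hz fundamental, the harmonic amplitudes, and the function names are all hypothetical choices made for illustration.

```python
import cmath
import math


def power_spectrum_db(samples, sample_rate):
    """Return (frequencies in Hz, levels in dB) for one frame, via a plain DFT."""
    n = len(samples)
    # A Hann window reduces leakage between neighbouring harmonics.
    windowed = [s * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    freqs, levels = [], []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        freqs.append(k * sample_rate / n)
        levels.append(10.0 * math.log10(abs(acc) ** 2 + 1e-12))
    return freqs, levels


def harmonic_levels(freqs, levels, f0, count):
    """Read the level of the bin nearest each of the first `count` harmonics."""
    spacing = freqs[1] - freqs[0]
    return {h: levels[round(h * f0 / spacing)] for h in range(1, count + 1)}


# Synthetic "voice" (illustrative values only): a 200 Hz fundamental whose
# third harmonic (600 Hz) is boosted, as if a formant sat near 600 Hz.
SAMPLE_RATE = 8000
F0 = 200.0
AMPLITUDES = {1: 1.0, 2: 0.5, 3: 1.5, 4: 0.25, 5: 0.2}
frame = [sum(a * math.sin(2.0 * math.pi * h * F0 * i / SAMPLE_RATE)
             for h, a in AMPLITUDES.items())
         for i in range(400)]

freqs, levels = power_spectrum_db(frame, SAMPLE_RATE)
by_harmonic = harmonic_levels(freqs, levels, F0, 5)
strongest = max(by_harmonic, key=by_harmonic.get)
print(f"Strongest harmonic: H{strongest} at {strongest * F0:.0f} Hz")
```

The spectrum itself never shows the formant; the sketch only reports which harmonic is strongest, which is the same indirect evidence a singer or teacher reads from the display.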

Figure 6.2: Power spectrum showing harmonics that are amplified by formants

Formant locations can more easily be inferred in an FFT analysis using vocal fry or an electro-larynx (6/2), a device used by people who have had their larynx surgically removed. In both cases, there is a low fundamental frequency followed by a large number of overtones; as a result, formants emerge from the FFT spectrum in great detail (Figure 6.3).

Figure 6.3: Vocal fry used to estimate formant locations

Spectrogram

Spectrograms add the dimension of time to acoustic analysis. The graphic display is a bit like a power spectrum turned on its side (Figure 6.4). Frequency now appears on the vertical (Y) axis and time appears on the horizontal (X) axis. Amplitude is shown by color change (gray scale or color).

Figure 6.4: Spectrogram and power spectrum that is rotated 90 degrees counterclockwise

Spectrograms may either be wideband or narrowband (some programs also provide a middle-band option). Narrowband spectrograms divide the frequency spectrum into small segments, allowing clear visualization of harmonics and time-related vocal events such as vibrato. Wideband spectrograms divide the frequency spectrum into broad swaths; individual harmonics are obliterated but the total bandwidth of formants is more readily visible. Wideband spectrograms also provide high resolution in the domain of time, in some cases allowing acoustic visualization of each opening and closing cycle of the glottis (Figure 6.5).

Figure 6.5: Wideband (left) and narrowband spectrograms

Spectrogram Application

Students acquire knowledge in different ways. Some are kinesthetic learners who cannot accomplish a task until they know what it feels like in their bodies.
Others are intellectual learners who must understand the concepts before they can be put into action. Still others learn aurally, responding best to modeling from the teacher. Many students, however, are visual learners; they must see it to do it. Feedback from real-time spectrograms is ideally suited to these visual learners. Teachers must remember that the current generation of students does not remember a world without computers. For them, receiving information from a display screen is familiar and normal.

Unfortunately, spectrographic feedback tends to be more effective for male than female voices. The reason is simple: men almost always sing in a frequency range that permits a large number of harmonics to be visible on the spectrographic display. Women, however, generally sing much higher. At times, only three or four harmonics might be seen, as demonstrated by Figure 6.6, which shows the same phrase sung by a baritone and a soprano, and in the web example 6/3. Notice that the spatial separation between harmonics is greater for the soprano and that fewer harmonics always are visible.

Figure 6.6: Baritone (left) and soprano singing the same phrase

Spectrogram use is further complicated by linear frequency displays. This can be particularly troubling for musicians, who are accustomed to thinking in terms of pitch, rather than frequency. A spectrogram displaying a frequency range of 0-5,000 Hz, for example, includes pitches extending to E8-flat (a minor third above the highest note of the piano). The midpoint of this display is 2,500 Hz, E7-flat, which is the highest E-flat on the piano keyboard. Eight musical octaves are squeezed into the bottom half of the display; a single octave occupies the top half! Some programs offer a solution to this problem through an option to display frequency logarithmically. Using this system, harmonics are spatially separated like their equivalent musical intervals (Figure 6.7 and 6/4).
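
The arithmetic behind the linear-versus-logarithmic comparison can be checked with a short sketch. It assumes equal temperament with A4 = 440 Hz (the chapter does not state a tuning reference), and the function names and the 55 to 3,520 Hz display range are hypothetical choices for illustration.

```python
import math

# Pitch classes starting at C; flats are used to match the chapter's spelling.
PITCH_CLASSES = [("C", ""), ("D", "-flat"), ("D", ""), ("E", "-flat"),
                 ("E", ""), ("F", ""), ("G", "-flat"), ("G", ""),
                 ("A", "-flat"), ("A", ""), ("B", "-flat"), ("B", "")]


def nearest_pitch(freq_hz, a4_hz=440.0):
    """Name the equal-tempered pitch closest to freq_hz (assumes A4 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    letter, accidental = PITCH_CLASSES[midi % 12]
    octave = midi // 12 - 1
    return f"{letter}{octave}{accidental}"


def octaves_between(low_hz, high_hz):
    """Musical distance between two frequencies, in octaves."""
    return math.log2(high_hz / low_hz)


def log_axis_position(freq_hz, low_hz, high_hz):
    """Fraction (0 to 1) of the way up a logarithmic frequency axis."""
    return octaves_between(low_hz, freq_hz) / octaves_between(low_hz, high_hz)


print(nearest_pitch(5000))          # top of a 0-5,000 Hz linear display
print(nearest_pitch(2500))          # midpoint of the same display
print(octaves_between(2500, 5000))  # octaves occupying the top half

# Octave-related harmonics of a 110 Hz tone on a log axis (55 Hz to 3,520 Hz):
for h in (1, 2, 4, 8):
    print(h, round(log_axis_position(110 * h, 55, 3520), 3))
```

On the logarithmic axis the harmonics at 110, 220, 440, and 880 Hz land at equal distances from one another, one octave apart, which is the spacing a musician expects.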
Many musicians find this type of display to be more intuitive.

Figure 6.7: Linear (left) and logarithmic displays of frequency

While real-time spectrograms can be used in many ways during voice instruction, in my experience, five applications are most important: control of vibrato; control of legato; monitoring the strength and bandwidth of the singer's formant; monitoring elements of diction, including the duration of consonants and the length of diphthongs (both intentional and accidental); and verifying the pitch accuracy of vocal onsets and releases.

Vibrato is an important stylistic feature of Western classical singing. Most singers strive to allow their vibrato to always remain uniform and consistent, regardless of pitch or loudness. Vibrato easily is seen using narrowband spectrograms. When it is present, the harmonics appear to wiggle in a nearly sinusoidal pattern. This movement is most apparent in the higher harmonics, where large changes of frequency only result in small changes of pitch. If vibrato is absent, harmonics will appear like flat lines, not unlike the EKG tracings of someone whose heart has stopped beating. Alterations in vibrato speed, which frequently are employed by singers in commercial styles, are clearly visible. Perturbations or other irregularities within the vibrato can also be seen (Figure 6.8 and 6/5).

Figure 6.8: Four types of vibrato

Narrowband spectrograms also allow us to measure the rate of vibrato. This can be a tedious task with some programs. VoceVista, however, provides a special function to automate this process. The program extracts a single harmonic from the spectrum and places it in its own analysis window. Cursors are used to select a portion of the waveform from which measures of the vibrato are made (Figure 6.9 and 6/6).
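
To make the idea of measuring vibrato from a single extracted harmonic concrete, here is a minimal sketch. It is not VoceVista's routine, whose internals are not described here; it simply assumes that the frequency of one harmonic has already been tracked at a fixed number of frames per second, then counts upward zero crossings to estimate the rate and takes half the peak-to-peak swing as the extent. The function name, the frame rate, and the synthetic 5.5 Hz, 50-cent vibrato are hypothetical.

```python
import math


def vibrato_rate_and_extent(track_hz, frames_per_second):
    """Estimate vibrato rate (Hz) and extent (+/- cents) from a frequency track."""
    # Work in cents so the result does not depend on which harmonic was tracked.
    cents = [1200.0 * math.log2(f / track_hz[0]) for f in track_hz]
    mean = sum(cents) / len(cents)
    deviation = [c - mean for c in cents]

    # Locate upward zero crossings, interpolating between frames.
    crossings = []
    for i in range(1, len(deviation)):
        prev, cur = deviation[i - 1], deviation[i]
        if prev < 0.0 <= cur:
            fraction = -prev / (cur - prev)
            crossings.append((i - 1 + fraction) / frames_per_second)
    if len(crossings) < 2:
        raise ValueError("need at least one full vibrato cycle")

    cycles = len(crossings) - 1
    rate_hz = cycles / (crossings[-1] - crossings[0])
    extent_cents = (max(deviation) - min(deviation)) / 2.0
    return rate_hz, extent_cents


# Synthetic track (illustrative values only): 440 Hz center, 5.5 cycles per
# second, +/- 50 cents, tracked at 100 frames per second for 2 seconds.
FRAMES_PER_SECOND = 100
track = [440.0 * 2.0 ** (50.0 / 1200.0
                         * math.sin(2.0 * math.pi * 5.5 * i / FRAMES_PER_SECOND))
         for i in range(200)]
rate, extent = vibrato_rate_and_extent(track, FRAMES_PER_SECOND)
print(f"rate = {rate:.2f} Hz, extent = +/-{extent:.0f} cents")
```

Selecting a portion of the track before calling the function plays the role of the cursors: the estimate describes only the stretch of sound that was chosen.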
The vibrato measurement routine includes calculations for amplitude modulation and frequency modulation.

Figure 6.9: Vibrato measurement tool

Visual learners often can correct problems related to vibrato quickly using the spectrogram, sometimes in only a few minutes. Teacher supervision is required to ensure that students allow their natural vibrato to occur and are not manufacturing the sound. Once singers learn the sensations and sounds of the new production, use of the spectrogram should be halted to avoid dependency. In this regard, the spectrogram becomes a catalyst: it stimulates new behavior patterns that can continue after the direct feedback is eliminated.

Control of Legato

Legato is visualized in a spectrogram through the continuity of the signal. Non-legato connections are seen as vertical interruptions in the harmonics, which might result from poor vowel/consonant continuity, from technical issues, such as breath control, or from inattention to the musical line (Figure 6.10 and 6/7). As with vibrato, real-time spectrographic feedback might help students see things about their legato that they cannot yet hear. Eventually, however, the ear and brain must [...]

Figure 6.10: Legato (left) and non-legato singing

Strength and Bandwidth of the Singer's Formant

The center frequency, bandwidth, and amplitude of the singer's formant are shown through the intensity of high frequency harmonics (remember: we don't see the actual formant, we only see how it changes the sound). If Fs is active, a continuous band of energy will be visible somewhere in the range of 2,400 to 3,200 Hz. The singer's formant occupies a
The singer’s formant occupies a 80 | The Basics of Voice Science and Pedagoey is not the same for all yoj, relatively narrow band within this frequen) ae ae ee ce eS For it altos, and mezzos, Fs © e} icy than for basses, baritones, contralto: rdeally, we want to see Fs producing Be tenors and sopranos (when they are using it). Idealy, (Figure 6.11)- This is a difficult task for some people, cape! 1 /u/ and /o/. Bi a | and support adj amplification across all vowels cially for the closed/back vowel can help singers learn the vowel sound. ut the biofeedback provided by Spectrograph, justments required to achieve this consisten, nee SI a eee Figure 6.11: Consistent (left) and inconsistent singer's formant Excess nasality also can impact the singer’s formant. Rather than tightly clustering F3. F5 within a narrow frequency band, the energy can split into two discreet zones. Because this yields a less desirable sound, I tend to view this type of Fs as the equivalent of a serpent’s split tongue! In most cases, the singer’s formant will reunite into a single band as soon as lifting the soft palate cuts off the flow of air and sound through the nose. Diction and Language Many singers find diphthongs problematic, often because they are sung differently in different styles of music. In the classical idiom, the first vowel in the pair generally receives the greater importance and longer duration. The opposite can be true in some popular or commercial styles, which creates problems for some singers when they must adapt to the pure vowels of most foreign languages. For example, native speakers of English almost al- ways pronounce the vowel /o/ as the diphthong /ou/ and /e/ as /ei/. Habitual events, such as inappropriate use of diphthongs, can be very difficult for students to hear or feel in their own singing—it is the equivalent of trying to hear your own accent. 
Visualization through spectrography can increase awareness of such habits, facilitating their correction (Figure 6.12 and 6/8).

Figure 6.12: Pure vowels (left) and unintentional diphthongs

Double consonants in Italian pose another chronic problem for developing singers (and even some professionals). In terms of duration, a double consonant is not merely twice as long as a single; the relationship actually approaches ten to one. Because time is displayed in spectrograms, they are ideal tools for displaying the duration of single versus double consonants. As shown in Figure 6.13 and 6/9, the difference between the single /n/ of una and the [...]

Figure 6.13: Double (left) versus single consonants

Onset/Offset of Tone

Spectrograms can reveal significant information about how sounds begin and end. Many singers fall into the habit of sliding up to pitches in a style sometimes called "scooping." This articulation might be appropriate in some musical styles but is problematic for others. In this regard, scooping has quite a lot in common with delayed start of vibrato and accidental diphthongs: it easily becomes a habitual part of a singer's sound. And as we all know, habits, especially bad ones, can be difficult to break. A spectrogram clearly reveals scooping onsets, making it an ideal tool to trigger awareness of the habit (Figure 6.14 and 6/10). Remember: correction requires recognition!

Spectrographic feedback also can assist with breathy (aspirate) and hard (glottal) onsets of sound. In the aspirate onset, airflow precedes tone; in the glottal onset, air is pressurized beneath the glottis and explosively released at the instant tone begins. Aspirate onset is visible in the spectrogram as a subtle "fade in" of the sound, often accompanied by a small scoop up to the pitch. Hard onsets display a burst of energy; initial tuning often is sharp, requiring a quick slide down to the intended pitch.
Balanced onsets are clean, neither sharp nor flat, and allow vibrato to begin immediately. Offset is visible in much the same manner. An aspirate release, as in a sigh, creates a spectrographic image that fades out, usually accompanied by a drooping pitch. Hard offset is seen as an abrupt spike in energy at the moment sound stops. Balanced offsets split the difference between the other two (Figure 6.15 and 6/11).

Figure 6.15: Three types of vocal onset

Linear Predictive Coding (LPC)

LPC, also sometimes called formant analysis, uses a mathematical algorithm to interpolate vocal tract resonance based on the amplitude and location of harmonics. Like the power spectrum, LPC places data on a two-dimensional graph with amplitude on the Y-axis and frequency on the X-axis (Figure 6.16 and 6/12). The result is a reasonable estimate of formant frequencies and bandwidths provided F0 is at a lower frequency than the expected first formant. As soon as pitch rises above E4 (approximately), LPC becomes extremely unreliable. This problem occurs because the computer is unable to decipher the difference between F0 and F1. If /i/ is sung at C3, LPC likely will place F1 correctly at about E4. But if pitch rises to C5, all vowels except /a/ will appear to have F1 at that same pitch. By the time pitch exceeds F5, peaks in the LPC signal display harmonics, not formants.

Figure 6.16: LPC becomes unreliable when F0 is higher than the expected first formant of a vowel

When used within an appropriate frequency range, formant analysis through LPC can help a student maintain vowel integrity over a range of pitches. As a scale is sung, formant peaks might vary in amplitude, but should change very little in frequency. LPC also provides an excellent means to monitor the relationship between F1 and Fs to help develop a balanced, chiaroscuro (light/dark) timbre.
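
The reliability limit described earlier in this section (LPC can be trusted only while F0 stays below the expected first formant) reduces to a simple comparison, sketched below for the chapter's own /i/ example. The note-to-frequency conversion assumes equal temperament with A4 = 440 Hz, and the function names are hypothetical; the sketch does not perform LPC itself, it only applies the rule of thumb.

```python
SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note, a4_hz=440.0):
    """Convert a natural note name such as 'C3' or 'E4' to Hz (A4 = 440 Hz assumed)."""
    letter, octave = note[0], int(note[1:])
    midi = 12 * (octave + 1) + SEMITONES_FROM_C[letter]
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)


def lpc_estimate_plausible(f0_hz, expected_f1_hz):
    """Rule of thumb from the text: trust LPC only while F0 is below F1."""
    return f0_hz < expected_f1_hz


# The chapter places F1 of /i/ at about E4; C3 and C5 are the pitches it cites.
F1_OF_I = note_to_hz("E4")
for sung in ("C3", "C5"):
    f0 = note_to_hz(sung)
    verdict = "plausible" if lpc_estimate_plausible(f0, F1_OF_I) else "unreliable"
    print(f"/i/ on {sung}: F0 = {f0:.1f} Hz, F1 = {F1_OF_I:.1f} Hz -> LPC {verdict}")
```

A check of this kind is only as good as the expected F1 value supplied to it, which varies with the vowel, the singer, and the language.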
For many people, optimal sound is produced when the singer's formant, which usually is the third peak in the LPC display, is maintained at about the same amplitude as the first formant (Figure 6.17).

Figure 6.17: Well-balanced (left) and weak singer's formants

Vowel Matching

Some of the more advanced analysis programs, including the Sona-Match module in the Computer Speech Lab (Kay-Pentax), use LPC analysis to display the accuracy of vowel sounds. A graph is prepared with F1 on one axis and F2 on the other; the program estimates formants of the vocal sound and plots them as a point on this graph (Figure 6.18 and 6/13).

Figure 6.18: Vowel matching through F1/F2 plot (F2 axis in Hz/100, logarithmic)

Target locations of vowels can be adjusted for differences among men, women, and children, and for inter-language variations. It is important to remember to check the program setup in this regard; if a woman sings into a program whose analysis parameters are set for a man, or vice versa, results will be erratic and unreliable. Vowel matching has its best accuracy when measuring spoken, rather than sung, vowels. This is especially true for women because of the aforementioned problem of F0 rising above the typical location of F1. In singing, closed, back vowels (/o/ and /u/) can be difficult for the computer to identify accurately, especially in voices with a prominent singer's formant. In these cases, Fs often is mistakenly plotted as F2, skewing the data point to a strange location on the graph. The singer can correct for this by temporarily reducing the ring in his sound.

Physiologic Measures

Electroglottography

It's not easy to figure out what the vocal folds are doing during phonation.
One option is to look at them directly with a laryngoscope. Two versions of this device are available for use by medical personnel: rigid scopes, which are inserted through the mouth, and flexible scopes, which are inserted through the nose. Either way, this is a costly, invasive procedure that is better suited to diagnosing pathologies than correcting technical problems in singing. Besides, who can afford to take a student to the clinic every time you're curious about basic voice function?

Electroglottography (EGG) provides a much more practical alternative. An electroglottograph is an instrument that noninvasively measures vocal fold contact patterns during vibration. Two electrodes are placed on the neck on either side of the larynx (everything is external) and a tiny current is passed from one electrode to the other. Electrical resistance to this signal changes as the glottis opens and closes; plotting the resistance signal on a graph provides a detailed, noninvasive assessment of vocal fold vibration. The procedure is 100% free of risk, and permits unrestricted, normal singing during the collection of data. The information provided by EGG is extremely useful for correcting problems created by insufficient or excess breath pressure, and more importantly, for issues associated with voice registers and registration.

The basic EGG signal is shown in Figure 6.19 and Interactive 6/14. The top of the analysis window represents the point of maximum glottal closure; maximum opening is at the bottom. Time passes from left to right as in a spectrogram. Three important aspects of vocal fold vibration are revealed through EGG: the rate at which the glottis closes, the rate at which it opens, and the relative duration of the closed and open phases. Opening and closing rates are seen in the slope of the ascending/descending portions of the signal.

Figure 6.19: The EGG signal
Open/closed phase duration is expressed through the closed quotient (CQ), which shows the percent of time the glottis is closed during each cycle (the open quotient is the complement of this number).

EGG has distinct advantages over laryngoscopy for singers and teachers:
• It is noninvasive, allowing people to sing normally while data are collected.
• Licensure is not required to own or use an EGG, making it accessible to voice teachers, students, researchers, and clinicians.
• It is relatively inexpensive. Lab quality instruments can run into the thousands of dollars, but portable units are available for much less.
• It is relatively simple to use the instrument and to interpret the signals.

Conclusion

As we have seen, science provides a great variety of tools, many of which have direct pedagogic application. Feedback from spectrograms, formant analysis, electroglottography, and airflow has the potential to help students become better singers. Of equal or perhaps greater importance, information from these types of analyses deepens our understanding of voice production, helping us realize why it is so difficult to produce accurate vowel sounds on high pitches, or why /e/ sounds better than /a/ on a given pitch.

There is no need to be intimidated by voice analysis technology or the principles of resonance that guide it. Let's use a final analogy: word processing. Do you remember when you wrote your first document with a word processor? Those of us with more years behind us than lie ahead remember making the transition from typewriters to computers. But even current students who have never known a time without PCs had to learn word processing from scratch. How did you learn to do this? Did you first read the manual from cover to cover, memorizing every capability and command, or did you just start typing and learn how to make formatting changes as the need arose? I'll wager that every one of you opted for the latter approach.
Begin your foray into the world of voice analysis in the same way. Start with a rudimentary spectrogram program and simply play with it. Experiment with it until you begin to see the connection between the sound that comes out of your mouth and the signals that are displayed on the screen. If you are a teacher, become comfortable seeing your own voice before you start to use the technology with your students. Eventually, I hope you will come to accept voice analysis technology as nothing more than an additional component of your teaching toolbox.

Whether we realize it or not, good singing is totally dependent upon acoustics and resonance. Every time we make a subtle vowel adjustment, move our tongue, lift our palate, drop our jaw, change the shape of our lips, and raise or lower our larynx, we change our resonance. Fortunately, we need not rely on intuition and experimentation to achieve optimal resonance. If we understand the laws of acoustics and resonance, we can make informed choices that lead us to more beautiful singing.
