Digital Synthesis of Natural and Unnatural Sounds


Murray Hill, NJ 07974, USA

The sampling theorem guarantees theoretically that any sound which can be heard
by the human ear can be synthesized from digital samples. Today, computers have
actually produced a great variety of interesting and powerful timbres. These range
from close imitations of musical instruments to sounds never heard before. An analysis-
synthesis technique has been developed which involves analyzing natural sounds and
synthesizing approximations to these sounds using simplified models. This work has
led not only to rich synthesized sounds, but also to a fundamental understanding of
the important factors in natural sounds. Unnatural sounds include pitch and rhythm
paradoxes in which the apparent pitch or tempo of a sound can simultaneously increase
or decrease; timbres with nonharmonic overtones which preserve some perceptions of
classic harmony while rejecting others; and sounds in which the timbre depends on
unusual ways in which the spectrum of the sound changes during a note. Examples of
various synthesized sounds are included.

The sampling theorem says that theoretically any sound that can be heard by the
human ear can be synthesized from computer-generated samples of a sound
waveform. The question is, how far have we come toward this theoretical goal?
The original sounds made in the late 1950s and early 1960s were far from the
ultimate possibilities. Since that time, great progress has been made by many
people.

One of the classic early achievements was the technique which Jean-Claude
Risset developed for synthesizing and understanding normal instrumental sounds.
The first sound sample consists of ten tones from Risset. Five of the tones are
recordings of a trumpet and five of the tones are synthesized on the computer.
These are randomly intermixed. The correct order of the sounds is TCTCCTCTTC
(C = computer, T = trumpet). The normal person, in listening to these sounds,
can get about 50% correct; in other words, the person is guessing. A good
trumpet player can get about 70% correct.

What did Risset find out is necessary to make good trumpet tones? Some of the
things that were well known had to be synthesized correctly, including the
right average spectrum of the sound, the right attack time, and the right decay
time, but these things in themselves are not sufficient. The real key that he
found is that when the tone gets louder, the proportion of high-frequency
energy must increase. At low amplitude the first overtone or the first harmonic
is the strongest, and the rest of the harmonics fall off fairly rapidly; at
loud sounds the highest energy is in one of the higher harmonics. This
particular pattern of increase in the high-frequency energy as the tone builds
up is the key to making a good trumpet sound.

How did Risset discover this? Risset used a two-pronged attack on the problem.
He did a frequency analysis of sounds he recorded from a real trumpet. He found
out lots of important information from the analysis, and he found out much
unimportant information. How did he separate the two? He used synthesis to
separate them. He synthesized the sounds on the tape, putting in some of the
information from the analysis. When the synthesis sounded good to his ear, he
knew he had the important information about what
constitutes the trumpet sound. This is an example of what is called the
analysis-synthesis technique for studying timbres. It is indeed a powerful
technique.

If you hear a passage with a sequence of sounds put together as Risset knew how
to make them, you would instantly be able to tell the difference between the
synthesized trumpet and a real trumpet. Risset, at the time he made these
sounds, did not know what we would today call the prosodic features of playing
the trumpet. In other words, he did not know how one note is blended into the
next note by a real trumpet player.

Dexter Morrill at Colgate University studied this question extensively and has
made some very nice connected passages. Morrill is a trumpet player himself.
The second sample on the tape is one of his passages synthesized on the
computer.

There are many prosodic features in this example, the most prominent of which
are the pitch glides that connect one note to the other.

The third sample is a flute sound, also by Risset. Fig. 1 shows the diagram of
the computer program which produced this sound. The program is best described
by an interconnected set of unit generators. One generator is an oscillator
that makes the waveform. It has two controls, an amplitude control and a
frequency control. It has a waveshape that can be specified by a stored
function in the computer memory. The stored function that Risset used contained
one to six harmonics, one harmonic for the high pitched notes and more
harmonics for the lower pitched notes. Another oscillator imposes an attack and
decay function on the waveform oscillator. It determines how the sound builds
up and decays away. This oscillator is used in a degenerate way; it produces
only one cycle for each note.

Another oscillator imposes a sinusoidal fluctuation on the amplitude. The rate
of that sinusoidal fluctuation is 5 Hz. Yet another oscillator imposes a slight
random quaver on the amplitude at a fairly high rate of 60 Hz, but a low
amplitude of about 1% of the average amplitude of the note.

Fig. 1. Risset flute.

The complete details of this instrument are not important here. The point is
that we have a language for describing complex sounds so that once one person
has found out how to make a sound, this person can write down and publish the
instrument used plus the score. Other people can then learn from this
experience.

To recapitulate our understanding of instrumental sounds, we have the
analysis-synthesis technique, we have instruments which involve unit generators
to describe the structure of what we know, and we have scores which say what
quantitative parameters have to be put into these instruments. This knowledge
has formed the basis of much subsequent research by many people. In the last 20
years we have learned more about instrumental sounds, primarily through these
techniques, than was known in the entire period before then.

After considering instrumental sounds, let us turn to some unnatural sounds. A
class of the most interesting unnatural sounds are called the paradoxical
sounds. The pitch paradox is one of these. The fourth sample contains a pitch
paradox first discovered by Roger Shepard. It consists of 12 tones played over
and over again. The characteristics of these tones are such that they sound
like a continually ascending sequence of pitches. How is this done? It is
actually done in a very simple way. Fig. 2 shows the spectrum of a few of the
tones in the series. Not all the harmonics are present. The overtones are all
one octave apart; none of the other overtones are in the sound. The computer
can easily synthesize this. A normal instrument cannot.

The paradox takes advantage of the fact that tones that are an octave apart are
hard to tell apart by listeners. People tend to make octave mistakes. The
amplitude of the tones is arranged so that the low-frequency tones and the
high-frequency tones are very faint, and, indeed, the lowest frequency tone is
so faint that you cannot hear it. When the pitch is moved up an octave, the
highest frequency tone is so faint that you cannot hear it.

Fig. 2. Discrete pitch paradox after R. N. Shepard. [Frequency axis marked at
110, 220, 440, 880, 1760, and 3520.]

As the pitch goes up, the amplitude of the low-frequency tones increases and
the amplitude of the high-frequency tones decreases. The bell-shaped curve that
specifies the envelope of the spectrum does not change with pitch. When the
pitch has moved up exactly one octave, then the highest frequency tone can be
eliminated and a low-frequency tone can be added; you cannot
hear these changes. Thus we have a cyclic phenomenon which sounds like a
sequence of tones that are continually going up in pitch.

Risset generalized Shepard's technique into a continuous glissando. The fifth
sample is a glissando where the pitch is going down. It is a stereo example.
The sound from the two loudspeakers is slightly out of phase with respect to
the downward sweep in pitch, and so you will hear a beating at about 2 Hz
between the two loudspeakers. This adds both to the beauty of the tone and to
the illusion.

Risset was able to identify two different components of the pitch. One is
associated with the fundamental frequency of the tone; the other is associated
with the peak of the spectrum where the energy is highest. He makes sounds, for
example, where the pitch goes up and at the same time the point of the highest
energy goes down. In this way he makes a sound which both descends and ascends
in pitch at the same time. He describes it as a sound which goes down the
scale, but goes up in brightness. It is the sixth sample.

Another paradox can be created with tempo or rhythm. Ken Knowlton invented a
sound which apparently gets faster and faster, but in reality is periodic so
that every 20 seconds it repeats. How is this sound made? The sound starts out
as a regular series of pulses, about five pulses per second. As time goes on,
these pulses get faster, but in addition every other pulse gets smaller.
Eventually, when the time between two successive pulses equals half the
original time, every other pulse has been reduced to zero in amplitude so that
the sound is as it was at the beginning. The seventh sample demonstrates this
effect.

This illusion is not quite as convincing as the pitch illusion, but it
nevertheless does work.

A last paradox is demonstrated in sample eight. It is the up/down fast/slow
paradox, again by Risset, where he puts all these effects together and produces
a sound which goes down the scale and up in brightness, slows down in tempo,
but gets faster. With the right adjustments of the stereo loudspeakers, the
sound will also [...]

It is unnecessary to say that these sounds are unique to the computer; no other
instrument could produce them. The spatial effects that are used in the eighth
sound sample depended very heavily on techniques that were developed by John
Chowning, who used reverberation, Doppler frequency, and loudness to control
the apparent location of sounds.

Next we will deviate from the purely digital part of this talk to mention some
violin sounds. These sounds have used the same techniques for their development
that we have used for the digital sounds, and they also show the possibility of
making natural and unnatural sounds with a violin.

The violin is an electronic violin where the body of the violin has been
removed and, instead, the vibrations of the strings are picked up by contact
microphones and processed through a set of parallel resonances. The sound comes
out of a loudspeaker. This work led to better understanding of the nature of
violin resonances. The work was carried out by analyzing sounds from a real
violin and then by synthesizing sounds and tuning the electrical resonances to
achieve a good sound quality.

Fig. 3 shows the amplitude-frequency response of three different sets of
resonances which have been tried with the electronic violin. The three curves
differ primarily in the Q of the resonators, the curves being low Q (zero),
medium Q, and high Q. The medium Q resonators produced the best tone, the low Q
resonators produced harsh tones, and the high Q resonators produced a nasal or
pinched effect. The ninth sample demonstrates a scale played on the medium Q
resonators which give the best sound.

Fig. 3. Violin formant. (a) Harsh. (b) Best. (c) Pinched. [Axes: amplitude in
dB versus frequency, 200 to 5000 Hz.]

It is difficult to judge a violin by a scale, so the tenth sample on the disk
is a two-violin duet, one violin being a normal acoustic violin and the other
an electronic violin. Most people have great difficulty in determining which is
which. The violin which begins the duet is the electronic violin. A good
violinist can tell the electronic violin from the acoustic one, but it is clear
that the two instruments are close enough together in timbre so that they can
be effectively played together. The electronic violin can play the repertoire
of the normal violin.

The signal from the violin string and the pickup need
not be processed by a series of violinlike resonances. Instead, we can
substitute something very different. We can go back to what Risset learned
about brass sounds and arrange a voltage-controlled low-pass filter plus an
amplitude detector, so that when the violin string is played louder, the
resulting sound will have more high-frequency energy. We would expect that this
might have a brasslike sound. The 11th sound sample shows that it does.

Another instrument which is important to synthesize with the computer, and also
very difficult for the computer to synthesize, is the human voice. Much effort
has gone into synthesizing a good speaking voice. One can also impose melodic
pitch patterns on the speech and produce a song with the same synthesis
equipment. Joseph Olive at Bell Labs wrote a short opera. The characters in the
opera are a woman scientist, who made a speaking computer, and the computer
itself. The computer falls in love with its creator, and they sing the duet
which is reproduced in sample 12. Alas, shortly after this moment of passion
the scientist became alarmed at their incestuous relationship and took the
computer apart. Alas, operas have to have sad endings.

Charles Dodge took techniques developed by Joe Olive and applied them in his
unique way to distort speech for musically expressive purposes. The kinds of
distortion that he found interesting are illustrated in sample 13. Dodge
demonstrates strong and interesting pitch and time distortions in the speech
that can be made.

Many listeners find the quality of these speech sounds expressive and
interesting, but not very beautiful. If one is not trying to say words and
communicate information, one can make very beautiful speechlike sounds. The
14th sample was prepared by Risset. I will not discuss how it was made, but
simply say that I find it gorgeous. These samples show that the computer can
make both natural and unnatural speech sounds.

I have demonstrated computer-synthesized examples of sounds, some of which are
very normal and some very abnormal. In sample 15, I end with an excerpt from a
piece that Risset composed, entitled Songe. When one initially hears the piece,
the timbres seem so beautiful that one feels they must have been played on a
very old instrument. Upon closer listening, one realizes that the sounds could
only be produced by a computer and no other instrument could possibly have made
them.
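
The unit-generator description of the Risset flute given earlier can be made
concrete with a short sketch. This is not Risset's instrument or Music V code;
it only mirrors the structure in the text: a stored-function oscillator, a
one-cycle attack and decay envelope, a 5-Hz sinusoidal fluctuation, and a 60-Hz
random quaver of about 1%. The sample rate, table size, harmonic weights,
envelope times, and the 5% fluctuation depth are assumptions chosen for
illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed)
TABLE_SIZE = 512    # length of the stored waveshape (assumed)


def stored_function(num_harmonics):
    """One cycle of a waveshape containing the first `num_harmonics` harmonics."""
    table = []
    for i in range(TABLE_SIZE):
        x = 2.0 * math.pi * i / TABLE_SIZE
        # The 1/h harmonic weights are an assumption, not Risset's values.
        table.append(sum(math.sin(h * x) / h for h in range(1, num_harmonics + 1)))
    peak = max(abs(v) for v in table)
    return [v / peak for v in table]


def attack_decay(t, duration, attack=0.05, decay=0.10):
    """Envelope read once per note: linear rise, hold, linear fall (assumed shape)."""
    return max(0.0, min(1.0, t / attack, (duration - t) / decay))


def flute_note(freq, duration, num_harmonics=3, amplitude=0.5, seed=0):
    """Return samples for one note built from the four generators in the text."""
    rng = random.Random(seed)
    table = stored_function(num_harmonics)
    hold = SAMPLE_RATE // 60  # draw a new random value about 60 times per second
    quaver = 0.0
    phase = 0.0
    samples = []
    for i in range(int(duration * SAMPLE_RATE)):
        t = i / SAMPLE_RATE
        if i % hold == 0:
            quaver = rng.uniform(-0.01, 0.01)  # about 1% of the amplitude
        fluctuation = 0.05 * math.sin(2.0 * math.pi * 5.0 * t)  # 5 Hz, depth assumed
        level = amplitude * attack_decay(t, duration) * (1.0 + fluctuation + quaver)
        samples.append(level * table[int(phase) % TABLE_SIZE])  # waveform oscillator
        phase += freq * TABLE_SIZE / SAMPLE_RATE
    return samples
```

A score in this picture is just the list of parameter values passed to
`flute_note` for each note, for example fewer harmonics for high notes and more
for low notes.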

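The discrete pitch paradox described earlier can also be sketched in a few
lines. The code below is an illustration under stated assumptions, not
Shepard's original procedure: the lowest component (110 Hz), the six-octave
width of the envelope, the raised-cosine bell shape, and the sample rate are
all choices made here. What it takes from the text is that the components are
exactly one octave apart, that a fixed bell-shaped envelope over frequency sets
their amplitudes, and that there are 12 tones per cycle.

```python
import math

F_MIN = 110.0        # lowest component at step 0, in Hz (assumed)
NUM_OCTAVES = 6      # width of the spectral envelope in octaves (assumed)
STEPS = 12           # tones per cycle, as in the text
SAMPLE_RATE = 16000  # samples per second (assumed)


def bell(x):
    """Fixed envelope over log-frequency; zero at x = 0 and x = NUM_OCTAVES."""
    if x <= 0.0 or x >= NUM_OCTAVES:
        return 0.0
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * x / NUM_OCTAVES))


def components(step):
    """(frequency, amplitude) pairs for one tone; partials are octaves apart."""
    shift = step / STEPS  # how far the tone has moved up, in octaves
    return [(F_MIN * 2.0 ** (j + shift), bell(j + shift)) for j in range(NUM_OCTAVES)]


def audible(step, threshold=1e-9):
    """Components that survive the envelope."""
    return [(f, a) for f, a in components(step) if a > threshold]


def tone(step, duration=0.1):
    """Sum of sinusoids for one tone of the series."""
    parts = audible(step)
    n = int(duration * SAMPLE_RATE)
    return [sum(a * math.sin(2.0 * math.pi * f * i / SAMPLE_RATE) for f, a in parts)
            for i in range(n)]
```

In this sketch `audible(12)` equals `audible(0)`: after a rise of one octave the
top component has faded to zero and the remaining ones coincide with the
starting tone, which is why the sequence can repeat while seeming to ascend.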

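Knowlton's tempo paradox, described earlier, can be sketched as a list of pulse
times and amplitudes. The 20-second cycle and the starting rate of five pulses
per second come from the text; the exponential speed-up and the linear fading
of every other pulse are assumptions made for this illustration, and the sketch
does not try to align the last pulse exactly with the start of the next cycle.

```python
CYCLE = 20.0         # seconds per repetition, from the text
BASE_INTERVAL = 0.2  # five pulses per second at the start, from the text


def pulse_train():
    """(time, amplitude) pairs for one cycle of the accelerating pulse series."""
    events = []
    t = 0.0
    k = 0
    while t < CYCLE:
        progress = t / CYCLE  # 0 at the start of the cycle, approaching 1 at the end
        # Every other pulse fades out; the rest keep full amplitude.
        amplitude = 1.0 if k % 2 == 0 else 1.0 - progress
        events.append((t, amplitude))
        # The interval shrinks smoothly from 0.2 s toward 0.1 s (assumed curve).
        t += BASE_INTERVAL * 2.0 ** (-progress)
        k += 1
    return events
```

Near the end of the cycle the pulses are about 0.1 s apart, but the fading
pulses are almost silent, so the audible pulses are again about 0.2 s apart, as
at the beginning.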
Max V. Mathews was born in Columbus, Nebraska, on 1926 November 13. He received
the following degrees in the field of electrical engineering: B.S. in 1950 from
the California Institute of Technology; M.S. in 1952, and Sc.D. in 1954 from
the Massachusetts Institute of Technology.

He joined the staff of Bell Telephone Laboratories in 1955 and is currently
director of the Acoustical and Behavioral Research Center. This laboratory
carries out research in speech communication, visual communication, human
memory and learning, programmed instruction, analysis of subjective opinions,
and physical acoustics.

Mr. Mathews' personal research is concerned with sound and music synthesis with
digital computers and with the application of computers to areas in which
man-machine interactions are critical. He developed a program (Music V) for the
direct digital synthesis of sounds and, more recently, a program (Groove) for
the computer control of a sound synthesizer. Music V is now widely used in
music departments in the United States, Europe, and Japan. His past research
included development of computer methods for speech processing, studies of
human speech production, studies of auditory masking, and the invention of
techniques for computer drawing of typography. He is currently investigating
the effect of resonances on sound quality. Resonances are essential parts of
human speech and of violins. Both better speech transmission and better violins
may be achieved by a better understanding in this area.

Mr. Mathews is a member of the Audio Engineering Society and Sigma Xi. He is a
fellow of the Acoustical Society of America and the Institute of Electrical and
Electronics Engineers. In 1973 he received the IEEE David Sarnoff Gold Medal
Award.

From 1974 to 1980 he was the scientific advisor to the Institut de Recherche et
Coordination Acoustique/Musique (IRCAM) in Paris, France. In 1975 April he was
elected a member of the National Academy of Sciences, and in 1979 March he was
elected a member of the National Academy of Engineering. In 1982 May he was
elected fellow of the American Academy of Arts & Sciences.


