Citation: The Journal of the Acoustical Society of America 98, 3080 (1995); doi: 10.1121/1.413832
View online: https://doi.org/10.1121/1.413832
View Table of Contents: https://asa.scitation.org/toc/jas/98/6
Published by the Acoustical Society of America
The perceptual representation of voice gender
John W. Mullennix
Department of Psychology, Wayne State University, 71 W. Warren Avenue, Detroit, Michigan 48202
Keith A. Johnson
Department of Linguistics, Ohio State University, Columbus, Ohio 43210
3080 J. Acoust. Soc. Am. 98 (6), December 1995 0001-4966/95/98(6)/3080/16/$6.00 © 1995 Acoustical Society of America 3080
voice, we decided that exploring voice gender was the logical first step in determining how voice in general is perceptually represented. The series of experiments described below was designed to provide specific evidence regarding how voice stimuli varying in gender are represented in memory.

In order to investigate the perceptual representation of voice gender, we used a twofold approach to the problem. First, the issue of categorical perception for voice was examined. Although there are debates about what categorical perception is and how it pertains to speech perception (e.g., see volume edited by Harnad, 1987), we felt it worthwhile to explore categorical perception with voice gender in order to make some preliminary assessments about voice gender representation. Certainly, there is evidence that categorical perception may be a general perceptual ability, rather than a specialized ability restricted only to speech (Harnad, 1987). It is possible that this general ability could exhibit itself with voice-related stimuli that do not vary in acoustic-phonetic dimensions. If, by using standard identification and discrimination paradigms, it is shown that the perception of voice gender is categorical, then discriminations within voice gender categories should be poor. This result would be consistent with the idea that voice information is converted into a reduced representation during perception, with some detailed information about voice gender "lost." On the other hand, if perception of voice gender is not categorical, then this would suggest that, during perception, detailed information relating to voice gender is retained and is available to voice discrimination processes. This latter result would be consistent with recent findings indicating that episodic information about voice is retained in memory (Palmeri et al., 1993).

The second approach is to assess the perceptual representation of voice gender by using selective adaptation techniques. The results from selective adaptation experiments in speech perception have proven useful in examining levels of perceptual processing and representation (Samuel, 1986; Sawusch, 1986). In particular, these findings provide evidence for at least two levels of processing and representation for phonetic perception: an auditory-based level and a higher abstract level (Samuel, 1986; Sawusch, 1986). In the present study, voice gender stimuli were adapted under various conditions in order to determine the level of perceptual representation appropriate for voice: low-level auditory or higher level abstract. Evidence for a higher level perceptual representation would be consistent with the idea that voice is stored in terms of abstract male and female categories in memory. On the other hand, evidence for an auditory representation of voice would suggest that the representation of voice gender is not abstract, but is based on explicit details about the auditory parameters relevant to voice.

In summary, the perceptual representation of voice gender will be investigated by determining whether voice is categorically perceived and whether an auditory representation or a more abstract representation is appropriate.

I. EXPERIMENT 1

In experiment 1, we were interested in determining whether the perception of a synthetic voice gender continuum is categorical. The term categorical perception, as used here, adheres to the standard definition of Liberman et al. (1957) for speech. The test of categoricalness for a male/female voice continuum provides information about the degree to which auditory information specific to voice gender is retained during perception. If perception is categorical, this would suggest that some detailed information about voice gender is lost during perception. If perception is not categorical, this would suggest that information about specific auditory attributes related to voice gender is retained and subsequently used for discriminating among voice stimuli within a gender category.

In this experiment, identification and ABX discrimination procedures were used to assess categorical perception. The ABX task is more memory intensive than other discrimination tasks (Pisoni and Lazarus, 1974). However, this is precisely why we chose the task. With the ABX task, the likelihood of obtaining a discrimination peak in the boundary region between male and female voice is maximized due to the additional memory load (Macmillan et al., 1977). If a discrimination peak is not observed with the ABX procedure, we would be confident that the discrimination results reflect an absence of categorical perception, since discrimination peaks would most likely not be found using other low-uncertainty discrimination procedures (e.g., AX, 4IAX, etc.).

The predictions are as follows. If the perception of voice gender is categorical, a steep "voice boundary" identification function should be observed between male and female ends of the continuum along with a discrimination peak in the voice boundary area. In addition, the observed discrimination performance should fit with predicted discrimination data derived from the identification data (Liberman et al., 1957). If perception is not categorical, a gradually sloping identification function with no sharp boundary should be observed along with the absence of a discrimination peak in the obtained discrimination data. These two alternatives represent the two extremes of categorical versus continuous perception. Any pattern of identification and/or discrimination data falling between these extremes would represent an intermediate finding between categorical and continuous perception.

A. Method

1. Subjects

The subjects were 30 volunteers drawn from introductory and upper-level psychology courses at Wayne State University. Subjects received course credit for their participation. All subjects were native speakers of English who reported no history of a speech or hearing disorder.

2. Stimuli

The stimuli were synthesized speech tokens prepared with an updated version of the Klatt (1980) software synthesis package. The stimuli consisted of 250-ms-duration tokens of the steady-state vowel /i/ ranging perceptually from male to female voice. The vowel /i/ was chosen because the formant values for male and female voice tokens of /i/ do not usually overlap with other vowels. In a pilot experiment, the acoustic factors of fundamental frequency (F0), formant values (F1, F2, F3), and glottal function (a combination of AH, OQ, and TL) were manipulated to create a set of stimuli presented to 16 subjects for perceptual ratings of voice quality. In the pilot experiment, low F0 values and low formant values biased listeners toward "male" voice ratings. However, glottal function values had little effect either way on the classification of stimuli as male or female. Two stimuli corresponding to the most highly rated "male voice" token and the most highly rated "female voice" token were chosen. These two tokens differed in F0 and formant values, but the glottal function values were identical and corresponded to the least breathy glottal function. The full synthesis values for these two tokens are listed in Appendix A. These tokens were used as series end points for the synthetic voice continuum used for the experiment. An 11-member synthetic voice continuum varying from male to female voice was generated by incrementing F0 and formant values in linear stepwise fashion between the F0 and formant values corresponding to the two end points. Glottal function values remained constant across the series.

3. Procedure

The baseline identification trials were presented first. One block of 110 randomized trials (11 continuum stimuli × 10 repetitions) was presented. Stimuli were presented one at a time for identification. Listeners were instructed to listen to each stimulus and rate it using a six-point scale from "good male voice" to "good female voice." They were told to use numbers 1 or 6 if the voice was a good exemplar of a male or female voice, numbers 2 or 5 if the voice was clearly male or female but did not sound as good as other voices, and numbers 3 or 4 if they were not sure about the voice gender and had to guess.

After a 2-min break, listeners were presented with the two-step ABX discrimination trials. In the two-step ABX paradigm, two stimuli differing by two places along the continuum (i.e., stimuli 1 and 3, 5 and 7, etc.) are presented as the "A" and "B" stimuli in the trial. The "X" stimulus is identical to either A or B and the listener decides whether the X stimulus matched A or B. 180 randomized trials were presented, with half the trials in the ABA format and half in the ABB format. The three stimulus events on each trial were separated by a 500-ms ISI. For each of the nine possible AB stimulus pairings, the order of stimuli within the pair was counterbalanced. These arrangements produced a total of 20 discrimination responses per stimulus pair. Subjects were given a 1-min break halfway through the trial block. Stimuli were presented on computer using the ONLINE program (Miller, 1990). Stimuli were sampled at a rate of 10 kHz, low-pass filtered at 4.8 kHz, and presented one at a time over AKG K240DF headphones to subjects at a comfortable listening level.

B. Results and discussion

The identification and discrimination data collapsed across subjects are displayed in Fig. 1. In experiment 1 and in all subsequent experiments described below, results were also analyzed with a sex of listener factor. Since no effects of sex of listener were observed in any experiment, all results are reported without this analysis variable.

FIG. 1. Identification and ABX discrimination data from experiment 1. The identification data are indexed by the rating scale on the left Y axis, and the obtained versus predicted discrimination performance, in terms of percent correct discrimination, is indexed by the percent correct scale on the right Y axis. [Axes: STIMULUS 1-11 (X); RATING (left Y); % CORRECT (right Y). Functions: identification, obtained ABX, predicted ABX.]

As shown in the figure, the identification function exhibits a gradual slope from male to female voice categories. There is no sharp discontinuity in the boundary region between the male and female ends of the continuum. For the ABX data, two functions are shown: the obtained ABX data from the experiment and the predicted ABX data based on the identification data. The predicted data were obtained by applying the standard formula of Liberman et al. (1957) to the identification data for each subject to derive predicted discrimination performance. As seen in the figure, the obtained ABX data consist of high overall discrimination performance with no discrimination peak. This function varies from the predicted data for each stimulus pair.

The identification and discrimination results indicate that the perception of the synthetic voice gender continuum is not categorical. There was no sharp discontinuity between male and female categories in the identification data and there was an absence of a discrimination peak in the boundary region. In addition, the discrimination performance was substantially higher than the discrimination performance predicted by applying the Haskins formula to the identification data. The pattern of identification and discrimination performance is typical of many auditory psychophysical functions. The lack of categorical perception indicates that the perception of voice gender is different from the perception of phonetic stimuli, in terms of categoricalness. These results suggest that the perception of voice gender is handled by auditory psychophysical processes. In addition, the results suggest that specific and detailed auditory information about voice gender is retained during perception.

II. EXPERIMENT 2

Although the combined identification and discrimination data from experiment 1 suggest an absence of categorical perception, it is possible that the results were affected by another factor. This factor is that intermediate voice categories could exist in between the male and female end points of the synthetic voice continuum. If such voice categories exist, the presence of these categories could influence the interpretation of the data obtained from experiment 1.

To assess this possibility, a second identification experiment was conducted. In experiment 2, the same stimulus continuum from experiment 1 was used. However, instead of using a six-point male/female rating scale, subjects were given three response alternatives: male, female, or "other." Subjects were told that the other response should be used if they heard a voice that did not fit into the male or female categories. If the number of other identification responses to stimuli in the middle of the continuum turns out to be high, this would suggest that another voice was perceived in the continuum. If the number of other responses to these stimuli is low, this would suggest that subjects in experiment 1 essentially perceived the continuum in terms of the two male/female categories.

A. Method

In experiment 2, all aspects of the procedure for presenting the identification trials were identical to experiment 1 except for the response alternatives. Listeners were instructed to listen to each stimulus and respond by pressing a key corresponding to three choices: male, female, or other. Listeners were told that they should use the other response alternative whenever they heard a stimulus that they did not distinctly perceive as a male or female voice.

B. Results and discussion

The identification data collapsed across subjects are displayed in Fig. 2. In this figure, the identification data are shown as three separate functions for each response alternative. The pattern of data for the male and female responses was approximately the same as observed for experiment 1. The other response was used primarily to label stimuli from the middle of the continuum. However, when examining the extent to which subjects used this label, identification responses only reached 32% for the stimulus receiving the highest number of other responses (stimulus 6). Also, the overall total number of other responses was 13.1%, compared to 46.9% for the male responses and 39.9% for the female responses. Thus, although there was a tendency for some stimuli to be labeled as an other voice, evidence for the presence of a third voice was weak. Overall, the results from experiment 2 provide little support for the presence of other voices in the stimulus continuum.

III. EXPERIMENT 3

In the next two experiments, the nature of the perceptual representation of voice gender was investigated by using selective adaptation techniques. Selective adaptation in speech perception has a long history (Ades, 1976; Samuel, 1986). Although some researchers have criticized speech adaptation experiments (Diehl, 1981; Diehl et al., 1985), others suggest that the technique is useful in examining the nature of speech perception and representation (Samuel, 1986). The results of adaptation experiments have been used to specify details about the perceptual levels of processing used during speech perception (Sawusch, 1986) and the format of LTM representations of speech (Miller et al., 1983; Samuel, 1982). Here, our assumption about selective adaptation is that the repeated presentation of an adapting stimulus somehow affects or alters the perceptual processing or representation of speech related to the adaptor.
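As a worked illustration of the predicted ABX function used in experiment 1, the sketch below applies the commonly cited two-category form of the Liberman et al. (1957) prediction, P(correct) = 0.5 [1 + (pA - pB)^2], to every two-step pair on an 11-member continuum. Several things here are assumptions rather than details from the paper: the function names, the dichotomizing of the six-point ratings (1-3 counted as "male"), and the labeling proportions, which are invented for illustration and are not the reported data.

```python
from typing import List, Sequence


def proportion_male(ratings: Sequence[int]) -> float:
    """Share of ratings counted as 'male' (assumption: 1-3 = male, 4-6 = female)."""
    return sum(1 for r in ratings if r <= 3) / len(ratings)


def predicted_abx(p_a: float, p_b: float) -> float:
    """Two-category prediction: P(correct) = 0.5 * (1 + (p_a - p_b)**2)."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


def predict_pairs(p_male: Sequence[float], step: int = 2) -> List[float]:
    """Predicted proportion correct for each pair (i, i + step) on the continuum."""
    return [predicted_abx(p_male[i], p_male[i + step])
            for i in range(len(p_male) - step)]


# Hypothetical P("male") per stimulus; illustrative values only.
p_male = [1.0, 1.0, 0.95, 0.9, 0.75, 0.5, 0.25, 0.1, 0.05, 0.0, 0.0]
predicted = predict_pairs(p_male)  # one value per two-step pair (1-3, 2-4, ..., 9-11)
```

With 11 stimuli and a two-step separation this yields nine pairs, matching the nine AB pairings described in the procedure. Under this formula, pairs labeled identically are predicted at chance (0.5), so uniformly high obtained discrimination would exceed the prediction.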
FIG. 2. Identification data from experiment 2. The data are indexed by percent correct labeling on the Y axis for the three response alternatives (male, female, or other). [Axes: STIMULUS 1-11 (X); % CORRECT 0-100 (Y).]
voice adaptation. Auditory overlap refers to the degree of acoustic overlap of an adapting stimulus to the continuum end-point stimulus in the continuum being tested. The role of auditory overlap on selective adaptation in speech perception is very important. Some research in speech adaptation has indicated that adaptation does not occur unless there is a substantial auditory overlap of the adaptor with a continuum end point (Ades, 1976). These results have been interpreted as evidence for the presence of an auditory level of speech processing and representation (Sawusch and Jusczyk, 1981).

TABLE I.

              Synthetic adaptors   Natural adaptors   Difference   Ratio
Male     F0   136 Hz               125 Hz             -11 Hz       0.92
         F1   270 Hz               259 Hz             -11 Hz       0.96
         F2   2290 Hz              2074 Hz            -216 Hz      0.91
         F3   3010 Hz              2938 Hz            -72 Hz       0.98
Female   F0   250 Hz               230 Hz             -20 Hz       0.92
         F1   310 Hz               362 Hz             +52 Hz       1.17
         F2   2790 Hz              2506 Hz            -284 Hz      0.90
         F3   3310 Hz              3595 Hz            +285 Hz      1.09
natural adaptors, the auditory overlap from adaptor to series end point varied from 100%. The percent overlap for each acoustic parameter was determined by deriving a ratio value equal to the value of each natural adaptor parameter divided by the value of the corresponding parameter for the appropriate synthetic continuum end point (i.e., male adaptor compared to male end point, female adaptor compared to female end point). This ratio was converted to a percentage overlap value for each separate parameter (see Table I). All parameter values for the natural male adaptor fell below that of the male end point. For the natural female adaptor, the F0 and F2 values fell below the female end point, while the F1 and F3 values fell above the end point. Thus the auditory overlap of the natural adaptors to the end points, as defined by a combination of F0 and formant values, was close to but not identical to the values for the synthetic end-point adaptors.

The predictions for experiment 3 are as follows. If the synthetic and natural adaptors have the same effect on adaptation, this would indicate that auditory overlap has little effect. This result would be consistent with the hypothesis that an abstract perceptual representation of voice gender was adapted in the experiment. The other alternative is that the amount of adaptation produced by the synthetic and natural adaptors differs. This result would indicate that auditory overlap does affect adaptation, suggesting that an auditory-based representation of voice was adapted.

For the synthetic male and synthetic female conditions, the adaptors consisted of the male and female end-point stimuli from the continuum, respectively. For the male natural and female natural conditions, the adaptors were naturally produced /i/ vowel tokens spoken by a male and female speaker, respectively.

Each subject received the baseline block of trials first followed by the block of adaptation trials. Each block consisted of 110 trials. Listeners used the same six-point rating scale as before.

B. Results and discussion

The results collapsed across subjects for the four adaptation conditions are displayed in Fig. 4. The figure shows the identification functions before and after adaptation for each of the four adaptor conditions. A visual inspection of the data indicates that the identification function is shifted toward the male end point for the male adaptor conditions and toward the female end point for the female adaptor conditions.

Two three-way ANOVAs with the factors of token (stimulus number in the continuum), condition (baseline or adaptation), and adaptor (synthetic or natural) were run on the combined data from the two male adaptor conditions and the combined data from the two female conditions. This
[Figure: panels (a) and (b), labeled Male and Female; horizontal axis Time (sec).]
FIG. 4. Baseline and adaptation identification data from experiment 3. The synthetic male and natural male data are shown on the top left and right and the synthetic female and natural female data are shown on the bottom left and right. [Axes in each panel: STIMULUS (X); RATING (Y). Functions in each panel: baseline and adaptation.]
FIG. 5. Spectrograms for the formants and F0 adaptors. Amplitude displays are shown on top for each stimulus, F0 values over time in the middle, and spectrographic displays with the center formant values for F1, F2, and F3 shown at the bottom. [Panel titles: High F0, Low Fs; Low F0, High Fs.]
terms of the major acoustic parameters related to voice gender. However, the perceptual ratings from the pilot experiment described in experiment 1 indicated that there were some differences between the adaptors in terms of voice gender quality. On a five-point rating scale from male to female voice, the F0 adaptor received a rating of 1.8 and the formants adaptor received a rating of 3.3. Thus the F0 adaptor was rated as more male than the formants adaptor.

The use of these adaptors also allows an assessment of whether perceptual voice quality is responsible for adaptation. If this factor matters, one would expect that a significant amount of adaptation would be obtained with the F0

is important to assess whether the voice adaptation effects found previously were due to this factor.

Finally, if F0 overlap is responsible for adaptation, then adaptation should only be obtained with the F0 adaptor. If neither adaptor has an effect, this would indicate that it is the combination of the formant and F0 acoustic factors that is important in producing adaptation. This result would strengthen the hypothesis that the perceptual representation of voice gender is based on auditory parameters and not abstract voice representations.

A. Method
[Figure: RATING versus STIMULUS.]

F0 adaptor were identical to the formant values for the female end point. If adaptation was driven by formant overlap, then the F0 shift could be interpreted as a formant-driven adaptation effect toward the female end of the continuum instead of an assimilation shift. However, this explanation loses strength as one considers that the formants adaptor had no effect. If adaptation was formant driven, both adaptors should have had significant adapting effects in opposite directions. It is unclear why an assimilation effect would occur

V. EXPERIMENT 5
[Figure: RATING versus STIMULUS; one panel shows the Identification and Male Instructions functions, the other the Baseline and Female Instructions functions.]

However, there was no significant effect of condition (F = 2.3, p < 0.15) and no significant interaction of token with condition (F = 1.4, p < 0.19). For the female instructions group, significant main effects of token [F(10,140) = 360.8, p < 0.001] and condition [F(1,14) = 21.3, p < 0.001] were obtained. The interaction of token with condition was also significant [F(10,140) = 3.1, p < 0.001]. Post hoc tests of the interaction showed reliable differences between baseline and adaptation conditions only for stimuli 4-7, 9, and 11.³

The results for the male instructions group were consistent with an auditory-based explanation of voice adaptation. However, the results for the female instructions group were unexpected. When subjects were told that the adaptor was female, they heard the stimuli across the continuum as more female (assimilation). Why would male instructions have no effect but female instructions produce assimilation? One speculative answer is that subjects who received female instructions adjusted their overall criteria to favor female responses to the other stimuli in the series. However, this explanation is ad hoc. It is interesting to note that, in experiment 4, assimilation toward the female end point was also found, but was produced by adaptation with the F0 adaptor. Although there may be some common mechanism producing this effect in both experiments, the nature of this mechanism remains unclear.
suggesting that auditory overlap is important. Finally, in experiment 5, there was no clear-cut evidence in support of a higher level cognitive factor in voice adaptation and by extension no evidence for an abstract representation of voice gender.

When the findings from all experiments are considered together, some differences and similarities between phoneme representations and voice gender representations can be discussed. The most important finding is that voice gender is not stored in abstract male and female voice representations. Instead, voice gender appears to be stored in the form of auditory-based perceptual representations. These representations, in all probability, contain specific auditory information about acoustic voice parameters relevant to gender. The results of experiment 4 suggest that these representations are not based on one isolated parameter like F0 or formant frequencies. Instead, the representations are probably an auditory composite of the various acoustic factors relevant to voice gender like F0, formant frequencies, breathiness, etc. Although there is a close relationship between phonetic coding and voice coding processes during perception, the representations of phonemes and voices appear to be qualitatively different in that phonetic representations may not be as detailed.

The adaptation effects observed also contrast with studies examining vowel adaptation (Godfrey, 1980; Morse et al., 1976). Vowel adaptation, as assessed in these studies, can occur in some circumstances when the vowel adaptor and vowel end point are spectrally dissimilar. Results of this type can be explained by positing the involvement of either higher level auditory patterns or abstract phonetic representations. However, the lack of adaptation with spectrally dissimilar voice adaptors observed in the present study suggests that voice is tied to lower level auditory representations.

The present results also serve to suggest future directions to pursue concerning the acoustic factors related to voice gender perception (Coleman, 1976; Lass et al., 1976; Murry and Singh, 1980; Singh and Murry, 1978). Instead of focusing on separate individual acoustic factors and how they contribute to gender perception, the results of experiments 3 and 4 suggest that perhaps voice gender should be studied in terms of integrated auditory representations. In addition, the present findings suggest that infants' classification of voice into male and female categories (Miller, 1983; Miller et al., 1982) may be based on heuristics that utilize specific and detailed auditory voice information. Further studies of the type performed in the present study with male-only voices and female-only voices may help to elucidate some of these issues.

Finally, one dimensional issue should be mentioned concerning the present results. All of the experiments reported here used synthetic speech tokens. One possible criticism of this study is that the results found with synthetic speech may not generalize to natural speech. Synthetic voices are reduced stimuli that do not contain the full complement of acoustic information contained in natural voices. There is much evidence indicating that perception and memory for synthetic stimuli is different than for natural stimuli (Ralston et al., 1995). Our reply to this potential discussion is that our findings represent a first step toward defining the nature of the perceptual representation of voice gender and that much of the knowledge we have about acoustic-phonetic perception is based on work with synthetic speech. In addition, synthetic voices have been used by others to determine how listeners judge the perceptual quality of voice (Gerratt et al., 1993). But, we do acknowledge that in order to provide a more definitive examination of voice representation, future studies should compare synthetic speech to natural speech in addressing related issues.

In conclusion, the hypothesis that voice gender is stored in abstract representations in memory received little support. However, the present investigation focused only on a few preliminary aspects of this issue. Future research needs to examine in further detail the prototype hypothesis of storing voices in memory (Papcun et al., 1989) and other details about voice representation not specifically related to voice gender.

ACKNOWLEDGMENTS

This research was supported by NIDCD Grant No. R01-DC01667 to one of the authors (J. W. M.). We wish to thank Georgianne Baartmans, Renee Dudzinski, Kathy Gorday, and Margaret Webb for running subjects and Lin Zong for programming assistance. Thanks also to Jim Ralston, Jim Sawusch, and one anonymous reviewer for useful comments and critiques.

APPENDIX A

Male end-point synthesis values

SYM  V/C  VAL       SYM  V/C  VAL
sr   C    10 000    nf   C    4
du   C    250       ss   C    2
ui   C    5         rs   C    1
f0   V    136       av   V    60
F1   v    270       b1   v    60
F2   v    2290      b2   v    90
F3   v    3010      b3   v    150
F4   v    3500      b4   v    200
F5   v    3700      b5   v    200
f6   v    4990      b6   v    500
fz   v    280       bz   v    90
fp   v    280       bp   v    90
ah   V    35        oq   V    75
at   v    0         tl   V    20
af   v    0         sk   v    0
a1   v    0         p1   v    80
a2   v    0         p2   v    200
a3   v    0         p3   v    350
a4   v    0         p4   v    500
a5   v    0         p5   v    600
a6   v    0         p6   v    800
an   v    0         ab   v    0
ap   v    0         os   C    0
g0   v    64        dF   v    0
db   v    0
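Experiment 1 describes the 11-member continuum as generated by incrementing F0 and formant values in linear steps between the two end points. A minimal sketch of that interpolation follows. It takes the male values from Appendix A and the female values from the synthetic adaptor column of Table I (the synthetic adaptors are described as the continuum end points). The function name `linear_continuum`, the restriction to F0 and F1-F3, and the use of equal steps in Hz are assumptions made for illustration; the paper does not say whether higher formants were also varied.

```python
def linear_continuum(start, end, n_steps=11):
    """Interpolate every parameter in equal steps from start to end (inclusive)."""
    return [
        {name: start[name] + (end[name] - start[name]) * i / (n_steps - 1)
         for name in start}
        for i in range(n_steps)
    ]


# End-point values in Hz (male: Appendix A; female: Table I, synthetic column).
male_end = {"F0": 136, "F1": 270, "F2": 2290, "F3": 3010}
female_end = {"F0": 250, "F1": 310, "F2": 2790, "F3": 3310}

continuum = linear_continuum(male_end, female_end)
for number, stimulus in enumerate(continuum, start=1):
    print(number, stimulus)
```

Under these assumptions each step adds 11.4 Hz to F0, 4 Hz to F1, 50 Hz to F2, and 30 Hz to F3, so stimulus 6 sits at the arithmetic midpoint of the two end points.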
3091 J. Acoust. Soc. Am., Vol. 98, No. 6, December 1995 Mullennix et aL: Voice gender 3091
1 For all four adaptation conditions, the results were also analyzed in terms of
category boundary data. Category boundaries were computed for baseline
and adaptation conditions for each subject and then analyzed in a one-way
ANOVA. The results using this method mirrored the results reported on
analyses of raw identification data, in terms of overall effect of condition:
[F(1,14)=6.8, p<0.03] for synthetic male, (F=0.8, n.s.) for natural
male, (F=0.4, n.s.) for synthetic female, and [F(1,12)=10.7, p<0.01]
for natural female. Thus the effects of adaptation as measured through
category boundary movement analyses were approximately the same.
2 Results for these two conditions were also analyzed using category boundary
data. The analyses showed that the effect of condition, using category
boundary data, was similar to the analyses based on raw identification data
for the formants adaptor (F=1.3, n.s.) and for the F0 adaptor (F=3.2,
n.s.).
3 Results for these two conditions were also analyzed using category boundary
data. The results showed that the effect of condition, using category
boundary data, was similar to the analyses based on raw identification data
for the male instructions condition (F=2.5, n.s.) and the female instructions
condition [F(1,14)=9.0, p<0.01].

SYM  V/C  VAL    SYM  V/C  VAL
a2   v      0    p2   v    200
a3   v      0    p3   v    350
a4   v      0    p4   v    500
a5   v      0    p5   v    600
a6   v      0    p6   v    800
an   v      0    ab   v      0
ap   v      0    os   C      0
g0   v     64    dF   v      0
                 db   v      0

Varied parameters

Time   f0   av   ah   oq   tl
   0  136    0   20   25    0
   5  136   54   20   28    0
  10  136   60   20   31    0
  15  136   60   20   34    0
  20  136   60   20   37    0
  25  136   60   20   40    0
  30  136   60   20   43    0
  35  136   60   20   46    0
  40  136   60   20   50    0
  45  136   60   20   50    0
  50  136   60   20   50    0
  55  136   60   20   50    0
  60  136   60   20   50    0
  65  136   60   20   50    0
  70  136   60   20   50    0
  75  136   60   20   50    0
  80  136   60   20   50    0
  85  136   60   20   50    0
  90  136   60   20   50    0
  95  136   60   20   50    0
 100  136   60   20   50    0
 105  136   60   20   50    0
 110  136   60   20   50    0
 115  136   60   20   50    0
 120  136   60   20   50    0
 125  136   60   20   50    0
 130  135   60   20   50    0
 135  134   60   20   50    0
 140  133   60   20   50    0
 145  132   60   20   50    0
 150  131   60   20   50    0
 155  130   60   20   50    0
 160  129   60   20   50    0
 165  128   60   20   50    0
 170  127   60   20   50    0
 175  126   60   20   50    0
 180  125   60   20   50    0
 185  124   58   21   50    1
 190  123   57   22   51    3
 195  122   56   23   52    4
 200  121   55   24   53    6
 205  120   53   25   53    7
 210  119   52   26   54    9
 215  118   51   28   55   10
 220  118   50   29   56   12
 225  117   48   30   56   13
 230  116   47   31   57   15
 235  115   46   32   58   16
 240  114   45   33   59   18
 245  113    0   35   60   20

Ades, A. E. (1976). "Adapting the property detectors for speech perception," in New Approaches to Language Mechanisms, edited by R. J. Wales and E. Walker (North-Holland, Amsterdam), pp. 55-107.
Assmann, P. F., Nearey, T. M., and Hogan, J. T. (1982). "Vowel identification: Orthographic, perceptual, and acoustic aspects," J. Acoust. Soc. Am. 71, 975-989.
Coleman, R. O. (1976). "A comparison of the contributions of two voice quality characteristics to the perceptions of maleness and femaleness in the voice," J. Speech Hear. Res. 19, 168-180.
Creelman, C. D. (1957). "Case of the unknown talker," J. Acoust. Soc. Am. 29, 655.
Diehl, R. L. (1981). "Feature detectors for speech: A critical reappraisal," Psychol. Bull. 89, 1-18.
Diehl, R. L., Kluender, K., and Parker, E. M. (1985). "Are selective adaptation and contrast effects really distinct?," J. Exp. Psychol. Hum. Percept. Performance 11, 209-220.
Fourcin, A. J. (1968). "Speech-source interference," IEEE Trans. Audio Electroacoust. ACC-16, 65-67.
Garner, W. (1974). The Processing of Information and Structure (Erlbaum, Hillsdale, NJ).
Gerratt, B. R., Kreiman, J., Antonanzas-Barroso, N., and Berke, G. S. (1993). "Comparing internal and external standards in voice quality judgements," J. Speech Hear. Res. 36, 14-20.
Godfrey, J. J. (1980). "Comparison of consonantal and vocalic cues in selective adaptation," Percept. Psychophys. 28, 103-111.
Harnad, S. (Ed.). (1987). Categorical Perception (Cambridge U. P., Cambridge, England).
Johnson, K. (1990a). "The role of perceived speaker identity in F0 normalization of vowels," J. Acoust. Soc. Am. 88, 642-654.
Johnson, K. (1990b). "Contrast and normalization in vowel perception," J. Phon. 18, 229-254.
Klatt, D. H. (1980). "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am. 67, 971-995.
Klatt, D. H. (1989). "Review of selected models of speech perception," in Lexical Representation and Process, edited by W. D. Marslen-Wilson (MIT, Cambridge, MA), pp. 169-226.
Klatt, D. H., and Klatt, L. C. (1990). "Analysis, synthesis, and perception of voice quality variations among female and male talkers," J. Acoust. Soc. Am. 87, 820-857.
Kreiman, J., and Papcun, G. (1991). "Comparing discrimination and recognition of unfamiliar voices," Speech Commun. 10, 265-275.
Kuhl, P. K. (1991). "Human adults and human infants show a perceptual magnet effect for the prototypes of speech categories, monkeys do not," Percept. Psychophys. 50, 93-107.
Ladefoged, P., and Broadbent, D. (1957). "Information conveyed by vowels," J. Acoust. Soc. Am. 29, 98-104.
Lass, N. J., Hughes, K. R., Bowyer, M. D., Waters, L. T., and Bourne, V. T. (1976). "Speaker sex identification from voiced, whispered, and filtered isolated vowels," J. Acoust. Soc. Am. 59, 675-678.
Liberman, A. M., Harris, K. S., Hoffman, H. S., and Griffith, B. C. (1957). "The discrimination of speech sounds within and across phoneme boundaries," J. Exp. Psychol. 54, 358-368.
Macmillan, N. A., Kaplan, H. L., and Creelman, C. D. (1977). "The psychophysics of categorical perception," Psychol. Rev. 84, 452-471.
Miller, C. L. (1983). "Developmental changes in male/female classification by infants," Infant Behav. Dev. 6, 313-330.
Miller, C. L., Younger, B. A., and Morse, P. A. (1982). "The categorization of male and female voices in infancy," Infant Behav. Dev. 5, 143-159.
Miller, C. M. (1990). ONLINE; A general-purpose program for the generation and control of laboratory experiments and perceptual tests with acoustic stimuli, for use with microcomputers using MS-DOS [computer program], Laboratory Microsystems, Baton Rouge, LA.
Miller, J. L., Connine, C. M., Schermer, T. M., and Kluender, K. R. (1983). "A possible auditory basis for internal structure of phonetic categories," Percept. Psychophys. 46, 505-512.
Morse, P. A., Kass, J. E., and Turkienicz, R. (1976). "Selective adaptation of vowels," Percept. Psychophys. 19, 137-143.
Mullennix, J. W., and Pisoni, D. B. (1990). "Stimulus variability and processing dependencies in speech perception," Percept. Psychophys. 47, 379-390.
Mullennix, J. W., Pisoni, D. B., and Martin, C. S. (1989). "Some effects of talker variability on spoken word recognition," J. Acoust. Soc. Am. 85, 365-378.
Murry, T., and Singh, S. (1980). "Multidimensional analysis of male and female voices," J. Acoust. Soc. Am. 68, 1294-1300.
Nygaard, L. C., Sommers, M. S., and Pisoni, D. B. (1992). "Effects of stimulus variability on the representation of spoken words in memory," Research on Spoken Language Processing PR-18 (Indiana University, Bloomington, IN), pp. 163-184.
Palmeri, T. J., Goldinger, S. D., and Pisoni, D. B. (1993). "Episodic encoding of voice attributes and recognition memory for spoken words," J. Exp. Psychol. Learn. Memory Cognition 19, 309-328.
Papcun, G., Kreiman, J., and Davis, A. (1989). "Long-term memory for unfamiliar voices," J. Acoust. Soc. Am. 85, 913-925.
Pisoni, D. B., and Lazarus, J. H. (1974). "Categorical and noncategorical modes of speech perception along the voicing continuum," J. Acoust. Soc. Am. 55, 328-333.
Ralston, J. V., Pisoni, D. B., and Mullennix, J. W. (1995). "Perception and comprehension of speech," in Applied Speech Technology, edited by A. Syrdal, R. Bennett, and S. Greenspan (CRC, Boca Raton, FL), pp. 233-287.
Remez, R., Rubin, P., Nygaard, L., and Howell, W. (1987). "Perceptual normalization of vowels produced by sinusoidal voices," J. Exp. Psychol. Hum. Percept. Performance 13, 40-61.
Samuel, A. G. (1982). "Phonetic prototypes," Percept. Psychophys. 31, 307-314.
Samuel, A. G. (1986). "Red herring detectors and speech perception: in defense of selective adaptation," Cognitive Psychol. 18, 452-499.
Samuel, A. G. (1988). "Central and peripheral representation of whispered and voiced speech," J. Exp. Psychol.: Hum. Percept. Perform. 14, 379-388.
Sawusch, J. R. (1977). "Peripheral and central processes in selective adaptation of place of articulation in stop consonants," J. Acoust. Soc. Am. 62, 738-750.
Sawusch, J. R. (1986). "Auditory and phonetic coding of speech," in Pattern Recognition by Humans and Machines, edited by E. C. Schwab and H. C. Nusbaum (Academic, Orlando, FL), Vol. 1, pp. 51-88.
Sawusch, J. R., and Jusczyk, P. W. (1981). "Adaptation and contrast in the perception of voicing," J. Exp. Psychol.: Hum. Percept. Perform. 11, 242-250.
Sawusch, J. R., and Pisoni, D. B. (1976). "Response organization in selective adaptation to speech sounds," Percept. Psychophys. 20, 413-418.
Singh, S., and Murry, T. (1978). "Multidimensional classification of normal voice qualities," J. Acoust. Soc. Am. 64, 81-87.
Sommers, M. S., Nygaard, L. C., and Pisoni, D. B. (1992). "Stimulus variability and spoken word recognition: effects of variability in speaking rate and overall amplitude," in Research on Spoken Language Processing PR-18 (Indiana University, Bloomington, IN), pp. 31-52.
Van Lancker, D., Kreiman, J., and Wickens, T. (1985b). "Familiar voice recognition: patterns and parameters. Part II: Recognition of rate-altered voice," J. Phon. 13, 39-52.
Van Lancker, D., Kreiman, J., and Emmorey, K. (1985a). "Familiar voice recognition: patterns and parameters. Part I: Recognition of backward voices," J. Phon. 13, 19-38.
Verbrugge, R. R., Strange, W., Shankweiler, D. P., and Edman, T. R. (1976). "What information enables a listener to map a talker's vowel space?," J. Acoust. Soc. Am. 60, 198-212.
Weenink, D. J. M. (1986). "The identification of vowel stimuli from men, women, and children," in Proceedings 10 from the Institute of Phonetic Sciences of the University of Amsterdam (University of Amsterdam, Amsterdam, The Netherlands), pp. 41-54.