Gender and Voice

The perceptual representation of voice gender

John W. Mullennix, Keith A. Johnson, Meral Topcu-Durgun, and Lynn M. Farnsworth

Citation: The Journal of the Acoustical Society of America 98, 3080 (1995); doi: 10.1121/1.413832
Published by the Acoustical Society of America


John W. Mullennix
Department of Psychology, Wayne State University, 71 W. Warren Avenue, Detroit, Michigan 48202
Keith A. Johnson
Department of Linguistics, Ohio State University, Columbus,
Meral Topcu-Durgun and Lynn M. Farnsworth
Department of Psychology,

(Received 29 November 1993; revised 2 June 1995; accepted 7 June 1995)
The perceptual representation of voice gender was examined with two experimental paradigms: identification/discrimination and selective adaptation. The results from the identification and discrimination of a synthetic male-female voice continuum indicated that voice gender perception was not categorical. In addition, results from selective adaptation experiments with natural and synthetic voice stimuli indicated that the perceptual representation of voice that is adapted is an auditory-based representation. Overall, these findings suggest that the perceptual representation of voice gender is auditory based and is qualitatively different from the representation of phonetic information. © 1995 Acoustical Society of America.
PACS numbers: 43.71.Bp

INTRODUCTION

Listeners are able to identify talkers by their voices with a great deal of accuracy and very little trouble. Even when voices are unfamiliar, listeners have very consistent impressions of a talker's gender, height, weight, etc. It is clear that we have knowledge about voices in long-term memory (LTM) that is used in perceiving voice, as illustrated by the brief mental hesitation we sometimes experience when we answer the telephone and try to identify the voice of the caller. Despite the research on recognition and memory for familiar and unfamiliar voices (Kreiman and Papcun, 1991; Papcun et al., 1989; Van Lancker et al., 1985a, 1985b), we still know very little about how voice is represented during perception and stored in LTM.

The importance of investigating the representation of voice is underscored by recent findings implicating a close relationship between the listener's use of perceived talker voice information and phonetic perception. For instance, Johnson (1990a) found that the perception of vowels is altered when vowels are embedded within carrier phrases whose intonational contours denote different talkers. He concluded that vowel normalization is mediated through processes that rely on contextual talker-related information (see also Ladefoged and Broadbent, 1957). Other studies have shown that variation in talker identity from trial to trial in an … et al., 1982; Johnson, 1990b; Verbrugge et al., 1976; Weenink, 1986), consonant perception (Fourcin, 1968), and word recognition (Creelman, 1957; Mullennix et al., 1989; Nygaard et al., 1992; Sommers et al., 1992). These interfering effects of voice suggest that the processing of voice and the processing of phonetic information are closely tied together. Further evidence for this close relationship was obtained by Mullennix and Pisoni (1990), who found that the processing of voice and phoneme dimensions in a speeded classification task (Garner, 1974) was integral. In their study, the integrality of processing indicated that the perceptual processing of the phonetic information was contingent on the perceptual processing of voice and vice versa.

Given the close relationship between voice perception and phonetic perception, it is important to assess the specific processes and representations relevant to each. Theories of speech perception have concentrated almost exclusively on the perceptual processes involved in processing acoustic-phonetic information (Klatt, 1989). Our view is that to understand speech perception in its entirety, the perceptual representation of voice must also be delineated. Recently, some research has explored how acoustic-phonetic information is stored in memory, with the hypothesis advanced that speech is stored in terms of prototypes (Kuhl, 1991; Miller et al., 1983; Samuel, 1982). A similar hypothesis has been entertained for voices (Papcun et al., 1989). Although the hypothesis that voices are stored in LTM as prototypes has merit, there is little empirical evidence currently available in the literature to support this assertion (although, see Kreiman and Papcun, 1991).

In the present study, our goal is to examine the perceptual representation of talker voice. Specifically, we focus on the representation of voice gender. The perception of voice gender depends on a number of acoustic factors, including fundamental frequency, formant frequencies, and breathiness (Klatt and Klatt, 1990). Previous studies have investigated the relative importance of these acoustic factors in contributing to judgements of voice gender (e.g., Lass et al., 1976; Murry and Singh, 1980). The importance of voice gender in talker voice perception is illustrated by the fact that infants are able to categorize voices into male and female categories at a very early age (Miller, 1983; Miller et al., 1982). Other research has indicated that the primary factor underlying the perceived similarity of pairs of normal voices is male-female categorization (Singh and Murry, 1978). Given the importance of gender in the perception of
3080 J. Acoust. Soc. Am. 98 (6), December 1995 0001-4966/95/98(6)/3080/16/$6.00 © 1995 Acoustical Society of America 3080
voice, we decided that exploring voice gender was the logical first step in determining how voice in general is perceptually represented. The series of experiments described below was designed to provide specific evidence regarding how voice stimuli varying in gender are represented in memory.

In order to investigate the perceptual representation of voice gender, we used a twofold approach to the problem. First, the issue of categorical perception for voice was examined. Although there are debates about what categorical perception is and how it pertains to speech perception (e.g., see the volume edited by Harnad, 1987), we felt it worthwhile to explore categorical perception with voice gender in order to make some preliminary assessments about voice gender representation. Certainly, there is evidence that categorical perception may be a general perceptual ability, rather than a specialized ability restricted only to speech (Harnad, 1987). It is possible that this general ability could exhibit itself with voice-related stimuli that do not vary in acoustic-phonetic dimensions. If, by using standard identification and discrimination paradigms, it is shown that the perception of voice gender is categorical, then discriminations within voice gender categories should be poor. This result would be consistent with the idea that voice information is converted into a reduced representation during perception, with some detailed information about voice gender "lost." On the other hand, if perception of voice gender is not categorical, then this would suggest that, during perception, detailed information relating to voice gender is retained and is available to voice discrimination processes. This latter result would be consistent with recent findings indicating that episodic information about voice is retained in memory (Palmeri et al., 1993).

The second approach is to assess the perceptual representation of voice gender by using selective adaptation techniques. The results from selective adaptation experiments in speech perception have proven useful in examining levels of perceptual processing and representation (Samuel, 1986; Sawusch, 1986). In particular, these findings provide evidence for at least two levels of processing and representation for phonetic perception: an auditory-based level and a higher, abstract level (Samuel, 1986; Sawusch, 1986). In the present study, voice gender stimuli were adapted under various conditions in order to determine the level of perceptual representation appropriate for voice: low-level auditory or higher level abstract. Evidence for a higher level perceptual representation would be consistent with the idea that voice is stored in terms of abstract male and female categories in memory. On the other hand, evidence for an auditory representation of voice would suggest that the representation of voice gender is not abstract, but is based on explicit details about the auditory parameters relevant to voice.

In summary, the perceptual representation of voice gender will be investigated by determining whether voice is categorically perceived and whether an auditory representation or a more abstract representation is appropriate.

I. EXPERIMENT 1

In experiment 1, we were interested in determining whether the perception of a synthetic voice gender continuum is categorical. The term categorical perception, as used here, adheres to the standard definition of Liberman et al. (1957) for speech. The test of categoricalness for a male/female voice continuum provides information about the degree to which auditory information specific to voice gender is retained during perception. If perception is categorical, this would suggest that some detailed information about voice gender is lost during perception. If perception is not categorical, this would suggest that information about specific auditory attributes related to voice gender is retained and subsequently used for discriminating among voice stimuli within a gender category.

In this experiment, identification and ABX discrimination procedures were used to assess categorical perception. The ABX task is more memory intensive than other discrimination tasks (Pisoni and Lazarus, 1974). However, this is precisely why we chose the task. With the ABX task, the likelihood of obtaining a discrimination peak in the boundary region between male and female voice is maximized due to the additional memory load (Macmillan et al., 1977). If a discrimination peak is not observed with the ABX procedure, we can be confident that the discrimination results reflect an absence of categorical perception, since discrimination peaks would most likely not be found using other low-uncertainty discrimination procedures (e.g., AX, 4IAX, etc.).

The predictions are as follows. If the perception of voice gender is categorical, a steep "voice boundary" identification function should be observed between the male and female ends of the continuum, along with a discrimination peak in the voice boundary area. In addition, the observed discrimination performance should fit the predicted discrimination data derived from the identification data (Liberman et al., 1957). If perception is not categorical, a gradually sloping identification function with no sharp boundary should be observed, along with the absence of a discrimination peak in the obtained discrimination data. These two alternatives represent the two extremes of categorical versus continuous perception. Any pattern of identification and/or discrimination data falling between these extremes would represent an intermediate finding between categorical and continuous perception.

A. Method

1. Subjects

The subjects were 30 volunteers drawn from introductory and upper-level psychology courses at Wayne State University. Subjects received course credit for their participation. All subjects were native speakers of English who reported no history of a speech or hearing disorder.

2. Stimuli

The stimuli were synthesized speech tokens prepared with an updated version of the Klatt (1980) software synthesis package. The stimuli consisted of 250-ms-duration tokens of the steady-state vowel /i/ ranging perceptually from male to female voice. The vowel /i/ was chosen because the formant values for male and female voice tokens of /i/ do not usually overlap with other vowels. In a pilot experiment, the acoustic factors of fundamental frequency (F0), formant

[Figure 1: continuum stimulus number (1-11) on the X axis; rating scale (1-6) on the left Y axis; percent correct (0-100) on the right Y axis]

FIG. 1. Identification and ABX discrimination data from experiment 1. The identification data are indexed by the rating scale on the left Y axis, and the obtained versus predicted discrimination performance, in terms of percent correct discrimination, is indexed by the percent correct scale on the right Y axis.

values (F1, F2, F3), and glottal function (a combination of AH, OQ, and TL) were manipulated to create a set of stimuli presented to 16 subjects for perceptual ratings of voice quality. In the pilot experiment, low F0 values and low formant values biased listeners toward "male" voice ratings. However, glottal function values had little effect either way on the classification of stimuli as male or female. Two stimuli corresponding to the most highly rated "male voice" token and the most highly rated "female voice" token were chosen. These two tokens differed in F0 and formant values, but their glottal function values were identical and corresponded to the least breathy glottal function. The full synthesis values for these two tokens are listed in Appendix A. These tokens were used as the series end points for the synthetic voice continuum used in the experiment. An 11-member synthetic voice continuum varying from male to female voice was generated by incrementing F0 and formant values in linear stepwise fashion between the F0 and formant values of the two end points. Glottal function values remained constant across the series.

3. Procedure

The baseline identification trials were presented first. One block of 110 randomized trials (11 continuum stimuli × 10 repetitions) was presented. Stimuli were presented one at a time for identification. Listeners were instructed to listen to each stimulus and rate it using a six-point scale from "good male voice" to "good female voice." They were told to use 1 or 6 if the voice was a good exemplar of a male or female voice, 2 or 5 if the voice was clearly male or female but did not sound as good as other voices, and 3 or 4 if they were not sure about the voice gender and had to guess.

After a 2-min break, listeners were presented with the two-step ABX discrimination trials. In the two-step ABX paradigm, two stimuli differing by two places along the continuum (i.e., stimuli 1 and 3, 5 and 7, etc.) are presented as the "A" and "B" stimuli in the trial. The "X" stimulus is identical to either A or B, and the listener decides whether X matches A or B. A total of 180 randomized trials were presented, with half the trials in the ABA format and half in the ABB format. The three stimulus events on each trial were separated by a 500-ms ISI. For each of the nine possible AB stimulus pairings, the order of stimuli within the pair was counterbalanced. These arrangements produced a total of 20 discrimination responses per stimulus pair. Subjects were given a 1-min break halfway through the trial block. Stimuli were presented on computer using the ONLINE program (Miller, 1990). Stimuli were sampled at a rate of 10 kHz, low-pass filtered at 4.8 kHz, and presented one at a time over AKG K240DF headphones to subjects at a comfortable listening level.

B. Results and discussion

The identification and discrimination data collapsed across subjects are displayed in Fig. 1. In experiment 1 and in all subsequent experiments described below, results were also analyzed with a sex-of-listener factor. Since no effects of sex of listener were observed in any experiment, all results are reported without this analysis variable.

As shown in the figure, the identification function exhibits a gradual slope from male to female voice categories. There is no sharp discontinuity in the boundary region between the male and female ends of the continuum. For the ABX data, two functions are shown: the obtained ABX data from the experiment and the predicted ABX data based on the identification data. The predicted data were obtained by applying the standard formula of Liberman et al. (1957) to the identification data for each subject to derive predicted discrimination performance. As seen in the figure, the obtained ABX data show high overall discrimination performance with no discrimination peak. This function differs substantially from the near-chance predicted discrimination performance based on the identification data. To verify this difference, a two-way ANOVA with the factors of stimulus pair and data type (observed or predicted) was conducted on the discrimination data. Significant main effects of stimulus pair [F(8,232) = 3.1, p < 0.01] and data type [F(1,29) = 97.4, p < 0.001] were obtained. The interaction between pair and data type was also significant [F(8,232) = 5.7, p < 0.001]. Newman-Keuls probes of the interaction showed that the obtained data differed significantly from the predicted data for each stimulus pair.

The identification and discrimination results indicate that the perception of the synthetic voice gender continuum is not categorical. There was no sharp discontinuity between male and female categories in the identification data, and there was no discrimination peak in the boundary region. In addition, discrimination performance was substantially higher than the performance predicted by applying the Haskins formula (Liberman et al., 1957) to the identification data. The pattern of identification and discrimination performance is typical of many auditory psychophysical functions. The lack of categorical perception indicates that the perception of voice gender differs from the perception of phonetic stimuli in terms of categoricalness. These results suggest that the perception of voice gender is handled by auditory psychophysical processes. In addition, the results suggest that specific and detailed auditory information about voice gender is retained during perception.

II. EXPERIMENT 2

Although the combined identification and discrimination data from experiment 1 suggest an absence of categorical perception, it is possible that the results were affected by another factor: intermediate voice categories could exist between the male and female end points of the synthetic voice continuum. If such voice categories exist, their presence could influence the interpretation of the data obtained from experiment 1.

To assess this possibility, a second identification experiment was conducted. In experiment 2, the same stimulus continuum from experiment 1 was used. However, instead of using a six-point male/female rating scale, subjects were given three response alternatives: male, female, or "other." Subjects were told that the other response should be used if they heard a voice that did not fit into the male or female categories. If the number of other responses to stimuli in the middle of the continuum turns out to be high, this would suggest that another voice was perceived in the continuum. If the number of other responses to these stimuli is low, this would suggest that subjects in experiment 1 essentially perceived the continuum in terms of the two male/female categories.

A. Method

1. Subjects

The subjects were 22 volunteers drawn from introductory and upper-level psychology courses at Wayne State University. Subjects received course credit for their participation. All subjects were native speakers of English who reported no history of a speech or hearing disorder.

2. Stimuli

The stimuli were the same as those used in experiment 1.

3. Procedure

In experiment 2, all aspects of the procedure for presenting the identification trials were identical to experiment 1 except for the response alternatives. Listeners were instructed to listen to each stimulus and respond by pressing a key corresponding to one of three choices: male, female, or other. Listeners were told that they should use the other response alternative whenever they heard a stimulus that they did not distinctly perceive as a male or female voice.

B. Results and discussion

The identification data collapsed across subjects are displayed in Fig. 2. In this figure, the identification data are shown as three separate functions, one for each response alternative. The pattern of data for the male and female responses was approximately the same as observed for experiment 1. The other response was used primarily to label stimuli from the middle of the continuum. However, it was used sparingly: even for the stimulus receiving the highest number of other responses (stimulus 6), other responses reached only 32%. Across the continuum, other responses made up 13.1% of all responses, compared to 46.9% for male responses and 39.9% for female responses. Thus, although there was a tendency for some stimuli to be labeled as an other voice, evidence for the presence of a third voice was weak. Overall, the results from experiment 2 provide little support for the presence of other voices in the stimulus continuum.

In the next two experiments, the nature of the perceptual representation of voice gender was investigated by using selective adaptation techniques. Selective adaptation in speech perception has a long history (Ades, 1976; Samuel, 1986). Although some researchers have criticized speech adaptation experiments (Diehl, 1981; Diehl et al., 1985), others suggest that the technique is useful in examining the nature of speech perception and representation (Samuel, 1986). The results of adaptation experiments have been used to specify details about the perceptual levels of processing used during speech perception (Sawusch, 1986) and the format of LTM representations of speech (Miller et al., 1983; Samuel, 1982). Here, our assumption about selective adaptation is that the repeated presentation of an adapting stimulus somehow affects or alters the perceptual processing or representation of speech related to the adaptor.
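For reference, the discrimination analysis of experiment 1 rests on two pieces of bookkeeping: the two-step ABX trial list and the predicted ABX scores obtained by applying the Haskins formula (Liberman et al., 1957) to identification data. The Python sketch below is illustrative only. It assumes the common two-category, covert-labeling form of the formula, P(correct) = 0.5[1 + (pA - pB)^2], where pA and pB are the probabilities of labeling A and B as "female." How the six-point ratings were reduced to such probabilities is not stated in the text, so `ratings_to_p_female` and its cutoff are hypothetical, and the repetition count of 5 is inferred from 180 trials / (9 pairs × 2 orders × 2 formats).

```python
import random


def two_step_pairs(n_stimuli=11, step=2):
    """Pairs two steps apart on the continuum: (1, 3), (2, 4), ..., (9, 11)."""
    return [(i, i + step) for i in range(1, n_stimuli - step + 1)]


def abx_trials(pairs, reps=5, seed=None):
    """Counterbalanced ABX trial list of (A, B, X) tuples.

    Each pair appears in both A/B orders and in both formats
    (ABA: X matches A; ABB: X matches B). reps=5 is inferred, not stated.
    """
    trials = []
    for lo, hi in pairs:
        for a, b in ((lo, hi), (hi, lo)):   # order of stimuli within the pair
            for x in (a, b):                # ABA format, then ABB format
                trials.extend([(a, b, x)] * reps)
    random.Random(seed).shuffle(trials)     # randomized presentation order
    return trials


def ratings_to_p_female(ratings_per_stimulus, female_min=4):
    """Hypothetical reduction: share of ratings >= female_min for each stimulus."""
    return [sum(r >= female_min for r in rs) / len(rs) for rs in ratings_per_stimulus]


def predicted_abx(p_female, step=2):
    """Predicted proportion correct for each pair (i, i + step).

    Two-category covert-labeling form: P(correct) = 0.5 * (1 + (pA - pB) ** 2).
    """
    return [0.5 * (1.0 + (p_female[i] - p_female[i + step]) ** 2)
            for i in range(len(p_female) - step)]


if __name__ == "__main__":
    pairs = two_step_pairs()
    trials = abx_trials(pairs, seed=1)
    print(len(pairs), "pairs,", len(trials), "trials")
    # Made-up, gradually sloping identification probabilities (not study data).
    p = [0.0, 0.0, 0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 1.0, 1.0]
    for (lo, hi), pc in zip(pairs, predicted_abx(p)):
        print(f"{lo}-{hi}: {100 * pc:.1f}% predicted")
```

With a gradually sloping identification function like the made-up one above, no two-step pair differs in labeling probability by more than 0.3, so every predicted score stays below 55%. This is why obtained ABX performance well above such predictions argues against categorical perception.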






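[Figure 2: continuum stimulus number (1-11) on the X axis; legend: % Male, % Female, % Other]

FIG. 2. Identification data from experiment 2. The data are indexed on the Y axis as percent responses for the three response alternatives (male, female, or other).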
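The synthetic continuum of experiment 1 and the overlap measure used for the natural adaptors of experiment 3 both reduce to simple arithmetic on F0 and formant values. The sketch below takes the synthetic adaptor values in Table I as the two series end points (the text states that the synthetic adaptors were the end-point stimuli). It assumes that "linear stepwise" means equal steps in Hz and treats each parameter as a single steady-state target, ignoring any F0 movement over the token. It then recomputes the Difference and Ratio columns of Table I; expressing the ratio as 100 × ratio is one reading of "converted to a percentage overlap value." The constant and function names are illustrative, not from the original study.

```python
# Synthetic end points and natural adaptors (Hz), taken from Table I.
PARAMS = ("F0", "F1", "F2", "F3")
SYNTH_MALE = {"F0": 136, "F1": 270, "F2": 2290, "F3": 3010}
SYNTH_FEMALE = {"F0": 250, "F1": 310, "F2": 2790, "F3": 3310}
NATURAL_MALE = {"F0": 125, "F1": 259, "F2": 2074, "F3": 2938}
NATURAL_FEMALE = {"F0": 230, "F1": 362, "F2": 2506, "F3": 3595}


def linear_continuum(start, end, n_members=11):
    """Equal Hz steps for every parameter, from start (member 1) to end (member n)."""
    series = []
    for k in range(n_members):
        frac = k / (n_members - 1)
        series.append({p: start[p] + frac * (end[p] - start[p]) for p in PARAMS})
    return series


def difference(natural, synthetic):
    """Raw difference in Hz, natural minus synthetic (Table I 'Difference')."""
    return {p: natural[p] - synthetic[p] for p in PARAMS}


def ratio(natural, synthetic):
    """Natural value divided by synthetic value (Table I 'Ratio')."""
    return {p: natural[p] / synthetic[p] for p in PARAMS}


def percent_overlap(natural, synthetic):
    """Ratio as a percentage; 100% means the adaptor matches the end point."""
    return {p: 100.0 * r for p, r in ratio(natural, synthetic).items()}


if __name__ == "__main__":
    for k, member in enumerate(linear_continuum(SYNTH_MALE, SYNTH_FEMALE), start=1):
        print(k, {p: round(v, 1) for p, v in member.items()})
    for label, nat, syn in (("male", NATURAL_MALE, SYNTH_MALE),
                            ("female", NATURAL_FEMALE, SYNTH_FEMALE)):
        print(label, difference(nat, syn),
              {p: round(r, 2) for p, r in ratio(nat, syn).items()})
```

Under these assumptions, F0 rises in 11.4-Hz steps across the eleven members, while F1, F2, and F3 rise in 4-, 50-, and 30-Hz steps. Driving all four parameters from one interpolation fraction mirrors the description of incrementing F0 and formants together. The glottal-source settings, which were held constant, are not modeled.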

In experiments3 and 4, a syntheticmale/femalevoice Other studieshave shown that, in certain situations,adapta-

continuumwas usedin conjunctionwith a numberof differ- tion can occurwhen there is an auditorymismatchbetween
entvoiceadaptors. The adaptorswerevariedin termsof their theadaptoranda continuum
acousticcompositionin orderto assesswhetheran auditory These latter resultshave been interpretedas supportfor a
representation or a moreabstractrepresentationof voicewas higher level "abstract"or categoricalrepresentation of
appropriate.In experiment3, adaptationwith syntheticend- speech(Samuel,1986, 1988). Taken together,it appears
point adaptorswas comparedto naturalvoice adaptorsthat likely that speechadaptationis relatedto bothauditory-level
differedslightlyfrom the endpointsin overallauditoryover- andhigherlevel processes. In thepresentstudy,the auditory
lap. Overall auditoryoverlapwas definedas a combination overlapof adaptorto continuumis manipulated. The results
of F0 and formantvalue overlap.In experiment4, synthetic canprovideinformationaboutwhetherthe adaptation effects
adaptorswere used that varied substantiallyfrom the end are relatedto an auditory-based perceptualrepresentation or
point in one of two acousticfactors,eitherF0 value or for- a more abstractperceptualrepresentation.
mant values. By comparingthe resultsof experiment3 to In thepresentexperiment, theeffectsof auditoryoverlap
experiment4, two issuescanbe investigated. The first issue were assessed for voice adaptation.The male and female
is the effects of degreeof auditory overlap on adaptation. syntheticend-pointadaptors werecongruent to themaleand
The second issue is the relative contribution of each acoustic femalevoice seriesendpoints(a 100% auditoryoverlapof
factor to voice adaptation.By examiningadaptationeffects adaptorwith endpoint).But thenaturallyproduced maleand
in this way, the effectsof auditoryoverlapon voice adapta- femaleadaptorsdifferedacoustically from the male andfe-
tion can be properly assessed. male seriesendpoints.Table I showsthe F0, F1, F2, and
In experiment3, in two conditions,male and female F3 values for the syntheticand natural adaptors.For the
end-pointvoice stimuli from the continuumwere used as
adaptors.In two other conditions,naturallyproducedmale TABLE I. List of frequencyvaluesfor F0 andformantsfor syntheticand
and female voice stimuli were used as adaptors.The end- naturaladaptorsin experiment3. Differencerefersto theraw differencein
point adaptorswere comparedto the natural adaptorsbe- Hz from the syntheticto thenaturaladaptors;ratiois equalto the valueof
causewe wantedto assessthe effectsof auditoryoverlap on the naturalparametersdividedby the value of the syntheticparameter.

voice adaptation.Auditory overlap refers to the degreeof Syntheticadaptors Naturaladaptors Difference Ratio
acousticoverlap of an adaptingstimulusto the continuum
Male F0 136 Hz 125 Hz -11 Hz 0.92
end-pointstimulusin the continuumbeingtested.The role of
F 1 270 Hz 259 Hz - 11 Hz 0.96
auditoryoverlapon selectiveadaptationin speechperception F2 2290 Hz 2074 Hz - 216 Hz 0.91
is very important.Some researchin speechadaptationhas F3 3010 Hz 2938 Hz -72 Hz 0.98
indicatedthat adaptationdoes not occur unlessthere is a
Female F0 250 Hz 230 Hz -20 Hz 0.92
substantialauditoryoverlapof the adaptorwith a continuum F1 310 Hz 362 Hz +52 Hz 1.17
endpoint(Ades, 1976). Theseresultshave beeninterpreted F2 2790 Hz 2506 Hz -284 Hz 0.90
as evidencefor the presenceof an auditorylevel of speech F3 3310 Hz 3595 Hz +285 Hz 1.09

natural adaptors, the auditory overlap from adaptor to series end point differed from 100%. The percent overlap for each acoustic parameter was determined by computing a ratio equal to the value of each natural adaptor parameter divided by the value of the corresponding parameter for the appropriate synthetic continuum end point (i.e., male adaptor compared to male end point, female adaptor compared to female end point). This ratio was converted to a percentage overlap value for each separate parameter (see Table I). All parameter values for the natural male adaptor fell below those of the male end point. For the natural female adaptor, the F0 and F2 values fell below the female end point, while the F1 and F3 values fell above it. Thus the auditory overlap of the natural adaptors with the end points, as defined by a combination of F0 and formant values, was close to, but not identical to, that of the synthetic end-point adaptors.

The predictions for experiment 3 are as follows. If the synthetic and natural adaptors have the same effect on adaptation, this would indicate that auditory overlap has little effect. This result would be consistent with the hypothesis that an abstract perceptual representation of voice gender was adapted in the experiment. The other alternative is that the amount of adaptation produced by the synthetic and natural adaptors differs. This result would indicate that auditory overlap does affect adaptation, suggesting that an auditory-based representation of voice was adapted.

A. Method

1. Subjects

The subjects were 56 volunteers who participated for course credit. The subjects were drawn from the same pool as in the previous experiments.

2. Stimuli

The stimulus continuum was identical to that used for experiment 1. In addition, two naturally produced tokens of the vowel /i/ from a male speaker (JM) and a female speaker (KG) were recorded, digitized, and stored on disk. The duration of the natural male /i/ was 240 ms and the duration of the natural female /i/ was 250 ms. Spectrograms of the natural and synthetic adaptors are shown in Fig. 3.

3. Procedure

There were four separate adaptation conditions: male synthetic adaptor, female synthetic adaptor, male natural adaptor, and female natural adaptor. … were run in each of the synthetic adaptor conditions and 13 subjects in each of the natural adaptor conditions.

Each listener received stimuli in a baseline condition and in an adaptation condition. The baseline condition was identical to that of the previous experiments. In the adaptation condition, the same stimuli were presented for identification as in the baseline condition. However, these stimuli alternated with sequences of adaptor repetitions. Each adaptor sequence consisted of 50 repetitions of the adaptor, with successive repetitions separated by a 1000-ms ISI. There were ten alternating sequences of trials, with 11 randomized continuum stimuli presented for identification after each adaptor sequence.

For the synthetic male and synthetic female conditions, the adaptors consisted of the male and female end-point stimuli from the continuum, respectively. For the male natural and female natural conditions, the adaptors were naturally produced /i/ vowel tokens spoken by a male and a female speaker, respectively.

Each subject received the baseline block of trials first, followed by the block of adaptation trials. Each block consisted of 110 trials. Listeners used the same six-point rating scale as before.

B. Results and discussion

The results collapsed across subjects for the four adaptation conditions are displayed in Fig. 4. The figure shows the identification functions before and after adaptation for each of the four adaptor conditions. A visual inspection of the data indicates that the identification function is shifted toward the male end point for the male adaptor conditions and toward the female end point for the female adaptor conditions.

Two three-way ANOVAs with the factors of token (stimulus number in the continuum), condition (baseline or adaptation), and adaptor (synthetic or natural) were run, one on the combined data from the two male adaptor conditions and one on the combined data from the two female adaptor conditions. This analysis was performed to compare the magnitude of adaptation produced by synthetic versus natural adaptors. Four separate two-way ANOVAs with the factors of token and condition were conducted as followups, one for each of the four adaptation conditions, to assess further the effects of each adaptor.

For the male adaptation data, the three-way analysis on the combined data revealed significant main effects of token [F(10,260) = 397.1, p < 0.001], condition [F(1,26) = 19.5, p < 0.001], and adaptor [F(1,26) = 6.8, p < 0.02]. Significant interactions of token with adaptor [F(10,260) = 2.8, p < 0.01] and of token with condition [F(10,260) = 4.8, p < 0.001] were observed, but there was no interaction of condition with adaptor (F = 2.0, p < 0.17) and no three-way interaction (F = 1.6, p < 0.12). The followup analysis of the synthetic male adaptation condition revealed significant main effects of token [F(10,140) = 188.7, p < 0.001] and condition [F(1,14) = 20.5, p < 0.001]. The interaction of token with condition was also significant [F(10,140) = 5.3, p < 0.001]. Newman-Keuls post hoc tests of the interaction showed that differences between baseline and adaptation were reliable for stimuli 1-8 on the continuum.

The followup analysis of the natural male adaptation condition revealed a significant effect of token [F(10,120) = 212.4, p < 0.001]. The effect of condition (F = 3.7, p < 0.08) and the interaction of token with condition (F = 1.6, p < 0.11) were not significant.

For the female adaptation data, the three-way analysis on the combined data revealed significant main effects of token [F(10,260) = 343.1, p < 0.001], condition [F(1,26) = 6.8, p < 0.02], and adaptor [F(1,26) = 10.5, p < 0.01]. Significant interactions of token with adaptor [F(10,260)


[Figure 3: spectrogram panels for the male and female adaptors; time (sec) on the X axis]

FIG. 3. Spectrograms for the synthetic and natural male and female adaptors. Spectrograms for the synthetic adaptors are shown on top and spectrograms for the natural adaptors are shown at the bottom. For each stimulus, the amplitude display is shown on top, F0 values over time in the middle, and the spectrographic display, with the center formant values for F1, F2, and F3, at the bottom.

= 8.1, p < 0.001] and of token with condition [F(10,260) = 3.5, p < 0.001] were observed, but there was no interaction of condition with adaptor (F = 0.1, p < 0.79) and no three-way interaction (F = 1.8, p < 0.06). The followup analysis of the synthetic female adaptation condition revealed a significant main effect of token [F(10,140) = 104.7, p < 0.001], but not of condition (F = 2.8, p < 0.12). The interaction of token with condition was significant [F(10,140) = 2.2, p < 0.03]. Post hoc tests of the interaction showed differences between baseline and adaptation for stimuli 5 and 8-10.

The followup analysis of the natural female adaptation condition revealed significant main effects of token [F(10,120) = 332.3, p < 0.001] and of condition [F(1,12) = 9.1, p < 0.02]. The interaction of token with condition was significant [F(10,120) = 3.5, p < 0.001]. The post hoc

tests showed differences between baseline and adaptation for stimuli 5-7.¹

[Figure 4 not reproduced: four panels plotted by stimulus (1-11), with legends Baseline vs. Synthetic Male Adapt, Baseline vs. Natural Male Adapt, Baseline vs. Synth Female Adapt, and Baseline vs. Nat Female Adapt.]
FIG. 4. Baseline and adaptation identification data from experiment 3. The synthetic male and natural male data are shown on the top left and right and the synthetic female and natural female data are shown on the bottom left and right.

Overall, the results of experiment 3 indicate that the auditory overlap of the adaptor with the end point had some effect on adaptation. First, the synthetic adaptors produced significant perceptual changes in the predicted direction toward the end points, as shown by the significant token by condition interactions. For the natural adaptors, the male adaptor did not have a significant effect, while the female adaptor produced a change similar to that of the synthetic female adaptor. Although the lack of a condition by adaptor interaction in the overall analyses indicates no differential effects of synthetic versus natural adaptors for male or female adaptation, the individual analyses for each condition show that some differences do exist.

For male voice adaptation, the auditory mismatch between the natural male adaptor and the male end point resulted in an absence of adaptation. These results are intriguing when one considers that all values of the acoustic parameters of the natural male adaptor were lower than the values of the synthetic male adaptor. One speculative explanation is that voice adaptation is tuned to regions of the continuum in a manner similar to that found for phonetic continua (Miller et al., 1983; Samuel, 1982). Under this scenario, even though the natural male adaptor is unambiguously identified as male, its acoustic parameters may have fallen outside the range of values over which adaptation is effective.

The results for the natural female adaptor were different. Although the F0 and formant values for the natural female adaptor did not match the female end point, adaptation was still observed. Overall, when both the male and female adaptor results are considered, there is weak evidence for the hypothesis that auditory overlap has an effect on adaptation of voice. However, the hypothesis that higher levels of voice processing contribute to adaptation cannot be completely ruled out.

IV. EXPERIMENT 4

Experiment 4 provided a further test of the effects of auditory overlap on adaptation. In experiment 3, auditory overlap was defined as changes in F0 and formant values together. However, the adaptation observed could have depended on an auditory match of the adaptor to the continuum end point for the formant values alone, F0 alone, or both. In this experiment, two adaptors were used whose formant values and F0 values were set up in opposition to one another in terms of values appropriate for the male and female series end points. The adaptors in this experiment were designed to test adaptation of the male end of the continuum only. One adaptor possessed formant values identical to the male end point but an F0 value identical to the female end point (the "formants adaptor"). The other adaptor possessed an F0 value identical to the male end point but formant values identical to the female end point (the "F0 adaptor"). Thus the voice adaptors used here were ambiguous in
terms of the major acoustic parameters related to voice gender. However, the perceptual ratings from the pilot experiment described in experiment 1 indicated that there were some differences between the adaptors in terms of voice gender quality. On a five-point rating scale from male to female voice, the F0 adaptor received a rating of 1.8 and the formants adaptor received a rating of 3.3. Thus the F0 adaptor was rated as more male than the formants adaptor.

[Figure 5 not reproduced: panels labeled High F0, Low Fs (formants adaptor) and Low F0, High Fs (F0 adaptor); horizontal axis Time (sec).]
FIG. 5. Spectrograms for the formants and F0 adaptors. Amplitude displays are shown on top for each stimulus, F0 values over time in the middle, and spectrographic displays with the center formant values for F1, F2, and F3 shown at the bottom.

The use of these adaptors also allows an assessment of whether perceptual voice quality is responsible for adaptation. If this factor matters, one would expect a significant amount of adaptation with the F0 adaptor but not with the formants adaptor (although voice information can still be perceived in the absence of F0; see Remez et al., 1987). This result would support the involvement of higher-level abstract voice representations in adaptation. On the other hand, the adaptors selected also allow an assessment of the acoustic factors that are involved. If formant overlap drives adaptation, then adaptation should be observed with the formants adaptor only. This result would be important because one possible explanation for the effects of the synthetic adaptors found in the previous experiment was simple formant overlap. In the synthetic voice series, formant values changed across the continuum, although all stimuli were identified as /i/. Previous work with auditory/phonetic adaptation has shown that formant overlap or formant pattern overlap results in adaptation of phonetic continua (Ades, 1976; Sawusch, 1977). Since the formant values for the synthetic adaptors completely overlapped with the end points, it is possible that adaptation was driven by an auditory/phonetic formant factor unrelated to voice gender. It is important to assess whether the voice adaptation effects found previously were due to this factor.

Finally, if F0 overlap is responsible for adaptation, then adaptation should only be obtained with the F0 adaptor. If neither adaptor has an effect, this would indicate that it is the combination of the formant and F0 acoustic factors that is important in producing adaptation. This result would strengthen the hypothesis that the perceptual representation of voice gender is based on auditory parameters and not on abstract voice representations.

A. Method

1. Subjects

Twenty-eight volunteers from the same pool with the same characteristics as in the previous experiments were used.

2. Stimuli

The stimulus continuum was identical to that of experiment 1. Two adaptors were used that varied in their acoustic and perceptual similarity to the male end-point stimulus. The adaptors were drawn from the pool of stimuli that were rated in experiment 1. Adaptor 1 (the formants adaptor) had formant values identical to the male end point, but F0 values identical to the female end point. Adaptor 2 (the F0 adaptor) had F0 values identical to the male end point, but formant values identical to the female end point (see Appendix B for synthesis values). Spectrograms of both adaptors are shown in Fig. 5. In the pilot experiment, adaptor 1 received a rating of 3.3 on the five-point rating scale from male to female voice and adaptor 2 received a rating of 1.8.
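
The experiment 4 design can be made concrete with a small sketch. Assuming only the appendix values for f0 (the steady-state value of each varied f0 track) and the fixed formant frequencies F1-F5, the Python below rebuilds both adaptors by crossing the two end points. It then lists which of those parameters each adaptor shares exactly with the male end point. The dictionary layout and the helper names `cross_adaptor` and `shared_parameters` are hypothetical. The sketch ignores the other synthesis parameters and the time-varying tracks, so it illustrates the design rather than the synthesis procedure used in the study.

```python
"""Sketch: experiment 4 adaptors as crossings of the two continuum end points.

Values (Hz) are taken from the appendix synthesis tables; f0 is the
steady-state value of the varied f0 track. All other synthesis
parameters are ignored in this illustration.
"""

MALE_END = {"f0": 136, "F1": 270, "F2": 2290, "F3": 3010, "F4": 3500, "F5": 3700}
FEMALE_END = {"f0": 250, "F1": 310, "F2": 2790, "F3": 3310, "F4": 4100, "F5": 3700}

FORMANTS = ("F1", "F2", "F3", "F4", "F5")


def cross_adaptor(f0_source: dict, formant_source: dict) -> dict:
    """Hypothetical helper: take f0 from one end point and F1-F5 from another."""
    adaptor = {"f0": f0_source["f0"]}
    for name in FORMANTS:
        adaptor[name] = formant_source[name]
    return adaptor


def shared_parameters(adaptor: dict, end_point: dict) -> list:
    """Names of the parameters whose values match the end point exactly."""
    return [name for name, value in adaptor.items() if end_point[name] == value]


# "Formants adaptor": male formants combined with the female f0.
formants_adaptor = cross_adaptor(f0_source=FEMALE_END, formant_source=MALE_END)
# "F0 adaptor": male f0 combined with the female formants.
f0_adaptor = cross_adaptor(f0_source=MALE_END, formant_source=FEMALE_END)

print(shared_parameters(formants_adaptor, MALE_END))  # ['F1', 'F2', 'F3', 'F4', 'F5']
print(shared_parameters(f0_adaptor, MALE_END))        # ['f0', 'F5']
```

F5 appears in both lists because it is 3700 Hz at both end points. Within these parameters, then, the two adaptors differ only in f0 and F1-F4, which is the opposition of F0 and formant values the experiment was built on.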

[Figure 6 not reproduced: two panels plotting RATING against STIMULUS (1-11), with legends Baseline vs. Formants Adapt (top) and Baseline vs. F0 Adapt (bottom).]
FIG. 6. Baseline and adaptation identification data from experiment 4. The formants adaptation data are shown at the top and the F0 adaptation data are shown at the bottom.

3. Procedure

There were two adaptation conditions: formants adaptor and F0 adaptor. Fourteen subjects were run in each condition. All other aspects of the procedure were identical to experiment 3.

B. Results and discussion

The results are shown in Fig. 6. The formants adaptor results are at the top of the figure and the F0 adaptor results are at the bottom.

For the formants adaptor condition, a significant main effect of token was obtained [F(10,130)=145.8, p<0.001]. However, there was no significant effect of condition (F=1.0, p<0.34) and no significant interaction (F=0.7, p<0.69). For the F0 adaptor condition, a significant main effect of token was obtained [F(10,130)=287.8, p<0.001], but there was no effect of condition (F=0.5, p<0.48). However, a significant interaction of token with condition was observed [F(10,130)=6.1, p<0.001]. Post hoc tests of the interaction showed differences between baseline and adaptation for stimuli 3 and 6-9.²

Overall, there was no evidence for adaptation with either adaptor. Although a significant interaction of token with condition was obtained for the F0 adaptor, the identification shift was toward the female end point. If anything, this was an assimilation effect, not adaptation.

There is one possible explanation for the F0 adaptor shift toward the female end point. The formant values for the F0 adaptor were identical to the formant values for the female end point. If adaptation was driven by formant overlap, then the F0 shift could be interpreted as a formant-driven adaptation effect toward the female end of the continuum instead of an assimilation shift. However, this explanation loses strength when one considers that the formants adaptor had no effect. If adaptation were formant driven, both adaptors should have had significant adapting effects in opposite directions. It is unclear why an assimilation effect would occur with the F0 adaptor, but the important point is that the adaptation effect was absent. The absence of adaptation effects for either adaptor is important for two reasons: (1) it shows that simple auditory overlap of one isolated voice parameter is insufficient to produce adaptation; and (2) it shows that the perceptual voice quality of the adaptor has little effect on adaptation. The lack of a voice quality effect is consistent with the idea that voice adaptation is related to adaptation of an auditory-based representation of voice. Furthermore, because neither isolated acoustic factor produced adaptation alone, it appears that the perceptual representation of voice gender is one in which these acoustic factors are integrated in an auditory/spectral representation of voice.

V. EXPERIMENT 5

The final experiment was conducted as a further test of the hypothesis that voice adaptation is related to an auditory-based perceptual representation. In experiment 5, voice adaptors were used that were perceptually ambiguous for each individual subject. Modifying the methods that Sawusch and Pisoni (1976) used for ambiguous adaptors in auditory/phonetic adaptation, the baseline identification performance for each subject was tabulated, and the stimulus from the voice continuum closest to the boundary between male and female voice was selected as the ambiguous adaptor for that particular subject. Then, each subject was given instructions intended to bias him/her in terms of the perceptual voice quality of the ambiguous adaptor. Half of the subjects were told that the adaptor was male and half were told that the adaptor was female. If voice adaptation is related to how subjects classify voice into categories, then a cognitive instructional manipulation to bias the voice gender of the adaptor should result in adaptation toward the female end point for the female instructions group and adaptation toward the male end point for the male instructions group. However, if voice adaptation is related to auditory-based representations of voice, then the instructions should have no effect, and no net difference between baseline and adaptation conditions should be found.

A. Method

1. Subjects

Thirty volunteers from the same pool with the same characteristics as in the previous experiments were used.

2. Stimuli

The stimulus continuum was identical to that of experiment 1. The adaptor for each subject was a stimulus drawn from the continuum.
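
The per-subject adaptor choice can be sketched as follows. The Procedure for this experiment states that each subject's average baseline rating was computed for every stimulus and that the stimulus closest to the rating-scale midpoint (3.5) became that subject's ambiguous adaptor. The Python below is a minimal sketch under stated assumptions: baseline trials arrive as hypothetical (stimulus, rating) pairs, and ties go to the lower stimulus number, since the paper gives no tie rule. The optional `boundary_by_interpolation` helper estimates where the mean ratings cross the midpoint by linear interpolation. This is one common way to locate a category boundary, but it is only an assumption here, because the footnotes report per-subject category boundaries without stating how they were computed.

```python
"""Sketch: choosing each subject's ambiguous adaptor from baseline ratings.

Assumed (hypothetical) data layout: one subject's baseline block as a list
of (stimulus_number, rating) pairs on the male/female rating scale.
"""
from collections import defaultdict

SCALE_MIDPOINT = 3.5  # midpoint of the rating scale given in the Procedure


def mean_ratings(trials):
    """Average rating per continuum stimulus, keyed by stimulus number."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stimulus, rating in trials:
        totals[stimulus] += rating
        counts[stimulus] += 1
    return {s: totals[s] / counts[s] for s in sorted(totals)}


def pick_ambiguous_adaptor(means, midpoint=SCALE_MIDPOINT):
    """Stimulus whose mean rating is closest to the midpoint.

    Ties go to the lower stimulus number (an assumption; the tie rule
    is not stated in the paper).
    """
    return min(means, key=lambda s: (abs(means[s] - midpoint), s))


def boundary_by_interpolation(means, midpoint=SCALE_MIDPOINT):
    """Continuum position where the mean ratings cross the midpoint.

    Linear interpolation between the first adjacent pair of stimuli that
    straddle the midpoint; returns None if the ratings never cross it.
    """
    stimuli = sorted(means)
    for a, b in zip(stimuli, stimuli[1:]):
        ra, rb = means[a], means[b]
        if ra != rb and (ra - midpoint) * (rb - midpoint) <= 0:
            return a + (midpoint - ra) * (b - a) / (rb - ra)
    return None


# Hypothetical baseline data for one subject (not data from the study).
trials = [(1, 1), (1, 2), (2, 3), (2, 3), (3, 5), (3, 6)]
means = mean_ratings(trials)             # {1: 1.5, 2: 3.0, 3: 5.5}
print(pick_ambiguous_adaptor(means))     # 2
print(boundary_by_interpolation(means))  # approximately 2.2
```

Because the selection uses only each subject's own baseline means, two subjects whose boundaries differ can receive different adaptors. This matches the per-subject selection the experiment describes.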

[Figure 7 not reproduced: two panels plotting RATING against STIMULUS (1-11), with legends Identification vs. Male Instructions (top) and Baseline vs. Female Instructions (bottom).]
FIG. 7. Baseline and adaptation identification data from experiment 5. The male instructions adaptation data are shown at the top and the female instructions adaptation data are shown at the bottom.

3. Procedure

There were two adaptation conditions: male instructions adaptor and female instructions adaptor. Fifteen subjects were run in each condition. The adaptor for each subject was selected by examining the baseline trials data for that individual subject. The average rating was calculated for each stimulus, and the stimulus with the rated value closest to the midpoint value of the male/female rating scale (3.5) was selected as the ambiguous adaptor. Thus, for each individual subject, the adaptor selected was based only on that subject's baseline identification performance. Subjects in the male instructions group were told that they would hear a "male voice" adaptor, and subjects in the female instructions group were told that they would hear a "female voice" adaptor in the adaptation block. The instructions read by subjects for the adaptation conditions were identical to the instructions used in experiment 4 for the male and female synthetic adaptor conditions. In addition, the experimenters were instructed not to say anything about the ambiguous nature of the adaptor. All other aspects of the procedure were identical to experiment 3.

B. Results and discussion

The results are shown in Fig. 7. The male instructions group results are at the top of the figure and the female instructions group results are at the bottom.

For the male instructions group, a significant main effect of token was obtained [F(10,140)=350.8, p<0.001]. However, there was no significant effect of condition (F=2.3, p<0.15) and no significant interaction of token with condition (F=1.4, p<0.19). For the female instructions group, significant main effects of token [F(10,140)=360.8, p<0.001] and condition [F(1,14)=21.3, p<0.001] were obtained. The interaction of token with condition was also significant [F(10,140)=3.1, p<0.001]. Post hoc tests of the interaction showed reliable differences between baseline and adaptation conditions only for stimuli 4-7, 9, and 11.³

The results for the male instructions group were consistent with an auditory-based explanation of voice adaptation. However, the results for the female instructions group were unexpected. When subjects were told that the adaptor was female, they heard the stimuli across the continuum as more female. Why would male instructions produce no effect but female instructions produce assimilation? One speculative answer is that subjects who received female instructions adjusted their overall criteria to favor female responses to the other stimuli in the series. However, this explanation is ad hoc. It is interesting to note that, in experiment 4, assimilation toward the female end point was also found, but there it was produced by adaptation with the F0 adaptor. Although there may be some common mechanism producing this effect in both experiments, the nature of this mechanism remains unclear.

VI. CONCLUSIONS

Overall, the findings from the series of experiments presented here converge to form a preliminary picture of the perceptual representations of voice gender. First, the results from experiment 1 using identification and discrimination procedures indicate that the perception of a synthetic voice gender continuum is not categorical. The absence of categorical perception suggests that the perception of voice gender information in the speech signal is accomplished through auditory psychophysical processes. Furthermore, the high overall discrimination performance indicates that specific auditory information about voice is retained and, at the very least, is available to discrimination processes. Since the perceptual processes that extract voice information from the signal rely on contact with stored representations of speech in LTM, it would appear that voice gender is not represented in memory in terms of abstract representations that contain "reduced" or canonical representations of voice.

Second, the results from the selective adaptation experiments (experiments 3-5) are favorable to the hypothesis that the perceptual representations of voice gender are auditory based. There was some indication in experiment 3 that the auditory overlap of the adaptor with the continuum end point affected adaptation. Interpreting this result within the context of other speech adaptation studies where auditory [...] for adaptation ([...] Sawusch and Jusczyk, 1981), it appears that the perceptual representation of voice gender that was adapted was an auditory-based representation. In experiment 4, further support was given to this idea, as adaptors that shared one acoustic parameter (either formant values or F0 value) with a voice end point failed to produce an adapting effect, again
suggesting that auditory overlap is important. Finally, in experiment 5, there was no clear-cut evidence in support of a higher level cognitive factor in voice adaptation and, by extension, no evidence for an abstract representation of voice gender.

When the findings from all experiments are considered together, some differences and similarities between phoneme representations and voice gender representations can be discussed. The most important finding is that voice gender is not stored in abstract male and female voice representations. Instead, voice gender appears to be stored in the form of auditory-based perceptual representations. These representations, in all probability, contain specific auditory information about acoustic voice parameters relevant to gender. The results of experiment 4 suggest that these representations are not based on one isolated parameter like F0 or formant frequencies. Instead, the representations are probably an auditory composite of the various acoustic factors relevant to voice gender, such as F0, formant frequencies, breathiness, etc. Although there is a close relationship between phonetic coding and voice coding processes during perception, the representations of phonemes and voices appear to be qualitatively different in that phonetic representations may not be as detailed.

The adaptation effects observed also contrast with studies examining vowel adaptation (Godfrey, 1980; Morse et al., 1976). Vowel adaptation, as assessed in these studies, can occur in some circumstances when the vowel adaptor and vowel end point are spectrally dissimilar. Results of this type can be explained by positing the involvement of either higher level auditory patterns or abstract phonetic representations. However, the lack of adaptation with spectrally dissimilar voice adaptors observed in the present study suggests that voice is tied to lower level auditory representations.

The present results also suggest future directions to pursue concerning the acoustic factors related to voice gender perception (Coleman, 1976; Lass et al., 1976; Murry and Singh, 1980; Singh and Murry, 1978). Instead of focusing on separate individual acoustic factors and how they contribute to gender perception, the results of experiments 3 and 4 suggest that perhaps voice gender should be studied in terms of integrated auditory representations. In addition, the present findings suggest that infants' classification of voice into male and female categories (Miller, 1983; Miller et al., 1982) may be based on heuristics that utilize specific and detailed auditory voice information. Further studies of the type performed in the present study with male-only voices and female-only voices may help to elucidate some of these issues.

Finally, one additional issue should be mentioned concerning the present results. All of the experiments reported here used synthetic speech tokens. One possible criticism of this study is that the results found with synthetic speech may not generalize to natural speech. Synthetic voices are reduced stimuli that do not contain the full complement of acoustic information contained in natural voices. There is much evidence indicating that perception and memory for synthetic stimuli differ from perception and memory for natural stimuli (Ralston et al., 1995). Our reply to this potential criticism is that our findings represent a first step toward defining the nature of the perceptual representation of voice gender, and that much of the knowledge we have about acoustic-phonetic perception is based on work with synthetic speech. In addition, synthetic voices have been used by others to determine how listeners judge the perceptual quality of voice (Gerratt et al., 1993). However, we acknowledge that, in order to provide a more definitive examination of voice representation, future studies should compare synthetic speech to natural speech in addressing related issues.

In conclusion, the hypothesis that voice gender is stored in abstract representations in memory received little support. However, the present investigation focused only on a few preliminary aspects of this issue. Future research needs to examine in further detail the prototype hypothesis of storing voices in memory (Papcun et al., 1989) and other details about voice representation not specifically related to voice gender.

ACKNOWLEDGMENTS

This research was supported by NIDCD Grant No. R01-DC01667 to one of the authors (J. W. M.). We wish to thank Georgianne Baartmans, Renee Dudzinski, Kathy Gorday, and Margaret Webb for running subjects and Lin Zong for programming assistance. Thanks also to Jim Ralston, Jim Sawusch, and one anonymous reviewer for useful comments and critiques.

Male end-point synthesis values

SYM  V/C    VAL    SYM  V/C    VAL
sr   C    10000    nf   C        4
du   C      250    ss   C        2
ui   C        5    rs   C        1
f0   V      136    av   V       60
F1   v      270    b1   v       60
F2   v     2290    b2   v       90
F3   v     3010    b3   v      150
F4   v     3500    b4   v      200
F5   v     3700    b5   v      200
f6   v     4990    b6   v      500
fz   v      280    bz   v       90
fp   v      280    bp   v       90
ah   V       35    oq   V       75
at   v        0    tl   V       20
af   v        0    sk   v        0
a1   v        0    p1   v       80
a2   v        0    p2   v      200
a3   v        0    p3   v      350
a4   v        0    p4   v      500
a5   v        0    p5   v      600
a6   v        0    p6   v      800
an   v        0    ab   v        0
ap   v        0    os   C        0
g0   v       64    dF   v        0
db   v        0

Varied parameters

Time (ms)   f0   av   ah   oq   tl
        0  136    0   20   25    0
        5  136   54   20   28    0
       10  136   60   20   31    0
       15  136   60   20   34    0
       20  136   60   20   37    0
       25  136   60   20   40    0
       30  136   60   20   43    0
       35  136   60   20   46    0
       40  136   60   20   50    0
       45  136   60   20   50    0
       50  136   60   20   50    0
       55  136   60   20   50    0
       60  136   60   20   50    0
       65  136   60   20   50    0
       70  136   60   20   50    0
       75  136   60   20   50    0
       80  136   60   20   50    0
       85  136   60   20   50    0
       90  136   60   20   50    0
       95  136   60   20   50    0
      100  136   60   20   50    0
      105  136   60   20   50    0
      110  136   60   20   50    0
      115  136   60   20   50    0
      120  136   60   20   50    0
      125  136   60   20   50    0
      130  135   60   20   50    0
      135  134   60   20   50    0
      140  133   60   20   50    0
      145  132   60   20   50    0
      150  131   60   20   50    0
      155  130   60   20   50    0
      160  129   60   20   50    0
      165  128   60   20   50    0
      170  127   60   20   50    0
      175  126   60   20   50    0
      180  125   60   20   50    0
      185  124   58   21   50    1
      190  123   57   22   51    3
      195  122   56   23   52    4
      200  121   55   24   53    6
      205  120   53   25   53    7
      210  119   52   26   54    9
      215  118   51   28   55   10
      220  118   50   29   56   12
      225  117   48   30   56   13
      230  116   47   31   57   15
      235  115   46   32   58   16
      240  114   45   33   59   18
      245  113    0   35   60   20

Female end-point synthesis values

SYM  V/C    VAL    SYM  V/C    VAL
sr   C    10000    nf   C        4
du   C      250    ss   C        2
ui   C        5    rs   C        1
f0   V      250    av   V       60
F1   v      310    b1   v       60
F2   v     2790    b2   v       90
F3   v     3310    b3   v      150
F4   v     4100    b4   v      200
F5   v     3700    b5   v      200
f6   v     4990    b6   v      500
fz   v      280    bz   v       90
fp   v      280    bp   v       90
ah   V       35    oq   V       75
at   v        0    tl   V       20
af   v        0    sk   v        0
a1   v        0    p1   v       80
a2   v        0    p2   v      200
a3   v        0    p3   v      350
a4   v        0    p4   v      500
a5   v        0    p5   v      600
a6   v        0    p6   v      800
an   v        0    ab   v        0
ap   v        0    os   C        0
g0   v       64    dF   v        0
db   v        0

Varied parameters

Time (ms)   f0   av   ah   oq   tl
        0  250    0   40   45   10
        5  250   54   40   48   10
       10  250   60   40   51   10
       15  250   60   40   54   10
       20  250   60   40   57   10
       25  250   60   40   60   10
       30  250   60   40   63   10
       35  250   60   40   66   10
       40  250   60   40   70   10
       45  250   60   40   70   10
       50  250   60   40   70   10
       55  250   60   40   70   10
       60  250   60   40   70   10
       65  250   60   40   70   10
       70  250   60   40   70   10
       75  250   60   40   70   10
       80  250   60   40   70   10
       85  250   60   40   70   10
       90  250   60   40   70   10
       95  250   60   40   70   10
      100  250   60   40   70   10
      105  250   60   40   70   10
      110  250   60   40   70   10
      115  250   60   40   70   10
      120  250   60   40   70   10
      125  250   60   40   70   10
      130  248   60   40   70   10
      135  246   60   40   70   10
      140  244   60   40   70   10
      145  242   60   40   70   10
      150  241   60   40   70   10
      155  239   60   40   70   10
      160  237   60   40   70   10
      165  235   60   40   70   10
      170  234   60   40   70   10
      175  232   60   40   70   10
      180  230   60   40   70   10
      185  229   58   41   70   11
      190  227   57   42   71   13
      195  225   56   43   72   14
      200  224   55   44   73   16
      205  222   53   45   73   17
      210  220   52   46   74   19
      215  219   51   48   75   20
      220  217   50   49   76   22
      225  216   48   50   76   23
      230  214   47   51   77   25
      235  212   46   52   78   26
      240  211   45   53   79   28
      245  209    0   55   80   30

"Formants adaptor" synthesis values

SYM  V/C    VAL    SYM  V/C    VAL
sr   C    10000    nf   C        4
du   C      250    ss   C        2
ui   C        5    rs   C        1
f0   V      250    av   V       60
F1   v      270    b1   v       60
F2   v     2290    b2   v       90
F3   v     3010    b3   v      150
F4   v     3500    b4   v      200
F5   v     3700    b5   v      200
f6   v     4990    b6   v      500
fz   v      280    bz   v       90
fp   v      280    bp   v       90
ah   V       35    oq   V       75
at   v        0    tl   V       20
af   v        0    sk   v        0
a1   v        0    p1   v       80
a2   v        0    p2   v      200
a3   v        0    p3   v      350
a4   v        0    p4   v      500
a5   v        0    p5   v      600
a6   v        0    p6   v      800
an   v        0    ab   v        0
ap   v        0    os   C        0
g0   v       64    dF   v        0
db   v        0

Varied parameters

Time (ms)   f0   av   ah   oq   tl
        0  250    0   20   25    0
        5  250   54   20   28    0
       10  250   60   20   31    0
       15  250   60   20   34    0
       20  250   60   20   37    0
       25  250   60   20   40    0
       30  250   60   20   43    0
       35  250   60   20   46    0
       40  250   60   20   50    0
       45  250   60   20   50    0
       50  250   60   20   50    0
       55  250   60   20   50    0
       60  250   60   20   50    0
       65  250   60   20   50    0
       70  250   60   20   50    0
       75  250   60   20   50    0
       80  250   60   20   50    0
       85  250   60   20   50    0
       90  250   60   20   50    0
       95  250   60   20   50    0
      100  250   60   20   50    0
      105  250   60   20   50    0
      110  250   60   20   50    0
      115  250   60   20   50    0
      120  250   60   20   50    0
      125  250   60   20   50    0
      130  248   60   20   50    0
      135  246   60   20   50    0
      140  244   60   20   50    0
      145  242   60   20   50    0
      150  241   60   20   50    0
      155  239   60   20   50    0
      160  237   60   20   50    0
      165  235   60   20   50    0
      170  234   60   20   50    0
      175  232   60   20   50    0
      180  230   60   20   50    0
      185  229   58   21   50    1
      190  227   57   22   51    3
      195  225   56   23   52    4
      200  224   55   24   53    6
      205  222   53   25   53    7
      210  220   52   26   54    9
      215  219   51   28   55   10
      220  217   50   29   56   12
      225  216   48   30   56   13
      230  214   47   31   57   15
      235  212   46   32   58   16
      240  211   45   33   59   18
      245  209    0   35   60   20

"F0 adaptor" synthesis values

SYM  V/C    VAL    SYM  V/C    VAL
sr   C    10000    nf   C        4
du   C      250    ss   C        2
ui   C        5    rs   C        1
f0   V      136    av   V       60
F1   v      310    b1   v       60
F2   v     2790    b2   v       90
F3   v     3310    b3   v      150
F4   v     4100    b4   v      200
F5   v     3700    b5   v      200
f6   v     4990    b6   v      500
fz   v      280    bz   v       90
fp   v      280    bp   v       90
ah   V       35    oq   V       75
at   v        0    tl   V       20
af   v        0    sk   v        0
a1   v        0    p1   v       80
a2   v        0    p2   v      200
a3   v        0    p3   v      350
a4   v        0    p4   v      500
a5   v        0    p5   v      600
a6   v        0    p6   v      800
an   v        0    ab   v        0
ap   v        0    os   C        0
g0   v       64    dF   v        0
db   v        0

Varied parameters

Time (ms)   f0   av   ah   oq   tl
        0  136    0   20   25    0
        5  136   54   20   28    0
       10  136   60   20   31    0
       15  136   60   20   34    0
       20  136   60   20   37    0
       25  136   60   20   40    0
       30  136   60   20   43    0
       35  136   60   20   46    0
       40  136   60   20   50    0
       45  136   60   20   50    0
       50  136   60   20   50    0
       55  136   60   20   50    0
       60  136   60   20   50    0
       65  136   60   20   50    0
       70  136   60   20   50    0
       75  136   60   20   50    0
       80  136   60   20   50    0
       85  136   60   20   50    0
       90  136   60   20   50    0
       95  136   60   20   50    0
      100  136   60   20   50    0
      105  136   60   20   50    0
      110  136   60   20   50    0
      115  136   60   20   50    0
      120  136   60   20   50    0
      125  136   60   20   50    0
      130  135   60   20   50    0
      135  134   60   20   50    0
      140  133   60   20   50    0
      145  132   60   20   50    0
      150  131   60   20   50    0
      155  130   60   20   50    0
      160  129   60   20   50    0
      165  128   60   20   50    0
      170  127   60   20   50    0
      175  126   60   20   50    0
      180  125   60   20   50    0
      185  124   58   21   50    1
      190  123   57   22   51    3
      195  122   56   23   52    4
      200  121   55   24   53    6
      205  120   53   25   53    7
      210  119   52   26   54    9
      215  118   51   28   55   10
      220  118   50   29   56   12
      225  117   48   30   56   13
      230  116   47   31   57   15
      235  115   46   32   58   16
      240  114   45   33   59   18
      245  113    0   35   60   20

¹For all four adaptation [...] category boundary data. Category boundaries were computed for baseline and adaptation conditions for each subject and then analyzed in a one-way ANOVA. The results using this method mirrored the results reported for the analyses of raw identification data in terms of the overall effect of condition: [F(1,14)=6.8, p<0.03] for synthetic male, (F=0.8, n.s.) for natural male, (F=0.4, n.s.) for synthetic female, and [F(1,12)=10.7, p<0.01] for natural female. Thus the effects of adaptation as measured through category boundary movement analyses were approximately the same.

²[...] boundary data. The analyses showed that the effect of condition, using category boundary data, was similar to the analyses based on raw identification data for the formants adaptor (F=1.3, n.s.) and for the F0 adaptor (F=3.2, [...]

³Results [...] boundary data. The results showed that the effect of condition, using category boundary data, was similar to the analyses based on raw identification data for the male instructions condition (F=2.5, n.s.) and the female instructions condition [F(1,14)=9.0, p<0.01].

Ades, A. E. (1976). "Adapting the property detectors for speech perception," in New Approaches to Language Mechanisms, edited by R. J. Wales and E. Walker (North-Holland, Amsterdam), pp. 55-107.
Assmann, P. F., Nearey, T. M., and Hogan, J. T. (1982). "Vowel identification: Orthographic, perceptual, and acoustic aspects," J. Acoust. Soc. Am. 71, 975-989.
Coleman, R. O. (1976). "A comparison of the contributions of two voice quality characteristics to the perceptions of maleness and femaleness in the voice," J. Speech Hear. Res. 19, 168-180.
Creelman, C. D. (1957). "Case of the unknown talker," J. Acoust. Soc. Am. 29, 655.
Diehl, R. L. (1981). "Feature detectors for speech: A critical reappraisal," Psychol. Bull. 89, 1-18.
Diehl, R. L., Kluender, K., and Parker, E. M. (1985). "Are selective adaptation and contrast effects really distinct?," J. Exp. Psychol.: Hum. Percept. Perform. 11, 209-220.
Fourcin, A. J. (1968). "Speech-source interference," IEEE Trans. Audio Electroacoust. AU-16, 65-67.
Garner, W. (1974). The Processing of Information and Structure (Erlbaum, [...]).
Gerratt, B. R., Kreiman, J., Antonanzas-Barroso, N., and Berke, G. S. (1993). "Comparing internal and external standards in voice quality judgements," J. Speech Hear. Res. 36, 14-20.
Godfrey, J. J. (1980). "Comparison of consonantal and vocalic cues in selective adaptation," Percept. Psychophys. 28, 103-111.
Harnad, S. (Ed.) (1987). Categorical Perception (Cambridge U. P., Cambridge, England).
Johnson, K. (1990a). "The role of perceived speaker identity in F0 normalization of vowels," J. Acoust. Soc. Am. 88, 642-654.
Johnson, K. (1990b). "Contrast and normalization in vowel perception," J. Phon. 18, 229-254.
Klatt, D. H. (1980). "Software for a cascade/parallel formant synthesizer," J. Acoust. Soc. Am. 67, 971-995.
Klatt, D. H. (1989). "Review of selected models of speech perception," in Lexical Representation and Process, edited by W. D. Marslen-Wilson (MIT, Cambridge, MA), pp. 169-226.
Klatt, D. H., and Klatt, L. C. (1990). "Analysis, synthesis, and perception of voice quality variations among female and male talkers," J. Acoust. Soc. Am. 87, 820-857.
Kreiman, J., and Papcun, G. (1991). "Comparing discrimination and recognition of unfamiliar voices," Speech Commun. 10, 265-275.
Kuhl, P. K. (1991). "Human adults and human infants show a perceptual magnet effect for the prototypes of speech categories, monkeys do not," Percept. Psychophys. 50, 93-107.
Ladefoged, P., and Broadbent, D. (1957). "Information conveyed by vowels," J. Acoust. Soc. Am. 29, 98-104.
Lass, N. J., Hughes, K. R., Bowyer, M. D., Waters, L. T., and Bourne, V. T. (1976). "Speaker sex identification from voiced, whispered, and filtered isolated vowels," J. Acoust. Soc. Am. 59, 675-678.
Liberman, A. M., Harris, K. S., Hoffman, H. S., and Griffith, B. C. (1957). "The discrimination of speech sounds within and across phoneme boundaries," J. Exp. Psychol. 54, 358-368.
Macmillan, N. A., Kaplan, H. L., and Creelman, C. D. (1977). "The psychophysics of categorical perception," Psychol. Rev. 84, 452-471.
Miller, C. L. (1983). "Developmental changes in male/female classification by infants," Infant Behav. Dev. 6, 313-330.
Miller, C. L., Younger, B. A., and Morse, P. A. (1982). "The categorization of male and female voices in infancy," Infant Behav. Dev. 5, 143-159.
Miller, C. M. (1990). ONLINE: A general-purpose program for the generation and control of laboratory experiments and perceptual tests with acoustic stimuli, for use with microcomputers using MS-DOS [computer program], Laboratory Microsystems, Baton Rouge, LA.
Miller, J. L., Connine, C. M., Schermer, T. M., and Kluender, K. R. (1983). "A possible auditory basis for internal structure of phonetic categories," [...] Percept. Psychophys. 46, 505-512.
Morse, P. A., Kass, J. E., and Turkienicz, R. (1976). "Selective adaptation of vowels," Percept. Psychophys. 19, 137-143.
Mullennix, J. W., and Pisoni, D. B. (1990). "Stimulus variability and processing dependencies in speech perception," Percept. Psychophys. 47, [...].
Mullennix, J. W., Pisoni, D. B., and Martin, C. S. (1989). "Some effects of talker variability on spoken word recognition," J. Acoust. Soc. Am. 85, [...].
Murry, T., and Singh, S. (1980). "Multidimensional analysis of male and [...]
Samuel, A. G. (1982). "Phonetic prototypes," Percept. Psychophys. [...], 307-314.
Samuel, A. G. (1986). "Red herring detectors and speech perception: In defense of selective adaptation," Cognitive Psychol. 18, 452-499.
Samuel, A. G. (1988). "Central and peripheral representation of whispered and voiced speech," J. Exp. Psychol.: Hum. Percept. Perform. 14, 379-388.
Sawusch, J. R. (1977). "Peripheral and central processes in selective adaptation of place of articulation in stop consonants," J. Acoust. Soc. Am. 62, [...].
Sawusch, J. R. (1986). "Auditory and phonetic coding of speech," in Pattern Recognition by Humans and Machines, edited by E. C. Schwab and H. C. Nusbaum (Academic, Orlando, FL), Vol. 1, pp. 51-88.
Sawusch, J. R., and Jusczyk, P. W. (1981). "Adaptation and contrast in the perception of voicing," J. Exp. Psychol.: Hum. Percept. Perform. 11, 242-250.
Sawusch, J. R., and Pisoni, D. B. (1976). "Response organization in selective adaptation to speech sounds," Percept. Psychophys. 20, 413-418.
Singh, S., and Murry, T. (1978). "Multidimensional classification [...]
of normal
female voices," J. Acoust. Soc. Am. 68, 1294-1300.
voice qualities,"J. Acoust.Soc.Am. 64, 81-87.
Nygaard,L. C., Sommers,M. S., and Pisoni,D. B. (1992). "Effectsof
stimulusvariability on the representationof spokenwords in memory," Sommers, M. S., Nygaard,L. C., andPisoni,D. B. (1992). "Stimulusvari-
Researchon SpokenLanguageProcessingPR-18 (Indiana University, ability and spokenword recognition:effectsof variabilityin speakingrate
Bloomington, IN), pp. 163-184. and overall amplitude," in Research on Spoken Language Processing
Palmeri,T. J., Goldinger,S. D., andPisoni,D. B. (1993). "Episodicencod- PR-18 (IndianaUniversity,Bloomington,IN), pp. 31-52.
ing of voice attributesand recognitionmemoryfor spokenwords,"J. Exp. Van Lancker, D., Kreiman, J., and Wickens, T (1985b). "Familiar voice
Psychol.Learn. Memory Cognition19, 309-328. recognition:patternsand parameters.Part II: Recognitionof rate-altered
Papcun,G., Kreiman,J., and Davis,A. (1989). "Long-termmemoryfor voice," J. Phon. 13, 39-52.
unfamiliar voices," J. Acoust. Soc. Am. 85, 913-925. Van Lancker,D., Kreiman,J., and Emmorey,K. (1985a). "Familiar voice
Pisoni,D. B., and Lazarus,J. H. (1974). "Categoricaland noncategorical recognition:patternsand parameters.Part I: Recognitionof backward
modesof speechperceptionalongthe voicingcontinuum,"J. Acoust.Soc. voices," J. Phon. 13, 19-38.
Am. 55, 328-333.
R. R., Strange,
W., Shankweiler,
D. P.,andEdman,T. R. (1976).
Ralston,J. V., Pisoni,D. B., and Mullennix,J. W. (1995). "Perceptionand
"What informationenablesa listenerto map a talker's vowel space?,"J.
comprehension of speech,"in Applied SpeechTechnology,edited by A.
Acoust. Soc. Am. 60, 198-212.
Syrdal,R. Bennett,and S. Greenspan (CRC, BocaRaton,FL), pp. 233-
287. Weenink,D. J. M. (1986). "The identificationof vowel stimuli from men,
Remez,R., Rubin, P., Nygaard,L., and Howell, W. (1987). "Perceptual women, and children," in Proceedings10 from the Institute of Phonetic
normalizationof vowelsproducedby sinusoidalvoices,"J. Exp. Psychol. Sciencesof the Universityof Amsterdam(Universityof Amsterdam,Am-
Hum. Percept.Performance13, 40-61. sterdam,The Netherlands),pp. 41-54.

3095 J. Acoust. Soc. Am., Vol. 98, No. 6, December 1995 Mullennix et aL: Voice gender 3095

