
Transcribing in IPA

The International Phonetic Alphabet

Note that we have been talking about phones as if it were obvious what they are, but this
is not always the case. It is sometimes easy to find a clear separation between the phones
in a given word, that is, to segment the word into its component phones, but sometimes, it
can be very difficult. We can see this difference by looking at waveforms, which are
special pictures that graphically represent the air vibrations of sound waves. The two
waveforms in Figure 3.19 show a notable difference in how easy it is to segment the
English words nab and wool.

Figure 3.19. Waveforms for the English words nab and wool.

The waveform for nab contains abrupt transitions between three very different regions,
corresponding to three phones. In comparison, the waveform for wool has smooth
transitions from beginning to end, with no obvious divisions between phones. For the
purposes of this textbook, all words will be segmented for you, but it is important to
remember that when working with raw data from a spoken language, it may not be so
clear where the boundaries are between phones.

When we can identify the individual phones in a word, we want to have a suitable way to
notate them that can be easily and consistently understood, so that the relevant
information about the pronunciation can be conveyed in an unambiguous way to other
linguists. Such notation is called a transcription, which may be very broad (giving only
the minimal information needed to contrast one word with another), or it may be very
narrow (giving a large amount of fine-grained phonetic detail), or somewhere in between.
Whether broad, narrow, or in between, phonetic transcription is conventionally given in
square brackets [ ], so that, for example, the consonant at the beginning of the English
word nab could be transcribed as [n], with the understanding that the symbol [n] is
intended to represent a voiced alveolar nasal stop.

As linguists, we are interested in studying and describing as many languages as we can,
so for spoken languages, we ideally want to use a transcription system that can be used
for all possible phones in any spoken language. This means we cannot simply use one
existing language’s writing system, because it would be optimized for representing the
phones of that language and would not have easy ways to represent phones from other
languages.

In addition, many writing systems are filled with inconsistencies and irregularities that
make them unsuitable for any kind of rigorous and unambiguous transcription, even for
their associated spoken language. For example, the letter <a> in the English writing
system is used to represent different phones, such as the low front unrounded vowel in
nab, the low back unrounded vowel in father, the mid front tense unrounded vowel in halo,
and the mid central unrounded vowel in diva. Conversely, the high front unrounded tense
vowel in English can be represented by different letters and letter combinations: <i> in
diva, <ee> in meet, <ea> in meat, <e> in me, and <y> in mummy. That is, the English
writing system does not have a one-to-one relationship between phones and letters.
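
The mismatch between letters and phones can be made concrete with a small sketch. It uses only the example words from the paragraph above; the vowel symbols are the broad IPA symbols introduced later in this section, and the names EXAMPLES, group, and is_one_to_one are hypothetical helpers chosen for illustration.

```python
# (word, spelling, vowel) triples taken from the examples in the text.
# The vowel symbols are the broad IPA symbols used later in this section.
EXAMPLES = [
    ("nab", "a", "æ"),
    ("father", "a", "ɑ"),
    ("halo", "a", "e"),
    ("diva", "a", "ə"),
    ("diva", "i", "i"),
    ("meet", "ee", "i"),
    ("meat", "ea", "i"),
    ("me", "e", "i"),
    ("mummy", "y", "i"),
]


def group(pairs):
    """Collect, for each key, the set of values it is paired with."""
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, set()).add(value)
    return grouped


def is_one_to_one(mapping):
    """True only if every key is paired with exactly one value."""
    return all(len(values) == 1 for values in mapping.values())


phones_by_spelling = group((spelling, phone) for _, spelling, phone in EXAMPLES)
spellings_by_phone = group((phone, spelling) for _, spelling, phone in EXAMPLES)

print(sorted(phones_by_spelling["a"]))   # four different vowels for <a>
print(sorted(spellings_by_phone["i"]))   # five different spellings for [i]
print(is_one_to_one(phones_by_spelling), is_one_to_one(spellings_by_phone))
```

Both checks come out false for this sample, which is the sense in which English spelling fails to be one-to-one in either direction.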

Note that symbols from a writing system are represented here with surrounding angle brackets < >.
This is a common notational convention in linguistics that helps visually distinguish symbols in a
writing system from symbols used for the transcription of phones, which are enclosed in square
brackets [ ].

Furthermore, even if English spelling were perfectly regular, many specific English words
have different equally valid pronunciations, such as either, data, and route. But even
words that seem to have only one consistent pronunciation may in fact be pronounced
differently by different speakers in more subtle ways. For example, in Los Angeles and
London, the vowel in the word mop normally has a low back tongue position, with the
London vowel also having some lip rounding that is not used in the Los Angeles
pronunciation. In Chicago, the vowel in mop is articulated more in the centre of the mouth,
making mop sound nearly like map to other speakers. If we tried to describe in writing how
to pronounce a vowel from another language, and we said that it was pronounced the
same as the vowel in the English word mop, we could not guarantee that the reader would
know whether the vowel in question is back and unrounded (as in Los Angeles mop), back
and round (as in London mop), or central and unrounded (as in Chicago mop).

The International Phonetic Alphabet

To avoid these problems, linguists have devised more suitable transcription systems for
spoken languages, each with their own strengths and weaknesses. In this textbook, we
will use a widespread standard transcription system called the International Phonetic
Alphabet (abbreviated IPA). The IPA was created by the International Phonetic
Association (unhelpfully also abbreviated IPA). The IPA organization was founded in
1886, and the first version of their transcription system was published shortly after. Since
then, the IPA transcription system has undergone many revisions as our understanding of
the world’s spoken languages has evolved. The most recent symbol was added to the IPA
in 2005: [ⱱ] for the labiodental tap, a phone found in many languages of central Africa,
such as Mono (a Central Banda language of the Ubangian family, spoken in the
Democratic Republic of the Congo; Olson and Hajek 1999).

For reference, the full chart for the IPA is given in Figure 3.20. This chart is available under
a Creative Commons Attribution-Sharealike 3.0 Unported License, copyright © 2020 by
the International Phonetic Association. It is also available online at the IPA’s homepage,
and there are also some online versions that are accessible for screenreaders, such as
this one created by Weston Ruter.

Figure 3.20. Full chart of all symbols in the International Phonetic Alphabet.

Learning the IPA takes a lot of time, practice, and guidance, and it is not just about
memorizing symbols. The underlying structure and principles behind the organization of
the table are what really matter. In this way, the IPA is like the periodic table of elements in
chemistry. So, while it is helpful to know that Na is the chemical symbol for the element
sodium with atomic number 11 and that [m] is the IPA symbol for a voiced bilabial nasal
stop, it is much more important to know what these concepts are and what those terms
mean. What is sodium? What does it mean for an element to have an atomic number of
11? What does it mean for a phone to be voiced? How is the vocal tract configured for a
bilabial nasal stop?

This is why this chapter focuses on defining concepts, so that you can build a solid
foundation in understanding how phones are articulated. The notation is also important,
but it has no value without the corresponding conceptual understanding.

Using the IPA

A full discussion of how to use the IPA for transcription is beyond the scope of an
introductory textbook like this one. Here, we discuss a few guidelines and some concrete
examples from English. For any transcription, it is important to keep in mind who your
audience is and what the purpose of the transcription is. Most of the time, we
only need a fairly broad transcription to get across a basic idea of the most important
aspects of the articulation.

One important guiding recommendation from the IPA for broad transcription is to use the
typographically simplest notation that still conveys the most crucial information. For
example, when possible, we should choose symbols like upright [a] and [r] rather than
their rotated counterparts [ɐ] and [ɹ]. These upright symbols are easier to type, easier to
read, and more reliable in how they are displayed in different fonts.

Another aspect of typographic simplicity is to avoid diacritics, which are special marks
like [ ̪ ] and [ʰ] that are placed above, below, through, or next to a symbol to give it a
slightly different meaning. These are often necessary for certain contexts, but sometimes,
they are superfluous and can hinder the reader’s understanding. Thus, you should use
diacritics when their meaning is crucial, but avoid them otherwise.
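
As a rough illustration of what removing diacritics means at the level of text encoding, the following sketch deletes every combining mark and modifier letter from a transcription. It is deliberately crude: it cannot tell crucial diacritics from superfluous ones, so it only shows the mechanical step, not the judgment the paragraph above calls for. The function name strip_diacritics is hypothetical, and the code assumes the diacritics are encoded as Unicode combining marks (category Mn) or modifier letters (category Lm).

```python
import unicodedata


def strip_diacritics(transcription: str) -> str:
    """Delete combining marks and modifier letters from a transcription.

    Assumption: diacritics are encoded as Unicode category Mn (for example
    the dental mark) or Lm (for example the aspiration mark). Precomposed
    characters are decomposed first so their marks can be removed as well.
    """
    decomposed = unicodedata.normalize("NFD", transcription)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) not in ("Mn", "Lm")
    )


# t + dental diacritic (U+032A) + aspiration mark (U+02B0)
narrow = "t\u032A\u02B0"
print(strip_diacritics(narrow))  # t
```

Note that some non-diacritic IPA symbols, such as the length mark, are also modifier letters, so a practical tool would need an explicit list of which marks to drop.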

Typographic simplicity is also good practice when dealing with a lot of variation between
speakers that is not relevant to the main point. For example, the English consonant
typically spelled by the letter <r> is pronounced in many different ways by different
speakers. Many North Americans have some sort of central approximant, but it varies from
alveolar [ɹ], to postalveolar [ɹ̱], to retroflex [ɻ]. Some speakers may also have a constriction
between the tongue root and pharyngeal wall, which is indicated in the IPA with a [ˁ]
diacritic after the symbol. Some speakers may have some amount of lip rounding, which is
indicated in the IPA with a [ʷ] diacritic after the symbol. Some speakers may have both
pharyngealization and rounding!

That results in at least twelve different possible articulations, each with its own
transcription in the IPA, depending on the place of articulation, whether or not there is
pharyngealization, and whether or not there is lip rounding. The IPA symbols for these
twelve possibilities are given in the list below. For each, the symbols are in order by place
of articulation: alveolar, postalveolar, and retroflex.

no pharyngealization and no rounding: [ɹ], [ɹ̱], or [ɻ]
pharyngealization and no rounding: [ɹˁ], [ɹ̱ˁ], or [ɻˁ]
rounding and no pharyngealization: [ɹʷ], [ɹ̱ʷ], or [ɻʷ]
both pharyngealization and rounding: [ɹˁʷ], [ɹ̱ˁʷ], or [ɻˁʷ]
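
The count of twelve follows from three places of articulation combined with two yes-or-no choices (3 × 2 × 2). A minimal sketch of that enumeration is given here, assuming the retraction diacritic is the Unicode combining minus sign below (U+0320); the names BASES and rhotic_variants are hypothetical.

```python
from itertools import product

RETRACTED = "\u0320"       # combining minus sign below (assumed code point for retraction)
PHARYNGEALIZED = "\u02E4"  # modifier letter small reversed glottal stop
ROUNDED = "\u02B7"         # modifier letter small w

# Base symbols by place of articulation, in the order used in the list above.
BASES = {
    "alveolar": "\u0279",                  # turned r
    "postalveolar": "\u0279" + RETRACTED,  # turned r plus retraction
    "retroflex": "\u027B",                 # turned r with hook
}


def rhotic_variants():
    """Return (place, pharyngealized, rounded, transcription) for all 12 cases."""
    rows = []
    # Iterating rounding on the outside reproduces the order of the list above:
    # neither, pharyngealization only, rounding only, both.
    for rounded, pharyngealized in product([False, True], repeat=2):
        suffix = (PHARYNGEALIZED if pharyngealized else "") + (ROUNDED if rounded else "")
        for place, base in BASES.items():
            rows.append((place, pharyngealized, rounded, f"[{base}{suffix}]"))
    return rows


for row in rhotic_variants():
    print(row)
```

The pharyngealization mark is placed before the rounding mark, matching the order shown in the list.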

Furthermore, there are many other rhotic pronunciations beyond those in North American
varieties, such as an alveolar tap [ɾ] or trill [r] in Scotland, a voiced uvular fricative [ʁ] in
Northumbria, and a labiodental approximant [ʋ] in London.

Thus, when transcribing English in general, there is no one single symbol that accurately
represents the pronunciation of this consonant, so for broad transcription, a plain upright
[r] with no diacritics is a reasonable choice that follows the IPA’s recommendation for
typographic simplicity. Of course, when transcribing a specific articulation from a specific
speaker, it may make sense to use a more precise symbol, especially if the details of the
articulation are important. But generally speaking, a plain upright [r] is normally fine for
English, though some linguists may prefer to use [ɹ] or [ɻ] for North American English, even
though there are at least a dozen equally valid North American pronunciations. If you are
taking a course in linguistics, be sure to follow the standards and conventions set by your
instructor.

Why is there so much variation in the pronunciation of English <r>? These consonants belong to an
unusual class of phones called rhotics, named after the Greek letter rho <ρ>, which itself represents
a rhotic consonant. Across the world’s spoken languages, we find a lot of variation in rhotics. Many
languages only have one rhotic, but which particular rhotic they have can be very different even
between closely related languages. The pronunciation of rhotics in a language can also shift over
time, especially if the language only has one. There seems to be no single overarching phonetic
similarity in the various rhotics, and linguists are still trying to figure out what makes this class of
consonants so special.

Even when the pronunciation of a given phone is fairly consistent across speakers, many
linguists still choose a typographically simpler transcription. Consider the consonant at the
beginning of the English word chin, which is a voiceless postalveolar affricate. Affricates in
the IPA are normally transcribed by writing the corresponding plosive symbol to represent
the stop closure, followed by the corresponding fricative symbol to represent the fricated
release, both united under a curved tie-bar [ ͡ ] to show they are unified as a single phone.

So, to represent the voiceless postalveolar affricate in chin, we need to select the correct
plosive and fricative symbols. First, let us consider the voiceless postalveolar fricative. We
can find its symbol in the IPA chart by looking in the section devoted to consonants.
Places of articulation are listed across the top, while manners of articulation are listed
down the left. Within a given cell, if there are two symbols, the one on the left is voiceless,
and the one on the right is voiced. So looking in the postalveolar column and the fricative
row, we find the symbols [ʃ] and [ʒ], and since we are interested in the voiceless fricative,
we pick the symbol [ʃ].

However, there is no similar basic symbol for a voiceless postalveolar plosive in the IPA.
That part of the chart is blank, so we have to create our own symbol by using the base
symbol for a similar consonant and adding one or more diacritics. In this case, we can use
alveolar [t] and put a retraction diacritic [ ̱ ] under it to indicate that its place of articulation
is slightly farther back, as we did for the postalveolar central approximant [ɹ̱] before. Thus,
we get [ṯ] as the symbol for a voiceless postalveolar plosive.
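
The chart lookup described in the last two paragraphs can be sketched as a small table keyed by place and manner, with each cell holding a (voiceless, voiced) pair. Only a handful of cells are included, the blank postalveolar plosive cell is stored explicitly as empty, and the names CHART, lookup, and postalveolar_plosive are hypothetical. The retraction diacritic is assumed to be U+0320.

```python
RETRACTED = "\u0320"  # combining minus sign below (assumed code point for retraction)

# A small excerpt of the consonant chart: (place, manner) -> (voiceless, voiced).
# None marks a cell that has no basic symbol on the chart.
CHART = {
    ("bilabial", "plosive"): ("p", "b"),
    ("alveolar", "plosive"): ("t", "d"),
    ("postalveolar", "plosive"): (None, None),
    ("velar", "plosive"): ("k", "\u0261"),
    ("alveolar", "fricative"): ("s", "z"),
    ("postalveolar", "fricative"): ("\u0283", "\u0292"),
}


def lookup(place, manner, voiced):
    """Return the basic symbol for a cell, or None if the cell is blank."""
    voiceless_symbol, voiced_symbol = CHART[(place, manner)]
    return voiced_symbol if voiced else voiceless_symbol


def postalveolar_plosive(voiced):
    """Build a postalveolar plosive from the alveolar one plus retraction."""
    return lookup("alveolar", "plosive", voiced) + RETRACTED


print(lookup("postalveolar", "fricative", voiced=False))  # the voiceless fricative
print(lookup("postalveolar", "plosive", voiced=False))    # None: blank cell
print(postalveolar_plosive(voiced=False))                 # t with retraction mark
```

Storing the blank cell explicitly keeps "no basic symbol exists" distinct from "this excerpt happens to omit the cell", which would raise a KeyError instead.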

So, we would begin by putting these two symbols together under a tie-bar: [ṯ͡ʃ]. However,
most English speakers also pronounce this affricate with some amount of lip rounding, so
a fully accurate transcription would be something more like [ṯ͡ʃʷ], with the [ʷ] diacritic to
indicate rounding.

But hardly any linguist transcribes this affricate with that much phonetic detail. It is almost
never relevant to indicate that it is round, and the postalveolar location of the stop closure
is implied by the fact that it has a postalveolar release. You cannot release a stop closure
in a position different from where it is made: if there is a postalveolar release, it
necessarily must come from a postalveolar closure. So for typographic simplicity, the
affricate is more commonly transcribed simply as [t͡ʃ], with neither of the two diacritics. As
with [r] for the English rhotic, [t͡ʃ] is not technically an accurate transcription for most
speakers, but it is typographically simpler and conveys all of the crucial information
needed to understand the basics of the articulation. The tie-bar on the affricate may also
sometimes be left off in transcriptions, so [tʃ] is also a common transcription for this
affricate, making it even more typographically simple.
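
The range from narrow to broad transcriptions of this affricate can be sketched as one function with optional detail. This is only an illustration: the function name affricate and its parameters are hypothetical, and the code points assumed are U+0361 for the tie-bar, U+0320 for retraction, and U+02B7 for rounding.

```python
TIE_BAR = "\u0361"    # combining double inverted breve, placed after the first symbol
RETRACTED = "\u0320"  # combining minus sign below (assumed code point for retraction)
ROUNDED = "\u02B7"    # modifier letter small w


def affricate(plosive, fricative, *, retracted_closure=False, rounded=False, tie_bar=True):
    """Compose an affricate transcription from a plosive and a fricative symbol.

    Diacritics on the plosive come before the tie-bar, so that the tie-bar
    still spans the two base symbols when rendered.
    """
    closure = plosive + (RETRACTED if retracted_closure else "")
    link = TIE_BAR if tie_bar else ""
    release = fricative + (ROUNDED if rounded else "")
    return f"[{closure}{link}{release}]"


SH = "\u0283"  # the voiceless postalveolar fricative symbol

print(affricate("t", SH, retracted_closure=True, rounded=True))  # narrowest form
print(affricate("t", SH, retracted_closure=True))                # without rounding
print(affricate("t", SH))                                        # common broad form
print(affricate("t", SH, tie_bar=False))                         # simplest form
```

Each successive call drops one piece of detail, mirroring the simplifications described above.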

Even without these issues, there is still usually no such thing as “the” correct transcription
of a word. Two pronunciations of the same word by the same speaker will always have
some differences, because we live in a physical world where we cannot avoid slight
imperfections and fluctuations. Even if we wanted to capture all of those possible
differences with the IPA, it is simply not designed for that level of phonetic detail. When
such detail is important, it needs to be conveyed in other ways, such as with pictures (like
waveforms and midsagittal diagrams) and numerical measurements (like loudness in
decibels and duration in milliseconds). Thus, an IPA transcription is always inherently
missing some details, so we have to decide how much detail is needed and how much
should be left out for simplicity.

Transcribing English with the IPA

Despite all these pitfalls, it is still important to get some basic skill in transcription for future
work in linguistics. Since this textbook is presented in English, English is a good starting
point to give you something concrete in which to ground your understanding of how to do
transcription. However, there is much dialectal variation in English, so the transcriptions
offered here are very general and may differ from the varieties of English you are familiar
with.

We begin with consonants, where there tends to be less variation across dialects. Table
3.2 lists some plosives and affricates of English, with their IPA symbol (keeping in mind
typographic simplicity) and example words containing each consonant in various
positions, where possible. For each word, the portion of the spelling that corresponds to
the phone is in bold. Finally, a phonetic description of each consonant is also given.

symbol   example       example    example   description
         (beginning)   (middle)   (end)

[p]      pan           rapid      lap       voiceless bilabial plosive
[b]      ban           rabid      lab       voiced bilabial plosive
[t]      tan           atop       let       voiceless alveolar plosive
[d]      den           adopt      led       voiced alveolar plosive
[t͡ʃ]     chin          batches    rich      voiceless postalveolar affricate
[d͡ʒ]     gin           badges     ridge     voiced postalveolar affricate
[k]      can           bicker     lack      voiceless velar plosive
[ɡ]      gain          bigger     lag       voiced velar plosive
[ʔ]      —             uh-oh      —         voiceless glottal plosive


Most of these are straightforward, but as discussed in Section 3.3, the alveolar
consonants are normally apicoalveolar, though some speakers may pronounce them with
the blade of the tongue. If that detail is necessary, these consonants can be transcribed as
[t̻] and [d̻], using the laminal diacritic [ ̻ ]. Regardless of the active articulator, some
speakers may pronounce these consonants on the back of the teeth rather than on the
alveolar ridge, in which case, they would be transcribed as [t̪] and [d̪], using the dental
diacritic [ ̪ ].

The glottal plosive (also frequently called a glottal stop) is only a marginal consonant in
English. It can be found as the catch in the throat in the middle of the interjection uh-oh.
Some speakers also have it elsewhere, such as in the middle of some British English
pronunciations of the word bottle. It is articulated by making a full stop closure with the
vocal folds, blocking all airflow through the glottis.

Table 3.3 lists some fricatives of English.

symbol   example       example    example   description
         (beginning)   (middle)   (end)

[f]      fan           wafer      leaf      voiceless labiodental fricative
[v]      van           waver      leave     voiced labiodental fricative
[θ]      thin          ether      truth     voiceless dental fricative
[ð]      than          either     smooth    voiced dental fricative
[s]      sin           muscle     bus       voiceless alveolar fricative
[z]      zone          muzzle     buzz      voiced alveolar fricative
[ʃ]      shin          Haitian    rush      voiceless postalveolar fricative
[ʒ]      —             Asian      rouge     voiced postalveolar fricative
[h]      hen           ahead      —         voiceless glottal fricative


The most notable variation here is that some speakers do not have [θ] and [ð], and
instead use [t] and [d] or [f] and [v], depending on the dialect and the position in the word.
As with the postalveolar affricates mentioned before, the postalveolar fricatives are also
usually somewhat rounded, so they could be more narrowly transcribed as [ʃʷ] and [ʒʷ].
The voiced postalveolar fricative [ʒ] is also one of the rarest consonants in English, and
many speakers pronounce it as an affricate in some positions instead of as a fricative. For
example, you may hear speakers pronounce the final consonant of garage as the affricate
[d͡ʒ] rather than the fricative [ʒ].

Table 3.4 lists some sonorants of English. Across the world’s spoken languages,
sonorants tend to be voiced by default, because their high degree of airflow causes the
vocal folds to spontaneously vibrate, unless extra effort is put in to keep them from
vibrating. This is true for English, so the phonation of the sonorants is not listed here.

symbol   example       example    example   description
         (beginning)   (middle)   (end)

[m]      man           simmer     ram       bilabial nasal stop
[n]      nun           sinner     ran       alveolar nasal stop
[ŋ]      —             singer     rang      velar nasal stop
[l]      lane          folly      ball      alveolar lateral approximant
[r]      run           sorry      bar       (various! see above)
[j]      yawn          onion      —         palatal central approximant
[w]      won           awake      —         labial-velar central approximant


A few of these sonorants warrant extra discussion. The alveolar nasal stop [n] has much
of the same variation as the alveolar plosives, with some speakers having a
laminoalveolar articulation [n̻] and some having a dental articulation [n̪]. The velar nasal
stop is often one of the most surprising phones of English to English speakers who are
new to phonetics, because it is not easily identifiable as its own phone. Many people are
misled by the spelling and think they say words like singer with a [ɡ], but in fact, most
speakers have only a nasal stop there, so that singer differs from finger, with singer having
only [ŋ] and finger having [ŋɡ]. However, there are speakers who do genuinely pronounce
all words like these with a [ɡ] after the nasal stop, but even then, the nasal stop they have
is still velar [ŋ], not alveolar [n].

A notable consonant here is [w], which is special among the consonants of English in
being doubly articulated, which means that it has two equal places of articulation. It is
both bilabial (with an approximant constriction between the two lips) and velar (with a
second approximant constriction between the tongue back and the velum). Its place of
articulation is usually called labial-velar. English used to consistently have two labial-velar
approximants, a voiced [w] and a voiceless [ʍ]. Very few speakers today have both of
these, but those who do pronounce the words witch and which differently, with voiced [w]
in witch and voiceless [ʍ] in which.

Now we can move on to the vowels. This is where much of the variation in pronunciation
occurs across English dialects, and fully describing all of the vowels in English would take
up an entire textbook of its own. Note that this is not a general property of spoken
languages overall. Some are like English, with most dialectal variation in the vowels, but
others have much more dialectal variation in the consonants, while others may have a
relatively even mixture of variation in both consonants and vowels.

Table 3.5 lists some monophthongs of English, with a focus on the English vowels as they
are broadly pronounced across North American dialects. However, there is still much
variation just in North America, and this discussion should not be taken to represent any
particular speaker or region, let alone any sort of idealized standard or target
pronunciation. This is simply a convenient abstraction that provides a useful baseline,
though it is still only a very rough guide, and individual speakers can vary quite a lot from
what is discussed here. Note also that unstressed vowels are very unstable, especially in
fast speech, so for example, unstressed [u] could be pronounced more like [ʊ] or [ə], even
for the same speaker saying the same word. As in most spoken languages, the vowels of
English are generally all voiced, so their phonation is not listed here. Example words are
given that show the vowel in a stressed syllable, an unstressed syllable, and at the end of
the word (see Sections 3.10 and 3.11 for more about syllables and stress).

symbol   example      example        example   description
         (stressed)   (unstressed)   (end)

[i]      beater       saltier        see       high front unrounded tense
[ɪ]      bitter       —              —         high front unrounded lax
[e]      baker        —              say       mid front unrounded tense
[ɛ]      better       —              —         mid front unrounded lax
[æ]      batter       —              —         low front unrounded
[ɑ]      father       —              spa       low back unrounded
[ɒ]      bonnet       —              —         low back round
[ɔ]      border       —              saw       mid back round lax
[o]      boater       —              sew       mid back round tense
[ʊ]      booker       —              —         high back round lax
[u]      boomer       manual         sue       high back round tense
[ʌ]      butter       —              —         mid central unrounded lax
[ə]      —            animal         sofa      mid central unrounded lax

As noted before, there is a lot of variation that cannot be adequately discussed here, so
we only cover a few notable deviations. First, while many speakers pronounce the four
tense vowels as monophthongs as transcribed here, most speakers pronounce some or
all of them as diphthongs instead, perhaps even having an approximant at the end rather
than a vowel. For example, high front unrounded tense [i] may be pronounced more like
[ɪi] or [ij] by some speakers. It is especially common for the two tense mid vowels to be
pronounced as diphthongs, something like [eɪ] and [oʊ] or perhaps [ej] and [ow].

Many of the back round vowels, especially [ʊ], are fronter and/or unrounded for some
speakers in some dialects.

The back vowels in bore and bought are pronounced similarly to each other by some
North Americans, and so they are often represented with the same symbol [ɔ], though
note there may still be some differences, with [ɔ] before a rhotic often pronounced
somewhat higher, closer to [o]. However, many speakers in Canada and in the western
United States have a very different vowel in bought from bore. Their bought vowel is much
lower, and for some speakers, it is also unrounded. These speakers use the same low
vowel in bought that they use in bot. For most North Americans, the low vowels in bot and
balm are pronounced the same, either as back round [ɒ] or back unrounded [ɑ]; in some
dialects, it may be central unrounded [a]. Others have two different vowels for these
words, usually [ɒ] in bot and [ɑ] or [a] in balm. Needless to say, this part of the vowel
system of English is particularly troublesome, and even many expert linguists get aspects
of it wrong.

The two mid central vowels [ʌ] and [ə] are often treated as related pronunciations of the
same vowel, based on whether or not they occur in a stressed syllable (again, see
Sections 3.10 and 3.11 for more about syllables and stress). For now, just note that some
vowels of English are pronounced louder and longer than others, which we call stressed,
while the other softer and shorter vowels are said to be unstressed. We can see the
difference in stress in pairs like billow and below, which differ mostly in which syllable is
stressed: the first syllable in billow and the second syllable in below. The two central
vowels of English differ in stress: the first syllable of the name Bubba is stressed, and the
second is unstressed, so we might transcribe this name as [bʌbə]. Although these two
vowels sound very similar for many speakers and could easily be notated with the same
symbol, there is a long tradition of notating the unstressed mid central vowel of English
with [ə] and the stressed mid central vowel with [ʌ], based on historical pronunciations in
which the stressed vowel used to be pronounced much farther back (and still is, in some
dialects).

The symbol [ə] has a special name, schwa. A common joke among linguists is that their favourite
vowel is schwa or that they wish they could be more like schwa, because it is unstressed, referring to
a desire to avoid the typically high amounts of stress in academia. However, while schwa is always
unstressed in English and in some other spoken languages, in many others, [ə] is a normal vowel like
any other vowel and can be stressed. For example, stressed [ə] can be found in languages like
Sḵwx̱wú7mesh (a.k.a. Squamish, a Coast Salish language of the Salishan family, spoken in British
Columbia; Demers and Horn 1978), Romanian (a western Romance language of the Indo-European
family, spoken in Romania; Chitoran 2002), and Mandarin (a Sinitic language of the Sino-Tibetan
family, spoken in China and nearby areas; Cheng 1973). Thus, there is nothing that requires [ə] to be
unstressed in general, so you should not think of it as an inherently unstressed vowel.

Finally, we consider diphthongs and syllabic consonants, which are phones that have
consonant-like constrictions in the vocal tract but which function more like vowels within
English. Some diphthongs and syllabic consonants of English are given in Table 3.6.

symbol   example      example        example   description
         (stressed)   (unstressed)   (end)

[aɪ]     biter        —              sigh      low central unrounded to high front
[aʊ]     browner      —              how       low central unrounded to high back round
[ɔɪ]     boiler       —              soy       mid back round to high front
[r̩]      burning      interval       sir       syllabic rhotic
[l̩]      —            hazelnut       saddle    syllabic alveolar lateral approximant
[n̩]      —            calendar       sudden    syllabic alveolar nasal stop
[m̩]      —            bottomless     seldom    syllabic bilabial nasal stop

For the diphthongs, the symbols used here represent a rough average over where they
typically start and end, but the actual pronunciation varies quite a lot from speaker to
speaker and even for the same speaker. The low starting point for [aɪ] and [aʊ] may be
closer to back [ɑ] or front [æ], while the mid back starting point for [ɔɪ] may be closer to
tense [o]. Additionally, the high front ending point for [aɪ] and [ɔɪ] may be closer to tense [i]
or the approximant [j], while the high back ending point for [aʊ] may similarly be closer to
tense [u] or the approximant [w].

Syllabic consonants are transcribed by using the syllabic diacritic [ ̩ ] under the relevant
consonant symbol. However, sometimes these are transcribed with a preceding [ə]
instead, so that hazelnut could be transcribed either as [hezl̩nʌt] or as [hezəlnʌt]. Syllabic
rhotics (also called rhotacized vowels or r-coloured vowels) are so common that they
have their own dedicated symbols: [ɝ] in stressed syllables and [ɚ] in unstressed
syllables. Thus, burning could be transcribed as [br̩nɪŋ] or [bɝnɪŋ], while interval could be
transcribed as [ɪntr̩vl̩] or [ɪntɚvl̩].
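
The equivalences just described are mechanical enough to sketch in code. The following is an illustration only, assuming the syllabic diacritic is encoded as U+0329 (combining vertical line below) and that it immediately follows its consonant with no other diacritics in between; the function names with_schwa and with_rhotic_vowel are hypothetical.

```python
SYLLABIC = "\u0329"  # combining vertical line below (assumed code point)


def with_schwa(transcription: str) -> str:
    """Rewrite each syllabic consonant as a schwa followed by the plain consonant."""
    out = []
    for ch in transcription:
        if ch == SYLLABIC and out:
            consonant = out.pop()          # the symbol the diacritic was attached to
            out.append("ə" + consonant)
        else:
            out.append(ch)
    return "".join(out)


def with_rhotic_vowel(transcription: str, stressed: bool) -> str:
    """Replace a syllabic [r] with the dedicated stressed or unstressed symbol."""
    return transcription.replace("r" + SYLLABIC, "ɝ" if stressed else "ɚ")


hazelnut = "hezl" + SYLLABIC + "nʌt"
burning = "br" + SYLLABIC + "nɪŋ"
interval = "ɪntr" + SYLLABIC + "vl" + SYLLABIC

print(with_schwa(hazelnut))                        # hezəlnʌt
print(with_rhotic_vowel(burning, stressed=True))   # bɝnɪŋ
print(with_rhotic_vowel(interval, stressed=False)) # only the rhotic changes
```

The stress of the syllable is passed in by hand because this section transcribes words without stress marks, so it cannot be read off the string.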

With all of this variation, not just in pronunciation by different speakers, but in transcription
choices by different linguists, it can be difficult to figure out what is really intended by a
given transcription. This is why when exact phonetic details matter, it is a good idea not to
rely solely on the IPA, but to include prose descriptions, midsagittal diagrams, and other
tools that can help clarify exactly what is meant.
