Practise Pronunciation with Audacity
Olle Kjellin, 2015
Are you learning a new language? Do you, like me, have the ambition to learn it well, to sound as "native" as
possible, or at least to have a listener-friendly pronunciation that will not embarrass you or annoy the native
speakers? This paper will show you how to achieve that, and explain why it is possible, even if you are not a
child. In these 21 pages with their 34 illustrations you will learn how to:
• Produce perfect pronunciation exercises with your favourite sentences for free.
• Practice in the way that will give you the best results: for example, perfect pronunciation, if you wish.
1 Introduction
There is as yet, to my knowledge, no freely or commercially available pronunciation practice material that is "best" for
my purpose. So I produce my own material, and so could you. It is easy with Audacity, a very powerful free
program for recording and editing sounds. It costs time, to be sure, but it is time well spent: it not only yields really
good results for my pronunciation exercises, it also makes learning faster. This tutorial will show both how to utilize
Audacity and how best to perform the exercises, and why, according to my knowledge and experience.1 Hopefully, it
will suit you too.
Among commercially available language courses, Pimsleur and Rosetta Stone, for instance, make really good ones. I
have several of them. But I still want to add my own modifications to make them even better, or to supplement them with other
material that I make myself according to the guidelines in this tutorial. As you will soon see, I practice without any text
in the beginning, in accordance with recommendations based on research as well as on my own experience, because
most writing systems do not represent the pronunciation well enough, but rather confuse the learner and lead to
faulty, "broken" pronunciation. If you still do want written support, you should learn the IPA (International Phonetic
Alphabet, http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and try the Glossika method
(http://www.glossika.com/).2
For advanced learners of English, Richard Cauldwell's Speech in Action http://www.speechinaction.com/ is an
unsurpassed source.
If you don't want to pay for CDs or online courses, there are some quite good free materials, too. If you have native-speaking friends to record, do so. Otherwise, I can recommend book2 from Goethe-Verlag (http://www.goethe-verlag.com/book2/). I often download their sound files and modify them as below for my own language studies.
However, beware of the amateurish materials that abound on the Internet. Most of them are quite bad; some are even
incorrect.
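When you have a good model recording, the first step in the method below is simply to listen to it many, many times. In Audacity that is a matter of copying a sentence and pasting it repeatedly; as a minimal stand-alone sketch of the same idea, here is a small Python script using only the standard library. The file names are of course just placeholders, and the zero-byte silence trick assumes the common 16-bit PCM WAV format:

```python
import wave

def loop_sentence(src_path, dst_path, repeats=10, gap_s=0.5):
    """Write a practice file that repeats one model sentence
    `repeats` times, with a short silent gap after each repetition."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Silence = zero-valued samples (true silence for 16-bit signed PCM).
    gap = b"\x00" * int(params.framerate * gap_s) * params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        for _ in range(repeats):
            dst.writeframes(frames)
            dst.writeframes(gap)

# Example (hypothetical file names):
# loop_sentence("model_sentence.wav", "practice_loop.wav", repeats=20)
```

The later sections of this tutorial show how to do the equivalent interactively in Audacity, which also lets you trim, amplify and inspect the sentence while you work.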
This paper was born from my Swedish tutorial on Audacity, originally written as a handout for participants in my
pronunciation classes. It had only a brief description of the practice method at the end, but many readers wanted more
of that kind, and they wanted it at the beginning, so here it is! As a testimonial to the success of the Quality
Repetition method, I might mention that many of my pronunciation-class participants thought the course was too short,
regardless of education level (even many MDs and other academics who were dissatisfied with their Swedish
pronunciation), and despite the fact that the courses were a whole week long: about 35 hours consisting mainly of
intensive chorus and individual practice on just 12 representative sentences, namely how to say the participants' own street
addresses! And all of them were quite angry that they had not been given this chance to pronounce correctly from the
beginning.
The practice method described here is essentially the same deliberate practice used to master music, sports, dancing, typing, operating brains, reading x-rays, writing calligraphy, flower arrangement, or whatever skill you want to
acquire. It's not a unique method at all, but rather self-evident to elite performers in all those areas, so it is doubtful whether I
could call it "my" method. But sadly, deliberate practice has been out of fashion in language (and mathematics)
pedagogy for decades! It has been scorned as "Skinnerism" or whatever. This is a very unfortunate situation, and I
want to turn it back to normal again. Deliberate, persistent practice in a special way, to be described below and termed
Quality Repetition, is "my method" or, in fact, everybody's method, known since prehistoric times. It is effective because
it is based on neurophysiology. Toddlers do it all the time in their own, innate, smart way when they acquire their first
language. (Or languages; there are no physiological limits, only practical ones, to the number of languages that can
be acquired in parallel.) Adults are well advised to peek at toddlers' "methods" and adapt them to their own capabilities
and limitations. This paper will show how you can do that.
Teachers usually teach the alphabet and the grammar well and carefully, but seldom teach pronunciation to a sufficient
degree. Many of them even think it unnecessary to practice pronunciation with adult learners, in the (false) belief that
they will never succeed anyway. This applies particularly to prosody (the rhythm and intonation of speech), which is often
alleged to be "the most difficult" thing to learn in a new language, although it is arguably the most important thing to
learn if you want a listener-friendly pronunciation with good communicative function.
Is it really true, then, that L2 (second-language) pronunciation is so difficult to learn, or that prosody is
particularly difficult? No, on the contrary! Not only is it quite possible to learn excellent L2 pronunciation,
equal to, or not very different from, native pronunciation, but prosody is even the easiest part! This claim of mine
is based on my long experience as a language learner and teacher, coupled with my medical training focused on the
physiology of the voice and speech organs and of the brain and neuromuscular system in learning and forgetting. There
is plenty of scientific evidence (though mainly in the medical literature); see the selected bibliography in section 20 on
page 20.
It has become more and more recognized among language teachers in recent decades that speech prosody is
overwhelmingly and undeniably the most important factor in reaching a near-native, or at least a listener-friendly,
pronunciation. Prosody is to speech what the carrier wave is to a radio transmission. The "program" is
superimposed on the carrier wave, and the wave as such should not normally be perceived consciously. Therein lies
another great potential and important function of prosody: by suddenly varying the pitch, loudness or length of sounds
and words in unexpected ways, i.e. by adding emphasis, the speaker can choose to bring prosody up to a conscious and
conspicuous level and attract the listener's attention to the paralinguistic contents of the message. This corresponds to
italics, boldface, etc. in writing.
The prosody of any language typically consists of fewer than ten or so rules based on only three fundamental elements
that every (yes, every!) language uses in its particular prosody: voice pitch, voice loudness, and length of sounds.
These three mechanisms are well developed from the moment of birth (listen to a baby!), they work in the same way for
all human beings, and there are only partial differences in the details of how they are controlled and utilized in different
languages, in varying proportions of importance per each specific language. For example, what may be called "stress"
or prominence is often signalled by a certain pitch and/or loudness variation in the stressed syllable (as in Spanish,
Hungarian or Finnish), or on the pre-stress syllable (as quite often in Polish, Russian and maybe French), often
accompanied by a slight lengthening of the stressed syllable (as in Russian and Spanish) or a significant lengthening (as
in French and English), or signalled almost only by the length (as in Swedish), whereas length has nothing at all to do
with stress in some other languages (such as Finnish, Hungarian, Czech). And some languages don't even use "stress"
but have other means of prosodic signalling (such as Japanese, Somali, and maybe French). Pitch is used to signal the
morphological structure of words in some languages (such as Swedish, Japanese and Tibetan), or to signal lexical
identity in other languages (such as Chinese, Thai, Vietnamese and many African and Native American languages).
Common to all these uses still is that, regardless of language, they involve the very same three fundamental elements ─
pitch, loudness and length ─ to signal all those lexical, grammatical, emotional and other characteristics involved in the
spoken conversation. And in every culture there are songs, and songs too consist of notes in sequences with varying
pitches, loudnesses and lengths. So indeed, each one of us above toddler age already masters all the prosodic means
being used in any other language; we just have to learn how to tweak our skills for the specific details of the new,
particular language we are learning. And please do carefully note: All prosodic uses of pitch, loudness and length appear
in each and every utterance, so it is a very good and time-efficient idea to concentrate mainly on the prosody from the
very outset of learning a new language. Don't care too much about the particulars of vowels and consonants until you
feel confident with the prosody.
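If you want to see for yourself that pitch, loudness and length really are plain measurable quantities, each can be estimated from a recording in a few lines of code. The sketch below is a toy illustration in standard-library Python (a crude autocorrelation pitch estimate over a whole file, not a production analysis tool; dedicated software such as Praat does this far better):

```python
import array
import math
import wave

def prosody_profile(path):
    """Very rough estimates of the three prosodic primitives for a short
    mono 16-bit WAV file: length (s), loudness (RMS), pitch (Hz)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    length = len(samples) / rate                      # duration in seconds
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Pitch: lag of the strongest autocorrelation peak, searched over
    # lags corresponding to 50-400 Hz (a typical voice range).
    best_lag, best_c = 0, 0.0
    for lag in range(rate // 400, rate // 50):
        c = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if c > best_c:
            best_lag, best_c = lag, c
    pitch = rate / best_lag if best_lag else 0.0
    return length, rms, pitch
```

Applied to a recorded model sentence, the three returned numbers are exactly the raw material of prosody: how long, how loud, and at what pitch.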
In contrast to the small number of prosodic details, there are typically some 30-40 vowel and consonant sounds (some
languages have fewer, some have more, some have considerably more), but not all of them appear every time, in
every utterance. So they are indeed less important than the prosody, at least in the beginning. You can see proof
of that in children's first-language acquisition: by the time toddlers can say 25-30 words in their emerging language,
their prosody is already identical to that of the adults around them. However, it will usually take some 5-6 years or even more
© Olle Kjellin 2015: Practise-Pronunciation-w-Audacity 3/21
May be updated at any time; this is version 1.2, last edited on May 24, 2015 at 16:26:42
before they can master all the vowels and consonants. Despite this, they are never perceived as having any “foreign
accent” – thanks to their correct prosody! Therefore, I always practice prosody first and foremost, even if my tongue
will stumble on many individual vowels and consonants that may pop up every now and then.
But how do you do it, then, if you have no teacher to help you? The answer is in this paper. Produce your own materials
for pronunciation exercises and follow my advice here! Read more about the methodology and its neurophysiological
foundations in the next few sections; then come the hands-on instructions for the use of Audacity, from section 7 onwards.
The short version: In the way to be described below I will train my ears with the correct speech rhythm and
melody according to the model and saturate my brain's primary hearing centres as well as its hearing
perception centres with it. I should not torture my ears by making them hear me speak with a faulty accent (as I would do
in the beginning, if I didn't saturate my ears first). Subconsciously and gradually, by shadowing, mirroring
and imitation, I will train and automatize my mirror neurons (imitation neurons), which are then to guide my
speech muscles to my own pronunciation, when, eventually, I start saying the phrases myself without help.
In this process my brain will actually be physically changed due to its plasticity. This is learning on the
neuroanatomical scale. My brain will very effectively connect and match the sounds that I hear with the
sounds that I make and the sounds that I should make. Therefore, I should not trouble my speech muscles to
learn first to speak with a funny pronunciation (as, again, I would do in the beginning, if I didn't saturate my
ears first). Instead, I will first make the (correct) model utterances resound as a template din in my head, and
that will direct my speech muscles accordingly. It will then even be difficult for me to pronounce much
differently from the model. (Incidentally, this is also how our native, first-language speech is mirrored,
acquired, controlled and monitored, the speech muscles then being guided by internally “hearing” and
predicting how the result would and should sound for a given articulation. More than 50,000 years of human
language evolution cannot be wrong.)
In conclusion: I will practice pronunciation with my ears and let automated nerve reflexes do the rest. I will
then have created an “audio-motor procedural memory” for the target language, with a result as native-like
as I have the time and motivation to aspire to.
A) Hearing
The primary hearing centres are neuronal arrays situated in the temporal lobes, bilaterally (both sides), also called the
auditory cortex. They belong mainly to the sensory system. The auditory nerves from both ears are connected to the
brain stem, and then relayed in a series of neurons (nerve cells) to the primary hearing areas and also to other places,
bilaterally. About 60% of the nerve fibres from one ear cross over to the other side (i.e., from the left ear to the right
temporal lobe, and vice versa), while some 40% remain on the same side. Some of the crossed pathways cross back
again after relaying their signals to various other locations and reflex circuits, for example for directional hearing and
head-turning reflexes towards a sound. See the schematic pictures at http://bit.ly/auditory-path-1 and http://bit.ly/auditory-path-2.
A useful reflex is the Lombard reflex, which causes me to speak louder in a loud environment. Replaying my material
loudly will activate my speech organs better than replaying it softly. The auditory system is replete with reflexes of various kinds.
Speech may seem to be a sequence of distinct words, each made up of distinct sounds. In reality, however, speech is a
continuous stream of interwoven sounds. Sounds are caused by pressure waves in a medium. The words from a speaker
as well as all other natural sounds and noises from around us are an extremely complex mixture of regular and irregular
waves (vibrations) in air travelling to our eardrums. Pure physics. An intricate mechanism amplifies the air waves in the
ear while transforming them to water waves in the inner ear and then actually zooms in on speech-relevant vibrations
(particularly those pertaining to the speaker's speech rhythm), synchronizes with them, performs a basic sorting of
relevant sounds, filters out non-speech sounds, and converts all into electrochemical signals in the neurons leading to
the brain, where they are further sorted into higher-order categories of many kinds (phonetical, phonological,
morphological, syntactical, lexical, semantical, etc.), by which they can be identified and hopefully correctly
comprehended in their particular context.3 Surprisingly, the auditory nerve contains many more efferent (motor) fibres
than afferent (sensory) ones! The sorting and filtering mechanisms of the inner ear are, however, dependent on this
arrangement.
The primary hearing centres register the physical characteristics of the incoming signals, such as pitch, loudness and
length, and map them tonotopically along the cerebral cortex. Tonotopical mapping means that low pitch is at one end
and high pitch at the other, like the keys on a piano, in a simple, straightforward array. It is nicely illustrated at
http://bit.ly/tonotopic. A corresponding "periodotopic" mapping probably exists for the temporal (timing) aspects of
sounds too.
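The piano-keyboard analogy can even be put into a formula. Greenwood's (1990) frequency-position function is a commonly cited fit for where along the human basilar membrane a given frequency peaks; the constants below are Greenwood's published human values, assumed correct here:

```python
import math

# Greenwood's (1990) human frequency-position fit:
#   f(x) = 165.4 * (10 ** (2.1 * x) - 0.88)
# where x is the fractional distance from the cochlear apex (0) to the
# base (1). Solving for x gives the "piano key" position of a frequency.
def place_for_frequency(f_hz):
    return math.log10(f_hz / 165.4 + 0.88) / 2.1

# place_for_frequency(440) is about 0.26, i.e. concert-pitch A peaks
# roughly a quarter of the way up from the apex.
```

Low frequencies map near the apex, high frequencies near the base, in the same orderly one-dimensional array that the cortex then mirrors tonotopically.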
From the primary auditory cortex, signals are relayed on to higher-order hearing-perception and comprehension centres
(see B, below), and to mirror neurons (see D, below). The pathways to the mirror neurons are the shorter and faster ones, which
has important implications for our practice method. Presumably, the efferent fibres mentioned above are connected
with the mirror neurons.
We hear and perceive our own words in three different ways. The first is by air conduction: the sounds from our mouth
go around the cheeks into the ears and are converted as above. The second is by bone conduction: the waves travel
directly through the soft tissues and bone into the inner ear. This is louder and much faster than air conduction, not only
because the route is so much shorter and even bypasses the eardrum and middle ear, but also because sound travels
more than four times faster in water and solids than in air, and with additional components. So we really can't know how
our own air-conducted speech sounds until we listen to a recording. Some people don't like to hear themselves in a
recording, but that is how we "really" sound to other people, like it or not. The bone conduction pathway enables a very
fast route for auditory feedback, which is very important for the pre- and subconscious monitoring of what we are
saying while we talk. (As for articulatory feedback, there are also feedback loops through proprioception, i.e., senses of
muscular and joint positions and movements. However, although not unimportant, the proprioceptive routes in general
are too slow for real-time feedback.)
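The "more than four times faster" figure is easy to verify with round handbook values for the speed of sound; the numbers below are standard physics approximations, not taken from this paper:

```python
# Round handbook values for the speed of sound (m/s):
v_air = 343.0     # dry air at 20 °C
v_water = 1482.0  # water at 20 °C
v_bone = 3500.0   # cortical bone, approximate

print(round(v_water / v_air, 1))  # 4.3 -- "more than four times faster"
print(round(v_bone / v_air, 1))   # 10.2 -- faster still through bone itself
```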
The third way of perceiving our own words is psychological: we "know" what we said, because we wanted to say it. This
is usually correct, but occasionally the mouth says it wrong, or even says another word. In most cases we can
correct ourselves immediately, but at times the mistake goes undetected, and we can swear we were right
even when we were not. Only a recording can then reveal the truth. Incidentally, this can also happen when
perceiving another person's speech. We may hear only what we expected to hear, or what we could comprehend, and we
can honestly swear we are right even when we are not. Again, only a recording can settle the issue.
B) Perception
The brain's hearing perception "centres" also belong to the sensory system and are responsible for how we understand
speech and language. They are vast, complex, intertwined systems of nerve circuits and networks, mainly distributed in
the parietal lobes and around the angles between the temporal and parietal lobes (Wernicke's area). These centres,
circuits and networks continuously exchange information with one another, with the primary auditory centres,
and with the mirror neurons (see below) across both the right and the left brain, and with innumerable other
networks that, all taken together, represent functions for speech, language, memory, emotions, etc.
Please note: No brain is “half”, even if they are called hemispheres. The brains are a paired organ, just like the eyes,
ears, kidneys, lungs, hands, feet, etc., none of which are “halves”. And like all the other paired organs, both brains can
perform the same actions simultaneously. In some special cases, however, conflicts might arise if both brains competed
over what to do. So, through thick bundles of extra fibres between them (the corpus callosum), the right and left brain
communicate, discuss, and decide which side is to do what, and how much. As a result, one side may become well trained
(dominant) and the other side "ring-rusty" or even inhibited (dominated!), but nevertheless always prepared to jump in
and substitute if the dominant side should falter.
Don't ever believe in the urban myths about right-left brain separation of tasks. Some of them may be a little true to a
3 People with hearing impairment using hearing aids or cochlear implants can't sort out sounds like that in their inner ear. A noisy environment or
several speakers talking simultaneously will impose great difficulties on them.
certain extent, but not in the way they are presented by non-experts in the media. The differences in "lateralization" and
dominance often amount to only a few percent of the total, bilateral activity. Speech and language are such highly
specialized and finely trained functions that the non-dominant brain is less prepared than for other functions ─ but not
unable ─ to jump in and substitute, for example after an injury. In the majority of people the left brain dominates for
many aspects of language, while the right brain is usually dominant for prosodic factors. However, both the right and
the left brain do indeed cooperate all the time, even in language and speech.
Eating and drinking utilize the same anatomical structures in the face, mouth and pharynx as speech, but the
controlling neural networks are different from those of speech, and left-right dominance is random, at about 50:50. So, in
case of a unilateral stroke, about half of the patients get swallowing problems, depending on whether the stroke is on the
swallow-dominant side or not. In most of these cases, however, the swallowing functions return more or less completely
within about 3-4 months. This is due to the brain's plasticity (see next section), by which the intact, previously non-dominant
brain re-learns to control swallowing. The same goes for other lost functions when they resolve after some time, though often
intensive and extensive rehabilitation is needed to wake up, coach and exercise the substituting brain.
C) Plasticity
Learning, and getting results from training, is only possible thanks to the plasticity of the brains: their ability
to adapt, reorganize connections, change, and even grow anatomically, in response to incoming stimuli and identified
needs, in effect relocating functions between the two of the pair as well as within each brain separately. This is one of
the most fascinating functions of the brains; it happens very fast, and it occurs in both the sensory and the motor system.
And it is not necessary to have had a stroke to induce plasticity; it is a normal function in all brains, at all ages!
A connection between two neurons is called a synapse. Plasticity primarily affects the number of synapses. On
average, each neuron has input synapses from about 10,000 other neurons and constantly receives various signals from
all of them, some excitatory, some inhibitory. When a neuron has accumulated enough signals of the certain kind it is
specialized for, it "fires" and sends a signal on through its output synapses to, again, some 10,000 other nerve cells. One
adult pair of brains has about 100 billion (100,000,000,000) neurons. Multiply these three factors and you find that this is
indeed a huge network of some ten billion billion (10,000,000,000,000,000,000) connections. In comparison, the
World Wide Web is rather a tiny network: available Internet statistics from May 2015 say there are only some 3.1 billion
users in the world right now (http://www.internetlivestats.com/watch/internet-users/).
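The arithmetic behind that figure, spelled out (the counts themselves are the text's round estimates, not measurements):

```python
# The text's round estimates of brain connectivity:
neurons = 100_000_000_000        # ~100 billion neurons in an adult pair of brains
inputs_per_neuron = 10_000       # ~10,000 input synapses per neuron
outputs_per_neuron = 10_000      # ~10,000 output synapses per neuron

connections = neurons * inputs_per_neuron * outputs_per_neuron
print(connections)  # 10000000000000000000 = ten billion billion (10**19)
```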
At birth, we are even bestowed with some 200 billion neurons, but with rather few synapses. In response to any and
all incoming stimuli and physical activities of the child, however, zillions of new synapses are formed each minute and connect
all the involved neurons. To accommodate all the new synapses, the neurons form extensive systems of branches and
twigs, in a process called arborization. It is therefore very important to present as many different stimuli as possible to a
child from birth to adulthood, to promote arborization and synapse formation. The more modalities of different kinds
that are involved and coupled (eyes, ears, hands, body movements, right side, left side, etc.) in motor pattern formation,
the better and more robust the skills and long-term memories will be. "Neurons that fire together, wire together"
(Hebb's principle). This too has pedagogical implications, because the same applies at all ages. For example, we should
practice prosody complete with suitable body language.
Unused neurons are weeded out or made dormant. For instance, a newborn baby surprisingly has pathways from the
primary auditory centres in the temporal lobes to the visual centres in the occipital lobes, but since such pathways are
generally not needed, they will shrink and almost disappear. Unless the child is blind, of course, in which case these
pathways are retained, and retrained, so that the visual centres, which would otherwise have been unemployed, are
used for auditory tasks instead. That is an example of plasticity. Even blindness acquired in adulthood
will induce similar activation of the visual centres by auditory input. That might seem to be impressive plasticity
indeed. However, every instance of normal learning of anything at all is accomplished through these same plasticity
mechanisms, and they work perfectly throughout our entire lifetime! This is very encouraging news.
In response to a new stimulus it takes only seconds for small "knobs" (dendritic spines) to form on the branches of
neurons. This time-lapse video of knob formation https://www.youtube.com/watch?v=s9-fNLs-arc illustrates learning
on the scale of a single neuron! If the stimulus is not repeated, the new knobs will disappear. If the stimulus is repeated
sufficiently many times, the knobs will develop further and form permanent synapses and wire together all neurons that
happened to be involved in that task, for instance the pronunciation of a new speech sound, or a whole sentence with
correct rhythm and intonation pattern with concomitant body movements and gestures. The results are long-term
memories. Such wired-together networks may be re-used, in whole or in part, in the formation of yet other networks, and
hence assist in recall, cueing, and mental associations of all kinds. All this is the neurophysiological rationale for multi-modal,
multiple repetitions in any learning process. Unfortunately, there is no shortcut to learning and long-term
memory, only repetitive work: deliberate, persistent, repetitive practice.
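The fate of those knobs can be caricatured in a few lines of code. This toy model is purely illustrative (the growth, decay and threshold numbers are invented), but it captures the point: stimuli repeated often enough wire in permanently, while sparse ones are pruned:

```python
def synapse_fate(stimuli, growth=1.0, decay=0.5, threshold=5.0):
    """Toy model of dendritic-spine fate: every repeated stimulus adds
    `growth` to a nascent knob; every quiet step it decays; a knob whose
    strength reaches `threshold` is treated as permanently wired.
    All numbers are invented for illustration only."""
    strength = 0.0
    for active in stimuli:
        if strength >= threshold:
            return "permanent"
        strength = strength + growth if active else max(0.0, strength - decay)
    return "permanent" if strength >= threshold else "pruned"

# Six consecutive repetitions wire the synapse in;
# one unrepeated stimulus fades away and is pruned.
```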
Ever since we start speaking as toddlers and throughout all our lives, every time we say anything at all, every utterance
will serve as an instance of practice that will form new synapses and thus further consolidate and reinforce our speech
habits as represented in our mirror neurons. And so we all become super experts in all the procedures involved in
hearing and speaking our first language(s). The robustness of procedural memory, and of long-term memory in
general, is a linear effect of the number of repetitions. It is statistical learning.
Thus, procedural memories for skilled actions form like paths in a lawn: They emerge wherever you tread frequently
enough, nowhere else. But fortunately, there is no best-before date for plasticity. As we grow older, we will, in many
cases (but not all, depending on the type of task), need more repetitions per item to learn it and automatize it than at
younger ages. That is the only age effect. And there is no neurophysiological difference between language learning and
any other type of motor learning. So forget the disheartening myths about age and language learning, at least as
concerns pronunciation. (They may hold for grammar, which usually is more complicated.) Just repeat a larger number
of times if you are "older". And be sure to get it right from the beginning, to avoid arborization and synapse
formation for unwanted pronunciation, because wrong pronunciation too will induce all these plasticity processes in the
same way and end up stored as motor "skills" in your long-term procedural memory. You don't really aspire to that.
Fossilization in second-language users (i.e. a petrified foreign accent in spite of many years' use of the new language) is
due more to faulty instruction and insufficient training at the beginner's level than to any biological constraints, and is thus
preventable (if you do want to prevent it). Due to the time handicap of adult learners, there is little chance for us ever
to catch up with a native speaker in every respect, but it is indeed perfectly feasible to sound like one in the limited
number of sentences we are able to say.
In experimental conditions it has been found that automating a new (simple) motor skill takes about 15 minutes. Can
you practice the same sentence for 15 minutes? It seems like a good idea to do so. Depending on the difficulty
of the task and your previous experience with similar skills, of course, it may take more or less time than that to
learn a new motor pattern. For example, the 15 click consonants in Zulu are quite a challenge for English speakers, but
presumably easy-peasy for Xhosa speakers (who have 21 click consonants). When, however, you can say 20-30
sentences in a native or near-native way in your new language, after hours of deliberate, persistent practice on only
them, you will also be able to say 20-30 million other sentences in the same way, because they all follow the same rules
of prosody and pronunciation. So part of the trick for the adult language learner is to have a very limited curriculum for
the initial pronunciation training period.
D) Mirror neurons
Our pair of brains contains numerous mirror neurons, also called imitation neurons. Discovered only in the late 20th
century, their functions are highly relevant for language learning and acquisition, and this may be the most fascinating
area of recent research in neuroscience. The human mirror system is involved in understanding others' actions and the
intentions behind them, and it underlies mechanisms of observational learning. Research on the mechanisms involved in
learning by imitation has shown that there is strong activation of the mirror system during new motor pattern
formation. It has been suggested that the mirror-neuron system is the basic mechanism from which language developed.
Some functional deficits typical of autism spectrum disorder, such as deficits in imitation, emotional empathy, and
attributing intentions to others, seem to have a clear counterpart in the functions of the mirror system.
The mirror neurons belong to the motor system. They are motor neurons primarily involved in finely tuned muscular
actions, movements and procedures. But secondarily, they are also recruited when we observe other people perform
similar actions and procedures with which we ourselves already have prior experience and interest. In essence, mirror
neurons are a kind of action and pattern recognition mechanism essential for the perception and appreciation of what
other people are doing, saying, or intending. Therefore the mirror neurons are also crucially involved when we want to
shadow, mirror and imitate what others do or say, such as the teacher in a language class. Our ability for, and agility in,
such action recognition, mirroring and imitation depends heavily on the mirror neurons' prior experience of the same
sort, and to some extent on our motivation and desire to perceive the signals. Learning of motor skills is the result of
inducing the formation of new mirror-neuron networks by plasticity processes. The amount of mirror activation
correlates with the degree of our motor skill for that action. Experiments have shown an increase in mirror activation
over time in people who underwent a period of motor training in which they became skilful. It works after brain injuries
too; data on plasticity induced by motor observation provide a conceptual basis for application of action-observation
protocols in stroke rehabilitation.
Since all of us adults already have ample experience and skill in speaking as such (in our first language), our mirror
neurons are ready to recognize, mirror and imitate the new language almost directly (after due listening practice, as
above; otherwise not). This is in stark contrast to a pre-linguistic toddler, who has to train both his mirror neurons and
his speech organs from scratch, which takes many times longer than for adults. (Small children do not necessarily learn
languages more quickly than adults, except for the fact that they usually spend far more practice time per day on it than
adults.)
© Olle Kjellin 2015: Practise-Pronunciation-w-Audacity 7/21
May be updated at any time; this is version 1.2, last edited on May 24, 2015 at 16:26:42
One small handicap we have as adult learners is that our mirror neurons are heavily biased in favour of our first
language(s), so they will tend only to "recognize" and do what they already know or think they should expect (the
action recognition function). That is, they may miss many details and get a more or less distorted picture that better
conforms with their experience. Deaf by preconceptions. This happens particularly if we start reading too early in the course. Learning a new language should initially be done without reference to the writing, because the letters (if based on the same script system as ours, or transcribed into our script) will in all likelihood signal their usual meanings to us, namely the sounds of our own native language. This will lead to suboptimal perception, suboptimal recognition, and suboptimal imitation of the new details, the situation we call "foreign accent". To avoid this, we would need a teacher pointing out the details and giving immediate feedback, so that the learner can perceive and modify his pronunciation habits correctly, in accordance with the patterns of the new language. However, since we are already super-elite players of our
speech instruments as such, this actually is no big deal, but we do need to get the detailed information and pay much
attention to it until our new pronunciation becomes automatic and starts working subconsciously. We are better than
parrots. We use both quality and quantity for learning. So, in addition to a teacher, we need extensive and deliberate
listening practice, as recommended in this paper. If you have no teacher, studying phonetics is a good option. Also if
you do have a teacher. And compulsory if you are a teacher.
The actions of mirror neurons are subconscious most of the time, but sometimes they surface in comical ways:
Examples that everybody surely has experienced are when we are watching a soccer game on TV and feel twitches in
our own legs as if to try to kick the ball; or when we are listening to a person with a hoarse voice and feel urged to clear
our own throats. The latter example is due to the fact that there are direct neuronal pathways from the primary auditory
cortex (in the temporal lobes) to those mirror neurons (in the frontal lobes) that monitor and control the speech and
voice muscles. These direct pathways do not involve understanding of the contents! This makes it very fast to shadow
or mirror what somebody is saying, even before you know what s/he is saying. This also makes it very effective to
practice pronunciation in chorus with your class, or in unison with your recording, because your mirror-neuron system
will compel your speech and voice muscles to act according to the loud and overwhelming auditory input. This will
push you into getting a native-like rhythm and intonation, virtually without even a chance of getting it wrong. You will
like that!
Indeed, experiments have confirmed that coupling observation and execution significantly increases plasticity in the
motor cortex. After a training period in which participants simultaneously performed and observed congruent
movements, there was a potentiation of the learning effect. "Observation" here might mean only the auditory input, but
best of all would be a live teacher, whose lip shapes, facial expressions, gestures and all body language could be
observed and mimicked.
All of this, all that is known about mirror neurons in speech-related activities, lends very strong neurophysiological support to the method advocated in this paper, in which we practice multimodally, multitudinous times, in chorus
along with the teacher and class or a recording. We call it Quality Repetition. (This term was coined by Judy B.
Gilbert, well-known author of many books on English pronunciation for foreign or immigrant learners, when we gave
workshops together long ago. Judy also introduced the use of a big rubber band to indicate the long sounds of English.
This is more than a toy thingy, it is the powerful addition of another modality, vision, to the exercises. It will
significantly increase the neuronal traffic between the left and right brain and assist in making that detail (length)
more salient and robust in the learners' procedural memory. I use the rubber band extensively in my Swedish classes
too.)
Most mirror neurons seem to be distributed in the frontal lobes, which are the "headquarters" of motor activities.
Neuronal networks involved in speech and facial expressions are concentrated in Broca's area (and its homologue on
the non-dominant side) where there is a concentration of mirror neurons. Actually, these mirror neurons for speech also
monitor the results of their own speech by continuous, real-time mirroring and monitoring our own spoken output. That
is, they compare what they hear us ourselves say, with the memory of what they think we should say and should sound
like. This enables us to modify our speech on the fly, should the need arise due to some temporary constraint, such as if we are chewing gum at the same time, have a congested nose, are whispering or shouting, or anything else that forces the speech muscles to act differently from the usual ways. This is called compensatory articulation, in which we
can instantly modify, adapt and correct our articulation by result-guided processes based on the audio-motor
procedural memory stored with our mirror neurons. "Audio-motor" = the coupling of sounds and speech gestures. 4 All
motor movements (including vocal ones) are organised around goals.
Actually, there is always a natural variation in our pronunciation of any sound or sound sequence, not only depending
on such factors as the degree of stress, the surrounding sounds, how much air we have left in the lungs, etc., but also
random variation because we are only human. Not least, there are immense anatomical differences between
individuals. Some of us are small, some are big, some are adults, some are children, some are males, some are females,
4 Of course, there is also input from sensor organs of touch in lips, tongue and pharynx, and proprioceptive information of muscular and joint
positions and movements, but "audio-sensory-propriocipio-motor" would be too cumbersome a word. Let "audio-motor" cover it all.
some are skinny, some are obese. Some have high-domed palates, some have flat palates, some have narrow palates,
some have wide palates, some have big chins, some have small chins, some have their teeth pointing this way, some
have their teeth pointing that way, some have all their teeth, some have not. Etc. etc. All these factors give our vocal tracts very different acoustical properties, yet everybody is still able to produce virtually the same acoustic output as everybody else speaking the same language. Speech is independent of anatomical differences between vocal apparatuses. In effect, each individual has his own full set of goal-oriented basic compensatory articulations to accommodate his particular
anatomy to achieve the same acoustic result as the other people around. One could say that we are all experts on
applied, acoustical phonetics. This competence pivots on the auditory-goal-guided processes of the audio-motor
procedural memory. And this is why we have to train our ears first in learning a new language (just as we did for our
first language).
An important thing to know is that "a pronunciation" is not a kind of event in which we hit a canonical bull's eye in the
middle of a target; never so. It is rather a whole cloud of permitted variants around that bull's eye on a multifaceted,
polygonal target slate, or region, bordering on and bumping into its surrounding sounds, much like the electron cloud
around an atom. For the atom, it is totally unimportant where the electron is, as long as it stays with that atom. For the
listener of speech, it is totally unimportant where in the target region the speaker hits, as long as it is in the right region,
for example "th" instead of "f" or "t" or "s". Try saying these with varying positions of the tongue and lips. In a natural
context, the native listener will not discern anything of the physical variation (if not too excessive), but will perceive the
"target region" as an abstract category (called a phoneme in linguistics). This is called categorical perception, in which
the native speaker is virtually deaf to the internal variations but a super-expert at detecting the minutest transgression
across the boundaries. The categorical perception has its counterpart in articulation and other aspects of speech; we may
call it categorical production. This is where the compensatory articulation comes into play: We never need to pronounce
"bull's-eye" canonical sounds. It is not only sufficient, but actually better, to hit the target region (the "category")
on the part of it that is nearest our previous target region which we just pronounced, and with any temporary constraints
that may have applied. For this reason too, nothing is better than a live teacher who makes us practice repeatedly in
chorus with all the natural and other variations, and who acts as a Quality Controller giving us immediate feedback
whenever our products happen to fall outside the stipulated limits, but generously lets us hit anywhere between the limit
lines a great number of times accompanied by cheers and encouragement. Then, by sheer statistical learning, we will
acquire a feel for the limits of native or native-like compensatory articulation. Categorical rather than precise
articulation is the goal.
Interestingly, the boundaries of a given sound are not static, but move and change depending on the surrounding sounds, the degree of stress, and other factors. On moving from one sound to the next, the speaker need only just cross the nearest boundary, not go further, thereby quickly achieving sufficient categorical (phonological) contrast with
minimal effort. This phenomenon is one of several factors of what phoneticians call coarticulation, the articulation of
adjacent sounds almost "together". Exaggerated articulation, on the other hand, is a sign of foreignness, be it ever so
perfect as such. It may even have reduced intelligibility for native listeners due to its relative lack of coarticulation.
Quality Repetition helps me achieve the natural articulation and coarticulation.
IPA transcription is a research instrument intended to show one particular pronunciation at one particular event and thus
does not reflect the natural variation. Therefore, IPA is not optimal for the purpose of teaching or practising
pronunciation. When I learn languages or teach Swedish, I very seldom use IPA. The ears are much more powerful. But
depending on the learner's experience with IPA and awareness of natural variation, it might still be a useful substitute
when a teacher isn't available.
Little by little I will start softening the sound level, more and more. Finally, I will hardly hear the sounds at all while I
still keep repeating. At that stage I will speak it almost by myself, like a native! Without the help of a teacher. But direct
feedback with comments by a live teacher with the same amount of patience would of course have been even better.
With this method I can fairly quickly learn the pronunciation, at least the prosody, of any language. I only need a few
short recordings, I edit them, and I listen to them hundreds of times. I can even have them droning from the car CD
player while I am driving, because, being repetitive, they don't distract my attention from driving, while I can still listen
attentively enough to train my ears and mirror neurons.
Initially I don't necessarily have to understand anything at all, but of course it is more fun if I can. With time I will be able
to discern more and more. I am like a little child conquering his first language, but I do it faster than a child. With my
recordings, I have no teacher who gets fatigued, no difficult letters, no boring text, no complicated grammar, no
confusing explanations. Only pronunciation, pronunciation, pronunciation, pronunciation, ... Particularly the rhythm and
intonation. When my new pronunciation is ready (!) after some time with thousands of exercises with the same small
amount of practice sentences, then it is time to move on with a good textbook and/or teacher. I will be on the
approximate level of a native 2-4 year old. That is, I will have a native or near-native prosody, as explained above. But
in addition, I will also have quite good command of most if not all the vowels and consonants, because my speech
apparatus is mature. And I have a basic vocabulary and a set of useful sentences. The front door to the new language is
wide open. I can begin functioning in a simple conversation. Fortunately, my interlocutors can't know what I do NOT
know. Thanks to my pronunciation they will think I know very much more than I actually do, even when I hesitate and
don't find the right words. They will find it natural that I'm still having some empty slots in my command of their
vocabulary... It will be easy for me to make contacts with native speakers, because they will not shun me because of my
pronunciation. They will respect me because I respect their language.
This situation, in my opinion, is far better than hurrying through a language course and superficially learning a lot, but with unbrushed prosody and pronunciation, hoping that I will deal with that later. Because the sad truth, as you may have inferred by now, is that I would most likely learn and automatize an unbrushed pronunciation that neither I nor my teacher nor any other native speaker will like, much less respect.
An advantageous spin-off effect of the Quality Repetition method is the fact that, in all languages, there are close
connections between the pronunciation and the grammar, particularly between their prosody and syntax. Hence,
focusing so hard on the pronunciation initially will also help me approach and master the grammar better later on.
I will also claim that the method I advocate here is very time-efficient, because it does not take long to master 20-30 sentences to the level I aspire to. Of course the required time is very individual, depending on many factors such as
previous experience with learning languages, time available for practice, and the difficulty of the particular language.
But I would dare say that it should take no more than 100 hours of active exercises. The other alternative, the broken
pronunciation, will take most people more than a lifetime to repair!
The scientific and empirical underpinnings for this method are sketched in my 1998 article "Accent Addition: Prosody
and Perception Facilitate Second Language Learning" (see link in the bibliography), and detailed in my 2002 book
"[Pronunciation, Language and the Brain. Theory and Methods for Language Education]" with more than 200
annotated references (sorry, only in Swedish so far). But when they were written, we didn't know as much about mirror
neurons as we do now. So the present paper is an important addition.
5 Minimal pairs
Don't ever practice much with minimal pairs! Minimal pairs are good for phonological research and for making learners
aware of crucial, phonological distinctions, such as in the vowel in ship and sheep, or the initial consonant in tin, thin
and sin. So, of course some listening practice and some pronunciation practice with minimal pairs will obviously have
to take place, but only initially, for creating the awareness. Not more. They should never be automated pairwise,
because of Hebb's principle, "neurons that fire together, wire together." That is, if the words are automated together,
they will always pop up in my mind together. Even if (or, rather, particularly if) I master the distinction to exquisite
perfection, every time I am about to say one of them in context, both of them will appear in my mind as in a multiple
choice test, I will hesitate for a fraction of a second, and distressingly often pick the wrong one. Usually, I will notice
the mistake and immediately correct myself. But there has been a break in my fluency, a totally unnecessary break that
will embarrass me every time. "Oh horror! I chose the wrong word again even though I know perfectly well..."
A conspicuous example of the destructiveness of minimal-pair exercises is the /r/ versus /l/ issue for Japanese speakers
of English; they struggle with the pair almost daily ever since they begin learning English in middle school. Even those
who are highly proficient in the language as well as in the phonetic realization of [r] and [l] fumble with them almost
every time and make many unnecessary and sometimes embarrassing mistakes. Those Japanese persons whom I met
who spoke Swedish or any other foreign language generally fared much better, making few or no such mistakes.
Presumably they did not practice l-r as minimal pairs in their other languages.
This happens not only in pronunciation but in grammar and vocabulary too, such as gender le-la in French or en-ett in
Swedish. I'm sure every reader of this paper can recognize the situation. For instance, native speakers of English have a
notorious tendency to pick the wrong alternative of their and there in writing their own language. This is not due to low
education or low IQ but more likely to Hebbian muddle-up. Their teachers will have been very meticulous about
teaching them the distinction a zillion times at school... So don't ever practice much with two similar things. Put them
each in their own natural (and different!) context, and Quality Practice one the first day, and the other one another day.
Monday: There was a cute, fluffy sheep in the barn. Wednesday: I saw their luxurious, white ship in the harbour.
Fig. 2
Fig. 3
Fig. 4
Sometimes it may be difficult to set up your computer and Audacity for recording from a microphone or from the speaker sound (e.g., from YouTube or a podcast). If so, ask someone who understands your computer to help you.
You will probably have to install a separate component to handle mp3 files. If so, follow the link and tips that may pop
up and install that component too. Or else, skip mp3 and use only wav.
Hint: When using a microphone, be sure to place it at your cheek a little bit behind the angle of your
mouth, so as not to blow air into the mic and cause a noisy recording.
NB: In most laptops the built-in mic makes rather low-quality sound, so a separate mic is recommended!
More hints: If you want to make phonetic analyses, use the wav format, not mp3. The program of first choice for phonetics is Praat (Dutch for "speech"). Praat, too, is free, extremely versatile and powerful, and is used by most phoneticians in the world. Unfortunately it is not very intuitive, but there are lots of
detailed help files, tutorials and active user groups. Download it from http://www.fon.hum.uva.nl/praat/
One very good tutorial for both Praat and phonetics is available at http://swphonetics.com/praat/ by
renowned Swedish-British phonetician Sidney Wood.
Fig. 7
Hint: When you quit the program, it will ask if you want to save changes. Always reply No! (Fig. 8) I will explain why further below.
Fig. 8
9 Zooming
Look at the View menu.
There are several alternatives for zooming in and out. (Fig. 9)
Try them out, and learn the keyboard commands! That will speed
up and simplify your work significantly.
Fig. 10
If you place the marker on the lower edge of the stereo sound channels, you can resize both channels up and down symmetrically (see Fig. 11 a and b).
Fig. 11
Fig. 12
10 Stereo or mono?
Usually mono is enough (it occupies half the space on my hard disk), so I will remove one channel.
Click the little triangle ▼ (Fig. 13) to get a drop-down menu (Fig. 14).
Choose Split Stereo to Mono, and the channels will split into two identical mono channels.
Pick either one and close it with the little cross × in its upper left corner (Fig. 15).
The other option here (Split Stereo Track) will keep the right and left channels different, as in the original (if you really used a stereo microphone). You might want to experiment with each channel separately, and then join them again. You will get funny or artistic effects!
However, for the purpose of pronunciation exercises, mono is enough, occupies the least
space on your drive, and is the best choice.
Fig. 15
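Incidentally, if you have many recordings to convert, the Split Stereo to Mono step can also be scripted outside Audacity. Here is a minimal sketch in Python using only the standard library; the function name and file paths are my own inventions, not anything from Audacity:

```python
import array
import wave

def stereo_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit stereo WAV file into one mono track,
    roughly what "Split Stereo to Mono" plus closing one channel achieves."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        rate = src.getframerate()
        # WAV stores interleaved little-endian samples: L, R, L, R, ...
        frames = array.array("h", src.readframes(src.getnframes()))
    mono = array.array("h", ((l + r) // 2 for l, r in zip(frames[::2], frames[1::2])))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)   # one channel: half the disk space
        dst.setsampwidth(2)   # still 16-bit
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Averaging the channels also keeps material that was panned to only one side; simply dropping one channel, as in the recipe above, is fine when both channels are identical anyway.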
Hint: Remember that you may Undo (Ctrl+Z) at any time, and Redo (Ctrl+Y) (and "un-undo" and "un-
redo") as many times as you like, if needed or wanted. If you ever should feel total panic, wondering
what on earth you have done, then just close the program, and as always answer No to the question if you
want to save changes! Next time you open the file, everything is as it was from the beginning. The
original recording will never be affected by our manipulations.
Hint: When you have temporarily stopped the recording during class and then start recording again, a new
track will be created below the previous one. This does not matter much, but it makes the editing
cumbersome afterwards. It is better to continue recording in the same track as before. To achieve this,
press Shift+Record (Shift+R).
Alternatively, use the Pause button instead of Stop. Then just un-pause to continue recording.
Fig. 16
Here is the result of the 21.4 dB amplification in this particular example (Fig. 19):
If we should ever want to make the sound softer, we use the same Amplify menu
but put a minus (-) in front of the dB value.
Fig. 19
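The arithmetic behind the Amplify dialog is ordinary decibel math: the default value Audacity proposes is the gain that raises the loudest sample exactly to full scale, gain_dB = 20·log10(full_scale/peak), and a negative dB value scales the samples down instead. A small sketch of that calculation (the helper names are my own, not Audacity's):

```python
import array
import math

FULL_SCALE = 32767  # loudest representable 16-bit sample

def default_amplify_db(samples):
    """The gain (in dB) that brings the loudest sample up to full scale --
    the kind of value an Amplify dialog proposes. Assumes non-silent input."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(FULL_SCALE / peak)

def amplify(samples, gain_db):
    """Scale all samples by gain_db; a negative value makes the sound softer."""
    factor = 10 ** (gain_db / 20)
    return array.array(
        "h", (max(-32768, min(32767, round(s * factor))) for s in samples)
    )
```

Note the clamping to the 16-bit range: amplifying beyond full scale would otherwise wrap around and produce harsh distortion.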
Hint: Sometimes there are spikes of artefact noises in the midst of the utterance that I want to amplify.
Then I zoom into the noise until I can delimit and select only the spike, exactly, and de-amplify it
significantly. Finally I will zoom out again and amplify the whole utterance in the usual way as described
above, the noise being gone.
Hint: After selecting, but before doing anything with the selection, I press Z on the keyboard. This will
move the edges of the selection to the nearest zero value in the amplitude curve. This essentially removes
the risk of getting irritating clicks in the manipulated result. (I press Z so often that it has become like a
subconscious reflex, even if it often is unnecessary. But it takes less than a second, and nothing can be
destroyed.)
Fig. 20 shows a very zoomed-in picture of the left edge of a selection before I pressed "Z", and Fig. 21
shows the result after "Z". Notice how the edge of the selection and the amplitude curve now cross the zero
line at the same place.
Fig. 20
Fig. 21
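For the technically curious, the idea behind the Z command can be expressed in a few lines of code. This is my own simplified illustration, not Audacity's actual implementation: scan the amplitude curve for sign changes and snap the selection edge to the nearest one.

```python
def snap_to_zero(samples, index):
    """Return the sample index of the zero crossing nearest to `index`.
    If the signal never crosses zero, `index` is returned unchanged."""
    best, best_dist = index, len(samples)
    for i in range(len(samples) - 1):
        # a crossing lies at i when the sample is zero or the sign flips at i+1
        if samples[i] == 0 or (samples[i] > 0) != (samples[i + 1] > 0):
            if abs(i - index) < best_dist:
                best, best_dist = i, abs(i - index)
    return best
```

Cutting or pasting at such a point means the waveform starts and ends at zero amplitude, which is why the clicks disappear.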
Fig. 22
If your recording has too slow a tempo, you can speed it up with a positive percentage. I do it most of the time with the
Book2 recordings, especially on the renderings in my own language. (All recordings in Book2 are intended for learners,
and then combined to be used bilingually in a great number of possible permutations.)
Remember (again) that you can always Undo (Ctrl+Z) and try other values until you are satisfied. Or just for fun!
14 Prepare sound tracks for practising with your smartphone, CD, mp3 or
computer
Let's assume that I have an audio recording from a language class, or a chat over a cup of coffee with friends, or a radio
program, or a TV drama, or an old language course on a cassette tape, or something from YouTube, or whatever, with
useful phrases that I want to practice my pronunciation with. In the following example I have chosen a little phrase
embedded in a dialogue. The phrase happens to be about 2.31 seconds long (displayed in the bottom margin; Fig. 24).
This duration is very suitable for pronunciation exercises. Remember that! About 2 seconds is the best duration for
practice sentences! Perhaps a bit longer when you are getting more advanced. I listen a couple of times with
Shift+Spacebar (= Shift+Play), and take note of its time position, which is displayed along the upper border; in this case
just before 15 seconds measured from the start (Fig. 24, upper margin). This is useful to know if the total recording is very
long and I might get lost when I zoom out...
I then press Z and modify the amplitude and tempo as above, if needed.
I also want some "air" around my practice phrase, so I will create silence before and after it. I zoom in a bit and put the
marker at the left edge of my selection, press Z and click the menu Generate → Silence (Fig. 25) and get a dialogue to
choose the duration of the silence, for example 2 seconds (Fig. 26). I do the same at the end of the selection.
Fig. 24 Fig. 25
Fig. 26
My track now looks as in Fig. 27; no sound is lost, just pushed aside by 2 seconds in each direction:
Fig. 27
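In code, the Generate → Silence step amounts to nothing more than splicing in runs of zero-valued samples. A sketch (the helper is my own, assuming 16-bit samples):

```python
import array

def pad_with_silence(samples, rate, seconds=2.0):
    """Put `seconds` of silence before and after a phrase, like applying
    Generate -> Silence at each edge of the selection."""
    pad = array.array("h", [0] * int(rate * seconds))
    return pad + array.array("h", samples) + pad
```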
Hint: Be sure now to extend the selection a little bit into the silences, particularly some 600 ms
(milliseconds) at the end. Because ca 600-800 ms (0.6-0.8 seconds) of silence between the repetitions,
neither longer nor shorter, will typically make it easy to practice in unison with the program with a
comfortable rhythm. Test this by Shift-playing your selection a couple of times, stop and adjust the
included silences and Shift-play again, until you obtain the rhythm that feels the most comfortable to you.
The next thing is to make the selection repeat itself a couple of times. Go to Effect →
Repeat... (Fig. 28) and specify the number of repetitions (Fig. 29). I often enter 5, which
will give me 6 exemplars total (Fig. 30).
Fig. 29
Fig. 28
Fig. 30
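The same Repeat logic is easy to sketch in code, here combined with the 600-800 ms breathing pause recommended above (function and parameter names are my own, for illustration only):

```python
import array

def make_drill(samples, rate, repeats=5, gap_s=0.7):
    """Concatenate the phrase plus `repeats` extra copies (so repeats=5
    yields 6 exemplars, as with Effect -> Repeat), separated by a
    ~700 ms pause for breathing and self-correction."""
    phrase = array.array("h", samples)
    gap = array.array("h", [0] * int(rate * gap_s))
    out = array.array("h", phrase)
    for _ in range(repeats):
        out += gap + phrase
    return out
```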
Hint: This 600-800 ms silent interval between the repetitions will give precise time for breathing and
contemplating how to modify one's pronunciation for the next round. Because this is all about chorus
practice together with the recording; not any "listen and say after me" as in olden times. (The "listen and
say after me" procedure is ineffective at the beginning of second-language learning; it is perhaps better a little
later on, when the pronunciation is solidly mastered.)
While my six exemplars of the practice sentence are still selected, it is time to save them. However, in Audacity we
typically don't "save" the file, but export selection. Go to menu File → Export selection... (Fig. 31):
Fig. 31
...and first choose a suitable location to save it, and then a suitable file name (for instance part of the sentence itself). I
can also choose the file format, such as MP3, WAV or other (Fig. 32):
Fig. 32
Hint: Write the track number before the file name (with a leading zero for 01-09). This will simplify the
sorting later.
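Zero-padded numbering is exactly what standard string formatting provides; a tiny illustration (the helper is my own, not an Audacity feature):

```python
def track_filename(track_no, phrase, ext="wav"):
    """Build a file name whose zero-padded track number (01-09, 10, 11, ...)
    makes alphabetical sorting identical to track order."""
    safe = "-".join(phrase.lower().split())  # e.g. "Good morning" -> "good-morning"
    return f"{track_no:02d}-{safe}.{ext}"
```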
Hint: I put my practice sentences in Dropbox directly. This will give me immediate back-ups in case of a
hard-disk crash after all this work, and best of all, I can access the most recent version of my files at once
from any other computer and my smartphone. No need for a memory stick.
If you haven't got Dropbox yet, please use this "invitation" link from me http://db.tt/tsfzycJ4, and we
will both get a little extra bonus space.
Remember when you close the program to reply No to save. We have already exported what we wanted to keep.
Extra: If you reply Yes to Save, you will save a Project, a special Audacity file that is quite big but
allows many exciting possibilities. For instance, you can annotate your recording. Or you can create
music, or sing in chorus with yourself in several different tracks while you are playing various
instruments in several other tracks. You can manipulate and mix them in innumerable ways. Professional
musicians do so. There are lots of fun things to do with Audacity. When you are ready, you concatenate
them all into a final version with two stereo channels, export them to a WAV file, burn ten CDs, and go
sell them on the Flea Market on Saturday! Or at least one CD to your mother.
15 Now YOU try! Experiment with Audacity and yourself. Nothing can go wrong!
16 More hints
For quickly and easily getting the pitch contour (aka F0 extraction) of your practice sentence(s), please use the free
program WaveSurfer. Read about it here: http://en.wikipedia.org/wiki/WaveSurfer
and download it here: http://www.spectrogramsforspeech.com/tutorials-2/software-download-2/
In the Glossika group on Facebook, some very good suggestions came up:
Alexander Giddings wrote:
It just occurred to me that the quickest and most effective way to edit the A files may be simply to use the repeat
function over each group of two target sentences (following the primer) and then the truncate silence feature over
the whole file once you are finished, which will give you a pause of exactly the same length (i.e. 600-800
milliseconds) between each repetition and between each group of repetitions. ... There is one downside, however,
which is that any sentence-internal pauses (as in the mini-dialogues) longer than the specified truncate length will be
20 Selected bibliography
Cattaneo, L., & Rizzolatti, G. (2009). The Mirror Neuron System. Archives of Neurology, 66(5), 557–560. Available at
http://archneur.jamanetwork.com/article.aspx?articleid=796996
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The Role of Deliberate Practice in the Acquisition of Expert
Performance. Psychological Review, 100(3), 363–406. Available at:
http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice(PsychologicalReview).pdf
Ericsson, K. A. (2000). How experts attain and maintain superior performance: Implications for the enhancement of
skilled performance in older individuals. Journal of Aging and Physical Activity, 8, 346-352. (Updated excerpt
available at: http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html or
http://www.freezepage.com/1404355998UGCCCQIQAR)
Hurford, J. R. (2002). Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution. In
D. Kimbrough Oller, U. Griebel, & K. Plunkett, eds. The Evolution of Communication Systems: A Comparative
Approach. Cambridge, MA: MIT Press. Available at: http://www.lel.ed.ac.uk/~jim/mirrormit.pdf.
Kjellin, O. (1977). Observations on consonant types and “tone” in Tibetan. Journal of Phonetics, 5, 317–338.
Kjellin, O. (1999). Accent Addition: Prosody and Perception Facilitate Second Language Learning. In O. Fujimura, B.
D. Joseph, & B. Palek, eds. Linguistics and Phonetics Conference 1998 (LP’98). Columbus, Ohio: The Karolinum
Press, pp. 1–25. Available at: http://olle-kjellin.com/SpeechDoctor/ProcLP98.html. (Recommended reading!)
Kjellin, O. (2002). Uttalet, språket och hjärnan. Teori och metodik för språkundervisningen [Pronunciation, Language
and the Brain. Theory and Methods for Language Education]. [Swedish] Uppsala: Hallgren och Fallgren Studieförlag
AB.
Rizzolatti, G. (2005). The mirror neuron system and its function in humans. Anatomy and Embryology, 210(5-6), 419–
21. Available at: http://link.springer.com/article/10.1007/s00429-005-0039-z?LI=true
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. WIREs Cogn Sci. Retrieved May
14, 2012, from http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WCS78.html
Skoyles, J.R. (1998). Speech phones are a replication code. Medical Hypotheses, (50), pp.167–173. Available at:
http://human-existence.com/publications/Medical Hypotheses 98 Skoyles Phones.pdf.
Tettamanti, M. et al. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of
cognitive neuroscience, 17(2), pp. 273–81. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15811239.
***