Practise Pronunciation with Audacity
Olle Kjellin, 2015
Are you learning a new language? Do you, like me, have the ambition to learn it well, to sound as "native" as
possible, or at least to have a listener-friendly pronunciation that will not embarrass you or annoy the native
speakers? This paper will show you how to achieve that, and explain why it is possible, even if you are not a
child. In these 21 pages with their 34 illustrations you will learn how to:
• Produce perfect pronunciation exercises with your favourite sentences for free.
• Practice in the way that will give you the best results: for example, perfect pronunciation, if you wish.
1 Introduction
There is as yet, to my knowledge, no freely or commercially available pronunciation practice material that is "best" for
my purpose. So I produce my own material, and so could you. It is easy with Audacity, a very powerful free
program for recording and editing sounds. It costs time, to be sure, but it is time well spent: it not only yields really
good results for my pronunciation exercises, it also makes learning faster. This tutorial will show both how to utilize
Audacity and how best to perform the exercises, and why, according to my knowledge and experience.1 Hopefully, it
will suit you too.
Among commercially available language courses, Pimsleur and Rosetta Stone, for instance, make really good ones. I
have several of them. But I still want to add my own modifications to make them even better, or to supplement them with other
material that I make myself according to the guidelines in this tutorial. As you will soon see, I practice without any text
in the beginning, in accordance with recommendations based on research as well as on my own experience, because
most writing systems do not represent the pronunciation well enough, but rather confuse the learner and lead to
faulty, "broken" pronunciation. If you still do want written support, you should learn the IPA (International Phonetic
Alphabet, http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and try the Glossika method
(http://www.glossika.com/).2
For advanced learners of English, Richard Cauldwell's Speech in Action http://www.speechinaction.com/ is an
unsurpassed source.
If you don't want to pay for CDs or online courses, there are some quite good free materials, too. If you have native-speaking friends to record, do so. Otherwise, I can recommend book2 from Goethe-Verlag (http://www.goethe-verlag.com/book2/). I often download their sound files and modify them as below for my own language studies.
However, beware of the amateurish materials that abound on the Internet. Most of them are quite bad; some are even
incorrect.
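When you have a good model recording, the first step in the method below is simply to listen to it many, many times. In Audacity that is a matter of copying a sentence and pasting it repeatedly; as a minimal stand-alone sketch of the same idea, here is a small Python script using only the standard library. The file names are of course just placeholders, and the zero-byte silence trick assumes the common 16-bit PCM WAV format:

```python
import wave

def loop_sentence(src_path, dst_path, repeats=10, gap_s=0.5):
    """Write a practice file that repeats one model sentence
    `repeats` times, with a short silent gap after each repetition."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Silence = zero-valued samples (true silence for 16-bit signed PCM).
    gap = b"\x00" * int(params.framerate * gap_s) * params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        for _ in range(repeats):
            dst.writeframes(frames)
            dst.writeframes(gap)

# Example (hypothetical file names):
# loop_sentence("model_sentence.wav", "practice_loop.wav", repeats=20)
```

The later sections of this tutorial show how to do the equivalent interactively in Audacity, which also lets you trim, amplify and inspect the sentence while you work.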
This paper was born from my Swedish tutorial on Audacity, originally written as a handout for participants in my
pronunciation classes. It had only a brief description of the practice method at the end, but many readers wanted more
of that kind, and they wanted it at the beginning, so here it is! As a testimonial to the success of the Quality
Repetition method, I might mention that many of my pronunciation-class participants thought the course was too short,
regardless of education level (even many MDs and other academics who were dissatisfied with their Swedish
pronunciation), and despite the fact that the courses were a whole week long: about 35 hours consisting mainly of
intensive chorus and individual practice on just 12 representative sentences, namely how to say the participants' own street
addresses! And all of them were quite angry that they had not been given this chance to pronounce correctly from the
beginning.
The practice method described here is essentially the same deliberate practice used to master music, sports, dancing, typing, operating brains, reading x-rays, writing calligraphy, flower arrangement, or whatever skill you want to
acquire. It's not a unique method at all, but rather self-evident to elite performers in all those areas, so it is doubtful whether I
could call it "my" method. But sadly, deliberate practice has been out of fashion in language (and mathematics)
pedagogy for decades! It has been scorned as "Skinnerism" or whatever. This is a very unfortunate situation, and I
want to turn it back to normal again. Deliberate, persistent practice in a special way, to be described below and termed
Quality Repetition, is "my method" or, in fact, everybody's method, known since prehistoric times. It is effective because
it is based on neurophysiology. Toddlers do it all the time in their own, innate, smart way when they acquire their first
language. (Or languages; there are no physiological limits, only practical ones, to the number of languages that can
be acquired in parallel.) Adults are well advised to peek at toddlers' "methods" and adapt them to their own capabilities
and limitations. This paper will show how you can do that.
Teachers usually teach the alphabet and the grammar well and carefully, but seldom teach pronunciation to a sufficient
degree. Many of them even think it unnecessary to practice pronunciation with adult learners, in the (false) belief that
they will never succeed anyway. This applies particularly to prosody (the rhythm and intonation of speech), which is often
alleged to be "the most difficult" thing to learn in a new language, although it is arguably the most important thing to
learn if you want a listener-friendly pronunciation with good communicative function.
Is it really true, then, that L2 (second-language) pronunciation is so difficult to learn, or that prosody is
particularly difficult? No, on the contrary! Not only is it quite possible to learn excellent L2 pronunciation,
equal to, or not very different from, native pronunciation, but prosody is even the easiest part! This claim of mine
is based on my long experience as a language learner and teacher, coupled with my medical training focused on the
physiology of the voice and speech organs and of the brain and neuromuscular system in learning and forgetting. There
is plenty of scientific evidence (though mainly in the medical literature); see the selected bibliography in section 20 on
page 20.
It has become more and more recognized among language teachers in recent decades that speech prosody is
overwhelmingly and undeniably the most important factor in reaching a near-native, or at least a listener-friendly,
pronunciation. Prosody is to speech what the carrier wave is to a radio transmission. The "program" is
superimposed on the carrier wave, and the wave as such should not normally be perceived consciously. Therein lies
another great potential and important function of prosody: by suddenly varying the pitch, loudness or length of sounds
and words in unexpected ways, i.e. by adding emphasis, the speaker can choose to bring prosody up to a conscious and
conspicuous level and attract the listener's attention to the paralinguistic contents of the message. This corresponds to
italics, boldface, etc. in writing.
The prosody of any language typically consists of fewer than ten or so rules based on only three fundamental elements
that every (yes, every!) language uses in its particular prosody: voice pitch, voice loudness, and length of sounds.
These three mechanisms are well developed from the moment of birth (listen to a baby!), they work in the same way for
all human beings, and there are only partial differences in the details of how they are controlled and utilized in different
languages, in varying proportions of importance per each specific language. For example, what may be called "stress"
or prominence is often signalled by a certain pitch and/or loudness variation in the stressed syllable (as in Spanish,
Hungarian or Finnish), or on the pre-stress syllable (as quite often in Polish, Russian and maybe French), often
accompanied by a slight lengthening of the stressed syllable (as in Russian and Spanish) or a significant lengthening (as
in French and English), or signalled almost only by the length (as in Swedish), whereas length has nothing at all to do
with stress in some other languages (such as Finnish, Hungarian, Czech). And some languages don't even use "stress"
but have other means of prosodic signalling (such as Japanese, Somali, and maybe French). Pitch is used to signal the
morphological structure of words in some languages (such as Swedish, Japanese and Tibetan), or to signal lexical
identity in other languages (such as Chinese, Thai, Vietnamese and many African and Native American languages).
Common to all these uses still is that, regardless of language, they involve the very same three fundamental elements ─
pitch, loudness and length ─ to signal all those lexical, grammatical, emotional and other characteristics involved in the
spoken conversation. And in every culture there are songs, and songs too consist of notes in sequences with varying
pitches, loudnesses and lengths. So indeed, each one of us above toddler age already masters all the prosodic means
being used in any other language; we just have to learn how to tweak our skills for the specific details of the new,
particular language we are learning. And please do carefully note: All prosodic uses of pitch, loudness and length appear
in each and every utterance, so it is a very good and time-efficient idea to concentrate mainly on the prosody from the
very outset of learning a new language. Don't care too much about the particulars of vowels and consonants until you
feel confident with the prosody.
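If you want to see for yourself that pitch, loudness and length really are plain measurable quantities, each can be estimated from a recording in a few lines of code. The sketch below is a toy illustration in standard-library Python (a crude autocorrelation pitch estimate over a whole file, not a production analysis tool; dedicated software such as Praat does this far better):

```python
import array
import math
import wave

def prosody_profile(path):
    """Very rough estimates of the three prosodic primitives for a short
    mono 16-bit WAV file: length (s), loudness (RMS), pitch (Hz)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    length = len(samples) / rate                      # duration in seconds
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Pitch: lag of the strongest autocorrelation peak, searched over
    # lags corresponding to 50-400 Hz (a typical voice range).
    best_lag, best_c = 0, 0.0
    for lag in range(rate // 400, rate // 50):
        c = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if c > best_c:
            best_lag, best_c = lag, c
    pitch = rate / best_lag if best_lag else 0.0
    return length, rms, pitch
```

Applied to a recorded model sentence, the three returned numbers are exactly the raw material of prosody: how long, how loud, and at what pitch.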
In contrast to the small number of prosodic details, there are typically some 30-40 vowel and consonant sounds (some
languages have fewer, some have more, some have considerably more), but not all of them appear every time, in
every utterance. So they are indeed less important than the prosody, at least in the beginning. You can see proof
of that in children's first-language acquisition: by the time toddlers can say 25-30 words in their emerging language,
their prosody is already identical to that of the adults around them. However, it will usually take some 5-6 years or even more
© Olle Kjellin 2015: Practise-Pronunciation-w-Audacity 3/21
May be updated at any time; this is version 1.2, last edited on May 24, 2015 at 16:26:42
before they can master all the vowels and consonants. Despite this, they are never perceived as having any “foreign
accent” – thanks to their correct prosody! Therefore, I always practice prosody first and foremost, even if my tongue
will stumble on many individual vowels and consonants that may pop up every now and then.
But how do you do it, then, if you have no teacher to help you? The answer is in this paper. Produce your own materials
for pronunciation exercises and follow my advice here! Read more about the methodology and its neurophysiological
foundations in the next few sections; then come the hands-on instructions for the use of Audacity, from section 7 onwards.
The short version: In the way to be described below I will train my ears with the correct speech rhythm and
melody according to the model and saturate my brain's primary hearing centres as well as its hearing
perception centres with it. I should not torture my ears by making them hear me speak with a faulty accent (as I would do
in the beginning, if I didn't saturate my ears first). Subconsciously and gradually, by shadowing, mirroring
and imitation, I will train and automatize my mirror neurons (imitation neurons), which are then to guide my
speech muscles to my own pronunciation, when, eventually, I start saying the phrases myself without help.
In this process my brain will actually be physically changed due to its plasticity. This is learning on the
neuroanatomical scale. My brain will very effectively connect and match the sounds that I hear with the
sounds that I make and the sounds that I should make. Therefore, I should not trouble my speech muscles to
learn first to speak with a funny pronunciation (as, again, I would do in the beginning, if I didn't saturate my
ears first). Instead, I will first make the (correct) model utterances resound as a template din in my head, and
that will direct my speech muscles accordingly. It will then even be difficult for me to pronounce much
differently from the model. (Incidentally, this is also how our native, first-language speech is mirrored,
acquired, controlled and monitored, the speech muscles then being guided by internally “hearing” and
predicting how the result would and should sound for a given articulation. More than 50,000 years of human
language evolution cannot be wrong.)
In conclusion: I will practice pronunciation with my ears and let automated nerve reflexes do the rest. I will
then have created an “audio-motor procedural memory” for the target language, with a result as native-like
as I have the time and motivation to aspire to.
A) Hearing
The primary hearing centres are neuronal arrays situated in the temporal lobes, bilaterally (both sides), also called the
auditory cortex. They belong mainly to the sensory system. The auditory nerves from both ears are connected to the
brain stem, and then relayed in a series of neurons (nerve cells) to the primary hearing areas and also to other places,
bilaterally. About 60% of the nerve fibres from one ear cross over to the other side (i.e., from the left ear to the right
temporal lobe, and vice versa), while some 40% remain on the same side. Some of the crossed pathways cross back
again after relaying their signals to various other locations and reflex circuits, for example for directional hearing and
head-turning reflexes towards a sound. See the schematic pictures at http://bit.ly/auditory-path-1 and http://bit.ly/auditory-path-2.
A useful reflex is the Lombard reflex, which causes me to speak louder in a loud environment. Replaying my material
loudly will activate my speech organs better than replaying it softly. The auditory system is replete with reflexes of various kinds.
Speech may seem to be a sequence of distinct words, each made up of distinct sounds. In reality, however, speech is a
continuous stream of interwoven sounds. Sounds are caused by pressure waves in a medium. The words from a speaker
as well as all other natural sounds and noises from around us are an extremely complex mixture of regular and irregular
waves (vibrations) in air travelling to our eardrums. Pure physics. An intricate mechanism amplifies the air waves in the
ear while transforming them to water waves in the inner ear and then actually zooms in on speech-relevant vibrations
(particularly those pertaining to the speaker's speech rhythm), synchronizes with them, performs a basic sorting of
relevant sounds, filters out non-speech sounds, and converts all into electrochemical signals in the neurons leading to
the brain, where they are further sorted into higher-order categories of many kinds (phonetical, phonological,
morphological, syntactical, lexical, semantical, etc.), by which they can be identified and hopefully correctly
comprehended in their particular context.3 Surprisingly, the auditory nerve contains many more efferent (motor) fibres
than afferent (sensory) ones! The sorting and filtering mechanisms of the inner ear are, however, dependent on this
arrangement.
The primary hearing centres register the physical characteristics of the incoming signals, such as pitch, loudness and
length, and map them tonotopically along the cerebral cortex. Tonotopical mapping means that low pitch is at one end
and high pitch at the other, like the keys on a piano, in a simple, straightforward array. It is nicely illustrated at
http://bit.ly/tonotopic. A corresponding "periodotopic" mapping probably exists for the temporal (timing) aspects of
sounds too.
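The piano-keyboard analogy can even be put into a formula. Greenwood's (1990) frequency-position function is a commonly cited fit for where along the human basilar membrane a given frequency peaks; the constants below are Greenwood's published human values, assumed correct here:

```python
import math

# Greenwood's (1990) human frequency-position fit:
#   f(x) = 165.4 * (10 ** (2.1 * x) - 0.88)
# where x is the fractional distance from the cochlear apex (0) to the
# base (1). Solving for x gives the "piano key" position of a frequency.
def place_for_frequency(f_hz):
    return math.log10(f_hz / 165.4 + 0.88) / 2.1

# place_for_frequency(440) is about 0.26, i.e. concert-pitch A peaks
# roughly a quarter of the way up from the apex.
```

Low frequencies map near the apex, high frequencies near the base, in the same orderly one-dimensional array that the cortex then mirrors tonotopically.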
From the primary auditory cortex, signals are relayed on to higher-order hearing-perception and comprehension centres
(see B, below), and to mirror neurons (see D, below). The pathways to the mirror neurons are the shorter and faster ones, which
has important implications for our practice method. Presumably, the efferent fibres mentioned above are connected
with the mirror neurons.
We hear and perceive our own words in three different ways. The first is by air conduction: the sounds from our mouth
go around the cheeks into the ears and are converted as above. The second is by bone conduction: the waves travel
directly through the soft tissues and bone into the inner ear. This is louder and much faster than air conduction, not only
because the route is so much shorter and even bypasses the eardrum and middle ear, but also because sound travels
more than four times faster in water and solids than in air, and with additional components. So we really can't know how
our own air-conducted speech sounds until we listen to a recording. Some people don't like to hear themselves in a
recording, but that is how we "really" sound to other people, like it or not. The bone conduction pathway enables a very
fast route for auditory feedback, which is very important for the pre- and subconscious monitoring of what we are
saying while we talk. (As for articulatory feedback, there are also feedback loops through proprioception, i.e., senses of
muscular and joint positions and movements. However, although not unimportant, the proprioceptive routes in general
are too slow for real-time feedback.)
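The "more than four times faster" figure is easy to verify with round handbook values for the speed of sound; the numbers below are standard physics approximations, not taken from this paper:

```python
# Round handbook values for the speed of sound (m/s):
v_air = 343.0     # dry air at 20 °C
v_water = 1482.0  # water at 20 °C
v_bone = 3500.0   # cortical bone, approximate

print(round(v_water / v_air, 1))  # 4.3 -- "more than four times faster"
print(round(v_bone / v_air, 1))   # 10.2 -- faster still through bone itself
```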
The third way of perceiving our own words is psychological: we "know" what we said, because we wanted to say it. This
is usually correct, but occasionally the mouth says it wrong, or even says another word. In most cases we can
correct ourselves immediately, but at times the mistake goes undetected, and we can swear we were right
even when we were not. Only a recording can then reveal the truth. Incidentally, this can also happen when
perceiving another person's speech. We may hear only what we expected to hear, or what we could comprehend, and we
can honestly swear we are right even when we are not. Again, only a recording can settle the issue.
B) Perception
The brain's hearing perception "centres" also belong to the sensory system and are responsible for how we understand
speech and language. They are vast, complex, intertwined systems of nerve circuits and networks, mainly distributed in
the parietal lobes and around the angles between the temporal and parietal lobes (Wernicke's area). These centres,
circuits and networks continuously exchange information with one another, with the primary auditory centres,
and with the mirror neurons (see below) across both the right and the left brain, and with innumerable other
networks that, all taken together, represent functions for speech, language, memory, emotions, etc.
Please note: No brain is “half”, even if they are called hemispheres. The brains are a paired organ, just like the eyes,
ears, kidneys, lungs, hands, feet, etc., none of which are “halves”. And like all the other paired organs, both brains can
perform the same actions simultaneously. In some special cases, however, conflicts might arise if both brains competed
over what to do. So, through thick bundles of extra fibres between them (the corpus callosum), the right and left brain
communicate, discuss, and decide which side is to do what, and how much. As a result, one side may become well trained
(dominant) and the other side "ring-rusty" or even inhibited (dominated!), but nevertheless always prepared to jump in
and substitute if the dominant side should falter.
Don't ever believe in the urban myths about right-left brain separation of tasks. Some of them may be a little true to a
3 People with hearing impairment using hearing aids or cochlear implants can't sort out sounds like that in their inner ear. A noisy environment or
several speakers talking simultaneously will impose great difficulties on them.
certain extent, but not in the way they are presented by non-experts in the media. The differences in "lateralization" and
dominance often amount to only a few percent of the total, bilateral activity. Speech and language are such highly
specialized and finely trained functions that the non-dominant brain is less prepared than for other functions ─ but not
unable ─ to jump in and substitute, for example after an injury. In the majority of people the left brain dominates for
many aspects of language, while the right brain is usually dominant for prosodic factors. However, both the right and
the left brain do indeed cooperate all the time, even in language and speech.
Eating and drinking utilize the same anatomical structures in the face, mouth and pharynx as speech, but the
controlling neural networks are different from those of speech, and left-right dominance is random, at about 50:50. So, in
case of a unilateral stroke, about half of the patients get swallowing problems, depending on whether the stroke is on the
swallow-dominant side or not. In most of these cases, however, the swallowing functions return more or less completely
within about 3-4 months. This is due to the brain's plasticity (see next section), by which the intact, previously non-dominant
brain re-learns to control swallowing. The same goes for other lost functions when they resolve after some time, though often
intensive and extensive rehabilitation is needed to wake up, coach and exercise the substituting brain.
C) Plasticity
Learning, and getting results from training, is only possible thanks to the plasticity of the brains: their ability
to adapt, reorganize connections, change, and even grow anatomically, in response to incoming stimuli and identified
needs, in effect relocating functions between the two of the pair as well as within each brain separately. This is one of
the most fascinating functions of the brains; it happens very fast, and it occurs in both the sensory and the motor system.
And it is not necessary to have had a stroke to induce plasticity; it is a normal function in all brains, at all ages!
A connection between two neurons is called a synapse. Plasticity primarily affects the number of synapses. On
average, each neuron has input synapses from about 10,000 other neurons and constantly receives various signals from
all of them, some excitatory, some inhibitory. When a neuron has accumulated enough signals of the certain kind it is
specialized for, it "fires" and sends a signal on through its output synapses to, again, some 10,000 other nerve cells. One
adult pair of brains has about 100 billion (100,000,000,000) neurons. Multiply these three factors and you find that this is
indeed a huge network of some ten billion billion (10,000,000,000,000,000,000) connections. In comparison, the
World Wide Web is rather a tiny network: available Internet statistics from May 2015 say there are only some 3.1 billion
users in the world right now (http://www.internetlivestats.com/watch/internet-users/).
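The arithmetic behind that figure, spelled out (the counts themselves are the text's round estimates, not measurements):

```python
# The text's round estimates of brain connectivity:
neurons = 100_000_000_000        # ~100 billion neurons in an adult pair of brains
inputs_per_neuron = 10_000       # ~10,000 input synapses per neuron
outputs_per_neuron = 10_000      # ~10,000 output synapses per neuron

connections = neurons * inputs_per_neuron * outputs_per_neuron
print(connections)  # 10000000000000000000 = ten billion billion (10**19)
```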
At birth, we are even bestowed with some 200 billion neurons, but with rather few synapses. In response to any and
all incoming stimuli and physical activities of the child, however, zillions of new synapses are formed each minute and connect
all the involved neurons. To accommodate all the new synapses, the neurons form extensive systems of branches and
twigs, in a process called arborization. It is therefore very important to present as many different stimuli as possible to a
child from birth to adulthood, to promote arborization and synapse formation. The more modalities of different kinds
that are involved and coupled (eyes, ears, hands, body movements, right side, left side, etc.) in motor pattern formation,
the better and more robust the skills and long-term memories will be. "Neurons that fire together, wire together"
(Hebb's principle). This too has pedagogical implications, because the same applies at all ages. For example, we should
practice prosody complete with suitable body language.
Unused neurons are weeded out or made dormant. For instance, a newborn baby surprisingly has pathways from the
primary auditory centres in the temporal lobes to the visual centres in the occipital lobes, but since such pathways are
generally not needed, they will shrink and almost disappear. Unless the child is blind, of course, in which case these
pathways are retained, and retrained, so that the visual centres, which would otherwise have been unemployed, are
used for auditory tasks instead. That is an example of plasticity. Even blindness acquired in adulthood
will induce similar activation of the visual centres by auditory input. That might seem to be impressive plasticity
indeed. However, every instance of normal learning of anything at all is accomplished through these same plasticity
mechanisms, and they work perfectly throughout our entire lifetime! This is very encouraging news.
In response to a new stimulus it takes only seconds for small "knobs" (dendritic spines) to form on the branches of
neurons. This time-lapse video of knob formation https://www.youtube.com/watch?v=s9-fNLs-arc illustrates learning
on the scale of a single neuron! If the stimulus is not repeated, the new knobs will disappear. If the stimulus is repeated
sufficiently many times, the knobs will develop further and form permanent synapses and wire together all neurons that
happened to be involved in that task, for instance the pronunciation of a new speech sound, or a whole sentence with
correct rhythm and intonation pattern with concomitant body movements and gestures. The results are long-term
memories. Such wired-together networks may be re-used, in whole or in part, in the formation of yet other networks, and
hence assist in recall, cueing, and mental associations of all kinds. All this is the neurophysiological rationale for multi-modal,
multiple repetitions in any learning process. Unfortunately, there is no shortcut to learning and long-term
memory, only repetitive work: deliberate, persistent, repetitive practice.
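The fate of those knobs can be caricatured in a few lines of code. This toy model is purely illustrative (the growth, decay and threshold numbers are invented), but it captures the point: stimuli repeated often enough wire in permanently, while sparse ones are pruned:

```python
def synapse_fate(stimuli, growth=1.0, decay=0.5, threshold=5.0):
    """Toy model of dendritic-spine fate: every repeated stimulus adds
    `growth` to a nascent knob; every quiet step it decays; a knob whose
    strength reaches `threshold` is treated as permanently wired.
    All numbers are invented for illustration only."""
    strength = 0.0
    for active in stimuli:
        if strength >= threshold:
            return "permanent"
        strength = strength + growth if active else max(0.0, strength - decay)
    return "permanent" if strength >= threshold else "pruned"

# Six consecutive repetitions wire the synapse in;
# one unrepeated stimulus fades away and is pruned.
```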
Ever since we start speaking as toddlers and throughout all our lives, every time we say anything at all, every utterance
will serve as an instance of practice that will form new synapses and thus further consolidate and reinforce our speech
habits as represented in our mirror neurons. And so we all become super experts in all the procedures involved in
hearing and speaking our first language(s). The robustness of procedural memory, and of long-term memory in
general, is a linear effect of the number of repetitions. It is statistical learning.
Thus, procedural memories for skilled actions form like paths in a lawn: They emerge wherever you tread frequently
enough, nowhere else. But fortunately, there is no best-before date for plasticity. As we grow older, we will, in many
cases (but not all, depending on the type of task), need more repetitions per item to learn it and automatize it than at
younger ages. That is the only age effect. And there is no neurophysiological difference between language learning and
any other type of motor learning. So forget the disheartening myths about age and language learning, at least as
concerns pronunciation. (They may hold for grammar, which usually is more complicated.) Just repeat a larger number
of times if you are "older". And be sure to get it right from the beginning, to avoid arborization and synapse
formation for unwanted pronunciation, because wrong pronunciation too will induce all these plasticity processes in the
same way and end up stored as motor "skills" in your long-term procedural memory. You don't really aspire to that.
Fossilization in second-language users (i.e. a petrified foreign accent in spite of many years' use of the new language) is
due more to faulty instruction and insufficient training at the beginner's level than to any biological constraints, and is thus
preventable (if you do want to prevent it). Due to the time handicap of adult learners, there is little chance for us ever
to catch up with a native speaker in every respect, but it is indeed perfectly feasible to sound like one in the limited
number of sentences we are able to say.
In experimental conditions it has been found that automating a new (simple) motor skill takes about 15 minutes. Can
you practice the same sentence for 15 minutes? It seems like a good idea to do so. Depending on the difficulty
of the task and your previous experience with similar skills, of course, it may take more or less time than that to
learn a new motor pattern. For example, the 15 click consonants in Zulu are quite a challenge for English speakers, but
presumably easy-peasy for Xhosa speakers (who have 21 click consonants). When, however, you can say 20-30
sentences in a native or near-native way in your new language, after hours of deliberate, persistent practice on only
them, you will also be able to say 20-30 million other sentences in the same way, because they all follow the same rules
of prosody and pronunciation. So part of the trick for the adult language learner is to have a very limited curriculum for
the initial pronunciation training period.
D) Mirror neurons
Our pair of brains contains numerous mirror neurons, also called imitation neurons. Discovered only in the late 20th
century, their functions are highly relevant for language learning and acquisition, and this may be the most fascinating
area of recent research in neuroscience. The human mirror system is involved in understanding others' actions and the
intentions behind them, and it underlies mechanisms of observational learning. Research on the mechanisms involved in
learning by imitation has shown that there is strong activation of the mirror system during new motor pattern
formation. It has been suggested that the mirror-neuron system is the basic mechanism from which language developed.
Some functional deficits typical of autism spectrum disorder, such as deficits in imitation, emotional empathy, and
attributing intentions to others, seem to have a clear counterpart in the functions of the mirror system.
The mirror neurons belong to the motor system. They are motor neurons primarily involved in finely tuned muscular
actions, movements and procedures. But secondarily, they are also recruited when we observe other people perform
similar actions and procedures with which we ourselves already have prior experience and interest. In essence, mirror
neurons are a kind of action and pattern recognition mechanism essential for the perception and appreciation of what
other people are doing, saying, or intending. Therefore the mirror neurons are also crucially involved when we want to
shadow, mirror and imitate what others do or say, such as the teacher in a language class. Our ability for, and agility in,
such action recognition, mirroring and imitation depends heavily on the mirror neurons' prior experience of the same
sort, and to some extent on our motivation and desire to perceive the signals. Learning of motor skills is the result of
inducing the formation of new mirror-neuron networks by plasticity processes. The amount of mirror activation
correlates with the degree of our motor skill for that action. Experiments have shown an increase in mirror activation
over time in people who underwent a period of motor training in which they became skilful. It works after brain injuries
too; data on plasticity induced by motor observation provide a conceptual basis for application of action-observation
protocols in stroke rehabilitation.
Since all of us adults already have ample experience and skill in speaking as such (in our first language), our mirror
neurons are ready to recognize, mirror and imitate the new language almost directly (after due listening practice, as
above; otherwise not). This is in stark contrast to a pre-linguistic toddler, who has to train both his mirror neurons and
his speech organs from scratch, which takes many times longer than for adults. (Small children do not necessarily learn
languages more quickly than adults, except for the fact that they usually spend far more practice time per day on it than
adults.)
© Olle Kjellin 2015: Practise-Pronunciation-w-Audacity 7/21
May be updated at any time; this is version 1.2, last edited on May 24, 2015 at 16:26:42
One small handicap we have as adult learners is that our mirror neurons are heavily biased in favour of our first
language(s), so they will tend only to "recognize" and do what they already know or think they should expect (the
action recognition function). That is, they may miss many details and get a more or less distorted picture that better
conforms with their experience. Deaf by preconceptions. This happens particularly if we start reading too early in the course. Learning a new language should initially be done without reference to the writing, because the letters (if based on the same script system as ours, or transcribed into our script) will in all likelihood signal their usual meanings to us, namely the sounds of our own native language. This will lead to suboptimal perception, suboptimal recognition, and suboptimal imitation of the new details, the situation we call "foreign accent". To avoid this, we would need a teacher pointing out the details and giving immediate feedback, so that the learner can perceive and modify his pronunciation habits correctly, in accordance with the patterns of the new language. However, since we are already super-elite players of our
speech instruments as such, this actually is no big deal, but we do need to get the detailed information and pay much
attention to it until our new pronunciation becomes automatic and starts working subconsciously. We are better than
parrots. We use both quality and quantity for learning. So, in addition to a teacher, we need extensive and deliberate
listening practice, as recommended in this paper. If you have no teacher, studying phonetics is a good option. Also if
you do have a teacher. And compulsory if you are a teacher.
The actions of mirror neurons are subconscious most of the time, but sometimes they surface in comical ways:
Examples that everybody surely has experienced are when we are watching a soccer game on TV and feel twitches in
our own legs as if to try to kick the ball; or when we are listening to a person with a hoarse voice and feel urged to clear
our own throats. The latter example is due to the fact that there are direct neuronal pathways from the primary auditory
cortex (in the temporal lobes) to those mirror neurons (in the frontal lobes) that monitor and control the speech and
voice muscles. These direct pathways do not involve understanding of the contents! This makes it very fast to shadow
or mirror what somebody is saying, even before you know what s/he is saying. This also makes it very effective to
practice pronunciation in chorus with your class, or in unison with your recording, because your mirror-neuron system
will compel your speech and voice muscles to act according to the loud and overwhelming auditory input. This will
push you into getting a native-like rhythm and intonation, virtually without even a chance of getting it wrong. You will
like that!
Indeed, experiments have confirmed that coupling observation and execution significantly increases plasticity in the
motor cortex. After a training period in which participants simultaneously performed and observed congruent
movements, there was a potentiation of the learning effect. "Observation" here might mean only the auditory input, but
best of all would be a live teacher, whose lip shapes, facial expressions, gestures and all body language could be
observed and mimicked.
All of this, all that is known about mirror neurons in speech-related activities, lends very strong neurophysiological support to the method advocated in this paper, in which we practice multimodally, multitudinous times, in chorus
along with the teacher and class or a recording. We call it Quality Repetition. (This term was coined by Judy B.
Gilbert, well-known author of many books on English pronunciation for foreign or immigrant learners, when we gave
workshops together long ago. Judy also introduced the use of a big rubber band to indicate the long sounds of English.
This is more than a toy thingy, it is the powerful addition of another modality, vision, to the exercises. It will
significantly increase the neuronal traffic between the left and right brain and assist in making that detail (length)
more salient and robust in the learners' procedural memory. I use the rubber band extensively in my Swedish classes
too.)
Most mirror neurons seem to be distributed in the frontal lobes, which are the "headquarters" of motor activities.
Neuronal networks involved in speech and facial expressions are concentrated in Broca's area (and its homologue on
the non-dominant side) where there is a concentration of mirror neurons. Actually, these mirror neurons for speech also
monitor the results of their own speech by continuous, real-time mirroring and monitoring our own spoken output. That
is, they compare what they hear us ourselves say, with the memory of what they think we should say and should sound
like. This enables us to modify our speech on the fly, should the need arise due to some temporary constraint, such as if we are chewing gum at the same time, have a congested nose, are whispering or shouting, or anything else that forces the speech muscles to act differently from the usual ways. This is called compensatory articulation, in which we
can instantly modify, adapt and correct our articulation by result-guided processes based on the audio-motor
procedural memory stored with our mirror neurons. "Audio-motor" = the coupling of sounds and speech gestures. 4 All
motor movements (including vocal ones) are organised around goals.
Actually, there is always a natural variation in our pronunciation of any sound or sound sequence, not only depending
on such factors as the degree of stress, the surrounding sounds, how much air we have left in the lungs, etc., but also
random variation because we are only human. Not least, there are immense anatomical differences between
individuals. Some of us are small, some are big, some are adults, some are children, some are males, some are females,
4 Of course, there is also input from sensor organs of touch in lips, tongue and pharynx, and proprioceptive information of muscular and joint
positions and movements, but "audio-sensory-propriocipio-motor" would be too cumbersome a word. Let "audio-motor" cover it all.
some are skinny, some are obese. Some have high-domed palates, some have flat palates, some have narrow palates,
some have wide palates, some have big chins, some have small chins, some have their teeth pointing this way, some
have their teeth pointing that way, some have all their teeth, some have not. Etc. etc. All these factors give our vocal tracts very different acoustical properties, yet everybody is still able to produce virtually the same acoustic output as everybody else speaking the same language. Speech is independent of anatomical differences between vocal apparatuses. In effect, each individual has his own full set of goal-oriented basic compensatory articulations to accommodate his particular
anatomy to achieve the same acoustic result as the other people around. One could say that we are all experts on
applied, acoustical phonetics. This competence pivots on the auditory-goal-guided processes of the audio-motor
procedural memory. And this is why we have to train our ears first in learning a new language (just as we did for our
first language).
An important thing to know is that "a pronunciation" is not a kind of event in which we hit a canonical bull's eye in the
middle of a target; never so. It is rather a whole cloud of permitted variants around that bull's eye on a multifaceted,
polygonal target slate, or region, bordering on and bumping into its surrounding sounds, much like the electron cloud
around an atom. For the atom, it is totally unimportant where the electron is, as long as it stays with that atom. For the
listener of speech, it is totally unimportant where in the target region the speaker hits, as long as it is in the right region,
for example "th" instead of "f" or "t" or "s". Try saying these with varying positions of the tongue and lips. In a natural
context, the native listener will not discern anything of the physical variation (if not too excessive), but will perceive the
"target region" as an abstract category (called a phoneme in linguistics). This is called categorical perception, in which
the native speaker is virtually deaf to the internal variations but a super-expert at detecting the minutest transgression
across the boundaries. The categorical perception has its counterpart in articulation and other aspects of speech; we may
call it categorical production. This is where the compensatory articulation comes into play: We never need to pronounce
"bull's-eye" canonical sounds. It is not only sufficient, but actually better, to hit the target region (the "category")
on the part of it that is nearest our previous target region which we just pronounced, and with any temporary constraints
that may have applied. For this reason too, nothing is better than a live teacher who makes us practice repeatedly in
chorus with all the natural and other variations, and who acts as a Quality Controller giving us immediate feedback
whenever our products happen to fall outside the stipulated limits, but generously lets us hit anywhere between the limit
lines a great number of times accompanied by cheers and encouragement. Then, by sheer statistical learning, we will
acquire a feel for the limits of native or native-like compensatory articulation. Categorical rather than precise
articulation is the goal.
Interestingly, the boundaries of a given sound are not static, but move and change depending on the surrounding sounds, the degree of stress, and other factors. On moving from one sound to the next, the speaker need only just cross the nearest boundary, not go further, thereby quickly achieving sufficient categorical (phonological) contrast with
minimal effort. This phenomenon is one of several factors of what phoneticians call coarticulation, the articulation of
adjacent sounds almost "together". Exaggerated articulation, on the other hand, is a sign of foreignness, be it ever so
perfect as such. It may even have reduced intelligibility for native listeners due to its relative lack of coarticulation.
Quality Repetition helps me achieve the natural articulation and coarticulation.
IPA transcription is a research instrument intended to show one particular pronunciation at one particular event and thus
does not reflect the natural variation. Therefore, IPA is not optimal for the purpose of teaching or practising
pronunciation. When I learn languages or teach Swedish, I very seldom use IPA. The ears are much more powerful. But
depending on the learner's experience with IPA and awareness of natural variation, it might still be a useful substitute
when a teacher isn't available.
Little by little I will start softening the sound level, more and more. Finally, I will hardly hear the sounds at all while I
still keep repeating. At that stage I will speak it almost by myself, like a native! Without the help of a teacher. But direct
feedback with comments by a live teacher with the same amount of patience would of course have been even better.
With this method I can fairly quickly learn the pronunciation, at least the prosody, of any language. I only need a few
short recordings, I edit them, and I listen to them hundreds of times. I can even have them droning from the car CD
player while I am driving, because, being repetitive, they don't distract my attention from driving, while I can still listen
attentively enough to train my ears and mirror neurons.
Initially I don't necessarily have to understand anything at all, but of course it is more fun if I can. With time I will be able
to discern more and more. I am like a little child conquering his first language, but I do it faster than a child. With my
recordings, I have no teacher who gets fatigued, no difficult letters, no boring text, no complicated grammar, no
confusing explanations. Only pronunciation, pronunciation, pronunciation, pronunciation, ... Particularly the rhythm and
intonation. When my new pronunciation is ready (!) after some time with thousands of exercises with the same small
amount of practice sentences, then it is time to move on with a good textbook and/or teacher. I will be on the
approximate level of a native 2-4 year old. That is, I will have a native or near-native prosody, as explained above. But
in addition, I will also have quite good command of most if not all the vowels and consonants, because my speech
apparatus is mature. And I have a basic vocabulary and a set of useful sentences. The front door to the new language is
wide open. I can begin functioning in a simple conversation. Fortunately, my interlocutors can't know what I do NOT
know. Thanks to my pronunciation they will think I know very much more than I actually do, even when I hesitate and
don't find the right words. They will find it natural that I'm still having some empty slots in my command of their
vocabulary... It will be easy for me to make contacts with native speakers, because they will not shun me because of my
pronunciation. They will respect me because I respect their language.
This situation, in my opinion, is far better than hurrying through a language course and superficially learning a lot, but with unbrushed prosody and pronunciation, hoping that I will deal with that later. Because the sad truth, as you may have inferred by now, is that I would most likely learn and automatize an unbrushed pronunciation that neither I nor my teacher nor any other native speaker will like, much less respect.
An advantageous spin-off effect of the Quality Repetition method is the fact that, in all languages, there are close
connections between the pronunciation and the grammar, particularly between their prosody and syntax. Hence,
focusing so hard on the pronunciation initially will also help me approach and master the grammar better later on.
I will also claim that the method I advocate here is very time-efficient, because it does not take long to master 20-30 sentences to the level I aspire to. Of course the required time is very individual, depending on many factors such as
previous experience with learning languages, time available for practice, and the difficulty of the particular language.
But I would dare say that it should take no more than 100 hours of active exercises. The other alternative, the broken
pronunciation, will take most people more than a lifetime to repair!
The scientific and empirical underpinnings for this method are sketched in my 1998 article "Accent Addition: Prosody
and Perception Facilitate Second Language Learning" (see link in the bibliography), and detailed in my 2002 book
"[Pronunciation, Language and the Brain. Theory and Methods for Language Education]" with more than 200
annotated references (sorry, only in Swedish so far). But when they were written, we didn't know as much about mirror
neurons as we do now. So the present paper is an important addition.
5 Minimal pairs
Don't ever practice much with minimal pairs! Minimal pairs are good for phonological research and for making learners
aware of crucial, phonological distinctions, such as in the vowel in ship and sheep, or the initial consonant in tin, thin
and sin. So, of course some listening practice and some pronunciation practice with minimal pairs will obviously have
to take place, but only initially, for creating the awareness. Not more. They should never be automated pairwise,
because of Hebb's principle, "neurons that fire together, wire together." That is, if the words are automated together,
they will always pop up in my mind together. Even if (or, rather, particularly if) I master the distinction to exquisite
perfection, every time I am about to say one of them in context, both of them will appear in my mind as in a multiple
choice test, I will hesitate for a fraction of a second, and distressingly often pick the wrong one. Usually, I will notice
the mistake and immediately correct myself. But there has been a break in my fluency, a totally unnecessary break that
will embarrass me every time. "Oh horror! I chose the wrong word again even though I know perfectly well..."
A conspicuous example of the destructiveness of minimal-pair exercises is the /r/ versus /l/ issue for Japanese speakers
of English; they struggle with the pair almost daily ever since they begin learning English in middle school. Even those
who are highly proficient in the language as well as in the phonetic realization of [r] and [l] fumble with them almost
every time and make many unnecessary and sometimes embarrassing mistakes. Those Japanese persons whom I met
who spoke Swedish or any other foreign language generally fared much better, making few or no such mistakes.
Presumably they did not practice l-r as minimal pairs in their other languages.
This happens not only in pronunciation but in grammar and vocabulary too, such as gender le-la in French or en-ett in
Swedish. I'm sure every reader of this paper can recognize the situation. For instance, native speakers of English have a
notorious tendency to pick the wrong alternative of their and there in writing their own language. This is not due to low
education or low IQ but more likely to Hebbian muddle-up. Their teachers will have been very meticulous about
teaching them the distinction a zillion times at school... So don't ever practice much with two similar things. Put them
each in their own natural (and different!) context, and Quality Practice one the first day, and the other one another day.
Monday: There was a cute, fluffy sheep in the barn. Wednesday: I saw their luxurious, white ship in the harbour.
Fig. 2
Fig. 3
Fig. 4
Sometimes it may be difficult to set up your computer and Audacity for recording from a microphone or from the speaker sound (e.g., from YouTube or a podcast). If so, ask someone who understands your computer to help you.
You will probably have to install a separate component to handle mp3 files. If so, follow the link and tips that may pop
up and install that component too. Or else, skip mp3 and use only wav.
Hint: When using a microphone, be sure to place it at your cheek a little bit behind the angle of your
mouth, so as not to blow air into the mic and cause a noisy recording.
NB: In most laptops the built-in mic makes rather low-quality sound, so a separate mic is recommended!
More hints: If you want to make phonetic analyses, use the wav format, not mp3. The program of first choice for phonetics is Praat (Dutch for "speech"). Praat, too, is free, extremely versatile and powerful, and is used by most phoneticians in the world. Unfortunately it is not very intuitive, but there are lots of
detailed help files, tutorials and active user groups. Download it from http://www.fon.hum.uva.nl/praat/
One very good tutorial for both Praat and phonetics is available at http://swphonetics.com/praat/ by
renowned Swedish-British phonetician Sidney Wood.
Fig. 7
Hint: When you quit the program, it will ask if you want to save changes. Always reply No! (Fig. 8) I will explain why further below.
Fig. 8
9 Zooming
Look at the View menu.
There are several alternatives for zooming in and out. (Fig. 9)
Try them out, and learn the keyboard commands! That will speed
up and simplify your work significantly.
Fig. 10
If you place the marker on the lower edge of the stereo sound channels, you can resize both channels up and down symmetrically (see Fig. 11 a and b).
Fig. 11
Fig. 12
10 Stereo or mono?
Usually mono is enough (it occupies half the space on my hard disk), so I will remove one channel.
Click the little triangle ▼ (Fig. 13) to get a drop-down menu (Fig. 14).
Choose Split Stereo to Mono, and the channels will split into two identical mono channels.
Pick either one and close it with the little cross × in its upper left corner (Fig. 15).
The other option here (Split Stereo Track) will keep the right and left channels different, as in the original (if you really used a stereo microphone). You might want to experiment with each channel separately, and then join them again. You will get funny or artistic effects!
However, for the purpose of pronunciation exercises, mono is enough, occupies the least
space on your drive, and is the best choice.
Fig. 15
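Incidentally, if you have many recordings to convert, the Split Stereo to Mono step can also be scripted outside Audacity. Here is a minimal sketch in Python using only the standard library; the function name and file paths are my own inventions, not anything from Audacity:

```python
import array
import wave

def stereo_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit stereo WAV file into one mono track,
    roughly what "Split Stereo to Mono" plus closing one channel achieves."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        rate = src.getframerate()
        # WAV stores interleaved little-endian samples: L, R, L, R, ...
        frames = array.array("h", src.readframes(src.getnframes()))
    mono = array.array("h", ((l + r) // 2 for l, r in zip(frames[::2], frames[1::2])))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)   # one channel: half the disk space
        dst.setsampwidth(2)   # still 16-bit
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Averaging the channels also keeps material that was panned to only one side; simply dropping one channel, as in the recipe above, is fine when both channels are identical anyway.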
Hint: Remember that you may Undo (Ctrl+Z) at any time, and Redo (Ctrl+Y) (and "un-undo" and "un-
redo") as many times as you like, if needed or wanted. If you ever should feel total panic, wondering
what on earth you have done, then just close the program, and as always answer No to the question if you
want to save changes! Next time you open the file, everything is as it was from the beginning. The
original recording will never be affected by our manipulations.
Hint: When you have temporarily stopped the recording during class and then start recording again, a new
track will be created below the previous one. This does not matter much, but it makes the editing
cumbersome afterwards. It is better to continue recording in the same track as before. To achieve this,
press Shift+Record (Shift+R).
Alternatively, use the Pause button instead of Stop. Then just un-pause to continue recording.
Fig. 16
Here is the result of the 21.4 dB amplification in this particular example (Fig. 19):
If we should ever want to make the sound softer, we use the same Amplify menu
but put a minus (-) in front of the dB value.
Fig. 19
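The arithmetic behind the Amplify dialog is ordinary decibel math: the default value Audacity proposes is the gain that raises the loudest sample exactly to full scale, gain_dB = 20·log10(full_scale/peak), and a negative dB value scales the samples down instead. A small sketch of that calculation (the helper names are my own, not Audacity's):

```python
import array
import math

FULL_SCALE = 32767  # loudest representable 16-bit sample

def default_amplify_db(samples):
    """The gain (in dB) that brings the loudest sample up to full scale --
    the kind of value an Amplify dialog proposes. Assumes non-silent input."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(FULL_SCALE / peak)

def amplify(samples, gain_db):
    """Scale all samples by gain_db; a negative value makes the sound softer."""
    factor = 10 ** (gain_db / 20)
    return array.array(
        "h", (max(-32768, min(32767, round(s * factor))) for s in samples)
    )
```

Note the clamping to the 16-bit range: amplifying beyond full scale would otherwise wrap around and produce harsh distortion.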
Hint: Sometimes there are spikes of artefact noises in the midst of the utterance that I want to amplify.
Then I zoom into the noise until I can delimit and select only the spike, exactly, and de-amplify it
significantly. Finally I will zoom out again and amplify the whole utterance in the usual way as described
above, the noise being gone.
Hint: After selecting, but before doing anything with the selection, I press Z on the keyboard. This will
move the edges of the selection to the nearest zero value in the amplitude curve. This essentially removes
the risk of getting irritating clicks in the manipulated result. (I press Z so often that it has become like a
subconscious reflex, even if it often is unnecessary. But it takes less than a second, and nothing can be
destroyed.)
Fig. 20 shows a very zoomed-in picture of the left edge of a selection before I pressed "Z", and Fig. 21
shows the result after "Z". Notice how the edge of the selection and the amplitude curve now cross the zero
line at the same place.
Fig. 20
Fig. 21
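For the technically curious, the idea behind the Z command can be expressed in a few lines of code. This is my own simplified illustration, not Audacity's actual implementation: scan the amplitude curve for sign changes and snap the selection edge to the nearest one.

```python
def snap_to_zero(samples, index):
    """Return the sample index of the zero crossing nearest to `index`.
    If the signal never crosses zero, `index` is returned unchanged."""
    best, best_dist = index, len(samples)
    for i in range(len(samples) - 1):
        # a crossing lies at i when the sample is zero or the sign flips at i+1
        if samples[i] == 0 or (samples[i] > 0) != (samples[i + 1] > 0):
            if abs(i - index) < best_dist:
                best, best_dist = i, abs(i - index)
    return best
```

Cutting or pasting at such a point means the waveform starts and ends at zero amplitude, which is why the clicks disappear.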
Fig. 22
If your recording has too slow a tempo, you can speed it up with a positive percentage. I do it most of the time with the
Book2 recordings, especially on the renderings in my own language. (All recordings in Book2 are intended for learners,
and then combined to be used bilingually in a great number of possible permutations.)
Remember (again) that you can always Undo (Ctrl+Z) and try other values until you are satisfied. Or just for fun!
14 Prepare sound tracks for practising with your smartphone, CD, mp3 or
computer
Let's assume that I have an audio recording from a language class, or a chat over a cup of coffee with friends, or a radio
program, or a TV drama, or an old language course on a cassette tape, or something from YouTube, or whatever, with
useful phrases that I want to practice my pronunciation with. In the following example I have chosen a little phrase
embedded in a dialogue. The phrase happens to be about 2.31 seconds long (displayed in the bottom margin; Fig. 24).
This duration is very suitable for pronunciation exercises. Remember that! About 2 seconds is the best duration for
practice sentences! Perhaps a bit longer when you are getting more advanced. I listen a couple of times with
Shift+Spacebar (= Shift+Play), and take note of its time position, which is displayed along the upper border; in this case
just before 15 seconds measured from the start (Fig. 24, upper margin). This is useful to know if the total recording is very
long and I might get lost when I zoom out...
I then press Z and modify the amplitude and tempo as above, if needed.
I also want some "air" around my practice phrase, so I will create silence before and after it. I zoom in a bit and put the
marker at the left edge of my selection, press Z and click the menu Generate → Silence (Fig. 25) and get a dialogue to
choose the duration of the silence, for example 2 seconds (Fig. 26). I do the same at the end of the selection.
Fig. 24 Fig. 25
Fig. 26
My track now looks as in Fig. 27; no sound is lost, just pushed aside by 2 seconds in each direction:
Fig. 27
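In code, the Generate → Silence step amounts to nothing more than splicing in runs of zero-valued samples. A sketch (the helper is my own, assuming 16-bit samples):

```python
import array

def pad_with_silence(samples, rate, seconds=2.0):
    """Put `seconds` of silence before and after a phrase, like applying
    Generate -> Silence at each edge of the selection."""
    pad = array.array("h", [0] * int(rate * seconds))
    return pad + array.array("h", samples) + pad
```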
Hint: Be sure now to extend the selection a little bit into the silences, particularly some 600 ms
(milliseconds) at the end. Because ca 600-800 ms (0.6-0.8 seconds) of silence between the repetitions,
neither longer nor shorter, will typically make it easy to practice in unison with the program with a
comfortable rhythm. Test this by Shift-playing your selection a couple of times, stop and adjust the
included silences and Shift-play again, until you obtain the rhythm that feels the most comfortable to you.
The next thing is to make the selection repeat itself a couple of times. Go to Effect →
Repeat... (Fig. 28) and specify the number of repetitions (Fig. 29). I often enter 5, which
will give me 6 exemplars total (Fig. 30).
Fig. 29
Fig. 28
Fig. 30
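The same Repeat logic is easy to sketch in code, here combined with the 600-800 ms breathing pause recommended above (function and parameter names are my own, for illustration only):

```python
import array

def make_drill(samples, rate, repeats=5, gap_s=0.7):
    """Concatenate the phrase plus `repeats` extra copies (so repeats=5
    yields 6 exemplars, as with Effect -> Repeat), separated by a
    ~700 ms pause for breathing and self-correction."""
    phrase = array.array("h", samples)
    gap = array.array("h", [0] * int(rate * gap_s))
    out = array.array("h", phrase)
    for _ in range(repeats):
        out += gap + phrase
    return out
```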
Hint: This 600-800 ms silent interval between the repetitions will give precise time for breathing and
contemplating how to modify one's pronunciation for the next round. Because this is all about chorus
practice together with the recording; not any "listen and say after me" as in olden times. (The "listen and
say after me" procedure is ineffective at the beginning of second-language learning; it is perhaps better a little
later on, when the pronunciation is solidly mastered.)
While my six exemplars of the practice sentence are still selected, it is time to save them. However, in Audacity we
typically don't "save" the file, but export selection. Go to menu File → Export selection... (Fig. 31):
Fig. 31
...and first choose a suitable location to save it, and then a suitable file name (for instance part of the sentence itself). I
can also choose the file format, such as MP3, WAV or other (Fig. 32):
Fig. 32
Hint: Write the track number before the file name (with a leading zero for 01-09). This will simplify the
sorting later.
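Zero-padded numbering is exactly what standard string formatting provides; a tiny illustration (the helper is my own, not an Audacity feature):

```python
def track_filename(track_no, phrase, ext="wav"):
    """Build a file name whose zero-padded track number (01-09, 10, 11, ...)
    makes alphabetical sorting identical to track order."""
    safe = "-".join(phrase.lower().split())  # e.g. "Good morning" -> "good-morning"
    return f"{track_no:02d}-{safe}.{ext}"
```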
Hint: I put my practice sentences in Dropbox directly. This will give me immediate back-ups in case of a
hard-disk crash after all this work, and best of all, I can access the most recent version of my files at once
from any other computer and my smartphone. No need for a memory stick.
If you haven't got Dropbox yet, please use this "invitation" link from me http://db.tt/tsfzycJ4, and we
will both get a little extra bonus space.
Remember when you close the program to reply No to save. We have already exported what we wanted to keep.
Extra: If you reply Yes to Save, you will save a Project, a special Audacity file that is quite big but
allows many exciting possibilities. For instance, you can annotate your recording. Or you can create
music, or sing in chorus with yourself in several different tracks while you are playing various
instruments in several other tracks. You can manipulate and mix them in innumerable ways. Professional
musicians do so. There are lots of fun things to do with Audacity. When you are ready, you concatenate
them all into a final version with two stereo channels, export them to a WAV file, burn ten CDs, and go
sell them on the Flea Market on Saturday! Or at least one CD to your mother.
15 Now YOU try! Experiment with Audacity and yourself. Nothing can go wrong!
16 More hints
For quickly and easily getting the pitch contour (aka F0 extraction) of your practice sentence(s), please use the free
program WaveSurfer. Read about it here: http://en.wikipedia.org/wiki/WaveSurfer
and download it here: http://www.spectrogramsforspeech.com/tutorials-2/software-download-2/
In the Glossika group on Facebook, some very good suggestions came up:
Alexander Giddings wrote:
It just occurred to me that the quickest and most effective way to edit the A files may be simply to use the repeat
function over each group of two target sentences (following the primer) and then the truncate silence feature over
the whole file once you are finished, which will give you a pause of exactly the same length (i.e. 600-800
milliseconds) between each repetition and between each group of repetitions. ... There is one downside, however,
which is that any sentence-internal pauses (as in the mini-dialogues) longer than the specified truncate length will be
20 Selected bibliography
Cattaneo, L., & Rizzolatti, G. (2009). The Mirror Neuron System. Archives of Neurology, 66(5), 557–560. Available at
http://archneur.jamanetwork.com/article.aspx?articleid=796996
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The Role of Deliberate Practice in the Acquisition of Expert
Performance. Psychological Review, 100(3), 363–406. Available at:
http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice(PsychologicalReview).pdf
Ericsson, K. A. (2000). How experts attain and maintain superior performance: Implications for the enhancement of
skilled performance in older individuals. Journal of Aging and Physical Activity, 8, 346-352. (Updated excerpt
available at: http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html or
http://www.freezepage.com/1404355998UGCCCQIQAR)
Hurford, J. R. (2002). Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution. In
D. Kimbrough Oller, U. Griebel, & K. Plunkett, eds. The Evolution of Communication Systems: A Comparative
Approach. Cambridge, MA: MIT Press. Available at: http://www.lel.ed.ac.uk/~jim/mirrormit.pdf.
Kjellin, O. (1977). Observations on consonant types and “tone” in Tibetan. Journal of Phonetics, 5, 317–338.
Kjellin, O. (1999). Accent Addition: Prosody and Perception Facilitate Second Language Learning. In O. Fujimura, B.
D. Joseph, & B. Palek, eds. Linguistics and Phonetics Conference 1998 (LP’98). Columbus, Ohio: The Karolinum
Press, pp. 1–25. Available at: http://olle-kjellin.com/SpeechDoctor/ProcLP98.html. (Recommended reading!)
Kjellin, O. (2002). Uttalet, språket och hjärnan. Teori och metodik för språkundervisningen [Pronunciation, Language
and the Brain. Theory and Methods for Language Education]. [Swedish] Uppsala: Hallgren och Fallgren Studieförlag
AB.
Rizzolatti, G. (2005). The mirror neuron system and its function in humans. Anatomy and Embryology, 210(5-6), 419–
21. Available at: http://link.springer.com/article/10.1007/s00429-005-0039-z?LI=true
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. WIREs Cogn Sci. Retrieved May
14, 2012, from http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WCS78.html
Skoyles, J.R. (1998). Speech phones are a replication code. Medical Hypotheses, (50), pp.167–173. Available at:
http://human-existence.com/publications/Medical Hypotheses 98 Skoyles Phones.pdf.
Tettamanti, M. et al. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of
cognitive neuroscience, 17(2), pp. 273–81. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15811239.
***