Quality Practise Pronunciation With Audacity - Kjellin
May be updated at any time; this is version 2.7 last edited on July 14, 2021 at 16:31
Are you learning a new language? Do you, like me, have the ambition to learn it well, even to sound as
"native" as possible, or at least to have a listener-friendly pronunciation that will not embarrass you or
annoy native speakers of the language? This paper will show you and explain why and how it is possible to achieve
that, even if you are not a child (children are allegedly fast learners, according to common observations, but
many of us regard that as an urban legend). In these 26 pages with their 35 illustrations you will learn how to:
• Produce perfect pronunciation exercises with your favourite sentences for free.
• Practice in the way that will give you the best result, for example perfect pronunciation, if you wish.
1 Introduction
There is as yet, to my knowledge, no freely or commercially available pronunciation practice material that is "best" for
my purpose. So I produce my own material, and so could you. It is easy with Audacity, a very powerful free
program for recording and editing sound. It takes time, to be sure, but it is time well spent that not only yields really
good results for my pronunciation exercises but also makes learning faster. This tutorial will show both how to utilize
Audacity and how best to perform the exercises, and why it works, according to my knowledge and experience.
Hopefully, it will suit you too.
There are many commercially available language courses, and I have several of them. For instance, Pimsleur and
Rosetta Stone produce really good courses. But still I want to add my own modifications to make them even better, or
supplement them with other material that I make myself according to the guidelines in this tutorial. As you will soon
see, I practice without any text in the beginning, in accordance with all recommendations based on research as well as
on my own experience: Avoid written materials as long as possible in the beginning! Most writing systems
do not represent the pronunciation well enough; instead they confuse the learner and lead to faulty, "broken"
pronunciation. If you still do want written support, you should learn the IPA (International Phonetic Alphabet
http://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and try the Glossika method
(http://www.glossika.com/).3
For serious learners of English, Richard Cauldwell's Speech in Action http://www.speechinaction.com/ is an
unsurpassed source, a must.
If you don't want to pay for CDs or online courses, there are some quite good free materials available, too. If you have
native-speaking friends to record, do so. Else, I can recommend book2, also called 50Languages, from Goethe-Verlag
http://www.goethe-verlag.com/book2/. I often download their sound files and modify them as below for my own
language studies.
As a competitor or complement to my Audacity method, please do try this fantastic, incredible app (for Android):
Ear2Memory. http://ear2memory.com
I only learnt about it recently, but I really love it. It has a bit of a learning curve, but that is easily overcome, and the user
will be greatly rewarded!
However, beware of all the amateurish materials that abound on the Internet. Most of them are simply bad! Some are even
incorrect, presented by self-appointed “teachers” seemingly with no clue about what they are doing.
This paper was born from my Swedish cookery-book tutorial on Audacity, originally written as a handout to participants
in my own pronunciation classes. That handout had only a brief description of the theory behind the practice method at
the end, but many readers wanted more of the theory, and they wanted it from the start, so here it is! Also, as a
testimonial to the success of the Quality Repetition method, I might mention that many of my pron-class
participants thought my courses were too short, regardless of their education levels (even many MDs or other
1 This author is a language nerd who has tried to learn many foreign languages and has taught Swedish to foreigners and to Swedish teachers
intermittently since 1970. Furthermore, I am a linguist and phonetician with a medical Ph.D. in Speech Physiology. In addition, I am also a
medical radiologist subspecialized in the anatomy and functions of the speech and swallowing organs. And, because of my interest in the
neurology of learning, forgetting and communication, I also worked for 6 years in a memory and dementia clinic. See further section 21, p. 25.
2 New in this version (2.7): Minor changes.
3 If you are reading the PDF version electronically, the links and cross-references should be clickable.
© Olle Kjellin 2021: Kjellin-Practise-Pronunciation-w-Audacity 2/26
academics who were unsatisfied with their Swedish pronunciation), and despite the fact that the courses were one whole
week long, about 35 hours mainly consisting of intensive chorus and individual practice on just 12 representative
sentences, such as how to say the participants' own street addresses! And all of them were very angry that they were not
given this chance to pronounce correctly from their first week of studying Swedish. So this is a method that should be
adopted and adhered to during the very first couple of weeks of learning a new language. Maybe 2-8 weeks, depending
on your previous knowledge and the difficulty of the language in question. After that period you will master the
language like a native 3-6-year-old and can just as quickly move on to attain a very high level of the language(s) you
are learning. Why? Because native infants too begin by acquiring the pronunciation without having to be confused by
reading or consulting grammar books. Obviously that's the best “method”, as they all typically will end up speaking as -
ahem - native speakers. :-)
different languages, in varying proportions of importance for each specific language. For example, what may be called
"stress" or prominence is often signalled by a certain pitch and/or loudness variation in the stressed syllable (as in
Spanish, Hungarian or Finnish), or on the pre-stress syllable (as quite often in Polish, Russian and maybe French), often
accompanied by a slight lengthening of the stressed syllable (as in Russian and Spanish), or a significant lengthening
(as in French and English), or signalled almost only by the length (as in Swedish), whereas length happens to have
nothing at all to do with stress in some other languages (such as Finnish, Hungarian, Czech, Yakut). And some
languages don't even use "stress" at all, but have other means of prosodic signalling (such as Japanese, Somali, and
maybe French). Pitch is used to signal the morphological structure of words in some languages (such as Swedish,
Japanese and Tibetan), or to signal lexical identity of words in other languages (such as all Chinese languages, Thai,
Vietnamese, and many African and Native American languages; so-called tone languages).
Common to all these uses still is that, regardless of language, they always involve the very same three fundamental
elements ─ pitch, loudness and length ─ to signal all those lexical, grammatical, emotional and other characteristics
involved in the spoken conversation. Also, in every culture there are songs, and songs too consist of notes in sequences
of varying pitches, loudnesses and lengths. So indeed, each one of us above toddler age already masters all the prosodic
means being used in any other language; we just have to learn how to tweak our existing skills for the specific needs of
the new, particular language we are learning. And please do carefully note: All prosodic uses of pitch, loudness and
length appear in each and every utterance regardless of its contents, so it is a very, very good and time-efficient idea to
concentrate mainly on the prosody from the very outset of learning a new language. Don't care too much about the
particulars of vowels and consonants until you feel confident with the prosody. This happens to be exactly how normal
children learn their first (“native”) language(s), and they succeed perfectly, by definition. And mind you, on their
way to perfection, they will prioritize the prosody and typically “cheat” in the beginning with the most difficult
consonants and vowels as long as the correct prosody is preserved. So, why not do the same as adults? Do cheat with
the particulars of single sounds until you feel mature enough to master them, but never ever cheat with the prosody!
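Since all prosody reduces to just pitch, loudness and length, a syllable can be pictured as a record of those three values. The following is a purely hypothetical sketch with invented numbers (the tutorial itself contains no code); it merely illustrates a stress contrast signalled almost solely by length, as described above for Swedish:

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    text: str
    pitch_hz: float     # fundamental frequency (pitch)
    loudness_db: float  # relative intensity (loudness)
    length_ms: float    # duration (length)

# Invented example values: a stressed syllable distinguished from an
# unstressed one almost only by its length, Swedish-style.
stressed = Syllable("tal", pitch_hz=120.0, loudness_db=62.0, length_ms=280.0)
unstressed = Syllable("a", pitch_hz=118.0, loudness_db=60.0, length_ms=90.0)

print(round(stressed.length_ms / unstressed.length_ms, 1))  # -> 3.1
```

Whatever the target language does with these three parameters, the data structure is the same; only the values and their relative importance change.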
In contrast to the surprisingly small number of prosodic details to learn, there are typically some 30-40 vowel and
consonant sounds (some languages have fewer, some have more, some have considerably more), but not all of them
appear in every utterance. So they are indeed of less importance than the prosody, at least in the
beginning. You can see proof of that in children's first-language acquisition, as already hinted above. By the time
toddlers can say 25-30 words in their emerging language, their prosody is already identical to that of the adults around
them. Nevertheless, it will usually take some 5-6 years or even more before they master all the vowels and consonants.
Despite this, they are never perceived as having any “foreign accent” at all – thanks to their correct prosody!
Therefore, I always practice prosody first and foremost, even if my tongue will stumble on some individual vowels and
consonants that may pop up every now and then.
But, how to do it then, if you haven't got a teacher to help you? The answer is in this tutorial. Produce your own
materials for pronunciation exercises and heed my advice here! Read more about the methodology and its
neurophysiological foundations in the next few sections. Finally read the cookery-book instructions for the use of
Audacity in the second half of this article, from section 9 and on.
And if you are a language teacher, study this tutorial carefully and apply the methods in your classroom. Your students
will be very satisfied. And do not worry, they will not think the many repetitions make it boring, because, for them,
every new repetition is a better version than all the previous ones, and that will have a tremendous and even addictive
effect on the brain's reward centres.
You may want to read more about this in my Facebook group https://www.facebook.com/groups/best.pronunciation
where you can find several of my essays on various aspects of the method, and also testimonials by “victims” who
actually tried it out.
The short version: In the way to be described below I will train my ears with the correct speech rhythm and
melody according to the model and saturate my brain's primary hearing centres as well as its hearing
perception centres without speaking anything myself. I should not torture my ears to hear myself speak with
a faulty accent (as I would do in the beginning, if I didn't saturate my ears first). Subconsciously and
gradually, by shadowing, mirroring and imitation, I will train and automatise my mirror neurons (imitation
neurons), which are then used to guide my speech muscles to my own pronunciation when, eventually, I can
start saying the practice phrases (calibration sentences) myself without help. In this process my brain will
actually be physically changed due to its plasticity. This is learning on the neuroanatomical scale. My brain
will connect and match the sounds that I hear with the sounds that I can make and the sounds that I should
make. This is a kind of pattern recognition process. Therefore, I should not trouble my speech muscles to
learn first to speak with a funny pronunciation (as, again, I would do in the beginning, if I didn't saturate my
ears first). Instead, I will first make the (correct) model utterances resound as an audio template like a din in
my head, and that will direct my speech muscles accordingly. It will then even be difficult for me to
pronounce much differently from the model. (Incidentally, this is also how our native, first-language speech
is mirrored, acquired, controlled and monitored, the speech muscles then being guided by internally
“hearing” and predicting how the result would and should sound for a given articulation. More than 50,000
years of human language evolution cannot be wrong.)
In conclusion: I will practice pronunciation with my ears and let automated nerve reflexes do the rest. I will
then have created an “audio-motor procedural memory” for the target language, with a result as native-like
as I have the time and motivation to aspire for.
A) Hearing
The primary hearing centres are neuronal arrays situated in the temporal lobes, bilaterally (both sides), also called the
auditory cortex. They belong mainly to the sensory system. The auditory nerves from both ears are connected to the
brain stem, and then relayed in a series of neurons (nerve cells) to the primary hearing areas and also to other places,
bilaterally. About 60% of the nerve fibres from one ear cross over to the other side (i.e., from the left ear to the right
temporal lobe, and vice versa), while some 40% remain on the same side. Many of the crossed pathways cross back
again after relaying its signals to various other locations and reflex circuits, for example for directional hearing and
head-turning reflexes towards a sound. See the schematic picture in http://bit.ly/auditory-path-2. A useful reflex is the
Lombard reflex, which causes me subconsciously and irresistibly to speak louder in a loud environment. Replaying my
material loudly thus is a good idea: it will activate my speech organs more strongly than playing it softly. The auditory
system is replete with reflexes of various kinds.
Speech may seem to be a sequence of distinct words, each made up of distinct sounds. In reality, however, speech is a
continuous stream of interwoven sounds. Sounds are caused by pressure variations in a medium (for example air, water
or human tissues). The words from a speaker as well as all other natural sounds and noises from around us are an
extremely complex mixture of regular and irregular waves (vibrations) and noises in air travelling to our eardrums. The
eardrums move along with them, transferring the pressure variations to a chain of three small bones in the middle ear, the
ossicles. Pure physics. An intricate mechanism amplifies the air waves in the middle ear while transforming them from
air to water waves in the inner ear via the oval window, and then actually zooms in on speech-relevant vibrations
(particularly those pertaining to the speaker's speech rhythm), synchronizes with them, performs a basic sorting of
relevant sounds, filters out non-speech sounds, and converts all into electrochemical signals in the neurons leading to
the brain, where they are further sorted into higher-order categories of many kinds (phonetic, phonological,
morphological, syntactic, lexical, semantic, non-speech, etc.), by which they can be identified and hopefully
correctly comprehended in their particular context.7 You may be surprised to learn that the auditory nerve actually
contains many more efferent (motor) fibres than afferent (sensory) ones! In fact, the sorting and filtering
mechanisms of the inner ear depend on this arrangement.
7 People with hearing impairment using hearing aids or cochlear implants can't sort out sounds like that in their inner ear. A noisy environment or
several speakers talking simultaneously will impose great difficulties on them.
The primary hearing centres register the physical characteristics of the incoming signals, such as pitch, loudness and
length, and map them tonotopically along the cerebral cortex. Tonotopical mapping means that neurons are ordered with
those for low pitch at the anterior end and high pitch at the posterior, much like the keys on a piano, in a simple,
straightforward array. This is nicely illustrated in http://bit.ly/tonotopic. Corresponding "periodotopic" mapping
probably exists for the temporal (timing) aspects of sounds too.
From the primary auditory cortex, signals are then relayed on to higher-order hearing-perception and comprehension
centres (see B, below), and to mirror neurons (see D, below). The pathways to the mirror neurons are shorter and faster,
which has important implications for our practice method. Presumably, the efferent fibres to the inner ear mentioned
above are connected with the mirror neurons.
We hear and perceive our own speech in three different ways. One is by air-ossicle conduction: the sounds from our
mouth go around the cheeks into the ears and are converted to nerve signals via the eardrums and ossicles etc. as above.
The second way is by bone conduction: These waves travel directly through the soft tissues and bone into the inner ear.
This is louder and much faster than air conduction, not only because the route is so much shorter and even bypasses the
eardrum and middle ear, but also because the waves travel more than four times faster in water and solids than in air. So
we really can't know how our own air-conducted speech sounds until we listen to a recording. Some people don't like to
hear themselves on a recording, but that is how we "really" sound to other people, like it or not. The bone conduction
pathway enables a very fast route for auditory feedback, which is very important for the pre- and subconscious
monitoring of what we are saying while we talk. (As for articulatory feedback, there are also feedback loops through
proprioception, i.e., senses of muscular and joint positions and movements. However, although not unimportant, the
proprioceptive routes in general are too slow for real-time feedback. The auditory nerve is short, thick and fast. The
proprioceptive nerves are long, thin and slow.)
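The claim that sound waves travel "more than four times faster" in water and solids than in air is easy to verify with round textbook figures (approximate values; the exact speeds depend on temperature and, for the body, on tissue type):

```python
# Approximate speeds of sound at everyday temperatures:
v_air = 343.0     # m/s in air (~20 °C)
v_water = 1480.0  # m/s in fresh water; soft tissue is similar (~1540 m/s)

print(round(v_water / v_air, 1))  # -> 4.3, i.e. "more than four times faster"
```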
The third way of perceiving our own words is psychological: We "know" what we said or ought to have said, because
we wanted to say it. This perception is usually correct, but occasionally the mouth says it wrong, or even says another word.
In most cases we can correct ourselves immediately, but at times the mistake goes undetected, and we
may swear that we were correct even when we were not. Only a recording can reveal the truth then. Incidentally, this can also
happen when perceiving another person's speech. We may hear only what we expected to hear, or what we could
comprehend, and we can honestly swear that we are correct even when we are not. Again, only a recording can settle the
issue. There is never ever any point in arguing about what somebody did or did not say.
B) Perception
The brain's hearing perception "centres" also belong to the sensory system and are responsible for how we understand
speech and language. They are vast, complex, intertwined systems of nerve circuits and networks mainly distributed in
the parietal lobes and around the angles between the temporal and parietal lobes (Wernicke's area). These centres,
circuits and networks continuously exchange information with one another as well as with the primary auditory centres
in the temporal lobes as well as with the mirror neurons mainly in the frontal lobes (see below) across both the right and
the left brain, and to innumerable other networks that, all taken together, represent functions for speech, language,
memory, emotions, etc.
Please note: No brain is “half”, even if they are called hemispheres. The brains are a paired organ, just like the eyes,
ears, kidneys, lungs, hands, feet, etc., none of which ever are “halves”. And like all the other paired organs, both brains
can perform the same actions simultaneously. However, in some special cases conflicts might arise if both brains
competed over what to do. So, via thick bundles of extra fibres between them (the corpus callosum), the right and left
brains communicate, discuss, and decide which side is to do what and how much. As a result one side may become
well trained (dominant) and the other side dominated, "ring-rusty" or even inhibited, but nevertheless always prepared
to jump in and substitute if the dominant side should falter.
Don't ever believe the urban myths about right-left brain separation of tasks. Some of them may contain a grain of truth,
but not in the way they are presented by non-experts in the media. The differences in "lateralization"
and dominance often amount to only a few percent of the total, bilateral activity. Speech and language are such highly
specialized and finely trained functions, that the non-dominant brain is less prepared for these functions ─ but not
unable to jump in and substitute, for example after an injury. In the majority of people the left brain dominates for many
aspects of language, while the right brain usually is dominant for the prosodic factors. However, both the right and the
left brain do indeed cooperate extensively all the time, even in language and speech.
Eating and drinking utilize the same anatomical structures in the face, mouth and pharynx as speech, but the
controlling neural networks are different from those for speech, and left-right dominance is random, at about 50:50. So, in
cases of a unilateral stroke, about half of the patients get swallowing problems, depending on whether the stroke is on
the swallowing-dominant side or not. However, in most of these cases the swallowing functions return more or less
completely in about 3-4 months. This is not thanks to any healing of dead neurons but due to the brain's plasticity (see
next section), by which the intact, contralateral, previously non-dominant brain, slowly re-learns how to control
swallowing. Similarly for all other lost functions when they resolve after some time. To some extent this is a
spontaneous process, but usually very intensive and extensive rehabilitation activities are needed to alert, coach and
exercise the substituting brain.
C) Plasticity
Learning, and getting results from training, is possible only thanks to the plasticity of the brains. This means their ability
to adapt, reorganize connections, change, and even grow anatomically, in response to incoming stimuli and identified
needs, in effect relocating functions between the right/left brain pair as well as within each brain separately. This is one
of the most fascinating functions of the brains. It happens very fast, and it occurs in both the sensory and the motor
system. And it is not necessary to have had a stroke to induce plasticity; it is a normal function of all brains at all ages!
A connection between two neurons is called a synapse. Plasticity primarily affects the number of synapses. On an
average, each neuron has input synapses from about 10,000 other neurons and constantly receives various signals from
all of them, some excitatory, some inhibitory.
When a neuron has accumulated enough signals of the kind it is specialized for, it will "fire", in its turn
sending a signal on through its output synapses to, again, some 10,000 other neurons. One adult pair of brains
has about 100 billion (100,000,000,000) neurons. Multiply the number of neurons by the synapses per neuron and
find that this is indeed a huge network of some one million billion (1,000,000,000,000,000) synapses. In
comparison, the World Wide Web is a very tiny network. According to the available Internet statistics from
February 21, 2021, there are only about 4.83 billion users in the world right now (equating, for the comparison, one
internet-connected user with one brain synapse). (http://www.internetlivestats.com/watch/internet-users/)
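This back-of-the-envelope arithmetic is easy to check in a few lines of code. The figures are the rough, round estimates quoted in the text, not precise counts:

```python
# Rough scale of the brain's synaptic network (order-of-magnitude estimates).
neurons = 100_000_000_000      # ~100 billion neurons in an adult pair of brains
synapses_per_neuron = 10_000   # ~10,000 input synapses per neuron

synapses = neurons * synapses_per_neuron
print(f"{synapses:.0e} synapses")  # -> 1e+15 synapses

# Multiplying inputs x outputs x neurons instead counts two-synapse
# *pathways* through each neuron -- a far larger (and different) quantity:
pathways = 10_000 * 10_000 * neurons
print(f"{pathways:.0e} pathways")  # -> 1e+19 pathways

internet_users = 4_830_000_000     # the February 2021 figure quoted above
print(synapses // internet_users)  # synapses outnumber users roughly 200,000-fold
```

Either way you count, the comparison with the Web holds by many orders of magnitude.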
At birth, we are even bestowed with some 200 billion neurons, but with only rather few synapses. However, in response
to all and any incoming stimuli and physical activities of the child, zillions of new synapses are formed each minute and
connect all the involved neurons. To accommodate all the new synapses, the neurons form extensive systems of
branches and twigs, in a process called arborization. It is therefore very important to present as many different stimuli as
possible to a child from birth to adulthood, to promote arborization and synapse formation. The more modalities of
different kinds that are involved and coupled (eyes, ears, hands, body movements, right side, left side, etc.) in motor
pattern formation, the better and the more robust will the skills and long-term memories be. "Neurons that fire together,
wire together" (Hebb's principle). This too has pedagogical implications, because the same applies to all ages. For
example, we should practice prosody complete with appropriate body language.
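Hebb's principle ("neurons that fire together, wire together") can be illustrated with a toy numerical sketch. This is an illustrative model only, with made-up numbers and nothing taken from the tutorial: a synaptic weight grows only when pre- and postsynaptic activity coincide.

```python
# Toy Hebbian update: the weight strengthens in proportion to the
# product of pre- and postsynaptic activity.
def hebbian_update(w, pre, post, lr=0.1):
    return w + lr * pre * post

w = 0.0
for _ in range(20):                 # repeated co-activation: "fire together"
    w = hebbian_update(w, pre=1.0, post=1.0)
print(round(w, 2))                  # -> 2.0 (the connection has strengthened)

for _ in range(20):                 # presynaptic neuron fires alone
    w = hebbian_update(w, pre=1.0, post=0.0)
print(round(w, 2))                  # -> 2.0 (no further strengthening)
```

The same logic underlies the pedagogical point: repetitions in which the right things co-occur (sound, articulation, gesture) strengthen exactly those connections, and nothing else.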
Unused neurons are weeded out or made dormant. For instance, surprisingly, a newborn baby has neural pathways from
the primary auditory centres in the temporal lobes to the visual centres in the occipital lobes, but since such pathways
are generally not needed, they will shrink and almost disappear. Unless the child is blind, of course, in which case these
pathways are kept active and retrained: the visual centres, which would otherwise have been unemployed, are then
used for auditory tasks instead. That is an example of plasticity. However, even blindness acquired in
adulthood will induce similar activation of the visual centres by auditory input. Isn't that impressive plasticity?
However, every instance of normal learning of anything at all at any age at all is accomplished through these same
plasticity mechanisms, and they work perfectly throughout our entire lifetime! This is very encouraging news. There is
thus no a priori reason to give up learning anything, not even a second language, after puberty or any other
mythological age limit, although usually the earlier the better, when possible.
In response to a new stimulus it takes only seconds for small "knobs" (dendritic spines) to form on the branches of
neurons. This time-lapse video of knob formation https://www.youtube.com/watch?v=s9-fNLs-arc illustrates learning
on the scale of branches of a single neuron! If the stimulus is not repeated, the new knobs will disappear. If the stimulus
is repeated sufficiently many times, the knobs will develop further and form permanent synapses and wire together all
neurons that happened to be involved in that task, for instance the pronunciation of a new speech sound, or a whole
sentence with correct rhythm and intonation pattern with concomitant body movements and gestures ─ and grammatical
structure! The results are long-term memories. Such wired-together networks may be re-used in total or in parts in the
formation of yet other networks, and hence assist in recall, cueing, and mental associations of all kinds. All this is the
neurophysiological rationale for multi-modal multiple repetitions in any learning process. The bad news is that there
is no shortcut to learning and long-term memory, only repetitive work. Deliberate, persistent, repetitive practice.
See this amazing video by Harvard Professor Jeff Lichtman on the abundance of criss-crossing neurons and synaptic
connections here (starting at 21:40): https://youtu.be/2QVy0n_rdBI?t=1299
Ever since we start speaking as toddlers and throughout all our lives, every time we say anything at all, every utterance
will serve as an instance of practice that will form new synapses and thus further consolidate and reinforce our speech
habits as represented in our mirror neurons and elsewhere. And so we will all become super experts in all the procedures
involved in hearing and speaking our first language(s). The robustness of this procedural memory and other long-term
memory in general is a linear effect of the number of repetitions. It is statistical learning.
Thus, procedural memories for skilled actions form like paths in a lawn: They emerge wherever you tread frequently
enough, nowhere else, and never independently of older paths, i.e., previous knowledge. But fortunately, there is no
best-before date for plasticity. As we grow older, we will, in many cases (but not all, depending on the type of task),
need more repetitions per item to learn it and automatize it than at younger ages. That is the only age effect. And there is
no neurophysiological difference between language learning and any other types of motor learning. So forget the
disheartening myths about age and language learning, at least as concerns pronunciation. (It may be true for grammar,
which usually is more complicated.) Just repeat a larger number of times if you are "older." And be sure to get it right
from the beginning, to avoid arborization and synapse formation for unwanted pronunciation. Wrong
pronunciation too will induce all these plasticity processes in the same way and end up stored as unwanted, faulty
motor "skills" in your long-term, procedural memory. You don't really aspire to that. Fossilization in second language
users (i.e. a petrified foreign accent in spite of many years' use of the new language) is more due to faulty instruction
and insufficient training at the beginner's level than to any biological constraints, and thus is preventable (if you do want
to prevent it). Due to the time handicap of adult learners there is little chance for us ever to catch up with a native
speaker in every respect, but it is indeed perfectly feasible to sound like a native speaker in the limited number of
sentences we are able to say. This is particularly true of the prosody of the language, the easiest part to master, because
you already master the three fundamental elements of it: the pitch, the loudness, and the length of sounds, as outlined
above. Surprised? No need to be. This is natural, unavoidable neurophysiology.
In experimental conditions it has been found that automating a new (simple) motor skill takes up to about 15 minutes of
repetitions. Can you practice the same sentence for 15 minutes? It seems like a good idea to do so. However, depending
on the difficulty of the task and your previous experience with similar skills, it may of course take more or less
time than that to learn a new motor pattern. For example, the 15 click consonants in Zulu are quite a challenge for most
English speakers, but presumably easy-peasy for Xhosa speakers (who have 21 click consonants). When, however, you
can say 20-30 sentences in a native or near-native way in your new language, after hours of deliberate, persistent
practice on only them, you will also, automatically, be able to say 20-30 million other sentences with the same, excellent
pronunciation, because they all follow exactly the same rules of prosody and pronunciation. So part of the trick for the
adult language learner is to have a very limited curriculum of such “calibration sentences” for the initial pronunciation
training period, to make it truly possible to learn them completely and perfectly.
D) Mirror neurons
Our pair of brains contains numerous mirror neurons, also called imitation neurons. Discovered only in the late 20th
century, their functions are highly relevant for language learning and acquisition, and this may be the most fascinating
area of recent research in neuroscience. The human mirror-neuron system is involved in understanding others’ actions
and the intentions behind them, and it underlies mechanisms of observational learning. Research on the mechanism
involved in learning by imitation has shown that there is a strong activation of the mirror-neuron system during new
motor pattern formation. It has been suggested that the mirror-neuron system is the basic mechanism from which
language developed. Some functional deficits typical of autism spectrum disorder, such as deficits in imitation,
emotional empathy, and attributing intentions to others, seem to have a clear counterpart in malfunctions of the mirror-
neuron system.
Surprisingly, the mirror neurons belong to the motor system! They are motor neurons primarily involved in finely tuned
muscular actions, movements and procedures that we can perform. But secondarily, they are also recruited when we
observe other people perform similar actions and procedures with which we ourselves already have prior experience
and interest. In essence, mirror neurons are a kind of action and pattern recognition mechanism essential for the
perception and appreciation of what other people are doing, saying, or intending. Therefore, the mirror neurons are also
crucially involved when we want to shadow, mirror and imitate what others do or say, such as the teacher in a language
class. Our ability of, and agility in, such action recognition, mirroring and imitation depends heavily on these mirror
neurons' prior experience of the same sort, and to some extent on our motivation and desire to perceive the signals.
Learning motor skills is the result of inducing the formation of new mirror-neuron networks by plasticity processes. As
simple as that. The amount of mirror activation correlates with the degree of our motor skill for that action. Experiments
have shown an increase in mirror activation over time in people who underwent a period of motor training in which
they became skilful. It works after brain injuries too; data on plasticity induced by motor observation provide a
conceptual basis for application of action-observation protocols in stroke rehabilitation.
Since we all as adults already have ample experience and skills in speaking at all (our first language), our mirror
neurons are ready to recognize, mirror and imitate the new language almost directly (after due listening practice, as
above; otherwise not). This is in stark contrast to pre-linguistic toddlers, who have to train both their mirror neurons and
their speech organs from scratch, which will take many times longer than for adults. (Small children do not necessarily
© Olle Kjellin 2021: Kjellin-Practise-Pronunciation-w-Audacity 8/26
learn languages more quickly than adults, except for the fact that they usually spend far more practice time per day on it
and get much more immediate feedback than adults typically do.)
A little handicap we have as adult learners is that our mirror neurons are heavily biased in favour of our first
language(s), so they will tend only to "recognize" and do what they already know or think they should expect (the
action recognition function). In this process they may happen to reinterpret what they hear into something more
familiar. That is, they may miss many details and get a more or less distorted picture that better conforms with their
experience. Deaf by preconceptions (amblyacusia). This happens particularly if we start reading too soon into the
language course. Learning a new language should always be done without reference to the writing, initially. Because the
letters (particularly if based on a script system similar to our own, or transcribed to our own script) will in all likelihood
signal their usual meanings to us, namely the sounds of our own native language instead of the new language. This will
lead to suboptimal perception, suboptimal recognition, and suboptimal imitation of the new details, the situation we call
"foreign accent". To avoid this, we would need a teacher pointing out the minute details and giving immediate feedback
for the learners to perceive and modify their pronunciation habits in accordance with the patterns of the new language
correctly. However, since we are already super-elite players of our speech instruments as such, this actually is no big
deal, but we do need to get the detailed information and pay much attention to it until our new pronunciation becomes
automatic and starts working subconsciously. We are better than parrots. We use both quality and quantity for learning.
So, in addition to a teacher, we need extensive and deliberate listening practice, as recommended in this tutorial. If you
have no teacher, studying phonetics is a good option. That also holds if you do have a teacher. And it is an unavoidable
must if you are a teacher.
The actions of mirror neurons are subconscious most of the time, but sometimes they surface in comical ways:
Examples that everybody surely has experienced are when we are watching a soccer/football game on TV and feel
twitches in our own legs as if to try to kick the ball; or when we are listening to a person with a hoarse voice and feel
urged to clear our own throats. Such urges are due to the fact that (1) we recognise the situation, and (2) there are direct
neuronal pathways from the primary auditory cortex (in the temporal lobes) to those mirror neurons (in the frontal
lobes) that monitor and control the speech and voice muscles (or leg muscles). These direct pathways do not involve
understanding of the contents of what is being said! This makes it very fast to shadow or mirror what somebody is
saying, even before you know what s/he is saying. This also makes it very efficient to practice pronunciation in
chorus with your class, or in unison with your recordings, because your mirror-neuron system will compel your
speech and voice muscles to act according to the loud and overwhelming auditory input. This will push you into
getting a native-like rhythm and intonation, virtually without even a chance of getting it wrong. You will certainly
appreciate and enjoy that!
Indeed, experiments have confirmed that the coupling of observation and execution significantly increases plasticity
in the motor cortex. After a training period in which participants simultaneously performed and observed congruent
movements, there was a potentiation of the learning effect. "Observation" here might mean only the auditory input, but
best of all would be a live teacher, whose lip shapes, facial expressions, gestures and all body language could be
observed and mimicked at the same time. Multimodal learning is the best.
All of this, all that is known about mirror neurons in speech-related activities, lends very strong neurophysiological
support for the method as advocated in this tutorial, in which we practice multimodally multitudinous times in chorus
along with the teacher and class or a recording. We call it Quality Repetition. (This term was coined by Judy B.
Gilbert, well-known author of many books on English pronunciation for foreign or immigrant learners, when we gave
workshops together long ago. Judy also introduced the use of a big rubber band to indicate the long sounds of English.
This is more than a toy gadget, it is the powerful addition of another modality, vision, to the exercises. It will
significantly increase the neuronal traffic between the left and right brain and assist in making that detail (length)
more salient and robust in the learners' procedural memory. I use the rubber band extensively in my Swedish classes
too, where segmental length contrasts are even more significant than in English.)
Most mirror neurons seem to be distributed in the frontal lobes, which are the "headquarters" of motor activities.
Neuronal networks involved in speech and facial expressions are concentrated in Broca's area (and its homologue on
the non-dominant side) where there is an abundance of mirror neurons. Actually, these mirror neurons for speech (and
hand movements!) also monitor the results of our own speech by continuous, real-time mirroring of our spoken
output. That is, they compare what they hear us say ourselves, with the memory of what they think we
should say and should sound like. This is important, because it enables us to modify our speech on the fly, should the
need arise due to some temporary constraints, such as if we are chewing gum at the same time, or have a
congested nose, or are whispering or shouting, or whatever other circumstances force the speech muscles to act
differently from the usual ways. This is called compensatory articulation, in which we can instantly modify, adapt and
correct our articulation by result-guided processes based on the audio-motor procedural memory stored with our
mirror neurons. "Audio-motor" = the coupling of sounds and speech gestures.8 All motor movements (including vocal
8 Of course, there is also input from sensory organs of touch in lips, tongue and pharynx, and proprioceptive information of muscular and joint
positions and movements, but "audio-sensory-propriocipio-motor" would be too cumbersome a word. Let "audio-motor" cover it all.
“feel” for the language. How do people speak? How do they interact? How do they interrupt one another? How do they
emphasize? How do they express astonishment? How do they laugh? How is their general voice setting? High pitched?
Low pitched? Tongue fronted? Tongue far back? Melodious? What kind of rhythm? Etc.
Next, I download the 100 bilingual9 free mp3 audio lessons on that language from 50Languages (book2.de) and listen to
them in random order, over and over again until I begin recognising and understanding many of the words and
sentences. So far this is pure ear training, and I don't try to speak yet. By the way, do not do this on-line, because then
you will be tempted to read the written texts, which will hinder you from hearing the fine and crucial details. Also, in
the case of Latinised transcriptions from other script systems, you can't always trust them. (For example, the Latinised
Arabic at 50Languages is almost totally incomprehensible.)
For the next stage, when I am getting more serious about one particular language, I will pick one of the lessons that
might be more interesting to know and that doesn't sound too difficult. It could be lesson 1 or 30 or any other lesson.
Ideally, if I have enough time, I will edit the lesson with Audacity, so that I can get many repetitions of each sentence
consecutively. Otherwise I will have to make do with “ordinary” repetitions of the lesson in toto. (Each lesson of
50Languages contains 20 sentences and is about 4 minutes long. All their languages contain exactly the same
sentences.)
NB. This is important: When I play my practice sentences, I now set my player to Repeat 1, so I can listen lots and
lots of times without having to press the play button every time. Hundreds of times. Maybe thousands of times, over and
over again. This is very efficient, and necessary for the training of my ears first.10 Particularly if it is a completely new
language of which I have no prior knowledge. Furthermore, since I will make perhaps 6 copies of every item in each
track (see below), I will get 6 exemplars (repetitions) of each even when I have not set it to Repeat 1, as when I review
my material at a later stage. This will efficiently remind me of all that I had forgotten. I know of no commercially
available material that is as good as this.
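If you prefer to script this step rather than copy-pasting in Audacity, the "6 copies of every item" idea can be sketched with Python's standard-library wave module. This is only a minimal sketch under stated assumptions: the clip is an uncompressed WAV file, and the file names and the six-copy default are illustrative, not taken from the tutorial.

```python
# Sketch: write N back-to-back copies of a sentence clip into one file,
# mimicking the consecutive repetitions described above.
# Assumes an uncompressed WAV clip; the paths are illustrative.
import wave

def repeat_clip(src_path: str, dst_path: str, copies: int = 6) -> None:
    """Concatenate `copies` repetitions of src_path into dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()      # channels, sample width, rate, etc.
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)         # the header is patched on close
        for _ in range(copies):
            dst.writeframes(frames)
```

Looping the resulting file with the player's Repeat 1 then yields hundreds of repetitions without pressing any buttons.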
In the beginning I set the volume of the player to quite loud to "push" the sounds into my head. Little by little my ears
will be "saturated", and I will be able to discern words and feel an urge to mirror and gradually to speak in unison with
the recording. Thanks to neural reflex circuits between the ears and the speech organs (the Lombard reflex, and others),
I too will be speaking in quite loud a voice, reflexively. This is good for training all my 146 speech muscles. And thanks
to some other nerve circuits (including the mirror neurons, that compel me to mirror and imitate correctly), it will
actually be quite difficult for me to pronounce with any other rhythm and melody than in my model sentences. That is, I
will automatically and irresistibly get the correct prosody from the beginning! If, at this stage, I don't get all the vowels
and consonants correct, this does not matter much, really, as explained above. I cheat or remain silent across the
difficult ones. As I stated above: The prosody is the easiest part of the new language.
Little by little I will start softening the sound level, more and more. Finally, I will hardly hear the sounds at all while I
still keep repeating. At that stage I will speak it almost by myself, like a native! Without the help of a teacher. But direct,
immediate feedback with comments by a live, well-educated and dedicated teacher with the same amount of patience
would of course have been even better, much better.
With this method I can fairly quickly learn the pronunciation, at least the prosody, of any language. I only need a few
short recordings. I edit them, and then listen to them hundreds of times. I can even have them droning off the car media
player while I am driving, because being repetitive they don't distract my attention from driving, while I can still listen
attentively enough to train my ears and my mirror neurons.
Initially I don't necessarily have to understand anything at all, but of course it would be more fun and efficient if I
could. With time I will be able to discern more and more. I will be like a little child conquering his or her first language,
but I will do it faster than a child. With my recordings, I have no teacher who gets fatigued, no difficult letters, no
boring text, no complicated grammar, no confusing explanations. Only pronunciation, pronunciation, pronunciation,
pronunciation, ... Particularly the rhythm and intonation; the prosody. When my new pronunciation is ready (!) after
some time with hundreds of exercises with the same small set of practice sentences, then it is time for me to move
on with a good textbook and/or teacher. I will be on the approximate level of a native 2-4 year old toddler. That is, I will
have a native or near-native prosody, as explained above, and a small vocabulary in idiomatic constructions. But in
addition, I will also have quite good command of most if not all the vowels and consonants, because my speech
9 “Bilingual” means that you will have one “start” language saying a sentence and one target language immediately saying the translation, often
with both a male and female voice. The start language does not have to be my first language. Rather, it should be another language that I have
already studied previously and master fairly well, at least on this simple level that 50Languages offers. In this way I will keep bettering the
previous language too, instead of forgetting it, while I'm learning the new language. This is called the ladder method and is recommended by the
real polyglots. Surprisingly, this will lead to NOT mixing them up. It is as if each language is assigned its own specific box in my
brain, instead of all of them being thrown into the same general “foreign-language box”, with the inevitable muddling and confusion
that results if I study my foreign languages sequentially instead of in parallel.
10 Being 73 years old, I do need many repetitions. But age does not prohibit me from learning and acquiring native or near-native pronunciation.
How could it? On the neuro-molecular level, the learning processes are still the same as when I was new-born.
apparatus is mature (in contrast to a toddler, that is). And I will have a basic, working vocabulary and a set of useful
sentences. My basic practice sentences will now be my calibration sentences for all of my future learning. The front
door to my new language will be wide open. I can begin functioning in a simple conversation. Fortunately, my
interlocutors can't know what I do NOT know. Thanks to my pronunciation they will think I know very much more than
I actually do, even when I hesitate and don't find the right words. They will find it natural that I still have some
empty slots in my command of their vocabulary, and they will not know that nearly all slots are still totally empty... As a
result it will be easy for me to make contact with native speakers; they will not shun me because of my pronunciation.
On the contrary, they will respect me because I am respecting their language.
This situation, in my opinion, is far better than hurrying through a language course and superficially learning many lessons,
but with unbrushed prosody and faltering pronunciation, hoping that I will deal with that little detail later on. Because
the sad truth, as you may have inferred by now, would most likely rather be that I would have learnt and automatized
such unbrushed, “broken” pronunciation that neither I myself nor my teacher nor any other native interlocutor will like,
much less respect. And that will be very difficult to remedy at a later stage.
An advantageous spin-off effect of the Quality Repetition method is the fact that, in all languages, there are close
connections between the pronunciation and the grammar, particularly between their prosody and syntax. Hence,
focusing so hard on the pronunciation of whole sentences initially will also help me approach and master the grammar
better later on.
I will also claim that the method I advocate here is very time-efficient. Because it will not take a long time to master 20-
30 sentences to the level I aspire to. Of course the required time is very individual, depending on many factors such as
previous experience of learning languages, time available for practice, and the difficulty of the particular language. But
I would dare say that on an average it should take not much more than, say, 100 hours or so of active exercises. The
other alternative, that of initially learning a “broken” pronunciation, will take most people more than a lifetime to
repair!
More than occasionally I encounter adult learners of Swedish who speak with an ever so slight foreign accent or even
no detectable foreignness at all. Asked about how they attained this, they will recount something similar to the method I
advocate here. Some people have that innate instinct and motivation. However, and unfortunately, the majority who
don't have that innate drive are usually let down by the educational system. Deliberate practice of pronunciation is not
pervasive in the classrooms, because most teachers feel undereducated and insecure about how to do it. This paper suggests
the best remedy so far. It is targeting both teachers and learners. Please do as I tell you! :-)
5 Research
The scientific and empirical underpinnings for this method are sketched in my 1998 article "Accent Addition: Prosody
and Perception Facilitate Second Language Learning" (see link in the bibliography), and detailed in my 2002 book
"[Pronunciation, Language and the Brain. Theory and Methods for Language Education]" with more than 200
annotated references (sorry, only in Swedish so far). But when they were written, we didn't know as much about mirror
neurons as we do now. So the present paper is an important update.
Classroom research on pronunciation and its teaching methods is very difficult to perform rigorously, and there are no
“hard data” on how this Quality Repetition method fares in reality. Teacher educators on higher academic levels never
fail to point that out. It is their duty by profession and training to take a critical stance on all new methods and fads. I
too have a PhD in speech science and am also trained in higher medical research, so I know. However, there are no “hard
data” on how the hitherto used methods or non-methods fare either, except that in this case we can find a clear answer
from the results of a large-scale de-facto experiment, namely traditional language teaching: Go out and speak with any
“foreigner” you can meet and listen to their usually quite broken pronunciation. That is the result of the presently used
“methods” of pronunciation teaching in the second-language classrooms. They are the victims of a gigantic, unscientific
test with no control group. To be sure, there are scientific articles reporting on the foreign-accentedness of L2 learners
after so-and-so long time in the new country, but in none of those that I have seen are the modes and amounts of
classroom instruction and practice detailed in the text! They seem to consider time to be the main or even the only
factor influencing the ultimate, inevitable attainment of L2 pronunciation. Here I want to repeat my strong contention
that faulty pronunciation of a second or foreign language is much more the result of suboptimal instruction and practice
than of age, or whatever.
So, in conclusion, dear fellow teachers and learners, at least try out the Quality Repetition method seriously and
diligently! Things cannot get any worse than they already are without it.
6 An interesting article
Here is a fantastic review by Patricia Kuhl (2010) of brain mechanisms in language acquisition. I particularly liked this
figure showing focal activities in the brains at birth and at 6 and 12 months of age, when subjected to audio input of
speech.
The brain silhouettes in the upper row show activity in the so-called Wernicke's area, where speech input is received
(from the primary auditory centres in the temporal lobes), analysed, parsed, processed, interpreted and understood. So it
is no surprise that this area is activated by auditory input, most vigorously in the newborns and subsequently reflecting
increasing automaticity in the older infants.
The lower row shows activity in the motor output area, called Broca's area, where muscular action sequences for facial
and speech muscles are composed, before sending them to the actual motor neurons a bit further back in the brain for
subsequent direct neural commands to the muscles. Note that this recorded motor activity still comes from audio input of
speech. This is because Broca's area is one important area where mirror neurons develop and function to help recognize familiar
sounds after training, by comparing the input sounds with the sounds that the individual can produce himself or herself through
activity in this very area. So in Broca's area there is no activity in the newborn brains, but increasing activity with age,
i.e., with training! The same brain activity development can be predicted in adults exposed to a completely unknown
language on the first day (equivalent to newborn) and after 6 and 12 months of intensive and extensive listening and
practising. (I hope that this kind of study on adult learners will be done soon. Or perhaps it has been done already?)
This excellent review paper thus supports the notion of mirror neurons' involvement in audio-motor learning and
audio-motor memory templates for goal-guided speech production, as I explained above.
The paper also reviewed studies that show that multi-national pre-school staff can safely use their respective L1s (first,
or best, languages) when interacting with the children, even if they don't speak the same languages. The children's
brains will develop normally in response to each language and attain a higher cognitive level than if exposed only to
one language. Kuhl states, "... post-exposure MMN phonetic discrimination data revealed that infants showing greater
phonetic learning had higher cognitive control scores post-exposure. ... Taken as a whole, the data are consistent with
the notion that cognitive skills are strongly linked to phonetic learning at the initial stage of phonetic development." 11
Other research reviewed shows that in listeners' brains "cognitive effort is increased when processing nonnative
speech," and that training studies "show that adults can improve nonnative phonetic perception when training occurs
under more social learning conditions, and MEG measures before and after training indicate that neural efficiency
increases after training."12
11 MMN = MisMatch Negativity, a kind of electro-encephalography revealing if a particular perception happens or not.
7 Minimal pairs
Here comes a serious warning: Don't ever practice much with minimal pairs! Minimal pairs are good for phonological
research and for making learners aware of crucial, phonological distinctions, such as of the vowels in ship and sheep, or
the initial consonants in tin, thin and sin. So, of course some listening practice and some pronunciation practice with
minimal pairs will obviously have to take place, but only initially, for creating that awareness. Not more. They should
never be automated pairwise, because of Hebb's principle, "neurons that fire together, wire together." That is, if the
words are automated together, they will always pop up in my mind together. Even if (or, rather, particularly if) I master
the distinction to exquisite perfection after pairwise practice, then every time I am about to say one of them in context,
both of them will appear in my mind as if in a multiple-choice test, forcing me to hesitate for a fraction of a second, and
distressingly often pick the wrong one. Usually, I will notice the mistake and immediately correct myself. But my
fluency will be ruined, a totally unnecessary break that will embarrass me every time. "Oh horror! I chose the wrong
word again even though I know perfectly well that a stalactite – or was it a stalagmite? – hangs from the ceiling … or
was it vice versa? ..." – So please be sure to avoid that trap. Don't ever practice much with contrastive pairs on the same
day!
A conspicuous example of the destructiveness of minimal-pair exercises is the /r/ versus /l/ issue for Japanese learners
of English. They will struggle with that pair daily ever since they begin learning English in school. Even those who are
highly proficient in the English language as well as in the phonetic realization of [r] and [l] will fumble with them
almost every time and make many unnecessary and sometimes embarrassing mistakes. On the other hand, those
Japanese persons whom I met who spoke Swedish, Russian, Tibetan, Chinese or any other foreign language generally
fared much better, making far fewer or no such mistakes. Presumably they did not practice light-right etc. as minimal
pairs in those other languages, which they learned after English.
This happens not only in pronunciation but in grammar and vocabulary too, such as gender le-la in French or en-ett in
Swedish. I'm sure every reader of this paper can recognize the situation. For instance, native speakers of English have a
notorious tendency to pick the wrong alternative of their and there and even they're in writing their own language. This
is not due to low education or low IQ but more likely to natural Hebbian muddle-up: Their teachers likely were very
meticulous about teaching the distinction a zillion times at school, but ... but ... So don't ever practice much with two
similar things. Put them each in their own natural (and different!) context, and Quality Practice only one of them on the
first day, and the other one on another day much later. For instance, Monday: There was a fluffy sheep in the barn.
Thursday: I saw a big ship in the harbour.
A pervasive, non-linguistic example of the deleterious effect of pairwise practice is many people's notorious difficulty
in even keeping right and left apart. I call it the "stalactite-stalagmite effect". Completely unnecessary, if you ask me.
prosody well, reductions will come naturally. And vice versa: if you make the reductions well, the prosody will come
out more naturally. But if you learn from the writing, you may miss the reductions completely, and thereby the prosody
too. Or, if you learn Japanese "arigato" from a romanized transcription, you may be led to think that the Latin letter r
represents an r sound, which, however, happens to be utterly wrong.
So, if possible, shun the writing until you can speak! Billions of toddlers can't be wrong about this “method”. :-)
The next main section will tell you, step by step, how to use the free Audacity sound editor to customise your own audio
lessons. Other software also exists, particularly the Ear2Memory mentioned in the introduction, but when I wrote the
bulk of this article, I was only familiar with Audacity. It would take too much time to rewrite the cookery-book, or even
to update it for its recent evolution. So here you are, dear reader! I'm confident that you can adapt it to your own needs
and wishes.
Get Audacity at
http://www.audacityteam.org/download/ (Fig. 1).
Audacity means boldness, courage. It is pronounced [ɔːˈdæsəti]
with the stress on -da; it's not a "city". :-)
I don't know why they chose this name, but the software is
fabulous. Powerful. You can do lots of things with it and have lots
of fun.
At the time of writing this section, I was using Audacity version
2.0.5. Today, version 3.0.2 is available with many new functions.
Please bear with any differences in the screenshots.
Fig. 1
Do have a look at one or more of the many tutorials that are
available in English and many other languages.
They are under the Help tab (Fig. 2).
Now, start your Audacity and start enjoying!
Fig. 2
Fig. 3
Sometimes it may be difficult to set your computer and Audacity for recording from a microphone or the speaker sound,
e.g., from YouTube or some pod radio. If so, ask someone who understands your computer to help you.
You will probably have to install a separate component to handle mp3 files. If so, follow the link and tips that may pop
up and install that component too. Or else, skip mp3 and use only wav.
Hint: When using a microphone, be sure to place it at your cheek a little bit behind the angle of your
mouth, so as not to blow air into the mic and cause a noisy recording.
NB: In most laptops the built-in mic makes rather low-quality sound, so a separate mic is recommended!
More hints: If you want to make phonetical analyses, use the wav format, not mp3. The program of first
choice for phonetics is Praat (Dutch for "speech"). Praat too is free, extremely versatile and powerful and
used by most of the phoneticians in the world. Unfortunately it is not so intuitive, but there are lots of
detailed help files, tutorials and active user groups. Download it from http://www.fon.hum.uva.nl/praat/
One very good tutorial for both Praat and phonetics is available at http://swphonetics.com/praat/ by
renowned Swedish-British phonetician Sidney Wood.
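Before analysing a recording, it can help to verify that the file really is an uncompressed WAV and to check its basic format. Here is a minimal sketch using Python's standard library; the function name is my own, and it is only a convenience check (Praat shows the same information in its Info window):

```python
# Sketch: report the basic format of a WAV file before phonetic analysis.
# Assumes an uncompressed PCM WAV; mp3 files will be rejected by wave.open.
import wave

def wav_info(path: str):
    """Return (channels, sample_width_bytes, frame_rate_hz, duration_s)."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(), w.getframerate(),
                w.getnframes() / w.getframerate())
```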
Examine what happens with the green Play button when you press Shift!
The selection can be treated like any selected text in Word or OpenOffice: Copy, Cut, Paste, Delete, Move, Change,
and very much more. We will return to that later. Please do experiment with various menu choices and keyboard
commands that you think look interesting!
Hint: Everything that you do can be undone in the ordinary way with Ctrl+Z or Edit → Undo. You
can undo exactly EVERYTHING that you may have done. And redo, and re-undo what you redid, etc.
For Redo, press Ctrl+Y. Try it! Do several things and undo and redo them back and forth as much as you
like. You can even undo opening the file, and undo undoing that you undid opening the file. :-)
Undo and Redo also exist as buttons with curved
arrows as in many other ordinary programs today.
(Fig. 7).
Fig. 7
Hint: When you quit the program, it will ask if
you want to save changes. Always reply No!
(Fig. 8) I will explain why below.
Fig. 8
11 Zooming
Look at the View menu.
There are several alternatives for zooming in and out. (Fig. 9)
Try them out, and learn the keyboard commands! That will speed
up and simplify your work significantly.
Fig. 10
© Olle Kjellin 2021: Kjellin-Practise-Pronunciation-w-Audacity 18/26
May be updated at any time; this is version 2.7 last edited on July 14, 2021 at 16:31
← If you place the marker on the lower edge of the stereo sound
channels, you can resize both channels up and down symmetrically.
(See Fig. 11 a and b on the left.)
Fig. 11
If you place the marker on the line
between the channels, you can resize them
reciprocally (i.e., make one wider, the
other one narrower).
(See Fig. 12 a and b on the right)→
Usually there is no need to do this.
Fig. 12
12 Stereo or mono?
Usually mono is enough (it occupies half the
space on my hard disk), so I will remove one
channel.
Click the little triangle ▼ (Fig. 13), and get a
drop-down menu Fig. 14).
Choose Split Stereo to Mono, and the
channels will split into two identical mono
channels.
Pick either one and close it with the little
cross × in its upper left corner (Fig. 15).
Fig. 13
The other option here (Split Stereo Track) will keep the right and
left channels different, as in the original (if you really used a stereo
microphone). You might want to experiment with each channel
separately, and then join them again. You may get funny or artistic
effects!
Fig. 14
However, for the purpose of pronunciation exercises, mono is enough, occupies the least space
on your drive, and is the best choice.
Fig. 15
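For the curious: the averaging that Audacity performs here can be sketched in a few lines of Python using only the standard library. This is just an illustration of the idea, not Audacity's actual code; the file names are hypothetical, and the sketch assumes an ordinary 16-bit stereo WAV file.

```python
import array
import wave

def stereo_to_mono(src_path, dst_path):
    """Average the two channels of a 16-bit stereo WAV into one mono file,
    roughly what Audacity's stereo-to-mono conversion does."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        rate = src.getframerate()
        frames = array.array("h", src.readframes(src.getnframes()))
    # Samples are interleaved L, R, L, R, ...: average each pair.
    mono = array.array("h", ((frames[i] + frames[i + 1]) // 2
                             for i in range(0, len(frames), 2)))
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

The mono result is half the size of the stereo original, which is why it occupies half the disk space.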
Hint: Remember that you may Undo (Ctrl+Z) at any time, and Redo (Ctrl+Y) (and "un-undo" and "un-
redo") as many times as you like, if needed or wanted. If you should ever feel total panic, wondering
what on earth you have done, just close the program and, as always, answer No to the question whether
you want to save changes! Next time you open the file, everything will be as it was from the beginning.
The original recording is never affected by our manipulations.
Hint: If you temporarily stop the recording during class and then start recording again, a new
track will be created below the previous one. This does not matter much, but it makes the editing
cumbersome afterwards. It is better to continue recording in the same track as before. To achieve this,
press Shift+Record (Shift+R).
Alternatively, use the Pause button instead of Stop. Then just un-pause to continue recording.
Fig. 16
Fig. 18
Fig. 17
Generally it is best to accept the suggested degree of amplification. But if you think it
got too loud, just Undo (Ctrl+Z) and then run Amplify again with a lower dB value.
Repeat until you are satisfied.
Fig. 19 shows the result of the 21.4 dB amplification in this particular example.
If we instead want to make the sound softer, we use the same Amplify menu
but put a minus sign (-) in front of the dB value.
Fig. 19
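Under the hood, Amplify simply multiplies every sample by a linear factor derived from the dB value: a positive dB makes the sound louder, a negative dB softer. A minimal sketch of that arithmetic (not Audacity's actual implementation), with clipping to the 16-bit sample range:

```python
def amplify(samples, db):
    """Scale 16-bit samples by `db` decibels, clipping at the 16-bit limits.
    A positive dB value amplifies; a negative one attenuates."""
    factor = 10 ** (db / 20)  # decibels -> linear amplitude factor
    return [max(-32768, min(32767, int(s * factor))) for s in samples]
```

For example, +20 dB multiplies the amplitude by 10 and -20 dB divides it by 10, so the 21.4 dB in this example corresponds to a factor of about 11.7.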
Hint: Sometimes there are spikes of artefact noises in the midst of the utterance that I want to amplify.
Then I zoom into the noise until I can delimit and select only the spike, exactly, and de-amplify it
significantly. Finally I will zoom out again and amplify the whole utterance in the usual way as described
above, the noise being gone.
Hint: After selecting something, but before doing anything with the selection, I press Z on the keyboard.
This will move the edges of the selection to the nearest zero value in the amplitude curve. This essentially
removes the risk of getting irritating clicks in the manipulated result. (I press Z so often that it has become
like a subconscious reflex, even if it is often unnecessary. But it takes less than a second, and nothing can
be destroyed.)
Fig. 20 shows a very zoomed-in picture of the left edge of a selection before I pressed "Z", and Fig. 21
shows the result after "Z". Notice how the edge of the selection and the amplitude curve now cross the zero
line at the same place.
Edit: More recent versions of Audacity include good click-removal algorithms.
Fig. 20
Fig. 21
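What the Z key does can be pictured as a search for the sample nearest the selection edge where the waveform crosses the zero line. A simplified sketch of that idea (Audacity's actual algorithm differs in its details):

```python
def nearest_zero_crossing(samples, idx):
    """Return the index closest to `idx` where the waveform is zero or
    changes sign, i.e. a point where a cut will not produce an audible click."""
    best, best_dist = idx, None
    for i in range(len(samples) - 1):
        if samples[i] == 0 or (samples[i] < 0) != (samples[i + 1] < 0):
            dist = abs(i - idx)
            if best_dist is None or dist < best_dist:
                best, best_dist = i, dist
    return best
```

Cutting at such a point means the spliced amplitude curve continues smoothly from zero, which is exactly why the clicks disappear.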
Fig. 22
If your recording's tempo is too slow, you can speed it up with a positive percentage. I do it most of the time with the
Book2 recordings, especially with the renderings in my own language. (Most speech samples in Book2 are somewhat
slow, as they are intended for learners, and are then combined to be used bilingually in a great number of possible
permutations.)
Remember (again) that you can always Undo (Ctrl+Z) and try other values until you are satisfied. Or just for fun!
16 Prepare sound tracks for practising with your smartphone, CD, mp3 or
computer
Let's assume that I have an audio recording from a language class, or a chat over a cup of coffee with friends, or a radio
program, or a TV drama, or an old language course on a cassette tape, or something from YouTube, or whatever, with
useful phrases that I want to practice my pronunciation with. In the following example I have chosen a little phrase
embedded in a dialogue. The phrase happens to be about 2.31 seconds long (displayed in the bottom margin; Fig. 24).
This duration is very suitable for pronunciation exercises. Remember that! About 2 seconds is the best duration for
practice sentences! Maybe even shorter when you are a beginner, and probably quite a bit longer when you are getting
more advanced. I listen a couple of times with Shift+Spacebar (=Shift+Play), and take a note of its time position that
is displayed along the upper border; in this case just before 15 seconds measured from start (Fig. 24, upper margin).
This is useful to know if the total recording is very long and I might get lost when I zoom out...
I then press Z and modify the amplitude and tempo as above, if needed.
I also want some "air" around my practice phrase, so I will create silence before and after it. I zoom in a bit and put the
marker at the left edge of my selection, press Z and click the menu Generate → Silence (Fig. 25) and get a dialogue to
choose the duration of the silence, for example 2 seconds (Fig. 26). I do the same at the end of the selection.
My track now looks like in Fig. 27; no sound is lost, just pushed aside by 2 seconds in each direction:
Fig. 27
Hint: Be sure now to extend the selection a little bit into the silences, specifically about 600 ms
(milliseconds) at the end. About 600-800 ms (0.6-0.8 seconds) of silence between the repetitions,
neither longer nor shorter, will typically make it easy to practice in unison with the recording at a
comfortable rhythm. The interval also depends on the rhythm and tempo of the sample. Test it by Shift-
playing your selection a couple of times; stop, adjust the included silences and Shift-play again, until
you obtain the rhythm that feels most comfortable to you.
The next thing is to make the selection repeat itself a couple of times. Go to Effect →
Repeat... (Fig. 28) and specify the number of repetitions (Fig. 29). I often enter 5,
which will give me 6 exemplars total (Fig. 30).
Fig. 29
Fig. 28
Fig. 30
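The two steps above (Generate → Silence around the phrase, then Effect → Repeat) boil down to simple sample arithmetic. A minimal sketch, assuming the phrase is a plain list of samples and using the ~700 ms pause recommended in this tutorial:

```python
def build_practice_loop(phrase, rate, pause_ms=700, repeats=5):
    """Return the phrase repeated `repeats + 1` times in a row, with
    `pause_ms` of silence before each copy: roughly what Generate -> Silence
    followed by Effect -> Repeat produces in Audacity."""
    pause = [0] * (rate * pause_ms // 1000)  # digital silence = zero samples
    return (pause + phrase) * (repeats + 1)
```

With repeats=5 you get the six exemplars mentioned above, each separated by the same breathing pause.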
Hint: This 600-800 ms silent interval between the repetitions gives precisely enough time for breathing and
contemplating how to modify one's pronunciation for the next round. This is all about chorus
practice together with the recording, not the "listen and say after me" of olden times. (The "listen and
say after me" procedure is ineffective in the beginning of second-language learning; it is perhaps better a little
later on, when the pronunciation is solidly mastered.)
While my six exemplars of the practice sentence are still selected, it is time to save them. However, in Audacity we
typically don't "save" the file, but export selection. Go to menu File → Export selection... (Fig. 31):
Fig. 31
...and first choose a suitable location to save it, and then a suitable file name (for instance part of the sentence itself). I
can also choose the file format, such as MP3, WAV, AIFF, Ogg or other (Fig. 32):
Fig. 32
Hint: Write the track number before the file name (with a leading zero for 01-09). This will simplify the
sorting later.
Hint: I put my practice sentences in Dropbox directly. This will give me immediate back-ups in case of a
hard-disk crash after all this work, and best of all, I can access the most recent version of my files at once
from any other computer and my smartphone. No need for a memory stick.
If you haven't got Dropbox yet, please use this "invitation" link from me http://db.tt/tsfzycJ4, and we
will both get a little extra bonus space.
Remember when you close the program to reply No to save. We have already exported what we wanted to keep.
Extra: If you reply Yes to Save, you will save a Project, a special Audacity file that is quite big but
allows many exciting possibilities. For instance, you can annotate your recording. Or you can create
music, or sing in chorus with yourself in several different tracks while you are playing various
instruments in several other tracks. You can manipulate and mix them in innumerable ways. Professional
musicians do so. There are lots of fun things to do with Audacity. When you are ready, you concatenate
them all into a final version with two stereo channels, export them to a WAV file, burn ten CDs, and go
sell them on the Flea Market on Saturday! Or at least one CD to your mother.
17 Now YOU try! Experiment with Audacity and yourself. Nothing can go wrong!
18 More hints
For quickly and easily getting the pitch contour (also known as F0 extraction) of your practice sentence(s), please use
the free program WaveSurfer (not for stereo tracks). Read about it here: http://en.wikipedia.org/wiki/WaveSurfer
and download it from here: https://sourceforge.net/projects/wavesurfer/
In the Glossika group on Facebook, some very good suggestions came up:
Alexander Giddings wrote:
It just occurred to me that the quickest and most effective way to edit the A files [of the Glossika language courses]
may be simply to use the repeat function over each group of two target sentences (following the primer) and then the
truncate silence feature over the whole file once you are finished, which will give you a pause of exactly the same
length (i.e. 600-800 milliseconds) between each repetition and between each group of repetitions. ... There is one
downside, however, which is that any sentence-internal pauses (as in the mini-dialogues) longer than the specified
truncate length will be condensed in the same way.
Rand added:
Here is how I quickly edit glossika files down for choral repetition in Audacity: use the "sound finder" feature. It
will automatically find each phrase and break it up for you. It won't break up short pauses within the phrase because
you set the duration of silence that it considers a break. You can also tell it how much before or after the phrase
(silence) to include in the output file. Set this really short and put your iPod on repeat and you have mobil choral
repetition. Then you export and it will auto sequentially name the files for you, I always make it 1-50 for each c file
(ex sentence 605 is En_Zhs_13_05, meaning 13th C file, track 5). Takes me about 3 minutes from start to finish
breaking the C file into individual files then putting them together as a playlist on iTunes and putting it on my
phone.
19 Other software
New info, March 10, 2019: There is a very neat program and app called WorkAudioBook, with which you can easily
replay any sentence or phrase of an mp3 audio file (e.g. audio books and 50Languages lessons) a zillion times for
language practice, including automatically finding and jumping to the next phrase. (Thanks to Piotr Szafraniec for this info.)
With its associated software you can extract sound tracks, including subtitles, from films and YouTube videos and
save them to mp3.
http://workaudiobook.com/WorkAudioBook/Download(Windows).aspx
I haven't yet found a way to change the tempo as in Audacity, but maybe there is one.
I think that Audacity, used as I explained in this tutorial, is superior for the beginning stages of language learning. But
when you get so advanced that you can enjoy audio books, you will probably prefer WorkAudioBook. As for myself,
I'm not very fond of films and audio books, but I will perhaps use it for news podcasts in various languages.
23 Selected bibliography
Cattaneo, L., & Rizzolatti, G. (2009). The Mirror Neuron System. Archives of Neurology, 66(5), 557–560. Available at
http://archneur.jamanetwork.com/article.aspx?articleid=796996
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The Role of Deliberate Practice in the Acquisition of
Expert Performance. Psychological Review, 100(3), 363–406. Available at:
http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/DeliberatePractice(PsychologicalReview).pdf
Ericsson, K. A. (2000). How experts attain and maintain superior performance: Implications for the enhancement of
skilled performance in older individuals. Journal of Aging and Physical Activity, 8, 346-352. (Updated excerpt
available at: http://www.psy.fsu.edu/faculty/ericsson/ericsson.exp.perf.html or
http://www.freezepage.com/1404355998UGCCCQIQAR)
Hurford, J.R. (2002). Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution. In
D. Kimbrough Oller, U. Griebel, & K. Plunkett, eds. The Evolution of Communication Systems: A Comparative
Approach. Cambridge MA: MIT Press. Available at: http://www.lel.ed.ac.uk/~jim/mirrormit.pdf.
Kjellin, O. (1999). Accent Addition: Prosody and Perception Facilitate Second Language Learning. In O. Fujimura, B.
D. Joseph, & B. Palek, eds. Linguistics and Phonetics Conference 1998 (LP’98). Columbus, Ohio: The Karolinum
Press, pp. 1–25. Available at: http://olle-kjellin.com/SpeechDoctor/ProcLP98.html. (Recommended reading!)
Kjellin, O. (1977). Observations on consonant types and “tone” in Tibetan. Journal of Phonetics, 5, 317–338.
Kjellin, O. (2002). Uttalet, språket och hjärnan. Teori och metodik för språkundervisningen [Pronunciation, Language
and the Brain. Theory and Methods for Language Education]. [book in Swedish] Uppsala: Hallgren och Fallgren
Studieförlag AB.
(The book is out of print as of Feb 2021, and I'm now in the process of uploading a slightly revised version of it as a
free google doc for anyone to enjoy: https://bit.ly/Uttalet)
Kuhl, P. K. (2010). Brain mechanisms in early language acquisition. Neuron, 67(5), 713–27.
https://doi.org/10.1016/j.neuron.2010.08.038
Rizzolatti, G. (2005). The mirror neuron system and its function in humans. Anatomy and Embryology, 210(5-6), 419–
21. Available at: http://link.springer.com/article/10.1007/s00429-005-0039-z?LI=true
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. WIREs Cogn Sci. Retrieved
May 14, 2012, from http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WCS78.html
Skoyles, J.R. (1998). Speech phones are a replication code. Medical Hypotheses, (50), pp.167–173. Available at:
http://human-existence.com/publications/Medical Hypotheses 98 Skoyles Phones.pdf.
Tettamanti, M. et al. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of
Cognitive Neuroscience, 17(2), pp. 273–81. Available at: http://www.ncbi.nlm.nih.gov/pubmed/15811239.
***