Learning

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

research

DOI:10.1145/ 3584859
As an example, consider learning
Understanding how human memory and styles. Advocates of learning styles
claim that effective instruction match-
learning works, the differences between es learners’ preferred styles—visual
beginners and experts, and practical steps learners look, auditory learners listen,
developers can take to improve their learning, and kinesthetic learners do. A 2020 re-
view found that 89% of people believe
training, and recruitment. that learners’ preferred styles should
dictate instruction, though research-
BY NEIL C.C. BROWN, FELIENNE F.J. HERMANS, ers have known for several decades
AND LAUREN E. MARGULIEUX that this is inaccurate.28 While learners
have preferred styles, effective instruc-

10 Things
tion matches the content, not learning
styles. A science class should use graphs
to present data rather than verbal de-
scriptions, regardless of visual or audi-

Software
tory learning styles, just like cooking
classes should use hands-on practices
rather than reading, whether learners
prefer a kinesthetic style or not.

Developers
Decades of research into cogni-
tive psychology, education, and pro-
gramming education provide strong
insights into how we learn. The next

Should Learn
10 sections of this article provide re-
search-backed findings about learn-
ing that apply to software developers
and discuss their practical implica-

about Learning
tions. This information can help with
learning for yourself, teaching junior
staff, and recruiting staff.

1. Human Memory Is
Not Made of Bits
Human memory is central to learn-
ing. As Kirschner and Hendrick put
it, “Learning means that there has
been a change made in one’s long-
term memory.”20 Software developers
for software developers.
L E A R N I NG I S N ECE S S A RY are familiar with the incredible power
of computer memory, where we can
Change is perpetual: New technologies are frequently
invented, and old technologies are repeatedly updated. key insights
Thus, developers do not learn to program just once—
˽ Learning is vital for programmers, but the
over the course of their careers they will learn many new human mind works quite differently than
a computer.
programming languages and frameworks.
˽ Understanding how humans learn can
Just because we learn does not mean we understand
ILLUSTRATION BY LISA SH EEH A N

help you learn more effectively.

how we learn. One survey in the U.S. found that the ˽ The Internet and LLMs have not made
learning obsolete; learning is essential
majority of beliefs about memory were contrary to and takes time.

those of scientific consensus: People do not intuitively ˽ Expertise changes how you think, letting
you solve problems more easily but also
understand how memory and learning work.37 potentially hindering your ability to teach.

78 COM MUNICATIO NS O F TH E AC M | JA NUA RY 2024 | VO L . 67 | NO. 1


JA N UA RY 2 0 2 4 | VO L. 6 7 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 79
research

store a series of bits and later retrieve Spreading activation has a nega- working memory. Long-term memory
that exact series of bits. While human tive implication for memory1 and a is where information is permanently
memory is similar, it is neither as pre- positive implication for problem- stored and is functionally limitless;1
cise nor as reliable. solving.32 Spreading activation in that sense, it functions somewhat
Due to the biological complex- means that related, but imprecise, like a computer’s disk storage. Work-
ity of human memory, reliability is a information can become conflated ing memory, however, is used to con-
complicated matter. With computer with the target information, meaning sciously reason about information
memory, we use two fundamental our recall of information can be unre- to solve problems;2 it functions like
operations: read and write. Reading liable. However, spreading activation a CPU’s registers, storing a limited
computer memory does not modify is also associated with insight-based amount of information in real time to
it, and it does not matter how much problem solving, or “aha moments.” allow access and manipulation.
time passes between writes and Because pathways stay primed for Working memory is limited, and
reads. Human long-term memory is hours, sometimes stepping away its capacity is roughly fixed at birth.2
not as sterile. Human memory seems from a problem to work on a different While higher working-memory ca-
to have a “read-and-update” opera- one with its own spreading activation pacity is related to higher general
tion, wherein fetching a memory can causes two unrelated areas to connect intelligence, working-memory ca-
both strengthen and modify it—a in the middle. When two previously pacity is not the be-all and end-all
process known as reconsolidation. unrelated areas connect, creative and for performance.22 Higher capacity
This modification is more likely on unique solutions to problems can enables faster learning, but our un-
recently formed memories. Because arise. This is why walks, showers, or limited long-term memory removes
of this potential for modification, a otherwise spending time away from a limitations on how much we could
fact does not exist in a binary state problem can help you get unstuck in ultimately learn in total.1 Expert
of either definitively known or un- problem solving. programmers may have low or high
known; it can exist in intermediate In summary, human memory does working memory capacity but it is the
states. We can forget things we pre- not work by simply storing and re- contents of their long-term memory
viously knew, and knowledge can be trieving from a specific location like that make them experts.
unreliable, especially when recently computer memory. Human memory As people learn more about a topic,
learned. is more fragile and more unreliable, they relate information together into
Another curious feature of human but it can also offer great benefits chunks.a Chunking allows the mul-
memory is “spreading activation.”1 in problem solving and deep under- tiple pieces of information to act as
Our memories are stored in intercon- standing by connecting knowledge one piece of information in working
nected neural pathways. When we together. We will elaborate further memory. For example, when learn-
try to remember something, we acti- on this in later sections, especially ing an email address, a familiar do-
vate a pathway of neurons to access on retrieving items from memory and main, such as gmail.com, is treated
the targeted information. However, strengthening memories. as one piece of information instead
activation is not contained within of a random string of characters, like
one pathway. Some of the activation 2. Human Memory Is xvjki.wmt. Thus, the more informa-
energy spreads to other connected Composed of One Limited tion that is chunked, the larger work-
pathways, like heat radiating from a and One Unlimited System ing memory is functionally.38 Using
hot water pipe. This spreading activa- Human memory comprises two
tion leaves related pathways primed main components that are relevant a This is not an informal description: the techni-
for activation for hours.1 to learning: long-term memory and cal term is actually “chunks.”

Figure 1. Two ways of presenting the same database schema description with differing extraneous cognitive load.

The dashed box on the left contains exactly the same information as the awkward textual
description in the dashed box on the right. But if a developer only received one of the two
to create an SQL database, they are likely to find the diagram easier than the text. We say
that the text here has a higher extraneous cognitive load.

Team A team should have an id and a name. The name should be a text,
Player
the id should be numeric. The name should have a maximum length,
id number which is 32. There are also players: a player should have an id
id number compared
name text(32) (which, like teams, should be numeric), a name (that is text, but
name text to
role ?text unlimited in length), and role (although the role can be missing),
plays_for ?number and a plays_for which has the numeric id of their team.
This link to the team can be missing.

80 COM MUNICATIO NS O F TH E ACM | JA NUA RY 2024 | VO L . 67 | NO. 1


research

our computer analogy, our working respond more quickly and with less
memory/CPU registers may only let effort.15 Kahneman19,b describes cog-
us store five pointers to chunks in nition as being split into “system 1”
long-term memory/disk, but there is and “system 2” (thus proving that not
no limit on the size of the chunks, so
the optimal strategy is to increase the Expert developers only developers struggle with naming
things). System 1 is fast and driven
size of the chunks by practicing using
information and solving problems.
can reason at a by recognition, relying upon pattern
recognition in long-term memory,
When learning new tools or skills, higher level by while system 2 is slower and focused
it is important to understand the
cognitive load, or amount of working
having memorized on reasoning, requiring more pro-
cessing in working memory. This is
memory capacity, demanded by the common patterns part of a general idea known as dual-
task. Cognitive load has two parts:
intrinsic load and extraneous load.
in program code, process theories.34
Expert developers can reason at
Intrinsic load is how many pieces which frees up their a higher level by having memorized
of information or chunks are inher-
ently necessary to achieve the task; it cognition. (usually implicitly, from experience)
common patterns in program code,
cannot be changed except by chang- which frees up their cognition.4 One
ing the task. In contrast, extraneous such instance of this is “design pat-
cognitive load is unnecessary infor- terns” in programming, similar to
mation that, nevertheless, is part of previously discussed chunks. An ex-
performing the task. Presentation pert may immediately recognize that
format is an example of how extrane- a particular piece of code is carrying
ous cognitive load can vary. If you are out a sorting algorithm, while a be-
implementing a database schema, it ginner might read line by line to try
is easier to use a diagram with tables to understand the workings of the
and attributes than a plain English code without recognizing the bigger
description—the latter has higher picture.
extraneous load because you must A corollary to this is that beginners
mentally transform the description can become experts by reading and
into a schema, whereas the diagram understanding a lot of code. Experts
can be mapped directly (see Figure 1). build up a mental library of patterns
Extraneous load is generally higher that let them read and write code
for beginners because they cannot more easily in the future. Seeing pure-
distinguish between intrinsic and ex- ly imperative C code may only partial-
traneous information easily. ly apply to functional Haskell code, so
When faced with a task that seems seeing a variety of programming para-
beyond a person’s abilities, it is im- digms will help further. Overall, this
portant to recognize that this can be pattern matching is the reason that
changed by reorganizing the task. De- reading and working with more code,
composing the problem into smaller and more types of code, will increase
pieces that can be processed and proficiency at programming.
chunked will ultimately allow the
person to solve complex problems. 4. Understanding a Concept
This principle should be applied to Goes from Abstract to
your own practice when facing prob- Concrete and Back
lems at the edge of or beyond your Research shows that experts deal with
current skills, but it is especially rel- concepts in different ways than be-
evant when working with junior de- ginners. Experts use generic and ab-
velopers and recruits. stract terms that look for underlying
concepts and do not focus on details,
3. Experts Recognize, whereas beginners focus on surface
Beginners Reason details and have trouble connecting
One key difference between begin- these details to the bigger picture.
ners and experts is that experts have These differences affect how experts
seen it all before. Research into chess
experts has shown that their primary
b Parts of Kahneman’s book were undermined
advantage is their ability to remember by psychology’s “replication crisis,” which af-
and recognize the state of the board. fected some of its findings, but not the idea of
This allows them to decide how to system 1 and 2.

JA N UA RY 2 0 2 4 | VO L. 6 7 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 81
research

reason but also how they learn. an abstract concept, but after much
For example, when explaining a practice, a function becomes a con-
variadic function in Python to some- crete item (or chunk) to us and we can
one new to the concept, experts learn the next level of abstraction.
might say that it is a function that can
take a varying number of arguments. Problem-solving 5. Spacing and Repetition Matter
A beginner may focus on details such
as the exact syntax for declaring and
is (incorrectly) How often have you heard that you
should not cram for an exam? Unless,
calling the function and may think conceived as of course, you want to forget every-
that passing one argument is a spe-
cial case. An expert may more eas-
a generic skill. thing by the next day. This advice is
based on one of the most predictable
ily understand or predict the details However, this is and persistent effects in cognitive
while having the concept explained
to them.
not how problem- psychology: the spacing effect.10 Ac-
cording to the spacing effect, humans
When you are learning a new con- solving works in the learn problem-solving concepts best
cept, you will benefit from both forms
of explanation: abstract features and brain. by spacing out their practice across
multiple sessions, multiple days, and
concrete details with examples. More ideally, multiple weeks.
specifically, you will benefit from fol- The reason spacing works is due to
lowing the semantic wave, a concept the relationship between long-term
defined by Australian scientist Karl and working memory previously de-
Maton,25 as illustrated by Figure 2. scribed in this article. When learners
Following the semantic wave, you practice solving problems, they prac-
continuously switch between the ab- tice two skills. First, matching the
stract definition and several diverse information in the problem to a con-
examples of the concept. The more cept that can solve it (such as a filter-
diverse the examples are, the better. ing loop), and second, applying the
Even wrong examples are beneficial concept to solve the problem (such
when compared to correct examples as writing the loop). The first skill re-
to understand why they are wrong,23 quires activating the correct neural
such as seeing a mutable variable la- pathway to the concept in long-term
beled as non-constant when trying to memory.5 If learners repeatedly solve
learn what a constant is. This process the same kind of problem, such as
is called unpacking. for-each loop problems, then that
With these diverse examples, you pathway to long-term memory stays
can then (re)visit the abstract defi- active, and they miss practicing the
nition and construct a deeper un- first skill. A common result of un-
derstanding of the concept. Deeper spaced practice is that people can
understanding stems from recogniz- solve problems, but only when they
ing how multiple details from the ex- are told which concept to use.5 While
amples connect to the one abstract interleaving different types of prob-
concept in the definition, a process lems, such as loop and conditional
called repacking. problems, can help, pathways take
Programming frequently involves time to return to baseline, making
learning about abstract concepts. spacing necessary to get the most
Faced with an abstract concept to out of practice time.10 In addition,
learn, such as functions, people often the brain needs rest to consolidate
reach for concrete instantiations of the new information that has been
the concept to examine, for example, processed so that it can be applied to
the abs function that returns the ab- new problems.
solute value of a number.17 One chal- Going against this time-tested
lenge is that as concepts get more principle, intensive coding boot-
abstract (from values to variables/ camps require learners to cram their
objects to functions/classes to high- problem-solving practice into un-
er-order functions/metaclasses and spaced sessions. While this is not
eventually category theory), the dis- ideal, researchers of the spacing ef-
tance to a concrete example increas- fect have known from the beginning
es. The saving grace is that as we learn that most learners still prefer to cram
abstract concepts, they become more their practice into as little time as
concrete to us. Initially, a function is possible.10 For people whose only vi-

82 COMM UNICATIO NS O F THE ACM | JA NUA RY 2024 | VO L . 67 | NO. 1


research

able option to learn programming is joins. The wisdom of relying on the to memorize information, despite it
intensive bootcamps, we can apply Internet or AI differs between begin- being available on the Internet.
the spacing research to maximize ners and experts: There is a key dis-
their outcomes. tinction between a beginner who has 7. Problem-Solving Is
To structure a day of learning, never learned the details and thus Not a Generic Skill
learners should limit learning bouts lacks the memory connections, and Problem-solving is a large part of pro-
to 90 minutes or less.21 The neuro- an expert who has learned the deeper gramming. One common (but incor-
chemical balance in the brain makes structure but searches for the forgot- rect) idea in software development is
concentration difficult after this ten fine details.1 to directly teach problem-solving as a
point.21 After each learning bout, There is even some evidence to specific skill, which can then be ap-
take at least 20 minutes to rest.21 Re- suggest that searching the Internet plied to different aspects of develop-
ally rest by going for a walk or sitting is less efficient for remembering in- ment (design, debugging, and so on).
quietly—without working on other formation. One study found that in- Thus, problem-solving is (incorrect-
tasks, idly browsing the Internet, or formation was remembered less well ly) conceived as a generic skill. How-
chatting with others. Rest speeds up if it was found via the Internet (com- ever, this is not how problem-solving
the consolidation process, which also pared to a physical book).11 Another works in the brain.
happens during sleep. found that immediately searching While humans do have some ge-
Within a learning bout, there are the Internet led to worse recall of the neric problem-solving skills, they are
a couple of strategies to maximize ef- same information later, compared to much less efficient than domain-spe-
ficiency. First, randomize the order first trying to think of the answer be- cific problem-solving skills, such as
of the type of problem being solved fore resorting to searching.14 It seems being able to debug programs. While
so that different concepts are being that searching may rob the brain of we can learn to reason, we do not
activated in long-term memory.5 Be the benefits of the memory-strength- learn how to solve problems in gener-
forewarned, though, that randomiz- ening effect of recalling information. al. Instead, we learn how to solve pro-
ing the order improves learning out- There is also the issue of cogni- gramming problems, or how to plan
comes but requires more effort.6 The tive load discussed earlier. An Inter- the best chess move, or how to cre-
second strategy is to take short breaks net search requires a form of context ate a knitting pattern. Each of these
at random intervals to enhance mem- switching for the brain; its limited at- skills is separate and does not influ-
ory consolidation. A 10-second break tention and working memory must be ence the others. Research into chess
every 2-5 minutes is recommended.18 switched from the task at hand (pro- found little or no effect of learning
gramming) to a new cognitive task it on other academic and cognitive
6. The Internet Has Not (searching the Internet and selecting skills, and the same is true for music
Made Learning Obsolete a result or evaluating an AI-generated instruction and cognitive training.36
The availability of programming result). If the required knowledge is This inability to transfer problem-
knowledge changed with the advent instead memorized, then not only is solving skills is why “brain training”
of the Internet. Knowledge about syn- access much faster (like using a cache is ineffective for developing general
tax or APIs went from being buried in versus fetching from a hard disk), but intelligence.29
reference books to being a few key- it also avoids the cognitive drain of The one exception to this rule ap-
strokes away. Most recently, AI-pow- context switching and filtering out pears to be spatial skills. Spatial skills
ered tools such as ChatGPT, Codex, extraneous information from the allow us to visualize objects in our
and GitHub Copilot will even fill in search. So there are multiple reasons mind, like a Tetris shape, and mental-
these details (mostly accurately) for
you. This raises an obvious question: Figure 2. The semantic wave for variadic functions.
Why is it worth learning details—or
anything at all—if the knowledge is
available from the Internet within abstract understanding
seconds? variadic functions across different contexts
have multiple arguments and languages
We learn by storing pieces of
knowledge in our long-term memory abstract
and forming connections between
them.1 If the knowledge is not pres-
ent in the brain, because you have unpacking use* repacking
level of
not yet learned it well, the brain can- abstraction
not form any connections between args are lists
it, so higher levels of understanding
and abstraction are not possible.1 If
every time you need a piece of code concrete
to do a database join you search on- time
line for it, insert it, and move on, you
will be unlikely to learn much about

JA N UA RY 2 0 2 4 | VO L. 6 7 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 83
research

ly manipulate those objects, like ro- ple programming languages through- with it” view—and some believe it is
tating a Tetris shape. Training these out their careers. Knowing multiple almost entirely about practice—the
generic skills can improve learning in languages can be beneficial once “10,000 hours” idea that only suffi-
other disciplines. This phenomenon they have been mastered, but some- cient practice is required for exper-
is so unusual that it has caused much times transferring knowledge from tise. Both extreme views are wrong,
consternation in cognitive and learn- one programming language to anoth- and in this section, we will explore
ing sciences.24 Yet, spatial training er can lead to faulty knowledge. For the evidence for the differing effects
improves performance on a range of example, a programmer may learn of aptitude and practice.
non-verbal skills regardless of initial about inheritance in Java, where one There has been much research to
ability, age, or type of training task.40 method overrides a parent method try to predict programming aptitude
Recent work has even demonstrated as long as the signatures match, and but few reliable results. Attempts
that spatial training can improve ef- transfer this knowledge to C++, where to produce a predictive test for pro-
ficiency for professional software de- overriding a parent method addition- gramming ability have generally
velopers, likely because they are still ally requires that the parent method come to naught. Research has found
learning new concepts.30 Even with is declared virtual. These kinds of dif- that all of the following fail to predict
this strange exception, the best way ferences—where features are similar programming ability: gender, age,
to learn how to solve programming in syntax but different in semantics academic major, race, prior perfor-
problems is still to practice solving between languages—specifically hin- mance in math, prior experience with
programming problems rather than der the transfer of knowledge.39 another programming language, per-
looking for performance benefits Experts often help to train begin- ceptions of CS, and preference for
from learning chess or other cogni- ners, but experts without experience humanities or sciences.35 There was
tive training. in training others often do not real- an industry of aptitude tests for pro-
There is a secondary implication ize that beginners think differently. gramming that began in the 1960s,
here for recruitment. One popular Thus, they fail to tailor their expla- but as Robins33 summarizes, the pre-
idea for screening programming can- nations for someone with a different dictive accuracy was poor and the
didates was to give brain-teaser puz- mental model. This is known as the tests fell out of use.
zles, such as how to weigh a jumbo expert blind-spot problem: difficulty There is mixed evidence for the
jet. As Google worked out by 2013, in seeing things through the eyes of importance of years of experience,
this is a waste of time7—there is no a beginner once you have become an which relates to practice. There is a
reliable correspondence between expert. It can be overcome by listen- correlation between the reputation of
problem-solving in the world of brain ing carefully to beginners explain programmers on Stack Overflow and
teasers and problem-solving in the their current understanding and tai- their age: Older people have a higher
world of programming. If you want loring explanations accordingly. reputation.27 However, a recent study
to judge programming ability, assess Sometimes, however, knowledge found only a weak link between years
programming ability. becomes so automated that it is dif- of experience and success on a pro-
ficult for experts to verbalize it.1 This gramming task among programmers
8. Expertise Can Be Problematic automated knowledge is why experts who were relatively early in their ca-
in Some Situations have intuitions about how to solve reers,31 suggesting that aptitude may
We have discussed many ways in problems or explain their process as, have a stronger effect than experi-
which expertise benefits learning "I just know." In these cases of tacit ence, at least early in programmers’
and performance. However, being an knowledge, beginners might better careers.
expert can also lead to problems. learn from instructional materials As in most domains, two factors
Programmers use tools and aids designed to support beginners, of- that weakly predict success in early
to be more effective, such as version ten called scaffolded instruction, or programming are general intelli-
control systems or IDEs. Such tools from a peer rather than an expert. A gence and working memory capac-
can have different effects on begin- more knowledgeable (but still rela- ity.4 These factors roughly represent
ners and experts. Beginners may get tively novice) peer is a highly valuable reasoning skills and how much in-
overwhelmed by the amount of op- resource to bridge the gap between formation a learner can process at
tions available in professional tools beginners and experts. They can help once. As such, they predict the rate of
(due to increased cognitive load) and the beginner develop new knowledge learning rather than absolute ability.
may benefit from beginner-friendly and the expert to rediscover automat- A sub-measure of these two factors,
hints on how to use the tool. However, ed knowledge. spatial reasoning, is a stronger pre-
experts find the same hints more dis- dictor of success in programming,
tracting than useful because they al- 9. The Predictors of Programming though still quite moderate.30 Spa-
ready know what to do. This is known Ability Are Unclear tial reasoning also predicts success
as the expertise-reversal effect: Hints The success of learning program- in other science and math fields,24
and guides that help beginners can ming, like most activities, is built on so this is not programming-specific.
get in the way of experts and make a mix of inherent aptitude and prac- Further, these weak-to-moderate
them less productive. tice. Some people believe it is purely correlations largely disappear with
Programmers usually learn multi- about aptitude—the “you’re born increased experience for various rea-

84 COMM UNICATIO NS O F THE ACM | JA NUA RY 2024 | VO L . 67 | NO. 1


research

sons. Thus, intelligent people will mindset aligns with a practice view—
not always make good programmers, that people’s abilities are malleable.
and good programmers need not be Applied to learning, this mindset says
high in general intelligence. that if someone struggles with a new
In short, it is very hard to predict
who will be able to program, espe- Attempts to task, they can master it with enough
practice.
cially in the long term. Programmers
could come from any background or
produce a As described in Cheryan et al.,9 nei-
ther extreme view is true. For example,
demographic, and links to any other predictive test practically everyone can learn some
factors (such as intelligence) are gen-
erally fleeting in the face of experi-
for programming physics, even if they are not initially
good at it. However, practically no
ence. Therefore, in recruiting new ability have one can earn the Nobel Prize in Phys-
programmers, there are no shortcuts
to identifying programming ability,
generally come to ics, no matter how much they prac-
tice. Between these extremes, we are
nor are there any reliable “candidate naught. often trying to figure out the bound-
profiles” to screen candidates for aries of our abilities. When teachers
programming ability. and learners approach new tasks with
a growth mindset, they tend to persist
10. Your Mindset Matters through difficulties and overcome
There is a long-standing idea of a bi- failure more consistently.12
nary split in programming ability: While the evidence for this effect
You either can program or you can- is strong and intuitive, research sug-
not. There have been many compet- gests it can be difficult to change
ing theories behind this. One of the someone’s mindset to be more
more compelling theories is the idea growth-oriented.8 In particular,
of learning edge momentum,33 that there are two common misconcep-
each topic is dependent on previous tions about how to promote a growth
topics, so once you fall behind you mindset that prove ineffective. The
will struggle to catch up. A less com- first misconception is to reward ef-
pelling theory is the idea of a “geek fort rather than performance because
gene” (you are born with it or not), a growth mindset favors practice over
which has little empirical evidence.26 aptitude. But learners are not stu-
As discussed in the previous section, pid; they can tell when they are not
we have recently come to understand progressing, and teachers praising
differences in programming ability unproductive effort is not helpful.
as differences in prior experience.16 Instead, effort should be rewarded
Learners who might seem similar (for only when the learner is using ef-
example, in the same class, with the fective strategies and on the path to
same degree, completing the same success.13 The second misconception
bootcamp) can have vastly different is that when someone approaches a
knowledge and skills, putting them task with a growth mindset, they will
ahead or behind in terms of learn- maintain that mindset throughout
ing edge momentum or, within a the task. In reality, as we face set-
snapshot of time, making them seem backs and experience failure, people
“born with it” or not. A similar effect skew toward a fixed mindset because
is found in any highly technical field we are not sure where the boundar-
that is optionally taught before uni- ies of our abilities lie. Thus, we must
versity (for example, CS, physics, and practice overcoming setbacks and
engineering).9 failures to maintain a growth-mind-
The binary split view, and its ef- set approach.13
fects on teaching and learning, have A related concept to fixed and
been studied across academic disci- growth mindsets is goal orienta-
plines in research about fixed versus tion. This is split into two categories:
growth mindsets.12 A fixed mindset approach and avoidance. The “ap-
aligns with an aptitude view that proach” goal orientation involves
people’s abilities are innate and un- wanting to do well, and this engen-
changing. Applied to learning, this ders positive and effective learning
mindset says that if someone strug- behaviors: working hard, seeking
gles with a new task, then they are not help, and trying new and challenging
cut out for it. Alternatively, a growth topics. In contrast, the “avoidance”

JA N UA RY 2 0 2 4 | VO L. 6 7 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 85
research

goal orientation involves avoiding for recruiting and those for training
failure. This leads to negative and and learning.
ineffective behaviors: disorganized For recruiting, we make the follow-
study, not seeking help, anxiety over ing recommendations:

Intelligent people
performance, and avoiding chal- ˲ There are no good proxies for
lenge. It is important that learners programming ability. Stereotypes
can make mistakes without severe
penalties if they are to be directed to-
will not always based on gender, race, or other fac-
tors are not supported by evidence.
ward “approach” rather than “avoid- make good If you want to know how well candi-
ance.”
When learning a new skill or train-
programmers, and dates program, look at their previ-
ous work or test them on authentic
ing someone in a new skill, remem- good programmers programming tasks. To emphasize a
ber that approaching tasks with a
growth mindset is effective but also
need not be high in specific point: Do not test candidates
with brain-teaser puzzles.
a skill to be developed. Unfortu- general intelligence. ˲ At least among young developers,
nately, we cannot simply tell people years of experience may not be a very
to have a growth mindset and reap reliable measure of ability.
the benefits. Instead, nurture this ˲ A related recommendation from
skill by seeking or providing honest Behroozi et al.3 is to get candidates
feedback about the process of learn- to solve interview problems in a room
ing and the efficacy of strategies. For on their own before presenting the
mentors, praise areas where a men- solution, as the added pressure from
tee is making progress and accept an interviewer observing or requiring
that they will make mistakes without talking while solving it adds to cogni-
chastising them. For learners, reflect tive load and stress in a way that im-
on how skills have improved in the pairs performance.
past weeks or months when you are For learning and training, we make
doubtful about your progress. Fur- the following recommendations:
ther, expect that a growth mindset ˲ Reading a lot of code will help
will shift toward a fixed mindset in someone become a more efficient
the face of failure, but it can also be programmer.
redeveloped and made stronger with ˲ Experts are not always the best at
practice. Feeling discouraged is nor- training beginners.
mal, but it does not mean that you ˲ Learning takes time, including
will always feel discouraged. If you time between learning sessions. In-
feel like quitting, take a break, take tense cramming is not effective, but
a walk, consider your strategies, and spaced repetition is.
then try again. ˲ Similarly, spending time away
from a problem can help to solve it.
Summary ˲ Just because you can find it
Software developers must continu- through an Internet search or genera-
ally learn in order to keep up with tive AI tool does not mean learning
the fast-paced changes in the field. has become obsolete.
Learning anything, programming in- ˲ Use examples to go between ab-
cluded, involves committing items to stract concepts and concrete learn-
memory. Human memory is fascinat- able facts.
ingly complex. While it shares some ˲ Seeking to succeed (rather than
similarities with computer architec- avoid failure) and believing that abil-
ture, there are key differences that ity is changeable are important fac-
make it work quite differently. In this tors in resilience and learning.
article, we have explained the current Further reading. Many books on
scientific understanding of how hu- learning center around formal educa-
man memory works, how learning tion; they are aimed at school teach-
works, the differences between be- ers and university lecturers. How-
ginners and experts, and related it all ever, the principles are applicable
to practical steps that software devel- everywhere, including professional
opers can take to improve their learn- development. We recommend three
ing, training, and recruitment. books:
Recommendations. We have split ˲ Why Don’t Students Like School? by
up our recommendations into those Daniel T. Willingham provides a short

86 COMMUNICATIO NS O F TH E AC M | JA NUA RY 2024 | VO L . 67 | NO. 1


research

• more online and readable explana- Does stress impact technical interview performance? in the learning styles neuromyth, and does it
In Proceedings of the 28th ACM Joint Meeting on matter? A pragmatic systematic review. Frontiers in
A list of full tion of many of the prin- European Software Engineering Conf. and Symp. Education (2020), 5; https://bit.ly/47h0Jmo.
references and ciples of memory and on the Foundations of Software Engineering (2020), 29. Owen, A.M. et al. Putting brain training to the test.
supplementary 481–492; https://bit.ly/3Sj3AHn Nature 465, 7299 (2010), 775–778.
information how the brain works. 4. Bergersen, G.R. and Gustafsson, J.-E. Programming 30. Parkinson, J. and Cutts, Q. Relationships between
is available ˲ The Programmer’s skill, knowledge, and working memory among an early-stage spatial skills test and final CS degree
at https://bit. professional software developers from an investment outcomes. In Proceedings of the 53rd ACM Technical
Brain by Felienne Her- theory perspective. J. of Individual Differences 32, 4 Symp. on Computer Science Education 1, (2022),
ly/3G2NPNl.
mans et al.c relates (2011), 201. 293–299; 10.1145/3478431.3499332.
5. Bjork, R.A. and Allen, T.W. The spacing effect: 31. Peitek, N. et al. Correlates of programmer efficacy
these concepts to programming and Consolidation or differential encoding? J. of Verbal and their link to experience: A combined EEG and eye-
describes how techniques for learn- Learning and Verbal Behavior 9, 5 (1970), 567–572. tracking study. In Proceedings of the 30th ACM Joint
6. Bjork, R.A. and Bjork, E.L. Desirable difficulties in European Software Engineering Conf. and Symp. on
ing and revision that are used at theory and practice. J. of Applied Research in Memory the Foundations of Software Engineering (Nov. 2022),
and Cognition 9, 4 (2020), 475. 120–131.
school can still apply to professional 32. Raufaste, E., Eyrolle, H., and Mariné, C. Pertinence
7. Bryant, A. In head-hunting, big data may not be such
development. a big deal. The New York Times (June 2013); https:// generation in radiological diagnosis: Spreading
˲ How Learning Happens: Seminal nyti.ms/3Msdraa. activation and the nature of expertise. Cognitive
8. Burgoyne, A.P., Hambrick, D.Z., and Macnamara, B.N. Science 22, 4 (1998), 517–546; https://bit.ly/3sidUoq
Works in Educational Psychology and How firm are the foundations of mind-set theory? 33. Robins, A. Learning edge momentum: A new account
The claims appear stronger than the evidence. of outcomes in CS1. Computer Science Education 20,
What They Mean in Practice by Paul Psychological Science 31, 3 (2020), 258–267; https:// 1 (2010), 37–71.
A. Kirschner and Carl Hendrick20 bit.ly/3MsdPp4 34. Robins, A.V. Dual process theories: Computing
9. Cheryan, S., Ziegler, S.A., Montoya, A.K., and Jiang, cognition in context. ACM Trans. Comput. Education
provides a tour through influential L. Why are some STEM fields more gender balanced 22, 4 (Sept. 2022); 10.1145/3487055.
papers, explaining them in plain lan- than others? Psychological Bulletin 143, 1 (2017), 1. 35. Rountree, N., Rountree, J., Robins, A., and Hannah, R.
10. Dempster, F.N. The spacing effect: A case study in the Interacting factors that predict success and failure
guage and the implications and link- failure to apply the results of psychological research. in a CS1 course. ACM SIGCSE Bulletin 36, 4 (2004),
ages between them. American Psychologist 43, 8 (1988), 627. 101–104.
11. Dong, G. and Potenza, M.N. Behavioural and brain 36. Sala, G. and Gobet, F. Does far transfer exist? Negative
The papers cited can also serve as responses related to Internet search and memory. evidence from chess, music, and working memory
further reading. If you are a software European J. of Neuroscience 42, 8 (2015), 2546– training. Current Directions in Psychological Science
2554; https://bit.ly/49loGL6 26, 6 (2017), 515–520; 10.1177/0963721417712760.
developer you may not have access to 12. Dweck, C.S. Mindset: The New Psychology of Success. 37. Simons, D.J. and Chabris, C.F. What people believe
all of them; ACM members with the Random House (2006). about how memory works: A representative survey of
13. Dweck, C.S. and Yeager, D.S. Mindsets: A view from the U.S. population. PLOS ONE 6, 8 (Aug. 2011), 1–7;
digital library option will have access two eras. Perspectives on Psychological Science 14, 3 10.1371/journal.pone.0022757.
38. Thalmann, M., Souza, A.S., and Oberauer, K. How does
to the ACM papers, although many of (2019), 481–496.
chunking help working memory? J. of Experimental
14. Giebl, S. et al. Thinking first versus Googling first:
our references are from other disci- Preferences and consequences. J. of Applied Research Psychology: Learning, Memory, and Cognition 45, 1
(2019), 37.
plines. For more recent papers, many in Memory and Cognition (2022); https://bit.ly/3QlXun5
39. Tshukudu, E. and Cutts, Q. Understanding conceptual
15. Gobet, F. and Simon, H.A. The roles of recognition
authors supply free PDFs on their processes and look-ahead search in time-constrained transfer for students learning new programming
expert problem solving: Evidence from grand-master- languages. In Proceedings of the 2020 ACM Conf.
websites; you may wish to try search- on Intern. Computing Education Research, 227–237;
level chess. Psychological Science 7, 1 (1996),
ing the Web for the exact title to find 52–55; https://bit.ly/40oT4jY 10.1145/3372782.3406270.
16. Grabarczyk, P., Nicolajsen, S.M., and Brabrand, C. 40. Uttal, D.H. et al. The malleability of spatial skills:
such PDFs. Many authors are also A meta-analysis of training studies. Psychological
On the effect of onboarding computing students
happy to supply you with a copy if you without programming-confidence or experience. In Bulletin 139, 2 (2013), 352.
contact them directly. Proceedings of the 22nd Koli Calling Intern. Conf. on
Computing Education Research (2022), 1–8.
17. Hazzan, O. Reducing abstraction level when learning Neil C.C. Brown is a senior research fellow at King's
abstract algebra concepts. Educational Studies in College London, U.K.
Acknowledgments Mathematics 40, 1 (Sept. 1999), 71–90; https://bit. Felienne F.J. Hermans is a professor at Vrije Universiteit
We are grateful to Greg Wilson, who ly/49lZd4f Amsterdam, The Netherlands.
18. Huberman, A. Teach & learn better with a
was instrumental in initiating and “neuroplasticity super protocol”. Neural Network Lauren E. Margulieux (lmargulieux@gsu.edu) is an
proofreading this paper, and to our (Oct. 2021). associate professor at Georgia State University, Learning
19. Kahneman, D. Thinking, Fast and Slow. Macmillan, Sciences, Atlanta, GA, USA.
other proofreaders: Philip Brown, 2011.
Kristina Dietz, Jack Parkinson, An- 20. Kirschner, P.A. and Hendrick, C. How Learning This work is licensed under a Creative
Happens: Seminal Works in Educational Psychology
thony Robins, and Justin Thurman. and What They Mean in Practice. Routledge (2020).
Commons Attribution-NonCommercial
International 4.0 License.
This work is funded in part by the 21. Kleitman, N. Basic rest-activity cycle—22 years later.
Sleep 5, 4 (1982), 311–317.
National Science Foundation under 22. Kyllonen, P.C. and Christal, R.E. Reasoning ability
grant #1941642. Any opinions, find- is (little more than) working-memory capacity?!
Intelligence 14, 4 (1990), 389–433.
ings, and conclusions or recommen- 23. Margulieux, L. et al. When wrong is right: The
dations expressed in this material instructional power of multiple conceptions. In
Proceedings of the 17th ACM Conf. on Intern.
are those of the authors and do not Computing Education Research (2021), 184–197.
necessarily reflect the views of the 24. Margulieux, L.E. Spatial encoding strategy theory:
The relationship between spatial skill and STEM
National Science Foundation. achievement. In Proceedings of the 2019 ACM Conf.
on Intern. Computing Education Research, 81–90;
10.1145/3291279.3339414.
25. Maton, K. Making semantic waves: A key to cumulative
c Full disclosure: This is written by one of the knowledge-building. Linguistics and Education 24, 1
authors although the other authors recom- (2013), 8–22; https://bit.ly/3tXhvZt.
26. McCartney, R. et al. Folk pedagogy and the geek
mend it as well. gene: geekiness quotient. In Proceedings of the 2017
ACM SIGCSE Technical Symp. on Computer Science
Education (2017), 405–410.
References 27. Morrison, P. and Murphy-Hill, E. Is programming
1. Anderson, J.R. Cognitive Psychology and its knowledge related to age? An exploration of stack Watch the authors discuss
Implications. Macmillan (2005). overflow. In 2013 10th Working Conf. on Mining this work in the exclusive
2. Baddeley, A. Working memory. Science 255, 5044 Software Repositories (MSR), 69–72; https://bit. Communications video.
(1992), 556–559. ly/3Sp5Mgv. https://cacm.acm.org/videos/ten-
3. Behroozi, M., Shirolkar, S., Barik, T., and Parnin, C. 28. Newton, P.M. and Salvi, A. How common is belief things-software-developers

JA N UA RY 2 0 2 4 | VO L. 6 7 | N O. 1 | C OM M U N IC AT ION S OF T HE ACM 87

You might also like