Eye Movements During Text Reading Align With The R
https://doi.org/10.1038/s41562-021-01215-4
Across languages, the speech signal is characterized by a predominant modulation of the amplitude spectrum between about
4.3 and 5.5 Hz, reflecting the production and processing of linguistic information chunks (syllables and words) every ~200 ms.
Interestingly, ~200 ms is also the typical duration of eye fixations during reading. Prompted by this observation, we demonstrate
that German readers sample written text at ~5 Hz. A subsequent meta-analysis of 124 studies from 14 languages replicates this
result and shows that sampling frequencies vary across languages between 3.9 Hz and 5.2 Hz. This variation systematically
depends on the complexity of the writing systems (character-based versus alphabetic systems and orthographic transparency).
Finally, we empirically demonstrate a positive correlation between speech spectrum and eye movement sampling in low-skilled
non-native readers, with tentative evidence from post hoc analysis suggesting the same relationship in low-skilled native readers.
On the basis of this convergent evidence, we propose that during reading, our brain’s linguistic processing systems imprint
a preferred processing rate (that is, the rate of spoken language production and perception) onto the oculomotor system.
1 Department of Psychology, Goethe University Frankfurt, Frankfurt am Main, Germany. 2 Center for Individual Development and Adaptive Education of Children at Risk (IDeA), Frankfurt am Main, Germany. 3 Department of Linguistics, University of Vienna, Vienna, Austria. 4 Centre for Cognitive Neuroscience, University of Salzburg, Salzburg, Austria. 5 Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany. 6 Ernst Struengmann Institute for Neuroscience, Frankfurt am Main, Germany. 7 Department of Psychology, New York University, New York, NY, USA. 8 Max-Planck-NYU Center for Language, Music, and Emotion (CLaME), Frankfurt am Main, Germany. 9 Brain Imaging Center, Goethe University Frankfurt, Frankfurt am Main, Germany. ✉ e-mail: benjamin.gagl@univie.ac.at

Speech production and perception form a quasi-rhythmic information processing cycle (ref. 1). During spoken communication, our brain entrains to the frequency structure of the speech signal (refs. 2,3), suggesting that the temporal structure of the linguistic stimulus drives neural processes in auditory and language processing systems (ref. 4). Across languages, the amplitude modulation spectrum of natural speech peaks consistently in a frequency range between 4.3 and 5.5 Hz (refs. 5,6), which reflects the fact that informative signals (for example, syllables; refs. 7,8) are processed by the listeners’ brains every ~200 ms (ref. 9). Interestingly (and, we hypothesize, not accidentally), a typical eye fixation during reading has a very similar duration: between ~200 ms for orthographically transparent writing systems such as German or Finnish (refs. 10,11) and ~250 ms for character-based systems such as Chinese (refs. 11,12).

Abundant research has used eye-movement recordings to study reading at high temporal resolution, exploring, for example, how reading is influenced by word length, word frequency or word predictability given a sentence context (ref. 12). Among various measures that can be derived from eye-movement recordings, timing measures such as fixation duration are most frequently examined and considered precise markers of reading speed (refs. 13,14). These temporally highly resolved measurements have so far been analysed only at the level of individual items, typically words. However, other domains of cognitive research (such as attention; ref. 15) demonstrate that eye movements can also be subjected to frequency-based analyses. We here demonstrate that a frequency-based exploration of how written text is sampled by the eyes can open up new perspectives onto several fundamental questions related to the process of reading, including whether reading is related to spoken language processing (as recent investigations of word-per-minute measures suggest; ref. 16) and whether the visual system’s sampling of linguistic input differs from eye movements during non-linguistic tasks or between different languages or writing systems.

To address these foundational questions, we first used an empirical dataset (ref. 17) to determine eye movement sampling frequencies for 50 native speakers of German during sentence reading compared with a non-linguistic control task, using two different methodologies. Next, to determine the generality of these results and to investigate possible cross-linguistic differences in the sampling rate of reading, we conducted a meta-analysis of 124 studies from 14 different languages. To this end, we established a frequency analysis for fixation durations extracted from published eye-tracking studies. Finally, we acquired two new datasets, one with 48 non-native and one with 86 native speakers of German, to directly investigate the relationship between the sampling frequency of reading and speech production rates on a subject-by-subject level. The experimental and meta-analytic results show (1) that written text is sampled in the same frequency range as spoken language; (2) that the sampling rate of reading has an upper limit at ~5 Hz, observable in languages with transparent orthographies; (3) that this rate can be modulated depending on the complexity of the writing system (for example, in character-based as opposed to alphabetic scripts or in alphabetic scripts with opaque grapheme–phoneme mapping); and (4) that a direct coupling between reading and speech rates is found only in persons with lower levels of reading skill.

Results
Estimating the sampling rate of reading. Fifty healthy volunteers read sentences from the Potsdam Sentence Corpus (144 sentences presented as a whole; 1,138 words in total; ref. 10) while movements of their right eye were tracked (resolution, 1,000 Hz). As a non-linguistic control task, the participants scanned ‘z-strings’
that were constructed by replacing all letters of the sentence with the letter ‘z’ (for example, ‘Ein berühmter Maler hat sich selbst ein Ohr abgeschnitten’ / ‘A famous painter cut off his own ear’ was transformed to ‘Zzz zzzzzzzzz Zzzzz zzz zzzz zzzzzz zzz Zzz zzzzzzzzzzzzz’; see Methods for the details and ref. 17 for previous results from this dataset). Given that fixation numbers do not differ significantly between sentences and z-strings (refs. 17–19), similar scan paths are assumed, which qualifies z-strings as valid control stimuli for reading experiments (multiple analyses that account for potential differences in scan-path characteristics are reported in the Results section below).

Fixation durations. After preprocessing (leading to the removal of 3.1% of the data), we estimated mean fixation durations separately for each participant and experimental condition. Figure 1a shows that fixation durations (presented here as subject-specific means) are shorter for reading than for scanning (average, 197 ms versus 249 ms, respectively; t(49) = 11.1; P < 0.001; Cohen’s d = 1.25; 95% confidence interval (CI), 0.9 to 1.6). This has been reported previously for this dataset (ref. 17) and replicates earlier results for German (ref. 18), English (ref. 20) and French (ref. 19) in which fixation durations increased from reading to scanning by 38–42 ms.

Saccade probability. As a first characterization of rhythmic eye-movement patterns during reading, we plotted the probability that a saccade occurs for each sample point after stimulus onset (Fig. 1b). This analysis demonstrates distinct peaks visible at regular intervals, providing evidence that eye movements follow a rhythmic structure in both reading and scanning. Importantly, this rhythmic pattern is more pronounced and faster during reading. Dominant sampling rates were estimated directly from fixation durations, as well as using classical frequency analysis. While the former approach is important because fixation durations are also the basis for the subsequent meta-analysis, the latter approach allows us to evaluate the validity of fixation-duration-based frequency estimation.

Sampling rates. To estimate sampling rates from fixation durations, we first estimated sampling periods T (that is, the time from the start of a saccade to the start of the next) by adding to each fixation duration (n = 112,547) the duration of the preceding saccade. Figure 1c shows the distribution of all sampling periods across participants, separately for reading and z-string scanning. Note that, owing to the ex-Gaussian distribution typical for fixation duration data (ref. 21), the mean (dashed line) overestimates the central tendency, whereas the mode (solid line) by definition is a better representation of the predominant sampling period (Fig. 1c). Next, we estimated an eye movement sampling frequency f for each participant and condition, by dividing 1 s by the subject-specific mode of the sampling period in seconds. This revealed a higher average sampling rate for reading (5.0 Hz) than for the control task (4.2 Hz; Fig. 1d). This difference was significant (t(49) = −8.2; P < 0.001; Cohen’s d = −1.27; 95% CI, −1.7 to −0.9), and 45 of 50 participants showed a numeric reduction in sampling frequency from reading to scanning (grey lines in Fig. 1d). We find virtually the same pattern of effects when regressive saccades are removed (that is, when analysing only single fixation cases; 4.9 Hz and 4.2 Hz for reading and scanning, respectively; t(49) = −6.9; P < 0.001; Cohen’s d = −1.13; 95% CI, −1.6 to −0.7) and only slightly higher values when restricting the analyses to inter-word re-fixations (that is, fixations after regressive saccades; 5.2 Hz versus 4.6 Hz, respectively; t(48) = −5.5; P < 0.001; Cohen’s d = −0.8; 95% CI, −1.1 to −0.4; note that one participant was excluded due to the absence of regressive saccades in the scanning task). Sampling rates of reading and scanning are thus highly similar between forward-oriented and regressive eye movements (t(96) = 6.7; P < 0.001; r = 0.6; 95% CI, 0.4 to 0.7). Therefore, all further analyses do not differentiate between these cases. Note that estimating sampling rates from the mean (rather than the mode) of fixation durations results in lower rates for reading (4.5 Hz) and scanning (3.7 Hz). This results from an overestimation of the central tendency by the mean in right-skewed distributions (Fig. 1c) and indicates that this procedure would be inadequate.

Finally, power spectra of reading versus z-string scanning were estimated using canonical frequency analysis. For each task, we created a time series (resolution, 1,000 Hz) starting with the first saccade of the first participant and ending with the last fixation of the last participant, with a ‘1’ at the exact time of saccade onset and a ‘0’ elsewhere. Note that saccade onsets are the appropriate event for generating this time series, as they are the re-occurring event and can be measured with high accuracy (ref. 22). Subsequently, the power spectra of these task-specific event time courses were estimated via Fourier transform to visualize periodic signal components across subjects (Methods). Corroborating the results of the first analysis approach, a prominent peak was found at 5 Hz for reading and a somewhat less pronounced peak at ~4 Hz for scanning (Fig. 1e). To compare these estimates between reading and scanning, we next estimated separate power spectra for each participant. Individual peaks were retrieved, averaged (Fig. 1f) and statistically compared. This analysis reproduces the sampling frequencies estimated from the mode of fixation durations, with frequencies of 5.0 Hz and 4.4 Hz for reading and scanning, respectively (t(49) = −7.9; P < 0.001; Cohen’s d = −0.9; 95% CI, −1.1 to −0.6). There was a high correlation between the two analysis approaches (reading: t(48) = 9.3; P < 0.001; r = 0.80; 95% CI, 0.7 to 0.9; scanning: t(48) = 5.5; P < 0.001; r = 0.62; 95% CI, 0.4 to 0.8), which underscores the validity of sampling-duration-based frequency estimations.

Fig. 1 | Reading-related sampling rates. a, Subject-specific mean fixation durations from 50 participants (dots), the overall means (circle) and CIs (coloured bars) while reading sentences from the Potsdam Sentence Corpus (ref. 10) and scanning z-strings. The lines connect reading with z-string scanning data, per subject, to visualize effects at the single-subject level. The violin plots show the distributions of individual means (blue indicates scanning, and green indicates reading; similar in d and f). b, Mean saccade probability (across all participants and stimuli, separated by task) relative to the first saccade of the sentence, with a nonlinear regression line. c, The sampling period T of one event was defined as the duration of a fixation plus its preceding saccade. Displayed is the distribution of these sampling periods for sentence reading (green) and z-string scanning (blue), with estimated means (plus symbols and dashed lines) and modes (asterisks and solid lines). d, Subject-specific mean sampling frequencies f (that is, equal to 1/T) and the overall means (crossed circles) based on the sampling periods shown in c. e, Power spectra for reading and z-string scanning, estimated across all participants using Fourier transform analysis. f, Individual peak frequencies estimated from individual power spectra and their means (crossed circles). See Methods for details.

To summarize, a quantitative frequency-domain characterization of eye-tracking data shows that the predominant sampling frequency during reading in German, across participants, is ~5 Hz. This frequency representation of the reading process falls squarely within the boundaries of the predominant modulation frequencies of 4.3–5.5 Hz determined for speech signals across languages (refs. 5,6), which in turn have a clear reflection in the neuronal response to speech (ref. 3). We observed the ~5 Hz peak during reading using two different analysis strategies, that is, when estimating sampling frequencies from saccade and fixation durations and when analysing the sequence of saccade events in the frequency domain. Attentive scanning of z-strings shows highly similar scan-path characteristics compared to reading (refs. 18,19) but a significantly lower sampling frequency at ~4.2 Hz, convergent with findings from non-linguistic attentional reorienting tasks (refs. 15,23).

An analysis of the pupil response in this same dataset had previously indicated higher cognitive effort during reading than during z-string scanning (ref. 17). This finding most likely reflects the additional involvement of reading-specific and linguistic processes, such as lexical–semantic access, beyond the oculomotor sampling itself. The specific sampling rate observed for reading is thus unlikely to be driven exclusively by (perceptual or cognitive) features of the stimulus. In light of the overlap with the rate of spoken language, we tentatively propose that the observed sampling rate of ~5 Hz may reflect functional constraints imposed by the interfaced nature that the process of reading has between visual and linguistic processing (which developed primarily on the basis of spoken language). We speculate that the brain’s language systems impose the cortical rate at which speech is produced and perceived onto oculomotor programming systems exclusively during reading, possibly to optimize language-related information processing.

This hypothesis predicts that the overlap of reading and speech rates should generalize across languages and writing systems. However, writing systems differ substantially between languages (refs. 11,24), and even within writing systems, the mapping from orthography to meaning differs between languages (ref. 25). For example, the letter a in cat versus ball maps onto two different speech sounds in English, whereas it maps onto the same sound in the German translations of these words (Katze versus Ball). This letter-to-sound correspondence strongly influences reading acquisition (ref. 26), so that among the alphabetic writing systems, opaque orthographies (writing systems like English with inconsistent letter-to-sound correspondences) are associated with lower reading accuracy during the first years of learning to read. These differences would be suggestive of cross-linguistic differences in the frequency at which written text can be sampled, and recent experimental evidence (such as the observation of longer fixation durations for Chinese than for Finnish or English; ref. 11) seems to support this prediction.

Cross-linguistic meta-analysis of reading rates. To investigate the language generality of the alignment between speech and reading rates, we conducted a meta-analysis of sampling frequencies during reading in 14 different languages, based on 1,420 fixation duration estimates extracted from 124 original studies published between 2006 and 2016 (see Methods for the selection criteria). In addition to this cross-linguistic comparison, we examined possible differences between character-based and alphabetic writing systems, as well as the effect of letter-to-sound correspondence among alphabetic writing systems. We also explored the cross-linguistic correlation between eye movement sampling frequencies and language-specific peaks of the speech modulation spectra, and the
association between reading rates and information density (linguistic information per syllable; ref. 27) across languages.

All studies selected for inclusion reported mean fixation durations. However, as discussed above, mean fixation durations are not a valid representation of the predominant sampling duration in fixation data and accordingly not the preferred basis for calculating the sampling rate of reading. We used 29 full empirical datasets to develop a transformation function that allowed us to estimate the mode from the mean fixation durations reported in the original publications. In brief, this involved fitting ex-Gaussian distributions to the empirical distributions of these datasets, retrieving distributional parameters (including mean and mode) and on this basis optimizing a regression-based transformation that estimates the mode from the mean (see Methods and Supplementary Methods for the details). For the meta-analysis, mean durations were extracted from published studies and transformed to the mode. The sampling period T (the interval from saccade onset to the end of the following fixation; see above) was obtained by adding an estimate (the mode saccade duration from Study 1; 29 ms) to the mean fixation duration. Finally, the sampling frequency was calculated as f = 1/T.

Fixation durations and sampling frequency: descriptive statistics. Figure 2 shows that the majority of mean fixation durations derived from the reading studies were between 200 and 300 ms (upper panel), which transforms to mean sampling frequencies between 3.9 and 5.2 Hz (lower panel). Note that languages with only one original study (Arabic, Italian and Polish) were excluded from descriptive and further statistical analysis. As expected, the majority of studies were conducted in English (ref. 28). Ten of the 14 languages in our meta-analysis fall between the minimum (4.3 Hz) and the maximum (5.5 Hz) of previously reported (refs. 5,6) language-specific peaks of the speech amplitude modulation spectra (Fig. 2, lower panel, dashed lines). The remaining four languages fell within the range of one standard deviation around the language-specific speech peaks (Fig. 2, lower panel, dotted lines). Considering the language-specific CIs, only for Chinese can we be confident that the sampling rate [...]

Fig. 2 | Meta-analysis of reading-related sampling rates. Fixation durations (top) and corresponding eye movement sampling frequencies (bottom) for 14 different languages. The violin plots (left) represent the respective distributions of all 1,420 duration or frequency values extracted from the included studies, independent of language. The bars reflect CIs, and the circles reflect the means. In the right panel, each dot reflects one study (mean number of fixation durations per study, 12.4); the bars reflect CIs, and the circles reflect the mean across studies for each language. In the lower panel, the dashed lines represent the range of the means of the peak amplitude modulation spectrum that was empirically determined for speech in different languages in independent work (refs. 5,6). The dotted lines represent the range between the lowest mean minus one standard deviation and the highest mean plus one standard deviation for the same data (which was manually read out from figure 3c in ref. 5 and from figure 7 in ref. 6). For Arabic, 1 study and 12 fixation durations are available; for Chinese, 20 and 205; for Dutch, 5 and 45; for English, 65 and 965; for Finnish, 3 and 21; for French, 2 and 3; for German, 14 and 48; for Hebrew, 3 and 28; for Italian, 1 and 1; for Japanese, 2 and 12; for Korean, 2 and 39; for Polish, 1 and 1; for Spanish, 4 and 10; and for Thai, 3 and 30.
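The sampling-rate computation described for Study 1 and for the meta-analysis (sampling period T as fixation duration plus saccade duration, frequency f = 1/T taken at the mode rather than the mean) can be illustrated with a minimal sketch. The function names and the histogram-based mode estimator with 10 ms bins are assumptions made for illustration; the text does not state how the mode was estimated for individual participants, and the regression-based mean-to-mode transformation of the meta-analysis is not reproduced here. The meta-analytic helper also assumes that the 29 ms saccade estimate is added to the mode-transformed fixation duration.

```python
from collections import Counter

# Mode saccade duration reported for Study 1 (used in the meta-analysis).
SACCADE_MS = 29.0


def sampling_periods(fixations_ms, preceding_saccades_ms):
    """Sampling period T = fixation duration + duration of the preceding saccade."""
    if len(fixations_ms) != len(preceding_saccades_ms):
        raise ValueError("each fixation needs exactly one preceding saccade")
    return [f + s for f, s in zip(fixations_ms, preceding_saccades_ms)]


def histogram_mode(values_ms, bin_width_ms=10.0):
    """Centre of the most populated bin (hypothetical, simple mode estimator).

    Ties are resolved in favour of the shorter duration.
    """
    counts = Counter(int(v // bin_width_ms) for v in values_ms)
    top_bin = max(counts, key=lambda b: (counts[b], -b))
    return (top_bin + 0.5) * bin_width_ms


def frequency_hz(period_ms):
    """Sampling frequency f = 1/T, with T given in milliseconds."""
    return 1000.0 / period_ms


def meta_frequency_hz(mode_fixation_ms, saccade_ms=SACCADE_MS):
    """Meta-analytic variant: T = (mode-transformed) fixation duration + 29 ms."""
    return frequency_hz(mode_fixation_ms + saccade_ms)


if __name__ == "__main__":
    # Made-up, right-skewed durations for one hypothetical participant.
    fixations = [170.0, 175.0, 172.0, 300.0, 410.0]
    saccades = [30.0, 28.0, 29.0, 31.0, 30.0]
    periods = sampling_periods(fixations, saccades)
    mean_period = sum(periods) / len(periods)
    print("mode-based f:", round(frequency_hz(histogram_mode(periods)), 2), "Hz")
    print("mean-based f:", round(frequency_hz(mean_period), 2), "Hz")
    print("meta-analytic f:", meta_frequency_hz(171.0), "Hz")
```

With right-skewed durations such as these, the mean-based frequency comes out lower than the mode-based one, which is the reason the text gives for preferring the mode.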
Association of individual speech and reading rates. We tested the correlation between peaks in the speech modulation spectra of individual speakers and their eye movement sampling rates during reading in two experiments. First, we tested 48 learners of German (Study 3), as we expected to observe higher variability in both measures in non-native language learners than in native speakers (ref. 32) and a more direct relationship between speaking and reading (similar to letter-by-letter reading in beginning readers; ref. 33). We recorded eye movements from each participant while they read German sentences (implemented analogously to the reading task in Study 1) and a speech sample based on a ‘small-talk interview’ (22 questions, on average 18 min of speech per participant; range, 6 to 28 min).

[...] the language-specific mean peaks (dotted line in Fig. 4c). The peaks of individual speech modulation spectra and eye movement sampling frequencies were in a comparable range (Fig. 4b; confirmed by a significant equivalence test, ref. 36: t(47) = −1.8, P = 0.04; equivalence bounds, ±0.33) and positively correlated (Fig. 4c; t(45) = 2.1; P = 0.04; Est = 0.32; s.e. = 0.15). Note that this correlation effect was estimated while controlling for individual differences in reading proficiency (Fig. 4d; t(45) = 2.1; P = 0.04; Est = 0.032; s.e. = 0.016) by calculating a linear model that estimates the individual eye movement sampling rate with speech modulation rate as a predictor.
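The equivalence test mentioned above asks whether the difference between two paired measures (here, reading and speech frequencies) stays inside fixed bounds. One common form is the two one-sided tests (TOST) procedure; whether ref. 36 corresponds to exactly this procedure, and whether the ±0.33 bounds are raw or standardized units, is not stated in the text, so both are assumptions in this minimal sketch. The function names and the example data are hypothetical, and the critical t value must be supplied by the caller because the Python standard library has no t distribution.

```python
import math
import statistics


def tost_paired(x, y, bound):
    """Two one-sided t statistics for paired data with symmetric bounds.

    Assumes `bound` is expressed in the raw units of the differences x - y.
    Returns (t_lower, t_upper, degrees_of_freedom).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t_lower = (mean_d + bound) / se  # tests H0: mean difference <= -bound
    t_upper = (mean_d - bound) / se  # tests H0: mean difference >= +bound
    return t_lower, t_upper, n - 1


def is_equivalent(t_lower, t_upper, t_crit):
    """Equivalence is claimed only if both one-sided tests reject."""
    return t_lower > t_crit and t_upper < -t_crit


if __name__ == "__main__":
    # Made-up frequencies (Hz) for four hypothetical participants.
    reading = [4.5, 4.7, 4.3, 4.6]
    speech = [4.4, 4.8, 4.2, 4.6]
    t_lo, t_up, df = tost_paired(reading, speech, bound=0.33)
    # 2.353 is the one-sided 5% critical t value for df = 3.
    print(df, is_equivalent(t_lo, t_up, t_crit=2.353))
```

The design choice here is to return the raw t statistics and leave the significance decision to the caller, so the sketch makes no claim about the P values reported in the text.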
Fig. 4 | Relationship of speech and reading rates in non-native German speakers. a, Speech modulation spectrum from 48 non-native speakers of German. The y axis shows the speech modulation index (ref. 6); the x axis shows the speech modulation rate. For additional comparison, we present the mean range (dashed lines) and standard deviations (dotted lines) of the speech amplitude modulation spectra across languages, which were read out from figure 3c in ref. 5 and from figure 7 in ref. 6. b, Eye movement sampling frequency in reading and the mean amplitude modulation spectrum in speech, for each participant. The lines connect the reading and speech frequencies of each individual, the violin plots represent the distribution of the data, the bars represent the standard error of the mean and the circles reflect the mean. c, Positive correlation of the individual peaks of the speech modulation spectrum (x axis), reflecting each participant’s speech rate, with the eye movement frequency (y axis) from the same participants. d, Correlation between the eye movement frequency (y axis) and a paper-and-pencil-based reading score (x axis) reflecting a positive association of the eye movement sampling rate and reading performance. In c and d, we present the individual sampling frequencies corrected for reading skill and speech frequency, respectively, based on predictions from the fitted linear regression models used for statistical analysis.

In a second, preregistered study (Study 4), we assessed the relationship between speech modulation spectrum peaks and eye movement sampling frequencies in a group of 86 native speakers of German (Methods; preregistration: https://osf.io/mjhkz). We replicated the finding that the peaks of the speech modulation spectra and eye movement sampling frequencies were in the same range (Fig. 5a; equivalence tests for left and right eye: t(85) > 3.9; P < 0.001; equivalence bounds, ±0.33). The preregistered correlation analysis showed a small positive (albeit not significant) relationship between eye movement sampling and speech modulation rates (t(175) = 1.8; P = 0.08; Est = 0.07; s.e. = 0.04). For further exploration (that is, non-registered post-hoc analysis), we separately investigated and compared four subgroups created by a 2 × 2 combination of reading speed and reading accuracy. Specifically, we implemented a median split based on reading speed measured with a standardized German reading test (the adult version of the SLS, ref. 35; fast versus slow: median, 78% versus 60%) and, orthogonal to this, divided the sample on the basis of their sentence comprehension accuracies in the eye-tracking experiment (errors absent versus present: median, 0% versus 15%). Only readers with the lowest skill level (that is, slow and low comprehension performance) showed a robust positive association (n = 21; Fig. 5b, bottom left; t(37) = 3.4; P = 0.002; Est = 0.33; s.e. = 0.10). None of the other groups showed a significant correlation, resulting in a reliable interaction effect (t(162) = 2.8; P = 0.005; Est = −0.05; s.e. = 0.02), which was also present when the percentage of regressions, skipping and single fixation probabilities (Table 1) were added as covariates to the model. Note that the low-reading-skill group had a lower reading speed
than the slow-only group that produced no errors (t(86) = 18.1; P < 0.001; Cohen’s d = 2.7; 95% CI, 2.1 to 3.3) but still had a substantially higher reading speed than the non-native readers from Study 3 (t(90) = 26.3; P < 0.001; Cohen’s d = 5.5; 95% CI, 4.6 to 6.4; for general eye-movement characteristics of both experiments, see Table 1). In sum, we could not find the correlation between reading and speaking rates in the entire sample, but we found tentative evidence for the correlation when we restricted the group to native speakers with low reading skills.

Fig. 5 | Relationship of speech and reading rates in 86 native German speakers. a, Eye movement sampling frequency measured during reading for the left and right eyes (left and centre plots) and the mean amplitude modulation spectrum of samples of spoken speech (right plot). The grey dots represent individual data points from all participants; the lines connect the reading and speech frequencies of each participant. The violin plots represent the distribution of the data, the filled bars represent the standard error of the mean and the circles reflect the mean. b, The correlation between the speech modulation spectrum (x axis) and the eye movement sampling frequency (y axis). The four panels represent performance subgroups depending on reading speed (slow versus fast; median split) and whether the participants produced errors in an independent standardized reading test (see Methods for details).

Table 1 | Reading speed, reading comprehension and basic eye tracking measures (fixation durations, skipping probability, single fixation cases and percentage of regressions) for Study 3 and Study 4

Measure | Study 3 | Study 4, fast with no errors | Study 4, fast with errors | Study 4, slow with no errors | Study 4, slow with errors
Reading speed (SLS test; % sentences read) | 26.0 (10.4) | 79.0 (6.5) | 78.9 (5.3) | 58.0 (11.1) | 53.6 (9.0)
Reading speed (experiment; in s) | 4.8 (2.3) | 2.4 (0.7) | 2.2 (0.8) | 2.4 (0.5) | 2.7 (0.6)
Reading comprehension (experiment; % errors) | 20 (40) | 0 (0) | 17 (4.4) | 0 (0) | 19 (14.5)
Fixation duration (right eye, in ms) | 223 (29) | 192 (20) | 188 (15) | 196 (21) | 198 (20)
Fixation duration (left eye, in ms) | 225 (30) | 192 (20) | 189 (15) | 197 (21) | 199 (20)
Skipping (%) | 7.5 (5.8) | 19.8 (7.9) | 24.0 (10.8) | 15.7 (6.9) | 14.7 (6.6)
Single fixation cases (%) | 35.5 (14.6) | 52.3 (9.1) | 54.1 (10.2) | 56.6 (9.2) | 52.7 (9.4)
Regressions (%) | 10.9 (4.5) | 12.1 (5.3) | 9.3 (6.9) | 8.6 (4.9) | 10.6 (6.0)

For Study 4, the data are presented separately for the four performance-based subgroups. All values reflect means and standard deviations. Statistical comparisons of the four groups from Study 4 (2 × 2, linear regression models with an error-by-speed-group interaction) yielded a significant main effect of reading speed group (slow versus fast) for SLS-based reading speed (estimate, 0.53; s.e. = 0.22; t(84) = 2.4; P = 0.017; all other effects, t < 1.7) and an obvious group difference on the percentage of errors (0 versus ~18% errors). For the eye-tracking measures, the only significant effect was a main effect of reading speed group on the percentage of skipping (estimate, −0.09; s.e. = 0.03; t(84) = 3.4; P < 0.001); no other main effect or interaction effect reached significance (all t < 1.94).

Discussion
This frequency-based investigation of eye movements during reading shows that reading operates in a frequency domain generally comparable to that of the production and perception of natural speech. We first reproduced, in a frequency-domain analysis, previous insights based on fixation duration measures (refs. 17–19), that is, that eye movements sample text at a higher rate than during comparable, cognitively less challenging non-linguistic tasks. More importantly, we demonstrate that the sampling frequency of reading lies within the range of previously observed speech rates for one language, German. Next, by integrating data from 124 empirical studies across languages, we show that eye movement sampling varies between ~3.9 Hz and ~5.2 Hz, indicating a higher variability than previously assumed. While it was generally believed that average fixation durations are similar even for very distinct orthographies such as Chinese and English (Rayner, ref. 12, p. 1461), our meta-analytic results show significantly higher sampling rates for alphabetic than for character-based writing systems. However, average speech rates have been shown to vary more narrowly around 5 Hz across languages (that is, 4.3–5.4 Hz in ref. 5 and 4.3–5.5 Hz in ref. 6), a range that would exclude the lower frequencies we observed for reading. Our meta-analytic findings indicate that this might result from
differences in the complexity of the underlying orthographies (for example, character versus alphabetic), such that more computationally ‘difficult’ orthographies might slow down reading relative to highly transparent alphabetic orthographies.

Subsequently, we demonstrate in two independent empirical studies that second-language learners (of German) read in a lower frequency range than native readers (~4.3 Hz versus ~4.7 Hz) and that only language learners and low-skilled native readers show a positive correlation between individual reading and speech rates. Combined, these results suggest that reading (that is, an internally controlled visual–perceptual process involving sophisticated oculomotor programming) is remarkably well temporally aligned with the rate at which spoken language is produced (and perceived). We tentatively suggest that this observed association between speech and reading supports the existence of fundamental perceptual principles underlying the temporal structure of linguistic information processing, irrespective of modality (ref. 16).

Text is a temporally stable visual stimulus. However, our eye movements impose temporal structure onto the linguistic input when sequentially sampling a text. The reading process—includ-

[...] reading (ref. 33), which suggests that low-skilled readers sample written text with a temporal resolution close to the speech processing rate. In contrast, the faster reading rates of fluent readers indicate that they utilize the static nature of text better by processing the fixated word and following words (based on parafoveal vision) within one ‘sample’.

The specific characteristics of fixation behaviour and text presentation during reading can also provide context for another intriguing phenomenon, that is, the significantly lower sampling rates in character-based than in alphabetic writing systems, while overall reading times for sentences with the same content are comparable between the writing systems (ref. 11). In character-based languages, fewer fixations per sentence are needed to sample the entire stimulus, while the increased perceptual complexity and information density (ref. 11) lead to longer fixation durations relative to alphabetic languages. In the non-linguistic control task, the participants were presented with stimuli consisting of many repetitions of the same letter. In this case, information density and perceptual complexity are low compared with all real-language stimuli. Nevertheless, we observed longer fixation durations in the z-string scanning task,
ing the oculomotor programmes—thus serves as an interface which may indicate the presence of qualitatively different cognitive
between a stable external percept and linguistic processing sys- processes compared with reading.
tems optimized for analysing sequential speech input. The obser- The frequency representation of reading-related eye-tracking
vation of faster sampling rates during reading than during parsing data that we advance here can be construed as nothing but a
non-linguistic letter strings (Study 1) indicates that sampling rates transformation of fixation and saccade duration data. This trans-
are not exclusively driven by the physical layout of the stimulus or formation also comes at the cost of zooming out to a ‘meso-level’
by the cognitive effort of processing the stimulus (in which case representation of the data1, at which we rely on aggregated data
they should have been slower). We tentatively propose that neural (that is, one data point per participant), which is against the trend
processors dedicated to the linguistic analysis of speech impose in eye-movement research of focusing on investigating single words
their preferred timing onto the process of reading. Evidence and using regression methods for detailed analysis of, for example,
for the principled possibility of such internally driven entrain- the influence of word characteristics52. Still, the frequency perspec-
ment of reading comes from the observation that manipulating tive proposed here provides a view of component processes of read-
the speed of ‘inner speech’ during reading has a causal effect on ing as an interface between linguistic and orthographic processing.
reading speed37–39. This approach to reading research opens up several interesting new
Our meta-analysis demonstrates that the sampling rate of read- research questions. For example, it becomes possible to compare
ing varies between languages but falls within the range of speech reading behaviour more directly with evidence from other measure-
rates identified in cross-linguistic studies5,6. The meta-analysis ment modalities (such as oscillatory brain activation data53,54) and
also shows higher sampling rates for transparent than for opaque to other cognitive–psychological domains (such as attention15,55),
orthographies, which converges with transparency effects within which typically do not have the advantage of exact duration mea-
languages29,40 and cross-linguistic studies investigating reading surements for different events of interest (for example, during covert
development26,41,42. Direct associations between reading and speech attention). Maybe most importantly, the frequency perspective on
rates could not be established in the meta-analysis given the small reading offers direct links to several neurodynamic phenomena in
number of languages for which all necessary parameters were speech perception5,6, including the observation that dyslexic chil-
available. Empirical Studies 3 and 4 show this relationship on a dren56,57 and adults58 show altered cortical tracking of speech signals
subject-by-subject level, but only in less-skilled readers. This sug- in the oscillatory domain.
gests that increasing reading expertise makes the tight control of In conclusion, we show that during reading, our eyes ‘sample’
reading by linguistic processors in the brain obsolete. written text in the same frequency range in which speech is pro-
The differential coupling of speech and reading rates in low-skilled duced and perceived, which suggests that extracting information
but not high-skilled readers may also result from other phenomena from linguistic stimuli follows a similar temporal structure irrespec-
well-established in reading research, such as word skipping, parafo- tive of modality. A plausible mechanism is to assume that linguistic
veal preprocessing and re-fixations. Reading is not merely a sequence processing has a preferred cortical rate of information uptake and
of word-to-word fixations. From time to time, we skip words as a thus acts as an internal temporal driver for eye movements elic-
result of parafoveal preprocessing43,44, which describes visual word ited during reading. Eye movements in reading are thus utilized as
recognition based on low-acuity visual information from parafoveal a temporal interface between a stable physical stimulus—written
regions of the retina. Also, words are sometimes fixated multiple text—and brain systems that have evolved to process signals whose
times—for example, to correct perceptual errors after suboptimal temporal structure is constrained by the characteristics of our vocal
landing at the beginning or end of a word14,18,45 or when semantic tract1. However, our empirical data also demonstrate that a direct
inconsistencies must be resolved by re-reading46,47. We14 and others10 coupling between speech and reading rates is present only in per-
have shown that overall probabilities for word skipping and multiple sons with low reading skills, which calls for future work to clarify
fixations on a word are comparable (~20%) when reading the sen- the mediating role of reading expertise in the temporal relation-
tence materials used here. However, low-skilled readers show lower ship between speech processing and reading rates. We suggest that
skipping rates (reflecting reduced parafoveal preprocessing48–50) and the frequency perspective on reading adopted here opens up new
more re-fixations on the same word13. Readers with lower skills thus research paths, such as understanding slow or impaired reading,
focus more on fixated words and their components (letters and syl- understanding second language learning or more directly inves-
lables)51, indicating a greater alignment between the phonological tigating the commonalities and differences between reading and
properties of the words and the eye movements they elicit during other cognitive processes.
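The Discussion describes the frequency representation as a transformation of fixation and saccade duration data, aggregated to one data point per participant. The following is a minimal sketch of such a transformation, not the authors' published script. It assumes that one sampling period is a fixation plus the saccade that follows it, and that a participant's durations are aggregated by their mean. The function name `sampling_frequency_hz` and the example durations are hypothetical.

```python
from statistics import mean


def sampling_frequency_hz(fixation_ms, saccade_ms):
    """Return one sampling frequency (Hz) for one participant.

    Assumption (not taken from the paper): a sampling period is one
    fixation plus the following saccade, and periods are averaged
    before converting milliseconds to Hz.
    """
    if not fixation_ms or len(fixation_ms) != len(saccade_ms):
        raise ValueError("need equally long, non-empty duration lists")
    periods_ms = [fix + sac for fix, sac in zip(fixation_ms, saccade_ms)]
    return 1000.0 / mean(periods_ms)


# Hypothetical durations in milliseconds for a single participant.
example_hz = sampling_frequency_hz([200, 180, 220], [25, 25, 25])
```

With these illustrative values the mean period is 225 ms, which corresponds to roughly 4.4 Hz. A different aggregation (for example, the mode of the duration distribution) would give a somewhat different estimate.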
Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis: We implemented eye-tracking data preprocessing in Perl on a Mac system, and speech data preprocessing in MATLAB and the data preprocessing for the Fourier transform control analysis in Python, both on an Ubuntu system. All further processing steps and analyses were implemented in R. All scripts are available at https://osf.io/96vp8/.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Portfolio guidelines for submitting code & software for further information.
Data
Policy information about availability of data
March 2021
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability
- For clinical datasets or third party data, please ensure that the statement adheres to our policy
Research sample: Fifty native speakers of German (13 male; 18–47 years old; M = 24 years; students at the University of Salzburg) participated in Study 1, forty-nine non-native German speakers (13 male; 18–74 years old; M = 24 years) participated in Study 3, and eighty-six German speakers (36 male; 18–53 years old; M = 25 years; five had to be excluded based on preregistered outlier correction boundaries of ±3 standard deviations for both speech and reading rates) participated in Study 4 after giving informed consent according to procedures approved by the respective local ethics committee. For Study 1, see Gagl, Hawelka & Hutzler, 2011, BRM for more details. Note that relative to the previously published report, one participant was added. For Study 3, participants with varying mother tongues (Arabic, Azerbaijani, Bulgarian, Chinese, English, Farsi, French, Georgian, Indonesian, Italian, Japanese, Persian, Russian, Serbo-Croatian, Spanish, Turkish, Ukrainian, Hungarian, Urdu, and Uzbek) and, for Study 4, native German participants were recruited on the campus of Goethe University Frankfurt as part of a more extensive study. Also note that six participants from Study 3 became literate without acquiring an alphabetic script.
Sampling strategy: The datasets for Studies 1 and 3 were already available, and sample sizes of 50 and 49 participants typically prevent the overestimation of effects that arises with small samples (e.g., Cremers HR, Wager TD, Yarkoni T (2017) The relation between statistical power and inference in fMRI. PLoS ONE 12(11): e0184923. https://doi.org/10.1371/journal.pone.0184923, which shows stable effects with N > 30). For the dataset in Study 4, we provide a pre-registration, including a power calculation (https://osf.io/mjhkz). We implemented a Monte Carlo simulation procedure (i.e., randomly sampling data and re-estimating the general linear model 10,000 times with different participants). This procedure indicated that a total of 90 participants had to be collected to reach a power of .9. After excluding five participants due to a pre-registered exclusion criterion, data from 86 participants were analyzed. For Study 1, student participants from the University of Salzburg were selected; for Study 3, non-native German speakers, and for Study 4, native German speakers were selected, both from the Frankfurt area.
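The Monte Carlo procedure described under Sampling strategy (repeatedly resampling participants and re-estimating the model) can be illustrated with a short sketch. This is not the pre-registered script: as a stand-in for the general linear model it tests the Pearson correlation with a Fisher z approximation, it resamples with replacement from a hypothetical pool of (reading rate, speech rate) pairs, and the names `pearson_r` and `simulated_power` are invented for this example.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def simulated_power(pool, n, n_sim=10_000, alpha=0.05, seed=1):
    """Share of resampled data sets of size n with a significant correlation.

    Assumption: significance is judged with a two-sided Fisher z test,
    a simplification of re-estimating a general linear model.
    """
    if n < 4:
        raise ValueError("the Fisher z approximation needs n >= 4")
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        # Draw n participants with replacement from the pool.
        sample = [rng.choice(pool) for _ in range(n)]
        xs, ys = zip(*sample)
        # Clamp r so that atanh stays finite for near-perfect correlations.
        r = max(-0.999999, min(0.999999, pearson_r(xs, ys)))
        if abs(math.atanh(r)) * math.sqrt(n - 3) > critical_z:
            hits += 1
    return hits / n_sim
```

In use, one would call `simulated_power` for increasing values of `n` and take the smallest sample size at which the returned proportion reaches the target power (.9 in the text above).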
Data collection: In the reading and scanning tasks used in Studies 1, 3 and 4, all stimuli were presented on computer screens. The researcher presented the interview questions, and reading scores were obtained with a paper-and-pencil test. Each participant was tested by one researcher, and the researchers conducting the tests were blind to the hypothesis of the current study.
Timing: For Study 1, we relied on a dataset first presented in Gagl, Hawelka & Hutzler (2011). For Study 3, all data were collected in 2018. For Study 4, all data were collected in 2019 and 2020.
Data exclusions: One participant in Study 3 and five participants in Study 4 had a mean amplitude modulation spectrum that deviated by more than three standard deviations from the sample mean; these participants were excluded from the analysis. See the analysis section for details.
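The three-standard-deviation exclusion rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' preprocessing code: it assumes the mean and the sample standard deviation are computed over all participants, including the potential outliers, and the function name `within_sd_bounds` and the example values are hypothetical.

```python
from statistics import mean, stdev


def within_sd_bounds(values, n_sd=3.0):
    """Return one flag per value: True if it lies within n_sd sample SDs of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    centre = mean(values)
    spread = stdev(values)
    return [abs(v - centre) <= n_sd * spread for v in values]


# Hypothetical per-participant rates in Hz; the last value is an extreme case.
rates_hz = [4.3, 4.5, 4.7, 4.4, 4.6] * 4 + [12.0]
keep = within_sd_bounds(rates_hz)
retained = [v for v, ok in zip(rates_hz, keep) if ok]
```

In this made-up sample only the final value is flagged for exclusion. Note that with very small samples a single extreme value can never exceed three sample standard deviations, so the rule is only informative for reasonably large groups.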
Ethics oversight: Study 1 was approved by the ethics board of the University of Salzburg, and Studies 3 and 4 were approved by the ethics board of Goethe University Frankfurt.
Note that full information on the approval of the study protocol must also be provided in the manuscript.