Running Performance, V̇O2max, and Running Economy: The Widespread Issue of Endogenous Selection Bias

Sports Med

DOI 10.1007/s40279-017-0789-9


Nicolai T. Borgen1

© Springer International Publishing AG 2017

Abstract Studies in sport and exercise medicine routinely use samples of highly trained individuals in order to understand what characterizes elite endurance performance, such as running economy and maximal oxygen uptake (V̇O2max). However, it is not well understood in the literature that using such samples most certainly leads to biased findings and accordingly potentially erroneous conclusions because of endogenous selection bias. In this paper, I review the current literature on running economy and V̇O2max, and discuss the literature in light of endogenous selection bias. I demonstrate that the results in a large part of the literature may be misleading, and provide some practical suggestions as to how future studies may alleviate endogenous selection bias.

Key Points

Using samples restricted (truncated) to contain only elite athletes or highly trained individuals may result in biased results.

The association between running economy and V̇O2max in truncated samples is at least partially spurious.

The effect size of running economy and V̇O2max on race performance in truncated samples is attenuated.

Electronic supplementary material The online version of this article (doi:10.1007/s40279-017-0789-9) contains supplementary material, which is available to authorized users.

Nicolai T. Borgen
Department of Sociology and Human Geography, University of Oslo, P.O. Box 1096, Blindern, 0317 Oslo, Norway

1 Introduction

Many studies in sport and exercise medicine are conducted to determine what characterizes elite performance, as well as to understand how elite athletes can improve further [24, 25, 28, 32, 42]. For instance, studies examine to what extent maximal oxygen uptake (V̇O2max) affects race performance [30], or if rearfoot striking is more economical than midfoot striking [37]. There is a widespread belief that in order to gain insights into elite performance, researchers cannot rely on studies of all runners. Instead, these studies typically select subjects based on race performance, either intentionally (e.g., studies of Olympic qualifiers) or as a result of convenience sampling. Having a homogeneous sample of highly trained individuals, or even elite athletes, is assumed to be an advantage in the literature.

However, it is not well known in the literature that in observational studies (i.e., non-experimental studies), selecting subjects based on prior race performance will likely result in spurious correlations because of endogenous selection bias [10, 40]. The problem is twofold. First, when some individuals in the entire population (e.g., all US citizens) have a higher probability of being included in the population of interest (e.g., elite athletes), restricting the analysis to the population of interest amounts to conditioning on whatever increases the probability of being in the group of interest (e.g., prior race performance). Second, conditioning on the outcome variable or an effect of the treatment variable, such as race performance, may substantially bias the correlations and lead to erroneous conclusions. Perhaps because of its paradoxical and counterintuitive nature [4], this second point is difficult to recognize and not sufficiently acknowledged in the literature.

N. T. Borgen

In this review article, I discuss the widespread issue of endogenous selection bias in the literature on running performance, V̇O2max, and running economy (RE), and provide some suggestions on how to solve this issue. After providing a narrative review of the literature, I use simple models and hypothetical data to demonstrate how sample restriction induces bias in the findings in the literature. First, I show how endogenous selection bias induces an inverse relationship between V̇O2max and RE, even though no such relationship exists in the population. Second, I show that having an elite sample most likely results in attenuated estimates of the effects of V̇O2max and RE on race performance.

The primary aim of this article is to review and discuss the literature on running performance, V̇O2max, and RE. However, the article is also relevant to other studies within sport and exercise medicine. It demonstrates that unless the independent variable of interest is randomized (which it seldom is [7, 16]), the choice of study subjects not only has implications for whom the results could be generalized to (external validity), but it may also affect the internal validity. The take-away point of this article is that when defining a population of interest, one should always consider whether individuals with certain characteristics are more likely to be included in the population of interest, and if these characteristics are in some way related to the outcome or independent variables in the analyses.

2 Literature Review

2.1 Determinants of Running Performance

The classic model of endurance running was initially established more than a century ago [25, 32], and suggests that the physiological ‘concepts’ V̇O2max, RE, lactate threshold, and fractional utilization of V̇O2max affect race performance [32]. Still, we are yet to fully understand the determinants of elite race performance, and developments in our understanding of the role of the brain [36] and mental fatigue [27] have recently added to our understanding of race performance.

All of the physiological concepts have been subject to extensive research, but I will mainly focus on RE and V̇O2max in this narrative review. V̇O2max is one of the most studied measures of athletic competence, while RE has become increasingly studied as a response to the complete dominance of Kenya and Ethiopia in distance running over recent decades [28]. However, endogenous selection bias is also relevant in the literature on other physiological factors.

2.2 Maximal Oxygen Uptake (V̇O2max)

V̇O2max reflects an individual’s maximal rate of energy expenditure [18], and has for a century been linked to running performance [25]. In studies that include relatively heterogeneous pools of runners, V̇O2max has repeatedly been shown to be highly correlated with race performance [11, 26]. In one study of well-trained but not elite runners (i.e., V̇O2max at about 60 ml kg⁻¹ min⁻¹ for males and 50 ml kg⁻¹ min⁻¹ for females), V̇O2max explained 90.2% of the variance in a 10-mile run and was the single best predictor of running performance [32]. Legaz-Arrese et al. [19] provide a graphical overview of the literature, showing the relationship between V̇O2max and International Association of Athletics Federations (IAAF) scores reported in the literature.

The correlation between V̇O2max and race performance is also evident from the fact that elite athletes typically have very high V̇O2max, for men often between 70 and 85 ml kg⁻¹ min⁻¹ and for women about 10% lower [12, 18, 22, 25, 30, 51]. This is about 50–100% higher than in the normally active population [25, 55]. Among elite athletes, V̇O2max has been shown to be similar for runners competing at distances ranging from the 3000 m to the marathon [19].

Although elites have a higher V̇O2max, some longitudinal studies of elite athletes suggest that V̇O2max changes very little in well-trained or elite athletes [18, 22]. However, this may also be because elite athletes tend to do very little high-intensity training [34]. While low-intensity training may lead to a rapid increase in V̇O2max for individuals who initially have low V̇O2max, much higher intensity may be needed for well-trained athletes [34]. Studies that include training at or near V̇O2max for well-trained runners indicate that V̇O2max may also increase in well-trained athletes [34].

Studies of the effect of V̇O2max on race performance in well-trained or elite athletes have reported mixed findings. Some studies have found fairly strong correlations between race performance and V̇O2max (−0.5 to −0.87) in well-trained runners [9, 15, 29, 35]. However, several studies failed to find a correlation between race performance and V̇O2max in homogeneous samples consisting of only elites [18, 28, 30]. Of particular interest is a longitudinal study of 32 athletes who were followed over three years, in which Legaz-Arrese et al. [18] demonstrated only small changes in V̇O2max and no relationship between changes in V̇O2max and race performance. One interpretation of this finding has been that a high V̇O2max is necessary to gain membership to the elite performance cluster, but within this elite cluster, V̇O2max does not discriminate further [19].

Endogenous Selection Bias in the Literature on Running Performance

However, it is possible that the entire relationship between V̇O2max and race performance is spurious. Evaluating longitudinal data on sedentary individuals, Vollaard et al. [52] found that V̇O2max and race performance were related in the cross-sectional case, but that improvements in V̇O2max did not lead to any improvements in race performance, even in this group of sedentary individuals [52]. Other longitudinal studies have also failed to find a positive correlation between changes in V̇O2max and changes in race performance. In Ramsbottom et al. [44], improvements in a 5-km trial were correlated with RE but not V̇O2max, while in Paavolainen et al. [39], 5-km performance actually declined with improvements in V̇O2max.

2.3 Running Economy (RE)

Over the last few decades, East-African runners, particularly from Kenya and Ethiopia, have dominated middle- and long-distance running events. Several possible explanations have been proposed [56], but V̇O2max is probably not the explanation, as Kenyan and Ethiopian runners do not have superior V̇O2max compared to, for instance, European runners [46]. However, elite East-African runners are typically small, even compared to other elite runners, and studies have shown that smaller runners and runners with thin lower legs have better RE [12]. For instance, a recent study of competitive Kenyan distance runners demonstrated that having low body mass index (BMI), lower mid-thigh and ankle circumference, as well as a short Achilles moment arm, all had a positive influence on RE [28].

RE is defined as the oxygen cost of endurance running at a given speed [25], meaning that efficient or economical runners have a low RE value. RE has been shown to vary about 30–40% among individuals, and about 20–30% among elite athletes [28, 30]. We know little about whether RE can be improved [12], but some studies suggest that an increase in high-intensity interval training, plyometric training, altitude training, and heat exposure may improve RE [47].

RE has been researched extensively over the last few decades [47], and many studies have found a strong association between RE and race performance. Some studies indicate that RE is an even better predictor of race performance among elite runners than V̇O2max [47]. However, a recent study questions whether RE is indeed the explanation for the East-African running dominance [28]. In a sample of 32 competitive Kenyan distance runners, Mooses et al. [28] found that RE was not associated with running performance among elites. Similar findings were reported by Grant et al. [15], who studied a sample of well-trained runners.

2.4 Association Between V̇O2max and RE

An interesting and seemingly nonintuitive finding in the literature is a moderate positive correlation between V̇O2max and RE in samples of highly trained or elite athletes [28, 30, 50, 51], and a weak positive correlation in diverse samples of recreational runners [38]. In the literature, this positive correlation is typically described as an inverse association between V̇O2max and RE, meaning that individuals with high V̇O2max have on average poorer RE. Some researchers, for instance Joyner [23], have suggested that high V̇O2max may be incompatible with excellent RE (or lactate threshold).

An inverse association between V̇O2max and economy is also found among world-class cyclists. Lucia et al. [20] found that cyclists with higher V̇O2max had lower cycling economy (CE) and gross mechanical efficiency (GE), which led to a fruitful discussion about how expressing both V̇O2max and “economy” relative to body mass may lead to spurious findings [1, 2, 21].¹ However, after accounting for body mass, the inverse association among cyclists [21] and runners [51] still exists.

¹ Note that although most studies express economy and efficiency directly as the oxygen cost, some studies define economy and efficiency such that a high value indicates higher efficiency [21]. For instance, in the study by Lucia et al. [20], the correlations between V̇O2max and CE/GE are negative.

An inverse relationship between V̇O2max and RE is counterintuitive, because elite athletes have on average both higher V̇O2max and better RE [31, 43]. At this point there is no clear understanding of why there is an inverse relationship, only speculations [30, 38, 51]. One suggested explanation for this nonintuitive finding is that runners with higher V̇O2max rely more on fat utilization [38]. Another explanation is that greater lower limb mass will result in higher V̇O2max but also poorer RE, which could explain the inverse relationship [30]. Finally, overstriding leads to excessive vertical oscillation and braking forces (i.e., poor RE), but possibly also increased recruitment of muscle mass, which may result in higher V̇O2max [51].

However, in cross-sectional observational studies, the relationship between RE and V̇O2max is most definitely biased by endogenous selection bias. In fact, as the next section explains in detail, considering the strong relationship between RE and V̇O2max on the one hand and being an elite athlete on the other hand, it is likely that the entire inverse association is spurious.
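
The claim that selection on performance alone can produce such an association can be checked with a small simulation. The sketch below is a minimal illustration, not the author's Stata code from the ESM: it assumes a hypothetical population in which standardized V̇O2max and RE (oxygen cost) are drawn independently, race time depends linearly on both, and the "elite" group is simply the fastest 1%. The function names, the path values (−0.5 and 0.5), the sample size, and the seed are all illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_truncation(n=200_000, pi_coef=-0.5, gamma_coef=0.5,
                        top_share=0.01, seed=1):
    """Return (r_full, r_elite): the VO2max-RE correlation in the full
    hypothetical population and among the fastest `top_share` of runners."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - pi_coef**2 - gamma_coef**2)
    runners = []
    for _ in range(n):
        vo2max = rng.gauss(0.0, 1.0)   # standardized; higher = more capacity
        re_cost = rng.gauss(0.0, 1.0)  # standardized oxygen cost; higher = less economical
        # Race time (lower is better). Assumed: pi_coef < 0 and gamma_coef > 0.
        race_time = (pi_coef * vo2max + gamma_coef * re_cost
                     + resid_sd * rng.gauss(0.0, 1.0))
        runners.append((race_time, vo2max, re_cost))
    r_full = pearson([r[1] for r in runners], [r[2] for r in runners])
    runners.sort(key=lambda r: r[0])  # fastest runners first
    elite = runners[: int(n * top_share)]
    r_elite = pearson([r[1] for r in elite], [r[2] for r in elite])
    return r_full, r_elite


if __name__ == "__main__":
    r_full, r_elite = simulate_truncation()
    print(f"full population: r = {r_full:+.3f}")
    print(f"fastest 1%:      r = {r_elite:+.3f}")
```

Under these assumptions the correlation is close to zero in the full population but clearly positive among the fastest runners, that is, higher V̇O2max goes together with a higher oxygen cost, even though neither variable affects the other in the simulated data.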


3 Endogenous Selection Bias

3.1 Inverse Association Between V̇O2max and RE

Let us consider a hypothetical study (for simplicity, costs and availability are no issue here). Say that we defined the population of interest as male runners with marathon times below 02:15:00, and that we have drawn a random sample of 50 male athletes from this population, in whom we have (perfectly) measured RE and V̇O2max. For these 50 hypothetical athletes, we find a correlation between V̇O2max and RE of 0.25. The correlation is significant and the confidence interval is fairly narrow, and we can accordingly generalize the effect to the population.

From a merely descriptive point of view, this correlation is valid in the sense that the correlation would have been very similar had we used data on the entire population of sub-02:15:00 male marathoners. Our hypothetical study lets us conclude that athletes with high V̇O2max on average are less efficient (i.e., a higher oxygen cost) and those having exceptional RE on average have lower V̇O2max, which is similar to the conclusions that could be drawn based on the studies in the literature [20, 28, 30, 38, 50, 51].²

However, observed associations consist of both causal and various noncausal (i.e., spurious) components [10].³ Although the observed inverse association can be generalized to the population, it is nevertheless at least partly spurious. It means that the elite marathoners with the lowest level of V̇O2max typically have, on average, better RE than the elite marathoners with the highest level of V̇O2max, but that we should not expect to see a deterioration of an individual’s RE if he/she increased his/her V̇O2max (through for instance interval training).

To see why, we have to consider what factors influence the probability of being included in the population of interest (sub-02:15:00 marathon). Consider the empirically based but simplified example in Fig. 1a where RE and V̇O2max are determined independently, both RE and V̇O2max affect race performance (RP), and race performance affects the probability of being an elite athlete. In this example, V̇O2max and RE are marginally independent, meaning that knowing an individual’s level of V̇O2max in the full population (elites and non-elites) does not provide any information about the individual’s level of RE (no correlation between V̇O2max and RE). However, conditioning on elite status (or restricting the population of interest to elites) will induce a spurious inverse relationship between V̇O2max and RE, assuming that the path coefficient π is negative and the path coefficient γ is positive [4].

Linear path modeling is a useful tool to see why two marginally independent variables (RE and V̇O2max in Fig. 1a) may become dependent if we condition on a common outcome of these variables (elite status in Fig. 1a) [41]. If, in a full population sample, we estimate the standardized regression coefficient (β̂) of RE on V̇O2max conditional on elite status, we identify

$$\hat{\beta}_{RE,\dot{V}O_{2max} \mid elite} = -\frac{\gamma \pi \delta^{2}}{1 - \pi^{2} \delta^{2}}, \qquad (1)$$

where δ is the path coefficient from race performance to elite status (Fig. 1a).

Since the true causal effect is 0 (i.e., no correlation), Eq. (1) reveals that conditioning on elite status induces bias. This is generally known as Berkson’s paradox [4], endogenous selection bias [10], or collider bias [40]. Restricting the analysis to a subsample of elite athletes amounts to conditioning on elite status, and hence results in bias for the same reason [10].⁴ This is often known as sample selection bias [41] or sample truncation bias [10].
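
Eq. (1) can be checked numerically. The following is a minimal sketch under stated assumptions rather than a reproduction of the author's analysis: it simulates the linear model of Fig. 1a with standardized variables, treats elite status as a continuous standardized score that correlates δ with race performance (an assumption made here so that the linear path algebra applies directly), and compares the ordinary least squares slope of RE on V̇O2max, holding the elite score constant, with the closed-form expression −γπδ²/(1 − π²δ²). The function names and all parameter values are hypothetical.

```python
import math
import random


def eq1_prediction(pi_coef, gamma_coef, delta):
    """Closed-form conditional coefficient from Eq. (1)."""
    return -gamma_coef * pi_coef * delta**2 / (1.0 - pi_coef**2 * delta**2)


def simulated_conditional_slope(n=200_000, pi_coef=-0.5, gamma_coef=0.5,
                                delta=0.8, seed=2):
    """OLS slope of RE on VO2max with a continuous elite score held constant."""
    rng = random.Random(seed)
    rp_resid_sd = math.sqrt(1.0 - pi_coef**2 - gamma_coef**2)
    elite_resid_sd = math.sqrt(1.0 - delta**2)
    xs, zs, es = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # VO2max, standardized
        z = rng.gauss(0.0, 1.0)  # RE (oxygen cost), independent of VO2max
        rp = pi_coef * x + gamma_coef * z + rp_resid_sd * rng.gauss(0.0, 1.0)
        e = delta * rp + elite_resid_sd * rng.gauss(0.0, 1.0)  # elite score
        xs.append(x)
        zs.append(z)
        es.append(e)
    mx, mz, me = sum(xs) / n, sum(zs) / n, sum(es) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    see = sum((e - me) ** 2 for e in es)
    sxe = sum((x - mx) * (e - me) for x, e in zip(xs, es))
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sez = sum((e - me) * (z - mz) for e, z in zip(es, zs))
    # Normal-equation solution for the VO2max slope in  RE ~ VO2max + elite.
    return (see * sxz - sxe * sez) / (sxx * see - sxe**2)


if __name__ == "__main__":
    print("Eq. (1):    ", round(eq1_prediction(-0.5, 0.5, 0.8), 4))
    print("simulation: ", round(simulated_conditional_slope(), 4))
```

With π negative and γ positive, both numbers are positive: conditioning on elite status makes higher V̇O2max go together with a higher oxygen cost, although the simulated V̇O2max and RE are independent by construction.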
² Statistical generalization from a sample to a population (of interest) depends on assumptions such as random sampling. When using convenience samples, which are commonly used in the literature, the statistical significance of the correlations may be misleading [5]. With convenience samples, not all elites or highly trained individuals are equally likely to be included in the sample, and the study participants are likely to be more alike with regard to for instance training principles (e.g., amount of high-intensity interval training) than the participants would have been had they been selected through a probability sample. I suspect that this will lead to P values that are too small and that the uncertainty in the results is underestimated. However, the literature routinely reports P values without any discussion. To explain their sampling procedure, and discuss any potential bias, researchers should consider using guidelines for reporting observational studies, for instance the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement [53].

³ Following the counterfactual model of causality, causal effects are defined as contrasts between potential outcomes [33].

⁴ Linear path models allow us to calculate coefficients under the assumption of linearity and homogeneous effects (no interactions). Restricting the analysis to elite athletes is the same as adding a control for elite athletes and interactions between elite athletes and all independent variables.

It may seem counterintuitive at first that restricting the sample to elite athletes induces a spurious inverse association between RE and V̇O2max, but it is actually straightforward. Consider the situation where it is known that an individual is an elite marathoner (sub-02:15:00 marathon) and that this individual has only a mediocre RE compared to other elite marathoners. What could then be inferred about his/her V̇O2max? It is likely exceptional, because he/she is unlikely to run a sub-02:15:00 marathon with a mediocre RE and a mediocre V̇O2max. It is not that a high V̇O2max leads to poor RE (causal component), but that individuals who become elite athletes despite being
inefficient must necessarily have other traits that compensate, such as a very high V̇O2max (non-causal component).

[Figure 1: path diagrams. Panel a: V̇O2max → RP (π), RE → RP (γ), RP → Elite (δ). Panel b: as in panel a, with an added direct effect V̇O2max → RE (α) and a bidirectional arrow between V̇O2max and RE (λ).]

Fig. 1 Hypothesized associations between maximal oxygen uptake (V̇O2max), running economy (RE), race performance (RP), and being an elite athlete (Elite) in the entire population. The single-headed arrows represent direct effects from causes to effects (e.g., V̇O2max affects RP), while bidirectional arrows indicate that two variables have one or more causes in common. The Greek letters are path coefficients between pairs of variables, and are in this article interpreted as correlations (e.g., π is the correlation between V̇O2max and RP). a A simplified example where both V̇O2max and RE are assumed to affect RP but are marginally independent in the population (no arrow between V̇O2max and RE). Controlling for RP or its descendant Elite induces a spurious association between V̇O2max and RE, because RP is a common outcome of V̇O2max and RE (i.e., collider variable). b An example where (1) an increase in V̇O2max is hypothesized to impair RE (α) and (2) the association between V̇O2max and RE is confounded (λ). Controlling for RP or its descendant Elite still induces a spurious association between V̇O2max and RE

Note that this endogenous selection bias occurs not simply because the sample is restricted to a subgroup of the population in itself, but because it is restricted in a specific way. For instance, restricting the sample to individuals with a V̇O2max above 70 ml kg⁻¹ min⁻¹ will not lead to endogenous selection bias as long as all individuals in the population of interest (individuals with V̇O2max above 70 ml kg⁻¹ min⁻¹) have an equal chance of being sampled regardless of whether they are elites or not. In that case, and assuming that V̇O2max and RE are truly independently determined, there would be no reason to expect that those with a relatively low V̇O2max have a better RE than those with high V̇O2max. Thus, the problem occurs when conditioning on the collider variable elite status or, similarly, when the sample is restricted to elites only.

Endogenous selection bias is not about whether the effects are different for elites and non-elites. Endogenous selection bias is about how spurious associations are introduced in the data because of (for instance) sample restriction, resulting in erroneous correlations in the subgroup studied. Thus, the results are not even valid for the subgroup studied: they are statistical artifacts.

3.2 The Size of the Spurious Inverse Relationship

The simple formula in Eq. (1) demonstrates that the association between V̇O2max and RE becomes inverse after conditioning on race performance when working with population data, and, intuitively, the same holds when restricting the sample to elite athletes only. Figure 2a illustrates the amount of bias we may expect in settings where (1) the amount of variation in race performance that V̇O2max and RE explain is varied and (2) the sample selectivities differ. However, to illustrate the amount of bias, we need some assumptions. In Fig. 2a, the correlation between RE and V̇O2max in the full population is constrained to be zero. Additionally, the sizes of the correlations between V̇O2max and race performance and between RE and race performance are constrained to be equal. The implication of this latter assumption is that Fig. 2a illustrates an upper bound for the bias, as further discussed in Appendix S2 of the ESM. Appendix S3 of the ESM provides a supplementary data simulation.

Figure 2a illustrates that the amount of bias is a function of two aspects. First, the bias is greatest in cases where V̇O2max and RE explain most of the variation in race performance (i.e., stronger correlations between V̇O2max and race performance and between RE and race performance). Second, a more elite sample will lead to more bias, which is intuitive. In a sample of the best 100 marathoners in the world, those with (relatively) low V̇O2max must have exceptional RE. However, in a sample that consists of all but the slowest 10% of the population, those with low V̇O2max could very well have poor RE. That said, the bias may be substantial even if we include all but the slowest 25% of runners (see the top 75% line in Fig. 2a).

After accounting for body mass, the correlation between RE and V̇O2max in samples of highly trained individuals ranges from about 0.25 to 0.30 [51]. In Fig. 2a, we see that among the top 1% and 25% of the runners, the estimated correlation is about 0.25–0.30 (upper bound) when V̇O2max and RE explain about 50% of the variation in race performance in the population.⁵

⁵ The coefficient of determination (R²) can be calculated by summing the squared semipartial correlations, which in this case are identical to the pairwise correlations (Fig. 1a): R² = π² + γ². Since π and γ are constrained to be equal, π and γ can be calculated from the figure using (R² × 1/2)^(1/2).

Thus, for the entire
correlation between RE and V̇O2max to be spurious, RE and V̇O2max may only need to account for approximately 50% of the variation in race performance, which, based on the literature, is plausible [11, 26, 32, 47]. This demonstrates that restricting the sample to elite athletes has the potential to substantially bias the findings in the literature.

[Figure 2: two line plots. Panel a: estimated correlation (y-axis, 0.00 to 0.80) against R² (x-axis, 0.00 to 1.00). Panel b: estimated correlation (y-axis, −0.80 to 0.00) against true (causal) correlation (x-axis, −0.80 to 0.00). Lines in both panels: Top 1%, Top 25%, Top 50%, Top 75%, Top 99%, Entire population.]

Fig. 2 Illustration of the size of endogenous selection bias (see Appendix S1 in the ESM for the Stata code). a The figure is constructed by generating a set of hypothetical full populations where the correlation between RE and V̇O2max is constrained to be zero, while the amount of variation in race performance (R²) that RE and V̇O2max explain varies between 0 and 0.98 (x-axis). The sizes (but not signs) of the correlations between RE and race performance and between V̇O2max and race performance are constrained to be equal. From each of these populations, the correlation between RE and V̇O2max is estimated (y-axis) for individuals in the subpopulations with the top 1, 25, 50, 75, and 99% race performances, as well as for the entire population. b The figure is constructed by generating a set of hypothetical full populations where the correlation between V̇O2max and race performance varies between 0 and −0.75 (x-axis). From each of these populations, the correlation between V̇O2max and race performance is estimated (y-axis) for individuals in the subpopulations with the top 1, 25, 50, 75, and 99% race performances, as well as for the entire population. Replacing V̇O2max with RE produces the same results, only with different signs

Thus far, the discussion has relied on the simplified model where RE and V̇O2max are assumed to be marginally independent (Fig. 1a). However, we could also expect RE and V̇O2max to be marginally dependent (Fig. 1b). First, elite runners have both higher V̇O2max and better RE than good runners [43]. This may indicate that some unobserved background factors (e.g., genetics) affect both V̇O2max and RE, as suggested by the curved dotted line in Fig. 1b. Second, we have also seen several explanations for why a high V̇O2max may actually impair RE [30, 38, 51]. Thus, Fig. 1b includes a direct effect of V̇O2max on RE.

In this more complex example, the path coefficients become more complicated (for a general example, see Pearl [41]), but the bias is no less present. The correlation between V̇O2max and RE in an elite sample would be equal to the causal effect α plus some bias caused by (1) unobserved confounding (λ) and (2) endogenous selection bias.

Section 3.1 explained why restricting the sample to contain only elite athletes may result in a spurious inverse association between V̇O2max and RE, while Sect. 3.2 has illustrated the amount of bias we may expect given some simplified assumptions. In the next section, I show that the effects of RE and V̇O2max on race performance may be biased for the same reason.

3.3 The Effects of RE and V̇O2max on Race Performance

Many studies that have used elite samples have failed to find a significant effect of RE and V̇O2max on race performance [18, 28, 30]. Consider the study of 32 competitive Kenyan runners, in which Mooses et al. [28] found neither a significant effect of RE on IAAF score (r = −0.01) nor a significant effect of V̇O2max on IAAF score (r = 0.29) [28].⁶ A study of Olympic trials qualifiers also found a nonsignificant correlation between V̇O2max and race performance (r = −0.21) [30].

⁶ This study also found an inverse relationship between RE and V̇O2max (r = 0.42), which, as discussed in Sect. 3.1, is at least partially spurious.

However, restricting the sample to elite athletes not only induces an inverse spurious association between RE and V̇O2max, it also biases the effects of RE and V̇O2max on race performance. Conditioning on elite status, which is a descendant of the outcome variable race performance,
induces a spurious association between the predictor variable and all unmeasured causes of elite status.

To keep things simple, let us first consider what happens if we condition on elite status in a regression of race performance on V̇O2max in a population sample where V̇O2max is hypothesized to be exogenous. Based on Fig. 1a, we identify the following [41]:

$$\hat{\beta}_{RP,\dot{V}O_{2max} \mid elite} = \frac{\pi (1 - \delta^{2})}{1 - \pi^{2} \delta^{2}} \qquad (2)$$

In this particular example, this means that the estimate is biased towards zero, as δ > 0. To see the estimated effect of RE, π simply needs to be replaced by γ. Restricting the sample to a subsample of elite athletes amounts to conditioning on race performance, and hence results in bias for the same reason.

Figure 2b illustrates the size of the endogenous selection bias in some settings, based on the model in Fig. 1a (see Appendix S3 in the ESM for a supplementary data simulation). If there is no bias, we should expect the value on the x-axis (the true correlation between V̇O2max and race performance) to perfectly match the value on the y-axis (the estimated correlation), as is the case when the correlation is estimated in the entire population. If the value of the y-axis is smaller than the value on the x-axis, as is the case in all the subpopulations, then the effect of V̇O2max is attenuated.

There are two important take-away points from Fig. 2b. First, we see that the amount of attenuation bias depends on the sample selectivity. When comparing results from studies of elite athletes, highly-trained runners, recreational runners, and untrained runners, it may be tempting to, for example, conclude that V̇O2max matters more for untrained runners than for elites. In fact, cross-sectional studies often suggest that using a homogeneous sample of elite athletes in itself could explain the failure to find a significant relationship of RE and/or V̇O2max to race performance [20, 28, 30]. However, Fig. 2b illustrates that the amount of attenuation is a function of the sample selectivity, and that even if the (causal) correlation is identical for all runners, one may obtain quite different results in different samples simply because of endogenous selection bias.

Second, the amount of attenuation bias depends on the true (causal) correlation, and the bias is zero when the true correlation is zero. Thus, unlike the case of the inverse relationship between RE and V̇O2max, adjusting for a

The amount of attenuation in studies by Mooses et al. [28] and Morgan and Daniels [30] is difficult to predict, as we do not know the true correlation between race performance and elite status (δ); nor do we know if RE and V̇O2max are exogenous (i.e., the no-confounding assumption). Sample sizes of 32 [28] and 22 [30] also mean that the point estimates are imprecise. However, assuming that RE and V̇O2max are exogenous, correlations of −0.01, 0.29, and −0.21 in an elite sample (top 1%) would be expected if the true correlations were about −0.05, 0.7, and −0.6. Although this is only speculation based on assumptions such as the no-confounding assumption and taking the point estimates at face value, it suggests that we could expect large bias when regressing race performance on predictors in a sample that consists of elite athletes.

Some studies have used longitudinal data to investigate how change in V̇O2max relates to change in race performance [19, 52]. The next section discusses how the within estimator removes endogenous selection bias but also has some drawbacks.

4 Why Longitudinal Data May Not Be the Solution

4.1 Within-Subject Variance as a Solution

Let us consider the question of what level of RE individual i1 would have if he/she had a V̇O2max of 70 ml kg⁻¹ min⁻¹ rather than 65 ml kg⁻¹ min⁻¹. Since we could only observe the actual V̇O2max (65 ml kg⁻¹ min⁻¹) and not the counterfactual one (70 ml kg⁻¹ min⁻¹), we have a missing data problem. In the cross-sectional case, we solve this missing data problem by comparing individual i1 with another individual i2 who has a V̇O2max of 70 ml kg⁻¹ min⁻¹ under the assumption that these two individuals are equal in all other relevant aspects. However, when using a sample consisting of elites, the individual i2 is not otherwise identical to i1, as he/she likely has a poorer RE. This is why the cross-sectional comparison breaks down, as explained in Sect. 3.

However, if we have repeated observations of each individual (i.e., longitudinal data), then we can compare individual i1 at time t1 with the same individual at time t2. Under the assumption that the bias is invariant over time and there is no selective attrition, the fixed effects model
descendant of the outcome variable would only generate takes account of bias caused by both confounding and
bias if there is a marginal association (confounding or endogenous selection [3, 6, 54], and we can identify causal
causal effect) between VO _ 2max and race performance. This effects. Although not motivated by endogenous selection
can also be seen by replacing p with 0 in Eq. (2), meaning bias, some studies have used this methodology to investi-
that the numerator reduces to 0, the denominator reduces to gate the association between VO_ 2max and race performance,
1, and the estimated association is 0. finding no association [18, 52].
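
The within-subject comparison can be sketched on simulated data. The following minimal Python sketch is an illustration only, not the analysis used in any of the cited studies: the panel dimensions, the effect size of 0.5, the trait loadings, and all variable names are assumptions made here. It generates a time-invariant runner trait that raises both V̇O2max and race performance, then compares a pooled cross-sectional slope with the slope obtained after subtracting each runner's own mean (the within transformation). It covers only bias that is constant within a runner; it does not model selection on time-varying race performance or selective attrition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 2000 runners, each observed at 4 time points.
n_runners, n_waves = 2000, 4
true_effect = 0.5  # assumed effect of VO2max on race performance

# Time-invariant runner trait that shifts both VO2max and race performance.
trait = rng.normal(size=n_runners)
vo2 = 0.8 * trait[:, None] + rng.normal(size=(n_runners, n_waves))
rp = true_effect * vo2 + trait[:, None] + rng.normal(size=(n_runners, n_waves))


def slope(x, y):
    """Bivariate OLS slope of y on x, pooling all observations."""
    x = x.ravel() - x.mean()
    y = y.ravel() - y.mean()
    return float(x @ y / (x @ x))


# Pooled (cross-sectional) slope: contaminated by the unobserved trait.
pooled = slope(vo2, rp)

# Within slope: subtract each runner's own mean before regressing.
vo2_within = vo2 - vo2.mean(axis=1, keepdims=True)
rp_within = rp - rp.mean(axis=1, keepdims=True)
within = slope(vo2_within, rp_within)

print(f"pooled slope: {pooled:.3f}")
print(f"within slope: {within:.3f}")
```

Under these assumptions the pooled slope should land well above 0.5 while the within slope should land close to it, because demeaning removes anything that is constant within a runner.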

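
Eq. (2) can also be checked numerically. The sketch below is a hedged illustration under one reading of that equation: a linear model with standardized variables in which V̇O2max affects race performance (RP) with path coefficient p, and RP affects a continuous stand-in for elite status with path coefficient d. The values p = 0.7 and d = 0.6, the sample size, and the variable names are assumptions chosen here, and the code adjusts linearly for the elite variable rather than truncating the sample at the top 1%.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
p, d = 0.7, 0.6  # assumed standardized path coefficients

# Standardized linear model: VO2max -> RP -> elite (continuous proxy).
vo2 = rng.normal(size=n)
rp = p * vo2 + np.sqrt(1 - p**2) * rng.normal(size=n)
elite = d * rp + np.sqrt(1 - d**2) * rng.normal(size=n)

# Regress RP on VO2max while adjusting for the descendant of the outcome.
design = np.column_stack([np.ones(n), vo2, elite])
coef, *_ = np.linalg.lstsq(design, rp, rcond=None)
estimated = float(coef[1])

# Value implied by Eq. (2).
predicted = p * (1 - d**2) / (1 - p**2 * d**2)

print(f"estimated coefficient on VO2max: {estimated:.3f}")
print(f"Eq. (2) prediction:              {predicted:.3f}")
```

With these inputs Eq. (2) gives about 0.54, which is below the true value of 0.7, and the regression estimate should agree with it up to sampling noise. Setting p to 0 makes both quantities collapse to zero, matching the observation that the bias vanishes when the true association is zero.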

4.2 Small Sample Size and Noisy Measures

Despite the fact that the fixed effects estimator is a solution to endogenous selection bias, the use of within-subject variation has major drawbacks that render it basically ineffective in the literature. For instance, studies of elite athletes have demonstrated that heavy training may not change V̇O2max at all or may change it only marginally [18, 22]. Thus, by discarding all between-subject variation, we utilize only a fraction of the variance in the data set. The implication is likely large standard errors and imprecise results.

Imprecise results are especially problematic given the small sample sizes in the literature [18, 51, 52], and accordingly their low statistical power. Adding individual fixed effects will likely not increase statistical power, which means that the correlations need to be very large for this design to be able to detect any significant effects [3, 14].⁷ This casts some doubt on the studies that find no effect of RE and V̇O2max using longitudinal data [18, 52].

⁷ This means that studies which estimate within-subject correlations with small sample sizes and find significant effects most likely exaggerate the correlation.

Additionally, RE and V̇O2max would likely differ depending on factors such as altitude [28], time of year [22], and running surface [48, 49], and they would be measured with some degree of error [17, 47]. For instance, the typical measurement error of RE is shown to be about 2.4% [48]. Although this amount of measurement error is of little concern in a cross-sectional case, it will likely substantially attenuate the estimates when using a within-subject estimator. For example, because V̇O2max changes very little in elite athletes, the within-subject variation we observe may to a large extent be caused by random measurement error (i.e., noise) [17], and because the measurement error is random, it is accordingly not related to race performance. In sum, null findings may not be that surprising in studies that evaluate longitudinal data [18, 52].

5 Conclusions

In this article, I have provided a critical review of the literature that investigates the associations between RE, V̇O2max, and elite running performance. Studies in this literature routinely use samples of highly trained individuals, which inevitably results in endogenous selection bias. The crux of the problem is that restricting the analysis sample to a population of interest amounts to conditioning on whatever increases the probability of being in the group of interest, such as prior race performance. If, for instance, (prior) race performance is either (1) the common outcome of two variables (V̇O2max and RE) or (2) the outcome variable or a descendant of the outcome variable (race performance), then the sample restriction induces bias in the analysis.

The main conclusions of this review can be summarized as follows. First, I have demonstrated that the inverse relationship between RE and V̇O2max that many studies find [28, 30, 50, 51] is likely spurious. Second, I have demonstrated that endogenous selection bias may substantially attenuate the effects of predictors on race performance, which may explain why some studies that use elite samples fail to find significant effects of V̇O2max [18, 28, 30] and RE [15, 28]. Third, I have shown that a more elite sample will lead to more bias, but that the bias may be substantial even in samples of recreational runners. Fourth, I have argued that using within-subject variation is problematic. Given the small sample sizes in the literature, the fact that V̇O2max changes only marginally in elite athletes [18, 22], and the problem of measurement error [17, 48], null findings [18] are not surprising.

Studies in the literature provide many interesting findings; for example, the relationship between anthropometric variables and RE in Mooses et al. [28].⁸ Like many other studies, Mooses et al. [28] also contribute by describing key characteristics of elite distance runners. However, caution should be exercised when estimating correlations between RE, V̇O2max, and race performance in observational samples of elite runners.

⁸ This association is also biased by endogenous selection, but the bias is likely small and the findings accordingly informative. Given the following model: anthropometric variable → RE → RP → elite, then the elite variable is a descendant of the outcome variable RE, and restricting the sample to elites amounts to conditioning on the outcome variable. However, since the effects of RE on elite are less than the effects of RP on elite, the bias would most likely be small.

The challenge of estimating causal effects using a random sample of observational data from a population is in itself formidable [33], and having a (convenience) sample of elite or well-trained athletes magnifies the challenge. Although eliminating all sources of bias, such as measurement error, is difficult to achieve, the bias caused by endogenous selection in this particular literature is so fundamental that it cannot be ignored. Finding a good solution to the problem of endogenous selection bias is difficult, mainly because elite runners by definition are rare, but also because they are unlikely to participate in randomized trials (which would eliminate endogenous selection bias).

Nevertheless, perhaps the best solution is to conduct experiments to investigate the effects of changes in V̇O2max and RE and how these relate to changes in race


