Journal of Sports Sciences

Reliability and stability of anthropometric and performance measures in highly-trained young soccer players: effect of age and maturation
Martin Buchheit & Alberto Mendez-Villanueva
Aspire, Academy for Sports Excellence, Football Performance and Science Department, Doha, Qatar
Published online: 08 May 2013.

(Accepted 27 February 2013)

The purpose of this study was to assess both short-term reliability and long-term stability of anthropometric and physical
performance measures in highly-trained young soccer players in relation to age and maturation. Data were collected on 80
players from an academy (U13–U18, pre- (n = 14), circum- (n = 32) and post- (n = 34) estimated peak height velocity,
PHV). For the reliability analysis, anthropometric and performance tests were repeated twice within a month. For the
stability analysis, these tests were repeated 12 times over a 4-year period in 10 players. Absolute reliability was assessed with
the typical error of measurement, expressed as a coefficient of variation (CV). Relative reliability and long-term stability
were assessed using the intraclass correlation coefficient (ICC). There was no clear age or maturation effect on either the
CVs or ICCs: e.g., Post-PHV vs. Pre-PHV: effect size = –0.37 (90% confidence limits (CL): –1.6;0.9), with chances of
greater/similar/lower values of 20/20/60%. For the long-term stability analysis, ICCs varied from 0.66 (0.50;0.80) to 0.96
(0.93;0.98) for 10-m sprint time and body mass, respectively. The short-term reliability of anthropometry and physical
performance measures is unlikely to be affected by age or maturation. However, some of these measures are unstable
throughout adolescence, which questions their usefulness in a talent identification perspective.

Keywords: variability, peak height velocity, adolescence, maximal sprinting speed, maximal aerobic speed

Introduction

Anthropometric measures and physical performance tests are regularly performed in soccer academies, both for aiding selection/detection (Reilly, Williams, Nevill, & Franks, 2000; Vaeyens et al., 2006) and training monitoring purposes (Buchheit, Simpson, Al Haddad, Bourdon, & Mendez-Villanueva, 2012). The great majority of test batteries used for talent identification in soccer include, among others, measures of height, body mass, sprinting speed and cardiorespiratory fitness (Reilly et al., 2000; Vaeyens et al., 2006). Adolescent players who possess the required characteristics to make the elite adult level may not necessarily retain these attributes throughout growth and maturation (Vaeyens, Lenoir, Williams, & Philippaerts, 2008). However, evidence arising from non-experimental studies suggests that some anthropometric traits and physical capacities might be important in determining whether already selected elite young soccer players are successful or not in acceding to higher standards of play (le Gall, Carling, Williams, & Reilly, 2010). Monitoring changes in body dimensions and physical performance throughout the training/competitive year is also important in developing soccer players to assess maturation timing (Philippaerts et al., 2006), training effectiveness (Buchheit, Mendez-Villanueva, Delhomel, Brughelli, & Ahmaidi, 2010; Buchheit, Simpson, Al Haddad, et al., 2012; Mujika, Santisteban, & Castagna, 2009) and eventually adapt/modify training contents in relation to maturation status (Ford et al., 2011).

In addition to accuracy, the short-term reliability of any measurement (i.e., the consistency of a particular measure when repeated on different occasions in similar conditions) is of great importance for practitioners and researchers to avoid biased interpretation when both assessing changes in a marker (absolute reliability, monitoring purpose) (Hopkins, 2000) and/or ranking individuals (relative reliability, selection/identification purpose) (Weir, 2005). While the time course of the changes in anthropometric and physical performance measures throughout adolescence has now been documented in developing soccer players (Philippaerts et al., 2006; Spencer, Pyne, Santisteban, & Mujika, 2011; Williams, Oliver, & Faulkner, 2011), whether maturity status affects the short-term reliability of these measures has not been investigated yet. Because of the limited

experience of the younger and/or pre-pubertal players in the different testing protocols (familiarisation), reliability could be worse for these groups. The disruption of motor coordination (“adolescent awkwardness”) generally observed before the age at peak height velocity (APHV) (Beunen & Malina, 1988) might also be associated with a poorer reliability of physical performance measures.

Together with short-term reliability, the long-term stability of these measures throughout consecutive years (Abbott & Collins, 2002) refers to the consistency of the position or rank of individuals in the group relative to others, and directly influences the level of confidence with which the physical potential of a talented young player can be predicted to his late adolescence/early adulthood (Vaeyens et al., 2008). This has obvious implications in elite soccer given the economic stakes associated with players’ recruitment in the elite academies. Long-term stability is generally examined via correlation coefficients (Abbott & Collins, 2002) or the intraclass correlation coefficient (ICC), with ICC values > 0.70 generally considered as reflecting high stability. To date, research on the long-term stability of anthropometric and physical performance measures in young soccer players is very limited, and data on the general population have been inconclusive. For example, while the long-term stability of performance measures was shown to be moderate (Maia et al., 2003) to good (Maia et al., 2001) throughout adolescence by some authors, others reported a poor stability of such measures (i.e., most of the correlation coefficients between the test-retest values were lower than 0.70) (Abbott & Collins, 2002). The adolescents examined in these latter studies were however either sedentary or only involved in a generic sport programme, so results cannot be compared to a specific population of highly-trained soccer players. Additionally, the test batteries included generic physical test measures (e.g., isometric strength, muscle trunk endurance), and not those typically employed in soccer academies (e.g., 10-m sprint time, maximal sprinting speed).

The first aim of the present study was therefore to assess the short-term (absolute and relative, Weir, 2005) reliability of anthropometric and physical performance measures in highly-trained soccer players in relation to age and maturation. Because of the combined effect of testing experience and maturation on these measures, we expected that the level of reliability would improve with age/maturity status. The second objective was to further examine, in a selected number of outfield players, the long-term stability of anthropometric and physical performance measures over a 4-year training/competitive period. Since maturity timing and, in turn, physical performance, may differ between players of the same age within the same team (Figueiredo, Coelho, Cumming, & Malina, 2010), we expected to observe a poor long-term stability of the measures when comparing a player to his team-mates.

Participants

Data were collected in 80 highly-trained young football players (14.5 ± 1.5 years) from an elite soccer academy (from Under 13 (U13) to U18). All the players, irrespective of age groups, participated on average in ~14 hours of combined soccer-specific training and competitive play per week (6–8 soccer training sessions, 1 strength training session, 1–2 conditioning sessions, 1 domestic game per week and 2 international club games every 3 weeks). All players had a minimum of 3 years prior soccer-specific training. Written informed consent was obtained from the players and their parents. The study was approved by the local research ethics committee and conformed to the recommendations of the Declaration of Helsinki.

Study overview

Testing was conducted on the same indoor synthetic track which allowed the maintenance of standardised environmental conditions (22.0 ± 0.5°C, 55% relative humidity). Training contents and load during the two-to-three weeks preceding the testing sessions were well standardised, similar before each testing period and comparable for each team. For the short-term reliability analyses, anthropometric and performance tests were repeated twice within a month during the competitive season (during “standardised” weeks). For the long-term stability analysis, we used data from 10 players belonging to the same team, who were tested 3 times per year (every 4 months) with the same testing battery over a 4-year period (total = 11 testing sessions). All players were familiar with the physical tests, which included: standing and sitting height, body mass (BM), the sum of 7 skinfolds, counter movement jump (CMJ), 10-m sprint time, maximal sprinting speed and peak incremental test speed (used as an indirect measure of maximal aerobic speed). Anthropometric measures were taken in the morning before breakfast. The incremental test was performed during a morning training session (8 AM), while the CMJ and speed tests were performed during an afternoon session (3 PM). Testing sessions were at least 24 h apart.

Anthropometric measures and maturity status

All anthropometric measures were taken in the morning by the same experienced anthropometrist

(International Standards for Anthropometric Assessment, ISAK Level 3). Dimensions included stretch stature, sitting height, BM, and sum of seven skinfolds (triceps, subscapular, biceps, supraspinale, abdominal, front thigh, and medial calf). Stretch stature was measured using a wall-mounted stadiometer (± 0.1 cm, Holtain Ltd., Crosswell, UK), sitting height with a stadiometer mounted on a purpose-built table (± 0.1 cm, Holtain Ltd., Crosswell, UK), body mass with a digital balance (± 0.1 kg, ADE Electronic Column Scales, Hamburg, Germany), and skinfold thicknesses with a Harpenden skinfold caliper (± 0.1 mm, Baty International, Burgess Hill, UK). Landmarks for each skinfold measurement were in accordance with previously described procedures (Marfell-Jones, Olds, Stewart, & Carter, 2006) with all skinfold measurements taken on the right side of the body. The age at peak height velocity (PHV) was used as a relative indicator of somatic maturity representing the time of maximum growth in stature during adolescence as described by Mirwald, Baxter-Jones, Bailey, and Beunen (2002). Ethnicity of the players was Arab (Middle Eastern; considered as “white” on the Census forms, as the Canadian adolescents who served to determine the initial regressions to estimate age at PHV (Mirwald et al., 2002)). The effect of ethnicity on the validity of biological maturity estimates using the procedures described above is presently unknown; the equation was therefore assumed to be valid for the present sample (Buchheit, Mendez-Villanueva, Simpson, & Bourdon, 2010; Mendez-Villanueva et al., 2010). Data derived from a sample of 90 young soccer players (age range: 12.1–17.3 years) in our academy showed that age from/to PHV is well correlated (r = 0.69 (90% confidence limits: 0.59;0.77)) with skeletal age (SA) (estimated from a hand and wrist radiograph; Gilsanz–Ratib’s bone age atlas (Gilsanz & Ratib, 2005)). The average estimate of APHV was 14.3 ± 0.7 years for the present study population. For analysis, players were then either grouped as U14 (n = 35), U16 (n = 30) or U18 (n = 15), or allocated to either Pre- (< –1 year from APHV, n = 14), Circum- (–1 to +1 year from/to APHV, n = 32) or Post- (> 1 year from APHV, n = 34) PHV groups, based on their estimated APHV during the first testing session (Table I).

Speed tests

All players performed two maximal 40-m sprints during which 10-m split times were recorded using dual-beam electronic timing gates (Swift Performance Equipment, Lismore, Australia). Acceleration capacity was inferred from the first 10-m split (s). Maximal sprinting speed (km · h−1) was defined as the fastest 10-m split time measured during each 40-m sprint (Buchheit, Simpson, Peltola, & Mendez-Villanueva, 2012). Split times were measured to the nearest 0.01 s. Players commenced each sprint from a standing start with their front foot 0.5 m behind the first timing gate and were instructed to sprint as fast as possible over the full 40 m. The players started when ready, thus eliminating reaction time. Each trial was separated by at least 60 s of recovery with the best performance used as the final result.

Lower limb explosive strength

A vertical countermovement jump (CMJ, cm) with flight time measured with a force plate (Kistler 9286AA, Kistler Instrument Corp., Winterthur,

Table I. Anthropometric and physical performance measures in highly-trained soccer players in relation to age group and estimated peak height velocity.

Group        n   Age         APHV        Height       Body mass    Sum of 7       10-m         Maximal sprinting  CMJ         Peak incremental
                 (years)     (years)     (cm)         (kg)         skinfolds (mm) sprint (s)   speed (km · h−1)   (cm)        test speed (km · h−1)

U14          35  13.2 ± 0.6  −0.9 ± 0.7  155.9 ± 7.3  43.9 ± 8.1   51.8 ± 22.3    1.89 ± 0.09  26.8 ± 2.1         33.0 ± 4.5  15.5 ± 1.1
U16          30  15.0 ± 0.6  0.6 ± 1.0   165.4 ± 9.0  52.9 ± 10.1  46.9 ± 9.9     1.81 ± 0.09  29.4 ± 2.1         39.6 ± 5.1  16.3 ± 0.9
U18          15  16.9 ± 0.6  2.0 ± 0.6   170.8 ± 7.5  61.4 ± 9.8   42.9 ± 5.6     1.71 ± 0.05  31.5 ± 1.2         47.2 ± 6.1  16.9 ± 1.1

Pre-PHV      14  12.8 ± 0.6  −1.5 ± 0.3  150.7 ± 5.4  38.5 ± 3.5   48.6 ± 12.6    1.92 ± 0.05  25.6 ± 1.3         29.6 ± 3.7  15.1 ± 0.6
Circum-PHV   32  14.2 ± 0.8  −0.2 ± 0.6  160.6 ± 6.1  48.1 ± 7.5   49.0 ± 20.8    1.85 ± 0.10  28.1 ± 2.0         37.4 ± 4.7  16.0 ± 1.0
Post-PHV     34  15.6 ± 1.5  1.8 ± 0.5   172.2 ± 6.5  61.8 ± 8.4   47.2 ± 10.0    1.76 ± 0.08  30.7 ± 2.0         42.8 ± 7.1  16.6 ± 1.0

Anthropometric measures, estimated age from/to peak height velocity (APHV), 10-m sprint time, maximal sprinting speed, counter
movement jump (CMJ) and peak incremental test speed in young soccer players, with data analysed in relation to age groups (Under 14
(U14), Under 16 (U16) or Under 18 (U18)) and maturity status (pre- (Pre-PHV), circum- (Circum-PHV) or post- (Post-PHV) estimated
peak height velocity young soccer players). *Numbers in brackets stand for n Pre-, Circum- or Post-PHV players per age group.
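
The maturity grouping used in Table I can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function name `maturity_group` is hypothetical, the Post-PHV cut-off is assumed to be more than 1 year past APHV (consistent with the Post-PHV mean of 1.8 years in Table I), and the handling of values falling exactly on ±1 year is an assumption.

```python
def maturity_group(years_from_aphv: float) -> str:
    """Allocate a player to a maturity group from his estimated maturity offset.

    years_from_aphv is the estimated time from/to age at peak height velocity
    (negative = before APHV, positive = after APHV).
    """
    if years_from_aphv < -1.0:
        return "Pre-PHV"        # more than 1 year before APHV
    if years_from_aphv <= 1.0:
        return "Circum-PHV"     # within 1 year of APHV (boundaries assumed inclusive)
    return "Post-PHV"           # more than 1 year after APHV (assumed threshold)


# Group means from Table I fall in the expected categories.
for offset in (-1.5, -0.2, 1.8):
    print(offset, maturity_group(offset))
```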

Switzerland) to calculate jump height was used to assess lower limb explosive strength. Players were instructed to keep their hands on their hips with the depth of the counter movement self-selected. Each trial was validated by visual inspection to ensure each landing was without significant leg flexion. At least three valid CMJs were performed separated by 25 s of passive recovery, with the best performance recorded.

Incremental field running test

A modified version of the University of Montreal Track Test (Leger & Boucher, 1980) (i.e., Vam-Eval) was used to approach maximal aerobic speed (Buchheit, Mendez-Villanueva, Simpson, et al., 2010). The test began with an initial running speed of 8.5 km · h−1 with consecutive speed increases of 0.5 km · h−1 each minute until exhaustion. The players adjusted their running speed according to auditory signals timed to match 20-m intervals delineated by marker cones around a 200-m long indoor athletics track. The test ended when the players failed on two consecutive occasions to reach the next cone in the required time. The average velocity of the last stage completed was recorded as the players’ peak incremental test speed (km · h−1). If the last stage was not fully completed, the peak incremental test speed was calculated as peak incremental test speed = S + (t/60 × 0.5), where S is the last completed stage speed in km · h−1 and t is the time in seconds of the uncompleted stage.

Statistical analysis

Data in the text and figures are presented as means with 90% confidence limits (CL) and intervals (CI), respectively. All data were first log-transformed to reduce bias arising from non-uniformity error. The short-term reliability of each anthropometric and physical performance measure was assessed by calculating both the typical error of measurement (TE, absolute reliability), expressed as a coefficient of variation (CV, 90% CL) (Hopkins, 2000) and the intraclass correlation coefficient (ICC, 90% CL, relative reliability) (Weir, 2005) with a specifically-designed spreadsheet (Hopkins, 2012). The TE as calculated by Hopkins is of great interest for the present study design since it is insensitive to the change in the mean between the successive trials. The long-term stability of players’ anthropometric and performance measures throughout the 4-year period was also assessed with an ICC. When a player missed a testing session, the individual rankings of the remaining players were rescaled to a 10-point scale (i.e., rescaled rank = rank × 10/n players tested). Possible differences in short-term reliability between the different maturity and age groups were analysed for practical significance by comparing the average CV and ICC obtained from the 8 measures in each group (Fourchet, Materne, Horobeanu, Hudacek, & Buchheit, 2012), using magnitude-based inferences (Hopkins, Marshall, Batterham, & Hanin, 2009). Standardised differences or effect sizes (90% confidence interval) for CVs and ICCs were calculated, and the threshold values for Cohen effect size (ES) statistics were > 0.2 (small), > 0.6 (moderate), and > 1.2 (large) (Hopkins et al., 2009). The magnitude of the ICC was assessed using the following thresholds: > 0.99, extremely high; 0.99–0.90, very high; 0.90–0.75, high; 0.75–0.50, moderate; 0.50–0.20, low; < 0.20, very low (WG Hopkins, unpublished observations). Probabilities were also calculated to establish whether the true (unknown) differences were lower, similar or higher than the smallest worthwhile difference or change (0.2 multiplied by the between-subject standard deviation, based on Cohen’s effect size principle). Quantitative chances of higher or lower differences were evaluated qualitatively as follows: <1%, almost certainly not; 1–5%, very unlikely; 5–25%, unlikely; 25–75%, possible; 75–95%, likely; 95–99%, very likely; >99%, almost certain. If the chances of substantially higher and lower values were both >5%, the true difference was assessed as unclear. Otherwise, we interpreted that change as the observed chance (Hopkins et al., 2009).

Short-term reliability

The different reliability variables for anthropometric and performance measures are presented in Tables II and III for players grouped by age and estimated maturity status, respectively. When comparing the different CVs obtained for each age group, there was no clear difference in CVs: U16 vs. U14: effect size (ES) = –0.14 (–0.9;0.7), with chances of greater/similar/lower values of 24/31/45%, U18 vs. U14: ES = –0.21 (–1.0;0.6), 20/29/51% and U18 vs. U16: ES = –0.09 (–0.9;0.8), 28/32/41%. There were also no clear differences between the ICCs: U16 vs. U14: ES = 0.36 (–0.5;1.2), 63/24/13%, U18 vs. U14: ES = –0.01 (–0.8;0.8), 33/32/35% and U18 vs. U16: ES = –0.30 (–1.1;0.5), 15/26/58%. Similarly, there was no clear difference in the CVs obtained for each maturity group: Post-PHV vs. Pre-PHV: ES = –0.37 (–1.6;0.9), 20/20/60%, Post-PHV vs. Circum-PHV: ES = –0.01 (–0.9;0.8), 33/32/35% and Circum-PHV vs. Pre-PHV: ES = –0.27 (–1.5;1.0), 24/22/54%. Finally, all between-group comparisons for ICCs
Table II. Reliability of anthropometric and physical performance measures over a month in the different age groups.

                      Height              Body mass           Sum of 7 skinfolds  APHV                10-m sprint         Maximal sprinting   CMJ                 Peak incremental
                      (cm)                (kg)                (mm)                (years)             (s)                 speed (km · h−1)    (cm)                test speed (km · h−1)

U14 (n = 35)
Pairwise comparisons  6                   6                   6                   6                   26                  23                  23                  30
Mean change           2.0 (1.8;2.1)       1.6 (0.9;2.3)       −0.1 (–6.5;6.2)     0.0 (–0.1;0.1)      −0.01 (–0.03;0.06)  0.3 (0.1;0.6)       0.9 (0.2;1.5)       0.0 (–0.3;0.2)
CV% (90% CL)          0.1 (0.1;0.2)       1.8 (1.2;3.8)       8.5 (5.6;18.5)      0.6 (0.4;1.4)       2.3 (1.9;3.1)       1.9 (1.5;2.5)       4.4 (3.5;5.9)       4.1 (3.4;5.3)
ICC (90% CL)          1.00 (0.99;1.00)    0.99 (0.92;1.00)    0.92 (0.57;0.98)    0.97 (0.83;1.00)    0.68 (0.42;0.83)    0.92 (0.83;0.96)    0.89 (0.79;0.95)    0.69 (0.48;0.82)

U16 (n = 30)
Pairwise comparisons  19                  19                  20                  19                  28                  28                  28                  25
Mean change           1.0 (0.7;1.3)       1.3 (0.9;1.6)       0.2 (–0.9;1.2)      −0.0 (–0.1;0.1)     −0.01 (–0.02;0.01)  0.4 (0.2;0.5)       1.4 (0.6;2.2)       0.1 (–0.1;0.3)
CV% (90% CL)          0.3 (0.3;0.5)       0.9 (0.7;1.3)       4.0 (3.2;5.6)       0.6 (0.5;0.8)       2.0 (1.6;2.8)       1.1 (0.9;1.4)       4.9 (4.0;6.4)       2.4 (2.0;3.2)
ICC (90% CL)          0.99 (0.98;1.00)    1.00 (0.99;1.00)    0.93 (0.86;0.97)    0.98 (0.96;0.99)    0.82 (0.67;0.90)    0.97 (0.95;0.99)    0.86 (0.74;0.93)    0.81 (0.65;0.90)

U18 (n = 15)
Pairwise comparisons  10                  10                  10                  10                  14                  14                  14                  10
Mean change           0.5 (0.1;0.8)       0.5 (0.3;0.8)       −2.1 (–3.5;0.8)     0.1 (0.0;0.2)       −0.01 (–0.04;0.02)  0.3 (0.0;0.5)       3.2 (2.3;4.1)       0.5 (0.1;1.0)
CV% (90% CL)          0.3 (0.2;0.4)       0.5 (0.4;0.8)       3.3 (2.4;5.6)       0.6 (0.4;0.9)       2.4 (1.8;3.5)       1.1 (0.8;1.6)       3.3 (2.5;4.9)       3.2 (2.4;5.4)
ICC (90% CL)          1.00 (0.99;1.00)    1.00 (0.99;1.00)    0.95 (0.83;0.99)    0.98 (0.94;1.00)    0.57 (0.16;0.82)    0.92 (0.80;0.97)    0.95 (0.87;0.98)    0.72 (0.27;0.91)

All players pooled together (n = 80)
Pairwise comparisons  35                  35                  36                  35                  65                  65                  63                  65
Mean change           1.0 (0.8;1.3)       1.0 (0.9;1.3)       −0.5 (–1.6;0.6)     0.0 (–0.0;0.1)      −0.01 (–0.02;0.00)  0.3 (0.22;0.44)     1.4 (0.9;1.9)       0.1 (0.1;0.3)
CV% (90% CL)          0.4 (0.3;0.5)       1.4 (1.2;1.8)       4.9 (4.1;6.2)       0.6 (0.5;0.8)       2.2 (1.9;2.5)       1.4 (1.2;1.6)       4.5 (3.9;5.3)       3.5 (3.0;4.1)
ICC (90% CL)          1.00 (0.99;1.00)    1.00 (0.99;1.00)    0.93 (0.88;0.96)    0.98 (0.97;0.99)    0.87 (0.80;0.91)    0.98 (0.96;0.98)    0.95 (0.92;0.96)    0.77 (0.67;0.84)

Reproducibility variables (coefficient of variation, CV and intraclass correlation coefficient, ICC) for anthropometric measures, estimated age at peak height velocity (APHV), 10-m sprint time,
maximal sprinting speed, counter movement jump (CMJ) and peak incremental test speed in Under 14 (U14), Under 16 (U16) or Under 18 (U18) young soccer players.
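
The CV values in Table II express the typical error of measurement as a percentage. The sketch below shows one common way to obtain such a value from two trials: the typical error is taken as the standard deviation of the test-retest differences divided by √2, computed on 100 × log-transformed data and back-transformed to a percentage. This is an assumption about how the cited spreadsheet works, not a reproduction of it; the function name `typical_error_cv` and the example values are hypothetical.

```python
import math
from statistics import stdev


def typical_error_cv(trial1, trial2):
    """Typical error expressed as a CV (%) from paired test-retest scores.

    Assumed approach: difference scores of 100 * ln(value), typical error =
    SD(differences) / sqrt(2), back-transformed to a percentage.
    """
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need at least two paired observations")
    # Difference of log-transformed scores, in 100 * ln units.
    diffs = [100.0 * math.log(b / a) for a, b in zip(trial1, trial2)]
    typical_error = stdev(diffs) / math.sqrt(2)
    return 100.0 * (math.exp(typical_error / 100.0) - 1.0)


# Hypothetical 10-m sprint times (s). A uniform 2% shift between trials
# changes the mean but leaves the typical error at (numerically) zero.
first = [1.80, 1.90, 1.85, 2.00]
second = [t * 1.02 for t in first]
print(round(typical_error_cv(first, second), 6))
```

Because the statistic depends only on the spread of the individual differences, a systematic change in the group mean between trials (for example through growth or training) does not inflate it, which is the property the Statistical analysis section relies on.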
Table III. Reliability of anthropometric and physical performance measures over a month in the different maturity groups.

                      Height              Body mass           Sum of 7 skinfolds  APHV                10-m sprint         Maximal sprinting   CMJ                 Peak incremental
                      (cm)                (kg)                (mm)                (years)             (s)                 speed (km · h−1)    (cm)                test speed (km · h−1)

Pre-PHV players (n = 14)
Pairwise comparisons  N/A                 N/A                 N/A                 N/A                 10                  10                  10                  13
Mean change           N/A                 N/A                 N/A                 N/A                 −0.02 (–0.05;0.02)  −0.1 (–0.4;0.2)     0.6 (–0.4;1.6)      0.2 (–0.3;0.8)
CV% (90% CL)          N/A                 N/A                 N/A                 N/A                 2.2 (1.6;3.6)       1.6 (1.2;2.6)       4.6 (3.3;7.7)       5.1 (3.9;7.9)
ICC (90% CL)          N/A                 N/A                 N/A                 N/A                 0.48 (–0.1;0.81)    0.90 (0.70;0.97)    0.88 (0.63;0.96)    0.44 (0.0;0.76)

Circum-PHV players (n = 32)
Pairwise comparisons  16                  16                  16                  16                  26                  26                  26                  31
Mean change           1.5 (0.8;1.3)       1.4 (1.2;1.7)       −1.1 (–2.5;0.2)     0.0 (–0.1;0.1)      −0.01 (–0.03;0.01)  0.5 (0.3;0.6)       1.3 (0.5;2.1)       0.1 (0.1;0.3)
CV% (90% CL)          0.3 (0.3;0.5)       0.9 (0.7;1.3)       3.8 (2.9;5.5)       0.7 (0.5;1.0)       2.2 (1.8;2.9)       1.4 (1.1;1.8)       4.9 (4.0;6.4)       2.8 (2.3;3.6)
ICC (90% CL)          0.99 (0.98;1.00)    1.00 (0.99;1.00)    0.96 (0.91;0.99)    0.97 (0.94;0.99)    0.76 (0.57;0.87)    0.96 (0.92;0.98)    0.86 (0.74;0.93)    0.85 (0.73;0.91)

Post-PHV players (n = 34)
Pairwise comparisons  17                  17                  18                  17                  29                  29                  27                  21
Mean change           0.5 (0.3;1.8)       0.7 (0.3;1.0)       −0.8 (–2.1;0.5)     0.0 (–0.0;0.1)      0.01 (–0.01;0.02)   0.4 (0.2;0.5)       1.8 (1.2;2.4)       0.3 (0.0;0.6)
CV% (90% CL)          0.3 (0.2;0.4)       1.0 (0.8;1.5)       4.7 (3.6;6.6)       0.6 (0.5;0.9)       2.2 (1.8;2.8)       1.2 (1.0;1.5)       4.1 (3.3;5.2)       3.0 (2.4;4.0)
ICC (90% CL)          1.00 (0.99;1.00)    0.99 (0.98;1.00)    0.91 (0.79;0.96)    0.98 (0.95;0.99)    0.77 (0.60;0.87)    0.97 (0.94;0.98)    0.94 (0.89;0.97)    0.73 (0.40;0.86)

Reproducibility variables (coefficient of variation, CV and intraclass correlation coefficient, ICC) for anthropometric measures, estimated age at peak height velocity (APHV), 10-m sprint time,
maximal sprinting speed, counter movement jump (CMJ) and peak incremental test speed in pre- (Pre-PHV), circum- (Circum-PHV) or post- (Post-PHV) estimated peak height velocity young
soccer players. Note that reliability data from Pre-PHV players are not provided due to insufficient sample size (i.e., < 5).
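
The between-group comparisons of the CVs and ICCs in Tables II and III are reported as chances of greater/similar/lower values (e.g., 20/20/60%). The qualitative reading of such a triplet, as described in the Statistical analysis section, can be sketched as follows. The function names are hypothetical, the treatment of values falling exactly on a boundary is an assumption, and reporting the most probable outcome when the effect is clear is one possible reading of "the observed chance".

```python
def describe_chance(pct: float) -> str:
    """Map a percentage chance to the qualitative descriptor used in the paper."""
    if pct < 1:
        return "almost certainly not"
    if pct < 5:
        return "very unlikely"
    if pct < 25:
        return "unlikely"
    if pct < 75:
        return "possible"
    if pct < 95:
        return "likely"
    if pct <= 99:
        return "very likely"
    return "almost certain"


def interpret(greater: float, similar: float, lower: float) -> str:
    """Interpret chances (%) that the true difference is greater/similar/lower."""
    # Unclear when the chances of substantially greater AND lower both exceed 5%.
    if greater > 5 and lower > 5:
        return "unclear"
    # Otherwise describe the most probable outcome (assumed convention).
    label, value = max(
        (("greater", greater), ("similar", similar), ("lower", lower)),
        key=lambda pair: pair[1],
    )
    return f"{describe_chance(value)} {label}"


print(interpret(20, 20, 60))  # CVs, Post-PHV vs. Pre-PHV
print(interpret(81, 10, 9))   # ICCs, Post-PHV vs. Pre-PHV
```

Both comparisons above come out as unclear, in line with the reported results: even an 81% chance of a greater value is not enough when the chance of a lower value still exceeds 5%.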

were unclear: Post-PHV vs. Pre-PHV: ES = 0.76 (–0.6;2.1), 81/10/9%, Post-PHV vs. Circum-PHV: ES = –0.07 (–1.2;1.1), 34/25/42% and Circum-PHV vs. Pre-PHV: ES = 0.79 (–0.5;2.0), 82/10/8%.

Long-term stability

A selection of individual changes in 10-m sprint, maximal sprint speed, CMJ and peak incremental test speed over the 4-year period is illustrated in Figure 1; there were some evident differences in the changes in performance in players presenting similar anthropometric and performance profiles at the start of the follow-up period (i.e., U13 category). Figure 2 (individual ranks) and Table IV (team values) show the stability in the selected anthropometric and performance measures over the 4-year period. The level of stability over the 4-year period was measure-dependent, and was rated as moderate for skinfolds, 10-m sprint, CMJ and maximal sprint speed, high for peak incremental test speed and very high for height, BM and APHV.

Discussion

In this study, we examined for the first time short-term (absolute and relative) reliability of anthropometric and physical performance measures in relation to age and maturation in highly-trained young soccer players, as well as their long-term stability over a 4-year period. The main findings of the present study were: 1) when comparing the different CVs and ICCs obtained for each group, there was no clear age or maturity effect, 2) overall, all CV values for anthropometric and physical performance measures were low (e.g., 0.4% for height, 2.2% for 10-m sprint time and 3.5% for peak incremental test speed) and similar to those previously observed in older/less trained populations, 3) we observed large inter-individual differences in the change in physical performances over the 4-year period in players presenting similar anthropometric and performance profiles at the age of 12, i.e., for the 10 players selected for longitudinal analysis, the ICC varied from 0.66 (0.50;0.80) to 0.96 (0.93;0.98) for 10-m sprint time and BM, respectively, suggesting varying levels of long-term stability as a function of the measure considered.
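
The qualitative ratings of long-term stability follow from the ICC magnitude thresholds given in the Statistical analysis section. A minimal sketch of that mapping is shown below; the function name `icc_magnitude` is hypothetical, and assigning a value that falls exactly on a boundary to the higher category is an assumption, since the source lists overlapping ranges.

```python
def icc_magnitude(icc: float) -> str:
    """Qualitative magnitude of an ICC using the thresholds stated in the paper."""
    if icc > 0.99:
        return "extremely high"
    if icc >= 0.90:
        return "very high"
    if icc >= 0.75:
        return "high"
    if icc >= 0.50:
        return "moderate"
    if icc >= 0.20:
        return "low"
    return "very low"


# Long-term stability ICCs reported in Table IV.
table_iv = {
    "Height": 0.91, "Body mass": 0.96, "Sum of 7 skinfolds": 0.62, "APHV": 0.95,
    "10-m sprint": 0.66, "Maximal sprinting speed": 0.71, "CMJ": 0.66,
    "Peak incremental test speed": 0.83,
}
for measure, icc in table_iv.items():
    print(f"{measure}: {icc:.2f} ({icc_magnitude(icc)})")
```

Applied to the Table IV values, this reproduces the ratings given in the text: moderate for skinfolds, 10-m sprint, CMJ and maximal sprinting speed, high for peak incremental test speed, and very high for height, BM and APHV.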

[Figure 1: four panels plotting 10-m sprint (s), MSS (km · h−1), CMJ (cm) and VVam-Eval (km · h−1) against age (years) for selected players, with each player's profile at academy entry shown in a text box.]

Figure 1. Changes in 10-m sprint time, maximal sprinting speed (MSS), counter movement jump (CMJ) and peak incremental test speed
(VVam-Eval) in selected highly-trained soccer players over a 4-year period. Grey areas represent the time when players entered the academy.
The anthropometric and performance profile of each player at this particular time is detailed in the colored text box (i.e., height, body mass,
age for/from peak height velocity [PHV] and selected performance measures). Note some evident differences in the changes in performance
in players presenting similar anthropometric and performance profiles at the start of the follow up.
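
The VVam-Eval values plotted in Figure 1 are peak incremental test speeds. When the last stage is not completed, the Methods give the speed as S + (t/60 × 0.5), with S the last completed stage speed (km · h−1) and t the time (s) spent in the uncompleted stage. A minimal sketch of that calculation follows; the function and parameter names are hypothetical, and the input check is an added assumption.

```python
def peak_incremental_speed(last_completed_speed_kmh: float,
                           seconds_in_uncompleted_stage: float = 0.0) -> float:
    """Peak incremental test speed (km/h) for the Vam-Eval protocol described above.

    Stages last 60 s and speed increases by 0.5 km/h per stage, so the partial
    stage contributes a proportional share of the 0.5 km/h increment.
    """
    if not 0.0 <= seconds_in_uncompleted_stage < 60.0:
        raise ValueError("time in the uncompleted stage must be in [0, 60) s")
    return last_completed_speed_kmh + (seconds_in_uncompleted_stage / 60.0) * 0.5


# A player who completes the 15.0 km/h stage and stops 30 s into the next one.
print(peak_incremental_speed(15.0, 30.0))  # 15.25
```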

[Figure 2: four panels (10-m sprint, MSS, CMJ, VVam-Eval) showing the average rank (90% CI), on a 0–12 axis, for each of the 10 players.]

Figure 2. Average (90% confidence interval, CI) rank for each outfield player over a 4-year period with respect to 10-m sprint time, maximal
sprinting speed (MSS), counter movement jump (CMJ) and peak incremental test speed (VVam-Eval).

Table IV. Long-term stability of ranking throughout the 4-year period.

                      Height              Body mass           Sum of 7 skinfolds  APHV                10-m sprint         Maximal sprinting   CMJ                 Peak incremental
                                                                                                                          speed                                   test speed

n rank                98/110              98/110              96/110              98/110              95/110              97/110              97/110              98/110
ICC (90% CL)          0.91 (0.85;0.95)    0.96 (0.93;0.98)    0.62 (0.45;0.78)    0.95 (0.91;0.97)    0.66 (0.50;0.80)    0.71 (0.55;0.83)    0.66 (0.50;0.80)    0.83 (0.73;0.91)

Intraclass correlation coefficient (ICC) for the player’s rank for anthropometric measures, estimated age at peak height velocity (APHV),
10-m sprint time, maximal sprinting speed, counter movement jump (CMJ) and peak incremental test speed in selected highly-trained
soccer players over a 4-year period.
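
The "n rank" row in Table IV shows that not every player was ranked at every session (e.g., 95 of 110 possible ranks for the 10-m sprint). For those sessions the Methods rescale ranks to a 10-point scale as rescaled rank = rank × 10/n players tested. A minimal sketch is given below; the function name `rescale_rank` and the validity check are assumptions added for illustration.

```python
def rescale_rank(rank: int, n_tested: int, scale: int = 10) -> float:
    """Rescale a within-session rank to a common 10-point scale.

    rank      position of the player among those tested (1 = first)
    n_tested  number of players actually tested in that session
    """
    if not 1 <= rank <= n_tested:
        raise ValueError("rank must lie between 1 and the number of players tested")
    return rank * scale / n_tested


# With 8 of the 10 players present, the last-ranked player still maps to 10.
print(rescale_rank(8, 8))   # 10.0
print(rescale_rank(4, 8))   # 5.0
```

Rescaling keeps the bottom of the scale comparable between sessions, so a player ranked last among 8 is treated like a player ranked last among 10 when ranks are averaged or entered in the ICC.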

Short-term reliability of anthropometric and physical performance measures in highly-trained young soccer players differing in age and maturation

The TE represents the noise occurring from trial-to-trial, which might confound the assessment of real changes in repeated measures (i.e., when monitoring changes in athletes, absolute reliability) (Hopkins, 2000). In contrast, the ICC reflects the ability of a test to differentiate between individuals (i.e., relative reliability) (Weir, 2005). It is also sample-size dependent and largely affected by the heterogeneity of the between-subject measures (Weir, 2005). It is also worth noting that both the TE and ICC are insensitive to the change in the mean between successive trials, which was particularly relevant for the present study design. Players were tested within a month, and, hence, displayed some slight improvements in performance (likely due to the combined effect of growth and training).

In the present study, the average estimate of APHV (14.3 ± 0.7 years) was close to the range previously described for European boys (13.8–14.2 years (Malina, Bouchard, & Bar-Or, 2004)), which tends to confirm the validity of this estimate in the Middle East population. Our primary finding is that, in contrast with our hypotheses, neither age (Table II) nor maturity status (Table III) had a clear effect on the CV and ICC values for anthropometric and physical performance measures. All players, as a part of the general monitoring system of the academy, had already repeated the tests several times before the experimentation. This may have diminished

variations in technique and voluntary engagement in the younger groups, resulting in comparable
reliability levels across the different age/maturity groups. While this might be
test-dependent, at least 2 to 3 repetitions of the same speed-related test are sometimes
required to observe stabilisation in performance (i.e., to overcome the learning effect)
(Glaister et al., 2009). The relatively high training volume of our players (approximately
14 h a week) might also explain present findings, especially for the pre-PHV players expected
to present an “adolescence awkwardness” and a poorer reliability of performance measures. In
fact, regular and intense training may tend to stabilise locomotor function, and, in turn,
normalise possible differences in reliability between the different age/maturity groups.

Finally, the overall short-term reliability of most of the present measures was in agreement
with data previously reported for adolescents and/or other populations. While care should be
taken when comparing our data with those from the literature because of the different methods
employed to calculate both a CV (i.e., calculated from the standard deviation of difference
vs. from the TE as in the present study) and an ICC (Weir, 2005), the CV for 10-m sprint times
(2.2%) was within the range (0.9–2.1%) observed in young athletes aged 8–18 years (Rumpf,
Cronin, Oliver, & Hughes, 2011), although poorer than previously reported in elite adult rugby
players (≈ 1% (Duthie, Pyne, Ross, Livingstone, & Hooper, 2006)). While the absolute
reliability of maximal sprint speed as assessed in the present study has not been examined
yet, the CV value observed (1.4%) is in line with the values generally reported for 40-m
sprints (Rumpf et al., 2011). The CV for CMJ height (4.5%) was similar to the 5% measured in
elite adult Australian Rules Football players on a force plate (Cormack, Newton, McGuigan, &
Doyle, 2008). Poorer reliability of CMJ was however reported in untrained, 13-year-old
adolescents (13%, (Lloyd, Oliver, Hughes, & Williams, 2009)), but in this latter study a
contact mat was used, which prevents a direct comparison with present data. The CV for the
peak incremental test speed observed in the present study (3.5%) is also consistent with the
data reported for similar incremental running protocols: 3.5% for speed reached at the
University of Montreal Track Test in moderately trained athletes (Leger & Boucher, 1980), 2.5
and 3% for treadmill peak speed in well-trained male distance runners (Saunders, Cox, Hopkins,
& Pyne, 2010) and recreational runners (Harling, Tong, & Mickleborough, 2003), respectively.
With respect to body composition, present CV values are also comparable to the literature on
intra-observer reliability: height (0.4% vs. 0.2–0.4% for a meta-analysis in adults (Ulijaszek
& Kerr, 1999)) and the sum of 7 skinfolds (4.9% vs. ≈ 6–10% in 16-year-old individuals taken
from the general population (Moreno et al., 2003)). We found, however, a greater CV for BM
(1.4%) than reported in the general population (i.e., 0.2–0.5% for a meta-analysis (Ulijaszek
& Kerr, 1999)). This can be related to the fact that BM may vary more on a day-by-day basis in
athletes than in sedentary individuals measured on the same day, probably as a function of the
different training sessions/cycles and the associated changes in hydration status. Finally,
despite the growing interest in the assessment of maturity status in developing players
(Malina, Coelho, Figueiredo, Carling, & Beunen, 2012; Philippaerts et al., 2006), the
reliability of the estimated APHV (Mirwald et al., 2002) was still unknown. Present results
show for the first time that the estimated APHV likely presents a high level of reliability
(CV = 0.6%, ICC = 0.98), which is actually better than that of the individual anthropometric
measures used to derive this estimation. Overall, these data suggest that, in highly-trained
soccer players, the present anthropometric and performance measures show good absolute
reliability and high to very-high relative reliability, irrespective of age and maturation.
Similar thresholds (i.e., ½ of the CV) (Hopkins, Hawley, & Burke, 1999) can therefore be used
to assess meaningful changes during the season in players differing in age/maturation status.

Long-term stability of anthropometric and physical performance measures in highly-trained
young soccer players over a 4-year period

In agreement with our hypothesis, we found large variations in the rank scores of the players
with respect to both anthropometric and physical performance measures over the 4-year period
(Table IV and Figure 2). It is however worth noting that within such a limited group of
players (i.e., n = 10), small changes in ranking are responsible for large changes in ICC
(i.e., moving from the first to second place corresponds to a 10% change). We also observed
large inter-individual differences in the change in physical performances in players
presenting similar anthropometric and performance profiles at the age of 12, both between two
consecutive testing sessions and over the entire 4-year period (Figure 1). For example, in two
young players presenting a similar anthropometric profile and an equivalent maximal sprint
speed at the age of 12.5 years (25.5 km · h⁻¹), we observed a difference of more than
2.5 km · h⁻¹ at the age of 16.5 (Figure 1, upper right panel). These two players belonged to
the same team, reported high attendance at training and therefore had, with the exception of
short (less than a week)
training interruptions due to minor injuries (neither sustained a major injury), and the
obvious playing position-related differences in training load, a likely similar training load
over the 4 years. It is therefore difficult to clearly explain these individual responses to
training, but they were likely associated with different genetic backgrounds (Bouchard &
Rankinen, 2001; Vollaard et al., 2009) and changes in body dimensions, which show large
inter-individual differences in their rate of development during growth (Malina et al., 2004).
Anecdotally, the player that became faster put on more (muscle) mass than the other, i.e.,
+3 kg of BM, which is known to affect speed-related locomotor performance in this specific
population (Mendez-Villanueva et al., 2010). It is however worth noting that due to the
limited sample size for the longitudinal analysis, the uncertainty of some of the outcomes is
substantial (i.e., large 90% CI for ICCs); present results should therefore be interpreted
with care. Further longitudinal studies on a larger sample size using a more powerful
modelling approach (Maia et al., 2003), together with a better description of individual
training contents, are therefore warranted to draw definitive conclusions.

In the present study, while more than 30 players trained/played in the academy during the
4-year period of interest, we chose to restrict our analysis to the players having
successfully participated in at least 90% of the total testing sessions to assess the
continuous changes in their respective ranking (Table IV and Figure 2); this resulted in a
limited but high-quality sample size for final data analysis. Except for the exemplary study
from Philippaerts and colleagues (Philippaerts et al., 2006), longitudinal studies including a
larger sample size have generally been conducted over shorter time periods and/or on less
trained populations (e.g., Abbott & Collins, 2002; Beunen et al., 1992; Lefevre, Beunen,
Steens, Claessens, & Renson, 1990; Maia et al., 2001; Maia et al., 2003; Williams et al.,
2011).

The fact that the fittest U13 players might not remain the fittest when they reach the U17 age
group is consistent with the poor long-term stability of the measures observed in a general
sporting population over a year, where correlation coefficients below 0.70 were reported
between the rank scores (pre vs. post period of interest) (Abbott & Collins, 2002). In
contrast, Beunen and collaborators showed in different studies in the general population that
the stability of physical fitness was moderate

selection process and methods of data analysis might explain the differences observed. It is
also worth noting that the precision needed to rank individuals is much higher for selection
in soccer players than in the general population, since clubs/academies will tend to select a
couple of players among thousands of candidates, with the well-known associated economic
stakes. Therefore, present results suggest that the predictive value of physical testing in
such young soccer players (i.e., U13) may be limited, especially for speed (i.e., 10-m sprint
time) and explosive strength (i.e., CMJ) measures (ICCs ≤ 0.66). In contrast, height, BM and,
to a lesser extent, aerobic power (as inferred from peak incremental test speed) are likely to
be the more stable measures. While the exact reasons for the observed differences in the
long-term stability of these measures could not be examined in the present study, our data
question the interest of testing some physical performances in a talent identification
perspective at these early ages (Abbott & Collins, 2002; Buchheit, 2011).

Conclusion

To conclude, in highly-trained developing soccer players, the short-term reliability of
anthropometry and performance measures is unlikely to be affected by age and maturation.
Similar thresholds (i.e., ½ of the CV (Hopkins et al., 1999)) can therefore be used to assess
meaningful changes during the season in players differing in age/maturation status. However,
the relative ranking of each player within a team can vary considerably, so that the changes
in anthropometric and physical performance measures are unlikely to be predictable throughout
adolescence. In players presenting similar anthropometric and performance profiles at the age
of 12, very large inter-individual differences can exist in the change in their physical
performances over 4 years. This poor long-term stability partly limits and questions the
interest of testing some physical performance measures in such young players (i.e., U13) in a
talent identification perspective. Further longitudinal studies on larger sample sizes are
still warranted to define the optimal periods of testing throughout growth and development
that may be associated with the greatest prediction of future anthropometric and performance
profiles.
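
As a rough illustration of how the ½ CV threshold discussed in this section might be applied in practice, the following Python sketch derives a typical error from two test sessions, expresses it as a CV, and flags a change as meaningful when it exceeds half of that CV. The function names and the sprint times are hypothetical. The CV is computed here from raw values relative to the grand mean, which is a simplifying assumption and not necessarily the exact procedure used in the present study.

```python
import math
import statistics


def typical_error_cv(trial1, trial2):
    """Typical error of measurement between two trials, as a CV (%).

    Assumption: TE = SD of the between-trial differences / sqrt(2),
    expressed relative to the grand mean of both trials.
    """
    diffs = [b - a for a, b in zip(trial1, trial2)]
    typical_error = statistics.stdev(diffs) / math.sqrt(2)
    grand_mean = statistics.mean(list(trial1) + list(trial2))
    return 100.0 * typical_error / grand_mean


def is_meaningful_change(before, after, cv_percent):
    """True if the % change between two tests exceeds half the CV."""
    percent_change = 100.0 * (after - before) / before
    return abs(percent_change) > 0.5 * cv_percent


# Hypothetical 10-m sprint times (s) for five players over two sessions.
session1 = [1.80, 1.85, 1.90, 1.78, 1.95]
session2 = [1.82, 1.83, 1.93, 1.80, 1.92]
cv = typical_error_cv(session1, session2)
print(f"CV = {cv:.1f}%")

# With the 2.2% CV reported for 10-m sprint time, the threshold is 1.1%.
print(is_meaningful_change(1.85, 1.80, 2.2))  # change of about -2.7%
print(is_meaningful_change(1.85, 1.84, 2.2))  # change of about -0.5%
```

Because the threshold depends only on the CV of the test, and the CVs did not clearly differ between age or maturity groups, the same check could in principle be reused across squads, provided the CV comes from a comparable test-retest design.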
