Reading Research Mass
Reading and understanding the full text of a research study without formal training
can be a daunting task. We’ve all been there too. We can each remember a time
not too long ago (Eric Helms was still in his early 50s then, Greg Nuckols was 11
years old, and Mike Zourdos still had long hair) when the title of a study caught our
eye and we eagerly searched out the full text. However, upon locating the full text,
we discovered that deciphering the scientific jargon was overwhelming. Reading
the full text of a journal article and discovering unfamiliar terms (e.g. ANOVA, p-values) along with a foreign writing style is frustrating when all you want to
know is "what can I take from this to increase my bench press?" Well, we’re here to
help. In addition to our monthly reviews, we want you to enhance your own skill set
to decipher research.
READING AN INTRODUCTION
The overarching goal of an introduction is to build a clear rationale
based upon previous scientific data and then present the purpose of the study and
the hypotheses to the reader. To accomplish this, the authors of the paper start
with the most basic concepts, which apply to the specific study, then establish a
problem and thus, the need for the present study.
An introduction should have a clear flow, and the concepts must logically connect.
If, as the reader, you look up and think, "wait, how did we get here," the introduction
may very well have a hole in it. Essentially, the rationale must be clear. In science,
research should not try to go from point "A" to point "C." Rather, data connects
from "A" to "B" because that is how a rationale is built. Even if we are already past
point "B" in a practical sense, the research still needs to investigate the
fundamental underlying theory. This is important as sometimes the assumptions
underlying practice are incorrect, and practice could be improved by investigating
the underlying theory. With that said, in reality, practical experience may very well
influence the construction of a research question in applied science. Specifically,
changes in practice occur faster than changes in science as anecdotal information
requires less time to accumulate than scientific evidence. Thus, practice often
informs research to test the applications that already are in place under more
rigorous control. Often, real-world observations drive research questions. For example, if you
see a positive training adaptation in response to a programming change, then you
might examine the literature to see if there is a logical case to be made to
investigate why that programming change might be effective.
In the most basic sense, to interpret an introduction simply try to “talk it out” (you
can do this verbally by yourself or as a writing exercise) as if you were explaining it
to someone with absolutely no knowledge of the concept. For example, if you are
reading a study investigating the effect of different repetition ranges with equated
volume on hypertrophy, you might talk it out as the following:
1. Moderate- to high-repetition sets are often utilized for high volume and thus are
recommended for hypertrophy training.
2. However, it is not known if high repetitions are better than lower repetitions for
hypertrophy for any mechanistic reason.
3. Therefore, the purpose of this study is to examine high versus lower repetitions with
equated volume for muscle hypertrophy.
4. We hypothesize that hypertrophy adaptations will be the same due to equal volume;
however, high repetitions will be more time efficient to complete.
That example is a pretty basic concept, but the flow is logical. If the scientific
writing is difficult to understand, simply talk out the rationale as you might explain
it to someone, then refer back to the study and the scientific writing should now be
easier to understand. Ultimately, as a reader, you should be able to easily find a
logical flow and rationale for the study. Additionally, often (although not always)
the best introductions accomplish all of the above in a concise manner
(i.e. 2 pages or less in a Word document). As you read enough papers, you'll be
able to differentiate the good introductions from the poor ones.
READING THE METHODS
Unlike an introduction, a methods section will have subsections. The first two
subsections will be titled subjects/participants and experimental protocol (note: the
titles of these sections will vary slightly by journal), and the final subsection will
be statistical analyses. In between the initial and final subsections, the remaining
subsections will be study-specific, each section explaining in detail the actual
training program and testing methods employed. Sometimes there are tertiary
sections within a subsection.
Let’s use our example from the introduction and devise methods subsections:
METHODS
Subjects
Experimental Protocol
Testing Protocols
o One-Repetition Maximum Testing
o Wilks Coefficient
o Muscle Thickness
o Muscular Endurance
o Body Fat Percentage
Training Protocol
Dietary Log
Training History Questionnaire
Statistical Analyses
The above example is similar to a previous full text, which can be found here.
Above, you can see all of the testing procedures in addition to the initial standard
sub-sections for subjects and the experimental protocol. It is important that a
study explicitly describes or cites the testing procedures so that it is reproducible.
In the experimental protocol (or similarly titled section), you’ll find the study design,
where it is stated if the study has multiple groups or conditions. If a study has
"groups," it is comparing different individuals divided into multiple groups. Subjects
are usually randomly assigned to these groups or, often, counterbalanced. For example, if
training program A is compared to program B for squat strength over 10 weeks, the
subjects will be counterbalanced so that there is no significant difference (more on
"significant" difference and statistics in just a bit) between groups for strength at
baseline (start of the study). One thing to look for is that ideally, groups should also
be counterbalanced by relative strength at baseline in addition to absolute
strength. On the other hand, if a study has "conditions," then it is a crossover
design, where subjects are compared to themselves under different conditions. For
example, an individual takes 300mg of caffeine immediately prior to performing a
muscular endurance test with 60% of 1RM, then 72 hours later the same individual
ingests a placebo and repeats the protocol.
In studies that employ a training program over multiple weeks (usually a mesocycle
of about 8 weeks), details are commonly left out. We imagine you have read a
study before, as have we, which stated something to the effect of: "Subjects trained
3 times a week for 8 weeks. Subjects completed 3-4 sets of 8-12 repetitions between 60-
80% of 1RM each training session." A description like this is, unfortunately, common
and leaves much to be desired. Questions remain such as: How was load
progressed? How many subjects did 12 reps and how many did 8 reps? How was it
decided when to do 4 sets and when to do 3 sets? Did some subjects do more reps
on multi-joint exercises and some more on single-joint exercises? Obviously, all of
those factors could skew the results. We would again point you to the study linked
above as an example of a clear methods section, which describes the details of the
training program, including the progression model, so that the protocol is clear to
the reader.
Statistics
Perhaps the most daunting task of reading a scientific manuscript without formal
training is tackling the statistical analysis section, which is always located at the end
of the methods section just before the results. To decrease the barriers of
understanding a statistical analysis section, let’s cover the basic terms and statistics
you’ll see in applied exercise science research. The below terms, definitions, and
descriptions are far from comprehensive, but they are the most common and will
provide a baseline knowledge for statistics reading.
While you may be familiar with this, the mean (or the average) is the sum of all
values divided by the total number of values. You may also be familiar with the
term standard deviation, which is the deviation or variation from the mean. A small
or low standard deviation represents that variation is low and the subjects’
responses or descriptive statistics were homogenous. However, a high standard
deviation demonstrates a large variation in response. Therefore, when a high
standard deviation exists (in response to an intervention), it is useful to see
individual responses and not just means.
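To make these two statistics concrete, here is a quick Python sketch using invented numbers (six hypothetical 1RM changes):

```python
import statistics

# Hypothetical 1RM changes (kg) for six subjects after a training block
changes = [1, 3, 5, 7, 9, 11]

mean_change = statistics.mean(changes)  # sum of values / number of values
sd_change = statistics.stdev(changes)   # sample standard deviation

print(f"mean = {mean_change:.2f} kg, SD = {sd_change:.2f} kg")
# A large SD relative to the mean suggests a heterogeneous response,
# so inspecting the individual data points would be informative.
```

If every subject had improved by exactly 6 kg, the SD would be zero; the spread here is what the SD quantifies.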
P-value
In applied exercise science, you will almost always see a statement that the level of
significance is set at p≤0.05. Now, the "default" position is the null hypothesis,
which states that any difference between groups or conditions is due to sampling
or random error. A p-value, generated by the statistical tests utilized, represents
the probability of observing a difference at least as large as the one found if the
null hypothesis were true (i.e. if only sampling or random error were at work).
Thus, if the resultant p-value of the particular statistical model (discussed below) is
less than or equal to 0.05, the findings are said to be "statistically significant."
Specifically, this means that if there were truly no effect, a difference this large
would occur less than 5% of the time; thus, the null hypothesis can be reasonably
rejected. In other words, if a p-value is 0.01, random error alone would produce
such a difference only 1% of the time. Also, if a
p-value falls just above the threshold (roughly between 0.05 and 0.10), this is
commonly highlighted and stated to
be "approaching significance," meaning, despite not reaching the threshold for
significance, the authors believe there is likely a "meaningful difference," which will
be discussed more later. Note that p-values are often misinterpreted to simply
represent the likelihood that the study’s hypothesis is true (i.e. if there’s only a 5%
probability that the observed difference would have occurred by chance, there
must be a 95% chance that there is really a meaningful difference).
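To make the logic of a p-value concrete, here is a small Python sketch using a permutation test. This is not one of the tests used in the studies discussed here, and the data is invented, but the interpretation of the resulting p-value is identical: shuffle the group labels many times and count how often random error alone produces a difference as large as the one observed.

```python
import random

# Invented data: squat reps for two hypothetical groups
group_a = [10, 11, 12, 13, 14]
group_b = [20, 21, 22, 23, 24]

observed = abs(sum(group_b) / len(group_b) - sum(group_a) / len(group_a))

# Under the null hypothesis, the group labels are arbitrary, so shuffle
# them repeatedly and see how often a difference this large appears.
random.seed(42)
pooled = group_a + group_b
n_extreme = 0
n_resamples = 5000
for _ in range(n_resamples):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:5]) / 5 - sum(pooled[5:]) / 5)
    if diff >= observed:
        n_extreme += 1

p_value = n_extreme / n_resamples  # proportion of shuffles at least as extreme
print(f"p = {p_value:.4f}")
```

Because the two invented groups barely overlap, very few random shuffles reproduce the observed difference, so the p-value comes out well below 0.05.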
R-value
An r-value, typically the Pearson correlation coefficient, quantifies the strength and
direction of the linear relationship between two variables. It ranges from -1 (a perfect
negative relationship) through 0 (no linear relationship) to +1 (a perfect positive
relationship).
Magnitude of an R-value
The absolute value of r indicates the magnitude of the relationship. Commonly used
(though somewhat arbitrary) benchmarks treat an r around 0.1 as small, around 0.3 as
moderate, and 0.5 or above as large.
R-squared or r2
An r2 will sometimes be reported with regression analysis as well. This value will
provide the amount of variance in the dependent variable that can be predicted by
the independent variable. For example, if there is an r2 = 0.50 from a regression
analysis using change in muscle growth as the independent variable and change in
maximal strength as the dependent variable, then the r2 value indicates 50% of the
change in strength is explained by the change in hypertrophy.
T-test
A t-test is used only to examine whether a difference exists between two means (i.e.
two groups or two conditions). There are two types of t-tests to discuss: a student's
(independent) t-test and a paired t-test.
Student’s t-test: In applied exercise science, a student’s t-test is used when the
individuals are unrelated. For example, if a study is testing program A versus
program B over 8 weeks, the subjects should be counterbalanced (as discussed
above) at baseline. Thus, when subjects are allocated to groups, a student’s t-test
will be used to ensure those groups do not have statistically different strength
levels to start, meaning the goal is for p>0.05 in this case. In reality, researchers
should aim to have strength levels as close as possible between groups to start, so
a p-value much higher than 0.05 is desirable.
Paired t-test: A paired t-test is used to compare two conditions when the individuals
are the same, such as in a crossover design. For example, if we used the
hypothetical crossover study on the effects of caffeine on muscular endurance
from the methods portion of this document (caffeine prior to testing, then 72 hours
later the same test after no caffeine), a paired t-test would be used to examine if
differences exist between conditions since the subjects are the same.
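For the curious, both t statistics can be computed by hand. The sketch below uses only Python's standard library and invented data; in practice, a stats package (e.g. scipy.stats.ttest_ind and ttest_rel) would also return the p-value, which requires the t-distribution.

```python
import statistics
from math import sqrt

def students_t(a, b):
    """Independent (student's) t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled variance: each group's variance weighted by (n - 1)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / sqrt(sp2 * (1/na + 1/nb))

def paired_t(pre, post):
    """Paired t statistic, computed from the individual difference scores."""
    diffs = [q - p for p, q in zip(pre, post)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(len(diffs)))

# Baseline squat 1RMs (kg) for two counterbalanced groups: here we *want*
# a t near zero (p > 0.05), confirming the groups start out similar.
print(students_t([100, 110, 120], [101, 109, 121]))

# Crossover example: reps with placebo (pre) vs. reps with caffeine (post)
# for the same three subjects; a large t suggests a real condition effect.
print(paired_t(pre=[10, 12, 14], post=[12, 15, 16]))
```

Note how the paired test works on each subject's own difference score, which is exactly why it suits crossover designs.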
Repeated Measures ANOVA
Quite commonly, a repeated measures ANOVA will appear in the statistical analyses
section of the paper you are reading. This is used to assess differences between
two or more groups at multiple time points. For example, if training program A is
compared to training program B over 8 weeks, and bench press 1RM was tested
pre- and post-study, then a 2 (group) x 2 (time) repeated measures ANOVA would
be used to assess changes and differences between groups for bench press 1RM.
Now, let’s say that a mid-point measurement was also conducted for bench press
1RM. In that case, there is an additional time point so the ANOVA would now be a 2
(group) x 3 (time) repeated measures test. However, if we went back to only 2
time points (pre and post), but added a group (now comparing programs A, B and
C), it would be… you guessed it, a 3 (group) x 2 (time) repeated measures ANOVA. See? It's
not hard. Remember, in most instances, names are very descriptive: In this case,
we have "repeated measures," thus, the measures are being repeated.
Post-Hoc Test
The statistics section may also state that a "post-hoc" test was used for "pairwise
comparisons" or "multiple comparison" purposes. In short, this means that there is
significance somewhere as detected by the original ANOVA, but the researcher
doesn’t know "where" the significance is yet. So, a post-hoc is used to determine if
the difference is pre- to post-study in either group (a change over time within a
group) or if there is a group or interaction difference (one group changing more
than the other) being detected.
One-Way ANOVA
A one-way ANOVA compares the means of three or more groups or conditions on a single
factor (e.g. baseline strength across three training groups). Like a t-test, it produces
a p-value; unlike a t-test, a significant result only tells you a difference exists
somewhere, so a post-hoc test is needed to locate it.
Effect Size
If you have perused a scientific paper, you are most likely familiar with the terms t-
test and ANOVA, but maybe less familiar with the term effect size (ES); fortunately,
ES is becoming much more common and almost standard to report in exercise
science research these days. An ES will quantify the magnitude of difference
between groups and conditions or is simply used to examine the magnitude of
change in one group from pre- to post-study. This is quite useful to help determine
a meaningful difference if the p-value does not reach statistical significance.
Conversely, the ES might reveal that there is not much of a meaningful difference
when the p-value is indeed <0.05; thus, an ES is useful for both of those reasons.
1) Between Conditions ES
Again, let’s use the caffeine example for max repetitions on the squat at 60% 1RM,
with and without caffeine, 72 hours apart. We would first run a paired t-test, then
afterward, we would also calculate an ES, which would be calculated as: ES = (Mean
2 – Mean 1) / Pooled Standard Deviation. Mean 2 in this case is the caffeine
condition because it would be hypothesized that caffeine ingestion would result in
more squat repetitions. Thus, if that is correct, the ES would be positive. If in fact,
the placebo or control (i.e. no caffeine) condition had better performance, the
resulting effect size would be negative, showing that the results were the opposite
of the hypothesis. Finally, the pooled standard deviation is calculated by averaging
the two variances (the squared standard deviations), weighting each by its sample size,
and taking the square root of the result. Therefore, when calculating the pooled SD, the
condition or group with the larger sample size receives more weight in the calculation.
Now, if a study doesn’t report a between condition ES, the good news is you can
usually do it yourself. In this instance of an acute study crossover design, you
indeed can calculate it, so let’s do so. For our mock calculation, we’ll make up
numbers for each condition. Placebo = 24 ± 9 reps, and caffeine condition = 28 ± 7
reps. Note that there are no decimals above; repetitions are recorded as whole
numbers, and the mock means and SDs are rounded to whole numbers for simplicity.
We would now do: 28 – 24 (our hypothesis is in favor of the caffeine condition, so
that number comes first) = 4. Then, divide 4 by the pooled standard deviation of
roughly 8 (pooling the squared SDs gives √((9² + 7²) / 2) ≈ 8.06, very close to the
simple average of 9 and 7), which gives us an ES of approximately 0.50. The
good news is you don’t have to do this by hand, just plug in the means and
standard deviations at this link. https://www.socscistatistics.com/
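Here is that same calculation as a short Python sketch. It pools the squared SDs (as most online calculators do, assuming equal sample sizes), which for these mock numbers gives essentially the same ≈0.50:

```python
from math import sqrt

def cohens_d(mean_1, mean_2, sd_1, sd_2):
    """Effect size with an equal-n pooled SD: (mean2 - mean1) / pooled SD."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_2 - mean_1) / pooled_sd

# Mock numbers from the text: placebo = 24 +/- 9 reps, caffeine = 28 +/- 7 reps
es = cohens_d(mean_1=24, mean_2=28, sd_1=9, sd_2=7)
print(f"ES = {es:.2f}")
```

With unequal sample sizes, each SD's squared term would instead be weighted by its group's (n - 1) before averaging.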
2) Within Group ES
Think back to the 8-week program A versus program B comparison with two groups
in which we ran the 2 x 2 repeated measures ANOVA. In addition to that, we would
also like to examine the magnitude of change in each individual group over time
from pre- to post- testing. To do this, we can calculate a within group effect size
using the same formula as above: ES = (Mean 2 – Mean 1) / Pooled Standard
Deviation. This time, Mean 2 is the post-study bench press 1RM mean in group A,
and Mean 1 is the pre-study bench press 1RM mean in group A. Then we would do
that again, but for group B.
This equation will simply give us the magnitude of change in each individual group;
it shouldn’t be used to compare groups (unfortunately, it’s often used to make
between-group comparisons, but that’s not how within group ESs should be used).
The within group effect size is something you can again calculate yourself if a study
does not. So, let’s again make up some numbers in terms of means and SDs for a
mock calculation: Group A has a pre-study bench press 1RM of 102.75 ± 10.57kg
and a post-study 1RM of 115.50 ± 14.78. Following the equation above, we would
simply find the difference of pre- and post-training means for the group, so group A
would be: 115.50 – 102.75 = 12.75, then divide that by the pooled SD (pooling the
squared SDs gives √((10.57² + 14.78²) / 2) ≈ 12.85). Using the link just above in
example 1, we would get an ES of 0.99.
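The same calculation in Python, using the mock group A numbers (pooling the squared SDs, as the linked calculator does, reproduces the 0.99):

```python
from math import sqrt

# Mock numbers from the text: group A bench press 1RM,
# pre = 102.75 +/- 10.57 kg, post = 115.50 +/- 14.78 kg
pre_mean, pre_sd = 102.75, 10.57
post_mean, post_sd = 115.50, 14.78

pooled_sd = sqrt((pre_sd ** 2 + post_sd ** 2) / 2)  # ~12.85
es = (post_mean - pre_mean) / pooled_sd

print(f"within-group ES = {es:.2f}")
```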
3) Between Group ES
Unlike the within group, the between group ES compares the magnitude of change
between group A and group B. To accomplish this, the means of the change
scores are used rather than pre- or post-study means or simply the means of each
condition in example 1. However, unlike the two examples above, the between
group effect size most likely cannot be calculated from simply reading a
manuscript. This is because you need the pooled SD of the mean change, which
cannot be found by simply having pre- and post-study means. The SD of the mean
change is calculated from each of the individual change scores themselves; thus,
you would need individual subject data for this calculation. The second-best way to
calculate between group ES would be to use the mean change scores as described
and then divide the difference by the pooled standard deviations of both groups’
pre-study means, which could be calculated on your own.
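If you do have access to individual change scores (e.g. from a paper's supplementary data), the between group ES can be computed directly. The sketch below uses invented subject-level data purely for illustration:

```python
import statistics
from math import sqrt

# Hypothetical individual 1RM change scores (kg); a proper between-group
# ES needs this subject-level data, which papers don't always report.
changes_a = [10, 12, 14, 12, 12]  # program A
changes_b = [5, 7, 6, 8, 4]       # program B

# Pool the variances of the *change scores* (equal sample sizes here)
pooled_sd = sqrt((statistics.variance(changes_a)
                  + statistics.variance(changes_b)) / 2)

es = (statistics.mean(changes_a) - statistics.mean(changes_b)) / pooled_sd
print(f"between-group ES = {es:.2f}")
```

Note that the SDs entering the pooled term here describe the spread of the change scores, not the pre- or post-study means, which is exactly why this ES usually cannot be reconstructed from a manuscript's summary statistics alone.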
Percentage Change
A percentage change expresses the pre- to post-study change relative to the starting
value: ((post – pre) / pre) × 100. For example, a bench press 1RM that rises from
100kg to 110kg is a 10% increase. Like an effect size, a percentage change helps
convey the practical magnitude of a change alongside the p-value.
Main Time Effect
A main time effect is the overall effect over time. In other words, let's take
our program A versus program B example for bench press 1RM over 8
weeks. If the results state, “There was a significant main time effect (p=0.02)
for bench press 1RM”, that means that across all subjects (both groups
included), bench press increased from pre- to post-study. Again, names are
descriptive, so just break down the term and you’ll figure it out. Main
(overall), time (pre- to post-study), effect (something changed). Then, a post
hoc test tells us if there was a difference between groups for this overall
change. So, main effects are always presented first, then interactions
second.
Group Effect
Using our training program example, a group effect tells you whether bench press
1RM differs between the groups overall (i.e. collapsed across time points). This is
distinct from a group x time interaction effect, which tells you whether the groups
changed at different rates over time.
READING THE RESULTS
After the main statistics are presented, you may see effect sizes and percentage
changes. Now, those statistics are not always applicable, so if you don’t see them,
that doesn’t mean the results are inherently poor. However, for many studies
applicable to the readers of MASS, effect sizes will often be appropriate, and the
growing trend in applied exercise science research is to include effect sizes as part
of the standard statistical package. Since we now understand common statistics
and how they are presented, it is our hope that by looking at the totality of
statistics, you'll be able to decipher a results section.
It’s important to understand – especially in our field where sample sizes are small
(making null hypothesis testing more prone to error) – when a meaningful
difference is present, even in the absence of a statistically significant difference.
However, thinking even further, an ES >0.20 in the absence of a significant p-value
does not always represent a meaningful change. It is quite possible in this scenario
that with a larger sample size, this ES would not remain. Therefore, an ES >0.20
(and findings in general) should be viewed with some skepticism in small sample
sizes, especially in the absence of a significant p-value. This is not to say that a
meaningful difference doesn’t exist in this case, but rather the totality of statistical
analyses should be analyzed in each study when determining whether a meaningful
difference exists. Finally, firm conclusions should only be made once a strong
majority of studies show similar outcomes.
Figures are often used when data is more easily understood visually. For example, if
the authors want to display volume over the course of a training study, they might
make a figure that plots tonnage over time, as it would easily show volume in
different weeks relative to other weeks. Figures can be line graphs or bar graphs (or
other visual displays), and they typically have error bars (representing standard
deviation) but don’t list the exact values.
When the authors want to present a lot of data, including means, standard
deviations, effect sizes, and percentage changes for many groups, it is often more
efficient to do so with a table. The text will state the effects that occur, and then will
refer to the table. For example, "Both program A and program B increased bench
press 1RM from pre- to post-study (p=0.02), however, there was a significant group
effect (p<0.001) in favor of program A. Specific values for bench press 1RM are
displayed in Table 1."
An important note: Authors typically don’t "double report." Meaning, data in a table
or figure won’t be written in the body of the text and vice versa. Thus, if you think
you are missing something in the text, make sure to check the tables and figures.
So, in a results section, expect the following:
Results of statistical analyses reported as main effects first, then group and
interaction effects second.
Tables and figures to provide visuals.
Hopefully, exact p-values and ESs.
READING A DISCUSSION
Most of the time, the final main heading of a scientific manuscript is the discussion.
The discussion is where authors interpret their findings and compare and contrast
the findings to other research. Additionally, in our field, there is special importance
on presenting practical applications of the findings. You should also find limitations
that the authors themselves point out (every study has limitations) and suggestions
for future research to move the ideas forward. A good discussion also presents the
findings cautiously.
Opening Paragraph
The opening paragraph of a discussion typically restates the study's purpose and
briefly summarizes the main findings before the detailed interpretation begins.
Comparing and Contrasting
This is most of the discussion. The authors will compare and contrast their findings
with other research. If findings are in disagreement with other research, then
explanations should be given. These commonly include: a different
methodology (e.g. a different dosage of a supplement), a different subject population
(e.g. trained vs. untrained), a different study length (e.g. 8 vs. 12 weeks), etc.
Limitations
Every study has limitations, and there’s nothing wrong with that. A good discussion
acknowledges these limitations. Sometimes the authors will provide counterpoints
to help explain the novelty of the study to offset the limitations; but nonetheless,
the limitations exist. If the authors don’t point out the limitations when they submit
a study, often the reviewers will ensure that they do before allowing it to be
published.
The final part of a discussion is the conclusion. Its purpose is to briefly restate the
main points and then provide practical applications. In some journals, there is a
separate heading for applications, and in some, there is not. However, in either
case, a good discussion in applied research provides recommendations for the
athlete and coach to utilize the findings to improve their training and performance.
But, you’re lucky: You have MASS for that, too.
Let’s recall our crossover design caffeine example and assume the caffeine
condition produced significantly more reps at 60% of 1RM. An overstatement
would be, "Based upon our findings, we recommend that acute caffeine ingestion can
be utilized in all populations of athletes to improve resistance training performance."
That is a poor statement for a few reasons:
1. This study did not employ all populations of athletes; thus, the authors should
only apply it to the population presently used. That doesn’t mean that the
intervention wouldn’t be beneficial for other populations. It just means that the
authors cannot say that based upon the present dataset.
EXTRA POINTS
Before we conclude, let's go over just a few more points. In any manuscript, usually
on the first page, you’ll find the corresponding author with his/her email. If you
have any questions about the manuscript, don’t hesitate to contact the
corresponding author. If you’re respectful and patient, we bet you get a response.
You can find papers by searching on PubMed. Not all journals in the field are
indexed on PubMed, but most of the high-quality ones are. Here is a pretty
comprehensive journal list, along with each journal's associated impact factor (a
rough guide to its overall quality). If you cannot find a journal on PubMed, try
Google Scholar.
When conducting a PubMed search, after you type in your search term and hit
return/enter, you will notice tabs on the left and on top of the page where you can
refine your search. You can select "most recent" or "best match," and you can also
select "Review," which is found under "Article Types." If a topic is new to you,
selecting "review" is useful to find a meta-analysis, systematic review, or even
narrative review to give you an overarching idea of the results and mechanisms.
Additionally, a recent review will have many papers in the reference list; thus, you
will now have all of those papers at your disposal.
FINAL WORDS
We have various goals with MASS. One is obvious: to disseminate the most
important and recent information related to strength sport to you in an easy-to-
understand format. However, an additional goal is to teach and to help improve
the ability of our readers to interpret scientific research to enhance their own
knowledge and training. It is our hope that this document will help reading the full
text of a study become a less daunting task, allowing you to be confident in your
ability to read and understand a full manuscript. Besides, what could be a better
Friday or Saturday night than looking for science on PubMed?