Reporting: by Mohammed Nawaiseh
References
Reporting
● http://evc-cit.info/psych018/Reporting_Statistics.pdf
Discovering statistics
● http://www.discoveringstatistics.com/docs/writinglabreports.pdf
Μ mean
x̄ mean (x bar; the bar above the x denotes the mean)
R range
χ² chi-square (read as "kai")
H0 null hypothesis
H1 alternate hypothesis
t t test
F ANOVA
Mdn median
U Mann-Whitney Test
d Cohen’s d
General Guidelines
● Rounding Numbers
● Statistical Abbreviations
● General
Rounding Numbers
● For numbers greater than 100, report to the nearest whole number (e.g., M = 6254).
● For numbers between 10 and 100, report to one decimal place (e.g., M = 23.4).
● For numbers between 0.10 and 10, report to two decimal places (e.g., M = 4.34, SD = 0.93).
● For numbers less than 0.10, report to three decimal places, or however many digits you need to have a non-zero number
(e.g., M = 0.014, SEM = 0.0004).
● Do not report any decimal places if you are reporting something that can only be a whole number. For example, the number of
participants in a study should be reported as N = 5, not N = 5.0.
● Report exact p-values (not p < .05), even for non-significant results. Round as above, unless SPSS gives a p-value of .000;
then report p < .001. Two-tailed p-values are assumed. If you are reporting a one-tailed p-value, you must say so.
● Omit the leading zero from p-values, correlation coefficients (r), partial eta-squared (ηp²), and other numbers that cannot ever
be greater than 1.0 (e.g., p = .043, not p = 0.043).
● Please pay attention to issues of italics and spacing. APA style is very precise about these.
● Also, with the exception of some p values, most statistics should be rounded to two decimal places.
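The magnitude-based rounding rules above can be sketched as a small helper. This is only an illustrative sketch: the function name `format_stat` is hypothetical, and the handling of values that fall exactly on a boundary (10 or 100) is an assumption, since the rules do not say which bracket they belong to.

```python
def format_stat(value: float) -> str:
    """Format a descriptive statistic using the magnitude-based rounding rules.

    Assumption: boundary values (exactly 10 or 100) go to the coarser bracket.
    """
    magnitude = abs(value)
    if magnitude >= 100:
        return f"{value:.0f}"      # nearest whole number
    if magnitude >= 10:
        return f"{value:.1f}"      # one decimal place
    if magnitude >= 0.10:
        return f"{value:.2f}"      # two decimal places
    if magnitude == 0:
        return "0"
    # Below 0.10: start at three decimals, add digits until the result is non-zero.
    decimals = 3
    while round(magnitude, decimals) == 0 and decimals < 12:
        decimals += 1
    return f"{value:.{decimals}f}"


print(format_stat(6254))    # 6254
print(format_stat(23.4))    # 23.4
print(format_stat(0.0004))  # 0.0004
```

Counts such as N are not covered by this helper; they should be passed through as whole numbers.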
Statistical Abbreviations
● Abbreviations using Latin letters, such as mean (M) and standard deviation (SD), should be italicised, while
abbreviations using Greek letters, such as partial eta-squared (ηp²), should not be italicised and can be
written out in full if you cannot use Greek letters.
● There should be a space before and after equal signs.
● The abbreviations should only be used inside of parentheses; spell out the names otherwise.
● Inferential statistics should generally be reported in the style of:
○ “statistic(degrees of freedom) = value, p = value, effect size statistic = value”
Statistic Example
Percentages are also most clearly displayed in parentheses with no decimal places:
● Nearly half (49%) of the sample was married.
Chi-Square statistics are reported with degrees of freedom and sample size in parentheses, the Pearson
chi-square value (rounded to two decimal places), and the significance level:
● The percentage of participants who were married did not differ by gender, χ²(1, N = 90) = 0.89, p = .35.
Tables are useful if you find that a paragraph has almost as many numbers as words. If you do use a table, do
not also report the same information in the text. It's either one or the other.
T Tests are reported like chi-squares, but only the degrees of freedom are in parentheses. Following that, report
the t statistic (rounded to two decimal places) and the significance level.
● There was a significant effect for gender, t(54) = 5.43, p < .001, with men receiving higher scores than
women.
● What you put in the wording will differ slightly depending on if you have a one sample t-test, or a t-test for
groups. Examples:
○ One sample: “Younger teens woke up earlier (M = 7:30, SD = 0.45) than teens in general, t(33) = 2.10,
p = .31.”
○ Dependent/Independent samples: “Younger teens indicated a significantly greater preference for video games
(M = 7.45, SD = 2.51) than for books (M = 4.22, SD = 2.23), t(15) = 4.00, p < .001.”
ANOVAs (both one-way and two-way) are reported like the t test, but there are two degrees-of-freedom numbers
to report. First report the between-groups degrees of freedom, then report the within-groups degrees of freedom
(separated by a comma). After that report the F statistic (rounded off to two decimal places) and the significance
level.
● There was a significant main effect for treatment, F(1, 145) = 5.43, p = .02, and a significant interaction, F(2,
145) = 3.24, p = .04.
Correlations are reported with the degrees of freedom (which is N-2) in parentheses and the significance level:
● The two variables were strongly correlated, r(55) = .49, p < .01.
Regression results are often best presented in a table. APA doesn't say much about how to report regression
results in the text, but if you would like to report the regression in the text of your Results section, you should at
least present the unstandardized or standardized slope (beta), whichever is more interpretable given the data,
along with the t-test and the corresponding significance level. (Degrees of freedom for the t-test is N-k-1 where k
equals the number of predictor variables.) It is also customary to report the percentage of variance explained
along with the corresponding F test.
● Social support significantly predicted depression scores, b = -.34, t(225) = 6.53, p < .001. Social support
also explained a significant proportion of variance in depression scores, R² = .12, F(1, 225) = 42.64, p <
.001.
p Values
● Significance levels in journal articles--especially in tables--are often reported as either "p > .05," "p < .05,"
"p < .01," or "p < .001." APA style dictates reporting the exact p value within the text of a manuscript
(unless the p value is less than .001).
● P-values: report the p-value exactly, unless it is less than .001. If less than that amount, the convention is
to report it as: p < .001.
● Probability values referring to significance tests should be reported exactly (i.e., to either two or three
decimal places).
○ Note that p values less than .001 should be reported as p < .001.
● You should not use a leading zero when reporting p values as they cannot exceed 1:
○ p = 0.02 → incorrect, because of the leading zero
○ p < .05 → incorrect, because it should be reported exactly
○ p = .02 → correct
○ p = .0001 → incorrect, because it should be reported as: p < .001.
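The p-value rules above (exact value, no leading zero, and p < .001 for anything smaller) can be expressed as a short formatter. This is a minimal sketch; the name `format_p` and the default of three decimal places are assumptions chosen for illustration.

```python
def format_p(p: float, decimals: int = 3) -> str:
    """Format a p value: exact, no leading zero, 'p < .001' when below .001."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if p < 0.001:
        return "p < .001"
    text = f"{p:.{decimals}f}"
    if text.startswith("0"):
        text = text[1:]            # drop the leading zero: 0.043 -> .043
    return f"p = {text}"


print(format_p(0.043))             # p = .043
print(format_p(0.02, decimals=2))  # p = .02
print(format_p(0.0001))            # p < .001
```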
Confidence intervals
● For CIs, use brackets: 95% CIs [2.47, 2.99], [-5.1, 1.56], and [-3.43, 2.89].
● If you are reporting a list of statistics within parentheses, you do not need to use brackets within the
parentheses. For example (SD = 1.5, CI = -5, 5)
○ Participants were 98 men and 132 women aged 17 to 25 years (men: M = 19.2, SD = 2.32; women: M
= 19.6, SD = 2.54).
Inferential Statistics
● When reporting inferential statistics (e.g., t test, F tests, chi square) include enough information to
allow readers to make sense of the analysis. Also, effect sizes and confidence intervals must be
included where possible (or enough information so that the reader can construct them).
Non-parametric tests
● Do not report means and standard deviations for non-parametric tests.
● Report the median and range in the text or in a table.
● The statistics U and Z should be capitalised and italicised.
● A measure of effect size, r, can be calculated by dividing Z by the square root of N (r = Z / √N).
● Tests
○ Mann-Whitney Test
○ Wilcoxon Signed-ranks Test
○ Sign Test
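The effect size formula r = Z / √N given above is simple enough to check by hand or with a few lines of code. The sketch below uses a hypothetical function name, `effect_size_r`; the sample size of 31 in the usage line is an assumption for illustration and is not stated alongside the non-parametric examples.

```python
import math


def effect_size_r(z: float, n: int) -> float:
    """Effect size for a non-parametric test: r = Z / sqrt(N)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return z / math.sqrt(n)


# Assumed N = 31 for illustration; Z = 4.21 then gives r of about .76.
print(round(effect_size_r(4.21, 31), 2))  # 0.76
```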
Mann-Whitney Test (2 Independent Samples...)
● A Mann-Whitney test indicated that self-rated attractiveness (SRA) was greater for women who were not
using oral contraceptives (Mdn = 5) than for women who were using oral contraceptives (Mdn = 4), U =
67.5, p = .034, r = .38.
Wilcoxon Signed-ranks Test (2 Related Samples...)
● A Wilcoxon Signed-ranks test indicated that femininity was preferred more in female faces (Mdn = 0.85)
than in male faces (Mdn = 0.65), Z = 4.21, p < .001, r = .76.
● In the ranks output, the negative ranks showed that the female-face score exceeded the male-face score in 25 cases, which is why femininity is reported as preferred more in female faces.
Sign Test (2 Related Samples...)
● A sign test indicated that femininity was preferred more in female faces than in male faces, Z = 3.47, p = .001.
T Tests
● Report degrees of freedom in parentheses.
● The statistics t, p and Cohen’s d should be reported and italicised.
● Tests
○ One-sample t-test
○ Paired-samples t-test
○ Independent-samples t-test
One-sample t-test
● A one-sample t-test indicated that femininity preferences were greater than the chance level of 3.5 for female faces (M = 4.50, SD = 0.70), t(30) = 8.01, p
< .001, d = 1.44, but not for male faces (M = 3.46, SD = 0.73), t(30) = -0.32, p = .75, d = 0.057.
● The mean for female faces was greater than 3.5, but the mean for male faces was not.
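As a rough check on the effect size in the one-sample example, Cohen's d can be computed as the difference between the sample mean and the test value, divided by the sample standard deviation. The sketch below uses a hypothetical function name. With the rounded values from the example it gives about 1.43 rather than the reported 1.44, presumably because the reported figure was computed from unrounded data.

```python
def cohens_d_one_sample(mean: float, sd: float, test_value: float) -> float:
    """Cohen's d for a one-sample t-test: (M - test value) / SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean - test_value) / sd


# Female faces: M = 4.50, SD = 0.70, chance level = 3.5
print(round(cohens_d_one_sample(4.50, 0.70, 3.5), 2))  # 1.43
```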
Another example
● The number of masculine faces chosen out of 20 possible was compared to the chance value of 10 using a one-sample t-test.
Masculine faces were chosen more often than chance, t(76) = 4.35, p = .004, d = 0.35.
Paired-samples t-test
Report paired-samples t-tests in the same way as one-sample t-tests
● A paired-samples t-test indicated that scores were significantly higher for the pathogen subscale (M = 26.4, SD = 7.41) than for the sexual subscale (M
= 18.0, SD = 9.49), t(721) = 23.3, p < .001, d = 0.87.
● Scores on the pathogen subscale (M = 26.4, SD = 7.41) were higher than scores on the sexual subscale (M = 18.0, SD = 9.49), t(721) = 23.3, p < .001,
d = 0.87. A one-tailed p-value is reported due to the strong prediction of this effect.
Independent-samples t-test
● Which row of the output to report depends on Levene’s test for equality of variances:
○ p > .05 → not significant → equal variances assumed
○ p < .05 → significant → equal variances not assumed
● If Equal variances assumed → An independent-samples t-test indicated that scores were significantly higher for women (M = 27.0, SD = 7.21) than for
men (M = 24.2, SD = 7.69), t(734) = 4.30, p < .001, d = 0.35.
● If Equal variances not assumed → If Levene’s test for equality of variances is significant, report the statistics from the “Equal variances not assumed” row,
with the altered degrees of freedom rounded to the nearest whole number.
○ Scores on the pathogen subscale were higher for women (M = 27.0, SD = 7.21) than for men (M = 24.2, SD = 7.69), t(340) = 4.18, p < .001, d =
0.35.
○ Levene’s test indicated unequal variances (F = 2.56, p = .109), so degrees of freedom were adjusted from 734 to 340.
● The mean score for women (27.0) is higher than the mean score for men (24.2).
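The "altered degrees of freedom" in the equal-variances-not-assumed row are, in standard practice, the Welch-Satterthwaite approximation. The sketch below shows that calculation under that assumption. The function name is hypothetical, and the group sizes in the usage line are made up for illustration, because the example above reports the SDs but not the number of women and men.

```python
def welch_df(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1 = sd1 ** 2 / n1   # squared standard error of group 1
    v2 = sd2 ** 2 / n2   # squared standard error of group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Hypothetical group sizes (400 and 336); round to the nearest whole number.
print(round(welch_df(7.21, 400, 7.69, 336)))
```

When both groups have the same SD and the same size n, the result reduces to 2(n - 1), the ordinary pooled degrees of freedom.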
Reporting Levene's test
● Levene's test showed that the variances for body fat percentage were not equal, F(2, 77) = 4.58, p = .013.
Or
● Levene's test showed that the variance in mile time for non-smokers was significantly different from that for
smokers, F(315.846) = 102.9, p < .001.
Note
● F(df of the first row, df of the second row)
ANOVA
● ANOVAs have two degrees of freedom to report. Report the between-groups df first and the within-groups df second, separated by a comma and a
space (e.g., F(1, 237) = 3.45).
● The measure of effect size, partial eta-squared (ηp²), may be written out or abbreviated, omits the leading zero and is not italicised.
● Tests
○ One-way ANOVAs
○ 2-way Factorial ANOVAs
○ 3-way ANOVAs and Higher
○ ANCOVA
● F(dfEffect, dfError) = value, p value, effect size
One-way ANOVAs and Post-hocs
● Analysis of variance showed a main effect of self-rated attractiveness (SRA) on preferences for femininity in female faces, F(2, 1279) = 6.15, p = .002,
ηp² = .010.
● Post Hoc analyses using Tukey’s HSD indicated that femininity preferences were lower for participants with low SRA than for participants with average
SRA (p = .014) and high SRA (p = .004), but femininity preferences did not differ significantly between participants with average and high SRA (p = .82).
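If the output gives F and its degrees of freedom but not partial eta-squared, the effect size can be recovered with the standard identity ηp² = (F × dfEffect) / (F × dfEffect + dfError). The sketch below uses a hypothetical function name and reproduces the effect size reported in the one-way example above.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta-squared recovered from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    effect_term = f_value * df_effect
    return effect_term / (effect_term + df_error)


# One-way example: F(2, 1279) = 6.15
print(round(partial_eta_squared(6.15, 2, 1279), 3))  # 0.01, reported as .010
```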
In an ANOVA, report the main effects first, then the interaction effects.
2-way Factorial ANOVAs
● A 3x2 ANOVA with self-rated attractiveness (low, average, high) and oral contraceptive use (true, false) as
between-subjects factors revealed main effects of SRA, F(2, 1276) = 6.11, p = .002, ηp² = .009, and oral
contraceptive use, F(1, 1276) = 4.38, p = .037, ηp² = .003.
● These main effects were not qualified by an interaction between SRA and oral contraceptive use, F(2, 1276)
= 0.43, p = .65, ηp² = .001.
3-way ANOVAs and Higher
● Although some textbooks suggest that you report all main effects and interactions, even if not significant,
this reduces the understandability of the results of a complex design (i.e. 3-way or higher).
● Report all significant effects and all predicted effects, even if not significant. If there are more than two
non-significant effects that are irrelevant to your main hypotheses (e.g. you predicted an interaction among
three factors, but did not predict any main effects or 2- way interactions), you can summarise them as in the
example below.
● A mixed-design ANOVA with sex of face (male, female) as a within-subjects factor and self-rated
attractiveness (low, average, high) and oral contraceptive use (true, false) as between-subjects factors
revealed a main effect of sex of face, F(1, 1276) = 1372, p < .001, ηp² = .52.
● This was qualified by interactions between sex of face and SRA, F(2, 1276) = 6.90, p = .001, ηp² = .011,
and between sex of face and oral contraceptive use, F(1, 1276) = 5.02, p = .025, ηp² = .004.
● The predicted interaction among sex of face, SRA and oral contraceptive use was not significant, F(2, 1276)
= 0.06, p = .94, ηp² < .001.
● All other main effects and interactions were non-significant and irrelevant to our hypotheses, all F ≤ 0.94, p ≥
.39, ηp² ≤ .001.
Violations of Sphericity and Greenhouse-Geisser Corrections
● ANOVAs are not robust to violations of sphericity, but can be easily corrected.
● For each within-subjects factor with more than two levels, check if Mauchly’s test is significant (p < .05).
○ If so, report chi-squared (χ²), degrees of freedom, p and epsilon (ε) as below and report the
Greenhouse-Geisser corrected values for any effects involving this factor (rounded to the appropriate
decimal place).
● Only two levels→ SPSS will report a chi-squared of .000 and no p-value for within subjects factors with only
two levels; corrections are not needed.
● Data were analysed using a mixed-design ANOVA with a within-subjects factor of subscale (pathogen,
sexual, moral) and a between-subject factor of sex (male, female).
● Mauchly’s test indicated that the assumption of sphericity had been violated (χ²(2) = 36.1, p < .001);
therefore, degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.95).
● Main effects of subscale, F(1.91, 1350.8) = 378, p < .001, ηp² = .35, and sex, F(1, 709) = 78.8, p < .001,
ηp² = .10, were qualified by an interaction between subscale and sex, F(1.91, 1351) = 30.4, p < .001,
ηp² = .041.
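The Greenhouse-Geisser correction multiplies both degrees of freedom by ε. The sketch below uses a hypothetical function name and assumes uncorrected degrees of freedom of 2 and 1418 for the subscale effect (three levels, and an error df of 709 for the between-subjects effect of sex). With the rounded ε = 0.95 it gives roughly 1.90 and 1347.1, close to the reported 1.91 and 1350.8, which were presumably computed from the unrounded ε.

```python
def greenhouse_geisser_df(df_effect: float, df_error: float, epsilon: float) -> tuple[float, float]:
    """Apply the Greenhouse-Geisser correction: multiply both df by epsilon."""
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must be in the interval (0, 1]")
    return epsilon * df_effect, epsilon * df_error


# Assumed uncorrected df: 2 for subscale, 2 * 709 = 1418 for its error term.
corrected_effect, corrected_error = greenhouse_geisser_df(2, 1418, 0.95)
print(round(corrected_effect, 2), round(corrected_error, 1))  # 1.9 1347.1
```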
Mauchly's test for the sphericity assumption
● Mauchly's test, χ²(df) = 4.05, p = .54, did not indicate any violation of sphericity.
ANCOVA
● An ANCOVA [between-subjects factor: sex (male, female); covariate: age] revealed no main effects of sex,
F(1, 732) = 2.00, p = .16, ηp² = .003, or age, F(1, 732) = 3.25, p = .072, ηp² = .004, and no interaction
between sex and age, F(1, 732) = 0.016, p = .90, ηp² < .001.
Alternatively
● The predicted main effect of sex was not significant, F(1, 732) = 2.00, p = .16, ηp² = .003, nor was the
predicted main effect of age, F(1, 732) = 3.25, p = .072, ηp² = .004. The interaction between sex and age
was also not significant, F(1, 732) = 0.016, p = .90, ηp² < .001.
MANOVA
Example
● In an effort to reveal any gender differences a MANOVA was conducted on the nine subscales of the SCL-90-R.
● Box’s test was significant [F(45, 92720) = 2.47, p < .001], indicating the covariance matrices of the dependent variables were not equal
across groups.
● In light of the significant Box’s test, Pillai’s Trace was selected over Wilks’ Λ.
○ Pillai’s Trace indicated there was a significant difference between the genders; however, the effect size of gender was small,
Pillai’s Trace = .11, F(9, 233) = 3.21, p < .001, ηp² = .11.
● Subsequent to the MANOVA, univariate ANOVAs were conducted on the nine subscales. To correct for an elevated risk of a Type I error,
the Bonferroni method was applied and the alpha value was set to .005 (.05 divided by 9).
○ A significant gender difference was identified for only the somatization scale, and the effect size of gender was very small, F(1,
241) = 9.58, p < .005, ηp² = .04.
○ Further, the Levene’s test for the somatization scale was significant, indicating that the error variance of the dependent variable
was not homogeneous, F(1, 241) = 12.52, p < .001.
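The Bonferroni adjustment in the example divides the overall alpha by the number of follow-up tests. The sketch below uses a hypothetical function name. Note that .05 / 9 is about .0056, so the .005 used in the example is a slightly stricter, rounded-down threshold.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha under the Bonferroni correction: alpha / number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Nine univariate ANOVAs following the MANOVA
print(round(bonferroni_alpha(0.05, 9), 4))  # 0.0056
```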
Limitation section
● Other anomalies, such as a significant Box’s test for the MANOVA and a significant Levene's test for the somatization subscale during
the ANOVA, call the validity of the results into question.
Correlations
● Italicise r and p. Omit the leading zero from r.
● Example
○ Preferences for femininity in male and female faces were positively correlated,
Pearson’s r(1282) = .13, p < .001.
Pearson, Spearman, Kendall tau and Point-Biserial Correlation
● Mean and standard deviation
○ Format: M = value, SD = value
○ Example: The mean score in the conflicting condition, M = 37.5 s, SD = 5.9 s, was greater than the mean score in the consistent condition, M = 24.3 s, SD = 6.8 s.
● Confidence interval
○ Format: 95% CI [LL, UL]
○ Example: Participants who drank Sleepy Head tea 20 min before going to bed fell asleep significantly faster, M = 10.5 min, 95% CI [5.11, 15.23], than those who drank a pint of coffee 20 min before going to bed, M = 19.3 min, 95% CI [14.11, 25.23].
● p values
○ Format: p = value (no leading zero); p < .001 (if the value is less than .001)
○ Example: The more drunk the participants, the higher their attractiveness ratings for the stooges, p = .003.
○ Example: The more drunk the participants, the higher their attractiveness ratings for the stooges, p < .001.
● Cohen’s d
○ Format: d = value
○ Example: The mean score in the conflicting condition, M = 37.5 s, was greater than the control, M = 24.3 s, d = 1.94.
● Chi-square
○ Format: χ²(df) = value, p value, effect size
○ Example: There was a significant association between the type of training and whether or not cats would dance, χ²(1) = 25.36, p < .001.
● t test
○ Format: t(df) = value, p value, effect size
○ Example: The mean score in the conflicting condition, M = 37.5 s, SD = 5.9 s, was significantly greater than the mean score in the consistent condition, M = 24.3 s, SD = 6.8 s, t(67) = 4.38, p = .032, r = .47.
● Correlation: Pearson (r), Spearman (rs)
○ Format: r = value (with no leading zero), confidence interval, p value
○ Example: The more drunk the participants, the higher their attractiveness ratings for the stooges, r = .54, 95% CI [.44, .64], p = .003.
● ANOVA
○ Format: F(dfEffect, dfError) = value, p value, effect size
○ Example: There was a significant effect of Viagra on levels of libido, F(2, 12) = 5.12, p = .025, ω² = .36.
Templates
Test: Pearson's r (Pearson product-moment correlation coefficient)
Template:
● “A (Pearson product-moment correlation coefficient or Pearson's r) was computed to assess the relationship between (Variable 1) ____ and (Variable 2) ____.”
● Pearson’s r value and (possibly) significance values:
○ “There was a (positive or negative) correlation (no correlation) between the two variables [r = ____, n = ____, p = ____].”
● Reference to your scatterplot:
○ “A scatter plot summarizes the results (Figure ____).”
○ Fill in the figure number.
Methods section
Statistical analysis
● SPSS version 25.0 was used for the analysis.
● After the data (scores) were entered into a Microsoft Excel 2010 worksheet and configured properly, they
were imported into SPSS.
● Descriptive statistics—including standard deviations, means, and medians—were calculated for the …
○ Or
■ Mean (± standard deviation) was used to describe continuous variables (e.g., age).
■ Count (frequency) was used to describe nominal variables (e.g., gender).
● All statistical tests were two-sided, and p values < .05 were considered statistically significant.
○ Or
■ A p value of .05 was adopted as the significance threshold.
● All underlying assumptions were met, unless otherwise indicated.
Results
● A total of 100 patients were included in this study, with a mean age of 70 (± 10).
● Of these, 50 (50%) were men and 50 (50%) were women.
● Detailed patient characteristics are presented in Table 1.