
STATISTICS AND RESEARCH DESIGN

Does more or new data mean that a nonsignificant result will become significant?
Nikolaos Pandis, Associate Editor of Statistics and Research Design
Bern, Switzerland

Am J Orthod Dentofacial Orthop 2020;158:150-1
0889-5406/$36.00
© 2020 by the American Association of Orthodontists. All rights reserved.
https://doi.org/10.1016/j.ajodo.2020.04.015

It is common in the orthodontic literature to interpret study findings based solely on the P value, even though the conventional significance threshold is arbitrary. The P value is a surprise index and indicates the probability of observing the results at hand, or more extreme ones, under the null hypothesis. For example, under the assumption that there is no difference between 2 methods in the time required to finish orthodontic alignment, the P value gives the probability of observing a difference of, say, 1, 2, or 5 months (or a more extreme one) between treatments A and B. Depending on the difference in time to complete the alignment under the 2 interventions, the number of patients, and the variability of the data, a P value is calculated that indicates how likely the observed results would be under the assumption of no difference. A large difference such as 5 months is more likely to command a smaller P value, and thus a statistically significant result, than a smaller difference such as 1 or 2 months, with everything else being equal.

If the P value is very small or very large, there is less controversy in stating a significant or a nonsignificant result; however, caution should be exercised because statistical significance does not necessarily indicate clinical importance.1 The distinction between significance and nonsignificance becomes less clear if the calculated P value is just above the 0.05 threshold, because this leaves room for exploitation during interpretation, for instance by assuming that the P value is moving in the direction of statistical significance (toward P < 0.05).

In the literature, it is common to encounter misinterpretation and/or overinterpretation of study results that are not statistically significant, and this has been considered a form of research spin.2 For example, we can see statements such as "Statistical significance was not reached, but there was a trend for A to be better than B," or "..., we do not know whether a larger sample size would have affected the statistically nonsignificant findings differently."3

Are the above statements correct? No, because there is no guarantee that, if we add more patients, the difference between treatments A and B will stay the same rather than becoming smaller and, in turn, less significant. In cases where the difference remains the same or even becomes larger, adding more patients will decrease the P value if the variability of the data does not change; however, this is often not how it works out.

In addition, planning a new study to repeat a previous smaller study (with a borderline significant result) with a similar design and a larger sample size will not guarantee the same result as the smaller study, nor will it guarantee a P value smaller than 0.05.

To determine if claims like the above are justifiable, Wood et al4 calculated the probability of the P value moving away from the statistical significance threshold (0.05) and toward nonsignificance under different scenarios. They found that in a study in which the P value was 0.06, the percentages of times that we would expect a P value larger than 0.06 (less significant) were 38.6%, 34.4%, 27.5%, and 21.8% when increasing the sample size by 10%, 20%, 50%, and 100%, respectively. Therefore, the probability of not reaching statistical significance by adding more patients, given a P value of 0.06, is substantial and does not support such claims. In addition, they examined the probability of a nonsignificant result in a new repeat experiment, analyzed independently, with the same number of patients. If the first experiment yielded a P value anywhere from 0.001 to 0.05, the percentage of times for the repeat experiment to find nonsignificant results was expected to range from 17.3% to 50.0%, respectively. For P values between 0.06 and 0.15 in the first experiment, the percentage of times for the repeat experiment to find nonsignificant results was expected to be from 50.2% to 64.4%, respectively. Therefore, there is a high probability that the repeat experiment will not confirm the significance that was supposedly missed in the first, smaller study.

In conclusion, caution should be exercised when interpreting results close to the threshold of statistical significance because additional patients or a larger repeat study do not guarantee a statistically significant result.

REFERENCES

1. Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process, and purpose. Am Stat 2016;70:129-33.
2. Boutron I, Dutton S, Ravaud P, Altman DG. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 2010;303:2058-64.
3. Björksved M, Arnrup K, Lindsten R, Magnusson A, Sundell AL, Gustafsson A, et al. Closed vs open surgical exposure of palatally displaced canines: surgery time, postoperative complications, and patients' perceptions: a multicentre, randomized, controlled trial. Eur J Orthod 2018;40:626-35.
4. Wood J, Freemantle N, King M, Nazareth I. Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ 2014;348:g2215.
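
The percentages attributed to Wood et al in this article (a first P value of 0.06, followed by 10% to 100% more patients) can be illustrated with a short calculation. The sketch below is only one plausible reconstruction and is not taken from their paper: it assumes a normally distributed test statistic with known variance, treats the true effect as uncertain around the observed estimate (a flat prior), and ignores the negligible chance that the pooled effect reverses sign. The function name `prob_less_significant` is hypothetical. Under these assumptions the results come out close to the percentages quoted in the article.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def prob_less_significant(p_first, extra_fraction):
    """Chance that the pooled P value ends up larger than p_first.

    p_first        -- two-sided P value of the original study
    extra_fraction -- added patients as a fraction of the original sample
                      (0.10 means 10% more patients)

    Assumptions (illustrative, not from the source): normal test statistic,
    known variance, flat prior on the true effect.
    """
    f = extra_fraction
    # z statistic corresponding to the first two-sided P value
    z_first = STD_NORMAL.inv_cdf(1 - p_first / 2)
    # Under the assumptions above, the pooled z statistic is predicted to be
    # Normal(mean = z_first * sqrt(1 + f), variance = f). The pooled result
    # is "less significant" when the pooled z falls below z_first.
    return STD_NORMAL.cdf(z_first * (1 - sqrt(1 + f)) / sqrt(f))


if __name__ == "__main__":
    for f in (0.10, 0.20, 0.50, 1.00):
        share = prob_less_significant(0.06, f)
        print(f"{f:.0%} more patients: P value less significant in {share:.1%} of cases")
```

Even when the sample is doubled, a sizeable share of such studies is predicted to move away from significance, which is the point made in the article.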

American Journal of Orthodontics and Dentofacial Orthopedics July 2020  Vol 158  Issue 1
