Journal of Dental Research

http://jdr.sagepub.com

Significance Level and Confidence Interval


Rosario H. Potter
J DENT RES 1994; 73; 494
DOI: 10.1177/00220345940730020101

The online version of this article can be found at:


http://jdr.sagepub.com

Published by:

http://www.sagepublications.com

On behalf of:
International and American Associations for Dental Research

Downloaded from http://jdr.sagepub.com at Kasturba Medical College, Manipal on May 22, 2009
Guest Editorial

Significance Level and Confidence Interval


Rosario H. Potter, DMD, MSD, MS, Professor Emerita

Graduate students in my class in biostatistics are required to demonstrate
that they fully understand the meaning and implications of the much-used
phrase "results are significant, P < 0.05." To this end, I am continually
dismayed when virtually every new class comes in with similar preconceived
but erroneous notions about significance level and confidence interval. The
more recent the class and presumably the more exposure to increased usage of
statistics in the literature, the greater is the difficulty to correct their
set concepts of these two terms, which are related yet have different
meanings and different uses.

We often see the following in print:
* The confidence level was set at P < 0.05.
* Statistical significance was set at the 95% confidence level (P < 0.05).
* Significance between groups was determined by using P < 0.05 confidence
  limits.
* The difference shows statistical significance at probability of 95%.
* The difference was not significant at the 95% confidence interval.
* The groups show similar mean values at 95% confidence.

These expressions are not unique to dental publications, but are also
routinely found in the medical literature. It is not surprising that new and
non-statistician researchers are easily misled toward at least three
erroneous concepts: first, that significance level and confidence level are
synonymous and interchangeable terms, one being the "other side" of the
other; second, that the term "not significant" shows that the mean values are
the same; and third, that if our results are significant at P < 0.05, we may
be confident at 95% probability that this difference or effect we found is
real.

I believe we have a responsibility to guide our future scientists in concepts
fundamental to research. Therefore, this editorial takes advantage of the
excellent forum offered by the Journal to clarify such commonly misused and
misleading terms in the literature. In the attempt, plain words will be used
as much as possible without invoking statistical jargon.

A priority goal in scientific research, simply stated, is to show a real
difference or effect in whatever we are testing. Statistics is the tool we
use to demonstrate that we have found such a difference. To do that, we have
to "disprove" (or challenge or contradict) the so-called null hypothesis of
no difference. In other words, we have to show evidence that our results are
not compatible with the null hypothesis, which assumes no difference. The
test statistic (t, F, χ², etc.) calculated from our data will lead us to two,
and only two, possible conclusions: that our data either (1) significantly
deviate from zero or no difference, i.e., we reject the null hypothesis and
conclude a difference; or (2) do not deviate significantly from the null
hypothesis of no difference, i.e., no difference can be found. In this
manner, we have performed a statistical test of hypothesis, and our decision,
differ or do not differ, is based on pre-determined cut-off points in the
percentage or probability distribution of the test statistic that we use.
Notice that this is a yes or no decision, either significant or not
significant. Cut-off points are referred to as the critical values of the
test statistics.

Significance at 0.05 level means that: (1) our data are sufficiently far from
the cut-offs for no difference, to lead us to a conclusion of difference; and
(2) in this decision, we are aware that we incur a probability (P) or chance
of 5% of being wrong, because we know that 5% of similarly conducted
experiments will show a statistical significance just by chance alone, even
if no real difference or effect exists. The smaller the P level (0.01, 0.001,
0.0001) and the farther our data from the cut-off, the less chance we have
for a wrong conclusion; and thus the more certain we can be of the difference
that we find from our data.

This percentage (0.05, 0.01, etc.) is the level of significance, or the P
level, or α, which is the probability of making an error in concluding a
difference when none really exists. By this definition, significance level of
0.95 or 95% is meaningless.

In our test of hypothesis, notice that (1) we, the experimenters, determine
the significance level or cut-off (0.05, etc.); (2) the cut-off is arbitrary;
(3) the P level for concluding a difference may be very small but never
absolute zero because, in scientific research, certainty is never absolute,
only relative; and (4) the significance statement is a rather strong
statement, because we are saying in essence that we know our magnitude of
error when we conclude a difference.
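
The statement that 5% of similarly conducted experiments show significance by chance alone, even when no real difference exists, can be checked with a small simulation. The sketch below is only an illustration under stated assumptions, not a method taken from the editorial: both groups are drawn from the same normal distribution, the standard deviation is treated as known so that a simple two-sided z test can stand in for t or F, and the function names, sample size, and number of repetitions are all hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(group_a, group_b, sigma):
    """Two-sided P value for a difference in means, assuming sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=20000, n_per_group=20, alpha=0.05, seed=1):
    """Share of experiments declared significant when the null hypothesis is true."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_experiments):
        # Treatment and control come from the SAME population: no real difference.
        treatment = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if z_test_p_value(treatment, control, sigma=1.0) < alpha:
            significant += 1
    return significant / n_experiments


rate = false_positive_rate()
print(f"Share of null experiments declared significant: {rate:.3f}")
```

Under these assumptions the printed share should sit close to the chosen cut-off of 0.05, and lowering `alpha` to 0.01 should lower it accordingly. It is a statement about experiments in which nothing real is present, which is why it cannot be turned around into a 95% probability that a detected difference is real.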


Graduate students have no problem with the concept that P < 0.05 measures the
chance of error in concluding a difference. The problem comes when they
mistakenly stretch this to include the notion that if 0.05 measures error,
the other side, the 95%, must be the confidence level and must mean 95%
probability of a real difference. This is not the case. These two
probabilities do not add up to unity. If they do, we should theoretically be
able to set a significance level of zero and be 100% certain that this
difference is real.

The use of the word "confidence" is unfortunate. It is more easily understood
per se than the word "significance", so much so that authors without
statistical background seem to be more comfortable using it than the latter,
even to the point of misuse. What, then, is the confidence interval?

All of us have used the mean in our data as a one-value estimate of an
unknown real mean value, a parameter which we seek. In statistics, this is
the point estimate. Likewise, mean difference, correlation coefficient r,
regression coefficient b, odds ratio, percent reduction, etc., are point
estimates. Not all researchers, however, are aware that there is another
estimate via a range of values, i.e., the confidence interval estimate.

Statistics textbooks tell us that this range has a designated likelihood
(usually 95% or 99%) to include the real but unknown parameter that we seek.
In other words, we may be 95% (or 99%) confident that the values between the
two confidence limits calculated from our sample data may include the unknown
real mean value, mean difference, r, b, etc.

Let us now examine how this definition of the confidence interval relates to,
and differs from, the level of significance. Primarily, confidence interval
is a method of estimation, while significance level is used with the
statistical test of hypothesis to arrive at a conclusion. The two procedures,
however, can be used together very effectively to give more informative
results than either alone.

Consider the case where we conclude a difference at P < 0.05. The chance of
this conclusion being wrong is less than 5%. Here, our observed mean
difference is large enough to be beyond the cut-off from zero or no mean
difference, in the reference distribution upon which the null hypothesis is
based. On the other hand, the definition of the 95% confidence interval tells
us that clustering around the mean difference in our data is a range of
values that has a 95% chance of including the real mean difference. These
values will not include zero, which is consistent with the significant
results in the test of hypothesis. It is important to note that the 95%
confidence interval is calculated around the sample mean difference in our
data, and not around the zero mean difference in the reference distribution
in our test of hypothesis. This is probably the underlying notion that leads
to misconstruing 95% as the probability of correctly concluding a real
difference.

In this case, where our data show a significant difference, we may choose to
publish confidence intervals for the mean treatment effect and for the
control mean. The more highly significant our result is (P < 0.01, P < 0.001,
etc.), the farther these two intervals will be from each other, and the
values will not overlap. Note that these two intervals are calculated around
the treatment mean and the control mean in our data, while the interval for
mean difference mentioned above is calculated around the difference between
treatment and control means.

The confidence interval for mean difference gives the most helpful
information in the alternative case where P > 0.05. Here, the chance of error
in claiming a difference is large and greater than 5%, thus leading us to a
conclusion of no significance. The 95% confidence interval calculated around
the mean difference in our data will include zero and a range of values that
will lead us to conclude no significance. Why is this interval informative?
Because the confidence limits give a whole range of differences that may be
clinically or scientifically meaningful although not statistically
significant. In other words, even if analysis of our data does not result in
finding a statistically significant difference or effect, there may exist a
real difference or effect that may not be large but is meaningful to the
researcher. This point must be emphasized particularly when our data are of
near-borderline significance, say somewhere at P = 0.06. Such results occur
because: (1) sample size is small due to expense and time constraints; and/or
(2) variation (standard deviation) is large due to difficulties in obtaining
quantitative, reliable, and repeatable data, to instrumentation and technique
errors, to inherently large variation between subjects, etc. In this case,
the two confidence intervals, for treatment effect and for control mean, can
be expected to overlap.

The reasoning then follows that, in publications, the statement "P > 0.05,
not significant" is not as informative as specifying the actual P levels
obtained and showing the confidence limits for the mean difference. Levels at
0.06 and 0.50 are both greater than 0.05, but have very different
implications. At 0.50 level, a conclusion of no difference is obvious. But if
P level is of near-borderline significance, we can say that: (1) our observed
difference or effect, although not significant, is clinically or
scientifically meaningful; (2) within the confidence limits is a range of
differences of which we can be 95% confident that these values may include
the real difference; and (3) if sample size were increased and/or variation
decreased in follow-up studies, there is a strong likelihood that the
difference would reach statistical significance. In fact, such preliminary
data are often required by funding agencies like NIH to estimate needed
sample size to ensure statistical significance in future experiments.

It is hoped that this discourse clarifies the following:

First, significance level and confidence level are not synonymous or
interchangeable terms. The amount 0.05 is used with the significance level
and 95% with the confidence level. This is because 0.05 refers to a
probability given our particular sample data, whereas 95% is not probability
in this sense but refers to the percentage of intervals that are expected to
include the real difference among which our interval may or may not even be
one. A statement of significance is associated with the test of hypothesis
against a reference distribution with a hypothesized mean, as, for example,
against the null hypothesis where the hypothesized mean is zero difference. A
statement of confidence refers to an interval of values calculated from sample
data, as, for example, 95% confidence interval around the mean difference in
our data, and no hypothesis testing is involved. The two terms thus have
different, albeit related, meanings and uses. We can make a statement of
significance at the 0.05 level, and we can make a statement of confidence at
the 95% level. We are, however, making two different statements and not two
versions of one and the same statement.

Second, "no statistical significance" does not necessarily imply that the
group means are the same. What can be inferred is that either the difference
is so small that there is no practical difference between the group means, or
the observed difference is not large enough to reach the 0.05 cut-off for
significance due to small sample size or large variance but is nonetheless
meaningful to the researcher.

Third, concluding a significant difference at P < 0.05 does not mean that we
can be confident at 95% probability that this difference or effect we found
is real. The test of hypothesis yields a yes or no conclusion. In concluding
a difference, we can make a definite statement of significance that P < 0.05,
which says that we know we incur a 5% chance of error (or 1% if P < 0.01,
etc.). If our observed difference falls in the other side, i.e., the 95%
region of the distribution, our conclusion is necessarily that of no
difference. Therefore, 95% or 0.95 probability means that, if there indeed is
no real difference or effect, we will not get a mean difference in our data
large enough for significance, 95 times out of 100. Note that this statement
is entirely different from the incorrect statement that 95% measures
probability of a real difference or effect. The reasoning then follows that
95% has meaning not in the "yes" conclusion but in the "no" conclusion, where
no significant difference is found. Again, this 95% or 0.95 (1 - 0.05)
probability is not a confidence interval.

The final question is thus: If significance level tells us our chance of a
wrong conclusion, what measures the probability that we are correct when we
conclude a difference, that this is a real difference or effect? The answer
is, the power of the statistical test. Power of the test depends not only on
the designated significance level (0.05, 0.01, etc.) but also, more
importantly, on sample size, amount of variation, and the magnitude of mean
difference. Therefore, it is calculable only when these values are available
from preliminary data. Power is a required statistic for grant proposals
because it shows the likelihood that any detected difference or effect is
real and not a chance finding. Power, although not usually shown in published
papers, can be expected to be high if sample size is sufficient, if the
difference between group means is large, and if data measurement is valid and
repeatable using reliable technique and instrumentation.

The readers should now be ready to edit the inappropriate statements cited
previously, as follows:
* The significance level was set at P < 0.05.
* Statistical significance was set at the 0.05 probability level.
* Significance between groups was determined by using P < 0.05.
* The difference shows statistical significance at probability of 0.05.
* The difference was not significant at the 0.05 probability level.
* The groups do not show a difference at the 0.05 significance level.

All are statements of significance in conclusions drawn from tests of
hypothesis. They are not confidence statements.

Our responsibility to the next generation of dental researchers dictates a
more thoughtful approach to our written words, the effects of which are
frequently underestimated in the rush to publish. No citations are
referenced, since it is not the intent of this editorial to finger-point, but
rather to alert authors, editors, and reviewers to avoid misleading
statistical statements in published papers.

-Rosario H. Potter
Department of Oral Facial Development
Indiana University School of Dentistry
Indianapolis, IN 46202-5186
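
The editorial treats the 95% confidence level and the power of the test as two separate quantities, and a short simulation can make that distinction concrete. What follows is a sketch under assumptions that are not in the editorial: a real difference of 0.5 standard deviations between treatment and control, normally distributed data with a known standard deviation (so the interval uses the normal critical value of about 1.96), and the hypothetical helper name `simulate`. Power is taken here in the usual textbook sense, as the share of experiments that reach significance when a real difference of the assumed size exists.

```python
import random
from statistics import NormalDist, fmean


def simulate(true_difference, n_per_group, sigma=1.0, level=0.95,
             n_experiments=5000, seed=2):
    """Return (coverage, power) for a known-sigma comparison of two means."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z_crit * sigma * (2 / n_per_group) ** 0.5
    covered = rejected = 0
    for _ in range(n_experiments):
        treatment = [rng.gauss(true_difference, sigma) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        # The interval is centred on the OBSERVED mean difference, not on zero.
        observed = fmean(treatment) - fmean(control)
        lower, upper = observed - half_width, observed + half_width
        if lower <= true_difference <= upper:
            covered += 1   # this interval includes the real difference
        if lower > 0 or upper < 0:
            rejected += 1  # interval excludes zero: significant at 1 - level
    return covered / n_experiments, rejected / n_experiments


for n in (10, 40):
    coverage, power = simulate(true_difference=0.5, n_per_group=n)
    print(f"n = {n:2d} per group: coverage = {coverage:.3f}, power = {power:.3f}")
```

Under these assumptions, the share of intervals that include the real difference should stay near 95% for both sample sizes, while the share of experiments reaching significance should rise markedly as the sample size grows. The 95% figure describes the intervals. The chance of detecting a real effect depends on sample size, variation, and the magnitude of the difference.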
