
Criteria for Selecting a Significance Level: A Note on the Sacredness of .05
Author(s): Sanford Labovitz
Source: The American Sociologist, Vol. 3, No. 3 (Aug., 1968), pp. 220-222
Published by: American Sociological Association
Stable URL: http://www.jstor.org/stable/27701367
Accessed: 19-06-2015 12:28 UTC

CRITERIA FOR SELECTING A SIGNIFICANCE LEVEL:
A NOTE ON THE SACREDNESS OF .05
Sanford Labovitz

University of Southern California

Sociologists have now been formally warned that not only is .05 not sacred, but the selection of a significance level is a complex process. One major suggestion by Skipper, et al. (1967) is that the researcher should no longer choose a standard level, but report the obtained level, e.g., .40 or .003. On the one hand, this suggestion seems to involve less thinking than choosing the conventional .05 or .01; at least here there are two or three conventions on which to base a selection. On the other hand, reporting the obtained level and letting the reader figure out the significance (level of significance) is not entirely a new suggestion. Some authors have already been reporting the obtained level, and they do not appear to give more consideration to the dynamics of a significance level than their more conventional colleagues. However, perhaps the article by Skipper and his associates may prove to be important, if it can sensitize sociologists to some of the serious problems involved with tests of significance.

The authors give three suggestions pertaining to significance levels and how to report them: (1) think and reflect on the arbitrary nature of conventional levels of significance, (2) report the actual level obtained, and (3) regardless of the level obtained, give an opinion on whether or not it supports the hypothesis. If these suggestions are followed extensively in research reporting, some of the problems of interpreting significance tests should diminish. However, the authors do not adequately spell out the guidelines (criteria) leading to the selection of a significance level.¹ The following section specifies eleven criteria applicable to this problem.
Some Criteria to Consider in Choosing a Significance Level

The following is neither an exhaustive nor an all-inclusive classification scheme of criteria on which to select a significance level. However, it appears to represent the major dimensions that should be either explicitly or implicitly considered by researchers. There is no attempt to integrate the entire list, nor to rank order the criteria in terms of importance; to do either seems premature. Note that none of the criteria should be considered in isolation; each should constitute just one of several guidelines in selecting a significance level. Eleven more or less independent criteria are delimited.

1. Practical consequences. The practicality of the problem refers to the gravity of the available kinds of error on the basis of value orientations. Testing whether prefrontal lobotomy or sedation is the better method for curing patients is a grave choice if we value vitality and recognize the long-lasting and extreme effects of lobotomy. In this example, a small error rate of perhaps .001 would be chosen, so that it would be extremely difficult to reject the null hypothesis of no difference and accept lobotomy over sedation. On the other hand, if we were testing the difference between two types of sedation, perhaps a larger error rate (.05) would be chosen if there were few drastic or long-range effects for either one.
2. Plausibility of alternatives. A test of hypothesis should not be considered in isolation. Unless the inquiry is in an area where virtually nothing is known, the available rationales and empirical evidence (from other studies) should be considered in interpreting a significance test. Suppose the results are directly opposed to existing theory and empirical evidence, or even "common sense." That is, the evidence against the conclusion is large, and there is no theoretical or empirical support for the finding. Under these conditions, it would probably be best to choose a small error rate (.01 or .001), because in all the studies opposing the conclusion we are bound to find a few negative results on the basis of chance alone. We would hesitate to reject the null hypothesis so easily when rejection is such a deviant result. On the other hand, if the evidence supports the conclusion, a larger significance level would be more appropriate, since now we are usually more willing to reject the null hypothesis of no difference.

3. Power of the test: sample size. The power of a test varies directly with sample size; that is, as N increases there is a greater probability of correctly rejecting the null hypothesis (in comparison to a specific alternative hypothesis). Moreover, the standard error varies inversely with sample size. Consequently, with a large N a small difference is likely to be statistically significant, while with a small N even large differences may not reach the predetermined level. Therefore, small error rates (.01 or .001) should usually accompany large N's, and large error rates (.10 or .05) should be used for small N's.
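The relationship between sample size and power can be illustrated with a short normal-approximation sketch. The helper names, the effect size, and the sample sizes below are my own illustrative choices, not the article's:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def z_crit(alpha):
    """Two-sided critical z-value, found by bisection on phi."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < target else (lo, mid)
    return (lo + hi) / 2.0

def power(delta, sigma, n, alpha):
    """Approximate power of a two-sided z-test of H0: no difference,
    against a true mean difference delta (far-tail term ignored)."""
    return 1.0 - phi(z_crit(alpha) - delta * sqrt(n) / sigma)

# The same modest difference is hard to detect with a small N
# and easy to detect with a large N, at the same .05 level.
small_n = power(delta=0.3, sigma=1.0, n=25, alpha=0.05)
large_n = power(delta=0.3, sigma=1.0, n=200, alpha=0.05)
```

Because the standard error shrinks as 1/sqrt(N), `small_n` comes out near .32 while `large_n` is near .99, which is the sense in which large samples make small differences "significant."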
4. Power of the test: size of true difference. The power of a test varies not only with sample size (and level of significance), but also with the size of the "true" difference, e.g., the magnitude of the difference between means. Therefore, when the true difference is large, the probability of correctly rejecting the null hypothesis is also large, except if the sample size is small enough to offset this condition. A small error rate probably should be used when the difference is expected to be substantial. This conclusion is based on the rationale that if a large difference is expected and only a small difference is obtained, the null hypothesis of no difference should not be rejected.

¹ Besides the problem of criteria for significance levels, there are three general points that are not handled adequately by the authors. First, the authors do not place the arbitrariness of a significance level within the perspective of the general state of theory, knowledge, and evidence. Instead, they emphasize the single test. Actually, a single test, whether or not it reaches a predetermined significance level, leads to no major decision. Few, if any, researchers would accept or reject any statement on the basis of a single test. Second, Skipper, et al. ignore the cross-classification versus tests of significance arguments. While this is not their concern, their whole article is essentially meaningless if tests of significance are not applicable. Finally, the authors emphasize applied research and the lay, statistically unsophisticated audience. If statistically adroit colleagues are the prospective audience, perhaps their suggestions are less useful.

5. Type I vs. type II error. As pointed out by Skipper, et al., most textbooks emphasize the criterion of minimizing the probability of errors described as type I (rejecting a true null) and type II (failing to reject a false null). These errors, to some extent, vary inversely with one another; consequently, minimizing one type of error tends to increase the other. To illustrate, a .05 significance level yields fewer type II errors than the .01.
To digress on tests of hypotheses: a large significance level (.05) makes it easier to reject the null hypothesis and accept the original hypothesis set up by the researcher. The original hypothesis usually states a difference (and perhaps specifies the direction), while the null usually is stated in terms of no difference. Therefore, a large error rate increases the probability of accepting the researcher's hypothesis, but it also increases the probability of doing so incorrectly (type I error). However, with a large error rate, there is a low probability that the original hypothesis is both correct and we failed to accept it. If we feel that the original hypothesis should not be accepted until a high level of certainty is reached, then many true original hypotheses are likely to be lying around that are not accepted (type II error). Which error is best? Aside from our personal feelings on how a science should develop, at this point the other alternatives listed should help solve the apparent dilemma.
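The inverse relation between the two error types can be checked with a small simulation. The sample size, effect size, and trial count are arbitrary choices of mine for illustration:

```python
import random
from math import sqrt

random.seed(1)

def z_stat(sample):
    """z statistic for H0: mu = 0 with known sigma = 1."""
    return (sum(sample) / len(sample)) * sqrt(len(sample))

def reject_rate(mu, n, z_crit, trials=4000):
    """Share of simulated samples whose |z| exceeds the critical value."""
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(mu, 1.0) for _ in range(n)]
        if abs(z_stat(sample)) > z_crit:
            hits += 1
    return hits / trials

# With a true difference present (mu = 0.5), the type II error rate is
# 1 minus the rejection rate; the looser .05 level misses less often.
type2_at_05 = 1 - reject_rate(mu=0.5, n=20, z_crit=1.96)
type2_at_01 = 1 - reject_rate(mu=0.5, n=20, z_crit=2.58)
```

Under these settings the .05 test misses the true difference roughly 40 percent of the time versus roughly 60 percent for the .01 test, which is the trade-off the criterion describes.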
6. Convention. Skipper, et al., strongly argue against using conventional levels of significance such as .05 and .01. For the most part their conclusion seems justified, and the other criteria listed further indicate the limitations of using a conventional level. It is listed as a separate criterion primarily because (1) these conventions are largely used in sociology, and (2) they may be positively evaluated as yielding some consistency among research results. If most results are held to a similar standard, readers have some idea of the comparability of results from one study to another. However, the disadvantages of a conventional level (such as not considering available evidence or the nature of the problem) well outweigh this factor. As a final remark, the selection of a conventional level may rest not on any sound rationale, but on such incidental factors as the particular field of social science, where an individual received his degree, or the journal under consideration.
7. Degree of control in design. It is well known that R. A. Fisher generally selected the .05 level in his agricultural experiments. These experiments were based on complex (e.g., latin square or factorial) designs that offered a high degree of control over the effects of extraneous factors. The effects of "other factors" were handled by randomizing plots of ground, rows and columns of products, etc. Under such highly controlled conditions, Fisher seemed justified in using the larger error rate of .05 instead of .01 or lower. If other factors are controlled, the results of the experiment are likely to be due to the experimental variable or to chance differences, and not to extraneous factors. Stated otherwise, a large amount of control in an experiment reduces alternative interpretations, so that a larger level of significance can be tolerated. In designs of low control, perhaps a more stringent error rate should be selected (.01), since the alternative to chance differences could be due to extraneous factors as well as to the independent variable. Consequently, under low-control conditions we should make it more difficult to reject the null hypothesis of no difference.

8. Robustness of test. Robustness is the ability of a statistical test to maintain its logically deduced conclusion when one or more assumptions have been violated. For example, Student's t and the analysis of variance have been demonstrated to be robust under the conditions of nonnormality and heterogeneity. However, under these conditions the actual .02 level of significance may be met at the nominal .01 level, and the .10 at the .05. Consequently, depending on the statistical test in question, when the data do not meet all the assumptions, a small error rate should be chosen and interpreted as a larger one, e.g., .01 is interpreted as .02 or .05. On the other hand, if the data reasonably meet the assumptions, then a large error rate can be used with confidence.
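The gap between a nominal and an actual significance level can be sketched with a rough Monte Carlo check. The distributions, the sample size, and the trial count below are my own illustrative choices, not from the article:

```python
import random
import statistics
from math import sqrt

random.seed(2)

T_CRIT_05_DF9 = 2.262  # two-sided .05 critical value of Student's t, 9 df

def t_stat(sample):
    """One-sample t statistic for H0: mu = 0."""
    n = len(sample)
    return statistics.mean(sample) / (statistics.stdev(sample) / sqrt(n))

def empirical_type1(draw, n=10, t_crit=T_CRIT_05_DF9, trials=6000):
    """Rejection rate when the null hypothesis is in fact true."""
    hits = 0
    for _ in range(trials):
        sample = [draw() for _ in range(n)]
        if abs(t_stat(sample)) > t_crit:
            hits += 1
    return hits / trials

# With normal data the actual rate sits near the nominal .05; with a
# skewed parent (a centered exponential) it may drift from that level.
rate_normal = empirical_type1(lambda: random.gauss(0.0, 1.0))
rate_skewed = empirical_type1(lambda: random.expovariate(1.0) - 1.0)
```

Comparing the two empirical rates shows the sense in which a nominal level should be read cautiously when the assumptions are violated.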
9. One-tail vs. two-tail tests. As stated in most introductory statistics books, it is easier to reject the null hypothesis with a directional (one-tail) than with a nondirectional (two-tail) hypothesis. The z-score equivalents for a one-tail test are lower than those for a two-tail test (e.g., 1.65 as compared to 1.96 at the .05 level). It is reasoned that knowledge of the direction of the hypothesis should give the researcher the advantage of more easily rejecting the null and accepting the original hypothesis.

However, the notion of one-tail vs. two-tail is largely a myth, because it is based on the rationale that we either have absolutely no idea of the direction of the hypothesis or we have absolute knowledge of the direction. Either extreme is an unlikely occurrence. It is most probable that we have some idea of the direction of the hypothesis, but with a small to large amount of uncertainty in our reasoning. Consequently, we should accept neither the z-score equivalent of the one-tail test nor that of the two-tail test, e.g., 1.65 or 1.96, but an intermediate score between the two values. At the .05 level, if we are largely certain of the direction (that is, it is supported by previous research or sound rationale), then we should select a z-score closer to 1.65. If, on the other hand, there is a large degree of uncertainty, a z-score nearer to 1.96 would be more appropriate. This is the equivalent of saying that we should choose a larger or smaller error rate depending upon our degree of confidence in the direction of our hypothesis.
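The critical values involved, and one way to form an "intermediate score," can be sketched as follows. The confidence weight is an arbitrary illustration of the idea, not a rule from the article:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def z_quantile(p):
    """Inverse normal CDF for p >= 0.5, found by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

one_tail = z_quantile(1 - 0.05)      # about 1.65
two_tail = z_quantile(1 - 0.05 / 2)  # about 1.96

# One illustrative compromise: weight the two critical values by our
# confidence in the hypothesized direction (0.8 here is arbitrary).
confidence_in_direction = 0.8
intermediate = (confidence_in_direction * one_tail
                + (1 - confidence_in_direction) * two_tail)
```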
10. Confidence interval. A confidence interval not only provides a probability band containing some statistical measure or difference, but actually provides tests of hypotheses; therefore, the difference between a test and an interval is not clear-cut. The importance of considering the confidence interval as a criterion in selecting a level of significance depends on whether the problem requires a small or a large interval. For a smaller interval a larger error rate is necessary (.05), while for larger intervals (in which there is more confidence that they contain the parameters) a smaller error rate is necessary (.01).
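The test-interval duality is easy to verify for the normal-theory case. This is a sketch assuming a known sigma; the specific numbers are mine:

```python
from math import sqrt

Z_05 = 1.96  # two-sided .05 critical z-value

def conf_interval(mean, sigma, n, z=Z_05):
    """Normal-theory confidence interval for a mean with known sigma."""
    half = z * sigma / sqrt(n)
    return (mean - half, mean + half)

def z_test_rejects(mean, sigma, n, z=Z_05):
    """Two-sided z-test of H0: mu = 0."""
    return abs(mean) > z * sigma / sqrt(n)

# The 95% interval excludes zero exactly when the .05 test rejects.
for m in (0.1, 0.3, 0.5, 0.9):
    lo, hi = conf_interval(m, sigma=1.0, n=25)
    assert (lo > 0 or hi < 0) == z_test_rejects(m, sigma=1.0, n=25)
```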
11. Testing vs. developing hypotheses. If testing a well-reasoned and developed hypothesis that will distinguish between two theories, it seems logical to select a small level of significance. This is based on the notion that we want to be fairly sure if one theory is to be selected over another. On the other hand, if we are just exploring a set of interrelations for the purpose of developing hypotheses to be tested in another study, a larger error rate will tend to yield more hypotheses, any of which may be subsequently validated. Therefore, in this stage of exploration perhaps the .10 or .20 level would be sufficient.

Caution should be used not to fall into the trap of thinking that the few "significant" relations out of many possible ones have truly reached the designated level. Out of twenty interrelations we are likely to find one significant at the .05 level on the basis of chance alone. However, we do not fall into this trap if the "significant" relations are subsequently tested.
Conclusion

In conclusion, Skipper, et al. have performed a definite service to sociology if more of us probe deeper into the rationales behind significance levels and stop using an absolute standard as proof of a hypothesis. To buttress this position, eleven criteria are presented that will hopefully aid researchers in selecting an appropriate level. These criteria should not be viewed as definitive in any sense, and some are undoubtedly more important than others. I welcome any response on other possible criteria, and any thoughts on the evaluation of those presented above.

Reference

Skipper, J. K., Jr., et al. 1967. "The sacredness of .05: a note concerning the uses of significance in social sciences." The American Sociologist 2 (February): 16-19.

THE CASEWORKER AS RESEARCH INTERVIEWER


Michael A. La Sorte

Hofstra University

In the past several years it has not been uncommon for the caseworker to take on the responsibilities of research interviewer in addition to his regular social work duties. This has come about for a number of reasons and is mainly to be seen in the action-research type of program. It is assumed, often without question, that the caseworker qua caseworker, by virtue of his training and experience in human relations, is admirably suited for the position of research interviewer. The argument of this paper is that he is not so qualified. This erroneous impression has created unnecessary problems in social work-research endeavors. The successful completion of the action-research program is heavily contingent upon the caseworker's ability and willingness to accept the dual role of researcher and caseworker.

There is little indication in the literature of the ratio of success to failure of such interdisciplinary efforts, nor where there has been failure can we locate a reliable analysis of the reasons for failure along with proposed solutions. Nonetheless, ideally, the action-research type of program should work to the profit of both the social work profession and the aims of social research. We suspect that this has seldom been the case. To date, there have been a few analyses of organizations with some similarities to the action-research situation (Shostak, 1966; Lazarsfeld et al., 1967; Hammond, 1964). These studies have focused on the social and organizational contexts of social research. They cover a wide range and illuminate the many difficulties encountered when research and programmatic ventures in the field must contend with other disciplines, interest groups, governmental agencies, and intra-organizational conflict.

Where social work-research projects have been discussed, the rather serious dilemmas and conflicts to be found in the internal organization have either been ignored or glossed over. This reluctance to confront organizational issues has not only prolonged the problem and stifled discussion but also, in some cases, has given an unjustified aura of scientific respectability to empirical findings arising from action-research programs.

Where organizational conflict has been acknowledged in the literature, the treatment tends to be cursory, polite, noncontroversial, and consequently of little value. In a recent article on an action-research program in Philadelphia, Leonard Blumberg indicates the kind of situation that tends to develop when the caseworker is expected to assume a dual practitioner-researcher role. As he notes, "Committed as they were to the full practice of social work skills with their clients, social workers found it next to impossible to live up to certain experimental plans developed before our project got under way. Thus our initial plan called for one sample of skid row men to be interviewed and referred to another agency for service, while another sample of skid row men would be interviewed and given considerable attention by the staff and also be relocated . . . The research design broke down because the social work staff found it next to impossible to withhold their skills from one population and give it to the other" (Shostak, 1966:160). The problem was handled by discontinuing the research plan as originally formulated and replacing it with a much emasculated one. Blumberg gives an inadequate evaluation of the impact of this change on the goals of the action-research program, especially in reference to its research phase.

What follows is a description and analysis of the failure of an action-research project. The focus will be on the caseworkers in the organization who found themselves with the prospect of accomplishing two diverse tasks: casework and research. They were faced with the predicament of reconciling a basic job orientation to a system whose demands they saw as deviating from some rather fundamental tenets of their profession.

The Project and the Caseworker

One goal of the action-research project was to demonstrate the effectiveness of the intensive casework technique and to promote its distribution and acceptance to the permanent social work agencies in the city. To effect this, a most logical recruitment procedure was undertaken. Five social work agencies were approached and convinced of the feasibility of the project and of the potential rewards which would be forthcoming to them. Each agency contributed one of its most efficient social workers to the project on a part-time basis. Each caseworker, then, found himself with obligations to his permanent supervisor (from the parent agency), the action-research project supervisor (who coordinated the activities of all five part-time caseworkers), and the research section (that expected hard empirical data from the caseworkers). Officially, the formal job requirements called for a congruence of these variant role obligations. They
