Source: The American Statistician, Vol. 50, No. 1 (Feb., 1996), pp. 21-26
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2685039

Confidence Intervals for a Normal Coefficient of Variation
Mark G. VANGEL
This article presents an analysis of the small-sample distribution of a class of approximate pivotal quantities for a normal coefficient of variation that contains the approximations of McKay, David, the "naive" approximate interval obtained by dividing the usual confidence interval on the standard deviation by the sample mean, and a new interval closely related to McKay. For any approximation in this class, a series is given for e(t), the difference between the cdf's of the approximate pivot and the reference distribution. Let κ denote the population coefficient of variation. For McKay, David, and the "naive" interval e(t) = O(κ²), while for the new procedure e(t) = O(κ⁴).

KEY WORDS: Chi-squared approximation; Noncentral t distribution; McKay's approximation.
1. INTRODUCTION

If X is a normal random variable with mean μ and variance σ², then the parameter

    κ ≡ σ/μ   (1)

is called the population coefficient of variation. Let X_i for i = 1, ..., n be an independent random sample, with X_i ~ N(μ, σ²) for each i. In terms of the usual sample estimates of the normal parameters,

    X̄ = Σ_{i=1}^{n} X_i / n   (2)

and

    S² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1),   (3)

a point estimate of (1) is

    K ≡ S/X̄.   (4)

This statistic is widely calculated and interpreted, often for very small n, usually without an accompanying confidence interval. An exact method for confidence intervals on κ, based on the noncentral t distribution, is available (Lehmann 1986, p. 352), but it is computationally cumbersome; hence the need for approximate intervals. In this article we will investigate an approximate pivotal quantity that can be used to easily calculate confidence intervals for κ that attain very nearly the nominal confidence level for any sample size. These calculations require only a table of quantiles of the χ² distribution.
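Although the exact noncentral-t interval was cumbersome to compute when this article was written, it is now a few lines of code. The sketch below is illustrative only and not part of the original article; the helper name exact_cv_ci and the root-bracketing constants are arbitrary choices, and the sample is assumed to have μ > 0 (so that K > 0).

    import numpy as np
    from scipy.stats import nct
    from scipy.optimize import brentq

    def exact_cv_ci(x, alpha=0.05):
        """Exact CI for kappa = sigma/mu, by inverting the noncentral t
        distribution of sqrt(n)*Xbar/S (assumes mu > 0)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        df = n - 1
        t_obs = np.sqrt(n) * x.mean() / x.std(ddof=1)   # observed sqrt(n)/K

        # Pr(T <= t_obs | delta) is decreasing in the noncentrality delta.
        def pivot(delta, p):
            return nct.cdf(t_obs, df, delta) - p

        hi = 10 * abs(t_obs) + 100                      # crude bracket for the root search
        delta_lower = brentq(pivot, 1e-10, hi, args=(1 - alpha / 2,))
        delta_upper = brentq(pivot, 1e-10, hi, args=(alpha / 2,))
        # kappa = sqrt(n)/delta, so the delta limits map (reversed) to kappa limits.
        return np.sqrt(n) / delta_upper, np.sqrt(n) / delta_lower

The approximate intervals developed below avoid the root search entirely; only χ² quantiles are needed.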
Let Y_ν denote a χ² random variable with ν = n − 1 degrees of freedom, and define W_ν ≡ Y_ν/ν. For α ∈ (0, 1) let χ²_{ν,α} denote the 100α percentile of the distribution of Y_ν, and let t ≡ χ²_{ν,α}/ν be the corresponding quantile of W_ν. Define the random variable

    Q ≡ K²(1 + κ²) / [κ²(1 + θK²)],   (5)

where θ = θ(ν, α) is a known function. If we choose θ so that

    Pr(Q ≤ t) ≈ Pr(W_ν ≤ t),   (6)

then, because the distribution of W_ν is known and free of κ, we can use Q as an approximate pivot for constructing hypothesis tests and confidence intervals for κ. We define the accuracy of the approximation (6) to be e(t) ≡ p − α, where p ≡ Pr(Q ≤ t). Note that p is the actual confidence level of a one-sided confidence interval for κ², based on Q, having nominal confidence α. In Section 2 we give a Taylor series expansion for e(t) in powers of κ², leaving the details to the Appendix. We then consider four choices for θ, corresponding to the approximations of McKay (1932) and David (1949), to the "naive" approximate interval obtained by dividing the usual confidence interval on the standard deviation by the sample mean, and to a new interval closely related to McKay (1932).
McKay (1932) proposed that Q and W_ν are approximately equal in distribution when θ = ν/(ν + 1), but he was unable to investigate the small-sample distribution of Q. Consequently, Fieller (1932) and Pearson (1932) performed numerical and simulation studies, with satisfactory results. David (1949) proposed McKay's approximation with θ = 1; this suggestion has received much less attention than McKay (1932). Much later, Iglewicz and Myers (1970) compared selected quantiles of the approximate distribution for K, obtained from Q with McKay's choice of θ, with the corresponding exact values obtained using the noncentral t distribution. This numerical investigation demonstrated that McKay's approximation is very good, at least for n > 10 and 0 < κ < .3. Instead of examining differences in quantiles numerically, we will investigate differences in cdf's analytically, and thereby develop a deeper understanding of the small-sample properties of these approximations.
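A simulation in the spirit of Fieller's and Pearson's checks is immediate today. The following sketch is ours, not the paper's; the sample size, κ, and replication count are arbitrary illustrative values. It compares the empirical distribution of Q, computed with McKay's θ = ν/(ν + 1), with the distribution of W_ν.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(1)
    n, mu, kappa = 10, 1.0, 0.2                  # kappa = sigma/mu
    nu, theta = n - 1, (n - 1) / n               # McKay's theta = nu/(nu + 1)

    x = rng.normal(mu, kappa * mu, size=(200_000, n))
    K = x.std(axis=1, ddof=1) / x.mean(axis=1)   # sample coefficient of variation
    Q = K**2 * (1 + kappa**2) / (kappa**2 * (1 + theta * K**2))   # equation (5)

    for alpha in (0.05, 0.50, 0.95):
        t = chi2.ppf(alpha, nu) / nu             # quantile of W_nu = Y_nu/nu
        print(alpha, (Q <= t).mean())            # empirical Pr(Q <= t); close to alpha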
McKay arrived at his approximation by first deriving an expression for the density of S̃/|X̄|, where S̃² = (n − 1)S²/n. He then reexpressed this density as a contour integral, to which he applied an approximation closely related to the saddlepoint method (e.g., DeBruijn 1958, chap. 5) in order to obtain an approximate density. Since McKay's statistic is based on an asymptotic approximation, it can be expected to have good properties for large samples. It is likely that, had McKay performed the small-sample analysis of this article to O(κ²), he would have proposed a slightly different confidence interval that would not have required verification for small samples by "the direct, but nevertheless arduous, method of numerical test" (McKay 1932, p. 695) (i.e., numerical integration and Monte Carlo simulation), and the history of this problem as outlined in the previous paragraph would have been different.

Mark G. Vangel is Mathematical Statistician, Statistical Engineering Division, National Institute of Standards and Technology, Gaithersburg, MD 20899-0001. This work was completed while the author was with the Materials Directorate, Army Research Laboratory, Watertown, MA.
2. A TAYLOR SERIES FOR e(t)

Denote the distribution function of W_ν by H_ν(·), so that, for 0 < α < 1, H_ν(t) = α. Because u(x) ≡ x/(θx + 1) is a monotone function with inverse u⁻¹(y) = y/(1 − θy),

    p = Pr(Q ≤ t) = Pr[ K²/(1 + θK²) ≤ tκ²/(1 + κ²) ]
      = Pr[ K² ≤ tκ²/(1 + (1 − θt)κ²) ]
      = Pr[ (K/κ)² ≤ t/(1 + (1 − θt)κ²) ].   (7)

For a given choice of θ(ν, α) we have defined the accuracy of the corresponding approximation to be e(t) ≡ p − α. In the Appendix we show that

    e(t) = tH'_ν(t) { [ (θt − 1) + (ν(1 − t) − 1)/(ν + 1) ] κ² + c(θ, t, ν) κ⁴ } + O(κ⁶),   (8)

where

    c(θ, t, ν) = (2 + ν − νt)(1 − θt)²/4 − (1 − θt)[ν(1 − t)(ν(1 − t) − 1) − 2νt]/[2(ν + 1)]
                 + [−6 + 11ν − 6ν² + ν³ − 3νt + 6ν²t − 3ν³t + 3ν³t² − ν³t³]/[4(1 + ν)²].

For McKay's approximation θ = θ^(1) ≡ ν/(ν + 1), and (8) becomes

    e₁(t) = tH'_ν(t) { −2κ²/(ν + 1) + [ν(1 − t)(3 − ν) + 6ν − 2] κ⁴/[2(ν + 1)²] } + O(κ⁶).   (9)

David (1949) proposed Q with θ = θ^(2) = 1 as an approximate pivot for a normal coefficient of variation. The accuracy of David's approximation is

    e₂(t) = tH'_ν(t) { (t − 2)κ²/(ν + 1) + c(1, t, ν) κ⁴ } + O(κ⁶).   (10)

Another reasonable choice for θ is θ^(3) ≡ 1/t. Confidence intervals are obtained for this choice of θ by simply dividing the endpoints of the usual confidence interval for σ by X̄.
[Figure 1. The exact values of ρ(α) for n = 2, 5, 10, and 25, and α = .01, .05, .95, and .99, as functions of κ. Four panels, one per value of α, with κ from 0 to 0.5 on the horizontal axis and curves labeled n = 2, 5, 10, 25.]
The corresponding approximation has accuracy

    e₃(t) = tH'_ν(t) { [ν(1 − t) − 1] κ²/(ν + 1)
            + [−6 + 11ν − 6ν² + ν³ − 3νt + 6ν²t − 3ν³t + 3ν³t² − ν³t³] κ⁴/[4(1 + ν)²] } + O(κ⁶).   (11)

Finally, note that if

    θ^(4) = 2/[(ν + 1)t] + ν/(ν + 1) = [ν/(ν + 1)](1 + 2/χ²_{ν,α}),   (12)

then the O(κ²) term in (8) is zero, and we have an approximation with accuracy

    e₄(t) = tH'_ν(t) (1 − ν)[ν(1 − t) − 2] κ⁴/[2(ν + 1)²] + O(κ⁶).   (13)

We will refer to the approximations corresponding to these four choices of θ as Approximations 1-4, or the McKay, David, Naive, and Modified McKay approximations, respectively.
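The practical content of (8)-(13) is easiest to see numerically. The sketch below is ours (function and variable names are not from the paper); it evaluates the four choices of θ and the O(κ²) term of (8) for each. That term vanishes, up to rounding error, for θ^(4).

    import numpy as np
    from scipy.stats import chi2

    def theta_choices(nu, alpha):
        """McKay, David, naive, and Modified McKay values of theta."""
        t = chi2.ppf(alpha, nu) / nu
        return t, {1: nu / (nu + 1.0),
                   2: 1.0,
                   3: 1.0 / t,
                   4: 2.0 / ((nu + 1.0) * t) + nu / (nu + 1.0)}

    def leading_term(nu, alpha, kappa, theta):
        """O(kappa^2) term of e(t) in equation (8)."""
        t = chi2.ppf(alpha, nu) / nu
        dens = nu * chi2.pdf(nu * t, nu)         # H'_nu(t): density of W_nu at t
        return t * dens * ((theta * t - 1) + (nu * (1 - t) - 1) / (nu + 1)) * kappa**2

    nu, alpha, kappa = 4, 0.95, 0.1
    t, th = theta_choices(nu, alpha)
    for j in (1, 2, 3, 4):
        print(j, leading_term(nu, alpha, kappa, th[j]))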
3. DISCUSSION

If κ is small, as is usually the case in practice, e_j(t) will also be small for j = 1, ..., 4, so any of the above approximations will be satisfactory. For large samples θ^(j) → 1 for j ≠ 3; hence the three corresponding methods are asymptotically equivalent. Since McKay's method is asymptotically exact, these three methods are all exact in the limit of large n. Investigation of e₁(t) and e₂(t) demonstrates that David's approximation is not clearly better than McKay's, and in any case, McKay's method is much more often used than David's. Also, Approximation 3, although adequate if κ is sufficiently small, is substantially less accurate than the other three approximations. We will therefore not consider David's and the Naive approximations further, and restrict attention primarily to the McKay and Modified McKay approximations.

Denote e(·) regarded as a function of α by ẽ(·), that is, ẽ(α) ≡ e[H_ν⁻¹(α)]. The difference ρ(α) ≡ |ẽ₁(α)| − |ẽ₄(α)| will be positive when the Modified McKay approximation is more accurate than McKay's approximation, and negative otherwise. Hence this difference provides a means for comparing these two methods. Using the noncentral t distribution, it is straightforward to evaluate ρ(α) exactly, and this is preferable to using the approximate formulas of the previous section. In Figure 1, results are displayed of computing ρ(α) numerically for 50 values of κ between .01 and .5; for sample sizes of 2, 5, 10, and 25; and for α equal to .01, .05, .95, and .99. Note that the Modified McKay method is usually more accurate than McKay's method. The McKay method is more accurate than the Modified McKay only when α and κ are both large. But neither method is appropriate when κ is large, because a large κ corresponds to a nonnegligible probability of obtaining a negative observation, which invalidates any method based on (5). We note in passing that McKay suggests that his method only be used when one can assume that κ < .33 (McKay 1932).
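One way to carry out this exact computation is sketched below; the code is ours, not the paper's, and the function names are arbitrary. It uses the fact, derived in the Appendix, that the event Q ≤ t is equivalent to a tail event for the noncentral t variable √n X̄/S.

    import numpy as np
    from scipy.stats import chi2, nct

    def exact_e(n, kappa, alpha, theta):
        """Exact e(alpha) = Pr(Q <= t) - alpha, via the noncentral t distribution."""
        nu = n - 1
        t = chi2.ppf(alpha, nu) / nu
        q = t / (1 + (1 - theta * t) * kappa**2)      # equation (A.1)
        c = np.sqrt(n / q) / kappa                    # Q <= t  iff  |sqrt(n)*Xbar/S| >= c
        delta = np.sqrt(n) / kappa                    # noncentrality parameter
        p = nct.sf(c, nu, delta) + nct.cdf(-c, nu, delta)
        return p - alpha

    def rho(n, kappa, alpha):
        nu = n - 1
        t = chi2.ppf(alpha, nu) / nu
        th1 = nu / (nu + 1.0)                         # McKay
        th4 = 2.0 / ((nu + 1.0) * t) + th1            # Modified McKay
        return abs(exact_e(n, kappa, alpha, th1)) - abs(exact_e(n, kappa, alpha, th4))

    print(rho(5, 0.05, 0.05), rho(5, 0.25, 0.95))     # positive when Modified McKay is more accurate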
[Figure 2. The (exact) accuracies for the McKay and Modified McKay methods, as functions of α (plotted on a logit scale from .001 to .999), for n = 5 and κ = .05 and .25. Panel annotations: κ = .05 (|e(α)| < 9.1E-07); κ = .25 (|e(α)| < 4.2E-04).]
What is not clear from the differences displayed in Figure 1 is that, particularly when κ is small, the Modified McKay approximation is often extremely accurate: in fact, virtually exact. This point is made by Figure 2, which shows the accuracies of these two methods (as determined from the noncentral t distribution) as functions of α, for a sample size of 5, and for κ = .05 and κ = .25, respectively.

One might also conjecture from Figure 2 that McKay's method always provides a conservative one-sided interval. There is considerable numerical evidence to support this claim, but a proof is not available. Although the Modified McKay method can provide anticonservative one-sided confidence limits (i.e., the relevant accuracy curves in Figure 2 are above the horizontal axis for some values of α), the amount of this anticonservatism is always small. We can be certain of this because both the McKay and Modified McKay intervals are exact in the limit of large n, and both are based on the assumption that the probability of obtaining a negative observation is negligible. So we need only be concerned with small n and small κ. These cases can be investigated exhaustively, and the results summarized in Figures 1 and 2 are representative.
4. CONFIDENCE INTERVALS

In this section we illustrate how the approximate pivot (5) can be used to determine approximate confidence intervals for κ. We assume that κ is positive, and that the probability of K being negative is negligible.

A 100(1 − α)% approximate confidence interval based on (5) is

    A = { K[(θ₁t₁ − 1)K² + t₁]^(−1/2) ≤ κ ≤ K[(θ₂t₂ − 1)K² + t₂]^(−1/2) },   (14)

where t₁ = χ²_{ν,1−α/2}/ν and t₂ = χ²_{ν,α/2}/ν. One-sided intervals can be determined similarly. If we let u_i = νt_i, for i = 1, 2, then we can write the McKay and Modified McKay confidence intervals as

    A₁ = { K[(u₁/(ν + 1) − 1)K² + u₁/ν]^(−1/2) ≤ κ ≤ K[(u₂/(ν + 1) − 1)K² + u₂/ν]^(−1/2) }   (15)

and

    A₄ = { K[((u₁ + 2)/(ν + 1) − 1)K² + u₁/ν]^(−1/2) ≤ κ ≤ K[((u₂ + 2)/(ν + 1) − 1)K² + u₂/ν]^(−1/2) },   (16)

respectively. The modified interval A₄ differs only slightly from the original interval A₁. The analysis in this article has led to a simple adjustment to the classical interval that improves its small-sample accuracy while preserving the high accuracy that A₁ has for large n.
Note that if θ_i = θ^(3) = 1/t_i for i = 1, 2, then (14) becomes

    A₃ = ( K/√t₁ , K/√t₂ ),   (17)

which is the usual interval on σ, with the endpoints divided by X̄.

Because (1 + κ²)/κ² in (5) is a monotone function of κ², we can also use (5) to test the null hypothesis H₀: κ = κ₀ for some known κ₀. An endpoint of the interval (14) does not exist if t(θK² + 1) − K² < 0 or, equivalently,

    K² > t/(1 − θt).   (18)

In order for (18) to hold for the choices of θ considered in this article, either K² must be large or t must be small. Neither of these conditions is likely to occur in practice, except possibly when n and t are both very small. If κ is small but (18) holds, then one can either reduce the confidence level, increase the sample size, or else use the exact method based on the noncentral t distribution.
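A compact implementation of (15)-(17) requires only χ² quantiles. The sketch below is ours (the function name, its signature, and the method labels are arbitrary choices, not from the paper); the arguments of the square roots can become nonpositive only in the extreme situation described by (18).

    import numpy as np
    from scipy.stats import chi2

    def cv_interval(xbar, s, n, alpha=0.05, method="modified"):
        """Approximate CI for kappa; method is "mckay", "modified", or "naive"."""
        K, nu = s / xbar, n - 1
        u1 = chi2.ppf(1 - alpha / 2, nu)              # u_1 = chi^2_{nu, 1 - alpha/2}
        u2 = chi2.ppf(alpha / 2, nu)                  # u_2 = chi^2_{nu, alpha/2}
        if method == "naive":                         # equation (17)
            return K * np.sqrt(nu / u1), K * np.sqrt(nu / u2)
        shift = 2.0 if method == "modified" else 0.0  # (16) replaces u_i by u_i + 2 in the first term
        lo = K / np.sqrt(((u1 + shift) / (nu + 1) - 1) * K**2 + u1 / nu)
        hi = K / np.sqrt(((u2 + shift) / (nu + 1) - 1) * K**2 + u2 / nu)
        return lo, hi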
5. EXAMPLE

The tensile strengths of five specimens of a composite material are as follows (in 1000 psi): 326, 302, 307, 299, 329. We have X̄ = 312.6 and S = 13.94, so that K = .045, u₁ = χ²_{4,.975} = 11.14, and u₂ = χ²_{4,.025} = .4844. Equations (15) and (16) lead to confidence intervals on κ by the McKay and Modified McKay methods, respectively. For this example the Modified McKay procedure gives the 95% confidence interval (.0270, .1293), which differs from the McKay interval only in the fourth decimal place. The "naive" 95% confidence interval [based on (17)] is (.0267, .1281).
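Using the cv_interval sketch from Section 4, the example can be reproduced in a few lines. The output should be close to the intervals quoted above; exact agreement in every digit is not guaranteed, because the hand calculation starts from rounded quantities such as K = .045.

    import numpy as np

    strength = np.array([326.0, 302.0, 307.0, 299.0, 329.0])   # tensile strength, 1000 psi
    xbar, s, n = strength.mean(), strength.std(ddof=1), len(strength)

    for method in ("mckay", "modified", "naive"):
        print(method, cv_interval(xbar, s, n, alpha=0.05, method=method))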
6. CONCLUSION

A class of approximate pivotal quantities for a normal coefficient of variation related to the approximation of McKay (1932) has been investigated analytically, with particular emphasis on four special cases. The most important results are that, if κ denotes the population coefficient of variation, then the differences between the actual and nominal levels of McKay's (1932) confidence interval are of O(κ²), and that a very slight modification of McKay's method leads to an apparently new O(κ⁴) method that is usually superior to McKay (1932), and that is recommended.
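The small-sample behavior claimed here can also be spot-checked by simulation. A minimal sketch, re-using the cv_interval helper from Section 4 (all settings are illustrative; the empirical coverage should be close to the nominal 95%):

    import numpy as np

    rng = np.random.default_rng(2)
    n, mu, kappa, alpha = 5, 1.0, 0.10, 0.05
    x = rng.normal(mu, kappa * mu, size=(100_000, n))
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)

    lo, hi = cv_interval(xbar, s, n, alpha=alpha, method="modified")  # broadcasts over samples
    print("empirical coverage:", np.mean((lo <= kappa) & (kappa <= hi)))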

APPENDIX: DERIVATION OF EQUATION (8)

For most applications, κ will be small, so our plan is to let

    q = q(κ²) ≡ t / [1 + (1 − θt)κ²]   (A.1)

and to expand Pr[(K/κ)² ≤ q] in a Taylor series in κ², then to expand each term in this series again in powers of κ², using (A.1). We assume throughout that q is nonnegative; this imposes slight restrictions on α and κ that are not important in practice.

The random variables X̄ and S are equal in distribution to

    X̄ = μ + σZ/√n   (A.2)

and

    S = σ√(W_ν),   (A.3)

respectively, where Z ~ N(0, 1), and Z and W_ν are independent. Hence

    √n/K = (√n/κ)(1 + κZ/√n) W_ν^(−1/2) = T_{ν,δ},   (A.4)

where T_{ν,δ} denotes a noncentral-t random variable with degrees of freedom ν and noncentrality parameter δ ≡ √n/κ. By conditioning on Z and expanding in a Taylor series about Z = 0, we have that

    p ≡ Pr[(K/κ)² ≤ q] = E{ H_ν[q(1 + κZ/√n)²] }
      = H_ν(q) + qH'_ν(q) { [ν(1 − q) − 1] κ²/(ν + 1)
        + [−6 + 11ν − 6ν² + ν³ − 3νq + 6ν²q − 3ν³q + 3ν³q² − ν³q³] κ⁴/[4(ν + 1)²] } + O(κ⁶).   (A.5)

Using (A.1), the terms in (A.5) can now be expanded in powers of κ² about κ² = 0, giving

    H_ν(q) = H_ν(t) + t(θt − 1)H'_ν(t)κ² + [t(θt − 1)²/2][2H'_ν(t) + tH''_ν(t)]κ⁴ + O(κ⁶),   (A.6)

    H'_ν(q) = H'_ν(t) + t(θt − 1)H''_ν(t)κ² + O(κ⁴),   (A.7)

and

    q[ν(1 − q) − 1] = t[ν(1 − t) − 1] + t(θt − 1)(ν − 1 − 2νt)κ² + O(κ⁴).   (A.8)

Using the identity

    tH''_ν(t) = [ν(1 − t)/2 − 1] H'_ν(t),   (A.9)

substituting (A.6), (A.7), and (A.8) into (A.5), and collecting terms in κ² leads to (8).
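Identity (A.9) is the only distributional fact used beyond elementary calculus; it can be confirmed numerically in a few lines (the values of ν and t below are arbitrary, and the second derivative is approximated by a central difference):

    import numpy as np
    from scipy.stats import chi2

    nu, t, h = 7, 0.8, 1e-5
    dens = lambda w: nu * chi2.pdf(nu * w, nu)          # H'_nu(w), density of W_nu
    ddens = (dens(t + h) - dens(t - h)) / (2 * h)       # numerical H''_nu(t)

    print(t * ddens)                                    # left side of (A.9)
    print((nu * (1 - t) / 2 - 1) * dens(t))             # right side of (A.9)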


[Received July 1993. Revised November 1994.]

REFERENCES

David, F. N. (1949), "Note on the Application of Fisher's k-statistics," Biometrika, 36, 383-393.

DeBruijn, N. G. (1958), Asymptotic Methods in Analysis, New York: Interscience.

Fieller, E. (1932), "A Numerical Test of the Adequacy of A. T. McKay's Approximation," Journal of the Royal Statistical Society, 95, 699-702.

Iglewicz, B., and Myers, R. H. (1970), "On the Percentage Points of the Sample Coefficient of Variation," Technometrics, 12, 166-169.

Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: John Wiley.

McKay, A. T. (1932), "Distribution of the Coefficient of Variation and the Extended 't' Distribution," Journal of the Royal Statistical Society, 95, 695-698.

Pearson, E. S. (1932), "Comparison of A. T. McKay's Approximation with Experimental Sampling Results," Journal of the Royal Statistical Society, 95, 703.

