
14 ESTIMATION AND TEST OF SIGNIFICANCE

14.1 INTRODUCTION

The object of sampling is to study the features of the population on the basis of sample observations. A carefully selected sample is expected to reveal these features, and hence we shall infer about the population from a statistical analysis of the sample. This process is known as Statistical Inference.
There are two types of problems. Firstly, we may have no information at all about some characteristics of the population, especially the values of the parameters involved in the distribution, and it is required to obtain estimates of these parameters. This is the problem of Estimation. Secondly, some information or hypothetical values of the parameters may be available, and it is required to test how far the hypothesis is tenable in the light of the information provided by the sample. This is the problem of Test of Hypothesis or Test of Significance.

14.2 THEORY OF ESTIMATION

Suppose we have a random sample $x_1, x_2, \ldots, x_n$ on a variable $x$, whose distribution in the population involves an unknown parameter $\theta$. It is required to find an estimate of $\theta$ on the basis of sample values. The estimation is done in two different ways: (i) Point Estimation, and (ii) Interval Estimation. In point estimation, the estimated value is given by a single quantity, which is a function of sample observations (i.e. a statistic). This function is called the 'estimator' of the parameter, and the value of the estimator in a particular sample is called an 'estimate'. In interval estimation, an interval within which the parameter is expected to lie is given by using two quantities based on sample values. This is known as a Confidence Interval, and the two quantities which are used to specify the interval are known as Confidence Limits.

14.3 POINT ESTIMATION-CRITERIA FOR GOOD ESTIMATORS

Many functions of sample observations may be proposed as estimators of the same parameter. For example, either the mean or median or mode of the sample values may be used to estimate the parameter $\mu$ of the Normal distribution with p.d.f.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty,$$

which we shall in future refer to as $N(\mu, \sigma)$. Naturally we have to choose one among these estimators on the basis of certain criteria.

Estimation and Test of Significance 529
According to R.A. Fisher, the criteria for a good estimator are:
(i) Unbiasedness,
(ii) Consistency,
(iii) Efficiency,
(iv) Sufficiency.
Unbiasedness
A statistic $t$ is said to be an Unbiased Estimator of a parameter $\theta$ if the expected value of $t$ is $\theta$:

$$E(t) = \theta \qquad (14.3.1)$$

Otherwise, the estimator is said to be 'biased'. The bias of a statistic in estimating $\theta$ is given as

$$\text{Bias} = E(t) - \theta \qquad (14.3.2)$$
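Let $x_1, x_2, \ldots, x_n$ be a random sample drawn from a population with mean $\mu$ and variance $\sigma^2$. Then

$$\text{Sample mean } (\bar{x}) = \frac{\sum x_i}{n}, \qquad \text{Sample variance } (S^2) = \frac{\sum (x_i - \bar{x})^2}{n} \qquad (14.3.3)$$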

Theorem I The sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, because

$$E(\bar{x}) = \mu \qquad (14.3.4)$$
Theorem II The sample variance $S^2$ is a biased estimator of the population variance $\sigma^2$, because

$$E(S^2) = \frac{n-1}{n}\,\sigma^2 \qquad (14.3.5)$$

Theorem III An unbiased estimator of the population variance $\sigma^2$ is given by

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} \qquad (14.3.6)$$

because

$$E(s^2) = \sigma^2 \qquad (14.3.7)$$

Note the distinction between $S^2$ and $s^2$, in which only the denominators are different. $S^2$ is the variance of the sample observations, but $s^2$ is the 'unbiased estimator' of the variance $\sigma^2$ in the population.
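Theorems I to III can be checked exactly on a small case. The sketch below assumes a hypothetical population of three values {2, 4, 9} and samples of size $n = 3$ drawn with replacement; it enumerates all $3^3 = 27$ equally likely ordered samples and averages each statistic using exact fractions. The population values and helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def mean(xs):
    """Arithmetic mean as an exact Fraction."""
    return Fraction(sum(xs), len(xs))


def biased_var(xs):
    """S^2: sum of squared deviations divided by n, as in (14.3.3)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def unbiased_var(xs):
    """s^2: sum of squared deviations divided by n - 1, as in (14.3.6)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def exact_expectations(population, n):
    """Average each statistic over every equally likely ordered sample
    of size n drawn with replacement (srswr)."""
    samples = list(product(population, repeat=n))
    k = len(samples)
    e_mean = sum(mean(s) for s in samples) / k
    e_S2 = sum(biased_var(s) for s in samples) / k
    e_s2 = sum(unbiased_var(s) for s in samples) / k
    return e_mean, e_S2, e_s2


population = [2, 4, 9]            # hypothetical population values
n = 3                             # hypothetical sample size
mu = mean(population)             # population mean
sigma2 = biased_var(population)   # population variance (divisor = population size)

e_mean, e_S2, e_s2 = exact_expectations(population, n)
print(e_mean, e_S2, e_s2)         # 5 52/9 26/3
```

The averages reproduce (14.3.4), (14.3.5) and (14.3.7): $E(\bar{x}) = \mu = 5$, $E(S^2) = \tfrac{2}{3}\sigma^2 = 52/9$ and $E(s^2) = \sigma^2 = 26/3$.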
Example 14.1 Show that the sample mean based on a simple random sample with replacement (srswr) is an unbiased estimator of the population mean.
[C.U., B.A.(Econ) '78]
Statistical Methods
546

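$$\frac{nS^2}{\chi^2_{.005}} \le \sigma^2 \le \frac{nS^2}{\chi^2_{.995}}$$

or,

$$\frac{30.8}{20.3} \le \sigma^2 \le \frac{30.8}{0.99}$$

i.e. $1.52 \le \sigma^2 \le 31.1$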

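(10) For Variance-Ratio (means unknown)

If two independent random samples of sizes $n_1$ and $n_2$ are drawn from two Normal populations with unknown means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$ respectively, then

$$F = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

follows the F distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom. If $F_{.975}$ and $F_{.025}$ denote the lower and the upper 2.5% points of the F distribution, we have with probability 95% the following inequalities:

$$F_{.975} \le \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \le F_{.025} \qquad (14.5.12)$$

The 95% confidence interval for $\sigma_1^2/\sigma_2^2$ can be obtained from this as

$$\frac{1}{F_{.025}} \cdot \frac{s_1^2}{s_2^2} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{1}{F_{.975}} \cdot \frac{s_1^2}{s_2^2} \qquad (14.5.13)$$

where $s_1^2$ and $s_2^2$ denote unbiased estimators of $\sigma_1^2, \sigma_2^2$ respectively from the two samples. (see Example 14.51)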
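The limits in (14.5.13) divide the observed variance-ratio by the two tabled F points, so the upper F point gives the lower limit and vice versa. A minimal sketch, assuming the user supplies both percentage points from an F table for $(n_1 - 1, n_2 - 1)$ degrees of freedom; the function name and the numbers in the usage line are hypothetical.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_upper, f_lower):
    """95% confidence limits for sigma1^2 / sigma2^2, following (14.5.13).

    s1_sq, s2_sq : unbiased sample variances from the two samples
    f_upper      : F_.025, the upper 2.5% point of F(n1 - 1, n2 - 1)
    f_lower      : F_.975, the lower 2.5% point of F(n1 - 1, n2 - 1)
    Both F points are assumed to be read from an F table.
    """
    if not 0 < f_lower < f_upper:
        raise ValueError("expected 0 < F_.975 < F_.025")
    ratio = s1_sq / s2_sq
    # Dividing by the larger F point gives the smaller (lower) limit.
    return ratio / f_upper, ratio / f_lower


# Hypothetical values: s1^2 = 6, s2^2 = 3, F_.025 = 4.0, F_.975 = 0.25
lower, upper = variance_ratio_ci(6.0, 3.0, f_upper=4.0, f_lower=0.25)
print(lower, upper)  # 0.5 8.0
```

When a table lists only upper points, the lower point can be obtained from the standard identity $F_{.975}(\nu_1, \nu_2) = 1/F_{.025}(\nu_2, \nu_1)$.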
14.6 THEORY OF TEST OF SIGNIFICANCE

Statistical Hypothesis
In many practical problems, statisticians are called upon to make decisions about a statistical population on the basis of sample observations. For example, given a random sample, it may be required to decide whether the population from which the sample has been obtained is a normal distribution with mean = 40 and s.d. = 3. In attempting to reach such decisions, it is necessary to make certain assumptions or guesses about the characteristics of the population, particularly about the probability distribution or the values of its parameters. Such an assumption or statement about the population is called a Statistical Hypothesis. The validity of a hypothesis will be tested by analysing the sample. The procedure which enables us to decide whether a certain hypothesis is true or not is called Test of Significance or Test of Hypothesis.
Null Hypothesis and Alternative Hypothesis
In tests of significance, we start with a certain hypothesis about the population characteristics. This is called the Null Hypothesis, and is denoted by the symbol $H_0$. For example, the null hypothesis may be that the population mean is 40. We write

$$H_0(\mu = 40)$$

Any hypothesis which differs from the null hypothesis is called an Alternative Hypothesis, and is denoted by the symbol $H_1$. The null hypothesis is tested against an alternative hypothesis which, in the above case, may be either that the population mean is not 40, or that it is greater than 40, or that it is less than 40, i.e. any one of

$$H_1(\mu \ne 40), \qquad H_1(\mu > 40), \qquad H_1(\mu < 40)$$
The sample is then analysed to decide whether to 'reject' or 'not to reject' the null hypothesis $H_0$. For this purpose, we choose a suitable statistic, called the "Test Statistic", and find its sampling distribution, assuming that $H_0$ is really true. The 'observed value' of the statistic in the sample will in general be different from the 'expected value' because of sampling fluctuations. If the difference between them is large, the null hypothesis $H_0$ is rejected, and we doubt the validity of our assumption. If the difference is not large, $H_0$ is not rejected, and the difference may be considered to have arisen solely due to fluctuations of sampling. It is therefore necessary to decide how much difference is tolerable before we are able to conclude that the null hypothesis is acceptable.
Level of Significance and Critical Region
The decision about rejection or otherwise of the null hypothesis is based on probability considerations. Assuming the null hypothesis to be true, we calculate the probability of obtaining a difference equal to or greater than the observed difference. If this probability is found to be small, say less than .05, the conclusion is that the observed value of the statistic is rather unusual and has arisen because the underlying assumption (i.e. the null hypothesis) is not true. We say that the observed difference is significant at 5 per cent level, and hence the 'null hypothesis is rejected' at 5 per cent level of significance. If, however, this probability is not very small, say more than .05, the observed difference cannot be considered to be unusual and is attributed to sampling fluctuations only. The difference is, now, said to be not significant at 5 per cent level, and we conclude that 'there is no reason to reject the null hypothesis' at 5 per cent level of significance. It has become customary to use 5% and 1% levels of significance, although other levels, such as 2% or 0.5%, may also be used.
Without actually calculating this probability, the test of significance may be simplified as follows. From the sampling distribution of the statistic, we find the maximum difference which is exceeded in (say) 5 per cent of cases. If the observed difference is larger than this value, the null hypothesis is rejected. If it is less, there is no reason to reject the null hypothesis.
Suppose the sampling distribution of the statistic is a normal distribution. Since the area under the normal curve outside the ordinates at mean $\pm$ 1.96 (s.d.) is only 5%, the probability that the observed value of the statistic differs from the expected value by 1.96 times the S.E. or more is .05, and the probability of a larger difference will be still smaller. If, therefore,

$$z = \frac{(\text{Observed value}) - (\text{Expected value})}{\text{S.E.}} \qquad (14.6.1)$$

is either greater than 1.96 or less than $-1.96$ (i.e. numerically greater than 1.96), the null hypothesis $H_0$ is rejected at 5% level of significance. The set of values $z \ge 1.96$ or $z \le -1.96$, i.e. $|z| \ge 1.96$, constitutes what is called the Critical Region for the test. Similarly, since the area outside mean $\pm$ 2.58 (s.d.) is only 1%, $H_0$ is rejected at 1% level of significance if $z$ numerically exceeds 2.58, i.e. the critical region is $|z| \ge 2.58$ at 1% level.
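The rule above can be sketched in a few lines, assuming the statistic in (14.6.1) is exactly standard normal under $H_0$. The sketch uses Python's standard-library `statistics.NormalDist` in place of printed tables; the helper names and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, s.d. 1


def z_statistic(observed, expected, se):
    """(Observed value - Expected value) / S.E., as in (14.6.1)."""
    return (observed - expected) / se


def two_tailed_area(z):
    """P(|Z| >= |z|) for standard normal Z: the area in both tails beyond z."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def reject_two_tailed(z, critical=1.96):
    """True when z falls in the critical region |z| >= critical."""
    return abs(z) >= critical


# Areas outside mean +/- 1.96 s.d. and outside mean +/- 2.58 s.d.
print(round(two_tailed_area(1.96), 4), round(two_tailed_area(2.58), 4))

# Hypothetical case: observed value 43.1, expected value 40, S.E. 1.5
z = z_statistic(43.1, 40, 1.5)
print(round(z, 2), reject_two_tailed(z), reject_two_tailed(z, critical=2.58))
```

Here $z \approx 2.07$ is significant at 5% (it exceeds 1.96) but not at 1% (it does not exceed 2.58).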

Using the sampling distribution of an appropriate test statistic, we are thus able to establish the maximum difference, at a specified level, between the observed and expected values that is consistent with the null hypothesis $H_0$. The set of values of the test statistic corresponding to this difference, which leads to the acceptance of $H_0$, is called the Region of Acceptance. Conversely, the set of values of the test statistic leading to the rejection of $H_0$ is referred to as the Region of Rejection or "Critical Region" of the test. The value of the statistic which lies at the boundary of the regions of acceptance and rejection is called the Critical Value. When the null hypothesis is true, the probability of the observed value of the test statistic falling in the critical region is often called the "Size of Critical Region".

$$\text{Size of Critical Region} \le \text{Level of Significance}$$

However, for a continuous population, the critical region is so determined that its size equals the Level of Significance ($\alpha$).
Two-tailed and One-tailed Tests
Our discussions above were centred round testing the significance of 'difference' between the observed and expected values, i.e. whether the observed value is significantly different from (i.e. either larger or smaller than) the expected value, as could arise due to fluctuations of random sampling. In the illustration, the null hypothesis is tested against "both-sided alternatives" ($\mu > 40$ or $\mu < 40$), i.e.

$$H_0(\mu = 40) \text{ against } H_1(\mu \ne 40)$$

Thus, assuming $H_0$ to be true, we would be looking for large differences on both sides of the expected value, i.e. in "both tails" of the distribution. Such tests are, therefore, called "two-tailed tests".

Sometimes we are interested in tests for large differences on one side only, i.e. in 'one tail' of the distribution. For example, whether a new process of manufacture produces bricks with a 'higher' breaking strength, or whether a change in the production technique yields a 'lower' percentage of defectives. These are known as "one-tailed tests".
For testing the null hypothesis against "one-sided alternatives (right side)" ($\mu > 40$), i.e.

$$H_0(\mu = 40) \text{ against } H_1(\mu > 40)$$

the calculated value of the statistic $z$ is compared with 1.645, since 5% of the area under the standard normal curve lies to the right of 1.645. If the observed value of $z$ exceeds 1.645, the null hypothesis $H_0$ is rejected at 5% level of significance. If a 1% level were used, we would replace 1.645 by 2.33. Thus the critical regions for the test at 5% and 1% levels are $z \ge 1.645$ and $z \ge 2.33$ respectively.

For testing the null hypothesis against "one-sided alternatives (left side)" ($\mu < 40$), i.e.

$$H_0(\mu = 40) \text{ against } H_1(\mu < 40)$$

the value of $z$ is compared with $-1.645$ for significance at 5% level, and with $-2.33$ for significance at 1% level. The critical regions are now $z \le -1.645$ and $z \le -2.33$ for 5% and 1% levels respectively.
In fact, the sampling distributions of many of the commonly-used statistics can be approximated by normal distributions as the sample size increases, so that these rules are applicable in most cases when the sample size is 'large', say more than 30.
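The tabled critical values are standard normal quantiles: for a one-tailed test all of $\alpha$ goes in one tail, for a two-tailed test $\alpha/2$ goes in each. A minimal sketch using the standard-library `statistics.NormalDist.inv_cdf`; the function name and the string labels for the alternatives are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_value(alpha, alternative):
    """Boundary of the critical region for a z test at level alpha.

    alternative: "two-sided" -> region |z| >= returned value
                 "greater"   -> region  z  >= returned value
                 "less"      -> region  z  <= returned value (negative)
    """
    if alternative == "two-sided":
        return STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    if alternative == "greater":
        return STD_NORMAL.inv_cdf(1 - alpha)      # all of alpha in the right tail
    if alternative == "less":
        return STD_NORMAL.inv_cdf(alpha)          # all of alpha in the left tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


for alpha in (0.05, 0.01):
    print(alpha, [round(critical_value(alpha, a), 3)
                  for a in ("two-sided", "greater", "less")])
```

The exact 1% quantiles are about 2.576 and 2.326; the values 2.58 and 2.33 used in the text are these rounded to two decimals.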

It is evident that the same null hypothesis may be tested against alternative hypotheses of different types depending on the nature of the problem (see Table 14.1). Correspondingly, the type of test and the critical region associated with each test will also be different (Table 14.2).
Table 14.1 Formulation of Null Hypothesis and Alternative Hypothesis

Problem: Test whether $\mu = 40$, or   | Null Hypothesis  | Alternative Hypothesis
(1) $\mu$ is different from 40        | $H_0(\mu = 40)$  | $H_1(\mu \ne 40)$
(2) $\mu$ is more than 40             | $H_0(\mu = 40)$  | $H_1(\mu > 40)$
(3) $\mu$ is less than 40             | $H_0(\mu = 40)$  | $H_1(\mu < 40)$
Table 14.2 Type of Test and Critical Region

Alternative Hypothesis | Type of Alternative | Type of Test | Critical Region
$H_1(\mu \ne 40)$      | Both-sided          | Two-tailed   | Both Tails
$H_1(\mu > 40)$        | One-sided           | One-tailed   | Right Tail
$H_1(\mu < 40)$        | One-sided           | One-tailed   | Left Tail

Using the $z$-statistic, the critical regions at $\alpha$ level of significance are shown in Fig. 14.1.

[Fig. 14.1 Critical Regions for Two-tailed and One-tailed Tests. (1) Two-tailed Test: Critical Region $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$, i.e. $|z| \ge z_{\alpha/2}$ (Both Tails), with area $\alpha/2$ in each tail. (2) One-tailed Test: Critical Region $z \ge z_\alpha$ (Right Tail). (3) One-tailed Test: Critical Region $z \le -z_\alpha$ (Left Tail).]

Type I and Type II Errors
The procedure of testing statistical hypotheses does not guarantee that all decisions are perfectly accurate. At times, the tests may lead to erroneous conclusions. This is so because the decision is taken on the basis of sample values, which are themselves fluctuating and depend purely on chance. The errors in statistical decisions are of two types:
(i) Type I error: This is the error committed by the test in rejecting a true null hypothesis.
(ii) Type II error: This is the error committed by the test in accepting a false null hypothesis.

Considering the previous illustration for testing whether the population mean is 40, i.e. $H_0(\mu = 40)$, let us imagine that we have a random sample from a population whose mean is really 40. If we apply the test for $H_0(\mu = 40)$, we might find that the value of the test statistic lies in the critical region, thereby leading to the conclusion that the population mean is not 40, i.e. the test rejects the null hypothesis although it is true. We have thus committed what is known as "Type I error" or "Error of First Kind".

On the other hand, suppose that we have a random sample from a population whose mean is known to be different from 40, say 43. If we apply the test for $H_0(\mu = 40)$, the value of the test statistic may, by chance, lie in the acceptance region, leading to the conclusion that the mean may be 40, i.e. the test does not reject the null hypothesis $H_0(\mu = 40)$, although it is false. This is again another form of incorrect decision, and the error thus committed is known as "Type II error" or "Error of Second Kind".
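Table 14.3 Errors in Test of Significance

True Situation                  | Statistical Decision: Reject $H_0$ | Statistical Decision: Accept $H_0$
$H_0(\theta = \theta_0)$ true   | Type I error                       | Correct decision
$H_1(\theta = \theta_1)$ true   | Correct decision                   | Type II error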
Using the sampling distribution of the test statistic, we can measure in advance the probabilities of committing the two types of errors. Since the null hypothesis is rejected only when the test statistic falls in the critical region,

Probability of Type I error
= Probability of rejecting $H_0(\theta = \theta_0)$ when it is true
= Probability that the test statistic lies in the critical region, assuming $\theta = \theta_0$.

The probability of Type I error must not exceed the level of significance ($\alpha$) of the test:

Probability of Type I error $\le$ Level of Significance

The probability of Type II error assumes different values for different values of $\theta$ covered by the alternative hypothesis $H_1$. Since the null hypothesis is accepted only when the observed value of the test statistic lies outside the critical region,

Probability of Type II error (when $\theta = \theta_1$)
= Probability of accepting $H_0(\theta = \theta_0)$ when it is false
= Probability that the test statistic lies in the region of acceptance, assuming $\theta = \theta_1$.

The probability of Type I error is necessary for constructing a test of significance. It is in fact the 'Size of the Critical Region'. The probability of Type II error is used to measure the "power" of the test in detecting the falsity of the null hypothesis.

It is desirable that the test procedure be so framed as to minimise both types of error. But this is not possible, because for a given sample size an attempt to reduce one type of error is generally accompanied by an increase in the other type of error. The tests of significance are designed so as to limit the probability of Type I error to a specified value (usually 5% or 1%) and at the same time to minimise the probability of Type II error. Note that when the population has a continuous distribution,

Probability of Type I error = Level of significance = Size of critical region
Power of a Test
The null hypothesis $H_0(\theta = \theta_0)$ is accepted when the observed value of the test statistic lies outside the critical region, as determined by the test procedure. Suppose that the true value of $\theta$ is not $\theta_0$ but another value $\theta_1$, i.e. a specified alternative hypothesis $H_1(\theta = \theta_1)$ is true. Type II error is committed if $H_0$ is not rejected, i.e. the test statistic lies outside the critical region. Hence the probability of Type II error is a function of $\theta_1$, because now $\theta = \theta_1$ is assumed to be true. If $\beta(\theta_1)$ denotes the probability of Type II error when $\theta = \theta_1$ is true, the complementary probability $1 - \beta(\theta_1)$ is called the Power of the test against the specified alternative $H_1(\theta = \theta_1)$:

Power = 1 $-$ Probability of Type II error
= Probability of rejecting $H_0$ when $H_1$ is true

Obviously, we would like a test to be as 'powerful' as possible for all critical regions of the same size. Treated as a function of $\theta$, the expression $P(\theta) = 1 - \beta(\theta)$ is called the Power Function of the test for $\theta_0$ against $\theta$. The curve obtained by plotting $P(\theta)$ against all possible values of $\theta$ is known as the Power Curve.
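The power function can be written in closed form for one simple case. The sketch below assumes a Normal population with known s.d. $\sigma$ and the right-tailed test of $H_0(\mu = \mu_0)$ against $H_1(\mu > \mu_0)$ based on the sample mean; the setting $\mu_0 = 40$, $\sigma = 3$, $n = 9$ is hypothetical, loosely echoing the illustration in the text, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_right_tailed(mu, mu0, sigma, n, alpha=0.05):
    """Power P(mu) of the right-tailed z test of H0(mu = mu0) vs H1(mu > mu0).

    Assumes a Normal population with known s.d. sigma, so the sample mean
    has S.E. = sigma / sqrt(n).  H0 is rejected when
    z = (xbar - mu0) / S.E. >= z_alpha.  If the true mean is mu, then
    z ~ N((mu - mu0) / S.E., 1), hence
        P(mu) = 1 - Phi(z_alpha - (mu - mu0) / S.E.).
    """
    se = sigma / sqrt(n)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - (mu - mu0) / se)


def type_ii_error(mu, mu0, sigma, n, alpha=0.05):
    """beta(mu) = 1 - P(mu) for a true mean mu covered by H1."""
    return 1 - power_right_tailed(mu, mu0, sigma, n, alpha)


# Points on the power curve for the hypothetical setting (S.E. = 1).
for mu in (40, 41, 42, 43, 44):
    print(mu, round(power_right_tailed(mu, mu0=40, sigma=3, n=9), 3))
```

At $\mu = \mu_0$ the power equals $\alpha$, the size of the critical region, and it rises towards 1 as the true mean moves further into the alternative.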
Example 14.19 The fraction of defective items in a large lot is P. To test the null hypothesis $H_0: P = 0.2$, one considers the number $f$ of defectives in a sample of 8 items, and accepts the hypothesis if $f \le 6$, and rejects the hypothesis otherwise. What is the probability of type I error of this test? What is the probability of type II error corresponding to P = 0.1? [W.B.H.S., '79]

Solution We are going to test whether the fraction (P) of defectives in the lot is 0.2 or not.
Null hypothesis $H_0(P = 0.2)$. Alternative hypothesis $H_1(P \ne 0.2)$.
The test procedure is as follows:
(1) Take a random sample of 8 items from the lot.
(2) Count the number of defectives ($f$) found in the sample.
(3) Accept $H_0$ if the number of defectives in the sample is 6 or less ($f \le 6$), and conclude that the fraction of defectives in the lot may be 0.2.
(4) Reject $H_0$ if the number of defectives actually obtained in the sample is 7 or 8. Conclusion: The fraction of defectives in the lot is not 0.2, i.e. $H_0$ is not tenable.

It may be seen that the number of defectives ($f$) in the sample is a random variable which follows the Binomial distribution with parameters $n = 8$ and $P$. The probability of $r$ defectives is $\binom{n}{r} P^r (1 - P)^{n-r}$.

Probability of Type I error
= Probability of rejecting $H_0$ when $H_0$ is true
= Probability of 7 or 8 defectives when $P = 0.2$
= $\binom{8}{7}(0.2)^7(0.8) + (0.2)^8$
= 0.00008448

Probability of Type II error
= Probability of accepting $H_0$ when a specified $H_1$ is true
= Probability of 6 or less defectives when $P = 0.1$
= 1 $-$ Probability of 7 or 8 defectives when $P = 0.1$
= $1 - \left[\binom{8}{7}(0.1)^7(0.9) + (0.1)^8\right]$
= 0.99999927
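The arithmetic of Example 14.19 can be reproduced directly from the Binomial probabilities. A minimal sketch using `math.comb` (Python 3.8+); the helper name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(r, n, p):
    """Probability of r defectives in a sample of n when the lot fraction is p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n = 8
reject = (7, 8)  # decision rule: reject H0 when f >= 7, accept when f <= 6

# Type I error: H0 is rejected although P = 0.2 is true.
alpha = sum(binom_pmf(r, n, 0.2) for r in reject)

# Type II error at P = 0.1: H0 is accepted (f <= 6) although P = 0.1.
beta = 1 - sum(binom_pmf(r, n, 0.1) for r in reject)

print(f"{alpha:.8f} {beta:.8f}")  # 0.00008448 0.99999927
```

Because the rejection region lies entirely in the right tail ($f \ge 7$), the test has almost no power ($1 - \beta \approx 7.3 \times 10^{-7}$) against $P = 0.1$, a value below the hypothesised 0.2.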

Definition of Useful Terms

(1) Statistical Hypothesis. Any statement or assertion about a statistical population or the values of its parameters is called a Statistical Hypothesis. There are two types of hypothesis: Simple and Composite.
(2) Simple Hypothesis. A statistical hypothesis which specifies the population completely (i.e. the probability distribution and all parameters are known) is called a Simple Hypothesis.
(3) Composite Hypothesis. A statistical hypothesis which does not specify the population completely (i.e. either the form of probability distribution or some parameters remain unknown) is called a Composite Hypothesis.
(4) Test of Hypothesis (or Test of Significance). A Test of Hypothesis is a procedure which specifies a set of "rules for decision" whether to 'accept' or 'reject' the hypothesis under consideration (i.e. the null hypothesis).
(5) Null Hypothesis. A statistical hypothesis which is set up (i.e. assumed) and whose validity is tested for possible rejection on the basis of sample observations is called a Null Hypothesis. It is denoted by $H_0$ and tested against alternatives. Tests of hypothesis deal with rejection or acceptance of the null hypothesis only.
(6) Alternative Hypothesis. A statistical hypothesis which differs from the null hypothesis is called an Alternative Hypothesis, and is denoted by $H_1$. The alternative hypothesis is not tested, but its acceptance (rejection) depends on the rejection (acceptance) of the null hypothesis. The choice of an appropriate critical region depends on the type of alternative hypothesis, viz. whether both-sided, one-sided (right/left) or a specified alternative.
(7) Test Statistic. A function of sample observations (i.e. a statistic) whose computed value determines the final decision regarding acceptance or rejection of $H_0$ is called a Test Statistic. The appropriate test statistic has to be chosen very carefully, and knowledge of its sampling distribution under $H_0$ (i.e. when the null hypothesis is true) is essential in framing the decision rules. If the value of the test statistic falls in the critical region, the null hypothesis is rejected.

(8) Critical Region. The set of values of the test statistic which lead to rejection of the null hypothesis is called the Critical Region of the test. The probability with which a true null hypothesis is rejected by the test is often referred to as the "Size" of the Critical Region. Geometrically, a sample $x_1, x_2, \ldots, x_n$ of size $n$ may be looked upon as just a point $x$, called the Sample point, within the region of all possible samples, called the Sample Space ($W$). The critical region is then defined as a subset ($w$) of those sample points which lead to the rejection of the null hypothesis.
(9) Level of Significance. The maximum probability with which a true null hypothesis is rejected is known as the Level of Significance of the test, and is denoted by $\alpha$. In framing decision rules, the level of significance is arbitrarily chosen in advance depending on the consequences of a statistical decision. Customarily, 5% or 1% level of significance is taken, although other levels such as 2% or 0.5% are also used. The level of significance $\alpha$ is used to indicate the upper limit of the probability of committing Type I error, i.e. the size of the critical region.
(10) Type I Error (or Error of First Kind). This is the error committed in rejecting a null hypothesis by the test when it is really true. The critical region is so determined that the probability of Type I error does not exceed the level of significance of the test.
(11) Type II Error (or Error of Second Kind). This is the error committed in accepting a null hypothesis by the test when it is really false. The probability of Type II error depends on the specified value of the alternative hypothesis, and is used in evaluating the efficiency of a test.
(12) Power. The probability of rejecting a false null hypothesis is called the Power of the test. Therefore, Power is the probability of drawing a correct conclusion by the test when the null hypothesis is false. For a specified value of the parameter consistent with the alternative hypothesis,

Power = 1 $-$ Probability of Type II error.

If the power is graphically plotted against all specified alternatives, the curve obtained is known as the Power Curve. The power of a test determines the extent to which a test can discriminate between true and false null hypotheses.
Steps in Test of Significance
(1) Set up the "Null Hypothesis" $H_0$ and the "Alternative Hypothesis" $H_1$ on the basis of the given problem. The null hypothesis usually specifies the values of some parameters involved in the population: $H_0(\theta = \theta_0)$. The alternative hypothesis may be any one of the following types: $H_1(\theta \ne \theta_0)$, $H_1(\theta > \theta_0)$, $H_1(\theta < \theta_0)$. The type of alternative hypothesis determines whether to use a two-tailed or one-tailed (right or left tail) test.
(2) State the appropriate "test statistic" $T$ and also its sampling distribution when the null hypothesis is true. In large sample tests, the statistic $z = (T - \theta_0)/\text{S.E.}$, which approximately follows the Standard Normal distribution, is often used. In small sample tests, the population is assumed to be Normal, and various test statistics are used which follow the Standard Normal, Chi-square, $t$ or $F$ distribution exactly.

(3) Select the "level of significance" $\alpha$ of the test, if it is not specified in the problem. This represents the maximum probability of committing a Type I error, i.e. of making a wrong decision by the test procedure when the null hypothesis is true. Usually a 5% or 1% level of significance is used. (If nothing is mentioned, use the 5% level.)
(4) Find the "critical region" of the test at the chosen level of significance. This represents the set of values of the test statistic which lead to rejection of the null hypothesis. The critical region always appears in one or both tails of the distribution, depending on whether the alternative hypothesis is one-sided or both-sided. The area in the tail(s) (called the 'size of the critical region') must be equal to the level of significance $\alpha$. For a one-tailed test, $\alpha$ appears in one tail, and for a two-tailed test, $\alpha/2$ appears in each tail of the distribution. The critical region is

$$T \ge T_{\alpha/2} \text{ or } T \le T_{1-\alpha/2} \qquad \text{when } H_1(\theta \ne \theta_0)$$
$$T \ge T_{\alpha} \qquad \text{when } H_1(\theta > \theta_0)$$
$$T \le T_{1-\alpha} \qquad \text{when } H_1(\theta < \theta_0)$$

where $T_\alpha$ is the value of $T$ such that the area to its right is $\alpha$.
(5) Compute the value of the test statistic $T$ on the basis of the sample data and the null hypothesis. In large sample tests, if some parameters remain unknown, they should be estimated from the sample.
(6) If the computed value of the test statistic $T$ lies in the critical region, "reject $H_0$"; otherwise "do not reject $H_0$". The decision regarding rejection or otherwise of $H_0$ is made after a comparison of the computed value of $T$ with the critical value (i.e. the boundary value of the appropriate critical region).
(7) Write the conclusion in plain non-technical language. If $H_0$ is rejected, the interpretation is: "the data are not consistent with the assumption that the null hypothesis is true and hence $H_0$ is not tenable". If $H_0$ is not rejected: "the data cannot provide any evidence against the null hypothesis and hence $H_0$ may be accepted to be true". The conclusion should preferably be given in the words stated in the problem.
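Steps (2) to (7) can be collected into one function for the large-sample case, where $z = (T - \theta_0)/\text{S.E.}$ is treated as standard normal under $H_0$. This is a sketch under that assumption: the function name, argument names and the wording of the returned conclusion are illustrative, and critical values come from the standard-library `statistics.NormalDist` instead of tables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def large_sample_z_test(observed, expected, se, alternative="two-sided", alpha=0.05):
    """Large-sample test of H0(theta = theta0), following steps (2)-(7).

    observed    : computed value of the statistic T
    expected    : theta0, the value of T expected under H0
    se          : standard error of T
    alternative : "two-sided", "greater" or "less" (type of H1, step 1)
    alpha       : level of significance (step 3)
    """
    # Step (4): boundary of the critical region at level alpha.
    if alternative == "two-sided":
        crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
    elif alternative == "greater":
        crit = STD_NORMAL.inv_cdf(1 - alpha)       # alpha in the right tail
    elif alternative == "less":
        crit = STD_NORMAL.inv_cdf(alpha)           # alpha in the left tail (negative)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

    # Step (5): compute the test statistic from the sample.
    z = (observed - expected) / se

    # Step (6): does z lie in the critical region?
    if alternative == "two-sided":
        reject = abs(z) >= crit
    elif alternative == "greater":
        reject = z >= crit
    else:
        reject = z <= crit

    # Step (7): plain-language conclusion.
    conclusion = ("data not consistent with H0; H0 is not tenable" if reject
                  else "no evidence against H0; H0 may be accepted")
    return z, crit, reject, conclusion


# Hypothetical case: sample mean 41.2 against mu0 = 40 with S.E. 0.5.
z, crit, reject, conclusion = large_sample_z_test(41.2, 40, 0.5)
print(round(z, 2), round(crit, 3), reject, conclusion)
```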
14.7 LARGE SAMPLE TESTS
The following tests of significance are valid only for large sample sizes. These tests are also called "approximate tests", because the sampling distributions used are only approximately true when the number of observations in the sample is large (say, more than 30). The accuracy, however, increases with larger sample sizes.

(A) Using Normal Distribution

(1) Test for a specified proportion
A random sample of size $n$ shows that the proportion of members possessing a certain attribute (e.g. defectives) is $p$. It is required to test the hypothesis that the proportion $P$ in the population has a specified value $P_0$:

$$H_0(P = P_0)$$

If the sample size $n$ is large, we use the test statistic

$$z = \frac{p - P_0}{\text{S.E. of } p}, \qquad \text{S.E. of } p = \sqrt{\frac{P_0(1 - P_0)}{n}} \qquad (14.7.1)$$

which approximately follows the standard normal distribution.
For testing significance at 5% level, the rules are as follows:
(i) If the alternative hypothesis is that the population proportion $P$ is 'different' from $P_0$, reject $H_0$ when the value of $z$ lies outside the range $-1.96$ to $1.96$.
$H_1(P \ne P_0)$: Critical Region $|z| \ge 1.96$
(ii) If the alternative hypothesis is that the population proportion $P$ is 'greater' than $P_0$, reject $H_0$ when the value of $z$ is greater than 1.645.
$H_1(P > P_0)$: Critical Region $z \ge 1.645$
(iii) If the alternative hypothesis is that the population proportion $P$ is 'less' than $P_0$, reject $H_0$ when the value of $z$ is less than $-1.645$.
$H_1(P < P_0)$: Critical Region $z \le -1.645$
Otherwise, do not reject the null hypothesis $H_0$.
For testing at 1% level, the values 1.96, 1.645 and $-1.645$, given above, should be replaced by 2.58, 2.33 and $-2.33$ respectively.
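Table 14.4 Rejection Rules for $H_0(P = P_0)$

Alternative Hypothesis $H_1$ | Critical Region (5% level)   | Critical Region (1% level)
$P \ne P_0$                  | $\lvert z \rvert \ge 1.96$   | $\lvert z \rvert \ge 2.58$
$P > P_0$                    | $z \ge 1.645$                | $z \ge 2.33$
$P < P_0$                    | $z \le -1.645$               | $z \le -2.33$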
Confidence limits for the population proportion $P$ are (see 14.5.2)

95% confidence limits $= p \pm 1.96\,(\text{S.E. of } p)$
99% confidence limits $= p \pm 2.58\,(\text{S.E. of } p)$   (14.7.2)

where S.E. of $p = \sqrt{p(1 - p)/n}$ if $H_0$ is rejected.
Example 14.20 A dice was thrown 400 times and 'six' resulted 80 times. Do the data justify the hypothesis of an unbiased dice? [I.C.W.A. June '77]

Solution Let us assume that the dice is unbiased, i.e. the null hypothesis is that the probability of obtaining a 'six' with the dice is $\frac{1}{6}$:

$$H_0\left(P = \tfrac{1}{6}\right)$$

Alternatively, the dice is not unbiased, i.e. the alternative hypothesis is

$$H_1\left(P \ne \tfrac{1}{6}\right)$$

Since 'six' occurred 80 times out of 400, the observed value of the proportion ($p$) of 'six' is

$$p = \frac{80}{400} = 0.2$$

On the assumption that $H_0$ is true (i.e. the dice is unbiased), the expected value of the proportion ($P$) of 'six' is

$$P_0 = \frac{1}{6} = 0.167$$

$$\text{S.E. of } p = \sqrt{\frac{\frac{1}{6} \times \frac{5}{6}}{400}} = \frac{\sqrt{5}}{120} = .0186$$

Using the proportion of 'six' in the sample, our test statistic is

$$z = \frac{\text{Observed value} - \text{Expected value}}{\text{Standard Error}} = \frac{0.2 - 0.167}{.0186} = 1.77$$

When $H_0$ is true, the statistic $z$ follows the Standard Normal distribution. Since the value of $z$ does not fall in the Critical Region (critical region at 5% level: $|z| \ge 1.96$), it is not significant at 5% level. We have, therefore, no reason to reject the null hypothesis, and conclude that the dice may be unbiased.

Example 14.21 In a big city 325 men out of 600 were found to be smokers. Does this information support the conclusion that the majority of men in this city are smokers? (State the hypothesis clearly.) [I.C.W.A., June '82]

Solution Null Hypothesis is that the proportion of smokers in the whole city is 50%, i.e. $\frac{50}{100} = 0.5$:

$$H_0(P = 0.5)$$

We are interested to see if the proportion of smokers is more than 50%, i.e. the Alternative Hypothesis is

$$H_1(P > 0.5)$$

The proportion of smokers in the sample is 325 out of $n = 600$. Using the sample proportion,

Observed Value $(p) = \frac{325}{600} = 0.542$

If the null hypothesis $H_0$ is true,

Expected Value $(P_0) = 0.5$

and

$$\text{S.E. of } p = \sqrt{\frac{0.5(1 - 0.5)}{600}} = 0.0204$$

The test statistic is

$$z = \frac{\text{Observed value} - \text{Expected value}}{\text{S.E.}} = \frac{0.542 - 0.5}{0.0204} = 2.1$$

Since the alternative hypothesis $H_1(P > 0.5)$ is one-sided, the Critical Region of the test is one-tailed. At 5% level of significance,

Critical Region is $z \ge 1.645$

(Note that the tail-area of the Standard Normal curve for $z \ge 1.645$ is 5%.)
The value of the test statistic, viz. 2.1, lies in the critical region, and hence is "significant". We therefore reject the null hypothesis at 5% level of significance and conclude that the data support the hypothesis that the majority of men in the city are smokers.
Example 14.22 A manufacturer claimed that at least 90% of the components which he supplied conformed to specifications. A random sample of 200 components showed that only 164 were up to the standard. Test his claim at 1% level of significance.

Solution Null hypothesis is that the proportion of components conforming to specifications is 90%, i.e.

$$H_0(P = 0.9)$$

The alternative hypothesis is that it is less than 90%, i.e. $H_1(P < 0.9)$. The proportion of articles conforming to specifications is

Observed value $= \frac{164}{200} = 0.82$

Assuming that $H_0$ is true,

Expected value $= 0.9$

$$\text{Standard Error} = \sqrt{\frac{0.9 \times 0.1}{200}} = .0212$$

The value of the test statistic is

$$z = \frac{0.82 - 0.9}{.0212} = -3.77$$

Since the value of $z$, viz. $-3.77$, is less than $-2.33$ (critical region at 1% level is $z \le -2.33$), the null hypothesis is rejected. The test therefore reveals that the manufacturer's claim is not justified.
[Note: (1) Examples 14.20 to 14.22 all relate to "Test for a specified proportion", but the tests are different depending on the type of alternative hypothesis. Example 14.20 shows a "two-tailed" test, Example 14.21 a "one-tailed" (right tail) test and Example 14.22 a "one-tailed" (left tail) test.
(2) As far as possible, the Examples below illustrate various types of tests: two-tailed tests, followed by the two kinds of one-tailed tests.]

(2) Test for equality of two proportions  Let $p_1$ and $p_2$ be the proportions in two large random samples of sizes $n_1$ and $n_2$ drawn respectively from two populations. We are interested to test whether the proportions in the two populations are equal or not, i.e. whether the difference $(p_1 - p_2)$ as observed in the samples has arisen only due to fluctuations of sampling:
$$H_0(P_1 = P_2) \text{ against } H_1(P_1 \ne P_2)$$
If $H_0$ is true, i.e. the two population proportions are equal, say $P$, the estimate of $P$ is
$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \qquad (14.7.3)$$
The expected value of the difference $(p_1 - p_2)$ is then 0, and
$$\text{S.E. of } (p_1 - p_2) = \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad \text{where } p + q = 1$$


Statistical Methods

558
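If the sample sizes $n_1$ and $n_2$ are large, the test statistic
$$z = \frac{p_1 - p_2}{\text{S.E.}} \qquad (14.7.4)$$
approximately follows standard normal distribution. The critical regions for $H_1(P_1 \ne P_2)$ are
$|z| \ge 1.96$ at 5% level of significance
$|z| \ge 2.58$ at 1% level of significance
Confidence limits for $P_1 - P_2$ are (see 14.5.5)
$$\text{95\% confidence limits} = (p_1 - p_2) \pm 1.96\,(\text{S.E.})$$
$$\text{99\% confidence limits} = (p_1 - p_2) \pm 2.58\,(\text{S.E.}) \qquad (14.7.5)$$
where
$$\text{S.E.} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \quad \text{if } H_0 \text{ is rejected.}$$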
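Formulas (14.7.3) and (14.7.4) translate directly into code. The function below is a minimal sketch (its name and argument order are our own choice); it takes the counts $x_1 = n_1p_1$ and $x_2 = n_2p_2$, so the pooled estimate is simply the total count over the total sample size. The call uses the figures of Example 14.23 (20% of 900 and 15% of 1600).

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z statistic for H0(P1 = P2) using the pooled estimate p of (14.7.3)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)          # same as (n1*p1 + n2*p2) / (n1 + n2)
    q = 1 - p
    se = math.sqrt(p * q * (1 / n1 + 1 / n2))
    return (p1 - p2) / se              # (14.7.4)

z = two_proportion_z(180, 900, 240, 1600)   # 20% of 900, 15% of 1600
print(round(z, 2))                          # 3.21
print(abs(z) >= 2.58)                       # True: significant at 1% (two-tailed)
```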

Example 14.23  In a large city A, 20 per cent of a random sample of 900 school children had defective eye-sight. In another large city B, 15 per cent of a random sample of 1600 children had the same defect. Is this difference between the two proportions significant? Obtain 95% confidence limits for the difference in the population proportions.
Solution  Null hypothesis is that the population proportions are equal, $H_0(P_1 = P_2)$. Alternative hypothesis is that they are not equal, $H_1(P_1 \ne P_2)$.
$$n_1 = 900, \quad p_1 = 20\% = 0.20$$
$$n_2 = 1600, \quad p_2 = 15\% = 0.15$$
Under $H_0$, the estimate of the equal but unknown proportion $P$ is
$$p = \frac{900 \times 0.20 + 1600 \times 0.15}{900 + 1600} = 0.168$$
$$\text{S.E. of } (p_1 - p_2) = \sqrt{0.168 \times 0.832\left(\frac{1}{900} + \frac{1}{1600}\right)} = \sqrt{0.0002427} = 0.0156$$
$$z = \frac{0.20 - 0.15}{0.0156} = 3.21$$
Since the value of $z$ falls in the critical region $|z| \ge 2.58$, it is significant at 1% level. We therefore reject the null hypothesis $H_0$ and conclude that the difference between the two proportions is significant; i.e. $P_1$ and $P_2$ are not equal.
[Note: This is a two-tailed test.]
The 95% confidence limits for the difference $P_1 - P_2$ are
$$(p_1 - p_2) \pm 1.96\,(\text{S.E. of } (p_1 - p_2))$$
where S.E. of $(p_1 - p_2)$ is given by (13.7.4), page 160. But since $P_1$ and $P_2$ are not known, and the significance test has shown that $P_1 \ne P_2$, they are estimated by the sample proportions $p_1 = 0.20$ and $p_2 = 0.15$ respectively:
$$\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \quad \text{approximately}$$

$$\text{S.E.} = \sqrt{\frac{0.20 \times 0.80}{900} + \frac{0.15 \times 0.85}{1600}} = 0.016$$
Hence, the 95% confidence limits for $P_1 - P_2$ are
$$(0.20 - 0.15) \pm 1.96\,(0.016) = 0.05 \pm 0.031 = 0.019 \text{ and } 0.081$$
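Once $H_0$ is rejected, the limits use the unpooled S.E. $\sqrt{p_1q_1/n_1 + p_2q_2/n_2}$. A minimal sketch with a hypothetical helper `difference_limits`, reproducing the 95% limits of Example 14.23:

```python
import math

def difference_limits(p1, n1, p2, n2, multiplier=1.96):
    """Confidence limits for P1 - P2; multiplier 1.96 gives 95%, 2.58 gives 99%."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # unpooled S.E.
    d = p1 - p2
    return d - multiplier * se, d + multiplier * se

lower, upper = difference_limits(0.20, 900, 0.15, 1600)
print(round(lower, 3), round(upper, 3))   # 0.019 0.081
```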

Example 14.24  A machine produced 20 defective articles in a batch of 400. After overhauling, it produced 10 defectives in a batch of 300. Has the machine improved? [C.A., Nov. '81]

Solution  Null Hypothesis is that the proportions of defectives before and after overhauling are equal. Alternative Hypothesis is that the proportion of defectives has decreased after overhauling.
$$H_0(P_1 = P_2) \text{ against } H_1(P_1 > P_2)$$
We have
                Sample size       Sample proportion
Sample 1:       $n_1 = 400$       $p_1 = \frac{20}{400} = \frac{1}{20}$
Sample 2:       $n_2 = 300$       $p_2 = \frac{10}{300} = \frac{1}{30}$
Assuming that the null hypothesis is true, the estimate of the common proportion in the population is
$$p = \frac{20 + 10}{400 + 300} = \frac{3}{70}$$

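Taking the difference of proportions $(p_1 - p_2)$,
$$\text{Observed value} = \frac{1}{20} - \frac{1}{30} = 0.0167$$
$$\text{Expected value} = 0 \quad \text{(the two proportions assumed equal)}$$
$$\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{3}{70} \times \frac{67}{70}\left(\frac{1}{400} + \frac{1}{300}\right)} = 0.0155$$
Using the test statistic (14.7.4),
$$z = \frac{\text{Observed value} - \text{Expected value}}{\text{S.E.}} = \frac{0.0167}{0.0155} = 1.08$$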
Since the alternative hypothesis $H_1(P_1 > P_2)$ is one-sided, the critical region is given by one tail of the Standard Normal curve. At 5% level, the Critical Region is $z \ge 1.645$. Since the observed value of $z$, viz. 1.08, does not fall in the critical region (i.e. it is not greater than 1.645), it is "not significant". We therefore have no reason to reject the null hypothesis, and conclude that the two population proportions may be equal; i.e. the data do not show that the machine has improved after overhauling.
[Note: This is a one-tailed (right tail) test. The next one shows a one-tailed (left tail) test.]
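The same pooled calculation for Example 14.24, written out step by step as a sketch. Only the right tail matters here, since $H_1(P_1 > P_2)$; the cut-off 1.645 is the 5% value used in the text.

```python
import math

n1, d1 = 400, 20    # before overhauling: batch size, defectives
n2, d2 = 300, 10    # after overhauling
p1, p2 = d1 / n1, d2 / n2               # 1/20 and 1/30
p = (d1 + d2) / (n1 + n2)               # pooled estimate 3/70 under H0
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(se, 4), round(z, 2))                             # 0.0155 1.08
print("significant" if z >= 1.645 else "not significant")    # not significant
```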

Example 14.25  In a certain city, 100 men in a sample of 400 were found to be smokers. In another city, the number of smokers was 300 in a random sample of 800. Does this indicate that there is a greater proportion of smokers in the second city than in the first?

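Solution  Null hypothesis $H_0(P_1 = P_2)$
Alternative hypothesis $H_1(P_1 < P_2)$
$$n_1 = 400, \quad p_1 = \frac{100}{400} = 0.25$$
$$n_2 = 800, \quad p_2 = \frac{300}{800} = 0.375$$
Under $H_0$, the estimate of the equal but unknown proportion is
$$p = \frac{100 + 300}{400 + 800} = \frac{1}{3}$$
The expected value of $(p_1 - p_2)$ is 0, and
$$\text{S.E. of } (p_1 - p_2) = \sqrt{\frac{1}{3} \times \frac{2}{3}\left(\frac{1}{400} + \frac{1}{800}\right)} = \sqrt{\frac{1}{1200}} = 0.029$$
$$z = \frac{0.25 - 0.375}{0.029} = -4.31$$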
Here the alternative hypothesis $H_1(P_1 < P_2)$ is one-sided, and for this one-tailed test the critical regions are
$z \le -1.645$ at 5% level of significance
$z \le -2.33$ at 1% level of significance
Since the observed value of $z$ is less than $-2.33$, it is significant at 1% level. We reject the null hypothesis and conclude that the proportion of smokers is greater in the second city than in the first.
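A check of Example 14.25 as a short sketch. Without rounding the S.E. to 0.029, the statistic comes out near $-4.33$ instead of $-4.31$; either way it lies well inside the left-tail critical region $z \le -2.33$.

```python
import math

x1, n1 = 100, 400    # smokers in the first city
x2, n2 = 300, 800    # smokers in the second city
p1, p2 = x1 / n1, x2 / n2              # 0.25 and 0.375
p = (x1 + x2) / (n1 + n2)              # pooled estimate 1/3 under H0
se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(se, 4), round(z, 2))   # 0.0289 -4.33
print(z <= -2.33)                  # True: significant at 1%
```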
[Note: Which Level of Significance to use, 5% or 1%?
(1) The critical region at 1% level is always included in the critical region at 5% level (i.e. the former is a subset of the latter). Therefore, if an observed value of the test statistic is found to be 'significant' at 1% level, it is no doubt significant at 5% level also. On the other hand, if the observed value is 'not significant' at 5% level, it is obviously not significant at 1% level.
(2) The following general rules may be followed in the choice of level of significance $\alpha$ for a test. Consider the critical regions both at 5% and 1% levels. If the observed value of the test statistic is
(a) not significant at 5% level, mention that only. It is unnecessary to consider significance at 1% level. See Examples 14.20, 24, 26, 40, etc.
(b) significant at 5% level, but not significant at 1% level, mention only that it is significant at 5% level. Do not mention that the observed value is not significant at 1% level. See Examples 14.21 and 42.
(c) significant at 1% level, mention that only. It is useless to mention significance at 5% level. See Examples 14.23, 25, 27 to 31, etc.
If the level of significance is mentioned in the problem, there is no choice but to use it. See Examples 14.22, 32, 33, 36, 39, 41, 43 to 45, etc.
(3) An observed value of the test statistic that is significant at 5% level is said to be simply "significant"; if it is significant at 1% level, it is said to be "highly significant".]
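Rules (2) and (3) above amount to a small decision procedure. The function below is an illustrative sketch only (the name `describe_z` and the `tail` labels are our assumptions); it uses the table values 1.96 and 2.58 for two-tailed tests and 1.645 and 2.33 for one-tailed tests, and reports only the strongest level at which $z$ is significant.

```python
CRITICAL = {"two": (1.96, 2.58), "right": (1.645, 2.33), "left": (1.645, 2.33)}

def describe_z(z: float, tail: str = "two") -> str:
    """Report a z statistic following rules (2)(a)-(c) and (3) of the Note."""
    c5, c1 = CRITICAL[tail]
    # Turn every case into a comparison of a single number with c5 and c1.
    stat = {"two": abs(z), "right": z, "left": -z}[tail]
    if stat >= c1:
        return "highly significant (1% level)"
    if stat >= c5:
        return "significant (5% level)"
    return "not significant (5% level)"

print(describe_z(2.1, "right"))    # Example 14.21: significant (5% level)
print(describe_z(1.08, "right"))   # Example 14.24: not significant (5% level)
print(describe_z(-4.31, "left"))   # Example 14.25: highly significant (1% level)
```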

(3) Test for a specified mean (large sample)  A random sample of large size $n$ gives a sample mean $\bar{x}$. It is required to test the hypothesis that the population mean $\mu$ has a specified value $\mu_0$:
$$H_0(\mu = \mu_0)$$
For large $n$, the sampling distribution of the sample mean $(\bar{x})$ is approximately normal. Therefore, when $H_0$ is true, the test statistic
$$z = \frac{\bar{x} - \mu_0}{\text{S.E. of } \bar{x}} \qquad (14.7.6)$$
approximately follows standard normal distribution. The critical region of the test, which depends on the nature of the alternative hypothesis, is given in the table below.
Table 14.5 Rejection Rules for $H_0(\mu = \mu_0)$
$$\begin{array}{l|c|c}
\text{Alternative Hypothesis} & \text{Critical Region, 5\% level} & \text{Critical Region, 1\% level} \\ \hline
H_1(\mu \ne \mu_0) & |z| \ge 1.96 & |z| \ge 2.58 \\
H_1(\mu > \mu_0) & z \ge 1.645 & z \ge 2.33 \\
H_1(\mu < \mu_0) & z \le -1.645 & z \le -2.33
\end{array}$$
Confidence limits for the population mean $\mu$ are (see 14.5.2)
$$\text{95\% confidence limits} = \bar{x} \pm 1.96\,(\text{S.E. of } \bar{x})$$
$$\text{99\% confidence limits} = \bar{x} \pm 2.58\,(\text{S.E. of } \bar{x}) \qquad (14.7.7)$$
where S.E. of $\bar{x} = \sigma/\sqrt{n}$. If the population s.d. $\sigma$ is not known, it is replaced by the sample s.d. $S$.
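The entries of Table 14.5 are percentage points of the standard normal curve. A sketch using the standard-library `statistics.NormalDist.inv_cdf` shows where they come from; the helper name `critical_values` is our own. The table's 2.58 and 2.33 are rounded forms of 2.576 and 2.326.

```python
from statistics import NormalDist

def critical_values(alpha: float) -> tuple:
    """Return (two-tailed, one-tailed) critical |z| for significance level alpha."""
    std = NormalDist()                       # N(0, 1)
    two_tailed = std.inv_cdf(1 - alpha / 2)  # reject if |z| >= this
    one_tailed = std.inv_cdf(1 - alpha)      # reject if z >= this (or z <= -this)
    return two_tailed, one_tailed

for alpha in (0.05, 0.01):
    two, one = critical_values(alpha)
    print(f"{alpha:.0%} level: two-tailed {two:.3f}, one-tailed {one:.3f}")
# 5% level: two-tailed 1.960, one-tailed 1.645
# 1% level: two-tailed 2.576, one-tailed 2.326
```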
(4) Test for equality of two means (large samples)  Suppose we have two independent random samples of large sizes $n_1$ and $n_2$ from two populations, and the sample means are $\bar{x}_1$ and $\bar{x}_2$ respectively. It is required to test whether the two populations have the same means.
The standard error of the difference of means $(\bar{x}_1 - \bar{x}_2)$ is
$$\text{S.E.} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (14.7.8)$$
where $\sigma_1$ and $\sigma_2$ are the standard deviations in the two populations. In order to test the hypothesis that the means are equal,
$$H_0(\mu_1 = \mu_2)$$
use the test statistic
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}} \qquad (14.7.9)$$
which approximately follows standard normal distribution. For testing equality of means against $H_1(\mu_1 \ne \mu_2)$, the critical regions are
$|z| \ge 1.96$ at 5% level of significance
$|z| \ge 2.58$ at 1% level of significance
Confidence limits for $\mu_1 - \mu_2$ are (see 14.5.4)
$$\text{95\% confidence limits} = (\bar{x}_1 - \bar{x}_2) \pm 1.96\,(\text{S.E.})$$
$$\text{99\% confidence limits} = (\bar{x}_1 - \bar{x}_2) \pm 2.58\,(\text{S.E.}) \qquad (14.7.10)$$
where S.E. is given by (14.7.8).
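Formulas (14.7.8) to (14.7.10) as a minimal sketch. The helper names are hypothetical; when the population s.d.s are unknown, the sample s.d.s would be passed in their place, as the examples below do. The call uses the figures of Example 14.30 (common $\sigma = 2.5$); the 99% limits printed at the end are only an illustration of (14.7.10), since that example does not ask for them.

```python
import math

def two_mean_z(m1, sd1, n1, m2, sd2, n2):
    """Return (z, S.E.) for H0(mu1 = mu2) with large samples."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # (14.7.8)
    return (m1 - m2) / se, se                        # (14.7.9)

def mean_difference_limits(m1, m2, se, multiplier=1.96):
    """(14.7.10): multiplier 1.96 for 95% limits, 2.58 for 99% limits."""
    d = m1 - m2
    return d - multiplier * se, d + multiplier * se

z, se = two_mean_z(67.5, 2.5, 1000, 68.0, 2.5, 2000)
print(round(se, 3), round(z, 2))    # 0.097 -5.16
print(abs(z) >= 2.58)               # True: significant at 1%

lower, upper = mean_difference_limits(67.5, 68.0, se, 2.58)
print(round(lower, 2), round(upper, 2))   # -0.75 -0.25
```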
Example 14.26  A sample of 400 male students is found to have a mean height of 171.38 cms. Can it be reasonably regarded as a sample from a large population with mean height 171.17 cms and s.d. 3.30 cms? [I.C.W.A., June '8]
Solution  We assume that the sample really comes from a large population with mean 171.17 cms and s.d. 3.30 cms. Null Hypothesis is
$$H_0(\mu = 171.17, \; \sigma = 3.30)$$
Alternative Hypothesis is that the sample does not come from such a population:
$$H_1(\mu \ne 171.17)$$
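Since the sample size $n = 400$ is large, the sample mean $(\bar{x})$ is approximately normally distributed.
$$\text{Observed value } (\bar{x}) = 171.38$$
$$\text{Expected value } (\mu_0) = 171.17$$
$$\text{S.E. of } \bar{x} = \frac{\sigma}{\sqrt{n}} = \frac{3.30}{\sqrt{400}} = 0.165$$
When $H_0$ is true, the $z$-statistic (14.7.6) follows $N(0, 1)$:
$$z = \frac{\text{Observed value} - \text{Expected value}}{\text{S.E.}} = \frac{171.38 - 171.17}{0.165} = 1.27$$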
Since the alternative hypothesis is two-sided (i.e. $\mu$ is either more than or less than 171.17), the critical region of the test is also two-tailed (i.e. given by both tails of the Standard Normal curve). At 5% level,
$$\text{Critical Region is } |z| \ge 1.96$$
The value of the test statistic $z$ (viz. 1.27) does not fall in the critical region and hence is "not significant". We therefore have no reason to reject the null hypothesis at 5% level of significance, and conclude that the sample may be regarded as having arisen from the given population.
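Example 14.26 as a short sketch, with the population s.d. $\sigma = 3.30$ known, so the S.E. is exactly $\sigma/\sqrt{n}$:

```python
import math

n, xbar, mu0, sigma = 400, 171.38, 171.17, 3.30   # Example 14.26, sigma known
se = sigma / math.sqrt(n)                          # 3.30 / 20 = 0.165
z = (xbar - mu0) / se                              # (14.7.6)

print(round(se, 3), round(z, 2))   # 0.165 1.27
print(abs(z) >= 1.96)              # False: not significant at 5%
```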

Example 14.27  An automatic machine was designed to pack exactly 2.0 kg of Vanaspati. A sample of 100 tins was examined to test the machine. The average weight was found to be 1.94 kg with Standard Deviation 0.10 kg. Is the machine working properly? [C.A., May]
Solution  Given: Sample size $(n) = 100$, Sample mean $(\bar{x}) = 1.94$ kg, Sample s.d. $(S) = 0.10$ kg.
It is required to test the hypothesis that the population mean is 2.0 kg.
Null hypothesis: $H_0(\mu = 2.0 \text{ kg})$
Alternative hypothesis: $H_1(\mu \ne 2.0 \text{ kg})$

Since the sample size $(n)$ is large, the sample mean $(\bar{x})$ is approximately normally distributed with mean $\mu$ and S.E. $= \sigma/\sqrt{n}$. However, since the population s.d. $\sigma$ is not known, an approximate value of S.E. is
$$\text{S.E.} = \frac{S}{\sqrt{n}} = \frac{0.10}{\sqrt{100}} = 0.01$$
Therefore,
$$z = \frac{1.94 - 2.0}{0.01} = -6$$
Since $|z| = 6$ exceeds 2.58, we reject the null hypothesis at 1% level of significance and conclude that the machine is not functioning properly.
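Example 14.27 as a sketch, with the sample s.d. $S$ standing in for the unknown $\sigma$. The last two lines also apply (14.7.7), which the example does not ask for; they show that 2.0 kg lies outside the 99% limits, which agrees with the rejection.

```python
import math

n, xbar, s, mu0 = 100, 1.94, 0.10, 2.0   # Example 14.27; sigma unknown
se = s / math.sqrt(n)                    # approximate S.E. = S / sqrt(n) = 0.01
z = (xbar - mu0) / se
print(round(z, 2))                       # -6.0

lower, upper = xbar - 2.58 * se, xbar + 2.58 * se   # 99% limits, (14.7.7)
print(round(lower, 4), round(upper, 4))             # 1.9142 1.9658
```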
[Note: Examples 14.26 and 27 are two-tailed tests for a specified mean, the former with $\sigma$ known, and the latter with $\sigma$ unknown. Example 14.28 is a one-tailed test ($\sigma$ unknown).]

Example 14.28  A machine part was designed to withstand an average pressure of 120 units. A random sample of size 100 from a large batch was tested and it was found that the average pressure which these parts can withstand is 105 units with a standard deviation of 20 units. Test whether the batch meets the specifications. [C.U., M.Com. '72, '74]
Solution  Null hypothesis: Mean pressure in the population which a machine part can withstand is 120 units. Alternative hypothesis: Mean pressure is smaller than 120 units.
$$H_0(\mu = 120) \text{ against } H_1(\mu < 120)$$
$$\text{S.E. of } \bar{x} = \frac{S}{\sqrt{n}} \text{ approximately (since } \sigma \text{ is not known)} = \frac{20}{\sqrt{100}} = 2$$
Here, the sample mean is $\bar{x} = 105$. Therefore,
$$z = \frac{105 - 120}{2} = -7.5$$
Since the value of $z = -7.5$ is smaller than $-2.33$ (the critical region at 1% level for the one-tailed test is $z \le -2.33$), we reject the null hypothesis at 1% level of significance and conclude that the average breaking strength in the population is less than 120 units, i.e. that the batch of machine parts does not meet the specification. (If the value is found to be significant at 1% level, we have stronger grounds for rejection of $H_0$ than at 5% level.)
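Example 14.28 as a sketch; the comparison uses the left tail only, because $H_1(\mu < 120)$:

```python
import math

n, xbar, s, mu0 = 100, 105, 20, 120   # Example 14.28; sigma unknown
se = s / math.sqrt(n)                 # 20 / 10 = 2.0
z = (xbar - mu0) / se                 # left-tailed test, H1(mu < 120)
print(z, z <= -2.33)                  # -7.5 True
```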

Example 14.29  In a certain factory there are two different processes of manufacturing the same item. The average weight in a sample of 250 items produced from one process is found to be 120 grammes with a S.D. of 12 grammes; the corresponding figures in a sample of 400 items from the other process are 124 and 14. Compute the Standard Error of the difference between the two sample means. Is this difference significant? Also find 99% confidence limits for the difference in the average weights of items produced by the two processes. [I.C.W.A., Dec. '78]
Solution  Null hypothesis is that the average weights of items produced from the two processes are equal. Alternative hypothesis is that the averages are not equal.
$$H_0(\mu_1 = \mu_2) \text{ against } H_1(\mu_1 \ne \mu_2)$$

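Sample sizes        $n_1 = 250$               $n_2 = 400$
Sample means        $\bar{x}_1 = 120$ gms     $\bar{x}_2 = 124$ gms
Sample S.D.s        $S_1 = 12$ gms            $S_2 = 14$ gms
Since the population S.D.s are not known, we use the sample S.D.s to calculate the Standard Error of the difference $(\bar{x}_1 - \bar{x}_2)$ between the two sample means (see 14.7.8):
$$\text{S.E. of } (\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{12^2}{250} + \frac{14^2}{400}} = 1.03$$
The observed value of the test statistic is
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}} = \frac{120 - 124}{1.03} = -3.9$$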
The critical region for the two-sided alternative $H_1(\mu_1 \ne \mu_2)$ is $|z| \ge 1.96$ at 5% level and $|z| \ge 2.58$ at 1% level of significance. Since $|z| = 3.9$ exceeds 2.58, we reject $H_0$ at 1% level of significance and conclude that the difference between the two means is significant; i.e. $\mu_1 \ne \mu_2$.
99% confidence limits for $\mu_1 - \mu_2$ are (see 14.7.10)
$$(\bar{x}_1 - \bar{x}_2) \pm 2.58\,(\text{S.E.}) = (120 - 124) \pm 2.58\,(1.03) = -4 \pm 2.66 = -1.34 \text{ and } -6.66$$
This means that the 99% confidence limits for the difference $\mu_2 - \mu_1$ are 1.34 and 6.66 gms.
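Example 14.29 recomputed as a sketch. Carrying full precision gives $z \approx -3.87$ (the text's $-3.9$ comes from rounding the S.E. to 1.03), and the 99% limits agree with the text to two decimals.

```python
import math

n1, m1, s1 = 250, 120, 12    # process 1: size, mean (gms), sample S.D.
n2, m2, s2 = 400, 124, 14    # process 2
se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # sample S.D.s replace unknown sigmas
z = (m1 - m2) / se
d = m1 - m2
lower, upper = d - 2.58 * se, d + 2.58 * se   # 99% limits for mu1 - mu2

print(round(se, 2), round(z, 2))          # 1.03 -3.87
print(round(lower, 2), round(upper, 2))   # -6.66 -1.34
```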

Example 14.30  The means of two large samples of sizes 1000 and 2000 are 67.5 and 68.0 respectively. Test the equality of means of the two populations, each with s.d. 2.5. (No credit if the null and alternative hypotheses are not stated. Assumptions should be stated clearly.) [I.C.W.A., June '82]
Solution  Null Hypothesis is that the means of the two populations are equal, and Alternative Hypothesis is that they are not equal.
$$H_0(\mu_1 = \mu_2) \text{ against } H_1(\mu_1 \ne \mu_2)$$
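Since the sample sizes are large, the appropriate test statistic is
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E. of } (\bar{x}_1 - \bar{x}_2)}$$
which follows Standard Normal distribution when $H_0$ is true. Given
$$n_1 = 1000, \quad \bar{x}_1 = 67.5; \qquad n_2 = 2000, \quad \bar{x}_2 = 68.0$$
Population s.d. $(\sigma) = 2.5$ (common for both). Using (13.7.3a),
$$\text{S.E. of } (\bar{x}_1 - \bar{x}_2) = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 2.5\sqrt{\frac{1}{1000} + \frac{1}{2000}} = 2.5\sqrt{0.0015} = 0.097$$
$$\text{Observed value of } z = \frac{67.5 - 68.0}{0.097} = -5.2$$
The alternative hypothesis is two-sided, and so the Critical Region of the test is given by both tails of the Standard Normal curve. At 1% level,
$$\text{Critical Region: } |z| \ge 2.58$$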
In the present case, $|z| = |-5.2| = 5.2$ is actually greater than 2.58. So, we reject the null hypothesis $H_0$ at 1% level of significance and conclude that the means of the two populations are not equal.

[Note: Examples 14.29 to 31 are all two-tailed tests for equality of two means. The population s.d.s are (i) unknown in Example 14.29; (ii) known and equal, but sample s.d.s are not given in Example 14.30; and (iii) known and equal, but the sample s.d.s are also given in Example 14.31.]

Example 14.31  The mean yield of wheat from a district A was 210 lbs. with S.D. 10 lbs. per acre from a sample of 100 plots. In another district B, the mean yield was 220 lbs. with S.D. = 12 lbs. from a sample of 150 plots. Assuming that the standard deviation of yield in the entire state was 11 lbs., test whether there is any significant difference between the mean yield of crops in the two districts. [C.A., May '76]
Solution  Let $\mu_1$ and $\mu_2$ denote the mean yields of crops in districts A and B respectively.
$$H_0(\mu_1 = \mu_2) \text{ against } H_1(\mu_1 \ne \mu_2)$$
Given
$n_1 = 100$, $n_2 = 150$
$\bar{x}_1 = 210$ lbs., $\bar{x}_2 = 220$ lbs.
$S_1 = 10$ lbs., $S_2 = 12$ lbs.
Population S.D. $(\sigma) = 11$ lbs.
We may assume that the S.D. of yield in the whole state is the S.D. of yields in the two districts. That is, the two populations have the same S.D. $(\sigma) = 11$ lbs. Using (13.7.3a),
$$\text{S.E. of } (\bar{x}_1 - \bar{x}_2) = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 11\sqrt{\frac{1}{100} + \frac{1}{150}} = 1.42$$
The observed value of the test statistic is
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\text{S.E.}} = \frac{210 - 220}{1.42} = -7.04$$
Since $|z| = 7.04$ exceeds 2.58, we reject $H_0$ at 1% level and conclude that there is a significant difference in the mean yields of crops in the two districts.
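When $\sigma_1 = \sigma_2 = \sigma$, (14.7.8) reduces to $\sigma\sqrt{1/n_1 + 1/n_2}$, so the sample S.D.s 10 and 12 are not needed in Example 14.31. A sketch of that calculation:

```python
import math

sigma = 11            # common population S.D. of yield (lbs)
n1, m1 = 100, 210     # district A
n2, m2 = 150, 220     # district B
se = sigma * math.sqrt(1 / n1 + 1 / n2)   # equal-sigma form of (14.7.8)
z = (m1 - m2) / se

print(round(se, 2), round(z, 2))   # 1.42 -7.04
```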
Tests Using Chi-square ($\chi^2$) Distribution
The Chi-square distribution (Page 519) is used in both large sample and small sample tests. It is mainly used in
(1) Test for goodness of fit.
(2) Test for independence of attributes.
(3) Test for a specified standard deviation (Small Sample test).
(1) Test for goodness of fit (Pearsonian $\chi^2$)  The test, devised by Pearson, is used to decide whether the observations are in good agreement with a hypothetical distribution, i.e. whether the sample may be supposed to have arisen from a specified population.
The observed frequencies $(f_0)$ of different classes are compared with the expected frequencies $(f_e)$ by the test statistic
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} \qquad (14.7.11)$$
This is called "Pearsonian Chi-square" or "goodness-of-fit chi-square". When the null hypothesis, viz. $H_0$ (data are in agreement with the hypothetical population), is true, the statistic approximately follows chi-square distribution with $(k - 1)$ degrees of freedom, where $k$ is the number of classes (this holds when no parameter of the hypothetical distribution is estimated from the sample). If the observed value of the statistic exceeds the tabulated value of $\chi^2$ at a given level, the null hypothesis is rejected.
Example 14.32  In his experiments on pea-breeding, Mendel obtained the following frequencies of seeds: Round and yellow, 315; Wrinkled and yellow, 101; Round and green, 108; Wrinkled and green, 32; Total, 556. Theory predicts that the frequencies should be in the proportions 9 : 3 : 3 : 1. Examine the correspondence between theory and the observations. (Given that the 5% value of $\chi^2$ for 3 d.f. is 7.815.)
Solution  On the assumption that the data agree with the theory, and that the divergences have arisen only due to sampling fluctuations, the probabilities for the classes should be $\frac{9}{16}, \frac{3}{16}, \frac{3}{16}, \frac{1}{16}$ respectively. The expected frequencies are calculated below:
Class                   Observed Frequency    Expected Frequency
Round and yellow        315                   $\frac{9}{16} \times 556 = 313$
Wrinkled and yellow     101                   $\frac{3}{16} \times 556 = 104$
Round and green         108                   $\frac{3}{16} \times 556 = 104$
Wrinkled and green       32                   $\frac{1}{16} \times 556 = 35$
Total                   556                   556
Note: The totals for Observed and Expected frequencies must be equal.
$$\chi^2 = \frac{(315 - 313)^2}{313} + \frac{(101 - 104)^2}{104} + \frac{(108 - 104)^2}{104} + \frac{(32 - 35)^2}{35}$$
$$= \frac{4}{313} + \frac{9}{104} + \frac{16}{104} + \frac{9}{35} = 0.01 + 0.09 + 0.15 + 0.26 = 0.51$$
Since there are 4 classes, degrees of freedom $= 4 - 1 = 3$. We are given that the 5% value of $\chi^2$ for 3 d.f. is 7.815. Since the observed value 0.51 is less than the tabulated value, the null hypothesis cannot be rejected. We conclude that the observations support the theory.
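Formula (14.7.11) as a small Python sketch (the function name is ours). It first reproduces the text's value with the rounded expected frequencies of Example 14.32, then repeats the calculation with the exact expected frequencies ($556 \times 9/16 = 312.75$, and so on). The exact version gives about 0.47; the conclusion is the same. Python's standard library has no chi-square distribution, so the tabulated 7.815 is still taken from the table.

```python
def pearson_chi_square(observed, expected):
    """Goodness-of-fit chi-square (14.7.11): sum of (f0 - fe)^2 / fe."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

observed = [315, 101, 108, 32]
df = len(observed) - 1                       # k - 1 = 3

rounded = [313, 104, 104, 35]                # expected frequencies used in the text
ratio = [9, 3, 3, 1]                         # theoretical proportions 9:3:3:1
exact = [sum(observed) * r / sum(ratio) for r in ratio]

chi2_rounded = pearson_chi_square(observed, rounded)
chi2_exact = pearson_chi_square(observed, exact)
print(df, round(chi2_rounded, 2), round(chi2_exact, 2))   # 3 0.51 0.47
print(exact)                                              # [312.75, 104.25, 104.25, 34.75]
print(chi2_exact < 7.815)                                 # True: H0 not rejected
```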

Example 14.33  A dice was thrown 60 times with the following results:
Face         1    2    3    4    5    6    Total
Frequency    6   10    8   13   11   12    60

Are the data consistent with the hypothesis that the dice is unbiased? (Given $\chi^2_{0.01} = 15.09$ for 5 degrees of freedom.)
Solution  Null hypothesis is that the dice is unbiased. Then the probability of each face is $\frac{1}{6}$, and the expected frequency is $60 \times \frac{1}{6} = 10$ for each face.
Observed Frequency $(f_0)$     6   10    8   13   11   12
Expected Frequency $(f_e)$    10   10   10   10   10   10
$$\chi^2 = \frac{16}{10} + \frac{0}{10} + \frac{4}{10} + \frac{9}{10} + \frac{1}{10} + \frac{4}{10} = 3.4$$
There are 6 classes; degrees of freedom $= 6 - 1 = 5$.
Since the observed value of $\chi^2$ (viz. 3.4) is less than the tabulated value 15.09 at 1% level for 5 degrees of freedom, we cannot reject the null hypothesis at 1% level of significance. The conclusion is that the data are in agreement with the hypothesis of an unbiased dice.
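The dice calculation as a sketch. The face frequencies used, 6, 10, 8, 13, 11, 12, are those of the example's table; they total 60 and reproduce $\chi^2 = 3.4$.

```python
observed = [6, 10, 8, 13, 11, 12]     # faces 1 to 6
n = sum(observed)                     # 60 throws
expected = [n / 6] * 6                # 10 per face for an unbiased dice
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(round(chi2, 2), len(observed) - 1)   # 3.4 5
print(chi2 < 15.09)                        # True: not significant at 1%
```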

Example 14.34  5 identical coins are tossed 320 times, and the number of heads appearing each time is recorded. The results are:
Number of Heads    0    1    2    3    4    5    Total
Frequency         14   45   80  112   61    8    320
Would you conclude that the coins are biased? (Given $\chi^2_{0.05} = 11.07$ and $\chi^2_{0.01} = 15.09$ for 5 degrees of freedom.)
Solution  This is a problem of testing the "goodness of fit" of a binomial distribution. The null hypothesis is that the coins are unbiased, i.e. the probability of obtaining a head is 1/2 for each coin:
$$H_0\ (\text{data support Binomial distribution with } p = 1/2)$$
If $H_0$ is true, i.e. the coins are assumed to be unbiased, the probability of obtaining $r$ heads in one throw of the set of 5 coins is given by the binomial distribution
$$P(r) = {}^5C_r \left(\frac{1}{2}\right)^r \left(\frac{1}{2}\right)^{5-r} = \frac{{}^5C_r}{32}$$
So, the expected frequency $(f_e)$ of $r$ heads in 320 tosses is
$$f_e = 320 \cdot P(r) = 10 \times {}^5C_r$$
These are shown below for different values of $r$:
Number of Heads $(r)$          0    1    2    3    4    5    Total
Observed Frequency $(f_0)$    14   45   80  112   61    8    320
Expected Frequency $(f_e)$    10   50  100  100   50   10    320



$$\chi^2 = \frac{4^2}{10} + \frac{5^2}{50} + \frac{20^2}{100} + \frac{12^2}{100} + \frac{11^2}{50} + \frac{2^2}{10}$$
$$= 1.6 + 0.5 + 4.0 + 1.44 + 2.42 + 0.4 = 10.36$$
There are 6 classes; so degrees of freedom $= 6 - 1 = 5$.
Since the observed value of $\chi^2$ (viz. 10.36) is less than the tabulated value at 5% level (viz. 11.07), we cannot reject $H_0$ at 5% level of significance. We therefore conclude that the coins may not be biased.
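The binomial expected frequencies $320 \times {}^5C_r / 32$ and the resulting $\chi^2$ can be checked with a short sketch using the standard-library `math.comb` (Python 3.8+):

```python
from math import comb

observed = [14, 45, 80, 112, 61, 8]      # frequencies of 0, 1, ..., 5 heads
tosses, coins = 320, 5
# Expected frequency of r heads: 320 * 5Cr / 2^5 = 10 * 5Cr under H0(p = 1/2)
expected = [tosses * comb(coins, r) / 2 ** coins for r in range(coins + 1)]
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(expected)                          # [10.0, 50.0, 100.0, 100.0, 50.0, 10.0]
print(round(chi2, 2), chi2 < 11.07)      # 10.36 True
```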

(2) Test for independence of attributes (Contingency Table)  In Statistics, sometimes we have to deal with "attributes" (see Note (1), Section 12) or qualitative characters of members which cannot be measured accurately, although the members can be divided into two or more categories with respect to the attributes.
Let us consider two attributes A and B, where A is shown in $m$ categories $A_1, A_2, \ldots, A_m$, and B in $n$ categories $B_1, B_2, \ldots, B_n$. The data can be shown in the form of a two-way table with $m$ rows and $n$ columns, as in a bivariate frequency distribution (Tables 9.2 and 9.7). This two-way frequency table for attributes is known as an $(m \times n)$ Contingency Table (see Example 14.35). The frequency of members belonging to both the categories $A_i$ and $B_j$ simultaneously is shown in the cell at the $i$-th row and $j$-th column, and denoted by $(A_iB_j)$.
Similarly, $(A_i)$ and $(B_j)$ denote the frequency of members belonging to categories $A_i$ and $B_j$ respectively, and $N$ the total frequency, as in the table below.
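(3 × 4) Contingency Table (Attribute A in rows, Attribute B in columns)
$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & B_3 & B_4 & \text{Total} \\ \hline
A_1 & (A_1B_1) & (A_1B_2) & (A_1B_3) & (A_1B_4) & (A_1) \\
A_2 & (A_2B_1) & (A_2B_2) & (A_2B_3) & (A_2B_4) & (A_2) \\
A_3 & (A_3B_1) & (A_3B_2) & (A_3B_3) & (A_3B_4) & (A_3) \\ \hline
\text{Total} & (B_1) & (B_2) & (B_3) & (B_4) & N
\end{array}$$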
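Often, the members are divided into two categories only in respect of each attribute, according to the presence or absence of attribute A (denoted by A and $\alpha$) and the presence or absence of attribute B (denoted by B and $\beta$). See Example 14.36.
(2 × 2) Contingency Table (Attribute A in columns, Attribute B in rows)
$$\begin{array}{c|cc|c}
 & A & \alpha & \text{Total} \\ \hline
B & (AB) = a & (\alpha B) = b & (B) = a + b \\
\beta & (A\beta) = c & (\alpha\beta) = d & (\beta) = c + d \\ \hline
\text{Total} & (A) = a + c & (\alpha) = b + d & N = a + b + c + d
\end{array}$$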
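The cell and marginal frequencies of a contingency table can be tallied from raw observations. In the sketch below each observation is a (row category, column category) pair; the function name, the category labels and the six observations are hypothetical, chosen only to show the layout of the 2 × 2 table with B and β as rows and A and α as columns.

```python
from collections import Counter

def contingency_table(pairs, row_levels, col_levels):
    """Return cell frequencies, row totals, column totals and N."""
    counts = Counter(pairs)                          # frequency of each (row, col) pair
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    row_totals = [sum(row) for row in table]         # e.g. (B), (beta)
    col_totals = [sum(col) for col in zip(*table)]   # e.g. (A), (alpha)
    return table, row_totals, col_totals, sum(row_totals)

# Hypothetical observations: (attribute B category, attribute A category)
pairs = [("B", "A"), ("B", "A"), ("B", "alpha"),
         ("beta", "A"), ("beta", "alpha"), ("beta", "alpha")]
table, rows, cols, n = contingency_table(pairs, ["B", "beta"], ["A", "alpha"])
print(table, rows, cols, n)   # [[2, 1], [1, 2]] [3, 3] [3, 3] 6
```

Here the cells give $a = 2$, $b = 1$, $c = 1$, $d = 2$ in the notation of the 2 × 2 table.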
