Chi Square


Chi-Square Test

MBMG-7104/ ITHS-2202/ IMAS-3101/ IMHS-3101

BA - Ravindra N. Shukla (Research Scholar) ABV-IIITM Gwalior


What is the Chi-Square test?

• The χ² test (pronounced "chi-square") was first used by Karl Pearson in the year 1900.

• The chi-square test is a non-parametric test: it is not based on any assumption about the distribution of any variable.

• The test statistic follows a specific distribution known as the chi-square distribution.

• In general, a test that measures the difference between what is observed and what is expected under an assumed hypothesis is called a chi-square test.
The formula for computing chi-square is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected or theoretical frequency.

• The quantity χ² describes the magnitude of the discrepancy between theory and observation,
• i.e., with the help of the χ² test we can know whether a given discrepancy between theory and observation can be attributed to chance or whether it results from the inadequacy of the theory to fit the observed facts.
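As a minimal sketch of the χ² formula, the statistic can be computed from paired lists of observed and expected frequencies. The function name `chi_square_statistic` and the coin-toss figures are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return sum((O - E)^2 / E) over paired observed/expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data (not from the slides): 100 coin tosses, 60 heads and 40 tails,
# compared with the 50/50 split expected from a fair coin.
chi2 = chi_square_statistic([60, 40], [50, 50])
print(chi2)  # (10^2 / 50) + (10^2 / 50) = 4.0
```

A larger value means a larger discrepancy between observation and theory; whether it is significant depends on the degrees of freedom and the table value.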
APPLICATIONS OF A CHI SQUARE TEST
This test can be used for:

1. Tests of hypothesis concerning variance
2. Significance test
3. Test of independence of attributes
4. Goodness of fit of distributions
5. Test of homogeneity


1) Tests of Hypothesis Concerning Variance
For a normally distributed population with population variance σ², if we draw a random sample of size n with sample variance s², then

$$\chi^2 = \frac{\sum (x - \bar{x})^2}{\sigma^2} = \frac{(n-1)\,s^2}{\sigma^2}$$

(taking s² with divisor n − 1), i.e. the statistic follows a χ² distribution with n − 1 degrees of freedom. The degrees of freedom (d.f.) of the χ² distribution are denoted by ν, so ν = n − 1.

In testing a hypothesis about the variance of a normally distributed population, the null hypothesis is

$$H_0: \sigma^2 = \sigma_0^2$$

where σ₀² is some specified value of the population variance.


Illustration 1:
Weights in kilograms of 10 shipments are given below:

38, 40, 45, 53, 47, 43, 55, 48, 52, 49.

Can we say that the variance of the distribution of weights of all shipments, from which the above sample of 10 shipments was drawn, is equal to 20 square kilograms?


Solution:
Let the null hypothesis be that the variance of the distribution of shipment weights is 20 square kilograms, i.e. H0: σ² = 20.

Mean of the sample: x̄ = ∑x / n = 470 / 10 = 47

Weight x (kg)   (x − x̄)        (x − x̄)²
38              −9              81
40              −7              49
45              −2              4
53              +6              36
47              0               0
43              −4              16
55              +8              64
48              +1              1
52              +5              25
49              +2              4
∑x = 470        ∑(x − x̄) = 0   ∑(x − x̄)² = 280

$$\chi^2 = \frac{\sum (x - \bar{x})^2}{\sigma^2} = \frac{280}{20} = 14$$

The degrees of freedom for the χ² distribution are ν = n − 1 = 10 − 1 = 9.

From the χ² table, the value for 9 d.f. at the 5% level of significance is 16.919.

Since the calculated value of χ² (14) is less than the tabulated value (16.919), it is not significant and the null hypothesis is accepted: the variance of the distribution of weights of all shipments in the population may be taken as 20 square kilograms.
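The arithmetic of Illustration 1 can be checked with a short sketch. The variable names are illustrative, and the critical value is simply the table figure quoted in the solution, not something computed here.

```python
# Sample of 10 shipment weights (kg) and hypothesised variance from Illustration 1.
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]
sigma0_sq = 20           # H0: population variance = 20 kg^2
critical_5pct = 16.919   # table value for 9 d.f. quoted in the solution

n = len(weights)
mean = sum(weights) / n
sum_sq_dev = sum((x - mean) ** 2 for x in weights)

chi2 = sum_sq_dev / sigma0_sq   # sum of squared deviations / hypothesised variance
dof = n - 1

print(mean, sum_sq_dev, chi2, dof)   # 47.0 280.0 14.0 9
print("reject H0" if chi2 > critical_5pct else "do not reject H0")
```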
2) SIGNIFICANCE TEST

This test enables us to test the significance of the null hypothesis using the χ² test.

Illustration 2 and its solution: [figures not reproduced]


3) TEST OF INDEPENDENCE OF ATTRIBUTES

This test enables us to determine whether or not two attributes are associated.

• For instance, the χ² test is useful when we want to know whether a new medicine is effective in controlling fever.

• In such a situation, we proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.
• In the test of independence, the population and sample are classified according to some attributes.
• To test the independence of attributes we use a contingency table:
• Let us designate the two attributes as A and B, where attribute A is assumed to have r categories and attribute B is assumed to have c categories.
• Furthermore, assume the total number of observations in the problem is N.
• These observations are arranged in a table in which O_ij represents the observation in the i-th row and j-th column. Such a table, in matrix form, is called an r × c contingency table.
Illustration 3:
The worked figures below correspond to a sample of 90 students classified by gender and choice of language, with these observed frequencies:

           French   Russian   Total
Male       39       16        55
Female     21       14        35
Total      60       30        90

The null hypothesis is that choice of language is independent of gender. The expected frequency of each cell is fe = (row total × column total) / N:

• fe for male students choosing French = 55 × 60 / 90 = 36.67
• fe for female students choosing French = 35 × 60 / 90 = 23.33
• fe for male students choosing Russian = 55 × 30 / 90 = 18.33
• fe for female students choosing Russian = 35 × 30 / 90 = 11.67
O     E       (O−E)²   (O−E)²/E
39    36.67   5.4289   0.148047
21    23.33   5.4289   0.2327
16    18.33   5.4289   0.296176
14    11.67   5.4289   0.465201

χ² = Σ(O−E)²/E = 1.142125 ≈ 1.142

For a contingency table, the degrees of freedom are calculated as:

ν = (r−1)(c−1) = (2−1)(2−1) = 1

From the χ² table, the value for ν = 1 (d.f. = 1) at the 5% level of significance is 3.84.

Since the calculated value of χ² (1.142) is less than the table value (3.84), the null hypothesis is accepted. Hence, there is no significant relationship between choice of language and gender.
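The independence calculation can be sketched in a few lines. The observed counts are the O column of the working above, arranged with rows as gender and columns as language; that layout and the variable names are assumptions made for illustration.

```python
# Observed counts from Illustration 3: rows = gender, columns = language.
observed = [
    [39, 16],  # male:   French, Russian
    [21, 14],  # female: French, Russian
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell: (row total * column total) / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 3), dof)
```

Because the expected frequencies are not rounded to two decimals here, the statistic comes out at about 1.145 rather than the 1.142 of the hand calculation; the conclusion at the 5% level is unchanged.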
Exercise:
1. A sample of 200 persons with a particular disease was selected. Out of these, 100 were given a drug and the others were not given any drug. The results are as follows:

                No. of persons
             Drug   No Drug   Total
Cured        65     55        120
Not cured    35     45        80
Total        100    100       200

Test whether the drug is effective or not.


Exercise:
2. A certain drug is claimed to be effective in curing colds. In an experiment on 500 persons with a cold, half of them were given the drug and half of them were given sugar pills. The patients' reactions to the treatment are recorded in the following table:

              Helped   Harmed   No effect   Total
Drug          150      30       70          250
Sugar pills   130      40       80          250
Total         280      70       150         500

On the basis of this data, can it be concluded that there is a significant difference in the effect of the drug and the sugar pills?


4) TEST OF GOODNESS OF FIT OF DISTRIBUTIONS:

• Tests of goodness of fit are used when we want to determine whether an actual sample distribution matches a known theoretical distribution.

• They enable us to ascertain how well a theoretical distribution such as the Binomial, Poisson or Normal fits an empirical distribution,

• i.e., how well the sample data fit a given distribution.
• The test formula for goodness of fit is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

• where O = observed frequency and E = expected or theoretical frequency.

• The null hypothesis is
  H0: the sample is drawn from the theoretical population distribution.

• The alternative hypothesis is
  Ha: the sample is not drawn from the theoretical population distribution.

• If χ² (calculated) < χ² (tabulated) with (n−1) d.f., where n is the number of classes, then the null hypothesis is accepted.

• If χ² (calculated) > χ² (tabulated) with (n−1) d.f., then the null hypothesis is rejected.

• If the null hypothesis is accepted, it can be concluded that the given distribution follows the theoretical distribution.


Illustration 4:
The number of spare parts required in a factory was found to vary from day to day. In a sample study, the following information was obtained:

Day                      Mon    Tue    Wed    Thu    Fri    Sat    Total
No. of parts demanded    1124   1125   1110   1120   1125   1115   6720

Test the hypothesis that the number of parts demanded does not depend on the day of the week.
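A goodness-of-fit check against a uniform distribution might be sketched as follows for Illustration 4. Two assumptions are worth flagging: the six daily figures as printed sum to 6719 while the table total reads 6720, so the sketch derives the total from the figures themselves; and the 5% critical value for 5 d.f. (11.070) is a standard table value that is not quoted in the slides.

```python
# Daily demand figures as printed in Illustration 4 (Mon..Sat).
observed = [1124, 1125, 1110, 1120, 1125, 1115]

total = sum(observed)                      # derived from the six figures, not copied from the table
expected_per_day = total / len(observed)   # H0: demand is the same on every day

chi2 = sum((o - expected_per_day) ** 2 / expected_per_day for o in observed)
dof = len(observed) - 1

# Standard 5% table value for 5 d.f. (assumption: not quoted in the slides).
critical_5pct = 11.070

print(total, round(expected_per_day, 2), round(chi2, 4), dof)
print("reject H0" if chi2 > critical_5pct else "do not reject H0")
```

The statistic is far below the critical value, so on these figures the hypothesis that demand does not depend on the day of the week would not be rejected.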


Illustration 5:
A survey of 320 families with 5 children each revealed the following distribution of births:

No. of boys born    5    4    3     2    1    0
No. of girls born   0    1    2     3    4    5
Families            14   56   110   88   40   12

Is this result consistent with the hypothesis that male and female births are equally probable?
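For Illustration 5 the expected frequencies come from a binomial distribution with 5 trials and probability 1/2 of a boy. The following is a sketch under that null hypothesis; the variable names are illustrative.

```python
from math import comb

# Observed number of families by number of boys (5, 4, 3, 2, 1, 0), from Illustration 5.
boys = [5, 4, 3, 2, 1, 0]
observed = [14, 56, 110, 88, 40, 12]

n_children = 5
p_boy = 0.5                     # H0: boys and girls are equally probable
n_families = sum(observed)      # 320

# Expected families with k boys: N * C(5, k) * p^k * (1 - p)^(5 - k).
expected = [
    n_families * comb(n_children, k) * p_boy**k * (1 - p_boy) ** (n_children - k)
    for k in boys
]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1

print(expected)              # [10.0, 50.0, 100.0, 100.0, 50.0, 10.0]
print(round(chi2, 2), dof)   # 7.16 5
```

With 5 d.f. the standard 5% table value is 11.070 (not quoted in the slides), so a statistic of 7.16 would not lead to rejection of the hypothesis of equally probable births.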


5) TEST OF HOMOGENEITY
• This test explores the proposition that several populations are homogeneous with respect to some characteristic of interest,

• i.e. it is used to test whether the occurrence of events is uniform or not.

• E.g. whether the admission of patients to a government hospital is uniform across all days of the week can be tested with the help of the chi-square test.


• The analytical procedure is the same as that discussed for the test of independence.

• It differs from the independence test in the following way:

• In the independence test we are concerned with whether the two attributes are independent or not,
• while in tests of homogeneity we are concerned with whether the different samples come from the same population or not.
• We use a contingency table and make the calculations just as in the independence test.

• If χ² (calculated) < χ² (tabulated), then the null hypothesis is accepted,

• and it can be concluded that there is homogeneity or uniformity in the occurrence of the events.


Illustration 6:
A random sample of 400 persons was selected from three age groups, and each person was asked to specify which of 3 types of TV program they prefer. The results are shown in the following table:

[Table: preferred type of TV program by age group; not reproduced]

Test the hypothesis that the populations are homogeneous with respect to the types of television program people prefer.
Conditions for the Application of the χ² Test

The following five basic conditions must be met in order for chi-square analysis to be applied:

(1) The experimental data (sample observations) must be independent of each other.

(2) The sample data must be drawn at random from the target population.

(3) The data should be expressed in original units for convenience of comparison, and not in percentage or ratio form.

(4) There should be no fewer than five observations in any one cell.

(5) With fewer than 5 observations in a cell, the value of χ² is overestimated, resulting in too many rejections of the null hypothesis.


YATES'S CORRECTION
• The chi-square distribution is a continuous distribution, but it is used with discrete data from a contingency table.

• When the expected frequencies are large, this approximate procedure is appropriate.

• For a 2 × 2 table in which the expected frequencies are small, a correction was proposed by F. Yates in the year 1934, called "Yates's correction for continuity". The correction consists of:

$$\chi^2 = \sum \frac{\left(|O - E| - 0.5\right)^2}{E}$$

• where |O−E| means the absolute difference.

• In general, the correction is made only when the number of degrees of freedom is ν = 1.
• For large samples, this yields practically the same results as the uncorrected χ².
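As a rough sketch, the corrected statistic can be computed for a 2 × 2 table as below. The function name `yates_chi_square` is hypothetical, and the table reuses the observed counts from Illustration 3 purely as sample input; the sketch assumes every |O − E| is at least 0.5.

```python
def yates_chi_square(table):
    """Yates-corrected chi-square for a 2 x 2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency of the cell
            chi2 += (abs(o - e) - 0.5) ** 2 / e      # shrink |O - E| by 0.5 before squaring
    return chi2


corrected = yates_chi_square([[39, 16], [21, 14]])
print(round(corrected, 3))   # 0.707
```

The corrected value (about 0.707) is smaller than the uncorrected one for the same table, which is the intended effect: the correction makes the test more conservative when ν = 1.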
IMPORTANT CHARACTERISTICS OF A CHI SQUARE TEST

• This test (as a non-parametric test) is based on frequencies and not on parameters like the mean and standard deviation.

• The test is used for testing hypotheses and is not useful for estimation.

• This test can also be applied to a complex contingency table with several classes, and as such is a very useful test in research work.

• This test is an important non-parametric test, as no rigid assumptions are necessary in regard to the type of population, no parameter values are needed, and relatively few mathematical details are involved.


STEPS INVOLVED IN CALCULATING χ²
1. Calculate the expected frequencies corresponding to the observed frequencies:
   A. For the independence and homogeneity tests, use the contingency table and the formula for expected frequencies fe:

      fe = (row total × column total) / N

   B. For goodness of fit:
      a) If no distribution is given, use the uniform distribution: E = N / n (total frequency divided by the number of classes).
      b) If a particular distribution is given (binomial, Poisson, etc.), use the formula of that distribution to calculate the expected frequencies.

2. Then χ² is calculated as follows:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

3. Calculate the degrees of freedom for the χ² distribution:

   for goodness of fit: ν = n−1
   for a contingency table: ν = (r−1)(c−1)

4. Compare the calculated χ² with the critical value of χ² at ν d.f.
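Steps 3 and 4 (degrees of freedom and comparison with the table value) could be sketched as below. The function names and the lookup table are illustrative assumptions: the 5% critical values are taken from a standard χ² table, of which only the values for 1 and 9 d.f. appear in the slides.

```python
# Standard 5% critical values of the chi-square distribution for 1..10 d.f.
# (assumption: from a standard table; the slides quote only 3.84 and 16.919).
CRITICAL_5PCT = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def dof_goodness_of_fit(n_classes):
    """Degrees of freedom for a goodness-of-fit test: n - 1."""
    return n_classes - 1


def dof_contingency(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def decide(chi2, dof):
    """Return the decision on H0 at the 5% level of significance."""
    return "reject H0" if chi2 > CRITICAL_5PCT[dof] else "accept H0"


print(decide(1.142, dof_contingency(2, 2)))   # Illustration 3: accept H0
print(decide(14.0, 9))                        # Illustration 1: accept H0
```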


Exercise:
1. The figures given below are (a) the theoretical frequencies of a distribution and (b) the frequencies of the normal distribution having the same mean, standard deviation and total frequency as in (a):

(a) 1 12 66 220 495 792 924 792 495 220 66 12 1
(b) 2 15 66 210 484 799 943 799 484 210 66 15 2

Test whether the normal distribution provides a good fit to the data.


Exercise:
2. There are 1,000 workers in a factory who were exposed to an epidemic. The factory administration launched a vaccination campaign. The numbers of workers affected and not affected by the epidemic, by vaccination status, are given as:

                       Vaccinated   Not vaccinated   Total
Affected by epidemic   200          500              700
Not affected           200          100              300
Total                  400          600              1000

On the basis of this information, can it be said that vaccination and epidemic contamination (being affected by the epidemic) are independent?
