Chi-Square & ANOVA
To compare a sample variance with a theoretical or hypothesised population variance we use neither the z-test nor the t-test. The test used for this purpose is the chi-square test, and its test statistic, symbolised as χ² and known as the chi-square value, is worked out as follows.
The chi-square value to test the null hypothesis, viz.,
H0: σ_s² = σ_p²
is worked out as under:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$$
Where:
σ_s² = variance of the sample
σ_p² = variance of the population
(n − 1) = degrees of freedom, n being the number of items in the sample.
Then, by comparing the calculated value of χ² with its table value for (n − 1) degrees of freedom at a given level of significance, we may either accept H0 or reject it. If the calculated value of χ² is equal to or less than the table value, the null hypothesis is accepted; otherwise the null hypothesis is rejected. This test is based on the chi-square distribution, which is not symmetrical and takes only positive values; one need only know the degrees of freedom to use such a distribution.
Example: A sample of 10 is drawn randomly from a certain population. The sum of the squared deviations from the mean of the given sample is 50. Test the hypothesis that the variance of the population is 5 at the 5 per cent level of significance.
Here n = 10, so σ_s² = 50/(10 − 1) = 50/9 and
χ² = ((50/9)/5) × 9 = 10 (the value 9.99 arises if σ_s² is first truncated to 5.55)
The table value of χ² at the 5 per cent level for 9 d.f. is 16.92. The calculated value of χ² (10) is less than this table value, so the null hypothesis is not rejected. We accept it and conclude that the variance of the population is 5, as given in the question.
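The arithmetic of this example can be checked with a short Python sketch. The function name `chi_square_variance_statistic` and its parameters are illustrative and not part of the original notes, and the critical value 16.92 is simply the table value quoted above rather than something the code derives.

```python
def chi_square_variance_statistic(sum_sq_dev, n, pop_variance):
    """Return chi-square = (sample variance / population variance) * (n - 1)."""
    sample_variance = sum_sq_dev / (n - 1)  # sample variance with n - 1 in the denominator
    return sample_variance / pop_variance * (n - 1)


# Figures from the example: n = 10, sum of squared deviations = 50, H0 variance = 5
chi_sq = chi_square_variance_statistic(sum_sq_dev=50, n=10, pop_variance=5)

table_value = 16.92  # table value for 9 d.f. at the 5 per cent level (taken from the text)
reject_h0 = chi_sq > table_value

print(round(chi_sq, 2), "reject H0" if reject_h0 else "do not reject H0")
```

Because the (n − 1) factors cancel, the statistic reduces to the sum of squared deviations divided by the hypothesised variance, here 50/5.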
B. CHI-SQUARE TEST AS A NON-PARAMETRIC TEST
CALCULATION STEPS of χ²
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where,
fo = observed frequency in each of the response categories
fe = expected frequency in each of the response categories
Step 1: Calculate the expected frequency of each cell:
$$\text{Expected frequency of any cell} = \frac{(\text{Row total for the row of that cell}) \times (\text{Column total for the column of that cell})}{\text{Grand total}}$$
Step 2: Find the difference between the observed and expected frequencies and square it, i.e., calculate (fo − fe)².
Step 3: Divide each (fo − fe)² obtained above by the corresponding expected frequency to get (fo − fe)²/fe. This should be done for all the cell frequencies or the group frequencies.
Step 4: Find the summation of the (fo − fe)²/fe values, i.e.,
$$\sum \frac{(f_o - f_e)^2}{f_e}$$
This is the required χ² value.
The χ² value so obtained should be compared with the relevant table value of χ², and then the inference drawn.
Example: Mr. George Mcmohan, president of National General Health Insurance Company, is opposed to national health insurance. He argues that it would be too costly to implement, particularly since the existence of such a system would, among other effects, tend to encourage people to spend more time in hospitals. George believes that lengths of stay in hospitals depend on the types of health insurance that people have. He asked Donna, his staff statistician, to check the matter. Donna collected data on a random sample of 660 hospital stays and summarized them in the given table. Test at the 99% confidence level (1% level of significance).
Fraction of costs         Days in hospital
covered by insurance      <5      5-10    >10     Total
<25%                      40      75      65      180
25-50%                    30      45      75      150
>50%                      40      100     190     330
Total                     110     220     330     660
Answer: Null hypothesis H0: Length of stay and type of insurance are independent
Alternative hypothesis Ha: Length of stay depends on type of insurance
S.no.  Row  Column  Fo    Fe = (RT × CT)/GT   Fo − Fe   (Fo − Fe)²   (Fo − Fe)²/Fe
1      1    1       40    30                  10        100          3.3333
2      1    2       75    60                  15        225          3.7500
3      1    3       65    90                  -25       625          6.9444
4      2    1       30    25                  5         25           1.0000
5      2    2       45    50                  -5        25           0.5000
6      2    3       75    75                  0         0            0.0000
7      3    1       40    55                  -15       225          4.0909
8      3    2       100   110                 -10       100          0.9091
9      3    3       190   165                 25        625          3.7879
Chi-square                                                           24.3157
(RT = row total, CT = column total, GT = grand total)
As the calculated value of χ² (24.32) is greater than the table value of 13.27 for (3 − 1) × (3 − 1) = 4 degrees of freedom at the 1% significance level, the null hypothesis is rejected. There is thus a significant association between insurance type and duration of stay in hospital (insurance coverage and length of hospital stay are dependent on each other).
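The calculation steps can be traced in a short Python sketch applied to the hospital-stay data. The function name `chi_square_independence` is illustrative and not from the original notes, and the critical value 13.27 is the table value quoted in the text, not something the code computes.

```python
def chi_square_independence(observed):
    """Return (chi-square, degrees of freedom, expected frequencies) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 1: expected frequency = row total * column total / grand total
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

    # Steps 2 to 4: sum (fo - fe)^2 / fe over all cells
    chi_sq = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, dof, expected


# Observed stays: rows = fraction of costs covered, columns = days in hospital
observed = [
    [40, 75, 65],
    [30, 45, 75],
    [40, 100, 190],
]
chi_sq, dof, expected = chi_square_independence(observed)

table_value = 13.27  # table value for 4 d.f. at the 1% level (taken from the text)
print(round(chi_sq, 4), dof, "reject H0" if chi_sq > table_value else "do not reject H0")
```

Keeping the expected frequencies in the return value makes it easy to compare them cell by cell with the Fe column of the worked table.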
Chi-Square in SPSS
Applicable: when the DV (dependent variable) and the IDV (independent variable) are both non-metric (measured on a nominal or ordinal scale)
[SPSS output: case summary and Chi-Square Tests tables]
Interpretation: The chi-square test statistic of 823.87 (significance value < 0.05) indicates that there is a significant association between the two variables (income and job satisfaction).
1) To check the assumption, go to Analyze → Explore → Crosstabs and click on Cells.
Output table:
Income category                    Job satisfaction (columns 1-5)                   Total
in thousands                       1        2        3        4        5
Under $25    Expected Count        203.4    232.6    255.5    257.9    224.5    1174.0
$25 - $49    Expected Count        413.8    473.1    519.8    524.6    456.7    2388.0
$50 - $74    Count                 137      215      261      262      245      1120
             Expected Count        194.1    221.9    243.8    246.1    214.2    1120.0
$75+         Expected Count        297.7    340.4    373.9    377.4    328.6    1718.0
Total        Count                 1109     1268     1393     1406     1224     6400
             Expected Count        1109.0   1268.0   1393.0   1406.0   1224.0   6400.0
Look at the expected frequencies: if any expected frequency is less than 5, the chi-square assumption is violated and the chi-square test cannot be applied. If the assumption is fulfilled, go ahead.
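As a rough illustration, the expected counts in the output table can be rebuilt from its row and column totals and checked against the minimum of 5. The helper names `expected_counts` and `assumption_met` are hypothetical, and reading the five columns as job satisfaction categories is an assumption based on the interpretation given earlier.

```python
def expected_counts(row_totals, col_totals):
    """Expected count of each cell = row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]


def assumption_met(expected, minimum=5):
    """True if no expected count falls below the minimum."""
    return all(value >= minimum for row in expected for value in row)


# Marginal totals read from the output table
row_totals = [1174, 2388, 1120, 1718]        # income categories
col_totals = [1109, 1268, 1393, 1406, 1224]  # assumed: job satisfaction categories

expected = expected_counts(row_totals, col_totals)
smallest = min(min(row) for row in expected)

print(round(smallest, 1), assumption_met(expected))
```

Only the smallest expected count needs to be inspected: if it is at least 5, every other cell passes as well.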
ANOVA is essentially a procedure for testing the difference among different groups of data for homogeneity. Earlier, we noted that the t-test can be used to study the means of one or two samples. But if there are more than two samples, multiple t-tests would need to be applied, which is cumbersome and inflates the overall chance of a Type I error. Instead, ANOVA can be used to study the means of two or more populations.
In ANOVA the dependent variable should be measured on an interval or ratio scale, and the independent variable should be categorical (nominal or ordinal scale).
Through ANOVA technique one can, in general, investigate any number of factors which are
hypothesized or said to influence the dependent variable. One may as well investigate the differences
amongst various categories within each of these factors which may have a large number of possible
values. If we take only one factor and investigate the differences amongst its various categories
having numerous possible values, we are said to use one-way ANOVA and in case we investigate
two factors at the same time, then we use two-way ANOVA. In a two or more way ANOVA, the
interaction (i.e., inter-relation between two independent variables/factors), if any, between two
independent variables affecting a dependent variable can as well be studied for better decisions.
ANOVA works with two estimates of the population variance, viz., one based on the between-samples variance and the other based on the within-samples variance. The two estimates are then compared with the F-test:
$$F = \frac{\text{Estimate of population variance based on between-samples variance}}{\text{Estimate of population variance based on within-samples variance}}$$
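A minimal one-way sketch of this ratio in Python follows. The function name `one_way_anova_f` and the sample data are hypothetical and not taken from the notes; the between-samples estimate uses k − 1 degrees of freedom and the within-samples estimate uses N − k, where k is the number of samples and N the total number of observations.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between samples and within samples
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # estimate based on between-samples variance
    ms_within = ss_within / df_within     # estimate based on within-samples variance
    return ms_between / ms_within, df_between, df_within


# Hypothetical data for three samples, for illustration only
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(samples)
print(round(f_value, 2), df_b, df_w)
```

The resulting F value would then be compared with the table value of F for (df_between, df_within) degrees of freedom.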
Step 3: a) If the null hypothesis is accepted (p value > 0.05), there is no significant difference between the mean scores of the different categories.
b) If the null hypothesis is rejected (p value < 0.05), at least one of the categories of the IDV differs significantly from the rest in its mean score. Then apply post-hoc analysis to identify the category that differs.
Steps in SPSS:
1. Analyze → General Linear Model → Univariate
   Dialogs: PLOTS, POST HOC, OPTIONS
OUTPUT
Table 1: This table shows the count of each category for each IDV.
Between-Subjects Factors
             Value   Label     N
Territory    1       A         9
             2       B         8
             3       C         13
Season       1       Summer    10
             2       Winter    10
             3       Rainy     10
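Table 2: This table tests the assumption of homogeneity of variances, using Levene's test.
F        df1    df2    Sig.
1.871    8      21     .119
Since Sig. (.119) is greater than .05, the hypothesis of equal variances is not rejected and the assumption is met.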
[SPSS table: Territory * Season. Dependent Variable: Sales]
[SPSS table: Multiple Comparisons, Tukey HSD. Dependent Variable: Sales. Columns: (I) Territory, (J) Territory, Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval (Lower Bound, Upper Bound)]
[SPSS table: Multiple Comparisons, Tukey HSD. Dependent Variable: Sales. Columns: (I) Season, (J) Season, Mean Difference (I-J), Std. Error, Sig., 95% Confidence Interval (Lower Bound, Upper Bound)]
Sales
Tukey HSD
Territory    N     Subset 1
A            9     3.22
C            13    4.85
B            8     4.88
Sig.               .194
Means for groups in homogeneous subsets are displayed. Based on observed means.
The error term is Mean Square(Error) = 4.040.
a. Uses Harmonic Mean Sample Size = 9.584.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c. Alpha = .05.

Sales
Tukey HSD
Season       N     Subset 1
Winter       10    3.90
Summer       10    4.50
Rainy        10    4.70
Sig.               .652
Means for groups in homogeneous subsets are displayed. Based on observed means.
The error term is Mean Square(Error) = 4.040.
a. Uses Harmonic Mean Sample Size = 10.000.
b. Alpha = .05.
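The harmonic mean sample sizes quoted in the footnotes (9.584 for Territory and 10.000 for Season) can be reproduced from the group sizes listed in Table 1. This is only a check using Python's standard library; the variable names are illustrative.

```python
from statistics import harmonic_mean

territory_sizes = [9, 8, 13]  # N for territories A, B, C (Table 1)
season_sizes = [10, 10, 10]   # N for Summer, Winter, Rainy (Table 1)

# Harmonic mean = number of groups / sum of reciprocals of the group sizes
territory_hm = harmonic_mean(territory_sizes)
season_hm = harmonic_mean(season_sizes)

print(round(territory_hm, 3), round(season_hm, 3))
```

With equal group sizes the harmonic mean equals the common size, which is why the Season footnote reports exactly 10.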
PROFILE PLOTS: Estimated Marginal Means of Sales for Territory and Season