
The Chi-square

What is a Chi Square Test?


The chi-square test, denoted χ², compares the tallies or counts of categorical responses between two
or more independent groups. It is used to investigate whether the distributions of categorical
variables differ from one another.
There are two types of chi-square tests. Both use the chi-square statistic and distribution, but
for different purposes:
• A chi-square goodness of fit test determines whether sample data match a population.
• A chi-square test for independence compares two variables in a contingency table to see
if they are related. In a more general sense, it tests whether distributions
of categorical variables differ from one another.
The size of the test statistic is read as follows:
• A very small chi-square test statistic means that your observed data fit your expected
data extremely well. In a test for independence, the expected counts are computed assuming no
relationship, so a small statistic gives no evidence of a relationship.
• A very large chi-square test statistic means that the observed data do not fit the expected
data well. In a test for independence, this points to a relationship between the variables.
It is important to note that chi-square can only be used on actual counts (frequencies) and not on
percentages, proportions, means, etc.
1. Homogeneity: This is a test for association between two variables (comparison of different
groups). It compares two categorical variables in a contingency table to see if they are related,
e.g. a 2 × 2 contingency table for two variables:
Variable      1      2      Totals
Category 1    a      b      a+b
Category 2    c      d      c+d
Totals        a+c    b+d    N = a+b+c+d

Solved Example:
Is there any association between eye colour and hair colour in the given set of data?
Given Data:
For eye colour: 1 = black, 2 = brown, 3 = blue
Black = 10, brown = 10, blue = 13
For hair colour: 1 = black, 2 = brown, 3 = blonde
Black = 12, brown = 9, blonde = 12
H0: There is no association between eye colour and hair colour.
H1: There is a significant association between eye colour and hair colour.
Key for SPSS: Analyze > Descriptive Statistics > Crosstabs > put eye colour in Rows and hair colour
in Columns (or the other way round; it makes no difference to the calculations) > Statistics >
check Chi-square > Continue > Cells > under Counts, check Observed and Expected > Continue > OK.
RESULTS FROM SPSS:

eyecolour * haircolour Crosstabulation

                                       haircolour
                                  black   brown   blonde   Total
eyecolour  black  Count             5       3       2        10
                  Expected Count    3.6     2.7     3.6      10.0
           brown  Count             1       3       6        10
                  Expected Count    3.6     2.7     3.6      10.0
           blue   Count             6       3       4        13
                  Expected Count    4.7     3.5     4.7      13.0
Total             Count            12       9      12        33
                  Expected Count   12.0     9.0    12.0      33.0

Chi-Square Tests

                                Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square              5.288a    4    .259
Likelihood Ratio                5.886     4    .208
Linear-by-Linear Association    .059      1    .809
N of Valid Cases                33

a. 9 cells (100.0%) have expected count less than 5. The minimum expected count is 2.73.

Discussion and conclusion:


The significance (p) value of the Pearson chi-square test, 0.259, is larger than the chosen
significance level of 0.05 (the test statistic itself is 5.288). We therefore fail to reject the
null hypothesis and conclude that there is no significant association between eye colour and hair
colour.
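The SPSS figures above can be checked by hand. The following is a minimal Python sketch, not the SPSS procedure itself: it rebuilds the expected counts as row total × column total / N, sums (O − E)^2 / E over the nine cells, and uses the closed-form upper-tail probability that holds only for 4 degrees of freedom. The variable names are illustrative.

```python
import math

# Observed counts from the crosstab (rows: eye colour, columns: hair colour).
observed = [
    [5, 3, 2],  # black eyes
    [1, 3, 6],  # brown eyes
    [6, 3, 4],  # blue eyes
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 4 only, P(X >= x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)
min_expected = min(min(row) for row in expected)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

The printed statistic, degrees of freedom, p-value and minimum expected count should agree with the Pearson row and footnote of the SPSS output to the precision shown there.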
2. Goodness of fit: The chi-square goodness of fit test is useful for comparing a theoretical model
to observed data. It is used to compare a collection of categorical data with some theoretical
expected distribution (with some known ratio). It is often used in genetics, for example:
• Test for linkage
• Comparison with a defined ratio
Solved Example 1:
Consider a standard package of milk chocolate M&Ms. There are six different colors: red,
orange, yellow, green, blue and brown.
Suppose that we are curious about the distribution of these colors and ask, do all six colors occur
in equal proportion? This is the type of question that can be answered with a goodness of fit test.
Setting
We begin by noting the setting and why the goodness of fit test is appropriate. Our variable of
color is categorical. There are six levels of this variable, corresponding to the six colors that are
possible. We will assume that the M&Ms we count will be a simple random sample from the
population of all M&Ms.
Null and Alternative Hypotheses
The null and alternative hypotheses for our goodness of fit test reflect the assumption that we are
making about the population. Since we are testing whether the colors occur in equal proportions,
our null hypothesis will be that all colors occur in the same proportion. More formally, if p1 is
the population proportion of red candies, p2 is the population proportion of orange candies, and
so on, then the null hypothesis is that p1 = p2 = . . . = p6 = 1/6.
The alternative hypothesis is that at least one of the population proportions is not equal to 1/6.
Actual and Expected Counts
The actual counts are the number of candies for each of the six colors. The expected count refers
to what we would expect if the null hypothesis were true. We will let n be the size of our sample.
The expected number of red candies is p1 × n, or n/6. In fact, for this example, the expected number
of candies for each of the six colors is simply n times p_i (the proportion for color i), or n/6.
Chi-square Statistic for Goodness of Fit
We will now calculate a chi-square statistic for a specific example. Suppose that we have a
simple random sample of 600 M&M candies with the following distribution:
• 212 of the candies are blue.
• 147 of the candies are orange.
• 103 of the candies are green.
• 50 of the candies are red.
• 46 of the candies are yellow.
• 42 of the candies are brown.
If the null hypothesis were true, then the expected count for each of these colors would be
(1/6) × 600 = 100. We now use this in our calculation of the chi-square statistic.
We calculate the contribution to our statistic from each of the colors. Each is of the form
(Actual – Expected)^2 / Expected:
• For blue we have (212 – 100)^2 / 100 = 125.44
• For orange we have (147 – 100)^2 / 100 = 22.09
• For green we have (103 – 100)^2 / 100 = 0.09
• For red we have (50 – 100)^2 / 100 = 25
• For yellow we have (46 – 100)^2 / 100 = 29.16
• For brown we have (42 – 100)^2 / 100 = 33.64
We then total all of these contributions and determine that our chi-square statistic is
125.44 + 22.09 + 0.09 + 25 + 29.16 + 33.64 = 235.42.
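The arithmetic above can be reproduced in a few lines. This is a small Python sketch using only the counts given in the example; the dictionary and variable names are illustrative choices, not part of the original worked solution.

```python
# Observed counts from the sample of 600 candies.
observed = {
    "blue": 212,
    "orange": 147,
    "green": 103,
    "red": 50,
    "yellow": 46,
    "brown": 42,
}

n = sum(observed.values())
# Under the null hypothesis every color has proportion 1/6.
expected = n / len(observed)

# Contribution of each color: (Actual - Expected)^2 / Expected.
contributions = {
    color: (count - expected) ** 2 / expected
    for color, count in observed.items()
}
chi_square = sum(contributions.values())

for color, value in contributions.items():
    print(f"{color:>6}: {value:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

Printing the individual contributions makes it easy to see that blue alone accounts for more than half of the statistic.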
Note: A common rule for the chi-square test is that if the expected frequency of any cell is less
than 5, or when there is only 1 degree of freedom, Yates' correction is applied.
Formula: χ² = Σ (|O – E| – 0.5)^2 / E, where O is the observed and E the expected count of a cell.
Degrees of Freedom
The number of degrees of freedom for a goodness of fit test is simply one less than the number
of levels of our variable. Since there were six colors, we have 6 – 1 = 5 degrees of freedom.
Chi-square Table and P-Value
The chi-square statistic of 235.42 that we calculated corresponds to a particular location on a
chi-square distribution with five degrees of freedom. We now need a p-value, which is the
probability of obtaining a test statistic at least as extreme as 235.42 while assuming that the null
hypothesis is true.
Microsoft's Excel can be used for this calculation. We find that our test statistic with five
degrees of freedom has a p-value of 7.29 × 10^-49. This is an extremely small p-value.
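For readers without Excel, the upper-tail probability can also be sketched in plain Python. The helper below, chi2_sf, is a hypothetical name introduced here for illustration; it relies on the standard identities Q(1/2, t) = erfc(√t), Q(1, t) = e^(−t) and Q(a + 1, t) = Q(a, t) + t^a e^(−t) / Γ(a + 1) for the regularized upper incomplete gamma function, and it assumes the degrees of freedom are a positive integer.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    # Closed-form starting point: odd df start at a = 1/2, even df at a = 1.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(t))
    else:
        a, q = 1.0, math.exp(-t)
    # Step a up to df / 2 with Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q


p_value = chi2_sf(235.42, 5)
print(f"p-value = {p_value:.2e}")
```

The result is on the order of 10^-49, in line with the Excel figure quoted above; small differences in the leading digits can arise from rounding of the test statistic.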
Decision Rule
We make our decision on whether to reject the null hypothesis based on the size of the p-value.
Since we have a minuscule p-value, we reject the null hypothesis. We conclude that M&Ms
are not evenly distributed among the six different colors. A follow-up analysis could be used to
determine a confidence interval for the population proportion of one particular color.
Example 2: Are all blood groups distributed in equal proportions in a population?
DATA: 2,1,1,3,4,3,2,2,1,1,1,4,2,2,2,2,2 where
1 = blood group A, 2 = blood group B, 3 = blood group AB, 4 = blood group O.
H0: All blood groups occur in the same proportion, i.e. x1 = x2 = x3 = x4 = 1/4.
H1: At least one blood group occurs in a proportion different from 1/4.
CASE-I: All categories equal
Key: Analyze > Nonparametric Tests > Chi-square > add blood group to Test Variable List > under
Expected Values, check All categories equal > Options > check Descriptive > OK.

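blooggroups

         Observed N   Expected N   Residual
A        5            3.4          1.6
B        8            11.9         -3.9
AB       2            .5           1.5
O        2            1.2          .8
Total    17

Test Statistics

               blooggroups
Chi-Square     6.936
df             3
Asymp. Sig.    .074

Note: the expected counts printed in this table (3.4, 11.9, .5, 1.2) are those of the unequal
proportions used in CASE-II. With all categories equal, each expected count is 17/4 = 4.25.
As the expected frequencies of all cells (4.25) are less than 5, we need to apply Yates' correction.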


Result and conclusion:

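The chi-square statistic after Yates' correction is 3.94, which is below the critical value of
7.815 for 3 degrees of freedom at the 0.05 level, so p is greater than 0.05. We therefore fail to
reject the null hypothesis; the data are consistent with all blood groups being equally distributed
in the population, i.e. x1 = x2 = x3 = x4 = 1/4.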
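The 3.94 figure quoted above can be reproduced from the raw data. The following Python sketch tallies the 17 coded observations, uses 17/4 = 4.25 as the expected count for every blood group, and applies the Yates formula Σ (|O − E| − 0.5)^2 / E. The critical value 7.815 (3 degrees of freedom, 0.05 level) is taken from standard chi-square tables; the variable names are illustrative.

```python
from collections import Counter

# Coded data: 1 = A, 2 = B, 3 = AB, 4 = O.
data = [2, 1, 1, 3, 4, 3, 2, 2, 1, 1, 1, 4, 2, 2, 2, 2, 2]
labels = {1: "A", 2: "B", 3: "AB", 4: "O"}

counts = Counter(data)
observed = [counts[code] for code in sorted(labels)]

n = len(data)
expected = n / len(labels)  # all categories equal: 17 / 4 = 4.25

# Uncorrected Pearson statistic and the Yates-corrected version.
uncorrected = sum((o - expected) ** 2 / expected for o in observed)
corrected = sum((abs(o - expected) - 0.5) ** 2 / expected for o in observed)

critical_value = 7.815  # chi-square table, df = 3, alpha = 0.05
reject_null = corrected > critical_value

print(dict(zip(labels.values(), observed)))
print(f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

Both the uncorrected and the corrected statistic fall below the critical value, so the decision does not depend on whether the correction is applied.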
CASE-II:
Let's say we have blood group data in the following form:
Blood group   Value   Value/100   Observed N
A             20%     0.2         5
B             70%     0.7         8
AB            3%      0.03        2
O             7%      0.07        2
ON SPSS:
Key: Analyze > Nonparametric Tests > Chi-square > add blood group to Test Variable List > under
Expected Values, check Values > in this box, add the proportion given for each blood group, in the
order of the category codes.
blood.groups

         Observed N   Expected N   Residual
A        5            3.4          1.6
B        8            11.9         -3.9
AB       2            .5           1.5
O        2            1.2          .8
Total    17

Test Statistics

               blood.groups
Chi-Square     6.936a
df             3
Asymp. Sig.    .074

a. 3 cells (75.0%) have expected frequencies less than 5. The minimum expected cell frequency is .5.

We need to apply Yates' correction, as the expected frequency of 3 cells is less than 5.


Result and conclusion:
After Yates' correction the p value is 0.969, which is greater than the chosen significance level
of 0.05. Thus we fail to reject the null hypothesis and conclude that the observed counts are
consistent with the blood groups being distributed in the specified proportions in the given
population.
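The expected counts and the uncorrected chi-square value in the CASE-II output can be checked directly from the stated proportions. This is a short Python sketch under the assumption that the expected count of each group is the sample size (17) times its proportion; the names are illustrative.

```python
# Observed counts and hypothesised proportions for A, B, AB, O.
groups = ["A", "B", "AB", "O"]
observed = [5, 8, 2, 2]
proportions = [0.20, 0.70, 0.03, 0.07]

n = sum(observed)
# Expected count for each group: sample size times its proportion.
expected = [n * p for p in proportions]
residuals = [o - e for o, e in zip(observed, expected)]

# Uncorrected Pearson statistic, as in the SPSS Test Statistics table.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(groups) - 1
small_cells = sum(1 for e in expected if e < 5)

for g, o, e, r in zip(groups, observed, expected, residuals):
    print(f"{g:<3} observed={o}  expected={e:.1f}  residual={r:.1f}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
print(f"cells with expected frequency below 5: {small_cells}")
```

The output should match the Expected N and Residual columns and the Chi-Square row above, and it confirms that three of the four cells have expected frequencies below 5.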
