Chi Square Test SL&HL


Statistics & Probability Math AI SL

25. Chi Square Test


Summary of Key Points

1. The null hypothesis, denoted by 𝐻0 , is the default position, assumed to be true unless there
is significant evidence against it.

2. The alternative hypothesis, denoted by 𝐻1 , specifies how you think the position may have
changed.

3. Number of degrees of freedom = 𝑛 − 1 (if 𝑓𝑜 is given as a single list of 𝑛 categories)

Number of degrees of freedom = (𝑟 − 1) × (𝑐 − 1) (if 𝑓𝑜 is given as a table with 𝑟 rows and 𝑐 columns)

4. Calculate the 𝜒² test statistic (𝜒²_test) and the critical value (𝜒²_critical).

5. Reasoning and conclusion:

Using the 𝜒² value

If 𝜒²_test < 𝜒²_critical, there is insufficient evidence to reject 𝐻0, so accept 𝐻0.

If 𝜒²_test > 𝜒²_critical, there is sufficient evidence to reject 𝐻0, so reject 𝐻0 and accept 𝐻1.

Using the 𝑝-value

If the 𝑝-value > the significance level, there is insufficient evidence to reject 𝐻0, so accept 𝐻0.

If the 𝑝-value < the significance level, there is sufficient evidence to reject 𝐻0, so reject 𝐻0 and accept 𝐻1.
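
The 𝜒² test statistic in point 4 is normally read off the GDC, but it can also be written out. As a reference sketch, assuming the standard Pearson form and writing 𝑓𝑜 and 𝑓𝑒 as f_o (observed) and f_e (expected), with the sum taken over every category or cell:

```latex
\chi^2_{\text{test}} = \sum \frac{(f_o - f_e)^2}{f_e}
\qquad
\text{goodness of fit: } f_e = (\text{probability under } H_0) \times (\text{sample size})
\qquad
\text{independence: } f_e = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}
```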

Goodness of Fit Test (GoF test)


Eg 1. A school offers a choice of three different languages. Ibson knows that in his year, 1/3 of the
students selected Spanish, 1/4 selected German and 5/12 selected Russian.

In order to investigate whether the choices in the year below follow the same distribution,
he selected a sample of 60 students. Their choices were:

Language Frequency
Spanish 26
German 16
Russian 18

Ibson conducts a 𝜒² test at the 5% significance level to test whether the choices in the year
below follow the same distribution.

SBA/IBDP/Math SL /.Statistics/ 25.Chi Square Test/SASIKUMAR 1


(a) State the null and alternative hypotheses.

𝐻0 ∶ The choice of languages follows the given distribution
(Spanish = 1/3, German = 1/4 and Russian = 5/12)

𝐻1 ∶ The choice of languages does not follow the given distribution

(b) State the number of degrees of freedom.

Number of degrees of freedom = 𝑛 − 1 (𝑓𝑜 is given as a single list of 𝑛 = 3 categories)

= 3 − 1 = 2

(c) Fill out a table of observed and expected frequencies and calculate the 𝜒² value and the
𝑝-value.

Language   Observed (𝑓𝑜)   Probability   Expected (𝑓𝑒)
Spanish    26              1/3           1/3 × 60 = 20
German     16              1/4           1/4 × 60 = 15
Russian    18              5/12          5/12 × 60 = 25

Go to Statistics mode in the GDC. Input 𝑓𝑜 and 𝑓𝑒 in List1 and List2 respectively.

Then Test – Chi – GoF

𝜒² value = 3.83 and 𝑝-value = 0.1476
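
The GDC output in part (c) can be cross-checked by hand. The following is a minimal Python sketch, not the method the worksheet prescribes; the variable names are illustrative, and the 𝑝-value line relies on a closed form that holds only for 2 degrees of freedom.

```python
import math
from fractions import Fraction

observed = [26, 16, 18]  # f_o: Spanish, German, Russian
# Proportions claimed under H0 (exact fractions avoid rounding noise)
proportions = [Fraction(1, 3), Fraction(1, 4), Fraction(5, 12)]
n = sum(observed)  # sample size, 60

# Expected frequency = probability x sample size
expected = [float(p * n) for p in proportions]  # [20.0, 15.0, 25.0]

# Sum of (f_o - f_e)^2 / f_e over the three categories
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: with exactly 2 degrees of freedom the upper-tail area
# of the chi-squared distribution is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), round(p_value, 4))  # 3.83 0.1476
```

The printed values agree with the GDC figures quoted in part (c).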

(d) Calculate the critical value for this test at the 5% significance level.

GDC: Statistics – Distribution – Chi – InverseChi, with Area = 5% and d.f. = 2

𝜒²_critical = 5.991
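
As a check on the InverseChi result, here is a short Python sketch. It assumes the 2-degrees-of-freedom case only, where the upper-tail area is exp(−x/2) and can be inverted directly; other degrees of freedom need a table, the GDC, or a statistics library.

```python
import math

alpha = 0.05  # significance level, i.e. the upper-tail area

# Assumption: df = 2, so P(X > x) = exp(-x / 2).
# Solving exp(-x / 2) = alpha for x gives the critical value.
chi2_critical = -2 * math.log(alpha)

print(round(chi2_critical, 3))  # 5.991
```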

(e) What should Ibson conclude (using the 𝜒² value)?

𝜒²_test = 3.83, 𝜒²_critical = 5.99

𝜒²_test < 𝜒²_critical, so there is insufficient evidence to reject 𝐻0. Accept 𝐻0.

Conclusion: The choice of languages follows the given distribution.

(f) What should Ibson conclude (using the 𝑝-value)?

𝑝-value = 0.1476 = 14.76%, significance level = 5%

The 𝑝-value > the significance level, so there is insufficient evidence to reject 𝐻0. Accept 𝐻0.

Conclusion: The choice of languages follows the given distribution.



Eg 2. The times taken by ten-year-old children to solve a puzzle can be modelled by a normal
distribution with mean 12 minutes and standard deviation 2.5 minutes.

The times taken to solve the same puzzle by a random sample of 50 ten-year-old children are
as follows:

Time (min)   𝑡 ≤ 9   9 < 𝑡 ≤ 11   11 < 𝑡 ≤ 13   13 < 𝑡 ≤ 15   𝑡 > 15
Frequency    10      11           20            5             4

Test, using a 10% significance level, whether the times of the ten-year-old children come
from the same distribution.

Answer: 𝑇 = Time, 𝑇 ~ 𝑁(12, 2.5²)

𝐻0 ∶ Times come from the distribution 𝑇 ~ 𝑁(12, 2.5²)

𝐻1 ∶ Times do not come from the distribution 𝑇 ~ 𝑁(12, 2.5²)

Number of degrees of freedom = 5 − 1 = 4

Time           Observed (𝑓𝑜)   Probability   Expected (𝑓𝑒)
𝑡 ≤ 9          10              0.1151        0.1151 × 50 = 5.76
9 < 𝑡 ≤ 11     11              0.2295        0.2295 × 50 = 11.5
11 < 𝑡 ≤ 13    20              0.3108        0.3108 × 50 = 15.5
13 < 𝑡 ≤ 15    5               0.2295        0.2295 × 50 = 11.5
𝑡 > 15         4               0.1151        0.1151 × 50 = 5.76

Go to Statistics mode in the GDC. Input 𝑓𝑜 and 𝑓𝑒 in List1 and List2 respectively.

Then Test – Chi – GoF

𝜒² value = 8.62 and 𝑝-value = 0.0713 = 7.13%
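
The probabilities in the table above come from the normal model in 𝐻0. The sketch below shows one way to reproduce them and the resulting test values in Python using the standard library's `statistics.NormalDist`. The variable names are illustrative, and the 𝑝-value line assumes the closed form for exactly 4 degrees of freedom.

```python
import math
from statistics import NormalDist

model = NormalDist(mu=12, sigma=2.5)  # distribution claimed under H0
edges = [9, 11, 13, 15]               # class boundaries in minutes
observed = [10, 11, 20, 5, 4]         # f_o for the five classes
n = sum(observed)                     # sample size, 50

# Cumulative probabilities at the boundaries, padded with 0 and 1
# so that the two open-ended classes are included.
cdf = [0.0] + [model.cdf(x) for x in edges] + [1.0]
probabilities = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [p * n for p in probabilities]  # f_e = probability x 50

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Assumption: with exactly 4 degrees of freedom the upper-tail area
# is exp(-x / 2) * (1 + x / 2).
half = chi2 / 2
p_value = math.exp(-half) * (1 + half)

print([round(p, 4) for p in probabilities])
print(round(chi2, 2), round(p_value, 4))
```

Working with unrounded expected frequencies, as here, gives the same 𝜒² value and 𝑝-value to the accuracy quoted above.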

Concluding using the 𝑝-value

𝑝-value = 7.13% < 10%, so there is sufficient evidence to reject 𝐻0. Reject 𝐻0 and accept 𝐻1.

Conclusion: Times do not come from the distribution 𝑇 ~ 𝑁(12, 2.5²)

1. A theory predicts that three different types of flower should appear in the ratio 1 ∶ 2 ∶ 3.
A sample of 60 flowers contains 14 flowers of type A, 18 flowers of type B and 28 flowers
of type C. A 𝜒² goodness of fit test is used to test the theory at the 10% significance level.

(a) Calculate the expected frequencies and state the number of degrees of freedom.

(b) Find the 𝜒² value.

(c) The critical value is 4.605. State the conclusion of the test.



2. Sheeba thinks that students at her large college are equally likely to study any of the four
Mathematics courses.

She asks a random sample of 80 students and obtains the following results:

Course           Math AA HL   Math AA SL   Math AI HL   Math AI SL
No of students   18           21           8            33

Sheeba conducts a 𝜒² test at the 5% significance level to test whether each course is equally
likely.

(a) State the null and alternative hypotheses.

(b) Write down the expected frequencies and the number of degrees of freedom.

(c) Find the 𝑝-value for the test and hence state the conclusion.

3. A six-sided dice is rolled 120 times, giving the following results:

Outcome 1 2 3 4 5 6
frequency 26 12 16 28 14 24

Is there evidence, at the 2% significance level, that the dice is not fair?

4. Rajesh is practising tennis serves. He takes three serves at a time and records the number of
successful serves.

He believes that this number can be modelled by the binomial distribution B(3, 0.7).

No of successful serves out of three   0   1    2    3
frequency                              7   28   95   70

(a) State the hypotheses for a 𝜒² goodness of fit test.

(b) Find the expected frequencies and write down the number of degrees of freedom.

(c) Calculate the 𝜒² value.

(d) The critical value for the test is 6.25. State the conclusion of the test.

5. Michelle tosses six coins simultaneously and records the number of tails.
She repeats this 600 times. The results are shown in the table.

Number of tails 0 1 2 3 4 5 6
frequency 9 62 120 178 152 67 12

(a) State the distribution of the number of tails for an unbiased coin.

(b) Hence work out the expected frequencies for Michelle’s experiment.

(c) Test at the 5% significance level whether there is evidence that the coins are biased.



6. Four friends are guessing answers to maths questions. They think that they each have the
probability of 0.5 of guessing the correct answer to any question, independently of each other.

(a) Let 𝑋 be the number of correct answers to a single question. Assuming that the statements
above are correct, state the distribution of 𝑋.

In order to test whether their assumptions are correct, the friends guess answers to 100
questions and record the number of correct answers to each question.

Number of correct answers 0 1 2 3 4


frequency 12 13 45 22 8

(b) State appropriate hypotheses for a 𝜒² goodness of fit test.

(c) State the number of degrees of freedom.

(d) Conduct the test at the 2% significance level and state your conclusion.

7. An athlete believes that her long jump distances follow a normal distribution with mean 5.8 m
and standard deviation 0.8 m.

(a) Assuming her belief is correct, copy the table below and fill in the missing probabilities:

Distance(𝑚) <5 5 to 6 6 to 7 >7


probability 0.159

In order to test her belief, she records her distances from a random sample of 100 jumps,
obtaining the following results:

Distance(𝑚) <5 5 to 6 6 to 7 >7


frequency 17 42 38 3

(b) State the number of degrees of freedom for a 𝜒² goodness of fit test.

(c) State suitable hypotheses.

(d) Conduct the test at the 10% significance level.

8. A train company claims that times for a particular journey are distributed normally with mean
23 minutes and standard deviation 2.6 minutes.

Sophia takes this train to school and wants to test the company’s claim.
She decides to conduct a 𝜒² test and records the durations of 50 randomly selected journeys:

Time(min) < 21.5 21.5 − 22.5 22.5 − 23.5 23.5 − 24.5 > 24.5
frequency 3 8 14 17 8

(a) Find the expected frequencies.

(b) Write down the number of degrees of freedom.

(c) Calculate the 𝜒² value.

(d) The critical value for Sophia’s test is 9.49. State the conclusion in context.



9. A teacher suggests that exam grades at their college can be modelled by the following
distribution:

𝑃(𝐺 = 𝑔) = 𝑔(11 − 𝑔)/140, for 𝑔 = 3, 4, 5, 6, 7

A random sample of 40 students had the following grades:

Grade 3 4 5 6 7
frequency 2 10 9 12 7

Test, using a 10% significance level, whether the teacher’s model is appropriate for these data.

𝜒² test for independence


Example
Julio investigates whether people’s favourite sport depends on their age. He asks 30 adults and 50
children to select their favourite sport out of football, basketball and baseball. He records the results
in the contingency table:

Adults Children
Football 8 23
Basketball 12 11
Baseball 10 16

Julio conducts a 𝜒² test for independence.

(a) State the hypotheses for this test.

𝐻0 ∶ Age and favourite sport are independent

𝐻1 ∶ Age and favourite sport are not independent

(b) Find the number of degrees of freedom.

Ans : number of degrees of freedom = (3 − 1) × (2 − 1) = 2

(c) Find the expected frequencies and the 𝜒² value.

Expected frequencies:

             Adults   Children
Football     11.6     19.4
Basketball   8.63     14.4
Baseball     9.75     16.3
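
The expected frequencies above follow the rule row total × column total ÷ grand total. A minimal Python sketch of that calculation for Julio's table is given below. The variable names are illustrative, and the 𝑝-value line is valid only because this table has 2 degrees of freedom.

```python
import math

# Observed counts; each row is [adults, children]
observed = [
    [8, 23],   # Football
    [12, 11],  # Basketball
    [10, 16],  # Baseball
]

row_totals = [sum(row) for row in observed]        # [31, 23, 26]
col_totals = [sum(col) for col in zip(*observed)]  # [30, 50]
grand_total = sum(row_totals)                      # 80

# Expected count in each cell = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (f_o - f_e)^2 / f_e over all six cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (3 - 1) x (2 - 1) = 2

# Assumption: closed form for the upper-tail area when df == 2
p_value = math.exp(-chi2 / 2)

print(expected)
print(df, round(chi2, 2), round(p_value, 2))  # 2 3.93 0.14
```

The 𝜒² value and 𝑝-value printed here match the GDC values used in parts (d) and (e).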

(d) Julio looks up a list of critical values for his test and finds that the appropriate critical value
is 9.21. What should Julio conclude from the test? (Use the 𝜒² values.)

𝜒² value = 3.93 (GDC) and 𝜒²_critical = 9.21 (given)

𝜒²_test < 𝜒²_critical, so there is insufficient evidence to reject 𝐻0. Accept 𝐻0.

Conclusion: Age and favourite sport are independent.



(e) What should Julio conclude from the test using the 𝑝-value at the 5% significance level?

Calculated 𝑝-value = 0.14 = 14% (using GDC)

𝑝-value = 14% > 5%, so there is insufficient evidence to reject 𝐻0. Accept 𝐻0.

Conclusion: Age and favourite sport are independent.

10. Ruby wants to test whether students’ food preferences depend on their age.
She conducts a survey in the school canteen, recording which option each student chooses.

           Veggie burger   Fish fingers   Pepperoni pizza
Adults     18              26             51
Children   53              38             47

Conduct a suitable 𝜒² test at the 5% significance level, stating your hypotheses and
conclusion clearly.

11. A zoologist investigates whether different types of insect are more common in different
locations. She collects a random sample of insects from a meadow and a forest and counts
the number of ants, bees and flies.

         Ants   Bees   Flies
Meadow   26     15     21
Forest   32     6      18

Use a 𝜒² test with a 10% significance level to test whether the type of insect found depends
on the location.

12. The table shows information about the mode of transport students use to get to school in four
different cities.

          Amsterdam   Houston   Athens   Johannesburg
Car       12          25        48       24
Bus       18          33        12       18
Bicycle   46          12        7        53
Walk      38          8         3        21

(a) Calculate the 𝜒² value.

(b) Use a 𝜒² test to find out whether there is evidence of a relationship
between the mode of transport and the city.



Answer

1. (a) 10, 20, 30; 2 (b) 1.93 (c) Insufficient evidence that the ratio is different from 1:2:3

2. (a) 𝐻0 : Each course is equally likely
𝐻1 : Each course is not equally likely
(b) 20, 20, 20, 20; 3
(c) 𝑝 = 0.00119, sufficient evidence that each course is not equally likely

3. 𝜒² value = 11.6, 𝑝 = 0.0407, insufficient evidence that the dice is not fair

4. (a) 𝐻0 : The data comes from the distribution B(3, 0.7)
𝐻1 : The data does not come from the distribution B(3, 0.7)
(b) 5.40, 37.8, 88.2, 68.6; 3 (c) 3.57
(d) Insufficient evidence that the data does not come from the distribution B(3, 0.7)

5. (a) Binomial, 𝑛 = 6, 𝑝 = 0.5 (b) 9.38, 56.3, 141, 188, 141, 56.3, 9.38
(c) 𝜒² value = 7.82; insufficient evidence that the coins are biased

6. (a) 𝑋 ~ B(4, 0.5)
(b) 𝐻0 : The data comes from the distribution B(4, 0.5)
𝐻1 : The data does not come from this distribution
(c) 4
(d) 𝜒² value = 13.4, 𝑝 = 0.00948,
sufficient evidence that the data does not come from B(4, 0.5)

7. (a) 0.440, 0.334, 0.0668 (b) 3
(c) 𝐻0 : The data comes from the distribution N(5.8, 0.8²)
𝐻1 : The data does not come from the distribution N(5.8, 0.8²)
(d) 𝜒² value = 2.82, 𝑝 = 0.420, insufficient evidence to reject 𝐻0

8. (a) 14.1, 7.09, 7.62, 7.09, 14.1 (b) 4 (c) 30.7
(d) Sufficient evidence that the times do not follow the distribution N(23, 2.6²)

9. 𝜒² = 5.46, 𝑝 = 0.243, insufficient evidence that the model is not appropriate

10. 𝜒² = 12.14, 𝑝 = 0.00231, sufficient evidence that age and food choices are not
independent

11. 𝜒² = 4.41, 𝑝 = 0.110, insufficient evidence that type of insect depends on location

12. 𝜒² value = 125.8 (using GDC), 𝜒²_critical = 16.92

𝜒²_test > 𝜒²_critical, sufficient evidence that city and mode of transport are dependent
