Topic 4B. Inferential Statistics
MATHEMATICS
Applications and Interpretation SL (and HL)
Lecture Notes
Christos Nikolaidis
TOPIC 4
STATISTICS AND PROBABILITY
Only for HL
January 2023
TOPIC 4: STATISTICS AND PROBABILITY Christos Nikolaidis
Thus, the following topics will provide more of a “recipe” for each
case, rather than a mathematical investigation.
1 For example, we don’t say “we accept the claim” but “we do not have enough
evidence to reject the claim”.
We state:
Null Hypothesis H0: the claim (affirmative)
Alternative hypothesis H1: the negation of the claim
If NOT, we reject H0
If YES, we do not have enough evidence to reject H0
The significance level is usually
a=0.10 (10%), a=0.05 (5%) or a=0.01 (1%)
(it is clearly stated in the question)
For example, a=0.05 means: we reject results far away from the
claim H0, in such a way that the probability of making a mistake
(rejecting H0 although it is true) is 5%.
Both p-value and statistic are obtained by the sample (by GDC).
If p-value < a (or statistic value > critical value), we reject H0.
2 The critical value depends on the significance level a (hopefully it will be given!)
A slight modification gives $s_{n-1}^2$ and corrects this bias:
$$s_{n-1}^2 = \frac{n}{n-1}\, s_n^2$$
We say that $s_{n-1}^2$ is an unbiased estimate of the population variance $\sigma^2$.
For example, if our sample has size n=6 and $s_n \approx 1.71$, then
$$s_{n-1}^2 = \frac{6}{5}\, s_n^2 = \frac{6}{5}\,(1.71)^2 = 3.50892$$
so $s_{n-1} \approx 1.87$.
EXAMPLE 1
For the following data
1, 2, 3, 4, 5, 6
the GDC gives
σx ≈ 1.71
sx ≈ 1.87
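The two GDC values can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the notes.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6]
n = len(data)

# sigma_x divides by n: the data are treated as the whole population
sigma_x = statistics.pstdev(data)
# s_x divides by n-1: the data are treated as a sample
s_x = statistics.stdev(data)

# The correction s_{n-1}^2 = n/(n-1) * s_n^2 links the two values
s_from_sigma = math.sqrt(n / (n - 1)) * sigma_x

print(round(sigma_x, 2), round(s_x, 2), round(s_from_sigma, 2))
```

Both routes to the sample value agree, which is just the correction formula applied to these six numbers.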
So, what is the standard deviation for our data?
There are two different situations: Population vs Sample.
3 Well, this sample doesn’t look very random, but it serves our purpose, to
compare with the first situation. One would expect something more random like
“2, 3, 6, 2, 1, 4” which has a smaller standard deviation sn , so that the
correction would provide sn-1 , a more reliable estimate for the population.
either two samples of data (we enter data in GDC in LIST 1 and LIST 2)
or only the corresponding statistics
  sample means: x̄1 , x̄2
  standard deviations: sx1 , sx2
  size of the samples: n1 , n2
We state
[null hypothesis] Ho: μ1=μ2
[alternative hypothesis] H1: μ1≠μ2 or μ1>μ2 or μ1<μ2
We use GDC
Statistics - TEST – t – (2 samples)
if List: statistics sx , x̄ , n are automatically entered
if Var: we enter sx , x̄ , n ourselves
Pooled: ON (always)
Execute gives
p-value
Conclusion
IF p-value < a THEN we reject Ho
EXAMPLE 1
To compare the mean weights between two populations A and B
we obtain two samples
Sample A 65 70 74 69 57 64 61 78 83 80
Sample B 71 65 68 59 70 65 55 52
(GDC gives that the two sample means are x̄1 = 70.1 and x̄2 = 63.1)
We will test two different claims for the population means μ1, μ2
with a = 0.05
(a) Ann claims that μ1> μ2
(b) Bill claims that population means are different
Solution
(a) We perform a 1-tailed t-test
H o: μ1 = μ2
H 1: μ1 > μ2
GDC gives p-value = 0.041
Since p-value < 0.05
we reject Ho. That is, we accept Ann’s claim that μ1 > μ2
(b) We perform a 2-tailed t-test
H o: μ1 = μ2
H 1: μ1 ≠ μ2
GDC gives p-value = 0.082
Since p-value > 0.05
we do not have enough evidence to reject Ho. We cannot support Bill’s claim!
NOTICE.
If the significance level was a = 0.10, the null hypothesis Ho
would be rejected in both cases.
If the significance level was a = 0.01, the null hypothesis Ho
would not be rejected in either case.
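For readers without a GDC, the pooled 2-sample t-test of this example can be sketched in Python. This is only an illustration under stated assumptions: the helper `t_sf` is hypothetical (not a library function) and approximates the tail probability of the t distribution by Simpson's rule, using the standard library only.

```python
import math
import statistics

def t_sf(t, df, steps=2000):
    """Approximate P(T > t), t >= 0, for Student's t with df degrees of
    freedom, integrating the density over [0, t] by Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

sample_a = [65, 70, 74, 69, 57, 64, 61, 78, 83, 80]
sample_b = [71, 65, 68, 59, 70, 65, 55, 52]
n1, n2 = len(sample_a), len(sample_b)
df = n1 + n2 - 2

# Pooled variance (this is what "Pooled: ON" does)
pooled_var = ((n1 - 1) * statistics.variance(sample_a)
              + (n2 - 1) * statistics.variance(sample_b)) / df
t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(
    pooled_var * (1 / n1 + 1 / n2))

p_one_tailed = t_sf(t_stat, df)            # H1: mu1 > mu2
p_two_tailed = 2 * t_sf(abs(t_stat), df)   # H1: mu1 != mu2

print(round(t_stat, 2), round(p_one_tailed, 3), round(p_two_tailed, 3))
```

The 2-tailed p-value is simply twice the 1-tailed one, which is why the two claims lead to different conclusions at a = 0.05.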
EXAMPLE 2.
The same example, if they give us only the statistics:
Solution
We state
[null hypothesis] Ho: the two criteria are independent
[alternative hypothesis] H1: the two criteria are not independent
We use GDC
Statistics - TEST – CHI – 2WAY
We enter observed frequencies in Matrix A (use F2 ►MAT)
Expected frequencies are automatically entered in Matrix B
Execute gives
χ2statistic and p-value
Conclusion
IF p-value < a (or χ2statistic > χ2critical) THEN we reject Ho
NOTICE
For an m×n matrix
degrees of freedom = (m-1)×(n-1)
fexpected
EXAMPLE 1
Solution
H o: gender and favorite sport are independent
H 1: gender and favorite sport are not independent
GDC gives
χ2statistic = 7.00
p-value = 0.0301
degrees of freedom = 2
Since
p-value < 0.05
we reject Ho, that is gender and favorite sport are not
independent.
Extra details:
The matrix is 2×3. That is why
degrees of freedom = 1×2 = 2
The expected frequencies are
Tennis Volley Basketball
Male 13.5 9 13.5
Female 16.5 11 16.5
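The expected frequencies and the p-value can be checked with a small Python sketch. The row and column totals below are read off the expected table above (an expected table always has the same totals as the observed one); the variable names are illustrative. For 2 degrees of freedom the chi-squared tail probability has the simple closed form exp(-x/2), so no statistical library is needed.

```python
import math

# Totals read off the expected-frequency table (same as the observed totals)
row_totals = {"Male": 36, "Female": 44}
col_totals = {"Tennis": 30, "Volley": 20, "Basketball": 30}
grand_total = sum(row_totals.values())

# expected = (row total x column total) / grand total
expected = {
    row: [rt * ct / grand_total for ct in col_totals.values()]
    for row, rt in row_totals.items()
}

df = (len(row_totals) - 1) * (len(col_totals) - 1)

chi2_stat = 7.00                      # value reported by the GDC
p_value = math.exp(-chi2_stat / 2)    # closed form, valid for df = 2 only

print(expected, df, round(p_value, 4))
```

The p-value comes out at about 0.030; the small difference from the GDC value is due to the statistic being rounded to 7.00.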
NOTICE
If some fexpected is less than 5, we need to merge two columns (or two rows).
We state
[null hypothesis] Ho: data follow the distribution
[alternative hypothesis] H1: data do not follow the distribution
We use GDC
Statistics - TEST – CHI – GOF
We enter observed frequencies in List 1
and expected frequencies in List 2
List 1 List 2
f1 Np1
f2 Np2
… …
fn Npn
Conclusion
IF p-value < a (or χ2statistic > χ2critical) THEN we reject Ho
χ2critical will be given if necessary
NOTICE
For example
EXAMPLE 1
Philipp claims that the supporters of Football teams A, B, C and D
are as follows
A B C D
30% 30% 26% 14%
A B C D
11 13 10 6
Solution
We perform a Goodness of fit Chi-squared test with
We enter
observed frequencies in List 1,
expected frequencies in List 2. (multiply probabilities by 40)
List 1 List 2
11 40×0.30 = 12
13 40×0.30 = 12
10 40×0.26 = 10.4
6 40×0.14 = 5.6
d.f.= n-1 = 3
GDC gives
χ2statistic = 0.210
p-value = 0.976
Since
p-value > 0.05
we do not have enough evidence to reject Ho.
That is, the sample is consistent with Philipp’s claim about the
distribution of the supporters.
NOTICE
An alternative way to draw the conclusion for example 1 is by
using the statistic value χ2statistic = 0.210.
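The statistic and p-value of this example can be reproduced with the following Python sketch. The helper `chi2_sf` is hypothetical (not a library function): it evaluates the chi-squared tail probability through the standard series for the incomplete gamma function, using only the `math` module.

```python
import math

def chi2_sf(x, df, terms=200):
    """Approximate P(X > x) for a chi-squared variable with df degrees of
    freedom, via the series for the regularised lower incomplete gamma."""
    a, z = df / 2, x / 2
    if z == 0:
        return 1.0
    term = 1 / a
    total = term
    for k in range(1, terms):
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1 - lower

observed = [11, 13, 10, 6]        # List 1
expected = [12, 12, 10.4, 5.6]    # List 2

# chi-squared statistic = sum of (observed - expected)^2 / expected
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2_stat, df)

print(round(chi2_stat, 3), df, round(p_value, 3))
```

The statistic is very small because every observed frequency is close to its expected value, hence the large p-value.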
EXAMPLE 2
100 people throw a die 10 times and count the number of sixes:
We test the claim that the game follows the Binomial distribution
B(10,1/6) with a=0.05
Solution
Ho: data follow Binomial distribution B(10,1/6)
H1: data do not follow Binomial distribution B(10,1/6)
The Binomial distribution B(10,1/6) gives the probabilities
0 1 2 3 4-10
0.162 0.323 0.291 0.155 0.070
We enter
observed frequencies in List 1,
expected frequencies in List 2. (multiply probabilities by 100)
List 1 List 2
15 100×0.162 = 16.2
30 100×0.323 = 32.3
30 100×0.291 = 29.1
15 100×0.155 = 15.5
10 100×0.070 = 7
EXAMPLE 3
It is claimed that the amount of sugar contained in 1-kg packets is
actually normally distributed with a mean of μ = 1000g and a
standard deviation of σ = 30g. We pick at random 80 packets of
sugar and record their weight. The results are shown below
packets 10 37 28 5
List 1 List 2
10 80×0.091 = 7.28
37 80×0.409 = 32.72
28 80×0.409 = 32.72
5 80×0.091 = 7.28
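The probabilities used in List 2 can be obtained from the Normal distribution as in the sketch below. It assumes the class boundaries 960, 1000 and 1040 (these appear in the grouped table of the HL version of this example) and treats the two outer classes as open-ended tails, so that the probabilities add up to 1. Only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist

claimed = NormalDist(mu=1000, sigma=30)
n_packets = 80

# Assumed class boundaries; the outer classes are taken as open-ended tails
cuts = [960, 1000, 1040]

cdf_values = [0.0] + [claimed.cdf(c) for c in cuts] + [1.0]
probabilities = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
expected = [n_packets * p for p in probabilities]

print([round(p, 3) for p in probabilities])
print([round(e, 2) for e in expected])
```

The notes round each probability to 3 decimal places before multiplying by 80, so List 2 differs slightly from the unrounded expected values printed here.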
ONLY FOR
HL
EXAMPLE 1
It is claimed that the following sample
frequency 10 35 15 20 20
List 1 List 2
10 100×0.100 = 10
35 100×0.353 = 35.3
15 100×0.140 = 14
20 100×0.223 = 22.3
20 100×0.184 = 18.4
d.f.= n-1 = 4
GDC gives
χ2statistic = 0.450 and p-value = 0.978
Since
p-value > 0.05
we do not reject Ho. The distribution can be Po(8).
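A Python sketch of this calculation follows. It assumes the classes 0-4, 5-7, 8, 9-10 and "11 or more" (the last class is taken as open-ended so that the probabilities add up to 1), and it rounds the probabilities to 3 decimal places before multiplying, as the notes do. For 4 degrees of freedom the tail probability has the closed form exp(-x/2)(1 + x/2), so only the `math` module is needed.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, n_obs = 8, 100
classes = [range(0, 5), range(5, 8), range(8, 9), range(9, 11)]  # 0-4, 5-7, 8, 9-10
probabilities = [sum(poisson_pmf(k, lam) for k in c) for c in classes]
probabilities.append(1 - sum(probabilities))   # assumed last class: 11 or more

# Expected frequencies from probabilities rounded to 3 d.p.
expected = [n_obs * round(p, 3) for p in probabilities]
observed = [10, 35, 15, 20, 20]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = math.exp(-chi2_stat / 2) * (1 + chi2_stat / 2)   # valid for df = 4 only

print([round(e, 1) for e in expected], round(chi2_stat, 3), round(p_value, 3))
```

With these expected frequencies the sketch gives a statistic of about 0.450 and a p-value of about 0.978, the values reported by the GDC.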
DEGREES OF FREEDOM
The definition is
d.f. = number of values that have the freedom to vary
In simple cases d.f. = n-1
This is because, in List 2 (expected data) we enter the first n-1
values, but the last entry is not free (as we know their sum):
List 2
*
*
*
*
N – (above)
SUM = N
Distribution                                            d.f.
Any random with n categories                            n-1
Binomial B(n,p)
  if they give us p                                     n-1
  otherwise we consider p = x̄/n                         n-2
Poisson Po(λ)
  if they give us λ                                     n-1
  if not, we consider λ = x̄                             n-2
Normal
  if they give us μ, σ                                  n-1
  if they don’t give μ, we consider μ = x̄               n-2
  if they don’t give σ, we consider σ = sn-1            n-2
  if they don’t give both μ, σ (so μ = x̄ , σ = sn-1 )   n-3
(in the d.f. column, n is the number of categories)
EXAMPLE 2
100 people play a game 10 times and count the number of wins:
We test the claim that the game follows the Binomial distribution
B(10,p) with a=0.05
Solution
Since p is not known we have to estimate it:
x midpoint frequency
0 0 15
1 1 30
2 2 30
3 3 15
4-10 7 10
$\bar{x} = 2.05$
Since $np = \bar{x}$, we get $p = \frac{\bar{x}}{n} = \frac{2.05}{10} = 0.205$
We test the claim:
Ho: data follow Binomial distribution B(10,0.205)
H1: data do not follow Binomial distribution B(10,0.205)
The Binomial distribution B(10,0.205) gives the probabilities
0 1 2 3 4-10
0.101 0.260 0.302 0.207 0.130
We enter
observed frequencies in List 1,
expected frequencies in List 2. (multiply probabilities by 100)
EXAMPLE 3
It is claimed that the amount of sugar contained in 1-kg packets is
actually normally distributed with μ=1000. We pick at random 80
packets and record their weight. The results are shown below
packets 10 37 28 5
x midpoint frequency
940-960 950 10
960-1000 980 37
1000-1040 1020 28
1040-1060 1050 5
EXAMPLE 4
It is claimed that the following sample
frequency 10 35 15 20 20
x midpoint frequency
0-4 2 10
5-7 6 35
8 8 15
9-10 9.5 20
11-15 13 20
either a sample of data (we enter data in GDC in LIST 1)
or only the corresponding statistics
  sample mean: x̄
  standard deviation: sx (this is sn-1 )
  size of the sample: n
We use GDC:
Statistics – INTR – (Z or t) – (1 sample)
Z: if we know σ (we have to enter σ ourselves)
t: if we don’t know σ (we use sx instead of σ)
if List: statistics sx , x̄ , n are already there
if Var: we enter σ or sx , x̄ , n ourselves
Execute gives
Lower
Upper
Conclusion
The a% confidence interval for the population mean μ is
[Lower , Upper]
For example, for a 95% confidence interval:
We are 95% confident that the interval will contain the population
mean μ
or otherwise
If we choose a sample 100 times, we expect that in about 95 of them,
the interval will contain the population mean μ
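What the GDC does in the Z case can be sketched in Python as follows. The function name `z_interval` is hypothetical, the 95% level is an assumption, and the numbers n=40, x̄=23, σ=2.8 are illustrative values (the same as in the first example of this section). Only the standard library is used.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # e.g. 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = z_interval(mean=23, sigma=2.8, n=40)
print(round(lower, 2), round(upper, 2))
```

For the t case the same formula applies with sx in place of σ and a critical value from the t distribution with n-1 degrees of freedom in place of z.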
EXAMPLE 1
For a sample of n=40 data, we know that x=23, sn-1 =3. We also
know that the standard deviation of the population is σ=2.8.
Solution
EXAMPLE 2
For a sample of n=20 data, we know that x=23, sn-1 =3.
Solution
EXAMPLE 3
For a sample of n=20 data, we know that x̄=23, sn = 7.
Solution
(a) $s_{n-1}^2 = \frac{n}{n-1}\, s_n^2 = \frac{20}{19}\cdot 7^2 = 51.5789\ldots \approx 51.6$ and $s_{n-1} \approx 7.18$
(b) GDC: Statistics – INTR – t – 1 SAMPLE – Data: Variable
EXAMPLE 4
16 17 17 15 20 16 18 15 21 18 17 16
Solution
either a sample of data (we enter data in GDC in LIST 1)
or only the corresponding statistics
  sample mean: x̄
  standard deviation: sx
  size of the sample: n
We state
[null hypothesis] Ho: μ=μ0
[alternative hypothesis] H1: μ≠μ0 or μ>μ0 or μ<μ0
We use GDC:
Statistics - TEST – (Z or t) – (1 sample)
Z: if we know σ (we have to enter σ ourselves)
t: if we don’t know σ (we use sx instead of σ)
if List: statistics sx , x̄ , n are already there
if Var: we enter σ or sx , x̄ , n ourselves
Execute gives
Zstatistic = z tstatistic =t
p-value
Conclusion
IF p-value < a THEN we reject Ho
(or Zstatistic > Zcritical for Z, or tstatistic > tcritical for t)
Critical values by InvN or Invt
Zcritical: InvN(0,1) – Tail: right
tcritical: t – Invt
Area = a for 1-tailed test
Area = a/2 for 2-tailed test
PAIRED DATA
Mind the difference between
value in month 1 x1 x2 x3 …
value in month 2 y1 y2 y3 …
we find differences x1 – y1 x2 – y2 x3 – y3 …
EXAMPLE 1
For a sample of n=40 data, we know that x=23, sn-1 =3. We also
know that the standard deviation of the population is σ=2.8.
Solution
H1: μ≠24
p-value = 0.0239
Hence μ≠24
H1: μ<24
p-value = 0.0119
Hence μ<24
NOTICE
We can also use the statistic value (against the critical value)
(a) zstatistic = -2.26
(b) zstatistic = -2.26
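The numbers of this example can be reproduced with a short Python sketch. It assumes the null hypothesis Ho: μ=24 and a significance level a=0.05 for the critical values; only the standard library is used.

```python
import math
from statistics import NormalDist

n, sample_mean, sigma, mu0 = 40, 23, 2.8, 24   # mu0 = 24 is assumed from H1
std_normal = NormalDist()

z_stat = (sample_mean - mu0) / (sigma / math.sqrt(n))

p_two_tailed = 2 * std_normal.cdf(-abs(z_stat))   # H1: mu != 24
p_left_tailed = std_normal.cdf(z_stat)            # H1: mu < 24

# Critical values for an assumed a = 0.05 (what InvN gives)
a = 0.05
z_crit_two_tailed = std_normal.inv_cdf(1 - a / 2)
z_crit_one_tailed = std_normal.inv_cdf(1 - a)

print(round(z_stat, 2), round(p_two_tailed, 4), round(p_left_tailed, 4))
print(round(z_crit_two_tailed, 3), round(z_crit_one_tailed, 3))
```

The statistic is the same for both tests; only the p-value and the critical value change with the form of H1. Since the statistic is negative here, its size is what we compare with the critical value.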
EXAMPLE 2
For a sample of n=20 data, we know that x=23, sn-1 =3.
Solution
H1: μ≠24
p-value = 0.152
H1: μ<24
p-value = 0.0762
NOTICE
We can also use the statistic value (against the critical value)
(a) tstatistic = -1.49
(b) tstatistic = -1.49
EXAMPLE 3
16 17 17 15 20 16 18 15 21 18 17 16
Solution
H0: μ=18
H1: μ≠18
p-value = 0.147
NOTICE
We can also use the statistic value (against the critical value)
tstatistic = -1.56
EXAMPLE 4
In January
16 17 17 15 20 16 18 15 21 18 17 16
In March
18 17 19 17 18 16 17 16 21 21 18 17
Use a=0.05
Solution
We have paired data, hence we find the differences and treat them
as 1 sample.
2 0 2 2 -2 0 -1 1 0 3 1 1
H0: d=0
H1: d>0
p-value = 0.0475
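The paired-data step (differences first, then a 1-sample t-test) can be sketched in Python. The sketch only computes the differences and the t statistic; the p-value itself still comes from the GDC (t distribution with n-1 = 11 degrees of freedom). Variable names are illustrative.

```python
import math
import statistics

january = [16, 17, 17, 15, 20, 16, 18, 15, 21, 18, 17, 16]
march = [18, 17, 19, 17, 18, 16, 17, 16, 21, 21, 18, 17]

# Paired data: one difference per individual (March minus January)
differences = [m - j for j, m in zip(january, march)]

n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)      # this is s_{n-1} of the differences

# 1-sample t statistic for H0: mean difference = 0
t_stat = d_bar / (s_d / math.sqrt(n))

print(differences, d_bar, round(t_stat, 2))
```

Treating the two months as independent samples would ignore the pairing and would in general give a different statistic.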
We test λ=λ0 against λ>λ0 or λ<λ0
(only 1-tailed test)
We state
[null hypothesis] Ho: λ=λ0
[alternative hypothesis] H1: λ>λ0 or λ<λ0
We use GDC
Statistics – DIST – Poisson Po(λ0)
$x_{\text{statistic}} = n\bar{x} = \sum x_i$
Conclusion
IF p-value < a THEN we reject Ho
EXAMPLE 1
The number of accidents per day in a certain area follows Poisson.
We record the number of accidents in 5 different days
NOTICE
For (a), since 5×7=35, the accidents per day are 7. We could
also state: Ho: λ=7, H1: λ>7.
But still, for the p-value we consider Po(35).
They could give us only that n=5, x̄ =7.8, instead of the complete
table. The statistic is then Σxi = n x̄ = 39.
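As an illustration of the last remark, the sketch below computes the p-value for Ho: λ=7 per day against H1: λ>7, when only n=5 and x̄=7.8 are given. The helper `poisson_cdf` is hypothetical and plays the role of Pcd on the GDC; only the `math` module is used.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    term = total = math.exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

days, lam0_per_day, x_bar = 5, 7, 7.8

lam0 = days * lam0_per_day        # under Ho the 5-day total follows Po(35)
x_stat = round(days * x_bar)      # statistic = n * x-bar = 39

# H1: lambda > lambda0, so p-value = P(X >= 39) = 1 - P(X <= 38)
p_value = 1 - poisson_cdf(x_stat - 1, lam0)

print(x_stat, round(p_value, 3))
```

Note the `x_stat - 1` in the last step: for a right-tailed test the observed total itself must be included in the tail.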
We test p=p0 against p>p0 or p<p0
(only 1-tailed test)
We state
[null hypothesis] Ho: p=p0
[alternative hypothesis] H1: p>p0 or p<p0
We use GDC
Statistics – DIST – Binomial B(n,po)
xstatistic = x (size of the group)
if H1: p>p0 p-value = P(X ≥ xstatistic )
if H1: p<p0 p-value = P(X ≤ xstatistic )
Conclusion
IF p-value < a THEN we reject Ho
EXAMPLE 1
In a sample of 200 people there are 50 smokers. That is
n=200
x=50
observed proportion p = 50/200 = 0.25
Test the following two claims
(a) the proportion of smokers in the population is 0.30 (i.e. 30%)
(b) the proportion of smokers in the population is 0.20 (i.e. 20%)
Use a=0.05
Solution
For both questions, the statistic is x=50
(a) Ho: p=0.30
H1: p<0.30
We consider B(200, 0.30) and
p-value = P(X≤50) = 0.0695
Since p-value > 0.05 we do not have enough evidence to reject Ho
[we may accept that the proportion is 0.30]
(b) Ho: p=0.20
H1: p>0.20
We consider B(200, 0.20) and
p-value = P(X≥50) = 0.049
Since p-value < 0.05 (only just!) we reject Ho
[we accept that the proportion is greater]
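Both p-values can be checked with a short Python sketch. The helper `binom_cdf` is hypothetical and plays the role of Bcd on the GDC; it uses `math.comb` from the standard library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

n, x_stat = 200, 50

# (a) H1: p < 0.30, so p-value = P(X <= 50) under B(200, 0.30)
p_value_a = binom_cdf(x_stat, n, 0.30)

# (b) H1: p > 0.20, so p-value = P(X >= 50) = 1 - P(X <= 49) under B(200, 0.20)
p_value_b = 1 - binom_cdf(x_stat - 1, n, 0.20)

print(round(p_value_a, 4), round(p_value_b, 3))
```

The same observed value x=50 is used in both tests; what changes is the distribution under Ho and the direction of the tail.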
We state
[null hypothesis] Ho: ρ=0 [there is no correlation]
[alternative hypothesis] H1: ρ≠0 or ρ>0 or ρ<0
We use GDC
Statistics – TEST – t – REG
Execute gives
p-value
Conclusion
IF p-value < a THEN we reject Ho
EXAMPLE 1
Consider the following sample of bivariate data
x 5 10 15 20 25 30 35 40 45 50
y 50 52 55 53 57 57 60 55 58 55
Solution
(a) y = 0.131x + 51.6
(b) r = 0.67
For (c) and (d) we use GDC: Statistics – TEST – t – REG
(c) H0: ρ = 0
H1: ρ ≠ 0
p-value = 0.035
Since p-value < 0.05 we reject H0
Hence, we can support that there is correlation.
(d) H0: ρ = 0
H1: ρ > 0
p-value = 0.018
Since p-value < 0.05 we reject H0
Hence, we can support that there is a positive
correlation.
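The correlation coefficient and the statistic behind this test can be sketched in Python. The sketch assumes the usual formula t = r·√((n-2)/(1-r²)) with n-2 degrees of freedom for the test of ρ=0; the p-value itself still comes from the GDC. Variable names are illustrative.

```python
import math

x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [50, 52, 55, 53, 57, 57, 60, 55, 58, 55]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)               # Pearson correlation coefficient

# Assumed test statistic for H0: rho = 0, with n-2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r * r))
df = n - 2

print(round(r, 2), round(t_stat, 2), df)
```

The 1-tailed p-value is half the 2-tailed one, which agrees with the two p-values 0.035 and 0.018 above.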
For a specific hypothesis H0, the set of values for which we reject
H0 is called the critical region. It has nothing to do with the sample. It
depends only on the significance level a.
We define the critical region only for 3 of the tests we have seen.
CRITICAL REGION
CASES OF H1
by InvN
CASES OF H1     CRITICAL REGION (by Pcd with trial and error)
λ<λ0            X≤r, where Pcd(0 to r) < a
λ>λ0            X≥r, where P(X≥r) < a
CASES OF H1     CRITICAL REGION (by Bcd with trial and error)
p<p0            X≤r, where Bcd(0 to r) < a
p>p0            X≥r, where Bcd(r to n) < a
NOTICE
In the last two cases, Poisson and Binomial, the significance level is
not exactly a, but P(critical region), i.e. what Pcd or Bcd gives
TYPE I ERROR
we use μ0 , λ0 , p0 , that is
Ncd(μ0, σ²/n)    Po(λ0)    B(n,p0)
to find
Prob(CRITICAL REGION)
In fact, this is the significance level a
TYPE II ERROR
we use μ1 , λ1 , p1 , that is
Ncd(μ1, σ²/n)    Po(λ1)    B(n,p1)
to find
Prob(NON-CRITICAL REGION)
(the non-critical region is also called the ACCEPTANCE REGION)
EXAMPLE 1
For a sample of n=40 data, we know that x=23. We also know
that the standard deviation of the population is σ=2.8.
For the test
H0: μ=24
H1: μ≠24
with a=0.05
(a) Find the critical region and the non-critical region.
(b) State the TYPE I error
(c) It was finally found that μ = 22.9. Find the TYPE II error.
Solution
(a) We use $N(\mu_0, \frac{\sigma^2}{n})$, that is Normal with
mean = 24, st. deviation = $\frac{2.8}{\sqrt{40}}$
The critical region is shown below
InvN gives:
CRITICAL REGION: (-∞,23.1) ∪ (24.9, +∞)
NON- CRITICAL REGION: (23.1, 24.9)
(b) It is a=0.05
(c) Given that μ=22.9, we use Ncd with
mean = 22.9, st. deviation = $\frac{2.8}{\sqrt{40}}$
to find P(non-critical region), that is
P(23.1<X<24.9)
The TYPE II error is 0.326
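The three parts of this solution can be followed in a short Python sketch. It uses `statistics.NormalDist` for both InvN and Ncd, and it rounds the boundaries to 1 decimal place before part (c), as the notes do (using the unrounded boundaries would give a somewhat different TYPE II error).

```python
import math
from statistics import NormalDist

n, sigma, mu0, a = 40, 2.8, 24, 0.05
st_dev = sigma / math.sqrt(n)

# (a) Boundaries of the critical region under H0 (InvN with area a/2 per tail)
under_h0 = NormalDist(mu0, st_dev)
lower = under_h0.inv_cdf(a / 2)
upper = under_h0.inv_cdf(1 - a / 2)
lower_r, upper_r = round(lower, 1), round(upper, 1)   # rounded as in the notes

# (b) TYPE I error = probability of the critical region under H0 = a

# (c) TYPE II error = probability of the non-critical region under the true mean
mu_true = 22.9
under_h1 = NormalDist(mu_true, st_dev)
type_2_error = under_h1.cdf(upper_r) - under_h1.cdf(lower_r)

print(lower_r, upper_r, round(type_2_error, 3))
```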
EXAMPLE 2
The number of accidents per day in a certain area follows Poisson.
We record the number of accidents in 5 different days
NOTICE
For the test
Ho: λ=50
H1: λ<50 with a = 0.10
we use Po(50) and find when P(X≤r) < 0.10
CRITICAL REGION: [0, 40] with Prob = 0.086
NON- CRITICAL REGION: [41, +∞)
EXAMPLE 3
In a sample of 200 people there are 50 smokers. That is
n=200
x=50
observed proportion p = 50/200 = 0.25
For the test
Ho: p = 0.30
H1: p < 0.30 with a = 0.05
(a) Find the critical region and the non-critical region.
(b) Find the TYPE I error
(c) It was finally found that p = 0.23. Find the TYPE II error.
Solution
(a) We use B(200, 0.30)
For the critical region we find when P(X≤r) < 0.05
Bcd with trial and error gives r = 48, with P(X≤48) = 0.036
CRITICAL REGION: [0, 48]
NON- CRITICAL REGION: [49, 200]
(b) It is P(X≤48) = 0.036
(c) We use B(200, 0.23)
We find the probability of the non-critical region.
The TYPE II error is P(X≥49) = 0.333
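The trial and error with Bcd can be imitated by a simple loop, as in the Python sketch below. The helper `binom_cdf` is hypothetical (it stands for Bcd) and uses `math.comb` from the standard library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

n, p0, a = 200, 0.30, 0.05

# (a) Trial and error: the largest r with P(X <= r) < a under Ho
r = 0
while binom_cdf(r + 1, n, p0) < a:
    r += 1

# (b) TYPE I error = probability of the critical region [0, r] under Ho
type_1_error = binom_cdf(r, n, p0)

# (c) TYPE II error = probability of the non-critical region [r+1, n]
#     under the true proportion
p_true = 0.23
type_2_error = 1 - binom_cdf(r, n, p_true)

print(r, round(type_1_error, 3), round(type_2_error, 3))
```

Because the distribution is discrete, the TYPE I error is not exactly a=0.05 but the probability of the critical region, which is a little smaller.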