Professional Documents
Culture Documents
Lecture 7 April 24
Lecture 7 April 24
Michele Guindani
Summer 2022
Example
The chief operation officer reports that the zinc percentage should be centered
around m0 = 4.75.
They also mentioned that - in any case - the percentage should not be less
thant 4.5% or more than 5%.
How can we decide if the zinc percentage is within specification?
λ1 = θ1 M1 λ 2 = θ2 M2
Is θ1 (the incidence rate of breast cancer in the first group) lower than
θ2 ?
Example
Walmart wants to test if self-checkout is faster than using a cashier
(traditional method).
Example
Walmart wants to test if self-checkout is faster than using a cashier
(traditional method).
Let θ be the proportion of people that check out faster with self-checkout.
H0 : θ ≤ 30 and H1 : θ > 30
H0 : θ1 ≥ θ2 and H1 : θ1 < θ2
H0 : θ1 ≥ θ2 and H1 : θ1 < θ2
H0 : θ1 = θ2 and H1 : θ1 6= θ2
λ1 = θ1 M1 λ2 = θ2 M2
H 0 : θ1 ≤ θ2 and H1 : θ1 > θ2
There are four possible outcomes when making hypothesis test decisions
from sample data:
⚠️ Big issue: we can’t minimize both types of errors at the same time!!
⚠️ Big issue: we can’t minimize both types of errors at the same time!!
and then
minimize the probability of Type II error: β
⚠️ This means that we believe that one error is more serious than the other,
so we want to control it!
😳😱
😳😱 🤯
😳😱 🤯
ñ No worries, we have already seen examples of test statistics, e.g. for
testing H0 : µ = µ0
X̄ − µo
Z= √ ∼ N (0, 1)
σ/ n
😳😱 🤯
ñ No worries, we have already seen examples of test statistics, e.g. for
testing H0 : µ = µ0
X̄ − µo
😍
Z= √ ∼ N (0, 1)
σ/ n
😳😱 🤯
ñ No worries, we have already seen examples of test statistics, e.g. for
testing H0 : µ = µ0
X̄ − µo
😍
Z= √ ∼ N (0, 1)
σ/ n
⚠️ The choice of the test statistic depends on the problem and questions we
are facing.
🤯
No worries, this simply means:
If the null hypothesis is true, how likely it is that your data would have
occurred by random chance?
🤯
No worries, this simply means:
If the null hypothesis is true, how likely it is that your data would have
occurred by random chance?
The last step in Hypothesis testing is to take the decision to reject or fail
to reject the null hypothesis based on the p-value and the significance
value α:
Decision rule:
If p-value ≤ α ñ Reject H0
H0 : θ ≤ 30 and H1 : θ ≥ 30
statsmodels.stats.weightstats.ztes
H0 : µ = 500 vs HA : µ 6= 500
Michele Guindani
Summer 2022
H0 : θ1 ≥ θ2 and H1 : θ1 < θ2
H0 : θ1 ≥ θ2 and H1 : θ1 < θ2
(p̂A − p̂B ) − 0
Z= r
p̂(1 − p̂) n11 + n12
p̂ = (p1 n1 + p2 n2 ) / (n1 + n2 )
In this case, the number of measurements pre- and post- are the same.
The test proceeds by reducing the two samples to only one by calculating
the difference between the two observations for each pair.
Non-normal data
Michele Guindani
Summer 2022
The nonparametric tests are in general less powerful (need more evidence to
reject the null)
M. Guindani (Summer 2022) Probability & Statistics for DS & AI 2022 48 / 88
Probability & Statistics for DS & AI
Analysis of Variance
Michele Guindani
Summer 2022
Then the null and alternative hypotheses for one-way ANOVA for
comparing 3 groups are given by
H0 : µ1 = µ2 = µ3
HA : Not all µ values are equal