Inferential Statistics - Hypothesis Testing & Estimation: by Alfred Ngwira
A statistical hypothesis is a conjecture or claim about a population parameter (e.g. a population mean or proportion) which may or may not be true. For example: the proportion of girls at Bunda is 30%.
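Hypothesis
A hypothesis test involves two hypotheses:
Null hypothesis
Symbolized by H0, it is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.
Alternative hypothesis
Symbolized by H1, it is a statistical hypothesis that states that there is a difference between a parameter and a specific value, or that there is a difference between two parameters.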
Types of hypothesis
The alternative hypothesis usually contains one of the symbols >, < or ≠, while the null hypothesis H0 contains =.
Hypothesis testing
Type I error (α): the error of rejecting H0 when it is true; α is the probability of making this error.
Hypothesis about the mean
g? Use α=0.05.
Solution
Step 1: Stating the null and alternative hypotheses:
$H_0: \mu = 1.2$
$H_1: \mu \neq 1.2$
Note that we have a two-tailed test, so H0 is rejected when $|z| \geq z_{\alpha/2}$, i.e. when $|z| \geq z_{0.05/2} = z_{0.025} = 1.96$.
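The two-tailed rejection rule can be sketched in Python. This is a minimal illustration that assumes the population standard deviation is known. The function name `z_test_mean` and the sample figures in the usage line are hypothetical, because the example's data are not shown here.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 against H1: mu != mu0, sigma assumed known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}; about 1.96 when alpha = 0.05
    return z, z_crit, abs(z) >= z_crit


# Hypothetical data: sample mean 1.25 g from n = 25 items, sigma = 0.1 g
z, z_crit, reject = z_test_mean(1.25, 1.2, 0.1, 25)
print(f"z = {z:.2f}, z_crit = {z_crit:.2f}, reject H0: {reject}")
```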
Review
– Alpha (α) is the probability of a Type I error, i.e. of rejecting the null hypothesis when it is true.
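The meaning of α can be checked by simulation. When H0 is true, a test at level α should reject in about a fraction α of repeated samples. The sketch below assumes normal data with known σ = 1 and a two-tailed z test. The function name `type1_error_rate` and all settings (n = 30, 10 000 repetitions, the seed) are hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(n=30, alpha=0.05, sims=10_000, seed=1):
    """Estimate P(reject H0 | H0 true) for a two-tailed z test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(sims):
        # H0 is true: data come from N(0, 1) and we test mu = 0 with sigma = 1 known
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / sims


print(type1_error_rate())  # should be close to alpha = 0.05
```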
Example
$t = 1.833$
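Hypothesis about difference between two means
The hypotheses can be formulated in three ways, i.e.
1) $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
   $H_1: \mu_1 \neq \mu_2$ or $\mu_1 - \mu_2 \neq 0$
2) $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
   $H_1: \mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$
3) $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
   $H_1: \mu_1 < \mu_2$ or $\mu_1 - \mu_2 < 0$
Note (1) is a two-tailed test, while (2) and (3) are one-tailed hypothesis formulations.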
Now the appropriate statistic is the difference between the sample means, $\bar X_1 - \bar X_2$.
When the population variances are estimated by the sample variances $S_1^2, S_2^2$, we use a t-test with the smaller of $n_1 - 1$ and $n_2 - 1$ degrees of freedom:
$t = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
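The unpooled t statistic and the conservative degrees-of-freedom rule can be sketched as follows. `unpooled_t` is a hypothetical helper and the summary statistics in the usage line are made up. Statistical software often uses the Welch–Satterthwaite degrees of freedom instead. The "smaller of $n_1 - 1$ and $n_2 - 1$" rule used in these notes is the more conservative choice.

```python
from math import sqrt


def unpooled_t(x1, s1, n1, x2, s2, n2):
    """t statistic for H0: mu1 = mu2 using separate sample variances.

    Degrees of freedom follow the conservative rule in the text:
    the smaller of n1 - 1 and n2 - 1.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (x1 - x2) / se
    df = min(n1 - 1, n2 - 1)
    return t, df


# Hypothetical summary statistics: means, standard deviations, sample sizes
t, df = unpooled_t(10.0, 2.0, 4, 8.0, 2.0, 6)
print(f"t = {t:.3f} with {df} degrees of freedom")
```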
Note that the Z and t tests just defined for the difference between two means assume that the two population variances are not equal, i.e. $\sigma_1^2 \neq \sigma_2^2$.
If $n_1, n_2 \geq 30$ and the common variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is estimated from $S_1^2, S_2^2$, then the Z statistic is
$Z = \frac{\bar X_1 - \bar X_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Step 2: Since the sample sizes are less than 30, we use the t test statistic, i.e.
$t = \frac{\bar X_1 - \bar X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
with the smaller of $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
Note here we assume that the two population variances are equal, so H0 is rejected when $|t| \geq t_{\alpha/2, n_1 + n_2 - 2}$.
Step 4: Test statistic calculation
$s = \sqrt{\frac{(5-1)7.33 + (7-1)4.32}{5+7-2}} = 2.351$
$t = \frac{\bar x_1 - \bar x_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{3.96 - 5.29}{2.351\sqrt{\frac{1}{5} + \frac{1}{7}}} = \frac{-1.33}{1.377} \approx -0.966$
Step 5: Conclusion: since $|t| = 0.966 < t_{0.025,10} = 2.228$, we fail to reject H0.
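The pooled two-sample t test in this example can be reproduced with a short sketch. It assumes that 7.33 and 4.32 are the two sample variances, as the Step 4 calculation suggests. `pooled_t` is a hypothetical helper. The critical value 2.228 is entered by hand from t tables, because Python's standard library has no t quantile function.

```python
from math import sqrt


def pooled_t(x1, var1, n1, x2, var2, n2):
    """Pooled two-sample t statistic, assuming equal population variances."""
    df = n1 + n2 - 2
    s = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)  # pooled standard deviation
    t = (x1 - x2) / (s * sqrt(1 / n1 + 1 / n2))
    return t, s, df


t, s, df = pooled_t(3.96, 7.33, 5, 5.29, 4.32, 7)
t_crit = 2.228  # t_{0.025,10} taken from tables
print(f"s = {s:.3f}, t = {t:.3f}, df = {df}, reject H0: {abs(t) >= t_crit}")
```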
Examples
– Ratio of males to the total in a statistics class (400/635)
H1: p 0.04
(1) $H_0: P_1 = P_2$ or $H_0: P_1 - P_2 = 0$
$H_1: P_1 \neq P_2$ or $H_1: P_1 - P_2 \neq 0$
Hypothesis about difference
between population proportions
(3) $H_0: P_1 = P_2$ or $H_0: P_1 - P_2 = 0$
$H_1: P_1 < P_2$ or $H_1: P_1 - P_2 < 0$
$\bar P = \frac{n_1\hat P_1 + n_2\hat P_2}{n_1 + n_2}$, where $\bar P$ is the pooled sample proportion.
Example
The test statistic is $Z = \frac{\hat P_1 - \hat P_2}{\sqrt{\bar P(1 - \bar P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
We reject H0 at α = 0.05 when $|Z| \geq z_{0.025} = 1.96$, i.e. when $Z \geq 1.96$ or $Z \leq -1.96$.
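The pooled two-proportion Z test can be sketched as follows. `two_proportion_z` is a hypothetical helper. The counts in the usage line are made up, since the full data for the examples above are not shown.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


z = two_proportion_z(50, 100, 40, 100)  # hypothetical counts
print(f"Z = {z:.3f}, reject H0 at alpha = 0.05: {abs(z) >= 1.96}")
```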
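$MST = \frac{\sum_{j=1}^{k} n_j(\bar x_j - \bar x)^2}{k-1}$, and under H0 the ratio $F = \frac{MST}{MSE}$ follows an $F_{k-1, N-k}$ distribution.
The within-group (error) variation is
$MSE = \frac{\sum_{j=1}^{k}(n_j - 1)s_j^2}{N-k} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{j,i} - \bar x_j)^2}{N-k}$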
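The MST, MSE and F formulas can be sketched in Python for raw data. This is a minimal illustration that assumes each group is given as a list of observations. `one_way_anova` is a hypothetical helper and the toy data are made up. The critical F value still has to be looked up in tables, because Python's standard library has no F quantile function.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (MST, MSE, F) for a list of samples, one list per group."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group (treatment) sum of squares: sum_j n_j (xbar_j - xbar)^2
    sst = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares: sum_j sum_i (x_ji - xbar_j)^2
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    mst = sst / (k - 1)
    mse = sse / (N - k)
    return mst, mse, mst / mse


mst, mse, f = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # toy data
print(f"MST = {mst}, MSE = {mse}, F = {f}")
```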
Weight in kgs
$n_1 = n_2 = n_3 = 6$, $N = n_1 + n_2 + n_3 = 18$
$MST = \frac{6(195.8 - 177.5)^2 + 6(175 - 177.5)^2 + 6(161.6 - 177.5)^2}{3 - 1} \approx 1779.4$
Calculating the F statistic we have
$F = \frac{MST}{MSE} = \frac{1779.4}{216.91} = 8.2$
Step 5: Conclusion: since the calculated F = 8.2 is greater than the critical value $F_{0.05,2,15} = 3.68$, we reject the null hypothesis, i.e. there is a difference among the treatment means.
The F test to compare means is based on an analysis of variance (ANOVA), which splits the total variation in $y_{ij}$ into different sources, i.e. variation due to groups and variation due to error:
TSS = SST + SSE
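The decomposition TSS = SST + SSE can be checked numerically. The sketch below uses a hypothetical helper `sums_of_squares` and made-up groups of unequal size. It computes all three sums of squares directly from their definitions so that the identity can be verified.

```python
from statistics import fmean


def sums_of_squares(groups):
    """Return (TSS, SST, SSE) for one-way ANOVA data."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    tss = sum((x - grand_mean) ** 2 for x in all_values)  # total variation
    sst = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)  # between groups
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)  # within groups
    return tss, sst, sse


tss, sst, sse = sums_of_squares([[4.1, 5.0, 6.2], [7.3, 8.1], [2.0, 3.5, 3.9, 4.4]])
print(f"TSS = {tss:.4f}, SST + SSE = {sst + sse:.4f}")
```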
$F = \frac{MSB}{MSE} = \frac{145.90}{9.87} = 14.78$
Critical F-value: $F_{\alpha,k-1,N-k} = F_{0.05,2,50} = 3.183$
Cross table: rows are levels 1, 2, ..., I of X1 (columns are levels of X2).
Testing for association in
cross tables
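We wish to test whether there is an association between X1 and X2. The hypotheses are:
H0: there is no association between X1 and X2 (they are independent)
H1: there is an association between X1 and X2
Under H0, the test statistic approximately follows a $\chi^2$ distribution with $(I-1)(J-1)$ degrees of freedom.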
The null hypothesis H0 is rejected when $\chi^2 \geq \chi^2_{(I-1)(J-1),\alpha}$.
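Example
Observed counts are cross-classified by Status (A, B), with row totals.
The test statistic is $\chi^2 = \sum \frac{(O-E)^2}{E}$, where each expected count is $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$.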
Thus we have:
$\chi^2 = \frac{(25 - 13.125)^2}{13.125} + \frac{(10 - 21.875)^2}{21.875} + \frac{(5 - 16.875)^2}{16.875} + \frac{(40 - 28.125)^2}{28.125}$
$= 10.74 + 6.45 + 8.36 + 5.01 \approx 30.56$
We reject H0 when $\chi^2 \geq \chi^2_{1,0.05} = 3.84$. Since the calculated $\chi^2$ exceeds 3.84, we reject H0 and conclude that the two variables are associated.
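The chi-square test of association can be sketched for any I × J table. The sketch uses the observed counts 25, 10, 5 and 40 from the worked calculation. Arranging them as [[25, 10], [5, 40]] is an assumption, but it reproduces the expected counts 13.125, 21.875, 16.875 and 28.125 used there. `chi_square_independence` is a hypothetical helper. With these counts the statistic comes to about 30.56.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and df for an I x J table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence([[25, 10], [5, 40]])
print(f"chi2 = {chi2:.2f} on {df} df; reject H0 at 5%: {chi2 >= 3.84}")
```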
Response | 1 | 2 | 3 | Total
Percent (under H0) | 29% | 39% | 32% | 100%
Observed (O) | 82 | 64 | 54 | 200
Expected (E) | 58 | 78 | 64 | 200
Chi-square goodness of fit
Solution:
$\chi^2 = \sum \frac{(O-E)^2}{E} = \frac{(82 - 58)^2}{58} + \frac{(64 - 78)^2}{78} + \frac{(54 - 64)^2}{64}$
$= 9.93 + 2.51 + 1.56 \approx 14$
We reject H0 when $\chi^2 \geq \chi^2_{0.01,2} = 9.210$. Since $\chi^2 \approx 14 > 9.210$, we reject H0.
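The goodness-of-fit calculation can be reproduced with a short sketch. `chi_square_gof` is a hypothetical helper. The observed and expected counts are taken from the table above, and the 1% critical value 9.210 is entered by hand from tables.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic and df = (categories - 1)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


chi2, df = chi_square_gof([82, 64, 54], [58, 78, 64])
print(f"chi2 = {chi2:.2f} on {df} df; reject H0 at 1%: {chi2 >= 9.210}")
```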
of 40 bales.
Interval estimation for
population mean
Solution
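$\left(\bar X - z_{\alpha/2}\frac{S}{\sqrt n},\ \bar X + z_{\alpha/2}\frac{S}{\sqrt n}\right) = \left(20 - 1.96\frac{2}{\sqrt{40}},\ 20 + 1.96\frac{2}{\sqrt{40}}\right) = (19.38, 20.62)$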
This means that, with 95% confidence, the mean weight of tobacco in 2015 was between 19.38 and 20.62 tonnes.
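The large-sample interval $\bar X \pm z_{\alpha/2} S/\sqrt n$ can be sketched as follows. The sketch assumes S = 2, n = 40 and $\bar X = 20$, as read from the calculation above. `z_interval_mean` is a hypothetical helper. It uses the exact normal quantile (about 1.95996) rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, s, n, confidence=0.95):
    """Large-sample confidence interval xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


low, high = z_interval_mean(20, 2, 40)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```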
Interval estimation of population
mean difference
If we do not sample from normal populations but the sample sizes are large enough, i.e. $n_1, n_2 \geq 30$, and $\sigma_1, \sigma_2$ are unknown and estimated by the sample standard deviations $S_1, S_2$, then (case 2)
$se(\bar X_1 - \bar X_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
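But if the sample sizes are small, i.e. less than 30, and we estimate $\sigma_1, \sigma_2$ by $S_1, S_2$ (assuming normal populations with equal variances), then we use the t distribution to obtain the CI, i.e. the CI is
$\left((\bar X_1 - \bar X_2) - t_{\alpha/2,df}\,se(\bar X_1 - \bar X_2),\ (\bar X_1 - \bar X_2) + t_{\alpha/2,df}\,se(\bar X_1 - \bar X_2)\right)$
where $se(\bar X_1 - \bar X_2) = S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$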
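$df = n_1 + n_2 - 2$
Note: when the t distribution is used with the unpooled standard error $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$, we take the degrees of freedom as the smaller of $n_1 - 1$ and $n_2 - 1$.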
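The pooled small-sample interval can be sketched as follows. `pooled_t_interval` is a hypothetical helper and the data in the usage line are made up. The critical value $t_{0.025,14} = 2.145$ is entered from tables, because Python's standard library has no t quantile function.

```python
from math import sqrt


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 using the pooled S, with df = n1 + n2 - 2.

    t_crit = t_{alpha/2, n1+n2-2} must be supplied, e.g. from tables.
    """
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se = s_pooled * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical data: two samples of size 8, t_{0.025,14} = 2.145
low, high = pooled_t_interval(12.0, 3.0, 8, 10.0, 3.0, 8, t_crit=2.145)
print(f"95% CI for mu1 - mu2: ({low:.4f}, {high:.4f})")
```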
Example
was 0.148.
For an independent random sample of 417
firms that did not revalue their fixed
assets, the mean ratio of debt to tangible
assets was 0.489 and the sample
standard deviation was 0.159. Find a 99%
confidence interval for the difference
between the two population means.
Solution
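$(\bar X - \bar Y) \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}} = (0.517 - 0.489) \pm 2.575\sqrt{\frac{0.148^2}{190} + \frac{0.159^2}{417}}$
$= 0.028 \pm 0.034$
Or (-0.0062, 0.0622)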
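The 99% interval above can be reproduced with a short sketch. It uses the sample standard deviations in place of $\sigma_X, \sigma_Y$, as the solution does. `z_interval_diff` is a hypothetical helper. It uses the exact normal quantile (about 2.5758) instead of the rounded 2.575, which does not change the result at four decimals.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2 with se = sqrt(s1^2/n1 + s2^2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se


low, high = z_interval_diff(0.517, 0.148, 190, 0.489, 0.159, 417, confidence=0.99)
print(f"99% CI: ({low:.4f}, {high:.4f})")
```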
Example
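$n_1 = 10$, $n_2 = 10$
$(\bar x_1 - \bar x_2) \pm t_{\alpha/2,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, with $df = \min(n_1 - 1, n_2 - 1) = 9$
$= (83256 - 88354) \pm 2.262\sqrt{\frac{3256^2}{10} + \frac{2341^2}{10}}$
$= -5098 \pm 2868.535$
$= (-7967, -2229)$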
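The example can be reproduced with the unpooled standard error. The sketch assumes that 3256 and 2341 are the sample standard deviations and that 2.262 is $t_{0.025,9}$ from tables. `t_interval_diff_unpooled` is a hypothetical helper.

```python
from math import sqrt


def t_interval_diff_unpooled(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 with unpooled se; t_crit uses df = min(n1 - 1, n2 - 1)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se


low, high = t_interval_diff_unpooled(83256, 3256, 10, 88354, 2341, 10, t_crit=2.262)
print(f"95% CI: ({low:.0f}, {high:.0f})")
```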
Interval estimation for
population proportion
Just as we had interval estimation for
population mean we can have interval
estimation for population proportion.
FISP supporters.
Solution
$\hat P \pm z_{\alpha/2}\sqrt{\frac{\hat P(1 - \hat P)}{n}} = 0.68 \pm z_{0.05/2}\sqrt{\frac{0.68(1 - 0.68)}{805}}$
$= 0.68 \pm 1.96\sqrt{\frac{0.68(1 - 0.68)}{805}}$
$= (0.648, 0.712)$
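The interval for a single proportion can be sketched as follows, using $\hat P = 0.68$ and n = 805 from the example. `proportion_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample CI p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(0.68, 805)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```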
Confidence interval for
proportion difference
Here we wish to construct an interval estimate for $P_1 - P_2$, whose point estimate is $\hat P_1 - \hat P_2$.
Note: if the sample sizes are large, i.e. $n_1, n_2 \geq 30$, the CI is
$\left(\hat P_1 - \hat P_2 - z_{\alpha/2}\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}},\ \hat P_1 - \hat P_2 + z_{\alpha/2}\sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}\right)$
where $P_1, P_2$ are approximated by the sample values $\hat P_1, \hat P_2$.
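The interval for a difference of proportions can be sketched as follows, with $P_1, P_2$ replaced by the sample proportions as described above. `proportion_diff_interval` is a hypothetical helper. The proportions and sample sizes in the usage line are made up, since the data for the ear-infection example are incomplete here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_interval(p1, n1, p2, n2, confidence=0.95):
    """Large-sample CI for P1 - P2 using the sample proportions in the standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = proportion_diff_interval(0.5, 100, 0.4, 100)  # hypothetical data
print(f"95% CI for P1 - P2: ({low:.4f}, {high:.4f})")
```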
ear infections.
Another sample of 159 children took xylitol
Hypothesis formulation:
$H_0: \mu = 17$ versus
$H_1: \mu \neq 17$
Two tailed hypothesis test using
CI
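95% confidence interval for the mean weight is
$\left(\bar X - z_{\alpha/2}\frac{S}{\sqrt n},\ \bar X + z_{\alpha/2}\frac{S}{\sqrt n}\right) = \left(20 - 1.96\frac{2}{\sqrt{40}},\ 20 + 1.96\frac{2}{\sqrt{40}}\right) = (19.38, 20.62)$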
Now since the interval does not contain the hypothesized value 17, we reject H0 at the 5% level of significance.
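The duality between a two-tailed test at level α and a (1 − α) confidence interval can be expressed in a few lines. `reject_by_interval` is a hypothetical helper. It rejects H0 exactly when the hypothesized value lies outside the interval.

```python
def reject_by_interval(interval, mu0):
    """Two-tailed test at level alpha via a (1 - alpha) CI: reject H0 if mu0 lies outside."""
    low, high = interval
    return not (low <= mu0 <= high)


ci = (19.38, 20.62)
print(reject_by_interval(ci, 17))  # True: 17 lies outside the interval
```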
The End