
Inferential Statistics - Hypothesis Testing & Estimation

By
Alfred Ngwira
Inferential Statistics

• We will make conclusions about population parameters using sample statistics. Specifically, we will be:

1. Testing hypotheses about population parameters using sample statistics

2. Estimating population parameters using sample statistics.
Hypothesis

• A statistical hypothesis is a conjecture/claim about a population parameter (e.g. a population mean or proportion) which may or may not be true. E.g. the proportion of girls at Bunda is 30%.
Hypothesis

• Statistical hypothesis testing is a decision-making process for evaluating claims about a population parameter using the sample.
Hypothesis
Examples

1. The mean temperature at Salima town is less than 35°C.

2. The mean grade point average of graduating students at a university is at least 2.3.

3. The mean income for LUANAR graduates when employed is MK150,000 per month.
Types of hypothesis

Null hypothesis

• Symbolized by Ho, it is a statistical hypothesis stating that there is no difference between a parameter and a specific value, or that there is no difference between two parameters.

• The null hypothesis contains an equal sign. E.g. 2 and 3 on the previous slide are null hypotheses.
Types of hypothesis

Alternative hypothesis
• Symbolized by H1, it is a statistical hypothesis stating that there is a difference between a parameter and a specific value, or that there is a difference between two parameters.
Types of hypothesis
• The alternative hypothesis usually contains the symbol >, <, or ≠.

• E.g. 1 on the previous slide is an alternative hypothesis.
Hypothesis testing
procedure

Step 1: Identify H0 and H1

H0 will contain =

H1 will contain >, <, or ≠

Step 2: Select the test statistic, e.g. z, t, F, or chi-square, based on the distribution of the sample statistic.
Hypothesis testing Procedure

Step 3: Use a given level of significance (α, the Type I error probability) to determine the critical/rejection region(s); use α to find the critical value/point from statistical tables.
Hypothesis testing Procedure

• E.g. for a right tailed t test, the rejection region is the area α in the right tail of the t distribution.
Hypothesis testing
procedure
Step 4: Calculate the test statistic from
sample data

Step 5: Make your decision. If the test


statistic falls in the critical region, reject H0.
If the test statistic does not fall in the
critical region, do not reject H0. Interpret
your decision in terms of the claim.
Hypothesis testing

Types of errors in conclusions made in hypothesis testing
• Type I error (α): the error/probability of rejecting Ho when it should not be rejected.

• Type II error (β): the error/probability of failing to reject ('accepting') Ho when it is false.
Hypothesis about the mean

• The following are possible hypothesis formulations about the mean:

1) Ho: μ = μ0 vs H1: μ ≠ μ0
2) Ho: μ = μ0 vs H1: μ > μ0
3) Ho: μ = μ0 vs H1: μ < μ0

• Note: μ = population mean, μ0 = specific hypothesised value.
Hypothesis about the mean
• The first is a two tailed, the 2nd a right tailed and the 3rd a left tailed hypothesis formulation.

• Two tailed because there are two directions of the alternative (right/left).

• Right tailed because the direction of the alternative is to the right, and left tailed because the direction of the alternative is to the left.
Hypothesis about mean

Example of two tailed hypothesis formulation

Ho: Average burley tobacco yield in 2014 was 50 000 000 kg

H1: Average burley tobacco yield in 2014 was not 50 000 000 kg
Hypothesis about mean

Example of right tailed hypothesis formulation

Ho: Average burley tobacco yield in 2014 was 50 000 000 kg

H1: Average burley tobacco yield in 2014 was more than 50 000 000 kg
Hypothesis about mean

Example of left tailed hypothesis formulation

Ho: Average burley tobacco yield in 2014 was 50 000 000 kg

H1: Average burley tobacco yield in 2014 was less than 50 000 000 kg
Hypothesis about the mean

• Note: the direction of the alternative hypothesis determines the type of hypothesis test, i.e. whether it is two tailed, right tailed or left tailed.

• Two tailed means that when testing such a hypothesis formulation there will be two rejection regions (to the right and left of the distribution of the test statistic).
Hypothesis about the mean

• E.g. if we use a z test for a two tailed hypothesis, the two rejection regions lie in the two tails of the z distribution.
Hypothesis about mean

• A right tailed hypothesis test has a rejection region only to the right of the distribution of the test statistic.
Hypothesis about mean

• A left tailed hypothesis test has a rejection region to the left of the distribution of the test statistic.
Hypothesis about mean
Activity

Consider the hypothesis formulation below:

Ho: Mean maize yield per hectare was 8 bags in 2015/2016 versus

H1: Mean maize yield per hectare was less than 8 bags
Hypothesis about mean

1. Determine whether the test of hypothesis


will be left tailed/right tailed/two tailed.

2. Determine direction of rejection

region(s)
Hypothesis about the mean

• Note that the critical points marking the rejection regions in a two tailed test are based on α/2, while the critical point marking the rejection region in a one tailed test is based on the whole α.
Hypothesis about the mean
Activity
Mark T/F
1. In a two tailed test there are two rejection regions.
2. To get the critical values marking the rejection regions in a two tailed test we use alpha (α) divided by two.
3. In a one tailed test we get the critical value marking the rejection region by using alpha (α).
Hypothesis about the mean

• The sample statistic used to test such hypotheses is the sample mean (X̄):

1. If sampling from a normal population with known population standard deviation σ, use the z-test (1).

2. If the sample size is large, i.e. n ≥ 30, the sample is from any population, and σ is estimated by the sample standard deviation S, use the z-test (2).
Hypothesis about the mean

3. If sampling from a normal population, the sample size is small, i.e. n < 30, and the population standard deviation is estimated by the sample standard deviation, use the t-test (3) with n-1 degrees of freedom.

1. z = (X̄ - μ0)/(σ/√n)    2. z = (X̄ - μ0)/(S/√n)    3. t = (X̄ - μ0)/(S/√n)
Hypothesis about the mean
Example
 Full-time PhD students receive an average
salary of MK12,837 according to the
Department of Education. The dean of
graduate studies at the university feels that
PhD students earn more than this.
Hypothesis about the mean

• He selects 44 students randomly and finds their average salary is MK14,445 with a standard deviation of MK1,500. With α = 0.05, is the dean correct?
Testing hypothesis about mean
Solution
Step 1 Stating null and alternative hypothesis:
Ho: μ = MK12,837
H1: μ > MK12,837
– Thus we have a right tailed test. Our rejection region is to the right. Ho will be rejected if the sample mean is far to the right.
Hypothesis about the mean
Step 2: The sample statistic is the sample mean X̄, and its distribution is approximately normal since the sample size is greater than 30; so we use the z test statistic Z = (X̄ - μ0)/(S/√n) after standardization.
Hypothesis test about the mean

Step 3: Critical value: we reject Ho when z ≥ z_α, i.e. when z ≥ z0.05 = 1.65
Hypothesis about the mean

Step 4: Calculating the statistic using the sample data,

Z = (X̄ - μ0)/(S/√n) = (14445 - 12837)/(1500/√44) ≈ 7.11

Step 5: Conclusion: since z > 1.65, we reject Ho and adopt the alternative H1, i.e. PhD students earn more than MK12,837 based on the available data.
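A minimal sketch in Python of this right tailed z test, assuming scipy is available (values taken from the example above):

from math import sqrt
from scipy.stats import norm

x_bar, mu0, s, n, alpha = 14445, 12837, 1500, 44, 0.05
z = (x_bar - mu0) / (s / sqrt(n))    # test statistic, about 7.11
z_crit = norm.ppf(1 - alpha)         # right tailed critical value, about 1.645 (1.65 in the slides)
p_value = norm.sf(z)                 # P(Z >= z), far below 0.05
print("Reject Ho" if z >= z_crit else "Fail to reject Ho")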
Hypothesis about the population
mean
Example

A nutritionist believes a 12 g box of breakfast cereal contains an average of 1.2 g of bran. The nutritionist takes a random sample of sixty boxes of a popular cereal. She finds a sample mean of 1.170 g and a standard deviation of s = 0.111 g.
Hypothesis about the mean
Do the data indicate that the mean bran content of all boxes of this brand of cereal differs from 1.2 g? Use α = 0.05.

Solution
Step 1: Stating the null and alternative hypothesis:

Ho: μ = 1.2
H1: μ ≠ 1.2
Hypothesis test about mean
Note that we have a two tailed test.

Step 2: The sample statistic is X̄ and it is approximately normal since n > 30; its standard form is Z = (X̄ - μ0)/(S/√n)

Step 3: Rejection criteria: we reject Ho when |z| ≥ z(α/2) = z0.025 = 1.96, i.e. when Z ≥ 1.96 or Z ≤ -1.96
Hypothesis test about mean
 i.e when z is in the right rejection region or
left
Hypothesis test about mean
Step 4: Now Z = (X̄ - μ0)/(S/√n) = (1.170 - 1.2)/(0.111/√60) ≈ -2.09

Step 5: Conclusion: since z < -1.96, we reject the null, Ho, i.e. the mean bran content is different from 1.2 g.
Hypothesis test about mean

• Review
– Alpha (α) is the Type I error: the probability of rejecting the null when it should not be rejected.

– Alpha (α) is the area of the rejection region.
Hypothesis about the mean
 Review
Hypothesis test about mean

Review

• For a right tailed test, α = P(Z ≥ critical value/point) if using a Z test, or α = P(t ≥ critical value) if using a t test.

• For a left tailed test, α = P(Z ≤ -critical value) for a Z test.
Hypothesis test about mean

Review

• For a left tailed test, α = P(Z ≤ -critical value) for a Z test and α = P(t ≤ -critical value) for a t test.
Hypothesis test about the mean

Review on alpha (α)
Hypothesis test about mean - Review
• For a two tailed test there are two rejection regions (right/left).

• The total sum of the areas of the two regions is α.

• That is, each rejection region = α/2 in area or probability.
Hypothesis test about mean
Review
• The two rejection regions together are alpha (α) in area.
Hypothesis test about mean

Review

 For a two tailed test, if you find a positive


test statistic compare it with positive
critical value and reject Ho if statistic is ≥
positive critical, otherwise fail to reject Ho
Hypothesis test about mean

 If you find a negative statistic, compare it


with negative critical value, and reject Ho if

statistic is ≤ negative critical, otherwise fail


to reject Ho(see e.g before this review)
Hypothesis test about mean

Example

The average rainfall during the summer months for the southern region of Malawi is 11.52 mm. A researcher selects a random sample of 10 districts in southern Malawi and finds that the average amount of rainfall for 2014 is 7.42 mm.
Hypothesis test about mean

The standard deviation of the sample is 1.3 mm. At α = 0.05, can it be concluded that for 2014 the mean rainfall was below 11.52 mm?
Solution
Step 1: Ho: μ = 11.52
        H1: μ < 11.52
Hypothesis about mean

Step 2: Since n < 30, the test statistic is t = (X̄ - μ0)/(S/√n) with n-1 degrees of freedom

Step 3: Critical value: we reject Ho when t ≤ -t(α, n-1), i.e. when t ≤ -t0.05,9 = -1.833

Hypothesis about mean


Step 4: Now t = (X̄ - μ0)/(S/√n) = (7.42 - 11.52)/(1.3/√10) ≈ -9.97

Step 5: Conclusion: since t is less than the critical value (-9.97 < -1.833), we reject Ho and adopt H1, i.e. the mean rainfall is below 11.52 mm.
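A minimal sketch in Python of this left tailed t test, assuming scipy is available (values taken from the example above):

from math import sqrt
from scipy.stats import t as t_dist

x_bar, mu0, s, n, alpha = 7.42, 11.52, 1.3, 10, 0.05
df = n - 1
t_stat = (x_bar - mu0) / (s / sqrt(n))   # about -9.97
t_crit = t_dist.ppf(alpha, df)           # left tailed critical value, about -1.833
print("Reject Ho" if t_stat <= t_crit else "Fail to reject Ho")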
Hypothesis about difference
between two means
• Possible hypothesis formulations:

1) Ho: μ1 = μ2 or μ1 - μ2 = 0
   H1: μ1 ≠ μ2 or μ1 - μ2 ≠ 0

2) Ho: μ1 = μ2 or μ1 - μ2 = 0
   H1: μ1 > μ2 or μ1 - μ2 > 0
Hypothesis about difference
between two means
3) Ho: μ1 = μ2 or μ1 - μ2 = 0
   H1: μ1 < μ2 or μ1 - μ2 < 0

• Note: (1) is a two tailed test while (2) & (3) are one tailed test hypothesis formulations.
Hypothesis about difference
between two means
• Now the appropriate statistic is the difference between the sample means, X̄1 - X̄2.

• If we sample from normal populations or n1, n2 ≥ 30, X̄1 - X̄2 is also normal, and thus by standardization

Z = [X̄1 - X̄2 - (μ1 - μ2)] / se(X̄1 - X̄2) = [X̄1 - X̄2 - (μ1 - μ2)] / √(σ1²/n1 + σ2²/n2) = (X̄1 - X̄2) / √(σ1²/n1 + σ2²/n2)

(the last form applies under Ho: μ1 - μ2 = 0).
Hypothesis about difference
between two means
• Note: if n1, n2 ≥ 30 and we don't know the population variances σ1², σ2², we can use the sample variances S1², S2² and still use the z-test statistic:

Z = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2)
Hypothesis about difference
between two means
• Note: if n1, n2 < 30 and we don't know the population variances σ1², σ2², we can use the sample variances S1², S2² and use the t-test, with the smaller of n1 - 1 and n2 - 1 as the degrees of freedom:

t = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2)
Hypothesis about difference
between two means
• Note that the Z and t tests just defined for the difference between two means assume that the two population variances are not the same, i.e. σ1² ≠ σ2².

• If we assume the population variances are equal, σ1² = σ2² = σ², then we have the Z statistic as:
Hypothesis about the difference
between two means
• Under the equal population variance assumption, i.e. σ1² = σ2² = σ²,

Z = (X̄1 - X̄2) / √(σ1²/n1 + σ2²/n2)
  = (X̄1 - X̄2) / √(σ²/n1 + σ²/n2)
  = (X̄1 - X̄2) / √(σ²(1/n1 + 1/n2))
  = (X̄1 - X̄2) / [σ√(1/n1 + 1/n2)]
Hypothesis about the difference
between two means
• If n1, n2 ≥ 30 and σ1², σ2² are estimated by S1², S2², then the Z statistic is

Z = (X̄1 - X̄2) / [S√(1/n1 + 1/n2)]

where S = √[ (S1²(n1 - 1) + S2²(n2 - 1)) / (n1 + n2 - 2) ] is the pooled sample standard deviation.
Hypothesis about difference
between two means
• If the sample sizes are small and we assume that the population variances are the same, σ1² = σ2² = σ², then the t statistic is

t = (X̄1 - X̄2) / [S√(1/n1 + 1/n2)]

with n1 + n2 - 2 degrees of freedom, where S = √[ (S1²(n1 - 1) + S2²(n2 - 1)) / (n1 + n2 - 2) ] is the pooled sample standard deviation.
Hypothesis about difference
between two means
Example: Two types of fertilizers, UREA and CAN, were applied to two maize plots respectively. Farmers think that there is no difference in maize yield between the two fertilizers. A researcher takes a sample of 40 maize grains in plot 1 and 32 maize grains in plot 2.
Hypothesis about difference
between two means
• The average weight of the maize grains is 10 kg in plot 1 and 7 kg in plot 2. The standard deviation of the weights is 2 kg for plot 1 and 4 kg for plot 2. Test whether there is a difference in maize yield between UREA and CAN. Assume that the population variances are not equal.
Hypothesis about difference
between two means
Solution
Step 1: Ho: μ1 = μ2
        H1: μ1 ≠ μ2

Step 2: Since n1, n2 ≥ 30 we use Z = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2)
Hypothesis about difference
between two means
Step 3: Critical value: we reject Ho if |Z| ≥ Z(α/2) = Z0.025 = 1.96, i.e. when Z ≥ 1.96 or Z ≤ -1.96

Step 4: Now Z = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2) = (10 - 7) / √(2²/40 + 4²/32) ≈ 3.87

Step 5: Since Z > Z(α/2) = 1.96, we reject Ho, i.e. there is a difference in mean yield between UREA and CAN.
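A minimal sketch in Python of this large-sample two-sample z test, assuming scipy is available (summary values from the example above):

from math import sqrt
from scipy.stats import norm

x1, s1, n1 = 10, 2, 40   # plot 1 (UREA)
x2, s2, n2 = 7, 4, 32    # plot 2 (CAN)
alpha = 0.05
z = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # about 3.87
z_crit = norm.ppf(1 - alpha / 2)                # 1.96
print("Reject Ho" if abs(z) >= z_crit else "Fail to reject Ho")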


Hypothesis about difference
between two means
Example

A farmer thinks that local chickens lay eggs with larger weight than hybrid chickens. She collects 10 eggs from local chickens and 12 eggs from hybrid chickens.
Hypothesis about difference
between two means
• The mean weight for local is found to be 5 kg and that for hybrid is found to be 12 kg. The standard deviation for local is 2 kg and that for hybrid is 3 kg. Test the farmer's claim (use α = 0.05).
Hypothesis about difference
between two means
Solution
Data: X̄1 = 5, X̄2 = 12, S1 = 2, S2 = 3, n1 = 10, n2 = 12
Step 1: Hypothesis

Ho: μ1 = μ2
H1: μ1 > μ2
Hypothesis about difference
between two means
Step 2: Since the sample sizes are less than 30, we use the t test statistic

t = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2)

with degrees of freedom equal to the smaller of n1 - 1 and n2 - 1.
Hypothesis about difference
between two means
• Note: here we assume that the two population variances are not equal.

Step 3: Critical value: we reject Ho if t ≥ t(α, df), where df is the smaller of n1 - 1 and n2 - 1, i.e. reject Ho when t ≥ t0.05,9 = 1.833


Hypothesis about difference
between two means
Step 4: Now t = (X̄1 - X̄2) / √(S1²/n1 + S2²/n2) = (5 - 12) / √(2²/10 + 3²/12) ≈ -6.53

Step 5: Conclusion: since t < t0.05,9 = 1.833, we fail to reject the null hypothesis, i.e. based on the available data μ1 ≤ μ2 (the data give no evidence that local eggs are heavier).
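A minimal sketch in Python of this small-sample test with unequal variances and the conservative df = smaller of n1-1 and n2-1, assuming scipy is available:

from math import sqrt
from scipy.stats import t as t_dist

x1, s1, n1 = 5, 2, 10    # local
x2, s2, n2 = 12, 3, 12   # hybrid
alpha = 0.05
df = min(n1 - 1, n2 - 1)                             # 9
t_stat = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # about -6.53
t_crit = t_dist.ppf(1 - alpha, df)                   # right tailed critical value, 1.833
print("Reject Ho" if t_stat >= t_crit else "Fail to reject Ho")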
Hypothesis about difference
between two means
Example

Is there a difference in book return times between students of the two universities below? Use α = 0.05.

LUANAR: 2, 4.3, 8.5, 3, 2

Mzuzu: 3, 6.5, 5, 7.5, 8, 4, 3

Assume the population variances are the same.


Hypothesis about difference
between two means
Step 1: Ho: μ1 = μ2 versus H1: μ1 ≠ μ2 (two tailed)

Step 2: The test statistic, under the assumption of equal population variances and since n1, n2 < 30, is

t = (X̄1 - X̄2) / [S√(1/n1 + 1/n2)]

with n1 + n2 - 2 degrees of freedom, where S = √[ ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2) ] is the pooled sample standard deviation.
Hypothesis about difference
between two means
Step 3: Rejection criteria: Ho is rejected when t ≥ t(α/2, n1+n2-2) = t0.025,10 = 2.228 or t ≤ -t0.025,10 = -2.228
Hypothesis about difference
between two means
Step 4: Test statistic calculation

S = √[ ((5 - 1)(7.33) + (7 - 1)(4.32)) / (5 + 7 - 2) ] = 2.351

t = (x̄1 - x̄2) / [S√(1/n1 + 1/n2)] = (3.96 - 5.29) / [2.351√(1/5 + 1/7)] = -1.33/1.377 ≈ -0.97
Hypothesis about difference
between two means
Step 5: Conclusion: since |t| < t0.025,10 = 2.228, we fail to reject Ho, i.e. there is no difference in book return times between the two universities.
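A minimal sketch in Python of this pooled-variance t test on the raw data, using scipy's ttest_ind (assumed available):

from scipy.stats import ttest_ind

luanar = [2, 4.3, 8.5, 3, 2]
mzuzu = [3, 6.5, 5, 7.5, 8, 4, 3]
t_stat, p_value = ttest_ind(luanar, mzuzu, equal_var=True)   # pooled-variance t test, 10 df
print(t_stat, p_value)   # t about -0.96, p well above 0.05, so fail to reject Ho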


Hypothesis about the proportion

• The population proportion is the ratio of items of interest to the total.

• Examples
– Ratio of males to the total in a statistics class (400/635)

– Ratio of extension students to the total (200/635)


Hypothesis about population
proportion
• We denote the population proportion by P = R/N, where R is the number of items of interest and N is the total population size, and we denote the sample proportion by P̂ = r/n.

• Note: r is the number of items of interest in the sample of n items/individuals.
Hypothesis about proportion

• To test a hypothesis about the population proportion P = R/N we use the sample proportion P̂ = r/n.

• Note that the mean/expected value of P̂ = r/n is E(P̂) = P and its variance is V(P̂) = P(1 - P)/n.
Hypothesis about population
proportion
• Now for large samples, i.e. n ≥ 30, the sampling distribution of the sample proportion P̂ = r/n is approximately normal with mean P and variance P(1 - P)/n, so that by standardization we have

Z = (P̂ - P) / √(P(1 - P)/n)
Hypothesis about population
proportion
• That is, to test a hypothesis about the population proportion we will assume large samples so as to use the z-test

Z = (P̂ - P) / √(P(1 - P)/n)
Hypothesis about population
proportion
Example

An ABM marketing company claims that it receives a 4% response rate from its mailings. To test this claim, a random sample of 500 was surveyed, with 25 responses. Test at the α = 0.05 significance level.

Hypothesis about population
proportion
H0: p = 0.04

H1: p ≠ 0.04

• This is a two-sided (two tailed) rejection region test.

• The appropriate test statistic is Z = (P̂ - P) / √(P(1 - P)/n)
Hypothesis about population
proportion
• Now we reject Ho if Z ≤ -Z(α/2) or Z ≥ Z(α/2), i.e. when Z ≤ -Z0.025 or Z ≥ Z0.025, i.e. when Z ≤ -1.96 or Z ≥ 1.96.

• Now Z = (P̂ - P) / √(P(1 - P)/n) = (0.05 - 0.04) / √(0.04(1 - 0.04)/500) ≈ 1.14
Hypothesis about population
proportion
• Conclusion: since z = 1.14 < 1.96, we fail to reject the null hypothesis, i.e. the claim of the ABM marketing company cannot be rejected on the available data.

• Note: we used the Z-test since n ≥ 30 (by the CLT).
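A minimal sketch in Python of this one-sample proportion z test, assuming scipy is available:

from math import sqrt
from scipy.stats import norm

p0, n, successes, alpha = 0.04, 500, 25, 0.05
p_hat = successes / n                        # 0.05
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # about 1.14
z_crit = norm.ppf(1 - alpha / 2)             # 1.96
print("Reject Ho" if abs(z) >= z_crit else "Fail to reject Ho")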
Hypothesis about difference
between population proportions
• Possible hypothesis formulations:

(1) Ho: P1 = P2 or Ho: P1 - P2 = 0
    H1: P1 ≠ P2    H1: P1 - P2 ≠ 0

(2) Ho: P1 = P2 or Ho: P1 - P2 = 0
    H1: P1 > P2    H1: P1 - P2 > 0
Hypothesis about difference
between population proportions
(3) Ho: P1 = P2 or Ho: P1 - P2 = 0
    H1: P1 < P2    H1: P1 - P2 < 0

• The appropriate statistic is the difference between the sample proportions, i.e. P̂1 - P̂2.
Hypothesis about difference
between population proportions
• Note: if the samples are large, i.e. n1, n2 ≥ 30, then P̂1 - P̂2 is approximately normal with mean P1 - P2 and variance P1(1 - P1)/n1 + P2(1 - P2)/n2.

• That is, standardising P̂1 - P̂2 we have the z-test

Z = [(P̂1 - P̂2) - (P1 - P2)] / √(P1(1 - P1)/n1 + P2(1 - P2)/n2) = (P̂1 - P̂2) / √(P1(1 - P1)/n1 + P2(1 - P2)/n2)

(the last form applies under Ho: P1 - P2 = 0).
Hypothesis about difference
between two proportions
• Under the assumption of equal population proportions, i.e. P1 = P2 = P, we have

Z = (P̂1 - P̂2) / √(P(1 - P)(1/n1 + 1/n2))

where P = (n1P̂1 + n2P̂2)/(n1 + n2) is the pooled sample proportion based on the two sample proportions.
Hypothesis about difference
between proportions
Example

A farmers' club in Mzuzu claims that the proportion of rotten groundnuts in their 50 kg bag is the same as that of Mulli Brothers Limited.
Hypothesis about difference
between population proportions
A researcher collects 100 groundnuts from a bag of the farmers' club and finds that 20% are rotten, and collects 80 from a Mulli bag and finds that 12% are rotten. Test the claim of the farmers' club (use α = 0.05).
Hypothesis about difference
between proportions
• Data: P̂1 = 0.20, P̂2 = 0.12, n1 = 100, n2 = 80
• Hypothesis: Ho: P1 = P2
              H1: P1 ≠ P2
• The test statistic is Z = (P̂1 - P̂2) / √(P(1 - P)(1/n1 + 1/n2))
Hypothesis about difference
between population proportions
• Rejection criterion: reject Ho if |Z| ≥ Z0.025 = 1.96, i.e. when Z ≥ 1.96 or Z ≤ -1.96

• Now calculating the test statistic:

P = (n1P̂1 + n2P̂2)/(n1 + n2) = (100 × 0.20 + 80 × 0.12)/(100 + 80) ≈ 0.16

Z = (P̂1 - P̂2) / √(P(1 - P)(1/n1 + 1/n2)) = (0.20 - 0.12) / √(0.16(1 - 0.16)(1/100 + 1/80)) ≈ 1.44
Hypothesis about difference
between population proportions
• Since Z = 1.44 < 1.96, we fail to reject the null hypothesis, i.e. based on the available data there is no difference in the proportions of rotten groundnuts between the bags of the farmers' club and those of Mulli.
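A minimal sketch in Python of this two-proportion z test with the pooled proportion, assuming scipy is available:

from math import sqrt
from scipy.stats import norm

p1_hat, n1 = 0.20, 100   # farmers' club bag
p2_hat, n2 = 0.12, 80    # Mulli bag
alpha = 0.05
p_pool = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)        # about 0.164
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se                              # about 1.44
z_crit = norm.ppf(1 - alpha / 2)                        # 1.96
print("Reject Ho" if abs(z) >= z_crit else "Fail to reject Ho")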
One way anova/comparing
more than two means
• Data layout

Treatments/groups   Observations
1                   y11 y12 y13 … y1n1
2                   y21 y22 y23 … y2n2
.                   .
.                   .
.                   .
k                   yk1 yk2 yk3 … yknk
One way anova/comparing
more than two means
• The appropriate statistic is F = MST/MSE ~ F(k-1, N-k), where

MST = Σ_j n_j(x̄_j - x̄)² / (k - 1),  j = 1, 2, …, k

and where n_j is the sample size for group j, x̄_j is the sample mean for group j, and x̄ is the grand/overall mean.
One way anova/comparing
more than two means
• The mean square error (MSE), or within group variation, is

MSE = Σ_j (n_j - 1)S_j² / (N - k) = Σ_j Σ_i (x_ji - x̄_j)² / (N - k)

where S_j² is the variance of group j and x_ji is observation i in group j.
One way anova/comparing
more than two means
• The null hypothesis is rejected when this statistic is greater than or equal to the critical F-value, i.e. when F ≥ F(α; k-1, N-k), or when the p-value P(F) ≤ α.
One way anova/comparing
more than two means
 Rejection criteria
One way anova/comparing
more than two means
Example
The following data are the weights in kg of patients after being given three diets.

Weight in kg
Diet 1: 210 215 205 180 175 190
Diet 2: 180 160 195 190 170 155

One way anova/comparing
more than two means
Step 1: Hypothesis
Ho: no difference among diets (µ1 = µ2 = … = µk)
H1: there is a difference among diets (at least two diet means are different)

Step 2: The statistic to use is F = MST/MSE ~ F(k-1, N-k)
One way anova/comparing
more than two means
Step 3: Critical value: Ho is rejected when F ≥ F(α; k-1, N-k) = F(0.05; 2, 15) = 3.68
One way anova/comparing
more than two means
Step 4: Calculating the statistic

n1 = n2 = n3 = 6, N = n1 + n2 + n3 = 18

x̄1 = 195.83, x̄2 = 175, x̄3 = 161.6, x̄ = 177.5


One way anova/comparing
more than two means
Calculating the MST we have

MST = Σ_j n_j(x̄_j - x̄)² / (k - 1)
    = [6(195.8 - 177.5)² + 6(175 - 177.5)² + 6(161.6 - 177.5)²] / (3 - 1)
    ≈ 1779.4
One way anova/comparing
more than two means
Calculating the F statistic (with MSE = 216.91 from the within group variation) we have

F = MST/MSE = 1779.4/216.91 ≈ 8.2

Step 5: Conclusion: since the calculated F = 8.2 > the critical value 3.68, we reject the null hypothesis, i.e. there is a difference among the treatment means.
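A minimal sketch in Python of the F-test decision, using the summary values above and scipy (assumed available) for the critical value:

from scipy.stats import f

mst, mse = 1779.4, 216.91     # summary values from the example
k, N, alpha = 3, 18, 0.05
F = mst / mse                               # about 8.2
F_crit = f.ppf(1 - alpha, k - 1, N - k)     # F(0.05; 2, 15), about 3.68
print("Reject Ho" if F >= F_crit else "Fail to reject Ho")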
One way anova/comparing
more than two means
• The F test to compare means is based on an analysis of variance (ANOVA) of the yij into different sources, i.e. due to group and due to error:

Total variation = between group variation + error variation (within group)
One way anova/comparing
more than two means
• Now the total variability in the yij is measured by TSS, between group variability by SST, and error variability by SSE. Thus the analysis of variance of the data yij is summarized by

TSS = SST + SSE

• The degrees of freedom are N-1 for TSS, k-1 for SST, and N-k for SSE.
One way anova/comparing
more than two means
• Now the F test to compare group means compares variability due to group and variability due to error by the ratio

F = [SST/(k - 1)] / [SSE/(N - k)] = MST/MSE ~ F(k-1, N-k)
One way anova/comparing
more than two means
• MST is the mean square of the between group sum of squares, and MSE is the mean square of the error sum of squares.

• MST is a measure of the variability in the data due to group differences and MSE is a measure of the within group data variability.
One way anova/comparing
more than two means
• Ho is rejected when F ≥ F(α; k-1, N-k) or when the p-value P(F) ≤ α, where α is the Type I error, a.k.a. the significance level.
One way anova/comparing
more than two means
• Summary of the one way anova table and F statistic

Source of variation   Sum of squares (SS)   Degrees of freedom (df)   Mean square (MS)    F-value
Group/treatment       SST                   k-1                       MST = SST/(k-1)     F = MST/MSE
Error                 SSE                   N-k                       MSE = SSE/(N-k)
Total                 TSS                   N-1

• Note: TSS = SST + SSE, and N-1 = (k-1) + (N-k)


One way anova/comparing
more than two means
Example

Complete the ANOVA table below and test whether the group means are equal or not.

ANOVA
Source   SS         df    MS       F-value
Groups   291.8027   (b)_  145.90   (c)_
Error    (a)_       50
Total    785.1908   52
One way anova/comparing
more than two means
Solution
ANOVA
Source   SS             df      MS      F-value
Groups   291.8027       (b) 2   145.90  (c) 14.78
Error    (a) 493.3881   50      9.87
Total    785.1908       52
One way anova/comparing
more than two means
a) Subtract the group sum of squares from the total, since TSS = SST + SSE

b) Subtract the error df from the total df

c) Use the formula: F = MST/MSE = 145.90/9.87 = 14.78
One way anova/comparing
more than two means
• Critical F-value: F(α; k-1, N-k) = F(0.05; 2, 50) = 3.183

• Now since F = 14.78 > F(0.05; 2, 50) = 3.183, we reject the null hypothesis, i.e. the treatment/group means are different.


Testing for association in
cross tables
 Let X1 and X2 be categorical variables
with I and J categories/levels respectively
in the cross table/contingency table
Testing for association in
cross tables
• Cross table of two categorical variables

                X2
       Level 1  Level 2  Level 3  …  Level J
   Level 1
X1 Level 2
   .
   Level I
Testing for association in
cross tables
 We wish to test whether there is an
association between X1 and X2. The
following is the hypothesis formulation:

Ho: There is no association

H1: There is an association


Testing for association in
cross tables
 Alternatively you may state the
hypotheses as follows:

Ho: X1 and X2 are independent

H1: X1 and X2 are dependent


Testing for association in
cross tables
• One of the statistics to use is the Pearson chi-square, defined as

χ² = Σ (O - E)²/E ~ χ²((I-1)(J-1))

• The null, Ho, is rejected when χ² ≥ χ²((I-1)(J-1), α)
Testing for association in
cross tables
Example

Test whether there is an association between heart attack status and personality type.

Heart Attack Status   Type A   Type B
Heart Attack          O=25     O=10
No Heart Attack       O=5      O=40


Testing for association in
cross tables
Solution
• From a mere observation it seems there is an association between heart attack and personality (a positive association). But we test whether the association is significant (i.e. real, not due to chance).
Testing for association in
cross tables
• To test for a significant association we use the Pearson Chi-square test.

• Now to compute the chi-square, we first compute the expected frequencies (E), i.e.

E = (cell's column total) × (cell's row total) / grand total
Testing for association in
cross tables
                  Personality Type
                  A      B      Row total
Heart Attack      O=25   O=10   35
No Heart Attack   O=5    O=40   45
Column total      30     50     Grand total = 80

Testing for association in
cross tables
 E: type A and heart attack:
(30)(35)/80 = 13.125
 E: type A and no heart attack:
(30)(45)/80 =16.875
 E: type B and heart attack:
(50)(35)/80 = 21.875
 E: type B and no heart attack:
(50)(45)/80 = 28.125
Testing for association in
cross tables
• Putting the expected information in the table we have

                  Personality Type
                  A                B                Row total
Heart Attack      O=25, E=13.125   O=10, E=21.875   35
No Heart Attack   O=5, E=16.875    O=40, E=28.125   45
Column total      30               50               Grand total = 80
Testing for association in
cross tables
• Now to compute the chi-square we use:

χ² = Σ (O - E)²/E
Testing for association in
cross tables
• Thus we have:

χ² = (25 - 13.125)²/13.125 + (10 - 21.875)²/21.875 + (5 - 16.875)²/16.875 + (40 - 28.125)²/28.125
   ≈ 10.74 + 6.45 + 8.36 + 5.01
   ≈ 30.56
Testing for association in
cross tables
• We reject Ho when χ² ≥ χ²(1, 0.05) = 3.84

• Now our obtained χ² of about 30.56 exceeds this value.

• We reject H0, i.e. there is an association between heart attack and personality type.
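A minimal sketch in Python of the same test using scipy's chi-square test of independence (assumed available):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[25, 10],    # heart attack:    type A, type B
                  [ 5, 40]])   # no heart attack: type A, type B
chi2, p, dof, expected = chi2_contingency(table, correction=False)  # no Yates correction,
                                                                    # to match the hand calculation
print(chi2, dof, p)   # chi-square about 30.6 on 1 df, p far below 0.05
print(expected)       # same expected counts as computed above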
Testing for association in
cross tables
• Assumptions of the Pearson Chi-square Test
– All expected frequencies should be ≥ 2 (observed frequencies can be < 2).

– No more than 20% of the expected frequencies should be < 5.


Testing for association in
cross tables
• Special Consideration
– If the expected frequencies in the cells are "too small," the Pearson χ² test may not be valid; use the Fisher exact test instead. You can read about the Fisher exact test.
Chi-square goodness of fit
• Used to test whether an observed frequency distribution agrees with an expected/theoretical frequency distribution.

Example: Farmers' participation in tobacco farming

Response                          Percent
1. Yes—currently participate      29%
2. Yes—participated in the past   39%
3. No—have never participated     32%

Chi-square goodness of fit
Suppose a current survey of n = 200 farmers indicates the following responses.

Response   1               2               3               Total
Observed   82              64              54              200
Expected   0.29×200 = 58   0.39×200 = 78   0.32×200 = 64   200
Chi-square goodness of fit

Solution:

Ho: The observed frequencies have the same distribution as the previous (expected) distribution

H1: They have a different distribution


Chi-square goodness of fit
• Calculating the Chi-square we have:

χ² = Σ (O - E)²/E
   = (82 - 58)²/58 + (64 - 78)²/78 + (54 - 64)²/64
   = 9.93 + 2.51 + 1.56
   ≈ 14.0
Chi-square goodness of fit

• The critical value is χ²(α, df) = χ²(0.05, 2) = 5.991

• Now since the calculated chi-square is greater than 5.991, we reject the null hypothesis, i.e. the observed frequencies do not follow the expected distribution.
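A minimal sketch in Python of this goodness-of-fit test, assuming scipy is available:

from scipy.stats import chisquare, chi2

observed = [82, 64, 54]
expected = [0.29 * 200, 0.39 * 200, 0.32 * 200]   # 58, 78, 64
stat, p = chisquare(observed, f_exp=expected)     # statistic about 14.0
crit = chi2.ppf(0.95, df=2)                       # about 5.991
print("Reject Ho" if stat >= crit else "Fail to reject Ho")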
Chi-square goodness of fit
Example
Of 64 offspring of a certain cross of guinea pigs, 34 are red, 10 are black and 20 are white. According to the genetic model, these numbers should be in the ratio 9:3:4. Are the data consistent with the model? Use α = 0.01.


Chi-square goodness of fit

Solution

Ho: The data agree with the genetic model

H1: The data do not agree with the model

We reject Ho when χ² ≥ χ²(0.01, 2) = 9.210
Chi-square goodness of fit

• Now the chi-square statistic is

χ² = Σ (O - E)²/E = (34 - 36)²/36 + (10 - 12)²/12 + (20 - 16)²/16 ≈ 1.44

• Now since the calculated chi-square is less than the tabulated chi-square, we fail to reject Ho, i.e. the data are consistent with the model.
Point and interval estimation of
population parameter
• A point estimate of a population parameter is a single value used to estimate the parameter.

• Suppose you estimate the statistics class mean height (a parameter) by the sample mean (a statistic) as 63 cm; then 63 cm is the point estimate of the parameter, mean height.
Interval estimation for
population mean
• An interval estimate of a population parameter (e.g. the mean) is a range of values for the parameter, given as an interval.

• E.g. instead of estimating the students' mean height by 63 cm, we may say that the students' mean/average height lies between 62 cm and 65 cm, or within the interval (62, 65).
Interval estimation for
population mean
• The general formula for interval estimation for the population mean μ is

(X̄ - Z(α/2)·S/√n, X̄ + Z(α/2)·S/√n)   or   (X̄ - t(α/2)·S/√n, X̄ + t(α/2)·S/√n)

with probability 1 - α.


Interval estimation for
population mean
• Note: the use of z or t depends on the distribution of the sample mean X̄ that is being used to estimate the population mean μ (the same cases as in the hypothesis tests above).
Interval estimation

• The quantity 1 - α is called the confidence coefficient, i.e. it is the measure of a researcher's confidence that the population mean lies within the interval.

• It is the probability that the interval contains the population mean.
Interval estimation of population
mean
Example
Find the 95% confidence interval for the mean weight of tobacco produced in 2015, using a sample mean weight of 20 tonnes, a sample standard deviation of 2 tonnes and a sample size of 40 bales.
Interval estimation for
population mean
Solution

(X̄ - Z(α/2)·S/√n, X̄ + Z(α/2)·S/√n)
= (20 - 1.96 × 2/√40, 20 + 1.96 × 2/√40)
≈ (19.38, 20.62)
Interval estimation of population
mean
• This means the mean weight of tobacco in 2015 is estimated to be between 19.38 and 20.62 tonnes, with 95% confidence.

• The advantage of interval estimation is that it attaches a margin of error to the estimate.
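A minimal sketch in Python of this confidence interval from summary statistics, assuming scipy is available:

from math import sqrt
from scipy.stats import norm

x_bar, s, n, conf = 20, 2, 40, 0.95
z = norm.ppf(1 - (1 - conf) / 2)           # about 1.96
margin = z * s / sqrt(n)                   # margin of error, about 0.62
print(x_bar - margin, x_bar + margin)      # about (19.38, 20.62)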
Interval estimation for
population mean
• That is, e.g. if we estimate μ by the sample mean X̄ using the interval (X̄ - Z(α/2)·S/√n, X̄ + Z(α/2)·S/√n),

• the margin of error is ± Z(α/2)·S/√n.
Interval estimation of population
mean difference
• Sometimes the interest may be to find a confidence interval for the population mean difference μ1 - μ2.

• Note that the appropriate statistic to use is the difference between the sample means, X̄1 - X̄2.
Interval estimation of population
mean difference
• If we sample from two normal populations with means μ1, μ2 and we know the population standard deviations σ1, σ2, then X̄1 - X̄2 will be normal with mean μ1 - μ2 and variance σ1²/n1 + σ2²/n2, and hence

se(X̄1 - X̄2) = √(σ1²/n1 + σ2²/n2)
Interval estimation of population
mean difference
• If we don't sample from normal populations but the sample sizes are large enough, i.e. n1, n2 ≥ 30, and σ1, σ2 are unknown and estimated by the sample standard deviations S1, S2, then X̄1 - X̄2 would still be approximately normal by the CLT.


Interval estimation of population
mean difference
• Under these two scenarios we use the standard normal (Z) distribution to get the CI for the mean difference, defined as:

( (X̄1 - X̄2) - Z(α/2)·se(X̄1 - X̄2), (X̄1 - X̄2) + Z(α/2)·se(X̄1 - X̄2) )

where se(X̄1 - X̄2) = √(σ1²/n1 + σ2²/n2)   (case 1)

or    se(X̄1 - X̄2) = √(S1²/n1 + S2²/n2)   (case 2)
Interval estimation of population
mean difference
• But if the sample sizes are small, i.e. less than 30, and we estimate σ1, σ2 by S1, S2, then we use the t distribution, and the CI is

( (X̄1 - X̄2) - t(α/2, df)·se(X̄1 - X̄2), (X̄1 - X̄2) + t(α/2, df)·se(X̄1 - X̄2) )

where se(X̄1 - X̄2) = S√(1/n1 + 1/n2) and S = √[ ((n1 - 1)S1² + (n2 - 1)S2²) / (n1 + n2 - 2) ]
Interval estimation of population
mean difference
with df = n1 + n2 - 2.

Note: in using such a t distribution we assume that the two population variances are equal, i.e. σ1² = σ2² = σ².
Interval estimation of population
mean difference
• If we assume that they are not equal, then we would have the same form of CI for the difference of means using the t distribution, but with

se(X̄1 - X̄2) = √(S1²/n1 + S2²/n2) and df = the smaller of n1 - 1 or n2 - 1.
Interval estimation of population
mean difference
Example

For a random sample of 190 Agribusiness

firms that revalued their fixed assets, the

mean ratio of debt to tangible assets was

0.517 and the sample standard deviation

was 0.148.
Interval estimation of population
mean difference
 For an independent random sample of 417
firms that did not revalue their fixed
assets, the mean ratio of debt to tangible
assets was 0.489 and the sample
standard deviation was 0.159. Find a 99%
confidence interval for the difference
between the two population means.
Interval estimation of population
mean difference
Solution

X̄ - Ȳ ± z(α/2)·√(sX²/nX + sY²/nY) = 0.517 - 0.489 ± 2.575 × √(0.148²/190 + 0.159²/417)
= 0.028 ± 2.575 × √(0.0001153 + 0.00006063)
= 0.028 ± 0.034

Or (-0.0062, 0.0622)
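A minimal sketch in Python of this 99% confidence interval, assuming scipy is available:

from math import sqrt
from scipy.stats import norm

x1, s1, n1 = 0.517, 0.148, 190   # firms that revalued fixed assets
x2, s2, n2 = 0.489, 0.159, 417   # firms that did not
conf = 0.99
z = norm.ppf(1 - (1 - conf) / 2)           # about 2.576
se = sqrt(s1**2 / n1 + s2**2 / n2)
diff = x1 - x2
print(diff - z * se, diff + z * se)        # about (-0.006, 0.062)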
Interval estimation of population
mean difference
Example

A farmer wants to estimate the difference in mean maize yield between UREA and CAN. He/she gets 10 yields in kg for each fertilizer and computes the sample mean yield for each fertilizer.


Interval estimation of population
mean difference
The following is the sample data:

        UREA          CAN
mean    X̄1 = 83256    X̄2 = 88354
sd      s1 = 3256     s2 = 2341
n       n1 = 10       n2 = 10

Construct a 95% CI for the mean difference.


Interval estimation of population
mean difference
Solution

With unequal variances assumed, df = the smaller of n1 - 1 and n2 - 1 = 9, and t(0.025, 9) = 2.262:

X̄1 - X̄2 ± t(α/2, df)·√(S1²/n1 + S2²/n2)
= (83256 - 88354) ± 2.262 × √(3256²/10 + 2341²/10)
= -5098 ± 2868.5
≈ (-7966.5, -2229.5)
Interval estimation for
population proportion
• Just as we had interval estimation for the population mean, we can have interval estimation for the population proportion.

• The interval estimate for the population proportion P using the sample proportion P̂ is defined as

( P̂ - Z(α/2)·√(P̂(1 - P̂)/n), P̂ + Z(α/2)·√(P̂(1 - P̂)/n) )
Interval estimation for
population proportion
• with confidence coefficient 1 - α, i.e. the probability that the population proportion lies within the interval.

• Note: for the confidence interval for a proportion we will only use the z distribution for the sample proportion, because we will mostly be dealing with large samples (n ≥ 30).
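A minimal sketch in Python of this proportion interval, assuming scipy is available; the values P̂ = 0.68 and n = 805 are those used in the example on the next slides:

from math import sqrt
from scipy.stats import norm

p_hat, n, conf = 0.68, 805, 0.95
z = norm.ppf(1 - (1 - conf) / 2)              # about 1.96
margin = z * sqrt(p_hat * (1 - p_hat) / n)    # about 0.032
print(p_hat - margin, p_hat + margin)         # about (0.648, 0.712)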
Interval estimation for
population proportion
Example
FUM conducted a poll between Jan. 14 and Jan. 22, 2016. They asked 805 people whether to retain FISP or drop it. They found a sample proportion of 68% who supported retaining it.
Using Interval Estimation to test
hypothesis
Construct a 95% CI for the true proportion of

FISP supporters.

Solution

P̂ ± Z(α/2)·√(P̂(1 - P̂)/n) = 0.68 ± 1.96 × √(0.68(1 - 0.68)/805)
≈ (0.648, 0.712)
Confidence interval for
proportion difference
• Here we wish to construct an interval estimate for P1 - P2.

• The appropriate sample statistic to use is the sample proportion difference, denoted by P̂1 - P̂2.
Confidence interval for
proportion difference
• Note: if the sample sizes are large, i.e. n1, n2 ≥ 30, then by the CLT P̂1 - P̂2 has an approximately normal distribution with mean P1 - P2, variance P1(1 - P1)/n1 + P2(1 - P2)/n2, and standard error √(P1(1 - P1)/n1 + P2(1 - P2)/n2).
Confidence interval for
proportion difference
• That is, we use the standard normal distribution to get the CI for P1 - P2 when we have large samples; in this case the CI is defined as:

P̂1 - P̂2 ± Z(α/2)·se(P̂1 - P̂2)
= P̂1 - P̂2 ± Z(α/2)·√(P1(1 - P1)/n1 + P2(1 - P2)/n2)
= ( P̂1 - P̂2 - Z(α/2)·√(P1(1 - P1)/n1 + P2(1 - P2)/n2), P̂1 - P̂2 + Z(α/2)·√(P1(1 - P1)/n1 + P2(1 - P2)/n2) )
Confidence interval for
proportion difference
• where P1, P2 are approximated by the sample values P̂1, P̂2.

• In the case of assuming equal population proportions, i.e. P1 = P2 = P, the CI for P1 - P2 is

( P̂1 - P̂2 - Z(α/2)·√(P(1 - P)(1/n1 + 1/n2)), P̂1 - P̂2 + Z(α/2)·√(P(1 - P)(1/n1 + 1/n2)) )

where P = (P̂1n1 + P̂2n2)/(n1 + n2) is the pooled sample proportion.
Confidence interval for
proportions difference
Example: An NFS student believes that a sweetener called xylitol helps prevent ear infections. In a randomized experiment, 165 children took a placebo and 68 of them got ear infections.
Confidence interval for
proportions difference
Another sample of 159 children took xylitol and 46 of them got ear infections. Construct the 95% CI for the difference in proportions, assuming that the proportions are not equal.


Confidence interval for
proportions difference
Solution

Note that P̂1 = 68/165 = 0.412, P̂2 = 46/159 = 0.289 and Z(α/2) = Z(0.025) = 1.96

CI = ( 0.412 - 0.289 - 1.96 × √(0.412(1 - 0.412)/165 + 0.289(1 - 0.289)/159),
       0.412 - 0.289 + 1.96 × √(0.412(1 - 0.412)/165 + 0.289(1 - 0.289)/159) )
   ≈ (0.020, 0.226)
Testing two tailed hypothesis
formulation using CI
 An interval estimate of a parameter may
be used to test the two tailed hypothesis
formulation about the parameter.

 In this case you fail to reject Ho if CI


contains the parameter value under Ho
and reject Ho if CI excludes this value.
Testing hypothesis using CI
Example
Find the 95% confidence interval for the mean weight of tobacco produced in 2015 using a sample mean weight of 20 tonnes, a sample standard deviation of 2 tonnes and a sample size of 40 bales, and use your CI to test the claim that the mean weight was 17 tonnes.


Two tailed hypothesis test using
Confidence Interval
Solution

Hypothesis formulation:
Ho: μ = 17 versus
H1: μ ≠ 17
Two tailed hypothesis test using
CI
The 95% confidence interval for the mean weight is

(X̄ - Z(α/2)·S/√n, X̄ + Z(α/2)·S/√n)
= (20 - 1.96 × 2/√40, 20 + 1.96 × 2/√40)
≈ (19.38, 20.62)
Two tailed hypothesis test using
CI
Now since the interval does not contain the value under Ho (17), we reject Ho, adopt the alternative hypothesis, and say that the mean weight was different from 17 tonnes.


Two tailed hypothesis test using
CI
• Note that we use a CI to test a two tailed hypothesis formulation only, and not a one tailed formulation, since the confidence interval is two sided.

The End
