
Inferential Statistics
Objectives
At the end of this course, students will be able to:
• Define inferential statistics
• Explain statistical estimation
• Understand hypothesis testing and the types of errors in decision making
• Use test statistics to examine hypotheses about population parameters
Inference

Use a random sample to learn something about a larger population
Inferential Statistics
• Inferential statistics are statistical methods used for drawing conclusions about a population based on information obtained from a sample of observations drawn from that population.
Inferential Statistics
• Involves
– Estimation
– Hypothesis testing
• Purpose
– Make decisions about population characteristics

Inferential Statistics
– Estimation: point estimation and interval estimation
– Hypothesis testing: one sample and two samples
Inferential process

Statistical Estimation
• Estimation is the process of determining a likely value of a population parameter based on information collected from a sample.
• Estimation is the use of sample statistics to estimate the corresponding population parameters.
• The objective of estimation is to determine the approximate value of an unknown population parameter on the basis of a sample statistic.
Sample Statistics as Estimators of Population Parameters
• A sample statistic is a numerical measure of a summary characteristic of a sample.
• A population parameter is a numerical measure of a summary characteristic of a population.
• An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.
• An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.
Estimation
• A random sample is drawn from the population: every member of the population has the same chance of being selected in the sample.
• The statistic computed from the random sample is used to estimate the population parameter.

Estimation
– Point estimation
– Interval estimation
Point and Interval Estimates
• A point estimate is a single value used as an estimate of a population parameter.
• An interval estimate is a range or interval of numbers believed to include the unknown population parameter with a certain degree of assurance.
• The point estimate always lies within the interval estimate.

[Figure: the interval estimate runs from the lower confidence limit to the upper confidence limit, with the point estimate inside it]
Estimation Process
• Population: the mean, μ, is unknown.
• Random sample: compute the point estimate, for example X̄ = 50.
• Interval estimate: "I am 95% confident that μ is between 40 and 60."
Point estimation
• A single numerical value used to estimate the corresponding population parameter
• Gives little information about how close the value is to the unknown population parameter
• Example: a sample mean X̄ = 3 is a point estimate of the unknown population mean μ
Sample statistics and their corresponding population parameters

Statistic (from sample)           Parameter (from entire population)
Mean: X̄                 estimates   μ
Variance: s²             estimates   σ²
Standard deviation: s    estimates   σ
Proportion: p            estimates   π
Properties of a good estimator
a) Unbiasedness: An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
• For example, since $E(\bar{X}) = \mu$, the sample mean is an unbiased estimator of the population mean.
• The mean of any single sample will probably not equal the population mean, but the average of the means of repeated independent samples from a population will equal the population mean.
Properties of a good estimator
b) Minimum variance: An estimator that has the minimum standard error is a good estimator.
• For a symmetrical distribution, the mean has the minimum standard error.
• If the distribution is skewed, the median has the minimum standard error.
Properties of a good estimator
c) Consistency: An estimator is said to be consistent if its probability of being close to the parameter it estimates increases as the sample size increases.

[Figure: Consistency, sampling distributions of the estimator for n = 10 and n = 100]
Interval estimation
• A single-valued estimate conveys little information about the actual value of the population parameter or about the accuracy of the estimate.
• The probability of getting a sample statistic value that is exactly equal to the corresponding population parameter is usually quite small.
Interval estimation
• It is not reasonable to assume that a sample statistic value is exactly equal to the corresponding population parameter.
• What is needed is an interval estimate, which locates the population parameter within an interval with a stated level of confidence.
Confidence Interval or Interval Estimate
• A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter.
• Confidence interval: provides a range of values of the estimate likely to include the "true" population parameter with a given probability.
• A confidence interval or interval estimate has two components:
– A range or interval of values
– An associated level of confidence
Confidence Level
1. The probability that the unknown population parameter falls within the interval
2. Denoted (1 – α), where α is the probability that the parameter is not within the interval
3. Typical values are 99%, 95%, 90%
CI for population mean:
There are different conditions to consider when constructing confidence intervals for the population mean.
1. Large sample size and σ known
• For a sufficiently large sample size (n > 30), the sampling distribution of the sample mean is approximately normal.
• A 100(1−α)% C.I. for μ is:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
• α is chosen by the researcher; the most common values of α are 0.05, 0.01, 0.001 and 0.1.
CI for population mean:
2. Large sample size and σ unknown
• Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n−1 degrees of freedom. However, for large degrees of freedom, the t distribution is approximated well by the Z distribution.
• A large-sample 100(1−α)% C.I. for μ is:
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
• Note that when σ is unknown, s is a good approximation of σ.
Example
• An epidemiologist studied the blood glucose level of a random sample of 100 patients. The mean was 170, with an SD of 10. Construct the 95% CI for the population mean.
Solution
$\bar{X} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96 = (168.04,\ 171.96)$
• We are 95% confident that the mean blood glucose level of the population lies between 168.04 and 171.96.
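The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `z_confidence_interval` is made up for the example, and the critical value comes from the standard library's `statistics.NormalDist` instead of a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample CI for a population mean: mean ± z_{α/2} · sd / √n."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96 for 95%
    margin = z * sd / sqrt(n)                # margin of error
    return mean - margin, mean + margin


# Blood glucose example: n = 100, sample mean = 170, SD = 10, 95% confidence
lower, upper = z_confidence_interval(170, 10, 100, 0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Changing `confidence` to 0.99 widens the interval, which matches the later slides on the margin of error.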

CI for population mean:
3. Small sample size (n < 30) and σ unknown
• If the population standard deviation is unknown (and the population is approximately normal), the standardized sample mean from samples of size n follows a t distribution with n−1 degrees of freedom.
• A 100(1−α)% C.I. for μ is:
$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
Example: The average earnings per share (EPS) for 10 industrial stocks randomly selected from those listed on the Dow-Jones Industrial Average was found to be X̄ = 1.85 with a standard deviation of s = 0.395. Calculate a 99% confidence interval for the average EPS of all the industrials listed on the DJIA.

Solution:
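A worked sketch of this calculation, assuming the small-sample t interval applies (n = 10, so df = 9) and taking the critical value $t_{0.005,\,9} = 3.250$ from a standard t table (the table value is not given on the slides):

```latex
\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}
  = 1.85 \pm 3.250 \times \frac{0.395}{\sqrt{10}}
  \approx 1.85 \pm 0.406
  = (1.444,\ 2.256)
```

Under these assumptions, we would be 99% confident that the average EPS of all the industrials lies between about 1.44 and 2.26.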
Example: A random sample of 900 workers showed an average height of 67 inches with a standard deviation of 5 inches.
A. Find a 95% confidence interval for the mean height of all workers
B. Find a 99% confidence interval for the mean height of all workers
Solution:
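A worked sketch, assuming the large-sample interval with s in place of σ (n = 900) and the usual critical values $z_{0.025} = 1.96$ and $z_{0.005} = 2.576$:

```latex
\text{A. } 67 \pm 1.96 \times \frac{5}{\sqrt{900}} \approx 67 \pm 0.327 = (66.67,\ 67.33)
\\
\text{B. } 67 \pm 2.576 \times \frac{5}{\sqrt{900}} \approx 67 \pm 0.429 = (66.57,\ 67.43)
```

The 99% interval is wider than the 95% interval because a higher confidence level requires a larger critical value.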
Example: Suppose we want to estimate a 95% confidence interval for the average quarterly returns of all fixed-income funds in Ethiopia. We draw a sample of 100 observations and calculate the sample mean to be 0.05 and the standard deviation 0.03. We assume that those returns are normally distributed with known variance.

Solution:
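A worked sketch, assuming the stated standard deviation of 0.03 is treated as the known σ and using $z_{0.025} = 1.96$:

```latex
\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
  = 0.05 \pm 1.96 \times \frac{0.03}{\sqrt{100}}
  = 0.05 \pm 0.00588
  \approx (0.0441,\ 0.0559)
```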
Example:
1. An economist is interested in studying the incomes of consumers in a particular country. The population standard deviation is known to be $1,000. A random sample of 50 individuals resulted in a mean income of $15,000. Construct the 95% confidence interval for the population mean income.
2. An auditor, examining a total of 820 accounts receivable of a corporation, took a random sample of 60 of them. The sample mean was $127 and the sample standard deviation was $43. Find a 99% confidence interval for the population mean.
CI for a population proportion: Large sample size
• For sufficiently large samples, the sampling distribution of the sample proportion p is approximately normal.
• A 100(1−α)% CI for π is:
$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
• A sample is considered large enough when both n × p and n × q are greater than 5, where q = 1 − p.
Example
• In a sample of 400 people who were questioned regarding their participation in sports, 160 said that they did participate. Construct a 98% confidence interval for π, the proportion of people in the population who participate in sports.
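One way to carry out this calculation is sketched below. The helper name `proportion_confidence_interval` is hypothetical, and the critical value is computed with `statistics.NormalDist` rather than read from a table, so the last digits may differ slightly from a hand calculation with z = 2.33.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence):
    """Large-sample CI for a proportion: p ± z_{α/2} · √(p(1−p)/n)."""
    p = successes / n
    q = 1 - p
    # Large-sample condition from the slides: n·p and n·q both greater than 5
    if n * p <= 5 or n * q <= 5:
        raise ValueError("sample too small for the normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p * q / n)
    return p - margin, p + margin


# Sports participation example: 160 of 400 people, 98% confidence
lower, upper = proportion_confidence_interval(160, 400, 0.98)
print(f"98% CI for the proportion: ({lower:.3f}, {upper:.3f})")
```

With these inputs the sample proportion is 0.40 and the interval is roughly 0.34 to 0.46.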

Exercise:
1. In a survey of 300 automobile drivers in one city, 123 reported that they wear seat belts regularly. Estimate the seat belt use rate of the city and construct a 95% confidence interval for the true population proportion.
Sample size determination
Sample size determination
• Common question: "How many subjects should I study?"
– Too small a sample: waste of time and resources; results have no practical use
– Too large a sample: waste of resources; data quality is compromised; any small difference can be statistically significant
When deciding on sample size:
• Precision is related to the confidence level and the width of the CI

Margin of Error
Factors Affecting Margin of Error
• Margin of error is determined by n, s and α
– As n increases, the width of the CI decreases
– As s increases, the width of the CI increases
– As the confidence level increases (α decreases), the width of the CI increases
Reducing the Margin of Error
$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
• The margin of error can be reduced if
– The standard deviation is lower (σ ↓)
– The sample size is increased (n ↑)
– The confidence level is decreased, (1 – α) ↓
Sample size determination depends on:
• Objective of the study
• Design of the study
• Degree of precision or accuracy: the allowed deviation from the true population parameter (can be within 1% to 5%)
• Degree of confidence required
• Availability of resources
Estimation of single mean
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2}$
Where:
n = sample size
σ = population standard deviation, if known
d = desired degree of precision = half of the width of the confidence interval
$z_{\alpha/2}$ = the standard normal value at the desired level of confidence (1.96 at the usual 95% confidence level)
Example
• Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI that is 5 years wide (so d = 2.5). If the population SD is 12 years, how large should our sample be?
$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{d^2} = \frac{(1.96)^2 \times 144}{(2.5)^2} = 88.5 \approx 89$
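The sample size formula for a single mean can be sketched as a small function. The name `sample_size_for_mean` is an assumption made for this illustration; the result is rounded up because a fractional subject cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, d, confidence=0.95):
    """n = (z_{α/2} · σ / d)², rounded up to the next whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / d) ** 2)


# Cancer patients example: SD = 12 years, CI 5 years wide, so d = 2.5
n = sample_size_for_mean(sigma=12, d=2.5)
print(n)  # 89
```

Halving d (a CI half as wide) roughly quadruples the required sample size, since n is proportional to 1/d².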

But the population σ is unknown most of the time. As a result, it has to be estimated from:
• A pilot or preliminary sample
– Select a pilot sample and estimate σ with the sample standard deviation, s
• Similar studies
Estimation of single proportion
$n = \frac{(z_{\alpha/2})^2\,pq}{d^2}$
Where:
n = sample size
p = anticipated population proportion
q = 1 − p
d = desired degree of precision
$z_{\alpha/2}$ = the standard normal value at the desired level of confidence (1.96 at the usual 95% confidence level)
Example
A) Suppose that you are interested in knowing the proportion of infants who were breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3% with 95% confidence?
$n = \frac{(z_{\alpha/2})^2\,pq}{d^2} = \frac{(1.96)^2 \times (0.2)(0.8)}{(0.03)^2} \approx 683$
Example
B) If the above sample is to be taken from a relatively small population (say N = 3000), the required minimum sample is obtained from the above estimate by making an adjustment (if the population is less than 10,000, a smaller sample size may be sufficient).
$n_{final} = \frac{n}{1 + \frac{n}{N}} = \frac{683}{1 + \frac{683}{3000}} = 556.3$, which is rounded up to 557
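Both steps, the sample size for a proportion and the adjustment for a small population, can be sketched as follows. The function names are hypothetical and chosen for this illustration only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, d, confidence=0.95):
    """Unrounded n = z_{α/2}² · p · q / d² for an effectively infinite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / d ** 2


def adjust_for_finite_population(n, population_size):
    """Finite population adjustment from the slide: n / (1 + n/N)."""
    return n / (1 + n / population_size)


# Example A: p = 0.20, precision of ±3%, 95% confidence
n_infinite = ceil(sample_size_for_proportion(p=0.20, d=0.03))
# Example B: the same survey drawn from a population of N = 3000
n_final = ceil(adjust_for_finite_population(n_infinite, 3000))
print(n_infinite, n_final)  # 683 557
```

The adjustment matters only when n is a noticeable fraction of N; for very large populations it leaves the sample size almost unchanged.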

• An estimate of p is not always available.
• However, the formula may also be used for sample size calculation based on various assumptions about the value of p.

Note: if there is no prior information about the proportion p, assume p = q = 0.5, which gives the largest required sample size.
• Example 1: Calculate the sample size for a population of 100,000. Take the confidence level as 95% and the margin of error as 5%.
Solution:
• We calculate the sample size first for an infinite population and then adjust it to the required population size.
Given: Z = 1.960, p = 0.5, M = 0.05
• Infinite population: n = (1.96)² × 0.5 × 0.5 / (0.05)² = 384.16
• Adjusted for N = 100,000: n / (1 + n/N) = 384.16 / (1 + 384.16/100000) ≈ 382.7, so about 383 respondents are needed.
• Example 3: Using the sample size formula, find the sample size for a survey where confidence level = 95%, standard deviation = 0.5, and margin of error = ±5%.
Solution:
• Sample size = (Z-score)² × SD × (1 − SD) / (margin of error)² = ((1.96)² × 0.5 × 0.5) / (0.05)² = (3.8416 × 0.25) / 0.0025 = 0.9604 / 0.0025 = 384.16. Here the value 0.5 labelled "standard deviation" plays the role of the proportion p in n = z²pq/d².
• Thus, you will need 385 respondents for this survey.
Hypothesis testing
What is a Hypothesis?
• A hypothesis is a claim (assumption) about the true value of an unknown population parameter
– The parameter may be a population mean, proportion, correlation coefficient, ...
– It must be stated before analysis
• Example: "I claim the mean GPA of this class is μ = 3.5."
Hypothesis testing
• The purpose of hypothesis testing is to determine whether enough statistical evidence exists to enable us to conclude that a belief or hypothesis about a parameter is reasonable
• Example
– Is a new drug effective in curing a certain disease? A sample of patients is randomly selected. Half of them are given the new drug while the other half are given the standard drug. Then, the improvement in the patients' conditions is measured and compared
Hypothesis Testing Process
1. Identify the population and assume the population mean age is 50 ($H_0: \mu = 50$).
2. Take a sample and compute the sample mean ($\bar{X} = 20$).
3. Ask: is $\bar{X} = 20$ likely if $\mu = 50$?
4. No, not likely, so reject $H_0$.
Steps in hypothesis testing
1) State the statistical hypotheses
• There are two hypotheses:
– Null hypothesis
– Alternative hypothesis
State the null hypothesis
• Null hypothesis: called the hypothesis of no difference, no association, or no effect
• States that "there is no difference" between the hypothesized value and the population parameter value
• Is always about a population parameter, not about a sample
Null Hypothesis: H0
• The null hypothesis (denoted by H0) is a statement that the value of a population parameter (such as proportion, mean, or standard deviation) is equal to some claimed value.
• Always contains the "=", "≤" or "≥" sign
• We test the null hypothesis directly.
• Either reject H0 or fail to reject H0.
State the alternative hypothesis
• The alternative to the null hypothesis
• Says "there is a difference" between the hypothesized value and the population parameter value
• It is what we are trying to prove, i.e. the reason for the research question.
Alternative Hypothesis: H1 or HA
• The alternative hypothesis (denoted by H1 or HA) is the statement that the parameter has a value that somehow differs from the null hypothesis.
• The symbolic form of the alternative hypothesis must use one of these symbols: ≠, < or >.
• May or may not be accepted
Hypothesis
Example: Consider the population mean (two-tailed test)
H0: μ = μ0
HA: μ ≠ μ0
Example:
A. Is the mean SBP of the population different from 120 mmHg?
– H0: The mean SBP of the population is not different from 120 mmHg (H0: μ = 120).
– HA: The mean SBP of the population is different from 120 mmHg (HA: μ ≠ 120).
Errors in making a decision
1. Type I error
– Rejecting a true null hypothesis (equivalently, accepting a false alternative hypothesis)
– The probability of a Type I error is α (alpha), called the level of significance
2. Type II error
– Failing to reject a false null hypothesis (equivalently, rejecting a true alternative hypothesis)
– The probability of a Type II error is β (beta)
Type I and Type II Errors Have an Inverse Relationship
If you reduce the probability of one error, the probability of the other increases, provided everything else is unchanged.
Factors Affecting Type II Error
• Significance level
– β increases when α decreases
• Population standard deviation
– β increases when σ increases
• Sample size
– β increases when n decreases
Controlling Type I and Type II Errors
• For any fixed α, an increase in the sample size n will cause a decrease in β.
• For any fixed sample size n, a decrease in α will cause an increase in β. Conversely, an increase in α will cause a decrease in β.
• To decrease both α and β, increase the sample size.
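These relationships can be checked numerically. The sketch below computes β for a right-tailed z-test under stated assumptions: the values μ0 = 368, true mean 372 and σ = 15 are hypothetical numbers chosen only for illustration, and the function name `type_ii_error` is not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """β for a right-tailed z-test of H0: μ ≤ mu0 when the true mean is mu_true > mu0.

    H0 is rejected when Z > z_α. Under the true mean, Z is normal with mean
    (mu_true − mu0)·√n/σ and variance 1, so β = Φ(z_α − (mu_true − mu0)·√n/σ).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu_true - mu0) * sqrt(n) / sigma
    return std_normal.cdf(z_alpha - shift)


# Hypothetical setting: mu0 = 368, true mean = 372, sigma = 15
beta_n25 = type_ii_error(368, 372, 15, n=25, alpha=0.05)
beta_n100 = type_ii_error(368, 372, 15, n=100, alpha=0.05)       # larger sample
beta_strict = type_ii_error(368, 372, 15, n=25, alpha=0.01)      # smaller alpha
beta_noisy = type_ii_error(368, 372, 30, n=25, alpha=0.05)       # larger sigma

print(f"n=25: {beta_n25:.3f}, n=100: {beta_n100:.3f}")
print(f"alpha=0.01: {beta_strict:.3f}, sigma=30: {beta_noisy:.3f}")
```

In this setting β falls when n grows and rises when α is made smaller or σ is larger, in line with the bullet points above.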

Power of a statistical test
• The power of a statistical test is the probability of rejecting H0 when H0 is really false. Thus power = 1 − β.
• Clearly, if the test maximizes power, it minimizes the probability of a Type II error, β.
Summary: Elements of a Hypothesis Test
Null Hypothesis (H0)
– A theory about the values of one or more population parameters. The status quo.
Alternative Hypothesis (Ha)
– A theory that contradicts the null hypothesis. The theory generally represents that which we will accept only when sufficient evidence exists to establish its truth.
Test Statistic
– A sample statistic used to decide whether to reject the null hypothesis. In general,
$\text{test statistic} = \frac{\text{Estimate} - \text{Hypothesized parameter}}{\text{Standard error}}$
Summary: Elements of a Hypothesis Test
Critical Value
– A value to which the test statistic is compared at some particular significance level (usually α = .01, .05, .10)
Rejection Region
– The numerical values of the test statistic for which the null hypothesis will be rejected.
– The probability is α that the rejection region will contain the test statistic when the null hypothesis is true, leading to a Type I error. α is usually chosen to be small (.01, .05, .10) and is the level of significance of the test.
Summary of One- and Two-Tail Tests

One-Tail Test (left tail)    Two-Tail Test    One-Tail Test (right tail)
H0: μ ≥ μ0                   H0: μ = μ0       H0: μ ≤ μ0
HA: μ < μ0                   HA: μ ≠ μ0       HA: μ > μ0
Summary: Rejection Regions
• Form of Ha: μ ≠ μ0 (2-tail hypothesis). The rejection region has area α/2 in each tail. If $|z| > z_{\alpha/2}$, then reject the null hypothesis.
• Form of Ha: μ < μ0 (1-tail hypothesis). The rejection region has area α in the left tail. If $z < -z_{\alpha}$, then reject the null hypothesis.
• Form of Ha: μ > μ0 (1-tail hypothesis). The rejection region has area α in the right tail. If $z > z_{\alpha}$, then reject the null hypothesis.
Summary: Type I and Type II Errors
Example: Two-Tail Test
Q. Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
General steps in hypothesis testing:
1. State the null and the alternative hypotheses.
2. Specify the level of significance, i.e. choose α (this is usually given).
3. Identify the critical region(s): the region in which the null hypothesis is rejected.
4. Compute the test statistic.
5. Make the decision.
6. State the conclusion.
Summary of Decision Rules

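Example Solution: Two-Tail Test
• H0: μ = 368
• H1: μ ≠ 368
• α = 0.05, n = 25, σ = 15
• A Z-test is appropriate because σ is known
• Critical values: ±1.96 (rejection region of area .025 in each tail)
• Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{25}} = 1.50$
• Decision: Do not reject H0 at α = .05, since −1.96 < 1.50 < 1.96
• Conclusion: There is no evidence that the true mean is not 368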
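The two-tail test can be reproduced with a short script. This is a sketch that uses the figures from the problem statement (25 boxes, X̄ = 372.5, σ = 15, α = 0.05); the function name `two_tailed_z_test` is an assumption for this example.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value, reject?) for H0: μ = mu0 vs H1: μ ≠ mu0."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - std_normal.cdf(abs(z)))       # area in both tails
    return z, z_crit, p_value, abs(z) > z_crit


# Cereal example as stated in the problem: n = 25, mean = 372.5, sigma = 15
z, z_crit, p_value, reject = two_tailed_z_test(372.5, 368, 15, 25)
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these inputs the statistic is 1.50, which lies inside the interval from −1.96 to 1.96, so H0 is not rejected.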
Example: Two-Tailed Test
Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes had a mean of 372.5 and a standard deviation of 12 grams. Test at the .05 level of significance.
Solution
• H0: μ = 368
• HA: μ ≠ 368
• α = 0.05
• df = 25 − 1 = 24
• Critical values: ±2.064
• Test statistic: $t^* = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{372.5 - 368}{12/\sqrt{25}} = 1.875$
• 0.05 < p-value < 0.10
• Decision: Do not reject H0, since p-value > α = .05 and |t*| < t-critical
• Conclusion: There is not enough evidence that the population average differs from 368.
Example: One Tail Test
Q. Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed X̄ = 372.5. The company has specified σ to be 15 grams. Test at the α = 0.05 level.
H0: μ ≤ 368
H1: μ > 368
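Solution
• H0: μ ≤ 368
• H1: μ > 368
• α = 0.05, n = 36
• Critical value: 1.645 (rejection region of area .05 in the right tail)
• Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{372.5 - 368}{15/\sqrt{36}} = 1.80$
• Decision: Reject H0 at α = .05, since 1.80 > 1.645
• Conclusion: There is evidence that the true mean is more than 368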
• The p-value is the probability of obtaining a value of the test statistic as extreme as, or more extreme than, the actual value obtained, when the null hypothesis is true.
• The p-value is the smallest level of significance, α, at which the null hypothesis may be rejected using the obtained value of the test statistic.
• If p-value ≤ α, reject the null hypothesis.
• If p-value > α, do not reject the null hypothesis.
Example: An automatic bottling machine fills cola into two-liter (2000 cc) bottles. A consumer advocate wants to test the null hypothesis that the average amount filled by the machine into a bottle is at least 2000 cc. A random sample of 40 bottles coming out of the machine was selected and the exact contents of the selected bottles were recorded. The sample mean was 1999.6 cc. The population standard deviation is known from past experience to be 1.30 cc.
Compute the p-value for this test.

• H0: μ ≥ 2000
• H1: μ < 2000
• n = 40, μ0 = 2000, x̄ = 1999.6, σ = 1.3
• The test statistic is: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$
• p-value $= P(Z \le -1.95) = 0.5000 - 0.4744 = 0.0256$
p-Value Solution
Since (p-value = 0.0256) < (α = 0.05), reject H0.
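The same p-value can be obtained without a z table. The sketch below is an illustration using `statistics.NormalDist`; because it keeps the unrounded test statistic instead of −1.95, the p-value differs from 0.0256 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Bottling example: H0: μ ≥ 2000 vs H1: μ < 2000 (left-tailed test)
n, mu0, sample_mean, sigma = 40, 2000, 1999.6, 1.3
alpha = 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))   # about -1.95
p_value = NormalDist().cdf(z)                 # left-tail area P(Z ≤ z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```

For a right-tailed test the p-value would be `1 - NormalDist().cdf(z)`, and for a two-tailed test it would be twice the smaller tail area.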

Example: One-Tailed Test
Is the average capacity of the batteries less than 140 ampere-hours? A random sample of 20 batteries had a mean of 138.47 and a standard deviation of 2.66. Assume a normal distribution. Test at the .05 level of significance.
Solution
• H0: μ ≥ 140
• Ha: μ < 140
• α = 0.05
• df = 20 − 1 = 19
• Critical value: −1.729
• Test statistic: $t^* = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{138.47 - 140}{2.66/\sqrt{20}} = -2.57$
• For t* = −2.57, p-value < .05
• Decision: Reject H0, since p-value < α and t* < t-critical
• Conclusion: There is evidence that the population average is less than 140
Example: An insurance company believes that, over the last few years, the average liability insurance per board seat in companies defined as "small companies" has been $2000. Using α = 0.01, test this hypothesis using Growth Resources, Inc. survey data.

1. H0: μ = 2000 vs H1: μ ≠ 2000
2. For α = 0.01, the critical values of z are ±2.576
3. The test statistic is: $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
4. Do not reject H0 if −2.576 ≤ z ≤ 2.576
5. Reject H0 if z < −2.576 or z > 2.576

With n = 100, x̄ = 2700 and s = 947:
$z = \frac{2700 - 2000}{947/\sqrt{100}} = \frac{700}{94.7} = 7.39 \Rightarrow \text{Reject } H_0$

6. Conclusion: Since the test statistic falls in the upper rejection region, H0 is rejected, and we may conclude that the average insurance liability per board seat in "small companies" is more than $2000.
Example:
1. A company that delivers packages within a large metropolitan area claims that it takes an average of 28 minutes for a package to be delivered from your door to the destination. Suppose that you want to carry out a hypothesis test of this claim. Test the claim that the mean delivery time is equal to 28 minutes at the 0.05 level of significance.
2. The University uses thousands of fluorescent light bulbs each year. The brand of bulb it currently uses has a mean life of 900 hours. A manufacturer claims that its new brand of bulbs, which cost the same as the brand the university currently uses, has a mean life of more than 900 hours. The university has decided to purchase the new brand if, when tested, the evidence supports the manufacturer's claim at the 0.05 significance level. Suppose 64 bulbs were tested with the following results: x̄ = 920 hours, s = 80 hours. Will the University purchase the new brand of fluorescent bulbs?
Measures of association

Chi-Square
• Tests two categorical variables for independence
• Consider an r × c contingency table:

              Variable B
Variable A    B1    B2    B3    B4    Totals
A1
A2
A3
Totals                                Grand total

where:
r = number of rows (number of categories of variable A)
c = number of columns (number of categories of variable B)
Chi-Square
• Hypotheses to be tested:
H0: There is no association between the row and column variables
HA: There is an association
or
H0: The row and column variables are independent
HA: The two variables are dependent
• Test statistic: χ² test with df = (r − 1)(c − 1)
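Chi-Square (χ²) test
$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where:
$O_{ij}$ = observed frequency in the i-th row and j-th column
$E_{ij} = \frac{i\text{th row total} \times j\text{th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n}$
$R_i$ = marginal total of the i-th row
$C_j$ = marginal total of the j-th column
n = grand total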
An alternative method to calculate Chi-Square for a 2×2 table

            Outcome
Exposure    Yes    No     Total
Yes         a      b      r1
No          c      d      r2
Total       c1     c2     n

$\chi^2 = \frac{n(ad - bc)^2}{r_1 r_2 c_1 c_2}$
• Remember that the Chi-Square test should be applied to counts and not percentages
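The shortcut gives the same value as the general sum over cells of (O − E)²/E. The sketch below checks this on a made-up 2×2 table; the counts a = 20, b = 30, c = 10, d = 40 are hypothetical and both function names are assumptions for this illustration.

```python
def chi_square_general(table):
    """Sum of (O − E)² / E over all cells, with E = row total × column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut for a 2×2 table: n(ad − bc)² / (r1 · r2 · c1 · c2)."""
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


# Hypothetical counts, for illustration only
a, b, c, d = 20, 30, 10, 40
general = chi_square_general([[a, b], [c, d]])
shortcut = chi_square_2x2_shortcut(a, b, c, d)
print(f"general = {general:.4f}, shortcut = {shortcut:.4f}")
```

Both functions take raw counts, in keeping with the reminder that the test must not be applied to percentages.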
Characteristics of the Chi-Square Distribution
1. It is not symmetric.
2. The shape of the chi-square distribution depends upon the degrees of freedom, just like Student's t-distribution.
3. As the number of degrees of freedom increases, the chi-square distribution becomes more symmetric, as illustrated in the figure.
4. The values are non-negative. That is, the values of χ² are greater than or equal to 0.

[Figure: The Chi-Square Distribution]
Assumptions of the χ² test
• For the chi-square independence test to be used, the following must be true:
– The observed frequencies must be obtained by using a random sample
– No expected frequency should be less than 1, and no more than 20% of the expected frequencies should be less than 5
Critical values for chi-square:
• Critical values are found in the chi-square table by first locating the row corresponding to the appropriate number of degrees of freedom (for a test of independence, df = (r − 1)(c − 1)). Next, the significance level α is used to determine the correct column.

[Figure: chi-square density with the critical value χ²(df, α) marked on the horizontal axis]
Steps
• Step 1: Determine the null and alternative hypotheses
H0: The two variables are independent
HA: The two variables are associated
• Step 2: Select a level of significance α based upon the seriousness of making a Type I error. The level of significance is used to determine the critical value. All Chi-Square tests for independence are right-tailed tests, so the critical value is $\chi^2_{\alpha}$ with (r − 1)(c − 1) degrees of freedom. The critical region or rejection region lies in the right tail of the distribution.
• Step 3: Calculate the expected frequencies for the contingency table cells and verify that the requirements are satisfied.
$E_{r,c} = \frac{(\text{Sum of row } r) \times (\text{Sum of column } c)}{\text{Sample size}}$
(1) All expected frequencies are greater than or equal to 1 (all $E_{ij} \ge 1$)
(2) No more than 20% of the expected frequencies are less than 5
• If the conditions listed above are satisfied, then…
• Step 4: Compute the test statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O represents the observed frequencies and E represents the expected frequencies
• Step 5: Make a decision to reject or fail to reject the null hypothesis
– Compare the critical value to the test statistic
• Step 6: State the conclusion
Example
• A researcher wishes to determine whether there is a relationship between the gender of an individual and the amount of alcohol consumed. A sample of 68 people is selected, and the following data are obtained. At α = 0.10, can the researcher conclude that alcohol consumption is related to gender?
Example
• Observed frequencies

                Alcohol consumption
Gender          Low    Moderate    High    Row total
Male            10     9           8       27
Female          13     16          12      41
Column total    23     25          20      68
Solution
Step 1 State the hypotheses
H0: The amount of alcohol that a person consumes is independent of the individual's gender
HA: The amount of alcohol that a person consumes is dependent on the individual's gender
Step 2 Find the critical value: at α = 0.10 the critical value is 4.605, since the degrees of freedom are (2 − 1)(3 − 1) = 2

Step 3 Compute the test value: first, compute the expected frequencies.
$E_{1,1} = \frac{(27)(23)}{68} = 9.13 \qquad E_{2,1} = \frac{(41)(23)}{68} = 13.87$
$E_{1,2} = \frac{(27)(25)}{68} = 9.93 \qquad E_{2,2} = \frac{(41)(25)}{68} = 15.07$
$E_{1,3} = \frac{(27)(20)}{68} = 7.94 \qquad E_{2,3} = \frac{(41)(20)}{68} = 12.06$
• The completed table of expected frequencies:

                Alcohol consumption
Gender          Low      Moderate    High     Row total
Male            9.13     9.93        7.94     27
Female          13.87    15.07       12.06    41
Column total    23       25          20       68
Then, the test value is
$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$
$= \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \frac{(8 - 7.94)^2}{7.94} + \frac{(13 - 13.87)^2}{13.87} + \frac{(16 - 15.07)^2}{15.07} + \frac{(12 - 12.06)^2}{12.06}$
$= 0.283$
Step 4 Make the decision: Do not reject the null hypothesis, since 0.283 < 4.605

Step 5 Conclusion: There is not enough evidence to support the claim that the amount of alcohol a person consumes is dependent on the individual's gender
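The whole test can be sketched in Python using only the standard library. The function name `chi_square_independence` is an assumption for this example. The critical value is computed with a closed form that holds only for df = 2, where the right-tail area of the chi-square distribution is exp(−x/2); for other degrees of freedom a chi-square table would still be needed. Because the script keeps unrounded expected frequencies, it gives about 0.281 rather than the hand-rounded 0.283.

```python
from math import log


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of each cell: row total × column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Alcohol consumption by gender: rows = Male, Female; columns = Low, Moderate, High
observed = [[10, 9, 8], [13, 16, 12]]
chi2, df, expected = chi_square_independence(observed)

alpha = 0.10
critical = -2 * log(alpha)   # valid only because df = 2 here; about 4.605

print(f"chi-square = {chi2:.3f}, df = {df}, critical value = {critical:.3f}")
print("Reject H0" if chi2 > critical else "Do not reject H0")
```

The script also makes it easy to check the assumptions of the test, for example by confirming that every entry of `expected` is at least 5.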

Example: A random sample of 200 men, all retired, was classified according to educational level and number of children, as shown below. Test at the α = 0.05 level of significance whether there is a relationship between number of children and educational level of men.
Example: A psychologist selected 100 people from each of three income groups and asked them if they were "very happy." The number in each group who responded yes is shown in the table. At α = 0.05, test the claim that there is no difference in the proportions.

HH income         <33%    34-67%    >67%    Total
Very happy        24      33        38      95
Not very happy    76      67        62      205
Total             100     100       100     300
