Chapter Two-Four
Statistics
Objectives
At the end of this course students will be able to:
Inferential Statistics
Involves
– Estimation
– Hypothesis testing
Purpose
– Make decisions about population characteristics
Inferential process
Statistical Estimation
Estimation is the process of determining a likely value
of a population parameter, based on information
collected from the sample
Estimation is the use of sample statistics to estimate the
corresponding population parameters
The objective of estimation is to determine the
approximate value of an unknown population parameter
on the basis of a sample statistic
Sample Statistics as Estimators of Population
Parameters
[Diagram: a random sample is drawn from the population; the sample statistic is used to estimate the population parameter]
Estimation
– Point estimation
– Interval estimation
Point and Interval Estimates
A point estimate is a single value used as an estimate of a population
parameter
[Diagram: the interval estimate extends from the lower confidence limit to the upper confidence limit, with the point estimate inside it]
Estimation Process
[Diagram: the population mean, μ, is unknown → a random sample is drawn → point estimate: mean x̄ = 50 → interval estimate: "I am 95% confident that μ is between 40 & 60."]
Point estimation
Sample statistics and their corresponding
population parameters
Statistic (from sample)      Parameter (from entire population)
Mean: x̄                      estimates μ
Variance: s²                 estimates σ²
Standard deviation: s        estimates σ
Proportion: p                estimates P
Properties of good estimate
a) Unbiasedness: An estimator is said to be unbiased
if its expected value is equal to the population
parameter it estimates.
c) Consistency: An estimator is said to be consistent if its
probability of being close to the parameter it estimates increases as
the sample size increases
[Figure: Consistency, comparing n = 10 and n = 100]
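The consistency property can be illustrated with a small simulation. This is only a sketch under assumed values: the population (normal, mean 50, standard deviation 10), the number of repetitions, and the function name are all hypothetical choices made for illustration, not taken from the slides.

```python
import random
import statistics

def spread_of_sample_means(n, reps=2000, mu=50.0, sigma=10.0, seed=0):
    """Standard deviation of `reps` simulated sample means, each from a sample of size n.

    mu, sigma, reps and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# The sample mean clusters more tightly around mu as n grows.
spread_10 = spread_of_sample_means(10)
spread_100 = spread_of_sample_means(100)
print(f"n = 10:  spread of sample means = {spread_10:.2f}")
print(f"n = 100: spread of sample means = {spread_100:.2f}")
```

The spread for n = 100 should come out noticeably smaller than for n = 10, which is what the n = 10 versus n = 100 comparison is meant to convey.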
Interval estimation
It is not reasonable to assume that a sample statistic
value is exactly equal to the corresponding population
parameter
An interval estimate which locates the population
parameter within an interval, with a level of
confidence is needed
Confidence Interval or Interval Estimate
A confidence interval or interval estimate is a range or
interval of numbers believed to include an unknown
population parameter
Confidence interval: provides a range of values of the
estimate likely to include the “true” population parameter
with a given probability
CI for population mean:
There are different conditions to be considered to construct confidence intervals of the
population mean,
1. Large sample size and σ is known
2. Large sample size and σ is unknown
Whenever σ is not known (and the population is assumed
normal), the correct distribution to use is the t distribution
with n-1 degrees of freedom. However, for large degrees
of freedom, the t distribution is approximated well by the
Z distribution
Solution
$$\bar{X} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96$$
(168.04, 171.96)
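The large-sample interval in this solution can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the 95% confidence level is inferred from the multiplier 1.96 used in the solution.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Large-sample confidence interval for a population mean (z-based)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Values from the solution: sample mean 170, s = 10, n = 100.
lower, upper = z_confidence_interval(mean=170, sd=10, n=100)
print(f"({lower:.2f}, {upper:.2f})")  # (168.04, 171.96)
```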
3. Small sample size (n < 30) and σ is unknown
If the population standard deviation is unknown (and the
population is assumed normal), then the sample means from
samples of size n, standardized with s, are t-distributed
with n-1 degrees of freedom
$$\bar{X} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
Example: The average earnings per share (EPS)
for 10 industrial stocks randomly selected from
those listed on the Dow-Jones Industrial
Average was found to be x̄ = 1.85 with a
standard deviation of s = 0.395. Calculate a 99%
confidence interval for the average EPS of all
the industrials listed on the DJIA.
Solution:
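The worked solution for this example is not reproduced here, so the following is only a sketch of how the small-sample formula could be applied. The critical value t = 3.250 (α/2 = 0.005, 9 degrees of freedom) is taken from a standard t-table and hard-coded; the function and constant names are hypothetical.

```python
from math import sqrt

# Assumption: two-sided 99% critical value for df = 10 - 1 = 9, read from a t-table.
T_CRIT_99_DF9 = 3.250

def t_confidence_interval(mean, sd, n, t_crit):
    """Small-sample confidence interval for a mean, given a tabulated t value."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin

eps_lower, eps_upper = t_confidence_interval(mean=1.85, sd=0.395, n=10, t_crit=T_CRIT_99_DF9)
print(f"99% CI for mean EPS: ({eps_lower:.3f}, {eps_upper:.3f})")
```

Under these assumptions the interval is roughly 1.44 to 2.26.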
Example: A random sample of 900 workers
showed an average height of 67 inches with a
standard deviation of 5 inches.
Solution:
Example:
$$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
Example
• In a sample of 400 people who were questioned
regarding their participation in sports, 160 said that
they did participate. Construct a 98% confidence
interval for P, the proportion of people in the population
who participate in sports.
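A possible way to carry out this calculation, using the large-sample interval p ± z·sqrt(p(1-p)/n), is sketched below. The function name is hypothetical; the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.33 for 98%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 160 of 400 respondents participate in sports; 98% confidence.
p_lower, p_upper = proportion_confidence_interval(160, 400, confidence=0.98)
print(f"98% CI for P: ({p_lower:.3f}, {p_upper:.3f})")
```

With these inputs the interval is approximately 0.343 to 0.457.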
Exercise:
Sample size determination
Common questions:
Margin of Error
Factors Affecting Margin of Error
But the population standard deviation σ is usually unknown
As a result, it has to be estimated from:
– A pilot or preliminary sample: the sample standard deviation, s
– Similar studies
Estimation of single proportion
$$n = \frac{(z_{\alpha/2})^2\, p\, q}{d^2}$$
Where:
n = sample size
p = proportion (a percentage expressed as a decimal)
q = 1 - p
d = desired degree of precision
z = the standard normal value at the level of
confidence desired, usually the 95% confidence
level
Example
A) Suppose that you are interested to know the proportion of
infants who breastfed >18 months of age in a rural area.
Suppose that in a similar area, the proportion (p) of breastfed
infants was found to be 0.20. What sample size is required to
estimate the true proportion within ±3% with 95% confidence?
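The single-proportion sample size formula can be applied to this example as follows. This is a minimal sketch with a hypothetical function name; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, d, confidence=0.95):
    """n = z^2 * p * q / d^2, rounded up to the next whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

# p = 0.20 from a similar area, precision d = 0.03, 95% confidence.
n_required = sample_size_for_proportion(p=0.20, d=0.03)
print(n_required)  # 683
```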
Example
B) If the above sample is to be taken from a relatively small
population (say N = 3000), the required minimum sample will
be obtained from the above estimate by making some
adjustment (if the population is less than 10,000 then a smaller
sample size may be required).
$$n_{\text{final}} = \frac{n}{1 + \frac{n}{N}} = \frac{683}{1 + \frac{683}{3000}} \approx 557$$
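The small-population adjustment n / (1 + n/N) is short enough to check directly. The function name below is hypothetical, and rounding up is assumed so that the required precision is not undershot.

```python
from math import ceil

def finite_population_adjustment(n, population_size):
    """Adjust an initial sample size n for a finite population of size N."""
    return ceil(n / (1 + n / population_size))

# Initial estimate n = 683, population N = 3000.
n_final = finite_population_adjustment(683, 3000)
print(n_final)  # 557
```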
An estimate of p is not always available
However, the formula may also be used for sample size
calculation based on various assumptions for the values of p.
What is a Hypothesis?
A hypothesis is a claim (assumption) about
the true value of an unknown
population parameter
– Parameter may be
population mean, proportion,
correlation coefficient, ...
– Must be stated
before analysis
Example: "I claim the mean GPA of this class is 3.5"
Hypothesis testing
The purpose of hypothesis testing is to determine whether
enough statistical evidence exists to enable us to conclude that
a belief or hypothesis about a parameter is reasonable
Example
Take a sample. Is x̄ = 20 likely if the hypothesized value of μ is true?
No, not likely! REJECT H0
Steps in hypothesis testing
- Alternative hypotheses
State the null hypothesis
Null hypothesis: H0
State the alternative hypothesis
Alternative to the null hypothesis
Alternative hypothesis: H1 or HA
Hypothesis
Example: Consider population mean
H0: μ = μ0
HA: μ ≠ μ0
Two-tailed
Example:
A. Is the mean SBP of the population different from 120
mmHg?
Errors in making Decision
1. Type I Error
– Rejecting a true null hypothesis
– Equivalently, accepting a false alternative hypothesis
– Probability of Type I Error is α (alpha)
• Called the level of significance
2. Type II Error
– Failing to reject a false null hypothesis
– Equivalently, rejecting a true alternative hypothesis
– Probability of Type II Error is β (beta)
Type I & II Errors Have an Inverse
Relationship
If you reduce the probability of one
error, the probability of the other increases, provided that
everything else is unchanged.
Factors Affecting Type II Error
Significance level α
– β increases when α decreases
Population standard deviation σ
– β increases when σ increases
Sample size n
– β increases when n decreases
Controlling Type I and
Type II Errors
For any fixed α, an increase in the sample
size n will cause a decrease in β.
For any fixed sample size n, a decrease in α
will cause an increase in β. Conversely, an
increase in α will cause a decrease in β.
To decrease both α and β, increase the
sample size.
Power of a statistical test
The power of a statistical test is the probability of
rejecting H0 when H0 is really false. Thus power =
1-β.
Clearly, if the test maximizes power, it minimizes the
probability of Type II error, β.
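To make the relationship between α, n and power concrete, here is a sketch for an upper-tailed z test with known σ, where power = 1 - Φ(z_α - (μ1 - μ0)√n / σ). The function name and all numeric values (μ0 = 100, true mean μ1 = 103, σ = 10) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test of H0: mu <= mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value of the test
    shift = (mu1 - mu0) * sqrt(n) / sigma     # standardized true effect
    beta = std_normal.cdf(z_alpha - shift)    # P(fail to reject | H0 false)
    return 1 - beta

# Hypothetical scenario: mu0 = 100, true mean 103, sigma = 10.
power_n25 = power_upper_tailed_z(100, 103, 10, n=25)
power_n100 = power_upper_tailed_z(100, 103, 10, n=100)
power_n25_strict = power_upper_tailed_z(100, 103, 10, n=25, alpha=0.01)
print(f"n = 25,  alpha = 0.05: power = {power_n25:.3f}")
print(f"n = 100, alpha = 0.05: power = {power_n100:.3f}")
print(f"n = 25,  alpha = 0.01: power = {power_n25_strict:.3f}")
```

In this scenario a larger n raises power (lowers β), while a smaller α at fixed n lowers power (raises β).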
Summary:
Elements of a Hypothesis Test
Null Hypothesis (H0)
– A theory about the values of one or more population
parameters. The status quo.
Alternative Hypothesis (Ha)
– A theory that contradicts the null hypothesis. The theory
generally represents that which we will accept only when
sufficient evidence exists to establish its truth.
Test Statistic
– A sample statistic used to decide whether to reject the null
hypothesis. In general,
$$\text{test statistic} = \frac{\text{Estimate} - \text{Hypothesized parameter}}{\text{Standard error}}$$
Critical Value
– A value to which the test statistic is compared at some
particular significance level (usually α = .01, .05, or .10).
Rejection Region
– The numerical values of the test statistic for which the null
hypothesis will be rejected.
– The probability is α that the rejection region will contain the
test statistic when the null hypothesis is true, leading to a
Type I error. α is usually chosen to be small (.01, .05, .10)
and is the level of significance of the test.
Summary of One- and Two-Tail Tests
        One-Tail Test     Two-Tail Test     One-Tail Test
        (left tail)                         (right tail)
H0:     μ ≥ μ0            μ = μ0            μ ≤ μ0
HA:     μ < μ0            μ ≠ μ0            μ > μ0
Summary: Rejection Regions
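1. Rejection regions (shown in grey in the original figures), by form of Ha:
– 2-tail hypothesis: reject H0 if |z| > |z_{α/2}|, with area α/2 in each tail
– 1-tail (lower-tail) hypothesis: reject H0 if z < z_α, the lower-tail critical value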
Summary: Type I and Type II
Errors
Example: Two-Tail Test
General steps in hypothesis testing:
5. Making decision.
6. Conclusion
Summary of Decision Rules
Example Solution: Two-Tail Test
• H0: μ = 368
• HA: μ ≠ 368
• α = 0.05
• Test statistic: t, with df = 25-1 = 24
• Critical value: ±2.064
• 0.02 < p-value < 0.05
Solution
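• H0: μ ≤ 368
• H1: μ > 368
• α = 0.05
• n = 36
• Critical value: 1.645
• Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = 1.50$
• Decision: since 1.50 < 1.645, the test statistic does not fall in the rejection region; do not reject H0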
Example: An automatic bottling machine fills cola into two liter (2000
cc) bottles. A consumer advocate wants to test the null hypothesis that
the average amount filled by the machine into a bottle is at least 2000 cc.
A random sample of 40 bottles coming out of the machine was selected
and the exact content of the selected bottles are recorded. The sample
mean was 1999.6 cc. The population standard deviation is known from
past experience to be 1.30 cc.
Compute the p-value for this test.
• H0: μ ≥ 2000
• H1: μ < 2000
• n = 40, μ0 = 2000, x̄ = 1999.6, σ = 1.3
• The test statistic is:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1999.6 - 2000}{1.3/\sqrt{40}} = -1.95$$
$$p\text{-value} = P(Z \le -1.95) = 0.5000 - 0.4744 = 0.0256$$
p-Value Solution
Since p-value = 0.0256 < α = 0.05,
reject H0.
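The p-value for the bottling example can also be computed without a normal table. The sketch below uses a hypothetical function name and the figures stated in the example; because it does not round z to -1.95 first, the result (about 0.0258) differs slightly from the table-based 0.0256.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(sample_mean, mu0, sigma, n):
    """z statistic and p-value for H0: mu >= mu0 against H1: mu < mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value

z_stat, p_value = lower_tail_z_test(sample_mean=1999.6, mu0=2000, sigma=1.3, n=40)
reject_h0 = p_value < 0.05
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```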
Example: One-Tailed Test
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2700 - 2000}{947/\sqrt{100}} = \frac{700}{94.7} = 7.39$$
Reject H0
6. Conclusion: Since the test statistic falls in
the upper rejection region, H0 is rejected, and
we may conclude that the average insurance
liability per board seat in “small companies”
is more than $2000.
Example:
1. A company that delivers packages within a large metropolitan
area claims that it takes an average of 28 minutes for a package
to be delivered from your door to the destination. Suppose that
you want to carry out a hypothesis test of this claim. Test the claim
that the mean time for a package to be delivered is equal to 28
minutes at the 0.05 level of significance.
2. The University uses thousands of fluorescent light bulbs each
year. The brand of bulb it currently uses has a mean life of 900
hours. A manufacturer claims that its new brand of bulbs, which
cost the same as the brand the university currently uses, has a
mean life of more than 900 hours. The university has decided to
purchase the new brand if, when tested, the test evidence
supports the manufacturer’s claim at the 0.05 significance level.
Suppose 64 bulbs were tested with the following results: x̄ = 920
hours, s = 80 hours. Will the University purchase the new brand
of fluorescent bulbs?
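One way the light-bulb question could be worked is sketched below. It assumes an upper-tailed test (H0: μ ≤ 900, H1: μ > 900) and, because n = 64 is large, treats the sample standard deviation as if it were σ and uses the z distribution; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(sample_mean, mu0, sd, n, alpha):
    """Upper-tailed large-sample z test; returns (z, critical value, p-value)."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)
    return z, critical, p_value

z_bulbs, z_crit, p_bulbs = upper_tail_z_test(sample_mean=920, mu0=900, sd=80, n=64, alpha=0.05)
buy_new_brand = z_bulbs > z_crit
print(f"z = {z_bulbs:.2f}, critical value = {z_crit:.3f}, p-value = {p_bulbs:.4f}")
print("Evidence supports the manufacturer's claim:", buy_new_brand)
```

Under these assumptions z = 2.0 exceeds the critical value, so the evidence would support the manufacturer's claim at the 0.05 level.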
Measures of association
Chi-Square
Variable A Variable B
B1 B2 B3 B4 Totals
A1
A2
A3
Totals Grand total
where:
r = number of rows (number of categories of variable A)
c = number of columns (number of categories of variable B)
Chi-Square
Hypothesis to be tested:
H0: There is no association between the
row and column variables
HA: There is an association
or
H0: The row and column variables are
independent
HA: The two variables are dependent
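Test statistic: χ² test with df = (r - 1) × (c - 1)
Chi-Square (χ²) test
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where:
$O_{ij}$ - observed frequency of the i-th row and j-th column
$$E_{ij} = \frac{i\text{-th row total} \times j\text{-th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n}$$
$R_i$ - marginal total of the i-th row
$C_j$ - marginal total of the j-th column
n - grand total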
An alternative method to calculate Chi-Square
for 2×2 table
Outcome
Exposure Yes No Total
Yes a b r1
No c d r2
Total c1 c2 n
$$\chi^2 = \frac{n(ad - bc)^2}{r_1 r_2 c_1 c_2}$$
Remember that the Chi-Square test should be applied
to counts and not percentages
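As a quick check that the 2×2 shortcut agrees with the general sum of (O - E)²/E, the sketch below evaluates both on a made-up table. The cell counts (a = 20, b = 30, c = 10, d = 40) and function names are hypothetical and serve only to illustrate the algebra.

```python
def chi_square_2x2_shortcut(a, b, c, d):
    """Shortcut formula n(ad - bc)^2 / (r1 r2 c1 c2) for a 2x2 table."""
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    n = r1 + r2
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)

def chi_square_general(table):
    """General formula: sum over all cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Hypothetical counts, not taken from the slides.
shortcut_value = chi_square_2x2_shortcut(20, 30, 10, 40)
general_value = chi_square_general([[20, 30], [10, 40]])
print(f"shortcut = {shortcut_value:.4f}, general = {general_value:.4f}")
```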
Characteristics of the Chi-Square Distribution
1. It is not symmetric.
Assumptions of the χ² test
[Figure: χ² distribution on the axis from 0, with the critical value χ²(df, α) marked]
Steps
Step 2: Select a level of significance α based upon
the seriousness of making a Type I error. The level of
significance is used to determine the critical value.
All Chi-Square tests for independence are right-tailed
tests, so the critical value is χ²_α with (r - 1) × (c - 1)
degrees of freedom. The shaded region at the right
represents the critical region or rejection region.
Step 3: Calculate the expected frequencies for
Contingency Table Cells and Verify the requirements are
satisfied.
$$E_{r,c} = \frac{(\text{Sum of row } r)(\text{Sum of column } c)}{\text{Sample size}}$$
(2) no more than 20% of the expected frequencies are less than 5.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Step 5: Make a decision to reject or fail to
reject the null hypothesis
- Compare the critical value to the test statistic
Example
A researcher wishes to determine whether there is a
relationship between the gender of an individual
and the amount of alcohol consumed. A sample of
68 people is selected, and the following data are
obtained. At α = 0.10, can the researcher conclude
that alcohol consumption is related to gender?
                      Alcohol consumption
Gender          Low    Moderate    High    Row total
Male            10     9           8       27
Female          13     16          12      41
Column total    23     25          20      68
Solution
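Step 1 State the hypotheses
H0: The amount of alcohol consumed is independent of gender
H1: The amount of alcohol consumed is dependent on gender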
Step 2 Find the critical value: the critical value is
4.605, since the degrees of freedom are (2-1)(3-1) = 2 and α = 0.10
[Table of expected frequencies by gender and alcohol consumption (Low, Moderate, High), with row and column totals]
Then, the test value is
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = 0.283$$
Step 4 Make the decision: Do not reject the null
hypothesis, since 0.283 < 4.605
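The whole gender and alcohol example can be re-run in a few lines. This is a sketch with hypothetical names: it computes expected frequencies as row total × column total / n, sums (O - E)²/E, and obtains the critical value from the closed form -2·ln(α), which holds only for 2 degrees of freedom. With unrounded expected frequencies the statistic comes out near 0.281; the 0.283 above presumably reflects rounding of the expected values.

```python
from math import log

# Observed counts: rows = Male, Female; columns = Low, Moderate, High.
observed = [[10, 9, 8],
            [13, 16, 12]]

def chi_square_statistic(table):
    """Chi-square test statistic for independence in an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    return chi2

chi2_value = chi_square_statistic(observed)
alpha = 0.10
critical_value = -2 * log(alpha)  # exact only for df = 2
reject_null = chi2_value > critical_value
print(f"chi-square = {chi2_value:.3f}, critical value = {critical_value:.3f}")
print("Reject H0:", reject_null)
```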
Example: A random sample of 200 men, all retired, was
classified according to educational level and
number of children, as shown below. Test at the α = 0.05
level of significance whether there is a relationship between
number of children and educational level.
Example: A psychologist selected 100 people from each of
three income groups and asked them if
they were “very happy.” The percent for each group who
responded yes and the
number from the survey are shown in the table. At α = 0.05,
test the claim that there is
no difference in the proportions.
Very happy 24 33 38 95