
Chapter 4

Inferences Based on a Single Sample:
Confidence Intervals and Tests of Hypotheses
Chapter outline

• Appendix 1: The Normal Distribution
• Sampling Distribution
• Confidence intervals for the population parameter
• Proper sample size for estimating a population parameter
• Test of Hypothesis
Appendix 1

The Normal Distribution


Importance of Normal Distribution

• Describes many random processes or continuous phenomena
• Basis for classical statistical inference
Normal Distribution

1. ‘Bell-shaped’ and symmetrical
2. Mean, median, and mode are equal

[Figure: bell curve of f(x) against x, with the mean, median, and mode marked at the center]
Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where
µ = Mean of the normal random variable x
σ = Standard deviation
π = 3.1415 . . .
e = 2.71828 . . .

P(x < a) is obtained from a table of normal probabilities.
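
As a rough illustration of the density formula above, the sketch below evaluates f(x) directly and obtains P(x < a) from the error function instead of a printed table. The function names `normal_pdf` and `normal_cdf` are illustrative choices, not part of the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal random variable with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(a, mu, sigma):
    """P(x < a), computed from the error function rather than a table."""
    return 0.5 * (1.0 + math.erf((a - mu) / (sigma * math.sqrt(2.0))))

peak = normal_pdf(0.0, 0.0, 1.0)         # height of the standard normal curve at its mean
half = normal_cdf(0.0, 0.0, 1.0)         # area to the left of the mean
below_196 = normal_cdf(1.96, 0.0, 1.0)   # area to the left of z = 1.96

print(round(peak, 4), half, round(below_196, 3))
```

The table lookup and the error-function route should agree to the precision of the table.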
Effect of Varying Parameters (µ & σ)
Normal Distribution Probability

Probability is area under the curve!

$$P(c \le x \le d) = \int_{c}^{d} f(x)\,dx$$

[Figure: normal curve f(x) with the area between x = c and x = d shaded]
Standard Normal Distribution
The standard normal distribution is a normal distribution with µ = 0 and σ = 1. A random variable with a standard normal distribution, denoted by the symbol z, is called a standard normal random variable.
The Standard Normal Table: P(0 < z < 1.96)

Standardized Normal Probability Table (Portion)

Z      .04     .05     .06
1.8    .4671   .4678   .4686
1.9    .4738   .4744   .4750
2.0    .4793   .4798   .4803
2.1    .4838   .4842   .4846

P(0 < z < 1.96) = P(z < 1.96) – P(z < 0) = .975 – .5 = .4750

[Figure: standard normal curve (µ = 0, σ = 1) with the area between 0 and 1.96 shaded; probabilities exaggerated]
The Standard Normal Table: P(–1.26 ≤ z ≤ 1.26)

P(–1.26 ≤ z ≤ 1.26) = .3962 + .3962 = .7924

[Figure: standard normal curve (µ = 0, σ = 1) with the area between –1.26 and 1.26 shaded, .3962 on each side of the mean; shaded area exaggerated]
The Standard Normal Table: P(z > 1.26)

P(z > 1.26) = .5000 – .3962 = .1038

[Figure: standard normal curve (µ = 0, σ = 1) with the area to the right of 1.26 shaded]
The Standard Normal Table: P(–2.78 ≤ z ≤ –2.00)

P(–2.78 ≤ z ≤ –2.00) = .4973 – .4772 = .0201

[Figure: standard normal curve (µ = 0, σ = 1) with the area between –2.78 and –2.00 shaded; shaded area exaggerated]
The Standard Normal Table: P(z > –2.13)

P(z > –2.13) = .4834 + .5000 = .9834

[Figure: standard normal curve (µ = 0, σ = 1) with the area to the right of –2.13 shaded; shaded area exaggerated]
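
The table lookups above can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative. Small differences in the fourth decimal are expected because the table values are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_0_to_196 = Z.cdf(1.96) - Z.cdf(0.0)      # P(0 < z < 1.96)
p_between = Z.cdf(1.26) - Z.cdf(-1.26)     # P(-1.26 <= z <= 1.26)
p_upper = 1.0 - Z.cdf(1.26)                # P(z > 1.26)
p_left_band = Z.cdf(-2.00) - Z.cdf(-2.78)  # P(-2.78 <= z <= -2.00)
p_above = 1.0 - Z.cdf(-2.13)               # P(z > -2.13)

for name, p in [("P(0<z<1.96)", p_0_to_196), ("P(-1.26<=z<=1.26)", p_between),
                ("P(z>1.26)", p_upper), ("P(-2.78<=z<=-2.00)", p_left_band),
                ("P(z>-2.13)", p_above)]:
    print(f"{name}: {p:.4f}")
```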
Non-standard Normal Distribution

Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That’s an infinite number of tables!

[Figure: several normal curves f(x) with different means and standard deviations]
Converting a Normal Distribution to a Standard Normal Distribution

If x is a normal random variable with mean µ and standard deviation σ, then the random variable z, defined by the formula

$$z = \frac{x - \mu}{\sigma}$$

has a standard normal distribution. The value z describes the number of standard deviations between x and µ.
Standardize the Normal Distribution

$$z = \frac{x - \mu}{\sigma}$$

[Figure: a normal distribution with mean µ and standard deviation σ mapped to the standardized normal distribution with µ = 0 and σ = 1]

One table!
Finding a Probability Corresponding to a
Normal Random Variable
1. Sketch normal distribution, indicate mean, and shade
the area corresponding to the probability you want.
2. Convert the boundaries of the shaded area from x values to standard normal random variable z values using
$$z = \frac{x - \mu}{\sigma}$$
Show the z values under the corresponding x values.
3. Use Table II in Appendix D to find the areas
corresponding to the z values. Use symmetry when
necessary.
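Non-standard Normal μ = 5, σ = 10: P(5 < x < 6.2)

$$z = \frac{x - \mu}{\sigma} = \frac{6.2 - 5}{10} = .12$$

P(5 < x < 6.2) = P(0 < z < .12) = .0478

[Figure: normal distribution (µ = 5, σ = 10) with the area between 5 and 6.2 shaded, next to the standardized normal distribution with the area between 0 and .12 shaded; shaded area exaggerated]

Non-standard Normal μ = 5, σ = 10: P(3.8 ≤ x ≤ 5)

$$z = \frac{x - \mu}{\sigma} = \frac{3.8 - 5}{10} = -.12$$

P(3.8 ≤ x ≤ 5) = P(–.12 ≤ z ≤ 0) = .0478

[Figure: normal distribution (µ = 5, σ = 10) with the area between 3.8 and 5 shaded, next to the standardized normal distribution with the area between –.12 and 0 shaded; shaded area exaggerated]

Non-standard Normal μ = 5, σ = 10: P(2.9 ≤ x ≤ 7.1)

$$z = \frac{x - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21 \qquad z = \frac{x - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$$

P(2.9 ≤ x ≤ 7.1) = P(–.21 ≤ z ≤ .21) = .0832 + .0832 = .1664

[Figure: normal distribution (µ = 5, σ = 10) with the area between 2.9 and 7.1 shaded, next to the standardized normal distribution with the area between –.21 and .21 shaded; shaded area exaggerated]

Non-standard Normal μ = 5, σ = 10: P(x ≥ 8)

$$z = \frac{x - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$

P(x ≥ 8) = P(z ≥ .30) = .5000 – .1179 = .3821

[Figure: normal distribution (µ = 5, σ = 10) with the area to the right of 8 shaded, next to the standardized normal distribution with the area to the right of .30 shaded; shaded area exaggerated]

Non-standard Normal μ = 5, σ = 10: P(7.1 ≤ x ≤ 8)

$$z = \frac{x - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21 \qquad z = \frac{x - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$

P(7.1 ≤ x ≤ 8) = P(.21 ≤ z ≤ .30) = .1179 – .0832 = .0347

[Figure: normal distribution (µ = 5, σ = 10) with the area between 7.1 and 8 shaded, next to the standardized normal distribution with the area between .21 and .30 shaded; shaded area exaggerated]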
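
The five µ = 5, σ = 10 examples above all follow the same recipe: standardize the boundaries, then read areas under the standard normal curve. A minimal sketch of that recipe, assuming Python 3.8+ and using the hypothetical helper names `prob_between` and `prob_above`:

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal curve
MU, SIGMA = 5.0, 10.0     # parameters used in the examples above

def prob_between(lo, hi, mu=MU, sigma=SIGMA):
    """P(lo <= x <= hi): standardize both boundaries, then subtract areas."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return Z.cdf(z_hi) - Z.cdf(z_lo)

def prob_above(a, mu=MU, sigma=SIGMA):
    """P(x >= a): area to the right of the standardized boundary."""
    return 1.0 - Z.cdf((a - mu) / sigma)

p1 = prob_between(5.0, 6.2)   # table answer .0478
p2 = prob_between(3.8, 5.0)   # table answer .0478
p3 = prob_between(2.9, 7.1)   # table answer .1664
p4 = prob_above(8.0)          # table answer .3821
p5 = prob_between(7.1, 8.0)   # table answer .0347

print([round(p, 4) for p in (p1, p2, p3, p4, p5)])
```

The computed values may differ from the table answers in the fourth decimal because each table entry is rounded before it is added or subtracted.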
Normal Distribution Thinking Challenge
You work in Quality Control for GE. Light bulb life has a normal distribution with µ = 2000 hours and σ = 200 hours. What’s the probability that a bulb will last
A. between 2000 and 2400 hours?
B. less than 1470 hours?
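Finding z-Values for Known Probabilities

What is z, given P(0 < z < ?) = .1217?

Standardized Normal Probability Table (Portion)

Z      .00     .01     .02
0.0    .0000   .0040   .0080
0.1    .0398   .0438   .0478
0.2    .0793   .0832   .0871
0.3    .1179   .1217   .1255

The area .1217 sits in row 0.3, column .01, so z = .31.

[Figure: standard normal curve (µ = 0, σ = 1) with the area .1217 between 0 and z = .31 shaded; shaded area exaggerated]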
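Finding x Values for Known Probabilities

$$x = \mu + z\sigma = 5 + (.31)(10) = 8.1$$

[Figure: normal distribution (µ = 5, σ = 10) with the area .1217 between 5 and x = 8.1 shaded, next to the standardized normal distribution with the area .1217 between 0 and z = .31 shaded; shaded areas exaggerated]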
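
The reverse lookup can also be done numerically. The sketch below assumes Python 3.8+ and uses `NormalDist.inv_cdf`, which expects the total area to the left of z, so the .5 below the mean is added to the tabled area .1217.

```python
from statistics import NormalDist

mu, sigma = 5.0, 10.0
area = 0.1217                          # area between the mean and the unknown value

z = NormalDist().inv_cdf(0.5 + area)   # z-value with cumulative area .6217 to its left
x = mu + z * sigma                     # convert back to the original scale

print(round(z, 2), round(x, 1))
```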
Sampling Distribution
Parameter & Statistic

A parameter is a numerical descriptive measure of a population. Because it is based on all the observations in the population, its value is almost always unknown.
A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.
Common Statistics & Parameters

                       Sample Statistic    Population Parameter
Mean                   x̄                   µ
Standard Deviation     s                   σ
Variance               s²                  σ²
Binomial Proportion    p̂                   p
Sampling Distribution

The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

• Randomly collect n measurements from the population for each sample. The statistics computed from these samples follow a distribution, namely the sampling distribution.
Example
Sampling Distributions
Suppose There’s a Population ...
• Population size, N = 4
• Random variable, x
• Values of x: 1, 2, 3, 4
• Uniform distribution

© 1984-1994 T/Maker Co.


Population Characteristics

Summary Measure

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = 2.5$$

Population Distribution

[Figure: bar chart of P(x) against x = 1, 2, 3, 4]
All Possible Samples of Size n = 2 (sample with replacement)

16 Samples

1st     2nd Observation
Obs     1      2      3      4
1       1,1    1,2    1,3    1,4
2       2,1    2,2    2,3    2,4
3       3,1    3,2    3,3    3,4
4       4,1    4,2    4,3    4,4

16 Sample Means

1st     2nd Observation
Obs     1      2      3      4
1       1.0    1.5    2.0    2.5
2       1.5    2.0    2.5    3.0
3       2.0    2.5    3.0    3.5
4       2.5    3.0    3.5    4.0
Sampling Distribution of All Sample Means

16 Sample Means

1st     2nd Observation
Obs     1      2      3      4
1       1.0    1.5    2.0    2.5
2       1.5    2.0    2.5    3.0
3       2.0    2.5    3.0    3.5
4       2.5    3.0    3.5    4.0

Sampling Distribution of the Sample Mean

[Figure: bar chart of P(x̄) against x̄ = 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
Summary Measure of All Sample Means

$$\mu_{\bar{x}} = \frac{\sum_{i=1}^{N} \bar{x}_i}{N} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$$
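
The small population above can be enumerated exhaustively. The following sketch lists all 16 ordered samples of size 2 drawn with replacement, tabulates the distribution of the sample mean, and compares its mean with the population mean; exact fractions are used so that no rounding enters the comparison. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]

# All ordered samples of size n = 2, sampling with replacement.
samples = list(product(population, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Sampling distribution: each sample mean with its probability.
dist = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}

mu = Fraction(sum(population), len(population))   # population mean
mu_xbar = sum(means) / len(means)                 # mean of the 16 sample means

for m, p in dist.items():
    print(float(m), float(p))
print("mu =", float(mu), "mu_xbar =", float(mu_xbar))
```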
Comparison

[Figure: bar chart of the population distribution P(x) for x = 1, 2, 3, 4, next to the bar chart of the sampling distribution P(x̄) for x̄ = 1.0, 1.5, ..., 4.0]

$$\mu = 2.5 \qquad \mu_{\bar{x}} = 2.5$$
The Sampling Distribution
of a Sample Mean and the
Central Limit Theorem
Properties of the Sampling Distribution of x̄

1. Mean of the sampling distribution equals mean of sampled population, that is,
$$\mu_{\bar{x}} = E(\bar{x}) = \mu.$$
2. Standard deviation of the sampling distribution (the standard error) equals the standard deviation of the sampled population divided by the square root of the sample size. That is,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
Standard Error of the Mean

The standard deviation $\sigma_{\bar{x}}$ is often referred to as the standard error of the mean.
Theorem

If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x̄ will be a normal distribution.
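Sampling from Normal Populations

• Central Tendency: $\mu_{\bar{x}} = \mu$
• Dispersion (sampling with replacement): $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

[Figure: population distribution with µ = 50 and σ = 10, above sampling distributions of x̄ centered at $\mu_{\bar{x}}$ = 50, with $\sigma_{\bar{x}}$ = 5 for n = 4 and $\sigma_{\bar{x}}$ = 2.5 for n = 16]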
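Standardizing the Sampling Distribution of x̄

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

[Figure: sampling distribution of x̄ with mean $\mu_{\bar{x}}$ and standard deviation $\sigma_{\bar{x}}$ mapped to the standardized normal distribution with µ = 0 and σ = 1]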
Central Limit Theorem
Consider a random sample of n observations selected from a population (any probability distribution) with mean µ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of x̄ will be approximately a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. The larger the sample size, the better will be the normal approximation to the sampling distribution of x̄.
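
A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings that do not come from the slides: an exponential population with mean 1 and standard deviation 1 (strongly skewed), samples of size n = 30, 5,000 repetitions, and a fixed random seed. It compares the mean and standard deviation of the simulated sample means with µ and σ/√n.

```python
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
n, reps = 30, 5000        # assumed sample size and number of repeated samples

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

mean_of_means = statistics.fmean(means)   # should be close to mu = 1
sd_of_means = statistics.pstdev(means)    # should be close to sigma / sqrt(n)
theory_se = 1.0 / n ** 0.5

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theory_se, 3))
```

Plotting a histogram of `means` would show the roughly bell-shaped pattern the theorem describes, even though the population itself is far from normal.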
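Central Limit Theorem

As sample size gets large enough (n ≥ 30), the sampling distribution becomes almost normal, with

$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

[Figure: approximately normal sampling distribution of x̄ centered at $\mu_{\bar{x}} = \mu$]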
The Sampling Distribution
of the Sample Proportion
Sample Proportion

Just as the sample mean is a good estimator of the population mean, the sample proportion, denoted p̂, is a good estimator of the population proportion p. How good the estimator is will depend on the sampling distribution of the statistic. This sampling distribution has properties similar to those of the sampling distribution of x̄.
Sampling Distribution of p̂
1. Mean of the sampling distribution is equal to the true binomial proportion, p; that is, E(p̂) = p. Consequently, p̂ is an unbiased estimator of p.
2. Standard deviation of the sampling distribution is equal to $\sqrt{p(1-p)/n}$; that is, $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
3. For large samples, the sampling distribution is approximately normal. (A sample is considered large if np̂ ≥ 15 and n(1 – p̂) ≥ 15.)
Example

Suppose you’re interested in the average amount of money that students in this class (the population) have on them. How would you find out?
Confidence intervals of
parameters
Target Parameter
• The unknown population parameter (e.g., mean or proportion) that we are interested in estimating is called the target parameter.
• The type of data (quantitative or qualitative) collected is indicative of the target parameter. With quantitative data, you are likely to be estimating the mean or variance of the data. With qualitative data with two outcomes (success or failure), the binomial proportion of successes is likely to be the parameter of interest.
Target Parameter
Determining the Target Parameter

Parameter    Key Words or Phrase                       Type of Data
µ            Mean; average                             Quantitative
p            Proportion; percentage; fraction; rate    Qualitative
Estimates

If the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an unbiased estimate of the parameter.
If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a biased estimate of the parameter.
Point Estimator
A point estimator of a population parameter is a
rule or formula that tells us how to use the sample
data to calculate a single number that can be used
as an estimate of the target parameter.
Point Estimation
• Provides a single value, based on observations from one sample
• Gives no information about how close the value is to the unknown population parameter
• Example: Sample mean x̄ = 3 is the point estimate of the unknown population mean
Interval Estimator

An interval estimator (or confidence interval) is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter.
Interval Estimation
• Provides a range of values
• Based on observations from one sample
• Gives information about closeness to unknown population parameter
  • Stated in terms of probability
    – Knowing exact closeness requires knowing unknown population parameter
• Example: Unknown population mean lies between 50 and 70 with 95% confidence
Estimation Process

[Figure: flow from the population (mean, µ, is unknown) to a random sample (mean x̄ = 50) to the statement “I am 95% confident that µ is between 40 & 60.”]
Key Elements of Interval Estimation

[Figure: a confidence interval drawn as a line segment, with the sample statistic (point estimate) at its center, the lower confidence limit at the left end, and the upper confidence limit at the right end]

A confidence interval provides a range of plausible values for the population parameter.
Example-Overdue

Data set: OVRDUE


Confidence Interval
The Central Limit Theorem:
• The sampling distribution of the sample mean is approximately normal for large samples.
• The interval estimator:
$$\bar{x} \pm 1.96\,\sigma_{\bar{x}} = \bar{x} \pm \frac{1.96\,\sigma}{\sqrt{n}}$$
• For large samples, the fact that σ is unknown poses only a minor problem: the sample standard deviation s provides a very good approximation to σ.
Confidence Interval
If sample measurements yield a value of x̄ that falls between the two lines on either side of µ, then the interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ will contain µ.
95% Confidence Level
If our confidence level is 95%, then in the long run,
95% of our confidence intervals will contain µ and
5% will not.
To choose a different confidence coefficient we increase or decrease the area (call it α) assigned to the tails. If we place α/2 in each tail and $z_{\alpha/2}$ is the z-value, the confidence interval with coefficient (1 – α) is
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}.$$
Large-Sample 100(1 – α)% Confidence Interval for µ

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

where $z_{\alpha/2}$ is the z-value with an area α/2 to its right in the standard normal distribution. The parameter σ is the standard deviation of the sampled population, and n is the sample size.
Note: When σ is unknown and n is large (n ≥ 30), the confidence interval is approximately equal to

$$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

where s is the sample standard deviation.
Required Conditions

1. A random sample is selected from the target population.
2. The sample size n is large (i.e., n ≥ 30). Due to the Central Limit Theorem, this condition guarantees that the sampling distribution of x̄ is approximately normal. Also, for large n, s will be a good estimator of σ.
Thinking Challenge
You’re a Q/C inspector for Gallo. The σ for 2-liter bottles is .05 liters. A random sample of 100 bottles showed x̄ = 1.99 liters. What is the 90% confidence interval estimate of the true mean amount in 2-liter bottles?

Note: $\sigma_{\bar{x}} = 0.05 / \sqrt{100} = 0.005$, so the interval is built from 1.99 ± 1.64(0.005).
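
One way to carry out the large-sample interval for the bottle data is sketched below. The helper name `z_interval` is hypothetical, and the critical value is taken from `statistics.NormalDist` (about 1.645 for 90% confidence) rather than from a printed table, so the last digit may differ slightly from a hand calculation that uses 1.64.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Large-sample confidence interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z-value with alpha/2 to its right
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Bottle example: xbar = 1.99 liters, sigma = .05 liters, n = 100, 90% confidence.
low, high = z_interval(1.99, 0.05, 100, 0.90)
print(round(low, 4), round(high, 4))
```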



Problem

• Unoccupied seats on flights cause airlines to lose revenue. Suppose a large airline wants to estimate its average number of unoccupied seats per flight over the past year. To accomplish this, the records of 225 flights are randomly selected, and the number of unoccupied seats is noted for each of the sampled flights. (The data are saved in the NOSHOW file.) Estimate a 90% confidence interval for the mean number of unoccupied seats.
Example
Example
Small Sample, σ Unknown

Instead of using the standard normal statistic

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

use the t-statistic

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

in which the sample standard deviation, s, replaces the population standard deviation, σ.
Student’s t-Statistic
The t-statistic has a sampling distribution very much like that of the z-statistic: mound-shaped, symmetric, with mean 0.
The primary difference between the sampling distributions of t and z is that the t-statistic is more variable than the z-statistic.
Degrees of Freedom

The actual amount of variability in the sampling distribution of t depends on the sample size n. A convenient way of expressing this dependence is to say that the t-statistic has (n – 1) degrees of freedom (df).
Student’s t Distribution

• Bell-shaped
• Symmetric
• ‘Fatter’ tails

[Figure: standard normal curve (z) overlaid with t curves for df = 13 and df = 5, all centered at 0]
t - Table
t-value
If we want the t-value with an area of .025 to its
right and 4 df, we look in the table under the
column t.025 for the entry in the row corresponding
to 4 df. This entry is t.025 = 2.776. The
corresponding standard normal z-score is z.025 =
1.96.
Small-Sample Confidence Interval for µ

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

where $t_{\alpha/2}$ is based on (n – 1) degrees of freedom.
Required Conditions

1. A random sample is selected from the target


population.
2. The population has a relative frequency
distribution that is approximately normal.
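Estimation Example
Mean (σ Unknown)

A random sample of n = 25 has x̄ = 50 and s = 8. Set up a 95% confidence interval estimate for µ.

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$50 - 2.064\cdot\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.064\cdot\frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30$$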
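
The arithmetic of this example can be reproduced in a few lines. The Python standard library has no t distribution, so the sketch below simply hardcodes the table value 2.064 (t.025 with 24 df) that the example uses.

```python
import math

n, xbar, s = 25, 50.0, 8.0
t_crit = 2.064   # t.025 with n - 1 = 24 df, the table value used in the example

half_width = t_crit * s / math.sqrt(n)
low, high = xbar - half_width, xbar + half_width

print(f"{low:.2f} <= mu <= {high:.2f}")
```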
Thinking Challenge
You’re a time study analyst
in manufacturing. You’ve
recorded the following task
times (min.):
3.6, 4.2, 4.0, 3.5, 3.8, 3.1.
What is the 90% confidence
interval estimate of the
population mean task time?
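
One possible way to work this challenge is sketched below. It computes x̄ and s from the six task times and applies the small-sample interval; the critical value 2.015 is an assumption taken from a standard t table (t.05 with 5 df), since the standard library offers no t distribution.

```python
import math
import statistics

times = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]   # task times in minutes
n = len(times)

xbar = statistics.mean(times)    # sample mean
s = statistics.stdev(times)      # sample standard deviation (n - 1 divisor)
t_crit = 2.015                   # assumed table value: t.05 with n - 1 = 5 df

half_width = t_crit * s / math.sqrt(n)
low, high = xbar - half_width, xbar + half_width

print(round(xbar, 2), round(s, 4), round(low, 3), round(high, 3))
```

The interval is only valid if task times are approximately normally distributed, as the required conditions state.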
Problem
• Facial structure of CEOs. In Psychological Science (Vol. 22, 2011), researchers reported that a chief executive officer’s facial structure can be used to predict a firm’s financial performance. The study involved measuring the facial width-to-height ratio (WHR) for each in a sample of 55 CEOs at publicly traded Fortune 500 firms. These WHR values (determined by a computer analyzing a photo of the CEO’s face) had a mean of x̄ = 1.96 and a standard deviation of s = .15.
• a. Find and interpret a 95% confidence interval for µ, the mean facial WHR for all CEOs at publicly traded Fortune 500 firms.
• b. The researchers found that CEOs with wider faces (relative to height) tended to be associated with firms that had greater financial performance. They based their inference on an equation that uses facial WHR to predict financial performance. Suppose an analyst wants to predict the financial performance of a Fortune 500 firm based on the value of the true mean facial WHR of CEOs. The analyst wants to use the value of µ = 2.2. Do you recommend he use this value?
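
A possible numerical check for this problem is sketched below. It treats n = 55 as a large sample and uses the z-based interval with s in place of σ, which is one reasonable reading of the problem rather than the only one; a t-based interval would be slightly wider.

```python
import math
from statistics import NormalDist

n, xbar, s = 55, 1.96, 0.15
z = NormalDist().inv_cdf(0.975)        # about 1.96 for 95% confidence

half_width = z * s / math.sqrt(n)
low, high = xbar - half_width, xbar + half_width

candidate = 2.2                        # value the analyst proposes for the mean
inside = low <= candidate <= high      # is 2.2 a plausible value for the mean?

print(round(low, 3), round(high, 3), inside)
```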
Large-Sample Confidence
Interval for a Population
Proportion
Problem

• A food-products company conducted a market study by randomly sampling and interviewing 1,000 consumers to determine which brand of breakfast cereal they prefer. Suppose 313 consumers were found to prefer the company’s brand. How would you estimate the true fraction of all consumers who prefer the company’s cereal brand?
Sampling Distribution of p̂
1. The mean of the sampling distribution of p̂ is p; that is, p̂ is an unbiased estimator of p.
2. The standard deviation of the sampling distribution of p̂ is $\sqrt{pq/n}$; that is, $\sigma_{\hat{p}} = \sqrt{pq/n}$, where q = 1 – p.
3. For large samples, the sampling distribution of p̂ is approximately normal. A sample size is considered large if both np̂ ≥ 15 and nq̂ ≥ 15.
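Large-Sample Confidence Interval for p

$$\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}} = \hat{p} \pm z_{\alpha/2}\sqrt{\frac{pq}{n}} \approx \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$.

Note: When n is large, p̂ can approximate the value of p in the formula for $\sigma_{\hat{p}}$.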
Conditions Required for a Valid
Large-Sample Confidence
Interval for p
1. A random sample is selected from the target
population.
2. The sample size n is large. (This condition will be satisfied if both np̂ ≥ 15 and nq̂ ≥ 15. Note that np̂ and nq̂ are simply the number of successes and number of failures, respectively, in the sample.)
Estimation Example
Proportion
A random sample of 400 graduates showed 32 went to graduate school. Set up a 95% confidence interval estimate for p.

$$\hat{p} = \frac{32}{400} = 0.08$$
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$.08 - 1.96\sqrt{\frac{(.08)(.92)}{400}} \le p \le .08 + 1.96\sqrt{\frac{(.08)(.92)}{400}}$$
$$.053 \le p \le .107$$
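
The same calculation, including the large-sample check, can be sketched as follows. The value 1.96 is hardcoded to match the example, and the variable names are illustrative.

```python
import math

x, n = 32, 400              # successes and sample size from the example
p_hat = x / n
q_hat = 1.0 - p_hat

# Large-sample condition: at least 15 successes and 15 failures.
large_enough = n * p_hat >= 15 and n * q_hat >= 15

z = 1.96                    # z.025, as used in the example
half_width = z * math.sqrt(p_hat * q_hat / n)
low, high = p_hat - half_width, p_hat + half_width

print(large_enough, round(low, 3), round(high, 3))
```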
Thinking Challenge
You’re a production
manager for a newspaper.
You want to find the %
defective. Of 200
newspapers, 35 had
defects. What is the 90%
confidence interval estimate
of the population
proportion defective?
Problem
Adjusted (1 – α)100% Confidence Interval for a Population Proportion, p

$$\tilde{p} \pm z_{\alpha/2}\sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$

where $\tilde{p} = \frac{x + 2}{n + 4}$ is the adjusted sample proportion of observations with the characteristic of interest, x is the number of successes in the sample, and n is the sample size.
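
To show how the adjusted interval behaves, here is a minimal sketch. The data (x = 3 successes in n = 200 trials) are hypothetical and chosen only because so few successes would fail the usual large-sample condition; the function name `adjusted_interval` is likewise an illustrative choice.

```python
import math
from statistics import NormalDist

def adjusted_interval(x, n, confidence):
    """Adjusted interval: p_tilde +/- z * sqrt(p_tilde * (1 - p_tilde) / (n + 4))."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_tilde = (x + 2) / (n + 4)          # adjusted sample proportion
    half_width = z * math.sqrt(p_tilde * (1.0 - p_tilde) / (n + 4))
    return p_tilde, half_width

# Hypothetical data: 3 successes out of 200, 95% confidence.
p_tilde, half_width = adjusted_interval(3, 200, 0.95)
low, high = p_tilde - half_width, p_tilde + half_width

print(round(p_tilde, 4), round(low, 4), round(high, 4))
```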
Determining the Sample Size
Sample size and C.I.
Sampling Error
In general, we express the reliability associated with a confidence interval for the population mean µ by specifying the sampling error within which we want to estimate µ with 100(1 – α)% confidence. The sampling error (denoted SE), then, is equal to the half-width of the confidence interval.
Sample Size Determination for 100(1 – α)% Confidence Interval for µ

In order to estimate µ with a sampling error (SE) and with 100(1 – α)% confidence, the required sample size is found as follows:

$$z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = SE$$

The solution for n is given by the equation

$$n = \frac{(z_{\alpha/2})^{2}\,\sigma^{2}}{(SE)^{2}}$$
Sample Size Example
What sample size is needed to be 90% confident the mean is within ± 5? A pilot study suggested that the standard deviation is 45.

$$n = \frac{(z_{\alpha/2})^{2}\,\sigma^{2}}{(SE)^{2}} = \frac{(1.645)^{2}(45)^{2}}{(5)^{2}} = 219.2 \approx 220$$
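
The sample-size formula for a mean is easy to wrap in a helper that always rounds up. This is a sketch with a hypothetical function name; the z-value 1.645 is hardcoded to match the example.

```python
import math

def sample_size_for_mean(z, sigma, se):
    """Smallest n with z * sigma / sqrt(n) <= se, i.e. n = (z * sigma / se)^2 rounded up."""
    return math.ceil((z * sigma / se) ** 2)

raw = (1.645 * 45 / 5) ** 2                  # unrounded solution for n
n = sample_size_for_mean(1.645, 45, 5)       # rounded up to a whole observation

print(round(raw, 1), n)
```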
Sample Size Determination for 100(1 – α)% Confidence Interval for p

In order to estimate p with a sampling error SE and with 100(1 – α)% confidence, the required sample size is found by solving the following equation for n:

$$z_{\alpha/2}\sqrt{\frac{pq}{n}} = SE$$

The solution for n can be written as follows:

$$n = \frac{(z_{\alpha/2})^{2}\,(pq)}{(SE)^{2}}$$

Note: Always round n up to the nearest integer value.
Sample Size Example
What sample size is needed to estimate p within .03 with 90% confidence?

$$SE = \frac{\text{width}}{2} = \frac{.03}{2} = .015$$

$$n = \frac{(z_{\alpha/2})^{2}\,(pq)}{(SE)^{2}} = \frac{(1.645)^{2}(.5)(.5)}{(.015)^{2}} = 3006.69 \approx 3007$$
Thinking Challenge
You work in Human Resources at Merrill Lynch.
You plan to survey employees to find their
average medical expenses. You want to be
95% confident that the sample mean is within ±
$50.
A pilot study showed that σ was about $400.
What sample size do you use?
Confidence Interval for a
Population Variance
Conditions Required for a Valid Confidence Interval for σ²

1. A random sample is selected from the target


population.
2. The population of interest has a relative
frequency distribution that is approximately
normal.
Thinking Challenge
You’re a marketing manager for a 5K race. You
take a random sample of the times of 292 runners
from the last race, with mean of 28.5 minutes and
standard deviation of 8.3 minutes. What is the
95% confidence interval estimate of the
population variance?
