Chapter 2 – Basic Statistical Methods: Describing Sample Data

Chapter 2 – Basic Statistical Methods

• Describing sample data


– Random samples
– Sample mean, variance, standard deviation
– Populations versus samples
– Population mean, variance, standard deviation
– Estimating parameters
• Simple comparative experiments
– The hypothesis testing framework
– The two-sample t-test
– Checking assumptions, validity

Portland Cement Formulation (page 24)

Basic Statistical Concepts
• Run: an individual observation produced by carrying out the experiment.
• Noise: the variation observed among individual runs. It is usually called experimental error or just error.
• Such error is considered a statistical error (i.e., it arises from variation that is uncontrolled and generally unavoidable).
• The presence of error or noise implies that the response variable is a random variable.
• Random variables are either discrete or continuous.
• Discrete random variables: the set of all possible values is finite or countably infinite.
• Continuous random variables: the possible values form an interval (a measurable, uncountable set).
• Graphical descriptions of variability: dot diagram, histogram, box plot (see the sketch after this list).
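A minimal sketch of these three graphical summaries using numpy/matplotlib; the two samples below are made-up placeholders, not the Portland cement data from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sample1 = rng.normal(loc=16.8, scale=0.30, size=10)   # hypothetical "formulation 1"
sample2 = rng.normal(loc=17.0, scale=0.25, size=10)   # hypothetical "formulation 2"

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Dot diagram: every observation plotted along a common measurement axis
axes[0].plot(sample1, np.zeros_like(sample1), "o")
axes[0].plot(sample2, np.full_like(sample2, 0.1), "s")
axes[0].set_title("Dot diagram")

# Histogram: frequency of observations per bin (most useful for large samples)
axes[1].hist(np.concatenate([sample1, sample2]), bins=6)
axes[1].set_title("Histogram")

# Box plot: median, quartiles, and extremes of each sample side by side
axes[2].boxplot([sample1, sample2])
axes[2].set_title("Box plot")

plt.tight_layout()
plt.show()
```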

Basic Statistical Concepts (Cont.)
• Probability distribution: the probability structure of a random variable.
– Probability mass function for discrete random variables (i.e., P(y = yj) = p(yj))
– Probability density function for continuous random variables (probabilities are areas under the density, e.g., P(a ≤ y ≤ b))
• Central Tendency Measures: mean, expected value, median, mode
• Spread Measures: range, variance, standard deviation.
• Sampling & sampling distribution (p. 28)
– Random samples (equal probability of being chosen)
– Sample mean
– Sample variance
– Sample standard deviation
• Properties of the sample mean & variance (point estimators):
– Should be unbiased
– Should have minimum variance
• Degrees of freedom
• Normal, t, chi-square, & F distributions
Graphical View of the Data
Dot Diagram, Fig. 2.1, p. 24

If you have a large sample, a histogram may be useful

Box Plots, Fig. 2.3, p. 26

The Hypothesis Testing Framework

• Statistical hypothesis testing is a useful framework for many experimental situations
• Origins of the methodology date from the early 1900s
• We will use a procedure known as the two-sample t-test

The Hypothesis Testing Framework

• Sampling from a normal distribution

• Statistical hypotheses:

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Estimation of Parameters
$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ estimates the population mean $\mu$

$S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ estimates the population variance $\sigma^2$
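A minimal sketch of these two estimators in plain Python, assuming a small hypothetical sample (the numbers are placeholders, not the textbook data).

```python
y = [16.9, 16.4, 17.2, 16.3, 16.5]   # hypothetical sample of n = 5 runs
n = len(y)

y_bar = sum(y) / n                                  # sample mean: estimates mu
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)   # sample variance: estimates sigma^2
s = s2 ** 0.5                                       # sample standard deviation

print(f"mean = {y_bar:.3f}, variance = {s2:.4f}, std dev = {s:.3f}")
```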

Summary Statistics (pg. 36)
Formulation 1 ("New recipe"):      $\bar{y}_1 = 16.76$,  $S_1^2 = 0.100$,  $S_1 = 0.316$,  $n_1 = 10$
Formulation 2 ("Original recipe"): $\bar{y}_2 = 17.04$,  $S_2^2 = 0.061$,  $S_2 = 0.248$,  $n_2 = 10$

How the Two-Sample t-Test Works:
Use the sample means to draw inferences about the population means.

Difference in sample means: $\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$

The variance of a single sample mean is $\sigma_{\bar{y}}^2 = \dfrac{\sigma^2}{n}$, so the variance of the difference in sample means is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.

This suggests a statistic:

$Z_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
How the Two-Sample t-Test Works:
Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$.

The previous ratio becomes

$\dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$

However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

Pool the individual sample variances:

$S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
How the Two-Sample t-Test Works:
The test statistic is

$t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
• Values of t0 that are near zero are consistent with the null hypothesis
• Values of t0 that are very different from zero are consistent with the alternative hypothesis
• t0 is a "distance" measure: how far apart the averages are, expressed in standard-deviation units
• Notice the interpretation of t0 as a signal-to-noise ratio
The Two-Sample (Pooled) t-Test
$S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \dfrac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$

$S_p = 0.284$

$t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{16.76 - 17.04}{0.284\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -2.20$

The two sample means are a little over two standard deviations apart
Is this a "large" difference?
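The same calculation can be reproduced directly from the summary statistics quoted earlier. A minimal Python sketch; the SciPy cross-check at the end assumes SciPy is installed.

```python
import math

ybar1, s1_sq, n1 = 16.76, 0.100, 10   # Formulation 1 ("new recipe")
ybar2, s2_sq, n2 = 17.04, 0.061, 10   # Formulation 2 ("original recipe")

# Pooled variance: a weighted average of the two sample variances
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
sp = math.sqrt(sp_sq)

# Test statistic: difference in means, in units of its estimated standard error
t0 = (ybar1 - ybar2) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(f"Sp^2 = {sp_sq:.3f}, Sp = {sp:.3f}, t0 = {t0:.2f}")   # ~0.081, ~0.284, ~-2.20

# Optional cross-check with SciPy (assumes SciPy is available)
from scipy import stats
res = stats.ttest_ind_from_stats(ybar1, math.sqrt(s1_sq), n1,
                                 ybar2, math.sqrt(s2_sq), n2, equal_var=True)
print(res.statistic, res.pvalue)
```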

The Two-Sample (Pooled) t-Test
t0 = -2.20

• So far, we haven't really done any "statistics"
• We need an objective basis for deciding how large the test statistic t0 really is
• In 1908, W. S. Gosset derived the reference distribution for t0 … called the t distribution
• Tables of the t distribution – see textbook appendix
The Two-Sample (Pooled) t-Test
t0 = -2.20

• A value of t0 between –2.101 and 2.101 is consistent with equality of means
• It is possible for the means to be equal and t0 to exceed either 2.101 or –2.101, but it would be a "rare event" … this leads to the conclusion that the means are different
• Could also use the P-value approach
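The ±2.101 bounds are the two-sided 5% critical values of the t distribution with n1 + n2 − 2 = 18 degrees of freedom. A quick check, assuming SciPy is available:

```python
from scipy import stats

alpha, df = 0.05, 18
t_crit = stats.t.ppf(1 - alpha / 2, df)      # upper critical value of the t distribution
print(f"critical values: +/-{t_crit:.3f}")   # ~2.101
print(abs(-2.20) > t_crit)                   # True -> t0 falls outside the interval
```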
The Two-Sample (Pooled) t-Test

t0 = -2.20

• The P-value is the area (probability) in the tails of the t distribution beyond −2.20 plus the probability beyond +2.20 (it's a two-sided test)
• The P-value is a measure of how unusual the value of the test statistic is, given that the null hypothesis is true
• The P-value is the risk of wrongly rejecting the null hypothesis of equal means (it measures the rareness of the event)
• The P-value in our problem is P = 0.042
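A sketch of the two-sided P-value calculation for t0 = −2.20 with 18 degrees of freedom, assuming SciPy is available (the small difference from 0.042 comes from rounding t0).

```python
from scipy import stats

t0, df = -2.20, 18
p_value = 2 * stats.t.sf(abs(t0), df)   # sf = upper-tail probability (1 - cdf)
print(f"P-value = {p_value:.3f}")       # ~0.041
```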

Computer Two-Sample t-Test Results

Checking Assumptions – The Normal Probability Plot

• Both samples are drawn from separate, normally distributed populations
• Equal variances
• Observations are independent random variables
• If the assumptions are violated: consider a transformation of the data or a nonparametric test of hypothesis (TOH)
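A minimal sketch of a normal probability plot for one sample, assuming SciPy and matplotlib are available; the observations are hypothetical placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=17.0, scale=0.3, size=10)   # hypothetical observations

# Points falling close to a straight line support the normality assumption
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal probability plot")
plt.show()
```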
Importance of the t-Test
• Provides an objective framework for simple comparative experiments
• Could be used to test all relevant hypotheses in a two-level factorial design, because all of these hypotheses involve the mean response at one "side" of the cube versus the mean response at the opposite "side" of the cube
Confidence Intervals (See pg. 44)
• Hypothesis testing gives an objective statement concerning the difference in means, but it doesn't specify "how different" they are
• General form of a confidence interval: $L \le \theta \le U$, where $P(L \le \theta \le U) = 1 - \alpha$
• The 100(1 − α)% confidence interval on the difference in two means:

$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}$
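A minimal sketch of this interval for the example above, reusing the pooled summary statistics (assumes SciPy for the t quantile).

```python
import math
from scipy import stats

ybar1, ybar2 = 16.76, 17.04
sp, n1, n2 = 0.284, 10, 10
df = n1 + n2 - 2

t_crit = stats.t.ppf(0.975, df)                       # alpha = 0.05, two-sided
half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
diff = ybar1 - ybar2

print(f"95% CI on mu1 - mu2: ({diff - half_width:.3f}, {diff + half_width:.3f})")
```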

Other Chapter Topics
• Hypothesis testing when the variances are known
• One sample inference
• Hypothesis tests on variances
• Paired experiments

