Chapter 2 - Basic Statistical Methods: - Describing Sample Data

Chapter 2 – Basic Statistical Methods

• Describing sample data

– Random samples
– Sample mean, variance, standard deviation
– Populations versus samples
– Population mean, variance, standard deviation
– Estimating parameters
• Simple comparative experiments
– The hypothesis testing framework
– The two-sample t-test
– Checking assumptions, validity

Chapter 2 Design & Analysis of Experiments

Portland Cement Formulation (page 24)

Basic Statistical Concepts
• Run: each observation of the conducted experiment.
• Noise: the variation observed among individual runs. It is usually
called experimental error or just error.
• Such error is considered a statistical error (i.e. it arises from variation
that is uncontrolled and generally unavoidable.)
• The presence of error or noise implies that the response variable is a
random variable.
• Random variables are either discrete or continuous.
• Discrete random variables: the set of all possible values is finite or
countably infinite.
• Continuous random variables: the set of all possible values is an
interval.
• Graphical description of variability: Dot Diagram, Histogram, Box Plot

Basic Statistical Concepts (Cont.)
• Probability distribution: the probability structure of a random variable
– Probability mass function for discrete random variables (i.e. P(y= yj) = p(yj))
– Probability density function for continuous random variables (i.e. P (a ≤ y ≤ b))
• Central Tendency Measures: mean, expected value, median, mode
• Spread Measures: range, variance, standard deviation.
• Sampling & sampling distribution (p. 28)
– Random samples (equal probability of being chosen)
– Sample mean
– Sample variance
– Sample standard deviation
• Properties of the sample mean & variance (point estimators):
– Should be unbiased
– Should have minimum variance
• Degrees of freedom
• Normal, t, chi square, & F distributions
Graphical View of the Data
Dot Diagram, Fig. 2.1, p. 24

If you have a large sample, a histogram may be useful

Box Plots, Fig. 2.3, p. 26

The Hypothesis Testing Framework

• Statistical hypothesis testing is a useful framework for many
experimental situations
• Origins of the methodology date from the early 1900s
• We will use a procedure known as the two-sample t-test

The Hypothesis Testing Framework

• Sampling from a normal distribution

• Statistical hypotheses:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
Estimation of Parameters
$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ estimates the population mean $\mu$
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ estimates the variance $\sigma^2$
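The two estimators above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `sample_summary` and the observations are hypothetical, chosen purely for illustration and not taken from the cement experiment.

```python
import math
import statistics


def sample_summary(observations):
    """Return (mean, variance, standard deviation) of a sample.

    The variance uses the n - 1 divisor, matching the estimator S^2.
    """
    n = len(observations)
    mean = sum(observations) / n
    variance = sum((y - mean) ** 2 for y in observations) / (n - 1)
    return mean, variance, math.sqrt(variance)


# Hypothetical observations, made up for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, std_dev = sample_summary(sample)

# The standard library estimators use the same n - 1 divisor.
assert math.isclose(mean, statistics.mean(sample))
assert math.isclose(variance, statistics.variance(sample))
print(mean, variance, std_dev)
```

Dividing by n - 1 rather than n is what makes S^2 an unbiased estimator of the population variance.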

Summary Statistics (pg. 36)
Formulation 1                   Formulation 2
“New recipe”                    “Original recipe”
$\bar{y}_1 = 16.76$             $\bar{y}_2 = 17.04$
$S_1^2 = 0.100$                 $S_2^2 = 0.061$
$S_1 = 0.316$                   $S_2 = 0.248$
$n_1 = 10$                      $n_2 = 10$

How the Two-Sample t-Test Works:
Use the sample means to draw inferences about the population means
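Difference in sample means:
$\bar{y}_1 - \bar{y}_2 = 16.76 - 17.04 = -0.28$
Standard deviation of the difference in sample means:
$\sigma_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
This suggests a statistic:
$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$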
How the Two-Sample t-Test Works:
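Use $S_1^2$ and $S_2^2$ to estimate $\sigma_1^2$ and $\sigma_2^2$
The previous ratio becomes
$\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
However, we have the case where $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Pool the individual sample variances:
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$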
How the Two-Sample t-Test Works:
The test statistic is
$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
• Values of t0 that are near zero are consistent with the null hypothesis
• Values of t0 that are very different from zero are consistent
with the alternative hypothesis
• t0 is a “distance” measure: how far apart the averages are,
expressed in standard deviation units
• Notice the interpretation of t0 as a signal-to-noise ratio
The Two-Sample (Pooled) t-Test
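$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(0.100) + 9(0.061)}{10 + 10 - 2} = 0.081$

$S_p = 0.284$

$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{16.76 - 17.04}{0.284\sqrt{\frac{1}{10} + \frac{1}{10}}} = -2.20$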

The two sample means are a little over two standard deviations apart
Is this a "large" difference?
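The pooled calculation can be reproduced from the summary statistics alone. This is a minimal sketch, not a reference implementation; the function name `pooled_t_statistic` is hypothetical, and the inputs are the rounded values from the summary-statistics slide. Because the slide rounds Sp to 0.284 before dividing, the last digit of t0 computed here may differ slightly from -2.20.

```python
import math


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Return (Sp^2, Sp, t0) for the pooled two-sample t-test."""
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    sp = math.sqrt(sp2)
    # Difference in means divided by its estimated standard deviation.
    t0 = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return sp2, sp, t0


# Summary statistics as reported on the slide (already rounded).
sp2, sp, t0 = pooled_t_statistic(16.76, 0.100, 10, 17.04, 0.061, 10)
print(f"Sp^2 = {sp2:.4f}, Sp = {sp:.4f}, t0 = {t0:.3f}")
```

The numerator is the "signal" (difference in averages) and the denominator is the "noise" (its standard error), which is why t0 reads as a signal-to-noise ratio.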

The Two-Sample (Pooled) t-Test
• So far, we haven’t really done any “statistics”
• We need an objective basis for deciding how large the test
statistic t0 really is
• In 1908, W. S. Gosset derived the reference distribution for t0,
called the t distribution
• Tables of the t distribution: see textbook appendix

t0 = -2.20
The Two-Sample (Pooled) t-Test
• A value of t0 between –2.101 and 2.101 is consistent with
equality of means
• It is possible for the means to be equal and t0 to fall outside
the range from –2.101 to 2.101, but it would be a “rare event”,
which leads to the conclusion that the means are different
• Could also use the P-value approach

t0 = -2.20
The Two-Sample (Pooled) t-Test

t0 = -2.20

• The P-value is the area (probability) in the tails of the t-distribution beyond -2.20 plus the
probability beyond +2.20 (it’s a two-sided test)
• The P-value is a measure of how unusual the value of the test statistic is given that the null
hypothesis is true
• The P-value is the risk of wrongly rejecting the null hypothesis of equal means (it measures
the rareness of the event)
• The P-value in our problem is P = 0.042
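The two-tail area can be approximated without statistical tables. The sketch below integrates the t density numerically with Simpson's rule, using only the standard library; the function names `t_pdf` and `two_sided_p_value` are hypothetical, and the 18 degrees of freedom are an inference from n1 + n2 - 2 = 10 + 10 - 2. The result should land close to the reported P = 0.042, with small differences attributable to rounding of t0.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t0, df, steps=2000):
    """Area beyond -|t0| plus area beyond +|t0| (steps must be even)."""
    upper = abs(t0)
    if upper == 0:
        return 1.0
    h = upper / steps
    # Simpson's rule for the area between 0 and |t0|.
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The density is symmetric, so the central area is twice that.
    return 1.0 - 2.0 * central_half


t0 = -2.20
df = 18  # assumed: n1 + n2 - 2
p_value = two_sided_p_value(t0, df)
reject_at_5_percent = abs(t0) > 2.101  # critical value quoted on the slide
print(f"P-value = {p_value:.4f}, reject H0 at 5%: {reject_at_5_percent}")
```

The critical-value rule and the P-value rule agree by construction: |t0| exceeds 2.101 exactly when the two-sided P-value falls below 0.05.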

Computer Two-Sample t-Test Results


7E 2009 Montgomery
Checking Assumptions –
The Normal Probability Plot

- Both samples are drawn from individual normally distributed
populations
- Equal variance
- Observations are independent random variables
Transformation &
Nonparametric TOH
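A normal probability plot pairs each ordered observation with the standard normal quantile expected at its rank; roughly linear points support the normality assumption. The sketch below computes the plot coordinates only. The function name `normal_plot_points` and the sample values are hypothetical, and the plotting position (j - 0.5)/n is one common convention, assumed here rather than taken from the slide.

```python
from statistics import NormalDist


def normal_plot_points(observations):
    """Return (normal score, ordered observation) pairs for plotting."""
    ordered = sorted(observations)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((j - 0.5) / n), y)
        for j, y in enumerate(ordered, start=1)
    ]


# Hypothetical observations, made up for illustration only.
sample = [10.2, 9.8, 10.5, 10.1, 9.6, 10.0, 10.3, 9.9]
points = normal_plot_points(sample)
for score, value in points:
    print(f"{score:7.3f}  {value:6.2f}")
```

Plotting the second coordinate against the first for each formulation, and comparing the slopes of the two lines, gives an informal check on the equal-variance assumption as well.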
Importance of the t-Test
• Provides an objective framework for simple
comparative experiments
• Could be used to test all relevant hypotheses
in a two-level factorial design, because all
of these hypotheses involve the mean
response at one “side” of the cube versus
the mean response at the opposite “side” of
the cube
Confidence Intervals (See pg. 44)
• Hypothesis testing gives an objective statement
concerning the difference in means, but it doesn’t
specify “how different” they are
• General form of a confidence interval
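$L \le \theta \le U$ where $P(L \le \theta \le U) = 1 - \alpha$
• The 100(1 − α)% confidence interval on the
difference in two means:
$\bar{y}_1 - \bar{y}_2 - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$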
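Applying the interval formula to the cement example shows how it complements the test. This is a minimal sketch; the function name `pooled_confidence_interval` is hypothetical, and it assumes a 95% interval (α = 0.05), using the critical value 2.101 and Sp = 0.284 quoted on the earlier slides.

```python
import math


def pooled_confidence_interval(mean1, mean2, sp, n1, n2, t_crit):
    """Return (lower, upper) bounds on mu1 - mu2 using the pooled Sp."""
    diff = mean1 - mean2
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width


# Values from the slides; t_crit = 2.101 is assumed to be t(0.025, 18).
lower, upper = pooled_confidence_interval(16.76, 17.04, 0.284, 10, 10, 2.101)
print(f"{lower:.3f} <= mu1 - mu2 <= {upper:.3f}")
```

An interval that excludes zero corresponds to rejecting equality of means at the same α, and its endpoints indicate how different the means plausibly are.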

Other Chapter Topics
• Hypothesis testing when the variances are
• One sample inference
• Hypothesis tests on variances
• Paired experiments

