Module - 4 - Analyze Phase - Oct 20

Analyze Phase
Module 4
Agenda
– Overview of Analyze Phase
– Hypothesis Testing
– Confidence Intervals
– Sample Size
– Analysis of Variance (ANOVA)
– Chi Square
– Multi-variate Studies
– Correlation and Regression
Saudi Aramco: Company General Use
Overview of Analyze Phase
In the Analyze phase, we sift through the various
x’s to focus on the critical x’s
Identify problem’s root causes through process & data analysis
Define phase: symptoms “Y”; quantify output
Measure phase: fishbones, C&E matrix, FMEA, process maps, etc.; 30 - 50 inputs; prioritize causes
Analyze phase: Paretos, multi-vari, 1-t, 2-t, paired t, ANOVA, Chi-Sq, test for equal variances, proportions tests; 10 - 15 Xs; validate causes
8 - 10 PIVs
3 - 6 Key PIVs
Optimized Process
Analyze Topics
❑ The individual components of the Analyze phase aren’t intuitively related, but if their
relationships are understood early, the relevant pieces can be assimilated more easily
❑ Several common concepts are repeated throughout this phase; if they are
understood up front, the follow-on lessons are easier
▪ Hypothesis Testing
▪ Confidence Intervals
▪ Analysis of Variance (ANOVA)
▪ Chi Square
▪ Correlation and Regression

Hypothesis Testing
What is Hypothesis Testing (1/2)
❑ The probability of occurrence is based on a pre-determined statistical confidence
❑ Decisions are based on:

▪ Beliefs (past experience)
▪ Preferences (current needs)
▪ Evidence (statistical data)
▪ Risk (acceptable level of failure)
❑ A hypothesis is just a statement that we want to test:

▪ The average time to process a purchase order is 12 days
▪ The reject rate from Machine 5 has improved as a result of our work
▪ There is no relationship between humidity and our Cost of Poor Quality
▪ My average Chess Ranking for 2022 is lower than my average Chess Ranking for 2023
What is Hypothesis Testing (2/2)
In statistics, we usually form at least two hypotheses:
❑ The “null hypothesis” H0 assumes no significant difference/relationship
▪ This is the default assumption of all statistical tests
❑ The “alternative hypothesis” Ha assumes there is significant difference/relationship
❑ To do any hypothesis testing, we need to

▪ First obtain a sample (or samples) and measure the relevant variable of interest
▪ Then compare the measured sample value to the hypothesised value of the population
and decide whether or not the sample supports the hypothesis
❑ The key question becomes how can we reliably use the values from a single sample
to make conclusions about a population value?

The cookie owner’s claim (Conjecture)
❑ A cookie shop sells its famous product: gingerbread cookies! The owner
believes his product is the most delicious in the world
❑ The owner also says that the average weight (μ) of each product (a bag of
gingerbread cookies) is 500g
❑ Assume we know that the weight of a bag of cookies is normally distributed with a
standard deviation (σ) of 30g
❑ If the owner’s claim is true (the average weight of one bag of cookies = 500g), we
would expect the distribution of the weight of one bag of cookies to look like the next figure
Can We Believe in the Owner’s Words?
❑ Does the average weight of a bag of cookies really equal 500g? What if the owner
deceives customers and gives less than 500g of cookies? How do we validate his words?
❑ This is where “hypothesis testing” comes in
❑ To implement hypothesis testing, let’s first set up our null hypothesis (H0) and
alternative hypothesis (H1)
❑ As industrial engineers, we are taught not to suspect others without evidence
❑ So, we assume the owner is honest about his business (H0). If we want to check
whether his bags weigh less than 500g, we need to collect data and have enough
evidence to support our guess (H1). So we set up the hypothesis statement as
follows:
H0: Average weight of one bag of cookies (μ) = 500g
H1: Average weight of one bag of cookies (μ) < 500g

Can We Believe in the Owner’s Words?
❑ Since we are unsure what our population distribution looks like, the dashed line
represents possible distributions. If the owner’s claim is true, we would expect the
weight of one bag of cookies to have a distribution with a mean of 500g
(left picture)
❑ However, if the owner’s claim is not true and the mean weight of the cookies is less
than 500g, the population distribution should look different (any of the right
pictures)
How to test the owner’s statement ?
❑ The problem statement is set
H0: Average weight of one bag of cookies (μ) = 500g

H1: Average weight of one bag of cookies (μ) < 500g
❑ So now, the next question is: how to test our hypothesis statement?
❑ Maybe just weigh all bags of cookies so that we could know the exact population
distribution?
❑ Well…obviously, it is IMPOSSIBLE for us to collect ALL the cookies (population) produced

from the Owner’s cookie shop!!! So…what should we do?
❑ Here we need to use what we learned in Inferential statistics

From Inferential statistics
❑ From inferential statistics, it is almost impossible to collect all the data of the whole
population to calculate the parameters (population mean μ, population standard deviation
σ, etc.), and that’s why we use samples and statistics (sample mean x̄, sample standard
deviation s, etc.) as estimators to help us infer the unknown population parameters

Hypothesis testing the process
❑ In hypothesis testing, we are not interested in a single unknown parameter; instead,
we are interested in “whether we can reject the null hypothesis”
❑ To answer this question, we follow the same method: we calculate a statistic from our
sample data to infer the answer. The statistic used here is called the Test Statistic

Sampling Distribution
❑ Before we collect sample data and calculate the test statistic to test the hypothesis
statement, we need to understand the concept of sampling distribution
❑ Sampling Distribution is the distribution of the sample statistic
❑ Let’s use sample mean (x̄) as an example

▪ If we sample from the population many times, we get many sample datasets (sample
1 to sample m). Then, if we calculate the sample mean (x̄) from each sample dataset, we
get m data points of the sample mean (x̄)
▪ Using these data points, we can draw a distribution of the sample mean (x̄). Since this
distribution comes from a sample statistic, we call it the Sampling Distribution of the
sample mean (x̄)
❑ The same idea applies to other statistics. For example, if we calculate the test statistic
from each sample dataset, we could get the sampling distribution of the test statistic.
Sampling Distribution
❑ A sampling distribution is similar to all the other distributions
❑ It shows how likely (probability) each value of the statistic is to appear if we sample from
the population many times
We will use the brown color to represent the sampling distribution curve in the following
sections

Testing Hypothesis Statements
❑ The first thing we need to do is to have a sample dataset
❑ So, we go to the cookie shop and randomly pick 25 bags of cookies (n) as our sample
data, and we calculate that the mean weight (x̄) of this sample is 485g
❑ The first part of testing is to compare our sample statistic to the null hypothesis so that
we know how far our sample statistic is from the expected value
❑ To do so, we first assume the null hypothesis is true
❑ What does this mean? In our case, it means we assume the population mean weight
of one bag of cookies really equals 500g

❑ If the statement is true, then according to the Central Limit Theorem, the sampling
distribution of the sample mean (x̄) would look like the picture below (mean value of the sample
mean = 500g) if we sampled from this population many times
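The shape of this sampling distribution can be checked with a small simulation. The sketch below is only an illustration, assuming the null hypothesis is true (μ = 500g, σ = 30g, n = 25); the function name, the number of repetitions and the seed are arbitrary choices, not part of the course material.

```python
import random
import statistics

def simulate_sample_means(mu=500.0, sigma=30.0, n=25, m=5000, seed=1):
    """Draw m samples of size n from Normal(mu, sigma) and return the m sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(m)]

means = simulate_sample_means()
center = statistics.fmean(means)   # expected to be close to mu = 500
spread = statistics.stdev(means)   # expected to be close to sigma / sqrt(n) = 6
print(f"mean of sample means: {center:.1f} g, standard error: {spread:.2f} g")
```

The spread of the simulated sample means is much narrower than the 30g spread of individual bags, which is why a sample mean of 485g can be unusual even though a single 485g bag is not.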

❑ So now, if the null hypothesis is true, we can
easily see that our sample mean is 15g below
(485 − 500 = −15) the expected mean value (500g)
❑ Hmm… but “15g” is only a number, which by itself does not
tell us how unusual the result is
❑ Also, if we want to calculate the probability under
the curve, it is inefficient to calculate it case by case
❑ Imagine there are numerous distributions, each with its own mean and standard
deviation… you really don’t want to repeat the probability calculation for every one of them
❑ So, what should we do? We standardize our value so that the mean of the distribution
always equals zero
Z-score and Test Statistic
❑ The benefit of standardization is that statisticians have already generated a table that
gives the area under the curve for each standardized value
❑ So we don’t need to calculate the area case by case. All we need to do is
standardize our data
❑ How do we standardize? In our case, we use the z-score to transform our data, and the z-score
is the Test Statistic in our case

Z-score and Test Statistic
❑ The next picture shows the sampling distribution of the test statistic (z-score).
If our sample data exactly matched the null hypothesis (population mean = 500g,
sample mean = 500g), the test statistic would equal 0
❑ In our case, our sample mean equals 485g, which gives a test statistic
of −2.5. This indicates that our sample mean is 2.5 standard errors
below the expected value
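The value −2.5 comes from dividing the 15g gap by the standard error σ/√n. A minimal sketch using the numbers from the cookie example (the function name `z_score` is only illustrative):

```python
import math

def z_score(sample_mean, mu0, sigma, n):
    """One-sample z statistic: distance from mu0 in units of the standard error."""
    standard_error = sigma / math.sqrt(n)   # 30 / sqrt(25) = 6 g
    return (sample_mean - mu0) / standard_error

z = z_score(sample_mean=485, mu0=500, sigma=30, n=25)
print(z)  # -2.5
```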
Choice of Test Statistic
❑ The test statistic is chosen based on different cases.
❑ You might hear of different kinds of statistical tests, such as the z-test, t-test, chi-square test… Why
do we need different kinds of tests?
❑ Because we might need to test different types of data (categorical? quantitative?), we might
have different purposes (testing a mean? a proportion?), the data might
follow different distributions, or we might know only limited attributes of our data. Hence,
choosing a suitable test is another crucial task
❑ In this case, we are interested in testing the mean value, and we assume our
population data is normally distributed with a known population standard deviation (σ)
❑ Based on these conditions, we choose the z-test for this case

Measuring the probability of the sample data
❑ So, we know how far our test statistic is from the expected value when the null
hypothesis is true. What we really want to know is: how likely (probability) are we to get this
sample data if the null hypothesis is true?
❑ To answer this question, we need to calculate a probability. As you know, the probability
between one point and another is the area under the sampling distribution curve
between these two points
❑ So here, we do not calculate the probability of a specific point; instead, we calculate the
probability from our test statistic out to infinity, i.e. the cumulative probability of
all the points that are farther away from the expected test statistic than our test statistic
❑ This cumulative probability is our p-value

P-value
❑ The p-value is the probability of obtaining test results at least as extreme as the results
actually observed, under the assumption that the null hypothesis is correct
❑ Let’s calculate the p-value in our case
❑ We could just look up the z-table, or use any statistical software to get the
p-value
❑ In our case, the p-value equals 0.0062 (0.62%). Since our alternative hypothesis
(H1) is set up as “mean value less than 500g”, we only care about values
less than our test statistic (left-hand side)
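As one alternative to the z-table, the left-tail area can be computed with Python's standard library. This is a minimal sketch, not the tool used in the course; it assumes the test statistic z = −2.5 from the cookie example.

```python
from statistics import NormalDist

z = -2.5                        # test statistic from the cookie example
p_value = NormalDist().cdf(z)   # left-tail area, because H1 is "mean < 500g"
print(round(p_value, 4))        # 0.0062
```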
P-value
❑ Now, we have p-value = 0.0062. It is a small number… but what does this mean?
❑ It means that, if our null hypothesis is true (the population mean really
equals 500g) and we sampled from this population distribution 1000 times, we would expect only about
6.2 of those samples to have a sample mean of 485g or less

P-value
❑ If we get sample data with a sample mean equal to 485g, there are two
possible explanations:
1. The population mean really equals 500g (H0 is correct), and we got “lucky” to draw this
rare sample (about 6.2 times out of 1000 samplings)
2. The assumption that the null hypothesis is true is incorrect. This sample (sample
mean of 485g) actually comes from another population distribution, where a sample
mean of 485g is more likely to happen
P-value
❑ So now we know that if our p-value is very small, either we drew a very rare
sample or our assumption (the null hypothesis is true) is incorrect
❑ Then the next question is: we have the p-value now, but how do we use it to judge
when to reject the null hypothesis? In other words, how small must the p-value be before we are
willing to say that this sample comes from another population?
❑ Here, let’s introduce the judgment standard: the significance level (α). The significance
level is a pre-defined value that must be set before carrying out the hypothesis
test. It is just a threshold, which gives us a criterion for when to
reject the null hypothesis
❑ This criterion is set as below:
▪ if p-value ≤ significance level (α), we reject the null hypothesis (H0)
▪ if p-value > significance level (α), we fail to reject the null hypothesis (H0)
Significance Level
❑ In the picture below, the red area is the significance level (in our case, it
equals 0.05). We use the significance level as our criterion: if the p-value is less
than or equal to the red area, we reject H0; if the p-value is greater than the
red area, we fail to reject H0
❑ The significance level (α) also indicates the maximum risk we are willing to accept
for a type I error (a type I error means we
reject H0 when H0 is actually true)
❑ In our case, we have p-value = 0.0062, which is smaller than 0.05; as a result,
we reject our null hypothesis
What if we change the significance level?
❑ The result can be different. For example, with α = 0.005, since 0.0062 > 0.005, we fail to reject H0. So here is
the tricky part: since the significance level is subjective, we need to determine it
before the testing. Otherwise, we are very likely to fool ourselves after seeing
the p-value
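The decision rule is simple enough to write down directly. The sketch below applies it to the cookie example's p-value at the two thresholds discussed (0.05 and 0.005); the helper name `decide` is hypothetical.

```python
def decide(p_value, alpha):
    """Apply the rule: reject H0 when p-value <= alpha, otherwise fail to reject."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = 0.0062                    # from the cookie example
for alpha in (0.05, 0.005):
    print(alpha, decide(p_value, alpha))
# 0.05 reject H0
# 0.005 fail to reject H0
```

The same data lead to opposite decisions, which is why α has to be fixed before the p-value is known.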

Recap of what we covered so far
❑ Part 1: To test whether our sample data support the alternative hypothesis, we
first assume the null hypothesis is true, so that we can measure how far our sample
data are from the expected value given by the null hypothesis. The p-value is the
probability of obtaining test results at least as extreme as the results actually
observed, under the assumption that the null hypothesis is correct
❑ Part 2: Based on the distribution, data type, purpose, and known attributes of our data,
choose an appropriate test statistic and calculate it for our sample
data. (The test statistic shows how far our sample data are from the expected value)
❑ Part 3: Calculate the probability (area under the sampling distribution curve) from the
test statistic out to infinity (i.e. more extreme values) in the direction given by your
alternative hypothesis (left-tailed, right-tailed, two-tailed)

Recap of what we covered so far
What is the meaning of a small p-value?
❑ A very small p-value can have two possible meanings:
(1) We were so “lucky” that we drew a very rare sample!
(2) This sample is not from the null hypothesis distribution; instead, it is from
another population distribution (so we consider rejecting the null hypothesis)
❑ How to use the p-value?
To determine whether we can reject the null hypothesis, we compare the p-value to
the pre-defined significance level (threshold)
▪ If p-value ≤ significance level (α), we reject the null hypothesis (H0)
▪ If p-value > significance level (α), we fail to reject the null hypothesis (H0)

Hypothesis Testing Risk
❑ The alpha risk or Type I Error (generally called the “Producer’s Risk”) is the probability
that we could be wrong in saying that something is “different”
❑ It is an assessment of the likelihood that the observed difference could have occurred
by random chance. Alpha is the primary decision-making tool of most statistical tests
                                     Actual Conditions
Statistical Conclusions              Not Different (Ho is True)    Different (Ho is False)
Not Different (Fail to Reject Ho)    Correct Decision              Type II Error
Different (Reject Ho)                Type I Error                  Correct Decision

Alpha Risk
Alpha (α) risks are expressed relative to a reference distribution
Distributions include:
t-distribution
z-distribution
χ²-distribution
F-distribution
The α-level is represented by the clouded areas. Sample results in this area
lead to rejection of H0.
[Figure: reference distribution with a “Region of DOUBT” in each tail and a central region labeled “Accept as chance differences”]

Alpha Risk
❑ Alpha (α) is known as the significance level; the probability of being wrong (risk level)

The beta risk
❑ The beta risk or Type 2 Error (also called the “Consumer’s Risk”) is the probability that
we could be wrong in saying that two or more things are the same when, in fact, they
are different
                                     Actual Conditions
Statistical Conclusions              Not Different (Ho is True)    Different (Ho is False)
Not Different (Fail to Reject Ho)    Correct Decision              Type II Error
Different (Reject Ho)                Type I Error                  Correct Decision

The beta risk
❑ Beta Risk is the probability of failing to reject the null hypothesis when a difference
exists
[Figure: two overlapping distributions, one if H0 is true (centered at the H0 value) and one if Ha is true; the critical value of the test statistic separates the “Accept H0” and “Reject H0” regions]
α = Pr(Type I error), α = 0.05
β = Pr(Type II error)

The beta risk
❑ The beta risk is the probability that we could be wrong in saying that two or more
things are the same when, in fact, they are different

Steps to Statistical Hypothesis Test
Define the problem and state objectives
State a “Null Hypothesis” (Ho)
State the “Alternate Hypothesis” (Ha)
Establish significance level (α)
Collect sample data
Calculate test statistic and/or p-value
DECIDE:
What does the evidence suggest?
Reject Ho? or Fail to reject Ho?

What are the types of Hypothesis Testing
Hypothesis Testing and Tool Selection
              One sample                                         Two samples                                    Multiple samples                    Data type
Mean          1-Sample t                                         2-Sample t                                     ANOVA                               Continuous
Variance      χ²-test; MINITAB Descriptive Statistics (Use CI)   ANOVA; 2 Variances; Test For Equal Variance    ANOVA; Test For Equal Variances     Continuous
Proportion    1-Sample P Test                                    2-Sample P Test; χ²-test                       χ²-test                             Discrete

What are the types of Hypothesis Testing
Tool Used and Interpretation
Discrete data (Defect Counts)
  Chi Square; P Tests: differences in % defect
Continuous data (Cycle Time, $$)
  ANOVA; t-tests: differences in averages
  ANOVA; F-Tests: differences in variation
  Correlation & Regression: strength of relationship
Low p-value (p-value < 0.05): there is a difference (example: defect counts of 80 vs. 150 out of 585 and 535)
High p-value (p-value > 0.05): there is no difference (example: defect counts of 140 vs. 150 out of 585 and 535)
A low p-value identifies where the critical x’s are
Hypothesis Testing Roadmap
Normal
Two samples One sample
Test of Equal Variance 1 Sample Variance 1 Sample z-test/t-test
Variance Equal Variance Not Equal
Two samples One sample Two samples One sample
2 Sample T One Way ANOVA 2 Sample T One Way ANOVA

Non Normal
Test of Equal Variance Median Test
Mann-Whitney Several Median Tests

Attribute Data
One Factor Two Factors

One sample Two samples Two or More Samples
One Sample Two Sample Chi Square Test

Proportion Proportion (Contingency Table)
Chi Square Test

(Contingency Table)

Common Pitfalls to Avoid
❑ While using Hypothesis Testing the following facts should be kept in mind at the
conclusion stage:
• The decision is about Ho and NOT Ha.
• The conclusion statement is whether the contention of Ha was upheld
• The null hypothesis (Ho) is on trial
• When a decision has been made:
o Nothing has been proved
o It is just a decision
o All decisions can lead to errors (Types I and II)

Common Pitfalls to Avoid
❑ If the decision is to “Reject Ho,” then the conclusion should read: “There is
sufficient evidence at the α level of significance to show that [state the alternative
hypothesis Ha].”
❑ If the decision is to “Fail to Reject Ho,” then the conclusion should read: “There
isn’t sufficient evidence at the α level of significance to show that [state the
alternative hypothesis Ha].”

Hypothesis Testing (Normal data)
Hypothesis test for μ (σ known)
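Null hypothesis: H0: μ = μ0
Test statistic value: $$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Alternative Hypothesis        Rejection Region for Level α Test
Ha: μ > μ0                    z ≥ zα (upper-tailed)
Ha: μ < μ0                    z ≤ −zα (lower-tailed)
Ha: μ ≠ μ0                    either z ≥ zα/2 or z ≤ −zα/2 (two-tailed)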

Testing means of a large sample
When the sample size is large, the z tests for case I are easily modified to
yield valid test procedures without requiring either a normal population
distribution or known σ
A large n (> 30) implies that the standardized variable
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution.

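Testing means of a small sample coming
from a normal population
The One-Sample t Test
Null hypothesis: H0: μ = μ0
Test statistic value: $$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Normality must be assessed
Alternative Hypothesis        Rejection Region for a Level α Test
Ha: μ > μ0                    t ≥ tα,n−1 (upper-tailed)
Ha: μ < μ0                    t ≤ −tα,n−1 (lower-tailed)
Ha: μ ≠ μ0                    either t ≥ tα/2,n−1 or t ≤ −tα/2,n−1 (two-tailed)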

P-Values for z Tests
The calculation of the P-value depends on whether the test is upper-, lower-, or two-tailed.
Each of these is the probability of getting a value at least as extreme as what was obtained (assuming H0 true).
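The three cases can be sketched in a few lines of Python. This is an illustration under the usual definitions (upper tail: 1 − Φ(z); lower tail: Φ(z); two-tailed: 2[1 − Φ(|z|)]); the function name and the `tail` argument are hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(z, tail):
    """P-value of a z statistic for an 'upper', 'lower' or 'two' tailed test."""
    phi = NormalDist().cdf              # standard normal cumulative distribution
    if tail == "upper":
        return 1.0 - phi(z)
    if tail == "lower":
        return phi(z)
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

print(round(z_test_p_value(-2.5, "lower"), 4))  # 0.0062
print(round(z_test_p_value(-2.5, "two"), 4))    # 0.0124
```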

P-Values for z Tests

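Testing Proportion of a large sample
The estimator p̂ = X/n is unbiased (E(p̂) = p), has
approximately a normal distribution, and its standard
deviation is $$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$$
When H0 is true, E(p̂) = p0 and the standard deviation becomes √(p0(1 − p0)/n), so it
does not involve any unknown parameters. It then follows
that when n is large and H0 is true, the test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
has approximately a standard normal distribution.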

Proportions: Large-Sample Tests
Alternative Hypothesis        Rejection Region
Ha: p > p0                    z ≥ zα (upper-tailed)
Ha: p < p0                    z ≤ −zα (lower-tailed)
Ha: p ≠ p0                    either z ≥ zα/2 or z ≤ −zα/2 (two-tailed)
These test procedures are valid provided that np0 ≥ 10 and n(1 − p0) ≥ 10.
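A minimal sketch of the upper-tailed version of this test, using the standard large-sample statistic z = (p̂ − p0)/√(p0(1 − p0)/n). The data (28 defectives in 100 units against p0 = 0.20) are made up for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Upper-tailed large-sample z test of H0: p = p0 against Ha: p > p0."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("large-sample conditions np0 >= 10 and n(1 - p0) >= 10 not met")
    p_hat = x / n                                     # sample proportion
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # standardized distance from p0
    p_value = 1.0 - NormalDist().cdf(z)               # upper-tail area
    return z, p_value

z, p_value = one_proportion_z_test(x=28, n=100, p0=0.20)
print(round(z, 2), round(p_value, 4))   # 2.0 0.0228
```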

Two Samples Test, Known Variances
In general:
Null hypothesis: H0: μ1 − μ2 = Δ0
Test statistic value: $$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}$$
Alternative Hypothesis        Rejection Region for Level α Test
Ha: μ1 − μ2 > Δ0              z ≥ zα (upper-tailed)
Ha: μ1 − μ2 < Δ0              z ≤ −zα (lower-tailed)
Ha: μ1 − μ2 ≠ Δ0              either z ≥ zα/2 or z ≤ −zα/2 (two-tailed)

Large-Sample Tests
The assumptions of normal population distributions and known
values of σ1 and σ2 are fortunately unnecessary when both
sample sizes are sufficiently large
Furthermore, using S1² and S2² in place of σ1² and σ2² gives a
variable whose distribution is approximately standard normal:
$$Z = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$$
These tests are usually appropriate if both m > 30 and n > 30

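The Two-Sample t Test
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$$
has approximately a t distribution with df v estimated from the data by
$$v = \frac{\left(s_1^2/m + s_2^2/n\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$$
(round v down to the nearest integer)
Normality must be assessed
The two-sample t test for testing H0: μ1 − μ2 = Δ0 is as follows:
Test statistic value: $$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{s_1^2/m + s_2^2/n}}$$
Alternative Hypothesis        Rejection Region for Approximate Level α Test
Ha: μ1 − μ2 > Δ0              t ≥ tα,v (upper-tailed)
Ha: μ1 − μ2 < Δ0              t ≤ −tα,v (lower-tailed)
Ha: μ1 − μ2 ≠ Δ0              either t ≥ tα/2,v or t ≤ −tα/2,v (two-tailed)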
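The test statistic and the estimated degrees of freedom can be computed from raw data as sketched below. This assumes the unpooled (unequal-variance) form of the two-sample t test with df rounded down; the cycle-time values and the function name are made up for illustration, and the critical value would still come from a t table or statistical software.

```python
import math
import statistics

def two_sample_t(x, y, delta0=0.0):
    """Unpooled two-sample t statistic and its estimated df (rounded down)."""
    m, n = len(x), len(y)
    vx = statistics.variance(x) / m      # s1^2 / m
    vy = statistics.variance(y) / n      # s2^2 / n
    t = (statistics.fmean(x) - statistics.fmean(y) - delta0) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, math.floor(df)

# Hypothetical cycle times (days) from two processes
process_a = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3]
process_b = [12.9, 13.4, 12.7, 13.1, 13.8]
t_stat, df = two_sample_t(process_a, process_b)
print(f"t = {t_stat:.2f}, df = {df}")
```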

A Test for Proportion Differences
Theoretically, we know that
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{pq\left(\frac{1}{m} + \frac{1}{n}\right)}}$$
has approximately a standard normal distribution when H0
is true (p is the common value of p1 and p2, and q = 1 − p)
However, this Z cannot serve as a test statistic because
the value of p is unknown: H0 asserts only that there is a
common value of p, but does not say what that value is

A Large-Sample Test Procedure
Under the null hypothesis, we assume that p1 = p2 = p. So,
instead of separate samples of size m and n from two
different populations (two different binomial distributions),
we really have a single sample of size m + n from one
population with proportion p
The total number of individuals in this combined sample
having the characteristic of interest is X + Y
The estimator of p is then
$$\hat{p} = \frac{X + Y}{m + n}$$
Using p̂ and q̂ = 1 − p̂ in place of p and q in our old
equation gives a test statistic having approximately
a standard normal distribution when H0 is true
Null hypothesis: H0: p1 − p2 = 0
Test statistic value (large samples):
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$$
Alternative Hypothesis        Rejection Region for Approximate Level α Test
Ha: p1 − p2 > 0               z ≥ zα
Ha: p1 − p2 < 0               z ≤ −zα
Ha: p1 − p2 ≠ 0               either z ≥ zα/2 or z ≤ −zα/2
A P-value is calculated in the same way as for previous z tests.
The test can safely be used as long as m·p̂1, m·q̂1, n·p̂2, and n·q̂2 are all
at least 10
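The pooled procedure can be sketched as follows. The counts (80 defects in 585 units versus 150 in 535) are illustrative assumptions, the function name is hypothetical, and the two-tailed p-value is only one of the three possible alternatives.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x, m, y, n):
    """Two-tailed large-sample z test of H0: p1 - p2 = 0 using the pooled estimate."""
    p1, p2 = x / m, y / n
    p_pool = (x + y) / (m + n)       # estimator of the common proportion p
    q_pool = 1.0 - p_pool
    z = (p1 - p2) / math.sqrt(p_pool * q_pool * (1 / m + 1 / n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_proportion_z_test(x=80, m=585, y=150, n=535)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

With these counts the p-value is far below 0.05, so the difference in % defect would be judged significant.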

The F Test for Ratio of Variances
The F probability distribution has two parameters, denoted by v1 and

v2. The parameter v1 is called the numerator degrees of freedom, and
v2 is the denominator degrees of freedom
A random variable that has an F distribution cannot assume a

negative value. The density function is complicated and will not
be used explicitly, so it’s not shown
There is an important connection between an F variable and chi-squared
variables

The F Distribution
If X1 and X2 are independent chi-squared rv’s with v1 and
v2 df, respectively, then the rv
$$F = \frac{X_1/v_1}{X_2/v_2}$$
can be shown to have an F distribution.
Recall that a chi-squared distribution is obtained by summing squared
standard Normal variables (such as squared deviations, for example).
So a scaled ratio of two variances is a ratio of two scaled chi-squared
variables

The F Distribution
The figure below illustrates a typical F density function.

The F Distribution
We use Fα,v1,v2 for the value on the horizontal axis that
captures α of the area under the F density curve with v1
and v2 df in the upper tail
The density curve is not symmetric, so it would seem that both
upper- and lower-tail critical values must be tabulated. This is
not necessary, though, because of the fact that
$$F_{1-\alpha,\,v_1,\,v_2} = \frac{1}{F_{\alpha,\,v_2,\,v_1}}$$
For example, F.05,6,10 = 3.22 and F.95,10,6 = 0.31 = 1/3.22.

A test procedure for hypotheses concerning the ratio σ1²/σ2² is
based on the following result.
Theorem
Let X1,…, Xm be a random sample from a normal distribution
with variance σ1², let Y1,…, Yn be another random
sample (independent of the Xi’s) from a normal distribution
with variance σ2², and let S1² and S2² denote the
two sample variances. Then the rv
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
has an F distribution with v1 = m – 1 and v2 = n – 1.

This theorem results from the fact that the
variables (m − 1)S1²/σ1² and (n − 1)S2²/σ2² each have a
chi-squared distribution with m – 1 and n – 1 df,
respectively.
Because F involves a ratio rather than a difference, the test
statistic is the ratio of sample variances.
The claim that σ1² = σ2² is then rejected if the ratio differs by
too much from 1.

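Null hypothesis: H0: σ1² = σ2²
Test statistic value: $$f = \frac{s_1^2}{s_2^2}$$
Alternative Hypothesis        Rejection Region for a Level α Test
Ha: σ1² > σ2²                 f ≥ Fα,m−1,n−1
Ha: σ1² < σ2²                 f ≤ F1−α,m−1,n−1
Ha: σ1² ≠ σ2²                 either f ≥ Fα/2,m−1,n−1 or f ≤ F1−α/2,m−1,n−1
Testing the ratio of variances and testing the equality of variances are the same test: equal variances means their
ratio is close to one, or equivalently their difference is close to zero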
Bartlett's Test for Equality of Variances
❑ To check the equality of variances between two samples, or two groups of
data, we use the F-test
❑ When we want to test the equality of more than two variances, we
use Bartlett’s test
❑ Bartlett's test is used to test if k samples have equal variances. Equal variances
across samples is called homogeneity of variances.
❑ Some statistical tests, for example the analysis of variance (ANOVA), assume that
variances are equal across groups or samples

❑ Bartlett's test is sensitive to departures from normality
❑ Levene’s test is an alternative to Bartlett’s test that is less sensitive to
departures from normality
❑ Some common statistical methods assume that variances of the populations from
which different samples are drawn are equal. Bartlett's test assesses this
assumption. It tests the null hypothesis that the population variances are equal

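H0: σ1² = σ2² = … = σk²
H1: σi² ≠ σj² for at least one pair (i, j)
The test statistic is rather ugly:
$$T = \frac{(N-k)\ln s_p^2 - \sum_{i=1}^{k}(N_i-1)\ln s_i^2}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_i-1} - \frac{1}{N-k}\right)}$$
In the above, si² is the variance of the ith group, N is the total sample size, Ni is the
sample size of the ith group, k is the number of groups, and sp² is the pooled variance. The
pooled variance is a weighted average of the group variances and is defined as:
$$s_p^2 = \frac{\sum_{i=1}^{k}(N_i-1)\,s_i^2}{N-k}$$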

Critical Region:
The variances are judged to be unequal if
$$T > \chi^2_{1-\alpha,\,k-1}$$
where χ²(1−α, k−1) is the critical value of the chi-square distribution
with k − 1 degrees of freedom and a significance level of α
Key assumptions: Homogeneity (common group variances), Normality of
responses (or of residuals), and Independence of responses (or of residuals). (Hopefully
achieved through randomization…)
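Bartlett's statistic is tedious by hand but short in code. The sketch below assumes the standard form of the statistic (pooled variance weighted by Ni − 1, with the usual correction factor in the denominator); the group variances and sizes are made up, and 5.991 is the chi-square table value for α = 0.05 with k − 1 = 2 degrees of freedom.

```python
import math

def bartlett_statistic(variances, sizes):
    """Bartlett's T from per-group sample variances s_i^2 and sample sizes N_i."""
    k = len(variances)
    N = sum(sizes)
    pooled = sum((n - 1) * s2 for s2, n in zip(variances, sizes)) / (N - k)
    numerator = (N - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for s2, n in zip(variances, sizes))
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical variances and sample sizes for k = 3 groups
T = bartlett_statistic(variances=[4.0, 9.5, 2.5], sizes=[10, 12, 8])
critical = 5.991                     # chi-square table value, alpha = 0.05, 2 df
print(f"T = {T:.2f}; variances judged unequal: {T > critical}")
```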

Paired t-test
❑ A Paired t-test is used to compare the Means of two measurements taken on the same
samples; it is generally used as a before-and-after test
❑ This is appropriate for testing the difference between two Means when the data are
paired and the paired differences follow a Normal Distribution
❑ This matching allows you to account for variability between the pairs by analyzing the
within-pair differences, usually called delta (δ),
resulting in a smaller error term, thus increasing the sensitivity
of the Hypothesis Test or confidence interval.
Ho: μδ = μ0
Ha: μδ ≠ μ0
❑ Where μδ is the population Mean of the differences and μ0 is the hypothesized
Mean of the differences, typically zero.
Paired t-test Example
❑ We are interested in changing the sole material for a popular brand of shoes for
children. In order to account for variation in activity of children wearing the shoes,
each child will wear one shoe of each type of sole material. The sole material will
be randomly assigned to either the left or right shoe.
❑ 2. Statistical Problem:
Ho: μδ = 0
Ha: μδ ≠ 0
❑ 3. Paired t-test (comparing data that must remain paired).
α = 0.05 β = 0.10

❑ How much of a difference can be detected with 10 samples?


MINITAB™ Session Window
Power and Sample Size
1-Sample t Test
Testing Mean = null (versus not = null)
Calculating power for Mean = null + difference
Alpha = 0.05  Assumed Standard Deviation = 1
Sample Size    Power    Difference
10             0.9      1.15456
This means that, with 10 samples, we will only be able to detect a difference of
1.15 or more (at a power of 0.9) if the Standard Deviation is equal to 1

We need to calculate the difference between the two distributions

We are concerned with the delta; is the Ho outside the t-calc
❑ Following the Hypothesis Test roadmap, we first test the AB-Delta distribution for
Normality
MINITAB™ Session Window
[Figure: Box Plot of AB Delta]
One-Sample T: AB Delta
Test of mu = 0 vs not = 0
Variable    N     Mean        StDev       SE Mean     95% CI                  T       P
AB Delta    10    0.410000    0.387155    0.122429    (0.133046, 0.686954)    3.35    0.009
Reject the null hypothesis: the p-value (0.009) is below α = 0.05, and the 95% confidence interval for the mean
difference in wear between the two materials does not include zero
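The T value and confidence interval in the session window can be reproduced from the summary statistics alone. The sketch below uses the N, Mean and StDev reported above; the critical value 2.262 is the t table entry for a 0.025 upper tail with 9 degrees of freedom, entered by hand rather than computed.

```python
import math

n, mean_delta, sd_delta = 10, 0.410000, 0.387155   # summary values from the session window
t_crit = 2.262                                      # t table value: 0.025 upper tail, 9 df

se_mean = sd_delta / math.sqrt(n)                   # standard error of the mean difference
t_stat = (mean_delta - 0.0) / se_mean               # test of mu = 0
ci = (mean_delta - t_crit * se_mean, mean_delta + t_crit * se_mean)

print(round(se_mean, 6), round(t_stat, 2))          # 0.122429 3.35
print(tuple(round(v, 3) for v in ci))               # (0.133, 0.687)
```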
Hypothesis Testing (Non-normal data)
Non-Normal Hypothesis Tests
❑ At this point we have covered the tests for determining significance for Normal
Data. We will continue to follow the roadmap to complete the test for Non-normal
Data with Continuous Data
❑ Later in the module we will use another roadmap that was designed for Discrete
Data
❑ Recall that Discrete Data does not follow a Normal Distribution, and because it is
not Continuous Data, there is a separate set of tests to properly analyze the data

Non-Normal Hypothesis Tests
❑ Why do we care if a data set is Normally Distributed?

▪ When it is necessary to make inferences about the true nature of the population based
on random samples drawn from the population
▪ When the two indices of interest (X-Bar and s) depend on the data being Normal
▪ For problem solving purposes, because we don’t want to make a bad decision – having
Normal Data is so critical that with EVERY statistical test, the first thing we do is check
for Normality of the data
❑ There are four primary causes for Non-normal Data:

▪ Skewness – Natural and Artificial Limits
▪ Mixed Distributions - Multiple Modes
▪ Kurtosis
▪ Granularity
Non-Normal Distributions
1 Skewed 2 Kurtosis
3 Multi-Modal 4 Granularity

Skewness Classification
Potential Causes of Skewness
[Figure: two histograms (Frequency on the vertical axis), one Left Skew and one Right Skew]
1-1 Natural Limits
1-2 Artificial Limits (Sorting)
1-3 Mixtures
1-4 Non-Linear Relationships
1-5 Interactions
1-6 Non-Random Patterns Across Time
Mixed Distributions 1-3
Mixed Distributions occur when data comes from multiple
sources that are supposed to be the same yet are not
[Figure: Sample A + Sample B = Combined; example source pairs: Machine A / Machine B, Operator A / Operator B, Payment Method A / Payment Method B, Interviewer A / Interviewer B]

86
1-4 Non-Linear Relationships
Non-Linear Relationships occur when the X and Y scales are different
[Figure: plot of Y against X, with the Marginal Distribution of Y and the Marginal Distribution of X]
87
1-5 Interactions
Interactions occur when two inputs interact with each other to have a larger impact on Y than either would by themselves
If you find that two inputs together have a large impact on Y but would not affect Y by themselves, this is called an Interaction
[Figure: Interaction Plot for Process Output, Aerosol Hairspray example. Labels: Room Temperature; Spray, No Spray; No Fire, With Fire]

88
1-6 Time Relationships / Patterns
The distribution is dependent on time
[Figure: Y plotted against Time, with the marginal distribution of Y]
Often seen when tooling requires “warming up”, tool wear, chemical bath
depletions, ambient temperature effect on tooling
89
Non-Normal Right (Positive) Skewed
Moment coefficient of Skewness will be close to zero for symmetric distributions, negative for left Skewed and positive for right Skewed
Summary for Pos Skew
Anderson-Darling Normality Test
A-Squared 46.49
P-Value < 0.005
Mean 70.000
StDev 10.000
Variance 100.000
Skewness 2.41707
Kurtosis 6.93041
N 500
Minimum 62.921
1st Quartile 63.647
Median 65.695
3rd Quartile 72.821
Maximum 130.366
95% Confidence Interval for Mean: 69.121 70.879
95% Confidence Interval for Median: 65.260 66.501
95% Confidence Interval for StDev: 9.416 10.662
[Figure: histogram of Pos Skew with 95% Confidence Intervals for Mean and Median]

90
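The sign convention for the moment coefficient of Skewness can be checked on small data sets. This is a minimal sketch using the population (unadjusted) moment formula; Minitab reports a sample-adjusted version, so its numbers differ slightly. The data values are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]          # hypothetical data
right_skewed = [1, 1, 2, 2, 3, 12]   # long tail to the right
left_skewed = [-v for v in right_skewed]

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name}: skewness = {moment_skewness(data):.3f}")
```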
Kurtosis
Kurtosis refers to the shape of the tails
– Leptokurtic: Peaked with Long-Tails
– Platykurtic: Flat with Short-Tails
• Different combinations of distributions cause the resulting overall shapes

92
Kurtosis - Platykurtic distribution
Platykurtic
Multiple Means shifting over time produce a plateau in the data
Summary for Flat
Anderson-Darling Normality Test
A-Squared 1.74
P-Value < 0.005
Mean 52.330
StDev 5.099
Variance 26.001
Skewness 0.033260
Kurtosis -0.988765
N 182
Minimum 41.978
Median 52.223
3rd Quartile 56.729
Maximum 64.140
95% Confidence Interval for Mean: 51.585 53.076
95% Confidence Interval for Median: 50.932 53.741
95% Confidence Interval for StDev: 4.624 5.685
[Figure: histogram of Flat with 95% Confidence Intervals for Mean and Median]
Causes:
2-1 Mixtures (Combined Data from Multiple Processes):
Multiple Set-Ups
Multiple Batches
Multiple Machines
Tool Wear (over time)
2-2 Sorting or Selecting:
Scrapping product that falls outside the spec limits
2-3 Trends or Patterns:
Lack of Independence in the data (example: tool wear, chemical bath)
2-4 Non Linear Relationships:
Chemical Systems
Negative coefficient of Kurtosis indicates Platykurtic distribution
93
Kurtosis - Leptokurtic distribution
Distributions overlaying each other that have very different
variance can cause a Leptokurtic distribution
Summary for LongTail
A-Squared 3.59
P-Value < 0.005
Mean 51.389
StDev 12.998
Variance 168.960
Skewness -0.06752
Kurtosis 3.08271
N 125
Minimum 0.813
Median 52.017
Maximum 94.795
95% Confidence Interval for Mean: 49.088 53.691
95% Confidence Interval for Median: 50.584 52.666
95% Confidence Interval for StDev: 11.562 14.845
[Figure: histogram of LongTail with 95% Confidence Intervals for Mean and Median]
Causes:
2-1 Mixtures (Combined Data from Multiple Processes):
Multiple Set-Ups
Multiple Batches
Multiple Machines
Tool Wear (over time)
2-2 Sorting or Selecting:
Scrapping product that falls outside the spec limits
2-3 Trends or Patterns:
Lack of Independence in the data (example: tool wear, chemical bath)
2-4 Non Linear Relationships:
Chemical Systems
Positive Kurtosis value indicates Leptokurtic distribution
94
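The two mixture effects described on the Kurtosis slides can be illustrated with simulated data. This sketch uses the population excess-kurtosis formula (Normal = 0), not Minitab's sample-adjusted one, and the mixture parameters and seed are hypothetical choices made for the illustration.

```python
import random


def excess_kurtosis(values):
    """Excess kurtosis m4 / m2**2 - 3 (population form; Normal data gives about 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(0)  # fixed seed so the illustration is repeatable
n = 20000

# Same spread, shifted Means: tends to flatten the peak (Platykurtic).
shifted_means = ([rng.gauss(-2, 1) for _ in range(n)]
                 + [rng.gauss(2, 1) for _ in range(n)])

# Same Mean, very different variances: peaked with long tails (Leptokurtic).
mixed_spread = ([rng.gauss(0, 1) for _ in range(n)]
                + [rng.gauss(0, 5) for _ in range(n)])

print(f"shifted Means:    {excess_kurtosis(shifted_means):+.2f}")
print(f"mixed variances:  {excess_kurtosis(mixed_spread):+.2f}")
```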
Multiple Modes
Reasons for Multiple Modes:
1 Mixtures of distributions (most likely)
2 Lack of independence – trends or patterns
3 Catastrophic failures
(example: testing voltage on a motor and the motor shorts out so we
get a zero reading etc.)
Multiple Modes have such dramatic combinations of underlying sources that
they show distinct Modes. They might have appeared Platykurtic, but the sources were far
enough apart to show separation
These causes are usually the easiest to identify

95
Bi-Modal Distribution
This is an example of a Bi-Modal Distribution. Interestingly each peak is actually a Normal Distribution, but when the data is viewed as a group it is obviously not Normal
Summary for BiModal
A-Squared 27.11
P-Value < 0.005
Mean 79.570
StDev 32.385
Variance 1048.785
Skewness 0.00716
Kurtosis -1.63184
N 500
Minimum 21.341
1st Quartile 48.265
Median 83.772
3rd Quartile 110.379
Maximum 142.391
95% Confidence Interval for Mean: 76.724 82.416
95% Confidence Interval for Median: 62.354 97.233
95% Confidence Interval for StDev: 30.494 34.527
[Figure: histogram of BiModal with 95% Confidence Intervals for Mean and Median]
2 Different Distributions
-2 different machines
-2 different operators
-2 different administrators
96
Extreme Bi-Modal (Outliers)
If you see an extreme outlier, it usually has its own cause or own source of variation. It’s relatively easy to isolate the cause by looking on the X axis of the Histogram
Summary for ExtremeBiModal
Anderson-Darling Normality Test
A-Squared 22.88
P-Value < 0.005
Mean 58.487
StDev 21.751
Variance 473.106
Skewness -0.59479
Kurtosis -1.03403
N 385
Minimum 19.987
1st Quartile 26.920
Median 66.161
3rd Quartile 74.140
Maximum 103.301
95% Confidence Interval for Mean: 56.308 60.667
95% Confidence Interval for Median: 63.410 67.793
95% Confidence Interval for StDev: 20.315 23.406
[Figure: histogram of ExtremeBiModal with 95% Confidence Intervals for Mean and Median]

97
Bi-Modal – Multiple Outliers
Having multiple outliers is more difficult to correct. This situation typically means multiple inputs are involved
Summary for Multiple Outliers
Anderson-Darling Normality Test
A-Squared 20.90
P-Value < 0.005
Mean 26.251
StDev 4.845
Variance 23.477
Skewness 3.17250
Kurtosis 9.11483
N 108
Minimum 22.629
Median 25.053
Maximum 46.000
95% Confidence Interval for Mean: 25.326 27.175
95% Confidence Interval for Median: 24.836 25.297
95% Confidence Interval for StDev: 4.274 5.594
[Figure: histogram of Multiple Outliers with 95% Confidence Intervals for Mean and Median]

98
Granularity
Granular data is easy to see in a Dot Plot
– Use Caution!
• It looks ‘Normal’ but it is only symmetric and not Continuous
– Causes:
1 Measurement system resolution (Gage R&R)
2 Categorical (step-type function) data
Notice the P-value in the Normal Probability Plot: it is definitely smaller than 0.05

99
Normal Example
.
Notice the contrast to the previous slide!

100
Conclusions Regarding Distributions
❑ Non-normal Distributions are not BAD!!!
❑ Non-normal Distributions can give more Root Cause information than Normal data
(the nature of why…)
❑ Understanding what the data is telling us is KEY !!!

101
Non Normal
Test of Equal Variance Median Test
Mann-Whitney Several Median Tests

102
Test of Equal Variance
❑ Levene’s test of Equal Variance is used to compare the estimated

population Standard Deviations from two or more samples with Non-normal
Distributions
❑ Ho: σ1 = σ2 = σ3 …
❑ Ha: At least one is different

103
Test of Equal Variance (Minitab)
Stat > Basic Statistics > Normality test…
[Figure: Normal Probability Plot of Rot 2; vertical axis: Percent]
Mean 1.023
StDev 1.407
N 100
AD 7.448
P-Value <0.005
P-value < 0.05 (0.00): assume the data is not Normally Distributed

104
Test of Equal Variance Non-Normal Distribution
Stat>ANOVA>Test for Equal Variance
Use Levene’s Statistics for Non-Normal Data
P-value >0.05 (0.860) Assume variance is equal.
Ho: σ1 = σ2 = σ3 …
Ha: At least one is different.
Test for Equal Variances for Rot 2 (by Factors2, levels 1 and 2)
F-Test: Test Statistic 1.75, P-Value 0.053
Levene's Test: Test Statistic 0.03, P-Value 0.860
[Figure: 95% Bonferroni Confidence Intervals for StDevs and plots of Rot 2 by Factors2]

105
Test of Equal Variance - Conclusions
❑ When testing 2 samples with Normal Distribution, use F-test:
To determine whether two Normal Distributions have equal variance
❑ When testing >2 samples with Normal Distribution, use Bartlett’s test:
To determine whether multiple Normal Distributions have equal variance
❑ When testing 2 or more samples with Non-normal Distributions, use Levene’s test:
To determine whether two or more distributions have Equal Variance
Our focus for this module when working with Non-normal Distributions

106
Mean and Median
This Graphical Summary provides the confidence interval for the Median
With Normal Data notice the symmetrical shape of the distribution and notice how the Mean and the Median are centered
With skewed data, the Mean is influenced by the outliers. Notice the Median is still centered
Anderson-Darling Normality Test
Statistic                             Normal data        Skewed data
A-Squared                             0.30               3.72
P-Value                               0.574              < 0.005
Mean                                  350.51             4.8454
StDev                                 5.01               3.1865
Variance                              25.12              10.1536
Skewness                              -0.079532          1.11209
Kurtosis                              -0.635029          1.26752
N                                     75                 200
Minimum                               339.09             0.1454
1st Quartile                          347.48             2.4862
Median                                350.48             4.1533
3rd Quartile                          353.99             6.5424
Maximum                               359.53             16.4629
95% Confidence Interval for Mean      349.35 351.66      4.4011 5.2898
95% Confidence Interval for Median    349.30 351.85      3.6296 4.7174
95% Confidence Interval for StDev     4.32 5.97          2.9018 3.5336
[Figure: histograms of both data sets with 95% Confidence Intervals for Mean and Median]

107
MINITAB’s Nonparametric tests
❑ 1-Sample Sign: performs a one-sample sign test of the Median and calculates the
corresponding point estimate and confidence interval. Use this test as an
alternative to one-sample Z and one-sample t-tests
❑ 1-Sample Wilcoxon: performs a one-sample Wilcoxon signed rank test of the

Median and calculates the corresponding point estimate and confidence interval
(more discriminating or efficient than the sign test). Use this test as a
nonparametric alternative to one-sample Z and one-sample t-tests.
❑ Mann-Whitney: performs a Hypothesis Test of the equality of two population

Medians and calculates the corresponding point estimate and confidence interval.
Use this test as a nonparametric alternative to the two-sample t-test

108
MINITAB’s Nonparametric tests
❑ Kruskal-Wallis: performs a Hypothesis Test of the equality of population Medians

for a one-way design. This test is more powerful than Mood’s Median (the
confidence interval is narrower, on average) for analyzing data from many
populations, but is less robust to outliers. Use this test as an alternative to the
one-way ANOVA
❑ Mood’s Median Test: performs a Hypothesis Test of the equality of population

Medians in a one-way design. Test is similar to the Kruskal-Wallis Test. Also
referred to as the Median test or sign scores test. Use as an alternative to the
one-way ANOVA

109
1-Sample Sign Test
❑ This test is used when you want to compare the Median of one distribution to a target
value
❑ Must have at least one column of numeric data. If there is more than one column of
data, MINITAB performs a one-sample sign test separately for each column
❑ The hypotheses:
H0: M = Mtarget
Ha: M ≠ Mtarget
❑ Interpretation of the resulting P-value is the same

110
1-Sample Sign Test
❑ Example: Our facility requires a cycle time from an improved process of 63 minutes.
This process supports the customer service division and has become a bottleneck to
completion of order processing. To alleviate the bottleneck the improved process
must perform at least at the expected 63 minutes
❑ Ho: M = 63
❑ Ha: M ≠ 63
❑ 1-Sample Sign: Stat>Non parametric> 1 sample sign …
Or
❑ 1-Sample Wilcoxon: Stat> Non parametric> 1 sample Wilcoxon

111
1-Sample Sign Test
Stat>Non parametric> 1 Sample Sign …
For a two tailed test, choose the

“not equal” for the alternative
hypothesis.
Sign Test for Median: Pos Skew

Sign Test of Median = 63.00 versus not = 63.00
N Below Equal Above P Median
Pos Skew 500 37 0 463 0.0000 65.70
As you can see the P-value is less than 0.05, so we must reject the null hypothesis which means
we have data that supports the alternative hypothesis that the Median is different than 63.
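The sign test P-value comes from the Binomial distribution: under the null hypothesis each observation is equally likely to fall above or below the target Median. The sketch below is an illustration of that idea (the function name is an assumption, not a Minitab call), applied to the Below and Above counts from the output above.

```python
from math import comb


def sign_test_p_value(below, above):
    """Two-sided exact sign test P-value.

    Observations equal to the hypothesized Median are dropped beforehand.
    Under H0 the count below the Median is Binomial(n, 0.5).
    """
    n = below + above
    k = min(below, above)
    tail = sum(comb(n, i) for i in range(k + 1))  # P(X <= k) * 2**n
    return min(1.0, 2 * tail / 2 ** n)


# Counts from the Pos Skew output: 37 below and 463 above the target of 63.
p = sign_test_p_value(below=37, above=463)
print(f"P-value = {p:.3g}")
```

The result is far below 0.05, consistent with the reported P of 0.0000.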

112
1 Sample Wilcoxon Test
Stat>Non parametric> 1 Sample Wilcoxon …
Wilcoxon Signed Rank Test: Pos Skew

Test of Median = 63.00 versus Median not = 63.00
           N    N for Test   Wilcoxon Statistic   P       Estimated Median
Pos Skew   500  500          124015.0             0.000   67.83
As you can see the P-value is less than 0.05, so we must reject the null hypothesis which means we
have data that supports the alternative hypothesis that the Median is different than 63.
113
Mann-Whitney Example
❑ The Mann-Whitney test is used to test if the Medians for 2 samples are different.
❑ Determine if different machines have different Median cycle times.
Ho: M1 = M2
Ha: M1 ≠ M2
❑ There are 200 data points for each machine, well over the minimum sample
necessary

114
First run a Normality Test…of course!
When looking at the probability plots, Mach A yields a P-value less than .05. Now look at Mach B. You now have one data set that is Non-normal and another that is Normal
[Figure: Normal Probability Plots of Mach A and Mach B; vertical axis: Percent]
Mach A: Mean 15.24, StDev 5.379, N 200, AD 1.550, P-Value <0.005
Mach B: Mean 16.73, StDev 5.284, N 200, AD 0.630, P-Value 0.099

115
Now you’ll actually run the Mann-Whitney test and based on the
results end up determining that Medians of the machines are
different.
Stat>Nonparametric>Mann-Whitney…
Since zero (no difference between the 2 Medians) is not contained within the confidence interval, we
reject the null hypothesis. Also, the last line in the Session Window, where it says “is significant at
0.0019”, is the equivalent of a P-value for the Mann-Whitney test
Mann-Whitney Test and CI: Mach A, Mach B

N Median
Mach A 200 14.841
Mach B 200 16.346
Point estimate for ETA1-ETA2 is -1.604
95.0 Percent CI for ETA1-ETA2 is (-2.635,-0.594)
W = 36509.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is
significant at 0.0019
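The W statistic is the sum of the ranks of the first sample in the pooled data, and the significance level can be approximated from W with a Normal approximation. The sketch below assumes a continuity correction and no tie adjustment; the function names and the small data set are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_w(sample_a, sample_b):
    """Sum of the ranks of sample_a in the pooled data (ties share the average rank)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return sum(ranks[v] for v in sample_a)


def mann_whitney_p(w, n1, n2):
    """Two-sided P-value from W (Normal approximation, no tie adjustment)."""
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(0.0, (abs(w - mean_w) - 0.5) / sd_w)  # continuity correction
    return 2 * (1 - NormalDist().cdf(z))


# W and sample sizes from the Mach A / Mach B output.
print(f"P-value = {mann_whitney_p(36509.0, 200, 200):.4f}")

# Hypothetical small data set showing how W itself is obtained.
print(rank_sum_w([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```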
116
❑ Example: A credit card company now understands there is no variability difference in customer
calls/week for the two different credit card types. This means no difference in strategy of
deploying the workforces. However, the credit card company wants to see if there is a
difference in call volume between the two different card types. The company expects no
difference since the total sales among the two credit card types are similar. The Black Belt was
selected and told to evaluate with 95% confidence if the averages were the same. The Black
Belt reminded the credit card company that the calls/week were not Normally distributed, so he would
have to compare using Medians since Medians are used to describe the central tendency of Non-
normal Populations
❑ Analyze the problem using the Hypothesis Testing roadmap.
❑ Is there a difference in call volume between the 2 different card types?

117
Mann-Whitney Example: Solution
❑ Since we know the data are Non-normal we can proceed to performing a Mann-Whitney Test
Stat>Nonparametrics>Mann-Whitney
Mann-Whitney Test and CI: CallsperWk1, CallsperWk2

N Median
CallsperWk1 22 739.0
95.0 Percent CI for ETA1-ETA2 is (-91.9,43.0)
W = 36509.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.4580

118
Mann-Whitney Example: Solution
❑ As you can see, there is no significant difference in the Median between CallsperWk1 and CallsperWk2: the P-value (0.4580) is greater than 0.05 and the confidence interval contains zero.
❑ Therefore, there is not a difference in call volume between the two different card types
Mann-Whitney Test and CI: CallsperWk1, CallsperWk2

N Median
95.0 Percent CI for ETA1-ETA2 is (-91.9,43.0)
W = 36509.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.4580

119
Mood’s Median Test
❑ An aluminum company wanted to compare the operation of its three facilities worldwide.
They want to see if there is a difference in the recoveries among the three locations. A
Black Belt was asked to help management evaluate the recoveries at the locations with 95%
confidence.
❑ Ho: M1 = M2 = M3
Ha: at least one is different
Use the Mood’s Median test.
❑ Based on the smallest sample of 13, the test will be able to detect a difference close to 1.5
❑ Statistical Conclusions: Use the data in the columns named “Recovery” and “Location” in
the Minitab worksheet “Hypoteststud.mtw” for analysis

120
Mood’s Median Test Example: Solution
Stat>Basic Statistics>Graphical Summary… Instead of using the Anderson-Darling test for Normality,
this time we used the Graphical Summary method. It
gives a P-value for Normality and allows a view of the
data that the Normality test does not.
Summary for Recovery
Location = Savannah
A-Squared 0.81
P-Value 0.032
Mean 87.660
StDev 7.944
Variance 63.113
Skewness -0.15286
Kurtosis -1.11764
N 25
Minimum 75.300
Median 87.500
3rd Quartile 96.550
Maximum 99.200
95% Confidence Interval for Mean: 84.381 90.939
95% Confidence Interval for Median: 86.179 90.080
95% Confidence Interval for StDev: 6.203 11.052
[Figure: histogram of Recovery for Savannah with 95% Confidence Intervals for Mean and Median]

121
Notice evidence of outliers in at least 2 of the 3 populations. You could do a Box Plot to get a clearer idea about Outliers.
Statistic                             Location = Bangor   Location = Ankhar
A-Squared                             0.72                0.86
P-Value                               0.045               0.022
Mean                                  93.042              88.302
StDev                                 5.918               6.929
Variance                              35.017              48.008
Skewness                              -1.81758            -0.105610
Kurtosis                              4.66838             0.182123
N                                     13                  20
Minimum                               76.630              73.500
1st Quartile                          90.600              85.150
Median                                94.800              88.425
3rd Quartile                          97.350              89.700
Maximum                               99.700              99.450
95% Confidence Interval for Mean      89.466 96.617       85.059 91.545
95% Confidence Interval for Median    90.637 97.036       86.735 89.299
95% Confidence Interval for StDev     4.243 9.768         5.269 10.120
[Figure: histograms of Recovery for Bangor and Ankhar with 95% Confidence Intervals for Mean and Median]

122
Test for Equal Variances for Recovery (by Location: Ankhar, Bangor, Savannah)
Bartlett's Test: Test Statistic 1.33, P-Value 0.514
Levene's Test: Test Statistic 1.02, P-Value 0.367
[Figure: interval plot of the standard deviations by Location]

123
Mood’s Median Test
Stat>NonParametrics > Moods Median [Session Output]…
Mood Median Test: Recovery versus Location
Mood median test for Recovery

Chi-Square = 12.11 DF = 2 P = 0.002
Individual 95.0% CIs

Location N<= N> Median Q3-Q1 ---+---------+---------+---------+---
Ankhar 13 7 88.4 4.5 (-----*--)
Bangor 1 12 94.8 6.8 (-------------*------)
Savannah 15 10 87.5 17.6 (----*-------)
---+---------+---------+---------+---
87.0 90.0 93.0 96.0
Overall median = 88.9
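The Chi-Square value in this output is an ordinary chi-square test on the table of counts at or below (N<=) and above (N>) the overall median. The sketch below reproduces it from the counts shown; the function name is an assumption, and the P-value shortcut `exp(-chi2 / 2)` is valid only because DF = 2 here.

```python
from math import exp

# (N<=, N>) counts per Location, copied from the session output.
counts = {"Ankhar": (13, 7), "Bangor": (1, 12), "Savannah": (15, 10)}


def mood_chi_square(counts):
    """Chi-square statistic for the 'at or below' vs 'above' overall-median table."""
    total_le = sum(le for le, gt in counts.values())
    total_gt = sum(gt for le, gt in counts.values())
    grand = total_le + total_gt
    chi2 = 0.0
    for le, gt in counts.values():
        n = le + gt
        for observed, col_total in ((le, total_le), (gt, total_gt)):
            expected = n * col_total / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2


chi2 = mood_chi_square(counts)
p_value = exp(-chi2 / 2)  # chi-square upper tail area, exact for DF = 2 only
print(f"Chi-Square = {chi2:.2f}  DF = {len(counts) - 1}  P = {p_value:.3f}")
```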
We observe the confidence intervals for the Medians of the 3 populations. Note
that the 95% confidence interval for Bangor does not overlap the other two, so we
visually know the P-value is below 0.05.
Statistical Conclusion: Since the P-value of the Mood’s Median test is less than 0.05,
we reject the null hypothesis.
Practical Conclusion: Bangor has the highest recovery of all three facilities.
124
Kruskal-Wallis Test
Using the same data set, analyze using the Kruskal-Wallis test.
Kruskal-Wallis Test: Recovery versus Location
Kruskal-Wallis Test on Recovery
Location   N    Median   Ave Rank   Z
Ankhar     20   88.43    27.3       -0.73
Bangor     13   94.80    40.2       2.60
Savannah   25   87.50    25.7       -1.49
Overall    58            29.5
H = 6.86 DF = 2 P = 0.032
H = 6.87 DF = 2 P = 0.032 (adjusted for ties)
When comparing the Kruskal-Wallis test to the Mood’s Median test, the Kruskal-Wallis test is the more powerful of the two. In this
case the variances were equal (Levene’s test) and the Kruskal-Wallis test leads to the same conclusion.
This output is the “least friendly” to interpret. Look for the P-value, which tells us we reject the null
hypothesis. We have the same conclusion as with the Mood’s Median test.
125
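The H statistic can be approximated from the group sizes and average ranks printed in the Kruskal-Wallis output. This is a sketch under stated assumptions: the average ranks are rounded to one decimal in the output, so the result is close to, not identical to, the reported H = 6.86, and the P-value shortcut applies only for DF = 2.

```python
from math import exp

# (N, Ave Rank) per Location, copied from the session output.
groups = {"Ankhar": (20, 27.3), "Bangor": (13, 40.2), "Savannah": (25, 25.7)}


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum of n_i * (average rank_i - (N + 1) / 2)**2."""
    total_n = sum(n for n, _ in groups.values())
    mid_rank = (total_n + 1) / 2  # the "Overall" average rank
    spread = sum(n * (avg_rank - mid_rank) ** 2 for n, avg_rank in groups.values())
    return 12 * spread / (total_n * (total_n + 1))


h = kruskal_wallis_h(groups)
p_value = exp(-h / 2)  # chi-square upper tail area, exact for DF = 2 only
print(f"H = {h:.2f}  P = {p_value:.3f}")
```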
Unequal Variance
❑ Where do you go in the roadmap if the variance is not equal?
▪ Unequal variances are usually the result of differences in the shape of the
distribution
▪ Extreme tails
▪ Outliers
▪ Multiple modes
❑ These conditions should be explored through data demographics
❑ For Skewed Distributions with comparable Medians, it is unusual for the variances to
be different without some assignable cause impacting the process

126
Check For Normality
Check for normality using Stat > Basic Statistics > Normality….
Model A and Model B are similar in nature (not exact), but are manufactured
in the same plant
[Figure: Normal Probability Plots of Model A and Model B; vertical axis: Percent]
Model A: Mean 10.28, StDev 0.7028, N 10, AD 0.227, P-Value 0.747
Model B: Mean 2.826, StDev 3.088, N 10, AD 0.753, P-Value 0.033
Model A is Normal, Model B is Non-normal

127
Check for equal Variance
Now let’s check for Equal Variances using Levene’s test, but remember, first
you’ll need to stack the data so you can run this test…
Test for Equal Variances for Data (by idvar: Model A, Model B)
F-Test: Test Statistic 0.05, P-Value 0.000
Levene's Test: Test Statistic 4.47, P-Value 0.049
[Figure: confidence intervals for StDevs and plots of Data by idvar]
The P-value is just under the limit of .05. Whenever the result is borderline,
as in this case, use your process knowledge to make a judgment.
128
Plot the data to explore and explain the differences
Let’s look at data demographics for clues
Anderson-Darling Normality Test
Statistic                             Summary for Model A   Summary for Model B
A-Squared                             0.23                  0.75
P-Value                               0.747                 0.033
Mean                                  10.279                2.8260
StDev                                 0.703                 3.0882
Variance                              0.494                 9.5370
Skewness                              0.330968              1.29887
Kurtosis                              -0.614597             0.92377
N                                     10                    10
Minimum                               9.213                 0.2253
1st Quartile                          9.779                 0.3488
Median                                10.111                1.7773
3rd Quartile                          10.816                5.5508
Maximum                               11.496                9.4440
95% Confidence Interval for Mean      9.776 10.782          0.6169 5.0352
95% Confidence Interval for Median    9.767 10.848          0.3465 5.5873
95% Confidence Interval for StDev     0.483 1.283           2.1242 5.6379
[Figure: histograms of Model A and Model B with 95% Confidence Intervals for Mean and Median]
Dotplot of Model A, Model B
Graph> Dotplot> Multiple Y’s, Simple
[Figure: dotplots of Model A and Model B on a common Data axis]

129
Confidence Interval
130
Why Confidence Interval?
.
❑ Sample statistics such as the mean, standard deviation and proportion (x̄, s, p) are
only estimates of the population parameters (μ, σ, and P)
❑ Since there is variability in these estimates from sample to sample, we can quantify
the uncertainty using confidence intervals
❑ Confidence intervals provide us with a range in which population parameters are

likely to fall

131
What Is A Confidence Interval?
A Graphical View
.
❑ A 95% confidence interval suggests that approximately 95 out of 100
confidence intervals will contain the population parameter
❑ Confidence level = 1-α
❑ 1-α is called the probability content or level of confidence
❑ Alpha (α) is known as the significance level; the probability of being wrong
(risk level)
[Figure: a Confidence Interval around the Sample Mean, shown relative to the Population Mean]

132
Central Limit Theorem
.
$$\text{SE Mean} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
$\sigma_{\bar{x}}$ = Standard Error of the Mean
$\sigma$ = Standard Deviation for the Individual Scores
$n$ = Sample Size for the mean
[Figure: histograms of the Population Distribution and of the Sample Means Distribution on the same scale (30 to 100); vertical axis: Frequency]

133
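The Standard Error relationship from the Central Limit Theorem slide can be checked by simulation. The sketch below draws repeated samples from a deliberately skewed, hypothetical population and compares the spread of the sample means with σ/√n; the population shape, seed, and sizes are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the illustration is repeatable

# Hypothetical right-skewed population (exponential with a mean of about 10).
population = [rng.expovariate(1 / 10) for _ in range(50000)]
sigma = statistics.pstdev(population)

n = 25  # sample size for each mean
sample_means = [statistics.fmean(rng.choices(population, k=n))
                for _ in range(2000)]

observed_se = statistics.stdev(sample_means)  # spread of the sample means
predicted_se = sigma / math.sqrt(n)           # SE Mean = sigma / sqrt(n)

print(f"observed SE  = {observed_se:.3f}")
print(f"predicted SE = {predicted_se:.3f}")
```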
Point and Interval Estimates
.
❑ A point estimate is a single number, and a confidence interval provides additional information about the variability of the estimate
We can estimate a Population Parameter with a Sample Statistic (a Point Estimate):
Mean: μ is estimated by X̄
Proportion: P is estimated by p
[Figure: Lower Confidence Limit, Point Estimate, Upper Confidence Limit; the distance between the limits is the width of the confidence interval]

134
Point and Interval Estimates
.
❑ How much uncertainty is associated with a point estimate of a population parameter?
❑ An interval estimate provides more information about a population characteristic than
does a point estimate. Such interval estimates are called confidence intervals
❑ The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)

Where:
Point Estimate is the sample statistic estimating the population parameter of interest
Critical Value is a table value based on the sampling distribution of the point estimate and
the desired confidence level
Standard Error is the standard deviation of the point estimate

135
Confidence Intervals on mean with known σ
.
Confidence
Intervals
Population Population
Mean Proportion
σ Known σ Unknown

136
Confidence Interval for μ (σ Known)
.
❑ Assumptions
Population standard deviation σ is known
Population is normally distributed
❑ If population is not normal, use large sample (n > 30)
❑ Confidence interval estimate:
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where $\bar{X}$ is the point estimate
$Z_{\alpha/2}$ is the normal distribution critical value for a probability of α/2 in each tail
$\frac{\sigma}{\sqrt{n}}$ is the standard error
137
Finding the Critical Value, Z α/2
.
Consider a 95% confidence interval: $Z_{\alpha/2} = 1.96$
1 − α = 0.95 so α = 0.05, leaving α/2 = 0.025 in each tail
Z units: −Zα/2 = -1.96, 0, Zα/2 = 1.96
X units: Lower Confidence Limit, Point Estimate, Upper Confidence Limit

138
Common Levels of Confidence
.
Confidence Level   Confidence Coefficient, 1 − α   Zα/2 value
80%                0.80                            1.28
90%                0.90                            1.645
95%                0.95                            1.96
98%                0.98                            2.33
99%                0.99                            2.58
99.8%              0.998                           3.09
99.9%              0.999                           3.29

139
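Critical values for any confidence level can be recomputed from the standard Normal distribution instead of being read from a table. This is a minimal sketch using Python's standard library; the function name `z_critical` is an assumption introduced here.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value: leaves alpha/2 in each tail of the standard Normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  {z_critical(level):.3f}")
```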
Intervals and Level of Confidence
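Sampling Distribution of the Mean, centered at $\mu_{\bar{x}} = \mu$, with area 1 − α in the middle and α/2 in each tail
Intervals extend from $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
(1 − α)100% of intervals constructed contain μ; (α)100% do not.
[Figure: Confidence Intervals built around different sample means x1, x2, …]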
140
Example
.
❑ A sample of 11 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is 0.35 ohms
❑ Determine a 95% confidence interval for the true mean resistance of the population
Solution
$$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\left(\frac{0.35}{\sqrt{11}}\right) = 2.20 \pm 0.2068$$
$$1.9932 \le \mu \le 2.4068$$
Interpretation
We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms
Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean

141
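The circuit resistance example can be recomputed directly. The sketch below is illustrative; `z_interval` is a hypothetical helper name, and it uses the exact Normal quantile rather than the rounded 1.96, so the last digit may differ slightly from the slide.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # critical value times standard error
    return mean - margin, mean + margin


# 11 circuits, sample mean 2.20 ohms, known sigma 0.35 ohms.
low, high = z_interval(mean=2.20, sigma=0.35, n=11)
print(f"{low:.4f} <= mu <= {high:.4f}")
```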
Confidence Intervals on mean with unknown σ
.
Confidence
Intervals
Mean Proportion
σ Known σ Unknown

142
Confidence Interval for μ (σ unknown)
.
❑ Assumptions
Population standard deviation is unknown
Population is normally distributed
If population is not normal, use large sample (n > 30)
❑ We use the Student’s t Distribution instead of the normal distribution as it factors in the
greater uncertainty associated with small sample sizes
$$\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the critical value of the t distribution with n - 1 degrees of freedom and an area of α/2 in each tail
143
Student’s t Distribution
.
❑ The t is a family of distributions
❑ The tα/2 value depends on degrees of freedom (d.f.)
❑ Number of observations that are free to vary after sample mean has been calculated
𝑑. 𝑓. = 𝑛 − 1

144
Degrees of Freedom (df)
.
Idea: Number of observations that are free to vary after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a given mean)
145
Student’s t Distribution
.
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal
[Figure: density curves centered at 0 for the Standard Normal (t with df = ∞), t (df = 13) and t (df = 5)]
146
Student’s t table
.
Upper Tail Area
df    .10     .05     .025
1     3.078   6.314   12.706
2     1.886   2.920   4.303
3     1.638   2.353   3.182
The body of the table contains t values, not probabilities
Let: n = 3, so df = n - 1 = 2
α = 0.10, so α/2 = 0.05
The t value with df = 2 and an upper tail area of 0.05 is 2.920
147
Selected t distribution values
.
With comparison to the Z value
Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z (∞ d.f.)
0.80               1.372         1.325         1.310         1.28
0.90               1.812         1.725         1.697         1.645
0.95               2.228         2.086         2.042         1.96
0.99               3.169         2.845         2.750         2.58
Note: t → Z as n increases

148
Example
.
❑ A random sample of n = 25 has $\bar{X}$ = 50 and S = 8. Form a 95% confidence interval for μ
❑ d.f. = n – 1 = 24, so $t_{\alpha/2} = t_{0.025} = 2.0639$
The confidence interval is
$$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$$
$$46.698 \le \mu \le 53.302$$
Interpreting this interval requires the assumption that the population you are sampling from is approximately a normal distribution (especially since n is only 25)
This condition can be checked by creating a Normal probability plot or Boxplot

149
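The t interval from this example can be reproduced with a few lines. In this sketch the critical value is passed in as a parameter (here 2.0639 for 24 degrees of freedom, as given in the example) because Python's standard library has no t quantile function; `t_interval` is a hypothetical helper name.

```python
from math import sqrt


def t_interval(mean, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the t critical value for n - 1 degrees of freedom, looked up
    from a t table for the desired confidence level.
    """
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


low, high = t_interval(mean=50, s=8, n=25, t_crit=2.0639)
print(f"{low:.3f} <= mu <= {high:.3f}")
```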
Confidence Intervals for the Population Proportion, P
.
Confidence
Intervals
Mean Proportion
σ Known σ Unknown

150
Confidence Interval for the Population Proportion, P
.
❑ Recall that the distribution of the sample proportion is approximately normal if the sample
size is large, with standard deviation
$$\sigma_{p} = \sqrt{\frac{P(1-P)}{n}}$$
❑ We will estimate this with sample data:
$$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

151
Confidence Interval for the Population Proportion, P
.
❑ Upper and lower confidence limits for the population proportion are calculated with the
formula
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
$\hat{p}$ is the sample proportion
n is the sample size
Note: must have np > 5 and n(1-p) > 5

152
Example
❑ A random sample of 100 people shows that 25 are left-handed.
❑ Form a 95% confidence interval for the true proportion of left-handers
$$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{25}{100} \pm 1.96\sqrt{\frac{0.25(0.75)}{100}} = 0.25 \pm 1.96(0.0433)$$
$$0.1651 \le p \le 0.3349$$
We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%
Although the interval from 0.1651 to 0.3349 may or may not contain the true
proportion, 95% of intervals formed from samples of size 100 in this manner will
contain the true proportion

153
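The left-hander example can be recomputed as follows. This is a sketch, not a library routine: `proportion_interval` is a hypothetical helper that also enforces the np > 5 and n(1-p) > 5 rule from the previous slide before using the Normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    if n * p_hat <= 5 or n * (1 - p_hat) <= 5:
        raise ValueError("need np > 5 and n(1-p) > 5 for the Normal approximation")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 25 left-handers in a random sample of 100 people.
low, high = proportion_interval(successes=25, n=100)
print(f"{low:.4f} <= p <= {high:.4f}")
```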
Sample Size
154
Distinguishing between Two Samples
.
❑ Recall from the Central Limit Theorem that as the number of individual observations
increases, the Standard Error decreases.
❑ In this example, when n=2 we cannot distinguish the difference between the
Means (> 5% overlap, P-value > 0.05)
❑ When n=30, we can distinguish between the Means (< 5% overlap, P-value < 0.05). There
is a significant difference
[Figure: Theoretical Distribution of Means when n = 2 and when n = 30, each with δ = 5 and S = 1]

155
Delta Sigma—The Ratio between δ and S
❑ Delta (δ) is the size of the difference between two
Means or one Mean and a target value
❑ Sigma (S) is the sample Standard Deviation of the
distribution of individuals of one or both of the
samples under question
❑ When δ / S is large, we don’t need statistics
because the differences are so large
❑ If the variance of the data is large, it is difficult to
establish differences. We need larger sample sizes
to reduce uncertainty
[Figure: two pairs of distributions, one labeled Large Delta (δ) and one labeled Large S]
156
The Perfect Sample Size
Question: “How many samples should we take?”
Answer: “Well, that depends on the size of your delta and Standard Deviation”
Question: “How should we conduct the sampling?”
Answer: “Well, that depends on what you want to know”
Question: “Was the sample we took large enough?”
Answer: “Well, that depends on the size of your delta and Standard Deviation”
Question: “Should we take some more samples just to be sure?”
Answer: “No, not if you took the correct number of samples the first time!”

157
The Perfect Sample Size
❑ The perfect sample size is the minimum sample size required to provide exactly 5% overlap (risk) in order to distinguish the Delta
❑ Note: If you are working with Non-normal Data, multiply your calculated sample size by 1.1 (this is based on recommendations by multiple studies)
[Figure: Population distributions plotted on an axis from 40 to 70]

158
Determining Sample Size
[Diagram: Determining Sample Size branches into "For the Mean" and "For the Proportion"]

159
Sampling Error
❑ The required sample size can be found to reach a desired margin of error (e)
with a specified level of confidence (1 - α)
❑ The margin of error is also called the sampling error: the amount of imprecision in the estimate of the population parameter, or the amount added to and subtracted from the point estimate to form the confidence interval

160
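Determining Sample Size: For the Mean
❑ Confidence interval for the mean: $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
❑ Margin of error: $e = Z_{\alpha/2}\,\sigma/\sqrt{n}$
❑ Solving for n: $n = Z_{\alpha/2}^{2}\,\sigma^{2}/e^{2}$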

161
❑ To determine the required sample size for the mean, you must know:
1) The desired level of confidence (1 - α), which determines the critical value,
Zα/2
2) The acceptable sampling error, e
3) The standard deviation, σ

162
Required Sample Size Example
If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}} = \frac{(1.645)^{2}(45)^{2}}{5^{2}} = 219.19$
So the required sample size is n = 220
(Always round up)
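The same calculation can be sketched in Python. The function name `sample_size_for_mean` is an assumption made for illustration; the critical value 1.645 is the 90% value used on the slide and is supplied by the caller.

```python
import math

def sample_size_for_mean(sigma, e, z):
    """Smallest n such that z * sigma / sqrt(n) <= e (always rounds up)."""
    return math.ceil((z * sigma / e) ** 2)

n = sample_size_for_mean(sigma=45, e=5, z=1.645)
print(n)  # 220
```

Halving the margin of error roughly quadruples the required sample, because e enters the formula squared.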

163
If σ is unknown
❑ If unknown, σ can be estimated when using the required sample size formula
❑ Use a value for σ that is expected to be at least as large as the true σ
❑ Select a pilot sample and estimate σ with the sample standard deviation, S

164
For the Proportion
❑ Confidence interval for the proportion: $\hat{p} \pm Z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
❑ Margin of error: $e = Z_{\alpha/2}\sqrt{p(1-p)/n}$
❑ Solving for n: $n = Z_{\alpha/2}^{2}\,p(1-p)/e^{2}$
❑ Another approach to choosing n uses the fact that the sample size is always largest for p = 0.5 [that is, p(1 - p) ≤ 0.25, with equality for p = 0.5], and this can be used to obtain an upper bound on n. In other words, we are at least 100(1 – α)% confident that the error in estimating p by $\hat{p}$ is less than e if the sample size is $n = Z_{\alpha/2}^{2}\,(0.25)/e^{2}$
165
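❑ To determine the required sample size for the proportion, you must know:
1) The desired level of confidence (1 - α), which determines the critical value, Zα/2
2) The acceptable sampling error, e
3) The true proportion of events of interest, p
4) p can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of p)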

166
Required Sample Size Example
How large a sample would be necessary to estimate the true proportion of defectives in a large population within ±3%, with 95% confidence?
(Assume a pilot sample yields p̂ = 0.12)
Solution:
For 95% confidence, use Zα/2 = 1.96 and e = 0.03
p̂ = 0.12, so use this to estimate p
$n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{e^{2}} = \frac{(1.96)^{2}(0.12)(1-0.12)}{(0.03)^{2}} = 450.75$
So use n = 451
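A minimal Python sketch of the proportion formula follows. The name `sample_size_for_proportion` is hypothetical; the second call shows the conservative choice p = 0.5 for comparison, which is not part of the slide's example.

```python
import math

def sample_size_for_proportion(p, e, z=1.96):
    """Smallest n such that z * sqrt(p * (1 - p) / n) <= e (always rounds up)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

n_pilot = sample_size_for_proportion(p=0.12, e=0.03)        # pilot estimate of p
n_conservative = sample_size_for_proportion(p=0.5, e=0.03)  # worst case, p = 0.5
print(n_pilot, n_conservative)
```

Using the pilot estimate instead of 0.5 cuts the required sample by more than half here, which is the practical reason for running a pilot sample first.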

167
Proportion data - Example
❑ Laura was looking at the percentage of duplicate payments. She randomly sampled 50 payments and discovered that four were duplicated (defective). She wants to estimate the defect rate of the overall payments population to within plus or minus 2%, with 95% confidence. If she uses the defect percentage of her sample, calculate the sample size she would need to determine what she wants to know
A. What sample size is needed based on the information above?
B. Once she sees the number, she indicates she is uncertain about the defect rate. Calculate the sample size needed with an unknown defect rate
C. After seeing the sample sizes needed, Laura is concerned about never being able to go to Hawaii again. What could you suggest?
168
❑ N = 50, C = 4, E = 2%, p = 4/50 = 0.08, p(1-p) = (4/50)(46/50) = 0.0736
A. What sample size is needed based on the information above?
$n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{e^{2}} = \frac{1.96^{2} \times 0.0736}{0.02^{2}} = 706.85 \Rightarrow 707$

169
❑ N = 50, C = 4, E = 2%, p = 4/50 = 0.08, p(1-p) = (4/50)(46/50) = 0.0736
B. Once she sees the number, she indicates she is uncertain about the defect rate. Calculate the sample size needed with an unknown defect rate
$n = \frac{Z_{\alpha/2}^{2}\,(0.25)}{e^{2}} = \frac{1.96^{2} \times 0.25}{0.02^{2}} = 2401$
C. After seeing the sample sizes needed, Laura is concerned about never being able to go to Hawaii again. What could you suggest?
Reduce the confidence level required, or accept less precision (a larger margin of error) around the population proportion
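The trade-off in part C can be explored numerically. The sketch below is an illustration under stated assumptions: it derives the critical value from the standard library's `statistics.NormalDist` instead of a Z table, and the grid of confidence levels and margins of error is chosen arbitrarily for demonstration.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_(alpha/2) for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def n_for_proportion(p, e, confidence):
    """Required sample size for a proportion, rounded up."""
    return math.ceil(z_critical(confidence) ** 2 * p * (1 - p) / e ** 2)

p_pilot = 4 / 50  # Laura's sample: 4 duplicates in 50 payments
n_a = n_for_proportion(p_pilot, 0.02, 0.95)  # part A
n_b = n_for_proportion(0.5, 0.02, 0.95)      # part B, unknown defect rate

# Part C: see how n shrinks as confidence drops or the margin of error widens
for confidence in (0.95, 0.90):
    for e in (0.02, 0.03, 0.05):
        n = n_for_proportion(p_pilot, e, confidence)
        print(f"confidence={confidence:.0%} e={e:.0%} n={n}")
```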

170
Determining Sample Size Of Attribute Data
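❑ To determine the required sample size for count (attribute) data, you must know:
1) The desired level of confidence (1 - α), which determines the critical value, Zα/2
2) The acceptable sampling error, e
3) The average number of defects of interest, $\bar{C}$
$n = \frac{\bar{C}\,Z_{\alpha/2}^{2}}{e^{2}}$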

171
Attribute Data - Example
❑ Jeri is looking at the number of claim line defects. There is no prior history on this, so she takes a random sample of 100 claim lines and determines that the average number of defects is 72. She wants to be 95% confident of the overall population average, plus or minus 3 lines
❑ Was her sample of 100 adequate to estimate the overall c?
$n = \frac{\bar{C}\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{72 \times 1.96^{2}}{3^{2}} = 30.7 \Rightarrow 31$
❑ Since 100 ≥ 31, her sample was adequate

172
❑ Jennifer has already completed her first project. She is now analyzing a
suggestion to reduce the number of cell phones the company pays for. While
there is a report from the phone company about the number of calls per cell
phone, Jennifer knows she needs to verify the data on the report for her
Measurement System Analysis
❑ What size sample does she need to be 95% confident in the GRR accuracy if the
average number of calls per cell phone per month is 32 and she wants to be within
+/- 5 calls?
❑ Jennifer says this is great news! I can afford to be more accurate. How about +/-
2 calls? What will you tell her?

173
❑ What size sample does she need to be 95% confident in the GRR accuracy if the average number of calls per cell phone per month is 32 and she wants to be within +/- 5 calls?
$n = \frac{\bar{C}\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{32 \times 1.96^{2}}{5^{2}} = 4.92 \Rightarrow 5$
❑ Jennifer says this is great news! I can afford to be more accurate. How about +/- 2 calls? What will you tell her?
$n = \frac{\bar{C}\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{32 \times 1.96^{2}}{2^{2}} = 30.73 \Rightarrow 31$
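The count-data formula used for Jeri and Jennifer can be checked with a short Python sketch. The function name `sample_size_for_count` is hypothetical and the 95% critical value 1.96 is hard-coded as a default.

```python
import math

def sample_size_for_count(c_bar, e, z=1.96):
    """Required sample size for count data with average count c_bar, rounded up."""
    return math.ceil(c_bar * z ** 2 / e ** 2)

jeri = sample_size_for_count(c_bar=72, e=3)
jennifer_5 = sample_size_for_count(c_bar=32, e=5)
jennifer_2 = sample_size_for_count(c_bar=32, e=2)
print(jeri, jennifer_5, jennifer_2)  # 31 5 31
```

Tightening Jennifer's margin from 5 calls to 2 calls raises the requirement from 5 to 31 phones, a sixfold increase for a 2.5-fold gain in precision.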

174
Sample Size - Summary
Continuous: $n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}$
Proportions: $n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{e^{2}}$
Count: $n = \frac{\bar{C}\,Z_{\alpha/2}^{2}}{e^{2}}$
❑ We often do not have historical defect data, so we begin by taking a sample of 100 for attribute data and at least 30 for continuous data to get an estimate
❑ When the required sample size is too large to investigate, we can decrease our confidence level or increase the acceptable amount of error to get a more reasonable n

175
Analysis of Variance (ANOVA)
176
ANOVA
❑ Analysis of Variance (ANOVA) is used to investigate and model the relationship

between a response variable and one or more independent variables
❑ Analysis of Variance extends the two sample t-test for testing the equality of two
population Means to a more general null hypothesis of comparing the equality of
more than two Means, versus them not all being equal
❑ The classification variable, or factor, usually has three or more levels (If there are
only two levels, a t-test can be used)
❑ Allows you to examine differences among means using multiple comparisons
❑ The ANOVA test statistic is:
$F = \frac{\text{Avg SS}_{between}}{\text{Avg SS}_{within}} = \frac{S^{2}_{between}}{S^{2}_{within}}$
177
What do we want to know?
❑ Is the between group variation large enough to be distinguished from the within
group variation?
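[Figure: data points (X) for two groups with Means μ1 and μ2. The delta (δ) between the group Means is the Between Group Variation, the spread inside one group (level of supplier 1) is the Within Group Variation, and both together make up the Total (Overall) Variation]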
178
Calculating ANOVA
Where:
g = the number of groups (levels in the study)
$x_{ij}$ = the ith individual in the jth group
$n_j$ = the number of individuals in the jth group or level
$\bar{\bar{X}}$ = the grand Mean
$\bar{X}_j$ = the Mean of the jth group or level
Between Group Variation: $\sum_{j=1}^{g} n_j(\bar{X}_j - \bar{\bar{X}})^{2}$
Within Group Variation: $\sum_{j=1}^{g}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^{2}$
Total Variation: $\sum_{j=1}^{g}\sum_{i=1}^{n_j}(X_{ij} - \bar{\bar{X}})^{2}$
[Figure: delta (δ) marks the Between Group Variation; Within Group Variation and Total (Overall) Variation are shown as spreads]

179
Alpha Risk and Pair-Wise t-tests
❑ The alpha risk increases as the number of Means increases with a pair-wise t-test scheme. The overall alpha risk when testing more than one pair of Means using t-tests is:
$1 - (1 - \alpha)^{k}$, where k = number of pairs of means
So, for 7 pairs of means and α = 0.05:
$1 - (1 - 0.05)^{7} = 0.30$, or 30% alpha risk
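A short Python sketch shows how quickly the overall alpha risk grows. The helper `number_of_pairs` is an addition for illustration (it counts how many pair-wise tests a given number of groups would require) and does not appear on the slide.

```python
from math import comb

def familywise_alpha(alpha, k):
    """Overall alpha risk after k independent pair-wise tests at level alpha."""
    return 1 - (1 - alpha) ** k

def number_of_pairs(groups):
    """How many distinct pairs of Means exist among the given number of groups."""
    return comb(groups, 2)

print(round(familywise_alpha(0.05, 7), 4))  # 0.3017

# Four groups, as in the processing center example, need 6 pair-wise tests
k = number_of_pairs(4)
print(k, round(familywise_alpha(0.05, k), 4))
```

This inflation is the motivation for using a single ANOVA F-test in place of many t-tests.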

180
Comparison Of Means
❑ “Are the means of the populations (1, 2, 3, 4) equal, or are there statistically
significant differences?”
Population 1 vs. Population 2 vs. Population 3 vs. Population 4
❑ These populations represent the levels of a factor
❑ Use samples to make inferences about the populations

181
Example
❑ The Sigma Finance Company is attempting to improve the time it takes to process
forms. The team believes there is a difference in the form cycle time between the
four processing centers
Center 1   Center 2   Center 3   Center 4
62         63         68         56
60         67         66         62
63         71         72         60
59         64         67         61
           65         68         63
           66         68         64
                                 63
                                 59

182
ANOVA Table In MINITAB
One-way ANOVA: Center 1, Center 2, Center 3, Center 4
Analysis of Variance
Source DF SS MS F P
Factor 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ---+---------+---------+---------+---
Center 1 4 61.000 1.826 (------*------)
Center 2 6 66.000 2.828 (-----*----)
Center 3 6 68.000 1.673 (----*-----)
Center 4 8 61.000 2.619 (----*----)
---+---------+---------+---------+---
Pooled StDev = 2.366 59.5 63.0 66.5 70.0

183
The Concept
          Center 1   Center 2   Center 3   Center 4
          62         63         68         56
          60         67         66         62
          63         71         72         60
          59         64         67         61
                     65         68         63
                     66         68         64
                                           63
                                           59
Stdev     1.82       2.82       1.67       2.62
AVG       61         66         68         61
❑ Variation Within (error): the spread of the observations inside each center (Stdev row)
❑ Variation Between (Factor): the spread of the center averages (AVG row)
❑ Total Variation = Variation Between + Variation Within
184
The Formulas
❑ Variation Within (error): $SS_{Error\,(Within)} = \sum_{j=1}^{4}(n_j - 1)\,s_j^{2}$
❑ Variation Between (Factor): $SS_{Factor\,(Between)} = \sum_{j=1}^{4} n_j(\bar{y}_j - \bar{y})^{2}$, where the grand mean is $\bar{y} = 64$
❑ Total Variation = Variation Between + Variation Within: SST = SSb (Factor) + SSe (Error)
185
How it works
Center     1       2       3       4
ȳj         61      66      68      61
sj²        3.33    7.95    2.79    6.85
nj         4       6       6       8
Grand mean: $\bar{y} = 64$
SSe: $SS_{Error\,(Within)} = \sum_{j=1}^{4}(n_j - 1)\,s_j^{2}$
SSb: $SS_{Factor\,(Between)} = \sum_{j=1}^{4} n_j(\bar{y}_j - \bar{y})^{2}$
Analysis of Variance
Source DF SS MS F P
Factor 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00

186
Mean Sum Of Squares
❑ MSb: $MS_{Factor\,(Between)} = \frac{\sum_{j=1}^{4} n_j(\bar{y}_j - \bar{y})^{2}}{\text{number of groups} - 1} = \frac{36 + 24 + 96 + 72}{3} = \frac{228}{3} = 76$
❑ MSe: $MS_{Error\,(Within)} = \frac{\sum_{j=1}^{4}(n_j - 1)\,s_j^{2}}{\sum_{j=1}^{4}(n_j - 1)} = \frac{9.99 + 39.75 + 13.95 + 48.02}{20} \approx \frac{112}{20} = 5.60$
❑ F Calculated: $F = \frac{MSb}{MSe} = \frac{76}{5.6} = 13.57$
Source DF SS MS F P
Factor 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00
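The sums of squares, mean squares and F value can be rebuilt from group summaries alone. The sketch below is an illustration, not the MINITAB procedure: it works from the N, Mean and StDev per center as reported in the MINITAB output, and the dictionary layout and variable names are assumptions.

```python
# (n_j, mean_j, stdev_j) per level, taken from the MINITAB output
groups = {
    "Center 1": (4, 61.0, 1.826),
    "Center 2": (6, 66.0, 2.828),
    "Center 3": (6, 68.0, 1.673),
    "Center 4": (8, 61.0, 2.619),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between (Factor): weighted squared distance of each group Mean from the grand Mean
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within (Error): pooled squared deviations inside each group
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"grand mean = {grand_mean:.1f}")
print(f"SSb = {ss_between:.2f}, SSe = {ss_within:.2f}")
print(f"MSb = {ms_between:.2f}, MSe = {ms_within:.2f}, F = {f_stat:.2f}")
```

Small differences from the table in the last decimal place come from the rounded standard deviations.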

187
ANOVA Table
One-way ANOVA: Center 1, Center 2, Center 3, Center 4
Source DF SS MS F P
Factor 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ---+---------+---------+---------+---
Center 1 4 61.000 1.826 (------*------)
Center 2 6 66.000 2.828 (-----*----)
Center 3 6 68.000 1.673 (----*-----)
Center 4 8 61.000 2.619 (----*----)
---+---------+---------+---------+---
Pooled StDev = 2.366 59.5 63.0 66.5 70.0

188
Conclusion
❑ The Sigma Finance Company has made a decision to outsource its contracting
function. Four companies have been identified and one of the criteria is the time
in which they close contracts
❑ Are any of the vendors significantly better than the others in average time and consistency with at least 95% confidence?
▪ Since p-value ≤ significance level (α), we reject the null hypothesis (H0) and conclude there is a difference between the four companies

189
How To Set Up In MINITAB
Follow the hypothesis test roadmap!

190
Main Effects Plot

191
Main Effects Plot
[Figure: Main Effects Plot - Data Means for Stacked; x-axis: Center, y-axis: Stacked (61 to 68), with the Grand Average marked]

192
Three Samples Example
❑ We have three potential suppliers that claim to have equal levels of quality. Supplier B provides a considerably lower purchase price than either of the other two vendors. We would like to choose the lowest cost supplier but we must ensure that we do not affect the quality of our raw material.
We would like to test the data to determine whether there is a difference between the three suppliers
193
Test for Normality
❑ All three suppliers' samples are Normally Distributed: Supplier A (P-value 0.568), Supplier B (P-value 0.385), Supplier C (P-value 0.910)
[Figure: Normal probability plots (Percent vs. data value) for Supplier A, Supplier B and Supplier C]
Supplier     Mean    StDev    N   AD      P-Value
Supplier A   3.664   0.4401   5   0.246   0.568
Supplier B   3.968   0.2051   5   0.314   0.385
Supplier C   4.03    0.4177   5   0.148   0.910

194
Test for Equal Variance
❑ Test for Equal Variance (Must stack data to create “Response” & “Factors”):
[Figure: Test for Equal Variances for Data; intervals for Suppliers (Supplier A, Supplier B, Supplier C) on an axis from 0.0 to 1.8]
Bartlett's Test: Test Statistic 2.11, P-Value 0.348
Levene's Test: Test Statistic 0.59, P-Value 0.568

195
ANOVA MINITAB
Stat>ANOVA>One-Way Unstacked
Enter Stacked Supplier data in “Responses:”
Click on “Graphs…”, check “Boxplots of data”
196
ANOVA MINITAB
What does this graph tell us?
[Figure: Boxplot of Supplier A, Supplier B, Supplier C; y-axis: Data (3.0 to 4.6)]

197
ANOVA Session Window
P-value > .05 - No Difference
between suppliers
One-way ANOVA: Supplier A, Supplier B, Supplier C
Source DF SS MS F P
Factor 2 0.384 0.192 1.40 0.284
Error 12 1.641 0.137
Total 14 2.025
S = 0.3698 R-Sq = 18.95% R-Sq(adj) = 5.44%
Level N Mean StDev
Supplier A 5 3.6640 0.4401
Supplier B 5 3.9680 0.2051
Supplier C 5 4.0300 0.4177
(Menu path: Stat>ANOVA>One Way (unstacked))
Individual 95% CIs For Mean Based on Pooled StDev

Level +---------+---------+---------+---------
Supplier A (-----------*-----------)
Supplier B (-----------*-----------)
Supplier C (-----------*-----------)
+---------+---------+---------+---------
3.30 3.60 3.90 4.20
Pooled StDev = 0.3698

198
ANOVA Session Window
One-way ANOVA: Supplier A, Supplier B, Supplier C
Source DF SS MS F P
Factor 2 0.384 0.192 1.40 0.284
Error 12 1.641 0.137
Total 14 2.025
S = 0.3698 R-Sq = 18.95% R-Sq(adj) = 5.44%

Level N Mean StDev
Supplier A 5 3.6640 0.4401
Supplier B 5 3.9680 0.2051
Supplier C 5 4.0300 0.4177

Individual 95% CIs For Mean Based on Pooled StDev
Level +---------+---------+---------+---------
Supplier A (-----------*-----------)
Supplier B (-----------*-----------)
Supplier C (-----------*-----------)
+---------+---------+---------+---------
3.30 3.60 3.90 4.20
Pooled StDev = 0.3698

F-Calc vs. F-Critical (α = 0.05; rows D = denominator DF, columns N = numerator DF)
D/N   1        2        3        4
1     161.40   199.50   215.70   224.60
2     18.51    19.00    19.16    19.25
3     10.13    9.55     9.28     9.12
4     7.71     6.94     6.59     6.39
5     6.61     5.79     5.41     5.19
6     5.99     5.14     4.76     4.53
7     5.59     4.74     4.35     4.12
8     5.32     4.46     4.07     3.84
9     5.12     4.26     3.86     3.63
10    4.96     4.10     3.71     3.48
11    4.84     3.98     3.59     3.36
12    4.75     3.89     3.49     3.26
13    4.67     3.81     3.41     3.18
14    4.60     3.74     3.34     3.11
15    4.54     3.68     3.29     3.06
❑ Reject H0 if $F_{calc} \ge F_{\alpha,2,12}$
❑ F-Calc = 1.40 is less than F-Critical = 3.89, so we conclude there is no difference between suppliers
199
ANOVA Assumptions
1. Observations are adequately described by the model
2. Errors are normally and independently distributed
3. Homogeneity of variance among factor levels
❑ In one-way ANOVA, model adequacy can be checked by either of the following:
▪ Check the data for Normality at each level and for homogeneity of variance
across all levels
▪ Examine the residuals (a residual is the difference in what the model predicts
and the true observation)
o Normal plot of the residuals
o Residuals versus fits
o Residuals versus order

200
Residual Plots

201
Histogram of Residuals
[Figure: Histogram of the Residuals (responses are Supplier A, Supplier B, Supplier C); x-axis: Residual (-0.6 to 0.6), y-axis: Frequency]
The Histogram of residuals should show a bell shaped curve.
202
Normal Probability Plot of Residuals
[Figure: Normal Probability Plot of the Residuals; x-axis: Residual (-1.0 to 1.0), y-axis: Percent]
Normality plot of the residuals should follow a straight line
Results of our example look good
The Normality assumption is satisfied
203
Residuals Versus Fits
[Figure: Residuals Versus the Fitted Values; x-axis: Fitted Value (3.65 to 4.05), y-axis: Residual (-0.50 to 0.75)]
The plot of residuals versus fits examines constant variance
The plot should be structureless with no outliers present
204
Fisher’s Least Significant Difference
❑ A one-way ANOVA is used to determine whether or not there is a statistically

significant difference between the means of three or more independent groups
❑ If the p-value from the ANOVA is less than some significance level (like α = .05), we
can reject the null hypothesis and conclude that at least one of the group means is
different from the others
❑ But in order to find out exactly which groups are different from each other, we
must conduct a post-hoc test
❑ One commonly used post-hoc test is Fisher’s least significant difference test

205
Fisher’s Least Significant Difference
❑ To perform this test, we first calculate the following test statistic:
$LSD = t_{\alpha/2,\ DF\ for\ groups}\sqrt{MS_{Groups}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Where:
$t_{\alpha/2,\ DF\ for\ groups}$: the t-critical value from the t-distribution with significance level α, where DF for groups is the degrees of freedom within groups from the ANOVA table
$MS_{Groups}$: the mean square within groups from the ANOVA table
$n_1, n_2$: the sample sizes of the two groups being compared
❑ We can then compare the mean difference between each pair of groups to this test statistic. If the absolute value of the mean difference between two groups is greater than the test statistic, we can declare that there is a statistically significant difference between the group means
206
Example: Fisher’s LSD Test
❑ Suppose a professor wants to know whether or not three different studying

techniques lead to different exam scores among students. To test this, she
randomly assigns 10 students to use each studying technique and records their
exam scores
❑ The following table shows the exam scores for
each student based on the studying technique
they used:

207
❑ The professor performs a one-way ANOVA and gets the following results:

208
❑ Since the p-value in the ANOVA table (.018771) is less than .05, we can conclude
that not all of the mean exam scores between the three groups are equal
❑ Thus, we can proceed to perform Fisher’s least significant difference test to

determine which group means are different
❑ Using the output of the ANOVA, we can calculate Fisher’s test statistic as:
$LSD = t_{\alpha/2,\ DF\ for\ groups}\sqrt{MS_{Groups}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
$LSD = t_{0.025,\ 27}\sqrt{36.948\left(\frac{1}{10} + \frac{1}{10}\right)} = 2.052\sqrt{7.3896} = 5.578$

209
❑ We can then calculate the absolute mean difference between each group:
▪ Technique 1 vs. Technique 2: |80 – 85.8| = 5.8
▪ Technique 1 vs. Technique 3: |80 – 88| = 8
▪ Technique 2 vs. Technique 3: |85.8 – 88| = 2.2
❑ The absolute mean differences between technique 1 vs. technique 2 and technique
1 vs. technique 3 are greater than Fisher’s test statistic, thus we can conclude that
these techniques lead to statistically significantly different mean exam scores
❑ We can also conclude that there is no significant difference in mean exam scores
between technique 2 and technique 3
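The LSD comparison can be sketched in Python as follows. This is a hedged illustration: the critical value t(0.025, 27) ≈ 2.052 is entered as a constant read from a t table (an assumption, since the standard library has no t-distribution), and the group means and mean square within are the figures quoted in the example.

```python
import math
from itertools import combinations

T_CRITICAL = 2.052   # t(0.025, 27) from a t table (assumed constant)
MS_WITHIN = 36.948   # mean square within groups, from the ANOVA table
N_PER_GROUP = 10     # students per studying technique

means = {"Technique 1": 80.0, "Technique 2": 85.8, "Technique 3": 88.0}

# Fisher's least significant difference for two groups of equal size
lsd = T_CRITICAL * math.sqrt(MS_WITHIN * (1 / N_PER_GROUP + 1 / N_PER_GROUP))
print(f"LSD = {lsd:.3f}")  # LSD = 5.578

# A pair differs significantly when its absolute mean difference exceeds the LSD
significant = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    significant[(a, b)] = diff > lsd
    print(f"{a} vs. {b}: |diff| = {diff:.1f}, significant = {significant[(a, b)]}")
```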

210
