Module - 4 - Analyze Phase - Oct 20
Module 4
Agenda
– Hypothesis Testing
– Confidence Intervals
– Sample Size
– Chi Square
– Multi-variate Studies
Saudi Aramco: Company General Use
Overview of Analyze Phase
In the Analyze phase, we sift through the various X's to focus on the critical X's
Identify problem’s root causes through process & data analysis
Define phase: Symptoms "Y"
Measure phase: Quantify output; Fishbones, C&E matrix, FMEA, Process Maps, etc. (30 - 50 Inputs)
Prioritize causes
❑ Several common concepts recur throughout this phase; if they are understood up front, the follow-on lessons are easier:
▪ Hypothesis Testing
▪ Confidence Intervals
▪ Chi Square
What is Hypothesis Testing (1/2)
▪ The reject rate from Machine 5 has improved as a result of our work
▪ My average Chess Ranking for 2022 is lower than my average Chess Ranking for 2023
What is Hypothesis Testing (2/2)
▪ The measured sample value is then compared with the hypothesized value of the population, and we decide whether or not to support the hypothesis
❑ The key question becomes: how can we reliably use the values from a single sample to draw conclusions about a population value?
❑ A cookie shop sells a famous product: gingerbread cookies! The owner believes his product is the most delicious one in the world
❑ The owner also claims that the average weight (μ) of each product (a bag of gingerbread cookies) is 500 g
❑ Does the average weight of a bag of cookies really equal 500 g? What if the owner deceives customers and sells bags containing less than 500 g of cookies? How do we validate his claim?
❑ To carry out a hypothesis test, we first set up the null hypothesis (H0) and the alternative hypothesis (H1)
❑ As industrial engineers, we are taught not to suspect others without evidence
❑ So, we assume the owner is honest about his business (H0). To check whether his bags weigh less than 500 g, we need to collect data and find enough evidence to support our suspicion (H1). The hypothesis statement is therefore set up as follows:
H0: Average weight of one bag of cookies (μ) = 500 g
H1: Average weight of one bag of cookies (μ) < 500 g
❑ However, if the owner's claim is not true and the mean weight of a bag is less than 500 g, the population distribution would look different (like any of the distributions in the picture on the right)
How to test the owner's statement?
❑ So the next question is: how do we test our hypothesis statement?
❑ Why not just weigh every bag of cookies, so that we know the exact population distribution?
❑ In practice, it is almost impossible to collect data on the whole population to calculate its parameters (population mean μ, population standard deviation σ, etc.). That is why inferential statistics uses samples and sample statistics (sample mean x̄, sample standard deviation s, etc.) as estimators to infer the unknown population parameters
❑ Before we collect sample data and calculate the test statistic to test the hypothesis statement, we need to understand the concept of a sampling distribution
▪ If we draw many samples from the population and calculate the sample mean (x̄) of each one, we can plot the distribution of these sample means. Since this distribution is built from a sample statistic, it is called the Sampling Distribution of the sample mean (x̄)
❑ The same idea applies to other statistics. For example, if we calculate the test statistic
from each sample dataset, we could get the sampling distribution of the test statistic.
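The idea of a sampling distribution can be made concrete with a small simulation. This is a minimal sketch using only the Python standard library: it assumes a Normal population with mean 500 g and a hypothetical standard deviation of 30 g (the SD is an illustration choice, not a value from the slides), draws many samples of n = 25, and collects the sample means.

```python
import random
import statistics


def sampling_distribution_of_mean(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples samples of size n from a Normal population and
    return the list of their sample means (x-bar values)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population: mean 500 g, SD 30 g (the SD is an assumption)
means = sampling_distribution_of_mean(500, 30, n=25, num_samples=5000)
print(round(statistics.fmean(means), 1))  # close to 500
print(round(statistics.stdev(means), 2))  # close to 30 / sqrt(25) = 6
```

The spread of the sample means is much smaller than the spread of individual bags, which is the Central Limit Theorem behavior the following slides rely on.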
Sampling Distribution
❑ It shows how likely (the probability) each value of the statistic is to appear if we sample from the population many times
❑ So, we go to the cookie shop and randomly pick 25 bags of cookies (n = 25) as our sample data, and we calculate that the mean weight (x̄) of this sample is 485 g
❑ The first part of testing is to compare our sample statistic with the null hypothesis, so that we know how far our sample statistic is from the expected value
❑ What does this mean? In our case, it means we assume that the population mean weight of one bag of cookies really equals 500 g
❑ If this statement is true, then according to the Central Limit Theorem, sampling from this population many times would give a sampling distribution of the sample mean (x̄) like the picture below, centered at 500 g (the mean of the sample means = 500 g)
❑ Imagine there are numerous distributions, each with its own mean and standard deviation. You really don't want to calculate the probability over and over again for each one
❑ So, what should we do? We standardize our values so that the mean of the distribution always equals zero (and its standard deviation equals one)
Z-score and Test Statistic
❑ That way, we don't need to calculate the area case by case. All we need to do is standardize our data
❑ How do we standardize? In our case, we use the z-score to transform our data, and the z-score is our Test Statistic
❑ You may have heard of different kinds of statistical tests, such as the z-test, t-test, and chi-square test. Why do we need different kinds of tests?
❑ Because we may need to test different types of data (categorical or quantitative), we may have different testing purposes (testing a mean or a proportion), our data may follow different distributions, and we may know only limited attributes of our data. Hence, choosing a suitable test is another crucial task
❑ In this case, since we are testing a mean and we assume the population data are normally distributed with a known population standard deviation (σ), the z-score is the appropriate test statistic (a z-test)
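The z-score used as the test statistic here is z = (x̄ − μ0) / (σ/√n). The sketch below computes it for the cookie example. The population standard deviation is not stated in this section, so σ = 30 g is an assumption, chosen because it reproduces the p-value of 0.0062 quoted on the following slides.

```python
import math


def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """One-sample z-test statistic: how many standard errors x-bar lies from mu0."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Cookie example: x-bar = 485 g, mu0 = 500 g, n = 25 bags.
# ASSUMPTION: sigma = 30 g (the population SD is not stated in this section).
z = z_statistic(485, 500, sigma=30, n=25)
print(z)  # -2.5
```

A sample mean 15 g below the claim is 2.5 standard errors (σ/√n = 6 g) below it.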
❑ So, we know how far our test statistic is from its expected value when the null hypothesis is true. What we really want to know is: how likely (the probability) are we to get this sample data if the null hypothesis is true?
❑ To answer this question, we need to calculate a probability. As you know, the probability between two points is the area under the sampling distribution curve between those two points
❑ So here, we do not calculate the probability of a specific point; instead, we calculate the probability from our test statistic out to infinity (in the direction of the alternative hypothesis). This is the cumulative probability of all the points farther away from the expected test statistic than our test statistic
❑ The p-value is the probability of obtaining test results at least as extreme as the results
actually observed, under the assumption that the null hypothesis is correct
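The p-value is a tail area under the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library and handles the three directions of the alternative hypothesis. The input z = -2.5 is an assumption: it is the value consistent with x̄ = 485 g, μ0 = 500 g, n = 25 and an assumed σ = 30 g.

```python
from statistics import NormalDist


def z_test_p_value(z, tail):
    """Area under the standard normal curve beyond z, in the direction of H1."""
    dist = NormalDist()  # mean 0, SD 1
    if tail == "left":   # H1: mu < mu0
        return dist.cdf(z)
    if tail == "right":  # H1: mu > mu0
        return 1 - dist.cdf(z)
    if tail == "two":    # H1: mu != mu0
        return 2 * (1 - dist.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")


# ASSUMPTION: z = -2.5 (x-bar = 485 g, mu0 = 500 g, n = 25, sigma = 30 g assumed)
print(round(z_test_p_value(-2.5, "left"), 4))  # 0.0062
```

The cookie test is left-tailed because H1 states μ < 500 g.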
❑ Here the p-value is 0.0062. It is a small number, but what does it mean?
❑ It means that if our null hypothesis is true (the population mean really equals 500 g) and we sample from this population 1,000 times, we would expect only about 6.2 of those samples to have a sample mean of 485 g or less. There are two possible explanations:
1. The null hypothesis is true, and we simply obtained a very rare sample
2. The assumption that the null hypothesis is true is incorrect. This sample data (sample mean = 485 g) actually comes from another population distribution, in which a sample mean of 485 g is more likely to occur
P-value
❑ So, a very small p-value means that either we obtained a very rare sample or our assumption (that the null hypothesis is true) is incorrect
❑ The next question is: now that we have the p-value, how do we use it to decide when to reject the null hypothesis? In other words, how small must the p-value be before we are willing to say that this sample comes from another population?
❑ Here we introduce the decision criterion: the significance level (α). The significance level is a predefined value that must be set before carrying out the hypothesis test. It is simply a threshold that tells us when to reject the null hypothesis
▪ If p-value ≤ significance level (α), we reject the null hypothesis (H0)
▪ If p-value > significance level (α), we fail to reject the null hypothesis (H0)
Significance Level
❑ In the picture below, the red area is the significance level (in our case, 0.05). We use the significance level as our criterion: if the p-value falls within (is less than or equal to) the red area, we reject H0; if the p-value exceeds (is greater than) the red area, we fail to reject H0
❑ With α = 0.05, our p-value of 0.0062 is smaller than α, so we reject H0. If we had set α = 0.005 instead, the result would be different: since 0.0062 > 0.005, we would fail to reject H0. Here is the tricky part: since the significance level is subjective, we need to determine it before testing. Otherwise, we may very well fool ourselves after seeing the p-value
❑ Part 1: To test whether our sample data support the alternative hypothesis, we first assume the null hypothesis is true, so that we know how far our sample data are from the expected value given by the null hypothesis. The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct
❑ Part 2: Based on the distribution, data type, purpose, and known attributes of our data, choose an appropriate test statistic and calculate it for our sample data. (The test statistic shows how far our sample data are from the expected value)
❑ Part 3: Calculate the probability (area under the sampling distribution curve) from the test statistic out to infinity (more extreme values), in the direction given by your alternative hypothesis (left-tailed, right-tailed, or two-tailed)
A small p-value means either (1) we obtained a rare sample from the null hypothesis distribution, or (2) this sample data is not from our null hypothesis distribution but from another population distribution (so we consider rejecting the null hypothesis)
To determine whether we can reject the null hypothesis, we compare the p-value with the predefined significance level (threshold):
▪ If p-value ≤ significance level (α), we reject the null hypothesis (H0)
▪ If p-value > significance level (α), we fail to reject the null hypothesis (H0)
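The decision rule is a single comparison. This minimal sketch applies it to the cookie example's p-value of 0.0062 at α = 0.05 and α = 0.005, showing why α must be fixed before the test.

```python
def decide(p_value, alpha):
    """Decision rule: reject H0 when the p-value is at or below alpha."""
    if p_value <= alpha:
        return "Reject H0"
    return "Fail to reject H0"


# Cookie example p-value checked against two different significance levels
for alpha in (0.05, 0.005):
    print(alpha, decide(0.0062, alpha))
# 0.05 Reject H0
# 0.005 Fail to reject H0
```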
❑ The alpha risk or Type 1 Error (generally called the "Producer's Risk") is the probability that we could be wrong in saying that something is "different" when it is not (rejecting H0 when H0 is true)
❑ It is an assessment of the likelihood that the observed difference could have occurred
by random chance. Alpha is the primary decision-making tool of most statistical tests
[Figure: decision table by Actual Conditions, "Not Different (Ho is True)" vs. "Different (Ho is False)", with a distribution whose two tails are labeled "Region of DOUBT"]
❑ Alpha (α) is known as the significance level; the probability of being wrong (risk level)
❑ The beta risk or Type 2 Error (also called the “Consumer’s Risk”) is the probability that
we could be wrong in saying that two or more things are the same when, in fact, they
are different
❑ Beta Risk is the probability of failing to reject the null hypothesis when a difference
exists
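Beta risk can be computed once we assume what the true mean actually is. This sketch uses the cookie example's left-tailed z-test with α = 0.05, n = 25 and an assumed σ = 30 g. The true mean of 485 g is hypothetical: beta is the chance that x̄ still lands above the rejection cutoff, so we miss a real 15 g shortfall.

```python
import math
from statistics import NormalDist


def beta_left_tailed_z(mu0, true_mu, sigma, n, alpha):
    """Type 2 error (beta) of a left-tailed z-test of H0: mu = mu0 vs H1: mu < mu0."""
    se = sigma / math.sqrt(n)
    # H0 is rejected when x-bar falls below this critical value
    critical_xbar = mu0 + NormalDist().inv_cdf(alpha) * se
    # beta = P(x-bar >= critical value | the true mean is true_mu), i.e. we miss the difference
    return 1 - NormalDist(true_mu, se).cdf(critical_xbar)


# HYPOTHETICAL: the true mean is 485 g; sigma = 30 g is also assumed
beta = beta_left_tailed_z(mu0=500, true_mu=485, sigma=30, n=25, alpha=0.05)
print(round(beta, 3), round(1 - beta, 3))  # 0.196 0.804 (beta and power)
```

If the true mean equals μ0, this function returns 1 − α. In that case "failing to reject" is the correct decision.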
[Figure: distribution if H0 is true, centered at the H0 value; the "Reject H0" region has area Pr(Type 1 error) = 0.05]
DECIDE: What does the evidence suggest? Reject Ho, or fail to reject Ho?
[Figure: hypothesis test roadmap. Normal, Continuous data: Means: 1-Sample t-test, 2-Sample t-test, ANOVA; Variance: χ2-test or MINITAB Descriptive Statistics (use CI), 2 Variances, Test For Equal Variances. Discrete (Attribute Data): Proportion: 1-Sample P Test, 2-Sample P Test, χ2-test. Normal, Non Normal, and Attribute Data branches, each split into one sample and two samples]
o It is just a decision
❑ If the decision is to "Fail to Reject Ho," then the conclusion should read: "There isn't sufficient evidence at the α level of significance to show that [state the alternative hypothesis]."
Hypothesis test for μ (σ known)
Null hypothesis: H0: μ = μ0
When the sample size is large, the z tests for case I are easily modified to yield valid test procedures without requiring either a normal population distribution or a known σ
A large n (> 30) implies that the standardized variable $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has approximately a standard normal distribution
In general, for two independent samples:
Null hypothesis: H0: μ1 − μ2 = Δ0
Furthermore, using $S_1^2$ and $S_2^2$ in place of $\sigma_1^2$ and $\sigma_2^2$ gives a variable whose distribution is approximately standard normal (large samples):
$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{S_1^2/m + S_2^2/n}}$
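Theorem
Let X1,…, Xm be a random sample from a normal distribution with variance $\sigma_1^2$, let Y1,…, Yn be another random sample (independent of the Xi's) from a normal distribution with variance $\sigma_2^2$, and let $S_1^2$ and $S_2^2$ denote the two sample variances. Then the rv
$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$
has an F distribution with ν1 = m − 1 and ν2 = n − 1 degrees of freedom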
Null hypothesis: H0: σ1² = σ2²
Testing the ratio of variances and testing the equality of variances are the same test: when two variances are equal, their ratio is close to one, or equivalently their difference is close to zero
Bartlett's Test for Equality of Variances
❑ To check the equality of variances between two samples (two groups of data), we use the F-test
❑ To test the equality of variances among more than two groups, we use Bartlett's test
❑ Bartlett's test is used to test if k samples have equal variances. Equal variances
across samples is called homogeneity of variances.
❑ Some statistical tests, for example the analysis of variance (ANOVA), assume that
variances are equal across groups or samples
❑ Levene's test is an alternative to Bartlett's test that is less sensitive to departures from normality
❑ Some common statistical methods assume that variances of the populations from
which different samples are drawn are equal. Bartlett's test assesses this
assumption. It tests the null hypothesis that the population variances are equal
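Bartlett's test statistic is:
$T = \frac{(N - k)\ln S_p^2 - \sum_{i=1}^{k}(N_i - 1)\ln S_i^2}{1 + \frac{1}{3(k - 1)}\left(\sum_{i=1}^{k}\frac{1}{N_i - 1} - \frac{1}{N - k}\right)}$
In the above, Si² is the variance of the ith group, N is the total sample size, Ni is the sample size of the ith group, k is the number of groups, and Sp² is the pooled variance. The pooled variance is a weighted average of the group variances and is defined as:
$S_p^2 = \frac{\sum_{i=1}^{k}(N_i - 1)S_i^2}{N - k}$
Under H0, T approximately follows a chi-square distribution with k − 1 degrees of freedom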
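Bartlett's statistic can be computed directly from the group sizes and sample variances. This is a minimal standard-library sketch of the standard formula (the form given in the NIST/SEMATECH handbook). The data are hypothetical. Because the standard library has no chi-square distribution, the sketch compares T with a table critical value instead of returning a p-value.

```python
import math
import statistics


def bartlett_statistic(groups):
    """Bartlett's test statistic T for k groups, each a list of numbers."""
    k = len(groups)
    sizes = [len(g) for g in groups]                      # N_i
    variances = [statistics.variance(g) for g in groups]  # S_i^2 (sample variances)
    N = sum(sizes)
    # Pooled variance: weighted average of the group variances
    sp2 = sum((n_i - 1) * s2 for n_i, s2 in zip(sizes, variances)) / (N - k)
    numerator = (N - k) * math.log(sp2) - sum(
        (n_i - 1) * math.log(s2) for n_i, s2 in zip(sizes, variances))
    correction = 1 + (sum(1 / (n_i - 1) for n_i in sizes) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical data: two groups with very different spread
t_stat = bartlett_statistic([[1, 2, 3], [10, 20, 30]])
print(round(t_stat, 3))  # 5.182, above the chi-square critical value 3.841 (1 d.f., alpha = 0.05)
```

Groups with identical variances give T = 0, since the pooled variance then equals every group variance.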
❑ A Paired t-test is used to compare the Means of two measurements taken on the same samples; it is generally used as a before-and-after test
❑ This is appropriate for testing the difference between two Means when the data are
paired and the paired differences follow a Normal Distribution
❑ This matching allows you to account for the variability between the pairs by analyzing the paired differences, usually called delta (d), resulting in a smaller error term and thus increasing the sensitivity of the Hypothesis Test or confidence interval.
Ho: μδ = μo
Ha: μδ ≠ μo
(δ = difference between the before and after measurements)
❑ We are interested in changing the sole material for a popular brand of shoes for
children. In order to account for variation in activity of children wearing the shoes,
each child will wear one shoe of each type of sole material. The sole material will
be randomly assigned to either the left or right shoe.
❑ 2. Statistical Problem:
Ho: μδ = 0
Ha: μδ ≠ 0
α = 0.05 β = 0.10
❑ Following the Hypothesis Test roadmap, we first test the AB-Delta distribution for
Normality
Reject the null hypothesis: the 95% confidence interval for the mean difference does not include zero, so we are 95% confident that there is a difference in wear between the two materials
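The paired t-test works on the differences d, using t = (d̄ − μ0) / (s_d/√n) with n − 1 degrees of freedom. The wear readings below are hypothetical, not the shoe study's data. The standard library has no t quantile function, so the result would be compared with a t table (2.776 for 4 d.f., two-tailed, α = 0.05).

```python
import math
import statistics


def paired_t_statistic(before, after, mu0=0.0):
    """Paired t-test statistic on the differences, plus its degrees of freedom."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]  # delta (d) for each pair
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)
    t = (d_bar - mu0) / (s_d / math.sqrt(n))
    return t, n - 1


# HYPOTHETICAL wear readings for 5 children (material A shoe, material B shoe)
material_a = [10, 12, 14, 16, 18]
material_b = [9, 10, 11, 12, 13]
t, df = paired_t_statistic(material_a, material_b)
print(round(t, 4), df)  # 4.2426 4
```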
Hypothesis Testing (Non-normal data)
Non-Normal Hypothesis Tests
❑ At this point we have covered the tests for determining significance for Normal Data. We will continue to follow the roadmap to complete the tests for Non-normal Continuous Data
❑ Later in the module we will use another roadmap that was designed for Discrete
Data
❑ Recall that Discrete Data does not follow a Normal Distribution, and because it is not Continuous Data, there is a separate set of tests to properly analyze it
▪ The two indices of interest (X-Bar and s) depend on the data being Normal
▪ For problem-solving purposes, because we don't want to make a bad decision, having Normal Data is so critical that with EVERY statistical test, the first thing we do is check the data for Normality
▪ Kurtosis
▪ Granularity
Non-Normal Distributions
[Figure: four causes of Non-normality: 1 Skewed, 2 Kurtosis, 3 Multi-Modal, 4 Granularity. Example histograms (Frequency) for Machine A vs. Machine B, Operator A vs. Operator B, Payment Method A vs. Payment Method B (Combined), and Interviewer A vs. Interviewer B, showing Sample A + Sample B combined; scatter plot of Y vs. X with the Marginal Distribution of Y and the Marginal Distribution of X]
1-5 Interactions
If you find that two inputs have a large impact on Y but would not affect Y by themselves, this is called an Interaction
[Figure: Y vs. Room Temperature for Spray On and Spray Off (No Spray)]
[Figure: Y plotted against Time, with the Marginal Distribution of Y]
Often seen when tooling requires “warming up”, tool wear, chemical bath
depletions, ambient temperature effect on tooling
Non-Normal Right (Positive) Skewed
Mean 70.000, StDev 10.000, Variance 100.000, Skewness 2.41707, Kurtosis 6.93041, N 500
Minimum 62.921, 1st Quartile 63.647, Median 65.695, 3rd Quartile 72.821, Maximum 130.366
95% Confidence Interval for Mean: 69.121 to 70.879
95% Confidence Interval for Median: 65.260 to 66.501
95% Confidence Interval for StDev: 9.416 to 10.662
[Figure: histogram and 95% Confidence Intervals for the Mean and Median]
Leptokurtic: Peaked with Long-Tails
Platykurtic: Flat with Short-Tails
Summary: 3rd Quartile 56.729, Maximum 64.140; 95% Confidence Interval for Mean: 51.585 to 53.076; 95% Confidence Interval for Median: 50.932 to 53.741
Summary: Mean 51.389, StDev 12.998, Variance 168.960, Skewness -0.06752, Kurtosis 3.08271, N 125; Minimum 0.813, 1st Quartile 46.488, Median 52.017, 3rd Quartile 55.620, Maximum 94.795; 95% Confidence Interval for Mean: 49.088 to 53.691; 95% Confidence Interval for Median: 50.584 to 52.666; 95% Confidence Interval for StDev: 11.562 to 14.845
(Multiple Processes) Multiple Set-Ups, Multiple Batches, Multiple Machines, Tool Wear (over time)
2-2 Sorting or Selecting: Scrapping product that falls outside the spec limits
2-3 Trends or Patterns: Lack of Independence in the data (example: tool wear, chemical bath)
3 Catastrophic failures
(example: testing voltage on a motor and the motor shorts out so we
get a zero reading etc.)
This is an example of a Bi-Modal Distribution. Interestingly, each peak is actually a Normal Distribution, but when the data is viewed as a group it is obviously not Normal
Summary: Mean 79.570, StDev 32.385, Variance 1048.785, Skewness 0.00716, Kurtosis -1.63184, N 500; Minimum 21.341, 1st Quartile 48.265, Median 83.772, 3rd Quartile 110.379, Maximum 142.391; 95% Confidence Interval for Mean: 76.724 to 82.416; 95% Confidence Interval for Median: 62.354 to 97.233; 95% Confidence Interval for StDev: 30.494 to 34.527
2 Different Distributions
-2 different machines
-2 different operators
-2 different administrators
Extreme Bi-Modal (Outliers)
Summary for ExtremeBiModal
Anderson-Darling Normality Test: A-Squared 22.88, P-Value < 0.005
Mean 58.487, StDev 21.751, Variance 473.106, Skewness -0.59479, Kurtosis -1.03403, N 385
Minimum 19.987, 1st Quartile 26.920, Median 66.161, 3rd Quartile 74.140, Maximum 103.301
95% Confidence Interval for Mean: 56.308 to 60.667
95% Confidence Interval for Median: 63.410 to 67.793
95% Confidence Interval for StDev: 20.315 to 23.406
If you see an extreme outlier, it usually has its own cause or its own source of variation. It's relatively easy to isolate the cause by looking at the X axis of the Histogram
Summary: Minimum 22.629, 1st Quartile 24.128, Median 25.053, 3rd Quartile 25.971, Maximum 46.000; 95% Confidence Interval for Mean: 25.326 to 27.175; 95% Confidence Interval for Median: 24.836 to 25.297; 95% Confidence Interval for StDev: 4.274 to 5.594
❑ Non-normal Distributions can give more Root Cause information than Normal data
(the nature of why…)
[Roadmap: Non Normal data, branching into two samples and one sample]
❑ Ho: σ1 = σ2 = σ3 …
Normality test: Stat > Basic Statistics > Normality test…
[Figure: normal probability plot (Percent) of Rot 2, and Test for Equal Variances for Rot 2 by Factors2]
Levene's Test: Test Statistic 0.03, P-Value 0.860
❑ When testing >2 samples with Normal Distribution, use Bartlett’s test:
❑ When testing 2 or more samples with Non-normal Distributions, use Levene’s test:
Our focus for this module when working with Non-normal Distributions
This Graphical Summary provides the confidence interval for the Median
With Normal Data, notice the symmetrical shape of the distribution and how the Mean and the Median are centered
With skewed data, the Mean is influenced by the outliers. Notice that the Median is still centered
[Figure: 95% confidence intervals for the Mean and Median, Normal data vs. skewed data]
❑ 1-Sample Sign: performs a one-sample sign test of the Median and calculates the
corresponding point estimate and confidence interval. Use this test as an
alternative to one-sample Z and one-sample t-tests
❑ This test is used when you want to compare the Median of one distribution to a target
value
❑ You must have at least one column of numeric data. If there is more than one column of data, MINITAB™ performs a one-sample sign test separately for each column
❑ The hypotheses:
H0: M = Mtarget
Ha: M ≠ Mtarget
❑ Example: Our facility requires a cycle time of 63 minutes from an improved process. This process supports the customer service division and has become a bottleneck to the completion of order processing. To alleviate the bottleneck, the improved process must perform at least at the expected 63 minutes
❑ Ho: M = 63
Ha: M ≠ 63
As you can see, the P-value is less than 0.05, so we reject the null hypothesis; the data support the alternative hypothesis that the Median is different from 63.
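The sign test only counts how many observations fall above and below the target. Under H0 those counts follow a Binomial(n, 0.5) distribution. This is a minimal exact-binomial sketch using the standard library. The cycle times are hypothetical, not the facility's data, and this is not a reproduction of Minitab's output.

```python
from math import comb


def sign_test_p_value(data, target):
    """Two-sided exact sign test of H0: median = target.
    Values equal to the target carry no sign and are dropped."""
    above = sum(1 for x in data if x > target)
    below = sum(1 for x in data if x < target)
    n = above + below
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5); doubled for the two-sided alternative
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# HYPOTHETICAL cycle times (minutes), tested against the 63-minute target
cycle_times = [65, 70, 68, 72, 66, 61, 69, 74, 60, 67]
print(sign_test_p_value(cycle_times, 63))  # 0.109375
```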
Mann-Whitney Example
❑ The Mann-Whitney test is used to test if the Medians for 2 samples are different.
Ho: M1 = M2
Ha: M1 ≠ M2
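The Mann-Whitney test ranks both samples together and compares the rank sum of one sample with what H0 would predict. This sketch computes U1 = R1 − n1(n1 + 1)/2, using average ranks for ties, and a large-sample z approximation without a tie correction. It is an illustration with made-up numbers, not Minitab's exact procedure.

```python
import math


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, z) using the large-sample normal approximation (no tie correction)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mean_u) / sd_u


u1, z = mann_whitney_u([1, 2, 3], [4, 5, 6])  # hypothetical, fully separated samples
print(u1)  # 0.0
```

With 200 points per machine, as in this example, the normal approximation is reasonable. For tiny samples like the demo above, exact tables should be used instead.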
❑ There are 200 data points for each machine, well over the minimum sample
necessary
[Figure: Normal probability plots (Percent) of Mach A and Mach B]
Probability Plot of Mach B (Normal): Mean 16.73, StDev 5.284, N 200, AD 0.630, P-Value 0.099
❑ Example: A credit card company now understands there is no variability difference in customer calls/week for the two different credit card types. This means there is no difference in the strategy for deploying the workforce. However, the credit card company wants to see if there is a difference in call volume between the two card types. The company expects no difference, since total sales for the two credit card types are similar. The Black Belt was selected and told to evaluate with 95% confidence whether the averages were the same. The Black Belt reminded the credit card company that the calls/week data were not Normally distributed, so he would have to compare Medians, since Medians are used to describe the central tendency of Non-normal Populations
❑ Since we know the data are Non-normal we can proceed to performing a Mann-Whitney Test
Stat>Nonparametrics>Mann-Whitney
❑ As you can see there is a difference in the Median between CallsperWk1 and CallsperWk2.
❑ Therefore, there is not a difference in call volume between the two different card types
❑ An aluminum company wanted to compare the operation of its three facilities worldwide.
They want to see if there is a difference in the recoveries among the three locations. A
Black Belt was asked to help management evaluate the recoveries at the locations with 95%
confidence.
❑ Ho: M1 = M2 = M3
Ha: at least one is different
Use the Mood’s Median test.
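Mood's Median test builds a 2 × k table of counts above, and at or below, the grand median, then computes a chi-square statistic with k − 1 degrees of freedom. This standard-library sketch uses hypothetical data rather than the "Recovery" worksheet. Its result would be compared with a chi-square table (5.991 for 2 d.f. at α = 0.05).

```python
import statistics


def moods_median_test(groups):
    """Return (grand median, chi-square statistic, degrees of freedom)."""
    grand_median = statistics.median([x for g in groups for x in g])
    above = [sum(1 for x in g if x > grand_median) for g in groups]
    at_or_below = [len(g) - a for g, a in zip(groups, above)]
    total = sum(len(g) for g in groups)
    total_above = sum(above)
    total_below = total - total_above
    chi2 = 0.0
    for g, a, b in zip(groups, above, at_or_below):
        # Expected counts if every group shared the same median
        exp_a = len(g) * total_above / total
        exp_b = len(g) * total_below / total
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return grand_median, chi2, len(groups) - 1


# HYPOTHETICAL recoveries for three locations
median, chi2, df = moods_median_test([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(median, round(chi2, 2), df)  # 5 6.3 2
```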
❑ Based on the smallest sample of 13, the test will be able to detect a difference close to 1.5
❑ Statistical Conclusions: Use the data in the columns named “Recovery” and “Location” in
the Minitab worksheet “Hypoteststud.mtw” for analysis
Stat>Basic Statistics>Graphical Summary…
Instead of using the Anderson-Darling test for Normality, this time we used the Graphical Summary method. It gives a P-value for Normality and allows a view of the data that the Normality test does not.
Summary for Recovery, Location = Savannah
Anderson-Darling Normality Test: A-Squared 0.81, P-Value 0.032
Mean 87.660, StDev 7.944, Variance 63.113, Skewness -0.15286, Kurtosis -1.11764, N 25
Minimum 75.300, 1st Quartile 79.000, Median 87.500, 3rd Quartile 96.550, Maximum 99.200
95% Confidence Interval for Mean: 84.381 to 90.939
95% Confidence Interval for Median: 86.179 to 90.080

Summary for Recovery (N = 13)
Mean 93.042, StDev 5.918, Variance 35.017, Skewness -1.81758, Kurtosis 4.66838, N 13
Minimum 76.630, 1st Quartile 90.600, Median 94.800, 3rd Quartile 97.350, Maximum 99.700
95% Confidence Interval for Mean: 89.466 to 96.617
95% Confidence Interval for Median: 90.637 to 97.036
95% Confidence Interval for StDev: 4.243 to 9.768
(We could do a Box Plot to get a clearer idea about Outliers.)

Summary for Recovery, Location = Ankhar
Anderson-Darling Normality Test: A-Squared 0.86, P-Value 0.022
Mean 88.302, StDev 6.929, Variance 48.008, Skewness -0.105610, Kurtosis 0.182123, N 20
Minimum 73.500, 1st Quartile 85.150, Median 88.425, 3rd Quartile 89.700, Maximum 99.450
95% Confidence Interval for Mean: 85.059 to 91.545
95% Confidence Interval for Median: 86.735 to 89.299
Test for Equal Variances (Location: Ankhar, Bangor, Savannah)
Bartlett's Test: Test Statistic 1.33, P-Value 0.514
Levene's Test: Test Statistic 1.02, P-Value 0.367
[Figure: 95% Bonferroni Confidence Intervals for StDevs by Location]
We observe the confidence intervals for the Medians of the 3 populations. Note that the 95% confidence interval for Bangor does not overlap the others, so we know visually that the P-value is below 0.05.
Practical Conclusion: Bangor has the highest recovery of all three facilities.
Kruskal-Wallis Test
Using the same data set, analyze using the Kruskal-Wallis test.
H = 6.86 DF = 2 P = 0.032
H = 6.87 DF = 2 P = 0.032 (adjusted for ties)
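The two H values in this output are the Kruskal-Wallis statistic with and without the tie correction. This is a minimal sketch of both, using H = 12/(N(N+1)) Σ Ri²/ni − 3(N+1) and dividing by 1 − Σ(t³ − t)/(N³ − N) for ties. The data are hypothetical, not the recovery data, and H would be compared with a chi-square table using k − 1 d.f.

```python
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, H adjusted for ties) for k groups of numbers."""
    pooled = sorted(x for g in groups for x in g)
    N = len(pooled)
    # Average rank of each distinct value (tied values share their mean rank)
    rank_of = {}
    i = 0
    while i < N:
        j = i
        while j + 1 < N and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (N * (N + 1)) * rank_term - 3 * (N + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (N ** 3 - N)
    return h, h / correction


h, h_adj = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # hypothetical, no ties
print(round(h, 2), round(h_adj, 2))  # 7.2 7.2
```

When there are no ties, the two values are identical. Ties push the adjusted H slightly higher, as in the 6.86 vs. 6.87 output.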
This output is the "least friendly" to interpret. Look for the P-value: 0.032 is below 0.05, which tells us we reject the null hypothesis. We reach the same conclusion as with the Mood's Median test.
Unequal Variance
▪ Unequal variances are usually the result of differences in the shape of the
distribution
▪ Extreme tails
▪ Outliers
▪ Multiple modes
❑ For Skewed Distributions with comparable Medians, it is unusual for the variances to
be different without some assignable cause impacting the process
Model A and Model B are similar in nature (not exact), but are manufactured
in the same plant
[Figure: Normal probability plots (Percent) of Model A and Model B, and Test for Equal Variances with 95% Bonferroni Confidence Intervals for StDevs and boxplots of the Data for Model A and Model B]
Test Statistic 4.47, P-Value 0.049
The P-value is just under the limit of .05. Whenever the result is borderline,
as in this case, use your process knowledge to make a judgment.
Plot the data to explore and explain the differences
Why Confidence Intervals?
❑ Sample statistics such as the mean, standard deviation and proportion (x̄, s, p) are only estimates of the population parameters (μ, σ, and P)
❑ Since there is variability in these estimates from sample to sample, we can quantify
the uncertainty using confidence intervals
[Figure: Frequency distributions of the Population (individual scores) and of the Sample Means, on the same scale from 30 to 100]
Standard Error of the Mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where σ = Standard Deviation for the Individual Scores and n = Sample Size for the mean
[Figure: a confidence interval running from the Lower Confidence Limit through the Point Estimate to the Upper Confidence Limit; the distance between the limits is the width of the confidence interval]
Point Estimate is the sample statistic estimating the population parameter of interest
Critical Value is a table value based on the sampling distribution of the point estimate and
the desired confidence level
Confidence Intervals: Population Mean (σ Known or σ Unknown) and Population Proportion
Population Mean, σ Known:
For 1 − α = 0.95, α = 0.05, with α/2 = 0.025 in each tail, and $\mu_{\bar{x}} = \mu$
Intervals extend from $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ to $\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
(1 − α)100% of intervals constructed in this way contain μ; (α)100% do not.
Example
❑ A sample of 11 circuits from a large normal population has a mean resistance of 2.20
ohms. We know from past testing that the population standard deviation is 0.35 ohms
❑ Determine a 95% confidence interval for the true mean resistance of the population
Solution:
$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
= 2.20 ± 1.96 (0.35/√11)
= 2.20 ± 0.2068
1.9932 ≤ μ ≤ 2.4068
Interpretation: We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms. Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean
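The σ-known interval can be checked numerically. This standard-library sketch computes Zα/2 from `NormalDist().inv_cdf` instead of the rounded table value 1.96, and reproduces the circuit-resistance interval.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Circuit example: x-bar = 2.20 ohms, sigma = 0.35 ohms, n = 11
low, high = z_interval(2.20, 0.35, 11)
print(round(low, 4), round(high, 4))  # 1.9932 2.4068
```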
Confidence Intervals: Population Mean, σ Unknown
❑ We use the Student's t Distribution instead of the normal distribution because it factors in the greater uncertainty associated with small sample sizes
$\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
where tα/2 is the critical value of the t distribution with n − 1 degrees of freedom and an area of α/2 in each tail
Student’s t Distribution
❑ The t is a family of distributions
❑ Degrees of freedom (d.f.): the number of observations that are free to vary after the sample mean has been calculated
𝑑.𝑓. = 𝑛 − 1
Example: Let X1 = 7 and X2 = 8. What is X3? If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary)
[Figure: Standard Normal (t with df = ∞), t (df = 13), and t (df = 5) curves centered at 0]
t-distributions are bell-shaped and symmetric, but have 'fatter' tails than the normal
Student’s t table
[Table: critical values by Confidence Level for t (10 d.f.), t (20 d.f.), t (30 d.f.), and Z (∞ d.f.)]
Note: t → Z as n increases
Solution:
$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}}$ = 50 ± (2.0639)(8/√25)
46.698 ≤ μ ≤ 53.302
Interpretation: Interpreting this interval requires the assumption that the population you are sampling from is approximately a normal distribution (especially since n is only 25). This condition can be checked by creating a Normal probability plot or Boxplot
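The σ-unknown interval differs only in using s and a t critical value. Python's standard library has no t-distribution quantile, so this sketch takes tα/2 as an input. Here it is 2.0639, the table value for 24 d.f. used in the example.

```python
import math


def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval for the mean with sigma unknown; t_crit is read from a t table."""
    margin = t_crit * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Example values: x-bar = 50, s = 8, n = 25, t(0.025, 24 d.f.) = 2.0639
low, high = t_interval(50, 8, 25, t_crit=2.0639)
print(round(low, 3), round(high, 3))  # 46.698 53.302
```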
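Confidence Intervals: Population Proportion
The standard error of the proportion is estimated from the sample proportion:
$\sigma_P = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
so the confidence interval for the population proportion is:
$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where Zα/2 is the standard normal value for the level of confidence desired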
Solution:
$p \pm Z_{\alpha/2}\sqrt{p(1 - p)/n}$
= 25/100 ± 1.96 √(0.25(0.75)/100)
= 0.25 ± 1.96 (0.0433)
0.1651 ≤ p ≤ 0.3349
Interpretation: We are 95% confident that the true percentage of left-handers in the population is between 16.51% and 33.49%. Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, 95% of intervals formed from samples of size 100 in this manner will contain the true proportion
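The proportion interval follows the same pattern, with the standard error computed from p̂. This sketch reproduces the left-hander example (25 out of 100), with Zα/2 taken from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_interval(25, 100)
print(round(low, 4), round(high, 4))  # 0.1651 0.3349
```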
Distinguishing between Two Samples
[Figure: Theoretical Distribution of Means when n = 2, with d = 5 and S = 1, compared with a large S]
❑ Recall from the Central Limit Theorem that as the number of individual observations increases, the Standard Error decreases
❑ If the variance of the data is large, it is difficult to establish differences. We need larger sample sizes to reduce uncertainty
The Perfect Sample Size
Question: “How many samples should we take?”
Answer: “Well, that depends on the size of your delta and Standard Deviation”
Answer: “No, not if you took the correct number of samples the first time!”
Determining Sample Size
❑ The required sample size can be found to reach a desired margin of error (e)
with a specified level of confidence (1 - α)
❑ The margin of error, also called sampling error, is the amount of imprecision in the estimate of the population parameter, i.e., the amount added to and subtracted from the point estimate to form the confidence interval
Determining Sample Size
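For the mean: the confidence interval $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ has margin of error
$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
and solving for n gives
$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}$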
❑ To determine the required sample size for the mean, you must know:
1) The desired level of confidence (1 - α), which determines the critical value, Zα/2
2) The acceptable margin of error (e)
3) The standard deviation (σ)
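Example (90% confidence, so Zα/2 = 1.645; σ = 45; e = 5):
$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}} = \frac{(1.645)^{2}(45)^{2}}{5^{2}} = 219.19$
So use n = 220 (always round up)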
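A minimal sketch of the sample-size formula for the mean, using the slide's numbers (Zα/2 = 1.645, σ = 45, e = 5). It rounds up with `math.ceil`, after rounding away floating-point noise, and then checks the achieved margin of error at that n. The function names `sample_size_mean` and `margin_of_error` are hypothetical.

```python
import math


def sample_size_mean(z: float, sigma: float, e: float) -> int:
    """n = Z^2 * sigma^2 / e^2, rounded up to a whole number of samples."""
    n_exact = round((z * sigma / e) ** 2, 6)  # drop floating-point noise before rounding up
    return math.ceil(n_exact)


def margin_of_error(z: float, sigma: float, n: int) -> float:
    """e = Z * sigma / sqrt(n) for a given sample size."""
    return z * sigma / math.sqrt(n)


n = sample_size_mean(1.645, 45, 5)  # exact value is about 219.19
print(n, round(margin_of_error(1.645, 45, n), 3))
```

One sample fewer (n = 219) would leave the margin of error just above the 5-unit target, which is why the result is always rounded up.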
Another approach to choosing n uses the fact that, for a proportion, the required sample size is always largest at p = 0.5 [that is, p(1 − p) ≤ 0.25, with equality at p = 0.5], and this can be used to obtain an upper bound on n. In other words, we are at least 100(1 − α)% confident that the error in estimating p by p̂ is less than e if the sample size is
$n = 0.25\,\frac{Z_{\alpha/2}^{2}}{e^{2}}$
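The claim that p(1 − p) never exceeds 0.25 can be checked with a quick grid scan, and the bound then gives the conservative sample size. This is a sketch: the grid step of 0.001 and the example inputs (95% confidence, e = 0.03) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def max_p_times_one_minus_p(steps: int = 1000) -> float:
    """Scan p over [0, 1] on a grid and return the largest value of p(1 - p)."""
    return max((i / steps) * (1 - i / steps) for i in range(steps + 1))


def conservative_sample_size(z: float, e: float) -> int:
    """Upper bound on n for a proportion: n = 0.25 * Z^2 / e^2, rounded up."""
    return math.ceil(round(0.25 * z ** 2 / e ** 2, 6))


print(max_p_times_one_minus_p())             # 0.25, reached at p = 0.5
print(conservative_sample_size(1.96, 0.03))  # illustrative: 95% confidence, e = 0.03
```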
Determining Sample Size
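❑ To determine the required sample size for the proportion, you must know:
1) The desired level of confidence (1 - α), which determines the critical value, Zα/2
2) The acceptable margin of error (e)
3) An estimate of the true proportion p (for example, from a pilot sample; use p = 0.5 if no estimate is available)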
Solution:
For 95% confidence, use Zα/2 = 1.96; e = 0.03
p̂ = 0.12, so use this to estimate p
$n = \frac{Z_{\alpha/2}^{2}\,p(1-p)}{e^{2}} = \frac{(1.96)^{2}(0.12)(1-0.12)}{(0.03)^{2}} = 450.74$
So use n = 451
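The same calculation as a small Python function (hypothetical name `sample_size_proportion`), assuming p is estimated from a pilot sample as in the solution:

```python
import math


def sample_size_proportion(z: float, e: float, p: float) -> int:
    """n = Z^2 * p(1 - p) / e^2, rounded up; p is a prior estimate of the proportion."""
    n_exact = round(z ** 2 * p * (1 - p) / e ** 2, 6)  # drop floating-point noise
    return math.ceil(n_exact)


print(sample_size_proportion(1.96, 0.03, 0.12))  # exact value about 450.74 -> 451
```

Passing p = 0.5 gives the conservative upper bound for the same confidence and margin of error.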
❑ Laura was looking at the percentage of duplicate payments. She randomly sampled 50 payments and discovered that four were duplicated (defective). She wants to be 95% confident of the overall payment population's defect rate to within plus or minus 2%
A. If she uses the defect percentage of her sample, calculate the sample size she would need to determine what she wants to know
B. Once she sees the number, she says she is uncertain about the defect rate. Calculate the sample size needed with an unknown defect rate
C. After seeing the sample sizes needed, Laura is concerned about never being able to go to Hawaii again. What could you suggest?
Proportion data - Example
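A. $\hat{p} = 4/50 = 0.08$, so $\hat{p}(1-\hat{p}) = 0.08 \times 0.92 = 0.0736$
$n = \frac{Z_{\alpha/2}^{2}\,\hat{p}(1-\hat{p})}{e^{2}} = \frac{1.96^{2} \times 0.0736}{0.02^{2}} = 706.85 \Rightarrow 707$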
B. Once she sees the number, she says she is uncertain about the defect rate. Calculate the sample size needed with an unknown defect rate
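Using the worst case p = 0.5, so p(1 − p) = 0.25:
$n = \frac{Z_{\alpha/2}^{2} \times 0.25}{e^{2}} = \frac{1.96^{2} \times 0.25}{0.02^{2}} = 2401 \Rightarrow 2401$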
C. After seeing the sample sizes needed, Laura is concerned about never being able to go to Hawaii again. What could you suggest?
Reduce the confidence level required, or accept a larger margin of error (less precision) around the estimated defect rate
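Laura's three questions can be explored in one sketch. Parts A and B reproduce the numbers above; for part C, the relaxed settings (90% confidence, or a ±4% margin) are illustrative assumptions, not values from the exercise, chosen to show how quickly n falls when either requirement is loosened. `sample_size_proportion` is a hypothetical helper name.

```python
import math


def sample_size_proportion(z: float, e: float, p: float) -> int:
    """n = Z^2 * p(1 - p) / e^2, rounded up (floating-point noise removed first)."""
    return math.ceil(round(z ** 2 * p * (1 - p) / e ** 2, 6))


p_hat = 4 / 50  # 4 duplicates in 50 sampled payments -> 0.08

a = sample_size_proportion(1.96, 0.02, p_hat)  # A: use the sample defect rate
b = sample_size_proportion(1.96, 0.02, 0.5)    # B: unknown rate, worst case p = 0.5

# C (illustrative settings): loosen one requirement at a time
c_lower_confidence = sample_size_proportion(1.645, 0.02, p_hat)  # 90% instead of 95%
c_wider_margin = sample_size_proportion(1.96, 0.04, p_hat)       # +/-4% instead of +/-2%
print(a, b, c_lower_confidence, c_wider_margin)
```

Doubling the margin of error cuts n by roughly a factor of four, because e enters the formula squared.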
❑ To determine the required sample size for count data, you must know:
1) The desired level of confidence (1 - α), which determines the critical value, Zα/2
2) The acceptable margin of error (e)
3) The average count per unit (C)
$n = \frac{C\,Z_{\alpha/2}^{2}}{e^{2}}$
$n = \frac{C\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{72 \times 1.96^{2}}{3^{2}} = 30.7 \Rightarrow 31$
❑ Jennifer has already completed her first project. She is now analyzing a suggestion to reduce the number of cell phones the company pays for. While there is a report from the phone company about the number of calls per cell phone, Jennifer knows she needs to verify the data on the report for her Measurement System Analysis
❑ What size sample does she need to be 95% confident in the GRR accuracy if the
average number of calls per cell phone per month is 32 and she wants to be within
+/- 5 calls?
❑ Jennifer says this is great news! I can afford to be more accurate. How about +/-
2 calls? What will you tell her?
❑ What size sample does she need to be 95% confident in the GRR accuracy if the
average number of calls per cell phone per month is 32 and she wants to be within
+/- 5 calls?
$n = \frac{C\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{32 \times 1.96^{2}}{5^{2}} = 4.92 \Rightarrow 5$
❑ Jennifer says this is great news! I can afford to be more accurate. How about +/-
2 calls? What will you tell her?
$n = \frac{C\,Z_{\alpha/2}^{2}}{e^{2}} = \frac{32 \times 1.96^{2}}{2^{2}} = 30.73 \Rightarrow 31$
❑ When the required sample size is too large to investigate, we can lower the confidence level or accept a larger margin of error to reach a more reasonable n
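A sketch of the count-data formula n = C·Zα/2²/e². It assumes the counts behave roughly like a Poisson process, where the variance equals the mean, so the average count C plays the role of σ² in the formula for the mean. The function name `sample_size_count` is hypothetical.

```python
import math


def sample_size_count(c: float, z: float, e: float) -> int:
    """n = C * Z^2 / e^2 for count data, rounded up.

    Assumes Poisson-like counts, where the variance is about equal to the
    average count C, so C stands in for sigma^2 in the formula for the mean.
    """
    return math.ceil(round(c * z ** 2 / e ** 2, 6))


print(sample_size_count(32, 1.96, 5))  # about 4.92 -> 5
print(sample_size_count(32, 1.96, 2))  # about 30.73 -> 31
```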
ANOVA
❑ Analysis of Variance (ANOVA) extends the two-sample t-test for the equality of two population means to the more general null hypothesis that more than two means are all equal, versus the alternative that they are not all equal
❑ The classification variable, or factor, usually has three or more levels (If there are
only two levels, a t-test can be used)
❑ Is the between group variation large enough to be distinguished from the within
group variation?
[Figure: individual observations (X) scattered around two group means, μ1 and μ2]
Calculating ANOVA
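Total (overall) variation = variation between groups (Factor) + variation within groups (Error), i.e., SST = SSb + SSe:
$SST = \sum_{j=1}^{G}\sum_{i=1}^{n_j}(X_{ij}-\bar{\bar{X}})^{2}$
$SS_b = \sum_{j=1}^{G} n_j(\bar{X}_j-\bar{\bar{X}})^{2}$
$SS_e = \sum_{j=1}^{G}\sum_{i=1}^{n_j}(X_{ij}-\bar{X}_j)^{2}$
Where:
G = the number of groups (levels in the study)
n_j = the number of observations in group j
X_ij = the i-th observation in group j, X̄_j = the mean of group j, and X̄̄ = the overall (grand) mean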
❑ With a pairwise t-test scheme, the overall alpha risk increases as the number of means increases. For k pairwise t-tests, each run at significance level α, the probability of at least one false rejection (assuming independent tests) is:
$1 - (1 - \alpha)^{k}$
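A short sketch of how quickly the overall alpha risk grows when every pair of means is tested separately at α = 0.05. It counts the pairs for G groups as G(G − 1)/2 and, like the formula, assumes independent tests; the helper names `pairwise_comparisons` and `overall_alpha` are hypothetical.

```python
def pairwise_comparisons(groups: int) -> int:
    """Number of distinct pairs among G group means: G(G - 1) / 2."""
    return groups * (groups - 1) // 2


def overall_alpha(alpha: float, k: int) -> float:
    """Probability of at least one false rejection across k independent tests."""
    return 1 - (1 - alpha) ** k


for g in (2, 3, 4, 6):
    k = pairwise_comparisons(g)
    print(f"{g} groups -> {k} t-tests -> overall alpha {overall_alpha(0.05, k):.3f}")
```

With four groups the six pairwise tests already carry roughly a 26% chance of at least one false alarm, which is the motivation for a single ANOVA F-test.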
❑ “Are the means of the populations (1, 2, 3, 4) equal, or are there statistically
significant differences?”
[Figure: populations 1, 2, 3, and 4]
❑ These populations represent the levels of a factor
❑ The Sigma Finance Company is attempting to improve the time it takes to process
forms. The team believes there is a difference in the form cycle time between the
four processing centers
Center 1  Center 2  Center 3  Center 4
62        63        68        56
60        67        66        62
63        71        71        60
59        64        67        61
          65        68        63
          66        68        64
                              63
                              59
Analysis of Variance
Source DF SS MS F P
Factor 3 228.00 76.00 13.57 0.000
Error 20 112.00 5.60
Total 23 340.00
Individual 95% CIs For Mean
Based on Pooled StDev
Level N Mean StDev ---+---------+---------+---------+---
Center 1 4 61.000 1.826 (------*------)
Center 2 6 66.000 2.828 (-----*----)
Center 3 6 68.000 1.673 (----*-----)
Center 4 8 61.000 2.619 (----*----)
---+---------+---------+---------+---
Pooled StDev = 2.366 59.5 63.0 66.5 70.0
Total Variation
Total variation = variation between groups (Factor) + variation within groups (Error)
$SST = SS_b + SS_e$, here 340.00 = 228.00 + 112.00
How it works
Variation Between (Factor) and Variation Within (Error)
            Center 1  Center 2  Center 3  Center 4
ȳj          61        66        68        61
sj²         3.333     8.000     2.800     6.857
nj          4         6         6         8
Grand mean: ȳ = 64
$SS_{Factor\ (Between)} = \sum_{j=1}^{4} n_j(\bar{y}_j-\bar{y})^{2} = 4(61-64)^{2} + 6(66-64)^{2} + 6(68-64)^{2} + 8(61-64)^{2} = 36 + 24 + 96 + 72 = 228$
$SS_{Error\ (Within)} = \sum_{j=1}^{4}(n_j-1)s_j^{2} = 10 + 40 + 14 + 48 = 112$
F Calculated
$MS_{Factor} = \frac{SS_{Factor}}{DF_{Factor}} = \frac{228}{3} = 76.00$
$MS_{Error\ (Within)} = \frac{\sum_{j=1}^{4}(n_j-1)s_j^{2}}{\sum_{j=1}^{4}(n_j-1)} = \frac{10 + 40 + 14 + 48}{20} = \frac{112}{20} = 5.60$
$F = \frac{MS_{Factor}}{MS_{Error}} = \frac{76}{5.6} = 13.57$
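The hand calculation can be reproduced from the raw Sigma Finance data with a short Python sketch. It assumes the third value in Center 3 is 71, the value consistent with that center's reported mean (68.000) and StDev (1.673). The sketch stops at the F ratio; the p-value needs the F distribution, which the standard library does not provide (SciPy's `scipy.stats.f_oneway` returns both F and p, if it is available). `one_way_anova` is a hypothetical name.

```python
# Sigma Finance form cycle times by processing center.
# Assumption: Center 3's third value is 71, consistent with its reported mean and StDev.
centers = {
    "Center 1": [62, 60, 63, 59],
    "Center 2": [63, 67, 71, 64, 65, 66],
    "Center 3": [68, 66, 71, 67, 68, 68],
    "Center 4": [56, 62, 60, 61, 63, 64, 63, 59],
}


def one_way_anova(groups: list[list[float]]) -> dict:
    """Sums of squares, degrees of freedom, mean squares, and F for a one-way ANOVA."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    means = [sum(g) / len(g) for g in groups]

    # Between (Factor): group size times squared distance of group mean from grand mean
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within (Error): squared distance of each observation from its own group mean
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

    df_factor = len(groups) - 1
    df_error = len(all_obs) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "SS": (ss_factor, ss_error, ss_total),
        "DF": (df_factor, df_error),
        "MS": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }


table = one_way_anova(list(centers.values()))
print(table)
```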
❑ The Sigma Finance Company has made a decision to outsource its contracting
function. Four companies have been identified and one of the criteria is the time
in which they close contracts
❑ Are any of the vendors significantly better than the others in average time and consistency, with at least 95% confidence?
▪ Since the p-value ≤ significance level (α), we reject the null hypothesis (H0) and conclude that there is a difference between the four companies
[Figure: plot of means by Center (stacked data), with a reference line at the grand average]
❑ We have three potential suppliers that claim to have equal levels of quality. Supplier B provides a considerably lower purchase price than either of the other two vendors. We would like to choose the lowest-cost supplier, but we must ensure that we do not affect the quality of our raw material.
[Figure: normal probability plots of the data for each supplier; recoverable P-values include Supplier C (P-value 0.910) and 0.385 for another supplier]
[Figure: test for equal variances for Supplier A, Supplier B, and Supplier C, showing 95% Bonferroni confidence intervals for StDevs]
Bartlett's Test: Test Statistic 2.11, P-Value 0.348
Levene's Test: Test Statistic 0.59, P-Value 0.568
Click on “Graphs…” and check “Boxplots of data”
ANOVA MINITAB
What does this graph tell us?
[Figure: boxplots of Data for Supplier A, Supplier B, and Supplier C (values roughly 3.0 to 4.4)]
Source DF SS MS F P
Factor 2 0.384 0.192 1.40 0.284
Error 12 1.641 0.137
Total 14 2.025
▪ Check the data for Normality at each level and for homogeneity of variance
across all levels
▪ Examine the residuals (a residual is the difference between the true observation and what the model predicts)
o Normal plot of the residuals
[Figure: residual plots: histogram of the residuals (Frequency vs. Residual), normal probability plot of the residuals (Percent vs. Residual), and residual values plotted on a −0.50 to 0.50 scale]
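In a one-way ANOVA the model's prediction for each observation is its group mean, so each residual is simply the observation minus its group mean. This sketch (hypothetical helper `residuals`) illustrates that with Center 1 and Center 2 from the Sigma Finance cycle-time data; values like these are what feed the histogram and normal plot of residuals.

```python
def residuals(groups: list[list[float]]) -> list[list[float]]:
    """Residual = observation - fitted value; in one-way ANOVA the fit is the group mean."""
    result = []
    for g in groups:
        fitted = sum(g) / len(g)
        result.append([x - fitted for x in g])
    return result


# Center 1 and Center 2 of the Sigma Finance cycle-time data
res = residuals([[62, 60, 63, 59], [63, 67, 71, 64, 65, 66]])
print(res)
```

Within each group the residuals sum to zero, and their squares add up to that group's contribution to SS Error (10 + 40 here).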
❑ If the p-value from the ANOVA is less than some significance level (like α = .05), we
can reject the null hypothesis and conclude that at least one of the group means is
different from the others
❑ But in order to find out exactly which groups are different from each other, we
must conduct a post-hoc test
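❑ One commonly used post-hoc test is Fisher’s least significant difference (LSD) test:
$LSD = t_{\alpha/2,\,DF_{Groups}}\sqrt{MS_{Groups}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$
Where:
$t_{\alpha/2,\,DF_{Groups}}$: the critical value from the t-distribution with an area of α/2 in each tail and the within-groups (error) degrees of freedom from the ANOVA table
$MS_{Groups}$: the mean squared within groups from the ANOVA table
$n_1, n_2$: the sample sizes of the two groups being compared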
❑ We can then compare the mean difference between each group to this test
statistic. If the absolute value of the mean difference between two groups is
greater than the test statistic, we can declare that there is a statistically significant
difference between the group means
Example: Fisher’s LSD Test
they used:
❑ The professor performs a one-way ANOVA and gets the following results:
❑ Since the p-value in the ANOVA table (.018771) is less than .05, we can conclude
that not all of the mean exam scores between the three groups are equal
❑ Using the output of the ANOVA, we can calculate Fisher’s test statistic as:
$LSD = t_{\alpha/2,\,DF_{Groups}}\sqrt{MS_{Groups}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$
$LSD = t_{0.025,\,27}\sqrt{36.948\left(\frac{1}{10}+\frac{1}{10}\right)} = 2.052\sqrt{7.3896} = 5.578$
❑ We can then calculate the absolute mean difference between each group:
▪ Technique 1 vs. Technique 2: |80 – 85.8| = 5.8
❑ The absolute mean differences between technique 1 vs. technique 2 and technique
1 vs. technique 3 are greater than Fisher’s test statistic, thus we can conclude that
these techniques lead to statistically significantly different mean exam scores
❑ We can also conclude that there is no significant difference in mean exam scores
between technique 2 and technique 3
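A minimal sketch of Fisher's LSD comparison with the example's numbers. The t critical value (about 2.0518 for 0.025 in each tail and 27 d.f.) is passed in from a table because the standard library has no t quantile. Only the Technique 1 and Technique 2 means appear above, so the example compares just that pair; `fisher_lsd` and `significant_pairs` are hypothetical names.

```python
import math
from itertools import combinations


def fisher_lsd(t_crit: float, ms_within: float, n1: int, n2: int) -> float:
    """Least significant difference for comparing two group means."""
    return t_crit * math.sqrt(ms_within * (1 / n1 + 1 / n2))


def significant_pairs(means: dict[str, float], lsd: float) -> list[tuple[str, str]]:
    """Return the pairs of groups whose absolute mean difference exceeds the LSD."""
    return [(a, b) for a, b in combinations(means, 2) if abs(means[a] - means[b]) > lsd]


# Example values: t(0.025, 27 d.f.) from a t table, MS within = 36.948, n = 10 per group
lsd = fisher_lsd(2.0518, 36.948, 10, 10)
print(round(lsd, 3))  # 5.578
print(significant_pairs({"Technique 1": 80, "Technique 2": 85.8}, lsd))
```

With all three technique means available, the same `significant_pairs` call would check every pair against the single LSD threshold, since the groups here are equal in size.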