
STATISTICS LEVEL 2

Part 2
(Rev 2.0)

ST Restricted
Structure of the course

Module 1: INTRODUCTION
• Introduction
• First Concepts
• Population VS Sample
• Descriptive VS Inferential
• Estimation (point and interval)
• Hypothesis Testing
• Inferential Error

Module 2: CENTRAL LIMIT THEOREM
• Decision Making Process
• Numerical Simulation & Examples
• t-Distribution

Module 3: CONFIDENCE INTERVAL
• Estimation
• Point Estimation
• Properties of Point Estimators
• Interval Estimation
• Parameters of 1 Normal Population
• Parameters of 2 Normal Populations

Module 4: HYPOTHESIS TESTING
• 1 Normal Population
• 2 Normal Populations
• ANOVA
• Non-Parametric Test

Annex 1
• Introduction to Bootstrap

Annex 2
• Overview of Outlier Detection Methods

Module 4: Hypothesis testing

Module 4 objectives

• At the end of this chapter, you will be able to:
• Test statistical hypotheses on single population parameters
• Use hypothesis testing procedures to compare more than one population parameter (covered in a subsequent module)
• Assess when a parametric procedure can be used and when it is better to use a non-parametric one (covered in a subsequent module)

Module 4

Hypothesis testing - introduction
Associated errors – type I & type II
Significance, confidence, power
Parametric procedures
One normal population
Test on mean
Test on variance
Test on proportion
Two normal populations
Test difference of means
Test ratio of variances
Test difference of proportions
Test on correlation coefficient
More than two populations
Non-parametric procedures
Introduction
List of tests
Module 4 Key Learning

Inferential Statistics (or Inference)

Hypothesis testing
e.g., Test the claim that the population mean weight is 120 pounds

Hypothesis Testing Procedure

Common steps to every hypothesis testing procedure:
❑ First step – definition of a "system of two hypotheses", indicated by H0 and H1.
o The first, H0, is called the "null hypothesis", and
o the second, H1, is called the "alternative hypothesis".
o They are defined in a mutually exclusive way. This means that if one hypothesis is true, the other must be false.
❑ Second step – data collection (sample) from the population about which we want to make an inference. Of course, we assume that the sample adequately represents the population.
❑ Third step – decision about which hypothesis is more likely to be true, based on the evidence in the data. Statistics helps us take this decision. The conclusion of the test is always associated with an acceptable probability of error called the "significance level".

The Null Hypothesis – H0

➢ States the (numerical) assumption to be tested
Example: The average number of TV sets in U.S. homes is equal to three (H0: μ = 3)

➢ Is always about a population parameter, not about a sample statistic

H0: μ = 3 (correct)        H0: X̄ = 3 (incorrect: X̄ is a sample statistic)

The Null Hypothesis – H0

➢ Begin with the assumption that the null hypothesis is true
o Like the notion of "innocent until proven guilty"
➢ Refers to the "status quo"
➢ Always contains the "=", "≤" or "≥" sign
➢ May or may not be rejected

The Alternative Hypothesis – H1

➢ Is the opposite of the null hypothesis
• e.g., The average number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3)
➢ Challenges the "status quo"
➢ Always contains the "≠", "<" or ">" sign
• If it contains the "≠" sign, the test is a "two-sided test"
• If it contains the "<" or ">" sign, the test is a "one-sided test"
➢ Never contains the "=", "≤" or "≥" sign
➢ May or may not be supported
➢ Is generally the hypothesis that the researcher is trying to support

Hypothesis Testing Process

Start from the claim about the population: "The population mean age is 50" (Null Hypothesis: H0: μ = 50).
Now select a random sample.
Suppose the sample mean age is 20: X̄ = 20.
Is X̄ = 20 likely if μ = 50?
If not likely, REJECT the Null Hypothesis.
Hypothesis Testing Process

[Figure: sampling distribution of x̄ (if H0 is true), for H0: μ = 50, centered at μ = 50, with the sample mean x̄ = 20 far in the left tail.]

If it is unlikely that we would get a sample average of this value (20) if in fact 50 were the population mean, then we reject the null hypothesis that μ = 50.

Hypothesis Testing Process

"If the sample mean x̄ were exactly 50, of course we would accept the null hypothesis."

Question: what is the maximum "distance" between the parameter value hypothesized in H0 (50) and the sample mean x̄ that still allows us to support H0 (i.e., not reject it)?

[Figure: sampling distribution of x̄ (if H0 is true), for H0: μ = 50 vs H1: μ ≠ 50, centered at μ = 50, asking "max distance (to still support H0)?"]


Hypothesis Testing Process

Consider the (one-sided) system of hypotheses:
H0: μ ≤ 50
H1: μ > 50

How large can the sample average be and still support H0?

CRITERION: use as "threshold" the value of x̄ that is larger than a fraction (1 − α) of the sampling distribution of x̄ (assuming H0 is true) and smaller than the remaining fraction α. Since α is a probability, it lies between 0 and 1; for example, α = 0.05.

[Figure: sampling distribution of x̄ (if H0 is true), centered at μ = 50, with area (1 − α) = 0.95 below the "threshold" and α = 0.05 above it. The threshold is the Critical Value (for a given α).]

Adopting this criterion to define the Critical Value implies that we accept a risk that some large values of x̄ (larger than the critical value) will lead to an erroneous rejection of H0 (the yellow right tail of the distribution in the figure).
- The probability associated with this event (our risk) is α.
- Conversely, with probability (1 − α) we do not make this error.
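The threshold criterion can be checked numerically. The sketch below is a minimal illustration, assuming σ is known so that x̄ is normal under H0 with standard error σ/√n; the values σ = 10 and n = 25 and the helper name `upper_critical_value` are hypothetical and do not come from the slides. It uses only the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def upper_critical_value(mu0, sigma, n, alpha):
    """Threshold for H0: mu <= mu0 vs H1: mu > mu0, assuming sigma is known."""
    std_err = sigma / sqrt(n)                # standard deviation of x-bar
    z = NormalDist().inv_cdf(1 - alpha)      # standard normal (1 - alpha) quantile
    return mu0 + z * std_err

mu0, sigma, n, alpha = 50, 10, 25, 0.05      # sigma and n are hypothetical values
crit = upper_critical_value(mu0, sigma, n, alpha)

# Risk of rejecting a true H0: area of the x-bar distribution above the threshold
risk = 1 - NormalDist(mu0, sigma / sqrt(n)).cdf(crit)
print(f"critical value = {crit:.2f}, right-tail area = {risk:.3f}")
```

The right-tail area returned is α by construction, which is exactly the risk described above.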
Hypothesis Testing Process

In the case of a two-sided test, the probability α is divided equally between the two tails of the distribution, so that the total risk still has probability α. In these cases, we have two critical values:

H0: μ = 50
H1: μ ≠ 50

[Figure: sampling distribution of x̄ (if H0 is true), centered at μ = 50: area α/2 = 0.025 below the lower Critical Value, (1 − α) = 0.95 between the two Critical Values, and α/2 = 0.025 above the upper Critical Value.]
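The split of α between the two tails can be sketched the same way. This is a minimal illustration, again assuming a known σ with hypothetical values σ = 10 and n = 25; it shows that each critical value leaves α/2 in its tail and that the total risk is still α.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 50, 10, 25, 0.05       # sigma and n are hypothetical values
xbar_dist = NormalDist(mu0, sigma / sqrt(n))  # sampling distribution of x-bar if H0 is true

lower = xbar_dist.inv_cdf(alpha / 2)          # leaves alpha/2 in the left tail
upper = xbar_dist.inv_cdf(1 - alpha / 2)      # leaves alpha/2 in the right tail
total_risk = xbar_dist.cdf(lower) + (1 - xbar_dist.cdf(upper))
print(f"critical values: {lower:.2f}, {upper:.2f}; total risk = {total_risk:.3f}")
```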

Hypothesis Testing Process

Process specs: 0 ± 20 µm.
We claim that the population mean will be at the nominal value, 0 µm.
We collect sample data and investigate whether our claim is supported.

Null Hypothesis, Ho:
• Status quo.
• Contains only the =, ≤ or ≥ sign.
• "Innocent unless proven guilty".

Alternative Hypothesis, H1:
• Challenges the status quo.
• Contains only the ≠, < or > sign.
• The hypothesis the researcher is trying to investigate.

Ho: μ = 0                  H1: μ ≠ 0 (Two Sided)
Ho: μ = 0 (or μ ≥ 0)       H1: μ < 0 (One Sided Lower)
Ho: μ = 0 (or μ ≤ 0)       H1: μ > 0 (One Sided Upper)

Hypothesis testing is about inferring a population parameter.
Only population parameter symbols are used in the hypothesis statement, NOT sample statistics.
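The form of H1 decides where the rejection region lies. The following is a minimal sketch for a standardized statistic z, assuming σ is known so that z follows N(0, 1) under Ho. The function name `reject_h0` and the labels "two-sided", "less" and "greater" are hypothetical, chosen here for illustration.

```python
from statistics import NormalDist

def reject_h0(z, alpha, alternative):
    """Decision for a standardized statistic z (sigma assumed known, z ~ N(0, 1) under Ho).

    alternative: "two-sided" (H1: mu != mu0), "less" (H1: mu < mu0), "greater" (H1: mu > mu0).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)   # alpha split over two tails
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)                # whole alpha in the lower tail
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)            # whole alpha in the upper tail
    raise ValueError(f"unknown alternative: {alternative}")

# The same z = 1.8 leads to different decisions depending on how H1 is written
for alt in ("two-sided", "less", "greater"):
    print(alt, reject_h0(1.8, 0.05, alt))
```

Only the one-sided upper test rejects here, because its critical value (about 1.645) is lower than the two-sided one (about 1.96).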

Hypothesis Testing Process
Specs = 5 µm ± 20 µm

Step 1: Define the system of hypotheses.
Ho: μ = 5
H1: μ ≠ 5

Step 2: Convert the sample data into information.

Step 3: Establish the distribution of the sample average (given a sample of size n), assuming Ho is true.

If σx is known, the sample average is normally distributed:

$$\bar{x} \sim N\!\left(\mu_x, \frac{\sigma_x}{\sqrt{n}}\right), \qquad \mu_{\bar{x}} = \mu_x, \qquad \sigma_{\bar{x}} = \text{Std. Err} = \frac{\sigma_x}{\sqrt{n}}$$

If σx is unknown, use the t-distribution (with n − 1 dof), with the standard error estimated from the sample standard deviation:

$$\text{Std. Err} = \frac{s_x}{\sqrt{n}} = \frac{3.69}{\sqrt{30}} = 0.67, \qquad \text{dof} = 29$$

The distribution is centered at µx = 5.

Here onwards, we use the σx unknown situation.

Step 4: Assume Ho is true. The user defines α; in this case α = 0.05, i.e. 0.025 in each tail of the distribution (centered at µx = 5, Std. Err = 0.67).
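The standard error and degrees of freedom used in Step 3 can be reproduced in a few lines. This minimal sketch uses only the sample size and sample standard deviation quoted on the slide (n = 30, s = 3.69).

```python
from math import sqrt

s_x, n = 3.69, 30          # sample standard deviation and sample size from the slide
std_err = s_x / sqrt(n)    # estimated standard error of x-bar (sigma unknown)
dof = n - 1                # degrees of freedom of the t-distribution
print(f"Std. Err = {std_err:.2f}, dof = {dof}")
```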
Step 5: Establish the acceptance and rejection regions (non-standardized scale; assume Ho is true; distribution of x̄ centered at µx = 5 with Std. Err = 0.67 and 0.025 in each tail).
• If the sample mean is within the acceptance region ➔ Statistical conclusion: Fail to Reject Ho.
• If the sample mean is within a rejection region ➔ Statistical conclusion: Reject Ho.

In this case, the sample mean is x̄ = −1.27.
But how do we know which region x̄ = −1.27 falls in? This is why we need to find the critical values.

Step 6: Find the critical values that separate the acceptance region from the two rejection regions.
• How do we find the critical values?
• Note: nowadays, statistical software computes the critical values.
• The subsequent steps show the classical method.

Step 7: Convert to the standardized scale (assume Ho is true), where

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

In our case, μ = 5 and s/√n = 0.67. On the standardized scale the distribution is centered at 0, with 0.025 in each rejection region beyond the critical values.

Statisticians have tabulated the t-distribution, creating the t statistical table.

Step 8: Refer to the statistical table for the t-distribution, with
• upper-tail area α/2 = 0.025 (α = 0.05 split over the two tails)
• dof = 29.

Using the statistical table, we find that the critical value is 2.045. Assuming Ho is true, the acceptance region on the standardized scale is −2.045 < t < 2.045, with 0.025 in each rejection region.
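Instead of the printed table, the critical value can be approximated numerically. The sketch below is a standard-library stand-in for the table, not the method JMP uses. It integrates the Student t density with Simpson's rule and inverts the CDF by bisection. The helper names `t_pdf`, `t_cdf` and `t_critical_value` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] (steps must be even), then symmetry."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical_value(upper_tail, df):
    """Value leaving `upper_tail` probability to its right, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid                      # too much area on the right: move the cut up
        else:
            hi = mid
    return (lo + hi) / 2

crit = t_critical_value(0.025, 29)        # alpha/2 = 0.025, dof = 29
print(f"critical value = {crit:.3f}")     # compare with the table value 2.045
```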

Step 9: Convert back to the non-standardized scale (assume Ho is true).

$$\frac{\bar{x} - 5}{0.67} = \pm 2.045 \;\Rightarrow\; \bar{x} = \mu \pm (2.045)\frac{s}{\sqrt{n}} = 5 \pm (2.045)(0.67)$$

This gives the critical values 3.63 and 6.37. The acceptance region is 3.63 < x̄ < 6.37, and the rejection regions are x̄ < 3.63 and x̄ > 6.37 (0.025 in each tail).
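The back-conversion in Step 9 is simple arithmetic. This minimal sketch reuses only the numbers on the slides (μ = 5, Std. Err = 0.67, table value 2.045).

```python
mu0, std_err, t_crit = 5, 0.67, 2.045   # values from Steps 3 and 8
lower = mu0 - t_crit * std_err          # lower critical value on the x-bar scale
upper = mu0 + t_crit * std_err          # upper critical value on the x-bar scale
print(f"acceptance region: {lower:.2f} < x-bar < {upper:.2f}")
```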

Step 9 (continued)

Now we can answer which region the sample mean x̄ = −1.27 falls in: since −1.27 < 3.63, it lies in the left rejection region.
Statistical Conclusion: Reject Ho

Equivalently, convert the sample mean into the standardized scale:

$$\text{Test Statistic} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{(-1.27) - (5)}{0.67} \approx -9.29$$

which is below the lower critical value −2.045.

(Note: JMP carries all decimal places in the calculation, which gives −9.29; with the rounded values shown here, the quotient is about −9.36.)
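The standardization and the decision can be reproduced from the summary statistics. This minimal sketch computes the standard error from s = 3.69 and n = 30 instead of using the rounded 0.67, so it gives about −9.31. JMP's −9.29 comes from the unrounded data, which the slides do not show.

```python
from math import sqrt

xbar, mu0, s, n = -1.27, 5, 3.69, 30   # summary statistics from the slides
t_crit = 2.045                          # t table value, dof = 29, alpha/2 = 0.025

std_err = s / sqrt(n)
t_stat = (xbar - mu0) / std_err        # standardized distance from the Ho value
print(f"test statistic = {t_stat:.2f}")
print("Reject Ho" if abs(t_stat) > t_crit else "Fail to Reject Ho")
```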
Results in JMP

Ho: μ = 5
H1: μ ≠ 5

[Figure: JMP result shown on both scales, assuming Ho is true. Standardized scale: Test Statistic = −9.29, critical values −2.045 and 2.045. Non-standardized scale (Std. Err = 0.67): x̄ = −1.27, critical values 3.63 and 6.37. In both cases the sample result lies in the left rejection region.]


[Figure: rejection region (0.025), acceptance region (0.95) and rejection region (0.025), separated by the two critical values.]

This concept works for any chosen α value. The case shown here uses α = 0.05.

• |Test Statistic| < |Critical Value| ⇔ P value > 0.05 ⇒ Fail to Reject Ho
• |Test Statistic| > |Critical Value| ⇔ P value < 0.05 ⇒ Reject Ho
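The two decision rules always agree, because the P-value falls below α exactly when the statistic passes the critical value. This minimal sketch checks that on a standard normal (z) scale, assuming a known σ. The test statistics are hypothetical, and the t case behaves the same way with the t-distribution in place of the normal.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
crit = std_normal.inv_cdf(1 - alpha / 2)     # two-sided critical value, about 1.96

results = []
for stat in (-2.5, -1.0, 0.3, 2.2):          # hypothetical test statistics
    p_value = 2 * (1 - std_normal.cdf(abs(stat)))   # two-sided P-value
    by_cv = abs(stat) > abs(crit)            # rule 1: compare with the critical value
    by_p = p_value < alpha                   # rule 2: compare the P-value with alpha
    results.append((stat, by_cv, by_p))
    print(f"stat {stat:+.1f}: P-value = {p_value:.3f}, reject by CV = {by_cv}, reject by P = {by_p}")
```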


Exercise #14

Exercise File:

Scenario:
1. You are setting up the machine.
2. 30 setup units are measured.
3. Descriptive statistics obtained (Die Attach Placement Specs = 0 µm ± 20 µm):
a. All individual values are within the spec limits.
b. Cpk > 1.67.
4. Decision?
a. Release the machine for production, or
b. Re-setup the machine.

Note:
• This is the same example used in the Confidence Interval topic (Example #6).
• You can compare the statistical conclusions of the two methods (Confidence Interval & Hypothesis Testing).

Trainer will show:
1. How to perform Hypothesis Testing.
3. Interpretation of results.
Exercise #14

First, create a distribution: go to Analyze > Distribution and cast the 2 columns to Y.

Then perform the hypothesis test: from the hotspot, select Test Mean, enter the Hypothesized Mean and click OK.

[Figure: JMP Test Mean dialog, with a callout on the option to use for a non-normal distribution.]

Exercise #14

[Figure: JMP Test Mean report, with callouts showing which probability to read if 2-sided test, if right test, and if left test.]

Since Prob > |t| is greater than the significance level (0.05), we FAIL TO REJECT the NULL HYPOTHESIS.

Practical Conclusion: The process is centered; the machine can be released for production.

Perform the same for the Y-Offset data. What is your conclusion?

Exercise #14

Since Prob > |t| is less than the significance level (0.05), we REJECT the NULL HYPOTHESIS.

Practical Conclusion: The process is not centered; check the machine for Y-offset.

If Ho is True
2 Sided Test

Ho: μ = 0
H1: μ ≠ 0

[Figure: JMP P-value graph with the acceptance region (95%) between two rejection regions (2.5% each).]

By default, this graph shows a two-sided test. To see a one-sided test, go to "P(value) Animation".

Statistical Conclusion: Fail to Reject Ho
If Ho is True
1 Upper Sided Test

Ho: μ ≤ 0
H1: μ > 0

[Figure: acceptance region (95%) followed by a single rejection region (5.0%) in the upper tail.]

Statistical Conclusion: Fail to Reject Ho

If Ho is True
1 Lower Sided Test

Ho: μ ≥ 0
H1: μ < 0

[Figure: a single rejection region (5.0%) in the lower tail followed by the acceptance region (95%).]

Statistical Conclusion: Reject Ho (the data support H1)
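The three conclusions above come from the same test statistic read against different alternatives. The following minimal sketch computes the three P-values, assuming the test statistic for these data is t = −1.8778 with dof = 29, as reported in the confidence interval vs hypothesis testing comparison. The t CDF is a hypothetical standard-library stand-in (Simpson's rule on the density) for what JMP computes internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] (steps must be even), then symmetry."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

t_stat, df = -1.8778, 29
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))  # H1: mu != 0 (JMP: Prob > |t|)
p_upper = 1 - t_cdf(t_stat, df)                 # H1: mu > 0
p_lower = t_cdf(t_stat, df)                     # H1: mu < 0

for name, p in (("two-sided", p_two_sided), ("upper", p_upper), ("lower", p_lower)):
    decision = "Reject Ho" if p < 0.05 else "Fail to Reject Ho"
    print(f"{name:9s} p = {p:.4f} -> {decision}")
```

Only the lower one-sided test rejects Ho at α = 0.05. Its P-value is half the two-sided one, because all of α sits in the tail where the statistic lies.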

Side by Side Comparison: Confidence Interval vs Hypothesis Testing on the same data

Target of Interest: 0

Confidence Interval
1. Establish the distribution of the sample average (given a sample of size n): t-distribution with dof = 29.

Hypothesis Testing
1. Establish the hypothesis statement: Ho: μ = 0, H1: μ ≠ 0.
2. Establish the distribution of the sample average (given a sample of size n), assuming Ho is true: t-distribution with dof = 29, centered at 0.

Confidence Interval
2. Define the significance level, α (in this case 0.05).
3. Establish the upper and lower confidence limits (2.5% in each tail; critical values −2.045 and 2.045):

$$\bar{X} - t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$$

Lower C.L. = (−1.27) − (2.045)(3.69/√30) = −2.65
Upper C.L. = (−1.27) + (2.045)(3.69/√30) = 0.11

Hypothesis Testing
3. Define the significance level, α (in this case 0.05).
4. Establish the acceptance and rejection regions: REJECT below the critical value −2.045, ACCEPTANCE REGION between −2.045 and 2.045, REJECT above 2.045 (2.5% in each rejection region). The test statistic is −1.8778 (the figure also marks its mirror value 1.8778). This corresponds to the "yellow" area shown in the figure; hence we know the test statistic is within the acceptance region.

Hypothesis Testing
5. Statistical Conclusion: P(value) > 0.05 ➔ Fail to Reject Ho ➔ Process is centered.

Confidence Interval
4. Statistical Conclusion: the target of interest (zero) is within the CI ➔ Process is centered.

Both inferential methods provide the same statistical conclusion.
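The agreement is not a coincidence. μ0 lies inside the interval x̄ ± t·s/√n exactly when |x̄ − μ0| < t·s/√n, which is the same as |test statistic| < critical value. This minimal sketch checks that with the slide's summary statistics. Because it uses the rounded inputs, the test statistic comes out as about −1.885 rather than JMP's −1.8778.

```python
from math import sqrt

xbar, s, n = -1.27, 3.69, 30   # summary statistics from the slides
t_crit = 2.045                 # t table value, dof = 29, alpha/2 = 0.025
mu0 = 0                        # target of interest

std_err = s / sqrt(n)
lower_cl = xbar - t_crit * std_err
upper_cl = xbar + t_crit * std_err
t_stat = (xbar - mu0) / std_err

target_in_ci = lower_cl < mu0 < upper_cl      # confidence interval view
fail_to_reject = abs(t_stat) < t_crit         # hypothesis testing view
print(f"CI = ({lower_cl:.2f}, {upper_cl:.2f}), t = {t_stat:.3f}")
print("target in CI:", target_in_ci, "| fail to reject Ho:", fail_to_reject)
```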

Practical simulation showing that the confidence interval and the hypothesis test provide the same statistical result when performed at the same significance level, α (simulation using sheet Pop 1 of the Central Limit Theorem.xls file, where the known population mean is 5.02).

Confidence interval performed at α = 0.05 (95% confidence level):
[Figure: simulated confidence intervals, with the population mean @ 5.02 marked.]
On average, 4/100 CIs do not contain µ.

Hypothesis statement: Ho: μ = 5.02, H1: μ ≠ 5.02. Hypothesis test performed at α = 0.05 (95% confidence level):
On average, 4/100 tests give P(value) < 0.05.
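A similar experiment can be sketched without the spreadsheet. This minimal simulation draws from a normal population with the slide's mean of 5.02. The standard deviation σ = 1.0, the seed and the 1000 repetitions are hypothetical choices, and sheet Pop 1 itself is not reproduced. Every sample whose 95% CI misses μ is also a sample whose test rejects Ho: μ = 5.02, and both rates land near α = 0.05.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2024)       # fixed seed so the run is repeatable
mu, sigma, n = 5.02, 1.0, 30    # population mean from the slide; sigma is hypothetical
t_crit = 2.045                  # t table value, dof = 29, alpha/2 = 0.025
reps = 1000

ci_misses = rejections = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar, se = mean(sample), stdev(sample) / sqrt(n)
    # Confidence interval view: does the 95% CI fail to contain the true mean?
    if not (xbar - t_crit * se < mu < xbar + t_crit * se):
        ci_misses += 1
    # Hypothesis test view: is |t| beyond the critical value for Ho: mu = 5.02?
    if abs((xbar - mu) / se) > t_crit:
        rejections += 1

print(f"CIs missing mu: {ci_misses}/{reps}, rejections of Ho: {rejections}/{reps}")
```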
Errors in making decisions

Type I Error – "reject a true null hypothesis"
• The probability of a Type I Error is α
• α is called the "level of significance" of the test
• α is set by the researcher in advance

Type II Error – "fail to reject a false null hypothesis"
• The probability of a Type II Error is β

Outcomes and probabilities

Possible Hypothesis Test Outcomes

Decision           | Actual situation: H0 True | Actual situation: H0 False
Do Not Reject H0   | No Error (1 − α)          | Type II Error (β)
Reject H0          | Type I Error (α)          | No Error (1 − β)

Key: Outcome (Probability)

Type I & II Error Relationship

▪ Type I and Type II errors cannot happen at the same time
▪ A Type I error can only occur if H0 is true
▪ A Type II error can only occur if H0 is false

If the Type I error probability (α) ↑, then the Type II error probability (β) ↓.

Lowering α, the probability of a Type I error (with no change in the available data), increases β, the probability of a Type II error.
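The trade-off can be made concrete for a one-sided upper test. This minimal sketch assumes a normal population with known σ, testing H0: μ ≤ μ0 when the true mean is μ1 > μ0. Then β = Φ(z₁₋α − (μ1 − μ0)√n/σ). The scenario values (μ0 = 0, μ1 = 0.5, σ = 1, n = 16) and the helper name `type2_error` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type2_error(alpha, mu0, mu1, sigma, n):
    """beta of a one-sided upper z-test of H0: mu <= mu0 when the true mean is mu1 > mu0.

    Assumes a normal population with known sigma.
    """
    std_normal = NormalDist()
    shift = (mu1 - mu0) * sqrt(n) / sigma            # distance between hypotheses in std-error units
    return std_normal.cdf(std_normal.inv_cdf(1 - alpha) - shift)

# Same data (n, sigma) and same true mean: only alpha changes
betas = {}
for alpha in (0.10, 0.05, 0.01):
    betas[alpha] = type2_error(alpha, mu0=0.0, mu1=0.5, sigma=1.0, n=16)
    print(f"alpha = {alpha:.2f} -> beta = {betas[alpha]:.3f}")
```

As α shrinks, the critical value moves right and more of the H1 distribution falls in the acceptance region, so β grows.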

Consequences of Type I & II Error

EXAMPLE on the consequences of the two types of error.

"We want to test whether two pieces of equipment are, on average, aligned. Suppose that a sample of size n1 is drawn from the population of measurements of Equipment 1 and that a second sample of size n2 is drawn from the population of measurements of Equipment 2. Moreover, let's assume that: (a) both populations are normally distributed with known (and equal) variances, (b) the samples are independent and (c) n1 = n2."

The hypotheses to test are:
H0: μ1 − μ2 = 0 ⇒ If true, the equipment are aligned.
H1: μ1 − μ2 ≠ 0 ⇒ If true, the equipment are misaligned.

Type I Error
"Reject H0 which is true": we waste effort, time and money trying to align equipment that is already aligned.

Type II Error
"Accept H0 which is false": we do nothing to match the equipment, since we think it is aligned. But it is not, and this error will increase process variability.

Consequences of Type I & II Error

EXAMPLE on the consequences of the two types of error.

A production process is monitored with SPC techniques. The hypotheses to test are:
H0: "The process is in control".
H1: "The process is out of control".

Type I Error
"Reject H0 which is true": actually, the process is in control, but we erroneously think that it is not. In this case, a FALSE ALARM has occurred. Thus, we waste effort, time and money investigating and trying to remove the effect of "special causes" of variability which, actually, do not exist.

Type II Error
"Accept H0 which is false": we fail to detect that the process has gone out of control. The effects of this error are considered very dangerous.

40
ST Restricted
Factors affecting type II Error
All else equal,
o β increases when the difference between the hypothesized parameter and its true value decreases
o β increases when α decreases
o β increases when σ increases
o β increases when n decreases
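The four effects listed above can be checked numerically. The sketch below is a minimal illustration, assuming a right-tailed z-test on the mean of one normal population with known σ; the function name `beta_right_tailed` and all numeric inputs (μ0 = 52, μ1 = 54, σ = 10, n = 64, α = 0.10) are hypothetical choices, not values prescribed by the course.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal Z ~ N(0, 1)

def beta_right_tailed(mu0, mu1, sigma, n, alpha):
    """Type II error probability of the right-tailed z-test
    H0: mu <= mu0 vs H1: mu = mu1 > mu0, with sigma known (hypothetical helper)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)       # critical value z_alpha
    x_crit = mu0 + z_alpha * sigma / sqrt(n)      # H0 is rejected if x_bar > x_crit
    # beta = P(X_bar <= x_crit | mu = mu1): H0 is not rejected although H1 is true
    return NormalDist(mu1, sigma / sqrt(n)).cdf(x_crit)

# Hypothetical baseline, then change one factor at a time ("all else equal")
base = dict(mu0=52, mu1=54, sigma=10, n=64, alpha=0.10)
variants = {
    "baseline": base,
    "smaller difference (mu1 = 53)": {**base, "mu1": 53},
    "smaller alpha (0.01)": {**base, "alpha": 0.01},
    "larger sigma (20)": {**base, "sigma": 20},
    "smaller n (16)": {**base, "n": 16},
}
for label, params in variants.items():
    print(f"{label:32s} beta = {beta_right_tailed(**params):.3f}")
```

Each variant prints a larger β than the baseline, which is the pattern stated in the four bullets.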
Relationship between α and β
Test on 1 tail (right)
H0: μ ≤ μ0
H1: μ = μ1 > μ0
(Figure: the distributions of the test statistic under H0 and under H1, separated by the critical value. To its left lie the H0 ACCEPTANCE REGION and the H1 REJECT REGION; to its right lie the H0 REJECT REGION and the H1 ACCEPTANCE REGION.)
The values of α and β are linked.
Given α, the value of β depends on:
• The distribution of the considered variable (in the graphical example, it is normal)
• The distance between the values of the parameter in the null and alternative hypotheses; in this case, the distance between μ0 and μ1
• The population variance.
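The link between α and β can also be seen by simulation. This is a minimal Monte Carlo sketch, assuming the right-tailed z-test of the figure (normal data, σ known). The values μ0 = 52, σ = 10 and n = 64 mirror the phone-bill example of this module, while μ1 = 54, the number of replications and the random seed are hypothetical. The rejection rate when μ = μ0 estimates α; one minus the rejection rate when μ = μ1 estimates β.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mu, mu0=52.0, sigma=10.0, n=64, alpha=0.10, reps=5000, seed=1):
    """Share of simulated samples in which the right-tailed z-test rejects H0: mu <= mu0."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_alpha = NormalDist().inv_cdf(1 - alpha)     # critical value z_alpha
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]  # normal population
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > z_alpha:
            rejections += 1
    return rejections / reps

alpha_hat = rejection_rate(true_mu=52.0)        # H0 true: estimates alpha
beta_hat = 1 - rejection_rate(true_mu=54.0)     # H1 true (hypothetical mu1 = 54): estimates beta
print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

Re-running with a smaller `alpha` (for example 0.01) lowers the estimated α but raises the estimated β: for a fixed sample size, the two errors cannot both be reduced.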
Significance, Confidence, Power
• α: probability of making a type I error (Level of significance)
• (1-α): probability of not making a type I error (Level of confidence)
• β: probability of making a type II error (no specific name)
• (1-β): probability of not making a type II error (Power)
Hypothesis testing procedures on population parameters (one normal population)
Hypothesis tests on one population parameters
Tests on the parameters of one normally distributed population:
Hypothesis Tests
• Population Mean
o σ² Known
o σ² Unknown
• Population Variance
• Population Proportion
Hypothesis tests on one population parameters
There are two equivalent ways to test hypotheses. They are separated just for training purposes; however, they are two faces of the same coin. In everyday work, we shall use only the p-value.
• Use the CRITICAL VALUE (which defines a REJECTION REGION)
• Use the P-VALUE
HYPOTHESIS TESTING PROCEDURE (Critical Value and the Rejection Region).
1. Define the HYPOTHESES to test.
2. Draw a SAMPLE from the population.
3. Define and calculate the TEST STATISTIC using sample data (done by statistical software).
4. Define the LEVEL OF SIGNIFICANCE (α).
5. Find the CRITICAL VALUE (from Statistical Tables) and the REJECTION REGION.
6. Make your DECISION (Reject or not H0).
DECISION RULE:
Test statistic > Critical value (α level) → REJECT H0
Test statistic ≤ Critical value (α level) → DO NOT REJECT H0
Hypothesis tests on one population parameters
The Hypotheses
Assumption: Normal population
Context: Variance known
• H0: μ = μ0 vs H1: μ ≠ μ0
• H0: μ = μ0 (or μ ≤ μ0) vs H1: μ > μ0
• H0: μ = μ0 (or μ ≥ μ0) vs H1: μ < μ0
μ0 is a constant!
TEST STATISTIC, CRITICAL VALUE and REJECTION RULE
❑ TEST STATISTIC – A statistic is a test statistic for hypotheses H0 and H1 if its distribution is known when the null hypothesis (H0) is true. Its value is calculated on sample data by statistical software.
❑ CRITICAL VALUE – It depends on the level of significance (α) of the test. It is the largest (absolute) value of the test statistic (for a given α) that does not lead to rejecting H0. This value is found in Statistical Tables.
❑ REJECTION RULE – It tells us when H0 must be rejected. Typically, «Reject H0 for “large” values of the test statistic», where “LARGE” means that the absolute value of the test statistic is larger than the CRITICAL VALUE.
Test Statistic and Critical Value
EXAMPLE
“A phone industry manager thinks that customers’ monthly cell phone bills have increased and now average over $52 per month. The company wishes to test this claim.” (Assume σ = 10 is known.)
HYPOTHESIS TESTING PROCEDURE (Method: critical value and rejection region):
1. Hypotheses formulation:
H0: μ ≤ 52 The average is not over $52 per month.
H1: μ > 52 The average is greater than $52 per month.
2. Sample extraction: the following results are obtained: n = 64, x̄ = 53.1, and it is known that σ = 10.
3. Test statistic calculation: \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88 \)
4. Level of significance: α = 0.10
5. Critical value and rejection region: \( z_{\alpha = 0.10} = 1.28 \) (from statistical tables). Rejection region: z > 1.28
6. Decision: Do not reject H0 at the significance level α = 0.10, since z = 0.88 < 1.28 (i.e., at level 10%, there is not sufficient evidence that the mean bill is over $52).
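The six steps above can be reproduced in a few lines. This is a minimal sketch using only Python's standard library, where `statistics.NormalDist().inv_cdf` replaces the statistical table for the critical value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data of the phone-bill example (sigma assumed known)
n, x_bar, sigma = 64, 53.1, 10
mu0, alpha = 52, 0.10

z = (x_bar - mu0) / (sigma / sqrt(n))        # step 3: test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)    # step 5: critical value for the right tail
reject = z > z_alpha                         # step 6: is z in the rejection region z > z_alpha?

print(f"z = {z:.2f}, critical value = {z_alpha:.2f}, reject H0: {reject}")
```

Running it prints `z = 0.88, critical value = 1.28, reject H0: False`, matching the decision above.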
Test Statistic and Critical Value
GRAPHICALLY (test on the right tail):
(Figure: standard normal curve of Z centered at μ = 0. The test statistic z = 0.88 lies in the DO NOT REJECT H0 region, to the left of the critical value zα = 1.28; the REJECTION REGION (H0) to the right of 1.28 has area α = 0.1, and the remaining area is (1-α) = 0.90.)
The Test Statistic has not fallen into the Rejection Region → Do not reject H0
The P-Value
Two equivalent ways to describe the hypothesis testing procedure:
• Use the CRITICAL VALUE (which defines a REJECTION REGION)
• Use the P-VALUE
HYPOTHESIS TESTING PROCEDURE (Using the P-Value).
1. Define the HYPOTHESES to test.
2. Extract a SAMPLE from the population.
3. Define and calculate the TEST STATISTIC using sample data (by statistical software).
4. Define the LEVEL OF SIGNIFICANCE (α).
5. Calculate the P-VALUE (done by statistical software).
6. Draw your CONCLUSIONS (Reject or not H0) (*).
DECISION RULE:
P-value < α → REJECT H0
P-value ≥ α → DO NOT REJECT H0
The P-Value
Definition: The p-value is the smallest value of α for which H0 can be rejected.
Graphically: (Figure: distribution of the test statistic Z under H0, centered at 0. The critical value separates the “Do not reject H0” and “Reject H0” regions; the p-value is the tail area beyond the observed test statistic.)
The P-Value
EXAMPLE
Consider again the previous example 3.3 of the “phone industry manager”.
1. Hypotheses formulation:
H0: μ ≤ 52 The average is not over $52 per month.
H1: μ > 52 The average is greater than $52 per month.
2. Sample results: n = 64, x̄ = 53.1, and σ = 10 (assumed known).
3. Test statistic (*): \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88 \)
4. Level of significance: α = 0.10
5. P-value calculation (*):
\( p\text{-value} = P(\bar{X} \ge 53.1 \mid \mu = 52) = P\left(Z \ge \frac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(Z \ge 0.88) = 1 - F(0.88) = 1 - 0.8106 = 0.1894 \)
6. Conclusions: Do not reject H0 at the significance level α = 0.1, since p-value = 0.1894 > α = 0.10.
(*) Calculations are carried out by statistical software. For more details, see also the Manual of Statistical Methodology, ANNEX 6 (8482919 ver. 2).
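The p-value of step 5 can be computed the same way. This is a minimal sketch with Python's standard library, where `NormalDist().cdf` plays the role of F, the standard normal cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist

# Data of the phone-bill example (sigma assumed known)
n, x_bar, sigma, mu0, alpha = 64, 53.1, 10, 52, 0.10

z = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic
p_value = 1 - NormalDist().cdf(z)       # right-tailed test: P(Z >= z)

print(f"p-value = {p_value:.4f}")
print("REJECT H0" if p_value < alpha else "DO NOT REJECT H0")
```

It prints `p-value = 0.1894` and `DO NOT REJECT H0`, the same conclusion reached with the critical-value method.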
The P-Value
Graphically: p-value = 0.1894
(Figure: standard normal curve of Z centered at 0. The area α = 0.1 lies to the right of the critical value Zα = 1.28 (“Reject H0”); the test statistic Z = 0.88 falls in the “Do not reject H0” region, and the p-value is the area to the right of 0.88.)
Summary of Rules
SUMMARY OF REJECTION RULES USING BOTH METHODS:
• Is |Test Statistic| large? YES → REJECT H0; NO → DO NOT REJECT H0
• Is the P-value large? YES → DO NOT REJECT H0; NO → REJECT H0
• The “test statistic” is considered large if: |Test Statistic| > Critical Value (tables)
• The “p-value” is considered large if: P-value ≥ Significance Level (α)
• According to the formulation of H1 (one- or two-sided test), the comparison between the test statistic and the critical value is carried out according to the following rules:
o One-sided (left) test - reject H0 if test statistic < critical value
o One-sided (right) test - reject H0 if test statistic > critical value
o Two-sided test - reject H0 if test statistic < critical value 1 OR if test statistic > critical value 2
• The comparison between the significance level (α) and the p-value is carried out according to the following rule: reject H0 if p-value < significance level (α)
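The rules above can be written as two small functions. A quick loop then confirms that, for a z statistic, the critical-value rule and the p-value rule always reach the same decision. This is a sketch: the function names and the values of z are illustrative assumptions, and the two-sided critical values are passed as a (critical value 1, critical value 2) pair.

```python
from statistics import NormalDist

def reject_by_critical_value(stat, alternative, crit):
    """Critical-value rule. For a two-sided test, crit is the pair (critical value 1, critical value 2)."""
    if alternative == "left":        # one-sided (left) test
        return stat < crit
    if alternative == "right":       # one-sided (right) test
        return stat > crit
    if alternative == "two-sided":
        lower, upper = crit
        return stat < lower or stat > upper
    raise ValueError(f"unknown alternative: {alternative!r}")

def reject_by_p_value(p_value, alpha):
    """p-value rule: reject H0 if p-value < alpha."""
    return p_value < alpha

# Check that both rules agree for a z statistic (illustrative values of z)
Z, alpha = NormalDist(), 0.05
for z in (-2.5, -1.0, 0.88, 1.8, 2.2):
    cases = {  # alternative -> (critical value(s), p-value)
        "left": (Z.inv_cdf(alpha), Z.cdf(z)),
        "right": (Z.inv_cdf(1 - alpha), 1 - Z.cdf(z)),
        "two-sided": ((Z.inv_cdf(alpha / 2), Z.inv_cdf(1 - alpha / 2)), 2 * (1 - Z.cdf(abs(z)))),
    }
    for alt, (crit, p) in cases.items():
        assert reject_by_critical_value(z, alt, crit) == reject_by_p_value(p, alpha)
        print(f"z = {z:5.2f}  {alt:9s}  reject H0: {reject_by_p_value(p, alpha)}")
```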
Summary of Rules
The methodology illustrated so far can be adopted for all the cases in the next slides.
The only differences between them regard:
➢ the hypotheses
➢ the test statistic
Next slides will show only summary tables.
In all cases, we assume the populations to be normally distributed.
Summary of Tests
Hypothesis Tests > Population Mean > σ² Known (the other branches are: σ² Unknown, Population Variance, Population Proportion)
FOR THE MEAN - variance known
The Test Statistic \( z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \) is a value of the standard normal distribution.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• H0: μ = μ0 (or H0: μ ≥ μ0), H1: μ < μ0: reject H0 if \( z < -z_{\alpha} \)
• H0: μ = μ0 (or H0: μ ≤ μ0), H1: μ > μ0: reject H0 if \( z > z_{\alpha} \)
• H0: μ = μ0, H1: μ ≠ μ0: reject H0 if \( |z| > z_{\alpha/2} \iff z > z_{\alpha/2} \) OR \( z < -z_{\alpha/2} \)
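A minimal sketch of the z-test for the mean with known variance, covering the three forms of H1 in the table above. The function name `z_test_mean`, the sample data and σ = 0.3 are hypothetical; the p-value is computed with Python's `statistics.NormalDist` instead of statistical tables.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_test_mean(sample, mu0, sigma, alternative="two-sided"):
    """One-sample z-test for the mean of a normal population with known sigma.
    Returns (z, p_value)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    Z = NormalDist()
    if alternative == "left":          # H1: mu < mu0
        p_value = Z.cdf(z)
    elif alternative == "right":       # H1: mu > mu0
        p_value = 1 - Z.cdf(z)
    elif alternative == "two-sided":   # H1: mu != mu0
        p_value = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

sample = [10.2, 9.8, 10.5, 10.1, 10.4]   # hypothetical measurements
z, p = z_test_mean(sample, mu0=10, sigma=0.3)
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```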
Summary of Tests
Hypothesis Tests > Population Mean > σ² Unknown (the other branches are: σ² Known, Population Variance, Population Proportion)
FOR THE MEAN - variance unknown
The Test Statistic \( t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \) is a value of the t distribution with (n – 1) DF.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• H0: μ = μ0 (or H0: μ ≥ μ0), H1: μ < μ0: reject H0 if \( t < -t_{n-1,\,\alpha} \)
• H0: μ = μ0 (or H0: μ ≤ μ0), H1: μ > μ0: reject H0 if \( t > t_{n-1,\,\alpha} \)
• H0: μ = μ0, H1: μ ≠ μ0: reject H0 if \( |t| > t_{n-1,\,\alpha/2} \iff t > t_{n-1,\,\alpha/2} \) OR \( t < -t_{n-1,\,\alpha/2} \)
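A minimal sketch for the case σ² unknown, assuming hypothetical data and a two-sided test at α = 0.05. Python's standard library has no t distribution, so the critical value t(9, 0.025) = 2.262 is taken from statistical tables, as in the critical-value procedure. If SciPy is available, `scipy.stats.ttest_1samp` returns the statistic and the p-value directly.

```python
from math import sqrt
from statistics import fmean, stdev

def t_statistic_mean(sample, mu0):
    """Return t = (x_bar - mu0) / (s / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(sample)
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))   # stdev uses the n - 1 divisor
    return t, n - 1

sample = [10.2, 9.8, 10.5, 10.1, 10.4, 9.9, 10.3, 10.6, 10.0, 10.2]  # hypothetical data
t, df = t_statistic_mean(sample, mu0=10)
t_crit = 2.262  # t(9, 0.025) from statistical tables: two-sided test, alpha = 0.05
print(f"t = {t:.3f} with {df} DF; reject H0 (two-sided): {abs(t) > t_crit}")
```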
Summary of Tests
Hypothesis Tests > Population Variance (the other branches are: Population Mean with σ² Known or Unknown, Population Proportion)
FOR THE VARIANCE
The Test Statistic \( \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2} \) is a value of the χ² (Chi-squared) distribution with n-1 d.f.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• \( H_0: \sigma^2 = \sigma_0^2 \) (or \( \sigma^2 \ge \sigma_0^2 \)), \( H_1: \sigma^2 < \sigma_0^2 \): reject H0 if \( \chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha} \)
• \( H_0: \sigma^2 = \sigma_0^2 \) (or \( \sigma^2 \le \sigma_0^2 \)), \( H_1: \sigma^2 > \sigma_0^2 \): reject H0 if \( \chi^2_{n-1} > \chi^2_{n-1,\,\alpha} \)
• \( H_0: \sigma^2 = \sigma_0^2 \), \( H_1: \sigma^2 \ne \sigma_0^2 \): reject H0 if \( \chi^2_{n-1} > \chi^2_{n-1,\,\alpha/2} \) OR \( \chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha/2} \)
(Figure, GRAPHICALLY: for each case, the χ² density with the rejection region shaded: area α in the left tail below \( \chi^2_{n-1,\,1-\alpha} \), area α in the right tail above \( \chi^2_{n-1,\,\alpha} \), or area α/2 in each tail for the two-sided test.)
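A minimal sketch of the χ² test on the variance, assuming hypothetical data, σ0² = 0.05 and a two-sided test at α = 0.05. Python's standard library has no χ² distribution, so the critical values for 9 degrees of freedom (χ²(9, 0.025) = 19.023 and χ²(9, 0.975) = 2.700) are taken from chi-squared tables.

```python
from statistics import variance

def chi2_statistic_variance(sample, sigma0_sq):
    """Return chi2 = (n - 1) s^2 / sigma0^2 and its degrees of freedom n - 1."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq, n - 1   # variance uses the n - 1 divisor

sample = [10.2, 9.8, 10.5, 10.1, 10.4, 9.9, 10.3, 10.6, 10.0, 10.2]  # hypothetical data
chi2, df = chi2_statistic_variance(sample, sigma0_sq=0.05)

# Two-sided test at alpha = 0.05 with df = 9 (values from chi-squared tables)
chi2_upper = 19.023  # leaves area alpha/2 in the right tail
chi2_lower = 2.700   # leaves area 1 - alpha/2 in the right tail
reject = chi2 > chi2_upper or chi2 < chi2_lower
print(f"chi2 = {chi2:.2f} with {df} d.f.; reject H0: {reject}")
```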
Exercise #14.1
Note: How to perform hypothesis testing on one population mean has been shown in Exercise 14.
1. Open the exercise File:
2. Trainer will show using JMP:
a. How to perform Hypothesis Testing on One Pop Variance.
3. Interpretation of results.
Exercise #14.1
First, create a distribution: go to Analyze > Distribution and cast the 2 columns to Y.
Then perform the hypothesis test: use the hotspot and select Test Std Dev.
The value to enter should be the variance.
Exercise #14.1
H0: σ² = 1
H1: σ² ≠ 1
(Screenshot annotations mark the p-value to read if the test is 2-sided, left-tailed or right-tailed.)
Since Min PValue is less than the significance level (0.05), we REJECT the NULL HYPOTHESIS.
Practical Conclusion:
The process variance is statistically different from 1.
Summary of Tests
Hypothesis Tests > Population Proportion (the other branches are: Population Mean with σ² Known or Unknown, Population Variance)
FOR THE PROPORTION
The Test Statistic \( z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \) is a value of the standard normal distribution.
ASSUMPTION: The binomial distribution can be approximated by a normal distribution.
Rule of thumb → The normal approximation holds when np(1-p) > 9.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• \( H_0: p = p_0 \) (or \( p \ge p_0 \)), \( H_1: p < p_0 \): reject H0 if \( z < -z_{\alpha} \)
• \( H_0: p = p_0 \) (or \( p \le p_0 \)), \( H_1: p > p_0 \): reject H0 if \( z > z_{\alpha} \)
• \( H_0: p = p_0 \), \( H_1: p \ne p_0 \): reject H0 if \( z < -z_{\alpha/2} \) OR \( z > z_{\alpha/2} \)
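A minimal sketch of the z-test on a proportion, assuming hypothetical counts (170 conforming items out of 200, p0 = 0.8). Evaluating the rule of thumb at p0, in place of the unknown p, is an assumption of this sketch; the function name `z_test_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p0, alternative="two-sided"):
    """z-test for one proportion using the normal approximation to the binomial.
    Returns (z, p_value)."""
    # Rule of thumb, evaluated at the hypothesized p0 (assumption of this sketch)
    if n * p0 * (1 - p0) <= 9:
        raise ValueError("normal approximation not justified: n*p0*(1-p0) <= 9")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    Z = NormalDist()
    if alternative == "left":          # H1: p < p0
        p_value = Z.cdf(z)
    elif alternative == "right":       # H1: p > p0
        p_value = 1 - Z.cdf(z)
    elif alternative == "two-sided":   # H1: p != p0
        p_value = 2 * (1 - Z.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

z, p = z_test_proportion(170, 200, p0=0.8)   # hypothetical counts
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```

With α = 0.05 this hypothetical sample would not lead to rejecting H0, since the two-sided p-value is above 0.05.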
Exercise #14.2
1. Open the exercise File:
2. Trainer will show using JMP:
a. How to perform Hypothesis Testing on One Pop Proportion.
3. Interpretation of results.
Exercise #14.2
First, create a distribution: go to Analyze > Distribution, cast Conformity to Y, then click OK.
Then perform the hypothesis test: use the hotspot and select Test Probabilities.
(Screenshot annotation: the hypothesized probabilities must sum to 1.)
Since the Prob value is greater than the significance level (0.05), we FAIL TO REJECT the NULL HYPOTHESIS.
Practical Conclusion:
The proportion of conforming items is not statistically different from 0.8.
Exercise #14.2
Using Sample Calculator in Add-Ins > Statistics Calculator > CI for One Proportion:
Select Raw Data. Use Conformity for the Test.
Exercise #14.2
Input the same value as in the previous slide (Hypothesized proportion).
Input alpha (0.05).
Hypothesis testing on parameters of two normal populations
Summary of Tests
GENERAL SCHEME:
HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
• DIFFERENCE of 2 means
o DEPENDENT SAMPLES
o INDEPENDENT SAMPLES
- Variances KNOWN
- Variances UNKNOWN: Variances assumed EQUAL, or Variances assumed UNEQUAL
• RATIO of 2 variances
• DIFFERENCE of 2 proportions
• Correlation coefficient
Summary of Tests
HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS > DIFFERENCE of 2 means > DEPENDENT SAMPLES
Test the difference of means of two dependent (or “paired”) samples
The Test Statistic \( t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} \) is a value of the t distribution with (n – 1) degrees of freedom, where \( \bar{d} \) and \( s_d \) are the mean and the standard deviation of the n paired differences.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• H0: μx - μy = (≥) d0, H1: μx - μy < d0: reject H0 if \( t < -t_{n-1,\,\alpha} \)
• H0: μx - μy = (≤) d0, H1: μx - μy > d0: reject H0 if \( t > t_{n-1,\,\alpha} \)
• H0: μx - μy = d0, H1: μx - μy ≠ d0: reject H0 if \( t > t_{n-1,\,\alpha/2} \) OR \( t < -t_{n-1,\,\alpha/2} \)
Of particular interest is the case d0 = 0, to test the equality of the population means.
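A minimal sketch of the paired t statistic, assuming hypothetical paired measurements and a two-sided test at α = 0.05. The critical value t(4, 0.025) = 2.776 is taken from statistical tables; the function name `paired_t_statistic` is hypothetical. If SciPy is available, `scipy.stats.ttest_rel` performs the same test on the two columns.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_statistic(x, y, d0=0.0):
    """Paired t statistic on the differences d = x - y.
    Returns (t, degrees_of_freedom)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]   # one difference per pair
    n = len(d)
    t = (fmean(d) - d0) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical measurements of the same 5 units under two conditions
before = [12.1, 11.8, 12.5, 12.0, 12.3]
after = [11.9, 11.7, 12.1, 11.8, 12.0]
t, df = paired_t_statistic(before, after)
t_crit = 2.776  # t(4, 0.025) from statistical tables: two-sided test, alpha = 0.05
print(f"t = {t:.3f} with {df} DF; reject H0: {abs(t) > t_crit}")
```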
Exercise #15
1. Open the exercise File:
2. Trainer will show using JMP:
a. How to perform Hypothesis Testing.
3. Interpretation of results.
Exercise #15
Go to Analyze > Specialized Modeling > Matched Pair
Cast T0 and T500 columns to Y, then OK.
(Screenshot annotations mark the p-value to read if the test is 2-sided, right-tailed or left-tailed.)
The report also gives the confidence interval for the mean difference.
Since the Prob value is less than the significance level (0.05), we REJECT the NULL HYPOTHESIS.
Mean difference is not equal to 0.
Exercise #15
Go to Analyze > Distribution > Test Mean
Conclusion:
Since the Prob value is less than the significance level (0.05), we REJECT the NULL HYPOTHESIS.
Mean difference is not equal to 0.
Summary of Tests
HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS > DIFFERENCE of 2 means > INDEPENDENT SAMPLES > Variances KNOWN
Test the difference of means of two independent samples – variances known
The Test Statistic \( Z = \frac{\bar{x} - \bar{y} - d_0}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}} \) is a value of the standard normal distribution.
Reject H0 if p-value < α or, equivalently, if the test statistic satisfies the condition stated below:
• H0: μx - μy = (≥) d0, H1: μx - μy < d0: reject H0 if \( z < -z_{\alpha} \)
• H0: μx - μy = (≤) d0, H1: μx - μy > d0: reject H0 if \( z > z_{\alpha} \)
• H0: μx - μy = d0, H1: μx - μy ≠ d0: reject H0 if \( z < -z_{\alpha/2} \) OR \( z > z_{\alpha/2} \)
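A minimal sketch of the two-sample z-test with known variances, assuming hypothetical samples and hypothetical known standard deviations σx = σy = 0.2. The function name `two_sample_z` is illustrative, and only the two-sided p-value is returned.

```python
from math import sqrt
from statistics import NormalDist, fmean

def two_sample_z(x, y, sigma_x, sigma_y, d0=0.0):
    """z = (x_bar - y_bar - d0) / sqrt(sigma_x^2/n_x + sigma_y^2/n_y), variances known.
    Returns (z, two_sided_p_value)."""
    se = sqrt(sigma_x**2 / len(x) + sigma_y**2 / len(y))   # standard error of x_bar - y_bar
    z = (fmean(x) - fmean(y) - d0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

x = [5.1, 4.9, 5.3, 5.0, 5.2, 5.1]   # hypothetical sample from population X
y = [4.8, 4.7, 5.0, 4.9, 4.6, 4.8]   # hypothetical sample from population Y
z, p = two_sample_z(x, y, sigma_x=0.2, sigma_y=0.2)
print(f"z = {z:.3f}, two-sided p-value = {p:.4f}")
```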
Exercise #16
1. Open the exercise File:
2. Trainer will show using JMP:
a. How to perform Hypothesis Testing on 2 Independent Samples with Variance Known.
3. Interpretation of results.
Exercise #16
Go to Tables > Stack
Go to Add-Ins > Statistical Calculators > Hypothesis Test for Two Means
Exercise #16
Cast the columns to the appropriate fields > OK
(Screenshot annotations mark the Known variance and Unknown variance options.)
Conclusion:
Since the Prob value is less than the significance level (0.05), we REJECT the NULL HYPOTHESIS.
Mean difference is not equal to 0.
* If a historical standard deviation estimated from a larger sample is available, use it instead of the one computed from these 30 samples.
Exercise #16
Cast the columns to the appropriate fields > OK
Unknown variance: select the Variance option based on the scenario.
Module 4

HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
Summary of Tests
• DIFFERENCE of 2 means
   • DEPENDENT samples
   • INDEPENDENT samples
      • Variances KNOWN
      • Variances UNKNOWN
         • Variances assumed EQUAL
         • Variances assumed UNEQUAL
• RATIO of 2 variances
• DIFFERENCE of 2 proportions
• Correlation coefficient

Test the difference of means of independent samples – variances unknown but assumed equal

The test statistic
$$t = \frac{\bar{x} - \bar{y} - d_0}{\sqrt{s_p^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}$$
is a value of the t distribution with (n_x + n_y − 2) d.f., where the pooled variance s_p^2 is defined as:
$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$

HYPOTHESES | REJECT H0 IF p-value < α, or if the test statistic satisfies the condition below
H0: μx − μy = (≥) d0, H1: μx − μy < d0 | $t < -t_{n_x+n_y-2,\,\alpha}$
H0: μx − μy = (≤) d0, H1: μx − μy > d0 | $t > t_{n_x+n_y-2,\,\alpha}$
H0: μx − μy = d0, H1: μx − μy ≠ d0 | $t < -t_{n_x+n_y-2,\,\alpha/2}$ OR $t > t_{n_x+n_y-2,\,\alpha/2}$
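As a cross-check of the JMP output, the pooled-t statistic above can be computed by hand. This is a minimal sketch using only the Python standard library; the function name `pooled_t` and the sample data are hypothetical. It returns the statistic and its degrees of freedom only; the critical value (or p-value) still comes from a t table or from JMP.

```python
import math
import statistics


def pooled_t(x, y, d0=0.0):
    """Pooled two-sample t statistic (independent samples, variances unknown but assumed equal).

    Returns (t, degrees_of_freedom). Hypothetical helper for illustration only.
    """
    nx, ny = len(x), len(y)
    sx2 = statistics.variance(x)  # sample variance s_x^2 (n - 1 denominator)
    sy2 = statistics.variance(y)  # sample variance s_y^2
    sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)  # pooled variance s_p^2
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))  # standard error of x-bar minus y-bar
    t = (statistics.mean(x) - statistics.mean(y) - d0) / se
    return t, nx + ny - 2


t, df = pooled_t([10, 12, 14], [8, 9, 10])
print(round(t, 4), df)
```

For these made-up samples t ≈ 2.3238 with 4 d.f. For a 2-sided test at α = 0.05 the tabulated $t_{4,\,0.025}$ is about 2.776, so H0 would not be rejected.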
ST Restricted
Module 4

Exercise #17
1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform the Unequal Variance test.
   b. How to perform the pooled t-test.
3. Interpretation of results.
ST Restricted
Module 4

Exercise #17
Go to Analyze > Fit Y by X. Cast Data to Y, Label to X, then OK.
ST Restricted
Module 4

Exercise #17
Since the p-value > significance level (0.05), we FAIL TO REJECT the NULL HYPOTHESIS: there is no evidence that the variances differ, so they can be treated as equal.

Note: for an explanation of the different Unequal Variance test methods, see:
https://www.jmp.com/support/help/en/16.0/index.shtml#page/jmp/unequal-variances.shtml
ST Restricted
Module 4

Exercise #17
Since the p-value < significance level (0.05), we REJECT the NULL HYPOTHESIS.
ST Restricted
Module 4

Exercise #17
Go to Add-Ins > Statistics Calculator > Hypothesis Test for Two Means.
Since the p-value < significance level (0.05), we REJECT the NULL HYPOTHESIS.
ST Restricted
Module 4

HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
Summary of Tests: DIFFERENCE of 2 means > INDEPENDENT samples > Variances UNKNOWN > Variances assumed UNEQUAL

Test the difference of means of independent samples – variances unknown and assumed unequal

The test statistic
$$t = \frac{\bar{x} - \bar{y} - d_0}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}}$$
is a value of the t distribution with ν degrees of freedom, where
$$\nu = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\dfrac{\left(s_x^2/n_x\right)^2}{n_x - 1} + \dfrac{\left(s_y^2/n_y\right)^2}{n_y - 1}}$$

HYPOTHESES | REJECT H0 IF:
H0: μx − μy = (≥) d0, H1: μx − μy < d0 | $t < -t_{\nu,\,\alpha}$
H0: μx − μy = (≤) d0, H1: μx − μy > d0 | $t > t_{\nu,\,\alpha}$
H0: μx − μy = d0, H1: μx − μy ≠ d0 | $t < -t_{\nu,\,\alpha/2}$ OR $t > t_{\nu,\,\alpha/2}$
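The unequal-variance statistic and its degrees of freedom ν can be sketched in the same way. This is a minimal sketch using only the standard library; the function name `welch_t` and the data are hypothetical. As with the pooled version, the decision still needs $t_{\nu,\alpha}$ from a table or the p-value from JMP.

```python
import math
import statistics


def welch_t(x, y, d0=0.0):
    """t statistic for independent samples with variances unknown and assumed unequal.

    Returns (t, nu), where nu is the (generally non-integer) degrees of freedom
    from the formula above. Hypothetical helper for illustration only.
    """
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx  # s_x^2 / n_x
    vy = statistics.variance(y) / ny  # s_y^2 / n_y
    t = (statistics.mean(x) - statistics.mean(y) - d0) / math.sqrt(vx + vy)
    nu = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, nu


t, nu = welch_t([10, 12, 14], [8, 9, 10])
print(round(t, 4), round(nu, 3))
```

Here ν ≈ 2.94, which is smaller than the pooled n_x + n_y − 2 = 4. When a printed t table is used, ν is commonly rounded down to an integer.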
ST Restricted
Module 4

Exercise #18
1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform the Unequal Variance tests.
   b. How to perform the t-test.
3. Interpretation of results.
ST Restricted
Module 4

Exercise #18
Go to Analyze > Fit Y by X. Cast Data to Y, Label to X, then OK.
ST Restricted
Module 4

Exercise #18
Since the p-value < significance level (0.05), we REJECT the NULL HYPOTHESIS: the variances are not equal.

Note: for an explanation of the different Unequal Variance test methods, see:
https://www.jmp.com/support/help/en/16.0/index.shtml#page/jmp/unequal-variances.shtml
ST Restricted
Module 4

Exercise #18
Go to the hotspot and select t Test.
Since the p-value < significance level (0.05), we REJECT the NULL HYPOTHESIS: the mean difference is not equal to 0.
ST Restricted
Module 4

Exercise #18
Go to Add-Ins > Statistics Calculator > Hypothesis Test for Two Means.
Since the p-value < significance level (0.05), we REJECT the NULL HYPOTHESIS.
ST Restricted
Module 4

HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
Summary of Tests: RATIO of 2 variances

Test the ratio of variances

The test statistic
$$F = \frac{s_x^2}{s_y^2}$$
is a value of the F distribution with (n_x − 1) and (n_y − 1) d.f. for the numerator and the denominator of F respectively.

HYPOTHESES | REJECT H0 IF:
$H_0: \sigma_x^2/\sigma_y^2 = (\le)\ 1$, $H_1: \sigma_x^2/\sigma_y^2 > 1$ | $F > F_{n_x-1,\,n_y-1,\,\alpha}$
$H_0: \sigma_x^2/\sigma_y^2 = 1$, $H_1: \sigma_x^2/\sigma_y^2 \ne 1$ | $F > F_{n_x-1,\,n_y-1,\,\alpha/2}$, with the larger sample variance placed in the numerator (i.e. the population with the larger sample variance is labelled x)
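The variance-ratio statistic, including the convention of placing the larger sample variance in the numerator for the 2-sided test, can be sketched as follows. This is a minimal standard-library sketch; the helper names and data are hypothetical, and the critical value $F_{n_x-1,\,n_y-1,\,\alpha/2}$ still has to be read from an F table or taken from JMP.

```python
import statistics


def variance_ratio_f(x, y):
    """F = s_x^2 / s_y^2 with (n_x - 1, n_y - 1) degrees of freedom."""
    f = statistics.variance(x) / statistics.variance(y)
    return f, len(x) - 1, len(y) - 1


def two_sided_f(x, y):
    """Two-sided convention: swap the samples so the larger variance is the numerator."""
    if statistics.variance(x) < statistics.variance(y):
        x, y = y, x
    return variance_ratio_f(x, y)


print(variance_ratio_f([8, 9, 10], [10, 12, 14]))  # smaller variance on top
print(two_sided_f([8, 9, 10], [10, 12, 14]))       # after the swap
```

With the larger variance on top, F ≥ 1 by construction, so only the upper critical value is needed.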
ST Restricted
Module 4

Exercise #19
1. Use the two exercise files from the previous examples:
   a. Unequal variance
   b. Equal variance
2. The Unequal Variance test has already been shown in Exercises #17 and #18.
3. If needed, the trainer can show it again.
ST Restricted
Module 4

HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
Summary of Tests: DIFFERENCE of 2 proportions

Test the difference of proportions

The test statistic
$$z = \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\dfrac{\hat{p}_0(1-\hat{p}_0)}{n_x} + \dfrac{\hat{p}_0(1-\hat{p}_0)}{n_y}}}$$
is a value of the standard normal distribution, where
$$\hat{p}_0 = \frac{n_x\hat{p}_x + n_y\hat{p}_y}{n_x + n_y}$$
is a weighted estimate of the common proportion (under H0).

HYPOTHESES | REJECT H0 IF:
H0: px − py = (≥) 0, H1: px − py < 0 | $z < -z_{\alpha}$
H0: px − py = (≤) 0, H1: px − py > 0 | $z > z_{\alpha}$
H0: px − py = 0, H1: px − py ≠ 0 | $z < -z_{\alpha/2}$ OR $z > z_{\alpha/2}$

ASSUMPTIONS
• Two large independent random samples, of sizes n_x and n_y, are drawn.
• The normal approximation holds (same rule of thumb as before: np(1 − p) > 9).
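Because the statistic is standard normal under H0, the whole test, including the p-value, can be sketched with the standard library (`statistics.NormalDist`). This is a minimal sketch; the function name and the counts (60/100 vs 45/100) are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x_succ, nx, y_succ, ny):
    """Pooled two-proportion z test. Returns (z, two-sided p-value)."""
    px, py = x_succ / nx, y_succ / ny
    p0 = (x_succ + y_succ) / (nx + ny)  # same as (nx*px + ny*py) / (nx + ny)
    se = math.sqrt(p0 * (1 - p0) * (1 / nx + 1 / ny))
    z = (px - py) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z, p = two_proportion_z(60, 100, 45, 100)
print(round(z, 3), round(p, 4))
```

For the one-sided alternatives, the p-value would be `NormalDist().cdf(z)` (H1: px − py < 0) or `1 - NormalDist().cdf(z)` (H1: px − py > 0).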
ST Restricted
Module 4

Exercise #20
1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform the Hypothesis Test for 2 Sample Proportions.
3. Interpretation of results.
ST Restricted
Module 4

Exercise #20
Go to Tables > Stack. Go to Analyze > Fit Y by X.
The response level can be toggled.
Depending on the system of hypotheses (left-tailed, right-tailed or 2-sided test), evaluate the corresponding p-value against alpha (0.05).
ST Restricted
Module 4

Inference on the correlation coefficient
HYPOTHESIS TESTING ON PARAMETERS OF TWO NORMAL POPULATIONS
Summary of Tests: Correlation coefficient

FOR THE CORRELATION COEFFICIENT
The test statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$$
is a value of the t distribution with (n − 2) d.f.

HYPOTHESES | REJECT H0 IF:
$H_0: \rho = 0\ (\text{or } \rho \ge 0)$, $H_1: \rho < 0$ | $t < -t_{n-2,\,\alpha}$
$H_0: \rho = 0\ (\text{or } \rho \le 0)$, $H_1: \rho > 0$ | $t > t_{n-2,\,\alpha}$
$H_0: \rho = 0$, $H_1: \rho \ne 0$ | $t < -t_{n-2,\,\alpha/2}$ OR $t > t_{n-2,\,\alpha/2}$

NOTE: for a nonparametric test on the correlation coefficient, see also the Manual of Statistical Methodology, §7 (DMS 8482919_A).
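The test statistic above only needs the sample correlation r and the sample size n. The sketch below computes Pearson's r directly from its definition (standard library only) and then the t statistic. The function name and the paired data are hypothetical.

```python
import math


def correlation_t(x, y):
    """Pearson r and t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 d.f. (undefined when |r| = 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, n - 2


r, t, df = correlation_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4), round(t, 4), df)
```

Here r ≈ 0.775 but t ≈ 2.121 with 3 d.f. This is below the tabulated $t_{3,\,0.025}$ ≈ 3.182, so with only 5 points the correlation would not be significant in a 2-sided test at α = 0.05.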
ST Restricted
Module 4

Exercise #21
1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform the Hypothesis Test for the Correlation Coefficient.
3. Interpretation of results.
ST Restricted
Module 4

Exercise #21
Go to Analyze > Multivariate > Multivariate.
Cast the columns to Y > OK.
Result: the correlation is significant.
ST Restricted
Module 4

Comparing more than two population means

PROBLEM: we want to compare the unknown means of k = 3 populations.
“Populations” can be “equipment”, “testers”, “bonders” and so on, e.g. to assess the alignment between equipment, testers, bonders, etc.

The system of hypotheses to test is:
H0: μ1 = μ2 = μ3 = μ (constant)
H1: “At least one of the means is different from at least one other.”

IDEA: “compare all possible pairs of means, using the methods shown so far (e.g. the t-test)”.
The following group of hypotheses should then be tested:
H0: μ1 = μ2 vs H1: μ1 ≠ μ2
H0: μ1 = μ3 vs H1: μ1 ≠ μ3
H0: μ2 = μ3 vs H1: μ2 ≠ μ3

ST Restricted
Module 4

Comparing more than two population means

WHY NOT? This approach has several drawbacks. Among them:
1. It is time-consuming: with k means there are k(k − 1)/2 pairs to compare, and this number grows rapidly as k increases.
2. If each comparison is made at significance level α, the overall conclusion drawn from all the tests together does not have significance level α: the probability of at least one false rejection is larger. An adjustment is required (e.g. Bonferroni or others).

We need a simpler and more efficient method: the F-test.

Generalizing to k means, we test the following system of hypotheses:
H0: “the k means are all equal”
H1: “at least one of the k means is significantly different from at least one other”

Assumptions (of the F-test):
1. The samples are independent and randomly drawn from k populations.
2. All k populations are normally distributed.
3. The k population variances are homogeneous (i.e. not significantly different).
ST Restricted
Module 4

Comparing more than two population means

CASE 1: one variable - simultaneous comparison of means (5 means for UBM thickness in the example)
INPUT: X1, one independent variable on many levels (UBM thickness, 5 levels) → SYSTEM: one-way ANOVA → OUTPUT: Y, one dependent response variable (Ball Shear)

CASE 2: two variables - simultaneous comparison of means (in the example: 5 means for variable X1 and 3 means for variable X2)
INPUT: X1 and X2, two independent variables on many levels (X1: Reflow Temperature, 5 levels; X2: Reflow Time, 3 levels) → SYSTEM: two-way ANOVA → OUTPUT: Y, one dependent response variable (Ball Shear)

More than two variables: multivariable ANOVA

NOTE
For a deeper analysis of these tests, consider the training on “Model Building and Design of Experiment, Level 2”.
ST Restricted
Comparing more than two population means

Nb. of variables | Goal of the test | Parametric test | Non-parametric test (medians) | Multiple comparison technique (*)
1 | Compare means | F-test | Kruskal-Wallis | Tukey/Bonferroni/Scheffé/LSD/Newman-Keuls/…
2 | Compare means | F-test | Friedman | Tukey/Bonferroni/Scheffé/LSD/Newman-Keuls/…

One variable:
For a given parameter (e.g. “Ball Diameter”), test the alignment between k machines (1 independent variable = MACHINE, k levels = k MACHINE_IDs).

More than one variable:
For a given parameter (e.g. “Ball Diameter”), test the alignment between k machines (first variable, K levels)
AND
the effect of s different air temperatures (second variable, S levels) on the same parameter
AND
the effect of h values of relative humidity (third variable, H levels) on the same parameter.

(*) The F-test tells us whether at least one mean differs from at least one other. It does not tell us which mean differs from which. To find this out, we can use “multiple comparison methods”, which identify groups of homogeneous means.
ST Restricted
Module 4

F test to compare k means (1 variable)
To carry out the test, some sample data are required. Once the data are available, statistical packages like JMP calculate the p-value of the F test.

Hypotheses:
$H_0: \mu_i = \mu\ \ \forall i,\ i = 1, 2, \dots, k$ (all the k population means are equal)
$H_1: \exists i: \mu_i \ne \mu$ (at least one of the population means is different)

Sample data:
The following table summarizes the results of data collection. Each row represents a sample from one of the k levels of the variable X. Samples can be either equally or differently sized.
Level | Sample | Sample average
1 | sample from the first population | average of the data from the first population
2 | … | …
⁞ | ⁞ | ⁞
k | sample from the k-th population | average of the data from the k-th population
ST Restricted
Module 4

F test to compare k means (1 variable)
Example: we want to test the alignment of k machines with respect to a relevant parameter Y, e.g. “ball shear”. From each machine, a sample of n measurements has been collected.

STANDARD NOTATION
Machines 1, 2, ⋯, k in rows; replications 1, 2, ⋯, n in columns.
- $y_{ij}$: ball shear value from the i-th machine in the j-th replication, with i = 1, 2, ⋯, k and j = 1, 2, ⋯, n
- $\bar{y}_{i\cdot}$: average of the n replications of the i-th sample (i-th machine)
- $\bar{y}_{\cdot\cdot}$: grand average of all the n × k observations

The dots (∙) replace the indices over which the average is taken. For example, in $\bar{y}_{i\cdot}$ the dot replaces the index j (columns), giving the row averages; in $\bar{y}_{\cdot\cdot}$ it replaces both indices i and j, giving the grand average.

NOTE
For simplicity, in this example the replications are “balanced”, i.e. the same number of measurements (n) is taken from each machine. This is not a necessary condition for data analysis: unbalanced cases can be analyzed as well.
ST Restricted
Module 4

F test to compare k means (1 variable)
In short, we can say that the machines are aligned if the samples are simultaneously drawn from populations with:
1. The same mean ($\mu_1 = \mu_2 = \cdots = \mu_k$)
2. The same variance ($\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$)
3. The same distribution (by assumption, the normal distribution)

Let’s focus on the equality of the k means: the sample averages $\bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \dots, \bar{y}_{k\cdot}$ are estimators of $\mu_1, \mu_2, \dots, \mu_k$.

Machines | Sample (n replications) | Average
1 | sample from machine 1 | $\bar{y}_{1\cdot}$
2 | sample from machine 2 | $\bar{y}_{2\cdot}$
⁞ | ⁞ | ⁞
k | sample from machine k | $\bar{y}_{k\cdot}$
Grand average: $\bar{y}_{\cdot\cdot}$ (i = 1, 2, ⋯, k; j = 1, 2, ⋯, n)
ST Restricted
Module 4

F test to compare k means (1 variable)
Two possible outcomes:
• $H_0: \mu_i = \mu\ \forall i$, i.e. $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$: “all the machines are matched” (one common population, with mean μ and standard deviation σ).
• $H_1: \exists i: \mu_i \ne \mu$: “at least 2 machines are mismatched” (e.g. populations with means μ1, μk and μ2 = μ3, all with the same σ).
ST Restricted
F test to compare k means (1 variable)
Two sources of variability provide two different types of useful information to test the hypotheses on the equality of means:
$H_0: \mu_i = \mu\ \forall i,\ i = 1, 2, \dots, k$
$H_1: \exists i: \mu_i \ne \mu$

• Variability WITHIN the k samples: this represents the inherent process variability, with no link to the effect of the variable on the considered parameter.
• Variability BETWEEN the k samples (differences between machines, i.e. among the averages $\bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \dots, \bar{y}_{k\cdot}$): the more variable these averages are, the more likely it is that H0 will be rejected (i.e. that the means are different).
ST Restricted
Module 4

F test to compare k means (1 variable)
$H_0: \mu_i = \mu\ \forall i,\ i = 1, 2, \dots, k$
$H_1: \exists i: \mu_i \ne \mu$

How the F-test works
$$F = \frac{\text{variability BETWEEN the } k \text{ samples}}{\text{variability WITHIN the } k \text{ samples}}$$
• If the ratio F is considered LARGE: reject H0 (machines mismatched).
• If the ratio F is considered SMALL: do not reject H0 (machines matched).

NOTE:
Statistics helps us define the relative concepts of “large” and “small”, once a level of significance has been established.
ST Restricted
Module 4

F test to compare k means
The F statistic is the ratio of the between estimate of variance to the within estimate of variance:
o It is always positive
o df1 = k − 1 will typically be small
o df2 = (n − 1)k = n_T − k will typically be large (n_T = nk, the total number of observations)

Decision rule (e.g. α = 0.05):
Reject H0 if $F_{calc} > F_{k-1,\,n_T-k,\,\alpha}$, or equivalently for small p-values (p-value < α); otherwise do not reject H0.
[Figure: density g(F) of the F distribution, with the rejection region to the right of $F_{k-1,\,n_T-k,\,\alpha}$]
ST Restricted
Advanced Explanation
Numerically
To calculate the test statistic F, we first decompose SST (“Total Sum of Squares”), the total deviance of the observations, into the following components:
SST = SSX + SSE
where:
- SSX is the deviance of the sample averages, i.e. the variability “between” the samples due to the differences between the levels of the variable X (e.g. the machines), and
- SSE is the sum of the deviances “within” the samples, i.e. the inherent process variability.

(Data layout: machines i = 1, 2, ⋯, k in rows, replications j = 1, 2, ⋯, n in columns; observations $y_{ij}$, sample averages $\bar{y}_{i\cdot}$, grand average $\bar{y}_{\cdot\cdot}$.)

$$\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{\cdot\cdot}\right)^2}_{SST} = \underbrace{n\sum_{i=1}^{k}\left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)^2}_{SSX} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_{i\cdot}\right)^2}_{SSE}$$
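The decomposition SST = SSX + SSE can be checked numerically on a small balanced data set. This is a minimal sketch in plain Python; the function name and the three hypothetical machines with 3 replications each are illustrative only.

```python
def one_way_ss(groups):
    """Return (SST, SSX, SSE) for a balanced one-way layout.

    groups: list of k lists, each holding the n replications of one level (machine).
    """
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)  # grand average y-bar..
    means = [sum(g) / n for g in groups]            # sample averages y-bar_i.
    sst = sum((y - grand) ** 2 for g in groups for y in g)
    ssx = n * sum((m - grand) ** 2 for m in means)  # "between" deviance
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)  # "within" deviance
    return sst, ssx, sse


sst, ssx, sse = one_way_ss([[10, 12, 14], [8, 9, 10], [15, 16, 17]])
print(round(sst, 6), round(ssx, 6), round(sse, 6))  # SST equals SSX + SSE
```

For these numbers SST = 86, SSX = 74 and SSE = 12.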
ST Restricted
Advanced Explanation
To calculate the test statistic F, we need a ratio of two variances, not of two deviances.
To obtain a variance from a deviance, we divide it by its degrees of freedom:
$$\mathrm{var}(X) = \frac{\mathrm{Dev}(X)}{DF}$$
In our case:
Deviance | Degrees of Freedom (DF)
SST | kn − 1
SSX | k − 1
SSE | k(n − 1)

So, to get the variances (also called Mean Squares, MS), we simply divide the deviances (Sums of Squares) by the corresponding DF:
Deviance (SS) | Degrees of Freedom (DF) | Variance (MS)
SSX | k − 1 | MSX = SSX/(k − 1)
SSE | k(n − 1) | MSE = SSE/(k(n − 1))
SST | kn − 1 | MST = SST/(kn − 1) (not used)
ST Restricted
Advanced Explanation
To test the hypotheses
$H_0: \mu_i = \mu\ \forall i,\ i = 1, 2, \dots, k$
$H_1: \exists i: \mu_i \ne \mu$
the test statistic F can then be calculated as:
$$F = \frac{MSX}{MSE}$$
The terms calculated so far are usually summarized in a table called the AN.O.VA. table (ANalysis Of VAriance).

Source of variability | Deviance (SS) | Degrees of Freedom (DF) | Variance (MS) | Test statistic | P-value
Variable X (machines) | SSX | k − 1 | MSX = SSX/(k − 1) | F_X = MSX/MSE | p-value_X
Error | SSE | k(n − 1) | MSE = SSE/(k(n − 1)) | |
Total | SST | kn − 1 | | |
One-factor (or “one-way”) AN.O.VA. table

NOTE:
Do not be misled by the name. This procedure is called analysis of variance (ANOVA) because it uses the ratio between two variances. However, the variances are not the object of our investigation (see the hypotheses): we only use them as a tool to study the population means.
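Given the sums of squares, the remaining columns of the one-way table follow mechanically. This is a minimal sketch for a balanced design; the function name is hypothetical, and the input values (SSX = 74, SSE = 12, k = 3, n = 3) are made-up numbers. The p-value itself is left to JMP or an F table.

```python
def one_way_anova_table(ssx, sse, k, n):
    """DF, mean squares and F statistic of a balanced one-way ANOVA."""
    df_x, df_e = k - 1, k * (n - 1)
    msx = ssx / df_x  # variance "between" (MSX)
    mse = sse / df_e  # variance "within" (MSE)
    return {"DF": (df_x, df_e, k * n - 1), "MSX": msx, "MSE": mse, "F": msx / mse}


table = one_way_anova_table(ssx=74, sse=12, k=3, n=3)
print(table)
```

Here F = 37 / 2 = 18.5 with (2, 6) d.f. This exceeds the tabulated $F_{2,\,6,\,0.05}$ ≈ 5.14, so H0 (all means equal) would be rejected at α = 0.05.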
ST Restricted
Module 4

F test - 2 variables
Data layout: the rows are the levels of X1 (1, 2, …, k), the columns are the levels of X2 (1, 2, …, h), and each cell contains n replications.
Indices: i = 1, 2, ⋯, k (levels of X1); j = 1, 2, ⋯, h (levels of X2); r = 1, 2, ⋯, n (replications).
- $y_{ijr}$: r-th measurement obtained when X1 and X2 are set on their i-th and j-th levels respectively
- $\bar{y}_{ij\cdot}$: average of the n replications when X1 and X2 are set on their i-th and j-th levels respectively
- $\bar{y}_{i\cdot\cdot}$: average of the measurements obtained when X1 is set on its i-th level (row averages)
- $\bar{y}_{\cdot j\cdot}$: average of the measurements obtained when X2 is set on its j-th level (column averages)
- $\bar{y}_{\cdots}$: grand average (average of all the khn measurements)
ST Restricted
Module 4

F test - 2 variables
(Data table: X1 levels 1…k in rows, X2 levels 1…h in columns; i = 1, 2, ⋯, k; j = 1, 2, ⋯, h; r = 1, 2, ⋯, n; averages $\bar{y}_{i\cdot\cdot}$, $\bar{y}_{\cdot j\cdot}$, $\bar{y}_{\cdots}$.)

We want to assess the effects of 3 terms:
➢ X1
➢ X2
➢ X1X2, the interaction between X1 and X2
For each term whose effect on the response variable we need to assess, an F test is performed. The software calculates a p-value for each term tested.

The interaction between two variables X1 and X2 is significant when, for some combinations (or settings) of X1 and X2, the value of the response variable is significantly higher (or lower) than what we would expect by considering the effects of X1 and X2 separately. This is called a “multiplicative” effect of X1 and X2. If X1 and X2 do not interact, their effects are said to be purely “additive”.
ST Restricted
Advanced Explanation

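To test the effects of X1, X2 and their interaction, X1*X2, we decompose SST, the total sum of squares (or deviance), into the following components:

SST = SSX1 + SSX2 + SS(X1X2) + SSE

$$\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{h}\sum_{r=1}^{n}\left(y_{ijr}-\bar{y}_{\cdots}\right)^2}_{SST} = \underbrace{nh\sum_{i=1}^{k}\left(\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdots}\right)^2}_{SSX_1} + \underbrace{nk\sum_{j=1}^{h}\left(\bar{y}_{\cdot j\cdot}-\bar{y}_{\cdots}\right)^2}_{SSX_2} + \underbrace{n\sum_{i=1}^{k}\sum_{j=1}^{h}\left(\bar{y}_{ij\cdot}-\bar{y}_{i\cdot\cdot}-\bar{y}_{\cdot j\cdot}+\bar{y}_{\cdots}\right)^2}_{SS(X_1X_2)} + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{h}\sum_{r=1}^{n}\left(y_{ijr}-\bar{y}_{ij\cdot}\right)^2}_{SSE}$$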


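| Deviance (SS) | Degrees of Freedom (DF) | Variance (MS) |
|---|---|---|
| SSX1 | k − 1 | MSX1 |
| SSX2 | h − 1 | MSX2 |
| SS(X1X2) | (k − 1)(h − 1) | MS(X1X2) |
| SSE | kh(n − 1) | MSE |
| SST | khn − 1 | MST |

Each variance is obtained as MS = SS / DF.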

To test the equality of means for X1, X2 and their interaction, three test statistics are calculated:

$$F_{X_1} = \frac{MSX_1}{MSE}, \qquad F_{X_2} = \frac{MSX_2}{MSE}, \qquad F_{X_1X_2} = \frac{MS(X_1X_2)}{MSE}$$

Finally, the AN.O.VA. table can be created:

| Source of variability | Deviance (SS) | Degrees of Freedom (DF) | Variance (MS) | Test statistic | p-value |
|---|---|---|---|---|---|
| Variable X1 | SSX1 | k − 1 | MSX1 | $F_{X_1} = MSX_1 / MSE$ | p-value(X1) |
| Variable X2 | SSX2 | h − 1 | MSX2 | $F_{X_2} = MSX_2 / MSE$ | p-value(X2) |
| Interaction X1X2 | SS(X1X2) | (k − 1)(h − 1) | MS(X1X2) | $F_{X_1X_2} = MS(X_1X_2) / MSE$ | p-value(X1X2) |
| Error | SSE | kh(n − 1) | MSE | | |
| Total | SST | khn − 1 | | | |

Two-factor (or “two-way”) AN.O.VA. table
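The decomposition and the table can be checked numerically. The following is a minimal sketch, assuming a balanced design stored as a NumPy array of shape (k, h, n) and using NumPy/SciPy rather than JMP; the data are randomly generated for illustration only. Each p-value is the upper-tail probability of the F distribution with (DF of the term, DF of the error) degrees of freedom.

```python
import numpy as np
from scipy import stats

def two_way_anova(y):
    """Two-way ANOVA with interaction for a balanced design.
    y has shape (k, h, n): k levels of X1, h levels of X2, n >= 2 replications.
    Returns {source: (SS, DF, MS, F, p-value)}."""
    y = np.asarray(y, dtype=float)
    k, h, n = y.shape
    grand = y.mean()                  # ybar_...
    a = y.mean(axis=(1, 2))           # ybar_i.. , shape (k,)
    b = y.mean(axis=(0, 2))           # ybar_.j. , shape (h,)
    cell = y.mean(axis=2)             # ybar_ij. , shape (k, h)

    ss = {
        "X1": n * h * ((a - grand) ** 2).sum(),
        "X2": n * k * ((b - grand) ** 2).sum(),
        "X1X2": n * ((cell - a[:, None] - b[None, :] + grand) ** 2).sum(),
        "Error": ((y - cell[:, :, None]) ** 2).sum(),
        "Total": ((y - grand) ** 2).sum(),
    }
    df = {"X1": k - 1, "X2": h - 1, "X1X2": (k - 1) * (h - 1),
          "Error": k * h * (n - 1), "Total": k * h * n - 1}
    mse = ss["Error"] / df["Error"]

    table = {}
    for src in ("X1", "X2", "X1X2"):
        ms = ss[src] / df[src]
        f = ms / mse
        table[src] = (ss[src], df[src], ms, f, stats.f.sf(f, df[src], df["Error"]))
    table["Error"] = (ss["Error"], df["Error"], mse, None, None)
    table["Total"] = (ss["Total"], df["Total"], None, None, None)
    return table

# Hypothetical data: k = 3 levels of X1, h = 2 levels of X2, n = 4 replications
rng = np.random.default_rng(0)
y = rng.normal(10.0, 1.0, size=(3, 2, 4))
y[1] += 2.0        # shift the 2nd level of X1 to create an X1 effect
for src, row in two_way_anova(y).items():
    print(src, row)
```

The exact additivity SST = SSX1 + SSX2 + SS(X1X2) + SSE relies on the design being balanced; unbalanced data need a regression-based ANOVA (as JMP's Fit Model does).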

Exercise #22

1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform one-way and two-way ANOVA
3. Interpretation of results.
Go to Tables > Stack (Machine A/B/C to Columns).

Go to Analyze > Fit Y by X. Cast Data to Y and Machine to X.
Go to hot spot > Means/Anova.
Go to hot spot > Compare Means (All Pairs).

The Prob value is below the 0.05 significance level, so we reject the null hypothesis. Which mean is different?
• Machine B is significantly different from Machines A and C.
• There is no statistically significant difference between Machines A and C.
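The same two-step reasoning (global F test, then pairwise comparisons only if H0 is rejected) can be sketched outside JMP. This is a minimal sketch, assuming SciPy 1.8 or later for `scipy.stats.tukey_hsd` and assuming the all-pairs comparison is Tukey's HSD; the machine data below are randomly generated, not the exercise file.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three machines (not the exercise data)
rng = np.random.default_rng(1)
machines = {
    "A": rng.normal(50.0, 2.0, 10),
    "B": rng.normal(56.0, 2.0, 10),   # B deliberately shifted
    "C": rng.normal(50.5, 2.0, 10),
}

alpha = 0.05
f_res = stats.f_oneway(*machines.values())
print(f"One-way ANOVA: F = {f_res.statistic:.2f}, p = {f_res.pvalue:.4g}")

if f_res.pvalue < alpha:
    # H0 (all means equal) rejected: look for the pairs that differ
    tukey = stats.tukey_hsd(*machines.values())
    names = list(machines)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            p = tukey.pvalue[i, j]
            verdict = "different" if p < alpha else "no significant difference"
            print(f"{names[i]} vs {names[j]}: p = {p:.4g} -> {verdict}")
```

Tukey's procedure controls the family-wise error rate across all pairs, which is why it is preferred to running separate two-sample t-tests.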
Go to Analyze > Fit Model.
Cast Response to Y; select Machine and Operator, then Macros > Full Factorial.

The factors Machine and Operator and their interaction were added to the model.
Machine is a significant factor.
Note: Use common axis settings.
Introduction to nonparametric tests

Nonparametric statistics deals with the same problems as parametric statistics; only the method is different.

Basically, there is at least one nonparametric equivalent for each type of parametric test.
Parametric and nonparametric methods.

❑ There is a need for statistical procedures that enable us to process data of “low quality”:
   ▪ small samples,
   ▪ variables about which nothing is known (e.g., their distribution).

❑ Specifically, nonparametric methods were developed for cases in which the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric).

❑ Nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population.

❑ Therefore, these methods are also sometimes (and more appropriately) called parameter-free methods or distribution-free methods.
Nonparametric statistics

• Fewer restrictive assumptions about the underlying probability distributions
• Population distributions may be skewed; in general, assumptions on the distribution are not required
• When the assumptions of the parametric test hold (all else being equal), nonparametric procedures are less powerful than their parametric counterparts (i.e., higher β, the probability of a type II error → lower power (1 − β) → when the alternative is true, they may be less likely to reject H0)
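The power statement can be explored by simulation. This is a minimal Monte Carlo sketch using SciPy (sample size, shift and number of repetitions are arbitrary assumptions): both samples are normal, so the t-test's assumptions hold. Under normality the difference is usually small (for shift alternatives the Mann-Whitney test has an asymptotic relative efficiency of 3/π ≈ 0.955 versus the t-test), but it goes in the direction stated above.

```python
import numpy as np
from scipy import stats

def rejection_rate(shift, n=15, reps=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rates of the two-sample t-test and of the
    Mann-Whitney U test for two normal samples (sd = 1) whose means
    differ by `shift`. shift = 0 estimates the type I error; shift > 0
    estimates the power (1 - beta)."""
    rng = np.random.default_rng(seed)
    t_rej = mw_rej = 0
    for _ in range(reps):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(shift, 1.0, n)
        t_rej += stats.ttest_ind(x, y).pvalue < alpha
        mw_rej += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha
    return t_rej / reps, mw_rej / reps

print("shift 0.0 (H0 true): t-test, Mann-Whitney =", rejection_rate(0.0))
print("shift 1.0 (H1 true): t-test, Mann-Whitney =", rejection_rate(1.0))
```

Re-running with skewed or heavy-tailed data (e.g., replacing `rng.normal` with a lognormal generator) is a simple way to see when the nonparametric test becomes the more powerful choice.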

❑ As with parametric statistics, the first step is the formulation of 2 hypotheses (H0 and H1).

❑ The second step is the calculation of a test statistic based on the sample data.

❑ The final result is a p-value, which is interpreted in the same way as for parametric tests.
Nonparametric tests

Differences between independent groups.
• The parametric counterpart is the t-test (for independent samples).
• Nonparametric alternatives to this test are:
   • the Wald-Wolfowitz runs test,
   • the Mann-Whitney U test, and
   • the Kolmogorov-Smirnov two-sample test.
• If we have multiple groups, we would use analysis of variance (ANOVA); the nonparametric equivalents of this method are the Kruskal-Wallis analysis of ranks and the Median test.

Multiple comparisons ➔ Steel-Dwass

| H0 / H1 | p-value | Statistical conclusion |
|---|---|---|
| H0: µ1 = µ2; H1: µ1 ≠ µ2 | < Alpha | Reject H0: “The 2 populations have different means” at the Alpha significance level. |
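A minimal sketch of the independent-groups tests with SciPy (`mannwhitneyu`, `ks_2samp`, `kruskal`); the lognormal samples are hypothetical and chosen to be skewed. The Wald-Wolfowitz runs test and the Steel-Dwass multiple comparisons are not part of SciPy and are not shown; the Median test is available as `scipy.stats.median_test`.

```python
import numpy as np
from scipy import stats

# Hypothetical, right-skewed samples: the t-test's normality assumption is doubtful
rng = np.random.default_rng(2)
g1 = rng.lognormal(mean=0.0, sigma=0.6, size=20)
g2 = rng.lognormal(mean=1.2, sigma=0.6, size=20)
g3 = rng.lognormal(mean=0.1, sigma=0.6, size=20)

alpha = 0.05
mw = stats.mannwhitneyu(g1, g2, alternative="two-sided")  # 2 groups
ks = stats.ks_2samp(g1, g2)                               # 2 groups
kw = stats.kruskal(g1, g2, g3)                            # more than 2 groups

for name, res in [("Mann-Whitney U", mw), ("KS two-sample", ks), ("Kruskal-Wallis", kw)]:
    decision = "reject H0" if res.pvalue < alpha else "do not reject H0"
    print(f"{name}: statistic = {res.statistic:.3f}, p = {res.pvalue:.4g} -> {decision}")
```

Strictly, these rank-based tests compare distributions (or medians under a pure-shift model) rather than means; the decision rule on the p-value is the same as in the table.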
Equality of variances of 2 independent groups.
• The parametric counterparts are the F-test (for independent samples) and the Bartlett test.
• Alternatives to these tests are:
   • the Brown-Forsythe test,
   • the Levene test,
   • the Cochran test,
   • the Hartley test.
  The Brown-Forsythe and Levene tests are much less sensitive to departures from normality; the Cochran and Hartley tests, like the Bartlett test, assume normally distributed data.

| H0 / H1 | p-value | Statistical conclusion |
|---|---|---|
| H0: σ1² = σ2²; H1: σ1² ≠ σ2² | < Alpha | Reject H0: “The 2 populations have different variances” at the Alpha significance level. |
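A minimal sketch comparing the variance tests with SciPy, on hypothetical normal data with different spreads. The classical F-test is written out by hand (the helper name `f_test_variances` is ours); `scipy.stats.levene` with `center="median"` is the Brown-Forsythe variant. The Cochran and Hartley tests are not in SciPy and are not shown.

```python
import numpy as np
from scipy import stats

def f_test_variances(x, y):
    """Classical two-sided F-test for H0: sigma1^2 = sigma2^2 (assumes normal data).
    Returns (F statistic, p-value)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    f = x.var(ddof=1) / y.var(ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    p_one = stats.f.sf(f, dfx, dfy) if f > 1 else stats.f.cdf(f, dfx, dfy)
    return f, min(1.0, 2 * p_one)

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.0, 2.5, 30)     # hypothetical: larger spread

print("F-test         :", f_test_variances(a, b))
print("Bartlett       :", stats.bartlett(a, b))
print("Levene (mean)  :", stats.levene(a, b, center="mean"))
print("Brown-Forsythe :", stats.levene(a, b, center="median"))
```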

Exercise #23

1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform a non-parametric test for two independent samples.
3. Interpretation of results.
Go to Analyze > Fit Y by X.
Go to hotspot > Nonparametric > Wilcoxon Test.

Machines 3 and 4 are not significantly different.
Go to hotspot > Unequal Variances.

The variances of Machines 3 and 4 are not significantly different (the hypothesis of equal variances is not rejected).
Differences between dependent groups.
• The parametric approach is the paired t-test (two variables measured on the same sample, i.e., dependent samples).
• Nonparametric alternatives to this test are:
   • the Sign test and
   • the Wilcoxon matched pairs test.
• If the variables of interest are dichotomous (like "pass" vs. "no pass"), McNemar's chi-square test is used.
• If there are more than two variables measured on the same sample, we would customarily use repeated measures ANOVA.
• Nonparametric alternatives to this method are:
   • Friedman's two-way ANOVA and
   • the Cochran Q test (if the variable was measured in terms of categories, e.g., "passed" vs. "failed").

| H0 / H1 | p-value | Statistical conclusion |
|---|---|---|
| H0: µ1 = µ2; H1: µ1 ≠ µ2 | < Alpha | Reject H0: “The 2 populations have different means” at the Alpha significance level. |
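A minimal sketch of the dependent-samples tests with SciPy, on hypothetical paired measurements. The sign test is written out as a binomial test on the signs of the differences (the helper name `sign_test` is ours; `scipy.stats.binomtest` needs SciPy 1.7 or later). McNemar's and Cochran's Q tests are not in SciPy; statsmodels' `contingency_tables` module provides implementations.

```python
import numpy as np
from scipy import stats

def sign_test(before, after):
    """Two-sided sign test for paired data: under H0 a positive difference
    is as likely as a negative one. Zero differences are discarded."""
    d = np.asarray(after, float) - np.asarray(before, float)
    d = d[d != 0]
    n_pos = int((d > 0).sum())
    return stats.binomtest(n_pos, n=len(d), p=0.5).pvalue

# Hypothetical paired measurements on the same 12 units
before = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0, 11.7, 12.6])
after = before + np.array([0.4, 0.3, 0.6, 0.2, 0.5, 0.1, 0.7, 0.3, 0.4, -0.1, 0.5, 0.2])

print("Sign test p :", sign_test(before, after))
print("Wilcoxon    :", stats.wilcoxon(before, after))

# More than two conditions measured on the same units: Friedman test
third = after + np.array([0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.4, 0.1, 0.0, 0.2, 0.1, -0.3])
print("Friedman    :", stats.friedmanchisquare(before, after, third))
```

The sign test uses only the direction of each difference (here 11 positive out of 12), while the Wilcoxon test also uses the ranks of their magnitudes, which usually makes it more powerful.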
Exercise #24

1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform a non-parametric test for two dependent samples (available under Matched Pairs ➔ Nonparametric).
3. Interpretation of results.
Go to Analyze > Specialized Modeling > Matched Pairs.
Go to hotspot > Wilcoxon or Sign Test.
Relationships between variables.
• To express a relationship between two variables, one usually computes the correlation coefficient.
• Nonparametric equivalents of the standard (Pearson) correlation coefficient are:
   • Spearman R,
   • Kendall Tau, and others.
• If the two variables of interest are categorical (e.g., "passed" vs. "failed" by "male" vs. "female"), nonparametric statistics for testing the relationship between the two variables are:
   • the Chi-square test or
   • the Fisher exact test.

| H0 / H1 | p-value | Statistical conclusion |
|---|---|---|
| H0: X1 and X2 are not correlated; H1: X1 and X2 are correlated | < Alpha | Reject H0: “X1 and X2 are correlated” at the Alpha significance level. |
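A minimal sketch of the rank correlations and of the categorical tests with SciPy; all data, including the 2×2 table, are hypothetical. Note that `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default.

```python
import numpy as np
from scipy import stats

# Hypothetical monotone but non-linear relationship
rng = np.random.default_rng(4)
x = rng.uniform(0, 5, 40)
y = np.exp(x) + rng.normal(0, 5, 40)

print("Pearson  :", stats.pearsonr(x, y))
print("Spearman :", stats.spearmanr(x, y))
print("Kendall  :", stats.kendalltau(x, y))

# Two categorical variables: hypothetical 2x2 table
#                  passed  failed
table = np.array([[18,      7],     # male
                  [11,     14]])    # female
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)
print(f"Chi-square: chi2 = {chi2:.3f}, p = {p_chi2:.4f} (dof = {dof})")
print(f"Fisher exact: p = {p_fisher:.4f}")
```

Spearman and Kendall only require a monotone relationship, which is why they are suited to data like the exponential trend above; Fisher's exact test is preferred when expected cell counts are small.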
Exercise #25

1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform a non-parametric test for the relationship between 2 variables.
3. Interpretation of results.
Go to Analyze > Multivariate Methods > Multivariate.

The Prob value is greater than the significance level: the correlation is not significant.
Goodness of Fit (GOF) test.
• This type of procedure is used to test a claim about the distribution of a population (e.g., “normality tests”).
• Examples of nonparametric GOF tests are:
   • Kolmogorov-Smirnov test,
   • Anderson-Darling test,
   • Lilliefors test,
   • Shapiro-Wilk test,
   • Cramér-von Mises test.

| H0 / H1 | p-value | Statistical conclusion |
|---|---|---|
| H0: Data follow an assumed distribution (for example: Normal); H1: Data do not follow the assumed distribution | < Alpha | Reject H0 |
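A minimal sketch of several GOF tests with SciPy on two hypothetical samples (one normal, one exponential). The Kolmogorov-Smirnov and Cramér-von Mises calls test against a fully specified N(100, 5), an assumption made here; when the mean and standard deviation are estimated from the same data, their p-values are no longer valid, which is the case the Lilliefors test (e.g., `statsmodels.stats.diagnostic.lilliefors`) corrects for.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
normal_data = rng.normal(100.0, 5.0, 80)
skewed_data = rng.exponential(5.0, 80)

for name, data in [("normal", normal_data), ("skewed", skewed_data)]:
    sw = stats.shapiro(data)
    ad = stats.anderson(data, dist="norm")
    ad_crit_5 = ad.critical_values[list(ad.significance_level).index(5.0)]  # 5% level
    ks = stats.kstest(data, "norm", args=(100.0, 5.0))
    cvm = stats.cramervonmises(data, "norm", args=(100.0, 5.0))
    print(f"{name}: Shapiro-Wilk p = {sw.pvalue:.4g}; "
          f"AD = {ad.statistic:.3f} (5% critical value {ad_crit_5:.3f}); "
          f"KS p = {ks.pvalue:.4g}; CvM p = {cvm.pvalue:.4g}")
```

For Anderson-Darling, SciPy reports critical values rather than a p-value: H0 is rejected at 5% when the statistic exceeds the 5% critical value.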
Exercise #26

1. Open the exercise file:
2. The trainer will show, using JMP:
   a. How to perform a non-parametric Goodness of Fit test.
   b. Depending on the distribution assumed, JMP uses a different test method.
3. Interpretation of results.
Go to Analyze > Distribution (Column3).
Go to hot spot > Continuous Fit.

Evaluate the p-value.
Comparison parametric vs. nonparametric. Summary

| | Parametric | Non-parametric |
|---|---|---|
| Assumed distribution | Normal | Any |
| Assumed variance | Homogeneous | Any |
| Typical data | Ratio or Interval | Ordinal or Nominal |
| Data set relationships | Independent | Any |
| Usual central measure | Mean | Median |
| Benefits | Can draw more conclusions | Simplicity; less affected by outliers |

| Tests | Parametric | Non-parametric |
|---|---|---|
| Correlation test | Pearson | Spearman |
| Independent measures, 2 groups | Independent-measures t-test | Mann-Whitney test |
| Independent measures, >2 groups | One-way, independent-measures ANOVA | Kruskal-Wallis test |
| Repeated measures, 2 conditions | Matched-pair t-test | Wilcoxon test |
| Repeated measures, >2 conditions | One-way, repeated-measures ANOVA | Friedman's test |
Module 4 Key Learnings

• Hypothesis tests for one (normal) population parameters
   • Using the critical value
   • Using the p-value
• Hypothesis tests for two (normal) population parameters
• More than two populations: introduction to (one-way) ANOVA
• Introduction to nonparametric (distribution-free) tests
Annex: Overview of outlier detection methods

Annex 2 objectives
At the end of this chapter, you will be able to:

• Assess the importance of detecting outliers prior to any statistical analysis.
• Gain better visibility of the most popular methods to detect outliers, with particular focus on univariate ones.
Introduction
Outlier detection

As pointed out in the Manual of Statistical Methodology (8482919 ver.2), Chapter 7, adopting effective methods to detect outliers is of great importance. The results of statistical analyses performed on contaminated data are heavily affected by the presence of outliers in the dataset. As an example, consider two important statistical applications that are heavily affected by outliers: Regression Analysis (with the OLS method) and Control Charts for process monitoring.

Moreover, from “Outlier identification in high dimensions” (2006), P. Filzmoser, R. Maronna, and M. Werner:

“Accurate identification of outliers plays an important role in statistical analysis. If classical statistical models
are blindly applied to data containing outliers, the results can be misleading at best. In addition, outliers
themselves are often the special points of interest in many practical situations and their identification is the
main purpose of the investigation. Classical tools based on the mean and covariance matrix are rarely able
to detect all the multivariate outliers in a given sample due to the masking effect (Becker and Gather, 1999),
with the consequence that methods based on classical measures are unsuitable for general use unless it is
certain that outliers are not present. Contaminated data are commonly found in several situations, and so
robust methods that identify or downweight outliers are essential tools for statisticians”.

Methods for outlier detection
Several methods have been developed to detect outliers.
A first classification level separates:

→ Methods for Univariate Outlier Detection
→ Methods for Multivariate Outlier Detection

While most surveys collect multivariate data, univariate outlier detection methods are usually preferred for their simplicity. However, these methods fail to detect observations that violate the correlation structure of the dataset.

Graphical example: at the univariate level, point A is not an outlier (neither for X1 nor for X2). Conversely, considering the bivariate distribution of X1 and X2, point A is an outlier.

[Figure: scatter plot of X1 vs. X2; point A is labelled OUTLIER]
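The graphical example can be reproduced numerically. This is a minimal sketch on hypothetical, strongly correlated data: point A has unremarkable z-scores on X1 and X2 separately but a very large squared Mahalanobis distance. For simplicity it uses the classical (non-robust) mean and covariance; a robust estimate such as the MCD (e.g., `sklearn.covariance.MinCovDet`) is the safer choice when several outliers may be present. The 97.5% chi-square cut-off is a common convention, assumed here.

```python
import numpy as np
from scipy import stats

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean,
    using the classical (non-robust) covariance matrix."""
    X = np.asarray(X, float)
    centred = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

# Hypothetical correlated data (X2 follows X1) plus point A, which is moderate
# on each axis but breaks the correlation structure.
rng = np.random.default_rng(6)
x1 = rng.normal(0.0, 1.0, 200)
x2 = x1 + rng.normal(0.0, 0.2, 200)
A = np.array([1.5, -1.5])
X = np.vstack([np.column_stack([x1, x2]), A])

z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
d2 = mahalanobis_sq(X)
cutoff = stats.chi2.ppf(0.975, df=2)
print("|z| of A on X1 and X2:", np.abs(z[-1]).round(2))
print("Mahalanobis d^2 of A:", round(float(d2[-1]), 1), "| cut-off:", round(float(cutoff), 2))
```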


Outlier detection methods can also be divided into different groups according to the statistical procedure/approach adopted:

• Distribution-based methods.
• Distance-based methods.
• Density-based methods.
• Methods based on clustering.

Distribution-based methods

These methods assume a known distribution of the data and test whether the target extreme value is an outlier of the distribution, i.e., whether or not it deviates from the assumed distribution. Examples of this group of methods are the Dixon and Grubbs tests. In real-world data it is often not easy to fulfil the distributional requirements, and this limits their use.
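As an example of a distribution-based method, here is a minimal sketch of the two-sided Grubbs test for a single outlier, written with NumPy/SciPy (the function name and the data are hypothetical). It assumes that the data, apart from the suspect value, are approximately normal, which is exactly the limitation noted above.

```python
import numpy as np
from scipy import stats

def grubbs_test(x, alpha=0.05):
    """Two-sided Grubbs test for one outlier in approximately normal data.
    Returns (index of the most extreme value, G, critical G, is_outlier)."""
    x = np.asarray(x, float)
    n = len(x)
    dev = np.abs(x - x.mean())
    idx = int(dev.argmax())
    g = dev[idx] / x.std(ddof=1)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)      # upper t quantile
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return idx, g, g_crit, g > g_crit

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 12.9, 10.0]   # hypothetical
print(grubbs_test(data))
```

The test flags at most one value per run; applying it repeatedly after removing flagged points is possible, but it suffers from the masking problem discussed for distance-based methods.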

Distance-based methods

Several outlier detection methods use some measure of distance to evaluate how far away an observation is from the centre of the data. The sample mean and variance may be used to measure this distance, but since they are not robust to outliers, they can mask the very observations we seek to detect. In other words, a method that is not robust, i.e., that is itself affected by the outliers, is of little (if any) help in detecting them. To avoid this masking effect, the location and variability estimators need to be “robustified”, that is, made less sensitive to outliers. This is why many outlier detection methods use order statistics, such as the median or the quartiles.
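The masking effect can be shown in a few lines. In this minimal sketch (NumPy; the data are hypothetical), three gross outliers inflate the mean and the standard deviation so much that none of them reaches |z| > 3, while a median/MAD-based score flags all three. The 1.4826 factor rescales the MAD to the standard deviation of a normal distribution; the cut-off of 3 is an assumption.

```python
import numpy as np

def z_scores(x):
    """Classical z-scores (mean and SD: not robust)."""
    x = np.asarray(x, float)
    return (x - x.mean()) / x.std(ddof=1)

def robust_z(x):
    """Median/MAD-based scores (robust)."""
    x = np.asarray(x, float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad)

# Hypothetical: 20 regular values plus 3 gross outliers (indices 20-22)
x = np.array([10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 10.0,
              9.9, 10.2, 10.0, 9.8, 10.1, 10.0, 9.9, 10.2, 10.1, 9.9,
              25.0, 26.0, 27.0])
print("flagged by |z| > 3       :", np.flatnonzero(np.abs(z_scores(x)) > 3))
print("flagged by |robust z| > 3:", np.flatnonzero(np.abs(robust_z(x)) > 3))
```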

Methods for robustifying the estimators include, among others, the Minimum Covariance Determinant (MCD) estimator due to Rousseeuw (the MCD estimator is determined by the subset of h points that minimizes the determinant of the variance-covariance matrix over all subsets of size h).

In univariate statistics, distance-based methods provide interesting results and are often preferred for their
relative simplicity. However, in high dimensional space the notion of outlier based on distance may become
meaningless.

Density-based methods

These methods assign to each object a degree of being an outlier. This degree is called the Local Outlier Factor (LOF) of the object. It is “local” because the property considered is the density of objects in the neighborhood surrounding the object itself.
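A minimal NumPy sketch of the Local Outlier Factor, following the usual definition (k-distance, reachability distance, local reachability density); the choice k = 5 and the data are assumptions for illustration, and ties in the neighbour distances are ignored. A library implementation is available as `sklearn.neighbors.LocalOutlierFactor`. Scores near 1 mean a density similar to the neighbours; scores well above 1 indicate an outlier.

```python
import numpy as np

def local_outlier_factor(X, k=5):
    """Minimal LOF with Euclidean distances and exactly k nearest neighbours."""
    X = np.asarray(X, float)
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]           # k nearest neighbours of each point
    k_dist = d[np.arange(n), knn[:, -1]]         # distance to the k-th neighbour
    # reachability distance of p from neighbour o: max(k_dist(o), d(p, o))
    reach = np.maximum(k_dist[knn], d[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density
    return lrd[knn].mean(axis=1) / lrd           # ratio of neighbours' density to own

rng = np.random.default_rng(7)
dense = rng.normal(0.0, 0.3, size=(40, 2))      # tight cluster
sparse = rng.normal(10.0, 1.5, size=(40, 2))    # loose cluster
X = np.vstack([dense, sparse, [[1.5, 1.5]]])    # last point: hypothetical local outlier
scores = local_outlier_factor(X, k=5)
print("LOF of last point:", round(float(scores[-1]), 2), "| index of max LOF:", int(scores.argmax()))
```

Because the score is relative to the local density, points on the edge of the loose cluster keep moderate scores even though they are far from their neighbours in absolute terms.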

Methods based on clustering

Clustering is a basic method to detect outliers. From the viewpoint of a clustering algorithm, potential outliers are data that do not belong to any cluster. Furthermore, if a cluster differs significantly from the other clusters, the objects in this cluster might be outliers.

Graphically:
[Figure: two clusters, CLUSTER A and CLUSTER B, and an isolated point circled in red and labelled OUTLIER]

Note that with a distance-based approach, the point in the red circle would never be considered an outlier: in fact, it is very close to the average of the dataset.
Methods for univariate outlier detection
The methods listed below are based on “distance considerations”: an observed value is defined as an outlier if its distance from what is considered the centre of the distribution is greater than a cut-off value. The methods built on robust estimators (median, quartiles, MAD) are generally considered robust in the case of non-normal data (→ they do not require the normality assumption); the SD and Z-score methods, which rely on the mean and the standard deviation, are the exception.

Among the most popular methods:

• THE STANDARD DEVIATION (SD) METHOD


• THE Z-SCORE METHOD
• THE MODIFIED Z-SCORE METHOD
• THE TUKEY’S METHOD (BOXPLOT)
• THE ADJUSTED BOXPLOT METHOD
• THE MADe METHOD
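A minimal NumPy sketch of four of the listed rules on a hypothetical sample. The cut-offs are common textbook choices assumed here, not values taken from this course: SD method mean ± 2 SD (± 3 SD is also used), Z-score |z| > 3, modified Z-score |M| > 3.5 with M = 0.6745 (x − median)/MAD, and Tukey's fences at 1.5 IQR. The adjusted boxplot (which needs the medcouple skewness measure) is not shown; the MADe rule is covered separately.

```python
import numpy as np

def flag_univariate_outliers(x):
    """Boolean outlier masks for four distance-based rules (cut-offs assumed).
    The modified Z-score fails if MAD = 0 (more than half the values identical)."""
    x = np.asarray(x, float)
    mean, sd = x.mean(), x.std(ddof=1)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    q1, q3 = np.percentile(x, [25, 75])   # quartile definitions vary between tools
    iqr = q3 - q1
    return {
        "SD method (mean ± 2 SD)": np.abs(x - mean) > 2 * sd,
        "Z-score (|z| > 3)": np.abs(x - mean) / sd > 3,
        "Modified Z-score (|M| > 3.5)": np.abs(0.6745 * (x - med) / mad) > 3.5,
        "Tukey boxplot (1.5 IQR fences)": (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr),
    }

data = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9, 10.2, 14.0]  # hypothetical
for rule, mask in flag_univariate_outliers(data).items():
    print(f"{rule:32s} -> flagged indices {np.flatnonzero(mask).tolist()}")
```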

In a simulation study within the STATS Program, these methods were tested on a representative number of FE SPC variables with real data (results available).

The conclusions of the study about the most pertinent methods are summarized as follows:

➢ The MADe and MD methods provide equivalent and pertinent results on both production and monitor data.

➢ The BoxPlot method is generally aligned with MD and MADe, but in several cases the proposed limits are not well adapted to the distribution. This method provides good results when employed on contaminated data.

➢ The Adjusted Box Plot (with Johnson Fit or Bootstrap methods) does not provide correct limits.

The MADe Method

The MADe method, based on the Median and the Median Absolute Deviation (MAD), is one of the basic robust methods, which are largely unaffected by the presence of outliers in the dataset. This approach is similar to the SD method; however, the median and MADe are used instead of the mean and the standard deviation.
The MADe method is defined as follows:
RULE: An observation is considered an outlier if its value lies outside the interval:

MED ± 3 MADe

where (for a sample of size n):

• MED is the median (or 50th percentile)
• MADe = 1.483 × MAD (for large, normally distributed data)
• MAD = Median Absolute Deviation = Median(|xi − Median(x)|), i = 1, 2, …, n

MAD is an estimator of data variability. It plays a role similar to the standard deviation and, like the median, has a breakdown point of approximately 50%.
When the MAD is scaled by a factor of 1.483, it becomes comparable to the standard deviation of a normal distribution. This scaled MAD value is the MADe.
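The rule translates directly into code. This is a minimal NumPy sketch (function names and data are hypothetical) that uses exactly the definitions above: MAD = median(|xi − MED|), MADe = 1.483 × MAD, limits MED ± 3 MADe. The factor 1.483 is approximately 1/0.6745, where 0.6745 is the 75th percentile of the standard normal distribution.

```python
import numpy as np

def made_limits(x, k=3.0, scale=1.483):
    """Lower/upper MADe limits: MED ± k * MADe, with MADe = scale * MAD."""
    x = np.asarray(x, float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    made = scale * mad
    return med - k * made, med + k * made

def made_outliers(x, k=3.0):
    """Indices of the observations outside MED ± k * MADe."""
    x = np.asarray(x, float)
    lo, hi = made_limits(x, k)
    return np.flatnonzero((x < lo) | (x > hi))

data = [4.0, 5.0, 5.0, 6.0, 6.0, 6.0, 7.0, 7.0, 8.0, 20.0]   # hypothetical
print(made_limits(data))    # median 6, MAD 1 -> limits approx. 1.551 and 10.449
print(made_outliers(data))  # only the value 20.0 (index 9) lies outside
```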

Key Learnings
Now you know:
• The importance of adopting effective outlier filters in every statistical analysis.
• That several methods and approaches exist to detect outliers.
• How to detect outliers at the univariate level (using distance-based methods).
Conclusion

• What you could do next to further improve your statistical competency:
   • Use what you have learned as much as possible, starting tomorrow!
   • The only way to avoid forgetting what you learned: do not wait too long after the course to start applying the techniques shown in the training.
   • Consider attending the next training course on “Statistical Model Building”.
   • You will learn:
      • How to generate a functional relationship between a response variable and one or more explanatory variables, based on empirical data.
      • How to optimize these models, making them reliable and usable.
      • How to find stability windows and how to optimize a process.
Post-test
• Complete the post-test to the best of your knowledge.

It allows us to measure the learning that has taken place during the training.

10-15 minutes
Customer satisfaction

How can we improve for next time?

Kirkpatrick Level 1 evaluation questionnaire

You will receive an e-mail. Please take 5 minutes to complete the evaluation form; this will help us to continually improve the learning content, facilitation and organization.
CONGRATULATIONS!!

File Revision

| Version | Date | Remarks | Who |
|---|---|---|---|
| 1.0 | 2017 | Initial Release | Marco Della Seta |
| 2.0 | June 2021 | • New format and template. • New exercises. | Marco Della Seta / HK Looi |