
Manual 2 of 4

Practical and Clear Graduate Statistics


in Excel

Hypothesis Tests of Mean: z-Tests & t-Tests


Hypothesis Tests of Proportion
Chi-Square Hypothesis Tests:
Independence & Goodness-Of-Fit Tests

The Excel Statistical Master


(that’ll be you!)

By Mark Harmon
Copyright © 2014 Mark Harmon
No part of this publication may be reproduced
or distributed without the express permission
of the author.
mark@ExcelMasterSeries.com
ISBN: 978-1-937159-21-4
Table of Contents

t-Tests in Excel
t-Test: t-Distribution-Based Hypothesis Test ....................................... 16
t-Test Overview .......................................................................................................... 16
Null Hypothesis .......................................................................................................... 16
Null Hypothesis - Rejected or Not But Never Accepted.............................................. 17
Alternative Hypothesis................................................................................................ 17
One-Tailed Test vs. Two-Tailed Test ......................................................................... 18
Level of Certainty ....................................................................................................... 18
Level of Significance (Alpha) ...................................................................................... 18
Region of Acceptance ................................................................................................ 18
Region of Rejection .................................................................................................... 19
Critical Value(s) .......................................................................................................... 19
Test Statistic ............................................................................................................... 19
p Value ....................................................................................................................... 20
Critical t Value or Critical z Value ............................................................................... 20
Critical t Value For 1-Tailed Test in Right Tail: ............................................................................ 20
Critical t Value For 1-Tailed Test in Left Tail:............................................................................... 21
Critical t Values For a 2-Tailed Test: .............................................................................................. 21
3 Equivalent Reasons To Reject Null Hypothesis ...................................................... 22
1) Sample Statistic Beyond Critical Value ....................................................................................... 22
2) Test Statistic Beyond Critical t or z Value .................................................................................. 22
3) p Value Smaller Than α (1-Tailed) or α/2 (2-Tailed) ................................................................. 22
Independent vs. Dependent Samples ........................................................................ 22
Pooled vs. Unpooled Tests ........................................................................................ 22
Type I and Type II Errors ............................................................................................ 22
Power of a Test .......................................................................................................... 23
Effect Size .................................................................................................................. 23
Nonparametric Alternatives for t-Tests in Excel ......................................................... 23
Hypothesis Test of Mean vs. Proportion..................................................................... 23
Hypothesis Tests of Mean – Overview ............................................................................................. 23
Hypothesis Tests of Proportion – Overview .................................................................................... 24
t-Test vs. z-Test.......................................................................................................... 24
Means of Large Samples Are Normally Distributed .................................................... 24
Requirements of a z-Test ........................................................................................... 24
Requirements of a t-Test ............................................................................................ 25
Basic Steps of a Hypothesis Test of Mean ................................................................. 25
Uses of Hypothesis Tests of Mean ............................................................................. 27
Types of Hypothesis Tests of Mean ........................................................................... 27

1) One-Sample t-Test in Excel ................................................................ 28


Overview .................................................................................................................... 28
Example of a 1-Sample, 2-Tailed t-Test in Excel ....................................................... 28
Summary of Problem Information................................................................................................... 29
Question 1) Type of Test? ................................................................................................................. 30
a) Hypothesis Test of Mean or Proportion? .................................................................................................................... 30
b) One-Sample or a Two-Sample Test? ........................................................................................................................... 30
c) Independent (Unpaired) Test or a Dependent (Paired) Test? .................................................................................... 30
d) One-Tailed or Two-Tailed Hypothesis?....................................................................................................................... 30
e) t-Test or z-Test? ........................................................................................................................................................... 30

Question 2) Test Requirements Met?............................................................................................... 31


a) t-Distribution of Test Statistic...................................................................................................................................... 31
When Sample Size Is Large ......................................................................................................................................... 31
When Sample Size is Small ......................................................................................................................................... 31

Evaluating the Normality of the Sample Data ................................................................................ 32


Histogram in Excel ........................................................................................................................................................... 32
Normal Probability Plot in Excel ...................................................................................................................................... 33
Kolmogorov-Smirnov Test For Normality in Excel ......................................................................................................... 34
Anderson-Darling Test For Normality in Excel ............................................................................................................... 36
Shapiro-Wilk Test For Normality in Excel ....................................................................................................................... 38

Correctable Reasons That Normal Data Can Appear Non-Normal ............................................. 39


When Data Are Not Normally Distributed...................................................................................... 39
Step 1 – Create the Null and Alternate Hypotheses ....................................................................... 40
Step 2 – Map the Distributed Variable to t-Distribution ............................................................... 41
Step 3 – Map the Regions of Acceptance and Rejection ................................................................ 42
Calculate Critical Values .................................................................................................................................................. 43

Step 4 – Determine Whether to Reject Null Hypothesis ................................................................ 44


1) Compare Sample Mean x-bar With Critical Value ....................................................................................................... 44
2) Compare the t Value With Critical t Value ................................................................................................................... 44
3) Compare the p Value With Alpha ................................................................................................................................ 45

Excel Shortcut to Performing a One-Sample t-Test ....................................................................... 47


Effect Size in Excel............................................................................................................................. 48
Power of the Test With Free Utility G*Power ................................................................................ 49
Nonparametric Alternatives in Excel ............................................................................................... 52
Wilcoxon One-Sample, Signed-Rank Test in Excel ........................................................................................................ 53
Sign Test in Excel ............................................................................................................................................................. 63

2) Two-Independent-Sample, Pooled t-Test in Excel ........................... 69


Overview .................................................................................................................... 69
Example of 2-Sample, 1-Tailed, Pooled t-Test in Excel ............................................. 70
Summary of Problem Information................................................................................................... 71
Question 1) What Type of Test Should Be Done?........................................................................... 72
a) Hypothesis Test of Mean or Proportion? .................................................................................................................... 72
b) One-Sample or a Two-Sample Test? ........................................................................................................................... 72
c) Independent or Dependent (Paired) Test? .................................................................................................................. 73
d) One-Tailed or Two-Tailed Test? .................................................................................................................................. 73
e) t-Test or z-Test? ......................................................................................................................................... 73
f) Pooled or Unpooled t-Test? ......................................................................................................................................... 73
F Test For Sample Variance Comparison in Excel ..................................................................................................... 73
Levene’s Test For Sample Variance Comparison in Excel ........................................................................................ 74
Brown-Forsythe Test For Sample Variance Comparison in Excel ............................................................................ 76

Question 2) Requirements Met? ....................................................................................................... 77


a) Normal Distribution of Both Sample Means ............................................................................................................... 77
1) Sample Size of Both Samples Greater Than 30...................................................................................................... 77
2) Both Populations Are Normally Distributed ........................................................................................................... 77
3) Both Samples Are Normally Distributed................................................................................................................. 77
b) Similarity of Sample Variances ................................................................................................................................... 78
c) Independence of Samples ........................................................................................................................................... 78

Evaluating the Normality of the Sample Data ................................................................................ 78


Histogram in Excel ........................................................................................................................................................... 79
Normal Probability Plot in Excel ...................................................................................................................................... 81
Kolmogorov-Smirnov Test For Normality in Excel ......................................................................................................... 82
Anderson-Darling Test For Normality in Excel ............................................................................................................... 85
Shapiro-Wilk Test For Normality in Excel ....................................................................................................................... 87

Correctable Reasons That Normal Data Can Appear Non-Normal ............................................. 89


When Data Are Not Normally Distributed...................................................................................... 89
Step 1 – Create Null and Alternate Hypotheses .............................................................................. 90
Step 2 – Map Distributed Variable on a t-Distribution Curve ...................................................... 91
Step 3 – Map the Regions of Acceptance and Rejection ................................................................ 93
Calculate Critical Values .................................................................................................................................................. 94
One-Tailed Critical Values ........................................................................................................................................... 94
Two-Tailed Critical Values ........................................................................................................................................... 95
Step 4 – Determine Whether to Reject Null Hypothesis ................................................................ 95
1) Compare the Sample Mean x_bar1-x_bar2 With Critical Value ................................................................................... 95
2) Compare t Value With Critical t Value ......................................................................................................................... 95
3) Compare the p Value With Alpha ................................................................................................................................ 96

Excel Data Analysis Tool Shortcut ................................................................................................... 97


Excel Statistical Function Shortcut ................................................................................................ 103
Effect Size in Excel........................................................................................................................... 104
Power of the Test With Free Utility G*Power .............................................................................. 106
Nonparametric Alternatives in Excel ............................................................................................. 110
Mann-Whitney U Test in Excel ....................................................................................................................................... 110

How Sample Standard Deviation Affects t-Test Results .............................................................. 124

3) Two-Independent-Sample, Unpooled t-Test in Excel ..................... 133


Overview .................................................................................................................. 133
Example of 2-Sample, 2-Tailed, Unpooled t-Test in Excel ....................................... 134
Summary of Problem Information................................................................................................. 135
Question 1) What Type of Test Should Be Done?......................................................................... 136
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 136
b) One-Sample or Two-Sample Test? ............................................................................................................................ 136
c) Independent (Unpaired) Test or Dependent (Paired) Test? ..................................................................................... 136
d) One-Tailed or Two-Tailed Test? ................................................................................................................................ 137
e) t-Test or z-Test? ......................................................................................................................................................... 137
f) Pooled or Unpooled t-Test? ....................................................................................................................................... 137
F Test For Sample Variance Comparison in Excel ................................................................................................... 137
Levene’s Test For Sample Variance Comparison in Excel ...................................................................................... 138
Brown-Forsythe Test For Sample Variance Comparison in Excel .......................................................................... 140

Question 2) Test Requirements Met?............................................................................................. 141


a) Normal Distribution of Both Sample Means ............................................................................................................. 141
1) Sample Size of Both Samples Greater Than 30.................................................................................................... 141
2) Both Populations Are Normally Distributed ......................................................................................................... 141
3) Both Samples Are Normally Distributed............................................................................................................... 142
b) Significantly Different Sample Variances ................................................................................................................. 142
c) Independence of Samples ......................................................................................................................................... 142

Evaluating the Normality of the Sample Data .............................................................................. 142


Histogram in Excel ......................................................................................................................................................... 143
Normal Probability Plot in Excel .................................................................................................................................... 145
Kolmogorov-Smirnov Test For Normality in Excel ....................................................................................................... 146
Anderson-Darling Test For Normality in Excel ............................................................................................................. 148
Shapiro-Wilk Test For Normality in Excel ..................................................................................................................... 151

Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 153
Step 1 – Create the Null and Alternate Hypotheses ..................................................................... 153
Step 2 – Map the Distributed Variable on a t-Distribution Curve .............................................. 155
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 156
Calculate the Critical Values .......................................................................................................................................... 157
Two-Tailed Critical Values ......................................................................................................................................... 157
One-Tailed Critical Value ........................................................................................................................................... 158

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 158


1) Compare Sample Mean, x_bar1-x_bar2 With Critical Value ...................................................................................... 158
2) Compare t Value With Critical t Value ....................................................................................................................... 158
3) Compare the p Value With Alpha .............................................................................................................................. 159

Excel Data Analysis Tool Shortcut ................................................................................................. 160


Excel Statistical Function Shortcut ................................................................................................ 165
Effect Size in Excel........................................................................................................................... 165
Power of the Test With Free Utility G*Power .............................................................................. 169
Nonparametric Alternatives in Excel ............................................................................................. 172

Paired (Two-Sample Dependent) t-Test in Excel ................................ 173


Overview .................................................................................................................. 173
Example of Paired, 1-Tailed t-Test in Excel ............................................................. 173
Summary of Problem Information................................................................................................. 175
Question 1) Type of Test? ............................................................................................................... 176
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 176
b) One-Sample or Two-Sample Test? ............................................................................................................................ 176
c) Independent (Unpaired) Test or Dependent (Paired) Test? ..................................................................................... 176
d) One-Tailed or Two-Tailed Test? ................................................................................................................................ 176
e) t-Test or z-Test? ......................................................................................................................................................... 176

Question 2) Test Requirements Met?............................................................................................. 177


a) t-Distribution of Test Statistic.................................................................................................................................... 177
When Difference Sample Size Is Large ..................................................................................................................... 177
When the Difference Sample Size is Small ............................................................................................................... 178

Evaluating the Normality of the Sample Data .............................................................................. 178


Histogram in Excel ......................................................................................................................................................... 178
Normal Probability Plot in Excel .................................................................................................................................... 180
Kolmogorov-Smirnov Test For Normality in Excel ....................................................................................................... 180
Anderson-Darling Test For Normality in Excel ............................................................................................................. 182
Shapiro-Wilk Test For Normality in Excel ..................................................................................................................... 184

Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 185
When Data Are Not Normally Distributed.................................................................................... 185
Step 1 – Create the Null and Alternate Hypotheses ..................................................................... 186
Step 2 – Map the Distributed Variable to t-Distribution ............................................................. 187
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 188
Calculate the Critical Value ............................................................................................................................................ 189

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 190


1) Compare x-bar_diff With Critical Value ......................................................................................................... 190
2) Compare t Value With Critical t Value ....................................................................................................................... 190
3) Compare p Value With Alpha ..................................................................................................................................... 191

Excel Data Analysis Tool Shortcut ................................................................................................. 192


Excel Statistical Function Shortcut ................................................................................................ 197
Effect Size in Excel........................................................................................................................... 198
Power of the Test With Free Utility G*Power .............................................................................. 200
Nonparametric Alternatives in Excel ............................................................................................. 204
Wilcoxon Signed-Rank Test in Excel ............................................................................................................................ 204
The Sign Test in Excel.................................................................................................................................................... 216

z-Tests: Hypothesis Tests Using the Normal Distribution in Excel .. 222


z-Test Overview ....................................................................................................... 222
Hypothesis Test Overview ........................................................................................ 222
Null Hypothesis ........................................................................................................ 222
The Null Hypothesis is Either Rejected or Not Rejected But Is Never Accepted ...................... 223
Alternative Hypothesis.............................................................................................. 223
One-Tailed Test vs. a Two-Tailed Test .................................................................... 224
Level of Certainty ..................................................................................................... 224
Level of Significance (Alpha) .................................................................................... 224
Region of Acceptance .............................................................................................. 224
Region of Rejection .................................................................................................. 225
Critical Value(s) ........................................................................................................ 225
Test Statistic ............................................................................................................. 225
Critical t Value or Critical z Value ............................................................................. 226
Relationship Between p Value and Alpha ................................................................ 226
Critical z Values........................................................................................................ 227
Critical z Value for a one-tailed test in the right tail:................................................................... 227
Critical z Value for a one-tailed test in the left tail: ..................................................................... 227
Critical z Values for a two-tailed test: ........................................................................................... 227
p Value ..................................................................................................................... 227
The 3 Equivalent Reasons To Reject the Null Hypothesis ....................................... 228
Independent Samples vs. Dependent Samples ....................................................... 228
Pooled vs. Unpooled Tests ...................................................................................... 228
Type I and Type II Errors .......................................................................................... 228
Power of a Test ........................................................................................................ 228
Effect Size ................................................................................................................ 229
Nonparametric Alternatives ...................................................................................... 229
Hypothesis Test of Mean vs. Proportion................................................................... 229
Hypothesis Tests of Mean – Basic Definition ................................................................................ 229
Hypothesis Tests of Proportion – Basic Definition ....................................................................... 230
Hypothesis Tests of Mean ........................................................................................ 230
t-Test versus z-Test .......................................................................................................................... 230
Normal Distribution of Means of Large Samples ......................................................................... 230
Requirements for a z-Test ............................................................................................................... 231
Requirements for a t-Test ............................................................................................................... 231
Basic Steps of a Hypothesis Test of Mean...................................................................................... 231
Uses of Hypothesis Tests of Mean .................................................................................................. 233
Types of Hypothesis Tests of Mean: t-Tests or z-Tests ................................................................ 233

1) One-Sample z-Test in Excel ............................................................. 234


Overview .................................................................................................................. 234
Example of a One-Sample, Two-Tailed z-Test in Excel ........................................... 234
Summary of Problem Information................................................................................................. 235
Question 1) Type of Test? ................................................................................................ 236
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 236
b) One-Sample or a Two-Sample Test? ......................................................................................................................... 236
c) Independent (Unpaired) Test or a Dependent (Paired) Test? .................................................................................. 236
d) One-Tailed or Two-Tailed Hypothesis?..................................................................................................................... 236
e) t-Test or z-Test? ......................................................................................................................................................... 237

Question 2) Test Requirements Met?............................................................................................ 237


a) Normal Distribution of Test Statistic ......................................................................................................................... 237
1) Population standard deviation, σ, is known ......................................................................................................... 237
2) Sample size is large (n > 30).................................................................................................................................. 237

Step 1 – Create the Null and Alternative Hypotheses .................................................................. 238


Step 2 – Map the Distributed Variable to Normal Distribution .................................................. 239
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 240
Calculate Critical Values ................................................................................................................................................ 240

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 241


1) Compare x-bar With Critical Value ............................................................................................................................ 241
2) Compare the z Score With Critical z Value ............................................................................................................... 241
3) Compare the p Value With Alpha .............................................................................................................................. 242
Excel Formula Shortcut to Performing a One-Sample z-Test ..................................................... 243

2) Two-Independent-Sample, Unpooled z-Test in Excel .................... 245


Overview .................................................................................................................. 245
Example of 2-Sample, 2-Tailed, Unpooled z-Test in Excel ...................................... 246
Summary of Problem Information................................................................................................. 246
Question 1) What Type of Test Should Be Done?......................................................................... 247
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 247
b) One-Sample or Two-Sample Test? ............................................................................................................................ 247
c) Independent (Unpaired) Test or Dependent (Paired) Test? ..................................................................................... 247
d) One-Tailed or Two-Tailed Test? .................................................................................................................. 247
e) t-Test or z-Test? ......................................................................................................................................................... 247
f) Pooled or Unpooled t-Test? ....................................................................................................................................... 248

Question 2) Test Requirements Met?............................................................................................. 248


a) Normal Distribution of Both Sample Means ............................................................................................................. 248
1) Both Population Standard Deviations, σ1 and σ2, Are Known ........................................................................... 248
2) Both sample sizes are large (n > 30) ...................................................................................................... 248

Step 1 – Create the Null and Alternative Hypotheses .................................................................. 249


Step 2 – Map the Distributed Variable on a Normal Distribution Curve .................................. 250
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 251
Calculate the Critical Values .......................................................................................................................................... 251
Two-Tailed Critical Values ......................................................................................................................................... 251

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 252


1) Compare x_bar1-x_bar2 With Critical Value ............................................................................................................... 252
2) Compare the z Score with the Critical z Value.......................................................................................................... 253
3) Compare the p Value With Alpha .............................................................................................................................. 253

Excel Data Analysis Tool Shortcut ................................................................................................. 254

3) Paired (Two-Sample Dependent) z-Test in Excel ........................... 257


Overview .................................................................................................................. 257
Example of Paired, 1-Tailed (Left-Tail) z-Test in Excel ............................................ 257
Summary of Problem Information................................................................................................. 259
Question 1) What Type of Test Should Be Done?......................................................................... 260
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 260
b) One-Sample or Two-Sample Test? ............................................................................................................................ 260
c) Independent (Unpaired) Test or Dependent (Paired) Test? ..................................................................................... 260
d) One-Tailed or Two-Tailed Test? ................................................................................................................................ 260
e) t-Test or z-Test? ......................................................................................................................................................... 260

Question 2) Test Requirements Met?............................................................................................. 260


a) Test Statistic Distributed According to Normal Distribution ................................................................................... 260
Step 1 – Create the Null and Alternative Hypotheses .................................................................. 261
Step 2 – Map Distributed Variable to Normal Distribution Curve............................................. 262
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 263
Calculate the Critical Value ............................................................................................................................................ 263

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 264


1) Compare x-bar_diff With Critical Value ........................................................................................................ 264
2) Compare z Score with Critical z Value ...................................................................................................................... 264
3) Compare p Value to Alpha. ........................................................................................................................................ 266

Excel Shortcut to Performing a Paired z-Test .............................................................................. 267

Hypothesis Testing on Binomial Data ................................................. 268


Overview .................................................................................................................. 268
Null Hypothesis ........................................................................................................ 268
Null Hypothesis is Either Rejected or Not Rejected But Is Never Accepted .............................. 268
Alternative Hypothesis.............................................................................................. 269
One-Tailed Test vs. a Two-Tailed Test .................................................................... 269
Level of Certainty ..................................................................................................... 270
Level of Significance (Alpha) .................................................................................... 270
Region of Acceptance .............................................................................................. 270
Region of Rejection .................................................................................................. 270
Critical Value(s) ........................................................................................................ 271
Test Statistic ............................................................................................................. 271
Critical t Value or Critical z Value ............................................................................. 271
Relationship Between p Value and Alpha ................................................................ 272
The 3 Equivalent Reasons To Reject the Null Hypothesis ....................................... 272
Type I and Type II Errors .......................................................................................... 272
Power of a Test ........................................................................................................ 273
Effect Size ................................................................................................................ 273
Hypothesis Test of Mean vs. Proportion................................................................... 273
Uses of Hypothesis Tests of Proportion ................................................................... 276
Types of Hypothesis Tests of Proportion.................................................................. 276

1) One-Sample Hypothesis Test of Proportion ................................... 277


Overview .................................................................................................................. 277
Example of a One-Sample, Two-Tailed Hypothesis Test of Proportion in Excel ...... 278
Summary of Problem Information................................................................................................. 278
Question 1) Type of Test? ................................................................................................ 279
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 279
b) One-Tailed or Two-Tailed Hypothesis?..................................................................................................................... 279
c) One-Sample or a Two-Sample Test? ......................................................................................................... 279
d) t-Test or z-Test? ......................................................................................................................................................... 279

Question 2) Test Requirements Met?............................................................................................. 279


Can Binomial Distribution Be Approximated By Normal Distribution? ...................................................................... 279

Step 1 – Create the Null and Alternative Hypotheses .................................................................. 281


Step 2 – Map the Distributed Variable to Normal Distribution .................................................. 282
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 283
The Regions of Acceptance and Rejection ................................................................................................................... 283
Calculate Critical Values ................................................................................................................................................ 284

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 285


1) Compare p-bar With Critical Value ............................................................................................................................ 285
2) Compare z Value With Critical z Value ...................................................................................................................... 285
3) Compare p Value With Alpha ..................................................................................................................................... 286

Two-Sample, Pooled Hypothesis Test of Proportion in Excel ........... 287


Overview .................................................................................................................. 287
Example of a Two-Sample, Pooled, Two-Tailed Hypothesis Test of Proportion in Excel .......... 288
Summary of Problem Information................................................................................................. 288
Question 1) Type of Test? ................................................................................................ 289
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 289
b) One-Tailed or Two-Tailed Hypothesis?..................................................................................................................... 289
c) One-Sample or a Two-Sample Test? ......................................................................................................... 289
d) Pooled Test or an Unpooled Test? ............................................................................................................................ 289
e) t-Test or z-Test? ......................................................................................................................................... 290

Question 2) Test Requirements Met?............................................................................................. 290


Can Binomial Distribution Be Approximated By Normal Distribution? ...................................................................... 290

Step 1 – Create the Null and Alternative Hypotheses .................................................................. 292


Step 2 – Map the Distributed Variable to Normal Distribution .................................................. 293
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 294
The Regions of Acceptance and Rejection ................................................................................................................... 294
Calculate Critical Values ................................................................................................................................................ 295

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 296


1) Compare p_bar2–p_bar1 With Critical Value ............................................................................................................ 296
2) Compare z Value With Critical z Value ...................................................................................................................... 296
3) Compare p Value With Alpha ..................................................................................................................................... 297

Two-Sample, Unpooled Hypothesis Test of Proportion in Excel ...... 299


Overview .................................................................................................................. 299
Example of a Two-Sample, Unpooled, One-Tailed Hypothesis Test of Proportion in Excel ...... 300
Summary of Problem Information................................................................................................. 300
Question 1) Type of Test? ................................................................................................ 301
a) Hypothesis Test of Mean or Proportion? .................................................................................................................. 301
b) One-Tailed or Two-Tailed Hypothesis?..................................................................................................................... 301
c) One-Sample or a Two-Sample Test? ......................................................................................................... 301
d) Pooled Test or an Unpooled Test? ............................................................................................................................ 301
e) t-Test or z-Test? ......................................................................................................................................... 302

Question 2) Test Requirements Met?............................................................................................. 302


Can Binomial Distribution Be Approximated By Normal Distribution? ...................................................................... 302

Step 1 – Create the Null and Alternative Hypotheses .................................................................. 304


Step 2 – Map the Distributed Variable to Normal Distribution .................................................. 305
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 306
The Regions of Acceptance and Rejection ................................................................................................................... 306
Calculate Critical Values ................................................................................................................................................ 307

Step 4 – Determine Whether to Reject Null Hypothesis .............................................................. 308


1) Compare p_bar2–p_bar1 With Critical Value ............................................................................................................ 308
2) Compare z Value With Critical z Value ...................................................................................................................... 308
3) Compare p Value With Alpha ..................................................................................................................................... 309

Chi-Square Independence Test in Excel ............................................. 310


Overview .................................................................................................................. 310
Contingency Table ........................................................................................................................... 310
Test Compares Actual vs. Expected Bin Counts ........................................................................... 310
Null Hypothesis ................................................................................................................................ 310
Test Statistic ..................................................................................................................................... 310
When to Reject Null Hypothesis ..................................................................................................... 310
Required Assumptions .................................................................................................................... 311
Example of Chi-Square Independence Test in Excel ................................................ 312
Step 1 – Place Actual Counts In Contingency Table .................................................................... 312
Creating the Contingency Table From an Excel Pivot Table ....................................................................................... 313

Step 2 – Place Expected Counts In Contingency Table ................................................................ 316


Step 3 – Create Null and Alternative Hypotheses ......................................................................... 316
Step 4 – Verify Required Assumptions .......................................................................................... 317
Step 5 – Calculate Chi-Square Statistic, Χ² ................................................................................... 317
Step 6 – Calculate Critical Chi-Square Value and p Value ......................................................... 318
Step 7 – Determine Whether To Reject Null Hypothesis ............................................................. 319
Chi-Square Goodness-of-Fit Tests in Excel ........................................ 320
Overview .................................................................................................................. 320
Test Statistic ..................................................................................................................................... 320
Required Assumptions .................................................................................................................... 320
Null Hypothesis ................................................................................................................................ 320
Basic Excel Formulas ...................................................................................................................... 321
The Two Types of GOF Tests .................................................................................. 321
1) Bin Sizes Are Pre-Determined.................................................................................................... 321
2) Bin Sizes Arbitrarily Set To Match a Distribution ................................................................... 321
GOF Example – Type 1............................................................................................ 322
Bin Sizes Are Pre-Determined ........................................................................................................ 322
Problem Information ....................................................................................................................... 322
Step 1 – Create Expected Bins ........................................................................................................ 322
Step 2 – Calculate Counts in Expected Bins .................................................................................. 322
Step 3 – Verify Required Assumptions .......................................................................................... 323
Step 4 – Create Null and Alternative Hypotheses ......................................................................... 323
Step 5 – Calculate Chi-Square Statistic, Χ2 ................................................................................... 324
Step 6 – Calculate Critical Chi-Square Value and p Value ......................................................... 325
Step 7 – Determine Whether To Reject Null Hypothesis ............................................................. 326
GOF Example – Type 2............................................................................................ 327
Bin Sizes Arbitrarily Set To Match a Distribution ....................................................................... 327
Chi-Square Goodness-of-Fit Test for Normality ....................................................... 327
Overview ........................................................................................................................................... 327
Chi-Square GOF Test for Normality Example in Excel .............................................................. 328
Step 1 – Sort and Standardize Data ............................................................................................... 328
a) Sorting the Data.......................................................................................................................................................... 329
b) Standardizing the Data .............................................................................................................................................. 330

Step 2 – Create Bins......................................................................................................................... 331


Step 3 – Determine Actual Count For Each Bin ........................................................................... 332
a) Creating a Histogram With the Excel Histogram Tool.............................................................................................. 333
b) Creating a Histogram With a Formula and Bar Chart............................................................................................... 334

Step 4 – Determine Expected Count For Each Bin....................................................................... 336


Step 5 – Verify Required Assumptions .......................................................................................... 339
Step 6 – Create Null and Alternative Hypotheses ......................................................................... 340
Step 7 – Calculate Chi-Square Statistic, Χ2 ................................................................................... 341
Step 8 – Calculate Critical Chi-Square Value and p Value ......................................................... 342
Step 9 – Determine Whether To Reject Null Hypothesis ............................................................. 343
Chi-Square Population Variance Test in Excel ................................... 344
Overview .................................................................................................................. 344
One-Sample Chi-Square Population Variance Test ................................................. 344
Two-tailed test .................................................................................................................................. 344
One-tailed test – Right tail .............................................................................................................. 345
One-tailed test – Left tail ................................................................................................................. 345
Example of 1-Sample, 2-Tailed, Chi-Square Population Variance Test in Excel ...... 345
Problem Information ....................................................................................................................... 345
Requirement of Population Normality .......................................................................................... 345
Non-Parametric Alternatives to 1-Sample Chi-Square Population Variance Test ................... 346
Null and Alternative Hypotheses .................................................................................................... 346
Chi-Square Statistic and Chi-Square Critical Values .................................................................. 347
Example of 1-Sample, 1-Tailed, Right Tail, Chi-Square Population Variance Test in
Excel ........................................................................................................................ 348
Problem Information ....................................................................................................................... 348
Requirement of Population Normality .......................................................................................... 348
Non-Parametric Alternatives to the One-Sample Chi-Square Population Variance Test ........ 349
Null and Alternative Hypotheses .................................................................................................... 349
Chi-Square Statistic and Chi-Square Critical Values .................................................................. 350
Example of 1-Sample, 1-Tailed, Left Tail, Chi-Square Population Variance Test in
Excel ........................................................................................................................ 351
Problem Information ....................................................................................................................... 351
Requirement of Population Normality .......................................................................................... 351
Non-Parametric Alternatives to 1-Sample Chi-Square Population Variance Test ................... 352
Null and Alternative Hypotheses .................................................................................................... 352
Chi-Square Statistic and Chi-Square Critical Values .................................................................. 353

F-Test – 2-Sample, 2-Tailed Chi-Square Population Variance Test ... 354


F Test Problem in Excel ........................................................................................... 356
Example Data ................................................................................................................................... 357
Step 2 – Verify Normality of Both Populations ............................................................................. 357
Step 3 – Create the Null and Alternative Hypotheses .................................................................. 360
Step 4 – Calculate the F Statistic .................................................................................................... 360
Step 5 – Calculate F Critical ........................................................................................................... 360
Step 6 – Compare the F Statistic to F Critical .............................................................................. 360
Performing the F Test With the Data Analysis F Test Tool ........................................................ 362
In-Depth Analysis of Sample Normality ........................................................................................ 364
Excel Histogram ............................................................................................................................................................. 364
Kolmogorov-Smirnov Test For Normality in Excel ....................................................................................................... 368
Anderson-Darling Test For Normality in Excel ............................................................................................................. 371
Shapiro-Wilk Test For Normality in Excel ..................................................................................................................... 373

Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 375
Nonparametric Alternatives to the F Test ..................................................................................... 376
Levene’s Test For Sample Variance Comparison in Excel .......................................................................................... 376
Brown-Forsythe Test For Sample Variance Comparison in Excel .............................................................................. 378

Check Out the Latest Book in the Excel Master Series! .................... 380

Meet the Author .................................................................................... 385


t-Test: t-Distribution-Based Hypothesis Test

t-Test Overview
The t-Test is the most commonly-used hypothesis test that analyzes sample data to determine if two
populations have significantly different means. A t-Test can be applied if the test statistic follows the t
Distribution under the Null Hypothesis. The test statistic will follow the t Distribution if any of the following
conditions exist:
1) The population is normally distributed.
2) The sample is normally distributed.
3) The sample size is large.
The t-Test is the appropriate population mean hypothesis testing tool when sample size is small and/or
the population standard deviation is not known. A t-Test can always be substituted for a z-Test.

Hypothesis Test Overview


A hypothesis test evaluates whether a sample is different enough from a population to establish that the
sample probably did not come from that population. If a sample is different enough from a hypothesized
population, then the population from which the sample came is different than the hypothesized
population.

Null Hypothesis
A hypothesis test is based upon a Null Hypothesis which states that the sample did come from that
population. A hypothesis test compares a sample statistic such as a sample mean to a population
parameter such as the population’s mean. The amount of difference between the sample statistic and the
population parameter determines whether the Null Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and
the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.

Null Hypothesis - Rejected or Not But Never Accepted
A hypothesis test has only two possible outcomes: the Null Hypothesis is either rejected or is not rejected.
It is never correct to state that the Null Hypothesis was accepted. A hypothesis test only determines
whether there is or is not enough evidence to reject the Null Hypothesis. The Null Hypothesis is rejected
only when the hypothesis test result indicates with at least the specified Level of Certainty that the Null
Hypothesis is not valid.
If the required Level of Certainty for a hypothesis test is specified to be 95 percent, the Null Hypothesis
will be rejected only if the test result indicates that there is at least a 95 percent probability that the Null
Hypothesis is invalid. In all other cases, the Null Hypothesis would not be rejected. This is not equivalent
to stating that the Null Hypothesis was accepted. The Null Hypothesis is never accepted; it can only be
rejected or not rejected.

Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations
are not the same. The Alternative Hypothesis can be non-directional if it states that the means or
proportions of two populations are merely not equal to each other. The Alternative Hypothesis is
directional if it states that the mean or proportion of one of the populations is less than or greater than the
mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5
or
H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3
or
H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.

One-Tailed Test vs. Two-Tailed Test
The number of tails in a hypothesis test depends on whether the test is directional or not. The operator of
the Alternative Hypothesis indicates whether or not the hypothesis test is directional. A non-directional
operator (a “not equal” sign) in the Alternative Hypothesis indicates that the hypothesis test is a two-
tailed test. A directional operator (a “greater than” or “less than” sign) in the Alternative Hypothesis
indicates that the hypothesis test is a one-tailed test.
The Region of Rejection (the alpha region) for a one-tailed test is entirely contained in one of the
outer tails. A “greater than” operator in the Alternative Hypothesis indicates that the test is a one-tailed
test in the right tail. A “less than” operator in the Alternative Hypothesis indicates that the test is a one-
tailed test in the left tail. If α = 0.05, then one of the outer tails will contain the entire 5-percent Region of
Rejection.
The Region of Rejection (the alpha region) for a two-tailed test is split between both outer tails. Each
outer tail will contain half of the total Region of Rejection (alpha/2). If α = 0.05, then each outer tail will
contain a 2.5-percent Region of Rejection if the test is a two-tailed test.

Level of Certainty
Each hypothesis test has a specified Level of Certainty. The Null Hypothesis is rejected only when
that Level of Certainty has been reached that the sample did not come from the population. A commonly
specified Level of Certainty is 95 percent. The Null Hypothesis would only be rejected in this case if the
sample statistic was different enough from the population parameter that at least 95 percent certainty was
achieved that the sample did not come from that population.

Level of Significance (Alpha)


The Level of Certainty for a hypothesis test is often indicated with a different term called the Level of
Significance also known as α (alpha). The relationship between the Level of Certainty and α is the
following:
α = 1 – Level of Certainty
An alpha that is set to 0.05 indicates that the hypothesis test requires a 95-percent Level of Certainty that
the sample came from a different population before the Null Hypothesis can be rejected.

Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to
the normal distribution or the t distribution. The Test Statistic is derived directly from the sample statistic
such as the sample mean. If the Test Statistic is distributed according to the normal or t distribution, then
the sample statistic is also distributed according to the normal or t distribution. This will be discussed in
greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the
sample statistic (the sample mean or proportion) to its own unique normal or t distribution. The sample
statistic is the distributed variable whose distribution is mapped according to its own unique normal or t
distribution.
The Region of Acceptance is the percentage of area under this normal or t distribution curve that equals
the test’s specified Level of Certainty. If the hypothesis test requires 95 percent certainty in order to reject
the Null Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed
variable’s mapped normal or t distribution curve.

If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.

Region of Rejection
The Region of Rejection is the percentage of area under this normal or t distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05
The Region of Acceptance includes 95 percent of the total area under the normal or t distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t distribution curve that
maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-
percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent
alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.

Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of the boundary between
the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value because
the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical
Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther
from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the
curve’s mean than the Critical Value on that side, the sample statistic lies in the Region of Rejection. If the
sample statistic is closer to the curve’s mean than the Critical Value on that side, the sample statistic lies
in the Region of Acceptance.

Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard
Errors that are between the observed sample statistic and the hypothesized population parameter. The
Null Hypothesis is rejected if that number of Standard Errors (specified by the Test Statistic) is larger than
a critical number of Standard Errors. The critical number of Standard Errors is determined by the required
Level of Certainty.

The Test Statistic is either the z Score or the t Value depending on whether a z-Test or t-Test is being
performed. This will be discussed in greater detail shortly.
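As a preview of the exact formulas given later in this manual, the Test Statistic for a one-sample test takes the general form of the difference between the observed sample statistic and the hypothesized population parameter, divided by the Standard Error. For the one-sample t-Test and one-sample z-Test this is calculated as follows:

t Value = (x_bar – Constant) / (s/SQRT(n))

z Score = (x_bar – Constant) / (σ/SQRT(n))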

p Value
Each hypothesis test calculates a p Value. The p Value is the area under the curve that is beyond the
sample statistic (the observed sample mean or proportion). The p Value is the probability that a sample of
size n with the observed sample mean or proportion could have occurred if the Null Hypothesis were true.
If, for example, the p Value of a Hypothesis Test of Mean or Proportion were calculated to be 0.0212, that
would indicate that there is only a 2.12 percent chance that a sample of size n would have the observed
sample mean or proportion if the Null Hypothesis were true. The Null Hypothesis states that the
population from which the sample came has the same mean as the hypothesized population. This mean
is the Constant on the right side of the Null Hypothesis.
The p Value is compared to alpha for a one-tailed test and to alpha/2 for a two-tailed test. The Null
Hypothesis is rejected if p is smaller than α for a one-tailed test or if p is smaller than α/2 for a two-tailed
test. If the p Value is smaller than α for a one-tailed test or smaller than α/2 for a two-tailed test, the
sample statistic is in the Region of Rejection.
Calculations of the Critical t Value(s) and the p Value are as follows:

Critical t Value or Critical z Value


Each hypothesis test calculates Critical t or z Values. A Critical t Value is calculated for a t-Test and a
Critical z Value is calculated for a z-Test. A Critical t or z Value is the amount of difference expressed in
Standard Errors between the boundary of the Region of Rejection (the Critical Value) and hypothesized
population parameter (the Constant on the right side of the Null Hypothesis) which will be located at the
curve’s mean.
A one-tailed test has only one Critical t or z Value because the Region of Rejection is entirely contained in
one outer tail. A two-tailed test has two Critical z or t Values because the Region of Rejection is split
between the two outer tails. The Test Statistic (the t Value or z Score) is compared with the Critical t or z
Value on that side of the mean.
If the Test Statistic is farther from the standardized mean of zero than the Critical t or z Value on that side,
the Null Hypothesis is rejected. The Test Statistic is the number of Standard Errors that the sample
statistic is from the curve’s mean.
The Critical t or z Value on the same side is the number of Standard Errors that the Critical Value (the
boundary of the Region of Rejection) is from the mean. If the Test Statistic is farther from the
standardized mean of zero than the Critical t or z value, the sample statistic lies in the Region of
Rejection.

Critical t Value For 1-Tailed Test in Right Tail:


Excel 2010 and beyond
Critical t Value = T.INV(1-α,df)

Prior to Excel 2010


Critical t Value = TINV(2*α,df)

Critical t Value For 1-Tailed Test in Left Tail:
Excel 2010 and beyond
Critical t Value = T.INV(α,df)

Prior to Excel 2010


Critical t Value = -TINV(2*α,df)

Note that the negative sign has to be manually inserted into this pre-2010 formula to calculate the Critical
t Value in the left tail for a one-tailed test.

Critical t Values For a 2-Tailed Test:


Excel 2010 and beyond
Critical t Values = ±T.INV(1-α/2,df)

Prior to Excel 2010


Critical t Values = ±TINV(α,df)

p Value

The p Value is calculated using the same formulas whether the test is a one-tailed test or a two-tailed
test.

Excel 2010 and beyond


p Value = T.DIST.RT(ABS(t Value), df)

Prior to Excel 2010


p Value = TDIST(ABS(t Value), df, 1)
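As a brief illustration of these formulas (the values of α, df, and the t Value used here come from the one-sample t-Test example that appears later in this manual and are shown only to make the formulas concrete), suppose α = 0.05 and df = 19:

Critical t Value for a one-tailed test in the right tail = T.INV(0.95,19) ≈ 1.729
Critical t Value for a one-tailed test in the left tail = T.INV(0.05,19) ≈ -1.729
Critical t Values for a two-tailed test = ±T.INV(0.975,19) ≈ ±2.093
p Value for an observed t Value of 2.105 = T.DIST.RT(2.105,19) ≈ 0.024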

3 Equivalent Reasons To Reject Null Hypothesis
The Null Hypothesis of a Hypothesis Test of Mean or Proportion is rejected if any of the following
equivalent conditions are shown to exist:

1) Sample Statistic Beyond Critical Value


The sample statistic (the observed sample mean or proportion) would therefore lie in the Region of
Rejection because the Critical Value is the boundary of the Region of Rejection.

2) Test Statistic Beyond Critical t or z Value


The Test Statistic (the t Value or z Score) is the number of Standard Errors that the sample statistic is
from the curve’s mean. The Critical t or z Value is the number of Standard Errors that the boundary of the
Region of Rejection is from the curve’s mean. If the Test Statistic is farther from the
standardized mean of 0 than the Critical t or z Value, the sample statistic lies in the Region of Rejection.

3) p Value Smaller Than α (1-Tailed) or α/2 (2-Tailed)


The p Value is the curve area beyond the sample statistic. α and α/2 equal the curve areas contained by
the Region of Rejection on that side for a one-tailed test and a two-tailed test respectively. If the p value is
smaller than α for a one-tailed test or α/2 for a two-tailed test, the sample statistic lies in the Region of
Rejection.

Independent vs. Dependent Samples


A sample that is independent of a second sample has data values that are not influenced by any of the
data values within the second sample. Dependent samples are often referred to as paired data. Paired
data are data pairs in which one of the values of each pair has an influence on the other value of the data
pair. An example of a paired data sample would be a set of before-and-after test scores from the same
set of people.

Pooled vs. Unpooled Tests


A two-independent-sample Hypothesis Test of Mean can be pooled or unpooled. A pooled test can be
performed if the variances of both independent samples are similar. This is a pooled test because a
single pooled standard deviation replaces both sample standard deviations in the calculation of the
Standard Error. An unpooled test must be performed when the variances of the two independent samples
are not similar.

Type I and Type II Errors


A Type I Error is a false positive and a Type II Error is a false negative. A false positive occurs when a
test incorrectly detects a significant difference when one does not exist. A false negative occurs when a
test incorrectly fails to detect a significant difference when one exists.
α (the specified Level of Significance) = a test’s probability of making a Type I Error.
β = a test’s probability of making a Type II Error.

Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β

Effect Size
Effect size for Hypothesis Tests of Mean is usually expressed as Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
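A commonly used formula for Cohen’s d when comparing two groups (this is the conventional definition of the measure rather than something specific to this manual) divides the difference between the two means by a pooled standard deviation:

Cohen’s d = (x_bar1 – x_bar2) / s_pooled

Widely cited benchmarks treat a d of roughly 0.2 as a small effect, 0.5 as a medium effect, and 0.8 as a large effect, although these cutoffs are conventions and should be interpreted in the context of the measurement being made.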

Nonparametric Alternatives for t-Tests in Excel


Nonparametric tests are sometimes substituted for t-Tests because normality requirements cannot be
met. A t-Test is a Hypothesis Test of Mean that can be performed if the sample statistic (and therefore the
Test Statistic) is distributed according to the t distribution under the Null Hypothesis. The sample statistic
(the sample mean) is distributed according to the t distribution if any of the following three conditions
exist:
1) Sample size is large (n > 30). The sample taken for the hypothesis test must have at least 30 data
observations.
2) The population from which the sample was taken is verified to be normally distributed.
3) The sample is verified to be normally distributed.
If none of these conditions can be met or confirmed, a nonparametric test can often be substituted for a t-
Test. A nonparametric test does not have normality requirements that a parametric test such as a t-Test
does.

Hypothesis Test of Mean vs. Proportion


Hypothesis Tests covered in this section will be either Hypothesis Tests of Mean or Hypothesis Tests of
Proportion. A data point of a sample taken for a Hypothesis Test of Mean can have a range of values. A
data point of a sample taken for a Hypothesis Test of Proportion is binary; it can take only one of two
values.

Hypothesis Tests of Mean – Overview


A Hypothesis Test of Mean compares an observed sample mean with a hypothesized population mean to
determine if the sample was taken from the same population. An example would be to compare a sample
of monthly sales of stores in one region to the national average to determine if mean sales from the
region (the population from which the sample was taken) is different than the national average (the
hypothesized population parameter). As stated, a sample taken for a Hypothesis Test of Mean can have
a range of values. In this case, the sales of a sampled store can fall within a wide range of values.

Hypothesis Tests of Mean require that the Test Statistic is distributed either according to the normal
distribution or to the t distribution. The Test Statistic in a Hypothesis Test of Mean is derived directly from
the sample mean and therefore has the same distribution as the sample mean.

Hypothesis Tests of Proportion – Overview


A Hypothesis Test of Proportion compares an observed sample proportion with a hypothesized
population proportion to determine if the sample was taken from the same population. An example would
be to compare the proportion of defective units from a sample taken from one production line to the
proportion of defective units from all production lines to determine if the proportion defective from the one
production line (the population from which the sample was taken) is different than from the proportion
defective of all production lines (the hypothesized population parameter). As stated, a sample taken for a
Hypothesis Test of Proportion can only have one of two values. In this case, a sampled unit from a
production line is either defective or it is not.
Hypothesis Tests of Proportion are covered in detail in a separate section in this manual. They are also
summarized at the end of the binomial distribution section.

t-Test vs. z-Test


A Hypothesis Test of Mean will either be performed as a z-Test or as a t-Test. When the sample mean
and therefore the Test Statistic are distributed according to the normal distribution, the hypothesis test is
called a z-Test and the Test Statistic is called the z Score. When the sample mean and therefore the
Test Statistic is distributed according to the t distribution, the hypothesis test is called a t-Test and the
Test Statistic is called the t Value. The Test Statistic is the number of Standard Errors that the observed
sample mean is from the hypothesized population mean.
t-Tests are covered in detail in a separate section in this manual. They are also summarized at the end of
this t distribution section.
z-Tests are covered in detail in a separate section in this manual. They are also summarized at the end of
the normal distribution section.

Means of Large Samples Are Normally Distributed


According to the Central Limit Theorem, the means of large samples will be normally distributed no matter
how the population from which the samples came is distributed. This is true as long as the samples are
random and the sample size, n, is large (n > 30). n equals the number of data observations that each
sample contains.
If the single sample taken for a Hypothesis Test of Mean is large (n > 30), then the means of a number of
similar samples taken from the same population would be normally distributed as per the Central Limit
Theorem. This is true no matter how the population or the single sample are distributed.
If the single sample taken for a Hypothesis Test of Mean is small (n < 30), then the means of a number of
similar samples taken from the same population would be normally distributed only if the population was
proven to be normally distributed or if the sample was proven to be normally distributed.

Requirements of a z-Test
A z-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived
from the sample mean) is normally distributed. The sample mean and therefore the Test Statistic are
normally distributed only when the following two conditions are both met:

1) The size of the single sample taken is large (n > 30). The Central Limit Theorem states that means of
large samples will be normally distributed. When the size of the single sample is small (n < 30), only a t-
Test can be performed.
2) The population standard deviation, σ (sigma), is known.
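When both conditions are met, the one-sample z-Test calculation follows the same general pattern shown earlier. As a minimal sketch (the full z-Test procedure is covered in the z-Test section of this manual), the key Excel formulas are:

z Score = (x_bar – Constant) / (σ/SQRT(n))
Critical z Value for a one-tailed test in the right tail = NORM.S.INV(1-α)
Critical z Values for a two-tailed test = ±NORM.S.INV(1-α/2)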

Requirements of a t-Test
A t-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived from
the sample mean) is distributed according to the t distribution. The sample mean and therefore the Test
Statistic are distributed according to the t distribution when both of these conditions are met:
1) The sample standard deviation, s, is known.
2) Either the sample or the population has been verified for normality.
A t-Test can be performed when the single sample is large (n > 30) but is the only option when the size of
the single sample is small (n < 30). A z-Test can only be performed when the size of the single sample is
large (n > 30) and the population standard deviation is known.
As mentioned, a Hypothesis Test of Mean requires that the sample mean and therefore the Test Statistic
is distributed either according to the normal distribution or to the t distribution. The sample mean and the
Test Statistic are distributed variables that can be graphed according to the normal or t distribution.
The Test Statistic, which represents the number of Standard Errors that the sample mean is from the
hypothesized population mean, could be graphed on a standard normal distribution curve or a
standardized t distribution curve. Both these two distribution curves have their means at zero and the
length of one Standard Error is set to equal 1.

Basic Steps of a Hypothesis Test of Mean


The major steps of the simplest Hypothesis Test of Mean, the one-sample t-Test, are described as follows:
1) A sample of data is taken. The sample statistic which is the sample mean is calculated.
2) A Null Hypothesis is created stating that the population from which the sample was taken has the same
mean as a hypothesized population mean. An Alternative Hypothesis is constructed stating that the
sample population’s mean is not equal to, greater than, or less than the hypothesized population
mean depending on the wording of the problem.
3) The sample mean is mapped to a normal or t distribution curve that has a mean equal to the hypothesized
population mean and a Standard Error calculated based upon a formula specific to the type of
Hypothesis Test of Mean.
4) The Critical Values are calculated and the Regions of Acceptance and Rejection are mapped on the
graph that maps the distributed variable. The Critical Values represent the boundaries between
the Region of Acceptance and Region of Rejection.
5) Critical t Values, the Test Statistic (the t Value) and p Value are then calculated.
6) The Null Hypothesis is rejected if any of the following three equivalent conditions are shown to exist:
a) The observed sample mean, x_bar, is beyond the Critical Value.
b) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
c) The p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
The following graph represents the final result of a typical one-sample, two-tailed t-Test. In this case the
Null Hypothesis was rejected. This result is represented as follows:

This t-Test was a two-tailed test as evidenced by the yellow Region of Rejection split between both
outer tails. In this t-Test the alpha was set to 0.05. This 5-percent Region of Rejection is split between the
two tails so that each tail contains a 2.5 percent Region of Rejection.
The mean of this non-standardized t-distribution curve is 186,000. This indicates that the Null Hypothesis
is as follows:
H0: x_bar = 186,000
Since this is a two-tailed t-Test, the Alternative Hypothesis is as follows:
H1: x_bar ≠ 186,000
This one-sample t-Test is evaluating whether the population from which the sample was taken has a
population mean that is not equal to 186,000. This is a non-directional t-Test and is therefore two-tailed.
The sample statistic is the observed sample mean of this single sample taken for this test. This observed
sample mean is calculated to be 200,000.
The boundaries of the Region of Rejection occur at 172,083 and 199,916. Everything beyond these two
points is in the Region of Rejection. These two Critical Values are 2.093 Standard Errors from the
standardized mean of 0. This indicates that the Critical t Values are ±2.093.
The graph shows that the sample statistic (the sample mean of 200,000) falls beyond the right Critical
Value of 199,916 and is therefore in the Region of Rejection.
The sample statistic is 2.105 Standard Errors from the standardized mean of 0. This is farther from the
standardized mean of 0 than the right Critical t Value, which is 2.093.
The curve area beyond the sample statistic consists of 2.4 percent of the area under the curve. This is
smaller than α/2 which is 2.5 percent of the total curve area because alpha was set to 0.05.
As the graph shows, all three equivalent conditions have been met to reject the Null Hypothesis. It can be
stated with at least 95 percent certainty that the mean of the population from which the sample was taken
does not equal the hypothesized population mean of 186,000.
Uses of Hypothesis Tests of Mean
1) Comparing the mean of a sample taken from one population with another population’s
mean to determine if the two populations have different means. An example of this would be to
compare the mean monthly sales of a sample of retail stores from one region to the national mean
monthly store sales to determine if the mean monthly sales of all stores in the one region are different
than the national mean.

2) Comparing the mean of a sample taken from one population to a fixed number to determine if
that population’s mean is different than the fixed number. An example of this might be to compare
the mean product measurement taken from a sample of a number of units of a product to the company’s
claims about that product specification to determine if the actual mean measurement of all units of that
company’s product is different than what the company claims it is.

3) Comparing the mean of a sample from one population with the mean of a sample from another
population to determine if the two populations have different means. An example of this would be to
compare the mean of a sample of daily production totals from one crew with the mean of a sample of
daily production totals from another crew to determine if the two crews have different mean daily
production totals.

4) Comparing successive measurement pairs taken on the same group of objects to determine if
anything has changed between measurements. An example of this would be to evaluate whether there
is a mean difference in before-and-after test scores of a small sample of the same people to determine if a
training program made a difference to all of the people who underwent it.

5) Comparing the same measurements taken on pairs of related objects. An example of this would
be to evaluate whether there is a mean difference in the incomes of husbands and wives in a sample of
married couples to determine if there is a mean difference in the incomes of husbands and wives in all
married couples.

It is important to note that a hypothesis test is used to determine if two populations are different. The
outcome of a hypothesis test is either to reject or to fail to reject the Null Hypothesis. It would be incorrect to
state that a hypothesis test is used to determine if two populations are the same.

Types of Hypothesis Tests of Mean


Hypothesis Tests of Mean are either t-Tests or z-Tests.
The 4 types of t-Tests discussed here are the following:
One-sample t-Test
Two-Independent-Sample, Pooled t-Test
Two-Independent-Sample, Unpooled t-Test
Two-Dependent-Sample (Paired) t-Test
The 3 types of z-Test discussed in this manual are the following:
One-sample z-Test
Two-independent-Sample, Unpooled z-Test
Two-Dependent-Sample (Paired) z-Test
A detailed description of each of the 4 types of t-Tests along with examples in Excel are as follows:

1) One-Sample t-Test in Excel

Overview
This hypothesis test determines whether the mean of the population from which the sample was taken is
equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant. This constant
is often the known mean of a population from which the sample may have come. The constant is the
constant on the right side of the Null Hypothesis.

x_bar = Observed Sample Mean

df = n - 1
Null Hypothesis H0: x_bar = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of a 1-Sample, 2-Tailed t-Test in Excel


This problem is very similar to the problem solved in the z-Test section for a one-sample, two-tailed z-Test.
Similar problems were used in each of these sections to show the similarities and also contrast the
differences between the one-sample z-Test and t-Test as easily as possible.
This problem compares average monthly sales from one fast food chain’s retail stores in one region with
the average monthly sales of all of the fast food chain’s retail stores in the entire country. The region being
evaluated has more than 1,000 very similar stores. The national mean monthly retail store sales equals
$186,000.
Determine with at least 95% certainty whether the average monthly sales of all of the fast food chain’s
stores in the one region is different than the national average monthly sales of all of the fast food chain’s
stores.
The data sample of sales for the same month for a random sample of 20 fast food retail stores in a region
is as follows:
$240,000 $180,000 $200,000 $260,000 $200,000
$220,000 $180,000 $200,000 $200,000 $180,000
$160,000 $180,000 $200,000 $220,000 $220,000
$140,000 $220,000 $200,000 $240,000 $160,000
Running the Excel data analysis tool Descriptive Statistics will provide the Sample Mean, the Sample
Standard Deviation, the Standard Error, and the Sample Size. The output of this tool appears as follows:

Summary of Problem Information


x_bar = sample mean = AVERAGE() = 200,000
µ = national (population) mean = 186,000
s = sample standard deviation =STDEV.S() = 29735.68
σ (Greek letter “sigma”) = population standard deviation = Not Known
n = sample size = COUNT() = 20

SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20)
Note that this calculation of the Standard Error using the sample standard deviation, s, is an estimate of
the true Standard Error which would be calculated using the population standard deviation, σ.
SE = 6649.10
df = degrees of freedom = n – 1 = 20 – 1 = 19
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
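If the 20 sales figures were entered into a worksheet column, for example cells A2:A21 (this cell range is only an assumption used for illustration), the summary values above could also be produced directly with formulas instead of the Descriptive Statistics tool:

x_bar = AVERAGE(A2:A21)
s = STDEV.S(A2:A21)
n = COUNT(A2:A21)
SE = STDEV.S(A2:A21)/SQRT(COUNT(A2:A21))
df = COUNT(A2:A21) - 1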

As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1 – Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions To Be Answered Before Performing the Four-Step Hypothesis Test of Mean are
as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Mean because each individual observation (each sampled monthly sales
figure) within the sample can have a wide range of values. Data observations for Hypothesis Tests of
Proportion are binary: they can take only one of two possible values.

b) One-Sample or a Two-Sample Test?


This is a one-sample hypothesis test because only one sample containing monthly sales figures from
twenty stores has been taken and is being compared to the national monthly retail store average for the
same month.

c) Independent (Unpaired) Test or a Dependent (Paired) Test?


It is neither. The designation of “paired” or “unpaired” applies only for two-sample hypothesis tests.

d) One-Tailed or Two-Tailed Hypothesis?


The problem asks to determine whether the twenty-store monthly average is simply different than the
national average. This is a non-directional inequality making this hypothesis test a two-tailed test. If the
problem asked whether the twenty-store average was greater than or less than the national average, the
inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-tailed
test is more stringent than a one-tailed test.

e) t-Test or z-Test?
Assuming that the population or sample can pass a normality test, a hypothesis test of mean must be
performed as a t-Test when the sample size is small (n < 30) or if the population variance is unknown.
In this case the sample size is small as n = 20. This Hypothesis Test of Mean must therefore be
performed as a t-Test and not as a z-Test.
The t Distribution with degrees of freedom df = n – 1 describes the distribution of the t statistic calculated
from a random data sample of size n taken from a normal population.

The means of samples taken from a normal population are also distributed according to the t
Distribution with degrees of freedom = df = n – 1.
The Test Statistic (the t Value), which is based upon the sample mean (x_bar) because it equals (x_bar –
Constant)/(s/SQRT(n)), will therefore also be distributed according to the t Distribution. A t-Test will be
performed if the Test Statistic is distributed according to the t Distribution.
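Using the sample figures summarized earlier for this problem, the Test Statistic works out as follows (this is shown here only to make the form of the Test Statistic concrete):

t Value = (x_bar – Constant)/(s/SQRT(n)) = (200,000 – 186,000)/(29735.68/SQRT(20)) = 14,000/6649.10 ≈ 2.105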
The distribution of the Test Statistic for a sample taken from a normal population is always described by the
t Distribution. The shape of the t Distribution converges to (very closely resembles) the shape of the
standard normal distribution when sample size becomes large (n > 30).
The Test Statistic’s distribution can be approximated by the normal distribution only if the sample size is
large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if the Test
Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in all other
cases.
It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All z-
Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software
packages including the well-known SPSS provide only t-Tests and no direct z-Tests.
This hypothesis test is a t-Test that is one-sample, two-tailed hypothesis test of mean as long as
all required assumptions have been met.

Question 2) Test Requirements Met?


a) t-Distribution of Test Statistic
A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under
the Null Hypothesis by the t Distribution. The Test Statistic is derived from the mean of the sample taken
and therefore has the same distribution that the sample mean would have if multiple similar samples were
taken from the same population.
The sample size indicates how to determine the distribution of the sample mean and therefore the
distribution of the Test Statistic as follows:

When Sample Size Is Large


When the sample size is large (n > 30), the distribution of means of similar samples drawn from the same
population is described by the t Distribution. As per the Central Limit Theorem, as sample size increases,
the distribution of the sample means converges to the normal distribution as does the t Distribution. When
sample size approaches infinity, the t Distribution converges to the standard normal distribution.
When sample size is large, the distribution of the sample mean, and therefore the distribution of the Test
Statistic, is always described by the t Distribution. A t-Test can therefore always be used when sample
size is large, regardless of the distribution of the population or sample.

When Sample Size is Small


A sample taken from a normally-distributed population produces a Test Statistic that is distributed according
to the t Distribution regardless of sample size.
The means of similar random samples taken from a normally-distributed population are also distributed
according to the t Distribution regardless of sample size.
The sample mean, and therefore the Test Statistic, are distributed according to the t Distribution if the
population is normally distributed.

The population is considered to be normally distributed if any of the following are true:
1) The population from which the sample was taken is shown to be normally distributed.
2) The sample is shown to be normally distributed. If the sample passes a test of normality then the
population from which the sample was taken can be assumed to be normally distributed.
The population or the sample must pass a normality test before a t-Test can be performed. If the only
data available are the data of the single sample taken, then the sample must pass a normality test before a t-
Test can be performed.

Evaluating the Normality of the Sample Data


The following five normality tests will be performed on the sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

Histogram in Excel
The quickest way to check the sample data for normality is to create an Excel histogram of the data as
shown below, or to create a normal probability plot of the data if you have access to an automated
method of generating that kind of a graph.

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

The sample group appears to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.
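Bin counts can also be generated with a worksheet formula instead of the Histogram tool. As a sketch (the cell ranges here are assumptions used only for illustration), if the sample data were in A2:A21 and the bin upper boundaries in C2:C7, selecting the cells D2:D8 and entering the following as an array formula (Ctrl+Shift+Enter in versions of Excel that require it) would return the count of data points falling into each bin, with the final cell holding the count above the highest boundary:

=FREQUENCY(A2:A21,C2:C7)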

Normal Probability Plot in Excel


Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:

The normal probability plot for the sample group shows that the data appear to be very close to being
normally distributed. The actual sample data (red) match very closely the data values that would occur if the
sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval
boundaries (green).
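One common way to construct this type of plot (this is a sketch of one approach, not the only valid one) is to sort the n data points in ascending order, assign the i-th sorted point a cumulative probability such as (i - 0.5)/n, convert that probability to a standard normal z value with NORM.S.INV(), and then plot the sorted data against Sample Mean + Sample Standard Deviation * NORM.S.INV((i - 0.5)/n). Data that are close to normally distributed will fall close to the resulting straight reference line.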

Kolmogorov-Smirnov Test For Normality in Excel


The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to
the Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
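A minimal worksheet layout for this calculation might look like the following (the cell ranges and the use of the simple empirical CDF k/n are assumptions used only for illustration; the Kolmogorov-Smirnov Test has its own detailed section in this manual). Assume the 20 sorted data points are in A2:A21 and their ranks 1 through 20 are in B2:B21:

Empirical CDF in C2 = B2/20
Expected normal CDF in D2 = NORM.DIST(A2, Sample Mean, Sample Stan. Dev., TRUE)
Difference in E2 = ABS(C2-D2)
Max Difference = MAX(E2:E21)

The Max Difference is then compared to the Kolmogorov-Smirnov Critical Value for the given n and α, which is taken from a Kolmogorov-Smirnov table (for larger samples and α = 0.05 it is often approximated as 1.36/SQRT(n)).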

F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

0.1500 = Max Difference Between Actual and Expected CDF


20 = n = Number of Data Points
0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Max Difference Between the Actual and Expected CDF (0.1500) is less than the Kolmogorov-
Smirnov Critical Value for n = 20 and α = 0.05 so do not reject the Null Hypothesis.
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected if the maximum difference between the expected and actual CDF of
any of the data points exceed the Critical Value for the given n and α.
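For readers who want to check this calculation outside of Excel, the following short Python sketch is illustrative only; it uses randomly generated demonstration data in place of the actual sample values (which appear only in the workbook screenshots) and computes the maximum difference between the actual and expected CDF in the same way as described above.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(186000, 30000, 20)        # demonstration data only

x = np.sort(sample)
n = len(x)
mean, sd = x.mean(), x.std(ddof=1)            # same values as Excel's AVERAGE() and STDEV.S()

expected_cdf = stats.norm.cdf(x, mean, sd)    # F(Xk) = NORM.DIST(Xk, mean, sd, TRUE)
ecdf_after = np.arange(1, n + 1) / n          # empirical CDF just after each sorted point
ecdf_before = np.arange(0, n) / n             # empirical CDF just before each sorted point

d_max = max(np.max(np.abs(ecdf_after - expected_cdf)),
            np.max(np.abs(ecdf_before - expected_cdf)))
print("Max difference between actual and expected CDF:", round(d_max, 4))

The empirical CDF is evaluated just before and just after each sorted data point so that the largest vertical distance to the fitted normal CDF is captured on both sides of each step.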

Anderson-Darling Test For Normality in Excel


The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a Test Statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberrations in outer values than the A-D test.
If the Test Statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the Test Statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.

F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

Adjusted Test Statistic A* = 0.407


Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given α.
The Adjusted Test Statistic (A*) for the sample group (0.407) is significantly less than the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test for the sample group cannot be rejected.
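As a cross-check outside of Excel, scipy provides an Anderson-Darling routine. The sketch below is illustrative only and uses randomly generated demonstration data; note that scipy reports the A-squared statistic together with critical values that it has already adjusted for the sample size, so the reported statistic is compared directly against those reported critical values.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(186000, 30000, 20)        # demonstration data only

result = stats.anderson(sample, dist='norm')
print("A-squared statistic:", round(result.statistic, 3))
for level, critical in zip(result.significance_level, result.critical_values):
    print("alpha =", level / 100, " critical value =", critical)
# Reject normality at a given alpha only if the statistic exceeds that critical value.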

Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A Test Statistic W is calculated. If this Test Statistic is less than a critical value of W
for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample
is normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that when the data are in fact not normally distributed, it correctly rejects the Null Hypothesis of normality a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Sample Data

0.967452 = Test Statistic W


0.905 = W Critical for the following n and Alpha
20 = n = Number of Data Points
0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Shapiro-Wilk Test Statistic W (0.967452) is larger than W Critical (0.905). The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state, at the 95 percent confidence level, that the data are not normally distributed.
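The same conclusion can be cross-checked outside of Excel with scipy's built-in Shapiro-Wilk routine. The sketch below is illustrative only and uses randomly generated demonstration data rather than the actual sample values.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(186000, 30000, 20)        # demonstration data only

w_statistic, p_value = stats.shapiro(sample)
print("Shapiro-Wilk W:", round(w_statistic, 6), " p value:", round(p_value, 4))
# Fail to reject normality when the p value is greater than alpha (0.05),
# which corresponds to W being above W Critical.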

Correctable Reasons That Normal Data Can Appear Non-Normal


If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results will not appear normally distributed if a representative sample of the entire process is not collected.

When Data Are Not Normally Distributed


The Sign Test and the Wilcoxon One-Sample Signed-Rank Test are nonparametric alternatives to the one-sample t-test when the normality assumption of the sampled data is questionable. The one-sample t-test is used to evaluate whether a population from which samples are drawn has the same mean as a known value. The nonparametric tests evaluate whether the sample has the same median as a known value.
The Sign Test is a much less powerful alternative to the Wilcoxon One-Sample Signed-Rank test, but it does not assume that the differences between the samples and the known value are symmetrically distributed about a median, as the Wilcoxon One-Sample Signed-Rank test does when used as a nonparametric alternative to the one-sample t-test. The Sign Test is non-directional and can be substituted only for a two-tailed test but not for a one-tailed test.
The parametric one-sample, two-tailed t-Test that is performed in this section detected a difference at alpha = 0.05. The Wilcoxon One-Sample Signed-Rank Test also detected a difference at alpha = 0.05. The Sign Test was not able to detect a difference even at alpha = 0.25.
Both the Wilcoxon One-Sample Signed-Rank Test and the Sign Test will be performed on the data in this
example near the end of this section.

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Reject or Fail to Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternate Hypotheses


The Null Hypothesis is always an equality that states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average monthly sales of all stores in the region (the
population from which the twenty-store sample was taken) is not different than the national monthly store
average sales, µ, which is $186,000. We will use the variable x_bar to represent the sample mean of the
twenty stores. The Null Hypothesis is as follows:
H0: x_bar = Constant = 186,000
The Constant is quite often the known population mean, µ, to which the sample mean is being compared.
The Alternative Hypothesis is always an inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the average monthly sales of all stores in the region (the population from which the twenty-store sample was taken) is merely different than the national monthly store average sales, µ, which is $186,000.
The Alternative Hypothesis is as follows:
H1: x_bar ≠ Constant, which is 186,000
H1: x_bar ≠ 186,000
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous (it requires a greater difference between the two entities being compared before the test shows that there is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the means of the populations from which the samples were taken. A one-sample t-Test determines whether to reject or fail to reject the Null Hypothesis, which states that the population from which the sample was taken (the entire region) has a mean equal to the Constant. The Constant in this case is equal to the known national average.
Parameters necessary to map the distributed variable, x_bar, to the t Distribution are the following:
s = sample standard deviation =STDEV.S() = 29735.68
n = sample size = COUNT() = 20
SE = Standard Error = s / SQRT(n) = 29735.68 / SQRT(20) = 6649.10
df = degrees of freedom = n – 1 = 20 – 1 = 19

Step 2 – Map the Distributed Variable to t-Distribution
A t-Test can be performed if the sample mean and the Test Statistic (the t Value) are distributed according to the t Distribution. If the sample has passed a normality test, the sample mean and the closely-related Test Statistic are distributed according to the t Distribution.
The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies
only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its
degrees of freedom, which equals n – 1 if n = sample size.
The means of similar, random samples taken from a normal population are distributed according to the t Distribution. This means that the distribution of a large number of means of samples of size n taken from a normal population will have the same shape as a t Distribution with its degrees of freedom equal to n – 1.
The sample mean and the Test Statistic are both distributed according to the t Distribution with degrees of
freedom equal to n – 1 if the sample or population is shown to be normally distributed. This step will map
the sample mean to a t Distribution curve with a degrees of freedom equal to n – 1.
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the sample mean, x_bar.

The variable x_bar is distributed according to the t Distribution. Mapping this distributed variable to a t
Distribution curve is shown as follows:

This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null
Hypothesis, which is:
H0: x_bar = Constant = 186,000
This non-standardized t Distribution curve is constructed from the following parameters:
Mean = 186,000
Standard Error = 6,649.10
Degrees of Freedom = 19
Distributed Variable = x_bar

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying to show graphically how different the sample mean, x_bar = $200,000, is from the national average of $186,000.
The non-standardized t Distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.

If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is
rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null
Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.
This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve
that is the Region of Rejection.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The locations of these Critical Values need to be calculated.

Calculate Critical Values


A Critical Value is the boundary between a Region of Acceptance and a Region of Rejection. In the case
of a two-tailed test, the Region of rejection is split between two outer tails. There are therefore two Critical
Values.
The Critical Value is the boundary on either side of the curve beyond which 2.5 percent of the total area
under the curve exists. In this case both Critical Values can be found by the following:
Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± T.INV(1-α/2,df) * SE
Critical Values = 186,000 ± T.INV(0.975, 19) * 6649.1
Critical Values = 186,000 ± 13,916
Critical Values = 172,083 and 199,916
The Region of Rejection is therefore everything that is to the right of 199,916 and everything to the left of
172,083.
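The same Critical Value arithmetic can be sketched outside of Excel. In the illustrative Python sketch below, scipy.stats.t.ppf plays the role of Excel's T.INV, and the mean, Standard Error, degrees of freedom, and Alpha are the values calculated above.

from scipy import stats

mean, se, df, alpha = 186000, 6649.1, 19, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)       # plays the role of T.INV(0.975, 19); about 2.093
lower = mean - t_crit * se                    # left Critical Value, about 172,083
upper = mean + t_crit * se                    # right Critical Value, about 199,916
print(round(t_crit, 3), round(lower, 1), round(upper, 1))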

The non-standardized t Distribution curve with the blue Region of Acceptance and the yellow Regions of
Rejection divided by the Critical Values is shown is in the following Excel-generated graph of this non-
standardized t Distribution curve:

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that determine whether to reject the Null Hypothesis. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:
1) Compare Sample Mean x_bar With Critical Value
Reject the Null Hypothesis if the sample mean, x_bar = 200,000, falls into the Region of Rejection. Equivalently, reject the Null Hypothesis if the sample mean, x_bar, is further from the curve’s mean of 186,000 than the Critical Value is.
The Critical Values have been calculated to be 172,083 on the left and 199,916 on the right. x_bar (200,000) is further from the curve mean (186,000) than the right Critical Value (199,916) is. The Null Hypothesis is therefore rejected.

2) Compare the t Value With Critical t Value


The t Value is the number of Standard Errors that x_bar is from the curve’s mean of 186,000.
The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t
Value.

Equivalently, fail to reject the Null Hypothesis if the t Value is closer to the standardized mean of zero than the Critical t Value.

t Value (Test Statistic) = (200,000 – 186,000)/6,649.1


t Value (Test Statistic) = 2.105
This means that the sample mean, x_bar, is 2.105 standard errors from the curve mean (186,000).
Critical t Values = ±T.INV(1-α/2,df)
Critical t Values = ±T.INV(1-0.05/2,19)
Critical t Values = ±2.093
This means that the boundaries of the Region of Rejection are 2.093 standard errors from the curve
mean (186,000) on each side since this is a two-tailed test.
The Null Hypothesis is rejected because the t Value (2.105) is farther from the standardized mean of zero
than the Critical t Value on that side (+2.093) indicating that x_bar is in the Region of Rejection.

3) Compare the p Value With Alpha


The p Value is the percent of the curve that is beyond x_bar (200,000). If the p Value is smaller than
Alpha/2 (since this is a two-tailed test), the Null Hypothesis is rejected.
p Value = T.DIST.RT(ABS(t Value), df)
p Value = T.DIST.RT(ABS(2.105), 19)
p Value = 0.0244
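The three quantities used in these comparisons can be reproduced outside of Excel as well. In this illustrative Python sketch, scipy.stats.t.ppf stands in for T.INV and scipy.stats.t.sf stands in for T.DIST.RT; all input values are the ones calculated above.

from scipy import stats

x_bar, constant, se, df, alpha = 200000, 186000, 6649.1, 19, 0.05

t_value = (x_bar - constant) / se             # about 2.105
t_crit = stats.t.ppf(1 - alpha / 2, df)       # about 2.093, matching T.INV(0.975, 19)
p_value = stats.t.sf(abs(t_value), df)        # right-tail area, about 0.0244, matching T.DIST.RT

print(round(t_value, 3), round(t_crit, 3), round(p_value, 4))
print("Reject the Null Hypothesis:", p_value < alpha / 2)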

The p Value (0.0244) is smaller than Alpha/2 (0.025), the portion of the Region of Rejection in the right tail, and we therefore reject the Null Hypothesis. The graph below shows that the red p Value (the curve area beyond x_bar) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails.
This is shown in the following Excel-generated graph of this non-standardized t Distribution curve:

It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test,
the Null Hypothesis would still have been rejected because:
1) The p Value (0.0244) would still be smaller than the Alpha (0.05) Region of Rejection, which is now
entirely contained in the right tail
2) x_bar (200,000) would still be outside the Region of Acceptance, which would now have its outer right
boundary at 197,497.2 (mean + T.INV(1 - Alpha,df)*SE)
3) The t Value (2.105) would still be larger than the critical t Value which would now be 1.73 (Critical t
Value = T.INV(1 - Alpha,df))

Excel Shortcut to Performing a One-Sample t-Test
All of the three other types of t-Tests (two-independent-sample pooled and unpooled t-Tests along with
the paired t-Test) can be solved in one step with a built-in Excel formula and also with a built-in Data
Analysis tool for each t-Test.
Excel unfortunately does not provide a formula or tool that can perform or solve a one-sample t-Test in
one step. Interestingly enough, a one-sample z-Test can be solved in Excel in one step with the following
formula:
p Value = MIN(Z.TEST(array,Constant,σ),1- Z.TEST(array,Constant,σ))
array = Set of sample data
Constant = the Constant in the Null Hypothesis
σ = Population standard deviation
There is no such method in Excel to perform a one-sample t-Test similarly in a single step. The other
three types of t-Tests each have a one-step tool and a one-step formula. One of the main reasons that
these tools and formulas are one-step is that the t Value is calculated automatically. There is no one-
sample t-Test tool or formula that automatically calculates the t Value while performing the t-Test or
calculating the p Value. The t Value must be calculated in its own step when performing a one-sample t-
Test in Excel.
The formula needed to perform a one-sample t-Test is the following as previously shown:
p Value = T.DIST.RT(ABS(t Value), df)
This formula requires that the t Value be calculated first. This must be done manually using the following
steps:
t Value = (x_bar – Constant)/SE
SE = s/SQRT(n)
The one-sample t-Test is a very common statistical test, so it is surprising that Excel does not have a one-step formula or a Data Analysis tool to directly calculate either the p Value or the t Value given the array and the Constant from the Null Hypothesis. Each of the other three types of t-Tests has its own specific formulas and its own Data Analysis tools to perform either the entire t-Test or the p Value calculation in a single step.
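For comparison, statistical libraries outside of Excel do offer a one-step one-sample t-Test. The Python sketch below is illustrative only and uses randomly generated demonstration data in place of the actual twenty sales figures.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(200000, 30000, 20)        # demonstration data only
constant = 186000                             # the Constant from the Null Hypothesis

t_value, p_two_tailed = stats.ttest_1samp(sample, popmean=constant)
print("t Value:", round(t_value, 3), " two-tailed p Value:", round(p_two_tailed, 4))

Note that ttest_1samp returns the two-tailed p Value, which is compared directly to Alpha; the Excel approach above computes the one-tail area with T.DIST.RT and compares it to Alpha/2, which is equivalent for a two-tailed test.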

Effect Size in Excel
Effect size for a one-sample t-Test is a method of expressing the difference between the sample mean,
x_bar, and the Constant in a standardized form that does not depend on the sample size.
Remember that the Test Statistic (the t Value) for a one-sample t-Test is calculated by the following formula:

t Value = (x_bar – Constant) / SE

since SE = (Sample Standard Deviation) / SQRT(Sample Size) = s/SQRT(n)

The t Value specifies the number of Standard Errors that the sample mean, x_bar, is from the Constant. The t Value determines whether the test has achieved statistical significance and is dependent upon the sample size, n. Achieving statistical significance means that the Null Hypothesis (H0: x_bar = Constant) has been rejected.
The Effect Size, d, for a one-sample t-Test is a very similar measure that does not depend on sample size and has the following formula:

d = |x_bar – Constant| / s

A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated by the following formula:

d = |t Value| / SQRT(n)
The d measured here is Cohen’s d for a one-sample t-Test. The Effect Size is a standardized measure of
size of the difference that the t-Test is attempting to detect. The Effect Size for a one-sample t-Test is a
measure of that difference in terms of the number of sample standard deviations. Note that sample size
has no effect on Effect Size. Effect size values for the one-sample t-Test are generalized into the
following size categories:

d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bar – Constant| / s = |200,000 – 186,000| / 29,735.68 = 0.471
An effect size of d = 0.471 is considered to be a small effect.
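This Effect Size calculation can be verified with a few lines of arithmetic. The sketch below is illustrative only; it uses the sample statistics given above and also shows the equivalent calculation from the t Value.

import math

x_bar, constant, s, n = 200000, 186000, 29735.68, 20

d = abs(x_bar - constant) / s                 # Cohen's d, about 0.471
t_value = (x_bar - constant) / (s / math.sqrt(n))
d_from_t = abs(t_value) / math.sqrt(n)        # identical value, computed from the t Value
print(round(d, 3), round(d_from_t, 3))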

Power of the Test With Free Utility G*Power


The Power of a one-sample t-Test is a measure of the test’s ability to detect a difference given the
following parameters:
Alpha (α)
Effect Size (d)
Sample Size (n)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type 2 Error. A Type 2 Error can be described as a False Negative. A False Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
The four variables that are required in order to determine the Power for a one-sample t-Test are Alpha
(α), Effect Size (d), Sample Size (n), and the Number of Tails. Typically alpha, Effect Size, and the
Number of Tails are held constant while sample size is varied (usually increased) to achieve the desired
Power for the statistical test.
Manual calculation of a test’s Power given Alpha, Effect Size, Sample Size, and the Number of Tails is quite tedious. Fortunately there are a number of free utilities that will readily calculate a test’s statistical Power. A widely-used free Power calculation utility called G*Power is available for download from the Institute of Experimental Psychology at the University of Dusseldorf at this link:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for a one-sample t-Test
are Alpha (α), Effect Size (d), Sample Size (n), and the Number of Tails.

Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Difference from constant (one-sample case)
Type of power analysis: Post hoc – Compute achieved power –given α, sample size, and effect size
Number of Tails = 2
Effect Size (d) = 0.471
Alpha (α) = 0.05
Sample Size (n) = 20
The completed dialogue screen appears as follows:

Clicking Calculate would produce the following output:

The Power achieved for this test is 0.5645. This means that the current two-tailed test has a 56.45
percent chance of detecting a difference that has an effect size of 0.471 if α = 0.05 and n = 20.
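If the statsmodels Python package is available, the same post hoc power calculation can be reproduced outside of G*Power. The sketch below is illustrative only and assumes statsmodels is installed; it is not part of Excel or G*Power.

# Requires the statsmodels package (an assumption; it is not part of Excel or G*Power).
from statsmodels.stats.power import TTestPower

power = TTestPower().power(effect_size=0.471, nobs=20, alpha=0.05, alternative='two-sided')
print("Post hoc power:", round(power, 4))     # should land close to the G*Power result of about 0.56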

It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw Plot on the next screen that comes up. This will produce the following output:

This would indicate that a Power of 80 percent would be achieved for this test if the sample size were
approximately n = 34.

Nonparametric Alternatives in Excel


There are two nonparametric tests that can be substituted for the one-sample t-Test when normality of the sample or population cannot be verified and sample size is small. These two tests are the Wilcoxon One-Sample, Signed-Rank Test and the Sign Test. The one-sample t-test is used to evaluate whether a population from which samples are drawn has the same mean as a known value. The nonparametric tests evaluate whether the sample has the same median as a known value.
The Sign Test is a significantly less powerful alternative to the Wilcoxon One-Sample Signed-Rank test, but it does not assume that the differences between the samples and the known value are symmetrically distributed about a median, as the Wilcoxon One-Sample Signed-Rank test does when used as a nonparametric alternative to the one-sample t-test. The Sign Test is non-directional and can be substituted only for a two-tailed test but not for a one-tailed test.
The Wilcoxon test is based upon the sum of rankings of values while the Sign Test is based upon the
sum of positive versus negative values.
The Wilcoxon One-Sample Signed-Rank Test is much more powerful (better able to detect a difference) than the Sign Test but has a required assumption that the sample data are distributed about a median in a relatively symmetric fashion. The Sign Test does not have this assumption.

Wilcoxon One-Sample, Signed-Rank Test in Excel


The Wilcoxon One-Sample, Signed-Rank Test is an alternative to the one-sample t-Test when sample
size is small (n < 30) and normality cannot be verified for the sample data or the population from which
the sample was taken.
The Wilcoxon One-Sample, Signed-Rank Test calculates the difference between each data point in the sample and the Constant from the t-Test’s Null Hypothesis (186,000 in this case). The absolute values of the differences are ranked and then assigned the sign (positive or negative) that each difference originally had. These signed ranks are summed up to create the Test Statistic W.
Test Statistic W will be approximately normally distributed if the required assumptions are met for this
test. The Test Statistic’s z Score can then be calculated and compared with the Critical z value. The
decision whether or not to reject the test’s Null Hypothesis is made based on the results of this
comparison.
The Null Hypothesis for this test states that the median of the difference population equals a Constant.
This is somewhat similar to the Null Hypothesis of the one-sample t-Test which states that the mean of a
population equals a Constant.
The Wilcoxon One-Sample, Signed-Rank Test is performed on this data by implementing the following
steps:

Step 1) Calculate the Difference Between Each Sample Data Point and the Constant to Which the
Sample Is Being Compared.
The original Null Hypothesis from the one-sample t-Test stated that the mean monthly retail sales for the stores in a single region is equal to the national average, which is 186,000. The Null Hypothesis for this t-Test was as follows:
H0: x_bar = Constant = 186,000

A difference sample consisting of the differences between each sample data point and the Constant
(186,000) is created as follows:

Step 2) Create the Null and Alternative Hypotheses.


The one-sample t-Test attempts to determine whether the mean monthly retail sales for the stores in a single region is equal to the national average, which is 186,000.
The Wilcoxon One-Sample, Signed-Rank Test attempts to determine whether the median monthly retail sales for the stores in a single region is equal to 186,000.
If the median monthly retail sales for the region’s stores equals 186,000, then the median of the differences will equal zero. The Null Hypothesis is based on this and is stated as follows:
H0: Median_Difference = Constant = 0

The Alternative Hypothesis is non-directional because the test’s overall purpose is to determine only
whether or not the regional mean monthly retail sales equals the national average of 186,000. The
Alternative Hypothesis for this Wilcoxon One-Sample, Signed-Rank Test will therefore be stated as
follows:
H1: Median_Difference ≠ Constant = 0
H1: Median_Difference ≠ 0
Step 3) Evaluate Whether the Test’s Required Conditions Have Been Met
The Wilcoxon One-Sample, Signed-Rank Test has the following requirements:
a) Data are ratio or interval but not categorical (nominal or ordinal). This is the case here.
b) Sample size is at least 10.
c) Data of the Difference sample are distributed about a median with reasonable symmetry. Test Statistic
W will not be normally distributed unless this assumption is met.
The following Excel-generated histogram shows that the difference data are distributed symmetrically
about their median of 14,000:

This histogram and the sample’s median were generated in Excel as follows:

Step 4 – Record the Sign of Each Difference
Place a “+1” or “-1” next to each non-zero difference. This can be automatically generated with an If-Then-Else statement as follows:

Placing a plus sign (+) next to a number automatically requires a custom number format available from the Format Cells dialogue box. One custom format that will work is the following: “+”#;“-”# . This is demonstrated in the following Excel screen shot:

Step 5 – Sort the Absolute Values of the Differences While Retaining the Sign Associated With
Each Difference
Sort both columns based upon column of difference absolute values.

Step 6 –Rank the Absolute Values, Attach the Signs, and Sum up the Signed Ranks to Create Test
Statistic W.
The absolute values are ranked in ascending order starting with a rank of 1. Absolute values that are tied are assigned the average rank of the tied values. For example, the first four absolute values are 6000. Each of these four absolute values would be assigned a rank of 2.5, which is equal to the average rank of all four, i.e., (1 + 2 + 3 + 4) / 4 = 2.5.
Test Statistic W is equal to the sum of all signed ranks.

Step 7 – Calculate the z Score of W
The distribution of Test Statistic W can be approximated by the normal distribution if all of the required
assumptions for this test are met. The difference data consists of more than 10 points of ratio data that
are reasonably symmetrically distributed about their median. The assumptions are therefore met for this
Wilcoxon One-Sample, Signed-Rank Test.
The standard deviation of W, σW, is calculated as follows:
σW = SQRT[ n(n + 1)(2n + 1)/6 ] = 53.57
z Score = ( W – Constant – 0.5) / σW
z Score = ( 110 – 0 – 0.5) / 53.57 = 2.04
The constant is the Constant from the Null Hypothesis for this test, which is the following:
H0: Median_Difference = Constant = 0
The z Score must include a 0.5 correction for continuity because W assumes whole integer values
(except in the event of a tie of ranks).
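Steps 4 through 7 can also be reproduced outside of Excel. The Python sketch below is illustrative only and uses randomly generated demonstration data in place of the actual monthly sales figures; it ranks the absolute differences (averaging tied ranks), sums the signed ranks to obtain W, and then computes the z Score with the same continuity correction.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sales = rng.normal(200000, 30000, 20)         # demonstration data only
constant = 186000                             # Constant from the t-Test's Null Hypothesis

diffs = sales - constant
diffs = diffs[diffs != 0]                     # zero differences are dropped
n = len(diffs)

ranks = stats.rankdata(np.abs(diffs))         # tied absolute values receive the average rank
w = np.sum(np.sign(diffs) * ranks)            # Test Statistic W = sum of the signed ranks

sigma_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 6)
z_score = (abs(w) - 0.5) / sigma_w            # 0.5 continuity correction, as in Step 7
z_critical = stats.norm.ppf(1 - 0.05 / 2)     # about 1.9599 for a two-tailed test at alpha = 0.05

print(round(w, 1), round(z_score, 2), round(z_critical, 4))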
Step 8 – Reject or Fail to Reject the Null Hypothesis Based Upon a Comparison Between the z
Score and the Critical z Value
Given that α = 0.05 and this is a two-tailed test, the Critical z Value is calculated as follows:
z Critical (α = 0.05, two-tailed) = ±NORM.S.INV(1 – α/2) = ±NORM.S.INV(0.975)
z Critical (α = 0.05, two-tailed) = ±1.9599
The Null Hypothesis is rejected if the z Score is further from the standardized mean of zero than the
Critical z Values. This is the case here since the z Score (2.04) is further from the standardized mean of
zero than the Critical z Values (±1.9599). These results from the Wilcoxon Signed-Rank Test are shown
in the following Excel-generated graph:

Rejection of the Null Hypothesis for this test can be interpreted to state that there is at least 95 percent
certainty that the median of the difference sample does not equal zero. This would mean that there is 95
percent certainty that the median monthly sales of the retail stores in the region does not equal the
national average of 186,000.
The results of this Wilcoxon One-Sample, Signed-Rank Test were very similar to the results of the original
one-sample t-Test in which the Null Hypothesis was rejected because the t value (2.105) was further from
the standardized mean of zero than the Critical t Value (2.093). The results of this t-Test indicate 95
percent certainty that the mean monthly sales of the retail stores in the region does not equal the national
average of 186,000.
The results of the t-Test are shown in the following Excel-generated graph of this non-standardized t
Distribution:

The Wilcoxon One-Sample Signed-Rank Test detects that the median difference between the region’s
retail store monthly sales and the national average is significant at an alpha level of 0.05.
The one-sample t-Test detects that the mean difference between the region’s retail store monthly sales and the national average is significant at an alpha level of 0.05.

Sign Test in Excel
The Sign Test along with the Wilcoxon One-Sample Signed-Rank Test are nonparametric alternatives to
the one-sample t-Test when the normality of the sample or population cannot be verified and the sample
size is small.
The Wilcoxon One-Sample Signed-Rank Test is significantly more powerful than the Sign Test but has a
requirement of symmetrical distribution about a median for the difference sample data (the data set of the
sample points minus the Constant of the Null Hypothesis). The Wilcoxon One-Sample Signed-Rank Test
is based upon a normal approximation of its Test Statistic’s distribution. This requires that the difference
sample be reasonably symmetrically distributed about a median.
The Sign Test has no requirements regarding the distribution of data but, as mentioned, is significantly
less powerful than the Wilcoxon One-Sample Signed-Rank Test.

The Sign Test counts the number of positive and negative non-zero differences between the sample data and the Constant from the Null Hypothesis in the one-sample t-Test. In this case that Constant = 186,000 because the Null Hypothesis of the two-tailed, one-sample t-Test is as follows:
H0: x_bar = Constant = 186,000

This difference sample is calculated as follows:

A count of positive and negative differences in this sample is taken as follows:

The minimum count of positive or negative non-zero differences is designated as the Test Statistic W for
this One-Sample Sign Test. Test Statistic W is named after Frank Wilcoxon who developed the test.
The objective of the two-tailed, one-sample t-Test was to determine whether to reject or fail to reject the Null Hypothesis that states that the mean monthly sales of retail stores in the one region is equal to the national average, which is 186,000.
If the region’s mean store sales is equal to 186,000, then the probability of the monthly sales of any store
in the region minus 186,000 being positive (greater than zero) is the same as the probability of being
negative (less than zero). This probability is 50 percent.
Without knowing whether positive outcomes or negative outcomes are being counted, the probability of
the mean monthly sales of the region’s stores being 186,000 is equal to the probability of a positive
outcome (p) being 50 percent OR the probability of a negative outcome (q) being 50 percent.

The Null Hypothesis for this two-tailed, one-sample Sign Test states that the probability of a difference being positive (p) OR the probability of a difference being negative (q) is 50 percent. This can be expressed as follows:
H0: p=0.5 OR q=0.5
which, written with the union symbol ∪ for OR, would be expressed as follows:
H0: p=0.5 ∪ q=0.5
The Alternative Hypothesis is the negation of the Null Hypothesis and would state the following:
H1: p≠0.5 ∩ q≠0.5
Each non-zero difference is classified as either positive or negative. This is a binary event because the
classification of each difference has only two possible outcomes: the non-zero difference is either positive
or negative.
The distribution of the outcomes of this binary event can be described by the binomial distribution as long as the following three conditions exist:
1) Each binary trial is independent.
2) The data from which the differences are derived are at least ordinal. The data can be ratio, interval,
ordinal, but not nominal. The differences of “less than” and “greater than” must be meaningful even if the
amount of difference is not, as would be the case with ordinal data but not with nominal data.
3) Each binary trial has the same probability of a positive outcome.
All of these conditions are met because of the following:
1) Each sample taken is independent of any other sample.
2) The differences are derived from continuous (either ratio or interval) data.
3) The proportion of positive differences versus negative differences is assumed to be constant in the
population from which the sample of differences was derived.
The counts of the positive and negative differences both follow the binomial distribution. The binary event
to be analyzed will be one of the two, i.e., either the count of positive differences OR the count of the
negative differences. The conservative choice will be made by selecting the count that has the lowest
number.
This count, whether it is the count of positive differences or the count of negative differences, is
designated as W, the Test Statistic. This Test Statistic follows the binomial distribution because W
represents the count of positive or negative outcomes of independent binary events that all have the
same probability of a positive outcome.
As stated, the Null Hypothesis of this two-tailed, one-sample Sign Test is the following:
H0: p=0.5 ∪ q=0.5
The Null Hypothesis would be rejected if the p Value calculated from this test is less than alpha, which is
customarily set at 0.05.
The logical operator OR represents the union of sets. The probability of Event A OR Event B occurring, when the two events are mutually exclusive, equals the sum of the probabilities of each occurring individually.
Pr(A ∪ B) = Pr(A) + Pr(B)
The p Value of this test represents the probability, given that p = 0.5, of observing a count of positive differences less than or equal to W, plus the probability, given that q = 0.5, of observing a count of negative differences less than or equal to W. Test Statistic W can represent either the count of positive OR negative differences and is set to the difference type that has the lower count.

This p Value is expressed as follows:
p Value =
Pr (No. of Positive Differences ≤ W | p = 0.5, n = No. of Non-Zero Differences)
OR
Pr (No. of Negative Differences ≤ W | q = 0.5, n = No. of Non-Zero Differences)
Since Pr(A ∪ B) = Pr(A) + Pr(B) for mutually exclusive events,
p Value =
Pr (No. of Positive Differences ≤ W | p = 0.5, n = 20 = No. of Non-Zero Differences)
+
Pr (No. of Negative Differences ≤ W | q = 0.5, n = 20 = No. of Non-Zero Differences)
Given that variable x is binomially distributed, the CDF (Cumulative Distribution Function), which gives Pr(x ≤ X), is calculated in Excel as follows:
F(X;n,p) = BINOM.DIST(X, n, p, 1)
This calculates the probability that up to X number of positive outcomes will occur in n total binary trials if
the probability of a positive outcome is p for every trial. “1” specifies that the Excel formula will calculate
the CDF and not the PDF.
Therefore the following can be calculated:
Pr (No. of Positive Differences ≤ W | p = 0.5, n = Total No. of Non-Zero Differences)
= BINOM.DIST(W, n, p, 1)
= BINOM.DIST(7,20,0.5,1) = 0.1316
Pr (No. of Negative Differences ≤ W | q = 0.5, n = 20 = No. of Non-Zero Differences)
= 1 - BINOM.DIST(n - W - 1, n, q, 1)
= 1 - BINOM.DIST(12,20,0.5,1) = 0.1316
(A count of at most W = 7 negative differences is the same event as a count of at least n - W = 13 positive differences.)
Due to the symmetry of the binomial distribution, the following is true:
BINOM.DIST(W, n, p, 1) = 1 - BINOM.DIST(n - W - 1, n, q, 1)
p Value = BINOM.DIST(W, n, p, 1) + [1 - BINOM.DIST(n - W - 1, n, q, 1)]
p Value = 2 * BINOM.DIST(W, n, p, 1)
p Value = 2 * BINOM.DIST(7,20,0.5,1) = 2 * 0.1316 = 0.2632
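The same binomial arithmetic can be sketched outside of Excel using the W = 7 and n = 20 values from this example; in the illustrative Python sketch below, scipy.stats.binom.cdf stands in for BINOM.DIST.

from scipy import stats

w, n = 7, 20                                   # W = smaller sign count, n = non-zero differences

lower_tail = stats.binom.cdf(w, n, 0.5)        # Pr(count <= 7), same as BINOM.DIST(7,20,0.5,1)
upper_tail = 1 - stats.binom.cdf(n - w - 1, n, 0.5)   # Pr(count >= 13), equal by symmetry
p_value = lower_tail + upper_tail              # 0.1316 + 0.1316 = 0.2632
print(round(lower_tail, 4), round(upper_tail, 4), round(p_value, 4))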

This is shown in the following Excel-generated graph of the PDF of the binomial distribution for this sign test. The parameters of this binomial distribution are Total Trials = N = 20 and the Probability of a Positive Outcome of Each Trial, p, equals 0.5. The Probability of a Negative Outcome, q, also equals 0.5.

This total p value (0.2632 = 0.1316 + 0.1316) is larger than alpha (set at 0.05). The Null Hypothesis is therefore not rejected at this alpha level. The Null Hypothesis for this test can be interpreted to state that the median difference is equal to zero. This would be equivalent to stating that the median monthly retail sales for the region is equal to the national average, which is 186,000.
This example demonstrates how much less powerful the one-sample Sign Test is than the one-sample t-
Test or the one-sample Wilcoxon Signed-Rank Test. The Sign Test did not come close to detecting a
difference at the same alpha level that the other two tests did.

2) Two-Independent-Sample, Pooled t-Test in Excel

Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the two sample means (x_bar1 and x_bar2) is equal to (two-tailed test), or greater than or less than (one-tailed test), a constant. This is a pooled test because the two sample standard deviations are similar enough that a single pooled standard deviation can replace them both.
x_bar1 - x_bar2 = Observed difference between the sample means

Pooled t-Tests are performed if the variances of both sample groups are similar. A rule-of-thumb is as follows: a Pooled t-Test should be performed if the standard deviation of one sample, s1, is no more than twice as large as the standard deviation of the other sample, s2. That is the case here for the following example.
dfpooled = degrees of freedom = n1 + n2 – 2

Null Hypothesis H0: x_bar1 - x_bar2 = Constant


The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar1 - x_bar2 is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of 2-Sample, 1-Tailed, Pooled t-Test in Excel
In this example two different brands of the same type of battery are being tested to determine whether there is a real difference in the average length of time that batteries from each of the two brands last. The length of each battery’s lifetime of operation in minutes was recorded. Determine with 95 percent certainty whether Brand A batteries have a longer average lifetime than Brand B batteries.
Here are the data samples from batteries of the two brands:

Running the Excel data analysis tool Descriptive Statistics separately on each sample group produces the
following output:

Note that when performing two-sample t-Tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the Pooled t-Test will be more intuitive if the sample group with the larger mean is
designated as the first sample and the sample group with the smaller mean is designated as the second
sample.
Another reason for designating the sample group with the larger mean as the first sample is to obtain the
correct result from the Excel data analysis tool t-Test: Two-Sample Assuming Equal Variances. The
test statistic (T Stat in the Excel output) and the Critical t value (t Critical two-tail in the Excel output) will
have the same sign (as they always should) only if the sample group with the larger mean is designated
the first sample.

Summary of Problem Information


Sample Group 1 – Brand A (Variable 1)
x_bar1 = sample1 mean = AVERAGE() = 43.56
µ1 (Greek letter “mu”) = population mean from which Sample 1 was drawn = Not Known
s1 = sample1 standard deviation =STDEV.S() = 16.92
Var1 = sample1 variance =VAR() = 286.13
σ1 (Greek letter “sigma”) = population standard deviation from which Sample 1 was drawn = Not Known
n1 = sample1 size = COUNT() = 16

Sample Group 2 – Brand B (Variable 2)
x_bar2 = sample2 mean = AVERAGE() = 33.53
µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known or needed to
solve this problem
s2 = sample2 standard deviation =STDEV.S() = 15.28
Var2 = sample2 variance =VAR() = 233.39
σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was drawn = Not Known
or needed to solve this problem
n2 = sample2 size = COUNT() = 17
x_bar1 - x_bar2 = 43.56 – 33.53 = 10.03
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
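Before walking through the four-step method, it may be helpful to see the whole pooled calculation sketched from the summary statistics alone. The Python sketch below is illustrative only; it assumes the Constant in the Null Hypothesis is zero (no difference between brands) and uses only the summary values listed above.

import math
from scipy import stats

x1, s1, n1 = 43.56, 16.92, 16                  # Brand A summary statistics from above
x2, s2, n2 = 33.53, 15.28, 17                  # Brand B summary statistics from above
alpha = 0.05

df = n1 + n2 - 2                               # pooled degrees of freedom
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
se = s_pooled * math.sqrt(1 / n1 + 1 / n2)     # Standard Error of (x_bar1 - x_bar2)

t_value = (x1 - x2 - 0) / se                   # assumes the Null Hypothesis Constant is 0
t_crit = stats.t.ppf(1 - alpha, df)            # one-tailed Critical t Value
p_one_tailed = stats.t.sf(t_value, df)         # right-tail area for the one-tailed test

print(round(t_value, 3), round(t_crit, 3), round(p_one_tailed, 4))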

As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:

Question 1) What Type of Test Should Be Done?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Mean because each individual observation (each sampled battery’s lifetime)
within each of the two sample groups can have a wide range of values. Data points for Hypothesis Tests
of Proportion are binary: they can take only one of two possible values.

b) One-Sample or a Two-Sample Test?


This is a two-sample hypothesis test because two independent samples are being compared with each
other. The two sample groups are the lifetimes in minutes of 16 Brand A batteries and the lifetimes of 17
Brand B batteries.

c) Independent or Dependent (Paired) Test?
It is an unpaired test because data observations in each sample group are completely unrelated to data
observations in the other sample group. The designation of “paired” or “unpaired” applies only for two-
sample hypothesis tests.

d) One-Tailed or Two-Tailed Test?


The problem asks to determine whether the average lifetime of Brand A batteries is greater than the
average lifetime of Brand B batteries. This is a directional inequality making this hypothesis test a one-
tailed test. If the problem asked whether the average lifetimes of both brands was simply different, the
inequality would be non-directional and the resulting hypothesis test would be a two-tailed test. A two-
tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?
A two-independent-sample hypothesis test of mean must be performed as a t-Test if sample size is small
(n1 + n2 < 40). In this case the sample size is small as n1 + n2 = 33. This Hypothesis Test of Mean must
be performed as a t-Test. A t-Test uses the t distribution and not the normal distribution as does a z-Test.

f) Pooled or Unpooled t-Test?


Pooled t-Tests are performed if the variances of both sample groups are similar. A rule-of-thumb is as
follows: A Pooled t-Test should be performed if the standard deviation of one sample is no more than
twice as large as the standard deviation in the other sample. That is the case here as the following are
true:
s1 = sample1 standard deviation = 16.92
and
s2 = sample2 standard deviation = 15.28

F Test For Sample Variance Comparison in Excel


An Excel F Test performed on the two sample groups produces the following output:

The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value
shown in the Excel F Test output equals 0.345. This is much larger than the Alpha (0.05) that is typically
used for an F Test so the Null Hypothesis cannot be rejected.

We therefore conclude as a result of the F Test that the variances are similar or, at least, that there is not enough evidence to state that the variances are different. The F Test is sensitive to non-normality of data. The sample variances can also be compared using the nonparametric Levene’s Test and also the nonparametric Brown-Forsythe Test.
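The F Test arithmetic can be sketched outside of Excel from the two sample variances and sample sizes given above. In this illustrative Python sketch, scipy.stats.f.sf returns the right-tail area, which should land near the 0.345 p Value reported above.

from scipy import stats

var1, n1 = 286.13, 16                          # Brand A sample variance and size
var2, n2 = 233.39, 17                          # Brand B sample variance and size

f_statistic = var1 / var2                      # larger variance in the numerator
p_one_tail = stats.f.sf(f_statistic, n1 - 1, n2 - 1)   # right-tail area
print(round(f_statistic, 3), round(p_one_tail, 3))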

Levene’s Test For Sample Variance Comparison in Excel


Levene’s Test is a hypothesis test commonly used to test for the equality of variances of two or more
sample groups. Levene’s Test is more robust against non-normality of data than the F Test.
The Null Hypothesis of Levene’s Test is that the average distance to the sample mean is the same for each sample group. Failure to reject this Null Hypothesis implies that the variances of the sampled groups are the same. The distance to the mean for each data point of both samples is shown as follows:

Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:

The Null Hypothesis of Levene’s Test states that the average distance to the mean is the same for the two groups. Failure to reject this Null Hypothesis would imply that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6472. This is much larger than the Alpha (0.05) that is typically used for an ANOVA Test, so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of Levene’s Test that the variances are the same or, at least, that we
don’t have enough evidence to state that the variances are different. Levene’s Test is sensitive to outliers because it relies on the sample mean, which can be unduly affected by outliers. A very similar nonparametric test called the Brown-Forsythe Test relies on sample medians and is therefore much less affected by outliers than Levene’s Test is and by non-normality than the F Test is.

Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two
or more sample groups. The Null Hypothesis of the Brown-Forsythe Test is that the average distance to the sample median is the same for each sample group. Failure to reject this Null Hypothesis implies that the variances of the sampled groups are the same. The distance to the median for each data point of both
samples is shown as follows:

The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:

The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median is the same for the two groups. Failure to reject this Null Hypothesis would imply that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6627. This is much larger than the Alpha (0.05) that is typically used for an ANOVA Test, so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of the Brown-Forsythe Test that the variances are the same or, at
least, that we don’t have enough evidence to state that the variances are different.
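Both of these tests are also available outside of Excel: scipy's levene function performs the classic Levene's Test when center='mean' and the Brown-Forsythe variant when center='median'. The sketch below is illustrative only and uses randomly generated demonstration data in place of the actual battery lifetimes.

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
brand_a = rng.normal(43.6, 16.9, 16)           # demonstration data only
brand_b = rng.normal(33.5, 15.3, 17)           # demonstration data only

stat_levene, p_levene = stats.levene(brand_a, brand_b, center='mean')     # classic Levene's Test
stat_bf, p_bf = stats.levene(brand_a, brand_b, center='median')           # Brown-Forsythe variant
print("Levene p value:", round(p_levene, 4), " Brown-Forsythe p value:", round(p_bf, 4))
# p values above alpha (0.05) mean the equal-variance Null Hypothesis is not rejected.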
Each of the above tests can be considered relatively equivalent to the others. The variances of both
sample groups are verified to be similar enough to permit using a Pooled test for this two-independent
sample hypothesis test.
This hypothesis test is therefore a two-independent-sample, one-tailed, Pooled t-Test of mean.

Question 2) Requirements Met?


a) Normal Distribution of Both Sample Means
A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under
the Null Hypothesis by the t Distribution. The t Value for this test is calculated as follows:

To perform a hypothesis test that is based on the normal distribution or t distribution, both sample means
must be normally distributed. In other words, if we took multiple samples just like either one of the two
mentioned here, the means of those samples would have to be normally distributed in order to be able to
perform a hypothesis test that is based upon the normal or t distributions.
For example, 30 independent, random samples of the battery lifetimes from each of the two battery
brands could be evaluated just like the single sample of the lifetimes of 15+ batteries from each of the two
battery brands as mentioned here. If the means of all of the 30 samples from one battery brand and,
separately, the means of the other 30 samples from the other battery brand are normally distributed, a
hypothesis test based on the normal or t distribution can be performed on the two independent samples
taken.
The means of the samples would be normally distributed if any of the following are true:

1) Sample Size of Both Samples Greater Than 30


The Central Limit Theorem states that the means of similar-sized, random, independent samples will be normally distributed if the sample size is large (n > 30) no matter how the underlying population from which the samples came is distributed. In reality, the distribution of sample means converges toward normality when n is as small as 5 as long as the underlying population is not too skewed.

2) Both Populations Are Normally Distributed


If this is the case, the means of similar sized, random, independent samples will also be normally
distributed. It is quite often the case that the distribution of the underlying population is not known and
should not be assumed.

3) Both Samples Are Normally Distributed


If the sample is normally distributed, the means of other similar-sized, independent, random samples will
also be normally distributed. Normality testing must be performed on the sample to determine whether the
sample is normally distributed.
In this case the sample size for both samples is small: n1 and n2 are both less than 30. The normal
distribution of both sample means must therefore be tested and confirmed. Normality testing on each of
the samples has to be performed to confirm the normal distribution of the means of both samples.

b) Similarity of Sample Variances


The two-independent-sample, Pooled t-Test requires that the two independent samples have similar variances. Samples that have similar variances are said to be homoscedastic. Samples that have significantly different variances are said to be heteroscedastic. The samples in this example have similar variances. This is confirmed by the variance comparison tests that were previously performed in this example.

c) Independence of Samples
This type of a hypothesis test requires both samples be totally independent of each other. In this case
they are completely independent. There is no relationship between the observations that make up each of
the two sample groups.

Evaluating the Normality of the Sample Data


The following five normality tests will be performed on the sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.
The quickest way to evaluate normality of a sample is to construct an Excel histogram from the sample
data.

Histogram in Excel

To create these histograms in Excel, fill in the Excel Histogram dialogue box for each sample group as follows:

Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.

Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:

Normal probability plots for both sample groups show that the data appear to be very close to
normally distributed. The actual sample data (red) match very closely the values that would be expected if
the sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval
boundaries (green).
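For readers who want to reproduce a normal probability plot outside Excel, scipy.stats.probplot produces the same ingredients: the ordered sample values paired with theoretical normal quantiles and a fitted straight reference line. This is a minimal sketch; brand_a and brand_b are placeholder names for the two sample arrays (the worksheet values are not reproduced here), and the 95 percent confidence bands drawn in the Excel chart would have to be added separately.

from scipy import stats

def normal_probability_points(sample):
    """Ordered sample values, theoretical normal quantiles, and the fitted reference line."""
    (theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample, dist="norm")
    return theoretical_q, ordered_values, r   # r close to 1 means the points hug the line

# Usage with the worksheet data (arrays not reproduced here):
# normal_probability_points(brand_a)
# normal_probability_points(brand_b)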

Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to the
Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
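The worksheet logic just described can also be scripted. The Python sketch below is a minimal version of the same calculation: sort the data, compute the fitted normal CDF of each point with the sample mean and standard deviation (the NORM.DIST step), and take the largest gap to the empirical CDF. The brand_a and brand_b names are placeholders because the raw lifetimes are not reproduced here, and the worksheet's difference column may be defined slightly differently (it may compare only one side of each empirical-CDF step).

import numpy as np
from scipy.stats import norm

def ks_normal_distance(x):
    """Largest distance between the empirical CDF and the normal CDF
    fitted with the sample mean and sample standard deviation."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    fitted = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))   # expected CDF (the NORM.DIST step)
    ecdf_after = np.arange(1, n + 1) / n                      # empirical CDF just after each point
    ecdf_before = np.arange(0, n) / n                         # empirical CDF just before each point
    return max(np.max(np.abs(ecdf_after - fitted)),
               np.max(np.abs(ecdf_before - fitted)))

# Usage with the 16 Brand A and 17 Brand B lifetimes from the worksheet:
# ks_normal_distance(brand_a)   # should be close to 0.0885
# ks_normal_distance(brand_b)   # should be close to 0.1007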
Variable 1 - Brand A Battery Lifetimes

0.0885 = Max Difference Between Actual and Expected CDF


16 = n = Number of Data Points
0.05 = α

Variable 2 - Brand B Battery Lifetimes

0.1007 = Max Difference Between Actual and Expected CDF


17 = n = Number of Data Points
0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected only if the maximum difference between the expected and actual CDF
of any of the data points exceeds the Critical Value for the given n and α. That is not the case here.
The Max Difference Between the Actual and Expected CDF for Variable 1 (0.0885, n = 16) and for Variable 2
(0.1007, n = 17) is well below the Kolmogorov-Smirnov Critical Value at α = 0.05 for either sample size
(approximately 0.33, using the common 1.36/SQRT(n) approximation), so the Null Hypothesis of the
Kolmogorov-Smirnov Test cannot be rejected for either of the two sample groups.

Anderson-Darling Test For Normality in Excel
The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions
tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is
therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov
test. The K-S test is less sensitive to aberrations in outer values than the A-D test.
If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
Variable 1 – Brand A Battery Lifetimes

Adjusted Test Statistic A* = 0.174

Variable 2 - Brand B Battery Lifetimes

Adjusted Test Statistic A* = 0.227


Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally
distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are
normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given
α.
The Adjusted Test Statistic (A*) for Variable 1 (0.174) and for Variable 2 (0.227) is significantly less than
the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling
Test cannot be rejected for either of the two sample groups.
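As a cross-check outside Excel, scipy.stats.anderson computes the Anderson-Darling statistic for normality along with a table of critical values. An assumption worth verifying against your SciPy version: SciPy applies the small-sample adjustment to the critical values rather than to the statistic, so its numbers may not line up exactly with the adjusted A* shown in the worksheet, although the reject / fail-to-reject conclusion should agree. The brand_a and brand_b names are placeholders for the worksheet arrays.

from scipy import stats

def ad_normality(sample, label):
    """Print the Anderson-Darling statistic and its critical-value table for a normality test."""
    result = stats.anderson(sample, dist="norm")
    print(label, "A-D statistic:", round(result.statistic, 3))
    # Critical values are paired with significance levels given in percent (e.g. 15, 10, 5, 2.5, 1).
    for cv, sig in zip(result.critical_values, result.significance_level):
        print("  alpha =", sig, "%: critical value =", cv)

# Usage with the worksheet data (arrays not reproduced here):
# ad_normality(brand_a, "Brand A")
# ad_normality(brand_b, "Brand B")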

Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior
performance compared to other normality tests, especially with small sample sizes. Superior performance
means that it correctly rejects the Null Hypothesis of normality when the data are in fact not normally
distributed a slightly higher percentage of the time than most other normality tests, particularly at small
sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Variable 1 – Brand A Battery Life

0.972027 = Test Statistic W


0.887 = W Critical for the following n and Alpha
16 = n = Number of Data Points
0.05 = α
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

Test Statistic W (0.972027) is larger than W Critical (0.887). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.

Variable 2 – Brand B Battery Life

0.971481 = Test Statistic W


0.892 = W Critical for the following n and Alpha
17 = n = Number of Data Points
0.05 = α
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.971481) is larger than W Critical (0.892). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.
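scipy.stats.shapiro provides the same W statistic (together with a p Value, so no separate critical-W table is needed). This is a minimal sketch for cross-checking the worksheet; brand_a and brand_b are placeholder names because the data arrays are not reproduced here.

from scipy import stats

def shapiro_summary(sample, alpha=0.05):
    """Return the Shapiro-Wilk W statistic, its p Value, and the test decision."""
    w, p = stats.shapiro(sample)
    return w, p, ("reject normality" if p < alpha else "cannot reject normality")

# Usage with the worksheet data (arrays not reproduced here):
# shapiro_summary(brand_a)   # W should be close to 0.972 for the 16 Brand A lifetimes
# shapiro_summary(brand_b)   # W should be close to 0.971 for the 17 Brand B lifetimes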

Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be
removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-
distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire
process is being used, a representative sample is not being collected. Normally-distributed results would
not appear normally distributed if a representative sample of the entire process is not collected.

When Data Are Not Normally Distributed


When normality of data cannot be confirmed for a small sample, it is necessary to substitute a
nonparametric test for a t-Test. Nonparametric tests do not have the same normality requirement that the
t-Test does. The most common nonparametric test that can be substituted for the two-independent-
sample t-Test when data normality cannot be confirmed is the Mann-Whitney U Test.
The Mann-Whitney U Test is performed on the data in this example at the end of this section.
Nonparametric tests are generally less powerful (less able to detect a difference) than parametric tests.
The parametric two-independent-sample, one-tailed t-Test performed here does detect a difference at
alpha = 0.05. The nonparametric Mann-Whitney U Test conducted at the end of this section on the same
data did not detect a difference at alpha = 0.05.
The required questions have been satisfactorily answered. We will however perform the t-Test to
demonstrate how a two-independent-sample, Pooled t-Test is done. We now proceed to complete the
four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical t Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create Null and Alternate Hypotheses
The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average battery lifetimes for both sample groups are the
same. We will use the variable x_bar1-x_bar2 to represent the difference between the means of the two
groups. If the mean lifetimes for both groups are the same, then the difference between the two means,
x_bar1-x_bar2, would equal zero. The Null Hypothesis is as follows:
H0: x_bar1-x_bar2 = Constant = 0
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether the mean of the population from which the
first sample (x_bar1) was taken is greater than the mean of the population from which the second sample
(x_bar2) was taken. The Alternate Hypothesis is as follows:
H1: x_bar1-x_bar2 > Constant, which is 0
H1: x_bar1-x_bar2 > 0
The Alternative Hypothesis is directional (“greater than” or “less than” instead of “not equal”) and the
hypothesis test is therefore a one-tailed test. The “greater than” operator in the Alternative Hypothesis
indicates that this one-tailed test occurs in the right tail. It should be noted that a two-tailed test is more
rigorous (it requires a greater difference between the two entities being compared before the test shows
that there is a difference) than a one-tailed test.
The following formulas are used by the Two-Independent Sample, Pooled t-Test:

Pooled Degrees of Freedom


df = degrees of freedom = n1 + n2 - 2
df = 16 + 17 – 2 = 31

Pooled Sample Standard Deviation

sPooled = SQRT[{(n1-1)*s1^2 + (n2-1)*s2^2}/df]
sPooled = SQRT[{(16-1)*(16.915)^2 + (17-1)*(15.277)^2}/31]
sPooled = 16.09

Pooled Sample Standard Error

SEPooled = sPooled *SQRT(1/n1 + 1/n2)


SEPooled = 16.09 * SQRT(1/16 + 1/17)
SEPooled = 5.6046
Note that this calculation of the Standard Error using the sample variance, s^2, is an estimate of the true
Standard Error, which would be calculated using the population variance, σ^2, of the populations from
which the samples were drawn.
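The pooled calculations above are easy to audit with a few lines of Python. This sketch uses only the summary statistics already quoted in this example.

import math

n1, n2 = 16, 17
s1, s2 = 16.915, 15.277            # sample standard deviations quoted above

df = n1 + n2 - 2                                                     # 31
s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)     # about 16.09
se_pooled = s_pooled * math.sqrt(1 / n1 + 1 / n2)                    # about 5.60

print(df, round(s_pooled, 2), round(se_pooled, 3))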
These parameters are used to map the distributed variable, x_bar1-x_bar2, to the t Distribution as follows:

Step 2 – Map Distributed Variable on a t-Distribution Curve


A t-Test can be performed if the sample mean and the Test Statistic (the t Value) are distributed
according to the t Distribution. If the sample has passed a normality test, the sample mean and the closely
related Test Statistic are distributed according to the t Distribution.
The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies
only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its
degrees of freedom. For a one-sample t-Test the degrees of freedom equal n – 1; for this two-independent-
sample, Pooled t-Test the degrees of freedom equal n1 + n2 – 2.
The means of similar, random samples taken from a normal population are distributed according to the t
Distribution. The sample mean and the Test Statistic are therefore both distributed according to the t
Distribution with the above degrees of freedom if the samples or populations are shown to be normally
distributed. This step maps the distributed variable, x_bar1-x_bar2, to a t Distribution curve with degrees of
freedom equal to n1 + n2 – 2 = 31.

The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the difference between the sample means, x_bar1-x_bar2.

The variable x_bar1-x_bar2 is distributed according to the t Distribution. Mapping this distributed variable
to a t Distribution curve is shown as follows:

This non-standardized t Distribution curve is constructed from the following parameters:


Mean = 0, which is the Constant taken from the Null Hypothesis
Standard Error Pooled = 5.604
Degrees of Freedom = 31
Distributed Variable = x_bar1-x_bar2

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bar1 is from x_bar2 by showing how different x_bar1-x_bar2 (10.033) is
from zero.
The non-standardized t Distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.
If the difference between the sample means, x_bar1-x_bar2 (10.033), falls into a Region of Rejection, the
Null Hypothesis is rejected. If the difference between the sample means, x_bar1-x_bar2 (10.033), falls into
a Region of Acceptance, the Null Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.

This 5 percent Alpha (Region of Rejection) is entirely contained in the outer right tail. The operator in the
Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed,
which outer tail. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 > 0
A “greater than” or “less than” operator indicates that this will be a one-tailed test. The “greater than” sign
indicates that the Region of Rejection will be in the right tail.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.

Calculate Critical Values

One-Tailed Critical Values


A Critical Value is the boundary between a Region of Acceptance and a Region of Rejection. The entire
5-percent alpha region lies beyond the Critical Value because this is a one-tailed test. The Critical Value
can be found as follows:
Critical Value = Mean + (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Value = Mean + T.INV(1-α,df) * SE
Critical Value = 0 + T.INV(1-0.05, 31) * 5.604
Critical Value = 0 + 9.503
Critical Value = +9.503
The Region of Rejection therefore includes everything that is to the right of +9.503.
The distribution curve with the blue Region of Acceptance and the yellow Region of Rejection is shown
as follows:

If this were a two-tailed test, the Critical values would be determined as follows:

Two-Tailed Critical Values


Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± T.INV(1-α/2,df) * SE
Critical Values = 0 ± T.INV(0.975, 31) * 5.604
Critical Values = 0 ± 11.43
Critical Values = -11.43 and +11.43
The Critical Values for the two-tailed test (-11.43 and +11.43) are farther from the mean than the Critical
Value for one-tailed test (+9.503). This means that a two-tailed test is more stringent than a one-tailed
test.
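Both sets of Critical Values can be cross-checked outside Excel with scipy.stats.t.ppf, which plays the role of T.INV. This is a minimal sketch using the values already established in this example.

from scipy.stats import t

alpha, df, se, mean = 0.05, 31, 5.604, 0.0

one_tailed_cv = mean + t.ppf(1 - alpha, df) * se        # about +9.50
two_tailed_offset = t.ppf(1 - alpha / 2, df) * se       # about 11.43

print(round(one_tailed_cv, 2),
      (round(mean - two_tailed_offset, 2), round(mean + two_tailed_offset, 2)))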

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are
three equivalent tests that determine this. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:

1) Compare the Sample Mean x_bar1-x_bar2 With Critical Value


Reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2 = 10.033, falls into the Region of Rejection.
Do not reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2 = 10.033, falls into the Region of
Acceptance.
Equivalently, reject the Null Hypothesis if x_bar1-x_bar2 = 10.033 is further from the curve’s
mean of 0 than the Critical Value.
The Critical Value has been calculated to be 9.503. The observed x_bar1-x_bar2 (10.033) is further from
the curve mean (0) than the Critical Value (9.503). The Null Hypothesis is therefore rejected.

2) Compare t Value With Critical t Value


The t Value is the number of Standard Errors that x_bar1-x_bar2 (10.033) is from the curve’s mean of 0.
The Critical t Value is the number of Standard Errors that the Critical Value (9.503) is from the curve’s
mean.
Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t
Value.

t Value (test statistic) = (x_bar1 - x_bar2 - 0) / SE


t Value (test statistic) = (10.033)/5.604 = 1.790

One-Tailed (Right Tail) Critical t Value = T.INV(1-α,df)
One-Tailed (Right Tail) Critical t Value = T.INV(1-0.05, 31) = 1.696
This indicates that the Critical Value is 1.696 Standard Errors to the right of the Constant, which is 0.

The t Value (1.790) is farther from the standardized mean of zero than the Critical t Value (1.696) so the
Null Hypothesis is rejected.

3) Compare the p Value With Alpha


The p Value is the percent of the curve that is beyond x_bar1-x_bar2 (10.033). If the p Value is smaller
than Alpha, the Null Hypothesis is rejected.
p Value = T.DIST.RT(ABS(t Value), df)
p Value = T.DIST.RT(ABS(1.790), 31)
p Value = 0.042
The p Value (0.042) is smaller than Alpha (0.05) and we therefore reject the Null Hypothesis. A graph
below shows that the red p Value (the curve area beyond x_bar1-x_bar2) is smaller than the yellow Alpha,
which is the 5 percent Region of Rejection in the outer right tail. This is shown in the following Excel-
generated graph of this non-standardized t Distribution curve:

The value of x_bar1-x_bar2, 10.033, has a t Value of 1.79 and is therefore 1.79 Standard Errors from the mean.
This is further from the mean than the Critical Value of 9.503, which is the Critical t Value distance of 1.696
Standard Errors from the mean.

It should be noted that if this t-Test were a two-tailed test, which is more stringent than a one-tailed test,
the Null Hypothesis would not be rejected because:
1) The p Value (0.042) would now be larger than Alpha/2 (0.025)
2) x_bar1-x_bar2 (10.033) would now be in the Region of Acceptance, which would now have its outer
right boundary at 11.43 (mean + T.INV(1-α/2,df)*SE)

Excel Data Analysis Tool Shortcut


This two-independent-sample, Pooled t-Test can be solved much quicker using the following Excel data
analysis tool:
t-Test: Two-Sample Assuming Equal Variances
Before this test is employed, all required assumptions such as normality of data must be verified as was
done.
The two-independent-sample, Pooled t-Test can be quickly solved in Excel using either the Data Analysis
tool or the formula that is specific to this test. The Excel tool can be found by clicking Data
Analysis under the Data tab. The tool is titled t-Test: Two-Sample Assuming Equal Variances. The
entire Data Analysis ToolPak is an add-in that ships with Excel but must first be activated by the user
before it is available. This tool will be applied to the following data set, which is the same data as in the
preceding example in this section.
As mentioned, the data should be input with the sample having the larger mean designated as the first
sample group (Variable 1). Doing so ensures that the Excel output will have the same signs for the t Value and Critical
t Value. If the sample with the smaller mean is input as the first sample, the t Value will correctly be
negative but the Critical t Value will be incorrectly listed by Excel as positive.

Following are screen shots of how the data should be entered:

The completed dialogue box for this tool is shown as follows:

Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.

The calculations to create the preceding output were performed as follows. The individual outputs are
color-coded so it is straightforward to match the calculations with the outputs of the tool.

Excel Statistical Function Shortcut
Another very quick way to perform this t-Test is to calculate the p Value and compare it to Alpha (for a
one-tailed test) or Alpha/2 (for a two-tailed test).
The p Value of this two-independent-sample, Pooled t-Test can be calculated very quickly using the following Excel
statistical function:
=T.TEST(array1,array2,1,2)
Before this test is employed, all required assumptions such as normality of data must be verified as was
done.
The stand-alone Excel formula to perform a two-independent sample, pooled t-Test is shown as follows. If
the resulting p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test, the difference
between the means of the samples is deemed to be statistically significant. This indicates that the two
samples were likely drawn from different populations.
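An equivalent cross-check outside Excel is scipy.stats.ttest_ind with equal_var=True, which performs the same pooled two-independent-sample t-Test. The one-sided alternative argument shown below is available in newer SciPy releases; with older releases, halve the two-sided p Value instead (when the observed difference is in the hypothesized direction). The brand_a and brand_b names are placeholders because the data arrays are not reproduced here.

from scipy import stats

def pooled_t_test(sample1, sample2, alpha=0.05):
    """Right-tailed, pooled two-independent-sample t-Test (H1: mean1 > mean2)."""
    t_stat, p_one_tailed = stats.ttest_ind(sample1, sample2,
                                           equal_var=True, alternative="greater")
    return t_stat, p_one_tailed, p_one_tailed < alpha

# Usage with the worksheet data (arrays not reproduced here):
# pooled_t_test(brand_a, brand_b)   # expect t near 1.79 and a one-tailed p near 0.042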

The Null Hypothesis of the t-Test would not be rejected if the test were two-tailed because the p Value
(0.042) is greater than Alpha/2 (0.025). The Null Hypothesis of the t-Test would be rejected if the test were
one-tailed because the p Value (0.042) is less than Alpha (0.05). A one-tailed test is less stringent than a
two-tailed test.

Effect Size in Excel
Effect size in a t-Test is a convention of expressing how large the difference between two groups is
without taking into account the sample size and whether that difference is significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.”
A large effect would be a difference between two groups that is easily noticeable with the measuring
equipment available. A small effect would be a difference between two groups that is not easily noticed.
Effect size for a two-independent-sample, pooled t-Test is a method of expressing the distance between
the difference between the sample means, x_bar1-x_bar2, and the Constant in a standardized form that does
not depend on the sample size.
Remember that the Test Statistic (the t Value) for a two-independent-sample t-Test is calculated by the
following formula:

t Value = (x_bar1 - x_bar2 - Constant) / SEPooled

which equals

t Value = (x_bar1 - x_bar2 - Constant) / [sPooled * SQRT(1/n1 + 1/n2)]

since

SEPooled = sPooled * SQRT(1/n1 + 1/n2)
Since degrees of freedom for a two-independent-sample, pooled t-Test equals the following:
df = n1 + n2 – 2

The t Value specifies the number of Standard Errors that the difference between sample means, x_bar1-
x_bar2, is from the Constant. The t Value determines whether the test has achieved statistical significance
and is dependent upon the sample sizes. Achieving statistical significance means that the Null Hypothesis
(H0: x_bar1-x_bar2 = Constant = 0) has been rejected.
The t Value for a two-independent-sample, pooled t-Test is calculated as follows:

t Value = (x_bar1 - x_bar2 - Constant) / SEPooled

The Effect Size, d, for a two-independent-sample, pooled t-Test is a very similar measure that does not
directly depend on sample size and has the following formula:

d = |x_bar1 - x_bar2 - Constant| / sPooled

spooled pools the sample standard deviations based upon the proportion of combined samples that each of
the sample sizes n1 and n2 represent and not the absolute values of n1 and n2. spooled is therefore not
directly dependent on sample sizes n1 and n2.
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated by the following
formula:

d = t Value * SQRT(1/n1 + 1/n2)

The d measured here is Cohen’s d for a two-independent-sample, pooled t-Test. The Effect Size is a
standardized measure of size of the difference that the t-Test is attempting to detect. The Effect Size for a
two-independent-sample, pooled t-Test is a measure of that difference in terms of the number of sample
standard deviations. Note that sample size has no effect on Effect Size. Effect size values for the two-
independent-sample, pooled t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bar1 - x_bar2 – Constant| / spooled = |43.56 – 33.53 – 0| / 16.09 = 0.623
An Effect Size of d = 0.623 is considered to be a medium effect.
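The Cohen's d computation is easy to script from the summary statistics already quoted in this example. A minimal sketch:

import math

x_bar1, x_bar2, constant = 43.56, 33.53, 0.0
n1, n2 = 16, 17
s1, s2 = 16.915, 15.277

s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = abs(x_bar1 - x_bar2 - constant) / s_pooled       # about 0.62, a medium effect

# Equivalent shortcut from the t Value:
t_value = 1.790
d_from_t = t_value * math.sqrt(1 / n1 + 1 / n2)      # also about 0.62

print(round(d, 3), round(d_from_t, 3))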

Power of the Test With Free Utility G*Power


The Power of a two-independent-sample, pooled t-Test is a measure of the test’s ability to detect a
difference given the following parameters:
Alpha (α)
Effect Size (d)
Sample Sizes (n1 and n2)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type 2 Error. A Type 2 Error can be described as a False Negative. A False
Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
The four variables that are required in order to determine the Power for a two-independent-sample t-Test are Alpha
(α), Effect Size (d), Sample Sizes (n1 and n2), and the Number of Tails. Typically Alpha, Effect Size, and
the Number of Tails are held constant while the sample sizes are varied (usually increased) to achieve the
desired Power for the statistical test.
Manual calculation of a test’s Power given Alpha, Effect Size, Sample Size, and the Number of Tails is
quite tedious. Fortunately there are a number of free utilities online that will readily calculate a test’s
statistical Power. A widely-used Power calculation utility called G*Power is available for download
from the Institute of Experimental Psychology at the University of Dusseldorf at this link:

http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a
graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for this two-independent-sample
t-Test are Alpha (α), Effect Size (d), Sample Sizes (n1 and n2), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:

Test family: t-Tests
Statistical test: Means: Difference between two independent means (two groups)
Type of power analysis: Post hoc – Compute achieved power –given α, sample size, and effect size
Number of Tails = 1
Effect Size (d) = 0.623
Alpha (α) = 0.05
Sample Sizes (n1 = 16 and n2 = 17)
The completed dialogue screen appears as follows:

Clicking Calculate would produce the following output:

The Power achieved for this test is 0.5416. This means that the current one-tailed test has a 54.16
percent chance of detecting a difference that has an effect size of 0.623 if α = 0.05, n1 = 16, and n2 = 17.

It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and
Alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw
Plot on the next screen that comes up. This will produce the following output:

This would indicate that a Power of 80 percent would be achieved for this test if the total sample size
were equal to approximately n1 + n2 = 65.
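If G*Power is not available, a comparable post-hoc power and sample-size calculation can be scripted with the statsmodels Python package. This is a hedged sketch, not the book's method: the exact figures may differ slightly from G*Power's because of implementation details, but they should land close to the 0.5416 power and the roughly 65-observation total reported above.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Post-hoc power for d = 0.623, n1 = 16, n2 = 17, one-tailed, alpha = 0.05.
power = analysis.power(effect_size=0.623, nobs1=16, ratio=17 / 16,
                       alpha=0.05, alternative="larger")
print(round(power, 3))          # expected to be close to G*Power's 0.54

# Sample size per group (equal group sizes) needed to reach 80 percent power.
n_needed = analysis.solve_power(effect_size=0.623, power=0.80,
                                alpha=0.05, alternative="larger")
print(round(n_needed))          # expected to be roughly 33 per group, i.e. about 65 total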

Nonparametric Alternatives in Excel
The Mann-Whitney U Test is a nonparametric test that can be substituted for the two-sample t-Test (both
pooled or unpooled) when the following circumstances occur:
1) Normality of at least one sample or one population cannot be verified and sample size is small.
2) The data is ordinal. A t-Test requires that the data be either ratio or interval but not ordinal. The Mann-
Whitney U Test requires only the data be at least ordinal so that all of the data can be ranked. The
specific difference between data values does not have to be measurable.
3) Either one of the sample groups has significant outliers. The Mann-Whitney U Test is based upon the
rankings of data values and is therefore much less affected by outliers than a t-Test, which is based on
sample means.
The two-independent-sample t-Test compares the means of the two samples to determine if the means of
the two populations are significantly different. The two populations are those from which the two samples
were taken.

Mann-Whitney U Test in Excel


The Mann-Whitney U Test performs a similar evaluation by comparing the ranks of one sample group to
the average ranks of both sample groups to determine if the ranks of each of the two populations are
significantly different. The two populations are those from which the two samples were taken.
Basics of the Mann-Whitney U Test
The Mann-Whitney U Test is based upon the adjusted sum of rankings of values in each of the two
samples. All of the data from both sample groups is combined into one group and ranked. The data,
along with their rankings are then returned to their original groups. The rankings of each group are
summed up separately and then adjusted. The smaller of the two adjusted sums is designated as the
Test Statistic U.
If the required assumptions of the Mann-Whitney U Test are met, the Test Statistic is approximately
normally distributed. A z Score is calculated based upon the Test Statistic and then compared to a Critical
z Value based upon the specified alpha and number of tails in the test.
The Null Hypothesis is rejected or not rejected based upon whether the z Score is further from the
standardized normal mean of zero than the Critical z Value.
The Null Hypothesis of the Mann-Whitney U Test is similar to the Null Hypothesis of a two-independent
sample t-Test. The t-test would have the following Null Hypothesis:
H0: x_bar1 – x_bar2 = Constant
The Mann-Whitney U Test would have the following Null Hypothesis:
H0: Sum_of_Ranks1 – Sum_of_Ranks2 = 0
If the Null Hypothesis cannot be rejected, the sample groups are not considered to be different at the
specified level of significance. If this Null Hypothesis can be rejected, the sample groups are considered
to be different.
The Mann-Whitney U Test can be performed as a one or two-tailed test. As with most hypothesis tests,
the operator in the Alternative Hypothesis determines whether the test is one or two-tailed.
A non-directional operator (a “not equal” sign) in the Alternative Hypothesis indicates a two-tailed test. If
the Mann-Whitney U Test is to be performed as a one-tailed test, it will always be performed in the left tail
regardless of which sample is expected to have the largest rank sum. The Test Statistic U for the Mann-
Whitney U Test is always based upon the sample with the lowest rank sum. This will be discussed further
in this section.
Required Assumptions of the Mann-Whitney U Test

1) The data are at least ordinal so that the data can be ranked. Differences between sample data points
do not have to be measurable.
2) All data observations are independent of each other.
3) The sum of sample sizes, n1 + n2, equals at least 20.
4) Both samples have similar distribution shapes. A histogram of each sample will display the shape of
the data’s distribution.
If these assumptions are met, the Test Statistic U will have an approximately normal distribution. The
Mann-Whitney U test is based upon the Test Statistic U being approximately normally distributed. Test
Statistic U is the sum of the ranks of the data in one of the two samples.
Following are the data from the two data samples that will be compared in this Mann-Whitney U Test:

Step 1 – Evaluate Whether the Required Assumptions Are Met
The required assumptions for the Mann-Whitney U Test are as follows:
1) The data are at least ordinal so that the data can be ranked. Differences between sample data points
do not have to be measurable.
2) All data observations are independent of each other.
3) The sum of sample sizes, n1 + n2, equals at least 20.
4) Both samples have similar distribution shapes. A histogram of each sample will display the shape of
the data’s distribution.
The first three assumptions have clearly been met. Histograms of each sample group have to be created
to determine if both samples have similar distribution shapes. The following Excel histograms show that
the sample groups have reasonably similar distribution shapes:

Step 2 – Create the Null and Alternative Hypotheses
The purpose of the original t-Test was to determine with 95 percent certainty whether Brand A batteries
have a longer average lifetime than Brand B batteries. The Null and Alternative Hypotheses for this t-Test
are the following:
H0: x_bar1-x_bar2 = 0
H1: x_bar1-x_bar2 > 0
The “greater than” operator in the Alternative Hypothesis indicates that this is a one-tailed test in the right
tail.
The two-independent-sample t-Test compares the means of the two samples to determine if the means of
the two populations are significantly different. The two populations are those from which the two samples
were taken.
The Mann-Whitney U Test performs a similar evaluation by comparing the ranks of one sample group to
the average ranks of both sample groups to determine if the ranks of each of the two populations are
significantly different. The two populations are those from which the two samples were taken.

Just as with a two-independent-sample t-Test, the Mann-Whitney U Test can be performed as a two-
tailed test or as a one-tailed test. The Alternative Hypothesis specifies which tail(s) the test will be
focused on.
The Null Hypothesis for this Mann-Whitney U Test is as follows:
H0: U = Uaverage
There is one notable difference between a one-tailed t-Test and a one-tailed Mann-Whitney U Test. A
one-tailed t-Test can be performed in either the left or the right tail. A one-tailed Mann-Whitney U Test will always be
performed in the left tail regardless of which sample is expected to have the larger rank sum.
The Alternative Hypothesis for this one-tailed test in the left tail is the following:
H1: U < Uaverage
The reason that a one-tailed Mann-Whitney U Test is always performed in the left tail is that the Test Statistic
U is always less than Uaverage (which is the average of U1 and U2) because Test Statistic U is set to equal the
smaller of the two adjusted sums of ranks, U1 and U2, for the two groups.
Uaverage = (U1 + U2)/2
It should be noted that the one-tailed t-Test was performed in the right tail. This one-tailed Mann-Whitney
U Test is performed in the left tail.
Test Statistic U is calculated in the following steps.

Step 3 – Combine All of the Data Into a Single Column
Make sure that each data point has its group name in an adjacent cell. This will be necessary to return
the data back to the original groups.

Step 4 – Sort All of the Data

Step 5 – Rank All of the Data
Ties (data that have the same values) are assigned the rank that is the average rank for all of the tied
values. For example, the two tied data values of 26 would have been assigned the ranks of 9 and 10 if
they were not tied. Since they are tied, they are both assigned the average rank of 9.5.

Step 6 – Return the Data to the Original Two Groups
Sort all three columns simultaneously according to the column that contains the name of the original group
to which each data value belongs.

Step 7 – Calculate R and n For Each Sample Group
R equals the sum of the ranks for each group and n is the sample size of each group.

Step 8 – Calculate U1 and U2
U1 and U2 are adjusted rank sums for the two groups.
U1 = R1 – n1(n1 + 1)/2
U1 = 314 – 16(16 + 1)/2 = 178
U2 = R2 – n2(n2 + 1)/2
U2 = 247 – 17(17 + 1)/2 = 94

Step 9 – Set Test Statistic U to the Smaller of U1 or U2


U1 = 178
U2 = 94
Test Statistic U = 94

Step 10 – Calculate the Mean and Standard Deviation of U
U_bar = (U1 + U2) / 2 = n1*n2 / 2 = 136
sU = SQRT( n1 * n2 * (n1 + n2+ 1) / 12) = 27.76

Step 11 – Calculate the z Score and Critical z Value


It should be noted that the one-tailed t-Test was performed in the right tail. This one-tailed Mann-Whitney
U Test is performed in the left tail because Test Statistic U is always set to equal the smaller of U1 and U2.
z Score = (U – U_bar)/ sU
z Score = (94 – 136)/27.76 = -1.5130
Critical z Value (α = 0.05, one-tailed, left tail) = NORM.S.INV(α)
Critical z Value (α = 0.05, one-tailed, left tail) = NORM.S.INV(0.05) = -1.6448
If this were a two-tailed test, the Critical z Values would be calculated as follows:
Critical z Values (α = 0.05, two-tailed) = ±NORM.S.INV(1 – α/2) = ±1.9599
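The ranking arithmetic in Steps 3 through 11 can be scripted directly, which is a convenient way to check the tie handling. The sketch below follows the same steps using scipy.stats.rankdata (which assigns average ranks to ties) and then forms the z Score. scipy.stats.mannwhitneyu could be used as a further cross-check, though its p Value may differ slightly because it can apply tie and continuity corrections. The brand_a and brand_b names are placeholders for the worksheet arrays.

import numpy as np
from scipy.stats import rankdata

def mann_whitney_z(sample1, sample2):
    """U statistic and z Score computed the same way as the worksheet steps."""
    n1, n2 = len(sample1), len(sample2)
    ranks = rankdata(np.concatenate([sample1, sample2]))   # average ranks assigned to ties
    r1, r2 = ranks[:n1].sum(), ranks[n1:].sum()
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    u = min(u1, u2)
    u_bar = n1 * n2 / 2
    s_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - u_bar) / s_u

# Usage with the worksheet data (arrays not reproduced here):
# mann_whitney_z(brand_a, brand_b)   # expect U = 94 and a z Score close to -1.51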

Step 12 – Determine Whether or Not to Reject the Null Hypothesis by Comparing the z Score to the
Critical z Value
The Null Hypothesis is rejected if the z Score is farther from the standardized normal distribution’s mean
of zero than the Critical z Value. This is not the case here because the z Score (-1.5130) is closer to the
standardized mean of zero than the Critical z Value (-1.6448). There is not enough evidence to reject the
Null Hypothesis at an alpha level of 0.05.
This one-tailed, left tail, Mann-Whitney U Test was not sensitive enough to detect a difference at α = 0.05.
The Null Hypothesis, which states that the adjusted rank sum of one of the groups is not different than the
average adjusted rank sum of both groups, is not rejected. The rankings of the data in each group are not
found to be significantly different at an alpha level of 0.05. The two populations from which the samples
were taken are not assumed to have different rankings. This one-tailed Mann-Whitney U Test did not
detect a difference in the two populations based on the two samples taken from the populations. This
information is shown in the following Excel-generated graph:

The equivalent one-tailed, right tail, two-independent-sample t-Test was sensitive enough to detect a
difference at α = 0.05. The t value (1.79) was further from the standardized mean of zero than the Critical
t Value (1.69). The Null Hypothesis of this t-Test, which states that the means of both populations are not
different, is rejected. This one-tailed t-Test did detect a difference in the two populations based on the two
samples taken from the populations. This information is shown in the following Excel-generated graph:

You may have noticed that the p Value (red region in the chart) appears in the left tail in the Mann-
Whitney U Test but appears in the right tail in the t-Test graph directly above.
It should be noted that the Mann-Whitney U Test always has its p Value (red region in the graph) in its
left tail. This is due to the negative z Score. The z Score calculated in the Mann-Whitney U Test will
always be negative because Test Statistic U is always set to equal the smaller of U1 and U2.
The formula for this z Score is the following:
z Score = (U – U_bar)/ sU
This z Score is negative because U is always less than U_bar.
The p Value for a t-Test can appear in the right or left tail because the t Value of a t-Test can be positive or
negative. The t Value in this t-Test is positive because the formula for the t Value is the following:

t Value = (x_bar1 - x_bar2 - Constant) / SEPooled

x_bar1 = 43.56
x_bar2 = 33.53
Constant = 0
This t Value is positive because x_bar1 - x_bar2 - Constant is positive.
Nonparametric tests generally have less power (ability to detect a difference) than their parametric
equivalents. One way to increase the likelihood that a nonparametric test will detect a difference is to
increase alpha. Increasing alpha decreases the required level of certainty because of the following
relationship:
Alpha = 1 – Level of Required Certainty
If alpha were doubled from a value of α = 0.05 to α = 0.10, the Critical z Value would change from a value of
-1.6448 to -1.2816. The Mann-Whitney U Test would then have detected a difference because the z Score
(-1.5130) is farther from zero than -1.2816.

How Sample Standard Deviation Affects t-Test Results
When the standard deviation in the sample groups is increased, the sample groups become harder to tell apart. This
might be more intuitive to understand if presented visually.
Below are box plots of three sample groups each having a small sample standard deviation:

Each of the sample groups is visually easy to differentiate from the others. The measures of spread -
standard deviation and variance - are shown for each sample group. Remember that variance equals
standard deviation squared.
If each sample group’s spread is increased (widened), the sample groups become much harder to
differentiate from each other. The graph shown below is of three sample groups having the same means
as above but much wider spread.

It is easy to differentiate the sample groups in the top graph but much less easy to differentiate the
sample groups in the bottom graph simply because the sample groups in the bottom graph have much
wider spread.
In statistical terms, one could say that it is easy to tell that the samples in the top graph were drawn from
different populations. It is much more difficult to say whether the sample groups in the bottom graph were
drawn from different populations.
Relationship Between the Two-Independent-Sample, Pooled t-Test and Single-Factor ANOVA
The preceding illustrates the underlying principle behind both t-tests and ANOVA tests. One of the main
purposes of both t-tests and ANOVA tests is to determine whether samples are from the same
populations or from different populations. The variance (or equivalently, the standard deviation) of the
sample groups is what determines how difficult it is to tell the sample groups apart.
The two-independent-sample, pooled t-test is essentially the same test as single-factor ANOVA. The two-
independent-sample, pooled t-test can only be applied to two sample groups at one time. Single-Factor
ANOVA can be applied to three or more groups at one time. Both two-independent-sample, pooled t-test
and single-factor ANOVA require that variances of sample groups be similar.
We will apply both the two-independent sample t-test and single-factor ANOVA to the first two samples in
each of the above graphs to verify that the results are equivalent.

Sample Groups With Small Variances (the first graph)

Applying a two-independent-sample, pooled t-test to the first two of the three sample groups of this graph
would produce the following result:

This result would have been obtained by filling in the Excel dialogue box as follows:

Running Single-Factor ANOVA on those same two sample groups would produce this result:

This result would have been obtained by filling in the Excel dialogue box as follows:

Both the Two-Independent-Sample, Pooled t-test and the Single-Factor ANOVA test produce the same
result when applied to these two sample groups. They both produce the same p Value (1.51E-10) which
is extremely small. This indicates that the result is statistically significant and that the difference in the
means of the two groups is real. More correctly put, it can be stated that there is a very small chance
(1.51E-10) that the samples came from the same population and that the result obtained (that their
means are different) was merely a random occurrence.

Sample Groups With Large Variances (the second graph)

Applying a two-independent sample t-test to the first two of the three sample groups in this graph would
produce the following result:

This result would have been obtained by filling in the Excel dialogue box as follows:

Running Single-Factor ANOVA on those same two sample groups would produce this result:

This result would have been obtained by filling in the Excel dialogue box as follows:

Both the t-test and the ANOVA test produce the same result when applied to these two sample groups.
They both produce the same p Value (0.230876). This is relatively large. 95 percent is the standard level
of confidence usually required in statistical hypothesis tests to conclude that the results are statistically
significant (real). The p value needs to be less than 0.05 to achieve a 95 percent confidence level that a
difference really exists. The sample groups with the large spread produced a p Value greater than 0.05
and we can therefore not reject the Null Hypothesis which states that the sample groups are the same.
The results are not statistically significant and we cannot conclude that the two samples were not drawn
from the same population.

Showing How the Formulas For Both the t-Test and for ANOVA Produce the Same Result
t-Test Formula
The Two-Independent-Sample, Pooled t-Test is used to determine with a specific degree of certainty
whether there really is a difference between the mean values of two sample groups given a similar
amount of variance in each of the two sample groups.
If the sample standard deviation in each of the two sample groups, s1 and s2, is large, then the Pooled
Standard Deviation will also be large, as can be seen from the following equation:

Pooled Sample Standard Deviation


sPooled = SQRT[{(n1-1)*s1^2 + (n2-1)*s2^2}/df]
This, in turn, increases the length of one Standard Error, SEPooled, as can be seen in the
following equation:

Pooled Sample Standard Error


SEPooled = sPooled *SQRT(1/n1 + 1/n2)
This, in turn, decreases the t Value, as can be seen in the following equation:
t Value = (x_bar1-x_bar2) / SEPooled
The larger the t Value, the more likely it is that the sample groups are different, i.e., came from different
populations.
The bottom line is that increased variance (or, equivalently, standard deviation) in the sample groups
causes the t Value to be smaller. This makes it less likely that a t-Test will show that the sample groups
are really different.

ANOVA
The ANOVA outputs of the previous two comparisons demonstrate the following:
The smaller the p Value is, the more certainty exists that sample groups are really different, i.e., that the
sample groups came from different populations.
The p Value is derived from the F value. The larger the F Value, the smaller is the p Value.
The F value can be roughly described as being the variation between groups divided by the variation
within groups (the spread of the groups).
As the spread (standard deviation) of the sample groups increases, the F Value becomes smaller. When the
F Value becomes smaller, the p Value becomes larger. The larger the p Value becomes, the less certainty
exists that the ANOVA results are statistically significant (real). If the results are not statistically
significant, we cannot reject the Null Hypothesis, which states that the sample groups are the same (drawn
from the same population).
Bottom line: the larger the standard deviation of the sample groups being compared with a two-independent-
sample, pooled t-Test or single-factor ANOVA, the harder it is to state that the sample groups are truly
different, i.e., that the sample groups come from different populations.
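The equivalence can also be checked numerically: for two groups, the single-factor ANOVA F statistic equals the square of the pooled t statistic, and the two-tailed p Values match. A minimal sketch with scipy (using two small illustrative groups rather than the book's data):

from scipy import stats

def t_vs_anova(group1, group2):
    t_stat, p_t = stats.ttest_ind(group1, group2, equal_var=True)   # two-tailed, pooled t-Test
    f_stat, p_f = stats.f_oneway(group1, group2)                    # single-factor ANOVA
    return t_stat**2, f_stat, p_t, p_f                              # t^2 equals F, p Values equal

print(t_vs_anova([10.0, 12.0, 11.0, 13.0], [14.0, 15.0, 13.5, 16.0]))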

3) Two-Independent-Sample, Unpooled t-Test in Excel

Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the
two sample means (x_bar1 and x_bar2) is equal to (two-tailed test) or else greater than or less than (one-
tailed test) a constant. This is an unpooled test because a single pooled standard deviation
CANNOT replace both sample standard deviations because they are too different.
x_bar1 - x_bar2 = Observed difference between the sample means

Unpooled t-Tests are performed if the variances of both sample groups are not similar. A rule-of-thumb is
as follows: A Pooled t-Test should only be performed if the standard deviation of one sample, s 1, is no
more than twice as large as the standard deviation in the other sample s 2. An unpooled t-Test should be
performed if that condition is not met.
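In Python, the unpooled (Welch) version of the test is obtained from scipy.stats.ttest_ind by setting equal_var=False, which uses the separate sample variances and the Welch-Satterthwaite degrees of freedom instead of a pooled estimate. A minimal sketch; shift_a and shift_b are placeholder names for the two samples introduced in the example that follows.

from scipy import stats

def unpooled_t_test(sample1, sample2):
    """Two-tailed, unpooled (Welch) two-independent-sample t-Test."""
    return stats.ttest_ind(sample1, sample2, equal_var=False)

# Usage with the shift data from the example below (arrays not reproduced here):
# unpooled_t_test(shift_a, shift_b)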
Null Hypothesis H0: x_bar1 - x_bar2 = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar1 - x_bar2 is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of 2-Sample, 2-Tailed, Unpooled t-Test in Excel
This problem is very similar to the problem solved in the z-test section for a two-independent-sample, two-
tailed z-test. Similar problems were used in each of these sections to show the similarities and also
contrast the differences between the two-independent-sample z-Test and t-test as easily as possible.
Two shifts on a production line are being compared to determine if there is a difference in the average daily
number of units produced by each shift. The two shifts operate eight hours per day under nearly identical
conditions that remain fairly constant from day to day. A sample of the total number of units produced by
each shift on a random selection of days is taken. Determine with a 95 percent Level of Confidence whether
there is a difference between the average daily number of units produced by the two shifts.
The sampled data are as follows:

Running the Excel data analysis tool Descriptive Statistics separately on each sample group produces the
following output:

Note that when performing two-sample t-Tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the Unpooled t-Test will be more intuitive if the sample group with the larger mean is
designated as the first sample and the sample group with the smaller mean is designated as the second
sample.
Another reason for designating the sample group with the larger mean as the first sample is to obtain the
correct result from the Excel data analysis tool t-Test: Two-Sample Assuming Unequal Variances. The
test statistic (t Stat in the Excel output) and the Critical t Value (t Critical two-tail in the Excel output) will
have the same sign (as they always should) only if the sample group with the larger mean is designated as
the first sample.

Summary of Problem Information


Sample Group 1 – Shift A (Variable 1)
x_bar1 = sample1 mean = AVERAGE() = 46.55
µ1 (Greek letter “mu”) = population mean from which Sample 1 was drawn = Not Known
s1 = sample1 standard deviation =STDEV.S() = 24.78
Var1 = sample1 variance =VAR() = 613.84
σ1 (Greek letter “sigma”) = population standard deviation from which Sample 1 was drawn = Not Known
n1 = sample1 size = COUNT() = 20

Sample Group 2 – Shift B (Variable 2)
x_bar2 = sample2 mean = AVERAGE() = 42.24
µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known
s2 = sample2 standard deviation =STDEV.S() = 11.80
Var2 = sample2 variance =VAR() = 139.32
σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was drawn = Not Known
n2 = sample2 size = COUNT() = 17
x_bar1 - x_bar2 = 46.55 – 42.24 = 4.31
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The initial two questions that need to be answered before performing the Four-Step Hypothesis Test of
Mean are as follows:

Question 1) What Type of Test Should Be Done?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Mean because each individual observation (each sampled shift’s output)
within each of the two sample groups can have a wide range of values. Data points for Hypothesis Tests
of Proportion are binary: they can take only one of two possible values.

b) One-Sample or Two-Sample Test?


This is a two-sample hypothesis test because two independent samples are being compared with each
other. The two sample groups are the daily units produced by Shift A and the daily units produced by Shift
B.

c) Independent (Unpaired) Test or Dependent (Paired) Test?


It is an unpaired test because data observations in each sample group are completely unrelated to data
observations in the other sample group. The designation of “paired” or “unpaired” applies only for two-
sample hypothesis tests.

d) One-Tailed or Two-Tailed Test?
The problem asks to determine whether there is simply a difference in the average number of daily units
produced by Shift A and by Shift B. This is a non-directional inequality, making this hypothesis test a two-
tailed test. If the problem asked to determine whether Shift A’s production is greater than or less than
Shift B’s, the inequality would be directional and the resulting hypothesis test would be a one-tailed
test. A two-tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?
A two-independent-sample hypothesis test of mean must be performed as a t-Test if the sample size is small
(n1 + n2 < 40). In this case the sample size is small because n1 + n2 = 37. This Hypothesis Test of Mean must be
performed as a t-Test. A t-Test uses the t Distribution and not the normal distribution as a z-Test does.

f) Pooled or Unpooled t-Test?


Pooled t-Tests are performed if the variances of both sample groups are similar. A rule of thumb is as
follows: a Pooled t-Test should be performed if the standard deviation of one sample is no more than
twice as large as the standard deviation of the other sample. That is not the case here, because s1 (24.78)
is more than twice s2 (11.80):
s1 = sample1 standard deviation = 24.78
and
s2 = sample2 standard deviation = 11.80

F Test For Sample Variance Comparison in Excel


An F Test is a hypothesis test commonly used to test for the equality of variances of two or more sample
groups. An Excel F Test performed on the two sample groups produces the following output:

The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value
shown in the Excel F Test output equals 0.002. This is much smaller than the Alpha (0.05) that is typically
used for an F Test, so the Null Hypothesis can be rejected. The p Value indicates that there is only a 0.2
percent chance of obtaining this result if the Null Hypothesis is true.

137
We therefore conclude as a result of the F Test that the variances are not the same. The F Test is
sensitive to non-normality of data. The sample variances can also be compared using the nonparametric
Levene’s Test and the nonparametric Brown-Forsythe Test.
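If Python is available alongside Excel, the same one-tailed F Test p Value can be cross-checked directly from the two sample variances. This is only an illustrative sketch; SciPy and the variable names below are assumptions of the sketch, not part of the Excel workflow described above.

# Sketch: one-tailed F Test for equal variances using the summary statistics above.
from scipy import stats

var1, n1 = 613.84, 20   # Shift A sample variance and sample size
var2, n2 = 139.32, 17   # Shift B sample variance and sample size

f_stat = var1 / var2                               # larger variance in the numerator
p_one_tailed = stats.f.sf(f_stat, n1 - 1, n2 - 1)  # right-tail probability of the F distribution

print(round(f_stat, 2), round(p_one_tailed, 4))    # p should come out near the 0.002 shown above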

Levene’s Test For Sample Variance Comparison in Excel


Levene’s Test is a hypothesis test commonly used to test for the equality of variances of two or more
sample groups. Levene’s Test is more robust against non-normality of data than the F Test.
The Null Hypothesis of Levene’s Test is that the average distance to the sample mean is the same for each
sample group. Acceptance of this Null Hypothesis implies that the variances of the sampled groups are
the same. The distance to the mean for each data point of both samples is shown as follows:

138
Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:

The Null Hypothesis of Levene’s Test states that the average distance to the mean is the same for the two
groups. Rejection of this Null Hypothesis would imply that the sample groups have different variances.
The p Value shown in the Excel ANOVA output equals 0.0025. This is much smaller than the
Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis should be rejected.
We therefore conclude as a result of Levene’s Test that the variances are different. Levene’s Test is
sensitive to outliers because it relies on the sample mean, which can be unduly affected by outliers. A very
similar nonparametric test called the Brown-Forsythe Test relies on sample medians and is therefore
much less affected by outliers than Levene’s Test and much less affected by non-normality than the F Test.

139
Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two
or more sample groups. The Null Hypothesis of the Brown-Forsythe Test is that the average distance to the
sample median is the same for each sample group. Acceptance of this Null Hypothesis implies that the
variances of the sampled groups are the same. The distance to the median for each data point of both
samples is shown as follows:

The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:

140
The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median is the same
for the two groups. Acceptance of this Null Hypothesis would imply that the sample groups have
the same variances. The p Value shown in the Excel ANOVA output equals 0.0033. This is much smaller
than the Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis should be rejected.
We therefore conclude as a result of the Brown-Forsythe Test that the variances are not the same.
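If the raw daily-output columns are available in Python, both of these ANOVA-on-distances tests can be sketched with a single SciPy call. The function below is only a sketch under the assumption that shift_a and shift_b hold the 20 and 17 raw data points; it is not the ANOVA: Single Factor workbook method described above.

# Sketch: Levene's Test (distances to the mean) and the Brown-Forsythe Test
# (distances to the median) on the two columns of raw daily-output data.
from scipy import stats

def compare_variances(shift_a, shift_b):
    stat_levene, p_levene = stats.levene(shift_a, shift_b, center='mean')    # Levene's Test
    stat_bf, p_bf = stats.levene(shift_a, shift_b, center='median')          # Brown-Forsythe Test
    return p_levene, p_bf   # p values below Alpha (0.05) indicate unequal variances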
Each of the above tests can be considered relatively equivalent to the others. We believe that the
variances of both sample groups are dissimilar enough to require an Unpooled test for this two-
independent-sample hypothesis test.
This hypothesis test is a t-Test that is a two-independent-sample, two-tailed, Unpooled hypothesis
test of mean.

Question 2) Test Requirements Met?


a) Normal Distribution of Both Sample Means
A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under
the Null Hypothesis by the t Distribution. The t Value for this test is calculated as follows:
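Written out in the plain-text form used later in Step 4, that formula is:
t Value (test statistic) = ((x_bar1 - x_bar2) - Constant) / SE
where SE = SQRT[ (Var1/n1) + (Var2/n2) ] is the Unpooled Standard Error calculated below.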

To perform a hypothesis test that is based on the normal distribution or t distribution, both sample means
must be normally distributed. In other words, if we took multiple samples just like either one of the two
mentioned here, the means of those samples would have to be normally distributed in order to be able to
perform a hypothesis test that is based upon the normal or t distributions.
For example, 30 independent, random samples of the daily production from each of the two shifts could
be evaluated just like the single sample of units produced from 15+ production days from each of the two
shifts as mentioned here. If the means of all of the 30 samples from one shift and, separately, the means
of the other 30 samples from the other shift are normally distributed, a hypothesis test based on the
normal or t distribution can be performed on the two independent samples taken.
The means of the samples would be normally distributed if any of the following are true:

1) Sample Size of Both Samples Greater Than 30


The Central Limit Theorem states that the means of similar-sized, random, independent samples will be
normally distributed if the sample size is large (n > 30) no matter how the underlying population from
which the samples came is distributed. In reality, the distribution of sample means converges toward
normality when n is as small as 5 as long as the underlying population is not too skewed.

2) Both Populations Are Normally Distributed


If this is the case, the means of similar sized, random, independent samples will also be normally
distributed. It is quite often the case that the distribution of the underlying population is not known and
should not be assumed.

141
3) Both Samples Are Normally Distributed
If the sample is normally distributed, the means of other similar-sized, independent, random samples will
also be normally distributed. Normality testing must be performed on the sample to determine whether the
sample is normally distributed.
In this case the sample size for both samples is small: n1 and n2 are both less than 30. The normal
distribution of both sample means must therefore be tested and confirmed. Normality testing on each of
the samples has to be performed to confirm the normal distribution of the means of both samples.

b) Significantly Different Sample Variances


The two-independent-sample, Unpooled t-Test expects that the two independent samples have
significantly different variances. Samples that have similar variances are said to be homoscedastic.
Samples that have significantly different variances are said to be heteroscedastic. The samples in this
example have significantly different variances. This is confirmed by the variance comparison tests
that were previously performed in this example.

c) Independence of Samples
This type of a hypothesis test requires both samples be totally independent of each other. In this case
they are completely independent.

Evaluating the Normality of the Sample Data


The following five normality tests will be performed on the sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.
The quickest way to evaluate normality of a sample is to construct an Excel histogram from the sample
data.

142
Histogram in Excel
Excel histograms of both sample groups are as follows:

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

143
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.

144
Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:

Normal probability plots for both sample groups show that the data appear to be very close to normally
distributed. The actual sample data (red) match very closely the values the data would have if the sample
were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval
boundaries (green).

145
Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to
Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
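For readers who also use Python, the same comparison can be sketched with SciPy. Note that scipy.stats.kstest reports an asymptotic p value rather than using the small-sample critical-value table applied below, so treat it only as a rough cross-check; the function and array name below are assumptions of this sketch.

# Sketch: Kolmogorov-Smirnov comparison of a sample against a normal distribution
# whose mean and standard deviation are estimated from that same sample.
import numpy as np
from scipy import stats

def ks_normality(data):
    data = np.asarray(data, dtype=float)
    mean, sd = data.mean(), data.std(ddof=1)               # AVERAGE() and STDEV.S() equivalents
    d_stat, p_value = stats.kstest(data, 'norm', args=(mean, sd))
    return d_stat, p_value   # d_stat corresponds to the "Max Difference" values computed below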

Variable 1 – Shift A Units Produced

0.0938 = Max Difference Between Actual and Expected CDF


20 = n = Number of Data Points
0.05 = α

146
Variable 2 - Shift B Units Produced

0.1212 = Max Difference Between Actual and Expected CDF


17 = n = Number of Data Points
0.05 = α

147
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected if the maximum difference between the expected and actual CDF of
any of the data points exceeds the Critical Value for the given n and α.
The Max Difference Between the Actual and Expected CDF for Variable 1 (0.0938) and for Variable 2
(0.1212) are significantly less than the Kolmogorov-Smirnov Critical Values for n = 20 (0.29) and for n = 17
(approximately 0.32) at α = 0.05, so the Null Hypothesis of the Kolmogorov-Smirnov Test for each of the two
sample groups is accepted.

Anderson-Darling Test For Normality in Excel


The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions
tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is
therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov
test. The K-S test is less sensitive to aberrations in outer values than the A-D test.
If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.
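A Python sketch of this test is shown below. scipy.stats.anderson reports the raw A² statistic together with sample-size-adjusted critical values, so its numbers will not match the adjusted A* values computed in this section exactly, but the reject / fail-to-reject decision should agree; the function and array name are assumptions of the sketch.

# Sketch: Anderson-Darling normality test.
from scipy import stats

def ad_normality(data):
    result = stats.anderson(data, dist='norm')
    # result.statistic is the A-squared statistic; result.critical_values line up
    # with result.significance_level (15%, 10%, 5%, 2.5%, 1%).
    return result.statistic, result.critical_values, result.significance_level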

148
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

Variable 1 - Shift A Units Produced

Adjusted Test Statistic A* = 0.253

149
Variable 2 - Shift B Units Produced

Adjusted Test Statistic A* = 0.219

Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally
distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are
normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given
n and α.
The Adjusted Test Statistic (A*) for Variable 1 (0.253) and for Variable 2 (0.219) are significantly less than
the Anderson-Darling Critical Value for α = 0.05, so the Null Hypothesis of the Anderson-Darling Test for
each of the two sample groups is accepted.

150
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior
performance against other normality tests, especially with small sample sizes. Superior performance
means that it correctly rejects the Null Hypothesis of normality when the data are in fact not normally
distributed a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
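A Python cross-check is sketched below; scipy.stats.shapiro returns the W statistic along with a p value, which removes the need for a separate table of W Critical values (the function and array name are assumptions of this sketch).

# Sketch: Shapiro-Wilk normality test; returns the W statistic and a p value.
from scipy import stats

def sw_normality(data):
    w_stat, p_value = stats.shapiro(data)
    return w_stat, p_value   # fail to reject normality when p_value > Alpha (0.05 here)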

Variable 1 - Shift A Units Produced

0.966538 = Test Statistic W


0.905 = W Critical for the following n and Alpha
20 = n = Number of Data Points
0.05 = α
151
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.966538) is larger than W Critical 0.905. The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.

Variable 2 - Shift B Units Produced

0.974736 = Test Statistic W


0.892 = W Critical for the following n and Alpha
17 = n = Number of Data Points
0.05 = α
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.974736) is larger than W Critical 0.892. The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.

152
Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be
removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-
distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire
process is being used, a representative sample is not being collected. Normally-distributed results would
not appear normally distributed if a representative sample of the entire process is not collected.

The above questions have been satisfactorily answered, and we can now perform the t-Test to
demonstrate how a two-independent-sample, unpooled t-Test is done. We now proceed to complete the
four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:

Step 1 – Create the Null Hypothesis and the Alternate Hypothesis


Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical t Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternate Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average daily units produced by both sample groups (Shift A
and Shift B) are the same. We will use the variable x_bar1-x_bar2 to represent the difference between the means of the two
groups. If the mean scores for both groups are the same, then the difference between the two means,
x_bar1-x_bar2, would equal zero. The Null Hypothesis is as follows:
H0: x_bar1-x_bar2 = Constant = 0

153
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether the first mean (x_bar1) is different from the
second mean (x_bar2). The Alternate Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ Constant, which is 0
H1: x_bar1-x_bar2 ≠ 0
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the
hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous
(it requires a greater difference between the two entities being compared before the test shows that there
is a difference) than a one-tailed test.

The following formulas are used by the Two-Independent Sample, Unpooled t-Test:

Unpooled Degrees of Freedom

df = [ { (Var1/n1) + (Var2/n2) }^2 ] / [ {(Var1/n1)^2 / (n1 - 1) } + { (Var2/n2)^2 / (n2-1) } ]


df = [ { (613.84/20) + (139.32/17) }^2 ] / [ {(613.84/20)^2 / (20 - 1) } + { (139.32/17)^2 / (17 - 1) } ]
df = 28

Unpooled Sample Standard Error

SE = SQRT[ (Var1/n1) + (Var2/n2) ]


SE = SQRT[ (613.84/20) + (139.32/17) ]
SE = 6.236
Note that this calculation of the Standard Error using the sample variances, s1² and s2², is an estimate of the true
Standard Error, which would be calculated using the population variances, σ1² and σ2², of the populations from
which the samples were drawn.
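A quick numerical check of these two formulas, using Python purely as a calculator on the summary statistics above (this sketch is not part of the Excel solution):

# Sketch: Unpooled (Welch) degrees of freedom and Standard Error from the summary statistics.
from math import sqrt

var1, n1 = 613.84, 20
var2, n2 = 139.32, 17

se = sqrt(var1 / n1 + var2 / n2)                 # Unpooled Standard Error
df = (var1 / n1 + var2 / n2) ** 2 / (
        (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))

print(round(se, 3), round(df))                   # approximately 6.236 and 28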
These parameters are used to map the distributed variable, x_bar1-x_bar2, to the t Distribution curve as
follows:

154
Step 2 – Map the Distributed Variable on a t-Distribution Curve
A t-Test can be performed if the sample mean and the Test Statistic (the t Value) are distributed
according to the t Distribution. If the sample has passed a normality test, the sample mean and the closely-
related Test Statistic are distributed according to the t Distribution.
The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies
only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its
degrees of freedom, which equal n – 1 for a single sample of size n.
The means of similar, random samples taken from a normal population are distributed according to the t
Distribution. This means that the distribution of a large number of means of samples of size n taken from
a normal population will have the same shape as a t Distribution with degrees of freedom equal to n – 1.
The sample mean and the Test Statistic are both distributed according to the t Distribution if the sample or
population is shown to be normally distributed. For this two-independent-sample, Unpooled t-Test, the
degrees of freedom are given by the formula calculated above (df = 28). This step will map the distributed
variable, x_bar1-x_bar2, to a t Distribution curve with those degrees of freedom.
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the difference between the sample means, x_bar1-x_bar2.

155
The variable x_bar1-x_bar2 is distributed according to the t Distribution. Mapping this distributed variable
to a t Distribution curve is shown as follows:

This non-standardized t Distribution curve is constructed from the following parameters:


Mean = 0, which is the constant taken from the Null Hypothesis
Standard Error = 6.236
Degrees of Freedom = 28
Distributed Variable = x_bar1-x_bar2

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bar1 is from x_bar2 by showing how different x_bar1-x_bar2 (4.31) is
from zero.
The non-standardized t Distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.
If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a Region of Rejection, the
Null Hypothesis is rejected. If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a
Region of Acceptance, the Null Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.

156
The location of this 5 percent Alpha (Region of Rejection) depends on the Alternative Hypothesis. The operator
in the Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed,
in which outer tail the Region of Rejection lies. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ 0
A “not equal” operator indicates that this will be a two-tailed test. This means that the Region of Rejection
is split between both outer tails.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.

Calculate the Critical Values

Two-Tailed Critical Values


Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± T.INV(1-α/2,df) * SE
Critical Values = 0 ± T.INV(0.975, 28) * 6.236
Critical Values = 0 ± 12.77
Critical Values = -12.77 and +12.77
The Region of Rejection therefore includes everything that is to the right of +12.77 and to the left of -
12.77.
The distribution curve with the blue Region of Acceptance and the yellow Regions of Rejection is shown
as follows:

157
If this were a one-tailed test, the Critical Values would be determined as follows:

One-Tailed Critical Value


The Region of Rejection would be in the right tail because Sample 1 has the higher mean.
Critical Value = Mean + (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Value = Mean + T.INV(1-α,df) * SE
Critical Value = 0 + T.INV(0.95, 28) * 6.236
Critical Value = 0 + 10.61
Critical Value = +10.61
The Critical Values for the two-tailed test are farther from the mean than the Critical Value for one-tailed
test. This means that a two-tailed test is more stringent than a one-tailed test.
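The same Critical Values can be cross-checked with SciPy's t-distribution quantile function, which plays the role of Excel's T.INV (a sketch only; SciPy is an assumption here, not part of the Excel workflow):

# Sketch: two-tailed and one-tailed Critical Values for this non-standardized t curve.
from scipy import stats

alpha, df, se = 0.05, 28, 6.236

two_tailed_cv = stats.t.ppf(1 - alpha / 2, df) * se   # T.INV(0.975, 28) * SE, about 12.77
one_tailed_cv = stats.t.ppf(1 - alpha, df) * se       # T.INV(0.95, 28) * SE, about 10.61

print(round(two_tailed_cv, 2), round(one_tailed_cv, 2))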

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There
are three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of
these tests needs to be performed because all three provide equivalent information. The three tests are
as follows:

1) Compare Sample Mean, x_bar1-x_bar2 With Critical Value


Reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2 = 4.31, falls into the Region of Rejection.
Fail to reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2 = 4.31, falls into the Region of
Acceptance.
Equivalently, reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2, is farther from the curve’s
mean of 0 than the Critical Value. Fail to reject the Null Hypothesis if the sample mean difference, x_bar1-x_bar2,
is closer to the curve’s mean of 0 than the Critical Value.
The Critical Values have been calculated to be -12.77 on the left and +12.77 on the right. x_bar1-x_bar2
(4.31) is closer to the curve’s mean (0) than the right Critical Value (+12.77). The Null Hypothesis would
therefore not be rejected.

2) Compare t Value With Critical t Value


The t Value corresponds to the standardized value of the sample mean difference, x_bar1-x_bar2 = 4.31. The t Value
is the number of Standard Errors that x_bar1-x_bar2 is from the curve’s mean of 0.
The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t
Value. Fail to reject the Null Hypothesis if the t Value is closer to the standardized mean of zero than the
Critical t Value.

The t Value, the Test Statistic in a t-Test, is the number of Standard Errors that x_bar1-x_bar2 is from the
mean. The Critical t Value is the number of Standard Errors that the Critical Value is from the mean. If the
t Value is larger than the Critical t Value, the Null Hypothesis can be rejected.

158
t Value (test statistic) = (x_bar1 - x_bar2 - 0) / SE
t Value (test statistic) = (4.31) / 6.236 = 0.69

Two-Tailed Critical t Values = ± T.INV(1-α/2,df)


Two-Tailed Critical t Values = ±T.INV(1- 0.05/2, 28) = ±2.048

Right Critical t Value = +2.048


This indicates that the right boundary of the Region of Acceptance is 2.048 standard errors to the right of the
curve’s mean (mean = Constant = 0).
The t Value (0.69) is much closer to the mean (mean = Constant = 0) than the Critical t Value (+2.048) so
the Null Hypothesis is accepted.

3) Compare the p Value With Alpha


The p Value is the percent of the curve that is beyond x_bar1-x_bar2 (4.31). If the p Value is larger than
Alpha/2 (since this is a two-tailed test), the Null Hypothesis is accepted. The p Value in this case is
calculated by the following Excel formula:
p Value = T.DIST.RT(ABS(t Value), df) = T.DIST.RT(ABS(0.69), 28) = 0.247
The p Value (0.247) is much larger than Alpha/2 (0.025 – because this is one tail of a two-tailed test) and
we therefore accept the Null Hypothesis. A graph below shows that the red p Value (the curve area
beyond x_bar1-x_bar2) is much larger than the yellow Alpha, which is the 2.5 percent Region of Rejection
in the outer right tail. This is shown in the following Excel-generated graph of this non-standardized t
Distribution curve:
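The three equivalent checks in this step can also be reproduced numerically from the summary values, as in the following sketch (SciPy is an assumption of the sketch, not part of the workbook method):

# Sketch: t Value, Critical t Value, and one-tail p Value for this two-tailed test.
from scipy import stats

diff_of_means, se, df, alpha = 4.31, 6.236, 28, 0.05

t_value = diff_of_means / se                 # about 0.69
critical_t = stats.t.ppf(1 - alpha / 2, df)  # about 2.048
p_one_tail = stats.t.sf(abs(t_value), df)    # same role as T.DIST.RT, about 0.247

reject_null = p_one_tail < alpha / 2         # False here: fail to reject the Null Hypothesis
print(round(t_value, 2), round(critical_t, 3), round(p_one_tail, 3), reject_null)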

159
It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test,
the Null Hypothesis would still be accepted because:
1) The p Value (0.247) is still much larger than Alpha (0.05)
2) x_bar1-x_bar2 (4.31) is still in the Region of Acceptance, which would now have its outer right boundary
at 10.61 (mean + T.INV(1-α,df)*SE)
3) the t Value (0.69) would still be smaller than the Critical t Value, which would now be 1.70
(T.INV(1-α,df))

Excel Data Analysis Tool Shortcut


This two-independent-sample, unpooled t-Test can be solved much quicker using the following Excel data
analysis tool:
t-Test: Two-Sample Assuming Unequal Variances
This Excel tool is part of the Data Analysis Toolpak that is an add-in which ships with Excel but must first
be activated by the user before it is available.
Before this test is employed, all required assumptions such as normality of data must be verified as was
done.
As mentioned, the data should be input with the sample having the larger mean designated as the first
sample group (Variable 1). Doing so ensures that the Excel output will have the same signs for the t Value and the
Critical t Value. If the sample with the smaller mean is input as the first sample, the t Value will correctly be
negative but the Critical t Value will be incorrectly listed by Excel as being positive.
This tool will be applied to the following data set using the same data as the preceding example in this
section:

160
The completed dialogue box for this tool is shown as follows:

161
Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.

162
The calculations to create the preceding output were performed as follows. The individual outputs are
color-coded so it is straightforward to match the calculations with the outputs of the tool.

163
164
Excel Statistical Function Shortcut
The stand-alone Excel formula to perform a two-independent sample, unpooled t-Test is shown as
follows. If the resulting p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test, the
difference between the means of the samples is deemed to be statistically significant. This indicates that
the two samples were likely drawn from different populations.
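For reference, this stand-alone calculation can be written with Excel’s T.TEST function. Assuming, for illustration only, that the Shift A data are in cells A2:A21 and the Shift B data are in cells B2:B18, the two-tailed p Value for the unpooled test is returned by:
p Value = T.TEST(A2:A21, B2:B18, 2, 3)
The third argument (2) specifies a two-tailed test and the fourth argument (3) specifies a two-sample test assuming unequal variances.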

Effect Size in Excel


Effect size in a t-Test is a convention for expressing how large the difference between two groups is,
without taking into account the sample size or whether that difference is statistically significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.”
A large effect would be a difference between two groups that is easily noticeable with the measuring
equipment available. A small effect would be a difference between two groups that is not easily noticed.
Effect size for a two-independent-sample, unpooled t-Test is a method of expressing the distance
between the difference between the sample means, x_bar1-x_bar2, and the Constant in a standardized form
that does not depend on the sample size.
Remember that the Test Statistic (the t Value) for a two-independent-sample t-Test (both pooled and
unpooled t-Tests) is calculated by the following formula:

Pooled t-Test formulas are used when the variances of both independent sample groups are similar. The
rule of thumb is that the pooled t-Test formulas are used if the sample standard deviation of one of the
groups is no more than twice as large as the sample standard deviation of the other group. Unpooled t-
Test formulas are used when the difference between the sample standard deviations is larger than that.
The t Value in a pooled t-Test is calculated as follows:

165
The Standard Error for a pooled t-Test is calculated as follows:

Effect Size for a pooled t-Test is calculated as follows:

In this case spooled can be derived from SEpooled with the following calculation:

Effect Size for an unpooled t-Test is calculated as follows:

sunpooled for purposes of calculating Effect Size can be derived from SE in the same way that spooled can, as
follows:

166
The Standard Error for an unpooled t-Test is calculated as follows:

sunpooled can therefore be calculated as follows:

With algebraic manipulation, the formula for sunpooled can be shortened to the following formula:

sunpooled = 18.905
The t Value specifies the number of Standard Errors that the difference between the sample means, x_bar1-
x_bar2, is from the Constant. The t Value determines whether the test has achieved statistical significance
and is dependent upon the sample sizes. Achieving statistical significance means that the Null Hypothesis
(H0: x_bar1-x_bar2 = Constant = 0) has been rejected.
The Effect Size, d, for a two-independent-sample, unpooled t-Test is a very similar measure that does not
directly depend on sample size and has the following formula:

sunpooled pools the sample standard deviations based upon the relative proportions of the two sample sizes
n1 and n2 rather than their absolute values. sunpooled is therefore not directly dependent on the sample
sizes n1 and n2.
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.

167
The d measured here is Cohen’s d for a two-independent-sample, unpooled t-Test. The Effect Size is a
standardized measure of size of the difference that the t-Test is attempting to detect. The Effect Size for a
two-independent-sample, unpooled t-Test is a measure of that difference in terms of the number of
sample standard deviations. Note that sample size has no effect on Effect Size. Effect size values for the
two-independent-sample, unpooled t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size

d = 0.5 up to 0.8 = medium Effect Size


d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bar1 - x_bar2 – Constant| / sunpooled = |46.55 – 42.24 – 0| / 18.905 = 0.228
An Effect Size of d = 0.228 is considered to be a small effect.
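A numerical sketch of this Effect Size calculation, using Python only as a calculator on the summary statistics above (not part of the Excel solution):

# Sketch: Cohen's d for the two-independent-sample, unpooled t-Test.
from math import sqrt

x_bar1, x_bar2, constant = 46.55, 42.24, 0
var1, n1 = 613.84, 20
var2, n2 = 139.32, 17

se_unpooled = sqrt(var1 / n1 + var2 / n2)          # about 6.236
s_unpooled = se_unpooled / sqrt(1 / n1 + 1 / n2)   # about 18.905

d = abs(x_bar1 - x_bar2 - constant) / s_unpooled   # about 0.228, a small effect
print(round(s_unpooled, 3), round(d, 3))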

168
Power of the Test With Free Utility G*Power
The Power of a two-independent-sample t-Test is a measure of the test’s ability to detect a
difference given the following parameters:
Alpha (α)
Effect Size (d)
Sample Sizes (n1 and n2)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type 2 Error. A Type 2 Error can be described as a False Negative. A False
Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
The four variables that are required in order to determine the Power for a two-independent-sample t-Test are
Alpha (α), Effect Size (d), Sample Sizes (n1 and n2), and the Number of Tails. Typically Alpha, Effect Size, and
the Number of Tails are held constant while the sample sizes are varied (usually increased) to achieve the
desired Power for the statistical test.
Manual calculation of a test’s Power given Alpha, Effect Size, Sample Size, and the Number of Tails is
quite tedious. Fortunately there are a number of free utilities online that will readily calculate a test’s
statistical Power. A widely-used online Power calculation utility called G*Power is available for download
from the Institute of Experimental Psychology at the University of Dusseldorf at this link:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a
graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for a two-independent-sample
t-Test are Alpha (α), Effect Size (d), Sample Sizes (n1 and n2), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Means: Difference between two independent means (two groups)
Type of power analysis: Post hoc – Compute achieved power – given α, sample size, and effect size
Number of Tails = 2
Effect Size (d) = 0.228
Alpha (α) = 0.05
Sample Sizes (n1 = 20 and n2 = 17)

169
The completed dialogue screen appears as follows:

170
Clicking Calculate would produce the following output:

The Power achieved for this test is 0.1031. This means that the current two-tailed test has a 10.31
percent chance of detecting a difference that has an effect size of 0.228 if α = 0.05, n1 = 20, and n2 = 17.
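If Python is available, the statsmodels power module offers an alternative to G*Power for this calculation. The sketch below should land close to the G*Power figures reported here (small differences can arise from the underlying approximations); statsmodels is an assumption of the sketch, not part of the manual’s workflow.

# Sketch: achieved power and required sample size for the two-independent-sample t-Test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

achieved_power = analysis.power(effect_size=0.228, nobs1=20, ratio=17/20,
                                alpha=0.05, alternative='two-sided')
print(round(achieved_power, 4))          # should be close to the 0.1031 reported by G*Power

n_per_group = analysis.solve_power(effect_size=0.228, power=0.8, ratio=1.0,
                                   alpha=0.05, alternative='two-sided')
print(round(n_per_group))                # roughly 300 per group, consistent with the plot discussed below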

171
It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and
Alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw
Plot on the next screen that comes up. This will produce the following output:

This would indicate that a Power of 80 percent would be achieved for this test if the total sample size
were equal to approximately n1 + n2 = 600.

Nonparametric Alternatives in Excel


When normality of data cannot be confirmed for a small sample, it is necessary to substitute a
nonparametric test for a t-Test. Nonparametric tests do not have the same normality requirement that the
t-Test does. The most common nonparametric test that can be substituted for the two-independent-
sample t-Test when data normality cannot be confirmed is the Mann-Whitney U Test.
A Mann-Whitney U test was performed on the data in the section covering the pooled, two-independent-
sample t-Test but will not be performed on this data used in the unpooled, two-independent-sample t-
Test. The procedure for the Mann-Whitney U test is exactly the same for pooled and unpooled two-
independent sample t-Tests.
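For completeness, a Python sketch of the Mann-Whitney U Test on the same two columns of raw data is shown below; the array names are placeholders, and this mirrors, rather than reproduces, the worked example referred to above.

# Sketch: Mann-Whitney U Test as a nonparametric substitute for the
# two-independent-sample t-Test when data normality cannot be confirmed.
from scipy import stats

def mann_whitney(shift_a, shift_b):
    u_stat, p_value = stats.mannwhitneyu(shift_a, shift_b, alternative='two-sided')
    return u_stat, p_value   # reject the hypothesis of equal locations when p_value < Alpha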

172
Paired (Two-Sample Dependent) t-Test in Excel

Overview
This hypothesis test determines whether the mean of a sample of differences between pairs of data
(x_bardiff) is equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant.
Before-and-after fitness levels of individuals undergoing a training program would be an example of
paired data. The sample evaluated would be the group of differences between the before-and-after
scores of the individuals. This is called the difference sample.
x_bardiff = Observed Difference Sample Mean

df = n – 1
Null Hypothesis H0: x_bardiff = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bardiff is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of Paired, 1-Tailed t-Test in Excel


This problem is very similar to the problem solved in the z-test section for a paired, one-tailed z-test.
Similar problems were used in each of these sections to show the similarities and also contrast the
differences between the paired z-Test and t-test as easily as possible.
A new clerical program was introduced to a large company with the hope that clerical errors would be
reduced. 5,000 clerical workers in the company underwent the training program. 17 clerical employees
who underwent the training were randomly selected. The average number of clerical errors that each of
these 17 employees made per month for six months prior to the training and also for six months following
the training were recorded. Each of the 17 employees had a similar degree of clerical experience within
the company and performed nearly the same volume and type of clerical work in the before and after
months.
Based upon the results of the 17 sampled clerical employees, determine with 95 percent certainty
whether the average number of monthly clerical mistakes was reduced for the entire 5,000 clerical
employees who underwent the training.
It is the difference that we are concerned with. A hypothesis test will be performed on the sample of
differences. The distributed variable will be designated as x_bardiff and will represent the average
difference between the After and Before measurements.

173
x_bardiff was calculated by subtracting the Before measurement from the After measurement. This is the
intuitive way to determine if a reduction in error occurred.
This problem illustrates why the t-Test is nearly always used instead of a z-Test to perform a two-
dependent-sample (paired) hypothesis test of mean. The z-Test requires that the population standard
deviation of the differences between the pairs be known. This is often not the case, but it is required for a
paired z-Test. The t-Test requires only that the sample standard deviation of the sample of paired differences
be known.
Before and After Results and Their Differences Are As Follows:

Running the Excel data analysis tool Descriptive Statistics on the column of Difference data will provide
the Sample Mean, the Sample Standard Deviation, the Standard Error, and the Sample Size. It will even
provide half the width of a confidence interval about the mean based on this sample for any specified
level of certainty if that option is specified. The output of this tool appears as follows:

174

Summary of Problem Information


x_bardiff = sample mean =AVERAGE() = -3.35
sdiff = sample standard deviation = STDEV.S() = 6.4
n = sample size = number of pairs = COUNT() = 17
df = n – 1 = 16

SEdiff = Standard Error = sdiff / SQRT(n) = 6.4 / SQRT(17)


SEdiff = 1.55

175
Note that this calculation of the Standard Error using the sample standard deviation, sdiff, is an estimate of
the true Standard Error which would be calculated using the population standard deviation, σ diff.
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
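A Python sketch of the same summary calculations is shown below, under the assumption that before and after hold the 17 paired monthly-error averages; the array names are placeholders rather than the workbook’s ranges.

# Sketch: difference sample statistics for the paired t-Test (After minus Before).
import numpy as np

def difference_summary(before, after):
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    n = diff.size                      # COUNT(), 17 pairs for this example
    x_bar_diff = diff.mean()           # AVERAGE(), about -3.35 for this data
    s_diff = diff.std(ddof=1)          # STDEV.S(), about 6.4 for this data
    se_diff = s_diff / np.sqrt(n)      # Standard Error, about 1.55 for this data
    return x_bar_diff, s_diff, se_diff, n, n - 1   # last value is df = n - 1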
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Mean because each individual observation (each sampled difference) within
the sample can have a wide range of values. Data points for Hypothesis Tests of Proportion are binary:
they can take only one of two possible values.

b) One-Sample or Two-Sample Test?


This is a two-sample hypothesis test because the data exists in two groups of measurements. One
sample group contains Before measurements and the other sample group contains After measurements.

c) Independent (Unpaired) Test or Dependent (Paired) Test?


This is a paired (dependent) hypothesis test because each Before observation has a related After
observation made on the same person.

d) One-Tailed or Two-Tailed Test?


The problem asks to determine whether there has been a reduction in clerical mistakes from Before to
After. This is a directional inequality, making this hypothesis test a one-tailed test. If the problem asked
whether Before and After were simply different, the inequality would be non-directional and the resulting
hypothesis test would be a two-tailed test. A two-tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?
Assuming that the difference population or difference sample can pass a normality test, a hypothesis test
of mean must be performed as a t-Test when the difference sample size (n = number of difference pairs)
is small (n < 30) or if the population variance of the differences is unknown.

176
In this case the difference sample size (the number of data pairs) is small as n = 17 data sample pairs.
This Hypothesis Test of Mean must therefore be performed as a t-Test and not as a z-Test.
The t Distribution with degrees of freedom df = n – 1 is defined as the distribution of a random data
sample of size n taken from a normal population.
The means of samples taken from a normal population are also distributed according to the t
Distribution with degrees of freedom = df = n – 1.
The Test Statistic (the t Value), which is based upon the difference sample mean (x_bardiff) because it
equals (x_bardiff – Constant)/SEdiff, will therefore also be distributed according to the t Distribution. A t-
Test will be performed if the Test Statistic is distributed according to the t Distribution.
The distribution of the Test Statistic for the difference sample taken from a normal population of
differences is always described by the t Distribution. The shape of the t Distribution converges to (very
closely resembles) the shape of the standard normal distribution when the difference sample size
becomes large (n > 30).
The Test Statistic’s distribution can be approximated by the normal distribution only if the difference
sample size is large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if
the Test Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in
all other cases.
It should be noted that a paired t-Test can always be used in place of a paired z-Test. All z-Tests can be
replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages,
including the well-known SPSS, provide only t-Tests and no direct z-Tests.
This hypothesis test is a t-Test that is a two-sample, paired (dependent), one-tailed hypothesis test
of mean.

Question 2) Test Requirements Met?


a) t-Distribution of Test Statistic
A t-Test can be performed if the distribution of the Test Statistic (the t value) can be approximated under
the Null Hypothesis by the t Distribution. The Test Statistic is derived from the mean of the difference
sample and therefore has the same distribution that the difference sample mean would have if multiple
similar samples were taken from the same population of differences between data sample pairs.
The difference sample size indicates how to determine the distribution of the difference sample mean and
therefore the distribution of the Test Statistic as follows:

When Difference Sample Size Is Large


When the difference sample size is large (n > 30 meaning that there are more than 30 pairs of data), the
distribution of means of similar samples drawn from the same population of differences is described by
the t Distribution. As per the Central Limit Theorem, as the difference sample size increases, the
distribution of the difference sample means converges to the normal distribution as does the t Distribution.
When the difference sample size approaches infinity, the t Distribution converges to the standard normal
distribution.
When the difference sample size is large, the distribution of the difference sample mean, and therefore
the distribution of the Test Statistic, is always described by the t Distribution. A t-Test can therefore
always be used when the difference sample size is large, regardless of the distribution of the population
of differences or the difference sample.

177
When the Difference Sample Size is Small
The data in a difference sample taken from a normally-distributed population of paired differences will be
distributed according to the t Distribution regardless of the difference sample size.
The means of similar difference samples randomly taken from a normally-distributed population of paired
differences are also distributed according to the t Distribution regardless of the difference sample size.
The difference sample mean, and therefore the Test Statistic, are distributed according to the t
Distribution if the population of paired differences is normally distributed.
The population of paired differences is considered to be normally distributed if any of the following are
true:
1) Population of Paired Differences Is Normally Distributed
2) Difference Sample Is Normally Distributed
If the difference sample passes a test of normality, then the population of paired differences from which the
difference sample was taken can be assumed to be normally distributed.
The population of paired differences or the difference sample must pass a normality test before a t-Test
can be performed. If the only data available are the data of the single difference sample taken, then the
difference sample must pass a normality test before a t-Test can be performed.

Evaluating the Normality of the Sample Data


The following five normality tests will be performed on the difference sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

Histogram in Excel
The quickest way to check the difference sample data for normality is to create an Excel histogram of the
data as shown below, or to create a normal probability plot of the data if you have access to an
automated method of generating that kind of a graph.
It is very important to verify the normality of the data if the difference sample size is small. If a hypothesis
test’s required underlying assumptions cannot be met, the test is invalid.

178
Here is a histogram of the sample of differences.

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

The sample of differences appears to be distributed reasonably closely to the bell-shaped normal
distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This
arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram’s output.
Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set
by the user in an Excel histogram is the upper boundary of each bin.

179
Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for the sampled differences. This can be implemented in Excel and appears as follows:

The normal probability plot for the sampled differences shows that the data appear to be very close to
normally distributed. The actual sample data (red) match very closely the values the data would have if
the sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence
interval boundaries (green).

Kolmogorov-Smirnov Test For Normality in Excel


The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to
Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)

180
Difference Sample Group

0.0926 = Max Difference Between Actual and Expected CDF


17 = n = Number of Data Points
0.05 = α

181
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected if the maximum difference between the expected and actual CDF of
any of the data points exceeds the Critical Value for the given n and α.
The Max Difference Between the Actual and Expected CDF for the Difference sample group (0.0926) is
significantly less than the Kolmogorov-Smirnov Critical Value for n = 17 (approximately 0.32) at α =
0.05, so the Null Hypothesis of the Kolmogorov-Smirnov Test for the difference sample group is accepted.

Anderson-Darling Test For Normality in Excel


The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions
tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is
therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov
test. The K-S test is less sensitive to aberrations in outer values than the A-D test.
If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.

182
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
Difference Sample Group

Adjusted Test Statistic A* = 0.244


Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally
distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given α.
The Adjusted Test Statistic (A*) for the Difference Sample Group (0.244) is well below the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test for this sample group cannot be rejected.

Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
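Excel has no built-in Shapiro-Wilk function, so the statistic is normally computed from the standard definition (the coefficients ai must be taken from published Shapiro-Wilk tables or computed separately for the given sample size):
W = [ SUM( ai * x(i) ) ]^2 / SUM( (xi – x_bar)^2 )
where x(i) are the sample values sorted in ascending order, x_bar is the sample mean, and the ai are the Shapiro-Wilk coefficients for n data points.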
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance against other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis (which states that the data are normally distributed) when the data are actually non-normal a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Difference Data

0.968860 = Test Statistic W


0.892 = W Critical for the following n and Alpha
17 = n = Number of Data Points
0.05 = α
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected

Test Statistic W (0.968860) is larger than W Critical (0.892). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.

Correctable Reasons That Normal Data Can Appear Non-Normal


If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process' Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.

When Data Are Not Normally Distributed


When normality of data cannot be confirmed for a small sample, it is necessary to substitute a
nonparametric test for a t-Test. Nonparametric tests do not have the same normality requirement that the
t-Test does. The most common nonparametric tests that can be substituted for the paired t-Test when data
normality cannot be confirmed are the Wilcoxon Signed-Rank Test and the nonparametric Sign Test.
The Wilcoxon Signed Rank Test is the more powerful nonparametric test but requires that data be
relatively symmetrical about a median. The nonparametric Sign Test can be used when this requirement
cannot be met.
The Wilcoxon Signed Rank Test also can only be performed on ratio or interval data. The nonparametric Sign Test can be performed on ordinal data.
Ratio and interval scales have measurable differences between values. Ordinal scales do not. The
absolute temperature scale is a ratio scale because the value of zero means that there is no heat. The
Fahrenheit or Celsius temperature scales are interval scales because the zero value is arbitrarily placed.
A Likert rating scale would be an example of ordinal data.
The Sign Test is non-directional and can only be substituted for a two-tailed test but not for a one-tailed
test like the example in this section.
The Wilcoxon Signed-Rank Test will be performed on the data in this example. The parametric paired,
one-tailed t-Test was able to detect a difference at alpha = 0.05. The one-tailed Wilcoxon Signed-Rank
Test was also able to detect a difference at alpha = 0.05.
The above questions have not been satisfactorily answered.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test
Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternate Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this case, the Null Hypothesis would state that there is no difference between the before and after data. We will use the variable x_bardiff to represent the mean of the differences between the before and after measurements. The Null Hypothesis is as follows:
H0: x_bardiff = Constant = 0
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether there has been a reduction in clerical errors,
i.e., the After measurements are, on average, smaller than the Before measurements. The Alternate
Hypothesis is as follows:
H1: x_bardiff < Constant , which is 0
H1: x_bardiff < 0
The Alternative Hypothesis is directional (“greater than” or “less than” instead of “not equal,” which is non-
directional) and the hypothesis test is therefore a one-tailed test. The “less than” operator indicates that
this is a one-tailed test with the Region of Rejection (the alpha region) entirely contained in the left tail. A
“greater than” operator would indicate a one-tailed test focused on the right tail.
It should also be noted that a two-tailed test is more rigorous (it requires a greater difference between the two entities being compared before the test shows that there is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the means of the population of
paired differences from which the difference samples were taken. A population of paired differences
would be the differences of data pairs in a population of data pairs.
A paired t-Test determines whether to reject or fail to reject the Null Hypothesis which states that the
population of paired differences from which the difference sample was taken has a mean equal to the
Constant. The Constant in this case is equal to 0. This means that the Null Hypothesis states that the
average difference between data pairs of an entire population from which the sample of data pairs were
drawn is zero.
Parameters necessary to map the distributed variable, x_bardiff, are the following:
x_bardiff = sample mean =AVERAGE() = -3.35
sdiff = sample standard deviation = STDEV.S() = 6.4
n = sample size = number of pairs = COUNT() = 17
df = n – 1 = 16

SEdiff = Standard Error = sdiff / SQRT(n) = 6.4 / SQRT(17) = 1.55
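In Excel, these parameters might be produced with formulas like the following (a sketch that assumes the Before data are in A2:A18, the After data are in B2:B18, and the After-minus-Before differences are in C2:C18, e.g., C2: =B2-A2 copied down):
x_bardiff: =AVERAGE(C2:C18)
sdiff: =STDEV.S(C2:C18)
n: =COUNT(C2:C18)
SEdiff: =STDEV.S(C2:C18)/SQRT(COUNT(C2:C18))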
These parameters are used to map the distributed variable, x_bardiff, to the t Distribution curve as follows:

Step 2 – Map the Distributed Variable to t-Distribution


A t-Test can be performed if the difference sample mean and the Test Statistic (the t Value) are distributed according to the t Distribution. If the difference sample has passed a normality test, then the difference sample mean and the closely-related Test Statistic are distributed according to the t Distribution.
The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies
only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its
degrees of freedom, which equals n – 1 if n = sample size.
The means of similar, random difference samples taken from a normal population of paired differences
are distributed according to the t Distribution. This means that the distribution of a large number of means of difference samples of size n taken from a normal population will have the same shape as a t Distribution with its degrees of freedom equal to n – 1.
The difference sample mean and the Test Statistic are both distributed according to the t Distribution with
degrees of freedom equal to n – 1 if the sample or population is shown to be normally distributed. This
step will map the sample mean to a t Distribution curve with degrees of freedom equal to n – 1.

The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:

A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure used to take the samples. The distributed variable would be the sample mean, x_bardiff.
The variable x_bardiff is distributed according to the t Distribution. Mapping this distributed variable to a t
Distribution curve is shown as follows:

This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null
Hypothesis, which is:
H0: x_bardiff = Constant = 0
This non-standardized t Distribution curve is constructed from the following parameters:
Mean = Constant = 0
Standard Errordiff = 1.55
Degrees of Freedom = 16
Distributed Variable = x_bardiff

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to accept or reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bardiff (-3.35) is from the hypothesized mean of 0.
The non-standardized t Distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.

The above distribution curve that maps the distribution of the variable x_bardiff can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection.
If x_bardiff’s value of -3.35 falls in the Region of Acceptance, we must accept the Null Hypothesis. If
x_bardiff’s value of -3.35 falls in the Region of Rejection, we can reject the Null Hypothesis.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.
This 5 percent is entirely contained in the outer left tail. The outer left tail contains the 5 percent of the
curve that is the Region of Rejection.

Calculate the Critical Value


The boundary between the Region of Acceptance and the Region of Rejection is called the Critical Value. The location of this Critical Value needs to be calculated as follows.
One-tailed, Left-tail Critical Value = Constant – (Number of Standard Errors from the Mean to the Region of Rejection) * SEdiff
One-tailed, Left-tail Critical Value = Constant + T.INV(α,df) * SEdiff
One-tailed, Left-tail Critical Value = 0 + T.INV(0.05, 16) * 1.55
One-tailed, Left-tail Critical Value = -2.711
The Region of Rejection is therefore everything that is to the left of -2.711.
The distribution curve with the blue 95-percent Region of Acceptance and the yellow 5-percent Region of
Rejection entirely contained in the left tail is shown is as follows:

Step 4 – Determine Whether to Reject Null Hypothesis
The object of a hypothesis test is to determine whether to accept or reject the Null Hypothesis. There are three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:

1) Compare x_bardiff With the Critical Value


Reject the Null Hypothesis if the sample mean, x_bardiff = -3.35, falls into the Region of Rejection. Equivalently, reject the Null Hypothesis if the sample mean, x_bardiff, is further from the curve's mean of 0 than the Critical Value.
The Critical Value has been calculated to be -2.71 on the left. x_bardiff (-3.35) is further from the curve mean (0) than the left Critical Value (-2.71). The Null Hypothesis is therefore rejected.

2) Compare t Value With Critical t Value

The t Value corresponds to the standardized value of the sample mean, x_bardiff = -3.35. The t Value is the number of Standard Errors that x_bardiff is from the curve's mean of 0.

The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.

Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t
Value.

Equivalently, fail to reject the Null Hypothesis if the t Value is closer to the standardized mean of zero than the Critical t Value.

t Value (Test Statistic) = (x_bardiff) / SEdiff = (-3.35)/1.55

t Value (Test Statistic) = -2.159

This means that the sample mean, x_bardiff, is 2.159 standard errors to the left of the curve mean of 0.

One-tailed, left-tail Critical t Value = T.INV(α,df)

One-tailed, left-tail Critical t Value = T.INV(0.05, 16) = -1.76

This means that the boundary of the Region of Rejection is 1.76 Standard Errors to the left of the curve mean of 0 since this is a one-tailed test in the left tail.

The Null Hypothesis is rejected because the t Value is farther from the curve mean than the Critical t Value, indicating that x_bardiff is in the Region of Rejection.

3) Compare p Value With Alpha

The p Value is the percent of the curve that is beyond x_bardiff (-3.35). If the p Value is smaller than Alpha, the Null Hypothesis is rejected.

p Value = T.DIST.RT(ABS(t Value), df)

p Value = T.DIST.RT(ABS(-2.159), 16)

p Value = 0.023
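Because this is a one-tailed test in the left tail, the same p Value can also be obtained directly from the left-tail cumulative t distribution in Excel 2010 and beyond:
p Value = T.DIST(t Value, df, TRUE) = T.DIST(-2.159, 16, TRUE) = 0.023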

The p Value (0.023) is smaller than Alpha (0.05), the area of the Region of Rejection in the left tail, and we therefore reject the Null Hypothesis. The red p Value (the curve area beyond x_bardiff) is smaller than the yellow Alpha, which is the 5-percent Region of Rejection in the left tail. This is shown in the following Excel-generated graph of this non-standardized t Distribution curve:

Excel Data Analysis Tool Shortcut
This paired (two-dependent-sample) t-Test can be solved much more quickly using the following Excel data analysis tool:
t-Test: Paired Two Sample for Means
The Excel tool can be found by clicking Data Analysis under the Data tab. The tool is titled t-Test: Paired Two Sample For Means. The entire Data Analysis ToolPak is an add-in that ships with Excel but must first be activated by the user before it is available.
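In recent desktop versions of Excel, activation is typically done as follows: click File, then Options, then Add-Ins; select Excel Add-ins in the Manage drop-down at the bottom of that dialogue, click Go, check Analysis ToolPak, and click OK. Data Analysis should then appear on the Data tab.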
This hypothesis test creates the sample of differences by subtracting the Before results from the After results. If the training program has successfully reduced the average number of monthly clerical errors per employee, the resulting average difference (x_bardiff) will be negative.
Because the Before data are subtracted from the After data, the After data sample (in column B) should be designated as Variable 1, as is done here. This ensures that the t Value (t Stat in the Excel output) has the correct sign, which would be negative in this case.
This tool will be applied to the following data set using the same data as the preceding example in this
section.

The completed dialogue box for this tool is shown as follows:

Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.

The calculations to create the preceding output were performed as follows. The individual outputs are color-coded so it is straightforward to match the calculations with the outputs of the tool.

Excel Statistical Function Shortcut
The stand-alone Excel formula to perform a paired (two-dependent sample) t-Test is shown as follows. If
the resulting p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test, the mean difference
between the Before and After sample pairs is deemed to be statistically significant.
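The screenshot of the formula is not reproduced here; the stand-alone formula would typically be Excel's T.TEST function (TTEST in versions prior to Excel 2010). A minimal sketch, assuming the Before data are in cells A2:A18 and the After data are in cells B2:B18:
p Value = T.TEST(B2:B18, A2:A18, 1, 1)
The third argument (1) specifies a one-tailed test and the fourth argument (1) specifies a paired test; a value of 2 for the third argument would return the two-tailed p Value instead.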

The p Value is calculated to be 0.023. This is less than Alpha (0.05) or Alpha/2 (0.025) so the Null
Hypothesis for this t-Test would be rejected for both a one-tailed test and a two-tailed test if Alpha is set
to 0.05 (95 percent certainty required for the test).

Effect Size in Excel
Effect size in a t-Test is a convention of expressing how large the difference between two groups is
without taking into account the sample size and whether that difference is significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.”
A large effect would be a difference between two groups that is easily noticeable with the measuring
equipment available. A small effect would be a difference between two groups that is not easily noticed.
Effect size for a paired (two-dependent-sample) t-Test is a method of expressing the difference between
the sample mean, x_bardiff, and the Constant in a standardized form that does not depend on the sample
size.
Remember that the Test Statistic (the t Value) for a paired t-Test is calculated by the following formula:
t Value = (x_bardiff – Constant) / SEdiff
since
SEdiff = sdiff / SQRT(n)
then
t Value = (x_bardiff – Constant) / (sdiff / SQRT(n))
The t Value specifies the number of Standard Errors that the sample mean, x_bardiff, is from the Constant. The t Value determines whether the test has achieved statistical significance and is dependent upon the sample size, n. Achieving statistical significance means that the Null Hypothesis (H0: x_bardiff = Constant) has been rejected.
The Effect Size, d, for a paired-sample t-Test is a very similar measure that does not depend on sample size and has the following formula:
d = |x_bardiff – Constant| / sdiff
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated with the following formula:
d = |t Value| / SQRT(n)
The d measured here is Cohen’s d for a paired t-Test. The Effect Size is a standardized measure of size
of the difference that the t-Test is attempting to detect. The Effect Size for a paired t-Test is a measure of
that difference in terms of the number of sample standard deviations. Note that sample size has no effect
on Effect Size. Effect size values for the paired t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bardiff – Constant| / sdiff = |-3.35 – 0| / 6.40 = 0.523
An effect size of d = 0.523 is considered to be a medium effect.
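In Excel, with the 17 After-minus-Before differences assumed to be in cells C2:C18 (the same hypothetical layout used earlier), this calculation might look like the following:
d = ABS(AVERAGE(C2:C18) – 0) / STDEV.S(C2:C18) = 0.523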

Power of the Test With Free Utility G*Power
The Power of a one-sample t-Test is a measure of the test’s ability to detect a difference given the
following parameters:
Alpha (α)
Effect Size (d)
Sample Size (n)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type II Error. A Type II Error can be described as a False Negative. A False Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
The four variables that are required in order to determine the Power for a one-sample t-Test are Alpha
(α), Effect Size (d), Sample Size (n), and the Number of Tails. Typically alpha, Effect Size, and the
Number of Tails are held constant while sample size is varied (usually increased) to achieve the desired
Power for the statistical test.
Manual calculation of a test's Power given Alpha, Effect Size, Sample Size, and the Number of Tails is quite tedious. Fortunately there are a number of free utilities available that will readily calculate a test's statistical Power. A widely-used free Power calculation utility called G*Power is available for download from the Institute of Experimental Psychology at the University of Dusseldorf at this link:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for a one-sample t-Test
are Alpha (α), Effect Size (d), Sample Size (n), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Difference between two dependent means (matched pairs)
Type of power analysis: Post hoc – Compute achieved power – given α, sample size, and effect size
Number of Tails = 1
Effect Size (d) = 0.523
Alpha (α) = 0.05
Sample Size (n) = 17
The completed dialogue screen appears as follows:

Clicking Calculate would produce the following output:

The Power achieved for this test is 0.6624. This means that the current one-tailed paired t-Test has a
66.24 percent chance of detecting a difference that has an effect size of 0.523 if α = 0.05 and n = 17.
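For a rough cross-check without specialized software, a large-sample normal approximation of Power for a one-tailed test can be sketched in Excel. This approximation is an assumption added here, not G*Power's exact noncentral-t calculation, and it runs slightly high for this example:
Power ≈ NORM.S.DIST(d*SQRT(n) - NORM.S.INV(1-α), TRUE) = NORM.S.DIST(0.523*SQRT(17) - NORM.S.INV(0.95), TRUE) ≈ 0.70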

It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw Plot on the next screen that comes up. This will produce the following output:

This would indicate that a Power of 80 percent would be achieved for this test if the sample size were
approximately n = 24.

Nonparametric Alternatives in Excel
Wilcoxon Signed-Rank Test in Excel
The Wilcoxon Signed-Rank Test is an alternative to the paired t-Test when sample size is small (number
of pairs = n < 30) and normality cannot be verified for the difference sample data or the population from
which the difference sample was taken.
The Wilcoxon Signed-Rank Test calculates the difference between each data point in the difference sample and the Constant from the t-Test's Null Hypothesis (0 in this case). The absolute values of each difference are ranked and then assigned the sign (positive or negative) that the difference originally had. These signed ranks are summed up to create the Test Statistic W.
Test Statistic W will be approximately normally distributed if the required assumptions are met for this
test. The Test Statistic’s z Score can then be calculated and compared with the Critical z value. The
decision whether or not to reject the test’s Null Hypothesis is made based on the results of this
comparison.
The Null Hypothesis for this test states that the median difference equals the Constant, i.e., H0: Population Median Difference = Constant. This is very similar to the Null Hypothesis of the one-sample t-Test, which states that the population mean difference is equal to the Constant. The population is the set of differences from all possible before-and-after data pairs.
The Wilcoxon Signed-Rank Test is performed by implementing the following steps:
1) Calculate the difference between each difference data point and the Constant that the sample is being
compared to, which is the Constant of 0 from the Null Hypothesis.
2) Record the sign (positive or negative) of each difference.
3) Sort the absolute values of difference data in ascending order.
4) Assign ranks to this data.
5) Attach the sign of each difference to its rank.
6) Sum up all of these signed ranks. This sum is the Test Statistic W.
7) Calculate the standard deviation, σw, for these signed ranks.
8) The Test Statistic W will be approximately normally distributed if the required assumptions for this test
are met. Calculate the z Score for this variable W.
9) Compare the z Score of W with the Critical z Value for the given alpha and number of tails in the test. If
the z Score is further from the standardized mean of zero than the Critical z Value, the Null Hypothesis
can be rejected. The Null Hypothesis states that the population’s median is equal to the Constant from
the Null Hypothesis.
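As a rough illustration, steps 1) through 7) above might be laid out on a worksheet with formulas like the following. This is a sketch with assumed cell references: the 17 differences, already reduced by the Constant of 0, in cells A2:A18, each row formula copied down through row 18, and any zero differences removed first.
Sign of each difference (column B): =SIGN(A2)
Absolute value of each difference (column C): =ABS(A2)
Rank of each absolute value, with ties averaged (column D): =RANK.AVG(C2, $C$2:$C$18, 1)
Signed rank (column E): =B2*D2
Test Statistic W: =SUM(E2:E18)
σW: =SQRT(COUNT(A2:A18)*(COUNT(A2:A18)+1)*(2*COUNT(A2:A18)+1)/6)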

Performing the Wilcoxon Signed-Rank Test on the data in the example in this section is done as follows:
Step 1) Calculate the Difference Between Each Sample Data Point and the Constant to Which the
Sample Is Being Compared.
The original Null Hypothesis from the paired t-Test stated that the mean difference between all before-
and-after data pairs in the entire population is equal to 0. The Null Hypothesis for this test was as follows:
H0: x_bardiff = Constant =0
The sample of differences is created as follows:

The Wilcoxon Signed-Rank Test calculates the difference between each data point in the difference
sample and the Constant from the t-Test’s Null Hypothesis (0 in this case). That final difference sample is
created as follows:

Step 2) Create the Null and Alternative Hypotheses.


The one-sample t-Test attempts to determine whether the mean difference between all possible pairs of
before-and-after data is equal to 0.
The Wilcoxon Signed-Rank Test attempts to determine whether the median difference between all
possible pairs of before-and-after data is equal to 0. The Null Hypothesis for this would be stated as
follows:
H0: Median_Difference = Constant = 0
The Alternate Hypothesis is always in inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether there has been a reduction in clerical errors,
i.e., the After measurements are smaller than the Before measurements. If there has been a reduction in
clerical errors, the median difference will be less than 0.
The Alternative Hypothesis for this Wilcoxon Signed-Rank Test will therefore be stated as follows:
H1: Median_Difference < Constant = 0
H1: Median_Difference < 0

Step 3) Evaluate Whether the Test’s Required Conditions Have Been Met
The Wilcoxon Signed-Rank Test has the following requirements:
a) Data are ratio or interval but not categorical (nominal or ordinal). This is the case here.
b) Sample size (the number of data pairs) is at least 10. This is the case here.
c) Data of the Difference sample are distributed about a median with reasonable symmetry. Test Statistic
W will not be normally distributed unless this assumption is met.
The following Excel-generated histogram shows that the difference data are distributed symmetrically
about their median of -4. The symmetry about the median of -4 is not perfect but, given the small sample
size, is reasonable enough to proceed with this test:

This histogram and the sample’s median were generated in Excel as follows:

Step 4 – Record the Sign of Each Difference
Place a "+1" or a "-1" next to each non-zero difference. This can be automatically generated with an If-Then-Else statement as follows:
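The worksheet screenshot is not reproduced here; one possible formula, assuming the difference value is in cell A2 and the formula is copied down the column, would be:
=IF(A2>0, 1, IF(A2<0, -1, 0))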

Placing a plus sign (+) next to a number automatically requires a custom number format available from the Format Cells dialogue box. One custom format that will work is the following: "+"#;"-"# . This is demonstrated in the following Excel screen shot:

Step 5 – Sort the Absolute Values of the Differences While Retaining the Sign Associated With Each Difference
Sort both columns based upon the column of the differences' absolute values.

Step 6 – Rank the Absolute Values, Attach the Signs, and Sum Up the Signed Ranks to Create Test Statistic W.
The absolute values are ranked in ascending order starting with a rank of 1. Absolute values that are tied are assigned the average rank of the tied values. For example, the first three absolute values are 2. Each of these three absolute values would be assigned a rank of 2, which is equal to the average rank of all three, i.e., (1 + 2 + 3) / 3 = 2.
Test Statistic W is equal to the sum of all signed ranks.

Step 7 – Calculate the z Score of W
The distribution of Test Statistic W can be approximated by the normal distribution if all of the required
assumptions for this test are met. The difference data consists of more than 10 points of ratio data that
are reasonably symmetrical about their median. The assumptions are met for this Wilcoxon Signed-Rank
Test.
The standard deviation of W, σW, is calculated as follows:
σW = SQRT[ n(n + 1)(2n + 1)/6 ] = 42.25
z Score = ( W – Constant – 0.5) / σW
z Score = ( -83 – 0 – 0.5) / 42.25 = -1.98
The constant is the Constant from the Null Hypothesis for this test, which is the following:
H0: Median_Difference = Constant = 0
The z Score must include a 0.5 correction for continuity because W assumes whole integer values
(except in the event of a tie of ranks).

Step 8 – Reject or Fail to Reject the Null Hypothesis Based Upon a Comparison Between the z
Score and the Critical z Value
Given that α = 0.05 and this is a one-tailed test in the left tail, the Critical z Value is calculated as follows:
Z Criticalα=0.05,One-Tailed, Left_Tail = NORM.S.INV(α) = NORM.S.INV(0.05)
Z Criticalα=0.05, One-Tailed, Left_Tail = -1.64485
The Null Hypothesis is rejected if the z Score is further from the standardized mean of zero than the Critical z Value. This is the case here since the z Score (-1.98) is further from the standardized mean of zero than the Critical z Value (-1.64485). This information is shown in the Excel-generated graph as follows:

The Wilcoxon Signed-Rank Test detects that the median difference between the before-and-after data is
significant at an alpha level of 0.05.
The results of this Wilcoxon Signed-Rank Test were very similar to the results of the original paired t-Test
in which the Null Hypothesis was rejected because the t Value (-2.159) was further from the standardized
mean of zero than the Critical t Value (-1.76).

The results of the original t-Test are shown as follows:

The paired t-Test detects that the mean difference between the before-and-after data is significant at an alpha level of 0.05.

The Sign Test in Excel
The Sign Test and the Wilcoxon Signed-Rank Test are nonparametric alternatives to the paired t-Test when the normality of the difference sample cannot be verified and the sample size is small.
The Wilcoxon Signed-Rank Test is significantly more powerful than the Sign Test but has a requirement
of symmetrical distribution about a median for the difference sample data (the data set of the sample
points minus the Constant of the Null Hypothesis). The Wilcoxon Signed-Rank Test is based upon a
normal approximation of its Test Statistic’s distribution. This requires that the difference sample be
reasonably symmetrically distributed about a median.
The Sign Test has no requirements regarding the distribution of data but, as mentioned, is significantly
less powerful than the Wilcoxon Signed-Rank Test.
The Sign Test counts the number of positive and negative non-zero differences between difference
sample data and the Constant from the Null Hypothesis in the paired t-Test. In this case that Constant = 0
because the Null Hypothesis of the one-tailed t-Test is as follows:
H0: x_bardiff = Constant = 0
The after-minus-before difference sample for this problem is calculated as follows:

The final difference sample is created by subtracting the Constant from the Null Hypothesis, 0, from the
after-minus-before difference as follows:

A count of positive and negative differences in this sample is taken as follows:

The minimum count of positive or negative non-zero differences is designated as the Test Statistic W for the Sign Test.
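In Excel, these counts and the Test Statistic might be produced with formulas such as the following (a sketch assuming the final differences are in cells A2:A18):
Number of positive differences: =COUNTIF(A2:A18, ">0")
Number of negative differences: =COUNTIF(A2:A18, "<0")
Test Statistic W: =MIN(COUNTIF(A2:A18, ">0"), COUNTIF(A2:A18, "<0"))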
The objective of the one-tailed, paired t-Test was to determine whether to reject or fail to reject the Null
Hypothesis that states that the mean difference between the number of clerical errors before and after the
training for all employees is equal to 0.
If the mean difference is equal to 0, then the probability of a difference being positive (greater than zero)
is the same as the probability of being negative (less than zero). This probability is 50 percent.
The Null Hypothesis for this one-tailed, Sign Test states that the probability of a difference being positive
(p) is 50 percent. This can be expressed as follows:
H0: p=0.5
The Alternative Hypothesis would state the following:
H1: p≠0.5

Each non-zero difference is classified as either positive or negative. This is a binary event because the
classification of each difference has only two possible outcomes: the non-zero difference is either positive
or negative.
The distribution of the outcomes of this binary event can be described by the binomial distribution as long as the following three conditions exist:
1) Each binary trial is independent.
2) The data from which the differences are derived are at least ordinal. The data can be ratio, interval, or ordinal, but not nominal. The differences of "less than" and "greater than" must be meaningful even if the amount of difference is not, as would be the case with ordinal data but not with nominal data.
3) Each binary trial has the same probability of a positive outcome.
All of these conditions are met because of the following:
1) Each sample taken is independent of any other sample.
2) The differences are derived from continuous (either ratio or interval) data.
3) The proportion of positive differences versus negative differences is assumed to be constant in the
population from which the sample of differences was derived.
The counts of the positive and negative differences both follow the binomial distribution. The lowest
count, whether it is the count of positive differences or the count of negative differences, is designated as
W, the Test Statistic. This Test Statistic follows the binomial distribution because W represents the count
of positive or negative outcomes of independent binary events that all have the same probability of a
positive outcome.
As stated, the Null Hypothesis of this one-tailed, paired Sign Test is the following:
H0: p=0.5
If the training program was successful, there would be a reduction in the number of clerical errors. If the
number of clerical errors were reduced, there would be a larger number of negative after-minus-before
differences than positive differences.

The difference count indicates that there are 11 negative differences and 6 positive differences. These counts are distributed according to the binomial distribution that has a probability of a positive outcome, p, equal to 0.5 and a total number of trials, N, equal to 17. An Excel-generated graph of this binomial distribution is shown as follows:

This test evaluates whether a count of 11 negative differences is significant at an alpha level of 0.05. The area under the PDF curve at or beyond 11 negative differences is equal to the probability that a result at least this extreme would occur by chance if the Null Hypothesis were true. This is the p Value for this test.
The Null Hypothesis would be rejected if the p Value calculated from this test is less than alpha, which is
customarily set at 0.05.
Because p = 0.5, this binomial distribution is symmetric, and the curve area in the right tail at or beyond 11 differences is the same as the curve area in the left tail at or below 6 differences.
Test Statistic W in the Sign Test is always set to the lower count. The area outside the lower count is
equal to the area outside the upper count. That area is the p Value for the one-tailed Sign test.
That p Value is equal to the probability that the number of positive outcomes is less than or equal to W = 6 if the total number of nonzero counts = N = 17 and every binary trial has the same probability of a positive outcome = p = 0.5.
Given that the variable x is binomially distributed, the CDF (Cumulative Distribution Function), i.e., the probability that x ≤ X, is calculated in Excel as follows:
p Value = F(X;n,p) = BINOM.DIST(X, n, p, 1)
This calculates the probability that up to X number of positive outcomes will occur in n total binary trials if
the probability of a positive outcome is p for every trial. “1” specifies that the Excel formula will calculate
the CDF and not the PDF.

Therefore the following can be calculated:
p Value = Pr(No. of Positive Differences ≤ W | p = 0.5, N = Total No. of Non-Zero Differences)
p Value = BINOM.DIST(W, N, p, 1)
p Value = BINOM.DIST(6, 17, 0.5, 1) = 0.1662
This is shown in the following Excel-generated graph of the PDF of the binomial distribution for this Sign Test. The parameters of this binomial distribution are Total Trials = N = 17 and the Probability of a Positive Outcome of Each Trial, p, equal to 0.5.

The p Value (0.1662) is larger than alpha (set at 0.05). The Null Hypothesis is therefore not rejected at this alpha level. This is equivalent to stating that there is not enough evidence to reject the Null Hypothesis, which ultimately states that there has been no reduction in clerical errors as a result of the training program.
This example demonstrates how much less powerful the one-sample Sign Test is than the paired t-Test
or the Wilcoxon Signed-Rank Test. The Sign Test did not come close to detecting a difference at the
same alpha level that the other two tests did.

z-Tests: Hypothesis Tests Using the Normal
Distribution in Excel

z-Test Overview
The z-Test is a hypothesis test that analyzes sample data to determine if two populations have
significantly different means. A z-Test can be applied if the test statistic follows the normal distribution
under the Null Hypothesis. The test statistic will follow the normal distribution if both of the following
conditions exist at the same time:
1) The sample size is large.
2) The population standard deviation is known.
The population and sample do not have to be evaluated for normality when sample size is large. As per the Central Limit Theorem, a large sample size ensures that the sample mean will be normally distributed. The test statistic is derived from the sample mean. Normal distribution of the test statistic further requires that the population standard deviation be known.
The t-Test is the appropriate population mean hypothesis testing tool when sample size is small and/or
the population standard deviation is not known. A t-Test can always be substituted for a z-Test.
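It is worth noting that Excel provides a built-in worksheet function, Z.TEST (ZTEST prior to Excel 2010), for a one-sample z-Test. A minimal sketch, assuming hypothetical sample data in cells A2:A41, a hypothesized population mean of 5, and a known population standard deviation of 1.5:
p Value (right tail) = Z.TEST(A2:A41, 5, 1.5)
Z.TEST returns the one-tailed p Value in the right tail; for a left-tailed test the p Value would be 1 minus this result, and for a two-tailed test the smaller of the two tail areas would be doubled.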

Hypothesis Test Overview


A hypothesis test evaluates whether a sample is different enough from a population to establish that the
sample probably did not come from that population. If a sample is different enough from a hypothesized
population, then the population from which the sample came is different than the hypothesized
population.

Null Hypothesis
A hypothesis test is based upon a Null Hypothesis which states that the sample did come from that
population. A hypothesis test compares a sample statistic such as a sample mean to a population
parameter such as the population’s mean. The amount of difference between the sample statistic and the
population parameter determines whether the Null Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and

the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.

The Null Hypothesis is Either Rejected or Not Rejected But Is Never Accepted

A hypothesis test has only two possible outcomes: the Null Hypothesis is either rejected or is not rejected.
It is never correct to state that the Null Hypothesis was accepted. A hypothesis test only determines whether there is or is not enough evidence to reject the Null Hypothesis. The Null Hypothesis is rejected only when the hypothesis test result indicates that the certainty that the Null Hypothesis is not valid at least equals the specified Level of Certainty.
If the required Level of Certainty for a hypothesis test is specified to be 95 percent, the Null Hypothesis
will be rejected only if the test result indicates that there is at least a 95 percent probability that the Null
Hypothesis is invalid. In all other cases, the Null Hypothesis would not be rejected. This is not equivalent
to stating that the Null Hypothesis was accepted. The Null Hypothesis is never accepted; it can only be
rejected or not rejected.

Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations are not the same. The Alternative Hypothesis can be non-directional if it states that the means or
proportions of two populations are merely not equal to each other. The Alternative Hypothesis is
directional if it states that the mean or proportion of one of the populations is less than or greater than the mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5 or H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3 or H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.

One-Tailed Test vs. a Two-Tailed Test
The number of tails in a hypothesis test depends on whether the test is directional or not. The operator of
the Alternative Hypothesis indicates whether or not the hypothesis test is directional. A non-directional
operator (a “not equal” sign) in the Alternative Hypothesis indicates that the hypothesis test is a two-
tailed test. A directional operator (a “greater than” or “less than” sign) in the Alternative Hypothesis
indicates that the hypothesis test is a one-tailed test.
The Region of Rejection (the alpha region) for a one-tailed test is entirely contained in one of the
outer tails. A “greater than” operator in the Alternative Hypothesis indicates that the test is a one-tailed
test in the right tail. A “less than” operator in the Alternative Hypothesis indicates that the test is a one-
tailed test in the left tail. If α = 0.05, then one of the outer tails will contain the entire 5-percent Region of
Rejection.
The Region of Rejection (the alpha region) for a two-tailed test is split between both outer tails. Each
outer tail will contain half of the total Region of Rejection (alpha/2). If α = 0.05, then each outer tail will contain a 2.5-percent Region of Rejection if the test is a two-tailed test.

Level of Certainty
Each hypothesis test has a Level of Certainty that is specified. The Null Hypothesis is rejected only when that Level of Certainty has been reached that the sample did not come from the population. A commonly specified Level of Certainty is 95 percent. The Null Hypothesis would only be rejected in this case if the sample statistic was different enough from the population parameter that at least 95 percent certainty was achieved that the sample did not come from that population.

Level of Significance (Alpha)


The Level of Certainty for a hypothesis test is often indicated with a different term called the Level of
Significance also known as α (alpha). The relationship between the Level of Certainty and α is the
following:
α = 1 – Level of Certainty
An alpha that is set to 0.05 indicates that a hypothesis test requires a Level of Certainty of 95 percent that
the sample came from a different population to be reached before the Null Hypothesis is rejected.

Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to
the normal distribution or the t-Distribution. The Test Statistic is derived directly from the sample statistic
such as the sample mean. If the Test Statistic is distributed according to the normal or t-Distribution, then
the sample statistic is also distributed according to the normal or t-Distribution. This will be discussed in greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the
sample statistic (the sample mean or proportion) to its own unique normal or t-Distribution. The sample
statistic is the distributed variable whose distribution is mapped according to its own unique normal or t-Distribution.
The Region of Acceptance is the percentage of area under this normal or t-Distribution curve that equals
the test's specified Level of Certainty. If the hypothesis test requires a Level of Certainty of 95 percent in order to reject the Null Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed variable's mapped normal or t-Distribution curve.

If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.

Region of Rejection
The Region of Rejection is the percentage of area under this normal or t-Distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05
The Region of Acceptance includes 95 percent of the total area under the normal or t-Distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t-Distribution curve that maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.

Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of the boundary between the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value because the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther
from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the
curve’s mean than the Critical value on that side, the sample statistic lies in the Region of Rejection. If the
sample statistic is closer to the curve’s mean than the Critical value on that side, the sample statistic lies
in the Region of Acceptance.

Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard Errors that are between the observed sample statistic and the hypothesized population parameter. The Null Hypothesis is rejected if that number of Standard Errors (the number specified by the Test Statistic) is larger than a critical number of Standard Errors. The critical number of Standard Errors is determined by the required Level of Certainty.

The Test Statistic is either the z Score or the t Value depending on whether a z-Test or t-Test is being
performed. This will be discussed in greater detail shortly.

Critical t Value or Critical z Value


Each hypothesis test calculates Critical t or z Values. A Critical t Value is calculated for a t-Test and a
Critical z Value is calculated for a z-Test. A Critical t or z Value is the amount of difference expressed in
Standard Errors between the boundary of the Region of Rejection (the Critical Value) and hypothesized
population parameter (the Constant on the right side of the Null Hypothesis) which will be located at the
curve’s mean.
A one-tailed test has only one Critical t or z Value because the Region of Rejection is entirely contained in one outer tail. A two-tailed test has two Critical t or z Values because the Region of Rejection is split between the two outer tails.
The Test Statistic (the t Value or z Score) is compared with the Critical t or z Value on that side of the mean. If the Test Statistic is farther from the standardized mean of zero than the Critical t or z Value on that side, the Null Hypothesis is rejected.
The Test Statistic is the number of Standard Errors that the sample statistic is from the curve’s mean. The
Critical t or z Value on the same side is the number of Standard Errors that the Critical Value (the
boundary of the Region of Rejection) is from the mean. If the Test Statistic is farther from the
standardized mean of zero than the Critical t or z value, the sample statistic lies in the Region of
Rejection.

Relationship Between p Value and Alpha


Each hypothesis test calculates a p Value. The p Value is the area under the curve that is beyond the
sample statistic (the observed sample mean or proportion). The p Value is the probability that a sample of
size n with the observed sample mean or proportion could have occurred if the Null Hypothesis were true.
If, for example, the p Value of a Hypothesis Test of Mean or Proportion were calculated to be 0.0212, that would indicate that there is only a 2.12 percent chance that a sample of size n would have the observed sample mean or proportion if the Null Hypothesis were true. The Null Hypothesis states that the
population from which the sample came has the same mean as the hypothesized population. This mean
is the Constant on the right side of the Null Hypothesis.
The p Value is compared to alpha for a one-tailed test and to alpha/2 for a two-tailed test. The Null
Hypothesis is rejected if p is smaller than α for a one-tailed test or if p is smaller than α/2 for a two-tailed
test. If the p Value is smaller than α for a one-tailed test or smaller than α/2 for a two-tailed test, the
sample statistic is in the Region of Rejection.
Calculations of the Critical z Value(s) and the p Value are as follows:

Critical z Values

Critical z Value for a one-tailed test in the right tail:


Excel 2010 and beyond
Critical z Value = NORM.S.INV(1-α)
Prior to Excel 2010
Critical z Value = NORMSINV(1-α)

Critical z Value for a one-tailed test in the left tail:


Excel 2010 and beyond
Critical z Value = NORM.S.INV(α)
Prior to Excel 2010
Critical z Value = NORMSINV(α)

Critical z Values for a two-tailed test:


Excel 2010 and beyond
Critical z Values = ±NORM.S.INV(1-α/2)
Prior to Excel 2010
Critical z Values = ±NORMSINV(1-α/2)

p Value
Excel 2010 and beyond
p Value =MIN(NORM.DIST(x_bar,µ,SE,TRUE),1-NORM.DIST(x_bar,µ,SE,TRUE))
x_bar represents one of the following:
- the sample mean if this is a one-independent sample z-Test
- the difference between the sample means of a two-independent sample z-Test
- the mean difference between the paired values if this is a paired z-Test.

If the z Score (Test Statistic) is known, the p Value can be calculated more simply as follows:
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))

Prior to Excel 2010


p Value =MIN(NORMDIST(x_bar,µ,SE,TRUE),1-NORMDIST(x_bar,µ,SE,TRUE))
If the z Score (Test Statistic) is known, the p Value can be calculated more simply as follows:
p Value =MIN(NORMSDIST(z Score),1-NORMSDIST(z Score))
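As a brief illustration with a hypothetical z Score of 1.87:
p Value = MIN(NORM.S.DIST(1.87,TRUE), 1-NORM.S.DIST(1.87,TRUE)) = MIN(0.969, 0.031) = 0.031
This is the area in the single tail beyond the Test Statistic; for a two-tailed test this p Value would be compared with α/2.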

The 3 Equivalent Reasons To Reject the Null Hypothesis
The Null Hypothesis of a Hypothesis Test of Mean or Proportion is rejected if any of the following
equivalent conditions are shown to exist:
1) The sample statistic (the observed sample mean or proportion) is beyond the Critical Value. The
sample statistic would therefore lie in the Region of Rejection because the Critical Value is the boundary
of the Region of Rejection.
2) The Test Statistic (the t Value or z Score) is farther from zero than the Critical t or z Value. The Test Statistic is the number of Standard Errors that the sample statistic is from the curve's mean. The Critical t or z Value is the number of Standard Errors that the boundary of the Region of Rejection is from the curve's mean. If the Test Statistic is farther from the standardized mean of 0 than the Critical t or z Value, the sample statistic lies in the Region of Rejection.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test. The p Value is the
curve area beyond the sample statistic. α and α/2 equal the curve areas contained by the Region of
Rejection on that side for a one-tailed test and a two-tailed test respectively. If the p value is smaller than
α for a one-tailed test or α/2 for a two-tailed test, the sample statistic lies in the Region of Rejection.

Independent Samples vs. Dependent Samples


A sample that is independent of a second sample has data values that are not influenced by any of the
data values within the second sample. Dependent samples are often referred to as paired data. Paired
data are data pairs in which one of the values of each pair has an influence on the other value of the data
pair. An example of a paired data sample would be a set of before-and-after test scores from the same set of people.

Pooled vs. Unpooled Tests


A two-independent-sample Hypothesis Test of Mean can be pooled or unpooled. A pooled test can be
performed if the variances of both independent samples are similar. This is a pooled test because a
single pooled standard deviation replaces both sample standard deviations in the calculation of the
Standard Error. An unpooled test must be performed when the variances of the two independent samples
are not similar.

Type I and Type II Errors


A Type I Error is a false positive and a Type II Error is a false negative. A false positive occurs when a
test incorrectly detects a significant difference when one does not exist. A false negative occurs when a test incorrectly fails to detect a significant difference when one exists.
α (the specified Level of Significance) = a test’s probability of making a Type I Error.
β = a test’s probability of making a Type II Error.

Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β
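For example, if a test had a 20 percent probability of making a Type II Error (β = 0.20, a value chosen here purely for illustration), the Power of that test would be:
Power = 1 – β = 1 – 0.20 = 0.80
That test would therefore have an 80 percent probability of detecting a significant difference if one exists.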

Effect Size
Effect size in a t-Test or z-Test is a conventional way of expressing how large the difference between two groups is without taking into account the sample size or whether that difference is statistically significant.
Effect size for Hypothesis Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
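Although this manual does not work through an effect-size calculation here, Cohen’s d for two independent samples is commonly calculated as the difference between the two sample means divided by a pooled sample standard deviation, as in the following standard convention (which is general background rather than part of the worked examples in this manual):
Cohen’s d = (x_bar1 – x_bar2) / s_pooled
s_pooled = SQRT[ ((n1 – 1)*s1² + (n2 – 1)*s2²) / (n1 + n2 – 2) ]
Widely used rules of thumb classify a d of approximately 0.2 as a small effect, 0.5 as a medium effect, and 0.8 as a large effect.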

Nonparametric Alternatives
Nonparametric tests are not substituted for z-Tests because a z-Test (a Hypothesis test of Mean that is
performed using the normal distribution) can only be performed on large samples (n > 30). The sample
mean and therefore the Test Statistic will always be normally-distributed as per the Central Limit
Theorem.
Nonparametric tests are sometimes substituted for t-Tests because normality requirements cannot be
met. A t-Test is a Hypothesis Test of Mean that can be performed if the sample statistic (and therefore the
Test Statistic) is distributed according to the t-Distribution under the Null Hypothesis. The sample statistic
(the sample mean) is distributed according to the t-Distribution if any of the following three conditions
exist:
1) Sample size is large (n > 30). The sample taken for the hypothesis test must have at least 30 data
observations.
2) The population from which the sample was taken is verified to be normal-distributed.
3) The sample is verified to be normal-distributed.
If none of these conditions can be met or confirmed, a nonparametric test can often be substituted for a t-
Test. A nonparametric test does not have normality requirements that a parametric test such as a t-Test
does.

Hypothesis Test of Mean vs. Proportion


Hypothesis Tests covered in this section will be either Hypothesis Tests of Mean or Hypothesis Tests of
Proportion. A data point of a sample taken for a Hypothesis Test of Mean can have a range of values. A
data point of a sample taken for a Hypothesis Test of Proportion is binary; it can take only one of two
values.

Hypothesis Tests of Mean – Basic Definition


A Hypothesis Test of Mean compares an observed sample mean with a hypothesized population mean to
determine if the sample was taken from the same population. An example would be to compare a sample
of monthly sales of stores in one region to the national average to determine if mean sales from the
region (the population from which the sample was taken) is different than the national average (the
hypothesized population parameter). As stated, a sample taken for a Hypothesis Test of Mean can have
a range of values. In this case, the sales of a sampled store can fall within a wide range of values.

Hypothesis Tests of Proportion – Basic Definition
A Hypothesis Test of Proportion compares an observed sample proportion with a hypothesized
population proportion to determine if the sample was taken from the same population. An example would
be to compare the proportion of defective units from a sample taken from one production line to the
proportion of defective units from all production lines to determine if the proportion defective from the one
production line (the population from which the sample was taken) is different than from the proportion
defective of all production lines (the hypothesized population parameter). As stated, a sample taken for a
Hypothesis Test of Proportion can only have one of two values. In this case, a sampled unit from a
production line is either defective or it is not.
Hypothesis Tests of Proportion are covered in detail in a separate section in this manual. They are also
summarized at the end of the binomial distribution section.

Hypothesis Tests of Mean


Hypothesis Tests of Mean require that the Test Statistic is distributed either according to the normal
distribution or to the t-Distribution. The Test Statistic in a Hypothesis Test of Mean is derived directly from
the sample mean and therefore has the same distribution as the sample mean.

t-Test versus z-Test


A Hypothesis Test of Mean will either be performed as a z-Test or as a t-Test. When the sample mean
and therefore the Test Statistic are distributed according to the normal distribution, the hypothesis test is
called a z-Test and the Test Statistic is called the z Score. When the sample mean and therefore the
Test Statistic is distributed according to the t-Distribution, the hypothesis test is called a t-Test and the
Test Statistic is called the t Value. The Test Statistic is the number of Standard Errors that the observed
sample mean is from the hypothesized population mean.
t-Tests are covered in detail in a separate section in this manual. They are also summarized at the end of
the t-Distribution section.
z-Tests are covered in detail in a separate section in this manual. They are also summarized at the end of
this normal distribution section.

Normal Distribution of Means of Large Samples


According to the Central Limit Theorem, the means of large samples will be normal-distributed no matter
how the population from which the samples came is distributed. This is true as long as the samples are
random and the sample size, n, is large (n > 30). n equals the number of data observations that each
sample contains.
If the single sample taken for a Hypothesis Test of Mean is large (n > 30), then the means of a number of
similar samples taken from the same population would be normal-distributed as per the Central Limit
Theorem. This is true no matter how the population or the single sample are distributed.
If the single sample taken for a Hypothesis Test of Mean is small (n < 30), then the means of a number of
similar samples taken from the same population would be normal-distributed only if the population was
proven to be normal-distributed or if the sample was proven to be normal-distributed.

Requirements for a z-Test
A z-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived
from the sample mean) is normal-distributed. The sample mean and therefore the Test Statistic are
normal-distributed only when the following two conditions are both met:
1) The size of the single sample taken is large (n > 30). The Central Limit Theorem states that means of
large samples will be normal-distributed. When the size of the single sample is small (n < 30), only a t-
Test can be performed.
2) The population standard deviation, σ (sigma), is known.

Requirements for a t-Test


A t-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived from
the sample mean) is distributed according to the t-Distribution. The sample mean and therefore the Test
Statistic are distributed according to the t-Distribution when both of these conditions are met:
1) The sample standard deviation, s, is known.
2) Either the sample or the population has been verified for normality.
A t-Test can be performed when the single sample is large (n > 30) but is the only option when the size of
the single sample is small (n < 30). A z-Test can only be performed when the size of the single sample is
large (n > 30) and the population standard deviation is known.
As mentioned, a Hypothesis Test of Mean requires that the sample mean and therefore the Test Statistic
is distributed either according to the normal distribution or to the t-Distribution. The sample mean and the
Test Statistic are distributed variables that can be graphed according to the normal or t-Distribution.
The Test Statistic, which represents the number of Standard Errors that the sample mean is from the
hypothesized population mean, could be graphed on a standard normal distribution curve or a
standardized t-Distribution curve. Both of these distribution curves have their means at zero, and the length of one Standard Error is set to equal 1.
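In the notation used throughout this manual, that standardization for a one-sample test can be written as follows (see the Test Statistic calculations later in this section):
Test Statistic (z Score or t Value) = (x_bar – Constant) / SE
where SE = σ / SQRT(n) for a z-Test and SE = s / SQRT(n) for a t-Test.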

Basic Steps of a Hypothesis Test of Mean


The major steps of the simple Hypothesis Test of Mean, a one-sample z-Test, are described as follows:
1) A sample of data is taken. The sample statistic, which is the sample mean, is calculated.
2) A Null Hypothesis is created stating that the population from which the sample was taken has the same mean as a hypothesized population mean. An Alternative Hypothesis is constructed stating that the sample population’s mean is not equal to, greater than, or less than the hypothesized population mean, depending on the wording of the problem.
3) The sample mean is mapped to a normal curve that has a mean equal to the hypothesized population mean and a Standard Error calculated based upon a formula specific to the type of Hypothesis Test of Mean.
4) The Critical Values are calculated and the Regions of Acceptance and Rejection are mapped on the
normal graph that maps the distributed variable. The Critical Values represent the boundaries between
the Region of Acceptance and Region of Rejection.
5) Critical z Values, the Test Statistic (the z Score) and p Value are then calculated.
6) The Null Hypothesis is rejected if any of the following three equivalent conditions are shown to exist:
a) The observed sample mean, x_bar, is beyond the Critical Value.
b) The z Score (the Test Statistic) is farther from zero than the Critical z Value.
c) The p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
The following graph represents the final result of a typical one-sample, two-tailed z-Test. In this case the
Null Hypothesis was rejected. This result is represented as follows:

This z-Test was a two-tailed test as evidenced by the yellow Region of Rejection split between the two outer tails. In this z-Test, alpha was set to 0.05. This 5-percent Region of Rejection is split between the
two tails so that each tail contains a 2.5 percent Region of Rejection.
The mean of this non-standardized normal distribution curve is 186,000. This indicates that the Null
Hypothesis is as follows:
H0: x_bar = 186,000
Since this is a two-tailed z-Test, the Alternative Hypothesis is as follows:
H1: x_bar ≠ 186,000
This one-sample z-Test is evaluating whether the population from which the sample was taken has a
population mean that is not equal to 186,000. This is a non-directional z-Test and is therefore two-tailed.
The sample statistic is the observed sample mean of this single sample taken for this test. This observed
sample mean is calculated to be 200,000.
The boundaries of the Region of Rejection occur at 176,703 and 195,297. Everything outside of these two
points is in the Region of Rejection. These two Critical Values are 1.959 Standard Errors from the
standardized mean of 0. This indicates that the Critical z Values are ±1.959.
The graph shows that the sample statistic (the sample mean of 200,000) falls beyond the right Critical Value of 195,297 and is therefore in the Region of Rejection.
The sample statistic is 2.951 Standard Errors from the standardized mean of 0. This is further from the standardized mean of 0 than the right Critical z Value, which is 1.959.

The curve area beyond the sample statistic consists of approximately 0.16 percent of the area under the curve. This is smaller than α/2, which equals 2.5 percent of the total curve area because alpha was set to 0.05.
As the graph shows, all three equivalent conditions have been met to reject the Null Hypothesis. It can be
stated with at least 95 percent certainty that the mean of the population from which the sample was taken
does not equal the hypothesized population mean of 186,000.

Uses of Hypothesis Tests of Mean


1) Comparing the mean of a sample taken from one population with another population’s known mean to determine if the two populations have different means. An example of this would be to
compare the mean monthly sales of a sample of retail stores from one region to the national mean
monthly store sales to determine if the mean monthly sales of all stores in the one region are different
than the national mean.
2) Comparing the mean of a sample taken from one population to a fixed number to determine if
that population’s mean is different than the fixed number. An example of this might be to compare the
mean product measurement taken from a sample of a number of units of a product to the company’s claims
about that product specification to determine if the actual mean measurement of all units of that
company’s product is different than what the company claims it is.
3) Comparing the mean of a sample from one population with the mean of a sample from another
population to determine if the two populations have different means. An example of this would be to
compare the mean of a sample of daily production totals from one crew with the mean of a sample of
daily production totals from another crew to determine if the two crews have different mean daily
production totals.
4) Comparing successive measurement pairs taken on the same group of objects to determine if
anything has changed between measurements. An example of this would be to evaluate whether there is
mean difference in before-and-after tests scores of a small sample of the same people to determine if a
training program made a difference to all of the people who underwent it.
5) Comparing the same measurements taken on pairs of related objects. An example of this would
be to evaluate whether there is mean difference in the incomes of husbands and wives in a sample of
married couples to determine if there is a mean difference in the incomes of husbands and wives in all
married couples.
It is important to note that a hypothesis test is used to determine if two populations are different. The outcome of a hypothesis test is to either reject or fail to reject the Null Hypothesis. It would be incorrect to
state that a hypothesis test is used to determine if two populations are the same.

Types of Hypothesis Tests of Mean: t-Tests or z-Tests


The 4 types of t-Tests discussed in this manual are the following:
One-sample t-Test
Two-Independent-Sample, Pooled t-Test
Two-Independent-Sample, Unpooled t-Test
Two-Dependent-Sample (Paired) t-Test
The 3 types of z-Tests discussed here are the following:
One-sample z-Test
Two-independent-Sample, Unpooled z-Test
Two-Dependent-Sample (Paired) z-Test

Detailed discussions of each of the three types of z-Tests along with examples in Excel are as follows:

1) One-Sample z-Test in Excel

Overview
This hypothesis test determines whether the mean of the population from which the sample was taken is
equal to (two-tailed test), or else greater than or less than (one-tailed test), a constant. This constant is often the known mean of a population from which the sample may have come. The constant is the
constant on the right side of the Null Hypothesis.
x_bar = Observed Sample Mean

Null Hypothesis H0: x_bar = Constant


The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar is beyond the Critical Value.
2) The z Score (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of a One-Sample, Two-Tailed z-Test in Excel


This problem is very similar to the problem solved in the t-test section for a one-sample, one-tailed t-test.
Similar problems were used in each of these sections to show the similarities and also contrast the
differences between the one-sample z-Test and t-test as easily as possible.
This problem compares average monthly sales from one fast food chain’s retail stores in one region with
the average monthly sales of all of the fast food chain’s retail stores in the entire country. The region being
evaluated has more than 1,000 very similar stores. The national mean monthly retail store sales equals
$186,000. The standard deviation of the monthly sales for the entire population of stores is $30,000.
Determine with at least 95% certainty whether the average monthly sales of all of the fast food chain’s
stores in the one region is different than the national average monthly sales of all of the fast food chain’s
stores.

The data sample of sales for the month for a random sample of 40 retail stores in a region is as follows:

Summary of Problem Information


x_bar = sample mean = AVERAGE() = 200,000
µ = national (population) mean = 186,000
α = 1-Level of Certainty Required = 1 – 0.95 = 0.05
s = sample standard deviation = Not Known and not needed for a z-Test
σ (Greek letter “sigma”) = population standard deviation = 30,000
n = sample size = COUNT() = 40

SE = Standard Error = σ / SQRT(n) = 30,000 / SQRT(40) = 4,743
Note that this calculation of the Standard Error using the population standard deviation, σ, is the true
Standard Error. If the sample standard deviation, s, were used in place of σ, the Standard Error calculated
would be an estimate of the true Standard Error. The z-Test requires the population standard deviation
but the t-test uses the sample standard deviation as an estimate of the population standard deviation.
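If, for example, the 40 sales figures occupied the cell range A2:B21 (a hypothetical layout; the actual workbook may arrange the data differently), the summary values above could be calculated with Excel formulas such as the following:
x_bar =AVERAGE(A2:B21) = 200,000
n =COUNT(A2:B21) = 40
SE =30000/SQRT(COUNT(A2:B21)) = 4,743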
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1) Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Mean because each individual observation (each sampled monthly retail
store sales figure) within the sample can have a wide range of values. Data points for Hypothesis Tests of
Proportion are binary: they can take only one of two possible values.

b) One-Sample or a Two-Sample Test?


This is a one-sample hypothesis test because only one sample containing monthly sales figures from
forty stores has been taken and is being compared to the national monthly retail store average for the
same month.

c) Independent (Unpaired) Test or a Dependent (Paired) Test?


It is neither. The designation of “paired” or “unpaired” applies only for two-sample hypothesis tests.

d) One-Tailed or Two-Tailed Hypothesis?


The problem asks to determine whether the forty-store monthly average is simply different than the
national average. This is a non-directional inequality making this hypothesis test a two-tailed test. If the
problem asked whether the forty-store average was either higher or was lower, the inequality would be
directional and the resulting hypothesis test would be a one-tailed test. A two-tailed test is more stringent
than a one-tailed test.

e) t-Test or z-Test?
A hypothesis test of means can be performed if the distribution of the Test Statistic under the Null
Hypothesis can be approximated by either the normal distribution or the t-Distribution.
A z-Test is a statistical test in which the distribution of the Test Statistic under the Null Hypothesis can be
approximated by the normal distribution. A t-test is a statistical test in which the distribution of the Test
Statistic under the Null Hypothesis can be approximated by the t-Distribution.
This hypothesis test of mean can be performed as a z-Test because the sample size is large (n = 40) and the
population standard deviation (σ = 30,000) is known. The large sample size and known population
standard deviation ensure that the distribution of the sample mean (and therefore Test Statistic, which is
derived from the sample mean) can be approximated by the normal distribution under the Null
Hypothesis.
It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All z-
Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software
packages including the well-known SPSS provide only t-Tests and no direct z-Tests.
This hypothesis test is a z-Test that is a one-sample, two-tailed hypothesis test of mean as long as
all required assumptions have been met.

Question 2) Test Requirements Met?


a) Normal Distribution of Test Statistic
The normal distribution can be used to map the distribution of the sample mean (and therefore the Test
Statistic, which is derived from the sample mean) only if the following conditions exist:

1) Population standard deviation, σ, is known


Population standard deviation, σ, is one of the two required parameters needed to fully describe a unique
normal distribution curve and must therefore be known in order to perform a z-Test (which uses the
normal distribution).
and

2) Sample size is large (n > 30)


The Central Limit Theorem states that if a number of large, random samples of the same size were taken
from the same population, the means of the samples would be normal-distributed. If the sample mean is
normal-distributed, the Test Statistic, which equals (sample mean – Constant ) / SE, will also be normal-
distributed because it is derived from the sample mean.
If sample size is large, neither the normality of the population nor the normality of the sample data has to
be confirmed.
The sample mean and therefore the Test Statistic are normal-distributed because the sample size is large (n =
40) and the population standard deviation (σ = 30,000) is known.

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four
steps are as follows:

Step 1) Create the Null Hypothesis and the Alternative Hypothesis

Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis

Step 3 – Map the Regions of Acceptance and Rejection

Step 4 – Determine Whether to Reject or Fail to Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical z Value Test
Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality that states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average monthly sales of all stores in the region (the
population from which the forty-store sample was taken) is not different than the national monthly store
average sales, µ, which is $186,000. We will use the variable x_bar to represent the sample mean of the
forty stores. The Null Hypothesis is as follows:
H0: x_bar = Constant = 186,000
The Constant is quite often the known population mean, µ, to which the sample mean is being compared.
The Alternative Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether the average monthly sales of all stores in the
region (the population from which the forty-store sample was taken) is merely different than the national
monthly store average sales, µ, which is $186,000.
The Alternative Hypothesis is as follows:
H1: x_bar ≠ Constant
H1: x_bar ≠ 186,000
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the
hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous
(requires a greater difference between the two entities being compared before the test shows that there
is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the means of the populations from
which the samples were taken. A one-sample z-Test determines whether to reject or fail to reject the Null Hypothesis, which states that the population from which the sample was taken (the entire region) has a mean equal to the Constant. The Constant in this case is equal to the known national average.
Parameters necessary to map the distributed variable, x_bar, are the following:
σ (Greek letter “sigma”) = population standard deviation = 30,000
n = sample size = COUNT() = 40

SE = Standard Error = σ / SQRT(n) = 30,000 / SQRT(40) = 4,743


These parameters used to map the distributed variable, x_bar, to the normal distribution are as follows:

Step 2 – Map the Distributed Variable to Normal Distribution


A z-Test can be performed if the sample mean and the Test Statistic (the z Score) are distributed according to the normal distribution. If the sample size is large and the population standard deviation is known, the sample mean and the closely-related Test Statistic are distributed according to the normal distribution.
The sample mean x_bar is distributed according to the normal distribution. The distributed variable in this case is the sample mean, x_bar.
Mapping this distributed variable to a normal distribution curve is shown as follows:

This non-standardized normal Distribution curve has its mean set to equal the Constant taken from the
Null Hypothesis, which is:
H0: x_bar = Constant = 186,000
This non-standardized normal Distribution curve is constructed from the following parameters:
Mean = 186,000
Standard Error = 4,743
Distributed Variable = x_bar
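One way to generate such a non-standardized normal distribution curve in Excel (not necessarily the method used to produce the graphs in this manual) is to fill a column with closely spaced x values around the mean and calculate the height of the curve at each x value with the normal probability density function:
Curve height at x =NORM.DIST(x, 186000, 4743, FALSE)
For example, =NORM.DIST(186000, 186000, 4743, FALSE) returns approximately 0.0000841, the height of the curve at its mean.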

Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different the sample mean, x_bar = $200,000, is from the national average of
$186,000.
The non-standardized normal distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Regions of Rejection. A two-tailed test has the Region of Rejection split between the
two outer tails. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical
Value.
If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is
rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null
Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve.
This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve
that is the Region of Rejection.
The boundaries between the Region of Acceptance and the Regions of Rejection are called Critical Values. The locations of these Critical Values need to be calculated.

Calculate Critical Values


A Critical Value is the boundary between a Region of Acceptance and a Region of Rejection. In the case
of a two-tailed test, the Region of rejection is split between two outer tails. There are therefore two Critical
Values.
The Critical Value is the boundary on either side of the curve beyond which 2.5 percent of the total area
under the curve exists. In this case both Critical Values can be found by the following:
Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± NORM.S.INV(1-α/2) * SE
Critical Values = 186,000 ± NORM.S.INV(1 - 0.05/2) * 4,743
Critical Values = 186,000 ± NORM.S.INV(0.975) * 4,743
Critical Values = 186,000 ± 9,296
Critical Values = 176,703 and 195,297
The Region of Rejection is therefore everything that is to the right of 195,297 and everything to the left of
176,703.
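Entered directly into worksheet cells, the two Critical Values could also be calculated with formulas such as the following (returning approximately 176,703 and 195,297; small differences are due to rounding of the Standard Error):
Left Critical Value =186000-NORM.S.INV(1-0.05/2)*4743
Right Critical Value =186000+NORM.S.INV(1-0.05/2)*4743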

An Excel-generated distribution curve with the blue Region of Acceptance and the yellow Regions of Rejection is shown as follows:

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that determine whether to reject or fail to reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:
1) Compare x_bar With Critical Value
Reject the Null Hypothesis if the sample mean, x_bar = 200,000, falls into the Region of Rejection.
Equivalently, reject the Null Hypothesis if the sample mean, x_bar, is further from the curve’s mean of
186,000 than the Critical Value.
The Critical Values have been calculated to be 176,703 on the left and 195,297 on the right. x_bar
(200,000) is further from the curve mean (186,000) than the right Critical Value (195,297). The Null
Hypothesis would therefore be rejected.

2) Compare the z Score With Critical z Value


The z Score corresponds to the standardized value of the sample mean, x_bar = 200,000. The z Score is
the number of Standard Errors that x_bar is from the curve’s mean of 186,000.
The Critical z Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the z Score is farther from the standardized mean of zero than the Critical z
Value.


The Constant is the Constant from the Null Hypothesis (H0: x_bar = Constant = 186,000)
Z Score (Test Statistic) = (x_bar – Constant) / SE
Z Score (Test Statistic) = (200,000 – 186,000)/4,743
Z Score (Test Statistic) = 2.951
This means that the sample mean, x_bar, is 2.951 Standard Errors from the curve mean (186,000).

Two-tailed Critical z Values = ±NORM.S.INV(1-α/2)


Two-tailed Critical z Values = ±NORM.S.INV(1-0.05/2)
Two-tailed Critical z Values = ±NORM.S.INV(0.975) = ±1.9599
This means that the Region of Rejection for this two-tailed hypothesis test begins, in either tail, at 1.9599 Standard Errors to the left of and to the right of the standardized mean of zero. Equivalently, the boundaries of the Region of Rejection are 1.9599 Standard Errors from the curve mean (186,000) on each side.
The Null Hypothesis is rejected because the z Score (+2.951) is further from the standardized mean of
zero than the Critical z Values (±1.9599). This is another indication that x_bar (200,000) is in the Region
of Rejection.

3) Compare the p Value With Alpha


The p Value is the percent of the curve that is beyond x_bar (200,000). If the p Value is smaller than
Alpha/2, the Null Hypothesis is rejected.
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))
p Value =MIN(NORM.S.DIST(2.951,TRUE),1-NORM.S.DIST(2.951,TRUE))
p Value = 0.0016

The p Value (0.0016) is smaller than Alpha/2 (0.025), the Region of Rejection in the right tail, and we therefore
reject the Null Hypothesis. The following Excel-generated graph shows that the red p Value (the curve
area beyond x_bar) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection split
between both outer tails.

Excel Formula Shortcut to Performing a One-Sample z-Test


This problem could also be quickly solved with the following Excel z-Test formula:
p Value =MIN(Z.TEST(array,Constant,σ),1-Z.TEST(array,Constant,σ))
It should be noted that when the observed sample mean is greater than the Constant, the MIN formula above returns
p Value = Z.TEST(array,Constant,σ)
When the observed sample mean is less than the Constant, the MIN formula above returns
p Value = 1-Z.TEST(array,Constant,σ)
The Constant is taken from the Null Hypothesis and is in this example equal to 186,000.
The Null Hypothesis is as follows:
H0: x_bar = Constant = 186,000

The Excel z-Test formula produces the p Value as follows:

Note that the array can be spread across two columns as is done here. The array does not have to be
entirely contained in a single column in this case.
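As an illustration, if the 40 sales figures occupied the hypothetical cell range A2:B21, the formula for this example would be the following, which returns approximately 0.0016 and agrees with the p Value calculated earlier:
p Value =MIN(Z.TEST(A2:B21,186000,30000),1-Z.TEST(A2:B21,186000,30000))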

2) Two-Independent-Sample, Unpooled z-Test in Excel

Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the
two sample means (x_bar1 and x_bar2) is equal to (two-tailed test), or else greater than or less than (one-tailed test), a constant.
This is an unpooled test. An unpooled test can always be used in place of a pooled test. An unpooled test
must be used when population variances are not similar. An unpooled test calculates the Standard Error
using separate standard deviations instead of combining them into a single, pooled standard deviation as
a pooled test does.
In the real world, only the sample variances are known; the population variances are usually not known. For this reason, t-Tests are nearly always used to perform a two-independent-sample hypothesis test of mean, and only the unpooled, two-independent-sample z-Test will be covered here. The pooled version of this z-Test will not be covered. The unpooled z-Test can always be used in place of the pooled z-Test that could be used if the population variances were known to be similar enough.
x_bar1 - x_bar2 = Observed difference between the sample means

SE = SQRT[ (var1/n1) + (var2/n2) ]
Note that this is the same formula for SE as for the two-independent-sample, unpooled t-test except that the variances for the z-Test are the population variances as follows:
var1 = σ1²
var2 = σ2²
and not the sample variances used for the t-test as follows:
var1 = s1²
var2 = s2²

Null Hypothesis H0: x_bar1 - x_bar2 = Constant


The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar1 - x_bar2 is beyond the Critical Value.
2) The z Score (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of 2-Sample, 2-Tailed, Unpooled z-Test in Excel
This problem is very similar to the problem solved in the t-test section for a two-independent-sample, two-
tailed t-test. Similar problems were used in each of these sections to show the similarities and also
contrast the differences between the two-independent-sample z-Test and t-test as easily as possible.
Two shifts on a production line are being compared to determine if there is a difference in the average daily
number of units produced by each shift. The two shifts operate eight hours per day under nearly identical
conditions that remain fairly constant from day to day. A sample of the total number of units produced by
each shift on a random selection of days is taken. Determine with a 95 percent Level of Confidence if
there is a difference between the average daily number of units produced by the two shifts.
Note that when performing two-sample z-tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the two-independent-sample z-Test will be more intuitive if the sample group with the larger
mean is designated as the first sample and the sample group with the smaller mean is designated as the
second sample.
Details about both data samples are shown as follows:

Summary of Problem Information


Sample Group 1 – Shift A (Variable 1)
x_bar1 = sample1 mean = 46.55
µ1 (Greek letter “mu”) = population mean from which Sample 1 was drawn = Not Known
σ1 (Greek letter “sigma”) = population standard deviation from which Sample 1 was drawn = 25.5
Var1 = population1 variance = σ1² = 650.25
n1 = sample1 size = 40

Sample Group 2 – Shift B (Variable 2)


x_bar2 = sample2 mean = 42.24
µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known
σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was taken = 11.2
Var2 = population2 variance = σ2² = 125.44
n2 = sample2 size = 36
x_bar1 - x_bar2 = 46.55 – 42.24 = 4.31
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As mentioned, always designate Sample 1 (Variable 1) to be the sample with the larger mean when
performing two-sample z-Tests in Excel.
The results of the Unpooled z-Test will be more intuitive if the sample group with the larger mean is
designated as the first sample and the sample group with the smaller mean is designated as the second
sample.
Another reason for designating the sample group with the larger mean as the first sample is to obtain the
correct result from the Excel data analysis tool for two-independent-sample, unpooled z-Tests called the
z-Test: Two Sample for Means. The test statistic (z in the Excel output, which stands for z Score) and the Critical z Value (z Critical in the Excel output) will have the same sign (as they always should) only if the sample group with the larger mean is designated as the first sample.
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1) Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test
The initial two questions that need to be answered before performing the Four-Step Hypothesis Test of
Mean are as follows:

Question 1) What Type of Test Should Be Done?


a) Hypothesis Test of Mean or Proportion?
This is a test of mean because each individual observation (each sampled shift’s output) within each of
the two sample groups can have a wide range of values. Data points for tests of proportion are binary:
they can take only one of two possible values.

b) One-Sample or Two-Sample Test?


This is a two-sample hypothesis test because two independent samples are being compared with each
other. The two sample groups are the daily units produced by Shift A and the daily units produced by Shift
B.

c) Independent (Unpaired) Test or Dependent (Paired) Test?


It is an unpaired test because data observations in each sample group are completely unrelated to data
observations in the other sample group. The designation of “paired” or “unpaired” applies only for two-
sample hypothesis tests.

d) One-Tailed or Two-Tailed Test?


The problem asks to determine whether there is a difference in the average number of daily units
produced by Shift A and by Shift B. This is a non-directional inequality making this hypothesis test a two-
tailed test. If the problem asked to determine whether Shift A really does have a higher average than Shift
B, the inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-
tailed test is more stringent than a one-tailed test.

e) t-Test or z-Test?
A z-Test is a statistical test in which the distribution of the Test Statistic under the Null Hypothesis can be
approximated by the normal distribution.

The Test Statistic is distributed by the normal distribution if both samples are large and both population standard deviations are known. Both samples are considered to be large samples because both sample sizes (n1 = 40 and n2 = 36) exceed 30. Both population standard deviations (σ1 = 25.5 and σ2 = 11.2) are known.
Because both sample sizes (n1 = 40 and n2 = 36) exceed 30, both sample means are therefore normal-
distributed as per the Central Limit Theorem. The difference between two normally-distributed sample
means is also normal-distributed. The Test Statistic is derived from the difference between the two means
and is therefore normal-distributed. A z-Test can be performed if the Test Statistic is normal-distributed.
It should be noted that a two-independent-sample, unpooled t-Test can always be used in place of a two-independent-sample, unpooled z-Test. All z-Tests can be replaced by their equivalent t-Tests. As a result, some
major commercial statistical software packages including the well-known SPSS provide only t-Tests and
no direct z-Tests.

f) Pooled or Unpooled z-Test?


A pooled z-Test can be performed if the variances of both populations are similar, i.e., one population’s
standard deviation is no more than twice as large as the other population’s standard deviation. An
unpooled z-Test must be performed otherwise.
An unpooled z-Test can always be performed in the place of a pooled z-Test. Excel only provides a tool
and formula for an unpooled z-test but not a pooled z-Test. For this reason the only type of two-
independent-sample z-Test covered in this section will be the unpooled one.
t-Tests can always be performed in place of z-Tests. Excel does have separate tools and formulas for
pooled and unpooled, two-independent-sample t-Tests.
This hypothesis test is a z-Test that is a two-independent-sample, unpooled, two-tailed hypothesis
test of mean as long as all required assumptions have been met.

Question 2) Test Requirements Met?


a) Normal Distribution of Both Sample Means
The normal distribution can be used to map the distribution of the difference of the sample means (and
therefore the Test Statistic, which is derived from this difference) only if the following conditions exist:

1) Both Population Standard Deviations, σ1 and σ2, Are Known


Those values are σ1 = 25.5 and σ2 = 11.2. Population standard deviation, σ, is one of the two required
parameters needed to fully describe a unique normal distribution curve and must therefore be known in
order to perform a z-Test (which uses the normal distribution).
and

2) Both sample sizes are large (n > 30).


Because both sample sizes (n1 = 40 and n2 = 36) exceed 30, both sample means are therefore normal-
distributed as per the Central Limit Theorem. The difference between two normally-distributed sample
means is also normal-distributed. The Test Statistic is derived from the difference between the two means
and is therefore normal-distributed.
The distributions of both samples and populations do not have to be verified because both sample means
are known to be normal-distributed as a result of the large sample sizes.
The difference between the sample means and therefore the Test Statistic are normal-distributed
because both samples are large and both population standard deviations are known.

We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four
steps are as follows:
Step 1) Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Reject or Fail to Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical z Value Test
Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average daily numbers of units produced by the two shifts are the same. We will use the variable x_bar1-x_bar2 to represent the difference between the means of the two groups. If the mean daily production totals for both groups are the same, then the difference between the two means,
x_bar1-x_bar2, would equal zero. The Null Hypothesis is as follows:
H0: x_bar1-x_bar2 = Constant = 0
The Alternative Hypothesis is always an inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the first mean (x_bar1) is different than the
second mean (x_bar2). The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ Constant
H1: x_bar1-x_bar2 ≠ 0
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the
hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous
(requires a greater difference between the two entities being compared before the test shows that there
is a difference) than a one-tailed test.
Parameters necessary to map the distributed variable, x_bar1-x_bar2, to the normal distribution are the
following:

Step 2 – Map the Distributed Variable on a Normal Distribution Curve
H0: x_bar1-x_bar2 = Constant = 0
n1 = 40
n2 = 36
Var1 = σ1² = (25.5)² = 650.25
Var2 = σ2² = (11.2)² = 125.44

Unpooled Population Standard Error

SE = SQRT[ (Var1/n1) + (Var2/n2) ]


SE = SQRT[ (650.25/40) + (125.44/36) ]
SE = 4.443
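In a worksheet cell, this unpooled Standard Error could be calculated directly with a formula such as the following:
=SQRT(650.25/40 + 125.44/36) = 4.443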

This non-standardized normal distribution curve has its mean set to equal the Constant taken from the
Null Hypothesis, which is:
H0: x_bar1-x_bar2 = Constant = 0
This non-standardized normal distribution curve is constructed from the following parameters:
Mean = 0
Standard Error = 4.443
Distributed Variable = x_bar1-x_bar2
Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bar1 is from x_bar2 by showing how different x_bar1-x_bar2 (4.31) is
from zero.
The non-standardized normal distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.
If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a Region of Rejection, the
Null Hypothesis is rejected. If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a
Region of Acceptance, the Null Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve. The operator in the Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed, which outer tail contains the Region of Rejection. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ 0
A “not equal” operator indicates that this will be a two-tailed test. This means that the 5 percent Region of Rejection is split between both outer tails, so each outer tail contains 2.5 percent of the total area under the curve.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.

Calculate the Critical Values

Two-Tailed Critical Values


Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± NORM.S.INV(1-α/2) * SE
Critical Values = 0 ± NORM.S.INV(1 - 0.05/2) * 4.443
Critical Values = 0 ± NORM.S.INV(0.975) * 4.443
Critical Values = 0 ± 8.708
Critical Values = -8.708 and 8.708
The Region of Rejection is therefore everything that is to the right of 8.708 and everything to the left of -
8.708.

The following Excel-generated distribution curve with the blue Region of Acceptance and the yellow
Regions of Rejection is shown as follows:

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that determine whether to reject or fail to reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:

1) Compare x_bar1-x_bar2 With Critical Value


Reject the Null Hypothesis if the difference between the sample means, x_bar1-x_bar2 = 4.31, falls into the Region of Rejection. Fail to reject the Null Hypothesis if x_bar1-x_bar2 = 4.31 falls into the Region of Acceptance.
Equivalently, reject the Null Hypothesis if x_bar1-x_bar2 is further from the curve’s mean of 0 than the Critical Value. Fail to reject the Null Hypothesis if x_bar1-x_bar2 is closer to the curve’s mean of 0 than the Critical Value.
The Critical Values have been calculated to be -8.708 on the left and +8.708 on the right. x_bar1-x_bar2 (4.31) is closer to the curve mean (0) than the right Critical Value (+8.708). The Null Hypothesis would therefore not be rejected.

2) Compare the z Score with the Critical z Value
The z Score is the number of Standard Errors that x_bar1-x_bar2 (4.31) is from the curve’s mean of 0.
The Critical z Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the z Score is farther from the standardized mean of zero than the Critical z
Value. Fail to reject the Null Hypothesis if the z Score is closer to the standardized mean of zero than the
Critical z Value.

The Constant is the Constant from the Null Hypothesis (H0: x_bar1-x_bar2 = Constant = 0)
Z Score (Test Statistic) = (x_bar1 - x_bar2 – Constant) / SE
Z Score (Test Statistic) = (4.31 – 0)/4.443
Z Score (Test Statistic) = 0.97
This means that the difference between the sample means, x_bar1-x_bar2 (4.31), is 0.97 Standard Errors from the curve mean (0).
Two-tailed Critical z Values = ±NORM.S.INV(1-α/2)
Two-tailed Critical z Values = ±NORM.S.INV(1-0.05/2)
Two-tailed Critical z Values = ±NORM.S.INV(0.975) = ±1.9599
This means that the boundaries between the Region of Acceptance and the Region of Rejection are
1.9599 standard errors from the curve mean on each side since this is a two-tailed test.
The Null Hypothesis is not rejected because the z Score (+0.97) is closer to the standardized mean of
zero than the Critical z Value on the right side (+1.9599).

3) Compare the p Value With Alpha


The p Value is the percent of the curve that is beyond x_bar1-x_bar2 (4.31). If the p Value is smaller than
Alpha/2, the Null Hypothesis is rejected. If the p Value is larger than Alpha/2, the Null Hypothesis is not
rejected.
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))
p Value =MIN(NORM.S.DIST(0.97,TRUE),1-NORM.S.DIST(0.97,TRUE))
p Value = 0.1660
The p Value (0.1660) is larger than Alpha/2 (0.025), the Region of Rejection in the right tail, and we therefore
do not reject the Null Hypothesis.

The following Excel-generated graph shows that the red p Value (the curve area beyond x_bar1-x_bar2) is
larger than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails.

Excel Data Analysis Tool Shortcut


This two-independent-sample, unpooled z-Test can be solved much quicker using the following Excel
data analysis tool:
z-Test: Two Sample For Means. This tool uses the formulas for an unpooled, two-sample z-Test as shown above. This tool can be accessed by clicking Data Analysis under the Data tab. The entire Data
Analysis Toolpak is an add-in that ships with Excel but must first be activated by the user before it is
available. This tool calculates the z Score and p Value using the same equations as shown.
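For this example, the entries in the tool’s dialogue box would be along the following lines (the exact cell ranges depend on where each sample’s data is placed in the worksheet):
- Variable 1 Range: the column containing Shift A’s daily production figures
- Variable 2 Range: the column containing Shift B’s daily production figures
- Hypothesized Mean Difference: 0
- Variable 1 Variance (known): 650.25
- Variable 2 Variance (known): 125.44
- Labels: checked only if the selected ranges include header cells
- Alpha: 0.05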

Note that this tool requires that all data in each sample group be placed in a single column. In the
following image, only the first 19 data points of each sample are shown.

The completed dialogue box for this tool which produced the preceding output is as follows:

3) Paired (Two-Sample Dependent) z-Test in Excel

Overview
This hypothesis test determines whether the mean of a sample of differences between pairs of data
(x_bardiff) is equal to (two-tailed test), or else greater than or less than (one-tailed test), a constant.
Before-and-after fitness levels of individuals undergoing a training program would be an example of
paired data. The sample evaluated would be the group of differences between the before-and-after
scores of the individuals. This is called the difference sample.
The t-test is nearly always used instead of a z-Test to perform a two-dependent-sample (paired)
hypothesis test of mean. The z-Test requires that the population standard deviation of the differences between the pairs be known. The sample standard deviation of the difference sample is readily available, but the population standard deviation of the differences is usually not known. The t-test requires only that the sample standard deviation of the sample of paired differences be known.
x_bardiff = difference sample mean
Null Hypothesis H0: x_bardiff = Constant

Example of Paired, 1-Tailed (Left-Tail) z-Test in Excel


This problem is very similar to the problem solved in the t-test section for a paired, one-tailed t-test.
Similar problems were used in each of these sections to show the similarities and also contrast the
differences between the paired z-Test and t-test as easily as possible.
A new clerical program was introduced to a large company with the hope that clerical errors would be
reduced. 5,000 clerical workers in the company underwent the training program. 50 Clerical employees
who underwent the training were randomly selected. The average number of clerical errors that each of
these 50 employees made per month for six months prior to the training and also for six months following
the training were recorded. Each of the 50 employees had a similar degree of clerical experience within
the company and performed nearly the same volume and type of clerical work in the before and after
months. The standard deviation of the after-before differences for all 5,000 employees who underwent the
training is known to be 6.4.
Based upon the results of the 50 sampled clerical employees, determine with 95 percent certainty
whether the average number of monthly clerical mistakes was reduced for the entire 5,000 clerical
employees who underwent the training.
It is the difference that we are concerned with. A hypothesis test will be performed on the sample of
differences. The distributed variable will be designated as x_bardiff and will represent the average difference between the After and Before samples.

x_bardiff was calculated by subtracting the Before measurement from the After measurement. This is the
intuitive way to determine if a reduction in error occurred.
This problem illustrates why the t-test is nearly always used instead of a z-Test to perform a two-
dependent-sample (paired) hypothesis test of mean. The z-Test requires that the population standard deviation of the differences between the pairs be known. This is rarely the case, but it is given for this problem so that a paired z-Test can be used. The t-test requires only that the sample standard deviation of the sample of paired differences be known.


Summary of Problem Information
x_bardiff = sample mean =AVERAGE() = -2.14
σdiff = population standard deviation = 6.4
n = sample size = number of pairs = COUNT() = 40
SEdiff = Standard Error = σdiff / SQRT(n) = 6.4 / SQRT(40) = 1.01
Note that this calculation of the Standard Error, SEdiff, using the population standard deviation, σdiff, is the
true Standard Error. If the sample standard deviation, sdiff, were used in place of σdiff, the Standard Error
calculated would be an estimate of the true Standard Error. The z-Test requires the population standard
deviation of the paired differences but the t-test uses the sample standard deviation as an estimate of the
population standard deviation of the paired differences.
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
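These summary values can be reproduced directly in a worksheet. The following is a minimal sketch that assumes the 40 After-minus-Before differences have been placed in the hypothetical cell range A2:A41; the results shown are the values expected for this problem's data:
x_bardiff =AVERAGE(A2:A41) = -2.14
n =COUNT(A2:A41) = 40
SEdiff =6.4/SQRT(COUNT(A2:A41)) = 1.01
Alpha =1-0.95 = 0.05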
The Excel data analysis tool Descriptive Statistics is not employed when the z-Test is used. Descriptive Statistics should only be used if a t-Test will be performed. The Standard Deviation and Standard Error calculated by Descriptive Statistics are based upon the sample standard deviation. The z-Test uses the population standard deviation instead of the sample standard deviation used by the t-Test.
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


What Type of Test Should Be Done?
Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Mean


Step 1 – Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Perform the Critical Value Test, the p Value Test, or the Critical t Value Test

The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean Are As Follows:

Question 1) What Type of Test Should Be Done?


a) Hypothesis Test of Mean or Proportion?
This is a test of mean because each individual observation (each sampled difference) within the sample
can have a wide range of values. Data points for tests of proportion are binary: they can take only one of
two possible values.

b) One-Sample or Two-Sample Test?


This is a two-sample hypothesis test because the data exists in two groups of measurements. One
sample group contains Before measurements and the other sample group contains After measurements.

c) Independent (Unpaired) Test or Dependent (Paired) Test?


This is a paired (dependent) hypothesis test because each Before observation has a related After
observation made on the same person.

d) One-Tailed or Two-Tailed Test?


The problem asks to determine whether there has been a reduction in clerical mistakes from Before to
After. This is a directional inequality making this hypothesis test a one-tailed test. This one-tailed test will
be in the left tail because the Alternative Hypothesis, which will be created shortly, will use the “less than”
operator.
If the problem asked whether Before and After were simply different, the inequality would be non-
directional and the resulting hypothesis test would be a two-tailed test. A two-tailed test is more stringent
than a one-tailed test.

e) t-Test or z-Test?
A z-Test can be performed if the Test Statistic’s distribution can be approximated by the normal
distribution under the Null Hypothesis. The Test Statistic’s distribution can be approximated by the normal
distribution only if the difference sample size is large (n > 30) and the population standard deviation, σ, is
known. A t-Test must be used in all other cases.
Sample size, n, equals 40 and population standard deviation, σ, equals 6.4 so both conditions are met for
the z-Test.
It should be noted that a paired t-Test can always be used in place of a paired z-Test. All z-Tests can be
replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages
including the well-known SPSS provide only t-Tests and no direct z-Tests.
This hypothesis test is a z-Test that is a two-sample, paired (dependent), one-tailed hypothesis test
of mean.

Question 2) Test Requirements Met?


a) Test Statistic Distributed According to Normal Distribution
A z-Test can be performed if the distribution of the Test Statistic can be approximated under the Null
Hypothesis by the normal distribution. The Test Statistic is derived from the mean of the difference
sample and therefore has the same distribution that the difference sample mean would have if multiple
similar samples were taken from the same population of differences between data sample pairs.

The difference sample size indicates how to determine the distribution of the difference sample mean and
therefore the distribution of the Test Statistic. As per the Central Limit Theorem, as the difference sample
size increases, the distribution of the difference sample means converges to the normal distribution.
In actuality, the standardized sample mean follows the t-Distribution when the sample standard deviation is used in place of the population standard deviation. The t-Distribution converges to the standard normal distribution as sample size increases. The t-Distribution
nearly exactly resembles the standard normal distribution when sample size exceeds 30. The sample
mean’s distribution can therefore be approximated by the normal distribution. The Test Statistic’s
distribution can therefore be approximated by the normal distribution because the Test Statistic is derived
from the sample mean.
As per the Central Limit Theorem, the Test Statistic’s distribution can be approximated by the normal
distribution when the difference sample size is large regardless of the distribution of population from
which the sample was drawn. There is also no need to verify the normality of the difference sample, as
would be the case with a t-Test when population distribution is not known.
We can now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These
four steps are as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that there is no difference between the before and after data. We will use the variable x_bardiff to represent the mean of the differences between the before and after measurements. The
Null Hypothesis is as follows:
H0: x_bardiff = Constant = 0
The Alternative Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether there has been a reduction in clerical errors,
i.e., the After measurements are, on average, smaller than the Before measurements. The Alternative
Hypothesis is as follows:
H1: x_bardiff < Constant
H1: x_bardiff < 0
The Alternative Hypothesis is directional (“greater than” or “less than” instead of “not equal,” which is non-
directional) and the hypothesis test is therefore a one-tailed test. The “less than” operator indicates that
this is a one-tailed test with the Region of Rejection (the alpha region) entirely contained in the left tail. A
“greater than” operator would indicate a one-tailed test focused on the right tail.
It should also be noted that a two-tailed test is more rigorous (requires a greater difference between the
two entities being compared before the test shows that there is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the means of the population of
paired differences from which the difference samples were taken. A population of paired differences
would be the differences of data pairs in a population of data pairs.
A paired z-Test determines whether to reject or fail to reject the Null Hypothesis, which states that the
population of paired differences from which the difference sample was taken has a mean equal to the
Constant. The Constant in this case is equal to 0. This means that the Null Hypothesis states that the
average difference between data pairs of an entire population from which the sample of data pairs were
drawn is zero.

Parameters necessary to map the distributed variable, x_bardiff , to the normal distribution are the
following:
x_bardiff = sample mean =AVERAGE() = -2.14
σdiff = population standard deviation = 6.4
n = sample size = number of pairs = COUNT() = 40
SEdiff = Standard Error = σdiff / SQRT(n) = 6.4 / SQRT(40) = 1.01
These parameters are used to map the distributed variable, x_bardiff, to the Excel-generated normal
distribution curve as follows:

Step 2 – Map Distributed Variable to Normal Distribution Curve


A z-Test can be performed if the difference sample mean and the Test Statistic (the z Score) are distributed according to the normal distribution. Because the difference sample size is large (n > 30) and the population standard deviation of the differences is known, the difference sample mean and the closely-related Test Statistic can be approximated by the normal distribution, as discussed above.
The variable x_bardiff is therefore distributed according to the normal distribution. Mapping this distributed variable to a normal distribution curve is shown as follows:

This non-standardized normal distribution curve has its mean set to equal the Constant taken from the Null
Hypothesis, which is:
H0: x_bardiff = Constant = 0
This non-standardized normal distribution curve is constructed from the following parameters:
Curve Mean = Constant = 0
Standard Errordiff = 1.01
Distributed Variable = x_bardiff

Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bardiff (-2.14) is from the hypothesized mean of 0.
The above non-standardized normal distribution curve that maps the distribution of the variable x_bardiff can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical Value.
If x_bardiff's value of -2.14 falls in the Region of Acceptance, we fail to reject the Null Hypothesis. If
x_bardiff’s value of -2.14 falls in the Region of Rejection, we can reject the Null Hypothesis.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve. This 5 percent is entirely contained in the outer left tail.
The boundary between the Region of Acceptance and the Region of Rejection is the Critical Value. The location of this Critical Value is calculated as follows.

Calculate the Critical Value


One-tailed, Left tail Critical Value = Mean + (Number of Standard Errors from Mean to Region of
Rejection) * SEdiff
Note that the Mean = the Constant from the Null Hypothesis, which is 0.
One-tailed, Left tail Critical Value = Mean + NORM.S.INV(α) * SEdiff
One-tailed, Left tail Critical Value = 0 + NORM.S.INV(0.05) * 1.01
One-tailed, Left tail Critical Value = 0 + (-1.6449) * 1.01
One-tailed, Left tail Critical Value = -1.66
The Region of Rejection is therefore everything that is to the left of -1.66.
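If the same hypothetical worksheet layout is assumed (the 40 differences in cells A2:A41), this Critical Value can be calculated in a single cell as follows:
One-tailed, Left tail Critical Value =0+NORM.S.INV(0.05)*(6.4/SQRT(COUNT(A2:A41))) = -1.66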

The distribution curve with the blue 95-percent Region of Acceptance and the yellow 5-percent Region of
Rejection entirely contained in the left tail is shown as follows:

Step 4 – Determine Whether to Reject Null Hypothesis


The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that can make this determination. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:

1) Compare x_bardiff With Critical Value


Reject the Null Hypothesis if the sample mean, x_bardiff = -2.14, falls into the Region of Rejection.
Equivalently, reject the Null Hypothesis if the sample mean, x_bardiff, is farther from the curve's mean of 0 than the Critical Value.
The Critical Value has been calculated to be -1.66 on the left. x_bardiff (-2.14) is farther from the curve's mean (0) than the left Critical Value (-1.66). The Null Hypothesis is therefore rejected.
2) Compare z Score with Critical z Value
The z Score is also known as the Test Statistic in a z-Test and is the number of Standard Errors that
x_bardiff is from the mean (mean = Constant = 0).
The Critical z Value is the number of Standard Errors that the Critical Value is from the mean.
If the z Score is further from the standardized mean of zero than the Critical z Value, the Null Hypothesis
can be rejected.

z Score (also called the Test Statistic) = (x_bardiff – 0) / SEdiff
z Score (Test Statistic) = (-2.14 – 0) / 1.01
z Score (Test Statistic) = -2.11
This indicates that x_bardiff is 2.11 standard errors to the left of the mean (mean = Constant = 0).
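As a minimal worksheet sketch, again assuming the differences are in the hypothetical range A2:A41, the z Score can be calculated in one cell:
z Score =(AVERAGE(A2:A41)-0)/(6.4/SQRT(COUNT(A2:A41))) = -2.11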

Critical z Value = NORM.S.INV(α)


If α = 0.05, the Critical z Value for a one-tailed hypothesis test in the left tail is calculated as follows:
Critical z Value = NORM.S.INV(0.05) = -1.6449
This means that the Region of Rejection for a one-tailed hypothesis test in the left tail begins 1.6449 standard errors from (to the left of) the standardized mean of zero.
The z Score (-2.11) is farther from the standardized mean of zero than the Critical z Value (-1.6449) so
the Null Hypothesis is rejected.

3) Compare p Value to Alpha.
The p Value is the percent of the curve that is beyond x_bardiff (-2.14). If the p Value is smaller than
Alpha, the Null Hypothesis is rejected. The p Value in this case is calculated by the following Excel
formula:
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))
p Value =MIN(NORM.S.DIST(-2.11,TRUE),1-NORM.S.DIST(-2.11,TRUE))
p Value = 0.0174
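The same p Value can also be calculated in a single cell directly from the difference data. A minimal sketch, assuming the differences are in the hypothetical range A2:A41 (for a left-tailed test with a negative z Score, NORM.S.DIST() of the z Score returns the left-tail area directly):
p Value =NORM.S.DIST((AVERAGE(A2:A41)-0)/(6.4/SQRT(COUNT(A2:A41))),TRUE) = approximately 0.017
Any small difference from the 0.0174 above is due only to rounding of the intermediate z Score.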
The p Value (0.0174) is smaller than Alpha (0.05) and we therefore reject the Null Hypothesis. The red p Value (the curve area beyond x_bardiff) is smaller than the yellow Alpha, which is the 5-percent Region of Rejection in the left tail. This is shown in the following Excel-generated graph of this non-standardized normal distribution curve:

Excel Shortcut to Performing a Paired z-Test
Excel does not provide any formulas or tools in the Data Analysis ToolPak add-in that directly perform the
paired z-Test. The easy work-around is to perform a one-sample z-Test on the difference data sample.
This formula is as follows:
p Value =MIN(Z.TEST(array,Constant,σdiff),1-Z.TEST(array,Constant,σdiff))
It should be noted that when the sample mean is greater than the Constant, the one-tailed p Value Excel formula reduces to
p Value = Z.TEST(array,Constant,σdiff)
When the sample mean is less than the Constant, as in this problem, the one-tailed p Value Excel formula reduces to
p Value = 1-Z.TEST(array,Constant,σdiff)
The Constant is taken from the Null Hypothesis and is equal to 0.
The Null Hypothesis is as follows:
H0: x_bardiff = Constant = 0
Applying the Excel one-sample z-Test formula to the sample of difference data would give the following p
Value for this paired z-Test:
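As a minimal sketch, assuming once again that the 40 After-minus-Before differences are in the hypothetical range A2:A41, the formula for this problem would be the following:
p Value =1-Z.TEST(A2:A41,0,6.4) = approximately 0.017
The 1-Z.TEST() form applies here because the mean of the difference sample (-2.14) is less than the Constant of 0. This result agrees, up to rounding, with the p Value of 0.0174 calculated manually in Step 4 above.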

Hypothesis Testing on Binomial Data

Overview
A hypothesis test evaluates whether a sample is different enough from a population to establish that the
sample probably did not come from that population. If a sample is different enough from a hypothesized
population, then it can be concluded that the population from which the sample came is different from the hypothesized population.

Null Hypothesis
A hypothesis test is based upon a Null Hypothesis which states that the sample did come from a
hypothesized population. A hypothesis test compares a sample statistic such as a sample mean or
sample proportion to a population parameter such as the population’s mean or proportion. The amount of
difference between the sample statistic and the population parameter determines whether the Null
Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and
the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.

Null Hypothesis is Either Rejected or Not Rejected But Is Never Accepted
A hypothesis test has only two possible outcomes: the Null Hypothesis is either rejected or is not rejected.
It is never correct to state that the Null Hypothesis was accepted. A hypothesis test only determines
whether there is or is not enough evidence to reject the Null Hypothesis. The Null Hypothesis is rejected only when the hypothesis test result indicates that the level of certainty that the Null Hypothesis is not valid at least equals the specified Level of Certainty.
If the required Level of Certainty for a hypothesis test is specified to be 95 percent, the Null Hypothesis
will be rejected only if the test result indicates that there is at least a 95 percent probability that the Null
Hypothesis is invalid. In all other cases, the Null Hypothesis would not be rejected. This is not equivalent
to stating that the Null Hypothesis was accepted. The Null Hypothesis is never accepted; it can only be
rejected or not rejected.

Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations
are not the same. The Alternative Hypothesis can be non-directional if it states that the means or
proportions of two populations are merely not equal to each other. The Alternative Hypothesis is
directional if it states that the mean or proportion of one of the populations is less than or greater than the
mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5
or
H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3
or
H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.

One-Tailed Test vs. a Two-Tailed Test


The number of tails in a hypothesis test depends on whether the test is directional or not. The operator of
the Alternative Hypothesis indicates whether or not the hypothesis test is directional. A non-directional
operator (a “not equal” sign) in the Alternative Hypothesis indicates that the hypothesis test is a two-
tailed test. A directional operator (a “greater than” or “less than” sign) in the Alternative Hypothesis
indicates that the hypothesis test is a one-tailed test.
The Region of Rejection (the alpha region) for a one-tailed test is entirely contained in one of the
outer tails. A “greater than” operator in the Alternative Hypothesis indicates that the test is a one-tailed
test in the right tail. A “less than” operator in the Alternative Hypothesis indicates that the test is a one-
tailed test in the left tail. If α = 0.05, then one of the outer tails will contain the entire 5-percent Region of
Rejection.
The Region of Rejection (the alpha region) for a two-tailed test is split between both outer tails. Each
outer tail will contain half of the total Region of Rejection (alpha/2). If α = 0.05, then each outer tail will
contain a 2.5-percent Region of Rejection if the test is a two-tailed test.
Level of Certainty
Each hypothesis test has a specified Level of Certainty. The Null Hypothesis is rejected only when that Level of Certainty has been reached that the sample did not come from the hypothesized population. A commonly
specified Level of Certainty is 95 percent. The Null Hypothesis would only be rejected in this case if the
sample statistic was different enough from the population parameter that at least 95 percent certainty was
achieved that the sample did not come from that population.

Level of Significance (Alpha)


The Level of Certainty for a hypothesis test is often indicated with a different term called the Level of
Significance also known as α (alpha). The relationship between the Level of Certainty and α is the
following:
α = 1 – Level of Certainty
An alpha set to 0.05 indicates that a Level of Certainty of 95 percent that the sample came from a different population must be reached before the Null Hypothesis is rejected.

Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to
the normal distribution or the t distribution. The Test Statistic is derived directly from the sample statistic
such as the sample mean. If the Test Statistic is distributed according to the normal or t distribution, then
the sample statistic is also distributed according to normal or t distribution. This will be discussed is
greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the
sample statistic (the sample mean or proportion) to its own unique normal or t distribution. The sample
statistic is the distributed variable whose distribution is mapped according to its own unique normal or t
distribution.
The Region of Acceptance is the percentage of area under this normal or t distribution curve that equals
the test’s specified Level of Certainty. If the hypothesis test requires 95 percent certainty in order to reject the Null
Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed
variable’s mapped normal or t distribution curve.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.

Region of Rejection
The Region of Rejection is the percentage of area under this normal or t distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05

The Region of Acceptance includes 95 percent of the total area under the normal or t distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t distribution curve that
maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-
percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent
alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.

Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of boundary between
the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value because the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical
Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther
from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the
curve’s mean than the Critical value on that side, the sample statistic lies in the Region of Rejection. If the
sample statistic is closer to the curve’s mean than the Critical value on that side, the sample statistic lies
in the Region of Acceptance.

Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard
Errors that are between the observed sample statistic and the hypothesized population parameter. The
Null Hypothesis is rejected if the number of Standard Errors specified by the Test Statistic is larger than
a critical number of Standard Errors. The critical number of Standard Errors is determined by the required
Level of Certainty.
The Test Statistic is either the z Score or the t Value depending on whether a z Test or t Test is being
performed. This will be discussed in greater detail shortly.

Critical t Value or Critical z Value


Each hypothesis test calculates Critical t or z Values. A Critical t Value is calculated for a t Test and a
Critical z Value is calculated for a z Test. A Critical t or z Value is the amount of difference expressed in
Standard Errors between the boundary of the Region of Rejection (the Critical Value) and hypothesized
population parameter (the Constant on the right side of the Null Hypothesis) which will be located at the
curve’s mean.
A one-tailed test has only one Critical t or z Value because the Region of Rejection is entirely contained in
one outer tail. A two-tailed test has two Critical t or z Values because the Region of Rejection is split
between the two outer tails.
The Test Statistic (the t Value or z Score) is compared with the Critical t or z Value on that side of the
mean.
If the Test Statistic is farther from the standardized mean of zero than the Critical t or z Value on that side,
the Null Hypothesis is rejected. The Test Statistic is the number of Standard Errors that the sample
statistic is from the curve’s mean. The Critical t or z Value on the same side is the number of Standard
Errors that the Critical Value (the boundary of the Region of Rejection) is from the mean. If the Test
Statistic is farther from the standardized mean of zero than the Critical t or z value, the sample statistic
lies in the Region of Rejection.
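In Excel, these critical values can be calculated directly. The following is a minimal sketch that assumes alpha is stored in the hypothetical cell A1 and the degrees of freedom (needed only for a t Test) in the hypothetical cell A2:
Critical z Value, one-tailed test in the right tail =NORM.S.INV(1-A1)
Critical z Value, one-tailed test in the left tail =NORM.S.INV(A1)
Critical z Values, two-tailed test =NORM.S.INV(1-A1/2) and =NORM.S.INV(A1/2)
Critical t Value, one-tailed test in the right tail =T.INV(1-A1,A2)
Critical t Value, one-tailed test in the left tail =T.INV(A1,A2)
Critical t Values, two-tailed test =T.INV.2T(A1,A2) and its negative
Each formula returns the number of Standard Errors from the standardized mean of zero to the boundary of the Region of Rejection.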

Relationship Between p Value and Alpha


Each hypothesis test calculates a p Value. The p Value is the area under the curve that is beyond the
sample statistic (the observed sample mean or proportion). The p Value is the probability that a sample of size n would have a sample mean or proportion at least as far from the hypothesized value as the one observed if the Null Hypothesis were true.
If, for example, the p Value of a Hypothesis Test of Mean or Proportion were calculated to be 0.0212, that would indicate that there is only a 2.12 percent chance that a sample of size n would have a sample mean or proportion at least that far from the hypothesized value if the Null Hypothesis were true. The Null Hypothesis states that the
population from which the sample came has the same mean as the hypothesized population. This mean
is the Constant on the right side of the Null Hypothesis.
The p Value is compared to alpha for a one-tailed test and to alpha/2 for a two-tailed test. The Null
Hypothesis is rejected if p is smaller than α for a one-tailed test or if p is smaller than α/2 for a two-tailed
test. If the p Value is smaller than α for a one-tailed test or smaller than α/2 for a two-tailed test, the
sample statistic is in the Region of Rejection.
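In Excel, the p Value can be calculated directly from the Test Statistic. A minimal sketch, assuming the z Score or t Value is in the hypothetical cell B1 and the degrees of freedom (needed only for a t Test) in the hypothetical cell B2:
p Value for a z Test =MIN(NORM.S.DIST(B1,TRUE),1-NORM.S.DIST(B1,TRUE))
p Value for a t Test =MIN(T.DIST(B1,B2,TRUE),1-T.DIST(B1,B2,TRUE))
Each formula returns the area under the curve beyond the Test Statistic in its own tail, which is then compared to α for a one-tailed test or α/2 for a two-tailed test.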

The 3 Equivalent Reasons To Reject the Null Hypothesis


The Null Hypothesis of a Hypothesis Test of Mean or Proportion is rejected if any of the following
equivalent conditions are shown to exist:
1) The sample statistic (the observed sample mean or proportion) is beyond the Critical Value. The
sample statistic would therefore lie in the Region of Rejection because the Critical Value is the boundary
of the Region of Rejection.
2) The Test Statistic (the t value or z Score) is farther from zero than the Critical t or z Value. The
Test Statistic is the number of Standard Errors that the sample statistic is from the curve’s mean. The
Critical t or z Value is the number of Standard Errors that the boundary of the Region of Rejection is from
the curve’s mean. If the Test Statistic is farther from the standardized mean of 0 than the
Critical t or z Value, the sample statistic lies in the Region of Rejection.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test. The p Value is the
curve area beyond the sample statistic. α and α/2 equal the curve areas contained by the Region of
Rejection on that side for a one-tailed test and a two-tailed test respectively. If the p value is smaller than
α for a one-tailed test or α/2 for a two-tailed test, the sample statistic lies in the Region of Rejection.

Type I and Type II Errors


A Type I Error is a false positive and a Type II Error is a false negative. A false positive occurs when a
test incorrectly detects a significant difference when one does not exist. A false negative occurs when a test incorrectly fails to detect a significant difference when one exists.
α (the specified Level of Significance) = a test's probability of making a Type I Error.
β = a test's probability of making a Type II Error.

Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β

Effect Size
Effect size in a t-Test or z-Test is a conventional way of expressing how large the difference between two groups is, without taking into account the sample size or whether that difference is statistically significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
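As a rough illustration that is not taken from this manual's examples, Cohen's d for two independent groups is commonly calculated as the difference between the two sample means divided by a pooled standard deviation. Assuming the two groups' data are in the hypothetical ranges A2:A31 and B2:B31, a worksheet sketch would be:
Cohen's d =ABS(AVERAGE(A2:A31)-AVERAGE(B2:B31))/SQRT(((COUNT(A2:A31)-1)*VAR.S(A2:A31)+(COUNT(B2:B31)-1)*VAR.S(B2:B31))/(COUNT(A2:A31)+COUNT(B2:B31)-2))
Values of d near 0.2, 0.5, and 0.8 are conventionally described as small, medium, and large effects respectively.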

Hypothesis Test of Mean vs. Proportion

Hypothesis tests covered in this section will be either Hypothesis Tests of Mean or Hypothesis Tests of Proportion. A data point of a sample taken for a Hypothesis Test of Mean can have a range of values. A
data point of a sample taken for a Hypothesis Test of Proportion is binary; it can take only one of two
values.
Hypothesis Tests of Mean – Basic Definition
A Hypothesis Test of Mean compares an observed sample mean with a hypothesized population mean to
determine if the sample was taken from the same population. An example would be to compare a sample
of monthly sales of stores in one region to the national average to determine if mean sales from the
region (the population from which the sample was taken) is different than the national average (the
hypothesized population parameter). As stated, a sample taken for a Hypothesis Test of Mean can have
a range of values. In this case, the sales of a sampled store can fall within a wide range of values.
Hypothesis Tests of Mean are covered in detail in separate sections on t Tests and z Tests.
t Tests are also summarized at the end of the section on the t distribution.
z Tests are also summarized at the end of the section on the normal distribution.
Hypothesis Tests of Proportion – Basic Definition
A Hypothesis Test of Proportion compares an observed sample proportion with a hypothesized
population proportion to determine if the sample was taken from the same population. An example would
be to compare the proportion of defective units from a sample taken from one production line to the
proportion of defective units from all production lines to determine if the proportion defective from the one
production line (the population from which the sample was taken) is different than from the proportion
defective of all production lines (the hypothesized population parameter). As stated, a sample taken for a
Hypothesis Test of Proportion can only have one of two values. In this case, a sampled unit from a
production line is either defective or it is not.

Data observations in the sample taken for a Hypothesis Test of Proportion are required to be distributed
according to the binomial distribution. Data that are binomially distributed are independent of each other,
binary (can assume only one of two states), and all have the same probability of assuming the positive
state.
The binomial distribution can be approximated by the normal distribution under the following two
conditions:
1) p (the probability of a positive outcome on each trial) and q (q = 1 – p) are not too close to 0 or 1.
2) np > 5 and nq > 5
A z Test can be performed on binomially-distributed data if the above conditions are met. Hypothesis
Tests of Proportion use only z Tests and not t Tests because the binomial distribution is approximated by
the normal distribution, not the t distribution.
The Test Statistic for a z Test is a z Score.
A Hypothesis Test of Proportion is performed in a very similar manner to a Hypothesis Test of Mean. A
general description of the major steps is as follows:
1) A sample of binary data is taken. The sample proportion is calculated. Examples of a sample
proportion are the proportion of sampled people who are of one gender or the proportion of sampled
production units that are defective.
2) A Null Hypothesis is created stating that the population from which the sample was taken has the same proportion as a hypothesized population proportion. An Alternative Hypothesis is constructed stating that the sample population's proportion is not equal to, greater than, or less than the hypothesized population
proportion depending on the wording of the problem.
3) The sample proportion is mapped to a normal curve that has a mean equal to the hypothesized
population proportion and a Standard Error calculated based upon a formula specific to the type of
Hypothesis Test of Proportion.
4) The Critical Values are calculated and the Regions of Acceptance and Rejection are mapped on the
normal graph that maps the distributed variable.
5) Critical z Values, the Test Statistic (z Score) and p Value are calculated.
6) The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
a) The observed sample proportion, p_bar, is beyond the Critical Value.
b) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
c) The p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
The Null Hypothesis is not rejected in the output of the following Hypothesis Test of Proportion because
none of the above equivalent conditions exist. This is evidenced in the following graph:

This z-Test was a two-tailed test as evidenced by the yellow Region of Rejection split between both outer tails. In this z-Test alpha was set to 0.05. This 5-percent Region of Rejection is split between the
two tails so that each tail contains a 2.5 percent Region of Rejection.
The mean of this non-standardized normal distribution curve is 0.30. This indicates that the Null
Hypothesis is as follows:
H0: p_bar = 0.30
Since this is a two-tailed z-Test, the Alternative Hypothesis is as follows:
H1: p_bar ≠ 0.30
This one-sample z-Test is evaluating whether the population from which the sample was taken has a
population proportion that is not equal to 0.30. This is a non-directional z-Test and is therefore two-tailed.
The sample statistic is the observed sample proportion of this single sample taken for this test. This
observed sample proportion is calculated to be 0.42.
The boundaries of the Region of Rejection occur at 0.17 and 0.43. Everything beyond these two points is
in the Region of Rejection. Everything inside of these two points is in the Region of Acceptance. These
two Critical Values are 1.96 Standard Errors from the standardized mean of 0. This indicates that the Critical z Values are ±1.96.
The graph shows that the sample statistic (the sample proportion of 0.42) falls inside the right Critical
Value of 0.43 and is therefore in the Region of Acceptance.
The sample statistic is 1.85 Standard Errors from the standardized mean of 0. This is closer to the
standardized mean of 0 than the right Critical z Value, which is 1.96.
The curve area beyond the sample statistic consists of 3.2 percent of the area under the curve. This is
larger than α/2 which is 2.5 percent of the total curve area because alpha was set to 0.05.
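The values described above can be reproduced with the following worksheet formulas, shown here as a minimal sketch using the numbers given for this output:
z Score =(0.42-0.30)/SQRT(0.30*0.70/50) = 1.85
p Value =1-NORM.S.DIST(1.85,TRUE) = 0.032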
The Null Hypothesis is not rejected. As the graph shows, none of the three equivalent conditions have
been met to reject the Null Hypothesis. It cannot be stated with at least 95 percent certainty that the
proportion of the population from which the sample was taken does not equal the hypothesized
population proportion of 0.30.
It should be noted that failure to reject the Null Hypothesis is not equivalent to accepting the Null
Hypothesis. A hypothesis test can only reject or fail to reject the Null Hypothesis.

Uses of Hypothesis Tests of Proportion
1) Comparing the proportion of a sample taken from one population with another population's proportion to determine if the two populations have different proportions. An example of this would be to
compare the proportion of monthly purchases returned in a sample of retail stores from one region to the
national mean monthly return rate to determine if the monthly proportion of sales returned in all stores in
the one region is different than the national monthly return rate.
2) Comparing the proportion of a sample taken from one population to a fixed proportion to determine if
that population’s proportion is different than the fixed proportion. An example of this might be to compare
the proportion of a specified chemical measured in a sample of a number of units of a product to the
company’s claims about that product specification to determine if the actual proportion of the chemical in
all units of that company’s product is different than what the company claims it is.
3) Comparing the proportion of a sample from one population with the proportion of a sample from
another population to determine if the two populations have different proportions. An example of this
would be to compare the proportion of defective units of a sample of production runs by one crew with the
proportion of defective units of a sample of production runs by another crew to determine if the two crews
have consistently different proportions of defective units in all of their runs.
4) Comparing successive measurement pairs taken on the same group of objects to determine if anything
has changed between measurements. An example of this would be to evaluate whether there is a difference in the proportion of the same people passing a standardized test before and after a training program to determine if the training program makes a difference in the proportion of all people who take the standardized test before and after undergoing the training.
5) Comparing the same measurements taken on pairs of related objects. An example of this would be to
evaluate whether the proportion of total household income brought in by the husband and the wife is
different in a sample of married couples to determine if there is a difference in the proportions of total
household income brought in by husbands and wives in all married couples.
It is important to note that a hypothesis test is used to determine if two populations are different. The outcome of a hypothesis test is to either reject or fail to reject the Null Hypothesis. It would be incorrect to
state that a hypothesis test is used to determine if two populations are the same.

Types of Hypothesis Tests of Proportion


The 3 types of Hypothesis tests of Proportion discussed here are the following:
One-sample Hypothesis Test of Proportion
Two-Independent-Sample, Pooled Hypothesis Test of Proportion
Two-Independent-Sample, Unpooled Hypothesis Test of Proportion
A description of each of these three hypothesis tests of proportion is as follows:

1) One-Sample Hypothesis Test of Proportion

Overview
This hypothesis test analyzes a single sample to determine if the population from which the sample was
taken has a proportion that is equal to a constant, p. In many cases, a one-sample Hypothesis test of
Proportion is used to determine if one population has the same proportion as the known proportion of
another population, p.
p_bar = observed sample proportion
p_bar = X/n = (Number of successes in the sample)/(Number of trials in the sample)
q_bar = 1 – p_bar
p = Constant = the hypothesized population proportion (often a known population proportion)
q = 1 – p

Null Hypothesis H0: p_bar = Constant = p

The z Value for this test, which is the Test Statistic, is calculated as follows:
z Value (Test Statistic) = (p_bar – p) / SE
SE = SQRT[ (p*q)/n ]
SE is calculated using population parameters p and q, not sample statistics p_bar and q_bar.

The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
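A minimal worksheet sketch of these calculations, assuming the binary sample results are recorded as 1 (success) and 0 (failure) in the hypothetical range B2:B51 and the hypothesized proportion p is in the hypothetical cell E1:
p_bar =AVERAGE(B2:B51)
SE =SQRT(E1*(1-E1)/COUNT(B2:B51))
z Value (Test Statistic) =(AVERAGE(B2:B51)-E1)/SQRT(E1*(1-E1)/COUNT(B2:B51))
The resulting z Value is then compared with the Critical z Value, or converted to a p Value, exactly as described above.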

Example of a One-Sample, Two-Tailed Hypothesis Test of
Proportion in Excel
Following is the result of a one-sample hypothesis test of proportion that is two-tailed:
Over the course of one entire year, 30 percent of all units produced by one production line had at least one defect. During the next year, 21 of the first 50 units produced by the production line had a defect.
Determine with 95 percent certainty whether the production line's performance has changed.
Note that the question asks only whether there has been any change, not whether there has been a
specific change such as whether there has been a worsening of performance. This means that the
hypothesis test will be a two-tailed test and not a one-tailed test.

Summary of Problem Information


p = Population proportion defective for last year = 0.30
The example evaluates whether the population proportion for this year equals p (last year’s proportion
defectives) based on a sample taken from this year’s production.
q = Population proportion not defective for last year
q = 1 – p = 0.70
X = number of defects detected in the sample taken from this year’s production = 21
n = Sample size = 50
p_bar = Observed sample proportion = X/n = 21/50 = 0.42
Required Level of Certainty = 95% = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As with all Hypothesis Tests of Proportion, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Must be Answered Satisfactorily
1) What Type of Test Should Be Done?
2) Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Proportion


Step 1) Create the Null Hypothesis and the Alternative Hypothesis
Step 2) Map the Normal Curve Based on the Null Hypothesis
Step 3) Map the Regions of Acceptance and Rejection
Step 4) Perform the Critical Value Test, the p Value Test, or the Critical z Value Test

The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Proportion because each individual observation (each sampled unit from the production line) can have only one of two values: it is defective or it is not. Samples for a Hypothesis Test of
Mean can have more than two values and often can assume any value within a possible range of values.

b) One-Tailed or Two-Tailed Hypothesis?


The problem asks to determine whether the production line’s proportion defective during the current year
is simply different than during the previous year. This is a non-directional inequality making this
hypothesis test a two-tailed test. If the problem asked whether the proportion defective this year was
greater than or less than last year, the inequality would be directional and the resulting hypothesis test
would be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.

c) One-Sample or a Two-Sample Test?


This is a one-sample hypothesis test because only one production sample of 50 units has had its
proportion defective calculated. This sample proportion will be compared with the overall proportion
defective from the previous year.

d) t-Test or z-Test?
Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values. Binary objects are distributed according to the binomial distribution, and the binomial distribution can be approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial distribution that the sampled objects follow. A Hypothesis Test of Proportion will therefore always be a z Test and not a t Test. A t Test, in contrast, uses the t distribution to model the distributed variable.
This hypothesis test is a z Test that is a one-sample, two-tailed hypothesis test of proportion.

Question 2) Test Requirements Met?


Can Binomial Distribution Be Approximated By Normal Distribution?
The samples for a Hypothesis Test of Proportion follow the binomial distribution because each sampled observation has only two possible outcomes and the probability of the positive outcome is always the same for each observation.
A Hypothesis Test of Proportion approximates the binomial distribution with the normal distribution so that
normal-distribution-based statistical analysis tools such as z Scores can be used.
The most important requirement of a Hypothesis Test of Proportion is the validity of approximating the
binomial distribution with the normal distribution. The binomial distribution can be approximated by the
normal distribution if sample size, n, is large enough and p is not too close to 0 or 1. This can be summed
up with the following rule:

The binomial distribution can be approximated by the normal distribution if np > 5 and nq >5. In this case,
the calculation of np and nq is the following:
n = 50
p = 0.30
q = 0.70
np = 15 and nq = 35
np > 5 and nq >5 so it is valid to approximate the binomial distribution with the normal distribution.
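This check can also be performed with a single worksheet formula; a minimal sketch, assuming n is in the hypothetical cell A1 and p in the hypothetical cell A2:
=AND(A1*A2>5,A1*(1-A2)>5)
For this problem the formula evaluates =AND(50*0.3>5,50*0.7>5), which returns TRUE.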
Because the binomial distribution can be modeled by the normal distribution, a z Test can be used to
perform a Hypothesis Test of Proportion.
The binomial distribution has the following parameters:
Mean = np
Variance = npq
Standard Deviation = SQRT(npq)
Each unique normal distribution can be completely described by two parameters: its mean and its
standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
If np is substituted for the normal distribution's mean and SQRT(npq) is substituted for the normal distribution's standard deviation as follows:
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq))
which approximates Binomial (n,p)
This can be demonstrated with Excel using data from this problem.
n = 50 = the number of trials in one sample
p = 0.3 = expected probability of a positive result in all trials
q = 1 – p = 0.7 = expected probability of a negative result in all trials
If the number of positive outcomes is randomly picked to be X = 21, the normal approximation of the
binomial distribution’s PDF at the point X = 21 is computed as follows:
BINOM.DIST(X, n, p, FALSE)
= BINOM.DIST(21, 50, 0.3, FALSE)
= 0.023
The normal distribution's PDF will equal approximately the same value as the binomial distribution's PDF if the following substitutions are made:
NORM.DIST(X, Mean, Stan. Dev, FALSE)
= NORM.DIST(X, np, SQRT(npq), FALSE)
This is the basis for the normal approximation of the binomial distribution as follows:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)
NORM.DIST(X, np, SQRT(npq), FALSE)
= NORM.DIST(21,15,SQRT(10.5),FALSE) = 0.022
The difference between BINOM.DIST(21, 50, 0.3, FALSE) and NORM.DIST(21,15,SQRT(10.5),FALSE) is less than 0.01. That is reasonably close.
Note that replacing FALSE with TRUE in the above BINOM.DIST() and NORM.DIST() formulas would calculate their CDFs instead of their PDFs. The normal approximation of the binomial CDF is less exact unless a continuity correction is applied.

We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical z Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the proportion defective of the entire current year’s production
is not different than the proportion defective from the entire last year’s production, p, which was p = 30
percent or 0.30. This Null Hypothesis would be written as follows:
H0: p_bar = Constant = p
H0: p_bar = p = 0.3
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. In this case, the Alternative Hypothesis would state that the average defective percentage for
the current year is different than the average defective percentage for last year. This Alternate Hypothesis
is as follows:
H1: p_bar ≠ Constant
H1: p_bar ≠ p
H1: p_bar ≠ 0.3
The “not-equals” sign indicates that this is a two-tailed hypothesis test and not a one-tailed test.
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the
hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous
(requires a greater difference between the two entities being compared before the test shows that there
is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the proportion of the populations
from which the sample was taken. A one-sample Hypothesis Test of Proportion determines whether to
reject or fail to reject the Null Hypothesis which states that the population has a proportion equal to the
Constant. In this case the population from which the sample was taken is the entire current year’s
production. The Constant in this case is equal to the proportion defective for the entire last year’s
production.

Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample proportion is distributed according to the normal distribution. The
sample proportion, p_bar, is distributed according to the binomial distribution. The normal distribution can
be used to approximate this binomial distribution because the requirements that np and qp are greater
than 5. The distribution of the sample proportion, p_bar, can therefore be approximated by the normal
distribution.
The sample proportion, p_bar, will be mapped to a normal distribution. Each unique normal distribution
can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar is equal to the
Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar = Constant = p = 0.3
The distributed variable p_bar will be mapped to a normal distribution curve with a mean = 0.3.
The standard deviation of the normal distribution curve that maps the distributed variable p_bar is its Standard Error, which is calculated from the hypothesized population proportion, p, and the sample size, n.
Standard Error (SE) for a one-sample Hypothesis Test of Proportion is calculated as follows:
SE = Standard Error
SE = SQRT[ (p*q)/n ]
SE = SQRT[ (0.3*0.7)/50 ]
SE = 0.0648
Note that SE is calculated using p (0.3) from the Null Hypothesis and not p_bar (0.42). q is derived from p
and q_bar is derived from p_bar.
The normal distribution curve that maps the distribution of variable p_bar now has the following
parameters:
Mean = 0.3
Standard Error = 0.0648

This Excel-generated normal distribution curve is shown as follows:

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected.
In this case we are trying to show graphically how different the observed p_bar (0.42) is from the known p
(p = 0.3 = the previous annual proportion of defects). p_bar is the defect rate of a sample taken from this
year’s production and p is the proportion defective from the entire last year’s production.
A Hypothesis Test of Proportion calculates the probability of a sample having a proportion defective equal
to the observed p_bar (0.42) if the true defect rate of this year’s production is the same as last year’s
production (0.3) and the sample proportion is normally distributed.

The Regions of Acceptance and Rejection


The normal distribution curve that maps the distribution of variable p_bar can be divided up into two types
of regions: the Region of Acceptance and the Region of Rejection.
If p_bar’s observed value of 0.42 falls in the Region of Acceptance, we fail to reject the Null Hypothesis. If
p_bar falls in the Region of Rejection, we reject the Null Hypothesis.
This is a two-tailed test because the Region of Rejection is split between both outer tails. The Alternative
Hypothesis indicates this. A hypothesis test is non-directional if the Alternative Hypothesis has the non-
directional operator “not equal to.”
A hypothesis test is a one-tailed test if the Alternative Hypothesis has a directional operator, i.e., “greater
than” or “less than.” A one-tailed test has the entire Region of Rejection contained in one outer tail. A
“greater than” operator indicates a right-tailed test. A “less than” operator indicates a left-tailed test.
In this case the Alternative Hypothesis is the following:
H1: p_bar ≠ 0.3

283
This Alternate Hypothesis indicates that the hypothesis test has the Region of Rejection split between
both outer tails and is therefore two-tailed.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution
curve.
Because this test is a two-tailed test, the 5 percent Region of Rejection is divided up between the two
outer tails. Each outer tail contains 2.5 percent of the total 5 percent that makes up the Region of Rejection.

Calculate Critical Values


The Critical Value is the boundary of the Region of Rejection. The Critical Value is the boundary on either
side of the curve beyond which 2.5 percent of the total area under the curve exists. In this case both
Critical Values can be found by the following:
Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± NORM.S.INV(1-α/2) * SE
Critical Values = 0.3 ± NORM.S.INV(1- 0.05/2 ) * 0.0648
Critical Values = 0.3 ± NORM.S.INV(0.975) * 0.0648
Critical Values = 0.30 ± 0.13
Critical Values = 0.43 and 0.17
The Region of Rejection is therefore everything that is to the right of 0.43 and everything to the left of
0.17.
This normal distribution curve with the blue Region of Acceptance in the center and the yellow Regions of
Rejection in the outer tails is shown as follows:

284
Step 4 – Determine Whether to Reject Null Hypothesis
The object of a hypothesis test is to determine whether to accept or reject the Null Hypothesis. There are
three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:
1) Compare p_bar With Critical Value
If the observed value of p_bar (0.42) falls into the Region of Acceptance (the blue region under the
curve), the Null Hypothesis is not rejected. If the observed value of p_bar falls into the Regions of
Rejection (either of the two yellow outer regions), the Null Hypothesis is rejected.
The observed p_bar (0.42) is closer to the curve’s mean (0.3) than the Critical Value (0.43) and falls in
the blue Region of Acceptance. We therefore do not reject the Null Hypothesis. We cannot state with 95
percent certainty that there is a real difference between the overall defect rates of this year and last year
based upon the defect rate of the sample taken from this year’s production.

2) Compare z Value With Critical z Value


The z Value is the number of Standard Errors that the observed p_bar is from the standardized mean of
zero. The Critical z Value is the number of Standard Errors that the Critical Value is from the mean.
If the z Value is closer to the standardized mean of zero than the Critical z Value, the Null Hypothesis is
not rejected. If the z Value is farther from the standardized mean of zero than the Critical z Value, the Null
Hypothesis is rejected.
z Value = Test Statistic = (p_bar – p) / SE
z Value = (0.42 – 0.3)/0.0648
z Value = 1.85
This means that the observed p_bar (0.42) is 1.85 Standard Errors from the curve’s mean.
Critical z Valuesα=0.05,two-tailed = ±NORM.S.INV(1-α/2)
Critical z Valuesα=0.05,two-tailed = ±NORM.S.INV(0.975) = 1.96
This means that the left and right boundaries of the Regions of Rejection are 1.96 Standard Errors from
the curve’s mean.
The z Value (1.85) is closer to the curve’s standardized mean of zero than the Critical z Value (1.96) so
the Null Hypothesis is not rejected.

285
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar (0.42). If the p Value is smaller
than Alpha/2 (if the test is two-tailed), the Null Hypothesis is rejected. If the p Value is larger than Alpha/2,
the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(1.85,TRUE),1-NORM.S.DIST(1.85,TRUE))
p Value = 0.032
The p Value (0.032) is larger than Alpha/2 (0.025) and we therefore do not reject the Null Hypothesis.
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar) is larger
than the yellow Alpha/2 Region of rejection in the outer right tail.

It should be noted that if this z Test were a one-tailed test, which is less stringent than a two-tailed test,
the Null Hypothesis would now have been rejected because of the following three equivalent conditions:
1) The p Value (0.032) is smaller than Alpha (0.05). A one-tailed test would contain the entire 5-percent
Region of Rejection in one outer tail.
2) p_bar (0.42) would now be in the Region of Rejection, which would now have its outer right boundary at
0.41, the Critical Value for a one-tailed test.
Critical Valueone-tailed,right tail = Mean + NORM.S.INV(1-α) * SE = 0.41
3) The z Value (1.85) would now be farther from the standardized mean of zero than the Critical z Value,
which would now be 1.645.
Critical z Valueone-tailed,right tail = NORM.S.INV(1-α) = 1.645
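For readers who want to double-check these one-sample calculations outside of Excel, the short Python
sketch below reproduces the Standard Error, z Value, Critical Values, and p Value for this example
(p = 0.3, p_bar = 0.42, n = 50, α = 0.05). This is only an illustrative sketch, not part of the Excel
workflow described above; it assumes the scipy library is available and the variable names are arbitrary.

# Sketch: one-sample, two-tailed hypothesis test of proportion (z Test)
from math import sqrt
from scipy.stats import norm

p0 = 0.30       # proportion from the Null Hypothesis (last year's defect rate)
p_bar = 0.42    # observed sample proportion from this year's sample
n = 50          # sample size
alpha = 0.05

se = sqrt(p0 * (1 - p0) / n)            # Standard Error, about 0.0648
z = (p_bar - p0) / se                   # z Value (Test Statistic), about 1.85

z_crit = norm.ppf(1 - alpha / 2)        # about 1.96, same as NORM.S.INV(0.975)
lower = p0 - z_crit * se                # left Critical Value, about 0.17
upper = p0 + z_crit * se                # right Critical Value, about 0.43

p_value = min(norm.cdf(z), norm.sf(z))  # one-tail area beyond z, about 0.032

# Two-tailed decision rule: reject only if |z| exceeds the Critical z Value,
# or equivalently if p_value < alpha / 2. Here the result is "do not reject."
reject = abs(z) > z_crit
print(round(se, 4), round(z, 2), round(lower, 2), round(upper, 2), round(p_value, 3), reject)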

286
Two-Sample, Pooled Hypothesis Test of Proportion in
Excel

Overview
This hypothesis test analyzes two independent samples to determine if the populations from which the
samples were taken have equal proportions. This test is often used to determine whether two sample
proportions are likely the same. The test is called a pooled test because the Null Hypothesis states that
the two sample proportions are the same. The formula for Standard Error uses a pooled proportion that
combines the proportions of both samples. This formula is shown in the following set of formulas.
p_bar1 = observed sample 1 proportion
p_bar1 = X1/n1
= (Number of successes in sample 1)/(Number of trials in sample 1)
p_bar2 = observed sample 2 proportion
p_bar2 = X2/n2
= (Number of successes in sample 2)/(Number of trials in sample 2)

Null Hypothesis H0: p_bar2 - p_bar1 = Constant = 0


This Null Hypothesis, which states that p_bar1 = p_bar2, indicates that this is a pooled test.

The z Value for this test is the Test Statistic and is calculated as follows:
ppooled = (X1 + X2)/(n1 + n2)
qpooled = 1 - ppooled
SEDiff = SQRT[ ppooled * qpooled * (1/n1 + 1/n2) ]
z Value (the Test Statistic) = (p_bar2 – p_bar1 – Constant) / SEDiff
287
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar2 - p_bar1 is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

Example of a Two-Sample, Pooled, Two-Tailed Hypothesis


Test of Proportion in Excel
A new 401K plan is being proposed to the employees of a large company as a replacement for the
existing plan.
90 out of 200 randomly sampled male employees prefer the proposed plan over the existing plan.
59 out of 100 randomly sampled female employees prefer the proposed plan over the existing plan.
Determine with 95 percent certainty whether there is a difference between the proportions of male and
female employees who prefer the proposed plan.
Note that this will be a pooled z Test because the objective of this hypothesis test is to determine whether
there is a difference between p_bar1 and p_bar2, i.e. p_bar1 – p_bar2 = 0 or p_bar2 – p_bar1 = 0
Two-independent-sample Hypothesis Tests of Proportion remain slightly more intuitive and allow for
consistent use of the variable name p_bar2–p_bar1 if the larger sample proportion is always designated
as p_bar2 and the smaller sample proportion is designated as p_bar1.

Summary of Problem Information


First Sample (Male Employees)
X1 = number of positive outcomes in n1 trials = 90
n1 = number of trials (sample size) = 200
p_bar1 = X1/n1 = 90/200 = 0.45

Second Sample (Female Employees)


X2 = number of positive outcomes in n2 trials = 59
n2 = number of trials (sample size) = 100
p_bar2 = X2/n2 = 59/100 = 0.59
p_bar2 – p_bar1 = 0.59 – 0.45 = 0.14

Level of Certainty = 0.95


Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05

288
As with all Hypothesis Tests of Proportion, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


1) What Type of Test Should Be Done?
2) Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Proportion


Step 1) Create the Null Hypothesis and the Alternate Hypothesis
Step 2) – Map the Normal Curve Based on the Null Hypothesis
Step 3) – Map the Regions of Acceptance and Rejection
Step 4) – Perform the Critical Value Test, the p Value Test, or the Critical z Value Test

The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Proportion because each individual observation (the preference of each
sampled employee) can have only one of two values: the employee either prefers or does not prefer the
proposed plan over the existing plan. Samples for Hypothesis Test of Mean can have more than two
values and often can assume any value within a possible range of values.

b) One-Tailed or Two-Tailed Hypothesis?


The problem asks whether the proportion of male employees who prefer the proposed plan over the
existing plan is simply different than the proportion of female employees who have the same preference.
This is a non-directional inequality making this hypothesis test a two-tailed test. If the problem asked
whether the proportion of males preferring the proposed plan was greater than or less than the proportion
of females with that preference, the inequality would be directional and the resulting hypothesis test would
be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.

c) One-Sample or a Two-Sample Test?


This is a two-sample hypothesis test because two independent samples are being compared. One
sample included only male preferences and the other sample includes only female preferences.

d) Pooled Test or an Unpooled Test?


Once a hypothesis test has been determined to be a two-independent-sample test, the test should be
designated as a pooled or unpooled test because each uses different formulas. A pooled Hypothesis Test
of Proportion makes a basic assumption that the proportions of both populations are the same. An
unpooled Hypothesis Test of Proportion makes a basic assumption that the proportions of both
populations are not the same. This is a pooled test because the proportion of male and female employees
who prefer the proposed plan is assumed to be the same.

289
e) t-Test or z-Test?
A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial
distribution that the sampled objects follow. A Hypothesis Test of Proportion will therefore always be a z
Test and not a t Test.
A hypothesis test of proportion will always be a z test because a hypothesis test of proportion always
uses the normal distribution to model the distributed variable. A t Test uses the t distribution to model the
distributed variable.
Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values.
Binary objects are distributed according to the binomial distribution. The binomial distribution can be
approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to
approximate the underlying binomial distribution that the sampled objects follow. A Hypothesis Test of
Proportion will therefore always be a z Test and not a t Test.
This hypothesis test is a z Test that is two-independent-sample, pooled, two-tailed hypothesis test
of proportion.

Question 2) Test Requirements Met?


Can Binomial Distribution Be Approximated By Normal Distribution?
The samples for a Hypothesis Test of Proportion follow the binomial distribution because each sample
has only two possible outcomes and the probability of the positive outcome is always the same for each
sample taken.
A Hypothesis Test of Proportion approximates the binomial distribution with the normal distribution so that
normal-distribution-based statistical analysis tools such as z Scores can be used.
The most important requirement of a Hypothesis Test of Proportion is the validity of approximating the
binomial distribution with the normal distribution. The binomial distribution can be approximated by the
normal distribution if sample size, n, is large enough and p is not too close to 0 or 1. This can be summed
up with the following rule:
The binomial distribution can be approximated by the normal distribution if np > 5 and nq >5. In this case,
the calculation of np and nq is the following:
Sample 1
X1 = 90
n1 = 200
p1 = 0.45
q1 = 0.55
n1p1 = 90 and n1q1 = 110

Sample 2
X2 = 59
n2 = 100
p2 = 0.59
q2 = 0.41
n2p2 = 59 and n2q2 = 41

290
np > 5 and nq >5 for both samples so it is valid to approximate the binomial distribution with the normal
distribution. Because the binomial distribution can be modeled by the normal distribution, a z Test can be
used to perform a Hypothesis Test of Proportion.

The binomial distribution has the following parameters:


Mean = np
Variance = npq
Each unique normal distribution can be completely described by two parameters: its mean and its
standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
When np is substituted for the normal distribution’s mean and SQRT(npq) is substituted for the normal
distribution’s standard deviation, then the following is true:
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq))
This approximates Binomial (n,p).
The approximation can be demonstrated with Excel using data from this problem.
X = 90 = the number of positive outcomes in n trials
n = 200 = the number of trials in one sample
p = 0.45 = expected probability of a positive result in all trials
q = 1 – p = 0.55 = expected probability of a negative result in all trials
The normal approximation of the binomial distribution is as follows:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)
Analyzing the data from Sample 1 would produce the following comparison:
BINOM.DIST(X, n, p, FALSE)
= BINOM.DIST(90, 200, 0.45, FALSE) = 0.056
NORM.DIST(X, np, SQRT(npq), FALSE)
= NORM.DIST(90, 90, SQRT(49.5), FALSE) = 0.057
The difference between these two values is about 0.001, so the approximation is very close.
Note that replacing FALSE with TRUE in the above BINOM.DIST() and NORM.DIST() formulas would
calculate their CDFs (Cumulative Distribution Functions) instead of their PDFs (Probability Density
Functions).
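If you would like to verify this comparison outside of Excel, the brief Python sketch below computes the
same binomial PMF and normal PDF for the Sample 1 values, using SQRT(npq) as the standard deviation.
It is only a sketch and assumes scipy is available; it is not part of the Excel solution.

# Sketch: normal approximation of the binomial PMF for Sample 1 (n = 200, p = 0.45)
from math import sqrt
from scipy.stats import binom, norm

n, p = 200, 0.45
q = 1 - p
mean = n * p             # 90
sd = sqrt(n * p * q)     # sqrt(49.5), about 7.04

x = 90
binom_pmf = binom.pmf(x, n, p)                # like BINOM.DIST(90, 200, 0.45, FALSE)
normal_pdf = norm.pdf(x, loc=mean, scale=sd)  # like NORM.DIST(90, 90, SQRT(49.5), FALSE)
print(round(binom_pmf, 3), round(normal_pdf, 3))   # about 0.056 and 0.057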

291
We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1) Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical z Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the proportion of male employees who prefer the proposed
plan is not different than the proportion of female employees who prefer the proposed plan. This Null
Hypothesis would be written as follows:
H0: p_bar2–p_bar1 = Constant = 0
This test is called a pooled test because the Null Hypothesis states that the two sample proportions are
the same. The Constant in the Null Hypothesis equals zero for a pooled, two-independent-sample
Hypothesis Test of Proportion. The formula for Standard Error in a pooled, two-independent-sample
Hypothesis Test of Proportion uses a pooled proportion that combines the proportions of both samples.
The Null Hypothesis for an unpooled, two-independent-sample Hypothesis Test of Proportion states that
the two sample proportions are not the same. The Constant for this Null Hypothesis is a non-zero
number.
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. In this case, the Alternative Hypothesis would state that the proportion of male employees who
prefer the proposed plan is different than the proportion of female employees who prefer the proposed
plan. This Alternate Hypothesis is as follows:
H1: p_bar2–p_bar1 ≠ Constant, which equals 0
Therefore:
H1: p_bar2–p_bar1 ≠ 0
The “not-equals” sign indicates that this is a two-tailed hypothesis test and not a one-tailed test.
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the
hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous
(requires a greater difference between the two entities being compared before the test shows that there
is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the proportion of the populations
from which the samples were taken. A two-independent-sample Hypothesis Test of Proportion
determines whether to reject or fail to reject the Null Hypothesis which states that the populations from
which the two independent samples were taken have equal proportions. In this case the Hypothesis Test
analyzes whether the total population of male employees has the same proportion preferring the proposed
plan as the total population of female employees, based upon the much smaller samples taken from each
of the two populations.

292
Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample proportion difference p_bar2–p_bar1 is distributed according to
the normal distribution. The sample proportions p_bar1 and p_bar2 are each distributed according to the
binomial distribution. The normal distribution can be used to approximate these binomial distributions
because the requirements that np and nq are greater than 5 are met for both samples. The distribution of
the difference p_bar2–p_bar1 can therefore be approximated by the normal distribution.
The sample proportion, p_bar2–p_bar1, will be mapped to a normal distribution. Each unique normal
distribution can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar2–p_bar1 is equal to the
Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar2–p_bar1 = Constant = 0
The distributed variable p_bar2–p_bar1 will be mapped to a normal distribution curve with a mean = 0,
which is the Constant.
Population parameters such as population standard deviation have to be estimated if only sample data is
available. In this case the population standard deviation will be estimated by the Standard Error which is
based on the sample size.
Standard Error (SEDiff) for a pooled, two-independent-sample Hypothesis Test of Proportion is calculated
as follows:

ppooled = (X1 + X2)/(n1 + n2) = (90 + 59)/(200 + 100) = 0.50


qpooled = 1 - ppooled = 1 – 0.50 = 0.50

SEDiff = SQRT[ ppooled * qpooled * (1/n1 + 1/n2) ]


SEDiff = SQRT[ 0.50 * 0.50 * (1/200 + 1/100) ] = 0.06
The normal distribution curve that maps the distribution of variable p_bar2–p_bar1 now has the following
parameters:
Mean = 0
Standard Error = 0.06

293
This Excel-generated normal distribution curve is shown as follows:

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to accept or reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected.
In this case we are trying to show graphically how different the observed p_bar2–p_bar1 (0.14) is from the
hypothesized p_bar2–p_bar1 (0).
This Hypothesis Test of Proportion calculates the probability of observing a difference in sample
proportions equal to the observed p_bar2–p_bar1 (0.14) if the true difference equals 0 and the difference
in sample proportions is normally distributed.

The Regions of Acceptance and Rejection


The normal distribution curve that maps the distribution of variable p_bar2–p_bar1 can be divided up into
two types of regions: the Region of Acceptance and the Region of Rejection.
If p_bar2–p_bar1’s observed value of 0.14 falls in the Region of Acceptance, we fail to reject the Null
Hypothesis. If p_bar2–p_bar1 falls in the Region of Rejection, we reject the Null Hypothesis.
This is a two-tailed test because the Region of Rejection is split between both outer tails. The Alternative
Hypothesis indicates this. A hypothesis test is non-directional if the Alternative Hypothesis has the non-
directional operator “not equal to.”
A hypothesis test is a one-tailed test if the Alternative Hypothesis has a directional operator, i.e., “greater
than” or “less than.” A one-tailed test has the entire Region of Rejection contained in one outer tail. A
“greater than” operator indicates a right-tailed test. A “less than” operator indicates a left-tailed test.
In this case the Alternative Hypothesis is the following:
H1: p_bar2–p_bar1 ≠ 0
This Alternate Hypothesis indicates that the hypothesis test has the Region of Rejection split between
both outer tails and is therefore two-tailed.

294
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution
curve.
Because this test is a two-tailed test, the 5 percent Region of Rejection is divided up between the two
outer tails. Each outer tail contains 2.5 percent of the total 5 percent that makes up the Region of Rejection.

Calculate Critical Values


The Critical Value is the boundary of the Region of Rejection. The Critical Value is the boundary on either
side of the curve beyond which 2.5 percent of the total area under the curve exists. In this case both
Critical Values can be found by the following:
Critical Values = Mean ± (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Values = Mean ± NORM.S.INV(1-α/2) * SEDiff
Critical Values = 0 ± NORM.S.INV(1- 0.05/2 ) * 0.06
Critical Values = 0 ± NORM.S.INV(0.975) * 0.06
Critical Values = 0 ± 0.12
Critical Values = -0.12 and +0.12
The Region of Rejection is therefore everything that is to the right of 0.12 and everything to the left of -
0.12.
This normal distribution curve with the blue Region of Acceptance in the center and the yellow Regions of
Rejection in the outer tails is shown as follows:

295
Step 4 – Determine Whether to Reject Null Hypothesis
The object of a hypothesis test is to determine whether to accept or reject the Null Hypothesis. There are
three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:

1) Compare p_bar2–p_bar1 With Critical Value


If the observed value of p_bar2–p_bar1 (0.14) falls into the Region of Acceptance (the blue region under
the curve), the Null Hypothesis is not rejected. If the observed value of p_bar2–p_bar1 falls into the
Regions of Rejection (either of the two yellow outer regions), the Null Hypothesis is rejected.
The observed p_bar2–p_bar1 (0.14) is farther from the curve’s mean (0) than the Critical Value on the right
side (+0.12) and falls in the yellow Region of Rejection. We therefore reject the Null Hypothesis. We can
state with 95 percent certainty that there is a real difference between the overall proportions of male and
female employees who prefer the proposed plan based upon the small samples taken from each of the two
populations.

2) Compare z Value With Critical z Value


The Test Statistic for this Hypothesis Test is called the z Value. The z Value is the number of Standard
Errors that the observed p_bar2–p_bar1 is from the mean of zero. The Critical z Value is the number of
Standard Errors that the Critical Value is from the mean.
If the z Value is closer to the standardized mean of zero than the Critical z Value, the Null Hypothesis is
not rejected. If the z Value is farther from the standardized mean of zero than the Critical z Value, the Null
Hypothesis is rejected. The Test Statistic is calculated as follows:

z Value (the Test Statistic) = (p_bar2–p_bar1 - Constant) / SEDiff


z Value (the Test Statistic) = (0.59 – 0.45 – 0)/0.06
z Value (the Test Statistic) = 2.29
This means that the observed p_bar2–p_bar1 (0.14) is 2.29 Standard Errors from the curve’s mean (0).
Critical z Valueα=0.05,two-tailed = ±NORM.S.INV(1-α/2)
(Note that the two-tailed Critical z Value in the left tail = NORM.S.INV(α/2).)
Critical z Valueα=0.05,two-tailed = ±NORM.S.INV(0.975) = ±1.96
This means that the left and right boundaries of the Regions of Rejection are 1.96 Standard Errors from
the curve’s mean.
The z Value (2.29) is farther from the curve’s standardized mean of zero than the Critical z Value (1.96) so
the Null Hypothesis is rejected.

296
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar2–p_bar1 (0.14). If the p Value is
smaller than Alpha/2 (since this is a two-tailed test), the Null Hypothesis is rejected. If the p Value is larger
than Alpha/2, the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(2.29,TRUE),1-NORM.S.DIST(2.29,TRUE))
p Value = 0.0111
The p Value (0.0111) is smaller than Alpha/2 (0.025) and we therefore reject the Null Hypothesis.
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar2–p_bar1)
is smaller than the yellow Alpha/2 Region of rejection in the outer right tail.

It should be noted that if this z Test were a one-tailed test, the Null Hypothesis would also be rejected
because a two-tailed Hypothesis Test is more stringent than a one-tailed test.
The one and two-tailed tests both calculate the same p Value (0.011), z Value (2.29), and observed value
of p_bar2–p_bar1 (0.14) . The critical values that these are compared to are different between the one and
two-tailed tests. These critical values are the Critical Value, the Critical z Value, and the area of rejection
in one outer tail of the Region of Rejection.
Critical values for a one-tailed test would be the following:
Critical Value = Mean + NORM.S.INV(1-α) * SEDiff
Critical Value = 0 + NORM.S.INV(1- 0.05 ) * 0.06
Critical Value = 0 + NORM.S.INV(0.95) * 0.06
Critical Value = 0.0987

297
Critical z Valueα=0.05,one-tailed,right tail = NORM.S.INV(1-α)
Critical z Valueα=0.05,one-tailed,right tail = NORM.S.INV(0.95) = 1.64

Region of Rejection area in the right outer tail = α = 0.05.

Note that one of the main differences between critical values for a one and two-tailed test is that the one-
tailed test critical values are calculated using α while the two-tailed critical values are calculated using α/2.
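As a cross-check of the pooled calculations in this example, the following Python sketch computes the
pooled Standard Error, z Value, Critical z Values, and p Value from the raw counts (X1 = 90, n1 = 200,
X2 = 59, n2 = 100). It is only a sketch that assumes scipy is installed; it is not part of the Excel
solution shown above.

# Sketch: two-sample, pooled, two-tailed hypothesis test of proportion (z Test)
from math import sqrt
from scipy.stats import norm

x1, n1 = 90, 200     # male employees preferring the proposed plan
x2, n2 = 59, 100     # female employees preferring the proposed plan
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2                # 0.45 and 0.59
diff = p2 - p1                           # 0.14

p_pooled = (x1 + x2) / (n1 + n2)         # about 0.50
q_pooled = 1 - p_pooled
se_diff = sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))   # about 0.06

z = (diff - 0) / se_diff                 # about 2.29 (Constant = 0 for a pooled test)
z_crit_two_tailed = norm.ppf(1 - alpha / 2)   # about 1.96
z_crit_one_tailed = norm.ppf(1 - alpha)       # about 1.64

p_value = min(norm.cdf(z), norm.sf(z))   # about 0.011, compared with alpha/2
reject = abs(z) > z_crit_two_tailed      # True: reject the Null Hypothesis
print(round(se_diff, 3), round(z, 2), round(p_value, 4), reject)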

298
Two-Sample, Unpooled Hypothesis Test of Proportion
in Excel

Overview
This hypothesis test analyzes two independent samples to determine if the populations from which the
samples were taken have equal proportions. This test is often used to determine whether two sample
proportions are likely different by some specific proportion.
The test is called an unpooled test because the Null Hypothesis states that the two sample proportions
are not the same. The formula for Standard Error uses an unpooled proportion that does not combine the
proportions of both samples into a single, pooled proportion as a pooled test does. This formula is shown
in the following set of formulas.
p_bar1 = observed sample 1 proportion
p_bar1 = X1/n1
= (Number of successes in sample 1)/(Number of trials in sample 1)

p_bar2 = observed sample 2 proportion


p_bar2 = X2/n2
= (Number of successes in sample 2)/(Number of trials in sample 2)

Null Hypothesis H0: p_bar2 - p_bar1 = Constant = a non-zero proportion


This Null Hypothesis, which states that p_bar1 ≠ p_bar2, indicates that this is an unpooled test.

The z Value for this test is the Test Statistic and is calculated as follows:
SEDiff = SQRT[ (p_bar1*q_bar1/n1) + (p_bar2*q_bar2/n2) ]
z Value (the Test Statistic) = (p_bar2 – p_bar1 – Constant) / SEDiff
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar2 - p_bar1 is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.

299
Example of a Two-Sample, Unpooled, One-Tailed Hypothesis
Test of Proportion in Excel
It is believed that Production Line B produces 5 percent more defects than Production Line A.
Both production lines manufacture the same products and have the same types of machines that are all
in similar condition. Both production lines operate approximately the same number of hours. The only
difference between the production lines is the experience of the crews. The crews that operate Production
Line A have more experience than the crews on Production Line B.
Completed units from both production lines were sampled and evaluated over the same period of time as
follows.
12 out of 200 randomly sampled units produced on Production Line A were nonconforming. The
proportion of sample units from Production Line A that were nonconforming was 0.06 (6 percent).
39 out of 300 randomly sampled units produced on Production Line B were nonconforming. The
proportion of sample units from Production Line B that were nonconforming was 0.13 (13 percent).
Determine with 95 percent certainty whether Production Line B’s overall proportion nonconforming
exceeds that of Production Line A by more than 5 percent. In other words, determine whether the difference
between Production Line B’s overall percent defective and Production Line A’s overall percent defective is
greater than 5 percent.
Note that this will be an unpooled z Test because the proportions of the two populations are assumed to be
different. The Null Hypothesis will state that the difference between the proportion of defectives of the two
populations from which the samples are taken is equal to 5 percent. The Alternative Hypothesis states
that this difference is greater than 5 percent.
Sample results for the two samples cannot be pooled if they are known to be different, as stated by the
Null Hypothesis. Sample results can only be pooled if the Null Hypothesis states that the proportions of
the two samples are the same.
Two-independent-sample Hypothesis Tests of Proportion remain slightly more intuitive and allow for
consistent use of the variable name p_bar2–p_bar1 if the larger sample proportion is always designated
as p_bar2 and the smaller sample proportion is designated as p_bar1.

Summary of Problem Information


First Sample – Production Line A (Experienced Crews)
X1 = number of positive outcomes (defects) in n1 trials = 12
n1 = number of trials (random units inspected) = 200
p_bar1 = X1/n1 = 12/200 = 0.06
Second Sample – Production Line B (Inexperienced Crews)
X2 = number of positive outcomes (defects) in n2 trials = 39
n2 = number of trials (random units inspected) = 300
p_bar2 = X2/n2 = 39/300 = 0.13
p_bar2 – p_bar1 = 0.13 – 0.06 = 0.07
Level of Certainty = 0.95

Alpha (α) = 1 – Level of certainty = 1 – 0.95 = 0.05

300
As with all Hypothesis Tests of Proportion, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.

The Initial Two Questions That Must be Answered Satisfactorily


1) What Type of Test Should Be Done?
2) Have All of the Required Assumptions For This Test Been Met?

The Four-Step Method For Solving All Hypothesis Tests of Proportion


Step 1) Create the Null Hypothesis and the Alternate Hypothesis
Step 2) – Map the Normal Curve Based on the Null Hypothesis
Step 3) – Map the Regions of Acceptance and Rejection
Step 4) – Perform the Critical Value Test, the p Value Test, or the Critical z Value Test

The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:

Question 1) Type of Test?


a) Hypothesis Test of Mean or Proportion?
This is a Hypothesis Test of Proportion because each individual observation (the status of each sampled
completed production unit) can have only one of two values: conforming (not defective) or nonconforming
(defective). Samples for Hypothesis Test of Mean can have more than two values and often can assume
any value within a possible range of values.

b) One-Tailed or Two-Tailed Hypothesis?


The problem asks whether the difference between the proportion of defective units from Production Line
A and Production Line B is greater than 5 percent. This is a directional inequality making this
hypothesis test a one-tailed test. This is a one-tailed test in the right tail if the directional inequality is
greater than. If the directional inequality is less than, the one-tailed test will occur in the left tail.
If the problem asks whether the difference between the proportion of defective units from Production Line
A and Production Line B is simply not equal to 5 percent, the inequality is non-directional and the test is
two-tailed. A two-tailed test is more stringent than a one-tailed test.

c) One-Sample or a Two-Sample Test?


This is a two-sample hypothesis test because two independent samples are being compared. One
independent sample included completed units from Production Line A and the other independent sample
included completed units from Production Line B.

d) Pooled Test or an Unpooled Test?


Once a hypothesis test has been determined to be a two-independent-sample test, the test should be
designated as a pooled or unpooled test because each uses different formulas. A pooled Hypothesis Test
of Proportion makes a basic assumption that the proportions of both populations are the same. An
unpooled Hypothesis Test of Proportion makes a basic assumption that the proportions of both
populations are not the same. This is an unpooled test because the proportions defective of the two
production lines are assumed to be different: the Null Hypothesis states that the difference between them
is 5 percent rather than zero.

301
e) t-Test or z-Test?
A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial
distribution that the sampled objects follow. A Hypothesis Test of Proportion will therefore always be a z
Test and not a t Test.
A hypothesis test of proportion will always be a z test because a hypothesis test of proportion always
uses the normal distribution to model the distributed variable. A t Test uses the t distribution to model the
distributed variable.
Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values.
Binary objects are distributed according to the binomial distribution. The binomial distribution can be
approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to
approximate the underlying binomial distribution that the sampled objects follow. A Hypothesis Test of
Proportion will therefore always be a z Test and not a t Test.
This hypothesis test is a z Test that is two-independent-sample, unpooled, one-tailed hypothesis
test of proportion.

Question 2) Test Requirements Met?


Can Binomial Distribution Be Approximated By Normal Distribution?
The samples for a Hypothesis Test of Proportion follow the binomial distribution because each sample
has only two possible outcomes and the probability of the positive outcome is always the same for each
sample taken.
A Hypothesis Test of Proportion approximates the binomial distribution with the normal distribution so that
normal-distribution-based statistical analysis tools such as z Scores can be used.
The most important requirement of a Hypothesis Test of Proportion is the validity of approximating the
binomial distribution with the normal distribution. The binomial distribution can be approximated by the
normal distribution if sample size, n, is large enough and p is not too close to 0 or 1. This can be summed
up with the following rule:
The binomial distribution can be approximated by the normal distribution if np > 5 and nq >5. In this case,
the calculation of np and nq is the following:
Sample 1
X1 = 12
n1 = 200
p1 = 0.06
q1 = 0.94
n1p1 = 12 and n1q1 = 188

Sample 2
X2 = 39
n2 = 300
p2 = 0.13
q2 = 0.87
n2p2 = 39 and n2q2 = 261

302
np > 5 and nq >5 for both samples so it is valid to approximate the binomial distribution with the normal
distribution. Because the binomial distribution can be modeled by the normal distribution, a z Test can be
used to perform a Hypothesis Test of Proportion.
The binomial distribution has the following parameters:
Mean = np
Variance = npq
Each unique normal distribution can be completely described by two parameters: its mean and its
standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
When np is substituted for the normal distribution’s mean and SQRT(npq) is substituted for the normal
distribution’s standard deviation, then the following is true:
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq))
This approximates Binomial (n,p).
The approximation can be demonstrated with Excel using data from the second sample of this problem.
X = 39 = the number of positive outcomes in n trials
n = 300 = the number of trials in one sample
p = 0.13 = expected probability of a positive result in all trials
q = 1 – p = 0.87 = expected probability of a negative result in all trials
The normal approximation of the binomial distribution is as follows:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)

Analyzing the data from Sample 2 would produce the following comparison:
BINOM.DIST(X, n, p, FALSE)
= BINOM.DIST(39, 300, 0.13, FALSE) = 0.067
NORM.DIST(X, np, SQRT(npq), FALSE)
= NORM.DIST(39, 39, SQRT(33.93), FALSE) = 0.068
The difference between these two values is about 0.001, so the approximation is very close.
Note that replacing FALSE with TRUE in the above BINOM.DIST() and NORM.DIST() formulas would
calculate their CDFs (Cumulative Distribution Functions) instead of their PDFs (Probability Density
Functions).

303
We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1) Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical z Value Test

Proceeding through the four steps is done as follows:

Step 1 – Create the Null and Alternative Hypotheses


The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the difference in proportions defective between Production
Line A and Production Line B equals 5 percent. This Null Hypothesis would be written as follows:
H0: p_bar2–p_bar1 = Constant = 0.05
This test is an unpooled test because the Null Hypothesis states that the two sample proportions do not
equal each other. The Constant in the Null Hypothesis for an unpooled test is therefore a nonzero
number.
The formula for Standard Error in an unpooled, two-independent-sample Hypothesis Test of Proportion
does not use a pooled proportion that combines the proportions of both samples as a pooled test would.
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. In this case, the Alternative Hypothesis states that the difference between proportions defective
of Production Line A and Production Line B is greater than 5 percent. This Alternate Hypothesis is as
follows:
H1: p_bar2–p_bar1 > Constant, which equals 0.05
Therefore:
H1: p_bar2–p_bar1 > 0.05
The “greater than” sign indicates that this is a one-tailed hypothesis test and not a two-tailed test.
The Alternative Hypothesis is directional (“greater than” or “less than” instead of “not equal to”) and the
hypothesis test is therefore a one-tailed test. The “greater than” operator in the Alternative Hypothesis
indicates that this one-tailed test will occur in the right tail. A “less than” operator would indicate that this
one-tailed test would occur in the left tail.
It should be noted that a two-tailed test is more rigorous (requires a greater difference between the two
entities being compared before the test shows that there is a difference) than a one-tailed test.
It is important to note that the Null and Alternative Hypotheses refer to the proportion of the populations
from which the samples were taken. A two-independent-sample Hypothesis Test of Proportion
determines whether to reject or fail to reject the Null Hypothesis, which in this unpooled case states that
the proportions of the populations from which the two independent samples were taken differ by the
Constant (here, 0.05).
In this case the Hypothesis Test analyzes whether the total proportion defective of Production Line B
exceeds the total proportion defective of Production Line A by more than 5 percent, based upon much
smaller samples taken from both production lines.

304
Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample proportion difference p_bar2–p_bar1 is distributed according to
the normal distribution. The sample proportions p_bar1 and p_bar2 are each distributed according to the
binomial distribution. The normal distribution can be used to approximate these binomial distributions
because the requirements that np and nq are greater than 5 are met for both samples. The distribution of
the difference p_bar2–p_bar1 can therefore be approximated by the normal distribution.
The sample proportion, p_bar2–p_bar1, will be mapped to a normal distribution. Each unique normal
distribution can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar2–p_bar1 is equal to the
Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar2–p_bar1 = Constant = 0.05
The distributed variable p_bar2–p_bar1 will be mapped to a normal distribution curve with a mean = 0.05,
which is the Constant.
Population parameters such as population standard deviation have to be estimated if only sample data is
available. In this case the population standard deviation will be estimated by the Standard Error which is
based on the sample size.
Standard Error (SEDiff) for an unpooled, two-independent-sample Hypothesis Test of Proportion is
calculated as follows:

SEDiff = SQRT[ (p_bar1*q_bar1/n1) + (p_bar2*q_bar2/n2) ]


SEDiff = SQRT[ (0.13*0.87/300) + (0.06*0.94/200) ]
SEDiff =0.03
The normal distribution curve that maps the distribution of variable p_bar2–p_bar1 now has the following
parameters:
Mean = 0.05
Standard Error = 0.03

305
This Excel-generated normal distribution curve is shown as follows:

Step 3 – Map the Regions of Acceptance and Rejection


The goal of a hypothesis test is to determine whether to accept or reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected.
In this case we are trying to show graphically how different the observed p_bar2–p_bar1 (0.07) is from the
hypothesized p_bar2–p_bar1 (0.05).
This Hypothesis Test of Proportion calculates the probability of observing a difference in sample
proportions equal to the observed p_bar2–p_bar1 (0.07) if the true difference equals 0.05 and the
difference in sample proportions is normally distributed.

The Regions of Acceptance and Rejection


The normal distribution curve that maps the distribution of variable p_bar2–p_bar1 can be divided up into
two types of regions: the Region of Acceptance and the Region of Rejection.
If p_bar2–p_bar1’s observed value of 0.07 falls in the Region of Acceptance, we fail to reject the Null
Hypothesis. If p_bar2–p_bar1 falls in the Region of Rejection, we reject the Null Hypothesis.
This is a one-tailed test in the right tail as indicated by the Alternative Hypothesis. A one-tailed test in the
right tail means that the entire Region of Rejection is contained in the outer right tail.
A hypothesis test is a one-tailed test if the Alternative Hypothesis has a directional operator, i.e., “greater
than” or “less than.” A one-tailed test has the entire Region of Rejection contained in one outer tail. A
“greater than” operator indicates a right-tailed test. A “less than” operator indicates a left-tailed test.
A hypothesis test is non-directional if the Alternative Hypothesis has the non-directional operator “not
equal to.”
In this case the Alternative Hypothesis is the following:
H1: p_bar2–p_bar1 > 0.05
306
This Alternate Hypothesis indicates that the hypothesis test has the Region of Rejection entirely
contained in the outer right tail. The total size of the Region of Rejection is equal to Alpha. In this case
Alpha, α, is equal to 0.05 (not to be mistaken with the Constant of the Null Hypothesis which
coincidentally happens to be 0.05 as well). This means that the Region of Rejection will take up 5 percent
of the total area under this normal distribution curve.

Calculate Critical Values


The Critical Value is the boundary of the Region of Rejection. The Critical Value is the boundary on the
right side of the curve beyond which 5 percent of the total area under the curve exists. In this case the
Critical Value can be found by the following:
Critical Value = Mean + (Number of Standard Errors from Mean to Region of Rejection) * SE
Critical Value = Mean + NORM.S.INV(1-α) * SEDiff
Critical Value = 0.05 + NORM.S.INV(1- 0.05 ) * 0.03
Critical Value = 0.05 + NORM.S.INV(0.95) * 0.03
Critical Value = 0.05 + 0.049
Critical Value = 0.099
The Region of Rejection is therefore everything that is to the right of 0.099.
This normal distribution curve with the blue Region of Acceptance in the center and the yellow Region of
Rejection in the outer right tail is shown as follows:

307
Step 4 – Determine Whether to Reject Null Hypothesis
The object of a hypothesis test is to determine whether to accept or reject the Null Hypothesis. There are
three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:

1) Compare p_bar2–p_bar1 With Critical Value


If the observed value of p_bar2–p_bar1 (0.07) falls into the Region of Acceptance (the blue region under
the curve), the Null Hypothesis is not rejected. If the observed value of p_bar2–p_bar1 falls into the
Region of Rejection (the yellow outer region), the Null Hypothesis is rejected.
The observed p_bar2–p_bar1 (0.07) is closer to the curve’s mean (0.05) than the Critical Value on the
right side (0.099) and falls in the blue Region of Acceptance. We therefore do not reject the Null
Hypothesis. We cannot state with 95 percent certainty that the difference between the proportions
defective of Production Line A and Production Line B is greater than 0.05.

2) Compare z Value With Critical z Value


The Test Statistic for this Hypothesis Test is called the z Value. The z Value is the number of Standard
Errors that the observed p_bar2–p_bar1 is from the mean of 0.05. The Critical z Value is the number of
Standard Errors that the Critical Value is from the mean.
If the z Value is closer to the standardized mean of zero than the Critical z Value, the Null Hypothesis is
not rejected. If the z Value is farther from the standardized mean of zero than the Critical z Value, the Null
Hypothesis is rejected. The Test Statistic is calculated as follows:

z Value (the Test Statistic) = (p_bar2–p_bar1 - Constant) / SEDiff


z Value (the Test Statistic) = (0.13 – 0.06 – 0.05)/0.03
z Value (the Test Statistic) = 0.67
This means that the observed p_bar2–p_bar1 (0.07) is 0.67 Standard Errors from the curve’s mean (0.05).
Critical z Valueα=0.05,one-tailed,right tail = NORM.S.INV(1-α)
[ Note that the one-tailed Critical z Value in the left tail = NORM.S.INV(α) ]
Critical z Valueα=0.05,one-tailed,right tail = NORM.S.INV(0.95) = 1.65
This means that the boundary of the Region of Rejection is 1.65 Standard Errors to the right of the
curve’s mean (0.05).
The z Value (0.67) is closer to the curve’s standardized mean of zero than the Critical z Value (1.65) so
the Null Hypothesis is not rejected.

308
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar2–p_bar1 (0.07) . If the p Value
is smaller than Alpha (since the test is one-tailed), the Null Hypothesis is rejected. If the p Value is larger
than Alpha, the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(0.67,TRUE),1-NORM.S.DIST(0.67,TRUE))
p Value = 0.2523
The p Value (0.2523) is larger than Alpha (0.05) and we therefore cannot reject the Null Hypothesis.
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar2–p_bar1)
is larger than the yellow Alpha Region of Rejection in the outer right tail.

The Null Hypothesis would also not be rejected if the test were two-tailed because a two-tailed test is
more stringent than a one-tailed test. A hypothesis test is more stringent if the Null Hypothesis is harder to
reject.
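The unpooled calculations in this example can be cross-checked with the Python sketch below (X1 = 12,
n1 = 200, X2 = 39, n2 = 300, hypothesized difference = 0.05, α = 0.05). It is only a sketch that assumes
scipy is available; because it does not round the Standard Error to 0.03, its z Value and p Value differ
slightly from the rounded Excel figures above, but the conclusion (fail to reject) is the same.

# Sketch: two-sample, unpooled, one-tailed (right tail) hypothesis test of proportion
from math import sqrt
from scipy.stats import norm

x1, n1 = 12, 200     # Production Line A (experienced crews)
x2, n2 = 39, 300     # Production Line B (inexperienced crews)
constant = 0.05      # hypothesized difference from the Null Hypothesis
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2          # 0.06 and 0.13
q1, q2 = 1 - p1, 1 - p2
diff = p2 - p1                     # 0.07

se_diff = sqrt(p1 * q1 / n1 + p2 * q2 / n2)   # about 0.026 (rounded to 0.03 in the text)
z = (diff - constant) / se_diff               # z Value (Test Statistic)
z_crit = norm.ppf(1 - alpha)                  # about 1.64 for a right-tailed test
p_value = norm.sf(z)                          # right-tail area beyond z

reject = z > z_crit                           # False: fail to reject the Null Hypothesis
print(round(se_diff, 3), round(z, 2), round(p_value, 3), reject)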

309
Chi-Square Independence Test in Excel

Overview
The Chi-Square Independence Test is used to determine whether two categorical variables associated
with the same item act independently on that item. The example presented in this section analyzes
whether the gender of the purchaser of a car is independent of the color of the car. This Chi-Square
Independence Test answers the question of whether gender plays a role in the color selection of a
purchased car.
Each item (each purchased car) has two attributes associated with it. These two attributes are the
categorical variables of purchaser’s gender and color. The counts of the number of cars purchased for
each unique combination of gender and color are placed in a matrix called a contingency table.

Contingency Table
A contingency table is a two-way cross-tabulation. Each row in the contingency table is associated with
one of the levels of one of the categorical attributes (such as gender) and each column is associated with
one of the levels of the other categorical attribute (such as color).
The number of rows in the contingency table, r, is equal to the number of levels of the row attribute. The
number of columns in the contingency table, c, is equal to the number of levels of the column attribute.
The contingency table is therefore an r x c table and has r x c cells representing r x c unique
combinations of levels of row and column attributes.

Test Compares Actual vs. Expected Bin Counts


The Chi-Square Independence Test compares whether counts of the actual data for each unique
combination of factors of the two variables are significantly different than the counts that would be
expected if the attributes were totally independent of each other.

Null Hypothesis
A Null Hypothesis is created which states there is no significant difference between the actual and
expected counts of data for the unique combinations of levels of the two factors.

Test Statistic
The Chi-Square Independence Test calculates a Test Statistic called a Chi-Square Statistic, Χ². The
distribution of this Test Statistic can be approximated by the Chi-Square distribution if several conditions
are met.

When to Reject Null Hypothesis


The Null Hypothesis is rejected if the Chi-Square Statistic is larger than a Critical Chi-Square Value
based upon the specified alpha level and degrees of freedom associated with that test. Equivalently, the
Null Hypothesis is rejected if the p value derived from the test is smaller than the specified alpha level.

310
Required Assumptions
The distribution of this Test Statistic, Χ², can be approximated by the Chi-Square distribution with degrees
of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met (a quick programmatic
check of these conditions is sketched after the list):
1) The number of cells in the contingency table (r x c) is at least 5. A 2 x 2 contingency table is not large
enough. One of the two attributes must have at least 3 levels.
2) The average value of all of the expected counts is at least 5.
3) All of the expected counts equal at least 1.
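The three conditions above are easy to check programmatically once the expected counts are known. The
Python sketch below does so for a hypothetical matrix of expected counts; the numbers are placeholders,
not the data analyzed later in this section, and the sketch assumes numpy is available.

# Sketch: checking the three Chi-Square Independence Test conditions
import numpy as np

# Hypothetical expected counts for an r x c contingency table (placeholder values)
expected = np.array([[40.0, 35.0, 25.0],
                     [60.0, 52.5, 37.5]])

r, c = expected.shape
condition_1 = r * c >= 5              # at least 5 cells, so a 2 x 2 table fails
condition_2 = expected.mean() >= 5    # average expected count is at least 5
condition_3 = (expected >= 1).all()   # every expected count is at least 1
print(condition_1, condition_2, condition_3)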

311
Example of Chi-Square Independence Test in Excel
We will examine whether gender and product color selection are independent of each other. A car
company in the United States sold new 12,000 cars of one brand in one month. The car company
recorded the gender of each customer and also the color of the car. The car was available in only three
colors: red, blue, and green.
The actual counts of cars purchased in that months for each unique combination of gender/color are
shown as follows:

Determine with 95 percent certainty whether the car purchaser’s gender and the selected color of the car are
independent of each other.

Step 1 – Place Actual Counts In Contingency Table


The actual counts of the number of items having each unique combination of row and column attribute
level are placed into the proper cell in the r x c contingency table. In this case the counts of the number of
cars associated with each unique combination of gender/color are placed into the correct cells of the 2 x 3
contingency table as follows:

312
Creating the Contingency Table From an Excel Pivot Table
The contingency table can be created with Excel’s Pivot Table tool if the data are initially presented in the
following fashion, as they often are:

The Pivot Table is accessed from within the Insert tab.


Insert / Pivot Table / Pivot Table brings up the initial Pivot Table dialogue box. The table range and
output location should be filled in as follows:

313
Hitting OK brings up the following final Pivot Table dialogue box:

314
Dragging the label Color down to the Column Labels box and to the Σ Values box and then dragging the
label Gender down to the Row Labels box produces the completed Pivot Table as follows. This Pivot
Table is an exact match of the contingency table containing the actual values for this data set.

Note that the Excel Pivot Table would be an exact match for the contingency table with the actual counts
that is shown again here.

Step 2 – Place Expected Counts In Contingency Table
The expected counts for each unique combination of levels of row/column attributes are placed into the
correct cells of an identical contingency table as follows:

The expected counts are based upon the assumption that the row and column attributes act
independently of each other. The method of calculating the expected counts based upon this
assumption is shown below:
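Under the independence assumption, each cell’s expected count equals (row total × column total) ÷ grand
total. As a minimal Excel sketch, assuming a hypothetical layout in which the actual counts occupy B2:D3,
the row totals occupy E2:E3, the column totals occupy B4:D4, and the grand total of 12,000 is in E4, the
expected count for the upper-left cell could be entered as =($E2*B$4)/$E$4 and copied across the matching
2 x 3 block of expected cells. The mixed references let one formula fill all six cells; these cell addresses
are assumptions for illustration, not the workbook layout shown in the figures.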

Step 3 – Create Null and Alternative Hypotheses


The Null Hypothesis states that there is no difference between the expected and actual counts of items
for each unique combination of levels of row and column attributes. The Test Statistic, Χ², would equal 0
in this case. The Null Hypothesis is therefore specified as follows:
H0: Χ² = 0
The Chi-Square Statistic, Χ², is distributed according to the Chi-Square distribution if certain conditions
are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability
density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined
for positive values of x.
Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the alternative hypothesis
specifies that the Chi-Square Independence Test is a one-tailed test in the right tail and is specified
as follows:
H1: Χ² > 0

Step 4 – Verify Required Assumptions
The distribution of this Test Statistic, Χ², can be approximated by the Chi-Square distribution with
degrees of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met:
1) The number of cells in the contingency table (r x c) is at least 5. The contingency table is a 2 x 3 table
so this condition is met.
2) The average value of all of the expected counts is at least 5. This condition is met.
3) All of the expected counts equal at least 1. This condition is met.

Step 5 – Calculate Chi-Square Statistic, Χ²


The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated for n = r x c unique cells in the
contingency table as follows:
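In text form, Χ² = Σ (Actual_i – Expected_i)² / Expected_i, where the sum runs over all n = r x c cells.
As a minimal Excel sketch, assuming the six actual counts sit in a hypothetical range B2:D3 and the six
corresponding expected counts sit in B7:D8, the entire statistic could be computed in a single cell with
=SUMPRODUCT((B2:D3-B7:D8)^2/B7:D8). These cell addresses are illustrative only and are not the
worksheet layout shown in the figures.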

This can be quickly implemented in a convenient table as follows:

Step 6 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for the Chi-Square Independence Test is calculated as follows:
r = number of rows = 2
c = number of columns = 3
df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2

The Critical Chi-Square Value is calculated as follows:


Chi-Square Critical = CHISQ.INV.RT(α,df)
Chi-Square Critical = CHISQ.INV.RT(0.05,2) = 5.99

Prior to Excel 2010, the formula is calculated as follows:


Chi-Square Critical = CHIINV(α,df)

The p Value is calculated as follows:


p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)
p Value = CHISQ.DIST.RT(6.17,2) = 0.0457

Prior to Excel 2010, the formula is calculated as follows:


p Value = CHIDIST(Chi-Square Statistic,df)

Step 7 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions is shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these conditions exist as follows.

Chi-Square Statistic = 6.17


Critical Chi-Square value = 5.99

p Value = 0.0457
α = 0.05

In this case we reject the Null Hypothesis because the Chi-Square Statistic (6.17) is larger than the
Critical Value (5.99) or, equivalently, the p Value (0.0457) is smaller than Alpha (0.05).
A graphical representation of this problem is shown as follows:

Chi-Square Goodness-of-Fit Tests in Excel

Overview
Chi-Square Goodness-Of-Fit (GOF) tests are hypothesis tests that determine how closely a sample of
data fits a hypothesized distribution. The actual data observations are divided up into groups called bins.
The same total number of data points is then divided up among an identical set of bins in the proportions
that would be expected if the data exactly matched the hypothesized distribution.
The counts of actual data observations in each bin are compared with the expected number of data points
that would be in identical bins if the data exactly matched the hypothesized distribution.

Test Statistic
A Test Statistic called the Chi-Square Statistic, Χ², is calculated based upon the comparison of the counts
of actual data points in each bin and the counts of expected data points in each of the bins. The formula
for the Chi-Square Statistic is as follows:

n = the total number of bins containing expected groupings of data points
Actual_i = the number of actual observed data points that fall into the ith bin
Expected_i = the number of expected data points in the ith bin if the data exactly matched the hypothesized
distribution

Required Assumptions
The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the
following 3 conditions are met:
1) n ≥ 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5

Null Hypothesis
The Null Hypothesis of this hypothesis test states that Χ² = 0. This would mean that actual and expected
counts of data points in each bin are the same. This Null Hypothesis is rejected if either of the following
two equivalent conditions exist:
1) The Chi-Square Statistic is larger than the Critical Chi-Square Value
2) The p Value is smaller than the specified alpha.

Basic Excel Formulas
The formulas for the Critical Chi-Square Value and p Value in Excel are the following:
Critical Chi-Square Value = CHISQ.INV.RT(α, df)
p Value = CHISQ.DIST.RT(Chi-Square Statistic, df)
df = degrees of freedom and is calculated using one of two different formulas depending on which of the
two types of GOF tests is being performed.

The Two Types of GOF Tests

1) Bin Sizes Are Pre-Determined


An example would be to test whether the weekly sales count is uniformly distributed throughout the seven
days of the week. The actual sales count for each day would be compared with expected bins each
containing one-seventh of the total weekly sales count. The sales count for each day would be expected to
equal one-seventh of the week’s total sales count if sales were uniformly distributed throughout the seven
weekdays. This type of a GOF test often starts with the actual observed data already allocated to bins.
This is the case here in that actual sales are grouped at the start into bins each holding the sales of a
separate day. This example will be performed shortly within this section.
df = n - 1
n = number of bins of expected data

2) Bin Sizes Arbitrarily Set To Match a Distribution


An example would be to perform a Chi-Square Goodness-of-Fit Test for normality on a large single group
of data values. This type of a GOF test starts with the actual observed data in a single group and
therefore not yet allocated to bins. The expected bins are created by establishing arbitrary CDF endpoints
of each bin. The upper and lower CDF endpoints of each expected bin determine the total number of data
points that should be placed in each of these expected bins. The actual data values will be grouped in
bins whose endpoints match those of the expected bins. Standardizing the actual observed data points is
a way of simplifying their bin allocation. The Chi-Square GOF Test for Normality will be performed in this
section using this method.
df = n – 1 – m
n = number of bins of expected data
m = number of parameters needed to fully describe the distribution, e.g. m = 2 for the normal distribution,
which is fully described by two parameters: the mean and the standard deviation.

GOF Example – Type 1

Bin Sizes Are Pre-Determined


In this example, sales counts for each weekday have been averaged over the course of an entire year.
The average number of sales for each weekday is shown in the following figure. Determine with 90 percent
certainty whether sales counts are uniformly distributed over the seven days of the week.

Problem Information
Required Level of Certainty = 90 percent
α = 0.10
Actual data observations divided up into 7 bins.
The 7 Actual bins contain the average count of sales that occurred on each of the seven weekdays.
The average number of total sales each week was 105. This is the total number of actual data
observations.

Step 1 – Create Expected Bins


The framework of expected and actual bins must match. There are seven bins containing actual sales
counts for each of the seven days of the week. There must also be seven bins containing expected sales
counts for each of the seven weekdays.

Step 2 – Calculate Counts in Expected Bins


The expected bins contain the data counts that would be expected if the total number of actual data
points (105) were divided up according to the hypothesized grouping, i.e., uniformly distributed among all
seven weekdays.
Each of the seven expected bins will contain the expected number of the daily sales if all of the 105 total
sales are uniformly distributed across seven days. Expected bin counts are calculated as follows:
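Because the hypothesized distribution is uniform, every expected bin receives the same count:
Expected count per bin = 105 / 7 = 15 sales per weekday.
As a minimal Excel sketch, assuming the weekly total of 105 sits in a hypothetical cell B10, each of the
seven expected bins could simply contain =$B$10/7; the cell address is illustrative only.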

Step 3 – Verify Required Assumptions
The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the
following 3 conditions are met:
1) n ≥ 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5
All of these conditions have been met.

Step 4 – Create Null and Alternative Hypotheses


The Null Hypothesis states that the actual distribution of the data matches the hypothesized distribution. The
Null Hypothesis for the Chi-Square GOF is always specified as the following:
H0: Χ² = 0
The Chi-Square Statistic, Χ², is distributed according to the Chi-Square distribution if certain conditions
are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability
density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined
for positive values of x.
Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the alternative hypothesis
specifies that the Chi-Square Goodness-of-Fit Test is a one-tailed test in the right tail and is specified
as follows:
H1: Χ² > 0

Step 5 – Calculate Chi-Square Statistic, Χ²
The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated by the formula shown below as
follows:
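In text form, the formula is again Χ² = Σ (Actual_i – Expected_i)² / Expected_i, summed over the n = 7 bins.
As a minimal Excel sketch, assuming the seven actual daily sales counts occupy a hypothetical range B2:B8
and the seven expected counts (each 15) occupy C2:C8, the statistic could be computed in a single cell with
=SUMPRODUCT((B2:B8-C2:C8)^2/C2:C8). The cell addresses are illustrative only.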

This can be quickly implemented in a convenient table as follows:

Step 6 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for this Chi-Square Goodness-of-Fit Test are calculated as follows:
df = n – 1 = 7 – 1 = 6
n = k = number of expected bins

The Critical Chi-Square Value is calculated as follows:


Chi-Square Critical = CHISQ.INV.RT(α,df)
Chi-Square Critical = CHISQ.INV.RT(0.10,6) = 10.64

Prior to Excel 2010, the formula is calculated as follows:


Chi-Square Critical = CHIINV(α,df)
The p Value is calculated as follows:
p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)
p Value = CHISQ.DIST.RT(11.07,6) = 0.0863

Prior to Excel 2010, the formula is calculated as follows:


p Value = CHIDIST(Chi-Square Statistic,df)

Step 7 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions is shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these equivalent conditions exist as follows:

Chi-Square Statistic = 11.07


Critical Chi-Square value = 10.64

p Value = 0.0863
α = 0.10

In this case we reject the Null Hypothesis because the Chi-Square Statistic (11.07) is larger than the
Critical Value (10.64) or, equivalently, the p Value (0.0863) is smaller than Alpha (0.10).
A graphical representation of this problem is shown as follows:

GOF Example – Type 2

Bin Sizes Arbitrarily Set To Match a Distribution

Chi-Square Goodness-of-Fit Test for Normality

Overview
The first example in this section demonstrated the Chi-Square GOF test being performed for the uniform
distribution. The Chi-Square GOF test can be used to test how well any data sample fits just about any
distribution. Quite often the Chi-Square GOF test is used to test whether a sample of data is normally
distributed. The Chi-Square GOF test for normality is an alternative to other well-known normality tests
such as the Anderson-Darling and Kolmogorov-Smirnov tests.
The Chi-Square GOF test can be used to test whether a data sample can be fitted with any distribution for
which the CDF (Cumulative Distribution Function) can be calculated. The Anderson-Darling and
Kolmogorov-Smirnov tests can only be used to test whether a data sample can be fitted with a continuous
distribution such as the normal distribution. The Chi-Square GOF test can be used with continuous
distributions as well as with discrete distributions such as the binomial and Poisson distributions.

Chi-Square GOF Test for Normality Example in Excel
Determine with 95 percent certainty whether the following sample of data is normally distributed.

Step 1 – Sort and Standardize Data


Data that will be evaluated with the Chi-Square GOF Test for Normality are usually not provided in pre-
determined bins. The bins, or more specifically, the upper and lower boundaries of each of the bins, have
not been established. Sorting and standardizing the data greatly facilitates the development of the bin
specifications.
Sorting the data makes the data’s range and any significant outliers apparent. Outliers judged to be
extreme and therefore non-representative of the data can be removed. Each significant outlier should be
evaluated on a case-by-case basis. Outliers that have been removed and the justifications for removal
should be noted.
Standardizing a data value converts that value to its z Score. A z Score is equal to the number of sample
standard deviations that the value is from the sample mean. Standardizing the data allows bin boundaries
to be based upon increments of sample standard deviation which is fairly intuitive and uncomplicated.

More importantly, converting data values to their z Scores makes it possible to use the normal
distribution’s CDF (Cumulative Distribution Function) to calculate the percentage of total data points that
would be expected to fall into each of the bins. This will be discussed in more detail shortly.

a) Sorting the Data


The raw data can be sorted using the sorting tool available in Excel. This is effective but an even better
way to sort the data is to use the formula shown in the following diagram. Using this formula has the
advantage that the data will be automatically resorted if any of the data are changed. The sorting tool
would need to be reapplied each time any of the data have been changed. The formula can be typed into
the top cell and then quickly copied down to the bottom as follows:
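One possible formula of this kind, assuming the raw data occupy a hypothetical range A2:A27 and the sorted
copy is being built in column B starting at B2, is =SMALL($A$2:$A$27,ROWS($B$2:B2)). Copied down from B2
to B27, ROWS($B$2:B2) evaluates to 1, 2, 3, and so on, so SMALL returns the 1st, 2nd, 3rd, … smallest data
value and the sorted column recalculates automatically whenever the raw data change. The cell addresses
are assumptions and may differ from the layout in the following diagram.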

b) Standardizing the Data
Standardizing the data simply involves subtracting the mean from the data value and then dividing by the
standard deviation. This calculation converts each data value to its z Score. For population data, the z
Score is the number of population standard deviations that the data value is from the population mean.
For sample data, the z Score is the number of sample standard deviations that a data value is from the
sample mean.
The z Scores in this example are calculated from sample data as follows:
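As a minimal sketch, assuming the sorted data occupy a hypothetical range B2:B27, the z Score of the value
in B2 could be calculated with either of these equivalent formulas and copied down:
=(B2-AVERAGE($B$2:$B$27))/STDEV.S($B$2:$B$27)
=STANDARDIZE(B2,AVERAGE($B$2:$B$27),STDEV.S($B$2:$B$27))
STDEV.S is used because these are sample data; the cell addresses are illustrative only.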

Step 2 – Create Bins
Bin creation involves specifying the upper and lower boundaries of each bin into which the actual and
expected values will fall. Sorting and standardizing the data simplify bin creation.
The z Scores of the data range from -1.787 to 2.490. The bins should cover that entire range because
there are no significant outliers among the 26 total z Scores.
The bins need to be large enough so the three required conditions for the Test Statistic to follow the Chi-
Square Distribution are met. The distribution of the Chi-Square Statistic, Χ², can be approximated by the
Chi-Square distribution if the following 3 conditions are met:
1) The number of bins (n) is at least 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5

Establishing dimensions for the bins is an arbitrary process. Four important criteria need to be
considered when establishing the upper and lower boundaries of the bins. These are the following:
1) The bins need to be large enough so the Chi-Square GOF Test’s three conditions will be met.
2) The overall range of all of the bins should be large enough to capture all data points that have not been
removed as outliers.
3) The bins need to be small enough so that the Chi-Square GOF Test will have sufficient power. The
power of a statistical test is equivalent to its sensitivity and is measured as follows:
Power = 1 – β
β is the test’s probability of making a Type II error. A type II error is a false negative or failing to detect a
significant difference.
Power is therefore a statistical test’s probability of not failing to detect a significant difference. This would
be the sensitivity of a test.
4) The distance between upper and lower boundaries for all bins should be as similar as possible.
Establishing optimal dimensions for the bins is dependent on judgment and statistical skill of the person
performing the test. One possible configuration for the bins would be to construct five bins that catch all
data points with z Scores ranging from -2.5 up to 2.5. Each bin would have a range equaling the length of
one z Score. The boundaries of the five bins configured with those dimensions are shown as follows:

It is not yet known whether at least one data point is expected to fall into each of these bins and whether
the average number of data points expected to fall into a bin is at least five. The number of data points
expected to fall into each of the bins if the data were normally distributed will be calculated in Step 4 of
this process.

Step 3 – Determine Actual Count For Each Bin


Since bin dimensions have been established, it is now possible to determine how many of the actual z
scores will fall into each bin. The counts of actual z Scores that have values in the range of each of the
bins should be taken.
A histogram can perform this task because a histogram provides counts of the number of data points
that have values in specified ranges. Histograms in Excel can be implemented in two ways as follows:
1) Excel’s histogram data analysis tool
2) An Excel formula combined with a bar chart
The formula/bar chart method is the better method because all of the output is automatically updated
when the raw data is changed. The histogram tool must be manually re-run each time that raw data is
changed. The two methods are shown in detail as follows:

a) Creating a Histogram With the Excel Histogram Tool


This tool is accessed under the Data tab as follows:
Data tab / Data Analysis / Histogram
A Histogram dialogue box appears. This dialogue box requires the location of the data to be analyzed, the
upper boundaries of each bin, and the location of the upper left corner of the output.
The output of this tool consists of a Frequency chart that contains the counts of the number of data points
that fell into each bin and a histogram bar chart which graphically illustrates the number of data points
that fell into each bin. The input and the output of the histogram tool are shown as follows:

An expanded view of the completed dialogue box is shown as follows:

b) Creating a Histogram With a Formula and Bar Chart


The Excel histogram tool, like all of the data analysis tools, does not automatically update its output when
input data is changed. The tool must be manually rerun to update the output whenever the input data has
been changed.
This inconvenience can be eliminated by using an Excel formula combined with a bar chart to create an
output equivalent to that of the histogram data analysis tool. Excel formulas and charts automatically
update their output when input data is changed.
There are two Excel formulas that can be used to provide a count of data values within a specified range.
The following formula works in versions of Excel from 2007 and later:
=COUNTIFS(range,”lower condition”,range,”upper condition”)
The following formula works in current versions of Excel and versions of Excel prior to 2007:
=SUMPRODUCT((range > lower boundary)*(range <= upper boundary))
Both of these formulas are shown in operation calculating the actual bin counts as follows:
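As a concrete but hypothetical illustration, if the 26 z Scores occupy C2:C27 and one bin runs from a lower
boundary of -1.5 (exclusive) to an upper boundary of -0.5 (inclusive), the two count formulas would be
=COUNTIFS($C$2:$C$27,">-1.5",$C$2:$C$27,"<=-0.5") and
=SUMPRODUCT(($C$2:$C$27>-1.5)*($C$2:$C$27<=-0.5)).
Both return the same count. The cell range and boundary values are assumptions for illustration and not
necessarily the worksheet shown in the following figure.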

The histogram bar chart is an Excel bar chart that is based upon the actual bin counts and the bins’ upper
boundary z Scores.
The bins counts and the bar chart output are automatically updated if any of the raw data have been
changed.
This bar chart is created in Excel as follows:
Insert tab / Column Chart / 2-D Clustered Column Chart/
A blank chart will appear on the worksheet. Right-click on the blank chart.
Select Data / This brings up the Select Data Source dialogue box
On the left side under Legend Entries (Series) select the blank data series / Edit /
In the Series Values box, select Bin Actual Count cells J4 to J8 / OK
On the right side under Horizontal (Category) Axis Labels, select Bin Upper Boundary z Score cells I4 to
I8 / OK
Note that the values in cells I4 to I8 need to start with lower values on top in order to have lower values
on the left side of the x axis.

Step 4 – Determine Expected Count For Each Bin
Standardizing a data value converts that value to its z Score. Converting data values to their z Scores
makes it possible to use the normal distribution’s CDF (Cumulative Distribution Function) to calculate the
percentage of total data points that would be expected to fall into each of the bins.
The normal distribution’s CDF (Cumulative Distribution Function) equals the probability that a sampled point
from a normally distributed population has a value UP TO X given the population’s mean, µ, and standard
deviation, σ.
The normal distribution’s CDF is expressed as F(X,µ,σ).
The normal distribution’s CDF at point X is calculated in Excel as follows:
F(X,µ,σ) = NORM.S.DIST(z Score(X),TRUE)
If data are normally distributed, the percentage of total data points that is expected to lie between XUpper
and XLower is equal to the difference in the CDF values at those two X values. This is equal to the
following:
Percentage of Data between XUpper and XLower = F(XUpper,µ,σ) - F(XLower,µ,σ)
Percentage of Data between XUpper and XLower =
= NORM.S.DIST(z Score(XUpper),TRUE) – NORM.S.DIST(z Score(XLower),TRUE)

This is illustrated in the following example.


Calculate the percentage of total data points in a normally distributed population that lie between the
values of 25 and 30 if the population mean, µ, equals 27 and the population standard deviation, σ, equals
5.
Standardizing the data values converts them to z Scores as follows:
z Score = (X - µ)/σ
z Score(x = 25) = (25 – 27)/5 = -0.4
z Score(x = 30) = (30 – 27)/5 = 0.6

Percentage of Data between XUpper and XLower =


= NORM.S.DIST(z Score(XUpper),TRUE) – NORM.S.DIST(z Score(XLower),TRUE)
= NORM.S.DIST(0.6,TRUE) – NORM.S.DIST(-0.4,TRUE)
= 0.7257 – 0.3446 = 0.3812
= 38.12 %

This is demonstrated in the following diagram which shows that 38.12 percent of the area under the
normal distribution PDF (Probability Density Function) curve lies between x = 25 and x = 30 if µ = 27 and
σ = 5.

The CDF values of the z Scores of the upper and lower bin boundaries are created as follows:

The percentage of the total number of data points in each bin is equal to the percentage of the normal
curve area assigned to each bin if the data are normally distributed.
The percentage of normal curve area assigned to each bin is equal to the CDF of the bin’s upper z Score
minus the CDF of the bin’s lower z Score. This subtraction is performed in the following image.

The expected count of data points in a bin if the data is normally distributed is equal to the total number of
actual data points (26) times the percentage of the total normal curve area assigned to the bin. This
calculation is also performed in the following image:
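As a minimal Excel sketch, assuming a bin’s lower and upper boundary z Scores sit in hypothetical cells H4
and I4 and the total number of data points (26) sits in K1, that bin’s expected count could be computed as
=$K$1*(NORM.S.DIST(I4,TRUE)-NORM.S.DIST(H4,TRUE)). The cell addresses are illustrative only.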

Step 5 – Verify Required Assumptions


The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the
following 3 conditions are met:
1) n ≥ 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5
All of these conditions have been met.

Step 6 – Create Null and Alternative Hypotheses
The Null Hypothesis states that the actual distribution of the data matches the hypothesized distribution. The
Null Hypothesis for the Chi-Square GOF is always specified as the following:
H0: Χ² = 0
The Chi-Square Statistic, Χ², is distributed according to the Chi-Square distribution if certain conditions
are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability
density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined
for positive values of x.
Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the alternative hypothesis
specifies that the Chi-Square Goodness-of-Fit Test is a one-tailed test in the right tail and is specified
as follows:
H1: Χ² > 0

Step 7 – Calculate Chi-Square Statistic, Χ²
The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated over the n = 5 expected bins as
follows:

This can be quickly implemented in a convenient table as follows:

Step 8 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for this Chi-Square Goodness-of-Fit Test are calculated as follows:
df = n – 1 – m = 5 – 1 – 2 = 2
n = k = number of expected bins

The Critical Chi-Square Value is calculated as follows:


Chi-Square Critical = CHISQ.INV.RT(α,df)
Chi-Square Critical = CHISQ.INV.RT(0.05,2) = 5.99
Prior to Excel 2010, the formula is calculated as follows:
Chi-Square Critical = CHIINV(α,df)

The p Value is calculated as follows:


p Value = CHISQ.DIST.RT(Chi-Square Statistic,df)
p Value = CHISQ.DIST.RT(6.60,2) = 0.0369
Prior to Excel 2010, the formula is calculated as follows:
p Value = CHIDIST(Chi-Square Statistic,df)

Step 9 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions is shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these equivalent conditions exist as follows:

Chi-Square Statistic = 6.60


Critical Chi-Square Value = 5.99

p Value = 0.0369
α = 0.05

In this case we reject the Null Hypothesis because the Chi-Square Statistic (6.60) is larger than the
Critical Value (5.99) or, equivalently, the p Value (0.0369) is smaller than Alpha (0.05).
A graphical representation of this problem is shown as follows:

Chi-Square Population Variance Test in Excel

Overview
The Chi-Square Population Variance Test is a hypothesis test used to determine whether the variance of a
normally-distributed population has changed. One common use of this test is to determine whether an
adjustment made to a production line causes a change in variance at some measurement point on the
production line.
The Chi-Square Variance Test can be performed as a one-sample or a two-sample test.
A one-sample test usually involves using a single sample taken from a normally-distributed population to
determine whether the variance of that population has changed from a known variance measured in the
past. The production line example just mentioned is the most common use of the one-sample Chi-Square
Variance Test. In this case the test is most accurate when the benchmark population standard deviation
has been calculated from a stable process over a long period of time.
A two-sample test is used to determine whether two normally-distributed populations have the same
variance. This is known as the F Test.
As with most hypothesis tests, the Chi-Square Population Variance Test can be conducted as a one-tailed
test or a two-tailed test. When this hypothesis test is conducted as a one-tailed test, it is used to
determine whether the population variance has moved in one direction, i.e., the test is being used to
determine only whether the population variance has increased or the test is being used to determine only
whether the population variance has decreased.
When this hypothesis test is being conducted as a two-tailed test, it is being used to determine whether
the population variance has changed in any direction (a one-sample test) or whether two populations
have the same variance (a two-sample test). The two-tailed test is more stringent than the one-tailed test;
the two-tailed test requires more change to reject the Null Hypothesis than a one-tailed test of the same
alpha level. The Null Hypothesis states either that a single population variance has not changed (a one-
sample test) or that two populations have the same variance (an F Test, which is a two-sample test).

One-Sample Chi-Square Population Variance Test


The one-sample Chi-Square Population Variance Test is used to determine if a normally-distributed
population’s variance has changed. A sample of size n is taken from the population and its variance is
measured. A test statistic called the Chi-Square Statistic is then created from n (sample size), s² (sample
variance), and σ² (population variance) as follows:
Chi-Square Statistic = (n – 1) * s² / σ²
This test statistic is called the Chi-Square Statistic because the distribution of this test statistic can be
approximated by the Chi-Square distribution with n-1 degrees of freedom if the population is normally
distributed.
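As a minimal Excel sketch, assuming the sample measurements occupy a hypothetical range A2:A151 and the
known long-term population variance is in cell D1, the Chi-Square Statistic could be computed as
=(COUNT(A2:A151)-1)*VAR.S(A2:A151)/D1. The cell addresses are assumptions; only the summary values n, s²,
and σ² are actually required by the formula above.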
Critical Chi-Square Values are then calculated based upon the degrees of freedom (n-1), alpha, and the
number of tails in the hypothesis test as follows:

Two-tailed test
Left Critical Value = CHISQ.INV(α/2,df)
Right Critical Value = CHISQ.INV(1 – α/2,df)

One-tailed test – Right tail
Critical Value = CHISQ.INV(1 – α,df)

One-tailed test – Left tail


Critical Value = CHISQ.INV(α,df)
The Null Hypothesis of the One-Sample Chi-Square Population Variance Test states that population
variance has not changed. This Null Hypothesis is rejected if the Chi-Square Statistic is outside of a
Critical Value on the same side of the Chi-Square PDF curve. The Null Hypothesis is not rejected if the
Chi-Square Statistic falls inside of the Critical Value on that side of the curve.

Example of 1-Sample, 2-Tailed, Chi-Square Population
Variance Test in Excel
A specific measurement is taken on each unit that is completed from a production line over a long period
of time. These measurements have been determined to be normally distributed with a population variance
= σ² = 0.09.
An adjustment was made to the production line that may have affected the variance of the measurement
taken on each completed unit. A random sample of 150 units from the newly-adjusted production line had
the measurement taken. The sample standard deviation of this 150-unit sample is s = 0.32. Determine with 95
percent certainty whether the population variance has changed in any direction as a result of the
adjustment.

Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.32
Sample Variance = s² = 0.1024
Long-term, Benchmark Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05

Requirement of Population Normality


Before performing this test, it is important to verify that the population of data measurements is normally
distributed. If the data measurements are not normally distributed, this statistical test could produce totally
invalid results.
If the normality of the population cannot be confirmed, the normality of the sample must be confirmed.
Large sample size (n > 30) does not waive the normality requirement as occurs with t Tests.
Common ways to confirm normality of sample data are the following:
An Excel histogram of the sample data in Excel
A normal probability plot of the sample data in Excel
The Kolmogorov-Smirnov test for normality of the sample data in Excel

The Anderson-Darling test for normality of the sample data in Excel
The Shapiro-Wilk test for normality of the sample data in Excel
The above tests are all performed on the sample data in the following section for the F Test, which is a
two-sample, one-tailed Chi-Square Population Variance test.

Non-Parametric Alternatives to 1-Sample Chi-Square Population
Variance Test
When population normality cannot be confirmed, nonparametric alternatives for the one-sample Chi-
Square Population Variance Test include Levene’s Test and the Brown-Forsythe Test.
Levene’s Test and the Brown-Forsythe Test are nonparametric tests that are used to compare variances
of two samples when the F Test’s normality requirement cannot be met. These two nonparametric tests
can also be used in place of the one-sample, Chi-Square Population Variance Test.
Both of these nonparametric tests compare variances between two samples and therefore
require that two data samples be taken for comparison.
The one-sample Chi-Square Population Variance Test must be changed slightly in order to meet the requirement of
two samples to compare. A “Before” sample must now be taken in place of the known population
standard deviation data. The one-sample Chi-Square Population Variance Test compares an “After”
sample with known population variance data. The nonparametric tests require that a “Before” sample be
taken to compare with the “After” sample. The “Before” sample is taken before the adjustment is made
to, for example, a production line. The “After” sample is taken after the adjustment is made.
Levene’s Test and the Brown-Forsythe Test are performed in the next section on the two samples taken
for comparison by the F Test.

Null and Alternative Hypotheses


The Null Hypothesis is created as follows:
H0: σ² = 0.09
σ² equals the variance of the new population of measurements taken after the adjustment was made to
the production line.
This Null Hypothesis states that the new population variance, σ², is the same as the old, long-term
population variance of measurements taken before the adjustment was made.
The Alternative Hypothesis is created as follows:
H1: σ² ≠ 0.09
The non-directional operator, ≠, indicates that this hypothesis test is a two-tailed test. A directional
operator (< or >) would indicate that this hypothesis test is a one-tailed test.

Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Values for this two-tailed test are calculated as follows:
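As a sketch of the calculation (reproducing the values quoted below):
Chi-Square Statistic = (n – 1) * s² / σ² = (149)(0.1024)/(0.09) = 169.5
Left Critical Value = CHISQ.INV(α/2,df) = CHISQ.INV(0.025,149), approximately 117
Right Critical Value = CHISQ.INV(1 – α/2,df) = CHISQ.INV(0.975,149), approximately 184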

These left and right Critical Values are shown in this Chi-Square PDF distribution curve for 149 degrees
of freedom as follows:

The Chi-Square Statistic (169.5) falls inside of the Critical Values (117, 184) and into the red Region of
Acceptance. The Null Hypothesis is therefore not rejected. It cannot be stated with 95 percent certainty
that the variance of the measurement taken from the completed production unit has changed as a result
of the adjustment made to the production line.

Example of 1-Sample, 1-Tailed, Right Tail, Chi-Square
Population Variance Test in Excel
A specific measurement is taken on each unit that is completed from a production line over a long period
of time. These measurements have been determined to be normally distributed with a population variance
= σ² = 0.09.
An adjustment was made to the production line that may have affected the variance of the measurement
taken on each completed unit. A random sample of 150 units from the newly-adjusted production line had
the measurement taken. The sample standard deviation of this 150-unit sample is s = 0.33. Determine with 95
percent certainty whether the population variance has increased as a result of the adjustment.

Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.33
Sample Variance = s² = 0.1089
Long-term Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05

Requirement of Population Normality


Before performing this test, it is important to verify that the population of data measurements is normally
distributed. If the data measurements are not normally distributed, this statistical test could produce totally
invalid results.
If the normality of the population cannot be confirmed, the normality of the sample must be confirmed.
Large sample size (n > 30) does not waive the normality requirement as occurs with t Tests.
Common ways to confirm normality of sample data are the following:
An Excel histogram of the sample data in Excel
A normal probability plot of the sample data in Excel
The Kolmogorov-Smirnov test for normality of the sample data in Excel
The Anderson-Darling test for normality of the sample data in Excel
The Shapiro-Wilk test for normality of the sample data in Excel
The above tests are all performed on the sample data in the following section for the F Test, which is a
two-sample, one-tailed Chi-Square Population Variance test.

Non-Parametric Alternatives to the One-Sample Chi-Square
Population Variance Test
When population normality cannot be confirmed, nonparametric alternatives for the one-sample Chi-
Square Population Variance Test include Levene’s Test and the Brown-Forsythe Test.
Levene’s Test and the Brown-Forsythe Test are nonparametric tests that are used to compare variances
of two samples when the F Test’s normality requirement cannot be met. These two nonparametric tests
can also be used in place of the one-sample, Chi-Square Population Variance Test.
Both of these nonparametric tests compare variances between two samples and therefore
require that two data samples be taken for comparison.
The one-sample Chi-Square Population Variance Test must be changed slightly in order to meet the requirement of
two samples to compare. A “Before” sample must now be taken in place of the known population
standard deviation data. The one-sample Chi-Square Population Variance Test compares an “After”
sample with known population variance data. The nonparametric tests require that a “Before” sample be
taken to compare with the “After” sample. The “Before” sample is taken before the adjustment is made
to, for example, a production line. The “After” sample is taken after the adjustment is made.
Levene’s Test and the Brown-Forsythe Test are performed in the next section on the two samples taken
for comparison by the F Test.

Null and Alternative Hypotheses


The Null Hypothesis is created as follows:
H0: σ² = 0.09
σ² equals the variance of the new population of measurements taken after the adjustment was made to
the production line.
This Null Hypothesis states that the new population variance, σ², is the same as the old, long-term
population variance of measurements taken before the adjustment was made.
The Alternative Hypothesis is created as follows:
H1: σ² > 0.09
The directional operator, >, indicates that this hypothesis test is a one-tailed test in the right tail. A non-
directional operator (≠) would indicate that this hypothesis test is a two-tailed test. The directional
operator, <, would indicate that the hypothesis test is a one-tailed test in the left tail.

Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Value for this one-tailed test are calculated as follows:
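As a sketch of the calculation (reproducing the values quoted below):
Chi-Square Statistic = (n – 1) * s² / σ² = (149)(0.1089)/(0.09) = 180.3
Right Critical Value = CHISQ.INV(1 – α,df) = CHISQ.INV(0.95,149), approximately 178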

This Critical Value is shown in this Chi-Square PDF distribution curve for 149 degrees
of freedom as follows:

The Chi-Square Statistic (180.3) falls outside of the Critical Value (178) and into the blue Region of
Rejection. The Null Hypothesis is therefore rejected. It can be stated with 95 percent certainty that the
variance of the measurement taken from the completed production unit has increased as a result of the
adjustment made to the production line.
It should be noted that the Null Hypothesis would not be rejected if this were a two-tailed test. The Chi-
Square Statistic, 180.3, falls inside of the Chi-Square Critical Values of the two-tailed test (117, 184). The
two-tailed test is more stringent than the one-tailed test. This is the case with nearly every type of
hypothesis test.

Example of 1-Sample, 1-Tailed, Left Tail, Chi-Square
Population Variance Test in Excel
A specific measurement is taken on each unit that is completed from a production line over a long period
of time. These measurements have been determined to be normally distributed with a population variance
= σ² = 0.09.
An adjustment was made to the production line that may have affected the variance of the measurement
taken on each completed unit. A random sample of 150 units from the newly-adjusted production line had
the measurement taken. The sample standard deviation of this 150-unit sample is s = 0.27. Determine with 95
percent certainty whether the population variance has decreased as a result of the adjustment.

Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.27
Sample Variance = s² = 0.0729
Long-term Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05

Requirement of Population Normality


Before performing this test, it is important to verify that the population of data measurements is normally
distributed. If the data measurements are not normally distributed, this statistical test could produce totally
invalid results.
If the normality of the population cannot be confirmed, the normality of the sample must be confirmed.
Large sample size (n > 30) does not waive the normality requirement as occurs with t Tests.
Common ways to confirm normality of sample data are the following:
An Excel histogram of the sample data in Excel
A normal probability plot of the sample data in Excel
The Kolmogorov-Smirnov test for normality of the sample data in Excel
The Anderson-Darling test for normality of the sample data in Excel
The Shapiro-Wilk test for normality of the sample data in Excel
The above tests are all performed on the sample data in the following section for the F Test, which is a
two-sample, one-tailed Chi-Square Population Variance test.

Non-Parametric Alternatives to 1-Sample Chi-Square Population
Variance Test
When population normality cannot be confirmed, nonparametric alternatives for the one-sample Chi-
Square Population Variance Test include Levene’s Test and the Brown-Forsythe Test.
Levene’s Test and the Brown-Forsythe Test are nonparametric tests that are used to compare variances
of two samples when the F Test’s normality requirement cannot be met. These two nonparametric tests
can also be used in place of the one-sample, Chi-Square Population Variance Test.
Both of these nonparametric tests compare variances between two samples and therefore
require that two data samples be taken for comparison.
The one-sample Chi-Square Population Variance Test must be changed slightly in order to meet the requirement of
two samples to compare. A “Before” sample must now be taken in place of the known population
standard deviation data. The one-sample Chi-Square Population Variance Test compares an “After”
sample with known population variance data. The nonparametric tests require that a “Before” sample be
taken to compare with the “After” sample. The “Before” sample is taken before the adjustment is made
to, for example, a production line. The “After” sample is taken after the adjustment is made.
Levene’s Test and the Brown-Forsythe Test are performed in the next section on the two samples taken
for comparison by the F Test.

Null and Alternative Hypotheses


The Null Hypothesis is created as follows:
H0: σ² = 0.09
σ² equals the variance of the new population of measurements taken after the adjustment was made to
the production line.
This Null Hypothesis states that the new population variance, σ², is the same as the old, long-term
population variance of measurements taken before the adjustment was made.
The Alternative Hypothesis is created as follows:
H1: σ² < 0.09
The directional operator, <, indicates that this hypothesis test is a one-tailed test in the left tail. A non-
directional operator (≠) would indicate that this hypothesis test is a two-tailed test. The directional
operator, >, would indicate that the hypothesis test is a one-tailed test in the right tail.

Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Value for this one-tailed test are calculated as follows:
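As a sketch of the calculation (reproducing the values quoted below):
Chi-Square Statistic = (n – 1) * s² / σ² = (149)(0.0729)/(0.09) ≈ 120.7
Left Critical Value = CHISQ.INV(α,df) = CHISQ.INV(0.05,149), approximately 121.8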

This Critical Value is shown in this Chi-Square PDF distribution curve for 149 degrees
of freedom as follows:

The Chi-Square Statistic (120.7) falls outside of (below) the left-tail Critical Value (121.8) and into the blue Region of
Rejection. The Null Hypothesis is therefore rejected. It can be stated with 95 percent certainty that the
variance of the measurement taken from the completed production unit has decreased as a result of the
adjustment made to the production line.
It should be noted that the Null Hypothesis would not be rejected if this were a two-tailed test. The Chi-
Square Statistic, 120.7, falls inside of the Chi-Square Critical Values of the two-tailed test (117, 184). The
two-tailed test is more stringent than the one-tailed test. This is the case with nearly every type of
hypothesis test.

F-Test – 2-Sample, 2-Tailed Chi-Square Population
Variance Test
The variances of two normally-distributed populations can be compared for equality using the F-Test. The
F-Test is a two-sample, two-tailed population variance test. This is a hypothesis test with a Null
Hypothesis stating that the variances of both populations are the same. The Null Hypothesis is shown as
follows:
H0: σ1 = σ2 = σ
Note that population variance = σ²
The F Test is always performed as a one-tailed test in the right tail with the Alternative Hypothesis
constructed as follows:
H1: σ1 > σ2
The F Test is performed as a one-tailed test in the right tail because the sample with the larger standard
deviation of the two samples is designated as sample 1. The population from which that sample was
taken is designated as population 1. The two parameters associated with sample 1 and population 1 are
s1 (sample 1 standard deviation) and σ1 (population 1 standard deviation).
The F distribution describes the distribution of the F statistic, also called the f value. An F statistic can be
calculated if two independent random samples are taken from two normally-distributed populations. The
following parameters associated with the two samples and populations must be determined:
n1 = size of sample 1
n2 = size of sample 2

df1 = degrees of freedom 1 = n1 – 1


df2 = degrees of freedom 2 = n2 - 1

s1 = standard deviation of sample 1


σ1 = standard deviation of population 1
Χ²1 = Chi-Square statistic from population 1 = df1 * s1² / σ1²

s2 = standard deviation of sample 2


σ2 = standard deviation of population 2
Χ²2 = Chi-Square statistic from population 2 = df2 * s2² / σ2²

The F statistic can then be calculated in any of the following four equivalent ways:
f = [ s1² / σ1² ] / [ s2² / σ2² ]
f = [ s1² * σ2² ] / [ s2² * σ1² ]
f = [ Χ²1 / df1 ] / [ Χ²2 / df2 ]
f = [ Χ²1 * df2 ] / [ Χ²2 * df1 ]
The numerator of the F statistic should contain the parameters associated with the larger sample standard deviation.

The distribution of all possible values of the f statistic is called the F distribution, with v1 and v2 degrees of
freedom.
Since the F distribution has the chi-square distribution as a component, many of the chi-square
distribution properties are also properties of the F distribution such as the following:
1) The distribution is non-symmetric.
2) The mean is approximately 1.
3) The F-values are all non-negative.
4) There are two independent degrees of freedom, one for the numerator, and one for the denominator.
5) Each different F distribution has a unique pair of degrees of freedom.
The F Test is a hypothesis test that determines whether the variances of two normally-distributed populations are
significantly different based upon the standard deviations of samples taken from each population.
The F Test is performed by comparing the calculated F statistic to an F Critical Value, Fα(df1,df2). Alpha,
α, is the specified level of significance for the hypothesis test. The Null Hypothesis that the two variances
are the same is rejected if the F statistic is greater than F Critical. Equivalently, the Null Hypothesis is also
rejected if the p Value (the area in the right tail of the F distribution curve that is beyond the F statistic) is
smaller than alpha.
It should be noted that the F Test is extremely sensitive to non-normality. It is very important to verify
normality of both samples or both populations prior to performing an F Test.

F Test Problem in Excel
Determine with 95 percent certainty whether the variances of battery lifetime of Brand A and brand B are
significantly different from each other.

Descriptive statistics run on the above data samples produce the following results:

Example Data
s1² = 286.13 (This sample is designated sample 1 because its variance is larger)
s2² = 232.39
n1 = 16
n2 = 17
df1 = n1 – 1 = 15
df2 = n2 – 1 = 16

Step 2 – Verify Normality of Both Populations


The F Test is extremely sensitive to non-normality and will produce an incorrect result if either population
is not normally distributed. It is therefore very important to verify the normality of both populations prior to
performing the F Test.
If the normality of both populations cannot be confirmed, the normality of both samples must be
confirmed. Large sample size (n > 30) does not waive the normality requirement as occurs with t Tests.
An Excel histogram is the quickest way to attain a rough assessment of the normality of a data sample.
Histograms of both data samples are shown as follows. The histograms indicate that the sample data are
approximately normally distributed. The approximate normality of the sample data suggests that the
populations from which the samples came are also normally distributed, as required by the F Test.
An in-depth analysis of the normality of the sample data will be performed at the end of this section. For
brevity, this F Test’s requirement of population normality will be considered satisfied by the following bell-
shaped Excel histograms of the data from each of the two samples. Excel histograms of both sample
groups are as follows:

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.

Having confirmed the F Test’s requirement of normality of both populations, the F Test can be conducted
as follows:

Step 3 – Create the Null and Alternative Hypotheses


H0: σ1 = σ2
H1: σ1 > σ2 – indicates that this is a one-tailed test in the right tail

Step 4 – Calculate the F Statistic


f = [ s1² / σ1² ] / [ s2² / σ2² ]
s1 is larger than s2 and should therefore go in the numerator. Since the Null Hypothesis states that the
population variances, σ1 and σ2, are equal, the F statistic can be reduced to the following:
F Statistic = f = s1² / s2²
f = 286.13 / 232.39 = 1.226

Step 5 – Calculate F Critical


F Critical = Fα(df1, df2) = Fα=0.05(df1 = 15, df2 = 16)
F Critical = Fα(df1, df2) = F.INV.RT(α, df1, df2) = F.INV.RT(0.05,15,16) = 2.352

Step 6 – Compare the F Statistic to F Critical


F Statistic (f = 1.226) is smaller than F Critical (2.352) so the Null Hypothesis is not rejected. There is not
sufficient evidence at α = 0.05 to state that the variances of the two populations (the battery lifetimes of
brand A and brand B) are significantly different.
Equivalently, the p value can be compared to alpha as follows:
p Value = F.DIST.RT(F statistic, df1, df2) = F.DIST.RT(1.226,15,16) = 0.345
The p Value (0.345) is larger than alpha (0.05) so the Null Hypothesis is not rejected.
This result is shown on a graph of the F distribution with df1 = 15 and df2 = 16 as follows:

The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value
shown in the Excel F Test output equals 0.345. This is much larger than the Alpha (0.05) that is typically
used for an F Test so the Null Hypothesis cannot be rejected. A p Value of 0.345 indicates that, if the two
population variances were actually equal, there would be a 34.5 percent probability of observing a difference
in sample variances at least this large purely as the chance result of random sampling from each population.
Rejecting the Null Hypothesis on this evidence would therefore carry a 34.5 percent risk of a Type I error,
i.e., a false positive.
The p Value needs to be no larger than 0.05 to be at least 95 percent certain that the test’s indication of a
difference between the population variances is a true result. A p Value of 0.345 indicates that only 65.5
percent certainty exists that a difference between the population variances really exists.

Performing the F Test With the Data Analysis F Test Tool
The F Test can be performed in one step by using the Excel Data Analysis F Test tool. This tool can be
accessed under the Data tab as follows:
Data tab / Data Analysis / F Test Two Sample for Variances
The F Test dialogue box then appears and should be completed as follows:

Hitting the OK button will produce the following output. Directly below the output are the calculations that
duplicate the output created by this tool.

In-Depth Analysis of Sample Normality
The F Test is extremely sensitive to non-normality of either population from which the samples were
taken. A population’s normality is confirmed when a sample taken from that population is shown to be
normally distributed. The preceding F test was performed on the basis of bell-shaped histograms of each
of the two samples’ data. Other methods of confirming sample normality are shown as follows:
Evaluating the Normality of the Sample Data
The following five normality tests will be performed on the sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.

Excel Histogram
The quickest way to evaluate normality of a sample is to construct an Excel histogram from the sample
data. This is shown as follows:
Excel histograms of both sample groups are as follows:

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:

Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:

Normal probability plots for both sample groups show that the data appears to be very close to being
normally distributed. The actual sample data (red) matches very closely the data values that would occur if
the sample were perfectly normally distributed (blue) and never goes beyond the 95 percent confidence interval
boundaries (green).

Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to the Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
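A minimal worksheet layout for this comparison, assuming the sorted sample occupies the placeholder range A2:A17 with the empirical CDF in column B, the expected CDF in column C, and the difference in column D, is the following:
Empirical CDF: B2 = COUNTIF(A$2:A$17, "<=" & A2) / COUNT(A$2:A$17)
Expected CDF: C2 = NORM.DIST(A2, AVERAGE(A$2:A$17), STDEV.S(A$2:A$17), TRUE)
Difference: D2 = ABS(B2 - C2)
Max Difference = MAX(D2:D17)
Copying rows 2 through 17 down the columns and taking the maximum of column D produces the statistic that is compared with the Kolmogorov-Smirnov Critical Value. Some versions of the test also compare the expected CDF with the step just below each data point, (k - 1)/n, and use the larger of the two differences.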

Variable 1 - Brand A Battery Lifetimes

0.0885 = Max Difference Between Actual and Expected CDF


16 = n = Number of Data Points
0.05 = α

368
Variable 2 - Brand B Battery Lifetimes

0.1007 = Max Difference Between Actual and Expected CDF


17 = n = Number of Data Points
0.05 = α

369
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data are normally distributed, is rejected only if the maximum difference between the expected and actual CDF of any of the data points exceeds the Critical Value for the given n and α. That is not the case here.
The Max Difference Between the Actual and Expected CDF for Variable 1 (0.0885) and for Variable 2 (0.1007) are both well below the Kolmogorov-Smirnov Critical Value at α = 0.05 for samples of these sizes (approximately 0.33 for n = 16 and 0.32 for n = 17), so the Null Hypothesis of the Kolmogorov-Smirnov Test for each of the two sample groups cannot be rejected.

370
Anderson-Darling Test For Normality in Excel
The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberrations in the outer values than the A-D test.

If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
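The Anderson-Darling statistic is not a built-in Excel function, but it can be assembled from worksheet formulas. Under the usual formulation for a normality test in which the mean and standard deviation are estimated from the sample, and with the n data points sorted in ascending order (X1 ≤ X2 ≤ … ≤ Xn), the statistic is:
A^2 = -n - (1/n) * Σ (2k - 1) * [ LN( F(Xk) ) + LN( 1 - F(X(n+1-k)) ) ],   summed for k = 1 to n
A* = A^2 * (1 + 0.75/n + 2.25/n^2)
where F(Xk) is the NORM.DIST() value defined above. Each bracketed term can be calculated in its own worksheet row and totaled with SUM(); the adjustment factor (1 + 0.75/n + 2.25/n^2) is what allows A* to be compared against the fixed critical values listed later in this section.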

Variable 1 – Brand A Battery Lifetimes

Adjusted Test Statistic A* = 0.174

371
Variable 2 - Brand B Battery Lifetimes

Adjusted Test Statistic A* = 0.227

Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given α.
The Adjusted Test Statistic (A*) for Variable 1 (0.174) and for Variable 2 (0.227) are both well below the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test for each of the two sample groups cannot be rejected.

372
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance compared with other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis of normality when the data are in fact not normally distributed a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
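The W statistic is not a built-in Excel function. In its general form it is the ratio
W = ( Σ ak * X(k) )^2 / Σ ( Xi - sample mean )^2
where X(k) are the sample values sorted in ascending order and the coefficients ak come from published Shapiro-Wilk tables for the given sample size. The denominator can be calculated in Excel simply as DEVSQ(sample range); the numerator requires the tabled coefficients, which is why the test is normally carried out with a prepared worksheet template rather than typed in from scratch.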

Variable 1 – Brand A Battery Life

0.972027 = Test Statistic W


0.887 = W Critical for the following n and Alpha
16 = n = Number of Data Points
0.05 = α

373
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.972027) is larger than W Critical (0.887). The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state that the data are not normally distributed with a confidence level of 95 percent.

Variable 2 – Brand B Battery Life

0.971481 = Test Statistic W


0.892 = W Critical for the following n and Alpha
17 = n = Number of Data Points
0.05 = α

The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.971481) is larger than W Critical (0.892). The Null Hypothesis therefore cannot be rejected. There is not enough evidence to state that the data are not normally distributed with a confidence level of 95 percent.

374
Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be
removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-
distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.

375
Nonparametric Alternatives to the F Test
The F Test is extremely sensitive to non-normality of either population. When the normality of the populations or of the sample data cannot be confirmed, sample variances can be compared using the nonparametric Levene’s Test and also the nonparametric Brown-Forsythe Test.
It is often a good idea to perform at least one of these tests along with the F Test even when sample or
population normality has been confirmed.
Levene’s Test and the Brown-Forsythe Test will be performed on the sample data as follows:

Levene’s Test For Sample Variance Comparison in Excel


Levene’s Test is a hypothesis test commonly used to test for the equality of variances of two or more
sample groups. Levene’s Test is more robust against non-normality of data than the F Test.
The Null Hypothesis of Levene’s Test is that the average distance to the sample mean is the same for each sample group. Acceptance of this Null Hypothesis implies that the variances of the sampled groups are the same. The distance to the mean for each data point of both samples is shown as follows:

Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:

376
The Null Hypothesis of Levene’s Test states that the average distance to the mean is the same for the two groups. Acceptance of this Null Hypothesis would imply that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6472. This is much larger than the Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of Levene’s Test that the variances are the same or, at least, that we do not have enough evidence to state with 95 percent certainty that the variances are different. Levene’s Test is sensitive to outliers because it relies on the sample mean, which can be unduly affected by outliers. A very similar nonparametric test called the Brown-Forsythe Test relies on sample medians and is therefore much less affected by outliers than Levene’s Test is, and much less affected by non-normality than the F Test is.
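A minimal sketch of the worksheet formulas behind Levene’s Test, assuming the two sample groups occupy the placeholder ranges A2:A17 and B2:B18, is the following. Each data point is replaced by its absolute distance to its own group’s mean:
Distance for a data point in the first group: = ABS(A2 - AVERAGE(A$2:A$17))
Distance for a data point in the second group: = ABS(B2 - AVERAGE(B$2:B$18))
Copying each formula down its own column produces the two groups of distances on which the ANOVA: Single Factor tool is then run, as was done above.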

377
Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two or more sample groups. The Null Hypothesis of the Brown-Forsythe Test is that the average distance to the sample median is the same for each sample group. Acceptance of this Null Hypothesis implies that the variances of the sampled groups are the same. The distance to the median for each data point of both samples is shown as follows:

The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:

378
The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median is the same for the two groups. Acceptance of this Null Hypothesis would imply that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6627. This is much larger than the Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis cannot be rejected.
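Computationally, the only difference from Levene’s Test is that each data point’s distance is measured from its group’s median rather than its mean. Using the same placeholder ranges as before (A2:A17 and B2:B18):
Distance for a data point in the first group: = ABS(A2 - MEDIAN(A$2:A$17))
Distance for a data point in the second group: = ABS(B2 - MEDIAN(B$2:B$18))
The ANOVA: Single Factor tool is then applied to the two distance columns exactly as in Levene’s Test.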

379
Check Out the Latest Book in the Excel Master Series!

Click Here To Download This 200+ Page Excel Solver Optimization Manual Right Now for $19.95

http://37.solvermark.pay.clickbank.net/

For anyone who wants to be performing optimization at a high level with the Excel Solver quickly, Step-By-Step Optimization With Excel Solver is the e-manual for you. It is a hands-on, step-by-step, complete guidebook for both beginner and advanced Excel Solver users. This book is perfect for the many students who are now required to be proficient in optimization in so many majors, as well as for industry professionals who have an immediate need to get up to speed with advanced optimization in a short time frame.
Step-By-Step Optimization With Excel Solver is a 200+ page .pdf e-manual of simple yet thorough explanations of how to use the Excel Solver to solve today’s most widely known optimization problems. Loaded with screen shots that are coupled with easy-to-follow instructions, this .pdf e-manual will simplify many difficult optimization problems and make you a master of the Excel Solver almost immediately.
The author of Step-By-Step Optimization With Excel Solver, Mark Harmon, was the Internet marketing manager for several years at the company that created the Excel Solver and that still develops it for Microsoft Excel today. He shares his deep knowledge of and experience with optimization using the Excel Solver in this book.
Here are just some of the Solver optimization problems that are solved completely, with simple-to-understand instructions and screen shots, in this book:
● The famous “Traveling Salesman” problem using Solver’s Alldifferent constraint and the Solver’s
Evolutionary method to find the shortest path to reach all customers. This also provides an advanced use
of the Excel INDEX function.

380
● The well-known “Knapsack Problem,” which shows how to optimize the use of limited space while satisfying numerous other criteria.
● How to perform nonlinear regression and curve-fitting with the Solver using the Solver’s GRG Nonlinear solving method.
● How to solve the “Cutting Stock Problem” faced by many manufacturing companies who are trying to
determine the optimal way to cut sheets of material to minimize waste while satisfying customer orders.
● Portfolio optimization to maximize return or minimize risk.
● Venture capital investment selection using the Solver’s Binary constraint to maximize Net Present Value
of selected cash flows at year 0. Clever use of the If-Then-Else statements makes this a simple problem.
● How to use the Solver to minimize the total cost of purchasing and shipping goods from multiple suppliers to multiple locations.
● How to optimize the selection of different production machines to minimize cost while fulfilling an order.
● How to optimally allocate a marketing budget to generate the greatest reach and frequency or number
of inbound leads at the lowest cost.

Step-By-Step Optimization With Excel Solver has complete instructions and numerous tips on every aspect of operating the Excel Solver. You’ll fully understand the reports and know exactly how to tweak all of the Solver’s settings for total custom use. The book also provides lots of inside advice and guidance on setting up the model in Excel so that it will be as simple and intuitive as possible to work with. All of the optimization problems in this book are solved step-by-step using a 6-step process that works every time.
In addition to detailed screen shots and easy-to-follow explanations on how to solve every optimization
problem in this e-manual, a link is provided to download an Excel workbook that has all problems
completed exactly as they are in this e-manual.
Step-By-Step Optimization With Excel Solver is exactly the e-manual you need if you want to be
optimizing at an advanced level with the Excel Solver quickly.

Reader Testimonials

"Step-By-Step Optimization With Excel Solver is the "Missing Manual" for the Excel Solver. It is pretty
difficult to find good documentation anywhere on solving optimization problems with the Excel Solver.
This book came through like a champ!
Optimization with the Solver is definitely not intuitive, but this book is. I found it very easy to work through
every single one of the examples. The screen shots are clear and the steps are presented logically. The
downloadable Excel spreadsheet with all examples completed was quite helpful as well.
Once again, it is really amazing how little understandable documentation there is on doing real-life optimization problems with Solver.
For example, just try to find anything anywhere about the well-known Traveling Salesman Problem (a
salesman needs to find the shortest route to visit all customers once).

381
It is a tricky problem for sure, but this book showed a quick and easy way to get it done. I'm not sure I would have ever figured that problem out, or some of the other problems in the book, without this manual.
I can say that this is the book for anyone who wants or needs to get up to speed on an advanced level quickly with the Excel Solver. It appears that every single aspect of using the Solver is covered thoroughly and yet simply. The author presents a lot of tricks in how to set the correct Solver settings to get it to do exactly what you want.
The book flows logically. It's an easy read. Step-By-Step Optimization With Excel Solver got me up to speed on the Solver quickly and without too much mental strain at all. I can definitely recommend this book."
Pam Copus
Sonic Media Inc

“As a graduate student in the Graduate Program in International Studies (GPIS) at Old Dominion University, I'm required to have a thorough knowledge of Excel in order to use it as a tool for interpreting data and conducting research and analysis.
I've always found the Excel Solver to be one of the more difficult Excel tools to totally master. Not any more. This book was so clearly written that I was able to do almost every one of the advanced optimization examples in the book as soon as I read through it once.
I can tell that the author really made an effort to make this manual as intuitive as possible. The screen
shots were totally clear and logically presented.
Some of the examples that were very advanced, such as the venture capital investment example, had
screen shot after screen shot to ensure clarity of the difficult Excel spreadsheet and Solver dialogue
boxes.
It definitely was "Step-By-Step" just like the title says. I must say that I did have to cheat a little bit and look at the Excel spreadsheet with all of the book's examples that is downloadable from the book. The spreadsheet was also a great help.
Step-By-Step Optimization With Excel Solver is not only totally easy to understand and follow, but it is also very complete. I feel like I'm a master of the Solver. I have purchased a couple of other books in the Excel Master Series (the Excel Statistical Master and the Advanced Regression in Excel book) and they have all been excellent.
have all been excellent.
I am lucky to have come across this book because the graduate program that I am in has a number of
optimization assignments using the Solver. Thanks Mark for such an easy-to-follow and complete book
on using the Solver. It really saved me a lot of time in figuring this stuff out."
Federico Catapano
Graduate Student
International Studies Major
Old Dominion University
Norfolk, Virginia

"I'm finished with school (Financial Economics major) and currently work for a fortune 400 company as a
business analyst. I find that the statistics and optimization manuals are indispensable reference tools
throughout the day.

382
I keep both e-manuals loaded on my iPad at all times just in case I have to recall a concept I don't use all the time. It's easier to recall the concepts from the e-manuals rather than trying to sift through the convoluted banter in a textbook, and for that I applaud the author!
In a business world where I need on-demand answers now, this optimization e-manual is the perfect tool. I just recently used the bond investment optimization problem to build a model in Excel and help my VP understand that a certain process we're doing wasn't maximizing our resources.
That's the great thing about this manual: you can use any practice problem (with a little outside thinking) to mold it into your own real-life problem and come up with answers that matter in the workplace!"
Sean Ralston
Sr. Financial Analyst
Enogex LLC
Oklahoma City, Oklahoma

"Excel Solver is a tool that most folks never use. I was one of those people. I was working on a project,
and was told that solver might be helpful. I did some research online, and was more confused than ever. I
started looking for a book that might help me. I got this book, and was not sure what to expect.
It surpassed my expectations! The book explains the concepts behind the solver, the best way to set up
the "problem", and how to use the tool effectively. It also gives many examples including the files. The
files are stored online, and you can download them so you can see everything in Excel.
The author does a fantastic job on this book. While I'm not a solver "expert", I am definitely much smarter
about it than I was before. Trust me, if you need to understand the solver tool, this book will get you
there."
Scott Kinsey
Missouri

“The author, Mark, has a writing style that is easy to follow, simple, understandable, with clear examples
that are easy to follow. This book is no exception.
Mark explains how the Solver works, the different types of solutions that can be obtained and when to use one or another, and explains the content and meaning of the reports available. Then he presents several examples, goes about defining each problem, setting it up in Excel and in the Solver, and interpreting the solution.
It is a really good book that teaches you how to apply solver (linear programming) to a problem.”
Luis R. Heimpel
El Paso, Texas

383
Click Here To Download This 200+ Page Excel Solver Optimization Manual Right Now for $19.95

http://37.solvermark.pay.clickbank.net/

384
Meet the Author

Mark Harmon is a university statistics instructor and statistical/optimization consultant. He was the Internet marketing manager for several years at the company that created the Excel Solver and that still develops that add-in for Excel today. He has made contributions to the development of Excel over a long period of time, dating all the way back to 1992, when he was one of the beta users of Excel 4, creating the sales force deployment plan for the introduction of the anti-depressant drug Paxil into the North American market.
Mark Harmon is a natural teacher. As an adjunct professor, he spent five years teaching more than thirty
semester-long courses in marketing and finance at the Anglo-American College in Prague, Czech
Republic and the International University in Vienna, Austria. During that five-year time period, he also
worked as an independent marketing consultant in the Czech Republic and performed long-term
assignments for more than one hundred clients. His years of teaching and consulting have honed his
ability to present difficult subject matter in an easy-to-understand way.
This manual got its start when Mark Harmon began conducting statistical analysis to increase the effectiveness of the various types of Internet marketing that he was performing during the first decade of the 2000s. Mark initially formulated the practical statistical guidelines for his own use but eventually realized that others would also greatly benefit from this step-by-step collection of statistical instructions that really did not seem to be available elsewhere. Over the course of a number of years and several editions, this instruction manual blossomed into the Excel Master Series of graduate-level, step-by-step, complete, practical, and clear guidebooks that it is today.
Mark Harmon received a degree in electrical engineering from Villanova University and an MBA in marketing from the Wharton School.
Mark is an avid fan of the beach life and can nearly always be found by a warm and sunny beach.

385
