12T Hypoth FT 2PropTests
Hypothesis Tests:
One Factor - Two Groups
F-tests (variances)
t-tests (means)
Two Proportions
Topic Motivation:
Which Group is performing better?
Case I:
Measure: Time to reconcile account, days
Group A (International Accounts): Average = 8.5 days
Group B (Domestic Accounts): Average = 9.0 days
Based on 20 samples from each group taken over 1 month
(lower the better)
Case II:
Measure: % Defective (two proportions)
Shift 1 = 3.2%
Shift 2 = 3.0%
Based on 200 samples taken over 1 month
Comments:
Topics
Common Approach:
Step 1 (Test variances):
If have two groups of independent samples and may assume
underlying distribution is Normal, use F-test to test for
differences in the two variances
If concerned about normality, may use Levene Test
Step 2 (Test means):
If data from the two groups are independent - use independent
samples t-test either:
Assuming equal variances
Assuming unequal variances
Note: If data are paired - use Paired t-test
Paired: Same physical unit in both groups
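As a rough illustration of the two-step approach above, the following Python sketch computes the variance ratio used by the F-test and the unequal-variance (Welch) t statistic with only the standard library. The data values and function names are hypothetical, and p-values are left out because they require an F or t distribution from a statistics package.

```python
import math
import statistics


def variance_ratio(a, b):
    """Step 1: ratio of sample variances (F statistic) with its two degrees of freedom."""
    f_stat = statistics.variance(a) / statistics.variance(b)
    return f_stat, len(a) - 1, len(b) - 1


def welch_t(a, b):
    """Step 2 (unequal variances): t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    qa = statistics.variance(a) / na  # squared standard error contribution, group a
    qb = statistics.variance(b) / nb  # squared standard error contribution, group b
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return t_stat, df


# Hypothetical measurements for two independent groups (illustration only)
group_a = [8.1, 9.4, 7.6, 8.8, 9.0, 8.3]
group_b = [9.9, 8.2, 10.4, 7.5, 9.6, 11.0]

f_stat, df1, df2 = variance_ratio(group_a, group_b)
t_stat, df = welch_t(group_a, group_b)
print(f"F = {f_stat:.2f} (DF1 = {df1}, DF2 = {df2})")
print(f"Welch t = {t_stat:.2f} (df = {df:.1f})")
```

If the variances can be assumed equal, the pooled statistic would replace `welch_t`; if the data are paired, neither function applies and the differences within each pair are tested instead.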
Case Study
Document Handling Time (DHT)
Background: Financial services firm is required to check that original Trade
Fund documents meet Govt compliance requirements
Process: Review request receive time to notification sent to Adviser
Metric: Time required to check document compliance (DHT)
Scope: Internal Requirement: DHT < 10 min per review of Std. Document
Control Factors
Location (Region A vs. Region B)
Type of Customer (VIP vs. Regular)
Time of year (Peak vs Non-peak)
Staff Skill Level (Expert vs. Proficient)
Submission In Good Order vs. Not
Language (E vs. F)
Multi-Box Plot
Another Way:
What do non-overlapping 95% Confidence Intervals of Std. Dev. suggest?
Assume Normal; assume $\alpha = 0.05$
Method  DF1  DF2  Statistic  P-Value
F       163  45   1.60       0.067
Region (A vs. B)
Language (E vs. F)
Staff Experience (Proficient vs. New Hire)
VIP Status (Yes=1 vs. No=0)
Document Not in Good Order/Rework (Yes=1, No=0)
Measurement System Study: Automated Collection System vs. Manual
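Pooled t-test
(Independent two-sample test)
Pooled t-tests assume equal variances
(pooled variance $S_p^2$)
Ho : $\mu_1 = \mu_2$
Ha : $\mu_1 \neq \mu_2$
Test Statistic:
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$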
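The pooled test statistic can be sketched from summary statistics alone. The function name and the input values below are hypothetical and chosen only to make the arithmetic easy to follow; the p-value is omitted because it requires a t distribution with n1 + n2 - 2 degrees of freedom.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic, assuming equal within-group variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    sp = math.sqrt(sp2)
    t_stat = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t_stat, sp, df


# Hypothetical summary statistics (illustration only)
t_stat, sp, df = pooled_t(mean1=10.0, s1=2.0, n1=10, mean2=12.0, s2=2.0, n2=10)
print(f"t = {t_stat:.3f}, Sp = {sp:.3f}, df = {df}")
```

When both groups have the same standard deviation, as in this made-up case, Sp equals that common value, which is a quick sanity check on the pooling step.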
Given our two variance test results, which test might you recommend?
(Note: Or, choose independently of F-test results based on
understanding of the process)
Minitab: t-test
(Region A vs. B)
Ho : $\mu_1 = \mu_2$    Ha : $\mu_1 \neq \mu_2$
Check box if assuming equal variances
Minitab: STAT >> Basic Statistics >> Two Sample t
Interval Plots
Another way to compare Means is Interval Plot (Minitab Graphs)
What is the visual indicator of a significant difference?
Will discuss in Power and Sample Size Planning
Provides similar information plus interpretations/graphical output
Conclusion: Fail to Reject Ho (i.e., not enough evidence to conclude means are different)
Ho and Ha are both stated in terms of $\mu_A$ and $\mu_B$.
The alternative Ha is the difference of interest to test.
Minitab Assistant:
One-Sided (one-tail) Test
Important: Alternative Ha is what you are interested in concluding!
Suppose we wish to test if New Hires take longer to complete review
Minitab Results: One-Sided Test
Assume $\alpha = 0.05$
P-value = 0.005
Mean Difference Effect = 4.84 (= 16.64 - 11.80)
$\mu_D = \mu_1 - \mu_2$
Ho : $\mu_D = 0$    Ha : $\mu_D \neq 0$
Test Statistic:
$t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$
Here, $\bar{d}$ is the average difference
$s_d$ is the std. dev. of the differences
$n$ is the number of paired samples
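A minimal sketch of the paired test statistic, using the definitions above (average difference, standard deviation of the differences, number of pairs). The paired values and the function name are hypothetical; the p-value is omitted because it needs a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic: mean of the differences over its standard error."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)   # average difference
    s_d = statistics.stdev(diffs)    # std. dev. of the differences
    t_stat = d_bar / (s_d / math.sqrt(n))
    return t_stat, d_bar, s_d, n


# Hypothetical measurements on the same five units under two conditions
condition_1 = [12.0, 15.0, 11.0, 14.0, 13.0]
condition_2 = [11.0, 13.0, 8.0, 12.0, 11.0]

t_stat, d_bar, s_d, n = paired_t(condition_1, condition_2)
print(f"d-bar = {d_bar:.2f}, sd = {s_d:.3f}, n = {n}, t = {t_stat:.3f}")
```

Because only the differences enter the calculation, unit-to-unit variation that is common to both conditions drops out, which is the reason for pairing.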
Minitab Results
Paired t-test
Sample X N Sample p
1 187 479 0.390397
2 214 1128 0.189716
Ha: p1 > p2
Minitab Results
Assume $\alpha = 0.05$
Sample  X    N     Sample p
1       187  479   0.390397
2       214  1128  0.189716
Difference = p (1) - p (2)
Estimate for difference: 0.200680
95% lower bound for difference: 0.159293
Test for difference = 0 (vs > 0): Z = 8.50
P-Value = 0.000
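The reported figures can be approximated with a short standard-library sketch. It assumes a pooled standard error for the Z statistic and an unpooled standard error for the lower confidence bound; that combination is an assumption made here because it lands close to the values shown, not something the output itself states. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """One-sided test of Ha: p1 > p2 using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Assumption: pooled proportion for the test statistic
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 1 - NormalDist().cdf(z)
    # Assumption: unpooled standard error for the one-sided lower bound
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    lower = diff - NormalDist().inv_cdf(1 - alpha) * se_unpooled
    return diff, z, p_value, lower


diff, z, p_value, lower = two_proportion_test(187, 479, 214, 1128)
print(f"Estimate for difference: {diff:.6f}")
print(f"Z = {z:.2f}, P-Value = {p_value:.3f}")
print(f"95% lower bound for difference: {lower:.6f}")
```

Since the lower bound is above zero and the p-value is below 0.05, both views point to the same conclusion: reject Ho in favor of p1 > p2.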
Summary
Hypothesis Tests for differences between two groups:
F-test: Test two variances
T-test: Test two means (equal variances, unequal variances, paired)
Two Proportion Test: Test two proportions
Different types of t-tests exist based on whether:
Samples are independent or dependent (e.g., Paired t-test uses same units
in each group)
Within group variances are assumed equal or unequal
Results may be affected by the test setup: alpha error, sample size, 1-sided vs. 2-sided tests,
and how representative the sample is of the population
Hypothesis tests (e.g., F-test, t-test, two proportion test) provide a tool
to assess if a difference is statistically significant
Ultimately, users must determine if statistically significant also
implies practically significant
Appendix:
Effect Size, ni, and Practical Significance
*Defining very large sample size for a continuous variable would depend on the
standard deviation and test result implications though generally a sample size of 10,000
(or perhaps 1000) may be viewed as very large. Of course, regardless of sample size,
one could have statistically significant but not practically meaningful results
Factor             Mean 1  Mean 2  Pooled S*  Mean Difference  |Cohen's d|  Effect Category
Region (A vs B)    11.52   13.84   5.57       -2.32            0.42         Small
Language (E vs F)  12.15   11.66   5.65       0.49             0.09         Minimal or Near Zero
In Order vs. NGO   10.3    18      4.65       -7.70            1.66         Large
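The |Cohen's d| column is the absolute mean difference divided by the pooled standard deviation. A small sketch using the three rows of the table (the function name is hypothetical):

```python
def cohens_d(mean1, mean2, pooled_s):
    """Standardized effect size: mean difference in units of the pooled std. dev."""
    return (mean1 - mean2) / pooled_s


# (factor, mean 1, mean 2, pooled S) taken from the table above
rows = [
    ("Region (A vs B)", 11.52, 13.84, 5.57),
    ("Language (E vs F)", 12.15, 11.66, 5.65),
    ("In Order vs. NGO", 10.3, 18.0, 4.65),
]

for factor, mean1, mean2, pooled_s in rows:
    d = cohens_d(mean1, mean2, pooled_s)
    print(f"{factor}: difference = {mean1 - mean2:.2f}, |d| = {abs(d):.2f}")
```

Expressing the difference in standard-deviation units is what lets factors be compared on practical significance independently of sample size.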
Suppose we wish to estimate the effect of rework (cases not in good order or NGO=1)