
Dr. Gajendra K. Vishwakarma
Assistant Professor
Indian School of Mines
Dhanbad-826004, INDIA
Contents:
1. What is Statistics
2. Type of Variables and Data
3. Selection of Statistical Tests
4. Parametric Tests
5. Non-Parametric Tests
6. Analysis of Variance
7. Post-hoc Comparison Tests
What is Statistics?
Statistics is the science of the systematic collection, classification, tabulation, analysis, and interpretation of numerical and categorical data.

Unless appropriate statistical techniques are used, the results are meaningless:

Garbage In, Garbage Out!


Identifying Variables


Types of Variables
There are 2 main types of variables:

Independent Variable: the variable that is changed by the scientist; the "I control" variable.

Dependent Variable: the variable that might change because of what the scientist changes; it is what is being measured.
Remember!
Your hypothesis can TELL you what your variables are!

Ex. If I drink Mountain Dew before bed, then I will not sleep very much.

IV: drinking Mountain Dew
DV: the amount of sleep
Practice
Use these hypotheses to identify the variables:

If I leave all the lights on all day, then my electric bill will be expensive.

IV: ______________________
DV: ______________________

If I brush my cat more, then there will be less fur on my furniture.

IV: ______________________
DV: ______________________

Now read the following experiment and identify the independent and dependent variables:

Elizabeth wanted to test whether temperature affected how fast milk goes bad and curdles. She left milk in a room-temperature closet, a fridge, and an oven that was turned on low heat. She then measured how rotten the milk was after 10 days.

IV: ____________________________________
DV: ____________________________________
Type of Data
"The science of statistics is the most useful servant, but only of great value to those who understand its proper use." - King
Confidence Interval
[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean μ is unknown. From the sample we can state: "I am 95% confident that μ is between 40 and 60."]
Confidence Interval
μ ± 1.645 σ_X̄ contains 90% of sample means
μ ± 1.96 σ_X̄ contains 95% of sample means
μ ± 2.58 σ_X̄ contains 99% of sample means
Calculation of Confidence Interval
The confidence interval is the range of values we can be reasonably certain includes the true value:

P( X̄ - Z_{α/2} σ/√n ≤ μ ≤ X̄ + Z_{α/2} σ/√n ) = 1 - α

For a 95% CI, α = 0.05 and Z_{α/2} = 1.96:

P( X̄ - 1.96 σ/√n ≤ μ ≤ X̄ + 1.96 σ/√n ) = 0.95

CI = ( X̄ - 1.96 σ/√n , X̄ + 1.96 σ/√n )
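As an illustration, the interval can be computed directly. A minimal Python sketch of the formula above; the sample values (X̄ = 50, σ = 25.5, n = 25) are hypothetical, chosen so the interval matches the "between 40 and 60" example:

import math

x_bar = 50       # sample mean (hypothetical)
sigma = 25.5     # known population SD (hypothetical)
n = 25           # sample size (hypothetical)

z = 1.96         # Z value for a 95% CI
margin = z * sigma / math.sqrt(n)
print(f"95% CI: ({x_bar - margin:.1f}, {x_bar + margin:.1f})")
# -> 95% CI: (40.0, 60.0)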
Testing of Hypothesis
• Hypothesis: a statement relating to the objective
• Null hypothesis, H0: there is no difference between the groups, or no effect
• Alternative hypothesis, H1: there is a difference between the groups, or an effect
Null and Alternative Hypotheses
• Convert the research question to null and alternative hypotheses
• The null hypothesis (H0) is a claim of "no difference in the population"
• The alternative hypothesis (Ha) claims "H0 is false"
• Collect data and seek evidence against H0 as a way of bolstering Ha (deduction)
Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Test statistic
C. P-value and interpretation
D. Significance level (optional)
Test Statistic
This is an example of a one-sample test of a mean when σ is known. Use this statistic to test the problem:

z_stat = (x̄ - μ0) / SE_x̄

where μ0 = the population mean assuming H0 is true, and SE_x̄ = σ/√n
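A minimal Python sketch of this z statistic and its conversion to a P-value (covered on the following slides); all numbers are hypothetical:

import math
from scipy import stats

x_bar, mu_0 = 52.0, 50.0    # sample mean and hypothesized mean (hypothetical)
sigma, n = 10.0, 100        # known population SD and sample size (hypothetical)

se = sigma / math.sqrt(n)                  # SE of the mean
z_stat = (x_bar - mu_0) / se
p_right = stats.norm.sf(z_stat)            # Pr(Z > z_stat), one-sided
p_two = 2 * stats.norm.sf(abs(z_stat))     # two-sided
print(f"z = {z_stat:.2f}, one-sided P = {p_right:.4f}, two-sided P = {p_two:.4f}")
# z = 2.00, one-sided P = 0.0228, two-sided P = 0.0455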
Meaning of p value
• A p-value measures the strength of the evidence against the null hypothesis
• It lets us decide whether or not to reject the null hypothesis

p > 0.05            not significant
p = 0.01 to 0.05    significant
p = 0.001 to 0.01   very significant
p < 0.001           extremely significant
P-value
• The P-value answers the question: what is the probability of the observed test statistic, or one more extreme, when H0 is true?
• This corresponds to the AUC in the tail of the standard Normal distribution beyond z_stat.
• Converting z statistics to P-values:
  For Ha: μ > μ0, P = Pr(Z > z_stat) = right tail beyond z_stat
  For Ha: μ < μ0, P = Pr(Z < z_stat) = left tail beyond z_stat
  For Ha: μ ≠ μ0, P = 2 × the one-tailed P-value
• Use Table B or software to find these probabilities (next two slides).
Rejection/Non-rejection Region
[Figure: standard Normal curve with a central non-rejection region of area .95 and two rejection regions, each of area .025, in the tails.]
[Figure: one-sided P-value for a z_stat of 0.6]
[Figure: one-sided P-value for a z_stat of 3.0]
Two-Sided P-Value
• One-sided Ha: AUC in the tail beyond z_stat
• Two-sided Ha: consider potential deviations in both directions, so double the one-sided P-value

Examples: If one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020. If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
Interpretation
• The P-value answers the question: what is the probability of the observed test statistic ... when H0 is true?
• Thus, smaller and smaller P-values provide stronger and stronger evidence against H0
• Small P-value: strong evidence
Interpretation
Conventions*
P > 0.10          non-significant evidence against H0
0.05 < P ≤ 0.10   marginally significant evidence against H0
0.01 < P ≤ 0.05   significant evidence against H0
P ≤ 0.01          highly significant evidence against H0

Examples
P = .27  non-significant evidence against H0
P = .01  highly significant evidence against H0

* It is unwise to draw firm borders for "significance"
How to Choose a Statistical Test?
Selection of the appropriate statistical test depends on the type and distribution of the variables (data).

Variables: the different classes of information in a dataset are known as its variables.

Type of variable:
• qualitative, or
• quantitative
Contd…
Qualitative data are divided into:
• Nominal variables
• Ordinal variables

Quantitative data are divided into:
• Interval variables (equally spaced intervals, but no true zero)
• Ratio variables (equally spaced intervals with a true zero point, e.g. age)
and may be either:
• Discrete, or
• Continuous
Selection of Appropriate Tests

Goal: Describe one group
• Measurement (from a Gaussian population): mean, SD
• Rank, score, or measurement (from a non-Gaussian population): median, interquartile range
• Binomial (two possible outcomes): proportion
• Survival time: Kaplan-Meier survival curve

Goal: Compare one group to a hypothetical value
• Gaussian: one-sample t test
• Non-Gaussian: Wilcoxon test
• Binomial: chi-square or binomial test**

Goal: Compare two unpaired groups
• Gaussian: unpaired t test
• Non-Gaussian: Mann-Whitney test
• Binomial: Fisher's test (chi-square for large samples)
• Survival: log-rank test or Mantel-Haenszel*

Goal: Compare two paired groups
• Gaussian: paired t test
• Non-Gaussian: Wilcoxon test
• Binomial: McNemar's test
• Survival: conditional proportional hazards regression*
Contd…

Goal: Compare three or more unmatched groups
• Gaussian: one-way ANOVA
• Non-Gaussian: Kruskal-Wallis test
• Binomial: chi-square test
• Survival: Cox proportional hazards regression**

Goal: Compare three or more matched groups
• Gaussian: repeated-measures ANOVA
• Non-Gaussian: Friedman test
• Binomial: Cochrane Q**
• Survival: conditional proportional hazards regression**

Goal: Quantify association between two variables
• Gaussian: Pearson correlation
• Non-Gaussian: Spearman correlation
• Binomial: contingency coefficients**

Goal: Predict a value from another measured variable
• Gaussian: simple linear regression or nonlinear regression
• Non-Gaussian: nonparametric regression**
• Binomial: simple logistic regression*
• Survival: Cox proportional hazards regression*

Goal: Predict a value from several measured or binomial variables
• Gaussian: multiple linear regression* or multiple nonlinear regression**
• Binomial: multiple logistic regression*
• Survival: Cox proportional hazards regression*
Statistical Methods
1. Descriptive statistics:
• To analyze the basic features of the data under consideration.
• Tabular methods
• Numerical summaries
(proportion, percentage, mean (average), median, mode, percentiles, range, variance or standard deviation)
• Graphical methods
(bar charts / histograms, pie charts, scatter diagrams)
Parametric & Nonparametric Methods
Parametric: statistical methods which depend on the parameters of populations or probability distributions.
Assumptions:
• The observations must be independent
• The observations must be drawn from normally distributed populations
• These populations must have the same variances
• The means of these normal and homoscedastic populations must be linear combinations of effects due to columns and/or rows*
Nonparametric Methods
Non-parametric: statistical tests that don't assume a distribution or use parameters are called nonparametric tests.
Assumptions:
• Observations are independent
• The variable under study has underlying continuity
Parametric Tests
The following are the common parametric tests:
• t-test
• ANOVA
• Regression
• Correlation

These tests are only meaningful for continuous data that is either:
• assumed to follow a normal distribution, or
• of a distribution that can be rendered normal by mathematical transformation.
What is a t Test?
• Common definition: comparing two means to see if they are significantly different from each other.
• Technical definition: any statistical test that uses the t family of distributions.

We cannot use probabilities based on the normal distribution if:
• the population standard deviation (SD) is unknown, AND
• we have a small sample (i.e., n < 120).
The family of t distributions was created to take sample size into account.
Types
• One sample: compare with a population
• Unpaired: compare with a control
• Paired: same subjects, pre-post
• Z-test: large samples (> 60)
Single Sample t-test
Definition: used to compare the mean of a sample to a known number (often 0).
Assumptions: subjects are randomly drawn from a normal population.
Test: the hypotheses for a single sample t-test are:

H0: μ = μ0 against H1: μ ≠ μ0

where μ0 denotes the hypothesized value.
Test statistic: the test statistic, t, has N-1 degrees of freedom, where N is the number of observations.
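A minimal sketch of this test with scipy; the data and hypothesized mean are hypothetical:

from scipy import stats

sample = [5.1, 4.8, 5.6, 5.2, 4.9, 5.4, 5.0, 5.3]   # hypothetical observations
mu_0 = 5.0                                           # hypothesized mean

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu_0)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, df = {len(sample) - 1}")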
Is there a difference?

...between your means: who is meaner?


Independent Group t-Test
Definition: used to compare the means of two independent groups.
Assumptions:
• Subjects are randomly assigned to one of two groups.
• The distributions of the means being compared are normal with equal variances.

Test: the hypotheses for the comparison of two independent groups are:
H0: μ1 = μ2 (the means of the two groups are equal)
H1: μ1 ≠ μ2 (the means of the two groups are not equal)
t Test for Difference in Two Means
• Different data sources:
  - Unrelated
  - Independent
  - The sample selected from one population has no effect or bearing on the sample selected from the other population.
• Use the difference between the 2 sample means
• Use the pooled-variance t test

t Test for Differences in Two Means (Variances Unknown)
• Assumptions:
  - Both populations are normally distributed
  - Or, if not normal, can be approximated by the normal distribution
  - Samples are randomly and independently drawn
  - Population variances are unknown but assumed equal
Contd…

• This is called the pooled-variance approach.
• The population variance is estimated by taking a weighted average of the sample variances.
• The two sample means are estimated before we estimate the variances; therefore we lose 2 degrees of freedom.
Statistical Analysis
[Figure: two distributions, a control group with its mean and a treatment group with its mean.]

Is there a difference?
What does difference mean?
[Figure: three pairs of distributions with medium, high, and low variability. The mean difference is the same for all three cases.]
What does difference mean?
[Figure: the same three pairs of distributions with medium, high, and low variability. Which one shows the greatest difference?]
Developing the Pooled-Variance t Test
• Setting up the hypotheses:

Two-tail:   H0: μ1 = μ2   (H0: μ1 - μ2 = 0)
            H1: μ1 ≠ μ2   (H1: μ1 - μ2 ≠ 0)

Right-tail: H0: μ1 ≤ μ2   (H0: μ1 - μ2 ≤ 0)
            H1: μ1 > μ2   (H1: μ1 - μ2 > 0)

Left-tail:  H0: μ1 ≥ μ2   (H0: μ1 - μ2 ≥ 0)
            H1: μ1 < μ2   (H1: μ1 - μ2 < 0)
Developing the Pooled-Variance t Test
• Calculate the pooled sample variance as an estimate of the common population variance:

Sp² = pooled variance        n1 = size of sample 1
S1² = variance of sample 1   n2 = size of sample 2
S2² = variance of sample 2
Developing the Pooled-Variance t Test
• Compute the test statistic:

t = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √[ Sp² (1/n1 + 1/n2) ]

where (μ1 - μ2) is the hypothesized difference (usually zero when testing for equal means),

df = n1 + n2 - 2

Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / [ (n1 - 1) + (n2 - 1) ]
Example: Efficacy of Drug A & B in Reduction of IOP

                   Group 1   Group 2
Number of cases    21        25
Mean IOP           3.27      2.53
Std Dev            1.30      1.16

Assuming equal variances, is there a difference in average IOP (α = 0.05)?
Calculating the Test Statistic
First, estimate the common variance as a weighted average of the two sample variances, using the degrees of freedom as weights:

Sp² = [ (n1 - 1) S1² + (n2 - 1) S2² ] / [ (n1 - 1) + (n2 - 1) ]
    = [ (21 - 1)(1.30)² + (25 - 1)(1.16)² ] / [ (21 - 1) + (25 - 1) ]
    = 1.510
Calculating the Test Statistic

t = [ (X̄1 - X̄2) - (μ1 - μ2) ] / √[ Sp² (1/n1 + 1/n2) ]
  = [ (3.27 - 2.53) - 0 ] / √[ 1.510 (1/21 + 1/25) ]
  = 2.03
Inference
• H0: μ1 - μ2 = 0 (μ1 = μ2)
• H1: μ1 - μ2 ≠ 0 (μ1 ≠ μ2)
• α = 0.05
• df = 21 + 25 - 2 = 44
• Critical values: ±2.0154 (rejection regions of .025 in each tail)

Test statistic: t = (3.27 - 2.53) / √[ 1.510 (1/21 + 1/25) ] = 2.03, P = .048

Decision: reject H0 at α = 0.05.
Conclusion: there is evidence of a difference in means.
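The same pooled-variance test can be reproduced from the summary statistics alone with scipy (equal_var=True gives the pooled test):

from scipy import stats

t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=3.27, std1=1.30, nobs1=21,    # Group 1
    mean2=2.53, std2=1.16, nobs2=25,    # Group 2
    equal_var=True)                     # pooled-variance assumption
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# t = 2.04, p = 0.047 -- matching the hand calculation above up to rounding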
What happens if samples aren't independent?
That is, they are "dependent" or "correlated"?
Paired T test
• Definition: used to compare means on the same or related subjects over time or in differing circumstances.
• Assumptions: the observed data are from the same subject or from a matched subject and are drawn from a population with a normal distribution.
• Characteristics: subjects are often tested in a before-after situation (across time, with some intervention).
• An extension of this test is the repeated-measures ANOVA.
Paired T test
• Test: the paired t-test is actually a test that the mean of the differences between the two observations is 0. If D represents the difference between observations, the hypotheses are:

H0: D = 0 against H1: D ≠ 0

• The test statistic is t with n-1 degrees of freedom.
• If the p-value associated with t is < 0.05, the inference is: reject the null hypothesis, i.e. there is a difference in means across the paired observations.
Example: Test whether the reduction in IOP is higher with the Active Drug than with Placebo

(IOP)            Active Rx   Placebo
                 60          32
                 32          44
                 80          22
                 50          40
Sample average:  55.5        34.5
Hypotheses for Paired T-test
Does the average difference of the population, μD, differ from 0?

Null hypothesis:         H0: μD = μ1 - μ2 = 0
Alternative hypotheses:  HA: μD = μ1 - μ2 ≠ 0
                         HA: μD = μ1 - μ2 > 0
                         HA: μD = μ1 - μ2 < 0
The Paired-T Test Statistic
If:
• there are n pairs
• and the differences are normally distributed
Then:
the test statistic, which follows a t-distribution with n-1 degrees of freedom, gives us our p-value:

t_d = (sample difference - hypothesized difference) / (standard error of differences)
    = (d̄ - μ_d) / (s_d / √n)
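A minimal sketch of the paired t-test with scipy, on hypothetical before/after measurements (not the slide data):

from scipy import stats

before = [140, 132, 138, 145, 129, 136]   # hypothetical pre-treatment values
after  = [135, 130, 131, 140, 128, 130]   # hypothetical post-treatment values

t_stat, p_value = stats.ttest_rel(before, after)   # tests mean difference = 0
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, df = {len(before) - 1}")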
Data analyzed as 2-Sample T
Two-sample T:

            N   Mean   StDev   SE Mean
Active Rx   4   41.5   26.2    13
Placebo     4   39.5   26.1    13

T-Test: T = 0.11, P = 0.92, DF = 6
Both use pooled StDev = 26.2

P = 0.92: do not reject the null. There is insufficient evidence to conclude that the average reduction in IOP differs between Active Rx and Placebo.
Data analyzed as Paired T
Paired T:

             N   Mean    StDev   SE Mean
Active Rx    4   41.5    26.2    13.1
Placebo      4   39.5    26.1    13.1
Difference   4   2.000   0.816   0.408

T-Test of mean difference = 0 (vs not = 0): T-Value = 4.90, P-Value = 0.016

P = 0.016: reject the null. There is sufficient evidence to conclude that the average reduction in IOP differs significantly between Active Rx and Placebo.
Limitations - paired t test
• Does not control for a number of other variables in a simple pre-post design
• In many studies a pre-test is not possible (e.g., mortality studies)
• Within-subject variation is introduced twice
What happened?
• The P-value from the two-sample t-test is just plain wrong. (Its assumptions are not met.)
• Pairing removed, or "blocked out", the extra variability in the data due to differences between subjects, thereby focusing directly on the differences of interest.
• The paired t-test is more "powerful" because the paired design reduces the variability in the data.
Nonparametric Methods
• There is at least one nonparametric test equivalent to each parametric test
• These tests fall into several categories:
  - Tests of differences between groups (independent samples)
  - Tests of differences between variables (dependent samples)
  - Tests of relationships between variables
Common Nonparametric Tests
• Wilcoxon rank-sum test ~ t test (more commonly called the Mann-Whitney test)
• Wilcoxon signed-rank test ~ paired t test
• Sign test (a simple paired test; uses only the signs of the differences, not ranks or raw values)
• Kruskal-Wallis test ~ ANOVA (like a t test or rank-sum test with more than 2 groups)
Differences between independent groups
Two samples - compare mean value:

Parametric                        Nonparametric
t-test for independent samples    Mann-Whitney U test
                                  Wald-Wolfowitz runs test
                                  Kolmogorov-Smirnov two-sample test
Mann-Whitney U Test
• Nonparametric alternative to the two-sample t-test
• The actual measurements are not used; the ranks of the measurements are used instead
• Data can be ranked from highest to lowest or lowest to highest values
• Calculate the Mann-Whitney U statistic:

U = n1·n2 + n1(n1 + 1)/2 - R1
Example: Mann-Whitney U test
Two-tailed null hypothesis that there is no difference between the heights of male and female students:
• H0: male and female students are the same height
• HA: male and female students are not the same height

Heights of     Heights of      Ranks of        Ranks of
males (cm)     females (cm)    male heights    female heights
193            175             1               7
188            173             2               8
185            168             3               10
183            165             4               11
180            163             5               12
178                            6
170                            9
n1 = 7         n2 = 5          R1 = 30         R2 = 48

U  = n1·n2 + n1(n1 + 1)/2 - R1
   = (7)(5) + (7)(8)/2 - 30
   = 35 + 28 - 30 = 33

U' = n1·n2 - U = (7)(5) - 33 = 2

Critical value: U0.05(2),7,5 = U0.05(2),5,7 = 30

As 33 > 30, H0 is rejected.
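The same test run through scipy, which returns the same U = 33 for these data; the exact p-value confirms the table-based rejection:

from scipy import stats

males   = [193, 188, 185, 183, 180, 178, 170]   # heights (cm)
females = [175, 173, 168, 165, 163]

u_stat, p_value = stats.mannwhitneyu(males, females, alternative='two-sided')
print(f"U = {u_stat}, p = {p_value:.4f}")
# U = 33.0; p < 0.05, so H0 is rejected, as in the table-based test above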
Differences between dependent groups

Compare two variables measured in the same sample:
  Parametric: t-test for dependent samples
  Nonparametric: sign test; Wilcoxon's matched-pairs test

If more than two variables are measured in the same sample:
  Parametric: repeated-measures ANOVA
  Nonparametric: Friedman's two-way analysis of variance; Cochran Q
Wilcoxon Signed Rank Test
• Also called the Wilcoxon paired-sample test
• For paired data (before-after studies, matched pairs)
• Uses the signs (+/-) of the ranks of the differences between pairs
contd…
• Calculate the differences (or changes)
• Rank the differences, ignoring signs (+/-)
• Add the signs of the differences to the ranks
• Sum the positive ranks
• Sum the negative ranks
• Use the smaller sum as the test statistic


Example
• A new drug to lower blood pressure is given to 10 people.
• Diastolic blood pressure was measured before taking the medicine and one week later.
• Null hypothesis: no change in blood pressure (the median of the differences is 0).
Example

Before   After   Before - after   Rank w/o signs   Signed rank
90       86      4                4.5              4.5
86       82      4                4.5              4.5
85       88      -3               3                -3
90       85      5                7                7
81       82      -1               1                -1
87       82      5                7                7
91       85      6                9.5              9.5
80       75      5                7                7
86       80      6                9.5              9.5
92       90      2                2                2

T+ = 51, T- = 4
From the table: .01 ≤ p ≤ .02
Reject the null hypothesis, i.e. the drug has an effect.
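The same example run through scipy's Wilcoxon signed-rank test; note that with tied differences scipy may fall back to a normal approximation:

from scipy import stats

before = [90, 86, 85, 90, 81, 87, 91, 80, 86, 92]
after  = [86, 82, 88, 85, 82, 82, 85, 75, 80, 90]

w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat}, p = {p_value:.4f}")
# W = 4 (the smaller rank sum, T-) and p falls in the .01-.02 range,
# consistent with the table-based result above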
The Kruskal-Wallis H Test
• The Kruskal-Wallis H test is a nonparametric procedure used to compare more than two populations in a completely randomized design (CRD).
• All n = n1 + n2 + … + nk measurements are jointly ranked (i.e. treated as one large sample).
• Use the sums of the ranks of the k samples to compare the distributions.
The Kruskal-Wallis H Test
• Rank the total measurements in all k samples from 1 to n. Tied observations are assigned the average of the ranks they would have received if not tied.
• Calculate Ti = rank sum for the i-th sample, i = 1, 2, …, k.
• The test statistic is:

H = [ 12 / (n(n + 1)) ] Σ (Ti² / ni) - 3(n + 1)
The Kruskal-Wallis H Test
H0: the k distributions are identical, versus
Ha: at least one distribution is different

Test statistic: Kruskal-Wallis H

When H0 is true, the test statistic H has an approximate chi-square distribution with df = k - 1.

Use a right-tailed rejection region or p-value based on the chi-square distribution.
Example
Four groups of patients were randomly assigned to be treated with four different drugs, and their achievement test scores were recorded. Are the distributions of test scores the same, or do they differ in location?

Drug:  1    2    3    4
       65   75   59   94
       87   69   78   89
       73   83   67   80
       79   81   62   88
Treatment Group
Rank the 16 measurements from 1 to 16, and calculate the four rank sums:

Drug:  1        2        3       4
       65 (3)   75 (7)   59 (1)  94 (16)
       87 (13)  69 (5)   78 (8)  89 (15)
       73 (6)   83 (12)  67 (4)  80 (10)
       79 (9)   81 (11)  62 (2)  88 (14)
Ti:    31       35       15      55

H0: the distributions of scores are the same
Ha: the distributions differ in location

Test statistic:
H = [ 12 / (n(n + 1)) ] Σ (Ti² / ni) - 3(n + 1)
  = [ 12 / (16 × 17) ] × [ (31² + 35² + 15² + 55²) / 4 ] - 3(17)
  = 8.96
Treatment Group
H0: the distributions of scores are the same
Ha: the distributions differ in location

Test statistic: H = 8.96 (computed above).

Rejection region: for a right-tailed chi-square test with α = .05 and df = 4 - 1 = 3, reject H0 if H ≥ 7.81.

Since H = 8.96 > 7.81, reject H0. There is sufficient evidence to indicate that there is a difference in test scores for the four drugs.
Key Concepts
Kruskal-Wallis H Test: Completely Randomized Design
1. Jointly rank all the observations in the k samples (treated as one large sample of size n). Calculate the rank sums, Ti = rank sum of sample i, and the test statistic H.
2. If the null hypothesis of equality of distributions is false, H will be unusually large, resulting in a one-tailed test.
3. For sample sizes of five or greater, the rejection region for H is based on the chi-square distribution with (k - 1) degrees of freedom.
Advantages of Nonparametric Tests
• Probability statements obtained from nonparametric statistics are exact probabilities, regardless of the shape of the population distribution from which the random sample was drawn.
• For sample sizes as small as N = 6, there is no alternative to using a nonparametric test.
Limitations - general
• Fails to gauge the magnitude of the difference between two means.
  Solution: construct a confidence interval.
• Only compares 2 groups.
  Solution: if there are more than 2 groups, use ANOVA.
Independent Group Analysis of Variance (ANOVA)
• Definition: an extension of the independent-group t-test to more than two groups; used to compare the means of more than two independent groups.
• Assumptions: subjects are randomly assigned to one of n groups, and the distributions of the means by group are normal with equal variances.
Independent Group Analysis of Variance (ANOVA)
• Test: the hypotheses for the comparison of independent groups are (k = the number of groups):
  H0: μ1 = μ2 = ... = μk (the means of all groups are equal)
  Ha: μi ≠ μj for some i, j (the means of two or more groups are not equal)
• The test is performed in an ANOVA table.
• The test statistic is an F test with k-1 and N-k degrees of freedom, where N is the total number of subjects.
• A low p-value for this test indicates that at least one pair of means is not equal.
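A minimal sketch of a one-way ANOVA with scipy, on hypothetical data for three independent groups:

from scipy import stats

group1 = [23, 25, 21, 27, 24]   # hypothetical measurements
group2 = [30, 28, 33, 29, 31]
group3 = [22, 26, 24, 23, 25]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# a small p-value says at least one pair of means differs;
# the post-hoc tests below identify which pairs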
Post Hoc Tests
• What is a post hoc test?
• Review:
  - adjusting the alpha level
  - multiple a priori comparisons
• What makes a test post hoc?
• Many tests could be post hoc... but there are set post hoc tests.


Studentized Range Statistic q

q = t√2

For independent groups:

q_r = (ȳ_L - ȳ_S) / √(MS_error / n)

where ȳ_L = the largest mean, ȳ_S = the smallest mean, and r = the number of steps + 1.

Example (note: arrange the means in ascending order!):
ȳ1 = 8.2, ȳ2 = 8.2, ȳ3 = 11.8, with n = 5, MS_error = 15.4, df_error = 12

q_3 = (11.8 - 8.2) / √(15.4 / 5) = 3.6 / 1.75 = 2.06

Critical value: q(r = 3, df = 12, α = .05) = 3.77. Since 2.06 < 3.77, fail to reject.
Studentized Range Statistic q
• q's can tell where differences are (they are more specific than F)
• Solving q's is just like solving t's


Solving for the Smallest Significant Difference

Rearranging the q statistic gives the smallest significant difference:

ȳ_L - ȳ_S = q_r √(MS_error / n)

Example: ȳ1 = 8.2, ȳ2 = 8.2, ȳ3 = 11.8, with n = 5, MS_error = 15.4, and critical value q(r = 3, df = 12, α = .05) = 3.77:

ȳ_L - ȳ_S = 3.77 × √(15.4 / 5) = 6.61
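The same calculation in Python, with the critical q taken from scipy's studentized range distribution (scipy >= 1.7) rather than a table:

import math
from scipy.stats import studentized_range

ms_error, n, r, df_error = 15.4, 5, 3, 12
q_crit = studentized_range.ppf(0.95, r, df_error)   # = 3.77, as in the table
ssd = q_crit * math.sqrt(ms_error / n)              # smallest significant difference
print(f"q_crit = {q_crit:.2f}, SSD = {ssd:.2f}")    # SSD = 6.61
# the largest observed difference is 11.8 - 8.2 = 3.6 < 6.61, so no pair differs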
Solving for the Smallest Significant Difference
• Solving for the smallest significant difference will help make quicker comparisons
• But we still need a way to organize things nicely...
Unequal N's

Tukey-Kramer: in q_r = (ȳ_L - ȳ_S) / √(MS_error / n) and in the smallest significant difference ȳ_L - ȳ_S = q_r √(MS_error / n), replace

MS_error / n   with   ( MS_error / n_L + MS_error / n_S ) / 2

where n_L = n for the group with the larger mean, and n_S = n for the group with the smaller mean.
Unequal N's

Behrens-Fisher: declare a difference significant when

ȳ_L - ȳ_S ≥ q_0.05(r, df') × √[ ( S_L²/n_L + S_S²/n_S ) / 2 ]

with approximate degrees of freedom

df' = ( S_L²/n_L + S_S²/n_S )² / [ (S_L²/n_L)² / (n_L - 1) + (S_S²/n_S)² / (n_S - 1) ]

* Each particular pairing of means must be examined with a different critical q value and its own S².
Thus, the smallest significant difference will vary even for a given r.
Trying to fix q

Tukey's HSD: q_HSD always uses q_r for the largest r. If there are 4 means, all differences are treated as 4 steps.

Tukey's WSD: like Newman-Keuls, except q_WSD = (q_k + q_r) / 2, where k = the number of means and r = the number of steps between the two means being compared.

What happens to the alpha level? To power?
Tukey's HSD and WSD

Pairwise differences between the ordered means T1 ... T5, with the critical q for each number of steps r:

      T2   T3   T4   T5      r    q_crit
T1    1    1    7    8       5    4.04
T2         0    6    7       4    3.79
T3              6    7       3    3.44
T4                   1       2    2.86

Use Tukey's WSD, not the normal method, for q.
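For reference, scipy (>= 1.8) provides Tukey's HSD directly; a minimal sketch on hypothetical data for three groups:

from scipy.stats import tukey_hsd

t1 = [10, 12, 11, 13, 10]   # hypothetical group data
t2 = [15, 17, 16, 18, 14]
t3 = [11, 13, 12, 10, 12]

result = tukey_hsd(t1, t2, t3)   # all pairwise comparisons, family-wise alpha
print(result)                    # table of mean differences, p-values, and CIs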


Dunnett's: Control vs. Treatment

Run a standard t and use the t_d table (with MS_e), or solve for the critical difference (CV):

CV(ȳ_c - ȳ_T) = t_d √(2 MS_e / n)

Example: ȳ_c = 10, ȳ_T1 = 8, ȳ_T2 = 4, MS_e = 30, n = 11

From the table, t_d(k, df_e) = 2.32:

CV = 2.32 × √(2(30)/11) = 2.32 × 2.34 = 5.42

ȳ_c - ȳ_T1 = 2   ns
ȳ_c - ȳ_T2 = 6   * p < 0.05
Scheffé's Test

Based on a linear contrast:

F = MS(contrast) / MS(error)

To evaluate:
1) consult an F table and find the critical value F.05(k-1, df_error) (CV)
2) multiply the CV by (k-1) (this is the new CV)

It sets the family-wise Type-I error rate (α = .05) for ALL possible linear contrasts, not merely the pair-wise comparisons.

Don't use it when only doing pair-wise comparisons, because it will be overly conservative.
Post Hoc Summary
• When to use what...
  - q in most situations... but use Tukey's HSD for the critical value
  - Put things in a Newman-Keuls table
  - When the N's are unequal, use the Tukey-Kramer correction
  - Dunnett's when you have one control and multiple treatments
  - Scheffé's ONLY when you are doing complex comparisons (i.e., contrasts)
Post Hoc Summary
• Be aware of the alpha-level and power issues...
  - Why can't we have a perfect test (i.e., low alpha level and high power)?
  - How do Tukey's HSD and WSD relate to this?
  - How does Dunnett's relate to this?
  - How does Scheffé's relate to this?
(Any Questions?)
