Basic Statistics
Vishwakarma
Assistant Professor,
Indian School of Mines
Dhanbad-826004, INDIA
Contents:
1. What is Statistics
2. Type of Variables and Data
3. Selection of Statistical Tests
4. Parametric Tests
5. Non-Parametric Tests
6. Analysis of Variance
7. Post-hoc Comparison Tests
What is Statistics?
Statistics is the science of the systematic collection, classification, tabulation, analysis, and interpretation of numerical and categorical data.
If I brush my cat more, then there will be less fur on my furniture.
IV: ______________________
DV: ______________________

Now read the following experiment and identify the independent and dependent variables.
IV: ____________________________________
DV: ____________________________________
Type of Data
“The science of statistics is the most useful servant, but only of great value to those who understand its proper use.” – King
Confidence Interval
[Figure: a random sample with mean X̄ = 50 is drawn from a population whose mean μ is unknown. Conclusion: “I am 95% confident that μ is between 40 and 60.”]
Confidence Interval
90% of sample means lie within μ ± 1.645 σx̄
95% of sample means lie within μ ± 1.96 σx̄
99% of sample means lie within μ ± 2.58 σx̄
Calculation of Confidence Interval
The range of values that we can be reasonably certain includes the true value:

P( X̄ − Z_{α/2} σ/√n ≤ μ0 ≤ X̄ + Z_{α/2} σ/√n ) = 1 − α        (95% CI: α = 0.05)

P( X̄ − 1.96 σ/√n ≤ μ0 ≤ X̄ + 1.96 σ/√n ) = 1 − 0.05

CI = ( X̄ − 1.96 σ/√n , X̄ + 1.96 σ/√n )
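As a numerical sketch of the interval formula above (the sample mean X̄ = 50 echoes the earlier figure; σ = 10 and n = 25 are made-up values for illustration):

```python
import math

# 95% CI for a mean with known sigma: X-bar +/- 1.96 * sigma / sqrt(n)
# xbar = 50 echoes the earlier figure; sigma = 10 and n = 25 are hypothetical.
def z_confidence_interval(xbar, sigma, n, z=1.96):
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(50, 10, 25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```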
Testing of Hypothesis
Hypothesis
- A statement relating to the study objective
Null hypothesis
H0: there is no difference between the groups or no effect
Alternative hypothesis
H1: there is a difference between the groups or effect
Null and Alternative Hypotheses
Convert the research question to null and
alternative hypotheses
The null hypothesis (H0) is a claim of “no
difference in the population”
The alternative hypothesis (Ha) claims “H0 is
false”
Collect data and seek evidence against H0 as a
way of bolstering Ha (deduction)
Hypothesis Testing Steps
A. Null and alternative hypotheses
B. Test statistic
C. P-value and interpretation
D. Significance level (optional)
Test Statistic
This is an example of a one-sample test of a mean
when σ is known. Use this statistic to test the
problem:
zstat = (x̄ − μ0) / SEx̄

where μ0 = population mean assuming H0 is true, and SEx̄ = σ/√n
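A minimal sketch of this statistic (the numbers in the call are hypothetical):

```python
import math

# zstat = (xbar - mu0) / (sigma / sqrt(n)); the numbers below are hypothetical
def z_stat(xbar, mu0, sigma, n):
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    return (xbar - mu0) / se

print(z_stat(52, 50, 10, 25))  # 1.0
```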
Meaning of p value
A p-value measures the strength of the evidence against the
null hypothesis
P-value
The P-value answers the question: what is the probability of the observed test statistic, or one more extreme, when H0 is true?
This corresponds to the area under the curve (AUC) in the tail of the standard normal distribution beyond zstat.
Converting z statistics to P-values:
For Ha: μ > μ0, P = Pr(Z > zstat) = right tail beyond zstat
For Ha: μ < μ0, P = Pr(Z < zstat) = left tail beyond zstat
For Ha: μ ≠ μ0, P = 2 × the one-tailed P-value
Use Table B or software to find these probabilities.
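These tail areas can be computed without tables via the standard normal's relation to the complementary error function. Note that exact doubling gives 0.5485 for zstat = 0.6; the slightly different 0.5486 seen in worked examples comes from doubling an already-rounded one-sided value:

```python
import math

# Pr(Z > z) for a standard normal: 0.5 * erfc(z / sqrt(2))
def right_tail_p(z):
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_sided_p(z):
    return 2 * right_tail_p(abs(z))

print(round(right_tail_p(0.6), 4))  # 0.2743
print(round(two_sided_p(0.6), 4))   # 0.5485
```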
Rejection/Non-rejection Region
[Figure: standard normal curve; the central non-rejection region has area = .95, with rejection regions in the two tails.]
[Figures: one-sided P-values for zstat of 0.6 and for zstat of 3.0.]
Two-Sided P-Value
One-sided Ha: AUC in the tail beyond zstat.
Two-sided Ha: consider potential deviations in both directions; double the one-sided P-value.
Examples: if one-sided P = 0.0010, then two-sided P = 2 × 0.0010 = 0.0020. If one-sided P = 0.2743, then two-sided P = 2 × 0.2743 = 0.5486.
Interpretation
The P-value answers the question: what is the probability of the observed test statistic, or one more extreme, when H0 is true?
Thus, smaller and smaller P-values provide stronger and stronger evidence against H0.
Small P-value ⇒ strong evidence.
Interpretation
Conventions*
P > 0.10          non-significant evidence against H0
0.05 < P ≤ 0.10   marginally significant evidence
0.01 < P ≤ 0.05   significant evidence against H0
P ≤ 0.01          highly significant evidence against H0
Examples
P = .27   non-significant evidence against H0
P = .01   highly significant evidence against H0
* It is unwise to draw firm borders for “significance”.
How to Choose a Statistical Test?
Selection of the appropriate statistical test depends on the type and distribution of the variables (data).
Variables: the different classes of information in a dataset are known as its variables.
Type of variable:
• qualitative or
• quantitative
Contd…
Qualitative data is divided into:
• Nominal variables
• Ordinal variables
Quantitative data includes:
• Interval variables (interval variables do not have a true zero)
[Figure: distributions of a control group and a treatment group, with their means marked. Is there a difference between the group means? What “difference” means depends on variability: the same difference in means is more convincing under medium variability than under high variability.]
t = [ (X̄1 − X̄2) − (μ1 − μ2) ] / √[ Sp² (1/n1 + 1/n2) ]

where (μ1 − μ2) is the hypothesized difference (usually zero when testing for equal means)

df = n1 + n2 − 2

Sp² = [ (n1 − 1) S1² + (n2 − 1) S2² ] / [ (n1 − 1) + (n2 − 1) ]
Example: Efficacy of Drug A & B in Reduction of IOP

                   Group 1   Group 2
Number of cases    21        25
Mean IOP           3.27      2.53
Std Dev            1.30      1.16

Sp² = [ (21 − 1)(1.30)² + (25 − 1)(1.16)² ] / [ (21 − 1) + (25 − 1) ] = 1.510

t = [ (3.27 − 2.53) − 0 ] / √[ 1.510 × (1/21 + 1/25) ] = 2.03
Inference
H0: μ1 − μ2 = 0 (μ1 = μ2)
H1: μ1 − μ2 ≠ 0 (μ1 ≠ μ2)
α = 0.05
df = 21 + 25 − 2 = 44
Critical values: t = ±2.0154
Test statistic: t = 2.03, P = .048
Decision: reject H0 at α = 0.05.
Conclusion: there is evidence of a difference in means.
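The pooled t computation above can be checked directly; with the unrounded pooled variance (Sp² ≈ 1.502) the statistic comes out 2.04, while the slide's 2.03 reflects intermediate rounding:

```python
import math

# Pooled two-sample t statistic from summary statistics, as in the IOP example
def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t(3.27, 1.30, 21, 2.53, 1.16, 25)
print(round(t, 2), df)  # 2.04 44
```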
What happens if samples aren’t
independent?
That is, they are
“dependent” or “correlated”?
Paired T test
•Definition: used to compare means on the same or related subjects over time or under differing circumstances.
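A minimal sketch of the paired t statistic (the before/after data in the call are hypothetical):

```python
import math

# Paired t: test the mean of within-subject differences against zero
def paired_t(before, after):
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    dbar = sum(d) / n
    sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return dbar / (sd / math.sqrt(n)), n - 1  # t statistic and df

# Hypothetical before/after measurements on four subjects
t, df = paired_t([120, 118, 125, 130], [115, 116, 120, 128])
print(round(t, 2), df)  # 4.04 3
```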
[Example output: T-Test = 0.11, P = 0.92, DF = 6. Both use pooled StDev = 26.2.]

Nonparametric tests include:
– tests of differences between variables (independent samples)
– tests of differences between variables (dependent samples)
– tests of relationships between variables
Common Nonparametric Tests
Wilcoxon rank-sum test ~ t test (more commonly called the Mann-Whitney test)

Parametric                        Nonparametric
t-test for independent samples    Mann-Whitney U test
                                  Wald-Wolfowitz runs test
                                  Kolmogorov-Smirnov two-sample test
Mann-Whitney U Test
Nonparametric alternative to two-sample t-
test
Actual measurements not used – ranks of
the measurements used
Data can be ranked from highest to lowest
or lowest to highest values
Calculate the Mann-Whitney U statistic:

U = n1 n2 + n1(n1 + 1)/2 − R1
Example: Mann-Whitney U test

Sample 1 (value, rank): 193 (1), 188 (2), 185 (3), 183 (4), 180 (5), 178 (6), 170 (9); n1 = 7, R1 = 30
Sample 2 (value, rank): 175 (7), 173 (8), 168 (10), 165 (11), 163 (12); n2 = 5, R2 = 48

U = n1 n2 + n1(n1 + 1)/2 − R1 = 35 + 28 − 30 = 33
U′ = n1 n2 − U = (7)(5) − 33 = 2

Critical value: U0.05(2),7,5 = U0.05(2),5,7 = 30
As 33 > 30, H0 is rejected.
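The U calculation above can be reproduced directly from the formula (the data are the slide's; the `group1`/`group2` names are just illustrative labels):

```python
# Mann-Whitney U from U = n1*n2 + n1*(n1+1)/2 - R1, using the slide's data
# (no tied values; observations ranked from highest = rank 1)
def mann_whitney_u(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    ordered = sorted(sample1 + sample2, reverse=True)
    r1 = sum(ordered.index(x) + 1 for x in sample1)  # rank sum of sample 1
    u = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    return u, n1 * n2 - u  # U and U'

group1 = [193, 188, 185, 183, 180, 178, 170]  # n1 = 7
group2 = [175, 173, 168, 165, 163]            # n2 = 5
print(mann_whitney_u(group1, group2))  # (33, 2)
```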
Differences between dependent groups

                                   Parametric                Nonparametric
Compare two variables measured     t-test for dependent      Sign test;
in the same sample                 samples                   Wilcoxon's matched pairs test
If more than two variables are     Repeated measures         Friedman's two-way analysis
measured in the same sample        ANOVA                     of variance; Cochran Q
Wilcoxon Signed Rank Test
Also called the Wilcoxon paired-sample test.

Sample 1  Sample 2  d    Rank of |d|  Signed rank
85        88        -3   3            -3
90        85        5    7            7
81        82        -1   1            -1
87        82        5    7            7
91        85        6    9.5          9.5
80        75        5    7            7
86        80        6    9.5          9.5
92        90        2    2            2

Result: .01 ≤ p ≤ .02. Reject the null hypothesis, i.e. the drug has an effect.
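A sketch of the signed-rank statistic W itself (the paired data in the call are hypothetical):

```python
# Wilcoxon signed-rank statistic: rank |d| (zeros dropped, ties get the
# average rank), then take the smaller of the signed rank sums
def wilcoxon_w(sample1, sample2):
    d = [a - b for a, b in zip(sample1, sample2) if a != b]
    abs_sorted = sorted(abs(x) for x in d)
    def rank(v):
        idxs = [i + 1 for i, a in enumerate(abs_sorted) if a == v]
        return sum(idxs) / len(idxs)  # average rank for tied |d| values
    w_plus = sum(rank(abs(x)) for x in d if x > 0)
    w_minus = sum(rank(abs(x)) for x in d if x < 0)
    return min(w_plus, w_minus)

# Hypothetical paired scores for four subjects
print(wilcoxon_w([10, 12, 9, 14], [8, 9, 11, 10]))  # 1.5
```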
The Kruskal-Wallis H Test
The Kruskal-Wallis H test is a nonparametric procedure used to compare more than two populations in a completely randomized design (CRD).
H0: the k distributions are identical versus
Ha: at least one distribution is different

Test statistic: Kruskal-Wallis H

When H0 is true, the test statistic H has an approximate chi-square distribution with df = k − 1.

Use a right-tailed rejection region or p-value based on the chi-square distribution.
Example
Four groups of patients were randomly assigned to be treated with four different drugs, and their test scores were recorded. Are the distributions of test scores the same, or do they differ in location?
1 2 3 4
65 75 59 94
87 69 78 89
73 83 67 80
79 81 62 88
Rank the 16 measurements from 1 to 16, and calculate the four rank sums.

Treatment Group
       1        2        3        4
       65 (3)   75 (7)   59 (1)   94 (16)
       87 (13)  69 (5)   78 (8)   89 (15)
       73 (6)   83 (12)  67 (4)   80 (10)
       79 (9)   81 (11)  62 (2)   88 (14)
Ti     31       35       15       55

Test statistic: H = [12 / (n(n + 1))] Σ (Ti² / ni) − 3(n + 1)
              = [12 / (16 × 17)] × (31² + 35² + 15² + 55²) / 4 − 3(17) = 8.96
Treatment Group
HH0::the distributions of scores are the same
0 the distributions of scores are the same
HHa::the
thedistributions
distributionsdiffer
differin
inlocation
location
a
12 Ti 2
Test statistic: H 3(n 1)
n(n 1) ni
12 312 352 152 552
3(17) 8.96
16(17) 4
Rejection region: for a right-tailed chi-square test with α = .05 and df = 4 − 1 = 3, reject H0 if H ≥ 7.81.

Reject H0. There is sufficient evidence to indicate that there is a difference in test scores for the four drugs.
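The H statistic for this example can be reproduced directly (assuming no tied scores, as here):

```python
# Kruskal-Wallis H for the four drug groups above (no tied scores):
# H = 12/(n(n+1)) * sum(Ti^2 / ni) - 3(n+1)
def kruskal_h(groups):
    all_vals = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(all_vals)}  # joint ranks 1..n
    n = len(all_vals)
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

scores = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]
print(round(kruskal_h(scores), 2))  # 8.96
```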
Key Concepts
Kruskal-Wallis H Test: Completely Randomized Design
1. Jointly rank all the observations in the k samples (treated as one large sample of size n). Calculate the rank sums, Ti = rank sum of sample i, and the test statistic H.
Studentized Range Statistic q (Independent Groups)

q_r = ( ȳ_L − ȳ_S ) / √( MS_error / n )

where ȳ_L = largest mean, ȳ_S = smallest mean, and r = number of steps between the ordered means + 1.
Example

ȳ_1 = 8.2, ȳ_2 = 8.2, ȳ_3 = 11.8 (note: arrange the means in ascending order!), n = 5, MS_error = 15.4

q_3 = (11.8 − 8.2) / √(15.4 / 5) = 3.6 / 1.75 = 2.06  →  Fail to Reject
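A direct computation of q for this example; exact arithmetic gives q ≈ 2.05, while the 2.06 above comes from rounding √(MS_error/n) to 1.75 first:

```python
import math

# Studentized range statistic: q = (largest mean - smallest mean) / sqrt(MS_error / n)
def q_stat(means, ms_error, n):
    return (max(means) - min(means)) / math.sqrt(ms_error / n)

q = q_stat([8.2, 8.2, 11.8], 15.4, 5)
print(round(q, 2))  # 2.05
```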
Tukey-Kramer

For unequal n's, replace MS_error / n with the average of the two group terms:

q_r = ( ȳ_L − ȳ_S ) / √[ ( MS_error/n_L + MS_error/n_S ) / 2 ]

where L indexes the group with the larger mean and S the group with the smaller mean.
Unequal N’s: Behrens-Fisher

Critical difference: ȳ_L − ȳ_S ≥ q_0.05(r, df) × √[ ( S_L²/n_L + S_S²/n_S ) / 2 ]

df = ( S_L²/n_L + S_S²/n_S )² / [ (S_L²/n_L)² / (n_L − 1) + (S_S²/n_S)² / (n_S − 1) ]

* Each particular pairing of means must be examined with a different critical q value and its own S².
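The df formula above is the Welch-Satterthwaite approximation; a minimal sketch (the s² = 4, n = 10 inputs are hypothetical):

```python
# Welch-Satterthwaite df for the Behrens-Fisher critical difference above
def welch_df(s2_l, n_l, s2_s, n_s):
    a, b = s2_l / n_l, s2_s / n_s  # per-group variance-of-the-mean terms
    return (a + b) ** 2 / (a ** 2 / (n_l - 1) + b ** 2 / (n_s - 1))

# With equal variances and sizes (hypothetical s^2 = 4, n = 10 per group),
# the df collapses to 2(n - 1) = 18
print(round(welch_df(4, 10, 4, 10), 2))  # 18.0
```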
Pairwise differences between treatment means, with r and the critical q value for each step:

      T2   T3   T4   T5      r    q (critical)
T1    1    1    7    8       5    4.04
T2         0    6    7       4    3.79
T3              6    7       3    3.44
T4                   1       2    2.86
T5
Run a standard t and use the t_d table (MS_e), or solve for the critical difference (CV):

CV = t_d √( 2 MS_e / n ),  compared against ȳ_c − ȳ_Tj

Example

ȳ_c = 10, ȳ_T1 = 8, ȳ_T2 = 4, MS_e = 30, n = 11

Go to the table for t_d(k, df_e): t_d = 2.32

ȳ_c − ȳ_T1 = 2    ns
ȳ_c − ȳ_T2 = 6    * p < 0.05

CV = 2.32 √( 2(30) / 11 ) = 2.32 × 2.34 = 5.42
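The critical-difference arithmetic can be checked directly with the numbers from this example:

```python
import math

# Critical difference CV = t_d * sqrt(2 * MSe / n), with the example's numbers
def critical_difference(t_d, ms_e, n):
    return t_d * math.sqrt(2 * ms_e / n)

cv = critical_difference(2.32, 30, 11)
print(round(cv, 2))  # 5.42
```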
Scheffé’s Test
– critical value
– put the results in a Newman-Keuls table
– multiple treatments