Prepared by:
Anjene Palma
Parametric and Nonparametric: Demystifying the Terms
The field of statistics exists because it is usually impossible to collect data from
all individuals of interest (population). Our only solution is to collect data from a subset
(sample) of the individuals of interest, but our real desire is to know the “truth” about the
population. Quantities such as means, standard deviations and proportions are all
important values and are called “parameters” when we are talking about a population.
Since we usually cannot get data from the whole population, we cannot know the values
of the parameters for that population. We can, however, calculate estimates of these
quantities for our sample. When they are calculated from sample data, these quantities
are called “statistics.” A statistic estimates a parameter.
Parametric statistical procedures rely on assumptions about the shape of the
distribution (i.e., assume a normal distribution) in the underlying population and about
the form or parameters (i.e., means and standard deviations) of the assumed
distribution. Nonparametric statistical procedures rely on no or few assumptions about
the shape or parameters of the population distribution from which the sample was
drawn.
Non-parametric tests (make no assumption about the underlying distribution of the data):

TEST | DISTRIBUTION | DESCRIPTION
Chi-square Goodness of Fit | Chi-square | Determine whether a data set fits a known distribution
Chi-square Test for Independence | Chi-square | Determine whether two classifications of the data are independent
1 Sample Sign Test | None | Compares one sample median to a historical median or target
Mann-Whitney Test | None | Compares two independent sample medians
PURPOSE | EXAMPLE | PARAMETRIC TEST | NON-PARAMETRIC COUNTERPART
Compare means between two or more distinct/independent groups | If our experiment has three groups (e.g., placebo, new drug #1, new drug #2), we might want to know whether the mean systolic blood pressure at baseline differed among the three groups | Analysis of variance (ANOVA) | Kruskal-Wallis test
Estimate the degree of association between two quantitative variables | Is systolic blood pressure associated with the patient's age? | Pearson coefficient of correlation | Spearman's rank correlation
Interpreting the results of nonparametric procedures can be more difficult than for parametric procedures. Consult a statistician if you are in doubt about whether a parametric or nonparametric procedure is more appropriate for your data.
PARAMETRIC TESTS
Most of the statistical tests we perform are based on a set of assumptions.
When these assumptions are violated, the results of the analysis can be misleading or completely erroneous.
Some tests (e.g., ANOVA) require that the groups of data being studied
have the same variance. In Homogeneity of Variance we provide some tests for
determining whether groups of data have the same variance.
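For instance, the equal-variance (homogeneity) assumption can be checked with Levene's test in SciPy. This is a minimal sketch; the three groups of measurements below are made-up numbers, not data from the text:

```python
# Check the equal-variance assumption across groups with Levene's test.
# The group data here are hypothetical illustration values.
from scipy import stats

group_a = [12.1, 11.8, 12.4, 12.0, 11.9]
group_b = [12.3, 12.7, 11.5, 12.9, 11.6]
group_c = [12.0, 12.2, 11.9, 12.1, 12.3]

stat, p = stats.levene(group_a, group_b, group_c)
print(f"Levene statistic = {stat:.3f}, p-value = {p:.3f}")
# A large p-value (> 0.05) means the equal-variance assumption is reasonable.
```

A significant result here would argue against using ANOVA directly on these groups.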
TYPE | MEASURE | NAME | DESCRIPTION
One Sample Test | Mean | One Sample t-Test | Determine if there is a significant difference between an observed mean and a theoretical one. The sample size is small and the variance is unknown.
One Sample Test | Mean | Z Test | Determine if there is a significant difference between an observed mean and a theoretical one. The variance is known and the sample size is large.
One Sample Test | Correlation | Pearson Correlation Coefficient | Test the association between two samples.
Two Sample Test | Mean | Two Group t-Test | Compare two observed means (independent samples). The sample size is small and the variance is unknown.
Two Sample Test | Mean | Paired t-Test | Compare two observed means (paired samples). The sample size is small and the variance is unknown.
Two Sample Test | Mean | Z Test | Compare two observed means (independent samples). The variance is known and the sample size is large.
STUDENT'S T-TEST
Developed by Prof. W. S. Gossett.
A t-test compares the difference between the means of two groups to determine whether the difference is statistically significant.
One Sample t-Test
Let x1, x2, ..., xn be a random sample of size n drawn from a normal population with mean µ and variance σ².
Null Hypothesis:
H0: µ = µ0
Under H0, the test statistic is
t = (x̄ − µ0) / (s / √n)
where x̄ is the sample mean, s is the sample standard deviation and n is the sample size.
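As a sketch, the statistic t = (x̄ − µ0)/(s/√n) can be computed by hand and checked against SciPy's one-sample t-test; the sample values and the hypothesized mean µ0 below are hypothetical:

```python
# One-sample t statistic computed manually, then verified with SciPy.
# Sample data and mu0 are invented for illustration.
import math
from scipy import stats

sample = [4.9, 5.1, 4.8, 5.3, 5.0, 4.7, 5.2]
mu0 = 4.8  # hypothesized population mean (assumption for this example)

n = len(sample)
xbar = sum(sample) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
t_manual = (xbar - mu0) / (s / math.sqrt(n))

t_scipy, p = stats.ttest_1samp(sample, mu0)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p:.4f}")
```

The two values agree, confirming the formula above.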
Two Sample t-Test
Used when two independent random samples come from normal populations having unknown but equal variances.
We test the null hypothesis that the two population means are the same, i.e.,
μ1 = μ2
Assumptions:
1. Populations are distributed normally
2. Samples are drawn independently and at random
Conditions:
1. Standard deviations in the populations are the same and not known
2. Size of the sample is small
Null Hypothesis:
H0: μ1 = μ2

Paired t-Test
Used when the two samples are dependent (paired), e.g., before-and-after measurements on the same subjects.
Null Hypothesis:
H0: μd = 0, where μd is the population mean of the paired differences
Under H0, the test statistic is t = d̄ / (s / √n), where
d̄ = average of the differences d
s = standard deviation of the differences
n = sample size (number of pairs)
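The paired t statistic t = d̄/(s/√n) can likewise be sketched in code; the before/after scores below are invented for illustration, and the result is checked against SciPy's paired t-test:

```python
# Paired t statistic computed manually from the differences, then
# verified with scipy.stats.ttest_rel. Scores are hypothetical.
import math
from scipy import stats

before = [72, 68, 75, 70, 65, 74, 69, 71]
after  = [75, 70, 78, 74, 68, 76, 72, 74]

d = [a - b for a, b in zip(after, before)]   # paired differences
n = len(d)
dbar = sum(d) / n
s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
t_manual = dbar / (s_d / math.sqrt(n))

t_scipy, p = stats.ttest_rel(after, before)
print(f"manual t = {t_manual:.4f}, scipy t = {t_scipy:.4f}, p = {p:.4f}")
```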
Z-Test
The Z-test is a statistical test in which the normal distribution is applied; it is basically used for problems involving large samples, when the sample size is greater than or equal to 30.
It is used when the population standard deviation is known.
Assumptions:
Population is normally distributed
The sample is drawn at random
Conditions:
Population standard deviation σ is known
Size of the sample is large (say n > 30)
Null Hypothesis:
The population mean (µ) is equal to a specified value µ0, i.e., H0: µ = µ0
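A minimal sketch of the one-sample Z test, z = (x̄ − µ0)/(σ/√n), assuming a made-up sample mean, a known σ, and a large n:

```python
# One-sample Z test computed directly from the normal distribution.
# xbar, mu0, sigma and n are hypothetical values for illustration.
import math
from scipy.stats import norm

xbar, mu0, sigma, n = 52.3, 50.0, 8.0, 64

z = (xbar - mu0) / (sigma / math.sqrt(n))
p_two_sided = 2 * (1 - norm.cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```

Here z = 2.3, so at the 5% level we would reject H0: µ = µ0.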
ANOVA (ANALYSIS OF VARIANCE)
Developed by R.A. Fisher.
ONE WAY ANOVA – compares two or more unmatched groups when data are
categorized in one factor.
Ex:
1. Comparing a control group with three different doses of aspirin
2. Comparing the productivity of three or more employees based on working hours in a
company
TWO WAY ANOVA
Used to determine the effect of two nominal predictor variables on a continuous
outcome variable.
It analyzes the effect of each independent variable on the outcome, as well as their combined effect.
Ex: Comparing the employee productivity based on the working hours and working
conditions.
Assumptions of ANOVA:
The samples are independent and selected randomly.
The parent population from which samples are taken is normally distributed.
Various treatment and environmental effects are additive in nature.
The experimental errors are normally distributed with mean zero and variance σ².
ANOVA compares variances by means of the F-ratio:
F = (variance between samples) / (variance within samples)
The exact form of the F-ratio depends on the experimental design.
Null Hypothesis:
H0: all population means are the same
If the computed F is greater than the F critical value, we reject the null hypothesis.
If the computed F is less than the F critical value, we fail to reject the null hypothesis.
ANOVA Table

Sources of variation | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Squares (MS = SS/d.f.) | F-Ratio
Between samples or groups (Treatment) | Treatment sum of squares (TrSS) | k − 1 | TrMS = TrSS/(k − 1) | F = TrMS/EMS
Within samples or groups (Errors) | Error sum of squares (ESS) | n − k | EMS = ESS/(n − k) |
Total | Total sum of squares (TSS) | n − 1 | |
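A one-way ANOVA can be run with scipy.stats.f_oneway, which computes exactly this F-ratio; the three dose groups below are hypothetical numbers, not data from the text:

```python
# One-way ANOVA across three independent groups (hypothetical doses).
# f_oneway returns the F-ratio (between-group / within-group variance).
from scipy import stats

control   = [5.2, 4.8, 5.1, 5.0, 4.9]
low_dose  = [5.5, 5.8, 5.4, 5.6, 5.7]
high_dose = [6.1, 6.4, 6.0, 6.3, 6.2]

F, p = stats.f_oneway(control, low_dose, high_dose)
print(f"F = {F:.2f}, p = {p:.4g}")
# A small p-value: reject H0 that all population means are the same.
```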
NON-PARAMETRIC TESTS
Unlike parametric statistics, which assumes one thing or another about the properties of the source population(s), non-parametric statistics refers to statistical methods in which the data are not required to fit a normal distribution. Non-parametric statistics often uses ordinal data, meaning it does not rely on the numbers themselves but rather on a ranking or order of sorts.
For example: a survey recording consumer preferences ranging from like to dislike would be considered ordinal data.
Because no normal distribution is assumed, descriptive statistics, statistical tests, inference and models can all be carried out in this framework. There is also no strict sample-size requirement, and the observed data may be quantitative or ordinal.
This type of statistics can be used without the mean, sample size, standard deviation, or estimation of any other parameters.
Non-parametric tests do not make any assumptions about the distribution of the data.
The following table describes some of the most popular non-parametric tests and what they measure.
TYPES OF NON-PARAMETRIC TEST
Sign test
Rank sum test
Chi-square test
Wilcoxon signed-rank test
McNemar test
Spearman’s rank correlation
SIGN TEST
The sign test is one of the simplest non-parametric tests. Its name comes from the fact that it is based on the direction of the plus (+) and minus (−) signs of observations in a sample, rather than on their numerical magnitude. The sign test may be classified into two types: the one-sample sign test and the two-sample (paired) sign test.
Type | Measure | Name | Description
One Sample | Mean | One Sample Wilcoxon's Test | Determine if there is a significant difference between an observed mean and a theoretical one
One Sample | Randomness | Run Test | Determine the randomness of data
One Sample | Distribution | Kolmogorov-Smirnov Test [2] | Compare an observed distribution to a theoretical one. Data are continuous
One Sample | Distribution | Chi Square Test | Compare an observed distribution to a theoretical one. Data are binned and represent frequencies
Two Sample | Correlation | Spearman Rank Correlation | Test the association between two samples
Two Sample | Mean | Mann-Whitney's Test | Compare two observed means (independent samples)
Two Sample | Mean | Wilcoxon's Test [3] | Compare two observed means (paired samples)
Two Sample | Distribution | Kolmogorov-Smirnov Test | Compare an observed distribution to a theoretical one. Data are continuous
Two Sample | Distribution | Chi Square Test | Compare an observed distribution to a theoretical one. Data are binned and represent frequencies
For example, the signs of the paired differences (X − Y) for ten observations might be:
+ 0 + + − 0 + + + −
The test statistic is based on the counts of + and − signs (zeros are discarded).

MANN-WHITNEY U TEST (RANK SUM TEST)
Formula:
U1 = n1n2 + n1(n1 + 1)/2 − ∑r1
where
n1 = number of observations (readings) in sample 1
n2 = number of observations in sample 2
∑r1 = sum of the ranks assigned to sample 1 in the combined ordering
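The U statistic can be computed from the rank-sum formula and compared with scipy.stats.mannwhitneyu; the two "area" samples below are made-up readings. Note that, depending on version, SciPy may report the complementary statistic U' = n1n2 − U1:

```python
# Mann-Whitney U1 = n1*n2 + n1(n1+1)/2 - R1, where R1 is the rank sum
# of sample 1 in the combined ordering. Data are hypothetical readings.
from scipy import stats

area1 = [14, 18, 21, 25, 30]
area2 = [12, 15, 16, 19, 22, 24]

combined = sorted(area1 + area2)
rank = {v: i + 1 for i, v in enumerate(combined)}  # no ties in this data
R1 = sum(rank[v] for v in area1)
n1, n2 = len(area1), len(area2)
U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1

U_scipy, p = stats.mannwhitneyu(area1, area2, alternative="two-sided")
# SciPy's statistic may equal U1 or its complement n1*n2 - U1.
print(f"manual U1 = {U1}, scipy U = {U_scipy}, p = {p:.3f}")
```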
KRUSKAL-WALLIS H TEST
For example, a researcher could use the H test to understand whether exam performance, measured on a continuous scale from 0-100, differed based on test anxiety level (i.e., the dependent variable would be "exam performance" and the independent variable would be "test anxiety level", which has three independent groups: students with "low", "medium" and "high" test anxiety levels).
Formula:
H = [12 / (n(n + 1))] ∑(Ti² / ni) − 3(n + 1)
where
ni = sample size for population i
Ti = rank sum for population i
n = total number of observations
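A sketch of the H test for the exam-performance example, with invented scores for the three anxiety groups, using scipy.stats.kruskal:

```python
# Kruskal-Wallis H test on exam scores across three anxiety groups.
# The scores below are hypothetical, invented to match the example.
from scipy import stats

low_anxiety    = [88, 92, 79, 85, 90]
medium_anxiety = [75, 80, 72, 78, 83]
high_anxiety   = [60, 68, 65, 58, 70]

H, p = stats.kruskal(low_anxiety, medium_anxiety, high_anxiety)
print(f"H = {H:.3f}, p = {p:.4f}")
# A small p-value: exam performance differs across anxiety levels.
```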
CHI-SQUARE TEST
The chi-square test is a non-parametric test used mainly with nominal variables. It has two main applications: the goodness-of-fit test and the test for independence.
Suppose our coin-flipping experiment of 20 tosses yielded 12 heads and 8 tails. Our expected frequencies are (10, 10) and our observed frequencies are (12, 8).
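The goodness-of-fit calculation for this coin example, χ² = ∑(O − E)²/E, can be checked with scipy.stats.chisquare:

```python
# Chi-square goodness of fit for the coin example from the text:
# observed (12, 8) heads/tails vs expected (10, 10).
from scipy import stats

observed = [12, 8]
expected = [10, 10]

chi2_manual = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2, p = stats.chisquare(observed, expected)
print(f"manual chi2 = {chi2_manual}, scipy chi2 = {chi2}, p = {p:.3f}")
# chi2 = 0.8 with p ≈ 0.37: no evidence the coin is unfair.
```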
Independence: the test for independence examines whether the frequencies of occurrence in two or more categories differ across two or more groups.
For example:

Educational attainment | Low | Middle | High | Total
UG | 13 | 16 | 1 | 30
PG | 43 | 51 | 60 | 154
Total | 56 | 67 | 61 | 184

Since educational attainment is classified (UG and PG) against income categories (low, middle, high), we could use the chi-square test for independence.
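The independence test on this table can be run with scipy.stats.chi2_contingency:

```python
# Chi-square test of independence on the educational attainment x income
# table from the text (totals are computed automatically).
from scipy import stats

table = [[13, 16, 1],    # UG: low, middle, high
         [43, 51, 60]]   # PG: low, middle, high

chi2_stat, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p:.4g}")
# dof = (rows-1)(cols-1) = 2; a small p-value means attainment and
# income category are not independent.
```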
Formula: χ² = ∑[(O − E)² / E]
MCNEMAR TEST
The McNemar test is an important non-parametric test, often used when the data are nominal and relate to two related samples. As such, this test is especially useful with before-and-after measurements on the same subjects.
Example: A researcher wanted to compare medical students' confidence in statistical analysis before and after an intensive statistics course.
Formula:
χ² = (b − c)² / (b + c), with 1 degree of freedom
where b and c are the counts of discordant pairs (subjects whose response changed in one direction or the other).
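A sketch of the McNemar calculation with hypothetical discordant-pair counts b and c (scipy.stats has no built-in McNemar test, so the χ² value is computed directly from the formula above):

```python
# McNemar chi-square = (b - c)^2 / (b + c) with 1 degree of freedom.
# b and c are hypothetical counts of discordant before/after pairs.
from scipy.stats import chi2

b, c = 15, 5   # e.g., gained confidence vs lost confidence

stat = (b - c) ** 2 / (b + c)
p = chi2.sf(stat, df=1)
print(f"McNemar chi2 = {stat}, p = {p:.4f}")
# A small p-value: the before/after change is statistically significant.
```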
SPEARMAN’S RANK CORRELATION
For example:
ENGLISH (marks) | MATH (marks) | Rank (ENGLISH) | Rank (MATH) | Difference of ranks (D)
56 | 66 | 9 | 4 | 5
75 | 70 | 3 | 2 | 1
45 | 40 | 10 | 10 | 0
71 | 60 | 4 | 7 | 3
62 | 65 | 6 | 5 | 1
64 | 56 | 5 | 9 | 4
58 | 59 | 8 | 8 | 0
80 | 77 | 1 | 1 | 0
76 | 67 | 2 | 3 | 1
61 | 63 | 7 | 6 | 1
Formula:
ρ = 1 − 6∑D² / (N(N² − 1))
where
D = R1 − R2
R1 = rank in the first variable
R2 = rank in the second variable
N = number of pairs
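The ranking step and the formula ρ = 1 − 6∑D²/(N(N² − 1)) can be verified on the marks from the table above using scipy.stats.spearmanr:

```python
# Spearman's rank correlation on the English/Math marks from the table,
# computed manually from ranks and checked against scipy.stats.spearmanr.
from scipy import stats

english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
math    = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

def ranks(xs):
    # Rank 1 = highest mark; there are no tied marks in this data.
    order = sorted(xs, reverse=True)
    return [order.index(x) + 1 for x in xs]

r1, r2 = ranks(english), ranks(math)
d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
n = len(english)
rho_manual = 1 - 6 * d2 / (n * (n ** 2 - 1))

rho_scipy, p = stats.spearmanr(english, math)
print(f"manual rho = {rho_manual:.4f}, scipy rho = {rho_scipy:.4f}")
```

Both give ρ ≈ 0.673, a fairly strong positive association between the English and Math ranks.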
BIBLIOGRAPHY
Books
Research Methodology, C.R. Kothari
Websites
• https://www.slideshare.com
• www.sciencecentral.com
Others
[1] What Is The Difference Between Parametric And Non-parametric Statistics? https://sourceessay.com/what-is-the-difference-between-parametric-and-non-parametric-statistics?
[2] Kolmogorov-Smirnov Test: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
[3] Wilcoxon test: https://www.investopedia.com/terms/w/wilcoxon-test.asp