
TESTING HYPOTHESES:

Parametric and Non-parametric Tests

Prepared by:
Anjene Palma

Parametric and Nonparametric: Demystifying the Terms

By Tanya Hoskin, a statistician in the Mayo Clinic Department of Health Sciences Research who provides consultations through the Mayo Clinic CTSA BERD Resource.

In More Good Reasons to Look at the Data, we looked at data distributions to assess center, shape and spread, and described how the validity of many statistical procedures relies on an assumption of approximate normality. But what do we do if our data are not normal? In this article, we'll cover the difference between parametric and nonparametric procedures. Nonparametric procedures are one possible solution for handling non-normal data.
Definitions
If you’ve ever discussed an analysis plan with a statistician, you’ve probably
heard the term “nonparametric” but may not have understood what it means.
Parametric and nonparametric are two broad classifications of statistical procedures.
The Handbook of Nonparametric Statistics (1962, p. 2) says:
“A precise and universally acceptable definition of the term ‘nonparametric’ is not
presently available. The viewpoint adopted in this handbook is that a statistical
procedure is of a nonparametric type if it has properties which are satisfied to a
reasonable approximation when some assumptions that are at least of a moderately
general nature hold.”
That definition is not helpful in the least, but it underscores the fact that it is
difficult to specifically define the term “nonparametric.” It is generally easier to list
examples of each type of procedure (parametric and nonparametric) than to define the
terms themselves. For most practical purposes, however, one might define
nonparametric statistical procedures as a class of statistical procedures that do not rely
on assumptions about the shape or form of the probability distribution from which the
data were drawn.
The short explanation
Several fundamental statistical concepts are helpful prerequisite knowledge for
fully understanding the terms “parametric” and “nonparametric.” These statistical
fundamentals include random variables, probability distributions, parameters,
population, sample, sampling distributions and the Central Limit Theorem. I cannot
explain these topics in a few paragraphs, as they would usually comprise two or three
chapters in a statistics textbook.

The field of statistics exists because it is usually impossible to collect data from
all individuals of interest (population). Our only solution is to collect data from a subset
(sample) of the individuals of interest, but our real desire is to know the “truth” about the
population. Quantities such as means, standard deviations and proportions are all
important values and are called “parameters” when we are talking about a population.
Since we usually cannot get data from the whole population, we cannot know the values
of the parameters for that population. We can, however, calculate estimates of these
quantities for our sample. When they are calculated from sample data, these quantities
are called “statistics.” A statistic estimates a parameter.
Parametric statistical procedures rely on assumptions about the shape of the distribution in the underlying population (e.g., a normal distribution) and about the parameters (e.g., means and standard deviations) of the assumed distribution. Nonparametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn.

DIFFERENT HYPOTHESIS TESTS


Table 1.

Hypothesis Test | Underlying Distribution | Purpose

Parametric (assumes the data follow a distribution):
1 Sample t-Test | Normal | Compares one sample average to a historical average or target
2 Sample t-Test | Normal | Compares two independent sample averages
Paired t-Test | Normal | Compares two dependent sample averages
Test for Equal Variances | Chi-Square | Compares two or more independent sample variances or standard deviations
1 Proportion Test | Binomial | Compares one sample proportion (percentage) to a historical average or target
2 Proportion Test | Binomial | Compares two independent proportions
Chi-square Goodness of Fit | Chi-square | Determines whether a data set fits a known distribution
Chi-square Test for Independence | Chi-square | Determines whether two classification variables are independent

Non-parametric (makes no assumption about the underlying distribution of the data):
1 Sample Sign Test | None | Compares one sample median to a historical median or target
Mann-Whitney Test | None | Compares two independent sample medians

Source: www.statanalytica.com

Parametric tests and analogous nonparametric procedures


As I mentioned, it is sometimes easier to list examples of each type of procedure
than to define the terms. Table 1 contains the names of several statistical procedures
you might be familiar with and categorizes each one as parametric or nonparametric.
All of the parametric procedures listed in Table 1 rely on an assumption of approximate
normality.
Table 2.

Analysis type: Compare means between two distinct/independent groups
Example: Is the mean systolic blood pressure (at baseline) for patients assigned to placebo different from the mean for patients assigned to the treatment group?
Parametric procedure: Two-sample t-test
Nonparametric procedure: Wilcoxon rank-sum test

Analysis type: Compare two quantitative measurements taken from the same individual
Example: Was there a significant change in systolic blood pressure between baseline and the six-month follow-up measurement in the treatment group?
Parametric procedure: Paired t-test
Nonparametric procedure: Wilcoxon signed-rank test

Analysis type: Compare means between two or more distinct/independent groups
Example: If our experiment has three groups (e.g., placebo, new drug #1, new drug #2), we might want to know whether the mean systolic blood pressure at baseline differed among the three groups.
Parametric procedure: Analysis of variance (ANOVA)
Nonparametric procedure: Kruskal-Wallis test

Analysis type: Estimate the degree of association between two quantitative variables
Example: Is systolic blood pressure associated with the patient's age?
Parametric procedure: Pearson coefficient of correlation
Nonparametric procedure: Spearman's rank correlation

Take-home points

Here is a summary of the major points and how they might affect statistical analyses you perform:
 Parametric and nonparametric are two broad classifications of statistical procedures.
 Parametric tests are based on assumptions about the distribution of the underlying population from which the sample was taken. The most common parametric assumption is that data are approximately normally distributed.
 Nonparametric tests do not rely on assumptions about the shape or parameters of the underlying population distribution.
 If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions.
 You should be aware of the assumptions associated with a parametric procedure and should learn methods to evaluate the validity of those assumptions.
 If you determine that the assumptions of the parametric procedure are not valid, use an analogous nonparametric procedure instead.
 The parametric assumption of normality is particularly worrisome for small sample sizes (n < 30). Nonparametric tests are often a good option for such data.
 It can be difficult to decide whether to use a parametric or nonparametric procedure in some cases. Nonparametric procedures generally have less power for the same sample size than the corresponding parametric procedure if the data truly are normal. Interpretation of nonparametric procedures can also be more difficult than for parametric procedures.
 Consult a statistician if you are in doubt about whether a parametric or nonparametric procedure is more appropriate for your data.

PARAMETRIC TESTS
Most of the statistical tests we perform are based on a set of assumptions. When these assumptions are violated, the results of the analysis can be misleading or completely erroneous.

Typical assumptions are:


 Normality: Data have a normal distribution (or are at least symmetric)
 Homogeneity of variances: Data from multiple groups have the same variance
 Linearity: Data have a linear relationship
 Independence: Data are independent

We explore in detail what it means for data to be normally distributed in Normal Distribution, but in general it means that the graph of the data has the shape of a bell curve. Such data are symmetric around their mean and have excess kurtosis equal to zero. In Testing for Normality and Symmetry we provide tests to determine whether data meet this assumption.
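As a rough illustration, the normality assumption can be checked in code before choosing a parametric procedure. The sketch below uses the Shapiro-Wilk test from SciPy on hypothetical, simulated blood-pressure readings (the data and sample size are assumptions, not from the text):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 40 systolic BP readings, simulated as approximately normal
rng = np.random.default_rng(0)
data = rng.normal(loc=120, scale=10, size=40)

# Shapiro-Wilk test of the normality assumption
stat, p = stats.shapiro(data)
# A large p-value gives no evidence against normality,
# so a parametric test would be reasonable for these data.
```

A small p-value here would be a signal to consider the analogous nonparametric procedure instead.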

Some tests (e.g., ANOVA) require that the groups of data being studied
have the same variance. In Homogeneity of Variance we provide some tests for
determining whether groups of data have the same variance.

Some tests (e.g., Regression) require that there be a linear correlation


between the dependent and independent variables. Generally, linearity can be
tested graphically using scatter diagrams or via other techniques explored in
Correlation, Regression and Multiple Regression.

As already said, parametric tests assume a normal distribution in the data. The following table describes some of the most popular parametric tests and what they measure.

Type | Measure | Name | Description
One Sample Test | Mean | One Sample t-Test | Determines whether there is a significant difference between an observed mean and a theoretical one. The sample size is small and the variance is unknown.
One Sample Test | Mean | Z Test | Determines whether there is a significant difference between an observed mean and a theoretical one. The variance is known and the sample size is large.
One Sample Test | Correlation | Pearson Correlation Coefficient | Tests the association between two samples.
Two Sample Test | Mean | Two Group t-Test | Compares two observed means (independent samples). The sample size is small and the variance is unknown.
Two Sample Test | Mean | Paired t-Test | Compares two observed means (paired samples). The sample size is small and the variance is unknown.
Two Sample Test | Mean | Z Test | Compares two observed means (independent samples). The variance is known and the sample size is large.

STUDENT'S T-TEST

 Developed by Prof. W. S. Gosset, who published under the pen name "Student"
 A t-test compares the difference between the means of two groups to determine whether the difference is statistically significant.

One Sample t-Test


Assumptions:
 The population is normally distributed
 The sample is drawn from the population at random
 The population mean is known
Conditions:
 The population standard deviation is not known
 The size of the sample is small (n < 30)
In a one-sample t-test, we know the population mean. We draw a random sample from the population, compare the sample mean with the population mean, and make a statistical decision as to whether or not the sample mean differs from the population mean.

Let x1, x2, ..., xn be a random sample of size n drawn from a normal population with mean µ and variance σ².

Null Hypothesis (H0):
The population mean µ is equal to a specified value µ0, i.e., H0: µ = µ0.

Under H0, the test statistic is

t = (x̄ - µ0) / (s / √n)

which follows a t distribution with n - 1 degrees of freedom.
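As a minimal sketch of this test (the sample values and hypothesized mean are made-up assumptions), SciPy's one-sample t-test can be compared directly against the textbook formula:

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # hypothetical data
mu0 = 12.0                                                  # hypothesized mean
n = len(sample)

# scipy's built-in one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# the textbook formula: t = (x̄ - µ0) / (s / √n), with n - 1 degrees of freedom
t_manual = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
```

The two computations agree because `ttest_1samp` uses the same sample standard deviation (with n - 1 in the denominator).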
Two Sample t-Test
 Used when two independent random samples come from normal populations having unknown but equal variances.
 We test the null hypothesis that the two population means are the same, i.e., µ1 = µ2.
Assumptions:
1. The populations are normally distributed
2. The samples are drawn independently and at random
Conditions:
1. The standard deviations of the populations are equal and not known
2. The sizes of the samples are small
Null Hypothesis:
H0: µ1 = µ2

Under H0, the test statistic is

t = (x̄ - ȳ) / (s √(1/n1 + 1/n2))

where s is the pooled standard deviation of the two samples.
Paired t-Test
Used when measurements are taken from the same subjects before and after some manipulation or treatment.
Ex: To determine the significance of a difference in blood pressure before and after administration of an experimental pressor substance.
Assumptions:
1. The populations are normally distributed
2. The samples are drawn independently and at random
Conditions:
1. The samples are related to each other
2. The sizes of the samples are small and equal
3. The standard deviations of the populations are equal and not known

Null Hypothesis:
H0: µd = 0

Under H0, the test statistic is

t = d̄ / (s / √n)

where
d = the difference between each pair of values x1 and x2
d̄ = the average of the differences d
s = the standard deviation of the differences
n = the sample size
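The paired test is equivalent to a one-sample t-test on the pairwise differences, which the sketch below verifies on hypothetical before/after readings:

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

before = [120, 122, 143, 100, 109]   # hypothetical readings before treatment
after  = [122, 120, 141, 109, 109]   # readings on the same subjects afterwards

# scipy's paired t-test
t_stat, p_value = stats.ttest_rel(before, after)

# equivalent textbook form: t = d̄ / (s / √n) on the paired differences
d = [b - a for b, a in zip(before, after)]
t_manual = mean(d) / (stdev(d) / sqrt(len(d)))
```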

Z-Test
 The Z-test is a statistical test based on the normal distribution, used for problems involving large samples, i.e., when the sample size is greater than or equal to 30.
 It is used when the population standard deviation is known.
Assumptions:
 The population is normally distributed
 The sample is drawn at random
Conditions:
 The population standard deviation σ is known
 The size of the sample is large (say, n > 30)

Null Hypothesis:
The population mean µ is equal to a specified value µ0, i.e., H0: µ = µ0

Under H0, the test statistic is

Z = (x̄ - µ0) / (σ / √n)

If the calculated value of Z is less than the table value of Z at the 5% level of significance, H0 is accepted; hence we conclude that there is no significant difference between the population mean and the value µ0 specified in H0.
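Because σ is known, the Z statistic and its p-value can be computed directly from the normal distribution. The summary numbers below (n, sample mean, σ) are illustrative assumptions:

```python
from math import sqrt
from scipy.stats import norm

# hypothetical summary data: large sample, population σ known
n, xbar, mu0, sigma = 50, 101.2, 100.0, 5.0

# Z = (x̄ - µ0) / (σ / √n)
z = (xbar - mu0) / (sigma / sqrt(n))

# two-sided p-value from the standard normal table
p_value = 2 * norm.sf(abs(z))
```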

Pearson’s ‘r’ Correlation


 Correlation is a technique for investigating the relationship between two
quantitative continuous variables.
 Pearson’s Correlation Coefficient (r) is a measure of the strength of the
association between the two variables.
TYPES OF CORRELATION

Type of Correlation | Correlation Coefficient
Perfect Positive Correlation | r = +1
Partial Positive Correlation | 0 < r < +1
No Correlation | r = 0
Partial Negative Correlation | -1 < r < 0
Perfect Negative Correlation | r = -1
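A brief sketch with made-up age and blood-pressure values shows how r is computed in practice and where it falls among the categories above:

```python
from scipy import stats

age = [25, 30, 35, 40, 45, 50, 55, 60]           # hypothetical patient ages
sbp = [118, 121, 125, 124, 130, 133, 135, 140]   # hypothetical systolic BP

# r measures the strength of the linear association; here 0 < r < +1,
# a partial (in fact quite strong) positive correlation
r, p_value = stats.pearsonr(age, sbp)
```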

ANOVA (Analysis of Variance)

 Analysis of Variance (ANOVA) is a collection of statistical models used to analyze differences between group means.
 Compares multiple groups at one time
 Developed by R. A. Fisher

ONE-WAY ANOVA compares two or more unmatched groups when data are categorized by one factor.

Ex:
1. Comparing a control group with three different doses of aspirin
2. Comparing the productivity of three or more employees based on working hours in a company

TWO-WAY ANOVA
 Used to determine the effect of two nominal predictor variables on a continuous outcome variable.
 It analyzes the effect of the independent variables on the expected outcome.
Ex: Comparing employee productivity based on working hours and working conditions.

Assumptions of ANOVA:
 The samples are independent and selected randomly.
 The parent population from which the samples are taken is normally distributed.
 The various treatment and environmental effects are additive in nature.
 The experimental errors are normally distributed with mean zero and variance σ².
ANOVA compares variances by means of the F-ratio:

F = variance between samples / variance within samples
 The result again depends on the experimental design
Null Hypothesis:
H0: all population means are the same
 If the computed F value is greater than the F critical value, we reject the null hypothesis.
 If the computed F value is less than the F critical value, the null hypothesis is accepted.
ANOVA Table

Sources of variation | Sum of Squares (SS) | Degrees of Freedom (d.f.) | Mean Squares (MS = SS/d.f.) | F-Ratio
Between samples or groups (Treatment) | Treatment sum of squares (TrSS) | k - 1 | TrMS = TrSS/(k - 1) | F = TrMS/EMS
Within samples or groups (Errors) | Error sum of squares (ESS) | n - k | EMS = ESS/(n - k) |
Total | Total sum of squares (TSS) | n - 1 | |

where k = number of groups and n = total number of observations.
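A one-way ANOVA of the blood-pressure example can be sketched with SciPy's `f_oneway`; the three groups below are hypothetical data, not from the text:

```python
from scipy import stats

# hypothetical baseline systolic BP for three treatment groups
placebo = [120, 118, 125, 121, 119]
drug_1  = [115, 112, 118, 114, 116]
drug_2  = [110, 108, 113, 109, 111]

# F = (variance between groups) / (variance within groups);
# a large F relative to the critical value leads to rejecting H0
f_stat, p_value = stats.f_oneway(placebo, drug_1, drug_2)
```

Because the group means here are well separated relative to the within-group spread, F is large and the p-value is small.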

S. No | Type of group | Parametric test
1. | Comparison of two paired groups | Paired t-Test
2. | Comparison of two unpaired groups | Unpaired (two-sample) t-Test
3. | Comparison of a sample with the population from which it was drawn | One sample t-Test
4. | Comparison of three or more matched groups varied in two factors | Two-way ANOVA
5. | Comparison of three or more matched groups varied in one factor | One-way ANOVA
6. | Correlation between two variables | Pearson Correlation

NON-PARAMETRIC TEST STATISTICS

A non-parametric test is one that makes no such assumptions. In this strict sense, "non-parametric" is essentially a null category, since virtually all statistical tests assume one thing or another about the properties of the source population(s).
Non-parametric statistics is a branch of statistics. It refers to statistical methods in which the data are not required to fit a normal distribution. Non-parametric statistics often uses ordinal data, meaning data that do not rely on numbers as such, but rather on a ranking or order of sorts.
For example: a survey conveying consumer preferences ranging from like to dislike would be considered ordinal data.
Nonparametric statistics does not assume that the data are drawn from a normal distribution; descriptive statistics, statistical tests, inference and models can all be estimated under this form of statistics. There is no assumption about sample size, and the observed data need not be quantitative.
This type of statistics can be used without the mean, sample size, standard deviation, or estimates of any other parameters.
Non-parametric tests do not make any assumptions about the distribution of the data. The following table describes some of the most popular non-parametric tests and what they measure.
TYPES OF NON-PARAMETRIC TESTS
 Sign test
 Rank sum test
 Chi-square test
 Wilcoxon signed-rank test
 McNemar test
 Spearman's rank correlation

SIGN TEST
The sign test is one of the non-parametric tests. Its name reflects the fact that it is based on the direction of the plus (+) and minus (-) signs of observations in a sample. The sign test may be classified into two types:
Type | Measure | Name | Description
One Sample Test | Mean | One Sample Wilcoxon's Test | Determines whether there is a significant difference between an observed mean and a theoretical one
One Sample Test | Randomness | Run Test | Determines the randomness of the data
One Sample Test | Distribution | Kolmogorov-Smirnov Test [2] | Compares an observed distribution to a theoretical one. Data are continuous
One Sample Test | Distribution | Chi Square Test | Compares an observed distribution to a theoretical one. Data are binned and represent frequencies
Two Sample Test | Correlation | Spearman Rank Correlation | Tests the association between two samples
Two Sample Test | Mean | Mann-Whitney's Test | Compares two observed means (independent samples)
Two Sample Test | Mean | Wilcoxon's Test [3] | Compares two observed means (paired samples)
Two Sample Test | Distribution | Kolmogorov-Smirnov Test | Compares an observed distribution to a theoretical one. Data are continuous
Two Sample Test | Distribution | Chi Square Test | Compares an observed distribution to a theoretical one. Data are binned and represent frequencies

ONE SAMPLE SIGN TEST


The one sample sign test is a very simple non-parametric test, and the data can be non-symmetric in nature. The one sample sign test computes the statistical significance of a hypothesized median value for a single data set.
For example:
H0: population median = 63
H1: population median > 63
Plus signs: 8; minus signs: 2; total sample = 10
TWO SAMPLE SIGN TEST
The sign test has important applications in problems where we deal with paired data. Each pair of values can be replaced with a plus (+) sign if the first value (say, X) is greater than the corresponding value of the second sample (say, Y), and with a minus (-) sign if the value of X is less than the value of Y. If the two values are equal, the pair is discarded.
For example:
X:          1  0  2  3  1  0  2  2  3  0
Y:          0  0  1  0  2  0  0  1  1  2
Sign (X-Y): +  0  +  +  -  0  +  +  +  -

Total number of + signs = 6
Total number of - signs = 2
Hence, the sample size is 8 [since there are 2 zeros in the sign row and those 2 pairs are discarded (10 - 2 = 8)]
Formula
For small samples, the critical value is
K = (n - 1)/2 - 0.98 √n
For large samples, the test statistic is
Z = (S - np) / √(np(1 - p))
where S is the number of plus signs and p = 0.5 under H0.
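Since the signs behave like fair-coin flips under H0, the one-sample sign test example above (8 plus signs, 2 minus signs) can be evaluated exactly with a binomial test:

```python
from scipy.stats import binomtest

# the worked example above: 8 plus signs, 2 minus signs, ties already dropped
n_plus, n_minus = 8, 2

# under H0 (median = 63) each sign is a fair-coin flip (p = 0.5);
# H1 (median > 63) corresponds to a one-sided "greater" alternative
result = binomtest(n_plus, n_plus + n_minus, p=0.5, alternative="greater")
p_value = result.pvalue   # P(X >= 8) for X ~ Binomial(10, 0.5)
```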

RANK SUM TEST

The rank sum tests are:
 U test (Wilcoxon-Mann-Whitney test)
 H test (Kruskal-Wallis test)

 U test: This is a non-parametric test. It determines whether two independent samples have been drawn from the same population. It requires data that can be ranked, i.e., ordered from lowest to highest (ordinal data).

For example:

The values of one sample are 53, 38, 69, 57, 46.

The values of the other sample are 44, 40, 61, 53, 32.

We assign ranks to all observations, adopting a low-to-high ranking process as if the given items all belong to a single sample.

Formula

U1 = n1·n2 + n1(n1 + 1)/2 - ∑R1

where
n1 = number of sample readings in one group
n2 = number of sample readings in the other group
∑R1 = sum of the ranks of the readings in the first group
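The two samples from the example can be fed straight into SciPy's Mann-Whitney U test. Note a convention difference: SciPy reports U counted for the first sample, while the formula above yields the complementary value (the two always add to n1·n2):

```python
from scipy import stats

sample_1 = [53, 38, 69, 57, 46]   # values from the U-test example above
sample_2 = [44, 40, 61, 53, 32]

# scipy's U statistic counts pairs where sample_1 exceeds sample_2
# (ties, like the shared 53, contribute 0.5 each)
u_stat, p_value = stats.mannwhitneyu(sample_1, sample_2, alternative="two-sided")
```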

 H test: The Kruskal-Wallis H test (also called the "one-way ANOVA on ranks") is a rank-based non-parametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable.

For example:
An H test can be used to understand whether exam performance, measured on a continuous scale from 0-100, differs based on test anxiety level (i.e., the dependent variable would be "exam performance" and the independent variable would be "test anxiety level", which has three independent groups: students with "low", "medium" and "high" test anxiety levels).

Formula

H = [12 / (n(n + 1))] ∑(Ti² / ni) - 3(n + 1)

where
ni = sample size for the i-th population
Ti = rank sum for the i-th population
n = total number of observations
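The exam-anxiety example can be sketched with SciPy's `kruskal`; the three score groups below are hypothetical data invented for illustration:

```python
from scipy import stats

# hypothetical exam scores (0-100) for three test-anxiety groups
low_anxiety    = [72, 78, 85, 80, 74]
medium_anxiety = [65, 70, 68, 72, 66]
high_anxiety   = [55, 60, 58, 62, 57]

# H is computed from the rank sums, as in the formula above
# (scipy also applies a correction for tied ranks)
h_stat, p_value = stats.kruskal(low_anxiety, medium_anxiety, high_anxiety)
```

Because the three groups barely overlap, H is large and the p-value is small, so we would conclude exam performance differs by anxiety level.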

CHI-SQUARE TEST

The chi-square test is a non-parametric test. It is used mainly when dealing with nominal variables. The chi-square test has two main uses:

 Goodness of fit: Goodness of fit refers to whether a significant difference exists between an observed number and an expected number of responses, people or other objects.
For example:
Suppose that we flip a coin 20 times and record the frequency of occurrence of heads and tails. We should expect 10 heads and 10 tails.

Suppose our coin-flipping experiment yielded 12 heads and 8 tails. Our expected frequencies are then (10, 10) and our observed frequencies (12, 8).

 Independence: the test of independence examines differences between the frequencies of occurrence in two or more categories across two or more groups.
For example:
Educational attainment | Low | Middle | High | Total
UG | 13 | 16 | 1 | 30
PG | 43 | 51 | 60 | 154
Total | 56 | 67 | 61 | 184

If educational attainment is classified (UG and PG) and income is categorized (low, middle, high), then we could use the chi-square test for independence.

Formula: χ² = ∑[(O - E)² / E]
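Both uses of the chi-square test can be sketched directly from the two examples above:

```python
from scipy import stats

# goodness of fit: 20 coin flips, observed 12 heads and 8 tails vs expected 10/10
# χ² = (12-10)²/10 + (8-10)²/10 = 0.8
chi2_gof, p_gof = stats.chisquare(f_obs=[12, 8], f_exp=[10, 10])

# independence: the education-by-income table from the example above
observed = [[13, 16, 1],
            [43, 51, 60]]
# expected counts are computed from the row and column totals
chi2_ind, p_ind, dof, expected = stats.chi2_contingency(observed)
```

For the 2x3 table, the degrees of freedom are (rows - 1)(columns - 1) = 2.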

WILCOXON SIGNED-RANKS TEST

In various research situations involving two related samples, when we can determine both the direction and the magnitude of the difference between matched values, we can use an important non-parametric test: the Wilcoxon matched-pairs test. When applying this test, we first find the difference between each pair of values and assign ranks to the differences from the smallest to the largest without regard to sign.

For example: an experiment on brand name quality perception

Pair | Brand A | Brand B | Difference | Rank
1 | 25 | 32 | -7 | 7.5
2 | 29 | 30 | -1 | 2.5
3 | 10 | 8 | 2 | 5.5
4 | 31 | 32 | -1 | 2.5
5 | 27 | 20 | 7 | 7.5
6 | 24 | 32 | -8 | 9
7 | 26 | 27 | -1 | 2.5
8 | 29 | 30 | -1 | 2.5
9 | 30 | 32 | -2 | 5.5
10 | 32 | 32 | 0 | Omit
11 | 20 | 30 | -10 | 10
12 | 5 | 32 | -27 | 11
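The brand-perception table can be checked with SciPy's `wilcoxon`, which drops the zero difference (pair 10) just as the table marks it "Omit". The statistic is the smaller of the two signed-rank sums; here the positive differences (pairs 3 and 5) carry ranks 5.5 and 7.5, giving 13:

```python
from scipy import stats

# the brand-perception pairs from the table above
brand_a = [25, 29, 10, 31, 27, 24, 26, 29, 30, 32, 20, 5]
brand_b = [32, 30, 8, 32, 20, 32, 27, 30, 32, 32, 30, 32]

# zero differences are dropped by default, matching the "Omit" row;
# tied absolute differences receive averaged ranks, as in the table
w_stat, p_value = stats.wilcoxon(brand_a, brand_b)
```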

MCNEMAR TEST

The McNemar test is an important non-parametric test often used when the data happen to be nominal and relate to two related samples. As such, this test is especially useful with before-and-after measurements on the same subjects.
Example: A researcher wanted to compare the attitudes of medical students toward confidence in statistical analysis before and after an intensive statistics course.
Formula:
χ² = (b - c)² / (b + c), with 1 degree of freedom
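Here b and c are the two discordant cells of the paired 2x2 table (subjects who changed their answer in each direction). A minimal sketch with hypothetical counts:

```python
from scipy.stats import chi2

# hypothetical discordant counts: b students changed yes -> no,
# c students changed no -> yes after the statistics course
b, c = 15, 5

# McNemar statistic: χ² = (b - c)² / (b + c), 1 degree of freedom
chi2_stat = (b - c) ** 2 / (b + c)
p_value = chi2.sf(chi2_stat, df=1)
```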

17
SPEARMAN'S RANK CORRELATION

This method uses a measure of association that is based on the ranks of the observations and not on the numerical values of the data. It was developed by Charles Spearman in the early 1900s, and as such it is also known as Spearman's rank correlation coefficient.

For example:
ENGLISH (marks) | MATH (marks) | Rank (ENGLISH) | Rank (MATH) | Difference of ranks (D)
56 | 66 | 9 | 4 | 5
75 | 70 | 3 | 2 | 1
45 | 40 | 10 | 10 | 0
71 | 60 | 4 | 7 | 3
62 | 65 | 6 | 5 | 1
64 | 56 | 5 | 9 | 4
58 | 59 | 8 | 8 | 0
80 | 77 | 1 | 1 | 0
76 | 67 | 2 | 3 | 1
61 | 63 | 7 | 6 | 1

Formula:

ρ = 1 - [6∑D² / (N(N² - 1))]

D = R1 - R2

where
R1 = rank in the first variable
R2 = rank in the second variable
N = number of pairs
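The marks from the table above can be run through SciPy's `spearmanr`; since there are no tied marks in either subject, the result equals the formula with ∑D² = 54 and N = 10:

```python
from scipy import stats

english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]   # marks from the table above
math    = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

# scipy ranks both lists internally; with no ties this equals
# ρ = 1 - 6·ΣD² / (N(N² - 1))
rho, p_value = stats.spearmanr(english, math)
```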

BIBLIOGRAPHY

Books
Research Methodology, C. R. Kothari

Websites
• https://www.slideshare.com
• www.sciencecentral.com

Others
[1]. What Is The Difference Between Parametric And Non-parametric Statistics? https://sourceessay.com/what-is-the-difference-between-parametric-and-non-parametric-statistics
[2]. Kolmogorov-Smirnov Test https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
[3]. Wilcoxon test https://www.investopedia.com/terms/w/wilcoxon-test.asp
