Non Parametric Statistics

Engineering Data Analysis
Department of Mathematics and Statistics

College of Science
Non Parametric Statistics
• Nonparametric statistics are statistical methods that do not require the data to follow a normal distribution. For this reason, they are sometimes called distribution-free tests.
• Nonparametric tests serve as alternatives to parametric tests.
• Most nonparametric tests apply to data on an ordinal scale, and some apply to data on a nominal scale.
Note: Do not use nonparametric procedures when parametric procedures can be used, that is, when the data satisfy the assumptions of the parametric test.
Advantages of Non Parametric Statistical Procedures
• Because most nonparametric tests have very few requirements, it is unlikely that they will be used improperly.
• For some nonparametric procedures, the computations are fairly easy.
• The procedures can be used for count data or rank data, such as ratings of a movie as excellent, good, fair, or poor.
Disadvantages of Non Parametric Statistical Procedures
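• Nonparametric procedures are less efficient than parametric procedures: when the parametric assumptions hold, they need larger samples to detect the same effect.
• Because they are distribution-free, their results may be less precise than those of a parametric procedure whose assumptions are met.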
Non Parametric Tests
• One-Sample Sign Test
• Wilcoxon Signed Rank Test
• Mann Whitney U-Test
• Kruskal Wallis H-Test
• Spearman Rank Correlation Test
• Chi-Square Test
One Sample Sign Test
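The one sample sign test is a nonparametric equivalent of tests regarding a single population mean. It tests a hypothesis about the population median M by counting how many observations fall above and below the hypothesized value M0.
Command for One Sample Sign Test
To use the one sample sign test, install and load the package signmedian.test:
install.packages("signmedian.test")
library(signmedian.test)
signmedian.test(<numeric vector>, mu = <hypothesized median M0>, alternative = "<alternative>", conf.level = 1 − α)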
Assumption:
The observations in the sample must be independent.
Null and Alternative Hypothesis
H0: M = M0 vs. Ha: M ≠ M0 (two-tailed: alternative = "two.sided")
H0: M ≤ M0 vs. Ha: M > M0 (one-tailed: alternative = "greater")
H0: M ≥ M0 vs. Ha: M < M0 (one-tailed: alternative = "less")
Example 1: A website administrator for a company claims that the mean number of visitors per day to the company's website is no more than 1500. An employee doubts the accuracy of this claim. The numbers of visitors per day for 20 randomly selected days are listed below. At α = 0.05, can the employee reject the administrator's claim?
No. No. of Visitors No. No. of Visitors

1 1469 11 1525
2 1463 12 1568
3 1487 13 1602
4 1579 14 1544
5 1462 15 1548
6 1476 16 1492
7 1523 17 1500
8 1620 18 1452
9 1634 19 1511
10 1570 20 1823
Procedures for Testing Hypothesis
Step 1: H0 : M ≤ 1500 and Ha : M > 1500
Step 2: α = 0.05
Step 3: Since we are comparing the median of one sample to a hypothesized value, we will use the one sample sign test.
Step 4: Determine the p-value using signmedian.test:
signmedian.test(<numeric vector>, mu = 1500, alternative = "greater", conf.level = 0.95)
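The p-value in Step 4 can also be reproduced in base R without installing signmedian.test, because the sign test is an exact binomial test on the number of observations above M0. This is a minimal sketch, assuming the 20 values are typed in from the table; the variable names are illustrative.

```r
# Visitors per day for the 20 sampled days (Example 1)
visitors <- c(1469, 1463, 1487, 1579, 1462, 1476, 1523, 1620, 1634, 1570,
              1525, 1568, 1602, 1544, 1548, 1492, 1500, 1452, 1511, 1823)
m0 <- 1500  # hypothesized median under H0

# The sign test discards observations equal to m0 and counts the "+" signs
n_plus  <- sum(visitors > m0)   # days with more than 1500 visitors
n_total <- sum(visitors != m0)  # days not tied with 1500

# Under H0 each remaining sign is "+" with probability 0.5, so n_plus is
# Binomial(n_total, 0.5); Ha: M > 1500 makes this an upper-tail test
sign_test <- binom.test(n_plus, n_total, p = 0.5, alternative = "greater")
sign_test$p.value
```

Here 12 of the 19 non-tied days lie above 1500 (one day equals 1500 exactly and is dropped). Under these assumptions, signmedian.test(visitors, mu = 1500, alternative = "greater") should report the same exact p-value.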
Step 5: Since the p-value (0.1796) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to reject the claim of the website administrator.
Example 2: Recent studies of the private practices of physicians who saw no Medicaid patients suggested that the mean length of each patient visit was 22 minutes. It is believed that the mean visit length in practices with a large Medicaid load is shorter than 22 minutes. A random sample of 20 visits in practices with a large Medicaid load yielded, in order, the following visit lengths:
No. Time (minutes) No. Time (minutes)
1 3.3 11 16.8
2 4.3 12 23.4
3 3.4 13 18.1
4 20.1 14 23.5
5 15.6 15 18.7
6 20.4 16 24.8
7 16.2 17 18.9
8 21.6 18 24.9
9 16.4 19 19.1
10 21.9 20 26.8
Based on these data, is there sufficient evidence to conclude that the mean visit length in practices with a large
Medicaid load is shorter than 22 minutes?
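Step 1: H0: M ≥ 22 and Ha: M < 22
Step 2: α = 0.05
Step 3: Since we are comparing the median of one sample to a hypothesized value, we will use the one sample sign test.
Step 4: Determine the p-value using signmedian.test:
signmedian.test(<numeric vector>, mu = 22, alternative = "less", conf.level = 0.95)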
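As in Example 1, the sign test p-value can be sketched with base R's binom.test; this assumes the visit lengths are entered in the order listed, and the names are illustrative.

```r
# Visit lengths in minutes (Example 2)
visit_length <- c(3.3, 4.3, 3.4, 20.1, 15.6, 20.4, 16.2, 21.6, 16.4, 21.9,
                  16.8, 23.4, 18.1, 23.5, 18.7, 24.8, 18.9, 24.9, 19.1, 26.8)
m0 <- 22  # hypothesized median under H0

n_plus  <- sum(visit_length > m0)   # visits longer than 22 minutes
n_total <- sum(visit_length != m0)  # ties with 22 would be dropped (none here)

# Ha: M < 22 predicts few "+" signs, so this is a lower-tail binomial test
sign_test <- binom.test(n_plus, n_total, p = 0.5, alternative = "less")
sign_test$p.value
```

Only 5 of the 20 visits exceed 22 minutes, and the lower-tail probability of seeing 5 or fewer "+" signs out of 20 is what Step 5 compares with α.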
Step 5: Since the p-value (0.02069) is less than the 0.05 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the mean visit length in practices with a large Medicaid load is shorter than 22 minutes.
Wilcoxon Signed Rank Test
The Wilcoxon signed rank test is a nonparametric equivalent of the t-test for two related (paired) samples.
Command for Wilcoxon Signed Rank Test
wilcox.test(<a>, <b>, alternative = "<condition>", paired = TRUE, conf.level = 1 − α, exact = FALSE)
If there are ties or zero differences in your data, R cannot compute an exact p-value and prints a warning in the console; adding the argument exact = FALSE requests the normal approximation and avoids the warning.
a: numeric vector of data values (first measurement)
b: numeric vector of data values (second measurement, paired with a and of the same length)
H0: M1 = M2 vs. Ha: M1 ≠ M2 (two-tailed: alternative = "two.sided")
H0: M1 ≤ M2 vs. Ha: M1 > M2 (one-tailed: alternative = "greater")
H0: M1 ≥ M2 vs. Ha: M1 < M2 (one-tailed: alternative = "less")
Assumptions:
• Your dependent variable should be measured at the ordinal or continuous level.
• Your independent variable should consist of two categorical "related groups" or "matched pairs".
Example 1: Ten women participate in a study. A physical therapist measures the women’s waistlines before and
8 weeks after a rigorous exercise program begins. Test whether the program decreased the mean waistline at the
α = 0.01 level of significance.
Before (inches) After (inches)

48.0 42.5
23.1 20.7
28.6 18.3
23.9 24.8
25.7 58.3
47.8 33.7
38.9 33.5
22.1 22.1
49.7 43.1
31.1 23.6
Step 1: H0: Mbefore ≤ Mafter and Ha: Mbefore > Mafter
Step 2: α = 0.01
Step 3: Since we are comparing two related groups, we will use the Wilcoxon signed rank test.
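A sketch of the p-value computation for this example, assuming the before and after columns are entered in the order shown; the names are illustrative. One woman's waistline is unchanged (22.1 to 22.1); wilcox.test drops zero differences before ranking, and with exact = FALSE it uses the normal approximation.

```r
# Waistlines in inches, in the order listed in the table (Example 1)
before <- c(48.0, 23.1, 28.6, 23.9, 25.7, 47.8, 38.9, 22.1, 49.7, 31.1)
after  <- c(42.5, 20.7, 18.3, 24.8, 58.3, 33.7, 33.5, 22.1, 43.1, 23.6)

# paired = TRUE works on the differences before - after;
# Ha: M_before > M_after is the "greater" alternative
res <- wilcox.test(before, after, alternative = "greater", paired = TRUE,
                   conf.level = 0.99, exact = FALSE)
res$statistic  # V = sum of the ranks of the positive differences
res$p.value
```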
Step 5: Since the p-value (0.07757) is greater than the 0.01 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that the program decreased the mean waistline.
Example 2: An analyst might want to determine whether there is a difference in the
cost per mile of airfares in the United States between 1979 and 2009 for various
cities. The data in the table represent the costs per mile of airline tickets for a sample of
17 cities for both 1979 and 2009.
City 1979 2009 City 1979 2009
1 20.07 23.07 10 19.37 18.4
2 19.63 21.46 11 18.25 20.02
3 19.2 18.6 12 19.75 21.77
4 19.98 20.72 13 22.4 22.26
5 18.18 18.91 14 20.96 21.28
6 20.3 22.8 15 20.27 21.11
7 23.8 19.51 16 23.77 18.27
8 23.57 19.05 17 23.59 22.10
9 19.72 21.85
Step 1: H0: M1979 = M2009 and Ha: M1979 ≠ M2009
Step 2: α = 0.05
Step 3: Since we are comparing two related groups, we will use the Wilcoxon signed rank test.
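The p-value for this example can be obtained with a call like the following sketch, assuming the table is entered city by city (cities 1 to 17); the vector names are illustrative. The resulting p-value is what Step 5 compares with α = 0.05.

```r
# Cost per mile for the 17 cities, in city order (Example 2)
cost_1979 <- c(20.07, 19.63, 19.20, 19.98, 18.18, 20.30, 23.80, 23.57, 19.72,
               19.37, 18.25, 19.75, 22.40, 20.96, 20.27, 23.77, 23.59)
cost_2009 <- c(23.07, 21.46, 18.60, 20.72, 18.91, 22.80, 19.51, 19.05, 21.85,
               18.40, 20.02, 21.77, 22.26, 21.28, 21.11, 18.27, 22.10)

# Two-sided alternative because Ha: M1979 != M2009
res <- wilcox.test(cost_1979, cost_2009, alternative = "two.sided",
                   paired = TRUE, conf.level = 0.95, exact = FALSE)
res$statistic  # V for the differences cost_1979 - cost_2009
res$p.value
```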
Step 5: Since the p-value is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that there is a difference in the cost per mile of airfares in the United States between 1979 and 2009 for various cities.
Mann Whitney U-Test
The Mann Whitney U-test is a nonparametric procedure for comparing two populations using independent samples; it tests whether the two population distributions (usually summarized by their medians) are equal. It is the nonparametric equivalent of the independent-samples t-test.
Command for Mann Whitney U-Test
wilcox.test(<a>, <b>, alternative = "<condition>", conf.level = 1 − α, exact = FALSE)
Because paired = FALSE is the default, the two samples are treated as independent. If there are ties in your data, R cannot compute an exact p-value and prints a warning in the console; adding the argument exact = FALSE avoids the warning.
H0: M1 = M2 vs. Ha: M1 ≠ M2 (two-tailed: alternative = "two.sided")
H0: M1 ≤ M2 vs. Ha: M1 > M2 (one-tailed: alternative = "greater")
H0: M1 ≥ M2 vs. Ha: M1 < M2 (one-tailed: alternative = "less")
Assumptions:
• Your dependent variable should be measured at the ordinal or continuous level.
• Your independent variable should consist of two categorical "independent groups".
Example 1: When exposed to an infection, a person typically develops antibodies. The extent to which the antibodies respond can be measured by looking at a person's titer, which is a measure of the number of antibodies present. The higher the titer, the more antibodies are present. The data in the table represent the titers of 11 ill people and 11 healthy people exposed to tularemia in Vermont. Is the level of titer in the ill group greater than the level of titer in the healthy group? Use the α = 0.10 level of significance.
ill healthy
640 10
80 320
1280 320
160 320
640 80
640 160
1280 10
640 640
160 160
320 320
160 320
Step 1: H0: Mill ≤ Mhealthy and Ha: Mill > Mhealthy
Step 2: α = 0.10
Step 3: Since we are comparing two independent groups, we will use the Mann Whitney U-Test.
Step 4: Determine the p-value, with a = titers of the ill group and b = titers of the healthy group:
wilcox.test(<a>, <b>, alternative = "greater", conf.level = 0.90, exact = FALSE)
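Filled in with the table values, the call above could look like this sketch (vector names are illustrative). The titers contain many ties, which is why exact = FALSE is needed.

```r
# Titers of the 11 ill and 11 healthy people (Example 1)
ill     <- c(640, 80, 1280, 160, 640, 640, 1280, 640, 160, 320, 160)
healthy <- c(10, 320, 320, 320, 80, 160, 10, 640, 160, 320, 320)

# Two independent samples (paired = FALSE by default) give the
# Mann-Whitney U-test; Ha: M_ill > M_healthy is the "greater" alternative
res <- wilcox.test(ill, healthy, alternative = "greater",
                   conf.level = 0.90, exact = FALSE)
res$statistic  # W, R's form of the Mann-Whitney U statistic for `ill`
res$p.value
```

R reports W as the rank sum of the first sample minus n1(n1 + 1)/2, which equals the Mann-Whitney U count for that sample.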
Step 5: Since the p-value is less than the 0.10 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the level of titer in the ill group is greater than the level of titer in the healthy group.
Example 2: An engineer is comparing the time to failure (in flight hours) of two different air conditioners for airplanes and wants to determine if the mean time to failure for model Y is longer than the mean time to failure for model X. She obtains a random sample of 26 failure times for model X and an independent random sample of 17 failure times for model Y. Do the data in the table suggest that the time to failure for model Y is longer? Use the α = 0.05 level of significance.
Time to failure (flight hours)
Model X (n = 26): 7, 20, 5, 52, 103, 17, 7, 4, 76, 19, 25, 4, 76, 109, 33, 25, 19, 59, 287, 128, 68, 3, 4, 91, 472, 28
Model Y (n = 17): 115, 55, 219, 245, 239, 130, 412, 62, 225, 129, 71, 12, 200, 168, 118, 122, 253
Step 1: H0: Mx ≥ My and Ha: Mx < My
Step 2: α = 0.05
Step 3: Since we are comparing two independent groups, we will use the Mann Whitney U-Test.
Step 4: Determine the p-value, with a = model X failure times and b = model Y failure times:
wilcox.test(<a>, <b>, alternative = "less", conf.level = 0.95, exact = FALSE)
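A sketch of Step 4 with the data entered. It assumes the failure-time table is read column by column: the first and third columns hold 13 model X times each, and the second and fourth columns hold 13 and 4 model Y times, which gives the stated sample sizes of 26 and 17. The names are illustrative.

```r
# Failure times in flight hours (Example 2), read column by column
model_x <- c(7, 20, 5, 52, 103, 17, 7, 4, 76, 19, 25, 4, 76,
             109, 33, 25, 19, 59, 287, 128, 68, 3, 4, 91, 472, 28)
model_y <- c(115, 55, 219, 245, 239, 130, 412, 62, 225, 129, 71, 12, 200,
             168, 118, 122, 253)

# Ha: M_x < M_y, so the alternative is "less" with model X passed first
res <- wilcox.test(model_x, model_y, alternative = "less",
                   conf.level = 0.95, exact = FALSE)
res$statistic  # W for model X
res$p.value
```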
Step 5: Since the p-value (0.0001362) is less than the 0.05 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the mean time to failure for model Y is longer than the mean time to failure for model X.
Kruskal Wallis H-Test
The Kruskal Wallis H-test is a rank-based nonparametric test that can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is the nonparametric equivalent of one-way ANOVA.
Command for Kruskal Wallis H-Test
kruskal.test(<numeric vector> ~ <grouping factor>)
H0: The k population distributions are the same (under a shift model, M1 = M2 = · · · = Mk).
Ha: At least one of the population distributions is different from the others.
Assumptions:
• One independent variable with two or more levels (independent groups). The test is most commonly used when there are three or more levels.
• The dependent variable is measured at the ordinal, interval, or ratio level.
• Your observations should be independent.
Example 1: Researchers wanted to compare the math test scores of students at the end of secondary school from various cities. Eight randomly selected students each from Makati, Manila, and Quezon City were administered the same exam; the results are presented in the table. Can the researchers conclude that the distribution of exam scores is different for each city at the α = 0.01 level of significance?
Makati Manila Quezon City

578 568 506
548 530 518
521 571 485
555 569 480
548 563 458
530 535 456
502 561 513
492 450 491
Step 1:
H0: The distribution of exam scores is the same for each city.
Ha: The distribution of exam scores is not the same for every city.
Step 2: α = 0.01
Step 3: Since we are comparing more than two independent groups, we will use the Kruskal Wallis H-Test.
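Using the command above, the test for this example could be run as in the following sketch, assuming the scores are stacked city by city with a matching grouping factor; the names are illustrative.

```r
# Exam scores (Example 1), eight students per city
scores <- c(578, 548, 521, 555, 548, 530, 502, 492,   # Makati
            568, 530, 571, 569, 563, 535, 561, 450,   # Manila
            506, 518, 485, 480, 458, 456, 513, 491)   # Quezon City
city <- factor(rep(c("Makati", "Manila", "Quezon City"), each = 8))

res <- kruskal.test(scores ~ city)
res$statistic  # H, compared with a chi-square distribution on k - 1 = 2 df
res$p.value
```

Storing the data in "long" form (one value column plus one group column) is what the formula interface expects.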
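Step 5: Since the p-value is less than the 0.01 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the distribution of exam scores is not the same for every city.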
Example 2: A family doctor claims that the distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are different. He obtains a simple random sample of 12 individuals from each age group and determines their HDL cholesterol. The results are presented in the table. Test the doctor's claim at the α = 0.05 level of significance.
No. 20-29 years old 40-49 years old 60-69 years old
1 54 61 44
2 43 41 70
3 38 44 80
4 30 47 53
5 61 33 51
6 53 29 49
7 35 59 49
8 34 35 42
9 39 34 35
10 46 74 44
11 50 50 37
12 35 65 38
Step 1:
H0: The distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are the same.
Ha: The distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are different.
Step 2: α = 0.05
Step 3: Since we are comparing more than two independent groups, we will use the Kruskal Wallis H-Test.
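A sketch of the corresponding R call, assuming the HDL values are stacked column by column from the table (12 per age group); the names are illustrative.

```r
# HDL cholesterol (Example 2), 12 males per age group
hdl <- c(54, 43, 38, 30, 61, 53, 35, 34, 39, 46, 50, 35,   # 20-29 years old
         61, 41, 44, 47, 33, 29, 59, 35, 34, 74, 50, 65,   # 40-49 years old
         44, 70, 80, 53, 51, 49, 49, 42, 35, 44, 37, 38)   # 60-69 years old
age_group <- factor(rep(c("20-29", "40-49", "60-69"), each = 12))

# Tied values receive average ranks; kruskal.test applies the tie correction
res <- kruskal.test(hdl ~ age_group)
res$p.value
```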
Step 5: Since the p-value is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to support the claim of the doctor.
Spearman Rank Correlation
Spearman rank correlation (Spearman rho) is used to measure the strength and direction of the monotonic association between two ordinal or continuous variables. It is the nonparametric version of the Pearson product-moment correlation.
Command for Spearman Rho
cor.test(<numeric vector x>, <numeric vector y>, method = "spearman")
The conf.level argument of cor.test only affects the Pearson correlation, so no confidence interval is reported for Spearman rho.
H0: There is no significant relationship between the two continuous/ordinal variables.
Ha: There is a significant relationship between the two continuous/ordinal variables.
Assumptions:
• The two variables should be measured on an ordinal or continuous scale.
• There needs to be a monotonic relationship between the two variables.
Example 1: Here are the data for 10 participants in a triathlon. Is there a relationship between the individual ranks obtained in swimming and cycling at the 0.05 level of significance?
Swimming Rank Cycling Rank

46 99
45 98
18 10
22 25
17 16
31 32
48 33
1 2
61 59
5 8
Step 1:
H0: There is no significant relationship between the individual ranks obtained in swimming and cycling.
Ha: There is a significant relationship between the individual ranks obtained in swimming and cycling.
Step 2: α = 0.05
Step 3: Since we are testing for a relationship between two ordinal variables, we will use Spearman rho.
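The sketch below runs the test on the triathlon data and also checks the coefficient against the textbook formula ρs = 1 − 6Σd² / (n(n² − 1)), which holds when neither variable has ties. The data are assumed to be typed in row by row from the table; the names are illustrative.

```r
# Swimming and cycling ranks of the 10 participants (Example 1)
swim  <- c(46, 45, 18, 22, 17, 31, 48, 1, 61, 5)
cycle <- c(99, 98, 10, 25, 16, 32, 33, 2, 59, 8)

res <- cor.test(swim, cycle, method = "spearman")
res$estimate  # Spearman rho
res$p.value

# With no ties, rho also equals 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the difference between each participant's two ranks
d <- rank(swim) - rank(cycle)
n <- length(d)
rho_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho_manual
```

Here the squared rank differences sum to 18, so the formula gives 1 − 108/990, matching the value reported in Step 6.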
Step 5: Since the p-value (0.00138) is less than the 0.05 level of significance, we reject H0.
Step 6: There is a significant relationship between the individual ranks obtained in swimming and cycling, and the relationship is very strong and positive based on the correlation coefficient (0.8909).
Example 2: The following are the ranks in statistics and the ranks in mathematics of 10 students in an examination. Determine if there is a relationship between the ranks of students in the two subjects. Use the 0.05 level of significance.
Student Statistics Mathematics
1 56 66
2 75 70
3 45 20
4 71 60
5 62 65
6 64 56
7 58 59
8 80 77
9 76 67
10 61 68
Step 1:
H0: There is no significant relationship between the ranks of students in statistics and mathematics.
Ha: There is a significant relationship between the ranks of students in statistics and mathematics.
Step 2: α = 0.05
Step 3: Since we are testing for a relationship between two ordinal variables, we will use Spearman rho.
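A sketch of the same call for this example, assuming the two columns are entered in student order; the names are illustrative.

```r
# Statistics and mathematics results of the 10 students (Example 2)
statistics  <- c(56, 75, 45, 71, 62, 64, 58, 80, 76, 61)
mathematics <- c(66, 70, 20, 60, 65, 56, 59, 77, 67, 68)

# cor.test ranks both vectors itself, so raw scores can be passed directly
res <- cor.test(statistics, mathematics, method = "spearman")
res$estimate  # Spearman rho
res$p.value
```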
Step 5: Since the p-value (0.06025) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is no significant relationship between the ranks of students in statistics and mathematics. The correlation coefficient (0.6242) indicates a moderately strong positive relationship in the sample, but it is not statistically significant at the 0.05 level.
Chi-Square Test
The chi-square test for independence is used to determine whether there is an association between two categorical variables.
Command for Chi-Square Test
chisq.test(<x>, <y>)
x: a matrix of counts (contingency table), or a vector/factor of category labels
y: a vector/factor of category labels of the same length as x; ignored if x is a matrix
H0: The two categorical variables are independent.
Ha: The two categorical variables are dependent.
Assumptions:
• There are 2 variables, and both are measured as categories, usually at the nominal level. However, categories may be ordinal. Interval or ratio data that have been collapsed into ordinal categories may also be used.
• Each variable should consist of two or more categorical, independent groups.
• The data in the cells should be frequencies, or counts of cases, rather than percentages or some other transformation of the data.
• For a 2 by 2 table, all expected frequencies > 5.
• For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.
Example 1: The Gallup Organization conducted a survey in 2014 asking individuals questions pertaining to social well-being, such as strength of relationship with spouse, partner, or closest friend, making time for trips or vacations, and having someone who encourages them to be healthy. Social well-being scores were determined based on answers to these questions and used to categorize individuals as thriving, struggling, or suffering in their social well-being. In addition, body mass index (BMI) was determined based on the height and weight of the individual. This allowed for classification as obese, overweight, normal weight, or underweight.
The data in the following contingency table are based on the results of this survey.
Weight class Thriving Struggling Suffering
Obese 202 250 102
Overweight 294 302 110
Normal Weight 300 295 103
Underweight 17 17 8
Researchers wanted to determine whether the sample data suggest that there is an association between weight classification and social well-being.
Step 1:
H0: There is no association between weight classification and social well-being.
Ha: There is an association between weight classification and social well-being.
Step 2: α = 0.05
Step 3: Since we are testing for a relationship between two categorical variables, we will use the chi-square test.
The data are given only as a contingency table; the raw data are not available. To solve this problem, we first construct a matrix of the counts.
Syntax: matrix(<numeric vector>, nrow = <n>, ncol = <m>, byrow = <TRUE or FALSE>, dimnames = list(<row names>, <column names>))
chisq.test(<matrix>)
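Applied to this example, the matrix and test could look like the following sketch; the object name is illustrative, and byrow = TRUE is used so the counts can be typed row by row as they appear in the table.

```r
# Counts from the contingency table, typed row by row (Example 1)
counts <- c(202, 250, 102,
            294, 302, 110,
            300, 295, 103,
             17,  17,   8)
bmi_wellbeing <- matrix(counts, nrow = 4, ncol = 3, byrow = TRUE,
                        dimnames = list(c("Obese", "Overweight",
                                          "Normal Weight", "Underweight"),
                                        c("Thriving", "Struggling",
                                          "Suffering")))

res <- chisq.test(bmi_wellbeing)
res$statistic  # X-squared
res$parameter  # degrees of freedom = (4 - 1) * (3 - 1) = 6
res$p.value
```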
Step 5: Since the p-value (0.3057) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that there is an association between weight classification and social well-being.
Example 2: Educators are always looking for novel ways to teach statistics to undergraduates as part of a non-statistics degree course (e.g., psychology). With current technology, it is possible to present how-to guides for statistical programs online instead of in a book. However, different people learn in different ways. An educator would like to know whether gender (male/female) is associated with the preferred type of learning medium (online vs. books). Import the Excel file "CHISQUARE(gender vs learning medium)".
Testing the Assumption
Contingency Table
To construct the contingency table:
table(<row category>, <column category>)
Check that all expected frequencies are greater than 5. The expected frequencies can be displayed with chisq.test(<contingency table>)$expected.
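The expected-frequency rules from the assumptions can be wrapped in a small check. The helper check_expected_counts below is hypothetical (not part of base R); it relies only on the expected component returned by chisq.test. Because the Example 2 spreadsheet is not reproduced here, the sketch uses the Example 1 counts plus a small made-up 2 by 2 table for illustration.

```r
# Hypothetical helper (not part of base R): checks the expected-frequency
# rules listed in the Chi-Square Test assumptions
check_expected_counts <- function(tab) {
  # chisq.test warns when expected counts are small; the warning is
  # suppressed because this function reports the problem itself
  expected <- suppressWarnings(chisq.test(tab)$expected)
  if (all(dim(expected) == c(2, 2))) {
    all(expected > 5)                                # 2 by 2 table
  } else {
    all(expected > 1) && mean(expected < 5) <= 0.20  # larger tables
  }
}

# Example 1 counts (weight classification by social well-being)
bmi_wellbeing <- matrix(c(202, 250, 102,
                          294, 302, 110,
                          300, 295, 103,
                           17,  17,   8),
                        nrow = 4, byrow = TRUE)
check_expected_counts(bmi_wellbeing)

# Made-up 2 by 2 table (illustration only) with expected counts below 5
small_table <- matrix(c(3, 2, 4, 6), nrow = 2)
check_expected_counts(small_table)
```

For Example 2, the same helper could be applied to the result of table(<row category>, <column category>).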
Procedures for Testing the Hypothesis
Step 1:
H0: Gender is not associated with the preferred type of learning medium.
Ha: Gender is associated with the preferred type of learning medium.
Step 2: α = 0.05
Step 3: Since we are testing for a relationship between two categorical variables, we will use the chi-square test.
chisq.test(<contingency table>)
Step 5: Since the p-value (0.02635) is less than the 0.05 level of significance, we reject H0.
Step 6: There is sufficient evidence, based on the sample data, that the gender of students is associated with the preferred type of learning medium.
Thank You!