Non Parametric Statistics
- Nonparametric statistics refers to statistical methods that do not require the data to follow a normal distribution. For this reason, nonparametric tests are sometimes called distribution-free tests.
- Nonparametric tests serve as an alternative to parametric tests.
- Most nonparametric tests apply to data on an ordinal scale, and some apply to data on a nominal scale.
Note: Do not use nonparametric procedures when parametric procedures can be used; when their assumptions are met, parametric tests are generally more powerful.
Advantages of Non Parametric Statistical Procedures
- Most nonparametric tests have very few requirements, so it is unlikely that they will be used improperly.
- For some nonparametric procedures, the computations are fairly easy.
- The procedures can be used for count data or rank data, such as ratings of a movie as excellent, good, fair, or poor.
Disadvantages of Non Parametric Statistical Procedures
Non Parametric Tests
One Sample Sign Test
To use the one sample sign test command in R, you need to install the package signmedian.test.
One Sample Sign Test
Assumption:
One Sample Sign Test
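$H_0: M = M_0$ versus $H_a: M \neq M_0$ (two-tailed; alternative = "two.sided")
$H_0: M \le M_0$ versus $H_a: M > M_0$ (one-tailed; alternative = "greater")
$H_0: M \ge M_0$ versus $H_a: M < M_0$ (one-tailed; alternative = "less")
Here $M$ is the population median and $M_0$ is the hypothesized median.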
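The sign test reduces to a binomial test. Under $H_0$, each observation that differs from $M_0$ is equally likely to fall above or below it. The base-R sketch below shows how each alternative maps to a tail of the Binomial(n, 0.5) distribution. `sign_test_p()` is a hypothetical helper written for illustration. It is not a function from the signmedian.test package.

```r
# Minimal sketch of the one sample sign test using only base R.
# sign_test_p() is a hypothetical helper, not part of signmedian.test.
sign_test_p <- function(x, m0, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  x <- x[x != m0]   # observations equal to M0 carry no sign and are dropped
  n <- length(x)    # number of nonzero signs
  s <- sum(x > m0)  # number of plus signs (observations above M0)
  # Under H0, s follows a Binomial(n, 0.5) distribution
  p_upper <- pbinom(s - 1, n, 0.5, lower.tail = FALSE)  # P(S >= s)
  p_lower <- pbinom(s, n, 0.5)                          # P(S <= s)
  switch(alternative,
         greater   = p_upper,                            # Ha: M > M0
         less      = p_lower,                            # Ha: M < M0
         two.sided = min(1, 2 * min(p_upper, p_lower)))  # Ha: M != M0
}
```

Binomial(n, 0.5) is symmetric. Doubling the smaller tail therefore gives the same two-sided p-value as base R's `binom.test(s, n)`.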
One Sample Sign Test
Example 1: A website administrator for a company claims that the median number of visitors per day to the company's website is no more than 1500. An employee doubts the accuracy of this claim. The number of visitors per day for 20 randomly selected days is listed below. At α = 0.05, can the employee reject the administrator's claim?
Procedures for Testing Hypothesis
Step 1: H0 : M ≤ 1500 and Ha : M > 1500
Step 2: α = 0.05
Step 3: Since we are comparing the median of one sample with a hypothesized median, we will use the one sample sign test.
Step 4: Determine the p-value.
Command for One Sample Sign Test
To use the one sample sign test command in R, you need to install the package signmedian.test.
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.1796) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to reject the claim of the website administrator.
One Sample Sign Test
Example 2:
No. Time (minutes) No. Time (minutes)
1 3.3 11 16.8
2 4.3 12 23.4
3 3.4 13 18.1
4 20.1 14 23.5
5 15.6 15 18.7
6 20.4 16 24.8
7 16.2 17 18.9
8 21.6 18 24.9
9 16.4 19 19.1
10 21.9 20 26.8
Based on these data, is there sufficient evidence to conclude that the median visit length in practices with a large Medicaid load is shorter than 22 minutes?
Procedures for Testing Hypothesis
Step 1: H0 : M ≥ 22 and Ha : M < 22
Step 2: α = 0.05
Step 3: Since we are comparing the median of one sample with a hypothesized median, we will use the one sample sign test.
Step 4: Determine the p-value.
Command for One Sample Sign Test
To use the one sample sign test command in R, you need to install the package signmedian.test.
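The command shown on the original slide is not reproduced here. The sketch below is a hedged alternative that needs no extra package: it runs the same exact sign test with base R's `binom.test()` on the number of visit lengths above 22 minutes. The names `times`, `n_plus`, and `n_used` are our own.

```r
# Visit lengths (minutes) from the Example 2 table
times <- c(3.3, 4.3, 3.4, 20.1, 15.6, 20.4, 16.2, 21.6, 16.4, 21.9,
           16.8, 23.4, 18.1, 23.5, 18.7, 24.8, 18.9, 24.9, 19.1, 26.8)
m0 <- 22                      # hypothesized median M0
n_plus <- sum(times > m0)     # visits longer than 22 minutes
n_used <- sum(times != m0)    # ties with M0 would be dropped
# Ha: M < 22 means unusually few plus signs, so use the lower tail
result <- binom.test(n_plus, n_used, p = 0.5, alternative = "less")
result$p.value
```

Here 5 of the 20 visits exceed 22 minutes, and the p-value is about 0.0207, in line with Step 5. If signmedian.test is installed, `signmedian.test(times, mu = 22, alternative = "less")` should report the same exact p-value. That is an assumption about the package's interface and has not been verified here.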
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.02069) is less than the 0.05 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the median visit length in practices with a large Medicaid load is shorter than 22 minutes.
Wilcoxon Signed Rank Test
The Wilcoxon signed rank test is a nonparametric equivalent of the t-test for two related (paired) samples.
Command for Wilcoxon Signed Rank Test
If there are ties or zero differences in your data, add the argument exact = FALSE; otherwise R prints a warning that it cannot compute an exact p-value.
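In R this test is run with `wilcox.test(x, y, paired = TRUE)`. The hedged base-R sketch below shows what the reported statistic V measures. `signed_rank_V()` is a hypothetical helper that follows the usual definition:

1. Drop zero differences.
2. Rank the absolute differences, averaging tied ranks.
3. Sum the ranks of the positive differences.

```r
# Hypothetical helper: Wilcoxon signed rank statistic V for paired data
signed_rank_V <- function(x, y) {
  d <- x - y
  d <- d[d != 0]          # zero differences are discarded
  r <- rank(abs(d))       # ranks of |d|; tied values get average ranks
  sum(r[d > 0])           # V = sum of ranks of the positive differences
}
```

Take x as the first measurement of each pair and y as the second. The result should then match the V statistic printed by `wilcox.test(x, y, paired = TRUE)`.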
Wilcoxon Signed Rank Test
$H_0: M_1 = M_2$ versus $H_a: M_1 \neq M_2$ (two-tailed; alternative = "two.sided")
$H_0: M_1 \le M_2$ versus $H_a: M_1 > M_2$ (one-tailed; alternative = "greater")
$H_0: M_1 \ge M_2$ versus $H_a: M_1 < M_2$ (one-tailed; alternative = "less")
Wilcoxon Signed Rank Test
Assumptions:
- Your dependent variable should be measured at the ordinal or continuous level.
- Your independent variable should consist of two categorical "related groups" or "matched pairs".
Wilcoxon Signed Rank Test
Example 1: Ten women participate in a study. A physical therapist measures the women’s waistlines before and
8 weeks after a rigorous exercise program begins. Test whether the program decreased the mean waistline at the
α = 0.01 level of significance.
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.07757) is greater than the 0.01 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that the program decreased the women's waistlines.
Wilcoxon Signed Rank Test
Example 2: An analyst might want to determine whether there is a difference in the
cost per mile of airfares in the United States between 1979 and 2009 for various
cities. The data in the table represent the costs per mile of airline tickets for a sample of 17 cities for both 1979 and 2009.
City 1979 2009
1 20.07 23.07
2 19.63 21.46
3 19.2 18.6
4 19.98 20.72
5 18.18 18.91
6 20.3 22.8
7 23.8 19.51
8 23.57 19.05
9 19.72 21.85
10 19.37 18.4
11 18.25 20.02
12 19.75 21.77
13 22.4 22.26
14 20.96 21.28
15 20.27 21.11
16 23.77 18.27
17 23.59 22.10
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.6701) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that there is a difference in the cost per mile of airfares in the United States between 1979 and 2009 for various cities.
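The command used on the original slide is not reproduced above. A minimal sketch of a paired, two-sided test in base R follows. The vector names `y1979` and `y2009` are our own. We assume the reported p-value came from the normal approximation (`exact = FALSE`, with R's default continuity correction), because that setting reproduces 0.6701. The exact computation may give a slightly different value.

```r
# Cost per mile of airline tickets for the 17 cities (Example 2 table)
y1979 <- c(20.07, 19.63, 19.20, 19.98, 18.18, 20.30, 23.80, 23.57, 19.72,
           19.37, 18.25, 19.75, 22.40, 20.96, 20.27, 23.77, 23.59)
y2009 <- c(23.07, 21.46, 18.60, 20.72, 18.91, 22.80, 19.51, 19.05, 21.85,
           18.40, 20.02, 21.77, 22.26, 21.28, 21.11, 18.27, 22.10)
# Paired (related) samples, two-sided alternative, normal approximation
res <- wilcox.test(y1979, y2009, paired = TRUE,
                   alternative = "two.sided", exact = FALSE)
res$statistic  # V = 67
res$p.value    # about 0.6701
```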
Mann Whitney U-Test
The Mann Whitney U-test is a nonparametric procedure used to test the equality of two population medians from independent samples. It is the nonparametric equivalent of the independent samples t-test.
Command for Mann Whitney U-Test
If there are ties in your data, add the argument exact = FALSE; otherwise R prints a warning that it cannot compute an exact p-value with ties.
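In R the Mann Whitney U-test is also run with `wilcox.test()`, this time on two independent samples (`paired = FALSE`, the default). R labels the statistic W. The hedged sketch below uses a hypothetical helper, `mann_whitney_U()`, to show how that statistic comes from pooled ranks:

1. Rank both samples together.
2. Sum the ranks of the first sample.
3. Subtract $n_1(n_1 + 1)/2$.

```r
# Hypothetical helper: Mann Whitney U statistic for the first sample
mann_whitney_U <- function(x, y) {
  r <- rank(c(x, y))               # pooled ranks; ties get average ranks
  n1 <- length(x)
  R1 <- sum(r[seq_len(n1)])        # rank sum of the first sample
  R1 - n1 * (n1 + 1) / 2           # U = R1 - n1(n1 + 1)/2
}
```

Equivalently, U counts the (x, y) pairs with x > y, counting ties as 1/2. It therefore ranges from 0 to $n_1 n_2$.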
Mann Whitney U-Test
$H_0: M_1 = M_2$ versus $H_a: M_1 \neq M_2$ (two-tailed; alternative = "two.sided")
$H_0: M_1 \le M_2$ versus $H_a: M_1 > M_2$ (one-tailed; alternative = "greater")
$H_0: M_1 \ge M_2$ versus $H_a: M_1 < M_2$ (one-tailed; alternative = "less")
Mann Whitney U-Test
Assumptions:
- Your dependent variable should be measured at the ordinal or continuous level.
- Your independent variable should consist of two categorical "independent groups".
Mann Whitney U-Test
ill healthy
640 10
80 320
1280 320
160 320
640 80
640 160
1280 10
640 640
160 160
320 320
160 320
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.04657) is less than the 0.10 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the titer level in the ill group is greater than the titer level in the healthy group.
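The Step 4 command for this example is not reproduced above. Step 6 suggests a one-tailed test in which Ha states that the ill group has higher titers, at α = 0.10. Assuming that setup, a minimal base-R sketch follows. It uses `exact = FALSE` because the titer values contain ties.

```r
# Titer levels from the table above
ill     <- c(640, 80, 1280, 160, 640, 640, 1280, 640, 160, 320, 160)
healthy <- c(10, 320, 320, 320, 80, 160, 10, 640, 160, 320, 320)
# Independent samples; Ha: ill > healthy, so alternative = "greater"
res <- wilcox.test(ill, healthy, alternative = "greater", exact = FALSE)
res$statistic  # W = 86
res$p.value    # about 0.0466
```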
Mann Whitney U-Test
Example 2: An engineer is comparing the time to failure (in flight hours) of two
different air conditioners for airplanes and wants to determine if the mean time to
failure for model Y is longer than the mean time to failure for model X. She obtains
a random sample of 26 failure times for model X and an independent random sample
of 17 failure times for model Y. Do the data in the table suggest that the time to failure for model Y is longer? Use the α = 0.05 level of significance.
Mann Whitney U-Test
Procedures for Testing Hypothesis
Kruskal Wallis H-Test
The Kruskal Wallis H-test is a rank-based nonparametric test used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable. It is the nonparametric equivalent of one-way ANOVA.
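Base R provides `kruskal.test()`. The sketch below uses a hypothetical helper, `kruskal_H()`, to show where H comes from. It ranks all observations together and applies $H = \frac{12}{N(N+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(N+1)$. Here $R_i$ is the rank sum of group i, $n_i$ is its size, and $N$ is the total sample size. The result is then divided by the usual tie correction $1 - \sum (t^3 - t)/(N^3 - N)$, where each $t$ is the size of a group of tied values. Under $H_0$, H is approximately chi-square with k - 1 degrees of freedom.

```r
# Hypothetical helper: Kruskal Wallis H statistic with tie correction
kruskal_H <- function(values, groups) {
  N <- length(values)
  r <- rank(values)                       # pooled ranks, ties averaged
  R <- tapply(r, groups, sum)             # rank sum per group
  n <- tapply(r, groups, length)          # size of each group
  H <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
  t <- table(values)                      # sizes of tied groups of values
  H / (1 - sum(t^3 - t) / (N^3 - N))      # correct for ties
}
```

The p-value is then `pchisq(H, df = k - 1, lower.tail = FALSE)`, which is how `kruskal.test()` reports it.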
Kruskal Wallis H-Test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a$: At least one of the population means is different from the others.
Kruskal Wallis H-Test
Assumptions:
- One independent variable with two or more levels (independent groups). The test is more commonly used when there are three or more levels.
- The dependent variable is measured at the ordinal, interval, or ratio level.
- Your observations should be independent.
Kruskal Wallis H-Test
Example 1: Researchers wanted to compare math test scores of students at the end of secondary school from
various cities. Eight randomly selected students from each of Makati, Manila, and Quezon City were administered the same exam; the results are presented in the table. Can the researchers conclude that the distribution of exam scores is different for each city at the α = 0.01 level of significance?
Procedures for Testing Hypothesis
Step 1:
H0 : The distribution of exam scores is the same for each city.
Ha : The distribution of exam scores is different for each city.
Step 2: α = 0.01
Step 3: Since we are comparing more than two independent groups on an ordinal or continuous variable, we will use the Kruskal Wallis H-Test.
Step 4: Determine the p-value.
Command for Kruskal Wallis H-Test
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.007318) is less than the 0.01 level of significance, we reject H0.
Step 6: There is sufficient evidence to conclude that the distribution of exam scores differs among the cities.
Kruskal Wallis H-Test
Example 2: A family doctor claims that the distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are different. He obtains a simple random sample of 12 individuals from each age group and determines their HDL cholesterol. The results are presented in the table. Test the doctor's claim at the α = 0.05 level of significance.
No. 20-29 years old 40-49 years old 60-69 years old
1 54 61 44
2 43 41 70
3 38 44 80
4 30 47 53
5 61 33 51
6 53 29 49
7 35 59 49
8 34 35 42
9 39 34 35
10 46 74 44
11 50 50 37
12 35 65 38
Procedures for Testing Hypothesis
Step 1:
H0 : The distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are the same.
Ha : The distributions of HDL cholesterol in males for the age groups 20 to 29 years old, 40 to 49 years old, and 60 to 69 years old are different.
Step 2: α = 0.05
Step 3: Since we are comparing more than two independent groups on an ordinal or continuous variable, we will use the Kruskal Wallis H-Test.
Step 4: Determine the p-value.
Command for Kruskal Wallis H-Test
Procedures for Testing Hypothesis
Step 5: Since the p-value (0.5774) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to support the doctor's claim.
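The Step 4 command for this example is not reproduced above. One way to run the test in base R is to pass the three samples to `kruskal.test()` as a list, with each element treated as one group. The list name `hdl` and the group labels are our own.

```r
# HDL cholesterol by age group (Example 2 table)
hdl <- list(
  "20-29" = c(54, 43, 38, 30, 61, 53, 35, 34, 39, 46, 50, 35),
  "40-49" = c(61, 41, 44, 47, 33, 29, 59, 35, 34, 74, 50, 65),
  "60-69" = c(44, 70, 80, 53, 51, 49, 49, 42, 35, 44, 37, 38)
)
res <- kruskal.test(hdl)   # each list element is treated as one group
res$statistic              # H, about 1.10 with 2 degrees of freedom
res$p.value                # about 0.577
```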
Spearman Rank Correlation
Spearman Rank Correlation (Spearman Rho) is used to measure the strength and direction of the association between two ordinal or continuous variables. It is a nonparametric version of the Pearson product-moment correlation.
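Spearman's rho is the Pearson correlation computed on the ranks of the two variables. When there are no ties, it reduces to $r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of observation i. A minimal base-R sketch of that shortcut follows. `spearman_rho()` is a hypothetical helper. For data with ties, use `cor(x, y, method = "spearman")` instead.

```r
# Hypothetical helper: Spearman's rho via the no-ties shortcut formula
spearman_rho <- function(x, y) {
  stopifnot(length(x) == length(y),
            !anyDuplicated(x), !anyDuplicated(y))  # shortcut assumes no ties
  n <- length(x)
  d <- rank(x) - rank(y)                # difference in ranks per observation
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}
```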
Spearman Rank Correlation
Assumptions:
- The two variables should be measured on an ordinal or continuous scale.
Spearman Rank Correlation
Example 1: Here are the data for 9 participants in a triathlon. Is there a relationship between the individual ranks obtained in swimming and cycling at the 0.05 level of significance?
Procedures for Testing Hypothesis
Step 1:
H0 : There is no significant relationship between the individual ranks obtained in swimming and cycling.
Ha : There is a significant relationship between the individual ranks obtained in swimming and cycling.
Step 2: α = 0.05
Step 3: Since we are testing for a significant relationship between two ordinal variables, we will use Spearman rho.
Step 4: Determine the p-value.
Command for Spearman Rho
Step 5: Since the p-value (0.00138) is less than the 0.05 level of significance, we reject H0.
Step 6: There is a significant relationship between the individual ranks obtained in swimming and cycling, and the relationship is very strong and positive, based on the correlation coefficient (0.8909).
Spearman Rank Correlation
Example 2: The following are the examination scores of 10 students in statistics and in mathematics. Determine if there is a relationship between the ranks of the students in the two subjects. Use the 0.05 level of significance.
Student Statistics Mathematics
1 56 66
2 75 70
3 45 20
4 71 60
5 62 65
6 64 56
7 58 59
8 80 77
9 76 67
10 61 68
Procedures for Testing Hypothesis
Step 1:
H0 : There is no significant relationship between the ranks of students in statistics
and mathematics subjects.
Ha : There is a significant relationship between the ranks of students in statistics and mathematics subjects.
Step 2: α = 0.05
Step 3: Since we are testing for a significant relationship between two ordinal variables, we will use Spearman rho.
Step 4: Determine the p-value.
Command for Spearman Rho
Step 5: Since the p-value (0.06025) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is no significant relationship between the ranks of students in statistics and mathematics, although the sample correlation coefficient (0.6242) indicates a moderately strong positive relationship.
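Below is a hedged sketch of Step 4 for this example, using base R's `cor.test()` with `method = "spearman"`. The vector names are our own. With no tied scores and n = 10, R computes the p-value with its exact-method algorithm rather than the t approximation.

```r
# Scores of the 10 students (Example 2 table)
stats <- c(56, 75, 45, 71, 62, 64, 58, 80, 76, 61)
math  <- c(66, 70, 20, 60, 65, 56, 59, 77, 67, 68)
res <- cor.test(stats, math, method = "spearman", alternative = "two.sided")
res$estimate   # rho, about 0.6242
res$p.value    # about 0.060
```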
Chi-Square Test
chisq.test(<x>,<y>)
Chi-Square Test
Assumptions:
- There are 2 variables, and both are measured as categories, usually at the nominal level. However, categories may be ordinal. Interval or ratio data that have been collapsed into ordinal categories may also be used.
- The two variables should consist of two or more categorical, independent groups.
- The data in the cells should be frequencies, or counts of cases, rather than percentages or some other transformation of the data.
- For a 2 by 2 table, all expected frequencies > 5.
- For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.
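The expected-frequency conditions above can be checked in R before trusting the p-value. `chisq.test()` also returns the expected counts in its `expected` component. The helper `check_expected()` is a hypothetical sketch that encodes exactly the listed rules:

- 2 by 2 table: all expected counts > 5.
- Larger table: all expected counts > 1, and at most 20% of cells below 5.

```r
# Hypothetical helper encoding the expected-frequency rules listed above
check_expected <- function(tab) {
  tab <- as.matrix(tab)
  # Expected count for each cell: (row total * column total) / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (all(dim(tab) == 2)) {
    all(expected > 5)                                  # 2 by 2 table
  } else {
    all(expected > 1) && mean(expected < 5) <= 0.20    # larger table
  }
}
```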
Chi-Square Test
The data in the following contingency table are based on the results of this survey.
Researchers wanted to determine whether the sample data suggest that there is an association between weight classification and social well-being.
Procedures for Testing Hypothesis
Step 1:
H0 : There is no association between weight classification and social well-being.
Ha : There is an association between weight classification and social well-being.
Step 2: α = 0.05
Step 3: Since we are testing for an association between two categorical variables, we will use the chi-square test.
Procedures for Testing Hypothesis
Step 4: Determine the p-value.
The data are presented in a contingency table; the raw data are not given. To solve this problem, we first enter the counts as a matrix.
Syntax: matrix(<numeric vector>, nrow = <n>, ncol = <m>, byrow = <bool>, dimnames = list(<vector>,<vector>))
Command for Chi-Square Test
chisq.test(<matrix>)
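The survey counts are not reproduced here, so the sketch below only illustrates the syntax, using a hypothetical helper `counts_to_table()`. With `byrow = TRUE`, the counts are read row by row, in the same order as they appear in a printed contingency table. `dimnames` then attaches the category labels before the matrix is passed to `chisq.test()`.

```r
# Hypothetical helper: enter a printed contingency table as a labeled matrix
counts_to_table <- function(counts, row_labels, col_labels) {
  matrix(counts,
         nrow = length(row_labels), ncol = length(col_labels),
         byrow = TRUE,                          # fill one table row at a time
         dimnames = list(row_labels, col_labels))
}
# Usage (replace the placeholders with the counts from the table):
# tab <- counts_to_table(c(...), c(<row categories>), c(<column categories>))
# chisq.test(tab)
```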
Step 5: Since the p-value (0.3057) is greater than the 0.05 level of significance, we fail to reject H0.
Step 6: There is not sufficient evidence to conclude that there is an association between weight classification and social well-being.
Chi-Square Test
Example 2: Educators are always looking for novel ways to teach statistics to
undergraduates as part of a non-statistics degree course (e.g., psychology). With
current technology, it is possible to present how-to guides for statistical programs
online instead of in a book. However, different people learn in different ways. An
educator would like to know whether gender (male/female) is associated with the
preferred type of learning medium (online vs. books). Import the Excel file "CHISQUARE(gender vs learning medium)".
Testing the Assumption
Contingency Table
To Construct Contingency Table
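Once the raw data are imported, base R's `table()` can build the contingency table. One way to import them is `readxl::read_excel()`, assuming the readxl package is installed. The helper `make_contingency()` is hypothetical, and so are the column names `Gender` and `Medium` in the usage comment; use the names in the imported file.

```r
# Hypothetical helper: cross-tabulate two categorical columns of a data frame
make_contingency <- function(df, row_var, col_var) {
  table(df[[row_var]], df[[col_var]], dnn = c(row_var, col_var))
}
# Usage, assuming columns named Gender and Medium (hypothetical names):
# survey <- readxl::read_excel(path)  # path to the Excel file named above
# tab <- make_contingency(survey, "Gender", "Medium")
# chisq.test(tab)
```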
Procedures for Testing the Hypothesis
Step 1:
H0 : Gender is not associated with the preferred type of learning medium.
Ha : Gender is associated with the preferred type of learning medium.
Step 2: α = 0.05
Step 3: Since we are testing for an association between two categorical variables, we will use the chi-square test.
Step 4: Determine the p-value.
Command for Chi-Square Test
chisq.test(<matrix>)
Procedures for Testing the Hypothesis
Step 5: Since the p-value (0.02635) is less than the 0.05 level of significance, we reject H0.
Step 6: Based on the sample data, there is sufficient evidence that the gender of students is associated with the preferred type of learning medium.
Thank You!